Package idea: attempt

Hi all,

I’m tired of this idiom:

max = 3
begin
  Timeout.timeout(val){
    # some op that could fail or timeout on occasion
  }
rescue Exception
  max -= 1
  if max > 0
    sleep interval
    retry
  end
  raise
end

Mark Fowler wrote a Perl module called “attempt”
(Attempt - attempt to run code multiple times - metacpan.org) that I think
is pretty handy, and I would like this for myself. I figure the API
should look like this:

# 1st arg is retries, 2nd arg is interval
attempt(3, 300){
  FTP.open(host, user, passwd){ … }
}

Here’s my possibly naive implementation:

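require 'timeout'

module Kernel
  # Run the block up to `tries` times, sleeping `interval` seconds between
  # tries. If `timeout` is given, each try is limited to that many seconds.
  def attempt(tries = 3, interval = 60, timeout = nil)
    begin
      if timeout
        Timeout.timeout(timeout){ yield }
      else
        yield
      end
    rescue
      tries -= 1
      if tries > 0
        sleep interval
        retry
      end
      raise   # out of tries: re-raise the last error
    end
  end
end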

What do you think? Useful? Are there any gotchas I need to consider,
such as nested begin/end blocks, try/catch? Anything else? Should I
provide some way to provide debug info? Finer grained error handling?

Ideas welcome.

Thanks,

Dan

On Fri, 9 Jun 2006, Daniel B. wrote:

We had a bug in a system that did something like this so it failed
literally 99 times out of a hundred.

Since we had a fast retry we only noticed the bug when I went hunting
another bug and went around inserting logging statements everywhere and
found the retry / fail producing the massive stream of BLAH failed
retrying messages.

Fixed that bug and suddenly system a lot faster / more stable…

Moral of the Story :

Unlogged / unreported retries mask bugs, always log / report number of
retries.

John C. Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : [email protected]
New Zealand

Carter’s Clarification of Murphy’s Law.

“Things only ever go right so that they may go more spectacularly wrong
later.”

From this principle, all of life and physics may be deduced.

John C. wrote:

Fixed that bug and suddenly system a lot faster / more stable…

Moral of the Story :

Unlogged / unreported retries mask bugs, always log / report number of
retries.

Yes, that is a potential issue. It occurred to me that errors that
would normally be ignored could/should be emitted as warnings. That
way, if there’s an obvious problem with your code, you’ll see it right
away, assuming you’re running from the command line (or have some other
way of monitoring stderr).
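
A minimal sketch of that idea, assuming the same retry loop as the attempt method earlier in the thread. The method name attempt_with_warnings and the parameter name limit are hypothetical, chosen only for this illustration; each swallowed error is reported through Kernel#warn, which writes to stderr.

```ruby
require 'timeout'

# Hypothetical variant of the attempt method from earlier in the thread:
# same retry loop, but every error that gets swallowed is reported on
# stderr before the next try.
def attempt_with_warnings(tries = 3, interval = 60, limit = nil)
  count = 0
  begin
    count += 1
    limit ? Timeout.timeout(limit) { yield } : yield
  rescue StandardError => err
    raise if count >= tries   # out of tries: let the last error propagate
    warn "attempt #{count} of #{tries} failed: #{err.class}: #{err.message}"
    sleep interval
    retry
  end
end
```

Something like attempt_with_warnings(3, 5) { flaky_operation } would then print one line per failed try, so a block that fails on most calls becomes visible instead of just looking slow (flaky_operation is a placeholder, not a real method).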

Regards,

Dan

John C. wrote:

Moral of the Story :

Unlogged / unreported retries mask bugs, always log / report number
of retries.

Also, beware of retries in multiple levels of a protocol stack. I’ve
heard stories of a system that retried the lowest level of a protocol 3
times with a 30 second timeout (90 seconds in total). The next
layer above that added its own 3 tries (now we have 4 1/2 minutes before
timeout failure). The next several layers also did retries, with the
end result taking hours to time out.

Moral of story: Don’t add retries indiscriminately.
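
The arithmetic behind that story can be sketched as follows. The 3 tries and the 30 second timeout come from the post above; the number of layers, the helper name worst_case_seconds, and the choice to ignore any sleep between tries are assumptions made for illustration.

```ruby
# Worst-case time to failure when every layer retries the layer below it.
# Each added layer multiplies the total by its number of tries.
def worst_case_seconds(layers, tries: 3, base_timeout: 30)
  base_timeout * tries**layers
end

(1..6).each do |layers|
  secs = worst_case_seconds(layers)
  puts format('%d layer(s): %6d s (%.1f min)', layers, secs, secs / 60.0)
end
```

One layer gives 90 seconds and two layers give 270 seconds (the 4 1/2 minutes above). Under these assumptions five layers already come to 7290 seconds, or about two hours.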

– Jim W.

Jim W. wrote:

timeout failure). The next several layers also did retries, with the
end result taking hours to time out.

Moral of story: Don’t add retries indiscriminately.

– Jim W.

Yep, definitely something to watch out for. What can I say? Use with
caution. :)

- Dan

On Fri, 9 Jun 2006, Daniel B. wrote:

Yep, definitely something to watch out for. What can I say? Use with
caution. :)

for what it’s worth i have my own version of attempt in a few near-real-time
systems where the overriding principle is: keep going at all costs. in these
systems the ‘fail big and fail early’ principle doesn’t work unless one enjoys
working on sundays - so i’ve got lots of stuff like attempt - it all logs to
stderr and/or log files, however, so it doesn’t go unnoticed.

on another note i’ve found that an incremental sleep increase with reset is
almost always what you want. retrying on the same interval seems to clog up
systems as you get into certain timing rhythms. in rq i use this a lot

http://codeforpeople.com/lib/ruby/rq/rq-2.3.3/lib/rq-2.3.3/sleepcycle.rb

it’s a cycle that looks like a sawtooth wave - so basically on each retry we
time out for longer than before, essentially becoming more and more ‘patient’
before getting really ‘impatient’ again.

i’ve found this matched the real world pretty well since timing out a bunch in
a short period normally means you should wait longer.
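
A minimal sketch of the sawtooth idea described above. This is not the rq sleepcycle code at the URL; the class name SawtoothSleep, its methods, and the numbers are all assumptions for illustration.

```ruby
# Hypothetical sawtooth sleep cycle: intervals grow by a fixed step on each
# retry and wrap back to the minimum once they would exceed the maximum.
class SawtoothSleep
  attr_reader :current

  def initialize(min, max, step)
    @min, @max, @step = min, max, step
    @current = min
  end

  # Return the interval to sleep for now, then advance the cycle.
  def next_interval
    value = @current
    @current += @step
    @current = @min if @current > @max
    value
  end

  # Call after a success so the next failure starts from the short interval.
  def reset
    @current = @min
  end
end

cycle = SawtoothSleep.new(1, 4, 1)
p Array.new(6) { cycle.next_interval }   # prints [1, 2, 3, 4, 1, 2]
```

In a retry loop one would call sleep cycle.next_interval in the rescue branch and cycle.reset after a successful try.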

cheers.

-a

kate rhodes wrote:

The following is incredibly nitpicky, I admit, but I figure I may as
well mention it.
The line
  Timeout.timeout(timeout){ yield }

Is it just me, or is that a lot of “timeout”? It hurts a stranger’s
understanding of the code. Why not change the name of the passed-in
timeout var to user_timeout or anything else that isn’t just ‘timeout’?

- kate = masukom

Heh, I suppose it might be. I could change that.

I remember, back in the 1.6.x days, when “timeout” was a top level
method and I had a variable called “timeout” in my code. That took a
while to track down. :)

Regards,

Dan

[email protected] wrote:

working on sundays - so i’ve got lots of stuff like attempt - it all

cheers.

-a

Hm, interesting. Maybe a more advanced version would use a full fledged
class with lots of options. Something like this:

attempt = Attempt.new{ |a|
  a.tries     = 3       # Try 3 times
  a.interval  = 30      # 30 seconds between tries but…
  a.max       = 90      # In case of nested retries
  a.increment = 10      # add 10 seconds to the interval with each try
  a.log       = log     # Where 'log' is an IO handle
  a.warnings  = $stderr # Send caught errors to IO handle as warnings
}

attempt{
  # Some op
}

Attempt#max would, in theory, be used to prevent Jim W.'s nightmare
scenario, where you have a bunch of nested retries, all doing their own
sleep + retry thing.

So, using the above example, if I did something like this:

attempt{
  begin
    # some op
  rescue
    sleep 500
    retry
  end
}

It would error out at 90 seconds no matter what (the value we set to
‘max’). I’m not sure if that’s possible, however, or even how you would
implement it. Thoughts?

- Dan

On Sat, 10 Jun 2006, Daniel B. wrote:

It would error out at 90 seconds no matter what (the value we set to ‘max’).
I’m not sure if that’s possible, however, or even how you would implement it.
Thoughts?

something like:

def done
  synchronize(:SH){ @done }
end

def done=d
  synchronize(:EX){ @done=d }
end

def ensure_max!
  @max ||= Thread.new(max, Thread.current) do |m,c|
    sleep max
    c.raise MaxError unless done
  end
end

def attempt

ensure
  @max.kill
end

or something like that ;)
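
One way to read the watchdog idea above, as a minimal runnable sketch and not a finished design. The helper name with_max is hypothetical, and MaxError is defined here only for the example. It inherits from Exception on purpose: a bare rescue catches only StandardError, so a nested rescue/sleep/retry block cannot swallow it. Thread#raise can interrupt the caller at any point, including inside an ensure block, so treat this with the same care as Timeout.

```ruby
# Raised in the calling thread when the overall deadline passes.
# Not a StandardError, so a bare `rescue` inside the block will not catch it.
class MaxError < Exception; end

# Hypothetical helper: run the block, but raise MaxError in the calling
# thread if the block is still running after `max` seconds.
def with_max(max)
  caller_thread = Thread.current
  watchdog = Thread.new do
    sleep max
    caller_thread.raise(MaxError, "gave up after #{max} seconds")
  end
  yield
ensure
  watchdog.kill if watchdog   # stop the watchdog once the block is done
end

begin
  with_max(0.2) do
    begin
      raise 'flaky'
    rescue
      sleep 5   # nested retry that would otherwise keep going
      retry
    end
  end
rescue MaxError => e
  puts "stopped: #{e.message}"
end
```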

-a

[email protected] wrote:

synchronize(:SH){ @done }
end

-a

Hm…this has potential. I might be asking you for some help in the
future.

Thanks,

Dan

On Sat, 10 Jun 2006, Daniel B. wrote:

Hm…this has potential. I might be asking you for some help in the future.

sure thing dan. just ping me offline.

cheers.

-a