Forum: Ruby Package idea: attempt

Daniel Berger (Guest)
on 2006-06-09 00:40
(Received via mailing list)
Hi all,

I'm tired of this idiom:

max = 3
begin
    Timeout.timeout(seconds){   # 'seconds' is the per-try limit
       # some op that could fail or time out on occasion
    }
rescue Exception
    max -= 1
    if max > 0
       sleep interval
       retry
    end
    raise
end

Mark Fowler wrote a Perl module called "attempt"
(http://search.cpan.org/~markf/Attempt-1.01/lib/Attempt.pm) that I think
is pretty handy, and I would like this for myself.  I figure the API
should look like this:

# 1st arg is retries, 2nd arg is interval
attempt(3, 300){
    FTP.open(host, user, passwd){ ... }
}

Here's my possibly naive implementation:

require 'timeout'

module Kernel
    def attempt(tries = 3, interval = 60, timeout = nil)
       begin
          if timeout
             Timeout.timeout(timeout){ yield }
          else
             yield
          end
       rescue Timeout::Error, StandardError
          # a bare rescue would miss Timeout::Error on 1.8, where it
          # subclasses Interrupt rather than StandardError
          tries -= 1
          if tries > 0
             sleep interval
             retry
          end
          raise
       end
    end
end
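
With the optional third argument each try is also bounded by a timeout.
For example (the host and numbers are just placeholders):

require 'socket'

# 5 tries, 2 seconds apart, each individual try limited to 10 seconds
attempt(5, 2, 10){
    TCPSocket.open('example.com', 80){ |sock| sock.gets }
}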

What do you think?  Useful?  Are there any gotchas I need to consider,
such as nested begin/rescue blocks or throw/catch?  Anything else?
Should I provide some way to emit debug info?  Finer-grained error
handling?

Ideas welcome.

Thanks,

Dan
John Carter (johncarter)
on 2006-06-09 02:26
(Received via mailing list)
On Fri, 9 Jun 2006, Daniel Berger wrote:

We had a bug in a system that did something like this, so it failed
literally 99 times out of a hundred.

Since we had a fast retry, we only noticed the bug when I went hunting
another bug, inserted logging statements everywhere, and found the
retry / fail loop producing a massive stream of "BLAH failed, retrying"
messages.

Fixed that bug, and suddenly the system was a lot faster / more
stable....

Moral of the Story:

   Unlogged / unreported retries mask bugs; always log / report the
   number of retries.




John Carter                             Phone : (64)(3) 358 6639
Tait Electronics                        Fax   : (64)(3) 359 4632
PO Box 1645 Christchurch                Email : john.carter@tait.co.nz
New Zealand

Carter's Clarification of Murphy's Law.

"Things only ever go right so that they may go more spectacularly wrong
later."

From this principle, all of life and physics may be deduced.
Daniel Berger (Guest)
on 2006-06-09 15:46
(Received via mailing list)
John Carter wrote:
> Fixed that bug, and suddenly the system was a lot faster / more
> stable....
>
> Moral of the Story:
>
>   Unlogged / unreported retries mask bugs; always log / report the
>   number of retries.

Yes, that is a potential issue.  It occurred to me that errors that
would normally be ignored could/should be emitted as warnings.  That
way, if there's an obvious problem with your code, you'll see it right
away, assuming you're running from the command line (or have some other
way of monitoring stderr).
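
A minimal sketch of that idea, layered on the implementation above (the
exact message format is just a guess):

require 'timeout'

module Kernel
    def attempt(tries = 3, interval = 60, timeout = nil)
       begin
          timeout ? Timeout.timeout(timeout){ yield } : yield
       rescue Timeout::Error, StandardError => err
          tries -= 1
          # surface the swallowed error as a warning on stderr
          warn "attempt: #{err.class}: #{err.message} (#{tries} tries left)"
          if tries > 0
             sleep interval
             retry
          end
          raise
       end
    end
end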

Regards,

Dan
Jim Weirich (weirich)
on 2006-06-09 16:26
John Carter wrote:
> Moral of the Story :
>
>    Unlogged / unreported retries mask bugs, always log / report number
> of retries.

Also, beware of retries in multiple levels of a protocol stack.  I've
heard stories of a system that retried the lowest level of a protocol 3
times with a 30 second timeout (total 90 second timeout).  The next
layer above that added its own 3 tries (now we have 4 1/2 minutes before
timeout failure).  The next several layers also did retries, with the
end result taking *hours* to time out.
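
The compounding is multiplicative.  A quick back-of-the-envelope check,
using the numbers from that story (3 tries per layer, 30 second base
timeout):

base = 30   # seconds: timeout at the lowest layer
1.upto(5){ |layers| puts "#{layers} layer(s): #{3 ** layers * base} seconds" }
# 1 layer(s): 90 seconds ... 5 layer(s): 7290 seconds (about 2 hours)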

Moral of story:  Don't add retries indiscriminately.

-- Jim Weirich
Daniel Berger (Guest)
on 2006-06-09 16:37
(Received via mailing list)
Jim Weirich wrote:
> timeout failure).  The next several layers also did retries, with the
> end result taking *hours* to time out.
>
> Moral of story:  Don't add retries indiscriminately.
>
> -- Jim Weirich
>

Yep, definitely something to watch out for.  What can I say?  Use with
caution. :)

- Dan
unknown (Guest)
on 2006-06-09 17:19
(Received via mailing list)
On Fri, 9 Jun 2006, Daniel Berger wrote:

> Yep, definitely something to watch out for.  What can I say?  Use with
> caution. :)

for what it's worth, i have my own version of attempt in a few
near-real-time systems where the overriding principle is: keep going at
all costs.  in these systems the 'fail big and fail early' principle
doesn't work unless one enjoys working on sundays - so i've got lots of
stuff like attempt - it all logs to stderr and/or log files, so it
doesn't go unnoticed.

on another note, i've found that an incremental sleep increase with
reset is almost always what you want.  retrying on the same interval
seems to clog up systems as you get into certain timing rhythms.  in rq
i use this a lot

http://codeforpeople.com/lib/ruby/rq/rq-2.3.3/lib/...

it's a cycle that looks like a sawtooth wave - so basically on each
retry we time out for longer than before, essentially becoming more and
more 'patient' before getting really 'impatient' again.

i've found this matches the real world pretty well, since timing out a
bunch in a short period normally means you should wait longer.
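
a rough sketch of that shape (names and numbers made up):

   # the interval grows by 'step' on each retry, then resets to 'base'
   # once it passes 'ceiling' - a sawtooth: more and more 'patient',
   # then suddenly 'impatient' again
   def sawtooth(base = 1, step = 2, ceiling = 16)
     interval = base
     loop do
       yield interval
       interval += step
       interval = base if interval > ceiling
     end
   end

   # sawtooth{ |i| sleep i }   # sleeps 1, 3, 5, ..., 15, 1, 3, ...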

cheers.

-a
kate rhodes (Guest)
on 2006-06-09 17:19
(Received via mailing list)
The following is incredibly nitpicky, I admit, but I figure I may as
well mention it.  The line

Timeout.timeout(timeout){ yield }

Is it just me, or is that a lot of "timeout"?  It hurts a stranger's
understanding of the code.  Why not change the name of the passed-in
timeout var to user_timeout, or anything else that isn't just "timeout"?

- kate = masukom
Daniel Berger (Guest)
on 2006-06-09 17:26
(Received via mailing list)
kate rhodes wrote:
> The following is incredibly nitpicky, I admit, but I figure I may as
> well mention it.  The line
>
> Timeout.timeout(timeout){ yield }
>
> Is it just me, or is that a lot of "timeout"?  It hurts a stranger's
> understanding of the code.  Why not change the name of the passed-in
> timeout var to user_timeout, or anything else that isn't just "timeout"?
>
> - kate = masukom

Heh, I suppose it might be. I could change that.

I remember, back in the 1.6.x days, when "timeout" was a top level
method and I had a variable called "timeout" in my code.  That took a
while to track down. :)

Regards,

Dan
Daniel Berger (Guest)
on 2006-06-09 17:58
(Received via mailing list)
ara.t.howard@noaa.gov wrote:
> working on sundays - so i've got lots of stuff like attempt - it all
>
> cheers.
>
> -a

Hm, interesting.  Maybe a more advanced version would use a full-fledged
class with lots of options.  Something like this:

attempt = Attempt.new{ |a|
    a.tries      = 3       # Try 3 times
    a.interval   = 30      # 30 seconds between tries but...
    a.max        = 90      # In case of nested retries
    a.increment  = 10      # add 10 seconds to the interval with each try
    a.log        = log     # Where 'log' is an IO handle
    a.warnings   = $stderr # Send caught errors to IO handle as warnings
}

attempt{
    # some op
}

Attempt#max would, in theory, be used to prevent Jim Weirich's nightmare
scenario, where you have a bunch of nested retries, all doing their own
sleep + retry thing.

So, using the above example, if I did something like this:

attempt{
    begin
       # some op
    rescue
       sleep 500
       retry
    end
}

It would error out at 90 seconds no matter what (the value we set to
'max').  I'm not sure if that's possible, however, or even how you would
implement it.  Thoughts?
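
One partial approach would be to check a deadline between tries (a
sketch, using the numbers from above), though that can't interrupt a
try that's already in progress:

started = Time.now
begin
   # some op
rescue
   raise if Time.now - started > 90   # past 'max'; give up
   sleep 30
   retry
end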

- Dan
unknown (Guest)
on 2006-06-09 18:04
(Received via mailing list)
On Sat, 10 Jun 2006, Daniel Berger wrote:

> It would error out at 90 seconds no matter what (the value we set to 'max').
> I'm not sure if that's possible, however, or even how you would implement it.
> Thoughts?

something like:


   def done
     synchronize(:SH){ @done }
   end

   def done=(d)
     synchronize(:EX){ @done = d }
   end

   def ensure_max!
     # watchdog: raise in the calling thread once 'max' seconds have
     # elapsed, unless the attempt has already finished
     @watchdog ||= Thread.new(max, Thread.current) do |m, c|
       sleep m
       c.raise MaxError unless done
     end
   end

   def attempt
     ...
   ensure
     @watchdog.kill
   end


or something like that ;-)
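
fleshed out into something runnable (still just a sketch - MaxError and
the option names are made up, and the shared lock is dropped for
brevity):

   class Attempt
     class MaxError < StandardError; end

     def initialize(tries = 3, interval = 1, max = 5)
       @tries, @interval, @max = tries, interval, max
     end

     def attempt
       # watchdog: raise MaxError in the calling thread after @max
       # seconds, no matter how the retries below are sleeping
       watchdog = Thread.new(@max, Thread.current) do |m, c|
         sleep m
         c.raise MaxError, "gave up after #{m}s total"
       end
       tries = @tries
       begin
         yield
       rescue MaxError
         raise                # don't retry the watchdog's own error
       rescue StandardError
         tries -= 1
         if tries > 0
           sleep @interval
           retry
         end
         raise
       end
     ensure
       watchdog.kill
     end
   end

   # Attempt.new(3, 10, 2).attempt{ sleep 60 }  # MaxError after ~2 seconds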

-a
Daniel Berger (Guest)
on 2006-06-09 18:23
(Received via mailing list)
ara.t.howard@noaa.gov wrote:
>    def done
>      synchronize(:SH){ @done }
>    end
>
> -a

Hm....this has potential.  I might be asking you for some help in the
future.

Thanks,

Dan
unknown (Guest)
on 2006-06-09 18:48
(Received via mailing list)
On Sat, 10 Jun 2006, Daniel Berger wrote:

> Hm....this has potential.  I might be asking you for some help in the future.

sure thing dan.  just ping me offline.

cheers.

-a