Frustrated: System call timeouts

Hello all,

I am having some (un)fun with timing out database calls.

Basically, I have some database calls that go out to a remote database
server on the other side of the planet (using Rails’ ActiveRecord).

This all works fine, but occasionally the link gets interrupted, the
session goes stale, and the whole thing just locks up waiting for a
call that never completes.

This then hangs the rake task that does a periodic update from the
system cron, and it stays jammed until you go in and reset it, which is
quite annoying.

Trying timeout.rb didn’t help, as it does not handle system calls
(except, I believe, ones that Ruby makes itself, like file I/O).
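For comparison, this is the standard timeout.rb pattern, with `sleep` standing in for the database call (`slow_lookup` is a hypothetical placeholder, not code from this thread). It times out here because `sleep` hands control back to the interpreter. The usual explanation for the failure described above is that Ruby 1.8 uses green threads, so timeout.rb's timer thread only gets to run when the interpreter regains control, and a call blocked inside a C extension such as the OCI driver never gives it back.

```ruby
require 'timeout'

# Hypothetical stand-in for Model.find(id): a call that takes too long.
def slow_lookup
  sleep 5
  :row
end

result = begin
  Timeout.timeout(1) { slow_lookup }
rescue Timeout::Error
  :timed_out
end

puts result # prints "timed_out"
```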

Trying system-timer (http://ph7spot.com/articles/system_timer) from
Philippe Hanrigou also didn’t work - same hang, waiting for a return
call from the DB driver.

The DB stack is the Oracle Instant Client, then OCI, then the Oracle
ActiveRecord adapter, inside ActiveRecord, called from a rake task (that
loads the Rails environment), so I am basically calling from within a
full Rails stack on top of Ruby 1.8.6p36.

When the rake task starts, it checks through a lock file whether another
copy is running and exits if so, so there is only ever one copy of the
rake task running. This is not a race condition.
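The lock file check described above can be done with an exclusive, non-blocking `flock`. This is a minimal sketch, assuming a hypothetical lock path; unlike a check that only looks at whether the file exists, the kernel drops an `flock` lock when the holding process exits, so a crashed run does not leave a stale lock behind.

```ruby
require 'tmpdir'

# Returns the open, locked File on success, or nil if another open file
# handle already holds the lock. The lock is released when the File is
# closed or the process exits.
def acquire_lock(path)
  file = File.open(path, File::RDWR | File::CREAT, 0644)
  if file.flock(File::LOCK_EX | File::LOCK_NB)
    file
  else
    file.close
    nil
  end
end

# Hypothetical lock file location for the rake task.
LOCK_PATH = File.join(Dir.tmpdir, 'update_all.lock')

lock = acquire_lock(LOCK_PATH)
abort('another copy is already running') unless lock
# ... run the periodic update here, keeping `lock` open until done ...
```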

The timeouts happen while I am finding an individual row of a table
[Model.find(id)]. That is usually a fast operation, but in the context
where I am using it, it is the slowest part of my process, and so it
seems to be where the network has the most chance to crap out. So it is
probably not that that bit of the code itself is faulty.

Has anyone found a reliable way to time out this sort of call? Or does
anyone have any idea why the system timer would not be timing out this
sort of call?

The hard thing is that I am not 100% sure where it is failing. I think
(from looking at tcpdump and copious logging) that it is stalling in
that find method, but I am not 100% sure of this.

Any pointers from others who must have tackled this problem on where
to go from here? As I see it, my options are:

  1. Figure out a solution to this problem (preferred)
  2. Abandon it and monitor for a zombie by tailing a log file or the
    like for inactivity, then kill it appropriately (sounds like a real
    hack).
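Option 2 could look roughly like the following minimal sketch. It assumes a hypothetical heartbeat file that the rake task rewrites with its own PID between units of work, and a second cron job that calls `kill_if_stale`; the file name, the idle limit, and the choice of TERM are all assumptions.

```ruby
# True if the heartbeat file exists and has not been touched for more
# than max_idle seconds.
def stale?(heartbeat_path, max_idle, now = Time.now)
  File.exist?(heartbeat_path) && (now - File.mtime(heartbeat_path)) > max_idle
end

# Called from a separate cron job: reads the PID the task wrote into the
# heartbeat file and sends it TERM if the task has gone quiet.
def kill_if_stale(heartbeat_path, max_idle)
  return false unless stale?(heartbeat_path, max_idle)
  pid = Integer(File.read(heartbeat_path).strip)
  Process.kill('TERM', pid)
  true
rescue Errno::ESRCH, ArgumentError
  false # process already gone, or the file did not hold a PID
end

# Inside the rake task, between units of work (hypothetical path):
#   File.open('/tmp/update_all.heartbeat', 'w') { |f| f.puts Process.pid }
```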

Mikel

On Sep 6, 2008, at 3:43 AM, Mikel L. wrote:

try this

cfp:~/src/ruby > cat timing.rb
Timing.out(2) do
  p 'works'
end

Timing.out(1) do
  begin
    sleep 2
  rescue Timed.out
    p 'times out'
  end
end

Timing.out(1) do
  sleep 2
  p 'blows up'
end

BEGIN {

module Timing
  class Error < ::StandardError; end

  def Timing.out *seconds, &block
    if seconds.empty?
      return Error
    else
      seconds = Float seconds.first
    end

    pid = Process.pid
    signaler = IO.popen "ruby -e'sleep #{ seconds };
    Process.kill(:TERM.to_s, #{ pid }) rescue nil'"
    thread = Thread.current
    handler = Signal.trap('TERM'){ thread.raise Error, seconds.to_s }
    begin
      block.call
    ensure
      Process.kill 'TERM', signaler.pid rescue nil
      Signal.trap('TERM', handler)
    end
  end

  ::Timed = Timing
end

}

cfp:~/src/ruby > ruby timing.rb
"works"
"times out"
timing.rb:34:in `out': 1.0 (Timing::Error)
        from timing.rb:14:in `call'
        from timing.rb:14:in `sleep'
        from timing.rb:14
        from timing.rb:36:in `call'
        from timing.rb:36:in `out'
        from timing.rb:13

a @ http://codeforpeople.com/

On Sun, Sep 7, 2008 at 12:42 AM, ara.t.howard wrote:

  begin
    block.call
  ensure
    Process.kill 'TERM', signaler.pid rescue nil
    Signal.trap('TERM', handler)
  end

Ara, thank you so much for this.

I would never have thought of spawning suicidal terminator ruby
processes to nuke my process :) But works well.

There was a bit of a delay (I have been putting out some fires here over
the past two days), but I got to your code last night and this morning,
and it basically works… except that it doesn’t fully kill off the
signaler processes.

This is because two processes get created: first the shell, which then
spawns the ruby -e “sleep…” process.

The ‘hack’ I used to solve this is to replace the ensure block with:

  ensure
    Process.kill 'TERM', signaler.pid rescue nil
    Process.kill('TERM', signaler.pid+1) rescue nil
    Signal.trap('TERM', handler)
  end

But this obviously is insane as it assumes that no other processes get
started on the computer between sh starting up and it firing off the
ruby process.

The ps output looks like this:

$ ps -ef | grep ruby
rails 2153 2152 69 17:04 /usr/sbin/ruby1.8 /usr/bin/rake update:all
rails 2237 2153 69 17:04 sh -c ruby -e'sleep 40.0;?
Process.kill(:TERM.to_s, 2153) rescue nil'
rails 2238 2237 69 17:04 ruby -e'sleep 40.0;?
Process.kill(:TERM.to_s, 2153) rescue nil'

Any ideas on how to reliably find the PID of the ruby process that the
sh process created by IO.popen creates?

Mikel

On 09/09/2008, Mikel L. wrote:

Any ideas on how to reliably find the PID of the ruby process that the
sh process created by IO.popen creates?

Since you are using popen anyway, you can just have your ruby process
print its PID when it starts, and read it in your terminator.
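A minimal sketch of that suggestion, keeping the shell-string form of `IO.popen` from the original code: the watchdog's first line of output is its own PID, which the parent reads and later kills directly, whether or not a `sh` sits in between. `start_signaler` is a hypothetical helper; it is written for a current Ruby (`RbConfig.ruby`) and assumes the interpreter path contains no spaces.

```ruby
require 'rbconfig'

# Starts a watchdog that sends TERM to target_pid after `seconds`.
# The command is a shell string (as in the original), so /bin/sh may sit
# in between; the Ruby child therefore reports its own PID on stdout.
def start_signaler(seconds, target_pid)
  script = "STDOUT.puts Process.pid; STDOUT.flush; " \
           "sleep #{Float(seconds)}; Process.kill(\"TERM\", #{target_pid}) rescue nil"
  io = IO.popen("#{RbConfig.ruby} -e '#{script}'")
  [io, Integer(io.gets)]
end

signaler, watchdog_pid = start_signaler(30, Process.pid)
# ... do the slow work here ...
Process.kill('TERM', watchdog_pid) # kill the real watchdog, not the shell
signaler.close                     # waits for the shell (if any) to exit
```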

HTH

Michal

On Tue, Sep 9, 2008 at 12:00 AM, Mikel L. wrote:

Ara, thank you so much for this.

I would never have thought of spawning suicidal terminator ruby
processes to nuke my process :) But works well.

I agree, that was very clever :) Bookmarked in case I ever need this.

martin

On Sep 9, 2008, at 6:10 AM, Michal S. wrote:

Since you are using popen anyway, you can just have your ruby process
print its PID when it starts, and read it in your terminator.

HTH

correct. this is basically how systemu does it, which you could use
similarly to this

  require 'thread'
  require 'systemu'

  q = Queue.new

  systemu command do |pid|
    q.push pid
  end

  pid = q.pop

this bizarre syntax will capture the pid but also wait for the
process to start. all it’s doing is reading from a pipe, so your
solution seems fine.
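The same hand-off can be sketched without the systemu gem. A background thread starts the child, pushes its PID onto a `Queue`, and then reaps it, while the caller blocks on `q.pop` until the PID exists. This uses `Process.spawn` (Ruby 1.9+) and a placeholder child command; it illustrates the queue pattern only and is not a description of how systemu is implemented.

```ruby
require 'thread'
require 'rbconfig'

q = Queue.new

waiter = Thread.new do
  # Placeholder child: any command would do here.
  pid = Process.spawn(RbConfig.ruby, '-e', 'sleep 0.5; exit 3')
  q.push pid            # hand the PID to the caller immediately
  _, status = Process.wait2(pid)
  status                # reaped in the background thread
end

pid = q.pop             # blocks until the child has been spawned
puts "child pid: #{pid}"
puts "exit status: #{waiter.value.exitstatus}" # prints "exit status: 3"
```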

cheers.

a @ http://codeforpeople.com/

On Sep 9, 2008, at 1:07 AM, Mikel L. wrote:

I would never have thought of spawning suicidal terminator ruby
processes to nuke my process :) But works well.

i keep meaning to turn this into a library but have not. any other
advice - besides the pid issue - that you encountered trying to make
it live?

cheers.

a @ http://codeforpeople.com/

On Wed, Sep 10, 2008 at 12:45 AM, ara.t.howard wrote:

i keep meaning to turn this into a library but have not. any other advice -
besides the pid issue - that you encountered trying to make it live?

No, the pid issue is the only thing… it sometimes misses.

A library, hey?

gem install terminator

Terminate.timeout(40) do
… my block
end

:)
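A wrapper in the spirit of the proposed library might look like this minimal sketch. It is not the API of any published gem: `Watchdog` and `Watchdog.timeout` are hypothetical names, it needs Ruby 1.9+ for `Process.spawn`, and it does not handle the narrow race where the watchdog fires just as the block finishes. It combines the trap-and-signal idea from earlier in the thread with a shell-free spawn, so the PID it holds is the one it needs to kill.

```ruby
require 'rbconfig'

# Hypothetical library-style wrapper (the name Watchdog is made up).
module Watchdog
  class Error < StandardError; end

  def self.timeout(seconds)
    target = Thread.current
    previous = Signal.trap('TERM') { target.raise(Error, "timed out after #{seconds}s") }
    script = "sleep #{Float(seconds)}; Process.kill('TERM', #{Process.pid}) rescue nil"
    # Argument list, not a string: no /bin/sh, so `watchdog` is the ruby child.
    watchdog = Process.spawn(RbConfig.ruby, '-e', script)
    begin
      yield
    ensure
      begin
        Process.kill('TERM', watchdog) # stop the watchdog if it is still sleeping
      rescue Errno::ESRCH
        # already gone
      end
      Process.wait(watchdog)           # reap it so no zombie is left behind
      Signal.trap('TERM', previous)    # safe now: the watchdog can no longer signal us
    end
  end
end

# Usage:
#   Watchdog.timeout(40) { Model.find(id) }
```

Killing and reaping the watchdog before restoring the previous TERM handler is the design choice that matters here: once `Process.wait` returns, no late signal can arrive and hit the default handler.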

Mikel

On Sep 9, 2008, at 8:48 PM, Mikel L. wrote:

gem install terminator
http://lindsaar.net/
Rails, RSpec and Life blog…

oh that’s good! i can give you commit rights to codeforpeople and we
could release. such a great name! ;)

a @ http://codeforpeople.com/

Mikel L. wrote:

On Sun, Sep 7, 2008 at 12:42 AM, ara.t.howard wrote:

  begin
    block.call
  ensure
    Process.kill 'TERM', signaler.pid rescue nil
    Signal.trap('TERM', handler)
  end

Ara, thank you so much for this.

I would never have thought of spawning suicidal terminator ruby
processes to nuke my process :) But works well.

There’s also a timeout replacement lib [though I haven’t tried it].
http://ph7spot.com/articles/system_timer

On Thu, Sep 11, 2008 at 4:55 AM, Roger P. wrote:

There’s also a timeout replacement lib [though I haven’t tried it].
http://ph7spot.com/articles/system_timer

Thanks for that, I had already tried it. This doesn’t always catch
timed out processes in my experience.