Hello all,
I am having some (un)fun with timing out database calls.
Basically I have some database calls that go out to a remote database
server on the other side of the planet (using Rails’ ActiveRecord).
This all works fine, but occasionally the link gets interrupted and
you get a stale session and the whole thing just locks up waiting for
the call to complete (which it never does).
This then hangs the rake task that does a periodic update through
the system cron, and it stays jammed until you go in and reset it -
quite annoying.
Trying timeout.rb didn’t help, as it does not handle system calls
(except I believe for ones that Ruby makes itself, like file I/O).
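The timeout.rb attempt has roughly the following shape. This is only a minimal sketch: find_with_timeout is a hypothetical name, and the block stands in for the Model.find(id) call. A pure-Ruby block such as sleep is interrupted as expected; the trouble described above is that a call blocked inside the native DB driver is not.

```ruby
require 'timeout'

# Hypothetical wrapper (name is an assumption, not from any library).
# Returns the block's value, or :timed_out if the block runs too long.
def find_with_timeout(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  # Only reached if Ruby is able to interrupt the block at all.
  :timed_out
end

# Intended usage inside the rake task (Model and id are placeholders):
#   row = find_with_timeout(30) { Model.find(id) }
```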
Trying system-timer (http://ph7spot.com/articles/system_timer) from
Philippe Hanrigou also didn’t work - same hang, waiting for a return
call from the DB driver.
The DB stack is the Oracle Instant Client, then OCI, then the Oracle
ActiveRecord adapter, within ActiveRecord, called from a rake task (that
includes the environment), so I am basically calling from within a
full Rails stack on top of Ruby 1.8.6p36.
When the rake task starts, it checks to see if another copy is running
through a lock file and exits if so, so there is only ever one copy of
the rake task running - so it is not some race condition here.
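For concreteness, a lock-file guard of this kind can be sketched as below. It is a minimal sketch rather than the actual task code: the method name and lock path are hypothetical, and it assumes flock works on the file system holding the lock file.

```ruby
# Hypothetical single-instance guard (name and path are assumptions).
# Runs the block only if no other holder has the lock; otherwise
# returns :already_running without waiting.
def with_single_instance_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0644) do |lock|
    # LOCK_NB makes flock return false at once instead of blocking.
    return :already_running unless lock.flock(File::LOCK_EX | File::LOCK_NB)
    yield
  end
end

# Intended usage at the top of the rake task (path is a placeholder):
#   result = with_single_instance_lock('/tmp/periodic_update.lock') { run_update }
#   exit if result == :already_running
```

One property of this approach is that the operating system drops the lock when the process dies, so a killed task does not leave a stale lock behind.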
The timeouts happen while I am finding an individual row of a table
[Model.find(id)]. That is usually a fast operation, but in the context
where I am using it, it is the slowest part of my process, and so it
seems to be where the network has the most chance to crap out. So it
is probably not that bit of the code itself that is failing.
Has anyone found a reliable way to time out this sort of call? Or does
anyone have any idea why the system timer would not be timing out
this sort of call?
The hard thing is that I am not 100% sure where it is failing. I think
(from looking at tcpdump and copious logging) that it is stalling in
that find method, but I cannot be certain of this.
Any pointers on where to go from here, from others who must have
tackled this problem? As I see it, my options are:
- Figure out a solution to this problem (preferred)
- Abandon it and monitor for a zombie by tailing a log file or the
like for inactivity and then kill appropriately (sounds like a real
hack).
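A variation on the second option would be to watch the clock rather than a log file: run the work in a child process and kill it from the parent once a deadline passes. The following is a minimal sketch under stated assumptions, not a tested fix for this setup: run_with_hard_timeout is a hypothetical name, it needs a platform with fork, and the DB connection would presumably have to be opened inside the child rather than shared with the parent.

```ruby
# Hypothetical helper (name is an assumption, not from any library).
# Runs the block in a child process and kills the child if it is still
# running after `seconds`. Returns true if the child exited cleanly,
# false if it exited with an error, and :timed_out if it was killed.
def run_with_hard_timeout(seconds)
  pid = fork { yield }
  deadline = Time.now + seconds
  loop do
    # With WNOHANG, wait returns nil while the child is still running.
    return $?.success? if Process.wait(pid, Process::WNOHANG)
    if Time.now >= deadline
      # KILL cannot be caught or ignored, so it ends the child even
      # when it is blocked inside a native library call.
      Process.kill('KILL', pid)
      Process.wait(pid) # reap the child so no zombie is left
      return :timed_out
    end
    sleep 0.1
  end
end

# Intended usage (Model and id are placeholders):
#   status = run_with_hard_timeout(60) { Model.find(id) }
```

Because the block runs in a separate process, its return value does not come back to the parent; any result would have to be written somewhere the parent can read, such as a file or a pipe.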
Mikel