How to pick up dropped script automatically, active records

lasastard · September 10, 2008, 6:57am

Hi!

I have been using Ruby for a while now - so I have a reasonable
understanding of how it works.
But I have a problem that I can’t seem to solve on my own, so here
goes:

I am using an ActiveRecord-based script that queries a remote MySQL
database and retrieves data. It’s quite an extensive search, so it
takes a long time to run. Unfortunately, the connection closes at random
(I suppose there is some sort of time-out once a connection has been
open for a certain amount of time).
I can identify the ID of the last processed object and pick up the
search where it left off, but only manually. And that’s the problem:
How do I write a wrapper or some loop that:
a) detects when the database connection closes
b) keeps track of the ID of the last processed database object (I
suppose that can be stored in a variable if we are talking about a
loop, otherwise something like a "return id" command?)
c) re-initializes the script using the last ID

I am thinking of either a second script that starts the original
database search script and keeps track of its current status, or else
a while loop perhaps?
Since the script simply aborts when the connection closes, I suppose the
first solution would be better?
But I have no idea how to:

- keep track of the “health” of the original process (there is obviously
  an error message returned to STDOUT, but I am not sure how to pick it
  up)
- start a Ruby script from within a Ruby script
- or else write a loop that keeps track of the child process
  without the whole process being terminated by an error

Any help would be greatly appreciated!

Cheers,

Marc

lasastard · September 10, 2008, 3:44pm

On Wed, Sep 10, 2008 at 2:50 PM, Marc H. wrote:

> I am using an active records based script that queries a remote MySQL
> database and retrieves data. It’s quite an extensive search, meaning it
> takes a long time to run. Unfortunately, the connection closes at random
> (I suppose there is some sort of time-out if a connection has been
> existing for a certain amount of time).

I recently made a similar system.

The best way I found to do it is to make a table at one end that holds
all the rows “to be replicated” or, in your case, “has been replicated”.
This holds the ID of each row you have replicated.

Then you write your script to grab the id to replicate, replicate it
down, and if all was successful, put an entry in your ‘has been
replicated’ table with the ID. Then the next time your script runs it
checks what the last ID was that was successfully replicated and picks
up from there.

Put your replication code at the local end in a transaction so that
you only write into the has been replicated table IF no errors came
up, and you don’t replicate anything unless the whole thing completes
successfully.

This then survives restarting hung processes etc.
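
The checkpoint idea above can be sketched with nothing but the standard library. This is only an illustration under stated assumptions, not the code from my system: the `Checkpoint` class writes the last ID to a file where the "has been replicated" table would normally sit, `ConnectionLost` is a made-up stand-in for whatever exception your database adapter raises when the connection drops, and `run_resumable` and the `fetch` callable are hypothetical names.

```ruby
require "tmpdir"

# Hypothetical stand-in for the error raised when the connection drops
# (with ActiveRecord this would be whichever exception the adapter raises).
class ConnectionLost < StandardError; end

# Stores the ID of the last successfully processed row. A plain file is used
# here; the "has been replicated" table would play the same role.
class Checkpoint
  def initialize(path)
    @path = path
  end

  def last_id
    File.exist?(@path) ? File.read(@path).to_i : 0
  end

  def save(id)
    File.write(@path, id.to_s)
  end
end

# Processes rows in ID order and resumes from the checkpoint after a dropped
# connection. `fetch` is any callable taking (last_id, limit) and returning
# rows (hashes with an :id key) ordered by ID.
def run_resumable(checkpoint, fetch, batch_size: 100, max_retries: 5)
  retries = 0
  loop do
    begin
      rows = fetch.call(checkpoint.last_id, batch_size)
      break if rows.empty?
      rows.each do |row|
        yield row
        checkpoint.save(row[:id]) # record progress only after success
      end
      retries = 0
    rescue ConnectionLost
      retries += 1
      raise if retries > max_retries
      sleep(0.1 * retries) # brief back-off; a reconnect would go here
    end
  end
end

# Demo: an in-memory "table" whose fetch fails once, on the second call.
DATA = (1..10).map { |i| { id: i } }
calls = 0
flaky_fetch = lambda do |last_id, limit|
  calls += 1
  raise ConnectionLost, "server has gone away" if calls == 2
  DATA.select { |r| r[:id] > last_id }.first(limit)
end

checkpoint = Checkpoint.new(File.join(Dir.mktmpdir, "last_id"))
seen = []
run_resumable(checkpoint, flaky_fetch, batch_size: 4) { |row| seen << row[:id] }
```

Because progress is stored outside the process, the same loop also picks up where it left off if the whole script is killed and started again. With a real database, the row write and the checkpoint write would go into one transaction, as described above.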

On detecting health, I found the best way is to set a timeout on each
replication check, and then nuke the process if it takes too long. If
you have used the above pattern on your replication, this won’t hurt
anything.

Warning though: Ruby’s Timeout will not handle this, because the DB call
blocks in a system call that Ruby’s timeout cannot interrupt. See my
previous thread on Ruby-talk, “Frustrated about system timeouts”. The
upshot is that Ara and I are wrapping up a new library called Terminator
that will handle the timeout correctly. But you could adapt the code in
that thread to your problem right now.
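
One way to "nuke the process if it takes too long" without relying on Ruby's timeout is to run the search script as a child process and kill it from the parent. The following is neither Terminator nor the code from the Ruby-talk thread, just a minimal sketch that assumes a Unix-like system. `run_with_deadline` and `supervise` are hypothetical names, and the `ruby -e` one-liners stand in for the real search script.

```ruby
require "rbconfig"

# Runs `command` (an argv array) as a child process. Returns the child's
# Process::Status if it exits within `timeout` seconds, or nil if it had to
# be killed.
def run_with_deadline(command, timeout:)
  pid = Process.spawn(*command)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    # WNOHANG makes wait2 return nil immediately while the child is running.
    _, status = Process.wait2(pid, Process::WNOHANG)
    return status if status
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
      Process.kill("KILL", pid) # SIGKILL cannot be ignored or rescued
      Process.wait(pid)         # reap the child so no zombie is left behind
      return nil
    end
    sleep 0.05
  end
end

# Restarts the command until it exits successfully or attempts run out.
# Returns the number of attempts used, or nil if every attempt failed.
def supervise(command, timeout:, max_attempts: 3)
  max_attempts.times do |attempt|
    status = run_with_deadline(command, timeout: timeout)
    return attempt + 1 if status && status.success?
  end
  nil
end

ruby  = RbConfig.ruby
quick = run_with_deadline([ruby, "-e", "exit 0"], timeout: 10)
hung  = run_with_deadline([ruby, "-e", "sleep 60"], timeout: 0.5)
```

A script that aborts with an error exits with a non-zero status, so `status.success?` also covers the "connection closed" case, not only hangs. Combined with a stored last ID, each restart simply continues from the last completed row.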

Hope that helps.

Mikel

lasastard · September 10, 2008, 4:11pm

What you want is a message queue solution. These do exactly what you
want. Your main script can dole out tasks, and workers pick the tasks
up, complete them, and report in. An added bonus is that this can give
you some parallelism, potentially speeding up your search.

Take a look at Starling:
http://rubypond.com/articles/2008/07/17/the-complete-guide-to-setting-up-starling/

DRb:
http://chadfowler.com/ruby/drb.html
http://segment7.net/projects/ruby/drb/introduction.html

AP4R:
http://ap4r.rubyforge.org/wiki/wiki.pl?HomePage

in addition to Rinda, bj, beanstalkd, sparrow, rq…
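
To give a feel for the dispatcher/worker pattern described above, here is a minimal in-process sketch using Ruby's built-in thread-safe `Queue` in place of a real queue server. It does not use the API of any of the tools listed. `search_range`, the chunk size, and the "multiples of 7" placeholder query are made up for illustration.

```ruby
# Dispatcher/worker sketch: Ruby's built-in Queue stands in for a real
# message queue server.
tasks   = Queue.new
results = Queue.new

# Hypothetical unit of work: each task is a range of IDs to search.
def search_range(range)
  range.select { |id| (id % 7).zero? } # placeholder for the real query
end

workers = 3.times.map do
  Thread.new do
    # Each worker picks up tasks until it sees the :stop marker.
    while (range = tasks.pop) != :stop
      results << [range, search_range(range)]
    end
  end
end

# The main script doles out the work in chunks of 25 IDs...
(1..100).each_slice(25) { |ids| tasks << (ids.first..ids.last) }
workers.size.times { tasks << :stop }
workers.each(&:join)

# ...and collects whatever the workers report back.
found = []
found.concat(results.pop.last) until results.empty?
found.sort!
```

An in-process queue like this disappears when the script dies. A queue server that runs as a separate process is what lets unfinished tasks outlive a crashed worker.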

– Mark.