Hi all,
If anyone is willing, I'd be grateful for some advice on the forking job scheduler I've written. It works fine in simple tests, but it doesn't feel elegant. On IRC, kbrooks recommended an asynchronous main loop, but I don't understand how to implement that in this situation. The first version I wrote used threads, but several sources recommended fork instead. I have also considered just using the shell command 'ps' to see how many jobs are running, launching more as needed.
The basic requirements:
- Each job is a long-running external process (taking a day or more), and all jobs require a different amount of time to run (so asynchronous launching will be needed).
- I want to keep N jobs running at all times (N = 4 in the example below).
def start_job
  my_job = @jobs.pop
  puts "starting job #{my_job}"
  exec("sleep 2") if fork == nil  # launch a job; in reality it would run for a day or more
end

for num in 1..4  # I want to keep 4 jobs running at all times
  start_job
end
This doesn't wait for the last jobs to finish:
while @jobs.size > 0
  Process.wait  # a child finished...
  start_job     # ...so start the next queued job
end
This waits for the last jobs, but if I only had this line, it wouldn't wait for all the jobs to start!
Process.wait
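For reference, here is one way the pieces could fit together end to end (just a sketch: the contents of @jobs and the "sleep 2" command are stand-ins for the real queue and the day-long processes). Fill N slots, refill a slot whenever a child exits, then drain the stragglers once the queue is empty:

N = 4
@jobs = Array.new(10) { |i| "job-#{i}" }  # stand-in for the real work queue

def start_job
  my_job = @jobs.pop
  puts "starting job #{my_job}"
  exec("sleep 2") if fork.nil?            # stand-in for the day-long command
end

[N, @jobs.size].min.times { start_job }   # fill the initial N slots

until @jobs.empty?
  Process.wait                            # block until any child exits...
  start_job                               # ...then start the next queued job
end

begin
  loop { Process.wait }                   # drain: wait for the children still running
rescue Errno::ECHILD
  # all children have exited
end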
Each job is a long-running external process (taking a day or more)
and all jobs require a different amount of time to run (so
asynchronous launching will be needed).
I want to keep N jobs running at all times (N = 4 in the example below)
You say nothing about the coordination requirements of the external processes with the "watchdog" process. Is your requirement really just to ensure that four jobs are running at all times? If so, I would avoid using a long-running watchdog process, because you're making an assumption that it will never crash, catch a signal, etc. Why not run a cron job every five minutes or so that checks the running processes (via pgrep or ps as you suggested), starts more if necessary, writes status to syslog, and then quits? Much, much easier.
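A crude version of that cron job might look something like this in Ruby (a sketch only: the job command, the pgrep pattern, and the desired count are placeholders, and a real script would probably want to daemonize the children properly):

#!/usr/bin/env ruby
# e.g. in crontab:  */5 * * * * /usr/local/bin/topup_jobs.rb
require 'syslog'

WANTED  = 4                              # keep this many jobs alive
JOB_CMD = '/usr/local/bin/long_job'      # placeholder for the day-long command
PATTERN = 'long_job'                     # what pgrep should look for

running = `pgrep -c -f #{PATTERN}`.to_i  # count matching processes
missing = WANTED - running

missing.times do
  exec(JOB_CMD) if fork.nil?             # start one replacement job per free slot
end

Syslog.open('topup_jobs') do |log|
  log.info('%d running, started %d', running, [missing, 0].max)
end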
I actually had considered the cron approach, but wasn’t sure if it was
the best way to do things. What you say makes a lot of sense (I was
already nervous about the watchdog running into trouble, and there are
no coordination requirements), so I will go with your suggestion.
Each job is a long-running external process (taking a day or more)
and all jobs require a different amount of time to run (so
asynchronous launching will be needed).
I want to keep N jobs running at all times (N = 4 in the example below)
there is a command-line interface plus programming api - so you can almost certainly accomplish whatever it is you need to do with zero or very little coding on your part.
it simply starts if it's not running, otherwise it does nothing.
Sounds cool, Ara. How does it keep two copies of itself from running? Does it flock a file in /var/run or something like that?
You say nothing about the coordination requirements of the external
processes with the “watchdog” process. Is your requirement really just to
ensure that four jobs are running at all times? If so, I would avoid using a
long-running watchdog process, because you’re making an assumption that it
will never crash, catch a signal, etc. Why not run a cron job every five
minutes or so that checks the running processes (via pgrep or ps as you
suggested), starts more if necessary, writes status to syslog, and then
quits? Much, much easier.
this is exactly how rq works - except it does both: the feeder process is a daemon, but one which refuses to start two copies of itself. therefore a crontab entry can be used to make it 'immortal'. basically, the crontab simply starts it if it's not running, otherwise it does nothing.
Sounds cool, Ara. How does it keep two copies of itself from running? Does it flock a file in /var/run or something like that?
yeah - basically. it's under the user's home dir though, named after the queue. the effect is 'one feeder per host per user' by default. it really works nicely because you can have a daemon process totally independent of system space and without root privs. dirwatch works the same way. here's my crontab on our nrt system:
# clam, oyster, bismarck, scallop, shrimp are current workers
*/15 * * * * $worker $env $shush $nrtq start
this same crontab is installed across our nrt cluster. basically one node runs a bunch of dirwatchs which trigger submits to the master queue. the workers, for their part, are completely stupid: all they have is a user account and the '$worker' crontab entry that keeps a feeding process running at all times, even after reboot. it's a simple way to set up durable userland daemons.
($leader and $worker are xargs style programs - $leader obviously only executes its command line if run on the leader, vice versa for worker)
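For anyone curious, the 'refuse to run twice' trick Ara describes can be sketched in a few lines of Ruby using flock on a file under the user's home directory (this is only an illustration, not rq's actual code; the queue name and lock file name are made up):

queue    = 'myqueue'                                       # hypothetical queue name
lockfile = File.join(ENV['HOME'], ".#{queue}.feeder.lock")

lock = File.open(lockfile, File::RDWR | File::CREAT, 0644)
unless lock.flock(File::LOCK_EX | File::LOCK_NB)
  exit 0                  # another feeder holds the lock - do nothing, as the crontab expects
end
lock.truncate(0)
lock.puts(Process.pid)    # record the holder, purely informational
lock.flush

# ... the long-running feeder loop would go here; the kernel releases the
# lock automatically when this process exits, so a crashed feeder never
# leaves a stale lock behind.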
regards.
-a