[ANN] rq-2.3.2

rq v2.3.2

rq (queue | export RQ_Q=q) mode [mode_args]* [options]*


Linux Clustering with Ruby Queue: Small Is Beautiful | Linux Journal

ruby queue (rq) is a zero-admin zero-configuration tool used to
instant linux clusters. rq requires only a central nfs filesystem in
to manage a simple sqlite database as a distributed priority work
this simple design allows researchers to install and configure, in
only a
few minutes and without root privileges, a robust linux cluster
capable of
distributing processes to many nodes - bringing dozens of powerful
cpus to
their knees with a single blow. clearly this software should be kept
out of
the hands of free radicals, seti enthusiasts, and one mr. j safran.

the central concept of rq is that n nodes work in isolation to pull
from an centrally mounted nfs priority work queue in a synchronized
the nodes have absolutely no knowledge of each other and all
is done via the queue meaning that, so long as the queue is available
nfs and a single node is running jobs from it, the system will
continue to
process jobs. there is no centralized process whatsoever - all nodes
to take jobs from the queue and run them as fast as possible. this
a system which load balances automatically and is robust in face of

although the rq system is simple in it’s design it features powerful
functionality such as priority management, predicate and sql query ,
streaming command-line processing, programmable api, hot-backup, and
input/capture of the stdin/stdout/stderr io streams of remote jobs.
to date
rq has had no reported runtime failures and is in operation at dozens
research centers around the world.


the first argument to any rq command is the always the name of the
while the second is the mode of operation. the queue name may be
if, and only if, the environment variable RQ_Q has been set to
contain the
absolute path of target queue.

for instance, the command

 ~ > rq queue list

is equivalent to

 ~ > export RQ_Q=queue
 ~ > rq list

this facility can be used to create aliases for several queues, for
a .bashrc containing

 alias MYQ="RQ_Q=/path/to/myq rq"

 alias MYQ2="RQ_Q=/path/to/myq2 rq"

would allow syntax like

 MYQ2 submit < joblist


rq operates in modes create, submit, resubmit, list, status, delete,
query, execute, configure, snapshot, lock, backup, rotate, feed,
ioview, and help, and a few others. the meaning of ‘mode_args’ will
naturally change depending on the mode of operation.

the following mode abbreviations exist, note that not all modes have

 c  => create
 s  => submit
 r  => resubmit
 l  => list
 ls => list
 t  => status
 d  => delete
 rm => delete
 u  => update
 q  => query
 e  => execute
 C  => configure
 S  => snapshot
 L  => lock
 b  => backup
 R  => rotate
 f  => feed
 io => ioview
 h  => help

create, c :

 creates a queue.  the queue must be located on an nfs mounted file 

visible from all nodes intended to run jobs from it. nfs locking
must be
functional on this file system.

 examples :

   0) to create a queue
       ~ > rq /path/to/nfs/mounted/q create

     or, using the abbreviation

       ~ > rq /path/to/nfs/mounted/q c

submit, s :

 submit jobs to a queue to be proccesed by some feeding node.  any
 'mode_args' are taken as the command to run.  note that 'mode_args' 

subject to shell expansion - if you don’t understand what this
means do
not use this feature and pass jobs on stdin.

 when running in submit mode a file may by specified as a list of 

to run using the ‘–infile, -i’ option. this file is taken to be a
newline separated list of commands to submit, blank lines and
comments (#)
are allowed. if submitting a large number of jobs the input file
is MUCH, more efficient. if no commands are specified on the
command line
rq automatically reads them from stdin. yaml formatted files are
allowed as input (http://www.yaml.org/) - note that the output of
all rq commands is valid yaml and may, therefore, be piped as input
the submit command. the leading ‘—’ of yaml file may not be

 when submitting the '--priority, -p' option can be used here to 

the priority of jobs. priorities may be any whole number including
negative ones - zero is the default. note that submission of a
priority job will NOT supplant a currently running low priority
job, but
higher priority jobs WILL always migrate above lower priority jobs
in the
queue in order that they be run as soon as possible. constant
of high priority jobs may create a starvation situation whereby low
priority jobs are never allowed to run. avoiding this situation is
responsibility of the user. the only guaruntee rq makes regarding
execution is that jobs are executed in an ‘oldest-highest-priority’
and that running jobs are never supplanted. jobs submitted with
‘–stage’ option will not be eligible to be run by any node and
remain in a ‘holding’ state until updated (see update mode) into
‘pending’ mode, this option allows jobs to entered, or ‘staged’, in
queue and then made candidates for running at a later date.

 rq allows the stdin of commands to be provided and also captures 

stdout and stderr of any job run (of course standard shell
redirects may
be used as well) and all three will be stored in a directory
relative the
the queue itself. the stdin/stdout/stderr files are stored by job
id and
there location (though relative to the queue) is shown in the
output of
‘list’ (see docs for list).

 examples :

   0) submit the job ls to run on some feeding host

     ~ > rq q s ls

   1) submit the job ls to run on some feeding host, at priority 9

     ~ > rq -p9 q s ls

   2) submit a list of jobs from file.  note the '-' used to specify
   reading jobs from stdin

     ~ > cat joblist

     ~ > rq q submit --infile=joblist

   3) submit a joblist on stdin

     ~ > cat joblist | rq q submit -


     ~ > rq q submit - <joblist

   4) submit cat as a job, providing the stdin for cat from the file 


     ~ > rq q submit cat --stdin=cat.in

   5) submit cat as a job, providing the stdin for the cat job on 


     ~ > cat cat.in | rq q submit cat --stdin=-


     ~ > rq q submit cat --stdin=- <cat.in

   6) submit 42 priority 9 jobs from a command file, marking them as
      'important' using the '--tag, -t' option.

     ~ > wc -l cmdfile

     ~ > rq -p9 -timportant q s < cmdfile

   6) re-submit all the 'important' jobs (see 'query' section below)

     ~ > rq q query tag=important | rq q s -

   8) re-submit all jobs which are already finished (see 'list' 


     ~ > rq q l f | rq q s

   9) stage the job wont_run_yet to the queue in a 'holding' state. 

feeder will run this job until it’s state is upgraded to

     ~ > rq q s --stage wont_run_yet

ioview, io :

 as shown in the description for submit, a job maybe be provided 

during job submission. the stdout and stderr of the job are also
as the job is run. all three streams are captured in files located
relative to the queue. so, if one has submitted a job, and it’s
jid was
shown to be 42, by using something like

   ~ > rq /path/to/q submit myjob --stdin=myjob.in
     jid : 42
     priority : 0
     stdin : stdin/42
     stdout : stdout/42
     stderr : stderr/42
     command : myjob

 the stdin file will exists as soon as the job is submitted and the 

will exist once the job has begun running. note that these paths
shown relative to the queue. in this case the actual paths would


 but, since our queue is nfs mounted the /path/to/q may or may not 

be the
same on every host. thus the path is a relative one. this can
make it
anoying to view these files, but rq assists here with the ioview
the ioview command spawns an external editor to view all three
it’s use is quite simple

 examples :

   0) view the stdin/stdout/stderr of job id 42

      ~ > rq ioview 42

 by default this will open up all three files in vim.  the editor 

can be specified using the ‘–editor’ option or the ENV var
the default value is ‘vim -R -o’ which allows all three files to be
in a single window.

resubmit, r :

 resubmit jobs back to a queue to be proccesed by a feeding node. 

is essentially equivalent to submitting a job that is already in
the queue
as a new job and then deleting the original job except that using
is atomic and, therefore, safer and more efficient. resubmission
any previous stdin provided for job input. read docs for delete
submit for more info.

 examples :

   0) resubmit job 42 to the queue

     ~> rq q resubmit 42

   1) resubmit all failed jobs

     ~> rq q query exit_status!=0 | rq q resubmit -

   2) resubmit job 4242 with different stdin

     ~ rq q resubmit 4242 --stdin=new_stdin.in

list, l, ls :

 list mode lists jobs of a certain state or job id.  state may be 

one of
pending, holding, running, finished, dead, or all. any ‘mode_args’
are numbers are taken to be job id’s to list.

 states may be abbreviated to uniqueness, therefore the following 

apply :

   p => pending
   h => holding
   r => running
   f => finished
   d => dead
   a => all

 examples :

   0) show everything in q
       ~ > rq q list all


       ~ > rq q l all


       ~ > export RQ_Q=q
       ~ > rq l

   1) show q's pending jobs
       ~ > rq q list pending

   2) show q's running jobs
       ~ > rq q list running

   3) show q's finished jobs
       ~ > rq q list finished

   4) show job id 42
       ~ > rq q l 42

   5) show q's holding jobs
       ~ > rq q list holding

status, t :

 status mode shows the global state the queue and statistics on it's 

cluster’s performance. there are no ‘mode_args’. the meaning of
state is as follows:

   pending  => no feeder has yet taken this job
   holding  => a hold has been placed on this job, thus no feeder 

will start
running => a feeder has taken this job
finished => a feeder has finished this job
dead => rq died while running a job, has restarted, and moved
this job to the dead state

 note that rq cannot move jobs into the dead state unless it has 

restarted. this is because no node has any knowledge of other
nodes and
cannot possibly know if a job was started on a node that died, or
simply taking a very long time. only the node that dies, upon
can determine that is has jobs that ‘were started before it
started’ and
move these jobs into the dead state. normally only a machine crash
cause a job to be placed into the dead state. dead jobs are never
automatically restarted, this is the responsibility of an operator.

 examples :

   0) show q's status

     ~ > rq q t

delete, d :

 delete combinations of pending, holding, finished, dead, or jobs 

by jid. the delete mode is capable of parsing the output of list
query modes, making it possible to create custom filters to delete
meeting very specific conditions.

 'mode_args' are the same as for list.

 note that it is NOT possible to delete a running job.  rq has a
 decentralized architechture which means that compute nodes are 

independant of one another; an extension is that there is no way to
communicate the deletion of a running job from the queue the the
actually running that job. it is not an error to force a job to
prematurely using a facility such as an ssh command spawned on the
host to kill it. once a job has been noted to have finished,
whatever the
exit status, it can be deleted from the queue.

 examples :

   0) delete all pending, finished, and dead jobs from a queue

     ~ > rq q d all

   1) delete all pending jobs from a queue

     ~ > rq q d p

   2) delete all finished jobs from a queue

     ~ > rq q d f

   3) delete jobs via hand crafted filter program

     ~ > rq q list | yaml_filter_prog | rq q d -

     an example ruby filter program (you have to love this)

       ~ > cat yaml_filter_prog
       require 'yaml'
       joblist = YAML::load STDIN
       y joblist.select{|job| job['command'] =~ /bombing_program/}

     this program reads the list of jobs (yaml) from stdin and then 

only those jobs whose command matches ‘bombing_program’, which
subsequently piped to the delete command.

update, u :

 update assumes all leading arguments are jids to update with 

key=value pairs. currently only the ‘command’, ‘priority’, and
fields of pending jobs can be generically updated and the ‘state’
may be toggled between pending and holding.


   0) update the priority of job 42

     ~ > rq q update 42 priority=7

   1) update the priority of all pending jobs

     ~ > rq q update pending priority=7

   2) query jobs with a command matching 'foobar' and update their 

to be ‘barfoo’

     ~ > rq q q "command like '%foobar%'" |\
         rq q u command=barfoo

   3) place a hold on jid 2

     ~ > rq q u 2 state=holding

   4) place a hold on all jobs with tag=disk_filler

     ~ > rq q q tag=disk_filler | rq q u state=holding -

   5) remove the hold on jid 2

     ~ > rq q u 2 state=pending

query, q :

 query exposes the database more directly the user, evaluating the 

clause specified on the command line (or read from stdin). this
can be used to make a fine grained slection of jobs for reporting
or as
input into the delete command. you must have a basic understanding
of SQL
syntax to use this feature, but it is fairly intuitive in this


   0) show all jobs submitted within a specific 10 minute range

     ~ > a='2004-06-29 22:51:00'

     ~ > b='2004-06-29 22:51:10'

     ~ > rq q query "started >= '$a' and started < '$b'"

   1) shell quoting can be tricky here so input on stdin is also 

allowed to
avoid shell expansion

     ~ > cat constraints.txt
     started >= '2004-06-29 22:51:00' and
     started < '2004-06-29 22:51:10'

     ~ > rq q query < contraints.txt
       or (same thing)

     ~ > cat contraints.txt| rq q query -

   2) this query output might then be used to delete those jobs

     ~ > cat contraints.txt | rq q q - | rq q d -

   3) show all jobs which are either finished or dead

     ~ > rq q q "state='finished' or state='dead'"

   4) show all jobs which have non-zero exit status

     ~ > rq q query exit_status!=0

   5) if you plan to query groups of jobs with some common feature 

using the ‘–tag, -t’ feature of the submit mode which allows a
user to
tag a job with a user defined string which can then be used to
query that job group

     ~ > rq q submit --tag=my_jobs - < joblist

     ~ > rq q query tag=my_jobs

   6) in general all but numbers will need to be surrounded by 

quotes unless the query is a ‘simple’ one. a simple query is a
with no boolean operators, not quotes, and where every part of it

         key op value

      with ** NO SPACES ** between key, op, and value.  if, and only 

the query is ‘simple’ rq will contruct the where clause
appropriately. the operators accepted, and their meanings,

        =  : equivalence : sql =
        =~ : matches     : sql like
        !~ : not matches : sql not like

      match, in the context is ** NOT ** a regular expression but a 

style string match. about all you need to know about sql
matches is
that the ‘%’ char matches anything. multiple simple queries
will be
joined with boolean ‘and’

      this sounds confusing - it isn't.  here are some examples of 


        query :
          rq q query tag=important

        where_clause :
          "( tag = 'important' )"

        query :
          rq q q priority=6 restartable=true

        where_clause :
          "( priority = 6 ) and ( restartable = 'true' )"

        query :
          rq q q command=~%bombing_job% runner=~%node_1%

        where_clause :
          "( command like '%bombing_job%') and (runner like 


execute, e :

 execute mode is to be used by expert users with a knowledge of sql 

only. it follows the locking protocol used by rq and then allows
the user
to execute arbitrary sql on the queue. unlike query mode a write
lock on
the queue is obtained allowing a user to definitively shoot
themselves in
the foot. for details on a queue’s schema the file ‘db.schema’ in
queue directory should be examined.

   examples :

     0) list all jobs

       ~ > rq q execute 'select * from jobs'

configure, C :

 this mode is not supported yet.

snapshot, p :

 snapshot provides a means of taking a snapshot of the q. use this 

when many queries are going to be run; for example when attempting
figure out a complex pipeline command your test queries will not
with the feeders for the queue’s lock. you should use this option
whenever possible to avoid lock competition.


   0) take a snapshot using default snapshot naming, which is made 

via the
basename of the q plus ‘.snapshot’

     ~ > rq /path/to/nfs/q snapshot

   1) use this snapshot to chceck status

     ~ > rq ./q.snapshot status

   2) use the snapshot to see what's running on which host

     ~ > rq ./q.snapshot list running | grep `hostname`

 note that there is also a snapshot option - this option is not the 

same as
the snapshot command. the option can be applied to ANY command. if
effect then that command will be run on a snapshot of the database
and the
snapshot then immediately deleted. this is really only useful if
one were
to need to run a command against a very heavily loaded queue and
did not
wish to wait to obtain the lock. eg.

   0) get the status of a heavily loaded queue

     ~ > rq q t --snapshot

   1) same as above

     ~ > rq q t -s


   a really great way to hang all processing in your queue is to do 


     rq q list | less

   and then leave for the night.  you hold a read lock you won't 

until less dies. this is what snapshot is made for! use it like

     rq q list -s | less

   now you've taken a snapshot of the queue to list so your locks 

affect no

lock, L :

 lock the queue and then execute an arbitrary shell command.  lock 

uses the queue’s locking protocol to safely obtain a lock of the
type and execute a command on the user’s behalf. lock type must be
one of

   (r)ead | (sh)ared | (w)rite | (ex)clusive

 examples :

   0) get a read lock on the queue and make a backup

     ~ > rq q L read -- cp -r q q.bak

     (the '--' is needed to tell rq to stop parsing command line
      options which allows the '-r' to be passed to the 'cp' 



   this is another fantastic way to freeze your queue - use with 


backup, b :

 backup mode is exactly the same as getting a read lock on the queue 

making a copy of it. this mode is provided as a convenience.

   0) make a backup of the queue using default naming ( qname + 

timestamp +
.bak )

     ~ > rq q b

   1) make a backup of the queue as 'q.bak'

     ~ > rq q b q.bak

rotate, r :

 rotate mode is conceptually similar to log rolling.  normally the 

list of
finished jobs will grow without bound in a queue unless they are
deleted. rotation is a method of trimming finished jobs from a
without deleting them. the method used is that the queue is copied
to a
‘rotation’; all jobs that are dead or finished are deleted from the
original queue and all pending and running jobs are deleted from
rotation. in this way the rotation becomes a record of the queue’s
finished and dead jobs at the time the rotation was made.

   0) rotate a queue using default rotation name

     ~ > rq q rotate

   1) rotate a queue naming the rotation

     ~ > rq q rotate q.rotation

   2) a crontab entry like this could be used to rotate a queue 


     59 23 * * * rq q rotate `date +q.%Y%m%d`

feed, f :

 take jobs from the queue and run them on behalf of the submitter as
 quickly as possible.  jobs are taken from the queue in an 'oldest 

priority’ first order.

 feeders can be run from any number of nodes allowing you to harness 

CPU power of many nodes simoultaneously in order to more
clobber your network, anoy your sysads, and set output raids on

 the most useful method of feeding from a queue is to do so in 

daemon mode
so that if the process loses it’s controling terminal it will not
when you exit your terminal session. use the ‘–daemon, -d’ option
accomplish this. by default only one feeding process per host per
is allowed to run at any given moment. because of this it is
to start a feeder at some regular interval from a cron entry since,
if a
feeder is alreay running, the process will simply exit and
otherwise a new
feeder will be started. in this way you may keep feeder processing
running even acroess machine reboots without requiring sysad
to add an entry to the machine’s startup tasks.

 examples :

   0) feed from a queue verbosely for debugging purposes, using a 

and maximum polling time of 2 and 4 respectively. you would
specify polling times this brief except for debugging purposes!!!

     ~ > rq q feed -v4 --min_sleep=2 --max_sleep=4

   1) same as above, but viewing the executed sql as it is sent to 


     ~ > RQ_SQL_DEBUG=1 rq q feed -v4 --min_sleep=2 --max_sleep=4

   2) feed from a queue in daemon mode - logging to 


     ~ > rq q feed --daemon -l/home/$USER/rq.log

      log rolling in daemon mode is automatic so your logs should 

need to be deleted to prevent disk overflow.

start :

 the start mode is equivalent to running the feed mode except the 

is implied so the process instantly goes into the background.
also, if no
log (–log) is specified in start mode a default one is used. the
is /home/$USER/$BASENAME_OF_Q.log

 examples :

   0) start a daemon process feeding from q

     ~ > rq q start

   1) use something like this sample crontab entry to keep a feeder 

forever - it attempts to (re)start every fifteen minutes but
exits if
another process is already feeding. output is only created when
daemon is started so your mailbox will not fill up with this

     # crontab.sample

     */15 * * * * /path/to/bin/rq /path/to/q start

   and entry like this on every node in your cluster is all that's 

to keep your cluster going - even after a reboot.

shutdown :

 tell a running feeder to finish any pending jobs and then to exit. 

is equivalent to sending signal ‘SIGTERM’ to the process - this is
using ‘kill pid’ does by default.

 examples :

   0) stop a feeding process, if any, that is feeding from q.  allow 

jobs to be finished first.

     ~ > rq q shutdown


   if you are keeping your feeder alive with a crontab entry you'll 

need to
comment it out before doing this or else it will simply

stop :

 tell any running feeder to stop NOW.  this sends signal 'SIGKILL' 

(-9) to
the feeder process. the same warning as for shutdown applies!!!

 examples :

   0) stop a feeding process, if any, that is feeding from q.  allow 

jobs to be finished first - exit instantly.

     ~ > rq q stop

pid :

 show the pid, if any, of the feeder on this host

 ~ > rq q feeder
 pid : 3176

help, h :

 this message

 examples :

   0) get this message

     ~> rq q help


     ~> rq help


  • realize that your job is going to be running on a remote host and
    this has
    implications. paths, for example, should be absolute, not
    specifically the submitted job script must be visible from all
    currently feeding from a queue as must be the input and output

  • jobs are currently run under the bash shell using the --login
    therefore any settings in your .bashrc will apply - specifically
    your PATH
    setting. you should not, however, rely on jobs running with any

  • you need to consider CAREFULLY what the ramifications of having
    multiple instances of your program all potentially running at the
    time will be. for instance, it is beyond the scope of rq to ensure
    multiple instances of a given program will not overwrite each
    output files. coordination of programs is left entirely to the

  • the list of finished jobs will grow without bound unless you
    delete some (all) of them. the reason for this is that rq cannot
    when the user has collected the exit_status of a given job, and so
    this information in the queue forever until instructed to delete
    it. if
    you have collected the exit_status of you job(s) it is not an error
    then delete that job from the finished list - the information is
    kept for
    your informational purposes only. in a production system it would
    normal to periodically save, and then delete, all finished jobs.

  • know that it is a VERY bad idea to spawn several dozen process all
    reading/writing huge output files to a single NFS server. use this
    paradigm instead

    • copy/move data from global input space to local disk
    • process data
    • move data on local disk to global output space

    this, of course, applies to any nfs processing, not just those jobs
    submitted to rq

    the vsftp daemon is an excellent utility to have running on hosts
    in your
    cluster so anonymous ftp can be used to get/put data between any

  • know that nfs locking is very, very easy to break with firewalls
    put in
    place by overzealous system administrators. be postive not only
    that nfs
    locking works, but that lock recovery server/client crash or reboot
    as well. http://nfs.sourceforge.net/ is the place to learn about
    NFS. my
    experience thus far is that there are ZERO properly configured NFS
    installations in the world. please test yours. contact me for a
    script which can assist you. beer donations required as payment.

RQ_Q: set to the full path of nfs mounted queue

 the queue argument to all commands may be omitted if, and only if, 

environment variable ‘RQ_Q’ contains the full path to the q. eg.

   ~ > export RQ_Q=/full/path/to/my/q

 this feature can save a considerable amount of typing for those 

weak of

success : $? == 0
failure : $? != 0


  • kim baugh : patient tester and design input
  • jeff safran : the guy can break anything
  • chris elvidge : made it possible
  • trond myklebust : tons of help with nfs
  • jamis buck : for writing the sqlite bindings for ruby
  • _why : for writing yaml for ruby
  • matz : for writing ruby

[email protected]

0 < bugno && bugno <= 42

reports to [email protected]

–priority=priority, -p
modes <submit, resubmit> : set the job(s) priority - lowest(0)

highest(n) - (default 0)
–tag=tag, -t
modes <submit, resubmit> : set the job(s) user data tag
modes <submit, resubmit> : set the job(s) required runner(s)
modes <submit, resubmit> : set the job(s) to be restartable on
modes <submit, resubmit> : set the job(s) initial state to be
(default pending)
–infile=infile, -i
modes <submit, resubmit> : infile
–stdin=[stdin], -s
modes <submit, resubmit, update> : stdin
–quiet, -q
modes <submit, resubmit, feed> : do not echo submitted jobs,
silently if another process is already feeding
–daemon, -D
modes : spawn a daemon
modes : the maximum number of concurrent jobs run
modes : specify transaction retries
modes : specify min sleep
modes : specify max sleep
–snapshot, -s
operate on snapshot of queue
–editor=editor, -e
editor command capable of opening multiple files at once =
ENV[“RQ_EDITOR”] || “vim -R -o”)
–verbosity=verbostiy, -v
0|fatal < 1|error < 2|warn < 3|info < 4|debug - (default info)
–log=path, -l
set log file - (default stderr)
daily | weekly | monthly - what age will cause log rolling
size in bytes - what size will cause log rolling (default nil)
–help, -h
this message
show version number



some people, sweet and attractive, and strong and healthy, happen to die
young. they are masters in disguise teaching us about impermanence.

  • h.h. the 14th dali lama