Ruby IPC in an openMosix Cluster

I’m undertaking a project that will eventually become a processing
pipeline application of sorts. It will receive data in a common
format, transform it into one of many other formats, compile and send
it off to various endpoints (sounds spammish, but it’s all solicited,
really :).

My team and I have tentatively decided on openMosix to provide an
easily scalable cluster, and Ruby for the application itself. I’m very
new to Ruby, and fairly new to IPC concepts – The Little Book of
Semaphores and the many threads I’ve read here have helped me out a
lot.

Our application will rely on 3 sets of consumer process pools; each
pool is spawned by a daemon responsible for each basic operation (think
MTA): receive, transform, compile/send. By using processes instead of
threads, we allow openMosix to migrate each process and make use of the
entire cluster.
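
To make that concrete, here’s the kind of thing I have in mind for each
daemon - just a rough sketch, with a made-up spawn_pool helper and a
placeholder worker loop:

# rough sketch only: each daemon forks a pool of long-lived worker
# processes; since they are separate processes, openMosix can migrate
# them across the cluster.
def spawn_pool(name, size)
  size.times.map do
    fork do
      $0 = "pipeline-#{name}-worker"   # label the worker for ps
      loop do
        # ... pull the next unit of work and handle it here:
        # receive, transform, or compile/send, depending on the daemon ...
        sleep 1                        # placeholder
      end
    end
  end
end

receive_pids   = spawn_pool("receive",   4)
transform_pids = spawn_pool("transform", 8)
send_pids      = spawn_pool("send",      4)

Process.waitall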

So we have the model down, but I need a bit of advice on how to most
efficiently get these processes talking. What is the best form of IPC
to use here? It seems there are tons of Ruby examples on concurrency
and communication between threads, but I can’t seem to find anything
definitive on IPC (to more than one child at least). I tried the sysv
extension off RAA, but couldn’t get it to compile – though I didn’t
try my best.

Things I’m considering:

  • DRb (rough sketch after this list)
  • UNIXSocket
  • mkfifo
  • SysV message queue (openMosix doesn’t support shmem segments)
  • popen (though I can’t see how to do it without round-robin producing)
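
To give an idea of what I mean by the DRb option, here’s a rough,
untested sketch: the parent serves a thread-safe Queue over dRuby and
each forked worker pops jobs from it, so whichever worker is free takes
the next job instead of being fed round robin.

require 'drb'
require 'thread'

SERVER_URI = 'druby://localhost:9000'

jobs = Queue.new                        # thread-safe job queue
DRb.start_service(SERVER_URI, jobs)     # serve it to the workers

workers = 4.times.map do
  fork do
    DRb.start_service
    queue = DRbObject.new_with_uri(SERVER_URI)
    while (job = queue.pop) != :done    # blocks until a job is available
      # ... process the job here ...
      puts "worker #{Process.pid} handled #{job}"
    end
  end
end

10.times { |i| jobs.push("job-#{i}") }  # producer pushes; idle workers pull
workers.size.times { jobs.push(:done) }
Process.waitall

The pull model is the appeal: a busy worker just pops less often, so I
don’t have to round-robin anything myself.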

If anyone has any advice, please shove me in the right direction.

Thanks in advance!

Also, thanks to matz for the great language (code blocks are uber
bueno)!

Best,
Dan

On Thu, 23 Mar 2006 [email protected] wrote:

i recently built a system exactly like this for noaa. it’s built upon
ruby queue (rq) for the clustering and dirwatch for the event driven
components. both are on rubyforge and/or raa. the system uses a unique
library that allows classes to be parameterized, loaded, and run on
input data in a few short lines. here is one class representing a
processing flow:

class Flo5 < NRT::OLSSubscription::Geotiffed
  mode "production"

  roi 47,16,39,27
  satellites %w( F15 F16 )
  extensions %w( OIS )

  solarelevations -180, -12, 10.0

  hold 0

  username "flo"
  password "xxx"

  orbital_start_direction "descending"
end

this is one hundred percent of the coding needed to inject a new
processing flow into the system, have incoming files spawn jobs for it,
distribute processing to a cluster, and package and deliver data.

i’d like to do a write up about it in the near future but i’ve just
gotten to alpha and am still pretty busy. in any case there are at
least 6 or 7 components that can be used from this system in any other
such system. ping me on or off line and i can give you some more
info… right now i’m late for a meeting…

Linux Clustering with Ruby Queue: Small Is Beautiful | Linux Journal
http://raa.ruby-lang.org/project/rq/
http://raa.ruby-lang.org/project/dirwatch/

kind regards.

-a

Right on, Ara. Thanks for your input!

I had looked at rq very briefly, but at first glance it seemed like it
might be a hassle to maintain as we add nodes (having to
update/kill/restart the application on all nodes). What’s been your
experience with regards to maintenance?

Thanks,
Dan

Ok, so I went back and actually read through the entire rq article this
time (and noticed who wrote it – many props Ara :).

From what I understood, you’re suggesting something like this:

  1. Use dirwatch to wait for incoming data (files) on an NFS-exported
    directory
  2. Inject jobs into rq for each incoming file (rough sketch after
    this list)
  3. rq executes commands on each node that read in each file from the
    NFS mount
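
For steps 1 and 2 I’m picturing something like this rough sketch (the
paths and the process_one.rb script are made up, and I’m assuming the
"rq <queue> submit <command>" form from the Linux Journal article):

#!/usr/bin/env ruby
# hypothetical trigger run by dirwatch: submit one rq job per incoming
# file, then move the file aside so it isn't submitted twice.
require 'fileutils'

INCOMING = '/nfs/pipeline/incoming'   # made-up paths
QUEUED   = '/nfs/pipeline/queued'
QUEUE    = '/nfs/pipeline/queue'      # the shared rq queue directory

Dir.glob(File.join(INCOMING, '*')).each do |path|
  dest = File.join(QUEUED, File.basename(path))
  FileUtils.mv(path, dest)
  system("rq #{QUEUE} submit process_one.rb #{dest}")
end

Is that roughly the shape of it?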

How fast is a setup like this? I would think there would be a lot of
overhead in forking processes for each job, and even more in the
NFS/file IO. We’re shooting for 100 jobs/second, starting with a
fairly small cluster and then scaling up. Each piece of data is 4-10k.

My thought was to spawn a pool of processes once, then start feeding
them the data via [unknown IPC]. Seems like that would be a faster
solution as long as openMosix is efficient in redirecting the IO across
nodes. Of course, this may be a development nightmare (learning
experience), since neither my team nor I have a lot of experience with
multiprocessing.

If rq would satisfy our speed requirements, then I would love to avoid
the extra development time. Perhaps we’ll just have to build a basic
prototype and run some tests. :)

Best,
Dan

On Thu, 23 Mar 2006 [email protected] wrote:

Right on, Ara. Thanks for your input!

I had looked at rq very briefly, but at first glance it seemed like it might
be a hassle to maintain as we add nodes (having to update/kill/restart the
application on all nodes). What’s been your experience with regards to
maintenance?

hmmm. not quite clear on what you are asking - but we regularly add
and remove nodes. you don’t need to stop all nodes to do this at all -
to add a node simply start a feeder on it, to remove a node simply
stop that node’s feeder.

is that what you are asking?

in summary we regularly use rq to put together ‘ad-hoc’ clusters of
3-10 nodes and this generally takes less than 5 minutes or so.

regards.

-a

hmmm. not quite clear on what you are asking - but we regularly add and
remove nodes. you don’t need to stop all nodes to do this at all - to add a
node simply start a feeder on it, to remove a node simply stop that node’s
feeder.

is that what you are asking?

No, actually I really just spat out the wrong question before I read
the entire article and understood the rq setup. :)

Dan

On Thu, 23 Mar 2006 [email protected] wrote:

How fast is a setup like this? I would think there would be a lot of
overhead in forking processes for each job, and even more in the
NFS/file IO. We’re shooting for 100 jobs/second, starting with a
fairly small cluster and then scaling up. Each piece of data is 4-10k.

yup this would definitely push the limits unless you can batch them.
i’m actually working with a group now that will be ingesting data at
almost that exact same rate. in their case it’s sufficient to bundle
jobs up - higher latency but also higher throughput. so basically a
dirwatch would watch an incoming directory and, perhaps once per
minute, scan the directory and submit bunches of 500 files for
processing. for something really simple you might not even need
dirwatch - just move files to another directory once they’ve been
submitted - eg. sweep the directory as you submit. rq now supports
providing stdin (and saving stdout and stderr) so submitting jobs that
process 500 files is really easy. anyhow, nfs scales pretty dang well
on gigE with tuned tcp/ip stacks and fast disk, we certainly abuse
ours with few problems. however, we also use vsftpd to access data and
this is very easy to set up and extremely fast to use - about as fast
as you can get. in any case you’ll have the same issue with any
cluster: distributing jobs is easier than distributing data. for
instance, many of our inputs are 3gb-600gb - one has to be careful
with this sort of payload! ;)
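
something along these lines, as a very rough sketch - the paths and
the process_batch.rb script are made up, and you’d tune the batch size
to taste:

# sweep the incoming directory and submit one rq job per bunch of 500
# files. run it from dirwatch or cron, perhaps once per minute.
require 'fileutils'

INCOMING  = '/nfs/pipeline/incoming'    # made-up paths
SUBMITTED = '/nfs/pipeline/submitted'
QUEUE     = '/nfs/pipeline/queue'       # the shared rq queue directory

files = Dir.glob(File.join(INCOMING, '*')).sort
files.each_slice(500) do |bunch|
  moved = bunch.map do |path|
    dest = File.join(SUBMITTED, File.basename(path))
    FileUtils.mv(path, dest)            # sweep as we submit
    dest
  end
  list = File.join(SUBMITTED, "batch-#{Time.now.to_f}.list")
  File.open(list, 'w') { |f| moved.each { |p| f.puts p } }
  # assumes the "rq <queue> submit <command>" form
  system("rq #{QUEUE} submit process_batch.rb #{list}")
end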

My thought was to spawn a pool of processes once, then start feeding them
the data via [unknown IPC]. Seems like that would be a faster solution as
long as openMosix is efficient in redirecting the IO across nodes. Of
course, this may be a development nightmare (learning experience), since
neither my team nor I have a lot of experience with multiprocessing.

i did look into this a bit - if i recall openmosix makes io
transparent and can migrate process memory from node to node… in our
case this would be a disaster: code must follow data and not the other
way around. from what i know at the moment i can’t imagine that kernel
level io network multiplexing would be faster than pulling data across
and sending it back in huge chunks… the resident network expert here
seems to think people need to move that way to optimize newer networks
with things like jumbo frames. but i can’t say for certain - i’m just
a hacker!

If rq would satisfy our speed requirements, then I would love to avoid the
extra development time. Perhaps we’ll just have to build a basic prototype
and run some tests. :)

yes. we ran a simulation here, just producing false input and running
a busy loop for a few seconds to approximate a system similar to
yours, and it seemed fine. however they are not in production yet so i
cannot say for sure. the good part is that all the bits are free and
it should only take a few hours to mock something up - even on a stock
linux machine with no root privs…

let me know if you go this route as i have upgrades to both dirwatch
and rq that you’ll surely want.

regards.

-a

yup this would definitely push the limits unless you can batch them.

Not sure if this is an option yet.

let me know if you go this route as i have upgrades to both dirwatch and rq
that you’ll surely want.

I will definitely let you know. I’m going to experiment with a more
generic drb setup first, and see how that goes. Do your updates
contain the stdin/stdout support, or is that already in the newest
release?

Thanks again for all your help.

Dan