Threads vs Processes

I have just switched back to Windows from the Mac/Linux world and my
first non-web Ruby project is a program to manage a bunch of independent
backup tasks … the spec calls for allowing many tasks to be queued but
to only allow a certain number to run at a time. I don’t foresee the
tasks needing to share any data with the parent process or each other
but that might be an option to keep open…

My first thought is to do something along the lines of a thread pool and
have each task as a thread… but then I thought of processes… I am
admittedly a bit fuzzy on the distinction between processes and threads
in Ruby, especially on Windows… can someone shine some light on this
for me?

Would one approach be easier to manage in this type of scenario? Are
there any performance or portability issues I should be aware of?

Thanks much!
Tim

Tim F. wrote:

I have just switched back to Windows from the Mac/Linux world and my
first non-web Ruby project is a program to manage a bunch of independent
backup tasks … the spec calls for allowing many tasks to be queued but
to only allow a certain number to run at a time. I don’t foresee the
tasks needing to share any data with the parent process or each other
but that might be an option to keep open…

My first thought is to do something along the lines of a thread pool and
have each task as a thread… but then I thought of processes… I am
admittedly a bit fuzzy on the distinction between processes and threads
in Ruby, especially on Windows… can someone shine some light on this
for me?

Would one approach be easier to manage in this type of scenario? Are
there any performance or portability issues I should be aware of?

Thanks much!
Tim

You might take a look at SizedQueue. There’s a good description of it in
“The Ruby Programming Language” book. I used it as a work queue holding
requests for a boatload of remote files; a set of concurrent FTP
connection threads pulled from that queue to go get them.
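
For what it's worth, a work queue like the one described could look roughly like the sketch below. It is only an illustration: the limit of 2, the queue size of 4, the `:done` sentinel, and the lambdas standing in for backup tasks are all made up for the example.

```ruby
require "thread" # Queue and SizedQueue; already loaded on modern Rubies

LIMIT = 2                      # hypothetical cap on concurrent tasks
queue = SizedQueue.new(4)      # producers block once 4 jobs are waiting
results = Queue.new

workers = Array.new(LIMIT) do
  Thread.new do
    # Each worker pulls jobs until it sees the :done sentinel.
    while (job = queue.pop) != :done
      results << job.call
    end
  end
end

# Stand-ins for backup tasks; real ones might shell out via system().
10.times { |i| queue << -> { sleep 0.01; i * 2 } }
LIMIT.times { queue << :done } # one sentinel per worker
workers.each(&:join)

# Drain the results once every worker has finished.
doubled = []
doubled << results.pop until results.empty?
```

Because only `LIMIT` worker threads exist, at most that many tasks run at once, while any number can wait in line behind the bounded queue.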

I did notice, as forewarned, that the performance degraded after a while
using the native Ruby threads. If you’re on Windows, though, you might
give IronRuby a fling (or JRuby if it’s installed). Those switch over to
their framework’s Thread classes, and worked very well.

On 02/21/2010 01:50 AM, Tim F. wrote:

in Ruby, especially on Windows… can someone shine some light on this
for me?

Processes are fairly independent. Threads share the same memory space
and if the process exits they are all gone. In Ruby 1.8 there was only
one OS level thread that did the work for all Ruby threads so you could
not make good use of multiple cores that way. OTOH, if your threads
just control external programs that you execute (e.g. via “system” or
“IO.popen”) then the single thread might be sufficient. In 1.9 things
have been improved but still there are some limitations to the
concurrency of multiple threads. Using JRuby with real threads is also
an option.
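
To illustrate the point about threads that only control external programs, here is a minimal sketch. The child command is a hypothetical stand-in (a second Ruby interpreter that just sleeps); a real backup task would run its own command instead.

```ruby
require "rbconfig"

# Hypothetical stand-in for a backup command: a child Ruby that exits 0.
cmd = [RbConfig.ruby, "-e", "sleep 0.2"]

threads = 3.times.map do
  # Each thread waits on its own child process; the real work happens
  # outside the Ruby interpreter.
  Thread.new { system(*cmd) }
end

# Thread#value joins the thread and returns the block's result.
statuses = threads.map(&:value)
```

Each `system` call returns `true` when its command exits with status 0, so `statuses` gives a quick per-task success flag.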

Would one approach be easier to manage in this type of scenario? Are
there any performance or portability issues I should be aware of?

Performance wise and from a robustness point of view multiple processes
are probably better. AFAIK the windows version of Ruby does not have
support for “fork” (unless you are using cygwin) so there you might
rather want to use threads.

Using processes is fairly easy - you can try it out with something like
this:

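#! /usr/bin/env ruby19

# Print a message prefixed with the current pid and a timestamp.
def log(msg)
  printf "pid %5d %-10s %s\n", $$, Time.now, msg
end

# Ten dummy tasks, each "running" for 2 to 6 seconds.
tasks = (1..10).map { 2 + rand(5) }

limit = 2        # maximum number of concurrent child processes
processes = []   # pids of the children currently running

log "starting"

tasks.each do |t|
  # At the limit: block until any child exits, then free its slot.
  if processes.size == limit
    processes.delete Process.wait
  end

  # fork returns the child's pid in the parent.
  processes << fork do
    log "start #{t}"
    sleep t
    log "end #{t}"
  end
end

log "all started"
Process.waitall
log "done"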
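
Since fork is not available everywhere, the same limit-and-wait loop can also be sketched with `Process.spawn` (added in Ruby 1.9), which starts a separate program rather than copying the current process. This is only a sketch under assumptions: the task durations are invented, the child is just a second Ruby interpreter that sleeps, and I have not verified the behaviour on every Windows build.

```ruby
require "rbconfig"

limit = 2
durations = [0.2, 0.1, 0.3, 0.1]   # hypothetical task lengths in seconds
running = []                       # pids of children still running
finished = []                      # pids of children that have exited

durations.each do |d|
  if running.size == limit
    pid = Process.wait             # block until any child exits
    running.delete(pid)
    finished << pid
  end
  # Stand-in task: a child Ruby that sleeps for d seconds.
  running << Process.spawn(RbConfig.ruby, "-e", "sleep #{d}")
end

# waitall returns [pid, status] pairs for the remaining children.
finished.concat(Process.waitall.map(&:first))
```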

Kind regards

robert

Ron Foster wrote:

You might take a look at SizedQueue. There’s a good description of it in
“The Ruby Programming Language” book.

Thanks, I will check that out…

I did notice, as forewarned, that the performance degraded after a while
using the native Ruby threads. If you’re on Windows, though, you might
give IronRuby a fling (or JRuby if it’s installed). Those switch over to
their framework’s Thread classes, and worked very well.

Given the modest number of tasks I expect to handle, some
degradation may be acceptable… but I may test with IronRuby or JRuby
anyway just for the experience. :)

The thing that has held me back with both to this point has been the lack
of support for native gems or suitable alternatives to the ones I
need… RMagick, in particular.

Thanks for the tips!
Tim

Robert K. wrote:

Processes are fairly independent. Threads share the same memory space
and if the process exits they are all gone. In Ruby 1.8 there was only
one OS level thread that did the work for all Ruby threads so you could
not make good use of multiple cores that way. OTOH, if your threads
just control external programs that you execute (e.g. via “system” or
“IO.popen”) then the single thread might be sufficient. In 1.9 things
have been improved but still there are some limitations to the
concurrency of multiple threads. Using JRuby with real threads is also
an option.

Thanks - that clears things up a good bit for me :)

Performance wise and from a robustness point of view multiple processes
are probably better. AFAIK the windows version of Ruby does not have
support for “fork” (unless you are using cygwin) so there you might
rather want to use threads.

On Windows fork() is supported using the win32-process gem, although I
have not looked at the code to see what mechanism is actually driving
the implementation…

Using processes is fairly easy - you can try it out with something like
this:

Thanks for the example code … and if I am following all of this
correctly, there really isn’t a way to “monitor” the status of a
Process, right? The only option is to “wait()” for it and then grab the
exit status… correct?

Thanks again,
Tim

On 2/21/2010 9:00 AM, Tim F. wrote:

there really isn’t a way to “monitor” the status of a
Process, right? The only option is to “wait()” for it and then grab the
exit status… correct?

Thanks again,
Tim

Correct.
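
One refinement, as far as I know: `Process.waitpid` accepts the `Process::WNOHANG` flag, which makes it return nil right away if the child is still running, so the parent can poll instead of blocking. A minimal sketch, assuming a child started with `Process.spawn`; the sleep time and the exit code 3 are arbitrary values chosen for the example.

```ruby
require "rbconfig"

# Hypothetical child: sleeps briefly, then exits with status 3.
pid = Process.spawn(RbConfig.ruby, "-e", "sleep 0.3; exit 3")

polls = 0
# With WNOHANG, waitpid returns nil while the child is still running
# and the child's pid once it has exited.
until Process.waitpid(pid, Process::WNOHANG)
  polls += 1
  sleep 0.05   # the parent is free to do other work here
end

exit_code = $?.exitstatus   # $? holds the status of the reaped child
```

This still only tells you "running" or "exited with status N"; anything richer needs some channel between the processes.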

One thing I thought I’d add: you might give DRb a look-see. It provides
a nice easy way to perform inter-process communication in Ruby. I think
it would be fairly easy to implement a worker pool with processes
instead of threads using DRb.
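
A very rough sketch of the idea, to show the moving parts. The `JobQueue` class and its `next_job` method are hypothetical names invented for this example, and for brevity the "worker" loop runs in the same process as the server; in a real pool each worker process would connect to the server's URI instead.

```ruby
require "drb/drb"

# Hypothetical job source: the front object a DRb server exposes.
class JobQueue
  def initialize(jobs)
    @jobs = jobs
    @lock = Mutex.new
  end

  # Hand out the next job, or nil when none are left.
  def next_job
    @lock.synchronize { @jobs.shift }
  end
end

# Port 0 asks the OS for a free port; the server reports its final URI.
server = DRb.start_service("druby://localhost:0", JobQueue.new(%w[a b c]))

# A worker process would normally run this part with the server's URI.
remote = DRbObject.new_with_uri(server.uri)
taken = []
while (job = remote.next_job)
  taken << job
end

DRb.stop_service
```

The mutex matters once several workers call `next_job` at the same time, since DRb serves each incoming call on its own thread.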

2010/2/21 Walton H.:

One thing I thought I’d add: you might give DRb a look-see. It provides
a nice easy way to perform inter-process communication in Ruby. I think
it would be fairly easy to implement a worker pool with processes
instead of threads using DRb.

Good point! I guess there are already frameworks around that abstract
this away but cooking your own isn’t too hard either.

Kind regards

robert