Background tasks in Rails - any improvement on DRb or Rinda?

Hi folks,

I’m a Rails newbie, coming from a Java background, and I’m looking to
move/rewrite an application I have to Rails with some ajax in the front
end. Specifically, it’s an application to process raw digital camera
images - and some of the processing can take quite a long time.

Where I’m struggling a bit is that I want to run several tasks in the
background - the basic flow is:

  • a user requests an image
  • if the image already has been processed, a cached copy of the
    processed image is displayed
  • if the image has not been processed, a quick-and-dirty image is
    displayed, and a background job is started to process the image
    “properly”. More precisely, the image is added to a queue of background
    images, and a pool of worker threads processes images at an optimal rate
    (typically, one worker thread per CPU)
  • once an image has been processed “properly”, the processed image is
    stored in the cache for future views. (And ajax-enabled pages can
    identify this change and reload the proper image immediately)
  • other long-running jobs would also be running, to do stuff like
    cleaning out the cache if it gets too large, pre-processing images, etc.
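
For illustration, the flow above could be sketched in plain Ruby roughly as follows. This is only a minimal sketch, not Rails code: `ImageProcessor`, `process_properly` and the "preview-"/"processed-" strings are hypothetical stand-ins, and it assumes the surrounding code is safe to run alongside threads, which is exactly the point questioned later in this thread.

```ruby
require 'thread' # Queue and Mutex (already loaded on modern Rubies)

# Hypothetical stand-in for the slow "proper" raw-image conversion.
def process_properly(image_id)
  "processed-#{image_id}"
end

class ImageProcessor
  def initialize(worker_count)
    @cache   = {}        # image_id => processed image
    @pending = {}        # image_ids already queued, to avoid duplicate jobs
    @lock    = Mutex.new # guards @cache and @pending
    @queue   = Queue.new # thread-safe job queue
    @workers = Array.new(worker_count) do
      Thread.new do
        # Queue#pop blocks until a job arrives; :stop ends the worker.
        while (image_id = @queue.pop) != :stop
          result = process_properly(image_id)
          @lock.synchronize do
            @cache[image_id] = result
            @pending.delete(image_id)
          end
        end
      end
    end
  end

  # Returns the cached image if there is one; otherwise queues a
  # background job (once) and returns a quick-and-dirty preview.
  def request(image_id)
    @lock.synchronize do
      return @cache[image_id] if @cache.key?(image_id)
      unless @pending[image_id]
        @pending[image_id] = true
        @queue << image_id
      end
    end
    "preview-#{image_id}"
  end

  # Lets the workers finish everything already queued, then stops them.
  def shutdown
    @workers.size.times { @queue << :stop }
    @workers.each(&:join)
  end
end
```

A first `request` for an image returns the preview and queues the job; once a worker has finished, the same call returns the cached result. The worker count would typically be one per CPU, as described above.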

There was some discussion on the list in February last year about
background jobs, and the suggestion was to use drb/backgroundrb or rinda -
basically doing all the work in a separate process (if I understand
correctly) and using sockets or IPC to communicate with the web
application.
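
To make the separate-process idea concrete, here is a minimal sketch using DRb from Ruby's standard library (not backgroundrb itself). `JobServer` and its methods are hypothetical names. For brevity the server and client run in one process here, where DRb may call the object directly; in a real setup the server would be its own process and every call and return value would be marshalled over the socket.

```ruby
require 'drb/drb'

# Hypothetical job server; normally this would live in a separate process.
class JobServer
  def initialize
    @jobs = []
  end

  # Adds an image to the job list and returns the number of queued jobs.
  def enqueue(image_id)
    @jobs << image_id
    @jobs.size
  end

  # Returns a copy of the queued image ids.
  def pending
    @jobs.dup
  end
end

# Server side: port 0 asks the OS for any free port.
server = DRb.start_service('druby://localhost:0', JobServer.new)

# Client side (normally the web application): a proxy object for the server.
remote = DRbObject.new_with_uri(server.uri)
count  = remote.enqueue('img1')
jobs   = remote.pending

server.stop_service
```

The web application only ever sees the proxy, so arguments and results have to be things that can be marshalled, which is the overhead being asked about here.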

Is it really that hard? In a Java servlets app I’d just have something
in my servlet’s init method to start a thread pool, and use synchronized
queues etc. to share the work. Or if I wanted more complex scheduling
I’d use Quartz - either way, everything is in the same memory space, no
external communications, no marshalling of objects…

So, why can’t I do something similar in Rails? Or is there some obvious
thing I’m missing?

- Korny

Kornelis Sietsma wrote:

Hi folks,

I’m a Rails newbie, coming from a Java background, and I’m looking to
move/rewrite an application I have to Rails with some ajax in the front
end. Specifically, it’s an application to process raw digital camera
images - and some of the processing can take quite a long time.

- Korny

You need to take a look at backgroundrb. It’s a very simple plugin for
rails that will do exactly this. Take an hour and run through a few of
the tutorials. Once you do that you can basically do anything you want
with it, especially everything you mentioned. Google backgroundrb.

Ben J. wrote:

You need to take a look at backgroundrb. It’s a very simple plugin for
rails that will do exactly this. Take an hour and run through a few of
the tutorials. Once you do that you can basically do anything you want
with it, especially everything you mentioned. Google backgroundrb.

I had seen backgroundrb (as I kind-of mentioned) - my question though
is, isn’t there any way to do this in-process? It seems wasteful and
needlessly complex to have to run a separate server process, and send
all information needed by my background tasks over TCP/IP to and from
the server. (or do the processes use shared memory? It didn’t seem so
from my initial reading…)

This is exactly the sort of model I’ve been moving away from in the
Java world in the last 5 years - away from the messy inefficient
EJB-style world of multiple distributed servers, endlessly marshalling
and unmarshalling data, when in 99% of cases the same stuff can be done
in a single process…

- Korny

Jacob A. wrote:

On Sat, Jan 27, 2007 at 09:08:16AM +0100, Kornelis Sietsma wrote:

Is it really that hard?

I’m not sure what exactly it is you think is hard. Backgroundrb
provides a pretty simple interface to starting worker tasks.

Sorry, “hard” probably was the wrong term - probably I meant “complex”.
Not “complex for me as a developer to use” so much as “complex
architecturally” - I had assumed there would be a simple way to do this
stuff on a thread. See my comments below.

In a Java servlets app I’d just have something in my servlet’s init
method to start a thread pool, and use synchronized queues etc. to
share the work. Or if I wanted more complex scheduling I’d use Quartz -
either way, everything is in the same memory space, no external
communications, no marshalling of objects…

So, why can’t I do something similar in Rails? Or is there some obvious
thing I’m missing?

Rails is not threadsafe [1]. Does this answer your question?

Ah, now it starts to make sense - I hadn’t worked this out at all. I
can see why you’d want this model - it is probably the simplest option
available.

A bit of googling found me:
http://blogs.codehaus.org/people/tirsen/archives/001041_ruby_on_rails_and_fastcgi_scaling_using_processes_instead_of_threads.html
which clarifies how Rails still manages to scale despite this - though
there is some debate still ongoing, obviously.

Thanks for this info - and it’s a pity books like “Agile web development
with rails” don’t make this sort of thing clearer up front! (I found a
description of the above, eventually, on page 616, under “deployment and
production”…)

- Korny

On Sat, Jan 27, 2007 at 09:08:16AM +0100, Kornelis Sietsma wrote:

Is it really that hard?

I’m not sure what exactly it is you think is hard. Backgroundrb
provides a pretty simple interface to starting worker tasks.

In a Java servlets app I’d just have something in my servlet’s init
method to start a thread pool, and use synchronized queues etc. to
share the work. Or if I wanted more complex scheduling I’d use Quartz -
either way, everything is in the same memory space, no external
communications, no marshalling of objects…

So, why can’t I do something similar in Rails? Or is there some obvious
thing I’m missing?

Rails is not threadsafe [1]. Does this answer your question?


Cheers,

- Jacob A.