backgrounDrb 0.2.1 seems to make the site inaccessible

Hi All,

This is my first post to this group. I don’t know how much
information I should provide to help you debug, so I have included
what I thought would be relevant. If you need more, please let me
know. Thanks.

We are running Rails 1.1.6 and Lighttpd on a Fedora 5 box (MySQL is
running on a separate box):
Linux app 2.6.18-1.2257.fc5 #1 SMP Fri Dec 15 16:07:14 EST 2006 x86_64
x86_64 x86_64 GNU/Linux

Previously we used a backgrounDrb 0.1.x version. After frequent high CPU
usage (>90%), we recently upgraded to version 0.2.1. Now the average
CPU usage is normal (around 20%), but sometimes we see more than one
backgrounDrb main process hanging around. Those stale processes are
still there even after running backgroundrb stop, so we have to kill
them manually.

Another problem has just appeared. While everything looks fine on the
Linux box (CPU, processes, etc.), the site cannot be reached.
Sometimes we can still SSH into the box; sometimes we cannot and have
to cold reboot it. When we can still SSH in, we run backgroundrb stop,
sometimes manually delete the /tmp/socket file, and then run
backgroundrb start again. After that the site behaves normally.

Below is an excerpt from our backgroundrb config file:
:host: localhost
:protocol: drbunix
:pool_size: 20
:rails_env: production

Has someone encountered the same kind of problems? Your help is highly
appreciated.

Neng

neng wrote:

Has someone encountered the same kind of problems? Your help is highly
appreciated.

I don’t know why you’re ending up with extra backgroundrb processes, but
it sounds to me like you’re running the machine out of memory.

The most recent versions of backgroundrb run in multiple threads, and
your whole application is loaded in each thread. So if you have several
workers going, you can eat up memory fast!
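
One rough way to check the memory theory is to add up the resident memory of everything whose command line mentions backgroundrb. The sketch below is only an illustration: it assumes a Linux `ps` that accepts `-eo pid,rss,args`, and the helper names (`matching_processes`, `total_rss_kb`) are made up for this example, not part of backgroundrb.

```ruby
# Hypothetical helpers (not part of backgroundrb) for inspecting the output
# of `ps -eo pid,rss,args`. RSS is reported by ps in kilobytes.

# Returns [pid, rss_kb, command] rows for every line whose command
# matches `pattern`. Header and blank lines are skipped because their
# first field is not a number.
def matching_processes(ps_output, pattern = /backgroundrb/i)
  ps_output.each_line.filter_map do |line|
    pid, rss, command = line.strip.split(/\s+/, 3)
    next unless pid.to_s.match?(/\A\d+\z/) && command.to_s.match?(pattern)
    [pid.to_i, rss.to_i, command]
  end
end

# Sums the resident memory (in KB) of the given rows.
def total_rss_kb(rows)
  rows.sum { |_pid, rss, _command| rss }
end
```

On the box itself you could call `matching_processes(%x(ps -eo pid,rss,args))` and print each row. More rows than you expect would also point at the stale processes, and a total close to the machine's RAM would support the out-of-memory explanation.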

–Al Evans

Thank you, Al. Yes, most of my workers need Rails. If the whole
application is loaded in each thread, that would certainly eat a lot
of memory. So my workers (mostly for search) are killed as soon as
they finish producing their results.

Actually, I had to modify the backgroundrb source code because it
lacks some features I need. For example, there is no way to expire
results. Do you know how I can contribute my changes back to
backgroundrb? They may be useful to other people too.
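
To give an idea of what I mean by expiring results, here is a minimal sketch. It is not the actual patch and not backgroundrb's API: the class name `ExpiringResults`, the TTL parameter and the injectable clock are all assumptions made up for illustration, and the code is not thread-safe as written.

```ruby
# Hypothetical helper, not part of backgroundrb: an in-memory result store
# that forgets entries older than a time-to-live (TTL) given in seconds.
class ExpiringResults
  # `clock` is any callable returning the current time; it is injectable
  # so the expiry logic can be exercised without sleeping.
  def initialize(ttl_seconds, clock: -> { Time.now })
    @ttl = ttl_seconds
    @clock = clock
    @entries = {} # key => [stored_at, value]
  end

  # Stores a value under key and records when it was stored.
  def store(key, value)
    @entries[key] = [@clock.call, value]
    value
  end

  # Returns the stored value, or nil if it is missing or has expired.
  # An expired entry is removed as a side effect.
  def fetch(key)
    stored_at, value = @entries[key]
    return nil if stored_at.nil?
    if @clock.call - stored_at > @ttl
      @entries.delete(key)
      return nil
    end
    value
  end

  # Removes every expired entry and returns how many were removed.
  def sweep
    now = @clock.call
    before = @entries.size
    @entries.delete_if { |_key, (stored_at, _value)| now - stored_at > @ttl }
    before - @entries.size
  end

  # Number of entries currently held, expired or not.
  def size
    @entries.size
  end
end
```

Calling `sweep` periodically keeps results from piling up even when nobody fetches them, which is the part that matters for memory.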

We just found out that our server has a hardware problem with its
RAID. After we fix it, we will see whether backgroundrb works well,
and I will report back.