Drb communication problem and crash

Hi,

First of all, a little context on what I’m trying to do.
I have a Rails app that needs quite a bit of computation, and I want to run
the different queries in a number of separate processes. To do so, I’m
trying to implement the following system:

Rails --> Drb Query Dispatcher --> Drb Query Runner

Rails sends a job to the query dispatcher, which load-balances the jobs
over several query runners.

The whole system works and then suddenly hangs. When it hangs I get the
following message on the Drb Query Dispatcher:

message type 0x54 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x43 arrived from server while idle
message type 0x5a arrived from server while idle

Then I can see that there is still some action for some of the queries
until it freezes completely.

Has anyone encountered similar problems? Or does anyone know where I could
find at least the meaning of these messages?

Thanks a lot!
I’ll be glad to give more information if needed.

PS: I tried to implement this with the Slave library but ran into
even more trouble, with the logs turning into nonsense (it looked like
memory corruption somewhere).

On Oct 12, 2007, at 11:00 , Laurent Francioli wrote:

over several query runners.

The whole system works and then suddenly hangs. When it hangs I get the
following message on the Drb Query Dispatcher:

message type 0x54 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x43 arrived from server while idle
message type 0x5a arrived from server while idle

I don’t see where this message is coming from in DRb, or ruby.

Then I can see that there is still some action for some of the queries
until it freezes completely.

Has anyone encountered similar problems? Or does anyone know where I could
find at least the meaning of these messages?

grep your code for ‘while idle’, that will help.

Hi,

Thanks for your quick answer! Well, the message definitely doesn’t come
from my code. If it really doesn’t come from DRb or Ruby either, maybe it
is a system message?

Also, I’ve read your Seattle.rb presentation slides, and on one of your
slides you seem to say that ACLs shouldn’t be used and could cause
deadlocks; is this right? I’m asking because we’re using one in our system
to restrict accepted calls to localhost only.

Another thing I noticed is that my version not using the Slave lib
actually produces the same behavior (variable mix-ups, etc.). It
looks like the communication between the server and the clients is getting
mixed up. I also noticed that the problems occur more often as the
number of running servers increases.
Hope that helps a bit…

I’ll keep you posted if I get new clues or even better…a fix!

Thanks!
Laurent

So I finally found the problem! The message I reported actually came
from Postgres.
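
For reference, the four type bytes in that message decode as ordinary reply messages in the PostgreSQL wire protocol, which fits a query result showing up on a connection that believed it was idle. A small sketch of the decoding (the constant and variable names are made up for illustration, and only the four logged bytes are listed):

```ruby
# Backend message type bytes from the PostgreSQL frontend/backend protocol.
# Only the four bytes seen in the log are included here.
BACKEND_MESSAGES = {
  'T' => 'RowDescription',
  'D' => 'DataRow',
  'C' => 'CommandComplete',
  'Z' => 'ReadyForQuery'
}.freeze

# The bytes reported in the "arrived from server while idle" lines.
logged_bytes = [0x54, 0x44, 0x43, 0x5a]

# Turn each byte into its ASCII character and look up the message name.
decoded = logged_bytes.map do |byte|
  BACKEND_MESSAGES.fetch(byte.chr, 'unknown')
end

logged_bytes.zip(decoded).each do |byte, name|
  puts format('0x%02x (%s) %s', byte, byte.chr, name)
end
```

Taken together, these four messages make up a complete reply to a simple query.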

The problem was that I had an open connection to the DB at the moment of the
fork (both when using the Slave lib and with my own forking code). It seems that
this connection was passed on to the child processes and interfered with the
children’s access to the DB. I’m not 100% sure why, since the child processes
were actually creating their own connections anyway.
But I’m sure it came from there, since it is completely stable now!

Thanks a lot for your quick reply!
Laurent

On Oct 15, 2007, at 09:50 , Laurent Francioli wrote:

were actually creating their own connections anyway.
But I’m sure it came from there, since it is completely stable now!

If it still had the file descriptor open, it would be copied.
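
To illustrate the point about the copied descriptor, here is a minimal sketch that uses a UNIX socket pair as a stand-in for the database connection. No real database is involved, the variable names are made up, and fork requires a Unix-like system.

```ruby
require 'socket'

# Stand-in for a DB connection opened before forking:
# client_end plays the application's side, server_end plays the database.
client_end, server_end = UNIXSocket.pair

pid = fork do
  # The child inherited client_end; it refers to the same underlying socket
  # as the parent's copy, so this write lands in the parent's "connection".
  client_end.write("from child\n")
  exit!(0) # skip at_exit handlers in the child
end
Process.wait(pid)

# The parent keeps using what it believes is its private connection.
client_end.write("from parent\n")

# The "server" sees both writes on one stream.
received = [server_end.gets, server_end.gets]
p received
```

Because both processes share one socket, the peer sees a single mixed stream and replies can end up in the wrong process. One common way to avoid this, assuming a setup like the one described in this thread, is to close the connection before forking and let each process open its own afterwards.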

I don’t recall, which URL?

Ok, it’s really old…
http://blog.segment7.net/articles/2006/04/22/drb-an-introduction-and-overview
and as I said earlier, since I only had the slides and not the commentary
on them, I couldn’t be sure :)

Btw, really nice presentation! I find it pretty difficult to get good
docs on DRb and that’s a great piece!

Thanks,
Laurent

On Oct 16, 2007, at 09:44 , Laurent Francioli wrote:

doc on Drb and that’s a great piece!

Ah, even with an ACL it is still possible for people to do bad stuff
to your DRb processes. ACLs by themselves won’t cause deadlocks, but
they can’t prevent malice.
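
For context, restricting a DRb server to local callers with an ACL usually looks something like the sketch below. It uses the ACL class that ships with DRb; the variable names are arbitrary, and the commented-out lines (including front_object) are placeholders rather than code from this thread. As noted above, this limits who can connect but does not protect the exposed objects from a hostile local caller.

```ruby
require 'drb/acl'

# Deny everything, then allow only local addresses.
local_only = ACL.new(%w[
  deny all
  allow localhost
  allow 127.0.0.1
])

# Install the ACL before starting the service so it applies to
# incoming connections:
#   require 'drb'
#   DRb.install_acl(local_only)
#   DRb.start_service('druby://localhost:0', front_object)

# allow_addr? takes an address array shaped like the result of peeraddr:
# [family, port, hostname, numeric address].
local  = local_only.allow_addr?(['AF_INET', 0, 'localhost', '127.0.0.1'])
remote = local_only.allow_addr?(['AF_INET', 0, 'example.com', '10.0.0.5'])

puts "local allowed:  #{local}"
puts "remote allowed: #{remote}"
```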

On Oct 14, 2007, at 14:27 , Laurent Francioli wrote:

Thanks for your quick answer! Well, the message definitely doesn’t come
from my code. If it really doesn’t come from DRb or Ruby either, maybe it
is a system message?

Also, I’ve read your Seattle.rb presentation slides, and on one of your
slides you seem to say that ACLs shouldn’t be used and could cause
deadlocks; is this right? I’m asking because we’re using one in our system
to restrict accepted calls to localhost only.

I don’t recall, which URL?

Eric H. wrote:

On Oct 16, 2007, at 09:44 , Laurent Francioli wrote:

doc on Drb and that’s a great piece!

Ah, even with an ACL it is still possible for people to do bad stuff
to your DRb processes. ACLs by themselves won’t cause deadlocks, but
they can’t prevent malice.

Ok, thanks for the explanation! :)