Deadlocks in DRb Server

Hi,

We still have the problem that the Ferret DRb server dies on us
sometimes. Looking through ferret_server.out, we came across this:

deadlock 0xb7261cb0: sleep:F(1012) - /usr/lib/ruby/1.8/drb/drb.rb:566
deadlock 0xb71d8654: sleep:F(409) - /usr/lib/ruby/1.8/drb/drb.rb:566
deadlock 0xb723f3e0: sleep:F(7) - /usr/lib/ruby/1.8/drb/drb.rb:566
deadlock 0xb72607c0: sleep:F(11) - /usr/lib/ruby/1.8/drb/drb.rb:566
deadlock 0xb7d44754: sleep:J(0xb71fab64) (main) - /var/www/web1/oms/current/script/ferret_start:70
deadlock 0xb71fab64: sleep:F(6) - /usr/lib/ruby/1.8/drb/drb.rb:944
deadlock 0xb726f158: sleep:S - /usr/lib/ruby/1.8/drb/drb.rb:626
/usr/lib/ruby/1.8/drb/drb.rb:626: Thread(0xb726f158): deadlock (fatal)

Anyone have an idea what may be causing the problem and what we can do
to get round it?

Thanks

Matthew

Hey …

Do you have very high load on your DRb server? I serialize
my indexing requests and have not had a deadlock yet,
but then again, I’m not using aaf … AFAIK, aaf uses
separate threads to add documents to the index, so
there might be a problem when you have a really
high load … but that’s just speculation …

Can you provide us with more information about the
circumstances?

Ben

On Tue, Oct 02, 2007 at 04:13:47PM +0200, Matthew L.ham wrote:

> /var/www/web1/oms/current/script/ferret_start:70
> deadlock 0xb71fab64: sleep:F(6) - /usr/lib/ruby/1.8/drb/drb.rb:944
> deadlock 0xb726f158: sleep:S - /usr/lib/ruby/1.8/drb/drb.rb:626
> /usr/lib/ruby/1.8/drb/drb.rb:626: Thread(0xb726f158): deadlock (fatal)
>
> Anyone have an idea what may be causing the problem and what we can do
> to get round it?

To be honest, I have no idea where this might come from - this trace
looks like it would be worth an email to DRb’s author.

As Ben wrote, changing the way the DRb server works might help to
prevent this.

Option 1) would be to synchronize all calls to the DRb server’s methods;
#method_missing in ferret_server.rb would be a good place to implement
this. Performance will suffer, but it might be worth a try.
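
For illustration, a minimal sketch of what Option 1 could look like. This is not the actual code from ferret_server.rb: SynchronizedProxy and FakeIndex are hypothetical names invented for this example, and it assumes a current Ruby (respond_to_missing? does not exist in 1.8). The idea is only that every call forwarded through #method_missing runs under a single Mutex, so no two requests touch the index at the same time.

```ruby
require 'thread'

# Hypothetical stand-in for the index object the server wraps.
class FakeIndex
  attr_reader :docs

  def initialize
    @docs = []
  end

  def add(doc)
    @docs << doc
  end
end

# Hypothetical wrapper (not the real ferret_server.rb): forwards every
# call to the target while holding one lock, so calls are serialized.
class SynchronizedProxy
  def initialize(target)
    @target = target
    @lock = Mutex.new
  end

  def method_missing(name, *args, &block)
    return super unless @target.respond_to?(name)
    @lock.synchronize { @target.send(name, *args, &block) }
  end

  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) || super
  end
end

# Four client threads add documents concurrently through the proxy.
proxy = SynchronizedProxy.new(FakeIndex.new)
threads = 4.times.map do |i|
  Thread.new { 25.times { |j| proxy.add("doc-#{i}-#{j}") } }
end
threads.each(&:join)
```

Because Ruby's Mutex is not reentrant, a wrapped method that calls back into the proxy would raise a ThreadError; that is one reason to treat this as a sketch rather than a drop-in fix.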

Option 2) is how we did it in omdb.org’s DRb server - queue all indexing
requests and work through the queue in a separate thread. Omdb.org uses
local IndexSearcher instances to save the DRb server from handling
search.
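
The queueing approach from Option 2 might look roughly like the sketch below. It is not the omdb.org code: QueuedIndexer, enqueue and shutdown are hypothetical names, and a plain Array stands in for the index. It assumes that only the single worker thread ever writes to the index, while client calls just push onto a thread-safe Queue and return immediately.

```ruby
require 'thread'

# Hypothetical sketch (not the omdb.org implementation): indexing
# requests are queued and applied by one worker thread only.
class QueuedIndexer
  def initialize(index)
    @index = index
    @queue = Queue.new
    @worker = Thread.new do
      # Queue#pop blocks until a job is available; :stop ends the loop.
      while (job = @queue.pop) != :stop
        @index << job
      end
    end
  end

  # Called by clients; returns without waiting for the index write.
  def enqueue(doc)
    @queue << doc
    nil
  end

  # Process everything queued so far, then stop the worker.
  def shutdown
    @queue << :stop
    @worker.join
  end
end

indexed = [] # stand-in for the real index; assumed to respond to <<
indexer = QueuedIndexer.new(indexed)
producers = 3.times.map do |i|
  Thread.new { 10.times { |j| indexer.enqueue("doc-#{i}-#{j}") } }
end
producers.each(&:join)
indexer.shutdown
```

The trade-off is that a client no longer knows when its document has actually been indexed, which is acceptable when searches can tolerate a short delay.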

Cheers,
Jens

PS: There are plans to give aaf and especially the DRb server a major
overhaul, and to incorporate several of the ideas from the omdb.org
solution. However this won’t be ready before November, I guess.


Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

Thanks for the tips guys. We’ll look closer and see if we can come up
with more specific details on when this happens.

At the moment we have low load on the system but call rebuild_index
every now and again. Our guess is that this may be causing the problems,
but again, we’ll give it a closer look.

Matthew