JRuby mongrel process hanging

I'm experiencing some strange behavior with my JRuby on Rails app. Every now and then one of my mongrel processes pegs the CPU, and the request never finishes or writes any log output. I've only been able to reproduce it once, by slamming the box with ab, and that worked just once out of the many times I've tried it. It seemed to start happening after I introduced a small bit of threading code into my app (an alternative possibility is that the issue existed before the threading code, but our users' behavior changed once we launched the new code).

My threading code is pretty simple, but as a precaution I've migrated it all to use java.util.concurrent primitives. jconsole is not able to detect any deadlocks in the hung process. I thought it might have something to do with the memcache-client concurrency issues that some forums have discussed, so I've moved my client to jruby-memcache-client. I've also upgraded my app to JRuby 1.2.0, but it's still happening. Does anyone have suggestions for tools I could use to diagnose this issue?

Thanks,
Chris

The obvious thing to do would be to grab a thread dump (probably using kill -QUIT on the hung mongrel) and see where the mongrel process is when you kill it. That will probably be a good starting point. My personal guess is a race condition somewhere that lets the mongrel get into an invalid state from which it never recovers.
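
For what it's worth, here's a minimal sketch of triggering the dump from Ruby instead of the shell (the pidfile path is a guess; adjust it to wherever your mongrel actually writes its pid). The JVM prints the thread dump to its own stdout, so check whatever file that's redirected to rather than the Rails log:

    # Ask the hung JVM for a thread dump (equivalent to kill -QUIT <pid>).
    pid = File.read("log/mongrel.pid").to_i   # hypothetical pidfile location
    Process.kill("QUIT", pid)                 # JVM dumps all thread stacks to its stdout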

It looks like most of the threads are stuck on:

java.lang.Thread.State: RUNNABLE
        at java.util.HashMap.get(HashMap.java:303)
        at org.codehaus.xfire.aegis.AegisBindingProvider.getParameterType(AegisBindingProvider.java:270)
        at org.codehaus.xfire.aegis.AegisBindingProvider.initializeMessage(AegisBindingProvider.java:145)
        at org.codehaus.xfire.service.binding.AbstractBindingProvider.initialize(AbstractBindingProvider.java:54)
        at org.codehaus.xfire.aegis.AegisBindingProvider.initialize(AegisBindingProvider.java:133)
        at org.codehaus.xfire.service.binding.ObjectServiceFactory.create(ObjectServiceFactory.java:469)
        at org.codehaus.xfire.service.binding.ObjectServiceFactory.create(ObjectServiceFactory.java:374)
        at org.codehaus.xfire.service.binding.ObjectServiceFactory.create(ObjectServiceFactory.java:355)

which suggests to me that the multiple XFire 1.2.6 clients I'm using are not thread-safe?

Has anyone run into this issue before?

The other thing that's weird is that sometimes it pegs the CPU at 100% and sometimes at 200% (we're running Ubuntu 8.04 on an EC2 c1.medium instance).

If there's a top-level object you can create per thread, you could put several of them in a pool and pull one out for each request you're handling concurrently. You'd have to keep multiple XFire clients in memory, but that should keep them from stepping on each other and avoid synchronizing all consumers against a single instance.
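
Roughly like this (a JRuby sketch only; build_xfire_client stands in for however you construct your client today, some_remote_call is a placeholder, and the pool size of 5 is arbitrary):

    require 'thread'

    # Build a fixed number of XFire clients up front and check one out per request.
    XFIRE_POOL = Queue.new                        # Queue is thread-safe
    5.times { XFIRE_POOL << build_xfire_client }

    def with_xfire_client
      client = XFIRE_POOL.pop                     # blocks if every client is checked out
      begin
        yield client
      ensure
        XFIRE_POOL << client                      # always hand the client back
      end
    end

    # In an action:  with_xfire_client { |c| c.some_remote_call(args) }

Each client is only ever touched by one request at a time, so the unsynchronized HashMaps inside XFire never see concurrent access.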

Yep, that looks like it. If multiple threads modify a java.util.HashMap without synchronization, a concurrent resize can corrupt a bucket's linked list into a cycle, and get() then spins forever, which is exactly the kind of state you're seeing. I suspect the 100%/200% is just one or two threads stuck spinning in there. Synchronizing around the XFire client should make things work (though that might also defeat the point of the threading).
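
A bare-bones version of what I mean (a sketch only; build_xfire_client and some_remote_call are placeholders for your real client setup and call):

    require 'thread'

    XFIRE_LOCK   = Mutex.new
    XFIRE_CLIENT = build_xfire_client   # your existing shared client (placeholder)

    def call_service(*args)
      # Only one thread may use the shared XFire client at a time.
      XFIRE_LOCK.synchronize { XFIRE_CLIENT.some_remote_call(*args) }
    end

As noted, this serializes every call through the client, which may give back whatever the threading bought you.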

Yeah, I'm implementing PoolableObjectFactory and synchronizing the makeObject method. That serializes client creation but not client use, so the bottleneck is minimal.
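
For anyone who finds this thread later, the shape of it is roughly the following (a JRuby sketch assuming commons-pool is the library behind PoolableObjectFactory, with build_xfire_client and some_remote_call as stand-ins for the real code):

    require 'java'
    require 'thread'
    # assumes the commons-pool jar is already on the classpath

    class XfireClientFactory
      include org.apache.commons.pool.PoolableObjectFactory

      def initialize
        @lock = Mutex.new
      end

      # Client creation is serialized; use of the pooled clients is not.
      def makeObject
        @lock.synchronize { build_xfire_client }   # placeholder for real setup
      end

      def activateObject(client);  end
      def passivateObject(client); end
      def destroyObject(client);   end
      def validateObject(client);  true; end
    end

    XFIRE_POOL = org.apache.commons.pool.impl.GenericObjectPool.new(XfireClientFactory.new)

    # client = XFIRE_POOL.borrow_object
    # begin
    #   client.some_remote_call(args)
    # ensure
    #   XFIRE_POOL.return_object(client)
    # end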

Thanks for your help, all.
