JRuby thread locks and Rails performance

A while ago I posted about an issue I was seeing where calling
Module.new was resulting in JRuby synchronizing some resources. We
changed our code and moved on.

Since then we added some new code that uses Rails class
attributes (defined with class_attribute). To my surprise, JRuby was
synchronizing every access to a Rails class attribute. I looked at
the thread dump, and sure enough it was the same lock I had run into
earlier in my own code.

```
at org.jruby.RubyModule.calculateName(RubyModule.java:393)
- waiting to lock <58c3d9ac> (a org.jruby.MetaClass) owned by
```
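
For context, the code in question was just ordinary class_attribute usage, roughly like this (a minimal sketch with made-up names, not our actual code):

```ruby
require 'active_support/core_ext/class/attribute'

# Made-up example of the pattern: class_attribute generates class-level
# reader/writer methods (and an instance-level reader) for the attribute.
class ApiClient
  class_attribute :default_timeout
  self.default_timeout = 5
end

# On JRuby 1.6.x, under load, every read like these was contending on the
# synchronized RubyModule.calculateName shown in the trace above.
ApiClient.default_timeout      # => 5
ApiClient.new.default_timeout  # => 5
```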

I also found out that a bunch of other places in Rails use the same
code that causes this. Things like table_name were causing locks.

The bottom line is that any Rails app that uses class_attribute or
ActiveRecord is severely performance-limited on JRuby. In our case we
don’t use ActiveRecord in production; we put all of our data in memory, so
we just had to move away from using class_attribute.
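
What we moved to instead was along these lines (a sketch of the idea, not our exact code):

```ruby
# A plain class-level accessor backed by a class instance variable. Reads
# are ordinary attr_accessor calls, so they avoid the generated methods
# that were triggering the lock for us.
class ApiClient
  @default_timeout = 5

  class << self
    attr_accessor :default_timeout
  end
end

ApiClient.default_timeout # => 5

# Unlike class_attribute, this does not automatically carry the value down
# to subclasses; that was fine for our use case.
```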

In our case just using class_attribute caused us to regress from
around 700 rps to 140 rps. I’m guessing a lot of people never run
into this, because not everyone runs hardware that can push the app to
the point where the locking becomes an issue. We run 12-core Xeon
i7s.

Charles said this particular lock was going to be removed in 1.7. I
just wanted to post this because most people are probably not aware
that Rails has a bunch of places that trigger this via its
class_attribute implementation.

Chris

On Sun, Feb 12, 2012 at 6:50 AM, snacktime [email protected] wrote:

code that causes this. Things like table_name were causing locks.

Interesting that it had such an impact. Maybe I’ll see if it can be
safely/easily removed in 1.6.7.

The bottom line is that any Rails app that uses class_attribute or
ActiveRecord is severely performance-limited on JRuby. In our case we
don’t use ActiveRecord in production; we put all of our data in memory, so
we just had to move away from using class_attribute.

In our case just using class_attribute caused us to regress from
around 700 rps to 140 rps. I’m guessing a lot of people never run
into this, because not everyone runs hardware that can push the app to
the point where the locking becomes an issue. We run 12-core Xeon
i7s.

Nice machines :) Maybe we should get you to do more high-concurrency
testing for us!

FWIW, what did you use to determine this lock was the bottleneck?

  • Charlie

On Tue, Feb 14, 2012 at 12:06 PM, Charles Oliver N.
[email protected] wrote:

Interesting that it had such an impact. Maybe I’ll see if it can be
safely/easily removed in 1.6.7.

After staring at the code for a while I could not see why it needed to
be synchronized in the first place. It just reads the nesting hierarchy
and nested class names, and then writes one field with a String. No need
for sync anywhere.

So it won’t be synchronized in 1.6.7.

I also added back some caching on master for when the class name can be
determined without any anonymous components. That should make Class#name
perform the same in 1.7 as it did in 1.6.x (it had degraded a bit due to some fixes).
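
To illustrate what I mean by “anonymous components”, this is just plain Ruby naming behavior:

```ruby
# A module created with Module.new has no name until it is assigned to a
# constant, so its fully qualified name cannot be computed and cached yet.
m = Module.new
m.name          # => nil

M = m
m.name          # => "M"

# Once every lexical component has a constant name, "M::Inner" is a stable
# string that is safe to cache.
M::Inner = Class.new
M::Inner.name   # => "M::Inner"
```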

  • Charlie

On Tue, Feb 14, 2012 at 10:06 AM, Charles Oliver N.
[email protected] wrote:

In our case just using class_attribute caused us to regress from
around 700 rps to 140 rps. I’m guessing a lot of people never run
into this, because not everyone runs hardware that can push the app to
the point where the locking becomes an issue. We run 12-core Xeon
i7s.

Nice machines :) Maybe we should get you to do more high-concurrency
testing for us!

We do almost daily load testing, and are working to see how high we
can get our concurrency. So far, most of our CPU time is spent in
select() on sockets; CPU sampling shows select above 90%. We do a
lot of HTTP calls to backend services. Add in memcache, rsyslog and
scribe logging, and we do a lot of network I/O. I’m slowly replacing
those select calls with NIO via EventMachine.
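
For example, the evented version of one of those backend calls looks roughly like this (a sketch; it assumes the em-http-request gem and a made-up backend URL):

```ruby
require 'eventmachine'
require 'em-http-request' # assumption: em-http-request gem for evented HTTP

# The request runs inside the reactor and completion is handled via
# callbacks, instead of a thread sitting in select() on the socket.
EventMachine.run do
  http = EventMachine::HttpRequest.new('http://backend.example/status').get

  http.callback do
    puts "backend responded: #{http.response_header.status}"
    EventMachine.stop
  end

  http.errback do
    puts "backend request failed"
    EventMachine.stop
  end
end
```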

What branch are releases cut from? I should probably start doing some
testing with that branch. What I’ve really been wanting to do is test
out the bleeding edge with Java 7; the invokedynamic stuff is looking
pretty cool.

FWIW, what did you use to determine this lock was the bottleneck?

Thread dumps. We monitor JMX stats during load testing, and if I see
any blocking I’ll take a couple of thread dumps and go over them. We
also take a heap dump every run and look at a few things to make sure
we aren’t doing something really bad, like creating millions of
symbols (yes, we actually did that).
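
On JRuby the same kind of check can also be scripted against the JVM’s thread MXBean instead of eyeballing full dumps; a rough sketch:

```ruby
require 'java'

# Ask the JVM's ThreadMXBean for thread info (including lock details) and
# report anything currently BLOCKED on a monitor.
bean = java.lang.management.ManagementFactory.getThreadMXBean
bean.dumpAllThreads(true, true).each do |info|
  next unless info.getThreadState.to_s == 'BLOCKED'
  puts "#{info.getThreadName} blocked waiting on #{info.getLockName}"
end
```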

Chris

On Tue, Feb 14, 2012 at 11:23 PM, Charles Oliver N.
[email protected] wrote:

Do you have some of those dumps around? I’d like to see what paths are
calling Module#name (RubyModule.getName) so much that a synchronize
would be a bottleneck.

I do.

FYI, game_data.rb line 9 is where the Rails table_name method is
called; that’s where it all starts. You will see a bunch of blocked
threads stemming from there.
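
There is nothing unusual about the call itself; it is an ordinary table_name lookup, something along these lines (a hypothetical stand-in, not the real file):

```ruby
# Hypothetical stand-in (not the real game_data.rb): an ordinary runtime
# call to the ActiveRecord table_name class method. In our app a call like
# this was enough to end up on the RubyModule.getName path shown in the
# blocked threads.
class GameData < ActiveRecord::Base
  def self.cache_key_for(id)
    "#{table_name}:#{id}"
  end
end
```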

Chris

Perfect, thank you.

It’s what I expected: the getName calls are all (as far as I saw)
coming from interpreter logic that gets the current name for backtrace
purposes. I’m going to make an additional change to jruby-1_6 that I’ve
already made on master, and have those calls not use the heavyweight
getName.

With that and the synchronization gone, 1.6.7 should have no further
issues in this area.

  • Charlie

On Tue, Feb 14, 2012 at 6:45 PM, snacktime [email protected] wrote:

We do almost daily load testing, and are working to see how high we
can get our concurrency. So far, most of our CPU time is spent in
select() on sockets; CPU sampling shows select above 90%. We do a
lot of HTTP calls to backend services. Add in memcache, rsyslog and
scribe logging, and we do a lot of network I/O. I’m slowly replacing
those select calls with NIO via EventMachine.

If you see excessive select usage within JRuby itself, please let us
know that too. I have a refactoring of IO/Socket in progress.

What branch are releases cut from? I should probably start doing some
testing with that branch. What I’ve really been wanting to do is test
out the bleeding edge with Java 7; the invokedynamic stuff is looking
pretty cool.

jruby-1_6 is the branch for the stable 1.6.x line, and master is
JRuby 1.7 (in development) plus the invokedynamic stuff. Both have nightly snapshots
(http://ci.jruby.org/snapshots), or you can install them with rvm.

FWIW, what did you use to determine this lock was the bottleneck?

Thread dumps. We monitor JMX stats during load testing, and if I see
any blocking I’ll take a couple of thread dumps and go over them. We
also take a heap dump every run and look at a few things to make sure
we aren’t doing something really bad, like creating millions of
symbols (yes, we actually did that).

Do you have some of those dumps around? I’d like to see what paths are
calling Module#name (RubyModule.getName) so much that a synchronize
would be a bottleneck.

  • Charlie