Performance problems when rendering Rails Erb templates

Hi,

Over the past few weeks we’ve been trying to diagnose severe performance
bottlenecks in our JRuby on Rails app, which I had originally reported
in
http://jira.codehaus.org/browse/JRUBY-4773. While the fix for 4773 did
improve the situation, the problem did not disappear.

Today we were able to isolate the problem and it seems like it’s due to
nested Erb partial rendering.

Environment: Jruby 1.5.1, Rails 2.3.4 on Centos 5.4 (64 bit), JDK
1.6.0_17
(32 bit), Jruby-Rack 0.9.6 (1 Runtime, Rails in multithreaded mode),
Jetty
6.

The Rails action in question would fetch backend data in around 100ms
and
would then take 2300ms to render the view. The view involved nested
calls to
partials about 3-levels deep with a total of about 150 partial calls per
page.

Serial requests took around 2400ms, concurrent requests (10 at a time)
took
over 25sec. The requests would fetch backend data concurrently, but
would
all block for 25 sec during view rendering and then complete shortly
thereafter.

The profiler (Yourkit) did not show any further information. I tried
profiling with jit.compile.mode=FORCE, but that didn’t help either.

I also tried hooking up Jruby-Prof, but for some reason the
GlobalRuntime
that it was trying to add an EventHook to was not the one that
Jruby-Rack
created (I was using a patched Jruby-Rack 0.9.6 which set the runtime
created by the DefaultRackApplication to be the GlobalRuntime, using
Ruby.useAsGlobalRuntime). Not sure if this was a classloader problem,
didn’t
investigate this further.

When we collapsed all partials into 1 erb template, the response times
dropped to 250 ms for serial requests and 1100ms for concurrent
requests,
more than an order of magnitude faster!

Has anyone seen this before, any idea what might be causing this? Or
have we
got some basic setting wrong?

What tools do you use to investigate such problems?

Thanks!

  • Yogi

Can you help us cook up a simple example of nested partials? I know
Rails and I know nested partials, but it would be nice to have
something that simulates closely the workload of your particular
application.

Also, do you know whether you’re doing any SQL querying from the view?
Just curious, it complicates the ability to reproduce your situation
since we’d need a database of similar size behind the view to
reproduce.

ERB rendering happens to be an area Tom and I were just looking at
yesterday, so it would be great to solve the mystery.

/Nick

On Wed, May 19, 2010 at 12:44 PM, Yogi [email protected] wrote:

Environment: Jruby 1.5.1, Rails 2.3.4 on Centos 5.4 (64 bit), JDK 1.6.0_17
all block for 25 sec during view rendering and then complete shortly
investigate this further.
Thanks!

  • Yogi

To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

We load the domain graph in the controller, and no db / service calls
are
made after that.

I’ll cook up an example and send it across.

Thanks,
Yogi

I was able to reproduce the performance issues pretty easily with a
simple
setup (basically 3-levels of partial calls), see attached source.

There are 2 actions in TestController, test and test_inline.

test renders a view that call 5 partials, each of which call 6 partials,
each of which call 5 partials.

test_inline renders a view with all the partials collapsed into the view

                             serial           concurrent (10 

requests at
a time)
JRuby:
/test/test 1,500 ms 20,000 ms
/test/test_inline 120 ms 813 ms

MRI:
/test/test < 25 ms
/test/test_inline < 5 ms

-Yogi

I’ll jump into the game as well. Have not run with MRI yet, but under
JRuby + Mongrel my request times for /test/test are in the 40ms range
out of the gate. There’s something seriously wrong to cause 1500-20k
ms request times.

On Thu, May 20, 2010 at 8:26 AM, Yogi [email protected] wrote:

                                serial          concurrent (10 requests at


To unsubscribe from this list, please visit:

  http://xircles.codehaus.org/manage_email


To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

Note, my JRuby runs were on our custom server which uses an embedded
Jetty
server with a bunch of Spring interceptors before the RackServlet.

I’ve verified that an action that just does a render :text => “hi” on
our
server runs in about than 50 ms.

So yes the MRI and JRuby numbers are not directly comparable, but the
overall times with serial and concurrent load are still interesting.

Thinking aloud… I’m wondering whether Jetty’s classloader has anything
to
do with this. I’ve seen some synchronized blocks in it which might be of
concern given the runtime class generation that Jruby does combined with
Erb
generating methods for each partial. I haven’t looked into this in
enough
detail to know much more about it.

-Yogi

On Thu, May 20, 2010 at 10:25 PM, Charles Oliver N.
<[email protected]

Also I’m loading the server using Apache Bench, specifically:
serial: ab -n 25 -c 1 http:///test/test
concurrent: ab -n 50 -c 10 http:///test/test

-Yogi

MRI results are in on my machine…right around 40ms, maybe a little
below. This is in Mongrel, though, and it sounds like Nick is seeing
that anything jruby-rack based runs significantly slower (sitting with
him now).

  • Charlie

On Thu, May 20, 2010 at 11:50 AM, Charles Oliver N.
[email protected] wrote:

  http://xircles.codehaus.org/manage_email


To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

On Thu, May 20, 2010 at 10:12 AM, Yogi [email protected] wrote:

Note, my JRuby runs were on our custom server which uses an embedded Jetty
server with a bunch of Spring interceptors before the RackServlet.

A quick datapoint:
/test/test_inline 186ms
/test/test 953ms

Single run (no warm-up) using JRuby 1.5.0 + Trinidad (Tomcat gem);
full log at http://pastie.org/969770


Hassan S. ------------------------ [email protected]
twitter: @hassan


To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

Yeah, we’re using ab for benchmarking here too…

I’m just not able to reproduce those numbers with Mongrel, so I think
it’s safe to say that JRuby and the app itself are performing very
well. I just did another run with Mongrel and various flags, and the
JRuby numbers for /test/test drop to 20ms versus MRI’s 37 to 40ms, and
the test_inline numbers are in the sub-7ms range. Pretty solid
performance all the way through (and way better than it used to be,
I’ll say).

We could try re-running this with 1.5 (I ran with 1.6dev) but I don’t
think the numbers will be much different…

It seems like the bottleneck is not in JRuby or Rails at this point
but in the pipeline ahead of them…

On Thu, May 20, 2010 at 12:14 PM, Yogi [email protected] wrote:

server with a bunch of Spring interceptors before the RackServlet.
generating methods for each partial. I haven’t looked into this in enough

him now).

On Thu, May 20, 2010 at 8:26 AM, Yogi [email protected] wrote:

test_inline renders a view with all the partials collapsed into the
 /test/test                 < 25 ms


To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email