Rails benchmarks for JRuby

Hello everyone,

I put together the beginnings of a rails benchmark suite in response
to some discussions on the list. I don’t have a ton of bench suites
right now as I haven’t been very successful in identifying any major
bottlenecks using the profiling tools I have and my limited
understanding of the JRuby internals.

My full-cycle testing thus far revealed that Jruby 1.1.1 on Java 5 is
about 30% slower than MRI 1.8.6 for 100k requests. This was running
without ActiveRecord or any template rendering. Only dispatching,
routing, and action execution.

Routing is much slower. 12 seconds for MRI 1.8.6 versus 45 seconds
for Jruby 1.1.1 with the -server switch, executing 200k
Routes.recognize() calls. I wrote this one on a hunch, turned out I
was right.

All you should need to run them is the rails gems installed in your
various environments.

You can download the package from here:
http://people.dashf.com/~ll/jruby_rails_bench.tar.gz

The archive will expand to ./test/bench/rails

I’ll be adding more benchmarks to the collection as I find time. I’ll
probably continue to tear apart request dispatching before moving on
to bench suites for template rendering.

It would be really helpful if someone familiar with profiling Jruby
could give me some insight on how I might sucessfully identify slow
ruby code.

Enjoy!

-Tyler


To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

I wonder why something like routing would be that much slower than
MRI…does that suggest there is a specific low level language or
library operation that in jruby implementation is still very slow?

Routing is much slower. 12 seconds for MRI 1.8.6 versus 45 seconds
for Jruby 1.1.1 with the -server switch, executing 200k
Routes.recognize() calls. I wrote this one on a hunch, turned out I
was right.


To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

Here’s my results:arthman:rails jjathman$ java -version
java version “1.5.0_13”
Java™ 2 Runtime Environment, Standard Edition (build
1.5.0_13-b05-237)
Java HotSpot™ Client VM (build 1.5.0_13-119, mixed mode, sharing)
arthman:rails jjathman$ jruby bench_rails_routing.rb
default routing (/controller/action)
0.346000 0.000000 0.346000 ( 0.346000)
0.974000 0.000000 0.974000 ( 0.974000)
4.303000 0.000000 4.303000 ( 4.303000)
8.584000 0.000000 8.584000 ( 8.583000)
17.188000 0.000000 17.188000 ( 17.187000)

arthman:rails jjathman$ java -version
java version “1.6.0_05”
Java™ SE Runtime Environment (build 1.6.0_05-b13-120)
Java HotSpot™ 64-Bit Server VM (build 1.6.0_05-b13-52, mixed mode)
arthman:rails jjathman$ jruby bench_rails_routing.rb
default routing (/controller/action)
0.687000 0.000000 0.687000 ( 0.687000)
2.267000 0.000000 2.267000 ( 2.267000)
3.106000 0.000000 3.106000 ( 3.106000)
5.960000 0.000000 5.960000 ( 5.960000)
11.963000 0.000000 11.963000 ( 11.963000)

arthman:rails jjathman$ ruby bench_rails_routing.rb
default routing (/controller/action)
0.050000 0.000000 0.050000 ( 0.047169)
0.560000 0.010000 0.570000 ( 0.564460)
2.910000 0.010000 2.920000 ( 2.922363)
5.720000 0.030000 5.750000 ( 5.751431)
11.530000 0.050000 11.580000 ( 11.615236)

Here are my results. I have a Thinkpad T61 running Ubuntu - Dual core
3GB memory.

$ jruby -v
ruby 1.8.6 (2008-04-22 rev 6555) [i386-jruby1.1.1]
$java -version
java version “1.6.0_03”

(trunk would get a .5 second increase somewhere due to startup
enhancement)

MRI
default routing (/controller/action)
0.060000 0.010000 0.070000 ( 0.088216)
0.680000 0.070000 0.750000 ( 0.754338)
3.360000 0.330000 3.690000 ( 3.712149)
6.850000 0.520000 7.370000 ( 7.413091)
13.520000 1.220000 14.740000 ( 14.818813)

JRuby with no JAVA_OPTS variations

$ jruby bench_rails_routing.rb
default routing (/controller/action)
0.820000 0.000000 0.820000 ( 0.821367)
2.729000 0.000000 2.729000 ( 2.729202)
3.374000 0.000000 3.374000 ( 3.374406)
6.042000 0.000000 6.042000 ( 6.041808)
11.481000 0.000000 11.481000 ( 11.480242)

JRuby with JAVA_OPTS=“-Xms128m -Xmx1024m -XX:MaxPermSize=256m -server”

$ jruby bench_rails_routing.rb
default routing (/controller/action)
0.606000 0.000000 0.606000 ( 0.607520)
2.242000 0.000000 2.242000 ( 2.241951)
3.415000 0.000000 3.415000 ( 3.414726)
5.791000 0.000000 5.791000 ( 5.790376)
11.526000 0.000000 11.526000 ( 11.525567)

JRuby with JAVA_OPTS=“-Xms128m -Xmx1024m -server”
$ jruby bench_rails_routing.rb
default routing (/controller/action)
0.597000 0.000000 0.597000 ( 0.597342)
2.694000 0.000000 2.694000 ( 2.695142)
2.893000 0.000000 2.893000 ( 2.891958)
5.624000 0.000000 5.624000 ( 5.624612)
11.235000 0.000000 11.235000 ( 11.235389)

Mike H.
Program Manager, Collaborative Software Initiative
[email protected]
http://www.csinitiative.com

Tyler J. wrote:

without ActiveRecord or any template rendering. Only dispatching,
You can download the package from here:
ruby code.


To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

Tyler J. wrote:

I’ll be adding more benchmarks to the collection as I find time. I’ll
probably continue to tear apart request dispatching before moving on
to bench suites for template rendering.

It would be really helpful if someone familiar with profiling Jruby
could give me some insight on how I might sucessfully identify slow
ruby code.

Thanks Tyler…I have a few suggestions:

  • There’s a GSoC project getting going very soon to build out just these
    sorts of benchmark suites and additionally find out where the
    bottlenecks are in JRuby. You’ve got a great start here, so maybe you
    could help me mentor the student, Sergey. Sergey, are you out there?

  • How much warmup do these have? The number one flaw in most benchmarks
    for JRuby is that there’s not enough warmup. And some of these runs are
    really, really short. Granted, you want your server to be as fast as
    possible right away, so cold benchmarks have uses too.

  • You ought to pull JRuby trunk and use that to track results. We’re
    always doing more performance work.

  • One way to get some profiling numbers might be to run with --profile
    and isolate the different types of runs. There’s also a whole myriad of
    Java based profiling tools that can give you a ton of information.
    They’re not trivial to hook up, but it’s usually one-time pain and then
    you’re set. I’d recommend the NetBeans profiler, YourKit (pay),
    JProfiler (pay, but I think they give OSS licenses), and there are
    certainly others.

  • Our bottlenecks are usually going to be core class methods, but
    sometimes they involve part of the execution pipeline, like inefficient
    compiled code or too much overhead during calls. Profiling can usually
    show if it’s core class methods, but often fails to show runtime
    overhead very well.

  • We’re actively working now to improve the performance of JRuby on
    Rails, so this is very timely work.

  • I have also noticed that applications with more “normal” user code do
    better than simple mostly-generated, mostly-magic apps. That means
    there’s something Rails is doing that we’re really bad at, and we
    haven’t found it yet.

  • In general, the lack of complete Rails performance is our biggest
    irritant right now.

Thanks for this effort! And the results from others look
promising…close to MRI or sometimes a little faster. I know we can be
faster than this, probably a lot faster. Hopefully you can help us all
get there.

  • Charlie

To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

SoyLatte on MacOS:

ruby bench_rails_routing.rb
default routing (/controller/action)
0.060000 0.000000 0.060000 ( 0.063164)
0.680000 0.010000 0.690000 ( 0.715037)
3.440000 0.030000 3.470000 ( 3.823978)
6.880000 0.020000 6.900000 ( 6.976308)
13.640000 0.070000 13.710000 ( 13.727037)

jruby -J-server bench_rails_routing.rb
default routing (/controller/action)
0.856000 0.000000 0.856000 ( 0.855000)
2.743000 0.000000 2.743000 ( 2.743000)
3.500000 0.000000 3.500000 ( 3.501000)
6.601000 0.000000 6.601000 ( 6.601000)
13.178000 0.000000 13.178000 ( 13.178000)

I think Java6 and -J-server will make a big difference

-Tom

On Tue, May 20, 2008 at 8:44 PM, Tyler J.
[email protected] wrote:

without ActiveRecord or any template rendering. Only dispatching,
You can download the package from here:
ruby code.


Blog: http://www.bloglines.com/blog/ThomasEEnebo
Email: [email protected] , [email protected]


To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

Can you rerun with -J-server?

-Tom

On Tue, May 20, 2008 at 11:00 PM, Joseph A. [email protected]
wrote:

8.584000 0.000000 8.584000 ( 8.583000)
5.960000 0.000000 5.960000 ( 5.960000)


To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email


Blog: http://www.bloglines.com/blog/ThomasEEnebo
Email: [email protected] , [email protected]


To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

Responses below…

Thanks Tyler…I have a few suggestions:

  • There’s a GSoC project getting going very soon to build out just these
    sorts of benchmark suites and additionally find out where the bottlenecks
    are in JRuby. You’ve got a great start here, so maybe you could help me
    mentor the student, Sergey. Sergey, are you out there?

I’d love to help out. Just fill me in on the role and hook me up with
Sergey.

  • How much warmup do these have? The number one flaw in most benchmarks for
    JRuby is that there’s not enough warmup. And some of these runs are really,
    really short. Granted, you want your server to be as fast as possible right
    away, so cold benchmarks have uses too.

On java 5 it looks like performance becomes linear before 100k
requests. I’ve experimented with different run lengths and settled on
a high-end of 100k to 200k requests. Higher values than that don’t
appear to offer any performance improvement and take a very long time
to run.

What would you suggest?

  • You ought to pull JRuby trunk and use that to track results. We’re always
    doing more performance work.

I have it, I’ll do that.

  • One way to get some profiling numbers might be to run with --profile and
    isolate the different types of runs. There’s also a whole myriad of Java
    based profiling tools that can give you a ton of information. They’re not
    trivial to hook up, but it’s usually one-time pain and then you’re set. I’d
    recommend the NetBeans profiler, YourKit (pay), JProfiler (pay, but I think
    they give OSS licenses), and there are certainly others.

I tried ruby’s profiling library, Nick’s stickshift, and the Netbeans
profiler. For various reasons none of those tools were of much help
in isolating the parts of Rails that JRuby is having trouble with.

I’ll try the --profile switch and see what that does for me. Right
now my gut is telling me I may need to roll my own.

  • In general, the lack of complete Rails performance is our biggest irritant
    right now.

Let me know if there is anything specific I can do to help.

Thank you for your feedback and support!

-Tyler


To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

I wonder why something like routing would be that much slower than
MRI…does that suggest there is a specific low level language or library
operation that in jruby implementation is still very slow?

This was my fault, I had an old JRuby version on the path that I was
unaware of. I re-ran with JRuby 1.1.1 and have results consistent with
the others for the routing benchmark.

-Tyler


To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

Tyler J. wrote:

Count Time Pct Location
===== ==== === ========
9981 18708.9 45.5
ruby.jit.ruby.projects.jruby.lib.ruby.gems.$1_dot_8.gems.actionpack_minus_2_dot_0_dot_2.lib.action_controller.routing.recognize_path5727529_13459508BlockCallback$block_0$RUBY$__block__xx1:call

Looks like the recognize_path method in routing is the place to focus on
for this one…

  • Charlie

To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

I ran the full request cycle benchmarks with the --profile switch and
came up with some (I think) conclusive results. Routing appears to be
the bottle neck. Profiler output included below, expand your browser
=].

Thanks Charles for pointing out the --profile switch, that is exactly
what I was looking for.

-Tyler

ruby 1.8.6 (2008-05-12 rev 6586) [i386-jruby1.1.1+]

±-------------------------------------
| Most expensive methods (by net time)
| Frame Count Limit: 10
±-------------------------------------

           Net
      ------------

Count Time Pct Location
===== ==== === ========
9981 18708.9 45.5
ruby.jit.ruby.projects.jruby.lib.ruby.gems.$1_dot_8.gems.actionpack_minus_2_dot_0_dot_2.lib.action_controller.routing.recognize_path5727529_13459508BlockCallback$block_0$RUBY$__block__xx1:call
1 4659.7 11.3
ruby.projects.jruby.test.bench.rails.bench_rails_render:file
9981 4466.9 10.9
ruby.jit.ruby.projects.jruby.lib.ruby.gems.$1_dot_8.gems.actionpack_minus_2_dot_0_dot_2.lib.action_controller.cgi_process.session3792518_13459508BlockCallback$block_0$RUBY$__block__xx1:call
10000 3069.9 7.5
ruby.projects.jruby.test.bench.rails.bench_rails_render:block_3$RUBY$block
19962 1819.1 4.4
ruby.jit.ruby.projects.jruby.lib.ruby.gems.$1_dot_8.gems.actionpack_minus_2_dot_0_dot_2.lib.action_controller.dispatcher.run_callbacks15307693_13459508BlockCallback$block_0$RUBY$__block__xx1:call
9981 1422.5 3.5
ruby.jit.ruby.projects.jruby.lib.ruby.gems.$1_dot_8.gems.actionpack_minus_2_dot_0_dot_2.lib.action_controller.benchmarking.perform_action_with_benchmark9544264_13459508BlockCallback$block_0$RUBY$__block__xx1:call
2 580.8 1.4
ruby.jit.ruby.projects.jruby.lib.ruby.gems.$1_dot_8.gems.activesupport_minus_2_dot_0_dot_2.lib.active_support.dependencies.require15495773_13459508BlockCallback$block_0$RUBY$__block__xx1:call
9981 531.4 1.3
ruby.jit.ruby.projects.jruby.lib.ruby.gems.$1_dot_8.gems.actionpack_minus_2_dot_0_dot_2.lib.action_controller.benchmarking.render_with_benchmark2255122_13459508BlockCallback$block_0$RUBY$__block__xx1:call
9981 487.1 1.2
ruby.jit.ruby.projects.jruby.lib.ruby.gems.$1_dot_8.gems.actionpack_minus_2_dot_0_dot_2.lib.action_controller.cgi_ext.cookie.to_s6442158_13459508BlockCallback$block_0$RUBY$__block__xx1:call

±-------------------------------------+
| Most expensive methods summarized |
±-------------------------------------+

           Net
      ------------

Count Time Pct Location
===== ==== === ========
9981 18708.9 45.5
ruby.jit.ruby.projects.jruby.lib.ruby.gems.$1_dot_8.gems.actionpack_minus_2_dot_0_dot_2.lib.action_controller.routing.recognize_path5727529_13459508BlockCallback$block_0$RUBY$__block__xx1:call
1 4659.7 11.3
ruby.projects.jruby.test.bench.rails.bench_rails_render:file
9981 4466.9 10.9
ruby.jit.ruby.projects.jruby.lib.ruby.gems.$1_dot_8.gems.actionpack_minus_2_dot_0_dot_2.lib.action_controller.cgi_process.session3792518_13459508BlockCallback$block_0$RUBY$__block__xx1:call
10000 3069.9 7.5
ruby.projects.jruby.test.bench.rails.bench_rails_render:block_3$RUBY$block
220 2576.4 6.3
ruby.jit.ruby.projects.jruby.lib.ruby.gems.$1_dot_8.gems.activesupport_minus_2_dot_0_dot_2.lib.active_support.dependencies.require15495773_13459508BlockCallback$block_0$RUBY$__block__xx1:call
39964 2067.5 5.0
ruby.jit.ruby.projects.jruby.lib.ruby.gems.$1_dot_8.gems.actionpack_minus_2_dot_0_dot_2.lib.action_controller.dispatcher.run_callbacks15307693_13459508BlockCallback$block_0$RUBY$__block__xx1:call
9981 1422.5 3.5
ruby.jit.ruby.projects.jruby.lib.ruby.gems.$1_dot_8.gems.actionpack_minus_2_dot_0_dot_2.lib.action_controller.benchmarking.perform_action_with_benchmark9544264_13459508BlockCallback$block_0$RUBY$__block__xx1:call
1860 899.3 2.2
ruby.jit.ruby.projects.jruby.lib.ruby.gems.$1_dot_8.gems.activesupport_minus_2_dot_0_dot_2.lib.active_support.core_ext.module.introspection.local_constants2583806_13459508BlockCallback$block_0$RUBY$__block__xx1:call
9981 531.4 1.3
ruby.jit.ruby.projects.jruby.lib.ruby.gems.$1_dot_8.gems.actionpack_minus_2_dot_0_dot_2.lib.action_controller.benchmarking.render_with_benchmark2255122_13459508BlockCallback$block_0$RUBY$__block__xx1:call


To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

I’ve isolated the performance issue with routing. Looks like
returning from within a block is the culprit. These are the offending
lines from recognize_path in routing.rb:

routes.each do |route|
result = route.recognize(path, environment) and return result
end

Consider the following benchmark: Parked at Loopia

MRI: ruby 1.8.6 (2007-03-13 patchlevel 0) [i686-darwin8.10.1]
ruby bench_return_from_block.rb
Block without return
0.820000 0.000000 0.820000 ( 0.823866)
0.820000 0.010000 0.830000 ( 0.821681)
0.810000 0.000000 0.810000 ( 0.812487)
0.810000 0.000000 0.810000 ( 0.818173)
0.820000 0.000000 0.820000 ( 0.823042)
Block with return
1.260000 0.010000 1.270000 ( 1.269754)
1.260000 0.000000 1.260000 ( 1.270027)
1.270000 0.000000 1.270000 ( 1.276084)
1.260000 0.010000 1.270000 ( 1.266629)
1.270000 0.000000 1.270000 ( 1.289375)

JRuby: ruby 1.8.6 (2008-05-12 rev 6586) [i386-jruby1.1.1+]
jruby -J-server bench_return_from_block.rb
Block without return
1.126000 0.000000 1.126000 ( 1.126000)
0.785000 0.000000 0.785000 ( 0.784000)
0.767000 0.000000 0.767000 ( 0.768000)
0.765000 0.000000 0.765000 ( 0.765000)
0.771000 0.000000 0.771000 ( 0.770000)
Block with return
4.358000 0.000000 4.358000 ( 4.357000)
4.237000 0.000000 4.237000 ( 4.237000)
4.231000 0.000000 4.231000 ( 4.231000)
4.256000 0.000000 4.256000 ( 4.255000)
4.498000 0.000000 4.498000 ( 4.497000)

-Tyler


To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

Tyler J. wrote:

I’ve isolated the performance issue with routing. Looks like
returning from within a block is the culprit. These are the offending
lines from recognize_path in routing.rb:

routes.each do |route|
result = route.recognize(path, environment) and return result
end

I’ve committed some small improvements for this, but it’s still not
faster than MRI. Could you rebench and file a bug for that?

  • Charlie

To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

I’ve committed some small improvements for this, but it’s still not faster
than MRI. Could you rebench and file a bug for that?

Done: http://jira.codehaus.org/browse/JRUBY-2589

-Tyler


To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email