I had a tough time finding recent performance benchmarks for JRuby, so I put together an analysis of multi-threaded performance (JRuby vs 1.9 vs 1.8) here: http://www.restlessprogrammer.com/2013/02/multi-th...

Exercise: counting to 1 million on 4 separate threads (simultaneously), repeated 20 times, taking the average.

JRuby code: https://gist.github.com/phyous/4720249
Ruby 1.8/1.9 code: https://gist.github.com/phyous/4720240

Results:
JRuby (1.7.1): 199.3 ms
Ruby (1.9.3p286): 610.0 ms
Ruby (ree-1.8.7-2012.02): 748.6 ms

Does anyone else have pointers to other benchmarks for JRuby >1.7?
on 2013-02-06 08:20
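[The linked gists are not reproduced here. A minimal sketch of the exercise as described — the method names and structure below are my own, not necessarily the gists' code:]

```ruby
require 'benchmark'

THREADS = 4
LIMIT   = 1_000_000
RUNS    = 20

# Each thread counts to LIMIT independently. On JRuby the threads can
# run on separate cores; under MRI's GIL they are effectively serialized.
def count_up(limit)
  i = 0
  i += 1 while i < limit
  i
end

def run_once
  Array.new(THREADS) { Thread.new { count_up(LIMIT) } }.each(&:join)
end

samples = Array.new(RUNS) { Benchmark.realtime { run_once } }
avg_ms  = samples.sum / RUNS * 1000
puts format('average: %.1f ms over %d runs', avg_ms, RUNS)
```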
on 2013-02-06 16:05
Philip - I'm not an expert on HotSpot optimization, but I seem to remember that in cases like yours the optimizer can notice that the loop body, and even the loop itself, has no effect on the running program, and remove them entirely. So for benchmarks like these, I try to make it less likely for HotSpot to be confident about that. One way is to introduce a side effect (such as outputting something somewhere), but that affects the timing measurements. Another way is to call a function from inside the loop. Anyone have any wisdom about this?

- Keith

---
Keith R. Bennett
http://about.me/keithrbennett
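[One way to apply the second suggestion above — route the loop's work through a method call and use its result after the loop, so the JIT cannot prove the body is dead. A hypothetical sketch; `bump` and `counted_loop` are made-up names, not code from the gists:]

```ruby
# Hypothetical sketch: keep the loop's result observable so the JIT
# cannot eliminate the body as dead code. `bump` is a made-up helper.
def bump(n)
  n + 1
end

def counted_loop(limit)
  acc = 0
  limit.times { acc = bump(acc) }
  acc
end

result = counted_loop(1_000_000)
# Using the result once after the loop gives it an observable effect
# without putting I/O (and its timing noise) inside the loop itself.
puts result
```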
on 2013-02-06 17:47
At this point in time we do not inline that `times` block, so HotSpot cannot recognize it as a simple loop incrementing an unused local variable (and thus potentially eliminate the loop and reduce the variable assignment to a no-op). I think your advice about making sure there is a side effect is sage, though.

The minor nit I have with this benchmark (and I am really just expounding on Keith's point) is that the actual work performed is not as important as showing that the work is happening in parallel. If the work performed per thread can be optimized a lot, then the result may end up including other optimizations and not show the native-threading benefit. The lay reader perhaps doesn't care to see that performance benefit isolated, but I think it may muddle the picture a bit.

[Sidebar: I was going to stop there, but I remembered that MRI 1.9.3 uses tagged pointers for Fixnum, so it does not box fixnums like we do. If I change the Fixnum counter to a Float (MRI 2.0 has flonums, but 1.9.3 does not), we then see both implementations allocating a full Ruby object, and MRI perf drops by a factor of 2 on this bench. JRuby performance stays the same. This is just an example of how the optimization of the work you are performing per thread can improperly influence the apparent benefit of parallelism.]

In an unrealistically perfect world (ignoring Amdahl's law -- at 4 cores it is not a big player anyway), we should see the time scale mostly linearly up to the number of usable cores. Your bench showed nearly a 4x speedup on a 4-core machine. This is interesting, since GC and a few other threads get started on the JVM (maybe hyperthreads are helping here?).

If I were to suggest something, I would consider showing the graphs of 1-4 threads as separate threads. If your bench is showing actual benefit of parallel execution, you should see mostly the same result on JRuby and a linear slowdown on MRI. At least I think showing the slowdown on the GIL side is more compelling (IMHO).
-Tom

On Wed, Feb 6, 2013 at 9:01 AM, Keith Bennett <firstname.lastname@example.org> wrote:
> [...]

--
blog: http://blog.enebo.com   twitter: tom_enebo
mail: email@example.com
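[Tom's Fixnum-vs-Float sidebar can be illustrated roughly like this. A sketch under the assumption of a pre-flonum MRI such as 1.9.3; on MRI 2.0+ small floats are immediate values too, so the gap narrows there:]

```ruby
require 'benchmark'

# On MRI 1.9.3 the integer counter is a tagged pointer (no allocation),
# while the float counter allocates a fresh Float object each iteration.
# JRuby boxes both, so its timing should be similar for the two loops.
def count_fixnum(limit)
  i = 0
  i += 1 while i < limit
  i
end

def count_float(limit)
  i = 0.0
  i += 1.0 while i < limit
  i
end

n = 1_000_000
fix_ms = Benchmark.realtime { count_fixnum(n) } * 1000
flo_ms = Benchmark.realtime { count_float(n) } * 1000
puts format('fixnum: %.1f ms  float: %.1f ms', fix_ms, flo_ms)
```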
on 2013-02-06 17:50
"as separate threads" -- "as separate graphs" On Wed, Feb 6, 2013 at 10:45 AM, Thomas E Enebo <firstname.lastname@example.org> wrote: > up including other optimizations and not show native threading > the work you are performing per-thread can improperly influence > JRuby and a linear slowdown on MRI. At least I think showing the >> >>> I had a tough time finding recent performance benchmarks for JRuby, so I >>> >>> -- >> --------------------------------------------------------------------- > mail: email@example.com -- blog: http://blog.enebo.com twitter: tom_enebo mail: firstname.lastname@example.org