Forum: JRuby jruby 1.7.0 with invokedynamic.all=true on HotSpot not faster than 1.6.8?

Posted by Kristy Branko (Guest)
on 2012-10-22 21:40
(Received via mailing list)
Hi,

I'm testing jruby 1.7.0 and comparing some performance results with
jruby 1.6.8. It seems like I cannot see any performance improvements with
jruby 1.7.0 even if I enable invokedynamic.all=true.

I'm probably doing something wrong because invokedynamic.all=true should
give a performance boost. Does anybody have any idea where to check for
possible problems? Do I need any special flags for HotSpot Java 7?

Here is my setup:
$ uname -a
Linux Ubuntu-1204-precise-64-minimal 3.2.0-29-generic #46-Ubuntu SMP Fri
Jul 27 17:03:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

$  lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04 LTS
Release: 12.04
Codename: precise

$ cat ~/.jrubyrc
compat.version=1.8
backtrace.mask=true
backtrace.style=mri
invokedynamic.all=true

$ java -version
java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)

$ jruby -v
jruby 1.7.0.RC2 (ruby-1.8.7p370) 2012-10-13 3b90805 on Java HotSpot(TM)
64-Bit Server VM 1.7.0_07-b10 [linux-amd64]

I also tried adding the following command-line option, with no luck:
-J-XX:CompileCommand=dontinline,org.jruby.runtime.invokedynamic.InvokeDynamicSupport::invocationFallback

Do I need any special flags for HotSpot Java 7?


Thanks,
kbranko
Posted by Charles Nutter (headius)
on 2012-10-22 22:16
(Received via mailing list)
Hello!

On Mon, Oct 22, 2012 at 2:39 PM, Kristy Branko
<cloudhq.tester.4@gmail.com> wrote:
> I'm testing jruby 1.7.0 and comparing some performance results with jruby on
> 1.6.8. It seems like I cannot see any performance improvements with jruby
> 1.7.0 even if I enable invokedynamic.all=true.

The flag you are looking for is compile.invokedynamic=true. On Java 7,
invokedynamic is disabled by default, due to bugs in the implementation
currently available in OpenJDK/Oracle JDK 7. Passing
-Xcompile.invokedynamic=true (or the equivalent setting in .jrubyrc)
will turn it on.

Please do let us know how the performance looks once you have it
enabled, and feel free to file issues if anything is slower with
invokedynamic enabled.

- Charlie
Posted by Karthik Man (bitblender)
on 2013-01-22 07:56
Hi,

On Darwin Kernel Version 12.2.1: root:xnu-2050.20.9~2/RELEASE_X86_64
x86_64, I see that jruby with -Xcompile.invokedynamic=false is
significantly faster than with -Xcompile.invokedynamic=true for my
trivial test program. Why is this so?

I am a ruby/jruby noob but I have a decent understanding of JVM
internals. Please excuse me if this slowdown is due to a noob
error/oversight.

jruby build
===========

[Karthik:~/test/jruby]jruby --version
jruby 1.7.2 (1.9.3p327) 2013-01-04 302c706 on Java HotSpot(TM) 64-Bit
Server VM 1.7.0_11-b21 [darwin-x86_64]

Simple program
===============
[Karthik:~/test/jruby]cat dispatch.rb

def twoArg(a,b)
 a + b
end

$i = 0
$limit = 100000
$sum = 0

while $i < $limit
  $sum = $sum + twoArg(5,6)
  $i = $i + 1
end
puts("done #{$sum}")

Results
========

[Karthik:~/test/jruby]time jruby -Xcompile.invokedynamic=false dispatch.rb
done 1100000

real  0m2.042s
user  0m3.560s
sys  0m0.164s

[Karthik:~/test/jruby]time jruby -Xcompile.invokedynamic=true dispatch.rb
done 1100000

real  0m9.721s
user  0m16.963s
sys  0m0.542s
Posted by Keith B. (keith_b)
on 2013-01-22 08:13
(Received via mailing list)
Karthik -

I have no idea, but is it possible invokedynamic involves some startup 
cost that is responsible for this?  You could try with just one 
iteration and see what the difference is.

One thing I learned today is that 100,000 iterations is not enough to 
accurately test JVM performance.  I'd say if it doesn't take minutes, a 
test isn't long enough. ;)

Also, some Ruby style tips:

* Is there a reason you're using global variables (with '$')?  They're 
frowned upon, unless there is a compelling reason to use them.

* One of the greatest things about Ruby is the enumerable functions and 
the improved looping syntax.  For the kind of thing you're doing, it 
could be done more cleanly like this, without the need for the 'i' 
variable:

limit.times {  sum += twoArg(5,6) }

* Ruby convention is to use "snake_case" for method names instead of
"camelCase": two_arg instead of twoArg (though 'add' would be a good
name too).
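
Putting the three tips together, the original script might look something like the sketch below. The names `add` and `run` are only illustrative (they are not from the original program), and the limit of 100_000 is copied from it; nothing here is tuned for benchmarking.

```ruby
# Same computation as dispatch.rb, using locals, snake_case and Integer#times.
# `add` and `run` are hypothetical names chosen for this illustration.
def add(a, b)
  a + b
end

def run(limit)
  sum = 0
  # The block closes over the local `sum`, so no globals or loop counter
  # are needed.
  limit.times { sum += add(5, 6) }
  sum
end

puts "done #{run(100_000)}"
```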

Regards,
Keith

--
Keith R. Bennett
http://about.me/keithrbennett
Posted by Wayne Meissner (Guest)
on 2013-01-22 08:25
(Received via mailing list)
Basically what Keith said - your benchmark is faulty.  It doesn't do
warmup, the iterations are too few, it doesn't do multiple runs,
and accessing globals like that is slow.

Here is a somewhat better version.

require 'benchmark'

def twoArg(a,b)
 a + b
end

10.times {
  puts Benchmark.measure {
    i = 0
    limit = 10000000
    sum = 0
    while i < limit
      sum += twoArg(5, 6)
      i += 1
    end
  }
}
Posted by Wayne Meissner (Guest)
on 2013-01-22 08:30
(Received via mailing list)
On 22 January 2013 18:11, Keith Bennett <keithrbennett@gmail.com> wrote:
>
> * One of the greatest things about Ruby is the enumerable functions and the
> improved looping syntax.  For the kind of thing you're doing, it could be done
> more cleanly like this, without the need for the 'i' variable:
>
> limit.times {  sum += twoArg(5,6) }

It's nice syntax, but the block dispatch overhead starts to distort
benchmarks.  In some of the benchmarks I do, I use a partially
unrolled while loop (e.g. doing 4 method invocations per loop
iteration), because even the i += 1 can add up.  i += 1 will cause a new
Fixnum to be created each iteration.  If the method you are benching
does something trivial such as adding two small integers (which hits
JRuby's fixnum cache), then the loop counter increment is actually the
major overhead in each loop.
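
A minimal sketch of what such a partially unrolled loop could look like, assuming the call count is a multiple of 4. The method name `unrolled` and the iteration count are made up for illustration, and `two_arg` is the snake_case rename of twoArg. The running `sum` is kept only as a checksum; it adds one extra addition per call.

```ruby
require 'benchmark'

def two_arg(a, b)
  a + b
end

# Calls two_arg `calls` times, four per loop iteration, so the loop-counter
# increment is paid once per four dispatches instead of once per dispatch.
# `calls` is assumed to be a multiple of 4.
def unrolled(calls)
  i = 0
  sum = 0
  while i < calls
    sum += two_arg(5, 6)
    sum += two_arg(5, 6)
    sum += two_arg(5, 6)
    sum += two_arg(5, 6)
    i += 4
  end
  sum
end

# Several runs, so the later ones are measured after warmup.
3.times do
  puts Benchmark.measure { unrolled(1_000_000) }
end
```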
Posted by Karthik Man (bitblender)
on 2013-01-23 00:55
I used the testcase posted by Wayne and I see that invokedynamic does 
speed things up a bit for his testcase.

I ran an experiment to see what happened if the same code was changed to 
use globals. I observed the following :

a) With or without invokedynamic, code operating on locals is 
significantly faster than code operating on globals

b) While invokedynamic improves the code with locals, the code with
globals becomes very slow (or hangs?) with invokedynamic enabled.

What is the reason for a) and b)?

The following code snippets and results should be self-explanatory.

==============================with locals==============================
[Karthik:~/test/jruby]cat loop_locals.rb
require 'benchmark'
def twoArg(a,b)
 a + b
end

limit = 0
sum = 0
i = 0

10.times {
  puts Benchmark.measure {
    i = 0
    limit = 10000000
    while i < limit
      sum = sum + twoArg(5,6)
      i = i + 1
    end
  }
}
puts("done. loop count: #{limit} checksum: #{sum}")
[Karthik:~/test/jruby]time jruby -Xcompile.invokedynamic=false loop_locals.rb
  0.960000   0.050000   1.010000 (  0.636000)
  0.420000   0.010000   0.430000 (  0.387000)
  0.410000   0.000000   0.410000 (  0.405000)
  0.410000   0.000000   0.410000 (  0.404000)
  0.410000   0.000000   0.410000 (  0.409000)
  0.400000   0.000000   0.400000 (  0.401000)
  0.400000   0.010000   0.410000 (  0.405000)
  0.410000   0.000000   0.410000 (  0.403000)
  0.420000   0.000000   0.420000 (  0.410000)
  0.400000   0.010000   0.410000 (  0.410000)
done. loop count: 10000000 checksum: 1100000000

real  0m6.214s
user  0m7.891s
sys  0m0.247s
[Karthik:~/test/jruby]time jruby -Xcompile.invokedynamic=true loop_locals.rb
  0.760000   0.050000   0.810000 (  0.531000)
  0.320000   0.010000   0.330000 (  0.291000)
  0.310000   0.000000   0.310000 (  0.303000)
  0.310000   0.000000   0.310000 (  0.309000)
  0.350000   0.000000   0.350000 (  0.347000)
  0.400000   0.000000   0.400000 (  0.393000)
  0.370000   0.010000   0.380000 (  0.369000)
  0.370000   0.000000   0.370000 (  0.368000)
  0.410000   0.000000   0.410000 (  0.414000)
  0.390000   0.000000   0.390000 (  0.388000)
done. loop count: 10000000 checksum: 1100000000

real  0m5.667s
user  0m7.420s
sys  0m0.251s

==============================with globals==============================

[Karthik:~/test/jruby]cat loop_globals.rb
require 'benchmark'
def twoArg(a,b)
 a + b
end

$limit = 0
$sum = 0
$i = 0

10.times {
  puts Benchmark.measure {
    $i = 0
    $limit = 10000000
    while $i < $limit
      $sum = $sum + twoArg(5,6)
      $i = $i + 1
    end
  }
}
puts("done. loop count: #{$limit} checksum: #{$sum}")
[Karthik:~/test/jruby]time jruby -Xcompile.invokedynamic=false loop_globals.rb
  2.550000   0.070000   2.620000 (  2.210000)
  2.020000   0.010000   2.030000 (  1.978000)
  1.990000   0.000000   1.990000 (  1.989000)
  2.000000   0.010000   2.010000 (  1.991000)
  2.070000   0.010000   2.080000 (  2.062000)
  2.040000   0.000000   2.040000 (  2.023000)
  2.020000   0.010000   2.030000 (  2.011000)
  2.020000   0.000000   2.020000 (  2.007000)
  2.310000   0.010000   2.320000 (  2.313000)
  2.340000   0.010000   2.350000 (  2.330000)
done. loop count: 10000000 checksum: 1100000000

real  0m22.756s
user  0m24.597s
sys  0m0.296s

[Karthik:~/test/jruby]time jruby -Xcompile.invokedynamic=true loop_globals.rb
^C
real  2m4.312s
user  2m26.083s
sys  0m0.851s
[Karthik:~/test/jruby]
[[ Killed the last test run. It does not complete even a single
iteration in 6x the time that the previous (without invokedynamic) run
took to finish 10 iterations ]]
Posted by Keith B. (keith_b)
on 2013-01-24 02:38
(Received via mailing list)
Wayne -

I hadn't thought about the block dispatch overhead.  I guess raw looping 
would always be faster than any kind of function/block call -- some kind 
of goto or jump call in byte code with no need to push or pop parameters 
to/from the stack I presume.

However, if one is testing different strategies, the relationship of the
performance results should be accurate (which is greater than which),
shouldn't it?  It's just that the ratios would not be accurate, right?
That is, if the .times iteration overhead took 0.5 seconds, and one
strategy produced 1.0 second and another 2.0, then the latter would
really be 3 times slower ((2.0 - 0.5 = 1.5) / (1.0 - 0.5 = 0.5)), rather
than the 2x indicated by the numbers (2.0 / 1.0).
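
To make that arithmetic concrete, here is a small sketch using the same hypothetical figures (0.5 s overhead, 1.0 s and 2.0 s totals). The name `adjusted_ratio` is made up for the illustration, and it assumes the overhead is a fixed cost that is identical in both measurements.

```ruby
# Ratio of two timings after subtracting a fixed loop overhead from each.
# All figures are the hypothetical ones from the paragraph above.
def adjusted_ratio(slow, fast, overhead)
  (slow - overhead) / (fast - overhead)
end

raw      = 2.0 / 1.0                      # ratio of the raw timings
adjusted = adjusted_ratio(2.0, 1.0, 0.5)  # 1.5 / 0.5

puts "raw ratio: #{raw}, adjusted ratio: #{adjusted}"
```

The ordering of the two strategies is the same either way; only the size of the gap changes.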

Can you say more about the production of the intermediate fixnum using i 
+= 1?  I thought it was just a syntactic convenience that would be 
treated identically to the more verbose i = i + 1.

Thanks,
Keith
Posted by Wayne Meissner (Guest)
on 2013-01-24 23:33
(Received via mailing list)
On 24 January 2013 11:37, Keith Bennett <keithrbennett@gmail.com> wrote:
> Wayne -
>
> I hadn't thought about the block dispatch overhead.  I guess raw looping would
> always be faster than any kind of function/block call -- some kind of goto or jump
> call in byte code with no need to push or pop parameters to/from the stack I
> presume.
>
> However, if one is testing different strategies, the relationship of the
> performance results should be accurate (which is greater than which), shouldn't it?
> It's just that the ratios would not be accurate, right?  That is, if the .times
> iteration overhead took 0.5 seconds, and one strategy produced 1.0 second and another
> 2.0, then the latter would really be 3 times slower ((2.0 - 0.5 = 1.5) / (1.0 -
> 0.5 = 0.5)), rather than the 2x indicated by the numbers (2.0 / 1.0).

The problem comes when e.g. block dispatch overhead = 0.9 seconds, and
the method under test takes 0.1 seconds.  You start to lose
resolution.  And you need to benchmark empty block dispatch and take
that away from the result ... and it's just easier to use a partially
unrolled while loop.
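
For reference, the subtract-the-baseline approach could be sketched roughly as follows. The helper names `baseline_time` and `loaded_time` and the iteration count are made up for the example, and `two_arg` is the snake_case rename of twoArg. A real measurement would also need warmup and repeated runs.

```ruby
require 'benchmark'

def two_arg(a, b)
  a + b
end

# Time `n` iterations of an empty block: the baseline dispatch overhead.
def baseline_time(n)
  Benchmark.realtime { n.times { } }
end

# Time `n` iterations of a block that calls the method under test.
def loaded_time(n)
  Benchmark.realtime { n.times { two_arg(5, 6) } }
end

n = 1_000_000
baseline = baseline_time(n)
loaded   = loaded_time(n)

# The difference is only an estimate. When the baseline dominates, noise in
# either measurement can swamp it, and it can even come out negative.
puts format('baseline %.4fs, loaded %.4fs, estimated method cost %.4fs',
            baseline, loaded, loaded - baseline)
```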


>
> Can you say more about the production of the intermediate fixnum using i += 1?
> I thought it was just a syntactic convenience that would be treated identically to
> the more verbose i = i + 1.

It is identical (afaik).  Fixnums in JRuby are immutable objects - so
when you do i = i + 1, you're doing something equivalent to:

    i = new RubyFixnum(i.getLongValue() + 1)

So, every loop iteration does object allocation.
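
The desugaring itself can be illustrated in plain Ruby with a stand-in class. `Boxed` and `count_to` are hypothetical names, and this is not how JRuby implements Fixnum; it only shows that `i += 1` calls `+` and rebinds `i` to whatever object comes back.

```ruby
# A tiny immutable number wrapper, standing in for RubyFixnum.
# `Boxed` and its allocation counter are hypothetical, for illustration only.
class Boxed
  attr_reader :value

  @allocations = 0
  class << self
    attr_accessor :allocations
  end

  def initialize(value)
    @value = value
    Boxed.allocations += 1
  end

  # Returns a new object instead of modifying the receiver.
  def +(other)
    Boxed.new(@value + other)
  end
end

def count_to(limit)
  i = Boxed.new(0)
  i += 1 while i.value < limit   # sugar for: i = i + 1
  i
end

result = count_to(5)
puts "value: #{result.value}, objects allocated: #{Boxed.allocations}"
```

Counting to 5 allocates 6 objects here: the initial one plus one per increment.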