Nokogiri under JRuby 1.7.3 significantly slower than 1.9.3-p392 (benchmark inside)

A few days ago in the IRC channel I asked whether anyone had noticed a
significant drop in speed when using Nokogiri under JRuby compared with
ruby 1.9.3-p392, and Charles suggested I send in a benchmark. I’ve pasted
the notes verbatim below, and the code I wrote to run the benchmark is
here: Gemfile · GitHub

Scenario:

Building an XML sitemap for a website, given an Array of URLs. In the
actual example it’s an Array of Strings, but to simplify things I chose
to use an Array of Integers which I then interpolate into each URL. This
is close to the actual use-case.

I know I could easily create the output using simple String
interpolation, but I’m providing this benchmark in the hope that it helps
the community and improves how JRuby works with Nokogiri.
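
For reference, the String interpolation alternative mentioned above could look roughly like the sketch below. This is not the benchmark script from the gist; BASE_URL and build_sitemap are hypothetical names made up for illustration, and the URL counts simply mirror the ones in the results. It uses only the standard library, so it can serve as a baseline on both runtimes.

```ruby
require 'benchmark'

# Hypothetical base URL; the original post does not name the site.
BASE_URL = 'http://example.com/pages'

# Build a sitemap from an Array of Integer ids using plain String
# interpolation, with no XML library involved.
def build_sitemap(ids)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
  ids.each { |id| lines << "  <url><loc>#{BASE_URL}/#{id}</loc></url>" }
  lines << '</urlset>'
  lines.join("\n")
end

# Time the same URL counts that appear in the results.
Benchmark.bm(12) do |bm|
  [100, 5000, 10000, 30000, 50000].each do |count|
    ids = (1..count).to_a
    bm.report("#{count} urls") { build_sitemap(ids) }
  end
end
```

Note that this does no escaping, so it is only safe when the interpolated values cannot contain XML special characters such as & or <.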

System Information:

MacBook Pro, 13-inch
2.9GHz Intel Core i7
8GB 1600MHz DDR3

JRuby was installed with rbenv 0.4.0

$ jruby --version
jruby 1.7.3 (1.9.3p385) 2013-02-21 dac429b on Java HotSpot(TM) 64-Bit Server VM 1.6.0_43-b01-447-11M4203 [darwin-x86_64]

Invoked using the following command on the terminal:
$ jruby -J-Xms2048m -J-Xmx2048m -S nokogiri_benchmark.rb

Output (jruby):
                   user    system       total         real
100 urls       2.490000  0.130000    2.620000 (   1.038000)
5000 urls     36.050000  0.230000   36.280000 (  32.682000)
10000 urls   142.100000  0.680000  142.780000 ( 138.808000)
30000 urls  1329.230000  5.760000 1334.990000 (1271.284000)
50000 urls  3857.420000 15.890000 3873.310000 (3570.221000)

Running the same script on ruby 1.9.3-p392 gives the output below. It was
invoked with: ruby nokogiri_benchmark.rb

Output (ruby):
                user   system    total      real
100 urls    0.010000 0.000000 0.010000 (0.010593)
5000 urls   0.380000 0.000000 0.380000 (0.379011)
10000 urls  0.740000 0.010000 0.750000 (0.742528)
30000 urls  2.290000 0.010000 2.300000 (2.308314)
50000 urls  4.060000 0.030000 4.090000 (4.095105)
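
To compare the two runs directly, one option is to divide the JRuby "real" time by the ruby "real" time for each URL count. The sketch below does that with the figures copied from the two outputs; the constant and method names are made up for illustration. The ratio is not constant: it is a little under 100x at 100 urls and above 800x at 50000 urls, which suggests the JRuby run scales worse than linearly with the number of URLs.

```ruby
# "real" (wall-clock) seconds copied from the jruby and ruby outputs.
JRUBY_REAL = {
  100 => 1.038, 5000 => 32.682, 10000 => 138.808,
  30000 => 1271.284, 50000 => 3570.221
}
RUBY_REAL = {
  100 => 0.010593, 5000 => 0.379011, 10000 => 0.742528,
  30000 => 2.308314, 50000 => 4.095105
}

# How many times slower the JRuby run was for a given URL count.
def slowdown(urls)
  JRUBY_REAL[urls] / RUBY_REAL[urls]
end

JRUBY_REAL.keys.each do |urls|
  puts format('%-12s %8.1fx slower', "#{urls} urls", slowdown(urls))
end
```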

Thanks,

- Kush
  @KushalP

Hi,

I’ve added a comment to your gist. You might want to replace
nokogiri/builder with plain builder until the issue is investigated.

Regards