Testing speed of XML parsing in MRI and JRuby

FYI:

I’m hoping to use Hpricot for general XML processing instead of REXML
or libxml in some projects, so I wanted to compare the speeds of
different XML parsers in MRI and JRuby.

I was very impressed by how much faster JRuby is when running on Java
1.6 than on 1.5. On Java 1.6, Hpricot in JRuby was only about 10%
slower than in MRI.

So far I’ve only got one test, which parses a 100k XML file and counts
a certain type of element. I’m planning to add more tests that cover
the kind of processing I need to do.

This is the test:

Do this 100 times:

  • parse a 100k XML file and count the 466 leaf nodes
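
A rough sketch of what one iteration of this test might look like with
REXML. The sample document, the helper names, and the definition of a
leaf (an element with no child elements) are assumptions made for
illustration; the actual benchmark code is in the repository linked
below and may count differently.

```ruby
require 'rexml/document'

# Hypothetical stand-in for the 100k benchmark file.
LEAF_SAMPLE_XML = <<~XML
  <root>
    <group>
      <item>a</item>
      <item>b</item>
    </group>
    <item>c</item>
    <empty/>
  </root>
XML

# Count elements that have no child elements
# (assumed meaning of "leaf node").
def count_leaf_elements(element)
  return 1 unless element.has_elements?
  total = 0
  element.elements.each { |child| total += count_leaf_elements(child) }
  total
end

# One iteration of the test: parse the document, then count its leaves.
def parse_and_count_leaves(xml)
  count_leaf_elements(REXML::Document.new(xml).root)
end

# Repeat 100 times, as in the test description.
100.times { parse_and_count_leaves(LEAF_SAMPLE_XML) }
puts parse_and_count_leaves(LEAF_SAMPLE_XML)
```

The same loop could be pointed at Hpricot or libxml by swapping the
body of parse_and_count_leaves, which keeps the timed work comparable
across libraries.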

The results shown below are the times after a “rehearsal”. JRuby is
faster once the JVM has been “warmed up”; the rehearsal has no effect
on the MRI timings.

Platform                    Method                 Total time

JRuby (Java 1.6.0)          jdom_document_builder   0.363
MRI                         libxml                  0.389
JRuby (Java 1.6.0 server)   jdom_document_builder   0.412
JRuby (server)              jdom_document_builder   0.617
JRuby                       jdom_document_builder   1.451
MRI                         hpricot                 2.056
JRuby (Java 1.6.0 server)   hpricot                 2.272
JRuby (Java 1.6.0)          hpricot                 2.273
JRuby (server)              hpricot                 3.447
JRuby                       hpricot                 6.198
JRuby (Java 1.6.0 server)   rexml                   6.251
JRuby (Java 1.6.0)          rexml                   6.356
MRI                         rexml                   7.624
JRuby (server)              rexml                   9.609
JRuby                       rexml                  12.944
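
To make the "about 10% slower" comparison concrete, here is a small
sketch that computes how much slower each Hpricot run is than MRI. The
numbers are copied from the Hpricot rows of the table above; the
constant and helper names are made up for this example.

```ruby
# Total times copied from the Hpricot rows of the table above.
HPRICOT_TIMES = {
  'MRI'                       => 2.056,
  'JRuby (Java 1.6.0 server)' => 2.272,
  'JRuby (Java 1.6.0)'        => 2.273,
  'JRuby (server)'            => 3.447,
  'JRuby'                     => 6.198
}

# Percentage by which `time` exceeds `baseline`, rounded to one decimal.
def percent_slower(time, baseline)
  ((time - baseline) / baseline * 100).round(1)
end

baseline = HPRICOT_TIMES['MRI']
HPRICOT_TIMES.each do |platform, time|
  puts format('%-27s %6.1f%% slower than MRI',
              platform, percent_slower(time, baseline))
end
```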

  • I’d also like to add tests for Ruby 1.9.

The timings reported here are taken from the second run of the 100x
loop for each platform/library combination, so the JVM should be
warmed up.
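
One way to get this "run twice, report the second pass" behaviour is
Ruby's standard Benchmark.bmbm, which performs a rehearsal pass before
the measured one. This is a minimal sketch under that assumption, not
necessarily how the checked-in benchmark is structured; the sample
document and the parse_and_count helper are hypothetical.

```ruby
require 'benchmark'
require 'rexml/document'

# Hypothetical small document; the real test uses a 100k file.
BENCH_XML = '<root>' + ('<leaf>x</leaf>' * 50) + '</root>'
RUNS = 100

# Parse the document and count the child elements of the root.
def parse_and_count(xml)
  REXML::Document.new(xml).root.elements.size
end

# bmbm runs every report twice: a rehearsal pass, then the measured pass.
results = Benchmark.bmbm do |bm|
  bm.report('rexml') { RUNS.times { parse_and_count(BENCH_XML) } }
end

# results holds one Benchmark::Tms per report, taken from the second pass.
puts format('rexml total: %.3f', results.first.total)
```

On MRI the two passes should come out close to each other; on JRuby the
rehearsal gives the JIT a chance to compile the hot code before the
measured pass.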

Tested on:

MacBook Pro
2.33 GHz Intel Core 2 Duo
4 GB memory
running Mac OS X 10.5.2

Ruby versions tested:
MRI: ruby 1.8.6 (2007-09-24 patchlevel 111) [universal-darwin9.0]
JRuby: ruby 1.8.6 (2008-03-20 rev 6255) [i386-jruby1.1RC3] on Java 1.5.0_13
JRuby: ruby 1.8.6 (2008-03-20 rev 6255) [i386-jruby1.1RC3] on Java 1.6.0_03 (Soylatte)

Library versions MRI:
libxml-ruby 0.5.4
hpricot 0.6

Library versions JRuby:
hpricot 0.6.161

More details are available in the links below:

Benchmark code and data checked into subversion here:
https://svn.concord.org/svn/projects/trunk/common/ruby/xml_benchmarks

Trac:
http://trac.cosmos.concord.org/projects/browser/trunk/common/ruby/xml_benchmarks

  • Hpricot’s initial parsing uses code generated by Ragel, a state
    machine compiler that can produce C or Java code. Ragel’s Java
    output supports only one code-generation style, and it is not the
    fastest. The style Hpricot uses for generating its C code produces
    a larger executable but is faster.