YARV speed test: XML processing

I have a small (200 lines) program that processes XML files and
creates Ruby object trees. It relies heavily on REXML, the XML
library written in pure Ruby.

On my 2GHz Intel Mac running Mac OS X, the latest YARV speeds up my
code by roughly a factor of 2.

ruby 1.8.4 (2005-12-24) [i686-darwin8.6.1]
53 seconds

ruby 2.0.0 (Base: Ruby 1.9.0 2006-04-08) [i686-darwin8.6.1]
YARVCore 0.4.0 Rev: 502 (2006-05-18) [opts: ]
28 seconds

My program processes BlackBoard XML course archives and produces
numerous statistics about the discussion threads. The original
program then wrote an Excel output file; that step was removed for
these tests. In my test I processed a course with 26 separate XML
files containing discussion threads and created Ruby objects
representing the info I am interested in.

ruby 2.0.0 (Base: Ruby 1.9.0 2006-04-08) [i686-darwin8.6.1]
YARVCore 0.4.0 Rev: 502 (2006-05-18) [opts: ]
28 seconds

I’ve never looked at yarv’s internals, but I’m guessing that it creates
ruby objects directly as it processes the stream.

Unless you’re using the stream api in rexml, you’re comparing apples and
oranges. It stands to reason that parsing a stream directly is much
faster than parsing a stream to a document tree and processing the tree.
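
To make the distinction concrete, here is a minimal sketch of the two REXML styles being contrasted: a stream listener that reacts to tags as they are parsed, and a tree parse followed by an XPath query. The element name `post`, the sample XML, and the `PostCounter`, `count_posts_stream` and `count_posts_tree` names are made up for illustration; nothing here comes from the program discussed in this thread.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical sample input, not the real BlackBoard archive format.
SAMPLE = '<forum><thread><post/><post/></thread><thread><post/></thread></forum>'

# Stream style: REXML calls tag_start for each opening tag as it parses,
# and no document tree is kept in memory.
class PostCounter
  include REXML::StreamListener

  attr_reader :count

  def initialize
    @count = 0
  end

  def tag_start(name, _attrs)
    @count += 1 if name == 'post'
  end
end

def count_posts_stream(xml)
  listener = PostCounter.new
  REXML::Document.parse_stream(xml, listener)
  listener.count
end

# Tree style: build the whole document first, then query it with XPath.
def count_posts_tree(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//post').size
end

puts count_posts_stream(SAMPLE)
puts count_posts_tree(SAMPLE)
```

Both functions count the same elements; the difference is only in whether a tree is built along the way.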

I’d be surprised if it still wasn’t faster to use Yarv to build a ruby
object tree rather than building from an xml stream, but the difference
probably won’t be as great.

Hi Daniel,

I’m not using the stream api. I’m creating a dom tree and using rexml’s
version of xpath to process the tree. I do processing on the results and
create a set of ruby objects. Unless there is something important I’m
missing, I assume that yarv is just executing my algorithms faster. I
can’t see how it would know to use the stream api.
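
For readers who have not used this style, a rough sketch of the approach described above might look like the following. The XML layout, the `ThreadStats` struct and the `build_thread_stats` method are hypothetical stand-ins; the actual archive format and program are not shown in this thread.

```ruby
require 'rexml/document'

# Hypothetical discussion-thread XML; the real archive format is not shown here.
SAMPLE_XML = <<~XML
  <forum>
    <thread id="t1">
      <post author="alice">First</post>
      <post author="bob">Reply</post>
    </thread>
    <thread id="t2">
      <post author="alice">Another</post>
    </thread>
  </forum>
XML

# Hypothetical plain Ruby object holding the statistics of interest.
ThreadStats = Struct.new(:id, :post_count, :authors)

def build_thread_stats(xml)
  doc = REXML::Document.new(xml) # parse the whole file into a tree
  REXML::XPath.match(doc, '//thread').map do |thread|
    posts = REXML::XPath.match(thread, 'post') # child posts of this thread
    authors = posts.map { |post| post.attributes['author'] }.uniq
    ThreadStats.new(thread.attributes['id'], posts.size, authors)
  end
end

build_thread_stats(SAMPLE_XML).each do |stats|
  puts "#{stats.id}: #{stats.post_count} posts by #{stats.authors.join(', ')}"
end
```

All of the parsing and XPath work here runs as ordinary Ruby code inside REXML, which is why a faster interpreter speeds it up without any change to the script.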

I hadn’t used yarv before and wanted to see how it works and I picked a
script that runs pretty slowly and spends most of its time in native
Ruby so I could see how much faster it would be. I think a 2x speed up
is nice.

Quoting [email protected], on Fri, May 26, 2006 at 08:59:38AM +0900:
I think that you are mistaking yarv for an XML parser. It’s not. It’s a
virtual machine for ruby code. Do a quick google.

I know it’s a ruby VM. I checked out the latest yarv from the subversion
repository, compiled it, and then compared the speed of ruby 1.8.4 and
the newer ruby built with yarv by running my program under both. The xml
parser is rexml, a ruby library that can be used by either ruby or yarv.

I think you have misinterpreted my original post. I was comparing the
time of execution like this:

ruby test.rb

and then …

/usr/local/yarv/bin/ruby-yarv test.rb
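
One way to make such a comparison repeatable is to have the script time itself and print which interpreter ran it, then run the same file under each binary. This is only a sketch: the `time_run` helper and the summing workload are placeholders for the real processing, and `Process.clock_gettime` assumes a much newer Ruby than the versions discussed in this thread.

```ruby
# Hypothetical helper: runs the block and returns [result, elapsed_seconds].
def time_run
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  [result, elapsed]
end

# Placeholder workload standing in for the XML processing.
result, elapsed = time_run { (1..100_000).reduce(:+) }

puts RUBY_DESCRIPTION                 # identifies the interpreter that ran the script
puts format('%.3f seconds', elapsed)
```

Running the identical file under both interpreters keeps the code, the library and the input the same, so any difference in the printed time comes from the interpreter.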

On May 26, 2006, at 3:23 AM, Stephen B. wrote:

He knows that you know. He doesn’t think Daniel S. knows.
That’s who he was replying to.

Quoting [email protected], on Fri, May 26, 2006 at 08:59:38AM +0900:

ruby 2.0.0 (Base: Ruby 1.9.0 2006-04-08) [i686-darwin8.6.1]
YARVCore 0.4.0 Rev: 502 (2006-05-18) [opts: ]
28 seconds

I’ve never looked at yarv’s internals, but I’m guessing that it creates
ruby objects directly as it processes the stream.

I think that you are mistaking yarv for an XML parser. It’s not. It’s a
virtual machine for ruby code. Do a quick google.

The benchmarks are on the same code, same rexml, same xml processing
technique, same numbers of objects created.

Cheers,
Sam