Introducing Xaggly, a C-based XML Parser for Ruby

I have written a C-based XML parser as a Ruby plugin. I benchmarked it
against REXML and Hpricot, and it appears to run quite fast
(http://involution.com/images/xmlshootout.png).

The source code is here:
http://involution.com/xaggly.tar.gz

Unfortunately, my XPath support is very primitive right now. The
package only supports fully qualified queries, and attribute searches
aren’t working yet. So, you can search /html/body/* to get all of the
elements directly under body, but things like //p and
//p[@class='foo'] do not work yet. Attributes are parsed and can be
accessed from Ruby, though.
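
To make "fully qualified query" concrete, here is a minimal sketch in plain Ruby. It is not Xaggly's API: the Node struct and the select_path method are hypothetical names invented for this illustration. Each step of the path descends exactly one level, which is why /html/body/* works while //p (match at any depth) would need a different kind of traversal.

```ruby
# Hypothetical element type for illustration only; this is not Xaggly's API.
Node = Struct.new(:name, :attributes, :children)

# Evaluate a fully qualified path such as "/html/body/*" against a root node.
# A step matches an element name exactly, or any element when it is "*".
def select_path(root, path)
  steps = path.split("/").reject(&:empty?)
  return [] if steps.empty?

  # The first step must match the root element itself.
  current = [root].select { |n| steps.first == "*" || n.name == steps.first }

  # Every later step descends exactly one level into the children.
  steps.drop(1).each do |step|
    current = current.flat_map(&:children)
                     .select { |n| step == "*" || n.name == step }
  end
  current
end

doc = Node.new("html", {}, [
  Node.new("head", {}, []),
  Node.new("body", {}, [
    Node.new("p", { "class" => "foo" }, []),
    Node.new("div", {}, [Node.new("p", {}, [])])
  ])
])

# Only the direct children of body are returned; the nested p is not.
puts select_path(doc, "/html/body/*").map(&:name).inspect  # prints ["p", "div"]
```

Attributes are carried on each node in this sketch, but no step ever looks at them, which mirrors the situation described above: they are readable, just not searchable yet.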

I was trying to jigger this plugin into a gem, but it appears that mkmf
doesn’t detect the presence of Flex and Bison files automatically. Is
there a supported or standard way of doing such a thing?

Regards,

Tony P.
http://involution.com

Hi Tony,

On 12/31/06, Tony P. wrote:

I have written a C-based XML parser as a Ruby plugin. I benchmarked it
against REXML and Hpricot, and it appears to run quite fast
(http://involution.com/images/xmlshootout.png).

Sounds ambitious. Any reason you didn’t go with libxml? Have you
tried benchmarking it against libxml?

Tony P. wrote:

I was trying to jigger this plugin into a gem, but it appears that mkmf
doesn’t detect the presence of Flex and Bison files automatically. Is
there a supported or standard way of doing such a thing?

From my own limited experience, it is better to ship both the flex and
bison source files and the C files they produce. My belief is that most
of the time, ‘it will just work’. But I have limited experience with
bison, flex and cross-platform stuff (apart from the fact that every
time I had a friend try one of my things on a different platform, I had
to give him the C files because his flex didn’t understand the same set
of options…).

Vince
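
If the generated C files are shipped alongside the grammar sources, it helps to check before packaging that none of them is missing or older than its source. The following is only a sketch of such a check: the method name is made up, and it assumes the convention that foo.y or foo.l generates foo.c in the same directory, which may not match a given project's layout.

```ruby
# Report grammar/lexer sources whose generated C file is missing or stale.
# Assumption (hypothetical convention): foo.y / foo.l generate foo.c in the
# same directory as the source file.
def stale_generated_sources(dir)
  Dir.glob(File.join(dir, "*.{l,y}")).sort.select do |source|
    generated = source.sub(/\.[ly]\z/, ".c")
    # Stale means: no generated file at all, or one older than its source.
    !File.exist?(generated) || File.mtime(generated) < File.mtime(source)
  end
end

# Example call; "ext/xaggly" is a placeholder directory name.
stale = stale_generated_sources("ext/xaggly")
puts "Regenerate before packaging: #{stale.join(', ')}" unless stale.empty?
```

When the list is empty, a user building the package should not need flex or bison at all, because make will find the C files already up to date.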

Hmmm, I still wonder if it’s possible to coerce mkmf to handle .l and
.y files in a general way so I can turn this into a gem.

Tony

Tony P. wrote:

Hmmm, I still wonder if it’s possible to coerce mkmf to handle .l and
.y files in a general way so I can turn this into a gem.

In my own experience, mkmf is not really flexible. I tried to make a
more flexible version (see mkmf2.rubyforge.org, which is quite outdated
compared to the CVS repository), but that isn’t really satisfying yet.

And you’ll be bitten by options varying from one computer to
another… Also, don’t expect everyone to have flex and bison
installed.

I tried using libxml, but a lot of RSS feeds aren’t really conformant.
So, I had trouble reading in various feeds from different sites using
libxml. While REXML and Hpricot worked, I found them to be fairly
slow to parse large files. That was the impetus for me writing this
library.

Tony
http://involution.com

Hi,

In message “Re: Introducing Xaggly, a C-based XML Parser for Ruby”
on Tue, 2 Jan 2007 04:30:27 +0900, “Tony P.”
writes:

|Hmmm, I still wonder if it’s possible to coerce mkmf to handle .l and
|.y files in a general way so I can turn this into a gem.

You can check ext/ripper/{extconf.rb,depend} for the trick. It’s in
the 1.9 tree in Subversion.

						matz.
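
For readers unfamiliar with the depend mechanism: when mkmf's create_makefile finds a file named depend in the extension directory, it appends that file's contents to the generated Makefile, so extra make rules can be placed there. The sketch below only generates such rules as text. It is not a copy of what ext/ripper does; the method name, the tool names, and the flags are assumptions to adjust per platform, and the plain file names assume an in-tree build.

```ruby
# Build make rules that regenerate C sources from bison (.y) and flex (.l)
# inputs. Tool names and flags are assumptions; they vary between systems.
def generator_rules(sources, bison: "bison", flex: "flex")
  sources.map do |source|
    output = source.sub(/\.[ly]\z/, ".c")
    command =
      case File.extname(source)
      # -d also writes a header with the token definitions for the scanner.
      when ".y" then "#{bison} -d -o #{output} #{source}"
      when ".l" then "#{flex} -o#{output} #{source}"
      else raise ArgumentError, "not a .l or .y file: #{source}"
      end
    # make requires the recipe line to start with a tab character.
    "#{output}: #{source}\n\t#{command}\n"
  end.join("\n")
end

# Print the rules; redirect the output into a file named "depend" to use them.
puts generator_rules(["parser.y", "lexer.l"])
```

Because these are ordinary make rules, flex and bison run only when a generated C file is missing or older than its source, so shipping up-to-date C files keeps the tools optional for end users.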

Thank you.

Tony

“Tony P.” writes:

I tried using libxml, but a lot of RSS feeds aren’t really conformant.
So, I had trouble reading in various feeds from different sites using
libxml. While REXML and Hpricot worked, I found them to be fairly
slow to parse large files. That was the impetus for me writing this
library.

Can you show me a non-well-formed RSS feed in the wild? I have often
thought lots of them ought to exist, but I didn’t find one when I
needed one…