XML parser

Hi guys,

I’m looking for some information about the xml libraries available in
Ruby.

I’ve read a few blog posts about the pros and cons of REXML and
libxml, but I still have some questions:

  • As I understand it, REXML is part of the Ruby standard library and so
    is included in the Ruby distribution?

  • libxml is a wrapper for GNOME libxml and must be installed and
    compiled with gem?

  • Is libxml really a fully validating and compliant parser?

  • How do you use XSLT in Ruby? Do you use
    http://raa.ruby-lang.org/project/ruby-xslt/
    or http://rubyforge.org/projects/libxsl/ (if I’m right, the second one
    is part of libxml)?

As you can see, I’m lost, and I would really appreciate your help or a
comprehensive post about XML processing in Ruby.

Thanks !

Cedric

On Jul 19, 10:42 pm, Cédric H. wrote:

comprehensive post about xml processing in ruby .

Thanks !

Cedric

Parsing and manipulating XML is such a wide subject. There is more than
one bookshelf full of books about it, and doing it with Ruby is no
exception.

Besides the two libraries you mentioned, there is also Hpricot
(http://code.whytheluckystiff.net/hpricot/), and you should try it too.

When dealing with XML you should consider the following questions:
Who will be running the code, and on what OS?
How big is the XML document?
Is speed a decisive factor?
How much manipulation is required?

Answers to these questions could help you pick the best library, but
you should be familiar with all of them.

Do some research, play around a little, and pick the one that appeals to
you most.

Cédric H. wrote:

I’m looking for some information about the xml libraries available in
Ruby.

I’ve read a few blog posts about the pros and cons of REXML and
libxml, but I still have some questions:

  • As I understand it, REXML is part of the Ruby standard library and so
    is included in the Ruby distribution?

Yes. It’s also widely acknowledged as very slow. The name stands for
Ruby Electric XML, but the parser leans heavily on regular expressions,
which are only fast when used carefully. Basing an entire parser on them
tends to abuse them.

This blog shows how to spot-check compliance issues in the three leading
Ruby XML parsers:

  • libxml is a wrapper for GNOME libxml and must be installed and
    compiled with gem?

Ordinarily, that process would be mostly harmless. You may already have
libxml2-dev if you have a GNU platform such as Ubuntu or Cygwin.

However, the current libxml-ruby has a nasty bug. First, it sprays lots
of messages like this:

No definition for ruby_xml_parser_context_options_get

into the console. Then it refuses to install the libxml_so.so file that
it just created. I don’t know this bug’s status, but because my
assert_xpath works best with libxml, I must overcome it whenever we
build a new workstation at work! Sometimes I must manually copy its
executables into Ruby’s paths…

(Our production code does not use libxml - only the test code.)

I just tried to install while writing this post, and 0.8.1 might have
worked on Ubuntu.

  • Is libxml really a fully validating and compliant parser?

I suspect it’s the reference implementation for XML. It certainly takes
every DOCTYPE and schema very seriously!

Better, it actually forgives some errors and keeps working, unlike
REXML.
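For the REXML half of that comparison, here is a minimal sketch of what "not forgiving" looks like in practice. It only shows REXML refusing markup that is not well-formed; it does not demonstrate libxml's behaviour. The helper name `well_formed?` is made up for illustration, and on recent Rubies REXML may need to be installed as a gem rather than coming with the core distribution.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of REXML): returns true if REXML can parse
# the string, false if it raises because the markup is not well-formed.
def well_formed?(xml)
  REXML::Document.new(xml)
  true
rescue REXML::ParseException
  false
end

puts well_formed?('<a><b/></a>')  # properly nested
puts well_formed?('<a><b></a>')   # <b> is never closed
```

REXML stops at the first well-formedness error and raises, so there is no partial document to keep working with.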

As you can see, I’m lost, and I would really appreciate your help or a
comprehensive post about XML processing in Ruby.

Sorry! I was knocking 'em down, and you lost me at XSLT.

In a pinch, I would pipe text through xsltproc, and not worry about deep
language integration. XSLT is nothing but a big filter, so I thought you
could use it without making an object out of it.
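A rough sketch of that "XSLT as a filter" idea, using Ruby's standard Open3 library. The helper names `filter_through` and `xslt_transform` are made up for illustration, and the sketch assumes xsltproc is installed, on the PATH, and invoked as `xsltproc stylesheet -` so that it reads the XML from standard input.

```ruby
require 'open3'

# Hypothetical helper: feed `input` to an external command on stdin and
# return whatever it writes to stdout. Raises if the command exits non-zero.
def filter_through(command, input)
  output, status = Open3.capture2(*command, stdin_data: input)
  raise "#{command.first} exited with status #{status.exitstatus}" unless status.success?
  output
end

# Hypothetical wrapper: run an XML string through xsltproc.
# Assumes xsltproc is installed; "-" asks it to read the document from stdin.
def xslt_transform(xml, stylesheet_path)
  filter_through(['xsltproc', stylesheet_path, '-'], xml)
end
```

Usage would be something like `html = xslt_transform(File.read('report.xml'), 'report.xsl')`, where both file names are placeholders. The trade-off is one process spawn per transform, which is fine for occasional use but probably not inside a tight loop.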

Besides the two libraries you mentioned, there is also Hpricot
(http://code.whytheluckystiff.net/hpricot/), and you should try it too.

Hpricot is a jack-of-all-trades-master-of-some-of-them. Don’t look to it
for schema validation, XSLT, or true XPath.

When dealing with XML you should consider the following questions:
Who will be running the code, and on what OS?
How big is the XML document?
Is speed a decisive factor?
How much manipulation is required?

The two XML parser models are DOM and SAX.

DOM converts every tag into an Object (hence Document Object Model), and
lets you traverse the objects. The conversion is slow, and puts the
entire document into memory, simultaneously.
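As a small illustration of the DOM style, here is a sketch using REXML, since it is the parser from this thread that needs no compilation. The sample document and variable names are invented for the example.

```ruby
require 'rexml/document'

# Invented sample document for illustration.
xml = <<~XML
  <library>
    <book id="1"><title>Programming Ruby</title></book>
    <book id="2"><title>The Ruby Way</title></book>
  </library>
XML

# The whole document is parsed into a tree of objects up front.
doc = REXML::Document.new(xml)

# Once the tree exists, it can be queried and traversed freely.
titles = REXML::XPath.match(doc, '//book/title').map(&:text)
ids = []
doc.root.elements.each('book') { |book| ids << book.attributes['id'] }

puts titles.inspect
puts ids.inspect
```

The convenience comes at the cost described above: every element becomes an object, and all of them are held in memory at once.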

SAX lets you register callbacks to call when an XML reader encounters
certain tags. It treats the input XML as a stream, so zipping past nodes
you don’t need is very fast.

But I don’t know the Ruby SAX solution!

On Jul 19, 9:03 pm, Phillip O. wrote:

hi,

you may enjoy reading this!
http://www.rubyinside.com/ruby-xml-crisis-over-libxml-0-8-0-released-
(posted two days ago)

FYI, there is still some final fine-tuning going on, so don’t expect
everything to be all roses quite yet. But we are close, and might
actually get to a 1.0.0 release soon.

T.

hi,

you may enjoy reading this!
http://www.rubyinside.com/ruby-xml-crisis-over-libxml-0-8-0-released-955.html
(posted two days ago)

kind regards,
phillip


On 20.07.2008 at 02:34, Phlip wrote:

But I don’t know the Ruby SAX solution!

REXML supports a “SAX-like” stream listening interface as well as DOM.
See the REXML tutorial at
http://www.germane-software.com/software/rexml/docs/tutorial.html
and scroll down until you see the section headed “Stream Parsing”. The
upshot is that you write a class that has callback methods (see
http://www.germane-software.com/software/rexml/doc/classes/REXML/StreamListener.html
for a complete list of callbacks) and pass an instance of the class to
REXML’s parse_stream method. REXML also supports a SAX2 API, but I have
never used it. Look for the heading “SAX2 Stream Parsing” in the
tutorial link above.

I recently converted a poor DOM-based parsing solution to a stream
listener based solution (not SAX2) and saw an order of magnitude
improvement in performance.
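A minimal sketch of that stream listener approach. The class name `TitleCollector` and the sample document are invented for illustration; `REXML::StreamListener` and `REXML::Document.parse_stream` are the pieces described above.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: collects the text of every <title> element.
class TitleCollector
  include REXML::StreamListener  # supplies no-op versions of all callbacks

  attr_reader :titles

  def initialize
    @titles = []
    @in_title = false
  end

  # Called for each opening tag.
  def tag_start(name, _attrs)
    return unless name == 'title'
    @in_title = true
    @titles << +''
  end

  # Called for character data; it may arrive in more than one chunk.
  def text(data)
    @titles.last << data if @in_title
  end

  # Called for each closing tag.
  def tag_end(name)
    @in_title = false if name == 'title'
  end
end

# Invented sample document for illustration.
xml = <<~XML
  <library>
    <book id="1"><title>Programming Ruby</title></book>
    <book id="2"><title>The Ruby Way</title></book>
  </library>
XML

listener = TitleCollector.new
REXML::Document.parse_stream(xml, listener)
puts listener.titles.inspect
```

No document tree is built here; the listener only keeps what it chooses to keep, which is presumably where the speedup comes from.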

Saludos,

-Doug

Trans wrote:

you may enjoy reading this!
http://www.rubyinside.com/ruby-xml-crisis-over-libxml-0-8-0-released-
(posted two days ago)

Tx - that’s why my install today worked, right?

FYI, there is still some final fine-tuning going on, so don’t expect
everything to be all roses quite yet. But we are close, and might
actually get to a 1.0.0 release soon.

And to use it with assert_xpath you just gotta put invoke_libxml in your
setup…