XML in ruby

Pedro_CSSSSrte-Real · June 28, 2006, 7:47pm

What do you usually use to work with XML in Ruby? REXML seems to be
too slow and libxml too buggy. Is there any other option I should try?

Thanks,

Pedro.

Pedro_CSSSSrte-Real · June 29, 2006, 11:04am

Pedro Côrte-Real wrote:

What do you usually use to work with XML in ruby. REXML seems to be
too slow and libxml too buggy. Any other option I should try?

Unless you’ve got a specific reason not to (as in, what is it you’re
actually trying to achieve?), check out REXML’s StreamParser. It’s more
than fast enough for my needs, but I’m probably not a representative sample.

Then again, who is?

Pedro_CSSSSrte-Real · June 29, 2006, 1:59pm

Pedro Côrte-Real wrote:

I am using the stream parser. And although that seems a little slow
it’s ok. What I’m stuck with being too slow is the XPath support. If I
can’t get libxml to work I guess I’ll have to end up creating a small
ruby extension just to wrap the C libxml xpath support. My needs are
very simple so this should be easy, but I was hoping there was another
way.

Oh, I see… Sorry, I can’t help you there…

Pedro_CSSSSrte-Real · June 29, 2006, 12:07pm

On 6/29/06, Alex Y. wrote:

Pedro Côrte-Real wrote:

What do you usually use to work with XML in ruby. REXML seems to be
too slow and libxml too buggy. Any other option I should try?

Unless you’ve got a specific reason not to (as in, what is it you’re
actually trying to achieve?), check out REXML’s StreamParser. More than
fast enough for my needs, but I’m probably not a representative sample.

I am using the stream parser. And although that seems a little slow
it’s ok. What I’m stuck with being too slow is the XPath support. If I
can’t get libxml to work I guess I’ll have to end up creating a small
ruby extension just to wrap the C libxml xpath support. My needs are
very simple so this should be easy, but I was hoping there was another
way.

Pedro.

Pedro_CSSSSrte-Real · June 29, 2006, 4:14pm

On 6/29/06, Mark Van H. wrote:

My co-workers and I recently converted a bunch of rexml code to libxml. The
speed increase was dramatic ( 100-1000 times faster ). We have not run into
any stability issues. We use libxml to read, search, delete/change nodes and
values, and write out new files, all with no issues. What kind of issues are
you hitting while using libxml?

I did something as simple as:

parser = XML::Parser.new
parser.string = mydocstring
doc = parser.parse

That last line blew up with a segfault. I can do the same in irb and
it works although it happened once when exiting irb. Seems to be a
race condition of some sort.

100-1000 times faster seems great. If it worked well I’d convert
xmlcodec over to it.

Pedro.

Pedro_CSSSSrte-Real · June 29, 2006, 3:58pm

My co-workers and I recently converted a bunch of rexml code to libxml. The
speed increase was dramatic (100-1000 times faster). We have not run into
any stability issues. We use libxml to read, search, delete/change nodes and
values, and write out new files, all with no issues. What kind of issues are
you hitting while using libxml?

Mark

Pedro_CSSSSrte-Real · June 29, 2006, 4:49pm

I am using the stream parser. And although that seems a little slow
it’s ok. What I’m stuck with being too slow is the XPath support. If I
can’t get libxml to work I guess I’ll have to end up creating a small
ruby extension just to wrap the C libxml xpath support. My needs are
very simple so this should be easy, but I was hoping there was another
way.

You might take a look at http://teius.rubyforge.org. I’ve been very
happy with it.

From the teius wiki homepage:

Teius is really a tiny Ruby wrapper around the LibXML C library.

Pedro_CSSSSrte-Real · June 29, 2006, 5:02pm

On 6/29/06, Gordon T. wrote:

From the teius wiki homepage:
Teius is really a tiny Ruby wrapper around the LibXML C library.

Seems great, but I couldn’t install it. The gem threw a bunch of
errors when trying to compile. And it seems it only supports reading
from a file and not an IO or string. I’ll have to look at the code.

Pedro.

Pedro_CSSSSrte-Real · June 29, 2006, 10:12pm

2006/6/29, Pedro Côrte-Real:

I am using the stream parser. And although that seems a little slow
it’s ok. What I’m stuck with being too slow is the XPath support. […] My needs are
very simple so this should be easy, but I was hoping there was another
way.

If your needs are so simple then you should be able to handle this
with the stream parser - you’ll likely only have to remember the nodes
currently on the stack (probably along with their attributes, depending
on what criteria you have to apply) and then decide what to do with the
current node.
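
A minimal sketch of that stack idea, assuming the "XPath" is nothing more than a fixed chain of element names from the root. The PathCollector class, its target_path argument and the sample XML are hypothetical and only for illustration; only REXML::StreamListener and REXML::Document.parse_stream come from REXML itself.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: collects the text of every element whose path
# from the root equals a fixed list of names, e.g. %w[library book title].
class PathCollector
  include REXML::StreamListener

  attr_reader :matches

  def initialize(target_path)
    @target_path = target_path
    @stack = []    # names of the currently open elements, root first
    @matches = []  # text found at the target path
  end

  # Called for every opening tag: remember it on the stack.
  def tag_start(name, _attrs)
    @stack.push(name)
  end

  # Called for character data: keep it only when the stack is the target path.
  def text(data)
    @matches << data if @stack == @target_path && !data.strip.empty?
  end

  # Called for every closing tag: forget the innermost element.
  def tag_end(_name)
    @stack.pop
  end
end

xml = '<library><book><title>Ruby</title></book>' \
      '<book><title>XML</title></book></library>'
collector = PathCollector.new(%w[library book title])
REXML::Document.parse_stream(xml, collector)
puts collector.matches.inspect
```

This only covers simple child-axis paths. Predicates, wildcards and the other XPath axes would each need extra bookkeeping on top of the stack.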

HTH

robert

Pedro_CSSSSrte-Real · June 29, 2006, 5:11pm

On 6/29/06, Pedro Côrte-Real wrote:

Seems great, but I couldn’t install it. The gem threw a bunch of
errors when trying to compile. And it seems it only supports reading
from a file and not an IO or string. I’ll have to look at the code.

It does support reading from a string with #parse_string. I got it to
install by changing extconf.rb, which was looking in /usr/include/libxml
instead of /usr/include/libxml2. It throws some signedness warnings
when compiling but works fine otherwise.

Pedro.

Pedro_CSSSSrte-Real · July 10, 2006, 6:40pm

On 6/29/06, Pedro Côrte-Real wrote:

parser.string = mydocstring
doc = parser.parse

I was able to use it for a while but it seems it suffers from race
conditions. I got this today:

./imports/…/config/…/app/models/xmldoc.rb:20: [BUG] rb_gc_mark():
unknown data type 0x38(0x8bbb4c8) non object
ruby 1.8.4 (2005-12-24) [i486-linux]

The mailing list seems dead as well. I’m going to try to use teius.

Pedro.

Pedro_CSSSSrte-Real · June 30, 2006, 11:49am

On 6/29/06, Robert K. wrote:

current node.

Not really, because I want to support arbitrary XPaths and I’m not
going to implement an XPath engine by myself. The XPaths aren’t an
internal thing, they’re defined in a config file to get stuff from the
XML, so I want full XPath support.
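
For readers following along, here is a rough sketch of the config-driven XPath setup described above. It uses REXML only because it ships with Ruby and runs anywhere (the thread's complaint about its XPath speed still applies), and the same shape should carry over to a libxml binding. The YAML layout, the field names, the extract helper and the sample document are all assumptions made for illustration, not the actual xmlcodec configuration.

```ruby
require 'rexml/document'
require 'yaml'

# Hypothetical config: each field name maps to an XPath expression.
# In a real setup this text would live in a separate config file.
CONFIG_YAML = <<~YAML
  title: "/doc/head/title"
  authors: "//author"
YAML

# Evaluate every configured XPath against the document and return
# a hash of field name => array of matching element texts.
def extract(xml_string, xpaths)
  doc = REXML::Document.new(xml_string)
  xpaths.each_with_object({}) do |(field, xpath), result|
    result[field] = REXML::XPath.match(doc, xpath).map(&:text)
  end
end

xml = '<doc><head><title>Report</title></head>' \
      '<author>Ann</author><author>Bob</author></doc>'
fields = extract(xml, YAML.safe_load(CONFIG_YAML))
puts fields.inspect
```

Because the expressions are plain strings, changing what gets pulled out of the XML means editing the config rather than the code, which is why a full XPath engine is needed here instead of a hand-rolled matcher.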

Pedro.