On 03.01.2007 10:57, Imobach González Sosa wrote:
These are your options as far as I can see at the moment:
- use REXML DOM
You can read the file with a single line of code and get a document object. You
can then extract attributes via XPath or direct traversal. In this case
you need zero code for the parsing / reading but the extraction might be
a bit more complex than you want (using XPath expressions is more
complex than just using obj.some_attribute).
- use REXML stream parser with Hash
You will have to code a generic listener to XML stream parsing events
(not too complicated) that basically transfers data seen into a tree of
Hash instances. XML element names become Hash keys and nested elements
become Hashes as values. You probably need additional (fixed) keys for
storing XML attributes and a link to the parent element.
- use REXML stream parser with OpenStruct
Basically the same as option 2 but you use OpenStruct instead of Hash.
- use REXML stream parser with report on the fly
Depending on your reporting needs you might not have to construct an
object tree at all but can create your report while you go through the
file. This is the most efficient approach for large files.
- use REXML stream parser with custom classes on the fly
You implement a listener for XML stream parsing events that will create
custom classes on the fly, i.e. when it sees a “Person” it will create
class Person; then when it sees a nested element it will create an
attribute for it etc. This is the most complex option and I think it’s
probably not worth the effort.
- XML Mapping
Not sure whether that fits your needs or is stable at all:
Or this one
Other XML related libs:
- use an XSLT tool
You could also use an XSLT tool to create your report as a
transformation of your input file. Whether that works depends on your
Depending on what kinds of reports you want to do, option 1 or 2/3 might
be the most efficient. Especially if you just want to do something like
“task x has n sub tasks” that can be easily accomplished with XPath.
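
To illustrate the XPath route (option 1), here is a minimal sketch. The input document and the element and attribute names ("project", "task", "name") are made up for the example, since I don't know what your files look like.

```ruby
require 'rexml/document'

# Hypothetical input; element and attribute names are assumptions.
xml = <<~XML
  <project>
    <task name="x">
      <task name="x1"/>
      <task name="x2"/>
    </task>
    <task name="y"/>
  </project>
XML

# Reading and parsing is the single line mentioned above.
doc = REXML::Document.new(xml)

# For every top-level task, count its direct <task> children via XPath.
report = REXML::XPath.match(doc, '/project/task').map do |task|
  subtasks = REXML::XPath.match(task, 'task').size
  "task #{task.attributes['name']} has #{subtasks} sub tasks"
end

puts report
```

For a real file you would pass an open File to REXML::Document.new instead of the string.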
If those XML files are large, then you should go with one of the stream
parsing approaches because they are more memory efficient. Also you can
filter the data while you go (i.e. ignore attributes and nested elements
you are not interested in) and thus further optimize your app. The most
efficient is certainly option 4 if it’s feasible.
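
A rough sketch of option 4 (report on the fly) could look like the following. Again, the "task" element, its "name" attribute and the TaskCounter class are assumptions for illustration only; the listener just keeps counters and never builds a tree.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: counts direct sub tasks of each top-level task
# while the parser streams through the input.
class TaskCounter
  include REXML::StreamListener

  attr_reader :counts

  def initialize
    @counts = {}  # top-level task name => number of direct sub tasks
    @stack = []   # names of the <task> elements currently open
  end

  def tag_start(name, attrs)
    return unless name == 'task'  # filter: ignore all other elements
    case @stack.size
    when 0 then @counts[attrs['name']] = 0   # a new top-level task
    when 1 then @counts[@stack.first] += 1   # a direct sub task
    end
    @stack.push(attrs['name'])
  end

  def tag_end(name)
    @stack.pop if name == 'task'
  end
end

# Hypothetical input; element and attribute names are assumptions.
xml = <<~XML
  <project>
    <task name="x">
      <task name="x1"/>
      <task name="x2"/>
    </task>
    <task name="y"/>
  </project>
XML

counter = TaskCounter.new
REXML::Document.parse_stream(xml, counter)
counter.counts.each { |task, n| puts "task #{task} has #{n} sub tasks" }
```

The stack is only there to tell top-level tasks from nested ones. For large files you would hand an open File to parse_stream rather than a string, so the input does not have to sit in memory as a whole.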
Note, there are other XML parsers for Ruby around. I prefer to start
with REXML and only convert to something else if performance is an issue
simply because it’s part of the standard distribution and has a nice