REXML and some rather basic problems

Hi,

I’ve been playing around with Ruby for a while now, but wouldn’t consider
myself an experienced user.
For a new project I want to use an XML parser to extract some information
from a file. I understand that REXML is the tool of choice and that it
offers several ways to perform this task (tree parsing, stream parsing).

My question concerns how to access multiple children of an element in
one go. I guess that requires some explanation:

This is, in principle, what the XML source looks like:

... ... ... ... ...

and so on.

In reality, we are dealing with a file that holds information about
genes: their names, their locations and some other features. Each gene
needs to be dealt with individually (e.g. by iterating), as I have some
methods that need to be applied to each entry, or rather to some of its
features. What I can’t figure out is this:

How do I get the entry (I figure it’s Element.elements.each('entry')) and
then, in the same “go”, also some, but not all, of its children? These
children are at different levels, too. If I use
Element.elements.each('entry'), the whole entry gets stored as one
element in an array. That on its own is not a big problem, but at that
point I haven’t even touched the children yet. If I try to treat the
entry further as if I were dealing with XML code (i.e. filter for
elements), it won’t work. But isn’t there a way other than normal array
methods and simple text parsing to get the children?
Most, if not all, of the tutorials I found focus specifically on
attributes after filtering on the “primary” level, which is no good to
me since I don’t have any attributes in my XML file (although having
those would make things much easier…).

Or in other words:
How can I filter for an entry and then puts (or store in a variable)
some of its children, something like
Element.elements.each('entry') { |output| puts output('//feature1', '//feature2') }
The last bit is obviously nonsense, but it is in principle what I am
looking for.

Anyhow, I hope someone understands what I am trying to say here and can
point me in the right direction :)

Cheers,
Marc

Hi Marc,

There are multiple ways to go about what you appear to be asking for,
but the main ambiguity in your question is why you want to select the
target elements all “in one go”. If this is an absolute requirement,
perhaps you could explain the rationale behind it a bit more.

If it is not a requirement, then you can simply write an event-based
parser (one approach) and store only the information that interests you
in a “placeholder” or “bookkeeping” variable. If there are multiple
subsets of information that you want to extract, then just save those
subsets into multiple variables, or save them into multiple attributes
of a single “bookkeeping object” that converts the XML into a native
Ruby object.

Probably the best way to illustrate is by way of example code.

Here’s a sample link:

http://www.janvereecken.com/2007/4/11/event-driven-xml-parser-in-ruby
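
For readers who cannot follow the link, here is a minimal sketch of the
event-based approach described above, using REXML's stream parser. The
sample data, the nesting under a "details" element and the class name
EntryListener are assumptions made up for illustration; only the names
entry, feature1 and feature2 come from this thread. On recent Ruby
versions REXML ships as a bundled gem, so it may need to be installed
separately.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical sample data; the real file layout is not shown in the thread.
sample_xml = <<~EOS
  <genes>
    <entry>
      <feature1>geneA</feature1>
      <details>
        <feature2>chr1:100-200</feature2>
        <feature3>ignored</feature3>
      </details>
    </entry>
    <entry>
      <feature1>geneB</feature1>
      <details>
        <feature2>chr2:300-400</feature2>
      </details>
    </entry>
  </genes>
EOS

# Hypothetical "bookkeeping object": collects one hash per <entry>,
# keeping only the wanted child elements, whatever their depth.
class EntryListener
  include REXML::StreamListener

  WANTED = %w[feature1 feature2].freeze
  attr_reader :entries

  def initialize
    @entries = []
    @current = nil # hash for the entry currently being read
    @tag = nil     # name of the wanted element we are inside, if any
  end

  def tag_start(name, _attrs)
    if name == 'entry'
      @current = {}
    elsif @current && WANTED.include?(name)
      @tag = name
      @current[name] = String.new
    end
  end

  def text(data)
    @current[@tag] << data if @tag
  end

  def tag_end(name)
    if name == 'entry'
      @entries << @current
      @current = nil
    elsif name == @tag
      @tag = nil
    end
  end
end

listener = EntryListener.new
REXML::Document.parse_stream(sample_xml, listener)
listener.entries.each { |entry| puts entry.inspect }
```

Each entry ends up as a plain Ruby hash, so any per-entry methods can
be applied to it without touching the XML again.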

HTH

Thanks for the link!

What I did, after some sweat and tears, is the following:

For each entry returned by Element.elements.each, I created a new
document via Document.new, using the entry converted to a string as
input.

Like so:

Element.elements.each('//entry') do |solution|
  eachentry = Document.new solution.to_s
  eachentry.elements.each('feature1') { |somemethod| print somemethod.text }
end

Probably far from elegant, but it actually does what I need.
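
As a possible alternative to re-parsing each entry: the objects yielded
by elements.each are themselves REXML elements, so they can be queried
directly. One thing to watch is that, in XPath, a path starting with //
is evaluated from the document root, whereas a path starting with .//
stays below the current entry. The sketch below assumes a made-up file
layout (the "genes" and "details" elements are hypothetical; only entry,
feature1 and feature2 come from this thread).

```ruby
require 'rexml/document'

# Hypothetical sample data; the real file layout is not shown in the thread.
sample_xml = <<~EOS
  <genes>
    <entry>
      <feature1>geneA</feature1>
      <details>
        <feature2>chr1:100-200</feature2>
      </details>
    </entry>
    <entry>
      <feature1>geneB</feature1>
      <details>
        <feature2>chr2:300-400</feature2>
      </details>
    </entry>
  </genes>
EOS

doc = REXML::Document.new(sample_xml)
results = []

doc.elements.each('//entry') do |entry|
  name     = entry.elements['feature1']    # direct child of this entry
  location = entry.elements['.//feature2'] # any depth below this entry
  # elements[] returns nil when nothing matches, so guard before .text
  results << [name && name.text, location && location.text]
end

results.each { |name, location| puts "#{name}: #{location}" }
```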

The reason I need this in one go is that I am generating some
additional information from the actual entries which, among other
things, uses a counter that has to be reset for each entry (1, 2, 3
and so on).
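
A per-entry counter of this kind could be sketched as follows. This is
only a guess at the use case: the "name" and "exon" elements and the
label format are hypothetical, chosen to show a counter that restarts
at 1 for every entry.

```ruby
require 'rexml/document'

# Hypothetical data: each entry holds a varying number of <exon> children.
genes_xml = <<~EOS
  <genes>
    <entry><name>geneA</name><exon>10-20</exon><exon>30-40</exon><exon>50-60</exon></entry>
    <entry><name>geneB</name><exon>5-15</exon></entry>
  </genes>
EOS

labels = []

REXML::Document.new(genes_xml).elements.each('//entry') do |entry|
  name = entry.elements['name'].text
  counter = 0 # local to the block, so it restarts for every entry
  entry.elements.each('exon') do |exon|
    counter += 1
    labels << "#{name}.#{counter} #{exon.text}"
  end
end

puts labels
```

Because the counter is declared inside the outer block, no explicit
reset is needed between entries.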

Anyhow, thanks again!

/Marc