Parsing through XML with REXML/XPath

bodikp · September 25, 2007, 4:34pm

Hi,
I need to sort groups of xml data based on the first instances of
particular elements down deep in the element structure of documents.
regs = []
regs = XPath.match(doc, "//registration")
regs.each do |reg|
  codes = XPath.match(regs, "//issue[1]/") # { |element| puts element.text }
  puts codes
end
I’m getting:
Energy/Nuclear
Education
Agriculture
…

There are only 86 entries in the document, but I’m getting
over 2,550 results here for “puts codes”! Obviously, it’s looping and I
don’t know why. It is pulling just the first entries, which I want, but
obviously it’s doing it lots and lots of times.

I’m also trying to parse out these results so that I only end up with
the actual element text, not any attributes. So, for example, in the
above results, I only want “Energy/Nuclear,” “Education,” and “Agriculture,”
not any of the surrounding stuff. So I’ve tried this, inside the above:
codes.each do |code|
  code.to_s.gsub!(/(.?)/.*</issue>/, "$1")
  puts code
end

Thanks,
Peter

bodikp · September 25, 2007, 4:44pm

On Sep 25, 2007, at 9:34 AM, Peter B. wrote:

> I’m getting:
> obviously, it’s doing it lots and lots of times.

I’m not sure what you mean by “sort groups of xml data based on the
first instance of particular elements”. Can you explain that more?
Feel free to email me off-list if you’d like, since this isn’t really
a Ruby question.

> I’m also trying to parse out these results, so that, I only end up
> with the actual element text, not any attributes.

To ask for just the text inside an element, you need to use “text()”.
For example, this would give you a node set containing all the text
in the issue elements.

/registration//issue/text()

Note that you should only use one slash at the beginning to specify
that the root element should be “registration”.
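
As a rough sketch of how that path might be used from REXML: the sample XML below is made up (the real file isn't shown in this thread), and the `filing` element and `code` attribute are hypothetical stand-ins for whatever actually surrounds the issue elements.

```ruby
require "rexml/document"

# Hypothetical sample document; only "registration" and "issue" come from the thread.
xml = <<~XML
  <registration>
    <filing>
      <issue code="ENG">Energy/Nuclear</issue>
      <issue code="EDU">Education</issue>
    </filing>
  </registration>
XML

doc = REXML::Document.new(xml)

# text() selects the text nodes themselves, so tags and attributes are left out.
# Text#value returns the unescaped string for each node.
issue_texts = REXML::XPath.match(doc, "/registration//issue/text()").map(&:value)
puts issue_texts
```

If "registration" turns out not to be the root element, the leading part of the path would need to change to match the real structure.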

Mark V.

bodikp · September 25, 2007, 5:36pm

Thanks, Mark. Yes, I’m sorry about the lingo. I’m not that versed in
XML-speak yet. Basically, I want to sort all of the
data sets in my files alphabetically. But, as I said, I need to
put some headings into my output, and those headings are essentially
the text that’s in the very first instance of the element. The
element is a child element to . Now, your
suggestion to use /text() worked, meaning I got just the text
that I want. That’s great. But it’s still looping and giving me many
repeats of these instances.
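
One likely cause of the repeats, offered as a guess since the real file isn't shown: a path that starts with "//" searches the whole document no matter which node it is given, so running it once per registration returns the same document-wide matches on every pass. A path relative to the current registration (".//issue") avoids that. The sketch below uses made-up sample data; the `filings` and `detail` elements and the `id` attribute are hypothetical, and `first_issue` is just an illustrative helper.

```ruby
require "rexml/document"

# Hypothetical sample data; only "registration" and "issue" come from the thread.
xml = <<~XML
  <filings>
    <registration id="1"><detail><issue>Energy/Nuclear</issue><issue>Taxes</issue></detail></registration>
    <registration id="2"><detail><issue>Agriculture</issue></detail></registration>
    <registration id="3"><detail><issue>Education</issue><issue>Budget</issue></detail></registration>
  </filings>
XML

doc = REXML::Document.new(xml)
regs = REXML::XPath.match(doc, "//registration")

# ".//issue" is evaluated relative to the given registration, and XPath.first
# returns only the first match in document order (or nil if there is none).
first_issue = lambda do |reg|
  node = REXML::XPath.first(reg, ".//issue")
  node ? node.text.to_s : ""
end

# Sort the registrations alphabetically by the text of their first issue.
sorted = regs.sort_by { |reg| first_issue.call(reg) }
headings = sorted.map { |reg| first_issue.call(reg) }
puts headings
```

With this approach each registration contributes exactly one heading, so 86 registrations should give at most 86 lines rather than thousands.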

Here’s my e-mail address, if you’d like to strike up a separate
conversation.

Thanks,
Peter
[email protected]
