How to best extract Ruby strings from REXML::Text instances?

Kenneth_McDonald · August 26, 2007, 1:46am

REXML is a great package and is making it very easy for me to extract
data from web pages. However, I’m having trouble with extracting a file
string value from text nodes. My understanding from reading the API doc
is that saying something like

p a_textnode.value

should print out the string value of the text node with special character
entities back-substituted, e.g. with " " put in place of "&nbsp;".
However, I’m getting the XML-style value, i.e. I’m getting something
like

&nbsp;15.16&nbsp;

printed to the terminal; the special character entities aren't being
substituted.

Am I misinterpreting what .value does? Is there a better or other way to
do this?

Thanks,
Ken

P.S. Can anyone recommend a good XPath quick reference or summary?

Kenneth_McDonald · August 27, 2007, 4:53pm

2007/8/26, Kenneth McDonald:

> REXML is a great package and is making it very easy for me to extract
> data from web pages. However, I’m having trouble with extracting a file
> string value from text nodes. My understanding from reading the API doc
> is that saying something like
> p a_textnode.value

I think you want “element.text”.

irb(main):008:0> t=REXML::Document.new("<foo>bar</foo>")
=> <UNDEFINED> ... </>
irb(main):009:0> t.root.text
=> "bar"
irb(main):010:0> t.root.text.class
=> String
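
To make the difference between the escaped and unescaped forms concrete, here is a minimal sketch. The markup and the helper names text_forms and decode_nbsp are made up for illustration, since the actual page is not shown in this thread. As far as I can tell, REXML only substitutes the five predefined XML entities, numeric character references, and entities declared in the document's DOCTYPE, so an undeclared &nbsp; is left as it is by both .value and element.text. The gsub at the end is just one possible workaround, not something REXML does for you.

```ruby
require 'rexml/document'

# Returns [escaped, unescaped] for the first text node under the root.
# Text#to_s gives the XML-escaped form, Text#value the plain Ruby string.
def text_forms(xml)
  node = REXML::Document.new(xml).root.get_text # a REXML::Text instance
  [node.to_s, node.value]
end

# Hypothetical helper: replace leftover &nbsp; references by hand with
# U+00A0 (no-break space), because REXML does not know this HTML entity
# unless the document declares it.
def decode_nbsp(str)
  str.gsub('&nbsp;', "\u00A0")
end

# Predefined entity: value is unescaped, to_s is not.
p text_forms('<td>a &lt; b</td>')

# Numeric character reference: value contains a real U+00A0.
p text_forms('<td>&#160;15.16&#160;</td>')

# Undeclared HTML entity: both forms still contain "&nbsp;".
escaped, unescaped = text_forms('<td>&nbsp;15.16&nbsp;</td>')
p unescaped
p decode_nbsp(unescaped)
```

If the pages always use &nbsp;, calling something like decode_nbsp on the result of element.text (and then strip, if the padding is unwanted) is probably the least invasive fix.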

> Am I misinterpreting what .value does? Is there a better or other way to
> do this?
>
> Thanks,
> Ken
>
> P.S. Can anyone recommend a good XPath quick reference or summary?

I use this frequently:
http://www.w3schools.com/xpath/

and sometimes this:
http://www.zvon.org/xxl/XPathTutorial/General/examples.html
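
For trying out expressions from those references with REXML itself, a small sketch like the following may help. The table markup, the class attribute, and the helper name price_texts are invented for the example; REXML::XPath.match and REXML::XPath.first are the library calls being shown.

```ruby
require 'rexml/document'

# Collects the text of every <td> whose class attribute is "price".
def price_texts(doc)
  REXML::XPath.match(doc, '//td[@class="price"]').map(&:text)
end

# Hypothetical markup, for illustration only.
xml = '<table><tr><td class="price">15.16</td><td>n/a</td></tr></table>'
doc = REXML::Document.new(xml)

p price_texts(doc)

# XPath.first returns only the first matching node (or nil).
first_cell = REXML::XPath.first(doc, '//td')
p first_cell.text
```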

Kind regards

robert