pozza
March 19, 2007, 12:38pm
1
Hi
I am using REXML to pull text from a NewsML document.
require 'rexml/document'
include REXML
file = File.new("Main_News.xml")
doc = Document.new(file)
root = doc.root
puts root.elements["NewsItem/NewsComponent/NewsComponent[1]/NewsComponent/ContentItem/DataContent/nitf/body/body.head/hedline/hl1"]
Gives me…
<hl1>Blueprint to cut emissions unveiled</hl1>
Is there an easy way (i.e. something in REXML) to pull just the text
without the containers <hl1> and </hl1>?
Paul
pozza
March 19, 2007, 12:52pm
2
Paul W. wrote:
root.elements["NewsItem/NewsComponent/NewsComponent[1]/NewsComponent/ContentItem/DataContent/nitf/body/body.head/hedline/hl1"]
Gives me…
<hl1>Blueprint to cut emissions unveiled</hl1>
Is there an easy way (i.e. something in REXML) to pull just the text
without the containers <hl1> and </hl1>?
If I understood correctly, you need the text content of the node rather
than the whole node. This can be accomplished with:
some_element.text
so you could do something like
root.elements.each(…your stuff_here…) {|e| puts e.text}
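A minimal sketch of the difference between printing the element and printing its text. The XML here is a made-up stand-in that models only the tail of the NewsML path, so the document and the shortened path are assumptions, not the real file:

```ruby
require 'rexml/document'

# Hypothetical stand-in for the NewsML file; only the end of the path is modeled.
xml = <<~XML
  <body>
    <body.head>
      <hedline>
        <hl1>Blueprint to cut emissions unveiled</hl1>
      </hedline>
    </body.head>
  </body>
XML

doc = REXML::Document.new(xml)
headline = doc.root.elements['body.head/hedline/hl1']

puts headline       # the whole element, tags included
puts headline.text  # only the text content

# elements.each takes an XPath and yields every matching element
doc.root.elements.each('body.head/hedline/hl1') { |e| puts e.text }
```

Element#text returns the first text child as a plain String, which is usually enough for a headline like this one.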
HTH,
Peter
__
http://www.rubyrailways.com :: Ruby and Web2.0 blog
http://scrubyt.org :: Ruby web scraping framework
http://rubykitchensink.ca/ :: The indexed archive of all things Ruby
pozza
March 19, 2007, 3:06pm
3
On Mar 19, 5:38 am, Paul W. wrote:
root.elements["NewsItem/NewsComponent/NewsComponent[1]/NewsComponent/ContentItem/DataContent/nitf/body/body.head/hedline/hl1"]
Gives me…
<hl1>Blueprint to cut emissions unveiled</hl1>
Is there an easy way (i.e. something in REXML) to pull just the text
without the containers <hl1> and </hl1>?
require 'rexml/document'
doc = REXML::Document.new("<root><kid>hello world</kid></root>")
p REXML::XPath.first( doc, '/root/kid/text()' )
#=> "hello world"
pozza
March 19, 2007, 3:25pm
4
On Mar 19, 8:04 am, "Phrogz" wrote:
require 'rexml/document'
doc = REXML::Document.new("<root><kid>hello world</kid></root>")
p REXML::XPath.first( doc, '/root/kid/text()' )
#=> "hello world"
Also, depending on your needs:
include REXML
doc = Document.new("<root><kid>hello</kid><kid>world</kid></root>")
p XPath.match( doc, '/root/kid/text()' )
#=> ["hello", "world"]
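For comparison, a short sketch that runs the same XPath through XPath.first, XPath.match and XPath.each on one document. The two-kid document is an assumed sample in the spirit of the one above:

```ruby
require 'rexml/document'

# Assumed sample document with two matching nodes
doc  = REXML::Document.new('<root><kid>hello</kid><kid>world</kid></root>')
path = '/root/kid/text()'

first_hit = REXML::XPath.first(doc, path)  # a single node (the first match)
all_hits  = REXML::XPath.match(doc, path)  # an Array of every match

# XPath.each yields each match in turn instead of building an Array
seen = []
REXML::XPath.each(doc, path) { |node| seen << node.to_s }

puts first_hit
puts all_hits.map(&:to_s).inspect
puts seen.inspect
```

Use first when exactly one node is expected, and match or each when the path can hit several.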
pozza
March 19, 2007, 4:39pm
5
Hey,
Two notes:
1. I always suggest the REXML::XPath methods over the others for
people who grok XPath.
2. A REXML::XPath.* … text() match will return a REXML::Text node,
which may not be what you want:
$ irb --simple-prompt foo.rb
>> require 'rexml/document'
=> true
>> doc = REXML::Document.new("<root><kid>hello world</kid></root>")
=> <UNDEFINED> ... </>
>> REXML::XPath.first( doc, '/root/kid/text()' )
=> "hello world"
>> REXML::XPath.first( doc, '/root/kid/text()' ).class
=> REXML::Text
Just something to be aware of (use .to_s if you want a string, as
usual).
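A small sketch of why the Text-versus-String distinction can matter. The sample document with an entity is an assumption added for illustration; as far as I know, to_s on a REXML::Text keeps entities escaped while value resolves them:

```ruby
require 'rexml/document'

# Assumed sample: the entity makes the to_s / value difference visible
doc  = REXML::Document.new('<root><kid>fish &amp; chips</kid></root>')
node = REXML::XPath.first(doc, '/root/kid/text()')

p node.class  # a REXML::Text node, not a String
p node.to_s   # String with entities still escaped
p node.value  # String with entities resolved

# Element#text goes through the element instead of a text() node
p REXML::XPath.first(doc, '/root/kid').text
```

For text without entities, such as "hello world", to_s and value give the same String.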
HTH,
Keith
pozza
March 19, 2007, 1:06pm
6
Peter S. wrote:
If I understood correctly, you need the text content of the node rather
than the whole node. This can be accomplished with:
some_element.text
You did understand correctly; .text on the end was all I needed.
Cheers
Paul
pozza
March 22, 2007, 5:44pm
7
require 'rexml/document'
doc = REXML::Document.new("<root><kid>hello world</kid></root>")
p REXML::XPath.first( doc, '/root/kid/text()' )
#=> "hello world"
Thanks for that, I’m now using REXML::XPath with a combination of .first
and .match to pull the element text out.
One more thing, given an XML document…
<root><kid stuff="some-other-text">hello world</kid></root>
What would be the path to the attribute 'stuff', so that it returns
'some-other-text'?
Paul
pozza
March 22, 2007, 5:56pm
8
On Mar 22, 10:44 am, Paul W. wrote:
One more thing, given an XML document…
<root><kid stuff="some-other-text">hello world</kid></root>
What would be the path to the attribute 'stuff', so that it returns
'some-other-text'?
require 'rexml/document'
include REXML
doc = Document.new( <<ENDDOC )
<root>
<kid stuff="some-other-text">hello world</kid>
<kid class="best" stuff="gibbles">hello world</kid>
</root>
ENDDOC
att = XPath.first( doc, '//kid/@stuff' )
p att, att.class, att.value
#=> stuff='some-other-text'
#=> REXML::Attribute
#=> "some-other-text"
p XPath.first( doc, '//kid[@class="best"]/@stuff' ).value
#=> "gibbles"
I don’t know what the XPath syntax is to select the value of an
attribute directly. I’d be interested to know if someone else knows it.
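This does not answer the pure-XPath question, but as a sketch of one alternative: select the element and read the attribute through Element#attributes, which hands back a plain String. The sample document is an assumed reconstruction of the one in this thread:

```ruby
require 'rexml/document'

# Assumed reconstruction of the sample document
doc = REXML::Document.new(<<~XML)
  <root>
    <kid stuff="some-other-text">hello world</kid>
    <kid class="best" stuff="gibbles">hello world</kid>
  </root>
XML

# Select the element, then look the attribute up by name
first_kid = REXML::XPath.first(doc, '//kid')
best_kid  = REXML::XPath.first(doc, '//kid[@class="best"]')

p first_kid.attributes['stuff']        # a String
p best_kid.attributes['stuff']         # a String
p first_kid.attributes['no-such-att']  # nil when the attribute is missing
```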
pozza
March 22, 2007, 6:02pm
9
Gavin K. wrote:
att = XPath.first( doc, '//kid/@stuff' )
I don’t know what the XPath syntax is to select the value of an
attribute directly. I’d be interested to know if someone else knows it.
Cheers, it was the kid/@stuff I needed…
puts XPath.first( doc, '/root/kid/@stuff' )
#=> some-other-text
Paul
pozza
March 22, 2007, 6:12pm
10
On Mar 22, 11:02 am, Paul W. wrote:
Gavin K. wrote:
att = XPath.first( doc, '//kid/@stuff' )
I don’t know what the XPath syntax is to select the value of an
attribute directly. I’d be interested to know if someone else knows it.
Cheers, it was the kid/@stuff I needed…
puts XPath.first( doc, '/root/kid/@stuff' )
#=> some-other-text
Nice, I didn’t realize that REXML::Attribute had such different output
for #inspect versus #to_s . It’s nice, then, that you don’t need to
call .value in this particular case. Just be aware that without
the .value call you still have an Attribute instance that can just be
treated as a string in some areas:
att = XPath.first( doc, '//kid/@stuff' )
puts att
#=> some-other-text
puts att.value + '-more'
#=> some-other-text-more
puts att + "-more"
#=> tmp.rb:17: undefined method `+' for stuff='some-other-text':REXML::Attribute (NoMethodError)
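A sketch of ways to get a real String out of the Attribute before concatenating. It assumes a one-element sample document like the one in this thread:

```ruby
require 'rexml/document'

# Assumed sample document
doc = REXML::Document.new('<root><kid stuff="some-other-text">hello world</kid></root>')
att = REXML::XPath.first(doc, '//kid/@stuff')

puts att.inspect           # name and value together, as p would show it
puts att.to_s + '-more'    # explicit conversion, then String#+
puts "#{att}-more"         # interpolation calls to_s for you
puts att.value + '-more'   # value is also a String

# The Attribute itself has no + method, hence the NoMethodError above
puts att.respond_to?(:+)
```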