Using XPath to retrieve an XML element which contains a given text

This code returns the first dataformat element.
And yet the second dataformat is the one containing SPPT.
What am I doing wrong?

require "rexml/document"

include REXML

string = <<EOF



CFMT




SPPT



EOF

doc = Document.new string
xpathquery="//dataformat[contains(fileidentifier, SPPT)]"
p XPath.first(doc,xpathquery).to_s

On Aug 10, 9:37 pm, anne001 wrote:

>   <dataformat>
>
> EOF
>
> doc = Document.new string
> xpathquery="//dataformat[contains(fileidentifier, SPPT)]"
> p XPath.first(doc,xpathquery).to_s

I think your XPath query should be:
xpathquery="//dataformat[contains(., 'SPPT')]"

or, more specifically:
xpathquery="//dataformat[contains(fileidentifiers/fileidentifier,'SPPT')]"
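
A note on why the original query matched the first element: an unquoted SPPT in XPath is a name test (a child element called SPPT), not a string. It selects nothing, the empty node-set converts to the empty string, and contains(x, '') is true for any x, so every dataformat qualifies. Below is a minimal sketch of the difference. The sample document is an assumption, since the XML in the original post did not survive; in particular the root element name "root" is hypothetical.

```ruby
require "rexml/document"

# Assumed sample document; the root element name is hypothetical.
xml = <<EOF
<root>
  <dataformat>
    <fileidentifiers>
      <fileidentifier>CFMT</fileidentifier>
    </fileidentifiers>
  </dataformat>
  <dataformat>
    <fileidentifiers>
      <fileidentifier>SPPT</fileidentifier>
    </fileidentifiers>
  </dataformat>
</root>
EOF

doc = REXML::Document.new(xml)

# Unquoted: SPPT is read as a child element name, not as a string.
unquoted = REXML::XPath.match(doc, "//dataformat[contains(fileidentifier, SPPT)]")

# Quoted: 'SPPT' is a string literal, searched for in the element's text.
quoted = REXML::XPath.match(doc, "//dataformat[contains(., 'SPPT')]")

puts "unquoted query matched #{unquoted.size} element(s)"
puts "quoted query matched #{quoted.size} element(s)"
puts quoted.first
```

Comparing the two counts on your own file should show quickly whether a missing pair of quotes is the culprit.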

Thank you, the first formulation works.

I tried the second one on the complete XML file and it does not work.
Do you have an idea why? Is there a typo I am not seeing?

Here is a test file a little closer to the XML file I am working with:

require "rexml/document"
include REXML

string = <<EOF


NARSAD recognition

NARSAD



SPFT

SPFT
SPPT



EOF

doc = Document.new string

xpathquery="//dataformat[contains(., 'SPPT')]"
p 'yours1'
p XPath.first(doc,xpathquery).to_s

xpathquery="//dataformat[contains(fileidentifiers/fileidentifier,'SPPT')]"
p 'yours2'
p XPath.first(doc,xpathquery).to_s

Result:
"yours1"
"\n\t\tSPFT\n\t\t\n\t\t\tSPFT\n\t\t\tSPPT</fileidentifier>\n\t\t\n\t"
"yours2"
""

On 11.08.2008 17:11, Robert K. wrote:

> I believe “contains” is the wrong function as it does a textual
> comparison and I have no idea whether a node is actually allowed as
> input. I believe the correct XPath expression is this:

Wait, change “correct” to “more appropriate”.

> "//dataformat[descendant::fileidentifier[text()='SPPT']]"
>
> Here are some expressions that you may want to try:

Here are even more that yield the result you want (or so I believe):

[
  "//dataformat[descendant::fileidentifier[text()='SPPT']]",
  "//dataformat[fileidentifiers/fileidentifier[text()='SPPT']]",
  "//dataformat[descendant::fileidentifier[contains(text(),'SPPT')]]",
  "//dataformat[fileidentifiers/fileidentifier[contains(text(),'SPPT')]]",
  "//dataformat[descendant::fileidentifier[starts-with(text(),'SPPT')]]",
  "//dataformat[fileidentifiers/fileidentifier[starts-with(text(),'SPPT')]]",
  "//dataformat[descendant::fileidentifier[ends-with(text(),'SPPT')]]",
  "//dataformat[fileidentifiers/fileidentifier[ends-with(text(),'SPPT')]]",
].each do |xpath|
  printf "\nXPath: %p\n\n", xpath

  XPath.each doc, xpath do |elm|
    puts elm
  end
end

Interestingly ends-with() does not seem to work. Maybe we hit a REXML
bug.
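
A likely explanation other than a bug: ends-with() is not part of XPath 1.0 (it was added in XPath 2.0), and as far as I know REXML only implements XPath 1.0. One possible workaround is to select the candidates with XPath and do the suffix test in Ruby. The sketch below assumes a made-up sample document; the root element name "root" is hypothetical.

```ruby
require "rexml/document"

# Assumed sample document; the root element name is hypothetical.
xml = <<EOF
<root>
  <dataformat>
    <fileidentifiers>
      <fileidentifier>CFMT</fileidentifier>
    </fileidentifiers>
  </dataformat>
  <dataformat>
    <fileidentifiers>
      <fileidentifier>SPFT</fileidentifier>
      <fileidentifier>SPPT</fileidentifier>
    </fileidentifiers>
  </dataformat>
</root>
EOF

doc = REXML::Document.new(xml)

# Select every dataformat with XPath, then keep those that have at least
# one fileidentifier whose text ends with the suffix (tested in Ruby).
matches = REXML::XPath.match(doc, "//dataformat").select do |dataformat|
  REXML::XPath.match(dataformat, "fileidentifiers/fileidentifier").any? do |fi|
    fi.text.to_s.end_with?("SPPT")
  end
end

matches.each { |elm| puts elm }
```

The to_s guards against empty fileidentifier elements, for which text returns nil.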

XPath nicely fits Ruby because of TIMTOWTDI. :-)

Kind regards

robert

Hi Anne,

welcome back!

2008/8/11 anne001:

>            <fileidentifiers>
>
> p 'yours1'
> p XPath.first(doc,xpathquery).to_s
>
> xpathquery="//dataformat[contains(fileidentifiers/fileidentifier,'SPPT')]"
> p 'yours2'
> p XPath.first(doc,xpathquery).to_s

I believe “contains” is the wrong function as it does a textual
comparison and I have no idea whether a node is actually allowed as
input. I believe the correct XPath expression is this:

"//dataformat[descendant::fileidentifier[text()='SPPT']]"
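
On whether a node is allowed as input: as far as I know it is. XPath 1.0 converts a node-set argument to a string by taking the string-value of its first node in document order. That would explain the empty "yours2" result, because in the second test file the first fileidentifier is SPFT, so only SPFT is ever compared. The sketch below uses a cut-down document in which the element names "root" and "name" are my own guesses; only dataformat, fileidentifiers and fileidentifier come from the thread.

```ruby
require "rexml/document"

# Assumed sample document; "root" and "name" are hypothetical element names.
xml = <<EOF
<root>
  <dataformat>
    <name>SPFT</name>
    <fileidentifiers>
      <fileidentifier>SPFT</fileidentifier>
      <fileidentifier>SPPT</fileidentifier>
    </fileidentifiers>
  </dataformat>
</root>
EOF

doc = REXML::Document.new(xml)

# contains() only sees the first fileidentifier ("SPFT") of the node-set.
first_only = REXML::XPath.first(
  doc, "//dataformat[contains(fileidentifiers/fileidentifier, 'SPPT')]")

# Moving the test into a predicate checks each fileidentifier in turn.
each_node = REXML::XPath.first(
  doc, "//dataformat[fileidentifiers/fileidentifier[contains(text(), 'SPPT')]]")

puts "node-set passed to contains(): #{first_only.nil? ? 'no match' : 'match'}"
puts "test inside a predicate:       #{each_node.nil? ? 'no match' : 'match'}"
```

The predicate form is the same idea as the expressions in this thread: the comparison runs once per fileidentifier instead of once on the first one.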

Here are some expressions that you may want to try:

# find the correct fileidentifier
XPath.each doc, "//fileidentifier[text()='SPPT']" do |elm|
  puts elm
end

puts '-------------'

# go upwards from there to find the dataformat node
XPath.each doc, "//fileidentifier[text()='SPPT']/ancestor::dataformat" do |elm|
  puts elm
end

puts '-------------'

# select all dataformats that contain a fileidentifier with text "SPPT"
# this seems to best reflect what you want
XPath.each doc,
  "//dataformat[descendant::fileidentifier[text()='SPPT']]" do |elm|
  puts elm
end

Btw, I have these bookmarked and they serve me well with regard to
XPath issues (I always have to look them up):
http://www.w3schools.com/xpath/default.asp
http://www.zvon.org/xxl/XPathTutorial/General/examples.html

(I use the first one most of the time.)

Kind regards

robert