Nokogiri XPath

dubstep · November 24, 2011, 9:43am

I have a long XML file like the one below … I want to select the data (“CDEF” in this
case) when key = “English”.

What is the easiest way? The XML below is only part of a 100-page XML file.

Spanish description ABCDEF server` systems title Directir English description CDEF server producer title Update 66

rubymarc · November 24, 2011, 10:41am


Try this:

doc.xpath("//key[. = 'English']/following-sibling::topic/data")

=> [#<Nokogiri::XML::Element:0x4918efe name="data"
children=[#<Nokogiri::XML::Text:0x4918dfa "\n CDEF\n ">]>]

Jesus.

rubymarc · November 24, 2011, 12:30pm

Thanks a lot for the help. But it matched CDEF and all the nodes after it,
even where key != English.

Thanks again


rubymarc · November 24, 2011, 12:49pm

One way I can think of is to loop and break after getting the first value.
That's OK if there is only one English tag, but what if there are
multiple English tags in the long file?


rubymarc · November 24, 2011, 1:24pm

On Thu, Nov 24, 2011 at 12:30 PM, Ruby M. wrote:

Thanks a lot for help. But it matched CDEF and all nodes after that even
if key != english

I'm not sure why this is. I'm still trying to come up with a good
XPath expression that returns just that node,
but in the meantime you can do this:

doc.xpath("//key[. = 'English']/following-sibling::topic/data")[0]

Jesus.

rubymarc · November 24, 2011, 1:50pm

2011/11/24 Jesús Gabriel y Galán:

On Thu, Nov 24, 2011 at 12:30 PM, Ruby M. wrote:

Thanks a lot for help. But it matched CDEF and all nodes after that even
if key != english

I'm not sure why this is. I'm still trying to come up with a good
XPath that will return just that node,

Well, there could be many matches and from the original posting I
cannot see that only the first is needed.

but in the meantime you can do this:

doc.xpath("//key[. = 'English']/following-sibling::topic/data")[0]

Or better

doc.at_xpath("//key[. = 'English']/following-sibling::topic[1]/data")

I would probably do

doc.xpath('//topic[preceding-sibling::key[text()="English"]]//data')

or, for one hit only

doc.at_xpath('//topic[preceding-sibling::key[text()="English"]][1]//data')

I'm not sure about efficiency, but visually I prefer to have the path to
the selected node as the basis and to use criteria in [] for filtering.

If we want to be even more robust, we could do

doc.xpath('//topic[preceding-sibling::key[last() and text()="English"]]//data')

This will avoid matching the topic in

English
…
Foo
…

or

English
…

Kind regards

robert

PS: My favorite XPath help
http://www.w3schools.com/xpath/default.asp
http://www.zvon.org/xxl/XPathTutorial/General/examples.html

rubymarc · November 24, 2011, 4:33pm

On Thu, Nov 24, 2011 at 1:49 PM, Robert K. wrote:

2011/11/24 Jesús Gabriel y Galán:

On Thu, Nov 24, 2011 at 12:30 PM, Ruby M. wrote:

Thanks a lot for help. But it matched CDEF and all nodes after that even
if key != english

I'm not sure why this is. I'm still trying to come up with a good
XPath that will return just that node,

Well, there could be many matches and from the original posting I
cannot see that only the first is needed.

What I don't understand is why that XPath returns nodes whose
preceding key sibling doesn't have 'English' as its value.
I mean:

English CDEF Spanish ABC

Why does that XPath also return ABC? I would have thought that the
following sibling of the English key would only be the topic
containing CDEF, from which we are selecting the data
node.

doc.xpath('//topic[preceding-sibling::key[text()="English"]]//data')

or, for one hit only

doc.at_xpath('//topic[preceding-sibling::key[text()="English"]][1]//data')

I'm not sure about efficiency, but visually I prefer to have the path to
the selected node as the basis and to use criteria in [] for filtering.

I agree with you, and I would guess this is more efficient, since
Nokogiri doesn't have to return as many nodes.

Jesus.

rubymarc · November 24, 2011, 5:03pm

2011/11/24 Jesús Gabriel y Galán:

Well, there could be many matches and from the original posting I
cannot see that only the first is needed.

What I don’t understand is why that xpath returns nodes whose
preceding key sibling doesn’t have ‘English’ as value.

With the statement above I was referring to the case where there are
multiple pairs of key “English” and topic.


Why that xpath returns the ABC also. I would have thought that

Which XPath expression are you referring to here with “that xpath”?
If you mean this

irb(main):020:0> doc =
Nokogiri.XML(“12 3”)
=> …
irb(main):022:0> doc.xpath('//k/following-sibling::b').size
=> 3
irb(main):023:0> puts doc.xpath('//k/following-sibling::b')
1
2
3
=> nil

Then you get three matches but from different parents - even though
you cannot distinguish them immediately. If you want to only match
exactly one entry you need to add more criteria:

irb(main):024:0> doc.xpath('//k/following-sibling::b[1]').size
=> 2
irb(main):025:0> puts doc.xpath('//k/following-sibling::b[1]')
1
3
=> nil

following-sibling for English would only be the
CDEF, from which we are selecting the data
node.

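Generally, the *-sibling axes refer to all siblings, i.e. all child nodes
of the same parent node (following-sibling: all those after the context
node; preceding-sibling: all those before it):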

irb(main):016:0> doc = Nokogiri.XML("<a><k/><b>1</b><b>2</b></a>")
=> #<Nokogiri::XML::Document:0x832daa4 name="document"
children=[#<Nokogiri::XML::Element:0x832d810 name="a"
children=[#<Nokogiri::XML::Element:0x832d68a name="k">,
#<Nokogiri::XML::Element:0x832d568 name="b"
children=[#<Nokogiri::XML::Text:0x832d450 "1">]>,
#<Nokogiri::XML::Element:0x831c02e name="b"
children=[#<Nokogiri::XML::Text:0x831bf02 "2">]>]>]>

irb(main):017:0> doc.xpath('//k/following-sibling::b').size
=> 2

irb(main):019:0> puts doc.xpath('//k/following-sibling::b')
1
2
=> nil

See also the XPath resources I mentioned earlier.

Kind regards

robert

rubymarc · November 24, 2011, 1:53pm

On Thu, Nov 24, 2011 at 1:49 PM, Robert K. wrote:

doc.xpath('//topic[preceding-sibling::key[text()="English"]]//data')
doc.xpath('//topic[preceding-sibling::key[last() and text()="English"]]//data')
PPS: You can append /text() to directly get the text:

//topic[preceding-sibling::key[last() and text()="English"]]//data/text()

e.g.

doc.xpath('//topic[preceding-sibling::key[last() and text()="English"]]//data/text()').map {|x| x.text.strip}

rubymarc · November 25, 2011, 9:43am

On Thu, Nov 24, 2011 at 5:02 PM, Robert K. wrote:

irb(main):017:0> doc.xpath('//k/following-sibling::b').size
=> 2

irb(main):019:0> puts doc.xpath('//k/following-sibling::b')
1
2
=> nil

Hi,

Now I see what was wrong with my reasoning. I had misunderstood the
XML structure. Somehow, I thought that the only topic at the same
level as the key was the one we wanted to search. Looking more closely,
I realized that the key is at the same level as all the other topic
nodes in the document.

Thanks,

Jesus.
