REXML XPath: bug or misunderstanding?

xmg-eric · August 2, 2006, 3:01am

This code looks for a table that matches
specific criteria:

 XPath.each(@doc, '//table') do |tbl|
   XPath.each(tbl, '//td') do |td|
     # look for matching data

According to the XPath doc, the first argument
is the “context” element.

So what /should/ happen is that search
for td elements occurs in the subtree
rooted at the tbl element, using the
//td path --which I take to mean “anywhere
within the context”.

But what actually happens is that the
search for td elements occurs in the
entire document, so the code above returns
the first table, regardless of where the
matching data is found.

The workaround is to dispense with the
outer loop: once matching data is found,
keep visiting parents until the
table ancestor is found. (That patch
simplifies the code, actually.)
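
A minimal sketch of that workaround, assuming a made-up two-table document and a hypothetical helper named enclosing_table (neither appears in the original code): find the matching td anywhere in the document, then climb through its parents until a table element turns up.

```ruby
require 'rexml/document'

# Hypothetical sample data; the real document is not shown in the thread.
doc = REXML::Document.new(<<~XML)
  <body>
    <table id='t1'><tr><td>alpha</td></tr></table>
    <table id='t2'><tr><td>beta</td></tr></table>
  </body>
XML

# Hypothetical helper: walk up from a node until an enclosing <table>
# element is found. Returns nil when there is no such ancestor.
def enclosing_table(node)
  node = node.parent
  node = node.parent while node.is_a?(REXML::Element) && node.name != 'table'
  node.is_a?(REXML::Element) ? node : nil
end

# Search the whole document for the cell, then recover its table.
match = REXML::XPath.match(doc, '//td').find { |td| td.text == 'beta' }
table = enclosing_table(match)
puts table.attributes['id'] if table
```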

But if this implementation isn’t a bug, it
means that the definition of “context” is
“the entire tree in which the specified
node is found”.

In that case, the XPath expression that
asks for “all td elements under the
current node” must be something
other than what I coded…

What might that path be, I wonder?

xmg-eric · August 3, 2006, 12:12pm

Eric A. wrote:

So what /should/ happen is that search
for td elements occurs in the subtree
rooted at the tbl element, using the
//td path --which I take to mean “anywhere
within the context”.

But what actually happens is that the
search for td elements occurs in the
entire document, so the code above returns
the first table, regardless of where the
matching data is found.

That sounds like a bug.

The workaround is to dispense with the
outer loop: once matching data is found,
keep visiting parents until the
table ancestor is found. (That patch
simplifies the code, actually.)

IMHO the proper solution is to craft a single XPath expression that will
cover your requirement, i.e. any “td” somewhere below a “table”.

What might that path be, I wonder?

I’m not too involved with XPath but did you try something like this?

//table/*/td
//table//td

IMHO having a single XPath expression is the preferred way to go.

Also, I usually use method doc.elements.each ‘xpath here’ do … instead
of the specific XPath expression you used.
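
As a rough illustration of that style, here is a sketch that feeds the single expression //table//td to doc.elements.each. The sample document, including the stray td outside any table, is invented for the example.

```ruby
require 'rexml/document'

# Hypothetical sample data, including a <td> that is not inside a table.
doc = REXML::Document.new(<<~XML)
  <body>
    <table id='t1'><tr><td>alpha</td></tr></table>
    <div><td>stray</td></div>
  </body>
XML

# One expression, one loop: only cells somewhere below a <table> are visited.
cells = []
doc.elements.each('//table//td') { |td| cells << td.text }
puts cells.inspect
```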

Kind regards

robert

xmg-eric · August 3, 2006, 12:16pm

On 02/08/06, Robert K. wrote:

But what actually happens is that the
search for td elements occurs in the
entire document, so the code above returns
the first table, regardless of where the
matching data is found.

That sounds like a bug.

That’s what I thought when I first encountered it, but after looking
into it in more detail I concluded that it follows the specification.
The receiving element does indeed provide context, but using //
overrides it: // means the root node or any of its descendants.

The workaround is to dispense with the
outer loop: once matching data is found,
keep visiting parents until the
table ancestor is found. (That patch
simplifies the code, actually.)

IMHO the proper solution is to craft a single XPath expression that will
cover your requirement, i.e. any “td” somewhere below a “table”.

In general, that is indeed likely to be the best solution. However,
just to clear things up, here’s how the original poster could have
done what he wanted: use “descendant::td”. Here’s an example:

xml = %{

}

doc = REXML::Document.new(xml)

REXML::XPath.each(doc, '//b') do |b|
  REXML::XPath.each(b, 'descendant::d') do |d|
    puts(d.to_s)
  end
end

=> <d id='foo'/>

For comparison:

REXML::XPath.each(doc, '//b') do |b|
  REXML::XPath.each(b, '//d') do |d|
    puts(d.to_s)
  end
end

=> <d id='foo'/>

=> <d id='bar'/>
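
The XML literal in the example above did not survive in this copy of the thread. The sketch below substitutes a hypothetical document, chosen only so that one d sits inside b and another sits outside it. With that assumption, the two queries should behave as described: descendant::d stays inside b, while //d starts again from the document root.

```ruby
require 'rexml/document'

# Hypothetical stand-in for the lost XML: one <d> inside <b>, one outside.
xml = %{
<a>
  <b>
    <c><d id='foo'/></c>
  </b>
  <d id='bar'/>
</a>
}

doc = REXML::Document.new(xml)

relative_ids = []
absolute_ids = []

REXML::XPath.each(doc, '//b') do |b|
  # Relative to the context node b: only descendants of b.
  REXML::XPath.each(b, 'descendant::d') { |d| relative_ids << d.attributes['id'] }
  # Leading // restarts at the document root, whatever the context node is.
  REXML::XPath.each(b, '//d') { |d| absolute_ids << d.attributes['id'] }
end

puts "descendant::d => #{relative_ids.inspect}"
puts "//d           => #{absolute_ids.inspect}"
```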

Regards,
Paul.

xmg-eric · August 3, 2006, 12:16pm

I have a collection of XPath examples on my Wiki, stolen from an old
Microsoft document I think:
http://mobeus.homelinux.org/eclectica/show/XmlPath

I haven’t used it yet because I started playing with REXML yesterday,
but it’s pretty detailed.

Les

xmg-eric · August 3, 2006, 12:55pm

Eric A. skrev:

So what /should/ happen is that search
for td elements occurs in the subtree
rooted at the tbl element, using the
//td path --which I take to mean “anywhere
within the context”.

But what actually happens is that the
search for td elements occurs in the
entire document, so the code above returns
the first table, regardless of where the
matching data is found.

// always uses the document root as the context regardless of the
context node provided. This is according to the XPath spec.

But if this implementation isn’t a bug, it
means that the definition of “context” is
“the entire tree in which the specified
node is found”.

Well, yes, when you are using // in the beginning of your path
expression.

In that case, the XPath expression that
asks for “all td elements under the
current node” must be something
other than what I coded…
What might that path be, I wonder?

To get all descendants of the current context node using a shorthand you
do

.//td

The . in the beginning makes sure the expression will use the current
node as the context. There are also the following two XPath axes you can
use:

descendant-or-self::td (includes the context node itself)

and

descendant::td (equivalent to .//td)

/Marcus
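
A short sketch comparing the three forms on the same context node. The two-table document and the variable names are assumptions made up for the illustration.

```ruby
require 'rexml/document'

# Hypothetical two-table document, used only for illustration.
doc = REXML::Document.new(<<~XML)
  <body>
    <table id='first'><tr><td>alpha</td></tr></table>
    <table id='second'><tr><td>beta</td><td>gamma</td></tr></table>
  </body>
XML

# Use the second table as the context node for all three queries.
second = REXML::XPath.first(doc, "//table[@id='second']")

from_root    = REXML::XPath.match(second, '//td').map(&:text)            # absolute path
dot_relative = REXML::XPath.match(second, './/td').map(&:text)           # relative shorthand
axis         = REXML::XPath.match(second, 'descendant::td').map(&:text)  # explicit axis

puts "//td           => #{from_root.inspect}"
puts ".//td          => #{dot_relative.inspect}"
puts "descendant::td => #{axis.inspect}"
```

Only the first query reaches cells outside the context table; the other two should be limited to the table passed in.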

xmg-eric · August 4, 2006, 1:06am

Paul B. wrote:

within the context".
into it in more detail I concluded that it follows the specification.
The receiving element does indeed provide context, but using //
overrides it: // means the root node or any of its descendants.

Matches my experience.

…could have used: XPath.each(tbl, 'descendant::td')

Interesting construct. I guess that’s a good way to go,
and I guess that matches the Xpath spec…but I’m
highly dubious about a spec that defines “context” as
something it can simply ignore. Makes little sense,
from my current perspective.

xmg-eric · August 3, 2006, 12:59pm

Eric A. wrote:

This code looks for a table that matches specific criteria:
XPath.each(@doc, '//table') do |tbl|
  XPath.each(tbl, '//td') do |td|
    # look for matching data

Are you doing anything with tbl other than making it a stopping point
for iteration? You could remove one loop if you simply delve to
precisely what you’re looking for:

XPath.each(@doc, '//table//td') do |td|
  # look for matching data

And, depending on how you define “matching data” you may be able to add
some XPath conditions that get rid of the loop entirely.
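
For instance, if "matching data" could be expressed as an attribute test, a predicate in the path might replace the loop altogether. The status attribute and the sample document below are invented for this sketch; the real criteria are not given in the thread.

```ruby
require 'rexml/document'

# Hypothetical data: the "matching" cell carries a made-up status attribute.
doc = REXML::Document.new(<<~XML)
  <body>
    <table id='t1'><tr><td status='old'>alpha</td></tr></table>
    <table id='t2'><tr><td status='new'>beta</td></tr></table>
  </body>
XML

# The predicate does the filtering, so no explicit loop over cells is needed.
hits = REXML::XPath.match(doc, "//table//td[@status='new']").map(&:text)
puts hits.inspect
```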

Mark.

xmg-eric · August 4, 2006, 1:12am

Marcus A. wrote:

// always uses the document root as the context regardless of the
context node provided. This is according to the XPath spec.

To get all descendants of the current context node using a shorthand you do

.//td

The . in the beginning makes sure the expression will use the current
node as the context.

Very useful syntax. Thanks.

I continue to be disgruntled by a syntax that ignores
the context you specified, unless you add the additional
“.” to say, “No, I really mean it”. But I thank you for
a fine solution, and the additional explanation.

xmg-eric · August 4, 2006, 1:09am

Leslie V. wrote:

I have a collection of XPath examples on my Wiki, stolen from an old
Microsoft document I think:
http://mobeus.homelinux.org/eclectica/show/XmlPath

I haven’t used it yet because I started playing with REXML yesterday,
but it’s pretty detailed.

Good collection of examples. Just what the doctor
ordered for a fast fix…

xmg-eric · August 4, 2006, 1:30am

On 8/4/06, Eric A. wrote:

I continue to be disgruntled by a syntax that ignores
the context you specified, unless you add the additional
“.” to say, “No, I really mean it”. But I thank you for
a fine solution, and the additional explanation.

I suppose this is because XPath originated from XSLT. When using XPath in
XSLT you always have, implicitly, the current node as the context node.
Sometimes you need to break out of the context, and then you do it by
putting a “/” or a “//” at the beginning of your path expression to
search from the root. XPath works extremely smoothly in most ways in XSLT.

But from a DOM and XPath perspective it’s kind of silly…

/Marcus

xmg-eric · August 4, 2006, 1:15am

Thomas, Mark - BLS CTR wrote:

XPath.each(@doc, '//table//td') do |td|
  # look for matching data

And, depending on how you define “matching data” you may be able to add
some XPath conditions that get rid of the loop entirely.

Most excellent. Some damn good Xpath expertise
on this list. Thanks, all.

eric
(Who hasn’t used Xpath expressions in more than 3 years,
and who is entirely capable of forgetting everything he
ever knew in less than 6 months.)
:_)

xmg-eric · August 4, 2006, 9:53am

Eric A. wrote:

(Who hasn’t used Xpath expressions in more than 3 years,
and who is entirely capable of forgetting everything he
ever knew in less than 6 months.)
:_)

Same here: I rarely use XPath and I always have to look up the details
again. IMHO it’s not very intuitive. My 0.02EUR…

Btw, I find http://www.xmlcooktop.com/ a handy tool for experimenting
with XPath expressions. It’s an XML editor with an XPath evaluation
window where you can immediately see results. Nothing too fancy but I
liked the XPath direct evaluation.

Kind regards

robert

xmg-eric · August 4, 2006, 9:56am

Robert K. wrote:

precisely what you’re looking for:
eric
(Who hasn’t used Xpath expressions in more than 3 years,
and who is entirely capable of forgetting everything he
ever knew in less than 6 months.)
:_)

Same here: I rarely use XPath and I always have to look up the details
again. IMHO it’s not very intuitive. My 0.02EUR…

PS: This is the page I use occasionally for refreshing:
http://www.w3schools.com/xpath/xpath_syntax.asp

Can anyone recommend the book “XPath and XPointer”?

robert

xmg-eric · August 4, 2006, 1:30pm

On Friday 04 August 2006 3:55 am, Robert K. wrote:

Can anyone recommend the book “XPath and XPointer”?

No, not really. I think it’s much easier to learn XPath (an incredibly
powerful language once you get your head around it) in a more practical
environment (probably XSLT or XQuery). Quite a few of the XSLT books
have good introductions to XPath. I quite like the new edition of the
XSLT Cookbook, which has a section comparing XPath1 and XPath2, but
this may not cover enough of the basics for some folks.

(Note: Our first edition general XSLT book is being revised presently,
so I’d wait for the new edition)

HTH,
Keith

xmg-eric · August 15, 2006, 2:18pm

Robert K. wrote:

I rarely use XPath and I always have to look up the details
again. IMHO it’s not very intuitive. My 0.02EUR…

I find http://www.xmlcooktop.com/ a handy tool for experimenting
with XPath expressions. It’s an XML editor with an XPath evaluation
window where you can immediately see results. Nothing too fancy but I
liked the XPath direct evaluation.

PS: This is the page I use occasionally for refreshing:
http://www.w3schools.com/xpath/xpath_syntax.asp

Again, greatly appreciated.

xmg-eric · August 4, 2006, 1:51pm

Keith F. wrote:

On Friday 04 August 2006 3:55 am, Robert K. wrote:

Can anyone recommend the book “XPath and XPointer”?

No, not really. I think it’s much easier to learn XPath (an incredibly
powerful language once you get your head around it) in a more practical
environment (probably XSLT or XQuery).

So you’re basically saying that the book is more similar to the standard
page at W3C (the XPath specification cover page) - did I get you right?

Quite a few of the XSLT books
have good introductions to XPath. I quite like the new edition of the
XSLT Cookbook, which has a section comparing XPath1 and XPath2, but
this may not cover enough of the basics for some folks.

XSLT Cookbook, 2nd Edition

(Note: Our first edition general XSLT book is being revised presently,
so I’d wait for the new edition)

I’ll probably go with the XPath / XPointer anyway as I prefer the more
thorough coverage over the easy learning path.

Thanks for your hints anyway!

Kind regards

robert

xmg-eric · August 15, 2006, 2:18pm

Marcus B. wrote:

putting a “/” or a “//” at the beginning of your path expression to
search from the root. XPath works extremely smoothly in most ways in XSLT.

But from a DOM and XPath perspective it’s kind of silly…

That XSLT perspective may explain it, somewhat.
Thanks for the attempt to make it seem rational, at least.
:_)