I want to trim trailing whitespace at the end of all XHTML paragraphs.
I am using the REXML library.
Say I have the following in a valid XHTML file:
hello world a
Hi there
The End
I want to end up with this:
hello world a
Hi there
The End
So I was thinking that I could use XPath to get just the text nodes
that I want, then trim their text, which would give me the result
shown above.
I started with the following XPath: //root/p/child::text()
Of course, the problem here is that it returns every text node that is
a child of any p element, which is this:
'hello '
' a '
'Hi there '
'The End '
Trying the following XPath gives me only the last text node of the
last paragraph, not the last text node of each paragraph that is a
child of the root node:
//root/p/child::text()[last()]
This only returns: 'The End '
What I would like to get from the XPath is therefore:
' a '
'Hi there '
'The End '
I have tried //root//p/child::text()[last()] on other XPath parsers,
and it works. Could this just be a bug (or a different interpretation
of the rules) in REXML?
Could well be both. When I try '//p/text()[last()]' I get only the
last node of the whole document. The issue seems to be the binding of
last(), i.e. which collection it refers to or when it is applied. I
lean towards the bug variant.
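To make the two possible bindings of last() concrete, here is a small model in plain Ruby. It does not call REXML or any XPath engine: the nested array simply stands in for the text children of each paragraph as quoted above, and the variable names are made up for illustration. As far as I read XPath 1.0, a predicate on a step is evaluated separately for each context node, which corresponds to the first reading.

```ruby
# Text children of each <p>, copied from the node lists quoted in this thread.
paragraphs = [
  ['hello ', ' a '],
  ['Hi there '],
  ['The End ']
]

# Reading 1: last() is evaluated once per paragraph, against that
# paragraph's own text children.
per_paragraph = paragraphs.map(&:last)

# Reading 2: last() is evaluated once, against the combined list of all
# text nodes in the document.
whole_document = [paragraphs.flatten.last]

p per_paragraph   # the result the original poster expected
p whole_document  # the result REXML was reported to return
```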
One workaround would be to use a two-step approach, i.e. first select
all p elements and then the last text node of each:
irb(main):062:0> doc.elements.each('//p'){|x|
REXML::XPath.each(x,'text()[last()]'){|t|p t}}
" a "
"Hi there "
"The End "
=> [
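Building on the two-step idea, a fuller sketch of the actual trimming might look like the following. Several things here are assumptions rather than anything from the original post: the `<b>` element in the sample markup (the real inline tag is not shown in the thread), the helper name `trim_paragraph_tails`, and the use of `Element#texts` with `.last` in place of the `text()[last()]` XPath, which sidesteps the predicate entirely.

```ruby
require 'rexml/document'

# Sample markup reconstructed from the text nodes quoted in the thread.
# The <b> element is an assumption; the original inline tag is not shown.
sample_xhtml = <<~XML
  <root>
    <p>hello <b>world</b> a </p>
    <p>Hi there </p>
    <p>The End </p>
  </root>
XML

# Hypothetical helper: strips trailing whitespace from the last text node
# that is a direct child of each <p>, and returns the modified document.
def trim_paragraph_tails(xml)
  doc = REXML::Document.new(xml)
  # Step 1: select every paragraph.
  doc.elements.each('//p') do |para|
    # Step 2: pick the last direct text child of this paragraph only.
    last_text = para.texts.last
    next if last_text.nil?
    last_text.value = last_text.value.rstrip
  end
  doc
end

doc = trim_paragraph_tails(sample_xhtml)
doc.elements.each('//p') { |para| p para.texts.last.value }
```

One caveat with this approach: if a paragraph ends with an inline element rather than text, the last text child sits before that element, so stripping it would remove a space that is probably significant.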
Sounds like a bug to me too. Is there a reason you don’t want to use a
parser like libxml-ruby, which is fully XPath 1.0 compliant (and will
give you a speed boost as well)?
@Robert: As you suggest, a workaround is in order. I had a look at
other XPath implementations and they were returning what I originally
expected. Just not REXML. At least now I am sure it’s not just me.
@Mark: I would consider something other than REXML if, for example, I
had performance issues with a workaround, or if there were no easy
workaround. As Robert commented, I use it because it’s there, and I
had not run into any real problems prior to this, so I was happy with
it. Certainly if it were a show-stopper I would switch to something
else. But thanks for the recommendation. Something to keep in mind.
Cheers.