Odd puts behaviour with REXML

davidstokar · November 30, 2009, 7:18pm

Hello,

This is really weird, and I’m not sure if maybe there’s a formatting
option that I can set on the puts method that will solve it. Hopefully
someone has done something similar??

I have been using REXML to parse through an XML document that I get back
from calling a ReST-ish web service, and up until now it has parsed
every single element as I’d expect. But I run into this small patch of
data where it acts weird. The XML looks like this:

eSupport Find related product support and help in http://www.somecompany.com/US/perl/model-home.pl?XID=M:products:crmportal&LOC=3&mdl=someModel

And I’m parsing it like this:
data.elements.each("product/profile/support"){|element|
  element.elements.each() do |child|
    puts "Did XPATH match any profile elements?"
    puts child.text
  end
}

I know it’s pretty rudimentary, but I’m just confused as to why this
same pattern has been able to get text out of every other element I have
searched for, but here I get something like this when I loop through the
subelements:
Did XPATH match any profile elements?
???Product_Support_Link_Title???
Did XPATH match any profile elements?
???Product_Support_Link_Prefix???
Did XPATH match any profile elements?
???Product_Support_Link???&mdl=someModel

It looks like XPATH doesn’t like the format of the text somehow, but I
have no idea what it is about it that it doesn’t like since it appears
to be just normal text to me.

Anyone have any ideas? It looks like something that REXML is doing
since it outputs the path to the element in the document Product -->
Support but I don’t have a clue where the _Link is coming from?

Thanks in advance for all your help!
David

davidstokar · November 30, 2009, 7:36pm

Morning David,

On Mon, Nov 30, 2009 at 10:19 AM, David Sainte-claire wrote:

Anyone have any ideas? It looks like something that REXML is doing
since it outputs the path to the element in the document Product →
Support but I don’t have a clue where the _Link is coming from?

Rule one of anything like this is to remove each element from the XML
file one by one and see which one is breaking you. In this case you will
find that it’s the link element because you are using ampersands in the
URL. See here:

http://www.w3.org/TR/xhtml1/guidelines.html#C_12
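
For anyone who wants to check this quickly, here is a minimal sketch, assuming a REXML version that validates text content. The `well_formed?` helper and the bare `<link>` wrapper are made up for illustration; only the URL is taken from the post.

```ruby
require 'rexml/document'

# Hypothetical helper: true if REXML accepts the string as XML.
def well_formed?(xml)
  REXML::Document.new(xml)
  true
rescue REXML::ParseException
  false
end

url = 'http://www.somecompany.com/US/perl/model-home.pl' \
      '?XID=M:products:crmportal&LOC=3&mdl=someModel'

raw     = "<link>#{url}</link>"                     # bare & in the text
escaped = "<link>#{url.gsub('&', '&amp;')}</link>"  # & written as &amp;

puts well_formed?(raw)
puts well_formed?(escaped)
```

The only difference between the two strings is how the ampersands are written, so a different result for each points at the ampersands rather than at the XPath.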

John

davidstokar · November 30, 2009, 7:45pm

On 30.11.2009 19:33, John W Higgins wrote:

Rule one of anything like this is to remove each element from the XML file
one by one and see which one is breaking you. In this case you will find
that it’s the link element because you are using ampersands in the URL. See
here

XHTML 1.0 - HTML Compatibility Guidelines

One more hint: when posting things like this it is best to provide a
complete example. From the XPath given it is clear that something
must be missing (there are no “product” tags). It’s pretty easy with
REXML:

robert@fussel ~
$ cat x.rb

require 'rexml/document'

data = REXML::Document.new <<'DOC'
eSupport Find related product support and help in http://www.somecompany.com/US/perl/model-home.pl?XID=M:products:crmportal&LOC=3&mdl=someModel
DOC

data.elements.each("product/profile/support"){|element|
  element.elements.each() do |child|
    puts "Did XPATH match any profile elements?"
    puts child.text
  end
}

Which on my box produces:

robert@fussel ~
$ ruby19 x.rb
/usr/local/lib/ruby19/1.9.1/rexml/parsers/treeparser.rb:95:in `rescue in parse': #<RuntimeError: Illegal character '&' in raw string " (REXML::ParseException)
http://www.somecompany.com/US/perl/model-home.pl?XID=M:products:crmportal&LOC=3&mdl=someModel
">
/usr/local/lib/ruby19/1.9.1/rexml/text.rb:155:in `block in check'
/usr/local/lib/ruby19/1.9.1/rexml/text.rb:153:in `scan'
/usr/local/lib/ruby19/1.9.1/rexml/text.rb:153:in `check'
/usr/local/lib/ruby19/1.9.1/rexml/text.rb:125:in `parent='
/usr/local/lib/ruby19/1.9.1/rexml/parent.rb:19:in `add'
/usr/local/lib/ruby19/1.9.1/rexml/parsers/treeparser.rb:45:in `parse'
/usr/local/lib/ruby19/1.9.1/rexml/document.rb:228:in `build'
/usr/local/lib/ruby19/1.9.1/rexml/document.rb:43:in `initialize'
x.rb:4:in `new'
x.rb:4:in `<main>'
…
Illegal character '&' in raw string "
http://www.somecompany.com/US/perl/model-home.pl?XID=M:products:crmportal&LOC=3&mdl=someModel
"
Line: 8
Position: 221
Last 80 unconsumed characters:

        from /usr/local/lib/ruby19/1.9.1/rexml/parsers/treeparser.rb:20:in `parse'
        from /usr/local/lib/ruby19/1.9.1/rexml/document.rb:228:in `build'
        from /usr/local/lib/ruby19/1.9.1/rexml/document.rb:43:in `initialize'
        from x.rb:4:in `new'
        from x.rb:4:in `<main>'

robert@fussel ~
$

How did you manage to get REXML to parse this?

Kind regards

robert

davidstokar · November 30, 2009, 7:43pm

Thanks for the help, in terms of how to narrow it down, but if I change
my XPATH statement to look more like this:

data.elements.each("product/profile/support/title"){|element|
  puts element.text
}

Where product is the root node, and I’m only pulling out the title
element, I still get output that looks like this:

???Product_Support_Link_Title???

Can the link element be messing up my XPATH query even though I’m not
looking at that element?

davidstokar · November 30, 2009, 7:52pm

Morning Again David,

On Mon, Nov 30, 2009 at 10:43 AM, David Sainte-claire wrote:

???Product_Support_Link_Title???

Can the link element be messing up my XPATH query even though I’m not
looking at that element?

The ampersand messes things up because if you confuse the parser then
how is it supposed to figure out where the end of the support element
is? XML is sort of a “it works or it doesn’t” concept most of the time.
Not much wiggle room for the most part.
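
To make the all-or-nothing point concrete, here is a rough sketch. The `title_of` and `sample` helpers are hypothetical, and the `<title>`/`<link>` layout is only an assumption based on the names mentioned in this thread, since the real markup was not posted.

```ruby
require 'rexml/document'

# Hypothetical helper: text of product/profile/support/title,
# or nil when the document cannot be parsed at all.
def title_of(xml)
  doc = REXML::Document.new(xml)
  node = doc.elements['product/profile/support/title']
  node && node.text
rescue REXML::ParseException
  nil
end

# Hypothetical helper: builds a small document around the given link text.
def sample(link_text)
  "<product><profile><support>" \
    "<title>eSupport</title><link>#{link_text}</link>" \
    "</support></profile></product>"
end

base = 'http://www.somecompany.com/US/perl/model-home.pl?XID=M:products:crmportal'

broken = sample("#{base}&LOC=3&mdl=someModel")          # bare ampersands
fixed  = sample("#{base}&amp;LOC=3&amp;mdl=someModel")  # escaped ampersands

p title_of(broken)
p title_of(fixed)
```

The title element is identical in both documents. With a strict parser the first one never gets as far as an XPath query, which is why a bad sibling element can affect a query that does not mention it.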

John

davidstokar · November 30, 2009, 8:15pm

David,

On Mon, Nov 30, 2009 at 10:58 AM, David Sainte-claire wrote:

Where data is the body of an HTTP GET request to a ReST-ish web service
(I can’t post the endpoint of the web service since it’s an internal
server)

First, you need to get your internal service fixed so it returns
properly XML encoded URLs. You are not currently getting XML back from
the service so you’re stuck with garbage at the moment. If you get
really stuck you’re going to need to find and replace those ampersands.
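
If it comes to that, one possible stopgap looks like the sketch below. The regex and the `escape_bare_ampersands` helper are illustrative only, not part of REXML, and the `<support>`/`<link>` wrapper is assumed. It escapes any `&` that does not already start an entity or character reference, so a query string that happens to contain something shaped like an entity (for example `&copy;`) would be left alone and misread.

```ruby
require 'rexml/document'

# Matches an & that does not start an entity or character reference
# such as &amp;, &#38; or &#x26;.
BARE_AMPERSAND = /&(?!(?:[A-Za-z_][\w.-]*|#\d+|#x\h+);)/

# Hypothetical helper: escape stray ampersands before parsing.
def escape_bare_ampersands(xml)
  xml.gsub(BARE_AMPERSAND, '&amp;')
end

body = '<support><link>http://www.somecompany.com/US/perl/model-home.pl' \
       '?XID=M:products:crmportal&LOC=3&mdl=someModel</link></support>'

doc = REXML::Document.new(escape_bare_ampersands(body))
puts doc.elements['support/link'].text
```

This only papers over the problem on the client side; the service is still sending something that is not XML.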

URL that has &amp; between all the query parameters

You don’t have to - any XML parser will make the change for you because
it understands the &amp; encodings and will return you the proper string
with nice pretty ampersands. Please try using the proper XML string
prior to complaining that it’s not what you want.
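
A small sketch of what that looks like with REXML, assuming a made-up `<support>`/`<link>` wrapper around the URL from this thread:

```ruby
require 'rexml/document'

xml = '<support><link>http://www.somecompany.com/US/perl/model-home.pl' \
      '?XID=M:products:crmportal&amp;LOC=3&amp;mdl=someModel</link></support>'

doc  = REXML::Document.new(xml)
link = doc.elements['support/link']

puts link.text  # the value, with &amp; decoded back to a plain &
puts link.to_s  # the serialized element, which keeps &amp;
```

So the `&amp;` form only exists in the document itself. The string you get from `text` is the ordinary URL and can be handed to a browser or an HTTP client as-is.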

John

davidstokar · November 30, 2009, 7:58pm

Robert K. wrote:

How did you manage to get REXML to parse this?

That’s really interesting. This works, but just gives that odd output
like this: ???Product_Support_Link_Title???

Here is exactly what I’m doing:

I have XML that looks like this:

eSupport Find related product support and help in http://www.somecompany.com/US/perl/model-home.pl?XID=M:products:crmportal&LOC=3&mdl=someModel

And I’m walking it using REXML like this:
data.elements.each("product/profile/support"){|element|
  element.elements.each() do |child|
    puts "Did XPATH match any profile elements?"
    puts child.text
  end
}

Where data is the body of an HTTP GET request to a ReST-ish web service
(I can’t post the endpoint of the web service since it’s an internal
server)

I’m invoking the web service from a file called product.rb using the
Rails command script/runner app/models/product.rb

Maybe if I had done it all from IRB I would have gotten the illegal
character exception.

Is there any way that I can ignore the formatting rules for illegal
characters and just take the whole text as a string? The article that
the first person posted said that the webservice should return links in
this format:
http://www.somecompany.com/US/perl/model-home.pl?XID=M:products:crmportal&amp;LOC=3&amp;mdl=someModel

using &amp; instead of just & between arguments in the URL, which might
fly with REXML, but I’m not sure how to strip all that back off to make
it into a valid link since my browser doesn’t know what to do with the
URL that has &amp; between all the query parameters.

Again, apologies if these are totally naive questions. I’m pretty new
at this…

Thanks,
David

davidstokar · November 30, 2009, 8:21pm

Hey John! Thanks! You saved me so much time! I wasn’t exactly
complaining. I just didn’t fully understand the implications of having
ampersands in an XML document. Now that I know, I can tell the guys who
wrote the service that they need to clean it up! I’ll try to keep my
questions to a minimum, or at least the really obviously NEWB ones…

Thanks again