Which behaves correctly, Hpricot or Nokogiri?


#1

I’ve been considering switching to Nokogiri instead of Hpricot, mostly
'cause Mechanize has switched. However, the two actually behave quite
differently. The Nokogiri objects don’t simulate standard container
behavior nearly as well as Hpricot. I also noticed that this:

require 'nokogiri'
require 'hpricot'

xml = 'content'

doc = Nokogiri::XML(xml)
puts doc.search('first')[0].attributes['look']
doc = Hpricot(xml)
puts doc.search('first')[0].attributes['look']

...produces this output:

Big & small…
Big & smal…

I don't know which output is the correct one. Does anyone know what’s
going on here?
Thank you…


#2

On Mon, Feb 2, 2009 at 11:00 AM, Just Another Victim of the Ambient
Morality removed_email_address@domain.invalid wrote:

xml = 'content'

It’s not valid XML. It should be “Big &amp; small…”
I guess that for invalid XML there is no “valid” behavior. Ask the
Hpricot and Nokogiri developers what happens when the XML is not valid
(do they try to fix it or something?)

Big & small…
Big & smal…

Strange. I get:
Big small…
Big & small…

The difference is about ‘&’, which is not valid on its own in XML
(&amp; should be used instead).
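
As a rough illustration of that escaping rule, here is a minimal sketch using Ruby's standard CGI helpers. The attribute value is a made-up stand-in, since the sample XML from the first post did not survive, and CGI.escapeHTML is an HTML escaper that happens to cover the characters that matter here:

```ruby
require 'cgi'

# Hypothetical attribute value (an assumption, not the thread's original sample).
raw = 'Big & small'

# Escape the value before placing it inside XML markup.
escaped = CGI.escapeHTML(raw)
xml = %(<first look="#{escaped}">content</first>)
puts xml

# Unescaping recovers the text the author meant.
puts CGI.unescapeHTML(escaped)
```

The point is only that the markup carries the escaped form while the program works with the plain text.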


Regards

Radosław Bułat
http://radarek.jogger.pl - my blog


#3

“Radosław Bułat” removed_email_address@domain.invalid wrote in message
news:removed_email_address@domain.invalid…

It’s not valid XML. It should be “Big &amp; small…”
I guess that for invalid XML there is no “valid” behavior. Ask the
Hpricot and Nokogiri developers what happens when the XML is not valid
(do they try to fix it or something?)

Actually, "Big &amp; small" is what I wrote in the example. The second
output is erroneously missing an “l”, but I think that’s understood…
I’m wondering if anyone knows what the correct behaviour is supposed to
be…

Oh, I get it. Maybe my use of & amp ; was translated in whatever client
you’re using?


#4

Just Another Victim of the Ambient M. wrote:

I don't know which output is the correct one. Does anyone know what's
going on here?

The second one is correct, because &amp; is an encoding, and an XML tool
should use &amp; outside its interface and & inside its interface.
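
To make the "inside its interface" half concrete without depending on either library under discussion, here is a small sketch using REXML. The sample document is an assumption (the original one in this thread was lost), and REXML is just a neutral third parser, not a statement about what Hpricot or Nokogiri do:

```ruby
require 'rexml/document'

# Hypothetical sample: the markup carries the escaped form, &amp;.
xml = '<root><first look="Big &amp; small">content</first></root>'

doc = REXML::Document.new(xml)
first = doc.root.elements['first']

# Read through the parser's API, the attribute comes back decoded.
decoded = first.attributes['look']
puts decoded
```

A parser following this convention hands the program a plain ampersand and keeps the escaped form for the serialized markup.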

Now try these XPaths in Hpricot and Nokogiri. Which combinations find
the node?

first[ @look = 'Big &amp; small…' ]
first[ @look = 'Big & small…' ]