Forum: Ruby Which behaves correctly, Hpricot or Nokogiri?

Just Another Victim of the Ambient Morality (Guest)
on 2009-02-02 11:03
(Received via mailing list)
I've been considering switching to Nokogiri instead of Hpricot, mostly
'cause Mechanize has switched.  However, the two actually behave quite
differently.  The Nokogiri objects don't simulate standard container
behavior nearly as well as Hpricot.  I also noticed that this:


require 'nokogiri'
require 'hpricot'

xml = '<first look="Big &amp; small...">content</first>'

doc = Nokogiri::XML(xml)
puts doc.search('first')[0].attributes['look']
doc = Hpricot(xml)
puts doc.search('first')[0].attributes['look']


    ...produces this output:


Big &amp; small...
Big & smal...


    I don't know which output is the correct one.  Does anyone know what's
going on here?
    Thank you...
Radosław Bułat (radarek)
on 2009-02-02 13:52
(Received via mailing list)
On Mon, Feb 2, 2009 at 11:00 AM, Just Another Victim of the Ambient
Morality <ihatespam@hotmail.com> wrote:
> xml = '<first look="Big & small...">content</first>'

It's not valid xml. It should be "Big &amp; small..."
I guess that for non-valid xml there is no "valid" behavior. Ask the
hpricot and nokogiri developers what happens when the xml is not valid
(do they try to fix it or something?)

> Big & small...
> Big & smal...

Strange. I get:
Big  small...
Big & small...

The difference is about '&' which is not valid in xml (&amp; should be
used instead).

--
Regards

Radosław Bułat
http://radarek.jogger.pl - my blog
Just Another Victim of the Ambient Morality (Guest)
on 2009-02-02 20:16
(Received via mailing list)
"Radosław Bułat" <radek.bulat@gmail.com> wrote in message
news:de8b82ea0902020449t586cb2e7jfaeca2976bee4852@mail.gmail.com...
> It's not valid xml. It should be "Big &amp; small..."
> I guess that for non-valid xml there is no "valid" behavior. Ask the
> hpricot and nokogiri developers what happens when the xml is not valid
> (do they try to fix it or something?)

    Actually, "Big &amp; small" is what I wrote in the example.  The second
output is erroneously missing an "l" but I think that's understood...
    I'm wondering if anyone knows what the correct behaviour is supposed to
be...

    Oh, I get it.  Maybe my use of & amp ; was translated in whatever client
you're using?
Phlip (Guest)
on 2009-02-02 21:01
(Received via mailing list)
Just Another Victim of the Ambient Morality wrote:

>
>     I don't know which output is the correct one.  Does anyone know what's
> going on here?

The second one is correct, because &amp; is an encoding, and an XML tool
should use & outside its interface and &amp; inside its interface.

Now try these XPaths in Hpricot and Nokogiri - which combinations find
the node?

   first[ @look = 'Big &amp; small...' ]
   first[ @look = 'Big & small...' ]
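
The decoded-versus-encoded distinction described above can be checked with a third parser. The following is only a minimal sketch, assuming REXML (shipped with Ruby) as a stand-in; it is not one of the two libraries under discussion, and the variable names are made up for illustration. It reads the same attribute twice: once as the value an application would work with, and once as the text that would appear inside the markup.

```ruby
require 'rexml/document'

# Same document as in the original post.
xml = '<first look="Big &amp; small...">content</first>'

doc  = REXML::Document.new(xml)
look = doc.root.attributes.get_attribute('look')

# Attribute#value gives the entity-decoded string, i.e. what the
# application sees "outside" the XML interface.
decoded = look.value

# Attribute#to_s gives the escaped form, i.e. what is written
# "inside" the markup when the document is serialized.
serialized = look.to_s

puts decoded
puts serialized
```

If this runs as expected, the first line printed is the plain-ampersand form and the second is the entity-escaped form, which matches the view that the decoded string is what an attribute lookup should hand back.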