Hpricot test for equivalence of two xml segments?

I’m looking through what documentation I can find for Hpricot (Nokogiri
wouldn’t install for me, and I just want a quick and simple solution), and I
cannot find a simple method to take two XML strings and find out if they are
equivalent. I’m getting a bunch of XHTML back from our rendering agent with
random permutations of attributes inside of the tags, and I want a quick and
easy Ruby way to find out if segments are equivalent without writing my own
regex-based parser…???

It seems like there should be a simple method for this. If I had written
Hpricot, equivalence of segments would have been the first method I would
have written…???

xc

On Sat, Jul 17, 2010 at 12:28 AM, Xeno C. / Eskimo North and Gmail
<[email protected]> wrote:

I’m looking through what documentation I can find for Hpricot (Nokogiri
wouldn’t install for me, and I just want a quick and simple solution), and I
cannot find a simple method to take two XML strings and find out if they are
equivalent. I’m getting a bunch of XHTML back from our rendering agent with
random permutations of attributes inside of the tags, and I want a quick and
easy Ruby way to find out if segments are equivalent without writing my own
regex-based parser…???

I can think of a few definitions for equivalence. One definition would
simply require unifying the case of both strings and checking if they are
the same. A second definition would require building a tree of the structure
in each string, including attributes, sorting it, and looping over both trees
to check if they contain the same elements (Nokogiri’s XML::NodeSet does
something like this with ==). A third definition would build on the second
one, while treating certain tags as equivalent to other tags (for example,
q is equivalent to blockquote).
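
The second definition above can be sketched with REXML, which ships with Ruby and so avoids installing anything. This is only a minimal sketch under stated assumptions: the helper names (`xml_equivalent?`, `same_element?`, `attrs`, `significant_children`) are hypothetical, whitespace-only text is ignored, and child order is kept as-is rather than sorted.

```ruby
require 'rexml/document'

# Hypothetical helper: attributes as a plain Hash, so that comparing two
# elements ignores the order in which their attributes were written.
def attrs(element)
  h = {}
  element.attributes.each { |name, value| h[name] = value }
  h
end

# Hypothetical helper: child elements plus non-blank text nodes.
# Comments and processing instructions are skipped (an assumption).
def significant_children(element)
  element.children.select do |node|
    node.is_a?(REXML::Element) ||
      (node.is_a?(REXML::Text) && !node.value.strip.empty?)
  end
end

# True when both elements have the same name, the same attributes in any
# order, and pairwise-equivalent children in the same order.
def same_element?(a, b)
  return false unless a.expanded_name == b.expanded_name
  return false unless attrs(a) == attrs(b)

  kids_a = significant_children(a)
  kids_b = significant_children(b)
  return false unless kids_a.size == kids_b.size

  kids_a.zip(kids_b).all? do |x, y|
    if x.is_a?(REXML::Element) && y.is_a?(REXML::Element)
      same_element?(x, y)
    elsif x.is_a?(REXML::Text) && y.is_a?(REXML::Text)
      x.value.strip == y.value.strip
    else
      false
    end
  end
end

# Entry point: compare two XML strings that each have a single root element.
def xml_equivalent?(xml_a, xml_b)
  same_element?(REXML::Document.new(xml_a).root,
                REXML::Document.new(xml_b).root)
end
```

Converting attributes to a Hash is what makes attribute order irrelevant; everything else is a plain recursive walk. Sorting children, as the definition mentions, would additionally treat reordered sibling tags as equal, which may or may not be wanted.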

What’s your definition of equivalence for two xml documents or fragments?

Ammar

On 10-07-16 03:16 PM, Ammar A. wrote:

What’s your definition of equivalence for two xml documents or fragments?

Ammar

The only thing I am concerned about is permutations of attributes inside the
tags. Everything else I’m seeing is regular. Is there something where I can
parse all the tags in a segment and tell if they are equivalent and just have
the attributes in different orders? I’m not even concerned about different
tag forms. We don’t see that. A typical example is:

[The example markup did not survive rendering. What remains is a list of
items, each reading "alt text" followed by "My Text".]

    I need to have something that can help me judge such things as equivalent.
    Again, I NEVER see tag permutations, but just attribute permutations.

    Thank you for your response.

    Sincerely, Xeno

    On Fri, Jul 16, 2010 at 6:52 PM, Xeno C. / Eskimo North and Gmail
    <[email protected]> wrote:

    I need to have something that can help me judge such things as equivalent.
    Again, I NEVER see tag permutations, but just attribute permutations.

    You should take a look at Lorax, which is Nokogiri-based.

    Your definition of equivalence (the semantically correct one, imho) can
    be tested with:

    Lorax::Signature.new(Nokogiri::XML(string1).root).signature ==
      Lorax::Signature.new(Nokogiri::XML(string2).root).signature
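
For readers who cannot install Lorax, the general idea of comparing per-node signatures can be sketched with Ruby's bundled REXML and Digest libraries. This is not Lorax's implementation or API, only a hedged illustration: the `signature` helper is hypothetical, and it assumes that whitespace-only text, comments, and processing instructions can be ignored.

```ruby
require 'digest/sha1'
require 'rexml/document'

# Hypothetical helper (not part of Lorax): a digest for a node that stays the
# same when the node's attributes are written in a different order.
def signature(node)
  case node
  when REXML::Element
    attrs = []
    node.attributes.each { |name, value| attrs << "#{name}=#{value.inspect}" }
    # Child signatures keep document order; ignored nodes return nil.
    kids = node.children.map { |child| signature(child) }.compact
    Digest::SHA1.hexdigest(
      "#{node.expanded_name}|#{attrs.sort.join(',')}|#{kids.join(',')}"
    )
  when REXML::Text
    text = node.value.strip
    text.empty? ? nil : Digest::SHA1.hexdigest("text:#{text}")
  end
end

first  = signature(REXML::Document.new('<a x="1" y="2"><b/>hi</a>').root)
second = signature(REXML::Document.new('<a y="2" x="1"><b/>hi</a>').root)
same   = (first == second)
```

Sorting the attribute strings before hashing is the step that makes the digest independent of attribute order. Because each element's digest includes its children's digests, two roots with equal signatures have equivalent subtrees, up to hash collisions.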

    And note that Nokogiri will also allow you to parse XML fragments.
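
Where Nokogiri is not available, fragment parsing can be approximated with REXML by wrapping the fragment in a synthetic root element. This is a sketch under assumptions, not Nokogiri's fragment API: the `parse_fragment` helper and the `fragment` wrapper tag are made up for illustration, and the input is assumed to carry no XML declaration or doctype.

```ruby
require 'rexml/document'

# Hypothetical helper: parse a fragment that may have several top-level
# elements by wrapping it in a made-up <fragment> root element.
def parse_fragment(xml)
  REXML::Document.new("<fragment>#{xml}</fragment>").root
end

root = parse_fragment('<li id="a">One</li><li id="b">Two</li>')
# Collect the id attribute of each top-level element in the fragment.
ids = root.elements.to_a.map { |element| element.attributes['id'] }
```

The wrapper gives REXML the single root it requires, so the returned element can be passed to whatever comparison routine is used for whole documents.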

    HTH,
    -m

    Again, I NEVER see tag permutations, but just attribute permutations.

    I believe you. Nokogiri wouldn’t install though…yes, and nor did Lorax…

    Looks like there’s an install site, but I hesitate to use something this
    outside the mainstream on a project like this. I don’t want to impose
    needless maintenance problems on my environment.

    On Sat, Jul 17, 2010 at 09:03:21AM +0900, Xeno C. / Eskimo North and
    Gmail wrote:

    I need to have something that can help me judge such things as equivalent.
    Again, I NEVER see tag permutations, but just attribute permutations.

    I believe you. Nokogiri wouldn’t install though…yes, and nor did Lorax…

    Do you mind emailing our list with the problems? We do our best to make
    sure that nokogiri works on most systems, so if you’re having trouble
    we’d love to hear about it:

    http://groups.google.com/group/nokogiri-talk

    Looks like there’s an install site, but I hesitate to use something
    this outside the mainstream on a project like this. I don’t want to
    impose needless maintenance problems on my environment.

    I’m not sure that nokogiri is outside the mainstream. Take a look at
    our gem downloads vs the hpricot gem downloads:

    nokogiri | RubyGems.org | your community gem host
    hpricot | RubyGems.org | your community gem host

    Or the frequency of commits:

    http://github.com/tenderlove/nokogiri/commits/master
    Commits · hpricot/hpricot · GitHub

    Or the mailing list activity:

    http://groups.google.com/group/nokogiri-talk
    http://librelist.com/browser/hpricot/

    But “mainstream” is a judgement for you to make! :)