Forum: Ruby lorax 0.1.0 Released

Posted by Mike Dalessio (Guest)
on 2010-03-10 04:39
(Received via mailing list)
lorax version 0.1.0 has been released!

* <http://github.com/flavorjones/lorax>

The Lorax is a diff and patch library for XML/HTML documents, based on
Nokogiri.

It can tell you whether two XML/HTML documents are identical, or if
they're not, tell you what's different. In trivial cases, it can even
apply the patch.

It's based loosely on Gregory Cobena's master's thesis paper, which
describes an algorithm that generates deltas in less than O(n * log n)
time, accepting some tradeoffs in the size of the delta set. You can
find his paper at
http://gregory.cobena.free.fr/www/Publications/thesis.html.

This is an early alpha release, so please expect bugs. See the failing
tests for more information.

"I am the Lorax, I speak for the trees."

Changes:

## 0.1.0 (2010-03-09)

* Happy Birthday!
* Diffs and generates patches, and for trivial cases applies patches
correctly.
Posted by Tony Arcieri (Guest)
on 2010-03-10 04:54
(Received via mailing list)
On Tue, Mar 9, 2010 at 8:38 PM, Mike Dalessio <mike.dalessio@gmail.com> wrote:

> "I am the Lorax, I speak for the trees."
>

That's awesome, great name
Posted by Thomas Sawyer (7rans)
on 2010-03-10 12:10
(Received via mailing list)
On Mar 9, 10:38 pm, Mike Dalessio <mike.dales...@gmail.com> wrote:
> lorax version 0.1.0 has been released!
>
> * <http://github.com/flavorjones/lorax>
>
> The Lorax is a diff and patch library for XML/HTML documents, based on
> Nokogiri.
>
> It can tell you whether two XML/HTML documents are identical, or if
> they're not, tell you what's different. In trivial cases, it can even
> apply the patch.

Why not in every case?
Posted by Mike Dalessio (Guest)
on 2010-03-10 13:57
(Received via mailing list)
On Wed, Mar 10, 2010 at 6:09 AM, Intransition <transfire@gmail.com> wrote:

> > It can tell you whether two XML/HTML documents are identical, or if
> > they're not, tell you what's different. In trivial cases, it can even
> > apply the patch.
>
> Why not in every case?
>

Because there are still boogs! :-D

One example: the XPath pointing at the elements involved in the deltas
doesn't take into consideration the fact that other sibling elements may
have been inserted or removed as part of an earlier delta. Another
example: there are edge cases where Lorax can get confused by many
identical sibling nodes interleaved with changing elements (think
whitespace in an HTML doc).
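
To make the first problem concrete, here is a toy sketch. It is not
Lorax code: the children are plain strings, the delta tuples and the
apply_naive / apply_adjusted helpers are made up for illustration, and
the adjusted version assumes the deltas arrive sorted by their original
position.

```ruby
# Hypothetical illustration, not Lorax internals. Each delta addresses its
# target by the position it had in the ORIGINAL document, much like a
# positional XPath such as /root/*[2] would.
original = %w[a b c]
target   = %w[x a c] # "x" inserted at the front, "b" removed

deltas = [
  [:insert, 0, "x"], # insert "x" at original index 0
  [:delete, 1]       # delete "b", which sat at original index 1
]

# Applies each delta at its recorded index, ignoring earlier deltas.
def apply_naive(children, deltas)
  result = children.dup
  deltas.each do |op, index, value|
    case op
    when :insert then result.insert(index, value)
    when :delete then result.delete_at(index)
    end
  end
  result
end

# Tracks how far earlier inserts and deletes have shifted the siblings.
# Assumes the deltas are sorted by their original index.
def apply_adjusted(children, deltas)
  result = children.dup
  shift = 0
  deltas.each do |op, index, value|
    case op
    when :insert
      result.insert(index + shift, value)
      shift += 1
    when :delete
      result.delete_at(index + shift)
      shift -= 1
    end
  end
  result
end

naive    = apply_naive(original, deltas)    # removes "a" instead of "b"
adjusted = apply_adjusted(original, deltas) # matches target
```

The naive version deletes the wrong sibling because the earlier insert
moved "b" one slot to the right.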

I'd like to note that the library uses dependency injection to allow a
modular choice of algorithm. So people with better CS chops than me can
take a whack at it by building their own delta-set generator for their
favorite algorithm, while still using the fast subtree signatures.
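
Roughly, the idea looks like the sketch below. Every name in it (Node,
Signature, ShallowGenerator, Differ) is invented for illustration and is
not the Lorax class layout; it only shows a subtree hash plus a generator
object that gets passed in rather than hard-coded.

```ruby
require "digest/sha1"

# Hypothetical stand-in for an XML element; Lorax works on Nokogiri nodes.
Node = Struct.new(:name, :text, :children)

# Hash of a node's name, text and child signatures. Two subtrees with the
# same signature are treated as identical.
module Signature
  def self.of(node)
    parts = [node.name, node.text] + node.children.map { |child| of(child) }
    Digest::SHA1.hexdigest(parts.join("\0"))
  end
end

# One interchangeable strategy: compares only the root's direct children.
class ShallowGenerator
  def generate(old_root, new_root)
    old_sigs = old_root.children.map { |c| Signature.of(c) }
    new_sigs = new_root.children.map { |c| Signature.of(c) }
    deletes = old_root.children.reject { |c| new_sigs.include?(Signature.of(c)) }
    inserts = new_root.children.reject { |c| old_sigs.include?(Signature.of(c)) }
    deletes.map { |c| [:delete, c.name] } + inserts.map { |c| [:insert, c.name] }
  end
end

# The differ is handed its generator, so another algorithm can be swapped in
# as long as it responds to generate(old_root, new_root).
class Differ
  def initialize(generator)
    @generator = generator
  end

  def diff(old_root, new_root)
    # Equal root signatures mean the whole trees match; skip the generator.
    return [] if Signature.of(old_root) == Signature.of(new_root)
    @generator.generate(old_root, new_root)
  end
end

old_doc = Node.new("root", "", [Node.new("p", "hello", []), Node.new("p", "world", [])])
new_doc = Node.new("root", "", [Node.new("p", "hello", []), Node.new("div", "world", [])])

differ  = Differ.new(ShallowGenerator.new)
changes = differ.diff(old_doc, new_doc)
```

This sketch recomputes signatures on every call to keep it short; caching
them per node would be the obvious next step.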

If you're curious and interested, I'd love to have more eyes and hands
on these issues. Both master and whitespace-fix branches have failing
tests which can tell you where to dive in. The TODO has information on
class responsibilities, algorithmic notes, missing integration tests and
a list of needed features (like an rspec matcher).