lorax version 0.2.0 has been released!
The Lorax is a full diff and patch library for XML/HTML documents, based on Nokogiri.
It can tell you whether two XML/HTML documents are identical, or if
they’re not, tell you what’s different. In trivial cases, it can even
apply the patch.
It’s based loosely on Gregory Cobena’s master’s thesis paper, which
generates deltas in less than O(n * log n) time, accepting some
tradeoffs in the size of the delta set. You can find his paper at
“I am the Lorax, I speak for the trees.”
- Better handling of whitespace: blank text nodes are ignored, as is
leading and trailing whitespace in text nodes. GH#2.
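
To make the whitespace rule concrete, here is a minimal sketch in plain Ruby. It is not Lorax's implementation: `normalize_text_nodes` is a hypothetical helper, and text nodes are modeled as plain strings rather than Nokogiri nodes. It only illustrates why two documents that differ in indentation can end up comparing as equal.

```ruby
# Hypothetical helper (not part of Lorax): applies the two rules from the
# changelog entry to a list of text-node contents, modeled as plain strings.
def normalize_text_nodes(texts)
  texts
    .map { |text| text.strip }      # ignore leading and trailing whitespace
    .reject { |text| text.empty? }  # ignore text nodes that were only whitespace
end

# Text nodes as they might appear in a pretty-printed document...
pretty_printed = ["\n  ", "  Hello ", "\n", "world\n"]
# ...and in a compact version of the same document.
compact = ["Hello", "  world  "]

same_text = normalize_text_nodes(pretty_printed) == normalize_text_nodes(compact)
```

Under these assumptions, `same_text` is true: once blank nodes are dropped and the remaining text is stripped, both lists reduce to the same content.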
== Features / Problems
- Detect differences between documents, or tell whether two documents
  are the same.
- Generate patches for the differences between documents.
- Apply patches for trivial cases.
- More work needs to be done to make sure patches apply cleanly.
Imagine you have two Nokogiri::XML::Documents. You can tell if they’re
You can generate a delta set (currently opaque (sorry kids)):

  delta_set = Lorax.diff(doc1, doc2)

and apply the delta set as a patch to the original document:

  new_doc = delta_set.apply(doc1)