I went browsing in CPAN to find something interesting, and came up
with Algorithm::Merge.
I don’t use the perl version, but 3 way merging is something I do
often since we allow concurrent access with our source control at
work.
Merge.rb is a fairly straight port of the perl version. I did change
a callback to a block, and added some symbols in place of numeric
constants. I need to add better documentation, but I wanted to get
this in before it was too late for the summary.
Usage:
original= “Ok,\n this is a test sentence\n which will be edited.”
edited =“Ok,\n this is a sample phrase\n which has been edited.”
change=“Hello World,\n this is a test phrase\n which I edited.”
puts “\nSplit by lines\n--------------------”
$;=“\n”
Merge::diff3(original.split, edited.split, change.split).each{|l| puts
l.inspect}
puts Merge::merge(original.split, edited.split, change.split)
puts “\nSplit by words\n--------------------”
$;=" "
Merge::diff3(original.split, edited.split, change.split).each{|l| puts
l.inspect}
puts Merge::merge(original.split, edited.split,
change.split){|conflicts|
[“<<”]+conflicts[1]+[“|”]+conflicts[2]+[“>>”]}.join(’ ')
yields:
Split by lines
[“r”, “Ok,”, “Ok,”, “Hello World,”]
[“c”, " this is a test sentence", " this is a sample phrase", " this
is a test phrase"]
[“c”, " which will be edited.“, " which has been edited.”, " which I
edited."]
Hello World,
this is a sample phrase
which has been edited.
this is a test phrase
which I edited.
}
Split by words
[“r”, “Ok,”, “Ok,”, “Hello”]
[“r”, nil, nil, “World,”]
[“u”, “this”, “this”, “this”]
[“u”, “is”, “is”, “is”]
[“u”, “a”, “a”, “a”]
[“l”, “test”, “sample”, “test”]
[“o”, “sentence”, “phrase”, “phrase”]
[“u”, “which”, “which”, “which”]
[“c”, “will”, “has”, “I”]
[“c”, “be”, “been”, nil]
[“u”, “edited.”, “edited.”, “edited.”]
Hello World, this is a sample phrase which << has been | I >> edited.
Bugs:
- Merge::diff3(original,edited, change) - does a character-based diff,
but returns inconsistent results (lines like [u, e, s, e]). I think
this is because the callback_map has some no-ops where it should have
valid callbacks, but it could be due to a porting error. I am still
struggling to completely grok the use of the callback_map, with hopes
of simplifying/clarifying it.
Question:
Can I add to or replace the Perl license with the ruby one?
Source:
---- Merge.rb -----
module Merge
Module Merge
Three-way merge and diff
based on perl’s Algorithm::Merge
Copyright (C) 2003 Texas A&M University. All Rights Reserved.
This module is free software; you may redistribute it and/or
modify it under the same terms as Perl itself.
ported to Ruby
require ‘diff/lcs’
Given references to three lists of items, diff3 performs a
three-way difference.
This function returns an array of operations describing how the
left and right lists differ from the original list. In scalar
context, this function returns a reference to such an array.
Given the following three lists,
original: a b c e f h i k
left: a b d e f g i j k
right: a b c d e h i j k
merge: a b d e g i j k
we have the following result from diff3:
[ ‘u’, ‘a’, ‘a’, ‘a’ ],
[ ‘u’, ‘b’, ‘b’, ‘b’ ],
[ ‘l’, ‘c’, undef, ‘c’ ],
[ ‘o’, undef, ‘d’, ‘d’ ],
[ ‘u’, ‘e’, ‘e’, ‘e’ ],
[ ‘r’, ‘f’, ‘f’, undef ],
[ ‘o’, ‘h’, ‘g’, ‘h’ ],
[ ‘u’, ‘i’, ‘i’, ‘i’ ],
[ ‘o’, undef, ‘j’, ‘j’ ],
[ ‘u’, ‘k’, ‘k’, ‘k’ ]
The first element in each row is the array with the difference:
c - conflict (no two are the same)
l - left is different
o - original is different
r - right is different
u - unchanged
The next three elements are the lists from the original, left,
and right arrays respectively that the row refers to (in the synopsis,
def Merge::diff3( pivot, doc_a, doc_b)
ret = []
no_change = proc do |args|
ret << ['u', pivot[args[0]], doc_a[args[1]], doc_b[args[2]] ]
end
conflict = proc do |args|
p= pivot[args[0]] if args[0]
a= doc_a[args[1]] if args[1]
b= doc_b[args[2]] if args[2]
ret << ['c', p, a, b]
end
diff_a = proc do |args|
case args.size
when 1
ret << ['o',pivot[args[0]], nil, nil]
when 2
ret << ['o',nil, doc_a[args[0]], doc_b[args[1]]]
when 3
ret << ['o', pivot[args[0]], doc_a[args[1]], doc_b[args[2]]]
end
end
diff_b = proc do |args|
case args.size
when 1
ret << ['l', nil, doc_a[args[0]], nil]
when 2
ret << ['l', pivot[args[0]], nil, doc_b[args[1]]]
when 3
ret << ['l', pivot[args[0]], doc_a[args[1]], doc_b[args[2]]]
end
end
diff_c = proc do |args|
case args.size
when 1
ret << ['r', nil, nil, doc_b[args[0]]]
when 2
ret << ['r', pivot[args[0]], doc_a[args[1]], nil]
when 3
ret << ['r', pivot[args[0]], doc_a[args[1]], doc_b[args[2]]]
end
end
traverse_sequences3(pivot, doc_a, doc_b,
{:NO_CHANGE=>no_change, :CONFLICT=>conflict,
:A_DIFF=> diff_a, :B_DIFF=>diff_b, :C_DIFF=>diff_c}
)
return ret
end
#callbacks for Diff::LCS
class LCS_Traverse_Callbacks
def initialize diffs
@diffs = diffs
end
def [] l,r
@diffs[@left=l]=[]
@diffs[@right=r]=[]
self
end
def match *args
end
def discard_a event
@diffs[@left]<<event.old_position
end
def discard_b event
@diffs[@right]<<event.new_position
end
end
constants for traverse_sequences
D=nil
AB_A=32
AB_B=16
AC_A=8
AC_C=4
BC_B=2
BC_C=1
CB_B=5 #not used in calculations
CB_C=3 #not used in calculations
@base_doc = {AB_A=>:A,AB_B=>:B,AC_A=>:A,AC_C=>:C,BC_B=>:B,BC_C=>:C}
def Merge::traverse_sequences3(adoc, bdoc, cdoc, callbacks = {})
target_len = [bdoc.size,cdoc.size].min
bc_different_len = (bdoc.size != cdoc.size)
diffs = Hash.new([])
# callbacks#match:: Called when +a+ and +b+ are
pointing
# to common elements in +:A+ and
+:B+.
# callbacks#discard_a:: Called when +a+ is pointing to
an
# element not in +:B+.
# callbacks#discard_b:: Called when +b+ is pointing to
an
# element not in +:A+.
# The methods for callbacks#match,
callbacks#discard_a,
# and callbacks#discard_b are invoked with an event
comprising
# the action (“=”, “+”, or “-”, respectively), the indicies +ii+
and
# +jj+, and the elements :A[ii] and :B[jj].
Return
# values are discarded by #traverse_sequences.
ts_callbacks = LCS_Traverse_Callbacks.new(diffs)
Diff::LCS::traverse_sequences(adoc, bdoc, ts_callbacks[AB_A, AB_B])
Diff::LCS::traverse_sequences(adoc, cdoc, ts_callbacks[AC_A,AC_C])
if (bc_different_len)
Diff::LCS::traverse_sequences(cdoc, bdoc, ts_callbacks[CB_C,CB_B])
Diff::LCS::traverse_sequences(bdoc, cdoc, ts_callbacks[BC_B,BC_C])
if diffs[CB_B] != diffs[BC_B] || diffs[CB_C] != diffs[BC_C]
puts "Diff::diff is not symmetric for second and third
sequences - results might not be correct";
#trim to equal lengths and try again
b_len, c_len = bdoc.size, cdoc.size
bdoc_save = bdoc.slice!(target_len..-1)
cdoc_save = cdoc.slice!(target_len..-1)
Diff::LCS::traverse_sequences(bdoc, cdoc,
ts_callbacks[BC_B,BC_C])
#mark the trimmed part as different and then restore
diffs[BC_B] += (target_len..b_len).to_a if target_len < b_len
diffs[BC_C] += (target_len..c_len).to_a if target_len < c_len
bdoc.concat bdoc_save
cdoc.concat cdoc_save
end
else # not bc_different_len
Diff::LCS::traverse_sequences(bdoc, cdoc, ts_callbacks[BC_B,BC_C])
end
pos = {:A=>0,:B=>0,:C=>0}
sizes ={:A=>adoc.size, :B=>bdoc.size, :C=>cdoc.size}
matches=[]
noop = proc {}
# Callback_Map is indexed by the sum of AB_A, AB_B, ..., as
indicated by @matches
# this isn’t the most efficient, but it’s a bit easier to
maintain and
# read than if it were broken up into separate arrays
# half the entries are not noop - it would seem then that no
# entries should be noop. I need patterns to figure out what
the
# other entries are.
callback_Map = [
[ callbacks[:NO_CHANGE], :A, :B, :C ], # 0 - no matches
[ noop, ], # 1 -
BC_C
[ callbacks[:B_DIFF], :B ], #*2 -
BC_B
[ noop, ], # 3 -
BC_B BC_C
[ noop, ], # 4 -
AC_C
[ callbacks[:C_DIFF], :C ], # 5 -
AC_C BC_C
[ noop, ], # 6 -
AC_C BC_B
[ noop, ], # 7 -
AC_C BC_B BC_C
[ callbacks[:A_DIFF], :A ], # 8 - AC_A
[ noop, ], # 9 - AC_A
BC_C
[ callbacks[:C_DIFF], :A, :B ], # 10 - AC_A
BC_B
[ callbacks[:C_DIFF], :A, :B, ], # 11 - AC_A
BC_B BC_C
[ noop, ], # 12 - AC_A
AC_C
[ noop, ], # 13 - AC_A
AC_C BC_C
[ callbacks[:C_DIFF], :A, :B, ], # 14 - AC_A
AC_C BC_B
[ callbacks[:C_DIFF], :A, :B, :C ], # 15 - AC_A
AC_C BC_B BC_C
[ noop, ], # 16 - AB_B
[ noop, ], # 17 - AB_B
BC_C
[ callbacks[:B_DIFF], :B ], # 18 - AB_B
BC_B
[ noop, ], # 19 - AB_B
BC_B BC_C
[ callbacks[:A_DIFF], :B, :C ], # 20 - AB_B
AC_C
[ noop, ], # 21 - AB_B
AC_C BC_C
[ noop, ], # 22 - AB_B
AC_C BC_B
[ callbacks[:CONFLICT], :A, :B, :C ], # 23 - AB_B
AC_C BC_B BC_C
[ callbacks[:B_DIFF], :B ], # 24 - AB_B AC_A
[ noop, ], # 25 - AB_B AC_A
BC_C
[ callbacks[:C_DIFF], :B, :C ], # 26 - AB_B AC_A
BC_B
[ noop, ], # 27 - AB_B AC_A
BC_B BC_C
[ callbacks[:A_DIFF], :B, :C ], # 28 - AB_B AC_A
AC_C
[ noop, ], # 29 - AB_B AC_A
AC_C BC_C
[ noop, ], # 30 - AB_B AC_A
AC_C BC_B
[ callbacks[:B_DIFF], :B ], # 31 - AB_B AC_A
AC_C BC_B BC_C
[ callbacks[:NO_CHANGE], :A, :B, :C ], # 32 - AB_A
[ callbacks[:B_DIFF], :A, :C ], # 33 - AB_A
BC_C
[ noop, ], # 34 - AB_A
BC_B
[ callbacks[:B_DIFF], :A, :C ], # 35 - AB_A
BC_B BC_C
[ noop, ], # 36 - AB_A
AC_C
[ noop, ], # 37 - AB_A
AC_C BC_C
[ noop, ], # 38 - AB_A
AC_C BC_B
[ noop, ], # 39 - AB_A
AC_C BC_B BC_C
[ callbacks[:A_DIFF], :A, ], # 40 - AB_A AC_A
[ noop, ], # 41 - AB_A AC_A
BC_C
[ callbacks[:A_DIFF], :A ], # 42 - AB_A AC_A
BC_B
[ noop, ], # 43 - AB_A AC_A
BC_B BC_C
[ noop, ], # 44 - AB_A AC_A
AC_C
[ callbacks[:C_DIFF], :A, D, :C ], # 45 - AB_A AC_A
AC_C BC_C ##ADS: I think this should be :CONFLICT??
[ noop, ], # 46 - AB_A AC_A
AC_C BC_B
[ noop, ], # 47 - AB_A AC_A
AC_C BC_B BC_C
[ noop, ], # 48 - AB_A AB_B
[ callbacks[:B_DIFF], :A, :C ], # 49 - AB_A AB_B
BC_C
[ noop, ], # 50 - AB_A AB_B
BC_B
[ callbacks[:B_DIFF], :A, :B, :C ], # 51 - AB_A AB_B
BC_B BC_C
[ callbacks[:A_DIFF], :B, :C ], # 52 - AB_A AB_B
AC_C
[ noop, ], # 53 - AB_A AB_B
AC_C BC_C
[ noop, ], # 54 - AB_A AB_B
AC_C BC_B
[ callbacks[:C_DIFF], :C ], # 55 - AB_A AB_B
AC_C BC_B BC_C
[ callbacks[:B_DIFF], :A, :C ], # 56 - AB_A AB_B AC_A
[ noop, ], # 57 - AB_A AB_B AC_A
BC_C
[ callbacks[:CONFLICT], :A, :B, D ], # 58 - AB_A AB_B AC_A
BC_B ##ADS: I changed this one to :CONFLICT
[ noop, ], # 59 - AB_A AB_B AC_A
BC_B BC_C
[ callbacks[:A_DIFF], :A, :B, :C ], # 60 - AB_A AB_B AC_A
AC_C
[ callbacks[:CONFLICT], :A, D, :C ], # 61 - AB_A AB_B AC_A
AC_C BC_C
[ callbacks[:CONFLICT], :A, :B, D ], # 62 - AB_A AB_B AC_A
AC_C BC_B
[ callbacks[:CONFLICT], :A, :B, :C ], # 63 - AB_A AB_B AC_A
AC_C BC_B BC_C
]
#while there is something to work with
while diffs.values.find{|e|e.size>0} &&
[:A,:B,:C].find{|n|pos[n]<sizes[n]}
#find all the differences at the current position of each doc
matchset=[:A,:B,:C].inject([]) do |ms,i|
ms+diffs.find_all {|k,v|@base_doc[k]==i && v[0]==pos[i]}
end
callback_num=matchset.uniq.inject(0){|cb,val| (cb|val[0])}
callback = callback_Map[callback_num]
args = callback[1..-1]
callback[0].call(args.map{|ar| ar&&pos[ar]})
args.each do |n|
pos[n]+=1 if n
case n
when :A
diffs[AB_A].shift while diffs[AB_A][0] && ( diffs[AB_A][0]
< pos[n] )
diffs[AC_A].shift while diffs[AC_A][0] && ( diffs[AC_A][0]
< pos[n] )
when :B
diffs[AB_B].shift while diffs[AB_B][0] && ( diffs[AB_B][0]
< pos[n] )
diffs[BC_B].shift while diffs[BC_B][0] && ( diffs[BC_B][0]
< pos[n] )
when :C
diffs[AC_C].shift while diffs[AC_C][0] && ( diffs[AC_C][0]
< pos[n] )
diffs[BC_C].shift while diffs[BC_C][0] && ( diffs[BC_C][0]
< pos[n] )
end
end #args.each
#raise “args empty” if args.empty? ##ADS: args.empty? is true
if the callback was a no-op. I don’t think that should happen.
break if args.empty?
end
#this part takes care of the leftovers
bits={:A=>4,:B=>2,:C=>1}
while [:A,:B,:C].find{|n|pos[n]<sizes[n]}
match = 0
args=[]
[:A,:B,:C].each do |i|
if pos[i]<sizes[i]
match|=bits[i]
args << pos[i]
pos[i]+=1
end
end
switch = [0,5,24,17,34,8,10,0][match] #ADS: I totally don't
understand how these callbacks were chosen
callback_Map[switch][0].call(*args) if callback_Map[switch][0]
end
end
Given references to three lists of items, merge performs a three-way
merge. The merge function uses the diff3 function to do most of
the work.
The optional block parameter is called for conflicts. It should
accept an array of 3 arrays
The first array holds a list of elements from the original list.
The second array has a list of elements from the left list.
The last array holds a list of elements from the right list.
The block should return a list of elements to place in the merged
list in place of the conflict.
The default conflict handler returns:
[“<!-- ------ START CONFLICT ------ -->”,
args[1],
“<!-- ---------------------------- -->”,
args[2],
“<!-- ------ END CONFLICT ------ -->}”]
def Merge::merge(pivot,doc_a, doc_b)
conflict_callback = proc do |args|
["<!-- ------ START CONFLICT ------ -->",
args[1],
"<!-- ---------------------------- -->",
args[2],
"<!-- ------ END CONFLICT ------ -->}"]
end
diff = diff3(pivot, doc_a, doc_b);
ret = []
conflict = [[],[],[]]
diff.each do |diffline|
i = 0
if diffline[0] == 'c' # conflict
conflict[0] << diffline[1] if diffline[1];
conflict[1] << diffline[2] if diffline[2];
conflict[2] << diffline[3] if diffline[3];
else
unless (conflict[0].empty? && conflict[1].empty? &&
conflict[2].empty?)
ret << (block_given? ? yield(conflict) :
conflict_callback.call(conflict))
conflict = [[],[],[]]
end
case diffline[0]
when ‘u’ # unchanged
ret << diffline[2] || diffline[3];
when ‘o’,‘l’ # added by both or left
ret << diffline[2] if diffline[2]
when ‘r’ #added by right
ret << diffline[3] if diffline[3]
end
end
end
unless (conflict[0].empty? && conflict[1].empty? &&
conflict[2].empty?)
ret << (block_given? ? yield(conflict) :
conflict_callback.call(conflict))
end
ret
end
end