This is the Levenshtein function I’m gankin’ for my file comparison
project (see “40 million comparison…” thread):
Modified slightly by John Perkins:
– removed $KCODE call
def distance(str1, str2)
  unpack_rule = 'C*'
  s = str1.unpack(unpack_rule)
  t = str2.unpack(unpack_rule)
  n = s.length
  m = t.length
  return m if (0 == n) # stop the madness if either string is empty
  return n if (0 == m)

  d = (0..m).to_a # previous row of distances
  x = nil

  (0...n).each do |i|
    e = i + 1
    (0...m).each do |j|
      cost = (s[i] == t[j]) ? 0 : 1
      x = [
        d[j + 1] + 1, # insertion
        e + 1,        # deletion
        d[j] + cost   # substitution
      ].min
      d[j] = e
      e = x
    end
    d[m] = x
  end

  return x
end
When I ran this with test data in Ruby 1.8 the output was 969, but
when I ran it on a 1.9 install the output was 1011. I’m aware that
some of the rules have changed, especially with arrays. Does anyone
see where the discrepancy lies? I sure as heck don’t. The files
didn’t change, so the distance shouldn’t either. Thanks in advance
for all your help.
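One possible way to narrow this down (a sketch, not a diagnosis): run the function on small inputs whose distances are known, then print the byte counts each interpreter sees for the real data. The function below is the same row-by-row algorithm, repeated so the check runs on its own. byte_count is a hypothetical helper that is not part of the original code.

```ruby
# Sketch only: same algorithm as the function quoted above.
def distance(str1, str2)
  s = str1.unpack('C*') # compare raw bytes, not characters
  t = str2.unpack('C*')
  n = s.length
  m = t.length
  return m if n.zero?
  return n if m.zero?

  d = (0..m).to_a # distances for the previous row
  x = nil
  (0...n).each do |i|
    e = i + 1
    (0...m).each do |j|
      cost = s[i] == t[j] ? 0 : 1
      x = [d[j + 1] + 1, e + 1, d[j] + cost].min
      d[j] = e
      e = x
    end
    d[m] = x
  end
  x
end

# Hypothetical helper (an assumption, not from the original post):
# number of bytes the function will actually compare for a string.
def byte_count(str)
  str.unpack('C*').length
end

puts distance('kitten', 'sitting') # known answer: 3
puts distance('flaw', 'lawn')      # known answer: 2
puts byte_count('kitten')
```

If the small cases agree on both interpreters but the byte counts for the real files differ between 1.8 and 1.9, the inputs are probably being read differently and the function itself may be fine.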
Here’s an idea: a migration script that detects when and where a
script will break after the big 1.9/2 move.
I’m not that good of a rubyist, but I know there’s about a dozen
people reading these that could have it done by Friday night (I’m
looking at YOU, _why).
DJ Jazzy L. wrote:
| Here’s an idea: a migration script that detects when and where a
| script will break after the big 1.9/2 move.
| I’m not that good of a rubyist, but I know there’s about a dozen
| people reading these that could have it done by Friday night (I’m
| looking at YOU, _why).
Maybe this is your chance to get your 15 minutes of fame and deepen
your understanding of Ruby at the same time.
I think a no-brainer prerequisite to this script would be 2.0 actually