well, I’m trying to get the levenshtein function to work, and it’s
close, but it’s not quite giving me the right answer and i have no idea
why.
here’s the code:
class Array2D
def initialize(width, height) @data = Array.new(width) { Array.new(height) }
end
def [](x, y) @data[x][y]
end
def []=(x, y, value) @data[x][y] = value
end
end
def levenshtein(s, t) #include cost?
m = s.size
n = t.size
d = Array2D.new(m+1,n-1)
for i in 0…m
d[i,0]=i
end
for j in 0..n
d[0,j]=j
end
for j in 1…n
for i in 1…m
if(s[i].eql?(t[i])) then
d[i,j]=d[i-1,j-1]
end
if not (s[i].eql?(t[i])) then
d[i,j] = [ (d[i-1,j]+1), (d[i,j-1]+1), (d[i-1,j-1]+1)
].min
end
end
end
puts “After:”+ d.inspect
puts "Ans: "
return d[m,n]
end #end func
when I call it:
puts “Call to levenshtein():”
puts levenshtein(“string1”,“2strings”)
Looking at the description at http://www.levenshtein.net/ it appears
that both m and n should be +1
The strings are then offset to allow indexing from 1. Guess you
either need to add a character (blank?)
to the beginning of both strings, or adjust your indexes into s and t
(the 1st char is the 0th index)
I’m totally pulling this out of my head, as I don’t have Ruby on my
work computer
Looking at the description at http://www.levenshtein.net/ it appears
that both m and n should be +1
The strings are then offset to allow indexing from 1. Guess you
either need to add a character (blank?)
to the beginning of both strings, or adjust your indexes into s and t
(the 1st char is the 0th index)
I’m totally pulling this out of my head, as I don’t have Ruby on my
work computer
Thank you for the reply. Unfortunately, I get the same error when I try
this
Now, that bugged me too, and I had a go (using more idiomatic ruby
ranges and Enumerable methods, and “enhancing” the String class):
Notes:
When first going through the matrix, I fill the initial values in the
first column and row, and set the rest to nil.
When traversing the matrix a second time, I start at index ‘1’ and end
at the index calculated from the size, but the strings actually use
zero-based indexes. To illustrate that:
S T R I N G ← This is the string - spaced out f. visibility
0 1 2 3 4 5 ← These are the corresponding index numbers
^ This is the last index, but the size is returned as 6.
So, when I am traversing the matrix, I use 1…6, but the corresponding
letters are at 0…5. That is the reason that I have to use
cost = (self[s-1] == other[t-1]) ? 0 : 1
so I really do compare the strings letter by letter.