Levenshtein function issue

well, I’m trying to get the levenshtein function to work, and it’s
close, but it’s not quite giving me the right answer and i have no idea
why.

here’s the code:
class Array2D
def initialize(width, height)
@data = Array.new(width) { Array.new(height) }
end
def [](x, y)
@data[x][y]
end
def []=(x, y, value)
@data[x][y] = value
end
end

def levenshtein(s, t)
#include cost?

m = s.size
n = t.size

d = Array2D.new(m+1,n-1)

for i in 0…m
d[i,0]=i
end

for j in 0..n
d[0,j]=j

end

for j in 1…n
for i in 1…m
if(s[i].eql?(t[i])) then
d[i,j]=d[i-1,j-1]
end

  if not (s[i].eql?(t[i])) then
  d[i,j] = [  (d[i-1,j]+1),     (d[i,j-1]+1),    (d[i-1,j-1]+1)

].min
end
end
end

puts “After:”+ d.inspect

puts "Ans: "
return d[m,n]
end #end func

when I call it:
puts “Call to levenshtein():”
puts levenshtein(“string1”,“2strings”)

and the output…
here’s the output:
Call to levenshtein():
After:#<Array2D:0x283ee90 @data=[[0, 1, 2, 3, 4, 5, 6, 7, 8], [1, 1, 2,
3, 4, 5,
6, 7, 8], [2, 2, 2, 3, 4, 5, 6, 7, 8], [3, 3, 3, 3, 4, 5, 6, 7, 8], [4,
4, 4, 4
, 4, 5, 6, 7, 8], [5, 5, 5, 5, 5, 5, 6, 7, 8], [6, 6, 6, 6, 6, 6, 6, 7,
8], [7,
7, 7, 7, 7, 7, 7, 7, 8]]>
Ans:
8

If you check those two strings I tested with at this URL, my resulting
matrix is wrong (no idea why), and the answer should be 2…
check it here:
http://www.miislita.com/searchito/levenshtein-edit-distance.html

=

THANK YOU SO MUCH FOR ANY HELP!

d = Array2D.new(m+1,n-1)

You have ‘n-1’ here, but subsequently use indexes up to n.

cheers

Chris H. wrote:

� d = Array2D.new(m+1,n-1)

You have ‘n-1’ here, but subsequently use indexes up to n.

cheers

Thanks for the reply. Yes, but when I change that, I just get an
error… (that’s why I had originally changed it)

On Sep 26, 12:43 am, Joe S. [email protected] wrote:

error… (that’s why I had originally changed it)

Posted viahttp://www.ruby-forum.com/.

Looking at the description at http://www.levenshtein.net/ it appears
that both m and n should be +1
The strings are then offset to allow indexing from 1. Guess you
either need to add a character (blank?)
to the beginning of both strings, or adjust your indexes into s and t
(the 1st char is the 0th index)

I’m totally pulling this out of my head, as I don’t have Ruby on my
work computer

On Sep 27, 3:37 pm, Joe S. [email protected] wrote:

to the beginning of both strings, or adjust your indexes into s and t
(the 1st char is the 0th index)

I’m totally pulling this out of my head, as I don’t have Ruby on my
work computer

Thank you for the reply. Unfortunately, I get the same error when I try
this :frowning:

Posted viahttp://www.ruby-forum.com/.

This was bugging me so I installed Ruby and gave it a go.
You do need to change that ‘n-1’ to be ‘n+1’

Then in the nested loop you need to use both indexes:
for j in 1…n
for i in 1…m
if(s[i].eql?(t[i])) then
^this ‘i’ should be ‘j’

But you still need to adjust the indexes into the strings (i-1 and j-1
seems to work here)

cheers

Chris H. wrote:

On Sep 26, 12:43�am, Joe S. [email protected] wrote:

error… (that’s why I had originally changed it)

Posted viahttp://www.ruby-forum.com/.

Looking at the description at http://www.levenshtein.net/ it appears
that both m and n should be +1
The strings are then offset to allow indexing from 1. Guess you
either need to add a character (blank?)
to the beginning of both strings, or adjust your indexes into s and t
(the 1st char is the 0th index)

I’m totally pulling this out of my head, as I don’t have Ruby on my
work computer

Thank you for the reply. Unfortunately, I get the same error when I try
this :frowning:

On 09/28/2010 08:10 PM, Chris H. wrote:

On Sep 27, 3:37 pm, Joe S. [email protected] wrote:

Chris H. wrote:

On Sep 26, 12:43 am, Joe S. [email protected] wrote:

error… (that’s why I had originally changed it)

[…]

Now, that bugged me too, and I had a go (using more idiomatic ruby
ranges and Enumerable methods, and “enhancing” the String class):

Notes:

When first going through the matrix, I fill the initial values in the
first column and row, and set the rest to nil.

When traversing the matrix a second time, I start at index ‘1’ and end
at the index calculated from the size, but the strings actually use
zero-based indexes. To illustrate that:

S T R I N G ← This is the string - spaced out f. visibility
0 1 2 3 4 5 ← These are the corresponding index numbers

      ^ This is the last index, but the size is returned as 6.

So, when I am traversing the matrix, I use 1…6, but the corresponding
letters are at 0…5. That is the reason that I have to use

cost = (self[s-1] == other[t-1]) ? 0 : 1

so I really do compare the strings letter by letter.

Hope that helps a bit,

t.

Seems to be working after some minor changes, thank you everyone!!!