Finding: Want closest match, not exact (complex LIKE)

Currently I am working on a game scoring system. This system has team
names with scores attached. These scores are going to be entered by copy
and pasting a CSV sheet in an input field. I am currently working on the
parser for this.

The problem I run into is with finding the team name in the database
when the CSV sheet does not entirely match. You must imagine that
referees verbally register teams. Let’s say the team ‘Skullz n Bonez’
signs up for a match. The referee might put down ‘Skulls and Bones’
which will not match when it is entered later on in the parsebox.

Obviously for any human ‘Skullz n Bonez’ matches ‘Skulls and Bones’
better than any other entry in the team name table. How can I bring this
insight to my Ruby on Rails application?

I already was thinking about replacing all non-alpha characters with %
signs and then doing an SQL-LIKE operation. This would match '‘n’, ‘&’,
‘and’. Yet this would not solve the ‘Skullz’-‘Skulls’ mismatch for
example.

I know Word for example uses an algorithm to check which word in it’s
dictionary is closest to the misspelled word. I have written such an
algorithm once in C++ which would work very well in this situation,
though I fear that without pointer and manual memory management these
recursive operations would be very performance heavy.

Chris D. wrote:

dictionary is closest to the misspelled word. I have written such an
algorithm once in C++ which would work very well in this situation,
though I fear that without pointer and manual memory management these
recursive operations would be very performance heavy.

There is an algorithm used in geneology for encoding names so as to get
matches when the spelling varies a bit. It is called soundex. It is a
simple algorithm that combines many of the consonants and assigns each
group a number. You end up with a letter plus 3 digits. It is most
commonly used for indexing the census. A definition of the algorithm is
at Soundex - What It Is and How to Use It - Genealogy.com

2008/9/1 Norm [email protected]:

Chris D. wrote:

Obviously for any human ‘Skullz n Bonez’ matches ‘Skulls and Bones’
better than any other entry in the team name table. How can I bring this
insight to my Ruby on Rails application?

There is an algorithm used in geneology for encoding names so as to get
matches when the spelling varies a bit. It is called soundex.

MySQL already has this function. Not sure if PostgreSQL does, but I
think it does.

Cheers!


Rodrigo F.
Concepción, Chile