Is there a way to take two strings and decide whether they are “similar”?
I’m creating a contact system in Rails and am having a big problem
with my users punching in duplicate entries with the last names spelled
slightly differently.
Is there a way to check if 2 strings are “identical” up to a certain
percentage, such as only having 1 or 2 characters different?
(Question posted by Dylan M. on Tue, Aug 01, 2006 at 03:25:59PM +0900.)
Looks like there is a soundex implementation for Ruby: http://raa.ruby-lang.org/search.rhtml?search=soundex
If by any chance you are using MySQL, you could also use its built-in
SOUNDEX() function.
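The RAA implementation isn't reproduced here. To give a rough idea of what a Soundex code is, here is a minimal sketch of the American Soundex rules as commonly described (keep the first letter, map consonants to digits, collapse repeated codes, let H and W be transparent while vowels and Y separate, pad to four characters). The names `soundex` and `SOUNDEX_CODES` are made up for this example, non-ASCII letters are simply dropped, and other Soundex variants differ in the details.

```ruby
# Hypothetical digit table for American Soundex; vowels, Y, H and W are absent.
SOUNDEX_CODES = {
  "B" => "1", "F" => "1", "P" => "1", "V" => "1",
  "C" => "2", "G" => "2", "J" => "2", "K" => "2",
  "Q" => "2", "S" => "2", "X" => "2", "Z" => "2",
  "D" => "3", "T" => "3",
  "L" => "4",
  "M" => "5", "N" => "5",
  "R" => "6"
}.freeze

# Sketch of American Soundex: first letter plus three digits.
def soundex(name)
  letters = name.upcase.delete("^A-Z") # keep only A-Z
  return "" if letters.empty?

  first = letters[0]
  digits = +""
  last_code = SOUNDEX_CODES[first] # the first letter's code still blocks repeats
  letters[1..-1].each_char do |ch|
    code = SOUNDEX_CODES[ch]
    if code
      digits << code unless code == last_code # collapse same-code neighbours
      last_code = code
    elsif ch != "H" && ch != "W"
      last_code = nil # a vowel (or Y) separates, so a repeated code counts again
    end
    # H and W are skipped without resetting last_code.
  end
  (first + digits + "000")[0, 4] # pad with zeros, truncate to 4 characters
end

puts soundex("Robert")                     # prints R163
puts soundex("Smith") == soundex("Smyth")  # prints true
```

Two last names with the same code become candidates for a "possible duplicate" warning; Soundex says nothing about how close two names are beyond that yes/no grouping.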
You know, there are lots of implementations there, but the Ruby one
seems to be missing. [There’s no reason to restrict it to working on
strings. If you rely on duck typing, it’ll work just as nicely on arrays
of whatever you like.]
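Assuming the implementations referred to are of an edit-distance algorithm such as Levenshtein distance, the duck-typing remark can be illustrated with a small sketch. The method name `levenshtein` is hypothetical; it only uses `#size`, `#[]` and `==` on its arguments, so it accepts Strings and Arrays alike.

```ruby
# Levenshtein distance: the minimum number of insertions, deletions and
# substitutions needed to turn a into b. Works on anything that responds to
# #size and integer #[] (String, Array, ...), compared element by element.
def levenshtein(a, b)
  # prev[j] = distance between the first i-1 elements of a and first j of b
  prev = (0..b.size).to_a
  (1..a.size).each do |i|
    curr = [i] # turning i elements into zero elements takes i deletions
    (1..b.size).each do |j|
      cost = a[i - 1] == b[j - 1] ? 0 : 1
      curr << [
        prev[j] + 1,        # deletion
        curr[j - 1] + 1,    # insertion
        prev[j - 1] + cost  # substitution, or free if the elements match
      ].min
    end
    prev = curr # only two rows are kept, so memory is O(b.size)
  end
  prev[b.size]
end

puts levenshtein("kitten", "sitting")               # prints 3
puts levenshtein(%w[John Q Public], %w[John Public]) # prints 1
```

The comparison is case-sensitive, so for last names you would probably `downcase` (and maybe `strip`) both sides first.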
… that are no more complex, so you might want to explore around a little.
If the names might come from different languages, I would rather use
Levenshtein distance than Soundex. Levenshtein is probably good at scoring
typos as “very close”, whereas Soundex is somewhat language-specific (it
was designed around English pronunciation).
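For the original "only 1 or 2 characters different" requirement, a full distance isn't strictly needed; you only need to know whether the distance stays under a limit. The sketch below (the name `within_edits?` is hypothetical) runs the usual Levenshtein dynamic programme but bails out early, relying on two facts: the length difference is a lower bound on the distance, and the minimum of each DP row never decreases from one row to the next.

```ruby
# True if a and b are within max_edits Levenshtein edits of each other.
# Same DP as a full Levenshtein distance, with two early exits.
def within_edits?(a, b, max_edits)
  # At least |len(a) - len(b)| insertions or deletions are unavoidable.
  return false if (a.size - b.size).abs > max_edits

  prev = (0..b.size).to_a
  (1..a.size).each do |i|
    curr = [i]
    (1..b.size).each do |j|
      cost = a[i - 1] == b[j - 1] ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    # Row minima never decrease, so once every entry exceeds the limit
    # the final distance must exceed it too.
    return false if curr.min > max_edits
    prev = curr
  end
  prev[b.size] <= max_edits
end

puts within_edits?("smith", "smyth", 1)   # prints true
puts within_edits?("johnson", "jonson", 1) # prints true
puts within_edits?("smith", "jones", 2)   # prints false
```

In a Rails app you would presumably run something like this against the existing last names (downcased) before saving a new contact; how you fetch and narrow those candidates is left open here.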
Interesting: your name is almost “Markov” ;-}
Anyway, besides the algorithms mentioned here, I have notes on Double
Metaphone, NYSIIS and Phonex. For sequence analysis in general, google
McIlroy-Hunt and Ratcliff/Obershelp:
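Ratcliff/Obershelp is handy here because it yields exactly the kind of "percentage" the question asks for: twice the number of matched characters divided by the total length of both strings, where matches are found by taking the longest common substring and recursing on the pieces to its left and right. This is a minimal, unoptimised sketch of that basic idea; the method names are hypothetical.

```ruby
# Ratcliff/Obershelp ("gestalt pattern matching") similarity in 0.0..1.0.
def ratcliff_obershelp(a, b)
  total = a.size + b.size
  return 1.0 if total.zero? # two empty strings count as identical
  2.0 * matching_chars(a, b) / total
end

# Characters matched by taking the longest common substring, then
# recursing on the unmatched pieces to its left and to its right.
def matching_chars(a, b)
  return 0 if a.empty? || b.empty?
  start_a, start_b, len = longest_common_substring(a, b)
  return 0 if len.zero?
  len +
    matching_chars(a[0, start_a], b[0, start_b]) +
    matching_chars(a[(start_a + len)..-1], b[(start_b + len)..-1])
end

# [start_in_a, start_in_b, length] of the longest common substring, using
# the standard O(m*n) DP; on ties the first match found in scan order wins.
def longest_common_substring(a, b)
  best = [0, 0, 0]
  prev = Array.new(b.size + 1, 0)
  a.each_char.with_index do |ca, i|
    curr = Array.new(b.size + 1, 0) # curr[j+1] = common run ending at a[i], b[j]
    b.each_char.with_index do |cb, j|
      next unless ca == cb
      curr[j + 1] = prev[j] + 1
      if curr[j + 1] > best[2]
        best = [i - curr[j + 1] + 1, j - curr[j + 1] + 1, curr[j + 1]]
      end
    end
    prev = curr
  end
  best
end

puts ratcliff_obershelp("WIKIMEDIA", "WIKIMANIA").round(2) # prints 0.78
```

Flagging pairs above some cutoff (0.8, say) as possible duplicates would be one way to use it, but any particular threshold is an assumption to tune against real contact data.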