Fuzzy Validation Before Adding New Record?

I’m looking for some direction on the best way to implement this.
There may be plenty of info out there already and I’m just searching
under the wrong terminology.

I have a MySQL database that contains a table of products. Each
product has fields for “product name”, “store”, “city”, “state” and
“price”. When someone adds a new product record I’d like to validate
it against the existing records and check for duplicates, including
ones that differ only in spelling. If it finds a similar match on
“product name”, “store” and “city” I’d like it not to create a
duplicate, but it should add the record if the city or store is
different (so you could add the same product, but show pricing in
different cities or different stores).

Example:
Name: Magnavox 42 inch
Store: Wal-Mart
City: Milwaukee
State: WI

New entry:
Name: Magnavox 42 in
Store: Walmart
City: Milwauke
State: WI

I don’t want a duplicate record created since these match. Trying to
figure out what to do when product names can be abbreviated or store
names spelled differently (wal-mart/walmart/wal mart or h.e.b./heb) or
city names misspelled.

This sounded like a fuzzy search using something like Ferret, or a
“did you mean” spelling plugin such as:


or
http://www.ruby-forum.com/topic/104327 It also sounded like a
validation problem, but I didn’t find anything under
http://api.rubyonrails.org/classes/ActiveRecord/Validations/ClassMethods.html
that fit. A combination of all three sounds like a bloated solution
that would degrade performance.

Am I on the right track or should I be looking into something else to
help me figure this out? Thanks.

On Apr 21, 2:34 am, Jeremy [email protected] wrote:

I don’t want a duplicate record created since these match. Trying to
figure out what to do when product names can be abbreviated or store
names spelled differently (wal-mart/walmart/wal mart or h.e.b./heb) or
city names misspelled.

I’ve previously used aspell (a spellchecker) (via the raspell ruby
library) to handle a similar sort of problem.

Fred

What you want is the Levenshtein distance, which is used in many
spell checkers. It is the minimum number of single-character changes
(insertions, deletions or substitutions) needed to get from one string
to another (so for the product names in your example it would be
2… c and h). Since you want to compare two specific strings against
each other, a spell-checker isn’t exactly what you want. Strings a
small distance apart will likely be variations of each other. For the
UI, a good approach is a “hint box” (think of what Google does for
common searches): after the user has typed the entry in, list the
nearest existing items (up to distance 5 or so) under a
"Did you mean … " label. Humans are always better than computers at
seeing whether two things are the same, and this is a good way to
combine good computing with good human interaction. You can also use
the spelling package to cull out misspellings, as Fred suggested.

This library has the raw algorithm you need to do it very quickly.

http://text.rubyforge.org/
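
For illustration only, here is a minimal pure-Ruby sketch of the idea
above, without the library. The names levenshtein, normalize,
similar_records and FUZZY_FIELDS are made up for this example, the
records are plain hashes rather than ActiveRecord objects, and the
threshold of 2 is an assumption you would need to tune.

```ruby
# Levenshtein distance: minimum number of single-character insertions,
# deletions or substitutions needed to turn string a into string b.
def levenshtein(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  # prev[j] = distance between the previous prefix of a and the
  # first j characters of b.
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical normalizer: lowercase and keep only letters and digits,
# so "Wal-Mart", "wal mart" and "Walmart" all become "walmart".
def normalize(str)
  str.downcase.gsub(/[^a-z0-9]/, "")
end

# Fields compared fuzzily (an assumption based on the question).
FUZZY_FIELDS = [:name, :store, :city]

# Hypothetical helper: existing records in the same state whose name,
# store and city are all within max_distance edits of the candidate.
def similar_records(candidate, existing, max_distance = 2)
  existing.select do |rec|
    next false unless candidate[:state] == rec[:state]
    FUZZY_FIELDS.all? do |f|
      levenshtein(normalize(candidate[f]), normalize(rec[f])) <= max_distance
    end
  end
end

existing = [
  { name: "Magnavox 42 inch", store: "Wal-Mart", city: "Milwaukee", state: "WI" }
]
new_entry = { name: "Magnavox 42 in", store: "Walmart", city: "Milwauke", state: "WI" }

matches = similar_records(new_entry, existing)
puts "Did you mean: #{matches.first[:name]}?" unless matches.empty?
```

A fixed threshold will give false matches on very short strings, so
it is probably safer to show the matches to the user as suggestions
than to reject the record outright. Comparing against every row also
gets slow on a big table, so you would want to narrow the candidates
first (for example by state).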

On Apr 21, 2:36 am, Frederick C. [email protected]
