That would actually be perfect, because I could easily find the term
that is closest to that number. The only problem is that I have
absolutely no idea how to write that algorithm. Any idea what to search
for on Google, or do you know where I can start with this? I searched
Google for a good hour and basically ended up with nothing.
The problem you describe has not yet been completely solved in computer
science, at least to my knowledge.
You are dealing with a set of words and are searching for the “closest”
match. First you have to define what “close” means. In mathematics and
CS, the “function” that returns the numeric distance between two
elements of a set is called the set’s “metric”. An example is the
Euclidean metric sqrt(sqr(x) + sqr(y)), where x and y are the coordinate
differences between two points in the plane. Try to search the internet
for “metric” and “word metric” to find out about metrics in general.
The problem that presents itself now is how to index a metric space that
has no further special properties (such as a vector space would have).
The only index structure I know for this is called “M Tree”.
Why index? Well, consider the following approach: Query all data sets
and then, for each data set, compute the difference between the query
word and the word in the data set. Sort the differences and select the 5
closest. Sloooow for large sets. So the indexing has to take place in
the database.
I do not know any database that implements M trees as of now, and
extending PostgreSQL, for example, to support M Tree indexes would
probably break your project’s scope.
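For reference, the unindexed approach described above (compare the query
word with every stored word, then keep the 5 smallest differences) can
be sketched roughly as follows. The edit distance used as the metric and
all names here are assumptions for illustration. Every query touches
every word, which is exactly why it does not scale.

```python
import heapq


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used here as an assumed word metric."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def closest_words(query: str, words: list[str], k: int = 5):
    """Linear scan: one distance computation per stored word."""
    scored = ((edit_distance(query, w), w) for w in words)
    # nsmallest returns the k best (distance, word) pairs, best first.
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    vocabulary = ["similar", "simile", "simple", "familiar",
                  "smile", "dissimilar", "banana"]
    for distance, word in closest_words("cimilar", vocabulary):
        print(distance, word)
```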
So what to do? Settle for a heuristic solution!
The easiest thing here would be either to use the full text index engine
of the RDBMS of your choice or fall back to a separate server process
running on your server like Lucene or something that uses the
or a similar approach. For each word you enter into your database, you
send the word with the primary key of that entry to your word database
server.
Another possibility would be to “factorize” similar words, i.e. similar
sounding words and/or letters are mapped to the same representation. For
example, “similar” and “cimilar” could be considered the same.
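A minimal sketch of such a “factorization”, assuming a small hand-made
table of letters that are treated as sounding alike. The table and the
function name are hypothetical and far too crude for real use. You would
store the key next to each word and look up query words by their key.

```python
# Hypothetical letter classes: each letter on the left is replaced by
# the letter on the right before words are compared.
_SOUND_ALIKE = str.maketrans({
    "c": "s",
    "z": "s",
    "q": "k",
    "y": "i",
    "v": "f",
})


def phonetic_key(word: str) -> str:
    """Map a word to a crude key shared by similar-sounding spellings."""
    w = word.lower().replace("ph", "f")  # treat "ph" like "f"
    w = w.translate(_SOUND_ALIKE)
    # Collapse doubled letters, e.g. "ll" -> "l".
    key = []
    for ch in w:
        if not key or key[-1] != ch:
            key.append(ch)
    return "".join(key)


if __name__ == "__main__":
    print(phonetic_key("similar") == phonetic_key("cimilar"))
    print(phonetic_key("Phillip"), phonetic_key("filip"))
```

The lookup then becomes an ordinary equality match on the key column,
which any database can index.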
Hope That Helps,