Forum: Ruby Fuzzy Searching a Database

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
tomrossi7 (Guest)
on 2005-12-04 20:39
(Received via mailing list)
How would you recommend implementing something like a fuzzy search with
Ruby and a SQL database.  For example, I want to search a field in a
table for anything that is similar to "Frank Smithe".  I would want an
array of matches with some sort of score telling me how good of a match
the item received ("frank smithe" 100%, "frank smith" 90%).

Am I making sense?  Is there a third party app I would need?

Thanks,
Tom
rdm (Guest)
on 2005-12-04 20:59
(Received via mailing list)
At 3:37 AM +0900 12/5/05, TomRossi7 wrote:
>How would you recommend implementing something like a fuzzy search with
>Ruby and a SQL database.  For example, I want to search a field in a
>table for anything that is similar to "Frank Smithe".  I would want an
>array of matches with some sort of score telling me how good of a match
>the item received ("frank smithe" 100%, "frank smith" 90%).

  Soundex is a phonetic algorithm for indexing names by their sound when
  pronounced in English. The basic aim is for names with the same
  pronunciation to be encoded to the same string so that matching can
occur
  despite minor differences in spelling. Soundex is the most widely
known of
  all phonetic algorithms and is often used (incorrectly) as a synonym
for
  "phonetic algorithm".

    -- http://en.wikipedia.org/wiki/Soundex

If you process your database's names into a column of Soundex encodings,
you should be able to search this for an encoded target word.  You could
then post-process the results to get the "score" for each one.  Note
that
you may want to look around for alternative encodings (eg, Metaphone).

-r
hcatlin (Guest)
on 2005-12-04 23:18
(Received via mailing list)
Just make Kit screen all of the queries. Get her a blackberry and email
them to her.
vanek (Guest)
on 2005-12-04 23:22
(Received via mailing list)
this was discussed on both this ML and the Rails ML about a week ago.
The
threads have the word 'fuzzy' in the subject.
steve (Guest)
on 2005-12-04 23:38
(Received via mailing list)
On Dec 4, 2005, at 10:58 AM, Rich M. wrote:

> If you process your database's names into a column of Soundex
> encodings,
> you should be able to search this for an encoded target word.

Also worth mentioning is that many database systems have Soundex
support built-in.

--Steve
w3gat (Guest)
on 2005-12-04 23:38
(Received via mailing list)
Lou V. wrote:

>>
>> Am I making sense?  Is there a third party app I would need?
>>
>> Thanks,
>> Tom
>>
>>
>>
>
>
>
I needed to do something similar with my office database.  I used the
Levenshtein distance.  The routine was also
quite useful in finding name differences between my office data base and
the billing company's.
#------------------------------------------------------
# Determine Levenshtein distance of two strings

def Ld(s,t)

    #s string #1
    #t string #2

    n = s.size
    m = t.size
    a = Array.new

    if n != 0 && m != 0

        #2 create array
        r = Array.new
        rz = Array.new

        0.upto(m) {|x| r.push(0)}

        0.upto(n) {|x|a.push(r.dup)}
        a.each_index {|x| a[x][0] = x}
        0.upto(m) {|x| a[0][x] = x}

        cost =  0
        1.upto(n) do |i|
            1.upto(m) do |j|
                if s[i] == t[j]
                    cost =0
                else
                    cost = 1
                end
                a[i][j] = [a[ i- 1][j] +1,a[i][j - 1] + 1,a[i - 1][j -
1] + cost].min
            end
        end
        a[n][m]
    else
        0
    end
end
Paul M. (Guest)
on 2005-12-05 01:29
(Received via mailing list)
see fulltext searching at mysql.com.

from http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html
.....
 The following example is more complex. The query returns the relevance
values and it also sorts the rows in order of decreasing relevance. To
achieve this result, you should specify MATCH() twice: once in the
SELECT list and once in the WHERE clause. This causes no additional
overhead, because the MySQL optimizer notices that the two MATCH()
calls are identical and invokes the full-text search code only once.

mysql> SELECT id, body, MATCH (title,body) AGAINST
    -> ('Security implications of running MySQL as root') AS score
    -> FROM articles WHERE MATCH (title,body) AGAINST
    -> ('Security implications of running MySQL as root');
+----+-------------------------------------
+-----------------+
| id | body                                                       |
score           |
+----+-------------------------------------
+-----------------+
|  4 | 1. Never run mysqld as root. 2. ...             |
1.5219271183014 |
|  6 | When configured properly, MySQL ...        | 1.3114095926285 |
+----+-------------------------------------
+-----------------+
2 rows in set (0.00 sec)
.....

http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html
ben (Guest)
on 2005-12-05 04:10
(Received via mailing list)
/How would you recommend implementing something like a fuzzy search with
Ruby and a SQL database.  For example, I want to search a field in a
table for anything that is similar to "Frank Smithe".  I would want an
array of matches with some sort of score telling me how good of a match
the item received ("frank smithe" 100%, "frank smith" 90%)./


Less direct, but maybe useful...
The pyLucene project is the apache java lucene library implemented in C
and SWIG'd to python (i think).
You might be able to create ruby bindings for the full text search C
library used there.


http://pylucene.osafoundation.org/

http://lucene.apache.org/java/docs
hsanson (Guest)
on 2005-12-05 06:52
(Received via mailing list)
check http://ferret.davebalmain.com/api/files/TUTORIAL.html

Ferret is a Lucene port for Ruby.



Monday 05 December 2005 11:10ã?Ben Stroud ã?ã??はæ?¸ãã¾ã?ã?:
This topic is locked and can not be replied to.