Forum: Ferret AAF + DRb + Windows. Index works, search maybe?

Chris D. (Guest)
on 2008-10-28 16:21
Currently running the following setup:
-Ferret 0.11.5
-Acts as Ferret plugin 0.4.3 Rev.257
-Rails 2.1.1
-Ruby 1.8.6 Patchlevel 111
-Windows 2003 Server

This all works fine and dandy and after a lot of struggling with getting
these specific versions together I managed to get it up and running on
Windows. In all my models I have set the :remote => true option on the
acts_as_ferret declaration so the DRb is used in production.
The DRb runs fine and is defined in ferret_server.yml

Index directory is created in the project folder and the DRb seems to
index nicely in there. The top of the log says:

DRb server: ensure_index_exists for class Contest
Asked for a remote server ? true, ENV["FERRET_USE_LOCAL_INDEX"] is nil,
looks like we are the server
Will use local index.
using index in C:/websites/BRProject/index/production/contest

1.) Is this correct? Or should I set this FERRET_USE_LOCAL_INDEX value?


2.) Then after indexing all my teams in the database I get these lines
in my ferret_server.log:

Adding field name with value 'BLAH TEAM NAME' to index
creating doc for class: Team, id: 4891
Adding field name with value 'OTHER TEAM NAME' to index
reindex model Team : 191.58% complete : -8.05 secs to finish
  Team Load (0.000000)   SELECT * FROM `teams` WHERE
(id > 4891) LIMIT 1000
  SQL (0.000000)   COMMIT
changing index dir to
C:/websites/BRProject/index/production/team/20081025165554
index dir is now
C:/websites/BRProject/index/production/team/20081025165554
#method_missing(:find_id_by_contents, ["Team", "Othe~ OR Team~",
{:limit=>4}])
#method_missing(:find_id_by_contents, ["Team", "J~ OR P~ OR Custom~",
{:limit=>4}])
#method_missing(:find_id_by_contents, ["Team", "Munchin~ OR Hilto~",
{:limit=>4}])

Followed by a similar 'method_missing' line about 1000 times. One for
each apparent search query entered on the website.

I use the 'find_by_contents' function throughout my application where I
split the query on non-word chars /[^\w]/ and join em with an OR and
make all the terms fuzzy. Should I use a different call? Or what am I
doing wrong here.
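
For reference, the query construction described above can be sketched roughly as follows. The helper name `fuzzy_or_query` is hypothetical and not part of acts_as_ferret; the sketch only builds the query string and does not call Ferret itself.

```ruby
# Hypothetical helper (name is an assumption, not an acts_as_ferret API).
# Splits the user input on non-word characters, drops empty pieces,
# appends "~" to make each term fuzzy, and joins the terms with OR.
def fuzzy_or_query(input)
  terms = input.split(/[^\w]/).reject { |t| t.empty? }
  terms.map { |t| "#{t}~" }.join(" OR ")
end

puts fuzzy_or_query("Othe Team")
```

The resulting string is what would be handed to find_by_contents, and it has the same shape as the queries visible in the ferret_server.log excerpt above.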


3.) I feel like I am abusing the query language for this purpose: Fuzzy
finding team names. Often I type in a nearly exact match with the team
and it won't show up. Especially team names with '.com' in them, for
example. Or exact matches yield different team names with a higher hit
rate. Also there is a problem with matching certain reserved words.
Apparently querying the States DB with 'IN' for Indiana is not going to
work :)

Is there any advice on how to better realize a fuzzy search engine?


thanks for the support!
Jens K. (Guest)
on 2008-10-28 21:26
Hi!

Chris D. wrote:
> Currently running the following setup:
> -Ferret 0.11.5
> -Acts as Ferret plugin 0.4.3 Rev.257
> -Rails 2.1.1
> -Ruby 1.8.6 Patchlevel 111
> -Windows 2003 Server
>
> This all works fine and dandy and after a lot of struggling with getting
> these specific versions together I managed to get it up and running on
> Windows. In all my models I have set the :remote => true option on the
> acts_as_ferret declaration so the DRb is used in production.
> The DRb runs fine and is defined in ferret_server.yml

My congratulations :)

> Index directory is created in the project folder and the DRb seems to
> index nicely in there. The top of the log says:
>
> DRb server: ensure_index_exists for class Contest
> Asked for a remote server ? true, ENV["FERRET_USE_LOCAL_INDEX"] is nil,
> looks like we are the server
> Will use local index.
> using index in C:/websites/BRProject/index/production/contest
>
> 1.) Is this correct? Or should I set this FERRET_USE_LOCAL_INDEX value?

no, just ignore that possibly misleading message. As long as the server
is self-aware enough to know it's the server, and the application does
not think it's the server, everything is ok :)

> 2.) Then after indexing all my teams in the database I get these lines
> in my ferret_server.log:
>
> Adding field name with value 'BLAH TEAM NAME' to index
> creating doc for class: Team, id: 4891
> Adding field name with value 'OTHER TEAM NAME' to index
> reindex model Team : 191.58% complete : -8.05 secs to finish
>   Team Load (0.000000)   SELECT * FROM `teams` WHERE
> (id > 4891) LIMIT 1000
>   SQL (0.000000)   COMMIT
> changing index dir to
> C:/websites/BRProject/index/production/team/20081025165554
> index dir is now
> C:/websites/BRProject/index/production/team/20081025165554
> #method_missing(:find_id_by_contents, ["Team", "Othe~ OR Team~",
> {:limit=>4}])
> #method_missing(:find_id_by_contents, ["Team", "J~ OR P~ OR Custom~",
> {:limit=>4}])
> #method_missing(:find_id_by_contents, ["Team", "Munchin~ OR Hilto~",
> {:limit=>4}])
>
> Followed by a similar 'method_missing' line about 1000 times. One for
> each apparent search query entered on the website.

Everything is ok, aaf is just a bit noisy :)
You might want to raise the log level for your DRb server to get rid of
these (at some point I introduced a configuration option for this in
ferret_server.yml, not sure whether your version already has it), or
just comment out the relevant line of code that spams the log this
way...

> 3.) I feel like I am abusing the query language for this purpose: Fuzzy
> finding team names. Often I type in a nearly exact match with the team
> and it won't show up. Especially team names with '.com' in them, for
> example. Or exact matches yield different team names with a higher hit
> rate.

I often find myself doing the same. I go even further - I build
different query variants from the query string the user entered, with
varying degrees of exactness (i.e. exact phrase match, match with
wildcards, fuzzy match), and OR them together with different boosts.
This way more relevant matches are likely to be on top of the result
list because their sub query receives a higher boost, but you also find
matches further away from the original query in case the user had a typo
or so...
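
A minimal sketch of that idea, building only the query string. The helper name `boosted_variants_query` and the boost values 4.0 and 2.0 are made up for illustration, and it assumes the query parser accepts a `^boost` suffix on phrases and on parenthesised groups, so check that against your Ferret version.

```ruby
# Hypothetical helper; name and boost values are assumptions.
# Builds three variants of the user's input and ORs them together:
#   1. exact phrase   (highest boost)
#   2. wildcard terms (medium boost)
#   3. fuzzy terms    (no boost)
def boosted_variants_query(input)
  terms = input.split(/[^\w]/).reject { |t| t.empty? }
  return "" if terms.empty?

  phrase   = "\"#{terms.join(' ')}\"^4.0"
  wildcard = "(" + terms.map { |t| "#{t}*" }.join(" OR ") + ")^2.0"
  fuzzy    = "(" + terms.map { |t| "#{t}~" }.join(" OR ") + ")"

  [phrase, wildcard, fuzzy].join(" OR ")
end

puts boosted_variants_query("Hilton Team")
```

The actual boost values need tuning against your own data; the only point is that the more exact sub query gets the larger one.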

> Also there is a problem with matching certain reserved words.
> Apparently querying the States DB with 'IN' for Indiana is not going to
> work :)

Lowercase 'in' should work (in case you also index lowercase 'in', of
course), imho the reserved words are uppercase only. Or don't use the
built-in query parser but construct your own query objects (which
however might not work with DRb if you build them in your application).
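
If you stay with the query parser, one way to try the lowercase route is to downcase the user's terms before building the query string. This is only a sketch with a hypothetical helper name (`lowercase_terms`), and it only helps if the lowercase term is actually present in your index, as noted above.

```ruby
# Hypothetical helper (name is an assumption).
# Splits the input on non-word characters and downcases every term,
# so that user input is never passed to the parser in uppercase.
def lowercase_terms(input)
  input.split(/[^\w]/).reject { |t| t.empty? }.map { |t| t.downcase }
end

p lowercase_terms("IN")
```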


Cheers,
Jens