Hello,
I’m using Ruby 1.8.6, Rails 1.2.3, Ferret 0.11.4, and acts_as_ferret from
svn stable.
I’ve spent quite a day wrestling with turning off stop words. The
problem was that searches for words like “no” or “the” returned no
results. Along the way I ran into some confusing behavior that took me
a while to figure out, and I hope sharing it saves someone else some
time.
After searching around online and in the source code, I came up with the
following config in my ActiveRecord model:
acts_as_ferret({ :fields => { :name      => { :boost => 10 },
                              :type      => { :boost => 2 },
                              :email     => { :boost => 10 },
                              :bio       => { :store => :no },
                              :status_id => { :boost => 1 } },
                 :store_class_name => true,
                 :remote => true,
                 # An empty stop-word list, so "no" and "the" get indexed
                 :ferret => { :analyzer =>
                                Ferret::Analysis::StandardAnalyzer.new([]) }
               })
With this StandardAnalyzer (and its empty stop-word list) in place, searches for “no” or “the” do return results.
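To make the stop-word effect concrete, here is a toy Ruby sketch. It is not Ferret's StandardAnalyzer; it only mimics the two steps that matter here (lowercase and tokenize, then drop stop words), and TOY_STOP_WORDS, TOY_DOCS, toy_analyze and toy_search are hypothetical names with made-up data. When every word in the query is a stop word, nothing is left to search for.

```ruby
# Toy sketch only: NOT Ferret's StandardAnalyzer. It mimics the two steps
# that matter here: lowercase/tokenize, then drop stop words.
# TOY_STOP_WORDS is a small hypothetical list, not Ferret's real default.
TOY_STOP_WORDS = %w[a an and no not of or the]

# Made-up documents, keyed by id.
TOY_DOCS = {
  1 => "No comment from the committee",
  2 => "Smith writes about the theatre"
}

# Lowercase the text, split it into word tokens, and remove stop words.
def toy_analyze(text, stop_words)
  text.downcase.scan(/\w+/).reject { |token| stop_words.include?(token) }
end

# Analyze the query and every document with the same stop-word list, then
# return the ids of documents containing all remaining query terms.
def toy_search(docs, query, stop_words)
  terms = toy_analyze(query, stop_words)
  return [] if terms.empty? # every query word was a stop word
  docs.keys.select { |id| (terms - toy_analyze(docs[id], stop_words)).empty? }.sort
end

p toy_search(TOY_DOCS, "no", TOY_STOP_WORDS)  # => []
p toy_search(TOY_DOCS, "no", [])              # => [1]
p toy_search(TOY_DOCS, "the", [])             # => [1, 2]
```

Passing an empty list plays the same role as StandardAnalyzer.new([]) in the config above: "no" survives analysis on both the indexing and the query side, so it can match.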
The complicating factor is that, as you can see, I have a field
“status_id”, which lets me filter for profiles that are published or
in draft in my CMS.
Before I added the StandardAnalyzer, the status_id field worked fine
in queries like this:
a = Profile.find_by_contents("smith status_id:100")
a.total_hits
=> 2 # this is correct, only 2 are published
a = Profile.find_by_contents("smith")
a.total_hits
=> 4 # this is correct, there are 4 total
So the status_id clause was automatically AND-ed with the search word.
However, after I added the StandardAnalyzer config above, the status_id
clause was OR-ed instead:
a = Profile.find_by_contents("no")
a.total_hits
=> 5 # this is good
a = Profile.find_by_contents("no status_id:100")
a.total_hits
=> 208 # this is bad; it's the same as if I had searched only for status_id:100
a = Profile.find_by_contents("smith status_id:100")
a.total_hits
=> 208 # just as bad; again the same as searching only for status_id:100
The workaround is to add the AND keyword to the query explicitly:
a = Profile.find_by_contents("smith AND status_id:100")
a.total_hits
=> 2 # works just like before
In fact, OR becomes the default operator regardless of whether the
query includes a field:
a = Profile.find_by_contents("smith jones")
a.total_hits
=> 5 # OR-ed results
a = Profile.find_by_contents("smith AND jones")
a.total_hits
=> 0
Again, before I added the StandardAnalyzer, AND was the default, so the
first “smith jones” query would have returned 0, as it should.
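To show the two behaviors side by side, here is a toy Ruby model. It is not Ferret's QueryParser, and TOY_PROFILES, toy_clause_matches? and toy_query are hypothetical names with made-up data. It shows how one query string yields different hits depending on whether clauses without an explicit operator are joined with AND or OR. The or_default flag name mirrors the :or_default option that Ferret's QueryParser accepts, though I haven't confirmed that this option is what changed in my setup.

```ruby
# Toy model only: NOT Ferret's QueryParser. It shows how one query string
# yields different hits depending on the default operator used to join
# clauses that have no explicit operator. The profiles are made up.
TOY_PROFILES = [
  { :id => 1, :name => "smith", :status_id => 100 },
  { :id => 2, :name => "smith", :status_id => 200 },
  { :id => 3, :name => "jones", :status_id => 100 },
  { :id => 4, :name => "brown", :status_id => 100 }
]

# A clause is either "field:value" or a bare word matched against :name.
def toy_clause_matches?(profile, clause)
  if clause.include?(":")
    field, value = clause.split(":", 2)
    profile[field.to_sym].to_s == value
  else
    profile[:name] == clause
  end
end

# Simplification: if the query contains an explicit AND anywhere, every
# clause is AND-ed; otherwise or_default picks OR (true) or AND (false).
def toy_query(query, or_default)
  tokens = query.split
  use_and = tokens.include?("AND") || !or_default
  clauses = tokens - ["AND"]
  hits = TOY_PROFILES.select do |profile|
    results = clauses.map { |clause| toy_clause_matches?(profile, clause) }
    use_and ? results.all? : results.any?
  end
  hits.map { |profile| profile[:id] }
end

p toy_query("smith status_id:100", false)    # => [1]
p toy_query("smith status_id:100", true)     # => [1, 2, 3, 4]
p toy_query("smith AND status_id:100", true) # => [1]
p toy_query("smith jones", true)             # => [1, 2, 3]
p toy_query("smith jones", false)            # => []
```

With or_default set to true, adding a status_id clause widens the results instead of narrowing them, which matches what I'm seeing; an explicit AND restores the narrowing either way.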
Any insight into why this happens? I would prefer AND to be the
default.
Thanks,
Doug