Stemming, stop words, acts_as_ferret

I’d like to get the following behavior:

  1. Stemming. The search runs over a database of summaries of California
    legal cases. A search for “thermal image” needs to hit “thermal imaging.”

  2. Stop words. Text containing “failing to instruct the jury” should come
    up as a hit on a search for “fail to instruct.”

  3. Case-insensitive.
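
To make the three requirements above concrete, here is a minimal pure-Ruby sketch of the analysis pipeline they imply: lowercase, drop stop words, then stem. It does not use Ferret at all. The stop-word list, the `toy_stem` suffix stripper, and the `analyze` and `matches?` helpers are all hypothetical names made up for illustration; a real index would use a proper Porter stemmer and the library's own stop-word list.

```ruby
require 'set'

# Hypothetical, deliberately tiny stop-word list (assumption for this sketch).
STOP_WORDS = Set.new(%w[a an and the to of in for with]).freeze

# Toy suffix stripper standing in for a real Porter stemmer.
# It only knows a few English endings and is NOT the Porter algorithm.
def toy_stem(word)
  stem = word.sub(/(ing|ed|s)\z/, '')
  stem = stem.sub(/e\z/, '')
  # Keep very short words intact rather than stemming them to nothing.
  stem.length >= 3 ? stem : word
end

# Lowercase, split into alphanumeric tokens, drop stop words, stem the rest.
def analyze(text)
  text.downcase
      .scan(/[a-z0-9]+/)
      .reject { |token| STOP_WORDS.include?(token) }
      .map { |token| toy_stem(token) }
end

# A document matches when it contains every analyzed query term.
def matches?(query, document)
  doc_terms = analyze(document).to_set
  analyze(query).all? { |term| doc_terms.include?(term) }
end

puts matches?("thermal image", "Thermal imaging of the home")       # true
puts matches?("fail to instruct", "failing to instruct the jury")   # true
```

The point of the sketch is that the same analysis has to be applied to both the indexed text and the query; if only one side is stemmed or stripped of stop words, the terms no longer line up.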

What I tried was:

class StemmedAnalyzer < Ferret::Analysis::Analyzer
  # Lowercase and tokenize the input, then stem each token.
  def token_stream(field, reader)
    Ferret::Analysis::PorterStemFilter.new(
      Ferret::Analysis::LowerCaseTokenizer.new(reader)
    )
  end
end

class Summary < ActiveRecord::Base
  acts_as_ferret(:analyzer => StemmedAnalyzer.new)
end

But this doesn’t appear to give me either stemming or stop words. It does
give me basic searching: searches for exact keywords without stop words
work, but searches that include stop words return no results.

I’ve looked through the archives, and I’m still confused. Suggestions?

James M.

On Mon, Nov 13, 2006 at 11:06:22AM -0800, James M. wrote:

> class Summary < ActiveRecord::Base
> acts_as_ferret(:analyzer => StemmedAnalyzer.new)
>
> But this doesn’t appear to give me either stemming or stopwords. It does
> give me basic searching (searches for exact keywords without stopwords work,
> searches with stopwords return no results).

Which version of Ferret/AAF are you using? In the most recent Ferret
(0.10.13) there is no PorterStemFilter class.

With that Ferret version, the following seems to suit your needs:
http://pastie.caboo.se/22629

Jens

