Ignore apostrophes in words

Hi, I just started using ferret and the aaf plugin and it seems to work
quite nicely. However, my fields are very short (titles of music) and I
don’t think many users will type apostrophes when they are
looking for something. For a simple document such as “what
i’ve done”, I’d like it to be indexed as “what ive done” instead. Right
now I’m using this for my aaf line (I don’t want any stop words either,
since in docs this short every word, even an article, can be significant):

acts_as_ferret( { :fields => [ :name ] }, { :analyzer =>
Ferret::Analysis::StandardAnalyzer.new([]) } )

How should I go about removing the apostrophes when docs are added to
the index?

Thanks,
Chris

On Mon, Jun 25, 2007 at 05:02:54PM +0200, Chris Brickley wrote:

How should I go about removing the apostrophes when docs are added to
the index?

I’d implement a custom analyzer that does what StandardAnalyzer does,
plus filtering out the apostrophes from the tokens (which should be
possible with a custom filter added to the chain).

For a starting point, see
http://ferret.davebalmain.com/api/classes/Ferret/Analysis/StandardAnalyzer.html
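
To make the "filter added to the chain" idea concrete, here is a minimal plain-Ruby sketch that runs without Ferret. Everything in it is a hypothetical stand-in rather than Ferret's API: the Token struct, SketchTokenizer, DowncaseFilter, ApostropheStripFilter and sketch_terms are names invented for illustration. Stripping the typographic apostrophe (U+2019) as well as the straight one is an extra assumption, not something asked for above.

```ruby
# Hypothetical stand-in for a token: its text plus character offsets.
Token = Struct.new(:text, :start, :end)

# Splits a string on whitespace and hands out one Token per call to #next.
# Returns nil once the input is exhausted.
class SketchTokenizer
  def initialize(str)
    @tokens = []
    str.scan(/\S+/) do
      m = Regexp.last_match
      @tokens << Token.new(m[0], m.begin(0), m.end(0))
    end
  end

  def next
    @tokens.shift
  end
end

# Wraps another stream and lower-cases each token's text.
class DowncaseFilter
  def initialize(input)
    @input = input
  end

  def next
    t = @input.next
    return nil if t.nil?
    t.text = t.text.downcase
    t
  end
end

# Wraps another stream and deletes apostrophes from each token's text.
class ApostropheStripFilter
  # Straight apostrophe plus the typographic one (assumption: both may be typed).
  APOSTROPHES = "'\u2019"

  def initialize(input)
    @input = input
  end

  def next
    t = @input.next
    return nil if t.nil?
    t.text = t.text.delete(APOSTROPHES)
    t
  end
end

# Builds the chain (tokenizer -> downcase -> strip apostrophes) and
# collects the resulting terms into an array.
def sketch_terms(str)
  stream = ApostropheStripFilter.new(DowncaseFilter.new(SketchTokenizer.new(str)))
  terms = []
  while (t = stream.next)
    terms << t.text
  end
  terms
end

p sketch_terms("What I've done")  # => ["what", "ive", "done"]
```

Each filter only knows about the stream it wraps, so adding or removing a step is a one-line change where the chain is built. The same analyzer has to be used at query time too, otherwise a search for “i’ve” would not match the indexed term “ive”.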

Jens


Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

Ok, thanks for that link. However, I am a bit lost as to where to
put my analyzer code. In my model itself, or somewhere else?

This is what I came up with:

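module Ferret::Analysis
  # Same chain as StandardAnalyzer, plus a filter that strips apostrophes.
  class MyAnalyzer < Analyzer
    def initialize(stop_words = FULL_ENGLISH_STOP_WORDS, lower = true)
      @lower = lower
      @stop_words = stop_words
    end

    def token_stream(field, str)
      ts = StandardTokenizer.new(str)
      ts = LowerCaseFilter.new(ts) if @lower
      ts = StopFilter.new(ts, @stop_words)
      ts = HyphenFilter.new(ts)
      ts = ApostropheFilter.new(ts)
    end
  end

  class ApostropheFilter < TokenStream
    # The filter wraps the stream it is given, so it needs to keep it.
    def initialize(input)
      @input = input
    end

    def next
      t = @input.next
      return nil if t.nil?

      # Ferret tokens expose their string as #text.
      t.text = t.text.tr("'", "")
      t
    end
  end
end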

I tried putting it below my aaf declaration in my model file but I just
get:
“NameError: uninitialized constant Ferret::Analysis::MyAnalyzer” when
trying to do Model.rebuild_index.

Thanks.

Jens K. wrote:

I’d just put this into lib/, if you call the file my_analyzer.rb it
should be found and loaded by Rails automatically when you use the
class.

If not, require it explicitly in environment.rb.
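
The reason the explicit require helps is that a constant only exists once its file has been loaded. Here is a minimal plain-Ruby sketch of that, assuming a temporary directory as a stand-in for lib/ and an empty MyAnalyzer class as a stand-in for the real file; neither is the actual Rails setup.

```ruby
require "tmpdir"
require "fileutils"

# Stand-in for the application's lib/ directory (assumption for this sketch).
lib_dir = Dir.mktmpdir("lib")
at_exit { FileUtils.remove_entry(lib_dir) }

# Stand-in for lib/my_analyzer.rb; the real file would hold the analyzer and filter.
File.write(File.join(lib_dir, "my_analyzer.rb"), "class MyAnalyzer; end\n")

# Before the file is loaded the constant is unknown, which is the NameError case.
loaded_before = Object.const_defined?(:MyAnalyzer)

# Roughly what the explicit require relies on: the directory is on the
# load path and the file is required by its name.
$LOAD_PATH.unshift(lib_dir)
require "my_analyzer"

loaded_after = Object.const_defined?(:MyAnalyzer)

puts "defined before require: #{loaded_before}"
puts "defined after require:  #{loaded_after}"
```

In the application itself this comes down to one line in environment.rb, require 'my_analyzer', placed so that it runs after Ferret has been loaded.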

Jens

Awesome! Thanks Jens :) Adding the require to environment.rb did the
trick (as well as putting it in the lib dir). Thanks for all your help!
