Need help creating my own Filter in Ruby

Hi,

I posted a Trac ticket about it, but I thought I’d ask the mailing
list to reach more people.

I’m using these filters together in my analyzer (with acts_as_ferret +
Ferret 0.11.1).

  HyphenFilter.new(
    StopFilter.new(
      LowerCaseFilter.new(
        MappingFilter.new(
          StandardTokenizer.new(str),
          mapping)),
      FULL_FRENCH_STOP_WORDS + FULL_ENGLISH_STOP_WORDS)
  )

The mapping filter maps pretty much all the french accents to the
letter without the accent. So far so good.

Only thing missing for what I want to do: I need to be able to make
the words singular, and remove other patterns (j’, d’, l’). I thought
I’d just create my own Filter that does a couple of .gsub’s and add it
in the chain.
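
For readers following along, here is a minimal sketch of what such a filter could look like. It is not the code from the ticket. It assumes that a Ruby-side filter only needs to wrap another stream, respond to `next` (returning a token or nil) and `text=`, and that tokens expose a writable `text`. `FrenchNormalizer`, `FrenchNormalizeFilter`, `SketchToken` and `ArrayStream` are hypothetical names, and the last two are stand-ins so the example runs without Ferret. The singular rule is deliberately naive (it only drops a trailing "s"), and whether a token such as "l'arbre" reaches the filter in one piece depends on the tokenizer.

```ruby
# Hypothetical helper: strips elided j', d', l' and applies a naive singular rule.
module FrenchNormalizer
  # Elided prefix at the start of a token, with a straight or curly apostrophe.
  ELISION = /\A[jdl]['’]/i

  def self.normalize(text)
    stripped = text.sub(ELISION, "")
    # Naive assumption: drop one trailing "s" from words longer than three letters.
    stripped.length > 3 ? stripped.sub(/s\z/i, "") : stripped
  end
end

# Stand-in token for this sketch; assumed to mirror a token with a writable #text.
SketchToken = Struct.new(:text)

# Stand-in for a tokenizer: yields one token per word, then nil.
class ArrayStream
  def initialize(words)
    @tokens = words.map { |w| SketchToken.new(w) }
  end

  def next
    @tokens.shift
  end
end

# Hypothetical filter: wraps any stream that responds to #next.
class FrenchNormalizeFilter
  def initialize(input)
    @input = input
  end

  # Returns the next normalized token, skipping tokens that become empty,
  # or nil once the wrapped stream is exhausted.
  def next
    while (token = @input.next)
      text = FrenchNormalizer.normalize(token.text)
      next if text.empty?
      token.text = text
      return token
    end
    nil
  end

  # Passes new input through to the wrapped stream (assumed interface).
  def text=(text)
    @input.text = text
  end
end

stream = FrenchNormalizeFilter.new(ArrayStream.new(["l'arbre", "d’amis", "maisons", "les"]))
words = []
while (token = stream.next)
  words << token.text
end
p words # prints ["arbre", "ami", "maison", "les"]
```

Keeping the string rules in a separate module means they can be checked on plain strings, independently of the indexing run.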

Did any of you ever do this? If so, how?

Every time I use my filter manually, it works very well! When I launch
MyModel.rebuild_index, it fails randomly (works most of the time, but
I’m sure some documents are not well indexed). It fails with messages
like this one:

failed adding 140823996. r_analysis.c:432

Thanks for any help.

If you want to see how I’ve done it so far, go to
http://ferret.davebalmain.com/trac/ticket/168

Philippe A.

Hi Dave,

I hear you… I’ll try to put something together for you… Thanks :)

But just to be sure: implementing a Filter IS the right solution to
this, right?

On 3/1/07, Philippe A. wrote:

> I posted a Trac ticket about it, but I thought I’d ask the mailing
> list to reach more people.

Hi Philippe,

I’d love to help you with this but I can’t reproduce it here. If you
can modify the example I gave under your ticket to reproduce the
problem or produce your own self contained failing test I will be able
to fix the problem right away. Otherwise I waste too much time trying
to reproduce the problem.

Cheers,
Dave

Hi Dave,

I just added steps to reproduce the problem to the Trac ticket. My
filter seems to work fine when it’s used alone with a StandardTokenizer,
but as soon as I put another kind of filter in the chain (I used
HyphenFilter here, but the same error occurs with any other filter),
errors show up randomly.

See for yourself, I hope you can trigger the error too :)

On 3/2/07, Philippe A. wrote:

> Hi Dave,
>
> I just put a way how to reproduce in the Trac ticket. My filter seems
> to work fine when it’s included alone with a StandardTokenizer only
> but as soon as I put another kind of filter in the chain (I used
> HyphenFilter here, but it does the same error with any other filter),
> errors show up randomly.
>
> See for yourself, I hope you can trigger the error too :)

Thanks Philippe, I’ll get that fixed as soon as possible.

Cheers,
Dave