Portuguese Stemming

Pedro · August 18, 2006, 12:35pm

Today, while compiling Ferret, I noticed that a Portuguese stemmer was
being compiled. How do I enable its use for my index?

Pedro.

Pedro · August 22, 2006, 7:47pm

Hi Pedro,

You need to build a custom analyzer. Something like this might work:

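require 'rubygems'
require 'ferret'

class PortugueseAnalyzer
  # Brings StemFilter, StopFilter, StandardTokenizer and the stop-word lists into scope
  include Ferret::Analysis

  # Tokenize, drop Portuguese stop words, then stem with the Snowball
  # Portuguese ("pt") stemmer, expecting UTF-8 input.
  def token_stream(field, string)
    StemFilter.new(StopFilter.new(StandardTokenizer.new(string),
                                  FULL_PORTUGUESE_STOP_WORDS),
                   "pt", "UTF_8")
  end
end

index = Ferret::Index::Index.new(:analyzer => PortugueseAnalyzer.new)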
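The nesting in token_stream means each filter pulls tokens from the one it wraps: the tokenizer produces words, the stop filter skips some, and the stem filter rewrites the rest. The sketch below mimics that chain in plain Ruby so it runs without Ferret. The classes ToyTokenizer, ToyStopFilter and ToyStemFilter, the short STOP list, and the analyze helper are all hypothetical stand-ins, not Ferret's API, and the "stemmer" only strips a trailing "s", which is far cruder than the Snowball Portuguese algorithm.

```ruby
# Hypothetical pure-Ruby stand-ins for a tokenizer/filter chain; NOT Ferret classes.
class ToyTokenizer
  def initialize(text)
    # Lowercase, then keep runs of letters (accented letters included).
    @tokens = text.downcase.scan(/\p{L}+/)
  end

  # Returns the next token, or nil when the stream is exhausted.
  def next
    @tokens.shift
  end
end

class ToyStopFilter
  def initialize(input, stop_words)
    @input = input
    @stop_words = stop_words
  end

  # Pulls from the wrapped stream, skipping any token in the stop-word list.
  def next
    while (token = @input.next)
      return token unless @stop_words.include?(token)
    end
    nil
  end
end

class ToyStemFilter
  def initialize(input)
    @input = input
  end

  # Naive placeholder for stemming: strips a single trailing "s" only.
  # The real Snowball Portuguese stemmer handles many more suffixes.
  def next
    token = @input.next
    token && token.sub(/s\z/, "")
  end
end

# A small hypothetical sample of Portuguese stop words, not Ferret's full list.
STOP = %w[a as o os de e que em um uma]

# Builds the chain in the same nesting order as token_stream and drains it.
def analyze(text)
  stream = ToyStemFilter.new(ToyStopFilter.new(ToyTokenizer.new(text), STOP))
  tokens = []
  while (token = stream.next)
    tokens << token
  end
  tokens
end

p analyze("As meninas e os livros")  # prints ["menina", "livro"]
```

Swapping the order of the filters changes the result, which is why the stop filter sits inside the stem filter: stop words are matched in their unstemmed form.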

I hope the formatting works in your email reader.

Cheers,
Dave