Ruby Forum Ferret > Using StemFilter with PhraseQuery

Posted by S D (Guest)
on 12.05.2008 20:35
(Received via mailing list)
Hi,

I'm having difficulty getting the StemFilter and PhraseQuery to work
properly together. When I use a StemFilter with a PhraseQuery, searches
only work if the phrase consists of stems. For example, the search
phrase "reduces health care" will not work but the phrase "reduce health
care" will work even though the exact text "reduces health care" is
contained in the original document. I'd like to use StemFilter in
conjunction with PhraseQuery because I need the stemming and I also need
to be able to use the slop feature of PhraseQuery. Below is my use of
StemFilter and PhraseQuery. Is there anything I'm doing wrong or is the
above description what I should expect? To get the response that I'm
expecting I could parse the phrase and build up a query to be used by
QueryParser but I'd like a more succinct solution for now.

I use a StemFilter in my analyzer as follows:

    def token_stream(field, str)
      ...
      ts = LowerCaseFilter.new(ts) if @lower
      ts = StopFilter.new(ts, @stop_words)
      ts = StemFilter.new(ts)
      ...
    end

My use of PhraseQuery is as follows:

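    def generate_query(phrase)
      phrase = phrase.downcase
      phrase_parts = phrase.split(' ')
      # Phrase query on the :content field with a slop of 2.
      query = Ferret::Search::PhraseQuery.new(:content, 2)
      phrase_parts.each do |part|
        # puts "part: \"" + part + "\""
        # Add each whitespace-separated word with a position increment of 1.
        query.add_term(part, 1)
      end
      query
    end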

Posted by Jens Kraemer (Guest)
on 13.05.2008 09:11
(Received via mailing list)
Hi!

I think what you get is the expected behaviour. Since you don't use
Ferret's QueryParser but build your queries on your own, you're also
responsible for proper tokenization / analysis of your query content.

So running your phrase through your analyzer before constructing the
phrase query should work as expected.
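
A minimal sketch of that idea follows. The helper name add_analyzed_terms
is hypothetical and not part of Ferret. It assumes the analyzer responds
to token_stream(field, text), that the stream's next method returns a
token or nil, and that each token responds to text and pos_inc, as
Ferret's tokens do. Treat it as an illustration rather than a tested
drop-in:

```ruby
# Sketch only: add_analyzed_terms is a hypothetical helper, not a Ferret API.
# It feeds the phrase through the same analyzer that was used at index time
# and adds the resulting tokens to an existing phrase query.
def add_analyzed_terms(query, analyzer, field, phrase)
  stream = analyzer.token_stream(field, phrase)
  # Pull tokens until the stream is exhausted (next returns nil).
  while (token = stream.next)
    # Use the analyzed text (lowercased, stemmed, ...) instead of the raw
    # word, and keep the token's position increment so that any gaps the
    # filters record, for example for removed stop words, carry over.
    query.add_term(token.text, token.pos_inc)
  end
  query
end
```

With the code from the original post this could be called as
add_analyzed_terms(Ferret::Search::PhraseQuery.new(:content, 2),
analyzer, :content, phrase), where analyzer is an instance of the custom
analyzer used for indexing. The explicit downcase and split are then no
longer needed, since the analyzer takes care of both.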

Cheers,
Jens
