Using StemFilter with PhraseQuery

Hi,

I’m having difficulty getting the StemFilter and PhraseQuery to work
properly together. When I use a StemFilter with a PhraseQuery, searches
only work if the phrase consists of stems. For example, the search phrase
“reduces health care” will not work but the phrase “reduce health care”
will work, even though the exact text “reduces health care” is contained
in the original document.

I’d like to use StemFilter in conjunction with PhraseQuery because I need
the stemming and I also need to be able to use the slop feature of
PhraseQuery. Below is my use of StemFilter and PhraseQuery. Is there
anything I’m doing wrong, or is the above description what I should
expect? To get the response that I’m expecting I could parse the phrase
and build up a query to be used by QueryParser, but I’d like a more
succinct solution for now.

I use a StemFilter in my analyzer as follows:

def token_stream(field, str)
  ...
  ts = LowerCaseFilter.new(ts) if @lower
  ts = StopFilter.new(ts, @stop_words)
  ts = StemFilter.new(ts)
  ...
end

My use of PhraseQuery is as follows:

def generate_query(phrase)
  phrase = phrase.downcase
  phrase_parts = phrase.split(' ')
  query = Ferret::Search::PhraseQuery.new(:content, 2)
  phrase_parts.each do |part|
    # puts "part: \"" + part + "\""
    query.add_term(part, 1)
  end
  query
end

Hi!

I think what you get is the expected behaviour. Since you don’t use
Ferret’s QueryParser but build your queries on your own, you’re also
responsible for proper tokenization / analysis of your query content.

So running your phrase through your analyzer before constructing the
phrase query should work as expected.
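
As a rough sketch of what that could look like: the helper below drains the analyzer's token stream for the phrase and adds each resulting (stemmed) term to the query. The method name add_analyzed_terms is made up for illustration. It assumes the analyzer responds to token_stream(field, str) as in the snippet above, and that the stream's next returns tokens with text and pos_inc (and nil when exhausted), which is how I understand Ferret's TokenStream to behave; please check against your Ferret version.

```ruby
# Hypothetical helper (not part of Ferret): analyze the phrase with the same
# analyzer used at index time and add the resulting terms to a phrase query.
#
# Assumptions:
#   analyzer.token_stream(field, str) returns a stream whose #next yields
#   tokens responding to #text and #pos_inc, and nil once exhausted.
#   query responds to add_term(term, position_increment).
def add_analyzed_terms(query, analyzer, field, phrase)
  stream = analyzer.token_stream(field, phrase)
  while (token = stream.next)
    # Passing the token's own position increment keeps any gaps the
    # analyzer reports (for example where a stop word was dropped),
    # provided the filters in use actually set it.
    query.add_term(token.text, token.pos_inc)
  end
  query
end
```

With Ferret loaded, usage would be along the lines of add_analyzed_terms(Ferret::Search::PhraseQuery.new(:content, 2), your_analyzer, :content, "reduces health care"), where your_analyzer is an instance of the analyzer class containing the token_stream method shown in the question. The manual downcase and split are then no longer needed, since the analyzer does both.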

Cheers,
Jens

On Mon, May 12, 2008 at 02:33:23PM -0400, S D wrote:





Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold