QueryParser doesn't use StandardAnalyzer correctly?


#1

I am having a bit of a problem with my search queries not being parsed
correctly, it seems, and I wonder if anyone else has experienced this.

I have written an index using StandardAnalyzer for analysis. I want to
search that index by passing my user query through a QueryParser
instance which is also using a StandardAnalyzer. However the resultant
query does not seem to be a valid term query and therefore the search
produces no hits.

Specifically I have a bunch of docs with the phrase “museum of art” in
the source text.

A query ‘museum art’ gets parsed into ‘+contents:museum +contents:art’
which works just fine and produces hits.

A query of ‘museum of art’ gets parsed into ‘+contents:museum +contents:
+contents:art’ which produces no hits. The resulting term query itself
seems to be malformed, containing an extraneous term for a stop word
which was (correctly) filtered out.
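
As a rough illustration of the behavior I would expect, here is a sketch in plain Ruby that does not touch Ferret at all. The stop list and the expected_clauses helper are made up for this example; they are not Ferret's actual stop list or API. The point is only that a filtered stop word should contribute no clause, rather than an empty one.

```ruby
# Hypothetical stop list for illustration; not Ferret's actual list.
STOP_WORDS = %w[a an and of the].freeze

# Hypothetical helper: build the required clauses a parser should emit
# once the analyzer has dropped the stop words. A stop word yields no
# clause at all, so no empty "+contents:" term can appear.
def expected_clauses(query, field = 'contents')
  terms = query.downcase.scan(/\w+/).reject { |t| STOP_WORDS.include?(t) }
  terms.map { |t| "+#{field}:#{t}" }.join(' ')
end

# Both lines print: +contents:museum +contents:art
puts expected_clauses('museum art')
puts expected_clauses('museum of art')
```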

Using the Luke GUI tool for Lucene, I have verified that passing my
query through StandardAnalyzer should indeed work, as it produces the
expected term query and the expected hits in that environment. But as
for the same query in Ferret, I’m at a loss.

This should be easily reproducible with the following code fragment:

require 'ferret'

parser = Ferret::QueryParser.new('contents',
  :analyzer => Ferret::Analysis::StandardAnalyzer.new,
  :occur_default => Ferret::Search::BooleanClause::Occur::MUST)
q1 = parser.parse('museum art')
q2 = parser.parse('museum of art')
puts q1, q2

Thanks for any insight.

-Roop


#2

roop wrote:

I am having a bit of a problem with my search queries not being parsed
correctly, it seems, and I wonder if anyone else has experienced this.

See this recent thread: “Stop words in queries”
(http://www.ruby-forum.com/topic/60599).

HTH,

Nathaniel


#3

Nathaniel, thanks for the info. I will await the bug fix. In the
meantime my own workaround looks like this. In my QueryParser subclass,
I override parse() so that it filters out stopwords first:

class SafeQueryParser < Ferret::QueryParser

  def initialize(default_field, options)
    my_options = { :analyzer => Ferret::Analysis::StandardAnalyzer.new }.update(options)
    super(default_field, my_options)
    # breaking encapsulation here, but whaddya gonna do...
    @stop_words = my_options[:analyzer].instance_variable_get(:@stop_words)
  end

  def parse(query)
    # work on a copy so the caller's string is not modified
    cleaned = query.dup
    @stop_words.each do |word|
      # escape the word so it is matched literally, as a whole word
      cleaned.gsub!(/\b#{Regexp.escape(word)}\b\s*/, '')
    end
    super(cleaned)
  end

end
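
The stripping step on its own can be tried without Ferret. This is only a sketch: the stop list and the strip_stop_words helper below are hypothetical stand-ins for whatever the analyzer actually holds. Matching is kept case-sensitive on purpose, since a case-insensitive match would presumably also swallow uppercase query operators such as AND.

```ruby
# Hypothetical stop list; in the workaround the real one comes from the analyzer.
DEMO_STOP_WORDS = %w[a an and of the].freeze

# Hypothetical helper: remove whole-word stop words (case-sensitively) and
# return a new string, leaving the caller's string untouched.
def strip_stop_words(query, stop_words = DEMO_STOP_WORDS)
  alternatives = stop_words.map { |w| Regexp.escape(w) }.join('|')
  query.gsub(/\b(?:#{alternatives})\b\s*/, '').strip
end

puts strip_stop_words('museum of art')        # prints: museum art
puts strip_stop_words('theory of artifacts')  # prints: theory artifacts
```

The word boundaries keep "the" from being cut out of "theory" and "a" out of "artifacts". Note that this approach also strips stop words inside quoted phrases, which may or may not be what you want.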


#4

Hey guys,

Since this was a pretty easy fix, I’ve fixed it in the pure Ruby
version. You’ll have to get it out of the Subversion repo if you want
it.

Cheers,
Dave