Forum: Ferret QueryParser doesn't use StandardAnalyzer correctly?

roop (Guest)
on 2006-04-13 11:34
It seems my search queries are not being parsed correctly, and I wonder
whether anyone else has run into this.

I have built an index using StandardAnalyzer for analysis, and I want to
search it by passing user queries through a QueryParser instance that
also uses a StandardAnalyzer. However, the resulting query seems to be
malformed, so the search produces no hits.

Specifically I have a bunch of docs with the phrase "museum of art" in
the source text.

A query 'museum art' gets parsed into '+contents:museum +contents:art'
which works just fine and produces hits.

A query of 'museum of art' gets parsed into
'+contents:museum +contents: +contents:art', which produces no hits.
The stop word "of" was (correctly) filtered out by the analyzer, but it
still leaves behind an empty required clause, '+contents:'. Since every
clause is required, that empty clause (which no document can contain)
is presumably what kills the search.
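To see why a single empty required clause wipes out all hits, here is a toy model of a conjunctive ("+term +term ...") query over an inverted index. It is plain Ruby, not Ferret's implementation; the index contents and the `conjunctive_hits` helper are hypothetical and only illustrate the set logic.

```ruby
# Toy model of a query in which every clause is required (MUST).
# NOT Ferret's implementation; it only illustrates the set logic.

# Hypothetical inverted index: term => ids of documents containing it.
INDEX = {
  "museum" => [1, 2],
  "art"    => [1, 2, 3]
  # The stop word "of" was never indexed, and no document has an empty term.
}.freeze

# Hits are the intersection of all posting lists; a term missing from the
# index contributes an empty list, so the intersection becomes empty.
def conjunctive_hits(terms)
  terms.map { |t| INDEX.fetch(t, []) }.reduce { |acc, ids| acc & ids } || []
end

p conjunctive_hits(["museum", "art"])      # => [1, 2]
p conjunctive_hits(["museum", "", "art"])  # => []
```

Dropping the empty clause, rather than keeping it as a required term, is what restores the expected hits.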

Using the Luke GUI tool for Lucene, I have verified that passing my
query through StandardAnalyzer should indeed work: there it produces the
expected query and the expected hits. With Ferret, however, I'm at a
loss.

This should be easily reproducible with the following code fragment:

  require 'ferret'

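  # Parse against the 'contents' field; every clause is required (MUST).
  parser = Ferret::QueryParser.new('contents',
    :analyzer => Ferret::Analysis::StandardAnalyzer.new,
    :occur_default => Ferret::Search::BooleanClause::Occur::MUST)
  q1 = parser.parse('museum art')     # gives +contents:museum +contents:art
  q2 = parser.parse('museum of art')  # gives +contents:museum +contents: +contents:art
  puts q1, q2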

Thanks for any insight.

-Roop
Nathaniel Talbott (ntalbott)
on 2006-04-14 15:33
roop wrote:

> I am having a bit of a problem with my search queries being parsed
> correctly it seems, and I wonder if anyone else has experienced this.

See this recent thread: "Stop words in queries"
(http://www.ruby-forum.com/topic/60599).

HTH,


Nathaniel
roop (Guest)
on 2006-04-14 22:29
Nathaniel, thanks for the info. I will await the bug fix. In the
meantime, my workaround is a QueryParser subclass that overrides
parse() to strip stop words from the query string first:

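require 'ferret'

# Workaround: strip stop words from the raw query string before it reaches
# the parser, so no empty clause is generated for a filtered stop word.
class SafeQueryParser < Ferret::QueryParser

  def initialize(default_field, options = {})
    # Use StandardAnalyzer unless the caller passes its own :analyzer.
    my_options = { :analyzer => Ferret::Analysis::StandardAnalyzer.new }.update(options)
    super(default_field, my_options)
    # breaking encapsulation here, but whaddya gonna do...
    @stop_words = my_options[:analyzer].instance_variable_get(:@stop_words) || []
  end

  def parse(query)
    cleaned = query.dup  # don't modify the caller's string
    @stop_words.each do |word|
      # Escape the word for the regex and ignore case, since the analyzer
      # lowercases input before applying its stop list.
      cleaned.gsub!(/\b#{Regexp.escape(word)}\b\s*/i, '')
    end
    super(cleaned)
  end

end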
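The string-level stripping can be tried without Ferret or an index. The sketch below pulls it out into a standalone helper; `strip_stop_words` and the short `STOP_WORDS` list are hypothetical (the real list comes from the analyzer).

```ruby
# Ferret-free sketch of the string-level workaround.
# STOP_WORDS is a small, hypothetical subset of a real stop list.
STOP_WORDS = %w[a an and of the].freeze

# Remove each stop word (whole word, any case) plus trailing whitespace.
# gsub returns a new string, so the caller's query is left untouched.
def strip_stop_words(query, stop_words = STOP_WORDS)
  stop_words.reduce(query) do |q, word|
    q.gsub(/\b#{Regexp.escape(word)}\b\s*/i, '')
  end
end

puts strip_stop_words('museum of art')    # prints: museum art
puts strip_stop_words('"Museum of Art"')  # prints: "Museum Art"
```

Because it works on the raw string, the helper knows nothing about query syntax and also rewrites text inside quoted phrases; a fix inside the parser itself avoids that.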
David Balmain (Guest)
on 2006-04-18 06:50
(Received via mailing list)
Hey guys,

Since this was a pretty easy fix, I've fixed it in the pure Ruby
version. You'll have to check it out of the Subversion repository if
you want it.
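One way such a parser-level fix can work is to analyze each word and emit a required clause only when analysis actually yields a token. The sketch below illustrates that idea under stated assumptions; it is not the actual change made to Ferret. The `analyze` stand-in (lowercase, then drop stop words) and `build_required_query` are hypothetical.

```ruby
# Sketch of a parser-level fix: skip clauses whose word the analyzer removed.
# Illustration only, not the actual change made to Ferret.

STOP_WORDS = %w[a an and of the].freeze  # hypothetical subset

# Hypothetical stand-in for StandardAnalyzer: lowercase, then drop stop words.
# Returns nil when the word produces no token.
def analyze(word)
  token = word.downcase
  STOP_WORDS.include?(token) ? nil : token
end

# Build a "+field:term" clause for every word that survives analysis.
def build_required_query(field, query)
  clauses = query.split.map do |word|
    token = analyze(word)
    token && "+#{field}:#{token}"  # nil for removed words
  end
  clauses.compact.join(' ')
end

puts build_required_query('contents', 'museum of art')
```

With the empty clause never created, 'museum of art' produces the same query as 'museum art'.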

Cheers,
Dave