Whitespace Issues

bluejay · July 14, 2006, 4:35pm

I am trying to build up a filtered search using the logic below.

bq = Ferret::Search::BooleanQuery.new
# downcase rather than downcase!, since downcase! returns nil when the string is already lowercase
bq.add_query(Ferret::Search::TermQuery.new(Ferret::Index::Term.new("section", section.downcase)),
             Ferret::Search::BooleanClause::Occur::MUST)

filter = Ferret::Search::QueryFilter.new(bq)
@vobjects = VoObject.find_by_contents(search_input, :filter => filter,
                                      :sort => ["section", "sale_category"])

This works fine when the “section” is a single word like “book”, but when
the value contains whitespace, as in “paperback book”, it does not
find the appropriate result and comes back with zero hits.

I changed this to use FuzzyQuery and it works but I sometimes get
segmentation errors (this was reported in another topic).

Does anyone have a solution to this problem for me?

Thanks very much.

bluejay · July 14, 2006, 5:00pm

It’s hard to know for sure without seeing how your index is built, but if
you are using TOKENIZED on that field, then whenever the index is built the
text is split on whitespace, and each element is added as a separate term.
It looks like when you are searching, you are trying to find the entire text
as a single term.
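
To make the mismatch concrete, here is a toy model in plain Ruby. It is only an illustration and not Ferret's implementation: the `tokenize`, `build_index` and `term_lookup` helpers and the sample sections are all made up for this sketch, and it assumes the analyzer lowercases and splits on whitespace.

```ruby
# Toy model of a tokenized field. NOT Ferret's code, just an illustration.
# Assumption: the analyzer lowercases the text and splits it on whitespace.
def tokenize(text)
  text.downcase.split
end

# Build an inverted index: term => ids of the documents containing that term.
def build_index(docs)
  index = Hash.new { |hash, term| hash[term] = [] }
  docs.each do |id, text|
    tokenize(text).uniq.each { |term| index[term] << id }
  end
  index
end

# Exact lookup of one term, roughly what a single TermQuery asks for.
def term_lookup(index, term)
  index.fetch(term, [])
end

DOCS  = { 1 => "Book", 2 => "Paperback Book", 3 => "Magazine" }
INDEX = build_index(DOCS)

p INDEX.keys.sort                        # the terms that were actually stored
p term_lookup(INDEX, "book")             # single word: found
p term_lookup(INDEX, "paperback book")   # whole string: no such term stored
```

The index only ever contains "book", "magazine" and "paperback", so a lookup for the single term "paperback book" has nothing to match, which is consistent with the zero hits described above.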

In order to solve this, I believe you can either construct your query using
QueryParser, which will use the analyzer / tokenizer and split the terms out
for you, or you can simply split the ‘section’ string on whitespace and
build a Term and TermQuery for each resulting element and build a
PhraseQuery from that set.
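
As a rough sketch of what the second option amounts to, the plain Ruby below splits both the stored text and the search value on whitespace and checks that the words appear consecutively and in order. This is a simplified stand-in for phrase matching, not Ferret's PhraseQuery; `tokenize` and `phrase_match?` are hypothetical helpers, and whitespace tokenization with lowercasing is an assumption.

```ruby
# Toy phrase matching. NOT Ferret's PhraseQuery, just the idea behind it.
# Assumption: the analyzer lowercases the text and splits it on whitespace.
def tokenize(text)
  text.downcase.split
end

# True when the words of `phrase` occur consecutively, in order, in `text`.
def phrase_match?(text, phrase)
  tokens = tokenize(text)
  wanted = tokenize(phrase)
  return false if wanted.empty?
  tokens.each_cons(wanted.length).any? { |window| window == wanted }
end

sections = ["Book", "Paperback Book", "Book Paperback"]
hits = sections.select { |s| phrase_match?(s, "paperback book") }
p hits
```

Only "Paperback Book" is selected: "Book" lacks one of the words, and "Book Paperback" has them in the wrong order, which is the difference between a phrase and a plain AND of two terms.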

I hope this is some help,

Jeremy

bluejay · July 14, 2006, 6:37pm

Jeremy Bensley wrote:

It’s hard to know for sure without seeing how your index is built, but if
you are using TOKENIZED on that field, then whenever the index is built the
text is split on whitespace, and each element is added as a separate term.

Jeremy

Thanks for the reply. I am building the index like this…

class VoObject < ActiveRecord::Base
  acts_as_ferret :fields => ['short_description', 'section', 'sale_category', 'sale_type', 'outcode']
end

It looks like when you are searching, you are trying to find the entire text
as a single term.

In order to solve this, I believe you can either construct your query using
QueryParser, which will use the analyzer / tokenizer and split the terms out
for you, or you can simply split the ‘section’ string on whitespace and
build a Term and TermQuery for each resulting element and build a
PhraseQuery from that set.

Sorry for asking a silly question but how would I go about doing this?


bluejay · July 14, 2006, 8:18pm

Method #1 should be shorter / easier, and would look something like this:

# "section" is the default field used to build the query
qp = Ferret::QueryParser.new("section")

# wrap the value in escaped double quotes so it is parsed as a phrase
pq = qp.parse("\"#{section}\"")

# modified boolean query
bq = Ferret::Search::BooleanQuery.new
bq.add_query(pq, Ferret::Search::BooleanClause::Occur::MUST)

filter = Ferret::Search::QueryFilter.new(bq)
@vobjects = VoObject.find_by_contents(search_input, :filter => filter,
                                      :sort => ["section", "sale_category"])

Unless you have more than one query in the boolean query, you should probably
just skip that entirely and build your filter from the PhraseQuery.