AAF and stop words; query parser

I’ve been trying to implement acts_as_ferret in my latest project and
ran into a snag. If I do a search for ‘auditor state’ then the search
works perfectly. If I include a stop word, as in ‘auditor of state’,
then I get no results. I’d prefer not to set stop words to nil and index
everything.

One solution, which I have yet to try, would be to use Ferret::QueryParser
instead of passing the query to the search method as a string.

I couldn’t find a way to do this with the current acts_as_ferret plugin
and was wondering if modifying the plugin to have a
“ferret_query_parser” method would be better than trying to use Ferret
directly from my app model.

Also, wouldn’t this approach be necessary if I implement my own
analyzer? I was thinking of using the double metaphone algorithm, and I
suspect that unless the query parser analyzes the search string with my
custom analyzer, I wouldn’t get any results.
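The concern above (index-time and query-time analysis must agree) can be illustrated without Ferret. This is a minimal pure-Ruby sketch under stated assumptions: `toy_analyze`, `build_index` and `search` are hypothetical helpers, the "index" is a plain Hash, and `toy_analyze` is deliberately NOT double metaphone; it just lowercases, drops a few stop words and strips vowels after the first letter so that "Smith" and "Smyth" produce the same key.

```ruby
# Toy illustration only: toy_analyze is a hypothetical stand-in for a custom
# analyzer. It is NOT double metaphone; it lowercases, drops a few stop words,
# and strips vowels after the first letter so "Smith" and "Smyth" collide.
STOP_WORDS = %w[a an and of the].freeze

def toy_analyze(text)
  text.downcase.scan(/[a-z]+/)
      .reject { |word| STOP_WORDS.include?(word) }
      .map { |word| (word[0] + word[1..-1].delete("aeiouy")).squeeze }
end

# A plain Hash stands in for the inverted index: term => ids of matching docs.
def build_index(docs)
  index = Hash.new { |hash, term| hash[term] = [] }
  docs.each do |id, text|
    toy_analyze(text).uniq.each { |term| index[term] << id }
  end
  index
end

# AND semantics: a document must contain every query term.
def search(index, terms)
  return [] if terms.empty?
  terms.map { |term| index.fetch(term, []) }.reduce(:&)
end

docs  = { 1 => "Auditor of State", 2 => "Secretary of State", 3 => "John Smyth" }
index = build_index(docs)

p search(index, "smith".split)                    # raw query terms       => []
p search(index, toy_analyze("smith"))             # same analyzer as index => [3]
p search(index, toy_analyze("auditor of state"))  # stop word removed      => [1]
```

The raw query term "smith" never appears in the index because only analyzed keys were stored; running the query through the same analyzer is what makes the terms line up.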

I hope I haven’t missed something obvious in AAF’s API.

On a side note, is there a recommended place to put custom analyzers in
a Rails app?

Thanks,
Curtis

Hi!

On Wed, Nov 01, 2006 at 09:54:25AM -0500, Curtis Hatter wrote:

I’ve been trying to implement acts_as_ferret in my latest project and ran into a snag. If I do a search for ‘auditor state’ then the search works perfectly. If I include a stop word, as in ‘auditor of state’, then I get no results. I’d prefer not to set stop words to nil and index everything.

What versions of AAF and Ferret do you use? AFAIR that issue isn’t new,
and it should have been fixed some time ago.

cheers,
Jens


webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer [email protected]
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

Currently I’m using AAF 0.10 and the Windows build of Ferret 0.10.9.

I’m in the middle of moving my development platform to a FreeBSD
machine, which is why I haven’t been able to do much testing. The
FreeBSD version of Ferret will be 0.10.13.

I looked through the archives I have, but the only solution I found was
to set the stop words to nil.

Thanks,
Curtis

----- Original Message -----
From: “Jens K.” [email protected]
To: [email protected]
Sent: Wednesday, November 01, 2006 12:27 PM
Subject: Re: [Ferret-talk] aaf and stop words; query parser

Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk

I’m using the same versions of AAF and Ferret: 0.3.0 and 0.10.9,
respectively. I sent David B. my index so he could analyze it. I posted
a similar message here:

http://www.ruby-forum.com/topic/84909

Every index I built with AAF seemed to demonstrate this problem. I
checked the code, but I couldn’t see where it might be modifying the
query string in any way.

Any help?

Charlie

I should also mention that I was not able to reproduce this when I
created an index using Ferret alone, doing something similar to what
David suggested in the other thread: I got hits when I submitted queries
containing stop words. Hope that helps.

Charlie

I believe the problem was in how I was creating my index. My
acts_as_ferret declaration was as follows:

    acts_as_ferret(:fields => {
      :name => {},
      :desc => {:index => :untokenized_omit_norms},
      :body => {:store => :yes},
      :role => {},
    })

With the above declaration, a search containing stop words, e.g.
“auditor of state”, returned no hits. When I removed the
“:index => :untokenized_omit_norms” option and rebuilt the index, the
same search started to work with acts_as_ferret. Because of time
constraints on this project, I haven’t experimented with plain Ferret
to see what would happen.

If there are any suggestions, I’d gladly try them. I would like to keep
the “desc” field untokenized and omit its norms, because I don’t use
boosting and may wish to sort by “desc”.
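The practical difference between a tokenized and an untokenized field can be sketched in plain Ruby. This is an illustration under assumptions, not Ferret’s implementation: it assumes a tokenized field goes through a lowercasing, stop-word-filtering analyzer, while an untokenized field is stored as a single unanalyzed term; `index_terms` is a hypothetical helper.

```ruby
# Sketch of index-time analysis, assuming a tokenized field is run through a
# lowercasing, stop-word-filtering analyzer while an untokenized field is
# stored as one unanalyzed term. Plain Ruby, not Ferret's implementation.
STOP_WORDS = %w[a an and of the].freeze

def index_terms(value, tokenized:)
  return [value] unless tokenized # untokenized: the whole value is one term
  value.downcase.scan(/[a-z0-9]+/).reject { |word| STOP_WORDS.include?(word) }
end

desc = "Auditor of State"
p index_terms(desc, tokenized: true)   # => ["auditor", "state"]
p index_terms(desc, tokenized: false)  # => ["Auditor of State"]

# A single query word can only hit the untokenized copy if it equals the
# entire stored value, so per-word searches against that field never match.
p index_terms(desc, tokenized: false).include?("auditor")  # => false
```

Under these assumptions an untokenized field is well suited to sorting (one term per document) but useless for word-level searching.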

Thanks,
Curtis

----- Original Message -----
From: “Charlie H.” [email protected]
To: [email protected]
Sent: Tuesday, November 07, 2006 9:59 AM
Subject: Re: [Ferret-talk] aaf and stop words; query parser

On Tue, Nov 07, 2006 at 06:25:20PM -0500, Curtis Hatter wrote:

With the above declaration, a search containing stop words, e.g.
“auditor of state”, returned no hits. When I removed the
“:index => :untokenized_omit_norms” option and rebuilt the index, the
same search started to work with acts_as_ferret. Because of time
constraints on this project, I haven’t experimented with plain Ferret
to see what would happen.

If there are any suggestions, I’d gladly try them. I would like to keep
the “desc” field untokenized and omit its norms, because I don’t use
boosting and may wish to sort by “desc”.

You really should tokenize the desc field if you want to run searches
across it. If you have to sort by desc and therefore can’t tokenize it,
you could index it twice: once tokenized for searching, and once
untokenized (and maybe truncated to save some space in your index) for
sorting.
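A minimal plain-Ruby sketch of the “index it twice” idea, with no Ferret involved. `Record`, `desc_terms`, `desc_for_sort` and `SORT_KEY_LENGTH` are hypothetical names. The commented-out declaration assumes that AAF will index a model method listed in `:fields`, which is not confirmed in this thread; check it against the AAF documentation for your version.

```ruby
# Minimal sketch of the "index it twice" idea in plain Ruby (no Ferret).
# Record, desc_terms and desc_for_sort are hypothetical names. In an AAF
# model, the ASSUMPTION is that a method like desc_for_sort could be listed
# in :fields next to :desc, e.g.
#
#   acts_as_ferret(:fields => {
#     :desc          => {},                                   # tokenized: searchable
#     :desc_for_sort => {:index => :untokenized_omit_norms},  # one term: sortable
#   })
STOP_WORDS = %w[a an and of the].freeze
SORT_KEY_LENGTH = 20 # truncation keeps the untokenized sort field small

Record = Struct.new(:id, :desc) do
  # Searchable copy: lowercased words with stop words removed.
  def desc_terms
    desc.downcase.scan(/[a-z0-9]+/).reject { |word| STOP_WORDS.include?(word) }
  end

  # Sortable copy: the whole value as one lowercased, truncated term.
  def desc_for_sort
    desc.downcase[0, SORT_KEY_LENGTH]
  end
end

records = [
  Record.new(1, "Secretary of State"),
  Record.new(2, "Auditor of State"),
  Record.new(3, "Treasurer of State"),
]

# Searching uses the tokenized copy...
hits = records.select { |r| (%w[auditor state] - r.desc_terms).empty? }
p hits.map(&:id)                              # => [2]
# ...while sorting uses the untokenized copy.
p records.sort_by(&:desc_for_sort).map(&:id)  # => [2, 1, 3]
```

Truncating the sort key trades exact ordering of long values that share a long prefix for a smaller index, which is the space saving mentioned above.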

Jens


Jens K. wrote:

You really should tokenize the desc field if you want to run searches
across it. If you have to sort by desc and therefore can’t tokenize it,
you could index it twice: once tokenized for searching, and once
untokenized (and maybe truncated to save some space in your index) for
sorting.

Jens,

I’m seeing the same behavior as Curtis, but here is how I’m building my
index:

    acts_as_ferret(:additional_fields => [:content])

See my other thread for some observations from what I initially tested.

http://www.ruby-forum.com/topic/84909

However, when I tried to reproduce this using Ferret alone, I couldn’t.
Any ideas?

Charlie

Hi!

On Sat, Nov 18, 2006 at 04:29:43PM +0100, Charlie H. wrote:

However, when I tried to reproduce this using Ferret alone, I couldn’t.
Any ideas?

Yes, I think it’s a Ferret bug that was introduced some time after
0.10.1. Have a look at this script:

It reproduces the problem by adding an untokenized field to the index.
If there is no untokenized field, everything is fine. Because AAF uses
untokenized fields to store IDs and class names, the problem is always
present when searching through AAF.
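The reproduction script is not available here, so the following is only a toy model of the failure pattern described above (works without untokenized fields, fails as soon as one exists). It is a hypothesis, not a reconstruction of Ferret 0.10.x internals. Assumptions, all labeled in the code: the query is searched across every field, query words are ANDed, and each word is analyzed separately per field. `analyze_term`, `build_query` and `matches?` are hypothetical helpers, and `id`/`class_name` merely stand in for AAF’s untokenized bookkeeping fields.

```ruby
# Toy model, NOT Ferret's query parser: a hypothesis consistent with the
# symptoms Jens describes. Assumptions: the query is searched across every
# field, query words are ANDed, and each word is analyzed once per field.
STOP_WORDS = %w[a an and of the].freeze

# Tokenized fields drop stop words; untokenized fields keep the word as-is.
def analyze_term(word, tokenized)
  return word unless tokenized
  STOP_WORDS.include?(word) ? nil : word
end

# Each query word becomes one clause: the [field, term] pairs it expands to.
# A word that every field filters out yields an empty clause and is dropped.
def build_query(words, fields)
  words.map { |word|
    fields.map { |field, tokenized|
      term = analyze_term(word, tokenized)
      term && [field, term]
    }.compact
  }.reject(&:empty?)
end

# A document matches when every clause matches in at least one field.
def matches?(doc, clauses)
  clauses.all? do |clause|
    clause.any? { |field, term| doc.fetch(field, []).include?(term) }
  end
end

# Indexed terms for one record; id and class_name mimic AAF's untokenized
# bookkeeping fields (the real field names are not given in the thread).
doc   = { name: %w[auditor state], id: %w[42], class_name: %w[Office] }
words = %w[auditor of state]

p matches?(doc, build_query(words, { name: true }))
# => true: "of" is dropped entirely
p matches?(doc, build_query(words, { name: true, id: false, class_name: false }))
# => false: "of" survives as a required clause on id/class_name
```

In this model the stop word only disappears if every searched field filters it out; a single untokenized field keeps it alive as a required clause that nothing can satisfy.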

I checked the following versions of Ferret:
working: 0.10.1
not working: 0.10.9, 0.10.11, 0.10.13

I already tried to contact Dave about this, but he still seems to be
offline. I hope he’s fine and back soon to help us out here ;-)

cheers,
Jens

