Did you mean ...? with acts_as_ferret

Hello,

Does anybody know how to implement a Google-style “Did you mean …?”
feature with acts_as_ferret?

I think this is a possible way:

  1. Generate a keyword list without stop words from the first index,
    e.g. (car, ship, plant, house). This is where I am stuck: I don’t
    know how to build such a list from the index.

  2. Build a second index from this word list, storing each word in
    the index.

  3. Run a fuzzy search over the new index, e.g. for “pland”.

  4. Fetch the stored keyword (=> plant); now you can display “Did you
    mean ‘plant’?”

  5. Run an exact search for “plant” on the first index.
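
Steps 3 to 5 can be sketched in plain Ruby. This is only an illustration, not Ferret code: the KEYWORDS array is a hypothetical stand-in for the second index, and a hand-written Levenshtein distance stands in for Ferret's fuzzy search. The names levenshtein, did_you_mean and max_distance are made up for this example.

```ruby
# Hypothetical stand-in for the second (word-list) index.
KEYWORDS = %w[car ship plant house].freeze

# Classic dynamic-programming Levenshtein edit distance.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      # deletion, insertion, substitution
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Returns the closest keyword within max_distance edits, or nil when the
# query already is a keyword or nothing is close enough.
def did_you_mean(query, keywords = KEYWORDS, max_distance = 2)
  word = query.downcase
  return nil if keywords.include?(word)
  best = keywords.min_by { |k| levenshtein(word, k) }
  best if best && levenshtein(word, best) <= max_distance
end

suggestion = did_you_mean("pland")
puts "Did you mean '#{suggestion}'?" if suggestion
```

The returned word would then be used for the exact search on the first index (step 5).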

How can I generate a word-list from the first (standard) index?

Best regards

Lars

Hi!

On Sun, Jan 06, 2008 at 07:16:48PM +0100, Lars Heese wrote:

> e. g. (car, ship, plant, house)
>
> How can I generate a word-list from the first (standard) index?

TermEnum
(http://ferret.davebalmain.com/api/classes/Ferret/Index/TermEnum.html)
might help here. You can’t do this through acts_as_ferret; instead, I’d
suggest you create a little script outside your application which
rebuilds the word-list index from the real index, using Ferret directly
to access the index.
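
A rough sketch of the filtering part of such a script follows. It assumes only that the term source responds to each and yields a term together with its document frequency, which is how the TermEnum documentation linked above describes it. STOP_WORDS, build_word_list, min_doc_freq, the index path and the :content field are all hypothetical names chosen for the example, and the Ferret calls in the trailing comment are untested assumptions.

```ruby
# Small illustrative stop-word list (an assumption; use the list that
# matches the analyzer of your real index).
STOP_WORDS = %w[a an and the of to in is it].freeze

# Collects terms from anything whose #each yields (term, doc_freq),
# skipping stop words and terms that occur in too few documents.
def build_word_list(terms, stop_words: STOP_WORDS, min_doc_freq: 1)
  words = []
  terms.each do |term, doc_freq|
    next if stop_words.include?(term)
    next if doc_freq < min_doc_freq
    words << term
  end
  words
end

# Stand-in data so the sketch runs without Ferret installed.
sample_terms = [["the", 10], ["car", 3], ["plant", 2], ["of", 7]]
puts build_word_list(sample_terms).inspect

# With Ferret, the call would presumably look like this (unverified):
#   reader = Ferret::Index::IndexReader.new("/path/to/index")
#   words  = build_word_list(reader.terms(:content))
```

Raising min_doc_freq is one way to keep misspellings that occur only once in the documents out of the suggestion list.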

cheers,
Jens


Jens Krämer

On Sun, 2008-01-06 at 19:16 +0100, Lars Heese wrote:

> Hello,
>
> does anybody know how to implement a “Did you mean …?” like Google
> with acts_as_ferret?
>
> I think this is a possible way:

Hi Lars,

I did a similar thing in a project, except that the only things I wanted
to suggest were predefined tag names in a table. So I just indexed that
table and did a fuzzy search on it.

But anyway, you can enumerate all the terms in an index for a given
field using the terms method of the IndexReader instance for the index:

http://ferret.davebalmain.com/api/classes/Ferret/Index/TermEnum.html

Then generate your other index from that. You can store the words
directly in the Ferret index to avoid the unnecessary overhead of an SQL
lookup.

With a small list of terms, though, I’m not sure whether the overhead of
Ferret is worth it here. It might be worth experimenting with some
alternatives, like maybe generating an index yourself as an array
directly in Ruby. See the Text library for Metaphone and Soundex
algorithms:

http://text.rubyforge.org/

Ferret will probably be best though tbh.
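
To illustrate the in-Ruby alternative, here is a minimal sketch of a phonetic lookup table. It uses a hand-rolled American Soundex rather than the Text library, so that it runs without any gem; soundex and phonetic_index are names made up for this example, and the Text library's own implementation may differ in details.

```ruby
# Consonant groups of American Soundex.
SOUNDEX_CODES = {
  "b" => "1", "f" => "1", "p" => "1", "v" => "1",
  "c" => "2", "g" => "2", "j" => "2", "k" => "2",
  "q" => "2", "s" => "2", "x" => "2", "z" => "2",
  "d" => "3", "t" => "3",
  "l" => "4",
  "m" => "5", "n" => "5",
  "r" => "6"
}.freeze

# Returns the four-character Soundex code of a word ("" for no letters).
def soundex(word)
  letters = word.downcase.gsub(/[^a-z]/, "")
  return "" if letters.empty?
  result = letters[0].upcase
  last_code = SOUNDEX_CODES[letters[0]]
  letters[1..-1].each_char do |ch|
    code = SOUNDEX_CODES[ch]
    if code
      result << code unless code == last_code
      last_code = code
    elsif ch != "h" && ch != "w"
      last_code = nil # vowels separate repeated codes, h and w do not
    end
    break if result.length == 4
  end
  result.ljust(4, "0")
end

# Groups a word list by Soundex code, as a plain Hash of Arrays.
def phonetic_index(words)
  words.group_by { |w| soundex(w) }
end

index = phonetic_index(%w[car ship plant house])
puts index.fetch(soundex("pland"), []).inspect
```

A phonetic code only catches misspellings that sound alike, so it complements an edit-distance search rather than replacing it.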

John.


As a suggestion: why not build that with a spelling checker instead of
Ferret?

I believe there are free services around for that, not to mention aspell
on the console if you’re running this on a *nix system. You could then
rebuild the sentence from the output of the spelling checker and
build a link with it that you can present to the user.

Strikes me as easier than building the words index yourself.
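
A sketch of the rebuilding step, assuming aspell is run in its ispell-compatible pipe mode (aspell -a), where a misspelled word is reported on a line of the form "& word count offset: suggestion, suggestion, ...". The helper names are hypothetical, and the sample output below is a hand-written illustration, not captured from a real aspell run.

```ruby
require "open3"

# Runs aspell in pipe mode on the query (needs aspell installed;
# not called in this sketch).
def aspell_output(query)
  out, _status = Open3.capture2("aspell", "-a", stdin_data: query)
  out
end

# Parses one pipe-mode line; returns [misspelled, first_suggestion],
# or nil for lines that carry no suggestions.
def parse_suggestion(line)
  match = line.match(/\A& (\S+) \d+ \d+: (.+)\z/)
  return nil unless match
  [match[1], match[2].split(", ").first]
end

# Rebuilds the query, replacing each flagged word by its first suggestion.
def corrected_query(query, output)
  corrections = output.each_line.map { |l| parse_suggestion(l.chomp) }.compact.to_h
  query.split.map { |word| corrections.fetch(word, word) }.join(" ")
end

# Hand-written sample in the pipe-mode format (illustrative only).
sample = "@(#) International Ispell (but really Aspell)\n" \
         "*\n" \
         "& pland 3 6: plant, plan, planed\n\n"

fixed = corrected_query("green pland", sample)
puts "Did you mean '#{fixed}'?" unless fixed == "green pland"
```

One caveat of this approach: a general dictionary does not know which words actually occur in your documents, so a suggestion may lead to an empty result set.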