Wildcard queries on stemmed indexes

Dobai-Pataky_BSSSSl · December 2, 2010, 5:30pm

Hello!

I’m using the German stemming analyzer to index a database using
acts_as_ferret. I have some trouble with wildcard queries, which I
use extensively (and need) for autocompleter fields.

The problem is the following:

In this example, the indexed model contains street names. Some of these
names are:

Alte Bürger
Alter Fährweg
Am Alten Vorhafen
…

So lots of names which Street#find_with_ferret could match. Let’s try:

Street.find_with_ferret "al*"

-> ["Alte Bürger", "Alter Fährweg", "Am Alten Vorhafen", …]

Fine so far. Next:

Street.find_with_ferret "alt*"

-> ["Alte Bürger", "Alter Fährweg", "Am Alten Vorhafen", …]

Now let’s add another letter:

Street.find_with_ferret "alte*"

-> []

Whoops, nothing there. It should match the same list entries as before.
It looks like this happens to all words added to the index using a
stemming analyzer. Searching without the wildcard works:

Street.find_with_ferret "alte"

-> ["Alte Bürger", "Alter Fährweg", "Am Alten Vorhafen", …]
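My guess at the cause, as a minimal sketch in plain Ruby: the index only stores stemmed terms (something like "alt"), a plain term query is stemmed the same way, but a wildcard prefix is compared against the stored terms exactly as typed. ToyStemmedIndex and its suffix-stripping rule are made up for illustration; this is not Ferret's analyzer or query parser, and the assumption that wildcard terms skip stemming is mine.

```ruby
# Toy model of a stemmed index. Hypothetical class, not part of Ferret.
class ToyStemmedIndex
  def initialize(names)
    # Inverted index: stemmed term => names containing that term.
    @terms = Hash.new { |hash, term| hash[term] = [] }
    names.each do |name|
      name.split.each { |word| @terms[stem(word)] << name }
    end
  end

  # Plain term query: the query word is stemmed like the indexed text.
  def term_search(word)
    @terms.fetch(stem(word), []).uniq
  end

  # Wildcard query: the prefix is NOT stemmed (assumption), so it is
  # compared with the stored stems as typed.
  def prefix_search(pattern)
    prefix = pattern.downcase.chomp("*")
    @terms.select { |term, _| term.start_with?(prefix) }.values.flatten.uniq
  end

  private

  # Crude stand-in for a German stemmer: drops a trailing -en, -er or -e.
  def stem(word)
    word.downcase.sub(/(en|er|e)\z/, "")
  end
end

index = ToyStemmedIndex.new(["Alte Bürger", "Alter Fährweg", "Am Alten Vorhafen"])
p index.prefix_search("alt*")   # all three names
p index.prefix_search("alte*")  # => []
p index.term_search("alte")     # all three names
```

In this model "alte*" finds nothing because no stored term starts with "alte"; the only stored form is the stem "alt".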

Something similar happens with other search terms:

The database contains “Rasenweg” (“weg” is stripped away by the analyzer
and is also a stopword):

Street.find_with_ferret "rasen*"

-> [] # <-- unexpected

Street.find_with_ferret "rasen"

-> ["Rasenweg"] # <-- expected

Street.find_with_ferret "ras*"

-> ["Rasenweg"] # <-- expected

How can I fix this or how is this usually handled? I need to do queries
like this:

Street.find_with_ferret "(alte~ bü~)||(altebü)"

and it should return "Alte Bürger" in the results. This works when I
reformulate the query to:

Street.find_with_ferret "alte~||bü~||altebü"

but the results of this are far too inaccurate.
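To show what I mean by inaccurate, here is a small plain-Ruby sketch comparing the two query shapes. It makes two assumptions that may not hold in Ferret: the space inside the parentheses is treated as AND, and simple prefix matching stands in for the fuzzy operator (~). The helper names and the extra street "Bürgerweide" are made up for illustration.

```ruby
# True if any lowercased word of the name starts with the given prefix.
def word_starts_with?(name, prefix)
  name.downcase.split.any? { |word| word.start_with?(prefix) }
end

# Grouped form: (alte AND bü) OR altebü
def grouped_match?(name)
  (word_starts_with?(name, "alte") && word_starts_with?(name, "bü")) ||
    word_starts_with?(name, "altebü")
end

# Flattened form: alte OR bü OR altebü
def flattened_match?(name)
  %w[alte bü altebü].any? { |prefix| word_starts_with?(name, prefix) }
end

names = ["Alte Bürger", "Alter Fährweg", "Bürgerweide"]

grouped   = names.select { |name| grouped_match?(name) }
flattened = names.select { |name| flattened_match?(name) }

p grouped    # => ["Alte Bürger"]
p flattened  # => ["Alte Bürger", "Alter Fährweg", "Bürgerweide"]
```

The flattened form accepts any name that matches a single fragment, which is why it returns so many unwanted streets.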