2010-12-02

I'm using the German stemming analyzer to index a database with
acts_as_ferret. I have some trouble with wildcard queries, which I
use extensively (and need) for autocompleter fields.

The problem is the following:

In this example, the indexed model contains street names. Some of these
names are:

* Alte Bürger
* Alter Fährweg
* Am Alten Vorhafen
* ...

So there are lots of names that Street#find_with_ferret could match. Let's try:

# Street.find_with_ferret "al*"
-> ["Alte Bürger", "Alter Fährweg", "Am Alten Vorhafen", ...]

Fine so far. Next:

# Street.find_with_ferret "alt*"
-> ["Alte Bürger", "Alter Fährweg", "Am Alten Vorhafen", ...]

Now let's add another letter:

# Street.find_with_ferret "alte*"
-> []

Whoops, nothing there. It should match the same list entries. It looks
like this happens to all words added to the index using a stemming
analyzer. Searching without wildcards works:

# Street.find_with_ferret "alte"
-> ["Alte Bürger", "Alter Fährweg", "Am Alten Vorhafen", ...]
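
The pattern above can be reproduced with a toy model in plain Ruby. This is only a sketch under two assumptions that have not been verified against Ferret: the index stores stemmed terms, and wildcard terms are compared against those stems without being stemmed themselves. The names toy_stem, build_index, search_term and search_prefix are made up for illustration, and the suffix stripper is a crude stand-in, not the real German stemmer.

```ruby
# Toy model only: a made-up suffix stripper standing in for the German stemmer.
def toy_stem(word)
  word.downcase.sub(/(en|er|e)\z/, "")
end

# Map each stemmed term to the street names that contain it.
def build_index(names)
  index = Hash.new { |hash, key| hash[key] = [] }
  names.each do |name|
    name.split.each { |token| index[toy_stem(token)] << name }
  end
  index
end

# Plain terms are stemmed before the lookup.
def search_term(index, term)
  index.fetch(toy_stem(term), [])
end

# Assumption: a wildcard prefix is compared to the stored stems exactly as typed.
def search_prefix(index, prefix)
  index.select { |term, _| term.start_with?(prefix.downcase) }.values.flatten.uniq
end

INDEX = build_index(["Alte Bürger", "Alter Fährweg", "Am Alten Vorhafen"])

p search_prefix(INDEX, "alt")  # all three names: the stored stem is "alt"
p search_prefix(INDEX, "alte") # []: no stored term starts with "alte"
p search_term(INDEX, "alte")   # all three names: "alte" is stemmed to "alt" first
```

If this model is close to what really happens, a prefix that is longer than the stored stem can never match, which would fit the results shown here.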

Something similar happens with other search terms:

--> The database contains "Rasenweg" ("weg" is stripped away by the
analyzer and is also a stopword)

# Street.find_with_ferret "rasen*"
-> []           # <-- unexpected

# Street.find_with_ferret "rasen"
-> ["Rasenweg"] # <-- expected

# Street.find_with_ferret "ras*"
-> ["Rasenweg"] # <-- expected

How can I fix this or how is this usually handled? I need to do queries
like this:

# Street.find_with_ferret "(alte~ bü~)||(alte*bü*)"

and it should return "Alte Bürger" in the results. This works when I
reformulate the query to:

# Street.find_with_ferret "alte~||bü~||alte*bü*"

but the results of this query are far too inaccurate.
