We’re using Ferret in a slightly unorthodox way: we’re indexing a
large list (>100,000 entries) of place names from all around the world. Mostly
we’re quite happy with it, and have been able to graft on our own
particular required functionality with just a little tweaking.
There’s one strange problem, though: We’ve got a place in Cyprus
called “Gazimağusa” (the “ğ” is a multibyte character in UTF-8,
encoded as the two bytes \304\237), and it matches a search for “usa”.
We’d rather it not match.
I don’t know that much about Ferret or about this sort of indexing in
general, but is this because Ferret views \304\237 as a word break,
and splits the name into two words? If so, is there a way you’d
recommend to get around this – keeping in mind that we’ve got names
in romanized forms of many different languages?
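To show what I suspect is happening, here is a minimal plain-Ruby sketch. It is not Ferret's tokenizer and doesn't call Ferret at all; the two helper methods (byte_oriented_tokens and unicode_tokens) are names I made up for illustration. It only compares a tokenizer that sees raw bytes and keeps runs of ASCII letters with one that treats any Unicode letter as part of a word.

```ruby
# Illustration only: hypothetical helpers, not Ferret's actual analysis code.

# Treat the string as raw bytes and keep only runs of ASCII letters.
# Any non-ASCII byte (such as the two bytes of "ğ") acts as a word break.
def byte_oriented_tokens(text)
  text.b.scan(/[A-Za-z]+/).map(&:downcase)
end

# Treat the string as UTF-8 characters and keep runs of Unicode letters.
def unicode_tokens(text)
  text.scan(/\p{L}+/).map(&:downcase)
end

name = "Gazima\u011Fusa" # U+011F is "ğ"; in UTF-8 it is the bytes \304\237

p byte_oriented_tokens(name) # => ["gazima", "usa"]
p unicode_tokens(name)       # => ["gazimağusa"]
```

If Ferret is doing something like the first method, that would explain why a search for “usa” finds this place, and a Unicode-aware way of splitting words is presumably what we’d want instead.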
Thanks in advance,