Ignoring accents when I search

Does anyone know of a way to be ‘accent-insensitive’ when I do a
search?

For example, if I have a resource with the name “La Bohème” and someone
searches for ‘boheme’, I want them to find that resource, even though the
‘e’ doesn’t have the accent. At the moment, it will only find it if they
search for the properly accented version.

I guess soundex support for Ferret is what I mean, but maybe there’s
another way?

thanks, max

I just discovered the rather handy fuzzy searches, which I can do by
adding (e.g.) “~0.6” to the end of my search term. So this does the job
(yay), but I’d still be interested in hearing whether anyone else has
solved this problem in a different way. :)
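A rough sketch of why the fuzzy query finds the accented title, and why it can over-match. It assumes Ferret scores fuzzy matches roughly the way Lucene's classic FuzzyQuery does (similarity = 1 - edit distance / length of the shorter term) and that the index lowercases terms, so the stored term is “bohème”. `levenshtein` and `fuzzy_similarity` are hypothetical helpers, not Ferret methods; Ruby 2.0+ is assumed so string literals are UTF-8.

```ruby
# Hypothetical helpers illustrating an assumed Lucene-style fuzzy score.

# Classic Levenshtein distance over characters (not bytes), two-row DP.
def levenshtein(a, b)
  a_chars = a.chars
  b_chars = b.chars
  prev = (0..b_chars.length).to_a
  a_chars.each_with_index do |ca, i|
    curr = [i + 1]
    b_chars.each_with_index do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j + 1] + 1,    # delete ca
               curr[j] + 1,        # insert cb
               prev[j] + cost].min # substitute (or keep) ca -> cb
    end
    prev = curr
  end
  prev.last
end

# Assumption: similarity = 1 - distance / length of the shorter term.
def fuzzy_similarity(query, term)
  shorter = [query.length, term.length].min
  return 0.0 if shorter.zero?
  1.0 - levenshtein(query, term).to_f / shorter
end

puts fuzzy_similarity("boheme", "bohème").round(2)   # => 0.83
puts fuzzy_similarity("boheme", "bohemia").round(2)  # => 0.67
```

Under that assumption “boheme” is one substitution away from “bohème” and scores well above 0.6, but an unrelated term like “bohemia” also clears the threshold, which is the kind of false positive a low cut-off lets through.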

Hi!

You could create a custom Analyzer that replaces accented characters
with their unaccented counterparts. If you apply this analysis to both
the indexed content and the queries, you’ll find “La Bohème” with either
‘boheme’ or ‘bohème’ as the query string.
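A minimal sketch of the index-and-query symmetry described above, in plain Ruby rather than Ferret's Analyzer classes. It swaps in Unicode decomposition (NFD, then dropping combining marks) instead of a replacement table, and needs Ruby 2.2+ for `String#unicode_normalize`. `fold_tokens`, `build_index` and `search` are hypothetical names, and the toy hash index only stands in for Ferret's.

```ruby
# Hypothetical helpers; plain Ruby, not Ferret's Analyzer API.

# Decompose accented characters (NFD), drop the combining marks (\p{Mn}),
# lowercase, then split into alphanumeric tokens.
def fold_tokens(text)
  text.unicode_normalize(:nfd)
      .gsub(/\p{Mn}/, "")
      .downcase
      .scan(/[[:alnum:]]+/)
end

# Toy inverted index: token => list of document ids.
def build_index(docs)
  index = Hash.new { |h, k| h[k] = [] }
  docs.each do |id, text|
    fold_tokens(text).uniq.each { |tok| index[tok] << id }
  end
  index
end

# The SAME folding runs on the query; return ids matching every query token.
def search(index, query)
  fold_tokens(query).map { |tok| index.fetch(tok, []) }.reduce(:&) || []
end

docs  = { 1 => "La Bohème", 2 => "Carmen" }
index = build_index(docs)
p search(index, "boheme")  # => [1]
p search(index, "bohème")  # => [1]
```

The point is only that one function runs on both sides; in Ferret that logic would presumably live in a custom Analyzer or token filter used for both indexing and query parsing.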

There’s a sample method that does the replacement part of the job on
the aaf wiki: http://projects.jkraemer.net/acts_as_ferret/#UTF-8support
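The replacement part can also be done with `String#tr` and a substitution table. This sketch is not the method on the aaf wiki: `strip_accents` and the two tables are hypothetical, cover only common Latin-1 letters, and assume Ruby 2.0+ so `tr` works on UTF-8 characters rather than bytes. It also assumes precomposed (NFC) input; a decomposed “e” followed by a combining accent passes through unchanged.

```ruby
# Hypothetical substitution table: each character in ACCENTED maps to the
# character at the same position in UNACCENTED (lengths must match).
ACCENTED   = "àáâãäåèéêëìíîïòóôõöùúûüýÿñçÀÁÂÃÄÅÈÉÊËÌÍÎÏÒÓÔÕÖÙÚÛÜÝÑÇ"
UNACCENTED = "aaaaaaeeeeiiiiooooouuuuyyncAAAAAAEEEEIIIIOOOOOUUUUYNC"

# Replace listed accented characters; everything else is left as-is.
def strip_accents(str)
  str.tr(ACCENTED, UNACCENTED)
end

puts strip_accents("La Bohème").downcase  # => la boheme
```

A table like this is easy to read and extend, but it only knows the characters you list; the decomposition approach handles any precomposed Latin letter with a combining mark.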

Have a look at the analyzer used in the omdb project for a more complete
example:
https://svn.omdb-beta.org/trunk/lib/omdb/ferret/omdb_analyzer.rb

Cheers,
Jens

(Reply to Max W.'s message of Mon, Apr 21, 2008 at 05:49:43PM +0100.)


Ferret-talk mailing list
http://rubyforge.org/mailman/listinfo/ferret-talk


Jens Krämer
Finkenlust 14, 06449 Aschersleben, Germany
VAT Id DE251962952
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

That’s very useful, thanks! I’m just using the fuzzy search for now, but
if it proves too vague (too many false positives), I’ll look at this.

I’d actually never seen that tr() method before; combined with the
ready-made accent substitutions in your link, it’s very handy!

cheers, max