Synonyms

Hi,

Using ferret and acts_as_ferret.
Great work.

Is there a way to define some synonyms (searchable words that would not
appear in the texts)?
Like stop words, but instead of being removed from the query and index,
they would be added ;-)

Can some synonyms be regexps? I'd like, for instance, to have œ (oelig)
be equivalent to oe in French.
Or maybe a UTF-8 normalization could achieve this last point more easily?

Jean-Christophe M.

Hi!

On Tue, Aug 22, 2006 at 10:35:52AM +0200, Jean-Christophe M. wrote:

Hi,

Using ferret and acts_as_ferret.
Great work.

Thanks :-)

Is there a way to define some synonyms (searchable words that would not
appear in the texts)?
Like stop words, but instead of being removed from the query and index,
they would be added ;-)

This can be done with a custom analyzer. The Lucene in Action book has a
good chapter on the whole analysis topic, which covers synonyms, too.
You really should get it if you intend to do serious work with
Ferret and/or Lucene; it was really helpful to me.

Basically you can add synonyms to your index at indexing time (afair
by having multiple terms share the same term position), or you
can expand your users' queries using synonyms (e.g. the term 'lift'
could be expanded to the boolean clause 'lift OR elevator'). There's some
code in Lucene contrib that takes the WordNet synonym database and
builds an index from it, which in turn can be used for the query
expansion task.
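
To make the two approaches concrete, here is a minimal sketch in plain Ruby. It does not use the Ferret analyzer API; the `SYNONYMS` table and both method names are hypothetical, made up for illustration only. In a real setup this logic would live inside a custom analyzer or a query preprocessing step.

```ruby
# Hypothetical synonym table (not part of Ferret or acts_as_ferret).
SYNONYMS = {
  'lift'     => ['elevator'],
  'elevator' => ['lift']
}.freeze

# Query-time expansion: every term that has synonyms becomes an OR group,
# all other terms are passed through unchanged.
def expand_query(query, synonyms = SYNONYMS)
  query.split.map do |term|
    alternatives = [term] + synonyms.fetch(term.downcase, [])
    alternatives.size > 1 ? "(#{alternatives.join(' OR ')})" : term
  end.join(' ')
end

# Index-time injection: returns [term, position] pairs in which each
# synonym shares the position of the term it was derived from.
def tokens_with_synonyms(text, synonyms = SYNONYMS)
  text.downcase.scan(/[[:alpha:]]+/).each_with_index.flat_map do |term, pos|
    ([term] + synonyms.fetch(term, [])).map { |t| [t, pos] }
  end
end

puts expand_query('take the lift')
p tokens_with_synonyms('Take the lift')
```

Query-time expansion keeps the index small and lets you change the synonym list without reindexing; index-time injection keeps queries simple but needs a reindex whenever the list changes.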

Can some synonyms be regexps? I'd like, for instance, to have œ (oelig)
be equivalent to oe in French.
Or maybe a UTF-8 normalization could achieve this last point more easily?

I would put this into a custom analyzer that then gets used for both
indexing and query parsing. Just replace œ with oe in both queries and
indexed text. But remember that you lose some information that way, as
now any query having 'oe' in it will also match terms that once had œ in
this place.
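
The replacement step itself could look roughly like this. It is a plain-Ruby sketch, not Ferret API: `LIGATURES` and `fold_ligatures` are hypothetical names, and the idea is that the same method is applied to text before indexing and to queries before parsing.

```ruby
# Hypothetical mapping; extend it with whatever characters you want folded.
LIGATURES = { 'œ' => 'oe', 'Œ' => 'OE' }.freeze

# Replace each ligature with its two-letter spelling.
# String#gsub looks up every match in the hash.
def fold_ligatures(text)
  text.gsub(/[œŒ]/, LIGATURES)
end

# Apply the same folding on both sides so they agree.
indexed_term = fold_ligatures('cœur')
query_term   = fold_ligatures('coeur')
puts indexed_term == query_term
```

After folding, 'cœur' and 'coeur' are the same term, which is exactly the information loss mentioned above. As far as I know, Unicode normalization alone will not help here, since œ has no decomposition into o + e, so an explicit mapping like this one is needed.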

Jens

On 8/22/06, Jens K. wrote:

I would put this into a custom analyzer that then gets used for both
indexing and query parsing. Just replace œ with oe in both queries and
indexed text. But remember that you lose some information that way, as
now any query having 'oe' in it will also match terms that once had œ in
this place.

Is this possible? I mean, are there any words where changing œ to oe
would change the meaning, or any other examples, like ß to ss? Just
curious. I'm just an ignorant English speaker. ;-)

Dave

Is this possible? I mean, are there any words where changing œ to oe
would change the meaning, or any other examples, like ß to ss? Just
curious. I'm just an ignorant English speaker. ;-)

Well… I can think of examples where an ä must not be changed to an ae.
Umlauts and their 'ASCII counterparts' are not interchangeable, but it
would work most of the time.

Greetings from a civilised European to an ignorant English speaker ;-)