Forum: Ferret How to have 'o' == 'ö'

John Private (smokinggun)
on 2007-01-19 18:12
Greetings,

(using acts_as_ferret)

So I have a book title "Möngrel „Horsemen“" in my index.

Searching for "Möngrel" retrieves the document.

But I would like searching for "Mongrel" to also retrieve the document.
Which it does not currently.

Anyone have any good solutions to this problem?

I suppose I could filter the documents and queries first with something
like:


(Iconv.new('US-ASCII//TRANSLIT', 'utf-8').iconv "Möngrel „Horsemen“").gsub(/[^a-zA-Z0-9]/im,"")

But perhaps there is a better, or built in solution.
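
On newer Rubies, where Iconv is no longer part of the standard library, a similar filter could be sketched with String#unicode_normalize. This is only an illustration and not something built into Ferret or acts_as_ferret; the helper name asciify is made up for this example, and keeping spaces in the result is an assumption about what the filter should preserve.

```ruby
# Hypothetical helper (not part of Ferret or acts_as_ferret).
# Decomposes accented letters (NFD), drops the combining marks,
# then removes everything that is not an ASCII letter, digit, or space.
def asciify(str)
  str.unicode_normalize(:nfd)    # "ö" becomes "o" followed by U+0308
     .gsub(/\p{Mn}/, '')         # strip the combining marks
     .gsub(/[^a-zA-Z0-9 ]/, '')  # drop remaining punctuation such as „ and “
end

puts asciify("Möngrel „Horsemen“")  # prints: Mongrel Horsemen
```

Letters that have no Unicode decomposition, such as ø or ł, are not folded by this approach; the last step simply drops them, so they would still need an explicit mapping.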


Thanks
Jens Kraemer (Guest)
on 2007-01-22 14:55
(Received via mailing list)
On Fri, Jan 19, 2007 at 06:12:12PM +0100, John Private wrote:
>
> Anyone have any good solutions to this problem?
>
> I suppose I could filter the documents and queries first with something
> like:
>
>
> (Iconv.new('US-ASCII//TRANSLIT', 'utf-8').iconv "Möngrel „Horsemen“").gsub(/[^a-zA-Z0-9]/im,"")
>
> But perhaps there is a better, or built in solution.

I don't think so - a custom Analyzer would be the right place for
this.

Jens

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer@webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66
Xavier Noria (Guest)
on 2007-01-22 16:27
(Received via mailing list)
On Jan 22, 2007, at 2:49 PM, Jens Kraemer wrote:

>> But perhaps there is a better, or built in solution.
>
> I don't think so - a custom Analyzer would be the right place for
> this.

We use a normalizer to store/query (to be revised for Rails 1.2):

   # Utility method that returns an ASCIIfied, downcased, and sanitized
   # string. It relies on the Unicode Hacks plugin by means of
   # String#chars. We assume $KCODE is 'u' in environment.rb. So far we
   # support a wide range of Latin accented letters, based on the Unicode
   # Character Palette bundled with Macs.
   def self.normalize(str)
     n = str.chars.downcase.strip.to_s
     n.gsub!(/[àáâãäåāă]/,    'a')
     n.gsub!(/æ/,            'ae')
     n.gsub!(/[ďđ]/,          'd')
     n.gsub!(/[çćčĉċ]/,       'c')
     n.gsub!(/[èéêëēęěĕė]/,   'e')
     n.gsub!(/ƒ/,              'f')
     n.gsub!(/[ĝğġģ]/,        'g')
     n.gsub!(/[ĥħ]/,           'h')
     n.gsub!(/[ìíîïīĩĭįı]/,   'i')
     n.gsub!(/[ijĵ]/,          'j')
     n.gsub!(/[ķĸ]/,          'k')
     n.gsub!(/[łľĺļŀ]/,       'l')
     n.gsub!(/[ñńňņʼnŋ]/,      'n')
     n.gsub!(/[òóôõöøōőŏŏ]/,  'o')
     n.gsub!(/œ/,            'oe')
     n.gsub!(/ą/,              'a')  # a with ogonek
     n.gsub!(/[ŕřŗ]/,         'r')
     n.gsub!(/[śšşŝș]/,       's')
     n.gsub!(/[ťţŧț]/,        't')
     n.gsub!(/[ùúûüūůűŭũų]/,  'u')
     n.gsub!(/ŵ/,             'w')
     n.gsub!(/[ýÿŷ]/,         'y')
     n.gsub!(/[žżź]/,         'z')
     n.gsub!(/\s+/,            ' ')
     n.gsub!(/[^\sa-z0-9_-]/,   '')
     n
   end

And this convenience class method to use in Rails models with
acts_as_ferret (slightly edited):

   # Wrapper function to normalize fields before calling acts_as_ferret
   #
   # Usage: index_fields [:field1, :field2], :option1 => ..., :option2 => ...
   #
   # Please note that your queries should use a "_normalized" suffix on
   # each field, i.e: +field1_normalized:foo
   class ActiveRecord::Base
     def self.index_fields(fields, *options)
       aaf_fields = []
       fields.each do |f|
         class_eval <<-EOS
           def #{f}_normalized
             MyAppUtils.normalize(#{f})
           end
         EOS
         aaf_fields.push ":#{f}_normalized"
       end
       aaf_call = 'acts_as_ferret :fields => [' + aaf_fields.join(',') + ']'
       options.each do |option_pair|
         option_pair.each do |key, value|
           aaf_call << ", :#{key} => #{value.inspect}"
         end
       end
       logger.info aaf_call
       class_eval(aaf_call)
     end
   end
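
The method-generation part of the wrapper above can also be sketched without building Ruby source as strings, using define_method. This is a minimal illustration under assumptions, not the code from the post: NormalizedFields, normalized_fields, Book, and the cut-down MyAppUtils.normalize are hypothetical stand-ins, and a plain Ruby class takes the place of an ActiveRecord model so the example runs on its own.

```ruby
# Stand-in for the normalizer above; the real one covers many more characters.
module MyAppUtils
  def self.normalize(str)
    str.to_s.downcase.strip.tr('äöü', 'aou')
  end
end

# Hypothetical mixin: defines a <field>_normalized reader for each field
# and returns the generated names, which could then be handed to
# acts_as_ferret as its :fields option.
module NormalizedFields
  def normalized_fields(*fields)
    fields.map do |f|
      name = :"#{f}_normalized"
      define_method(name) { MyAppUtils.normalize(public_send(f)) }
      name
    end
  end
end

# Plain Ruby class standing in for an ActiveRecord model (assumption).
class Book
  extend NormalizedFields
  attr_reader :title

  def initialize(title)
    @title = title
  end

  INDEXED = normalized_fields(:title)
end

puts Book.new("  Möngrel ").title_normalized  # prints: mongrel
p Book::INDEXED                               # prints: [:title_normalized]
```

Returning the generated names keeps the indexing call itself ordinary Ruby, so option values never have to be serialized into a string for class_eval.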

-- fxn
David Balmain (Guest)
on 2007-02-24 13:58
(Received via mailing list)
On 1/23/07, Xavier Noria <fxn@hashref.com> wrote:
> We use a normalizer to store/query (to be revised for Rails 1.2):
> [...]
> -- fxn

Sorry to bring this one back from the archives (I'm going through all
the email I've missed in my long absence). Anyway, I thought that
since not even Jens knew about this I should point out the existence
of MappingFilter:

    http://ferret.davebalmain.com/api/classes/Ferret/A...

It essentially does the same thing as Xavier's code above but it is
much faster. It compiles the mappings to a single deterministic finite
automaton (DFA):

    http://en.wikipedia.org/wiki/Deterministic_finite_...

Basically, this means the filter does a single pass through the string
to do all the mappings rather than a pass for each mapping.
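
The single-pass idea can be illustrated in plain Ruby by merging all mappings into one pattern and one lookup table. This is only a sketch of the principle and not how MappingFilter is implemented: Ruby's regex engine is not a DFA, and the names MAPPING, LOOKUP, PATTERN, and fold, as well as the shape of the table, are assumptions made for this example.

```ruby
# Illustrative mapping table; only a few entries are shown.
MAPPING = {
  %w[à á â ã ä å] => 'a',
  'æ'             => 'ae',
  %w[ò ó ô õ ö ø] => 'o',
  %w[ù ú û ü]     => 'u'
}.freeze

# Flatten the table to a character => replacement lookup.
LOOKUP = MAPPING.each_with_object({}) do |(from, to), table|
  Array(from).each { |ch| table[ch] = to }
end.freeze

# One alternation that matches any mapped character.
PATTERN = Regexp.union(LOOKUP.keys)

# A single gsub call walks the string once and looks up each match,
# instead of running one gsub! per mapping.
def fold(str)
  str.gsub(PATTERN, LOOKUP)
end

puts fold("Möngrel æther")  # prints: Mongrel aether
```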

Hope that helps somebody,
Dave