On Jan 22, 2007, at 2:49 PM, Jens K. wrote:
document.
„Horsemen“").gsub(/[^a-zA-Z0-9]/im,"")
But perhaps there is a better, or built in solution.
I don’t think so - a custom Analyzer would be the right place for
this.
We use a normalizer to store/query (to be revised for Rails 1.2):
# Utility method that returns an ASCIIfied, downcased, and sanitized string.
# It relies on the Unicode Hacks plugin by means of String#chars. We assume
# $KCODE is 'u' in environment.rb. So far we support a wide range of Latin
# accented letters, based on the Unicode Character Palette bundled in Macs.
def self.normalize(str)
  n = str.chars.downcase.strip.to_s
  n.gsub!(/[àáâãäåāă]/, 'a')
  n.gsub!(/æ/, 'ae')
  n.gsub!(/[ďđ]/, 'd')
  n.gsub!(/[çćčĉċ]/, 'c')
  n.gsub!(/[èéêëēęěĕė]/, 'e')
  n.gsub!(/ƒ/, 'f')
  n.gsub!(/[ĝğġģ]/, 'g')
  n.gsub!(/[ĥħ]/, 'h')
  n.gsub!(/[ìíîïīĩĭ]/, 'i')
  n.gsub!(/[įıĳĵ]/, 'j')
  n.gsub!(/[ķĸ]/, 'k')
  n.gsub!(/[łľĺļŀ]/, 'l')
  n.gsub!(/[ñńňņŉŋ]/, 'n')
  n.gsub!(/[òóôõöøōőŏ]/, 'o')
  n.gsub!(/œ/, 'oe')
  n.gsub!(/ą/, 'q')
  n.gsub!(/[ŕřŗ]/, 'r')
  n.gsub!(/[śšşŝș]/, 's')
  n.gsub!(/[ťţŧț]/, 't')
  n.gsub!(/[ùúûüūůűŭũų]/, 'u')
  n.gsub!(/ŵ/, 'w')
  n.gsub!(/[ýÿŷ]/, 'y')
  n.gsub!(/[žżź]/, 'z')
  n.gsub!(/\s+/, ' ')
  n.gsub!(/[^\sa-z0-9_-]/, '')
  n
end
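For anyone reading this on a recent Ruby: the hand-written table can probably be shortened by letting Unicode decomposition strip the accents. This is only a sketch under assumptions, not the code we run. It assumes Ruby 2.4 or later (String#unicode_normalize and Unicode-aware downcase), and the names ascii_fold and FOLD_EXTRA are made up for illustration. Its mappings also differ from the table above in a few places (for example ą becomes 'a' here).

```ruby
# Sketch only: a decomposition-based variant of the normalizer above.
# FOLD_EXTRA and ascii_fold are hypothetical names, not part of the original code.
FOLD_EXTRA = {
  'æ' => 'ae', 'œ' => 'oe', 'ø' => 'o', 'đ' => 'd',
  'ł' => 'l', 'ħ' => 'h', 'ŧ' => 't', 'ı' => 'i', 'ß' => 'ss'
}.freeze

def ascii_fold(str)
  n = str.downcase.strip
  # NFD splits "é" into "e" plus a combining accent; \p{Mn} matches the accent.
  n = n.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
  # Letters without a canonical decomposition need an explicit table.
  n = n.gsub(/[#{FOLD_EXTRA.keys.join}]/) { |c| FOLD_EXTRA[c] }
  # Same final steps as the original: squeeze whitespace, drop everything else.
  n = n.gsub(/\s+/, ' ')
  n.gsub(/[^\sa-z0-9_-]/, '')
end

puts ascii_fold("  Crème   Brûlée ")
puts ascii_fold("Łódź")
```

The explicit table is still needed because letters such as ø or ł are separate code points rather than a base letter plus a combining mark.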
And this convenience class method to use in Rails models with
acts_as_ferret (slightly edited):
# Wrapper function to normalize fields before calling acts_as_ferret.
# Usage: index_fields [:field1, :field2], :option1 => …, :option2 => …
# Please note that your queries should use a "_normalized" suffix on
# each field, e.g. +field1_normalized:foo
class ActiveRecord::Base
  def self.index_fields(fields, *options)
    aaf_fields = []
    fields.each do |f|
      # Define <field>_normalized, which returns the normalized field value.
      class_eval <<-EOS
        def #{f}_normalized
          MyAppUtils.normalize(#{f})
        end
      EOS
      aaf_fields.push ":#{f}_normalized"
    end
    # Build the acts_as_ferret call as a string and evaluate it below.
    aaf_call = 'acts_as_ferret :fields => [' + aaf_fields.join(',') + ']'
    options.each do |option_pair|
      option_pair.each do |key, value|
        # inspect keeps symbol and string values valid as Ruby source.
        aaf_call << ", :#{key} => #{value.inspect}"
      end
    end
    logger.info aaf_call
    class_eval(aaf_call)
  end
end
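If you would rather avoid evaluating strings, the same idea can be sketched with define_method. The example below is self-contained so it runs without Rails or Ferret: Normalizer is a stand-in for MyAppUtils, and NormalizedFields, normalized_fields, Book and normalized_query are hypothetical names used only for illustration. In a real model you would presumably pass the returned names to acts_as_ferret :fields => … yourself.

```ruby
# Stand-in for MyAppUtils.normalize; the real one also strips accents.
module Normalizer
  def self.normalize(str)
    str.to_s.downcase.strip.gsub(/\s+/, ' ')
  end
end

# Hypothetical mixin: defines <field>_normalized readers without string eval.
module NormalizedFields
  def normalized_fields(*fields)
    fields.map do |f|
      name = :"#{f}_normalized"
      define_method(name) { Normalizer.normalize(public_send(f)) }
      name # collected so the caller can hand the list to the indexer
    end
  end
end

# Plain Ruby class standing in for an ActiveRecord model.
class Book
  extend NormalizedFields
  attr_accessor :title
  NORMALIZED_FIELDS = normalized_fields(:title)
end

# Queries need the same treatment: normalized field name and normalized term.
def normalized_query(field, term)
  "+#{field}_normalized:#{Normalizer.normalize(term)}"
end

book = Book.new
book.title = "  Four HORSEMEN "
puts book.title_normalized
puts normalized_query(:title, "Horsemen")
```

The point of normalized_query is that the search term has to go through the same normalizer as the indexed value, otherwise an accented query will not match the ASCIIfied field.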
– fxn