or maybe not a bug :S
so back to zero
require 'rubygems'
require 'ferret'
Ferret.locale = '' # "de_DE.iso88591"
i = Ferret::I.new
i << 'Übersicht'
i << 'übersicht'
for q in [ 'Übersicht', 'übersicht', 'Über*', 'über*', '*bersicht' ]
  puts "#{q} : #{i.search(q).total_hits} hit(s)"
end
With an empty locale, the test script works in the new version as
well.
But in my Rails app, the index generated by aaf (acts_as_ferret) has
broken umlauts when Ferret.locale is empty.
E.g. the word “Übersicht” in the index shows this behavior when queried:
“Übersicht” = hit
“übersicht” = hit
“Übers*” = no hit
“übers*” = no hit
“bersicht” = hit (?!?!)
With the locale set to “de_DE.iso88591”, the umlauts seem correct, but
matching on them is case sensitive.
Query:
“Übersicht” = hit
“übersicht” = no hit
“Übers*” = hit
“ÜBERSICHT” = hit
“üBERSICHT” = no hit
“ÜBERsi*” = hit
I simplified my model a bit to speed up the 200 index rebuilds I’ve done
over the last few days:
acts_as_ferret( { :fields => [ :title ], :remote => true },
                { :analyzer => GermanStemmingAnalyzer.new } )

def title
  # convert the UTF-8 title to ISO-8859-1 before it gets indexed
  Iconv.new('ISO-8859-1', 'UTF-8').iconv(self.xstrtitle.to_s)
end
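To make clear what the conversion in the title method does to an uppercase umlaut, here is a minimal sketch in plain Ruby. It assumes a newer Ruby where String#encode is available (Iconv is no longer bundled with current Rubies) and uses String#encode as a stand-in for the Iconv call; the variable names are only for illustration.

```ruby
# Sketch only: String#encode used in place of Iconv.new('ISO-8859-1', 'UTF-8').
utf8   = "\u00DCbersicht"           # "Übersicht" as a UTF-8 string
latin1 = utf8.encode('ISO-8859-1')  # same direction: UTF-8 in, ISO-8859-1 out

p utf8.bytes.first(2)    # [195, 156]  -> the Ü takes two bytes in UTF-8
p latin1.bytes.first(1)  # [220]       -> the Ü is a single byte in ISO-8859-1
p latin1.encoding        # #<Encoding:ISO-8859-1>
```

So whatever the analyzer sees for the title field is a single-byte string, and it can only treat byte 220 as a letter if the locale tells it to.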
here are a couple of terms from the index:
[“massnahm”,2],
[“medi”,1],
[“medikament”,1],
[“patientenwert”,1],
[“patientinn”,1],
[“prufprotokoll”,1],
[“regionalanasthesi”,2],
[“reisekostenabrechn”,1],
[“reparaturanzeig”,2],
[“schwachelt”,1],
[“sonderw”,1],
[“ssnahmenkurz”,1],
[“stundenabrechn”,2],
[“sturzereignisprotokoll”,1],
[“urlaubsubertrag”,1],
[“verwalt”,1],
[“zuschlagsformular”,1],
[“zytostatica”,2],
[“Äquivalenzdos”,1],
[“Übergabeprotokoll”,1],
[“Übersicht”,1],
[“Überstundendokumentation”,1]]
The lowercase umlauts seem to be processed properly by the lowercase
filter and the stemming analyzer; only the four terms at the end,
which start with uppercase umlauts, are left unprocessed.
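The symptom looks like what you get when lowercasing only knows about A-Z. The following is a plain-Ruby sketch (Ruby 2.4 or later, no Ferret involved) that just mimics that effect; it is an assumption about the cause, not a trace of what Ferret's lowercase filter actually does.

```ruby
# Mimics ASCII-only vs. Unicode-aware lowercasing; not Ferret code.
word = "\u00DCBERSICHT"              # "ÜBERSICHT"

ascii_only = word.downcase(:ascii)   # folds only A-Z, the Ü is left alone
unicode    = word.downcase           # full Unicode case mapping

puts ascii_only   # Übersicht
puts unicode      # übersicht
```

With ASCII-only folding the term keeps its uppercase Ü, which would match the unprocessed terms at the end of the list above.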
Any idea? I can’t think of anything else I could try (except Solr).