I’ve installed ferret 0.10.9 together with the latest acts_as_ferret
using Windows XP and indexed a location database (geonames.org) with
Location.rebuild_index. The data is in utf-8.
Now calling Location.find_by_contents “ö” does not return a result,
causes a lot of CPU load, and finally exits with the error “index.rb:702:
in ‘parse’: failed to allocate memory (NoMemoryError)”. It seems to be a
problem in ‘process_query’.
Similar results sometimes occur for other German umlauts…
On 3/19/07, Star B. [email protected] wrote:
> I’ve installed ferret 0.10.9 together with the latest acts_as_ferret
> using Windows XP and indexed a location database (geonames.org) with
> Location.rebuild_index. The data is in utf-8.
> Now calling Location.find_by_contents “ö” does not return a result,
> causes a lot of CPU load, and finally exits with the error “index.rb:702:
> in ‘parse’: failed to allocate memory (NoMemoryError)”. It seems to be a
> problem in ‘process_query’.
> Similar results sometimes occur for other German umlauts…
Unfortunately Ferret doesn’t come with UTF-8 support in Windows as the
win32 runtime environment doesn’t seem to support UTF-8. You will
therefore need to write your own analyzer on Windows if you want to
support UTF-8 searches.
Hopefully the NoMemoryError will be fixed in the next win32 gem I
release.
David B. wrote:
> Unfortunately Ferret doesn’t come with UTF-8 support in Windows as the
> win32 runtime environment doesn’t seem to support UTF-8. You will
> therefore need to write your own analyzer on Windows if you want to
> support UTF-8 searches.
Hello Star B.,
if you’re planning to write your own UTF-8 analyzer, consider the
unpack/pack duo:
utf8_encoded_string_from_db.unpack("U*").pack("C*")
@index << {:content => utf8_encoded_string_from_db}
@index.search_each('content:Behörde') {|id, score| do_sth}
I didn’t try this in acts_as_ferret (afa), but in plain Ruby it worked
in my case.
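For readers wondering what the unpack/pack duo actually does, here is a minimal sketch. It assumes Ruby 1.9 or later (so that "\u00F6" is understood as a code point), and the variable names are only illustrative. As far as I can tell, the trick turns a UTF-8 string into single-byte text, which only works cleanly for characters up to U+00FF.

```ruby
# Assumes Ruby 1.9 or later, where "\u00F6" denotes the code point U+00F6.
utf8 = "Beh\u00F6rde"

# unpack("U*") decodes the UTF-8 bytes into an array of code points.
codepoints = utf8.unpack("U*")

# pack("C*") writes each code point as one unsigned byte, so every
# character up to U+00FF becomes its ISO-8859-1 (Latin-1) byte.
latin1 = codepoints.pack("C*")

puts utf8.bytesize    # 8, because "ö" takes two bytes in UTF-8
puts latin1.bytesize  # 7, because "ö" is now the single byte 0xF6
```

So the result is effectively Latin-1 text, which is probably why it helped with German umlauts.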
I tried this with a UTF-8 encoded string (Japanese):
"\u304A\u308C\u3068\u9B5A".unpack("U*").pack("C*")
Which gives me this in return:
"u304Au308Cu3068u9B5A"
And that’s not what I want stored in my index, right?
Now I’m pretty sure I’m doing something dumb; hopefully someone can
clarify.
Thanks.
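A hedged guess at what is going on, with a small sketch. As far as I know, "\u" escapes in string literals only arrived with Ruby 1.9, so on an older Ruby the literal above would simply be the ASCII text "u304Au308C…", and unpack/pack hands it back unchanged. Even on a newer Ruby, the code points of Japanese characters lie far above 255, so pack("C*") cannot store them in one byte each. The helper name latin1_safe? below is made up for illustration.

```ruby
# Assumes Ruby 1.9 or later, where "\u304A" is parsed as a code point.
japanese = "\u304A\u308C\u3068\u9B5A"
german   = "Beh\u00F6rde"

# Hypothetical helper: true only if every character fits into one byte,
# i.e. the unpack("U*").pack("C*") trick would not lose information.
def latin1_safe?(str)
  str.unpack("U*").all? { |codepoint| codepoint <= 0xFF }
end

puts latin1_safe?(german)    # true
puts latin1_safe?(japanese)  # false

# The Japanese code points themselves, printed in hex for inspection.
puts japanese.unpack("U*").map { |cp| cp.to_s(16) }.join(" ")
```

If that reading is right, the unpack/pack approach is only suitable for text within the Latin-1 range, and Japanese text would need a different analyzer.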