Problems with Unicode and acts_as_ferret

Thanks to Dave and everyone who has contributed to ferret and
acts_as_ferret! Man, can I not wait until I get this up and working on
my project. Just the fuzzy search alone will be worth it.

I’m having problems with rebuilding the index. My database includes
Unicode entries, and I have configured the rest of Rails to correctly
use it. On the initial index creation, I get this exception:

: Error occured at :678
Error: exception 2 not handled: Error decoding input string. Check that
you have the locale set correctly

I tried the methods that Albert Delamednolls wrote about on his blog for
Unicode and acts_as_ferret, but they failed for me. They may have been
based on older versions of acts_as_ferret; either way, I couldn’t make
them work.

I’ve got two levels of solution that I’m looking for. The best would be
to figure out the errors and be able to index even when the content
contains Unicode characters. As a workaround, if I could just trap the
exception during indexing so that it would skip the problematic entries
but index everything else, that would get me going. I tried to catch the
exception in this line from rebuild_index:

self.find(:all, :order => "id").each { |content| index << content.to_doc }

But even though I tried to rescue there, the exception is still thrown.
Does anyone have any insight on this?
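
For reference, here is a minimal sketch of the workaround described above: rescuing inside the block, once per record, so that one bad entry does not abort the whole rebuild. The helper name index_skipping_failures is hypothetical and is not part of acts_as_ferret. The sketch assumes only that the index responds to << and that each record responds to to_doc and id. It can only help if the failure surfaces as an ordinary Ruby exception; if the error is raised below the Ruby level, a rescue may never see it.

```ruby
# Hypothetical helper (not part of acts_as_ferret): add each record to the
# index and skip any record whose conversion or insertion raises.
# Assumes `index` responds to << and each record responds to to_doc and id.
def index_skipping_failures(records, index)
  skipped = []
  records.each do |record|
    begin
      index << record.to_doc
    rescue StandardError => e
      # Remember which record failed and why, then carry on with the rest.
      skipped << [record.id, e.message]
    end
  end
  skipped
end
```

Inside rebuild_index this would be called with the result of self.find(:all, :order => "id") and the index object. The returned list of ids and messages shows which entries still need attention once the encoding problem itself is sorted out.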

Have you looked at changeset 302 for Ferret?
http://ferret.davebalmain.com/trac/changeset/302

It seems to address your problem.
