Hi all,
This may not be a Ferret question, but I assume someone here might have
come across it already.
I wrote a simple CGI app that adds docs to a Ferret index. The idea
is to test Asian-language input and searching.
The script that does the input seems to be OK. As David mentioned in
reply to a question I asked a little while ago, Ferret's index is
agnostic, in the sense that you can store anything in it. I then wrote
another script to search the index it created. This is what it looks like:
####################################
#!/usr/bin/ruby
$KCODE = 'u'
require 'cgi'
require 'ferret'
include Ferret
index = Index::Index.new(:path => '/var/index', :default_field => "*")
cgi = CGI.new("html4")
result = ""
if cgi['query'] and not cgi['query'].empty?
  index.search_each(cgi['query']) do |doc, score|
    result << "#{index[doc]['tileid']} | #{index[doc]['title']} | #{index[doc]['description']} |\n"
  end
end
####################################
It works fine for searching English. But when I enter Chinese
characters in the "query" field, I get the following error in my
lighttpd log file:
####################################
/var/www/localhost/htdocs/cgi-bin/search_chinese.ruby:15:in
`search_each': : Error occured at <analysis.c>:701 (Exception)
Error: exception 2 not handled: Error decoding input string. Check
that you have the locale set correctly
from /var/www/localhost/htdocs/cgi-bin/search_chinese.ruby:15
####################################
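Since the message points at the locale, one thing that could be checked is which locale-related environment variables the CGI process actually sees. The sketch below is only a diagnostic and does not touch Ferret. The helper names are hypothetical, and it assumes the usual POSIX lookup order for the character type locale (LC_ALL, then LC_CTYPE, then LANG, falling back to "C" when none is set).

```ruby
# Hypothetical diagnostic helpers; nothing here is part of Ferret.
# Assumption: the usual POSIX lookup order LC_ALL > LC_CTYPE > LANG.
LOCALE_VARS = %w[LC_ALL LC_CTYPE LANG].freeze

# One-line summary of the locale variables found in +env+.
def locale_report(env = ENV)
  LOCALE_VARS.map { |name| "#{name}=#{env[name] || '(unset)'}" }.join(', ')
end

# First locale variable that is set and non-empty, or "C" if none is.
def effective_ctype(env = ENV)
  name = LOCALE_VARS.find { |n| env[n] && !env[n].empty? }
  name ? env[name] : 'C'
end

# True if the effective locale name mentions UTF-8 (e.g. "en_US.UTF-8").
def utf8_locale?(env = ENV)
  !(effective_ctype(env) =~ /utf-?8/i).nil?
end
```

Writing `locale_report` to the server's error log from inside the CGI script (for example with `$stderr.puts locale_report`) would show whether lighttpd passes any locale on to the script at all.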
Is the error message above suggesting that I should specify a Chinese
locale rather than UTF-8? I thought UTF-8 could handle Chinese
and anything else one could throw at it, as long as it's a human
language.
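To rule out the query string itself, a check like the one below could confirm that the bytes arriving from the form are well-formed UTF-8. This is a minimal sketch with a hypothetical helper name, and it assumes Ruby 2.0 or later (it relies on `String#force_encoding` and `String#b`, which the Ruby 1.8 setup above with `$KCODE` does not have).

```ruby
# Hypothetical helper: true if the raw bytes of +query+ form valid UTF-8.
# Works on a copy, so the caller's string is left untouched.
def valid_utf8?(query)
  query.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

chinese   = "\u4E2D\u6587"   # two Chinese characters, three bytes each in UTF-8
truncated = "\xE4\xB8".b     # first two bytes of a three-byte sequence only

puts valid_utf8?(chinese)
puts valid_utf8?(truncated)
```

If the query passes a check like this, the input is fine as UTF-8, and the problem is more likely in how the process locale is set than in the encoding of the characters.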
Any help is appreciated.