Chinese search support


#1

I need to decide whether our site will go with Java or Ruby on Rails. The
major factor is whether Ferret supports Lucene’s ChineseAnalyzer or
CJKAnalyzer.

Can anybody shed some light on Ferret’s Chinese search support?

I’d really appreciate it.


#2

Hi Jerry,
Basically you’ll have to write an analyzer that matches Chinese tokens
(words). If you can write a regular expression in Ruby that matches
Chinese tokens then it’s very simple to write an Analyzer for Ferret.
I haven’t looked at the CJKAnalyzer in Lucene but I can’t imagine it
would be too hard to port to Ruby.

Cheers,
Dave
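
To make the regular-expression idea concrete, here is a minimal sketch in plain Ruby. It assumes Ruby 1.9 or later with UTF-8 strings, and `han_runs` is a hypothetical helper name, not part of Ferret. It only finds unbroken runs of Han characters; since Chinese is written without spaces, a run is usually a whole clause rather than a single word, so further splitting would still be needed.

```ruby
# Hypothetical helper, not a Ferret API: collect unbroken runs of
# Han (Chinese) characters using the Unicode script property \p{Han}.
HAN_RUN = /\p{Han}+/

def han_runs(text)
  # String#scan returns every non-overlapping match as an array of strings.
  text.scan(HAN_RUN)
end

# Latin words and spaces are skipped; only the Han run is returned.
puts han_runs("Ferret 支持中文搜索 in Ruby").inspect
```

If Ferret's regular-expression based analyzer accepts a pattern like this (an assumption worth checking against the Ferret docs), the same regex could be handed to it directly.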


#3

There is nothing fancy about the CJKAnalyzer… it chunks characters
into pairs. So the phrase 你好吗 would be tokenized into two
tokens [你好] [好吗].

Erik
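
The pairing described above can be sketched in a few lines of plain Ruby. This is an illustration of overlapping character pairs only, not a port of Lucene's CJKAnalyzer: `cjk_bigrams` is a hypothetical name, it assumes UTF-8 input, and it looks only at Han characters (no kana, hangul, or Latin handling).

```ruby
# Hypothetical helper, not a Ferret or Lucene API: split each run of
# Han characters into overlapping two-character tokens.
def cjk_bigrams(text)
  text.scan(/\p{Han}+/).flat_map do |run|
    chars = run.chars
    # A lone character cannot form a pair, so keep it as its own token.
    next [run] if chars.length < 2
    # each_cons(2) slides a window of two characters along the run.
    chars.each_cons(2).map(&:join)
  end
end

puts cjk_bigrams("中文搜索").inspect
```

A run of n characters yields n - 1 tokens, so a three-character phrase gives two tokens and the four-character input here gives three. A query would need to be chunked the same way so that its pairs line up with the indexed ones.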