[ANN] Ferret 0.9.0-alpha (port of Apache Lucene to pure Ruby)

david · March 19, 2006, 1:12pm

Hi Folks,

I’ve just released version 0.9.0. This latest version of Ferret is an
alpha release. I have removed the old C extension, and Ferret now runs
on a fully ported C library. This has brought some huge performance
improvements in both memory and CPU usage.

There will probably be a few portability issues to start with. It has
been developed on Linux so it should work fine there. Windows and Mac
users beware.

Also, the current version doesn’t allow you to extend Ferret. For
example, you can’t write your own analyzer or filter. This will be
rectified in the near future.

http://ferret.davebalmain.com/trac/

Dave Balmain

== Description

Ferret is a full port of the Apache Lucene searching and indexing
library. It’s available as a gem, so try it out! To get started quickly,
read the quick start at the project homepage:

http://ferret.davebalmain.com/api
http://ferret.davebalmain.com/api/files/TUTORIAL.html

== Changes

* Currently this version isn’t very extensible. For example, you can’t
  write your own Analyzer, Filter or Query.
* Changed Token#term_text to Token#text.
* Changed Token#position_increment to Token#pos_inc.
* Changed the order of the arguments to Token.new. It is now
  Token.new(text, start_offset, end_offset, pos_inc=1, type="text").
  NOTE: type does nothing.
* Changed TermVectorOffsetInfo#start_offset to TermVectorOffsetInfo#start.
* Changed TermVectorOffsetInfo#end_offset to TermVectorOffsetInfo#end.
* Added the :id_field option to the Index::Index class.
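To make the new Token.new argument order and accessor names concrete, here is a minimal sketch. SketchToken is a hypothetical stand-in written only for illustration; it is NOT Ferret::Analysis::Token, and only the constructor order, the defaults and the #text / #pos_inc names come from the changelog above. The offset reader names are assumptions.

```ruby
# Hypothetical stand-in, NOT Ferret's Token class. It mirrors the 0.9.0
# constructor order from the changelog:
#   Token.new(text, start_offset, end_offset, pos_inc = 1, type = "text")
class SketchToken
  # start_offset/end_offset reader names are assumptions for this sketch.
  attr_reader :text, :start_offset, :end_offset, :pos_inc, :type

  def initialize(text, start_offset, end_offset, pos_inc = 1, type = "text")
    @text = text                 # was #term_text before 0.9.0
    @start_offset = start_offset
    @end_offset = end_offset
    @pos_inc = pos_inc           # was #position_increment before 0.9.0
    @type = type                 # accepted but ignored, per the changelog
  end
end

# Text now comes first, followed by the character offsets.
tok = SketchToken.new("ferret", 0, 6)
puts tok.text     # prints ferret
puts tok.pos_inc  # prints 1

# As in Lucene, a pos_inc of 0 stacks a token (e.g. a synonym) at the
# same position as the previous one.
syn = SketchToken.new("weasel", 0, 6, 0)
```

Code written against the pre-0.9.0 API mostly needs the argument order swapped and the two readers renamed.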

david · March 28, 2006, 1:34pm

hi david,
I installed 0.9.0 on a very busy web server (100k page visits/day) and
it’s working flawlessly (at least it seems so). But I have a major
problem: Ferret no longer indexes or searches Turkish Unicode characters.
I was using StandardAnalyzer in 0.3.2 and it was working fine, because
the \w+ regexp was somehow matching the Turkish charset (UTF-8). Under
normal conditions it shouldn’t, but I think I was just lucky.
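A minimal sketch of the problem described above, assuming a modern Ruby (2.0+, UTF-8 source by default). In today's Ruby regex engine, \w matches only [a-zA-Z0-9_], so a \w+ tokenizer splits Turkish words apart, while the POSIX bracket class [[:word:]] matches Unicode letters, marks and digits. The sample sentence is made up for illustration.

```ruby
# Compare an ASCII-only \w+ tokenizer with a Unicode-aware one.
text = "güzel şehir ılık çay"

# \w is ASCII-only in Ruby 1.9+, so ü, ş, ı and ç break words apart.
ascii_tokens = text.scan(/\w+/)
# ascii_tokens == ["g", "zel", "ehir", "l", "k", "ay"]

# [[:word:]] covers Unicode letters, so whole Turkish words survive.
unicode_tokens = text.scan(/[[:word:]]+/)
# unicode_tokens == ["güzel", "şehir", "ılık", "çay"]
```

\p{L}+ would work equally well if digits and underscores should not count as word characters.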

Is there a way I can make Ferret work with Unicode again, or should I
stick with 0.3.2?

Thanks in advance, and thanks for the great work.
onur

david · March 28, 2006, 3:04pm

Hi Onur,

I’m trying to solve this problem right now. You had better stick with
0.3.2 for the moment, but better analyzer support is on its way. I’m
still trying to decide whether to include Oniguruma (the future Ruby
regexp library) with Ferret or just use the regex library that currently
comes with Ruby. ferret-0.9.1 should have UTF-8 support.
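For readers on a current Ruby, the engine question has since been settled: Ruby 1.9 adopted Oniguruma, and Ruby 2.0+ uses its fork Onigmo. The sketch below shows what a Unicode-aware analyzer step might look like there. The tokenize helper is hypothetical and not part of Ferret; it returns [text, start_offset, end_offset] triples matching the leading arguments of the 0.9.0 Token.new signature, and it assumes Ruby 2.4+ for full Unicode case mapping in String#downcase.

```ruby
# Hypothetical helper, not Ferret's API: split on Unicode word runs and
# lowercase each token, keeping its character offsets.
# Assumes Ruby 2.4+, where String#downcase handles non-ASCII letters.
def tokenize(text)
  tokens = []
  text.scan(/[[:word:]]+/) do
    match = Regexp.last_match
    # MatchData#begin/#end report character (not byte) offsets.
    tokens << [match[0].downcase, match.begin(0), match.end(0)]
  end
  tokens
end

tokens = tokenize("Güzel Şehir")
# tokens == [["güzel", 0, 5], ["şehir", 6, 11]]
```

For Turkish specifically, the dotted and dotless I (İ/ı) fold correctly only with downcase(:turkic), because plain downcase follows the default Unicode mapping.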

Cheers,
Dave

david · March 29, 2006, 12:43pm

Thanks David, keep up the great work.