[ANN] Ferret 0.9.0-alpha (port of Apache Lucene to pure ruby)

Hi Folks,

I’ve just released version 0.9.0. This latest version of Ferret is an
alpha release. I have removed the old C extension and Ferret is now
running on a fully ported C library. This has allowed some huge
performance improvements in both memory and CPU usage.

There will probably be a few portability issues to start with. It has
been developed on Linux so it should work fine there. Windows and Mac
users beware.

Also, the current version doesn’t allow you to extend Ferret. For
example, you can’t write your own analyzer or filter. This will be
rectified in the near future.

http://ferret.davebalmain.com/trac/

Dave Balmain

== Description

Ferret is a full port of the Apache Lucene searching and indexing
library. It’s available as a gem, so try it out! To get started quickly,
read the quick start at the project homepage:

http://ferret.davebalmain.com/api
http://ferret.davebalmain.com/api/files/TUTORIAL.html

== Changes

  • currently this version isn’t very extendable. For example,
    you can’t write your own Analyzer, Filter or Query.
  • changed Token#term_text to Token#text
  • changed Token#position_increment to Token#pos_inc
  • changed order of args to Token.new. Now Token.new(text, start_offset,
    end_offset, pos_inc=1, type="text"). NOTE: type does nothing.
  • changed TermVectorOffsetInfo#start_offset to
    TermVectorOffsetInfo#start
  • changed TermVectorOffsetInfo#end_offset to TermVectorOffsetInfo#end
  • added :id_field option to Index::Index class.
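
As a rough illustration of the Token renames listed above, here is a
minimal sketch in plain Ruby. SketchToken is a hypothetical stand-in, not
Ferret's own class; it only mirrors the argument order and accessor names
from the list, and the two aliases are an assumed way to keep old call
sites working during a migration.

```ruby
# Stand-in for illustration only; this is NOT Ferret's Token class.
# It mirrors the argument order given in the list above:
# (text, start_offset, end_offset, pos_inc = 1, type = "text").
class SketchToken
  attr_reader :text, :start_offset, :end_offset, :pos_inc, :type

  def initialize(text, start_offset, end_offset, pos_inc = 1, type = "text")
    @text = text
    @start_offset = start_offset
    @end_offset = end_offset
    @pos_inc = pos_inc
    @type = type # kept for signature parity; the list notes it does nothing
  end

  # Hypothetical shims so code written against the old names keeps working.
  alias_method :term_text, :text
  alias_method :position_increment, :pos_inc
end

token = SketchToken.new("ferret", 0, 6)
puts token.text      # new accessor name
puts token.term_text # old name, routed through the shim
```

Code that built tokens with positional arguments is the part most likely
to break silently, since a reordered call still runs but stores the values
in the wrong slots.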

hi david,
I installed 0.9.0 on a very busy web server (100k page visits/day) and
it's working flawlessly (at least it seems so :) )... But I have a major
problem. Now Ferret neither indexes nor searches Unicode Turkish
characters. I was using StandardAnalyzer in 0.3.2 and it was working
fine, because the \w+ RegExp statement was somehow working with the
Turkish charset (UTF-8) (under normal conditions it shouldn't; I was
lucky, I think :) ).

Now is there a way that I can make Ferret work with Unicode again, or
should I stick to 0.3.2?

thanks in advance, thanks for great work.
onur

Hi Onur,

I’m trying to solve this problem right now. You had better stick with
0.3.2 for the moment but better analyzer support is on its way. I’m
still trying to decide whether to include Oniguruma (the future Ruby
regexp library) with Ferret or just use the current regex library
that comes with Ruby. ferret-0.9.1 should have UTF-8 support.

Cheers,
Dave
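
For readers following the \w+ discussion, the sketch below shows the
difference between an ASCII-only word pattern and a Unicode-aware one. It
is plain Ruby, not Ferret's analyzer code, and it assumes a current Ruby
(1.9 or later) with UTF-8 source encoding, where \w matches only ASCII
word characters and \p{L} matches any Unicode letter. The sample text is
made up for the example.

```ruby
# Sketch only: plain Ruby regexps, not Ferret's StandardAnalyzer.
# Assumes Ruby 1.9+ with the default UTF-8 source encoding.
text = "çay içtik"

# ASCII-only word class: the Turkish letter "ç" is not matched,
# so it splits the words around it.
ascii_tokens = text.scan(/\w+/)

# Unicode-aware class: \p{L} matches any letter, \p{N} any digit.
unicode_tokens = text.scan(/[\p{L}\p{N}]+/)

puts ascii_tokens.inspect
puts unicode_tokens.inspect
```

A tokenizer built on the first pattern would index fragments such as "ay"
instead of "çay", so searches for the full word would find nothing.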

thanks david, keep up the great work.