Ferret 0.10.6 released (and some benchmarks)

Hey folks,

** Description **

Firstly for those who don’t know, Ferret is a full-text search library
which makes adding search to your application a breeze. It’s much
faster than MySQL full-text search as well as most other search libraries
out there. It allows you to do Boolean (+ruby +rails -jewelry) and
phrase queries (“the quick brown fox”) as well as some more unusual
queries like fuzzy queries (misspelling~ matches mispeling or
misspellng), wildcard queries (Aus?ral*), range queries
(date:<=20050601) and a lot more. Ferret also now offers query result
highlighting and excerpting.
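
To give a feel for why misspelling~ can match mispeling and misspellng, here is a minimal sketch in plain Ruby. It assumes fuzzy matching is based on edit distance (the number of single-character insertions, deletions or substitutions between two terms), which is the usual technique for this kind of query; it is not Ferret's actual implementation. The helper names and the threshold of 2 edits are hypothetical, chosen only for illustration.

```ruby
# Classic dynamic-programming edit distance (Levenshtein), keeping two rows.
def levenshtein(a, b)
  prev = (0..b.length).to_a            # distances from "" to each prefix of b
  a.each_char.with_index(1) do |ca, i|
    curr = [i]                         # distance from a[0, i] to ""
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1,            # delete ca
               curr[j - 1] + 1,        # insert cb
               prev[j - 1] + cost].min # substitute, or keep if equal
    end
    prev = curr
  end
  prev.last
end

# Hypothetical helper: call a term a fuzzy match when it is within
# max_edits of the query term. The threshold is an illustrative choice.
def fuzzy_match?(query, term, max_edits = 2)
  levenshtein(query, term) <= max_edits
end

%w[mispeling misspellng jewelry].each do |term|
  puts "#{term}: distance #{levenshtein('misspelling', term)}, " \
       "match: #{fuzzy_match?('misspelling', term)}"
end
```

Both misspellings from the description are within two edits of the query term, while an unrelated word such as jewelry is not.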

** Announcement **

This is the first Ferret announcement I’ve put up for a while because
the most recent releases of Ferret have been alpha releases. I
completely rewrote Ferret from the ground up so that it no longer uses
Lucene’s file format, and I was able to gain some great performance
improvements in the process.

On the topic of performance, it has recently been brought to my
attention that some people are aware of Ferret but avoid it because
they think it is slow. Just to put that myth to rest, here are the
outputs for a simple benchmark, indexing the Reuters corpus available
at:

http://www.daviddlewis.com/resources/testcollections/reuters21578/

First Apache Lucene. (Yes Java users, as you can see, I did warm up
the JVM (with 6 repetitions of the test) and I used the options
-server -Xmx500M -XX:CompileThreshold=100 so this is a fair test).


1 Secs: 47.09 Docs: 19043
2 Secs: 46.46 Docs: 19043
3 Secs: 44.07 Docs: 19043
4 Secs: 45.92 Docs: 19043
5 Secs: 45.97 Docs: 19043
6 Secs: 47.06 Docs: 19043

Lucene 1.9-rc1-dev
JVM 1.5.0_06 (Sun Microsystems Inc.)
Linux 2.6.15-27-386 i386
Mean: 46.10 secs
Truncated mean (4 kept, 2 discarded): 46.35 secs

And now Ferret:


0 Secs: 8.03 Docs: 19043
1 Secs: 10.15 Docs: 19043
2 Secs: 9.78 Docs: 19043
3 Secs: 10.31 Docs: 19043
4 Secs: 9.78 Docs: 19043
5 Secs: 10.13 Docs: 19043

Mean: 9.70 secs
Truncated Mean (4 kept, 2 discarded): 9.96 secs

So as you can see, performance is no longer a problem. (Incidentally,
the pure C version can index the Reuters corpus in under 3 seconds, an
order of magnitude faster than Lucene.)
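
For anyone who wants to check the summary figures, the sketch below recomputes them from the timings listed above. It assumes that "4 kept, 2 discarded" means dropping the single fastest and single slowest run, which reproduces the published truncated means; the constant and method names are made up for this example.

```ruby
# Timings in seconds, copied from the benchmark output above.
LUCENE = [47.09, 46.46, 44.07, 45.92, 45.97, 47.06]
FERRET = [8.03, 10.15, 9.78, 10.31, 9.78, 10.13]

def mean(xs)
  xs.sum / xs.size
end

# Assumption: drop the lowest and the highest value, average the rest.
def truncated_mean(xs)
  mean(xs.sort[1..-2])
end

def summary(name, xs)
  format("%-6s mean %.2f  truncated mean %.2f",
         name, mean(xs), truncated_mean(xs))
end

puts summary("Lucene", LUCENE)
puts summary("Ferret", FERRET)
puts format("Speed-up on truncated means: %.1fx",
            truncated_mean(LUCENE) / truncated_mean(FERRET))
```

On these numbers the Ruby bindings come out between four and five times faster than Lucene.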

One new addition in the 0.10.* series of Ferret is a win32 gem, so all
those Windows users out there can now get the super speed searches
too.

There have also been a lot of other changes in the Ferret API. You may
want to check out the documentation for a refresher:

http://ferret.davebalmain.com/api
http://ferret.davebalmain.com/api/files/TUTORIAL.html

** Now Accepting Donations **

Ferret has been a labour of love but it has taken up a lot more of my
life than I ever expected. At in excess of 50,000 lines of code, I
believe it is one of the largest Ruby projects, especially with only a
single developer. (The previous version, before the rewrite, had more
than 70,000 lines, so added together that is a lot of work.) I would
love to keep pushing
Ferret forward at the rate it has been going but other things are
going to have to start taking priority (like putting food on the
table). If you find Ferret useful in your application and you aren’t
able to contribute with the development, please consider making a
donation at the Ferret website:

http://ferret.davebalmain.com/trac

So where do I see Ferret going in the future? I’d really like to build
an object-database based on Ferret, with ActiveRecord and Og bindings.
Why?:

* Fixes the current DRY problems with Ferret. ie, should you store
data in the Ferret index to take advantage of highlighting? Or build
your own highlighter so that the data isn’t stored in two places?
* Simplifies things. You’ll be able to forget about IndexReaders,
IndexWriters, file-locking, etcetera. Just create the database as you
usually would and you have Ferret full-text search built in.
* Range queries just work. No need to pad numbers or format dates
correctly.
* Sort just works. And it won’t take forever to build the
sort-index (currently a problem on very large indexes).
* Performance, performance, performance. As people are often
pointing out, the bottleneck in many applications falls in the data
access layer. Mapping relational database schemas to Ruby objects (or
any OO language for that matter) can be very expensive at run-time. A
good object database should easily outperform even SQLite. (and I’m
being very cautious here)
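
As background to the range query point in the list above: index terms are compared as strings, so unpadded numbers sort in dictionary order rather than numeric order. The sketch below shows the effect in plain Ruby; it does not use the Ferret API, and the helper name, sample values and padding width are hypothetical.

```ruby
# Hypothetical helper: select the terms that fall inside an inclusive
# range, comparing them as strings the way a term index would.
def terms_in_range(terms, lower, upper)
  terms.select { |t| t >= lower && t <= upper }
end

values = [9, 20, 100]

# Unpadded: "100" sorts between "10" and "50", so it is wrongly included.
unpadded = values.map(&:to_s)
p terms_in_range(unpadded, "10", "50")

# Zero-padded to a fixed width: string order now agrees with numeric order.
WIDTH = 5
padded = values.map { |n| n.to_s.rjust(WIDTH, "0") }
p terms_in_range(padded, "10".rjust(WIDTH, "0"), "50".rjust(WIDTH, "0"))
```

The same reasoning is why dates are written as YYYYMMDD in queries such as date:<=20050601: in that format, string order and chronological order coincide.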

Right now, I’d need to raise at least 5 figures before I’d consider
this undertaking, so please send some encouragement my way if you would
be interested in something like this. Otherwise I’d appreciate any kind
of contribution, financial or assisting with development. In the
meantime I will continue to improve test coverage and Ferret
documentation, fix bugs and help people on the Ferret mailing list.

Happy Ferreting.
Dave

On Sep 20, 2006, at 7:48 PM, David B. wrote:


Hey Dave-

Thank you for your continuing hard work on ferret. I am using it
heavily in quite a few production rails applications and a few pure
ruby projects too. This looks like a nice improvement over the last
version and the benchmarks look great.

Thanks
-Ezra

On 9/21/06, David B. [email protected] wrote:

queries like fuzzy queries (misspelling~ matches mispeling or
misspellng), wildcard queries (Aus?ral*), range queries
(date:<=20050601) and a lot more. Ferret also now offers query result
highlighting and excerpting.

Dave, congratulations on the great work you’ve done. I’ve been using
Ferret a lot and it has never disappointed me. The benchmarks are
fantastic, looking forward to what’s coming next.

Thanks,
Max

On Thursday 21 September 2006 03:48, David B. wrote:

able to contribute with the development, please consider making a
donation at the Ferret website:

http://ferret.davebalmain.com/trac

So where do I see Ferret going in the future? I’d really like to build
an object-database based on Ferret, with ActiveRecord and Og bindings.

Wow, I’m completely astounded by the work you’ve done with ferret. You’re a
one man coding machine. Especially considering the number of projects or
attempts to port lucene to C or other languages that have floundered. Might I
suggest you post this announcement/call for donations to the rails mailing
list? I think people might be very interested in your idea for an object
database built on Ferret with AR bindings. That would be an incredibly
exciting development, and hopefully some of the big Rails users will realise
that.

Regards,

Alex

On Thursday 21 September 2006 15:48, David B. wrote:

Thanks Alex. I did, in fact, announce this on the rails list as you
suggested. I agree that it would be very useful for a lot of Rails
developers, especially the way many are currently using relational
databases with AR (ie no foreign key constraints, all access to the
database through the model, one database per application). This is
definitely something I’m very keen to do and I will get around to it
eventually, with or without support. It’s more a matter of whether
I’ll be able to do it in the next 6 months or the next 5 years. :)

I hadn’t actually noticed your post was also sent to
[email protected] (should have checked). However, that address is
now defunct as the RoR mailing list has moved to Google Groups.
http://groups.google.com/group/rubyonrails-talk

Your message seems to have made it to
http://groups.google.com/group/railinglist - which is somehow subscribed on
the old list and seems to get a post every few days or so. I was trying to
work out why my client hadn’t picked up your message to the rails list…

Regards,

Alex

On 9/21/06, A. S. Bradbury [email protected] wrote:

one man coding machine. Especially considering the number of projects or
attempts to port lucene to C or other languages that have floundered. Might I
suggest you post this announcement/call for donations to the rails mailing
list? I think people might be very interested in your idea for an object
database built on Ferret with AR bindings. That would be an incredibly
exciting development, and hopefully some of the big Rails users will realise
that.

Regards,

Alex

Thanks Alex. I did, in fact, announce this on the rails list as you
suggested. I agree that it would be very useful for a lot of Rails
developers, especially the way many are currently using relational
databases with AR (ie no foreign key constraints, all access to the
database through the model, one database per application). This is
definitely something I’m very keen to do and I will get around to it
eventually, with or without support. It’s more a matter of whether
I’ll be able to do it in the next 6 months or the next 5 years. :)

Thanks again for your support.

Dave