How to proceed with incorporating Ferret?

Hi,

I’ve been listening in on this mailing list for quite a while now, but
I held off doing anything with Ferret until I was ready to incorporate
it. I’ve used Lucene for years, but not Ferret.

I downloaded and installed the ‘bleeding edge’ version (let’s call it
0.10.9.1). There appears to be a significant re-working of the API
happening. It all looks good. But there might be a couple of gaps
still there.

The first question: should I even consider using the 0.10.9.1 version
of Ferret? What I intend to use it for will not be a critical
component, at least for the time being. I’m also used to working with
shifting software. The advantage that I see is the new API.
Performance is a BIG issue with my project.

The second question: are there any opinions regarding ease-of-upgrade
from the current stable version to what is being worked on now? I
don’t have anything to upgrade at the moment, but if I go with the
stable version then I will have.

The third question: it looks to me that in the 0.10.9.1 version the
content of the fields is being stored in the index. For my
application this is worse than a waste of time. Am I missing something?

The fourth question: in a message from August 23 there was a hint of
a write-up discussing the new API. Did this ever get published?

I think there is some very nice work here. I’m looking forward to
using Ferret.

Cheers,
Bob


Bob H. – blogs at <http://www.recursive.ca/hutch/>
Recursive Design Inc. – http://www.recursive.ca/
Raconteur – http://www.raconteur.info/
xampl for Ruby – http://rubyforge.org/projects/xampl/

On 10/7/06, Bob H. wrote:

Hi,

I’ve listened in to this mail list for quite a while now but not
doing anything with Ferret until I was ready to incorporate it. I’ve
used Lucene for years, but not Ferret.

I downloaded and installed the ‘bleeding edge’ version (let’s call it
0.10.9.1). There appears to be a significant re-working of the API
happening. It all looks good. But there might be a couple of gaps
still there.

I’m all ears. What do you think needs improvement?

The first question: should I even consider using the 0.10.9.1 version
of Ferret? What I intend to use it for will not be a critical
component, at least for the time being. I’m also used to working with
shifting software. The advantage that I see is the new API.
Performance is a BIG issue with my project.

I’ve just released 0.10.10. Version 0.10.9 is probably the most stable
version to date. 0.10.10 has some significant changes to improve
performance of sorting and filtering of large unoptimized indexes
(putting Ferret up to orders of magnitude ahead of Lucene for
these tasks). In a few days we should know if I broke anything. There
are currently only 3 outstanding tickets on Trac, and they are only on
Windows and OS X, so if you are on Linux you should be fine.

The second question: are there any opinions regarding ease-of-upgrade
from the current stable version to what is being worked on now? I
don’t have anything to upgrade at the moment, but if I go with the
stable version then I will have.

Well, 0.10.9 is the most stable version since the pure Ruby version, so
that would be the version I would go with. Also, I can usually fix most
problems within a day or two if I can reproduce the problem or you are
willing to give me ssh access to your server.

The third question: it looks to me that in the 0.10.9.1 version the
content of the fields is being stored in the index. For my
application this is worse than a waste of time. Am I missing something?

It depends how you set your index up. You specify which fields you
want stored/indexed or term-vectorized (I know, it’s not a word).

require 'ferret'
include Ferret::Index

# set to not store fields by default
field_infos = FieldInfos.new(:store => :no)
# must store id field however
field_infos.add_field(:id, :store => :yes, :index => :untokenized)
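
To make the stored versus indexed distinction concrete, here is a toy pure-Ruby model. It is not Ferret code and says nothing about how Ferret is implemented: `TinyIndex` and its methods are hypothetical names invented for this sketch. The idea it illustrates is that every field's terms go into a searchable term map, while the raw value is kept only for fields marked as stored (here just `:id`).

```ruby
# Toy model of "indexed but not stored". This is NOT Ferret code;
# TinyIndex and its methods are hypothetical, made up for illustration.
class TinyIndex
  def initialize(stored_fields)
    @stored_fields = stored_fields              # fields whose raw value is kept
    @postings = Hash.new { |h, k| h[k] = [] }   # [field, term] => doc numbers
    @stored = []                                # doc number => stored values
  end

  # Index the terms of every field, but keep raw values only for stored fields.
  def add(doc)
    doc_num = @stored.size
    doc.each do |field, value|
      value.to_s.downcase.scan(/\w+/).uniq.each do |term|
        @postings[[field, term]] << doc_num
      end
    end
    @stored << doc.select { |field, _| @stored_fields.include?(field) }
    doc_num
  end

  # Return the stored fields of every document containing the term.
  def search(field, term)
    @postings.fetch([field, term.downcase], []).map { |n| @stored[n] }
  end
end

index = TinyIndex.new([:id])
index.add(:id => "doc-1", :content => "Ferret is a search library")
index.add(:id => "doc-2", :content => "Lucene is written in Java")

# The content field is searchable, but a hit carries only the stored id.
hits = index.search(:content, "ferret")
puts hits.inspect
```

With this arrangement the application looks up the full document elsewhere using the returned id, which is the usual reason for storing only an id field.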

The fourth question: in a message from August 23 there was a hint of
a write-up discussing the new API. Did this ever get published?

No. But I did update the documentation here:

http://ferret.davebalmain.com/api/files/TUTORIAL.html

You may find the Ferret FAQ even better.

http://ferret.davebalmain.com/trac/wiki/FAQ

And there may be an O’Reilly “shortcut” coming out soon.

I think there is some very nice work here. I’m looking forward to
using Ferret.

Great. Thanks,
Dave

On 8-Oct-06, at 12:24 AM, David B. wrote:

But there might be a couple of gaps still there.

I’m all ears. What do you think needs improvement?

It may simply be a misunderstanding on my part; read on. I also can’t
figure out how to redefine the field used as an id (again, read on:
the documented way isn’t working for me, probably because of what
comes up below).

(putting Ferret up to orders of magnitude ahead of Lucene for
these tasks). In a few days we should know if I broke anything. There
are currently only 3 outstanding tickets on Trac and they are only on
Windows and OS X so if you are on Linux you should be fine.

Of course I’m running OS X… this couldn’t be easy :) I’m also
seeing issues 127 and 136 (like everyone else on OS X will be).
Another thing for OS X: until Apple fixes their gcc4 compiler, either
use the gcc3 compiler or compile with -O1 rather than -O2. I changed the
ext_conf file to do this, but the two OS X issues remain. If you don’t
do this you will eventually get a corrupted heap (it usually takes a
while). I’ve had to recompile Ruby at this optimisation level for it
to work reliably.
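
As a rough illustration of the -O2 to -O1 change described above, the helper below rewrites a compiler flag string. It is a sketch only: `lower_optimisation` is a hypothetical name, and the commented usage assumes an mkmf-style build script where the flags live in the `$CFLAGS` global. Ferret's actual build file may be organised differently.

```ruby
# Hypothetical helper: replace the -O2 optimisation flag with -O1 and
# leave every other compiler flag untouched.
def lower_optimisation(cflags)
  cflags.split.map { |flag| flag == "-O2" ? "-O1" : flag }.join(" ")
end

puts lower_optimisation("-g -O2 -Wall")

# Assumed usage inside an mkmf-based build script, before the makefile
# is generated:
#   $CFLAGS = lower_optimisation($CFLAGS)
```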

The second question: are there any opinions regarding ease-of-upgrade
from the current stable version to what is being worked on now? I
don’t have anything to upgrade at the moment, but if I go with the
stable version then I will have.

Well, 0.10.9 is the most stable version since the pure ruby version so
that would be the version I go with. Also, I can usually fix most
problems within a day or two if I can reproduce the problem or you are
willing to give me ssh access to your server.

Okay, I’m convinced. The most recent is the way to go.

# set to not store fields by default
field_infos = FieldInfos.new(:store => :no)
# must store id field however
field_infos.add_field(:id, :store => :yes, :index => :untokenized)

So, I tried requiring ferret. It simply won’t admit to knowing
anything about the FieldInfos class. How bad are those two remaining
OS X bugs?

So, I tried requiring rferret. That worked better.

I tried your example (actually I tried this before posting and this
is why I said I thought I saw a few gaps). It doesn’t work for me.
The initialize method for FieldInfos is defined as:

   def initialize(dir = nil, name = nil)
     @fi_array = []
     @fi_hash = {}
     if dir and dir.exists?(name)

The options hash in your example is assigned to dir, and since a Hash
has no exists? method, a NoMethodError is raised.
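
The failure can be reproduced without Ferret installed. In the sketch below, `OldFieldInfos` is a hypothetical stand-in: only the constructor signature and its first lines are taken from the excerpt above, and the body of the `if` is left empty because the rest of the original method is not shown.

```ruby
# Stand-in for the FieldInfos class quoted above. Only the signature and
# the first three lines of the body come from the thread.
class OldFieldInfos
  def initialize(dir = nil, name = nil)
    @fi_array = []
    @fi_hash = {}
    if dir and dir.exists?(name)
      # the original class presumably reads field data from dir here
    end
  end
end

begin
  # The options hash lands in the first positional parameter, dir.
  OldFieldInfos.new(:store => :no)
  outcome = :no_error
rescue NoMethodError => e
  outcome = e.name   # name of the method that could not be found
end

puts outcome
```

Because the constructor takes two optional positional arguments, Ruby binds the hash to `dir`, so the call fails on the `exists?` check before any option is looked at.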

I’ve happily forgotten most of my C, but it looks as though the
C version is doing something similar (not that it matters in my case,
because FieldInfos is invisible).

The fourth question: in a message from August 23 there was a hint of
a write-up discussing the new API. Did this ever get published?

No. But I did update the documentation here:

http://ferret.davebalmain.com/api/files/TUTORIAL.html

I thought that was the old way since I couldn’t get it to work (see
above).

You may find the Ferret FAQ even better.

http://ferret.davebalmain.com/trac/wiki/FAQ

I don’t know how I missed that. Thanks.

And there may be an O’Reilly “shortcut” coming out soon.

That’s great!

Cheers,
Bob

I think there is some very nice work here. I’m looking forward to
using Ferret.

Great. Thanks,
Dave


Ferret-talk mailing list
http://rubyforge.org/mailman/listinfo/ferret-talk



It looks as though I somehow got the wrong version out of subversion.
Hold on while I do this again. Sorry about that.

Bob



On 10/9/06, Bob H. wrote:

(I got the version I was working from with this command:

svn checkout svn://davebalmain.com/ferret/trunk ferret

and I don’t remember where I got that from)

Well, I’m comfortably set.

Cheers,
Bob

Sorry, that was my fault. The current version of Ferret is in a
different repository:

svn co svn://www.davebalmain.com/exp ferret

The reason for this is that the current version started out as an
experimental version where I was trying a few things out, and it ended up
being a complete rewrite with a different file format and all. I still
have to roll it into the original ferret repository.

Dave

On 8-Oct-06, at 11:32 AM, Bob H. wrote:

It looks as though I somehow got the wrong version out of
subversion. Hold on while I do this again. Sorry about that.

That is what happened, sorry for the noise. The 0.10.10 version is
running at least 225 times faster. And the tutorial works. Sigh.

(I got the version I was working from with this command:

svn checkout svn://davebalmain.com/ferret/trunk ferret

and I don’t remember where I got that from)

Well, I’m comfortably set.

Cheers,
Bob

