Fuzzy searching using Ferret and KirbyBase?

We need fuzzy search and full-text search. We’re using MySQL as a
backend right now, and another developer tried to switch the application
to PostgreSQL, but it added more problems than it solved. So … I’d like
to find a way to stick with MySQL and have fuzzy searching.

Since PostgreSQL looks like it might not work, the first thing I thought
of was Ferret. The searches that Ferret is capable of are better than
the full-text and fuzzy searching that PostgreSQL seems to be able to
do, and if we could get it to work w/ our application, we’d be able to
stick with MySQL, which would also make life easier for the admins who
don’t know anything about PostgreSQL.

Someone proposed a solution for using Ferret with Rails on the Rails
list (I’m posting here instead because KirbyBase hasn’t been discussed
on the other list) that involved using a singleton, which would be
responsible for writing to the index so as to avoid any conflicts.
Unfortunately, since Rails can have multiple instances running, you not
only need to make sure there are no other instances of an index writer
within your process, but also within other processes. You could of
course prevent this by writing a lock file; however, then you have the
issue of communicating between different Rails instances.
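
For reference, the lock-file idea could look roughly like the sketch below. It uses only Ruby's standard library (`File#flock`) and does not touch Ferret at all: `with_index_lock`, `index.lock`, and `index.log` are made-up names, the log file is just a stand-in for the real index, and the forked children stand in for separate Rails instances. It assumes a platform where `fork` is available.

```ruby
require "tmpdir"

# Hypothetical helper: run the block while holding an exclusive advisory
# lock on lock_path. Other processes calling this will wait their turn.
def with_index_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0644) do |lock|
    lock.flock(File::LOCK_EX) # blocks until no other process holds the lock
    begin
      yield
    ensure
      lock.flock(File::LOCK_UN)
    end
  end
end

dir       = Dir.mktmpdir
lock_path = File.join(dir, "index.lock")
log_path  = File.join(dir, "index.log") # stand-in for the real index

# Simulate three Rails instances writing at the same time.
pids = 3.times.map do |n|
  fork do
    with_index_lock(lock_path) do
      File.open(log_path, "a") do |f|
        f.puts("writer #{n} start")
        f.flush
        sleep 0.05 # pretend the index update takes a while
        f.puts("writer #{n} end")
      end
    end
  end
end
pids.each { |pid| Process.wait(pid) }

lines = File.readlines(log_path, chomp: true)
```

Each writer's "start" and "end" lines should come out adjacent in the log, because only one process holds the lock at a time. The lock only serializes the writers; it does nothing to tell the other instances that the index has changed.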

Instead of trying to solve this problem, I had the idea that maybe
Ferret could be set up as a server, just like MySQL. There would only be
one instance of this daemon, and it could be responsible for dealing
with write conflicts (which would be much easier within a single
process). So … I’m all for using work that other people have done
first, so I started thinking about projects I could borrow server code
from, and thought of KirbyBase. What if KirbyBase were extended, I
thought, so that it could index fields using Ferret and return search
results? After some searching around, I realized that Zed S. had a
similar idea – his Ruby/Odeum library has an example that does exactly
that, but with Ruby/Odeum instead of Ferret. My immediate thought was
to forget about writing something new and just use what Zed had
written. Unfortunately, it seems Odeum doesn’t support fuzzy search.
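
To make the single-daemon idea above a bit more concrete, here is a minimal sketch using dRuby (`drb`), which ships with Ruby. Nothing here comes from Ferret, KirbyBase, or Ruby/Odeum: `IndexServer` is a hypothetical class, its array is only a stand-in for the real index, and its `search` is a plain substring match, not a fuzzy search. The point is just that one process owns the writer and every Rails instance goes through it.

```ruby
require "drb/drb"

# Hypothetical stand-in for the one process that would own the index writer.
class IndexServer
  def initialize
    @docs  = []        # stand-in for the real index
    @mutex = Mutex.new # serializes access inside the single server process
  end

  # Add a document; all writes funnel through this one object.
  def add(doc)
    @mutex.synchronize { @docs << doc }
    true
  end

  # Substring match only; a real server would delegate to the search library.
  def search(term)
    @mutex.synchronize { @docs.select { |d| d.include?(term) } }
  end
end

# Port 0 asks the OS for any free port; a real daemon would use a fixed one.
DRb.start_service("druby://127.0.0.1:0", IndexServer.new)
uri = DRb.uri

# In each Rails instance: talk to the daemon instead of opening the index.
client = DRbObject.new_with_uri(uri)
client.add("fuzzy searching with Ferret")
client.add("full-text search in MySQL")
hits = client.search("Ferret")

DRb.stop_service
```

For brevity the server and the client run in the same process here; in the setup described above the server would be its own long-running daemon and the clients would connect to its known URI.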

So … before I embark on adapting Zed’s work to work with Ferret, I
thought I should ask … has anyone solved this problem already? Or is
there someone working on a Ferret server? Or has someone learned enough
from other projects to know that this isn’t going to work (I just
noticed that KirbyBase only recently got a feature that would allow
it to write memo fields – I wonder if there’s anything else that might
be missing)?



Hi Jen,

I don’t really think a server is necessary. I guess it really depends
on what load you are expecting, but Ferret already has file locking. If
you just flush the index every time you do an update, then you shouldn’t
have a problem. Granted, this isn’t the most efficient way to work with
Ferret, but it is certainly the easiest, and you’d need to be doing a
lot of updates before the server model would be worthwhile.

I hope this helps. Please let me know if you are unclear on anything.
I’m thinking of adding an auto_flush option to the index so that you
wouldn’t need to worry about this. So you’d create the index like this:

index = Ferret::Index::Index.new(:auto_flush => true, ... other options ...)

and you could happily have multiple processes modifying the index
without ever having to worry.