We all know that using ferret/aaf without the drb server is not
thread-safe, but why not? Would it be so hard to trade some
performance for a simple locking system?
Very often I run into a situation where I want to quickly stage a
project, and I want to use a few mongrels but don’t want to configure
every last piece of the system, including the drb server. It would be
nice if I didn’t have to worry about index corruption.
Just a thought.
Anyone, please correct me if I’m wrong.
I think it’s because the thread safety is implemented at the class
level, which means it’s ok to share a single instance of IndexWriter
across many threads. But when you have a multi-process model going
(Rails), you effectively have many different programs accessing the
same index files, thus running into file locking issues.
Implementing a shared locking mechanism that’s fast enough not to get
in the way of performance (in a really bad way) is a subject for a lot
of research. The first idea that springs to my mind is a lock-file
based one. It solves the problem, but you can kiss goodbye to fast
indexing (e.g.: if File.exist?('foo'), the index is locked, so wait a
bit and try again).
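To make the lock-file idea concrete, here is a minimal sketch. The helper name with_index_lock, the lock path and the timings are all made up for illustration; none of this is part of ferret or aaf. It creates the lock file with File::CREAT | File::EXCL instead of checking File.exist? first, because check-then-create leaves a window where two processes can both decide the index is free.

```ruby
require 'tmpdir'

# Hypothetical helper (not part of ferret/aaf): run a block while holding
# an exclusive lock file, so only one process touches the index at a time.
def with_index_lock(lock_path, retry_delay: 0.05, timeout: 5.0)
  deadline = Time.now + timeout
  begin
    # CREAT | EXCL fails with Errno::EEXIST if the file already exists,
    # which makes "check and create" a single atomic step.
    lock = File.open(lock_path, File::WRONLY | File::CREAT | File::EXCL)
  rescue Errno::EEXIST
    raise "timed out waiting for #{lock_path}" if Time.now > deadline
    sleep retry_delay # index is locked, wait a bit and try again
    retry
  end
  begin
    lock.close
    yield
  ensure
    File.delete(lock_path) # release the lock even if the block raised
  end
end

# Usage: the lock path below is an assumption for the demo.
lock_path = File.join(Dir.mktmpdir, 'index.lock')
held = with_index_lock(lock_path) do
  # index writes would go here
  File.exist?(lock_path)
end
```

One caveat with this sketch: if a process dies while holding the lock, the file stays behind and everyone else times out. File#flock avoids that, since the OS drops the lock when the process exits, but either way every write pays for the wait.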
One other solution is to have a common daemon that all processes share
and handle file locking in memory, and that’s precisely what the DRb
server is. Perhaps a quick fix would be to run the DRb server in a
thread alongside Rails. This would make it transparent.
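Roughly what I have in mind, as a sketch only. IndexFrontend is a hypothetical stand-in for whatever object would own the real index, and the URI is an assumption; this is not how the aaf DRb server is actually wired up. DRb.start_service already runs its server in a background thread, so there is no separate daemon to configure.

```ruby
require 'drb/drb'

# Hypothetical stand-in for the single object that owns the index.
# A Mutex is enough here because all writes funnel through one process.
class IndexFrontend
  def initialize
    @mutex = Mutex.new
    @docs = []
  end

  # Add a document and return the number of documents stored so far.
  def add(doc)
    @mutex.synchronize do
      @docs << doc
      @docs.size
    end
  end
end

# Port 0 lets the OS pick a free port for this demo; a real setup would
# need a fixed port that every mongrel knows about.
server = DRb.start_service('druby://127.0.0.1:0', IndexFrontend.new)

# Any process that knows the URI can talk to the shared object.
client = DRbObject.new_with_uri(server.uri)
first_count = client.add('first doc')
```

With several mongrels, only the first one to start would manage to bind the port. The others would presumably have to rescue Errno::EADDRINUSE and act purely as clients, and something has to take over if the owning mongrel goes down. That part is not handled here.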
I think I’ll risk it and reply later with a hack.