On Mar 23, 2007, at 10:46 AM, Matt S. wrote:
I personally would love some support for multi-threaded write
locking, built-in. It’s pretty easy these days to set up a
multithreaded Rails/Ferret server using Mongrel and Lighttpd.
I’m not sure if Dave’s solved a problem that neither Lucene nor
KinoSearch has solved, but I’d say it’s difficult to outright
impossible to allow more than one write process access to the index
at any given moment under the segmented, write-once model used by all
of us.
What is possible is to manage access to an index on a shared volume
so that an active write process causes all other attempts to open a
write process to fail, including those from other machines. The key
is to put the write.lock file in the index directory, rather than in
the temp directory – since the temp directory is per-machine, no
other machine knows about another machine’s lock files and write
processes may stomp each other.
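
To make the idea concrete, here is a minimal sketch in Python of a write lock that lives inside the index directory. It is not how Lucene, KinoSearch, or Ferret implement locking; the names acquire_write_lock, release_write_lock, and LockError are made up for illustration, and it assumes the filesystem honors exclusive file creation (which older NFS versions may not do reliably).

```python
import os


class LockError(Exception):
    """Raised when another writer already holds the lock (hypothetical name)."""


def acquire_write_lock(index_dir, lock_name="write.lock"):
    """Create the lock file inside the index directory itself.

    O_CREAT | O_EXCL makes creation fail if the file already exists, so a
    second writer gets an error instead of silently sharing the index.
    Because the file sits next to the index, every machine that mounts the
    volume sees the same lock.
    """
    lock_path = os.path.join(index_dir, lock_name)
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        raise LockError(f"index is locked: {lock_path}") from None
    os.close(fd)
    return lock_path


def release_write_lock(lock_path):
    """Remove the lock file once the writer has finished."""
    os.remove(lock_path)
```

A writer would call acquire_write_lock before opening the index for writing and release_write_lock when it closes; any other writer that tries in between gets a LockError rather than a corrupted index.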
I believe the default location of the lock directory was changed in
Lucene in 2.1 (if not, the change is in svn trunk). It changed in
KinoSearch as of 0.20_01, though with a twist that makes things more
convenient for everyone else at a minor cost to NFS users:
Concurrency
Only one InvIndexer may write to an invindex
at a time. If a write lock cannot be secured,
new() will throw an exception.
If an index is located on a shared volume,
each writer application must identify itself by
passing a LockFactory to InvIndexer's constructor,
or index corruption will occur.
Imposing that condition means that stale lock files associated with
dead pids can be zapped automatically by default.
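
A rough sketch of why the identification matters, in Python rather than Perl, and not KinoSearch's actual code: if the lock file records which host and pid created it, a writer can safely clear a lock only when the lock came from its own host and that pid is gone. The names host_id, write_lock_info, and clear_if_stale are hypothetical, and the pid check assumes a POSIX system.

```python
import os
import socket


def pid_is_alive(pid):
    """Return True if a process with this pid exists on the local machine."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without signalling
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def write_lock_info(lock_path, host_id=None):
    """Create the lock file exclusively and record who owns it."""
    host_id = host_id or socket.gethostname()
    with open(lock_path, "x") as f:  # "x" fails if the file already exists
        f.write(f"{host_id} {os.getpid()}\n")


def clear_if_stale(lock_path, host_id=None):
    """Remove the lock only if it is ours and its process is dead.

    A pid from another machine means nothing locally, so locks written
    under a different host_id are always left alone.
    """
    host_id = host_id or socket.gethostname()
    try:
        with open(lock_path) as f:
            owner, pid_text = f.read().split()
        pid = int(pid_text)
    except FileNotFoundError:
        return False
    except ValueError:
        return False  # unrecognized contents: do not touch the lock
    if owner != host_id or pid_is_alive(pid):
        return False
    os.remove(lock_path)
    return True
```

Without a distinct identifier per machine, two hosts sharing a volume could each decide the other's live lock belongs to a dead local pid and delete it, which is the corruption scenario the docs warn about. The sketch also leaves a small race between the check and the removal, which a real implementation would need to close.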
In earlier versions of Lucene, it’s possible to specify a global lock
dir location, putting it on the shared volume for example and
allowing multiple machines to become aware of each other’s lock
files. It wouldn’t surprise me if Dave had duplicated that in Ferret.
It’d also be nice if the docs gave special warning for this case.
It came pretty unexpectedly.
NFS is a bleedin’ PITA to support because it doesn’t do “delete-on-last-close”
and flock/fcntl locking is unreliable on so many operating
systems. What I’d really like to do is detect NFS somehow and throw
errors at construction time, but since that’s not realistic, there
are moderately prominent warnings now in the KS docs.
It’s not an ideal set-up because inevitably some fraction of users
will get burned when they move their indexes to NFS without taking
stock of the warnings, but without getting into the gory details,
I’ll just say that’s hard to avoid.
Marvin H.
Rectangular Research
http://www.rectangular.com/