Help with Multiple Readers, 1 Writer scenario

Neville_B · August 28, 2006, 9:28am

Hi,

I’m building a web server application using Ferret [thanks so much
Dave], Mongrel and Camping which works fine servicing one request at a
time, but serialises searches if more than one request arrives, so I’d
like some advice please about the best way to use multiple readers and
one writer.

Some background: query requests, which in my case are always read-only,
arrive via Mongrel, which allocates a thread for each request.
Should I create a new IndexReader for each request also, or can I use
one IndexReader concurrently?

Index updates on the other hand are coordinated by a special Update
Thread which runs every 10 minutes or so. I’m guessing that the best
approach is to create an IndexWriter for each update run, which can be
closed and discarded at the end of the update run. Or can I close and
reuse a single IndexWriter?

I searched http://ferret.davebalmain.com/api for details on the
MultiReader, but I couldn’t find any details. If someone could post a
link to point me in the right direction that would be great.

Thanks so much

Neville

Neville_B · September 1, 2006, 12:19pm

On 8/28/06, Neville B. wrote:

> Should I create a new IndexReader for each request also, or can I use
> one IndexReader concurrently?

Creating a new reader per request is not a good idea since creating a
new IndexReader is an expensive operation (although it has been
significantly improved in version 0.10). A lot of data needs to be
read into memory for fast access. In most situations the ideal
solution is to have a single IndexReader per thread. You can have as
many IndexReaders open on an index as your operating system will
allow.
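
As a rough illustration of the reader-per-thread idea, here is a minimal sketch that caches one reader in each thread's local storage. The class name `PerThreadReader` and the opener block are hypothetical, not part of Ferret; the block is assumed to wrap whatever call opens a reader on your index.

```ruby
# Hypothetical helper (not part of Ferret): caches one reader per thread
# so that each thread pays the cost of opening a reader only once.
class PerThreadReader
  # The block is whatever opens a reader, e.g. (assumed usage):
  #   PerThreadReader.new { Ferret::Index::IndexReader.new("/path/to/index") }
  def initialize(&opener)
    @opener = opener
    @key = :"per_thread_reader_#{object_id}"
  end

  # Returns the calling thread's reader, opening it on first use.
  def reader
    thread = Thread.current
    cached = thread.thread_variable_get(@key)
    unless cached
      cached = @opener.call
      thread.thread_variable_set(@key, cached)
    end
    cached
  end
end
```

This only pays off when threads are long-lived, such as a fixed pool of workers. If the server starts a fresh thread for every request, the cache is rebuilt each time and the sketch degenerates into the reader-per-request case described above.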

The one situation where you might be better off using a single
IndexReader is when you are relying on caching. Filters and Sorts are
cached per IndexReader and Sorts in particular can take up a fair
chunk of memory so if you have a large index (large as in number of
documents, not size of data) then you may be better off with a single
IndexReader. IndexReader is thread-safe so using it concurrently
should be fine.
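
For the shared case, a minimal sketch might look like the following. `SharedReader` and the opener block are hypothetical names; the only assumption about the reader itself is the one stated above, that it can be used from several threads at once.

```ruby
# Hypothetical holder (not part of Ferret) for one reader shared by all
# request threads, so Filter and Sort caches are built only once.
class SharedReader
  def initialize(&opener)
    @opener = opener
    @lock = Mutex.new
    @reader = nil
  end

  # Opens the reader on first use. The Mutex ensures that two requests
  # arriving at the same moment do not each open their own reader.
  def reader
    @lock.synchronize { @reader ||= @opener.call }
  end
end
```

The lock guards only the lazy open; searches on the returned reader run without it, so requests are not serialised by this wrapper.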

> Index updates on the other hand are coordinated by a special Update
> Thread which runs every 10 minutes or so. I’m guessing that the best
> approach is to create an IndexWriter for each update run, which can be
> closed and discarded at the end of the update run. Or can I close and
> reuse a single IndexWriter?

You can’t reuse an IndexWriter after it has been closed. But you can
commit the changes to disk:

writer.commit()

IndexWriter#optimize will also commit all changes to disk as an
optimal index but depending on the size of your index you may only
want to call optimize once a day if at all. For a small index however,
calling it every ten minutes is definitely possible.
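
To make the commit-versus-optimize trade-off concrete, here is a minimal sketch of an update run that keeps one writer open, commits after each run, and optimizes only once the chosen interval has passed. `UpdateRun`, `optimize_every` and `clock` are hypothetical names; the writer is only assumed to respond to `commit` and `optimize`.

```ruby
# Hypothetical coordinator (not part of Ferret) for the periodic update
# run. `writer` is assumed to respond to #commit and #optimize.
class UpdateRun
  DAY = 24 * 60 * 60 # seconds

  def initialize(writer, optimize_every: DAY, clock: -> { Time.now })
    @writer = writer
    @optimize_every = optimize_every
    @clock = clock
    @last_optimize = @clock.call
  end

  # Yields the writer so the caller can apply its changes, then flushes
  # them. Returns :optimize or :commit to report which call was made.
  def run
    yield @writer
    now = @clock.call
    if now - @last_optimize >= @optimize_every
      @writer.optimize # also commits, so no separate commit is needed
      @last_optimize = now
      :optimize
    else
      @writer.commit
      :commit
    end
  end
end
```

The update thread would create one `UpdateRun` around its writer at startup and call `run` with a block that adds or deletes documents every ten minutes or so. For a small index, `optimize_every` could be lowered so that every run optimizes.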

> I searched http://ferret.davebalmain.com/api for details on the
> MultiReader, but I couldn’t find any details. If someone could post a
> link to point me in the right direction that would be great.

You can actually pass an array of readers as the first (only)
parameter to IndexReader.new.

reader = IndexReader.new([reader1, reader2, reader3])

In the current working version of Ferret you can also pass Directory
objects or paths:

reader = IndexReader.new([dir, dir2, dir3])

reader = IndexReader.new(["/path/to/index1", "/path/to/index2"])

You will have to wait for 0.10.2 for this functionality (and for an
update to include this info in the API docs).

Cheers,
Dave