Index::Index.new vs. Readers and Writers


#1

Hey gang,

A post on the Rails forum a while back made it sound like you pretty much
had to use the IndexReader and IndexWriter classes if you might be
accessing an index from more than one process (e.g. multiple
dispatch.fcgi processes).

Is this still the case, or does the main Index class do that black magic
behind the scenes? =)

I was having trouble implementing the Readers & Writers so I thought I’d
post an example stub of what I have here. Any feedback would be much
appreciated.

# Non-Reader/Writer Example - Main Index::Index.new only
# Works like a charm, but I haven't tried firing up a bunch of processes
# to see if we get IO blocks.

require 'ferret'

class SearchEngine
  include Ferret
  include Ferret::Document

  def self.get_index
    index_dir = "/var/search/index"

    Index::Index.new(:path => index_dir,
                     :create_if_missing => true)
  end
end

# Reader/Writer Example

require 'ferret'

class SearchEngine
  include Ferret
  include Ferret::Document

  # Creates or returns an existing index for an organization
  def self.get_index(type = 'writer')
    index_dir = "/var/search/index"
    if type == 'writer'
      Index::IndexWriter.new(index_dir, :create_if_missing => true)
    elsif type == 'reader'
      Index::IndexReader.open(index_dir, false)
    end
  end
end

Thanks!!

- Shanti

#2

Hi Shanti,

When you have multiple processes accessing the index, it’s not a matter
of which class you use but how many processes you have writing to the
index. The recommended way to do things is to have only one process
writing to the index. You can have as many index readers open as you
like. The trouble is that the IndexWriter opens a commit lock on the
index. If another IndexWriter comes along and tries to open the lock
at the same time it will raise an exception. The same thing goes for
using the Index class as it is just really a simple interface to the
IndexWriter and IndexReader classes.
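
To make the locking behaviour described above concrete, here is a minimal sketch that uses a plain lock file and Ruby's File#flock rather than Ferret itself. LockedWriter, LockError, and the lock file path are hypothetical names made up for illustration. Ferret's real commit lock is implemented differently, so treat this only as a picture of "second writer gets an exception".

```ruby
require 'tmpdir'

# Hypothetical error raised when the lock is already held.
class LockError < StandardError; end

# Hypothetical stand-in for a writer that takes an exclusive lock on open.
class LockedWriter
  def initialize(lock_path)
    @file = File.open(lock_path, File::RDWR | File::CREAT, 0644)
    # LOCK_NB makes flock return false instead of blocking.
    unless @file.flock(File::LOCK_EX | File::LOCK_NB)
      @file.close
      raise LockError, "lock already held: #{lock_path}"
    end
  end

  # Release the lock so another writer can take it.
  def close
    @file.flock(File::LOCK_UN)
    @file.close
  end
end

lock_path = File.join(Dir.mktmpdir, 'write.lock')
first = LockedWriter.new(lock_path)

# A second writer arriving while the first still holds the lock is refused.
begin
  LockedWriter.new(lock_path)
  second_result = :acquired
rescue LockError
  second_result = :refused
end

first.close
```

Any number of readers could be layered on top of this without touching the lock, which is why only the number of writers matters.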

One possibility is to use the Index class with :autoflush set to true.
This should work most of the time as the IndexWriter class will keep
trying for 5 seconds (broken in C version of 0.9.0, 0.9.1) to gain the
commit lock so if it misses the first time it should eventually get
it. This is an easy way to do things but it’s still dangerous. I’d
recommend using a single IndexWriter as described above. That doesn’t
mean you have to use the IndexWriter and IndexReader classes. You can
still use the Index class as long as only one Index is doing the
writing.
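
The "keep trying for a few seconds" idea can be sketched roughly as follows. This is not Ferret's code: obtain_lock, its timeout and pause parameters, and the lock file are all assumptions for illustration, again built on File#flock from the standard library.

```ruby
require 'tmpdir'

# Hypothetical helper: keep trying for an exclusive lock until the timeout expires.
def obtain_lock(lock_path, timeout = 5.0, pause = 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  file = File.open(lock_path, File::RDWR | File::CREAT, 0644)
  loop do
    # Non-blocking attempt; flock returns false if someone else holds the lock.
    return file if file.flock(File::LOCK_EX | File::LOCK_NB)
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      file.close
      raise "could not obtain lock on #{lock_path} within #{timeout}s"
    end
    sleep pause
  end
end

lock_path = File.join(Dir.mktmpdir, 'commit.lock')
holder = obtain_lock(lock_path)

# Release the lock from another thread after a short delay, as a busy writer would.
releaser = Thread.new do
  sleep 0.2
  holder.flock(File::LOCK_UN)
end

waiter = obtain_lock(lock_path, 2.0) # succeeds once the holder lets go
releaser.join
got_lock = !waiter.nil?
waiter.close
holder.close
```

The retry only papers over short overlaps. If a writer holds the lock for longer than the timeout, the waiting process still gets an exception, which is why a single writer remains the safer setup.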

I hope that helps. Stay tuned for much better documentation on this.

Dave


#3

Hi David,

Thanks for the heads up re: index readers & writers.

Just one more question: how do you search an Index in read-only mode?

The :autoflush option sounds like a viable backup scenario as well, but
I couldn’t find anything in the docs about it. (I tried passing it into
the index via something like Index::Index.new(:autoflush => true), but it
didn’t like that either.)

Cheers,

- Shanti


#4

Hi Shanti,

It’s :auto_flush, not :autoflush. Sorry for the confusion.

Dave