Concurrency errors adding to a keyed index

Hi,

I’m adding some news articles to a keyed Ferret 0.10.14 index and
encountering quite serious instability when concurrently reading and
writing to the index, even with just one writer and one reader
process.

If I recreate the index without a key, concurrent reading and writing
seem to work fine (and indexing is about 10 times quicker :) )

I’m testing by running my indexing script (which retrieves up to 1000
database records using ActiveRecord, adds to the index and exits) and
concurrently manually re-running a search on the index using my Rails
web interface. This is in a dev environment with only 1 user (me) and
about 58000 docs.

The error I get is along the lines of the following, with a different
filename each time:

IO Error occured at <except.c>:79 in xraise
Error occured in fs_store.c:324 - fs_open_input
couldn't open "ferret_index/development/news_article_versions/_2ih.tix":

/usr/lib/ruby/gems/1.8/gems/ferret-0.10.14/lib/ferret/index.rb:682:in `initialize'
/usr/lib/ruby/gems/1.8/gems/ferret-0.10.14/lib/ferret/index.rb:682:in `ensure_reader_open'
/usr/lib/ruby/gems/1.8/gems/ferret-0.10.14/lib/ferret/index.rb:385:in `[]'
/usr/lib/ruby/1.8/monitor.rb:229:in `synchronize'
/usr/lib/ruby/gems/1.8/gems/ferret-0.10.14/lib/ferret/index.rb:384:in `[]'
#{RAILS_ROOT}/app/models/news_article_version.rb:35:in `ferret_search'
#{RAILS_ROOT}/app/models/news_article_version.rb:35:in `ferret_search'
#{RAILS_ROOT}/app/controllers/news_articles_controller.rb:56:in `search'
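For now, something like the following generic retry wrapper would
probably mask the failure (just a sketch: `with_retries` is my own
illustrative name, and I'm assuming the error surfaces as Ruby's
IOError), but I'd rather understand the cause:

```ruby
# Generic retry helper -- NOT part of Ferret's API, just a sketch.
# Retries the block a few times when it raises IOError, on the theory
# that the reader is racing a writer's commit and a moment later the
# file will be readable again.
def with_retries(attempts = 3, delay = 0.1)
  tries = 0
  begin
    yield
  rescue IOError
    tries += 1
    raise if tries >= attempts   # give up after the last attempt
    sleep delay                  # let the writer finish its commit
    retry
  end
end

# Hypothetical usage around my search call:
#   results = with_retries { NewsArticleVersion.ferret_search(query) }
```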

It seems to occur roughly once per batch, and usually towards the end of
the batch. I’m not using aaf. I create my keyed index like this:

@@ferret_index = Index::Index.new(
  :path => "#{RAILS_ROOT}/ferret_index/#{RAILS_ENV}/news_article_versions",
  :field_infos => field_infos,
  :id_field => :id,
  :key => :id,
  :default_input_field => :text)

Unkeyed, I just drop the :key option (duh). :id is just the
ActiveRecord id, from an auto_increment field in MySQL.

As a note, when concurrently searching on the keyed index, the number of
hits returned increases throughout the indexing process. With a
non-keyed index, the number of hits doesn’t increase until the end.

It looks to me as though, with a keyed index, Ferret commits each
record as it is added, whereas an unkeyed index commits only when the
Index is closed. The fact that I don't get the error when unkeyed
might just be because there are fewer commits, and so fewer
opportunities for the "bug" to trigger.
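To make the commit-frequency idea concrete, here's a toy model in
plain Ruby. This is explicitly not Ferret's real internals, just my
mental model of per-add versus on-close commits and what a concurrent
reader would see:

```ruby
# Toy model only: a "keyed" index flushes after every add (so it can
# check/replace documents by key), an "unkeyed" one flushes on close.
class ToyIndex
  attr_reader :commits, :visible_docs

  def initialize(keyed)
    @keyed = keyed
    @pending = []        # docs added but not yet committed
    @visible_docs = 0    # docs a concurrent reader would see
    @commits = 0
  end

  def add(doc)
    @pending << doc
    commit if @keyed     # keyed: flush after every document
  end

  def close
    commit               # unkeyed: one big flush at the end
  end

  private

  def commit
    @visible_docs += @pending.size
    @pending.clear
    @commits += 1
  end
end

keyed   = ToyIndex.new(true)
unkeyed = ToyIndex.new(false)
1000.times { |i| keyed.add(i); unkeyed.add(i) }

# Mid-batch, a reader watches the keyed index grow while the unkeyed
# one still looks empty -- matching the hit counts I see.
puts keyed.visible_docs    # 1000
puts unkeyed.visible_docs  # 0

keyed.close
unkeyed.close
puts keyed.commits         # one per add, plus the close
puts unkeyed.commits       # 1
```

Each keyed commit is a window in which a reader can open a half-written
set of index files, which would explain why the error shows up roughly
once per batch.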

Is this a bug I’ve come across? Is concurrent reading/writing like
this expected to work?

I’m using Ferret 0.10.14 on Ubuntu Edgy, with “ruby 1.8.4 (2005-12-24)
[i486-linux]” and “gcc version 4.1.2 20060928”.

Thanks in advance!

John

http://johnleach.co.uk