Forum: Ruby
Writing to ferret index from multiple processes

Andreas S. (Guest)
on 2005-12-19 00:08
Hi,

what do I have to do to be able to write a ferret index from multiple
processes at the same time?

I was indexing a lot of documents with a script when another process
made a change to the index; suddenly all of the imported data was gone
from the index, and the import script quit with the exception
"Errno::ENOENT: No such file or directory - ./ferret_index/_1ah.fnm".

:auto_flush => true didn't help. Is there something else I need to do?

Andreas
David B. (Guest)
on 2005-12-19 05:06
(Received via mailing list)
Hi Andreas,

Can you show me some more code? How are you creating the index?
Perhaps you are setting :create => true in which case it will
overwrite the old index.
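A minimal sketch of one common workaround when you want the index created on the first run but never overwritten afterwards: only pass :create => true when there is no index yet. index_options is a hypothetical helper, and treating a missing or empty directory as "no index" is an assumption, not something taken from Ferret itself.

```ruby
# Hypothetical helper (not part of Ferret): build the options hash for
# Index::Index.new so that :create => true is only passed when no index
# exists yet. "Missing or empty directory" as the test is an assumption.
def index_options(path)
  opts = {
    :path => path,
    :default_search_field => ['subject'],
    :key => ['id', 'class']
  }
  no_index_yet = !Dir.exist?(path) || Dir.empty?(path)
  opts[:create] = true if no_index_yet # never wipe an existing index
  opts
end
```

In the search handler this would be used as Index::Index.new(index_options(@path)). The check itself is not atomic, so it is safest when a single process (for example a setup script) is the first to open the index.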

Dave
Andreas S. (Guest)
on 2005-12-19 09:01
(Received via mailing list)
David B. wrote:
> Hi Andreas,
>
> Can you show me some more code? How are you creating the index?
> Perhaps you are setting :create => true in which case it will
> overwrite the old index.
>
> Dave

Oops. I am indeed using :create => true. I forgot that I had set it because
:create_if_missing did not work.

I removed it, but now there is a different problem. When I change the
index while the indexing script is running, it quits, but with another
error message:

316
317
318
RuntimeError: docs out of order curent doc = 9 and previous doc = 17
        from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/segment_merger.rb:276:in `append_postings'
        from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/segment_merger.rb:262:in `append_postings'
        from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/segment_merger.rb:240:in `merge_term_info'
        from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/segment_merger.rb:215:in `merge_term_infos'
        from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/segment_merger.rb:176:in `merge_terms'
        from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/segment_merger.rb:48:in `merge'
        from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/index_writer.rb:403:in `merge_segments'
        from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/index_writer.rb:371:in `maybe_merge_segments'
        from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/index_writer.rb:161:in `add_document'
        from /usr/local/lib/ruby/1.8/monitor.rb:229:in `synchronize'
        from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/index_writer.rb:159:in `add_document'
        from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/index.rb:270:in `<<'
        from /usr/local/lib/ruby/1.8/monitor.rb:229:in `synchronize'
        from /usr/local/lib/ruby/gems/1.8/gems/ferret-0.3.2/lib/ferret/index/index.rb:238:in `<<'
        from ./app/models/search_ferret.rb:38:in `update'
        from (irb):1
David B. (Guest)
on 2005-12-19 12:07
(Received via mailing list)
I'm not too sure about this one. Are you by any chance explicitly
deleting the lock files when your app starts up? I've seen a few
people do that. The only way I can see doc numbers getting out of
order is if you delete the lock files. Any chance I could look at more
of your code? Is this for RForum? Perhaps I could check it out of svn.
Anyway, I hope I can help you out with this.

Dave

PS: If you are interested you should join the Ferret mailing list. You
seem to be doing some more advanced stuff judging from the bugs you're
finding. ;-)
Andreas S. (Guest)
on 2005-12-19 15:20
David B. wrote:
> I'm not too sure about this one. Are you by any chance explicitly
> deleting the lock files when your app starts up?

No.

> I've seen a few
> people do that. The only way I can see doc numbers getting out of
> order is if you delete the lock files. Any chance I could look at more
> of your code? Is this for RForum? Perhaps I could check it out of svn.

It is for RForum. You can see the code here:
http://rforum.andreas-s.net/trac/file/trunk/app/mo...

My indexing script simply fetches all the posts from the database and
calls Post.search_handler.update(post) for each one. If another process
calls the update method while this script is running, I am getting the
exception. If you need more information to reproduce the problem, please
let me know.

> PS: If you are interested you should join the Ferret mailing list. You
> seem to be doing some more advanced stuff judging from the bugs you're
> finding. ;-)

I didn't know there was a list. I will definitely join it.

Thanks for fixing the other bugs so quickly.

Andreas
David B. (Guest)
on 2005-12-19 17:56
(Received via mailing list)
Hey Andreas,

The latest version of RForum still has :create => true so I'm guessing
you haven't checked in your latest changes. Could you let me know when
you have?

Cheers,
Dave
Andreas S. (Guest)
on 2005-12-19 21:29
David B. wrote:
> Hey Andreas,
>
> The latest version of RForum still has :create => true so I'm guessing
> you haven't checked in your latest changes. Could you let me know when
> you have?

I have checked it in.
Andreas S. (Guest)
on 2005-12-19 21:45
Andreas S. wrote:
> David B. wrote:
>> Hey Andreas,
>>
>> The latest version of RForum still has :create => true so I'm guessing
>> you haven't checked in your latest changes. Could you let me know when
>> you have?
>
> I have checked it in.

Btw, I tried it again on another machine and couldn't reproduce the
"docs out of order" exception, but instead I got:

RuntimeError: could not obtain lock: ./ferret_index/ferret-f62496686e637eca67e933a9cdc5eb21write.lock
David B. (Guest)
on 2005-12-20 03:47
(Received via mailing list)
Hi Andreas,

This is what I would expect to happen. What machine were you running
it on the first time? Whatever it was, Ferret's locking mechanism must
not be working on it.

Anyway, to avoid this problem you need to make sure the batch process
doesn't hold the lock for too long (no more than about 5 seconds). I would
change the rebuild_index method to use an IndexWriter or switch :auto_flush
to false. This should speed the reindexing up. I'd also add a pause in
there so other processes can get hold of the lock if they need to.
Since you are flushing explicitly, you may as well set :auto_flush to
false anyway.

  # Index::Index assumes "include Ferret" in the enclosing class
  def index
    @index ||= Index::Index.new(:path => @path,
                                # :auto_flush => true   <= don't use this anymore
                                :default_search_field => ['subject'],
                                :key => ['id', 'class'])
  end

  # update will continue to work, handling the flushing explicitly
  def update(post)
    index << create_doc(post)
    index.flush
  end

  # batch_update will keep the IndexWriter open between updates
  # so it will run much faster
  def batch_update(post)
    index << create_doc(post)
  end

  # define a flush method for use with the batch_update method
  def flush
    index.flush
  end

Then in your process that is doing the reindex, I'd use the
batch_update method, and I might even add some pauses in there.
Something like this:
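  MAX_ADDS_BEFORE_FLUSH = 10

  def rebuild_index
    i = 0
    Post.find_all_by_deleted(0).each do |post|
      self.batch_update(post)  # not update(), which flushes after every post
      i += 1
      # every MAX_ADDS_BEFORE_FLUSH posts, flush and pause briefly so
      # other processes get a chance to obtain the write lock
      if (i % MAX_ADDS_BEFORE_FLUSH) == 0
        self.flush
        sleep(0.5)
      end
    end
    self.flush  # write out the last, partial batch
  end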
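To check the flush cadence without Ferret or the database, the same loop can be written against any object that responds to << and flush. reindex_in_batches and RecordingIndex are hypothetical names used only for this sketch; RecordingIndex just records when flushes happen and does not touch a real index.

```ruby
# Hypothetical, Ferret-independent version of the batching loop.
# `index` can be any object that responds to << and flush.
def reindex_in_batches(docs, index, batch_size: 10, pause: 0.5)
  docs.each_with_index do |doc, i|
    index << doc
    # every batch_size documents, flush and pause so other writers can get the lock
    if ((i + 1) % batch_size).zero?
      index.flush
      sleep(pause)
    end
  end
  # flush the final partial batch, if there is one
  index.flush unless (docs.size % batch_size).zero?
end

# Stand-in index that only records how many docs it held at each flush.
class RecordingIndex
  attr_reader :docs, :flush_points

  def initialize
    @docs = []
    @flush_points = []
  end

  def <<(doc)
    @docs << doc
    self
  end

  def flush
    @flush_points << @docs.size
  end
end
```

With 25 documents and a batch size of 10, flushes happen after documents 10, 20 and 25, so the trailing partial batch is not left unflushed.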

These are just ideas. You'll probably come up with something better. I
think the best solution is just to keep the Ferret index in sync with
the database so that you don't need to reindex everything.

Let me know what kind of system you were running it on the first time
to get the documents out of order error. I'll see if I can find out
why the locking wasn't working.
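The behavior discussed in this thread, where a writer waits a few seconds for another process's write lock and then fails with "could not obtain lock", can be sketched with Ruby's File#flock. This illustrates the general technique only and is not Ferret's own lock implementation; with_write_lock and LockTimeoutError are hypothetical names, and the 5 second default simply mirrors the figure mentioned above. Note that flock locks are advisory and may not be reliable on network file systems such as NFS.

```ruby
# Hypothetical error class, named for this sketch only.
class LockTimeoutError < RuntimeError; end

# Run the block while holding an exclusive advisory lock on lock_path.
# If another process holds the lock, poll until `timeout` seconds have
# passed, then give up. A File#flock sketch, not Ferret's code.
def with_write_lock(lock_path, timeout: 5.0, poll_interval: 0.05)
  File.open(lock_path, File::RDWR | File::CREAT, 0644) do |f|
    deadline = Time.now + timeout
    # LOCK_NB makes flock return false instead of blocking while the lock is held
    until f.flock(File::LOCK_EX | File::LOCK_NB)
      raise LockTimeoutError, "could not obtain lock: #{lock_path}" if Time.now >= deadline
      sleep(poll_interval)
    end
    begin
      yield
    ensure
      f.flock(File::LOCK_UN) # closing the file would also release it
    end
  end
end
```

A batch job that takes the lock once per small batch, instead of once for the whole run, leaves other processes a window to get in, which is the same idea as the flush-and-sleep loop suggested above.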

Cheers,
Dave