acts_as_ferret - How to confirm indexing is complete?

Hello, I have a couple of questions; I hope someone here can help answer
them.

I am using acts_as_ferret on a model Item with around 10 million rows.
I use Item.rebuild_index in the Ruby console to build the index. A
build seems to take at least 48 hours.

My questions are:

  1. How do you know when the indexing is over and complete?
  2. How can you confirm that ALL records in the table were indexed?
    (especially since the table runs into millions of records)

Thanks!!

Hi!

On Sun, Feb 25, 2007 at 06:20:55AM +0100, Jen wrote:

  1. How do you know when the indexing is over and complete?

Indexing is done when rebuild_index returns. At the moment there is no
logging of the progress a running rebuild_index has made.

However, I'm thinking about adding some kind of progress logging now.
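Since rebuild_index only returns once the whole index has been built, one low-tech way to see when it has finished is to wrap the call so it prints a message and the elapsed time on return. A minimal sketch using Ruby's standard Benchmark library; the helper name report_when_done is made up for illustration and is not part of acts_as_ferret:

```ruby
require 'benchmark'

# Hypothetical helper (not part of acts_as_ferret): runs the given block,
# prints how long it took once it returns, and passes the block's result through.
def report_when_done(label)
  result = nil
  seconds = Benchmark.realtime { result = yield }
  puts format("%s finished after %.2f hours", label, seconds / 3600.0)
  result
end

# In script/console, assuming the Item model from the question:
#   report_when_done("Item.rebuild_index") { Item.rebuild_index }
```

If the rebuild raises instead of returning, the exception propagates and no "finished" line is printed, which is itself a hint that the rebuild did not complete.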

  2. How can you confirm that ALL records in the table were indexed?
    (especially since the table runs into millions of records)

If rebuild_index returns normally and no error is thrown, I'd say it was
successful and indexed all your records. To make sure you have all 10
million documents in the index, you can inspect the index with a small
script like this:

require 'rubygems'
require 'ferret'
# open the index directory and count the documents it contains
reader = Ferret::Index::IndexReader.new('path/to/index')
puts "#{reader.num_docs} documents in index"
reader.close
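To turn that count into a yes/no answer, it can be compared with the number of rows in the table. A minimal sketch, assuming the index holds exactly one document per Item row and that no rows are added or deleted while you check; the method name index_complete? is hypothetical. It accepts any object that responds to num_docs, such as the Ferret::Index::IndexReader opened above:

```ruby
# Hypothetical helper: compares the document count reported by an index
# reader (anything responding to num_docs) with the number of rows expected.
def index_complete?(reader, expected_count)
  indexed = reader.num_docs
  puts "#{indexed} of #{expected_count} records indexed"
  indexed == expected_count
end

# Usage in script/console, with the reader opened as above:
#   index_complete?(reader, Item.count)   # true when the counts match
#   reader.close
```

Passing the reader in, rather than opening it inside the method, keeps the check independent of where the index lives and makes it easy to try with a stand-in object.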

cheers,
Jens


Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

Thanks, Jens! I will try your suggestion. It would be nice to have that
logging if you plan to add it, especially for builds that take a
loooong time :-)

Btw is there any way to speed up the build process?

Thanks, again…
-Jen


On Tue, Feb 27, 2007 at 06:19:00PM +0100, Jen wrote:

Btw is there any way to speed up the build process?

If you have enough RAM, you can increase the batch size used during
rebuilding (it is declared in class_methods.rb; look for batch_size).
That should result in fewer database calls.
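The effect of the batch size on the number of queries is simple arithmetic: assuming the rebuild loads one batch of batch_size records per query, a table of N rows needs ceil(N / batch_size) queries. A quick sketch for the 10 million rows from the question; the method name batch_queries is made up, and it only estimates query counts, not the memory each batch needs:

```ruby
# Estimated number of batch queries for a rebuild, assuming one query per
# batch of batch_size records: ceil(rows / batch_size) using integer math.
def batch_queries(rows, batch_size)
  (rows + batch_size - 1) / batch_size
end

[1_000, 5_000, 10_000].each do |size|
  puts "batch_size #{size}: #{batch_queries(10_000_000, size)} queries"
end
# batch_size 1000: 10000 queries
# batch_size 5000: 2000 queries
# batch_size 10000: 1000 queries
```

Under that assumption, going from 1,000 to 10,000 records per batch cuts the query count tenfold, at the cost of holding ten times as many records in memory at once, which is why the available RAM matters.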

You can also limit the number of fields you index by explicitly naming
the fields you need to search in your call to acts_as_ferret, if you
don't do this already.
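Naming the fields looks roughly like this in the model; :title and :description are hypothetical column names standing in for whatever you actually search on:

```ruby
class Item < ActiveRecord::Base
  # Only the listed attributes are indexed; other columns are left out of
  # the Ferret documents, so the rebuild has less to do per record.
  acts_as_ferret :fields => [:title, :description]
end
```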

cheers,
Jens
