Forum: Ferret Large index performance = 8x decrease

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
9a5cd797facf14a7cecb9607b2333a19?d=identicon&s=25 Mike Daly (mike)
on 2007-05-10 20:48
hi,

i'm indexing a really large db table (~4.2 million rows).  i've noticed
that after ~2M records, index performance decreases by almost an order
of magnitude.  full dataset graph here:
http://i122.photobucket.com/albums/o244/spokeo/ind...

here's a couple best-fit lines that represent the data points:
0-2M  : y = 78.65655x + 144237.5
2.5M+ : y = 10.79832x + 1980630

the part that strikes me as most odd is the bend between 2M and 2.5M.  i
haven't read the ferret algorithm, but i would expect either linear or
hyperbolic performance over time.  however, the graph seems to indicate
a particular breaking point after which performance is cut by 8x.  is
this behavior normal/expected?  is there anything i could be doing to
speed up an index of this size? (the index grows to ~12G while indexing,
then gets shrunk to ~6G by the optimization)

thanks for the help!
-m


--- MODEL CODE
class MyModel < ActiveRecord::Base
  # think of body/title in terms of an average blog
  acts_as_ferret :fields => { 'body' => {}, 'title' => { :boost => 2 } }
end

--- INDEX CODE
index =
Ferret::Index::Index.new(MyModel.aaf_configuration[:ferret].dup.update(:auto_flush
=> false, :field_infos => MyModel.aaf_index.field_infos, :create =>
true))

n = 0
BATCH_SIZE = 1000

while true
  # new index from scratch
  records = MyModel.find(:all, :limit => BATCH_SIZE, :offset => n,
:select =>
"id,#{MyModel.aaf_configuration[:ferret_fields].keys.join(',')}")
  break if (!records || records.length == 0)

  records.each do |record|
    index << record.to_doc  # aaf method
  end

  n += BATCH_SIZE
end

index.flush
index.optimize  # 30+ minutes =(
index.close


--- CONFIG
> gem list | grep ferret
acts_as_ferret (0.4.0)
ferret (0.11.3)

> uname -a
Linux gentoo 2.6.20-hardened #3 SMP Fri Mar 30 19:27:10 UTC 2007 x86_64
Intel(R) Pentium(R) D CPU 3.00GHz GenuineIntel GNU/Linux
This topic is locked and can not be replied to.