AAF rebuild_index taking a long time

Hello,

First of all, thank you to the people behind Ferret, AAF, and everyone
who’s been a part of this forum. Your code, problems and insights have
already been a great help to me with my current project.

I’m working on a Rails project that uses AAF to search an “Article”
model. I have approximately 24,000 records for a total database size of
around 313 MB. Each record contains a medium text field that holds a
large amount of XML.

Article.rebuild_index used to take around 40 min to index the data.
Lately it has started taking several hours, and often it doesn’t
complete before I get an error saying it has lost the MySQL connection.
As far as I can determine, nothing changed in the code between when it
took 40 min and now.

Here’s my aaf line:
acts_as_ferret :fields => {
    :title         => {:store => :yes},
    :authors       => {:store => :yes},
    :volume        => {:store => :yes},
    :year          => {:store => :yes},
    :fpage         => {:store => :yes},
    :abstract      => {:store => :yes},
    :article_text  => {:store => :yes},
    :issue         => {:store => :yes},
    :pub_date_sort => {:index => :untokenized}
  }, :remote => true

Ruby Version 1.8.5
AAF Version 0.4.3
Ferret Version 0.11.6
Rails Version 1.2.6
Mysql gem 2.7

Here’s some output from ferret_index.log
reindex model Article : 0.00% complete : 7670.28 secs to finish
reindex model Article : 4.18% complete : 14270.12 secs to finish

Then it stopped with the following error in the console:

ActiveRecord::StatementInvalid: Mysql::Error: Lost connection to MySQL server during query: SELECT * FROM articles ORDER BY id ASC LIMIT 2000, 1000
  from /opt/local/lib/ruby/gems/1.8/gems/activerecord-1.15.3/lib/active_record/connection_adapters/abstract_adapter.rb:128:in `log'
  from /opt/local/lib/ruby/gems/1.8/gems/activerecord-1.15.3/lib/active_record/connection_adapters/mysql_adapter.rb:243:in `execute'
  from /opt/local/lib/ruby/gems/1.8/gems/activerecord-1.15.3/lib/active_record/connection_adapters/mysql_adapter.rb:399:in `select'
  from /opt/local/lib/ruby/gems/1.8/gems/activerecord-1.15.3/lib/active_record/connection_adapters/abstract/database_statements.rb:7:in `select_all'
  from /opt/local/lib/ruby/gems/1.8/gems/activerecord-1.15.3/lib/active_record/base.rb:427:in `find_by_sql'
  from /opt/local/lib/ruby/gems/1.8/gems/activerecord-1.15.3/lib/active_record/base.rb:997:in `find_every'
  from /opt/local/lib/ruby/gems/1.8/gems/activerecord-1.15.3/lib/active_record/base.rb:418:in `find'
  from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/local_index.rb:219:in `reindex_model'
  from ./script/…/config/…/config/…/vendor/plugins/acts_as_ferret/lib/local_index.rb:217:in `step'
  from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/local_index.rb:217:in `reindex_model'
  from /opt/local/lib/ruby/gems/1.8/gems/activerecord-1.15.3/lib/active_record/connection_adapters/abstract/database_statements.rb:59:in `transaction'
  from /opt/local/lib/ruby/gems/1.8/gems/activerecord-1.15.3/lib/active_record/transactions.rb:95:in `transaction'
  from ./script/…/config/…/config/…/vendor/plugins/acts_as_ferret/lib/local_index.rb:216:in `reindex_model'
  from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/local_index.rb:47:in `rebuild_index'
  from ./script/…/config/…/config/…/vendor/plugins/acts_as_ferret/lib/local_index.rb:46:in `each'
  from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/local_index.rb:46:in `rebuild_index'
  from ./script/…/config/…/config/…/vendor/plugins/acts_as_ferret/lib/local_index.rb:23:in `ensure_index_exists'
  from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/local_index.rb:9:in `initialize'
  from ./script/…/config/…/config/…/vendor/plugins/acts_as_ferret/lib/class_methods.rb:233:in `new'
  from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:233:in `create_index_instance'
  from ./script/…/config/…/config/…/vendor/plugins/acts_as_ferret/lib/class_methods.rb:25:in `aaf_index'
  from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:15:in `rebuild_index'

I don’t always get that MySQL error; sometimes the rebuild does finish,
but it takes a good 2-3 hours.
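
For scale, here is a rough back-of-the-envelope check of the second
ferret_index.log line quoted above. It is only a sketch: the record
count is my approximate figure, the helper names are made up for this
post, and I am assuming the "secs to finish" value is a simple linear
extrapolation from the time elapsed so far (I have not checked how AAF
actually computes it).

```ruby
# Figures quoted in this post; the record count is approximate.
TOTAL_RECORDS = 24_000
PERCENT_DONE  = 4.18        # from ferret_index.log
SECS_LEFT     = 14_270.12   # from ferret_index.log

# Number of records covered by a given completion percentage.
def records_done(total, percent)
  (total * percent / 100.0).round
end

# Convert seconds to hours.
def hours(secs)
  secs / 3600.0
end

# ASSUMPTION: "secs to finish" is a linear extrapolation, i.e.
#   secs_left = elapsed * (100 - percent) / percent
# Solving for elapsed gives the time spent so far.
def elapsed_secs(percent, secs_left)
  secs_left * percent / (100.0 - percent)
end

done    = records_done(TOTAL_RECORDS, PERCENT_DONE)
elapsed = elapsed_secs(PERCENT_DONE, SECS_LEFT)

puts "records indexed so far: #{done}"
puts "estimated hours remaining: %.2f" % hours(SECS_LEFT)
puts "implied rate now: %.2f records/sec" % (done / elapsed)
puts "rate at 40 min per rebuild: %.2f records/sec" % (TOTAL_RECORDS / (40 * 60.0))
```

If that assumption holds, the first batch of roughly 1,000 records is
being indexed several times slower than it was when a full rebuild took
40 min.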

Does anyone have any ideas on what to try or where to look? I’ve pored
over the forum and Google, and the only thing I found, which I haven’t
tried to implement yet, is the thread on here about parallelizing the
indexing.

I’ve tried running with and without DRb (:remote => true) and there’s no
difference.

Thank you!
-km