Ferret 0.11.4.win32 indexing speed vs Ferret 0.10.9.win32

Firstly, thanks Dave for all your hard work. Ferret rocks!

I am just testing 0.11.4.win32 and it seems to work just fine. However,
the index creation phase of my app is about 3.5x slower under 0.11.4
than under 0.10.9.

Details follow:

System: Windows XP SP2, index on local hard disk, Ruby 1.8.6

Run #1, Ferret 0.10.9

  • Reboot
  • Build index, 35,000 rows added in 297 seconds

Run #2, Ferret 0.11.4

  • Reboot
  • Build index, 35,000 rows added in 1044 seconds
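
To put the two runs side by side, here is a small sketch that turns the reported timings into throughput and a slowdown factor. It uses only the figures quoted above; the method names are made up for illustration.

```ruby
# Figures reported in this thread: 35,000 rows per run.
rows = 35_000
runs = { "0.10.9" => 297.0, "0.11.4" => 1044.0 } # elapsed seconds

# Documents indexed per second for one run.
def docs_per_second(rows, seconds)
  rows / seconds.to_f
end

# How many times slower the new run is than the old one.
def slowdown(old_seconds, new_seconds)
  new_seconds / old_seconds.to_f
end

runs.sort.each do |version, seconds|
  puts format("Ferret %s: %.1f docs/sec", version, docs_per_second(rows, seconds))
end
puts format("Slowdown: %.2fx", slowdown(runs["0.10.9"], runs["0.11.4"]))
```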

Searching both indexes “feels” about the same

Any comments on whether Ferret 0.11.4 should be much slower for bulk
inserts?

Kind regards

Neville

On 4/12/07, Neville B. wrote:

Run #1, Ferret 0.10.9

  • Reboot
  • Build index, 35,000 rows added in 297 seconds

Run #2, Ferret 0.11.4

  • Reboot
  • Build index, 35,000 rows added in 1044 seconds

Ouch, that sucks. There is a difference in indexing speed on Linux too,
depending a lot on the parameters you use, but bulk indexing is largely
unchanged. The differences are due to the changes I’ve made to make
Ferret more stable when indexing and to let it recover when the index
is corrupted. This makes Ferret much slower when opening an index, but
the indexing procedure hasn’t changed.

I haven’t really looked at the performance in Windows. A few questions
here might allow me to fix this problem. Are you using the Index class
or the IndexWriter class? What parameters are you passing to the
indexer? I’ll see what I can do but I can’t promise anything.

Searching both indexes “feels” about the same

Searching should be the same, although opening the index for searching
will be slower. But this shouldn’t be done for every search so it
shouldn’t be a problem.
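
A minimal sketch of the "open once, reuse for every search" idea, assuming the application keeps one long-lived holder object. The SearcherHolder class is hypothetical, and the block passed to it stands in for whatever actually opens the index (with Ferret, presumably something like Ferret::Search::Searcher.new(path), which is an assumption and is not shown in this thread).

```ruby
# Hypothetical helper: opens the searcher lazily and only once.
class SearcherHolder
  def initialize(&opener)
    @opener = opener   # block that performs the (slow) open
    @searcher = nil
  end

  # Returns the cached searcher, opening it on first use only.
  def searcher
    @searcher ||= @opener.call
  end
end

# Usage with a stand-in object so the sketch runs without Ferret installed.
opens = 0
holder = SearcherHolder.new do
  opens += 1
  Object.new # placeholder for a real searcher
end
3.times { holder.searcher }
puts "index opened #{opens} time(s) for 3 searches"
```

The slower open in 0.11.x is then paid once per process rather than once per query.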

Any comments on whether Ferret 0.11.4 should be much slower for bulk
inserts?

I guess I already answered this. No, it shouldn’t be slower for bulk
updates. Actually, looking at your times, it seems like you may not
have the optimal settings for indexing as even 297 seconds seems
like a long time to index 35,000 documents although it depends on the
documents and where they are coming from. If you give me a little more
information I may be able to help you speed this up.

Cheers,
Dave

I haven’t really looked at the performance in Windows. A few questions
here might allow me to fix this problem. Are you using the Index class
or the IndexWriter class? What parameters are you passing to the
indexer? I’ll see what I can do but I can’t promise anything.

I’m using IndexWriter.add_document(doc).

For the purposes of the timing comparison, I’m using an empty directory
and passing :create => true and a :field_infos hash which details
which fields are indexed but not stored, or vice versa.
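
For readers following along, a setup of this kind might look roughly like the sketch below. The field names and per-field settings are invented for illustration and are not the actual configuration used here. The sketch also assumes the settings are fed through a Ferret::Index::FieldInfos object before being passed as :field_infos, which may differ from the real app.

```ruby
# Hypothetical per-field settings; names and values are illustrative only.
FIELD_SETTINGS = {
  :id    => { :store => :yes, :index => :untokenized },
  :title => { :store => :yes, :index => :yes },
  :body  => { :store => :no,  :index => :yes }, # indexed but not stored
  :notes => { :store => :yes, :index => :no },  # stored but not indexed
}

# Builds a writer on a fresh index at `path` (assumed to be an empty directory).
def build_writer(path, field_settings = FIELD_SETTINGS)
  require 'ferret' # loaded lazily so the settings can be inspected without the gem
  field_infos = Ferret::Index::FieldInfos.new
  field_settings.each { |name, opts| field_infos.add_field(name, opts) }
  Ferret::Index::IndexWriter.new(:path        => path,
                                 :create      => true,
                                 :field_infos => field_infos)
end
```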

it shouldn’t be slower for bulk updates.

I hope I haven’t misused “bulk”.

Actually, looking at your times, it seems like you may not
have the optimal settings for indexing as even 297 seconds seems
like a long time to index 35,000 documents although it depends on the
documents and where they are coming from. If you give me a little more
information I may be able to help you speed this up.

Thanks Dave. I’m generating the index from rows in a SQL database, and
in general I’m OK with the 297 secs for 35,000 docs, but a 3x hit does
hurt somewhat, particularly for larger SQL databases.

The logic goes something like this:

Create new Ferret index
Connect to SQL dbms
For t in table[1…n] do
  Prepare sql
  For row in resultset do
    IndexWriter.add_document(row)
  End
End
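
The outline above could be sketched in Ruby roughly as follows. To keep it runnable without Ferret or a database, the writer is a stand-in class (CountingWriter) and the SQL result sets are plain arrays of hashes; both are assumptions for illustration. In the real app the writer would be the IndexWriter and the rows would come from the dbms.

```ruby
require 'benchmark'

# Stand-in for the index writer; only add_document is needed by the loop.
class CountingWriter
  attr_reader :count

  def initialize
    @count = 0
  end

  def add_document(doc)
    @count += 1
  end
end

# tables: hash of table name => array of row hashes (stand-in for result sets).
# Returns [rows_added, elapsed_seconds].
def index_tables(writer, tables)
  added = 0
  elapsed = Benchmark.realtime do
    tables.each do |_name, rows|
      rows.each do |row|
        writer.add_document(row)
        added += 1
      end
    end
  end
  [added, elapsed]
end

tables = {
  :customers => [{ :id => 1, :name => "Ann" }, { :id => 2, :name => "Bob" }],
  :orders    => [{ :id => 10, :notes => "longish text" }],
}
added, elapsed = index_tables(CountingWriter.new, tables)
puts "#{added} rows added in #{'%.3f' % elapsed} seconds"
```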

Each row retrieved from the SQL dbms is a hash of up to 30 fields, and
some fields are longish text (3000 chars).
As a baseline, if I comment out the IndexWriter.add_document(row) call,
the SQL part of the process takes only around 12 secs, so I think most
of the work is done by add_document.
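
As a rough check on that, the share of each run spent outside the SQL part can be worked out from the numbers above. This is a sketch that assumes the 12 sec baseline is about the same under both Ferret versions; the method name is made up.

```ruby
# Fraction of total run time not explained by the SQL-only baseline.
def add_document_share(total_seconds, baseline_seconds)
  (total_seconds - baseline_seconds) / total_seconds.to_f
end

baseline = 12 # seconds with add_document commented out
[["0.10.9", 297], ["0.11.4", 1044]].each do |version, total|
  share = add_document_share(total, baseline)
  puts format("Ferret %s: %.1f%% of the run is spent in add_document", version, share * 100)
end
```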

Thanks for your help,

Nev