Forum: Ferret
Ferret 0.11.4.win32 indexing speed vs Ferret 0.10.9.win32

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately no longer have the time to support and maintain the forum. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
Neville Burnell (Guest)
on 2007-04-12 03:07
(Received via mailing list)
Firstly, thanks Dave for all your hard work. Ferret rocks!

I am just testing 0.11.4.win32 and it seems to work just fine; however,
the index creation phase of my app is perhaps 3x slower under 0.11.4
than under 0.10.9.

Details follow:

System: windows xp sp2, index on local hard disk, Ruby 1.8.6

Run #1, Ferret 0.10.9
- Reboot
- Build index: 35,000 rows added in 297 seconds

Run #2, Ferret 0.11.4
- Reboot
- Build index: 35,000 rows added in 1044 seconds

Searching both indexes "feels" about the same

Any comments on whether Ferret 0.11.4 should be much slower for bulk
inserts?

Kind regards

Neville
David Balmain (Guest)
on 2007-04-12 13:04
(Received via mailing list)
On 4/12/07, Neville Burnell <Neville.Burnell@bmsoft.com.au> wrote:
> Run #1, Ferret 0.10.9
> - Reboot
> - Build index: 35,000 rows added in 297 seconds
>
> Run #2, Ferret 0.11.4
> - Reboot
> - Build index: 35,000 rows added in 1044 seconds

Ouch, that sucks. There is a difference in indexing speed on Linux too,
depending a lot on the parameters you use, but bulk indexing is largely
unchanged. The differences are due to the changes I've made to make
Ferret more stable when indexing, and to adding the ability for Ferret
to recover when the index is corrupted. This makes Ferret much slower
when opening an index, but the indexing procedure hasn't changed.

I haven't really looked at the performance in Windows. A few questions
here might allow me to fix this problem. Are you using the Index class
or the IndexWriter class? What parameters are you passing to the
indexer? I'll see what I can do but I can't promise anything.
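For anyone following along, the kind of parameters Dave is asking about might look something like the sketch below. The option values are illustrative guesses, not recommendations, and 'my_index' is a made-up path; the Ferret call itself is shown only in a comment so the snippet runs without the gem.

```ruby
# Options commonly tuned for bulk indexing in Ferret (values are guesses):
options = {
  :path              => 'my_index', # on-disk index directory
  :create            => true,       # start from an empty index
  :max_buffered_docs => 1_000,      # hold more docs in RAM before flushing
  :merge_factor      => 100         # merge on-disk segments less often
}

# With the ferret gem installed, the writer would then be opened roughly as:
#   require 'ferret'
#   writer = Ferret::Index::IndexWriter.new(options)
puts options[:merge_factor]
```

Buffering more documents in memory and merging segments less often both reduce disk I/O during a bulk load, which is usually where the time goes.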

> Searching both indexes "feels" about the same

Searching should be the same, although opening the index for searching
will be slower. But this shouldn't be done for every search, so it
shouldn't be a problem.
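The advice above amounts to: pay the (now slower) open cost once and reuse the searcher for every query. A minimal memoization sketch, with the Ferret calls shown only as comments and `FakeSearcher` as a stand-in so the pattern itself runs without the gem:

```ruby
class FakeSearcher; end  # stand-in for the real searcher object

def searcher
  @searcher ||= begin
    # With the ferret gem this would be roughly:
    #   require 'ferret'
    #   Ferret::Search::Searcher.new('my_index')  # slow open happens once
    FakeSearcher.new
  end
end

a = searcher
b = searcher
puts a.equal?(b)  # => true; the same object is reused on every call
```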

> Any comments on whether Ferret 0.11.4 should be much slower for bulk
> inserts ?

I guess I already answered this. No, it shouldn't be slower for bulk
updates. Actually, looking at your times, it seems like you may not
have the optimal settings for indexing, as even 297 seconds seems
like a long time to index 35,000 documents, although it depends on the
documents and where they are coming from. If you give me a little more
information I may be able to help you speed this up.

Cheers,
Dave
Neville Burnell (Guest)
on 2007-04-13 02:33
(Received via mailing list)
> I haven't really looked at the performance in Windows. A few questions
> here might allow me to fix this problem. Are you using the Index class
> or the IndexWriter class? What parameters are you passing to the
> indexer? I'll see what I can do but I can't promise anything.

I'm using IndexWriter.add_document(doc)

For the purposes of the timing comparison, I'm using an empty directory,
and passing :create => true and a :field_infos hash which specifies that
certain fields are indexed but not stored, or vice versa.
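That per-field setup might look something like the sketch below. The field names are invented, and the real Ferret::Index::FieldInfos API appears only in the comments so the snippet runs without the gem:

```ruby
# With the ferret gem, this intent would be expressed roughly as:
#   fis = Ferret::Index::FieldInfos.new(:store => :yes, :index => :yes)
#   fis.add_field(:body, :store => :no)
# Spelled out here as a plain per-field hash:
field_setup = {
  :title   => { :store => :yes, :index => :yes }, # searchable and retrievable
  :body    => { :store => :no,  :index => :yes }, # searchable, not stored
  :raw_xml => { :store => :yes, :index => :no  }  # stored, not searchable
}

field_setup.each do |name, opts|
  puts "#{name}: store=#{opts[:store]} index=#{opts[:index]}"
end
```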

> it shouldn't be slower for bulk updates.

I hope I haven't misused "bulk"

> Actually, looking at your times, it seems like you may not
> have the optimal settings for indexing as even 297 seconds seems
> like a long time to index 35,000 documents although it depends on the
> documents and where they are coming from. If you give me a little more
> information I may be able to help you speed this up.

Thanks Dave. I'm generating the index for rows from a SQL database and
in general I'm ok with the 297 secs for 35,000 docs, but a 3x hit does
hurt somewhat, particularly for larger SQL databases.

The logic goes something like this:

# create a new Ferret index and connect to the SQL dbms
tables.each do |table|
  # prepare the SQL for this table
  resultset.each do |row|
    index_writer.add_document(row)
  end
end

Each row retrieved from the SQL dbms is a hash of up to 30 fields, and
some fields are longish text [3,000 chars].
For a baseline, if I comment out the IndexWriter.add_document(row) call,
the SQL part of the process only takes around 12 secs, so most of the
work is done by add_document, I think.

Thanks for your help,

Nev