I haven’t really looked at the performance on Windows. The answers to a few
questions might help me fix this problem. Are you using the Index class
or the IndexWriter class? What parameters are you passing to the
indexer? I’ll see what I can do, but I can’t promise anything.
I’m using IndexWriter.add_document(doc).
For the purposes of the timing comparison, I’m using an empty directory
and passing :create => true and a :field_infos hash that specifies
certain fields as indexed but not stored, or vice versa.
It shouldn’t be slower for bulk updates.
I hope I haven’t misused “bulk”.
Actually, looking at your times, it seems you may not have the optimal
indexing settings: even 297 seconds seems like a long time to index
35,000 documents, although it depends on the documents and where they
come from. If you give me a little more information, I may be able to
help you speed this up.
Thanks, Dave. I’m generating the index from rows in a SQL database, and
in general I’m OK with 297 secs for 35,000 docs, but a 3x hit does
hurt somewhat, particularly for larger SQL databases.
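For scale, and assuming the 3x slowdown applies to the 297-second run, the per-document cost and the slowed-down total work out roughly as:

$$\frac{297\ \text{s}}{35{,}000\ \text{docs}} \approx 8.5\ \text{ms/doc}, \qquad 3 \times 297\ \text{s} = 891\ \text{s} \approx 14.9\ \text{min}$$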
The logic goes something like this:

    Create new Ferret index
    Connect to SQL DBMS
    For t in table[1…n] do
      For row in resultset do
        IndexWriter.add_document(row)
Each row retrieved from the SQL DBMS is a hash of up to 30 fields, and
some fields are longish text (around 3,000 characters).
For a baseline, if I comment out the IndexWriter.add_document(row) call,
the SQL part of the process takes only around 12 secs, so I think most of
the work is done by add_document.
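A minimal sketch of the loop above that times the SQL fetch and add_document separately, so the split is measured in one run instead of by commenting code out. It assumes the writer responds to add_document (as Ferret::Index::IndexWriter does); `index_tables`, `fetch_rows` and `ArrayWriter` are hypothetical names, and `fetch_rows` stands in for whatever database call returns one table's rows as hashes.

```ruby
require 'benchmark'

# Hypothetical helper: indexes every row of every table and returns the
# wall-clock seconds spent fetching rows vs. adding documents.
# `writer` is anything with add_document (e.g. a Ferret IndexWriter);
# `fetch_rows` is an assumed callable returning an array of row hashes.
def index_tables(writer, tables, fetch_rows)
  sql_secs = 0.0
  index_secs = 0.0
  count = 0
  tables.each do |table|
    rows = nil
    # Time only the database fetch for this table.
    sql_secs += Benchmark.realtime { rows = fetch_rows.call(table) }
    rows.each do |row|
      # Time only the indexing call for this row.
      index_secs += Benchmark.realtime { writer.add_document(row) }
      count += 1
    end
  end
  { docs: count, sql_secs: sql_secs, index_secs: index_secs }
end

# Stand-in writer so the sketch runs without Ferret installed.
class ArrayWriter
  attr_reader :docs

  def initialize
    @docs = []
  end

  def add_document(doc)
    @docs << doc
  end
end

writer = ArrayWriter.new
# Fake "resultset": three rows per table, each with a ~3,000-char text field.
fetch = ->(t) { Array.new(3) { |i| { table: t, id: i, body: 'x' * 3000 } } }
stats = index_tables(writer, [:customers, :orders], fetch)
puts stats
```

With a real IndexWriter in place of ArrayWriter, a large `index_secs` relative to `sql_secs` would confirm the 12-second baseline observation. Timing each row individually adds a little overhead, which should be negligible next to the indexing work itself.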
Thanks for your help,