Memory concerns ferret 11.4

Hi list,

We just built our own Ferret DRb server (mostly because we don't do
any indexing from within Rails).

The Ferret DRb server only handles index inserts and some deletes.
Usually we do batch inserts where we retrieve a few hundred or a few
thousand documents from a database and then insert them into Ferret
one by one.
We call flush on every 50th document. We are very impressed with the
insert speed: 56,000 documents of varying size in 32 minutes.

When started, the Ferret DRb server takes about 9 MB of RAM, but
after it has been running for a while doing some indexing it reaches
about 150 MB, and when indexing is finished it still stays around
130 MB.
We do a manual GC.start at the end of every batch indexing run.
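For reference, the batch routine looks roughly like this (a minimal
sketch only; fetch_batch, the path, and the field names are
hypothetical placeholders, not our real code):

```ruby
require 'rubygems'
require 'ferret'

# Sketch of the batch indexing routine described above.
# fetch_batch and the document fields are hypothetical placeholders.
index = Ferret::Index::Index.new(:path => '/var/index/safecube')

docs = fetch_batch            # a few hundred/thousand rows from the DB
docs.each_with_index do |doc, i|
  index << {:id => doc[:id], :content => doc[:content]}
  index.flush if (i + 1) % 50 == 0   # flush on every 50th document
end
index.flush
GC.start                      # manual GC at the end of every batch
```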

The index is now about 2.7 GB.

Any suggestions on what can be wrong?
Maybe it's natural for a Ferret DRb server with a 2.7 GB index to use
that much memory when idle?

Please let me know if you need any more info.

Regards,
Henrik

On 5 Jun 2007, at 18:32, John L. wrote:

Hi John,

> Hi Henrik,
>
> when the IndexWriter is opened, the term dictionary is loaded into
> RAM. So memory usage is certainly dependent on the number of unique
> terms in the index.

OK, interesting.
I'll do some more testing, eliminating as much non-Ferret code as
possible, to see what is making my ferret_server eat up about
130-150 MB of RAM after it has been running for a while.

> The entire term dictionary isn't actually loaded, just an even
> spread of terms. The :index_skip_interval parameter allows you to
> twiddle this spread - the higher the skip interval, the less memory
> will be used, but the slower your searches.

Right now in my code I use

def initialize
  @index = Index::Index.new(:path => SafeCube::FERRET_INDEX_PATH)
end
But as this DRb server is for writing and deleting only, should I
explicitly create an IndexWriter instead?

Then in my Rails application I could use an IndexReader instead, as I
only do searches from there.

Would this change anything?
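In case it helps the discussion, the split I have in mind would look
something like this (a sketch only; the path and option values are
made up, and I haven't verified this is the recommended setup):

```ruby
require 'rubygems'
require 'ferret'

# Sketch: the DRb server side opens an explicit writer, since it only
# inserts and deletes. (Hypothetical path, not our production config.)
writer = Ferret::Index::IndexWriter.new(:path => '/var/index/safecube',
                                        :create => false)
writer << {:id => '42', :content => 'some document text'}
writer.commit
writer.close

# Sketch: the Rails side opens the index for searching only.
searcher = Ferret::Search::Searcher.new('/var/index/safecube')
searcher.search_each('content:document') do |doc_id, score|
  # handle each hit here
end
searcher.close
```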

> Play with this parameter and see if it improves things for you - if
> not, at least you know it's not down to having lots of unique terms.

I'll try setting some different high and low values and see if I can
control the amount of RAM used.
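For anyone following along, I'm planning to test it roughly like this
(the option name is taken from John's mail; the value is an arbitrary
guess, and I'm not yet sure whether it takes effect at read time or
only when the index is written):

```ruby
require 'rubygems'
require 'ferret'

# Open the index with a larger skip interval, trading search speed
# for a smaller in-memory term dictionary. 8192 is an arbitrary test
# value, not a recommendation.
index = Ferret::Index::Index.new(:path => SafeCube::FERRET_INDEX_PATH,
                                 :index_skip_interval => 8192)
```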

Thanks again for the info John!

To be honest, probably a long shot, but worth a look.

John.

On Mon, 2007-06-04 at 12:29 +0200, Henrik Z. wrote:


http://johnleach.co.uk