Many index files

I’m using acts_as_ferret and have indexed a model with acts_as_ferret
:fields => [:name, :ascii_name, :alt_names], :single_index => true.

Now more than 95.000 files have been generated in the index directory! The
number of tuples I’m indexing is approx. 86.000.

I can’t remember this from earlier ferret/acts_as_ferret versions where
I’ve indexed millions of tuples without having such a number of files.

Is there a way of reducing the number of index files? What are the
consequences?

Thanks.

On Mon, Mar 19, 2007 at 06:46:00AM +0100, Star B. wrote:

> Now more than 95.000 files have been generated in the index directory! The
> number of tuples I’m indexing is approx. 86.000.

That doesn’t sound OK. Is the index usable? And did you do a rebuild
that resulted in this index, or was it normal application usage?

> Is there a way of reducing the number of index files? What are the
> consequences?

Try to optimize the index, either directly with Ferret:

i = Ferret::I.new(:path => 'path/to/index')
i.optimize

or via aaf:

Model.aaf_index.ferret_index.optimize


Posted via http://www.ruby-forum.com/.


Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk


Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
[email protected] | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa
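One way to check whether an optimize call actually reduces the number of files is to count the directory entries before and after running it. The following is a minimal stdlib-only Ruby sketch; `file_count_around` is a hypothetical helper (not part of Ferret or acts_as_ferret), and the block is where the optimize call from the reply above would go.

```ruby
# Hypothetical helper (not part of Ferret or acts_as_ferret): counts the
# entries in index_dir, runs the given block, then counts again.
def file_count_around(index_dir)
  count = -> { Dir.children(index_dir).size }
  before = count.call
  yield
  { before: before, after: count.call }
end

# Usage, assuming the acts_as_ferret setup discussed in this thread:
#   file_count_around('path/to/index') { Model.aaf_index.ferret_index.optimize }
```

If `:before` and `:after` come out equal, the optimize run did not consolidate any files.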

Jens K. wrote:

> That doesn’t sound OK. Is the index usable? And did you do a rebuild
> that resulted in this index, or was it normal application usage?

The index is usable (although it doesn’t seem to be the fastest) and is a
direct result of Model.rebuild_index. The index wasn’t built up step by
step from application usage, but with a single rebuild_index from a
filled DB.

starburger

I have seen this issue with Ferret, too. Somehow a working index had
nearly 250,000 files, taking up 2.5 GB. Rebuilding the index brought the
count down to 900 files, taking up only 700 MB.
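To compare figures like these (file count and total size before and after a rebuild), a small stdlib-only Ruby helper can summarize an index directory. `index_stats` is a hypothetical name, not part of Ferret or acts_as_ferret; it only looks at regular, non-hidden files directly inside the given directory.

```ruby
# Hypothetical helper: number of regular files directly in index_dir
# (hidden files and subdirectories are ignored) and their total size.
def index_stats(index_dir)
  files = Dir.glob(File.join(index_dir, '*')).select { |p| File.file?(p) }
  bytes = files.sum { |p| File.size(p) }
  { files: files.size, bytes: bytes, megabytes: (bytes / (1024.0 * 1024)).round(1) }
end

# Usage: p index_stats('path/to/index')
```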

On Mon, Mar 19, 2007 at 04:25:08PM +0100, Star B. wrote:
> […]
>
> The index is usable (although it doesn’t seem to be the fastest) and is a
> direct result of Model.rebuild_index. The index wasn’t built up step by
> step from application usage, but with a single rebuild_index from a
> filled DB.

I guess optimizing the index didn’t solve the problem?

Jens



Jens K. wrote:

> I guess optimizing the index didn’t solve the problem?

Location.aaf_index.ferret_index.optimize

resulted in “=> nil” and took only a fraction of a second. The index
structure didn’t change.

On 3/20/07, Star B. [email protected] wrote:

>> I guess optimizing the index didn’t solve the problem?
>
> Most files (e.g. _j2.cfs) are listed as 1kb only.
>
> I’ve read in earlier posts that these might be temporary files that
> ferret couldn’t delete for some reason. Is there a way to fix that?

Could you send me a full listing of the directory privately, as well
as a copy of the segments_* file? That would be a big help in
debugging this problem.

Cheers,
Dave
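Gathering what is asked for here (a full directory listing plus the segments_* file) can be scripted. The sketch below uses only the Ruby standard library; `collect_index_report`, the `listing.txt` file name and the output layout are assumptions for illustration, not anything Ferret provides.

```ruby
require 'fileutils'

# Hypothetical helper: writes "size  name" for every entry in index_dir to
# out_dir/listing.txt and copies each segments_* file into out_dir.
# Returns the number of entries listed.
def collect_index_report(index_dir, out_dir)
  FileUtils.mkdir_p(out_dir)
  entries = Dir.children(index_dir).sort
  listing = entries.map do |name|
    path = File.join(index_dir, name)
    size = File.file?(path) ? File.size(path) : 0 # directories listed as 0
    format('%12d  %s', size, name)
  end
  File.write(File.join(out_dir, 'listing.txt'), listing.join("\n") + "\n")
  entries.grep(/\Asegments_/).each do |name|
    FileUtils.cp(File.join(index_dir, name), out_dir)
  end
  entries.size
end

# Usage: collect_index_report('path/to/index', 'index_report')
```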

Star B. wrote:


Location.aaf_index.ferret_index.optimize

resulted in “=> nil” and took only a fraction of a second. The index
structure didn’t change.

Again, the number of files was the result of a single indexing run:
Model.rebuild_index.

Most files (e.g. _j2.cfs) are listed as 1kb only.

I’ve read in earlier posts that these might be temporary files that
ferret couldn’t delete for some reason. Is there a way to fix that?
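To see how widespread the tiny files are, one can group the small files in the index directory by extension. This stdlib-only Ruby sketch uses a hypothetical helper, `small_files_by_extension`, with an assumed default threshold of 1 KB; it only reports what is on disk and cannot tell which files Ferret still references.

```ruby
# Hypothetical helper: counts regular files of at most max_bytes in
# index_dir, grouped by extension ("(none)" for names without one).
def small_files_by_extension(index_dir, max_bytes = 1024)
  counts = Hash.new(0)
  Dir.children(index_dir).each do |name|
    path = File.join(index_dir, name)
    next unless File.file?(path) && File.size(path) <= max_bytes
    ext = File.extname(name)
    counts[ext.empty? ? '(none)' : ext] += 1
  end
  counts
end

# Usage: p small_files_by_extension('path/to/index')
```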