Term frequency doesn't decrement after a document is deleted

Hey all,

The frequency count returned by my ferret reader doesn’t decrement
after I remove a document containing those terms. Using the example from
http://ferret.davebalmain.com/api/classes/Ferret/Index/TermEnum.html
the frequency increments after a document is added but stays the same
after a document is deleted.

index.reader.terms(:tags).each do |term, freq|
  puts "#{term} appears #{freq} times"
end

If I iterate through each document matched by terms_for I get the
correct frequency but I assume at a higher performance cost.

index.reader.terms(:tags).each do |term|
  # Walk every matching document; each returns the number visited.
  freq = index.reader.terms_for(:tags, term).each {}
  puts "#{term} appears #{freq} times"
end

I’m wondering if I’m plain just doing something wrong. I’m running the
gem version 0.11.6 (ruby) on i686-darwin9.1.0 and I can provide a unit
test if it’d help.

Cheers,
Shane.

Hi!

I’m not sure if this is the intended behaviour, so it might be a Ferret
bug indeed.

However you should get the correct term frequency again after
optimizing the index.

Cheers,
Jens

On Thu, Dec 06, 2007 at 10:47:09AM +1100, Shane Hanna wrote:





Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

Indeed looks like a bug. I’ve gone through a small hell recently
because of a similar issue =)

index.size also suffers from the same problem. Apparently the value of
num_docs (correct me if I have the name wrong) gets cached in
IndexReader, so calling it can return a count that is not necessarily
consistent with what’s actually in the index.

In this situation too, calling index.optimize before index.size solves
the problem.
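
For anyone following along, here is a toy Ruby model of why a stored
frequency could stay stale until an optimize. It is only a sketch built
on an assumption: that deletes merely mark a document and the stored
counts are rebuilt when the index is rewritten, as Lucene-style indexes
generally do. It is not Ferret's code, and ToyIndex with all of its
methods is hypothetical.

```ruby
# Toy model only: ToyIndex is hypothetical and is NOT Ferret's implementation.
# Assumption: deleting marks a document, and stored counts change only on optimize.
class ToyIndex
  def initialize
    @docs = {}                # doc_id => unique terms in that document
    @deleted = {}             # doc_id => true for documents marked as deleted
    @doc_freq = Hash.new(0)   # term => stored document frequency
    @next_id = 0
  end

  # Add a document and bump the stored frequency of each of its terms.
  def add(terms)
    id = @next_id
    @next_id += 1
    @docs[id] = terms.uniq
    @docs[id].each { |t| @doc_freq[t] += 1 }
    id
  end

  # Deleting only marks the document; the stored frequencies are untouched.
  def delete(id)
    @deleted[id] = true if @docs.key?(id)
  end

  # Cheap lookup of the stored count; may still include deleted documents.
  def stored_freq(term)
    @doc_freq[term]
  end

  # Accurate but slower: scan the documents and skip the deleted ones.
  def live_freq(term)
    @docs.count { |id, terms| !@deleted[id] && terms.include?(term) }
  end

  # Drop deleted documents and rebuild the stored counts from what is left.
  def optimize
    @docs.reject! { |id, _| @deleted[id] }
    @deleted.clear
    @doc_freq = Hash.new(0)
    @docs.each_value { |terms| terms.each { |t| @doc_freq[t] += 1 } }
    self
  end
end

index = ToyIndex.new
first = index.add(%w[ruby search])
index.add(%w[ruby])
index.delete(first)

puts "stored before optimize: #{index.stored_freq('ruby')}"
puts "live before optimize:   #{index.live_freq('ruby')}"
index.optimize
puts "stored after optimize:  #{index.stored_freq('ruby')}"
```

In this model the stored count is the cheap number and the live count
is the per-document walk, which matches the trade-off Shane describes
above.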