Index browser inconsistent with IndexReader



Hi,

We have an index of around 1M web pages as part of our web app. The
app uses Ferret by way of RDig to perform searches. We have noticed
anecdotally that some searches don’t return the results we expect,
as if documents were missing from the index. Yesterday we
came upon a concrete instance of this.

Our documents have several fields, one of which is called :keywords
and another called :data, both of which are used for searching. We
isolated a single document that is not found on the web app by terms
in the :data field, but which can be found by the terms in its
:keywords field.

We assumed first that a problem occurred in the indexing which
resulted in the :data field being lost. However, the index browser
that’s included with version 0.11.4 showed the document with all its
fields intact, including the :data field. All the :data field terms
that failed to retrieve the document on the web app were indeed
present, according to the browser.

We then wrote a short script against the API that instantiated an
IndexReader and called its term_vectors() method with the id of our
subject doc. The term vectors returned included a vector for
:keywords, but not for :data.
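
A minimal sketch of this kind of check, for anyone who wants to
reproduce it. It is not the exact script we ran. The helper name
field_presence is hypothetical, and the sketch assumes that
term_vectors(doc_id) returns a hash keyed by field-name symbols and
that reader[doc_id] returns a hash-like document of stored field
values. The index path and doc id in the usage comment are
placeholders.

```ruby
# Hypothetical helper: for one document, report per field whether a
# stored value is visible and whether a term vector is returned.
# `reader` is expected to behave like Ferret::Index::IndexReader
# (assumption: it responds to #term_vectors(doc_id) and #[](doc_id)).
def field_presence(reader, doc_id, fields)
  vectors = reader.term_vectors(doc_id) || {}
  doc     = reader[doc_id]

  fields.map do |field|
    {
      :field       => field,
      :stored      => !doc[field].nil?,     # value readable from the document
      :term_vector => vectors.key?(field)   # vector present for this field
    }
  end
end

# Usage against a real index (placeholders, not run here):
#
#   require 'rubygems'
#   require 'ferret'
#   reader = Ferret::Index::IndexReader.new('/path/to/index')
#   field_presence(reader, doc_id, [:keywords, :data]).each do |row|
#     puts row.inspect
#   end
#   reader.close
```

A row with a stored value but no term vector would match what we are
seeing for :data: the content is visible when the document is loaded,
yet no vector comes back for that field.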

Somehow the core API functions are not finding this document’s :data
field when the 0.11.4 browser is. Are there differences between the
two that would explain this? Does this problem description ring a bell
with anyone out there?

Many thanks.