Balancing relevancy and recentness


#1

I was wondering if there is a good way to either balance the relevancy
score against the recentness of matching documents, or include the
recentness in the score somehow?

Thanks,
Ben


#2

Hi Ben,

Currently there is no way to do this. You can easily sort by the age
of the document, but scoring by the age of the document is not
possible without making a change to Ferret. Mark J. came up with
this idea recently:

One other mod to Ferret I’ve found useful is to add
the following line at the top of the each_hit() block
in Search::IndexSearcher.search:

 score = yield( doc, score ) if block_given?

This allows a block attached to a search call to adjust
document scores before documents are sorted, based on
some (possibly dynamic) numerical factors associated
with the document, e.g. the number and importance

With this change you’d be able to modify the score based on the age of
the document. Hope that helps.
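
To make the idea concrete, here is a minimal sketch in plain Ruby of what such a block might compute. It does not call Ferret at all: `rerank` is a hypothetical stand-in for the patched search loop, `recency_factor` and its 30-day half-life are assumptions chosen for illustration, and the hits are made-up hashes with a `:date` key.

```ruby
require "date"

# Hypothetical helper: exponential decay with a configurable half-life.
# Returns 1.0 for a brand-new document and 0.5 at one half-life.
def recency_factor(age_days, half_life_days = 30.0)
  0.5**(age_days / half_life_days)
end

# Stand-in for the patched search loop: each hit is passed to the block,
# which may return an adjusted score, and hits are sorted afterwards.
# This is NOT Ferret's API; it only mimics the hook described above.
def rerank(hits)
  adjusted = hits.map do |doc, score|
    score = yield(doc, score) if block_given?
    [doc, score]
  end
  adjusted.sort_by { |_doc, score| -score }
end

today = Date.new(2006, 1, 23)
hits = [
  [{ id: 1, date: Date.new(2005, 1, 23) }, 0.9], # strong match, a year old
  [{ id: 2, date: Date.new(2006, 1, 20) }, 0.6]  # weaker match, 3 days old
]

# The block scales each raw score by how recent the document is.
ranked = rerank(hits) do |doc, score|
  age_days = (today - doc[:date]).to_i
  score * recency_factor(age_days)
end
```

With these numbers the recent but weaker match ends up ahead of the older one. A longer half-life would shift the balance back toward raw relevancy.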

Cheers,
Dave


#3

As long as Ferret does what Lucene does with boosts, you could scale
document boosts at indexing time by some factor related to age and
that will factor into scoring. Right, Dave?

For a real-world example of this, look at TheServerSide case study in
“Lucene in Action” and online here:

<http://www.theserverside.com/articles/article.tss?l=ILoveLucene>

(search for “boost” to home in on the specific topic)

Erik

#4

On 1/23/06, Erik H. wrote:

As long as Ferret does what Lucene does with boosts, you could scale
document boosts at indexing time by some factor related to age and
that will factor into scoring. Right, Dave?

Sorry for the slow reply. Sure, you could do this. Ferret handles
boosts in exactly the same way.
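
For the index-time approach, the boost itself might be computed along these lines. This is only a sketch in plain Ruby: `age_boost`, the one-year window, and the 1.0 to 2.0 range are assumptions made for illustration, and attaching the value to a Ferret document is left out because the exact call depends on the Ferret version.

```ruby
require "date"

# Hypothetical helper: map a document's age at indexing time to a boost
# between min_boost and max_boost, falling off linearly over window_days.
def age_boost(doc_date, indexed_on, window_days: 365, min_boost: 1.0, max_boost: 2.0)
  # Clamp so future-dated and very old documents stay inside the range.
  age_days = (indexed_on - doc_date).to_i.clamp(0, window_days)
  freshness = 1.0 - age_days.to_f / window_days
  min_boost + (max_boost - min_boost) * freshness
end

indexed_on = Date.new(2006, 1, 23)
fresh_boost = age_boost(Date.new(2006, 1, 23), indexed_on) # written today
stale_boost = age_boost(Date.new(2004, 1, 1), indexed_on)  # older than the window
```

One thing to keep in mind with this approach is that the boost is fixed when the document is indexed, so it reflects age as of that day. If recency relative to the current date matters, the documents would presumably need to be reindexed from time to time.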