I have a general question about using a Ferret/Lucene index for
grouping results. I am not sure how much of the heavy lifting the
index can do for me, so I would appreciate any input. I am using
Ferret to index some objects that have the following properties:
url, image_url, price, tags (space separated tags), created_at
I would like to search the index for any documents that match a specific
tag. These results will be processed as follows:
Each URL must be unique in the results. If there are duplicates, I
would like to merge the results using some fuzzy merge criteria.
Ideally, this merge would take the most common occurrence of each of
the properties and apply them to the final single result.
My current plan is to run a standard search sorted by URL, and then
manually apply the merge logic to each group of duplicate URLs.
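To make the question concrete, here is a rough sketch of what I mean by
the merge step, assuming the search has already returned hits as plain
hashes (the field names and sample data here are just placeholders, not
real index output):

```ruby
# Hypothetical hits, as if returned from a tag search and sorted by URL.
hits = [
  { url: "http://example.com/a", price: 10, tags: "red sale" },
  { url: "http://example.com/a", price: 10, tags: "red" },
  { url: "http://example.com/a", price: 12, tags: "red sale" },
  { url: "http://example.com/b", price: 5,  tags: "blue" },
]

# Group hits by URL, then for each group keep the most common value of
# every property -- a simple majority-vote version of the "fuzzy merge".
def merge_hits(hits)
  hits.group_by { |h| h[:url] }.map do |url, group|
    merged = { url: url }
    %i[price tags].each do |field|
      # Tally the values seen for this field and pick the most frequent.
      merged[field] = group.map { |h| h[field] }
                           .tally
                           .max_by { |_value, count| count }
                           .first
    end
    merged
  end
end

merged = merge_hits(hits)
# => one record per URL, each field set to its most common value
```

This is obviously the brute-force version done entirely outside the
index, which is exactly why I am wondering how much of it Ferret/Lucene
could do for me instead.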
Does this sound reasonable?