List of terms matched by a query (and their position/offset)

Karl_Meisterheim · October 31, 2008, 4:00pm

Hi Jens,

Thanks for the reply.

What you say makes sense, but I’m hoping for a simpler implementation.
I guess it boils down to this:
When I conduct a search in Ferret / AAF, I get an array of documents
back. Somehow, the highlight method knows where the terms that were
matched by the search occur in each document's fields. Is there
any way that I can get that information? I looked through the API and
even the source and, unfortunately, couldn’t quite grok how it was
happening.

This would allow me to do the following: I can chunk together several
distinct pieces of information into one field for the purpose of
indexing. Then, if I know which terms were matched and their positions
in the field, I can use that information to figure out which piece of
information in that single field each match came from.

For example, if my document has six figures, I take the caption text
of those six figures and concatenate them all together and index them
in one field, captions. Then, when the document comes back as a
match, if I knew that the term “empire” matched whatever search query
was used, and that the offset was 100 in my captions field, I could
piece together which of the original illustrations it belongs to.
Again, this is all necessary because I don’t know in advance how many
figures a given document may contain.
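
To make the idea concrete, here is a rough sketch in plain Ruby of the
bookkeeping I have in mind. It does not call Ferret at all: the helper
names (build_captions_field, caption_index_for) and the newline
separator are made up for illustration, and it assumes the offset I get
back is a character offset into the concatenated string. If the index
reports byte offsets instead, the lengths would need to use bytesize.

```ruby
# Hypothetical separator placed between captions in the single field.
SEPARATOR = "\n"

# Concatenate the captions into one string and record the offset at
# which each caption starts. Returns [field_text, start_offsets].
def build_captions_field(captions)
  starts = []
  offset = 0
  captions.each do |caption|
    starts << offset
    offset += caption.length + SEPARATOR.length
  end
  [captions.join(SEPARATOR), starts]
end

# Map an offset in the concatenated field back to the index of the
# caption it falls in: the last caption whose start is <= offset.
# Returns nil for a negative offset.
def caption_index_for(starts, offset)
  starts.rindex { |start| start <= offset }
end

captions = [
  "The Roman empire at its height",
  "Trade routes",
  "Fall of the empire"
]
field, starts = build_captions_field(captions)

# Stand-in for the offset the search engine would report for a match.
match_offset = field.index("empire", starts[2])
figure = caption_index_for(starts, match_offset)
puts "match at offset #{match_offset} belongs to figure #{figure + 1}"
```

The start offsets would have to be stored alongside the document (or
recomputed from the original captions) so the lookup can be done when
the search results come back.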

I know this sounds overly complicated, but I think it’s easier than
creating a new model, a separate index and then having to search
multiple times etc.

Does this make sense, or am I going about this completely wrong?

Thank you,

-km