Custom highlighter/match vector access?

luislavena · February 23, 2011, 5:16pm

Hi everyone,

I know from the archives that things have slowed down on Ferret and
that there’s an ongoing effort with Lucy, but I was wondering whether
anyone has found a way to enumerate the matches in a particular field of
a document and get their offsets?

With what I’m trying to do, Ferret will be indexing large amounts of
structured information, but I really don’t want to store it all in the
Ferret index just to get highlighting. My understanding (I’m still new
at this) is that if you record the term offsets at index time, you can
locate the matches later without storing the full text of the field.

Ideally, what I’d like is to expose the contents of the C MatchRange
structure as an array of Ruby hash objects so that I could then use
those offsets in the actual data store to create my own highlighted
extracts (or something along those lines).
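
To make that concrete, here is a rough sketch of what I would do with the
offsets once I had them. The hash shape ({ :start, :end }, end exclusive,
indexing into the original string) and the highlight_extract helper are
my own assumptions for illustration; nothing in the Ruby API returns this
today as far as I can tell.

```ruby
# Build a highlighted extract from text held in an external data store.
#
# text    - the full field value, fetched from the real data store
# matches - array of hashes like { :start => 4, :end => 9 } (assumed shape,
#           end exclusive); this is what I would like Ferret to expose
# pre/post - markers wrapped around each match
# context  - characters of surrounding text kept before the first match
#            and after the last one
def highlight_extract(text, matches, pre = "<b>", post = "</b>", context = 20)
  return "" if matches.empty?

  sorted = matches.sort_by { |m| m[:start] }
  from   = [sorted.first[:start] - context, 0].max
  to     = [sorted.last[:end] + context, text.length].min

  parts  = []
  cursor = from
  sorted.each do |m|
    next if m[:start] < cursor # skip ranges overlapping an earlier match
    parts << text[cursor...m[:start]]
    parts << pre << text[m[:start]...m[:end]] << post
    cursor = m[:end]
  end
  parts << text[cursor...to]
  parts.join
end

# Example call with made-up offsets:
text    = "the quick brown fox jumps"
matches = [{ :start => 4, :end => 9 }, { :start => 16, :end => 19 }]
puts highlight_extract(text, matches, "<b>", "</b>", 4)
```

This treats the offsets as character positions. If the index reports byte
offsets instead, the slicing would need to go through byteslice for
non-ASCII text.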

Short of adding a hacked version of searcher_highlight to the C API to
do this and creating a corresponding wrapped Ruby version, is there any
way to get to this information right now from the Ruby API?

Alternatively, is there another/better way to do this besides storing
the whole field values and using the built-in highlighter?

Any advice or pointers would be really appreciated.

Cheers,

ast

Andrew S. Townley [email protected]
http://atownley.org