Grep style output?

kastorilo · June 13, 2006, 7:49am

Hi All,

Hope all is going well. Was just wondering if anyone has implemented a
grep style output page of hits using Ferret as the index/query engine?

Any thoughts about how best to implement it? The previous thread
discusses highlighting - would that be the best approach to follow or
is there a better way?
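
To make the question concrete, here is a minimal sketch of what grep-style output could look like once the search has already returned the matching documents. It is plain Ruby and makes no Ferret calls; `grep_hits`, its arguments, and the `**` marker are hypothetical names chosen for illustration, and it matches raw words rather than analyzed index terms.

```ruby
# Sketch only: grep-style listing for documents already returned by a search.
# Assumption: docs is a Hash of { name => stored text } and terms are plain words.
def grep_hits(docs, terms, marker: "**")
  # One case-insensitive, whole-word pattern covering every term.
  pattern = Regexp.union(terms.map { |t| /\b#{Regexp.escape(t)}\b/i })
  lines = []
  docs.each do |name, text|
    text.each_line.with_index(1) do |line, lineno|
      next unless line =~ pattern
      # Wrap each matched word in the marker, keeping its original case.
      highlighted = line.chomp.gsub(pattern) { |m| "#{marker}#{m}#{marker}" }
      lines << "#{name}:#{lineno}: #{highlighted}"
    end
  end
  lines
end

docs = { "notes.txt" => "Ferret is a search library.\nNothing to see here.\nIt has a Ruby API.\n" }
puts grep_hits(docs, ["ferret", "ruby"])
```

Each output line has the form `name:line number: text`, as grep does with `-n`. A highlighter ported from Lucene would work from the analyzed token stream instead, so stemmed or fuzzy matches that this sketch misses would still be marked.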

Cheers,

Marcus

kastorilo · June 17, 2006, 3:27am

Hi Marcus,

If you can read Java, the best way would be to check out the
highlighter in Apache Lucene and port that code to Ruby. You can
see the highlighter module here:

http://svn.apache.org/viewvc/lucene/java/trunk/contrib/

I’m going to do this myself eventually but you’ll have to do it
yourself if you need it soon. Before you put too much work into it
though, be warned that there are possible major Ferret API changes
ahead.

Cheers,
Dave

kastorilo · June 21, 2006, 8:00am

Hi David,

Thanks for your response.

I noticed that you referenced the Lucene highlighter in a previous post,
and I have already started porting it to Ferret. I'm quite a way along
and have the first 3 test cases passing properly (i.e. simple and fuzzy
fragments), and I will continue getting the rest of the test cases to
work.

Hopefully the API changes won't break too much, then.

I'll post the code once it's all working, hopefully within the next
few days.

Cheers,

Marcus

kastorilo · June 21, 2006, 12:35pm

That’d be great. The new API shouldn’t be too hard to adjust to. I’ll
be implementing the highlighter in C rather than in Ruby so I’ll be
interested to see how you go with it.

The main difference in the API is that you won’t specify the store,
index and term_vector parameters per document field any more. This
option will still be available but the behaviour will be slightly
different. I’ll go into more detail later.

Cheers,
Dave

kastorilo · June 21, 2006, 3:55pm

How close is what you’re going to be doing to the Lucene contrib
highlighter?

FWIW, the KinoSearch Highlighter uses similar techniques for adding
tags and encoding, but the excerpt selection is pretty different. No
TokenStream required, it uses a heat map. Right now it requires that
the field have term vectors stored with positions and offsets, but it
could be adapted to generate the vectors by re-analyzing.
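
As a rough illustration of the heat-map idea (not KinoSearch's actual code), the sketch below scores every character position by how many hits cover it, then slides a fixed-width window across the text and keeps the hottest one. The name `best_excerpt`, its parameters, and the `<b>` tags are hypothetical; the `hits` argument stands in for character offsets that would be read from term vectors, and hits are assumed to lie inside the text and not overlap.

```ruby
# Sketch only: heat-map excerpt selection in plain Ruby.
# Assumption: hits is an Array of [start_offset, end_offset] character ranges
# (end exclusive), such as might be read from term vectors stored with offsets.
def best_excerpt(text, hits, width: 40, open_tag: "<b>", close_tag: "</b>")
  return text[0, width].to_s if hits.empty?

  # Heat map: each character covered by a hit gets one point per hit.
  heat = Array.new(text.length, 0)
  hits.each { |s, e| (s...e).each { |i| heat[i] += 1 } }

  # Slide a fixed-width window over the text and keep the hottest one.
  width = [width, text.length].min
  score = heat[0, width].sum
  best_score = score
  best_start = 0
  (1..text.length - width).each do |start|
    score += heat[start + width - 1] - heat[start - 1]
    if score > best_score
      best_score = score
      best_start = start
    end
  end

  # Tag hits that fit entirely inside the window, back to front so that
  # earlier offsets stay valid while the string grows.
  excerpt = text[best_start, width].dup
  inside = hits.select { |s, e| s >= best_start && e <= best_start + width }
  inside.sort.reverse_each do |s, e|
    excerpt.insert(e - best_start, close_tag)
    excerpt.insert(s - best_start, open_tag)
  end
  excerpt
end

text = "The quick brown fox jumps over the lazy dog near the quick river."
start = text.index("lazy dog")
puts best_excerpt(text, [[start, start + "lazy dog".length]], width: 30)
```

Because a hit is just an offset range, a phrase match can be passed as one range spanning the whole phrase, so it is scored and tagged as a unit rather than word by word.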

The principal advantage it has over the Lucene Highlighter is that it
handles phrases properly:

http://xrl.us/nm2z (Link to www.lucenebook.com)
http://xrl.us/nm25 (Link to www.rectangular.com)

Whatever algorithm we choose for Lucy, I hope it will meet that
constraint.

Highlighter.pm isn't that long (384 lines including docs) and if I
didn't have serious deadlines bearing down, doing a Ruby version
would be a great exercise for me. If you or Marcus want to check it
out, the new version's only in subversion:

http://xrl.us/nm28 (Link to www.rectangular.com)

Marvin H.
Rectangular Research
http://www.rectangular.com/

kastorilo · June 21, 2006, 6:08pm

On 6/21/06, Marvin H. wrote:

How close is what you’re going to be doing to the Lucene contrib
highlighter?

Well I haven’t actually started it yet so we’ll see.

Cool, I’ll definitely check this out. Thanks Marvin.