Grep style output?


#1

Hi All,

Hope all is going well. Was just wondering if anyone has implemented a
grep style output page of hits using Ferret as the index/query engine?

Any thoughts about how best to implement it? The previous thread
discusses highlighting - would that be the best approach to follow or
is there a better way?
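
To pin down what "grep style" means here: one output line per matching line of a hit, prefixed with its source and line number. Below is a minimal sketch in plain Ruby, assuming the text of each hit is available as a string (for example from a stored field). `grep_style_lines` is a hypothetical helper, not a Ferret call, and the whole-word, case-insensitive matching is only a stand-in for whatever the index analyzer really does.

```ruby
# Hypothetical helper, not part of Ferret: given the text of one hit and
# the query terms, return grep-style "path:line_no:line" strings.
def grep_style_lines(path, text, terms)
  return [] if terms.empty?

  # Match any query term as a whole word, ignoring case.
  alternatives = terms.map { |t| Regexp.escape(t) }.join("|")
  pattern = /\b(?:#{alternatives})\b/i

  hits = []
  text.each_line.with_index(1) do |line, line_no|
    hits << "#{path}:#{line_no}:#{line.chomp}" if line =~ pattern
  end
  hits
end

text = "Ferret is a search library\nIt is written in C and Ruby\nNothing here\n"
puts grep_style_lines("doc.txt", text, ["ferret", "ruby"])
```

Running the highlighter over each returned line instead of the whole document would then give highlighted grep-style output.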

Cheers,

Marcus


#2

On 6/13/06, Marcus C. removed_email_address@domain.invalid wrote:

Hi Marcus,

If you can read Java, the best way would be to check out the
highlighter in Apache Lucene and port that code to Ruby. You can
see the highlighter module here:

http://svn.apache.org/viewvc/lucene/java/trunk/contrib/

I’m going to do this myself eventually but you’ll have to do it
yourself if you need it soon. Before you put too much work into it
though, be warned that there are possible major Ferret API changes
ahead.

Cheers,
Dave


#3

Hi David,

Thanks for your response.

I noticed in a previous post that you referenced the Lucene highlighter,
and I have already started porting it to Ferret. I’m already quite a ways
along and have got the first 3 test cases passing properly (i.e. simple
and fuzzy fragments) and will continue with getting the rest of the test
cases to work.

Hopefully the API changes don’t break too much then :)

I’ll post the code once it’s all working, hopefully within the next
few days.

Cheers,

Marcus


#4

On 6/21/06, Marcus C. removed_email_address@domain.invalid wrote:

That’d be great. The new API shouldn’t be too hard to adjust to. I’ll
be implementing the highlighter in C rather than in Ruby so I’ll be
interested to see how you go with it.

The main difference in the API is that you won’t specify the store,
index and term_vector parameters per document field any more. This
option will still be available but the behaviour will be slightly
different. I’ll go into more detail later.

Cheers,
Dave


#5

On Jun 21, 2006, at 3:32 AM, David B. wrote:

How close is what you’re going to be doing to the Lucene contrib
highlighter?

FWIW, the KinoSearch Highlighter uses similar techniques for adding
tags and encoding, but the excerpt selection is pretty different. No
TokenStream required, it uses a heat map. Right now it requires that
the field have term vectors stored with positions and offsets, but it
could be adapted to generate the vectors by re-analyzing.
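
As a rough illustration of the heat-map idea (this is not KinoSearch's actual code): take the character offsets of the matched terms, as term vectors with offsets would supply them, give every covered character a point, and pick the excerpt window with the most heat. The name `best_excerpt`, the fixed window width, and the scoring rule are all assumptions made for the sketch.

```ruby
# Illustrative only: a simple stand-in for heat-map excerpt selection,
# not KinoSearch's algorithm. `offsets` holds [start, end) character
# offsets of matched terms, such as stored term vectors might provide.
def best_excerpt(text, offsets, width = 40)
  return text[0, width] if offsets.empty?

  # Build the heat map: each character covered by a match scores a point.
  heat = Array.new(text.length, 0)
  offsets.each { |s, e| (s...e).each { |i| heat[i] += 1 } }

  # Candidate windows start at each match; keep the hottest one.
  best_start = offsets.map(&:first).max_by { |start| heat[start, width].sum }
  text[best_start, width]
end

text = "foo " + "x" * 50 + " foo bar foo"
offsets = []
text.scan(/foo/) { offsets << [$~.begin(0), $~.end(0)] }
puts best_excerpt(text, offsets, 20)
```

Because the window is chosen from offsets alone, nothing here needs a TokenStream, which matches the point above about working from term vectors.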

The principal advantage it has over the Lucene Highlighter is that it
handles phrases properly:

http://xrl.us/nm2z (Link to www.lucenebook.com)
http://xrl.us/nm25 (Link to www.rectangular.com)

Whatever algorithm we choose for Lucy, I hope it will meet that
constraint.
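
To make the phrase constraint concrete: a term-by-term highlighter marks every occurrence of each word in the phrase, while a phrase-aware one marks words only where the whole phrase occurs in order. The sketch below is plain Ruby with whitespace splitting standing in for a real analyzer; `highlight_phrase` is a hypothetical name, and punctuation attached to a word will prevent a match.

```ruby
# Hypothetical sketch: whitespace tokens stand in for analyzer output.
# Wraps tokens in <b>...</b> only where the full phrase occurs in order.
def highlight_phrase(text, phrase)
  tokens = text.split
  words  = phrase.downcase.split
  return text if words.empty?

  marked = Array.new(tokens.length, false)

  # Slide over token positions and mark each complete, in-order match.
  tokens.each_cons(words.length).with_index do |window, pos|
    next unless window.map(&:downcase) == words
    words.length.times { |i| marked[pos + i] = true }
  end

  tokens.each_with_index.map { |tok, i| marked[i] ? "<b>#{tok}</b>" : tok }.join(" ")
end

puts highlight_phrase("the quick fox saw a quick brown fox", "quick brown")
# prints: the quick fox saw a <b>quick</b> <b>brown</b> fox
```

Note that the first "quick" is left alone because "brown" does not follow it, which is the behaviour a per-term highlighter gets wrong.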

Highlighter.pm isn’t that long (384 lines including docs) and if I
didn’t have serious deadlines bearing down, doing a Ruby version
would be a great exercise for me. If you or Marcus want to check it
out, the new version’s only in subversion:

http://xrl.us/nm28 (Link to www.rectangular.com)

Marvin H.
Rectangular Research
http://www.rectangular.com/


#6

On 6/21/06, Marvin H. removed_email_address@domain.invalid wrote:

How close is what you’re going to be doing to the Lucene contrib
highlighter?

Well I haven’t actually started it yet so we’ll see.

Highlighter.pm isn’t that long (384 lines including docs) and if I
didn’t have serious deadlines bearing down, doing a Ruby version
would be a great exercise for me. If you or Marcus want to check it
out, the new version’s only in subversion:

http://xrl.us/nm28 (Link to www.rectangular.com)

Cool, I’ll definitely check this out. Thanks Marvin.