Forum: Ferret Grep style output?

Marcus Crafter (Guest)
on 2006-06-13 07:49
Hi All,

Hope all is going well. Was just wondering if anyone has implemented a
grep-style output page of hits using Ferret as the index/query engine?

Any thoughts about how best to implement it? The previous thread
discusses highlighting; would that be the best approach to follow, or
is there a better way?
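As a rough illustration of the grep-style output being asked about, here is a minimal plain-Ruby sketch. It assumes you have already pulled the stored text of a matching document out of the index and have the query terms as a simple array of strings. It does not call Ferret at all, `grep_hits` is a hypothetical helper name, and the case-insensitive whole-word regexp is only a stand-in for Ferret's own analysis.

```ruby
# Hypothetical helper (not part of Ferret): return grep-style "lineno: text"
# strings for every line of `text` containing one of `terms`.
# Matching is case-insensitive and whole-word only; matched words are
# wrapped in [brackets], a bit like grep --color.
def grep_hits(text, terms)
  pattern = Regexp.union(terms.map { |t| /\b#{Regexp.escape(t)}\b/i })
  hits = []
  text.each_line.with_index(1) do |line, lineno|
    line = line.chomp
    next unless line.match?(pattern)
    marked = line.gsub(pattern) { |word| "[#{word}]" }
    hits << "#{lineno}: #{marked}"
  end
  hits
end

doc_text = "Ferret is fast\nnothing here\nA ferret search engine\n"
puts grep_hits(doc_text, ["ferret"])
# prints:
# 1: [Ferret] is fast
# 3: A [ferret] search engine
```

Whole lines work well for line-oriented text such as source code or logs; for prose, a highlighter that picks fixed-size fragments around the hits is usually the better fit.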

Cheers,

Marcus
David Balmain (Guest)
on 2006-06-17 03:27
(Received via mailing list)
On 6/13/06, Marcus Crafter <crafterm@gmail.com> wrote:
>
> Marcus

Hi Marcus,

If you can read Java, the best way would be to check out the
highlighter in Apache Lucene and port that code to Ruby. You can
see the highlighter module here:

    http://svn.apache.org/viewvc/lucene/java/trunk/contrib/

I'm going to do this myself eventually, but you'll have to do it
yourself if you need it soon. Before you put too much work into it,
though, be warned that there may be major Ferret API changes
ahead.
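The Lucene contrib highlighter is organised around three pieces: a fragmenter that cuts the text into fragments, a scorer that rates each fragment against the query, and a formatter that marks up the matched terms. The Ruby below is a heavily simplified illustration of that split, not a port of the Java code: `best_fragments`, its keyword arguments, the `\w+` tokenizer and the scoring rule (number of distinct query terms per fragment) are all assumptions made for the example.

```ruby
# Simplified, illustrative highlighter loosely following the Lucene
# fragmenter / scorer / formatter split. Not a port of the Java code.
def best_fragments(text, terms, fragment_size: 40, max_fragments: 2)
  wanted = terms.map(&:downcase)

  # Tokenize: [start_offset, end_offset] for every word.
  tokens = []
  text.scan(/\w+/) do
    m = Regexp.last_match
    tokens << [m.begin(0), m.end(0)]
  end

  # Fragmenter: start a new fragment once the current one would span
  # more than fragment_size characters.
  fragments = []
  current = []
  tokens.each do |tok|
    if !current.empty? && tok[1] - current.first[0] > fragment_size
      fragments << current
      current = []
    end
    current << tok
  end
  fragments << current unless current.empty?

  # Scorer: number of distinct query terms in each fragment.
  scored = fragments.each_with_index.map do |frag, i|
    words = frag.map { |s, e| text[s...e].downcase }
    [(words & wanted).size, i, frag]
  end

  # Keep the top-scoring fragments, then restore document order.
  best = scored.select { |score, _, _| score > 0 }
               .sort_by { |score, i, _| [-score, i] }
               .first(max_fragments)
               .sort_by { |_, i, _| i }

  # Formatter: copy each fragment, wrapping query terms in <b>...</b>.
  best.map do |_, _, frag|
    snippet = +""
    pos = frag.first[0]
    frag.each do |s, e|
      word = text[s...e]
      snippet << text[pos...s]
      snippet << (wanted.include?(word.downcase) ? "<b>#{word}</b>" : word)
      pos = e
    end
    snippet
  end
end

text = "Ferret is a search library for Ruby. It was ported from Lucene. " \
       "Highlighting in Ferret is handy."
puts best_fragments(text, ["ferret", "lucene"])
```

For that sample text this returns two snippets in document order, `<b>Ferret</b> is a search library for Ruby. It` and `was ported from <b>Lucene</b>. Highlighting in`. Keeping the three stages separate is what makes it easy to swap in a different fragmenter or scoring rule later.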

Cheers,
Dave
Marcus Crafter (Guest)
on 2006-06-21 08:00
David Balmain wrote:
> On 6/13/06, Marcus Crafter <crafterm@gmail.com> wrote:
> Hi Marcus,
>
> If you can read Java, the best way would be to check out the
> highlighter in Apache Lucene and port that code to Ruby. You can
> see the highlighter module here:
>
>     http://svn.apache.org/viewvc/lucene/java/trunk/contrib/
>
> I'm going to do this myself eventually, but you'll have to do it
> yourself if you need it soon. Before you put too much work into it,
> though, be warned that there may be major Ferret API changes
> ahead.

Hi David,

Thanks for your response.

I noticed in a previous post that you referenced the Lucene highlighter,
and I have already started porting it to Ferret. I'm quite a way along
and have the first 3 test cases passing properly (i.e. simple and fuzzy
fragments); I'll continue with getting the rest of the test cases to
work.

Hopefully the API changes don't break too much then :)

I'll post the code once it's all working, hopefully within the next
few days.

Cheers,

Marcus
David Balmain (Guest)
on 2006-06-21 12:35
(Received via mailing list)
On 6/21/06, Marcus Crafter <crafterm@gmail.com> wrote:
> > I'm going to do this myself eventually but you'll have to do it
> along and have got the first 3 test cases passing properly (ie. simple
> Marcus

That'd be great. The new API shouldn't be too hard to adjust to. I'll
be implementing the highlighter in C rather than in Ruby, so I'll be
interested to see how you go with it.

The main difference in the API is that you won't need to specify the
store, index and term_vector parameters on every document field any
more. That option will still be available, but the behaviour will be
slightly different. I'll go into more detail later.

Cheers,
Dave
Marvin Humphrey (Guest)
on 2006-06-21 15:55
(Received via mailing list)
On Jun 21, 2006, at 3:32 AM, David Balmain wrote:

> I'll
> be implementing the highlighter in C rather than in Ruby so I'll be
> interested to see how you go with it.
>
> The main difference in the API is that you won't specify the store,
> index and term_vector parameters per document field any more. This
> option will still be available but the behaviour will be slightly
> different. I'll go into more detail later.

How close is what you're going to be doing to the Lucene contrib
highlighter?

FWIW, the KinoSearch Highlighter uses similar techniques for adding
tags and encoding, but the excerpt selection is pretty different. No
TokenStream is required; it uses a heat map instead. Right now it
requires that the field have term vectors stored with positions and
offsets, but it could be adapted to generate the vectors by re-analyzing.
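To make the heat-map idea concrete, here is one simple way it could work in Ruby. This is an illustration under stated assumptions, not KinoSearch's implementation: the term vector is assumed to be a plain hash mapping each term to `[start_offset, end_offset]` pairs (standing in for a field stored with positions and offsets), each hit spreads linearly decaying heat over a hypothetical `radius` of characters, and the excerpt is the fixed-length window with the most total heat. `heat_map` and `best_excerpt_start` are hypothetical names.

```ruby
# Assumed input shape: { "term" => [[start_offset, end_offset], ...] },
# standing in for a stored term vector with offsets.
# Each hit adds heat that falls off linearly to zero at `radius` chars.
def heat_map(text_length, term_vector, radius: 20)
  heat = Array.new(text_length, 0.0)
  term_vector.each_value do |offsets|
    offsets.each do |s, e|
      center = (s + e) / 2.0
      lo = [(center - radius).floor, 0].max
      hi = [(center + radius).ceil, text_length - 1].min
      (lo..hi).each do |i|
        dist = (i - center).abs
        heat[i] += (radius - dist) / radius if dist < radius
      end
    end
  end
  heat
end

# Slide a window of excerpt_length chars over the heat map and return
# the start offset of the hottest window (earliest one on ties).
def best_excerpt_start(heat, excerpt_length)
  return 0 if heat.size <= excerpt_length
  window = heat[0, excerpt_length].sum
  best_sum = window
  best_start = 0
  (1..heat.size - excerpt_length).each do |start|
    window += heat[start + excerpt_length - 1] - heat[start - 1]
    if window > best_sum
      best_sum = window
      best_start = start
    end
  end
  best_start
end

term_vector = { "ferret" => [[20, 26]], "ruby" => [[150, 154]], "lucene" => [[160, 166]] }
start = best_excerpt_start(heat_map(200, term_vector), 40)
puts "excerpt covers offsets #{start}...#{start + 40}"
```

With one hit near the start of a 200-character field and two hits about ten characters apart near the end, the chosen window lands on the pair, because their heat overlaps and adds up. That preference for clusters of hits, rather than for the first hit, is the main point of scoring with a heat map.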

The principal advantage it has over the Lucene Highlighter is that it
handles phrases properly:

    http://xrl.us/nm2z (Link to www.lucenebook.com)
    http://xrl.us/nm25 (Link to www.rectangular.com)

Whatever algorithm we choose for Lucy, I hope it will meet that
constraint.
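The phrase problem can be shown with a small sketch: a term-by-term highlighter marks every occurrence of each word in a phrase query, whereas a phrase-aware one only marks places where the words occur at consecutive positions. The Ruby below illustrates the second behaviour under simple assumptions (a `\w+` tokenizer, case folding, a single non-empty phrase); `phrase_spans` and `highlight_phrases` are hypothetical names, and this is not how KinoSearch or Lucene implement it.

```ruby
# Return [start_offset, end_offset] spans for each place where the
# phrase terms appear as consecutive tokens (case-insensitive).
def phrase_spans(text, phrase_terms)
  phrase = phrase_terms.map(&:downcase)
  tokens = [] # [downcased word, start_offset, end_offset]; index = position
  text.scan(/\w+/) do
    m = Regexp.last_match
    tokens << [m[0].downcase, m.begin(0), m.end(0)]
  end
  spans = []
  tokens.each_cons(phrase.size) do |window|
    next unless window.map(&:first) == phrase
    spans << [window.first[1], window.last[2]]
  end
  spans
end

# Wrap each whole phrase occurrence in <b>...</b>; lone terms stay plain.
def highlight_phrases(text, phrase_terms)
  result = +""
  pos = 0
  phrase_spans(text, phrase_terms).each do |s, e|
    next if s < pos # skip an occurrence overlapping the previous one
    result << text[pos...s] << "<b>" << text[s...e] << "</b>"
    pos = e
  end
  result << text[pos..-1]
end

puts highlight_phrases("Quick search: the ferret search engine, and a ferret that can search.",
                       ["ferret", "search"])
# prints: Quick search: the <b>ferret search</b> engine, and a ferret that can search.
```

A term-by-term highlighter given the same query would also mark the lone "search" near the start and the "ferret" in "a ferret that", which is exactly the misleading output a phrase-aware highlighter avoids.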

Highlighter.pm isn't that long (384 lines including docs), and if I
didn't have serious deadlines bearing down, doing a Ruby version
would be a great exercise for me. If you or Marcus want to check it
out, the new version is only in Subversion:

   http://xrl.us/nm28 (Link to www.rectangular.com)

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
David Balmain (Guest)
on 2006-06-21 18:08
(Received via mailing list)
On 6/21/06, Marvin Humphrey <marvin@rectangular.com> wrote:
> > different. I'll go into more detail later.
>
> How close is what you're going to be doing to the Lucene contrib
> highlighter?

Well I haven't actually started it yet so we'll see.

>     http://xrl.us/nm25 (Link to www.rectangular.com)
>
> Whatever algorithm we choose for Lucy, I hope it will meet that
> constraint.
>
> Highlighter.pm isn't that long (384 lines including docs), and if I
> didn't have serious deadlines bearing down, doing a Ruby version
> would be a great exercise for me. If you or Marcus want to check it
> out, the new version is only in Subversion:
>
>    http://xrl.us/nm28 (Link to www.rectangular.com)

Cool, I'll definitely check this out. Thanks Marvin.