More_like_this

Hi,

I’m using acts_as_ferret in my rails application and I’d like to use
more_like_this to retrieve some ‘similar’ item suggestions. I have a
class ‘items’ which has a status field and I need to retrieve items that
only have one of the two possible statuses.

Looking at the more_like_this method indicates it supports an
:append_to_query option that allows you to specify a proc that will
modify the query object before the query is ‘run’. This would seem to
allow me to specify extra conditions to the query (such as
+status:live).

Item.more_like_this(:field_names => [:title, :description, :status],
:append_to_query => Proc … )

It’s a little unclear exactly what the query object is, and I can’t find
any examples outlining how to use this functionality. Does anybody have
an example they could contribute?

Thanks

On Wed, May 09, 2007 at 12:59:44PM +0200, Rob L. wrote:

allow me to specify extra conditions to the query (such as
+status:live).

Item.more_like_this(:field_names => [:title, :description, :status],
:append_to_query => Proc … )

It’s a little unclear exactly what the query object is and there seem to
be no examples I can find outlining how to use this functionality, does
anybody have an example they could contribute ?

I don’t have an example at hand, but maybe I can help anyway. The Proc’s
parameter is a BooleanQuery instance. You can add your own conditions by
adding your own Query to it:

query.add_query(Ferret::Search::TermQuery.new(:status, 'live'), :must)
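
As a rough sketch of how that line could be wrapped in a proc for :append_to_query: the helper name status_filter is made up for illustration, and the term query class is passed in as an argument (in a real app it would presumably be Ferret::Search::TermQuery) so that the snippet loads without the ferret gem. It assumes only what is stated above, namely that the proc receives the BooleanQuery.

```ruby
# Sketch only: returns a proc for the :append_to_query option.
# `status_filter` is a hypothetical helper name. `term_query_class` stands in
# for Ferret::Search::TermQuery so this snippet does not need the ferret gem.
def status_filter(status, term_query_class)
  lambda do |query|
    # `query` is the BooleanQuery that more_like_this has built.
    # :must makes the status clause mandatory, like +status:live.
    query.add_query(term_query_class.new(:status, status), :must)
  end
end

# Intended use inside a Rails app with acts_as_ferret loaded (not run here):
#   item.more_like_this(
#     :field_names     => [:title, :description],
#     :append_to_query => status_filter('live', Ferret::Search::TermQuery))
```

Here :status is left out of :field_names on the assumption that it should act as a filter rather than as a similarity signal; whether that suits your data is a judgment call.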

Jens


Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
[email protected] | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

Rob L. wrote:

Item.more_like_this(:field_names => [:title, :description, :status],
:append_to_query => Proc … )

I don’t mean to be nitpicky, but more_like_this is an instance method, not
a class method. This has come up for me because more_like_this does not
work for unsaved records in the current AAF, which doesn’t mesh with the
Rails convention of creating a new ActiveRecord object to store user
query params. I’d like to make a regular Rails form using a blank object
and then call more_like_this on that object to do a search.

On Tue, May 15, 2007 at 05:37:19PM +0200, Jacob Robbins wrote:

rails convention of creating a new active record object to store user
query params. I’d like to make a regular rails form using a blank object
and then call more_like_this on that object to do a search.

This isn’t supported by aaf but should be possible to do with a bit of
hacking :)

It’ll get a bit harder if you want to do this with the DRb server, since
then you’ll have to transfer your unsaved record over to the server for
the more_like_this query to be built. At the moment only the id and class
name are transferred with method calls.

Jens




Thanks for checking into this, Jens. I’ve done what I wanted by adding an
instance method to aaf. In instance_methods.rb, right after the to_doc
method, I added a to_ferret_query method. This avoids transferring the
whole object when using the DRb server. Tell me what you think…

# Turn this instance into a ferret query derived from its field values.
# Empty fields are ignored. Can be used on unsaved records. Typical use is
# to make a ferret query from a new object initialized from posted form
# values.
#
# Example: college.to_ferret_query(:fuzz => 0.6)
#   #=> "name:seattle~0.6 and name:university~0.6 and city:seattle~0.6"
#
# === Options
#
# fuzz::        Default: nil. Float value for fuzziness to attach to
#               search terms.
# field_names:: Default: nil (uses ferret indexed fields). Array of field
#               names to use in query.
# join_type::   Default: 'and'. String used to join query terms.
# exclude::     Default: ['and', 'or']. Array of words to ignore in field
#               values.
def to_ferret_query(options = {})
  options = {
    :field_names => self.class.aaf_configuration[:ferret_fields].keys,
    :join_type   => 'and',
    :exclude     => ['and', 'or']
  }.update(options)
  terms = []
  options[:field_names].each do |field|
    if val = self.send(field)
      val.to_s.split.each do |word|
        unless options[:exclude].include?(word.strip.downcase)
          terms << field.to_s + ':' + word +
                   (options[:fuzz] ? '~' + options[:fuzz].to_s : '')
        end
      end
    end
  end
  terms.join(' ' + options[:join_type] + ' ')
end
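
To see what string this logic produces without Rails or acts_as_ferret, here is a standalone restatement that takes a plain Hash of field values instead of a model instance. The name fields_to_query is hypothetical and exists only for this sketch; the behaviour is meant to mirror the method above, not replace it.

```ruby
# Standalone restatement of the query-string logic, taking a plain Hash of
# field values so it runs without Rails or acts_as_ferret.
# `fields_to_query` is a hypothetical name used only for this sketch.
def fields_to_query(values, options = {})
  options = { :join_type => 'and', :exclude => ['and', 'or'] }.update(options)
  suffix = options[:fuzz] ? "~#{options[:fuzz]}" : ''
  terms = []
  values.each do |field, val|
    next if val.nil?                 # empty fields are ignored
    val.to_s.split.each do |word|
      next if options[:exclude].include?(word.downcase)
      terms << "#{field}:#{word}#{suffix}"
    end
  end
  terms.join(" #{options[:join_type]} ")
end

puts fields_to_query({ :name => 'Seattle University', :city => 'Seattle' },
                     :fuzz => 0.6)
# prints: name:Seattle~0.6 and name:University~0.6 and city:Seattle~0.6
```

Note that words are emitted exactly as typed; only the comparison against the exclude list is lowercased.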

On Tue, May 15, 2007 at 09:08:06PM +0200, Jacob Robbins wrote:

then you’ll have to transfer your unsaved record over to the server for
the more_like_this query to be built. Atm only id and class name
are transferred with method calls.

Jens

Thanks for checking into this Jens, i’ve done what i wanted by adding an
instance method to aaf. In instance_methods.rb, right after the to_doc
method, i added a to_ferret_query method. This avoids transfering the
whole object when using the DRb server. Tell me what you think…

Perfectly fine if it works for you.

Aaf’s more_like_this is more complicated, mainly because it tries to
find out the 15 or so most relevant terms of your record’s content to
construct the query to support large documents (and it can even boost
these single terms according to their relevance).
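
To make the idea of "most relevant terms, boosted by relevance" concrete, here is a deliberately simplified sketch. It is not acts_as_ferret's implementation: how aaf scores relevance is not described in this thread, so raw word frequency is used here as a stand-in, and the names top_terms and boosted_query are made up for illustration.

```ruby
# Simplified illustration, not acts_as_ferret's implementation: rank the
# words of a text by raw frequency, keep the top `max_terms`, and attach a
# boost proportional to each term's frequency. Names are hypothetical.
def top_terms(text, max_terms = 15)
  counts = Hash.new(0)
  text.downcase.scan(/[a-z0-9]+/) { |word| counts[word] += 1 }
  return [] if counts.empty?
  best = counts.values.max.to_f
  # Most frequent first; ties are broken alphabetically for stable output.
  ranked = counts.sort_by { |word, count| [-count, word] }.first(max_terms)
  ranked.map { |word, count| [word, count / best] }
end

# Builds a query string using the term^boost syntax.
def boosted_query(field, text, max_terms = 15)
  top_terms(text, max_terms).map do |word, boost|
    format('%s:%s^%.2f', field, word, boost)
  end.join(' ')
end

puts boosted_query(:description, 'jazz trio live jazz recording jazz live', 3)
# prints: description:jazz^1.00 description:live^0.67 description:recording^0.33
```

Capping the number of terms is what keeps the query manageable for large documents; for short records such as names, nearly every word survives the cut.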

I’ll look into refactoring aaf a bit so that in future versions
more_like_this can be used on unsaved records, too.

Jens



Aaf’s more_like_this is more complicated, mainly because it tries to
find out the 15 or so most relevant terms of your record’s content to
construct the query to support large documents (and it can even boost
these single terms according to their relevance).

Oh, now I get it. Yeah, I run into this a lot with my deployment because
we don’t index big documents and most of ferret is geared for them. I
use ferret to help users find bands, recordings and labels that are
commonly misspelled. So for me… fuzzy searching: good, stopwords: bad.