Ferret finds 'tests' but not 'test'

Hello all,

Quick question (possibly!) - I’ve got a few records indexed, and a
search for ‘test’ returns no hits even though I know the word ‘tests’
exists in the indexed field. A search for ‘tests’ does produce a
result. I would have thought that ‘test’ would match ‘tests’, but no such
luck!

Thanks,

Alastair

On 9/6/06, Alastair M. wrote:

Alastair,

The default analyzer doesn’t perform any stemming. You need to create
your own analyzer with a stemmer. Something like this:

require 'rubygems'
require 'ferret'

module Ferret::Analysis
  # Analyzer that passes every token through a stemmer.
  class MyAnalyzer
    def token_stream(field, text)
      StemFilter.new(StandardTokenizer.new(text))
    end
  end
end

# The index applies this analyzer when indexing and when parsing queries.
index = Ferret::I.new(:analyzer => Ferret::Analysis::MyAnalyzer.new)

index << "test"
index << "tests debate debater debating the for,"
puts index.search("test").total_hits
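
For anyone who wants to see the effect without installing Ferret, here is a minimal plain-Ruby sketch of the idea. Everything in it is made up for illustration: `naive_stem` only strips a trailing "s" and is far cruder than Ferret's StemFilter, and `build_index` and `search` are toy helpers, not Ferret calls.

```ruby
# Hypothetical stand-in for a real stemmer: strips one trailing "s"
# from words longer than three letters.
def naive_stem(token)
  token.length > 3 ? token.sub(/s\z/, '') : token
end

# Split text into lowercase alphabetic tokens.
def tokenize(text)
  text.downcase.scan(/[a-z]+/)
end

# Toy inverted index: term => ids of the documents containing it.
def build_index(docs, stem)
  index = Hash.new { |hash, term| hash[term] = [] }
  docs.each_with_index do |text, id|
    tokenize(text).each do |token|
      term = stem ? naive_stem(token) : token
      index[term] << id unless index[term].include?(id)
    end
  end
  index
end

# Look up a single word, analyzing it the same way as the index was built.
def search(index, word, stem)
  term = stem ? naive_stem(word.downcase) : word.downcase
  index.fetch(term, [])
end

docs = ["test", "tests debate debater debating the for,"]

plain   = build_index(docs, false)
stemmed = build_index(docs, true)

puts search(plain, "test", false).length    # 1: only the exact term "test"
puts search(stemmed, "test", true).length   # 2: "tests" was indexed as "test"
```

Without stemming, "test" and "tests" are simply two different terms, which is why the original search came back empty.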

Hope that helps,
Dave

Hi Dave,

Many thanks for the help, it does help! However, given the short timespan
for this project, I think the users of the site will just have to be a
bit more specific in their search terms :) Cheers, and I will bookmark your
reply for a later project.

Alastair

Hi there,

Thanks for this useful piece of information! What I’m wondering is how
to do stemming on queries as well. My first try was:

query = Ferret::QueryParser.new(:analyzer =>
Ferret::Analysis::StemmingAnalyzer.new).parse(query_string)

index.search_each(query) { |doc, score| … }

But this does not work the way I would expect it to work, i.e., it seems
to deliver empty results independent of the input.

Does anybody have an idea what I’m doing wrong?

Cheers,

Albert

On 9/29/06, Albert wrote:

Hi Albert,

Could you show us your implementation of StemmingAnalyzer as well?
Also, you need to be sure to use the same analyzer for both indexing
and searching, although I think you already knew this.
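
To illustrate why the two sides have to agree, here is a small plain-Ruby sketch. It does not use Ferret at all; the `stem` lambda is a hypothetical stand-in for a stemmer that merely strips a trailing "s".

```ruby
# Hypothetical stand-in for a stemmer: strips one trailing "s" from longer words.
stem = lambda { |token| token.length > 3 ? token.sub(/s\z/, '') : token }
# An analyzer that leaves tokens untouched.
identity = lambda { |token| token }

# Index-time analysis: the terms actually stored for the text "tests".
indexed_terms = "tests".downcase.scan(/[a-z]+/).map { |t| stem.call(t) }

# Query-time analysis with and without the same stemmer.
matching_query   = stem.call("tests")
mismatched_query = identity.call("tests")

puts indexed_terms.include?(matching_query)     # true: both sides produce "test"
puts indexed_terms.include?(mismatched_query)   # false: index has "test", query asks for "tests"
```

If the index was built with a stemmer but the query is not stemmed (or the other way round), the query term never lines up with what was stored.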

Cheers,
Dave

Hi Dave,

Thanks for following up! The StemmingAnalyzer is actually just the
MyAnalyzer from the example above:

module Ferret::Analysis
  class StemmingAnalyzer
    def token_stream(field, text)
      StemFilter.new(StandardTokenizer.new(text))
    end
  end
end

I’ve been trying to find the error but no success. The searching is
done this way:

i = Ferret::Index::Index.new(:path => index)
qp = Ferret::QueryParser.new(:analyzer => Ferret::Analysis::StemmingAnalyzer.new)
query = qp.parse(query_string)
i.search_each(query) { |doc, score| … }

What I don’t get is that search_each(query) never returns a result
whereas when I use the original query string as in

i = Ferret::Index::Index.new(:path => index)
qp = Ferret::QueryParser.new(:analyzer => Ferret::Analysis::StemmingAnalyzer.new)
query = qp.parse(query_string)
i.search_each(query_string) { |doc, score| ... }
              ------------

things work as expected (modulo the stemming, of course). So, it may
be that I fundamentally misunderstand something or am making a stupid mistake.

Cheers,

Albert

Hi Dave,

Wonderful! Thanks! I should have taken a deeper look at the
documentation, indeed. Anyway, thanks for your patience!

Cheers,

Al.

David B. wrote:

On 9/30/06, Albert wrote:

Sorry, I must have been tired last night. The problem is obvious to me
now. You need to set the :fields parameter. The above query parser
should work as long as you explicitly specify all fields in your
query. For example:

"content:(ruby rails) title:(ruby rails)"

But if you want to search all fields by default then you need to tell
the QueryParser what fields exist. The Index class will handle all of
this for you including using the same analyzer as is used during
indexing. It looks like you are using the Index class for your
searches, so why not just leave the query parsing to it? Otherwise, you
can get the fields from the reader:

query = Ferret::QueryParser.new(
    :analyzer => Ferret::Analysis::StemmingAnalyzer.new,
    :fields => reader.fields,
    :tokenized_fields => reader.tokenized_fields
).parse(query_string)

index.search_each(query) { |doc, score| ... }
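
As a rough illustration of what the :fields list is for, here is a toy sketch in plain Ruby. `expand_query` is a hypothetical helper that only mimics the idea of spreading an unqualified query over the default fields; it is not Ferret's parser.

```ruby
# Toy model of default-field expansion (hypothetical helper, not Ferret code):
# an unqualified query is rewritten once per known field.
def expand_query(query_string, fields)
  terms = query_string.split.join(' ')
  fields.map { |field| "#{field}:(#{terms})" }.join(' ')
end

puts expand_query("ruby rails", [:content, :title])
# With no known fields there is nothing to expand into, so the query is empty.
puts expand_query("ruby rails", []).empty?
```

With a list of fields the query turns into the explicit form shown earlier; with an empty list there is nothing left to search, which fits the empty results Albert was seeing.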

Hope that helps,
Dave

Hi, if I use this stemming analyzer, where do I put it? In /lib/, and
require it in each model?

-Anrake

Alastair - if you only want to find the plural of a word rather than its
full stem, then Rails (RoR) has a pluralization capability. It will take
‘test’ and give you the plural, or take ‘tests’ and give you the
singular. You can then search on all of these words. It is not a full
stemmer, but in some circumstances this may be all that you want to do.

One thing to watch, which caught us out, is that as standard the
inflection of words ending in ‘ss’ does not work properly.
For example, “glass” would come back as “glas” from the pluralizer.
There is a simple fix on the RoR forum that covers this.

I would only use the RoR pluralizer if all you are looking to do is
bring back plurals of words and you are not interested in full stemming.
For example, if you search on “tax”, full stemming
should also search on “taxes” and “taxation”. Pluralize would not search
on “taxation”.
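
To make the ‘ss’ pitfall concrete, here is a toy sketch in plain Ruby. These helpers are hypothetical and much simpler than the Rails inflector; they only show how blindly stripping an "s" goes wrong and how a search could be widened to both forms of a word.

```ruby
# Toy inflection helpers (hypothetical; not the Rails inflector).

# Blindly strips a trailing "s", which mangles words such as "glass".
def naive_singular(word)
  word.sub(/s\z/, '')
end

# Leaves words ending in "ss" alone before stripping a trailing "s".
def careful_singular(word)
  word.end_with?('ss') ? word : word.sub(/s\z/, '')
end

# Returns the singular and plural forms to search for.
def search_terms(word)
  singular = careful_singular(word.downcase)
  plural = singular.end_with?('s') ? "#{singular}es" : "#{singular}s"
  [singular, plural]
end

puts naive_singular('glass')              # glas
puts careful_singular('glass')            # glass
puts search_terms('tests').join(' OR ')   # test OR tests
```

Note that this only covers singular and plural forms, so a search for “tax” would still not reach “taxation”.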

Hope this helps.

Clare

On 26.10.2006, at 22:06, Ghost wrote:

Can someone give me an idiot’s guide as to how to implement this custom
stemming analyser? I do not know where to start.

  1. Create the analyzer as David outlined it and name the file
    “my_analyzer.rb”. If you put it in /app/models you don’t need any
    require statements since every .rb file in /app/models gets
    automagically ‘required’ by Rails.

module Ferret::Analysis
  class MyAnalyzer
    def token_stream(field, text)
      StemFilter.new(StandardTokenizer.new(text))
    end
  end
end

  2. When you create an Index instance, pass it your analyzer, like so:

index = Ferret::I.new(:analyzer => Ferret::Analysis::MyAnalyzer.new)

  3. Test your analyzer, e.g.

index << "walking"
index << "walked"
index << "walks"

index.search("walk").total_hits # -> 3

Thanks for your patience.

You’re welcome. And may I kindly ask you to use a valid email address
and perhaps your real name for future posts?

Kind regards,
Andreas

Hi I’m still having trouble with this. Probably something stupid but
here goes.

I’m using ferret version 0.13 and aaf.

I created this file in my app/models directory

require 'ferret'
include Ferret

module Ferret::Analysis
  class MyAnalyzer
    def token_stream(field, text)
      StemFilter.new(StandardTokenizer.new(text))
    end
  end
end

naming it my_analyzer.rb as directed.

And then in my ferret model I have the following declaration:

acts_as_ferret :fields => ['short_description'],
               :analyzer => Ferret::Analysis::MyAnalyzer.new

I tried to rebuild my index but it crashes out with the following error:

VoObject.rebuild_index
NameError: uninitialized constant MyAnalyzer
  from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:123:in `const_missing'
  from script/../config/../config/../app/models/vo_object.rb:14
  from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:140:in `load'
  from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:56:in `require_or_load'
  from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:30:in `depend_on'
  from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:85:in `require_dependency'
  from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:98:in `const_missing'
  from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:131:in `const_missing'
  from (irb):11

Nasty eh?

Any idea what is going on here? Why can’t my VoObject model see the new
analyzer?
Thanks again.

You’re welcome. And may I kindly ask you to use a valid email address
and perhaps your real name for future posts?

I used to post with a valid email address. But then the number of spam
messages I received went from 1 or 2 a week to 50-60 a day. Ruby Forum
used to print the email addresses on the page. Here’s a compromise.

Regards
Caspar

Andreas K. wrote:

Hi Caspar,

On 27.10.2006, at 11:58, Ghost wrote:

NameError: uninitialized constant MyAnalyzer

Sorry, I forgot to mention that the directory structure needs to
resemble the module nesting, i.e. the file must go in
app/models/ferret/analysis instead of just app/models.
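
As a sketch of the naming convention involved, the following plain Ruby turns a nested constant name into the file path where it would be expected. The `underscore` and `expected_path` helpers are written here for illustration, following the usual CamelCase to snake_case convention; they are not the actual Rails loader code.

```ruby
# Convert a nested CamelCase constant name into a snake_case relative path.
# Illustrative helper that follows the usual Rails naming convention.
def underscore(constant_name)
  constant_name.gsub('::', '/')
               .gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2')
               .gsub(/([a-z\d])([A-Z])/, '\1_\2')
               .downcase
end

# Build the file path under the given root (app/models is assumed here).
def expected_path(constant_name, root = 'app/models')
  File.join(root, "#{underscore(constant_name)}.rb")
end

puts expected_path('Ferret::Analysis::MyAnalyzer')
# app/models/ferret/analysis/my_analyzer.rb
```

So each module in the nesting becomes a directory, and the class name becomes the file name.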

Cheers,
Andy

I’ve been trying to use the solution for stemming discussed in this
thread and have run into a bit of trouble.

I’m using this analyzer:

module Ferret::Analysis
  class StemmingAnalyzer
    def token_stream(field, text)
      StemFilter.new(StandardTokenizer.new(text))
    end
  end
end

I’ve configured aaf thusly:

AAF_DEFAULT_FERRET_OPTIONS = {:analyzer => Ferret::Analysis::StemmingAnalyzer.new}

acts_as_ferret({:store_class_name => true,
                :fields => {:description => {:store => :yes}}}.merge(AAF_DEFAULT_OPTIONS),
               AAF_DEFAULT_FERRET_OPTIONS)

The first time I search for something a new index is created in index,
and it successfully returns a set of results. The second time I search,
however, I get a strange error:

uninitialized constant Ferret::Search

#{RAILS_ROOT}/vendor/rails/activesupport/lib/active_support/dependencies.rb:264:in `load_missing_constant'
#{RAILS_ROOT}/vendor/rails/activesupport/lib/active_support/dependencies.rb:453:in `const_missing'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/local_index.rb:160:in `query_for_record'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/local_index.rb:152:in `document_number'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/local_index.rb:135:in `highlight'
/opt/local/lib/ruby/1.8/monitor.rb:238:in `synchronize'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/local_index.rb:134:in `highlight'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/instance_methods.rb:30:in `highlight'

Perhaps it has something to do with loading an already created index?

Thanks,
-Adam

This is just a postscript correction for this thread, in case anyone else
browses to it (like I did) and gets sent down the slightly wrong track.

If you’re going to include the :analyzer option in your call to
acts_as_ferret, then it needs to live inside another option hash called
:ferret. E.g., some of the examples above say to do this:

acts_as_ferret :fields => ['short_description'],
               :analyzer => Ferret::Analysis::MyAnalyzer.new

This won’t work - it needs to be like this:

acts_as_ferret :fields => ['short_description'],
               :ferret => {:analyzer => Ferret::Analysis::MyAnalyzer.new}

Thanks to Jens for setting me straight on this :)
