Tweaking minimum word length?

Hi,

Can Ferret be configured to change the minimum word length of what it
indexes? Right now it seems to drop words 3 characters or less, but
I’d like to include words going down to 2 characters. How would I do
that?

Francis

Sorry, false alarm, I was not indexing some of my records.

Hello,
I am actually experiencing the same problem (I am using Ferret 0.9.5).
When I search for terms that are under 4 characters, Ferret doesn’t
return any results. Is there an easy way to index all words (even
single-character words)? Since I am using acts_as_ferret for my
project, is there also a way to specify that within the acts_as_ferret
options?

Thank you,
Maxime CUrioni

Hi Maxime,

Ferret already indexes all words no matter what their length (unless
you add a custom filter). Could you give an example of the problem,
i.e. which words are you trying to search for?

Cheers,
Dave

Hello Dave,
Sorry for responding so late. I am actually using Ferret via the
acts_as_ferret Rails plugin.

I have a problem with small words, especially when I search for them
between quotes. For example, I have indexed the following sentence:
“e-commerce growth strategy for a major business to leverage key
intangible assets”

When I search for the phrase ‘“for a”’ (not just ‘for AND a’ but the
exact phrase “for a”), I don’t get any results. Is there a way to force
Ferret to return only results that strictly contain certain words (i.e.
exact results, not approximate results)?

I am also experiencing problems with words containing special characters
(especially words separated with dashes). Is there a way to send a raw
query to Ferret without having to escape the special characters?

Thank you for your help,
Maxime C.

On 9/8/06, Maxime C. [email protected] wrote:

sentence “for a”), I don’t get any results.

Hi Maxime,
It’s not the length of the words that is the problem. If you did a
search for “cat” it would find it. The problem is that the default
analyzer which you are using removes common stop-words like “and”,
“the”, “a” and “for”. You can create a StandardAnalyzer that doesn’t
remove stop-words like this:

include Ferret::Index
include Ferret::Analysis

index = Index.new(:analyzer => StandardAnalyzer.new([]))
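
To see why the phrase query comes back empty, here is a minimal toy model in plain Ruby. It is only a sketch of the stop-word idea, not Ferret's implementation: the `analyze` and `phrase_match?` helpers and the short stop-word list are made up for illustration, and the model ignores token positions, which a real engine keeps track of.

```ruby
# Toy model of stop-word filtering; NOT Ferret's implementation.
# Illustrative subset of a stop-word list (an assumption, not Ferret's list).
STOP_WORDS = %w[a an and as for the to].freeze

# Lowercase the text, split it into words (hyphenated words stay whole),
# and drop every word found in the given stop-word list.
def analyze(text, stop_words)
  words = text.downcase.scan(/[a-z0-9]+(?:-[a-z0-9]+)*/)
  words.reject { |word| stop_words.include?(word) }
end

# True when the analyzed phrase occurs as a consecutive run of tokens
# in the analyzed document.
def phrase_match?(doc, phrase, stop_words)
  doc_tokens = analyze(doc, stop_words)
  phrase_tokens = analyze(phrase, stop_words)
  return false if phrase_tokens.empty? # every query word was a stop word
  doc_tokens.each_cons(phrase_tokens.size).any? { |run| run == phrase_tokens }
end

DOC = "e-commerce growth strategy for a major business to " \
      "leverage key intangible assets"

puts phrase_match?(DOC, "for a", STOP_WORDS) # false
puts phrase_match?(DOC, "for a", [])         # true
```

With the stop list in place, the query analyzes to nothing, so there is nothing left to match. With an empty list (the counterpart of passing `[]` to the analyzer above), "for" and "a" survive and the phrase is found.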

Is there a way to force
Ferret to return only results that strictly contain certain words (i.e.
exact results, not approximate results)?

I’m not sure what you mean here. Can you give me an example where
Ferret returns approximate results?

I am also experiencing problems with words containing special characters
(especially words separated with dashes). Is there a way to send a raw
query to Ferret without having to escape the special characters ?

Words separated by dashes are treated as single words by the current
StandardAnalyzer, but that will change in version 0.10.3. Here is an
example:

require 'rubygems'
require 'ferret'

index = Ferret::I.new(:analyzer =>
                      Ferret::Analysis::StandardAnalyzer.new([]))

index << "e-commerce growth strategy for a major business to leverage key intangible assets"

puts index.search("e-commerce")
puts index.search("commerce")
puts index.search("for a")

Currently the search for “commerce” won’t return any results. In
version 0.10.3 both “e-commerce” and “commerce” and “e” for that
matter will find the document.
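
The difference between the two behaviours can be pictured with two toy tokenizers in plain Ruby. This is a sketch under assumptions, not Ferret code: `tokenize_whole` and `tokenize_with_parts` are hypothetical helpers, and the thread does not say exactly which tokens 0.10.3 emits, only that all three terms will match.

```ruby
# Toy tokenizers contrasting the two behaviours; NOT Ferret code.

# Keeps a hyphenated word as a single token, e.g. "e-commerce".
def tokenize_whole(text)
  text.downcase.scan(/[a-z0-9]+(?:-[a-z0-9]+)*/)
end

# Emits the hyphenated word and, in addition, each of its parts.
def tokenize_with_parts(text)
  tokenize_whole(text).flat_map do |token|
    token.include?("-") ? [token, *token.split("-")] : [token]
  end
end

text = "e-commerce growth strategy"

whole = tokenize_whole(text)
parts = tokenize_with_parts(text)

puts whole.include?("commerce") # false: only "e-commerce" was indexed
puts parts.include?("commerce") # true: the parts were indexed as well
```

A search term can only match a token that was actually indexed, so under the first scheme "commerce" has nothing to match against.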

Hello David and Jens,
I cannot thank you enough for your prompt answers. I had quickly browsed
through both the Ferret and aaf APIs but, being short on time, I did not
really get a chance to dive into the technology. I have successfully used
aaf and Ferret out of the box for my product, thanks to your work and the
Rails environment. I am realizing now that if I had read the
documentation (especially about Ferret analyzers), I could have saved
some of your time… so thanks a lot!

I now understand about Ferret parsing the query for common words. I will
use the basic analyzer that you provided me with. Regarding the
“approximate results”, after what you have told me, it makes more sense:
the record “Defining an e-commerce growth strategy for a major business”
would be matched by both ‘“Defining an”’ and ‘“Defining as”’. I thought
that Ferret would match ‘approximate results’, considering that those
queries were somehow close enough to return the previous record as a
valid result for both of them. I understand that “an” and “as” are
considered common words and Ferret removes them, therefore giving the
results of the ‘“Defining”’ query.

I understand that the feature I am looking for (matching words separated
with dashes) will be available in the next released version:

  • What can I do in the meantime to match those words? Do I need to
    write an ad hoc analyzer? Could you tell me the list of “special
    characters”?
  • When do you estimate that 0.10.3 will be released? Since I have to
    deliver my product soon, I am wondering whether that version will make
    it into my work.

Thank you again for your time and your help. Regards.
Maxime C.

On Fri, Sep 08, 2006 at 03:24:27PM +0900, David B. wrote:

When I search for the sentence ‘“for a”’ (not just ‘for AND a’ but the
include Ferret::Analysis

index = Index.new(:analyzer => StandardAnalyzer.new([]))

or, with aaf:
acts_as_ferret :analyzer => StandardAnalyzer.new([])

Jens


webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer [email protected]
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

or, with aaf:
acts_as_ferret :analyzer => StandardAnalyzer.new([])

I’ve tried this with aaf, and it still uses stopwords. Anyone else have
this problem? I’m running 10.10 and aaf, plugin (as current as
today…not sure what v.).

I’ve tried:
acts_as_ferret :fields => [:name], :analyzer =>
Ferret::Analysis::StandardAnalyzer.new([])

acts_as_ferret :fields => [:name], :analyzer => StandardAnalyzer.new([])

even different analyzers. All of them still seem to use the stopwords.
Anyone have an idea?

I’ve got it to work…after countless tries with different syntax, and
analyzers.
It worked only when I passed ‘nil’.
acts_as_ferret( { :fields => [:name] }, { :analyzer =>
Ferret::Analysis::StandardAnalyzer.new([nil]) } )

Hope that’ll help anyone else that comes across this.
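
A possible explanation for why only the last form worked, offered as a guess rather than something confirmed in this thread: if the plugin method takes two separate hashes (plugin options first, Ferret options second), then a call written without braces puts every key, including `:analyzer`, into the first hash. The sketch below uses a hypothetical stand-in method, `fake_acts_as_ferret`, to show how Ruby binds the arguments. The two-hash signature is an assumption about the aaf version in use, and `:my_analyzer` is just a placeholder value.

```ruby
# Hypothetical stand-in for the plugin's class method. The two-hash
# signature is an ASSUMPTION about the acts_as_ferret version in use.
def fake_acts_as_ferret(options = {}, ferret_options = {})
  { :options => options, :ferret_options => ferret_options }
end

# Without braces, Ruby folds all the keys into ONE hash, which becomes
# the first argument; the second argument keeps its default.
flat = fake_acts_as_ferret :fields => [:name], :analyzer => :my_analyzer
puts flat[:ferret_options].empty?            # true
puts flat[:options].key?(:analyzer)          # true

# With explicit braces, :analyzer lands in the second hash.
split = fake_acts_as_ferret({ :fields => [:name] }, { :analyzer => :my_analyzer })
puts split[:ferret_options].key?(:analyzer)  # true
```

If that is what happened, the braces rather than the `nil` would be what made the difference, but that has not been verified against the plugin here.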

Thanks everyone for posting this.

I have a question.

acts_as_ferret( { :fields => [:name] }, { :analyzer =>
Ferret::Analysis::StandardAnalyzer.new([nil]) } )
works by allowing stopwords in my searches, but what if I want to allow
stopword searching in only ONE field?

This is what I have:
acts_as_ferret({ :fields => { :name        => { :boost => 10, :store => :yes },
                              :description => {},
                              :title       => { :boost => 3 } } },
               { :analyzer => Ferret::Analysis::StandardAnalyzer.new([nil]) })

I want to allow stopword searching for :title, and remove stopwords for
:name and :description. Is there a way to do it?

I’m new to Ferret, and can’t really figure out how to use StopFilter in
QueryParser.

qp = QueryParser.new(:fields => [:name, :description], :analyzer =>
StopFilter.new())

Thanks a lot!

On Mon, Apr 02, 2007 at 03:37:09PM +0200, David wrote:
[…]

I want to allow stopword searching for :title, and remove stopwords for
:name and :description. Is there a way to do it?

Have a look at PerFieldAnalyzer, it allows you to specify separate
Analyzers for fields.

Jens
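
A minimal sketch of that suggestion for the fields in the question. It assumes the Ferret gem is installed and that `PerFieldAnalyzer.new(default)` and `[]=` assignment behave as in the omdb.org example later in this thread; the sample document and the query are illustrative only, and no output is claimed.

```ruby
require 'rubygems'
require 'ferret'
include Ferret::Analysis

# Default analyzer: used for every field without its own entry
# (:name and :description here). It keeps the standard stop-word list.
analyzer = PerFieldAnalyzer.new(StandardAnalyzer.new)

# :title gets an analyzer with an empty stop-word list, so words such as
# "for" and "a" stay in the index for that field only.
analyzer[:title] = StandardAnalyzer.new([])

index = Ferret::Index::Index.new(:analyzer => analyzer)

# Illustrative document; the field names come from the question above.
index << { :title       => "a strategy for a major business",
           :name        => "a strategy for a major business",
           :description => "a strategy for a major business" }

# Phrase query restricted to the :title field.
puts index.search('title:"for a"')
```

With acts_as_ferret, the same `analyzer` object would be passed wherever the plugin accepts the `:analyzer` option.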


Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
[email protected] | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

Thanks Jens, exactly what I needed.

Jens K. wrote:

On Mon, Apr 02, 2007 at 03:37:09PM +0200, David wrote:
[…]

I want to allow stopword searching for :title, and remove stopwords for
:name and :description. Is there a way to do it?

Have a look at PerFieldAnalyzer, it allows you to specify separate
Analyzers for fields.

Hey…

that’s what we do over at omdb.org:

@analyzer = PerFieldAnalyzer.new(OmdbDefaultAnalyzer.new)
@analyzer[:aliases]  = OmdbContentAnalyzer.new(Locale.base_language)
@analyzer[:keywords] = OmdbContentAnalyzer.new(Locale.base_language)
LOCALES.each_key do |key|
  language = Language.pick(key)
  @analyzer["content_#{key}".to_sym]  = OmdbContentAnalyzer.new(language)
  @analyzer["keywords_#{key}".to_sym] = OmdbContentAnalyzer.new(language)
end

Where a ContentAnalyzer is a MappingFilter > StemFilter > StopFilter >
LowerCaseFilter, and a DefaultAnalyzer is simply a MappingFilter >
HyphenFilter > LowerCaseFilter.

:)

Ben