Short words not indexed?


#1

I noticed that if I have a field that contains something like “Institute
for medicine” and I search using any of these queries:

for
for
for~

Nothing shows up. If I search for either of the other two words, though,
that term would show up in the result set. Does this indicate that short
words like “for” are not indexed?

Thanks!

Jen


#2

What analyzer are you using?
On Dec 28, 2005, at 7:13 PM, jennyw wrote:

that term would show up in the result set. Does this indicate that short
words like “for” are not indexed?

Jen - what analyzer are you using?

If you’re using the default, it is the StandardAnalyzer, which
removes these stop words during tokenization:

 ENGLISH_STOP_WORDS = [
   "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
   "in", "into", "is", "it", "no", "not", "of", "on", "or", "s", "such",
   "t", "that", "the", "their", "then", "there", "these",
   "they", "this", "to", "was", "will", "with"
 ]

Off the cuff, you should be able to adjust this to not remove any
stop words by using:

:analyzer => StandardAnalyzer.new([])

if you’re using the Index class Ferret provides.
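
To make the effect of the stop list concrete, here is a minimal pure-Ruby sketch of stop-word filtering. It is a toy model, not Ferret's actual tokenizer: the `analyze` method and the `SKETCH_STOP_WORDS` constant are names made up for this illustration, and the tokenizing rule (lowercase, then split on non-alphanumerics) is an assumption.

```ruby
# Toy model of stop-word filtering; this is NOT Ferret's implementation.
# The list mirrors the ENGLISH_STOP_WORDS shown above.
SKETCH_STOP_WORDS = %w[
  a an and are as at be but by for if in into is it no not of on or s
  such t that the their then there these they this to was will with
].freeze

# Lowercase the text, split it into word tokens, and drop any stop words.
def analyze(text, stop_words = SKETCH_STOP_WORDS)
  text.downcase.scan(/[a-z0-9]+/).reject { |token| stop_words.include?(token) }
end

# With the default list, "for" never reaches the index.
puts analyze("Institute for medicine").inspect
# => ["institute", "medicine"]

# With an empty list, every token is kept.
puts analyze("Institute for medicine", []).inspect
# => ["institute", "for", "medicine"]
```

If the real analyzer behaves like this, a search for "for" has nothing to match, because the term was discarded before it was ever stored.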

Erik

#3

Erik H. wrote:

:analyzer => StandardAnalyzer.new([])

I am having a similar problem, and I’ve tried implementing your
suggestion like this:

acts_as_ferret :analyzer =>
Ferret::Analysis::StandardAnalyzer.new([])

in my indexed classes.

I have rebuilt the indexes, but still can’t get some of these short
words to return results.

I’ve also found that words which have hyphens in them don’t work.

Is there something else necessary in order to get this working in an
active record class?

Thanks,

Cam


#4

On 8/24/06, Cameron H. wrote:

Ferret::Analysis::StandardAnalyzer.new([])

Thanks,

Cam

Hi Cam,

Get Ferret 0.9.6. This was a bug which should now be fixed. Better
yet, wait until acts_as_ferret works with Ferret 0.10. Jens K. is
already working on it and I have a few bugs to work out. As for hyphens
not working, it sounds like the same analyzer is not being used for
the queries but you’d have to check with one of the acts_as_ferret
developers that you are using it correctly.
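
As a rough illustration of what an analyzer mismatch would look like, the sketch below tokenizes a document one way and a query another way. Both tokenizers are hypothetical stand-ins written for this example; neither is Ferret's or acts_as_ferret's real behaviour, and the sample text is invented.

```ruby
# Toy analyzers for illustration only; neither is Ferret's real tokenizer.

# Splits on anything that is not a letter or digit, so hyphens separate terms.
def tokenize_splitting_hyphens(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Keeps hyphenated words together as a single term.
def tokenize_keeping_hyphens(text)
  text.downcase.scan(/[a-z0-9]+(?:-[a-z0-9]+)*/)
end

# Index time: the document is tokenized with the hyphen-splitting analyzer.
indexed_terms = tokenize_splitting_hyphens("State-of-the-art imaging")

# Query time, mismatched: the query term was never stored in the index.
missing = tokenize_keeping_hyphens("state-of-the-art") - indexed_terms
puts missing.inspect
# => ["state-of-the-art"]

# Query time, matched: every query term exists in the index.
missing = tokenize_splitting_hyphens("state-of-the-art") - indexed_terms
puts missing.inspect
# => []
```

The point is only that the index and the query parser need to agree on how terms are split; whichever tokenizer is used, it should be the same on both sides.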

Cheers,
Dave


#5

David B. wrote:

Get Ferret 0.9.6. This was a bug which should now be fixed. Better
yet, wait until acts_as_ferret works with Ferret 0.10. Jens K. is
already working on it and I have a few bugs to work out. As for hyphens
not working, it sounds like the same analyzer is not being used for
the queries but you’d have to check with one of the acts_as_ferret
developers that you are using it correctly.

I am currently using 0.9.6 after reading about the update somewhere
else. I didn’t have luck switching the analyzer, so I tried the
alternate solution suggested elsewhere on this forum, which was just
stripping the stop words out of the original query. This actually seems
to work somewhat well, but doesn’t solve the hyphen problem.
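
The query-side workaround described above can be sketched roughly as follows. `strip_stop_words` and `QUERY_STOP_WORDS` are hypothetical names for this example and are not part of Ferret or acts_as_ferret; the sketch also assumes plain keyword queries separated by whitespace.

```ruby
# Hypothetical helper, not part of Ferret or acts_as_ferret.
# The list mirrors the stop words quoted earlier in this thread.
QUERY_STOP_WORDS = %w[
  a an and are as at be but by for if in into is it no not of on or s
  such t that the their then there these they this to was will with
].freeze

# Remove stop words from a whitespace-separated query before searching.
# Comparison is case-insensitive; the remaining words keep their original case.
def strip_stop_words(query, stop_words = QUERY_STOP_WORDS)
  kept = query.split.reject { |word| stop_words.include?(word.downcase) }
  kept.join(" ")
end

puts strip_stop_words("Institute for medicine")
# => Institute medicine
```

Because the comparison ignores case, this would also remove words such as AND, OR and NOT, so it only suits simple keyword queries. It also does nothing for hyphenated terms, which matches the behaviour reported above.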

Can you clarify what you mean about using the same analyzer for queries?
Is there any reason that hyphenated terms would not be getting indexed
and searched properly by default?

Thanks

Cam


#6

On 8/25/06, Cameron H. wrote:

else. I didn’t have luck switching the analyzer, so I tried the
alternate solution suggested elsewhere on this forum, which was just
stripping the stop words out of the original query. This actually seems
to work somewhat well, but doesn’t solve the hyphen problem.

Can you clarify what you mean about using the same analyzer for queries?
Is there any reason that hyphenated terms would not be getting indexed
and searched properly by default?

This was in fact a bug. Thanks Cameron. It is fixed in subversion now.

Cheers,
Dave