Short words not indexed?

jennyw · December 29, 2005, 1:16am

I noticed that if I have a field that contains something like “Institute
for medicine” and I search using any of these queries:

for
for
for~

Nothing shows up. If I search for either of the other two words, though,
that term would show up in the result set. Does this indicate that short
words like “for” are not indexed?

Thanks!

Jen

jennyw · December 29, 2005, 1:31am

What analyzer are you using?
On Dec 28, 2005, at 7:13 PM, jennyw wrote:

that term would show up in the result set. Does this indicate that
short
words like “for” are not indexed?

Jen - what analyzer are you using?

If you’re using the default, it is the StandardAnalyzer, which
removes these stop words during tokenization:

 ENGLISH_STOP_WORDS = [
   "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
   "in", "into", "is", "it", "no", "not", "of", "on", "or", "s", "such",
   "t", "that", "the", "their", "then", "there", "these",
   "they", "this", "to", "was", "will", "with"
 ]

Off the cuff, you should be able to adjust this to not remove any
stop words by using:

:analyzer => StandardAnalyzer.new([])

if you’re using the Index class Ferret provides.
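To make the effect concrete, here is a minimal plain-Ruby sketch of stop-word filtering at tokenization time. It does not call Ferret at all: `index_terms` is a hypothetical helper, and its tokenization rule (lowercase, split on non-alphanumerics) is an assumption that only approximates what a real analyzer does. The stop-word list is the one quoted above.

```ruby
# Stop-word list as quoted in this thread.
ENGLISH_STOP_WORDS = %w[
  a an and are as at be but by for if in into is it no not of on or s
  such t that the their then there these they this to was will with
].freeze

# Hypothetical helper, not part of Ferret: lowercase the text, split it
# on non-alphanumeric characters, then drop every stop word.
def index_terms(text, stop_words = ENGLISH_STOP_WORDS)
  text.downcase.scan(/[a-z0-9]+/).reject { |term| stop_words.include?(term) }
end

# With the default list, "for" never reaches the index.
puts index_terms("Institute for medicine").inspect

# With an empty list, every word is kept.
puts index_terms("Institute for medicine", []).inspect
```

With the default list the term "for" is discarded before it is stored, so no query can match it; passing an empty list keeps it.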

Erik

jennyw · August 23, 2006, 10:59pm

Erik H. wrote:

:analyzer => StandardAnalyzer.new([])

I am having a similar problem, and I’ve tried implementing your
suggestion like this:

acts_as_ferret :analyzer =>
Ferret::Analysis::StandardAnalyzer.new([])

in my indexed classes.

I have rebuilt the indexes, but still can’t get some of these short
words to return results.

I’ve also found that words which have hyphens in them don’t work.

Is there something else necessary in order to get this working in an
active record class?

Thanks,

Cam

jennyw · August 24, 2006, 1:54am

On 8/24/06, Cameron H. wrote:

Ferret::Analysis::StandardAnalyzer.new([])

Thanks,

Cam

Hi Cam,

Get Ferret 0.9.6. This was a bug which should now be fixed. Better
yet, wait until acts_as_ferret works with Ferret 0.10. Jens K. is
already working on it and I have a few bugs to work out. As for hyphens
not working, it sounds like the same analyzer is not being used for
the queries but you’d have to check with one of the acts_as_ferret
developers that you are using it correctly.
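As a rough illustration of why an analyzer mismatch would break hyphenated terms, the plain-Ruby sketch below uses two hypothetical tokenizers in place of two different analyzers. Neither is meant to reproduce Ferret's actual tokenization rules; the names `split_on_hyphens`, `keep_hyphens`, and `matches?` are invented for this example.

```ruby
# Hypothetical analyzer A: treats the hyphen as a separator.
def split_on_hyphens(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Hypothetical analyzer B: keeps hyphenated words as a single term.
def keep_hyphens(text)
  text.downcase.scan(/[a-z0-9]+(?:-[a-z0-9]+)*/)
end

# Toy matcher: a query hits only if every query term was indexed.
def matches?(indexed_terms, query_terms)
  query_terms.all? { |term| indexed_terms.include?(term) }
end

# Terms stored at index time, produced by analyzer A.
indexed = split_on_hyphens("A well-known institute")

# Same analyzer on both sides: the query terms line up with the index.
puts matches?(indexed, split_on_hyphens("well-known"))

# Different analyzer at query time: "well-known" was never stored as one term.
puts matches?(indexed, keep_hyphens("well-known"))
```

The text is indexed as separate terms, so a query analyzed differently looks for a term that was never written to the index.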

Cheers,
Dave

jennyw · August 24, 2006, 5:34pm

David B. wrote:

Get Ferret 0.9.6. This was a bug which should now be fixed. Better
yet, wait until acts_as_ferret works with Ferret 0.10. Jens K. is
already working on it and I have a few bugs to work out. As for hyphens
not working, it sounds like the same analyzer is not being used for
the queries but you’d have to check with one of the acts_as_ferret
developers that you are using it correctly.

I am currently using 0.9.6 after reading about the update somewhere
else. I didn’t have luck switching the analyzer, so I tried the
alternate solution suggested elsewhere on this forum, which was just
stripping the stop words out of the original query. This actually seems
to work somewhat well, but doesn’t solve the hyphen problem.
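For reference, the stop-word stripping workaround can be sketched in plain Ruby roughly as follows. `strip_stop_words` is a hypothetical helper, not an acts_as_ferret or Ferret method, and it assumes a simple whitespace-separated keyword query with no special query syntax.

```ruby
# Stop-word list as quoted earlier in this thread.
STOP_WORDS = %w[
  a an and are as at be but by for if in into is it no not of on or s
  such t that the their then there these they this to was will with
].freeze

# Hypothetical helper: remove stop words from a plain keyword query
# before it is handed to the search call.
def strip_stop_words(query, stop_words = STOP_WORDS)
  query.split.reject { |word| stop_words.include?(word.downcase) }.join(" ")
end

puts strip_stop_words("Institute for medicine")
```

Because the comparison is case-insensitive, this sketch would also drop uppercase boolean operators such as AND, OR, and NOT, so it only suits plain keyword queries. A query made up entirely of stop words comes back as an empty string, which the caller has to handle.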

Can you clarify what you mean about using the same analyzer for queries?
Is there any reason that hyphenated terms would not be getting indexed
and searched properly by default?

Thanks

Cam

jennyw · August 25, 2006, 4:43pm

On 8/25/06, Cameron H. wrote:

else. I didn’t have luck switching the analyzer, so I tried the
alternate solution suggested elsewhere on this forum, which was just
stripping the stop words out of the original query. This actually seems
to work somewhat well, but doesn’t solve the hyphen problem.

Can you clarify what you mean about using the same analyzer for queries?
Is there any reason that hyphenated terms would not be getting indexed
and searched properly by default?

This was in fact a bug. Thanks Cameron. It is fixed in subversion now.

Cheers,
Dave