Bug when assigning new analyzer?

require 'rubygems'
require 'ferret'
include Ferret

PATH = '/tmp/ferret_stopwords_test'

index = Index::IndexWriter.new(:path => PATH, :create => true)

index.analyzer = Analysis::StandardAnalyzer.new([])
index << {:title => 'a few good men', :language => 'en'}

index.analyzer = Analysis::StandardAnalyzer.new(['men'])
index << {:title => 'a few good men', :language => 'nl'}

index.close

searcher = Index::Index.new(:path => PATH)
puts searcher.search('*:men AND language:nl').total_hits
#=> 1

i’d expect zero results, as ‘men’ is a stopword at the time of indexing
with language:nl. is this a bug or a lack of understanding on my part?

a workaround would be to close and reopen the index after every
language; that returns the expected zero. i don’t know how much
overhead that would add.

i am on ruby 1.8.5 / os x.

any assistance would be greatly appreciated since i have no clue why
this happens …

cheers,
phillip

  • addendum 1: i use ferret 0.11.4

  • addendum 2: when i comment out the first index.analyzer assignment, i
    get:
    /Users/phillip/Sites/ruby/playground/ferret_stopwords.rb:13: [BUG] Bus Error
    ruby 1.8.5 (2006-12-25) [i686-darwin8.8.2]

  • addendum 3: the underlying problem is that i have many different
    languages that have to be indexed correctly. is there a best practice
    for doing that? i mean, something better than having one index and
    switching the analyzer around?

thanks again,
phillip

On Wed, May 09, 2007 at 11:59:59PM +0200, Phillip O. wrote:

> with language:nl. is this a bug or a lack of understanding on my part.
Queries get analyzed, too, i.e. to remove stop words from them. So
you’ll have to use the correct language-dependent Analyzer for your
searcher, too.
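
To illustrate the point about analysis happening on both sides, here is a toy model in plain Ruby. It is only a sketch and does not use Ferret at all: ToyAnalyzer and its tokens method are made-up names, and the tokenizing is far cruder than what StandardAnalyzer does. It just shows that the same text yields different terms depending on which stop word list is in effect, both for documents and for queries.

```ruby
# Toy model only: plain Ruby, NOT Ferret's API.
# ToyAnalyzer and #tokens are hypothetical names for illustration.
class ToyAnalyzer
  def initialize(stop_words = [])
    @stop_words = stop_words
  end

  # Lowercase the text, split it into words, drop the stop words.
  def tokens(text)
    text.downcase.scan(/\w+/) - @stop_words
  end
end

english = ToyAnalyzer.new([])       # no stop words
dutch   = ToyAnalyzer.new(['men'])  # 'men' is a stop word

# Index time: each document goes through its own analyzer.
indexed_en = english.tokens('a few good men')
indexed_nl = dutch.tokens('a few good men')

# Query time: the query text goes through an analyzer as well.
query_en = english.tokens('men')
query_nl = dutch.tokens('men')

puts indexed_en.inspect
puts indexed_nl.inspect
puts query_en.inspect
puts query_nl.inspect
```

In this toy model the term 'men' survives only where the analyzer without stop words is used, so the terms in a query line up with the terms in the index only when both sides were analyzed the same way.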

Jens

hi jens,

thanks for making that clear, and sorry for the long delay in replying.
we were quite busy.

cheers,
phillip
