Hi all,

I cannot make aaf (rev. 220) use my custom analyzer, despite following the instructions at
http://projects.jkraemer.net/acts_as_ferret/wiki/AdvancedUsage

To pinpoint the problem, I created a model and a simple analyzer with 2 stop words : "fax" and "gsm".

test 1 : model.rebuild_index + model.find_by_contents("fax") # fax is a stop word
=> I get a result when I should not.
(note : I delete the index directory => I can see the index is recreated, index/develop ).

test 2 : insert a 'raise' in the token_stream() method
=> it's never raised.

test 3 : use the standard analyzer to exclude the 2 stop words
=> same wrong result.

class AccessPointKind2 < ActiveRecord::Base
  set_table_name "access_point_kinds2"
  acts_as_ferret( {:remote => true, :fields => { :name => {:store => :yes}} },
                  { :analyzer => Ferret::Analysis::StandardAnalyzer.new(["fax","gsm"]) } )
end

Here are the model and the analyzer :

MODEL :

class AccessPointKind2 < ActiveRecord::Base
  set_table_name "access_point_kinds2"
  acts_as_ferret( {:remote => true, :fields => { :name => {:store => :yes}} },
                  {:analyzer => PlainAsciiAnalyzer.new} )
end

ANALYZER
lib : plain_ascii_analyzer.rb

class PlainAsciiAnalyzer < ::Ferret::Analysis::Analyzer
  include ::Ferret::Analysis
  def token_stream(field, str)
    StopFilter.new( StandardTokenizer.new(str), ["fax", "gsm"] )
    # raise <<<----- is never executed when uncommented !!
  end
end

In the console, I rebuild the index and search for a stop word => I get results when I should not :

>> reload!; AccessPointKind2.rebuild_index ; AccessPointKind2.find_by_contents("gsm").collect &:name
Reloading...
AccessPointKind2 Columns (0.002963) SHOW FIELDS FROM access_point_kinds2
Asked for a remote server ? true, ENV["FERRET_USE_LOCAL_INDEX"] is nil, looks like we are not the server
Will use remote index server which should be available at druby://localhost:9010
default field list: [:name]
AccessPointKind2 Load (0.002706) SELECT * FROM access_point_kinds2 WHERE (access_point_kinds2.id in ('7','12','13','8','2'))
Query: gsm
total hits: 5, results delivered: 5
=> ["gsm", "gsm", "gsm(werk)", "gsm(privé)", "gsm(privé)"]
>>

I guess it's obvious, but I cannot see it. Help.

Thanks in advance.

Alain
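For readers following the thread, the behaviour expected from a stop-word filter can be sketched in plain Ruby. This is only a toy stand-in and not Ferret's StopFilter: toy_tokens and STOP_WORDS are hypothetical names introduced for illustration, and the tokenizer is a naive regex scan.

```ruby
# Toy model of "tokenize, then drop stop words" (NOT Ferret's implementation).
STOP_WORDS = %w[fax gsm].freeze

# Lowercase the text, split it into word tokens, and reject the stop words.
def toy_tokens(text, stop_words = STOP_WORDS)
  text.downcase.scan(/[[:alnum:]]+/).reject { |token| stop_words.include?(token) }
end

p toy_tokens("gsm")         # no token survives, so nothing would be indexed
p toy_tokens("gsm(werk)")   # only "werk" survives
p toy_tokens("Fax bureau")  # only "bureau" survives
```

If the analyzer were applied at indexing time, a record named "gsm" would contribute no term at all, which is why a hit on the query "gsm" suggests the analyzer is not being used.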
on 13.11.2007 13:47
on 14.11.2007 10:26
Hi,

I just tried and I'm afraid I couldn't reproduce your problem here (with aaf trunk). I just committed a test case using StandardAnalyzer with your stop word list, and it works as intended. I also tried with your analyzer class from below, same result.

Could you please try the latest aaf from trunk to see if it fixes your problem?

Cheers,
Jens

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database
on 14.11.2007 22:56
Jens,

> I just tried and I'm afraid I couldn't reproduce your problem here (with aaf trunk).
...
> Could you please try the latest aaf from trunk to see if it fixes your problem?

Same problem after installing the latest version (262) of aaf : the custom analyzer I pass as an aaf parameter is not used.

As a quick test, I tried using the "No Stop Word" custom analyzer as documented at
http://projects.jkraemer.net/acts_as_ferret/wiki/AdvancedUsage
on a simple LUT table/model, to no avail. I tried the new syntax with the same wrong result.

Setup :
* I've installed the latest trunk version of aaf (262)
* killed + restarted a (new) DRb server
  $ ./script/ferret_server -e production start
* checked the Ferret version :
  $ gem list ferret
  ==> ferret (0.11.4)

Test :
I created a record where the name is a default stop word

>> Country.find 11
Country Load (0.000388) SELECT * FROM countries WHERE (countries.`id` = 11)
=> #<Country id: 11, name: " the">

model, way 1 :

class Country < ActiveRecord::Base
  acts_as_ferret( { :fields => [:name] },
                  { :analyzer => Ferret::Analysis::StandardAnalyzer.new([]) } )
end

model, way 2 :

class Country < ActiveRecord::Base
  acts_as_ferret( :fields => [:name],
                  :remote => true,
                  :ferret => {:analyzer => Ferret::Analysis::StandardAnalyzer.new([]) } )
end

PROBLEM : in both cases it doesn't find any record where the name is 'the'

>> reload! ; Country.*rebuild_index* ; Country.*find_by_contents*(" the")
>> reload! ; Country.rebuild_index ; Country.find_by_contents ("the")
Reloading...
Asked for a remote server ? true, ENV["FERRET_USE_LOCAL_INDEX"] is nil, looks like we are not the server
Will use remote index server which should be available at druby://localhost:9010
default field list: [:name]
Query: the
total hits: 0, results delivered: 0
=> #<ActsAsFerret::SearchResults:0x324ab3c @per_page=0, @current_page=nil, @total_hits=0, @results=[], @total_pages=0>

I tried with my custom analyzer (from the previous message), with the same wrong result.

So, it looks like aaf is not using the custom analyzer I declared in the model. It doesn't make any sense to me.

Alain Ravet
on 14.11.2007 22:58
remark : some spaces were erroneously inserted before the word "the" when I formatted the email, and are not present in the real code.
So
> => #<Country id: 11, name: " the">
> ..
> >> reload! ; Country.rebuild_index ; Country.find_by_contents(" the")
should read :
> => #<Country id: 11, name: "the">
> ..
> >> reload! ; Country.rebuild_index ; Country.find_by_contents("the")
on 15.11.2007 00:00
I'm one step further :
- Good : I now know aaf knows about/received the custom analyzer
but
- Bad : the analyzer is not used by aaf (it stops on words it should not stop on)
New test : a "no stop word" analyzer, adapted from the German stemming analyzer at
http://projects.jkraemer.net/acts_as_ferret/wiki/AdvancedUsage
file: model/country.rb
----------------------
class Test2Analyzer < ::Ferret::Analysis::Analyzer
  include Ferret::Analysis
  def initialize(stop_words = [])
    @stop_words = stop_words
  end
  def token_stream(field, str)
    StemFilter.new(StopFilter.new(LowerCaseFilter.new(
      StandardTokenizer.new(str)), @stop_words), 'de')
  end
end
class Country < ActiveRecord::Base
  acts_as_ferret(
    :fields => [:name] ,
    :remote => true,
    :ferret => {:analyzer => Test2Analyzer.new([]) }
  )
end
0°/ delete the ferret index directory
1°/ restart the console and rebuild the index :
./script/console
>> Country.rebuild_index
Asked for a remote server ? true, ENV["FERRET_USE_LOCAL_INDEX"] is nil, looks like we are not the server
Will use remote index server which should be available at druby://localhost:9010
default field list: [:name]
=> nil
2°/ confirm that aaf knows about my "no_stop_words" custom analyzer :
>> puts Country.aaf_index.to_yaml
--- !ruby/object:ActsAsFerret::RemoteIndex
config:
:fields:
- :name
:mysql_fast_batches: true
:name: countries
:class_name: Country
:index_dir: /Users/aravet/aaprojets/newgids/newgids_machine/index/development/country
:remote: druby://localhost:9010
:reindex_batch_size: 1000
:store_class_name: false
:ferret_fields:
:name:
:store: :no
:term_vector: :with_positions_offsets
:boost: 1.0
:index: :yes
:highlight: :yes
:single_index: false
:ferret: &id001
:key: :id
:auto_flush: true
:or_default: false
:path: /Users/aravet/aaprojets/newgids/newgids_machine/index/development/country
:create_if_missing: true
:handle_parse_errors: true
:analyzer: !ruby/object:Test2Analyzer <<<<----------- Good
stop_words: [] <<<<----------- Good
:default_field:
- :name
:enabled: true
ferret_config: *id001
server: !ruby/object:DRb::DRbObject
ref:
uri: druby://localhost:9010
=> nil
3°/ confirm that there is a record with name == "the"
>> Country.find_by_name "the"
Country Load (0.000427) SELECT * FROM countries WHERE (countries.`name` = 'the') LIMIT 1
=> #<Country id: 11, name: "the">
4°/ try to find it with "t*" through aaf
=> DOES NOT WORK (does not find Country[:name => "the"])
>> Country.find_by_contents "t*"
Query: t*
total hits: 0, results delivered: 0
=> #<ActsAsFerret::SearchResults:0x31ff754 @per_page=0, @current_page=nil, @total_hits=0, @results=[], @total_pages=0>
5°/ do the same for "f*", a non stop word
=> IT WORKS (finds Country[:name => "Frankrijk"])
>> Country.find_by_contents "f*"
Country Load (0.000420) SELECT * FROM countries WHERE (countries.id in ('2'))
Query: f*
total hits: 1, results delivered: 1
=> #<ActsAsFerret::SearchResults:0x31fa4ac @per_page=1, @current_page=nil, @total_hits=1, @results=[#<Country id: 2, name: "Frankrijk">], @total_pages=1>
So, aaf (rev 262)
* associates the right custom analyzer with the model,
* but doesn't seem to use it when finding_by_contents (? and rebuilding the index ??)
Alain
on 15.11.2007 00:25
Alain Ravet wrote:
> class Country < ActiveRecord::Base
>   acts_as_ferret(
>     :fields => [:name] ,
>     :remote => true,
>     :ferret => {:analyzer => Test2Analyzer.new([]) }
>   )
> end

Try this:

acts_as_ferret({ :fields => [:name], :remote => true },
               { :analyzer => Test2Analyzer.new([]) })
on 15.11.2007 10:04
On Thu, Nov 15, 2007 at 12:24:25AM +0100, Hongli Lai wrote:
>
> acts_as_ferret({ :fields => [:name], :remote => true },
>                { :analyzer => Test2Analyzer.new([]) })

This won't help: these are both valid ways to call acts_as_ferret. The :ferret syntax is the preferred one, however.

Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer@webit.de | www.webit.de
Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa
on 15.11.2007 10:09
Hi Alain,

could you please check the index created by aaf with plain Ferret and your custom analyzer, to see whether your queries deliver the expected results then? That way we should be able to find out whether the problem is with indexing or with searching through aaf.

Jens
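The reason for checking indexing and searching separately can be illustrated with a toy inverted index in plain Ruby. This is a sketch under stated assumptions, not Ferret or aaf code: ToyIndex, analyze and DEFAULT_STOP_WORDS are hypothetical names, and the stop word list is purely illustrative.

```ruby
# Toy inverted index (NOT Ferret): analysis happens once when a document is
# indexed and once more when a query is parsed, possibly with different settings.
DEFAULT_STOP_WORDS = %w[the a an and of].freeze # illustrative list only

# Lowercase, split into word tokens, remove the given stop words.
def analyze(text, stop_words)
  text.downcase.scan(/[[:alnum:]]+/) - stop_words
end

class ToyIndex
  def initialize(stop_words)
    @stop_words = stop_words
    @postings = Hash.new { |hash, term| hash[term] = [] }
  end

  # Index-time analysis: stop words never reach the postings.
  def add(id, text)
    analyze(text, @stop_words).each { |term| @postings[term] << id }
  end

  # Search-time analysis, with a stop list that may differ from the indexing one.
  def search(query, stop_words)
    analyze(query, stop_words).flat_map { |term| @postings.fetch(term, []) }.uniq
  end
end

with_stops = ToyIndex.new(DEFAULT_STOP_WORDS)
without_stops = ToyIndex.new([])
[with_stops, without_stops].each do |index|
  index.add(11, "the")
  index.add(2, "Frankrijk")
end

p with_stops.search("the", [])                    # => []   term dropped while indexing
p without_stops.search("the", [])                 # => [11] both sides keep "the"
p without_stops.search("the", DEFAULT_STOP_WORDS) # => []   term dropped while searching
```

In this toy model a miss looks the same whether the term was lost at indexing time or at query time, so opening the index with a known analyzer is what tells the two cases apart.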
on 29.01.2008 21:22
Jens Kraemer <kraemer@webit.de> writes:
>> Try this:
>>
>> acts_as_ferret({ :fields => [:name], :remote => true },
>>                { :analyzer => Test2Analyzer.new([]) })
>
> this won't help, these are both valid ways to call acts_as_ferret. The
> :ferret syntax is the preferred one, however.

Just for information, I was using an old or bad syntax for aaf. I was using

acts_as_ferret :fields => [], :analyzer => MyAnalyzer.new

and it wasn't working. (A raise in MyAnalyzer's initialize was raised, but one in token_stream was not.) I'm now using

:ferret => {:analyzer => MyAnalyzer}

and it works as expected.
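One way such a symptom can arise is sketched below in plain Ruby. This is a hypothetical model and not acts_as_ferret's actual code: toy_index_options and DEFAULT_ANALYZER are made-up names, and the assumption is simply that index settings are read only from the nested :ferret hash.

```ruby
# Hypothetical sketch (NOT acts_as_ferret's code): a setup method that takes
# index settings only from the nested :ferret key of its options hash.
DEFAULT_ANALYZER = :default_analyzer

def toy_index_options(options = {})
  ferret_options = { :analyzer => DEFAULT_ANALYZER }
  # Only the nested hash is merged; other top-level keys are never consulted.
  ferret_options.merge(options.fetch(:ferret, {}))
end

misplaced = toy_index_options(:fields => [:name], :analyzer => :my_analyzer)
nested    = toy_index_options(:fields => [:name], :ferret => { :analyzer => :my_analyzer })

p misplaced[:analyzer] # => :default_analyzer (top-level key silently ignored)
p nested[:analyzer]    # => :my_analyzer
```

Under that assumption the analyzer object is still constructed when the model class is loaded, which would match a raise firing in initialize but never in token_stream.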