acts_as_ferret : cannot use a customized Analyzer (as indicated in the AdvancedUsageNotes)

Posted by Alain Ravet (Guest)
on 13.11.2007 13:47
(Received via mailing list)
Hi all,


I cannot make aaf (rev. 220) use my custom analyzer, despite following the
instructions at

   http://projects.jkraemer.net/acts_as_ferret/wiki/AdvancedUsage


To pinpoint the problem, I created a model + a simple analyzer with 2 stop
words : "fax" and "gsm".

test 1 : model.rebuild_index + model.find_by_contents("fax")  # fax is a stop word.
   => I get a result when I should not.

 (note : I delete the index directory => I can see the index is recreated,
        index/develop
 ).

test 2 : insert a 'raise' in the token_stream() method => it's never thrown.

test 3 : use the standard analyzer, to exclude the 2 stop words => same wrong result.
    class AccessPointKind2 < ActiveRecord::Base

      set_table_name "access_point_kinds2"

      acts_as_ferret(
        { :remote => true, :fields => { :name => { :store => :yes } } },
        { :analyzer => Ferret::Analysis::StandardAnalyzer.new(["fax", "gsm"]) }
      )
    end





Here are the model and the analyzer :
MODEL :

  class AccessPointKind2 < ActiveRecord::Base
      set_table_name "access_point_kinds2"

      acts_as_ferret(
          {:remote => true, :fields => { :name  => {:store => :yes}} } ,
          {:analyzer => PlainAsciiAnalyzer.new}
        )
  end


ANALYZER
lib : plain_ascii_analyzer.rb
  class PlainAsciiAnalyzer < ::Ferret::Analysis::Analyzer
    include ::Ferret::Analysis
    def token_stream(field, str)
          StopFilter.new(
            StandardTokenizer.new(str) ,
            ["fax", "gsm"]
          )
      # raise <<<----- is never executed when uncommented !!
    end
  end
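
For reference, the behaviour expected from this analyzer can be sketched in plain Ruby, without Ferret. This is only a toy model: `toy_tokens` and `STOP_WORDS` are hypothetical names, and the regex tokenizer plus the explicit downcase are assumptions that merely approximate what a real tokenizer does.

```ruby
# Toy model of a stop-word analyzer (plain Ruby, NOT the Ferret API).
# STOP_WORDS and toy_tokens are hypothetical names used for illustration.
STOP_WORDS = %w[fax gsm].freeze

# Split text into lower-cased word tokens, then drop the stop words.
def toy_tokens(text, stop_words = STOP_WORDS)
  text.downcase.scan(/[a-z0-9]+/).reject { |token| stop_words.include?(token) }
end

p toy_tokens("gsm(werk)")   # the stop word is removed, "werk" is kept
p toy_tokens("fax")         # nothing is left to index or to search for
```

If the analyzer were applied, a query made only of stop words would carry no terms at all, so it should not match the five "gsm" records.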



In the console, I rebuild the index and search for a stop word. I get
results when I should not:


>> reload!; AccessPointKind2.rebuild_index ; AccessPointKind2.find_by_contents("gsm").collect &:name
Reloading...
  AccessPointKind2 Columns (0.002963)   SHOW FIELDS FROM access_point_kinds2
Asked for a remote server ? true, ENV["FERRET_USE_LOCAL_INDEX"] is nil, looks like we are not the server
Will use remote index server which should be available at druby://localhost:9010
default field list: [:name]
  AccessPointKind2 Load (0.002706)   SELECT * FROM access_point_kinds2 WHERE (access_point_kinds2.id in ('7','12','13','8','2'))
Query: gsm
total hits: 5, results delivered: 5
=> ["gsm", "gsm", "gsm(werk)", "gsm(privé)", "gsm(privé)"]
>>


I guess it's obvious, but I cannot see it.
Help.

Thanks in advance.

Alain
Posted by Jens Kraemer (Guest)
on 14.11.2007 10:26
(Received via mailing list)
Hi,

I just tried and I'm afraid I couldn't reproduce your problem here (with
aaf trunk). I just committed a testcase using StandardAnalyzer with your
stop word list, and it works as intended. I also tried with your
analyzer class from below, same result.

Could you please try the latest aaf from trunk to see if it fixes your
problem?


Cheers,
Jens


On Tue, Nov 13, 2007 at 01:47:04PM +0100, Alain Ravet wrote:
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk@rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/     - The new free film database
Posted by Alain Ravet (Guest)
on 14.11.2007 22:56
(Received via mailing list)
Jens,

  > I just tried and I'm afraid I couldn't reproduce your problem here (with aaf trunk).  ...
  > Could you please try the latest aaf from trunk to see if it fixes your problem?


Same problem after installing the latest version (262) of aaf: the custom
analyzer I pass as an aaf parameter is not used.

As a quick test, I tried using the "No Stop Word" custom  analyzer as
documented @
http://projects.jkraemer.net/acts_as_ferret/wiki/AdvancedUsage
on a simple LUT table/model, to no avail.
I tried the new syntax with the same wrong result.

Setup :

  * I've installed the latest trunk version of aaf (262)
  * killed + restarted a (new) DrB server
      $ ./script/ferret_server -e production start
  * checked the Ferret version :
      $ gem list ferret   ==> ferret (0.11.4)


Test :

I created a record where the name is a default stop word
   >> Country.find 11

      Country Load (0.000388)   SELECT * FROM countries WHERE (countries.`id` = 11)
    => #<Country id: 11, name: " the">

model, way 1 :

  class Country < ActiveRecord::Base
    acts_as_ferret(
      { :fields => [:name] },
      { :analyzer => Ferret::Analysis::StandardAnalyzer.new([]) }
    )
  end


model, way 2 :

  class Country < ActiveRecord::Base
    acts_as_ferret(
      :fields => [:name],
      :remote => true,
      :ferret => { :analyzer => Ferret::Analysis::StandardAnalyzer.new([]) }
    )
  end



PROBLEM : in both cases it doesn't find any record where the name is 'the'


 >> reload! ; Country.rebuild_index  ; Country.find_by_contents(" the")


 >> reload! ; Country.rebuild_index  ; Country.find_by_contents ("the")
 Reloading...
 Asked for a remote server ? true, ENV["FERRET_USE_LOCAL_INDEX"] is nil, looks like we are not the server
 Will use remote index server which should be available at druby://localhost:9010
 default field list: [:name]
 Query: the
 total hits: 0, results delivered: 0
 => #<ActsAsFerret::SearchResults:0x324ab3c @per_page=0, @current_page=nil, @total_hits=0, @results=[], @total_pages=0>




I tried with my custom analyser (from the previous message), with the same
wrong result.


So, it looks like aaf is not using the custom analyzer I declared in the
model.
It doesn't make any sense to me.



Alain Ravet
Posted by Alain Ravet (Guest)
on 14.11.2007 22:58
(Received via mailing list)
remark : some spaces were erroneously inserted before the word "the"
when I formatted the email, and are not present in the real code.

So

  >     => #<Country id: 11, name: "  the">
  >   ..
  >  >> reload! ; Country.rebuild_index  ; Country.find_by_contents(" the")


should read :

  >  => #<Country id: 11, name: "the">
  > ..
  >  >> reload! ; Country.rebuild_index  ; Country.find_by_contents("the")
Posted by Alain Ravet (Guest)
on 15.11.2007 00:00
(Received via mailing list)
I'm one step further :
  - Good : I now know aaf knows about/received the custom analyzer
but
  - Bad : the analyzer is not used by aaf (it stops on words it should not stop on)

New test : a "no stop word" analyzer, adapted from the german stemming
analyser @
        http://projects.jkraemer.net/acts_as_ferret/wiki/AdvancedUsage


file: model/country.rb
----------------------
  class Test2Analyzer < ::Ferret::Analysis::Analyzer
    include Ferret::Analysis
    def initialize(stop_words = [])
      @stop_words = stop_words
    end
    def token_stream(field, str)
      StemFilter.new(
        StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)), @stop_words),
        'de'
      )
    end
  end
  class Country < ActiveRecord::Base
    acts_as_ferret(
      :fields => [:name] ,
      :remote => true,
      :ferret =>  {:analyzer => Test2Analyzer.new([]) }
    )
  end


0°/ delete the ferret index directory
1°/ restart the console and rebuild the index :


  ./script/console
  >> Country.rebuild_index
  Asked for a remote server ? true, ENV["FERRET_USE_LOCAL_INDEX"] is nil, looks like we are not the server
  Will use remote index server which should be available at druby://localhost:9010
  default field list: [:name]
  => nil


2°/ confirm that aaf knows about my "no_stop_words" custom analyzer :

>> puts Country.aaf_index.to_yaml
--- !ruby/object:ActsAsFerret::RemoteIndex
config:
  :fields:
  - :name
  :mysql_fast_batches: true
  :name: countries
  :class_name: Country
  :index_dir: /Users/aravet/aaprojets/newgids/newgids_machine/index/development/country
  :remote: druby://localhost:9010
  :reindex_batch_size: 1000
  :store_class_name: false
  :ferret_fields:
    :name:
      :store: :no
      :term_vector: :with_positions_offsets
      :boost: 1.0
      :index: :yes
      :highlight: :yes
  :single_index: false
  :ferret: &id001
    :key: :id
    :auto_flush: true
    :or_default: false
    :path: /Users/aravet/aaprojets/newgids/newgids_machine/index/development/country
    :create_if_missing: true
    :handle_parse_errors: true
    :analyzer: !ruby/object:Test2Analyzer    <<<<----------- Good
      stop_words: []                <<<<----------- Good
    :default_field:
    - :name
  :enabled: true
ferret_config: *id001
server: !ruby/object:DRb::DRbObject
  ref:
  uri: druby://localhost:9010
=> nil




3°/ confirm that there is a record with name == "the"

 >> Country.find_by_name "the"
   Country Load (0.000427)   SELECT * FROM countries WHERE (countries.`name` = 'the') LIMIT 1
 => #<Country id: 11, name: "the">


4°/ try to find it with aaf, searching for "t*"
=>  DOES NOT WORK (does not find Country[:name => "the"])

  >> Country.find_by_contents "t*"
  Query: t*
  total hits: 0, results delivered: 0
  => #<ActsAsFerret::SearchResults:0x31ff754 @per_page=0, @current_page=nil, @total_hits=0, @results=[], @total_pages=0>


5°/ do the same for "f*", which matches a non stop word
=>  IT WORKS (finds Country[:name => "Frankrijk"])

>> Country.find_by_contents "f*"
  Country Load (0.000420)   SELECT * FROM countries WHERE (countries.id in ('2'))
Query: f*
total hits: 1, results delivered: 1
=> #<ActsAsFerret::SearchResults:0x31fa4ac @per_page=1, @current_page=nil, @total_hits=1, @results=[#<Country id: 2, name: "Frankrijk">], @total_pages=1>


So, aaf (rev 262)
* associates the right custom analyzer with the model,
* but doesn't seem to use it in find_by_contents (and perhaps not when rebuilding the index either ??)


Alain
Posted by Hongli Lai (Guest)
on 15.11.2007 00:25
(Received via mailing list)
Alain Ravet wrote:
>   class Country < ActiveRecord::Base
>     acts_as_ferret(  
>       :fields => [:name] ,
>       :remote => true,
>       :ferret =>  {:analyzer => Test2Analyzer.new([]) } 
>     )
>   end

Try this:

acts_as_ferret({ :fields => [:name], :remote => true },
{ :analyzer => Test2Analyzer.new([]) })
Posted by Jens Kraemer (Guest)
on 15.11.2007 10:04
(Received via mailing list)
On Thu, Nov 15, 2007 at 12:24:25AM +0100, Hongli Lai wrote:
> 
> acts_as_ferret({ :fields => [:name], :remote => true },
> { :analyzer => Test2Analyzer.new([]) })

this won't help, these are both valid ways to call acts_as_ferret. The
:ferret syntax is the preferred one, however.

Jens


--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer@webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa
Posted by Jens Kraemer (Guest)
on 15.11.2007 10:09
(Received via mailing list)
Hi Alain,

Could you please open the index created by aaf with plain Ferret and
your custom analyzer, and check whether your queries deliver the expected
results then?

That way we should be able to find out if the problem is with indexing
or searching through aaf.
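
As a rough illustration of why this check separates the two cases, here is a toy model in plain Ruby. It is not Ferret code: `ToyIndex`, `make_analyzer` and the short `DEFAULT_STOP_WORDS` list are hypothetical and only mimic the idea that text is analyzed once when it is indexed and once again when it is queried.

```ruby
# Toy inverted index (plain Ruby, NOT Ferret) to separate indexing from searching.
# DEFAULT_STOP_WORDS is an assumed sample, not Ferret's real default list.
DEFAULT_STOP_WORDS = %w[the a an and].freeze

# An "analyzer" here is just a lambda from text to an array of tokens.
def make_analyzer(stop_words)
  ->(text) { text.downcase.scan(/[a-z0-9]+/) - stop_words }
end

class ToyIndex
  def initialize(analyzer)
    @analyzer = analyzer
    @postings = Hash.new { |hash, term| hash[term] = [] }
  end

  # Index-time analysis: only tokens that survive the analyzer are stored.
  def add(id, text)
    @analyzer.call(text).each { |term| @postings[term] << id }
  end

  # Query-time analysis may use a different analyzer than indexing did.
  def search(query, analyzer = @analyzer)
    analyzer.call(query).flat_map { |term| @postings.fetch(term, []) }.uniq
  end
end

no_stop   = make_analyzer([])
with_stop = make_analyzer(DEFAULT_STOP_WORDS)

built_with_stop = ToyIndex.new(with_stop)
built_with_stop.add(11, "the")
built_no_stop = ToyIndex.new(no_stop)
built_no_stop.add(11, "the")

p built_with_stop.search("the", no_stop)   # [] : the term was dropped while indexing
p built_no_stop.search("the", with_stop)   # [] : the term was dropped while searching
p built_no_stop.search("the", no_stop)     # [11] : both sides kept the term
```

In this model a miss can come from either side, which is why querying the aaf-built index directly with the custom analyzer tells the two apart.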


Jens

Posted by unknown (Guest)
on 29.01.2008 21:22
(Received via mailing list)
Jens Kraemer <kraemer@webit.de> writes:

>> Try this:
>> 
>> acts_as_ferret({ :fields => [:name], :remote => true },
>> { :analyzer => Test2Analyzer.new([]) })
>
> this won't help, these are both valid ways to call acts_as_ferret. The
> :ferret syntax is the preferred one, however.

Just for information, I was using an old or bad syntax for aaf.

I was using acts_as_ferret :fields => [], :analyzer => MyAnalyzer.new
and it wasn't working. (A raise in initialize of MyAnalyzer was raised,
but one in token_stream was not.)

I'm now using :ferret => {:analyzer => MyAnalyzer} and it works as
expected.
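
One possible explanation, sketched in plain Ruby: if the options are split into an aaf-level hash and a Ferret-level hash, a top-level :analyzer key would never reach Ferret. `split_aaf_options` is a hypothetical helper written for this illustration. It is an assumption about the behaviour, not acts_as_ferret's actual code.

```ruby
# Hypothetical sketch, NOT acts_as_ferret's real implementation.
# It accepts both call styles discussed in this thread:
#   acts_as_ferret({ aaf options }, { ferret options })
#   acts_as_ferret(aaf options, :ferret => { ferret options })
def split_aaf_options(options = {}, ferret_options = {})
  aaf = options.dup
  ferret = ferret_options.merge(aaf.delete(:ferret) || {})
  [aaf, ferret]
end

analyzer = Object.new  # stand-in for MyAnalyzer.new

# Old/bad syntax: :analyzer sits next to :fields and stays in the aaf-level hash.
_, ferret_flat = split_aaf_options(:fields => [:name], :analyzer => analyzer)
p ferret_flat.key?(:analyzer)

# Preferred syntax: :analyzer is nested under :ferret.
_, ferret_nested = split_aaf_options(:fields => [:name], :ferret => { :analyzer => analyzer })
p ferret_nested.key?(:analyzer)

# Two-hash syntax: the second hash is the Ferret-level one.
_, ferret_two = split_aaf_options({ :fields => [:name] }, { :analyzer => analyzer })
p ferret_two.key?(:analyzer)
```

Under this assumption the analyzer object is still constructed in the flat form (so a raise in initialize fires), but it never becomes part of the Ferret options, which would match the symptom described.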