ANN: acts_as_ferret

weibel · December 2, 2005, 7:43pm

Hi all

This week I have worked with Rails and Ferret to test Ferret’s (and Lucene’s) capabilities. I decided to make a mixin for ActiveRecord, as it seemed the simplest possible solution, and I ended up making this into a plugin.

For more info on Ferret see:
http://ferret.davebalmain.com/trac/

The plugin is functional but could easily be refined. Anyway, I want to share it with you. Regard it as a basic solution. Most of the ideas and code are taken from these sources:

Howtos and help on Ferret with Rails:

Peak Obsession

http://article.gmane.org/gmane.comp.lang.ruby.rails/26859

http://ferret.davebalmain.com/trac

http://aslakhellesoy.com/articles/2005/11/18/using-ferret-with-activerecord

http://rubyforge.org/pipermail/ferret-talk/2005-November/000014.html

Howtos on creating plugins:

Peak Obsession

http://www.jamis.jamisbuck.org/articles/2005/10/11/plugging-into-rails

Simplest Possible Plugin Manager For Rails

Peak Obsession

The result is the acts_as_ferret Mixin for ActiveRecord.

Use it as follows:
In any model.rb add acts_as_ferret

class Foo < ActiveRecord::Base
acts_as_ferret
end

All CRUD operations will be performed on both ActiveRecord (as usual)
and a
ferret index for further searching.

The following method is available in your controllers:

ActiveRecord::find_by_contents(query) # query is a string representing your query

The plugin follows the usual plugin structure and consists of 2 files:

{RAILS_ROOT}/vendor/plugins/acts_as_ferret/init.rb
{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb

The Ferret DB is stored in:

{RAILS_ROOT}/db/index.db

Here follows the code:

CODE for init.rb

require 'acts_as_ferret'

END init.rb

CODE for acts_as_ferret.rb

require 'active_record'
require 'ferret'

module FerretMixin #(was: Foo)
module Acts #:nodoc:
module ARFerret #:nodoc:

     def self.append_features(base)
        super
        base.extend(MacroMethods)
     end

     # declare the class level helper methods
     # which will load the relevant instance methods defined below when
     # invoked

     module MacroMethods

        def acts_as_ferret
           extend FerretMixin::Acts::ARFerret::ClassMethods
           class_eval do
              include FerretMixin::Acts::ARFerret::ClassMethods

              after_create :ferret_create
              after_update :ferret_update
              after_destroy :ferret_destroy
           end
        end

     end

     module ClassMethods
        include Ferret

        INDEX_DIR = "#{RAILS_ROOT}/db/index.db"

        def self.reloadable?; false end

        # Finds instances by file contents.
        def find_by_contents(query, options = {})
           index_searcher ||= Search::IndexSearcher.new(INDEX_DIR)
           query_parser   ||= QueryParser.new(index_searcher.reader.get_field_names.to_a)
           query = query_parser.parse(query)

           result = []
           index_searcher.search_each(query) do |doc, score|
              id = index_searcher.reader.get_document(doc)["id"]
              res = self.find(id)
              result << res if res
           end
           index_searcher.close()
           result
        end

        # private

        def ferret_create
           index ||= Index::Index.new(:key => :id,
                                   :path => INDEX_DIR,
                                   :create_if_missing => true,
                                   :default_field => "*")
           index << self.to_doc
           index.optimize()
           index.close()
        end

        def ferret_update
           #code to update index
           index ||= Index::Index.new(:key => :id,
                                   :path => INDEX_DIR,
                                   :create_if_missing => true,
                                   :default_field => "*")
           index.delete(self.id.to_s)
           index << self.to_doc
           index.optimize
           index.close()
        end

        def ferret_destroy
           # code to delete from index
           index ||= Index::Index.new(:key => :id,
                                   :path => INDEX_DIR,
                                   :create_if_missing => true,
                                   :default_field => "*")
           index_writer.delete(self.id.to_s)
           index_writer.optimize()
           index_writer.close()
        end

        def to_doc
           # Churn through the complete Active Record and add it to the Ferret document
           doc = Ferret::Document::Document.new
           self.attributes.each_pair do |key,val|
              doc << Ferret::Document::Field.new(key, val.to_s,
                                                 Ferret::Document::Field::Store::YES,
                                                 Ferret::Document::Field::Index::TOKENIZED)
           end
           doc
        end
     end
end
end
end

# reopen ActiveRecord and include all the above to make
# them available to all our models if they want it

ActiveRecord::Base.class_eval do
include FerretMixin::Acts::ARFerret
end

END acts_as_ferret.rb

weibel · December 2, 2005, 8:15pm

+1 great work

weibel · December 2, 2005, 9:01pm

Very nice Kasper-

Thanks for sharing!

Cheers-
-Ezra

-Ezra Z.
Yakima Herald-Republic
WebMaster
http://yakimaherald.com
509-577-7732

weibel · December 2, 2005, 9:18pm

Thanks… one problem. I believe that I’m doing everything correctly, except I keep getting this error on any CRUD operation:

undefined local variable or method `document' for #<Region:0xb7124c50>

(where Region is the name of my model)

any ideas? The index is created and I’ve been able to test Ferret from a
command line script just fine.

weibel · December 2, 2005, 9:45pm

On 2-dec-2005, at 19:22, Kasper W. wrote:

Hi all

This week I have worked with Rails and Ferret to test Ferrets (and
Lucenes)
capabilities. I decided to make a mixin for ActiveRecord as it
seemed the
simplest possible solution and I ended up making this into a plugin.

I recently finished a simple search plugin, which works like this

class Page < ActiveRecord::Base
  indexes_columns :title, :body, :into=>'somecolumn'
end

It’s here: http://julik.textdriven.com/svn/tools/rails_plugins/simple_search/ (just finished the tests)

Maybe we can join the two plugins and get a nice search hook for AR
searching? Along the lines of

class Page < ActiveRecord::Base
  # if you pass a Ferret index it gets hooked instead of a column for LIKE
  indexes_columns :title, :body, :into=>MainFerretIndex
end

Or even maintain named Ferret indexes if the user has Ferret and
resort to LIKE queries if he doesn’t?

Julian ‘Julik’ Tarkhanov
me at julik.nl

weibel · December 3, 2005, 2:10am

Hi Kasper,

Nice work. Do you mind if I put this on the Ferret Wiki?

A few minor points. And a disclaimer, I haven’t had time to use Rails
since I started working on Ferret so I could be wrong about a few
things here. I noticed in ferret_destroy you have index_writer. I
think this is meant to be just index. Also, where you have the lines;

          index.optimize()
          index.close()

I would replace these with;

          index.flush()

Optimizing the index every time is not necessary and can be quite slow
for large indexes. Also, if you close the index, the next time you try
to use it you should get an error. I’m not sure why it works for you.
It might be a bug. I’ll have to check it out. Better to leave the
index open. If you are optimizing every time because you are really
concerned about search speed, it is better just to set the merge
factor to 2. ie;

           index ||= Index::Index.new(:key => :id,
                                  :path => INDEX_DIR,
                                  :merge_factor => 2)

Remember that there is generally a trade-off between indexing speed and
search speed. Also note that I removed the :default_field and
:create_if_missing options. They were set to the defaults anyway.

Another thing, since you are setting the key to :id, there is no need
to do the delete when you do the update. This will happen
automatically.

Lastly, and most importantly, I think this will only work if you only
apply it to one object or you’ll get conflicting ids from two
different tables. To make this available to more than one object,
there are two solutions I can think of. You could have a separate
index directory for each object. Or you can set the key like this;

           index ||= Index::Index.new(:key => [:id, :table],
                                  :path => INDEX_DIR)

And your to_doc method would need to store the name of the table in
the :table field in the document.
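
As a rough sketch of that idea, in plain Ruby with no Ferret calls: `Record` and `to_doc_fields` are hypothetical stand-ins, and a Hash stands in for the Ferret document. The point is only that the table name travels with every document, so the pair of id and table is unique across models.

```ruby
# Hypothetical stand-in for an ActiveRecord instance (not part of the plugin).
Record = Struct.new(:table, :id, :attributes)

# Build the field set for one record. The :table entry is what lets
# [:id, :table] act as a unique key across several models.
def to_doc_fields(record)
  fields = { :id => record.id.to_s, :table => record.table }
  record.attributes.each_pair do |key, val|
    fields[key.to_sym] = val.to_s
  end
  fields
end

# Extract the composite key from a document Hash.
def composite_key(doc)
  [doc[:id], doc[:table]]
end

page   = Record.new("pages",   1, { "title" => "Home" })
region = Record.new("regions", 1, { "name"  => "North" })

# Same id, different table: the composite keys differ.
puts composite_key(to_doc_fields(page)).inspect
puts composite_key(to_doc_fields(region)).inspect
```

With only :id as the key, the second document would replace the first; with the table name included, both can live in one index.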

I hope all this information helps. When I get some time to use Rails
I’ll post my own code.

Cheers,
Dave

PS: I just released Ferret 0.3.0 so gem update and enjoy.

weibel · December 3, 2005, 3:23am

David B. <dbalmain.ml@…> writes:

Hi Kasper,

Nice work. Do you mind if I put this on the Ferret Wiki?

Thanks David

This is really quality input!

It’s my first week with Ferret and I’m still working my way into it. I hope I’ll get time to reflect on your comments before Monday.

Feel free to put it on the wiki!

Kasper

weibel · December 5, 2005, 5:44am

Great job on this, Kasper. I took a look at this a few days ago and started playing with it this weekend. I’ve taken a few of Erik’s suggestions and started trying to implement them. I don’t know if you’ve already started working on enhancing it, but I’d be very interested in contributing my changes. It’ll probably be a few days before I can get back in and finish things up, though. (The Portland Ruby Brigade has their monthly meeting on Tuesday, so that’s one night’s work missed. ;~)

Here are the changes I’ve started working on:

Adding configuration

The notation I’m working on is something like this:
```
acts_as_ferret :index_dir => "#{RAILS_ROOT}/index/", :fields => {…}
```

Still playing with the configuration of the fields. I’ve also written it so that the default is to index all fields with the default settings. In addition, it should be possible to simply pass an array to the fields parameter and default the settings for Storable, etc.
Adding the ability to pass Query objects to the find_by_contents method.

I’ve been doing some refactoring along the way, too, and hope to add some unit tests eventually. One final suggestion: perhaps the name should be changed to acts_as_indexed?

Anyway, this is great work. I hope I can make worthwhile contributions
to this.

–
Thomas L.

weibel · December 3, 2005, 12:04am

James R <adamjroth@…> writes:

Thanks… one problem. I beleive that I’m doing everything correctly
except I keep getting this error on any CRUD operating:

The following in acts_as_ferret.rb should be one line (almost at the end of the file):

# Churn through the complete Active Record and add it to the Ferret document

Take care with those line breaks.

Kasper

weibel · December 5, 2005, 6:16am

Hi Thomas,

For additional ideas look here:

http://ferret.davebalmain.com/trac/wiki/FerretOnRails

And of course, please feel free to add your improvements.

Cheers,
Dave

weibel · December 5, 2005, 10:39am

On Dec 4, 2005, at 11:39 PM, Thomas L. wrote:

(The Portland Ruby Brigade has their monthly meeting on Tuesday, so
that’s one
nights work missed.
;~)

You Portland Rubyists really know how to party! I went to the event
during OSCON in August - what a blast.

Adding configuration

The notation I’m working on is something like this:
 acts_as_ferret :index_dir => "#{RAILS_ROOT}/index/", :fields => {…}

So you’re thinking that each model may have its own index? I wasn’t
sure if one index per model made sense or whether a single index,
globally configured through environment.rb and friends, made the most
sense. Using one index would allow some future clever things such as
querying without the table name allowing results to come back with
objects spanning multiple models.

I’m leaning towards preferring a single index, such that the :index_dir configuration would be done globally via environment.rb, not per model.

Adding the ability to pass Query objects to the find_by_contents
method.

Cool. Maybe this should be renamed to find_by_ferret? If a String
is passed in, it gets parsed (with the options hash allowing control
over the parsing), and if a Query is passed in then it is used as-is.
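
A minimal sketch of that dispatch, using stand-in classes so it runs without Ferret: `StubQuery`, `parse_query` and `to_query` are hypothetical names, and the real method would hand the result to the index searcher.

```ruby
# Stand-in for a Ferret query object; the real class lives in Ferret.
class StubQuery
  attr_reader :expression

  def initialize(expression)
    @expression = expression
  end
end

# Stand-in for the query parser step; options would control the parsing.
def parse_query(string, options = {})
  StubQuery.new(string.strip)
end

# A String is parsed; anything else is assumed to be a ready-made query
# and is used as-is.
def to_query(query_or_string, options = {})
  if query_or_string.is_a?(String)
    parse_query(query_or_string, options)
  else
    query_or_string
  end
end

parsed = to_query(" foo AND bar ")
ready  = StubQuery.new("title:foo")
puts parsed.expression
puts to_query(ready).equal?(ready)
```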

I’ve been doing some refactoring along the way, too, and hope to
add some unit
tests eventually. One final suggestion, perhaps the name should be
changed to
acts_as_indexed?

I like it being acts_as_ferret personally. “indexed” is overloaded
within the relational database domain, so it could be construed as
having to do with DB indexes.

Anyway, this is great work. I hope I can make worthwhile
contributions to this.

Thanks for your efforts! I’m glad to see this all coming together.

Erik

weibel · December 3, 2005, 12:17pm

CC’ing ferret-talk also.

Nice work, Kasper! You’ve beaten me to it - this was something I
was planning on tackling in the near future.

I’ve got some additional feedback for you inlined below. Keep in
mind that I’m being highly detailed in my feedback, in order to help
this extension become the best it can be given Lucene best
practices. Your work is a great start, and I want to see this
evolve. All comments below are constructive, not even ‘criticism’.
Thanks for getting this started!

On Dec 2, 2005, at 1:22 PM, Kasper W. wrote:

The result is the acts_as_ferret Mixin for ActiveRecord.

Use it as follows:
In any model.rb add acts_as_ferret

class Foo < ActiveRecord::Base
acts_as_ferret
end

Ideally there will be many options desired besides just enabling a
table to be indexed fully. More on that in a moment.

All CRUD operations will be performed on both ActiveRecord (as
usual) and a
ferret index for further searching.

The toughest issue to deal with here is transactions. Suppose a
database operation rolls back - then what happens to the index? It’s
out of sync. I don’t have any easy solutions though, and it is an
issue that pops up regularly in the Java Lucene community as well.
There is quite a mismatch between a relational database and a full-
text index when it comes to how updates and additions are handled.

At the very least, a warning should be included mentioning the
transactional issue.

Another facility that is desirable with Lucene is the ability to
rebuild the entire index from scratch. Why? Perhaps you change the
analyzer, you will need to re-index all documents to have them re-
analyzed.
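
The shape of such a rebuild can be sketched in plain Ruby. Here an in-memory Hash stands in for the index, and `MemoryIndex` and `rebuild_index` are hypothetical names; a real version would add Ferret documents instead.

```ruby
# Stand-in for an index: an in-memory Hash keyed by [table, id].
class MemoryIndex
  attr_reader :docs

  def initialize
    @docs = {}
  end

  def add(table, id, fields)
    @docs[[table, id]] = fields
  end
end

# Rebuild from scratch: start with an empty index and re-add every record.
# records_by_table maps a table name to an Array of attribute Hashes.
def rebuild_index(records_by_table)
  index = MemoryIndex.new
  records_by_table.each do |table, records|
    records.each do |attrs|
      index.add(table, attrs[:id], attrs)
    end
  end
  index
end

data = {
  "pages"   => [{ :id => 1, :title => "Home" }, { :id => 2, :title => "About" }],
  "regions" => [{ :id => 1, :name => "North" }]
}
index = rebuild_index(data)
puts index.docs.size
```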

The following method is available in your controllers:

ActiveRecord::find_by_contents(query) # query is a string representing your query

Dave mentioned this, but you’re currently only indexing “id”, not the table name. Thus you could get documents matching the query from other tables, and get an id that doesn’t exist for the current table or one that belongs to a different table. The table name needs to be considered somehow, either by building a separate index for each table, or by adding the table name as an indexed, untokenized field.

The Ferret DB is stored in:

{RAILS_ROOT}/db/index.db

Please consider NOT calling it a “DB”. Ferret is Lucene. What it
builds is an “index”, not a “database” in the traditional sense. I
think it would be best to avoid “db” terminology to prevent confusion.

     module ClassMethods
        include Ferret

        INDEX_DIR = "#{RAILS_ROOT}/db/index.db"

I’m not sure how to parameterize “acts_as” extensions, but making the
index location more configurable would be good.

        # Finds instances by file contents.
        def find_by_contents(query, options = {})
           index_searcher ||= Search::IndexSearcher.new(INDEX_DIR)
           query_parser   ||= QueryParser.new(index_searcher.reader.get_field_names.to_a)
           query = query_parser.parse(query)

QueryParser is only one (and often crude) way to formulate a Query.
Ideally there would be a couple of methods to search with, one that
takes a QueryParser-friendly expression like “foo AND bar NOT baz”
and another that takes a Query instance allowing a developer to
formulate sophisticated queries via the Ferret query API rather than
parsing an expression. There are many good reasons for this, most
importantly from a user interface perspective where the application
makes more sense to have separate fields that build up a query rather
than the one totally free-form Google-esque text box. Many
applications need full-text search, but not in a way that users need
to know query expression operators like +/-/AND/OR.

Back to the table name issue, here you’ll want to wrap the query with a BooleanQuery AND’d with a TermQuery on the table field, so that you’re sure the only hits returned will be for the current table.

           result = []
           index_searcher.search_each(query) do |doc, score|
              id = index_searcher.reader.get_document(doc)["id"]
              res = self.find(id)
              result << res if res
           end

Some handling of paging needs to be added here. It is unlikely that all hits are needed, and accessing the Document for every hit will be an enormous performance bottleneck with lots of data. It is very important to choose the hits enumeration carefully. Doing a database query for every hit is also likely to be a huge bottleneck. Perhaps doing a SQL “IN” query for all ids after narrowing the set of hits (by page) is feasible, though I’m not sure what limits exist on how many items you can have in an “IN” clause. I’ve not delved into Ferret in much depth yet, but in Java Lucene a HitCollector would possibly be a good way to handle this.
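
The page-then-IN idea can be sketched in plain Ruby as follows. The helper names (`page_of_ids`, `in_condition`, `sort_by_rank`) are made up for this example, and the ranked id list is assumed to come from the search step.

```ruby
# Return the ids for one page of hits (pages are numbered from 1).
def page_of_ids(hit_ids, page, per_page)
  hit_ids[(page - 1) * per_page, per_page] || []
end

# Build a single "IN" condition for the page instead of one query per hit.
# Ids are forced to integers so nothing unexpected ends up in the SQL.
def in_condition(ids)
  return "1 = 0" if ids.empty? # nothing matched: select no rows
  "id IN (#{ids.map { |id| Integer(id) }.join(', ')})"
end

# Restore the search ranking, since the database does not return rows
# in the order the ids were listed.
def sort_by_rank(records, ids)
  records.sort_by { |r| ids.index(r[:id]) }
end

hit_ids = [42, 7, 19, 3, 88]         # ranked ids, as a search might return them
ids     = page_of_ids(hit_ids, 2, 2) # second page, two hits per page
puts in_condition(ids)
```

The condition string would then go to the model’s finder in one call, and `sort_by_rank` puts the returned rows back into relevance order.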

           index_searcher.close()
           result
        end

It is definitely unwise to close the IndexSearcher instance for every
search - leaving it open allows for field caches to warm up and
speeds up successive searches.

        # private

        def ferret_create
           index ||= Index::Index.new(:key => :id,
                                   :path => INDEX_DIR,
                                   :create_if_missing => true,
                                   :default_field => "*")

Dave mentioned the key thing, and I’ll reiterate the need to add the
table name to it.

           index << self.to_doc
           index.optimize()
           index.close()
        end

Reiterating Dave, but just to be thorough, optimizing and closing an
index is not a good thing to do on every document operation as it can
be slow. And definitely heed his advice about using flush. There
does need to be a facility to optimize the index on demand, which
developers may choose to do as a nightly batch process, or
periodically as the index becomes segmented.

        def ferret_update
           #code to update index
           index ||= Index::Index.new(:key => :id,
                                   :path => INDEX_DIR,
                                   :create_if_missing => true,
                                   :default_field => "*")

I recommend centralizing the Index constructor, so as to not
duplicate all of those parameters and allowing them to be changed in
one spot.
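
A sketch of one way to centralize it, with a stand-in class so it runs without Ferret (`FakeIndex` and `FerretIndexAccess` are hypothetical names). Note that in the posted code `index ||= ...` assigns a local variable, so nothing is remembered between calls; memoizing in an instance variable of a module is one alternative.

```ruby
# Stand-in for Ferret's index class, so the sketch runs without Ferret.
class FakeIndex
  attr_reader :options

  def initialize(options)
    @options = options
  end
end

module FerretIndexAccess
  # All constructor parameters live here and nowhere else.
  INDEX_OPTIONS = { :key => :id, :path => "db/index.db" }

  # The one place where the index is constructed; memoized so that
  # create, update and destroy all share the same open instance.
  def self.index
    @index ||= FakeIndex.new(INDEX_OPTIONS)
  end
end

a = FerretIndexAccess.index
b = FerretIndexAccess.index
puts a.equal?(b)
```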

                                   :create_if_missing => true,
                                   :default_field => "*")
           index_writer.delete(self.id.to_s)
           index_writer.optimize()
           index_writer.close()
        end

Again, the table name should be part of the key for all operations
above.

end

This to_doc is where a lot of fun can be had. There are many options
that need to be parameterized by the developer at the model level.
For example, how a field is indexed is crucial. You’re storing and
tokenizing every field, including the “id” field. You definitely do
not want to tokenize the “id” field. Adding the table name is needed
also, untokenized. Each field should allow flexibility on how it is
(or is not) indexed, including whether to store/tokenize the field or
not. Storing fields is unnecessary in the ActiveRecord sense, since
what you’re returning from the search method are records from the
database, not documents from the index. Making the analyzer
controllable is necessary at a global level for the index, and
overridable on a per-field level too.

A common technique with Lucene when field-level searching granularity
is not relevant is to create an aggregate field, say “contents” where
all text is indexed. With Ferret, you could do this by iterating
over all fields that should be indexed/tokenized using the “contents”
as the field name for all fields of the record. Then searches would
occur only against “contents”. While Dave likes the default field to
be “*”, I personally find distributing a query expression across all
fields tricky and error-prone, especially given that different fields
may be analyzed differently. Consider a query for “foo bar”. With
two fields “title” and “body”, how do you expand that query across
all fields? Not trivial. This is why I like the aggregate
“contents” field technique, which can work in conjunction with fields
indexed individually also, so a query for “foo bar” would search the
“contents” field by default, but someone could do “title:foo
body:bar” to refine things.
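
The aggregate-field technique can be sketched with a plain Hash standing in for the Ferret document. `fields_with_contents` is a hypothetical helper, and which attributes count as text is left to the caller.

```ruby
# Build the field set for one record: each text attribute under its own
# name, plus an aggregate "contents" field holding all the text together.
def fields_with_contents(attributes, text_fields)
  doc = {}
  text_fields.each do |name|
    doc[name] = attributes[name].to_s
  end
  doc["contents"] = text_fields.map { |name| attributes[name].to_s }.join(" ")
  doc
end

doc = fields_with_contents(
  { "id" => 1, "title" => "Ferret on Rails", "body" => "Full-text search" },
  ["title", "body"]
)
puts doc["contents"]
```

A default search would then target only "contents", while the individual fields remain available for refined queries.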

I think this is enough, and perhaps too much(!), feedback for now. Sorry if it seems overly picky, but I think this is a very important addition to Rails and ActiveRecord. The magic that is Lucene is very special, and I’m thrilled that it has now entered the Ruby world. I want to help Ferret and its integration into places like ActiveRecord go as smoothly as possible and keep the outstanding reputation that Lucene has in the Java (and C# and Python, etc.) world. There are many ways to use Lucene inefficiently. I’ll be here doing what I can to help oversee that things are done in the best possible way.

Erik

weibel · December 5, 2005, 12:05pm

Hi all

First of all I’d like to take the opportunity to thank you all for the great response. Personally I feel that this approach to Ferret/Rails integration will be a good thing to investigate further. People need quality search.

I think that we should agree on where to put the input for this project. The page on David B.’s wiki is a good start - thanks for that, David.
http://ferret.davebalmain.com/trac/wiki/FerretOnRails

I needed this code for a specific task at my job and there are still many things to do to make it generally usable.

I will comment on different people’s input below.

Thanks to David for giving direct input for enhancing the quality of the code and explaining index.flush() to me. It’s good to have the author of Ferret giving direct input, as I’m not really sure where the pitfalls in the implementation are speed/quality wise.

As both David and Erik Hatcher have pointed out, the current implementation will only index one model per application. My view on this issue is that I would like to have one index for all models as opposed to multiple index files; that is ONE Ferret index per application.

I will also need to implement a method for rebuilding the index. This will come in handy both in development mode and probably also in production.

Erik pointed out that there will be problems with transactions and I must admit that I don’t have any viable ideas of how to approach this issue. I have thought of turning transactions off for the SQL tables in question - if that’s possible at all.

Erik also had problems with the name index.db. Instead I suggest index.frt.

The current search method should be worked on. At the moment it fires quite a few SQL select statements. There is also a need for the implementation of pagination.

The to_doc method is one way to approach things when building the index. I actually thought of Erik’s suggestion about an aggregate field, which sounds practical. There should be a way of configuring which fields go where.

I have had many ideas of what other things to implement. One of them is that hard core Lucene folks will probably not put up with the limitations of a specific implementation if it makes things difficult. One of the things I like about Active Record in Rails is the find_by_sql() method, which lets you do whatever you want on the SQL side. A similar approach could be implemented with Ferret: find_by_fql() - if there is such a term as Ferret Query Language.

Also the many possibilities for fine tuning should not be forgotten in favour of simplicity. There should always be a way to make the configuration exactly as you would like it. I favour the configuration approach Thomas L. has suggested.

Lastly: I really appreciate your contributions and I feel that with our combined efforts it will be possible to build a quality solution. In time acts_as_ferret could become the preferred choice for Ferret/Rails integration.

Kasper

weibel · December 5, 2005, 5:25pm

Erik H. <erik@…> writes:

On Dec 4, 2005, at 11:39 PM, Thomas L. wrote:

(The Portland Ruby Brigade has their monthly meeting on Tuesday, so
that’s one
nights work missed.
;~)

You Portland Rubyists really know how to party! I went to the event
during OSCON in August - what a blast.

Well, that was my first PDX.rb event since I had just moved here, so I can’t take credit for all that…

The notation I’m working on is something like this:

    acts_as_ferret :index_dir => "#{RAILS_ROOT}/index/", :fields => {…}

So you’re thinking that each model may have its own index?

Actually, I guess I didn’t indicate very well what was going to be optional configuration and what was fixed. I only put that there to indicate that you could have one index per model. I left out the part that would allow you to configure it globally. I tend to agree with you, in fact, that one global index makes the most sense.

Adding the ability to pass Query objects to the find_by_contents
method.

Cool. Maybe this should be renamed to find_by_ferret?

sounds reasonable to me.

If a String is passed in, it gets parsed (with the options hash allowing
control over the parsing), and if a Query is passed in then it is used
as-is.

That’s pretty much what I was aiming for.

I like it being acts_as_ferret personally. “indexed” is overloaded
within the relational database domain, so it could be construed as
having to do with DB indexes.

Seems reasonable to me.

Thomas

weibel · December 14, 2005, 3:29am

Great work Thomas,

I just noticed two things in my quick glance. Firstly, you need to change Document::Field::Index::NO to Document::Field::Index::UNTOKENIZED for the :ferret_class and :id fields. My fault, as I made the same mistake in my code above.

Also, I don’t know if you meant to use symbols, but you shouldn’t use ‘:’ in a field name as it will throw off the query parser. Get rid of the ‘"’ around :ferret_class and :id and you’ll be fine.

I made both these changes on the wiki already.

One other change you may like to make is to allow Query objects to be
passed to the find_by_contents method as well as Strings, but I’ll
leave that one up to you for the moment.

Hope that helps,
Dave

weibel · December 14, 2005, 1:43am

Since it’s been over a week and I’ve only had time to tinker here and there on my proposed changes to the acts_as_ferret plugin, I thought it was time to just post what I had so far and let others weigh in on it or take their own stab at making it more complete. I’ve posted my updated version along with some brief notes at the bottom of the Ferret wiki page here:
http://ferret.davebalmain.com/trac/wiki/FerretOnRails

I’m still actively working on this, but I’ve only been able to do it in fits and spurts so far. I apologize for the ugliness of some of the code; I’m still trying to figure out how to do all the dynamic “magic” necessary for this sort of thing.

weibel · December 14, 2005, 7:04am

It’s so great that people are working on this! Ferret is great and I
look forward to seeing it better integrated with Rails.

Thomas – I tried this code but experienced a few problems with it. I never got it to work, and gave up since it’s not exactly what I need (the documents I’m storing in Ferret don’t exactly match my model objects, but are a composite of them). Still, I have some feedback that might (or might not) be helpful.

In addition to what David mentioned, I noticed that you use the method
class_variable_set in the method acts_as_ferret. This isn’t available in
Ruby 1.8.2. Moreover, I’m not sure why you’re using this here since the
variable names are not dynamic. I just changed these to:

        @@fields_for_ferret = Array.new
        @@class_index_dir = configuration[:index_dir]

Also, I noticed that the indentation on the class method append_features
was a bit off … it looked like super was the beginning of a block.
Just a minor thing.

Also, I’m confused about the name for the SingletonMethods module. What
is the singleton that’s being referred to here? This isn’t a criticism
– I’m just confused, since it seems to me that these methods get added
to your model classes and are available to each instance. Are they named
such because each model has a single instance of the index?

Also, I was wondering – since ferret_create is aliased as
ferret_update, shouldn’t it first call a delete before adding itself to
the index? For example, something like:

    def ferret_create
      begin
        ferret_delete
      rescue
        # ignore: the record may not be in the index yet
      end
      ferret_index << self.to_doc
    end
    alias :ferret_update :ferret_create

Also, a question for David – is auto_flush => true supposed to remove
the lock automatically after writes? I ask because I also tried the
code that Kasper originally posted, and I kept getting locking errors
unless I closed the index after updates (and I also wasn’t quite able to
get that code to work before giving up). I was running both a Web
instance and trying to get at it with console, which is similar, I
think, to what would happen with multiple FCGI processes.

Thanks to everyone for your efforts, especially David for Ferret itself!

Jen

weibel · December 14, 2005, 8:22am

jennyw wrote:

Also, a question for David – is auto_flush => true supposed to remove
the lock automatically after writes? I ask because I also tried the
code that Kasper originally posted, and I kept getting locking errors
unless I closed the index after updates (and I also wasn’t quite able
to get that code to work before giving up). I was running both a Web
instance and trying to get at it with console, which is similar, I
think, to what would happen with multiple FCGI processes.

Oops! Never mind about the locking problem … it turns out I had an
older version of Ferret installed that probably didn’t support
auto_flush.

Jen

weibel · December 14, 2005, 9:07am

I am rewriting parts of the plugin (I’ll contribute it around next week). I want to use a search method that takes some special arguments for Ferret as well as the usual arguments for find, so that when the search is done, it calls find with the found ids and the conditions/include etc., and returns what’s needed. I am hesitating between ferret_search (no risk of being reimplemented by someone else) and search (very common; maybe it could someday become a method in Rails itself). What is your opinion? I was thinking of running the Ferret query first and then fetching the database entries (from MySQL, for example). But I can’t really tell which would be faster (searching Ferret first or ActiveRecord first); it really depends on the use of conditions…

weibel · December 14, 2005, 6:43pm

It’s great that you guys are working on this. I have been following the
developments with a fair amount of interest and am hoping to integrate
some of this work with my own code on a project I am working on. A
couple of questions:

Has anyone considered a universal search across multiple models yet? How
would this work considering the fact that currently the code is per
model?

What about indexing fields that are not contained in the model? For
example: say I have an Article model with a belongs_to relationship to
an Author model. I would like the author’s name to be indexed along with
the contents of the article in the ferret document. I guess this may be
more of a ruby programming issue than a ferret issue. It seems that the
general practice is to keep track of fields to be used/indexed/inspected
as an array of symbols. In my notional article example that might be:

[:title, :document]

I’d prefer it to look more like:

[:title, :document, :author.name]

but “:author.name” is going to be problematic, is it not?
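
It is problematic because Ruby reads `:author.name` as a call to `name` on the symbol `:author`, not as a path through the association. One possible workaround, sketched here in plain Ruby, is to write the path as a string and resolve it segment by segment. `resolve_field` is a hypothetical helper and the Struct classes stand in for the models.

```ruby
# Stand-ins for the Article and Author models.
Author  = Struct.new(:name)
Article = Struct.new(:title, :document, :author)

# Resolve a dotted path such as "author.name" by sending each segment
# in turn; a nil anywhere along the chain yields nil.
def resolve_field(record, path)
  path.to_s.split(".").inject(record) do |obj, segment|
    obj.nil? ? nil : obj.send(segment)
  end
end

article = Article.new("Search", "Ferret and Rails", Author.new("Kasper"))
fields  = [:title, :document, "author.name"]
fields.each { |f| puts "#{f}: #{resolve_field(article, f)}" }
```

Plain symbols keep working for the model’s own columns, and only the associated fields need the string form.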

Any thoughts on these issues? Let me know if I have not been clear
enough.

-F
