Ferret / acts_as_ferret multiple server deployment

Has anyone deployed ferret & acts_as_ferret to a load-balanced,
multiple-server environment? If so, did you simply use a shared network
index? I have a couple of ideas on how to deploy, but each has
shortcomings, and I’m hoping to find out if anyone else has deployed
ferret in this manner.

The application is simply load balanced between multiple servers running
the same app for speed and redundancy, and things that are to be indexed
could be changed at the same time on each instance. To make sure the
index is up to date we’ll be using acts_as_ferret, but this seems to
cause a potential problem when several servers write to the same index.
Any insight you could provide would be appreciated.

On Tue, Sep 12, 2006 at 07:33:19PM +0200, J Coppedge wrote:

> Has anyone deployed ferret & acts_as_ferret to a load balanced multiple
> server environment? If so, did you simply use a shared network index?

I’m unsure if an index on a shared network drive would work.

interesting problem, that had to come up sooner or later :)

In case the ‘index on a network drive’ doesn’t work out (file locking
is one thing that could go wrong), I’d go for a central index server
handling all the searching and indexing. This won’t work with
acts_as_ferret, though.

If searching speed is an issue and accuracy of results is not, you
could replicate the index to your app servers once in a while and search
there.
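
A rough sketch of the "replicate once in a while" idea, assuming the index is just a directory of files that can be copied. The `replicate_index` helper, the `snapshot-*` naming and the `current` symlink are all made up for illustration; nothing here is part of Ferret or acts_as_ferret, and getting the files from the master box to each app server (rsync or similar) is left out. Copying while the master is mid-write could also produce an inconsistent snapshot, so a real setup would need to copy at a quiet moment.

```ruby
require 'fileutils'

# Copy the master index into a fresh snapshot directory, then repoint a
# 'current' symlink at it. Searchers open replica_root/current, so they
# never see a half-copied index. (Hypothetical layout, POSIX only.)
def replicate_index(master_dir, replica_root)
  FileUtils.mkdir_p(replica_root)
  stamp = Time.now.strftime('%Y%m%d%H%M%S%N')
  snapshot = File.join(replica_root, "snapshot-#{stamp}")
  FileUtils.cp_r(master_dir, snapshot)

  current = File.join(replica_root, 'current')
  tmp_link = "#{current}.tmp"
  FileUtils.rm_f(tmp_link)
  File.symlink(snapshot, tmp_link)
  # rename(2) replaces the old symlink in one step.
  File.rename(tmp_link, current)
  snapshot
end
```

Run from cron or a similar scheduler, this trades freshness for local search speed: results are only as recent as the last copy. Old snapshots are not cleaned up here.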

I feel it’s time for acts_as_remote_ferret ;) Something like aaf, but
connecting to a remote index server whenever a record is saved.
Or it could be implemented as an option to aaf, which would then work on
local indexes in development and test environments, and against a remote
index server in production mode.
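
To make the remote index server idea a bit more concrete, here is a minimal sketch using DRb, Ruby's distributed object library. Everything in it is an assumption for illustration: `IndexServer` is a toy in-memory stand-in for a real Ferret index, and its method names and the loopback address are invented. It is not how acts_as_ferret works.

```ruby
require 'drb/drb'

# Toy stand-in for a Ferret index. One process owns it, so there is only
# ever one writer, no matter how many app servers call in.
class IndexServer
  def initialize
    @docs = {}
    @lock = Mutex.new
  end

  # Would be called by each app server whenever a record is saved.
  def add(id, text)
    @lock.synchronize { @docs[id] = text.downcase }
    true
  end

  # Returns the ids of documents containing the term, sorted.
  def search(term)
    needle = term.downcase
    @lock.synchronize do
      @docs.select { |_id, text| text.include?(needle) }.keys.sort
    end
  end
end

# On the index box. Port 0 picks any free port; a real setup would pin one.
server = DRb.start_service('druby://127.0.0.1:0', IndexServer.new)

# On each app server: talk to the index through a remote proxy object.
remote_index = DRbObject.new_with_uri(server.uri)
remote_index.add(1, 'Ferret on several servers')
remote_index.add(2, 'Something unrelated')
hits = remote_index.search('ferret')

DRb.stop_service
```

The point of the design is that the app servers never touch index files at all, which sidesteps the locking question. The cost is a network round trip per save and per search, plus a single point of failure.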

sounds really interesting…
what other deployment scenarios did you think of ?

Jens


webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

I believe you touched on each one…

  1. Shared network index.

  2. Sync of centralized index to individual index on each “slave” server.

  3. Centralizing the searching / indexing to a separate search server -
    however it’s possible that you would also need to load balance that
    service at some point…

On Tue, Sep 12, 2006 at 10:57:32PM +0200, J Coppedge wrote:

> 3. Centralizing the searching / indexing to a separate search server -
>    however it’s possible that you would also need to load balance that
>    service at some point…

load balancing the indexing to several servers can only be done via
segmenting the data across those servers, and merging it when searching.
This seems possible but is not implemented in Ferret (yet?)
Java-Lucene has some kind of RMI stuff for searching multiple remote
indexes afair.
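
As an illustration of "segment the data, merge when searching", here is a toy sketch in plain Ruby. It does not use Ferret: each shard is just a Hash standing in for an index on a separate server, the `ShardedIndex` and `Hit` names are made up, and the "score" is a naive term count.

```ruby
Hit = Struct.new(:doc_id, :score)

# Each shard stands in for an index living on its own server.
class ShardedIndex
  def initialize(shard_count)
    @shards = Array.new(shard_count) { {} }
  end

  # Indexing: every document lands on exactly one shard, chosen by id.
  def add(doc_id, text)
    @shards[doc_id % @shards.size][doc_id] = text.downcase.split
  end

  # Searching: ask every shard, then merge the partial results by score.
  def search(term, limit: 10)
    needle = term.downcase
    hits = @shards.flat_map do |shard|
      shard.map { |id, words| Hit.new(id, words.count(needle)) }
           .select { |hit| hit.score > 0 }
    end
    hits.sort_by { |hit| [-hit.score, hit.doc_id] }.first(limit)
  end
end
```

Writes spread across the shards, so indexing load is shared, while every search has to touch all of them. In a real engine, scores computed against separate indexes are not automatically comparable, which is part of what makes the merge step harder than it looks here.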

Even with 2 servers accessing the same physical index on a shared
network drive you would see no indexing speed increase, since only one
process may write-access the index at a time. searching speed would
increase, of course.
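
The single-writer rule can be pictured with an exclusive file lock. This is only a sketch of the principle using Ruby's `File#flock`; the `with_write_lock` helper and the lock file path are hypothetical, and this is not a description of how Ferret does its own locking. Note that `flock` is advisory, and on a network file system it may not behave reliably, which is exactly the worry with a shared index.

```ruby
# Run the block only if we can take an exclusive, non-blocking lock on
# lock_path. Returns true if the block ran, false if another holder
# already has the lock.
def with_write_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |lock_file|
    return false unless lock_file.flock(File::LOCK_EX | File::LOCK_NB)

    begin
      yield
      true
    ensure
      lock_file.flock(File::LOCK_UN)
    end
  end
end
```

A second writer that fails to get the lock has to wait or retry, so adding servers does not add indexing throughput. Readers are not blocked by this lock at all.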

I don’t know how much traffic you expect, but I’d go with the
simplest solution (besides the shared disk, where I’m somewhat unsure if
it is possible) for as long as possible:

one centralized server handling all searching/indexing. Failover
could be achieved by replicating the index to another box that
steps in when needed.
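
On the client side, "steps in when needed" could be as simple as the following sketch. `FailoverSearcher` and `FakeBackend` are invented names; the backends stand in for whatever object talks to the real search boxes.

```ruby
# Try the primary search server first and fall back to the standby
# (which holds a replicated copy of the index) if the call fails.
class FailoverSearcher
  def initialize(primary, standby)
    @primary = primary
    @standby = standby
  end

  def search(query)
    @primary.search(query)
  rescue StandardError
    # Primary unreachable or broken: answer from the replica instead.
    @standby.search(query)
  end
end

# Hypothetical stand-in for a search box that may be down.
class FakeBackend
  def initialize(name, up: true)
    @name = name
    @up = up
  end

  def search(query)
    raise IOError, "#{@name} is down" unless @up

    "#{@name}: results for #{query}"
  end
end
```

This only covers searching. Results from the standby are as stale as the last replication, and writes made while the primary is down would still need to be queued or replayed somehow.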

cheers,
Jens

http://rubyforge.org/mailman/listinfo/ferret-talk

On 9/13/06, Jens K. wrote:

> load balancing the indexing to several servers can only be done via
> segmenting the data across those servers, and merging it when searching.
> This seems possible but is not implemented in Ferret (yet?)

The start of this is there (ie the MultiSearcher). I just need to
implement RemoteSearcher. Don’t expect it any time soon however as I’m
a little burnt out at the moment. I’m just going to be cleaning up
what is currently already built for the time being.

Cheers,
Dave

Any progress on RemoteSearcher? :)