Index location for multiple servers

I have a rails app that is going to be deployed across several servers.

In my understanding of acts_as_ferret, the index files are kept in the
index folder in the root of the rails app. This won’t work for multiple
servers, of course, since all the servers will have their own directory
tree.

How are people using ferret for apps deployed on multiple servers?
Could I run the index files through the database, much like I do with
session info, to prevent this problem? Or is there a completely
different workaround that I am just not thinking of?

On 07.12.2006, at 17:11, Craig J. wrote:

How are people using ferret for apps deployed on multiple servers?
Could I run the index files through the database, much like I do with
session info, to prevent this problem? Or is there a completely
different workaround that I am just not thinking of?

I haven’t done this myself yet, but I think the best approach is to
employ an index server. The Ferret index is kept on one machine which
runs a DRb service. The other servers do not talk directly to Ferret
but to the index server which performs the indexing/searching and
returns the results as Ruby objects to the clients.
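
A minimal sketch of such a setup (hypothetical class and names; a real
server would wrap a Ferret::Index::Index instead of the in-memory hash
used here):

```ruby
require 'drb/drb'

# Hypothetical index service -- in a real deployment this would wrap
# a Ferret::Index::Index; a plain hash stands in for it here.
class IndexService
  def initialize
    @docs = {}
  end

  # All writes go through this one process, so there is a single writer.
  def add(id, text)
    @docs[id] = text
  end

  # Searches run server-side; plain Ruby objects go back to the client.
  def search(term)
    @docs.select { |_id, text| text.include?(term) }.keys
  end
end

# On the index server:
#   DRb.start_service('druby://0.0.0.0:9000', IndexService.new)
#   DRb.thread.join
#
# On each app server:
#   index = DRbObject.new_with_uri('druby://indexhost:9000')
#   hits  = index.search('ferret')
```

The clients never touch the index files at all; queries and results
travel over DRb as ordinary Ruby objects.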

I suppose this fits best into Rails’ share-nothing approach to scaling.


Andy

Hey …

How are people using ferret for apps deployed on multiple servers?
Could I run the index files through the database, much like I do with
session info, to prevent this problem? Or is there a completely
different workaround that I am just not thinking of?

I haven’t done this myself yet, but I think the best approach is to
employ an index server. The Ferret index is kept on one machine which
runs a DRb service. The other servers do not talk directly to Ferret
but to the index server which performs the indexing/searching and
returns the results as Ruby objects to the clients.

Actually, that’s the only way to do it… you will run into severe
problems if you try to modify the index from different servers on a
shared filesystem, at least with the current version of Ferret.

Dave wanted to include support for an indexing server in Ferret, with
some sort of BackgrounDRb-like mechanism where you fire your indexing
requests at a remote process/server/whatever…

Actually Jens and I will discuss this topic on Saturday, if we find a
solution, we’ll let you and the list know…

Ben

Craig J. wrote:

I have a rails app that is going to be deployed across several servers.

In my understanding of acts_as_ferret, the index files are kept in the
index folder in the root of the rails app. This won’t work for multiple
servers, of course, since all the servers will have their own directory
tree.

How are people using ferret for apps deployed on multiple servers?
Could I run the index files through the database, much like I do with
session info, to prevent this problem? Or is there a completely
different workaround that I am just not thinking of?

As others have already mentioned, you need to make sure the index writer
is a singleton instance with all other servers as clients.

It is a fun problem if you want to write it yourself and are familiar
with DRb (quick to learn) and threading/locking issues.
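
A sketch of the locking side of that (hypothetical wrapper class; it
assumes a single writer process fronted by DRb):

```ruby
# Hypothetical wrapper that serializes all writes to the one index
# writer, so concurrent DRb client calls can't interleave modifications.
class LockedWriter
  def initialize(writer)
    @writer = writer    # e.g. a Ferret::Index::Index (an Array works for testing)
    @lock = Mutex.new
  end

  def add_document(doc)
    @lock.synchronize { @writer << doc }
  end
end
```

Served over DRb, this gives you the singleton writer with every app
server as a client.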

However, if you don’t want to spend the time to do that yourself, I
would highly suggest looking into the Searchable plugin, which uses
Ferret or Solr as its search backend. It integrates nicely into your
models using ActiveRecord hooks - much like aaf.

The key feature in your case is that it also provides a DRb backend
which you can turn on with a simple config switch, allowing you to
have a central indexing server. More about this plugin at RubyForge:

http://searchable.rubyforge.org/

It has been mentioned once before on this list but I think it deserves
more exposure - especially as more people try to use Ferret for larger
projects which require a multiple-server deployment. My company wrote
our own DRb Ferret implementation in-house – but that was before I saw
the Searchable Rails plugin :wink:

Good Luck,
-damian

damian wrote:

However if you don’t want to spend the time to do that yourself I would
highly suggest looking into the Searchable plugin which uses Ferret or
Solr as search backends. […]

http://searchable.rubyforge.org/

Thanks Damian,

I’m going to download and test out the Searchable plugin. It seems as
if it should be able to handle what I need, and if it does, you are
right; it deserves more exposure. I had never heard of it before.

On 07.12.2006, at 19:19, Raymond O’connor wrote:

I may be a special case since my index is never updated through my
frontend app, but I was planning on keeping one “gold” index on a
backend server and updating this index through a script whenever there
are updates to my documents (about once a week in my case). This server
will update the index and then basically do a cp to the frontend
webservers, copying over the old indexes on each of those machines.
What do people think of this solution? The Searchable DRb solution
looks interesting though, and I may consider it if I run into issues
doing it this way.

If updates happen so rarely, this might be a feasible solution. At
least it involves fewer moving parts than an indexing server would,
and it’s faster than querying a remote server. I’m not sure, however,
whether Ferret would like it if the index files are overwritten during
a read operation. You could write a Capistrano recipe which shuts down
your app, updates the index files and restarts it afterwards.
Depending on the size of your index, this involves a downtime of a few
seconds, which is certainly tolerable.


Andy

I may be a special case since my index is never updated through my
frontend app, but I was planning on keeping one “gold” index on a
backend server and updating this index through a script whenever there
are updates to my documents (about once a week in my case). This server
will update the index and then basically do a cp to the frontend
webservers, copying over the old indexes on each of these machines.
What do people think of this solution? The Searchable DRb solution
looks interesting though, and I may consider it if I run into issues
doing it this way.

On Thu, Dec 07, 2006 at 08:01:34PM +0100, Andreas K. wrote:

What do people think of this solution? […]
seconds which is certainly tolerable.

I’d suggest swapping out the whole index directory at once (something
like ‘mv index index.old && mv index.new index’) and then
re-opening the Searcher. At least on Unix/Linux that should work
without downtime.

Jens


webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer [email protected]
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

On 12/7/06, Craig J. [email protected] wrote:

[…] session info to prevent this problem? Or is there a completely
different workaround that I am just not thinking of?

A pretty typical way to do this with Lucene is described here:
http://www.mail-archive.com/[email protected]/msg12709.html

-ryan