Scaling Ferret Beyond One Server

andy · July 15, 2006, 12:48am

Hi Everyone,

I was wondering if folks here have had experience scaling Ferret beyond
a single server? Currently, we are running Ferret on the same physical
server as its Rails front end (via acts_as_ferret), but it is already
evident that we need a more scalable solution. How would you split up
the tasks (via DRb perhaps?) between two or three servers? Shared disk,
a replicated Ferret index (?), or any other ideas?

Thanks in advance,
AC

andy · July 15, 2006, 7:33am

On 7/15/06, Andy C. [email protected] wrote:

Hi Andy,

I guess the answer depends on which part of the application is the
bottleneck. If it is Ferret then replicating the index might be the
solution but it’s complicated and I doubt that is your problem.

If Ferret is handling the workload (which it should be if you have the
C extension installed) then my guess would be to use a DRb solution.
In a few weeks I’m going to start experimenting with using Ferret with
DRb, and future versions may even come with a DRb server included. In
the meantime, let me know how you go.

Cheers,
Dave
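
A minimal sketch of the DRb approach suggested above, for readers who have not used DRb: one process owns the index and exposes it over the network, and the Rails side talks to it through a proxy object. This is an illustration only, not the DRb server Dave mentions for future Ferret versions. `ToyIndex` is a hypothetical in-memory stand-in so the example runs without Ferret installed; in a real setup it would wrap the actual Ferret index, and the two halves would run on different machines.

```ruby
require 'drb/drb'

# Hypothetical stand-in for the search index. It is NOT Ferret's API;
# a real service would delegate these calls to a Ferret index instead.
class ToyIndex
  def initialize
    @docs = []
    @lock = Mutex.new # DRb serves each request in its own thread
  end

  # Stores a document and returns its id.
  def add(text)
    @lock.synchronize do
      @docs << text
      @docs.size - 1
    end
  end

  # Returns the ids of documents containing the term (case-insensitive).
  def search(term)
    needle = term.downcase
    @lock.synchronize do
      @docs.each_index.select { |id| @docs[id].downcase.include?(needle) }
    end
  end
end

# --- search server side ---
# Port 0 asks the OS for any free port; a real deployment would pick a
# fixed host and port, and a standalone server script would finish with
# DRb.thread.join to keep the process alive.
server = DRb.start_service('druby://127.0.0.1:0', ToyIndex.new)

# --- Rails (client) side ---
# The client only needs the URI; method calls are forwarded to the server.
remote = DRbObject.new_with_uri(server.uri)
remote.add('Scaling Ferret beyond one server')
remote.add('Rails front end')

hits = remote.search('ferret')
p hits
```

Because only one process writes to the index in this arrangement, the front-end servers never open the index files themselves. The search server is still a single point of failure, though.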

andy · July 17, 2006, 4:27pm

Dave,

Thanks for your feedback and for developing the wonderful Ferret!

Besides performance, our application requirement is to have no single
point of failure, which is why we are looking at running Ferret (at
least the search node) beyond a single server.

In the Lucene world, there’s an interesting post at
http://www.mail-archive.com/[email protected]/msg12709.html
on how Technorati is doing distributed Lucene…

Our current options are (1) DRb, (2) some replication technique similar
to the one described by Doug Cutting in the above post, and (3) possibly
some form of distributed file system like Hadoop (which will also serve
other needs for our app). I will let the list know how it goes. I’m also
interested in hearing about anybody else’s experience using Ferret on
more than one machine.

-AC
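
To make the "no single point of failure" requirement concrete, here is a rough sketch of how options (1) and (2) might combine on the client side: several search nodes each serve their own copy of the index over DRb, and the front end falls through to the next node when one is unreachable. Everything here is an assumption for illustration. `ReplicaIndex` and `FailoverSearcher` are hypothetical names, the index is a toy in-memory stand-in rather than Ferret, and keeping the copies in sync (the hard part) is not shown.

```ruby
require 'drb/drb'

# Hypothetical stand-in for one node's copy of the index (not Ferret's API).
class ReplicaIndex
  def initialize(docs)
    @docs = docs
  end

  # Returns the ids of documents containing the term (case-insensitive).
  def search(term)
    needle = term.downcase
    @docs.each_index.select { |id| @docs[id].downcase.include?(needle) }
  end
end

# Hypothetical client: asks each search node in order and moves on to the
# next one when a node cannot be reached.
class FailoverSearcher
  def initialize(uris)
    @nodes = uris.map { |uri| DRbObject.new_with_uri(uri) }
  end

  def search(term)
    @nodes.each do |node|
      begin
        return node.search(term)
      rescue DRb::DRbConnError
        next # node is down or unreachable; try the next replica
      end
    end
    raise DRb::DRbConnError, 'no search node reachable'
  end
end

docs = ['Scaling Ferret beyond one server', 'Rails front end']

# One healthy node (port 0 lets the OS pick a free port for the demo).
live = DRb.start_service('druby://127.0.0.1:0', ReplicaIndex.new(docs))

# Simulate a failed node: start a second service, keep its URI, stop it.
dead = DRb::DRbServer.new('druby://127.0.0.1:0', ReplicaIndex.new(docs))
dead_uri = dead.uri
dead.stop_service

# The dead node is listed first, so the search has to fail over.
searcher = FailoverSearcher.new([dead_uri, live.uri])
hits = searcher.search('ferret')
p hits
```

This only covers reads. Writes would still need a single indexer pushing updates to every node, or a replication scheme along the lines of the post linked above.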

andy · July 18, 2006, 1:37am

Andy C. wrote:

(3) possibly some form of distributed file system like hadoop

Actually, Hadoop is built as a distributed filesystem for workloads that
only need sequential reading of files. It’s not useful for random
access. You might want to try something like MogileFS instead.