Scaling full text indexing (ferret vs solr vs hyperestraier)

Hi,

Does anyone have experience scaling full text search in RoR?

Right now our project is running a simple setup with ferret and
acts_as_ferret. We are thinking about deploying a feature that would
send 50x more search requests.

So we probably have to rethink our solution. What do services like
search.twitter.com (formerly Summize) use?

Or in what direction should I look?


Thanks,
M.

Marcelo B. wrote:

> Right now our project is running a simple setup with ferret and
> acts_as_ferret. We are thinking about deploying a feature that would
> send 50x more search requests.

With Ferret you can scale reads horizontally: you can have multiple read
servers on a single index. You can only have one write server on a
single index or you’ll risk data corruption.
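
To make the one-writer, many-readers layout concrete, here is a rough sketch in plain Ruby. It is not Ferret or acts_as_ferret API; the SearchRouter class and the host names are made up for illustration, and it is not thread-safe as written.

```ruby
# Hypothetical router: exactly one writer host, several read-only search hosts.
class SearchRouter
  def initialize(writer:, readers:)
    raise ArgumentError, "need at least one reader" if readers.empty?
    @writer = writer
    @readers = readers
    @next_reader = 0
  end

  # Every index update goes to the single writer.
  def host_for_write
    @writer
  end

  # Searches rotate over the readers (simple round-robin).
  def host_for_read
    host = @readers[@next_reader % @readers.size]
    @next_reader += 1
    host
  end
end

# Assumed host names, for illustration only.
router = SearchRouter.new(writer: "index-writer",
                          readers: ["search-1", "search-2", "search-3"])
```

Adding read capacity then means adding an entry to the readers list, while all writes keep going through one place.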

Another strategy is partitioning: having separate indices for buckets of
data. Each index could run on its own server or cluster of servers.
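
A minimal sketch of the partitioning idea, again in plain Ruby rather than any search library's API. The shard names, shard_for and search_all are hypothetical; the caller passes a block that actually queries one shard and returns [score, id] pairs.

```ruby
require "zlib"

# Hypothetical shard list; each entry stands for a server (or cluster)
# that holds one index.
SHARDS = %w[index-0 index-1 index-2 index-3].freeze

# Map a record key to a shard with a stable hash (CRC32), so the same
# record always lands in the same index.
def shard_for(key, shards = SHARDS)
  shards[Zlib.crc32(key.to_s) % shards.size]
end

# A search has to ask every shard and merge the hits.
# The block receives (shard, query) and must return an array of
# [score, id] pairs for that shard.
def search_all(query, shards = SHARDS, limit: 10)
  hits = shards.flat_map { |shard| yield(shard, query) }
  hits.sort_by { |score, _id| -score }.first(limit)
end
```

One caveat with this kind of merge: scores computed by separate indices are based on different term statistics, so they are only roughly comparable. Partitioning along a boundary that searches rarely cross (per customer, per language) avoids having to merge at all.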


Roderick van Domburg
http://www.nedforce.com

> With Ferret you can scale reads horizontally: you can have multiple read
> servers on a single index. You can only have one write server on a
> single index or you’ll risk data corruption.
>
> Another strategy is partitioning: having separate indices for buckets of
> data. Each index could run on its own server or cluster of servers.

Would it be easier to scale with hyperestraier or something else?


M.

On Sun, Mar 8, 2009 at 1:14 PM, Marcelo B. wrote:

> Does anyone have experience scaling full text search in RoR?

One option that worked very well for me is ultrasphinx.

http://blog.evanweaver.com/files/doc/fauna/ultrasphinx/files/README.html

IIRC, 2 limitations of ultrasphinx are:

  • new entries can only be found after reindexing
    (full reindexing or delta indexing)
  • you need a separate sphinx process somewhere on a server
    (if you run a shared hosting system, this may be an issue)

If you can live with those 2 limitations, ultrasphinx is a very good
candidate.
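
To illustrate the first limitation, here is a toy model of the main-plus-delta idea in plain Ruby. It is not the ultrasphinx or Sphinx API; build_index and search are made-up helpers and the "index" is just an in-memory hash. The point is only that a new record becomes findable once the small delta index has been rebuilt, without redoing the big one.

```ruby
# Toy in-memory index: maps a lowercase word to the ids of the records
# that contain it.
def build_index(records)
  index = Hash.new { |hash, word| hash[word] = [] }
  records.each do |id, text|
    text.downcase.scan(/\w+/).uniq.each { |word| index[word] << id }
  end
  index
end

# Query both indexes and merge the ids.
def search(word, main, delta)
  key = word.downcase
  (main.fetch(key, []) + delta.fetch(key, [])).uniq
end

records = { 1 => "Ferret and Solr", 2 => "Sphinx search" }
main = build_index(records)        # large index, rebuilt rarely

new_records = { 3 => "Sphinx delta indexing" }
delta = build_index(new_records)   # small index, rebuilt often

search("sphinx", main, delta)      # => [2, 3]
```

Until the delta is rebuilt, record 3 simply does not show up, so the delay users see is bounded by how often the delta job runs.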

HTH,

Peter