Scaling full text indexing (ferret vs solr vs hyperestraier)

arvias · March 8, 2009, 1:15pm

Hi,

Does anyone have experience scaling full text search in RoR?

Right now our project is running a simple setup with ferret and
acts_as_ferret. We are thinking about deploying a feature that would
send 50x more search requests.

So we probably have to rethink our solution. What do services like
search.twitter.com (the former Summize) use?

Or in what direction should I look?

–
Thanks,
M.

arvias · March 8, 2009, 2:40pm

Marcelo B. wrote:

Right now our project is running a simple setup with ferret and
acts_as_ferret. We are thinking about deploying a feature that would
send 50x more search requests.

With Ferret you can scale reads horizontally: you can have multiple read
servers on a single index. You can only have one write server on a
single index or you’ll risk data corruption.
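
A minimal sketch of that read/write split, assuming a hypothetical
SearchRouter class and made-up host names (none of this is Ferret or
acts_as_ferret API): every index update goes to the one writer, and
searches rotate across the read servers.

```ruby
# Hypothetical router; class, method, and host names are illustrative only.
# Not thread-safe: a real deployment would need a lock or a load balancer.
class SearchRouter
  def initialize(writer:, readers:)
    raise ArgumentError, "need at least one reader" if readers.empty?
    @writer  = writer   # the only node allowed to modify the index
    @readers = readers  # nodes that only open the index for searching
    @next    = 0
  end

  # Every index update is sent to the single writer.
  def node_for_write
    @writer
  end

  # Searches are spread round-robin across the read servers.
  def node_for_read
    node = @readers[@next % @readers.size]
    @next += 1
    node
  end
end

router = SearchRouter.new(writer: "index-writer",
                          readers: ["search-1", "search-2", "search-3"])
router.node_for_write  # always the same node
router.node_for_read   # a different reader on each call
```

Adding read capacity then means adding entries to the readers list; the
writer stays a single node.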

Another strategy is partitioning: having separate indices for buckets of
data. Each index could run on its own server or cluster of servers.
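
Partitioning can be sketched roughly as follows. This is an in-memory
illustration only: Partition and PartitionedIndex are hypothetical names,
the scoring is a naive term count, and in practice each partition would
be a separate index on its own server rather than a Ruby hash.

```ruby
require "zlib"

# Hypothetical stand-in for one index partition.
class Partition
  def initialize
    @docs = {}
  end

  def add(id, text)
    @docs[id] = text.downcase
  end

  # Returns [id, score] pairs; score is how often the term occurs.
  def search(term)
    t = term.downcase
    @docs.map { |id, text| [id, text.scan(t).size] }
         .select { |_, score| score > 0 }
  end
end

class PartitionedIndex
  def initialize(partition_count)
    @partitions = Array.new(partition_count) { Partition.new }
  end

  # A stable hash of the document id picks the partition, so the same
  # document always lands in the same bucket.
  def partition_for(id)
    @partitions[Zlib.crc32(id.to_s) % @partitions.size]
  end

  def add(id, text)
    partition_for(id).add(id, text)
  end

  # Query every partition, then merge the hits by score, best first.
  def search(term, limit: 10)
    @partitions.flat_map { |p| p.search(term) }
               .sort_by { |id, score| [-score, id.to_s] }
               .first(limit)
  end
end

index = PartitionedIndex.new(3)
index.add(1, "ruby search ruby")
index.add(2, "search only")
index.search("ruby")
```

The trade-off is that every query fans out to all partitions, so the
merge step has to be cheap; writes, on the other hand, only touch one
partition each.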

–
Roderick van Domburg
http://www.nedforce.com

arvias · March 8, 2009, 10:53pm

With Ferret you can scale reads horizontally: you can have multiple read
servers on a single index. You can only have one write server on a
single index or you’ll risk data corruption.

Another strategy is partitioning: having separate indices for buckets of
data. Each index could run on its own server or cluster of servers.

Would it be easier to scale with hyperestraier or something else?

–
M.

arvias · March 12, 2009, 10:04pm

On Sun, Mar 8, 2009 at 1:14 PM, Marcelo B. wrote:

Does anyone have experience scaling full text search in RoR?

One option that worked very well for me is ultrasphinx.

http://blog.evanweaver.com/files/doc/fauna/ultrasphinx/files/README.html

IIRC, 2 limitations of ultrasphinx are:

* new entries can only be found after reindexing
  (full reindexing or delta indexing)
* you need a separate sphinx process somewhere on a server
  (if you run a shared hosting system, this may be an issue)

If you can live with those 2 limitations, ultrasphinx is a very good
candidate.
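
To illustrate the first limitation, here is a toy model of the general
main-plus-delta idea. It is not how Sphinx or ultrasphinx implement it;
MainPlusDelta and its methods are hypothetical names, and the "indexes"
are plain Ruby hashes.

```ruby
# Hypothetical in-memory model of a main index plus a delta index.
class MainPlusDelta
  def initialize
    @pending = {}  # records saved but not yet indexed
    @delta   = {}  # small index, rebuilt often
    @main    = {}  # large index, rebuilt rarely
  end

  # Saving a record does not make it searchable yet.
  def save(id, text)
    @pending[id] = text.downcase
  end

  # Cheap, frequent step: index only what changed since the last run.
  def reindex_delta
    @delta.merge!(@pending)
    @pending.clear
  end

  # Expensive, rare step: fold everything into the main index.
  def reindex_full
    reindex_delta
    @main.merge!(@delta)
    @delta.clear
  end

  # A search consults both indexes; delta entries override main ones.
  def search(term)
    t = term.downcase
    @main.merge(@delta).select { |_, text| text.include?(t) }.keys.sort
  end
end

index = MainPlusDelta.new
index.save(1, "Full text search")
index.search("search")  # nothing yet: the record has not been indexed
index.reindex_delta
index.search("search")  # now the record is found
```

How stale search results may get is then set by how often the delta
step runs.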

HTH,

Peter