Forum: Ruby on Rails Full-text Search Options Redux

Shanti Braford (sbraford)
on 2006-04-28 02:14
Hey all,

I haven't heard much on the list in a while re: ActiveSearch, Ferret,
Hyper Estraier, etc.

Has anyone successfully used ActiveSearch on tables with rows in the
order of hundreds of thousands, if not millions?

(in our case, each row containing anywhere from a paragraph of text to
many pages)


Here's some background on our search efforts.

A. Hyper Estraier - this seemed like a real front-runner.
http://hyperestraier.sourceforge.net/
- choked / crawled to a halt after about 80k documents had been added to
the index
- supposed to be super-scalable, high performance, P2P, etc.
- perhaps we needed a server or two more for an index of this size to
use HE?
- is unable to do a hot-copy backup.  the index also seems to be easily
corrupted.  and, once that happens, re-indexing 100k+ documents is not a
pleasant experience!  =)

B. Ferret - this also seemed like a clear winner.
http://wiki.rubyonrails.com/rails/pages/HowToInteg...
- was taking 2 seconds on average to index a new text (but kept getting
slower as the index grew)
- other than that (performance), seems like an incredible piece of
software + tight integration w/ Rails through acts_as_ferret

Next we're planning on giving ActiveSearch a run for its money.

Although, if a bunch of y'all chime in and say "Nooooooooo!" we might
change course once again =)

FWIW, we're using PostgreSQL as the database.  We've also considered
using Postgres' tsearch2 full-text search feature.  The setup for that
seems a bit more complicated, but perhaps it's worth it.  Anyone have a
favorite tsearch2 story to share?

Thanks!

- Shanti

http://sproutit.com/     - team-based support/sales email management
http://sablog.com/       - personal blog
David Balmain (Guest)
on 2006-04-28 03:02
(Received via mailing list)
On 4/28/06, Shanti Braford <shanti@braford.org> wrote:
> B. Ferret - this also seemed like a clear winner.
> http://wiki.rubyonrails.com/rails/pages/HowToInteg...
> - was taking 2 seconds on average to index a new text (but kept getting
> slower as the index grew)

The current version of Ferret indexes about 100 times as fast as the
pure Ruby version, so performance shouldn't be a problem anymore. Also,
it shouldn't get much slower as the index grows.