Full-text Search Options Redux

Hey all,

I haven’t heard much on the list in a while re: ActiveSearch, Ferret,
Hyper Estraier, etc.

Has anyone successfully used ActiveSearch on tables with rows on the
order of hundreds of thousands, if not millions?

(in our case, each row containing anywhere from a paragraph of text to
many pages)

Here’s some background on our search efforts.

A. Hyper Estraier - this seemed like a real front-runner.
http://hyperestraier.sourceforge.net/

  • choked / crawled to a halt after about 80k documents had been added to
    the index
  • supposed to be super-scalable, high performance, P2P, etc.
  • perhaps we needed a server or two more for an index of this size to
    use HE?
  • is unable to do a hot-copy backup. The index also seems to be easily
    corrupted, and once that happens, re-indexing 100k+ documents is not a
    pleasant experience! =)

B. Ferret - this also seemed like a clear winner.
http://wiki.rubyonrails.com/rails/pages/HowToIntegrateFerretWithRails

  • was taking 2 seconds on average to index a new text (but kept getting
    slower as the index grew)
  • other than that (performance), seems like an incredible piece of
    software + tight integration w/ Rails through acts_as_ferret

Next we’re planning on giving ActiveSearch a run for its money.

Although, if a bunch of y’all chime in and say “Nooooooooo!” we might
change course once again =)

FWIW, we’re using PostgreSQL as the database. We’ve also considered
using Postgres’ tsearch2 full-text search feature. The setup for that
seems a bit more complicated, but perhaps it’s worth it. Anyone have a
favorite tsearch2 story to share?

Thanks!

- Shanti

http://sproutit.com/ - team-based support/sales email management
http://sablog.com/ - personal blog

On 4/28/06, Shanti B. wrote:

B. Ferret - this also seemed like a clear winner.
http://wiki.rubyonrails.com/rails/pages/HowToIntegrateFerretWithRails

  • was taking 2 seconds on average to index a new text (but kept getting
    slower as the index grew)

The current version of Ferret indexes about 100 times as fast as the
pure Ruby version, so performance shouldn’t be a problem anymore. Also,
it shouldn’t get much slower as the index grows.
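
One way to check this against your own data is to time each batch of
documents as the index grows and see whether the per-batch cost climbs.
A minimal sketch, with some assumptions: ToyIndex and time_batches are
hypothetical names made up for illustration, and ToyIndex is a toy
in-memory inverted index standing in for whatever indexer you are
testing. It is not Ferret's API; swap in your real indexing call.

```ruby
require 'benchmark'

# Hypothetical stand-in for a real indexer (NOT Ferret's API):
# a hash mapping each term to the ids of the documents containing it.
class ToyIndex
  attr_reader :doc_count

  def initialize
    @postings = Hash.new { |hash, term| hash[term] = [] }
    @doc_count = 0
  end

  # Tokenize the text and record the document id under each distinct term.
  def add(id, text)
    text.downcase.scan(/\w+/).uniq.each { |term| @postings[term] << id }
    @doc_count += 1
  end

  # Return the ids of the documents containing the term.
  def search(term)
    @postings.fetch(term.downcase, [])
  end
end

# Index docs in batches; return [docs_indexed_so_far, seconds_for_batch] pairs.
def time_batches(index, docs, batch_size)
  timings = []
  docs.each_slice(batch_size) do |batch|
    seconds = Benchmark.realtime do
      batch.each { |id, text| index.add(id, text) }
    end
    timings << [index.doc_count, seconds]
  end
  timings
end

# Made-up sample documents; replace with rows from your own table.
docs = (1..1000).map { |i| [i, "document number #{i} about full text search"] }
index = ToyIndex.new
timings = time_batches(index, docs, 250)
timings.each do |count, secs|
  puts format('%6d docs: %.4fs for last batch', count, secs)
end
```

If the seconds per batch stay roughly flat as the document count rises,
indexing cost is not growing with index size. The toy data here is far
too small to say anything about a real index, so treat the numbers as a
template for the measurement rather than a result.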
