Full-text Search Options Redux

Hey all,

I haven’t heard much on the list in a while re: ActiveSearch, Ferret,
Hyper Estraier, etc.

Has anyone successfully used ActiveSearch on tables with rows on the
order of hundreds of thousands, if not millions?

(in our case, each row containing anywhere from a paragraph of text to
many pages)

Here’s some background on our search efforts.

A. Hyper Estraier - this seemed like a real front-runner.

  • choked / crawled to a halt after about 80k documents had been added to
    the index
  • supposed to be super-scalable, high performance, P2P, etc.
  • perhaps we needed a server or two more for an index of this size to
    use HE?
  • is unable to do a hot-copy backup. The index also seems to be easily
    corrupted, and once that happens, re-indexing 100k+ documents is not a
    pleasant experience! =)

B. Ferret - this also seemed like a clear winner.
http://wiki.rubyonrails.com/rails/pages/HowToIntegrateFerretWithRails

  • was taking 2 seconds on average to index a new text (but kept getting
    slower as the index grew)
  • other than that (performance), seems like an incredible piece of
    software + tight integration w/ Rails through acts_as_ferret

Next we’re planning on giving ActiveSearch a run for its money.

Although, if a bunch of y’all chime in and say “Nooooooooo!” we might
change course once again =)

FWIW, we’re using PostgreSQL as the database. We’ve also considered
using Postgres’ tsearch2 full-text search feature. The setup for that
seems a bit more complicated, but perhaps it’s worth it. Anyone have a
favorite tsearch2 story to share?

Thanks!

- Shanti

http://sproutit.com/ - team-based support/sales email management
http://sablog.com/ - personal blog

On 4/28/06, Shanti B. wrote:

B. Ferret - this also seemed like a clear winner.
http://wiki.rubyonrails.com/rails/pages/HowToIntegrateFerretWithRails

  • was taking 2 seconds on average to index a new text (but kept getting
    slower as the index grew)

The current version of Ferret indexes about 100 times as fast as the
pure Ruby version, so performance shouldn’t be a problem anymore. Also,
it shouldn’t get much slower as the index grows.