Hey all,

I haven't heard much on the list in a while re: ActiveSearch, Ferret, Hyper Estraier, etc.

Has anyone successfully used ActiveSearch on tables with rows on the order of hundreds of thousands, if not millions? (In our case, each row contains anywhere from a paragraph of text to many pages.)

Here's some background on our search efforts.

A. Hyper Estraier - this seemed like a real front-runner.
http://hyperestraier.sourceforge.net/
- choked / crawled to a halt after about 80k documents had been added to the index
- supposed to be super-scalable, high performance, P2P, etc.
- perhaps we needed a server or two more for an index of this size to use HE?
- is unable to do a hot-copy backup. The index also seems to be easily corrupted, and once that happens, re-indexing 100k+ documents is not a pleasant experience! =)

B. Ferret - this also seemed like a clear winner.
http://wiki.rubyonrails.com/rails/pages/HowToInteg...
- was taking 2 seconds on average to index a new text (but kept getting slower as the index grew)
- other than that (performance), seems like an incredible piece of software + tight integration w/ Rails through acts_as_ferret

Next we're planning on giving ActiveSearch a run for its money. Although, if a bunch of y'all chime in and say "Nooooooooo!" we might change course once again =)

FWIW, we're using PostgreSQL as the database. We've also considered using Postgres' tsearch2 full-text search feature. The setup for that seems a bit more complicated, but perhaps it's worth it. Anyone have a favorite tsearch2 story to share?

Thanks!

- Shanti
http://sproutit.com/ - team-based support/sales email management
http://sablog.com/ - personal blog
on 2006-04-28 02:14
on 2006-04-28 03:02
On 4/28/06, Shanti Braford <email@example.com> wrote:
> B. Ferret - this also seemed like a clear winner.
> http://wiki.rubyonrails.com/rails/pages/HowToInteg...
> - was taking 2 seconds on average to index a new text (but kept getting
> slower as the index grew)

The current version of Ferret indexes about 100 times as fast as the pure Ruby version, so performance shouldn't be a problem anymore. Also, it shouldn't get much slower as the index grows.