Ferret / hyperestraier indexing time


#1

Since I can’t manage to get the accelerated Ferret going (amd64
incompatible C code) and Hyper Estraier returns 0 results for everything
after a reboot (despite claiming its index is every bit as big as
before), I made a simple Rails inverted index that essentially just does
a find_or_create_by_word for each word, and then adds its id to a join
table linking words and documents…
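For readers who want to see the shape of the inverted index described above, here is a minimal sketch in plain Ruby. It is an in-memory stand-in, not the poster's actual Rails code: the class name `InvertedIndex`, the `tokenize` helper, and the sample documents are all assumptions made for illustration, and a Hash of Sets plays the role of the words table plus the join table.

```ruby
require 'set'

# Hypothetical in-memory stand-in for the words table and the
# words/documents join table described in the post.
class InvertedIndex
  def initialize
    # word => Set of document ids (plays the role of the join table)
    @postings = Hash.new { |hash, word| hash[word] = Set.new }
  end

  # Tokenize the text and record the document id under every distinct word.
  # In the database version this is where find_or_create_by_word would run.
  def add(doc_id, text)
    tokenize(text).uniq.each { |word| @postings[word] << doc_id }
  end

  # Return the sorted ids of documents that contain every word in the query.
  def search(query)
    words = tokenize(query)
    return [] if words.empty?
    sets = words.map { |word| @postings.fetch(word, Set.new) }
    sets.reduce { |acc, set| acc & set }.to_a.sort
  end

  private

  # Assumed tokenizer: lowercase runs of ASCII letters and digits.
  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end
end

index = InvertedIndex.new
index.add(1, "Ferret is a search library")
index.add(2, "Hyper Estraier is a search engine")
p index.search("search")          # => [1, 2]
p index.search("search library")  # => [1]
```

In a database-backed version, each distinct word costs at least one lookup and possibly an insert, which may be where much of the indexing time goes for a long article.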

The only thing is it takes about 2 or 3 seconds to index a reasonably
large article, so this slows down ‘add’ operations, etc…

Ezra’s BackgrounDRb sounds like it will hit the spot. But how do
acts_as_searchable and acts_as_ferret handle this? Are they so much
faster that indexing time is moot?


#2

On Jun 12, 2006, at 3:44 PM, carmen wrote:

the only thing is it takes about 2 or 3 seconds to index a reasonably

Carmen-

Building an index is exactly the kind of thing that BackgrounDRb is
great for. There are already some people using it to build
their Hyper Estraier and Ferret indexes. Join the mailing list[1] and
I can help you get the hang of how to use it. Eventually I want to
set up a small repo of user-contributed worker classes for others to
use.
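The general idea of moving indexing out of the request path can be sketched in plain Ruby with a queue and a worker thread. This is not BackgrounDRb's API; the class name `BackgroundIndexer`, its methods, and the fake indexing job are hypothetical and only illustrate the pattern of enqueueing work and returning immediately.

```ruby
require 'thread'

# Plain-Ruby stand-in for a background worker; NOT BackgrounDRb's API.
class BackgroundIndexer
  def initialize(&index_job)
    @queue = Queue.new
    @worker = Thread.new do
      # Block until a job arrives; a nil job tells the worker to stop.
      while (job = @queue.pop)
        index_job.call(*job)
      end
    end
  end

  # Returns immediately; the indexing happens on the worker thread.
  def enqueue(doc_id, text)
    @queue << [doc_id, text]
  end

  # Finish any queued jobs, then stop the worker.
  def shutdown
    @queue << nil
    @worker.join
  end
end

indexed = []
indexer = BackgroundIndexer.new do |doc_id, text|
  sleep 0.05                            # pretend indexing is slow
  indexed << [doc_id, text.split.size]  # record id and word count
end

indexer.enqueue(1, "a reasonably large article")
indexer.enqueue(2, "another article")
indexer.shutdown
p indexed  # => [[1, 4], [2, 2]]
```

The trade-off with any approach like this is that a newly added document is not searchable until the worker gets to it.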

Cheers-
-Ezra

[1] http://rubyforge.org/mailman/listinfo/backgroundrb-devel


#3

carmen wrote:

Since I can’t manage to get the accelerated Ferret going (amd64
incompatible C code) and Hyper Estraier returns 0 results for everything
after a reboot (despite claiming its index is every bit as big as
before), I made a simple Rails inverted index that essentially just does
a find_or_create_by_word for each word, and then adds its id to a join
table linking words and documents…

The only thing is it takes about 2 or 3 seconds to index a reasonably
large article, so this slows down ‘add’ operations, etc…

Ezra’s BackgrounDRb sounds like it will hit the spot. But how do
acts_as_searchable and acts_as_ferret handle this? Are they so much
faster that indexing time is moot?

I’m using Hyper Estraier. With about 20K articles in the index, on my dev
box with tons of other processes running, here’s sample performance:

a.body.split(' ').size
=> 1382

t1 = Time.now; a.update_index(true); Time.now - t1
=> 1.150097

Not exactly lightning fast, but not a deal breaker for me as inserts are
relatively infrequent.

BTW, it sounds like either your app is looking for the wrong HE node, or
you had a corrupted index. Has that “can’t find anything” problem come up
multiple times? I haven’t had any trouble in testing.
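The timing shown earlier in this post subtracts two Time.now values by hand. A rough equivalent using Ruby's standard Benchmark module might look like the sketch below. The `fake_update_index` method is a hypothetical stand-in for the real `update_index` call, and the 1382-word body is synthetic, so the number it prints says nothing about Hyper Estraier itself.

```ruby
require 'benchmark'

# Hypothetical stand-in for the real indexing call (update_index in the post):
# it just counts word occurrences.
def fake_update_index(text)
  text.split(' ').each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
end

# Synthetic article body with the same word count as the sample above.
body = (['word'] * 1382).join(' ')

# Benchmark.realtime returns the elapsed wall-clock time in seconds as a Float.
elapsed = Benchmark.realtime { fake_update_index(body) }
puts "#{body.split(' ').size} words indexed in #{format('%.6f', elapsed)} s"
```

Running the measurement several times and looking at the spread would give a fairer picture than a single run on a busy dev box.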

phil