I’m the technical lead at Lingr (http://www.lingr.com), a chatroom-based
social networking site. We’ve currently got several million user
utterances stored in MySQL, and we’re looking to build local search
functionality. I’ve played around with aaf (acts_as_ferret) and I really like it, but I
have some questions.
Is anyone out there using aaf to index a corpus of this size? If
so, how has your scaling experience been?
We would be running one central aaf server instance, talking to it
over drb from our many application servers. We add tens of thousands of
utterances per day. Is anyone out there indexing this many items on a
daily basis over drb? If so, how has your experience been in terms of
throughput and reliability?
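For anyone unfamiliar with the pattern, here is a minimal sketch of the central-index-server idea using plain drb from the standard library. This is not aaf’s actual DRb server (aaf ships its own, run via script/ferret_server); the IndexServer class and port number here are purely illustrative stand-ins:

```ruby
require 'drb/drb'

# Hypothetical stand-in for a central index process. aaf's real server
# wraps a Ferret index; this just shows the drb client/server pattern.
class IndexServer
  def initialize
    @docs = []
  end

  def add(doc)
    @docs << doc
    @docs.size
  end

  def search(term)
    @docs.select { |d| d.include?(term) }
  end
end

# On the central search box:
DRb.start_service('druby://localhost:9123', IndexServer.new)

# On each application server:
index = DRbObject.new_with_uri('druby://localhost:9123')
index.add('hello from an app server')
puts index.search('hello').size  # => 1
```

Every `add` and `search` here is a synchronous round trip over TCP, which is exactly why the indexing latency question above matters.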
All of our utterance data is in UTF8, but we don’t know what
language a particular utterance is in. It’s common to have both Latin
and non-Latin text, even in the same room. How can I index both types of
strings effectively within the same model field index?
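One approach I’ve seen for mixed-script corpora is to tokenize Latin-ish text into lowercased words and CJK runs into overlapping character bigrams (so a two-character query matches inside a longer run). The sketch below shows only the tokenization idea, independent of Ferret; wiring it into a custom Ferret analyzer/TokenStream is left out, and `mixed_tokens` is a made-up helper name:

```ruby
# Mixed-script tokenizer sketch: whole words for alphanumeric runs,
# overlapping bigrams for CJK ideograph runs. (\p{Han} covers ideographs
# only; a real analyzer would also handle kana, Hangul, etc.)
def mixed_tokens(text)
  tokens = []
  text.scan(/\p{Han}+|[[:alnum:]]+/) do |run|
    if run =~ /\p{Han}/
      if run.length == 1
        tokens << run
      else
        # "全世界" -> ["全世", "世界"]
        (0..run.length - 2).each { |i| tokens << run[i, 2] }
      end
    else
      tokens << run.downcase
    end
  end
  tokens
end

mixed_tokens('Hello 全世界 world')
# => ["hello", "全世", "世界", "world"]
```

Because every token type ends up as a plain string, both scripts can live in the same model field index; the analyzer, not the schema, does the work.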
Any suggestions on how to build the initial index offline? I suspect
it will take many hours.
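One way to keep a build of that size tractable is to walk the table in id order, one batch at a time, so millions of rows never sit in memory at once. A Ferret-independent sketch, where `fetch_batch` stands in for a real query (roughly SELECT … WHERE id > last ORDER BY id LIMIT n) and `index_doc` stands in for the add-to-index call — both names are hypothetical:

```ruby
# Batched offline build: repeatedly fetch the next id range and index it.
def build_index(fetch_batch, index_doc, batch_size = 1000)
  last_id = 0
  count = 0
  loop do
    rows = fetch_batch.call(last_id, batch_size)
    break if rows.empty?
    rows.each { |row| index_doc.call(row) }
    count += rows.size
    last_id = rows.last[:id]
  end
  count
end

# Exercise it against an in-memory stand-in for the utterances table:
data = (1..2500).map { |i| { :id => i, :text => "utterance #{i}" } }
fetch = lambda { |after, n| data.select { |r| r[:id] > after }.first(n) }
indexed = []
build_index(fetch, lambda { |r| indexed << r[:id] })
```

Keyset pagination on the primary key (rather than OFFSET) keeps each batch query cheap even deep into a multi-million-row table.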
I suspect we will have to disable_ferret(:always) on our utterance
model, then update the index manually on some periodic basis (cron job,
backgroundrb worker, etc.). The reason for this is that we don’t want
to introduce any delay into the process of storing a new utterance,
which occurs in real time during a chat session. Anyone have experience
with this kind of deferred-indexing setup?
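The periodic catch-up job I have in mind would persist the highest indexed id and, on each run (cron, backgroundrb, whatever), index only rows above it. A sketch with hypothetical names — `CHECKPOINT`, `fetch_after`, and `index_doc` are all stand-ins, not aaf API:

```ruby
require 'tmpdir'

# Where the catch-up job remembers how far it got (illustrative path).
CHECKPOINT = File.join(Dir.tmpdir, 'utterance_index.checkpoint')

def last_indexed_id
  File.exist?(CHECKPOINT) ? File.read(CHECKPOINT).to_i : 0
end

# One run of the job: index everything newer than the checkpoint,
# then advance the checkpoint. Returns the number of rows indexed.
def catch_up(fetch_after, index_doc)
  rows = fetch_after.call(last_indexed_id)
  rows.each { |row| index_doc.call(row) }
  File.write(CHECKPOINT, rows.last[:id].to_s) unless rows.empty?
  rows.size
end
```

Because the chat path only ever writes the row to MySQL, a slow or crashed indexer run costs some search freshness but never blocks an utterance from being stored.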
Any advice is appreciated!