Thesaurus - use Rails/database, or serve files via Apache?

I need to serve thesaurus content via AJAX requests. I can think of
several ways to do it, but performance will definitely be an issue - if
there are thousands upon thousands of requests, I want to make sure it’s
as fast and efficient as possible.

So, what do you folks think is the optimal way to go about this? The
obvious route is a controller that queries a database for the word and
returns a simple list of synonyms, but I wonder if some sort of caching
would be faster. I’m pondering “exploding” the thesaurus data out into
thousands of folders and subfolders and small text files, and serving it
up via Apache:

/aar/aardvark

If Apache returns something, it would be the list of synonyms at
/aar/aardvark. If there is no word there, or it has no synonyms, it
could just return a 404, and the AJAX request would deal with the
failure appropriately. The folders could be nested deeply enough that no
folder holds too many thousands of entries (because that could be a
system bottleneck).
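
A minimal sketch of how a word could be mapped to such a path, assuming the folder name is simply the first three letters of the word. The helper name `thesaurus_path` and the `prefix_length` parameter are hypothetical, chosen only for illustration:

```ruby
# Hypothetical helper: maps a word to a nested path such as
# "/aar/aardvark", using the first few letters as the folder name.
def thesaurus_path(word, prefix_length = 3)
  normalized = word.strip.downcase

  # Accept plain lowercase words only, so a request such as
  # "../etc/passwd" can never point outside the thesaurus tree.
  return nil unless normalized.match?(/\A[a-z]+\z/)

  "/#{normalized[0, prefix_length]}/#{normalized}"
end

puts thesaurus_path("Aardvark")
```

A longer prefix, or a second level of prefix folders, would spread the files more thinly if any single folder grew too large.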

Any opinions?

Thanks for your input!

Mike L.
mikelaurence.com

Memcached would be great for this. You could simply store the synonym
list for every possible word. That is of course very inefficient from a
storage point of view, but then again, all caching is by definition.

Marc
CloudCache.net

On Apr 19, 2008, at 2:45 PM, Mike L.
<[email protected]

That would be pretty speedy. One issue - the thesaurus data is about 12
MB per language, so if many languages are available, that could be
hundreds of MB of RAM tied up. Not terrible, but not ideal.

Do you see any issues with the Apache model I mentioned above? I don’t
have much experience with Apache, so I’m unsure if there would be
performance issues due to large numbers of folders/files in the paths.

Thanks!

Mike

Some sizing estimates:

Number of words in a good dictionary: 1M

Average length of word: 8 bytes

Average number of words in thesaurus for each word: 30

Size of memcached “exploded” thesaurus for each language: 256 MB

Cost of a 1.7 GB machine on EC2: $65/month.

Serving up thesaurus results fast enough for AJAX: priceless ;^)
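
For anyone who wants to check the arithmetic, here is a rough sketch using only the figures listed above. It counts raw bytes for keys and synonym lists and comes to about 248 MB; the assumption that the gap up to 256 MB covers separators and per-item cache overhead is mine, not a measured number:

```ruby
# Back-of-envelope sizing using the figures from the estimates above.
words             = 1_000_000  # words in a good dictionary
avg_word_bytes    = 8          # average length of a word
synonyms_per_word = 30         # average thesaurus entries per word

value_bytes = words * synonyms_per_word * avg_word_bytes  # synonym lists
key_bytes   = words * avg_word_bytes                      # lookup keys
total_bytes = value_bytes + key_bytes

puts "values: #{value_bytes / 1_000_000} MB"
puts "keys:   #{key_bytes / 1_000_000} MB"
puts "total:  #{total_bytes / 1_000_000} MB"
```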

Cheers,

Marc
CloudCache.net

On Sat, Apr 19, 2008 at 3:02 PM, Mike L. <

Mike,

My opinion:

1- Cache is way faster than the file system
2- Once it is cached, it doesn’t matter if it comes from the file
system or the database
3- Managing your thesaurus in file system could become a big mess

So, I would definitely go for DB + memcached.
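
A rough sketch of what DB + cache could look like, with a plain Hash standing in for both the database and memcached so it runs without a server. The class name `SynonymLookup`, the `thesaurus:` key prefix, and the `db_hits` counter are all hypothetical; a real setup would swap in an actual database query and a memcached client:

```ruby
# Read-through cache sketch. Both stores are plain Hashes here
# (an assumption made so the example is self-contained).
class SynonymLookup
  attr_reader :db_hits

  def initialize(database, cache = {})
    @database = database  # hypothetical: word => array of synonyms
    @cache    = cache     # stand-in for a memcached client
    @db_hits  = 0         # counts how often the "database" is queried
  end

  # Returns the synonym list for a word, checking the cache first and
  # falling back to the database only on a miss.
  def synonyms(word)
    key = "thesaurus:#{word.downcase}"
    return @cache[key] if @cache.key?(key)

    @db_hits += 1
    result = @database.fetch(word.downcase, [])
    @cache[key] = result  # empty results are cached as well
    result
  end
end

lookup = SynonymLookup.new({ "aardvark" => ["anteater", "ant bear"] })
lookup.synonyms("Aardvark")  # miss: goes to the database
lookup.synonyms("aardvark")  # hit: served from the cache
puts lookup.db_hits
```

Caching empty results too means that repeated requests for unknown words never reach the database after the first lookup.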

Cheers, Sazima