Nginx serving static files - caching?

Hi,

we want to set up a service that will deliver static files (images)
with sizes between 1 and 120 KB. The service should handle (at the
moment) ~130,000,000 requests per month, so neither very few nor
extremely many requests. The files to be served total several hundred
GB. Some of them (5-10%) are requested frequently, while the larger
part is requested less often.

Let’s say we have 16 GB RAM available for the server. We could simply
avoid using any caching mechanism and read the files from disk. For my
scenario: is it possible to establish some reasonable caching? No
decision regarding the technology to be used has been made so far. An
associate favours Varnish (with some webserver as backend) and a
least-recently-used strategy; I’d give nginx + memcached a try. Any
recommendations?
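
To make the nginx + memcached idea a bit more concrete, this is roughly
the shape I have in mind (only a sketch; the memcached address and the
paths are invented, and something else would have to populate memcached,
since nginx only reads from it):

    server {
        listen 80;

        # look the image up in memcached first, fall back to disk on a miss
        location /images/ {
            set $memcached_key $uri;          # cache key = request URI
            memcached_pass 127.0.0.1:11211;   # assumed memcached instance
            default_type image/jpeg;          # memcached stores no content type
            error_page 404 502 504 = @disk;   # miss or error -> read from disk
        }

        location @disk {
            root /data;                       # assumed path to the image tree
        }
    }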

On Fri, Nov 12, 2010 at 4:14 PM, revirii [email protected] wrote:

Let’s say we have 16 GB RAM available for the server. We could simply
avoid using any caching mechanism and read the files from disk. For my
scenario: is it possible to establish some reasonable caching? No
decision regarding the technology to be used has been made so far. An
associate favours Varnish (with some webserver as backend) and a
least-recently-used strategy; I’d give nginx + memcached a try. Any
recommendations?

how about you… let your operating system handle it?


On Fri, Nov 12, 2010 at 3:24 AM, Edho P Arief [email protected]
wrote:

how about you… let your operating system handle it?

Agreed. There’s no reason to try to re-implement what the OS already
does with the file-system cache; you’ll probably do it poorly. If
you’re using FreeBSD, simply using nginx by itself is probably your
best bet as you can enable asynchronous IO. Varnish and nginx
proxy_cache are great at caching, but there’s no reason for that tier
unless you’re caching dynamically generated assets (or have a very
slow back-end such as Amazon S3).
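
If you go the FreeBSD route, the nginx side could be as simple as
something like this (just a sketch; the image root, buffer size and
worker count are placeholders, and the aio kernel module needs to be
loaded):

    # serve the images straight from disk; the OS page cache is the cache
    worker_processes  4;                 # placeholder: tune to cores / IO load

    events {
        worker_connections  1024;
    }

    http {
        server {
            listen 80;

            location /images/ {
                root            /data;   # placeholder image root
                aio             on;      # asynchronous disk reads
                output_buffers  1 256k;  # bigger reads per IO operation
                expires         30d;     # let clients cache the images too
            }
        }
    }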

If you’re using Linux, AIO doesn’t work unless you’re using direct IO
(bypassing filesystem cache), so you will probably want to use a large
number of nginx workers to keep throughput up even though disk IO is
blocking. With 130,000,000 requests per month, that’s 50 requests per
second average. I assume that you will see peaks several times that
number, so you will want to tune the number of worker processes based
on the average request duration, IO load, etc. It all depends on your
cache hit ratio and the patterns of your clients. We use 10 nginx
workers on a dual-core machine to serve about 150M requests per month,
but our content set is only about 80 GB and the cache hit ratio is
very high.
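
On Linux the sketch looks more like the following; every number here is
a placeholder to tune against your actual request duration and hit
ratio, not something to copy:

    # blocking disk reads, so run enough workers that a few of them stuck
    # waiting on the disk don't stall everything else
    worker_processes  10;                # placeholder: tune, don't copy

    events {
        worker_connections  1024;
    }

    http {
        sendfile  on;                    # send straight from the kernel page cache

        server {
            listen 80;

            location /images/ {
                root     /data;          # placeholder image root
                expires  30d;

                # keep descriptors/stat() results for the hot 5-10% of files
                open_file_cache          max=20000 inactive=60s;
                open_file_cache_valid    120s;
                open_file_cache_min_uses 2;
            }
        }
    }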

RPM

revirii wrote:

avoid using any caching mechanism and read the files from disk. For my
scenario: is it possible to establish some reasonable caching? No
decision regarding the technology to be used has been made so far. An
associate favours Varnish (with some webserver as backend) and a
least-recently-used strategy; I’d give nginx + memcached a try. Any
recommendations?

I’m investigating Kyoto Cabinet/Kyoto Tycoon for a caching strategy at
the moment and I can recommend it, though the use case is lots of small
text files. If you don’t know, Kyoto Cabinet is a file-based “key-value”
database. There are various flavours of Cabinet, one of which is a
“directory” database, so if you create a Cabinet with a ‘.kcd’ extension
then the resulting database just stores each object as a unique file in
a directory.

Kyoto Tycoon then is an HTTP server through which you can access (one or
many) databases via GET and POST. You can access the databases directly,
but one of the benefits of using Tycoon is auto-expiration - whenever
you set a value you can give an expiry time and the file will be removed
after that.

Honestly not an expert in this sort of thing, but the setup I’m looking
at is:

[nginx] -> [4 x Tornado] -> [16 x Kyoto Tycoon] -> [4096 x Kyoto Cabinet]

Tornado being the Python webserver which, as well as actually retrieving
the data from a remote source, decides which database to store items in
based on a hash of the URL. I’m thinking of a single server (box), but
since it’s all HTTP I suppose you could fan out among servers if
required.
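
The nginx end of that chain would presumably be nothing more exotic than
a proxy fan-out to the Tornado processes, roughly like this (ports are
invented):

    # front tier: hand every request to one of the four Tornado processes
    upstream tornado {
        server 127.0.0.1:8001;
        server 127.0.0.1:8002;
        server 127.0.0.1:8003;
        server 127.0.0.1:8004;
    }

    server {
        listen 80;

        location / {
            proxy_pass          http://tornado;
            proxy_set_header    Host $host;
            proxy_next_upstream error timeout;   # retry another Tornado on failure
        }
    }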

Again, not an expert, and this is all exploratory at this point, so take
with a pinch of salt.

On 11/12/2010 4:14 AM, revirii wrote:

Let’s say we have 16 GB RAM available for the server. We could simply
avoid using any caching mechanism and read the files from disk. For my
scenario: is it possible to establish some reasonable caching? No
decision regarding the technology to be used has been made so far. An
associate favours Varnish (with some webserver as backend) and a
least-recently-used strategy; I’d give nginx + memcached a try. Any
recommendations?

I’d consider using a RAIDed SSD for storage to achieve the highest
performance.


Robert La Ferla
VP Engineering
OMS SafeHarbor
