Nginx Caching strategy

All - Is there a high-level tutorial on how nginx does caching? Does
it use mmap to map the cache directory/file into memory and share it
with all nginx worker processes (and use sendfile to send it out to the
socket)? Or does it use direct I/O to open the file and use regular
read/write syscalls? This is for the Debian Linux platform.

Thanks,

Posted at Nginx Forum:

Also a related question - if AIO is used, does nginx maintain two levels
of cache (one in memory and one on disk)? Does the cache manager
look only at the disk cache for replacement?


Hello!

On Mon, Oct 04, 2010 at 03:12:57PM -0400, cachenewbie wrote:

> All - Is there a high level tutorial on how nginx does caching ? Does

Some information is available here (in Russian):

http://sysoev.ru/nginx/docs/http/ngx_http_proxy_module.html#proxy_cache_path

A translation is available here (it looks correct, but I
haven’t really checked):

http://wiki.nginx.org/NginxHttpProxyModule#proxy_cache_path

> it use mmap to map the cache directory/file into memory and share it
> with all nginx worker processes (and use sendfile to send it out to the
> socket) ? Or does it use directIO to open the file and use regular
> read/write syscalls ? This is for Linux debian platform.

nginx just saves responses to disk under distinct names
(derived from the cache key), with the headers at the start. When
reading from the cache, nginx uses regular read() (or aio if available
and switched on) to read the headers and the normal file sending
mechanism to send the body.
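
As a rough illustration of "distinct names derived from the cache key":
the proxy_cache_path documentation says the file name is the MD5 of the
cache key, and that with levels=1:2 the subdirectories are taken from the
end of that name. The sketch below assumes exactly that and nothing more;
the function names, the cache root and the sample key are made up for the
example, and this is not nginx's own code.

```python
import hashlib


def layout(name: str, levels=(1, 2)) -> str:
    """Prefix `name` with subdirectories cut from the END of the name."""
    parts = []
    end = len(name)
    for width in levels:
        # Each level takes `width` characters, working backwards.
        parts.append(name[end - width:end])
        end -= width
    parts.append(name)
    return "/".join(parts)


def cache_file_path(cache_root: str, cache_key: str, levels=(1, 2)) -> str:
    """Return the assumed on-disk path for a cache key (illustrative only)."""
    # File name: hex MD5 digest of the cache key.
    name = hashlib.md5(cache_key.encode("utf-8")).hexdigest()
    return cache_root.rstrip("/") + "/" + layout(name, levels)


if __name__ == "__main__":
    # Hash taken from the example in the proxy_cache_path documentation.
    print(layout("b7f54b2df7773722d382f4809d65029c"))
    # Hypothetical cache root and key.
    print(cache_file_path("/data/nginx/cache", "httpexample.com/index.html"))
```

With the hash from the documentation, layout() gives
c/29/b7f54b2df7773722d382f4809d65029c, which matches the path shown there.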

“Normal file sending mechanism” implies sendfile, aio, directio
and so on - if available and configured.
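
To make the read()-then-sendfile split concrete, here is a minimal
sketch in Python. It assumes a simplified file layout (a text header
block ending in a blank line, followed by the body); the real nginx
cache file format is different and is not described in this thread.
serve_cached() and demo() are hypothetical helper names.

```python
import os
import socket
import tempfile

HEADER_END = b"\r\n\r\n"


def serve_cached(path: str, sock: socket.socket) -> bytes:
    """Read the header block with read(), then send the body with sendfile."""
    with open(path, "rb") as f:
        # Headers: one regular read of the start of the file.
        head = f.read(4096)
        body_start = head.index(HEADER_END) + len(HEADER_END)
        # Body: socket.sendfile() uses os.sendfile() where the platform
        # supports it and falls back to plain send() otherwise.
        sock.sendfile(f, offset=body_start)
        return head[:body_start]


def demo():
    """Write a fake cache file, serve it over a socket pair, return both parts."""
    body = b"hello from the cache\n"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % len(body))
        tmp.write(body)
    client, server = socket.socketpair()
    try:
        headers = serve_cached(tmp.name, server)
        server.close()  # signal EOF to the reading side
        received = b""
        while chunk := client.recv(4096):
            received += chunk
        return headers, received
    finally:
        server.close()
        client.close()
        os.unlink(tmp.name)


if __name__ == "__main__":
    headers, received = demo()
    print(headers)
    print(received)
```

Only the body goes through sendfile here; the header block is read into
user space first, which is the split described above.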

Some metainformation about the cache contents (to count cache
accesses, remove old files and so on) is stored in shared memory.
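
The kind of bookkeeping meant here can be sketched as follows. This is
only an illustration: it uses a plain in-process dict rather than a
shared memory zone, and the CacheIndex class and its methods are
invented for the example. The one thing borrowed from nginx is the idea
behind the inactive parameter of proxy_cache_path: entries not accessed
for that long are removed.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    uses: int = 0             # how many times the entry was accessed
    last_access: float = 0.0  # time of the most recent access, in seconds


class CacheIndex:
    """Toy metadata table: access counts plus removal of inactive entries."""

    def __init__(self, inactive: float):
        self.inactive = inactive
        self.entries: dict[str, Entry] = {}

    def touch(self, name: str, now: float) -> int:
        """Record an access to `name` and return its updated use count."""
        entry = self.entries.setdefault(name, Entry())
        entry.uses += 1
        entry.last_access = now
        return entry.uses

    def expire(self, now: float) -> list[str]:
        """Drop entries idle for longer than `inactive`; return their names."""
        stale = [
            name
            for name, entry in self.entries.items()
            if now - entry.last_access > self.inactive
        ]
        for name in stale:
            # A real cache manager would also delete the file on disk here.
            del self.entries[name]
        return stale
```

For example, with inactive=600, an entry last touched at t=100 survives
expire(now=650) but is removed by expire(now=800).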

Maxim D.

Thanks Maxim, Guzman.

I think storing responses directly on disk makes it easy to store and
retrieve them from different process contexts. However, if the storage
space is huge (in TB) and the storage media is SSD, write avoidance is
something that should be considered. In this case, small files can be
kept in a memory cache to speed up access to frequent web content, and
the cache manager can write big blocks to disk when evicting content
from the memory cache. Having the memory cache and disk cache separated
makes more sense in such a case. I was trying to figure out whether
nginx supports that concept.


Hi!

I’m not an nginx developer and I haven’t even seen the source yet;
however, I have nginx running as a cache and can answer one of your
questions from experience.

As long as you have free memory and your OS (mine is Debian too)
correctly uses that memory as page cache, you won’t see a single I/O
read of the cache files by nginx. You will see the first I/O read of
the cached files only once there is no more free memory for caching.

Please share your experience later :-)

In my case, in case it helps someone else, I had serious I/O issues,
but the culprit was not nginx but my virtualization setup: XenServer
from Citrix on two SATA disks in software RAID-1, using LVM for
partitions, with other VMs on the same host running the backends, the
MySQL database, etc. I/O was killing me at a few hundred requests per
second, sometimes leaving most of the workers blocked in I/O wait.
After tuning and tuning nginx, I decided it wasn’t the one to blame and
fixed it another way: I added an SSD disk (software RAID-1 and LVM too)
to the host and mounted a partition on the nginx Debian machine
exclusively for the nginx cache, temp folder and logs. Since then I’ve
not been able to throw enough load at it to have an I/O problem again :-).

My current iostat output on the production server:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.10    0.03    0.31    0.39    0.02   99.15

Cheers

Guzman


On Wed, Oct 13, 2010 at 9:17 AM, cachenewbie wrote:

> I think storing responses directly in disk makes it easy to store and
> retrieve from different process contexts. However, if the storage space
> is huge (in TB) and if the disk storage media is SSD, write avoidance is
> something that should be considered. In this case, small files can be
> stored in memory cache to speed up access to frequent webcontent and
> cachemanager can try to store big blocks to disks when evicting content
> from memory cache. Having memory cache and disk cache separated will
> make more sense in such a case. I was trying to figure out if Nginx
> supported that concept.

This is something the OS and filesystem layers can and should handle.
Coding such behavior into the application layer is a bad idea for
innumerable reasons (not the least of which is portability of the
application). Good SSD controllers also do memory caching, block
remapping, wear-leveling, and small-write coalescing internally,
allowing them to be used with traditional filesystems.

In short: buy good SSDs (a recent Intel or Sandforce-based drive is
good) and don’t worry about it.

Do you have any real-world data showing that small file writes in a
proxy caching scenario are actually a problem worth worrying about? As
Knuth once said, “premature optimization is the root of all evil.”

RPM