Proxy caching large files with too many temporary files

Hello community!

I’m currently using nginx as a proxy cache in front of a backend where
large files are stored, and I have a big, blocking issue.
My setup is:

Client -> Cache1 -> Cache2 -> HTTP backend with large files

The problem happens on Cache1, whenever a large file that is not yet
cached is requested by hundreds of users at the same time.

The proxy_cache module buffers the upstream response to disk, writing it
to the temporary proxy cache path. Because of the large number of
concurrent requests, it writes the same file hundreds of times until it
is fully cached (after which it is served from the cache).

This also results in 2000 files open for writing 4GB at the same time,
blocking the workers, triggering kernel panics and literally smashing
the machine.

What I wanted to ask you guys is: is there a way to make nginx buffer
the file only once, so that all the workers can use the same buffer to
serve the uncached requests? That way the temp file would be written
only once.

To work around this, I wrote a small Perl script (I can share it if you
need it) that basically does the following.

In nginx, using the EmbeddedPerlModule, I open a file in /dev/shm and
read it line by line, looking for the proxy_cache_key.
If the key is there, a variable that is passed to proxy_no_cache is set
to 1. If the key is not there, that variable is set to 0 and the
corresponding proxy_cache_key is appended as the last line of the
/dev/shm/cache/keys file.

As a result, nginx uses the cache only for the first request and
fetches the file directly from Cache2 for all subsequent requests.
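The request-side check above can be sketched in Python. This is not the Perl handler itself, only a minimal sketch assuming a hypothetical function skip_cache_flag() whose return value ("1" or "0") would be fed to proxy_no_cache. It also takes an flock() on the keys file so two workers cannot both claim the same key, which the description above does not mention.

```python
import fcntl
import os

# Path taken from the description above.
KEYS_FILE = "/dev/shm/cache/keys"


def skip_cache_flag(cache_key, keys_file=KEYS_FILE):
    """Hypothetical helper: "1" = bypass the cache, "0" = this request may cache.

    On "0" the key is appended, so later requests for it see "1".
    """
    os.makedirs(os.path.dirname(keys_file), exist_ok=True)
    with open(keys_file, "a+", encoding="utf-8") as f:
        # Exclusive lock so concurrent callers cannot both append the same key.
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            f.seek(0)  # "a+" opens positioned at the end; rewind to read
            if cache_key in f.read().splitlines():
                return "1"  # someone is already caching it: go straight upstream
            f.write(cache_key + "\n")  # "a+" always writes at the end of the file
            f.flush()
            return "0"  # first request: let it populate the cache
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```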

Moreover, to avoid cache keys staying there forever and possibly
blocking new files from being cached, I have another small Perl script
that runs outside nginx, opens the /dev/shm/cache/keys file and scans
it line by line.
For each line it computes the MD5 hash and checks whether the file is
in the cache folder. If it is, the line is removed from the file. If
not, the script opens all temporary cache files (I know it might sound
bad, but I can’t find a better way right now) and reads the second
line, which contains “KEY: http://www.example.com/file.bin”. If that
KEY matches the key in the /dev/shm/cache/keys file, the line is kept.
If no temporary file contains that key, the key is removed from the
keys file, so that the file will be cached on the next request without
problems.
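The cleanup pass can likewise be sketched in Python. Assumptions, none of which come from the post: the cache uses levels=1:2 (nginx names a cache file after the MD5 hex of the key and takes the subdirectory names from the end of that hash), all temp files live under a single directory, and the “KEY: ” line is searched for within the first 4 KB of each temp file rather than strictly on the second line, since the header before it is binary. The function names are hypothetical.

```python
import fcntl
import hashlib
import os


def cache_file_path(cache_dir, key, levels=(1, 2)):
    """Path of the cache file for `key`: MD5 hex name, subdirs taken from its end."""
    name = hashlib.md5(key.encode("utf-8")).hexdigest()
    parts, end = [], len(name)
    for width in levels:  # levels=1:2 -> ".../c/29/b7f5...029c"
        parts.append(name[end - width:end])
        end -= width
    return os.path.join(cache_dir, *parts, name)


def keys_in_temp_files(temp_dir, header_bytes=4096):
    """Collect the "KEY: ..." value found in the header of every temp file."""
    found = set()
    for root, _dirs, files in os.walk(temp_dir):
        for fname in files:
            try:
                with open(os.path.join(root, fname), "rb") as f:
                    head = f.read(header_bytes)
            except OSError:
                continue  # temp file vanished (e.g. moved into the cache)
            for line in head.split(b"\n"):
                if line.startswith(b"KEY: "):
                    found.add(line[len(b"KEY: "):].decode("utf-8", "replace"))
                    break
    return found


def prune_keys(keys_file, cache_dir, temp_dir, levels=(1, 2)):
    """Keep only keys that are not cached yet but still have a temp file."""
    in_progress = keys_in_temp_files(temp_dir)
    with open(keys_file, "r+", encoding="utf-8") as f:
        # Exclusive lock: any writer that also flock()s this file waits for us.
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            keys = [k for k in f.read().splitlines() if k]
            kept = [k for k in keys
                    if not os.path.exists(cache_file_path(cache_dir, k, levels))
                    and k in in_progress]
            f.seek(0)
            f.truncate()  # rewrite the file in place with the surviving keys
            f.writelines(k + "\n" for k in kept)
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
    return kept
```

A key appended after the temp-directory scan but before its temp file appears is dropped by this pass, so in the worst case one extra request also caches that file.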

Do you have any better suggestions or patches that might fix this
problem?

Please consider that I get thousands of concurrent connections (2000+),
requesting a mix of (mostly) large and small files. I have tried playing
with all the proxy_cache module settings, with no luck.

Thank you in advance for any help!

Kind regards,

Paolo Iannelli

Posted at Nginx Forum: