I’m looking at using Nginx as a reverse proxy to cache a few million HTML pages coming from a backend server. The cached content will very seldom (if at all) change, so both proxy_cache and proxy_store could do, but all page URLs have a “/foo/$ID” pattern and, IIUC, with proxy_store that would cause millions of files in the same directory, which the filesystem might not be ecstatic about. So for now I’m going with proxy_cache and two levels of directories. All is going great in my preliminary tests.

Now, rather than caching uncompressed files and gzipping them before serving them most of the time, it would be great if cached content could be gzipped once (on disk) and served as such most of the time. This would decrease both disk space requirements (by 7-8 times) and processor load. Is this doable? Patching/recompiling nginx as well as using Lua are fine with me. Serving gzipped content from the backend would in theory be possible, though for other reasons it’s better avoided.
Just make sure the “Accept-Encoding: gzip” header is being passed to your backend, and let the backend do the compression. We actually normalize the Accept-Encoding header as well, with an if statement. Also use the value of the Accept-Encoding header in your proxy_cache_key. This allows non-cached responses for those clients that don’t support gzip (usually coming through an old, weird proxy). So you will get both compressed and uncompressed versions in your cache, but with our clients it’s like 99% compressed versions at any one time.
Example:

server {
    # your server stuff here

    # normalize all Accept-Encoding headers to just gzip
    set $myae "";
    if ($http_accept_encoding ~* gzip) {
        set $myae "gzip";
    }

    location / {
        # the following allows compressed responses from the backend
        proxy_pass http://backend;
        proxy_set_header Accept-Encoding $myae;
        # include the normalized value in the key so compressed and
        # uncompressed variants are cached separately
        proxy_cache_key "$scheme$host$request_uri$myae";
    }
}
> Thanks for the pointers. As I wrote, I’d rather avoid gzipping in the
> backend, but if that’s the only option so be it.
There’s no reason the “backend” for your caching layer cannot be another
nginx server block running on a high port bound to localhost. This
high-port server block could do gzip compression, and proxy-pass to the
back end with “Accept-Encoding: identity”, so the back-end never has to
do compression. The backend server will have to use “gzip_http_version
1.0” and “gzip_proxied any” to do compression because it is being
proxied from the front-end.
There might be a moderate performance impact, but because you’re caching
at the “frontmost” layer, the number of back-end hits should be small.
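That two-tier layout might look roughly like the sketch below (the zone name, ports, cache path, and upstream address are illustrative, not from the thread; everything here is assumed to sit inside the http block):

```nginx
# front tier: caches whatever the local gzip tier returns
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=pages:10m;

upstream real_backend {
    server 10.0.0.2:8080;  # illustrative backend address
}

server {
    listen 80;

    location /foo/ {
        proxy_cache pages;
        proxy_pass http://127.0.0.1:8081;
    }
}

# local gzip tier: compresses, and talks to the real backend uncompressed
server {
    listen 127.0.0.1:8081;

    gzip on;
    gzip_http_version 1.0;  # the front tier proxies with HTTP/1.0 by default
    gzip_proxied any;       # compress even though requests arrive via a proxy

    location /foo/ {
        proxy_pass http://real_backend;
        proxy_set_header Accept-Encoding identity;
    }
}
```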
With the gunzip module, you can configure things so that you always
cache compressed data, then only decompress it for the small number of
clients that don’t support gzip compression.
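Under that scheme the cache-facing location could be as simple as the following sketch (assuming the gunzip module is compiled in, a `pages` cache zone exists, and the gzip tier listens on 127.0.0.1:8081):

```nginx
location /foo/ {
    proxy_cache pages;
    proxy_pass http://127.0.0.1:8081;
    # always request gzip from the upstream, so only compressed
    # copies ever land in the cache
    proxy_set_header Accept-Encoding gzip;
    # decompress on the fly for the few clients that
    # didn't send Accept-Encoding: gzip
    gunzip on;
}
```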
Thanks for the pointers. As I wrote, I’d rather avoid gzipping in the backend, but if that’s the only option so be it.

I was also concerned about caching gzipped content using the value of Accept-Encoding in the cache key and ending up with many duplicates because of slightly different yet equivalent headers, but your suggestion to normalize it solves that nicely.
I was hoping that gunzip’ping for clients that don’t support compression would be as simple as adding the following inside the above block:

if ($http_accept_encoding !~* gzip) {
    gunzip on;
}

But when the nginx configuration is reloaded, I get: nginx: [emerg] "gunzip" directive is not allowed here.
I suppose I could rewrite the request to an internal location, then within that location’s block re-set the proxy_cache_key accordingly. But perhaps there’s an easier way?
> There’s no reason the “backend” for your caching layer cannot be another
> nginx server block running on a high port bound to localhost. This
> high-port server block could do gzip compression, and proxy-pass to the
> back end with “Accept-Encoding: identity”, so the back-end never has to
> do compression. The backend server will have to use “gzip_http_version
> 1.0” and “gzip_proxied any” to do compression because it is being
> proxied from the front-end.
Ah, good point. I tried to take this a step further by using a virtual host of the same server as the “compression backend”, and it appears to work nicely. Below is what I did so far, in case anyone is looking for the same and Google leads them here.

(Feels a bit like going out the door and coming back in through the window, but perhaps, just as we have internal redirects, it would be possible to use ngx_lua to simulate an internal proxy and avoid the extra HTTP request.)
> With the gunzip module, you can configure things so that you always
> cache compressed data, then only decompress it for the small number of
> clients that don’t support gzip compression.
This looks perfect for having a gzip-only cache, which may not save that much disk space but certainly helps with mind space.
> It will gunzip responses for clients which don’t support gzip (as per
> Accept-Encoding and gzip_http_version/gzip_proxied/gzip_disable,
> i.e. the same checks as done for gzip and gzip_static).
Thanks Max, I had come to the conclusion that it was always decompressing content, but now I see I had been “testing” with curl -H 'Content-Encoding: gzip' instead of Accept-Encoding. No wonder…
On Sat, Feb 18, 2012 at 09:43:11PM +0100, Massimiliano M. wrote:
> I suppose I could rewrite the request to an internal location, then within
> that location’s block re-set the proxy_cache_key accordingly. But perhaps
> there’s an easier way?

Yes. The easier way is to just write:

gunzip on;

It will gunzip responses for clients which don’t support gzip (as per Accept-Encoding and gzip_http_version/gzip_proxied/gzip_disable, i.e. the same checks as done for gzip and gzip_static).
Maxim D.