I’m looking at using Nginx as a reverse proxy to cache a few million HTML pages coming from a backend server. The cached content will very seldom (if at all) change, so both proxy_cache and proxy_store could do, but all page URLs have a “/foo/$ID” pattern and, IIUC, with proxy_store that would cause millions of files in the same directory, which the filesystem might not be ecstatic about. So for now I’m going with proxy_cache and two levels of directories. All is going great in my preliminary tests.

Now, rather than caching uncompressed files and gzipping them before serving them most of the time, it would be great if cached content could be gzipped once (on disk) and served as such most of the time. This would decrease both disk space requirements (by 7-8 times) and processor load. Is this doable? Patching/recompiling nginx as well as using Lua are fine with me. Serving gzipped content from the backend would in theory be possible, though for other reasons it’s better avoided.
Just make sure the “Accept-Encoding: gzip” header is being passed to your backend, and let the backend do the compression. We actually normalize the Accept-Encoding header as well, with an if statement. Also use the value of the Accept-Encoding header in your proxy_cache_key. This allows non-cached responses for those clients that don’t support gzip (usually coming through an old, weird proxy). So you will get both compressed and uncompressed versions in your cache, but with our clients it’s like 99% compressed versions at any one time.
Example:

server {
    # your server stuff here

    # normalize all Accept-Encoding headers to just gzip
    set $myae "";
    if ($http_accept_encoding ~* gzip) {
        set $myae "gzip";
    }

    location / {
        # the following allows compressed responses from the backend
        proxy_pass http://backend;
        proxy_set_header Accept-Encoding $myae;
        # include the normalized value in the key so compressed and
        # uncompressed variants are cached separately
        proxy_cache_key "$scheme$host$request_uri$myae";
    }
}
> Thanks for the pointers. As I wrote, I’d rather avoid gzipping in the
> backend, but if that’s the only option so be it.
There’s no reason the “backend” for your caching layer cannot be another
nginx server block running on a high port bound to localhost. This
high-port server block could do gzip compression, and proxy-pass to the
back end with “Accept-Encoding: identity”, so the back-end never has to
do compression. The backend server will have to use “gzip_http_version
1.0” and “gzip_proxied any” to do compression because it is being
proxied from the front-end.
There might be a moderate performance impact, but because you’re caching
at the “frontmost” layer, the number of back-end hits should be small.
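That two-tier layout might look roughly like the sketch below (the zone name, ports, cache path, and upstream address are illustrative, not from the thread; everything here is assumed to sit inside the http block):

```nginx
# front tier: caches whatever the local gzip tier returns
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=pages:10m;

upstream real_backend {
    server 10.0.0.2:8080;  # illustrative backend address
}

server {
    listen 80;

    location /foo/ {
        proxy_cache pages;
        proxy_pass http://127.0.0.1:8081;
    }
}

# local gzip tier: compresses, and talks to the real backend uncompressed
server {
    listen 127.0.0.1:8081;

    gzip on;
    gzip_http_version 1.0;  # the front tier proxies with HTTP/1.0 by default
    gzip_proxied any;       # compress even though requests arrive via a proxy

    location /foo/ {
        proxy_pass http://real_backend;
        proxy_set_header Accept-Encoding identity;
    }
}
```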
With the gunzip module, you can configure things so that you always
cache compressed data, then only decompress it for the small number of
clients that don’t support gzip compression.
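Under that scheme the cache-facing location could be as simple as the following sketch (assuming the gunzip module is compiled in, a `pages` cache zone exists, and the gzip tier listens on 127.0.0.1:8081):

```nginx
location /foo/ {
    proxy_cache pages;
    proxy_pass http://127.0.0.1:8081;
    # always request gzip from the upstream, so only compressed
    # copies ever land in the cache
    proxy_set_header Accept-Encoding gzip;
    # decompress on the fly for the few clients that
    # didn't send Accept-Encoding: gzip
    gunzip on;
}
```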
Thanks for the pointers. As I wrote, I’d rather avoid gzipping in the backend, but if that’s the only option so be it.

I was also concerned about caching gzipped content using the value of Accept-Encoding in the cache key and ending up with many duplicates because of slightly different yet equivalent headers, but your suggestion to normalize it solves that nicely.
I was hoping that gunzip’ping for clients that don’t support compression would be as simple as adding the following inside the above block:

if ($http_accept_encoding !~* gzip) {
    gunzip on;
}

But when the nginx configuration is reloaded, I get: nginx: [emerg] "gunzip" directive is not allowed here.
I suppose I could rewrite the request to an internal location, then within that location’s block re-set the proxy_cache_key accordingly. But perhaps there’s an easier way?
> There’s no reason the “backend” for your caching layer cannot be another
> nginx server block running on a high port bound to localhost. This
> high-port server block could do gzip compression, and proxy-pass to the
> back end with “Accept-Encoding: identity”, so the back-end never has to
> do compression. The backend server will have to use “gzip_http_version
> 1.0” and “gzip_proxied any” to do compression because it is being
> proxied from the front-end.
Ah, good point. I tried to take this a step further by using a virtual host of the same server as the “compression backend”, and it appears to work nicely. Below is what I did so far, in case anyone is looking for the same and Google leads them here.

(Feels a bit like going out the door and coming back in through the window, but perhaps, just as we have internal redirects, it would be possible to use ngx_lua to simulate an internal proxy and avoid the extra HTTP request.)
> With the gunzip module, you can configure things so that you always
> cache compressed data, then only decompress it for the small number of
> clients that don’t support gzip compression.
This looks perfect for having a gzip-only cache, which may not save that much disk space but certainly helps with mind space.
> It will gunzip responses for clients which don’t support gzip (as per
> Accept-Encoding and gzip_http_version/gzip_proxied/gzip_disable,
> i.e. the same checks as done for gzip and gzip_static).
Thanks Max, I had come to the conclusion that it was always decompressing content, but now I see I had been “testing” with curl -H 'Content-Encoding: gzip' instead of Accept-Encoding. No wonder…
On Sat, Feb 18, 2012 at 09:43:11PM +0100, Massimiliano M. wrote:
> I suppose I could rewrite the request to an internal location, then within
> that location’s block re-set the proxy_cache_key accordingly. But perhaps
> there’s an easier way?

Yes. The easier way is to just write:

gunzip on;

It will gunzip responses for clients which don’t support gzip (as per Accept-Encoding and gzip_http_version/gzip_proxied/gzip_disable, i.e. the same checks as done for gzip and gzip_static).
Maxim D.