Accept-Encoding: gzip and the Vary header

I have used gzip_static for some years without any issue that I am aware of
with the default gzip_vary off.

My reasoning is that the HTTP spec says in

http://tools.ietf.org/html/rfc2616#page-145

that “the Vary field value advises the user agent about the criteria that
were used to select the representation”, and my understanding is that
compressed content is not a representation per se. The representation would
be the result of undoing what Content-Encoding says.

So, given the same .html endpoint you could for example serve content in a
language chosen according to Accept-Language. That’s a representation that
depends on headers in my understanding. If you serve the same .css over and
over again no matter what, the representation does not vary. The compressed
thing that is transferred is not the representation itself, so no Vary
needed.

Do you guys agree with that reading of the spec?

Then, you read posts about buggy proxy servers. Have any of you found a
real (modern) case in which the lack of “Vary: Accept-Encoding” resulted in
compressed content being delivered to a client that didn’t support it? Or
are those proxies mythical creatures as of today?

Thanks!

Xavier

Hi

On 4 Jun 2015, at 08:16, Xavier N. wrote:

I have used gzip_static for some years without any issue that I am aware of with
the default gzip_vary off.

My reasoning is that the HTTP spec says in

http://tools.ietf.org/html/rfc2616#page-145

that “the Vary field value advises the user agent about the criteria that were
used to select the representation”, and my understanding is that compressed
content is not a representation per se. The representation would be the result of
undoing what Content-Encoding says.

This is fine to do. However, if a client that does not support
compression makes the request through a caching proxy, the proxy may
store the uncompressed response. Any later user behind that cache would
then be served the uncompressed copy in most cases, even if their client
accepts compression.

So, given the same .html endpoint you could for example serve content in a
language chosen according to Accept-Language. That’s a representation that depends
on headers in my understanding. If you serve the same .css over and over again no
matter what, the representation does not vary. The compressed thing that is
transferred is not the representation itself, so no Vary needed.

Do you guys agree with that reading of the spec?

This bit of the spec (same page at bottom) explains it better I think:

An HTTP/1.1 server SHOULD include a Vary header field with any cacheable
response that is subject to server-driven negotiation. Doing so allows a
cache to properly interpret future requests on that resource and informs
the user agent about the presence of negotiation on that resource.

I would say compression is a server-driven negotiation. I would also
say, based on my understanding, that when the spec says “representation”
it includes encodings such as compression. That is, you can
represent a resource with gzip or without gzip.

Then, you read posts about buggy proxy servers. Have any of you found a real
(modern) case in which the lack of “Vary: Accept-Encoding” resulted in compressed
content being delivered to a client that didn’t support it? Or are those proxies
mythical creatures as of today?

Proxies are bound by the spec too, so yes, it would be a buggy proxy. They
can’t send Content-Encoding: gzip unless the client sends
Accept-Encoding. I’m not entirely sure what would happen though. I
guess the proxy would either bypass the cached compressed version or
replace it with an uncompressed one. Most likely it is up to the proxy
implementation.

Jason

Hello!

On Thu, Jun 04, 2015 at 11:49:18AM +0200, Xavier N. wrote:

http://tools.ietf.org/html/rfc2616#page-72

explicitly mentions Accept-Encoding as an example. So case closed.

Next question is: why is gzip_vary off by default? Isn’t the most common
case that you want it enabled?

The problem with Vary is that it causes bad effects on shared caches;
in particular, it normally results in cache duplication. So by
default nginx doesn’t add Vary, and also doesn’t send compressed
content to proxies (gzip_proxied off). This approach works with
both HTTP/1.0 and HTTP/1.1 caches, and doesn’t cause cache
duplication.
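
For reference, the defaults described above could be written out explicitly as below. This is only a sketch: the location path is hypothetical, gzip_static on is assumed from the start of the thread, and the two "off" lines merely restate what nginx does when the directives are omitted.

```nginx
# Sketch only: "/assets/" is a hypothetical path, not taken from the thread.
location /assets/ {
    gzip_static   on;    # serve a precompressed "file.gz" next to "file" when it exists
    gzip_vary     off;   # default: no "Vary: Accept-Encoding" response header
    gzip_proxied  off;   # default: no compressed response when the request has a "Via" header
}
```

nginx treats a request as proxied when it carries a "Via" header field, so a proxy that does not send that header would still receive compressed content under these settings.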

See related discussion in this thread:

http://mailman.nginx.org/pipermail/nginx/2015-March/046965.html


Maxim D.
http://nginx.org/

On Thu, Jun 4, 2015 at 3:11 PM, Maxim D. wrote:

The problem with Vary is that it causes bad effects on shared caches, in
particular, it normally results in cache duplication.

You mean that if client A requests a resource with Accept-Encoding: gzip,
and client B without, and the resource has Cache-Control: public, then a
shared cache would store the compressed and uncompressed responses thus
having the content kind of repeated?

On Thu, Jun 4, 2015 at 10:56 AM, Jason W. wrote:

An HTTP/1.1 server SHOULD include a Vary header field with any
cacheable response that is subject to server-driven negotiation.
Doing so allows a cache to properly interpret future requests on that
resource and informs the user agent about the presence of negotiation on that
resource.

You are right, and the section about server-driven negotiation

http://tools.ietf.org/html/rfc2616#page-72

explicitly mentions Accept-Encoding as an example. So case closed.

Next question is: why is gzip_vary off by default? Isn’t the most common
case that you want it enabled?

Xavier

PS: In my next reincarnation I promise to only work on specs written as
axiomatic systems.

Ahhh, I see.

We’ve seen that if you want cache + compression, then you need Vary. So,
by the contrapositive, the trade-off of gzip_vary off is that the response
can’t be cached at all, in the sense that you’re not sending the proper
headers. That holds whether the cache is private or shared, at least in
theory.

If you turn gzip_vary on to get some caching, but keep gzip_proxied off,
and Cache-Control is “public”, then I guess clients behind those shared
caches would get uncompressed content unless the shared caches themselves
compress on the fly (does that happen?).

In the typical use case of a CSS file with a fingerprint in its filename
for aggressive caching I guess you actually need to go with (off the top of
my head):

gzip_vary on;
gzip_proxied on;

expires max;
add_header Cache-Control "public";

Hello!

On Thu, Jun 04, 2015 at 03:41:32PM +0200, Xavier N. wrote:

On Thu, Jun 4, 2015 at 3:11 PM, Maxim D. wrote:

The problem with Vary is that it causes bad effects on shared caches, in
particular, it normally results in cache duplication.

You mean that if client A requests a resource with Accept-Encoding: gzip,
and client B without, and the resource has Cache-Control: public, then a
shared cache would store the compressed and uncompressed responses thus
having the content kind of repeated?

Not really. The main problem is that there are more than two
clients, and many of them will use different Accept-Encoding
headers, e.g.:

gzip,deflate
gzip, deflate
gzip,deflate,sdch
gzip
deflate, gzip
identity
gzip,deflate,lzma,sdch
gzip;q=1.0,deflate;q=0.6,identity;q=0.3
gzip;q=1.0, deflate;q=0.8, chunked;q=0.6, identity;q=0.4, *;q=0
deflate
identity,gzip,deflate
gzip, deflate, peerdist
gzip, deflate, identity
gzip, x-gzip
gzip, deflate, compress

As a result, there will be many copies of compressed and
uncompressed responses in the cache.


Maxim D.
http://nginx.org/

On Thu, Jun 4, 2015 at 9:36 PM, Xavier N. wrote:

gzip_proxied on;

s/on/any/
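
With that correction applied, the snippet from the earlier message might look like the sketch below. It is untested; the location block and its path are hypothetical, and gzip_static on is assumed from the start of the thread. It also inherits the trade-off raised in the thread: with Vary enabled, a shared cache may store one copy per distinct Accept-Encoding value.

```nginx
# Sketch only: "/assets/" is a hypothetical path for fingerprinted files.
location /assets/ {
    gzip_static   on;                 # assumed: precompressed .gz files exist on disk
    gzip_vary     on;                 # add "Vary: Accept-Encoding"
    gzip_proxied  any;                # "on" is not a valid value; "any" covers all proxied requests

    expires       max;                # far-future Expires plus a Cache-Control max-age
    add_header    Cache-Control "public";
}
```

Since expires max already emits its own Cache-Control field, the add_header line should result in a second Cache-Control field in the response rather than a merged one.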