Potential Bug: Gzipping preventing HTTP range requests

Hi!

A question was posed in IRC yesterday about Range requests not being
honoured when gzipping is enabled.
After testing I’ve confirmed that there does appear to be a problem.

I’ve put a test file online you can test against:
http://216.218.189.55/range.txt

The following curl runs were used to test:
curl -r 0-5 http://216.218.189.55/range.txt ← gives 6 bytes
curl -r 0-5 --compressed http://216.218.189.55/range.txt ← gives full file

I’ve pasted the results of debug logging here:
http://pastie.org/pastes/2116321/text?key=shexc9k0h7nmu6p4gkhug

Thanks for your attention,
Martin Fjordvald

Posted at Nginx Forum:

On Fri, Jun 24, 2011 at 01:46:46PM -0400, Ensiferous wrote:

> curl -r 0-5 http://216.218.189.55/range.txt ← gives 6 bytes
> curl -r 0-5 --compressed http://216.218.189.55/range.txt ← gives full file
>
> I’ve pasted the results of debug logging here:
> http://pastie.org/pastes/2116321/text?key=shexc9k0h7nmu6p4gkhug
>
> Thanks for your attention,
> Martin Fjordvald

There is an issue with gzipping and ranges: which should be applied
first. As far as I know the RFC gives no description of the order
(correct me if I’m wrong), so we can only implement the de facto state
(as is already done with gzip content encoding).

Apache (at least 2.3.8) gzips the content first and then processes ranges
on the gzipped body, against the gzipped content length, but I do not
know how browsers handle this.
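That interpretation can be sketched locally without a server (the file
name and the 10-151 byte range here are arbitrary, chosen only for
illustration):

```shell
# A local sketch of the Apache behaviour described above: the range is
# cut out of the already-gzipped body, so Content-Range/Content-Length
# would describe compressed bytes, not the original entity.
head -c 4096 /dev/urandom | gzip -c > body.gz
total=$(wc -c < body.gz)
tail -c +11 body.gz | head -c 142 > slice.bin   # compressed bytes 10..151
echo "Content-Range: bytes 10-151/$total"
echo "Content-Length: $(wc -c < slice.bin)"
```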


Igor S.

Yeah, it’s a weird situation. As a user I would probably expect the
range to apply to the actual content served, before it was compressed, so
that if I request 100 bytes, then once everything is transferred and
decompressed I have 100 bytes’ worth of content.


On Fri, Jun 24, 2011 at 07:04:04PM -0400, Ensiferous wrote:

> Yeah, it’s a weird situation. As a user I would probably expect the
> range to apply to the actual content served, before it was compressed, so
> that if I request 100 bytes, then once everything is transferred and
> decompressed I have 100 bytes’ worth of content.

The "Content-Length" header of a gzipped response corresponds to the
length of the gzipped data. I’m not sure whether this is specified in
the RFC, but it is the de facto behaviour. Since a range is associated
with the "Content-Length" header, it should work on the already-gzipped
body, so Apache 2.3.8 does it right:


$ nc httpd.apache.org 80
GET / HTTP/1.0
Host: httpd.apache.org
Range: bytes=10-151
Accept-Encoding: gzip

HTTP/1.1 206 Partial Content
Date: Sat, 25 Jun 2011 04:17:51 GMT
Server: Apache/2.3.8 (Unix) mod_ssl/2.3.8 OpenSSL/1.0.0c
Last-Modified: Sun, 22 May 2011 17:04:34 GMT
ETag: "b96c29-247b-4a3e0595b4c80-gzip"
Accept-Ranges: bytes
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Range: bytes 10-151/2631
Content-Length: 142
Connection: close
Content-Type: text/html

?Zms?:??_ [ … gzipped data ]


$ nc httpd.apache.org 80
GET / HTTP/1.0
Host: httpd.apache.org
Range: bytes=10-20, 30-50
Accept-Encoding: gzip

HTTP/1.1 206 Partial Content
Date: Sat, 25 Jun 2011 04:19:14 GMT
Server: Apache/2.3.8 (Unix) mod_ssl/2.3.8 OpenSSL/1.0.0c
Last-Modified: Sun, 22 May 2011 17:04:34 GMT
ETag: "b96c29-247b-4a3e0595b4c80-gzip"
Accept-Ranges: bytes
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 226
Connection: close
Content-Type: multipart/byteranges; boundary=4a6819ef865b815212

--4a6819ef865b815212
Content-type: text/html
Content-range: bytes 10-20/2631

?Zms?:??_?
--4a6819ef865b815212
Content-type: text/html
Content-range: bytes 30-50/2631

oo ???;yas?SF?Vc[
--4a6819ef865b815212--

However, curl does not understand it:


$ curl -r 10-151 httpd.apache.org/
html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

$ curl -r 10-151 --compressed httpd.apache.org/
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/
--------

Note also that it’s impossible to ungzip part of a response if you do
not have the preceding parts from the very start.
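That last point is easy to demonstrate locally (a sketch; file names and
input are arbitrary):

```shell
# Demonstration: a range cut from the middle of a gzip stream cannot be
# decompressed on its own, because the 10-byte gzip header and the
# preceding DEFLATE history are both missing.
printf 'hello world, hello nginx\n' | gzip -c > whole.gz
tail -c +11 whole.gz > middle.part        # drop the gzip header
gunzip -c middle.part 2>/dev/null || echo "cannot decompress a middle range"
```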


Igor S.

Hello!

On Sat, Jun 25, 2011 at 08:22:21AM +0400, Igor S. wrote:

> so Apache 2.3.8 does it right:

Yes, as long as Content-Encoding is used (not Transfer-Encoding),
ranges must be interpreted on the compressed content.

The Content-Length entity-header field indicates the size of the
entity-body, in decimal number of OCTETs…

Byte range specifications in HTTP apply to the sequence of bytes in
the entity-body (not necessarily the same as the message-body).

The message-body (if any) of an HTTP message is used to carry the
entity-body associated with the request or response. The message-body
differs from the entity-body only when a transfer-coding has been
applied, as indicated by the Transfer-Encoding header field (section
14.41).

And, just for completeness, http message syntax is:

    generic-message = start-line
                      *(message-header CRLF)
                      CRLF
                      [ message-body ]

(RFC 2616 - Hypertext Transfer Protocol -- HTTP/1.1)

[…]

> Note also that it’s impossible to ungzip part of a response if you do
> not have the preceding parts from the very start.

This applies to many other types of data as well.

The main problem with Content-Encoding and ranges is that one should
somehow be able to reproduce exactly the same entity-body (or at least
make sure cache validators would change on an entity-body change). This
is not trivial when you compress on the fly with possibly different
compression options.
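A quick local sketch of this reproducibility problem: the same file
compressed twice with different options yields different compressed
entity-bodies (even gzip’s header XFL byte differs between levels), so
byte ranges into one run are meaningless against the other:

```shell
# Same input, different compression options: the compressed
# entity-bodies differ, so a byte range into one is not valid in the other.
printf 'hello world %.0s' $(seq 1 200) > body.txt
gzip -1 -c body.txt > fast.gz
gzip -9 -c body.txt > best.gz
cmp -s fast.gz best.gz || echo "compressed entity-bodies differ"
```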

I personally think that moving towards using Transfer-Encoding
would be a good step for “on the fly” compression. But browser
support does not seem to be there at all.

Maxim D.

On Sat, Jun 25, 2011 at 08:22:21AM +0400, Igor S. wrote:

> so Apache 2.3.8 does it right:
> Server: Apache/2.3.8 (Unix) mod_ssl/2.3.8 OpenSSL/1.0.0c
> ?Zms?:??_ [ … gzipped data ]
> Server: Apache/2.3.8 (Unix) mod_ssl/2.3.8 OpenSSL/1.0.0c
> --4a6819ef865b815212
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/
> --------
>
> Note also that it’s impossible to ungzip part of a response if you do
> not have the preceding parts from the very start.

It seems I was wrong: curl tries to ungzip the received body using zlib,
so since the first 10 bytes are the gzip header, curl successfully
decompresses the body. However, this example does not work:

$ curl -r 11-151 --compressed httpd.apache.org/
curl: (61) Error while processing content unencoding: invalid distance code


Igor S.