Proxying large downloads from S3

Hi guys!

We are proxying files from S3 through our app and had a couple of
questions about the ideal config.

When I arrived on the scene, the following config was in place:

location /download/ {
    internal;
    proxy_pass https://s3.amazonaws.com;
    proxy_buffering off;
    proxy_buffers 2 4m;
    proxy_buffer_size 4m;
    proxy_busy_buffers_size 4m;
}

I was called in because the server started running out of memory. (It
seemed fine for a long time; it probably just didn’t hit its memory
limit until recently.)

After reading up on the config, 4m seemed like a VERY large
proxy_buffer size. I was also uncertain whether proxy_buffering should
be on or off: the “Nginx as proxy for Amazon S3” example recommends
turning it off, but most of the mailing list replies say that unless
you are doing Comet or something similar, there’s no need for it to be
off.
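To put rough numbers on it (assuming I’m reading the buffer docs
correctly): with buffering off, nginx still reads from the upstream in
chunks of proxy_buffer_size, so each active download can hold around
4 MB of buffer memory; with buffering on, the config above would allow
roughly 4 MB + 2 x 4 MB = 12 MB per connection. Either way, a couple of
hundred concurrent downloads adds up to gigabytes of buffer memory,
which would line up with the out-of-memory problem.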

We removed all the extra proxy config and we ended up with something
like this:

location /download/ {
    internal;
    proxy_pass https://s3.amazonaws.com;
    proxy_buffering off;
    chunked_transfer_encoding off;
}

We also tried it with proxy_buffering on. In both cases we seem to be
getting truncated responses, especially on large files: the download
just finishes early, and the resulting zip file is corrupt.

We are also seeing errors like this, but we’re not sure whether they
are related.

2014/03/20 00:02:36 [error] 15519#0: *24132 upstream timed out (110:
Connection timed out) while reading response header from upstream,
client: 207.154.10.35, server: localhost, request: "GET
/download/?uuid=bdba5a17f58c8c808b6cc7fd97cf752a7b80ce1cc6eba6f0b56e411ef7cf3135
HTTP/1.1", upstream:
"https://192.168.99.1:443/utilities/s3_manifest_for_nginx_to_zip_and_stream/?uuid=bdbaecc7fd97cf752a7b80ce1cc6eoaoef7cf3135",
host: "streaming.somesite.com"

I was suspicious of chunked_transfer_encoding being off, so the config
now looks like this:

location /download/ {
    internal;
    proxy_pass https://s3.amazonaws.com;
    proxy_max_temp_file_size 256m;
    proxy_read_timeout 300;
}

We are now waiting to find out if our large downloads are still being
interrupted. But we would appreciate any advice about proxying large
files from S3. Should proxy_buffering be on or off?

Thanks :)

Sudara

Hello!

On Tue, Apr 01, 2014 at 07:59:45PM +0200, Sudara Williams wrote:

[…]

interrupted. But we would appreciate any advice about proxying large
files from S3. Should proxy_buffering be on or off?

There is no need to switch off proxy_buffering unless you are
doing streaming and/or long polling.

In most cases, “proxy_buffering off” as seen in various configs is
misused to disable disk buffering. This is wrong; the
proxy_max_temp_file_size directive should be used to control disk
buffering.
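That is, keep buffering on and limit only the on-disk part, along
these lines (the sizes here are just placeholders):

    # buffering stays on (the default); only temp-file usage is capped
    proxy_buffering on;
    proxy_max_temp_file_size 16m;   # or 0 to disable temp files entirely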


Maxim D.
http://nginx.org/

Thanks Maxim!

That is what I suspected with regard to proxy_buffering; it is in
line with your other responses on the list.

With regard to prematurely terminated / truncated large files: is this
something you or anyone else has seen before?

I’ll see if I can get some better logging in place and report back.
It might be tough to correlate failed requests in production with log
entries, but I’ll do my best :)
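In case it helps, the idea (untested; the format name and log path are
made up) is a log format that records how much nginx actually sent
versus how much the upstream returned:

    # in the http {} block
    log_format s3proxy '$remote_addr [$time_local] "$request" status=$status '
                       'sent=$body_bytes_sent upstream_status=$upstream_status '
                       'upstream_length=$upstream_response_length '
                       'request_time=$request_time upstream_time=$upstream_response_time';

    # inside the /download/ location
    access_log /var/log/nginx/s3_download.log s3proxy;

Comparing $body_bytes_sent with $upstream_response_length (and with the
expected file size) should at least show whether the truncation happens
on the S3 side or between nginx and the client.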

Sudara

Hello!

On Wed, Apr 02, 2014 at 05:05:36PM +0200, Sudara Williams wrote:

[…]

With regard to prematurely terminated / truncated large files: is this
something you or anyone else has seen before?

I’ll see if I can get some better logging in place and report back.
It might be tough to correlate failed requests in production with log
entries, but I’ll do my best :)

Upstream timeouts certainly may result in truncated responses being
sent to clients: if a timeout happens in the middle of a response,
there is nothing nginx can do about it.

(The log line you provided is “… while reading response header
from upstream …” though, and that should result in a 504 being
returned to the client, or in the next upstream being tried if there
are multiple upstream servers.)

Use of “chunked_transfer_encoding off;” is expected to make things
worse, as it makes truncation undetectable when the Content-Length
isn’t known. It should not be used unless there are good reasons to,
e.g. you have to support broken clients that use HTTP/1.1 but do not
understand chunked transfer encoding (see the chunked_transfer_encoding
directive in ngx_http_core_module).
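Putting that together with the buffering advice above, the location can
stay fairly simple, something like this (an untested sketch; the
timeout and temp-file cap just mirror the values already used earlier
in the thread):

    location /download/ {
        internal;
        proxy_pass https://s3.amazonaws.com;

        # proxy_buffering and chunked_transfer_encoding are left at
        # their defaults (both on)

        # cap buffering of large responses to disk
        proxy_max_temp_file_size 256m;

        # give S3 more time before "upstream timed out" is triggered
        proxy_read_timeout 300s;
    }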


Maxim D.
http://nginx.org/