Several times recently, we have seen our production nginx memory usage flare
up a hundred-fold, from its normal ~42 MB to 3-4 GB, for 20 minutes to an
hour or so, and then recover. There is no spike in the number of connections,
only in memory use, so whatever causes this does not seem to be an increase
in concurrency.
The obvious thing to suspect for this is our app’s newest change, which
involves streaming responses proxied from an upstream (via uwsgi_pass);
these responses can get very large and run for many minutes, pumping
hundreds of megabytes each. But I expect nginx memory use for these requests
to be bounded by uwsgi_buffers (shouldn’t it be?) – and indeed I cannot
reproduce the problem by making such requests, even copying the exact ones
that are being made when the memory spike occurs. In my tests, the responses
get buffered as they should be, and delivered normally, without memory
growth.
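
For context, here is how I understand the buffering knobs that ought to put a
ceiling on this; the values are illustrative rather than our real settings,
the location path is a placeholder, and the backend address is just the one
that shows up in the logs further down:

    # Illustrative values only. With buffering on (the default), a response
    # that does not fit in the in-memory buffers should spill to a temporary
    # file on disk rather than grow the worker's memory.
    location /api/ {
        include uwsgi_params;
        uwsgi_pass 123.45.67.89:3003;

        uwsgi_buffering on;                # the default
        uwsgi_buffer_size 8k;              # first part of the response
        uwsgi_buffers 8 8k;                # in-memory buffers per request
        uwsgi_busy_buffers_size 16k;       # buffers busy sending to the client
        uwsgi_max_temp_file_size 1024m;    # cap on the on-disk spill file
    }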
So, what is a good way to investigate what causes all this memory to be
suddenly allocated? Is there a way of introspecting/logging nginx memory
allocation edge cases like this? (Is there documentation on this which I
didn’t find?)
During the memory spike (and before and after), connections are around 270,
of which around 20-25 are writing and the rest are waiting.
All other *_buffers settings are at their defaults, on a 64-bit machine.
Accounting for the above buffers, other buffer defaults, and some uwsgi_cache
and proxy_cache zones, and estimating on the high side (e.g. all buffer
budgets fully used for each active connection) for 270 connections, here is
my rough sketch of the memory uses I am aware of, probably with some
misunderstandings along the way:
On Mon, Jul 21, 2014 at 11:15:00AM -0400, gthb wrote:
to be bounded by uwsgi_buffers (shouldn’t it be?) – and indeed I cannot
reproduce the problem by making such requests, even copying the exact ones
that are being made when the memory spike occurs. In my tests, the responses
get buffered as they should be, and delivered normally, without memory
growth.
How do you track “nginx memory”?
From what you describe I suspect that disk buffering occurs (see the
ngx_http_uwsgi_module documentation), and the number you are looking at
includes the size of files on disk.
So, what is a good way to investigate what causes all this memory to be
suddenly allocated? Is there a way of introspecting/logging nginx memory
allocation edge cases like this? (Is there documentation on this which I
didn’t find?)
The debugging log includes full information about all memory
allocations, see the “A debugging log” page in the nginx documentation.
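
A minimal sketch of turning that on, assuming nginx was built with
--with-debug (the log path is a placeholder):

    # Debug-level messages are only produced if nginx was compiled with
    # --with-debug (nginx -V shows the configure arguments).
    error_log /var/log/nginx/error-debug.log debug;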
What I was tracking was memory use per process name as reported by New Relic
nrsysmond, which I’m pretty sure is RSS from ps output, summed over all
nginx processes.
From what you describe I suspect that disk buffering occurs (see the
ngx_http_uwsgi_module documentation), and the number you are looking at
includes the size of files on disk.
I wish : ) because that’s what I want to happen for these large responses.
But that’s definitely not it, because we see a spike of swap when this
occurs, with most other processes on the machine being paged out … and in
the worst spikes swap has filled up and an OOM kill has occurred, which
conveniently records in syslog the RSS for an nginx process being killed:
Jul 21 03:54:16 ren2 kernel: [3929562.712779] uwsgi invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
…
Jul 21 03:54:16 ren2 kernel: [3929562.737340] Out of memory: Kill process 5248 (nginx) score 328 or sacrifice child
Jul 21 03:54:16 ren2 kernel: [3929562.737352] Killed process 5248 (nginx) total-vm:3662860kB, anon-rss:3383776kB, file-rss:16kB
So that catches Nginx holding a 3.2 GB resident set, matching what New Relic
says about the same time.
The debugging log includes full information about all memory
allocations, see the “A debugging log” page in the nginx documentation.
Thank you. I haven’t been able to reproduce this outside of production (or
even in production) so I might have to leave debug logging enabled in
production and hope to catch this next time it happens. Am I right to assume
that enabling debug is going to weigh quite heavily on production usage, and
eat up disk very fast? (Traffic peaks at around 100 req/sec and 2 MB/sec.)
On Mon, Jul 21, 2014 at 02:18:08PM -0400, gthb wrote:
[…]
The debugging log includes full information about all memory
allocations, see the “A debugging log” page in the nginx documentation.
Thank you. I haven’t been able to reproduce this outside of production (or
even in production) so I might have to leave debug logging enabled in
production and hope to catch this next time it happens. Am I right to assume
that enabling debug is going to weigh quite heavily on production usage, and
eat up disk very fast? (Traffic peaks at around 100 req/sec and 2 MB/sec.)
While debug logging is costly enough to notice, 100 r/s is quite
low and it should be fine to just enable the logging. If in
doubt, you may start with enabling debug only for a fraction of
users, using the debug_connection directive (see the “Core
functionality” page in the nginx documentation).

In any case, it may be a good idea to show your full configuration, as
recommended on the nginx “Debugging” page - mostly to make sure you
haven’t overlooked something obvious like “uwsgi_buffer_size 10m” or the
like.
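
A sketch of what that selective debugging could look like, with placeholder
addresses (this, too, needs a --with-debug build):

    events {
        # Debug-level logging only for connections from these clients;
        # everything else stays at the level configured by error_log.
        debug_connection 203.0.113.7;       # a single test client (placeholder)
        debug_connection 192.168.0.0/24;    # or a whole network (placeholder)
    }

    error_log /var/log/nginx/error.log info;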
On Mon, Jul 21, 2014 at 05:44:45PM -0400, gthb wrote:
upstream: “uwsgi://123.45.67.89:3003”, host: “foo.com”
2014/07/21 11:41:18 [alert] 16885#0: worker process 21182 exited on signal 9
and retrying that request reproduces the problem, nginx growing in size
without bound. (The request never made it to the access log because of the
OOM kill, which is why my previous testing didn’t reproduce it)
[…]
These extra lines never appear in the healthy requests, so I imagine they
point to the problem (but I am not at all familiar with Nginx debug output);
in particular those “write new buf” lines look relevant; they are output
right after ngx_alloc_chain_link is called.
The lines in question are just sending the response to the client
via the response body filter chain.
All the possibly relevant Nginx config:
I don’t see anything obviously wrong in the config, but again - I
strongly recommend that you post the full configuration. Or, better
yet, if you are able to reproduce the problem with a reduced
config, post that instead (and the corresponding debug log).
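
Something along these lines might serve as a reduced config, with debug
logging and only the uwsgi proxying in front of the same backend (the listen
port and paths are placeholders; the backend address is the one from the
quoted log above):

    worker_processes 1;
    error_log /var/log/nginx/debug.log debug;    # needs a --with-debug build

    events { }

    http {
        server {
            listen 8080;

            location / {
                include uwsgi_params;
                uwsgi_pass 123.45.67.89:3003;
                # plus whatever uwsgi_buffers / uwsgi_buffer_size values
                # production uses, since those are under suspicion
            }
        }
    }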
[…]
Does this narrow down the problem? Can I provide anything further?
I finally reproduced this, with debug logging enabled. I found the
problematic request in the error log preceding the kill signal, saying it
was being buffered to a temporary file:

2014/07/21 11:39:39 [warn] 21182#0: *32332838 an upstream response is buffered to a temporary file /var/cache/nginx/uwsgi_temp/9/90/0000186909 while reading upstream, client: x.x.x.x, server: foo.com, request: "GET /api/nasty/troublemaker.csv?el=xyzzy!a:b&dates_as_dates=1 HTTP/1.1", upstream: "uwsgi://123.45.67.89:3003", host: "foo.com"
2014/07/21 11:41:18 [alert] 16885#0: worker process 21182 exited on signal 9
and retrying that request reproduces the problem, nginx growing in size
without bound. (The request never made it to the access log because of the
OOM kill, which is why my previous testing didn’t reproduce it.)
So here are debug log segments for single epoll events, during a healthy
streaming request (one that doesn’t trigger this problem), and during the
problematic request. These requests ought to behave just the same (I’m not
aware of any behavior difference in our upstream app, invoked via
uwsgi_pass, except that the CSV lines are a little longer in the problematic
response).
For the problematic request, the epoll events often look identical (all the
same kinds of lines, plus sometimes an extra pipe recv chain and readv line
pair, presumably just because of data being streamed in slightly bigger
chunks) … but sometimes they have some extra lines, which I’ve highlighted
with an XXX prefix here:
These extra lines never appear in the healthy requests, so I imagine they
point to the problem (but I am not at all familiar with Nginx debug output);
in particular those “write new buf” lines look relevant; they are output
right after ngx_alloc_chain_link is called.
All the possibly relevant Nginx config:
http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    log_format main '...';
    access_log /var/log/nginx/access.log main buffer=32k;

    sendfile on;
    keepalive_timeout 65;

    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 10m;

    gzip on;
    gzip_min_length 1280;
    gzip_types text/css application/json application/javascript
Everything goes OK for a while, just a few stray mallocs, and then maybe
half a minute into the request (the time varies), after maybe 20-25MB have
been transferred, the flood of mallocs starts:
Ok, I see what goes on here. It is a combination of multiple
factors:

- there are more than 16 buffers, and hence the stack-based buffer for
  iovs isn’t big enough and nginx has to allocate memory in
  ngx_readv_chain();

- your backend app returns data in very small chunks, thus there
  are many ngx_readv_chain() calls;

- the response is big, and hence the effect of the above is
  noticeable.
A trivial workaround is to use “uwsgi_buffers 8 64k” instead.
Or you may try the following patch:
Everything goes OK for a while, just a few stray mallocs, and then maybe
half a minute into the request (the time varies), after maybe 20-25MB have
been transferred, the flood of mallocs starts:
A trivial workaround is to use “uwsgi_buffers 8 64k” instead.
Or you may try the following patch:
Thank you! I tried the uwsgi_buffers workaround in production, and the patch
in my reproduction setup, and indeed both seem to fix this problem; the
request runs to completion with no memory growth.
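
For reference, a sketch of where the workaround sits; the location path is a
placeholder, and only the uwsgi_buffers value itself comes from this thread:

    location /api/ {
        include uwsgi_params;
        uwsgi_pass 123.45.67.89:3003;

        # Staying at or below 16 buffers keeps ngx_readv_chain() on its
        # stack-allocated iovec array instead of allocating on every read;
        # the previous setting here (not shown above) evidently used more
        # than 16 smaller buffers.
        uwsgi_buffers 8 64k;
    }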
your backend app returns data in very small chunks, thus there
are many ngx_readv_chain() calls;
That’s a likely cause of high CPU usage in Nginx, right? It goes to 20% for
this one request (without debug), the Python app taking the rest. My
intuition was that joining chunks on the Python side would be much more
expensive … but those thousands of ngx_readv_chain() calls per second are
quite costly too, I take it?
On Tue, Jul 22, 2014 at 01:07:58PM -0400, gthb wrote:
your backend app returns data in very small chunks, thus there
are many ngx_readv_chain() calls;
That’s a likely cause of high CPU usage in Nginx, right? It goes to 20% for
this one request (without debug), the Python app taking the rest. My
intuition was that joining chunks on the Python side would be much more
expensive … but those thousands of ngx_readv_chain() calls per second are
quite costly too, I take it?
Syscalls on the Python side, small packets over the network (even
a local one), and syscalls on the nginx side are all costly
compared to using a reasonably sized buffer on the Python side.