Weird Memleak problem

paul · August 31, 2009, 3:56am

I had nginx 0.6.32 and just upgraded to 0.7.61, and I still have the same
memleak issue.
What happens is that as soon as I start nginx, it starts using RAM and
keeps using more and more; over a 16-hour period it consumes 8 GB of RAM
and all the swap, and then errors out because it can't use any more.
This didn't happen until recently, and the only difference at all
in the config is more server entries.

Here's the config:

user www www;

worker_processes 16;
error_log logs/error.log;
worker_rlimit_nofile 65000;

events
{

    worker_connections 40000;

}

####### HTTP SETTING
http
{
    access_log off;
    log_format alot '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent" "$http_accept" $connection';
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 0;
    output_buffers 16 128k;
    server_tokens off;
    ssl_verify_client off;
    ssl_session_timeout 10m;

    ssl_session_cache shared:SSL:500000;

    include /usr/local/nginx/conf/mime.types;
    default_type application/octet-stream;

    cache_max_size 24;

    gzip on;
    gzip_min_length 512;
    gzip_buffers 64 32k;
    gzip_types text/plain text/html text/xhtml text/css text/js;

    proxy_buffering on;
    proxy_buffer_size 32m;
    proxy_buffers 16 32m;
    proxy_busy_buffers_size 64m;
    proxy_temp_file_write_size 2048m;
    proxy_intercept_errors on;
    proxy_ssl_session_reuse  off;
    proxy_read_timeout 120;
    proxy_connect_timeout 60;
    proxy_send_timeout 120;
    client_body_buffer_size 32m;
    client_header_buffer_size 64k;
    large_client_header_buffers 16 64k;
    client_max_body_size 16m;


    server
    {
        listen 1.2.3.4:80;
        location /
        {
            proxy_pass http://3.4.5.6;
            proxy_redirect http://3.4.5.6/ http://$http_host/;
            proxy_redirect default;
            proxy_set_header Host $host; ##Forwards host along
            proxy_set_header X-Real-IP $remote_addr; ##Sends realip to customer svr
            proxy_set_header X-Forwarded-For $remote_addr; ##Sends realip to customer svr
        }
    }
    server
    {
        listen 1.2.3.4:443;

        ssl on;
        ssl_certificate /usr/local/nginx/conf/whatever.com.crt;
        ssl_certificate_key /usr/local/nginx/conf/whatever.com.key;
        ssl_protocols SSLv3;
        ssl_ciphers HIGH:!ADH;
        ssl_prefer_server_ciphers on;
        location /
        {
            proxy_pass https://3.4.5.6;
            proxy_redirect https://3.4.5.6/ http://$http_host/;
            proxy_redirect default;
            proxy_set_header Host $host; ##Forwards host along
            proxy_set_header X-Real-IP $remote_addr; ##Sends realip to customer svr
            proxy_set_header X-Forwarded-For $remote_addr; ##Sends realip to customer svr
            proxy_set_header X-FORWARDED_PROTO https;
        }
    }

These server entries are repeated about 60 times, and that's it.
When there were around 40, we never had a memleak issue.

This is on Linux, kernel 2.6.25.

Thanks

paul · August 31, 2009, 8:31am

On Sun, Aug 30, 2009 at 09:45:55PM -0400, Paul wrote:

[…]

These server entries are repeated about 60 times, and that's it.
When there were around 40, we never had a memleak issue.

This is on Linux, kernel 2.6.25.

What does “nginx -V” show?

paul · August 31, 2009, 7:07pm

nginx version: nginx/0.7.61
built by gcc 4.1.2 20080704 (Red Hat 4.1.2-44)
configure arguments: --with-http_ssl_module --with-http_addition_module
--with-http_stub_status_module --with-http_realip_module
--with-http_sub_module --with-http_dav_module --with-poll_module
--with-openssl=/root/openssl-0.9.8h

paul · August 31, 2009, 10:24pm

I know, but this problem never occurred until recently… Once the
request is done, it should release the memory, but it looks like
maybe it isn't?
The only difference between a month ago and now is that we have more
server entries and more requests per second. It used to use less than
a gig of RAM doing 200 requests/sec, and now it keeps using more and
more RAM until it fills all the RAM and swap (8 GB+) and errors out…
What would you suggest?

paul · August 31, 2009, 10:43pm

On Mon, Aug 31, 2009 at 04:17:03PM -0400, Paul wrote:

I know, but this problem never occurred until recently… Once the
request is done, it should release the memory, but it looks like
maybe it isn't?

No. These workers have been running since Aug 7 and handle up to 3,800 r/s:

ps ax -o pid,ppid,%cpu,vsz,wchan,start,command|egrep '(nginx|PID)'
PID PPID %CPU VSZ WCHAN STARTED COMMAND
42412 51429 16.7 372452 kqread 7Aug09 nginx: worker process (nginx)
42413 51429 18.2 372452 kqread 7Aug09 nginx: worker process (nginx)
42414 51429 0.0 291556 kqread 7Aug09 nginx: cache manager process (nginx)
51429 1 0.0 291556 pause 28Jul09 nginx: master process /usr/local/nginx/ng

The only difference between a month ago and now is that we have more
server entries and more requests per second. It used to use less than
a gig of RAM doing 200 requests/sec, and now it keeps using more and
more RAM until it fills all the RAM and swap (8 GB+) and errors out…
What would you suggest?

Just 250 simultaneous proxied connections take 8G, and 250 connections
is very few in the modern world. Why have you set up such huge buffers
at all?
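The 8G figure is plain multiplication. A rough Python sketch of that arithmetic, assuming every in-flight proxied request holds one `proxy_buffer_size` buffer and leaving out `proxy_buffers` (assumed here to be allocated only as the response arrives), so the result is a lower bound. `parse_size` and `min_proxy_memory` are hypothetical helper names, and only the `k`/`m` size suffixes are handled:

```python
# Lower-bound estimate of proxy buffer memory (a sketch, not nginx's own logic).
# Assumption: each in-flight proxied request holds one proxy_buffer_size buffer;
# proxy_buffers are not counted.

UNITS = {"k": 1024, "m": 1024 * 1024}


def parse_size(value: str) -> int:
    """Convert an nginx-style size such as '32m' or '4k' to bytes (k/m only)."""
    value = value.strip().lower()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def min_proxy_memory(connections: int, proxy_buffer_size: str) -> int:
    """Bytes needed just for proxy_buffer_size across concurrent requests."""
    return connections * parse_size(proxy_buffer_size)


if __name__ == "__main__":
    total = min_proxy_memory(250, "32m")
    # 250 * 32 MiB = 8000 MiB, i.e. roughly 7.8 GiB
    print(f"{total // 1024**2} MiB ({total / 1024**3:.2f} GiB)")
```

Run as a script it prints `8000 MiB (7.81 GiB)`. With `proxy_buffer_size 32k`, the same 250 connections need 8000 KiB, under 8 MiB.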

paul · August 31, 2009, 9:18pm

On Sun, Aug 30, 2009 at 09:45:55PM -0400, Paul wrote:

[…]

   gzip on;
   gzip_min_length 512;
   gzip_buffers 64 32k;
   gzip_types text/plain text/html text/xhtml text/css text/js;

   proxy_buffering on;
   proxy_buffer_size 32m;
   proxy_buffers 16 32m;

These settings allocate a 32M buffer for each proxied request.

   proxy_busy_buffers_size 64m;
   proxy_temp_file_write_size 2048m;
   proxy_intercept_errors on;
   proxy_ssl_session_reuse  off;
   proxy_read_timeout 120;
   proxy_connect_timeout 60;
   proxy_send_timeout 120;
   client_body_buffer_size 32m;

This setting allocates a 32M buffer for each request with a body.

paul · August 31, 2009, 11:06pm

On Mon, Aug 31, 2009 at 04:52:51PM -0400, Paul wrote:

Had some issues with people uploading/downloading files before on 0.6
and the buffers fixed it…

These issues should be fixed in another way.

What would you suggest I change the config to? I will try and let you
know if I still see the problem.
Thank you.

```
# current settings
proxy_buffer_size 32m;
proxy_buffers 16 32m;
proxy_busy_buffers_size 64m;
proxy_temp_file_write_size 2048m;
client_body_buffer_size 32m;

# suggested settings
proxy_buffer_size 32k;
proxy_buffers 16 32k;
client_body_buffer_size 32k;
```
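To see what the suggested values change, a rough Python sketch comparing worst-case buffer memory per proxied response. It assumes that worst case is `proxy_buffer_size` plus all of the `proxy_buffers` filled in memory; `proxy_busy_buffers_size`, temp files and `client_body_buffer_size` are not modeled, and the helper names are hypothetical:

```python
# Worst-case proxy buffer memory per request (a sketch under stated assumptions).

UNITS = {"k": 1024, "m": 1024 * 1024}


def parse_size(value: str) -> int:
    """Convert an nginx-style size such as '32m' or '32k' to bytes (k/m only)."""
    value = value.strip().lower()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def worst_case_per_request(proxy_buffer_size: str, proxy_buffers: str) -> int:
    """proxy_buffer_size plus every buffer in 'proxy_buffers <number> <size>'."""
    number, size = proxy_buffers.split()
    return parse_size(proxy_buffer_size) + int(number) * parse_size(size)


old = worst_case_per_request("32m", "16 32m")  # current settings
new = worst_case_per_request("32k", "16 32k")  # suggested settings
print(old // 1024**2, "MiB vs", new // 1024, "KiB per request")
# prints: 544 MiB vs 544 KiB per request
```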

paul · August 31, 2009, 11:00pm

Had some issues with people uploading/downloading files before on 0.6
and the buffers fixed it…
What would you suggest I change the config to? I will try and let you
know if I still see the problem.
Thank you.

paul · August 31, 2009, 11:07pm

On Mon, 2009-08-31 at 16:17 -0400, Paul wrote:

I know, but this problem never occurred until recently… Once the
request is done, it should release the memory, but it looks like
maybe it isn't?
The only difference between a month ago and now is that we have more
server entries and more requests per second. It used to use less than
a gig of RAM doing 200 requests/sec, and now it keeps using more and
more RAM until it fills all the RAM and swap (8 GB+) and errors out…
What would you suggest?

There are many more factors than simple requests per second and number
of server entries:

- The speed of the proxied backends: if a backend is slow, that memory
  is held for much longer.
- The speed of clients: slow clients can slow down the request
  processing cycle, so memory is held for longer.
- Resource contention: more backends means slower overall performance,
  as the OS can't cache as efficiently (both CPU and disk) and must
  handle more context switches, more disk seeks, etc.

In short, one resource shortage (say CPU or I/O) can cause systemic
resource shortages as other resources can’t be freed in a timely fashion
(in turn causing even more resource shortages), so the problem quickly
spirals out of control.
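The point about slow backends and clients can be made concrete with Little's law: the average number of requests in flight equals the arrival rate times the average time each request is held. A minimal Python sketch, assuming a steady state and that each request pins a fixed-size buffer for its whole lifetime; the 200 r/s rate comes from the thread, while the hold times are hypothetical:

```python
# Little's law applied to buffer memory (a sketch; steady state assumed).

MIB = 1024 * 1024


def in_flight(requests_per_s: float, hold_time_s: float) -> float:
    """Average number of requests in flight: L = lambda * W."""
    return requests_per_s * hold_time_s


def buffer_memory(requests_per_s: float, hold_time_s: float,
                  bytes_per_request: int) -> float:
    """Average bytes pinned by buffers that live as long as the request."""
    return in_flight(requests_per_s, hold_time_s) * bytes_per_request


# Same request rate and buffer size; only the hypothetical hold time changes.
fast = buffer_memory(200, 0.5, 32 * MIB)   # 100 requests in flight
slow = buffer_memory(200, 1.25, 32 * MIB)  # 250 requests in flight
print(f"{fast / MIB:.0f} MiB -> {slow / MIB:.0f} MiB")
```

Nothing leaks in this model: memory grows from 3200 MiB to 8000 MiB only because requests are held longer.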

These problems are pretty much unavoidable on a single server, the only
question is at what point will you encounter them. Basically there’s a
breaking point in the scalability of any particular system. Tuning is
the art of pushing that breaking point out as far as possible. Because
you’ve got excessively large buffers, you’ve almost certainly brought
that breaking point upon yourself much earlier than need be.

As Igor mentions, 32MB seems really, really excessive (an application
that generates responses of that size calls for one or more dedicated
servers and brain surgery for the developer). Maybe you should try
something closer to 8 kilobytes and see if that addresses your issues.

Regards,
Cliff

paul · August 31, 2009, 11:16pm

On Mon, 2009-08-31 at 16:52 -0400, Paul wrote:

Had some issues with people uploading/downloading files before on 0.6
and the buffers fixed it…

What sort of issues? Timeouts or “client body too large”? I regularly
upload 10MB+ CSV files to a proxied application with the following:

client_max_body_size 20m;
client_body_buffer_size 128k;
proxy_connect_timeout 90;
proxy_send_timeout 90;
proxy_read_timeout 90;
proxy_buffer_size 4k;
proxy_buffers 4 32k;
proxy_busy_buffers_size 64k;
proxy_temp_file_write_size 64k;

Cliff

paul · September 1, 2009, 12:08am

Yes, it was timeouts, but that may have been fixed in the newer nginx
versions. I'm just wondering why the RAM usage never happened before
and is happening now… It never used to use much at all, and now it's
using an exorbitant amount of RAM with nothing major changed… That's
why I figured something was wrong.
It takes about 12 hours or so to use 6-7 GB of RAM. That is another
reason why I thought something might be wrong.
I'll try changing the buffers, watch the RAM usage again, and see
what happens.
Thank you, and Igor, for your input.

Paul

paul · September 1, 2009, 12:18am

Oh, I also notice another issue: when RAM usage is high like this and I
reload nginx, it does this:
root 5776 0.0 0.1 19528 4876 ? Ss Aug30 0:00 nginx: master process /usr/local/nginx/sbin/nginx -c conf/ng1.conf
www 5777 2.3 2.3 184060 94364 ? S Aug30 28:53 nginx: worker process is shutting down
www 5778 2.3 2.3 315500 95496 ? S Aug30 29:23 nginx: worker process is shutting down
www 5781 2.2 3.8 342904 154144 ? S Aug30 28:08 nginx: worker process is shutting down
www 5784 2.4 4.9 508552 199036 ? S Aug30 30:17 nginx: worker process is shutting down
www 5785 2.4 3.0 349364 122764 ? S Aug30 30:18 nginx: worker process is shutting down
www 5786 2.3 5.8 511264 234928 ? R Aug30 29:27 nginx: worker process is shutting down
www 5787 2.3 2.7 209608 111840 ? S Aug30 29:29 nginx: worker process is shutting down
www 5788 2.3 6.5 406672 263352 ? S Aug30 29:24 nginx: worker process is shutting down
www 5789 2.3 2.4 275580 98532 ? S Aug30 29:25 nginx: worker process is shutting down
www 5790 2.3 3.5 316852 145448 ? S Aug30 29:10 nginx: worker process is shutting down
www 5791 2.3 6.5 505276 265980 ? S Aug30 29:13 nginx: worker process is shutting down
www 5792 2.4 1.5 255980 62228 ? S Aug30 30:05 nginx: worker process is shutting down
www 5793 2.3 5.4 546796 221996 ? S Aug30 29:34 nginx: worker process is shutting down
www 5794 2.3 2.8 250940 116300 ? S Aug30 28:39 nginx: worker process is shutting down
www 5795 2.3 2.1 246528 86908 ? S Aug30 29:08 nginx: worker process is shutting down
www 5796 2.4 3.9 340724 161644 ? S Aug30 30:20 nginx: worker process is shutting down
www 6347 2.8 0.7 43356 28640 ? S 17:10 0:13 nginx: worker process
www 6348 2.7 0.6 42472 27636 ? S 17:10 0:13 nginx: worker process
www 6349 2.4 0.6 40776 25672 ? S 17:10 0:11 nginx: worker process
www 6350 2.3 0.6 41864 27104 ? S 17:10 0:11 nginx: worker process
www 6351 2.3 0.6 42728 28028 ? S 17:10 0:11 nginx: worker process
www 6352 2.3 0.6 43040 27828 ? S 17:10 0:11 nginx: worker process
www 6353 2.6 0.7 43972 29360 ? S 17:10 0:12 nginx: worker process
www 6354 2.4 0.7 46104 30892 ? S 17:10 0:11 nginx: worker process
www 6355 2.0 0.6 40252 25776 ? S 17:10 0:09 nginx: worker process
www 6356 2.9 0.7 46408 31396 ? S 17:10 0:14 nginx: worker process
www 6357 2.2 0.6 41600 26816 ? S 17:10 0:10 nginx: worker process
www 6358 2.3 0.6 42536 27960 ? S 17:10 0:11 nginx: worker process
www 6359 2.4 0.6 40564 26120 ? S 17:10 0:11 nginx: worker process
www 6360 2.4 0.6 42272 27656 ? S 17:10 0:11 nginx: worker process
www 6361 2.8 0.7 46764 30336 ? S 17:10 0:14 nginx: worker process
www 6362 2.1 0.6 40224 25636 ? S 17:10 0:10 nginx: worker process

and the processes stay in ‘process is shutting down’ for a LONG time…
I've waited quite a while before doing a restart instead of a reload to
clear them out. Not sure what would cause this?
Most of my connection timeouts are set to between 60 and 120 seconds,
except the SSL session one, which is 10m, but I've waited hours for the
old processes to shut down.
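To see how much memory the old workers still hold compared with the new ones, one could total the RSS column of the `ps` listing. A minimal Python sketch; it assumes the Linux `ps aux` column layout shown above with one process per line, and `summarize_workers` is a hypothetical helper name:

```python
# Total resident memory of old vs. new nginx workers from `ps aux` output.
# Assumes the Linux `ps aux` layout: USER PID %CPU %MEM VSZ RSS TTY STAT START
# TIME COMMAND, one process per line, RSS in kilobytes.


def summarize_workers(ps_lines):
    """Return total RSS (KB) for shutting-down and active nginx workers."""
    totals = {"shutting down": 0, "active": 0}
    for line in ps_lines:
        fields = line.split(None, 10)  # COMMAND keeps its internal spaces
        if len(fields) < 11 or "nginx: worker process" not in fields[10]:
            continue  # header, master process, or unrelated process
        key = "shutting down" if "is shutting down" in fields[10] else "active"
        totals[key] += int(fields[5])
    return totals


# Usage sketch (Python 3.7+; "ww" asks ps not to truncate long commands):
#   import subprocess
#   out = subprocess.run(["ps", "auxww"], capture_output=True, text=True).stdout
#   print(summarize_workers(out.splitlines()))
```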

Thanks again

paul · September 2, 2009, 1:56am

I know this… but there aren't any slow requests going on, or there
shouldn't be anyway. It's just odd; it never used to take more than a
few minutes. It must have been a new site we added, or something that
is keeping the connections open for a long time.

paul · September 1, 2009, 9:47am

Hello!

On Mon, Aug 31, 2009 at 06:10:16PM -0400, Paul wrote:

Oh, I also notice another issue: when RAM usage is high like this and I
reload nginx, it does this:
root 5776 0.0 0.1 19528 4876 ? Ss Aug30 0:00 nginx: master process /usr/local/nginx/sbin/nginx -c conf/ng1.conf
www 5777 2.3 2.3 184060 94364 ? S Aug30 28:53 nginx: worker process is shutting down
www 5778 2.3 2.3 315500 95496 ? S Aug30 29:23 nginx:

[…]

and the processes stay in ‘process is shutting down’ for a LONG time…
I've waited quite a while before doing a restart instead of a reload to
clear them out. Not sure what would cause this?
Most of my connection timeouts are set to between 60 and 120 seconds,
except the SSL session one, which is 10m, but I've waited hours for the
old processes to shut down.

nginx worker processes wait for all currently processed requests
to finish before exiting. It can take hours if there are active
but slow requests or requests with big responses.

You may use normal network diagnostic tools (netstat, tcpdump,
…) to see what actually happens on the wire.
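As a starting point with netstat, a small Python sketch that tallies TCP connection states from `netstat -tn` output, optionally for one local port only. It assumes the usual Linux column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), and `tcp_states` is a hypothetical helper name. Many ESTABLISHED connections on the nginx ports while old workers are still shutting down would fit the explanation above, and a persistently large Send-Q on such a connection usually means the client is not reading fast enough:

```python
# Tally TCP connection states from `netstat -tn` output (a diagnostic sketch).
# Assumes the usual Linux columns: Proto Recv-Q Send-Q Local-Address
# Foreign-Address State.
from collections import Counter


def tcp_states(netstat_output, local_port=None):
    """Count connection states, optionally only for one local port."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip the "Active Internet..." and "Proto ..." headers
        if local_port is not None and not fields[3].endswith(f":{local_port}"):
            continue
        counts[fields[5]] += 1
    return counts


# Usage sketch (Python 3.7+):
#   import subprocess
#   out = subprocess.run(["netstat", "-tn"], capture_output=True, text=True).stdout
#   print(tcp_states(out, 443))
```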

Maxim D.