Race condition in limit_zone/limit_conn?

I am seeing a behavior in nginx/0.7.62 that I cannot explain.

I want to limit the maximum number of concurrent connections per
virtual host (server block). As an approximation of that, I’m using

limit_zone max_conn $host 10m;

(10MB because we have a lot of domain names) and then within each
server block

limit_conn max_conn N;

to limit to N concurrent connections. To test this, I put a script
called sleep.php on the back-end web nodes that is just

<?php sleep($_GET['sleep']); ?>

Suppose I set N to 3. If I run

ab -n 3 -c 3 ‘http://domain.name/sleep.php?sleep=30

it works as expected. All three connections are established and hang
for 30 seconds. During that time, if I try to connect to that host
name in another window, I get a 503. After 30 seconds, the 3 sleeping
requests complete normally.

With N set to 3, if I run

ab -n 4 -c 4 ‘http://test3.bjaspan.acquia-sites.com/sleep.php?sleep=30

what I would expect to see is that 3 connections would succeed and the
fourth would fail quickly with a 503. Instead, what happens is that
all 4 fail immediately with a 503.

The log is not particularly illuminating (at least to me):

root@bal-11:~# tail -f /var/log/nginx/error.log | grep “limit”
2010/05/03 19:29:34 [debug] 4080#0: *1 limit zone: 265A3FCB 1
2010/05/03 19:29:34 [debug] 4080#0: *2 limit zone: 265A3FCB 2
2010/05/03 19:29:34 [debug] 4080#0: *4 limit zone: 265A3FCB 3
2010/05/03 19:29:34 [error] 4080#0: *6 limiting connections by zone
“max_conn”, client: 127.0.0.1, server: domain.name, request: “GET
/sleep.php?sleep=15 HTTP/1.0”, host: “domain.name”
2010/05/03 19:29:34 [debug] 4080#0: *6 http write filter limit 0
2010/05/03 19:29:34 [error] 4080#0: *8 limiting connections by zone
“max_conn”, client: 127.0.0.1, server: domain.name, request: “GET
/sleep.php?sleep=15 HTTP/1.0”, host: “domain.name”
2010/05/03 19:29:34 [debug] 4080#0: *8 http write filter limit 0
2010/05/03 19:29:35 [error] 4080#0: *9 limiting connections by zone
“max_conn”, client: 127.0.0.1, server: domain.name, request: “GET
/sleep.php?sleep=15 HTTP/1.0”, host: “domain.name”
2010/05/03 19:29:35 [debug] 4080#0: *9 http write filter limit 0
2010/05/03 19:29:35 [error] 4080#0: *10 limiting connections by zone
“max_conn”, client: 127.0.0.1, server: domain.name, request: “GET
/sleep.php?sleep=15 HTTP/1.0”, host: “domain.name”
2010/05/03 19:29:35 [debug] 4080#0: *10 http write filter limit 0
2010/05/03 19:29:35 [debug] 4080#0: limit zone cleanup: 265A3FCB 3
2010/05/03 19:29:35 [debug] 4080#0: limit zone cleanup: 265A3FCB 2
2010/05/03 19:29:35 [debug] 4080#0: limit zone cleanup: 265A3FCB 1

Looking in ngx_http_limit_zone_handler(), it seems that only a single
connection should get NGX_HTTP_SERVICE_UNAVAILABLE when the limit is
exceeded, unless I do not understand what the return value from the
handler means (which is entirely possible, I’ve never looked inside
nginx’s code before).

Any thoughts?

Thanks,

Barry

Hello!

On Mon, May 03, 2010 at 03:37:49PM -0400, Barry Jaspan wrote:

[…]

2010/05/03 19:29:34 [debug] 4080#0: *2 limit zone: 265A3FCB 2
“max_conn”, client: 127.0.0.1, server: domain.name, request: “GET
/sleep.php?sleep=15 HTTP/1.0”, host: “domain.name”
2010/05/03 19:29:35 [debug] 4080#0: *9 http write filter limit 0
2010/05/03 19:29:35 [error] 4080#0: *10 limiting connections by zone
“max_conn”, client: 127.0.0.1, server: domain.name, request: “GET
/sleep.php?sleep=15 HTTP/1.0”, host: “domain.name”
2010/05/03 19:29:35 [debug] 4080#0: *10 http write filter limit 0
2010/05/03 19:29:35 [debug] 4080#0: limit zone cleanup: 265A3FCB 3
2010/05/03 19:29:35 [debug] 4080#0: limit zone cleanup: 265A3FCB 2
2010/05/03 19:29:35 [debug] 4080#0: limit zone cleanup: 265A3FCB 1

You had at least 7 connections (*1, *2, *4, *6, *8, *9, *10)
during the test. First 3 were passed through, others were
rejected.

Most likely you configured nginx to proxy to itself and this
causes extra connections.

Maxim D.

Maxim,

Thank you for your reply.

On Mon, May 3, 2010 at 7:58 PM, Maxim D. [email protected] wrote:

You had at least 7 connections (*1, *2, *4, *6, *8, *9, *10)
during the test. First 3 were passed through, others were
rejected. … Most likely you configured nginx to proxy to itself and this
causes extra connections.

I understand how making nginx proxy to itself would cause extra
connections and how those might make all of the initial connections
fail. If the limit is 3, I open 3, then each of those 3 attempt to
proxy to nginx again, nginx will return 503 to the proxy request which
will then be returned to the original requests.

However, I do not see how that is happening in my case. I have reduced
my configuration file to a minimum. Here it is:

---- snip snip ----
user nginx;
worker_processes 10;
error_log /var/log/nginx/error.log debug;
pid /var/run/nginx.pid;
events {
worker_connections 2048;
}
http {
proxy_set_header Host $host;
limit_zone max_conn $host 10m;
limit_conn max_conn 3;
server {
listen 80;
server_name _;
location / {
proxy_pass http://10.252.86.98;
}
}
}
---- snip snip ----

Does this config file make nginx proxy to itself? I do not see how.
But with this file, when I run the command

ab -v 3 -c 4 -n 4

http://maxcon1.bjaspan.acquia-sites.com/sleep.php?sleep=15

ab shows that all four requests fail with status 503 and I get the
nginx error log output:

tail -f /var/log/nginx/error.log | grep limit

2010/05/04 16:06:23 [debug] 4215#0: *1 limit zone: 265A3FCB 1
2010/05/04 16:06:23 [debug] 4215#0: *2 limit zone: 265A3FCB 2
2010/05/04 16:06:23 [debug] 4215#0: *4 limit zone: 265A3FCB 3
2010/05/04 16:06:23 [error] 4215#0: *6 limiting connections by zone
“max_conn”, client: 64.119.156.90, server: _, request: “GET
/sleep.php?sleep=15 HTTP/1.0”, host:
maxcon1.bjaspan.acquia-sites.com
2010/05/04 16:06:23 [debug] 4215#0: *6 http write filter limit 0
2010/05/04 16:06:23 [error] 4215#0: *8 limiting connections by zone
“max_conn”, client: 64.119.156.90, server: _, request: “GET
/sleep.php?sleep=15 HTTP/1.0”, host:
maxcon1.bjaspan.acquia-sites.com
2010/05/04 16:06:23 [debug] 4215#0: *8 http write filter limit 0
2010/05/04 16:06:23 [error] 4215#0: *9 limiting connections by zone
“max_conn”, client: 64.119.156.90, server: _, request: “GET
/sleep.php?sleep=15 HTTP/1.0”, host:
maxcon1.bjaspan.acquia-sites.com
2010/05/04 16:06:23 [debug] 4215#0: *9 http write filter limit 0
2010/05/04 16:06:23 [error] 4215#0: *10 limiting connections by zone
“max_conn”, client: 64.119.156.90, server: _, request: “GET
/sleep.php?sleep=15 HTTP/1.0”, host:
maxcon1.bjaspan.acquia-sites.com
2010/05/04 16:06:23 [debug] 4215#0: *10 http write filter limit 0
2010/05/04 16:06:23 [debug] 4215#0: limit zone cleanup: 265A3FCB 3
2010/05/04 16:06:23 [debug] 4215#0: limit zone cleanup: 265A3FCB 2
2010/05/04 16:06:23 [debug] 4215#0: limit zone cleanup: 265A3FCB 1

Any thoughts?

Thanks,

Barry

Maxim,

The backend proxy server in my example is Apache on another box; there
is no proxy loop. I verified this with the X-Loop-Test header you
suggested.

At your suggestion I started looking more closely into the nginx log,
and I seem to have discovered that the problem is a bug with ab on
MacOS X. nginx is reporting that the remote client is closing the
connections (status 499) in addition to (as you pointed out) creating
extra connections I do not seem to have requested. When I run ab on
Ubuntu (from the machine running nginx itself and from a different
machine), I get the expected behavior: with limit_conn set to 3 and
making 4 concurrent requests, 3 requests succeed, and the 4th gets a
503. And the nginx error log looks much more like what I’d expect:

ab -c 4 -n 4
http://maxcon1.bjaspan.acquia-sites.com/sleep.php?sleep=3

root@bal-11:~# grep limit /var/log/nginx/error.log
2010/05/04 19:05:21 [notice] 936#0: getrlimit(RLIMIT_NOFILE):
10000:10000
2010/05/04 19:05:28 [debug] 938#0: *1 limit zone: 265A3FCB 1
2010/05/04 19:05:28 [debug] 938#0: *2 limit zone: 265A3FCB 2
2010/05/04 19:05:28 [debug] 938#0: *4 limit zone: 265A3FCB 3
2010/05/04 19:05:28 [error] 938#0: *6 limiting connections by zone
“max_conn”, client: 174.129.169.92, server: _, request: “GET
/sleep.php?sleep=3 HTTP/1.0”, host: “maxcon1.bjaspan.acquia-sites.com
2010/05/04 19:05:28 [debug] 938#0: *6 http write filter limit 0
2010/05/04 19:05:32 [debug] 938#0: *1 http write filter limit 0
2010/05/04 19:05:32 [debug] 938#0: limit zone cleanup: 265A3FCB 3
2010/05/04 19:05:32 [debug] 938#0: *4 http write filter limit 0
2010/05/04 19:05:32 [debug] 938#0: limit zone cleanup: 265A3FCB 2
2010/05/04 19:05:32 [debug] 938#0: *2 http write filter limit 0
2010/05/04 19:05:32 [debug] 938#0: limit zone cleanup: 265A3FCB 1

So, there is no apparent bug in nginx or my config files, and there
will be no more using ab on MacOS X for me!

Thanks,

Barry

Hello!

On Tue, May 04, 2010 at 12:10:26PM -0400, Barry Jaspan wrote:

I understand how making nginx proxy to itself would cause extra
connections and how those might make all of the initial connections
fail. If the limit is 3, I open 3, then each of those 3 attempt to
proxy to nginx again, nginx will return 503 to the proxy request which
will then be returned to the original requests.

However, I do not see how that is happening in my case.

Sure, but 1) you haven’t provided any evidence it doesn’t happen
with 3 requests and 2) even assumming it doesn’t happen - nobody
says loop is unconditional.

}
}
}
---- snip snip ----

Does this config file make nginx proxy to itself? I do not see how.

The question is: what handles connections on 10.252.86.98:80 and
how it handles them? In most simple case it’s the same nginx, and
loop is clear enough.

[…]

Any thoughts?

Your logs clearly show there are more requests than you expect
from ab. And your config doesn’t make proxy loop impossible.

You may either remove “proxy_set_header Host $host” to make sure
Host header in original request will differ from one in proxied,
or add something like

proxy_set_header X-Loop-Test washere;

to make sure requests will be at least distingushable in debug
log.

Maxim D.