Max clients != worker processes * worker connections

Hello,

I realise this may have been hammered out quite a few times already,
but I cannot understand the results I’m getting. I have a rather basic
setup on Ubuntu Server 11.04, where nginx (run as root) spawns 2
worker_processes and serves a basic HTML page. Each of these processes
should have its worker_connections set to 8192. There’s no limit_zone
defined, worker_rlimit_nofile is set to 65536, and keepalive_timeout is 5.
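
For reference, here is a minimal sketch of the relevant parts of our nginx.conf (the listen port, document root and index are placeholders; only the values mentioned above come from the real configuration):

worker_processes 2;
worker_rlimit_nofile 65536;

events {
    worker_connections 8192;
}

http {
    keepalive_timeout 5;

    server {
        listen 80;
        root /var/www/html;   # placeholder document root
        index index.html;     # the basic HTML page being served
    }
}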

To verify maximum load, I run ab from another server on the same
subnet. Everything works fine with “ab -kc 8000 -n 16000 http://10.1.1.10/”.
However, when I run “ab -kc 10000 -n 16000 http://10.1.1.10/”, ab reports
about 3486 failed requests (Length: 1743, Exceptions: 1743), while nginx’s
error.log fills with numerous “2011/10/11 16:49:24 [alert] 12081#0: 8192
worker_connections are not enough” errors.

Testing a number of settings, it seems there’s a nearly 1:1
relationship between worker_connections and the maximum concurrency
parameter I can pass to ab without producing errors. I tried setting
worker_processes to some high number (like 16), but it seems to have no
effect whatsoever.

Can you please let me know why this setup might not be serving the
“promised” :wink: worker_processes * worker_connections connections? Is it
possible that new connections are not being distributed evenly between
the two processes? Apologies if this is some basic error on our side; we’re
still learning (and admiring) nginx and are more used to IIS!

Kind regards,
Dawid Ciecierski


Hello!

On Tue, Oct 11, 2011 at 11:02:34AM -0400, davidcie wrote:


Can you please let me know why this setup might not be serving the
“promised” :wink: worker_processes * worker_connections connections? Is it
possible that new connections are not being distributed evenly between
the two processes? Apologies if this is some basic error on our side; we’re
still learning (and admiring) nginx and are more used to IIS!

With a small HTML page it’s very likely that one worker process will
be able to hold the accept mutex exclusively. Upon approaching the
worker_connections limit it will stop using the accept mutex, but it
may take some time (up to accept_mutex_delay, 500ms by default) for
other workers to come into play. Additionally, the worker will
still try to accept() connections until it sees that another worker
has locked the accept mutex (and this may take some additional time).

With real load you should get much better distribution of client
connections between workers. In extreme use cases like the above
you may try using

events {
    accept_mutex off;
    ...
}

to achieve better connection distribution between workers.

This implies some CPU overhead though, especially with many
worker processes (the OS will wake up every worker on each new
connection instead of only the one holding the accept mutex).

Maxim D.

On 10/11/11, davidcie [email protected] wrote:

Can you please let me know why this setup might not be serving the
“promised” :wink: worker_processes * worker_connections connections?

Is it possible that new connections are not being evenly distributed to the
two processes?

They aren’t supposed to be evenly distributed. Whichever worker gets
CPU time first is likely to accept as many connections as possible,
unless you have blocking or CPU-intensive requests, in which case it
won’t have enough time to call accept() again and another worker will.
This actually isn’t specific to nginx; I think any non-blocking
server behaves like this.
Although you can try setting accept_mutex or something to influence
this.

This is an excellent explanation - thank you for such an in-depth look
into the inner workings of nginx. I will consider slightly decreasing
accept_mutex_delay for our high-traffic site, and stop worrying about
benchmark results too much.
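
For anyone reading this later, that is a one-line change in the events
block. A minimal sketch (the 100ms value is purely illustrative, not a
tested recommendation):

events {
    worker_connections 8192;
    accept_mutex_delay 100ms;   # default is 500ms; lower means idle workers retry the accept mutex sooner
}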

Thanks again, “problem” solved :slight_smile:

Best regards,
Dawid Ciecierski


On 10/11/11, Maxim D. [email protected] wrote:
Sorry, I hadn’t noticed Maxim’s answer.