Why set keepalive_timeout to a short period when Nginx is great at handling them?

Sadaf_N · June 18, 2016, 1:26pm

I read something interesting today:
https://blog.martinfjordvald.com/2011/04/optimizing-nginx-for-high-traffic-loads/

“Keep alive is a HTTP feature which allows user agents to keep the
connection to your server open for a number of requests or until the
specified time out is reached. This won’t actually change the
performance of our nginx server very much as it handles idle
connections very well. The author of nginx claims that 10,000 idle
connections will use only 2.5 MB of memory, and from what I’ve seen
this seems to be correct.”
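
For a sense of scale, here is a rough back-of-the-envelope sketch in Python of what that quoted figure implies per connection. It assumes "MB" means 2^20 bytes and that memory grows linearly with the number of idle connections; both are my assumptions, not claims made in the article.

```python
# Figures quoted above: 10,000 idle connections use about 2.5 MB.
QUOTED_CONNECTIONS = 10_000
QUOTED_MEMORY_BYTES = 2.5 * 1024 * 1024  # assumption: 1 MB = 2**20 bytes

bytes_per_idle_connection = QUOTED_MEMORY_BYTES / QUOTED_CONNECTIONS


def idle_memory_mb(connections):
    """Estimated memory in MB for idle connections, assuming linear scaling."""
    return connections * bytes_per_idle_connection / (1024 * 1024)


print(f"{bytes_per_idle_connection:.0f} bytes per idle connection")
print(f"{idle_memory_mb(100_000):.0f} MB for 100,000 idle connections")
```

Under those assumptions an idle connection costs roughly 262 bytes, so even 100,000 of them would stay around 25 MB.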

So why is it that people on the web (and in IRC) still recommend
setting keepalive_timeout to a short period (< 30 seconds) when
Nginx can handle idle keep-alive connections like a champ (using very
little resources) while serving active ones? Is that bad advice?

I get this advice so often that I believe there must be something that
I am missing. What is it?

Aahan_Krish · June 18, 2016, 5:29pm

Hi B.R.,

You raised a good point.

So you are referring to the 4-tuple (source_IP, source_port,
server_IP, server_port) socket limitation, correct? I just came to
know about this and it’s interesting. Please tell me if this
understanding of mine is correct:

So a server identifies a user's connection based on a combination
of: user’s internet connection’s IP + port the user’s client is
connecting from (e.g. Chrome on 8118, IE on 8080, etc.) + server IP +
server_port (80 for HTTP / 443 for HTTPS).

And the limitation is that a maximum of ~ 65536 clients all on
same port (say all are using Chrome and therefore connecting from
8118) can connect simultaneously to a web server that is connected to
the internet via 1 public IP address and port 80 (let’s say HTTP
only), IFF the resources of the server permit.

And that means I can double the no. of connections (2x 65536 per
second) my server can handle, if it has enough resources in the first
place (i.e. sufficient RAM, CPU, I/O capacity or whatever is relevant)
by simply adding another public IP address to my server and making
sure that the traffic is load-balanced between the two public IPs of
the server.

Am I correct?

(If my understanding is correct, this comment was helpful:
sockets - NGINX : Exceeds 65535 connections limit - Stack Overflow)

Check out the post I recently made to this list answering my own
question about keepalive_timeout:
http://mailman.nginx.org/pipermail/nginx/2016-June/051026.html

If you follow ((5)) in the post, you’ll note that keepalive_timeout
set to anything over 300s or 5m is probably pointless as most browsers
drop the keep-alive connection in under 2 min, and 5 minutes max. This
is just an FYI as I’d like to hear what you think.

Lastly, your suggestion on utilizing keepalive_requests to recycle
keep-alive connections is smart. Noted.

I think I learnt a lot today. =)

Aahan_Krish · June 18, 2016, 2:14pm

There is no downside on the server application side, I suppose, especially
since, as you recalled, nginx has no trouble with it.

One big problem is that there might be socket exhaustion on the TCP stack of
your front-end machine(s). Remember, a socket is defined by a triple
<protocol, address, port>, and the number of available ports is 65535
(layer 4) for every IP (layer 3) double <protocol, address>.
The baseline is that, for the TCP connections underlying your HTTP communication,
you have 65535 ports for each IP version your server handles.

Now, you have to consider the relation between new clients (thus new
connections) and the existing/open ones.
If you have very low traffic, you could set an almost infinite timeout on
your keepalive capability; that would greatly help people who never sever
their connection to your website because they are so addicted to it (and never
close its tab in their browser).
On the contrary, if you are seeing new clients at a very high rate, with the
same parameters, you would quickly exhaust your available sockets and be
unable to accept client connections.

In the opposite scenario, where you set a keepalive timeout which
is too low, you would hurt your server performance by using CPU to manage
extra connections for a single client, thus wasting resources and
inducing latency, which are the issues keepalive helps to address.

Given that the 65535 ports limitation is not going to change, at least in the
near future (it is hardcoded as 16 bits in today's protocols), you have
essentially 2 parameters to consider:

How often you get new clients
What is the mean time users spend connected to your server(s)

Those should help you define the most efficient keepalive timeout. nginx
sets the default time for it at 75 seconds
http://nginx.org/en/docs/http/ngx_http_core_module.html#keepalive_timeout.
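
As a rough way to combine those two parameters, the steady-state number of open connections can be estimated as the arrival rate multiplied by the time each connection stays open (Little's law). The Python sketch below uses hypothetical traffic figures and pessimistically assumes every connection stays open for the full keepalive_timeout; the function name is mine.

```python
def estimated_open_connections(new_connections_per_second, seconds_held_open):
    """Little's law: average number in the system = arrival rate * time in system."""
    return new_connections_per_second * seconds_held_open


# Hypothetical traffic: 200 new client connections per second.
ARRIVAL_RATE = 200

for timeout in (10, 30, 75):  # 75 s is the nginx default mentioned above
    open_conns = estimated_open_connections(ARRIVAL_RATE, timeout)
    print(f"keepalive_timeout {timeout:>2}s -> ~{open_conns} open connections")
```

The estimate is only useful for comparing against whatever connection limit applies on your system; real clients often close connections earlier than the timeout.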

On a side note, some browsers also have trouble with it; see
keepalive_disable:
http://nginx.org/en/docs/http/ngx_http_core_module.html#keepalive_disable.

And finally, nginx provides the ability to recycle connections based on
the number of requests made (default 100).
I guess that is a way of mitigating clients with different behaviors: a
client having made 100 requests is probably considered to have had its
share of time on the server, and it is time to put it back in the pool to give
others access in case of congestion.
On the other hand, a client taking its time to browse your website (and
thus not reaching the requests limit) should be given the whole timeout
allocated on the server.
I see no reason other than fairness/balancing here, no technical one
other than the one already addressed: giving each client enough time
to browse the website with minimal disconnection, while avoiding resources
being unfairly taken away from other people.

I might be misled, in which case I count on people to correct me.
I suggest you also read:

Tuning NGINX for Performance - NGINX (‘Keepalive Connections’ part)
and more importantly
Using NGINX as an Accelerating Proxy for HTTP Servers

B. R.

Aahan_Krish · June 19, 2016, 10:53am

On Saturday 18 June 2016 14:12:31 B.R. wrote:

There is no downside on the server application side, I suppose, especially
since, as you recalled, nginx has no trouble with it.

One big problem is that there might be socket exhaustion on the TCP stack of
your front-end machine(s). Remember, a socket is defined by a triple
<protocol, address, port>, and the number of available ports is 65535
(layer 4) for every IP (layer 3) double <protocol, address>.
The baseline is that, for the TCP connections underlying your HTTP communication,
you have 65535 ports for each IP version your server handles.
[…]

Each TCP connection is identified by 4 parameters: source IP, source
port, destination IP, destination port. Since clients usually have different
public IPs, there is no limitation imposed by the number of ports.
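
A minimal Python sketch of this, using only the standard socket module on the loopback interface: two clients connect to the same listening IP and port, and the connections stay distinct because the source port differs. The helper name is hypothetical and this is only an illustration, not how nginx itself tracks connections.

```python
import socket


def open_two_connections():
    """Connect two clients to one listener and return each connection's 4-tuple."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(5)
    server_addr = server.getsockname()

    clients, accepted, tuples = [], [], []
    try:
        for _ in range(2):
            client = socket.create_connection(server_addr)
            conn, peer = server.accept()
            clients.append(client)
            accepted.append(conn)
            # (source IP, source port, destination IP, destination port)
            tuples.append(peer + conn.getsockname())
    finally:
        for sock in clients + accepted + [server]:
            sock.close()
    return tuples


four_tuples = open_two_connections()
for four_tuple in four_tuples:
    print(four_tuple)
```

Both printed tuples end with the same destination IP and port; only the source port changes, which is enough for the kernel to tell the connections apart.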

Now, you have to consider the relation between new clients (thus new
connections) and the existing/open ones.
If you have very low traffic, you could set an almost infinite timeout on
your keepalive capability, that would greatly help people who never sever
connection to your website because they are so addicted to it (and never
close the tab of their browser to it).
On the contrary, if you are seeing new clients at a very high rate, with the
same parameters, you would quickly exhaust your available sockets and be
unable to accept client connections.

No, keep-alive connections shouldn’t exhaust available sockets, because
there’s the “worker_connections” directive in nginx, which limits the number of
open connections and must be set according to other limits in your system.
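
To make that limit concrete, here is a small Python sketch of the ceiling on simultaneous connections. It follows the nginx documentation's description of worker_connections as a per-worker limit that cannot exceed the open-file limit; the function name and the example numbers are hypothetical.

```python
def max_simultaneous_connections(worker_processes, worker_connections,
                                 open_files_per_worker):
    """Upper bound on connections nginx can hold open across all workers."""
    # Each connection needs a file descriptor, so the per-worker cap is
    # whichever of the two limits is lower.
    per_worker = min(worker_connections, open_files_per_worker)
    return worker_processes * per_worker


# Hypothetical setup: 4 workers, worker_connections 10000, 65536 open files each.
limit = max_simultaneous_connections(4, 10_000, 65_536)
print(limit)
```

Note that this budget covers all connections a worker opens, including those to proxied servers, so the number available to clients can be lower.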

[…]

And finally, nginx provides the ability to recycle connections based on a
number of requests made (default 100).
I guess that is a way of mitigating clients with different behaviors: a
client having made 100 requests is probably considered to have had its share
of time on the server, and it is time to put it back in the pool to give
others access in case of congestion.

No, it’s to overcome possible memory leaks in long-lived connections in
nginx, because some modules may allocate memory from the connection pool on each
request.
It’s usually safe to increase this value to 1000-10000.

wbr, Valentin V. Bartenev

Aahan_Krish · June 19, 2016, 12:37pm

Hi Valentin,

(I repeat the same question I put to B.R. as you raised the same
point.)

So you are referring to the 4-tuple (source_IP, source_port,
server_IP, server_port) socket limitation, correct? I just came to
know about this and it’s interesting. Please tell me if this
understanding of mine is correct:

So a server identifies a user's connection based on a combination
of: user's internet connection's IP + port the user's client is
connecting from (e.g. Chrome on 8118, IE on 8080, etc.) +
server IP + server_port (80 for HTTP / 443 for HTTPS).

And the limitation is that a maximum of ~ 65536 clients all on
same port (say all are using Chrome and therefore connecting from
8118) can connect simultaneously to a web server that is connected
to the internet via 1 public IP address and port 80 (let's say
HTTP only), IFF the resources of the server permit.

And that means I can double the no. of connections (2x 65536 per
second) my server can handle, if it has enough resources in the
first place (i.e. sufficient RAM, CPU, I/O capacity or whatever
is relevant) by simply adding another public IP address to my
server and making sure that the traffic is load-balanced between
the two public IPs of the server.

Am I correct?

If my understanding is correct, this comment was helpful:
sockets - NGINX : Exceeds 65535 connections limit - Stack Overflow

Aahan_Krish · June 19, 2016, 3:36pm

On Sunday 19 June 2016 16:06:56 Aahan Krish wrote:

So a server identifies a user's connection based on a combination
And that means I can double the no. of connections (2x 65536 per
second) my server can handle, if it has enough resources in the
first place (i.e. sufficient RAM, CPU, I/O capacity or whatever
is relevant) by simply adding another public IP address to my
server and making sure that the traffic is load-balanced between
the two public IPs of the server.

Am I correct?
[…]

No, first of all, there’s no limitation of 65535 clients.

Clients usually use different IPs, so one element of the 4-tuple is already
different.

Even if they are behind NAT, that only limits the number of connections
from one public IP of that gateway, not all clients of your server.
Chrome, IE, etc. don’t use the same port each time for outgoing
connections.
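
The counting argument can be sketched in Python as follows. It treats all 65,535 non-zero source ports as usable, which is an upper bound (operating systems normally hand out a smaller ephemeral range), and the function name is hypothetical.

```python
PORTS_PER_IP = 65_535  # 16-bit port space, excluding port 0


def max_connections_to_one_listener(distinct_client_ips):
    """Distinct (src IP, src port, dst IP, dst port) tuples for one server IP:port."""
    return distinct_client_ips * PORTS_PER_IP


# All clients behind a single NAT gateway IP.
print(max_connections_to_one_listener(1))
# Clients spread over 1,000 different public IPs.
print(max_connections_to_one_listener(1_000))
```

The per-IP ceiling applies to each client address separately, so the total grows with the number of distinct client IPs rather than being capped for the server as a whole.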

wbr, Valentin V. Bartenev

Aahan_Krish · June 19, 2016, 4:46pm

Ah, I didn’t know about NAT before. So that’s how we have shared IP
addresses vs. dedicated IP addresses. This is beautiful; there’s so
much to learn.

So the 2^16 limitation that B.R. mentioned is nothing to worry about.
It’s like worrying that there are limited IP addresses available so we
can’t serve an infinite number of users, heh.

Thank you very much B.R. and Valentin for answering my questions. Have
a great day!