High-traffic setup problem: status module doesn't deliver data

Dear list members,

Currently we are seeing a huge amount of traffic:

~500 r/s
http://download.none.at/nginx_request-day.png

~3.5K active connections
http://download.none.at/port_www-day.png
http://download.none.at/nginx_combined.png

The peaks are the raw values from the status module.

~1.1 Gbit/s of traffic
http://download.none.at/if_eth2-day.png
http://download.none.at/tcp-day.png

I have tried to set up the machine for this traffic, but it looks to
me like that was not successful.

HW:
24 CPUs
Memory: 49381124k/52166656k available (7176k kernel code, 1897452k
absent, 888080k reserved, 6067k data, 1016k init)

OS:
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04.4 LTS
Release: 12.04
Codename: precise

Nginx:
/home/nginx/server/sbin/nginx -V
nginx version: nginx/1.4.4
built by gcc 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
TLS SNI support enabled
configure arguments: --prefix=/home/nginx/server --with-debug
--without-http_uwsgi_module --without-http_scgi_module
--without-http_empty_gif_module --with-http_stub_status_module
--with-http_gzip_static_module --with-http_ssl_module --user=nginx
--group=www-data --with-file-aio --without-http_ssi_module
--with-http_secure_link_module --with-http_sub_module
--with-http_spdy_module

Conf:
http://download.none.at/my_nginx.conf

When I activate AIO, nginx and XFS crash; that’s why AIO is not
active.
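
(For reference, the directive in question, left disabled; a sketch, not our exact config:)

    # disabled: enabling this made nginx and XFS crash on this box
    # aio on;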

In one include file we have the following:

     location ~ recent {
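       # any URI matching "recent" is served with a no-cache header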
       add_header Cache-Control "no-cache";
     }

sysctl -a
http://download.none.at/sysctl.txt

lsmod:
http://download.none.at/lsmod.txt

dmesg:
http://download.none.at/dmesg.txt

On this machine also runs a postgresql and php-fpm but the current
traffic is from delivering of pictures from the file system.

/dev/mapper/pada2_vg-pada2_lv on /home/ type xfs
(rw,noatime,nodiratime,attr2,inode64,noquota)

Thanks for help.

Best regards
Aleks

On 10 February 2014 12:06, Aleksandar L. [email protected] wrote:

Thanks for help.

Aleksandar - I can’t work out what you need help with. There aren’t
any questions (or question marks!) in your email :-)

I can’t see your problem at first or second glance; I’m sure others
will, but I’m quite slow. Could you spell the problem out (what you
observe; what you expect to observe; what’s changed; how you’re
testing)?

J

Hi Jonathan.

Sorry for being unclear; thanks for the reply and the questions.

On 10-02-2014 16:37, Jonathan M. wrote:

[…]

I run nginx on the described HW & OS.

I use the munin plugin nginx-combined_ to get the statistics from the
stub_status module.
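
(For reference, a typical stub_status endpoint looks roughly like this; a sketch only, the actual location is in my_nginx.conf linked above:)

     location /nginx_status {
       stub_status on;      # exposes active connections, accepts, handled, requests
       allow 127.0.0.1;     # the munin plugin polls from this machine
       deny all;
     }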

The call from nginx-combined_ runs on the same machine as the
nginx server.

Because of this there is no external network traffic involved, just a
call to an IP alias on eth2.

Every time we have more than ~400 r/s, we get no data from the
status request; this request rate corresponds to ~20k packets/second.
I use netfilter with fail2ban, but not the connection tracking module!

I have now seen in a tcpdump that I get an RST packet almost
immediately after a request whenever the ‘no answer from server’ case occurs.
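
(A capture filter along these lines is enough to see the resets; the interface and port are taken from the setup described above:)

    tcpdump -ni eth2 'tcp port 80 and tcp[tcpflags] & tcp-rst != 0'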

I think this could be a kernel networking issue, not an nginx issue.

The question is: can you please help me find the reason for the
immediate RST answer?

I hope my question is clearer now.

Thanks for reading, and for your patience.

On Monday 10 February 2014 17:41:47 Aleksandar L. wrote:

I use netfilter with fail2ban, but not the connection tracking module!

Do you see the issue without fail2ban?

I hope my question is clearer now.

Thanks for reading, and for your patience.

You haven’t shown your server-level configuration.
Do you use deferred accept?

wbr, Valentin V. Bartenev

On 11-02-2014 12:14, Valentin V. Bartenev wrote:

On Monday 10 February 2014 17:41:47 Aleksandar L. wrote:

[snip]

Every time we have more than ~400 r/s, we get no data from the
status request; this request rate corresponds to ~20k packets/second.
I use netfilter with fail2ban, but not the connection tracking module!

Do you see the issue without fail2ban?

I haven’t tried the setup without it.

Thanks for reading, and for your patience.

You haven’t shown your server-level configuration.
Do you use deferred accept?

yes

listen *:80 deferred default_server;

Aleks

Hello!

On Mon, Feb 10, 2014 at 05:41:47PM +0100, Aleksandar L. wrote:

[…]

Can you please help me find the reason for the immediate RST answer?

Listen queue overflow?

On modern Linuxes, it should be possible to check some listen
queue numbers with “ss -nlt” / “netstat -nlt” (on BSD, detailed
information is available with “netstat -Lan”), and the number of
overflows that happened in the past should be in the “netstat -s”
stats. To tune the listen queue size used by nginx, use the “backlog”
parameter of the listen directive. Note that system limits like
tcp_max_syn_backlog and somaxconn also require tuning.
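
(Putting that together, a minimal check-and-tune sequence on Linux might look like this; all values are illustrative, not recommendations:)

    ss -nlt                        # Send-Q shows the configured backlog per LISTEN socket
    netstat -s | grep -i listen    # overflow / drop counters since boot
    sysctl -w net.core.somaxconn=4096
    sysctl -w net.ipv4.tcp_max_syn_backlog=8192

and in nginx:

    listen 80 backlog=1024;        # the kernel silently caps this at somaxconn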

If a stateful firewall is used, this can also be a result of “out of
states” conditions; check your firewall stats.


Maxim D.
http://nginx.org/

On Tuesday 11 February 2014 12:34:59 Aleksandar L. wrote:
[…]

You haven’t shown your server-level configuration.
Do you use deferred accept?

yes

listen *:80 deferred default_server;

Ok. Two other guesses: do you have tcp_syncookies disabled
and tcp_abort_on_overflow enabled?
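
(Both can be checked in one go, e.g.:)

    sysctl net.ipv4.tcp_syncookies net.ipv4.tcp_abort_on_overflow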

Please note that with deferred accept enabled it is very
easy to overflow tcp_max_syn_backlog, especially
with nginx prior to 1.5.10.

wbr, Valentin V. Bartenev

On 11-02-2014 12:15, Maxim D. wrote:

[…]

On modern Linuxes, it should be possible to check some listen
queue numbers with “ss -nlt” / “netstat -nlt” (on BSD, detailed
information is available with “netstat -Lan”), and the number of
overflows that happened in the past should be in the “netstat -s”
stats. To tune the listen queue size used by nginx, use the “backlog”
parameter of the listen directive. Note that system limits like
tcp_max_syn_backlog and somaxconn also require tuning.

root@ns61620:~# ss -nlt | egrep 'Sta|'
State    Recv-Q  Send-Q  Local Address:Port   Peer Address:Port
LISTEN   0       128     *:80                 *:*

sysctl -a | egrep 'somaxconn|tcp_max_syn'
net.core.somaxconn = 4069
net.ipv4.tcp_max_syn_backlog = 8192

I have not added “backlog” to the listen directive.
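
(For reference, adding it would look something like this; the value is only a placeholder:)

    listen *:80 deferred default_server backlog=4096;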

Do you have some suggestions about useful values for that amount of
traffic?

If a stateful firewall is used, this can also be a result of “out of
states” conditions; check your firewall stats.

I don’t use the connection tracking module.

Aleks

Hello!

On Tue, Feb 11, 2014 at 12:34:59PM +0100, Aleksandar L. wrote:

Do you use deferred accept?

yes

listen *:80 deferred default_server;

Try switching it off; there could be a problem if the kernel decides
to switch to syncookies, see this ticket for details:

http://trac.nginx.org/nginx/ticket/353

(The problem is fixed in 1.5.10, and 1.4.5 will have the fix,
too.)


Maxim D.
http://nginx.org/

On 11-02-2014 12:45, Valentin V. Bartenev wrote:

Ok. Two other guesses: do you have tcp_syncookies disabled
and tcp_abort_on_overflow enabled?

Please note that with deferred accept enabled it is very
easy to overflow tcp_max_syn_backlog, especially
with nginx prior to 1.5.10.

sysctl -a | egrep 'tcp_syncookies|tcp_abort_on_overflow'
net.ipv4.tcp_abort_on_overflow = 0
net.ipv4.tcp_syncookies = 1

download.none.at # egrep 'tcp_syncookies|tcp_abort_on_overflow' sysctl.txt
net.ipv4.tcp_abort_on_overflow = 0
net.ipv4.tcp_syncookies = 1

On 11-02-2014 12:48, Maxim D. wrote:

Hello!

On Tue, Feb 11, 2014 at 12:34:59PM +0100, Aleksandar L. wrote:

[snip]

http://trac.nginx.org/nginx/ticket/353

(The problem is fixed in 1.5.10, and 1.4.5 will have the fix,
too.)

Ok, thanks.
I have now removed deferred and added backlog=1024.
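
(The listen directive now reads something like this; the parameter order is assumed:)

    listen *:80 default_server backlog=1024;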

Should I add deferred again when I update to 1.4.5?

Hello!

On Tue, Feb 11, 2014 at 01:10:59PM +0100, Aleksandar L. wrote:

Do you use deferred accept?

(The problem is fixed in 1.5.10, and 1.4.5 will have the fix,
too.)

Ok, thanks.
I have now removed deferred and added backlog=1024.

Does it actually solve the problem? It would also be interesting
to know what exactly did it: removing deferred or adding backlog?

(As for backlog size, I usually set it to something big enough to
accommodate about 1 or 2 seconds of the expected peak connection rate.
That is, 1024 is good enough for about 500 connections per second.
But with deferred on Linux, it looks like deferred connections are
sitting in the same queue as normal ones, and this may drastically
change things.)
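
(As a worked example of that rule of thumb:)

    # backlog ~= expected peak connection rate x 1..2 s of headroom
    # e.g. ~500 conn/s x 2 s = 1000, rounded up to 1024
    listen 80 backlog=1024;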

Should I add deferred again when I update to 1.4.5?

It should be safe, though I don’t recommend it unless it’s
beneficial in your setup.


Maxim D.
http://nginx.org/

Hello!

On Tue, Feb 11, 2014 at 02:14:14PM +0100, Aleksandar L. wrote:

listen *:80 deferred default_server;
I have now removed deferred and added backlog=1024.

I will try these today:

https://bitbucket.org/yarosla/httpress/wiki/Home
https://github.com/httperf/httperf

But if anybody has a suggestion for a distributed load-testing
service, I’m all ears.

I personally prefer http_load, mostly for historical reasons. We
also use wrk here, which is quite good as well.

18195743 SYNs to LISTEN sockets dropped

What could be the reason for this?

I’m not really a Linux expert as I prefer FreeBSD, but the numbers
suggest there were no listen queue overflows during that second,
though there are other SYN drops. Looking into the code, there are lots
of possible reasons, including various allocation failures and/or
other edge cases. It needs additional investigation to tell what
is going on.

(As for backlog size, I usually set it to something big enough to
accommodate about 1 or 2 seconds of the expected peak connection rate.
That is, 1024 is good enough for about 500 connections per second.
But with deferred on Linux, it looks like deferred connections are
sitting in the same queue as normal ones, and this may drastically
change things.)

OK. Does that mean that with deferred I should double the listen value, or divide it?

With deferred, it looks like all (potential) deferred connections
should be added to the value. And it’s very hard to tell how many
there will be.


Maxim D.
http://nginx.org/

Hi.

On 11-02-2014 13:28, Maxim D. wrote:

[…]

It would also be interesting to know what exactly did it: removing deferred or adding backlog?

Since our highest traffic is mostly in the morning, I can tell you
that tomorrow.
I am still searching for a usable load-testing setup. Currently I
have one server (1 GB) with which I will try these today:

https://bitbucket.org/yarosla/httpress/wiki/Home
https://github.com/httperf/httperf

But if anybody has a suggestion for a distributed load-testing
service, I’m all ears.

What I have also seen is this:

netstat -s | egrep 'listen queue|SYNs to LISTEN sockets dropped'
131753 times the listen queue of a socket overflowed
18195732 SYNs to LISTEN sockets dropped

And a second later:

netstat -s | egrep 'listen queue|SYNs to LISTEN sockets dropped'
131753 times the listen queue of a socket overflowed
18195743 SYNs to LISTEN sockets dropped
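
(A one-liner to watch these counters grow, for example:)

    while sleep 1; do netstat -s | egrep 'listen queue|SYNs to LISTEN sockets dropped'; done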

What could be the reason for this?

(As for backlog size, I usually set it to something big enough to
accommodate about 1 or 2 seconds of the expected peak connection rate.
That is, 1024 is good enough for about 500 connections per second.
But with deferred on Linux, it looks like deferred connections are
sitting in the same queue as normal ones, and this may drastically
change things.)

OK. Does that mean that with deferred I should double the listen
value, or divide it?

Should I add deferred again when I update to 1.4.5?

It should be safe, though I don’t recommend it unless it’s
beneficial in your setup.

OK, thanks. For now I will not activate it in the current setup.

Hello!

On Tue, Feb 11, 2014 at 06:16:48PM +0400, Valentin V. Bartenev wrote:

With deferred, it looks like all (potential) deferred connections
should be added to the value. And it’s very hard to tell how many
there will be.

Deferred connections stay in the SYN backlog, which is controlled
by tcp_max_syn_backlog.

You mean they aren’t counted toward the listen socket’s backlog, right?
Is there any way to see how many such connections are queued then?


Maxim D.
http://nginx.org/

On Tuesday 11 February 2014 18:06:38 Maxim D. wrote:
[…]

should be added to the value. And it’s very hard to tell how many
there will be.

Deferred connections stay in the SYN backlog, which is controlled
by tcp_max_syn_backlog.

wbr, Valentin V. Bartenev

On 11-02-2014 16:28, Valentin V. Bartenev wrote:

[…]

Deferred connections stay in the SYN backlog, which is controlled
by tcp_max_syn_backlog.

netstat -n | grep SYN_RECV | grep :80 | wc -l

We have an average of ~200 here.

Is this good/bad/not worth worrying about?

On Tuesday 11 February 2014 19:06:37 Maxim D. wrote:

[…]

You mean they aren’t counted toward the listen socket’s backlog, right?

Yes.

Is there any way to see how many such connections are queued then?

They all stay in the SYN_RECV state. If I understand right, something
like this will show them:

netstat -n | grep SYN_RECV | grep :80 | wc -l
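
(On newer iproute2, ss can filter these directly; a sketch, assuming the state-filter syntax is available:)

    ss -n state syn-recv '( sport = :80 )' | tail -n +2 | wc -l   # tail skips the header line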

wbr, Valentin V. Bartenev

Hello!

On Tue, Feb 11, 2014 at 07:28:43PM +0400, Valentin V. Bartenev wrote:

[…]

Deferred connections stay in the SYN backlog, which is controlled
by tcp_max_syn_backlog.

netstat -n | grep SYN_RECV | grep :80 | wc -l

How are these connections different from ones in the real SYN_RECV
state, then? I.e., how is one expected to distinguish them from
connections that have not yet passed the 3-way handshake?


Maxim D.
http://nginx.org/

On Tuesday 11 February 2014 16:44:57 Aleksandar L. wrote:

[…]

They all stay in the SYN_RECV state. If I understand right, something
like this will show them:

netstat -n | grep SYN_RECV | grep :80 | wc -l

We have an average of ~200 here.

Is this good/bad/not worth worrying about?

Not a problem at all, as long as it’s noticeably lower than
tcp_max_syn_backlog.

wbr, Valentin V. Bartenev