Nginx slow for no reason

Hello,
I am running nginx 0.7.67-3 on Debian, now without PHP, mostly serving
one empty HTML file. It was working fine for several weeks, handling about
800 connections per second with 2 worker processes. A few days ago it
started to be very slow; during peak hours it is almost impossible to connect,
but I don’t see any bottleneck. I tried increasing the number of
workers, using just 1 worker, and using multi_accept - nothing helped. What I
see on the server is that I have more than 500 sockets in SYN_RECV state and
the nginx process seems to be doing nothing… strace shows this:

gettimeofday({1312807826, 920091}, NULL) = 0
epoll_wait(16, {}, 512, 500) = 0
gettimeofday({1312807827, 421795}, NULL) = 0
epoll_wait(16, {}, 512, 500) = 0
gettimeofday({1312807827, 923222}, NULL) = 0
epoll_wait(16, {}, 512, 500) = 0
gettimeofday({1312807828, 424994}, NULL) = 0
epoll_wait(16, {}, 512, 500) = 0
gettimeofday({1312807828, 926449}, NULL) = 0
epoll_wait(16, {}, 512, 500) = 0
gettimeofday({1312807829, 428064}, NULL) = 0
epoll_wait(16, {}, 512, 500) = 0

as if it had nothing to do, despite those waiting connections… I
didn’t change anything on the server or in nginx itself. Maybe the traffic
is different (coming from different countries, for example), but that
shouldn’t have any influence.
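
For reference, the worker and accept settings mentioned above live at the top of nginx.conf; a minimal sketch with illustrative values, not the exact config from this server:

worker_processes  2;           # also tried 1 and more, as described above

events {
    worker_connections  1024;  # illustrative value; the real number is not given in the thread
    multi_accept        on;    # accept as many pending connections as possible per wake-up
}
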
The traffic is being moved from this old server and domain to a new one.
The bad thing is that today the slowness started on the new server as well!
There we have nginx 1.0.5 from the dotdeb Debian repository with php5-fpm.
The traffic goes to one PHP script which does a single INSERT into MySQL
(ad tracking). The new server has 6 GB RAM and a 6-core Xeon CPU and is
almost idle.

epoll_wait(6, {}, 512, 500) = 0
epoll_wait(6, {}, 512, 500) = 0
epoll_wait(6, {}, 512, 500) = 0
epoll_wait(6, {}, 512, 500) = 0
epoll_wait(6, {}, 512, 500) = 0

What could be the problem?

Hello,

On Mon, Aug 8, 2011 at 2:57 PM, Marki555 [email protected] wrote:

Hello,
I am running nginx 0.7.67-3 on Debian, now without PHP, mostly serving
one empty HTML file. It was working fine for several weeks, handling about
800 connections per second with 2 worker processes. A few days ago it
started to be very slow; during peak hours it is almost impossible to connect,
but I don’t see any bottleneck. I tried increasing the number of
workers, using just 1 worker, and using multi_accept - nothing helped. What I
see on the server is that I have more than 500 sockets in SYN_RECV state and
the nginx process seems to be doing nothing… strace shows this:

Could someone or something be trying to SYN flood you? Do they all come
from the same IP/block? Can you provide a little sample of the output of
netstat -tnp | grep SYN_RECV?

A.

Do you think it could be a SYN flood attack? I see it only during peak
hours; if it were an attack, I would expect it to be nonstop. If it
were a SYN flood, how would nginx handle it? SYN_RECV means that the kernel
has received the initial SYN packet but userspace (nginx) hasn’t
replied with SYN+ACK yet. But from strace it seems that nginx is not
receiving those connections…

Every request is from a different IP (as it’s ad tracking, I have more than
3 million different IPs per day). Here is the output:

bill3:~:# netstat -tnp | grep SYN_RECV | sort -k5,5
tcp 0 0 92.x.x.x:80 108.84.25.217:49988 SYN_RECV
tcp 0 0 92.x.x.x:80 171.0.128.220:59857 SYN_RECV
tcp 0 0 92.x.x.x:80 188.22.33.219:63756 SYN_RECV
tcp 0 0 92.x.x.x:80 194.168.179.130:54327 SYN_RECV
tcp 0 0 92.x.x.x:80 2.218.18.53:49980 SYN_RECV
tcp 0 0 92.x.x.x:80 212.106.232.41:3887 SYN_RECV
tcp 0 0 92.x.x.x:80 213.105.53.187:56882 SYN_RECV
tcp 0 0 92.x.x.x:80 213.105.53.187:56948 SYN_RECV
tcp 0 0 92.x.x.x:80 213.107.67.17:56947 SYN_RECV
tcp 0 0 92.x.x.x:80 217.137.153.229:4384 SYN_RECV
tcp 0 0 92.x.x.x:80 46.25.124.158:59649 SYN_RECV
tcp 0 0 92.x.x.x:80 62.254.142.85:59674 SYN_RECV
tcp 0 0 92.x.x.x:80 62.255.147.169:58835 SYN_RECV
tcp 0 0 92.x.x.x:80 62.31.128.35:51695 SYN_RECV
tcp 0 0 92.x.x.x:80 77.100.4.202:56501 SYN_RECV

Hello,

A SYN flood was my first thought (guess I should take my pills… ;)).
But it’s unlikely given the IP patterns.

Are the logs saying anything?

Did you try playing with those:

Good luck with it,

Antoine.

Hello!

On Mon, Aug 08, 2011 at 09:41:06AM -0400, Marki555 wrote:

Do you think it could be a SYN flood attack? I see it only during peak
hours; if it were an attack, I would expect it to be nonstop. If it
were a SYN flood, how would nginx handle it? SYN_RECV means that the kernel
has received the initial SYN packet but userspace (nginx) hasn’t
replied with SYN+ACK yet. But from strace it seems that nginx is not
receiving those connections…

Your understanding of how the TCP stack works isn’t really correct.
Userland (and nginx) will see a connection once it’s ESTABLISHED.
Connections in the SYN_RECV state are sitting in the kernel (traditionally
in the listen socket’s incomplete queue, on modern OSes likely in
a syncache or something like it) and userland won’t be able to
accept() them.

Every request is from a different IP (as it’s ad tracking, I have more than
3 million different IPs per day).

I suggest the most likely cause is network problems: packets are being lost
somewhere in transit, and that’s why you see many incomplete
connections.
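
A few generic ways to look for that kind of loss, as a rough sketch (the interface name eth0 is an assumption, not something given in this thread):

netstat -s | grep -i retrans    # rising retransmit counters hint at loss in transit
ip -s link show eth0            # per-interface RX/TX errors and drops
# grab a sample of handshake (SYN) packets for offline inspection:
tcpdump -ni eth0 -c 1000 'tcp[tcpflags] & tcp-syn != 0' -w syns.pcap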

Maxim D.

Hello!

On Mon, Aug 08, 2011 at 02:01:55PM -0400, Marki555 wrote:

Thanks, I wasn’t sure whether the kernel or nginx replies to the SYN
packets. Now the question is how I can check for the network problems.
SYN_RECV is also when the kernel has replied with SYN+ACK but is waiting
for the final ACK, right? But SYN_RECV also happens when user space is
unable to accept() the new connections fast enough. How can I distinguish
between these two reasons?

Again: SYN_RECV connections are sitting in the kernel waiting for the ACK
to happen. Only ESTABLISHED connections are passed to userland.

SYN_RECV connections may appear on Linux as a result of the listen
queue being overflowed (as Linux just drops ACKs by default if
there isn’t enough room in the listen queue), but this is not the
listen queue, and these connections can’t be accepted. It is rather an
artifact of the way Linux handles listen queue overflows.
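
One generic way to tell whether that overflow is actually happening is to watch the kernel's listen-related counters (a rough check; the exact wording varies by kernel version):

netstat -s | grep -i listen
# typically prints counters such as "... times the listen queue of a socket overflowed"
# and "... SYNs to LISTEN sockets dropped"; steadily growing numbers mean the
# accept queue really is overflowing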

But anyway, why should that prevent legitimate normal connections from
being accepted? When I try to access the nginx server-status page, I am
waiting for many seconds…

First of all, you may want to actually check how many connections
are sitting in the listen queue. Under FreeBSD use

netstat -Lan

to find the queue lengths. Under Linux 2.6.18+ it should be possible
to examine the listen queue with

ss -nlt

For older Linux versions check

netstat -ntp

and count the connections in the ESTABLISHED state without an associated
process.
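
A possible one-liner for that last count, assuming the usual Linux netstat -ntp column layout (state in column 6, owning process in column 7, printed as "-" when no process holds the socket):

# count ESTABLISHED sockets that no process has accepted yet
netstat -ntp 2>/dev/null | awk '$6 == "ESTABLISHED" && $7 == "-"' | wc -l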

Maxim D.

On Mon, 08 Aug 2011 14:01:55 -0400, “Marki555” [email protected]
wrote:

But anyway, why should that prevent legitimate normal connections from
being accepted? When I try to access the nginx server-status page, I am
waiting for many seconds…

Maybe local hosts / DNS issues?

M.

Thanks, I wasn’t sure whether the kernel or nginx replies to the SYN
packets. Now the question is how I can check for the network problems.
SYN_RECV is also when the kernel has replied with SYN+ACK but is waiting
for the final ACK, right? But SYN_RECV also happens when user space is
unable to accept() the new connections fast enough. How can I distinguish
between these two reasons?

But anyway, why should that prevent legitimate normal connections from
being accepted? When I try to access the nginx server-status page, I am
waiting for many seconds…

Thanks for your hints. Fortunately the issues have disappeared. However, I
would still like to know how to troubleshoot this in the future.
The mentioned command produces this output (uninteresting lines
filtered out):
bill3:~:# ss -nlt
Recv-Q  Send-Q  Local Address:Port      Peer Address:Port
0       128     *:111                   *:*
0       128     92.240.244.176:80       *:*
0       5       :::53                   :::*
0       5       *:53                    *:*
0       1024    127.0.0.1:2812          *:*

What is Recv-Q/Send-Q? Is it the listen backlog queue length? How can I
see if the queue is full? On this new server it is 128, but on the old
one it is 511 (for both Apache and nginx). Why?
/proc/sys/net/core/netdev_max_backlog is 1000
/proc/sys/net/ipv4/tcp_max_syn_backlog is 1024
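
For listening sockets, Recv-Q is generally the number of connections currently waiting to be accept()ed and Send-Q is the configured backlog, so the 128 above would be the backlog limit. A rough sketch of raising it (illustrative values; the backlog parameter and somaxconn sysctl are standard, but nothing in this thread confirms they are the fix):

sysctl -w net.core.somaxconn=1024    # kernel-side cap on a socket's accept queue
# and in nginx.conf, inside the relevant server block, request a matching backlog:
#     listen 80 backlog=1024;
# then reload nginx and re-check the Send-Q column with:
ss -nlt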

And what is the number of ESTABLISHED sockets without an associated
process? Are these the connections waiting in the queue to be accepted by
nginx? I have these values now (while nginx is responding to server-status
within a few ms and handling about 150 conn/s):

bill3:~:# munin-run netstat_tcpstates | grep SYN_RECV
SYN_RECV.value 127
bill3:~:# netstat -ntp | grep EST | grep -- \ - | wc -l
60
