Nginx not responding to some SYN packets

I have a machine running the latest development version, nginx-0.7.33 (I tried 0.6.35 with the same results), serving small images (less than 10K each), and I am seeing in tcpdump that some SYN packets are not responded to right away. The browser retransmits these image requests every second, and on the 2nd or 3rd resent SYN I finally see an ACK, and the images load.
It is very nondeterministic, and I can only reproduce it some of the time. When it does occur, the outcome is a page with some images loaded while others (whose SYNs were never ACKed) are not… a few seconds later they load.
Typically the system has ~2000 active connections, most in keepalive. The load is around 100-200 req/sec.

I have tried all sorts of settings and configurations suggested on the mailing list, but I still don't have a solution for this issue: from 1 to 4 workers, changing the connection counts, different event handlers, kernel buffers, etc.
It just seems so anecdotal to change a bunch of settings without being able to see what is happening internally.
I'd like to be able to debug a little deeper to find out what is happening to these packets.

How would I go about debugging the cause? Is it the interface driver, the kernel, or nginx? What kind of tools and debugging options can I try next?

Thanks, Radek

System Details
model name : Intel® Xeon® CPU X3210 @ 2.13GHz
Linux n06 2.6.18-92.1.22.el5 #1 SMP Tue Dec 16 12:03:43 EST 2008 i686 i686 i386 GNU/Linux
eth0
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: g
Wake-on: d
Current message level: 0x000000ff (255)
Link detected: yes
driver: tg3
version: 3.86
firmware-version: 5721-v3.61, ASFIPMI v6.21
bus-info: 0000:05:00.0

avg-cpu: %user %nice %system %iowait %steal %idle
          0.10  0.00    0.20    2.43   0.00  97.27
Device: rrqm/s wrqm/s   r/s  w/s  rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda       0.00   0.00 27.40 0.00  443.20   0.00    16.18     0.10  3.50  3.32  9.10
No overruns or errors on the interface.
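For reference, the interface details above come from ethtool; the NIC's own drop counters are worth watching too. Something like this (the tg3 statistic names may differ, so treat the grep pattern as a guess):

ethtool -S eth0 | egrep -i 'drop|discard|err'    # per-NIC hardware counters
ifconfig eth0 | egrep 'errors|dropped|overruns'  # the kernel's view of the device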

On Tue, Feb 03, 2009 at 10:07:27PM -0800, Radek Burkat wrote:

load is around 100-200 req/sec.
How would I go about debugging the cause? Is it the interface driver, the kernel, or nginx? What kind of tools and debugging options can I try next?

nginx has no relation to the TCP handshake (SYN, SYN/ACK, ACK); this is a kernel or driver issue.

Thanks, Igor, for eliminating one variable for me. I noticed that when I use a single worker and set its affinity to the same core as the ethernet interface interrupt, I decrease the frequency of the issue but do not eliminate it; I am not sure why.
Are there any kernel/network debugging tools that reach a little deeper than tcpdump?

Radek

Hello,

Have you tried disabling syncookies and/or adjusting them?
The Varnish documentation ( http://varnish.projects.linpro.no/wiki/Performance ) suggests at least this setup (in /etc/sysctl.conf):

net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_max_syn_backlog = 262144
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
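These can be applied live while testing, either all at once or one key at a time:

sysctl -p                                # reload /etc/sysctl.conf
sysctl -w net.ipv4.tcp_syncookies=0      # or change a single key on the fly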

And what about netfilter connection tracking?
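If the ip_conntrack module is loaded and its table fills, new connections are also dropped silently. A quick check (paths from memory for a 2.6.18/CentOS 5 kernel, so verify them):

cat /proc/sys/net/ipv4/netfilter/ip_conntrack_count 2>/dev/null
cat /proc/sys/net/ipv4/netfilter/ip_conntrack_max   2>/dev/null
dmesg | grep -i conntrack    # look for "table full, dropping packet"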

Olivier

Radek Burkat wrote:

On Mon, Feb 09, 2009 at 01:51:12PM -0800, Radek Burkat wrote:

From what I read so far, syncookies go into action when you hit a certain high-water mark on your backlog buffer.
This information seems quite important in determining your server's latency.

In nginx there is the optional backlog variable in the listen configuration. I imagine this maps directly to the listen() system call, though I see that the nginx backlog default is -1. How does that translate to the system parameter, which requires an int above 0?

Yes, backlog may affect this. However, it seems that you are running a very old nginx, as this was fixed in 2007:

Changes with nginx 0.6.7                    15 Aug 2007
Changes with nginx 0.5.32                   24 Sep 2007

*) Bugfix: now nginx uses default listen backlog value 511 on all
   platforms except FreeBSD.
   Thanks to Jiang Hong.

Now -1 is used on FreeBSD only, and it means the maximum available backlog.

Besides, net.ipv4.tcp_max_syn_backlog = 10 is too small.

Also, when the kernel does not have enough backlog, it usually responds with an RST packet, if this is not disabled.
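On Linux, the sysctl that controls this behavior appears to be tcp_abort_on_overflow (default 0, i.e. overflowed connections are dropped silently rather than reset):

sysctl net.ipv4.tcp_abort_on_overflow    # 0 = silent drop, 1 = send RST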

I have been looking at this issue to try to understand the interactions, and this is what I have found out so far.
I can reproduce the issue by decreasing net.ipv4.tcp_max_syn_backlog and setting net.ipv4.tcp_syncookies = 0. It makes sense that when the load gets a little high, the system does not process accepts fast enough and the backlog builds up. When the backlog fills up, the ACK to the initial SYN is not issued.
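For reference, the reproduction is just two sysctls (the values are deliberately pathological):

sysctl -w net.ipv4.tcp_syncookies=0
sysctl -w net.ipv4.tcp_max_syn_backlog=10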
It would seem that enabling syncookies uses some other mechanism, as I do not see the dropped connections when I have net.ipv4.tcp_syncookies = 1 and the backlog set to some small value like net.ipv4.tcp_max_syn_backlog = 10. From what I read so far, syncookies go into action when you hit a certain high-water mark on your backlog buffer. Raising the backlog value and/or enabling syncookies seems like a good idea.
You can look at how many connections are in the backlog with netstat:

netstat -aln | grep -c SYN_RECV
146

The problem is that this gives you an instantaneous count of the backlog. As you can imagine, this backlog is being filled and cleared at a high rate. It would be nice to know when it actually gets full, or even what the max has been over a period of time. Is there a stat like this in the network statistics?
This information seems quite important in determining your server's latency.
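One thing I still need to confirm: the kernel does seem to keep cumulative SNMP counters for exactly this, visible in netstat -s (the exact wording varies by kernel version):

netstat -s | egrep -i 'listen|overflow'
#   e.g. "N times the listen queue of a socket overflowed"
#        "N SYNs to LISTEN sockets ignored"
watch -n 1 "netstat -s | egrep -i 'listen|overflow'"   # watch the deltas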

In nginx there is the optional backlog variable in the listen configuration. I imagine this maps directly to the listen() system call, though I see that the nginx backlog default is -1. How does that translate to the system parameter, which requires an int above 0?

Radek

On Tue, Feb 10, 2009 at 12:09:50AM -0800, Radek Burkat wrote:

I actually tried the latest 0.6x and .7x. The value of 10 was just for

Oh, -1 should be fixed in the wiki.

testing and reproducing what I was seeing. I had the system backlog at 1024, so I must have been hitting the 511 of nginx. I verified, at least on rhel5/centos5, that when the backlog is reached, no RST packet is sent. The SYN is just dropped and ignored. This is also consistent with the documented backlog behavior in Stevens, Unix Network Programming.

One of the things the documentation suggests, but is very vague about, is that the backlog count may include not only connections in the SYN_RECV state but also connections in a soft ESTABLISHED phase.

Yes, the backlog also holds ESTABLISHED, but not yet accept()ed, sockets. However, on FreeBSD the backlog does not account for sockets in the syncache (SYN_RECV state). I do not know about Linux.

I actually tried the latest 0.6x and 0.7x. The value of 10 was just for testing and reproducing what I was seeing. I had the system backlog at 1024, so I must have been hitting the 511 of nginx. I verified, at least on rhel5/centos5, that when the backlog is reached, no RST packet is sent. The SYN is just dropped and ignored. This is also consistent with the documented backlog behavior in Stevens, Unix Network Programming.

One of the things the documentation suggests, but is very vague about, is that the backlog count may include not only connections in the SYN_RECV state but also connections in a soft ESTABLISHED phase. That is, the server has sent out the SYN/ACK but has not yet received the ACK/first data from the client. I'll try to verify this.
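A rough way I might verify it (just a sketch; the pkill pattern assumes the default worker process title):

pkill -STOP -f 'nginx: worker'    # freeze the workers so nothing is accept()ed
# ...open more connections than the backlog from another host (e.g. ab -c 600)...
netstat -tan | awk '$4 ~ /:80$/ {print $6}' | sort | uniq -c   # count states
pkill -CONT -f 'nginx: worker'    # resume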

It would seem that even on a moderately busy server serving pages of images it is very easy to hit the 511 backlog. Even though my server can saturate a gigabit at 16k req/s under ab, I hit this backlog limit pretty regularly under conditions where the average is 200 req/s.
As I explained earlier, when this happens your image is delayed by at least a second until the browser re-requests it. Worst of all, there are no errors or statistics that would suggest this is happening. Of course I can just increase the backlog, but at this point I might as well try to understand the mechanism and figure out how to detect it in the future.
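For anyone else hitting this, the fix I am testing raises all three limits together (values illustrative; note that net.core.somaxconn silently caps whatever backlog listen() asks for):

sysctl -w net.core.somaxconn=4096            # kernel cap on any listen() backlog
sysctl -w net.ipv4.tcp_max_syn_backlog=4096  # half-open (SYN_RECV) queue
# and in nginx.conf, since nginx defaults to 511 on Linux:
#   listen 80 backlog=4096;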

I'll investigate some more and post here, in the hope that this may help others debug similar issues.

Radek