Connection timed out while connecting to upstream, syn packet volume is high

I’ve been troubleshooting this issue on and off for the last couple of
days and have not been able to eliminate this error from nginx:

2011/12/12 13:12:32 [error] 27810#0: *36209658 connect() failed (110:
Connection timed out) while connecting to upstream

Our setup is a single machine running nginx that serves as a reverse
proxy to two separate apache servers. The setup is handling perhaps a
few dozen requests per second. The boxes themselves are almost idle, so
load is not an issue in that respect.

The upstream apache servers are not getting hit hard and are nowhere
near their MaxClients settings.

This error was happening every second or two on the nginx side until I
set “net.ipv4.tcp_syncookies = 1” on the apache machines (not sure why
it wasn’t turned on by default). After making that change the
frequency dropped to 1 or 2 connection timeouts per minute. Similarly,
leaving syncookies off but increasing “net.ipv4.tcp_max_syn_backlog”
alleviated the issue as well, though it didn’t stop it entirely.
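
For anyone wanting to confirm which values are actually in effect on
each apache machine, something like the following could be used. It is
only a sketch: it assumes a Linux box that exposes the usual files under
/proc/sys/net/ipv4, and the helper name read_tcp_settings is made up for
illustration.

```python
from pathlib import Path

# The two kernel settings discussed above, as file names under /proc/sys/net/ipv4.
SETTINGS = ("tcp_syncookies", "tcp_max_syn_backlog")


def read_tcp_settings(base="/proc/sys/net/ipv4"):
    """Return {setting: int or None}; None means the value could not be read."""
    values = {}
    for name in SETTINGS:
        try:
            # Each file holds a single integer followed by a newline.
            values[name] = int((Path(base) / name).read_text().split()[0])
        except (OSError, ValueError, IndexError):
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_tcp_settings().items():
        print(f"net.ipv4.{name} = {value}")
```

Running it on both apache machines before and after a change makes it
easy to see whether a sysctl edit really took effect.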

If I use tcpdump to count SYN packets I see 8x the volume going to these
nginx-fronted apache machines compared to what I see in other
datacenters that handle far more load (hundreds of concurrent
connections) behind traditional hardware load balancers.
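
For reference, this kind of count can be reproduced by feeding tcpdump's
text output through a small counter. This is a rough sketch, not the
exact method used above: it assumes the output format of recent tcpdump
versions, where each line starts with an HH:MM:SS.fraction timestamp and
a bare SYN is printed as "Flags [S]"; the function name
count_syns_per_second is hypothetical.

```python
from collections import Counter


def count_syns_per_second(lines):
    """Count initial SYN packets per wall-clock second in tcpdump text output.

    Assumes lines such as
    '13:12:32.104512 IP 10.0.0.1.5000 > 10.0.0.2.80: Flags [S], seq 1, ...'.
    SYN-ACKs are printed as 'Flags [S.]' and are deliberately not counted.
    """
    per_second = Counter()
    for line in lines:
        if "Flags [S]," not in line:
            continue
        timestamp = line.split(" ", 1)[0]          # e.g. '13:12:32.104512'
        per_second[timestamp.split(".")[0]] += 1   # drop the fractional part
    return per_second
```

It accepts any iterable of lines, so it can be called with sys.stdin
while piping in something like "tcpdump -l -nn -i eth0 'tcp dst port 80'"
(interface and port are placeholders for whatever the apache machines
use).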

Is there a reason nginx would be generating so many SYN packets? Or does
anyone have recommended settings for alleviating it as much as
possible?

Any help is greatly appreciated.

Posted at Nginx Forum:

Are you sure this isn’t something network related, like packet
loss/reordering/duplication? What nginx version are you running?

I am not sure about that but am digging in more now to try and get more
information. We do have a lot of infrastructure in this datacenter
though and I haven’t previously run into issues along these lines.

I am running nginx 0.7.65 on Ubuntu 10.04.3 LTS. Seems using the
current version might be worth a shot as well, didn’t realize I was that
far behind.
