Ubuntu+Nginx packet loss / dropped upstream connections

Detlef_R · October 29, 2013, 7:36pm

Hello Nginx experts,

We make heavy use of Nginx as a reverse proxy/load balancer. It communicates
with Apache and Tornado hosts upstream and proxies them publicly on ports
80/443, which is pretty standard.

The problem is that when pinging the LB host, roughly every 30 pings one ping is
completely dropped and the latency immediately jumps from <20ms to 1000ms,
then calms down again after a few pings.

We are receiving a lot of messages like:

2013/10/28 08:49:05 [error] 20612#0: *77590567 recv() failed (104:
Connection reset by peer) while reading response header from upstream,
client: 50.xx.xx.169, server: loadbalancer, request: "GET / HTTP/1.1",
upstream: "http://10.xx.xx.84:8014/", host: "loadbalancer"

and:

2013/10/28 08:49:05 [error] 20612#0: *77590567 no live upstreams while
connecting to upstream, client: 50.xx.xx.169, server: loadbalancer,
request: "GET / HTTP/1.1", upstream: "http://api-read-frontends/",
host: "loadbalancer"

Is there any configuration in Nginx that could be causing this? We have also
customized sysctl.conf to try to fix it, with no luck so far. There’s more info,
ping dumps, and our sysctl file attached to this question:

Thanks in advance, any help is immensely appreciated. Nginx is awesome!

Posted at Nginx Forum:

sgammon · May 22, 2014, 9:05pm

Did you ever get a response to this? We are seeing the following:

no live upstreams while connecting to upstream…

We know the upstream servers are not crashing, but we are trying to determine
how and why they are being deemed down. Nginx gives no information on
this, and the servers show no errors either.


sgammon · May 23, 2014, 10:53am

Hello!

On Thu, May 22, 2014 at 03:05:22PM -0400, slowredbike wrote:

> Did you ever get a response to this…We are seeing the following:
>
> no live upstreams while connecting to upstream…
>
> we know the upstream servers are not crashing but are trying to determine
> how/why they are being deemed as down. The nginx gives no information on
> this and the servers show no errors either.

All servers in the upstream block are considered down due to
errors encountered while working with the servers previously.
Relevant information about errors should be in logs.

See max_fails/fail_timeout parameters of the server directive in
the documentation for details:

http://nginx.org/en/docs/http/ngx_http_upstream_module.html#server
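
As a rough illustration of the parameters mentioned above, here is a minimal
sketch of an upstream block. The upstream name is taken from the log line
earlier in the thread; the server addresses and the chosen values are
placeholders, not the poster's actual configuration. When not set, max_fails
defaults to 1 and fail_timeout to 10s, so a single failed attempt can mark a
server as unavailable for 10 seconds, and if that happens to every server in
the block, nginx reports "no live upstreams".

```nginx
# Sketch only: addresses and values are hypothetical placeholders.
upstream api-read-frontends {
    # A server is considered unavailable after max_fails failed attempts
    # within fail_timeout, and is then skipped for fail_timeout.
    server 10.0.0.1:8014 max_fails=3 fail_timeout=30s;
    server 10.0.0.2:8014 max_fails=3 fail_timeout=30s;

    # max_fails=0 would disable failure accounting for a server entirely.
}
```

What counts as a failed attempt is governed by the proxy_next_upstream
directive (errors and timeouts by default), so raising max_fails only hides
the symptom if the upstream really is resetting connections.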

--
Maxim D.
http://nginx.org/

sgammon · May 23, 2014, 6:05pm

Would you recommend any extended or additional debugging that I should enable
to help us track this down?
