We make heavy use of Nginx as a reverse proxy/load balancer. It communicates
with Apache and Tornado hosts upstream and proxies them publicly on ports
80/443 - pretty standard.

The problem is that when pinging the LB host, roughly every 30 pings one ping
is completely dropped and the latency immediately jumps from <20ms to 1000ms,
then calms down again after a few pings.
We are receiving a lot of messages like:

2013/10/28 08:49:05 [error] 20612#0: *77590567 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 50.xx.xx.169, server: loadbalancer, request: "GET / HTTP/1.1", upstream: "http://10.xx.xx.84:8014/", host: "loadbalancer"

and:

2013/10/28 08:49:05 [error] 20612#0: *77590567 no live upstreams while connecting to upstream, client: 50.xx.xx.169, server: loadbalancer, request: "GET / HTTP/1.1", upstream: "http://api-read-frontends/", host: "loadbalancer"

Is there any configuration in Nginx that could be causing this? We have also
customized sysctl.conf to try to fix it, with no luck so far. There’s more
info, ping dumps, and our sysctl file attached to this question.

Thanks in advance; any help is immensely appreciated. Nginx is awesome!

Did you ever get a response to this? We are seeing the following:

no live upstreams while connecting to upstream…

We know the upstream servers are not crashing, but we are trying to determine
how/why they are being deemed down. Nginx gives no information on this, and
the servers show no errors either.

On Thu, May 22, 2014 at 03:05:22PM -0400, slowredbike wrote:
> Did you ever get a response to this…We are seeing the following:
> no live upstreams while connecting to upstream…
> we know the upstream servers are not crashing but are trying to determine
> how/why they are being deemed as down. The nginx gives no information on
> this and the servers show no errors either.

All servers in the upstream block are considered down due to
errors encountered while working with the servers previously.
Relevant information about errors should be in logs.
See the max_fails/fail_timeout parameters of the server directive in
the documentation for details.
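
For illustration, here is a minimal sketch of how those parameters appear in an upstream block. The upstream name api-read-frontends is taken from the log above; the server addresses and the chosen values are hypothetical placeholders, not the original poster's configuration. With the defaults (max_fails=1, fail_timeout=10s), a single failed attempt, such as the "Connection reset by peer" error above, is enough to mark a server unavailable for 10 seconds, and once every server in the block is in that state nginx reports "no live upstreams".

```nginx
upstream api-read-frontends {
    # Hypothetical addresses; substitute the real backend hosts.
    # Tolerate 3 failed attempts within 30s before marking a server
    # unavailable for 30s (defaults: max_fails=1, fail_timeout=10s).
    server 10.0.0.1:8014 max_fails=3 fail_timeout=30s;
    server 10.0.0.2:8014 max_fails=3 fail_timeout=30s;
}
```

Raising max_fails only makes nginx more tolerant of intermittent errors; the connection resets logged for the backends would still need to be investigated on the upstream side.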