Each night we take our backend servers offline at specific times for
maintenance. When the application servers restart they immediately begin
answering HTTP requests from Nginx, but we want to keep them out of the
upstream pool for about 30 minutes while they cache information from our
data providers. To do this, I created iptables rules in cron on the
application servers to block all communication from our Nginx reverse
proxies and then delete the rule after 30 minutes.
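Roughly, the crontab entries on each application server look like this (the 02:00 start time is just illustrative, not our real schedule; the iptables rules are the ones shown further down in the thread):

# block traffic from the Nginx/proxy subnet when maintenance starts
0 2 * * * /sbin/iptables -I INPUT -s 192.168.1.0/24 -j DROP
# lift the block 30 minutes later, once the caches are warm
30 2 * * * /sbin/iptables -D INPUT -s 192.168.1.0/24 -j DROP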
However, Nginx still seems to think the server that is blocking it via
iptables is online, adds it back to the upstream pool, then times it out and
takes it back out. This causes our alerting system to go haywire throwing
HTTP Read Timeouts and our clients to be unable to connect to our
application.
Our upstream block is simple:
upstream app_servers {
    ip_hash;
    server 192.168.1.12:8080 max_fails=3 fail_timeout=30s;
    server 192.168.1.13:8080 max_fails=3 fail_timeout=30s;
}
We’re running Nginx 1.4.
Any ideas on why this would happen and ways we can avoid it?
One question: do you shut down all your app servers at once, or one-by-one so
that you still have an available application?
There is the "down" option for your upstream block to disable servers even if
they are up, but driving that from a dynamic process might get very fiddly; a
rough sketch is below.
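Something like this (just a sketch of the "down" parameter; in 1.4 you would have to edit the config and reload nginx each time you toggle it):

upstream app_servers {
    ip_hash;
    # mark the warming-up server "down" for ~30 minutes, then remove the flag
    # and reload nginx; with ip_hash, "down" also preserves the client hashing
    server 192.168.1.12:8080 max_fails=3 fail_timeout=30s down;
    server 192.168.1.13:8080 max_fails=3 fail_timeout=30s;
}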
What do you use for the iptables rules, DROP or REJECT (reset)?

I'd debug your server/app ports while the iptables script is enforcing the
block; my gut feeling is that nginx is not the faulty link in the chain.

What do your logs tell you when your app servers come up again while the
iptables block is enforced?
We shut them down one-by-one, 45 minutes apart. The issue only seems to
occur when the first server listed is blocked, however. We don't see the read
timeouts if I leave the iptables rules enabled on the second server. I think
that may be a false symptom related to ip_hash binding clients to the first
server.
Here are the iptables rules:
Drop rule: iptables -I INPUT -s 192.168.1.0/24 -j DROP
Allow rule: iptables -D INPUT -s 192.168.1.0/24 -j DROP
I also thought about trying to add “down” to the servers in the upstream
block, but as you said that would be rather complex to script.
The only error I see is a 499 error in the Nginx logs, followed by a
200:
On Mon, May 06, 2013 at 12:12:44PM -0400, mevans336 wrote:

> Hi Mex,
>
> We shut them down one-by-one, 45 minutes apart. The issue only seems to
> occur when the first server listed is blocked, however. We don't see the
> read timeouts if I leave the iptables rules enabled on the second server.
> I think that may be a false symptom related to ip_hash binding clients to
> the first server.
Timeouts are expected to appear in logs once per fail_timeout= specified
(after fail_timeout expires, nginx will route one request to the server in
question to check whether it's alive again). As only certain IPs are mapped
to the blocked server with ip_hash, it might be nontrivial to test things
with low traffic.
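In terms of the upstream block posted above, that behaviour reads roughly like this (values unchanged, the comments are just my reading of it):

upstream app_servers {
    ip_hash;
    # after max_fails=3 failures within 30s this server is marked unavailable
    # for fail_timeout=30s; when that expires, one real client request is sent
    # to it as a probe, and that probe is what times out while iptables is
    # still dropping packets from the proxies
    server 192.168.1.12:8080 max_fails=3 fail_timeout=30s;
    server 192.168.1.13:8080 max_fails=3 fail_timeout=30s;
}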
> Here are the iptables rules:
>
> Drop rule: iptables -I INPUT -s 192.168.1.0/24 -j DROP
> Allow rule: iptables -D INPUT -s 192.168.1.0/24 -j DROP
If you REJECT from iptables, you tell the client immediately that the
service/port is not available; otherwise you run into timeouts, yes. A rough
REJECT variant of your rules is sketched below.
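Something like this (untested sketch; the -p tcp --dport 8080 restriction is my addition so that only the app port is blocked):

# reject connections from the proxy subnet with a TCP reset during warm-up
iptables -I INPUT -s 192.168.1.0/24 -p tcp --dport 8080 -j REJECT --reject-with tcp-reset
# and lift it again afterwards
iptables -D INPUT -s 192.168.1.0/24 -p tcp --dport 8080 -j REJECT --reject-with tcp-reset

With the reset, nginx gets an immediate connection refused and counts the failure right away instead of hanging until a timeout.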
I'm not quite sure, but max_fails=3 x fail_timeout=30s == 90 seconds until
your nginx fails over to the other server.