Each night we take our backend servers offline at specific times for
maintenance. When the application servers restart they immediately begin
answering HTTP requests from Nginx, but we want to keep them out of the
upstream pool for about 30 minutes while they cache information from our
data providers. To do this, I created iptables rules in cron on the
application servers to block all communication from our Nginx reverse
proxies and then delete the rule after 30 minutes.
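Roughly, the crontab entries on each application server look like this (the 02:00 start time is just illustrative, not our real schedule; the iptables rules are the ones shown further down in the thread):

# block traffic from the Nginx/proxy subnet when maintenance starts
0 2 * * * /sbin/iptables -I INPUT -s 192.168.1.0/24 -j DROP
# lift the block 30 minutes later, once the caches are warm
30 2 * * * /sbin/iptables -D INPUT -s 192.168.1.0/24 -j DROP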
However, Nginx still seems to think the server that is blocking it via
iptables is online, adds it back to the upstream pool, then times it out and
takes it back out. This causes our alerting system to go haywire throwing
HTTP Read Timeouts and our clients to be unable to connect to our
application.
Our upstream block is simple:
upstream app_servers {
    ip_hash;
    server 192.168.1.12:8080 max_fails=3 fail_timeout=30s;
    server 192.168.1.13:8080 max_fails=3 fail_timeout=30s;
}
We’re running Nginx 1.4.
Any ideas on why this would happen and ways we can avoid it?
One question: do you shut down all your app servers at once, or one-by-one so
that you still have an available application?
There is the "down" option for your upstream block to disable servers even if
they are up, but driving that from a dynamic process might get very fiddly; a
rough sketch is below.
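Something like this (just a sketch of the "down" parameter; in 1.4 you would have to edit the config and reload nginx each time you toggle it):

upstream app_servers {
    ip_hash;
    # mark the warming-up server "down" for ~30 minutes, then remove the flag
    # and reload nginx; with ip_hash, "down" also preserves the client hashing
    server 192.168.1.12:8080 max_fails=3 fail_timeout=30s down;
    server 192.168.1.13:8080 max_fails=3 fail_timeout=30s;
}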
What do you use for the iptables rules, DROP or REJECT (reset)?

I'd debug your server/app ports while the iptables script is enforcing the
block; my gut feeling is that nginx is not the faulty link in the chain.

What do your logs tell you when your app servers come up again while the
iptables block is enforced?
We shut them down one-by-one, 45 minutes apart. The issue only seems to
occur when the first server listed is blocked, however. We don't see the read
timeouts if I leave the iptables rules enabled on the second server. I think
that may be a false symptom related to ip_hash binding clients to the first
server.
Here are the iptables rules:
Drop rule: iptables -I INPUT -s 192.168.1.0/24 -j DROP
Allow rule: iptables -D INPUT -s 192.168.1.0/24 -j DROP
I also thought about trying to add “down” to the servers in the upstream
block, but as you said that would be rather complex to script.
The only error I see is a 499 error in the Nginx logs, followed by a
200:
On Mon, May 06, 2013 at 12:12:44PM -0400, mevans336 wrote:

> Hi Mex,
>
> We shut them down one-by-one, 45 minutes apart. The issue only seems to
> occur when the first server listed is blocked, however. We don't see the
> read timeouts if I leave the iptables rules enabled on the second server.
> I think that may be a false symptom related to ip_hash binding clients to
> the first server.
Timeouts are expected to appear in logs once per fail_timeout= specified
(after fail_timeout expires, nginx will route one request to the server in
question to check whether it's alive again). As only certain IPs are mapped
to the blocked server with ip_hash, it might be nontrivial to test things
with low traffic.
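In terms of the upstream block posted above, that behaviour reads roughly like this (values unchanged, the comments are just my reading of it):

upstream app_servers {
    ip_hash;
    # after max_fails=3 failures within 30s this server is marked unavailable
    # for fail_timeout=30s; when that expires, one real client request is sent
    # to it as a probe, and that probe is what times out while iptables is
    # still dropping packets from the proxies
    server 192.168.1.12:8080 max_fails=3 fail_timeout=30s;
    server 192.168.1.13:8080 max_fails=3 fail_timeout=30s;
}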
> Here are the iptables rules:
>
> Drop rule: iptables -I INPUT -s 192.168.1.0/24 -j DROP
> Allow rule: iptables -D INPUT -s 192.168.1.0/24 -j DROP
If you REJECT from iptables, you tell the client immediately that the
service/port is not available; otherwise you run into timeouts, yes. A rough
REJECT variant of your rules is sketched below.
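Something like this (untested sketch; the -p tcp --dport 8080 restriction is my addition so that only the app port is blocked):

# reject connections from the proxy subnet with a TCP reset during warm-up
iptables -I INPUT -s 192.168.1.0/24 -p tcp --dport 8080 -j REJECT --reject-with tcp-reset
# and lift it again afterwards
iptables -D INPUT -s 192.168.1.0/24 -p tcp --dport 8080 -j REJECT --reject-with tcp-reset

With the reset, nginx gets an immediate connection refused and counts the failure right away instead of hanging until a timeout.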
I'm not quite sure, but max_fails=3 x fail_timeout=30s == 90 seconds until
your nginx fails over to the other server.