This also explains, in the discussion of fail_timeout, how long
nginx considers a failed upstream server to be down.
Thank you for the clarification about fail_timeout. Based on your
explanation, and some additional testing, this setting does indeed
control how long nginx waits before trying the server again (blindly,
not knowing whether the box is back alive or not, unfortunately). I
didn't readily get that from the docs.
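For anyone following along, here is roughly the kind of upstream block
this applies to (the addresses and values below are just illustrative,
not my real config):

    upstream backend {
        # After max_fails failed attempts within the fail_timeout
        # window, nginx marks the server down for fail_timeout seconds,
        # then simply starts sending it live requests again - there is
        # no background health check.
        server 10.0.0.1:8080 max_fails=1 fail_timeout=10s;
        server 10.0.0.2:8080 max_fails=1 fail_timeout=10s;
    }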
It is very disappointing to see that this is how nginx works. It is
designed to be the fastest-performing engine of its kind on the
planet, and it seems to be! So it is odd that it would have a design
that blindly reattempts a known-down server. It would be a great
enhancement to have some background process detect whether the box is
back up.
In my particular case the challenge is that if an Amazon EC2 instance
hosting an upstream server goes down, it takes a long time before
nginx counts it as down. To counter this I've set proxy_connect_timeout
to 500ms, since the connect time when the instances are alive is
generally just a millisecond or two.
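In case it helps, the relevant proxy bits look roughly like this (the
location and upstream name are made up for the example):

    location / {
        proxy_pass http://backend;
        # Give up on the TCP connect after 500ms; live instances
        # normally connect in a millisecond or two.
        proxy_connect_timeout 500ms;
        # On a connect error or timeout, try the next upstream server.
        proxy_next_upstream error timeout;
    }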
I am a bit nervous about using such a short timeout because of the
possibility of a false positive. For instance, what if Amazon has a
network blip that makes the server unreachable for a second or two?
I've tried to counter this by using a short fail_timeout parameter.
But of course the downside of that is that WHEN (not if, but WHEN) an
EC2 instance goes down, nginx will keep delaying the response for some
users by 500ms by blindly retrying it. So I'm kind of stuck in the
middle - I need a short fail_timeout to ensure a recovered box is
detected as such, but I can't make it too short because that means
delaying users over and over while the box is down.
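To put rough numbers on the tradeoff (made-up figures, assuming the
500ms connect timeout above and max_fails=1):

    fail_timeout=2s  -> while an instance is down, roughly one unlucky
                        request every 2 seconds waits the extra 500ms
                        before being retried on a live server, but a
                        recovered instance gets traffic again within
                        about 2 seconds.
    fail_timeout=60s -> far fewer requests eat the 500ms penalty, but a
                        recovered instance can sit idle for up to a
                        minute.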
It looks like I will have to consider one of the external monitoring
tools like God - I really don't want to have to go there. I would much
prefer it if nginx handled this more efficiently. Thank you.