Upstream server max_fails and fail_timeout question/discussion

rsawer · October 1, 2010, 3:33pm

Hi!
I have a question about how the max_fails and fail_timeout settings
affect the error pages that nginx shows.

First, God created an upstream:
upstream my_upstream {
    server 192.168.104.58:81 max_fails=2 fail_timeout=30m;
    server 192.168.104.58:82 max_fails=2 fail_timeout=30m;
}

I’m just checking nginx’s behaviour to learn which error pages it
displays for different errors, so on the upstream server I have
firewalled connections to ports 81 and 82 with iptables -I INPUT -p tcp
--dport 81 -j DROP (and the same for port 82).

Testing:
a) First request: after the backend connections time out, nginx returns
a 504 error page (gateway timeout), which is correct (and it is the 1st
fail counted against max_fails)
b) Second request: as above, after the timeout nginx returns a 504 error
page (2nd fail, so the max_fails limit is reached), which is also correct
c) Third request: all backends are in the failed state, so nginx
immediately returns a 502 error page (bad gateway)

And here my doubts begin. For the 4th, 5th, 6th, etc. requests I
expected to keep getting 502s until the 30-minute fail_timeout ended.
Instead, what I see is that nginx resets the failure count once all
servers in the upstream block have reached max_fails.

What I wanted to ask is whether this is the proper behaviour.
I think there should be another configuration parameter that tells
nginx what to do in such cases.

For me it’s quite logical that nginx resets this value, so that it can
check whether the backend/upstream server is alive again instead of
returning 502s for the whole 30-minute fail_timeout. But the world is
big, and I think there are people who would like nginx to return 502s
for the whole fail_timeout period. What do you think about that?

Best regards,
Rafal Sawer - satisfied Nginx user

Posted at Nginx Forum:

rsawer · October 3, 2010, 1:32pm

Hello!

On Fri, Oct 01, 2010 at 09:32:00AM -0400, rsawer wrote:

[…]

> And here my doubts begin. For the 4th, 5th, 6th, etc. requests I
> expected to keep getting 502s until the 30-minute fail_timeout ended.
> Instead, what I see is that nginx resets the failure count once all
> servers in the upstream block have reached max_fails.

Yes.

> What I wanted to ask is whether this is the proper behaviour.
> I think there should be another configuration parameter that tells
> nginx what to do in such cases.

> For me it’s quite logical that nginx resets this value, so that it can
> check whether the backend/upstream server is alive again instead of
> returning 502s for the whole 30-minute fail_timeout. But the world is
> big, and I think there are people who would like nginx to return 502s
> for the whole fail_timeout period. What do you think about that?

There is little sense in returning 502, and that’s why the “down due
to max_fails” status is reset once nginx detects there are no live
upstreams.

Maxim D.
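
The behaviour discussed in this thread can be illustrated with a small toy model. This is only a sketch and not nginx's actual code: the names Peer, handle_request and backend_up are made up for illustration, and it assumes that every connection attempt to a firewalled backend times out and that a failed attempt is passed on to the next available server within the same request. Under those assumptions it reproduces the 504, 504, 502 pattern from the original post, followed by a fresh round of attempts after the reset.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    """Hypothetical stand-in for one `server` line of the upstream block."""
    name: str
    max_fails: int
    fail_timeout: float  # seconds
    fails: int = 0
    last_fail: float = 0.0

    def available(self, now: float) -> bool:
        # A peer is skipped only while its failure budget is used up and
        # the fail_timeout window since its last failure has not elapsed.
        if self.fails < self.max_fails:
            return True
        return now - self.last_fail > self.fail_timeout


def handle_request(peers, now, backend_up=False):
    """Return the status code a client would see in this toy model."""
    candidates = [p for p in peers if p.available(now)]
    if not candidates:
        # No live upstreams: answer 502 immediately and clear the failure
        # counters, so the next request probes the backends again.
        for p in peers:
            p.fails = 0
        return 502
    for p in candidates:
        if backend_up:
            p.fails = 0
            return 200
        # The attempt timed out: count a failure and try the next peer.
        p.fails += 1
        p.last_fail = now
    # Every peer that was tried timed out.
    return 504


peers = [
    Peer("192.168.104.58:81", max_fails=2, fail_timeout=30 * 60),
    Peer("192.168.104.58:82", max_fails=2, fail_timeout=30 * 60),
]
# Six requests, one second apart, with both backends firewalled.
statuses = [handle_request(peers, now=float(t)) for t in range(6)]
print(statuses)
```

In this model the third request is the only one answered without a new connection attempt; the reset it triggers is what makes the fourth request wait for a timeout again instead of returning 502 for the rest of the 30 minutes.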