Work-around for upstream_fair bug?

Hi all,

We have nginx set up as a reverse proxy in front of a couple of Apache
instances. Every so often Apache takes longer than 60 seconds to respond,
and nginx then pulls that server out of rotation and never adds it back.
We are using the upstream_fair module because it does a nicer job of
balancing than plain round robin, but we are clearly running into this bug:

“Upstream servers are not reintroduced to the pool after becoming
available”
http://nginx.localdomain.pl/ticket/3

This bug has been open for 16 months, so it doesn’t seem likely to get
fixed any time soon. Are there any known workarounds? I tried bumping
max_fails up from the default of 1, but that didn’t seem to help: any
single slow response from Apache still pulls that server out of
rotation. Here is the relevant section of our nginx.conf:

upstream www.foo.com {
    server public-web1:8000 weight=2 max_fails=2;
    server public-web2:8000 weight=3 max_fails=2;
    fair;
}

Any suggestions? Thanks in advance,

Russ

Posted at Nginx Forum:
http://forum.nginx.org/read.php?2,65269,65269#msg-65269

On Thu, Mar 18, 2010 at 10:30:57AM -0400, russellneufeld wrote:

> server public-web1:8000 weight=2 max_fails=2;
> server public-web2:8000 weight=3 max_fails=2;
> fair;
> }
>
> Any suggestions? Thanks in advance,

Hi,

Indeed I haven’t got a suitable round tuit to fix this. I hope I’ll be
able to spend some time on upstream-fair in the upcoming weeks, so not
all is lost yet :)

Best regards,
Grzegorz N.

Hi Grzegorz,

Thanks for the reply. Do you know of a way I can detect this state?
We are using both nagios and munin to monitor our website. If I can get
an alert when one of the servers has been taken out of rotation that
would be really helpful in the interim.
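
One possible way to detect this from the outside is sketched below. It is
not something upstream_fair provides. It assumes the access log has been
given a custom log_format whose lines end in "upstream=" followed by the
value of nginx's $upstream_addr variable; that suffix, the function names
and the window size are all made up for illustration. The idea is to flag
any backend that has not handled any of the most recent proxied requests.
Note that $upstream_addr records the resolved address and port, not the
hostname used in the upstream block.

```python
import re
from collections import deque

# Assumption: every access-log line ends with "upstream=<$upstream_addr>".
UPSTREAM_RE = re.compile(r"upstream=(.*)$")


def missing_backends(lines, expected, window=500):
    """Return expected backends absent from the last `window` proxied requests."""
    recent = deque(maxlen=window)  # keeps only the newest `window` entries
    for line in lines:
        match = UPSTREAM_RE.search(line.rstrip("\n"))
        if not match or match.group(1).strip() in ("", "-"):
            continue  # request was not proxied, or line is not in the assumed format
        # Retried requests list several addresses separated by commas;
        # " : " separates server groups after an internal redirect.
        recent.append(re.split(r"\s*,\s*|\s+:\s+", match.group(1).strip()))
    seen = {addr for addrs in recent for addr in addrs}
    return sorted(set(expected) - seen)


def nagios_result(missing):
    """Map the result to a Nagios exit code (0 = OK, 2 = CRITICAL) and a message."""
    if missing:
        return 2, "CRITICAL: no recent requests to %s" % ", ".join(missing)
    return 0, "OK: all backends seen recently"
```

A small wrapper could feed the tail of the access log to
missing_backends(), print the message and exit with the returned code so
that nagios picks it up. The window needs to be large enough that, with
weights of 2 and 3, both servers would normally show up in it; on a quiet
site a window that is too small will produce false alarms.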

Thanks,

Russ

Posted at Nginx Forum: