Nginx doesn't switch upstream in some cases

Guillaume_F · February 10, 2010, 3:20pm

Hi all,

We’re running nginx as a load balancer in front of two reverse proxies
(Apache2/mod_security) for our web site.

It’s been working great, except that last night Apache stopped on one of
the reverse proxies and nginx kept sending HTTP requests to it. As a
result, about 50% of the requests to the web site failed.

If I unplug the reverse proxy, nginx detects that it’s down and will
only send requests to the “good” reverse proxy.

Here’s my config, I guess that my “proxy_next_upstream” is wrong but I
can’t figure out how… It would be great if someone could shed some
light on this issue for me.

=====
upstream SRACQ {
    server 192.168.1.57:80;
    server 192.168.1.67:80;
}

server {
    listen 66.254.57.167:80;
    server_name www.sracq.qc.ca;
    access_log /var/log/nginx/www.sracq.qc.ca_HTTP.access.log;
    error_log /var/log/nginx/www.sracq.qc.ca_HTTP.error.log;

    location / {
        proxy_next_upstream error timeout invalid_header http_500;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;

        proxy_pass http://SRACQ;
    }
}

Thanks a lot,
GFK’s

Guillaume_F · February 10, 2010, 3:35pm

Hello!

On Wed, Feb 10, 2010 at 09:19:07AM -0500, Guillaume F. wrote:

only send requests to the “good” reverse proxy.

Here’s my config, I guess that my “proxy_next_upstream” is wrong but I
can’t figure out how… It would be great if someone could shed some
light on this issue for me.

[…]

proxy_next_upstream error timeout invalid_header http_500;

Which status code does your reverse proxy return once the backend is
down? Most likely it’s 502 or 504, so you have to add them to
proxy_next_upstream.

Maxim D.

Guillaume_F · February 10, 2010, 4:35pm

Maxim D. wrote:

Which status code your reverse proxy returns once backend is down?
Most likely it’s 502 or 504, so you have to add them to
proxy_next_upstream.

Thanks for the tip. Indeed, I should add http_502, http_503 and
http_504 to my proxy_next_upstream.
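
For reference, the change discussed here could look roughly like the sketch below. It reuses the location block from the original post and only extends the proxy_next_upstream line; whether all three extra status codes are appropriate depends on what the reverse proxies actually return, which is an assumption here.

```nginx
location / {
    # Conditions from the original config, plus the gateway-type
    # status codes suggested in this thread (502, 503, 504).
    proxy_next_upstream error timeout invalid_header
                        http_500 http_502 http_503 http_504;

    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $http_host;

    proxy_pass http://SRACQ;
}
```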

However, last night’s problem was not with a backend, but with a reverse
proxy. The machine was still up (responding to pings) but apache was not
responding.

There was no response to HTTP requests to the reverse proxy. I guess
that’s what the “timeout” parameter is for, but somehow that didn’t work…

Thanks,
GFK’s

Guillaume_F · February 10, 2010, 7:38pm

Guillaume F. wrote:

However, last night’s problem was not with a backend, but with a reverse
proxy. The machine was still up (responding to pings) but apache was not
responding.

There was no response to HTTP requests to the reverse proxy. I guess
that’s what the “timeout” parameter is for, but somehow that didn’t work…

It turned out that I was missing this directive:
proxy_connect_timeout 2;

I couldn’t find the default value, but I guess that it was too high for
my application.

With this directive in place, when one reverse proxy is down, half the
requests take 0.2 sec to return and half take 2.2 sec to return. This is
good enough for me.
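
As a rough sketch of where the directive goes, assuming it is placed in the same location block as in the original post: the nginx documentation lists the default for proxy_connect_timeout as 60 seconds, which would explain why requests appeared to hang. The 2.2 sec figure is consistent with about 2 seconds spent waiting on the dead proxy before the request is retried on the healthy one, though that reading is an inference from the numbers above.

```nginx
location / {
    # Give up on connecting to an upstream server after 2 seconds
    # (the documented default is 60s).
    proxy_connect_timeout 2;

    # "timeout" here lets nginx pass the request to the next server
    # once the connect timeout above has expired.
    proxy_next_upstream error timeout invalid_header http_500;

    proxy_pass http://SRACQ;
}
```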

Thanks,
GFK’s

Guillaume_F · February 12, 2010, 9:23am

Piotr S. wrote:

With this directive in place, when one reverse proxy is down, half the
requests take 0.2sec to return and half take 2.2sec to return. This is
good enough for me.

You should consider using “max_fails” and “fail_timeout”:
Module ngx_http_upstream_module

Well, that’s strange. According to the documentation, the default values
for those are “max_fails=1 fail_timeout=10s”.

With those default settings (and with “timeout” in proxy_next_upstream),
Guillaume shouldn’t hit the timeout on half the requests, but rather
only once in a while, right?

How can this happen?

Guillaume_F · February 10, 2010, 7:46pm

With this directive in place, when one reverse proxy is down, half the
requests take 0.2sec to return and half take 2.2sec to return. This is
good enough for me.

You should consider using “max_fails” and “fail_timeout”:
http://wiki.nginx.org/NginxHttpUpstreamModule#server
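
A minimal sketch of what this suggestion could look like on the upstream block from the original post. The values below are illustrative assumptions, not a recommendation from the thread: max_fails is the number of failed attempts after which a server is considered unavailable, and fail_timeout is both the window in which those failures are counted and how long the server is then skipped.

```nginx
upstream SRACQ {
    # Illustrative values: after 1 failed attempt within 30s,
    # skip this server for the next 30s.
    server 192.168.1.57:80 max_fails=1 fail_timeout=30s;
    server 192.168.1.67:80 max_fails=1 fail_timeout=30s;
}
```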

Best regards,
Piotr S.

Guillaume_F · February 12, 2010, 2:54pm

With those default settings (and with “timeout” in proxy_next_upstream),
Guillaume shouldn’t hit the timeout on half the requests, but rather
only once in a while, right?

How can this happen?

Actually, my “half the time” was based on running “time wget -O
/dev/null http://www.sracq.qc.ca/” a few times and looking at the
results. There’s nothing statistical in my approach.
