Nginx upstream problem

Hi,

I have several problems when using nginx as a load balancer.

Nginx 0.8.35
CentOS 5.4 64bit
Kernel 2.6.28.10

Configs:
/etc/nginx/nginx.conf

user nobody;
worker_processes 32;
worker_rlimit_nofile 10240;

error_log /var/log/nginx/error.log;

events {
use epoll;
accept_mutex off;
worker_connections 8192;
}

http {
include mime.types;
default_type application/octet-stream;

sendfile        on;
tcp_nodelay        on;
access_log off;
server_tokens off;
client_max_body_size 10m;

keepalive_timeout  65;

include /etc/nginx/conf.d/*.conf;

}

/etc/nginx/conf.d/upstream-nonssl.conf

upstream cloud {
server apache3 max_fails=1 fail_timeout=5;
server apache2 max_fails=1 fail_timeout=5;
server apache1 max_fails=1 fail_timeout=5;
}

server {
listen 1.1.1.1:80;
location / {
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_read_timeout 30;
proxy_next_upstream error timeout;
proxy_pass http://cloud;
}
}
server {
listen 1.1.1.2:80;
location / {
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_read_timeout 30;
proxy_next_upstream error timeout;
proxy_pass http://cloud;
}
}

Problem 1:
Regardless of the order of the apache servers in the upstream, apache3
gets the lowest traffic most of the time. Sometimes it gets the same
amount as the others, sometimes it gets very little traffic.

Problem 2:
If I take down one apache server (halt or /etc/init.d/network stop), I
can see from the nginx error log that it is still sending traffic to
that apache server, and I see timeout or "no route to host" errors.
When browsing the website I constantly get pages that hang waiting,
which shows nginx still tries to send requests to the apache server
that is down.

Problem 3:
If I use the nginx fair module, then the server that is listed last in
the upstream gets the highest amount of traffic.

Question 1:
How do I make nginx split the traffic equally between the 3 apache
servers?

Question 2:
How do I make nginx not send traffic to an apache server that is down?

Please help

Thanks


Hello!

On Tue, Apr 13, 2010 at 04:28:35PM -0400, izghitu wrote:

[…]

worker_processes 32;

This looks like a bit too many.

[…]

include /etc/nginx/conf.d/*.conf;

Do you have only one file there? Note that upstream{} blocks are
all defined in the http section, and having two with identical names
is likely to cause confusion.
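
For example, an https upstream would need its own, differently named
block; a hypothetical sketch (the "cloud_ssl" name is only
illustrative):

upstream cloud_ssl {
    server apache1;
    server apache2;
    server apache3;
}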

[…]

upstream cloud {
server apache3 max_fails=1 fail_timeout=5;
server apache2 max_fails=1 fail_timeout=5;
server apache1 max_fails=1 fail_timeout=5;
}

[…]

proxy_read_timeout 30;
proxy_next_upstream error timeout;
proxy_pass http://cloud;

[…]

Problem 1:
Regardless of the order of the apache servers in the upstream,
apache3 gets the lowest traffic most of the time. Sometimes it
gets the same amount as the others, sometimes it gets very little
traffic.

Do these servers (apache1, apache2, apache3) actually resolve
to distinct IP addresses?

Problem 2:
If I take down one apache server (halt or /etc/init.d/network
stop), I can see from the nginx error log that it is still
sending traffic to that apache server, and I see timeout or "no
route to host" errors. When browsing the website I constantly
get pages that hang waiting, which shows nginx still tries to
send requests to the apache server that is down.

nginx marks a server as “down” after “max_fails” failures within
“fail_timeout” seconds, and re-enables it again after
“fail_timeout” seconds.

As it takes 60 seconds (the default proxy_connect_timeout) to detect
that a server is down, and you take it out of rotation for only 5
seconds, it’s somewhat expected that nginx will try to use the dead
server most of the time.

Also note that the statuses of upstream servers are tracked
independently by each worker (so with 32 workers you are likely to
see the dead server used all the time by different workers).

You may tune nginx to behave a bit better by setting a bigger
fail_timeout and/or a smaller proxy_connect_timeout.
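
For example, something along these lines (the exact values are only
illustrative):

upstream cloud {
    # keep a failed backend out of rotation longer than the 5s used above
    server apache3 max_fails=1 fail_timeout=30;
    server apache2 max_fails=1 fail_timeout=30;
    server apache1 max_fails=1 fail_timeout=30;
}

location / {
    # give up on an unreachable backend much sooner than the 60s default
    proxy_connect_timeout 3;
    proxy_read_timeout 30;
    proxy_next_upstream error timeout;
    proxy_pass http://cloud;
    # (proxy_set_header lines omitted here)
}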

[…]

Maxim D.

Hi,

include /etc/nginx/conf.d/*.conf;

Do you have only one file there? Note that upstream{} blocks are
all defined in the http section, and having two with identical names
is likely to cause confusion.

I have 2 upstreams, one for HTTP and one for HTTPS, and they have
different names.

Do these servers (apache1, apache2, apache3) actually resolve
to distinct IP addresses?

Yes, they do resolve to distinct IP addresses. I used those names to
hide the IPs.

it’s somewhat expected that nginx will try to use the dead server
most of the time.

You may tune nginx to behave a bit better by setting a bigger
fail_timeout and/or a smaller proxy_connect_timeout.

Thanks for the info. I will try that and see how it goes.


Hi,

Ok, the failover problem is solved after I applied Maxim’s suggestions.

Now the current problem is that nginx does not load balance equally
between servers.

Under heavy load (300 clients) I killed 2 apache servers, so apache3 was
taking all the load. I brought apache2 back online, and for some reason
apache3 was then getting few hits while most of the load was going to
apache2. When I brought apache1 back online, it got most of the load,
apache2 took less, and apache3 got almost no load at all.

Why the strange behavior? How do I make it load balance equally between
the 3 apache servers? All the apache servers have identical hardware and
software configuration.

Please help


Hi,

I’ve changed the weights of the 3 apache servers as follows (the
resulting upstream block is sketched below):
apache3 weight=1
apache2 weight=2
apache1 weight=3
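
That is, the upstream block now looks roughly like this (other server
parameters omitted here):

upstream cloud {
    server apache3 weight=1;
    server apache2 weight=2;
    server apache1 weight=3;
}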

When I ran the load test (300 clients) this time, apache3 was still
getting the lowest load, apache2 the highest, and apache1 was getting
less than apache2 but much more than apache3.

Any ideas?


Hello!

On Wed, Apr 14, 2010 at 12:27:33PM -0400, izghitu wrote:

now for some reason apache3 was getting few hits while most of the
load was going to apache2. When I brought apache1 back online, it
got most of the load, apache2 took less, and apache3 got almost no
load at all.

Why the strange behavior? How do I make it load balance equally
between the 3 apache servers? All the apache servers have
identical hardware and software configuration.

Please do the following:

  1. make sure you compiled nginx without third party modules and
    patches, and show nginx -V output;

  2. show your config;

  3. show access logs with $upstream_addr logged to make it clear
    that requests are indeed distributed unequally (note that with
    32 workers and 3 backends you have to provide at least 32 * 3
    lines); a minimal logging sketch follows below.
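
For reference, a minimal sketch of such logging (the format name and
log path are only examples):

log_format upstreamlog '$remote_addr [$time_local] "$request" $status '
                       'upstream=$upstream_addr '
                       'upstream_time=$upstream_response_time';

access_log /var/log/nginx/upstream.log upstreamlog;

The log_format belongs in the http{} block; the access_log line goes in
the server{} or location{} blocks that proxy to the upstream (it
overrides the global “access_log off”).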

Maxim D.

How do I make it load balance equally between the 3 apache servers? All the apache servers have identical hardware and software
configuration.

Maybe try the upstream fair balancer?

http://nginx.localdomain.pl/wiki/UpstreamFair
http://wiki.nginx.org/NginxHttpUpstreamFairModule

“The main feature of upstream_fair is that it knows how many requests
each backend is processing (a backend is simply one of the
servers, among which the load balancer has to make its choice). Thus it
can make a more informed scheduling decision and avoid
sending further requests to already busy backends.”

… as simple round-robin doesn't always give a consistent result in a
given period of time.
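
If you want to try it, enabling the module looks roughly like this
(assuming nginx was built with the third-party upstream_fair module):

upstream cloud {
    fair;              # directive provided by the upstream_fair module
    server apache3;    # keep your existing max_fails/fail_timeout parameters
    server apache2;
    server apache1;
}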

rr