HttpHealthcheckModule server not marked down

Hi there,

I’m trying to use the HttpHealthcheckModule for nginx, but I’m having
some trouble with it.

I have two servers in my upstream. When I sabotage the health of one
server, the healthcheck status view shows that server as down (1), but
when I visit the website I’m checking I still land on it and see a
broken page.

How can I arrange for the server to be marked down automatically when
the check fails?

Sorry for my bad English and possibly newbie questions.

config:

upstream www-health {
    server x.x.x.1;
    server x.x.x.2;
    healthcheck_enabled;
    healthcheck_delay 10000;
    healthcheck_timeout 1000;
    healthcheck_failcount 2;
    #healthcheck_expected 'I_AM_ALIVE';
    # Important: HTTP/1.0
    healthcheck_send "GET / HTTP/1.0" 'Host: health.test.x.com'
                     'Connection: close';
}
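For reference, I view the healthcheck status via the module’s healthcheck_status directive (as far as I understand the module’s README); roughly like this, with the location path being my own choice:

```nginx
# Sketch: expose the healthcheck status page (the path is just an example)
location /healthcheck_status {
    healthcheck_status;
}
```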

nginx: nginx version: nginx/1.0.6
nginx: built by gcc 4.4.3 (Ubuntu 4.4.3-4ubuntu5)
nginx: TLS SNI support enabled
nginx: configure arguments: --with-http_ssl_module
--add-module=/gnosek-nginx-upstream-fair-2131c73
--with-http_stub_status_module
--add-module=/cep21-healthcheck_nginx_upstreams-b33a846
--prefix=/usr/local/nginx-1.0.6 --with-debug

used:
peckhardt@test-nginx:~/nginx-1.0.6$ patch -p1 <
/cep21-healthcheck_nginx_upstreams-5fa4bff/nginx.patch

I hope someone can help me.

greetings

Hi,

I believe the healthcheck module is only used for checking whether the
backend (node) is down.

As I read it, it doesn’t actually mark the node down in the nginx
config, but maybe I misunderstand it.

Regards,
Jaap

Hi,

It is a bug.

ngx_upstream_get_peer only checks the indices greater than i; it
forgets to check i itself.

I have my own nginx patch for healthcheck, which I have used in
production for more than half a year. I will upload it to my GitHub in
a few hours.

liseen

Hi,

On Sat, Oct 1, 2011 at 7:16 AM, liseen [email protected] wrote:

If you use healthcheck together with upstream hash, please compile
with the support_http_healthchecks branch of cep21’s fork:

GitHub - cep21/nginx_upstream_hash at support_http_healthchecks

If all of an upstream’s backends are down (per healthcheck), cep21’s
upstream_hash will ignore the healthcheck. If that is not what you
need, please try:

GitHub - liseen/nginx_upstream_hash: A hashing load-balancer for nginx

If you find something wrong, please open an issue on GitHub. Thanks.

liseen

Hi,

Please try:

patch -p1 < healthcheck.patch
./configure …

If you use healthcheck with upstream hash, please compile with the
support_http_healthchecks branch of cep21’s fork.

liseen

Hi, Sjaak,

On Mon, Oct 3, 2011 at 10:39 PM, Sjaak P. [email protected]
wrote:

Hey Liseen,

Thank you for the fix; it’s working now for standard round robin. For
upstream_hash it’s not working, but that’s no problem for us, as we
are not using that in production.

A failed attempt also uses up one hash_again retry. Can you set
hash_again to 10 (greater than or equal to the number of servers), try
again, and tell me the result? The upstream_hash module has been
working for quite some time.
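In the upstream block from your first mail, that would look roughly like this (a sketch; hash and hash_again come from the upstream_hash module):

```nginx
upstream www-health {
    server x.x.x.1;
    server x.x.x.2;
    hash $request_uri;
    hash_again 10;  # >= the number of servers
    healthcheck_enabled;
    healthcheck_delay 10000;
    healthcheck_timeout 1000;
    healthcheck_failcount 2;
    healthcheck_send "GET / HTTP/1.0" 'Host: health.test.x.com';
}
```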

What we often use is gnosek-nginx-upstream-fair; can you tell us how
to make it work with that?

Patch the upstream-fair module in the same way as the round_robin and
upstream_hash modules.

Maybe there should be one module that contains all of the following
features: RR, hash, fair, and health check. I hope somebody will
provide such a module; I don’t like patching code.

Hi Liseen,

Setting hash_again greater than the number of servers works; setting it equal does not.

I think I will dig into the code some day to patch upstream_fair, but
I’ll need some free time for that.

Thanks for the help.

greetings

This is what I’ve done to make it work for now:

peckhardt@test-nginx:~/nginx-1.0.6$ patch -p1 <
../liseen-healthcheck_nginx_upstreams-17298cf/healthcheck.patch
patching file src/http/ngx_http_upstream.c
Hunk #1 succeeded at 4270 (offset 11 lines).
patching file src/http/ngx_http_upstream.h
patching file src/http/ngx_http_upstream_round_robin.c
Hunk #2 succeeded at 25 with fuzz 2 (offset 9 lines).
Hunk #3 succeeded at 33 (offset 9 lines).
Hunk #4 succeeded at 68 (offset 9 lines).
Hunk #5 succeeded at 416 (offset 7 lines).
Hunk #6 succeeded at 448 (offset 7 lines).
Hunk #7 succeeded at 465 (offset 7 lines).
Hunk #8 succeeded at 506 (offset 7 lines).
Hunk #9 succeeded at 523 (offset 7 lines).
Hunk #10 succeeded at 617 (offset 7 lines).
patching file src/http/ngx_http_upstream_round_robin.h

peckhardt@test-nginx:~/nginx-1.0.6$ sudo ./configure
--with-http_ssl_module
--add-module=/home/peckhardt/gnosek-nginx-upstream-fair-2131c73
--with-http_stub_status_module
--add-module=/home/peckhardt/liseen-healthcheck_nginx_upstreams-17298cf
--add-module=/home/peckhardt/liseen-nginx_upstream_hash-43fab03
--prefix=/usr/local/nginx-1.0.6 --with-debug

peckhardt@test-nginx:~/nginx-1.0.6$ sudo su
peckhardt@test-nginx:~/nginx-1.0.6$ make install clean

nginx config:

########### test healthcheck ######
upstream www-health {
    server 213.154.235.185;
    server 213.136.14.13;
    #hash $request_uri;
    #hash_again 1;
    healthcheck_enabled;
    healthcheck_delay 10000;
    healthcheck_timeout 1000;
    healthcheck_failcount 2;
    #healthcheck_expected 'I_AM_ALIVE';
    # Important: HTTP/1.0
    healthcheck_send "GET / HTTP/1.0" 'Host: health.test.x.com';
}

I confirmed that this is an issue only when ip_hash is used; without
it, a healthcheck failure causes the upstream to stop routing requests
to the failing node. I could use a bit of help understanding:

  • is this by design (i.e. ip_hash support was never implemented) or a bug?

I am happy to create an enhancement/fix but, as a C newbie, would
appreciate a bit of guidance on where specifically to look in nginx.

thanks!

Posted at Nginx Forum:

Hello,

I was seeing the same issue: when an upstream node is down,
healthcheck correctly flags it as down, yet subsequent requests are
still routed to that node (even though others are healthy).

I tried this patch with 1.0.13 and 1.0.15; no relief.

Is the above supposed to work when ip_hash directive is used?

Thank you,

-nikita

My config:

upstream admin-cluster {
    ip_hash; # clientIP-based session affinity

    healthcheck_enabled;
    healthcheck_send "GET /running HTTP/1.0" "Host: www.mydomain.com"
                     "User-Agent: FooBar/1.0 nginx" "Connection: close";
    healthcheck_delay 2000;
    healthcheck_failcount 2;
    healthcheck_timeout 2000;
    server admin1.staging.mydomain.com:8080 max_fails=1 fail_timeout=5; # max sec to connect to upstream host
}
