A problem with the keepalive module and the directive proxy_next_upstream

Hi, folks,

We have found a bug with the keepalive module. When we use the keepalive
module, the directive proxy_next_upstream seems to be disabled.

We use nginx as a reverse proxy server. Our backend servers simply close the
connection when they read abnormal packets. nginx then calls the function
ngx_http_upstream_next() and tries the next server; the ft_type is
NGX_HTTP_UPSTREAM_FT_ERROR. We want to turn off the retry mechanism for such
packets, otherwise nginx will try all the servers every time, so we use the
directive proxy_next_upstream off. If it is not a keepalive connection,
everything is fine. If it is a keepalive connection, nginx runs this code:

2858     if (u->peer.cached && ft_type == NGX_HTTP_UPSTREAM_FT_ERROR) {
2859         status = 0;
2860
2861         /* TODO: inform balancer instead */
2862
2863         u->peer.tries++;
2864

The status is cleared to 0, so the code below is never reached:

2896     if (status) {
2897         u->state->status = status;
2898
2899         if (u->peer.tries == 0 || !(u->conf->next_upstream & ft_type)) {

The u->peer.tries counter and the u->conf->next_upstream mask become useless.
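To make the effect concrete, here is a minimal, self-contained sketch of that gate (illustrative only, not nginx source; the mask values and the helper function are assumptions): with proxy_next_upstream off the configured mask carries no failure bits, so the check would stop the retries, but it is never reached once status has been zeroed.

/* Minimal sketch of the retry gate at lines 2896-2899 (illustrative only,
 * not nginx source; the constants below are stand-ins for the real
 * NGX_HTTP_UPSTREAM_FT_* bits). */
#include <stdio.h>

#define FT_ERROR    0x02   /* stand-in for NGX_HTTP_UPSTREAM_FT_ERROR   */
#define FT_TIMEOUT  0x04   /* stand-in for NGX_HTTP_UPSTREAM_FT_TIMEOUT */

static const char *
decide(unsigned next_upstream_mask, unsigned ft_type, unsigned tries)
{
    /* mirrors the check at line 2899: no tries left, or the failure type
     * is not in the configured mask -> finalize instead of retrying */
    if (tries == 0 || !(next_upstream_mask & ft_type)) {
        return "finalize request, no retry";
    }

    return "try the next upstream server";
}

int
main(void)
{
    /* proxy_next_upstream off: assume an empty mask, so never retry */
    printf("off:           %s\n", decide(0, FT_ERROR, 3));

    /* proxy_next_upstream error timeout: this ft_type is retried */
    printf("error timeout: %s\n", decide(FT_ERROR | FT_TIMEOUT, FT_ERROR, 3));

    return 0;
}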

I don't know why a cached connection should clear the status. Can we just
remove the code from line 2858 to 2864? Is there any side effect?

Hello!

On Mon, Jan 14, 2013 at 04:11:20PM +0800, 姚伟斌 wrote:

> directive proxy_next_upstream off. If it is not a keepalive connection,
> The status is cleared to 0, so the code below is never reached:
> remove the code from line 2858 to 2864? Is there any side effect?
A cached connection might be (legitimately) closed by an upstream
server at any time, so the code always retries if sending the request
failed.
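
To illustrate that pattern, here is a small self-contained sketch (illustrative only, not nginx source; the structures and the simulated send are assumptions): a send failure on a reused keepalive connection usually just means the server closed the idle connection, so it is retried once on a fresh connection, while a failure on a fresh connection is treated as a real error.

/* Illustrative sketch (not nginx source): retrying a request when the
 * cached keepalive connection turns out to have been closed by the peer. */
#include <stdio.h>

struct conn {
    int cached;        /* 1 if the connection came from the keepalive pool */
    int peer_closed;   /* simulated: peer already closed this connection   */
};

/* simulated send: fails if the peer has already closed the connection */
static int
send_request(struct conn *c)
{
    return c->peer_closed ? -1 : 0;
}

static int
send_with_retry(struct conn *pooled, struct conn *fresh)
{
    if (send_request(pooled) == 0) {
        return 0;
    }

    if (pooled->cached) {
        /* stale keepalive connection: retry once on a fresh connection */
        return send_request(fresh);
    }

    return -1;   /* a fresh connection failed: report the error */
}

int
main(void)
{
    struct conn pooled = { 1, 1 };   /* reused connection, already closed */
    struct conn fresh  = { 0, 0 };   /* newly opened connection           */

    printf("request %s\n",
           send_with_retry(&pooled, &fresh) == 0 ? "succeeded" : "failed");

    return 0;
}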


Maxim D.

Hi, Maxim,

I have removed the above code. It seems to work for us with no side effects,
and we have had it on our busy production boxes for a week.

This patch makes nginx honor the u->peer.tries counter and the
u->conf->next_upstream mask again.

The patch is here:

diff --git a/src/http/ngx_http_upstream.c b/src/http/ngx_http_upstream.c
index 842b634..0a0dfc3 100644
--- a/src/http/ngx_http_upstream.c
+++ b/src/http/ngx_http_upstream.c
@@ -2845,6 +2845,12 @@ ngx_http_upstream_next(ngx_http_request_t *r, ngx_http_upstream_t *u,
                           "upstream timed out");
     }

+#if 0
     if (u->peer.cached && ft_type == NGX_HTTP_UPSTREAM_FT_ERROR) {
         status = 0;

@@ -2853,6 +2859,7 @@ ngx_http_upstream_next(ngx_http_request_t *r, ngx_http_upstream_t *u,
         u->peer.tries++;

     } else {
+#endif
         switch(ft_type) {

         case NGX_HTTP_UPSTREAM_FT_TIMEOUT:

@@ -2875,7 +2882,9 @@ ngx_http_upstream_next(ngx_http_request_t *r, ngx_http_upstream_t *u,
         default:
             status = NGX_HTTP_BAD_GATEWAY;
         }
+#if 0
     }
+#endif

     if (r->connection->error) {
         ngx_http_upstream_finalize_request(r, u,

2013/1/14 姚伟斌 [email protected]

The nginx end closes the cached connection actively in this next-upstream
function. I don't know why it should always retry another server and not
honor the u->peer.tries counter and the u->conf->next_upstream mask.

Thanks.

2013/1/14 Maxim D. [email protected]

Hello!

On Wed, Jan 23, 2013 at 02:25:10PM +0800, 姚伟斌 wrote:

> I have removed the above code. It seems to work for us with no side effects,
> and we have had it on our busy production boxes for a week.

Just removing the code is the wrong thing to do, as nginx will no longer
be able to retry if an upstream server closes the connection at the very
moment nginx decides to send a request to it and there is only one
upstream server configured in the given upstream block.

[…]


Maxim D.