Is there a bug in ngx_http_upstream_module's broken connection check code?

In the function “ngx_http_upstream_check_broken_connection” of
ngx_http_upstream.c, there is a code segment like below:

n = recv(c->fd, buf, 1, MSG_PEEK);

...

if (n == -1) {
    ...
} else { /* n == 0 */
    err = 0;
}

Here, receiving 0 bytes from the downstream client is used to judge
whether the connection has been closed. However, if the downstream
connection is HTTPS and the connection is closed, n will never be 0,
so the code fails to detect a broken connection for HTTPS.
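For illustration (a minimal sketch, not nginx's actual code), the plain-TCP version of this check looks roughly like the function below. The key point is that any pending bytes in the socket buffer, including an encrypted TLS close_notify record, make recv() return a positive count and mask the close:

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Returns 1 if the peer has closed cleanly (FIN seen, no data left),
 * 0 if the connection still looks alive, -1 on a hard error.
 * MSG_PEEK leaves any pending byte in the socket buffer, so the check
 * is non-destructive; but pending bytes, e.g. an encrypted TLS
 * close_notify record, hide the close from this test. */
static int tcp_peer_closed(int fd)
{
    char    buf;
    ssize_t n = recv(fd, &buf, 1, MSG_PEEK | MSG_DONTWAIT);

    if (n > 0)
        return 0;               /* data pending: looks alive */
    if (n == 0)
        return 1;               /* orderly close, nothing buffered */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;               /* no data yet: still alive */
    return -1;
}
```

With a socketpair this reproduces the failure mode from the report: if the peer writes a byte and then closes, the check keeps reporting the connection as alive until that byte is consumed.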

My test was to create a connection with Firefox and click its "Stop"
button. If the connection is HTTP, n is 0; but if the connection is
HTTPS, n is always 1. When I increase the peek read limit from 1 to
1000, n becomes 37.

So is it a bug, or am I misunderstanding something?

Thanks.

Posted at Nginx Forum:
http://forum.nginx.org/read.php?2,209276,209276#msg-209276

Hello!

On Thu, Jun 23, 2011 at 12:14:47AM -0400, speedfirst wrote:

> Here, receiving 0 bytes from the downstream client is used to judge
> whether the connection has been closed. However, if the downstream
> connection is HTTPS and the connection is closed, n will never be 0,
> so the code fails to detect a broken connection for HTTPS.
>
> My test was to create a connection with Firefox and click its "Stop"
> button. If the connection is HTTP, n is 0; but if the connection is
> HTTPS, n is always 1. When I increase the peek read limit from 1 to
> 1000, n becomes 37.
>
> So is it a bug, or am I misunderstanding something?

Yes, the code in question won’t be able to detect SSL connection
close with appropriate shutdown records sent by a client (as well
as connection close with some other pending data, e.g. pipelined
request). It is not possible to detect connection close with
standard socket interface if there are pending data without
reading that data first, which isn’t always desired/possible.

In the particular case of SSL it should be possible to use
SSL_peek() here, but a better approach is to use an OS-provided
hint, e.g. the EV_EOF flag provided by the kqueue interface (and
used by nginx).

Maxim D.

So will it trigger a bug like this?

  1. A client accesses nginx via HTTPS.
  2. nginx proxies the request to an upstream.
  3. Before the upstream responds, the client closes the connection,
     and nginx disposes of the current ngx_http_request_t object and
     its memory pool.
  4. The upstream sends back its response and the read handler is
     invoked.
  5. The handler uses the disposed ngx_http_request_t object and
     segfaults.

Posted at Nginx Forum:
http://forum.nginx.org/read.php?2,209276,209548#msg-209548

Hello!

On Thu, Jun 23, 2011 at 10:24:17PM -0400, speedfirst wrote:

> So will it trigger a bug like this?
>
>   1. A client accesses nginx via HTTPS.
>   2. nginx proxies the request to an upstream.
>   3. Before the upstream responds, the client closes the connection,
>      and nginx disposes of the current ngx_http_request_t object and
>      its memory pool.
>   4. The upstream sends back its response and the read handler is
>      invoked.
>   5. The handler uses the disposed ngx_http_request_t object and
>      segfaults.

No.

Maxim D.

I am experiencing this bug in a production system: SSL closes are not
detected, so sockets stay in the CLOSE_WAIT state forever (a nice DoS).

I was looking at the nginx sources, and it seems this has not been
fixed. The alternative is to use stunnel with the X-Forwarded-For
patch, but that's far too messy.

In ngx_http_upstream_check_broken_connection(), there is a separate
code path for kqueue. What about modifying the poll/epoll behavior to
detect disconnections for the other event modules? In
ngx_epoll_add_connection(), we could add the EPOLLHUP event and mark
the connection as disconnected when processing HUP events, instead of
using the buggy MSG_PEEK hack.

Posted at Nginx Forum:
http://forum.nginx.org/read.php?2,209276,219105#msg-219105

Hello!

On Sat, Nov 26, 2011 at 07:41:24AM -0500, nviennot wrote:

> I am experiencing this bug in a production system: SSL closes are not
> detected, so sockets stay in the CLOSE_WAIT state forever (a nice DoS).

I believe I’ve already stated this several times, but let me
repeat: this is not a bug; it’s rather a lack of optimization.

And if this makes a DoS possible in your setup, then your setup is
broken (i.e. the bug is in your setup). Even if nginx were able to
detect connection close in the simple SSL case, that wouldn’t mean
all connection closes are detectable.

Consider a connection with a pipelined request not yet read from
the client: it isn’t possible to detect a pending connection close
with the standard socket interface without reading the request
first, which isn’t an option. You must use timeouts/keepalives to
handle this.

> the buggy MSG_PEEK hack

Likely, EPOLLRDHUP is needed here. Patches are welcome.

Maxim D.

Consider the case where nginx is used as a reverse proxy and the
upstream is doing server push (the HTTP request never ends).
You say that timeouts/keepalives should be used. I agree, and I
actually made a TCP-keepalive module for nginx:
https://github.com/nviennot/nginx-tcp-keepalive
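For reference, a minimal sketch of the mechanism involved (the helper name and default values below are made up, and this is not the linked module's code): on Linux, TCP keepalive comes down to four socket options, after which a dead peer is noticed within roughly idle + cnt * intvl seconds even when no data flows:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable keepalive probes on a TCP socket.
 * idle  - seconds of silence before the first probe
 * intvl - seconds between probes
 * cnt   - unanswered probes before the connection is reset */
static int enable_keepalive(int fd, int idle, int intvl, int cnt)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
        return -1;
    return setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt));
}
```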

But that’s not enough. The closed SSL connection stays open from
nginx’s point of view, even though it no longer appears in netstat.
On the other hand, when the upstream sends something, nginx’s
write() to the closed socket fails with EPIPE.
Maybe you meant keepalive in the sense of the upstream periodically
sending data to the client. But that adds a requirement on the
upstream logic, which is why this is not just an optimization
problem but an actual bug: the upstream relies on nginx closing the
connection when the client does in order to detect the client’s
presence.

Bottom line: I’ll send a patch with EPOLLRDHUP. Also feel free to
comment on the tcp-keepalive module.

Posted at Nginx Forum:
http://forum.nginx.org/read.php?2,209276,219158#msg-219158
