Forum: NGINX Dropped https client connection doesn't drop backend proxy_pass connection

Posted by Robert Mueller (Guest)
on 2013-03-15 05:23
(Received via mailing list)
Hi

I'm trying to setup nginx to proxy a server sent events connection
(http://dev.w3.org/html5/eventsource/) to a backend server.

The approach is the browser connects to a particular path, which then
checks the cookies to see the connection is authorised, and then returns
an X-Accel-Redirect header to connect up to a separate internal path to
do the proxying. That separate internal path is configured like this:

    location ^~ /pushevents/ {
      internal;

      # Immediately send backend responses back to client
      proxy_buffering off;
      # Disable keepalive to browser
      keepalive_timeout 0;
      # It's a long lived backend connection with potentially a long
      time between
      #  push events, make sure proxy doesn't timeout
      proxy_read_timeout 7200;

      proxy_pass http://pushbackend;
    }

The backend service listening on pushbackend then holds the connection
open, and returns data down the connection whenever a push event occurs,
which nginx proxies straight back to the browser because buffering is
turned off.

One important thing here is that if the client drops the connection to
nginx, nginx should also drop the connection to the backend server. This
is done by making sure we do not change proxy_ignore_client_abort, which
defaults to off.

Testing this out, it all works fine for http browser connections, but
appears to be broken for https browser connections.

When using http with my test client, I see this in the nginx log when
the client disconnects:

2013/03/14 23:27:27 [info] 27717#0: *1 client prematurely closed
connection, so upstream connection is closed too while reading upstream,
client: 10.50.1.14, server: www.*, request: "GET /events/ HTTP/1.0",
upstream: "http://127.0.0.4:80/.../", host: "..."

And when I check netstat, I see that the connection from nginx ->
pushbackend has been dropped as well.

However for https connections, this is not what happens. Instead, it
appears nginx fails to detect that the client has closed the connection,
and leaves the nginx -> pushbackend connection open (confirmed with
netstat). And because we set proxy_read_timeout to 2 hours, it takes 2
hours for that connection to be closed! That's bad.

This is nginx 1.2.7 compiled with openssl 1.0.1e.

With an http connection with debugging enabled, when I close the client
connection I see this in the log:

2013/03/15 00:11:33 [debug] 28285#0: epoll: fd:3 ev:0005
d:00007F6B414AA150
2013/03/15 00:11:33 [debug] 28285#0: *1 http run request: "..."
2013/03/15 00:11:33 [debug] 28285#0: *1 http upstream check client,
write event:0, "..."
2013/03/15 00:11:33 [debug] 28285#0: *1 http upstream recv(): 0 (11:
Resource temporarily unavailable)
2013/03/15 00:11:33 [info] 28285#0: *1 client prematurely closed
connection, so upstream connection is closed too while reading upstream,
client: 10.50.1.14, server: insecure.*, request: "GET /events/
HTTP/1.0", upstream: "http://127.0.0.4/.../", host: "..."
2013/03/15 00:11:33 [debug] 28285#0: *1 finalize http upstream request:
499
2013/03/15 00:11:33 [debug] 28285#0: *1 finalize http proxy request
2013/03/15 00:11:33 [debug] 28285#0: *1 free rr peer 1 0
2013/03/15 00:11:33 [debug] 28285#0: *1 close http upstream connection:
32
2013/03/15 00:11:33 [debug] 28285#0: *1 free: 0000000001847BC0, unused:
48
2013/03/15 00:11:33 [debug] 28285#0: *1 event timer del: 32:
1363327885296
2013/03/15 00:11:33 [debug] 28285#0: *1 reusable connection: 0
2013/03/15 00:11:33 [debug] 28285#0: *1 http output filter "..."
2013/03/15 00:11:33 [debug] 28285#0: *1 http copy filter: "..."
2013/03/15 00:11:33 [debug] 28285#0: *1 http postpone filter "..."
00007FFF614BAFE0
2013/03/15 00:11:33 [debug] 28285#0: *1 http copy filter: -1 "..."
2013/03/15 00:11:33 [debug] 28285#0: *1 http finalize request: -1, "..."
a:1, c:1
2013/03/15 00:11:33 [debug] 28285#0: *1 http terminate request count:1
2013/03/15 00:11:33 [debug] 28285#0: *1 http terminate cleanup count:1
blk:0
2013/03/15 00:11:33 [debug] 28285#0: *1 http posted request: "..."
2013/03/15 00:11:33 [debug] 28285#0: *1 http terminate handler count:1
2013/03/15 00:11:33 [debug] 28285#0: *1 http request count:1 blk:0
2013/03/15 00:11:33 [debug] 28285#0: *1 http close request
2013/03/15 00:11:33 [debug] 28285#0: *1 http log handler
2013/03/15 00:11:33 [debug] 28285#0: *1 free: 0000000001952800
2013/03/15 00:11:33 [debug] 28285#0: *1 free: 0000000001895EA0
2013/03/15 00:11:33 [debug] 28285#0: *1 free: 0000000001846BB0, unused:
0
2013/03/15 00:11:33 [debug] 28285#0: *1 free: 0000000001894E90, unused:
3
2013/03/15 00:11:33 [debug] 28285#0: *1 free: 0000000001896EB0, unused:
1771
2013/03/15 00:11:33 [debug] 28285#0: *1 close http connection: 3
2013/03/15 00:11:33 [debug] 28285#0: *1 reusable connection: 0
2013/03/15 00:11:33 [debug] 28285#0: *1 free: 00000000018467A0
2013/03/15 00:11:33 [debug] 28285#0: *1 free: 00000000018461A0
2013/03/15 00:11:33 [debug] 28285#0: *1 free: 000000000183B210, unused:
8
2013/03/15 00:11:33 [debug] 28285#0: *1 free: 0000000001846690, unused:
128
2013/03/15 00:11:33 [debug] 28285#0: timer delta: 3247
2013/03/15 00:11:33 [debug] 28285#0: posted events 0000000000000000
2013/03/15 00:11:33 [debug] 28285#0: worker cycle
2013/03/15 00:11:33 [debug] 28285#0: epoll timer: -1

When doing the same thing with an https connection (openssl s_client)
and closing the connection I see this in the log:

2013/03/15 00:10:23 [debug] 28237#0: epoll: fd:3 ev:0005
d:00007F6ECDCAE150
2013/03/15 00:10:23 [debug] 28237#0: *1 http run request: "..."
2013/03/15 00:10:23 [debug] 28237#0: *1 http upstream check client,
write event:0, "..."
2013/03/15 00:10:23 [debug] 28237#0: *1 http upstream recv(): 1 (11:
Resource temporarily unavailable)
2013/03/15 00:10:23 [debug] 28237#0: *1 http run request: "..."
2013/03/15 00:10:23 [debug] 28237#0: *1 http upstream process non
buffered downstream
2013/03/15 00:10:23 [debug] 28237#0: *1 event timer del: 32:
1363327814331
2013/03/15 00:10:23 [debug] 28237#0: *1 event timer add: 32:
7200000:1363327823956
2013/03/15 00:10:23 [debug] 28237#0: timer delta: 4626
2013/03/15 00:10:23 [debug] 28237#0: posted events 0000000000000000
2013/03/15 00:10:23 [debug] 28237#0: worker cycle
2013/03/15 00:10:23 [debug] 28237#0: epoll timer: 7200000

Is this an nginx bug?

--
Rob Mueller
robm@fastmail.fm
Posted by Maxim Dounin (Guest)
on 2013-03-15 09:11
(Received via mailing list)
Hello!

On Fri, Mar 15, 2013 at 03:22:31PM +1100, Robert Mueller wrote:

[...]

>
> However for https connections, this is not what happens. Instead, it
> appears nginx fails to detect that the client has closed the connection,
> and leaves the nginx -> pushbackend connection open (confirmed with
> netstat). And because we set proxy_read_timeout to 2 hours, it takes 2
> hours for that connection to be closed! That's bad.

You shouldn't rely on connection close being detected by nginx.
It's generally useful optimization, but it's not something which
is guaranteed to work.

It is generally not possible to check if the connection was closed
if there a pending data in it.  One have to read all pending data
before connection close can be detected, but it doesn't work as a
generic detection mechanism as one have to buffer read data
somewhere.

As of now, nginx uses:

1) A EV_EOF flag as reported by kqueue.  This only works if you
use kqueue, i.e. on FreeBSD and friends.

2) The recv(MSG_PEEK) call to test a case when connection was
closed.  This works on all platforms, but only if there are no
pending data.

In case of https, in many (most) cases there are pending data -
due to various SSL packets send during connection close.  This
means connection close detection with https doesn't work unless
you use kqueue.

Further reading:

http://mailman.nginx.org/pipermail/nginx/2011-June...
http://mailman.nginx.org/pipermail/nginx/2011-Nove...

--
Maxim Dounin
http://nginx.org/en/donation.html
Posted by Robert Mueller (Guest)
on 2013-03-15 23:33
(Received via mailing list)
> In case of https, in many (most) cases there are pending data -
> due to various SSL packets send during connection close.  This
> means connection close detection with https doesn't work unless
> you use kqueue.
>
> Further reading:
>
> http://mailman.nginx.org/pipermail/nginx/2011-June...
> http://mailman.nginx.org/pipermail/nginx/2011-Nove...

These reports appear to relate to SSL upstream connections (both refer
to ngx_http_upstream_check_broken_connection). I'm talking about an SSL
client connection, with a plain http upstream connection.

When an https client drops it's connection, the upstream http proxy
connection is not dropped. If nginx can't detect an https client
disconnect properly, that must mean it's leaking connection information
internally doesn't it?

Rob
Posted by Maxim Dounin (Guest)
on 2013-03-17 00:49
(Received via mailing list)
Hello!

On Sat, Mar 16, 2013 at 09:32:27AM +1100, Robert Mueller wrote:

>
> These reports appear to relate to SSL upstream connections (both refer
> to ngx_http_upstream_check_broken_connection). I'm talking about an SSL
> client connection, with a plain http upstream connection.

Both are about client connections.  The
ngx_http_upstream_check_broken_connection() function is here to
check if client is disconnected or not.

> When an https client drops it's connection, the upstream http proxy
> connection is not dropped. If nginx can't detect an https client
> disconnect properly, that must mean it's leaking connection information
> internally doesn't it?

No.  It just can't say if a connection was closed or not as there
are pending data in the connection, and it can't read data (there
may be a pipelined request).  Therefore in this case, being on the
safe side, it assumes the connection isn't closed and doesn't try
to abort upstream request.

--
Maxim Dounin
http://nginx.org/en/donation.html
Posted by Robert Mueller (Guest)
on 2013-03-19 05:45
(Received via mailing list)
> > When an https client drops it's connection, the upstream http proxy
> > connection is not dropped. If nginx can't detect an https client
> > disconnect properly, that must mean it's leaking connection information
> > internally doesn't it?
>
> No.  It just can't say if a connection was closed or not as there
> are pending data in the connection, and it can't read data (there
> may be a pipelined request).  Therefore in this case, being on the
> safe side, it assumes the connection isn't closed and doesn't try
> to abort upstream request.

Oh right I see now.

So the underlying problem is that the nginx stream layer abstraction
isn't clean enough to handle low level OS events and then map them
through the SSL layer to read/write/eof conceptual events as needed.
Instead you need an OS level "eof" event, which you then assume maps
through the SSL abstraction layer to a SSL stream eof event.

Ok, so I had a look at the kqueue eof handling, and what's needed for
epoll eof handling, and created a quick patch that seems to work.

Can you check this out, and see if it looks right. If so, any chance you
can incorporate it upstream?

http://robm.fastmail.fm/downloads/nginx-epoll-eof.patch

If there's anything you want changed, let me know and I'll try and fix
it up.

Rob
Posted by Maxim Dounin (Guest)
on 2013-03-19 15:55
(Received via mailing list)
Hello!

On Tue, Mar 19, 2013 at 03:45:10PM +1100, Robert Mueller wrote:

> > to abort upstream request.
>
> Oh right I see now.
>
> So the underlying problem is that the nginx stream layer abstraction
> isn't clean enough to handle low level OS events and then map them
> through the SSL layer to read/write/eof conceptual events as needed.
> Instead you need an OS level "eof" event, which you then assume maps
> through the SSL abstraction layer to a SSL stream eof event.

Not exactly.  The underlying problem is that BSD sockets API
doesn't provide standard means to detect EOF without reading all
pending data, and hence OS-specific extensions have to be used to
reliably detect pending EOFs.

> Ok, so I had a look at the kqueue eof handling, and what's needed for
> epoll eof handling, and created a quick patch that seems to work.
>
> Can you check this out, and see if it looks right. If so, any chance you
> can incorporate it upstream?
>
> http://robm.fastmail.fm/downloads/nginx-epoll-eof.patch
>
> If there's anything you want changed, let me know and I'll try and fix
> it up.

I don't really like what you did in ngx_http_upstream.c.  And
there are more places where ev->pending_eof is used, and probably
at least some of these places should be adapted, too.

Additionally, poll_ctl(2) manpage claims the EPOLLRDHUP only
available since Linux 2.6.17, and this suggests it needs some
configure tests/conditional compilation.

Valentin is already worked on this, and I believe he'll be able to
provide a little bit more generic patch.

--
Maxim Dounin
http://nginx.org/en/donation.html
Posted by Robert Mueller (Guest)
on 2013-03-20 02:08
(Received via mailing list)
> Valentin is already worked on this, and I believe he'll be able to
> provide a little bit more generic patch.

Ok, well I might just use ours for now, but won't develop it any
further.

Any idea on a time frame for this more official patch?

Rob
Please log in before posting. Registration is free and takes only a minute.
Existing account (Switch to SSL-encrypted connection)
NEW: Do you have a Google/GoogleMail or Yahoo account? No registration required!
Log in with Google account | Log in with Yahoo account
No account? Register here.