SSL FD Leak

Ben_Maurer · December 30, 2007, 10:36pm

Hi,

On a server that has quite a few SSL connections, I started to notice
that FDs were leaking. I set the load balancer in front of nginx to stop
sending new requests to one server for a few minutes (to let the
keepalive time expire) and found that there were a few thousand FDs
open. netstat says that there are many sockets in the CLOSE_WAIT and
ESTABLISHED states for the SSL server. Many of them have data in the
receive queue.
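
For anyone wanting to repeat this check without netstat, here is a minimal sketch that tallies socket states and non-empty receive queues for one local port. It assumes a Linux host and parses text in the /proc/net/tcp layout; the function name summarize_tcp and the sample rows are made up for illustration.

```python
from collections import Counter

# Hex state codes used in /proc/net/tcp (subset relevant to this thread).
TCP_STATES = {"01": "ESTABLISHED", "06": "TIME_WAIT", "08": "CLOSE_WAIT", "0A": "LISTEN"}

def summarize_tcp(proc_net_tcp_text, local_port):
    """Count sockets per state on local_port, plus how many have unread data."""
    counts = Counter()
    with_rx_data = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue
        port = int(fields[1].rsplit(":", 1)[1], 16)  # local_address is HEXIP:HEXPORT
        if port != local_port:
            continue
        counts[TCP_STATES.get(fields[3], fields[3])] += 1
        rx_queue = int(fields[4].split(":")[1], 16)  # field is tx_queue:rx_queue
        if rx_queue > 0:
            with_rx_data += 1
    return counts, with_rx_data

# Hypothetical sample: two sockets on port 443 (0x01BB), one on port 80.
SAMPLE = "\n".join([
    "  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode",
    "   0: 0100007F:01BB 0100007F:89FF 08 00000000:00000125 00:00000000 00000000     0        0 12345",
    "   1: 0100007F:01BB 0100007F:8A00 01 00000000:00000000 00:00000000 00000000     0        0 12346",
    "   2: 0100007F:0050 0100007F:8A01 01 00000000:00000010 00:00000000 00000000     0        0 12347",
])

counts, with_rx_data = summarize_tcp(SAMPLE, 443)
```

On a live machine the input would be the contents of /proc/net/tcp instead of the sample string.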

Any ideas what might cause this? This is an up-to-date 0.5.x install.

Ben

Ben_Maurer · December 30, 2007, 11:21pm

Ben Maurer wrote:

Any ideas what might cause this? This is an up-to-date 0.5.x install.

Some progress on debugging this: it may have to do with the
deferred setting.

I’ve managed to get straces like this:

accept(6, {sa_family=AF_INET, sin_port=htons(35327), sin_addr=inet_addr("127.0.0.1")}, [16]) = 92
ioctl(92, FIONBIO, [1]) = 0
recv(92, 0xbf9c6c2b, 1, MSG_PEEK) = -1 EAGAIN (Resource temporarily unavailable)

by using:

ab -c500 -n2000 https://localhost:8095/

and aborting in the middle. It seems that these straces are the ones
that result in leaked FDs. The trace really doesn’t make much sense to
me. Deferred accept promises that the socket only goes into accept once
it has data or if it’s ready to be closed. Neither of these should
result in an EAGAIN. Regardless, it seems the problem is that the FD
never gets added to epoll at this point.
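
The EAGAIN in the trace can be reproduced outside nginx. The following is only an illustrative sketch in Python, not nginx's C code: peek_one_byte is a hypothetical helper, and a local socketpair stands in for the accepted connection.

```python
import socket

def peek_one_byte(sock):
    """Return the first pending byte without consuming it, or None if no data yet."""
    try:
        return sock.recv(1, socket.MSG_PEEK)
    except BlockingIOError:
        # EAGAIN / EWOULDBLOCK: nothing is queued yet. The caller has to
        # register the socket for read events and retry later; if it does
        # not, nothing will ever wake the connection up again.
        return None

server_side, client_side = socket.socketpair()
server_side.setblocking(False)

first = peek_one_byte(server_side)           # no data sent yet

client_side.sendall(b"GET / HTTP/1.0\r\n\r\n")
second = peek_one_byte(server_side)          # data is now queued
still_there = server_side.recv(3)            # the peek did not consume anything
```

The point of the sketch is the None branch: a socket that hits it and is not put back into the event loop has no path to being read or closed.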

Ben

Ben_Maurer · December 31, 2007, 12:03am

Ben Maurer wrote:

recv(92, 0xbf9c6c2b, 1, MSG_PEEK) = -1 EAGAIN (Resource temporarily unavailable)
[...]
Regardless, it seems the problem is that the FD
never gets added to epoll at this point.

It seems like commenting out the check for HTTP requests on the socket
made everything work. There’s probably a way to do this more correctly
(e.g., get the event added back into the epoll structure). With that said,
maybe it’d be possible to avoid the MSG_PEEK call completely. OpenSSL is
good at detecting this error:

2007/12/30 17:16:43 [crit] 18303#0: *2 SSL_do_handshake() failed (SSL:
error:1407609C:SSL routines:SSL23_GET_CLIENT_HELLO:http request) while
reading client request line, client: 127.0.0.1, server: warp10

The only disadvantage to doing this is that currently, nginx seems to
wait until the header has been fully parsed to send an error. However,
the client wouldn’t be able to notice a difference unless:

1. The initial request didn’t fit in a single write() (it’d have to be a
   few KB for this to happen)
2. It was super-picky about getting input before output.

Given that this is a rare error condition anyway, it doesn’t seem worth
the extra system call to handle this case…

Ben

Ben_Maurer · January 4, 2008, 9:42am

On Thu, Jan 03, 2008 at 12:38:14PM -0500, Ben Maurer wrote:

Is it worth thinking about some safeguard against this? For example, by
enforcing that every connection has some sort of timer guarding it.

The attached patch should fix the leak.

Ben_Maurer · January 4, 2008, 9:51am

On Sun, Dec 30, 2007 at 05:55:51PM -0500, Ben Maurer wrote:

Given that this is a rare error condition anyways, it doesn’t seem worth
the extra system call to handle this case…

Yes, OpenSSL detects a plain HTTP request; however, the request is then
lost, so nginx uses MSG_PEEK to test for an SSL handshake message first.

This makes it possible to redirect a plain request to HTTPS using the special 497 code:

error_page  497  = https://$host$request_uri  redirect;

or to create a custom error_page, log it, etc.
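
To illustrate why peeking keeps the request available, here is a rough Python sketch of a first-byte test. It is an assumption about the general shape of such a check, not nginx's actual code: classify_first_byte is a hypothetical name. The byte values themselves are standard: 0x16 is the TLS handshake record type, and an SSLv2 ClientHello starts with a byte that has its high bit set.

```python
import socket

def classify_first_byte(first):
    """Guess what a new connection is speaking from its first peeked byte."""
    if not first:
        return "closed"          # peer closed before sending anything
    value = first[0]
    if value == 0x16:
        return "tls"             # SSLv3/TLS handshake record
    if value & 0x80:
        return "sslv2"           # SSLv2-style ClientHello
    return "plain"               # most likely a plain HTTP request

server_side, client_side = socket.socketpair()
client_side.sendall(b"GET / HTTP/1.0\r\n\r\n")

peeked = server_side.recv(1, socket.MSG_PEEK)
kind = classify_first_byte(peeked)
request = server_side.recv(4096)   # the full request is still readable after the peek
```

Because the peek leaves the bytes in the receive queue, a "plain" result still allows the request line to be parsed and answered with the 497 handling described above.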

Ben_Maurer · January 3, 2008, 6:46pm

Hi Igor,

I hope you had a good vacation.

Ben Maurer wrote:

Regardless, it seems the problem is that the FD never gets added to
epoll at this point.

It seems like commenting out the check for HTTP requests on the socket
made everything work. There’s probably a way to do this more correctly
(e.g., get the event added back into the epoll structure). With that said,
maybe it’d be possible to avoid the MSG_PEEK call completely. OpenSSL is
good at detecting this error:

I tracked down one more FD leak: it seems that if OpenSSL returns
WANT_READ or WANT_WRITE during the handshake and the client then stops
responding to packets, the FD will leak. I think the solution to
this is to add a ngx_add_timer(c->read, 30000); when WANT_READ is
returned in ngx_ssl_handshake.
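
The idea behind the timer can be shown with a small Python sketch. It is not the nginx patch: wait_for_handshake_data is a hypothetical helper, it blocks in select() rather than using an event-loop timer, and the timeout is shortened from the 30 seconds suggested above so the example finishes quickly.

```python
import select
import socket

def wait_for_handshake_data(sock, timeout_s):
    """Wait up to timeout_s for more handshake bytes; close the socket otherwise."""
    readable, _, _ = select.select([sock], [], [], timeout_s)
    if readable:
        return True
    # The client went silent mid-handshake: release the FD instead of
    # keeping it open forever.
    sock.close()
    return False

# Case 1: the peer never sends anything, so the connection is reaped.
server_a, client_a = socket.socketpair()
silent_result = wait_for_handshake_data(server_a, 0.05)
silent_fd = server_a.fileno()      # -1 once the socket has been closed

# Case 2: the peer sends a byte in time, so the connection is kept.
server_b, client_b = socket.socketpair()
client_b.sendall(b"\x16")
active_result = wait_for_handshake_data(server_b, 0.05)
```

Without the timeout branch, the first connection would stay open for as long as the process runs, which matches the leak described above.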

I’m a bit worried about how easy it is for a connection to not have any
timer and thus be leaked when the client completely loses connectivity.
Is it worth thinking about some safeguard against this? For example, by
enforcing that every connection has some sort of timer guarding it.

Ben