Socket leaks, pread() and SSL_write() in 1.0.14

luislavena · March 26, 2012, 9:26am

Hi everyone,

I just wanted to let the developers know about some errors I
found in Nginx 1.0.14, with debug mode disabled.
Socket leaks and pread:
2012/03/24 23:29:52 [alert] 10770#0: open socket #46 left in connection 5
2012/03/24 23:29:52 [alert] 10770#0: open socket #115 left in connection 54
2012/03/24 23:29:52 [alert] 10770#0: open socket #110 left in connection 107
2012/03/24 23:29:52 [alert] 10770#0: aborting
2012/03/24 23:29:52 [alert] 10772#0: open socket #44 left in connection 57
2012/03/24 23:29:52 [alert] 10772#0: open socket #38 left in connection 140
2012/03/24 23:29:52 [alert] 10772#0: aborting
2012/03/25 00:01:35 [alert] 4105#0: *14584 pread() read only 0 of 5733 from "/var/www/domain.com/index.html" while sending response to client, client: xx.xxx.xx.xxx, server: www.domain.com, request: "GET / HTTP/1.1", host: "www.domain.com"
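To get a rough count of how often these alerts appear, a small C sketch can scan an error log for "open socket #N left in connection M" lines. As far as I can tell, nginx writes this alert (followed by "aborting") when a worker process exits, for example after a reload, while some of its connections still hold sockets. The sketch assumes the 1.0.14 line layout shown above, with one entry per line as in the real error.log (the wrapping in mail is not present on disk). The names socket_leak, parse_socket_leak and count_socket_leaks are hypothetical helpers, not part of nginx.

```c
#include <assert.h>
#include <stdio.h>

/* One parsed "open socket ... left in connection ..." alert. */
struct socket_leak {
    int      pid;   /* worker process id, e.g. 10770 */
    int      fd;    /* socket descriptor still open, e.g. 46 */
    unsigned conn;  /* index in the worker's connection array, e.g. 5 */
};

/*
 * Parse one error-log line in the 1.0.14 format.
 * Returns 1 and fills *out on a match, 0 for any other line
 * (including "aborting" and pread() alerts).
 */
int parse_socket_leak(const char *line, struct socket_leak *out)
{
    char     date[16], time[16];
    int      pid, tid, fd;
    unsigned conn;

    assert(line != NULL && out != NULL);

    if (sscanf(line,
               "%15s %15s [alert] %d#%d: open socket #%d left in connection %u",
               date, time, &pid, &tid, &fd, &conn) != 6) {
        return 0;
    }

    out->pid = pid;
    out->fd = fd;
    out->conn = conn;
    return 1;
}

/* Count leak alerts in a stream, e.g. an opened error.log. */
unsigned count_socket_leaks(FILE *log)
{
    char               line[1024];
    struct socket_leak leak;
    unsigned           n = 0;

    while (fgets(line, sizeof line, log) != NULL) {
        if (parse_socket_leak(line, &leak)) {
            n++;
        }
    }
    return n;
}
```

Opening /var/log/nginx/error.log with fopen() and passing the stream to count_socket_leaks() gives a total; filtering on leak.pid would show which workers were affected.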

SSL_write():
2012/03/25 12:59:25 [crit] 8254#0: *342055 SSL_write() failed (SSL:) while sending to client, client: xxx.xx.xxx.xxx, server: www.domain.com, request: "GET /community/attachments/info007-jpg.179371/ HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com", referrer: "https://www.domain.com/community/threads/photos.42232/"
2012/03/25 14:52:15 [crit] 8253#0: *388269 SSL_write() failed (SSL:) while sending to client, client: xx.xxx.xx.xxx, server: www.domain.com, request: "GET /community/ HTTP/1.0", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"

We are running Nginx on CentOS 5.8 64-bit, with openssl 0.9.8e-22.el5.
All ssl directives are located in http; only ssl_certificate and
ssl_certificate_key are in server.

uname -a

Linux chronos.domain.com 2.6.18-274.18.1.el5 #1 SMP Thu Feb 9 12:45:44 EST 2012 x86_64 x86_64 x86_64 GNU/Linux

nginx -V

nginx version: nginx/1.0.14
built by gcc 4.1.2 20080704 (Red Hat 4.1.2-52)
TLS SNI support disabled
configure arguments: --user=nginx --group=nginx
--prefix=/usr/share/nginx --sbin-path=/usr/sbin/nginx
--conf-path=/etc/nginx/nginx.conf --pid-path=/var/run/nginx.pid
--error-log-path=/var/log/nginx/error.log
--http-log-path=/var/log/nginx/access.log
--http-client-body-temp-path=/var/lib/nginx/client
--http-fastcgi-temp-path=/var/lib/nginx/fastcgi
--http-proxy-temp-path=/var/lib/nginx/proxy
--http-scgi-temp-path=/var/lib/nginx/scgi
--http-uwsgi-temp-path=/var/lib/nginx/uwsgi
--lock-path=/var/lock/subsys/nginx --with-cc-opt='-O3 -g -m64
-mtune=nocona -m128bit-long-double -mmmx -msse3 -mfpmath=sse'
--with-file-aio --with-http_addition_module --with-http_dav_module
--with-http_degradation_module --with-http_flv_module
--with-http_geoip_module --with-http_gzip_static_module
--with-http_image_filter_module --with-http_mp4_module
--with-http_perl_module --with-http_random_index_module
--with-http_realip_module --with-http_secure_link_module
--with-http_ssl_module --with-http_stub_status_module
--with-http_sub_module --with-http_xslt_module --with-mail
--with-mail_ssl_module --with-poll_module --with-rtsig_module
--with-select_module

open_file_cache is enabled in http:
open_file_cache max=1024 inactive=30s;
open_file_cache_errors on;
open_file_cache_min_uses 2;
open_log_file_cache max=1024 inactive=30s min_uses=2;

Posted at Nginx Forum:

teck · March 26, 2012, 9:30am

This occurs randomly; it is not constant.

teck · March 26, 2012, 12:22pm

Hello!

On Sun, Mar 25, 2012 at 11:55:53PM -0400, TECK wrote:

2012/03/24 23:29:52 [alert] 10770#0: open socket #110 left in connection 107
2012/03/24 23:29:52 [alert] 10770#0: aborting
2012/03/24 23:29:52 [alert] 10772#0: open socket #44 left in connection 57
2012/03/24 23:29:52 [alert] 10772#0: open socket #38 left in connection 140
2012/03/24 23:29:52 [alert] 10772#0: aborting

Do you see this as a regression from some previous version? If
yes - which one? Do you see the same problem in 1.1.x?

2012/03/25 00:01:35 [alert] 4105#0: *14584 pread() read only 0 of 5733 from "/var/www/domain.com/index.html" while sending response to client, client: xx.xxx.xx.xxx, server: www.domain.com, request: "GET / HTTP/1.1", host: "www.domain.com"

This usually happens if you update files non-atomically, i.e. edit
files in place instead of creating a new file and then renaming it
to the desired name. The obvious solution is to update files atomically.
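A minimal C sketch of such an atomic update, assuming a POSIX system: write the new contents to a temporary file in the same directory, fsync() it, then rename() it over the old name, so readers see either the complete old file or the complete new one. atomic_write_file() is a hypothetical helper, and the 0644 mode is an assumption to adjust to whatever the document root normally uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Atomically replace `path` with `len` bytes from `data`.
 * Returns 0 on success, -1 on error with errno set.
 */
int atomic_write_file(const char *path, const char *data, size_t len)
{
    char   tmp[4096];
    size_t off = 0;
    int    fd, saved, r;

    assert(path != NULL && (data != NULL || len == 0));

    /* Temporary name next to the target: rename() is only atomic
     * within a single filesystem. */
    r = snprintf(tmp, sizeof tmp, "%s.XXXXXX", path);
    if (r < 0 || r >= (int) sizeof tmp) {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = mkstemp(tmp);                  /* created with mode 0600 */
    if (fd == -1) {
        return -1;
    }

    /* Assumption: 0644 so an unprivileged worker can read the file. */
    if (fchmod(fd, 0644) == -1) {
        goto fail;
    }

    while (off < len) {
        ssize_t n = write(fd, data + off, len - off);
        if (n == -1) {
            if (errno == EINTR) {
                continue;
            }
            goto fail;
        }
        off += (size_t) n;
    }

    if (fsync(fd) == -1) {              /* data on disk before the switch */
        goto fail;
    }

    if (close(fd) == -1) {
        saved = errno;
        unlink(tmp);
        errno = saved;
        return -1;
    }

    /* The switch itself: old inode stays intact for existing readers. */
    if (rename(tmp, path) == -1) {
        saved = errno;
        unlink(tmp);
        errno = saved;
        return -1;
    }
    return 0;

fail:
    saved = errno;
    close(fd);
    unlink(tmp);
    errno = saved;
    return -1;
}
```

From a shell, saving to a temporary name and then using mv within the same filesystem gives the same guarantee, since mv uses rename() in that case; an editor that overwrites the file in place does not.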

Note well: by using open_file_cache you allow a much bigger time
frame for non-atomic updates to trigger problems. If you can’t
eliminate non-atomic updates, it’s a good idea to avoid using
open_file_cache.
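To illustrate why an in-place edit produces "pread() read only 0 of 5733", the C sketch below mimics a cached descriptor: it opens the file and records its size, truncates and rewrites the same file through a second descriptor (as an in-place editor does), then calls pread() for the recorded size. This is a simplified model of the race, not nginx's open_file_cache code, and pread_after_inplace_edit() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns the pread() result for the stale size, or -1 on setup
 * failure. *cached_size receives the size recorded before the edit.
 */
ssize_t pread_after_inplace_edit(const char *path, const char *new_data,
                                 size_t new_len, off_t *cached_size)
{
    char        buf[65536];
    struct stat st;
    size_t      want;
    ssize_t     n;
    int         cached, editor;

    /* 1. Open once and remember the size, as a descriptor cache would. */
    cached = open(path, O_RDONLY);
    if (cached == -1) {
        return -1;
    }
    if (fstat(cached, &st) == -1) {
        close(cached);
        return -1;
    }
    *cached_size = st.st_size;
    want = (size_t) st.st_size < sizeof buf ? (size_t) st.st_size : sizeof buf;

    /* 2. Edit in place: truncate the same inode, then write new data. */
    editor = open(path, O_WRONLY | O_TRUNC);
    if (editor == -1) {
        close(cached);
        return -1;
    }
    if (new_len > 0 && write(editor, new_data, new_len) != (ssize_t) new_len) {
        close(editor);
        close(cached);
        return -1;
    }
    close(editor);

    /* 3. Read with the stale size: short read, or 0 while still empty. */
    n = pread(cached, buf, want, 0);

    close(cached);
    return n;
}
```

Because the truncation happens on the same inode, the cached descriptor sees the shrunken file. With a rename-based update, the cached descriptor would keep pointing at the old, unchanged inode.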

We are running Nginx on CentOS 5.8 64bits, with openssl 0.9.8e-22.el5.

As openssl 0.9.8e is quite old, I assume it’s heavily modified by
your OS vendor. Do you see the same errors if you compile nginx with
recent vanilla openssl (0.9.8u, 1.0.0h or 1.0.1 will be ok)?

[…]

Maxim D.

teck · March 26, 2012, 2:05pm

Hi Maxim,

Do you see this as a regression from some previous version? If
yes - which one? Do you see the same problem in 1.1.x?

I used 1.0.12 before and did not experience the socket leaks.

This usually happens if you update files non-atomically, i.e. edit
files in-place instead of creating new file and then renaming it
to desired name. Obvious solution is to update files atomically.

Thanks, that is what I was doing: editing the file in place with nano.

As openssl 0.9.8e is quite old, I assume it’s heavily modified by
your OS vendor. Do you see the same errors if you compile nginx with
recent vanilla openssl (0.9.8u, 1.0.0h or 1.0.1 will be ok)?

We are using the default openssl version available in CentOS 5.8.
I could look into that, but we are talking about hundreds of thousands of
servers still using 0.9.8e.
Personally, I’m not yet comfortable moving to CentOS 6.2. I will create
an openssl-1.0.1 RPM for CentOS 5.8, test it on a development server,
then move it into production. Still, I don’t recall noticing any SSL
errors with the previous Nginx version (1.0.12).

teck · March 26, 2012, 2:50pm

I looked into upgrading to 1.0.1, and it is not possible: we are talking
about 84 package dependencies on a minimal install. So upgrading openssl
is not viable.

teck · March 26, 2012, 6:48pm

Hello!

On Mon, Mar 26, 2012 at 08:04:13AM -0400, TECK wrote:

Hi Maxim,

Do you see this as a regression from some previous version? If
yes - which one? Do you see the same problem in 1.1.x?

I used 1.0.12 before and did not experience the socket leaks.

That’s really strange, changes between 1.0.12 and 1.0.14 are
minimal. Could you please re-try with 1.0.12 to see if it works
for you without problems?

As openssl 0.9.8e is quite old, I assume it’s heavily modified by
your OS vendor. Do you see the same errors if you compile nginx with
recent vanilla openssl (0.9.8u, 1.0.0h or 1.0.1 will be ok)?

We are using the default openssl version available in CentOS 5.8.
I could look into that, but we are talking about hundreds of thousands of
servers still using 0.9.8e.

I’m mostly concerned about local changes made by your OS vendor, not about
openssl 0.9.8e by itself. BTW, when did you last upgrade your openssl?
I.e., did the same openssl package version work for you before, or did
you upgrade it along with nginx?

Personally, I’m not yet comfortable moving to CentOS 6.2. I will create
an openssl-1.0.1 RPM for CentOS 5.8, test it on a development server,
then move it into production. Still, I don’t recall noticing any SSL
errors with the previous Nginx version (1.0.12).

As already suggested - you may build nginx with any particular
openssl version statically, by using --with-openssl= configure
argument.

Maxim D.

teck · March 26, 2012, 4:42pm

No need to: it will be statically compiled. Just replace “make
%{?_smp_mflags}” with plain “make”, otherwise it is not going to work with
OpenSSL 1.0.1 due to a bug. You will also be able to use ECDHE
(elliptic-curve ephemeral Diffie-Hellman), which is a lot better than plain
ephemeral Diffie-Hellman (if you need Perfect Forward Secrecy, of course);
most browsers already support it.

'--with-openssl=/usr/src/redhat/SOURCES/openssl-1.0.1
--with-openssl-opt=enable-ec_nistp_64_gcc_128 --with-cc=/usr/bin/gcc44'
will do the job for a 64-bit build. Just make sure you are using gcc44
(export CC=/usr/bin/gcc44).

teck · March 27, 2012, 4:29am

I mention this because the [crit] issues were present in the past with
Nginx 0.8 and were solved by Igor. I was hoping a similar fix would be
provided for the latest stable version. Do you recommend switching to the
development version? Upgrading openssl is not a viable solution at
present, and we are getting a fairly large number of [crit]
SSL_write() errors in our logs.

teck · March 27, 2012, 12:57am

That’s really strange, changes between 1.0.12 and 1.0.14 are
minimal. Could you please re-try with 1.0.12 to see if it works
for you without problems?

Will do, Maxim. I have to rebuild the RPM, as I removed the
previous version from the yum repository.

As already suggested - you may build nginx with any particular
openssl version statically, by using --with-openssl= configure
argument.

Personally, I don’t think it is a good idea at all to compile source on a
production server. I only use RPMs for the sake of easy upgrades, and I
build my own RPMs for whatever is missing. Either way, the answer is
clear: having Nginx work properly with SSL in a production environment
will require upgrading to CentOS 6, which has an openssl 1.0.0 RPM
available. I thought it was a bug that generated those random crit
issues.

teck · March 29, 2012, 10:51am

More updates on the SSL issue. I have the following configuration:
http {

…

    ssl_prefer_server_ciphers       on;
    ssl_ciphers                     RC4:HIGH:!aNULL:!MD5;
    ssl_protocols                   TLSv1 TLSv1.1 TLSv1.2;
    ssl_session_cache               shared:SSL:5m;
    ssl_session_timeout             10m;

…
}

If I disable ssl_prefer_server_ciphers, the [crit] errors are gone.
On the other hand, I can no longer use RC4. Any idea what could
cause this?
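One plausible reading of the RC4 observation: with ssl_prefer_server_ciphers on, the server's order decides (RC4 first, from RC4:HIGH:!aNULL:!MD5); with it off, the client's order decides, and a client that lists AES suites ahead of RC4 never gets RC4. The C sketch below is a simplified model of that selection rule only, not OpenSSL's implementation; pick_cipher() is a hypothetical name and the client cipher order used in the test is an assumption. It does not explain the [crit] errors themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `name` appears among the first `n` entries of `list`. */
static int offers(const char *const *list, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(list[i], name) == 0) {
            return 1;
        }
    }
    return 0;
}

/*
 * Walk the preferred side's list in order and return the first cipher
 * the other side also supports. prefer_server != 0 models
 * "ssl_prefer_server_ciphers on".
 */
const char *pick_cipher(const char *const *server, size_t ns,
                        const char *const *client, size_t nc,
                        int prefer_server)
{
    const char *const *first   = prefer_server ? server : client;
    size_t             nfirst  = prefer_server ? ns : nc;
    const char *const *second  = prefer_server ? client : server;
    size_t             nsecond = prefer_server ? nc : ns;

    assert(server != NULL && client != NULL);

    for (size_t i = 0; i < nfirst; i++) {
        if (offers(second, nsecond, first[i])) {
            return first[i];
        }
    }
    return NULL;  /* no shared cipher: the handshake would fail */
}
```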

teck · April 1, 2012, 12:40am

Hi Maxim,

On 3/26/2012 12:47 PM, Maxim D. wrote:

As already suggested - you may build nginx with any particular
openssl version statically, by using --with-openssl= configure
argument.

I followed your advice and built a backlevel RPM for libcrypto.so.6 and
libssl.so.6, so none of the deps are broken in CentOS 5. Then I built the
OpenSSL 1.0.1 RPMs and rebuilt Nginx against the latest libs:

yum list openssl* nginx

Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile

base: mirrors.manchester.icecolo.com
extras: mirrors.manchester.icecolo.com
updates: mirrors.manchester.icecolo.com
Installed Packages
nginx.x86_64 1.0.14-1.el5 installed
openssl.x86_64 1.0.1-1.el5 installed
openssl-libs.x86_64 1.0.1-1.el5 installed
openssl098e.x86_64 0.9.8e-1.el5 installed

nginx -V

nginx version: nginx/1.0.14
built by gcc 4.1.2 20080704 (Red Hat 4.1.2-52)
TLS SNI support enabled
configure arguments: --user=nginx --group=nginx
--prefix=/usr/share/nginx --sbin-path=/usr/sbin/nginx
--conf-path=/etc/nginx/nginx.conf --pid-path=/var/run/nginx.pid
--error-log-path=/var/log/nginx/error.log
--http-log-path=/var/log/nginx/access.log
--http-client-body-temp-path=/var/lib/nginx/client
--http-fastcgi-temp-path=/var/lib/nginx/fastcgi
--http-proxy-temp-path=/var/lib/nginx/proxy
--http-scgi-temp-path=/var/lib/nginx/scgi
--http-uwsgi-temp-path=/var/lib/nginx/uwsgi
--lock-path=/var/lock/subsys/nginx --with-cc-opt='-O3 -g -m64
-mtune=nocona -m128bit-long-double -mmmx -msse3 -mfpmath=sse'
--with-file-aio --with-http_addition_module --with-http_dav_module
--with-http_degradation_module --with-http_flv_module
--with-http_geoip_module --with-http_gzip_static_module
--with-http_image_filter_module --with-http_mp4_module
--with-http_perl_module --with-http_random_index_module
--with-http_realip_module --with-http_secure_link_module
--with-http_ssl_module --with-http_stub_status_module
--with-http_sub_module --with-http_xslt_module --with-mail
--with-mail_ssl_module --with-poll_module --with-rtsig_module
--with-select_module

http {
    …
    ssl_prefer_server_ciphers on;
    ssl_ciphers RC4:HIGH:!aNULL:!MD5;
    ssl_protocols SSLv3 TLSv1 TLSv1.1 TLSv1.2;
    ssl_session_cache shared:SSL:5m;
    ssl_session_timeout 10m;
    …

    server {
        listen 192.168.1.3:443 ssl default_server;
        server_name www.domain.com;
        access_log off;
        error_log /var/log/nginx/localhost.error.log error;
        root /var/www/domain.com;
        index index.php index.html;
        ssl_certificate domain.com.crt;
        ssl_certificate_key domain.com.key;
        …
    }
}

Even though I eliminated the OpenSSL version issues, I still get random
[crit] SSL_write() failures at the same frequency as before. They are
also accompanied by open socket alerts of this format:
[alert] 2380#0: open socket #34 left in connection 12

I’m looking forward to your suggestions.

Regards,

Floren

teck · April 2, 2012, 6:20pm

Hi Maxim,

On 4/2/2012 7:17 AM, Maxim D. wrote:

As already suggested, it would be cool to check if you see the
same problem in 1.1.x.

And to proceed further we need debug log, see here:

Debugging | NGINX

Note you’ll need to recompile nginx with the “--with-debug” configure
argument to obtain one.

How safe is it to put a debug build of Nginx in production?
I could easily compile a different build, but I’m concerned about running
it in a live environment, as this is where the errors occur.

Regards,

Floren M.

teck · April 2, 2012, 7:07pm

Hello!

On Mon, Apr 02, 2012 at 12:15:10PM -0400, Floren M. wrote:

Note you’ll need to recompile nginx with the “--with-debug” configure
argument to obtain one.

How safe is it to put a debug build of Nginx in production?
I could easily compile a different build, but I’m concerned about
running it in a live environment, as this is where the errors
occur.

It’s safe and only adds minimal overhead on logging level checks
as long as the debug log isn’t enabled.

Note though that writing debug logs consumes extra resources, and
it may be noticeable under high load. The debug_connection
directive allows enabling debug logging only for selected IP
addresses to minimize impact.
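debug_connection goes in the events block and takes an address or a CIDR block, for example debug_connection 192.168.1.0/24;. The C sketch below shows the kind of IPv4 prefix test such a filter implies, so only matching clients pay the debug-logging cost. It is illustrative only: ipv4_in_cidr() is a hypothetical name, not nginx's code, and IPv6 is not handled.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>

/*
 * Return 1 if IPv4 address `addr` is inside `net`/`prefix`, 0 if not,
 * -1 on malformed input.
 */
int ipv4_in_cidr(const char *addr, const char *net, unsigned prefix)
{
    struct in_addr a, n;
    uint32_t       mask;

    if (prefix > 32
        || inet_pton(AF_INET, addr, &a) != 1
        || inet_pton(AF_INET, net, &n) != 1)
    {
        return -1;
    }

    /* Shifting a 32-bit value by 32 is undefined, so /0 is special-cased.
     * The mask is built in host order, then converted like s_addr. */
    mask = prefix == 0 ? 0 : htonl(0xFFFFFFFFu << (32 - prefix));

    return (a.s_addr & mask) == (n.s_addr & mask);
}
```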

Maxim D.

teck · April 2, 2012, 1:18pm

Hello!

On Sat, Mar 31, 2012 at 06:39:37PM -0400, Floren M. wrote:

libs:
openssl098e.x86_64 0.9.8e-1.el5 installed
--http-client-body-temp-path=/var/lib/nginx/client
--with-http_perl_module --with-http_random_index_module
--with-http_realip_module --with-http_secure_link_module
--with-http_ssl_module --with-http_stub_status_module
--with-http_sub_module --with-http_xslt_module --with-mail
--with-mail_ssl_module --with-poll_module --with-rtsig_module
--with-select_module

Please also check if nginx actually uses the new openssl library; ldd
should be helpful here.
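ldd shows which shared libraries the binary is linked against. On Linux, the libraries a running worker has actually mapped can also be read from /proc/<pid>/maps. The C sketch below extracts the pathname field from each maps line and prints those containing a substring such as "libssl". The function names are hypothetical; reading another process's maps typically requires root or the same user, and a build that links OpenSSL statically (--with-openssl=) shows no libssl mapping at all.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the pathname field of one /proc/<pid>/maps line, or NULL for an
 * anonymous mapping. Fields: address perms offset dev inode [pathname].
 */
const char *maps_line_path(const char *line)
{
    const char *p = line;

    for (int field = 0; field < 5; field++) {
        while (*p && isspace((unsigned char) *p)) p++;
        if (*p == '\0') {
            return NULL;
        }
        while (*p && !isspace((unsigned char) *p)) p++;
    }
    while (*p && isspace((unsigned char) *p)) p++;

    return *p ? p : NULL;
}

/*
 * Print each mapping of process `pid` whose path contains `needle`.
 * A library usually appears on several lines (one per segment).
 * Returns the number of matching lines, or -1 if maps can't be read.
 */
int print_mapped(int pid, const char *needle)
{
    char  fname[64], line[4096];
    FILE *f;
    int   n = 0;

    snprintf(fname, sizeof fname, "/proc/%d/maps", pid);
    f = fopen(fname, "r");
    if (f == NULL) {
        return -1;
    }

    while (fgets(line, sizeof line, f) != NULL) {
        const char *path = maps_line_path(line);
        if (path != NULL && strstr(path, needle) != NULL) {
            fputs(path, stdout);        /* path keeps its trailing '\n' */
            n++;
        }
    }

    fclose(f);
    return n;
}
```

Called as print_mapped(worker_pid, "libssl") (with worker_pid taken from ps), a dynamically linked worker should print the libssl path it really loaded.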

server {
}

Even though I eliminated the OpenSSL version issues, I still get random
[crit] SSL_write() failures at the same frequency as before. They
are also accompanied by open socket alerts of this format:
[alert] 2380#0: open socket #34 left in connection 12

I’m looking forward to your suggestions.

As already suggested, it would be cool to check if you see the
same problem in 1.1.x.

And to proceed further we need debug log, see here:

Note you’ll need to recompile nginx with the “--with-debug” configure
argument to obtain one.

Maxim D.

teck · April 3, 2012, 3:09am

Hi Maxim,

On 4/2/2012 1:06 PM, Maxim D. wrote:

It’s safe and only adds minimal overhead on logging level checks
as long as the debug log isn’t enabled.

Note though that writing debug logs consumes extra resources, and
it may be noticeable under high load. The debug_connection
directive allows enabling debug logging only for selected IP
addresses to minimize impact.

I’ve built an nginx-debug RPM, but I’m tempted to try 1.1.18 now.
I will use 1.0.14 instead, just in case there is a nasty bug hidden
somewhere in Nginx. This way, we fix both versions. These are the
compile cflags I used:
gcc -c -pipe -O -W -Wall -Wpointer-arith -Wno-unused-parameter
-Wunused-function -Wunused-variable -Wunused-value -Werror -g -O3 -g
-m64 -mtune=nocona -m128bit-long-double -mmmx -msse3 -mfpmath=sse

I’m going to leave error_log set to error and give it a 256k buffer,
so it should not affect performance. Since we only deal with [crit], I
don’t think we need to debug anything lower than error.

I’ll add the following settings in http:
debug_points abort;
worker_rlimit_core 512M;
working_directory /var/lib/nginx/;
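debug_points, worker_rlimit_core and working_directory are main-context directives (top level of nginx.conf), not http directives. Roughly, worker_rlimit_core raises the core file size limit of worker processes, and working_directory sets the directory they chdir() into, which is where a core from debug_points abort would be written (on Linux, subject to kernel.core_pattern). The C sketch below shows the equivalent POSIX calls under those assumptions; prepare_core_dumps() is a hypothetical helper, not nginx's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Allow core files of up to `bytes` and make `dir` the working directory,
 * so an abort() can leave a core there. `dir` must be writable by the
 * user the process runs as. Returns 0 on success, -1 on error.
 */
int prepare_core_dumps(const char *dir, rlim_t bytes)
{
    struct rlimit rl;

    assert(dir != NULL);

    if (getrlimit(RLIMIT_CORE, &rl) == -1) {
        return -1;
    }

    /* Without privileges the soft limit can't exceed the hard limit. */
    if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max) {
        bytes = rl.rlim_max;
    }
    rl.rlim_cur = bytes;

    if (setrlimit(RLIMIT_CORE, &rl) == -1) {
        return -1;
    }

    return chdir(dir);
}
```

A call such as prepare_core_dumps("/var/lib/nginx/", (rlim_t) 512 * 1024 * 1024) mirrors the 512M and working directory values above.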

Please let me know if I should do anything extra.

Regards,

Floren

teck · April 5, 2012, 5:18pm

Hi Maxim,

On 4/3/2012 12:32 PM, Maxim D. wrote:

I’m afraid we aren’t going to fix 1.0.x anyway. The 1.1.x branch
will become stable in several weeks, and 1.0.x is expected to get
only security fixes.

Thanks, I will install a debug version of 1.1.18 on the live server.

For now, please capture a debug log. Anything else (including a gdb
dump of the request in question) isn’t that important, but if you
are going to obtain one, it might be a good idea to recompile nginx
without -O3, as it might produce undesired side-effects in gdb
output. The -O used by default is good enough.

I’m afraid I will have to keep -O3 for the sake of compile consistency
with other RPMs. I will do a debug capture on the latest devel version;
the [crit] errors still occur with version 1.1.18 and OpenSSL 1.0.1.

Regards,

Floren

teck · April 3, 2012, 6:32pm

Hello!

On Mon, Apr 02, 2012 at 09:08:47PM -0400, Floren M. wrote:

I’ve built an nginx-debug RPM, but I’m tempted to try 1.1.18 now.
I will use 1.0.14 instead, just in case there is a nasty bug hidden
somewhere in Nginx. This way, we fix both versions. These are the

I’m afraid we aren’t going to fix 1.0.x anyway. The 1.1.x branch
will become stable in several weeks, and 1.0.x is expected to get
only security fixes.

compile cflags I used:
gcc -c -pipe -O -W -Wall -Wpointer-arith -Wno-unused-parameter
-Wunused-function -Wunused-variable -Wunused-value -Werror -g -O3 -g
-m64 -mtune=nocona -m128bit-long-double -mmmx -msse3 -mfpmath=sse

I’m going to leave error_log set to error and give it a 256k
buffer, so it should not affect performance. Since we only deal

The error log doesn’t support buffering.

with [crit], I don’t think we need to debug anything lower than error.

I need complete request debug log to understand what’s going on
during request processing (and what’s going wrong).

I’ll add the following settings in http:
debug_points abort;
worker_rlimit_core 512M;
working_directory /var/lib/nginx/;

Please let me know if I should do anything extra.

For now, please capture a debug log. Anything else (including a gdb
dump of the request in question) isn’t that important, but if you
are going to obtain one, it might be a good idea to recompile nginx
without -O3, as it might produce undesired side-effects in gdb
output. The -O used by default is good enough.

Maxim D.