I just wanted to let the developers know about some errors I
discovered in Nginx 1.0.14, with debug mode disabled.
Socket leaks and pread() errors:
2012/03/24 23:29:52 [alert] 10770#0: open socket #46 left in connection 5
2012/03/24 23:29:52 [alert] 10770#0: open socket #115 left in connection 54
2012/03/24 23:29:52 [alert] 10770#0: open socket #110 left in connection 107
2012/03/24 23:29:52 [alert] 10770#0: aborting
2012/03/24 23:29:52 [alert] 10772#0: open socket #44 left in connection 57
2012/03/24 23:29:52 [alert] 10772#0: open socket #38 left in connection 140
2012/03/24 23:29:52 [alert] 10772#0: aborting
2012/03/25 00:01:35 [alert] 4105#0: *14584 pread() read only 0 of 5733 from "/var/www/domain.com/index.html" while sending response to client, client: xx.xxx.xx.xxx, server: www.domain.com, request: "GET / HTTP/1.1", host: "www.domain.com"
We are running Nginx on CentOS 5.8 64-bit, with openssl 0.9.8e-22.el5.
All ssl directives are located in host; only ssl_certificate and
ssl_certificate_key are in server.
uname -a
Linux chronos.domain.com 2.6.18-274.18.1.el5 #1 SMP Thu Feb 9 12:45:44
EST 2012 x86_64 x86_64 x86_64 GNU/Linux
On Sun, Mar 25, 2012 at 11:55:53PM -0400, TECK wrote:
2012/03/24 23:29:52 [alert] 10770#0: open socket #110 left in connection 107
2012/03/24 23:29:52 [alert] 10770#0: aborting
2012/03/24 23:29:52 [alert] 10772#0: open socket #44 left in connection 57
2012/03/24 23:29:52 [alert] 10772#0: open socket #38 left in connection 140
2012/03/24 23:29:52 [alert] 10772#0: aborting
Do you see this as a regression from some previous version? If
yes - which one? Do you see the same problem in 1.1.x?
2012/03/25 00:01:35 [alert] 4105#0: *14584 pread() read only 0 of 5733 from "/var/www/domain.com/index.html" while sending response to client, client: xx.xxx.xx.xxx, server: www.domain.com, request: "GET / HTTP/1.1", host: "www.domain.com"
This usually happens if you update files non-atomically, i.e. edit
files in-place instead of creating a new file and then renaming it
to the desired name. The obvious solution is to update files atomically.
Note well: by using open_file_cache you allow a much bigger time
frame for non-atomic updates to trigger problems. If you can’t
eliminate non-atomic updates it’s a good idea to avoid using
open_file_cache.
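For illustration, a rough sketch of an atomic update (the path is taken
from the log above, and nano is just an example editor):

cp /var/www/domain.com/index.html /var/www/domain.com/.index.html.new
nano /var/www/domain.com/.index.html.new    # edit the copy, not the live file
mv /var/www/domain.com/.index.html.new /var/www/domain.com/index.html

The final mv is a rename() within the same filesystem, so nginx always
sees either the old file or the new one, never a partially written file.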
We are running Nginx on CentOS 5.8 64-bit, with openssl 0.9.8e-22.el5.
As openssl 0.9.8e is quite old, I assume it’s heavily modified by
your OS vendor. Do you see the same errors if you compile nginx with
recent vanilla openssl (0.9.8u, 1.0.0h or 1.0.1 will be ok)?
Do you see this as a regression from some previous version? If
yes - which one? Do you see the same problem in 1.1.x?
I used 1.0.12 before and did not experience the socket leaks.
This usually happens if you update files non-atomically, i.e. edit
files in-place instead of creating a new file and then renaming it
to the desired name. The obvious solution is to update files atomically.
Thanks, that is what I was doing, editing the file with nano.
As openssl 0.9.8e is quite old, I assume it’s heavily modified by
your OS vendor. Do you see the same errors if you compile nginx with
recent vanilla openssl (0.9.8u, 1.0.0h or 1.0.1 will be ok)?
We are using the default openssl version available in CentOS 5.8.
I could look into that, but we are talking about hundreds of thousands
of servers still using 0.9.8e.
Personally, I’m not yet comfortable moving to CentOS 6.2. I will create
an openssl-1.0.1 RPM for CentOS 5.8 and test it on a development server,
then move it into production. Still, I don’t recall noticing any SSL
errors on the previous Nginx version (1.0.12).
On Mon, Mar 26, 2012 at 08:04:13AM -0400, TECK wrote:
Hi Maxim,
Do you see this as a regression from some previous version? If
yes - which one? Do you see the same problem in 1.1.x?
I used 1.0.12 before and did not experience the socket leaks.
That’s really strange, changes between 1.0.12 and 1.0.14 are
minimal. Could you please re-try with 1.0.12 to see if it works
for you without problems?
As openssl 0.9.8e is quite old, I assume it’s heavily modified by
your OS vendor. Do you see the same errors if you compile nginx with
recent vanilla openssl (0.9.8u, 1.0.0h or 1.0.1 will be ok)?
We are using the default openssl version available in CentOS 5.8.
I could look into that, but we are talking about hundreds of thousands
of servers still using 0.9.8e.
I’m mostly concerned about local changes by your OS vendor, not about
openssl 0.9.8e by itself. BTW, when did you last upgrade your openssl?
I.e. did the same openssl package version work for you before, or did
you upgrade it together with nginx?
Personally, I’m not yet comfortable moving to CentOS 6.2. I will create
an openssl-1.0.1 RPM for CentOS 5.8 and test it on a development server,
then move it into production. Still, I don’t recall noticing any SSL
errors on the previous Nginx version (1.0.12).
As already suggested - you may build nginx with any particular
openssl version statically, by using the --with-openssl= configure
argument.
No need to - it will be statically compiled. Just drop the "make
%{?_smp_mflags}" part for just "make" - otherwise it’s not going to
work with OpenSSL 1.0.1 due to a bug. And you will be able to use
ECDHE - a lot better than plain ephemeral Diffie-Hellman (if you need
Perfect Forward Secrecy, of course); most browsers already support it.
'--with-openssl=/usr/src/redhat/SOURCES/openssl-1.0.1
--with-openssl-opt=enable-ec_nistp_64_gcc_128 --with-cc=/usr/bin/gcc44'
will do the job for a 64-bit build. Just make sure you are using gcc44
(export CC=/usr/bin/gcc44).
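For illustration, this is roughly how it would look in a hypothetical
nginx.spec %build section (only the flags quoted above plus
--with-http_ssl_module, which an HTTPS build needs; everything else
is omitted):

%build
export CC=/usr/bin/gcc44
./configure \
    --with-http_ssl_module \
    --with-openssl=/usr/src/redhat/SOURCES/openssl-1.0.1 \
    --with-openssl-opt=enable-ec_nistp_64_gcc_128 \
    --with-cc=/usr/bin/gcc44
# plain "make", not "make %{?_smp_mflags}": parallel builds of the
# bundled OpenSSL 1.0.1 reportedly fail
make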
I mention this because the [crit] issues were present in the past with
Nginx 0.8 and they were solved by Igor. I was hoping a similar fix would
be provided for the latest stable version. Do you recommend switching to
the development version? Upgrading openssl is not a viable solution at
the present time, and we are getting a fairly large number of [crit]
SSL_write() errors in our logs.
That’s really strange, changes between 1.0.12 and 1.0.14 are
minimal. Could you please re-try with 1.0.12 to see if it works
for you without problems?
Will do, Maxim. I have to rebuild the RPM again, as I tossed the
previous version from the yum repository.
As already suggested - you may build nginx with any particular
openssl version statically, by using the --with-openssl= configure
argument.
Personally, I don’t think it is a good idea at all to compile source on
a production server. I only use RPMs for the sake of easy upgrades, and
I build my own missing RPMs for that task. Either way, the answer is
clear: having Nginx work properly with SSL in a production environment
will require upgrading to CentOS 6, which has an openssl 1.0.0 RPM
available. I thought it was a bug that generated those random [crit]
issues.
If I disable ssl_prefer_server_ciphers, the [crit] errors are gone.
On the other hand, I can no longer use RC4. Any idea what could
cause this?
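For reference, the directives in question look roughly like this (the
cipher string below is only an illustrative placeholder, not my exact
configuration):

ssl_prefer_server_ciphers  on;                    # commenting this out makes the [crit] errors disappear
ssl_ciphers                RC4:HIGH:!aNULL:!MD5;  # placeholder list that prefers RC4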
As already suggested - you may build nginx with any particular
openssl version statically, by using the --with-openssl= configure
argument.
I followed your advice and built a backlevel RPM for libcrypto.so.6 and
libssl.so.6 so none of the dependencies are broken in CentOS 5. Then I
built the OpenSSL 1.0.1 RPMs and rebuilt Nginx against the latest libs:
yum list openssl* nginx
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Even though I eliminated the OpenSSL version issues, I still have random
[crit] SSL_write() failures at the same frequency as before. They are
also accompanied by open socket alerts, in this format:
[alert] 2380#0: open socket #34 left in connection 12
Note you’ll need to recompile nginx with the "--with-debug" configure
argument to obtain one.
How safe is it to put a debug build of Nginx in production?
I could easily compile a different build, but I’m concerned about
running this in a live environment, as this is where the errors occur.
On Mon, Apr 02, 2012 at 12:15:10PM -0400, Floren M. wrote:
Note you’ll need to recompile nginx with the "--with-debug" configure
argument to obtain one.
How safe is it to put a debug build of Nginx in production?
I could easily compile a different build, but I’m concerned about
running this in a live environment, as this is where the errors
occur.
It’s safe and only adds minimal overhead for logging level checks
as long as the debug log isn’t enabled.
Note though that writing debug logs consumes extra resources, and
it may be noticeable under high load. The debug_connection
directive allows enabling debug logging only for selected IP
addresses to minimize the impact.
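For example, something along these lines enables debug logging for a
single test client only (the address and log path are placeholders):

error_log  /var/log/nginx/error.log;    # debug output for the matched client goes here
events {
    debug_connection  192.0.2.1;        # debug-level logging only for this address
}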
Please also check if nginx actually uses the new openssl library; ldd
should be helpful here.
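E.g. (assuming the RPM installs the binary as /usr/sbin/nginx):

ldd /usr/sbin/nginx | grep -E 'libssl|libcrypto'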
server {
}
Even though I eliminated the OpenSSL version issues, I still have random
[crit] SSL_write() failures at the same frequency as before. They
are also accompanied by open socket alerts, in this format:
[alert] 2380#0: open socket #34 left in connection 12
I’m looking forward to your suggestions.
As already suggested, it would be cool to check if you see the
same problem in 1.1.x.
And to proceed further we need a debug log, see here:
Note you’ll need to recompile nginx with the "--with-debug" configure
argument to obtain one.
It’s safe and only adds minimal overhead for logging level checks
as long as the debug log isn’t enabled.
Note though that writing debug logs consumes extra resources, and
it may be noticeable under high load. The debug_connection
directive allows enabling debug logging only for selected IP
addresses to minimize the impact.
I’ve built a nginx-debug RPM but I’m tempted to try 1.1.18 now.
I will use 1.0.14 instead, just in case there is a nasty bug hidden
somewhere in Nginx. In this way, we fix both versions. These are the
compile cflags I used:
gcc -c -pipe -O -W -Wall -Wpointer-arith -Wno-unused-parameter
-Wunused-function -Wunused-variable -Wunused-value -Werror -g -O3 -g
-m64 -mtune=nocona -m128bit-long-double -mmmx -msse3 -mfpmath=sse
I’m going to leave the error_log set to error and give it a 256k buffer
so it should not affect performance. Since we only deal with [crit], I
don’t think we need to debug anything lower than error.
I’ll add the following settings into http:
debug_points abort;
worker_rlimit_core 512M;
working_directory /var/lib/nginx/;
I’m afraid we aren’t going to fix 1.0.x anyway. The 1.1.x branch
will become stable in several weeks, and 1.0.x is expected to get
only security fixes.
Thanks, I will install a debug version of 1.1.18 on the live server.
For now, please capture a debug log. Anything else (including a gdb
dump of the request in question) isn’t that important, but if you
are going to obtain one it might be a good idea to recompile nginx
without -O3, as it might produce undesired side effects in gdb
output. The -O used by default is good enough.
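If you do end up with a core file (with debug_points abort and
working_directory /var/lib/nginx/ as above, that’s where it should
land), a backtrace is usually enough; the binary path and core file
name below are just examples:

gdb /usr/sbin/nginx /var/lib/nginx/core.12345
(gdb) bt full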
I’m afraid I will have to keep the -O3 for the sake of compile
consistency with other RPMs. I will do a debug capture on the latest
devel version; the [crit] errors still occur with 1.1.18 and
OpenSSL 1.0.1.
On Mon, Apr 02, 2012 at 09:08:47PM -0400, Floren M. wrote:
I’ve built a nginx-debug RPM but I’m tempted to try 1.1.18 now.
I will use 1.0.14 instead, just in case there is a nasty bug hidden
somewhere in Nginx. In this way, we fix both versions. These are the
I’m afraid we aren’t going to fix 1.0.x anyway. The 1.1.x branch
will become stable in several weeks, and 1.0.x is expected to get
only security fixes.
I’m going to leave the error_log set to error and give it a 256k
buffer so it should not affect performance. Since we only deal with
The error log doesn’t support buffering.
[crit], I don’t think we need to debug anything lower than error.
I need a complete request debug log to understand what’s going on
during request processing (and what’s going wrong).
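That is, either globally (assuming a binary built with --with-debug;
the log path is a placeholder):

error_log  /var/log/nginx/error.log  debug;

or only for selected clients via debug_connection, as mentioned earlier.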
I’ll add the following settings into http:
debug_points abort;
worker_rlimit_core 512M;
working_directory /var/lib/nginx/;
Let me know please if I should do anything extra.
For now, please capture a debug log. Anything else (including a gdb
dump of the request in question) isn’t that important, but if you
are going to obtain one it might be a good idea to recompile nginx
without -O3, as it might produce undesired side effects in gdb
output. The -O used by default is good enough.
Maxim D.