Nginx - workers segfaulting

Dear All,

I’m facing a small problem with NGINX: the workers have been segfaulting since 11:20 this morning. From the kernel messages I got something like this:

nginx[6888]: segfault at 8 ip 0000000000426a30 sp 00007fff85c01e70 error 4 in nginx[400000+a8000]
nginx[6886]: segfault at 8 ip 0000000000426a30 sp 00007fff85c01e70 error 4 in nginx[400000+a8000]
nginx[6890]: segfault at 8 ip 0000000000426a30 sp 00007fff85c01e70 error 4 in nginx[400000+a8000]
nginx[6889]: segfault at 8 ip 0000000000426a30 sp 00007fff85c01e70 error 4 in nginx[400000+a8000]
nginx[6892]: segfault at 8 ip 0000000000426a30 sp 00007fff85c01e70 error 4 in nginx[400000+a8000]
nginx[6893]: segfault at 8 ip 0000000000426a30 sp 00007fff85c01e70 error 4 in nginx[400000+a8000]
nginx[6894]: segfault at 8 ip 0000000000426a30 sp 00007fff85c01e70 error 4 in nginx[400000+a8000]
nginx[6891]: segfault at 8 ip 0000000000426a30 sp 00007fff85c01e70 error 4 in nginx[400000+a8000]
nginx[6896]: segfault at 8 ip 0000000000426a30 sp 00007fff85c01e70 error 4 in nginx[400000+a8000]

The log files show the following:

2013/12/02 18:13:53 [alert] 26876#0: worker process 30412 exited on signal 11
2013/12/02 18:13:53 [alert] 26876#0: worker process 30414 exited on signal 11
2013/12/02 18:13:53 [alert] 26876#0: worker process 30413 exited on signal 11
2013/12/02 18:13:54 [alert] 26876#0: worker process 30418 exited on signal 11
2013/12/02 18:13:55 [info] 30417#0: *14388 client closed connection while SSL handshaking, client: 10.192.41.251, server: 0.0.0.0:4443
2013/12/02 18:13:56 [info] 30417#0: *14389 client closed connection while waiting for request, client: 10.192.41.252, server: 0.0.0.0:80
2013/12/02 18:13:56 [info] 30417#0: *14390 client closed connection while SSL handshaking, client: 10.192.41.252, server: 0.0.0.0:2443
2013/12/02 18:13:57 [info] 30417#0: *14397 client closed connection while SSL handshaking, client: 10.192.41.252, server: 0.0.0.0:4443
2013/12/02 18:13:57 [info] 30417#0: *14403 client closed connection while waiting for request, client: 10.192.41.251, server: 0.0.0.0:80
2013/12/02 18:13:57 [info] 30417#0: *14402 client closed connection while SSL handshaking, client: 10.192.41.251, server: 0.0.0.0:2443

I can provide some cores, but I can’t attach them here. My setup was running fine until today (which coincides with the deployment of a new web service).

Could you please provide some extra information on how to debug this issue further?

NM

Posted at Nginx Forum:

Hi!

> I’m facing a small problem with NGINX: the workers have been segfaulting since
> 11:20 this morning.
> […]
> I can provide some cores, but I can’t attach them here. My setup was running
> fine until today (which coincides with the deployment of a new web service).
>
> Could you please provide some extra information on how to debug this issue further?

A few things the developers will probably ask anyway:

  • exact output from “nginx -V”
  • do you use any nginx modules?
  • do you use any third-party modules?
  • can you post and explain your configuration (at least partially)?
  • you said you can provide cores; can you post a backtrace?
  • some details about the underlying OS (virtualization,
    cpu/ram/architecture/nic/kernel releases)?

Regards,

Lukas

Additionally:

Name        : nginx                        Relocations: (not relocatable)
Version     : 1.4.3                             Vendor: nginx inc.
Release     : 1.el6.ngx                     Build Date: Tue 08 Oct 2013 02:34:32 PM WEST
Install Date: Mon 02 Dec 2013 05:35:41 PM WET   Build Host: centos6-amd64-ovl.t.nginx.com
Group       : System Environment/Daemons    Source RPM: nginx-1.4.3-1.el6.ngx.src.rpm
Size        : 788369                           License: 2-clause BSD-like license
Signature   : RSA/SHA1, Tue 08 Oct 2013 03:13:36 PM WEST, Key ID abf5bd827bd9bf62
URL         : http://nginx.org/
Summary     : High performance web server
Description :
nginx [engine x] is an HTTP and reverse proxy server, as well as
a mail proxy server.

Running on fully updated CentOS 6.4.


> • exact output from “nginx -V”

Name        : nginx                        Relocations: (not relocatable)
Version     : 1.4.4                             Vendor: nginx inc.
Release     : 1.el6.ngx                     Build Date: Tue 19 Nov 2013 12:11:15 PM WET
Install Date: Mon 02 Dec 2013 06:33:31 PM WET   Build Host: centos6-amd64-ovl.t.nginx.com
Group       : System Environment/Daemons    Source RPM: nginx-1.4.4-1.el6.ngx.src.rpm
Size        : 788337                           License: 2-clause BSD-like license
Signature   : RSA/SHA1, Tue 19 Nov 2013 01:03:00 PM WET, Key ID abf5bd827bd9bf62
URL         : http://nginx.org/
Summary     : High performance web server
Description :
nginx [engine x] is an HTTP and reverse proxy server, as well as
a mail proxy server.

Running on CentOS 6.4. The same happens with 1.4.3 (I’ve tested a downgrade).

> • do you use any nginx modules?

No. Plain upstream vanilla.

> • do you use any third-party modules?

No. Upstream binary distribution package (RPM).

> • can you post and explain your configuration (at least partially)?

NGINX is a reverse proxy (with OpenSSL) in front of a Tomcat with virtual hosts. Which parts of the configuration do you need?

> • you said you can provide cores; can you post a backtrace?

I’m going to attach gdb to the nginx master process and check it out. I can’t really attach to the workers, since they segfault too quickly.

> • some details about the underlying OS (virtualization,
>   cpu/ram/architecture/nic/kernel releases)?

CentOS 6.4 (fully updated), running on VMware 5.1u1 with 4 vCPUs, 6 GB RAM, etc.
[root@iweb-as2 ~]# uname -a
Linux XXXXXXXXXXXX 2.6.32-358.23.2.el6.x86_64 #1 SMP Wed Oct 16 18:37:12 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux


I’m having a small problem with gdb and separate debuginfos. Do you build with the ‘-g’ compiler option?

[root@XXXX2 nginx]# rpm -qa | grep nginx
nginx-1.4.4-1.el6.ngx.x86_64
nginx-debug-1.4.4-1.el6.ngx.x86_64

[root@XXXX2 nginx]# gdb --pid 39019
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Attaching to process 39019
Reading symbols from /usr/sbin/nginx...
warning: the debug information found in "/usr/sbin/nginx.debug" does not match "/usr/sbin/nginx" (CRC mismatch).

(no debugging symbols found)...done.
Reading symbols from /lib64/libpthread.so.0...Reading symbols from /usr/lib/debug/lib64/libpthread-2.12.so.debug...done.
[Thread debugging using libthread_db enabled]
done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libcrypt.so.1...Reading symbols from /usr/lib/debug/lib64/libcrypt-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libpcre.so.0...Reading symbols from /usr/lib/debug/lib64/libpcre.so.0.0.1.debug...done.
done.
Loaded symbols for /lib64/libpcre.so.0
Reading symbols from /usr/lib64/libssl.so.10...Reading symbols from /usr/lib/debug/usr/lib64/libssl.so.1.0.0.debug...done.
done.
Loaded symbols for /usr/lib64/libssl.so.10
Reading symbols from /usr/lib64/libcrypto.so.10...Reading symbols from /usr/lib/debug/usr/lib64/libcrypto.so.1.0.0.debug...done.
done.
Loaded symbols for /usr/lib64/libcrypto.so.10
Reading symbols from /lib64/libdl.so.2...Reading symbols from /usr/lib/debug/lib64/libdl-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libz.so.1...Reading symbols from /usr/lib/debug/lib64/libz.so.1.2.3.debug...done.
done.
Loaded symbols for /lib64/libz.so.1
Reading symbols from /lib64/libc.so.6...Reading symbols from /usr/lib/debug/lib64/libc-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/lib64/ld-2.12.so.debug...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libfreebl3.so...Reading symbols from /usr/lib/debug/lib64/libfreebl3.so.debug...done.
done.
Loaded symbols for /lib64/libfreebl3.so
Reading symbols from /lib64/libgssapi_krb5.so.2...Reading symbols from /usr/lib/debug/lib64/libgssapi_krb5.so.2.2.debug...done.
done.
Loaded symbols for /lib64/libgssapi_krb5.so.2
Reading symbols from /lib64/libkrb5.so.3...Reading symbols from /usr/lib/debug/lib64/libkrb5.so.3.3.debug...done.
done.
Loaded symbols for /lib64/libkrb5.so.3
Reading symbols from /lib64/libcom_err.so.2...Reading symbols from /usr/lib/debug/lib64/libcom_err.so.2.1.debug...done.
done.
Loaded symbols for /lib64/libcom_err.so.2
Reading symbols from /lib64/libk5crypto.so.3...Reading symbols from /usr/lib/debug/lib64/libk5crypto.so.3.1.debug...done.
done.
Loaded symbols for /lib64/libk5crypto.so.3
Reading symbols from /lib64/libkrb5support.so.0...Reading symbols from /usr/lib/debug/lib64/libkrb5support.so.0.1.debug...done.
done.
Loaded symbols for /lib64/libkrb5support.so.0
Reading symbols from /lib64/libkeyutils.so.1...Reading symbols from /usr/lib/debug/lib64/libkeyutils.so.1.3.debug...done.
done.
Loaded symbols for /lib64/libkeyutils.so.1
Reading symbols from /lib64/libresolv.so.2...Reading symbols from /usr/lib/debug/lib64/libresolv-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /lib64/libselinux.so.1...Reading symbols from /usr/lib/debug/lib64/libselinux.so.1.debug...done.
done.
Loaded symbols for /lib64/libselinux.so.1
Reading symbols from /lib64/libnss_files.so.2...Reading symbols from /usr/lib/debug/lib64/libnss_files-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libnss_files.so.2
Reading symbols from /lib64/libnss_sss.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_sss.so.2
0x00007ffa45a45f23 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:82
82      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
Missing separate debuginfos, use: debuginfo-install nginx-1.4.4-1.el6.ngx.x86_64


Hello!

On Mon, Dec 02, 2013 at 01:15:25PM -0500, nmarques wrote:

> […] in nginx[400000+a8000]
> nginx[6896]: segfault at 8 ip 0000000000426a30 sp 00007fff85c01e70 error 4
> […]
> 2013/12/02 18:13:54 [alert] 26876#0: worker process 30418 exited on signal 11
> […] waiting for request, client: 10.192.41.251, server: 0.0.0.0:80
> 2013/12/02 18:13:57 [info] 30417#0: *14402 client closed connection while
> SSL handshaking, client: 10.192.41.251, server: 0.0.0.0:2443
>
> I can provide some cores, but I can’t attach them here. My setup was running
> fine until today (which coincides with the deployment of a new web service).
>
> Could you please provide some extra information on how to debug this issue further?

From the messages I suspect you are hitting this bug:

http://trac.nginx.org/nginx/ticket/235

Please try the suggested workaround to see if it helps (i.e., move
the “ssl_session_cache” directive to the http{} level, or use the same
value in all server{} blocks listening on the same socket).
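A minimal sketch of the first option, assuming a configuration fragment with two hypothetical HTTPS virtual hosts on port 4443 (the server names, certificate paths, the cache name “SSL” and the 10m size are illustrative, not taken from the poster’s setup). The cache is declared once at http{} level and inherited by every server{} block, so all servers on the socket end up with the same value:

```nginx
# Fragment only: events{} and the rest of nginx.conf are omitted.
http {
    # One session cache for everything; name and size are illustrative.
    ssl_session_cache    shared:SSL:10m;
    ssl_session_timeout  10m;

    server {
        listen               4443 ssl;
        server_name          app1.example.com;          # hypothetical vhost
        ssl_certificate      /etc/nginx/ssl/app1.crt;   # hypothetical paths
        ssl_certificate_key  /etc/nginx/ssl/app1.key;
        # No ssl_session_cache here: the http{} value is inherited.
    }

    server {
        listen               4443 ssl;
        server_name          app2.example.com;          # hypothetical vhost
        ssl_certificate      /etc/nginx/ssl/app2.crt;
        ssl_certificate_key  /etc/nginx/ssl/app2.key;
        # Also inherits shared:SSL:10m from http{}.
    }
}
```

After such a change, “nginx -t” checks the configuration syntax before a reload.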

If it doesn’t help, please follow the debugging hints here:
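One common debugging aid, sketched here under assumptions, is a debugging log limited to a single client. It only works with an nginx binary built with --with-debug (whether the nginx-debug package mentioned in this thread provides such a binary is an assumption, not confirmed here); the log path is hypothetical and the client address is taken from the logs above:

```nginx
# Fragment only; requires an nginx binary built with --with-debug.
error_log  /var/log/nginx/error.log  info;   # hypothetical path; level for all other clients

events {
    worker_connections  1024;                # illustrative value
    # Debug-level messages only for connections from this client address.
    debug_connection    10.192.41.251;
}
```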


Maxim D.
http://nginx.org/en/donation.html

Maxim,
Right on, dude. Is there any way you can get this patch merged into trunk for the next release? So far I have blocked nginx updates.

NM


Hi!

> I’m having a small problem with gdb and separate debuginfos. Do you build
> with the ‘-g’ compiler option?

Probably not. Please check with “file /usr/sbin/nginx”.

Does the repository contain a special debug build like nginx-debug or
something? Could you install it?

Whoever maintains the CentOS binaries on nginx.org, please advise how to get the symbol information; /usr/sbin/nginx.debug doesn’t seem to contain it:

> warning: the debug information found in "/usr/sbin/nginx.debug" does not match "/usr/sbin/nginx" (CRC mismatch).

> [root@XXXX2 nginx]# gdb --pid 39019

Please let it dump a proper core file instead.

Here is an example of how to configure nginx so that the workers can actually dump core:
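A minimal sketch of such a setup, assuming the main-context directives worker_rlimit_core and working_directory; the 500M limit and the /tmp/cores/ directory are illustrative, and the directory must already exist and be writable by the user the worker processes run as:

```nginx
# Main (top-level) context, outside events{} and http{}.
worker_rlimit_core  500M;          # raise the core file size limit for workers
working_directory   /tmp/cores/;   # hypothetical directory where workers dump core
```

Once a worker crashes, loading the binary together with the core file in gdb (for example “gdb /usr/sbin/nginx /tmp/cores/core.NNNN”, then “bt”) should print a backtrace. The actual core file name depends on the kernel’s core_pattern setting, and the backtrace is only readable with matching debug symbols.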

Thanks,

Lukas

Hello!

On Tue, Dec 03, 2013 at 11:11:24AM -0500, nmarques wrote:

> Maxim,
> Right on, dude. Is there any way you can get this patch merged into trunk for the next
> release?

The patch in the ticket is wrong; it only hides the real
problem. A proper patch to solve the problem has yet to be written.

As the problem can easily be resolved by using a symmetrical session
cache configuration (better yet, a single session cache at the
http level), it’s not a high-priority task.
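For the symmetrical variant, a sketch assuming two hypothetical virtual hosts on port 2443 that each keep their own ssl_session_cache line: the value is identical in every server{} block on that socket, which is the property the workaround relies on (names, paths, the cache name “SSL2443” and the size are illustrative):

```nginx
# Fragment only; these server{} blocks go inside http{} and share the 2443 socket.
server {
    listen               2443 ssl;
    server_name          a.example.com;             # hypothetical vhost
    ssl_certificate      /etc/nginx/ssl/a.crt;      # hypothetical paths
    ssl_certificate_key  /etc/nginx/ssl/a.key;
    ssl_session_cache    shared:SSL2443:10m;        # same value in every server on :2443
}

server {
    listen               2443 ssl;
    server_name          b.example.com;             # hypothetical vhost
    ssl_certificate      /etc/nginx/ssl/b.crt;
    ssl_certificate_key  /etc/nginx/ssl/b.key;
    ssl_session_cache    shared:SSL2443:10m;        # must match the line above exactly
}
```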

> So far I have blocked nginx updates.

That looks like a silly thing to do. The problem you are seeing is the
result of a configuration change you made, not of an nginx
update. And blocking updates will only ensure that you never get
a fix.


Maxim D.
http://nginx.org/en/donation.html

Hi!

> The patch in the ticket is wrong; it only hides the real
> problem. A proper patch to solve the problem has yet to be written.
>
> As the problem can easily be resolved by using a symmetrical session
> cache configuration (better yet, a single session cache at the
> http level), it’s not a high-priority task.

Agreed, the configuration workaround is viable, but the problem lies in the actual troubleshooting. Reaching this conclusion takes time, time users don’t have when the workers are crashing. It’s not always possible to roll back the configuration or to understand right away which particular change caused the crash (after moving vhosts from one server to another, for example).

So perhaps, until a proper fix is ready, we could add a note to the documentation:
http://nginx.org/en/docs/http/ngx_http_ssl_module.html#ssl_session_cache

It’s not about the workaround; it’s about knowing such issues and limitations in advance.

Regards,

Lukas

On 4 December 2013 09:36, Lukas T. wrote:

> It’s not about the workaround; it’s about knowing such issues and limitations
> in advance.

+1.

Hello,

On Wed, Dec 4, 2013 at 12:48 PM, Jonathan M. wrote:

>> Module ngx_http_ssl_module
>>
>> It’s not about the workaround; it’s about knowing such issues and
>> limitations in advance.
>
> +1.

I also agree with that.


B. R.