"worker process XXXXX exited on signal 11" following FreeBSD -> Ubuntu migration

Any advice on tracking down the cause of “worker process XXXXX exited on
signal 11” errors? Thank you in advance for your help.

We had been running NGINX 0.7.65 on FreeBSD for a few months. Because
of issues getting the fail-over to work properly, one week ago we migrated
from FreeBSD to Linux (Ubuntu) running version 0.7.65. In this first
week, we have seen perhaps two dozen “worker process XXXXX exited on
signal 11” errors per hour, and we have been experiencing dropped web
connections at a rate that seems to coincide with those errors. We
tried many different configuration changes, and finally this afternoon
upgraded to 0.8.36. Unfortunately, we continue to see the “worker process
XXXXX exited on signal 11” errors.
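
To put a number on the crash rate and compare it with the dropped connections, the alerts in the error log can be counted per hour. This is only a minimal sketch in Python, assuming each log entry sits on a single line in the same format as the alerts quoted in this thread; `crashes_per_hour` is a made-up helper name, not part of nginx.

```python
import re
from collections import Counter

# Matches nginx error-log alerts such as:
# 2010/05/02 22:27:53 [alert] 15605#0: worker process 18478 exited on signal 11
CRASH_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}):\d{2}:\d{2} \[alert\] \d+#\d+: "
    r"worker process \d+ exited on signal 11\b"
)


def crashes_per_hour(lines):
    """Return a Counter mapping 'YYYY/MM/DD HH' to the number of signal 11 exits."""
    counts = Counter()
    for line in lines:
        match = CRASH_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open file object for the error log to `crashes_per_hour` and printing the sorted items gives one count per hour, which can then be lined up against the times of the dropped connections.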

Other possible factors: We have been using keepalived to verify every
second that NGINX is accepting web traffic on port 80.

This is our configuration file:

# Nginx configuration file
user  nobody;
worker_processes  1;

events {
    worker_connections  1024;
}

http {

    gzip on;
    gzip_types text/plain text/css application/x-javascript application/javascript;
    gzip_disable "MSIE [1-6]\.";

    # This controls upload size
    client_max_body_size 45M;

    # How long to wait for upstream server response (seconds)
    proxy_read_timeout 600;

    upstream php5 {
        server 192.168.1.121 weight=1;
        server 192.168.1.123 weight=1;
    }


    # keeps connection to same web servers
    upstream webserversessions {
        ip_hash;
        server 192.168.1.121;
        server 192.168.1.123;
    }

    ## Default for all sites
    server {
        listen 80;
        location / {
            proxy_pass http://php5$request_uri;
            proxy_set_header    Host            $host;
            proxy_set_header    X-Real-IP       $remote_addr;
            proxy_set_header    X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header    X-Forwarded-By  $server_name;
        }

        # Add expires headers
        location ~* ^.+\.(js)$ {
            expires modified +24h;
            proxy_pass http://php5$request_uri;
        }

        # Stats reporting
        location /nginx_status {
          stub_status on;
          access_log off;
          allow 192.168.234.0/24;
          allow 192.168.1.0/24;
          deny all;
        }
    }

    ## maintain session required for openID
    server {
        listen 80;
        server_name     community.llli.org;
        location / {
            proxy_pass http://webserversessions$request_uri;
            proxy_set_header    Host            $host;
            proxy_set_header    X-Real-IP       $remote_addr;
            proxy_set_header    X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }

   ## For site that needs sessions
   server {
       listen 80;
       server_name     files.golightly.com;
       location / {
           proxy_pass http://webserversessions$request_uri;
           proxy_set_header    Host            $host;
           proxy_set_header    X-Real-IP       $remote_addr;
           proxy_set_header    X-Forwarded-For $proxy_add_x_forwarded_for;
           proxy_set_header    X-Forwarded-By  $server_name;
       }
   }

    ## SSL sites ##

    server {  # Need to add this server section with unique IP for each SSL site we serve
        listen                          38.127.224.114:443;
        ssl                             on;
        ssl_certificate                 /etc/ssl/www.latchon.org.pem;
        ssl_certificate_key             /etc/ssl/private/www.latchon.org.key;
        ssl_protocols                   SSLv2 SSLv3 TLSv1;
        ssl_ciphers                     ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP;
        ssl_prefer_server_ciphers       on;

        location / {
            proxy_pass          http://php5$request_uri;
            proxy_set_header    Host            $host;
            proxy_set_header    X-Real-IP       $remote_addr;
            proxy_set_header    X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header    X-Forwarded-By  $server_name:443;
        }
    }

    server {
        listen                          38.127.224.112:443;
        ssl                             on;
        ssl_certificate                 /etc/ssl/networking.cccu.org.pem;
        ssl_certificate_key             /etc/ssl/private/networking.cccu.org.key;
        ssl_protocols                   SSLv2 SSLv3 TLSv1;
        ssl_ciphers                     ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP;
        ssl_prefer_server_ciphers       on;

        location / {
            proxy_pass          http://php5$request_uri;
            proxy_set_header    Host            $host;
            proxy_set_header    X-Real-IP       $remote_addr;
            proxy_set_header    X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header    X-Forwarded-By  $server_name:443;
        }
    }

    server {
        listen                          38.127.224.113:443;
        ssl                             on;
        ssl_certificate                 /etc/ssl/backstage.codenomicon.com.chain.pem;
        ssl_certificate_key             /etc/ssl/private/backstage.codenomicon.com.key;
        ssl_protocols                   SSLv2 SSLv3 TLSv1;
        ssl_ciphers                     ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP;
        ssl_prefer_server_ciphers       on;

        location / {
            proxy_pass          http://php5$request_uri;
            proxy_set_header    Host            $host;
            proxy_set_header    X-Real-IP       $remote_addr;
            proxy_set_header    X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header    X-Forwarded-By  $server_name:443;
        }
    }

}

Posted at Nginx Forum:

On Fri, Apr 30, 2010 at 03:22:20AM -0400, DaleMcGrew wrote:

...

Could you provide the output of:

$ nginx -v
$ uname -a


Sergey A. Osokin

[root@lb01 ~]# /usr/local/nginx/sbin/nginx -v
nginx version: nginx/0.8.36

[root@lb01 ~]# uname -a
Linux lb01.golightly.com 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux


Hello!

On Fri, Apr 30, 2010 at 03:22:20AM -0400, DaleMcGrew wrote:

different configuration changes, and finally this afternoon
upgraded to 0.8.36. Unfortunately we continue to see the “worker
process XXXXX exited on signal 11” errors.

Could you please show the nginx -V output, and obtain a core dump and
show the backtrace? Configuring something like this in nginx.conf
should be enough to obtain one even on Linux:

working_directory /path/to/cores;
worker_rlimit_core 500M;

Note that nginx workers should be able to write to /path/to/cores
directory.
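
Once a core file appears in that directory, the backtrace can be pulled out of it with gdb in batch mode. A rough sketch of how that might be scripted in Python (3.7 or later) follows; it assumes gdb is installed and on the PATH, and the function names and the core file name in the usage note are made up for illustration. The `-batch` and `-ex` options are standard gdb options.

```python
import subprocess


def gdb_backtrace_command(nginx_binary, core_file):
    """Build a gdb command line that prints a full backtrace and then exits."""
    return ["gdb", "-batch", "-ex", "bt full", nginx_binary, core_file]


def run_backtrace(nginx_binary, core_file):
    """Run gdb against the core file and return everything it printed."""
    result = subprocess.run(
        gdb_backtrace_command(nginx_binary, core_file),
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode output as text
        check=False,          # do not raise if gdb exits non-zero
    )
    return result.stdout + result.stderr
```

For example, `print(run_backtrace("/usr/local/nginx/sbin/nginx", "/path/to/cores/core.12345"))`, where `core.12345` stands in for whatever core file was actually written. The backtrace is far more readable if the nginx binary still has its debugging symbols.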

Also please make sure you have no third party modules/patches
compiled in (and/or reproduce the problem without them, if any).

[…]

    location / {
        proxy_pass http://php5$request_uri;

Just curious: why do you use this form instead of

          proxy_pass http://php5;

?

[…]

    # Add expires headers
    location ~* ^.+\.(js)$ {

Just a note: there is no need to use “^.+”, and the parentheses just
produce a capture which is not used. The following will do
the same with less CPU burn:

      location ~* \.js$ {
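
A quick way to convince yourself that the two patterns select the same requests is to run both over a few sample URIs. The sketch below uses Python's `re` module as a stand-in for the PCRE engine nginx uses, which should behave the same for patterns this simple; the sample URIs are made up. The only thing “^.+” adds is a requirement of at least one character before “.js”, and a request URI always starts with “/”, so that requirement is always met.

```python
import re

# Case-insensitive, like nginx's "~*" location modifier.
original = re.compile(r"^.+\.(js)$", re.IGNORECASE)  # location ~* ^.+\.(js)$
simplified = re.compile(r"\.js$", re.IGNORECASE)     # location ~* \.js$


def same_decision(uri):
    """True if both patterns agree on whether the URI matches."""
    return bool(original.search(uri)) == bool(simplified.search(uri))


# Hypothetical sample URIs: some end in .js, some do not.
sample_uris = ["/app.js", "/static/JQuery.JS", "/index.php", "/js/", "/a.json", "/.js"]
agreement = [same_decision(uri) for uri in sample_uris]
```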

[…]

Maxim D.

Hello!

On Mon, May 03, 2010 at 01:37:20AM -0400, dflook wrote:

I’m working together with Dale on this issue. We turned on debug
logging, and each time the worker process segfaults, it seems to
be right after checking SSL handshake. Am I reading this
correctly? Here are two examples in the excerpt below:

[…]

2010/05/02 22:27:53 [debug] 18478#0: *17046 posix_memalign: 0000000009336C10:4096 @16
2010/05/02 22:27:53 [debug] 18478#0: *17046 http check ssl handshake
2010/05/02 22:27:53 [debug] 18478#0: *17046 https ssl handshake: 0x16
2010/05/02 22:27:53 [notice] 15605#0: signal 17 (SIGCHLD) received

Yes, it’s right after nginx got the first bytes of the SSL handshake and
passed them to OpenSSL. This may indicate either a bug in the OpenSSL
library (or some corruption in your particular installation) or
some OpenSSL-related bug in nginx.
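
For reference, the hex value in the “https ssl handshake” line is the first byte the client sent. The two crashes in the posted log show 0x16 and 0x80, so both an SSLv3/TLS-style hello and an SSLv2-style hello are affected. The sketch below is only an illustration in Python of how that byte is usually interpreted, not nginx's actual code; `classify_first_byte` is a made-up name.

```python
def classify_first_byte(first_byte):
    """Classify a new connection by the first byte received from the client."""
    if first_byte & 0x80:
        # SSLv2 records start with a two-byte length whose high bit is set.
        return "SSLv2"
    if first_byte == 0x16:
        # SSLv3/TLS record content type 22 (0x16) means "handshake".
        return "SSLv3/TLS"
    # Anything else, e.g. the "G" of "GET", is not an SSL handshake.
    return "plain HTTP"
```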

Have you tried to obtain a backtrace as I previously suggested?

Maxim D.

I’m working together with Dale on this issue. We turned on debug
logging, and each time the worker process segfaults, it seems to be
right after checking SSL handshake. Am I reading this correctly? Here
are two examples in the excerpt below:

2010/05/02 22:27:52 [debug] 18478#0: worker cycle
2010/05/02 22:27:52 [debug] 18478#0: epoll timer: 59998
2010/05/02 22:27:53 [debug] 18478#0: epoll: fd:7 ev:0001 d:00000000092BFB20
2010/05/02 22:27:53 [debug] 18478#0: accept on 38.127.224.114:443, ready: 0
2010/05/02 22:27:53 [debug] 18478#0: posix_memalign: 00000000092724B0:256 @16
2010/05/02 22:27:53 [debug] 18478#0: *17046 accept: 66.249.68.173 fd:22
2010/05/02 22:27:53 [debug] 18478#0: *17046 event timer add: 22: 60000:1272864533191
2010/05/02 22:27:53 [debug] 18478#0: *17046 epoll add event: fd:22 op:1 ev:80000001
2010/05/02 22:27:53 [debug] 18478#0: timer delta: 201
2010/05/02 22:27:53 [debug] 18478#0: posted events 0000000000000000
2010/05/02 22:27:53 [debug] 18478#0: worker cycle
2010/05/02 22:27:53 [debug] 18478#0: epoll timer: 59797
2010/05/02 22:27:53 [debug] 18478#0: epoll: fd:22 ev:0001 d:00000000092BFDE1
2010/05/02 22:27:53 [debug] 18478#0: *17046 malloc: 0000000009257330:1248
2010/05/02 22:27:53 [debug] 18478#0: *17046 posix_memalign: 00000000092617F0:256 @16
2010/05/02 22:27:53 [debug] 18478#0: *17046 malloc: 000000000928EB00:1024
2010/05/02 22:27:53 [debug] 18478#0: *17046 posix_memalign: 0000000009336C10:4096 @16
2010/05/02 22:27:53 [debug] 18478#0: *17046 http check ssl handshake
2010/05/02 22:27:53 [debug] 18478#0: *17046 https ssl handshake: 0x16
2010/05/02 22:27:53 [notice] 15605#0: signal 17 (SIGCHLD) received
2010/05/02 22:27:53 [alert] 15605#0: worker process 18478 exited on signal 11
2010/05/02 22:27:53 [debug] 15605#0: wake up, sigio 0
2010/05/02 22:27:53 [debug] 15605#0: reap children
2010/05/02 22:27:53 [debug] 15605#0: child: 0 18478 e:0 t:1 d:0 r:1 j:0
2010/05/02 22:27:53 [debug] 15605#0: channel 3:4
2010/05/02 22:27:53 [debug] 18584#0: malloc: 00000000092A4CD0:6144
2010/05/02 22:27:53 [debug] 18584#0: malloc: 00000000092BFA70:180224
2010/05/02 22:27:53 [debug] 18584#0: malloc: 00000000092EBA80:106496
2010/05/02 22:27:53 [debug] 18584#0: malloc: 0000000009305A90:106496
2010/05/02 22:27:53 [debug] 18584#0: epoll add event: fd:6 op:1 ev:00000001
2010/05/02 22:27:53 [debug] 18584#0: epoll add event: fd:7 op:1 ev:00000001
2010/05/02 22:27:53 [debug] 18584#0: epoll add event: fd:8 op:1 ev:00000001
2010/05/02 22:27:53 [debug] 18584#0: epoll add event: fd:9 op:1 ev:00000001
2010/05/02 22:27:53 [debug] 18584#0: epoll add event: fd:4 op:1 ev:00000001
2010/05/02 22:27:53 [debug] 18584#0: setproctitle: "nginx: worker process"
2010/05/02 22:27:53 [debug] 18584#0: worker cycle
2010/05/02 22:27:53 [debug] 18584#0: epoll timer: -1
2010/05/02 22:27:53 [notice] 15605#0: start worker process 18584
2010/05/02 22:27:53 [debug] 15605#0: sigsuspend
2010/05/02 22:27:53 [debug] 18584#0: epoll: fd:7 ev:0001 d:00000000092BFB20
2010/05/02 22:27:53 [debug] 18584#0: accept on 38.127.224.114:443, ready: 0
2010/05/02 22:27:53 [debug] 18584#0: posix_memalign: 0000000009272CF0:256 @16
2010/05/02 22:27:53 [debug] 18584#0: *17047 accept: 66.249.68.173 fd:3
2010/05/02 22:27:53 [debug] 18584#0: *17047 event timer add: 3: 60000:1272864533371
2010/05/02 22:27:53 [debug] 18584#0: *17047 epoll add event: fd:3 op:1 ev:80000001
2010/05/02 22:27:53 [debug] 18584#0: timer delta: 40
2010/05/02 22:27:53 [debug] 18584#0: posted events 0000000000000000
2010/05/02 22:27:53 [debug] 18584#0: worker cycle
2010/05/02 22:27:53 [debug] 18584#0: epoll timer: 60000
2010/05/02 22:27:53 [debug] 18584#0: epoll: fd:3 ev:0001 d:00000000092BFDE0
2010/05/02 22:27:53 [debug] 18584#0: *17047 malloc: 00000000092ABA60:1248
2010/05/02 22:27:53 [debug] 18584#0: *17047 posix_memalign: 000000000924FDA0:256 @16
2010/05/02 22:27:53 [debug] 18584#0: *17047 malloc: 000000000923D700:1024
2010/05/02 22:27:53 [debug] 18584#0: *17047 posix_memalign: 00000000092A3940:4096 @16
2010/05/02 22:27:53 [debug] 18584#0: *17047 http check ssl handshake
2010/05/02 22:27:53 [debug] 18584#0: *17047 https ssl handshake: 0x80
2010/05/02 22:27:53 [notice] 15605#0: signal 17 (SIGCHLD) received
2010/05/02 22:27:53 [alert] 15605#0: worker process 18584 exited on signal 11
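
Rather than reading the debug log by eye, the last thing each crashed worker logged can be extracted mechanically, which makes it easier to check whether every segfault really follows the SSL handshake line. This is a minimal sketch in Python, assuming one log entry per line in the format shown in this excerpt; `last_message_before_crash` is a made-up helper name.

```python
import re

# Timestamp, [level], pid#tid, then the message text.
ENTRY_RE = re.compile(
    r"^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} \[(\w+)\] (\d+)#\d+: (.*)$"
)
# The master process reports a dead worker with this message.
CRASH_RE = re.compile(r"worker process (\d+) exited on signal (\d+)")


def last_message_before_crash(lines):
    """Map each crashed worker pid to the last message that worker logged."""
    last_by_pid = {}
    result = {}
    for line in lines:
        entry = ENTRY_RE.match(line.rstrip("\n"))
        if not entry:
            continue  # skip wrapped or malformed lines
        level, pid, message = entry.groups()
        crash = CRASH_RE.search(message)
        if level == "alert" and crash:
            crashed_pid = crash.group(1)
            result[crashed_pid] = last_by_pid.get(crashed_pid)
        else:
            last_by_pid[pid] = message
    return result
```

If every value in the returned dictionary is an “https ssl handshake” message, that supports the reading that the worker dies while handing the first bytes of the handshake to the SSL library.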


Hi Maxim, thank you so much for your feedback. Last I talked to Dflook
he was having trouble getting the backtrace and core dump due to
compilation problems. He is going to keep trying.

I just ran this per your request:

[root@lb01 ~]# /usr/local/nginx/sbin/nginx -V
nginx version: nginx/0.8.36
built by gcc 4.1.2 20080704 (Red Hat 4.1.2-46)
TLS SNI support disabled
configure arguments: --with-http_ssl_module --with-http_stub_status_module --http-log-path=/usr/local/www/logs/access.log --error-log-path=/usr/local/www/logs/error.log --with-debug
