Weird overnight crash with nginx

I found some weird nginx crashes in the kernel logs, and I can't figure
out why or what happened. I suppose it happened overnight.

This is from the kernel log:

nginx[31486]: segfault at c4 ip 080aacb5 sp bfd79b60 error 4 in
nginx[8048000+73000]
nginx[31484]: segfault at c4 ip 080aacb5 sp bfd79b60 error 4 in
nginx[8048000+73000]
nginx[1528]: segfault at c4 ip 080aacb5 sp bfd79bc0 error 4 in
nginx[8048000+73000]
nginx[1568]: segfault at c4 ip 080aacb5 sp bfd79bc0 error 4 in
nginx[8048000+73000]

and from error_log:
2010/03/21 09:42:35 [notice] 31483#0: signal 17 (SIGCHLD) received
2010/03/21 09:42:35 [alert] 31483#0: worker process 31486 exited on
signal 11
2010/03/21 09:42:35 [notice] 31483#0: start worker process 1528
2010/03/21 09:42:35 [notice] 31483#0: signal 29 (SIGIO) received
2010/03/21 09:43:03 [notice] 31483#0: signal 17 (SIGCHLD) received
2010/03/21 09:43:03 [alert] 31483#0: worker process 31484 exited on
signal 11
2010/03/21 09:43:03 [notice] 31483#0: start worker process 1568
2010/03/21 09:43:03 [notice] 31483#0: signal 29 (SIGIO) received
2010/03/21 09:43:09 [notice] 31483#0: signal 17 (SIGCHLD) received
2010/03/21 09:43:09 [alert] 31483#0: worker process 1528 exited on
signal 11
2010/03/21 09:43:09 [notice] 31483#0: start worker process 1582
2010/03/21 09:43:09 [notice] 31483#0: signal 29 (SIGIO) received
2010/03/21 09:45:09 [notice] 31483#0: signal 17 (SIGCHLD) received
2010/03/21 09:45:09 [alert] 31483#0: worker process 1568 exited on
signal 11
2010/03/21 09:45:09 [notice] 31483#0: start worker process 1757
2010/03/21 09:45:09 [notice] 31483#0: signal 29 (SIGIO) received
2010/03/21 12:00:13 [notice] 31483#0: signal 15 (SIGTERM) received,
exiting

Does anyone know what the issue could be?

And how am I supposed to produce the backtrace, given that I can't
reproduce the error? Or do you just need a simple backtrace?

Hello!

On Sun, Mar 21, 2010 at 12:00:39PM +0100, Robert G. wrote:

I found some weird nginx crashes in the kernel logs, and I can't figure
out why or what happened. I suppose it happened overnight.

[…]

Does anyone know what the issue could be?

If you want this to be investigated, you may want to provide the
following (a rough shell sketch for the first two items appears after
the list):

  1. nginx -V output

  2. config

  3. backtrace as obtained from coredump via gdb
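
For the first two items, a minimal shell sketch might look like this
(the binary path and the output file names are assumptions; adjust them
to your install):

/path/to/nginx -V 2>&1 | tee nginx-V.txt    # -V prints to stderr
cp /etc/nginx/nginx.conf nginx-conf-for-report.conf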

Maxim D.

Sorry, I can't reproduce the crash to get a backtrace, but here are the
config and nginx -V output:

System: Ubuntu Linux 9.10 Server with kernel: 2.6.33-server (custom
kernel build)

nginx version: nginx/0.8.34
TLS SNI support enabled
configure arguments: --prefix=/applications/nginx
--conf-path=/etc/nginx/nginx.conf --with-http_ssl_module
--with-http_realip_module --with-http_addition_module
--with-http_flv_module --with-http_gzip_static_module
--with-http_sub_module --http-log-path=/var/log/nginx/access_log
--with-http_perl_module --user=www-data --group=www-data
--http-fastcgi-temp-path=/applications/nginx/tmp/fastcgi
--http-client-body-temp-path=/applications/nginx/tmp/client
--http-proxy-temp-path=/applications/nginx/tmp/proxy
--pid-path=/var/run/nginx.pid --error-log-path=/var/log/nginx/error_log
--with-sha1=/usr/lib --with-md5=/usr/lib --with-file-aio

nginx config:

user www-data;
worker_processes 2;
worker_cpu_affinity 0101 1010;
pid /var/run/nginx.pid;
error_log /var/log/nginx/error_log info;

events {
worker_connections 1024;
use epoll;
}

http {
include mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
client_body_timeout 60;
client_header_timeout 60;
send_timeout 60;
server_tokens off;
server_names_hash_bucket_size 256;
server_names_hash_max_size 512;
aio on;
directio 1m;
output_buffers 1 128k;
sendfile off;
tcp_nopush off;
tcp_nodelay off;
keepalive_timeout 0;
gzip on;
gzip_comp_level 9;
gzip_buffers 16 8k;
gzip_http_version 1.0;
gzip_min_length 1024;
gzip_vary on;
gzip_proxied off;
gzip_disable msie6;
gzip_types text/plain text/css text/xml text/javascript
application/x-javascript application/xml application/xml+rss;

include /etc/nginx/site-hosts/*;
include /etc/nginx/site-users/*;
include /etc/nginx/site-virtual/*;

}

Hello!

On Thu, Mar 25, 2010 at 08:51:17AM +0100, Robert G. wrote:

--with-http_realip_module --with-http_addition_module

[…]

include /etc/nginx/site-hosts/*;
include /etc/nginx/site-users/*;
include /etc/nginx/site-virtual/*;

You may want to provide the included files as well; they are part of
the config. Or at least samples, if they are identical.

Maxim D.

There are over 300 of them; are you sure?

Hello!

On Mon, Mar 22, 2010 at 04:14:33PM +0100, Robert G. wrote:

And how am I supposed to produce the backtrace, given that I can't
reproduce the error? Or do you just need a simple backtrace?

You have to configure your system to dump cores, and once you have a
coredump, run

gdb /path/to/nginx /path/to/nginx.core

then in gdb:

bt

The procedure to enable core dumps differs depending on your OS, but it
may be simplified using nginx's own global directives worker_rlimit_core
and working_directory, e.g.:

worker_rlimit_core 500m;
working_directory /path/to/corefiles;

nginx must have write access to the '/path/to/corefiles' directory.
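
If you go the OS route instead of (or in addition to) the directives
above, on Linux the steps might look roughly like this (a sketch only;
the core_pattern path is an assumption, and the ulimit usually has to
be set in whatever init script starts nginx):

ulimit -c unlimited
echo '/path/to/corefiles/core.%e.%p' > /proc/sys/kernel/core_pattern
# %e is the executable name and %p the pid; see core(5) for all specifiers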

Note well: it's a good idea to make sure your nginx binary isn't
stripped (e.g. via the file(1) command).
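
As a quick check (the exact wording of file(1) output varies between
versions):

file /path/to/nginx
# an unstripped ELF binary is reported with a trailing "not stripped";
# if it says "stripped" instead, use a binary with symbols before
# collecting the backtrace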

If you are unable to reproduce the segmentation fault and therefore
unable to obtain a coredump, it's still a good idea to provide nginx
-V output and your config. There is a chance that the segmentation
fault you've seen was already fixed, or was caused by a known bad
configuration.

In nginx 0.8.34 I'm currently aware of at least 4 possible
segfaults: 2 caused by bugs (in fastcgi stderr handling and in
subrequest loop handling; patches are available) and 2 caused by
known bad configurations (error_page 400 redirected to a named
location, and "if" usage as outlined on the IfIsEvil wiki page).
That's not even counting older versions.
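
Given the several hundred included files mentioned elsewhere in this
thread, one rough way to look for those two configuration patterns is
to grep the include directories from the main config (a sketch only; a
match here does not by itself prove the cause):

grep -rn 'error_page 400' /etc/nginx/site-hosts /etc/nginx/site-users /etc/nginx/site-virtual
grep -rn 'if (' /etc/nginx/site-hosts /etc/nginx/site-users /etc/nginx/site-virtual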

Maxim D.