Keep Alive piles up

Since version 0.8.x I’ve noticed that my keep-alive sessions are not
removed properly (see http://xgame.pl/nginx/ ). I’m guessing that it’s
connected to the keepalive_requests directive, since it was introduced in
0.8.x.

FYI: Server config:
keepalive_timeout 10;
keepalive_requests 250;
nginx 0.8.4

Posted at Nginx Forum:

Hello!

On Thu, Jun 25, 2009 at 10:07:08AM -0400, meto wrote:

> Since version 0.8.x I’ve noticed that my keep-alive sessions are not removed properly (see http://xgame.pl/nginx/ ). I’m guessing that it’s connected to the keepalive_requests directive, since it was introduced in 0.8.x.
>
> FYI: Server config:
> keepalive_timeout 10;
> keepalive_requests 250;

It’s unlikely to be related, but it would be great if you could test in which
exact version the problem appeared (and which was the last good version).

Maxim D.

The last working version is 0.7.59 (the last 0.7.x development release). I’ll check
the latest stable.

Posted at Nginx Forum:

Hello!

On Thu, Jun 25, 2009 at 03:49:59PM -0400, meto wrote:

> The last working version is 0.7.59 (the last 0.7.x development release). I’ll check the latest stable.

After looking more closely at your graphs, it seems to me that
the workers just die on signals (and obviously leave the connection
counters non-decremented). Could you please check your logs
(system and/or nginx error log) to confirm that?

If that’s the cause, it would be great if you could get a
core dump and provide us with a backtrace (but please recompile nginx
--with-debug if it isn’t built that way already).

Maxim D.

So I’ve checked, and 0.7.61 seems to show similar behaviour
(http://xgame.pl/nginx/)

In the logs the only suspicious thing was:

2009/06/25 23:38:09 19083#0: worker process 19088 exited on signal 11

and some other similar lines, always signal 11.

So how do I make a core dump? I’ve got a --with-debug package :)

Posted at Nginx Forum:

OK, I’ve already done that and am waiting for a process to die ;) I’ll keep you
posted on progress.

Posted at Nginx Forum:

Hello!

On Thu, Jun 25, 2009 at 06:25:17PM -0400, meto wrote:

> So I’ve checked, and 0.7.61 seems to show similar behaviour (http://xgame.pl/nginx/)
>
> In the logs the only suspicious thing was:
>
> 2009/06/25 23:38:09 19083#0: worker process 19088 exited on signal 11
>
> and some other similar lines, always signal 11.

Yep, it died on SIGSEGV.

> So how do I make a core dump? I’ve got a --with-debug package :)

It depends on your OS. Under FreeBSD it should be enough to do:

sysctl kern.sugid_coredump=1
sysctl kern.corefile="/var/coredumps/%N.%P.core"

(and create a writable /var/coredumps directory, of course; more
details may be found in the core(5) manual page).

I don’t know the exact steps to enable core dumps under Linux, but
the MARC thread 'Re: 0.5.7 waiting connections' suggests you
should use

echo 1 > /proc/sys/fs/suid_dumpable

or

echo 1 > /proc/sys/kernel/suid_dumpable

depending on Linux version, and add to nginx config something
like:

worker_rlimit_core 100m;
working_directory /path/to/core/files;

Once you have a core file, please run

gdb /path/to/nginx /path/to/core
bt

and post backtrace here.

Maxim D.
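
For reference, the gdb step above can also be run non-interactively, which makes the output easier to paste into a post. This is only a sketch: print_backtrace is a made-up helper name and both paths are placeholders. It relies on gdb's -batch and -ex options.

```shell
# Sketch: print a full backtrace from a core file without an interactive session.
# print_backtrace is a hypothetical helper; pass the nginx binary and the core file.
print_backtrace() {
    binary=$1
    core=$2
    if [ ! -r "$binary" ] || [ ! -r "$core" ]; then
        echo "usage: print_backtrace /path/to/nginx /path/to/core" >&2
        return 1
    fi
    # -batch makes gdb exit once the commands have run; -ex runs one gdb command.
    gdb -batch -ex "bt full" "$binary" "$core"
}

# Usage (placeholder paths):
#   print_backtrace /path/to/nginx /path/to/core > backtrace.txt
```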

There’s been a massacre at 3pm for nginx procs:
2009/06/26 14:56:35 3039#0: worker process 3040 exited on signal 11
2009/06/26 14:56:35 3039#0: worker process 3043 exited on signal 11
2009/06/26 14:56:36 3039#0: worker process 3041 exited on signal 11
2009/06/26 14:56:36 3039#0: worker process 6622 exited on signal 11
2009/06/26 14:56:36 3039#0: worker process 3042 exited on signal 11
2009/06/26 14:56:36 3039#0: worker process 6625 exited on signal 11
2009/06/26 14:56:36 3039#0: worker process 6623 exited on signal 11
2009/06/26 14:56:36 3039#0: worker process 6626 exited on signal 11
2009/06/26 14:56:36 3039#0: worker process 6627 exited on signal 11
2009/06/26 14:56:37 3039#0: worker process 6624 exited on signal 11
2009/06/26 14:56:37 3039#0: worker process 6630 exited on signal 11
2009/06/26 14:56:37 3039#0: worker process 6631 exited on signal 11
2009/06/26 14:56:37 3039#0: worker process 6632 exited on signal 11
2009/06/26 14:56:39 3039#0: worker process 6633 exited on signal 11
2009/06/26 14:56:39 3039#0: worker process 6634 exited on signal 11

That’s also visible on the graph (http://xgame.pl/nginx/), but no
core dump was saved, due to directory permissions (chmod) I believe :/ So that was a good guess
about the workers dying. I’ll try to provide core dumps later on.

The second question is why there were so many of them, since I’ve set
the number of workers to 4, with CPU affinity per processor/core. Maybe that’s the problem?

Posted at Nginx Forum:
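
On the number of PIDs in the log above: the nginx master process normally starts a replacement whenever a worker exits, so repeated crashes would produce more distinct worker PIDs than the configured four. To see how the crashes cluster in time, the error log can be summarised per minute. A minimal sketch, assuming the log lines start with a date and a time field as in the excerpt above; count_segv_per_minute is a hypothetical helper name.

```shell
# Sketch: count "exited on signal 11" lines per minute in an nginx error log.
# Assumes field 1 is the date and field 2 the time (HH:MM:SS), as in the excerpt.
count_segv_per_minute() {
    awk '/exited on signal 11/ {
             # Bucket by date plus HH:MM.
             n[$1 " " substr($2, 1, 5)]++
         }
         END { for (k in n) print k, n[k] }' "$1" | sort
}

# Usage (log path taken from the configure arguments quoted in this thread):
#   count_segv_per_minute /var/log/nginx/error.log
```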

On Fri, Jun 26, 2009 at 09:06:36AM -0400, meto wrote:

> 2009/06/26 14:56:37 3039#0: worker process 6624 exited on signal 11
> 2009/06/26 14:56:37 3039#0: worker process 6630 exited on signal 11
> 2009/06/26 14:56:37 3039#0: worker process 6631 exited on signal 11
> 2009/06/26 14:56:37 3039#0: worker process 6632 exited on signal 11
> 2009/06/26 14:56:39 3039#0: worker process 6633 exited on signal 11
> 2009/06/26 14:56:39 3039#0: worker process 6634 exited on signal 11
>
> That’s also visible on the graph (http://xgame.pl/nginx/), but no core dump was saved, due to directory permissions (chmod) I believe :/ So that was a good guess about the workers dying. I’ll try to provide core dumps later on.

Compile Nginx --with-cc-opt="-O0 -g3" first. It will be a bit slower
(and noticeably larger) but should be easier to debug compared to
tightly optimised release code.

Are you using any addon modules/3rd party patches?

> The second question is why there were so many of them, since I’ve set the number of workers to 4, with CPU affinity per processor/core. Maybe that’s the problem?

If you’re on any reasonably modern Linux, try something like this:

mkdir -p /var/lib/core
chmod 1733 /var/lib/core
echo /var/lib/core/core.pid%p.sig%s.%t > /proc/sys/kernel/core_pattern

And restart Nginx with unlimited core dump size, just to be sure:

killall nginx
ulimit -c unlimited
nginx

I have never seen Nginx segfault (apart from my own hacks), so
congratulations, I guess :)

Best regards,
Grzegorz N.
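
Before waiting for the next crash, it may be worth checking that the directory named in core_pattern is actually usable, since an unwritable directory was the suspected reason no core was saved earlier in the thread. The following is a rough sketch; check_core_dir is a hypothetical helper, and the writability test runs as the current user, which may differ from the user the nginx workers run as.

```shell
# Sketch: sanity-check a core_pattern value before relying on it.
# check_core_dir is a hypothetical helper; pass it the pattern string.
check_core_dir() {
    pattern=$1
    case $pattern in
        "|"*)
            # A leading pipe means the kernel hands cores to a program instead of a file.
            echo "cores are piped to a program: ${pattern#?}"
            return 2
            ;;
        /*)
            dir=$(dirname "$pattern")
            ;;
        *)
            echo "relative pattern: cores land in the crashing process's working directory"
            return 0
            ;;
    esac
    # Note: this tests the current user, not the worker user.
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "ok: $dir exists and is writable by $(id -un)"
    else
        echo "problem: $dir is missing or not writable by $(id -un)"
        return 1
    fi
}

# Usage:
#   check_core_dir "$(cat /proc/sys/kernel/core_pattern)"
```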

I’m not using any 3rd-party modules yet.

nginx version: nginx/0.8.4
configure arguments: --conf-path=/etc/nginx/nginx.conf
--error-log-path=/var/log/nginx/error.log
--pid-path=/var/run/nginx.pid --lock-path=/var/lock/nginx.lock
--http-log-path=/var/log/nginx/access.log
--http-client-body-temp-path=/var/lib/nginx/body
--http-proxy-temp-path=/var/lib/nginx/proxy
--http-fastcgi-temp-path=/var/lib/nginx/fastcgi
--with-http_stub_status_module --with-http_flv_module
--with-http_ssl_module --with-http_dav_module --with-http_realip_module
--with-debug

A piece of the logs:

2009/06/26 15:24:09 3039#0: worker process 6629 exited on signal 11 (core dumped)

Coredump:

GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...
(no debugging symbols found)

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /usr/lib/libpcre.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libpcre.so.3
Reading symbols from /usr/lib/libssl.so.0.9.8...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libssl.so.0.9.8
Reading symbols from /usr/lib/libcrypto.so.0.9.8...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libcrypto.so.0.9.8
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /usr/lib/libz.so.1...
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib/libnss_compat.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_compat.so.2
Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libnss_nis.so.2...
(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
(no debugging symbols found)
Core was generated by `nginx: worker pr'.
Program terminated with signal 11, Segmentation fault.

#0 0x0000000000442efb in ?? ()

In case it matters, this is Ubuntu 8.04 amd64 with backports and a hand-made nginx
package (nconfig : meto; the latest nginx deb isn’t built with the
--with-debug option).

@Grzegorz
I knew how to solve this, but it was so trivial that I forgot
about it ;) I had set the core dump directory in the nginx config, not the global one.
PS.
(In Polish) Not many Poles around here :P

I think that’s all. Is there anything else that may be useful?

Posted at Nginx Forum:

On Fri, Jun 26, 2009 at 09:53:51 -0400, meto wrote:

> #0 0x0000000000442efb in ?? ()

What does “bt” or “bt full” say? If you can rebuild Nginx with -g3 and -O0
(like I wrote before), please do. Otherwise you may see a bunch of “?? ()”s
in the stack trace. It is possible to decode them too, but it’s way harder
than reading a clean trace.

> In case it matters, this is Ubuntu 8.04 amd64 with backports and a hand-made nginx package (nconfig : meto; the latest nginx deb isn’t built with the --with-debug option).

Shouldn’t matter (too much).

> @Grzegorz
> I knew how to solve this, but it was so trivial that I forgot about it ;) I had set the core dump directory in the nginx config, not the global one.
> PS.
> (In Polish) Not many Poles around here :P

(In Polish) One nation has already run a forum here in its native language :P
Stick to English, please.

> I think that’s all. Is there anything else that may be useful?

Don’t think so, just remember that recompiling the binary may make the
old core dumps useless (code layout changes), so you may wish to save
your current binary somewhere, just in case.

Best regards,
Grzegorz N.
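
The advice about keeping the old binary can be followed by archiving each core file together with the exact binary that produced it. A minimal sketch; save_core_with_binary is a hypothetical helper and all three paths are supplied by the caller.

```shell
# Sketch: keep a core file next to the binary that produced it,
# so a later rebuild does not make the core unreadable.
save_core_with_binary() {
    binary=$1
    core=$2
    dest=$3
    mkdir -p "$dest" || return 1
    # -p preserves timestamps and permissions, which helps tell builds apart later.
    cp -p "$binary" "$dest/" && cp -p "$core" "$dest/"
}

# Usage (placeholder paths):
#   save_core_with_binary /usr/sbin/nginx /path/to/core /path/to/archive
```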

bt:

(gdb) bt
#0 0x0000000000442efb in ?? ()
#1 0x0000000000438918 in ?? ()
#2 0x000000000045fe9e in ?? ()
#3 0x000000000042d7ec in ?? ()
#4 0x0000000000428a4d in ?? ()
#5 0x0000000000432825 in ?? ()
#6 0x00000000004331b3 in ?? ()
#7 0x000000000042268b in ?? ()
#8 0x000000000041a2ce in ?? ()
#9 0x0000000000420cb8 in ?? ()
#10 0x000000000041f37d in ?? ()
#11 0x0000000000421c33 in ?? ()
#12 0x00000000004065bb in ?? ()
#13 0x00007ff8bae771c4 in __libc_start_main () from /lib/libc.so.6
#14 0x0000000000404e09 in ?? ()
#15 0x00007fffc4219a38 in ?? ()
#16 0x0000000000000000 in ?? ()

I’m recompiling… once again ;)

Posted at Nginx Forum:

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /usr/lib/libpcre.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libpcre.so.3
Reading symbols from /usr/lib/libssl.so.0.9.8...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libssl.so.0.9.8
Reading symbols from /usr/lib/libcrypto.so.0.9.8...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libcrypto.so.0.9.8
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /usr/lib/libz.so.1...
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib/libnss_compat.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_compat.so.2
Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libnss_nis.so.2...
(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
(no debugging symbols found)
Core was generated by `nginx: worker pr'.
Program terminated with signal 11, Segmentation fault.

#0 0x000000000045dae4 in ?? ()
(gdb) bt full
#0 0x000000000045dae4 in ?? ()
No symbol table info available.
#1 0x0000000000454a00 in ?? ()
No symbol table info available.
#2 0x000000000049386d in ?? ()
No symbol table info available.
#3 0x000000000043e7b4 in ?? ()
No symbol table info available.
#4 0x000000000043d56c in ?? ()
No symbol table info available.
#5 0x000000000043d4e8 in ?? ()
No symbol table info available.
#6 0x000000000044a87e in ?? ()
No symbol table info available.
#7 0x00000000004492b3 in ?? ()
No symbol table info available.
#8 0x0000000000448ac3 in ?? ()
No symbol table info available.
#9 0x0000000000447f9e in ?? ()
No symbol table info available.
#10 0x0000000000427aec in ?? ()
No symbol table info available.
#11 0x00000000004258a9 in ?? ()
No symbol table info available.
#12 0x0000000000431caa in ?? ()
No symbol table info available.
#13 0x000000000042eb31 in ?? ()
No symbol table info available.
#14 0x0000000000430ac9 in ?? ()
No symbol table info available.
#15 0x000000000043021a in ?? ()
No symbol table info available.
#16 0x00000000004054ee in ?? ()
No symbol table info available.
#17 0x00007f293036a1c4 in __libc_start_main () from /lib/libc.so.6
No symbol table info available.
#18 0x0000000000404fd9 in ?? ()
No symbol table info available.
#19 0x00007fff3970f7c8 in ?? ()
No symbol table info available.
#20 0x0000000000000000 in ?? ()
No symbol table info available.
(gdb) bt
#0 0x000000000045dae4 in ?? ()
#1 0x0000000000454a00 in ?? ()
#2 0x000000000049386d in ?? ()
#3 0x000000000043e7b4 in ?? ()
#4 0x000000000043d56c in ?? ()
#5 0x000000000043d4e8 in ?? ()
#6 0x000000000044a87e in ?? ()
#7 0x00000000004492b3 in ?? ()
#8 0x0000000000448ac3 in ?? ()
#9 0x0000000000447f9e in ?? ()
#10 0x0000000000427aec in ?? ()
#11 0x00000000004258a9 in ?? ()
#12 0x0000000000431caa in ?? ()
#13 0x000000000042eb31 in ?? ()
#14 0x0000000000430ac9 in ?? ()
#15 0x000000000043021a in ?? ()
#16 0x00000000004054ee in ?? ()
#17 0x00007f293036a1c4 in __libc_start_main () from /lib/libc.so.6
#18 0x0000000000404fd9 in ?? ()
#19 0x00007fff3970f7c8 in ?? ()
#20 0x0000000000000000 in ?? ()

Crap. Does that mean that it wasn’t compiled with -g3 -O0? :(

Posted at Nginx Forum:

I’m building packages and can force the command line passed to gcc.

Posted at Nginx Forum:

On Fri, Jun 26, 2009 at 01:17:28PM -0400, meto wrote:

> #9 0x0000000000447f9e in ?? ()
> #20 0x0000000000000000 in ?? ()
>
> Crap. Does that mean that it wasn’t compiled with -g3 -O0? :(

Apparently. Go to your source directory, say ‘make clean’ and rebuild
Nginx from scratch, i.e.:

./configure --with-cc-opt="-g3 -O0" … (all other options you used)
make

You should notice the compiled binary is much bigger now (for me it’s
2,2M with default options and 13,3M with -g3 -O0) and your stack traces
should become meaningful.

BTW, if you’re using Polish locales (or anything other than C) you may
wish to say:
export LANG=C
before building Nginx. Otherwise it doesn’t detect you’re using gcc (gcc
-v doesn’t say “gcc version”, at least on some Ubuntu versions; 8.10
looks fine, 7.10 or 8.04 IIRC wasn’t).

Best regards,
Grzegorz N.

I think the problem is that I’m not compiling nginx on the target server.
Sorry, but I don’t want to do that on my production server… Is there
another way to resolve that problem?

Posted at Nginx Forum:

While building, the package prints this:

gcc -c -O0 -g3 -I src/core -I src/event -I src/event/modules -I src/os/unix -I objs \
    -o objs/src/core/ngx_open_file_cache.o \
    src/core/ngx_open_file_cache.c

Posted at Nginx Forum:

On Fri, Jun 26, 2009 at 03:54:15PM -0400, meto wrote:

> I think the problem is that I’m not compiling nginx on the target server. Sorry, but I don’t want to do that on my production server… Is there another way to resolve that problem?

As long as your dev and production server have similar enough software
it shouldn’t matter (well, Nginx runs so it is close enough at least for
that).

Have you managed to get a usable core dump? Have you built an Nginx
binary with debugging flags? What is “the problem” currently?

Best regards,
Grzegorz N.

I’m still getting the same core dumps, and the .deb package itself is only about
20 kB bigger. /usr/sbin/nginx is about 800 kB, whereas the original is approx. 500 kB. But
I still can’t get any more details out of the core dumps. Could there be another
problem? For example, with the packages?

Posted at Nginx Forum:
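
A binary of roughly 800 kB is far from the 13,3M reported earlier in the thread for a -g3 -O0 build, so one possibility (an assumption, not something confirmed in this thread) is that the packaging step strips the debug symbols after compilation. Debian packaging helpers commonly do this via dh_strip unless DEB_BUILD_OPTIONS contains nostrip. A quick way to check the installed binary, sketched with a hypothetical has_debug_info helper that uses readelf from binutils:

```shell
# Sketch: report whether an ELF binary still carries DWARF debug information.
# has_debug_info is a hypothetical helper; it needs readelf (binutils).
has_debug_info() {
    # readelf -S lists section headers; debug builds contain a .debug_info section.
    readelf -S "$1" 2>/dev/null | grep -q '\.debug_info'
}

# Usage:
#   if has_debug_info /usr/sbin/nginx; then
#       echo "debug info present"
#   else
#       echo "binary looks stripped (or is not readable)"
#   fi
```

If the installed binary turns out to be stripped, gdb can probably be pointed at the unstripped binary left in the build tree (nginx's build writes it to objs/nginx), provided it comes from exactly the same build as the one that crashed.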

I’ve been reading about manual installation and have a small question:
Is there a “make uninstall” target implemented in the Makefile? Would
it be safe/possible to overwrite those files from the package after debugging?

Posted at Nginx Forum: