Since version 0.8.x I’ve noticed that my keep-alive sessions are not
removed properly (see http://xgame.pl/nginx/ ). I’m guessing that it’s
connected to the keepalive_requests directive, since it was introduced in
0.8.x.
FYI: Server config:
keepalive_timeout 10
keepalive_requests 250
nginx 0.8.4
On Thu, Jun 25, 2009 at 10:07:08AM -0400, meto wrote:
Since version 0.8.x I’ve noticed that my keep-alive sessions are not removed properly (see http://xgame.pl/nginx/ ). I’m guessing that it’s connected to the keepalive_requests directive, since it was introduced in 0.8.x.
FYI: Server config:
keepalive_timeout 10
keepalive_requests 250
It’s unlikely to be related, but it would be great if you could test in which
exact version the problem first appeared (and which was the last good version).
On Thu, Jun 25, 2009 at 03:49:59PM -0400, meto wrote:
Last working version is 0.7.59 (last development 0.7.x) I’ll check latest stable.
After looking more closely at your graphs, it seems to me that
workers just die on signals (and obviously leave the connection
counters non-decremented). Could you please check your logs
(system and/or nginx error log) to confirm that?
If that’s the cause, it would be great if you could get a
coredump and provide us with a backtrace (but please recompile nginx
--with-debug first if it isn’t built that way yet).
There’s been a massacre at 3pm for nginx procs:
2009/06/26 14:56:35 3039#0: worker process 3040 exited on signal 11
2009/06/26 14:56:35 3039#0: worker process 3043 exited on signal 11
2009/06/26 14:56:36 3039#0: worker process 3041 exited on signal 11
2009/06/26 14:56:36 3039#0: worker process 6622 exited on signal 11
2009/06/26 14:56:36 3039#0: worker process 3042 exited on signal 11
2009/06/26 14:56:36 3039#0: worker process 6625 exited on signal 11
2009/06/26 14:56:36 3039#0: worker process 6623 exited on signal 11
2009/06/26 14:56:36 3039#0: worker process 6626 exited on signal 11
2009/06/26 14:56:36 3039#0: worker process 6627 exited on signal 11
2009/06/26 14:56:37 3039#0: worker process 6624 exited on signal 11
2009/06/26 14:56:37 3039#0: worker process 6630 exited on signal 11
2009/06/26 14:56:37 3039#0: worker process 6631 exited on signal 11
2009/06/26 14:56:37 3039#0: worker process 6632 exited on signal 11
2009/06/26 14:56:39 3039#0: worker process 6633 exited on signal 11
2009/06/26 14:56:39 3039#0: worker process 6634 exited on signal 11
That’s also visible on the graph (http://xgame.pl/nginx/), but no
coredump was saved, due to chmod I believe. So that was a good guess
about the workers dying. I’ll try to provide coredumps later on.
The second question is why there were so many of them, since I’ve set
the worker number to 4 and affinity proc/core. Maybe that’s the problem?
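For anyone checking their own logs for the same symptom, here is a minimal sketch that summarises "exited on signal" lines per signal number. The helper name count_worker_crashes is made up for illustration, and the sample input is just the first few log lines quoted above.

```shell
# Hypothetical helper (not part of nginx): read error-log text on stdin and
# print "<count> signal <N>" for every signal that killed a worker.
count_worker_crashes() {
    grep 'exited on signal' \
        | sed 's/.*exited on signal \([0-9][0-9]*\).*/\1/' \
        | sort -n | uniq -c \
        | awk '{ print $1 " signal " $2 }'
}

# Sample input copied from the log excerpt in this thread.
sample_log='2009/06/26 14:56:35 3039#0: worker process 3040 exited on signal 11
2009/06/26 14:56:35 3039#0: worker process 3043 exited on signal 11
2009/06/26 14:56:36 3039#0: worker process 3041 exited on signal 11'

printf '%s\n' "$sample_log" | count_worker_crashes
```

On a real server it would be fed the error log instead, e.g. count_worker_crashes < /var/log/nginx/error.log (that path is an assumption; use whatever error_log points to).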
On Fri, Jun 26, 2009 at 09:06:36AM -0400, meto wrote:
2009/06/26 14:56:37 3039#0: worker process 6624 exited on signal 11
2009/06/26 14:56:37 3039#0: worker process 6630 exited on signal 11
2009/06/26 14:56:37 3039#0: worker process 6631 exited on signal 11
2009/06/26 14:56:37 3039#0: worker process 6632 exited on signal 11
2009/06/26 14:56:39 3039#0: worker process 6633 exited on signal 11
2009/06/26 14:56:39 3039#0: worker process 6634 exited on signal 11
That’s also visible on the graph (http://xgame.pl/nginx/), but no coredump was saved, due to chmod I believe. So that was a good guess about the workers dying. I’ll try to provide coredumps later on.
Compile Nginx with --with-cc-opt="-O0 -g3" first. It will be a bit slower
(and noticeably larger) but should be easier to debug compared to
tightly optimised release code.
Are you using any addon modules/3rd party patches?
The second question is why there were so many of them, since I’ve set the worker number to 4 and affinity proc/core. Maybe that’s the problem?
If you’re on a reasonably modern Linux, try something like this:
2009/06/26 15:24:09 3039#0: worker process 6629 exited on signal 11
(core dumped)
Coredump:
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type “show copying”
and “show warranty” for details.
This GDB was configured as “x86_64-linux-gnu”…
(no debugging symbols found)
warning: Can’t read pathname for load map: Input/output error.
Reading symbols from /lib/libcrypt.so.1…(no debugging symbols
found)…done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /usr/lib/libpcre.so.3…(no debugging symbols
found)…done.
Loaded symbols for /usr/lib/libpcre.so.3
Reading symbols from /usr/lib/libssl.so.0.9.8…(no debugging symbols
found)…done.
Loaded symbols for /usr/lib/libssl.so.0.9.8
Reading symbols from /usr/lib/libcrypto.so.0.9.8…(no debugging symbols
found)…done.
Loaded symbols for /usr/lib/libcrypto.so.0.9.8
Reading symbols from /lib/libdl.so.2…(no debugging symbols
found)…done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /usr/lib/libz.so.1…
(no debugging symbols found)…done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libc.so.6…(no debugging symbols
found)…done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux-x86-64.so.2…(no debugging symbols
found)…done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib/libnss_compat.so.2…(no debugging symbols
found)…done.
Loaded symbols for /lib/libnss_compat.so.2
Reading symbols from /lib/libnsl.so.1…(no debugging symbols
found)…done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libnss_nis.so.2…
(no debugging symbols found)…done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnss_files.so.2…(no debugging symbols
found)…done.
Loaded symbols for /lib/libnss_files.so.2
(no debugging symbols found)
Core was generated by `nginx: worker pr’.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000442efb in ?? ()
If it matters, it’s Ubuntu 8.04 amd64 with backports and a hand-made nginx
package ( nconfig : meto - the last
nginx deb isn’t built with the --with-debug option)
@Grzegorz
I knew how to resolve this, but it was so trivial that I forgot
about it. I’ve set the coredump directory in nginx, not the global one.
PS.
Not many Poles around here
On Fri, Jun 26, 2009 at 09:53:51 -0400, meto wrote:
#0 0x0000000000442efb in ?? ()
What does “bt” or “bt full” say? If you can rebuild Nginx with -g3 and -O0
(like I wrote before), please do. Otherwise you may see a bunch of “?? ()”s
in the stack trace. It is possible to decode them too, but it’s way harder
than reading a clean trace.
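As a rough sketch of what decoding could look like: pull the frame addresses out of the gdb output and hand them to addr2line from binutils. The helper name frame_addrs and the file names bt.txt and ./nginx.saved are made up for illustration. addr2line can only print names if the binary it is given still carries symbols, so on a fully stripped binary it will just print "??" again.

```shell
# Hypothetical helper: print the code address of every frame found in a gdb
# backtrace read from stdin, one address per line.
frame_addrs() {
    grep -o '#[0-9][0-9]* *0x[0-9a-f][0-9a-f]*' | awk '{ print $2 }'
}

# Demo on the first two frames of the backtrace posted in this thread.
printf '%s\n' \
    '#0 0x0000000000442efb in ?? ()' \
    '#1 0x0000000000438918 in ?? ()' \
    | frame_addrs

# Assumption: bt.txt holds the saved "bt" output and ./nginx.saved is the
# exact binary that produced the core (with its symbols intact).
# frame_addrs < bt.txt | xargs addr2line -f -e ./nginx.saved
```

The address extraction also works when the whole backtrace has been pasted on a single line, since grep -o prints each match separately.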
If it matters, it’s Ubuntu 8.04 amd64 with backports and a hand-made nginx package ( nconfig : meto - the last nginx deb isn’t built with the --with-debug option)
Shouldn’t matter (too much).
@Grzegorz
I knew how to resolve this, but it was so trivial that I forgot about it. I’ve set the coredump directory in nginx, not the global one.
PS.
Not many Poles around here
One nation has already turned this forum into one in its native language
Stick to English, please.
I think that’s all. Anything else that may be useful?
Don’t think so, just remember that recompiling the binary may make the
old core dumps useless (code layout changes), so you may wish to save
your current binary somewhere, just in case.
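A minimal sketch of keeping the old binary next to its core dumps before rebuilding. The function name save_binary and the backup directory are made up for illustration; only cp, mkdir, basename and date are used.

```shell
# Hypothetical helper: copy a binary into a backup directory, tagging the copy
# with a timestamp so it can later be matched with core dumps from that build.
# Prints the path of the copy on success.
save_binary() {
    src=$1
    dest_dir=$2
    mkdir -p "$dest_dir" || return 1
    dest="$dest_dir/$(basename "$src").$(date +%Y%m%d-%H%M%S)"
    cp -p "$src" "$dest" && printf '%s\n' "$dest"
}

# Example call (both paths are assumptions about the local setup):
# save_binary /usr/sbin/nginx "$HOME/nginx-backups"
```

The saved copy can then be passed to gdb together with an old core file, even after the installed binary has been replaced.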
(gdb) bt
#0 0x0000000000442efb in ?? ()
#1 0x0000000000438918 in ?? ()
#2 0x000000000045fe9e in ?? ()
#3 0x000000000042d7ec in ?? ()
#4 0x0000000000428a4d in ?? ()
#5 0x0000000000432825 in ?? ()
#6 0x00000000004331b3 in ?? ()
#7 0x000000000042268b in ?? ()
#8 0x000000000041a2ce in ?? ()
#9 0x0000000000420cb8 in ?? ()
#10 0x000000000041f37d in ?? ()
#11 0x0000000000421c33 in ?? ()
#12 0x00000000004065bb in ?? ()
#13 0x00007ff8bae771c4 in __libc_start_main () from /lib/libc.so.6
#14 0x0000000000404e09 in ?? ()
#15 0x00007fffc4219a38 in ?? ()
#16 0x0000000000000000 in ?? ()
warning: Can’t read pathname for load map: Input/output error.
Reading symbols from /lib/libcrypt.so.1…(no debugging symbols
found)…done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /usr/lib/libpcre.so.3…(no debugging symbols
found)…done.
Loaded symbols for /usr/lib/libpcre.so.3
Reading symbols from /usr/lib/libssl.so.0.9.8…(no debugging symbols
found)…done.
Loaded symbols for /usr/lib/libssl.so.0.9.8
Reading symbols from /usr/lib/libcrypto.so.0.9.8…(no debugging symbols
found)…done.
Loaded symbols for /usr/lib/libcrypto.so.0.9.8
Reading symbols from /lib/libdl.so.2…(no debugging symbols
found)…done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /usr/lib/libz.so.1…
(no debugging symbols found)…done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libc.so.6…(no debugging symbols
found)…done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux-x86-64.so.2…(no debugging symbols
found)…done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib/libnss_compat.so.2…(no debugging symbols
found)…done.
Loaded symbols for /lib/libnss_compat.so.2
Reading symbols from /lib/libnsl.so.1…(no debugging symbols
found)…done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libnss_nis.so.2…
(no debugging symbols found)…done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnss_files.so.2…(no debugging symbols
found)…done.
Loaded symbols for /lib/libnss_files.so.2
(no debugging symbols found)
Core was generated by `nginx: worker pr’.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000045dae4 in ?? ()
(gdb) bt full
#0 0x000000000045dae4 in ?? ()
No symbol table info available.
#1 0x0000000000454a00 in ?? ()
No symbol table info available.
#2 0x000000000049386d in ?? ()
No symbol table info available.
#3 0x000000000043e7b4 in ?? ()
No symbol table info available.
#4 0x000000000043d56c in ?? ()
No symbol table info available.
#5 0x000000000043d4e8 in ?? ()
No symbol table info available.
#6 0x000000000044a87e in ?? ()
No symbol table info available.
#7 0x00000000004492b3 in ?? ()
No symbol table info available.
#8 0x0000000000448ac3 in ?? ()
No symbol table info available.
#9 0x0000000000447f9e in ?? ()
No symbol table info available.
#10 0x0000000000427aec in ?? ()
No symbol table info available.
#11 0x00000000004258a9 in ?? ()
No symbol table info available.
#12 0x0000000000431caa in ?? ()
No symbol table info available.
#13 0x000000000042eb31 in ?? ()
No symbol table info available.
#14 0x0000000000430ac9 in ?? ()
No symbol table info available.
#15 0x000000000043021a in ?? ()
No symbol table info available.
#16 0x00000000004054ee in ?? ()
No symbol table info available.
#17 0x00007f293036a1c4 in __libc_start_main () from /lib/libc.so.6
No symbol table info available.
#18 0x0000000000404fd9 in ?? ()
No symbol table info available.
#19 0x00007fff3970f7c8 in ?? ()
No symbol table info available.
#20 0x0000000000000000 in ?? ()
No symbol table info available.
(gdb) bt
#0 0x000000000045dae4 in ?? ()
#1 0x0000000000454a00 in ?? ()
#2 0x000000000049386d in ?? ()
#3 0x000000000043e7b4 in ?? ()
#4 0x000000000043d56c in ?? ()
#5 0x000000000043d4e8 in ?? ()
#6 0x000000000044a87e in ?? ()
#7 0x00000000004492b3 in ?? ()
#8 0x0000000000448ac3 in ?? ()
#9 0x0000000000447f9e in ?? ()
#10 0x0000000000427aec in ?? ()
#11 0x00000000004258a9 in ?? ()
#12 0x0000000000431caa in ?? ()
#13 0x000000000042eb31 in ?? ()
#14 0x0000000000430ac9 in ?? ()
#15 0x000000000043021a in ?? ()
#16 0x00000000004054ee in ?? ()
#17 0x00007f293036a1c4 in __libc_start_main () from /lib/libc.so.6
#18 0x0000000000404fd9 in ?? ()
#19 0x00007fff3970f7c8 in ?? ()
#20 0x0000000000000000 in ?? ()
Crap. Does that mean that it wasn’t compiled with -g3 -O0?
On Fri, Jun 26, 2009 at 01:17:28PM -0400, meto wrote:
#9 0x0000000000447f9e in ?? () #20 0x0000000000000000 in ?? ()
Crap. Does that mean that it wasn’t compiled with -g3 -O0?
Apparently. Go to your source directory, say ‘make clean’ and rebuild
Nginx from scratch, i.e.:
./configure --with-cc-opt="-g3 -O0" … (all other options you used)
make
You should notice the compiled binary is much bigger now (for me it’s
2.2M with default options and 13.3M with -g3 -O0) and your stack traces
should become meaningful.
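Besides comparing sizes, a quick way to tell whether a build really carries debugging information is to look for a .debug_info section in the ELF file. This is a sketch that assumes readelf from binutils is installed; the helper name has_debug_info is made up.

```shell
# Hypothetical helper: succeed (exit status 0) if the ELF file given as $1
# contains a .debug_info section, fail otherwise (including when the file is
# not an ELF binary or readelf is not installed).
has_debug_info() {
    readelf -S "$1" 2>/dev/null | grep -q '\.debug_info'
}

# Example call; objs/nginx is where the build normally leaves the binary
# inside the source tree, adjust the path if yours differs:
# if has_debug_info objs/nginx; then
#     echo "debug info present"
# else
#     echo "no debug info, check the configure options"
# fi
```

If the freshly built binary passes this check but the installed one does not, the symbols are probably being stripped somewhere between the build and the installed package.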
BTW, if you’re using Polish locales (or anything other than C) you may
wish to say:
export LANG=C
before building Nginx. Otherwise configure doesn’t detect that you’re using gcc (gcc
-v doesn’t say “gcc version”, at least on some Ubuntu versions; 8.10
looks fine, 7.10 or 8.04 IIRC didn’t).
I think the problem is that I’m not compiling nginx on the target server.
Sorry, but I don’t want to do that on my production server… Is there
another way to resolve that problem?
On Fri, Jun 26, 2009 at 03:54:15PM -0400, meto wrote:
I think the problem is that I’m not compiling nginx on the target server. Sorry, but I don’t want to do that on my production server… Is there another way to resolve that problem?
As long as your dev and production server have similar enough software
it shouldn’t matter (well, Nginx runs so it is close enough at least for
that).
Have you managed to get a usable core dump? Have you built an Nginx
binary with debugging flags? What is “the problem” currently?
I’m still getting the same coredumps, and the .deb package itself is about
20kb bigger. /usr/sbin/nginx is about 800kb, while the original is approx. 500kb. But
I still can’t get any more details from the coredumps. Could there be some other
problem? For example, with the packages?
I’ve read a bit about manual installation and have a small question:
Is there a way to run “make uninstall”? Would
it be safe/possible to overwrite the files from the package after debugging?
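On the “make uninstall” question, it is probably safer to check the Makefile in your own source tree than to assume the target exists. A small sketch follows; the helper name has_make_target is made up, and it only looks at the one file it is given, so rules pulled in from included Makefiles are not seen.

```shell
# Hypothetical helper: succeed if the Makefile given as $1 defines a rule
# whose target name is exactly $2 (e.g. "install" or "uninstall").
has_make_target() {
    makefile=$1
    target=$2
    grep -q "^${target}[[:space:]]*:" "$makefile"
}

# Example call from the top of the source tree after running ./configure:
# if has_make_target Makefile uninstall; then
#     echo "uninstall target found"
# else
#     echo "no uninstall target, files would have to be removed by hand"
# fi
```

If there is no such target, installing into a separate prefix for debugging keeps the files owned by the package untouched.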