php-fpm backend, SIGSEGV and limit_conn

Hello,

I’m facing a situation where a worker process crashes in the php-fpm
backend with SIGSEGV:

[alert] 5712#0: worker process 5713 exited on signal 11

In such a case, what happens to the connection? Is it left dangling?
The backend crashes are random, and I have the impression that when too
many crashes happen in a row, connections stay open and eventually I
hit limit_conn.

What do you think?

Regards,
Gregory

Hello!

On Thu, Apr 19, 2012 at 02:23:40PM +0200, Grégory Pakosz wrote:

I’m facing a situation where a worker process crashes in the php-fpm
backend with SIGSEGV: [alert] 5712#0: worker process 5713 exited on
signal 11

You mean nginx worker process crash, right?

In such a case, what happens to the connection? Is it left dangling?
The backend crashes are random, and I have the impression that when too
many crashes happen in a row, connections stay open and eventually I
hit limit_conn.

The limit_conn numbers are kept in shared memory and will not be
decremented on an nginx worker process crash, eventually leading to
limit hits. To clear the numbers you may do an online upgrade, see
Controlling nginx.
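For reference, the online upgrade that replaces the master process (and
thereby resets the shared memory zones) can be sketched roughly as
follows; the pid file paths assume a default install, so adjust them to
your setup:

```shell
# Start a new master process (with fresh shared memory zones) from the
# same binary; nginx renames the old pid file to nginx.pid.oldbin.
kill -USR2 "$(cat /var/run/nginx.pid)"

# Gracefully shut down the old workers, then the old master.
kill -WINCH "$(cat /var/run/nginx.pid.oldbin)"
kill -QUIT  "$(cat /var/run/nginx.pid.oldbin)"
```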

Or, better, you may want to debug the crashes; see here for basic
instructions.
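If it is indeed the nginx worker that segfaults, a common first step is
to let workers write core dumps. A minimal sketch of the relevant
nginx.conf directives (the directory and size limit here are
assumptions, adjust to taste):

```nginx
# nginx.conf, main (top-level) context
worker_rlimit_core  500m;
working_directory   /var/tmp/;  # must be writable by the worker user
```

Once a core file appears there, opening it with gdb against the nginx
binary and running `backtrace` usually shows where the crash happened.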

Note well that crashes might lead to much more severe
consequences, e.g. if the crash happens during shared memory
update it may be left in an inconsistent state, leading to
unpredictable behaviour (usually more crashes).

Maxim D.

On Thu, Apr 19, 2012 at 3:35 PM, Maxim D. [email protected]
wrote:

You mean nginx worker process crash, right?

The errors get logged into /var/log/nginx/error.log:

2012/04/19 15:34:41 [alert] 5712#0: worker process 9234 exited on signal 11
2012/04/19 15:34:48 [alert] 5712#0: worker process 9252 exited on signal 11
2012/04/19 15:34:57 [alert] 5712#0: worker process 9253 exited on signal 11
2012/04/19 15:35:06 [alert] 5712#0: worker process 9272 exited on signal 11
2012/04/19 15:36:11 [alert] 5712#0: worker process 9277 exited on signal 11

debian squeeze
nginx-full 1.1.19-1~bpo60+1 from squeeze-backports
php5-fpm 5.3.10-1~dotdeb.1 from dotdeb

The context: I’m experimenting with Piwik’s new log analytics import
script. It’s a Python script that parses server logs and makes POST
requests to a Piwik instance. I wouldn’t say it’s hammering the server
much, as it’s only able to post around 10 log lines per second.

It’s those POST requests that randomly cause the SIGSEGV errors logged
in /var/log/nginx/error.log. After many worker process crashes, I
eventually hit the connection limit I set with limit_conn (10
simultaneous connections per IP). I could disable limit_conn as a
workaround until I find the root cause of the crashes.
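For context, the limit described above corresponds roughly to a
configuration like the following (the zone name is made up here for
illustration; in the nginx 1.1.x series the zone is declared with
limit_conn_zone):

```nginx
http {
    # One counter slot per client address; "peraddr" is a hypothetical zone name.
    limit_conn_zone $binary_remote_addr zone=peraddr:10m;

    server {
        # At most 10 simultaneous connections per IP.
        limit_conn peraddr 10;
    }
}
```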

About the crashes themselves, I really don’t know what’s going on so
far. The PHP-FPM logs remain empty (or I wasn’t able to enable them
correctly). Piwik’s wiki states that Piwik itself cannot be the root
cause of a SIGSEGV; instead they say bugs in PHP itself, MySQL, or
accelerators like APC are to blame. (Note that I disabled APC, without
much luck.)

Note well that crashes might lead to much more severe
consequences, e.g. if the crash happens during shared memory
update it may be left in an inconsistent state, leading to
unpredictable behaviour (usually more crashes).

I’ll give it a go. Thank you for the reply.

Gregory