PHP-FPM stops responding

I’m having what appears to be a fairly common problem with PHP-FPM.
Every now and again, I get a string of errors in my log like this…

recv() failed (104: Connection reset by peer) while reading response
header from upstream, client:
recv() failed (104: Connection reset by peer) while reading response
header from upstream, client:
recv() failed (104: Connection reset by peer) while reading response
header from upstream, client:
etc

I found this in the forums from a few months back:

[i]2. The typical problem we have encountered when PHP pages suddenly
stop processing is that either all the forked children are running some
long (unintended) scripts (as the built-in max_execution_time doesn't
always work as expected, if at all) or they have simply hung, so the
master process has no free children to assign the incoming request to.

That's why you:

  • spawn more than just a few children. While the typical approach is to
    go by CPU core count, we have found that a multiplier of 3 - 4x works
    better, as PHP code usually spends more time waiting on external
    resources (DBs etc.) than executing code
  • use the great features of php-fpm to monitor which scripts take too
    long to execute and kill those that take too long.

For example, we use:
request_slowlog_timeout = 30s
request_terminate_timeout = 60s

Which means that requests taking more than 30 seconds to complete will be
logged (with a backtrace) and those taking longer than a minute will be
killed by force.[/i]

I figured this would be the first place to start. How do I spawn more
children for PHP?

I would like to track down the offending script as well - how do I use
the items that were talked about above?

Thanks

Flash

Posted at Nginx Forum:

I figured this would be the first place to start. How do I spawn more
children for PHP?

If you want FPM to do it, you need to use dynamic spawning (edit
php-fpm.conf):

pm = dynamic

and change the following settings to your needs:

pm.max_children
pm.start_servers
pm.min_spare_servers
pm.max_spare_servers

Otherwise, if you use the 'static' process manager, just increase
pm.max_children.
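As a rough sketch only: the numbers below are made up for an assumed
8-core machine using the 4x multiplier from the quoted post, and the pool
name www is taken from the log output in this thread. Tune them to your
own memory and load. The pool section could look something like:

```ini
[www]
; Let FPM grow and shrink the worker pool on demand.
pm = dynamic
; Hard upper limit on workers (assumed: 8 cores x 4 = 32).
pm.max_children = 32
; Workers created at startup; keep it between the two spare limits.
pm.start_servers = 8
; FPM spawns more workers when fewer than this many are idle.
pm.min_spare_servers = 4
; FPM kills idle workers when more than this many are idle.
pm.max_spare_servers = 12
```

With pm = static, only pm.max_children matters and FPM keeps exactly that
many workers running all the time.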

The error_log should also tell you when these parameters need tuning.
Example output:

Jul 12 20:14:20.869487 [WARNING] [pool www] seems busy (you may need to
increase start_servers, or min/max_spare_servers), spawning 32 children,
there are 0 idle, and 66 total children
Jul 12 20:14:21.872523 [WARNING] [pool www] server reached max_children
setting (70), consider raising it

I would like to track down the offending script as well - how do I use
the items that were talked about above?

request_terminate_timeout = 60s
request_slowlog_timeout = 20s
slowlog = /path/slow.log

slow.log will then contain a backtrace showing which script, and which
part of it, is taking too long (in this case longer than 20 seconds).
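Put together, and only as a sketch (this assumes the ini-style
php-fpm.conf and the pool named www from the log output earlier;
/path/slow.log is a placeholder, not a real path), that part of the pool
section might read:

```ini
[www]
; Dump a backtrace to the slowlog once a request has run for 20 seconds.
; A value of 0 turns the slowlog off.
request_slowlog_timeout = 20s
; Placeholder path: point it at a file the FPM process can write to.
slowlog = /path/slow.log
; Kill the worker outright once a single request has run for 60 seconds.
; This is the fallback for when max_execution_time does not stop the script.
request_terminate_timeout = 60s
```

Keeping the slowlog timeout well below the terminate timeout means a
runaway script gets logged with a backtrace before it is killed.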

You can check http://php-fpm.org/wiki/Configuration_File - while the
documentation is for the old-style XML configuration, the variable names
are still the same.

rr