Nginx + PHP FASTCGI FAILS - how to debug?

I have an Ubuntu 9.10 server on Amazon EC2 running Nginx + PHP with PHP
FastCGI via port 9000.

The server runs fine for a few minutes (several thousand hits in this
case), then FastCGI dies and Nginx returns a 502 error.

Nginx log shows: 2010/01/12 16:49:24 1093#0: *9965 connect() failed
(111: Connection refused) while connecting to upstream, client:
79.180.27.241, server: localhost, request: "GET /data.php?data=7781
HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host:
"site1.mysite.com", referrer: "http://www.othersite.com/subc.asp?t=10"

How can I debug what is causing FastCGI to die?


On Tue, Jan 12, 2010 at 10:05 AM, Niro [email protected] wrote:


It appears not to be running, or perhaps a firewall is blocking it, but
I am guessing the former.
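
A quick way to check both from a shell, assuming the default
127.0.0.1:9000 setup from the original post:

ps aux | grep [p]hp          # are any php-cgi processes left at all?
netstat -lnt | grep :9000    # is anything still listening on the FastCGI port?

If ps shows no processes, the backend died outright; if processes exist
but nothing listens on 9000, they are stuck or bound somewhere else.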

– Merlin

Same problem here.

This is with a clean install of Nginx 0.8.33 and Ubuntu Server 9.10.
Will try with 0.7.65 soon.

Anyhow, Nginx with PHP FastCGI works fine for a while and then just
dies. How long it keeps working seems random to me.

If I restart Nginx and PHP FastCGI, PHP web pages will load once again,
but oddly it then keeps trying to load any images / scripts / files,
etc. using https. I don’t have https. Again, this does not happen from
the start.

Very strange behavior, and it does not happen until I start seeing 502
errors.

I have to reboot the server for it to work again, and it does work
every single time after a reboot.

Really confusing.


Hi,

I’m experiencing something similar.
After some time (a few hours, though I’m not getting that many hits),
PHP appears to stop responding: the processes are running, but nginx
times out. First with
upstream timed out (110: Connection timed out) while reading
response header from upstream
then
recv() failed (104: Connection reset by peer) while reading response
header from upstream

This happened with php-cgi and a custom start script a few days ago
(same symptoms; I have not checked the logs carefully), and again today
after I switched to php-fpm.

So, both php-cgi and php-fpm appear not to behave well with nginx.
These problems never appeared with lighttpd (in the same
container/machine), but are happening now that only PHP runs in the
container/machine and nginx (acting as the first proxy in line)
forwards directly to it.

Maybe this did not happen with lighttpd before because it (and the
php-cgi process) was regularly restarted, which is not the case
currently.

I’ve found PHP Bug #39809 ("FastCGI Requests silently dropped").

I’m running PHP 5.3.1-0.dotdeb.1 and nginx 0.8.33-0~ppa2.

I guess this is a PHP problem rather than an nginx one, since
restarting php-fpm fixes it.

For now, I’ve tried bumping the number of child processes, hoping that
this will prevent the PHP processes from entering the ignoring state.
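
For reference, a sketch of the relevant pool settings in the ini-style
config of later php-fpm releases (my 5.3.1 dotdeb build still uses the
older XML config, with equivalent <value> tags; the numbers here are
placeholders, not my actual values):

pm = dynamic
pm.max_children = 16        ; upper bound on worker processes
pm.start_servers = 4
pm.min_spare_servers = 2
pm.max_spare_servers = 8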

Here’s a munin graph from the PHP container
(http://i.imgur.com/MHbNi.png) - the failures started as the number of
established network connections increased: those are probably the
connections from nginx queuing up without being answered.

How can I debug what is causing FastCGI to die?

So in your case there aren’t any php-cgi processes anymore? Have you
checked the php error log?

Cheers,
Daniel


http://daniel.hahler.de/

So, both php-cgi and php-fpm appear not to behave well with nginx.

First of all: when running PHP in FastCGI mode, you should understand
that the webserver is more or less taken out of the picture.

The PHP FastCGI service is a standalone process which listens on a port
or socket and doesn’t actually care which webserver is used, as long as
it can ‘talk FastCGI’. Of course there are some differences in how the
webservers handle this or that situation/error; lighttpd, for example,
disables the backend for some (configurable) time if it’s not reachable
and doesn’t try to reconnect during that period.
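
On the nginx side all of that reduces to a pointer at that port or
socket; a typical location block, where the docroot path is an
assumption:

location ~ \.php$ {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME /var/www$fastcgi_script_name;
    fastcgi_pass 127.0.0.1:9000;
}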

But the basic principles which have worked for us (on a pretty big site
with thousands of requests per second) are these:

  1. Use PHP-FPM! ( http://php-fpm.org/ ) While at the moment it requires
    you to patch and build your PHP from source (rather than
    installing some distro package), even the PHP developers themselves have
    admitted that the current PHP process manager is basically shite.
    And for good reason they have at last included php-fpm in the 5.3
    tree (still experimental though).

Without php-fpm you basically have no control over your FastCGI
processes (except ‘PHP_FCGI_MAX_REQUESTS’): the master process
can die, and children can get stuck in infinite loops or eat all of
your RAM in case of leaking code / extensions.
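
For completeness, those bare-bones knobs look like this (a sketch; the
php-cgi path is an assumption, and spawn-fcgi passes its environment
through to the children it spawns):

PHP_FCGI_CHILDREN=8 PHP_FCGI_MAX_REQUESTS=500 \
  spawn-fcgi -a 127.0.0.1 -p 9000 -f /usr/bin/php-cgi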

  2. The typical problem we have encountered when PHP pages suddenly stop
    processing is that either all the forked children are running some
    long (unintended) scripts (as the built-in max_execution_time
    doesn’t always work as expected, if at all), or they have simply
    hung, so the master process has no free children to assign the
    incoming request to.

That’s why you should:

  • spawn more than just a few children. While the typical approach is to
    go by CPU core count, we have found that a multiplier of
    3 - 4x works better, as PHP code usually tends to spend more time
    waiting on external resources (DBs etc.) than actually
    processing code
  • use the great features of php-fpm to monitor which scripts take too
    long to execute and kill those that do.

Like we use (php-fpm’s request timeout settings):

request_slowlog_timeout = 30s
request_terminate_timeout = 60s

Which means that requests taking more than 30 seconds to compute will be
logged (with a backtrace) and those taking longer than a minute will be
killed by force.
This has helped us to find all the infinite loops and other weird issues
created by PHP coders, or by opcode accelerators like eAccelerator (look
at this for example: http://www.eaccelerator.net/ticket/381 )

  • tune the process_control_timeout and emergency_restart_threshold
    settings so the PHP children get respawned in case there are memory
    leaks/errors or a child gets stuck (see the sketch just below).
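
A sketch of those globals in ini-style php-fpm syntax (the values are
examples only; the experimental 5.3 build spells these as XML <value>
tags):

process_control_timeout = 10s
emergency_restart_threshold = 10
emergency_restart_interval = 1m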
  3. At last: finding out what a child is actually doing is easily done
    with ‘strace’. Use ‘ps aux’ to see the process numbers, see which
    PHP child is taking CPU, and then attach with ‘strace -p [pid]’ to
    take a look at what the process is doing (if anything).
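
A minimal sketch of that workflow (the PID is a placeholder):

ps aux --sort=-%cpu | grep [p]hp | head    # busiest PHP children first
strace -p 12345                            # attach and watch the syscalls

If strace prints nothing at all, the child is blocked; if it loops over
the same calls forever, you have probably found your infinite loop.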

rr

Hello.

A few days ago we got the same thing! The php-cgi processes were
running, using all the CPU. I had to kill them explicitly. I could
reproduce it by stressing the application with http_load.

After upgrading eAccelerator to 0.9.6 final (or disabling the old
version), everything went back to normal behaviour.

CentOS 5.4 64-bit
PHP 5.3.1

Hope this helps.


A few days ago I got something similar: all the php-cgi processes were
still running but I got a bad gateway error. php-cgi ignored SIGTERM,
but after killing it with SIGKILL and starting it again it seemed to
work. Odd.
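
By hand that amounts to something like this (a sketch; pkill matches on
the process name):

pkill php-cgi        # polite SIGTERM first (ignored in this case)
sleep 2
pkill -9 php-cgi     # SIGKILL cannot be ignored

and then respawning with whatever start script you normally use.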


What are you seeing on the system side? What’s your output of:

dstat
ulimit

Also, how many connections are you allowing for your FastCGI?

Hi, I’m having the same problem for a few weeks now: after x hours/days
PHP becomes unresponsive. I’ve already tried 2 different php-fastcgi
spawning scripts and php-fpm as well, and switched between many
versions of the PHP 5.2 and 5.3 branches, but the problem still
remains, so I don’t think it’s a PHP issue even if that would be the
most logical conclusion.

When PHP becomes unresponsive (502 bad gateway error) all the
php-cgi/php-fpm processes are still running and I can see them in
top/ps aux; when I restart PHP everything starts to work again.

I don’t know if it’s related, but while I was browsing this forum a few
minutes ago I got the 502 error with the same behavior it has when it
happens on my site; it lasted a few minutes until someone took care of
it. Also, take a look at this discussion:
http://groups.google.com/group/highload-php-en/browse_thread/thread/a3809a50eba71a45
I think this is an nginx bug; it would be nice if some developer could
look into it. I can provide error logs and such if needed, just let me
know, thanks!


You probably have too many concurrent connections from nginx to your
php-fastcgi application. Try increasing the listen queue to 1024 or so
to handle your load spikes, benchmarking, or whatever is causing it.
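
For php-fpm that means the listen backlog; a sketch in ini-style pool
syntax (the kernel also caps the backlog at net.core.somaxconn, which
defaults to 128 on many systems):

listen.backlog = 1024

and, as root:

sysctl -w net.core.somaxconn=1024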


Hi, thanks for the replies.

ulimit output: unlimited
To run dstat I will have to wait until PHP becomes unresponsive; it
could be hours or days. I can’t reproduce it since it’s not related to
the load on the server: I tried stressing it with ApacheBench, but even
with all the CPUs at 100% the site was still working fine (only slow).
I use munin, and every time it happens I don’t see any weird behavior
apart from the mysql threads having a spike.

PHP_FCGI_MAX_REQUESTS = 500
I’m using dynamic children with php-fpm (but I’ve already tried many
settings here, even without php-fpm).

Here’s the first part of the nginx configuration; let me know if you
need the whole thing:

user www-data www-data;
worker_processes 12;
error_log logs/error.log;
pid /dev/shm/nginx.pid;
worker_rlimit_nofile 10000;

events {
    worker_connections 1024;
    use epoll;
}

http {
    include mime.types;
    #include /etc/nginx/proxy.conf;
    include /usr/local/nginx/conf/fastcgi_params;

    default_type application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    access_log logs/access.log main;
    sendfile on;
    tcp_nopush on;
    server_names_hash_bucket_size 128; # seems to be required for some vhosts
    client_max_body_size 12m;
    large_client_header_buffers 8 32k;

    # output compression saves bandwidth
    gzip on;
    gzip_proxied any;
    gzip_http_version 1.1;
    #gzip_min_length 1100;
    gzip_comp_level 2;
    #gzip_buffers 4 8k;
    gzip_types text/plain text/css application/x-javascript text/xml
               application/xml application/xml+rss text/javascript
               application/atom+xml;
    #gzip_vary on;
    #gzip_disable "MSIE [1-6].";


Only a bunch of apf (firewall) messages…


Hey, also: what is the output of your dmesg?


No, nothing like that, not now and not when the PHP problem occurs; the
system logs are clean.


Nothing saying that you’re out of sockets or memory, or any messages
about connection tracking modules being full?


I realize you may have moved on; some time has passed since you posted
this problem. However, I had the same issue and I was able to fix it.
The user nginx runs as in a default setup is "www-data" or "nobody".
Nginx has insufficient permissions to communicate with
fastcgi://127.0.0.1:9000, which usually runs as root. So if you use
spawn-fcgi, you must run it with the -U flag and specify the same user
that nginx runs as. For example:

spawn-fcgi -a 127.0.0.1 -p 9000 -C 5 -f /usr/bin/php-cgi -U nobody

or if you use sockets

spawn-fcgi -s /var/run/fcgi.sock -C 5 -f /usr/bin/php-cgi -U nobody

My problem was identical to yours: PHP would work for a moment, then
become unresponsive even though the processes were still running. I
hope this helps; this problem was very frustrating.

I’m having this issue. php5-cgi just disappears after an undetermined
time (1 to 4 hrs).

I have the latest PHP, nginx and lighttpd installed on Ubuntu.

To get my web service back I need to re-spawn it using this command:

/usr/bin/spawn-fcgi -a 127.0.0.1 -p 9000 -u www-data -g www-data -f
/usr/bin/php5-cgi -P /var/run/fastcgi-php.pid

Your help is appreciated.

/usr/bin/spawn-fcgi -a 127.0.0.1 -p 9000 -u www-data -g www-data -f
/usr/bin/php5-cgi -P /var/run/fastcgi-php.pid

I would suggest using the FPM PHP process manager (
http://www.php.net/manual/en/install.fpm.php ) instead of ‘spawn-fcgi’.
It has been in the PHP core for a while now and is the preferred way to
use PHP with webservers speaking the FastCGI interface.
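
A minimal sketch of the swap, keeping the same address so the nginx
side does not change (binary names and config paths vary by distro and
are assumptions here):

; in the php-fpm pool config
listen = 127.0.0.1:9000
user = www-data
group = www-data

# then start php-fpm instead of spawn-fcgi
php-fpm    # or e.g. /etc/init.d/php5-fpm start

nginx keeps pointing at fastcgi_pass 127.0.0.1:9000 and never notices
the difference.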

rr