Nginx + php-fpm "504 Gateway Time-out" error with almost zero load (on a test-server)

(Note: this is a repost of "Nginx + php-fpm "504 Gateway Time-out" error
with almost zero load (on a test-server)")

Hi All,

After debugging for 6 hours, I am giving up :expressionless:

We have an nginx+php-fpm+mysql setup on a LAN with almost 100 WordPress
sites (created and used by different designers/developers, all working
on test WordPress setups).

We have been using nginx without any issues for a long time.

Today, all of a sudden, nginx started returning “504 Gateway Time-out”
out of the blue…

I checked the nginx error log for a virtual host…

2010/09/06 21:24:24 [error] 12909#0: *349 upstream timed out (110:
Connection timed out) while reading response header from upstream,
client: 192.168.0.1, server: rahul286.rtcamp.info, request: "GET
/favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host:
"rahul286.rtcamp.info"
2010/09/06 21:25:11 [error] 12909#0: *349 recv() failed (104: Connection
reset by peer) while reading response header from upstream, client:
192.168.0.1, server: rahul286.rtcamp.info, request: "GET /favicon.ico
HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host:
"rahul286.rtcamp.info"
2010/09/06 21:25:11 [error] 12909#0: *443 recv() failed (104: Connection
reset by peer) while reading response header from upstream, client:
192.168.0.1, server: rahul286.rtcamp.info, request: "GET /info.php
HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host:
"rahul286.rtcamp.info"
2010/09/06 21:25:12 [error] 12909#0: *443 connect() failed (111:
Connection refused) while connecting to upstream, client: 192.168.0.1,
server: rahul286.rtcamp.info, request: "GET /favicon.ico HTTP/1.1",
upstream: "fastcgi://127.0.0.1:9000", host: "rahul286.rtcamp.info"
2010/09/06 22:08:32 [error] 12909#0: *1025 upstream timed out (110:
Connection timed out) while reading response header from upstream,
client: 192.168.0.1, server: rahul286.rtcamp.info, request: "GET /
HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host:
"rahul286.rtcamp.info"
2010/09/06 22:09:33 [error] 12909#0: *1025 upstream timed out (110:
Connection timed out) while reading response header from upstream,
client: 192.168.0.1, server: rahul286.rtcamp.info, request: "GET
/favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host:
"rahul286.rtcamp.info"
2010/09/06 22:09:40 [error] 12909#0: *1064 recv() failed (104:
Connection reset by peer) while reading response header from upstream,
client: 192.168.0.1, server: rahul286.rtcamp.info, request: "GET
/info.php HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host:
"rahul286.rtcamp.info"
2010/09/06 22:09:40 [error] 12909#0: *1064 connect() failed (111:
Connection refused) while connecting to upstream, client: 192.168.0.1,
server: rahul286.rtcamp.info, request: "GET /favicon.ico HTTP/1.1",
upstream: "fastcgi://127.0.0.1:9000", host: "rahul286.rtcamp.info"
2010/09/06 22:24:44 [error] 12909#0: *1313 upstream timed out (110:
Connection timed out) while reading response header from upstream,
client: 192.168.0.1, server: rahul286.rtcamp.info, request: "GET /
HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host:
"rahul286.rtcamp.info"
2010/09/06 22:24:53 [error] 12909#0: *1313 recv() failed (104:
Connection reset by peer) while reading response header from upstream,
client: 192.168.0.1, server: rahul286.rtcamp.info, request: "GET
/favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host:
"rahul286.rtcamp.info"

As I run php-fpm on port 9000 in TCP mode, I ran “netstat | grep 9000”
and noticed something unusual…

tcp     1321      0 localhost:9000          localhost:46643
CLOSE_WAIT  -
tcp        9      0 localhost:9000          localhost:36087
CLOSE_WAIT  14272/php5-fpm
tcp     1257      0 localhost:9000          localhost:46664
CLOSE_WAIT  -
tcp        0      0 localhost:46666         localhost:9000
ESTABLISHED 12909/nginx: worker
tcp        9      0 localhost:9000          localhost:36100
CLOSE_WAIT  14276/php5-fpm
tcp     1321      0 localhost:9000          localhost:46629
CLOSE_WAIT  -
tcp        9      0 localhost:9000          localhost:36106
CLOSE_WAIT  14275/php5-fpm
tcp     1257      0 localhost:9000          localhost:46656
CLOSE_WAIT  -
tcp        9      0 localhost:9000          localhost:36091
CLOSE_WAIT  14277/php5-fpm
tcp     1321      0 localhost:9000          localhost:46625
CLOSE_WAIT  -
tcp     1257      0 localhost:9000          localhost:46658
CLOSE_WAIT  -
tcp     1321      0 localhost:9000          localhost:46641
CLOSE_WAIT  -
tcp        0      0 localhost:46682         localhost:9000
ESTABLISHED 12909/nginx: worker
tcp     1321      0 localhost:9000          localhost:46633
CLOSE_WAIT  -
tcp     1257      0 localhost:9000          localhost:36133
CLOSE_WAIT  -
tcp     1336      0 localhost:9000          localhost:46676
ESTABLISHED -
tcp     1257      0 localhost:9000          localhost:36131
CLOSE_WAIT  -
tcp     1257      0 localhost:9000          localhost:46660
CLOSE_WAIT  -
tcp     1249      0 localhost:9000          localhost:36117
CLOSE_WAIT  -
tcp        9      0 localhost:9000          localhost:36094
CLOSE_WAIT  14269/php5-fpm
tcp        0      0 localhost:46664         localhost:9000
FIN_WAIT2   -
tcp     1257      0 localhost:9000          localhost:36135
CLOSE_WAIT  -
tcp     1257      0 localhost:9000          localhost:36125
CLOSE_WAIT  -
tcp        9      0 localhost:9000          localhost:36102
CLOSE_WAIT  14268/php5-fpm
tcp        0      0 localhost:46662         localhost:9000
FIN_WAIT2   -
tcp      745      0 localhost:9000          localhost:46644
CLOSE_WAIT  -
tcp        0      0 localhost:46658         localhost:9000
FIN_WAIT2   -
tcp     1265      0 localhost:9000          localhost:46607
CLOSE_WAIT  -
tcp        0      0 localhost:46672         localhost:9000
ESTABLISHED 12909/nginx: worker
tcp     1257      0 localhost:9000          localhost:36119
CLOSE_WAIT  -
tcp     1265      0 localhost:9000          localhost:46613
CLOSE_WAIT  -
tcp        0      0 localhost:46646         localhost:9000
FIN_WAIT2   -
tcp     1257      0 localhost:9000          localhost:36137
CLOSE_WAIT  -
tcp        0      0 localhost:46670         localhost:9000
ESTABLISHED 12909/nginx: worker
tcp     1265      0 localhost:9000          localhost:46619
CLOSE_WAIT  -
tcp     1336      0 localhost:9000          localhost:46668
ESTABLISHED -
tcp        0      0 localhost:46648         localhost:9000
FIN_WAIT2   -
tcp     1336      0 localhost:9000          localhost:46670
ESTABLISHED -
tcp        9      0 localhost:9000          localhost:36108
CLOSE_WAIT  14274/php5-fpm
tcp     1336      0 localhost:9000          localhost:46684
ESTABLISHED -
tcp        0      0 localhost:46674         localhost:9000
ESTABLISHED 12909/nginx: worker
tcp     1336      0 localhost:9000          localhost:46666
ESTABLISHED -
tcp     1257      0 localhost:9000          localhost:46648
CLOSE_WAIT  -
tcp     1336      0 localhost:9000          localhost:46678
ESTABLISHED -
tcp        0      0 localhost:46668         localhost:9000
ESTABLISHED 12909/nginx: worker
tcp     1321      0 localhost:9000          localhost:46635
CLOSE_WAIT  -
tcp        0      0 localhost:46676         localhost:9000
ESTABLISHED 12909/nginx: worker
tcp     1336      0 localhost:9000          localhost:46682
ESTABLISHED -
tcp     1257      0 localhost:9000          localhost:36127
CLOSE_WAIT  -
tcp     1265      0 localhost:9000          localhost:46617
CLOSE_WAIT  -
tcp        0      0 localhost:46684         localhost:9000
ESTABLISHED -
tcp     1265      0 localhost:9000          localhost:46621
CLOSE_WAIT  -
tcp     1265      0 localhost:9000          localhost:46623
CLOSE_WAIT  -
tcp     1257      0 localhost:9000          localhost:46652
CLOSE_WAIT  -
tcp     1321      0 localhost:9000          localhost:46627
CLOSE_WAIT  -
tcp     1257      0 localhost:9000          localhost:46654
CLOSE_WAIT  -
tcp     1265      0 localhost:9000          localhost:46611
CLOSE_WAIT  -
tcp        0      0 localhost:46654         localhost:9000
FIN_WAIT2   -
tcp        0      0 localhost:46644         localhost:9000
FIN_WAIT2   -
tcp        0      0 localhost:46678         localhost:9000
ESTABLISHED 12909/nginx: worker
tcp     1321      0 localhost:9000          localhost:46637
CLOSE_WAIT  -
tcp     1257      0 localhost:9000          localhost:36121
CLOSE_WAIT  -
tcp     1265      0 localhost:9000          localhost:46605
CLOSE_WAIT  -
tcp     1265      0 localhost:9000          localhost:46609
CLOSE_WAIT  -
tcp     1336      0 localhost:9000          localhost:46680
ESTABLISHED -
tcp     1257      0 localhost:9000          localhost:46662
CLOSE_WAIT  -
tcp     1321      0 localhost:9000          localhost:46631
CLOSE_WAIT  -
tcp        0      0 localhost:46652         localhost:9000
FIN_WAIT2   -
tcp        0      0 localhost:46660         localhost:9000
FIN_WAIT2   -
tcp     1249      0 localhost:9000          localhost:36114
CLOSE_WAIT  -
tcp     1336      0 localhost:9000          localhost:46672
ESTABLISHED -
tcp     1257      0 localhost:9000          localhost:36123
CLOSE_WAIT  -
tcp        9      0 localhost:9000          localhost:36113
CLOSE_WAIT  14273/php5-fpm
tcp     1321      0 localhost:9000          localhost:46639
CLOSE_WAIT  -
tcp     1265      0 localhost:9000          localhost:46615
CLOSE_WAIT  -
tcp     1336      0 localhost:9000          localhost:46674
ESTABLISHED -
tcp        9      0 localhost:9000          localhost:36085
CLOSE_WAIT  14270/php5-fpm
tcp        0      0 localhost:46656         localhost:9000
FIN_WAIT2   -
tcp        0      0 localhost:46650         localhost:9000
FIN_WAIT2   -
tcp        9      0 localhost:9000          localhost:36097
CLOSE_WAIT  14271/php5-fpm
tcp     1257      0 localhost:9000          localhost:36129
CLOSE_WAIT  -
tcp     1257      0 localhost:9000          localhost:46646
CLOSE_WAIT  -
tcp     1257      0 localhost:9000          localhost:46650
CLOSE_WAIT  -
tcp        0      0 localhost:46680         localhost:9000
ESTABLISHED 12909/nginx: worker

There are plenty of “CLOSE_WAIT” sockets, and pairs like the one below:

tcp     1337      0 localhost:9000          localhost:46680
CLOSE_WAIT  -
tcp        0      0 localhost:46680         localhost:9000
FIN_WAIT2   -

Please note port 46680 in the pair above.

I enabled the MySQL slow query log, but that didn’t help.

As of now, restarting php5-fpm every minute via a cronjob (see the entry
below) keeps everything running “smoothly”, but I hate patchwork and
want to solve this properly…

* * * * * service php5-fpm restart > /dev/null

I searched extensively on Google but got no help.
As mentioned, this is a test server on a LAN; CPU load never crosses 0.10
and memory usage stays below 25% (the system has 2GB RAM and runs
Ubuntu Server).
So if you find it time-consuming to help me out, please at least drop a
hint.

Thanks in advance for your help.
-Rahul

Can anyone please tell me if I should check nginx OR php-fpm for the
issues?

-Rahul :expressionless:

–
Rahul B. | Founder & CEO | rtCamp Solutions Pvt. Ltd.
Mobile: +91-9860501882 | Web: http://rtcamp.com/

This is certainly a php-fpm issue:

Thanks Igor.
Do you think that if I use plain php-fastcgi it will solve the problem?

On Tue, Sep 07, 2010 at 11:58:21AM +0530, Rahul B. wrote:

Can anyone please tell me if I should check nginx OR php-fpm for the issues?

This is certainly a php-fpm issue:

2010/09/06 21:25:11 [error] 12909#0: *349 recv() failed (104: Connection
reset by peer) while reading response header from upstream, client:
192.168.0.1, server: rahul286.rtcamp.info, request: “GET /favicon.ico
HTTP/1.1”, upstream: “fastcgi://127.0.0.1:9000”, host:
“rahul286.rtcamp.info”

–
Igor S.
http://sysoev.ru/en/

After debugging for 6 hours, I am giving up :expressionless:

First of all, since this is clearly a php-fpm issue, you should have
checked (enabled) the php-fpm logs.

In your php-fpm.conf there should be lines:

error_log = /some/path/error.log
log_level = notice

and for debugging/performance purposes it's good to add:

request_terminate_timeout = 60s
request_slowlog_timeout = 20s
slowlog = /some/path/slow.log

This way php-fpm will log a backtrace (print the code parts in the slow
log) for requests that take longer than 20 seconds (adjust to your own
needs) and kill those taking longer than a minute.

It should pretty much give you an idea of what is happening and/or what
is slowing PHP down to the point where it hangs (the typical case is
usually some code involving sockets or DB backends).

rr

On Tue, Sep 07, 2010 at 12:10:57PM +0530, Rahul B. wrote:

This is certainly a php-fpm issue:

Thanks Igor.
Do you think that if I use plain php-fastcgi it will solve the problem?

I do not know, try.

–
Igor S.
http://sysoev.ru/en/

Hi Reinis,

Thanks for your reply.

Debugging the php-fpm log was the first thing I did.

I am already using “log_level = debug” and “request_slowlog_timeout = 2s”.
The only thing missing was “request_terminate_timeout”, which I am now
setting to 10s.

I was able to catch slow queries earlier, and they didn't look like they
were causing issues.

I think the problem is mainly due to some open (hung) sockets/processes
like the ones below:
tcp 1337 0 localhost:9000 localhost:46680 CLOSE_WAIT

tcp 0 0 localhost:46680 localhost:9000 FIN_WAIT2 -

FIN_WAIT2 => Connection is closed, and the socket is waiting for a
shutdown from the remote end.
CLOSE_WAIT => The remote end has shut down, waiting for the socket to
close.
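
For reference, a quick way to count how many sockets sit in each state
on port 9000 (a rough sketch assuming the net-tools netstat; ss -tan
works similarly):

netstat -tn | grep ':9000' | awk '{print $6}' | sort | uniq -c | sort -rn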

With my pathetic patchwork of restarting php5-fpm every minute, the
above kind of “blockage” is not getting created.

On a side note, as I have hundreds of virtual hosts, is there any way I
can temporarily configure nginx to dump all error logs in one place so
that I can analyze them easily? As of now, each virtual host has its
log files in its own directory.
I cannot rule out some PHP code going wrong, as this issue started
right in the middle of working hours and it's really hard to find out
exactly which virtual host might have some bad code.

By the way, using the netstat output, can I determine which domain a
given port was assigned to?

Thanks for your inputs. I really appreciate them, as this is the
toughest time I have had in my career.

-Rahul

First of all, I just noticed this in your initial mail:

2010/09/06 22:09:40 [error] 12909#0: *1064 connect() failed (111:
Connection
refused) while connecting to upstream, client: 192.168.0.1, server:
rahul286.rtcamp.info, request: “GET /favicon.ico HTTP/1.1”, upstream:
“fastcgi://127.0.0.1:9000”, host: “rahul286.rtcamp.info”

Have you by any chance misconfigured nginx so that it also passes
static files (in this case favicon.ico) to the PHP backend?
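
For reference, a minimal sketch of a location block that keeps such
static files away from the PHP backend (the docroot and extension list
are only illustrative):

location ~* \.(ico|css|js|gif|jpe?g|png)$ {
    # serve static assets straight from disk instead of passing them to fastcgi
    root /var/www/example;   # illustrative docroot
    expires 30d;
    try_files $uri =404;
}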

Debugging the php-fpm log was the first thing I did.

And nothing in there?

I am using “log_level = debug”

Debug will probably just end up as too much information anyway; in
general you just need to catch something like this to pinpoint the slow
scripts (for example):

Sep 07 18:51:01.536661 [NOTICE] [pool www] child 17460 exited with code 0 after 5875.977592 seconds from start
Sep 07 18:51:01.537240 [NOTICE] [pool www] child 18471 started
Sep 07 19:12:26.791721 [WARNING] [pool www] child 18409, script ‘/data/web/root/index.php’ executing too slow (5.103076 sec), logging
Sep 07 19:12:26.791790 [NOTICE] child 18409 stopped for tracing
Sep 07 19:12:26.791799 [NOTICE] about to trace 18409
Sep 07 19:12:26.791932 [ERROR] ptrace(PEEKDATA) failed: Input/output error (5)
Sep 07 19:12:26.791980 [NOTICE] finished trace of 18409

With my pathetic patchwork of restarting php5-fpm every minute, the
above kind of “blockage” is not getting created.

This is the job of FPM itself; no external tools/jobs should be needed.

You could try (if not already) the php-fpm adaptive spawning mechanism;
then, in case of an overload situation (too few free PHP children), you
would see something like this in the logs:

Sep 05 23:31:44.601141 [WARNING] [pool www] seems busy (you may need to increase start_servers, or min/max_spare_servers), spawning 16 children, there are 4 idle, and 50 total children
Sep 05 23:31:45.605144 [WARNING] [pool www] seems busy (you may need to increase start_servers, or min/max_spare_servers), spawning 32 children, there are 3 idle, and 51 total children
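
A minimal pool sketch of the dynamic spawning settings those warnings
refer to; the numbers are only illustrative and need tuning to your own
load:

pm = dynamic
pm.max_children = 50       ; hard upper limit on children
pm.start_servers = 10      ; children created at startup
pm.min_spare_servers = 5   ; spawn more when idle children drop below this
pm.max_spare_servers = 15  ; kill children when idle count exceeds this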

rr

Rahul,

I have the same problem, and I think I probably know what's causing it.
The bad thing is that I got it while trying to solve another one, and I
don't know how to fix both :slight_smile:

So, until today, my server sometimes issued a 504 error. The error log
reported an “upstream timed out (110: Connection timed out) while reading
response header from upstream” error. Trying to fix it, I played with
various PHP-FPM timeouts. So, I increased pm.max_children from 5 to 10
and set request_terminate_timeout to 30s. Immediately after these
changes had been made, the 504 error was gone, but now I receive 502 with
the same diagnostics as yours: “Connection reset by peer”.

Do you have request_terminate_timeout defined in your php-fpm.conf?

minaev Wrote:

Do you have request_terminate_timeout defined in
your php-fpm.conf?

No. I hope this will solve the problem.
Thanks. :slight_smile:

2010/9/8 minaev [email protected]:

and set request_terminate_timeout to 30s. Immediately after these
changes had been made, 504 error was gone, but now I receive 502 with
the same diagnostics as yours: “Connection reset by peer”.

Do you have request_terminate_timeout defined in your php-fpm.conf?

you have too many connections and FPM is not able to handle
everything. You can play with backlog, max_client and timeouts.

you have too many connections and FPM is not able
to handle
everything. You can play with backlog, max_client
and timeouts.

Thank you. I have already set listen.backlog to 4096. While tweaking the
timeouts, I got into the situation I described above. And, sorry, what
does max_client stand for?

@minaev & Jerome

How does listen.backlog work?

My issue is solved, but I would prefer to have some extra knowledge.
:wink:

Thanks All,
-Rahul :slight_smile:

Thanks Reinis.

request_terminate_timeout = 30s worked for me.

I also started using the adaptive spawning mechanism.

What I am wondering about is:
-I have the same config without “request_terminate_timeout = 60s” and it
handles around 20K page views daily without any issues.
-The issue appeared all of a sudden; just 2 days back a test site with a
200MB database was created, and it has been running smoothly alongside
100 other sites without any problem.
-PHP-FPM debug really produced very little info, which was not helpful
at all.

On a side note, why does the PHP error show up in the nginx error.log
and not in php-fpm's log?

–
Rahul B. | Founder & CEO | rtCamp Solutions Pvt. Ltd.
Mobile: +91-9860501882 | Web: http://rtcamp.com/

Rahul,

Backlog is explained in man listen(2):

The backlog argument defines the maximum length to which the queue of
pending connections for sockfd may grow. If a connection request arrives
when the queue is full, the client may receive an error with an
indication of ECONNREFUSED or, if the underlying protocol supports
retransmission, the request may be ignored so that a later reattempt at
connection succeeds.
…
If the backlog argument is greater than the value in
/proc/sys/net/core/somaxconn, then it is silently truncated to that
value; the default value in this file is 128. In kernels before 2.4.25,
this limit was a hard coded value, SOMAXCONN, with the value 128.
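
In practice that means something like the following (a sketch with
illustrative values: check the kernel cap from a shell, raise it if
needed, and keep the pool's listen.backlog in line with it):

cat /proc/sys/net/core/somaxconn    # backlog values above this are silently truncated
sysctl -w net.core.somaxconn=4096   # raise the cap (as root; add to /etc/sysctl.conf to persist)

; and in the php-fpm pool config:
listen.backlog = 4096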

What I am wondering about is:
-I have the same config without “request_terminate_timeout = 60s” and it
handles around 20K page views daily without any issues.

That's probably just the average limit for you, after which all the PHP
children are stuck (even one free child can still handle quite a lot)…
In those moments you can try to attach to any of the php processes with
‘gdb’ or ‘strace’ and look at what they are doing (or stuck in).

But it's not nginx related anymore :wink:
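
For example, a rough sketch, where <pid> is one of the stuck php5-fpm
children taken from ps aux | grep php5-fpm:

strace -p <pid> -s 200           # shows the syscall the child is blocked in
gdb -p <pid> -batch -ex "bt"     # prints a C-level backtrace and detaches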

-The issue appeared all of a sudden; just 2 days back a test site with a
200MB database was created, and it has been running smoothly alongside
100 other sites without any problem.

Issues don't appear out of nowhere - usually it's changes in the PHP
code or some backend (the DB, for example) reaching a limit where it
can't handle all the incoming requests anymore (threads pile up etc etc)
:wink:

rr

On Wed, Sep 8, 2010 at 4:02 AM, Rahul B. [email protected]
wrote:

-PHP-FPM debug really produced very little info, which was not helpful at all.
On a side note, why does the PHP error show up in the nginx error.log and not in php-fpm's log?

502 is when nginx's connection to the upstream is broken (it was
forcibly killed by request_terminate_timeout, etc.) - it's a bad
gateway.

504 is when nginx is waiting around and hits fastcgi_read_timeout or
other upstream-related settings - it's a gateway timeout.

This isn't a “php error” per se. nginx is logging the proxy-related
issue; php-fpm just isn't logging “backlog too full” type messages
(which it may or may not be able to do - no clue about that).

I've found the newer php-fpm to have plenty of error log output to
play with. This may be a lower-level, kernel-related thing that php-fpm
is not aware of, because the requests have been queued up and fill
up/time out/drop before php-fpm even knows about them. (Just a hunch.)
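
For reference, a sketch of the nginx fastcgi timeouts involved on the
504 path (the directives are standard nginx ones, the values are only
illustrative):

location ~ \.php$ {
    fastcgi_pass 127.0.0.1:9000;
    include fastcgi_params;
    fastcgi_connect_timeout 60s;   # time allowed to establish the connection to php-fpm
    fastcgi_send_timeout 180s;     # max time between two successive writes to php-fpm
    fastcgi_read_timeout 180s;     # max time between two successive reads; exceeding it gives a 504
}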

The problem in php-fpm.log:

[14-Apr-2011 22:55:34] WARNING: [pool www] seems busy (you may need to increase start_servers, or min/max_spare_servers), spawning 8 children, there are 0 idle, and 5 total children
[14-Apr-2011 22:55:35] WARNING: [pool www] seems busy (you may need to increase start_servers, or min/max_spare_servers), spawning 16 children, there are 0 idle, and 6 total children
[14-Apr-2011 22:55:36] WARNING: [pool www] seems busy (you may need to increase start_servers, or min/max_spare_servers), spawning 32 children, there are 0 idle, and 7 total children
[14-Apr-2011 22:55:38] WARNING: [pool www] seems busy (you may need to increase start_servers, or min/max_spare_servers), spa

I use CentOS, nginx, and PHP 5.3.6:

This program makes use of the Zend Scripting Language Engine:
Zend Engine v2.3.0, Copyright (c) 1998-2011 Zend Technologies
    with the ionCube PHP Loader v4.0.7, Copyright (c) 2002-2011, by ionCube Ltd., and
    with XCache v1.3.1, Copyright (c) 2005-2010, by mOo
    with Suhosin v0.9.32.1, Copyright (c) 2007-2010, by SektionEins GmbH

How do I get more idle children?

thanks ppl

2010/9/8 rahul286 [email protected]:

@minaev & Jerome

How does listen.backlog work?

man 2 listen