Strange nginx issue

We are facing a strange issue on our servers. We have servers with 1 GB of
RAM running several Drupal sites.

Generally all of the sites load fine, but sometimes we are unable to access
any of them. After waiting for about 10 minutes we get a 502 gateway timeout
error. In the meantime, if we restart either nginx or php5-fpm, the sites
load again.

Our configurations are as follows:

/etc/nginx/nginx.conf:

user www-data;
worker_processes 1;
pid /run/nginx.pid;
worker_rlimit_nofile 400000;

events {
    worker_connections 10000;
    multi_accept on;
    use epoll;
}

http {
    access_log off;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 2;
    types_hash_max_size 2048;
    server_tokens off;
    keepalive_requests 100000;
    reset_timedout_connection on;
    port_in_redirect off;
    client_max_body_size 10m;
    proxy_connect_timeout 600s;
    proxy_send_timeout 600s;
    proxy_read_timeout 600s;
    fastcgi_send_timeout 600s;
    fastcgi_read_timeout 600s;
    open_file_cache max=200000 inactive=20s;
    open_file_cache_valid 30s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;
}
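The per-site server blocks are not included above; each Drupal site hands
PHP off to php5-fpm with a fairly standard location block along these lines
(the socket path here is only an example, not necessarily what we use):

    location ~ \.php$ {
        fastcgi_pass unix:/var/run/php5-fpm.sock;   # or 127.0.0.1:9000
        fastcgi_index index.php;
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    }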

/etc/php5/fpm/pool.d/www.conf:

pm.max_children = 5
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 3
;pm.process_idle_timeout = 10s;
;pm.max_requests = 200
request_terminate_timeout = 300s
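For reference, php5-fpm can also expose a status page and a slow log, which
should show whether all 5 children are busy when the 502s start; something
along these lines could be added to the pool (the path and URL are only
examples, they are not currently in our config):

    ; example only - not in our current pool config
    pm.status_path = /fpm-status
    slowlog = /var/log/php5-fpm/www-slow.log
    request_slowlog_timeout = 10s

The status URL then needs a small nginx location that fastcgi_passes it to
the same pool.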

And here is what we added to /etc/sysctl.conf:

##########################
fs.file-max = 150000
net.core.netdev_max_backlog=32768
net.core.optmem_max=20480
#net.core.rmem_default=65536
#net.core.rmem_max=16777216
net.core.somaxconn=50000
#net.core.wmem_default=65536
#net.core.wmem_max=16777216
net.ipv4.tcp_fin_timeout=120
#net.ipv4.tcp_keepalive_intvl=30
#net.ipv4.tcp_keepalive_probes=3
#net.ipv4.tcp_keepalive_time=120
net.ipv4.tcp_max_orphans=262144
net.ipv4.tcp_max_syn_backlog=524288
net.ipv4.tcp_max_tw_buckets=524288
#net.ipv4.tcp_mem=1048576 1048576 2097152
#net.ipv4.tcp_no_metrics_save=1
net.ipv4.tcp_orphan_retries=0
#net.ipv4.tcp_rmem=4096 16384 16777216
#net.ipv4.tcp_synack_retries=2
net.ipv4.tcp_syncookies=1
#net.ipv4.tcp_syn_retries=2
#net.ipv4.tcp_wmem=4096 32768 16777216
##########################
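The file is reloaded with sysctl -p after editing, for example:

    sysctl -p /etc/sysctl.conf
    sysctl fs.file-max net.core.somaxconn    # spot-check a couple of values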

Can anyone please help us with this?

Thanks

Geo

I’ve just set up a Drupal 7 site under nginx + php-fpm on Debian.

One thing I noticed was that the PHP processes weren’t closing fast enough;
this was tracked down to an issue with MySQL. Connections were sitting idle
for a long time, which basically exhausted the FPM workers on both of the
web servers.

We have two MySQL nodes replicating between them, and I think the binlog
commit was holding this up; adding “innodb_flush_log_at_trx_commit = 0” to
my.cnf stopped the problem from occurring.

Page caching is stored in MySQL too; moving it to memcached helped massively
and reduced the daily binlogs from 5 GB a day down to a few hundred
megabytes.
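If it helps, on Drupal 7 the switch is only a few lines in settings.php once
the memcache module is installed (the module path and server address below
are examples, adjust to taste):

    $conf['cache_backends'][] = 'sites/all/modules/memcache/memcache.inc';
    $conf['cache_default_class'] = 'MemCacheDrupal';
    $conf['memcache_servers'] = array('127.0.0.1:11211' => 'default');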

I’m not sure if this is a strange setup, but we have nginx terminating SSL,
which proxies to Varnish, which in turn is backed by two additional nginx
nodes serving Drupal 7; these use a two-node MySQL cluster and two memcached
nodes for caching pages etc.
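Roughly speaking, the front nginx only terminates SSL and proxies everything
to Varnish, i.e. something like this (addresses, ports and certificate paths
here are placeholders rather than our real config):

    server {
        listen 443 ssl;
        ssl_certificate     /etc/ssl/certs/example.crt;
        ssl_certificate_key /etc/ssl/private/example.key;

        location / {
            proxy_pass http://127.0.0.1:6081;   # varnish
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto https;
        }
    }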

Steve.


Thanks Steve for your update. We are using a separate MySQL server, and on
it innodb_flush_log_at_trx_commit = 1. This site runs money-transaction
applications, so is it safe to change this option?

Also, other servers with the default nginx and php5-fpm configuration
connect to this same MySQL server and they don’t have such issues. We made
the changes above for optimization, and it looks like they are what is
causing this issue.

So can you please help me?

On 07/04/2014 16:45, Steve W. wrote:

I guess that link was 4.1-specific. I re-read
http://dev.mysql.com/doc/refman/5.5/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit
and ended up changing the option back to 1, but also added the
sync_binlog=1 option.
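In my.cnf terms that just means:

    [mysqld]
    innodb_flush_log_at_trx_commit = 1
    sync_binlog = 1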

So far I’ve not seen any sql connections sitting doing nothing.

It might be worth running “show processlist” in MySQL when the problem
occurs to confirm whether this is actually where the problem lies.
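For example, something like the following; long-running ‘Sleep’ entries are
the thing to look for:

    SHOW FULL PROCESSLIST;

    -- or, to filter out the healthy connections (MySQL 5.1+):
    SELECT id, user, host, db, command, time, state
      FROM information_schema.processlist
     WHERE command = 'Sleep' AND time > 60;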

Steve

A quick read of
http://dev.mysql.com/doc/refman/4.1/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit
suggests there’s a possibility of losing one second’s worth of data. I’m not
sure if we’d still have a problem with this now that we’ve moved page
caching to memcache, as that was causing a lot of updates.

Unfortunately I’m at work, so I can’t easily investigate other variables at
the moment; I’ll hopefully have time this evening though.

Steve.



Steve W.
IT Team Leader - Pirate Party UK
OpenDNS 2012 Sysadmin Awards: Flying Solo - Winner
+44.7751874508

Pirate Party UK is a political party registered with the Electoral
Commission.
