Forum: NGINX - Quick performance deterioration when number of clients increases

Nikolaos Milas (Guest)
on 2013-10-11 10:08
(Received via mailing list)
Attachment: load_impact_1.png (60 KB)
Hello,

I am trying to migrate a Joomla 2.5.8 website from Apache to NGINX 1.4.2
with php-fpm 5.3.3 (and MySQL 5.5.34) on a CentOS 6.4 x86_64 Virtual
Machine (running under KVM).

The goal is to achieve better peak performance: this site has occasional
high peaks; while normal traffic is ~10 req/sec, it may reach > 3000
req/sec for periods of a few hours (due to the type of services the site
provides - it is a non-profit, real-time seismicity-related site - so
PHP caching should not be longer than 10 seconds).

The new VM (using Nginx) currently is in testing mode and it only has
1-core CPU / 3 GB of RAM. We tested performance with loadimpact and the
results are attached.

You can see in the load graph that as the load approaches 250 clients,
the response time increases sharply and becomes unacceptable (this
happens consistently). I expected better performance, especially since
caching is enabled. Despite many efforts, I cannot find the cause of the
bottleneck or how to deal with it. We would like to achieve better
scaling, especially since NGINX is famous for its scaling capabilities.
Having very little experience with nginx, I would like to ask for your
assistance with a better configuration.

When this performance deterioration occurs, we don't see very high CPU
load (Unix load peaks at 2.5), nor RAM exhaustion (system RAM usage
appears to be below 30%). [Monitoring is through Nagios.]

Can you please guide me on how to correct this issue? Any and all
suggestions will be appreciated.

The current configuration, based on info available on the Internet, is
as follows (the true domain/website name and public IP address(es) have
been replaced):

=================== nginx.conf ===================

user  nginx;
worker_processes  1;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

worker_rlimit_nofile 200000;

events {
     worker_connections 8192;
     multi_accept on;
     use epoll;
}

http {
     include       /etc/nginx/mime.types;
     default_type  application/octet-stream;
     server_names_hash_bucket_size 64;

     log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                       '$status $body_bytes_sent "$http_referer" '
                       '"$http_user_agent" "$http_x_forwarded_for"';

     log_format cache  '$remote_addr - $remote_user [$time_local] "$request" '
                       '$status $upstream_cache_status $body_bytes_sent "$http_referer" '
                       '"$http_user_agent" "$http_x_forwarded_for"';

     fastcgi_cache_path /var/cache/nginx levels=1:2 keys_zone=microcache:5m max_size=1000m;

     access_log  /var/log/nginx/access.log  main;

     sendfile           on;

     tcp_nopush         on;
     tcp_nodelay        on;
     keepalive_timeout  2;

     types_hash_max_size 2048;
     server_tokens off;

     keepalive_requests 30;

     open_file_cache max=5000 inactive=20s;
     open_file_cache_valid 30s;
     open_file_cache_min_uses 2;
     open_file_cache_errors on;

     gzip on;
     gzip_static on;
     gzip_disable "msie6";
     gzip_http_version 1.1;
     gzip_vary on;
     gzip_comp_level 6;
     gzip_proxied any;
     gzip_types text/plain text/css application/json application/x-javascript
                text/xml application/xml application/xml+rss text/javascript
                application/javascript text/x-js;
     gzip_buffers 16 8k;

     include /etc/nginx/conf.d/*.conf;
}

==================================================

================ website config ==================

server {
     listen       80;
     server_name  www.example.com;
     access_log  /var/webs/wwwexample/log/access_log main;
     error_log /var/webs/wwwexample/log/error_log warn;
     root   /var/webs/wwwexample/www/;

     index  index.php index.html index.htm index.cgi default.html default.htm default.php;
     location / {
         try_files $uri $uri/ /index.php?$args;
     }

     location /nginx_status {
        stub_status on;
        access_log   off;
        allow 10.10.10.0/24;
        deny all;
     }

     location ~* /(images|cache|media|logs|tmp)/.*\.(php|pl|py|jsp|asp|sh|cgi)$ {
         return 403;
         error_page 403 /403_error.html;
     }

     location ~ /\.ht {
         deny  all;
     }

     location /administrator {
         allow 10.10.10.0/24;
         deny all;
     }

     location ~ \.php$ {

         # Set up var defaults
         set $no_cache "";
         # If non-GET/HEAD, don't cache & mark user as uncacheable
         # for 1 second via cookie
         if ($request_method !~ ^(GET|HEAD)$) {
             set $no_cache "1";
         }
         # Drop the no-cache cookie if need be
         # (for some reason, add_header fails if included in the prior if-block)
         if ($no_cache = "1") {
             add_header Set-Cookie "_mcnc=1; Max-Age=2; Path=/";
             add_header X-Microcachable "0";
         }
         # Bypass cache if no-cache cookie is set
         if ($http_cookie ~* "_mcnc") {
             set $no_cache "1";
         }
         # Bypass cache if flag is set
         fastcgi_no_cache $no_cache;
         fastcgi_cache_bypass $no_cache;
         fastcgi_cache microcache;
         fastcgi_cache_key $scheme$host$request_uri$request_method;
         fastcgi_cache_valid 200 301 302 10s;
         fastcgi_cache_use_stale updating error timeout invalid_header http_500;
         fastcgi_pass_header Set-Cookie;
         fastcgi_pass_header Cookie;
         fastcgi_ignore_headers Cache-Control Expires Set-Cookie;

         try_files $uri =404;
         include /etc/nginx/fastcgi_params;
         fastcgi_param PATH_INFO $fastcgi_script_name;
         fastcgi_intercept_errors on;

         fastcgi_buffer_size 128k;
         fastcgi_buffers 256 16k;
         fastcgi_busy_buffers_size 256k;
         fastcgi_temp_file_write_size 256k;
         fastcgi_read_timeout 240;

         fastcgi_pass unix:/tmp/php-fpm.sock;

         fastcgi_index index.php;
         include /etc/nginx/fastcgi_params;
         fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;

     }

     location ~* \.(ico|pdf|flv)$ {
         expires 1d;
     }

     location ~* \.(js|css|png|jpg|jpeg|gif|swf|xml|txt)$ {
         expires 1d;
     }

}
==================================================

================= php-fpm.conf ===================
include=/etc/php-fpm.d/*.conf
[global]
pid = /var/run/php-fpm/php-fpm.pid
error_log = /var/log/php-fpm/error.log

daemonize = no
==================================================

============== php-fpm.d/www.conf ================

[www]
listen = /tmp/php-fpm.sock
listen.allowed_clients = 127.0.0.1
user = nginx
group = nginx

pm = dynamic
pm.max_children = 1024
pm.start_servers = 5
pm.min_spare_servers = 5
pm.max_spare_servers = 35

slowlog = /var/log/php-fpm/www-slow.log

php_flag[display_errors] = off
php_admin_value[error_log] = /var/log/php-fpm/www-error.log
php_admin_flag[log_errors] = on
php_admin_value[memory_limit] = 128M

php_value[session.save_handler] = files
php_value[session.save_path] = /var/lib/php/session

==================================================

================ mysql my.cnf ====================

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
symbolic-links=0
user=mysql

query_cache_limit = 2M
query_cache_size = 200M
query_cache_type=1
thread_cache_size=128
key_buffer = 100M
join_buffer = 2M
table_cache= 150M
sort_buffer= 2M
read_rnd_buffer_size=10M
tmp_table_size=200M
max_heap_table_size=200M

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

==================================================

=============== mysqltuner report ================

  >>  MySQLTuner 1.2.0 - Major Hayden <major@mhtx.net>

-------- General Statistics --------------------------------------------------
[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.5.34
[OK] Operating on 64-bit architecture

-------- Storage Engine Statistics -------------------------------------------
[--] Status: +Archive -BDB -Federated +InnoDB -ISAM -NDBCluster
[--] Data in MyISAM tables: 9M (Tables: 80)
[--] Data in InnoDB tables: 1M (Tables: 65)
[--] Data in PERFORMANCE_SCHEMA tables: 0B (Tables: 17)
[--] Data in MEMORY tables: 0B (Tables: 4)
[!!] Total fragmented tables: 66

-------- Security Recommendations -------------------------------------------
[OK] All database users have passwords assigned

-------- Performance Metrics -------------------------------------------------
[--] Up for: 12h 51m 16s (21K q [0.471 qps], 1K conn, TX: 10M, RX: 1M)
[--] Reads / Writes: 55% / 45%
[--] Total buffers: 694.0M global + 21.4M per thread (151 max threads)
[!!] Maximum possible memory usage: 3.8G (135% of installed RAM)
[OK] Slow queries: 0% (0/21K)
[OK] Highest usage of available connections: 23% (36/151)
[OK] Key buffer size / total MyISAM indexes: 150.0M/5.1M
[OK] Key buffer hit rate: 99.3% (51K cached / 358 reads)
[OK] Query cache efficiency: 80.9% (10K cached / 13K selects)
[OK] Query cache prunes per day: 0
[OK] Sorts requiring temporary tables: 0% (0 temp sorts / 55 sorts)
[OK] Temporary tables created on disk: 8% (5 on disk / 60 total)
[OK] Thread cache hit rate: 98% (36 created / 1K connections)
[OK] Table cache hit rate: 20% (192 open / 937 opened)
[OK] Open file limit used: 0% (210/200K)
[OK] Table locks acquired immediately: 99% (4K immediate / 4K locks)
[!!] Connections aborted: 8%
[OK] InnoDB data size / buffer pool: 1.1M/128.0M

==================================================

Please advise.

Thanks and Regards,
Nick
Steve Holdoway (Guest)
on 2013-10-11 10:19
(Received via mailing list)
The ultimate bottleneck in any setup like this is usually raw cpu
power.  A single virtual core doesn't look like it'll hack it. You've
got 35 php processes serving 250 users, and I think it's just spread a
bit thin.

Apart from adding cores, there are 2 things I'd suggest looking at

  - are you using an opcode cacher? APC ( install via pecl to get the
latest ) works really well with php in fpm... allocate plenty of memory
to it too
  - check the bandwidth at the network interface. The usual 100Mbit
connection can easily get swamped by a graphics rich site - especially
with 250 concurrent users. If this is a problem, then look at using a
CDN to ease things.
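
For the first point, a minimal opcode-cache configuration might look
like the following - the values are illustrative guesses, not something
taken from this thread, and would need tuning for the actual codebase:

    ; /etc/php.d/apc.ini -- illustrative values only
    extension = apc.so
    apc.enabled = 1
    ; shared memory for the opcode cache; size it so the whole Joomla
    ; codebase fits with room to spare
    apc.shm_size = 128M
    ; keep stat() checks on while the code still changes
    apc.stat = 1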

hth,

Steve
Dennis Jacobfeuerborn (Guest)
on 2013-10-11 13:25
(Received via mailing list)
On 11.10.2013 10:18, Steve Holdoway wrote:
> The ultimate bottleneck in any setup like this is usually raw cpu
> power.  A single virtual core doesn't look like it'll hack it. You've
> got 35 php processes serving 250 users, and I think it's just spread a
> bit thin.
>
> Apart from adding cores, there are 2 things I'd suggest looking at
>
>   - are you using an opcode cacher? APC ( install via pecl to get the
> latest ) works really well with php in fpm... allocate plenty of memory
> to it too

APC is sort of deprecated though (at least the opcode cache part) in
favor of zend-opcache which is integrated in php 5.5.

Regards,
  Dennis
Nikolaos Milas (Guest)
on 2013-10-12 15:58
(Received via mailing list)
On 11/10/2013 11:18 AM, Steve Holdoway wrote:

> Apart from adding cores, there are 2 things I'd suggest looking at
>
> - are you using an opcode cacher? APC ( install via pecl to get the
> latest ) works really well with php in fpm... allocate plenty of
> memory to it too
> - check the bandwidth at the network interface. The usual 100Mbit
> connection can easily get swamped by a graphics rich site - especially
> with 250 concurrent users. If this is a problem, then look at using a
> CDN to ease things.
>

Thanks for the hints.

The strange thing is that the Unix load does not seem excessive when
this performance deterioration occurs.

APCu seems to be enabled:

    extension = apcu.so
    apc.enabled=1
    apc.mmap_file_mask=/tmp/apc.XXXXXX

All other params are default.

The network interface is Gigabit and should not be a problem.

We'll add virtual RAM and cores. Any other suggestions?

I wish there were a tool which would benchmark/analyze the box and its
running services and produce suggestions for the whole LEMP stack
config: mysqld, php, php-fpm, apc, nginx! Some magic would help!!

Thanks,
Nick
Toni Mueller (Guest)
on 2013-10-14 16:48
(Received via mailing list)
Hi Nick,

On Sat, Oct 12, 2013 at 04:47:50PM +0300, Nikolaos Milas wrote:
> We'll add virtual RAM and cores. *Any other suggestions? *

did you investigate disk I/O?

I found this to be the limiting factor. If you have shell access and if
it is a Linux machine, you can run 'top', 'dstat' and 'htop' to get an
idea about what is happening. 'dstat' gives you disk I/O and network
I/O.
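
For example, something like this (illustrative invocations, assuming
the dstat and sysstat packages are installed):

    # CPU, disk, network and memory, sampled every 5 seconds
    dstat -cdnm 5
    # extended per-device disk statistics every 5 seconds
    iostat -x 5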


Kind regards,
--Toni++
Nikolaos Milas (Guest)
on 2013-10-16 12:33
(Received via mailing list)
Attachment: graphes-Perfs-rate_tn.png (3 KB)
Attachment: graphes-Errors-rate_tn.png (3 KB)
Attachment: graphes-Perfs-mean_tn.png (3 KB)
On 14/10/2013 5:47 PM, Toni Mueller wrote:

> did you investigate disk I/O?

Hi again,

Thanks for your suggestions (see below on that).

In the meantime, we have increased CPU power to 4 cores and the behavior
of the server is much better.

I found that server performance was hitting a php-fpm bottleneck
because the microcache was NOT being used: most pages were returning
codes 303 and 502, and these return codes are not covered by
fastcgi_cache_valid by default. When I set:

    fastcgi_cache_valid 200 301 302 303 502 3s;

then I saw immediate performance gains, and the Unix load dropped to
almost 0 (from 100 - not a typo) during the load test.
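
(One way to keep an eye on this: the nginx.conf above already defines a
"cache" log format that includes $upstream_cache_status, so pointing an
access_log at it in the PHP location shows HIT/MISS/BYPASS per request.
The log path below is illustrative, not from my actual config:)

    location ~ \.php$ {
        ...
        access_log /var/webs/wwwexample/log/cache_log cache;
        ...
    }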

I used iostat during a load test and I didn't see any serious stress on
I/O. The worst (max load) recorded entry is:

==========================================================================================================
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          85.43    0.00   12.96    0.38    0.00    1.23

Device:  rrqm/s   wrqm/s    r/s      w/s    rsec/s    wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
vda        0.00   136.50   0.00    21.20      0.00   1260.00     59.43      1.15   54.25   3.92   8.30
dm-0       0.00     0.00   0.00   157.50      0.00   1260.00      8.00     13.39   85.04   0.53   8.29
dm-1       0.00     0.00   0.00     0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00
==========================================================================================================

Can you see a serious problem here? (I am not an expert, but, judging
from what I've read on the Internet, it should not be bad.)

Now my problem is that there seems to be a performance limit at around
1200 req/sec (which is not too bad, anyway), although CPU and memory
remain ample throughout the test. Increasing the stress load beyond
that (I am using tsung for load testing) only results in increasing
"error_connect_emfile" errors.

See the results of a test attached. (100 users arrive per second for 5
minutes (with a maximum of 10000 users), each of them hitting the
homepage 100 times. Details of the test are at the bottom of this
mail.)

My research suggested that this should be a result of file descriptor
exhaustion; however, I could not find the root cause. The following
seem OK:

# cat /proc/sys/fs/file-max
592940
# ulimit -n
200000
# ulimit -Hn
200000
# ulimit -Sn
200000
# grep nofile /etc/security/limits.conf
* - nofile 200000
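
(A related check that may be worth adding: /etc/security/limits.conf
only applies to PAM login sessions, not to daemons started from the
init scripts, so the limits the running workers actually inherit can be
inspected directly - pid file paths taken from the configs above:)

# grep "open files" /proc/$(cat /var/run/nginx.pid)/limits
# grep "open files" /proc/$(cat /var/run/php-fpm/php-fpm.pid)/limits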

Could you please guide me on how to resolve this issue? What is the
real bottleneck here, and how can we overcome it?

My config remains as initially posted (it can also be seen here:
https://www.ruby-forum.com/topic/4417776), with one difference:
"worker_processes 4" (since we now have 4 CPU cores).

Please advise.

============================= tsung.xml <start> =============================

<?xml version="1.0"?>
<!DOCTYPE tsung SYSTEM "/usr/share/tsung/tsung-1.0.dtd">

<tsung loglevel="debug" dumptraffic="false" version="1.0">

<clients>
<client host="localhost" use_controller_vm="true" maxusers="10000"/>
</clients>

<servers>
<server host="www.example.com" port="80" type="tcp"></server>
</servers>

<load duration="5" unit="minute">
<arrivalphase phase="1" duration="5" unit="minute">
<users arrivalrate="100" unit="second"/>
</arrivalphase>
</load>

<sessions>
<session probability="100" name="hit_en_homepage" type="ts_http">
<for from="1" to="100" var="i">
<request><http url='/' version='1.1' method='GET'></http></request>
<thinktime random='true' value='1'/>
</for>
</session>
</sessions>

</tsung>

============================== tsung.xml <end> ==============================

Thanks and Regards,
Nick
Nikolaos Milas (Guest)
on 2013-10-16 18:08
(Received via mailing list)
Attachment: compare.png (20 KB)
On 16/10/2013 1:32 PM, Nikolaos Milas wrote:

> Now my problem is that there seems to be a performance limit...
>
> Increasing the stress load beyond that (I am using tsung for load
> testing) only results in increasing "error_connect_emfile" errors.

I have been trying to resolve this behavior, and I increased the file
descriptors to 400,000:

    # ulimit -n
    400000

since:

    # cat /proc/sys/fs/file-max
    592940

Now I am running the following test: X users per second visit the
homepage and each one of them refreshes the page 4 times (at random
intervals).

The test scales OK up to 500 users per second, but then
"error_connect_emfile" errors start again and performance deteriorates.
See the attached comparative chart.

So, I have two questions:

 1. Is there a way we can tweak settings to make the web server scale
    gracefully up to the limit of its resources (without performance
    deteriorating) as load increases? Can we leverage additional RAM
    (the box always uses up to 3.5 GB RAM, regardless of the load, and
    despite the fact that the VM now has 6 GB)?
 2. If not, how can we safeguard the web server by setting a suitable
    limit, so that exceeding it cannot cause performance deterioration?
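
(For question 2, I suppose nginx's limit_req / limit_conn modules could
at least cap what any single client can do - a minimal, untested sketch
with purely illustrative numbers:)

    # http block: per-client request-rate and connection zones
    limit_req_zone  $binary_remote_addr zone=perip:10m rate=20r/s;
    limit_conn_zone $binary_remote_addr zone=peraddr:10m;

    # server (or PHP location) block
    limit_req  zone=perip burst=40;
    limit_conn peraddr 25;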

Please advise.

Thanks and regards,
Nick
Scott Ribe (Guest)
on 2013-10-16 18:10
(Received via mailing list)
On Oct 16, 2013, at 10:07 AM, Nikolaos Milas <nmilas@noa.gr> wrote:

> 2. If not, how can we safeguard the web server by setting a suitable
>   limit, so that exceeding it cannot cause performance deterioration?

Have you considered not having vastly more worker processes than you
have cores? (IIRC, you have configured things that way...)

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice
Nikolaos Milas (Guest)
on 2013-10-16 18:16
(Received via mailing list)
On 16/10/2013 7:10 PM, Scott Ribe wrote:

> Have you considered not having vastly more worker processes than you
> have cores? (IIRC, you have configured things that way...)

I have (4 CPU cores and):

    worker_processes 4;
    worker_rlimit_nofile 400000;

    events {
    worker_connections 8192;
    multi_accept on;
    use epoll;
    }

Any ideas will be appreciated!

Nick
Scott Ribe (Guest)
on 2013-10-16 18:22
(Received via mailing list)
On Oct 16, 2013, at 10:16 AM, Nikolaos Milas <nmilas@noa.gr> wrote:

> I have (4 CPU cores and):
>
>   worker_processes 4;
>   worker_rlimit_nofile 400000;
>
>   events {
>   worker_connections 8192;
>   multi_accept on;
>   use epoll;
>   }

Then I have confused this thread with a different one. Sorry for the
noise.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice
Nikolaos Milas (Guest)
on 2013-10-17 09:51
(Received via mailing list)
On 16/10/2013 7:07 PM, Nikolaos Milas wrote:

> The test scales OK up to 500 users per second, but then
> "error_connect_emfile" errors start again and performance
> deteriorates. See the attached comparative chart.

I resolved the "error_connect_emfile" errors by increasing the file
descriptors on the tsung machine. However, the behavior remains the same
(although no errors occur). I suspect that the problem may not be on the
nginx side but on the tsung box side: the latter may be unable to
generate a higher number of requests and handle the load.
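
(If the generator really is the limit, tsung can also spread the load
over several client machines - the hostnames below are placeholders:)

    <clients>
      <client host="loadgen1" maxusers="10000"/>
      <client host="loadgen2" maxusers="10000"/>
    </clients>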

So, I think this case might be considered "closed" until further testing
confirms findings (or rejects them).

Regards,
Nick
Jan-Philip Gehrcke (Guest)
on 2013-10-19 13:56
(Received via mailing list)
Hi Nikolaos,

just a small follow-up on this. In your initial mail you stated

 > The new VM (using Nginx) currently is in testing mode and it only has
 > 1-core CPU

as well as

 > When this performance deterioration occurs, we don't see very high CPU
 > load (Unix load peaks at 2.5)

These numbers already tell you that your initial tests were CPU bound.
A simple way to describe the situation would be that you had loaded
your system with 2.5 times as much as it was able to handle
"simultaneously". On average, 1.5 processes were in the scheduler's run
queue just "waiting" for a slice of CPU time.

In this configuration, you observed

 > You can see in the load graph that as the load approaches 250 clients,
 > the response time increases sharply and becomes unacceptable

Later on, you wrote

 > In the meantime, we have increased CPU power to 4 cores and the
 > behavior of the server is much better.

and

 > Now my problem is that there seems to be a performance limit at
 > around 1200 req/sec

Do you see that the rate increased by roughly a factor of 4? That is no
coincidence; I think these numbers clarify where the major bottleneck
was in your initial setup.

Also, there was this part of the discussion:

 > On 16/10/2013 7:10 PM, Scott Ribe wrote:
 >
 >> Have you considered not having vastly more worker processes than you
 >> have cores? (IIRC, you have configured things that way...)
 >
 > I have (4 CPU cores and):
 >
 >     worker_processes 4;


Obviously, here you also need to take the PHP-FPM processes, and
possibly other processes involved in your web stack, into account.

Ultimately, what you want at all times is a load average below the
actual number of cores in your machine (N), because you want your
machine to stay responsive, at least to internal events.

If you run more than N processes that can each create a large CPU load,
the load average is easily pushed beyond this limit. Via a high request
rate, your users can then drive your machine to its knees. Not spawning
more than N such worker processes in the first place already helps a
lot in preventing such a user-driven lockup situation.
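
As an illustration only (the numbers are assumptions, not a
recommendation for this particular site), that could translate into
something like:

    # nginx.conf: one worker per core
    worker_processes  4;

    ; php-fpm www.conf: bound the pool instead of allowing up to 1024
    ; children -- keep the number of CPU-hungry workers close to the
    ; core count, since the microcache absorbs most of the traffic
    pm = dynamic
    pm.max_children = 8
    pm.start_servers = 4
    pm.min_spare_servers = 2
    pm.max_spare_servers = 4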

Cheers,

Jan-Philip