Nginx setting up >25.000 concurrent connections per second

dubstep · October 6, 2011, 8:31pm

Hi,

I am preparing a new web environment with high requirements: 100.000
concurrent connections per second (at peak times). Every server will
execute a PHP script through php5-fpm.
I am testing where the limits of nginx are (without any PHP) and how to
set up the machine to optimize it. I will explain my tests and results:

Test:

10 servers, each with 4 CPUs, 4 GB RAM, 16 GB HD.
Local network: 1Gb (datacenter network)

1 server has Debian squeeze with a basic installation (from the netinstall
iso) and nginx from the Debian repositories (0.7.67-3).

I changed only 2 options in the nginx config (I also tested with others):

worker_processes 4;
worker_connections 10240;

I added these lines to /etc/security/limits.conf (and restarted nginx):

www-data soft nproc 100000
www-data soft nofile 100000
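
One thing that may be worth checking at this step is whether the running nginx workers actually picked up the new limit, since limits.conf is applied through PAM sessions and a daemon started from an init script does not necessarily inherit it. Below is a minimal Python sketch, assuming a Linux /proc filesystem. The helper names and the sample text are illustrative only and are not taken from the thread:

```python
from pathlib import Path


def max_open_files(limits_text):
    """Return (soft, hard) 'Max open files' parsed from /proc/<pid>/limits text.

    Each value is an int, or None when the kernel reports 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max open files' line found")


def limits_for_pid(pid):
    """Read the limits of a running process (hypothetical helper, Linux only)."""
    return max_open_files(Path(f"/proc/{pid}/limits").read_text())


# Illustrative sample in the layout Linux uses for /proc/<pid>/limits.
SAMPLE = """\
Limit                     Soft Limit           Hard Limit           Units
Max processes             100000               100000               processes
Max open files            1024                 4096                 files
"""

if __name__ == "__main__":
    print(max_open_files(SAMPLE))
```

Calling limits_for_pid() with the PID of an nginx worker would show whether the 100000 value is in effect for that process.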

and to rule out I/O issues I mounted /var/log/nginx in RAM:

mount -t tmpfs -o nodev,nosuid,noexec,nodiratime,size=2500M none /var/log/nginx/

I created a static file:
echo "HOLA" > /var/www/a.txt

On the other 9 servers, which have the same basic installation, I installed
apache2-utils and set ulimit -n 100000. Then I ran this
command:

ab -n 500000 -c 200 http://192.168.1.11/a.txt

Actually I tested with fewer servers and with more, and with a lot of
different values for the ab tool, but I cannot get better results:

awk '{ print $4 }' /var/log/nginx/localhost.access.log | awk -F: '{ print $2 ":" $3 ":" $4 }' | sort | uniq -c
[…]
22345 19:57:58
21088 19:57:59
19010 19:58:00
20211 19:58:01
22469 19:58:02
23121 19:58:03
22682 19:58:04
23105 19:58:05
24537 19:58:06
22313 19:58:07
22406 19:58:08
22804 19:58:09
23823 19:58:10
22280 19:58:11
24634 19:58:12
22722 19:58:13
22429 19:58:14
24271 19:58:15
20265 19:58:16
20678 19:58:17
23136 19:58:18
22203 19:58:19
22521 19:58:20
24254 19:58:21
23216 19:58:22
22587 19:58:23
18365 19:58:24
22221 19:58:25
22123 19:58:26
24464 19:58:27
[…]
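
The per-second counts above come from grouping access-log lines by their timestamp. The same idea can be sketched in Python. This assumes the default nginx access log format, where the timestamp looks like [06/Oct/2011:19:57:58 +0200]. The sample lines, client addresses and timezone below are made up for illustration:

```python
import re
from collections import Counter

# Captures HH:MM:SS from a timestamp such as [06/Oct/2011:19:57:58 +0200].
# The log format is an assumption; adjust the pattern for a custom log_format.
TIME_RE = re.compile(r"\[\d{2}/\w{3}/\d{4}:(\d{2}:\d{2}:\d{2}) ")


def requests_per_second(lines):
    """Count log lines per HH:MM:SS, like `sort | uniq -c` on the time field."""
    counts = Counter()
    for line in lines:
        match = TIME_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative log lines, not real data from the test.
SAMPLE_LOG = [
    '192.168.1.12 - - [06/Oct/2011:19:57:58 +0200] "GET /a.txt HTTP/1.0" 200 5 "-" "ApacheBench/2.3"',
    '192.168.1.13 - - [06/Oct/2011:19:57:58 +0200] "GET /a.txt HTTP/1.0" 200 5 "-" "ApacheBench/2.3"',
    '192.168.1.12 - - [06/Oct/2011:19:57:59 +0200] "GET /a.txt HTTP/1.0" 200 5 "-" "ApacheBench/2.3"',
]

if __name__ == "__main__":
    for second, count in sorted(requests_per_second(SAMPLE_LOG).items()):
        print(count, second)
```

To run it against the real log, pass an open file object instead of SAMPLE_LOG, since the function only iterates over lines.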

I also tried changing a lot of things in /etc/sysctl.conf (sysctl -p and
restart nginx) but I didn't see better results.

For example:

net.ipv4.tcp_keepalive_time = 300

# Avoid a smurf attack
net.ipv4.icmp_echo_ignore_broadcasts = 1

# Turn on protection for bad icmp error messages
net.ipv4.icmp_ignore_bogus_error_responses = 1

# Turn on syncookies for SYN flood attack protection
net.ipv4.tcp_syncookies = 0

# Turn on and log spoofed, source routed, and redirect packets
net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.default.log_martians = 1

# No source routed packets here
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.accept_source_route = 0

# Turn on reverse path filtering
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1

# Make sure no one can alter the routing tables
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.default.secure_redirects = 0

# Don't act as a router
net.ipv4.ip_forward = 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0

# Turn on exec-shield
kernel.exec-shield = 1
kernel.randomize_va_space = 1

# Tune IPv6
net.ipv6.conf.default.router_solicitations = 0
net.ipv6.conf.default.accept_ra_rtr_pref = 0
net.ipv6.conf.default.accept_ra_pinfo = 0
net.ipv6.conf.default.accept_ra_defrtr = 0
net.ipv6.conf.default.autoconf = 0
net.ipv6.conf.default.dad_transmits = 0
net.ipv6.conf.default.max_addresses = 1

# Optimization for port use for LBs
# Increase system file descriptor limit
fs.file-max = 655350

# Allow for more PIDs (to reduce rollover problems); may break some
# programs 32768
kernel.pid_max = 65536

# Increase system IP port limits
net.ipv4.ip_local_port_range = 1500 65000

# Increase TCP max buffer size settable using setsockopt()
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432

# Increase Linux auto tuning TCP buffer limits
# min, default, and max number of bytes to use
# set max to at least 4MB, or higher if you use very high BDP paths
# Tcp Windows etc
net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.core.rmem_default=65536
net.core.wmem_default=65536
net.core.netdev_max_backlog = 5000
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_no_metrics_save = 1

With recent kernels and autotuning it should not be necessary to change
anything about the TCP buffers (but I think for these requirements it is).

I was monitoring the machine during the tests: CPU usage by nginx is around
30%, RAM usage is negligible, there is little I/O traffic, and load is <0.50.

Could somebody help me find where the bottleneck is?

Thanks.

Posted at Nginx Forum:

atadmin · October 6, 2011, 8:39pm

On Thu, Oct 06, 2011 at 02:30:41PM -0400, atadmin wrote:

Could somebody help me find where the bottleneck is?

Thanks.

Could you be bottle-necked in your testing tool “ab”?

Ken

atadmin · October 6, 2011, 8:43pm

Could you be bottle-necked in your testing tool “ab”?

Ken

It could definitely be that (or network latency).

Try running the same benchmark from a few hosts at once.

atadmin · October 6, 2011, 9:01pm

Hi,

I get the same result with 4 servers as with 9 servers executing ab at the
same time. I tested that earlier, before I had configured all 9 client
servers. With 1 server I can get between 12000 and 14500 connections per second.

ab -n 500000 -c 200 http://192.168.1.11/a.txt

This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.1.11 (be patient)
Completed 50000 requests
Completed 100000 requests
Completed 150000 requests
Completed 200000 requests
Completed 250000 requests
Completed 300000 requests
Completed 350000 requests
Completed 400000 requests
Completed 450000 requests
Completed 500000 requests
Finished 500000 requests

Server Software:        nginx/0.7.67
Server Hostname:        192.168.1.11
Server Port:            80

Document Path:          /a.txt
Document Length:        5 bytes

Concurrency Level:      200
Time taken for tests:   34.309 seconds
Complete requests:      500000
Failed requests:        0
Write errors:           0
Total transferred:      107500000 bytes
HTML transferred:       2500000 bytes
Requests per second:    14573.44 [#/sec] (mean)
Time per request:       13.724 [ms] (mean)
Time per request:       0.069 [ms] (mean, across all concurrent requests)
Transfer rate:          3059.85 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    4  73.8      2    3006
Processing:     0   10   5.2      9     225
Waiting:        0    9   5.2      8     224
Total:          1   14  73.9     11    3021

Percentage of the requests served within a certain time (ms)
  50%     11
  66%     13
  75%     14
  80%     15
  90%     17
  95%     19
  98%     21
  99%     24
 100%   3021 (longest request)
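
For reading the report above, it may help to see how the three headline figures relate to each other. The following Python sketch recomputes them from the request count, elapsed time and concurrency level shown in the output. The function name is only illustrative, and the formulas are the usual definitions of these ab fields rather than anything taken from ab's source:

```python
def ab_summary(complete_requests, seconds, concurrency):
    """Recompute ab's headline numbers from the raw totals.

    Returns (requests per second,
             mean ms per request as seen by one client slot,
             mean ms per request across all concurrent requests).
    """
    requests_per_second = complete_requests / seconds
    ms_per_request = concurrency * seconds / complete_requests * 1000
    ms_across_all = seconds / complete_requests * 1000
    return requests_per_second, ms_per_request, ms_across_all


if __name__ == "__main__":
    # Values taken from the ab run in this post.
    rps, per_request, across_all = ab_summary(500000, 34.309, 200)
    print(f"{rps:.2f} req/s, {per_request:.3f} ms, {across_all:.3f} ms")
```

With 200 concurrent requests and about 13.7 ms per request, a single ab client tops out near 14500 requests per second, so higher totals need either more clients or lower latency per request.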

Thanks.

atadmin · October 6, 2011, 9:18pm

Hi,

On 06.10.2011 21:00, atadmin wrote:

Hi,

I get the same result with 4 servers as with 9 servers executing ab at the
same time. I tested that earlier, before I had configured all 9 client
servers. With 1 server I can get between 12000 and 14500 connections per second.

Maybe you are using the wrong tool?

http://kristianlyng.wordpress.com/2010/10/23/275k-req/

Here is a statement from the HAProxy site (HAProxy - The Reliable, High Performance TCP/HTTP Load Balancer):

Oct 23th, 2010 : new httperf results : 572000 reqs/s

This morning I came across this interesting post from Kristian
Lyngstol about the performance tests he ran on the Varnish cache. What
struck me was the number of requests per second Kristian managed to
reach : 275000, not less. I'm not surprized at all that Varnish can
withstand such high rates, it's known for being very fast. My surprize
came from the fact that Kristian managed to find fast enough tools to
run this test. My old injector is limited to around 100k requests per
second on my machines, as it does not support keep-alive, and Apache's
ab to around 150k with keep-alive enabled. And when I managed to reach 2
millions requests per second, I was feeding a constant stream of
pipelined requests with netcat, which is particularly inconvenient to
use.

Kristian said he used httperf. I tried it in the past but did not
manage to get good numbers out of it. He said he found some "httperf
secrets", so that made me want to try again. First tests were limited to
approximately 50000 requests per second with httperf at 100% CPU.
Something close to my memories. But reading the man, I found that
httperf can work in a session-based mode with the "--wsess" parameter,
where it also support HTTP pipelining. Hmmm nice, we'll be less sensible
to packet round-trips. So I tried again with haproxy simply doing
redirects. Performance was still limited to 50000 requests per second.

In fact, there appears to be a default limit of 50000 requests per
second when "--rate" is not specified. I set it to 1 million and ran the
test again. Result: about 158000 requests per second at 100% CPU and
with haproxy at 44%. Since my machine is a Core2 Quad at 3 GHz, I fired
3 httperf against one haproxy process. The load reached a max of 572000
requests/s with an average around 450000 requests per second. This time,
haproxy and all 3 httperf were using 100% CPU. What an improvement!

These tests mean nothing at all for real world uses of course,
because when you have many clients, they won't send you massive amounts
of pipelined requests. However it's very nice to be able to stress-test
the HTTP engine for regression testing. And this will be an invaluable
measurement tool to test the end-to-end keep-alive when it's finished. I
still have to figure out the meaning of some options and how to make the
process less verbose. Right now it fills a screen with many zeroes,
making it hard to extract the useful numbers. I'm grateful to Kristian
to have made me revisit httperf !

atadmin · October 6, 2011, 10:54pm

Hi,

I had some problems with httperf; I will try again tomorrow. But I tested
with siege and 11 machines as clients and I get similar results:

22683 22:50:51
25653 22:50:52
27049 22:50:53
26246 22:50:54
26658 22:50:55
25725 22:50:56
26432 22:50:57
26940 22:50:58
27058 22:50:59
27236 22:51:00
27231 22:51:01
27361 22:51:02
26762 22:51:03
27004 22:51:04
27093 22:51:05
27097 22:51:06
26784 22:51:07
24799 22:51:08
24071 22:51:09
24034 22:51:10
26038 22:51:11
27025 22:51:12
26998 22:51:13
26963 22:51:14
27212 22:51:15
27244 22:51:16
27563 22:51:17
27101 22:51:18
25057 22:51:19
25334 22:51:20
25873 22:51:21
28045 22:51:22
27228 22:51:23
26752 22:51:24
26876 22:51:25

Thanks

atadmin · October 7, 2011, 3:10am

Here are my server results,
using 3 instances of ab, each opening 10000 concurrent connections.

cat logger | sed 's/||/ /g' | awk '{print $3}' | sed 's/.[0-9]+//g' | sort | uniq -c
66776 1317949624
91383 1317949625
92828 1317949626
93364 1317949627
91456 1317949628
93498 1317949629
92916 1317949630
91795 1317949631
91921 1317949632
92935 1317949633
93000 1317949634
89737 1317949635
91141 1317949636
93217 1317949637
93490 1317949638
93069 1317949639
88566 1317949640
93721 1317949641
93860 1317949642
90619 1317949643
93118 1317949644
93011 1317949645
94501 1317949646
93367 1317949647
92656 1317949648
91941 1317949649

Using 60% of CPU.
Server Environment:
4x AMD Quad-Core 8360 SE (total 16 cores)
32G DDR2
SATA3 SSD (r/w 550MB/s)
4x1Gbps Ethernet

–
MagicBear

atadmin · October 7, 2011, 2:10am

On Fri, Oct 7, 2011 at 5:00 AM, atadmin wrote:

Turn on syncookies for SYN flood attack protection

net.ipv4.tcp_syncookies = 0

I've never tested the performance benefit (and the costs) of having
syncookies enabled or not, but that setting suggests you have turned
syncookies off (you probably want them enabled; again, that comes at a
cost I haven't personally investigated).

Also, you haven't mentioned the state of iptables connection tracking yet.
That could be a problem if you believe the bottleneck is the server and
haven't checked it already. You probably want to disable nf_conntrack and
rewrite your iptables rules (or just disable firewalling completely).

atadmin · October 7, 2011, 3:11am

And here is my sysctl:

# Avoid a smurf attack
net.ipv4.icmp_echo_ignore_broadcasts = 1

# Turn on protection for bad icmp error messages
net.ipv4.icmp_ignore_bogus_error_responses = 1

# Turn on and log spoofed, source routed, and redirect packets
#net.ipv4.conf.all.log_martians = 1
#net.ipv4.conf.default.log_martians = 1

# No source routed packets here
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.accept_source_route = 0

# Turn on reverse path filtering
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1

# Make sure no one can alter the routing tables
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.default.secure_redirects = 0

# Don't act as a router
net.ipv4.ip_forward = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0

net.core.somaxconn=32768

net.ipv4.ip_local_port_range=4096 65535

net.core.netdev_max_backlog = 32768
net.ipv4.tcp_max_syn_backlog = 32768
net.ipv4.tcp_max_orphans = 262144

# for GigaEthernet
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_syncookies = 1

net.ipv4.tcp_mem = 50576 64768 98152
net.core.netdev_max_backlog = 2500
net.ipv4.netfilter.ip_conntrack_max = 1048576
net.ipv4.netfilter.ip_conntrack_tcp_timeout_established=1200

net.nf_conntrack_max=237680
net.netfilter.nf_conntrack_max=237680
net.netfilter.nf_conntrack_tcp_timeout_established=1200

fs.file-max = 131072

# Setting the Minimum System Page Cache
vm.min_free_kbytes=1024
# Managing the Swap Space
vm.swappiness=10

nginx config:

worker_processes 8;

worker_rlimit_nofile 131072;
events {
    worker_connections 65536;
    use epoll;
}
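
As a rough way to compare the two nginx configurations posted in this thread, the upper bound on simultaneous connections is worker_processes multiplied by worker_connections, and each worker also needs a file descriptor limit at least as large as its worker_connections. The Python sketch below only does that arithmetic with the values from the two posts. The function names are illustrative, and the bound ignores that proxied or FastCGI requests use more than one connection each:

```python
def max_connections(worker_processes, worker_connections):
    """Theoretical ceiling on simultaneous connections for an nginx instance."""
    return worker_processes * worker_connections


def fits_fd_limit(worker_connections, nofile_limit):
    """True if one worker can open worker_connections sockets under its fd limit."""
    return worker_connections <= nofile_limit


# (worker_processes, worker_connections, per-process open file limit)
CONFIGS = {
    "first post": (4, 10240, 100000),   # nofile from limits.conf
    "MagicBear": (8, 65536, 131072),    # worker_rlimit_nofile
}

if __name__ == "__main__":
    for name, (procs, conns, nofile) in CONFIGS.items():
        print(name, max_connections(procs, conns), fits_fd_limit(conns, nofile))
```

Both configurations allow far more simultaneous connections than the 200 used per ab client, which suggests these two settings are not what caps the results near 25000 per second.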

–
MagicBear

atadmin · October 7, 2011, 3:21am

I increased the number of workers to 12 and got these results; I think that
may be the maximum.

cat logger | sed 's/||/ /g' | awk '{print $3}' | sed 's/.[0-9]+//g' | sort | uniq -c
58423 1317950330
85703 1317950331
116036 1317950332
115995 1317950333
116070 1317950334
120604 1317950335
119080 1317950336
118695 1317950337
118231 1317950338
114383 1317950339
104594 1317950340
103047 1317950341
105614 1317950342
100386 1317950343
90679 1317950344
94728 1317950345
100741 1317950346
100206 1317950347
100959 1317950348
99431 1317950349
104943 1317950350
104868 1317950351
100532 1317950352
101507 1317950353
106315 1317950354
110642 1317950355
108740 1317950356
105454 1317950357
104623 1317950358
101233 1317950359

–
MagicBear

atadmin · October 7, 2011, 8:44am

Bradley F.,

I tested with this flag enabled and disabled and the result is the same;
it is disabled now because that was the last test. Iptables is not enabled
for the test.

magicbear,

could you please tell me which OS and version you are using? And the
parameters for the ab command? Did you make any changes on the client test
servers (like ulimit, etc.)?

Thanks!

atadmin · October 7, 2011, 1:20pm

On 10/07/2011 12:44 PM, Bradley F. wrote:

them, then it’s probably not your problem anyway.
ipv6 321509 28 ip6t_REJECT,nf_conntrack_ipv6
[root@bf1 ~]# cat /proc/sys/net/netfilter/nf_conntrack_count
2

Given the ipv6 references above have you also done a “service ip6tables
stop”?

Regards,
Dennis

atadmin · October 7, 2011, 2:29pm

On Fri, Oct 7, 2011 at 9:49 PM, Dennis J. wrote:

Given the ipv6 references above have you also done a “service ip6tables
stop”?

You were correct, stopping ip6tables then took out the module.

Cheers, sorry to go off topic / hijack.

atadmin · October 7, 2011, 6:05pm

Maybe relevant:

Kind regards,
Fredrik Widlund


atadmin · October 7, 2011, 12:45pm

On Fri, Oct 7, 2011 at 5:14 PM, atadmin wrote:

I tested with this flag enabled and disabled and the result is the same;
it is disabled now because that was the last test. Iptables is not enabled
for the test.

Just confirming you actively removed the nf_conntrack modules. My boxes
have logged conntrack warnings in syslog, so if you're not seeing them,
then it's probably not your problem anyway.

My box with connection tracking is still tracking even though iptables is
stopped:
[root@bf1 ~]# service iptables stop
iptables: Flushing firewall rules: [ OK ]
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Unloading modules: [ OK ]
[root@bf1 ~]# lsmod | grep nf_conn
nf_conntrack_ipv6 19655 2
nf_conntrack 79643 2 nf_conntrack_ipv6,xt_state
ipv6 321509 28 ip6t_REJECT,nf_conntrack_ipv6
[root@bf1 ~]# cat /proc/sys/net/netfilter/nf_conntrack_count
2

My box with connection tracking removed:
[brad@cache1 ~]$ cat /proc/sys/net/netfilter/nf_conntrack_count
cat: /proc/sys/net/netfilter/nf_conntrack_count: No such file or directory

Connection tracking probably isn't your issue, but for completeness' sake,
just confirming.
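
The check shown above (reading nf_conntrack_count and treating a missing file as "module not loaded") can also be scripted, for example to run it on every client and server before a benchmark. This is a minimal Python sketch assuming the same /proc path as in the console output; the function name is illustrative only:

```python
from pathlib import Path

# Same path as in the console output above.
CONNTRACK_COUNT = Path("/proc/sys/net/netfilter/nf_conntrack_count")


def conntrack_count(path=CONNTRACK_COUNT):
    """Return the number of tracked connections, or None if the file is absent.

    A missing file is taken to mean that nf_conntrack is not loaded.
    """
    try:
        return int(Path(path).read_text().strip())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    count = conntrack_count()
    if count is None:
        print("nf_conntrack not loaded")
    else:
        print(f"nf_conntrack loaded, tracking {count} connections")
```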