Nginx Slow download over 1Gbps load!

Hi,

We've recently shifted to FreeBSD-10 due to its robust asynchronous I/O
performance for large .mp4-based storage. Here are the server specs:

2 x Intel Xeon X5690
96GB DDR3 Memory
12 x 3TB SATA Raid-10 (HBA LSI-9211)
ZFS FileSystem with 18TB usable space
2 x 1Gbps LACP (2Gbps Throughput)

Things are working quite well: no high I/O, thanks to the big RAM cache and
AIO. But once the network port goes over 1Gbps, performance begins to lag;
download speed gets stuck around 60-100Kbps on a 4Mbps connection (using
wget), whereas it works quite efficiently while the port is under 800Mbps
(450kbps on 4Mbps). We first thought it could be a network or LACP issue,
but it doesn't look like it is. We also checked whether requests are queuing
up, using the following command, but the queue was '0':

[root@cw005 ~/scripts]# netstat -Lan
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen Local Address
tcp4 0/0/6000 *.80
tcp4 0/0/6000 *.443
tcp4 0/0/10 127.0.0.1.25
tcp4 0/0/128 *.1880
tcp6 0/0/128 *.1880
tcp4 0/0/5 *.5666
tcp6 0/0/5 *.5666
tcp4 0/0/128 *.199
unix 0/0/6000 /var/run/www.socket
unix 0/0/4 /var/run/devd.pipe
unix 0/0/4 /var/run/devd.seqpacket.pipe

Here is the mbuf cluster usage:

119747/550133/669880/6127378 mbuf clusters in use (current/cache/total/max)
661065/1410183/2071248/6063689 4k (page size) jumbo clusters in use (current/cache/total/max)

We also checked the disk busy rate using gstat, which was quite stable as
well.
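
(For reference, a minimal sketch of the kind of gstat invocation used to
watch disk busy rates; the exact flags here are illustrative:)

# show only active providers, refreshing once per second
gstat -a -I 1s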

So it looks like either the sysctl values need tweaking or the nginx
configuration is not optimized. Here is the sysctl.conf:

kern.ipc.somaxconn=6000

# set to at least 16MB for 10GE hosts
kern.ipc.maxsockbuf=16777216

# socket buffers
net.inet.tcp.recvspace=4194304
net.inet.tcp.sendspace=4197152
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendbuf_auto=1
net.inet.tcp.recvbuf_auto=1
net.inet.tcp.sendbuf_inc=16384
net.inet.tcp.recvbuf_inc=524288

# security
security.bsd.see_other_uids=0
security.bsd.see_other_gids=0

# drop UDP packets destined for closed sockets
net.inet.udp.blackhole=1

# drop TCP packets destined for closed sockets
net.inet.tcp.blackhole=2

# ipfw
net.inet.ip.fw.verbose_limit=3

# maximum incoming and outgoing IPv4 network queue sizes
net.inet.ip.intr_queue_maxlen=2048
net.route.netisr_maxqlen=2048

net.inet.icmp.icmplim=2048
net.inet.tcp.fast_finwait2_recycle=1
kern.random.sys.harvest.ethernet=0
net.inet.ip.portrange.randomized=0
net.link.lagg.0.use_flowid=0
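
(Values like these can also be applied and read back at runtime with sysctl,
so individual settings can be tested without a reboot; for example:)

# apply one value and read it back to confirm it took effect
sysctl net.inet.tcp.sendbuf_max=16777216
sysctl net.inet.tcp.sendbuf_max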

Here is the loader.conf (/boot/loader.conf):

zpool_cache_load="YES"
zpool_cache_type="/boot/zfs/zpool.cache"
zpool_cache_name="/boot/zfs/zpool.cache"
aio_load="YES"
zfs_load="YES"
ipmi_load="YES"

Here is the nginx.conf:

user www www;
worker_processes 48;
worker_rlimit_nofile 900000; # 2 file handles for each connection
error_log /var/log/nginx-error.log error;
#pid logs/nginx.pid;

events {
worker_connections 10240;
multi_accept on;

}
http {
include mime.types;
default_type application/octet-stream;
add_header X-Content-Type-Options nosniff;
add_header X-XSS-Protection "1; mode=block";
client_max_body_size 4096M;
client_body_buffer_size 800M;
output_buffers 1 512k;
sendfile_max_chunk 128k;

 fastcgi_connect_timeout 30;
 fastcgi_send_timeout 30;
 fastcgi_read_timeout 30;
 proxy_read_timeout 30;
 fastcgi_buffer_size 64k;
 fastcgi_buffers 16 64k;
 fastcgi_temp_file_write_size 256k;


 server_tokens off; #Conceals nginx version
 access_log off;
 sendfile        off;
 tcp_nodelay on;
 aio on;
 client_header_timeout  30s;
 client_body_timeout 30s;
 send_timeout     30s;
 keepalive_timeout  15s;
 ssl_session_cache   shared:SSL:10m;
 ssl_session_timeout 10m;
gzip off;
gzip_vary on;
gzip_disable "MSIE [1-6]\.";
gzip_proxied any;
gzip_http_version 1.0;
gzip_min_length  1280;
gzip_comp_level  6;
gzip_buffers  16 8k;
gzip_types    text/plain text/xml text/css application/x-javascript
image/png image/x-icon image/gif image/jpeg image/jpg application/xml
application/xml+rss text/javascript application/atom+xml;
include /usr/local/etc/nginx/vhosts/*.conf;
}

Here is the vhost:

server {
listen 80 sndbuf=16k;
server_name cw005.files.com cw005.domain.com
www.cw005.files.com www.cw005.domain.com cw005.domain.net
www.cw005.domain.net;
location / {
root /files;
index index.html index.htm index.php;
autoindex off;
}
location ~ \.(jpg)$ {
sendfile on;
tcp_nopush on;
aio off;
root /files;
try_files $uri /thumbs.php;
expires 1y;
}

    location ~* \.(js|css|png|gif|ico)$ {
            root /files;
            expires 1y;
            log_not_found off;
    }


    location ~ \.(flv)$ {
            flv;
            root /files;
            expires 7d;
            include hotlink.inc;
            }

    location ~ \.(mp4)$ {
            mp4;
         mp4_buffer_size 4M;
            mp4_max_buffer_size 20M;
            expires 1y;
            add_header Cache-Control "public";
            root /files;
            include hotlink.inc;
            }

# pass the PHP scripts to FastCGI server listening on
# unix:/var/run/www.socket
location ~ \.php$ {
root /files;
fastcgi_pass unix:/var/run/www.socket;
fastcgi_index index.php;
fastcgi_param SCRIPT_FILENAME
$document_root$fastcgi_script_name;
include fastcgi_params;
fastcgi_read_timeout 10000;
}

    location ~ /\.ht {
        deny  all;
    }

}

====================================================

Please, I need guidance on handling this problem; I am sure some value needs
tweaking.

Thanks in advance!

This is a bit out of scope for nginx, but …

could be a network or LACP issue, but it doesn't look like it is

How did you determine this?
Can you generate more than 1 Gbps (without nginx)?
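
(A minimal sketch of such a test, assuming iperf is installed from ports/pkg
on both ends; the host name and options are only examples:)

# on the server
iperf -s
# from two different client machines at the same time
iperf -c cw005.files.com -t 30 -P 4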

12 x 3TB SATA Raid-10 (HBA LSI-9211)
ZFS FileSystem with 18TB usable space

Please, I need guidance on handling this problem; I am sure some value needs
tweaking.

What's the output of zpool iostat (and the overall zpool/zfs configuration)?
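
(A sketch of the commands that would show this, using the standard ZFS
tools; the pool name "tank" is a placeholder:)

zpool iostat -v tank 1     # per-vdev I/O, refreshed every second
zpool status tank          # pool layout (mirrors vs. raidz)
zfs get recordsize,compression,atime tank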

Also, do you have ZFS on top of hardware RAID?

In general, just 12 SATA disks won't have a lot of IOPS (especially for
random reads) unless it all hits the ZFS ARC (which can and should be
monitored), even more so if there is a hardware RAID underneath (in your
place I would flash the HBA with IT firmware so you get plain JBOD disks
managed by ZFS).

rr

Hi,

Forget the application layer being the problem until you have
successfully replicated the problem in several different setups.

Are you monitoring both links' utilization levels? It really sounds like a
network-layer problem or something with your IP stack.

Can you replicate using ftp, scp?

How is your switch configured? How are the links negotiated? Make sure
both sides of both links are full-duplex 1 gig. Look for CRC or input
errors on the interface side.
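
(On the FreeBSD side, the interface error counters can be checked along
these lines; dev.igb.0 assumes an igb NIC and is only an example:)

# input/output errors and drops per interface
netstat -i -d
# driver-level counters for the first igb port (adjust to your NIC/driver)
sysctl dev.igb.0 | grep -i -e error -e drop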

How many packets are you pushing? Make sure the switch isn't activating
unicast limiting.

Lots of things to check… It would help if you could tell us what tests
you've done to determine it's nginx.

Thanks


Payam C.
Network Engineer / Security Specialist

Hi,

Thanks a lot for the response. Now I suspect the issue is at the network
layer, as I can see lots of retransmitted packets in the netstat -s output.
Here is the server's status:
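
(For reference, the retransmission counters referred to here come from the
stock netstat, e.g.:)

# TCP statistics, filtered down to the retransmit lines
netstat -s -p tcp | grep -i retrans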

Following is a thread that mentions the same issue:

This is what he said in that thread:

“I ran into a problem with Cisco switches forcing negotiation of network
speeds. This caused intermittent errors and retransmissions. The result was
file transfers being really slow. May not be the case, but you can turn off
speed negotiation with miitools (if I recall correctly, been a long time).”

Can you replicate using ftp, scp?
Yes, we recently tried downloading a file over FTP and encountered the same
slow transfer rate.

What's the output of zpool iostat (and the overall zpool/zfs configuration)?
Also, do you have ZFS on top of hardware RAID? In general, just 12 SATA
disks won't have a lot of IOPS (especially for random reads) unless it all
hits the ZFS ARC (which can and should be monitored), even more so if there
is a hardware RAID underneath (in your place I would flash the HBA with IT
firmware so you get plain JBOD disks managed by ZFS).

zpool iostat is quite stable so far. We're using an LSI-9211 HBA, so it's
not a hardware RAID controller, as FreeBSD recommends using an HBA in order
to directly access all drives for scrubbing and data-integrity purposes. Do
you recommend hardware RAID? Following is the screenshot of the ARC status:
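
(Since screenshots rarely survive the list, roughly the same ARC information
can be read from sysctl; zfs-stats is an optional extra from the
sysutils/zfs-stats port:)

# current ARC size plus hit/miss counters
sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses
# or, if installed, a formatted ARC summary
zfs-stats -A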

How is your switch configured? How are the links negotiated? Make sure
both sides of both links are full-duplex 1 gig. Look for CRC or input
errors on the interface side.
On my side, I can see that both interfaces are running full-duplex.
Regarding CRC / input errors, is there any command I can use to check that
on FreeBSD?

Regards.
Shahzaib

Sounds like at this point this discussion needs to be moved off the
nginx mailing list.

The server is using ports 18 and 19, and those ports are configured with
speed 1000:

LH26876_SW2#sh run int g 0/18
!
interface GigabitEthernet 0/18
description LH28765_3
no ip address
speed 1000
!
port-channel-protocol LACP
port-channel 3 mode active
no shutdown

LH26876_SW2#sh run int g 0/19
!
interface GigabitEthernet 0/19
description LH28765_3
no ip address
speed 1000
!
port-channel-protocol LACP
port-channel 3 mode active
no shutdown

LH26876_SW2#


Is it alright?

Regards.

Shahzaib

On Sun, Jan 31, 2016 at 11:18 PM, shahzaib shahzaib <[email protected]> wrote:

description LH28765_3

no ip address

speed 1000

Can you set that to "auto" of some sort?
I know very little about switches - but we usually have problems when
one side is set to auto-negotiation and the other isn't…
Most of the time, the switch being set to some fixed bandwidth is a
legacy of maybe a decade ago, when switches were crap (and some NICs
were, too).

Then, can you check with something like iperf, from two different hosts at
the same time, whether you can get past 1 GBit/s?

LACP will only do load-balancing with different addresses.

So, if you test from one IP, you will only ever get 1 GBit/s.
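
(On FreeBSD the hash the lagg uses can be inspected and, on recent versions,
changed at runtime; lagg0 is an assumption for the interface name:)

# show laggproto, member ports and the current hash layers (l2/l3/l4)
ifconfig lagg0
# hashing on layers 3+4 lets different flows from the same host use both ports
ifconfig lagg0 lagghash l3,l4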

You could also play with some of the settings described on calomel.org
(http://calomel.org/) for tuning TCP/IP.

As others have pointed out, it won't hurt moving this to
[email protected]…

Rainer

Yes, we recently tried downloading a file over FTP and encountered the same
slow transfer rate.

Then it's not really an nginx issue; it seems you just hit the server's
(current) network limit.

zpool iostat is quite stable so far. We're using an LSI-9211 HBA, so it's
not a hardware RAID controller, as FreeBSD recommends using an HBA in order
to directly access all drives for scrubbing and data-integrity purposes.
Do you recommend hardware RAID?

Well no, exactly the opposite - ZFS works best (better) with bare disks than
with a hardware RAID in between.
It's just that these HBAs usually come with IR (integrated RAID) firmware -
also, you mentioned RAID-10 in your setup, while in the ZFS world that term
isn't often used (rather, it's a mirrored pool or raidz(1,2,3…)).

As to how to solve the bandwidth issue - from personal experience I have had
more success (less trouble, less expense, etc.) with interface bonding on
the server itself (for Linux, with balance-alb, which doesn't require any
specific switch features or port configuration) rather than relying on the
switches.
The drawback is that a single stream/download can't go beyond one physical
interface's speed limit (for 1GbE, ~100MB/s), but the total bandwidth cap is
roughly multiplied by the number of devices bonded together.

Not a BSD guy, but I guess lagg in loadbalance mode would do the same (lagg).
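
(A minimal rc.conf sketch of a loadbalance lagg on FreeBSD; the igb0/igb1
interface names and the address are placeholders:)

# /etc/rc.conf
ifconfig_igb0="up"
ifconfig_igb1="up"
cloned_interfaces="lagg0"
# loadbalance is a static (non-LACP) mode: outgoing traffic is hashed across
# the member ports; the switch side may still need a matching static channel group
ifconfig_lagg0="laggproto loadbalance laggport igb0 laggport igb1 192.0.2.10/24"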

rr

Yep, I'm certain this is not an nginx problem, as others have also pointed
out.

Two ways of solving an interface limitation problem:

  1. Change your load-balancing algorithm to per-packet load balancing. This
    will split up the traffic much more evenly across multiple interfaces;
    however, I would not recommend this, as you run the risk of
    dropped/out-of-state packets and errors… There are only a few conditions
    under which this would work with a high level of success.

  2. 10gig interfaces. You can pick up a Cisco 3750X on Amazon for a few
    hundred dollars these days and add a 10gig card to it.

I'd say to check your sysctl and ulimit settings; however, it's clear that
the issue is only active when pushing over 1 Gbit/sec.

Simple test: use two different sources and test your download at the same
time. If you can dictate the source IP addresses, use an even and an odd
last octet to aid in a better-balanced return traffic path. Monitor both
switch port interfaces and you should see the total traffic exceed
1 Gbit/sec without problems… A more controlled test would be to set up each
interface with its own IP and force reply traffic to stay on each NIC.
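
(A sketch of that test; the URL is only an example path on this server, and
systat ships with the base system:)

# from client A (…x.10) and client B (…x.11), started at the same time
wget -O /dev/null http://cw005.files.com/test.mp4
# on the server, watch the per-interface rates while both downloads run
systat -ifstat 1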

Feel free to drop me an email off-list if you need any more help.


Payam C.
Solution Architect
