Cache manager process - i/o perf

musicdenotation · May 21, 2014, 1:57pm

I’m having a problem with I/O performance. I’m running nginx as a caching
reverse proxy server.
When the cache size on disk exceeds max_size, the cache manager starts working,
but this causes two problems:

I/O %util reaches 100% and nginx starts dropping connections.
The cache manager process doesn’t unlink files fast enough to delete old
files, so the cache keeps growing until the disk runs out of space.

Can you give me an idea of how I can solve these problems? Below are some
details.

The storage is built on 20x 300GB SAS disks with 2 SSDs for CacheCade.

storcli64 /c0 show

VD LIST :

DG/VD  TYPE    State  Access  Consist  Cache  Cac  sCC  Size        Name
1/2    RAID60  Optl   RW      Yes      RaWBC  R    ON   4.357 TB
2/1    Cac0    Optl   RW      Yes      RaWTD  -    ON   557.875 GB

mount

/dev/sdb1 on /cache type ext4 (rw,noatime,data=ordered)

df -h /dev/sdb1

/dev/sdb1 4.3T 3.2T 828G 80% /cache

for pid in `pgrep nginx` ;do ionice -p $pid ;done

unknown: prio 4 ← master
best-effort: prio 0
best-effort: prio 0
best-effort: prio 0
best-effort: prio 0
best-effort: prio 0
best-effort: prio 0
best-effort: prio 0
best-effort: prio 0
best-effort: prio 0
best-effort: prio 0
best-effort: prio 0
best-effort: prio 0 ← workers
idle ← cache manager

grep proxy_cache_path nginx.conf

proxy_cache_path /cache zone=my-cache:20000msize=3355443m

netstat -sp|grep -i drop

6335115 SYNs to LISTEN sockets dropped

iostat -dx 1 /dev/sdb | grep ^sdb | awk '{print $14}'

24.40
31.20
26.80
23.60
26.80
16.00
34.80
35.20
29.60
…
14.40
15.60
11.60
16.00
17.20
18.00
17.20
42.00
90.80 ← cache manager process starts
100.00
100.00
29.20
100.00
100.00
100.00
52.00
100.00
100.00
100.00

Posted at Nginx Forum:

ixos · May 21, 2014, 2:49pm

Hello!

On Wed, May 21, 2014 at 07:57:00AM -0400, ixos wrote:

details.

The storage is built on 20x 300GB SAS disks with 2 SSDs for CacheCade.

[…]

grep proxy_cache_path nginx.conf

proxy_cache_path /cache zone=my-cache:20000msize=3355443m

The “proxy_cache_path” looks corrupted and incomplete. First of
all, I would suggest you make sure you are using the “levels”
parameter; see Module ngx_http_proxy_module.

–
Maxim D.
http://nginx.org/

ixos · May 21, 2014, 3:15pm

The “proxy_cache_path” looks corrupted and incomplete. First of
all, I would suggest you make sure you are using the “levels”
parameter; see Module ngx_http_proxy_module.

I didn’t paste the whole proxy_cache_path directive. Here it is in full:

proxy_temp_path /cache/tmp;
proxy_cache_path /cache
    levels=2:2
    keys_zone=my-cache:20000m
    max_size=3355443m
    inactive=7d;
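Some back-of-the-envelope arithmetic on this directive, offered as a sketch under stated assumptions: each `levels` digit is taken to select one hexadecimal character of the cache key's MD5 hash (so a level of length 2 yields 16^2 = 256 subdirectories); the keys zone is assumed to hold "about 8 thousand keys" per megabyte, the figure given in the current proxy_cache_path documentation (the figure for 1.5.9 may differ); and the file count uses the 7-9 million cached files reported later in this thread.

```latex
\begin{align*}
\text{leaf directories (levels=2:2)} &= 16^{2}\cdot 16^{2} = 65\,536\\
\text{files per leaf directory} &\approx \frac{(7\text{--}9)\times 10^{6}}{65\,536} \approx 107\text{--}137\\
\text{keys\_zone capacity} &\approx 20\,000\ \text{MB}\times 8\,000\ \tfrac{\text{keys}}{\text{MB}} = 1.6\times 10^{8}\ \text{keys}\\
\text{max\_size} &= \frac{3\,355\,443\ \text{MiB}}{1024^{2}} \approx 3.2\ \text{TiB}
\end{align*}
```

On these assumptions, the directory fan-out and the keys zone look comfortably sized. The max_size figure lines up with the 3.2T "Used" value from df -h, which suggests the cache is sitting at its size limit and the cache manager has to evict continuously.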

And here is the nginx version, in case it’s needed:

/usr/local/nginx/sbin/nginx -V

nginx version: nginx/1.5.9

ixos · May 22, 2014, 4:12am

I hit a similar problem…

What is the ingest rate (Gbps) into the SSDs when you hit the
problem?
And how many cached file nodes does the cache manager have? I have millions…

ixos · May 21, 2014, 3:59pm

Hello!

On Wed, May 21, 2014 at 09:15:16AM -0400, ixos wrote:

    inactive=7d;

See no obvious problems.

Try looking into system tuning then; your disk subsystem just
can’t cope with the load. There are a number of ways to improve disk
I/O performance, starting from nginx tuning (aio, output_buffers,
etc.; see Module ngx_http_core_module) to OS tuning (in particular,
tuning the vnode cache may be beneficial, though I’m not sure how to do
this on Linux), as well as using a RAID configuration which delivers
better performance. A number of recommendations can be found in
the archives of this list.

An obvious workaround is to reduce disk load by using a smaller
max_size/inactive, and/or proxy_cache_min_uses (see
Module ngx_http_proxy_module).
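The nginx-side suggestions above might be combined roughly as in the following http-level fragment. This is a hedged sketch, not a tested recommendation: the reduced max_size and inactive values, the proxy_cache_min_uses threshold, the directio and output_buffers sizes, the listen port, and the upstream name backend with its address are all hypothetical. On Linux, `aio on` requires nginx built with `--with-file-aio`, and reads only stop blocking when `directio` is also enabled.

```nginx
# http-level fragment; every value marked "hypothetical" is an illustration only.
proxy_temp_path  /cache/tmp;           # same filesystem as the cache: files are renamed, not copied

proxy_cache_path /cache
                 levels=2:2
                 keys_zone=my-cache:20000m
                 max_size=2500000m     # hypothetical: below the original 3355443m
                 inactive=3d;          # hypothetical: shorter than the original 7d

upstream backend {                     # hypothetical upstream
    server 127.0.0.1:8080;
}

server {
    listen 80;                         # hypothetical

    location / {
        proxy_pass           http://backend;
        proxy_cache          my-cache;
        proxy_cache_min_uses 2;        # cache a response only after it has been requested twice

        aio            on;             # Linux: needs nginx built with --with-file-aio
        directio       4m;             # files >= 4m are read with O_DIRECT, which lets AIO work
        output_buffers 2 512k;         # hypothetical buffer count and size
    }
}
```

Raising proxy_cache_min_uses trades hit ratio for fewer cache writes: responses requested only once are never written to disk, which also leaves fewer files for the cache manager to delete later.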

And here is the nginx version, in case it’s needed:

/usr/local/nginx/sbin/nginx -V

nginx version: nginx/1.5.9

While it may be a good idea to upgrade to a recent and supported
version, there shouldn’t be a big difference from a performance
point of view.

–
Maxim D.
http://nginx.org/

ixos · May 22, 2014, 1:39pm

On Thu, May 22, 2014 at 5:05 PM, ixos wrote:

What is the ingest rate (Gbps) into the SSDs when you hit the
problem?
About ~500 Mbps

And how many cached file nodes does the cache manager have? I have millions…
Between 7-9 million

Can you tell me more about your configuration (OS/nginx/cache)? And how have you
tried to solve the problem?

No, I didn’t find a way to resolve this. I had to reduce the number of
cached files and add more devices to share the load…

We may need a feature that admits disk writes based on the disk load…

ixos · May 22, 2014, 3:07pm

No, I didn’t find a way to resolve this. I had to reduce the number of
cached files and add more devices to share the load…

But does splitting the cache across smaller devices “solve” the problem? I mean,
how many files can you keep in the cache? How many devices do you have?

ixos · May 22, 2014, 11:05am

What is the ingest rate (Gbps) into the SSDs when you hit the
problem?
About ~500 Mbps

And how many cached file nodes does the cache manager have? I have millions…
Between 7-9 million

Can you tell me more about your configuration (OS/nginx/cache)? And how have
you tried to solve the problem?
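A rough estimate of the unlink work that eviction implies, using the ~500 Mbps ingest and 7-9 million cached files from the answers above. This rests on assumptions, not measurements: the cache is full at max_size (≈3.2 TiB), the 7-9 million figure is the total number of cached files, and all of the ingest is written into the cache as new entries, each of which eventually displaces an average-sized old file.

```latex
\begin{align*}
\bar{s} &\approx \frac{3.2\ \text{TiB}}{(7\text{--}9)\times 10^{6}\ \text{files}}
         = \frac{3.52\times 10^{12}\ \text{B}}{(7\text{--}9)\times 10^{6}}
         \approx 391\text{--}503\ \text{kB per file}\\
\text{ingest} &= \frac{500\times 10^{6}\ \text{bit/s}}{8} = 62.5\times 10^{6}\ \text{B/s}\\
\text{required unlink rate} &\approx \frac{62.5\times 10^{6}\ \text{B/s}}{391\text{--}503\ \text{kB}} \approx 124\text{--}160\ \text{files/s}
\end{align*}
```

Each of those unlinks updates directory and inode metadata on the RAID60 volume on top of the writes for the new files, which is consistent with (though it does not prove) the jump in %util once the cache manager starts.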
