Forum: NGINX cache manager process - i/o perf

ixos (Guest)
on 2014-05-21 13:57
(Received via mailing list)
I'm having a problem with I/O performance. I'm running nginx as a
caching reverse proxy server.
When the cache size on disk exceeds max_size, the cache manager starts
working, but this causes two problems:

1) I/O %util reaches 100% and nginx starts dropping connections
2) the cache manager process doesn't unlink files fast enough to delete
old files, so the cache keeps growing until the disk runs out of space.

Can you give me an idea of how I can solve these problems? Below are
some details.

#build on 20x 300GB SAS disks with 2 SSDs for Cachecade.

# storcli64 /c0 show
VD LIST :
=======

----------------------------------------------------------------
DG/VD TYPE   State Access Consist Cache Cac sCC       Size Name
----------------------------------------------------------------
1/2   RAID60 Optl  RW     Yes     RaWBC R   ON    4.357 TB
2/1   Cac0   Optl  RW     Yes     RaWTD -   ON  557.875 GB
----------------------------------------------------------------

# mount
/dev/sdb1 on /cache type ext4 (rw,noatime,data=ordered)

# df -h /dev/sdb1
/dev/sdb1       4.3T  3.2T  828G  80% /cache


# for pid in `pgrep nginx `;do ionice -p $pid ;done
unknown: prio 4 <- master
best-effort: prio 0
best-effort: prio 0
best-effort: prio 0
best-effort: prio 0
best-effort: prio 0
best-effort: prio 0
best-effort: prio 0
best-effort: prio 0
best-effort: prio 0
best-effort: prio 0
best-effort: prio 0
best-effort: prio 0 <- workers
idle <- cache manager

# grep proxy_cache_path nginx.conf
proxy_cache_path /cache zone=my-cache:20000msize=3355443m

# netstat -sp|grep -i drop
    6335115 SYNs to LISTEN sockets dropped

# iostat -dx 1 /dev/sdb |grep ^sdb | awk '{print $14}'
24.40
31.20
26.80
23.60
26.80
16.00
34.80
35.20
29.60
...
14.40
15.60
11.60
16.00
17.20
18.00
17.20
42.00
90.80 <- cache manager process starts
100.00
100.00
29.20
100.00
100.00
100.00
52.00
100.00
100.00
100.00

Posted at Nginx Forum:
http://forum.nginx.org/read.php?2,250247,250247#msg-250247

Maxim Dounin (Guest)
on 2014-05-21 14:49
(Received via mailing list)
Hello!

On Wed, May 21, 2014 at 07:57:00AM -0400, ixos wrote:

> details.
>
> #build on 20x 300GB SAS disks with 2 SSDs for Cachecade.

[...]

> # grep proxy_cache_path nginx.conf
> proxy_cache_path /cache zone=my-cache:20000msize=3355443m

The "proxy_cache_path" looks corrupted and incomplete.  First of
all, I would suggest making sure you are using the "levels"
parameter, see http://nginx.org/r/proxy_cache_path.

--
Maxim Dounin
http://nginx.org/

ixos (Guest)
on 2014-05-21 15:15
(Received via mailing list)
> The "proxy_cache_path" looks corrupted and incomplete. First of
> all, I would suggest you to make sure you are using "levels"
> parameter, see http://nginx.org/r/proxy_cache_path.

I didn't paste the whole proxy_cache_path directive. Here it is in full:
    proxy_temp_path /cache/tmp;
    proxy_cache_path /cache
        levels=2:2
        keys_zone=my-cache:20000m
        max_size=3355443m
        inactive=7d;

And also nginx version if needed:

# /usr/local/nginx/sbin/nginx -V
nginx version: nginx/1.5.9

Posted at Nginx Forum:
http://forum.nginx.org/read.php?2,250247,250250#msg-250250

Maxim Dounin (Guest)
on 2014-05-21 15:59
(Received via mailing list)
Hello!

On Wed, May 21, 2014 at 09:15:16AM -0400, ixos wrote:

>         inactive=7d;

I see no obvious problems.

Try looking into system tuning then; your disk subsystem just
can't cope with the load.  There are a number of ways to improve disk
I/O performance, ranging from nginx tuning (aio, output_buffers,
etc., see http://nginx.org/r/aio) to OS tuning (in particular,
tuning the vnode cache may be beneficial; I'm not sure how to do this
on Linux), as well as using a RAID configuration that delivers
better performance.  A number of recommendations can be found in
the archives of this list.
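
As a rough illustration of the nginx-side tuning mentioned above, here
is a minimal sketch modelled on the example in the aio documentation.
The location, the upstream address and all values are assumptions for
illustration, not tuned recommendations. On Linux, aio requires nginx
to be built with --with-file-aio, and directio must be enabled or
reads will still block.

```nginx
# Sketch only: goes inside the server block that serves cached content.
location / {
    proxy_pass     http://127.0.0.1:8080;  # hypothetical upstream
    proxy_cache    my-cache;

    aio            on;       # asynchronous file reads
    directio       512;      # read files >= 512 bytes with O_DIRECT
    output_buffers 1 128k;   # one 128k buffer for reading from disk
}
```

Note that directio bypasses the OS page cache, so whether this helps
depends on the workload; it is worth measuring with iostat before and
after.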

An obvious workaround is to reduce disk load by using a smaller
max_size/inactive and/or proxy_cache_min_uses (see
http://nginx.org/r/proxy_cache_min_uses).
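
A sketch of what that workaround could look like, starting from the
configuration posted earlier in this thread. The reduced max_size and
inactive values, the min_uses count and the upstream address are
assumptions chosen only to show the shape of the change; suitable
numbers depend on the traffic.

```nginx
# Sketch only (http context). Original values were
# max_size=3355443m and inactive=7d.
proxy_temp_path /cache/tmp;
proxy_cache_path /cache
    levels=2:2
    keys_zone=my-cache:20000m
    max_size=2500000m    # assumption: smaller cache on disk
    inactive=2d;         # assumption: unused entries expire sooner

server {
    location / {
        proxy_pass  http://127.0.0.1:8080;  # hypothetical upstream
        proxy_cache my-cache;

        # Cache a response only after it has been requested 3 times,
        # so one-off requests never cause a write to disk.
        proxy_cache_min_uses 3;
    }
}
```

With fewer files written in the first place, the cache manager has
fewer files to unlink when max_size is reached.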

> And also nginx version if needed:
>
> # /usr/local/nginx/sbin/nginx -V
> nginx version: nginx/1.5.9

While it may be a good idea to upgrade to a recent and supported
version, there shouldn't be a big difference from a performance
point of view.

--
Maxim Dounin
http://nginx.org/

Andy (Guest)
on 2014-05-22 04:12
(Received via mailing list)
I hit a similar problem ...

What was the ingest rate (in Gbps) into the SSDs when you hit the
problem?
And how many cached file nodes does the cache manager have? I have
millions ...

ixos (Guest)
on 2014-05-22 11:05
(Received via mailing list)
> Can I know what is the ingest Gbps into the SSDs when you hit the
> problem?
About 500 Mbps

> and How many cached file nodes in cache-manager? I have millions ...
Between 7 and 9 million

Can you tell me more about your configuration (OS/nginx/cache)? And how
have you tried to solve the problem?

Posted at Nginx Forum:
http://forum.nginx.org/read.php?2,250247,250273#msg-250273

Andy (Guest)
on 2014-05-22 13:39
(Received via mailing list)
On Thu, May 22, 2014 at 5:05 PM, ixos <nginx-forum@nginx.us> wrote:

> > Can I know what is the ingest Gbps into the SSDs when you hit the
> problem?
> About ~500 Mbps
>
> > and How many cached file nodes in cache-manager? I have millions ...
> Between 7-9 milions
>
> Can you tell more about your configuration os/nginx/cache? And how have you
> tried to solve the problem.
>

No, I did not find a way to resolve this. I had to reduce the number of
cached files and add more devices to share the load ...
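
The poster does not say how the load was shared between devices. One
common way to do it in nginx, shown here purely as an assumption and
not as the setup described above, is one cache per disk with
split_clients choosing between them. The paths, zone names, sizes and
upstream address are hypothetical. A variable in proxy_cache needs
nginx 1.7.9 or later and use_temp_path needs 1.7.10 or later, so this
would not work on the 1.5.9 mentioned in this thread.

```nginx
# Sketch only (http context), assuming two separately mounted disks.
proxy_cache_path /cache1 levels=2:2 keys_zone=cache_disk1:10000m
                 max_size=1600000m inactive=7d use_temp_path=off;
proxy_cache_path /cache2 levels=2:2 keys_zone=cache_disk2:10000m
                 max_size=1600000m inactive=7d use_temp_path=off;

# Hash the request URI so a given URI always maps to the same disk.
split_clients $request_uri $cache_zone {
    50%  "cache_disk1";
    *    "cache_disk2";
}

server {
    location / {
        proxy_pass  http://127.0.0.1:8080;  # hypothetical upstream
        proxy_cache $cache_zone;
    }
}
```

This spreads reads, writes and unlinks over independent devices; it
does not reduce the total number of files that have to be deleted.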

We may need a feature to do disk write admission based on the disk
load ...

ixos (Guest)
on 2014-05-22 15:07
(Received via mailing list)
>No, I didnot find a way to resolve this, I have to make the cached files to
> a smaller count and add more devices to share the load ...

But does splitting across smaller devices actually solve the problem?
I mean, how many files can you have in the cache? And how many devices
do you have?

Posted at Nginx Forum:
http://forum.nginx.org/read.php?2,250247,250278#msg-250278