I’m using nginx 1.4.0 to proxy a website, and I cache responses. I haven’t noticed any problems on the front end, but the error log has unlink() errors.
2013/05/03 12:53:42 [crit] 16665#0: unlink() "/usr/local/nginx/cache/8/9f/42da8f2662887b05cbb46fd5c9dac9f8" failed (2: No such file or directory)
2013/05/03 12:53:42 [crit] 16665#0: unlink() "/usr/local/nginx/cache/8/7d/f16e1a9cee13b3a9852fff331491d7d8" failed (2: No such file or directory)
2013/05/03 12:53:42 [crit] 16665#0: unlink() "/usr/local/nginx/cache/d/96/2b1e341ee2ccd315643dcad397b9796d" failed (2: No such file or directory)
2013/05/03 12:53:42 [crit] 16665#0: unlink() "/usr/local/nginx/cache/6/87/c3324c5f79272b6fff64ac19be2d0876" failed (2: No such file or directory)
2013/05/03 12:53:42 [crit] 16665#0: unlink() "/usr/local/nginx/cache/c/ae/aa5ee91c36f7ab931251dd125a200aec" failed (2: No such file or directory)
2013/05/03 12:53:42 [crit] 16665#0: unlink() "/usr/local/nginx/cache/c/d8/2ac585aa18ec25e3a8eab19b096dcd8c" failed (2: No such file or directory)
2013/05/03 12:53:42 [crit] 16665#0: unlink() "/usr/local/nginx/cache/2/94/77170f4b850dcc5bae0e93bdf0f07942" failed (2: No such file or directory)
2013/05/03 12:53:42 [crit] 16665#0: unlink() "/usr/local/nginx/cache/3/cd/f92020ab245f9be3bab04cf8bf93acd3" failed (2: No such file or directory)
The list goes on.
Is this something to be concerned about?
My configuration below is pseudo-config, but here are the main parts:
#reverse ssl (usage not shown in examples)
proxy_cache_path /usr/local/nginx/cache levels=1:2
On Fri, May 03, 2013 at 04:17:17PM -0400, nano wrote:
Hello,
I’m using nginx 1.4.0 to proxy a website, and I cache responses. I haven’t noticed any problems on the front end, but the error log has unlink() errors.
[…]
#main site cache
proxy_cache_path /usr/local/nginx/cache levels=1:2
You’ve configured two distinct caches to use a single directory. This is not how it’s expected to work.
You should use distinct directories for each cache you configure. If you want different locations to use the same cache, just use the same cache in the proxy_cache directive.
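For illustration, the layout being described might look roughly like the sketch below. The zone names, sizes, server names, and the backend address are assumptions made for the sketch, not taken from the original configuration; the point is one directory per cache, with several servers/locations free to reuse the same zone.

# One directory and one keys_zone per cache
proxy_cache_path /usr/local/nginx/cache/ssl  levels=1:2 keys_zone=ssl_cache:10m;
proxy_cache_path /usr/local/nginx/cache/main levels=1:2 keys_zone=main_cache:10m;

# Several servers/locations may share a single cache by naming the same zone
server {
    server_name example.com;
    location / {
        proxy_pass http://127.0.0.1:8080;   # placeholder backend
        proxy_cache main_cache;
    }
}

server {
    server_name sub.example.com;
    location / {
        proxy_pass http://127.0.0.1:8080;   # placeholder backend
        proxy_cache main_cache;
    }
}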
[…]
Is it bad practice to share caches among subdomains? Is sharing the cache the reason why I’m getting unlink() errors?
It’s ok to use the same cache for different locations/servers. But it’s a really bad idea to configure multiple caches in the same directory, and this is what causes your problems.
Maxim,
I have just seen a similar situation using the fastcgi cache. In my case I am using the same cache (but only one cache) for several server/location blocks. The system is a fairly basic nginx setup with four upstream fastcgi servers and ip_hash. The returned content is cached locally by nginx. The cache is rather large, but I wouldn’t think this would be the cause.
Relevant config:
http {
    ....
    upstream fastcgi_backend {
        ip_hash;
        server 10.0.2.1:xxxx;
        server 10.0.2.2:xxxx;
        server 10.0.2.3:xxxx;
        server 10.0.2.4:xxxx;
        keepalive 8;
    }
    fastcgi_cache_path /var/nginx/fcgi_cache levels=1:2
                       keys_zone=one:512m max_size=250g inactive=24h;
The other server/location blocks are pretty much identical insofar as fastcgi and cache are concerned.
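For reference, each of those blocks presumably looks something like the sketch below. The server name, PHP location, cache key, and validity times are assumptions; only the fastcgi_backend upstream and the “one” keys_zone come from the configuration above.

server {
    listen 80;
    server_name example.org;   # placeholder

    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass fastcgi_backend;                 # the ip_hash upstream defined above
        fastcgi_cache one;                            # every block names the same shared zone
        fastcgi_cache_key $scheme$host$request_uri;
        fastcgi_cache_valid 200 301 302 10m;          # placeholder validity times
    }
}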
When I upgraded nginx using the “on the fly” binary upgrade method, I
saw almost 400,000 lines in the error log that looked like this:
2013/05/04 17:54:25 [crit] 65304#0: unlink() "/var/nginx/fcgi_cache/7/2e/899bc269a74afe6e0ad574eacde4e2e7" failed (2: No such file or directory)
2013/05/04 17:54:25 [crit] 65304#0: unlink() "/var/nginx/fcgi_cache/7/2e/42adc8a0136048b940c6fcaa76abf2e7" failed (2: No such file or directory)
2013/05/04 17:54:25 [crit] 65304#0: unlink() "/var/nginx/fcgi_cache/7/2e/c3656dff5aa91af1a44bd0157045d2e7" failed (2: No such file or directory)
2013/05/04 17:54:25 [crit] 65304#0: unlink() "/var/nginx/fcgi_cache/7/2e/de75207502d7892cf377a3113ea552e7" failed (2: No such file or directory)
2013/05/04 17:54:25 [crit] 65304#0: unlink() "/var/nginx/fcgi_cache/7/2e/c2205e6a3df4f29eb2a568e435b2b2e7" failed (2: No such file or directory)
2013/05/04 17:54:25 [crit] 65304#0: unlink() "/var/nginx/fcgi_cache/7/2e/6ccaa4244645e508dad3d14ff73ea2e7" failed (2: No such file or directory)
2013/05/04 17:54:25 [crit] 65304#0: unlink() "/var/nginx/fcgi_cache/7/2e/76b4b811553756a2989ae40da863d2e7" failed (2: No such file or directory)
2013/05/04 17:54:25 [crit] 65304#0: unlink() "/var/nginx/fcgi_cache/7/2e/53d40a6399ba6dcf08bc0a52623932e7" failed (2: No such file or directory)
2013/05/04 17:54:25 [crit] 65304#0: unlink() "/var/nginx/fcgi_cache/7/2e/68ff8b00492991a2e3ba5ad7420d42e7" failed (2: No such file or directory)
2013/05/04 17:54:25 [crit] 65304#0: unlink() "/var/nginx/fcgi_cache/7/2e/19c079c9a1e0bcacb697af123d47f2e7" failed (2: No such file or directory)
On Sat, May 04, 2013 at 07:08:55PM -0400, Jim O. wrote:
[…]
I have just seen a similar situation using the fastcgi cache. In my case I am using the same cache (but only one cache) for several server/location blocks. The system is a fairly basic nginx setup with four upstream fastcgi servers and ip_hash. The returned content is cached locally by nginx. The cache is rather large, but I wouldn’t think this would be the cause.
The other server/location blocks are pretty much identical insofar as fastcgi and cache are concerned.
When I upgraded nginx using the “on the fly” binary upgrade method, I saw almost 400,000 lines in the error log that looked like this:
2013/05/04 17:54:25 [crit] 65304#0: unlink() "/var/nginx/fcgi_cache/7/2e/899bc269a74afe6e0ad574eacde4e2e7" failed (2: No such file or directory)
[…]
After a binary upgrade there are two cache zones: one in the old nginx, and another one in the new nginx (much like in the originally posted configuration). This may cause such errors if e.g. a cache file is removed by the old nginx, and the new nginx then fails to remove the same file shortly afterwards.
The 400k lines are a bit too many, though. You may want to check that the cache wasn’t just removed by some (package?) script during the upgrade process. Alternatively, it might indicate that you let the old and new processes coexist for a long time.
On the other hand, as discussed many times, such errors are more or less harmless as long as it’s clear what caused the cache files to be removed. At worst they indicate that the information in a cache zone isn’t correct and max_size might not be maintained properly, and eventually nginx will self-heal the cache zone. It probably should be logged at the [error] or even [warn] level instead.
On Mon, May 06, 2013 at 09:01:45AM -0400, Jim O. wrote:
[…] in the two minutes after, access log entries show the expected ratio of “HIT” and “MISS” entries, which further supports your point below that these are harmless (although I don’t really believe that I have a cause).
I’m not sure what you mean by a “long time”, but all of these entries are time stamped over roughly two and a half minutes.
Is it ok in your setup that 400k cache items are removed/expired
from cache in two minutes? If yes, then it’s probably ok.
On the other hand, as discussed many times, such errors are more or less harmless as long as it’s clear what caused the cache files to be removed. At worst they indicate that the information in a cache zone isn’t correct and max_size might not be maintained properly, and eventually nginx will self-heal the cache zone. It probably should be logged at the [error] or even [warn] level instead.
Why would max_size not be maintained properly? Isn’t that the responsibility of the cache manager process? Are there known issues/bugs?
The cache manager process uses the same shared memory zone to maintain max_size. If nginx thinks a cache file is there, but the file was in fact already deleted (which is why the alerts in question appear), the total size of the cache as recorded in the shared memory will be incorrect. As a result, the cache manager will delete some extra files to keep the (incorrect) size under max_size.
In the worst case, the cache size will be correct again once the inactive= time has passed after the cache files were deleted.
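For reference, the parameters involved all live on the cache path directive quoted earlier in the thread; the values below simply mirror that configuration, and the comments are an interpretation of the explanation above.

# keys_zone=one:512m  - shared memory zone holding cache keys and per-entry metadata,
#                       including the sizes used to track the total cache size
# max_size=250g       - enforced by the cache manager using the sizes recorded in that zone
# inactive=24h        - entries not accessed for this long are evicted by the cache manager
fastcgi_cache_path /var/nginx/fcgi_cache levels=1:2
                   keys_zone=one:512m max_size=250g inactive=24h;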
After a binary upgrade there are two cache zones: one in the old nginx, and another one in the new nginx (much like in the originally posted configuration). This may cause such errors if e.g. a cache file is removed by the old nginx, and the new nginx then fails to remove the same file shortly afterwards.
The 400k lines are a bit too many, though. You may want to check that the cache wasn’t just removed by some (package?) script during the upgrade process. Alternatively, it might indicate that you let the old and new processes coexist for a long time.
I hadn’t considered that there are two zones during that short time. Thanks for pointing that out.
To my knowledge, there are no scripts or packages which remove files from the cache, or the entire cache. A couple of minutes after this occurred there were a bit under 1.4 million items in the cache, and it was “full” at 250 GB. I did look in a few sub-directories at the time, and most of the items were time stamped from before this started, so clearly the entire cache was not removed. During the time period these entries were made in the error log, and in the two minutes after, access log entries show the expected ratio of “HIT” and “MISS” entries, which further supports your point below that these are harmless (although I don’t really believe that I have a cause).
I’m not sure what you mean by a “long time”, but all of these entries are time stamped over roughly two and a half minutes.
On the other hand, as discussed many times, such errors are more or less harmless as long as it’s clear what caused the cache files to be removed. At worst they indicate that the information in a cache zone isn’t correct and max_size might not be maintained properly, and eventually nginx will self-heal the cache zone. It probably should be logged at the [error] or even [warn] level instead.
Why would max_size not be maintained properly? Isn’t that the responsibility of the cache manager process? Are there known issues/bugs?
Is it ok in your setup that 400k cache items are removed/expired
from cache in two minutes? If yes, then it’s probably ok.
No, that is way more than expected. The box handles an average of 300-500 requests/second during peak hours, spiking around 800-900, so that would be at most around 150,000 requests in three minutes. Even if all 150,000 requests were cacheable and were all cache misses (resulting in them all expiring at the same time in the future), that could not explain all of those items. FWIW, this upgrade was done on a weekend. Peak times are “business hours” in Europe and North America. The box was relatively slow at that time.
Why would max_size not be maintained properly? Isn’t that the responsibility of the cache manager process? Are there known issues/bugs?
[…]
OK, that makes sense and is what I would expect. I’m still troubled by how large the discrepancy in the number of items was.
I will watch and see what happens the next time I upgrade. I’ll look at
how many items are in the cache directory before and after, as well as
the total size, which was on the mark after the upgrade this time, but
perhaps not before.
–
Jim O.