I’m using nginx 1.4.0 to proxy a website, and I cache responses. I haven’t noticed any problems on the front end, but the error log has unlink() errors.
2013/05/03 12:53:42 [crit] 16665#0: unlink() "/usr/local/nginx/cache/8/9f/42da8f2662887b05cbb46fd5c9dac9f8" failed (2: No such file or directory)
2013/05/03 12:53:42 [crit] 16665#0: unlink() "/usr/local/nginx/cache/8/7d/f16e1a9cee13b3a9852fff331491d7d8" failed (2: No such file or directory)
2013/05/03 12:53:42 [crit] 16665#0: unlink() "/usr/local/nginx/cache/d/96/2b1e341ee2ccd315643dcad397b9796d" failed (2: No such file or directory)
2013/05/03 12:53:42 [crit] 16665#0: unlink() "/usr/local/nginx/cache/6/87/c3324c5f79272b6fff64ac19be2d0876" failed (2: No such file or directory)
2013/05/03 12:53:42 [crit] 16665#0: unlink() "/usr/local/nginx/cache/c/ae/aa5ee91c36f7ab931251dd125a200aec" failed (2: No such file or directory)
2013/05/03 12:53:42 [crit] 16665#0: unlink() "/usr/local/nginx/cache/c/d8/2ac585aa18ec25e3a8eab19b096dcd8c" failed (2: No such file or directory)
2013/05/03 12:53:42 [crit] 16665#0: unlink() "/usr/local/nginx/cache/2/94/77170f4b850dcc5bae0e93bdf0f07942" failed (2: No such file or directory)
2013/05/03 12:53:42 [crit] 16665#0: unlink() "/usr/local/nginx/cache/3/cd/f92020ab245f9be3bab04cf8bf93acd3" failed (2: No such file or directory)
The list goes on.
Is this something to be concerned about?
My configuration below is pseudo-config, but here are the main parts:
#reverse ssl (usage not shown in examples)
proxy_cache_path /usr/local/nginx/cache levels=1:2
On Fri, May 03, 2013 at 04:17:17PM -0400, nano wrote:
Hello,
I’m using nginx 1.4.0 to proxy a website, and I cache responses. I haven’t noticed any problems on the front end, but the error log has unlink() errors.
[…]
#main site cache
proxy_cache_path /usr/local/nginx/cache levels=1:2
You’ve configured two distinct caches to use a single directory. This is not how it’s expected to work.
You should use distinct directories for each cache you configure. If you want different locations to use the same cache, just use the same cache in the proxy_cache directive.
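For illustration, the layout being described might look roughly like the sketch below. The zone names, sizes, server names, and the backend address are assumptions made for the sketch, not taken from the original configuration; the point is one directory per cache, with several servers/locations free to reuse the same zone.

# One directory and one keys_zone per cache
proxy_cache_path /usr/local/nginx/cache/ssl  levels=1:2 keys_zone=ssl_cache:10m;
proxy_cache_path /usr/local/nginx/cache/main levels=1:2 keys_zone=main_cache:10m;

# Several servers/locations may share a single cache by naming the same zone
server {
    server_name example.com;
    location / {
        proxy_pass http://127.0.0.1:8080;   # placeholder backend
        proxy_cache main_cache;
    }
}

server {
    server_name sub.example.com;
    location / {
        proxy_pass http://127.0.0.1:8080;   # placeholder backend
        proxy_cache main_cache;
    }
}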
[…]
Is it bad practice to share caches among subdomains? Is sharing the cache the reason why I’m getting unlink() errors?
It’s ok to use the same cache for different locations/servers. But it’s a really bad idea to configure multiple caches in the same directory, and this is what causes your problems.
Maxim,
I have just seen a similar situation using the fastcgi cache. In my case I am using the same cache (but only one cache) for several server/location blocks. The system is a fairly basic nginx setup with four upstream fastcgi servers and ip_hash. The returned content is cached locally by nginx. The cache is rather large, but I wouldn’t think this would be the cause.
Relevant config:
http {
    ....
    upstream fastcgi_backend {
        ip_hash;
        server 10.0.2.1:xxxx;
        server 10.0.2.2:xxxx;
        server 10.0.2.3:xxxx;
        server 10.0.2.4:xxxx;
        keepalive 8;
    }
    fastcgi_cache_path /var/nginx/fcgi_cache levels=1:2
                       keys_zone=one:512m max_size=250g inactive=24h;
The other server/location blocks are pretty much identical insofar as fastcgi and cache are concerned.
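For reference, each of those blocks presumably looks something like the sketch below. The server name, PHP location, cache key, and validity times are assumptions; only the fastcgi_backend upstream and the “one” keys_zone come from the configuration above.

server {
    listen 80;
    server_name example.org;   # placeholder

    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass fastcgi_backend;                 # the ip_hash upstream defined above
        fastcgi_cache one;                            # every block names the same shared zone
        fastcgi_cache_key $scheme$host$request_uri;
        fastcgi_cache_valid 200 301 302 10m;          # placeholder validity times
    }
}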
When I upgraded nginx using the “on the fly” binary upgrade method, I
saw almost 400,000 lines in the error log that looked like this:
2013/05/04 17:54:25 [crit] 65304#0: unlink() "/var/nginx/fcgi_cache/7/2e/899bc269a74afe6e0ad574eacde4e2e7" failed (2: No such file or directory)
2013/05/04 17:54:25 [crit] 65304#0: unlink() "/var/nginx/fcgi_cache/7/2e/42adc8a0136048b940c6fcaa76abf2e7" failed (2: No such file or directory)
2013/05/04 17:54:25 [crit] 65304#0: unlink() "/var/nginx/fcgi_cache/7/2e/c3656dff5aa91af1a44bd0157045d2e7" failed (2: No such file or directory)
2013/05/04 17:54:25 [crit] 65304#0: unlink() "/var/nginx/fcgi_cache/7/2e/de75207502d7892cf377a3113ea552e7" failed (2: No such file or directory)
2013/05/04 17:54:25 [crit] 65304#0: unlink() "/var/nginx/fcgi_cache/7/2e/c2205e6a3df4f29eb2a568e435b2b2e7" failed (2: No such file or directory)
2013/05/04 17:54:25 [crit] 65304#0: unlink() "/var/nginx/fcgi_cache/7/2e/6ccaa4244645e508dad3d14ff73ea2e7" failed (2: No such file or directory)
2013/05/04 17:54:25 [crit] 65304#0: unlink() "/var/nginx/fcgi_cache/7/2e/76b4b811553756a2989ae40da863d2e7" failed (2: No such file or directory)
2013/05/04 17:54:25 [crit] 65304#0: unlink() "/var/nginx/fcgi_cache/7/2e/53d40a6399ba6dcf08bc0a52623932e7" failed (2: No such file or directory)
2013/05/04 17:54:25 [crit] 65304#0: unlink() "/var/nginx/fcgi_cache/7/2e/68ff8b00492991a2e3ba5ad7420d42e7" failed (2: No such file or directory)
2013/05/04 17:54:25 [crit] 65304#0: unlink() "/var/nginx/fcgi_cache/7/2e/19c079c9a1e0bcacb697af123d47f2e7" failed (2: No such file or directory)
On Sat, May 04, 2013 at 07:08:55PM -0400, Jim O. wrote:
[…]
I have just seen a similar situation using the fastcgi cache. In my case I am using the same cache (but only one cache) for several server/location blocks. The system is a fairly basic nginx setup with four upstream fastcgi servers and ip_hash. The returned content is cached locally by nginx. The cache is rather large, but I wouldn’t think this would be the cause.
The other server/location blocks are pretty much identical insofar as fastcgi and cache are concerned.
When I upgraded nginx using the “on the fly” binary upgrade method, I saw almost 400,000 lines in the error log that looked like this:
2013/05/04 17:54:25 [crit] 65304#0: unlink() "/var/nginx/fcgi_cache/7/2e/899bc269a74afe6e0ad574eacde4e2e7" failed (2: No such file or directory)
[…]
After a binary upgrade there are two cache zones: one in the old nginx, and another one in the new nginx (much like in the originally posted configuration). This may cause such errors if e.g. a cache file is removed by the old nginx, and the new nginx then fails to remove the same file shortly afterwards.
The 400k lines are a bit too many, though. You may want to check that the cache wasn’t just removed by some (package?) script during the upgrade process. Alternatively, it might indicate that you let the old and new processes coexist for a long time.
On the other hand, as discussed many times, such errors are more or less harmless as long as it’s clear what caused the cache files to be removed. At worst they indicate that the information in a cache zone isn’t correct and max_size might not be maintained properly, and eventually nginx will self-heal the cache zone. It probably should be logged at the [error] or even [warn] level instead.
On Mon, May 06, 2013 at 09:01:45AM -0400, Jim O. wrote:
[…] in the two minutes after, access log entries show the expected ratio of “HIT” and “MISS” entries, which further supports your point below that these are harmless (although I don’t really believe that I have a cause).
I’m not sure what you mean by a “long time”, but all of these entries are time stamped over roughly two and a half minutes.
Is it ok in your setup that 400k cache items are removed/expired
from cache in two minutes? If yes, then it’s probably ok.
On the other hand, as discussed many times, such errors are more or less harmless as long as it’s clear what caused the cache files to be removed. At worst they indicate that the information in a cache zone isn’t correct and max_size might not be maintained properly, and eventually nginx will self-heal the cache zone. It probably should be logged at the [error] or even [warn] level instead.
Why would max_size not be maintained properly? Isn’t that the responsibility of the cache manager process? Are there known issues/bugs?
The cache manager process uses the same shared memory zone to maintain max_size. If nginx thinks a cache file is there, but the file was in fact already deleted (which is why the alerts in question appear), the total size of the cache as recorded in the shared memory will be incorrect. As a result, the cache manager will delete some extra files to keep the (incorrect) size under max_size.
In the worst case, the cache size will be correct again once the inactive= time has passed after the cache files were deleted.
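For reference, the parameters involved all live on the cache path directive quoted earlier in the thread; the values below simply mirror that configuration, and the comments are an interpretation of the explanation above.

# keys_zone=one:512m  - shared memory zone holding cache keys and per-entry metadata,
#                       including the sizes used to track the total cache size
# max_size=250g       - enforced by the cache manager using the sizes recorded in that zone
# inactive=24h        - entries not accessed for this long are evicted by the cache manager
fastcgi_cache_path /var/nginx/fcgi_cache levels=1:2
                   keys_zone=one:512m max_size=250g inactive=24h;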
After a binary upgrade there are two cache zones: one in the old nginx, and another one in the new nginx (much like in the originally posted configuration). This may cause such errors if e.g. a cache file is removed by the old nginx, and the new nginx then fails to remove the same file shortly afterwards.
The 400k lines are a bit too many, though. You may want to check that the cache wasn’t just removed by some (package?) script during the upgrade process. Alternatively, it might indicate that you let the old and new processes coexist for a long time.
I hadn’t considered that there are two zones during that short time. Thanks for pointing that out.
To my knowledge, there are no scripts or packages which remove files from the cache, or the entire cache. A couple of minutes after this occurred there were a bit under 1.4 million items in the cache, and it was “full” at 250 GB. I did look in a few sub-directories at the time, and most of the items were time stamped from before this started, so clearly the entire cache was not removed. During the time period these entries were made in the error log, and in the two minutes after, access log entries show the expected ratio of “HIT” and “MISS” entries, which further supports your point below that these are harmless (although I don’t really believe that I have a cause).
I’m not sure what you mean by a “long time”, but all of these entries are time stamped over roughly two and a half minutes.
On the other hand, as discussed many times, such errors are more or less harmless as long as it’s clear what caused the cache files to be removed. At worst they indicate that the information in a cache zone isn’t correct and max_size might not be maintained properly, and eventually nginx will self-heal the cache zone. It probably should be logged at the [error] or even [warn] level instead.
Why would max_size not be maintained properly? Isn’t that the responsibility of the cache manager process? Are there known issues/bugs?
Is it ok in your setup that 400k cache items are removed/expired
from cache in two minutes? If yes, then it’s probably ok.
No, that is way more than expected. The box handles an average of 300-500 requests/second during peak hours, spiking around 800-900, so that would be at most around 150,000 requests in three minutes. Even if all 150,000 requests were cacheable and were all cache misses (resulting in them all expiring at the same time in the future), that could not explain all of those items. FWIW, this upgrade was done on a weekend. Peak times are “business hours” in Europe and North America. The box was relatively slow at that time.
Why would max_size not be maintained properly? Isn’t that the responsibility of the cache manager process? Are there known issues/bugs?
[…]
OK, that makes sense and is what I would expect. I’m still troubled by how large the discrepancy in the number of items was.
I will watch and see what happens the next time I upgrade. I’ll look at
how many items are in the cache directory before and after, as well as
the total size, which was on the mark after the upgrade this time, but
perhaps not before.
–
Jim O.