Proxy_cache

Hi
Is there a way to know if a file has been cached/is currently being
downloaded to cache without requesting the file?
I.e. can I query the cache table somehow?
I am trying to stop new files from being written to disk multiple times when
they are requested more than once at the same time (currently a new file
can be written to disk tens of times at once, which kills the server).

Many thanks as always

On Sunday, April 24, 2011, Richard K. [email protected]
wrote:

Is there a way to know if a file has been cached/is currently being downloaded
to cache without requesting the file?
I.e. can I query the cache table somehow?
I am trying to stop new files being written multiple times

proxy_cache_use_stale updating;
http://wiki.nginx.org/HttpProxyModule#proxy_cache_use_stale
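
For context, a minimal sketch of where that directive sits in a typical proxy-cache setup; the cache path, zone name, and origin host below are placeholders, not anything from your config:

    # http{} level: define the cache (path, zone name, and sizes are placeholders)
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=video_cache:10m
                     max_size=10g inactive=60m;

    server {
        listen 80;

        location / {
            proxy_pass http://origin.example.com;
            proxy_cache video_cache;
            proxy_cache_valid 200 60m;

            # serve the stale copy while a single request refreshes the entry,
            # instead of sending every client to the upstream
            proxy_cache_use_stale updating error timeout;
        }
    }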


RPM

Thanks, I already set “proxy_cache_use_stale updating”
But what if the file is completely new…? There is no stale file to
serve.

Hello!

On Mon, Apr 25, 2011 at 02:22:23AM +0000, Richard K. wrote:

Thanks, I already set “proxy_cache_use_stale updating”
But what if the file is completely new…? There is no stale file to serve.

Currently, there is no good way to handle this. The solution is
usually called “busy locks” (search the list archives for details),
but it’s not ready yet.

Maxim D.

On Sun, Apr 24, 2011 at 9:22 PM, Richard K.
[email protected] wrote:

Thanks, I already set “proxy_cache_use_stale updating”
But what if the file is completely new…? There is no stale file to serve.

Ah, I see… the first request for a new file would put it into the
updating state, but there would be no stale version to serve.

I’m not sure what happens in that case, but I would suspect that all
of those requests go to the back end. This should be a very
short-lived condition, though. In the majority of cases we’re
talking milliseconds, unless the files are really large or the
back-end is really slow.

Say you had a 50 MB video file and your back-end was something like
Amazon S3… I could see many, many requests for that file coming in
while the first was still processing. I’m not sure what the
best thing to do in that case would be.

The options, I think, would be:

  1. return a 404 (or some temporary error code) until the cache is
    primed (that doesn’t seem like good default behavior)
  2. block all other requests until the first is finished (also seems
    problematic, especially if the first request is taking forever)
  3. pass all requests to the back-end until there is a valid cache entry

I suspect nginx chooses option #3. Are you saying that you want to do
#2? Or something else entirely?

Varnish seems to do #2 by default:
http://www.varnish-cache.org/docs/2.1/tutorial/handling_misbehaving_servers.html


RPM

The ideal situation is to initiate one request to the back-end, then for each
additional request start sending the (still incomplete) cached file at
the same rate it’s being downloaded into the cache (I cache after just 1
hit).

I’m talking about video files of 400-500 MB crossing the Atlantic… not fast at
all, and it can get seriously congested once it starts fetching the same file
over and over in this “first cache” syndrome.

Anything I can do to help/test, please let me know :)
Keep up the good work

Sent from my iPhone

Hehe, I actually came to the same conclusion… except that I’m
running Varnish in front of nginx.
Interestingly, Varnish has added a feature in their latest
development version that streams and caches at the same time, but it
only does it for the first request (subsequent requests still wait until
the cache entry has been completed)…

On Mon, Apr 25, 2011 at 3:13 PM, Richard K.
[email protected] wrote:

The ideal situation is to initiate one request to the back-end, then for each additional
request start sending the (still incomplete) cached file at the same rate it’s
being downloaded into the cache (I cache after just 1 hit).

I’m talking about video files of 400-500 MB crossing the Atlantic… not fast at all, and
it can get seriously congested once it starts fetching the same file over and over in
this “first cache” syndrome.

Anything I can do to help/test, please let me know :)
Keep up the good work

If this is an immediate problem (i.e. it is killing your monthly
bandwidth bill), I would suggest setting up Varnish behind nginx and
adding a specific location for the video files in question. It will give
you behavior closer to what you need (only making one request for the
file over the slow link). You could then choose to expand Varnish
usage to other files after careful testing. This will complicate your
stack, but such is life. Varnish just meets your particular use case
better at this point.
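
As a rough illustration of that split (the port, Host header handling, and file extensions here are assumptions, not from your setup), the nginx side could be as small as:

    # only the large video files go through the local Varnish instance,
    # which then makes a single request per object to the slow origin
    location ~* \.(mp4|flv)$ {
        proxy_pass http://127.0.0.1:6081;   # assumed local Varnish listen port
        proxy_set_header Host $host;
    }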

I’m no C developer, just an nginx user, but I suspect any change in
this area would be a lot of work and slow in coming for nginx. I think
it might be possible to do something all-nginx by having nginx proxy
to itself for these files, and using something like
http://wiki.nginx.org/HttpLimitReqModule on the “backend” nginx server
which then proxies to your real origin… but it would be kind of
hackish.
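
A very rough, untested sketch of that self-proxy idea, with made-up zone names, ports, and rates (all at http{} level in a real config, and assuming a proxy_cache_path with keys_zone=video_cache as in the earlier sketch):

    # throttle by request URI so a burst of requests for the same new file
    # trickles through to the origin instead of hitting it all at once
    limit_req_zone $uri zone=perfile:10m rate=1r/s;

    # front server: caches, and proxies to the second nginx server below
    server {
        listen 80;

        location /videos/ {
            proxy_cache video_cache;
            proxy_cache_valid 200 60m;
            proxy_cache_use_stale updating;
            proxy_pass http://127.0.0.1:8080;
        }
    }

    # "backend" nginx server: applies the per-URI limit before requests
    # cross the slow link to the real origin
    server {
        listen 127.0.0.1:8080;

        location /videos/ {
            limit_req zone=perfile burst=5;
            proxy_pass http://origin.example.com;
        }
    }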


RPM

On 25 Apr 2011 21h13 WEST, [email protected] wrote:

Just thinking out loud. Perhaps you can avoid using Varnish altogether
and stay purely on nginx by using agentzh’s Embedded Lua Module.

Just an idea.

— appa

I already use it for some things…
But how would you use it to fix this problem? It can do a lot of
things, lol…