Forward single request to upstream server via proxy_store!

Hi,

When multiple users request the same file on the edge server via
proxy_store and the requested file has not yet been downloaded to the
edge server, nginx keeps proxying every one of those requests towards
the origin server. Because of this, the network port on the edge server
is getting saturated and the file download takes 1~2 hours. Is there a
way for nginx to forward only a single request towards the origin
server and download the requested file, while holding back the other
users and serving them only once the file has been successfully
downloaded to the edge server?

This way the incoming port (nload) on the edge server will not be saturated !!

Regards.
Shahzaib

Is there any way with nginx that I could put a hold on the subsequent
requests and proxy only a single request for the same file, in order to
prevent filling up the tmp folder? tmp keeps filling up because
multiple users are accessing the same file while it has not been
downloaded yet.

On Tuesday 23 September 2014 00:06:56 shahzaib shahzaib wrote:

Is there any way with nginx that I could put a hold on the subsequent
requests and proxy only a single request for the same file, in order to
prevent filling up the tmp folder? tmp keeps filling up because multiple
users are accessing the same file while it has not been downloaded yet.

[…]

http://nginx.org/r/proxy_cache_lock

wbr, Valentin V. Bartenev

On Tuesday 23 September 2014 19:34:23 shahzaib shahzaib wrote:

@Valentin, is proxy_cache_lock supported with proxy_store?

No. But if you’re asking, then you’re using the wrong tool.
The proxy_store feature is designed to be very simple and stupid.

To meet your needs you should use the proxy_cache directive
and its friends.

wbr, Valentin V. Bartenev
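
For reference, proxy_cache_lock is the directive that collapses concurrent cache misses into one upstream request, and it only works together with proxy_cache. A minimal sketch; the zone name "edgecache", the cache path, sizes, and the origin hostname are illustrative assumptions, not from this thread:

proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=edgecache:10m max_size=10g;

server {
    location /videos/ {
        proxy_pass http://origin.files.com;    # assumed origin upstream
        proxy_cache edgecache;
        proxy_cache_valid 200 1h;
        # only one request per cache key goes upstream; the others wait
        proxy_cache_lock on;
        proxy_cache_lock_timeout 5s;
    }
}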

But I cannot switch to proxy_cache because we’re mirroring the mp4 files
for random seeking using the mp4 module, and proxy_cache doesn’t support
random seeking. Is there a way I can use a bash script with proxy_store ?
I want the following logic to prevent duplicate downloads :-

1st user :-

client (requests test.mp4) → nginx (file does not exist) → check that
tmp.txt does not exist → create tmp.txt → download test.mp4 from the
origin → remove tmp.txt

2nd user requesting the same test.mp4 :-

client (requests test.mp4) → nginx (file does not exist) → tmp.txt
already exists (which means nginx is already downloading the file) →
redirect the user towards the origin server (keep redirecting users as
long as tmp.txt is not removed)

3rd user requesting the same test.mp4 :-

client (requests test.mp4) → nginx (file exists) → serve from the
cache.

So tmp.txt plays the main role here, preventing subsequent requests for
the same file, but I have no idea how to implement it with nginx (a
rough sketch follows this message). If only someone could point me in
the right direction. :(

Regards.
Shahzaib
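
The marker-file logic above cannot be expressed in nginx configuration alone (something external still has to create and remove tmp.txt around the download), but the redirect half could be sketched roughly as follows; the paths, the marker suffix, the @fetch name, and the origin host are all hypothetical:

location ~ \.mp4$ {
    root /var/www/html/tunefiles;
    # hypothetical marker file: while a download is in progress,
    # send clients to the origin instead of starting another copy
    if (-f /var/www/html/tunefiles$uri.tmp) {
        return 302 http://origin.files.com$request_uri;
    }
    try_files $uri @fetch;
}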


But I cannot switch to proxy_cache because we’re mirroring the mp4 files
for random seeking using the mp4 module, and proxy_cache doesn’t support
random seeking. Is there a way I can use a bash script with proxy_store ?
I want the following logic to prevent duplicate downloads :-

You can try to put Varnish ( https://www.varnish-cache.org ) between
your proxy_store and content server. It supports request coalescing.

p.s. a branch of the 3.x tree and the new 4.x even has stream support.

rr

@RR, could you guide me a bit on it or point me to some guide to start
with? I have worked with Varnish for PHP caching, so I have basic
knowledge of it, but I am just not getting how to make it work with
proxy_store. :(

Depending on your needs (for example SSL) you can put varnish in
different places in the setup.

If you use SSL (which varnish itself doesn’t support) you can use your
proxy_store server as an SSL offloader:

  1. [client] ←→ [nginx proxy_store server] ←→ [varnish] ←→
     [content_server]

… in this case, when multiple requests land on nginx proxy_store and
the file doesn’t exist locally, they are forwarded to varnish and
combined into a single request to the content server.

A simplistic/generic nginx config:

location / {
    error_page 404 = @store;
}

location @store {
    internal;
    proxy_pass http://imgstore;
    proxy_store on;
}

varnish config:

backend default {
    .host = "content_server.ip";
}

sub vcl_recv {
    set req.backend_hint = default;
}

Obviously add whatever else you need (like forwarded-for headers to pass
the real client IP, cache expire times, etc).
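
For instance, passing the real client address from the proxy_store server could look like this; the proxy_set_header directives are standard nginx, while the header names are only what a content server typically expects:

location @store {
    internal;
    proxy_pass http://imgstore;
    proxy_store on;
    # forward the original host and client IP to varnish/backend
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}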

  2. In case you don’t use SSL:

[client] ←→ [varnish] ←→ [content_server]

(optionally you put nginx or some other software like stud or pound in
front of varnish as an SSL offloader; personally I use shrpx from
Spdylay ( https://github.com/tatsuhiro-t/spdylay ))

Then the generic varnish config would look basically the same:

backend default {
    .host = "content_server.ip";
}

sub vcl_recv {
    set req.backend_hint = default;
}

sub vcl_backend_response {
    set beresp.do_stream = true;
}

Hope that helps.

rr

@RR, that’s great. Sure it will help me. I am starting to work with it
in a local environment and will get back to you once I make progress :)

Thanks a lot for writing the sample configs for me !!

@RR, I’ve prepared the local environment with the following structure :-

client → nginx (edge) → varnish → backend (origin)

When I tested this method, i.e. :-

3 clients requested test.mp4 (file size is 4 MB) → nginx → file does
not exist (proxy_store) → varnish → backend (fetch the file from
origin).

When nginx proxied these three requests towards varnish, instead of
being filled with 4 MB the tmp dir was filled with 12 MB, which means
nginx was proxying all three requests towards the varnish server and
creating tmp files as long as the file was not downloaded. (The method
failed.)

However, putting varnish in front of nginx solved this issue:

3 clients requested test.mp4 (file size is 4 MB) → varnish (proxying
all requests for mp4, jpg) → nginx (fetch the file from origin).

This time the tmp dir was filled with only 4 MB, which means varnish
combined those 3 subsequent requests into 1.


Now varnish also has a flaw: it sends subsequent requests for the same
file towards nginx, i.e.

The 1st user requested the file http://edge.files.com/videos/test.mp4.
During the download of this first requested file, the second user
requested the same file but with random seeking:
http://edge.files.com/videos/test.mp4?start=33 . Now, as the request URI
has changed, these are two different requests for the same file as far
as varnish is concerned, and again the nginx tmp directory was filled
with 8 MB instead of 4, which means nginx downloaded the full file
twice. So random seeking will only work once the file is cached locally;
otherwise nginx will keep creating tmp files for every random seek.

I have two questions now :-

  1. Is there a way to prevent duplicate downloads for random seeks while
     the file is not downloaded yet? Note :- we cannot disable the mp4
     module.
  2. Should nginx in front of varnish never work as expected, or am I
     doing something wrong?

Following are the existing varnish-in-front-of-nginx configs. Please let
me know if something needs to be fixed :-

varnish config :-

backend origin002 {
    .host = "127.0.0.1";
    .port = "8080";
}

backend origin003 {
    .host = "127.0.0.1";
    .port = "8080";
}

# assumed definition: origin004 is referenced in vcl_recv below but its
# backend block was missing from the posted config
backend origin004 {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {

    if ( req.http.host == "origin002.files.com" ) {
        set req.backend_hint = origin002;
    } elsif ( req.http.host == "origin003.files.com" ) {
        set req.backend_hint = origin003;
    } elsif ( req.http.host == "origin004.files.com" ) {
        set req.backend_hint = origin004;
    }

}

sub vcl_backend_response {

    if (bereq.url ~ "^[^?]*\.(mp4|jpeg|jpg)(\?.*)?$") {
        set beresp.do_stream = true;
        return (deliver);
    }
    set beresp.grace = 1m;

    return (deliver);

}

sub vcl_deliver {

}


Nginx config :-

server {

    listen       127.0.0.1:8080;
    server_name  origin002.files.com;
    root         /var/www/html/tunefiles;

    location ~ \.(mp4|jpeg|jpg)$ {
        root /var/www/html/tunefiles;
        mp4;
        error_page 404 = @fetch;
    }

    location ~ \.(php)$ {
        proxy_pass http://origin002.files.com:80;
    }

    location @fetch {
        internal;
        proxy_max_temp_file_size 0;
        proxy_pass http://content.files.com:80$uri;
        proxy_store on;
        proxy_store_access user:rw group:rw all:r;
        root /var/www/html/tunefiles;
    }

}

I can also send the configs that were configured for nginx in front of
varnish (which didn’t resolve my issue).

BTW, I am using malloc storage instead of file in varnish.

Thanks !!
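
For context, malloc storage is chosen on the varnishd command line rather than in the VCL; a typical invocation looks like this (the listen address and cache size are illustrative):

varnishd -a :80 -f /etc/varnish/default.vcl -s malloc,1G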

@RR, thanks a lot for the explanation and examples. It really helped me
:)

set req.url = regsub(req.url, "\?.*", "");

It will also prevent users from seeking the video, because the arguments
after "?" will be removed whenever a user tries to seek within the video
stream, won’t it?

unset req.http.Cookie;
unset req.http.Accept-Encoding;
unset req.http.Cache-Control;

I’ll apply it right at the top of vcl_recv.
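
Put together, the top of vcl_recv would then look roughly like this (a sketch combining the suggestions above; note the escaped "?" in the regex):

sub vcl_recv {
    # normalise the request so varnish caches one version of each object
    unset req.http.Cookie;
    unset req.http.Accept-Encoding;
    unset req.http.Cache-Control;

    # strip everything after "?" before forwarding to the backend
    set req.url = regsub(req.url, "\?.*", "");
}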

If you insist on using proxy_store I would probably also add
proxy_ignore_client_abort on;

Well, only proxy_store is able to fulfill my requirements, which is the
reason I’ll have to stick with it.

I am a bit confused about varnish. Actually, I don’t need any kind of
caching within varnish, as nginx is already doing it via proxy_store. I
just need varnish to merge the subsequent requests into 1 and forward it
to nginx, and I think varnish is doing that pretty well. Nevertheless, I
am confused: will malloc caching have any odd effect on the stream
behaviour? Following is the curl request for a video file on the caching
server; the Age header is also there :-

curl -I

HTTP/1.1 200 OK
Date: Thu, 25 Sep 2014 18:26:24 GMT
Content-Type: video/mp4
Last-Modified: Tue, 23 Sep 2014 08:36:11 GMT
ETag: “542130fb-5cd4456”
Age: 5
Content-Length: 97338454
Connection: keep-alive

Thanks !!
Shahzaib

It will also prevent users from seeking the video, because the arguments
after "?" will be removed whenever a user tries to seek within the video
stream, won’t it?

In general it shouldn’t, since the ‘?start=’ is handled by nginx and not
varnish, but I’m not exactly sure how the mp4 module of nginx handles a
proxied request. You have to test it.

In the worst-case scenario, imho, only the first request (before landing
on the proxy_store server) will “fail”, e.g. play from the beginning
instead of from the time set.

Well, only proxy_store is able to fulfill my requirements, which is the
reason I’ll have to stick with it.

Well, you can try to use varnish as the streamer; you just need some
(web) player supporting byte-range requests for the seeking
( http://flash.flowplayer.org/plugins/streaming/pseudostreaming.html ).

I am a bit confused about varnish. Actually, I don’t need any kind of
caching within varnish, as nginx is already doing it via proxy_store. I
just need varnish to merge the subsequent requests into 1 and forward it
to nginx, and I think varnish is doing that pretty well. Nevertheless, I
am confused: will malloc caching have any odd effect on the stream
behaviour?

You can try to pass the request without caching:

sub vcl_fetch {
    return (pass);
}

(maybe even do it in the vcl_recv stage, but again I’m not exactly sure
whether request coalescing still works in that case).

rr

In general it shouldn’t, since the ‘?start=’ is handled by nginx and not
varnish, but I’m not exactly sure how the mp4 module of nginx handles a
proxied request. You have to test it.

Sure, I’ll test it.

sub vcl_fetch {
    return (pass);
}

You’re right about return(pass); coalescing doesn’t work with pass.

In the worst-case scenario, imho, only the first request (before landing
on the proxy_store server) will “fail”, e.g. play from the beginning
instead of from the time set.

Well, I am facing an even worse scenario: the first request always fails
to stream and the (HTML5) player keeps loading.

I’m already checking whether there’s some config issue with varnish or
whether this is the default behaviour (which I don’t think it is).

Thanks @RR

Shahzaib

3 clients requested test.mp4 (file size is 4 MB) → nginx → file does
not exist (proxy_store) → varnish → backend (fetch the file from
origin). When nginx proxied these three requests towards varnish,
instead of being filled with 4 MB the tmp dir was filled with 12 MB,
which means nginx was proxying all three requests towards the varnish
server and creating tmp files as long as the file was not downloaded.
(The method failed.)

That is expected; this setup only “guards” the content server.

Now varnish also has a flaw: it sends subsequent requests for the same
file towards nginx, i.e.

It’s not really a flaw but default behaviour (different URLs mean
different content/cacheable objects), but of course you can implement
your own scenario:

By adding:

sub vcl_recv {
    set req.url = regsub(req.url, "\?.*", "");
}

this will remove all the arguments after the "?" from the URI when
forwarding to the content backend.

For static content I usually also add something like:

unset req.http.Cookie;
unset req.http.Accept-Encoding;
unset req.http.Cache-Control;

to normalise the request, so varnish doesn’t try to cache different
versions of the same object.

If you insist on using proxy_store I would probably also add
proxy_ignore_client_abort on; (
http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_ignore_client_abort
) to the nginx configuration, so the requests don’t get repeated if the
client closes/aborts the request early, etc.
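
Applied to the nginx config posted earlier in the thread, that would mean one extra line in the @fetch block, roughly:

location @fetch {
    internal;
    proxy_max_temp_file_size 0;
    # keep fetching from the origin even if the client disconnects,
    # so proxy_store can still finish writing the local copy
    proxy_ignore_client_abort on;
    proxy_pass http://content.files.com:80$uri;
    proxy_store on;
    proxy_store_access user:rw group:rw all:r;
    root /var/www/html/tunefiles;
}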

rr

Also, removing the arguments after "?" also disabled pseudo-streaming,
so I think I can’t apply this method !!

@RR, I would like to inform you that the issue regarding the failed
stream for the 1st request is solved. Varnish was removing the
Content-Length header for the 1st request. Enabling ESI processing has
resolved this issue:

set beresp.do_esi = true;
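
For completeness, that line belongs in vcl_backend_response; in the config posted earlier it would sit next to do_stream, e.g. (same URL match assumed):

sub vcl_backend_response {
    if (bereq.url ~ "^[^?]*\.(mp4|jpeg|jpg)(\?.*)?$") {
        set beresp.do_stream = true;
        set beresp.do_esi = true;   # the fix described above
        return (deliver);
    }
    set beresp.grace = 1m;
    return (deliver);
}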

thanks !!
