Trying to configure an origin pull CDN-like reverse proxy

Hello, I’m hosting a group of WordPress blogs with about 200k visits and
millions of hits per day. MySQL + PHP live on one server (a beefy VPS) and I
placed a reverse proxy in front of it to cache most of the requests.

Now I want to offload all the static files to a third server, taking
advantage of a feature of common WordPress cache plugins that rewrites
static file URLs for origin-pull CDN services. This way, an original URL
on the main site is rewritten as
http://cdn.url.com/wp-content/uploads/photo.jpg, and this server requests
the file from the original server, caches it and then serves it directly
for the duration of the 1st server’s Expires header/directive.

I thought it would be easy to use the proxy_* features, but I’m hitting a
wall and I can’t find an applicable tutorial/article anywhere. Would
somebody have any advice on how to do this? This is the basic behavior I’m
after:

  • Client requests static file cdn.blog.com/dir/photo.jpg
  • cdn.blog.com looks for the file in its cache
  • If the cache has it, check with the origin or revalidate according to
    the origin’s headers (this is internal, I know).
  • If the cache doesn’t have it, request it from
    www.blog.com/dir/photo.jpg, cache it and serve it.
  • Preferably, allow for this to be done for many sites/domains, acting
    as a CDN server for many sites.

This is my conf:
The cache zones in otherwise default nginx.conf and before including
conf.d/*.conf (I’m on CentOS 6.3 with nginx 1.0.15 from EPEL)

proxy_cache_path /var/www/cache/images levels=1:2
keys_zone=images:200m
max_size=10g inactive=3d;

proxy_cache_path /var/www/cache/scripts levels=1:2
keys_zone=scripts:50m
max_size=10g inactive=3d;

proxy_cache_path /var/www/cache/pages levels=1:2
keys_zone=pages:200m
max_size=10g inactive=3d;

And this is the individual server config on conf.d/server1.conf

upstream backend_cdn.blog.com {
ip_hash;
server 333.333.333.333;
}

server {
listen 80;
server_name cdn.blog.com;
access_log off;

# Set proxy headers for the passthrough

proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

# Let the Set-Cookie and Cache-Control headers through.

proxy_pass_header Set-Cookie;
proxy_pass_header Cache-Control;
proxy_pass_header Expires;

# Fallback to stale cache on certain errors.
# 503 is deliberately missing, if we’re down for maintenance
# we want the page to display.

proxy_cache_use_stale error
timeout
invalid_header
updating
http_500
http_502
http_504
http_404;

# Set the proxy cache key

set $cache_key $scheme$host$uri$is_args$args;

location / {
proxy_pass http://backend_$host;
proxy_cache pages;
proxy_cache_key $cache_key;
proxy_cache_valid 15m; # 200, 301 and 302 will be cached.

# Two rules that disable caching for logged-in users.
proxy_cache_bypass $wordpress_auth; # Do not serve response from cache.
proxy_no_cache $wordpress_auth; # Do not cache the response.

add_header X-Cache $upstream_cache_status;
}

location ~* \.(png|jpg|jpeg|gif|ico|swf|flv|mov|mpg|mp3)$ {
expires max;
log_not_found off;
proxy_pass http://backend_$host;
proxy_cache images;
proxy_cache_key $cache_key;
}

location ~* \.(css|js|html|htm)$ {
expires 7d;
log_not_found off;
proxy_pass http://backend_$host;
proxy_cache scripts;
proxy_cache_key $cache_key;
}
}

With this configuration, whenever I call a static file such as
http://cdn.blog.com/wp-includes/js/prototype.js I end up being redirected to
http://www.blog.com/wp-includes/js/prototype.js. I’ve tried many things,
like setting the Host header to various values or adding $uri to the end of
the proxy_pass directives, to no avail. One thing to notice is that the
333.333.333.333 server only responds to www.blog.com, not cdn.blog.com.

Do I need a root directive in server1.conf?

I’m running in circles, any help will be much appreciated.

Thanks in advance,
Cachito Espinoza

Posted at Nginx Forum:

On Sat, Nov 03, 2012 at 12:16:46AM -0400, cachito wrote:

Hi there,

All untested by me, but…

> • Preferably, allow for this to be done for many sites/domains, acting as a
>   CDN server for many sites.

So far, it looks like a straightforward caching reverse proxy setup. I’m
not quite sure what the last point means – but one server{} block per
site should work.
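
As an untested sketch of the "one server{} block per site" idea: the site names blog1.com and blog2.com, the upstream names and the 192.0.2.x origin addresses below are placeholders invented for illustration, and the "images" zone is the one already declared with proxy_cache_path. Because no proxy_cache_valid is given here, how long a response is cached should follow the origin's Expires/Cache-Control headers.

```nginx
# Hypothetical origins; replace with the real backend addresses.
upstream origin_blog1 {
    server 192.0.2.10;
}

upstream origin_blog2 {
    server 192.0.2.20;
}

# One server{} block per CDN hostname.
server {
    listen 80;
    server_name cdn.blog1.com;

    location / {
        proxy_pass http://origin_blog1;
        # Ask the origin for the hostname it actually serves.
        proxy_set_header Host www.blog1.com;
        proxy_cache images;
        proxy_cache_key $scheme$host$uri$is_args$args;
    }
}

server {
    listen 80;
    server_name cdn.blog2.com;

    location / {
        proxy_pass http://origin_blog2;
        proxy_set_header Host www.blog2.com;
        proxy_cache images;
        proxy_cache_key $scheme$host$uri$is_args$args;
    }
}
```

Since $host is part of the cache key, both sites can share a cache zone without their entries colliding.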

> proxy_set_header Host $host;

$host here is probably “cdn.blog.com”.

What happens if you change this to “proxy_set_header Host www.blog.com;”?
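
Untested, but the change could look roughly like the following. It reuses the upstream name and the "scripts" zone from the posted config, and it names the upstream literally instead of using backend_$host purely to keep the sketch simple. The Host line is set at server level next to the other proxy_set_header lines, because a proxy_set_header placed inside a location stops the server-level ones from being inherited there.

```nginx
server {
    listen 80;
    server_name cdn.blog.com;

    # Send the name the origin answers to, instead of $host (cdn.blog.com).
    proxy_set_header Host www.blog.com;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

    location ~* \.(css|js|html|htm)$ {
        expires 7d;
        log_not_found off;
        proxy_pass http://backend_cdn.blog.com;
        proxy_cache scripts;
        # The cache key still uses the client-facing host name.
        proxy_cache_key $scheme$host$uri$is_args$args;
    }
}
```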

> location ~* \.(css|js|html|htm)$ {
> expires 7d;
> log_not_found off;
> proxy_pass http://backend_$host;
> proxy_cache scripts;
> proxy_cache_key $cache_key;
> }
>
> With this configuration, whenever I call a static file such as
> http://cdn.blog.com/wp-includes/js/prototype.js I end up being redirected to
> http://www.blog.com/wp-includes/js/prototype.js. I’ve tried many things,
> like setting the Host header to various values or adding $uri to the end of
> the proxy_pass directives, to no avail. One thing to notice is that the
> 333.333.333.333 server only responds to www.blog.com, not cdn.blog.com.

What is the output of

curl -i -0 -H 'Host: cdn.blog.com' http://333.333.333.333/wp-includes/js/prototype.js

? That is approximately what nginx will do. (You can add the extra
proxy_set_header headers there, if you think it will make a difference.)

My guess is that the 333.333.333.333 server returns the http redirect,
and nginx is correct in passing that on to the client.

The nginx log files should show more details.
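
The posted config has "access_log off;", so there may be little to look at. One possible way to get more detail, as an untested sketch: the format name upstream_debug and the log file path are made up for illustration, while the variables themselves are standard nginx ones.

```nginx
# In the http{} block (log_format is only valid there):
log_format upstream_debug '$remote_addr "$request" $status '
                          'host=$host upstream=$upstream_addr '
                          'upstream_status=$upstream_status '
                          'cache=$upstream_cache_status '
                          'location=$sent_http_location';

# In the cdn.blog.com server{} block, replacing "access_log off;":
access_log /var/log/nginx/cdn.blog.com.access.log upstream_debug;
```

If upstream_status shows a 301 or 302 and location= points at www.blog.com, the redirect is coming from the backend rather than from nginx itself.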

> Do I need a root directive in server1.conf?

If you read from the filesystem, or otherwise access $document_root,
then the root directive is used.

I don’t see that needed for this request.

Good luck with it,

f

Francis D.