.htaccess issues

luislavena · February 10, 2012, 6:08pm

I’m starting to use nginx as a proxy/cache (Apache back end), and I have
a problem with directories that use .htaccess to restrict access by IP
(allow,deny).

The first time an allowed IP accesses this area (e.g. /downloads), the
object is cached, but when an unauthorized IP then requests the same
directory, it gets the object from the cache.

Is there a way to deal with that?

Regards,

Guilherme

guilherme · February 10, 2012, 6:33pm

On 10 Feb 2012 17h08 WET, [email protected] wrote:

> I’m starting to use nginx as a proxy/cache (apache back-end), and
> I’ve a problem regarding directories that use .htaccess to restrict
> access by IP (allow,deny).
>
> The first time an allowed IP accesses this area (i.e. /downloads),
> the object is cached, but when an unauthorized IP accesses the same
> dir, it gets the object from cache.
>
> Is there a way to deal with that?

At the http level:

geo $not_allowed {
    default         1;
    127.0.0.1       0;
    192.168.45.0/24 0;
}

Then add in the cache config:

proxy_cache_bypass $not_allowed;
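
For context, here is a minimal sketch of how those two pieces might sit in a complete configuration. The zone name sketch_cache, the cache path, the upstream address and the /downloads/ location are assumptions made for illustration and do not come from this thread. The proxy_no_cache line is also an addition, so that responses fetched for non-whitelisted clients are not stored. The events block and other global settings are omitted.

```nginx
http {
    # Hypothetical cache zone and path; adjust to the real setup.
    proxy_cache_path /var/cache/nginx/sketch keys_zone=sketch_cache:10m;

    # 0 = client is on the whitelist, 1 = everyone else.
    geo $not_allowed {
        default         1;
        127.0.0.1       0;
        192.168.45.0/24 0;
    }

    upstream backend {
        server 127.0.0.1:8080;   # assumed address of the Apache back end
    }

    server {
        listen 80;

        location /downloads/ {
            proxy_cache        sketch_cache;
            proxy_cache_valid  200 10m;

            # Non-whitelisted clients skip the cache lookup and go to Apache,
            # which then applies its own .htaccess rules.
            proxy_cache_bypass $not_allowed;
            # Assumption: also avoid storing what Apache returns to them.
            proxy_no_cache     $not_allowed;

            proxy_set_header   X-Forwarded-For $remote_addr;
            proxy_pass         http://backend;
        }
    }
}
```

Note that this only helps if the whitelist in nginx matches the one in .htaccess, and if Apache evaluates its rules against the real client address rather than the address of the proxy.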

— appa

guilherme · February 10, 2012, 6:58pm

Antonio,

I’m using Apache as a back end, and I need to keep the .htaccess files
in Apache, because this is a shared web hosting server hosting about a
thousand websites.

The problem is that once content restricted to certain IPs has been
cached, nginx serves it to everyone.

guilherme · February 10, 2012, 7:09pm

Then there is no way you can cache. Pass everything for these dirs
(/downloads) through to Apache and let .htaccess filter on
X-Forwarded-For (make sure it’s being sent by nginx).

By pure logic, the only alternative would be to set the IP whitelist
in the nginx config and check it even for cache hits.
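
A minimal sketch of the pass-through variant, assuming the restricted path is known. The upstream name backend and the /downloads/ path are placeholders, not details confirmed in this thread.

```nginx
# Fragment for a server {} block.
location /downloads/ {
    # Caching is disabled here, so every request reaches Apache and
    # .htaccess is evaluated each time.
    proxy_cache off;

    # Replace any client-supplied X-Forwarded-For with the address
    # nginx actually saw on the connection.
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
    proxy_pass http://backend;
}
```

Apache’s allow/deny rules normally look at the connecting address, which is now the nginx host, so a module such as mod_remoteip is needed on the Apache side for the header to take effect.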

Adrin Navarro

guilherme · February 10, 2012, 8:41pm

Adrin,

This would fix the problem, but I don’t know which directories have a
.htaccess file with allow/deny.

Example:

Scenario: nginx (cache/proxy) + back-end Apache

root@srv1 [~]# ls -a /home/domain/public_html/restrictedimages/
./  ../  .htaccess  image.jpg
root@srv1 [~]# cat /home/domain/public_html/restrictedimages/.htaccess
allow from x.x.x.x
deny from all

On the first access (source IP: x.x.x.x) to
http://domain.com/restrictedimages/image.jpg, nginx proxies the request
to Apache and caches the response. The problem comes with a later
request from an IP address other than x.x.x.x: nginx delivers the object
from the cache even though that IP address is not authorized, because
nginx doesn’t understand .htaccess.

I would like to bypass the cache in these cases, maybe using
proxy_cache_bypass, but I don’t know how. Any ideas?

guilherme · February 10, 2012, 7:09pm

On 10 Feb 2012 17h57 WET, [email protected] wrote:

> Antonio,
>
> I’m using apache as a back-end, and I need to keep .htaccess files
> in apache, because this is a shared web hosting server and it’s
> hosting ~ thousand websites.
>
> The problem is that nginx is serving content allowed just for some
> IPs, for everyone, after this content is cached.

Using the suggested config will fix that. It’s backend agnostic. It
supports Apache or whatever upstream you’re passing to.

Post your full config if you want a more specific suggestion.

— appa

guilherme · February 11, 2012, 2:11pm

February 10, 2012, 23:40, from Guilherme [email protected]:

> I would like to bypass cache in this cases, maybe using
> proxy_cache_bypass, but I don’t know how. Any idea?

You could use this:

proxy_cache_key $scheme$remote_addr$host$server_port$request_uri;

This would make the originating IP address ($remote_addr) part of
the cache key, so different clients would get the correct responses
from the cache just as if they were accessing the backend directly.
There’s no need to bypass the cache at all.
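
Keying on $remote_addr means one cache entry per client for every object. If the restricted paths happen to be known, a sketch like the following keeps the per-client key confined to them. The zone name sketch_cache, the upstream name backend and the /restrictedimages/ path are placeholders chosen for illustration.

```nginx
# Fragment for a server {} block; the cache zone is assumed to be
# declared elsewhere with proxy_cache_path.
location / {
    proxy_cache      sketch_cache;
    # Shared key: one cached copy per object for all clients.
    proxy_cache_key  $scheme$host$server_port$request_uri;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_pass       http://backend;
}

location /restrictedimages/ {
    proxy_cache      sketch_cache;
    # Per-client key: each client IP gets its own cached response,
    # including a cached denial if error responses are cached at all.
    proxy_cache_key  $scheme$remote_addr$host$server_port$request_uri;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_pass       http://backend;
}
```

This does not help when the restricted directories are unknown to the nginx administrator.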

Max

guilherme · February 10, 2012, 8:59pm

On 10 Feb 2012 19h40 WET, [email protected] wrote:

> I would like to bypass cache in this cases, maybe using
> proxy_cache_bypass, but I don’t know how. Any idea?

I already gave you a suggestion. You just need to use a geo directive
where you enumerate all the IPs that are allowed access.

AFAICT this fits the bill. No need to complicate it with headers
being passed to the backend.

— appa

guilherme · February 12, 2012, 4:33pm

On Fri, Feb 10, 2012 at 5:58 PM, António P. P. Almeida [email protected] wrote:

nginx mailing list
[email protected]
nginx Info Page

Antonio, the geo directive would be a great idea if I knew the IPs that
can access the website (or directory), but the application is not mine,
and the customer can change this list (in .htaccess). In that case the
IP list in nginx would be outdated.

guilherme · February 12, 2012, 4:38pm

On Fri, Feb 10, 2012 at 6:08 PM, Max [email protected] wrote:

> This would make originating IP addresses ($remote_addr) part of
> the cache key, so different clients would get the correct responses
> from the cache just as if they were accessing the backend directly,
> there’s no need to bypass the cache at all.
>
> Max

Max, good idea, but for the other requests, whose responses I do want
to cache, the cache will grow too fast: the same object will be cached
many times, because the IP address is part of the cache key (one cache
entry per IP).

guilherme · February 13, 2012, 6:58pm

On 12 Feb 2012 15h32 WET, [email protected] wrote:

You can use the auth_request module for that then.

http://mdounin.ru/hg/ngx_http_auth_request_module

I’ve replicated the Mercurial repo on GitHub (perusio/nginx-auth-request-module).

It involves setting up a location that proxy_pass(es) to the Apache
upstream and returns 403 if not allowed to access.

Be careful with the X-Forwarded-For header and how it’s treated on the
Apache side so that you get a real correspondence with the client,
thus making the authorization procedure reliable.

— appa

guilherme · February 12, 2012, 5:50pm

On Fri, Feb 10, 2012 at 03:08:24PM -0200, Guilherme wrote:

Hi there,

> The first time an allowed IP accesses this area (i.e. /downloads), the
> object is cached, but when an unauthorized IP accesses the same dir, it
> gets the object from cache.
>
> Is there a way to deal with that?

Unfortunately, the only answer is “fix your application”.

If you (apache) want the content not to be cached, you must set the
“please do not cache” http headers.

Any proxy between the client and the server can cache the content, and
serve it to other clients, unless the origin server marks it
uncacheable. This isn’t nginx-specific.

See, for example, the mod_cache and Caching Guide pages of the Apache
HTTP Server 2.2 documentation for Apache’s notes on the same topic.

If you can’t configure apache to correctly declare what is and isn’t
cacheable, then you must decide yourself which responses nginx should
(or should not) cache. After you’ve decided which they are, you can
configure nginx to match.

If you can’t reliably tell nginx what is cacheable, the only safe option
is to cache nothing in nginx. But you’ll (probably) have to address the
same issue for any proxy between the client and the server.

Good luck with it,

f

Francis D. [email protected]

guilherme · February 14, 2012, 3:15pm

February 13, 2012, 21:58, from António P. P. Almeida [email protected]:

> It involves setting up a location that proxy_pass(es) to the Apache
> upstream and returns 403 if not allowed to access.

Maxim’s auth_request module is great, but AFAIK, it doesn’t
support caching, which makes it unsuited to the OP’s
situation because the OP wants to cache large files from
the backend server(s).

The access_by_lua solution I proposed, on the other hand,
does make it possible to cache the content, and if one should
want, even the IP-based authorization information in a
separate cache zone.

Max

guilherme · February 14, 2012, 3:46pm

On 13 Feb 2012 21h13 WET, [email protected] wrote:

> I’ve replicated the mercurial repo on github:
> perusio/nginx-auth-request-module
>
> The access_by_lua solution I proposed, on the other hand,
> does make it possible to cache the content, and if one should
> want, even the IP-based authorization information in a
> separate cache zone.

AFAIK the authorization occurs well before any content is served. What
does that have to do with caching?

Using access_by_lua with a subrequest like you suggested is, AFAICT,
equivalent to using auth_request.

IIRC the OP wanted first to check if a given client could access a
certain file. If it can, then it gets the content from the cache or
whatever he decides.

— appa

guilherme · February 13, 2012, 7:10pm

February 12, 2012, 19:37, from Guilherme [email protected]:

> Max, good idea, but in the other requests, that I want to cache
> responses, the cache size will grow too fast, because the same object
> will be cached a lot of times, because the IP address is in the cache
> key (one cache entry per IP).

I suggest you recompile your nginx with the Lua module included.

Then you could use something like this:

proxy_cache_key $scheme$host$server_port$uri;

location / {
    access_by_lua '
        local res = ngx.location.capture("/test_access" .. ngx.var.request_uri)
        if res.status == ngx.HTTP_OK then
            return
        end
        if res.status == ngx.HTTP_FORBIDDEN then
            ngx.exit(res.status)
        end
        ngx.exit(ngx.HTTP_INTERNAL_SERVER_ERROR)
    ';

    proxy_set_header Host $host:$proxy_port;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_pass http://backend/;
}

location /test_access/ {
    internal;
    proxy_method HEAD;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_cache_bypass "Always bypass the cache!";
    proxy_no_cache "Never store the response in the cache!";
    proxy_pass http://backend/;
}

The access_by_lua block initiates a local non-blocking subrequest
for “/test_access/$request_uri”, which is handled by the /test_access/
location block as follows: the request method is set to HEAD instead
of the original POST or GET request in order to find out whether the
original request would be allowed or denied without the overhead of
having to transfer any files.

The X-Forwarded-For header is also reset to the originating IP address.
Any X-Forwarded-For headers set by clients are removed and replaced,
so the backend server can rely on this header for IP-based access
control. The Apache mod_remoteip module can be configured to
make sure Apache always uses the originating IP address from the
X-Forwarded-For header:

http://httpd.apache.org/docs/trunk/mod/mod_remoteip.html

The next two directives make sure that the cache is always bypassed
and that no HEAD request responses are cached because you want
to make sure you have the latest access control information. The
original request URI is then passed on to the backend (note the
trailing slash), and the response is captured in the res variable
inside the access_by_lua block. If the subrequest was completed with
the HTTP OK status code, access is allowed, so after returning from
the access_by_lua block the Host and X-Forwarded-For headers are set
and the original request is processed: first the cache is checked,
and if there is no matching entry the request is passed on to the
backend server and the response is cached under a key that lets a
single copy of a file be stored in the cache.

If the subrequest is completed with the HTTP FORBIDDEN status code
or any other error, the access_by_lua block is exited in a way that
terminates further processing and returns the status code.

There you go, thanks to the speed and non-blocking nature of Lua, you
now have a solution that causes minimal overhead by allowing you to
take full advantage of both caching and IP-based access control.
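
For readers on newer releases of the Lua module, the same check can be sketched with the access_by_lua_block form, which avoids quoting Lua inside an nginx string. This is only an illustration under assumptions: the cache zone sketch_cache and the upstream backend are placeholder names that must be defined elsewhere, and the logic is unchanged from the configuration above.

```nginx
# Fragment for the http {} context.
proxy_cache_key $scheme$host$server_port$uri;

server {
    listen 80;

    location / {
        access_by_lua_block {
            -- Ask the backend, through the internal location below,
            -- whether this client may fetch the requested URI.
            local res = ngx.location.capture("/test_access" .. ngx.var.request_uri)

            if res.status == ngx.HTTP_OK then
                return  -- allowed: continue to the cache / proxy phase
            end
            if res.status == ngx.HTTP_FORBIDDEN then
                return ngx.exit(res.status)  -- denied by the backend
            end
            -- Any other status is treated as an error, as in the original.
            return ngx.exit(ngx.HTTP_INTERNAL_SERVER_ERROR)
        }

        proxy_cache      sketch_cache;   # placeholder zone name
        proxy_set_header Host $host:$proxy_port;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_pass       http://backend/;
    }

    location /test_access/ {
        internal;
        proxy_method       HEAD;
        proxy_set_header   X-Forwarded-For $remote_addr;
        proxy_cache_bypass "Always bypass the cache!";
        proxy_no_cache     "Never store the response in the cache!";
        proxy_pass         http://backend/;
    }
}
```

As in the original, any backend status other than 200 or 403 (a 404 or a redirect, for example) is returned to the client as a 500, which may need adjusting for real traffic.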

Max

guilherme · February 15, 2012, 1:51am

February 14, 2012, 01:44, from António P. P. Almeida [email protected]:

> IIRC the OP wanted first to check if a given client could access a
> certain file. If it can, then it gets the content from the cache or
> whatever he decides.

Have you ever actually used the auth_request module? Or have you at
least read the part of the auth_request module README file where Maxim
wrote:

“Note: it is not currently possible to use proxy_cache/proxy_store (and
fastcgi_cache/fastcgi_store) for requests initiated by auth request
module.”

Let’s take the example from Maxim’s README file:

location /private/ {
    auth_request /auth;
    …
}

location = /auth {
    proxy_pass …
    proxy_pass_request_body off;
    proxy_set_header Content-Length "";
    proxy_set_header X-Original-URI $request_uri;
}

Let’s say you configure caching in the /private/ location block, and
the cache is empty. The first matching request would get passed on to
the backend server, which would send back the latest requested file,
if the request was allowed. The frontend server would then store the
file in the cache and send it back to the client, as expected.

The next matching request would again be passed on to the backend
server, which would again send back the latest requested file,
if the request was allowed, but this time the frontend server would
send back to the client NOT the LATEST file, but the OLD file from
the CACHE. The old file would remain in the cache, from where it would
keep getting sent back to clients until it expired, while each new
allowed request would cause the latest requested file to be retrieved
from the backend server and then DISCARDED.

Turning the proxy_no_cache directive on would prevent anything from
being stored in the cache, as expected.

Turning the proxy_cache_bypass directive on would cause the cache to be
bypassed, and the latest requested file to be both sent back to the
client and stored in the cache each time (as long as proxy_no_cache
wasn’t turned on), but either way you’d end up retrieving the file
from the backend server on every request, which defeats the purpose of
caching.

However, forbidden response codes from the backend server are always
correctly sent back to clients, and are never cached.

Now, let’s say you’ve given up on caching in the /private/ location
block and decided to configure caching in the /auth location block.
Again, the cache is empty. Here the first matching request passed
on from the /private/ location block would be sent on to the backend
server, which would send back the latest requested file, if the request
was allowed. The frontend server would then store this file in the
cache, but instead of sending it back to the client, it would just
TERMINATE the connection (444-style)!

The next matching request would again get passed on to the backend
server, which would again send back the latest requested file, if the
request was allowed, and in this case, the frontend server would
send back the latest requested file to the client, but ONLY if there
was an EXISTING cache entry for the request cache key! If there
was NO cache entry for the request cache key, then the requested file
would get retrieved from the backend server and stored in the cache,
if the request was allowed, but NOTHING would be sent back to the client
and the connection would be TERMINATED 444-style.

Once a file got stored in the cache, it would REMAIN in the CACHE
until it expired, while each new allowed request would cause the
latest requested file to be retrieved from the backend, sent back
to the client and DISCARDED without replacing the old file in the cache.

If there was no cache entry for a request cache key and the
proxy_no_cache directive was turned on, then each and every request
would cause the requested file to be retrieved from the backend server
and discarded, while the connection would ALWAYS be TERMINATED
444-style.

Turning the proxy_cache_bypass directive on would cause the cache to be
bypassed, and the latest requested file to be retrieved from the
backend server and stored in the cache, but nothing would be sent
back to the client, and the connection would again be terminated
444-style.

So, as you can surely see by now, using caching with the auth_request
module not only defeats the purpose of caching, but also violates
the expected functionality in serious and totally unexpected ways.
The access_by_lua solution I proposed, on the other hand, can safely
be used with caching.

Maxim, feel free to add this explanation to the auth_request module
README file.

Max

guilherme · February 15, 2012, 8:36am

On 14 Feb 2012 07h48 WET, [email protected] wrote:

> Have you ever actually used the auth_request module? Or have you at
> least read the part of the auth_request module README file where
> Maxim wrote:

location /private {

}

location /private/ {
    error_page 403 /403.html;
    auth_request /auth;
    try_files /cache?q=$uri =404; # there's a bug in 1.1.14, this won't work
}

location = /auth {
    proxy_pass …
    proxy_pass_request_body off;
    proxy_set_header Content-Length "";
    proxy_set_header X-Original-URI $request_uri;
}

location /cache {
    internal;

    # usual cache stuff

    proxy_pass http://backend$arg_q;
}

It works for me here. I can post the debug log if necessary.

— appa

guilherme · February 15, 2012, 10:49am

I’ll take a look at Lua and the auth_request module.

Thanks for the suggestions. They were helpful!

guilherme · February 15, 2012, 8:56am

On 14 Feb 2012 07h48 WET, [email protected] wrote:

> Have you ever actually used the auth_request module? Or have you at
> least read the part of the auth_request module README file where
> Maxim wrote:

Just to say that it works also with the cache on private:

location /private/ {
    error_page 403 /403.html;
    auth_request /auth;
    # usual cache stuff
    proxy_pass http://backend;
}

location = /auth {
    proxy_pass …
    proxy_pass_request_body off;
    proxy_set_header Content-Length "";
    proxy_set_header X-Original-URI $request_uri;
    # It's here that you cannot cache…
}

The previous was just an example where the cache location was really
private. It cannot be accessed directly.

I suspect the reason it cannot be cached is not only that caching would
defeat the purpose of authorization, but also that this module doesn’t
care about the request body. It only deals with the headers.

— appa