Preventing rewrite loops with "index"

Hi,
So with my first rewrite issue solved I now move closer towards the real
configuration and run into a problem with the index directive.

My location looks like this:

location ~* ^/(([A-Za-z])([A-Za-z0-9])([A-Za-z0-9])[^/]*)(/.*)?$ {
    root /web;
    set $site_path /users/$2/$3/$4/$1/htdocs;
    set $real_uri $5;
    rewrite .* $site_path$real_uri break;
}

When I request “/test/index.html” the location matches and gets properly
rewritten into a hashed form “/users/t/e/s/test/index.html”. Then the root
gets prefixed, resulting in the path “/web/users/t/e/s/test/index.html”,
which is correctly delivered by nginx. So far so good.

The problem happens when I request “/test/” instead, which should deliver
the same index.html through the index directive. That doesn’t happen though.

Looking at the log, what seems to happen is that nginx sees that
“/web/users/t/e/s/test/” is a directory and issues a new request with the
uri “/web/users/t/e/s/test/index.html”. This however matches the above
location again, resulting in another rewrite that ends with a completely
broken path and a 404.

How can I get the correct index processing for the first, correctly
rewritten path without triggering another round of location processing
that messes things up?

Regards,
Dennis

On Fri, Jan 22, 2010 at 03:06:47PM +0100, Dennis J. wrote:

> Looking at the log what seems to happen is that nginx sees that
> “/web/users/t/e/s/test/” is a directory and issues a new request with the
> uri “/web/users/t/e/s/test/index.html”. This however matches the above
> location again resulting in another rewrite that ends with a completely
> broken path and a 404.
>
> How can I get the correct index processing for the first correctly
> rewritten path without triggering another round of location processing
> messing things up?

location ~* ^/(([A-Za-z])([A-Za-z0-9])([A-Za-z0-9])[^/]*)(/.*)?$ {
    alias /web/users/$2/$3/$4/$1/htdocs$5;
}
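
A minimal sketch of why this should avoid the loop, assuming the index module issues its internal redirect to the request URI plus the index file name: alias maps the request to a file without changing the URI, so a request for “/test/” is redirected to “/test/index.html”, which lands in the same location again and resolves to the right file. The index line and the comments are illustrative additions, not part of the configuration above.

```nginx
location ~* ^/(([A-Za-z])([A-Za-z0-9])([A-Za-z0-9])[^/]*)(/.*)?$ {
    # /test/index.html: $1=test $2=t $3=e $4=s $5=/index.html
    #   file: /web/users/t/e/s/test/htdocs/index.html
    # /test/: $5=/
    #   file: /web/users/t/e/s/test/htdocs/ (a directory, so index applies)
    alias /web/users/$2/$3/$4/$1/htdocs$5;

    # Assumption: index.html is the index file (it is also the nginx default).
    index index.html;
}
```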


Igor S.
http://sysoev.ru/en/

On 01/22/2010 04:01 PM, Igor S. wrote:

> location ~* ^/(([A-Za-z])([A-Za-z0-9])([A-Za-z0-9])[^/]*)(/.*)?$ {
>     alias /web/users/$2/$3/$4/$1/htdocs$5;
> }

This works as intended, thanks!
When I try to add a referrer check though I run into trouble. Adding the
following after the alias directive makes nginx return a 404 instead of
index.html:

         if ($request_uri ~ zip) {
         }

The log says that nginx cannot find the file “/web/users/////htdocs”. When
I change the ~ into a = then nginx returns index.html correctly.
What I’m trying to get at is something similar to this:

valid_referers none www.mydomain.com;
if ($request_uri ~* .(mpg|zip|avi)$) {
    if ($invalid_referer) {
        return 405;
    }
}

I noticed that nested ifs are not possible, so I’m not sure how to handle
such a case where multiple conditions have to be satisfied (the name must
match and $invalid_referer must be set). But right now I’m wondering why
changing the “=” into “~” above suddenly results in a 404 and the captured
variables all being empty.

Regards,
Dennis

Hello!

On Sat, Jan 23, 2010 at 02:47:12AM +0100, Dennis J. wrote:

> I noticed that nested if’s are not possible so I’m not sure how to
> handle such a case where multiple conditions have to be satisfied
> (name must match and $invalid_referer must be set). But right now
> I’m wondering why changing the “=” into “~” above suddenly results
> in a 404 and the captured variables all being empty.

With “=” the condition is false. And no, there is no surprise here.
See here for some more details:

If is Evil… when used in location context | NGINX

Maxim D.

On 01/23/2010 04:59 AM, Maxim D. wrote:

>> I’m wondering why changing the “=” into “~” above suddenly results
>> in a 404 and the captured variables all being empty.
>
> With “=” condition is false. And - no, there is no surprise here.
> See here for some more details:

With “~” the condition is false too, after all I’m calling
“/test/index.html”, but if “if” is generally buggy then I guess that
might be the problem.

> If is Evil… when used in location context | NGINX

So how do I accomplish what I’m trying to do above with nginx?
For Apache this would look something like this:

RewriteCond %{HTTP_REFERER} !www.mydomain.com
RewriteCond %{HTTP_REFERER} !^$
RewriteRule .(mpg|zip|avi)$ - [F]

What is the equivalent in nginx?

Regards,
Dennis

Hello!

On Sat, Jan 23, 2010 at 05:19:07AM +0100, Dennis J. wrote:

>>> I’m wondering why changing the “=” into “~” above suddenly results
>>> in a 404 and the captured variables all being empty.
>>
>> With “=” condition is false. And - no, there is no surprise here.
>> See here for some more details:
>
> With ~ the condition is false too after all I’m calling
> “/test/index.html” but if “if” is generally buggy then I guess that
> might be the problem.

No, then “if” isn’t the thing to blame. And, after all, it’s not
generally buggy, it’s specifically buggy. :)

In this case you just smashed the captures from the location with another
regexp (as alias is evaluated after the rewrite directives, including
your if with a regexp). The solution is to use named captures, as
available in nginx 0.8.25+, or an explicit set to save the captures.
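
The “explicit set” variant could look roughly like the sketch below for versions without named captures. The variable names ($name, $n1, $n2, $n3, $p) and the 403 in the if body are chosen here purely for illustration, and the sketch is untested. The idea is only that the set directives run before the if, so the copies keep their values even when the regexp in the if clobbers $1..$5.

```nginx
location ~* ^/(([a-z])([a-z0-9])([a-z0-9])[^/]*)(/.*)?$ {
    # Copy the location captures before any other regexp is evaluated.
    set $name $1;
    set $n1   $2;
    set $n2   $3;
    set $n3   $4;
    set $p    $5;

    # Evaluating this regexp can overwrite $1..$5,
    # but not the variables assigned above.
    if ($request_uri ~ zip) {
        return 403;
    }

    # alias is evaluated later, so it must use the saved copies.
    alias /web/users/$n1/$n2/$n3/$name/htdocs$p;
}
```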

> If is Evil… when used in location context | NGINX
>
> So how do I accomplish what I’m trying to do above with nginx?
> For Apache this would look something like this:
>
> RewriteCond %{HTTP_REFERER} !www.mydomain.com
> RewriteCond %{HTTP_REFERER} !^$
> RewriteRule .(mpg|zip|avi)$ - [F]
>
> What is the equivalent in nginx?

Normally this translates to:

location ~ \.(mpg|zip|avi)$ {
    valid_referers ...

    if ($invalid_referer) {
        return 403;
    }
}

As you need this together with an already complex location, I believe
the better approach is to use a separate rewrite as in your original
message, but protect the destination to avoid your original problem.

Something like this should work:

location ~* ^/(([a-z])([a-z0-9])([a-z0-9])[^/]*)(/.*)?$ {
    set $path /$2/$3/$4/$1/htdocs$5;
    rewrite ^ /users/$path last;
}

location ^~ /users/ {
    internal;
    root /web;

    location ~ \.(mpg|zip|avi)$ {
        valid_referers ...
        if ($invalid_referer) {
            return 403;
        }
    }
}

Some key points:

  1. Note “^~” in location /users/. It means “do not apply regexp
    locations”, so your location with rewrite won’t be triggered
    again.

  2. Note “internal” in location /users/. It means “only visible
    for internal redirects”, so even user called “users” should be
    correctly processed by the first location.

Maxim D.

Hello!

On Sun, Jan 24, 2010 at 10:45:32PM +0100, Piotr S. wrote:

> server {
>     listen 8000;
>     location / { return 500; }
>     location /x { internal; return 500; }
> }
>
> Accessing /x will result in 404 response.

True, I was wrong here. Actually I wasn’t sure and that’s why I
used “should”. :)

> This is a bug and it’s somewhere on my TODO list.

Strictly speaking, it’s not a bug, it’s just how internal locations work
now. But I agree it’s probably a good idea to change the semantics
and make them just invisible for external requests.

Maxim D.

> 2. Note “internal” in location /users/. It means “only visible
>    for internal redirects”, so even user called “users” should be
>    correctly processed by the first location.

Actually, this isn’t true. Any attempt to access an internal location
results in a 404 response.

You can verify this with very simple configuration:

server {
    listen 8000;
    location / { return 500; }
    location /x { internal; return 500; }
}

Accessing /x will result in 404 response. This is a bug and it’s
somewhere on my TODO list.

Best regards,
Piotr S.

Hi,

Maxim D. wrote:

> results in 404 response.

The example is obviously correct, but it doesn’t truly explain the
reason for getting the 404 for accessing /users/xxx URLs (even though
the result is almost the same). The reason is to do with the order that
locations are handled, specifically that ^~ locations are handled before
~* and ~ ones, and if they match, then the regex ones aren’t tested. If
you try to access the URL /users/xxx, it will therefore match the second
location given by ^~, and return 404 because it’s an internal location.
Therefore, trying to access anything under a user named ‘users’ will fail
(though the URL /users on its own is ok, because that will match the
regex location and not the ^~ location).

Using location /users in the original locations will result in an
internal server error, because the regex will be caught before the
/users location each time the URL is checked, creating an infinite loop.

> and make them just invisible for external requests.

I was under the impression that the way internal requests currently work
was a consciously-chosen decision, and was considered a feature. It’s a
useful one IMHO. Surely if you want to make a location fully
‘invisible’ (i.e. both internally and externally), you can just add the
directive ‘return 404;’ to the location.

Marcus.

Hello!

On Mon, Jan 25, 2010 at 08:58:18AM +0200, Marcus C. wrote:

> Actually, this isn’t true. Any attempt to access internal location
> [...]
> Accessing /x will result in 404 response.
> [...]
> the ^~ location).

It’s somewhat obvious.

> Using location /users in the original locations will result in an
> internal server error, because the regex will be caught before the
> /users location each time the URL is checked, creating an infinite
> loop.

By “original” you mean the config I suggested to Dennis J.? No, as the
first rewrite will add ‘/’ to it, and on the next iteration it will be
caught by /users/.

The problem will arise with directory redirects though
(/username/dir -> /username/dir/), as they will use paths after
rewrites, and this isn’t what we need here. When a user has dir in
its htdocs, we need the redirect “/user/dir” -> “/user/dir/”, but
the config will issue the “/users/u/s/e/users/dir/” one.

From the above I think that using alias will be better. In
0.8.* this may be done with named captures and nested locations,
like this:

location ~* ^/(?<name>(?<n1>[a-z])(?<n2>[a-z0-9])(?<n3>[a-z0-9])[^/]*)(?<p>/.*)?$ {
    alias /tmp/users/$n1/$n2/$n3/$name/htdocs$p;

    location ~ \.(mpg|zip|avi)$ {
        valid_referers localhost none blocked;
        if ($invalid_referer) {
            return 403;
        }
    }
}

In older versions one has to create separate locations for normal
files and ones which need special processing, e.g.

location ~* ^/(([a-z])([a-z0-9])([a-z0-9])[^/]*)(/.*\.(mpg|zip|avi))?$ {
    alias /tmp/users/$2/$3/$4/$1/htdocs$5;
    valid_referers localhost none blocked;
    if ($invalid_referer) {
        return 403;
    }
}

location ~* ^/(([a-z])([a-z0-9])([a-z0-9])[^/]*)(/.*)?$ {
    alias /tmp/users/$2/$3/$4/$1/htdocs$5;
}

> feature. It’s a useful one IMHO. Surely if you want to make a
> location fully ‘invisible’ (i.e. both internally and externally),
> you can just add the directive ‘return 404;’ to the location.

No, “invisible” != “one which returns 404”. The idea is
that internal locations should be ignored during matching of
external requests and let other (non-internal) locations match
request instead.

Maxim D.

Hi,

Maxim D. wrote:

>> on its own is ok, because that will match the regex location and not
>> the ^~ location).
>
> It’s somewhat obvious.

To you, sure (and I’m sure to Piotr too) - I wrote the above more for
anyone reading this who might not be as familiar with Nginx as you/Piotr
are, since I felt the explanation wasn’t obvious to everyone (just
trying to be helpful, that’s all).

>> Using location /users in the original locations will result in an
>> internal server error, because the regex will be caught before the
>> /users location each time the URL is checked, creating an infinite
>> loop.
>
> By “original” you mean config I’m suggested to Dennis J? No, as
> first rewrite will add ‘/’ to it, and on next iteration it will be
> caught by /users/.

Sorry, my phrasing was bad. I was referring to your suggestion with the ^~
removed entirely (i.e. ‘location /users’ not ‘location ^~ /users’), to
highlight the difference between using ^~ and using nothing (again,
not for your benefit).

> request instead.

Ah, I see. Yes, that makes sense.

Marcus.

On 01/25/2010 11:57 AM, Maxim D. wrote:

> [...]
>         if ($invalid_referer) {
>             return 403;
>         }
>     }
> }

I now went with something like the above, but I’m still running into a snag
when setting up FastCGI, which apparently has to do with the fact
that $document_root is not set up properly when “alias” is used. This is my
config so far:

location ~* ^/(?<name>(?<n1>[a-z])(?<n2>[a-z0-9])(?<n3>[a-z0-9])[^/]*)(?<p>/.*)?$ {
    alias /web/users/$n1/$n2/$n3/$name/htdocs$p;

    if (-f /web/users/$n1/$n2/$n3/$name/user/.disable-member) {
        return 405;
    }

    location ~ \.(zip|mpg|avi)$ {
        valid_referers none www.testdomain.com;
        if ($invalid_referer) {
            return 403;
        }
    }

    location ~ \.php$ {
        set $phphost 127.0.0.1:9000;
        fastcgi_pass   $phphost;
        fastcgi_index  index.php;
        fastcgi_param  DOCUMENT_ROOT   /web/users/$n1/$n2/$n3/$name/htdocs;
        fastcgi_param  SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_param  PATH_INFO       $fastcgi_script_name;
        include fastcgi_params;
    }
}

Notice how I have to set DOCUMENT_ROOT and SCRIPT_FILENAME in order to get
this working. What is strange is that $document_root is
“/web/users/t/e/s/test/htdocs/index.php” (the alias?) and
$fastcgi_script_name is “/test/index.php”, yet when I call a script
displaying $_SERVER I get SCRIPT_FILENAME displayed as
“/web/users/t/e/s/test/htdocs/index.php”, which is what I want but,
according to the definition above, shouldn’t get. What I would expect is
“/web/users/t/e/s/test/htdocs/index.php/test/index.php” given the values of
the variables.

Also, even if these are right, I still get ORIG_SCRIPT_FILENAME as
“/web/users/t/e/s/test/htdocs/index.php/test/index.php”, PATH_TRANSLATED as
“/web/users/t/e/s/test/htdocs/test/index.php” and PHP_SELF as
“/test/index.php/test/index.php”, none of which look right.

Any ideas why the variables look messed up like this?

Regards,
Dennis
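
One thing that might be worth trying for the PHP block, sketched here under assumptions and untested: nginx documents $request_filename as the file path for the current request, based on the root or alias directives and the request URI, so it could be passed as SCRIPT_FILENAME directly instead of concatenating $document_root and $fastcgi_script_name. The sketch assumes the included fastcgi_params file does not already define SCRIPT_FILENAME, and it leaves DOCUMENT_ROOT and PATH_INFO out entirely. Whether this also straightens out PATH_TRANSLATED and PHP_SELF depends on the PHP side and is not verified here.

```nginx
location ~* ^/(?<name>(?<n1>[a-z])(?<n2>[a-z0-9])(?<n3>[a-z0-9])[^/]*)(?<p>/.*)?$ {
    alias /web/users/$n1/$n2/$n3/$name/htdocs$p;

    location ~ \.php$ {
        include        fastcgi_params;
        # Path of the requested file as resolved through the inherited alias,
        # rather than $document_root$fastcgi_script_name.
        fastcgi_param  SCRIPT_FILENAME $request_filename;
        fastcgi_pass   127.0.0.1:9000;
    }
}
```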

On 01/25/2010 11:57 AM, Maxim D. wrote:

> [...]
>         if ($invalid_referer) {
>             return 403;
>         }
>     }
> }

I was trying something like that in 0.7 but couldn’t get around the
captured var smashing problem.

The approach you posted works fine but I feel a bit uncomfortable because
it feels a bit like a hack. One modification I tried was using something
like “/__user/” as the second location and then explicitly doing a rewrite in
the first external location to essentially pass on the execution of the
query to the second location block. Think of it as an emulated “goto”. The
advantage would have been that the declaration of the config is more
explicit and would not need to rely on features such as “^~” and “internal”
to do some magic.

Unfortunately that fails once “index” enters the picture and I end up with
requests to “…/u/s/e/users…”.

> location ~* ^/(([a-z])([a-z0-9])([a-z0-9])[^/]*)(/.*)?$ {
>     alias /tmp/users/$2/$3/$4/$1/htdocs$5;
> }

This splitting up of the configuration is something I’m trying to avoid
since I still need to add other checks. One is a check whether a special
file exists and, if it does, deny the user access. In your initial “/users/”
config I can put that once in the first location block, but here I would
have to duplicate that in both. It wouldn’t be terrible, but it doesn’t look
as clear, concise and straightforward as the 0.8 example you mentioned above.

The other thing I still have to add is the handling of *.php files. What is
special here is that I have to check another special file in the user
directory to see which php-upstream I need to pass things to (there are two
different ones). What I’m planning to do is to write a little module that
parses the special file and sets a variable according to its contents. Then
I’ll use something like “fastcgi_pass $php_upstream” to pass the request to
the appropriate upstream servers.

I think I’ll give 0.8 and your config from above a try as that seems to be
the cleanest way to handle this.

Regards,
Dennis