If-clause garbles variable content

I’d like to use a map in order to prevent various spiders from indexing
documents, but I’m running into strange issues, which I can’t explain
(nginx 0.7.62 or 0.7.64):

root /tmp/test;

map $lookup $test_blacklist {
    default "";
    /aabbccddeeff/aabbccddeeff/2009/A/Pdf/20090101.pdf 1;
}

location ~* ^/aabbccddeeff(/.*) {
    set $lookup "";
    if ($http_user_agent ~ "Googlebot|Slurp|msnbot") {
        # force lowercase "aabbccddeeff"
        set $lookup /aabbccddeeff$1;
    }

    if ($test_blacklist != "") {
        return 410;
    }

    charset iso-8859-1;
    alias /tmp/test/aabbccddeeff$1;
}

When requesting a file from that map, the open() log points to a garbled
path “/aabbccddeeffcddeeff” rather than the requested
“/aabbccddeeff/aabbccddeeff”:

31998#0: *1 "Googlebot|Slurp|msnbot" matches "Googlebot", client: 127.0.0.1, server: test, request: "HEAD /aabbccddeeff/aabbccddeeff/2009/A/Pdf/20090101.pdf HTTP/1.1", host: "test"
31998#0: *1 open() "/tmp/test/aabbccddeeffcddeeff/2009/A/Pdf/20090101.pdf" failed (2: No such file or directory), client: 127.0.0.1, server: test, request: "HEAD /aabbccddeeff/aabbccddeeff/2009/A/Pdf/20090101.pdf HTTP/1.1", host: "test"

What’s more, if the user-agent isn’t matched, the server sends a 301
redirect with an appended “/” rather than delivering the file:

HTTP/1.1 301 Moved Permanently
Location: http://test/aabbccddeeff/aabbccddeeff/2009/A/Pdf/20090101.pdf/

On the other hand, if I omit the $http_user_agent test, the server
behaves as expected, by either delivering the file or returning a 410
status, depending on the content of $lookup.

Am I doing something wrong?

Posted at Nginx Forum:

Hello!

On Sat, Dec 19, 2009 at 03:33:10PM -0500, marius wrote:

location ~* ^/aabbccddeeff(/.*) {
charset iso-8859-1;
alias /tmp/test/aabbccddeeff$1;
}

When requesting a file from that map, the open() log points to a garbled path “/aabbccddeeffcddeeff” rather than the requested “/aabbccddeeff/aabbccddeeff”:

This is somewhat expected, as you trashed the captures from the location
by executing another regex.
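
To make the sequence concrete, here is the original location annotated with where the numbered capture is set and where it gets overwritten. The comments are only a sketch of the behaviour described above, not a trace of nginx internals:

```nginx
location ~* ^/aabbccddeeff(/.*) {
    # After the location match, $1 holds the part of the URI that
    # follows the first "/aabbccddeeff".
    set $lookup "";

    # The "~" test below is a second regex match. Numbered captures are
    # shared, so once it has run, $1 no longer reliably refers to the
    # location's capture group.
    if ($http_user_agent ~ "Googlebot|Slurp|msnbot") {
        set $lookup /aabbccddeeff$1;   # may already see the trashed $1
    }

    # alias is evaluated after both matches, hence the garbled open() path.
    alias /tmp/test/aabbccddeeff$1;
}
```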

You should either use named captures as supported in nginx
0.8.25+, like this:

location ~* ^/aabbccddeeff(?<file>/.*) {
    ...
    alias /tmp/test/aabbccddeeff$file;
}

or save capture results before executing another regexp, e.g.

location ~* ^/aabbccddeeff(/.*) {
    set $file $1;
    ...
    alias /tmp/test/aabbccddeeff$file;
}

[…]

What’s more, if the user-agent isn’t matched, the server sends a 301 redirect with an appended “/” rather than delivering the file:

HTTP/1.1 301 Moved Permanently
Location: http://test/aabbccddeeff/aabbccddeeff/2009/A/Pdf/20090101.pdf/

And this one isn’t expected, but seems to be just another chapter
in the “if is evil” saga. In this particular case the alias directive
isn’t correctly inherited into the implicit location created by if(),
and this screws things up.

Am I right in the assumption that you need “aabbccddeeff” to be case
insensitive while it’s in lower case on filesystem, and that’s why
you use alias instead of root? Try something like this:

# lowercase
rewrite ^(?i)/aabbccddeeff/(.*) /aabbccddeeff/$1;

location /aabbccddeeff/ {
    set $lookup "";
    if ($http_user_agent ~ "Googlebot|Slurp|msnbot") {
        set $lookup $uri;
    }
    if ($test_blacklist != "") {
        return 410;
    }
    root /tmp/test;
}

On the other hand, if I omit the $http_user_agent test, the server behaves as expected, by either delivering the file or returning a 410 status, depending on the content of $lookup.

Am I doing something wrong?

The only safe things to do inside if() in location are

  1. rewrite … last;

  2. return …;

By using anything else you are asking for trouble.
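
As a rough sketch of what following that rule can look like, the user-agent test can be moved out of if() and into map, so that the only thing left inside if() is a return. This assumes an nginx version newer than the ones discussed in this thread: to my knowledge regex keys in map need 0.9.6+, and a combined string as the map source needs 0.9.0+. The variable names $is_spider and $deny_spider are made up for illustration, and the URI is assumed to be lowercased already (for example by the rewrite shown earlier):

```nginx
# Both map blocks belong in the http context.

# 1 if the User-Agent looks like one of the spiders, 0 otherwise.
map $http_user_agent $is_spider {
    default                    0;
    "~Googlebot|Slurp|msnbot"  1;
}

# Blacklist keyed on "<spider flag>:<uri>", so only spiders can match.
map "$is_spider:$uri" $deny_spider {
    default 0;
    "1:/aabbccddeeff/aabbccddeeff/2009/A/Pdf/20090101.pdf" 1;
}

server {
    location /aabbccddeeff/ {
        # The only directive inside if() is a return.
        if ($deny_spider) {
            return 410;
        }
        root /tmp/test;
    }
}
```

With this layout no regex runs inside the location, so there are no numbered captures to trash, and root is not affected by the implicit location that if() creates.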

Maxim D.

Maxim D. Wrote:

This is somewhat expected, as you trashed the captures
from the location by executing another regex.

You should either use named captures as supported
in nginx 0.8.25+, like this:

location ~* ^/aabbccddeeff(?<file>/.*) {
    ...
    alias /tmp/test/aabbccddeeff$file;
}

Now that you’re mentioning the captures, it all makes sense.
I’d consider this a last resort solution, as running the current stable
build is definitely a plus.

or save capture results before executing another
regexp, e.g.

location ~* ^/aabbccddeeff(/.*) {
    set $file $1;
    ...
    alias /tmp/test/aabbccddeeff$file;
}

I already tried that way, but the configuration doesn’t validate:
the “alias” directive must use captures inside location given by regular
expression in

And this one isn’t expected, but seems to be just another chapter
in the “if is evil” saga. In this particular case the alias directive
isn’t correctly inherited into the implicit location created by if(),
and this screws things up.

For legacy reasons, the first directory could be all uppercase or all
lowercase. Avoiding the initial capture by using $uri does the job.

Thank you for your help.
