Bad side effect of (even unmatched) nested regex locations in regex locations with anonymous capture

Hello!

I have another interesting scenario :wink: Given is the following minimized
test case

server {
    listen                          80;
    server_name                     t2.example.com;

    root                            /data/web/t2.example.com/htdoc;

    location                        ~ ^/bar(/.*)? {
        alias                       /data/web/t2.example.com/htdoc/foo$1;
        try_files                   '' =404;
    }

    location                        ~ ^/bla(/.*)? {
        alias                       /data/web/t2.example.com/htdoc/foo$1;
        try_files                   '' =404;

        location                    ~ child_of_bla(?P<x>.*)$ {
            return                  418;
        }
    }
}

on Nginx 1.3.4 (but not specific to that version)

$ nginx -V
nginx version: nginx/1.3.4
TLS SNI support enabled
configure arguments: --prefix=/usr/share/nginx
--conf-path=/etc/nginx/nginx.conf --sbin-path=/usr/sbin/nginx
--http-log-path=/var/log/nginx/access.log
--error-log-path=/var/log/nginx/error.log --pid-path=/var/run/nginx.pid
--user=nginx --group=nginx --with-openssl=openssl-1.0.1c --with-debug
--with-http_stub_status_module --with-http_ssl_module --with-ipv6

and the following file system layout

$ find /data/web/t2.example.com/htdoc/
/data/web/t2.example.com/htdoc/
/data/web/t2.example.com/htdoc/foo
/data/web/t2.example.com/htdoc/foo/quux.txt

Plain access to quux.txt works of course

$ curl -s -D - -H 'Host: t2.example.com' http://127.0.0.1/foo/quux.txt
HTTP/1.1 200 OK
Server: nginx/1.3.4
Date: Thu, 02 Aug 2012 08:27:57 GMT
Content-Type: text/plain
Content-Length: 5
Last-Modified: Thu, 02 Aug 2012 07:15:43 GMT
Connection: keep-alive
ETag: "501a291f-5"
Accept-Ranges: bytes

QUUX

Access to location /bar works as expected as well

$ curl -s -D - -H 'Host: t2.example.com' http://127.0.0.1/bar/quux.txt
HTTP/1.1 200 OK
Server: nginx/1.3.4
Date: Thu, 02 Aug 2012 08:28:10 GMT
Content-Type: application/octet-stream
Content-Length: 5
Last-Modified: Thu, 02 Aug 2012 07:15:43 GMT
Connection: keep-alive
ETag: "501a291f-5"
Accept-Ranges: bytes

QUUX

but it breaks when the location of the same style has a nested location
(which might be even unmatched; here this child_of_bla thingy), which
also does a regex capture (doesn’t matter whether this is an anonymous
or named capture)

$ curl -s -D - -H 'Host: t2.example.com' http://127.0.0.1/bla/quux.txt
HTTP/1.1 404 Not Found
Server: nginx/1.3.4
Date: Thu, 02 Aug 2012 08:28:13 GMT
Content-Type: text/html
Content-Length: 168
Connection: keep-alive

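<html>
<head><title>404 Not Found</title></head>
<body bgcolor="white">
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.3.4</center>
</body>
</html>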

Nginx doesn’t even seem to process anything in the ‘try files phase’
according to the debug log.

2012/08/02 10:28:13 [debug] 15741#0: *17 http process request line
2012/08/02 10:28:13 [debug] 15741#0: *17 http request line: "GET /bla/quux.txt HTTP/1.1"
2012/08/02 10:28:13 [debug] 15741#0: *17 http uri: "/bla/quux.txt"
2012/08/02 10:28:13 [debug] 15741#0: *17 http args: ""
2012/08/02 10:28:13 [debug] 15741#0: *17 http exten: "txt"
2012/08/02 10:28:13 [debug] 15741#0: *17 http process request header line
2012/08/02 10:28:13 [debug] 15741#0: *17 http header: "User-Agent: curl/7.19.7 (i386-redhat-linux-gnu) libcurl/7.19.7 NSS/3.13.1.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2"
2012/08/02 10:28:13 [debug] 15741#0: *17 http header: "Accept: */*"
2012/08/02 10:28:13 [debug] 15741#0: *17 http header: "Host: t2.example.com"
2012/08/02 10:28:13 [debug] 15741#0: *17 http header done
2012/08/02 10:28:13 [debug] 15741#0: *17 rewrite phase: 0
2012/08/02 10:28:13 [debug] 15741#0: *17 test location: ~ "^/bar(/.*)?"
2012/08/02 10:28:13 [debug] 15741#0: *17 test location: ~ "^/baz(/.*)?"
2012/08/02 10:28:13 [debug] 15741#0: *17 using configuration ""
2012/08/02 10:28:13 [debug] 15741#0: *17 http cl:-1 max:1048576
2012/08/02 10:28:13 [debug] 15741#0: *17 rewrite phase: 2
2012/08/02 10:28:13 [debug] 15741#0: *17 post rewrite phase: 3
2012/08/02 10:28:13 [debug] 15741#0: *17 generic phase: 4
2012/08/02 10:28:13 [debug] 15741#0: *17 generic phase: 5
2012/08/02 10:28:13 [debug] 15741#0: *17 access phase: 6
2012/08/02 10:28:13 [debug] 15741#0: *17 access phase: 7
2012/08/02 10:28:13 [debug] 15741#0: *17 post access phase: 8
2012/08/02 10:28:13 [debug] 15741#0: *17 try files phase: 9

(nothing happens here?)

2012/08/02 10:28:13 [debug] 15741#0: *17 content phase: 10
2012/08/02 10:28:13 [debug] 15741#0: *17 content phase: 11
2012/08/02 10:28:13 [debug] 15741#0: *17 content phase: 12
2012/08/02 10:28:13 [debug] 15741#0: *17 http filename: "/data/web/t2.example.com/htdoc/bla/quux.txt"
2012/08/02 10:28:13 [error] 15741#0: *17 open() "/data/web/t2.example.com/htdoc/bla/quux.txt" failed (2: No such file or directory), client: 127.0.0.1, server: t2.example.com, request: "GET /bla/quux.txt HTTP/1.1", host: "t2.example.com"
2012/08/02 10:28:13 [debug] 15741#0: *17 http finalize request: 404, "/bla/quux.txt?" a:1, c:1
2012/08/02 10:28:13 [debug] 15741#0: *17 http special response: 404, "/bla/quux.txt?"
2012/08/02 10:28:13 [debug] 15741#0: *17 http set discard body
2012/08/02 10:28:13 [debug] 15741#0: *17 HTTP/1.1 404 Not Found

Interestingly enough it works when I change the anonymous capture for
/bar to a named one and replace the $1 in the alias by that named
variable.
Is this expected behavior or should I rather assume “anonymous captures
are evil”?
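
For reference, the named-capture variant described above might look roughly like the following. This is only a sketch: the capture name `rest` is an assumption chosen for illustration (the post does not say which name was used), and it is shown here on the /bla location, which is the one carrying the nested location.

```nginx
# Sketch only: "rest" is a hypothetical capture name, not taken from the post.
location ~ ^/bla(?<rest>/.*)? {
    # $rest is bound to this location's regex, so a later regex match
    # (such as the nested location below) does not replace its value
    # the way it can replace $1.
    alias       /data/web/t2.example.com/htdoc/foo$rest;
    try_files   '' =404;

    location ~ child_of_bla(?P<x>.*)$ {
        return  418;
    }
}
```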

-cs

Hello!

On Thu, Aug 02, 2012 at 10:38:15AM +0200, Christoph Schug wrote:

> Hello!
>
> I have another interesting scenario :wink: Given is following minimized
> test case
>
> server {
>     listen 80;
>     server_name t2.example.com;
>
>     root                            /data/web/t2.example.com/htdoc;
>
>     location                        ~ ^/bar(/.*)? {
>         alias /data/web/t2.example.com/htdoc/foo$1;

This is expected to break as soon as any regexp matching happens
between location matching and accessing the file (which is when
variables in the “alias” directive are evaluated).

Not only nested regex location matching (which is kind of
explicit), but even lookup of a map variable (with regexps) will
be enough to break things.
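
To illustrate the map case, here is an untested sketch of a configuration that, going by the explanation above, would be expected to run into the same problem. The variables `$is_text` and `$is_text_copy` and the map contents are assumptions made up for this example; only the location and alias lines come from the original test case.

```nginx
# Hypothetical map with a regex entry: evaluating $is_text runs a regex match.
map $uri $is_text {
    default     0;
    ~\.txt$     1;
}

server {
    listen      80;
    server_name t2.example.com;

    location ~ ^/bar(/.*)? {
        # "set" forces $is_text to be evaluated in the rewrite phase,
        # that is, after location matching but before "alias" is evaluated.
        set     $is_text_copy $is_text;

        # By the time $1 is evaluated here, it may no longer hold the
        # capture from the location regex.
        alias   /data/web/t2.example.com/htdoc/foo$1;
    }
}
```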

And this is why it’s not recommended to use enumerated captures
except in very simple configurations (or in the “rewrite” directive,
where the use of enumerated captures immediately follows regexp
matching). Use named captures instead and you’ll be fine.
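
The “rewrite” case mentioned above can be sketched as follows. The /old/ and /new/ paths are placeholders invented for this example and are not part of the original configuration.

```nginx
# Sketch with placeholder paths: $1 is consumed by the replacement string
# of the same directive that performed the match, so no other regex
# match can run in between.
location /old/ {
    rewrite ^/old/(.*)$ /new/$1 last;
}
```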

[…]

Maxim D.

On 2012-08-02 12:42, Maxim D. wrote:
[…]

> And this is why it’s not recommended to use enumerated captures
> except for very simple configurations (or “rewrite” directive,
> where use of enumerated captures immediately follows regexp
> matching). Use named captures instead and you’ll be fine.

Thanks Maxim,

using named captures is exactly what I did. The question to me was more
or less whether the other configuration was intended to break. If that’s the
case, then this is mainly a documentation issue, and a note should be added to
either [1] or [2] (ideally with cross-references to each other).

[1] http://www.nginx.org/en/docs/http/ngx_http_core_module.html#alias
[2]
http://www.nginx.org/en/docs/http/ngx_http_core_module.html#location

The topic “named captures” is, as far as I can see, only mentioned in
[3]. It might be good to demonstrate its use in a wider context. While
doing so, a comment on the syntax might also be great, as PCRE has not
always supported the Perl-style notation of “(?<name>)” [4].

[3]
http://www.nginx.org/en/docs/http/ngx_http_core_module.html#server_name
[4]
http://vcs.pcre.org/viewvc/code/trunk/doc/pcre.txt?r1=91&r2=93#l3410

Cheers
-cs

Hello!

On Thu, Aug 02, 2012 at 05:59:40PM +0200, Christoph Schug wrote:

> more or less if the other configuration was intended to break. If
> While doing so, also a comment on the syntax might be great, as PCRE
> not always supported the Perl-style notation of “(?<name>)” [4].
>
> [3] http://www.nginx.org/en/docs/http/ngx_http_core_module.html#server_name
> [4] http://vcs.pcre.org/viewvc/code/trunk/doc/pcre.txt?r1=91&r2=93#l3410

We have various details about the captures in general and the
issue with enumerated captures discussed in an introduction
article here:

http://nginx.org/en/docs/http/server_names.html#regex_names

Adding links to every directive which supports variables and/or
executes regular expressions might be a bit too verbose.

Maxim D.
