Location regular expression not filtering some characters

I am looking to filter all characters other than those specified in the
“location” regular expression. For example, [\w.]+$ should only allow one or
more letters, numbers, underscores and periods, just like [a-zA-Z0-9_.]

location ~* ^/data/[\w.]+$ {…}

When I test the URL with wget, I find the pound (#) and question mark (?)
are allowed through. For example:

This URL is valid and is allowed through:
wget "http://example.com/data/1234.txt"

This URL with the additional “#” should not be allowed through, but it is:
wget "http://example.com/data/12#34.txt"

Adding a question mark also gets through, when it is supposed to be blocked
like the pound “#” above:
wget "http://example.com/data/12?34.txt"

Are pound (#) and question mark (?) matches being overridden in Nginx, and
thus getting past my regular expression?

Does anyone know of a way to block the “#” or “?” that I am missing?

Just for clarity, I have no need for the “#” or “?” in my script, and I can
do checks in the script to exclude these characters if necessary.

I believe this was a mistake on my side. While testing I noticed that the (#)
and (?) were allowed through, but the resulting URL was not what I was
expecting.

When the pound (#) is used, nginx converts the URI from

http://example.com/data/12#34.txt

and cuts off the pound sign and everything after it, leaving

http://example.com/data/12

before my regular expression is ever used. The pound (#) is a
location-specific tag, so this is expected and fine.

The question mark (?) is still passed to my regular expression and allowed
through:

http://example.com/data/12?34.txt
gets passed through the regular expression unchanged as
http://example.com/data/12?34.txt

Not sure why the question mark is special yet.

For those interested, and to close this loop: the question mark (?) starts
the query string. I am not able to filter it with the location regex, but you
can use a rewrite rule to clear it in nginx.
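A minimal sketch of such a rewrite, assuming the back end is reached with proxy_pass at a hypothetical address 127.0.0.1:8080. A trailing “?” in a rewrite replacement tells nginx not to append the original arguments, so the request continues to the back end without its query string:

```nginx
location ~* ^/data/[\w.]+$ {
    # "$uri?" rewrites to the same normalized path; the trailing "?" drops
    # the original query string instead of appending it. Reusing $uri is
    # safe here only because the regex above admits just [\w.] characters.
    # "break" stops rewrite processing and keeps the request in this block.
    rewrite ^ $uri? break;

    # Hypothetical back end address, for illustration only.
    proxy_pass http://127.0.0.1:8080;
}
```

Because the URI was changed by the rewrite, proxy_pass (given without a URI part) sends the normalized, rewritten URI, which no longer carries the arguments.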

Another oddity is that you can put a bunch of illegal characters after the
question mark and nginx will happily pass them to your back end server, even
though the regex is in place. So, if you do not expect a “$” or “%” or any
other special character in your back end, you may be surprised.

I am still using this location regex:
location ~* ^/data/[\w.]+$ {…}

If we take this valid URL:
http://example.com/data/1234.txt

We can add a question mark and anything we want after it, and it will be
passed to your back end or script:
http://example.com/data/1234.txt?some_text../../../�l table%

I am interested in whether this is an expected result. My concern is that
the regex I specified is being silently ignored. Should nginx respect the user
configuration and deny access to the URL with the question mark in it?

In most cases I imagine the question mark and the text following it would be
fine, as the link might contain helpful information. As far as I can tell,
most resources online say this is the expected behavior and that it is up to
the script to validate the data.

I agree the script should check all input, but then why even bother with a
location regex to validate the URL before it gets passed to a back end server?

Hello!

On Tue, Jun 19, 2012 at 11:46:32AM -0400, CM Fields wrote:

Are pound (#) and question mark (?) matches being overridden in Nginx, and
thus getting past my regular expression?
They aren’t part of the data matched by locations. The “#” character
denotes the fragment identifier (and is normally not sent to an HTTP server
at all), and the “?” character denotes the start of the query string.

The “#” and “?” characters will only be seen by location matching if they
are sent escaped, i.e. as part of the URI path component, not as a syntax
construct.
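To make this visible, one option (an assumption, not something proposed in the thread) is to pair the regex location with a plain prefix location that catches whatever the regex rejects under /data/. nginx first finds the longest matching prefix location, then checks regex locations in order and uses the first match; only if no regex matches does it fall back to the prefix location. A request for /data/12%3F34.txt is decoded to /data/12?34.txt before matching, so the regex fails and the prefix block answers:

```nginx
# Only reached for /data/ requests whose decoded path fails the regex
# below, for example /data/12%3F34.txt (an escaped "?").
location /data/ {
    return 403;
}

location ~* ^/data/[\w.]+$ {
    # ... existing configuration ...
}
```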

Does anyone know of a way to block the “#” or “?” that I am missing?

It’s not clear what you are trying to block. If you want to
reject all requests with fragments and query strings, you probably
want to use the “if” directive instead.
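A minimal sketch of the “if” approach, assuming the goal is to reject any request whose original request line contains a query string. $request_uri holds the full original URI including arguments, so it still contains the “?” that location matching never sees:

```nginx
location ~* ^/data/[\w.]+$ {
    # $request_uri is the URI as sent by the client, arguments included,
    # so a literal "?" means a query string was present (even an empty
    # one, as in "/data/1234.txt?"). An escaped %3F does not match here,
    # but it already fails the location regex itself.
    if ($request_uri ~ "[?]") {
        return 403;
    }

    # ... existing configuration ...
}
```

Using only “return” inside “if” keeps this to the kind of “if” block that is generally considered safe within a location.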

Maxim D.

On Tue, Jun 19, 2012 at 2:01 PM, Maxim D. [email protected] wrote:



nginx mailing list
[email protected]
nginx Info Page

Maxim,

Thanks for the response. I completely agree with your statement. I was
looking to filter the URI with the location directive, but using an if {}
block was the proper method.

Thanks again.

On Tue, Jun 19, 2012 at 01:07:17PM -0400, CM Fields wrote:

Hi there,

http://example.com/data/12#34.txt
and cuts off the pound sign and anything after it to this…
http://example.com/data/12

The client probably should not send the # or anything after it, as it is
the local “fragment” part of the URL. nginx is right to ignore it if it is
present. If you want a # in a URL, it must be %-encoded.

http://example.com/data/12?34.txt
get passed through the regular expression unchanged
http://example.com/data/12?34.txt

Not sure why the question mark is special yet.

The ? marks the start of the query_string part of the url. If you want
a literal ? in a url, it must be %-encoded.

In practice, the query_string is usually an unordered set of key/value
pairs. The choice nginx makes is not to consider the query_string when
determining which location{} is the best fit.

You can test variables with names that include “arg” if you want to see
what query_string was provided – but you can’t do it as part of the
location directive.

http://nginx.org/en/docs/http/ngx_http_core_module.html#variables
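For example, instead of rejecting every query string, the raw query string in $args can be tested against an allowed character set (individual parameters are also available as $arg_name variables). This sketch assumes the back end only ever expects letters, digits, underscores, periods and the usual “=” and “&” separators; adjust the class to whatever your script actually accepts:

```nginx
location ~* ^/data/[\w.]+$ {
    # $args is the query string as received (still %-encoded), so "%",
    # "$" or an encoded space all fall outside this character class.
    if ($args ~ "[^\w.=&]") {
        return 400;
    }

    # ... existing configuration ...
}
```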

f

Francis D. [email protected]

On Tue, Jun 19, 2012 at 02:05:10PM -0400, CM Fields wrote:

Hi there,

We can add a question mark and anything we want to after that and it will be
passed to your back end or script.
http://example.com/data/1234.txt?some_text../../../�l table%

By default, yes; because that’s (presumably) the usual case.

But you decide, in nginx.conf, exactly what gets passed to your back
end or script.

Presumably you use proxy_pass or fastcgi_pass or some similar directive
to send data to your back end. It should be possible to configure that
directive to send what you want, and not to send what you don’t want.

$uri and $request_uri are different variables; possibly you can use one
of those to achieve what you want.
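As an illustration, assuming proxy_pass and a hypothetical back end at 127.0.0.1:8080: when the proxy_pass URL contains variables, the URI it builds is sent as-is and the client's arguments are not appended, so passing only $uri (the decoded, normalized path) keeps the query string away from the back end:

```nginx
location ~* ^/data/[\w.]+$ {
    # With a variable in the URI part, nginx sends exactly this URI and
    # does not append the original query string. $uri is decoded, which
    # is harmless here because the regex only admits [\w.] characters.
    # 127.0.0.1:8080 is a hypothetical back end address.
    proxy_pass http://127.0.0.1:8080$uri;
}
```

Using an IP address rather than a host name avoids needing a resolver, which nginx requires when a variable-based proxy_pass names a host that is not a defined upstream.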

I am interested if this is an expected result. My concern is that the regex
I specified is being silently ignored. Should Nginx respect the user
configuration
and deny access to the URL with the question mark in it?

I suspect this is just down to a different understanding of what the nginx
config is doing. The “location” directive only tries to match from the first
/ after the hostname to just before the first ? or #. (Within that range, it
matches the unescaped URL.)
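One way to see this split for yourself, assuming you can add a log format in the http{} block, is to log the three variables side by side. The format name uri_debug and the log path are hypothetical:

```nginx
http {
    # $request_uri: original URI with arguments, as sent by the client
    # $uri:         decoded, normalized path (what location matching sees)
    # $args:        query string after the "?", not decoded
    log_format uri_debug '$request_uri | $uri | $args';

    server {
        access_log /var/log/nginx/uri_debug.log uri_debug;
        # ... existing configuration ...
    }
}
```

A request such as /data/1234.txt?some_text should then log the path and the query string in separate fields, with only the middle one taking part in location matching.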

If you care about anything after that, you have to handle it separately.

All the best,

f

Francis D. [email protected]