I am looking to filter out all characters other than those specified in
the “location” regular expression. For example, [\w.]+$ should only allow
one or more letters, numbers, underscores and periods, just like
[a-zA-Z0-9_.]
location ~* ^/data/[\w.]+$ {…}
When I test the URL with wget I find the pound (#) and question mark (?)
are allowed through. For example…
I believe this was a mistake on my side. While testing I noticed that
the (#) and (?) were allowed through, but the URL result was not what I
was expecting.
When the pound (#) is used nginx converts the URI from
For those interested, and to close this loop: the question mark (?)
marks the start of the query string. I am not able to filter it with the
location regex, but you can use a rewrite rule to clear it in nginx.
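A minimal sketch of such a rewrite, assuming the same location block as
in this thread and that the back end should never see a query string.
The trailing “?” in the replacement tells nginx not to re-append the
original arguments, and “break” keeps processing in the current location:

```nginx
location ~* ^/data/[\w.]+$ {
    # Rewrite the URI to itself; the trailing "?" drops the query string.
    rewrite ^(.*)$ $1? break;

    # proxy_pass, fastcgi_pass or similar goes here (not shown in this
    # thread, so it is left out of the sketch).
}
```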
Another oddity is that you can put a bunch of illegal characters behind
the question mark and nginx will happily pass those to your back end
server even though the regex is in place. So, if you do not expect a “$”
or “%” or any other special character in your back end you may be
surprised.
I am still using this location regex
location ~* ^/data/[\w.]+$ {…}
I am interested in whether this is the expected result. My concern is
that the regex I specified is being silently ignored. Should nginx
respect the user configuration and deny access to the URL with the
question mark in it?
In most cases I imagine the question mark and following text would be
fine, as the link might contain helpful information. As far as I can
tell most resources online say this is the expected behavior and it is
up to the script to validate the data.
I agree the script should check all input, but then why even bother with
a location regex to validate the URL before it gets passed to a back end
server?
On Tue, Jun 19, 2012 at 11:46:32AM -0400, CM Fields wrote:
wget “http://example.com/data/1234.txt”
getting past my regular expression?
They aren’t part of the data matched by locations. The “#” character
denotes a fragment identifier (and is normally not sent to an HTTP
server at all), and the “?” character denotes the start of the query
string.
The “#” and “?” characters will only be seen by location matching
if they are sent escaped, i.e. as a part of the URI path component,
not as a syntax construct.
Does anyone know of a way to block the “#” or “?” that I am missing?
It’s not clear what you are trying to block. If you want to
reject all requests with fragments and query strings, you probably
want to use the “if” directive instead.
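As a rough sketch of that approach, assuming the location block from
earlier in the thread and that 403 is an acceptable response (both are
illustrative choices, not something stated above). $is_args is “?” when
the request line has arguments and empty otherwise:

```nginx
location ~* ^/data/[\w.]+$ {
    # Reject any request that carries a query string.
    if ($is_args) {
        return 403;
    }

    # Normal handling for /data/... continues here.
}
```

A fragment normally never reaches the server, so there is usually
nothing to test for on that side.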
When I test the url with wget I find the pound (#) and question mark (?) are
wget “http://example.com/data/12?34.txt”
Thanks for the response. I completely agree with your statement. I was
looking to filter the URI with the location directive, but using an
“if” block was the proper method.
The client probably should not send the # or anything after it, as it
is the local “fragment” part of the URL. nginx is right to ignore it
if it is present. If you want a # in a URL, it must be %-encoded.
The ? marks the start of the query_string part of the url. If you want
a literal ? in a url, it must be %-encoded.
In practice, the query_string is usually an unordered set of key/value
pairs. The choice nginx makes is not to consider the query_string when
determining which location{} is the best fit.
You can test variables with names that include “arg” (such as $args,
$is_args and $arg_name) if you want to see what query_string was
provided, but you can’t do it as part of the location directive.
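For instance, a sketch that checks a single argument inside the
location. The argument name “id” and the 403 response are hypothetical,
chosen only for illustration; $arg_id holds the value of the “id”
argument as it appears in the request line:

```nginx
location ~* ^/data/[\w.]+$ {
    # Allow "id" only if it is empty or made of letters, digits,
    # underscores and periods; reject everything else.
    if ($arg_id !~ "^[\w.]*$") {
        return 403;
    }
}
```

This only looks at one named argument. Anything else in the query
string is still passed along unless it is tested or cleared separately.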
By default, yes; because that’s (presumably) the usual case.
But you decide, in nginx.conf, exactly what gets passed to your back
end or script.
Presumably you use proxy_pass or fastcgi_pass or some similar directive
to send data to your back end. It should be possible to configure that
directive to send what you want, and not to send what you don’t want.
$uri and $request_uri are different variables; possibly you can use one
of those to achieve what you want.
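One possible way to do that with proxy_pass, sketched under the
assumption that the back end listens on 127.0.0.1:8080 (a made-up
address). When proxy_pass contains variables and a URI part, that URI is
sent as given, so using $uri passes the path without the query string:

```nginx
location ~* ^/data/[\w.]+$ {
    # $uri is the normalized path only; $request_uri would be the
    # original request line URI, query string included.
    proxy_pass http://127.0.0.1:8080$uri;
}
```

Note that $uri is the unescaped form of the path. Here the location
regex already limits it to letters, digits, underscores and periods, so
that should not matter, but it may for a looser location.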
I am interested if this is an expected result. My concern is that the
regex I specified is being silently ignored. Should Nginx respect the
user configuration and deny access to the URL with the question mark in
it?
I suspect this is just down to a different understanding of what the
nginx config is doing. The “location” directive only tries to match from
the first / after the hostname, to just before the first ? or #. (Within
that range, it matches the unescaped URL.)
If you care about anything after that, you have to handle it separately.
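To make that concrete, here is how the regex from this thread would see
a few requests. The first two URLs are the wget examples from earlier;
the %-encoded one is added for illustration:

```nginx
# Request line              String tested by "location"   Result
# GET /data/1234.txt        /data/1234.txt                matches
# GET /data/12?34.txt       /data/12                      matches
#                           ("34.txt" is the query string)
# GET /data/12%3F34.txt     /data/12?34.txt               no match
location ~* ^/data/[\w.]+$ {
    # Existing handling for /data/... goes here.
}
```

So the regex is not being ignored: it is applied to the path only, and
the path in the second request is simply /data/12.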