Possible rewrite regular expression bug?

Hi all,

I’m trying to migrate our server from Lighttpd. Everything had been
going smoothly until I ran into a problem with a regular expression. I
suspect this is a bug in NGINX’s regular expression handling. Consider
the following rule:

rewrite ^(.*)&__Q=([^-]+)-([^/]*)$ /j?$2=$3 last;

This RE matches URLs like http://localhost/testing&__Q=q-test but not
http://localhost/t?testing&__Q=q-test. It looks like this RE only
matches URLs that don’t contain a ‘?’ character before the ‘__Q’ part.

Has anyone run into this problem before?

Cheers,
Peter

On Monday 23 June 2008, Peter Hoang wrote:

It looks like this RE only
matches urls which don’t contain ‘?’ character before ‘__Q’ part.

nginx (like Apache) ignores the query string in rewrite rules: the
regex is matched against the URI path only.
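
To make that concrete, here is a minimal sketch (not from the thread) of what each directive sees for the failing URL. The `listen` and `server_name` values and the second `rewrite` line are assumptions added purely for illustration.

```nginx
# Request: GET /t?testing&__Q=q-test
#   string tested by rewrite regexes: /t                  (path only)
#   value of $args:                   testing&__Q=q-test  (query string)

server {
    listen       80;          # assumed for illustration
    server_name  localhost;

    # Does not match the request above: "&__Q=" is in the query string,
    # which is not part of the string the regex is tested against.
    rewrite  ^(.*)&__Q=([^-]+)-([^/]*)$  /j?$2=$3  last;

    # Hypothetical rule: matches, because only the path "/t" is tested.
    rewrite  ^/t$  /j  last;
}
```

The first rule does match http://localhost/testing&__Q=q-test, because there the `&__Q=` part belongs to the path.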

Thanks Roxis.

I forgot to ask, how should I do the rewrite in NGINX?

Cheers,
Peter

Just in case someone runs into the same problem, here is how I did it:

  if ($args ~ ^(.*)&__Q=([^\-]+)-([^/]*)$) {
    set $args $1&$2=$3;
  }

On Mon, Jun 23, 2008 at 04:43:28PM +0200, Peter Hoang wrote:

I forgot to ask, how should I do the rewrite in NGINX?

Something like this:

 server {

      if ($args ~ ^(.*)&__Q=([^\-]+)-([^/]*)$) {
          set       $args   $2=$3;
          rewrite   ^       /j     last;
      }

      location / {

On Monday 23 June 2008, Peter Hoang wrote:

I forgot to ask, how should I do the rewrite in NGINX?

if ($query_string ~ "&__Q=([^-]+)-([^/]*)$") {
    set $new_args "$1=$2";
    rewrite / /j?$new_args last;
}
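
One detail worth knowing here: when a rewrite replacement contains new arguments, nginx appends the original query string after them unless the replacement ends with a question mark. A hedged variant that drops the original arguments might look like the sketch below. The `listen` and `server_name` lines are assumed for illustration; `$new_args` is just a variable name.

```nginx
server {
    listen       80;          # assumed for illustration
    server_name  localhost;

    # For /t?testing&__Q=q-test, $args is "testing&__Q=q-test".
    if ($args ~ "&__Q=([^-]+)-([^/]*)$") {
        # Copy the captures before the rewrite regex runs its own match.
        set  $new_args  "$1=$2";

        # The trailing "?" keeps nginx from appending the original
        # query string after the new arguments.
        rewrite  ^  /j?$new_args?  last;
    }
}
```

With the example request this should yield /j?q=test rather than /j?q=test&testing&__Q=q-test.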

Thanks guys. I have to say NGINX has a more active community than
Lighttpd. Two more questions: is there something similar to
AllowEncodedSlashes, and how do I enable rewrite debugging in NGINX?
Without AllowEncodedSlashes, the following rewrite rule:

rewrite ^/([^-]+)-([^/]*).html$ /j?$1=$2 last;

doesn’t work with urls such as:
http://localhost/q-test/jt-test1%2Ftest2.html (notice %2F which is the
encoded value of ‘/’).

Cheers,
Peter

On Tue, Jun 24, 2008 at 05:47:32PM +0200, Peter Hoang wrote:

Two more questions, is there something similar to
AllowEncodedSlashes and how do I enable rewrite debug in NGINX?

There is no analog of AllowEncodedSlashes. nginx always decodes
percent-encoded characters in the URI part.

rewrite debug is:

    error_log  /path/to/error.log  notice;

    http {
        server {
            rewrite_log  on;
        }
    }

Furthermore, NGINX seems to decode the parameter in rewrite rules.
For example, I have the following rule:

  if ($args ~ ^q=([^&]+)$) {
    set $q $1;
    rewrite ^/j /cms/q-$q.html? permanent;
  }

Now if I enter http://localhost/j?q=c%2B%2B, NGINX redirects me to
http://localhost/cms/q-c++.html. I was expecting the final URL to be
http://localhost/cms/q-c%2B%2B.html. Is there a way to prevent this
from happening? Thanks.

On Tue, Jun 24, 2008 at 06:44:16PM +0200, Peter Hoang wrote:

Thanks Igor. Is there any solution for my problem?

You may try to parse $request_uri. This is the original URI plus args,
exactly as they were received from the client.
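
A rough sketch of that suggestion, assuming a hypothetical request such as /jt-test1%2Ftest2.html and the /j target used earlier in the thread. The variable names `$key` and `$value`, the `listen` and `server_name` lines, and the exact pattern are illustrative, not taken from the original messages.

```nginx
server {
    listen       80;          # assumed for illustration
    server_name  localhost;

    # $request_uri holds the request target as the client sent it, so
    # "%2F" is still encoded and [^/?]* can match across it.
    if ($request_uri ~ "^/([^-/]+)-([^/?]*)\.html(\?|$)") {
        set  $key    $1;
        set  $value  $2;

        # Trailing "?" drops the original query string.
        rewrite  ^  /j?$key=$value?  last;
    }
}
```

For the hypothetical request above, /j should receive the arguments jt=test1%2Ftest2 with the slash still encoded. This is worth confirming with rewrite_log on the nginx version in use.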

On Tue, Jun 24, 2008 at 06:03:51PM +0200, Peter Hoang wrote:

url would be http://localhost/cms/q-c%2B%2B.html. Is there a way to
prevent this from happening? Thanks.

It had been implemented long ago in 0.3.10:

*) Bugfix: the "rewrite" directive did not unescape URI part in
   redirect, now it is unescaped except the %00-%25 and %7F-%FF
   characters.

Why do you need “c%2B%2B” in the URL? “c++” is a valid, unambiguous URL part.
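
For context, %2B lies outside both excluded ranges (%00-%25 and %7F-%FF), which is why the redirect comes out as “c++”. On nginx versions newer than the ones discussed here, where the `return` directive accepts a URL, one possible way to keep the encoded form is to build the redirect with `return` instead of `rewrite ... permanent`. The following is a sketch under that assumption, not something confirmed in this thread. The `listen` and `server_name` lines are also assumed.

```nginx
server {
    listen       80;          # assumed for illustration
    server_name  localhost;

    location = /j {
        # $args is not decoded, so $1 is still "c%2B%2B" for /j?q=c%2B%2B.
        if ($args ~ "^q=([^&]+)$") {
            # Assumed behaviour: "return" sends the Location value as
            # written, without the unescaping applied by
            # "rewrite ... permanent". Verify on your version.
            return 301 /cms/q-$1.html;
        }
    }
}
```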

I’m looking at the log files, and there is a difference between how
NGINX and Apache handle $args.

Apache:

(4) RewriteCond: input='GET /search/q-C%2B%2B.html HTTP/1.1'
pattern='^[A-Z]{3,9}\ /search/([^-]+)-([^/]*).html\ HTTP/' => matched
(2) rewrite '/search/q-C++.html' -> '/search?q=C%2B%2B'
(3) split uri=/search?q=C%2B%2B -> uri=/search, args=q=C%2B%2B
(2) forcing '/search' to get passed through to next API URI-to-filename
handler

NGINX:

"^/search/([^-]+)-([^/]*).html" matches "/search/q-C%2B%2B.html",
client: 127.0.0.1, server: localhost, request: "GET
/search/q-C%2B%2B.html HTTP/1.1", host: "localhost:82"
"^/search/([^-]+)-(.*).html$" matches "/search/q-C++.html", client:
127.0.0.1, server: localhost, request: "GET /search/q-C%2B%2B.html
HTTP/1.1", host:
rewritten data: "/search", args: "q=C++", client: 127.0.0.1, server:
localhost, request: "GET /search/q-C%2B%2B.html HTTP/1.1", host:
"localhost:82"

As can be seen, Apache doesn’t decode $args while NGINX does. Is it
possible to override this behaviour? How much code would need to be
changed? I wouldn’t mind hacking the source if it’s a simple change.

Thanks.

Hi Igor,

I have a search form on the site and sometimes people search for
“C++”. When they submit the form, the URL is
http://localhost/search?q=C%2B%2B because the browser encodes the
term “C++”. Since we want to make it search engine friendly, we redirect
the user to http://localhost/search/q-C%2B%2B.html, hence my question.

Any suggestion?

Cheers,

Cuong

Actually, I realise that Apache has an NE (noescape) flag for
RewriteRule and that’s why “C%2B%2B” is passed down intact to the
backend server. I’m thinking about implementing this flag in NGINX:

rewrite ^/search /search/q-$q.html last ne;

or

rewrite ^/search /search/q-$q.html ne;

I’m digging into the source code right now. However, it might be better
if someone could quickly show me the right direction to achieve that.

Cheers,
Peter

The ngx_escape_uri function is called in three different places in
ngx_http_script.c:

  1. size_t ngx_http_script_copy_capture_len_code(ngx_http_script_engine_t *e)
  2. void ngx_http_script_copy_capture_code(ngx_http_script_engine_t *e)
  3. void ngx_http_script_regex_start_code(ngx_http_script_engine_t *e)

If I were to add an “ne” flag as follows:

rewrite ^/search /search/q-$q.html last ne;

I could check for ngx_strcmp(value[4].data, "ne") == 0 in
ngx_http_rewrite and store the flag somewhere. My question is, how do we
store this flag so that we can retrieve it in the three functions above?
Can I just add a flag to ngx_http_script_engine_t to do this?

As for the redirect decoding, what is the effect of commenting out the
following line in ngx_http_script.c:

   /* ngx_unescape_uri(&dst, &src, e->pos - e->buf.data,
                     NGX_UNESCAPE_REDIRECT); */

Thanks in advance.

Cheers,
Peter