Rewrite regex with percent signs

Hello, I am helping an admin sort out some 404 issues by using some
rewrite rules, which have generally been successful. However, we have a
couple of cases that are a bit mysterious and hope you can help explain.
This is from a vBulletin forum that used to use the vBSEO extension to
make the URLs prettier, but that extension has now been dropped, so
posts with those pretty URLs don't point to the correct places.

For example, we have a url of the following:
/members/redrobes-albums-2d%20vs%203d%20?-picture12345-mt-pub01.jpg

it needs to go to

/attachment.php?attachmentid=12345

we have:

location /members/ {
    rewrite ^/members/.+-albums-.+-picture(\d+)-.*
            /attachment.php?attachmentid=$1? redirect;
}

and this particular one is not working. It works with many others where
the original url did not have any %20's in it. So there is something
about those %20's that is causing these to fail.

I can write a Perl script and run that url through its regex, and it
does change them.

So what does the nginx regex do differently from the Perl regex with
regard to % signs?

Thanks.

Posted at Nginx Forum:

On Sun, May 22, 2016 at 07:16:35AM -0400, redrobes wrote:

Hi there,

For example, we have a url of the following:
/members/redrobes-albums-2d%20vs%203d%20?-picture12345-mt-pub01.jpg

The %20 pieces in there are url-encoded spaces. In a “location” or a
“rewrite”, you would have to match each one as a single space character.

However, there is also a ? in the url; that marks the start of the query
string. A “location” or “rewrite” in nginx will not consider that part
of the url.

it needs to go to

/attachment.php?attachmentid=12345

It is not immediately clear to me which parts of the original url are
important in deciding whether the request should be redirected or not.

we have:

location /members/ {
    rewrite ^/members/.+-albums-.+-picture(\d+)-.*
            /attachment.php?attachmentid=$1? redirect;
}

That suggests that just those three words matter. You might be able
to put something together involving “$args” matching “-picture(\d+)-”
if the request matches “^/members/.*-albums-”, perhaps?
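
A minimal sketch of that idea, assuming the redirect should fire only
when the path contains “-albums-” and the query string contains
“-pictureNNN-”. The regex location and the “if”/“return” pair are
illustrative choices, not something taken from the original
configuration:

```nginx
# Matched against the decoded path, which ends before the "?".
location ~ ^/members/.*-albums- {
    # $args holds whatever followed the first "?" in the request.
    if ($args ~ "-picture(\d+)-") {
        return 302 /attachment.php?attachmentid=$1;
    }
    # URLs without a stray "?" are still handled by the original rule.
    rewrite ^/members/.+-albums-.+-picture(\d+)-.*
            /attachment.php?attachmentid=$1? redirect;
}
```

With the example request, the path seen by the location is
“/members/redrobes-albums-2d vs 3d ” and $args is
“-picture12345-mt-pub01.jpg”, so the redirect should go to
/attachment.php?attachmentid=12345.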

Alternatively, perhaps the thing that created the url in the first place
incorrectly did not url-encode the ? to %3F.

and this particular one is not working. It works with many others where the
original url did not have the %20’s in them. So there is something about
those %20’s that are causing these to fail.

I suspect that it is the ? rather than the %20, from the one example
you have given.

I can write a perl script and run that url through its regex and it does
change them.

So what does the nginx regex do different from perl regex with regard to %
signs.

With regard to % signs, nginx matches its regex against the %-decoded
version of the url. With regard to ?, some parts of nginx do not
consider anything after the ? when matching.

Good luck with it,

f

Francis D.

Francis D. Wrote:

However, there is also a ? in the url; that marks the start of the query
string. A “location” or “rewrite” in nginx will not consider that part
of the url.

Ah! Thanks. I didn’t spot that one amongst all the other odd chars
there. Yes, nginx does indeed treat the ? differently from Perl, and the
hex codes convert to “2d vs 3d ?”, i.e. the ? was in the title of the
post and is not the start of args. I think that’s it, and it makes sense
now. I can understand what is going on.

It could have been mighty hard to fix this case, since we are not going
to know in advance whether the ? was part of the url or the start of
args, but I think in our case we know we’re going to dump all the args
anyway and substitute our own in. So I think there may be the
possibility of appending the args to the url before we do the match. Not
sure at this point.
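
One way this might look, as a rough sketch only: $request_uri holds the
original request line as sent by the client, still %-encoded and
including everything after the ?, so a regex against it sees the same
string the Perl script does. Using “if” with “return” here is an
assumption for illustration, not a tested configuration:

```nginx
location /members/ {
    # $request_uri is the raw, undecoded URI including any "?..." part,
    # so ".+" can span both the %20 sequences and the "?".
    if ($request_uri ~ "^/members/.+-albums-.+-picture(\d+)-") {
        # All original args are dropped; only attachmentid is sent on.
        return 302 /attachment.php?attachmentid=$1;
    }
}
```

Because the match runs over the whole raw string, it would also fire if
“-pictureNNN-” only appeared in genuine query args, which seems
acceptable here since the args are being discarded anyway.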

But thanks Francis - I think you have solved it.
