Regular Expression global redirect

I’m using nginx as a reverse proxy for about 2000 websites. I’m trying
to find a good way to redirect all www traffic to nonwww addresses. I
don’t want to have a separate entry for every domain…just a global
redirect in the server block preferably. I found lots of examples to do
this one domain at a time, but does anyone have any suggestions on how
to do it for the whole server?

I was thinking of extracting the domain something like this then using
an if statement, but I understand that if’s are not recommended:

server_name ~^(.)?(?<domain>.+)$;

thanks,
altimage

here’s my server block:

server {
    listen 80;
    server_name _;

    location / {
        proxy_pass http://websites;
    }
}

Posted at Nginx Forum:

On 26 Feb 2012 04h59 CET, [email protected] wrote:

location / {
proxy_pass http://websites;
}
}
Try:

server {
    server_name ^~www\.(?<domain>.*)$;
    return 301 http://$domain;
}

server {
    server_name ^~(?<domain_name>[^.]*)\.(?<tld>[^.]*)$;

    location / {
        proxy_pass http://$domain_name.$tld;
    }
}

— appa

On 2012-02-26 10:59, altiamge wrote:

I’m using nginx as a reverse proxy for about 2000 websites. I’m trying
to find a good way to redirect all www traffic to nonwww addresses. I
don’t want to have a separate entry for every domain…just a global
redirect in the server block preferably. I found lots of examples to do
this one domain at a time, but does anyone have any suggestions on how
to do it for the whole server?

This is what I’m using:

server {
    listen 80;
    server_name ~^www\.(?<domain>.+)$;
    rewrite ^ $scheme://$domain$request_uri? permanent;
}

2012/2/26 António P. P. Almeida [email protected]:

server {
    server_name ^~www\.(?<domain>.*)$;
    return 301 http://$domain;
}

Where can I read the documentation for this? It doesn’t seem to be
mentioned in nginx.org docs and nginx wiki

2012/2/26 altiamge [email protected]:

I’m not able to get either one of these to work. I just upgraded to
nginx 1.0.12 just to make sure my version wasn’t an issue. I also
checked my PCRE version.

pcretest

PCRE version 6.6 06-Feb-2006

Your PCRE is too old. I believe the workaround is to put a P before
the capture name:

server_name ^~www\.(?P<domain>.*)$;

I’m not able to get either one of these to work. I just upgraded to
nginx 1.0.12 just to make sure my version wasn’t an issue. I also
checked my PCRE version.

pcretest

PCRE version 6.6 06-Feb-2006

Here are the errors I’m getting with each example:

Example 1

server {
    listen 80;
    server_name ~^www\.(?<domain>.+)$;
    rewrite ^ $scheme://$domain$request_uri? permanent;
}

Error:

[emerg] pcre_compile() failed: unrecognized character after (?< in
"^www\.(?<domain>.+)$" at "domain>.+)$"

Example 2

server {
    server_name ^~www\.(?<domain>.*)$;
    return 301 http://$domain;
}

server {
    server_name ^~(?<domain_name>[^.]*)\.(?<tld>[^.]*)$;
    location / {
        proxy_pass http://websites;
    }
}

Error:

nginx: [emerg] unknown “domain” variable

thanks,
altimage


On 26 Feb 2012 07h19 CET, [email protected] wrote:

2012/2/26 António P. P. Almeida [email protected]:

server {
    server_name ^~www\.(?<domain>.*)$;
    return 301 http://$domain;
}

Where can I read the documentation for this? It doesn’t seem to be
mentioned in nginx.org docs and nginx wiki

AFAIK it’s undocumented. You can use return for a lot of things. I
hardly ever use rewrite anymore. There are situations where it still
applies, but they’re not the majority.

IMHO using return is more Nginx-like, while rewrite harks back to
Apache’s mod_rewrite and its “reverse” logic.

Using return you can make a poor man’s web service, for example:

location /ws-test {
    return 200 "{uri: $uri, 'service name': 'this is a service'}\n";
}

If you do a capture in the location you can use the captures in the
URI you give return as the second argument. The default status is
302. AFAIK it doesn’t support named locations redirects. Hence the
usual idiom of returning an error status and then using error_page for
the redirect with a named location.
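
The idiom described above (return an error status, then let error_page hand the request to a named location) might look roughly like the sketch below. The /legacy path, the 418 status code and the @fallback name are placeholders picked for illustration only; websites is the upstream name from the original post.

```nginx
server {
    listen 80;
    server_name example.test;   # placeholder host name

    location /legacy {
        # return cannot target a named location directly, so raise a
        # status code that error_page maps to one.
        error_page 418 = @fallback;
        return 418;
    }

    location @fallback {
        # The request ends up here via the error_page hand-off.
        proxy_pass http://websites;
    }
}
```

Any status code in the range error_page accepts could be used here, as long as it is not one you need to serve for real.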

It was late and I forgot the $request_uri :(

Also for old PCRE versions the ?<name> has to be replaced by ?P<name>.

Both things that were already addressed in the thread.

— appa

On Sat, 25 Feb 2012 22:59:50 -0500 (EST), “altiamge”
[email protected] wrote:


Would this help?

For older PCRE versions:

# for http
server {
    listen 80;
    server_name ~^www\.(?P<domain>.+)$;
    return 301 $scheme://$domain$request_uri;
}

# for https (change 'sslcert' to your own certificate name)
server {
    listen 443 ssl;
    server_name ~^www\.(?P<domain>.+)$;
    ssl_certificate /etc/ssl/certs/sslcert.crt;
    ssl_certificate_key /etc/ssl/private/sslcert.key;
    return 301 $scheme://$domain$request_uri;
}

For newer PCRE versions:
Instead of ?P<domain> use ?<domain>

Note: in ‘return XXX’, 301 is like rewrite…permanent and 302 is like
rewrite…redirect
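
As a rough illustration of that note, the two forms below should produce the same permanent redirect. The host name example.test is a placeholder and does not come from this thread.

```nginx
server {
    listen 80;
    server_name www.example.test;

    # return form: 301 is permanent, 302 would be temporary.
    return 301 $scheme://example.test$request_uri;

    # rewrite form with the same effect (use one or the other).
    # The trailing "?" stops nginx from appending the query string a
    # second time, since $request_uri already contains it.
    # rewrite ^ $scheme://example.test$request_uri? permanent;
}
```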

M.

2012/2/26 António P. P. Almeida [email protected]:

location /ws-test {
return 200 “{uri: $uri, ‘service name’: ‘this is a service’}\n”;
}

Is this some kind of magic? :o

Where’s the documentation? :(

I still can’t seem to get this working. I upgraded my PCRE libraries and
recompiled/reinstalled a fresh nginx 1.0.12

pcrecheck

PCRE version 8.21 2011-12-12

Here are my server sections. Notice I have 2 server sections…the 1st
section catches the WWW site and redirects it to the 2nd,
non-www…right? I’m still getting:
nginx: [emerg] unknown “domain” variable

server {
    listen 80;
    server_name ^~www\.(?<domain>.*)$;
    return 301 http://$domain;
}

server {
    listen 80;
    server_name ^~(?<domain_name>[^.]*)\.(?<tld>[^.]*)$;
    location / {
        proxy_pass http://websites;
    }
}

When I try it with the P, everything (www and nonwww) gets a white 301
nginx page:
server {
    listen 80;
    server_name ^~www\.(?P<domain>.*)$;
    return 301 $scheme://$domain$request_uri;
}

server {
    listen 80;
    server_name _;
    location / {
        proxy_pass http://websites;
    }
}

I tried making server_name in the 2nd block:
server_name ^~(?P<domain_name>[^.]*)\.(?<tld>[^.]*)$;

but I get this:
nginx: [emerg] invalid server name or wildcard
"^~(?p<domain_name>[^.]*)\.(?<tld>[^.]*)$" on 0.0.0.0:80
(fyi, the error has a lowercase p, server_name has it capitalized)

Is there some other dependency I’m missing or am I just mangling the
syntax?

thanks,
altimage


No Luck…I’m still getting this:

nginx: [emerg] unknown “domain” variable.

thanks,
altimage


On 27 Feb 2012 00h39 CET, [email protected] wrote:

proxy_pass http://websites;
location / {
(fyi, the error has a lowercase p, server_name has it capitalized)

Is there some other dependency I’m missing or am I just mangling the
syntax?

Ok. It seems that your PCRE library has problems with the non P syntax
for named captures. So you cannot mix both.

server {
    listen 80;
    server_name ^~www\.(?P<domain>.*)$;
    return 301 $scheme://$domain$request_uri;
}

server {
    listen 80;
    server_name ^~(?P<domain_name>[^.]*)\.(?P<tld>[^.]*)$;
    location / {
        proxy_pass http://$domain_name.$tld;
    }
}

This should work [1].

— appa

[1] Server names

On 27 Feb 2012 00h39 CET, [email protected] wrote:

proxy_pass http://websites;
location / {
(fyi, the error has a lowercase p, server_name has it capitalized)

Is there some other dependency I’m missing or am I just mangling the
syntax?

Oops. I erroneously switched the ‘^’ and ‘~’. It’s ~^ not ^~. Solly :(

Ok. It seems that your PCRE library has problems with the non P syntax
for named captures. So you cannot mix both.

server {
    listen 80;
    server_name ~^www\.(?P<domain>.*)$;
    return 301 $scheme://$domain$request_uri;
}

server {
    listen 80;
    server_name ~^(?P<domain_name>[^.]*)\.(?P<tld>[^.]*)$;
    location / {
        proxy_pass http://$domain_name.$tld;
    }
}

This should work [1].

— appa

[1] Server names

On 27 Feb 2012 07h33 CET, [email protected] wrote:

Your solution, while syntactically correct, is wrong by design.
What you created there is an open anonymizing proxy that will pass
any request from anyone to any host:port combination that contains
only the domain name and the TLD, if a functional resolver has been
set up using the resolver directive. Take a guess what this would
do:

This deals with illegal Host headers:

server {
    listen 80 default_server;
    server_name _;
    server_name_in_redirect off;
    return 444;
}

— appa

That did the trick! Thank you so much for all your help.

altimage


On 27 February 2012, 14:13, António P. P. Almeida [email protected] wrote:

server_name ~^(?P<domain_name>[^.]*)\.(?P<tld>[^.]*)$;
only the domain name and the TLD, if a functional resolver has been
}
If by deals you mean gives a card to every player who wants one,
then you are correct. :P But it does nothing to close that open
anonymizing proxy you created with the previous server block,
anyone can still use your frontend server as an open anonymizing
proxy to access any domain.tld:port they want, including fbi.gov:22.

Besides, server_name_in_redirect is off by default. Moreover,
it’s completely useless in that server block because you’re just
dropping the connection anyway. This would have been just
as useful:

proxy_set_header Warning "CPU cycle wasting in progress...";

As for illegal Host headers, nginx takes care of those on its
own and returns error code 400 without such blocks. The
purpose of such blocks is to catch everything else that is not
matched by defined server names. In your case, the other two
server blocks already match any requests that have the Host
header set to start with www or contain a domain.tld type
of hostname, so your latest server block just catches everything
else (requests with missing Host headers, IP addresses,
nonwwwhostname.domain.tld hostnames etc.).

To put it simply - your configuration is wrong and should not
be used, unless you want to “deal with” the FBI in the near
future.

Max

On 28 Feb 2012 04h47 CET, [email protected] wrote:

Your solution, while syntactically correct, is wrong by design.
server_name _;
Besides, server_name_in_redirect is off by default. Moreover,
it’s completely useless in that server block because you’re just
dropping the connection anyway. This would have been just
as useful:

That was set to off by default in 0.8.48.

nonwwwhostname.domain.tld hostnames etc.).
Illegal in the sense of being relative to undefined/unauthorized
hosts. That’s what I meant. I use a similar vhost in all my setups.

To put it simply - your configuration is wrong and should not
be used, unless you want to “deal with” the FBI in the near
future.

  1. The OP didn’t request anything like you said.

  2. If he requested such, that could have been dealt with using a
    simple map with hostnames and an if at the server level.

  3. IIRC he hasn’t said how his exact setup works. He could have in
    place network policies that disable the usage of the servers as
    open proxies.

  4. You’re just trolling. Like you trolled other people before
    me. People that have been working on Nginx for quite some time, and
    that have real accomplishments, besides trolling and posing as
    “experts”.

  5. I won’t engage you ever again. My mistake.

HAND,
— appa

On 27 February 2012, 04:41, António P. P. Almeida [email protected] wrote:

non-www…right? I’m still getting: nginx: [emerg] unknown “domain”
server_name ^~(?<domain_name>[^.]*)\.(?<tld>[^.]*)$;
listen 80;
nginx: [emerg] invalid server name or wildcard

proxy_pass http://$domain_name.$tld;
}
}

This should work [1].

Your solution, while syntactically correct, is wrong by design.
What you created there is an open anonymizing proxy that will pass
any request from anyone to any host:port combination that contains
only the domain name and the TLD, if a functional resolver has been
set up using the resolver directive. Take a guess what this would do:

$ nc frontend 80
GET /a/clue HTTP/1.0
Host: fbi.gov:22

You should never pass unsanitized user input to proxy_pass, unless
you want people to abuse your open anonymizing proxy for illegal
activities that will get you in trouble. Good luck convincing
the FBI that your incompetence was the real culprit.

Moreover, the frontend server will pass all requests for
“http://$domain_name.$tld” that would have normally been passed
on to the backend server on to itself to create a nasty loop,
unless you happen to have split horizon DNS set up with the
resolver set to the internal DNS server that maps the value
of “$domain_name.$tld” to an internal IP. But if you had that kind
of setup, you’d use it to do the mapping in the first place instead
of doing what you’ve been trying to do.

This is what your solution does if a functional resolver has been
set up:

http://www.domain.tld → status code 301 with “Location: http://domain.tld”

http://own-domain.tld → proxy_pass LOOP to http://own-domain.tld

http://foreign-domain.tld:port → OPEN ANONYMIZING PROXY to foreign-domain.tld:port

If no resolver has been set up, proxy_pass will fail due to being
unable to resolve the value of “$domain_name.$tld” for any request
that contains only the domain name and the TLD.

Here’s one of the correct ways to do what the OP wants to do:

map $http_host $wwwless_http_host {
    hostnames;
    default $http_host;
    ~^www\.(?P<domain>.*)$ $domain;
}

server {
    listen 80 default_server;
    server_name _;

    location / {
        proxy_set_header Host $wwwless_http_host;
        proxy_pass http://backend;
    }
}

It would be a good idea to also allow only hosts and domains that you
actually host, which could be done like this:

map $http_host $own_http_host {
    hostnames;
    default 0;
    include nginx.own-domains.map;
}

server {
    listen 80 default_server;
    server_name _;

    if ($own_http_host = 0) {
        # Not one of our hosts / domains, so terminate the connection
        return 444;
    }

    location / {
        proxy_set_header Host $own_http_host;
        proxy_pass http://backend;
    }
}

The nginx.own-domains.map file would contain entries such as:

.domain.org domain.org; # map *.domain.org to domain.org
www.another.net another.net; # map only www.another.net to another.net

This file could be generated automatically from DNS zone files,
so it would be easy to maintain.
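
The configuration above rewrites the Host header on the way to the backend but never sends the 301 the original question asked for. A possible variation is sketched below, reusing the same map and adding a redirect for known hosts that arrive in non-canonical form. This is an untested sketch and not part of the original reply; it assumes nginx.own-domains.map maps every accepted host name to its canonical form, and that clients do not send a port in the Host header.

```nginx
map $http_host $own_http_host {
    hostnames;
    default 0;
    include nginx.own-domains.map;
}

server {
    listen 80 default_server;
    server_name _;

    # Not one of our hosts / domains, so terminate the connection.
    if ($own_http_host = 0) {
        return 444;
    }

    # Known host, but not in canonical form (e.g. www.another.net):
    # redirect permanently to the canonical name.
    if ($http_host != $own_http_host) {
        return 301 $scheme://$own_http_host$request_uri;
    }

    location / {
        proxy_set_header Host $own_http_host;
        proxy_pass http://backend;
    }
}
```

Note that with a wildcard entry such as .domain.org, every subdomain would be redirected to domain.org rather than proxied, which may or may not be what you want.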

Max