Problem with rewrite last giving an HTTP 301?

Hello,

I’m having a weird problem with my website.
In my nginx conf, I have this rule:
rewrite ^/robots.txt$ /cms/robotstxt.php last;

and a location to handle PHP files:
location ~ \.php$ {
    fastcgi_pass 127.0.0.1:9000;
    fastcgi_index index.php;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    include fastcgi_params;
}

When I visit robots.txt, I get the expected result (a dynamically
generated robots.txt). If I check the nginx log file, I see the expected
HTTP 200 response:
IPADDRESS - - [21/Sep/2010:16:22:57 +0200] "GET /robots.txt HTTP/1.0"
200 231 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1;
Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR
3.0.4506.2152; .NET CLR 3.5.30729; .NET4.0C; .NET4.0E)"
If I check with Fiddler (an HTTP debugger), everything is also OK: I
get my robots.txt file with an HTTP 200 status code.

According to the logs, most requests get this 200 status code.

Now, here is the problem: sometimes, for reasons I don’t understand,
nginx sends a 301 instead of the 200. Here’s an example of Googlebot
visiting my website:
66.249.65.168 - - [21/Sep/2010:12:02:09 +0200] "GET /robots.txt
HTTP/1.0" 301 178 "-" "Googlebot-Image/1.0"
66.249.65.166 - - [21/Sep/2010:12:02:09 +0200] "GET /cms/robotstxt.php
HTTP/1.0" 200 231 "-" "Googlebot-Image/1.0"

Of course, I don’t want bots or people to visit this /cms/robotstxt.php
page directly…

Tonight, for the first time, I also found this problem with another
rewrite rule. I have a rule like this:
rewrite "^/([0-9]+)-([a-z0-9-]*)-([a-z]{2})$"
"/cms/pages.php?id=$1;title=$2;language=$3" last;
Normally I get an HTTP 200 status code when I visit one of my pages.
But I discovered some weird log entries from the Yandex bot:
95.108.151.244 - - [21/Sep/2010:21:20:56 +0200] "GET /123-mypage-fr
HTTP/1.0" 301 178 "-" "Mozilla/5.0 (compatible; YandexBot/3.0;
MirrorDetector; +http://yandex.com/bots)"

I’ve also tried fetching the robots.txt file from Google Webmaster
Tools, and I got a correct HTTP 200 response.

Does anybody know what the problem could be? I have no idea how to
reproduce these strange results with my browser, wget, or any other
tool.

I’m using nginx 0.7.67, the version packaged with 32-bit Debian
Squeeze.

Thanks.

Posted at Nginx Forum:
http://forum.nginx.org/read.php?2,132723,132723#msg-132723

Hello!

On Tue, Sep 21, 2010 at 06:10:05PM -0400, toto2008 wrote:

> I’ve also tried fetching the robots.txt file from Google Webmaster
> Tools, and I got a correct HTTP 200 response.
>
> Does anybody know what the problem could be? I have no idea how to
> reproduce these strange results with my browser, wget, or any other
> tool.

Most likely it’s your CMS code, which detects something in the bots’
requests (e.g. a wrong Host header) and issues a redirect.

Either look into its code or try logging something like
$upstream_http_location; it should give you a better idea of
what’s going on.
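
A minimal sketch of such logging, not taken from the thread: the format
name "upstream_debug" and the log file path are assumptions chosen for
illustration. It records the Host header the client sent next to the
status and Location header returned by the FastCGI backend, so a
redirect issued by the PHP code can be told apart from one issued by
nginx itself.

```nginx
# In the http {} block: define a log format that exposes the upstream reply.
# "upstream_debug" is a hypothetical name.
log_format upstream_debug '$remote_addr [$time_local] "$request" $status '
                          'host="$host" http_host="$http_host" '
                          'upstream_status=$upstream_status '
                          'upstream_location="$upstream_http_location"';

# In the server {} block: write requests to a separate file in that format.
# The path below is an assumption; adjust it to the local layout.
access_log /var/log/nginx/redirect_debug.log upstream_debug;
```

If a 301 entry shows an empty upstream_location, the backend did not
send the redirect, which points to nginx or to a different server block.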

Maxim D.

Hello Maxim,

Thanks a lot for your answer!

I wrote the CMS myself, and I had already changed its few 301 redirects
to 302 to be sure they weren’t the problem.
So I’ll log $upstream_http_location and report back when the problem
happens again.
In my PHP code, I’ll also log the request headers, so maybe I can find
what triggers that strange behavior.

Olivier

Posted at Nginx Forum:
http://forum.nginx.org/read.php?2,132723,132862#msg-132862

Hello,

Thanks again, Maxim! With your help I’ve discovered that the bots that
were redirected with a 301 weren’t requesting my main domain
(something.fr) but another domain I own (something.com).
I had simply set up the .com to redirect to the .fr, but I wasn’t using
it. Anyway, somehow some bots were visiting it… I’ll fix this
redirection problem in my configuration file so it won’t happen anymore.
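
The thread does not show the final configuration, so the following is
only a sketch of one common way to handle a secondary domain, under
stated assumptions. The earlier log excerpt shows Googlebot requesting
/cms/robotstxt.php right after the 301, which suggests the redirect was
built from the internally rewritten URI; that is an inference, not
something confirmed here. Giving the .com its own server block and
redirecting with $request_uri (the URI as the client sent it) keeps
internal paths out of the Location header. The rewrite form is used
because it works on nginx 0.7.x.

```nginx
# Hypothetical server block for the secondary domain only.
server {
    listen 80;
    server_name something.com;

    # Redirect before any internal rewrite runs. $request_uri already
    # contains the query string; the trailing "?" stops nginx from
    # appending it a second time.
    rewrite ^ http://something.fr$request_uri? permanent;
}

# Main site: the internal rewrite from the original post stays here.
server {
    listen 80;
    server_name something.fr;

    rewrite ^/robots.txt$ /cms/robotstxt.php last;

    # ... existing PHP location and other directives ...
}
```

With this layout, a request for http://something.com/robots.txt is
redirected to http://something.fr/robots.txt, and only then rewritten
internally to the PHP script.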

Olivier

Posted at Nginx Forum:
http://forum.nginx.org/read.php?2,132723,134250#msg-134250