Proxy_cache doesn't return headers

Hello,

We run a fairly busy phpBB-based forum. I found that much of its load
comes from bots (Google, AdSense, Yahoo; the AdSense bot seems to
duplicate every user's request). I added configuration to cache all
dynamic requests and serve the cached copies to bots only (see the
config below). But I ran into a strange issue: when a page is returned
from the cache, nginx doesn't send any headers back.

So my questions are:

  • Is there anything wrong with the response being treated as HTTP/0.9?
  • Is it possible to configure nginx to return headers when a page is
    served from the cache, so that the response is treated as HTTP/1.1?
  • It seems that the cache is refreshed every time a page is hit by a
    user (not a bot). Is it possible to configure nginx not to refresh a
    cached page while it is still valid (72h in the config), to avoid
    unnecessary I/O?

Here is how it looks in logs
{{{
66.249.66.89 - - [03/Mar/2011:12:02:51 +0200] "GET /viewtopic.php?f=112&t=73799&p=2363346 HTTP/1.1" 009 175820 "-" "Mediapartners-Google"
}}}

Here is how the response looks in
wget -d --user-agent="Mediapartners-Google" ...
{{{
HTTP request sent, awaiting response...
---response begin---
---response end---
200 No headers, assuming HTTP/0.9
Length: unspecified
Saving to: `index.php.3'
}}}

Config sample

{{{

    set $crawlernocache 1;
    if ($http_user_agent ~ ".*Google.*"){
     set $crawlernocache 0;
    }

    if ($http_user_agent ~ ".*Yandex.*"){
     set $crawlernocache 0;
    }


  ...


    location / {
        proxy_pass  http://127.0.0.1:80;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host $http_host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Accept-Encoding "";
        proxy_send_timeout 300;
        set $delimeter +;
        proxy_ignore_headers "Expires" "X-Accel-Expires" "Cache-Control";
        proxy_cache_bypass $crawlernocache;
        proxy_cache crawlercache;
        proxy_cache_key $host$uri$arg_f$delimeter$arg_t$delimeter$arg_start$delimeter$arg_p;
        proxy_cache_valid 200 72h;
    }

}}}

Posted at Nginx Forum:

Hello!

On Thu, Mar 03, 2011 at 05:23:45AM -0500, Alexander Zuban wrote:

  • Is there anything wrong with the response being treated as HTTP/0.9?
  • Is it possible to configure nginx to return headers when a page is
    served from the cache, so that the response is treated as HTTP/1.1?
  • It seems that the cache is refreshed every time a page is hit by a
    user (not a bot). Is it possible to configure nginx not to refresh a
    cached page while it is still valid (72h in the config), to avoid
    unnecessary I/O?

[…]

            proxy_cache_bypass $crawlernocache;

You are using proxy_cache_bypass without an identical proxy_no_cache,
which is known to cause problems (and that's why you see HTTP/0.9
replies). You have to write something like

              proxy_cache_bypass $crawlernocache;
              proxy_no_cache $crawlernocache;

instead. This way a) the HTTP/0.9 problem will be fixed and b) hits
from normal users won't touch the cache at all.
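
For reference, here is a minimal sketch of the location block from the
original post with both directives applied. It assumes that the
crawlercache zone is declared elsewhere with proxy_cache_path (not shown
in the thread) and that $crawlernocache is set as in the original config:
1 for ordinary users, 0 for crawlers. The comments are explanatory
additions, not part of the original config.

{{{
    location / {
        proxy_pass  http://127.0.0.1:80;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host $http_host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Accept-Encoding "";
        proxy_send_timeout 300;

        set $delimeter +;
        proxy_ignore_headers "Expires" "X-Accel-Expires" "Cache-Control";

        proxy_cache crawlercache;
        proxy_cache_key $host$uri$arg_f$delimeter$arg_t$delimeter$arg_start$delimeter$arg_p;
        proxy_cache_valid 200 72h;

        # A value that is non-empty and not "0" means: do not answer
        # this request from the cache.
        proxy_cache_bypass $crawlernocache;

        # The same condition means: do not save this response to the
        # cache. Keeping it identical to proxy_cache_bypass stops
        # ordinary user requests from rewriting cached entries.
        proxy_no_cache $crawlernocache;
    }
}}}

With this pairing, only crawler requests ($crawlernocache = 0) read from
or populate the cache, so an entry should stay in place until its 72h
validity expires.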

Maxim D.

Maxim thank you!

You are right, this is what I did as well (and it seems to have started
returning headers). It seems it will be simpler to prefetch recent pages
into the cache during the night.

Regards,
Alexander
