Hello,
when a bot visits our page, we want to create a response that is different
from the response for a human. In particular, we want to limit the
hierarchies of menus so that the bot doesn't think there are too many
tags/keywords for the site (and thus they don't get indexed). We also do
not display ads for bots.
We cache most of our pages using standard Rails caching.
The problem is that when a bot visits the site, it will get the standard
cached page with the incorrect menus. We want to work around this so that
bots do not get cached pages. (Yes, it would be good if the bots got a
bot-specific cached page, but let's keep it simple for now.)
We tried the following:
1) Using an Apache rewrite to add /robot to the URL, so that mongrel will
never find the cached page on disk. The problem here is that all the links
in the page end up with /robot in front of them, because having Apache add
/robot to the URI results in the PATH_INFO as seen by Rails having /robot
in it. Another problem was that we effectively duplicated the set of rules
in routes.rb for the cases with /robot as a prefix.
We have 2 ideas:
1) When Apache detects a bot, send the request to a non-caching web server.
Does anyone know of one?
2) I edited the mongrel source code in mongrel-1.1.4/lib/mongrel/rails.rb
and added this kind of thing to the process method:
    do_not_cache = KNOWN_ROBOT_AGENTS.detect { |b|
      user_agent.downcase.include?(b) } if user_agent
And then used that variable in the tests for @files.can_serve(...)
This works, but we still want mongrel to serve static files as cached (the
rules above can take care of this too; it just gets more complicated to
check for /stylesheets, /images etc.).
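A minimal sketch of the combined check described above, assuming the robot list is a set of lowercase substrings and that static assets live under a few known prefixes. The contents of KNOWN_ROBOT_AGENTS are not shown in the post, so the entries here are placeholders, and STATIC_PREFIXES and the helper names are hypothetical:

```ruby
# Placeholder entries: the post does not show what KNOWN_ROBOT_AGENTS holds.
KNOWN_ROBOT_AGENTS = %w[googlebot slurp msnbot].freeze

# Hypothetical prefixes for files that should always be served from disk.
STATIC_PREFIXES = %w[/stylesheets/ /javascripts/ /images/].freeze

# True when the User-Agent header contains one of the known robot substrings.
def robot?(user_agent)
  return false if user_agent.nil?
  ua = user_agent.downcase
  KNOWN_ROBOT_AGENTS.any? { |bot| ua.include?(bot) }
end

# True when the request path points at a static asset.
def static_asset?(path)
  STATIC_PREFIXES.any? { |prefix| path.start_with?(prefix) }
end

# True when the request should skip the page cache and be handled by Rails.
def bypass_page_cache?(path, user_agent)
  robot?(user_agent) && !static_asset?(path)
end
```

With this shape, a robot asking for /stylesheets/site.css still gets the file from disk, while a robot asking for an HTML page is passed on to Rails.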
------------
Question: is there a way to plug our own logic into the mongrel process of
handling a request? And/or can we set up a specific mongrel to never cache
(are there options for this)?
Any ideas are appreciated,
Mike
on 2009-06-11 00:54
on 2009-06-11 03:24
What kind of caches are you talking about? Are these full page caches? The
kind that get stored into /public?

My question is: instead of adding /robot to the URL when Apache finds the
robot, can you change Apache's DocumentRoot? It seems to me that this would
prevent Apache from finding the cached page. Also, if you actually point to
another copy of your /public, you could get the normal static pages...

I think perhaps, since you are talking about changing mongrel's caching
behaviour, that you aren't talking about the page caches that get stored
into /public. (Well, I'm rusty on terminology here.)

--
Michael Richardson <mcr@simtone.net>
Director -- Consumer Desktop Development, Simtone Corporation, Ottawa, Canada
Personal: http://www.sandelman.ca/mcr/
SIMtone Corporation fundamentally transforms computing into simple, secure,
and very low-cost network-provisioned services pervasively accessible by
everyone. Learn more at www.simtone.net and www.SIMtoneVDU.com
on 2009-06-11 03:49
I am talking about Rails' standard page-caching mechanism. Rails by default
puts full pages into public/... and if mongrel sees them there, it serves
them (without running Rails dispatch et al). This is fine for normal
visitors but not good for the bot user agents.

Here is a new solution:
1) Set Rails to cache in public/cache.
2) Use Apache rewrite to serve these files directly (if found).
3) If not found, pass to mongrel, which will not find the cached files
either, since MONGREL ONLY LOOKS IN public for cached files. Mongrel does
not honor the config.action_controller.page_cache_directory Rails setting.
4) Rails processes the request and puts the page into public/cache/...
...on the next request, Apache serves from cache.

I am working on the rewrite rules etc. for this.
Mike
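The lookup in steps 2 and 3 can be sketched in Ruby to make the decision flow explicit. This is only an illustration of the logic the rewrite rules would need, not Apache or Mongrel code. It assumes the robot check happens before the cache lookup and that cached files are named the way Rails page caching normally names them ("/" as index.html, other paths with .html appended). The function names and the cache_root argument are hypothetical:

```ruby
# Maps a request path to a page-cache file name: "/" becomes "/index.html",
# and paths without an extension get ".html" appended.
def page_cache_file(path)
  name = (path.empty? || path == "/") ? "/index" : path.chomp("/")
  File.extname(name).empty? ? "#{name}.html" : name
end

# Returns the cached file to serve, or nil when the request should be passed
# on to mongrel. cache_root stands in for public/cache; the caller decides
# whether the request comes from a robot.
def cached_response_path(cache_root, path, robot)
  # Robots never get the cached page; ".." is rejected so the lookup
  # cannot leave cache_root.
  return nil if robot || path.include?("..")
  file = File.join(cache_root, page_cache_file(path))
  File.file?(file) ? file : nil
end
```

A nil result corresponds to step 3: the request goes to mongrel, which finds nothing under public and lets Rails render the page.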
on 2009-06-11 04:22
>>>>> "Mike" == Mike Papper <bodaro@gmail.com> writes:
    Mike> I am talking about Rails standard page-caching mechanism. Rails by default
    Mike> puts full pages into public/... and if mongrel sees them there, it serves
    Mike> them (without running rails dispatch et al). This is fine for normal but not
    Mike> good for the bot user agents.

Right, so that's what I thought you were talking about.
Only, it's not mongrel that serves up the pages, but Apache, usually.
In your Apache config, you have something like:

    # Rewrite all non-static requests to cluster
    RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} !-f
    RewriteRule ^/(.*)$ balancer://spartan_cluster%{REQUEST_URI} [P,QSA,L]

which basically serves up any files found in /public, otherwise, punts to
the mongrel. I thought that Rails put the files directly there for Apache
to use/see. (There are caveats if your mongrel and Apache do not share the
same file system, such as because they are on different machines.)

If you are telling me that actually mongrel does this, it's news to me.

    Mike> Here is a new solution:
    Mike> 1) set rails to cache in public/cache
    Mike> 2) Use Apache rewrite to serve these files directly (if found)
    Mike> 3) If not found, pass to mongrel which will not find the cached files either
    Mike> since MONGREL ONLY LOOKS IN public for cached files. Mongrel does not honor
    Mike> the config.action_controller.page_cache_directory rails setting
    Mike> 4) Rails processes the file and puts it into public/cache/...
    Mike> ...on the next request, apache serves from cache.

    Mike> I am working on the reqwrite rules etc. for this.

So, basically have Apache pick a different cache location when it sees a
robot.

--
Michael Richardson <mcr@simtone.net>
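The idea of picking a different location per audience can be sketched as follows. This mirrors only the "!-f" file test from the rewrite rule quoted above, in plain Ruby rather than Apache configuration. The directory names and the route function are hypothetical; the thread does not say where a robot-specific root would live:

```ruby
# Hypothetical document roots: one for regular visitors, one for robots.
DOCUMENT_ROOTS = { :human => "public", :robot => "public_robots" }.freeze

# Mirrors the "!-f" test: a file that exists under the chosen root is served
# directly, anything else is handed to the mongrel cluster.
def route(request_path, robot, roots = DOCUMENT_ROOTS)
  root = roots.fetch(robot ? :robot : :human)
  candidate = File.join(root, request_path)
  File.file?(candidate) ? [:static, candidate] : [:proxy, request_path]
end
```

Because the robot root holds no page-cached HTML, the same path that is served statically to a human falls through to mongrel for a robot. Static assets would have to be copied or linked into the robot root to keep being served from disk.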
on 2009-06-11 04:45
"Serving up different results based on user agent may cause your site to be perceived as deceptive and removed from the Google index." http://www.google.com/support/webmasters/bin/answe...