Googlebots - sitemaps

I am close to launching a web site and so I have directed Google (via
Webmaster Tools) to index parts of the site via a sitemap (a rather
interesting approach to testing I might add).

Anyway, there are places where I require a login, and if the action fails to find a logged-in user it generally does…
request.env["HTTP_REFERER"]   # redirect_to :back relies on this header
redirect_to :back
return false

Now, Googlebot doesn't send a Referer header, so request.env["HTTP_REFERER"] is empty, the redirect blows up, and my exception handler dutifully sends me an e-mail.

How do people handle these things?

Should I be doing something like…

my_url = request.env["HTTP_REFERER"] || { :action => 'index' }
redirect_to my_url
return false
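
i.e., wrapped up as a filter, something like this (just a sketch of the idea; require_login and logged_in? are placeholders for whatever I actually use):

# Sketch only (Rails 2.x-style); logged_in? stands in for my real auth check.
def require_login
  return true if logged_in?
  flash[:notice] = "Please log in first"
  # Fall back to the index action when there is no Referer (e.g. bots)
  redirect_to(request.env["HTTP_REFERER"] || { :action => 'index' })
  false
end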

I am sure that, at least at this point, I still want error mails on 404s to be sent, just as a sanity check.

Craig



Maybe I got you wrong, but wouldn't a rel="nofollow" on your login links be a solution for this situation?
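
For example, in an ERB view (just a sketch; login_path is a placeholder for whatever your login link actually points at):

<%# rel="nofollow" asks well-behaved crawlers not to follow the link %>
<%= link_to "Log in", login_path, :rel => "nofollow" %>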

On Fri, 2010-02-12 at 15:40 +0100, Tom Ha wrote:

Maybe I got you wrong, but wouldn't a rel="nofollow" on your login links be a solution for this situation?


I think that for this usage, it’s not appropriate.

In essence, I am providing a sitemap for all of the valid controllers/actions, but there are a few that require a login; if there is no logged-in user, the visitor gets a redirect_to :back with a flash[:notice] explanation. But this doesn't work for Googlebot, since it doesn't supply a Referer header: request.env["HTTP_REFERER"] is empty, the bot is redirected to nowhere, and that generates an error for the bot and sends me an e-mail… neither of which is particularly useful.

I suspect that even if I were to remove the specific controllers/actions that require a valid login from the sitemap, bots would eventually find them anyway, so I think the better long-term approach is to handle them in the methods now.
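
One way to do that in the methods might be to rescue the exception that redirect_to :back raises when there is no Referer (a rough sketch for Rails 2.x; require_login, logged_in? and the fallback action are placeholders, not my actual code):

# In ApplicationController -- sketch only; logged_in? stands in for the real check.
def require_login
  return true if logged_in?
  flash[:notice] = "Please log in first"
  begin
    redirect_to :back
  rescue ActionController::RedirectBackError
    # Bots (and typed-in or bookmarked visits) send no Referer header
    redirect_to :action => 'index'
  end
  false
end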

I figured that there are people who have run into this before on the
list and might have some suggestions.

Craig



On Fri, Feb 12, 2010 at 6:34 AM, Craig W. [email protected]
wrote:

I am close to launching a web site and so I have directed Google (via
Webmaster Tools) to index parts of the site via a sitemap (a rather
interesting approach to testing I might add).

Um, well, there are other link-checkers available, btw, but…

Anyway, there are places where I require a login and generally if it
fails to find a logged in user, it does…
request.env["HTTP_REFERER"]
redirect_to :back
return false

That seems like a really bad idea to me: Referer isn't a required header, period. Some paranoid clients won't send one at all, and if, for instance, a legitimate user types in the URL or invokes it from a bookmark, there won't be a referer.

Most apps redirect to a login page if login is required. :)
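
Something along these lines is the usual pattern (just a sketch from a Rails 2.x-era app; logged_in?, login_path, and the session key are assumptions, not anything from your code):

# In ApplicationController -- sketch only.
def require_login
  return true if logged_in?
  session[:return_to] = request.request_uri   # so you can send them back after login
  flash[:notice] = "Please log in to continue"
  redirect_to login_path
  false
end

# then in any controller that needs it:
# before_filter :require_login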

FWIW,

Hassan S. ------------------------ [email protected]
twitter: @hassan

Craig W. wrote:

I am close to launching a web site and so I have directed Google (via
Webmaster Tools) to index parts of the site via a sitemap (a rather
interesting approach to testing I might add).
[…]

And perhaps not a good one. Do you really want your not-yet-launched
pages in Google’s index before launch?

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

On Fri, 2010-02-12 at 17:54 +0100, Marnen Laibow-Koser wrote:

Craig W. wrote:

I am close to launching a web site and so I have directed Google (via
Webmaster Tools) to index parts of the site via a sitemap (a rather
interesting approach to testing I might add).
[…]

And perhaps not a good one. Do you really want your not-yet-launched
pages in Google’s index before launch?


Yes, because from the visitors' perspective it is mostly finished, and it takes time for the bots to index the site and (hopefully) list it in searches.

I am mostly concerned with handling 'routing' issues caused by things like redirect_to :back, which doesn't work for bots because they can't (or won't) provide any useful value in request.env["HTTP_REFERER"].

Craig



On Feb 12, 2010, at 11:54 AM, Marnen Laibow-Koser wrote:



Of course! You want to give Google (and other bots) time to find and index your site so that it's primed for launch. The timing is the only tricky part. I had to redesign a "deep" page to act more like an entry page because so many visitors first arrived there from Google search results.

-Rob

Rob B. http://agileconsultingllc.com
[email protected]