Forum: Ruby on Rails robots.txt best practices

Steve Odom (Guest)
on 2006-02-10 16:42
(Received via mailing list)
I'd been ignoring this error message in my log for a while:

ActionController::RoutingError (Recognition failed for "/robots.txt"):

I had never touched robots.txt. So I decided to make it a proper
robots.txt file.

I found this great article...
http://www.ilovejackdaniels.com/seo/robots-txt-file/

...where Dave explains the ins and outs of the file.

Before I changed mine, I thought I'd poll the group to see if anyone
had any good thoughts on the subject, like any Rails-specific excludes,
and whether some samples could be posted.

Mine was going to look like this:
User-agent: *
Disallow: /404.php
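
(For the record, in a Rails app the file normally lives in
public/robots.txt so it gets served as a static file and the
RoutingError above goes away. A slightly fuller sketch of what I had in
mind follows; the extra Disallow paths are only hypothetical examples,
not recommendations for any particular app:)

User-agent: *
Disallow: /404.php
Disallow: /admin/
Disallow: /login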

Thanks,

Steve
http://www.smarkets.net
softwareengineer 99 (Guest)
on 2006-02-10 17:42
(Received via mailing list)
Hi Steve,

  I would like to warn you about the issue of abusive bots, i.e. bots
that do not obey robots.txt. Such bots can really eat up your bandwidth
fast.

  On one of my servers, the total bandwidth usage last October was
2300 GB (serving just text and images). A detailed log scan showed
that most of that bandwidth was used by bots in Asian countries.

  Scraping is really "hot" these days, so you will want to identify
the various abusive bots and include them in your robots.txt file.

  Since I manage multiple domains on dedicated clusters, the solution
for me was to ban these bots using mod_rewrite. If you would like, I
can post a copy of my mod_rewrite banned user agents.
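
  To give a rough idea (not my actual list -- the user agents below are
only placeholders), a mod_rewrite ban of this kind might look something
like:

RewriteEngine On
# placeholder user agents -- substitute the real offenders from your logs
RewriteCond %{HTTP_USER_AGENT} ^BadBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EvilScraper [NC]
RewriteRule .* - [F,L]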

  I recommend putting a crawl delay in your robots.txt file (if you
have a large website), otherwise bots like MSN can hit your site hard:


User-agent: *
Crawl-delay: 17


  Not that this answers your question, but I thought it may help.

  Frank



Tony Collen (Guest)
on 2006-02-10 18:02
(Received via mailing list)
Yes, keep in mind that robots.txt is just a *suggestion* -- no bot is
required to follow it.

Another easy way of banning abusive bots is to use Allow/Deny rules in
your Apache configuration and ban them by IP or subnet, e.g.

Deny from www.xxx.yyy.zzz

http://www.brainstormsandraves.com/archives/2005/1...
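
A minimal sketch of how that might look in .htaccess (or inside a
<Directory> block in httpd.conf); the addresses below are just
documentation placeholders, not real offenders:

# deny a single IP or a whole subnet; everyone else stays allowed
Order Allow,Deny
Allow from all
Deny from 10.20.30.40
Deny from 192.0.2.0/24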

Tony
softwareengineer 99 (Guest)
on 2006-02-10 19:03
(Received via mailing list)
Hello Kevin,

      Thank you for your reply.

      Here is a mini tutorial I created on how to install/configure
mod_security for Apache:

      http://frankmash.blogspot.com/2005_12_09_frankmash...

      Here I have posted my current useragents.conf files and a sample
of how to ban using .htaccess/mod_rewrite, as well as links to a great
three-part discussion on WebmasterWorld:
      http://frankmash.blogspot.com/2006/02/banning-abus...

      And finally, I also posted a current copy of my blacklist.conf
file for mod_security:
      http://network-security-blacklists.blogspot.com/20...
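
      To give a rough idea of the format, here is a minimal sketch of
what a ModSecurity 1.x user-agent rule might look like (the bot name is
only a placeholder -- see the link above for the actual blacklist):

# placeholder rule -- ModSecurity 1.x syntax
SecFilterEngine On
SecFilterDefaultAction "deny,log,status:403"
SecFilterSelective HTTP_USER_AGENT "EvilScraper"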

      Please feel free to ask any question you may have regarding this.

      Hope this helps.

      Frank


    Kevin Skoglund <kevin@pixelandpress.com> wrote:
    Please do post (or send just to me) your mod_rewrite banned user
    agents.  That would be very helpful.

    Thanks,
    Kevin