On Mon, Oct 14, 2013 at 09:25:24AM -0400, Sylvia wrote:
> Doesn’t the robots.txt “Crawl-Delay” directive satisfy your needs?
I already have it there, but I don’t know how long it takes for such a
directive, or any change to robots.txt for that matter, to take effect.
Observing the logs, I’d say the delay between changing robots.txt and a
change in robot behaviour is several days, as I cannot see any effect
so far.
> Normal spiders should obey robots.txt; if they don’t, they can be banned.
Banning Google is not a good idea, no matter how abusive they might be,
and they incidentally operate one of the robots that keep hammering
the site. I’d much prefer a technical solution to enforce such limits.
I’d also like to limit the request frequency over an entire pool, so
that I can say “clients from this pool may make requests only at this
frequency, combined, not per client IP”. It doesn’t buy me anything to
limit each individual search robot to a decent frequency if I then get
hammered by 1000 search robots in parallel, each one observing the
request limit. Right?
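To illustrate the difference between a per-IP and a pool-wide limit, here is a minimal sketch in Python, not tied to any particular web server. It assumes a “pool” can be described as a CIDR network; the pool name "crawler", the network 10.0.0.0/16 and the rate/burst values are all hypothetical. Every client in the pool draws from one shared token bucket, so 1000 parallel robots together get the same budget a single robot would.

```python
import ipaddress
import time
from typing import Dict, Optional


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last: Optional[float] = None

    def allow(self, now: float) -> bool:
        if self.last is not None:
            # Refill tokens for the elapsed time, capped at the burst size.
            self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PoolLimiter:
    """Rate limit keyed by pool (a network), not by individual client IP."""

    def __init__(self, pools: Dict[str, str], rate: float, burst: float) -> None:
        # Hypothetical pool definitions: pool name -> CIDR network.
        self.networks = {name: ipaddress.ip_network(cidr) for name, cidr in pools.items()}
        # One shared bucket per pool, not one per client address.
        self.buckets = {name: TokenBucket(rate, burst) for name in pools}

    def pool_of(self, ip: str) -> Optional[str]:
        addr = ipaddress.ip_address(ip)
        for name, net in self.networks.items():
            if addr in net:
                return name
        return None

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        if now is None:
            now = time.monotonic()
        pool = self.pool_of(ip)
        if pool is None:
            return True  # clients outside every pool are not limited here
        # All clients of the pool share the same bucket.
        return self.buckets[pool].allow(now)


if __name__ == "__main__":
    limiter = PoolLimiter({"crawler": "10.0.0.0/16"}, rate=2.0, burst=5.0)
    # 1000 distinct robot addresses, each sending one request at the same instant.
    clients = [f"10.0.{i // 256}.{i % 256}" for i in range(1000)]
    print(sum(limiter.allow(ip, now=0.0) for ip in clients))  # prints 5
```

With a per-IP bucket of the same size, each of the 1000 addresses would get its own burst of 5, i.e. up to 5000 requests at once; with the shared bucket the whole pool gets 5, then 2 per second after that.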