I would like to put a brake on spiders which are hammering a site with
dynamic content generation. They should still get to see the content,
but not generate excessive load. I therefore constructed a map to
identify spiders, which works well, and then tried to
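Such a map might look like the following sketch; the User-Agent patterns are illustrative, not a complete list:

```nginx
# Classify clients by User-Agent; the patterns here are examples only.
map $http_user_agent $is_spider {
    default        "";
    ~*googlebot    "spider";
    ~*bingbot      "spider";
    ~*baiduspider  "spider";
}
```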
On Mon, Oct 14, 2013 at 09:25:24AM -0400, Sylvia wrote:
Doesn't the robots.txt “Crawl-Delay” directive satisfy your needs?
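For reference, the directive goes into robots.txt like this (the delay value is an example, and support varies by crawler):

```
User-agent: *
Crawl-delay: 10
```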
I have it already there, but I don’t know how long it takes for such a
directive, or any changes to robots.txt for that matter, to take effect.
Observing the logs, I'd say the delay between changing robots.txt
and a change in robot behaviour is several days, as I cannot see
any effect so far.
Normal spiders should obey robots.txt; if they don't, they can be banned.
Banning Google is not a good idea, no matter how abusive they might be,
and they incidentally operate one of those robots which keep hammering
the site. I’d much prefer a technical solution to enforce such limits,
over convention.
I’d also like to limit the request frequency over an entire pool, so
that I can say “clients from this pool can make requests only with this
frequency, combined, not per client IP”, because it doesn’t buy me
anything if I can limit the individual search robot to a decent
frequency, but then get hammered by 1000 search robots in parallel, each
one observing the request limit. Right?
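A pooled limit of that kind can be expressed with nginx's limit_req_zone by keying the zone on a value that is the same for every member of a pool, so they all draw from one shared bucket. A sketch, where the CIDR ranges, pool tags, and rate are assumptions:

```nginx
# geo maps client addresses to a pool tag; everyone in a pool
# shares the same key and therefore the same token bucket.
geo $spider_pool {
    default        "";
    66.249.64.0/19 "pool-a";   # example range
    157.55.0.0/16  "pool-b";   # example range
}

# Requests with an empty key are not accounted, so ordinary
# clients bypass the limit entirely.
limit_req_zone $spider_pool zone=spiders:10m rate=2r/s;

server {
    location / {
        limit_req zone=spiders burst=5;
    }
}
```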
If you have any tips, that would be much appreciated!
In your map, let $is_spider be empty if it is not a spider (“default”,
presumably), and be something else if it is a spider (possibly
$binary_remote_addr if every client should be counted individually,
or something else if you want to group some spiders together.)
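Put together, that suggestion might look like this (the zone name, rate, and bot patterns are placeholders):

```nginx
map $http_user_agent $is_spider {
    default     "";                    # not a spider: empty key, never limited
    ~*googlebot $binary_remote_addr;   # count each spider IP individually
    ~*bingbot   "grouped-bots";        # or group several spiders under one key
}

limit_req_zone $is_spider zone=per_spider:10m rate=1r/s;

server {
    location / {
        limit_req zone=per_spider burst=10;
    }
}
```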
On Mon, Oct 14, 2013 at 03:23:03PM +0100, Francis D. wrote:
In your map, let $is_spider be empty if it is not a spider (“default”,
presumably), and be something else if it is a spider (possibly
$binary_remote_addr if every client should be counted individually,
or something else if you want to group some spiders together.)
thanks a bunch! This works like a charm!
Kind regards,
–Toni++