Hi all,
How can I maintain two rate limit strategies? One for spiders and one
for regular users?

I can get the IP address list of spiders from http://www.iplists.com/ .
Can I separate them by geo? Have people attempted this?
My website is being pounded by some screen scrapers and I want to block
them, but not at the risk of blocking search engine spiders.
-Quintin
> My website is being pounded by some screen scrapers and I want to block
> them, but not at the risk of blocking search engine spiders.
Do you realize that by going that way, regular users will be subject to
the same request limiting as the bad spiders? You can try further
selection on the User-Agent (see the sketch below), but bad spiders have
the habit of providing bogus UA strings.
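As a minimal sketch (not from the original setup), that UA-based
selection could be done with the map module, also at the http level; the
variable name and patterns here are only illustrative and, as noted,
trivial to forge:

map $http_user_agent $claims_to_be_spider {
    default                  0;
    # hypothetical patterns; real lists need regular maintenance
    "~*(googlebot|bingbot)"  1;
}

Treat something like this only as a secondary signal next to the
IP-based geo check below.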
At the http level:

geo $good_spider {
    default 0;
    # list all good spider IPs here
}

limit_req_zone $binary_remote_addr zone=bad_spiders:10m rate=1r/s;
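If those ranges come from something like the iplists.com data, the geo
block can also read them from an include file, so the list can be
regenerated without touching the main config. A sketch only, with a
hypothetical path and a documentation-only example range:

geo $good_spider {
    default 0;
    # hypothetical file containing lines such as "192.0.2.0/24 1;"
    include /etc/nginx/good_spiders.conf;
}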
On the vhost (server level):

location / {
    limit_req zone=bad_spiders burst=5;

    # requests flagged as good spiders return 418 below, and error_page
    # re-dispatches them to the named location without the limit
    error_page 418 @good-spiders;

    if ($good_spider) {
        return 418;
    }
    #...
}

location @good-spiders {
    # no limits here
    #...
}
--appa
>
> location / {
>     limit_req zone=bad_spiders burst=5;
>
>     error_page 418 @good-spiders;
Oops. This should be:

error_page 418 =200 @good-spiders;

Otherwise the good spiders would still get the response with a 418
status code instead of 200.
--appa