Token bucket to limit bots and site grabbers

Hello

Is there any module I can use to limit or deny access to bots and site
grabbers, based on the long-term request rate?

I’m thinking of a token bucket with a timeframe of hours or days, where
a legitimate user will only download, say, 50 pages (images and CSS
excluded) per day from a single IP address. Bots will obviously try to
grab more content than that. Even if they set a long delay between
requests, their overall number of requests per day will be much higher
than that of a legitimate user.
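
As a rough illustration of the scheme described above, here is a minimal
Python sketch of a per-address token bucket with a one-day refill period.
The class names, the 50-token capacity, and the continuous refill policy
are assumptions made for the example; this is not an existing nginx module.

```python
class TokenBucket:
    """Token bucket that refills continuously up to a fixed capacity."""

    def __init__(self, capacity, refill_period, now):
        self.capacity = capacity            # e.g. 50 pages
        self.refill_period = refill_period  # seconds to refill from empty
        self.tokens = float(capacity)       # every client starts with a full bucket
        self.updated = now

    def allow(self, now):
        """Spend one token if available; return True when the request may pass."""
        elapsed = now - self.updated
        refill = elapsed * self.capacity / self.refill_period
        self.tokens = min(self.capacity, self.tokens + refill)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerAddressLimiter:
    """Keeps one bucket per client address (hypothetical helper)."""

    def __init__(self, capacity=50, refill_period=86400):
        self.capacity = capacity
        self.refill_period = refill_period
        self.buckets = {}

    def allow(self, address, now):
        bucket = self.buckets.get(address)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_period, now)
            self.buckets[address] = bucket
        return bucket.allow(now)


limiter = PerAddressLimiter()
# 60 page requests arriving at the same instant from one address:
burst = [limiter.allow("192.0.2.1", now=0) for _ in range(60)]
print(burst.count(True))  # 50 pass, the remaining 10 are refused
```

A slow crawler gains nothing by spacing its requests out: once the initial
50 tokens are spent, the bucket only hands out about one new token every
29 minutes, whatever the delay between requests.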

limit_req is not what I’m looking for, because it has a short timeframe
of seconds or minutes, and because this kind of limit requires a token
bucket, not a leaky bucket.

Is there anything available, or should I write my own module?

Tobia

Hello!

On Mon, Feb 15, 2010 at 11:34:16AM +0100, Tobia C. wrote:

> limit_req is not what I’m looking for, because it has a short
> timeframe of seconds or minutes, and because this kind of limit
> requires a token bucket, not a leaky bucket.

To turn limit_req into a token bucket, it’s enough to specify the
“nodelay” flag.
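
The difference the flag makes can be sketched in Python. The class below
is a hypothetical, simplified model of a rate limiter with a burst
allowance. Its parameter names echo limit_req's rate, burst, and nodelay
settings, but it is not nginx's actual implementation, and the accounting
shown is an assumption made for illustration.

```python
class BurstLimiter:
    """Simplified model of a rate limiter with a burst allowance."""

    def __init__(self, rate, burst, nodelay):
        self.rate = rate        # permitted requests per second
        self.burst = burst      # requests tolerated above the rate
        self.nodelay = nodelay  # forward burst requests at once if True
        self.excess = 0.0       # requests currently above the rate
        self.last = None        # time of the last accepted request

    def request(self, now):
        """Return the delay in seconds before forwarding, or None if refused."""
        if self.last is None:
            excess = 0.0
        else:
            drained = self.rate * (now - self.last)
            excess = max(0.0, self.excess - drained + 1)
        if excess > self.burst:
            return None         # refused; the state is left unchanged
        self.excess = excess
        self.last = now
        if self.nodelay:
            return 0.0          # token-bucket behaviour: pass immediately
        return excess / self.rate  # leaky-bucket behaviour: pace the output


shaped = BurstLimiter(rate=1.0, burst=5, nodelay=False)
bucket = BurstLimiter(rate=1.0, burst=5, nodelay=True)
shaped_result = [shaped.request(0.0) for _ in range(8)]
bucket_result = [bucket.request(0.0) for _ in range(8)]
print(shaped_result)  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, None, None]
print(bucket_result)  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, None, None]
```

In this model both variants accept the same requests. Without the flag
they are spread out at the configured rate; with it they pass at once, and
further requests are refused until enough time has elapsed, which is how a
token bucket holding burst + 1 tokens behaves.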

It should be relatively easy to extend the supported time frames, too.
It’s not as easy as just adding another line of configuration parsing,
but I believe it’s something that should be done.

Maxim D.