What do the users of this mailing list use to block abusive behavior
and spiders that don’t respect robots.txt?
I observe two kinds of abusive behavior on my site:
1. Vulnerability scanners and abusive crawlers
2. Targeted failed logins against a PHP app
For 1, I block a few bots by manually adding them to a deny list in
nginx and a few others through spider traps, which are essentially
locations that log to a separate log file, which is then scanned by
fail2ban.
For 2, I do the same as above (the trap is an image on the login page)
but also use the Limit Requests module (limit_req), which logs to
error.log; that log is also scanned by fail2ban.
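
To make the setup above concrete, it looks roughly like the sketch
below. All names, paths, patterns and rates here are placeholders
chosen for illustration (examplebot, /trap/, /login.php, the "login"
zone, 1r/s), and I am assuming the deny list is keyed on the
User-Agent header; adjust to taste.

```nginx
# Everything below lives inside the http {} block.

# Static deny list keyed on User-Agent (patterns are placeholders).
map $http_user_agent $bad_bot {
    default        0;
    ~*examplebot   1;
    ~*badscanner   1;
}

# Shared-memory zone for rate limiting, keyed on client address.
# Requests over the limit are reported in error.log.
limit_req_zone $binary_remote_addr zone=login:10m rate=1r/s;

server {
    listen 80;
    server_name example.com;

    # Reject anything on the static deny list.
    if ($bad_bot) {
        return 403;
    }

    # Spider trap: a path that robots.txt disallows and that no
    # well-behaved client should request. Hits go to their own log
    # file so fail2ban can scan it separately.
    location /trap/ {
        access_log /var/log/nginx/trap.log;
        return 403;
    }

    # Login page with rate limiting applied.
    location = /login.php {
        limit_req zone=login burst=5 nodelay;
        # ... hand off to PHP as usual
    }
}
```

fail2ban then needs one jail watching the trap log and one watching
error.log for the limit_req messages.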
The problem is that I regularly have to go through the fail2ban logs
to see what it has caught and possibly add new entries to the static
list in the nginx conf. It would be nice to have an auto-updating list of bad bots
from user-agents.org or similar sites instead of the hassle of having
to create my own list.
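
What I have in mind on the nginx side is something like the sketch
below, where the static list shrinks to a map that pulls its entries
from a generated file. The file name /etc/nginx/bad_bots.map and the
job that would regenerate it are hypothetical; I don't know of a
ready-made feed in this format, which is really what I am asking
about.

```nginx
# Inside the http {} block.
map $http_user_agent $bad_bot {
    default 0;

    # Hypothetical file rewritten by an external job (e.g. from cron).
    # Each line in it would have the form:
    #   ~*pattern 1;
    include /etc/nginx/bad_bots.map;
}
```

nginx would still need a reload (nginx -s reload) whenever the file
changes, since the map is only read when the configuration is loaded.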
So, what do you use?