hdv@gmail (12020-12-04):
> Let me offer you an alternative option. (Most) bots work by analysing the
> referrals on each page in your website. Right? So, why not add a link to a
> page that normal users will never visit (e.g. because they do not see the
> link and thus will never click on it), but will show up in a bot's analysis?
> That way you can monitor your logs for entries containing that page. Every
> entity requesting that specific URL is blocked.
This made me think of something. A long time ago, a friend of mine
implemented something called the Book of Infinity to trap badly-behaved
robots: a set of deterministic pseudo-random pages linking to sub-pages ad
infinitum, with ever-growing URLs. As it happened, it was not actually a
good idea and released a lot of CO₂, and the very badly behaved robots had
to be blacklisted from exploring it. (At some point, we had the same
problem when Googlebot tried to brute-force our online
make-your-own-adventure book, but Googlebot heeds robots.txt.)

But it could be coupled with techniques inspired by spam tarpits: have the
server reply at a crawl to force the bots to waste resources, while keeping
the resource consumption on the server strictly bounded.

Oh, I just noticed I was not the first one to think of it: Wikipedia tells
me it is called a spider trap.

https://en.wikipedia.org/wiki/Spider_trap

Regards,

-- 
  Nicolas George
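PS: for the curious, here is a minimal sketch of how such a trap could be
combined with a tarpit, assuming Python and its standard http.server
module. Everything in it (the /book/ prefix, the delays, the fan-out) is
made up for illustration, not what my friend actually ran:

    #!/usr/bin/env python3
    # Sketch of a "Book of Infinity" style spider trap.  Every URL under
    # /book/ yields a deterministic pseudo-random page whose links point to
    # ever-longer sub-URLs, so a crawler that ignores robots.txt wanders
    # forever.  The reply is dribbled out slowly, in the spirit of spam
    # tarpits, so the bot wastes its time while the server's own work per
    # request stays small and bounded.
    import hashlib
    import time
    from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

    LINKS_PER_PAGE = 5      # fan-out of the infinite book
    DRIP_DELAY = 2.0        # seconds between chunks sent to the client
    MAX_PATH = 2048         # refuse absurdly long URLs to bound our work

    def page_for(path):
        """Deterministically derive a page and its sub-links from the URL."""
        seed = hashlib.sha256(path.encode()).hexdigest()
        links = "\n".join(
            '<a href="%s/%s">chapter %d</a><br>'
            % (path.rstrip("/"), seed[i * 8:(i + 1) * 8], i)
            for i in range(LINKS_PER_PAGE)
        )
        return ("<html><body><p>%s</p>%s</body></html>" % (seed, links)).encode()

    class TrapHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if not self.path.startswith("/book/") or len(self.path) > MAX_PATH:
                self.send_error(404)
                return
            body = page_for(self.path)
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            # Drip the page out in small chunks to keep the bot busy.
            for i in range(0, len(body), 64):
                self.wfile.write(body[i:i + 64])
                self.wfile.flush()
                time.sleep(DRIP_DELAY)

    if __name__ == "__main__":
        ThreadingHTTPServer(("", 8080), TrapHandler).serve_forever()

Of course one would also list the /book/ prefix in robots.txt, so that
well-behaved crawlers like Googlebot stay out and only the bots worth
trapping ever fall in.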