Greetings all,

I am developing a list of broken web crawlers that are repeatedly downloading my entire web site, including the hidden stuff.
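For background, something like this quick Python sketch against the Apache access log is enough to spot them and rank the worst offenders. The log path and the combined log format here are just the stock Debian defaults, not necessarily what any given box uses:

    #!/usr/bin/env python3
    # Sketch: tally hits and bytes served per client IP from an Apache
    # combined-format access log, so the heaviest downloaders stand out.
    # Log path and format are assumptions -- adjust for your own setup.
    import re
    from collections import Counter

    LOG = "/var/log/apache2/access.log"   # assumed Debian-style path
    line_re = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) (\d+|-)')

    hits = Counter()
    sent = Counter()
    with open(LOG, errors="replace") as f:
        for line in f:
            m = line_re.match(line)
            if not m:
                continue
            ip, status, nbytes = m.group(1), m.group(2), m.group(3)
            hits[ip] += 1
            if nbytes != "-":
                sent[ip] += int(nbytes)

    # Print the ten heaviest downloaders by bytes served.
    for ip, total in sorted(sent.items(), key=lambda kv: kv[1], reverse=True)[:10]:
        print(f"{ip:15s}  {hits[ip]:7d} hits  {total / 2**20:10.1f} MiB")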
These crawlers/bots are ignoring my robots.txt files and aren't just indexing the site, but are downloading every single bit of every file there. This is burning up my upload bandwidth and constitutes a DDOS when 4 or 5 bots all go into this pull-it-all mode at the same time.

How do I best deal with these poorly written bots? I can target the individual address of course, but have chosen to block the /24, though that seems not to bother them for more than 30 minutes. It's also too broad a brush, blocking legit addresses' access. Restarting apache2 also works, for half an hour or so, but I may be interrupting a legit request for a realtime kernel whose build tree is around 2.7GB in tgz format.

How do I get their attention and stop the DDOS? Or is this a war you cannot win?

Thanks all.

Cheers, Gene Heskett
--
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
Genes Web page <http://geneslinuxbox.net:6309/gene>