Paul, in a perfect world, yes. Here's a trick you guys can use: generate a robots.txt and add a few pages not to crawl. Assuming bad bots will ignore it, one of the "do not crawl" pages will have a trigger that blocks the IP address of the session. You would need the ability to communicate the IP address of the offending bot to a process that does the blocking. There are various ways to do that.
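A minimal sketch of the idea (the trap paths, the in-memory blocklist, and the `handle_request` helper are all illustrative; a real deployment would feed the offending address to a firewall rule or a fail2ban jail rather than a Python set):

```python
# Honeypot sketch: paths listed as "Disallow" in robots.txt but never
# linked from any real page. A bot that fetches one is ignoring
# robots.txt, so its IP goes on the blocklist.
# TRAP_PATHS and blocked_ips are hypothetical names for illustration.

TRAP_PATHS = {"/private/do-not-crawl/", "/secret-archive/"}

ROBOTS_TXT = "User-agent: *\n" + "".join(
    f"Disallow: {p}\n" for p in sorted(TRAP_PATHS)
)

blocked_ips: set[str] = set()

def handle_request(path: str, client_ip: str) -> int:
    """Return an HTTP status code, blocking IPs that hit a trap path."""
    if client_ip in blocked_ips:
        return 403                      # already blocked
    if path in TRAP_PATHS:
        blocked_ips.add(client_ip)      # bot ignored robots.txt
        return 403
    return 200                          # serve normally
```

A well-behaved crawler reads the generated robots.txt and never touches the trap; a misbehaving one fetches a trap path once and is then refused on every subsequent request from that address.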
On Wed, Sep 17, 2025 at 9:46 AM Paul Koning via cctalk <[email protected]> wrote:

> A web crawler that does not obey robots.txt is not a law-abiding outfit.
> Best would be to block it entirely. If they are that dismissive of
> honesty, they are also unlikely to pay attention to such matters as
> copyright and intellectual property ownership.
>
> paul
>
> > On Sep 16, 2025, at 8:55 PM, Wayne S via cctalk <[email protected]> wrote:
> >
> > They do not observe robots.txt
> > Sent from my iPhone
> >
> >> On Sep 16, 2025, at 17:53, Wayne S <[email protected]> wrote:
> >>
> >> I did notice the scraping.
> >> I toyed with the idea of putting ludicrous text files up that a normal
> >> user would not see and seeing which bot got them.
> >>
> >> Sent from my iPhone
> >>
> >>> On Sep 16, 2025, at 17:02, Bill Degnan via cctalk <[email protected]> wrote:
> >>>
> >>> For those of you who run vintage computing-related info sites, have you
> >>> noticed all of the LLM scraper activity? AI services are using the LLM
> >>> scrapers to populate their knowledge bases.
> >>>
> >>> At any given moment 5-10 of them are active on vintagecomputer.net. It’s
> >>> funny, when I ask an AI about something vintage computing-related,
> >>> something obscure, I can trick it into giving me an answer from my own site.
> >>>
> >>> I have actually had to modify the site code to manage the traffic, to
> >>> improve efficiency.
> >>>
> >>> But they’re not going after just my site; these scrapers are absorbing
> >>> copies of the entire WWW.
> >>>
> >>> I wonder how long the WWW will remain open. It would be a bummer if I
> >>> found copies of my site elsewhere.
> >>>
> >>> Bill
