Hi Sébastien,

On 08/06/26 at 21:37 +0200, Sebastien Bacher wrote:
> Hey Lucas,
>
> Indeed, launchpad is still struggling with AI scrappers and similar and the
> number of requests UDD is making has led to the IP being blocked again. It
> is unblocked now,

Thanks!

> but I think we need figure out a way for it to not hammer
> launchpad that hard on a regular basis (if my previous quick check is
> correct is does read 160k+ pages every run). Was there any technical reason
> to not use launchpad API (which would allow to filter on recent changes and
> only process bugs that changed recently instead of hammering every
> launchpad bug page at every run)?

Most of that code was written in 2008, so I don't remember the design
choices from back then. Maybe the launchpad API wasn't ready for that
back then. Still, the main reason for re-importing all bugs every time
is that it's easier to ensure data correctness that way. Once you try to
refresh only things that change, you increase your chances of running
into corner cases...

I would welcome a patch to re-implement the code using the launchpad
API, but I'm unlikely to find time+motivation to work on it myself.

In the meantime, if that helps, I can reduce the number of parallel
workers from 2 to 1 -- I reorganized the process to split it into a
download phase, and an SQL INSERT phase, so if the download phase takes
several days, it's no longer a problem (previously the long running
transaction caused a problem because it prevented VACUUMing).

Lucas

