On Sun, Jun 22, 2025 at 11:37:29PM +0200, Santiago Vila wrote:
> Note for the record: I've just given Tony access to an AWS machine
> of type m7a.large, which has 2 CPUs and 8 GB of RAM (I know this
> is enough because I monitor /proc/meminfo during the builds).
>
> In this type of machine the failure rate is around 30%
> so several tries might be necessary to get build failures.
Thank you for pushing this issue, Santiago. I suppose I had just been lucky in my own builds until now: using the 2-core amd64-based VM you shared, I encountered similar failure rates running the tests.

I want to point out that the failing test class is not consistent across builds; I have seen at least 5 distinct test classes fail. Because I hadn't seen these failures in local builds, I initially pursued the hypothesis that they were caused by limited resources, specifically the number of cores. So I performed 30 builds on an 8-core / 32 GB instance (arm64, m7g.2xlarge) and encountered 4 failures across 3 distinct test classes. Since then, I have also seen occasional test failures on bare-metal (non-hypervisor) systems, both 4-core and 8-core, amd64 and arm64, although the failures are much less frequent there.

Because the failures occur in multiple different tests, I don't think we should try to disable tests one by one; I expect that would become a game of whack-a-mole. As you suggested, we should engage with upstream regarding these Heisentests. I will work on that.

For the trixie release, we can either ask the Release Managers to ignore the bug, or I can upload a packaging change that skips the tests during the build by default and then request a freeze exception. If anyone has a strong preference, please speak up.

Thank you,
tony