Hi,

On Wed, Aug 17, 2022 at 10:25:38PM +0200, Paul Gevers wrote:
> Control: severity -1 serious
> Control: retitle -1 autopkgtest fails on hosts with lots of RAM/cores
>
> Hi,
>
> On Sun, 3 Apr 2022 19:42:42 +0200 Michael Banck <mba...@debian.org> wrote:
> > Hrm, it seems that test case passed now on the latest upload:
> > https://ci.debian.net/data/autopkgtest/unstable/amd64/b/bagel/20573831/log.gz
> >
> > |Get:14 http://deb.debian.org/debian unstable/main amd64 libmpich12 amd64
> > 4.0.1-1 [4,924 kB]
> > [...]
> > |running test case 'he3_svp_asd-dmrg'... PASSED.
> >
> > So I'm a bit at a loss about what's going on here, perhaps that test
> > case really is just flakey.
>
> Yes, this test looks flaky (I came here because it was blocking glibc). The
> good news is however, it seems related to the host that runs the test. I.e.
> the test fails on our beefy amd64 host (ci-worker13) with 64 cores and 256GB
> RAM, but seems to pass on the others.
>
> The error on s390x is the same by the way (that has 10 cores and 32GB RAM).

I can reproduce this again on my developer (amd64) notebook. If I
downgrade mpich from 4.0.2 to 3.x, it passes fine:

|(unstable-amd64-sbuild)mba@curie:/tmp/autopkgtest.p02Sns/build.Osj/src$ dpkg -l | grep mpich
|ii libmpich12:amd64 3.4.1-5 amd64 Shared libraries for MPICH
|(unstable-amd64-sbuild)mba@curie:/tmp/autopkgtest.p02Sns/build.Osj/src$ ./debian/tests/testsuite.sh
|running test case 'he3_svp_asd-dmrg'... PASSED.
|All tests passed
|(unstable-amd64-sbuild)mba@curie:/tmp/autopkgtest.p02Sns/build.Osj/src$ dpkg -l | grep mpich
|ii libmpich12:amd64 4.0.2-2 amd64 Shared libraries for MPICH
|(unstable-amd64-sbuild)mba@curie:/tmp/autopkgtest.p02Sns/build.Osj/src$ ./debian/tests/testsuite.sh
|running test case 'he3_svp_asd-dmrg'... FAILED.
| * broadcast 0.00
| [...]
| * broadcast 0.00
| * dmrg block 0.00
| >> ** .. 0.17
|
| ===== Starting sweeps =====
|
| o convergence threshold: 1.0000e-08
| iter state sweep average sweep range dE average
| ERROR: EXCEPTION RAISED: dsyev/pdsyevd failed in Matrix
|1 tests failed

If I set BAGEL_NUM_THREADS as Graham suggests it also passes, so I'll
upload that now:

|(unstable-amd64-sbuild)mba@curie:/tmp/autopkgtest.p02Sns/build.Osj/src$ BAGEL_NUM_THREADS=4 ./debian/tests/testsuite.sh
|running test case 'he3_svp_asd-dmrg'... PASSED.
|All tests passed


Michael