Package: mpich
Version: 4.3.0-5
Severity: serious
Justification: debci

I apologise for another serious bug, but mpich 4.3 is doing weird things that we don't want in trixie.

I see the problem in mpich test errors in armci-mpi (https://buildd.debian.org/status/fetch.php?pkg=armci-mpi&arch=amd64&ver=0.4-5&stamp=1744327219&raw=0 ), but I can also reproduce it with a trivial test.
The problem is that mpich is not initialising a multi-process job. Instead, it simply launches several independent single processes, each with MPI_Comm_size = 1. You can see the problem in the armci-mpi test errors, e.g.

FAIL: benchmarks/ping-pong
==========================

[1744327153.607644] [sbuild:19884:0] sock.c:513 UCX WARN unable to read somaxconn value from /proc/sys/net/core/somaxconn file
[0] ARMCI Error: This benchmark should be run on at least two processes
Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(comm=0x84000000, 1) - process 0
[0] ARMCI Error: This benchmark should be run on at least two processes
[1744327153.612861] [sbuild:19883:0] sock.c:513 UCX WARN unable to read somaxconn value from /proc/sys/net/core/somaxconn file
Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(comm=0x84000000, 1) - process 0
FAIL benchmarks/ping-pong (exit status: 1)

The error message "at least two processes" is issued by armci-mpi's ping-pong.c when it detects MPI_Comm_size = 1. But the test is launched with mpiexec.mpich -np 2 (which is why the error appears twice).

I can reproduce the issue with a trivial test:

```
$ cat mpich_test.c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int me, nproc;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &me);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
    printf("mpi test rank %d of %d\n", me, nproc);
    MPI_Finalize();
    return 0;
}
$ mpicc.mpich -o mpich_test mpich_test.c
$ mpiexec.mpich -n 4 ./mpich_test
mpi test rank 0 of 1
mpi test rank 0 of 1
mpi test rank 0 of 1
mpi test rank 0 of 1
```

It should instead report:

mpi test rank 3 of 4
mpi test rank 1 of 4
mpi test rank 0 of 4
mpi test rank 2 of 4

There is even more weirdness, however.
The first time I compiled and ran this trivial test, it did report 4 processes, but that correct output was accompanied by pmix warnings:

Query for unrecognized attribute: pmix.qry.node
Query for unrecognized attribute: pmix.qry.peers

However, after recompiling in exactly the same way, it no longer gave the correct output, and the pmix warnings no longer appeared either.

Can you reproduce this problem?

-- System Information:
Debian Release: trixie/sid
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 6.12.21-amd64 (SMP w/8 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE
Locale: LANG=en_AU.UTF-8, LC_CTYPE=en_AU.UTF-8 (charmap=UTF-8), LANGUAGE=en_AU:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages mpich depends on:
ii  hwloc          2.12.0-1
ii  libc6          2.41-6
ii  libhwloc15     2.12.0-1
ii  libmpich12     4.3.0-5
ii  libslurm42t64  24.11.3-2
ii  perl           5.40.1-2

Versions of packages mpich recommends:
ii  libmpich-dev  4.3.0-5

Versions of packages mpich suggests:
ii  mpich-doc  4.3.0-5

-- no debconf information