Hi there,
Using Slurm v24.11.0 together with Open MPI 5.0.7 built with OpenPMIx v5.0.6, I
am facing a systematic crash during the process wire-up phase when launching
standard MPI jobs (OSU benchmarks) on our new AMD compute nodes
(AMD EPYC 9654, 192 physical cores + HT) running Rocky Linux 9.4.
The
Hi
I made some progress in understanding the problem I reported some weeks ago:
https://lists.schedmd.com/pipermail/slurm-users/2023-May/010027.html
I noticed that the intermittent connection timeout that I am experiencing
occurs only when using the TCP-based direct connection to establi
I am facing the same problem that was reported long ago (2019) in this
mailing list post:
https://lists.schedmd.com/pipermail/slurm-users/2019-July/003785.html
but with more recent versions of the software, i.e.:
Slurm 21.08.8-2
PMIx 2.2.5 (pmix-2.2.5-1.el8.src.rpm)
Open MPI 4.1.5
In a similar