[slurm-users] broken SLURM-PMIX out-of-band communication on v24.11.0 with PMIx v5

2025-03-06 Thread Bertini, Denis Dr. via slurm-users
Hi there, Using slurm v24.11.0 together with openMPI 5.0.7 built with openpmix v5.0.6, I am facing a systematic crash at the process wire-up phase when launching a standard MPI job (OSU benchmarks) on our new AMD compute nodes (AMD EPYC 9654, 192 physical cores + HT) running Rocky Linux 9.4. The

[slurm-users] PMix3 Plugin+ openMPI 4.1.5 broken for heterogenous jobs with SLURM v 21.08.8-2

2023-06-19 Thread Bertini, Denis Dr.
Hi, I made some progress trying to understand the problem I reported some weeks ago: https://lists.schedmd.com/pipermail/slurm-users/2023-May/010027.html I noticed that the intermittent connection timeout I am experiencing occurs only when using the TCP-based direct connection to establi

[slurm-users] PMIx + openMPI with heterogeneous jobs

2023-05-24 Thread Bertini, Denis Dr.
I am facing the same problem that was reported long ago (2019) on this mailing list: https://lists.schedmd.com/pipermail/slurm-users/2019-July/003785.html but with a more recent version of slurm, i.e.: slurm 21.08.8-2, PMIx 2.2.5 (pmix-2.2.5-1.el8.src.rpm), openMPI 4.1.5. In a similar