Hi,
I am running Slurm 19.05.0 and Open MPI 3.1.4. Open MPI is configured with PMI2 from Slurm. Whenever I try to run an MPI job on more than one node, I get this error message:

    srun: error: mpi/pmi2: failed to send temp kvs to compute nodes
    srun: Job step aborted: Waiting up to 32 seconds for job step to finish.

The job is then killed. If I use only one node, the job runs normally. In my sbatch script I launch the job with:

    srun --mpi=pmi2 mpi_job

Has anyone else encountered this problem and been able to fix it? Please help.

Thanks,
Lei