We are in the process of upgrading to CentOS 7, and have built Slurm 19.05.5 and OpenMPI 4.0.3 for CentOS 7. When I submit a job that launches its program using srun, squeue shows the job as running (state = R), but the program doesn't do anything. I'm testing with a simple Hello, World program that I've been using for years.

This same program runs just fine when I launch it with mpiexec or mpirun instead of srun. Any ideas of what's wrong?

I was originally getting a different failure, but over on the OpenMPI list I was told that Slurm wasn't compiled with PMIx support. We rebuilt Slurm with PMIx support, and that's when the jobs started "running" and just hanging. Any ideas of what's wrong or how to debug this?

Thanks,

Prentice

