Hi, I am testing Slurm 20.11.2 on a local cluster together with Intel MPI 2018.4.274.
1) On a single node (20 physical cores) and executed manually (no Slurm), a particular application runs fine using Intel's mpirun; execution time for this example: 8.505 s (wall clock). (This is a straight MPI application, no complications.)

2) Using Slurm and Intel's mpirun through a queue / batch script,

   #SBATCH --ntasks-per-node=20
   ...
   mpirun -n 20 $bin > file.out

the same job runs correctly but takes 121.735 s (wall clock!).

3) After some considerable searching, a partial fix is

   #SBATCH --ntasks-per-node=20
   ...
   export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so.0
   srun --cpu-bind=cores -n 20 $bin > file.out

which brings the execution time down to 13.482 s.

4) After changing this to

   #SBATCH --ntasks-per-node=20
   #SBATCH --cpus-per-task=2
   ...
   export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so.0
   srun --cpu-bind=cores -n 20 $bin > file.out

the time is finally 8.480 s (see the P.S. below for a consolidated version of this script). This timing is as it should be, but at the price of pretending that an application is multithreaded when it is, in fact, not multithreaded.

*** Is it possible to just keep Intel MPI's defaults intact when using its mpirun in a Slurm batch script?

Best wishes
Volker

Volker Blum
Associate Professor, Ab Initio Materials Simulations
Thomas Lord Department of Mechanical Engineering and Materials Science
Duke University
https://aims.pratt.duke.edu
volker.b...@duke.edu
Twitter: Aimsduke
Office: 4308 Chesterfield Building
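P.S. For reference, a minimal consolidated batch script for case 4) might look roughly as follows, with $bin standing for the application binary as in the snippets above. The job name, node count, and time limit are placeholders I have filled in; everything else is taken from the working configuration:

   #!/bin/bash
   #SBATCH --job-name=mpi_test        # placeholder
   #SBATCH --nodes=1                  # single 20-core node, as in the tests above
   #SBATCH --ntasks-per-node=20
   #SBATCH --cpus-per-task=2          # workaround: pretend each rank uses 2 CPUs
   #SBATCH --time=00:30:00            # placeholder

   # Point Intel MPI at Slurm's PMI library so srun can manage the processes
   export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so.0

   # Launch through srun with explicit core binding
   srun --cpu-bind=cores -n 20 $bin > file.out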