I think it's related to the job step launch semantics change introduced in 20.11.0, which has been reverted as of 20.11.3; see https://www.schedmd.com/news.php for details.
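Until you can upgrade, a minimal sketch of the workaround you already arrived at in your steps 3) and 4), pulled together into one batch script, would look something like the following (I am assuming the same libpmi path on your system and that $bin is set elsewhere in the script, as in your original):

#!/bin/bash
#SBATCH --ntasks-per-node=20
#SBATCH --cpus-per-task=2          # from your step 4); should not be needed once the 20.11.0 change is reverted

# (module loads, setting $bin, etc. as in your original script)

export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so.0   # let Intel MPI use Slurm's PMI when launched via srun
srun --cpu-bind=cores -n 20 $bin > file.out

Once you are on 20.11.3 or later it is worth re-testing without the --cpus-per-task=2 line; it only papers over the changed step launch behaviour and does not describe your application.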
Cheers,
Angelos
(Sent from mobile, please pardon me for typos and cursoriness.)

> On 26/2/2021 at 9:07, Volker Blum <volker.b...@duke.edu> wrote:
> 
> Hi,
> 
> I am testing slurm 20.11.2 on a local cluster together with Intel MPI 2018.4.274.
> 
> 1) On a single node (20 physical cores), executed manually (no slurm), a particular application runs fine using Intel’s mpirun; execution time for this example: 8.505 s (wall clock).
> 
> (This is a straight MPI application, no complications.)
> 
> 2) Using slurm and Intel’s mpirun through a queue / batch script,
> 
> #SBATCH --ntasks-per-node=20
> …
> mpirun -n 20 $bin > file.out
> 
> the same job runs correctly but takes 121.735 s (wall clock!).
> 
> 3) After some considerable searching, a partial fix,
> 
> #SBATCH --ntasks-per-node=20
> ...
> export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so.0
> srun --cpu-bind=cores -n 20 $bin > file.out
> 
> can bring the execution time down to 13.482 s.
> 
> 4) After changing this to
> 
> #SBATCH --ntasks-per-node=20
> #SBATCH --cpus-per-task=2
> ...
> export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so.0
> srun --cpu-bind=cores -n 20 $bin > file.out
> 
> the time is finally 8.480 s.
> 
> This timing is as it should be, but at the price of pretending that an application is multithreaded when it is, in fact, not multithreaded.
> 
> ***
> 
> Is it possible to just keep the Intel MPI defaults intact when using its mpirun in a slurm batch script?
> 
> Best wishes
> Volker
> 
> Volker Blum
> Associate Professor
> Ab Initio Materials Simulations
> Thomas Lord Department of Mechanical Engineering and Materials Science
> Duke University
> https://aims.pratt.duke.edu
> 
> volker.b...@duke.edu
> Twitter: Aimsduke
> 
> Office: 4308 Chesterfield Building
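P.S. If you want to see what the step 2) and step 3) launches actually bind each rank to on 20.11.2, a quick and admittedly crude check (a sketch only, assuming a Linux /proc filesystem) is to launch a one-liner in place of $bin:

srun --cpu-bind=cores -n 20 bash -c \
    'echo "rank ${SLURM_PROCID}: $(grep Cpus_allowed_list /proc/self/status)"' | sort -V

Each rank should report its own small CPU set. You can run the same one-liner under the mpirun from step 2), swapping ${SLURM_PROCID} for whatever rank variable that Intel MPI version exports (I believe PMI_RANK, but please check); if all 20 ranks end up sharing the same core(s) there, that would be consistent with the step launch change I mentioned above.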