Hello. This is Slurm version 17.02.6 running on Scientific Linux release 7.4 (Nitrogen).
[gwolosh@p-slogin bin]$ module li

Currently Loaded Modules:
  1) GCCcore/.5.4.0 (H)   2) binutils/.2.26 (H)   3) GCC/5.4.0-2.26
  4) numactl/2.0.11       5) hwloc/1.11.3         6) OpenMPI/1.10.3

If I run

  srun --nodes=8 --ntasks-per-node=8 --ntasks=64 ./ep.C.64

it runs successfully, but I get this message:

  PMI2 initialized but returned bad values for size/rank/jobid.
  This is symptomatic of either a failure to use the "--mpi=pmi2"
  flag in SLURM, or a borked PMI2 installation. If running under
  SLURM, try adding "-mpi=pmi2" to your srun command line. If that
  doesn't work, or if you are not running under SLURM, try removing
  or renaming the pmi2.h header file so PMI2 support will not
  automatically be built, reconfigure and build OMPI, and then try
  again with only PMI1 support enabled.

If I run

  srun --nodes=8 --ntasks-per-node=8 --ntasks=64 --mpi=pmi2 ./ep.C.64

the job crashes.

If I run via sbatch with this script:

  #!/bin/bash
  # Job name:
  #SBATCH --job-name=nas_bench
  #SBATCH --nodes=8
  #SBATCH --ntasks=64
  #SBATCH --ntasks-per-node=8
  #SBATCH --time=48:00:00
  #SBATCH --output=nas.out.1
  #
  ## Command(s) to run (example):
  module use $HOME/easybuild/modules/all/Core
  module load GCC/5.4.0-2.26 OpenMPI/1.10.3
  mpirun -np 64 ./ep.C.64

the job also crashes.

Using EasyBuild, these are my config options for OpenMPI:

  configopts = '--with-threads=posix --enable-shared --enable-mpi-thread-multiple --with-verbs '
  configopts += '--enable-mpirun-prefix-by-default '  # suppress failure modes in relation to mpirun path
  configopts += '--with-hwloc=$EBROOTHWLOC '          # hwloc support
  configopts += '--disable-dlopen '                   # statically link components, don't do dynamic loading
  configopts += '--with-slurm --with-pmi '

And finally:

  $ ldd /opt/local/easybuild/software/Compiler/GCC/5.4.0-2.26/OpenMPI/1.10.3/bin/orterun | grep pmi
          libpmi.so.0 => /usr/lib64/libpmi.so.0 (0x00007f0129d6d000)
          libpmi2.so.0 => /usr/lib64/libpmi2.so.0 (0x00007f0129b51000)

  $ ompi_info | grep pmi
               MCA db: pmi (MCA v2.0.0, API v1.0.0, Component v1.10.3)
              MCA ess: pmi (MCA v2.0.0, API v3.0.0, Component v1.10.3)
          MCA grpcomm: pmi (MCA v2.0.0, API v2.0.0, Component v1.10.3)
           MCA pubsub: pmi (MCA v2.0.0, API v2.0.0, Component v1.10.3)

Any suggestions?

_______________
Gedaliah Wolosh
IST Academic and Research Computing Systems (ARCS)
NJIT GITC 2203
973 596 5437
gwol...@njit.edu
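
P.S. In case it helps with diagnosis: as I read the easyconfig, the configopts above should expand to roughly the following configure invocation (EasyBuild adds its own --prefix and related options on top of this; $EBROOTHWLOC resolves to the hwloc install prefix):

  ./configure --with-threads=posix --enable-shared \
              --enable-mpi-thread-multiple --with-verbs \
              --enable-mpirun-prefix-by-default \
              --with-hwloc=$EBROOTHWLOC --disable-dlopen \
              --with-slurm --with-pmi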
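
P.P.S. I can also post what the Slurm side reports, if that would be useful. As far as I know, these standard Slurm commands show which PMI plugin types srun supports and which one is the cluster default:

  $ srun --mpi=list                          # PMI plugin types available to srun
  $ scontrol show config | grep MpiDefault   # cluster-wide default MPI plugin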