Hello, what is the value of the MpiDefault option in your Slurm configuration file?
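If you are not sure, a quick way to check on the cluster (the /etc/slurm/slurm.conf path below is just the common default and may differ on your install):

  # effective setting of the running slurmctld
  scontrol show config | grep -i MpiDefault

  # or look in the config file directly
  grep -i MpiDefault /etc/slurm/slurm.conf

  # list the MPI plugin types this Slurm build knows about
  srun --mpi=list

If MpiDefault is "none" and pmi2 is not requested explicitly, that could be related to the PMI2 warning below.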
2017-12-07 9:37 GMT-08:00 Glenn (Gedaliah) Wolosh <gwol...@njit.edu>:

> Hello
>
> This is using Slurm version 17.02.6 running on Scientific Linux release
> 7.4 (Nitrogen).
>
> [gwolosh@p-slogin bin]$ module li
>
> Currently Loaded Modules:
>   1) GCCcore/.5.4.0 (H)   2) binutils/.2.26 (H)   3) GCC/5.4.0-2.26
>   4) numactl/2.0.11       5) hwloc/1.11.3         6) OpenMPI/1.10.3
>
> If I run
>
>   srun --nodes=8 --ntasks-per-node=8 --ntasks=64 ./ep.C.64
>
> it runs successfully, but I get a message:
>
>   PMI2 initialized but returned bad values for size/rank/jobid.
>   This is symptomatic of either a failure to use the
>   "--mpi=pmi2" flag in SLURM, or a borked PMI2 installation.
>   If running under SLURM, try adding "-mpi=pmi2" to your
>   srun command line. If that doesn't work, or if you are
>   not running under SLURM, try removing or renaming the
>   pmi2.h header file so PMI2 support will not automatically
>   be built, reconfigure and build OMPI, and then try again
>   with only PMI1 support enabled.
>
> If I run
>
>   srun --nodes=8 --ntasks-per-node=8 --ntasks=64 --mpi=pmi2 ./ep.C.64
>
> the job crashes.
>
> If I run via sbatch:
>
>   #!/bin/bash
>   # Job name:
>   #SBATCH --job-name=nas_bench
>   #SBATCH --nodes=8
>   #SBATCH --ntasks=64
>   #SBATCH --ntasks-per-node=8
>   #SBATCH --time=48:00:00
>   #SBATCH --output=nas.out.1
>   #
>   ## Command(s) to run (example):
>   module use $HOME/easybuild/modules/all/Core
>   module load GCC/5.4.0-2.26 OpenMPI/1.10.3
>   mpirun -np 64 ./ep.C.64
>
> the job crashes.
>
> Using EasyBuild, these are my config options for OpenMPI:
>
>   configopts = '--with-threads=posix --enable-shared --enable-mpi-thread-multiple --with-verbs '
>   configopts += '--enable-mpirun-prefix-by-default '  # suppress failure modes in relation to mpirun path
>   configopts += '--with-hwloc=$EBROOTHWLOC '  # hwloc support
>   configopts += '--disable-dlopen '  # statically link component, don't do dynamic loading
>   configopts += '--with-slurm --with-pmi '
>
> And finally:
>
>   $ ldd /opt/local/easybuild/software/Compiler/GCC/5.4.0-2.26/OpenMPI/1.10.3/bin/orterun | grep pmi
>     libpmi.so.0 => /usr/lib64/libpmi.so.0 (0x00007f0129d6d000)
>     libpmi2.so.0 => /usr/lib64/libpmi2.so.0 (0x00007f0129b51000)
>
>   $ ompi_info | grep pmi
>     MCA db: pmi (MCA v2.0.0, API v1.0.0, Component v1.10.3)
>     MCA ess: pmi (MCA v2.0.0, API v3.0.0, Component v1.10.3)
>     MCA grpcomm: pmi (MCA v2.0.0, API v2.0.0, Component v1.10.3)
>     MCA pubsub: pmi (MCA v2.0.0, API v2.0.0, Component v1.10.3)
>
> Any suggestions?
>
> _______________
> Gedaliah Wolosh
> IST Academic and Research Computing Systems (ARCS)
> NJIT
> GITC 2203
> 973 596 5437
> gwol...@njit.edu

--
Best regards,
Artem Y. Polyakov