Also, please post the output of:

$ srun --mpi=list

When the job crashes, are there any error messages in the relevant slurmd.log files or output on the screen?
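For example, something along these lines on one of the compute nodes that ran the failing step would show where slurmd logs and whether it reported anything (the log path is only an assumption here; the real location is whatever SlurmdLogFile points to on your nodes):

$ scontrol show config | grep -i SlurmdLogFile
$ grep -iE 'error|pmi' /var/log/slurm/slurmd.log | tail -n 50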
2017-12-07 9:49 GMT-08:00 Artem Polyakov <artpo...@gmail.com>:
> Hello,
>
> What is the value of the MpiDefault option in your Slurm configuration file?
>
> 2017-12-07 9:37 GMT-08:00 Glenn (Gedaliah) Wolosh <gwol...@njit.edu>:
>
>> Hello,
>>
>> This is using Slurm version 17.02.6 running on Scientific Linux release 7.4 (Nitrogen).
>>
>> [gwolosh@p-slogin bin]$ module li
>>
>> Currently Loaded Modules:
>>   1) GCCcore/.5.4.0 (H)   2) binutils/.2.26 (H)   3) GCC/5.4.0-2.26
>>   4) numactl/2.0.11       5) hwloc/1.11.3         6) OpenMPI/1.10.3
>>
>> If I run
>>
>>   srun --nodes=8 --ntasks-per-node=8 --ntasks=64 ./ep.C.64
>>
>> it runs successfully, but I get this message:
>>
>>   PMI2 initialized but returned bad values for size/rank/jobid.
>>   This is symptomatic of either a failure to use the
>>   "--mpi=pmi2" flag in SLURM, or a borked PMI2 installation.
>>   If running under SLURM, try adding "-mpi=pmi2" to your
>>   srun command line. If that doesn't work, or if you are
>>   not running under SLURM, try removing or renaming the
>>   pmi2.h header file so PMI2 support will not automatically
>>   be built, reconfigure and build OMPI, and then try again
>>   with only PMI1 support enabled.
>>
>> If I run
>>
>>   srun --nodes=8 --ntasks-per-node=8 --ntasks=64 --mpi=pmi2 ./ep.C.64
>>
>> the job crashes.
>>
>> If I run via sbatch:
>>
>>   #!/bin/bash
>>   # Job name:
>>   #SBATCH --job-name=nas_bench
>>   #SBATCH --nodes=8
>>   #SBATCH --ntasks=64
>>   #SBATCH --ntasks-per-node=8
>>   #SBATCH --time=48:00:00
>>   #SBATCH --output=nas.out.1
>>   #
>>   ## Command(s) to run (example):
>>   module use $HOME/easybuild/modules/all/Core
>>   module load GCC/5.4.0-2.26 OpenMPI/1.10.3
>>   mpirun -np 64 ./ep.C.64
>>
>> the job also crashes.
>>
>> Using EasyBuild, these are my config options for OpenMPI:
>>
>>   configopts = '--with-threads=posix --enable-shared --enable-mpi-thread-multiple --with-verbs '
>>   configopts += '--enable-mpirun-prefix-by-default '  # suppress failure modes in relation to mpirun path
>>   configopts += '--with-hwloc=$EBROOTHWLOC '          # hwloc support
>>   configopts += '--disable-dlopen '                   # statically link components, don't do dynamic loading
>>   configopts += '--with-slurm --with-pmi '
>>
>> And finally:
>>
>>   $ ldd /opt/local/easybuild/software/Compiler/GCC/5.4.0-2.26/OpenMPI/1.10.3/bin/orterun | grep pmi
>>     libpmi.so.0 => /usr/lib64/libpmi.so.0 (0x00007f0129d6d000)
>>     libpmi2.so.0 => /usr/lib64/libpmi2.so.0 (0x00007f0129b51000)
>>
>>   $ ompi_info | grep pmi
>>         MCA db: pmi (MCA v2.0.0, API v1.0.0, Component v1.10.3)
>>        MCA ess: pmi (MCA v2.0.0, API v3.0.0, Component v1.10.3)
>>    MCA grpcomm: pmi (MCA v2.0.0, API v2.0.0, Component v1.10.3)
>>     MCA pubsub: pmi (MCA v2.0.0, API v2.0.0, Component v1.10.3)
>>
>> Any suggestions?
>>
>> _______________
>> Gedaliah Wolosh
>> IST Academic and Research Computing Systems (ARCS)
>> NJIT
>> GITC 2203
>> 973 596 5437
>> gwol...@njit.edu
>
> --
> С Уважением, Поляков Артем Юрьевич
> Best regards, Artem Y. Polyakov

--
С Уважением, Поляков Артем Юрьевич
Best regards, Artem Y. Polyakov
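P.S. Regarding the MpiDefault question quoted above: a quick way to check the value currently in effect, without digging through slurm.conf, is

$ scontrol show config | grep -i MpiDefault

and the corresponding slurm.conf line would look like

MpiDefault=pmi2

(the pmi2 value is only an illustration of one possible setting, not a recommendation for your site).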