Hi Christopher,

We hit some problems at LANL trying to use this SLURM feature. At the time, I think SchedMD said fixes to the SLURM PMI2 library would be needed to get this to work.

What version of SLURM are you using?
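If it helps, a quick way to check the version and which MPI plugin types your SLURM build actually supports (assuming the standard client tools are in your path):

    sinfo --version        # reports the SLURM version
    srun --mpi=list        # lists the MPI plugin types available in this build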
Howard

--
Howard Pritchard
B Schedule
HPC-ENV
Office 9, 2nd floor Research Park
TA-03, Building 4200, Room 203
Los Alamos National Laboratory


On 10/9/18, 8:50 PM, "slurm-users on behalf of Gilles Gouaillardet" <slurm-users-boun...@lists.schedmd.com on behalf of gil...@rist.or.jp> wrote:

>Christopher,
>
>This looks like a SLURM issue and Open MPI is (currently) out of the picture.
>
>What if you run
>
>srun --pack-group=0,1 hostname
>
>Do you get a similar error?
>
>Cheers,
>
>Gilles
>
>On 10/10/2018 3:07 AM, Christopher Benjamin Coffey wrote:
>> Hi,
>>
>> I have a user trying to set up a heterogeneous job with one MPI_COMM_WORLD with the following:
>>
>> ==========
>> #!/bin/bash
>> #SBATCH --job-name=hetero
>> #SBATCH --output=/scratch/cbc/hetero.txt
>> #SBATCH --time=2:00
>> #SBATCH --workdir=/scratch/cbc
>> #SBATCH --cpus-per-task=1 --mem-per-cpu=2g --ntasks=1 -C sb
>> #SBATCH packjob
>> #SBATCH --cpus-per-task=1 --mem-per-cpu=1g --ntasks=1 -C sl
>> #SBATCH --mail-type=START,END
>>
>> module load openmpi/3.1.2-gcc-6.2.0
>>
>> srun --pack-group=0,1 ~/hellompi
>> ==========
>>
>> Yet, we get an error: "srun: fatal: Job steps that span multiple components of a heterogeneous job are not currently supported". But the docs seem to indicate it should work:
>>
>> IMPORTANT: The ability to execute a single application across more than one job allocation does not work with all MPI implementations or Slurm MPI plugins. Slurm's ability to execute such an application can be disabled on the entire cluster by adding "disable_hetero_steps" to Slurm's SchedulerParameters configuration parameter.
>>
>> By default, the applications launched by a single execution of the srun command (even for different components of the heterogeneous job) are combined into one MPI_COMM_WORLD with non-overlapping task IDs.
>>
>> Does this not work with Open MPI? If not, which MPI/SLURM configuration will work? We currently have MpiDefault=pmi2 in slurm.conf. I've tried a modern Open MPI, and also MPICH and MVAPICH2.
>>
>> Any help would be appreciated, thanks!
>>
>> Best,
>> Chris
>>
>> --
>> Christopher Coffey
>> High-Performance Computing
>> Northern Arizona University
>> 928-523-1167
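P.S. Since the PMI2 plugin was the sticking point for us, the PMIx plugin may be worth a try on your side. A rough sketch, not a tested recipe. It assumes your SLURM build includes the pmix plugin and that your Open MPI build was configured with a compatible PMIx:

    # slurm.conf: make PMIx the cluster-wide default launcher
    MpiDefault=pmix

    # or select it per step without touching the cluster default
    srun --mpi=pmix --pack-group=0,1 ~/hellompi

I can't promise that is enough either; spanning steps were still a work in progress the last time we looked at this.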