I got this same error when testing on older releases (17.11?). Try the slurm-18.08 branch or master. I'm testing 18.08 now and get this:
[slurm@trek6 mpihello]$ srun -phyper -n3 --mpi=pmi2 --pack-group=0-2 ./mpihello-ompi2-rhel7 | sort
srun: job 643 queued and waiting for resources
srun: job 643 has been allocated resources
Hello world, I am 0 of 9 - running on trek7
Hello world, I am 1 of 9 - running on trek7
Hello world, I am 2 of 9 - running on trek7
Hello world, I am 3 of 9 - running on trek8
Hello world, I am 4 of 9 - running on trek8
Hello world, I am 5 of 9 - running on trek8
Hello world, I am 6 of 9 - running on trek9
Hello world, I am 7 of 9 - running on trek9
Hello world, I am 8 of 9 - running on trek9

-Steve

-----Original Message-----
From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Pritchard Jr., Howard
Sent: Wednesday, October 10, 2018 7:58 AM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Heterogeneous job one MPI_COMM_WORLD

Hi Christopher,

We hit some problems at LANL trying to use this SLURM feature. At the time, I think SchedMD said there would need to be fixes to the SLURM PMI2 library to get this to work.

What version of SLURM are you using?

Howard

--
Howard Pritchard
B Schedule
HPC-ENV
Office 9, 2nd floor Research Park
TA-03, Building 4200, Room 203
Los Alamos National Laboratory

On 10/9/18, 8:50 PM, "slurm-users on behalf of Gilles Gouaillardet" <slurm-users-boun...@lists.schedmd.com on behalf of gil...@rist.or.jp> wrote:

>Christopher,
>
>This looks like a SLURM issue and Open MPI is (currently) out of the
>picture.
>
>What if you
>
>srun --pack-group=0,1 hostname
>
>Do you get a similar error?
>
>Cheers,
>
>Gilles
>
>On 10/10/2018 3:07 AM, Christopher Benjamin Coffey wrote:
>> Hi,
>>
>> I have a user trying to set up a heterogeneous job with one
>> MPI_COMM_WORLD with the following:
>>
>> ==========
>> #!/bin/bash
>> #SBATCH --job-name=hetero
>> #SBATCH --output=/scratch/cbc/hetero.txt
>> #SBATCH --time=2:00
>> #SBATCH --workdir=/scratch/cbc
>> #SBATCH --cpus-per-task=1 --mem-per-cpu=2g --ntasks=1 -C sb
>> #SBATCH packjob
>> #SBATCH --cpus-per-task=1 --mem-per-cpu=1g --ntasks=1 -C sl
>> #SBATCH --mail-type=START,END
>>
>> module load openmpi/3.1.2-gcc-6.2.0
>>
>> srun --pack-group=0,1 ~/hellompi
>> ===========
>>
>> Yet, we get an error: "srun: fatal: Job steps that span multiple
>> components of a heterogeneous job are not currently supported". But
>> the docs seem to indicate it should work?
>>
>> IMPORTANT: The ability to execute a single application across more
>> than one job allocation does not work with all MPI implementations or
>> Slurm MPI plugins. Slurm's ability to execute such an application can
>> be disabled on the entire cluster by adding "disable_hetero_steps" to
>> Slurm's SchedulerParameters configuration parameter.
>>
>> By default, the applications launched by a single execution of the
>> srun command (even for different components of the heterogeneous job)
>> are combined into one MPI_COMM_WORLD with non-overlapping task IDs.
>>
>> Does this not work with Open MPI? If not, which MPI/Slurm config will
>> work? We have slurm.conf MpiDefault=pmi2 currently. I've tried a
>> modern Open MPI, and also MPICH and MVAPICH2.
>>
>> Any help would be appreciated, thanks!
>>
>> Best,
>> Chris
>>
>> --
>> Christopher Coffey
>> High-Performance Computing
>> Northern Arizona University
>> 928-523-1167
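
For anyone reproducing this, a minimal MPI hello-world along the lines of the mpihello / hellompi binaries used above is enough to see the combined MPI_COMM_WORLD. The actual sources aren't part of this thread, so the file name and build line below are just a sketch; the output format matches Steve's run:

==========
/* mpihello.c - sketch of a hello-world that prints its rank in the
 * combined MPI_COMM_WORLD and the node it landed on.
 * Build: mpicc -o mpihello mpihello.c
 * Run:   srun --mpi=pmi2 --pack-group=0,1 ./mpihello
 */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank, size;
    char host[256];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* task ID across all pack groups */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total tasks in the combined world */
    gethostname(host, sizeof(host));

    printf("Hello world, I am %d of %d - running on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}
==========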