Re: [slurm-users] Heterogeneous job one MPI_COMM_WORLD

2018-10-10 Thread Chris Samuel
On 11/10/18 01:27, Christopher Benjamin Coffey wrote:
> That is interesting. It is disabled in 17.11.10:
Yeah, I seem to remember seeing a commit that disabled it in 17.11.x. I don't think it's meant to work before 18.08.x (which is what the website will be talking about).
All the best, Chris

Re: [slurm-users] Heterogeneous job one MPI_COMM_WORLD

2018-10-10 Thread Christopher Benjamin Coffey
That is interesting. It is disabled in 17.11.10:
    static bool _enable_pack_steps(void)
    {
        bool enabled = false;
        char *sched_params = slurm_get_sched_params();
        if (sched_params && strstr(sched_params, "disable_hetero_steps"))
            enabled = false;
        else if (
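The quoted snippet is cut off after its first branch, so the following is only a minimal sketch of the same kind of substring check on a scheduler-parameters string. The helper name `hetero_steps_allowed`, the `default_enabled` argument, and the use of an "enable_hetero_steps" token in the second branch are assumptions made for illustration; this is not the Slurm source.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not Slurm's _enable_pack_steps): decide whether
 * job steps may span heterogeneous job components, based on substrings
 * in a scheduler-parameters string.
 *
 * sched_params    - comma-separated parameter string, may be NULL
 * default_enabled - assumed per-version default when neither token is set
 */
static bool hetero_steps_allowed(const char *sched_params, bool default_enabled)
{
    /* An explicit "disable_hetero_steps" token always turns the feature off. */
    if (sched_params && strstr(sched_params, "disable_hetero_steps"))
        return false;

    /* Assumption: an "enable_hetero_steps" token turns the feature on. */
    if (sched_params && strstr(sched_params, "enable_hetero_steps"))
        return true;

    /* Neither token present: fall back to the caller-supplied default. */
    return default_enabled;
}
```

With `default_enabled` set to false, this sketch would reject cross-component steps unless the enabling token is present, which matches the behaviour described for 17.11.x in this thread.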

Re: [slurm-users] Heterogeneous job one MPI_COMM_WORLD

2018-10-10 Thread Mehlberg, Steve
Hi Christopher, We hit some problems at LANL trying to use this SLURM feature. At the time, I think SchedMD said there would need to be fixes to the SLURM PMI2 library to get this to work. What version of SLURM are you using? H

Re: [slurm-users] Heterogeneous job one MPI_COMM_WORLD

2018-10-10 Thread Pritchard Jr., Howard
Hi Christopher, We hit some problems at LANL trying to use this SLURM feature. At the time, I think SchedMD said there would need to be fixes to the SLURM PMI2 library to get this to work. What version of SLURM are you using? Howard -- Howard Pritchard B Schedule HPC-ENV Office 9, 2nd floor

Re: [slurm-users] Heterogeneous job one MPI_COMM_WORLD

2018-10-10 Thread Chris Samuel
On 10/10/18 05:07, Christopher Benjamin Coffey wrote:
> Yet, we get an error: "srun: fatal: Job steps that span multiple components of a heterogeneous job are not currently supported". But the docs seem to indicate it should work?
Which version of Slurm are you on? It was disabled by default i

Re: [slurm-users] Heterogeneous job one MPI_COMM_WORLD

2018-10-09 Thread Gilles Gouaillardet
Christopher,
This looks like a SLURM issue, and Open MPI is (currently) out of the picture. What if you
    srun --pack-group=0,1 hostname
Do you get a similar error?
Cheers, Gilles
On 10/10/2018 3:07 AM, Christopher Benjamin Coffey wrote:
> Hi, I have a user trying to set up a heterogene

[slurm-users] Heterogeneous job one MPI_COMM_WORLD

2018-10-09 Thread Christopher Benjamin Coffey
Hi, I have a user trying to set up a heterogeneous job with one MPI_COMM_WORLD with the following:
==
#!/bin/bash
#SBATCH --job-name=hetero
#SBATCH --output=/scratch/cbc/hetero.txt
#SBATCH --time=2:00
#SBATCH --workdir=/scratch/cbc
#SBATCH --cpu