Hey Rob,
Perhaps something along the lines of srun --ntasks=2 --gres=gpu:4 nvidia-smi would help you?
This will run two tasks, each with 4 GPUs, and execute nvidia-smi;
the output should be similar to running nvidia-smi on one 8-GPU server.
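
As a rough sketch, the same idea in a batch script might look like this (nvidia-smi is just a stand-in for the real launch line, and keep in mind --gres is counted per node):

#!/bin/bash
#SBATCH --ntasks=2
#SBATCH --gres=gpu:4    # per node: 4 GPUs on each node the job lands on

# nvidia-smi is only here to show which GPUs each task sees;
# replace it with the real MPI launch line
srun nvidia-smi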


On 22/02/2018 01:26, Rob Middleton wrote:
Hello,

I'm relatively new to administering slurm, so my apologies if I've missed something obvious.

We have nodes with 4 GPUs and nodes with 8 GPUs. I would like users to be able to request the total number of GPUs they require. The MPI software is not fussed about how many nodes it spans.

I had hoped requests such as these would work:
#SBATCH --gres=gpu:8
#SBATCH --exclusive
#SBATCH --nodes=1-2

However, as "gres" (and the alternative workaround, "mem") is a per-node resource rather than a per-job one, this doesn't work -- a pair of 4-GPU boxes can never be chosen.

So -- is there a way to do this right, or to fake it? Such jobs should run on whatever appropriate hardware configuration is first available. The submitted job script will then slightly adjust our software configuration depending on the hardware type it lands on, before launching via srun.
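
For concreteness, that reconfiguration step would look roughly like this (the config file names and application binary are placeholders):

# count the GPUs visible on the first allocated node and pick a matching config
GPUS_PER_NODE=$(srun --nodes=1 --ntasks=1 nvidia-smi --list-gpus | wc -l)
if [ "$GPUS_PER_NODE" -ge 8 ]; then
    CONFIG=one-big-node.conf
else
    CONFIG=two-small-nodes.conf
fi
srun ./our_mpi_app --config "$CONFIG"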


As an alternative -- I note the "heterogeneous jobs" feature. This allows jobs which require resources of "hardware config A" AND "hardware config B". Is there any way to request one hardware configuration OR another?
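
For reference, the heterogeneous-job syntax I mean is, roughly, the colon-separated form (the numbers and binary name here are just examples); it allocates both component configurations at once, rather than one or the other:

srun --nodes=1 --gres=gpu:8 ./our_mpi_app : --nodes=2 --gres=gpu:4 ./our_mpi_app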


I can almost fake it for a single use-case with "constraints"; however, this syntax doesn't seem to be understood by the parser code:
--constraints=[grp1|grp2|grp3|grp4]&[gpuA*1&gpuB*1]
--nodes=1-2
--exclusive

With example node configuration:
NodeName=small1 Gres=gpu:4 Feature=gpuA,grp1
NodeName=small2 Gres=gpu:4 Feature=gpuB,grp1
NodeName=small3 Gres=gpu:4 Feature=gpuB,grp2
NodeName=small4 Gres=gpu:4 Feature=gpuB,grp2
NodeName=big1 Gres=gpu:8 Feature=gpuA,gpuB,grp3
NodeName=big2 Gres=gpu:8 Feature=gpuA,gpuB,grp4
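
For what it's worth, a single bracketed OR group on its own does appear to be accepted (if I'm reading the sbatch docs right), e.g.:

--constraint=[grp1|grp2|grp3|grp4]
--nodes=1-2
--exclusive

so I suspect it's the additional &[gpuA*1&gpuB*1] group that trips up the parser.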


All ideas are appreciated.

Thanks,
Rob Middleton.
