[slurm-users] Can't specify multiple partitions when submitting GPU jobs
Hi, I have defined a partition for each GPU type we have in the cluster. This was mainly because I have different node types for each GPU type and I want to set `DefCpuPerGPU` and `DefMemPerGPU` for each of them; unfortunately these can't be set per node, only per partition.

Now sometimes people don't care about the GPU type and would like any of the partitions to pick up the job. `--partition` in `sbatch` does allow specifying multiple partitions, and this works fine when I'm not specifying `--gpus`. However, when I do something like `sbatch -p A,B --gpus 1 script.sh`, I get "srun: job 6279 queued and waiting for resources" even though partition B does have a GPU to offer. Strangely, if the first partition specified (i.e. A) had a free GPU, it would allocate the GPU and run the job.

Is this a bug? Perhaps related to this: https://groups.google.com/g/slurm-users/c/UOUVfkajUBQ
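For reference, a minimal batch script for this kind of multi-partition GPU submission might look like the sketch below; the partition names A and B and the placeholder workload are purely illustrative:

```
#!/bin/bash
#SBATCH --partition=A,B    # job may run in whichever listed partition frees up first
#SBATCH --gpus=1           # one GPU of any type
#SBATCH --time=00:05:00

# Placeholder workload: report which node and GPU were actually allocated
hostname
nvidia-smi -L
```

After submitting with `sbatch script.sh`, something like `squeue -j <jobid> -o "%i %P %T %R"` shows the job ID, partition, state, and the pending reason (or the allocated nodes).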
[slurm-users] Re: Can't specify multiple partitions when submitting GPU jobs
My partition definitions are super simple:

```
PartitionName=t4 Nodes=slurm-t4-[1-30] DEFAULT=YES MaxTime=INFINITE State=UP DefCpuPerGPU=16 DefMemPerGPU=14350
PartitionName=a100-40 Nodes=slurm-a100-40gb-[1-30] MaxTime=INFINITE State=UP DefCpuPerGPU=12 DefMemPerGPU=85486
PartitionName=a100-80 Nodes=slurm-a100-80gb-[1-30] MaxTime=INFINITE State=UP DefCpuPerGPU=12 DefMemPerGPU=85486
```

I looked at the PARTITION CONFIGURATION section of the slurm.conf man page but don't see anything that would relate to multiple partitions and/or the number of tasks.
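For anyone following along, a quick way to sanity-check what each partition actually advertises is with `scontrol` and `sinfo`; this is only a sketch, and the exact fields shown depend on the Slurm version:

```
# Show the full definition Slurm has loaded for one partition,
# including any per-partition job defaults such as DefCpuPerGPU/DefMemPerGPU
scontrol show partition t4

# Compare node counts, generic resources (GPUs) and node states across the partitions
sinfo -p t4,a100-40,a100-80 -o "%P %D %G %T"
```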
[slurm-users] Re: Can't specify multiple partitions when submitting GPU jobs
Update: I also noticed that specifying `--ntasks` makes a difference when `--gpus` is present. If I have two partitions, a100 and h100, that both have free GPUs:

✅ h100 specified first in `-p`: works
`sbatch -p h100,a100 --gpus h100:1 script.sh`

❌ h100 specified second: doesn't work
`sbatch -p a100,h100 --gpus h100:1 script.sh`

✅ Adding `--ntasks`: works
`sbatch -p a100,h100 --gpus h100:1 --ntasks 1 script.sh`
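Based on the observations above, one possible workaround sketch is to always pass an explicit `--ntasks` when combining multiple partitions with a typed `--gpus` request. The partition names and the h100 GPU type are just the ones from the examples; adjust for your site:

```
#!/bin/bash
#SBATCH --partition=a100,h100   # either partition may run the job
#SBATCH --gpus=h100:1           # request one GPU of type h100
#SBATCH --ntasks=1              # explicit task count; without it the multi-partition
                                # GPU request was observed to stay pending (see above)
#SBATCH --time=00:10:00

# Placeholder workload: show the GPU that was allocated
srun nvidia-smi -L
```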