Hi, I recently set up Slurm for the first time on our small cluster and got everything working well except for one issue. When submitting jobs that request both GPUs and CPUs, a job requesting 1 GPU + 1 CPU is allocated across the nodes as expected, but a job requesting 1 GPU + 2 CPUs is not. I'm not sure exactly what's causing the issue and was hoping someone might have some suggestions.
Slurm version: 22.05.3
OS: RedHat 7.9 (head node), RedHat 7.4 (compute nodes)
Hardware config: 1 head node, 5 compute nodes, each with 2 GPUs and 8 CPUs

Some example scenarios to explain the problem:

Submitting a job requesting 1 CPU and 1 GPU works fine:

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --mem=4GB
    #SBATCH --cpus-per-task=1
    #SBATCH --gpus=1

- Job A requests 1 CPU, 1 GPU and 4GB memory -> assigned to node1
- Job B requests 1 CPU, 1 GPU and 4GB memory -> assigned to node1
- Job C requests 1 CPU, 1 GPU and 4GB memory -> assigned to node2, as there are only 2 GPUs per node

Submitting a job requesting 2 CPUs and 1 GPU causes issues (same script, but with):

    #SBATCH --cpus-per-task=2

- Job A requests 2 CPUs, 1 GPU and 4GB memory -> assigned to node1
- Job B requests 2 CPUs, 1 GPU and 4GB memory -> assigned to node2, even though node1 should still have resources available

Including what might be relevant info from slurm.conf below in case it's helpful:

    DefMemPerCPU=2048
    SchedulerType=sched/backfill
    SelectType=select/cons_tres
    SelectTypeParameters=CR_CPU_Memory
    DefCpuPerGPU=1
    GresTypes=gpu
    NodeName=computenodes[1-5] NodeAddr=computenodes[1-5] CPUs=8 RealMemory=64189 Gres=gpu:2 State=UNKNOWN
    PartitionName=batch Nodes=ALL Default=YES MaxTime=INFINITE State=UP

Appreciate any suggestions/ideas!

Thanks,
Rohith
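
P.S. In case it's clearer as a complete example, the full batch script for the problematic 2-CPU case is the same as the 1-CPU script above, with only the --cpus-per-task line changed:

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --mem=4GB
    #SBATCH --cpus-per-task=2   # only change from the 1-CPU script above
    #SBATCH --gpus=1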