Hi, I recently set up Slurm for the first time on our small cluster and got everything working well except for one issue. When submitting jobs that request both GPUs and CPUs, a job requesting 1 GPU + 1 CPU is allocated across the nodes as expected, but a job requesting 1 GPU + 2 CPUs is not. I'm not sure exactly what's causing the issue and was hoping someone might have some suggestions.
Slurm version: 22.05.3
OS: RedHat 7.9 (head node), RedHat 7.4 (compute nodes)
Hardware config: 1 head node, 5 compute nodes, each with 2 GPUs and 8 CPUs

Some example scenarios to explain the problem:

Submitting a job requesting 1 CPU and 1 GPU works fine:

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --mem=4GB
    #SBATCH --cpus-per-task=1
    #SBATCH --gpus=1

- Job A requests 1 CPU, 1 GPU and 4GB memory -> assigned to node1
- Job B requests 1 CPU, 1 GPU and 4GB memory -> assigned to node1
- Job C requests 1 CPU, 1 GPU and 4GB memory -> assigned to node2, as there are only 2 GPUs per node

Submitting a job requesting 2 CPUs and 1 GPU causes issues (same script, but with):

    #SBATCH --cpus-per-task=2

- Job A requests 2 CPUs, 1 GPU and 4GB memory -> assigned to node1
- Job B requests 2 CPUs, 1 GPU and 4GB memory -> assigned to node2, even though node1 should still have resources available

Including what might be relevant info from slurm.conf below in case it's helpful:

    DefMemPerCPU=2048
    SchedulerType=sched/backfill
    SelectType=select/cons_tres
    SelectTypeParameters=CR_CPU_Memory
    DefCpuPerGPU=1
    GresTypes=gpu
    NodeName=computenodes[1-5] NodeAddr=computenodes[1-5] CPUs=8 RealMemory=64189 Gres=gpu:2 State=UNKNOWN
    PartitionName=batch Nodes=ALL Default=YES MaxTime=INFINITE State=UP

Appreciate any suggestions/ideas!

Thanks,
Rohith
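
P.S. In case it's clearer as a complete example, the full batch script for the problematic 2-CPU case is the same as the 1-CPU script above, with only the --cpus-per-task line changed:

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --mem=4GB
    #SBATCH --cpus-per-task=2   # only change from the 1-CPU script above
    #SBATCH --gpus=1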