Hey everyone,

Perhaps I am asking a basic question, but I really don't understand how preemption works.

The scenario (simplified for the example) is as follows:

Nodes:
NodeName=A1 CPUS=2 RealMemory=128906 TmpDisk=117172
NodeName=A2 CPUS=30 RealMemory=128906 TmpDisk=117172 Gres=gpu:3

Partitions:
PartitionName=lab1 Nodes=A2 QOS=lab Default=No State=UP
PartitionName=all Nodes=A2,A1 QOS=normal Default=Yes State=UP

Users:
u1: qos=lab
u2: qos=normal

Commands (in this order):
u2: srun --gres=gpu:2 --pty bash
u1: srun --gres=gpu:2 --pty bash

Result:
squeue -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %R %Q"
 JOBID PARTITION  NAME USER ST  TIME NODES NODELIST(REASON) PRIORITY
   318      lab1  bash   u1 PD  0:00     1 (Resources)        101177
   317       all  bash   u2  R  0:21     1 A2                     20

As you can see, u1 didn't get his resources, because (I believe) a QOS cannot preempt jobs of another QOS that run on a different partition, even though they use the same resources.

How should I configure the cluster so that all users with a specific QOS (lab) can suspend jobs of every other QOS (not lab) on a specific partition (lab1)?

sacctmgr show qos
  Name  Priority  GraceTime  Preempt  PreemptMode
  lab1      1000   00:01:00   normal      suspend
normal         0   00:00:00

slurm.conf:
PreemptType=preempt/qos
PreemptMode=suspend,gang
PriorityType=priority/multifactor
PriorityDecayHalfLife=30-0
PriorityMaxAge=10000
PriorityWeightFairshare=10000
PriorityWeightQOS=100000
AccountingStorageEnforce=associations,limits,qos

Thanks in advance,
Nadav