Hi,

I believe this is how the preemption algorithm works: it selects the entire node's resources:

> For performance reasons, the backfill scheduler reserves whole nodes for jobs, not partial nodes.
- https://slurm.schedmd.com/preempt.html#limitations

However, that does specifically call out the backfill scheduler. Is that the scheduler type you're using?

- Michael

On Tue, Jan 17, 2023 at 4:06 AM Michał Kadlof <michal.kad...@pw.edu.pl> wrote:

> Hi,
>
> I am struggling with configuring job preemption. I have nodes with 8 Nvidia
> A100 GPUs. I have two partitions: short (lower priority) and sfglab (higher
> priority). I want to allow higher priority jobs to preempt (REQUEUE mode)
> lower priority jobs. It looks like it works, but it works too well.
>
> A job from the higher priority partition preempts the entire host instead of
> only a single job, which would be enough to release resources for the higher
> priority partition. What's more, it locks the rest of the resources until the
> high-priority job ends. What am I doing wrong?
>
> Here is an example:
>
> $ srun --test-only -G1 -c1 --mem 1M -p sfglab
> srun: Job 501151 to start at 2023-01-17T12:46:01 using 1 processors on nodes dgx-1 in partition sfglab
> srun: Preempts: 363278,501001,501029,501075,501076,501077,501120,501121
>
> To release these resources it would be enough to preempt one job instead
> of all of them.
>
> Here is my config:
>
> slurm.conf
>
> (...)
>
> DefMemPerCPU = 100
> JobAcctGatherFrequency = 30
> JobAcctGatherType = jobacct_gather/linux
> PreemptMode = REQUEUE
> PreemptType = preempt/partition_prio
> PreemptExemptTime = 00:00:00
> SelectType = select/cons_tres
> SelectTypeParameters = CR_CORE_MEMORY
>
> (...)
>
> PartitionName=short Nodes=dgx-[1-4],sr-[1-3] MaxTime=1-0 State=UP PriorityTier=10000 Default=YES DefaultTime=0-01:00:00 OverSubscribe=NO PreemptMode=requeue
>
> PartitionName=sfglab Nodes=dgx-1 MaxTime=10-0 State=UP PriorityTier=20000 PreemptMode=off OverSubscribe=NO AllowAccounts=sfglab
>
> --
> best regards | pozdrawiam serdecznie
> *Michał Kadlof*
> Head of the high performance computing center
> EdenN cluster administrator
> Faculty of Mathematics and Computer Science
> Warsaw University of Technology
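
On the scheduler question: the quoted slurm.conf elides the relevant lines with "(...)", so the configured SchedulerType cannot be read off the excerpt. As a rough sketch only, the following shows one way to look it up in a slurm.conf-style file. The temporary file and its three lines are a hypothetical stand-in copied from the excerpt above, not the real configuration, and the variable names are made up for illustration. The fallback to sched/backfill reflects the documented default for SchedulerType when the option is not set.

```shell
#!/bin/sh
# On a live cluster, `scontrol show config | grep -i SchedulerType`
# reports the value actually in effect. This sketch reads a file instead.

# Hypothetical stand-in for slurm.conf (assumption: SchedulerType is not set).
conf=$(mktemp)
cat > "$conf" <<'EOF'
PreemptMode = REQUEUE
PreemptType = preempt/partition_prio
SelectType = select/cons_tres
EOF

# Take the last SchedulerType assignment found, if any, and strip whitespace.
sched=$(grep -i '^[[:space:]]*SchedulerType[[:space:]]*=' "$conf" \
  | tail -n 1 | cut -d= -f2 | tr -d '[:space:]')

# Fall back to the documented default when the option is absent.
effective=${sched:-sched/backfill}
echo "SchedulerType=$effective"

rm -f "$conf"
```

If this reports sched/backfill, the whole-node limitation quoted from the preemption documentation would apply to this setup.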