Hello,

I'm hoping someone can shed some light on why jobs are running on the same nodes simultaneously, rather than the lower-priority job actually being suspended. I can provide more info if that would help!
# Relevant config.
PreemptType=preempt/qos
PreemptMode=SUSPEND,GANG
PartitionName=general Default=YES Nodes=general OverSubscribe=FORCE:1 MaxTime=30-00:00:00 Qos=general AllowQos=general
PartitionName=suspend Default=NO Nodes=general OverSubscribe=FORCE:1 MaxTime=30-00:00:00 Qos=suspend AllowQos=suspend

# Qoses
      Name   Priority    Preempt PreemptMode
---------- ---------- ---------- -----------
   general       1000    suspend     cluster
   suspend        100                cluster

# squeue (another note is I see that both processes are actually running at same time and not being timesliced in htop)
$ squeue
JOBID PARTITION     NAME   USER ST  TIME NODES NODELIST(REASON)
45085   general stress.s  user2  R  7:33     2 node[04-05]
45084   suspend stress-s  user1  R  7:40     2 node[04-05]

Thanks!