Hi, Thanks for your help.
Neither setting a qos nor setting a priority works for me. However, I have found the cause, if not the reason.

Setting Priority on the partition called "Priority" in slurm.conf seems to force all jobs waiting in that partition to run first, regardless of any qos set on a job. Priority is not a limit, but I think this is a bit inconsistent with the limit hierarchy we see elsewhere, and possibly even a bug:

1. Partition QOS limit
*2. Job QOS limit*
3. User association
4. Account association(s), ascending the hierarchy
5. Root/Cluster association
*6. Partition limit*
7. None

So for multiple partitions with differing priorities, I can get the same effect by moving the priority into a qos, applying that qos to the partition, and then taking care to set the OverPartQOS flag on the "boost" qos.

Does anyone have a feeling for why setting a high Priority on a partition makes jobs in that partition run first, even when a job in a different partition has a much higher overall priority?

Sean

On Mon, 11 Mar 2019 at 17:00, Sean Brisbane <sean.brisb...@securelinx.com> wrote:

> Hi,
>
> I'm looking to have a way an administrator can boost any job to be next to
> run when resources become available. What is the best practice way to do
> this? Happy to try something new :-D
>
> The way I thought to do this was to have a qos with a large priority and
> manually assign this to the job. Job 469 is the job in this example I am
> trying to elevate to be next in queue.
>
> scontrol update jobid=469 qos=boost
>
> sprio shows that this job is the highest priority by quite some way,
> however, job number 492 will be next to run
>
> squeue (excluding running jobs)
> JOBID PARTITION NAME     USER     ST TIME NODES NODELIST(REASON)
>   469 Backgroun sleeping centos   PD 0:00     1 (Resources)
>   492 Priority  sleepy.s superuse PD 0:00     1 (Resources)
>   448 Backgroun sleepy.s groupboo PD 0:00     1 (Resources)
>   478 Backgroun sleepy.s groupboo PD 0:00     1 (Resources)
>   479 Backgroun sleepy.s groupboo PD 0:00     1 (Resources)
>   480 Backgroun sleepy.s groupboo PD 0:00     1 (Resources)
>   481 Backgroun sleepy.s groupboo PD 0:00     1 (Resources)
>   482 Backgroun sleepy.s groupboo PD 0:00     1 (Resources)
>   483 Backgroun sleepy.s groupboo PD 0:00     1 (Resources)
>   484 Backgroun sleepy.s groupboo PD 0:00     1 (Resources)
>   449 Backgroun sleepy.s superuse PD 0:00     1 (Resources)
>   450 Backgroun sleepy.s superuse PD 0:00     1 (Resources)
>   465 Backgroun sleeping centos   PD 0:00     1 (Resources)
>   466 Backgroun sleeping centos   PD 0:00     1 (Resources)
>   467 Backgroun sleeping centos   PD 0:00     1 (Resources)
>
>
> [root@master yp]# sprio
> JOBID PARTITION PRIORITY AGE FAIRSHARE JOBSIZE PARTITION      QOS
>   448 Backgroun    13667  58       484    3125     10000        0
>   449 Backgroun    13205  58        23    3125     10000        0
>   450 Backgroun    13205  58        23    3125     10000        0
>   465 Backgroun    13157  32         0    3125     10000        0
>   466 Backgroun    13157  32         0    3125     10000        0
>   467 Backgroun    13157  32         0    3125     10000        0
>   469 Backgroun 10013157  32         0    3125     10000 10000000
>   478 Backgroun    13640  32       484    3125     10000        0
>   479 Backgroun    13640  32       484    3125     10000        0
>   480 Backgroun    13640  32       484    3125     10000        0
>   481 Backgroun    13610  32       454    3125     10000        0
>   482 Backgroun    13610  32       454    3125     10000        0
>   483 Backgroun    13610  32       454    3125     10000        0
>   484 Backgroun    13610  32       454    3125     10000        0
>   492 Priority   1003158  11        23    3125   1000000        0
>
>
> I'm trying to troubleshoot why the highest priority job is not next to
> run, jobs in the partition called "Priority"
> seem to run first.
>
> The job 469 has no qos, partition, user accounts or group limits on the
> number of cpus,jobs,nodes etc. I've set this test cluster up from scratch
> to be sure!
>
> [root@master yp]# scontrol show job 469
> JobId=469 JobName=sleeping.sh
>    UserId=centos(1000) GroupId=centos(1000) MCS_label=N/A
>    Priority=10013161 Nice=0 Account=default QOS=boost
>    JobState=PENDING Reason=Resources Dependency=(null)
>    Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
>    RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
>    SubmitTime=2019-03-11T16:01:20 EligibleTime=2019-03-11T16:01:20
>    StartTime=2020-03-10T15:23:40 EndTime=Unknown Deadline=N/A
>    PreemptTime=None SuspendTime=None SecsPreSuspend=0
>    LastSchedEval=2019-03-11T16:54:44
>    Partition=Background AllocNode:Sid=master:1322
>    ReqNodeList=(null) ExcNodeList=(null)
>    NodeList=(null)
>    NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>    TRES=cpu=1,node=1
>    Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
>    MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
>    Features=(null) DelayBoot=00:00:00
>    Gres=(null) Reservation=(null)
>    OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
>    Command=/home/centos/sleeping.sh
>    WorkDir=/home/centos
>    StdErr=/home/centos/sleeping.sh.e469
>    StdIn=/dev/null
>    StdOut=/home/centos/sleeping.sh.o469
>    Power=
>
> The partition called "Priority" has a priority boost assigned through qos.
>
> PartitionName=Priority Nodes=compute[01-02] Default=NO MaxTime=INFINITE
> State=UP Priority=1000 QOS=Priority
> PartitionName=Background Nodes=compute[01-02] Default=YES
> MaxTime=INFINITE State=UP Priority=10
>
> Any Ideas would be much appreciated.
>
> Sean
>
> --
> Sean Brisbane | Linux Systems Specialist
> Securelinx Ltd., Pottery Road, Dun Laoghaire, Co. Dublin.
> Registered in Ireland No.
> 357396
> www.securelinx.com <http://www.securelinx.com/> - Linux Leaders in Ireland
>

--
Sean Brisbane | Linux Systems Specialist
Mobile: +353(0)87 627 3024 | Office: +353 1 5065 615 (ext 610)
Securelinx Ltd., Pottery Road, Dun Laoghaire, Co. Dublin.
Registered in Ireland No. 357396
www.securelinx.com <http://www.securelinx.com/> - Linux Leaders in Ireland
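
P.S. To make the behaviour described in this thread concrete, here is a minimal Python sketch. It is a toy model and not Slurm's scheduler code: it assumes pending jobs are ordered first by the Priority value of their partition and only then by the job priority that sprio reports, which is what the test cluster appears to do. The names PendingJob, observed_order and expected_order are made up for this illustration; the numbers are taken from the sprio output and the slurm.conf excerpt quoted above.

```python
from dataclasses import dataclass

# Partition Priority values from the slurm.conf excerpt in this thread.
PARTITION_PRIORITY = {"Priority": 1000, "Background": 10}


@dataclass(frozen=True)
class PendingJob:
    job_id: int
    partition: str
    priority: int  # PRIORITY column reported by sprio


def observed_order(jobs):
    """Assumed model: partition Priority is compared first, job priority second."""
    return sorted(
        jobs,
        key=lambda j: (-PARTITION_PRIORITY[j.partition], -j.priority),
    )


def expected_order(jobs):
    """What was expected instead: job priority alone decides the order."""
    return sorted(jobs, key=lambda j: -j.priority)


# Three of the pending jobs from the sprio listing.
jobs = [
    PendingJob(469, "Background", 10013157),  # carries the "boost" qos
    PendingJob(492, "Priority", 1003158),
    PendingJob(448, "Background", 13667),
]

print("observed:", [j.job_id for j in observed_order(jobs)])
print("expected:", [j.job_id for j in expected_order(jobs)])
```

Under this assumed model job 492 sorts ahead of job 469 despite its lower overall priority, which matches what I see; sorting on job priority alone would put 469 first.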