Dear all,

I have two users on our cluster who "bought into" it, much like a condo model, by purchasing a single physical node each. For those users, I have attempted to configure two QOS levels, such that when they submit jobs and invoke the QOS, they will have preemptive, priority access to resources up to the amount provided by the nodes they purchased. When they are not using those resources, I want them to be available to any user on the system.

Apparently, this is not working as designed. One user who should have had priority was waiting for resources for over 10 minutes on a super small, simple job; based on expected behavior, she should have preempted a running job and more or less started immediately. Something might be wrong in my configuration, and I'd also welcome other thoughts on how to test that the QOS is working properly.

Here's what I have in slurm.conf:

    PreemptType=preempt/qos
    PreemptMode=REQUEUE

The list of QOS seems reasonable:

    [simmsj@hpc ~]$ sacctmgr show qos format=name,priority
          Name   Priority
    ---------- ----------
        normal          0
    hendricks+        100
    douglaslab        100

And here is the sample job invocation that was stuck in the queue:

    srun -t 45 --cpus-per-task=1 --mem-per-cpu=1gb --qos=douglaslab --pty /bin/bash

And here is how I created the QOS in the first place:

    sacctmgr add qos douglaslab
    sacctmgr modify qos douglaslab set priority=100
    sacctmgr modify qos douglaslab set GrpCPUs=24 #because the node has 24 CPUs

Also, I verified that the appropriate users are in the QOS groups.

In the end, then, I assume that if members of the douglaslab QOS submit jobs that require up to 24 CPUs, they can preempt running jobs to get that many if they are not immediately available. But this doesn't seem to be working, as noted.

Any advice would be welcome!

Warmest regards,
Jason

--
*Jason L. Simms, Ph.D., M.P.H.*
Manager of Research and High-Performance Computing
XSEDE Campus Champion
Lafayette College
Information Technology Services
710 Sullivan Rd | Easton, PA 18042
Office: 112 Skillman Library
p: (610) 330-5632