Thank you, Carsten. I'll take a closer look at the QOS limit approach.

If I understand the documentation correctly, partition limits (non-QOS) are set in slurm.conf. Although there are options there for limiting the maximum number of nodes per job and the maximum number of CPUs per node, slurm.conf has no option for limiting the total number of CPUs a single user can use, so my original approach will not work.

The partition QOS you describe seems to be the way to set a default limit for everyone on the partition.

The only other approach I can see would be to set an association limit for every account individually.
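For comparison, a rough sketch of that per-association alternative. It assumes a cap of 200 CPUs per user (the value from this thread) and sets GrpTRES on each user's associations, which limits the CPUs that the association's running jobs can hold in aggregate. The loop is illustrative only: a user with several accounts gets a separate cap in each, new users need the same treatment, and setting GrpTRES on a Slurm account itself would instead cap all of that account's users together.

```shell
#!/bin/bash
# Sketch only: apply the same CPU cap to every user association, one by one.
# Assumes accounting limits are enforced (AccountingStorageEnforce includes "limits").
LIMIT="cpu=200"   # placeholder value taken from this thread

# -n: no header, -P: parsable output, one user name per line
for user in $(sacctmgr -nP show user format=User); do
    [ "$user" = "root" ] && continue   # leave the root user uncapped
    # -i: commit immediately instead of prompting for confirmation
    sacctmgr -i modify user where name="$user" set GrpTRES="$LIMIT"
done
```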

Thank you,

-Dj


On 9/23/21 07:18, Carsten Beyer wrote:
Hi Dj,

The solution could be two QOSes. We use something similar to restrict usage of GPU nodes (MaxTRESPU=node=2). The examples below are from our test cluster, where the limits are lower (cpu=10 and cpu=40 instead of 200 and 400).

1) Create a QOS with, e.g., MaxTRESPU=cpu=200 and assign it to your partition, e.g.:

[root@bta0 ~]# sacctmgr -s show qos maxcpu format=Name,MaxTRESPU
      Name     MaxTRESPU
---------- -------------
    maxcpu        cpu=10
[root@bta0 ~]#
[root@bta0 ~]# scontrol show part maxtresputest
PartitionName=maxtresputest
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=maxcpu
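The output above shows the finished setup but not how it was created. A minimal sketch of the commands that would produce it, reusing the QOS name maxcpu and partition name maxtresputest from the output; the node list in the slurm.conf comment is a placeholder. QOS limits are only enforced when AccountingStorageEnforce in slurm.conf includes limits.

```shell
# Create the QOS and give it a per-user CPU cap
# (MaxTRESPerUser is displayed as MaxTRESPU by sacctmgr).
sacctmgr -i add qos maxcpu
sacctmgr -i modify qos maxcpu set MaxTRESPerUser=cpu=200

# Attach it as the partition QOS in slurm.conf, e.g.:
#   PartitionName=maxtresputest Nodes=<your nodes> QOS=maxcpu
# or change a running cluster with scontrol (keep slurm.conf in sync
# so the setting is not lost):
scontrol update PartitionName=maxtresputest QOS=maxcpu
```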

If a user submits jobs requesting more CPUs than the limit allows, the new jobs stay pending with the reason 'QOSMaxCpuPerUserLimit' in squeue:

kxxxxxx@btlogin1% squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            125316 maxtrespu maxsubmi  kxxxxxx PD       0:00      1 (QOSMaxCpuPerUserLimit)
            125317 maxtrespu maxsubmi  kxxxxxx PD       0:00      1 (QOSMaxCpuPerUserLimit)
            125305 maxtrespu maxsubmi  kxxxxxx  R       0:45      1 btc30
            125306 maxtrespu maxsubmi  kxxxxxx  R       0:45      1 btc30

2) Create a second QOS with Flags=DenyOnLimit,OverPartQOS and MaxTRESPU=cpu=400. Assign it to a user who should be able to exceed the 200-CPU limit; that user is then limited to 400 CPUs instead. The user has to request this QOS when submitting new jobs. Example:

[root@bta0 ~]# sacctmgr -s show qos overpart format=Name,Flags%30,MaxTRESPU
      Name                          Flags     MaxTRESPU
---------- ------------------------------ -------------
  overpart        DenyOnLimit,OverPartQOS        cpu=40
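A minimal sketch of how the second QOS might be created, granted, and used. DenyOnLimit makes Slurm reject a job at submission if it exceeds this QOS's Max limits (instead of leaving it pending), and OverPartQOS lets this QOS's limits override those of the partition QOS. The user name someuser and the script job.sh are hypothetical placeholders.

```shell
# Second QOS: higher per-user cap, rejects over-limit jobs at submit time,
# and overrides the partition QOS limits.
sacctmgr -i add qos overpart
sacctmgr -i modify qos overpart set Flags=DenyOnLimit,OverPartQOS MaxTRESPerUser=cpu=400

# Allow one specific user (hypothetical name) to request this QOS.
sacctmgr -i modify user where name=someuser set QOS+=overpart

# That user then requests the QOS explicitly when submitting (hypothetical script).
sbatch --partition=maxtresputest --qos=overpart job.sh
```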


Cheers,
Carsten
