Suksmandhira,
That QOS specifies walltime, CPU, and memory limits.  From the job script, it 
appears you are within the CPU limit.  However, the job script specifies 
neither walltime nor memory, and your squeue output does not show those values 
(or CPU) for the job.
'scontrol show job=JOBID' will show all of the values.  Adding 
flags=DenyOnLimit to the QOS will reject a job at submission when it exceeds a 
QOS limit, so that jobs that can never run do not sit in the queue.
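
As a rough sketch only, the job script could request walltime and memory 
explicitly. The --time and --mem values below are assumptions chosen to match 
the MaxWall=00:01:00 and mem=1G shown for normal_compute in the quoted message, 
and --qos is added on the assumption that this QOS is not already the default 
for your association:

```bash
#!/bin/bash
#SBATCH --job-name=hostname
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --nodelist=cn110
#SBATCH --qos=normal_compute   # assumption: select the QOS explicitly
#SBATCH --time=00:01:00        # assumption: stay within MaxWall=00:01:00
#SBATCH --mem=1G               # assumption: stay within mem=1G

srun hostname
```

This may not clear the pending reason by itself. Comparing the CPU and memory 
values reported by 'scontrol show job=JOBID' against the QOS limits should show 
what the scheduler actually assigned to the job.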

-b

On 11/7/19 9:37 PM, Sukman wrote:
Hi all,

I am currently having a problem limiting the number of CPUs used for running 
a job.
I tried to limit the CPUs to just 2 out of the maximum of 56.
But when I run the job using only 1 CPU, the QOS limit is already reached.
When I set the CPU limit to 56, the job runs fine.

Does anyone have any suggestion regarding this problem?


The details of the problem follow.


My node has 56 cores (2 sockets x 28 cores).


I have already configured slurm.conf to enable QOS/limit enforcement.

#slurm.conf
AccountingStorageEnforce=qos,limits


For the QOS itself, I just tried applying a simple limit: a CPU count of 2.

#QOS
sacctmgr show qos where Name=normal_compute format=Name,Priority,UsageFactor,MaxWall,MaxTRESPU
       Name   Priority UsageFactor     MaxWall     MaxTRESPU
---------- ---------- ----------- ----------- -------------
normal_co+         10    1.000000    00:01:00  cpu=2,mem=1G


I then applied the QOS to a specific user, sukman.

#QOS-defined user
sacctmgr list association where User=sukman format=User,QOS,
       User                  QOS
---------- --------------------
     sukman       normal_compute


Then I tried to run a simple command, hostname, using just 1 node, 1 task, 
and 1 CPU.

#!/bin/bash
#SBATCH --job-name=hostname
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --nodelist=cn110

srun hostname


However, the QOS limit has already been reached.

squeue
              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 68      defq hostname   sukman PD       0:00      1 (QOSMaxCpuPerUserLimit)


When I change the CPU limit to the maximum number of cores in the server, 56,

sacctmgr show qos where Name=normal_compute format=Name,Priority,UsageFactor,MaxWall,MaxTRESPU
       Name   Priority UsageFactor     MaxWall     MaxTRESPU
---------- ---------- ----------- ----------- -------------
normal_co+         10    1.000000    00:01:00 cpu=56,mem=1G


the script runs perfectly.

cat slurm-68.out
cn110



--------------------------------

Suksmandhira H
ITB Indonesia

