Suksmandhira,

That QOS specifies walltime, CPU, and memory limits. From the job script, it appears you are within the CPU limit. However, the job script specifies neither walltime nor memory, and your squeue output does not show those values (or the CPU count) for the job. 'scontrol show job=JOBID' will show all of them.

Adding flags=DenyOnLimit to the QOS will make Slurm reject a job at submission when it exceeds a limit of the QOS, which should keep jobs that can never run from sitting in the queue.
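
As a rough sketch of the above: the snippet below writes your job script with explicit walltime and memory requests, submits it if the Slurm client tools are present, and prints what the scheduler recorded for the job. The file name hostname.sbatch and the values --time=00:01:00 and --mem=1G are assumptions, picked only so the request fits the normal_compute limits you posted. Whether the missing requests are really what triggers the limit is not confirmed here; the scontrol output should tell.

```shell
#!/bin/bash
# Sketch only. The file name and the --time/--mem values are assumptions
# chosen to fit the posted QOS (MaxWall=00:01:00, MaxTRESPU cpu=2,mem=1G).

# Write the original job script plus explicit walltime and memory requests.
cat > hostname.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=hostname
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --nodelist=cn110
#SBATCH --time=00:01:00
#SBATCH --mem=1G
srun hostname
EOF

# Submit and inspect only where the Slurm client tools are installed.
if command -v sbatch >/dev/null 2>&1; then
    # --parsable prints "jobid" or "jobid;cluster"; keep the job id only.
    jobid=$(sbatch --parsable hostname.sbatch)
    jobid=${jobid%%;*}
    # Show everything recorded for the job, including its requested resources.
    scontrol show job "$jobid"
else
    echo "sbatch not found; wrote hostname.sbatch without submitting"
fi

# For the admin, to reject over-limit jobs at submission (not run here):
#   sacctmgr modify qos normal_compute set flags=DenyOnLimit
```

In the scontrol output, comparing the reported CPU count, memory, and time limit against the QOS values should show which of the limits the job is actually hitting.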

-b

On 11/7/19 9:37 PM, Sukman wrote:
Hi all,

I am currently having a problem limiting the number of CPUs used for running a job. I tried to limit the CPUs to only 2 out of the maximum of 56. But when I run a job using only 1 CPU, the QOS limit is already reached. When I set the CPU limit to 56, the job runs fine.

Does anyone have any suggestions regarding this problem? The details follow.

My node has 56 cores (2 sockets x 28 cores). I have already configured slurm.conf to enable QOS/limit enforcement.

#slurm.conf
AccountingStorageEnforce=qos,limits

For the QOS itself, I just tried applying a simple limit, setting the number of CPUs to 2.

#QOS
sacctmgr show qos where Name=normal_compute format=Name,Priority,UsageFactor,MaxWall,MaxTRESPU

      Name   Priority UsageFactor     MaxWall     MaxTRESPU
---------- ---------- ----------- ----------- -------------
normal_co+         10    1.000000    00:01:00  cpu=2,mem=1G

I then applied the QOS to a specific user, sukman.

#QOS-defined user
sacctmgr list association where User=sukman format=User,QOS,

      User                  QOS
---------- --------------------
    sukman       normal_compute

Then I tried to run a simple command, hostname, using just 1 node, 1 task, and 1 CPU.

#!/bin/bash
#SBATCH --job-name=hostname
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --nodelist=cn110
srun hostname

However, the QOS limit is already reached.

squeue

JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
   68      defq hostname   sukman PD  0:00     1 (QOSMaxCpuPerUserLimit)

When I change the CPU limit to the maximum number of cores in the server, 56 cores,

sacctmgr show qos where Name=normal_compute format=Name,Priority,UsageFactor,MaxWall,MaxTRESPU

      Name   Priority UsageFactor     MaxWall     MaxTRESPU
---------- ---------- ----------- ----------- -------------
normal_co+         10    1.000000    00:01:00 cpu=56,mem=1G

the script runs perfectly.

cat slurm-68.out
cn110

--------------------------------
Suksmandhira H
ITB Indonesia