Hi all, I am currently having a problem limiting the number of CPUs used for a job. I tried to limit the CPUs to just 2 out of the maximum of 56, but when I submit a job that uses only 1 CPU, Slurm reports that the QOS limit has already been reached. When I set the CPU limit to 56, the job runs fine.
Does anyone have any suggestions about this problem? The details follow.

My node has 56 cores (2 sockets x 28 cores). I have already configured slurm.conf to enforce QOS limits:

#slurm.conf
AccountingStorageEnforce=qos,limits

For the QOS itself, I simply set the CPU limit to 2:

#QOS
sacctmgr show qos where Name=normal_compute format=Name,Priority,UsageFactor,MaxWall,MaxTRESPU
      Name   Priority UsageFactor     MaxWall     MaxTRESPU
---------- ---------- ----------- ----------- -------------
normal_co+         10    1.000000    00:01:00  cpu=2,mem=1G

I then applied the QOS to a specific user, sukman:

#QOS-defined user
sacctmgr list association where User=sukman format=User,QOS
      User                  QOS
---------- --------------------
    sukman       normal_compute

Then I tried to run a simple bash command, hostname, using just 1 node, 1 task, and 1 CPU:

#!/bin/bash
#SBATCH --job-name=hostname
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --nodelist=cn110

srun hostname

However, the job is held because the QOS CPU limit is reported as already reached:

squeue
 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    68      defq hostname   sukman PD       0:00      1 (QOSMaxCpuPerUserLimit)

When I change the CPU limit to the maximum number of cores in the server, 56:

sacctmgr show qos where Name=normal_compute format=Name,Priority,UsageFactor,MaxWall,MaxTRESPU
      Name   Priority UsageFactor     MaxWall     MaxTRESPU
---------- ---------- ----------- ----------- -------------
normal_co+         10    1.000000    00:01:00 cpu=56,mem=1G

the script runs perfectly:

cat slurm-68.out
cn110

--------------------------------
Suksmandhira H
ITB Indonesia
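P.S. In case it helps anyone reproduce this, the QOS and association shown above can be created with standard sacctmgr commands along these lines (a sketch; the qos and user names match my setup, everything else assumes defaults, and the exact commands I originally ran may have differed slightly):

# create the QOS and set the per-user limits shown in the tables above
sacctmgr add qos normal_compute
sacctmgr modify qos normal_compute set Priority=10 MaxWall=00:01:00 MaxTRESPerUser=cpu=2,mem=1G

# attach the QOS to the user
sacctmgr modify user sukman set qos=normal_compute

# reread slurm.conf (a slurmctld restart may be needed for the
# AccountingStorageEnforce change to take effect)
scontrol reconfigure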