Gary,
Well, your first issue is using CycleCloud, but that is mostly opinion :)
Your error states there aren't enough CPUs in the partition, which means
we should take a look at the partition settings.
Take a look at 'scontrol show partition hpc' and see how many nodes and
CPUs are assigned to it. Also check the state of the nodes with 'sinfo'.
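For example (illustrative commands; the totals will tell you what the
partition really has):

scontrol show partition hpc | grep -E 'TotalCPUs|TotalNodes'
sinfo -p hpc -o '%n %c %t'

If TotalCPUs comes back as 5 for 5 nodes, then each node is only
contributing one schedulable CPU.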
It would also be good to ensure the node settings are right. Run 'slurmd
-C' on a node and see if the output matches what is in the config.
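A minimal sketch of the comparison, with values made up from the node
info you quote below:

# On a compute node - the hardware as slurmd detects it:
slurmd -C
# e.g. NodeName=ricslurm-hpc-pg0-1 CPUs=2 Boards=1 SocketsPerBoard=1 CoresPerSocket=1 ThreadsPerCore=2 RealMemory=3072

# The node definition in slurm.conf (or the CycleCloud-generated include)
# should agree with that. If it says CPUs=1 while 'slurmd -C' reports
# CPUs=2, Slurm will only schedule one CPU per node:
NodeName=ricslurm-hpc-pg0-[1-5] CPUs=2 ThreadsPerCore=2 RealMemory=3072 State=CLOUD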
Brian Andrus
On 12/13/2022 1:38 AM, Gary Mansell wrote:
Dear Slurm Users, perhaps you can help me with a problem that I am
having using the Scheduler (I am new to this, so please forgive me for
any stupid mistakes/misunderstandings).
I am unable to submit a multi-threaded MPI job that uses all 10 CPUs on
a small demo cluster I have set up using Azure CycleCloud, and I don't
understand why. Perhaps you can explain what is going wrong and how I
can fix it to use all the available CPUs?
The hpc partition that I have set up consists of 5 nodes (Azure VM type
= Standard_F2s_v2), each with 2 CPUs (I presume these are hyperthreaded
cores rather than 2 separate CPUs, but I am not certain of this).
[azccadmin@ricslurm-hpc-pg0-1 ~]$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 106
model name : Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
stepping : 6
microcode : 0xffffffff
cpu MHz : 2793.436
cache size : 49152 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 21
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm
constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq vmx ssse3
fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand
hypervisor lahf_lm abm 3dnowprefetch invpcid_single tpr_shadow vnmi
ept vpid fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm avx512f
avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl
xsaveopt xsavec md_clear
bogomips : 5586.87
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 106
model name : Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
stepping : 6
microcode : 0xffffffff
cpu MHz : 2793.436
cache size : 49152 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 21
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm
constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq vmx ssse3
fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand
hypervisor lahf_lm abm 3dnowprefetch invpcid_single tpr_shadow vnmi
ept vpid fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm avx512f
avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl
xsaveopt xsavec md_clear
bogomips : 5586.87
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
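Since 'siblings' is 2 but 'cpu cores' is 1, I take this to mean each
node has 1 physical core with 2 hyperthreads (i.e. the 2 vCPUs). A quick
cross-check (assuming lscpu is available) would be something like:

lscpu | grep -E 'Thread\(s\) per core|Core\(s\) per socket|Socket\(s\)'

which I would expect to report 2 threads per core, 1 core per socket,
and 1 socket.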
This is how Slurm sees one of the nodes:
[azccadmin@ricslurm-scheduler LID_CAVITY]$ scontrol show nodes
NodeName=ricslurm-hpc-pg0-1 Arch=x86_64 CoresPerSocket=1
CPUAlloc=0 CPUEfctv=1 CPUTot=1 CPULoad=0.88
AvailableFeatures=cloud
ActiveFeatures=cloud
Gres=(null)
NodeAddr=ricslurm-hpc-pg0-1 NodeHostName=ricslurm-hpc-pg0-1
Version=22.05.3
OS=Linux 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 25 17:23:54 UTC 2020
RealMemory=3072 AllocMem=0 FreeMem=1854 Sockets=1 Boards=1
State=IDLE+CLOUD ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A
MCS_label=N/A
Partitions=hpc
BootTime=2022-12-12T17:42:27 SlurmdStartTime=2022-12-12T17:42:28
LastBusyTime=2022-12-12T17:52:29
CfgTRES=cpu=1,mem=3G,billing=1
AllocTRES=
CapWatts=n/a
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
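Note that Slurm reports CPUTot=1 and CPUEfctv=1 even though
ThreadsPerCore=2, so each node seems to be contributing only a single
schedulable CPU.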
This is the Slurm job control script I have come up with to run the
VECTIS job (I have set 5 nodes, 1 task per node, and 2 CPUs per task – is this right?):
#!/bin/bash
## Job name
#SBATCH --job-name=run-grma
#
## Files for standard output and error
#SBATCH --output=run-grma.out
#SBATCH --error=run-grma.err
#
## Partition for the cluster (you might not need that)
#SBATCH --partition=hpc
#
## Number of nodes
#SBATCH --nodes=5
#
## Number of tasks per node
#SBATCH --ntasks-per-node=1
#
## Number of CPUs per task
#SBATCH --cpus-per-task=2
#
## General
module purge
## Initialise VECTIS 2022.3b4
if [ -d /shared/apps/RealisSimulation/2022.3/bin ]
then
    export PATH=$PATH:/shared/apps/RealisSimulation/2022.3/bin
else
    echo "Failed to Initialise VECTIS"
fi
## Run
vpre -V 2022.3 -np $SLURM_NTASKS /shared/data/LID_CAVITY/files/lid.GRD
vsolve -V 2022.3 -np $SLURM_NTASKS -mpi intel_2018.4 -rdmu /shared/data/LID_CAVITY/files/lid_no_write.inp
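As I understand it, these settings request 5 nodes x 1 task per node x
2 CPUs per task = 10 CPUs in total.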
But the submitted job will not run, as Slurm says there are not enough
CPUs.
Here is the debug log from slurmctld, where you can see it saying that
the job has requested 10 CPUs (which is what I want) but that the hpc
partition only has 5 (which I think is wrong):
[2022-12-13T09:05:01.177] debug2: Processing RPC: REQUEST_NODE_INFO
from UID=0
[2022-12-13T09:05:01.370] debug2: Processing RPC:
REQUEST_SUBMIT_BATCH_JOB from UID=20001
[2022-12-13T09:05:01.371] debug3: _set_hostname: Using auth hostname
for alloc_node: ricslurm-scheduler
[2022-12-13T09:05:01.371] debug3: JobDesc: user_id=20001 JobId=N/A
partition=hpc name=run-grma
[2022-12-13T09:05:01.371] debug3: cpus=10-4294967294 pn_min_cpus=2
core_spec=-1
[2022-12-13T09:05:01.371] debug3: Nodes=5-[5] Sock/Node=65534
Core/Sock=65534 Thread/Core=65534
[2022-12-13T09:05:01.371] debug3:
pn_min_memory_job=18446744073709551615 pn_min_tmp_disk=-1
[2022-12-13T09:05:01.371] debug3: immediate=0 reservation=(null)
[2022-12-13T09:05:01.371] debug3: features=(null)
batch_features=(null) cluster_features=(null) prefer=(null)
[2022-12-13T09:05:01.371] debug3: req_nodes=(null) exc_nodes=(null)
[2022-12-13T09:05:01.371] debug3: time_limit=15-15 priority=-1
contiguous=0 shared=-1
[2022-12-13T09:05:01.371] debug3: kill_on_node_fail=-1 script=#!/bin/bash
## Job name
#SBATCH --job-n...
[2022-12-13T09:05:01.371] debug3:
argv="/shared/data/LID_CAVITY/slurm-runit.sh"
[2022-12-13T09:05:01.371] debug3:
environment=XDG_SESSION_ID=12,HOSTNAME=ricslurm-scheduler,SELINUX_ROLE_REQUESTED=,...
[2022-12-13T09:05:01.371] debug3: stdin=/dev/null
stdout=/shared/data/LID_CAVITY/run-grma.out
stderr=/shared/data/LID_CAVITY/run-grma.err
[2022-12-13T09:05:01.372] debug3: work_dir=/shared/data/LID_CAVITY
alloc_node:sid=ricslurm-scheduler:13464
[2022-12-13T09:05:01.372] debug3: power_flags=
[2022-12-13T09:05:01.372] debug3: resp_host=(null) alloc_resp_port=0
other_port=0
[2022-12-13T09:05:01.372] debug3: dependency=(null) account=(null)
qos=(null) comment=(null)
[2022-12-13T09:05:01.372] debug3: mail_type=0 mail_user=(null) nice=0
num_tasks=5 open_mode=0 overcommit=-1 acctg_freq=(null)
[2022-12-13T09:05:01.372] debug3: network=(null) begin=Unknown
cpus_per_task=2 requeue=-1 licenses=(null)
[2022-12-13T09:05:01.372] debug3: end_time= signal=0@0
wait_all_nodes=-1 cpu_freq=
[2022-12-13T09:05:01.372] debug3: ntasks_per_node=1
ntasks_per_socket=-1 ntasks_per_core=-1 ntasks_per_tres=-1
[2022-12-13T09:05:01.372] debug3: mem_bind=0:(null) plane_size:65534
[2022-12-13T09:05:01.372] debug3: array_inx=(null)
[2022-12-13T09:05:01.372] debug3: burst_buffer=(null)
[2022-12-13T09:05:01.372] debug3: mcs_label=(null)
[2022-12-13T09:05:01.372] debug3: deadline=Unknown
[2022-12-13T09:05:01.372] debug3: bitflags=0x1a00c000
delay_boot=4294967294
[2022-12-13T09:05:01.372] debug3: job_submit/lua:
slurm_lua_loadscript: skipping loading Lua script:
/etc/slurm/job_submit.lua
[2022-12-13T09:05:01.372] lua: Setting reqswitch to 1.
[2022-12-13T09:05:01.372] lua: returning.
[2022-12-13T09:05:01.372] debug2: _part_access_check: Job requested
too many CPUs (10) of partition hpc(5)
[2022-12-13T09:05:01.373] debug2: _part_access_check: Job requested
too many CPUs (10) of partition hpc(5)
[2022-12-13T09:05:01.373] debug2: JobId=1 can't run in partition hpc:
More processors requested than permitted
The job will run fine if I use the settings below (across 5 nodes, but
only using one of the two CPUs on each node):
## Number of nodes
#SBATCH --nodes=5
#
## Number of tasks per node
#SBATCH --ntasks-per-node=1
#
## Number of CPUs per task
#SBATCH --cpus-per-task=1
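(That is 5 nodes x 1 task x 1 CPU = 5 CPUs, which matches the 5 CPUs
the log above says the hpc partition has.)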
Here are the details of the successfully submitted job, showing it
using 5 CPUs (one CPU per node) across 5 nodes:
[azccadmin@ricslurm-scheduler LID_CAVITY]$ scontrol show job 3
JobId=3 JobName=run-grma
UserId=azccadmin(20001) GroupId=azccadmin(20001) MCS_label=N/A
Priority=4294901757 Nice=0 Account=(null) QOS=(null)
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:07:35 TimeLimit=00:15:00 TimeMin=N/A
SubmitTime=2022-12-12T17:32:01 EligibleTime=2022-12-12T17:32:01
AccrueTime=2022-12-12T17:32:01
StartTime=2022-12-12T17:42:46 EndTime=2022-12-12T17:57:46 Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-12-12T17:32:01
Scheduler=Main
Partition=hpc AllocNode:Sid=ricslurm-scheduler:11723
ReqNodeList=(null) ExcNodeList=(null)
NodeList=ricslurm-hpc-pg0-[1-5]
BatchHost=ricslurm-hpc-pg0-1
NumNodes=5 NumCPUs=5 NumTasks=5 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=5,mem=15G,node=5,billing=5
Socks/Node=* NtasksPerN:B:S:C=1:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryCPU=3G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/shared/data/LID_CAVITY/slurm-runit.sh
WorkDir=/shared/data/LID_CAVITY
StdErr=/shared/data/LID_CAVITY/run-grma.err
StdIn=/dev/null
StdOut=/shared/data/LID_CAVITY/run-grma.out
Switches=1@00:00:24
Power=
What am I doing wrong here - how do I get the job to run on both CPUs
on all 5 nodes (i.e. fully utilising the available cluster resources of
10 CPUs)?
Regards
Gary