It appears that 'slurmd -C' is not returning the correct information for some of the systems in my very heterogeneous cluster.

For example, take the node dawson081:

[root@dawson081 ~]# slurmd -C
slurmd: Considering each NUMA node as a socket
NodeName=dawson081 CPUs=32 Boards=1 SocketsPerBoard=4 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=64554
UpTime=2-09:30:47

Since Boards and CPUs are mutually exclusive, I omitted CPUs and added this line to my slurm.conf:

NodeName=dawson[064,066,068-069,071-072,074-079,081,083,085-086,088-099,101-102,105,108-117] Boards=1 SocketsPerBoard=4 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=64554

When I restart Slurm, however, I get the following message in slurmctld.log:

[2019-01-17T14:54:47.788] error: Node dawson081 has high socket,core,thread count (4,8,1 > 2,16,1), extra resources ignored
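
In case it helps with the debugging, I can also dump what the controller has recorded for the node with something like the command below (the grep is just to trim the output to the relevant fields):

scontrol show node dawson081 | grep -E 'Boards|Sockets|Cores|Threads'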

lscpu on that same node shows a different hardware layout:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          4
Vendor ID:             AuthenticAMD
CPU family:            21
Model:                 1
Model name:            AMD Opteron(TM) Processor 6274
Stepping:              2
CPU MHz:               2200.000
BogoMIPS:              4399.39
Virtualization:        AMD-V
L1d cache:             16K
L1i cache:             64K
L2 cache:              2048K
L3 cache:              6144K
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15
NUMA node2 CPU(s):     16-23
NUMA node3 CPU(s):     24-31
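
For comparison, a node definition written to match the lscpu layout (2 physical sockets, 8 cores per socket, 2 threads per core) would look roughly like the line below. This is just a sketch for discussion, not something I have tried in production:

NodeName=dawson081 Boards=1 SocketsPerBoard=2 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=64554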

Both slurmd and slurmctld are version 18.08.4. I built the Slurm RPMs for both at the same time on the same system, so they were linked to the same hwloc. Any ideas why there's a discrepancy? How should I deal with this?

Both the compute node and the Slurm controller are using CentOS 6.10 and have hwloc-1.5-3 installed.

Thanks for the help

--
Prentice

