Dear all,

I must say I'm a bit puzzled, since this configuration should not be valid; I have observed the same thing myself. According to the slurm.conf manpage, CPUs and Boards are mutually exclusive:

*Boards*    Number of Baseboards in nodes with a baseboard controller. Note that when Boards is specified, SocketsPerBoard, CoresPerSocket, and ThreadsPerCore should be specified. Boards and CPUs are mutually exclusive. The default value is 1.
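As far as I understand it, slurmd derives the CPU count from the topology values, so giving both invites a conflict. With the numbers from your slurmd -C output below:

CPUs = Boards * SocketsPerBoard * CoresPerSocket * ThreadsPerCore
     = 1 * 2 * 10 * 2
     = 40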


Since I needed to tell Slurm not to use the hyperthreads, I halved the CPUs value and omitted Boards. This configuration works for me:

NodeName=linuxbmc[0021-0036] CPUs=12 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=96508 TmpDisk=41035 State=UNKNOWN
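By the way, once a node has been drained for this reason, it stays drained even after the configuration is fixed and the daemons are restarted; if I remember correctly, it has to be resumed by hand:

scontrol update NodeName=odin State=RESUME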

Yet I do not know why these options are mutually exclusive, since from my point of view CPUs is the number of cores/threads I want Slurm to schedule. The other values (Boards, SocketsPerBoard, CoresPerSocket, and ThreadsPerCore) are "architectural" values, which should help with placing the tasks on the node itself; compare --hint of sbatch. A sketch of what I mean follows below.
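For example (untested on my side; job.sh is just a placeholder), one could keep the full topology in slurm.conf and let individual jobs opt out of the hyperthreads at submission time instead:

NodeName=linuxbmc[0021-0036] Boards=1 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=96508 TmpDisk=41035 State=UNKNOWN

sbatch --hint=nomultithread job.sh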


Best
Marcus


On 05/03/2018 02:28 AM, Matt Hohmeister wrote:

I have a two-node cluster: the server/compute node is a Dell PowerEdge R730; the compute node, a Dell PowerEdge R630. On both of these nodes, slurmd -C gives me the exact same line:

[me@odin slurm]$ slurmd -C
NodeName=odin CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=128655

[me@thor slurm]$ slurmd -C
NodeName=thor CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=128655

So I edited my slurm.conf appropriately:

NodeName=odin CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=128655
NodeName=thor CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=128655

…and it looks good, except for the drain on my server/compute node:

[me@odin slurm]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      1  drain odin
debug*       up   infinite      1   idle thor

…for the following reason:

[me@odin slurm]$ sinfo -R
REASON               USER      TIMESTAMP           NODELIST
Low socket*core*thre slurm     2018-05-02T11:55:38 odin

Any ideas?

Thanks!

Matt Hohmeister

Systems and Network Administrator

Department of Psychology

Florida State University

PO Box 3064301

Tallahassee, FL 32306-4301

Phone: +1 850 645 1902

Fax: +1 850 644 7739


--
Marcus Wagner, Dipl.-Inf.

IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wag...@itc.rwth-aachen.de
www.itc.rwth-aachen.de
