Dear all,
I must say, I'm a bit puzzled, since this configuration should not be
valid. I have observed the same thing myself. According to the manpage
of slurm.conf, CPUs and Boards are mutually exclusive:
*Boards* Number of Baseboards in nodes with a baseboard controller.
Note that when Boards is specified, SocketsPerBoard, CoresPerSocket, and
ThreadsPerCore should be specified. Boards and CPUs are mutually
exclusive. The default value is 1.
Since I needed to tell slurm not to use the hyperthreads, I halved the
CPUs value and omitted Boards. This configuration works for me:
NodeName=linuxbmc[0021-0036] CPUs=12 SocketsPerBoard=2 CoresPerSocket=6
ThreadsPerCore=2 RealMemory=96508 TmpDisk=41035 State=UNKNOWN
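To make the arithmetic behind "halved" explicit, here is a minimal
Python sketch. It only multiplies the values from the node line above;
it is not Slurm's own validation logic, and the helper names
parse_node_line and hardware_threads are made up for illustration.

```python
import re

def parse_node_line(line: str) -> dict:
    """Split a slurm.conf node line into a dict of key=value tokens."""
    return dict(re.findall(r"(\w+)=(\S+)", line))

def hardware_threads(node: dict) -> int:
    """Boards * SocketsPerBoard * CoresPerSocket * ThreadsPerCore.

    Boards and ThreadsPerCore fall back to 1 when omitted
    (assumption: 1 is the documented default for both).
    """
    boards = int(node.get("Boards", 1))
    sockets = int(node["SocketsPerBoard"])
    cores = int(node["CoresPerSocket"])
    threads = int(node.get("ThreadsPerCore", 1))
    return boards * sockets * cores * threads

# The node line from this mail, joined into one logical line.
line = ("NodeName=linuxbmc[0021-0036] CPUs=12 SocketsPerBoard=2 "
        "CoresPerSocket=6 ThreadsPerCore=2 RealMemory=96508 "
        "TmpDisk=41035 State=UNKNOWN")

node = parse_node_line(line)
total_threads = hardware_threads(node)                        # 2 * 6 * 2
physical_cores = total_threads // int(node["ThreadsPerCore"])
cpus = int(node["CPUs"])

print(f"hardware threads: {total_threads}")
print(f"physical cores:   {physical_cores}")
print(f"configured CPUs:  {cpus}")
```

With these values the node has 24 hardware threads on 12 physical
cores, so CPUs=12 corresponds to one schedulable CPU per core.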
Yet I do not know why these options are mutually exclusive, since from
my point of view CPUs is the number of cores/threads I want slurm to
schedule. The other values, Boards, SocketsPerBoard, CoresPerSocket and
ThreadsPerCore, are "architectural" values, which should help in placing
the tasks on the node itself; compare the --hint option of sbatch.
Best
Marcus
On 05/03/2018 02:28 AM, Matt Hohmeister wrote:
I have a two-node cluster: the server/compute node is a Dell PowerEdge
R730; the compute node, a Dell PowerEdge R630. On both of these nodes,
slurmd -C gives me the exact same line:
[me@odin slurm]$ slurmd -C
NodeName=odin CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=10
ThreadsPerCore=2 RealMemory=128655
[me@thor slurm]$ slurmd -C
NodeName=thor CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=10
ThreadsPerCore=2 RealMemory=128655
So I edited my slurm.conf appropriately:
NodeName=odin CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=10
ThreadsPerCore=2 RealMemory=128655
NodeName=thor CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=10
ThreadsPerCore=2 RealMemory=128655
…and it looks good, except for the drain on my server/compute node:
[me@odin slurm]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug* up infinite 1 drain odin
debug* up infinite 1 idle thor
…for the following reason:
[me@odin slurm]$ sinfo -R
REASON USER TIMESTAMP NODELIST
Low socket*core*thre slurm 2018-05-02T11:55:38 odin
Any ideas?
Thanks!
Matt Hohmeister
Systems and Network Administrator
Department of Psychology
Florida State University
PO Box 3064301
Tallahassee, FL 32306-4301
Phone: +1 850 645 1902
Fax: +1 850 644 7739
--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wag...@itc.rwth-aachen.de
www.itc.rwth-aachen.de