Andrew,
you could try changing it to the following:
/etc/slurm/slurm.conf:
NodeName=node[1-3] CPUs=40 RealMemory=48000 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 Feature="p4000" Gres=gpu:pascal:8 State=UNKNOWN
NodeName=node[4-5,7-10] CPUs=8 RealMemory=48000 Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 Feature="p1000" Gres=gpu:pascal:4 State=UNKNOWN
NodeName=node[6] CPUs=24 RealMemory=30000 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 Feature="p1000" Gres=gpu:pascal:4 State=UNKNOWN
/etc/slurm/gres.conf
NodeName=node[1-3] Name=gpu Type=pascal File=/dev/nvidia[0-7]
NodeName=node[4-10] Name=gpu Type=pascal File=/dev/nvidia[0-4]
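It is probably worth double-checking that the count at the end of each Gres= entry matches the number of device files listed in gres.conf. The following is only a rough sketch in Python, not a Slurm tool: the helper names expand_count and gres_count are made up for illustration, and the parsing is simplified (it assumes a single bracket range in File= and a plain integer count in Gres=). The values are copied from the lines above.

```python
import re


def expand_count(file_spec: str) -> int:
    """Count the device files named by a File= value such as /dev/nvidia[0-7]."""
    match = re.search(r"\[([^\]]+)\]", file_spec)
    if match is None:
        return 1  # a single device path without a bracket range
    total = 0
    for part in match.group(1).split(","):
        if "-" in part:
            low, high = part.split("-")
            total += int(high) - int(low) + 1  # ranges are inclusive
        else:
            total += 1
    return total


def gres_count(gres_spec: str) -> int:
    """Return the trailing count of a Gres= value such as gpu:pascal:8 or gpu:8."""
    return int(gres_spec.split(":")[-1])


# (Gres= value from slurm.conf, File= value from gres.conf), copied from above
pairs = {
    "node[1-3]": ("gpu:pascal:8", "/dev/nvidia[0-7]"),
    "node[4-10]": ("gpu:pascal:4", "/dev/nvidia[0-4]"),
}

for nodes, (gres, files) in pairs.items():
    declared, devices = gres_count(gres), expand_count(files)
    status = "ok" if declared == devices else "MISMATCH"
    print(f"{nodes}: Gres count {declared}, device files {devices} -> {status}")
```

With the values above, node[1-3] comes out consistent (8 and 8), while node[4-10] does not: /dev/nvidia[0-4] names five device files against a count of 4. Adjust whichever side does not match the hardware actually installed in those nodes.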
Best
Marcus
On 5/15/20 11:48 PM, Speer, Andrew wrote:
I've run into a bit of an issue when trying to define GPUs in our
Slurm config. Any insight is appreciated.
Hopefully relevant lines from the configs below.
Error:
[2020-05-15T16:35:14.862] error: gres_plugin_node_config_unpack: No plugin configured to process GRES data from node node3 (Name:gpu Type:(null) PluginID:7696487 Count:2)
[2020-05-15T16:35:15.321] error: gres_plugin_node_config_unpack: No plugin configured to process GRES data from node node4 (Name:gpu Type:(null) PluginID:7696487 Count:1)
[2020-05-15T16:35:15.738] error: gres_plugin_node_config_unpack: No plugin configured to process GRES data from node node5 (Name:gpu Type:(null) PluginID:7696487 Count:1)
[2020-05-15T16:35:16.229] error: gres_plugin_node_config_unpack: No plugin configured to process GRES data from node node6 (Name:gpu Type:(null) PluginID:7696487 Count:1)
/etc/slurm/slurm.conf:
GresTypes=gpu
NodeName=node[1-3] CPUs=40 RealMemory=48000 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 Feature="pascal,p4000" Gres=gpu:8 State=UNKNOWN
NodeName=node[4-5,7-10] CPUs=8 RealMemory=48000 Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 Feature="pascal,p1000" Gres=gpu:8 State=UNKNOWN
NodeName=node[6] CPUs=24 RealMemory=30000 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 Feature="pascal,p1000" Gres=gpu:8 State=UNKNOWN
/etc/slurm/gres.conf
NodeName=node[1-3] Name=gpu File=/dev/nvidia[0-7]
NodeName=node[4-10] Name=gpu File=/dev/nvidia[0-4]
scontrol show node node1
NodeName=node1 Arch=x86_64 CoresPerSocket=10
CPUAlloc=0 CPUTot=40 CPULoad=1.75
AvailableFeatures=pascal,p4000
ActiveFeatures=pascal,p4000
Gres=(null) <------------------------
NodeAddr=node1 NodeHostName=node1
OS=Linux 3.10.0-1062.9.1.el7.x86_64 #1 SMP Fri Dec 6 15:49:49 UTC 2019
RealMemory=48000 AllocMem=0 FreeMem=57465 Sockets=2 Boards=1
State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
Partitions=pharmacy
BootTime=2020-05-15T09:26:45 SlurmdStartTime=2020-05-15T16:35:13
CfgTRES=cpu=40,mem=48000M,billing=40
AllocTRES=
CapWatts=n/a
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wag...@itc.rwth-aachen.de
www.itc.rwth-aachen.de