Quick follow-up. I see that Sockets is 1 for the head node while it is 32 for the compute nodes, and I think that is why Slurm only sees one CPU there (CPUTot=1).
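For reference, I understand that running "slurmd -C" on a node prints the hardware topology that slurmd itself detects, already in slurm.conf format, so it could be compared against what is configured. The output below is only illustrative (I have not pasted the real output from rocks7; the socket/core split is just a guess):

    [root@rocks7 ~]# slurmd -C
    NodeName=rocks7 CPUs=20 Boards=1 SocketsPerBoard=2 CoresPerSocket=10 ThreadsPerCore=1 RealMemory=64261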
May I ask what is the difference between CPUs and Sockets in slurm.conf?

Regards,
Mahmood


On Sat, May 5, 2018 at 9:24 PM, Mahmood Naderan <mahmood...@gmail.com> wrote:
> Hi,
> I also have the same problem. I think by default, Slurm won't add the
> head node as a compute node. I manually set the state to resume;
> however, the number of cores is still low (1) and not what I specified
> in slurm.conf.
>
>
> [root@rocks7 mahmood]# scontrol show node rocks7
> NodeName=rocks7 Arch=x86_64 CoresPerSocket=1
>    CPUAlloc=0 CPUErr=0 CPUTot=1 CPULoad=0.14
>    AvailableFeatures=(null)
>    ActiveFeatures=(null)
>    Gres=(null)
>    NodeAddr=10.1.1.1 NodeHostName=rocks7 Version=17.11
>    OS=Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017
>    RealMemory=64261 AllocMem=0 FreeMem=1247 Sockets=1 Boards=1
>    State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=281775 Weight=1 Owner=N/A
>    MCS_label=N/A
>    Partitions=WHEEL,EMERALD
>    BootTime=2018-04-13T13:04:59 SlurmdStartTime=2018-04-13T13:05:17
>    CfgTRES=cpu=1,mem=64261M,billing=1
>    AllocTRES=
>    CapWatts=n/a
>    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>    Reason=Low socket*core*thread count, Low CPUs [root@2018-05-05T21:18:05]
>
> [root@rocks7 mahmood]# scontrol update node=rocks7 state=resume
> [root@rocks7 mahmood]# scontrol show node rocks7
> NodeName=rocks7 Arch=x86_64 CoresPerSocket=1
>    CPUAlloc=0 CPUErr=0 CPUTot=1 CPULoad=0.14
>    AvailableFeatures=(null)
>    ActiveFeatures=(null)
>    Gres=(null)
>    NodeAddr=10.1.1.1 NodeHostName=rocks7 Version=17.11
>    OS=Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017
>    RealMemory=64261 AllocMem=0 FreeMem=1247 Sockets=1 Boards=1
>    State=IDLE ThreadsPerCore=1 TmpDisk=281775 Weight=1 Owner=N/A MCS_label=N/A
>    Partitions=WHEEL,EMERALD
>    BootTime=2018-04-13T13:04:59 SlurmdStartTime=2018-04-13T13:05:17
>    CfgTRES=cpu=1,mem=64261M,billing=1
>    AllocTRES=
>    CapWatts=n/a
>    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
> [root@rocks7 mahmood]# grep -A 3 -B 3 rocks7 /etc/slurm/slurm.conf
> DebugFlags=Priority,NO_CONF_HASH,backfill,BackfillMap
>
> NodeName=DEFAULT State=UNKNOWN
> NodeName=rocks7 NodeAddr=10.1.1.1 CPUs=20
> PartitionName=DEFAULT AllocNodes=rocks7 State=UP
> PartitionName=DEBUG
>
> ####### Power Save Begin ##################
>
>
> Regards,
> Mahmood
>
>
> On Sat, May 5, 2018 at 5:06 PM, Chris Samuel <ch...@csamuel.org> wrote:
>> On Thursday, 3 May 2018 10:28:46 AM AEST Matt Hohmeister wrote:
>>
>>> …and it looks good, except for the drain on my server/compute node:
>>
>> I think if you've had the config wrong at some point in the past then slurmctld
>> will remember the error and you'll need to manually clear it with:
>>
>>   scontrol update node=${NODE} state=resume
>>
>> All the best,
>> Chris
>> --
>> Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
>>
>>
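P.S. In case it clarifies what I am asking: my guess is that the head node's entry in slurm.conf would need the full topology spelled out rather than only CPUs=20, along these lines. The 2 sockets x 10 cores split is only my assumption about the hardware, not something I have confirmed on rocks7:

    NodeName=rocks7 NodeAddr=10.1.1.1 Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 RealMemory=64261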