Hi:

If you mean "why are the nodes still drained, now that I fixed
slurm.conf and restarted (never mind whether the RealMemory parameter is
correct)?": the drain is not cleared automatically once its cause is fixed.
Try 'scontrol update nodename=str957-bl0-0[1-2] State=RESUME'.
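
A minimal sketch of doing that only for nodes that are actually drained for
this reason. The helper name needs_resume is made up for illustration, and the
parsing assumes the 'scontrol show node' layout quoted below; the loop is
skipped on machines without scontrol.

```shell
#!/bin/sh
# needs_resume: hypothetical helper, not part of Slurm.
# Reads `scontrol show node <name>` output on stdin and succeeds only if
# the State field contains DRAIN and the Reason starts with "Low RealMemory".
needs_resume() {
    awk '
        { for (i = 1; i <= NF; i++) if ($i ~ /^State=/) state = $i }
        /^[ \t]*Reason=Low RealMemory/ { low = 1 }
        END { exit !(state ~ /DRAIN/ && low) }
    '
}

# Resume the two nodes from this thread, one at a time.
if command -v scontrol >/dev/null 2>&1; then
    for node in str957-bl0-01 str957-bl0-02; do
        if scontrol show node "$node" | needs_resume; then
            scontrol update NodeName="$node" State=RESUME
        fi
    done
fi
```

Nodes drained for any other reason are left alone, so an unrelated drain set
by an admin is not cleared by accident.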

-- 
Paul Brunk, system administrator
Georgia Advanced Computing Resource Center
Enterprise IT Svcs, the University of Georgia

-----Original Message-----
From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Diego Zuccato
Sent: Friday, October 1, 2021 04:23
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: [slurm-users] "Low RealMem" after upgrade

Hello all.

I just upgraded to Debian 11, which brings Slurm 21.08, and the newer nodes 
upgraded without too many issues (just minor config changes, one being the 
RealMemory value in slurm.conf, since for some reason the new slurmd seems to 
detect about 12MB less memory than before).

But the older nodes are still marked IDLE+DRAIN:
-8<--
NodeName=str957-bl0-01 Arch=x86_64 CoresPerSocket=6
    CPUAlloc=0 CPUTot=24 CPULoad=0.39
    AvailableFeatures=ib,blade,intel,avx
    ActiveFeatures=ib,blade,intel,avx
    Gres=(null)
    NodeAddr=str957-bl0-01 NodeHostName=str957-bl0-01 Version=20.11.4
    OS=Linux 5.10.0-8-amd64 #1 SMP Debian 5.10.46-5 (2021-09-23)
    RealMemory=64000 AllocMem=0 FreeMem=63518 Sockets=2 Boards=1
    MemSpecLimit=2048
    State=IDLE+DRAIN ThreadsPerCore=2 TmpDisk=0 Weight=2 Owner=N/A MCS_label=N/A
    Partitions=b1
    BootTime=2021-10-01T09:35:42 SlurmdStartTime=2021-10-01T09:36:15
    CfgTRES=cpu=24,mem=62.50G,billing=182
    AllocTRES=
    CapWatts=n/a
    CurrentWatts=0 AveWatts=0
    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
    Reason=Low RealMemory [root@2021-10-01T08:08:18]
    Comment=(null)
-8<--
I already reduced the RealMemory value in slurm.conf and restarted both 
slurmctld and slurmd (in case "scontrol reconfigure" was not enough; that is 
not really clear from the docs).

The relevant lines in slurm.conf are:
-8<--
NodeName=DEFAULT            Sockets=2                  ThreadsPerCore=2
  State=UNKNOWN  MemSpecLimit=2048
NodeName=str957-bl0-0[1-2]            CoresPerSocket=6
  RealMemory=64000  Weight=2 Feature=ib,blade,intel,avx
-8<--

And the node says:
-8<--
root@str957-bl0-01:~# slurmd -C
NodeName=str957-bl0-01 CPUs=24 Boards=1 SocketsPerBoard=2
CoresPerSocket=6 ThreadsPerCore=2 RealMemory=64378
UpTime=0-00:37:17
-8<--

I also tried lowering the RealMemory setting to 60000, in case MemSpecLimit 
interfered, but the result remained the same.
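
For reference, the comparison between the configured and the detected memory
can be scripted. This is only a sketch: the helper names detected_realmem and
realmem_ok are made up for illustration, and it assumes that "Low RealMemory"
is raised when the node reports less memory than slurm.conf configures.

```shell
#!/bin/sh
# detected_realmem: hypothetical helper, not part of Slurm.
# Reads `slurmd -C` output on stdin and prints the RealMemory value (MB).
detected_realmem() {
    tr ' ' '\n' | sed -n 's/^RealMemory=//p' | head -n 1
}

# realmem_ok: hypothetical helper.
# $1 = RealMemory from slurm.conf, $2 = value reported by slurmd -C.
# Succeeds when the configured value does not exceed the detected one.
realmem_ok() {
    [ "$1" -le "$2" ]
}

# Run on the compute node itself; 64000 is the value from slurm.conf above.
if command -v slurmd >/dev/null 2>&1; then
    detected=$(slurmd -C | detected_realmem)
    if [ -n "$detected" ] && realmem_ok 64000 "$detected"; then
        echo "configured RealMemory fits within detected ${detected} MB"
    else
        echo "configured RealMemory exceeds detected memory, or detection failed"
    fi
fi
```

With the numbers shown above (64000 configured, 64378 detected) the check
passes, so the configured value itself no longer looks like the problem.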

Any ideas?

TIA!

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786

