Any time a node goes into the DRAIN state you need to manually intervene and put it back into service:

scontrol update nodename=ip-172-31-80-232 state=resume
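For example, a quick sketch assuming the node name from your output below (the sinfo/scontrol checks are only there to confirm the DRAIN flag actually cleared):

  scontrol update NodeName=ip-172-31-80-232 State=RESUME
  sinfo -N -l
  scontrol show node ip-172-31-80-232 | grep -E 'State|Reason'

Once the node reports State=IDLE (without +DRAIN), pending jobs should start.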
On Mon, May 11, 2020 at 11:40 AM Joakim Hove <joakim.h...@gmail.com> wrote:
>
>> You're on the right track with the DRAIN state. The more specific answer
>> is in the "Reason=" description on the last line.
>>
>> It looks like your node has less memory than what you've defined for the
>> node in slurm.conf
>
> Thank you; that sounded meaningful to me. My slurm.conf file had
> RealMemory=983 whereas "slurmd -C" showed "RealMemory=978" - so you are
> right; the actual node had less available memory than what I configured in
> slurm.conf - I guess the reason for the difference is slightly different
> AWS nodes? Anyway, I updated slurm.conf with "RealMemory=512" - i.e. with
> a wide margin less than what the node actually has. After restarting
> slurmctld / slurmd I now get:
>
> ubuntu@ip-172-31-80-232:~/opm-portal/aws$ scontrol show node
> NodeName=ip-172-31-80-232 Arch=x86_64 CoresPerSocket=1
>    CPUAlloc=0 CPUErr=0 CPUTot=1 CPULoad=0.00
>    AvailableFeatures=(null)
>    ActiveFeatures=(null)
>    Gres=(null)
>    NodeAddr=ip-172-31-80-232 NodeHostName=ip-172-31-80-232 Version=17.11
>    OS=Linux 5.3.0-1017-aws #18~18.04.1-Ubuntu SMP Wed Apr 8 15:12:16 UTC 2020
>    RealMemory=512 AllocMem=0 FreeMem=254 Sockets=1 Boards=1
>    State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
>    Partitions=debug
>    BootTime=2020-05-11T17:02:15 SlurmdStartTime=2020-05-11T18:29:30
>    CfgTRES=cpu=1,mem=512M,billing=1
>    AllocTRES=
>    CapWatts=n/a
>    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>    Reason=Low RealMemory [root@2020-05-11T16:20:02]
>
> I.e. Slurm has recognized the new memory setting, but the state is still
> "IDLE+DRAIN" - and no jobs start running :-(
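For reference, a minimal slurm.conf sketch of the node and partition definitions implied by the output above (values illustrative, taken from that output; the key point is keeping RealMemory at or below what "slurmd -C" reports):

  # slurmd -C reported RealMemory=978 on this node; 512 leaves a wide margin
  NodeName=ip-172-31-80-232 CPUs=1 RealMemory=512 State=UNKNOWN
  PartitionName=debug Nodes=ip-172-31-80-232 Default=YES MaxTime=INFINITE State=UP

After editing slurm.conf and restarting slurmctld/slurmd, the node still has to be resumed manually as shown at the top of this message.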