Look at the slurmd logs on these nodes, or try running slurmd in the foreground (non-daemon mode) so you can watch its output directly.
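For example (only a sketch; the log path is an assumption, use whatever your config reports):

    # stop the service, then run slurmd in the foreground with extra verbosity
    systemctl stop slurmd
    slurmd -D -vvv

    # or locate and follow the slurmd log
    scontrol show config | grep -i SlurmdLogFile
    tail -f /var/log/slurm/slurmd.log    # assumed path; use the one reported above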
And, as I said in another thread, check that the clocks on these nodes are in sync with the controller (a quick sketch of what I mean is below the quoted message).

On Tue, Sep 23, 2025, 11:41 PM Julien Tailleur via slurm-users <[email protected]> wrote:

> On 9/23/25 16:44, Davide DelVento wrote:
> > As the great Ole just taught us in another thread, this should tell
> > you why:
> >
> > sacctmgr show event Format=NodeName,TimeStart,Duration,State%-6,Reason%-40,User where nodes=FX[12-14]
> >
> > However I suspect you'd only get "not responding" again ;-)
>
> Good prediction!
>
> sacctmgr show event Format=NodeName,TimeStart,Duration,State%-6,Reason%-40,User
>        NodeName           TimeStart      Duration State  Reason                                   User
> --------------- ------------------- ------------- ------ ---------------------------------------- ----------
>                 2021-08-25T11:13:56 1490-12:21:12        Cluster Registered TRES
> FX12            2025-09-08T15:04:39   15-08:30:29 DOWN*  Not responding                           slurm(640+
> FX13            2025-09-08T15:04:39   15-08:30:29 DOWN*  Not responding                           slurm(640+
> FX14            2025-09-08T15:04:39   15-08:30:29 DOWN*  Not responding                           slurm(640+
>
> > Are you sure that all the slurm services are running correctly on
> > those servers? Maybe try rebooting them?
>
> The services were all running. "Correctly" is harder to say :-) I did not
> see anything obviously interesting in the logs, but I am not sure what
> to look for.
>
> Anyway, I've followed your advice and rebooted the servers and they are
> idle for now. I will see how long it lasts. If that fixed it, I will
> fall on my sword and apologize for disturbing the ML...
>
> Best,
>
> Julien
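Here is roughly what I mean by checking the time. This is only a sketch: it assumes chrony as the NTP client and clush as the parallel shell, and "controller" stands in for your slurmctld host, so adjust for your setup:

    # compare wall-clock time on the nodes and the controller in one shot
    clush -w FX[12-14],controller date +%s.%N

    # on each node, check sync status and offset (swap in ntpq/timesyncd commands if you don't run chrony)
    timedatectl status
    chronyc tracking

Significant clock skew can break munge authentication between slurmctld and slurmd, which can show up as exactly this kind of persistent "Not responding" state.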
--
slurm-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
