[slurm-users] slurm_state

sblock Fri, 12 Mar 2021 00:48:27 -0800

Hello,

We had an outage of the cluster file system, which also holds the
Slurm StateSaveLocation. Slurm then reported all jobs as orphaned and
set the nodes DOWN because they were not responding.
After the file system came back, users started to submit jobs, but the
old queue was gone.
Shouldn't Slurm use the old slurm_state once the file system is back?
What can we do to prevent losing the queue again in such a situation?
The version is 17.11.5.


Best regards,
 Sebastian
 

-- 
Sebastian Baldauf
HPC-Team


Technische Universität Berlin
Zentraleinrichtung Campusmanagement
Einsteinufer 17, 10587 Berlin
Telefon: +49 (0)30 314-74591
[email protected]
www.campusmanagement.tu-berlin.de
