Hello, we had an outage of the cluster file system, which also hosts the Slurm StateSaveLocation. During the outage, Slurm reported all jobs as orphaned and then set the nodes to DOWN because they were not responding. After the file system was back, users started to submit jobs, but the old queue was gone. Shouldn't Slurm use the old slurm_state once the file system is back? What can we do to prevent losing the queue again in such a situation? The version is 17.11.5.
Best regards,
Sebastian

--
Sebastian Baldauf
HPC-Team
Technische Universität Berlin
Zentraleinrichtung Campusmanagement
Einsteinufer 17, 10587 Berlin
Phone: +49 (0)30 314-74591
s.bl...@tu-berlin.de
www.campusmanagement.tu-berlin.de