Actually, I was able to fix the problem by starting slurmctld with the
-c option and then clearing the runaway jobs with sacctmgr.
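In essence the sequence was the following (from memory, so minor
details may differ on your setup):

    # cold-start slurmctld, discarding the corrupted saved state
    slurmctld -c

    # list the runaway jobs; sacctmgr then offers to fix them
    sacctmgr show runawayjobs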
Thanks for your help.
J.
On 20/07/2022 at 17:06, Julien Rey wrote:
Hello,
Unfortunately, the sacctmgr show runawayjobs command is returning the
following error:
sacctmgr: error: Slurmctld running on cluster cluster is not up, can't
check running jobs
J.
On 20/07/2022 at 14:45, Ole Holm Nielsen wrote:
Hi Julien,
You could make a database dump of the current database so that you can
load it on another server outside the cluster, while you reinitialize
Slurm with a fresh database.
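For example, something along these lines (assuming MySQL/MariaDB and
the default slurm_acct_db database name from slurmdbd.conf):

    # on the current database host: dump the accounting database
    mysqldump --single-transaction -u root -p slurm_acct_db > slurm_acct_db.sql

    # on the other server: create an empty database and load the dump
    mysql -u root -p -e 'CREATE DATABASE slurm_acct_db'
    mysql -u root -p slurm_acct_db < slurm_acct_db.sql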
So the database thinks that you have 253 running jobs? I guess that
slurmctld is not working, otherwise you co…
Hello,
Thanks for your quick reply.
I don't mind losing job information, but I certainly don't want to
clear the Slurm database altogether.
The /var/lib/slurm-llnl/slurmctld/node_state and
/var/lib/slurm-llnl/slurmctld/node_state.old files do indeed look
empty. I then entered the followin…
Hi Julien,
Apparently your slurmdbd is quite happy, but it seems that your
slurmctld StateSaveLocation has been corrupted:
[2022-07-19T15:17:58.356] error: Node state file
/var/lib/slurm-llnl/slurmctld/node_state too small
[2022-07-19T15:17:58.356] error: NOTE: Trying backup state save file.
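You can check the state files directly, e.g. (path taken from the log
above):

    ls -l /var/lib/slurm-llnl/slurmctld/node_state*

If they are empty or truncated, slurmctld cannot restore the saved
node state.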
Hello,
I am currently facing an issue with an old install of slurm (17.02.11).
However, I cannot upgrade this version because I had trouble with the
database migration in the past (when upgrading to 17.11), and this
install is set to be replaced in the coming months. For the time
being I ha…