Re: [slurm-users] slurmctld up and running but not really working

2022-07-20 Thread Julien Rey
Actually, I was able to fix the problem by starting slurmctld with the -c option and then clearing the runaway jobs with sacctmgr. Thanks for your help. J. On 20/07/2022 at 17:06, Julien Rey wrote: Hello, Unfortunately, sacctmgr show runawayjobs is returning the following error: sacctm

Re: [slurm-users] slurmctld up and running but not really working

2022-07-20 Thread Julien Rey
Hello, Unfortunately, sacctmgr show runawayjobs is returning the following error: sacctmgr: error: Slurmctld running on cluster cluster is not up, can't check running jobs J. On 20/07/2022 at 14:45, Ole Holm Nielsen wrote: Hi Julien, You could make a database dump of the current da

Re: [slurm-users] slurmctld up and running but not really working

2022-07-20 Thread Ole Holm Nielsen
Hi Julien, You could make a database dump of the current database so that you can load it on another server outside the cluster, while you reinitialize Slurm with a fresh database. So the database thinks that you have 253 running jobs? I guess that slurmctld is not working, otherwise you co

Re: [slurm-users] slurmctld up and running but not really working

2022-07-20 Thread Julien Rey
Hello, Thanks for your quick reply. I don't mind losing job information, but I certainly don't want to clear the slurm database altogether. The /var/lib/slurm-llnl/slurmctld/node_state and /var/lib/slurm-llnl/slurmctld/node_state.old files do indeed look empty. I then entered the followin

Re: [slurm-users] slurmctld up and running but not really working

2022-07-19 Thread Ole Holm Nielsen
Hi Julien, Apparently your slurmdbd is quite happy, but it seems that your slurmctld StateSaveLocation has been corrupted: [2022-07-19T15:17:58.356] error: Node state file /var/lib/slurm-llnl/slurmctld/node_state too small [2022-07-19T15:17:58.356] error: NOTE: Trying backup state save file.

[slurm-users] slurmctld up and running but not really working

2022-07-19 Thread Julien Rey
Hello, I am currently facing an issue with an old install of slurm (17.02.11). However, I cannot upgrade this version because I had trouble with database migration in the past (when upgrading to 17.11) and this install is set to be replaced in the coming months. For the time being I ha