Steven,
Looks like you may have had a secondary controller that took over and
changed your StateSave files.
IF you don't need the job info AND no jobs are running, you can just
rename or delete your StateSaveLocation directory and it will be
recreated. Job numbers will start over unless you set FirstJobId, which
you should do if you want to keep your sacct data.
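
A minimal sketch of the "rename" option, in case it helps. The helper name
move_state_aside is made up for illustration, and the example path is an
assumption (use whatever StateSaveLocation is set to in your slurm.conf).
It assumes slurmctld is already stopped on every controller. Renaming
rather than deleting leaves you a copy to go back to.

```shell
# Hypothetical helper: move a StateSaveLocation directory aside instead of
# deleting it. Prints the name of the backup directory on success.
move_state_aside() {
    state_dir="$1"
    # Strip any trailing slash and append a timestamped suffix.
    backup_dir="${state_dir%/}.bak.$(date +%Y%m%d%H%M%S)"
    if [ ! -d "$state_dir" ]; then
        echo "no such directory: $state_dir" >&2
        return 1
    fi
    mv -- "$state_dir" "$backup_dir" || return 1
    echo "$backup_dir"
}

# Example use (the path is an assumption, not taken from your config):
#   move_state_aside /var/spool/slurmctld
```

If you go this way, set FirstJobId in slurm.conf before restarting so new
job numbers do not start over.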
It also looks like slurmctld lacks permission on its log directory (the
chdir(/var/log) error). Change SlurmctldLogFile to something like
/var/log/slurm/slurmctld.log and set the owner of /var/log/slurm to the
slurm user.
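
Roughly, the log directory change could look like the sketch below. The
helper name prepare_log_dir and the 755 mode are my own choices for
illustration, and "slurm" as the owner is an assumption; use whatever
SlurmUser is in your slurm.conf. It needs to run as root on each
controller host.

```shell
# Hypothetical helper: create a log directory and hand it to the given owner.
prepare_log_dir() {
    log_dir="$1"
    owner="$2"
    mkdir -p -- "$log_dir" &&
        chown -- "$owner" "$log_dir" &&
        chmod 755 -- "$log_dir"
}

# Example use as root (owner name is an assumption):
#   prepare_log_dir /var/log/slurm slurm
# Then in slurm.conf:
#   SlurmctldLogFile=/var/log/slurm/slurmctld.log
```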
Ensure all slurmctld daemons are down, then start the first. Once it is
up (you can check with 'scontrol show config'), start the second. Run
'scontrol show config' again and you should see both daemons listed as
'up' at the end of the output.
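
As a sketch of that sequence: the small function below counts how many
controllers the output reports as up. The function name is made up, and
the pattern assumes the status lines at the end of the output look like
"Slurmctld(primary) at <host> is UP"; adjust it if your version prints
something different.

```shell
# Count controllers reported as UP in 'scontrol show config' output
# read from stdin. The line format matched here is an assumption.
count_controllers_up() {
    grep -c 'Slurmctld(.*) at .* is UP'
}

# Sequence from the steps above:
#   systemctl stop slurmctld       # on every controller
#   systemctl start slurmctld      # on the first controller only
#   scontrol show config | count_controllers_up
#   systemctl start slurmctld      # on the second controller
#   scontrol show config | count_controllers_up
```

Note that grep -c exits non-zero when nothing matches, so a count of 0
also shows up as a failed command if you script around it.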
-Brian Andrus
On 2/3/2025 7:29 PM, Steven Jones via slurm-users wrote:
From the logs, 2 errors:
8><---
Feb 04 03:08:48 vuwunicoslurmd1.ods.vuw.ac.nz systemd[1]: Starting
Slurm controller daemon...
Feb 04 03:08:48 vuwunicoslurmd1.ods.vuw.ac.nz slurmctld[1045020]:
slurmctld: error: chdir(/var/log): Permission denied
Feb 04 03:08:48 vuwunicoslurmd1.ods.vuw.ac.nz slurmctld[1045020]:
slurmctld: slurmctld version 24.11.1 started on cluster poc-cluster(2175)
Feb 04 03:08:48 vuwunicoslurmd1.ods.vuw.ac.nz systemd[1]: Started
Slurm controller daemon.
Feb 04 03:08:48 vuwunicoslurmd1.ods.vuw.ac.nz slurmctld[1045020]:
slurmctld: fatal: Can not recover assoc_usage state, incompatible
version, got 9728 need >= 9984 <= 10752, start with '-i' to ignore
this. Warning: using -i will lose the data that can't be recovered.
Feb 04 03:08:48 vuwunicoslurmd1.ods.vuw.ac.nz systemd[1]:
slurmctld.service: Main process exited, code=exited, status=1/FAILURE
Feb 04 03:08:48 vuwunicoslurmd1.ods.vuw.ac.nz systemd[1]:
slurmctld.service: Failed with result 'exit-code'.
No idea on "slurmctld: error: chdir(/var/log): Permission denied";
I need more info, but the log seems to be written OK, as we can see.
"fatal: Can not recover assoc_usage state, incompatible version,"
This seems to come from my attempt to upgrade from ver22 to ver24, but
Google tells me ver22 "left a mess" and ver24 can't cope. Where would I
go looking to clean up, please?
regards
Steven