Following up on this: slurmctld appears to crash reliably just after a
job that was submitted to multiple partitions completes. Has anyone
encountered this before?
Here is a simplified version of our cluster's partitions:
Nodes PartitionPriority
nod
We are running into a problem where slurmctld segfaults a few
times a day. We saw this with SLURM 23.11.8 and still see it with 23.11.10.
The problem appears on only one of our several SLURM clusters, even though
all of them run one of those two versions. I was
wonde