[slurm-users] Re: slurmctld keeps segfaulting, possibly during or just after backfill

2024-11-01 Thread Marcus Lauer via slurm-users
Following up on this, it looks like slurmctld crashes reliably just after the completion of a job that was submitted to multiple partitions. Has anyone encountered this sort of thing before? Here is a simplified version of our cluster's partitions: Nodes PartitionPriority nod

[slurm-users] slurmctld keeps segfaulting, possibly during or just after backfill

2024-10-02 Thread Marcus Lauer via slurm-users
We are running into a problem where slurmctld segfaults a few times a day. We had this problem with SLURM 23.11.8 and now have it with 23.11.10 as well. It only appears on one of our several SLURM clusters, even though each of them runs one of those two versions. I was wonde