Looking through the slurm.conf docs and grepping around the source code,
it looks like MinJobAge might be what I need to adjust. I raised it
by 3 orders of magnitude, 300 -> 300_000, on our dev cluster. I'll see
how things go.
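
For reference, the change amounts to roughly the fragment below. This is
only a sketch: it assumes the value is in seconds, as the slurm.conf man
page describes, and that the setting lives in the main slurm.conf rather
than an included file. Whether slurmctld picks it up on "scontrol
reconfigure" or needs a restart is something I have not verified.

```
# slurm.conf (fragment)
# MinJobAge: minimum age, in seconds, of a completed job before its
# record is purged from slurmctld's memory. The documented default is 300.
MinJobAge=300000
```

Running "scontrol show config | grep MinJobAge" afterwards should show
which value the controller is actually using.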
On Wed, Dec 19, 2018 at 1:14 PM Eli V wrote:
Does slurm remove job completion info from its memory after a while?
That might explain why I'm seeing jobs getting cancelled when the
predecessor step they depend on finished ok. Below is the output of
egrep '352209(1|2)_11' on slurmctld.log. The 3522092 job array was created
with -d aftercorr:3522091. Looks li