Does Slurm remove job completion info from its memory after a while? That might explain why I'm seeing jobs getting cancelled even though the predecessor step they depend on finished OK. Below is the output of egrep '352209(1|2)_11' on slurmctld.log. The 3522092 job array was created with -d aftercorr:3522091. It looks like the predecessor task finished successfully at 9:21, and the dependent task was split out to run at 9:30, never ran, and was then cancelled at 10:38 because of an unsatisfied job dependency on a task that had already completed over an hour earlier. Is there some setting in slurm.conf that will keep this completion info around longer, or is this just a flat-out bug in slurmctld?
[2018-12-19T08:48:52.632] backfill: Started JobId=3522091_11(3522113) in low on r3-19
[2018-12-19T09:21:22.914] _job_complete: JobId=3522091_11(3522113) WEXITSTATUS 0
[2018-12-19T09:21:22.914] _job_complete: JobId=3522091_11(3522113) done
[2018-12-19T09:30:07.922] build_job_queue: Split out JobId=3522092_11(3522317) for SLURM_DEPEND_AFTER_CORRESPOND use
[2018-12-19T10:38:12.981] _kill_dependent: Job dependency can't be satisfied, cancelling JobId=3522092_11(3522317)
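
To make the timeline concrete, here is a minimal Python sketch that computes the gaps between the events in the log lines above. It assumes the bracketed timestamp format shown in the excerpt; the helper name `elapsed` is made up for illustration and is not part of Slurm.

```python
from datetime import datetime

# Timestamp layout as it appears in the slurmctld.log excerpt above.
FMT = "%Y-%m-%dT%H:%M:%S.%f"

# Timestamps copied from the log lines for 3522091_11 and 3522092_11.
completed = datetime.strptime("2018-12-19T09:21:22.914", FMT)
split_out = datetime.strptime("2018-12-19T09:30:07.922", FMT)
cancelled = datetime.strptime("2018-12-19T10:38:12.981", FMT)


def elapsed(start, end):
    """Hypothetical helper: whole seconds between two log timestamps."""
    return int((end - start).total_seconds())


print("completion -> split out:", elapsed(completed, split_out), "s")
print("completion -> cancel:   ", elapsed(completed, cancelled), "s")
```

So the dependent task was split out roughly 9 minutes after its predecessor completed, and cancelled roughly 77 minutes after.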