syscall to
return. Sometimes swap death is the culprit, but usually not at the scale that
you stated. Maybe you could try reproducing the issue manually or putting
something in epilog to see the state of the processes in the job's cgroup.
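
As a rough sketch of that kind of epilog check (not something taken from a real site config), the script below reads the PIDs listed in a cgroup's procs file and prints the kernel state of each one from /proc/<pid>/status. The function names are made up for illustration, and the path to the job's cgroup.procs file is an assumption that you would need to adjust for your cgroup layout, so it is passed in as an argument rather than hard-coded.

```python
import os
import sys


def parse_state(status_text):
    """Return the value of the 'State:' line from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            return line.split(":", 1)[1].strip()
    return "unknown"


def cgroup_process_states(procs_file, proc_root="/proc"):
    """Map each PID listed in a cgroup procs file to its kernel state."""
    with open(procs_file) as fh:
        pids = [line.strip() for line in fh if line.strip()]
    states = {}
    for pid in pids:
        status_path = os.path.join(proc_root, pid, "status")
        try:
            with open(status_path) as fh:
                states[int(pid)] = parse_state(fh.read())
        except OSError:
            # The process exited between listing and inspection.
            states[int(pid)] = "gone"
    return states


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: the first argument is the path to the job cgroup's
    # cgroup.procs file (the exact location is site-specific).
    for pid, state in sorted(cgroup_process_states(sys.argv[1]).items()):
        print(f"{pid}\t{state}")
```

Any process reported with a state beginning with "D" is in uninterruptible sleep, which means SIGKILL will not take effect until the blocking syscall returns.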
Ryan
On 7/22/20 10:24 AM, Ivan Kovanda wrote:
Dear slurm community,
Currently running slurm version 18.08.4
We have been experiencing an issue where any node that a slurm job was
submitted to ends up in the "drain" state.
From what I've seen, it appears that there is a problem with how slurm is
cleaning up the job with the SIGKILL process.
I've found this