from:"Ivan Kovanda"

Re: [slurm-users] Nodes going into drain because of "Kill task failed"

2020-07-23 Thread Ivan Kovanda

syscall to return. Sometimes swap death is the culprit, but usually not at the scale that you stated. Maybe you could try reproducing the issue manually, or putting something in the epilog to see the state of the processes in the job's cgroup. Ryan On 7/22/20 10:24 AM, Ivan Kovanda wrote: Dear
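As a rough illustration of the epilog idea above, here is a minimal sketch of a helper that lists each process still in a job's cgroup together with its kernel state. It is not from the thread. The cgroup path (cgroup v1 freezer hierarchy under /sys/fs/cgroup/freezer/slurm) is an assumption that depends on the site's cgroup configuration, and the function name `process_states` is hypothetical. `SLURM_JOB_ID` and `SLURM_JOB_UID` are the environment variables Slurm sets for the epilog.

```python
import os
from pathlib import Path


def process_states(cgroup_procs, proc_root="/proc"):
    """Return (pid, name, state) for every PID listed in a cgroup.procs file."""
    results = []
    try:
        pids = Path(cgroup_procs).read_text().split()
    except OSError:
        return results  # cgroup already removed or not readable
    for pid in pids:
        fields = {}
        try:
            status = (Path(proc_root) / pid / "status").read_text()
        except OSError:
            continue  # process exited between the two reads
        for line in status.splitlines():
            key, _, value = line.partition(":")
            fields[key] = value.strip()
        results.append((int(pid), fields.get("Name", "?"), fields.get("State", "?")))
    return results


if __name__ == "__main__":
    job_id = os.environ.get("SLURM_JOB_ID", "")
    uid = os.environ.get("SLURM_JOB_UID", "")
    # ASSUMPTION: cgroup v1 freezer layout; adjust for the local cgroup setup.
    base = Path("/sys/fs/cgroup/freezer/slurm") / f"uid_{uid}" / f"job_{job_id}"
    if base.is_dir():
        for procs_file in base.rglob("cgroup.procs"):
            for pid, name, state in process_states(procs_file):
                print(f"job={job_id} pid={pid} name={name} state={state}")
```

A process reported in state "D (disk sleep)" is in uninterruptible sleep and will not act on SIGKILL until its syscall returns, which would fit the symptom described here.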

[slurm-users] Nodes going into drain because of "Kill task failed"

2020-07-22 Thread Ivan Kovanda

Dear slurm community, We are currently running slurm version 18.08.4. We have been experiencing an issue that causes any node a slurm job was submitted to to go into the "drain" state. From what I've seen, it appears that there is a problem with how slurm is cleaning up the job with the SIGKILL process. I've found this