[slurm-users] Jobs of a user are stuck in Completing stage for a long time and cannot cancel them
We are running a Slurm cluster, version `slurm 22.05.8`. One of our users has reported that their jobs have been stuck in the Completing state for a long time. Following the Slurm Workload Manager - Slurm Troubleshooting Guide, we found that the BatchHost for the job had indeed been removed from the cluster, perhaps without being drained first. How do we cancel or delete these jobs?
* We tried scancel on both the batch and the individual job IDs, as the user and as SlurmUser, without success.
-- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
[slurm-users] Re: Jobs of a user are stuck in Completing stage for a long time and cannot cancel them
Could you give more details about this and how you debugged it?
[slurm-users] Re: Jobs of a user are stuck in Completing stage for a long time and cannot cancel them
In our case, that node has been removed from the cluster and cannot be added back right now (it is being used for other work). What can we do in such a case?