Re: [slurm-users] help with canceling or deleteing a job

2023-09-20 Thread Feng Zhang
👍

Best,
Feng

On Wed, Sep 20, 2023 at 7:29 AM Wagner, Marcus wrote:
> Even after rebooting, sometimes nodes are stuck because of "completing jobs".
>
> What helps then is to set the node down and resume it afterwards:
>
> scontrol update nodename= state=drain reason=stuck; scontrol update

Re: [slurm-users] help with canceling or deleteing a job

2023-09-20 Thread Wagner, Marcus
Even after rebooting, sometimes nodes are stuck because of "completing jobs".

What helps then is to set the node down and resume it afterwards:

scontrol update nodename= state=drain reason=stuck; scontrol update nodename= state=resume

Best
Marcus

On 20.09.2023 at 09:11, Ole Holm Nie
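The drain-and-resume sequence from Marcus's message can be wrapped in a small shell helper. This is only a minimal sketch under stated assumptions: the function name `cycle_node` and the `DRY_RUN` variable are hypothetical and not part of Slurm, and the node name must be supplied by the caller because the message leaves `nodename=` blank.

```shell
#!/bin/sh
# Sketch: drain a stuck node and then resume it, using the two
# scontrol commands quoted in the message.
# cycle_node and DRY_RUN are hypothetical names, not part of Slurm.
cycle_node() {
    node="$1"
    if [ -z "$node" ]; then
        echo "usage: cycle_node <nodename>" >&2
        return 1
    fi
    # With DRY_RUN=1 the commands are printed instead of executed.
    run=""
    [ "${DRY_RUN:-0}" = "1" ] && run="echo"
    # Resume only if the drain step succeeded.
    $run scontrol update nodename="$node" state=drain reason=stuck &&
    $run scontrol update nodename="$node" state=resume
}
```

Calling it as `DRY_RUN=1 cycle_node awn-047` (the node from Felix's job listing) prints the two commands without touching the cluster, which is a cheap way to check the node name before running it for real.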

Re: [slurm-users] help with canceling or deleteing a job

2023-09-20 Thread Ole Holm Nielsen
On 9/20/23 01:39, Feng Zhang wrote:
> Restarting the slurmd daemon of the compute node should work, if the node is still online and normal.

Probably not. If the filesystem used by the job is hung, the node will probably have to be rebooted, and the filesystem must be checked.

/Ole

On Tue, Sep 19, 2

Re: [slurm-users] help with canceling or deleteing a job

2023-09-19 Thread Feng Zhang
Restarting the slurmd daemon of the compute node should work, if the node is still online and normal.

Best,
Feng

On Tue, Sep 19, 2023 at 8:03 AM Felix wrote:
>
> Hello
>
> I have a job on my system which is running more than its time, more than 4 days.
>
> 1808851 debug gridjob atlas01

Re: [slurm-users] help with canceling or deleteing a job

2023-09-19 Thread Ole Holm Nielsen
On 9/19/23 13:59, Felix wrote:
> Hello
>
> I have a job on my system which is running more than its time, more than 4 days.
>
> 1808851 debug  gridjob  atlas01 CG 4-00:00:19  1 awn-047

The job has state "CG" which means "Completing". The Completing status is explained in "man sinfo". T
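To spot jobs like Felix's that are stuck in the "CG" state, the squeue listing can be filtered on its state column. The helper below is a rough sketch, not something from the thread: `completing_jobs` is a hypothetical name, and it assumes squeue's default column order (JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)) and job names without embedded spaces.

```shell
#!/bin/sh
# Sketch: read default-format squeue output on stdin and print the
# job ID and node list of every job whose state column is "CG".
# completing_jobs is a hypothetical helper, not a Slurm command.
completing_jobs() {
    awk '$5 == "CG" { print $1, $8 }'
}
```

On a live system it would be used as `squeue | completing_jobs`. The node names it prints are the candidates for the reboot or drain-and-resume steps discussed in the replies.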