Re: [slurm-users] Job with srun is still RUNNING after node reboot

2020-04-01 Thread Yair Yarom
I've checked it now; it isn't listed as a runaway job.

On Tue, Mar 31, 2020 at 5:24 PM David Rhey wrote:
> Hi, Yair,
>
> Out of curiosity have you checked to see if this is a runaway job?
>
> David
>
> On Tue, Mar 31, 2020 at 7:49 AM Yair Yarom wrote:
>
>> Hi,
>>
>> We have an issue where runni

Re: [slurm-users] Job with srun is still RUNNING after node reboot

2020-03-31 Thread David Rhey
Hi, Yair,

Out of curiosity have you checked to see if this is a runaway job?

David

On Tue, Mar 31, 2020 at 7:49 AM Yair Yarom wrote:
> Hi,
>
> We have an issue where running srun (with --pty zsh), and rebooting the
> node (from a different shell), the srun reports:
> srun: error: eio_message_s

[slurm-users] Job with srun is still RUNNING after node reboot

2020-03-31 Thread Yair Yarom
Hi,

We have an issue where, if we run srun (with --pty zsh) and reboot the node from a different shell, srun reports:

srun: error: eio_message_socket_accept: slurm_receive_msg[an.ip.addr.ess]: Zero Bytes were transmitted or received

and hangs. After the node boots, Slurm claims that jo