On 10/24/20 9:22 am, Kimera Rodgers wrote:
> [root@kla-ac-ohpc-01 critical]# srun -c 8 --pty bash -i
> srun: error: slurm_receive_msgs: Socket timed out on send/recv operation
> srun: error: Task launch for 37.0 failed on node c-node3: Socket timed out on send/recv operation
> srun: error: Application launch failed: Socket timed out on send/recv operation
> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
To me this looks like a networking issue, perhaps firewall/iptables rules
blocking connections between the submit host and the compute node.
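If it helps narrow things down, here is a minimal Python sketch (a plain TCP check, not a Slurm tool) that tests whether a port on a given host accepts connections. It assumes slurmd is listening on the default SlurmdPort of 6818; check slurm.conf if yours differs. The function name port_reachable is hypothetical.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds.

    A False result means the connection was refused, timed out, or the
    hostname could not be resolved (all raise subclasses of OSError).
    """
    try:
        # create_connection resolves the host and opens a TCP socket;
        # the context manager closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running port_reachable("c-node3", 6818) from the submit host, and a similar check from c-node3 back to the submit host, would show whether traffic is blocked in either direction. Bear in mind that srun itself listens on ephemeral ports (restricted by SrunPortRange if that is set), so the compute node has to be able to connect back to it as well.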
Best of luck,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA