Dear all,
I keep getting the error message below and couldn't find
out why. Has anybody had a similar experience? Thanks!
aborting job:
Fatal error in MPI_Barrier: Other MPI error, error stack:
MPI_Barrier(406): MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier(76):
MPIC_Sendrecv(161):
MPIC_Wait(321):
MPIDI_CH3_Progress_wait(209): an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(489):
connection_recv_fail(1836):
MPIDU_Socki_handle_read(658): connection failure (set=0,sock=1,errno=104:Connection reset by peer)
rank 9 in job 1 cn117_42770 caused collective abort of all ranks
  exit status of rank 9: killed by signal 9
rank 7 in job 1 cn117_42770 caused collective abort of all ranks
  exit status of rank 7: killed by signal 9
rank 10 in job 1 cn117_42770 caused collective abort of all ranks
  exit status of rank 10: return code 13
rank 11 in job 1 cn117_42770 caused collective abort of all ranks
  exit status of rank 11: killed by signal 9
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf