Dear all,
 
I keep getting the error message below and can't figure out why. Has anybody had a similar experience? Thanks!
 
aborting job:
Fatal error in MPI_Barrier: Other MPI error, error stack:
MPI_Barrier(406): MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier(76):
MPIC_Sendrecv(161):
MPIC_Wait(321):
MPIDI_CH3_Progress_wait(209): an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(489):
connection_recv_fail(1836):
MPIDU_Socki_handle_read(658): connection failure (set=0,sock=1,errno=104:Connection reset by peer)
rank 9 in job 1  cn117_42770   caused collective abort of all ranks
  exit status of rank 9: killed by signal 9
rank 7 in job 1  cn117_42770   caused collective abort of all ranks
  exit status of rank 7: killed by signal 9
rank 10 in job 1  cn117_42770   caused collective abort of all ranks
  exit status of rank 10: return code 13
rank 11 in job 1  cn117_42770   caused collective abort of all ranks
  exit status of rank 11: killed by signal 9
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
