Dear All,
I keep getting this error message and can't work out why. Has anybody had a similar experience?
Fatal error in MPI_Barrier: Other MPI error, error stack:
MPI_Barrier(406): MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier(76):
MPIC_Sendrecv(152):
MPIC_Wait(321):
MPIDI_CH3_Progress_wait(209): an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(489):
connection_recv_fail(1836):
MPIDU_Socki_handle_read(658): connection failure (set=0,sock=2,errno=104:Connection reset by peer)
aborting job:
However, on 7 nodes the job runs fine, with no errors.
Can you help me?
Thanks!