On Wed, Jan 21, 2009 at 04:40:26PM -0600, Gerry Creager wrote:

> We're seeing the following error in WRF compiled with openMPI and the
> PGI 7.2 compiler:
> mca_btl_tcp_frag_send:writev failed with errno=104

It's unfortunate that OpenMPI is following in the footsteps of MPICH and
doesn't print out that 104 = "Connection reset by peer". The OpenMPI FAQ
has some info about that:

http://open-mpi.basemirror.de/faq/?category=tcp

> While all nodes were accessible prior to the run and returned
> appropriate "stuff" when queried with, eg., ssh and a command, two nodes
> now return something like this:
> [ge...@brazos SCOOP12km]$ ssh c0522
> Received disconnect from 192.168.200.154: 2: Bad packet length 808464432.

That's kinda interesting. Perhaps the network chip got into a really
funny state, and is corrupting packets? Power off for a while.

-- greg

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
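
For anyone who wants to decode the two numbers in this thread themselves, here is a minimal Python sketch. The helper names (describe_errno, length_as_ascii) are made up for illustration and are not part of OpenMPI or ssh. It assumes a Linux host, where errno 104 is ECONNRESET; errno numbering differs on other platforms. It also assumes the ssh packet length is a 32-bit big-endian field, as in the SSH binary packet format.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


def length_as_ascii(length):
    """Show the four big-endian bytes of a 32-bit length field as text."""
    raw = length.to_bytes(4, "big")
    return raw.decode("ascii", errors="replace")


if __name__ == "__main__":
    # errno numbering is platform-specific; 104 is ECONNRESET on Linux.
    print(describe_errno(104))
    # The length value from the ssh error message quoted in this thread.
    print(repr(length_as_ascii(808464432)))
```

The second call is the interesting one: 808464432 is 0x30303030, i.e. the four ASCII characters "0000". That may suggest plain text showed up where ssh expected a binary packet header, rather than random bit flips, but that is only an observation about the number, not a diagnosis of the node.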