On 25.09.2012 at 12:19, Andrew Holway wrote:

> <snip>
> I'm pretty sure faulty hardware is the root cause of our fault
> tolerance problems :). In any case, the main issue seems to be the loss
> of a chunk of your application memory when the node fails, not so much
> the retransmission of messages. MPI has some kind of functionality
> inside to address fault tolerance anyway.

If you are interested: there was a lot of discussion about fault tolerance (FT)
in MPI-3. There is a dedicated mailing list:

http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft

-- Reuti
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
