Or, if you can factor your computation to make use of extra processing
nodes, you can just keep on moving. Think of this as a higher-level
scheme than, say, Hamming codes for memory protection: use 12 bits to
store 8, and you're still synchronous.
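
To make the Hamming analogy concrete, here is a minimal Python sketch of a single-error-correcting Hamming code. It is not from the original thread. With m data bits the code needs r check bits satisfying 2^r >= m + r + 1, so 8 data bits take 4 check bits, 12 bits in all. A flipped bit is located by the syndrome and fixed in place, so the reader carries on without a retry. The function names `hamming_encode`, `hamming_correct` and `hamming_data` are illustrative only.

```python
def hamming_encode(data_bits):
    """Encode a list of 0/1 data bits as a single-error-correcting Hamming codeword.

    Check bits occupy the power-of-two positions 1, 2, 4, 8, ... (1-based);
    data bits fill the remaining positions in order.
    """
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:  # smallest r with 2^r >= m + r + 1
        r += 1
    n = m + r
    code = [0] * (n + 1)  # index 0 unused so list indices match 1-based positions
    data = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # not a power of two -> data position
            code[pos] = next(data)
    for i in range(r):
        p = 1 << i
        # Check bit p makes the XOR over all positions with bit p set equal 0.
        for pos in range(p + 1, n + 1):
            if pos & p:
                code[p] ^= code[pos]
    return code[1:]


def hamming_correct(codeword):
    """Return (corrected codeword, syndrome).

    Syndrome 0 means no error was detected; otherwise it is the 1-based
    position of the flipped bit, assuming at most one bit flipped.
    """
    syndrome = 0
    for pos, bit in enumerate(codeword, start=1):
        if bit:
            syndrome ^= pos
    fixed = list(codeword)
    if syndrome > len(fixed):
        # A single flip always gives a syndrome inside the codeword,
        # so this can only mean more than one bit flipped.
        raise ValueError("uncorrectable error")
    if syndrome:
        fixed[syndrome - 1] ^= 1
    return fixed, syndrome


def hamming_data(codeword):
    """Extract the data bits (the non-power-of-two positions)."""
    return [bit for pos, bit in enumerate(codeword, start=1) if pos & (pos - 1)]


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 1, 0]
    word = hamming_encode(data)
    hit = list(word)
    hit[5] ^= 1  # flip the bit at position 6
    fixed, pos = hamming_correct(hit)
    print(len(word), pos, hamming_data(fixed) == data)  # 12 6 True
```

Flipping any single bit of the 12-bit word and passing it to `hamming_correct` reports that bit's position and restores the original codeword. Two flips may be miscorrected, which is why ECC memory usually adds one more overall parity bit (SECDED).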

Jim, you are smarter than me!
I was going to air the idea of pairs of nodes in lock-step, with either node
being able to STONITH the other if either there is a machine check event or
the other node does not keep up with reporting results.
Then signal to the cluster management: "There's been a failure here, but
let's keep trucking to the end of the run, when you can come along and
replace my buddy and me."

The obvious drawback is that you get half an exaflop for your money!
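
The buddy-pair scheme can be sketched as a per-step watchdog check that each node runs on its partner. This minimal Python sketch rests on several assumptions. STONITH ("Shoot The Other Node In The Head") is modelled by setting a `fenced` flag. "Keeping up" is taken to mean that the buddy's last reported step is within `max_lag` steps of our own. `NodeState`, `fence_reason`, `supervise` and the `notify` callback to the cluster manager are hypothetical names and do not belong to any real cluster manager's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class NodeState:
    """Hypothetical model of one node in a lock-step pair."""
    name: str
    step: int = 0                # last step whose result the node has reported
    machine_check: bool = False  # a machine check event was raised on the node
    fenced: bool = False         # node has been shot (STONITH) and left the run


def fence_reason(me: NodeState, buddy: NodeState, max_lag: int) -> Optional[str]:
    """Return why `me` should fence `buddy`, or None if the buddy looks healthy."""
    if buddy.fenced:
        return None  # already out of the run; nothing more to do
    if buddy.machine_check:
        return "machine check"
    lag = me.step - buddy.step
    if lag > max_lag:
        return f"lagging by {lag} steps"
    return None


def supervise(me: NodeState, buddy: NodeState, max_lag: int,
              notify: Callable[[str], None]) -> Optional[str]:
    """One watchdog check run by `me` on its buddy.

    On failure the buddy is marked fenced (a stand-in for a real STONITH
    action) and the cluster manager is told that `me` carries on alone.
    """
    reason = fence_reason(me, buddy, max_lag)
    if reason is not None:
        buddy.fenced = True
        notify(f"{buddy.name} fenced by {me.name} ({reason}); "
               f"{me.name} continuing to end of run, replace both afterwards")
    return reason
```

Each node would call `supervise(me, buddy, ...)` every step. For example, with `max_lag=5`, a buddy at step 90 when we are at step 100 is fenced with the reason "lagging by 10 steps". Either node can shoot the other, so a real deployment would also need a tie-breaker to stop two healthy but slow-to-report nodes from fencing each other at the same moment.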




_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
