It looks fairly accurate.

This is because reconciling distributed checkpoints is theoretically
difficult. Therefore, frequent checkpointing is cost-prohibitive for
exascale apps.
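A rough back-of-envelope sketch of why frequent checkpointing gets
expensive at scale (an illustration, not from the article). It assumes
independent node failures, so system MTBF is node MTBF divided by node
count, and uses Young's first-order approximation for the interval
between checkpoints, sqrt(2 * C * M). The node MTBF, node counts and
15-minute checkpoint cost are hypothetical numbers, and the helper
names are made up for the example.

```python
import math


def system_mtbf_hours(node_mtbf_hours: float, num_nodes: int) -> float:
    """Whole-machine MTBF, assuming independent, exponentially
    distributed node failures (any single node failure kills the job)."""
    return node_mtbf_hours / num_nodes


def young_interval_hours(checkpoint_cost_hours: float, mtbf_hours: float) -> float:
    """Young's first-order estimate of the optimal compute time between
    checkpoints: tau = sqrt(2 * C * M)."""
    return math.sqrt(2.0 * checkpoint_cost_hours * mtbf_hours)


def overhead_fraction(checkpoint_cost_hours: float, mtbf_hours: float) -> float:
    """First-order fraction of time lost per unit of useful work:
    C / tau (writing checkpoints) + tau / (2M) (expected rework after a
    failure). Only meaningful when C is much smaller than M."""
    tau = young_interval_hours(checkpoint_cost_hours, mtbf_hours)
    return checkpoint_cost_hours / tau + tau / (2.0 * mtbf_hours)


if __name__ == "__main__":
    # Hypothetical inputs, not taken from the article or this thread.
    node_mtbf = 50_000.0  # hours per node
    ckpt_cost = 0.25      # hours to dump full memory state to storage
    for nodes in (1_000, 100_000):
        m = system_mtbf_hours(node_mtbf, nodes)
        tau = young_interval_hours(ckpt_cost, m)
        print(f"{nodes:>7} nodes: MTBF {m:.2f} h, "
              f"checkpoint every {tau:.2f} h, "
              f"overhead ~{overhead_fraction(ckpt_cost, m):.0%}")
```

With those made-up inputs the estimated overhead rises from about 10% at
1,000 nodes to about 100% at 100,000 nodes. At that point the
first-order model is no longer valid, which is itself a sign that plain
checkpoint/restart has stopped paying for itself.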

Justin

On Fri, Sep 21, 2012 at 10:49 AM, Hearns, John <john.hea...@mclaren.com> wrote:
> http://www.theregister.co.uk/2012/09/21/emc_abba/
>
> Frequent checkpointing will of course be vital for exascale, given the MTBF
> of individual nodes.
>
> However, how accurate is this statement:
>
> HPC jobs involving half a million compute cores ... have a series of
> checkpoints set up in their code with the entire memory state stored at each
> checkpoint in a storage node.
>
> John Hearns | CFD Hardware Specialist | McLaren Racing Limited
> McLaren Technology Centre, Chertsey Road, Woking, Surrey GU21 4YH, UK
>
> T:  +44 (0) 1483 262000
> D:  +44 (0) 1483 262352
> F:  +44 (0) 1483 261928
> E:  john.hea...@mclaren.com
> W: www.mclaren.com
>
> The contents of this email are confidential and for the exclusive use of the
> intended recipient. If you receive this email in error you should not copy
> it, retransmit it, use it or disclose its contents but should return it to
> the sender immediately and delete your copy.
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>