Re: [Beowulf] Checkpointing using flash

Justin YUAN SHI Fri, 21 Sep 2012 08:15:33 -0700

It looks fairly accurate.

This is because reconciling distributed checkpoints is theoretically
difficult. Therefore, frequent checkpointing is cost-prohibitive for
exascale apps.
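
As a rough illustration of the cost side of that argument, here is a
back-of-the-envelope sketch. It assumes independent node failures (so
system MTBF is node MTBF divided by node count) and uses Young's
first-order approximation for the checkpoint interval. The node MTBF,
node count, and checkpoint write time are made-up numbers for
illustration only, not figures from the article, and the function names
are hypothetical.

```python
import math

def system_mtbf(node_mtbf_hours, num_nodes):
    # Assumes independent, exponentially distributed node failures.
    return node_mtbf_hours / num_nodes

def young_interval(checkpoint_cost_hours, mtbf_hours):
    # Young's first-order approximation: T ~ sqrt(2 * C * MTBF).
    # Only a rough guide when C is not much smaller than the MTBF.
    return math.sqrt(2.0 * checkpoint_cost_hours * mtbf_hours)

def checkpoint_overhead(checkpoint_cost_hours, interval_hours):
    # Fraction of wall-clock time spent writing checkpoints,
    # ignoring rework and restart time after a failure.
    return checkpoint_cost_hours / (interval_hours + checkpoint_cost_hours)

# Illustrative assumptions, not measured values.
node_mtbf = 5 * 365 * 24      # 5 years per node, in hours
nodes = 100_000               # node count
cost = 0.25                   # 15 minutes to write one full checkpoint

mtbf = system_mtbf(node_mtbf, nodes)
interval = young_interval(cost, mtbf)
overhead = checkpoint_overhead(cost, interval)

print(f"system MTBF:         {mtbf:.3f} h")
print(f"checkpoint interval: {interval:.3f} h")
print(f"time in checkpoints: {overhead:.1%}")
```

With these assumed inputs the system MTBF drops below half an hour, and
roughly a third of the run goes to writing checkpoints before any
failure has even occurred.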


Justin

On Fri, Sep 21, 2012 at 10:49 AM, Hearns, John <john.hea...@mclaren.com> wrote:
> http://www.theregister.co.uk/2012/09/21/emc_abba/
>
> Frequent checkpointing will of course be vital for exascale, given the MTBF
> of individual nodes.
>
> However how accurate is this statement:
>
> HPC jobs involving half a million compute cores ... have a series of
> checkpoints set up in their code with the entire memory state stored at each
> checkpoint in a storage node.
>
> John Hearns | CFD Hardware Specialist | McLaren Racing Limited
> McLaren Technology Centre, Chertsey Road, Woking, Surrey GU21 4YH, UK
>
>
> T:  +44 (0) 1483 262000
>
> D:  +44 (0) 1483 262352
>
> F:  +44 (0) 1483 261928
> E:  john.hea...@mclaren.com
>
> W: www.mclaren.com
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>