Re: [Beowulf] Checkpointing using flash

2012-10-02 Thread Hearns, John
Regarding fault tolerance, this sounds interesting. I haven't had a chance to do more than glimpse at the web page though (9:30 UK time and I need my coffee) Lecture Series "A Perspective on Exploiting Heterogeneous Fault-Tolerant Parallelism for HPC clusters and Supercomputers" at the Uni

Re: [Beowulf] Checkpointing using flash

2012-10-01 Thread Justin YUAN SHI
On Mon, Oct 1, 2012 at 2:22 PM, Mark Hahn wrote: >> My idea is to use data parallel API. This is nothing new. In theory, > > > right, it's not new. so why would it succeed this time around? This is because the transformation of the application architecture from static to statistic multiplexed f

Re: [Beowulf] Checkpointing using flash

2012-10-01 Thread Mark Hahn
> My idea is to use data parallel API. This is nothing new. In theory, right, it's not new. so why would it succeed this time around? > can still be elegant looking. For example, you can have multiple > Infiniband interfaces (some machines already have) to help counter the > speed disparity betw

Re: [Beowulf] Checkpointing using flash

2012-09-29 Thread Justin YUAN SHI
Something like that. But we don't want the app code to look too ugly. My idea is to use data parallel API. This is nothing new. In theory, every MPI program can be translated into data parallel. The magic is the total transformation of the application architecture. Traditionally computer, network

Re: [Beowulf] Checkpointing using flash

2012-09-29 Thread Lux, Jim (337C)
On 9/29/12 2:29 AM, "Justin YUAN SHI" wrote: >I missed this thread. Got busy with classes. Sorry. > >Going back to Jim's comments on Infiniband and OSI and MPI. I see that >exascale computing requires us to rethink MPI's insistence on sending >messages directly. Even with the group communicators

Re: [Beowulf] Checkpointing using flash

2012-09-29 Thread Justin YUAN SHI
I missed this thread. Got busy with classes. Sorry. Going back to Jim's comments on Infiniband and OSI and MPI. I see that exascale computing requires us to rethink MPI's insistence on sending messages directly. Even with the group communicators, the implementation insists on the same. The problem

Re: [Beowulf] Checkpointing using flash

2012-09-25 Thread Ellis H. Wilson III
On 09/24/2012 12:57 PM, Andrew Holway wrote: >> Haha, I doubt it -- probably the opposite in terms of development cost. >>Which is why I question the original statement on the grounds that >> "cost" isn't well defined. Maybe the cost is just performance-wise, but >> that's not even clear to me w

Re: [Beowulf] Checkpointing using flash

2012-09-24 Thread Andrew Holway
> Haha, I doubt it -- probably the opposite in terms of development cost. > Which is why I question the original statement on the grounds that > "cost" isn't well defined. Maybe the cost is just performance-wise, but > that's not even clear to me when we consider things at huge scales. 40 years a

Re: [Beowulf] Checkpointing using flash

2012-09-24 Thread Ellis H. Wilson III
Subject: Re: [Beowulf] Checkpointing using flash > >> Of course the physical modelers won't bat an eyelash, but the common >> programmer who still tries to figure out this multithreading thing >> will be out to lunch. > > Whenever you push a problem from hardware to software y

Re: [Beowulf] Checkpointing using flash

2012-09-24 Thread Lux, Jim (337C)
nodes, etc. Jim Lux -Original Message- From: beowulf-boun...@beowulf.org [mailto:beowulf-boun...@beowulf.org] On Behalf Of Eugen Leitl Sent: Monday, September 24, 2012 2:11 AM To: beowulf@beowulf.org Subject: Re: [Beowulf] Checkpointing using flash On Sat, Sep 22, 2012 at 09:29:25PM +, L

Re: [Beowulf] Checkpointing using flash

2012-09-24 Thread Lux, Jim (337C)
-Original Message- From: beowulf-boun...@beowulf.org [mailto:beowulf-boun...@beowulf.org] On Behalf Of Andrew Holway Sent: Monday, September 24, 2012 3:59 AM To: Eugen Leitl Cc: beowulf@beowulf.org Subject: Re: [Beowulf] Checkpointing using flash > Of course the physical modelers wo

Re: [Beowulf] Checkpointing using flash

2012-09-24 Thread Justin Shi
Regardless of how low the MPI stack goes, it has never "punched" through the packet retransmission layer. Therefore, the OSI model serves as a template to illustrate the point of discussion. Justin On Sep 22, 2012, at 10:34 AM, "Lux, Jim (337C)" wrote: > I see MPI as sitting much lower (network o

Re: [Beowulf] Checkpointing using flash

2012-09-24 Thread Justin Shi
Andrew, I think you are not too far off. If the global "Gluster like" mechanism can provide the theoretical upper bounded protection for its stored info, and can scale as we grow the machine size, it would look like a reasonable exascale machine. Justin On Sep 22, 2012, at 7:02 AM, Andrew Ho

Re: [Beowulf] Checkpointing using flash

2012-09-24 Thread Andrew Holway
> Of course the physical modelers won't bat an eyelash, > but the common programmer who still tries to figure out > this multithreading thing will be out to lunch. Whenever you push a problem from hardware to software you exponentially increase the cost of solving that problem.

Re: [Beowulf] Checkpointing using flash

2012-09-24 Thread Eugen Leitl
On Sat, Sep 22, 2012 at 09:29:25PM +, Lux, Jim (337C) wrote: > I think the future is in explicitly recognizing that you have to pass > messages serially and designing algorithms that are tolerant of things > like missing messages, variable (but bounded) latency (or heck, latency at > all). Co

Re: [Beowulf] Checkpointing using flash

2012-09-23 Thread Lux, Jim (337C)
On 9/23/12 6:57 AM, "Andrew Holway" wrote: >2012/9/21 David N. Lombard : >> Our primary approach today is recovery-based resilience, a.k.a., >> checkpoint-restart (C/R). I'm not convinced we can continue to rely on >>that >> at exascale. > >- Snapshotting seems to be an ugly and inelegant way of

Re: [Beowulf] Checkpointing using flash

2012-09-23 Thread Andrew Holway
2012/9/21 David N. Lombard : > Our primary approach today is recovery-based resilience, a.k.a., > checkpoint-restart (C/R). I'm not convinced we can continue to rely on that > at exascale. - Snapshotting seems to be an ugly and inelegant way of solving the problem. For me it is especially laughable

Re: [Beowulf] Checkpointing using flash

2012-09-22 Thread Alan Louis Scheinine
Jim Lux wrote that one giant [distributed] memory has scalability problems from physical distance reasons. Yes indeed. Simply to clarify, I was referring to a specific niche in parameter space (physical and programmatic) associated with programs using file I/O. That is to say, there is a realm fo

Re: [Beowulf] Checkpointing using flash

2012-09-22 Thread Lux, Jim (337C)
On 9/22/12 12:47 PM, "Alan Louis Scheinine" wrote: >Andrew Holway wrote: > > I've been playing around with GFS and Gluster a bit recently and this > > has got me thinking... Given a fast enough, low enough latency network > > might it be possible to have a Gluster like or GFS like memory sp

Re: [Beowulf] Checkpointing using flash

2012-09-22 Thread Alan Louis Scheinine
Andrew Holway wrote: > I've been playing around with GFS and Gluster a bit recently and this > has got me thinking... Given a fast enough, low enough latency network > might it be possible to have a Gluster like or GFS like memory space? For random access, hard disk access times are millise

Re: [Beowulf] Checkpointing using flash

2012-09-22 Thread Lux, Jim (337C)
I see MPI as sitting much lower (network or transport, perhaps) Maybe for this (as in many other cases) the OSI model is not an appropriate one. That is, most practical systems have more blending between layers, and outright punching through. There are a variety of high level protocols/algorithms

Re: [Beowulf] Checkpointing using flash

2012-09-22 Thread Andrew Holway
> To be exact, the OSI layers 1-4 can defend packet data losses and > corruptions against transient hardware and network failures. Layers > 5-7 provide no protection. MPI sits on top of layer 7. And it assumes > that every transmission must be successful (this is why we have to use > checkpoint in

Re: [Beowulf] Checkpointing using flash

2012-09-22 Thread Justin YUAN SHI
Ellis: If we go to a little nitty-gritty detail view, you will see that transient faults are the ultimate enemies of exascale computing. The problem, if we really go to the nitty-gritty details, stems from a mismatch between the MPI assumptions and what the OSI model promises. To be exact, the OS

Re: [Beowulf] Checkpointing using flash

2012-09-21 Thread David N. Lombard
On Fri, Sep 21, 2012 at 02:49:32PM +, Hearns, John wrote: > http://www.theregister.co.uk/2012/09/21/emc_abba/ > > Frequent checkpointing will of course be vital for exascale, given the MTBF > of individual nodes. Individual nodes have very good MTBF. It's /system/ scale that causes problems
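The node-versus-system MTBF point above can be sketched numerically. This is a minimal illustration only, assuming independent node failures with a constant failure rate and a job that stops when any one node fails; neither assumption is stated in the thread, and the function name and the example figures (a 5-year node MTBF, 100,000 nodes) are hypothetical.

```python
def system_mtbf_hours(node_mtbf_hours: float, node_count: int) -> float:
    """Approximate MTBF of a system that fails whenever any single node fails.

    Assumption (not from the thread): node failures are independent with a
    constant rate, so the failure rates add and the MTBF divides by the count.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return node_mtbf_hours / node_count


# Hypothetical figures: each node fails on average once in 5 years.
node_mtbf = 5 * 365 * 24  # 43,800 hours
hours = system_mtbf_hours(node_mtbf, 100_000)
print(f"system MTBF: {hours:.3f} hours")
```

Under these assumed numbers a very reliable node still gives a system-level failure roughly every 26 minutes, which is the scale effect the message describes.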

Re: [Beowulf] Checkpointing using flash

2012-09-21 Thread Douglas Eadline
> I would suggest that some scheme of redundant computation might be more > effective.. Rather than try to store a single node's state on the node, > and then, if any node hiccups, restore the state (perhaps to a spare), and > restart, means stopping the entire cluster while you recover. > > Or, i

Re: [Beowulf] Checkpointing using flash

2012-09-21 Thread Eugen Leitl
On Fri, Sep 21, 2012 at 01:09:41PM -0400, Ellis H. Wilson III wrote: > On 09/21/12 12:58, Lux, Jim (337C) wrote: > > Yes.. If that's the frequency of checkpoints. I was thinking more like 1 > > checkpoint per second or 10 seconds. > > While I suppose they might exist that frequent somehow in the

Re: [Beowulf] Checkpointing using flash

2012-09-21 Thread Robert G. Brown
On Fri, 21 Sep 2012, Lux, Jim (337C) wrote: On 9/21/12 9:21 AM, "Hearns, John" wrote: Or, if you can factor your computation to make use of extra processing nodes, you can just keep on moving. Think of this as a higher level scheme than, say, Hamming codes for memory protection: use 11 b

Re: [Beowulf] Checkpointing using flash

2012-09-21 Thread Ellis H. Wilson III
On 09/21/12 12:29, Lux, Jim (337C) wrote: > Flash is slow, though... SLC NAND flash (pretty fast, 8 Gbit part) is 250 > microseconds to write a 4kbyte (approx) page. Erasing is about 700 > microseconds (reading is 25 microseconds) > > MLC flash (say 512Gbit parts with 8 kByte pages) takes 1.3mi
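The SLC figures quoted above (about 250 microseconds to program a roughly 4 kbyte page) can be turned into a back-of-the-envelope write time. This is a rough sketch only: it counts page program time and ignores erase cycles, bus transfer, and controller overhead, and the `parallel_dies` parameter and the 1 GiB checkpoint size are assumptions introduced for illustration.

```python
import math


def page_write_throughput(page_bytes: int, program_time_s: float) -> float:
    """Bytes per second one flash die sustains if limited only by page program time."""
    return page_bytes / program_time_s


def checkpoint_write_time(total_bytes: int, page_bytes: int,
                          program_time_s: float, parallel_dies: int = 1) -> float:
    """Seconds to write total_bytes, with pages spread evenly over parallel_dies.

    parallel_dies is a hypothetical knob: real devices stripe writes over
    many dies and channels, which this model treats as perfectly parallel.
    """
    pages = math.ceil(total_bytes / page_bytes)
    pages_per_die = math.ceil(pages / parallel_dies)
    return pages_per_die * program_time_s


# Figures from the quoted message: 4 kbyte page, 250 microsecond program time.
rate = page_write_throughput(4096, 250e-6)
print(f"single die: {rate / 1e6:.1f} MB/s")
print(f"1 GiB on one die: {checkpoint_write_time(2**30, 4096, 250e-6):.1f} s")
print(f"1 GiB on 64 dies: {checkpoint_write_time(2**30, 4096, 250e-6, 64):.2f} s")
```

The single-die rate is low, which matches the "flash is slow" remark; whether flash is fast enough for checkpointing in this model depends almost entirely on how many dies can be written in parallel.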

Re: [Beowulf] Checkpointing using flash

2012-09-21 Thread Ellis H. Wilson III
On 09/21/12 12:58, Lux, Jim (337C) wrote: > Yes.. If that's the frequency of checkpoints. I was thinking more like 1 > checkpoint per second or 10 seconds. While I suppose they might exist that frequent somehow in the wild, I've never heard of checkpoints at that short a time interval. These hug

Re: [Beowulf] Checkpointing using flash

2012-09-21 Thread Lux, Jim (337C)
On 9/21/12 9:44 AM, "Ellis H. Wilson III" wrote: >On 09/21/12 12:29, Lux, Jim (337C) wrote: >> Flash is slow, though... SLC NAND flash (pretty fast, 8 Gbit part) is >>250 >> microseconds to write a 4kbyte (approx) page. Erasing is about 700 >> microseconds (reading is 25 microseconds) >> >>

Re: [Beowulf] Checkpointing using flash

2012-09-21 Thread Lux, Jim (337C)
On 9/21/12 9:21 AM, "Hearns, John" wrote: > >Or, if you can factor your computation to make use of extra processing >nodes, you can just keep on moving. Think of this as a higher level >scheme than, say, Hamming codes for memory protection: use 11 bits to >store 8, and you're still synchronou

Re: [Beowulf] Checkpointing using flash

2012-09-21 Thread Lux, Jim (337C)
On 9/21/12 8:41 AM, "Hearns, John" wrote: > > >Are your concerns about the accuracy of this statement related to the >fact that elReg is claiming that they must dump "the entire memory" or >some concern about flash being used as a temporary checkpointing medium? > >The "entire memory" statement

Re: [Beowulf] Checkpointing using flash

2012-09-21 Thread Ellis H. Wilson III
On 09/21/12 12:13, Lux, Jim (337C) wrote: > I would suggest that some scheme of redundant computation might be more > effective.. Rather than try to store a single node's state on the node, > and then, if any node hiccups, restore the state (perhaps to a spare), and > restart, means stopping the en

Re: [Beowulf] Checkpointing using flash

2012-09-21 Thread Hearns, John
Or, if you can factor your computation to make use of extra processing nodes, you can just keep on moving. Think of this as a higher level scheme than, say, Hamming codes for memory protection: use 11 bits to store 8, and you're still synchronous. Jim, you are smarter than me! I was going to ai
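The Hamming-code analogy in the message above can be made concrete with a small sketch. It uses the textbook Hamming(7,4) code (7 bits to store 4), chosen here only because it is the smallest standard example; it is not the code size mentioned in the message, and the function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(codeword)  # work on a copy so the caller's list is untouched
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = hamming74_encode([1, 0, 1, 1])
damaged = word.copy()
damaged[4] ^= 1  # simulate a single-bit fault
print(hamming74_decode(damaged) == [1, 0, 1, 1])
```

The extra bits let the data survive a single fault without stopping to restore an earlier state, which is the property the message proposes lifting to the level of whole compute nodes.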

Re: [Beowulf] Checkpointing using flash

2012-09-21 Thread Lux, Jim (337C)
I would suggest that some scheme of redundant computation might be more effective.. Rather than try to store a single node's state on the node, and then, if any node hiccups, restore the state (perhaps to a spare), and restart, means stopping the entire cluster while you recover. Or, if you can fa

Re: [Beowulf] Checkpointing using flash

2012-09-21 Thread Hearns, John
Are your concerns about the accuracy of this statement related to the fact that elReg is claiming that they must dump "the entire memory" or some concern about flash being used as a temporary checkpointing medium? The "entire memory" statement puzzled me. But using flash in this fashion does

Re: [Beowulf] Checkpointing using flash

2012-09-21 Thread Ellis H. Wilson III
On 09/21/12 10:49, Hearns, John wrote: > http://www.theregister.co.uk/2012/09/21/emc_abba/ > > Frequent checkpointing will of course be vital for exascale, given the > MTBF of individual nodes. > > However how accurate is this statement: > > HPC jobs involving half a million compute cores ... have

Re: [Beowulf] Checkpointing using flash

2012-09-21 Thread Justin YUAN SHI
It looks fairly accurate. This is because reconciling distributed checkpoints is theoretically difficult. Therefore, frequent checkpointing is cost prohibitive for exascale apps. Justin On Fri, Sep 21, 2012 at 10:49 AM, Hearns, John wrote: > http://www.theregister.co.uk/2012/09/21/emc_abba/ > >

[Beowulf] Checkpointing using flash

2012-09-21 Thread Hearns, John
http://www.theregister.co.uk/2012/09/21/emc_abba/ Frequent checkpointing will of course be vital for exascale, given the MTBF of individual nodes. However how accurate is this statement: HPC jobs involving half a million compute cores ... have a series of checkpoints set up in their code with