Well you CAN have RAM arranged in banks which mirror each other in a RAID-1 
fashion.

But heck, why not have THREE servers running the same problem - then two of 
them can vote out the other one,
and start to mutter about it behind its back...
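
To put rough numbers on the two-node and three-node ideas, here is a minimal sketch. It assumes nodes fail independently at a constant rate, and the 50,000-hour MTBF is a made-up figure, not a measurement. The function names are hypothetical too.

```python
import math
from collections import Counter

def p_node_fails(run_hours, mtbf_hours):
    """Chance one node fails during the run (assumes a constant failure rate)."""
    return 1.0 - math.exp(-run_hours / mtbf_hours)

def p_run_lost(p_fail, replicas, needed):
    """Chance that fewer than `needed` of `replicas` independent nodes survive."""
    p_ok = 1.0 - p_fail
    return sum(
        math.comb(replicas, k) * p_ok**k * p_fail**(replicas - k)
        for k in range(needed)
    )

def majority(results):
    """Return the value reported by more than half the replicas, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) / 2 else None

RUN_HOURS = 21 * 24      # three-week run
MTBF_HOURS = 50_000      # ASSUMED per-node MTBF, for illustration only

p = p_node_fails(RUN_HOURS, MTBF_HOURS)
print(f"one node:              {p_run_lost(p, 1, 1):.6f}")
print(f"two nodes, need one:   {p_run_lost(p, 2, 1):.6f}")
print(f"three nodes, need two: {p_run_lost(p, 3, 2):.6f}")

# Voting on (hypothetical) output checksums from three replicas
print(majority(["a1b2", "a1b2", "ffff"]))
```

Under these assumptions, three voting nodes are more likely to lose the run than two independent copies, since the vote needs two survivors. What the vote buys is catching a node that finishes with a wrong answer.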


-----Original Message-----
From: Beowulf [mailto:beowulf-boun...@beowulf.org] On Behalf Of Prentice Bisbal
Sent: 26 October 2016 14:51
To: beowulf@beowulf.org
Subject: Re: [Beowulf] non-stop computing

There is an amazing beauty in this simplicity.

Prentice

On 10/25/2016 02:46 PM, Gavin W. Burris wrote:
> Hi, Michael.
>
> What if the same job ran on two separate nodes, with IO to local scratch?  
> What are the odds both nodes would fail in that three-week period?  No 
> special hardware / software required.  Simple.  Done.
>
> Cheers.
>
> On Tue 10/25/16 02:24PM EDT, Michael Di Domenico wrote:
>> here's an interesting thought exercise and a real problem i have to tackle.
>>
>> i have researchers that want to run magma codes for three weeks or
>> so at a time.  the process is unfortunately sequential in nature and
>> magma doesn't support checkpointing (as far as i know, and i don't
>> know much about magma)
>>
>> So the question is;
>>
>> what kind of a system could one design/buy using any combination of
>> hardware/software that would guarantee that this program would run
>> for 3 wks or so and not fail
>>
>> and by "fail" i mean from some system type error, ie memory faulted,
>> cpu faulted, network io slipped (nfs timeout) as opposed to "there's
>> a bug in magma" which already bit us a few times
>>
>> there's probably some commercial or "unreleased" commercial product
>> on the market that might fill this need, but i'm also looking for
>> something "creative" as well
>>
>> three weeks isn't a big stretch compared to some of the other codes
>> i've heard around the DOE that run for months, but it's still pretty
>> painful to have a run go for three weeks and then fail 2.5 weeks in
>> and have to restart.  most modern day hardware would probably support
>> this without issue, but i'm looking for more of a guarantee than a
>> prayer
>>
>> double bonus points for anything that runs at high clock speeds >3GHz
>>
>> any thoughts?
>> _______________________________________________
>> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin
>> Computing To change your subscription (digest mode or unsubscribe)
>> visit http://www.beowulf.org/mailman/listinfo/beowulf

