On Sat, Apr 04, 2009 at 05:24:23PM -0400, Jason Riedy wrote:

> And Robert G. Brown writes:
> > For them servicing/replacing a system is cheap: Box dies.
> > Employee notes this, grabs box from Big Stack of Boxes, carries
> > it to dead box, removes dead box, replace it with new working
> > box, presses power switch, walks away.
>
> Plus, your operator can be unskilled.

Um, not completely. These clusters work by starting with 3 copies of
every chunk of the data, and as you work you have to make sure that
you don't take down the wrong system and leave the cluster with 0 or 1
copies of a chunk of data. There are software mechanisms you can use
to help, but the operator still needs to know how the rules work.

Some tasks, yeah, no problem: the box is already dead, so swap it. But
many tasks involve boxes which aren't dead yet: one disk has failed,
the box needs a reboot to run a new kernel, there's a new release of
the application software to install, and so on.

-- greg

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
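
To make the replica rule above concrete, here is a rough sketch of the kind of check an operator (or a helper script) would run before pulling a box that is still alive. Everything in it is an assumption for illustration: the `placement` map of chunk to hosting nodes, the function names, and the threshold of 2 live copies (taken from "don't leave the cluster with 0 or 1 copies"). It is not the interface of any particular cluster software.

```python
from typing import Dict, Set

# Assumption: a chunk is "safe" only while at least 2 copies stay reachable,
# i.e. we never want to be left with 0 or 1 copies.
MIN_LIVE_COPIES = 2


def chunks_at_risk(
    placement: Dict[str, Set[str]],
    down: Set[str],
    candidate: str,
    min_live: int = MIN_LIVE_COPIES,
) -> Dict[str, int]:
    """Return {chunk: live_copies} for chunks hosted on `candidate` that
    would drop below `min_live` if `candidate` were taken down too.

    placement: hypothetical map of chunk id -> set of nodes holding a copy
    down:      nodes that are already dead or offline
    """
    unavailable = down | {candidate}
    at_risk = {}
    for chunk, nodes in placement.items():
        if candidate not in nodes:
            continue  # taking the candidate down does not affect this chunk
        live = len(nodes - unavailable)
        if live < min_live:
            at_risk[chunk] = live
    return at_risk


def safe_to_take_down(
    placement: Dict[str, Set[str]],
    down: Set[str],
    candidate: str,
    min_live: int = MIN_LIVE_COPIES,
) -> bool:
    """True if no chunk on `candidate` would fall below `min_live` copies."""
    return not chunks_at_risk(placement, down, candidate, min_live)
```

With every chunk on three boxes and nothing else down, rebooting any one box passes the check. Once one box holding a chunk is already dead, rebooting a second box that holds the same chunk fails it, which is exactly the case where an operator who doesn't know the rules gets into trouble.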