> > E.g. you see a system disk going bad, but the user
> > will lose all their output unless the job runs for
> > 4 more weeks...
>
> We run SMART tests and the like trying to proactively
> spot bad disks (and other hardware) prior to failures,
> but yes, that's inevitable.

It's not inevitable that the policy be that 3 month jobs are allowed.

But you know me: I never saw a battle I didn't want to fight :-)

Arrr, mateys, this be the BOFH, and I'm heere to educate you about
the right way to use this here supercomputer... my way... or walk
the plank!

-- greg

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf