----- "Greg Lindahl" <lind...@pbm.com> wrote:

> That kind of policy has a fairly high opportunity
> cost, even before you factor in linked nodes.

Well, we cannot dictate to our users what they do; we set a
maximum walltime of 3 months and tell users that they should
checkpoint (if they have control of the application and have
coding skills).

> E.g. you see a system disk going bad, but the user
> will lose all their output unless the job runs for
> 4 more weeks...

We run SMART tests and the like trying to proactively spot bad
disks (and other hardware) prior to failures, but yes, that's
inevitable.

cheers,
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf