Re: [Beowulf] Re: failure trends in a large disk drive population

Mark Hahn Wed, 21 Feb 2007 18:55:53 -0800

weakly correlated with failure. However, of all the disks that failed, less than half (around 45%) had ANY of the "strong" signals and another 25% had some of the "weak" signals. This means that over a third of disks that failed gave no appreciable warning. Therefore even combining the variables would give no better than a 70% chance of predicting failure.


well, a factorial analysis might still show useful interactions.

number of disks. For example, among the disks that failed, many had a large number of seek errors; however, over 70% of disks in the fleet -- failed and working -- had a large number of seek errors.


was there any trend across time in the seek errors?
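
to put a rough number on the base-rate problem in the seek-error paragraph quoted above, here is a minimal Bayes'-rule sketch. only the 70% fleet-wide figure comes from the quoted text; the 90% ("many" of the failed disks) and the 5% failure rate are made-up illustrative values, not numbers from the study.

```python
def prob_fail_given_signal(p_signal_given_fail, p_fail, p_signal):
    """Bayes' rule: P(fail | signal) = P(signal | fail) * P(fail) / P(signal)."""
    return p_signal_given_fail * p_fail / p_signal


# From the quoted text: over 70% of the whole fleet shows many seek errors.
P_SIGNAL = 0.70
# ASSUMPTION (hypothetical): 90% of failed disks show many seek errors.
P_SIGNAL_GIVEN_FAIL = 0.90
# ASSUMPTION (hypothetical): 5% of disks fail in the period considered.
P_FAIL = 0.05

ppv = prob_fail_given_signal(P_SIGNAL_GIVEN_FAIL, P_FAIL, P_SIGNAL)
lift = ppv / P_FAIL  # how much the signal raises the failure probability

print(f"P(fail | many seek errors) = {ppv:.3f}, lift over base rate = {lift:.2f}x")
```

under those assumed inputs the signal moves the failure probability from 5% to only about 6.4%, which is why a signal present on most of the fleet is nearly useless on its own, however common it is among the failures.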

So that's our master plan.  Just don't tell anyone. :)


hah.  well, if it were me, the M.P. would involve some sort of proactive treatment: say, a full-disk read once a day. SMART self-tests _ought_ to be more valuable than that, but otoh, the vendors probably munge the measurements pretty badly.
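
a minimal sketch of what such a daily full-disk read could look like, assuming a Unix-like system (it relies on `os.pread`). the function name `scan_device`, the 1 MiB chunk size, and the skip-on-error policy are my own illustrative choices, not anything from the study. note that a plain buffered read like this may be served from the page cache rather than the platters, so treat it as an outline rather than a tested scrubber.

```python
import os


def scan_device(path, chunk_size=1024 * 1024):
    """Read `path` from start to end.

    Returns (bytes_read, bad_offsets), where bad_offsets lists the start
    offset of every chunk whose read raised an OSError.
    """
    bad_offsets = []
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end reports the size for regular files and,
        # on Linux, for block devices as well.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                # Record the unreadable chunk and keep going.
                bad_offsets.append(offset)
                offset += want
                continue
            if not data:
                break  # unexpected end of device/file
            bytes_read += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_read, bad_offsets
```

run from cron against something like `/dev/sdX` (a placeholder, and it needs read permission on the device), the interesting outputs would be the list of bad offsets and how the elapsed time drifts from day to day.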


regards, mark hahn.
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
