Hi David

David Mathog wrote:
> Eugen Leitl <[EMAIL PROTECTED]> wrote:
>
>> http://labs.google.com/papers/disk_failures.pdf
>
> Interesting. However google apparently uses:
>
>    serial and parallel ATA consumer-grade hard disk drives,
>    ranging in speed from 5400 to 7200 rpm
>
> Not quite clear what they meant by "consumer-grade", but I'm assuming
> that it's the cheapest disk in that manufacturer's line. I don't
> typically buy those kinds of disks, as they have only a 1 year
> warranty but rather purchase those with 5 year warranties. Even
> for workstations.
Seagates.

> So I'm not too sure how useful their data is. I think everyone here

Quite useful, IMO. I know it would not be PC, but I (and many others)
would like to see a clustering of the data, specifically to see if there
are any hyperplanes that separate the disks in terms of vendors, models,
interfaces, etc. CERN had a study up about this which I had read and
linked to, but now it seems to be gone, and I did not download a copy for
myself.

> would have agreed without the study that a disk reallocating blocks and
> throwing scan errors is on the way out. Quite surprising about the

"Tic tic tic whirrrrrrr" scares the heck out of me now :(

> lack of a temperature correlation though. At the very least I would
> have expected increased temps to lead to faster loss of bearing
> lubricant. That tends to manifest as a disk that spun for 3 years
> not being able to restart after being off for a half an hour.
> Presumably you've all seen that. If they have great power and systems
> management at their data centers the systems may not have been
> down long enough for this to be observed.

With enough disks, their sampling should be reasonably good, albeit
biased towards their preferred vendor(s) and model(s). Would like to see
that data.

CERN compared SCSI, IDE, SATA, and FC. They found (as I remember, quoting
from a document I can no longer find online) that there really weren't
any significant reliability differences between them. I would like to see
this sort of analysis here, and see if the real data (not the estimated
MTBFs) shows a signal. I am guessing that we could build a pragmatic and
time-dependent MTBF based upon the time rate of change of the AFR.

I think the Google paper was basically saying that they wanted to do
something like this using the SMART data, but found that it was
insufficient by itself to render a meaningful predictive model. That is,
in and of itself, quite interesting.
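To make the time-dependent MTBF idea concrete: under a simple exponential
failure model, an annualized failure rate maps to an MTBF via
MTBF = -H / ln(1 - AFR), where H is hours per year, so a per-year AFR
series gives a per-year MTBF estimate. A minimal Python sketch -- the AFR
numbers below are made up for illustration, not measurements from the
paper:

```python
import math

HOURS_PER_YEAR = 8766  # mean year length in hours, including leap years

def afr_to_mtbf_hours(afr):
    """Convert an annualized failure rate (fraction, e.g. 0.03 for 3%)
    to an MTBF estimate in hours, assuming exponential failure times.
    AFR = 1 - exp(-H / MTBF)  =>  MTBF = -H / ln(1 - AFR)."""
    return -HOURS_PER_YEAR / math.log(1.0 - afr)

def piecewise_mtbf(afr_by_year):
    """Time-dependent MTBF: one estimate per observed year of AFR data."""
    return {year: afr_to_mtbf_hours(afr) for year, afr in afr_by_year.items()}

# Illustrative AFR values only (roughly the shape reported in the Google
# paper: low in year 1, rising with drive age) -- not real data.
afrs = {1: 0.02, 2: 0.08, 3: 0.09}
for year, mtbf in sorted(piecewise_mtbf(afrs).items()):
    print(f"year {year}: AFR {afrs[year]:.0%} -> MTBF ~ {mtbf:,.0f} h")
```

The point is that a rising AFR curve collapses the effective MTBF far
below the datasheet number, which is what the real field data (rather
than the estimated MTBFs) would show.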
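For what it's worth, the kind of SMART-based predictor they attempted
could be as simple as a logistic score over a few counters. Everything
below is hypothetical -- the attribute names, weights, and bias are
invented for illustration, and the paper's finding is precisely that a
model built from SMART data alone was not predictive enough:

```python
import math

# Hypothetical weights for a toy model -- not derived from any real data.
WEIGHTS = {
    "reallocated_sectors": 2.0,
    "scan_errors": 1.5,
    "offline_reallocations": 1.0,
}
BIAS = -4.0  # drives with clean counters score near zero

def failure_score(smart_counters):
    """Toy logistic score: a probability-like estimate that a drive is
    on the way out, given a few SMART-style counters (invented model)."""
    z = BIAS + sum(
        w * math.log1p(smart_counters.get(name, 0))
        for name, w in WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))

healthy = {"reallocated_sectors": 0, "scan_errors": 0}
suspect = {"reallocated_sectors": 40, "scan_errors": 12}
print(f"healthy: {failure_score(healthy):.3f}")
print(f"suspect: {failure_score(suspect):.3f}")
```

A model like this separates the obvious cases (reallocations plus scan
errors), which matches the intuition everyone already had; the hard part
Google reported is that many drives fail with clean SMART counters, so
the score stays low right up until the failure.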
If you could read back reasonable sets of parameters from a machine and
estimate the likelihood of it going south, this would be quite nice (or
annoying) for admins everywhere. Also good in terms of tightening down
real support costs and the value of warranties, default and extended.

> Regards,
>
> David Mathog
> [EMAIL PROTECTED]
> Manager, Sequence Analysis Facility, Biology Division, Caltech
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf