Re: [Beowulf] Re: failure trends in a large disk drive population

Robert G. Brown Fri, 16 Feb 2007 15:21:48 -0800

On Fri, 16 Feb 2007, David Mathog wrote:

Justin Moore wrote:


http://labs.google.com/papers/disk_failures.pdf


Despite my Duke e-mail address, I've been at Google since July.  While
I'm not a co-author, I'm part of the group that did this study and can
answer (some) questions people may have about the paper.


Dangling meat in front of the bears, eh?  Well...


Hey Justin.  Are you going to stay in NC and move to the new facility as
they build it?

Let me add one general question to David's.

How did they look for predictive models on the SMART data?  It sounds
like they did a fairly linear data decomposition, looking for first-order
correlations.  Did they try, e.g., building a neural network on it,
or using fully multivariate methods (ordinary stats can handle up to
5-10 variables)?

This is really an extension of David's questions below.  It would be
very interesting to add variables to the problem (if possible) until the
observed correlations resolve (in sufficiently high dimensionality) into
something significantly predictive.  That would be VERY useful.
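
A minimal sketch of the distinction drawn above, on purely synthetic data.
Nothing here comes from the paper: the flags `hot` and `realloc`, the
failure probabilities, and the population size are all made-up assumptions,
and the interaction between the flags is deliberately contrived so that each
flag alone shows essentially no first-order correlation with failure while
the two taken together are clearly predictive.

```python
import random
from collections import defaultdict


def pearson(xs, ys):
    """Plain first-order (Pearson) correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def make_drives(n, seed=0):
    """Synthetic population; 'hot' and 'realloc' are hypothetical binary flags."""
    rng = random.Random(seed)
    drives = []
    for _ in range(n):
        hot = rng.random() < 0.5
        realloc = rng.random() < 0.5
        # Contrived assumption: risk is high only when exactly one flag is set,
        # so neither flag alone carries a first-order signal.
        p_fail = 0.40 if hot != realloc else 0.05
        drives.append((int(hot), int(realloc), int(rng.random() < p_fail)))
    return drives


def joint_failure_rates(drives):
    """Failure rate for each (hot, realloc) combination."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [failures, drives]
    for hot, realloc, failed in drives:
        counts[(hot, realloc)][0] += failed
        counts[(hot, realloc)][1] += 1
    return {cell: f / total for cell, (f, total) in counts.items()}


drives = make_drives(20000)
failed = [d[2] for d in drives]
print("corr(hot, failed)     =", round(pearson([d[0] for d in drives], failed), 3))
print("corr(realloc, failed) =", round(pearson([d[1] for d in drives], failed), 3))
for cell, rate in sorted(joint_failure_rates(drives).items()):
    print("hot=%d realloc=%d  failure rate=%.3f" % (cell[0], cell[1], rate))
```

With real SMART data the cells would have to come from binned, continuous
attributes, and the number of cells grows quickly with each added variable,
which is where a regression or neural-network fit would take over from a
simple table like this one.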

    rgb


Is there any info for failure rates versus type of main bearing
in the drive?

Failure rate versus any other implementation technology?

Failure rate vs. drive speed (RPM)?

Or to put it another way, is there anything to indicate which
component designs most often result in the eventual SMART
events (reallocation, scan errors) and then, ultimately, drive
failure?

Failure rates versus rack position?  I'd guess no effect here,
since that would mostly affect temperature, and there was
little temperature effect.

Failure rates by data center?  (Are some of your data centers
harder on drives than others?  If so, why?)  Are there air
pressure and humidity measurements from your data centers?
Really low air pressure (as at observatory height)
is a known killer of disks; it would be interesting if lesser
changes in air pressure also had a measurable effect.  Low
humidity cranks up static problems, high humidity can result
in condensation.  Again, what happens with values in between?
Are these effects quantifiable?

Regards,

David Mathog
[EMAIL PROTECTED]
Manager, Sequence Analysis Facility, Biology Division, Caltech
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:[EMAIL PROTECTED]


