On Fri, Jan 23, 2009 at 02:13:02PM -0800, Bill Broadley wrote:

> I've seen little correlation between weight and vibration. After all even the
> built like a tank hardware is still noisy.

If yelling at a RAID array in a noisy data center causes a latency peak,
the drives themselves are obviously susceptible. The cover plate is thin,
after all. Another reason to look forward to SSDs.

> Just a delay between read/write and the answer. Usually there is a timeout,
> after all a completely dead drive might never answer.

Does anyone know whether WDTLER.EXE

  http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery

still works on modern non-RAID-Edition WD Green lines? The price
difference is some 50 EUR for TByte drives.

> Well you don't want the drive hiding the fact that you had to retry 10 times
> to read a sector. Sure smartctl can track this kind of thing, strangely

I should make it a habit to read the SMART trend reports for my drive
population.

> hardware RAID controllers often hide that info from the operating system.
> Basically for a raid you want a yes you have this block or no you don't have a
> block within a fairly low time windows. Especially in the gruesome case of a
> manual rebuild where you don't want the marginal sectors sending your drive
> into la la land preventing you from getting the perfectly healthy blocks off.
>
> It all comes down to it's easier to deal with a sorry, can't get that block
> within 50ms then handle a drive that disappears for 10's of seconds at a time.
>
> The kind of nightmare scenarios I've seen is a 16 disk array bit rot starts,
> the array looks perfect, but of course the number of invisible retries starts
> increasing. If you are using a pathetically old kernel (like say the standard
> RHEL kernel) you don't have ECC scrubbing. Then of course a drive drops, you

Apropos scrubbing, is chipkill worth it? Some AMD systems I've seen
have ECC buffered DIMMs with chipkill.

> go to rebuild, then a 2nd drive hits an error (that has been silent till now).
> Then you are in a position where you want to scan all drives and hope that
> the errors that you find are not aligned with the errors on other drives.
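
To make the "aligned errors" worry concrete, here is a minimal sketch in
Python. It assumes you already have, for each surviving drive, the set of
stripe numbers with an unreadable chunk (for instance from a read-only
scan). The function name, the input layout and the drive names are all
made up for illustration; this is not how any particular RAID
implementation tracks it.

```python
from collections import defaultdict


def unrecoverable_stripes(bad_stripes_by_drive, failed_drive=None, parity=1):
    """Return the stripes that lose more chunks than parity can rebuild.

    bad_stripes_by_drive: dict mapping a drive name to the set of stripe
        numbers where that drive has an unreadable chunk (hypothetical
        input, e.g. gathered by scanning each surviving drive).
    failed_drive: name of a drive that is gone completely, if any; it
        counts as one lost chunk in every stripe.
    parity: lost chunks per stripe the array tolerates
        (1 for RAID 5 style, 2 for RAID 6 style).
    """
    losses = defaultdict(int)
    for drive, stripes in bad_stripes_by_drive.items():
        if drive == failed_drive:
            continue  # the dead drive is accounted for below
        for stripe in stripes:
            losses[stripe] += 1

    # A fully failed drive costs one chunk in every stripe.
    base = 1 if failed_drive is not None else 0
    return sorted(s for s, n in losses.items() if n + base > parity)


# Hypothetical scan results for three surviving drives.
scan = {"sdb": {10, 42}, "sdc": {42, 99}, "sdd": set()}

# No drive dropped yet: only stripe 42 has errors on two drives at once.
print(unrecoverable_stripes(scan))                      # [10 and 99 are fine] -> [42]

# One drive already dropped: every silent error is now fatal for its stripe.
print(unrecoverable_stripes(scan, failed_drive="sda"))  # -> [10, 42, 99]
```

With single parity and one drive already gone, each previously silent
bad sector on any other drive costs a stripe, which is why scrubbing
before a drive drops matters.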
> With RAID edition drives you can do such a rebuild in a reasonable amount of
> time, with desktop drives, even one that is 99% good blocks can lead to very
> high rebuild times.

I'm aware of the problem, and am looking at FreeNAS 0.7 (currently
pre-alpha) with scrubbing and ZFS/RAID-Z for self-healing.

> I'm guessing that when a 120MB/sec consumer drive is providing 20-30MB/sec
> that it's service life is shortened, but I've no numbers to back that up. In
> the same conditions a raid edition drive provided 75MB/sec or so with or
> without vibration.

As another anecdote, I had the 7200.11 TByte line perform awfully on
DB-like tasks, with a lot of issues reported by SMART and failures during
use (one RAID 1 failed to rebuild because the second drive died during
reconstruction).

> Manufacturers are starting to mention the number of drives in a RAID... they
> seem to be differentiating between single drive, 2-4 drive arrays, and larger. ...

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf