----- "Joe Landman" <land...@scalableinformatics.com> wrote:
> 2) Scrub early, scrub often.

As long as you don't have IBM gear, that is. There, what appears to be a firmware issue somewhere (possibly on the disks themselves) can make the LSI RAID controller they rebadge think that up to 12 drives have failed in the space of a few minutes. Of course none of them really have failed, but your RAID60 is still toast, and boy does it take a few years off your life, not to mention days and days to recover from tape. Sigh.

It happens under Debian (with a mainline kernel) and under CentOS with its stock kernel (we copied over the scrub script that Debian packages), but of course IBM wouldn't take any notice until we could reproduce it under RHEL. You can trigger a scrub manually through (for example):

    echo check > /sys/block/md0/md/sync_action

We now have another vendor's storage unit and won't think about using the IBM unit in anger until we can confirm that the latest round of firmware updates has solved the problem.

cheers,
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
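
For anyone wanting to script the manual scrub trigger mentioned above, here is a minimal sketch rather than the script Debian actually ships. It assumes Linux md arrays exposed under /sys/block/md*/md/sync_action. The function name scrub_all and its optional sysfs-root argument are my own inventions, added so the loop can be dry-run against a scratch directory. It only requests a check on arrays that report "idle", so it should not interrupt a resync or recovery already in progress.

```shell
#!/bin/sh
# Sketch only: request a "check" scrub on every md array under a sysfs root.
# scrub_all and its root argument are hypothetical names, not part of any package.
scrub_all() {
    root="${1:-/sys}"    # defaults to the real sysfs; pass a scratch dir to dry-run
    for action in "$root"/block/md*/md/sync_action; do
        # Skip an unmatched glob and anything we cannot write to
        [ -w "$action" ] || continue
        state=$(cat "$action")
        if [ "$state" = "idle" ]; then
            # Same write as the manual one-liner above
            echo check > "$action"
            echo "started check on $action"
        else
            echo "skipped $action (currently $state)"
        fi
    done
}
```

Calling scrub_all with no argument operates on the real /sys and needs root. Pointing it at a scratch directory that mimics the block/mdN/md layout lets you see what it would do first.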