Stuart Midgley wrote:
Thanks for all the responses; it has been interesting reading. We have
started using RAID6 on newer servers and will slowly get rid of our old
RAID5 servers.
I found the comments about scrubbing very interesting. What do people
do with their file systems? We couldn't afford the reduced performance
and time for scrubbing.

Software RAIDs (our DeltaV) are scrubbed once a week. Hardware RAIDs
are also scrubbed once a week. Basically, errors can accumulate.
Scrubbing isn't perfect, and as Michael and others have pointed out,
there can be bugs. But honestly, I am of the opinion that several
hours of scrubbing, with the reduced performance that comes with it, is
a heck of a lot better than dealing with downtime due to an "event".
Scrubbing occurs in the background, and you can limit its impact.

We run our Lustre setup almost flat out all the
time. We regularly do over a PB of I/O in a week (our total throughput
often sits at ~3 GB/s for weeks on end). We use Lustre as our
scratch space, so backups are not possible. Nothing could get the data
off fast enough between us creating, using, and deleting it.
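
As a rough sanity check on the figures quoted above, here is a minimal sketch of the arithmetic. It assumes decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes) and a rate sustained for a full seven days; the function name is made up for illustration. Under those assumptions, ~3 GB/s works out to roughly 1.8 PB per week, which is consistent with "over a PB".

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604800 seconds


def weekly_volume_pb(rate_gb_per_s: float) -> float:
    """Data moved in one week at a sustained rate, in decimal petabytes."""
    bytes_per_week = rate_gb_per_s * 1e9 * SECONDS_PER_WEEK
    return bytes_per_week / 1e15


# Sustained ~3 GB/s for a whole week (assumed constant rate).
print(f"{weekly_volume_pb(3.0):.2f} PB/week")  # prints "1.81 PB/week"
```
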
Of course, the fact that we basically run at 95% full all the time is as
good as scrubbing :)
Not quite. Scrubbing is more of a structured test-and-repair pass.
Ordinary I/O may leave coverage holes, even at 95% capacity.
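
To make "in the background, with limited impact" concrete, here is a minimal sketch of a throttled scrub. It assumes the software RAID is Linux md, which the thread does not actually state, and it uses md's sysfs files `sync_speed_max`, `sync_action` and `mismatch_cnt`. The function names and the `sysfs_root` parameter are hypothetical, added so the sketch can be tried against a scratch directory instead of a live array.

```python
from pathlib import Path


def start_throttled_scrub(array: str, max_kib_per_s: int,
                          sysfs_root: str = "/sys/block") -> None:
    """Start a background consistency check on an md array, rate-capped.

    Assumes Linux md; `array` is a device name such as "md0".
    Writing to the real sysfs files requires root.
    """
    md = Path(sysfs_root) / array / "md"
    # Cap this array's check/resync rate (KiB/s) before starting the scrub.
    (md / "sync_speed_max").write_text(f"{max_kib_per_s}\n")
    # "check" asks md to read the whole array and verify its redundancy.
    (md / "sync_action").write_text("check\n")


def mismatch_count(array: str, sysfs_root: str = "/sys/block") -> int:
    """Return md's mismatch counter (in sectors) from the last check."""
    md = Path(sysfs_root) / array / "md"
    return int((md / "mismatch_cnt").read_text().strip())
```

A weekly scrub like the one described above could call `start_throttled_scrub("md0", 20000)` from cron and read `mismatch_count("md0")` once the check finishes. The 20000 KiB/s cap is an arbitrary example value, not a recommendation.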
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: land...@scalableinformatics.com
web : http://www.scalableinformatics.com
http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing