According to Bruce Allen:
>
> This thread has been evolving, but I'd like to push it back a bit.
> Earlier in the thread you pointed out the CERN study on silent data
> corruption:
>
> http://fuji.web.cern.ch/fuji/talk/2007/kelemen-2007-C5-Silent_Corruptions.pdf
>
Actually, I was not the one who pointed out this study but I can't
remember who did.
> If you are not already doing this, would it be possible for you to run
> fsprobe(8) on your X4500 boxes to see if there are any silent data
> corruption issues there? You have a large enough storage farm to gather
> meaningful statistics.
>
We are not using fsprobe on our X4500s. There are two reasons:
 . ZFS has built-in error detection (through "zpool scrub") and we are
   (maybe naively) relying on this to detect and correct data corruption
   that would otherwise be silent;
 . due to some ZFS limitations (there are some :-), fsprobe does not work
   reliably with ZFS.

I'll try to be as concise as possible on the last point.

To make sure that data are actually written to/read from disk and not
from cache, fsprobe (optionally) uses Direct I/O (buffer cache bypass).
Since Direct I/O is not supported by ZFS, you can't actually be certain
that you're reading from disk and not from the cache (although you can
get "some" guarantee that you actually write to the disk using "data
synchronous" writes, aka O_DSYNC, or the "fsync()" family of POSIX
functions).

Really flushing the cache for ZFS filesystems is intrusive (to say the
least); you need to either:
 . reboot;
 . unmount all ZFS filesystems, then unload the ZFS kernel module(s) and
   start over (reload, remount);
 . export the ZFS pool and import it back.

So, my point is that if you're not reliably reading from disk, you can't
reliably detect disk errors.

The main point (and one of the intents, I guess) of the initial report by
Peter Kelemen (and his boss) was to give very strong incentives to the LHC
software developers to make sure that data files (and more generally all
software that handles LHC data) include ways to check data integrity by
storing/handling checksums, hashes, or error detection and correction
information. That's an absolute requirement for reliable long-term data
storage, since the amount of data planned to be generated by the 4 LHC
experiments is so huge (mind-boggling, actually).
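To illustrate the write-side guarantee versus the read-side gap described above, here is a minimal C sketch of an fsprobe-style write/read-back check. It is not fsprobe's actual code, and `write_verify` and `fill_pattern` are hypothetical names. The sketch writes a known pseudo-random pattern with O_DSYNC, calls fsync(), then reads the file back and compares. Without Direct I/O the read-back on ZFS may well be served from the cache, so a successful comparison does not prove that the data on disk is intact.

```c
#define _XOPEN_SOURCE 700 /* expose open(), O_DSYNC, fsync() even in strict C modes */
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fill buf with a deterministic pattern from seed (32-bit LCG). */
static void fill_pattern(unsigned char *buf, size_t len, uint32_t seed)
{
    uint32_t x = seed;
    for (size_t i = 0; i < len; i++) {
        x = x * 1664525u + 1013904223u;
        buf[i] = (unsigned char)(x >> 24);
    }
}

/*
 * Hypothetical fsprobe-like check (a sketch, not fsprobe's implementation).
 * Returns 0 if the data read back matches, 1 on mismatch,
 * -1 on allocation or I/O error.
 */
int write_verify(const char *path, size_t len, uint32_t seed)
{
    unsigned char *wbuf = malloc(len);
    unsigned char *rbuf = malloc(len);
    size_t off;
    ssize_t n;
    int fd = -1;
    int rc = -1;

    if (wbuf == NULL || rbuf == NULL)
        goto out;
    fill_pattern(wbuf, len, seed);

    /* O_DSYNC: each write() completes with synchronized I/O data integrity. */
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DSYNC, 0644);
    if (fd < 0)
        goto out;
    for (off = 0; off < len; off += (size_t)n) {
        n = write(fd, wbuf + off, len - off);
        if (n <= 0)
            goto out;
    }
    if (fsync(fd) != 0) /* also flush file metadata */
        goto out;
    if (close(fd) != 0) {
        fd = -1;
        goto out;
    }

    /* Read back. Without O_DIRECT (unsupported by ZFS) this may come from cache. */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        goto out;
    for (off = 0; off < len; off += (size_t)n) {
        n = read(fd, rbuf + off, len - off);
        if (n <= 0) /* read error or unexpected end of file */
            goto out;
    }
    rc = (memcmp(wbuf, rbuf, len) == 0) ? 0 : 1;

out:
    if (fd >= 0)
        close(fd);
    free(wbuf);
    free(rbuf);
    return rc;
}
```

For example, `write_verify("/tmp/probe.bin", 1 << 20, 42)` should return 0 when the 1 MiB pattern reads back intact. Deriving the pattern from a seed is an assumption of this sketch. It means a later check could regenerate the expected content instead of storing it.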
The estimated data production rate is almost 1 petabyte per month
(about 10 PB/year).

Regarding statistics, we plan to collect "zpool scrub" results and SMART
statistics on all our X4500s, but it's not done yet.

Loïc.
-- 
| Loïc Tortay - IN2P3 Computing Centre |
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf