[OK, I thought my previous message would be the last, but *this* is Greg Lindahl!]
According to Greg Lindahl:
> On Sat, Sep 15, 2007 at 10:37:03AM +0200, Loic Tortay wrote:
>
> > Therefore, we are not running fsprobe on our X4500s since it is
> > actually less useful than "zpool scrub" for detecting corruptions or
> > problems on data.
>
> .. how does zpool scrub double-check that zpool scrub is working?
>
How does fsprobe double-check that fsprobe is working?

People, please go read Peter Kelemen's slides (or watch his presentation),
and see that sometimes he was unable to see the corruptions reported by
fsprobe.

Each and every non-trivial piece of software (and hardware) has bugs (ZFS
certainly does, and most probably fsprobe too). That's a fact of life, just
like silent data corruption (to paraphrase Peter's slides).

How come it's somehow normal to express scepticism about ZFS but not about
fsprobe? I dare say that I am sceptical of both.

>
> The
> point of extra user-run testing is often to make sure that your vendor
> did not screw up. Of course, you are welcome to not follow advice,
> good or bad.
>
The point of extra user testing with fsprobe is moot since fsprobe provides
no *cheap and useful* extra user testing *in my environment*. We already
have extra user testing built into most of the applications (so *that* is
essentially free).

The fact that fsprobe doesn't find corruption doesn't mean that corruption
is not happening on the other parts of the system.

To some extent, if the initial burn-in testing does not find such problems,
that is a clue that the burn-in process is insufficient (think of it like
regression testing: "we've seen this class of problems, now we check for
it").

>
> Several people have commented that fsprobe doesn't check existing files.
> For your system binaries, you can test them using rpm -V.
>
That is, in my opinion, the point of the initial report by Peter Kelemen
et al. (which eventually became the slides and is now generating that buzz,
here and even on LKML and Slashdot): the applications have to generate and
check data integrity information.

>
> My new startup is planning on using md5 everywhere to provide an
> end-to-end check.
>
Again, that is the only sane and useful answer to (silent) data corruption:
include and check data integrity information (end-to-end).

In my environment, a significant portion of the data already includes data
integrity information (sometimes as a by-product of data compression). We
have detected otherwise silent data corruption through this several times
in the past (without fsprobe and ZFS).


Loïc.
--
| Loïc Tortay <[EMAIL PROTECTED]> - IN2P3 Computing Centre |
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf