>> It might be worth noting that dedup is not intended for high
>> performance file systems ... the cost of computing the hash(es)
>> is(are) huge.
>
> Some file systems do (or claim to do) checksumming for data integrity
> purposes; this seems to me like the perfect place to add the computation
> of a hash - with data in cache (needed for checksumming anyway), the
> computation should be fast.
Filesystems may call it a "checksum" but it's usually a hash. We use a
Jenkins hash, which is fast and a lot better than, say, the TCP checksum.
But it's a lot weaker than an expensive hash. If your dedup is going to
fall back to byte-by-byte comparisons, it could be that a weak hash would
be good enough.

-- greg
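P.S. For illustration only - this is not the actual ZFS code path, and
zlib.adler32 is just a stand-in for a Jenkins-style hash - a dedup lookup
along these lines keys a table on the cheap hash and then confirms any
match byte-by-byte before sharing a block, so a collision in the weak hash
can never corrupt data:

import zlib
from collections import defaultdict

class DedupStore:
    """Toy block dedup: weak hash as index, byte-by-byte verify on match."""
    def __init__(self):
        self.blocks = []                   # stored unique blocks
        self.by_hash = defaultdict(list)   # weak hash -> indices of stored blocks

    def store(self, block: bytes) -> int:
        """Return the index of an identical stored block, adding it if new."""
        h = zlib.adler32(block)            # fast, weak hash (collisions possible)
        for idx in self.by_hash[h]:
            if self.blocks[idx] == block:  # verify byte-by-byte before deduping
                return idx                 # true duplicate: share the existing block
        self.blocks.append(block)          # new data (or a hash collision): store it
        self.by_hash[h].append(len(self.blocks) - 1)
        return len(self.blocks) - 1

store() returns the same index for identical blocks, so callers can bump a
reference count instead of writing the data twice; the expensive part is
only the compare, and only when the cheap hash already matches.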