On Jun 8, 2009, at 12:55 AM, Joe Landman wrote:
Lawrence Stewart wrote:
[...]
Yup, this is it, but on the fly is the hard part. Doing this
comparison is computationally very expensive. The hash
calculations are not cheap by any measure. You most decidedly do
not wish to do this on the fly ...
The assumption of a high performance disk/file system is implicit
here.
And for that, it's all about the trade-off among storage space,
retrieval time, and the computational effort to run the algorithm.
Exactly.
I think the hash calculations are pretty cheap, actually. I just
timed sha1sum on a 2.4 GHz core2 and it runs at 148 Megabytes per
second, on one core (from the disk cache). That is substantially
faster than the disk transfer rate. If you have a parallel
filesystem, you can parallelize the hashes as well.
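
(For reference, here is roughly how such a measurement can be
reproduced; this is a Python stand-in for the sha1sum run described
above, and the 256 MiB in-memory buffer is my choice for illustration,
not the exact test that was run.)

    import hashlib
    import os
    import time

    # Hash an in-memory buffer so we measure the hash itself, not the
    # disk (the sha1sum test above read from the disk cache for the
    # same reason).
    buf = os.urandom(256 * 1024 * 1024)   # 256 MiB of random data

    t0 = time.time()
    hashlib.sha1(buf).hexdigest()
    elapsed = time.time() - t0

    print("SHA-1 throughput: %.1f MB/s" % (len(buf) / elapsed / 1e6))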
Disk transfer rates are 100-120 MB/s these days. For high
performance local file systems (N * disks), the data rates you
showed for sha1 won't cut it, especially if multiple hash signatures
are computed in order to avoid hash collisions (a la MD5 et al.).
The probability that two different hash functions both collide on the
same pair of distinct blocks is somewhat smaller than the probability
that a single hash function collides on some pair of distinct blocks,
so you compute two or more hashes in different ways to reduce the
probability of a collision.
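
(Something like the following, I take it; a minimal sketch of
computing two signatures per block, where the 4 KiB block size and the
SHA-1 plus MD5 pairing are illustrative choices, not a claim about any
particular product.)

    import hashlib

    BLOCK = 4096   # 4 KiB blocks, chosen for illustration

    def block_signature(block: bytes) -> bytes:
        # Concatenate two different digests; a false duplicate now
        # requires SHA-1 and MD5 to collide on the same pair of blocks.
        return hashlib.sha1(block).digest() + hashlib.md5(block).digest()

    sig_a = block_signature(b"\x00" * BLOCK)
    sig_b = block_signature(b"\x01" * BLOCK)
    print(len(sig_a), "bytes per signature;",
          "distinct" if sig_a != sig_b else "collision")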
For laughs, I just tried this on our lab server. We have a 500 MB/s
file system attached to it, so we aren't so concerned about data IO
bottlenecks.
Actual measurements and careful analysis beat handwaving every time :-)
My assumptions were a bit different, I guess: I was still figuring
50 MB/s per spindle, and supposing that <clients> compute the hashes,
rather than the servers running the disks. If that could be arranged,
then the cores available to compute hashes scale with the number of
clients.
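
(A toy version of that arrangement, with a local process pool standing
in for the clients; the 4 KiB block size and the pool layout are made
up for illustration.)

    import hashlib
    from multiprocessing import Pool

    BLOCK = 4096   # 4 KiB blocks, chosen for illustration

    def hash_block(block: bytes) -> str:
        return hashlib.sha1(block).hexdigest()

    if __name__ == "__main__":
        # Each worker process plays the role of one client hashing the
        # blocks it is about to write, so hash throughput scales with
        # the number of clients rather than with the file servers.
        blocks = [bytes([i % 256]) * BLOCK for i in range(1000)]
        with Pool() as pool:
            digests = pool.map(hash_block, blocks)
        print(len(set(digests)), "distinct blocks out of", len(blocks))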
In any case, I am <not> arguing for deduplication for high performance
filesystems; I do think it is a decent idea for backups, though.
Regarding hash collisions, the relevant math is the "birthday problem"
I think. If the hash values are uniformly distributed, as they should
be, then the probability of a collision rises to about 1/2 when the
number of blocks reaches the square root of the size of the value
space. So you would have about a 50% chance of a collision <somewhere>
if you have 4 billion blocks (32 bits) and are using 64 bit hashes.
If multiple hashes are independent, then you get to add their sizes
(in bits) before taking the square root. 256 bit hashes ought to give
negligible odds of a collision up to 64 bits worth of blocks, where
"negligible" means much less than other sources of permanently lost
data.
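
(The back-of-the-envelope version of those numbers, using the usual
birthday approximation p ~= n^2 / 2^(b+1) for n blocks and a b-bit
hash; nothing here beyond that formula.)

    # Birthday-bound estimate of a collision somewhere among n blocks
    # hashed to b bits: p ~= n^2 / 2^(b+1)  (a slight overestimate as
    # p approaches 1).
    def collision_prob(n_blocks, hash_bits):
        return min(1.0, n_blocks ** 2 / 2.0 ** (hash_bits + 1))

    print(collision_prob(2 ** 32, 64))    # ~0.5: 4 billion blocks, 64 bit hash
    print(collision_prob(2 ** 64, 256))   # ~1e-39: 2^64 blocks, 256 bit hash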
However, it might make more sense from a system design perspective to
use the hash as a hint, and to actually compare the data. This would
force a random block read to confirm every duplicate. Hmm, let's
convert sequential writes into random reads...
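
(In sketch form, what I mean by using the hash as a hint; the
in-memory index and list-backed block store are stand-ins invented for
illustration, and the point is the extra read on every duplicate hit.)

    import hashlib

    index = {}   # digest -> offset of a previously stored block
    store = []   # stand-in for the block store on disk

    def write_block(block: bytes) -> int:
        h = hashlib.sha1(block).digest()
        if h in index:
            offset = index[h]
            # The hash match is only a hint: re-read the candidate
            # block and byte-compare it.  On a real system this is the
            # random read that every duplicate write has to pay for.
            if store[offset] == block:
                return offset            # genuine duplicate, nothing written
        store.append(block)              # new block (or a hash collision)
        index[h] = len(store) - 1
        return len(store) - 1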
The compression ratio of 4K blocks to 16 byte hashes is also suspect:
this is 256 to one, and the incremental cost ratio of disk and RAM is
not much different. So keeping the hashes in RAM is probably too
expensive.
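
Concretely, one 16 byte hash per 4 KiB block works out to roughly 4 GB
of RAM per terabyte of deduplicated disk (a quick sketch of the
arithmetic, using the block and hash sizes above):

    TB = 10 ** 12
    BLOCK_SIZE = 4096   # 4 KiB blocks
    HASH_SIZE = 16      # 16 byte hash per block

    def ram_for_hashes(disk_bytes):
        # RAM needed to keep one hash per block resident in memory.
        return disk_bytes // BLOCK_SIZE * HASH_SIZE

    for tb in (1, 10, 100):
        print("%4d TB of disk -> %6.1f GB of hashes" %
              (tb, ram_for_hashes(tb * TB) / 1e9))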
So in summary, deduplication is messy and complicated, has bad
performance, and has uncertain economics for HPC. Let's not do it.
-L
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf