On 02/25/2016 03:07 PM, Stefan Monnier wrote:
>> MD5 alone can be somewhat dangerous even in benevolent environments: if the
>> data sets are large enough or you are just unlucky, you are going to hit a
>> collision and corrupt-or-lose-data-on-dedup sooner or later.
>
> [G]it doesn't seem worried about this. Admittedly, they use sha1 rather
> than md5, so they have 160bit instead of 128bit, with a correspondingly
> lower probability of collisions, but I'd be interested to know about
> cases where md5 led to accidental collisions.

Well, I wouldn't necessarily use that as a benchmark: git could have used
SHA-256 from the start - it's not like SHA-2 is something brand new, it was
already 4 years old when git was developed.

I haven't heard of any _accidental_ collision of either MD5 or SHA1 so far,
but I might be mistaken. (There are of course famous intentional collisions
in MD5, see <http://www.mscs.dal.ca/~selinger/md5collision/>.)

From a mathematical standpoint: if we assume that the values a hash may
produce are uniformly distributed and cover the entire range of possible
outputs, then by the birthday paradox an accidental collision becomes likely
(roughly a coin flip) once you have hashed on the order of 2^(bitsize/2)
inputs; for MD5 that is about 2^64, for SHA1 about 2^80. With far fewer
inputs than that, the probability is correspondingly tiny.

Whether you can live with that is up to you.

Regards,
Christian
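
P.S. For anyone who wants to plug in their own numbers, here is a rough
sketch of the birthday estimate above. It assumes an idealized hash with
uniformly distributed outputs and uses the standard approximation
p ~= 1 - exp(-n(n-1) / 2^(bits+1)). The function names are my own and purely
illustrative; this says nothing about deliberately constructed collisions.

```python
import math


def collision_probability(n_inputs: int, bits: int) -> float:
    """Approximate chance of at least one accidental collision among
    n_inputs values of an idealized, uniformly distributed bits-bit hash."""
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1)).
    exponent = -n_inputs * (n_inputs - 1) / 2 ** (bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


def inputs_for_probability(p: float, bits: int) -> float:
    """Approximate number of inputs needed to reach collision probability p."""
    # Inverse of the approximation above: n ~= sqrt(2 * 2^bits * ln(1/(1-p))).
    return math.sqrt(2.0 * 2.0 ** bits * -math.log1p(-p))


if __name__ == "__main__":
    for name, bits in (("MD5", 128), ("SHA1", 160)):
        at_bound = collision_probability(2 ** (bits // 2), bits)
        half = inputs_for_probability(0.5, bits)
        print(f"{name}: p at 2^{bits // 2} inputs = {at_bound:.3f}, "
              f"50% at about 2^{math.log2(half):.2f} inputs")
```

Under these assumptions the probability at exactly 2^(bitsize/2) inputs comes
out at about 0.39, and the 50% mark sits at roughly 1.18 * 2^(bitsize/2), so
"on the order of 2^(bitsize/2)" is the right way to read the bound. For a
more realistic 2^32 files hashed with MD5, the estimate is far below 1e-18.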