Nobody has ever reported seeing an MD5 collision 'in the wild'. MD5 is broken, but producing a collision takes a deliberately constructed attack algorithm.

As to cosmic rays: it's a real problem. A recent Google paper reported that some RAM chips have 1 bit error per gigabit per century, while others have that many per hour. I've also seen bit errors on disks. All file systems should use checksums.
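To make the checksum point concrete, here is a minimal Python sketch. It assumes CRC-32 from the standard `zlib` module as an illustrative checksum (real file systems pick their own algorithms), and `flip_bit` is a hypothetical helper written only for this example. It flips one bit in a block, as a cosmic ray might, and shows that the recomputed checksum no longer matches the stored one.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Hypothetical helper: return a copy of data with one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


block = b"some file system block contents" * 100
stored_checksum = zlib.crc32(block)  # computed when the block is written

corrupted = flip_bit(block, 12345)  # simulate a single-bit error
# On read, recompute and compare. CRC-32 detects every single-bit error.
print(zlib.crc32(corrupted) == stored_checksum)  # False
```

A checksum like this only detects corruption; repairing it needs a redundant copy or an error-correcting code.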

Yonik Seeley wrote:
On Tue, Nov 16, 2010 at 9:05 PM, Dennis Gearon <gear...@sbcglobal.net> wrote:
Read up on Wikipedia, but I believe that no hash function is much good above 50%
of the address space it generates.
50% is way too high; collisions will happen long before that.
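A small exact calculation shows how early collisions appear. This Python sketch (function name is hypothetical, written for illustration) multiplies out the classic birthday-problem product and returns the smallest number of uniformly random values from a space of a given size for which a collision becomes at least as likely as not.

```python
def items_for_even_odds(space: int) -> int:
    """Smallest n such that n uniform draws from `space` values collide with prob >= 0.5.

    p_unique tracks the exact probability that the first n draws are all distinct:
    (space - 0)/space * (space - 1)/space * ... * (space - n + 1)/space.
    """
    p_unique = 1.0
    n = 0
    while 1.0 - p_unique < 0.5:
        p_unique *= (space - n) / space
        n += 1
    return n


n = items_for_even_odds(365)
print(n, f"{n / 365:.1%}")  # 23 6.3%
```

With 365 possible values, even odds of a collision arrive after only 23 items, about 6% of the space. The fraction shrinks further as the space grows: for a 32-bit hash it is on the order of 77,000 items out of roughly 4.3 billion values.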

But given that something like MD5 has 128 bits, that's 2^128 ≈ 3.4e38
possible values, so even a small fraction of that address space will work.
The probabilities follow the "birthday problem":
http://en.wikipedia.org/wiki/Birthday_problem

Using a 128-bit hash, you can hash 26B docs with a collision
probability of 10^-18 (and yes, that is lower than the probability of
something else going wrong).
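The 26B figure can be checked with the standard birthday approximation p ≈ 1 - exp(-n^2 / (2N)), where N = 2^128. This is a minimal Python sketch assuming the hash outputs behave like uniformly random values; the function name is hypothetical.

```python
import math


def birthday_collision_probability(n_items: float, hash_bits: int) -> float:
    """Approximate chance that at least two of n_items random hashes collide.

    Uses p ~= 1 - exp(-n^2 / (2N)) with N = 2**hash_bits.
    -expm1(-x) computes 1 - exp(-x) without losing precision when x is tiny.
    """
    space = 2.0 ** hash_bits
    return -math.expm1(-(n_items * n_items) / (2.0 * space))


print(f"{2.0 ** 128:.2e}")                                  # 3.40e+38
print(f"{birthday_collision_probability(26e9, 128):.1e}")   # 9.9e-19
```

Using `math.expm1` matters here: computing `1 - math.exp(-x)` directly for x near 1e-18 would round to 0.0.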

It also says: "For comparison, 10^-18 to 10^-15 is the uncorrectable bit
error rate of a typical hard disk [2]. In theory, MD5, 128 bits,
should stay within that range until about 820 billion documents, even
if its possible outputs are many more."
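Inverting the same birthday approximation gives the number of documents that keeps the collision probability under a target p: n ≈ sqrt(2N * -ln(1 - p)). A minimal Python sketch, again assuming uniformly distributed hash outputs and using a hypothetical function name:

```python
import math


def max_items_for_probability(p_target: float, hash_bits: int) -> float:
    """Approximate largest n whose collision probability stays at or below p_target.

    Inverts p ~= 1 - exp(-n^2 / (2N)):  n ~= sqrt(2N * -ln(1 - p)).
    -log1p(-p) computes -ln(1 - p) accurately for tiny p.
    """
    space = 2.0 ** hash_bits
    return math.sqrt(2.0 * space * -math.log1p(-p_target))


for p in (1e-18, 1e-15):
    print(f"p = {p:.0e}: about {max_items_for_probability(p, 128):.2e} documents")
```

The two bounds come out at roughly 2.6e10 and 8.2e11 documents, in line with the 26 billion and 820 billion figures quoted above.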

-Yonik
http://www.lucidimagination.com
