On 09/16/13 17:36, Raimo Niskanen wrote:
On Mon, Sep 16, 2013 at 02:25:58PM +0000, Christian Weisgerber wrote:
Raimo Niskanen <[email protected]> wrote:
A similar application is the Git version control system, which is
based on the assumption that all content blobs can be uniquely
described by their 128-bit SHA1 hash value.
^^^^^^^^^^^^^^^^^
... 160-bit SHA1 hash...
Oh dear. That makes the collision probability about 10^-48, beating the
hard drive's 10^-15 by a factor of 10^33. For SHA1. Thank you for
correcting me.
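The arithmetic behind those figures can be checked with a few lines of
Python. This is only a sketch: it assumes SHA1 behaves like an ideal
160-bit random function, looks at a single pair of distinct blobs, and
takes the 10^-15 hard drive figure from this thread as given. The
variable names are made up for illustration.

```python
from fractions import Fraction

SHA1_BITS = 160

# Chance that two given, distinct blobs share a hash, assuming an
# ideal 160-bit hash (an assumption, not a property proven for SHA1).
p_collision = Fraction(1, 2 ** SHA1_BITS)

# Hard drive error figure as quoted in the thread.
p_disk_error = Fraction(1, 10 ** 15)

# How many times less likely the hash collision is.
ratio = p_disk_error / p_collision

print(f"SHA1 pair collision probability: {float(p_collision):.2e}")
print(f"disk error / collision ratio:    {float(ratio):.2e}")
```

Both results land in the same orders of magnitude as quoted above:
just under 10^-48 for the collision and just over 10^33 for the ratio.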
Leaving the internals of rsync aside (of which I assume much but *know*
little), if I consider two 4TB blobs to be equal just because they have
the same SHA1 hash, I can easily see myself ending up in one of these
conditions (but not both):
- Just saved myself 4TB of disk space, time, bandwidth usage and
whatnot. Most likely this will always be the case.
*OR*
- Just fucked up a bloody shitload of data.
This is not comparable to a flipped bit on a hard drive. Sure, it does
not make up for the 10^33 difference, but it puts it in perspective, and
makes me a tad hesitant to run some dedup scheme on my backups.
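One way to get around that hesitation is to treat a matching hash as a
hint only, and compare the actual bytes before throwing anything away.
The sketch below is not how rsync or any particular dedup tool works;
sha1_of and safe_to_dedup are hypothetical helper names, and the 1 MiB
chunk size is an arbitrary choice.

```python
import filecmp
import hashlib


def sha1_of(path, chunk_size=1 << 20):
    """Return the SHA1 hex digest of a file, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def safe_to_dedup(path_a, path_b):
    """True only if the hashes match AND the contents are identical."""
    if sha1_of(path_a) != sha1_of(path_b):
        return False  # different hashes: definitely different content
    # Same hash: confirm byte for byte instead of trusting the hash.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

The obvious cost is that both blobs get read in full, so this keeps the
disk space savings but gives up most of the time and bandwidth savings.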
/Alexander