On Tue 23 Feb 2016 at 16:04:37 (+0100), Nicolas George wrote:
> On quintidi 5 Ventôse, year CCXXIV, David Wright wrote:
> > Any faster ones that you recommend from the lists below? (I've rolled
> > my own implementation of fdupes (which uses MD5) in python.)
>
> Nobody can recommend anything without knowing the intended use.

I don't understand. The intended use is in the previous line:
"my own implementation of fdupes (which uses MD5) in python"
and also in Thomas's statement:
"we want to use it for identifying files in benevolent environments"
(which you snipped).

So to summarise:

1) I do what fdupes does, i.e. identify files (in a benevolent
   environment) using the MD5 signature to detect duplicate contents.

2) In view of your statement that faster hashes exist, I would like
   to explore replacing my use of MD5 with such a hash.

3) Python's own hashlib library offers the hashes listed below, either
   "available on this platform" or "guaranteed on all python
   platforms". Using one of these makes altering my program easier
   than having to find out how to call an external hashing program
   (and the calls might slow things back down again).

4) As you're far more familiar with hashing than I am (and than many
   people here), would you have any recommendations from these two
   lists?

Python 3.4.2 (default, Oct 8 2014, 13:14:40)
...
>>> hashlib.algorithms_guaranteed
{'md5', 'sha1', 'sha224', 'sha512', 'sha384', 'sha256'}
>>> hashlib.algorithms_available
{'MD4', 'md5', 'md4', 'sha1', 'MD5', 'dsaWithSHA', 'whirlpool', 'sha',
 'SHA512', 'SHA256', 'ripemd160', 'sha512', 'SHA384', 'sha384',
 'dsaEncryption', 'RIPEMD160', 'sha256', 'SHA224', 'SHA1',
 'ecdsa-with-SHA1', 'DSA', 'SHA', 'sha224', 'DSA-SHA'}
>>>

I hope that explains things better than my previous attempt.

Cheers,
David.
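
To make points 1) to 3) concrete, here is a minimal sketch of duplicate detection with a swappable hashlib algorithm, plus a crude timing helper for comparing candidates. It is not the program discussed in this thread: the names file_digest, find_duplicates and time_algorithms are hypothetical, and grouping files by size before hashing is an assumption of this sketch. Only the six algorithms from the "guaranteed" list above are timed, and the timings depend heavily on the machine and the OpenSSL build.

```python
import hashlib
import os
import time
from collections import defaultdict

# The six names from the "guaranteed" list quoted above.
CANDIDATES = ["md5", "sha1", "sha224", "sha256", "sha384", "sha512"]


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root, algorithm="md5"):
    """Return lists of paths under root whose contents hash identically.

    Files are grouped by size first (an assumption of this sketch), so
    only files that share a size with another file are hashed at all.
    """
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_digest = defaultdict(list)
        for path in paths:
            by_digest[file_digest(path, algorithm)].append(path)
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return groups


def time_algorithms(names, data, repeats=3):
    """Return {name: best wall-clock seconds} for hashing data once."""
    results = {}
    for name in names:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            hashlib.new(name, data).hexdigest()
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


if __name__ == "__main__":
    sample = os.urandom(4 * 1024 * 1024)  # 4 MiB of random bytes
    timings = time_algorithms(CANDIDATES, sample)
    for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
        print("{:8s} {:.4f} s".format(name, seconds))
```

Switching algorithm is then a one-argument change, e.g. find_duplicates("/some/dir", "sha1"). Whether any of these beats MD5 on a given machine is for the timing helper to show rather than something to assume.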