Package: ssdeep
Version: 2.7-2
Severity: important
Tags: patch

Dear Maintainer,

ssdeep (and the libfuzzy2 Debian package) before version 2.10 has a bug that can produce a wrong score when comparing two fuzzy hashes with the same block size. This makes clustering and comparing files unreliable. The bug was fixed upstream in 2.10 by Jesse Kornblum <resea...@jessekornblum.com>, but it is still not fixed in the Debian versions (sid, unstable and stable).

I encountered this bug while clustering about 10M files based on ssdeep hashes, and I had to recluster all the files. Sorry that I have no `natural' examples to reproduce it (I slightly changed the parameters after building patched versions of ssdeep/libfuzzy2 2.7-2, and comparing the clusters would take about 2 months on 20 CPU cores), but we can generate an `artificial' example by truncating the second chunk of the fuzzy hashes.

$ # Generate artificial test cases
$ cat >test <<_END
ssdeep,1.1--blocksize:hash:hash,filename
24:5nmkHuww9FXe0ZpPKoVH7bK3KT1Odk8gKgNWvoqzDVEatXSHlY31x:E4uV9FX,"1"
24:5nmkHuww9FXe0ZpPKoVH7bK3KT1Odk8gKgNWvoqzDVENXSCYA1x:E4uV9FX,"2"
_END
$ # This is the expected result.
$ $SSDEEP_FIXED/ssdeep -k test -x test
test:1 matches test:2 (100)
test:1 matches test:2 (100)
test:2 matches test:1 (100)
test:2 matches test:1 (100)
test:1 matches test:2 (100)
test:1 matches test:2 (100)
test:2 matches test:1 (100)
test:2 matches test:1 (100)
$ # This is the result from Debian versions of ssdeep.
$ ssdeep -k test -x test
test:1 matches test:2 (94)
test:1 matches test:2 (94)
test:2 matches test:1 (94)
test:2 matches test:1 (94)
test:1 matches test:2 (94)
test:1 matches test:2 (94)
test:2 matches test:1 (94)
test:2 matches test:1 (94)
$

As you can see, the buggy ssdeep/libfuzzy2 returns a score of 94, but the fixed version returns a score of 100 for these cases:

* file 1 and file 2
* file 1 and file 1 (matching itself)
* file 2 and file 2 (matching itself)

The attached patch is an excerpt from Jesse Kornblum's actual patch (applied in ssdeep 2.10), formatted for the Debian version 2.7-2.

By the way, I recommend UPGRADING THE UPSTREAM VERSION TO 2.10 in `unstable' and `sid' instead of applying the patch, because ssdeep 2.10 fixes some other bugs (I did not encounter them, but someone else may).

Thanks, and I hope this will be fixed before `Jessie' is frozen.

Tsukasa OI
http://a4lg.com/

-- System Information:
Debian Release: 7.6
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-4-amd64 (SMP w/40 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages ssdeep depends on:
ii  libc6  2.13-38+deb7u4

ssdeep recommends no packages.

ssdeep suggests no packages.

-- no debconf information

diff --git a/fuzzy.c b/fuzzy.c
index a9b771c..bcdef56 100644
--- a/fuzzy.c
+++ b/fuzzy.c
@@ -584,7 +584,7 @@ int fuzzy_compare(const char *str1, const char *str2)
   if (block_size1 == block_size2) {
     uint32_t score1, score2;
     score1 = score_strings(s1_1, s2_1, block_size1);
-    score2 = score_strings(s1_2, s2_2, block_size2);
+    score2 = score_strings(s1_2, s2_2, block_size1*2);
     //    s->block_size = block_size1;