Package: ssdeep
Version: 2.7-2
Severity: important
Tags: patch

Dear Maintainer,

ssdeep (and the libfuzzy2 Debian package) before version 2.10 has a bug
that may produce a wrong score when comparing two fuzzy hashes with the
same block size. This makes clustering/comparing files unreliable.

This bug was fixed in 2.10 by Jesse Kornblum
<resea...@jessekornblum.com>, but it is still not fixed in the Debian
versions (sid/unstable and stable).
I encountered this bug while clustering about 10M files based on ssdeep
hashes and I had to recluster all the files.

Sorry that I have no `natural' examples to reproduce this (I slightly
changed a parameter after building patched versions of ssdeep/libfuzzy2
2.7-2, and comparing the clusters would take about 2 months on 20 CPU
cores), but we can generate an `artificial' example by truncating the
second chunk of the fuzzy hashes.




$ # Generate artificial test cases
$ cat >test <<_END
ssdeep,1.1--blocksize:hash:hash,filename
24:5nmkHuww9FXe0ZpPKoVH7bK3KT1Odk8gKgNWvoqzDVEatXSHlY31x:E4uV9FX,"1"
24:5nmkHuww9FXe0ZpPKoVH7bK3KT1Odk8gKgNWvoqzDVENXSCYA1x:E4uV9FX,"2"
_END

$ # This is the expected result.
$ $SSDEEP_FIXED/ssdeep -k test -x test
test:1 matches test:2 (100)
test:1 matches test:2 (100)

test:2 matches test:1 (100)
test:2 matches test:1 (100)

test:1 matches test:2 (100)
test:1 matches test:2 (100)

test:2 matches test:1 (100)
test:2 matches test:1 (100)

$ # This is the result from Debian versions of ssdeep.
$ ssdeep -k test -x test
test:1 matches test:2 (94)
test:1 matches test:2 (94)

test:2 matches test:1 (94)
test:2 matches test:1 (94)

test:1 matches test:2 (94)
test:1 matches test:2 (94)

test:2 matches test:1 (94)
test:2 matches test:1 (94)

$



As you can see, the buggy ssdeep/libfuzzy2 returns a score of 94, but
the fixed versions of ssdeep/libfuzzy2 return a score of 100 in these cases:

* file 1 and file 2
* file 1 and file 1 (matching itself)
* file 2 and file 2 (matching itself)

The attached patch is an excerpt from Jesse Kornblum's actual patch
(applied in ssdeep 2.10), adapted to the Debian version 2.7-2.


By the way, I recommend UPGRADING THE UPSTREAM VERSION TO 2.10 on
`unstable' (`sid') instead of applying the patch, because ssdeep
version 2.10 also fixes some other bugs (which I have not encountered,
but others may).

Thanks, and I hope this will be fixed before `Jessie' is frozen.

Tsukasa OI
http://a4lg.com/


-- System Information:
Debian Release: 7.6
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-4-amd64 (SMP w/40 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages ssdeep depends on:
ii  libc6  2.13-38+deb7u4

ssdeep recommends no packages.

ssdeep suggests no packages.

-- no debconf information

diff --git a/fuzzy.c b/fuzzy.c
index a9b771c..bcdef56 100644
--- a/fuzzy.c
+++ b/fuzzy.c
@@ -584,7 +584,7 @@ int fuzzy_compare(const char *str1, const char *str2)
   if (block_size1 == block_size2) {
     uint32_t score1, score2;
     score1 = score_strings(s1_1, s2_1, block_size1);
-    score2 = score_strings(s1_2, s2_2, block_size2);
+    score2 = score_strings(s1_2, s2_2, block_size1*2);
 
     //    s->block_size = block_size1;
 
