I am trying to implement near-duplicate (NearDup) detection in Solr using the SimHash <https://moz.com/devblog/near-duplicate-detection/> algorithm. Let's say that:
1) each document has a field /simhash_signature/ that stores a sequence of bits;
2) to be considered NearDup, two documents must differ in at most 2 bits of /simhash_signature/.
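The NearDup criterion above is a Hamming-distance test: XOR the two signatures and count the 1 bits. A minimal sketch, assuming each signature is an equal-length string of '0'/'1' characters (the function names are my own, hypothetical):

```python
def hamming_distance(sig_a: str, sig_b: str) -> int:
    """Count the bit positions where two equal-length bit strings differ."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    # XOR the integer values; every set bit in the result is a differing position.
    return bin(int(sig_a, 2) ^ int(sig_b, 2)).count("1")


def is_near_dup(sig_a: str, sig_b: str, max_bits: int = 2) -> bool:
    """True if the two signatures differ in at most max_bits bits."""
    return hamming_distance(sig_a, sig_b) <= max_bits


print(hamming_distance("0001000", "1000000"))  # 2
print(is_near_dup("0001000", "0101000"))       # True
print(is_near_dup("0001000", "1111111"))       # False
```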
*My question:* How can I group documents into NearDup sets by /simhash_signature/?

*Example:*
Input:
Doc A = 0001000
Doc B = 1000000
Doc C = 1111111
Doc D = 0101000

Output:
A -> {B, D}
B -> {A}
C -> {}
D -> {A}

--
View this message in context: http://lucene.472066.n3.nabble.com/Grouping-by-simhash-signature-tp4243236.html
Sent from the Solr - User mailing list archive at Nabble.com.
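One way to build these groups without comparing every pair of documents is the pigeonhole trick often used with SimHash (I am assuming it fits this setup): if two signatures differ in at most 2 bits and each is split into 3 blocks, at least one block must be identical in both. In Solr, each block could be indexed as its own string field (hypothetical names such as simhash_block_0, simhash_block_1, simhash_block_2), and candidates for Doc A fetched with an OR of exact matches such as simhash_block_0:000 OR simhash_block_1:10 OR simhash_block_2:00, with the exact 2-bit check done on the returned candidates. The Python sketch below only simulates that index in memory with a dict; the block layout and function names are assumptions, not part of Solr.

```python
from collections import defaultdict


def hamming_distance(sig_a: str, sig_b: str) -> int:
    # Number of positions where two equal-length bit strings differ.
    return sum(ca != cb for ca, cb in zip(sig_a, sig_b))


def split_blocks(sig: str, n_blocks: int) -> list[str]:
    # Split the signature into n_blocks contiguous, nearly equal chunks.
    size, extra = divmod(len(sig), n_blocks)
    blocks, start = [], 0
    for i in range(n_blocks):
        end = start + size + (1 if i < extra else 0)
        blocks.append(sig[start:end])
        start = end
    return blocks


def near_dup_groups(docs: dict[str, str], max_bits: int = 2) -> dict[str, set[str]]:
    n_blocks = max_bits + 1  # pigeonhole: at least one block must match exactly
    # Inverted index (block position, block value) -> doc ids; this stands in
    # for one indexed Solr field per block (hypothetical field layout).
    index = defaultdict(set)
    for doc_id, sig in docs.items():
        for pos, block in enumerate(split_blocks(sig, n_blocks)):
            index[(pos, block)].add(doc_id)

    groups = {}
    for doc_id, sig in docs.items():
        # Candidates share at least one block value at the same position.
        candidates = set()
        for pos, block in enumerate(split_blocks(sig, n_blocks)):
            candidates |= index[(pos, block)]
        candidates.discard(doc_id)
        # Verification: keep only candidates that differ in at most max_bits bits.
        groups[doc_id] = {c for c in candidates
                          if hamming_distance(sig, docs[c]) <= max_bits}
    return groups


docs = {"A": "0001000", "B": "1000000", "C": "1111111", "D": "0101000"}
for doc_id, near in sorted(near_dup_groups(docs).items()):
    print(doc_id, "->", sorted(near))
# A -> ['B', 'D']
# B -> ['A']
# C -> []
# D -> ['A']
```

The verification step is still needed because sharing a block does not guarantee a small distance: Doc B and Doc D share their last block ("00") but differ in 3 bits.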