Have you looked at: http://wiki.apache.org/solr/Deduplication

Best
Erick

On Wed, Sep 15, 2010 at 4:58 AM, hellboy <pbon...@googlemail.com> wrote:

>
> Is it possible to rewrite this code in Python:
>
> private static String getFuzzyHashing(MediaUnit unit) {
>     TextProfileSignature tps = new TextProfileSignature();
>     // initialise with empty parameters to force default values of
>     // TextProfileSignature attributes
>     tps.init(SolrParams.toSolrParams(new NamedList<String>()));
>
>     // The following lines are copied from the
>     // SignatureUpdateProcessorFactory Solr class
>     tps.add("text");
>     tps.add(SolrComponent.extractTextFromResource(unit));
>     byte[] signature = tps.getSignature();
>     char[] arr = new char[signature.length << 1];
>     for (int i = 0; i < signature.length; i++) {
>         int b = signature[i];
>         int idx = i << 1;
>         arr[idx] = StrUtils.HEX_DIGITS[(b >> 4) & 0xf];
>         arr[idx + 1] = StrUtils.HEX_DIGITS[b & 0xf];
>     }
>     return new String(arr);
> }
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-install-DuplicatesDetectorService-tp1472561p1478526.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
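
For the Python side of the question, the loop at the end of the quoted Java method only hex-encodes the signature bytes, and that part ports directly. Below is a minimal sketch, assuming the signature is already available as a Python bytes object and that StrUtils.HEX_DIGITS holds lowercase digits. The name to_hex is made up for illustration, and the MD5 digest is only a stand-in input: this does not reimplement TextProfileSignature, so it will not by itself produce the same value as Solr.

```python
import hashlib

# Assumed to mirror StrUtils.HEX_DIGITS (lowercase hex digits).
HEX_DIGITS = "0123456789abcdef"


def to_hex(signature: bytes) -> str:
    """Hex-encode a byte signature: two characters per byte, high nibble first."""
    chars = []
    for b in signature:
        # Python bytes are unsigned (0..255); the masks are kept only to
        # mirror the Java loop, where byte values are signed.
        chars.append(HEX_DIGITS[(b >> 4) & 0xF])
        chars.append(HEX_DIGITS[b & 0xF])
    return "".join(chars)


# Stand-in signature: a plain MD5 digest, NOT Solr's TextProfileSignature.
signature = hashlib.md5("some document text".encode("utf-8")).digest()
encoded = to_hex(signature)

# The built-in bytes.hex() gives the same result, so the loop is optional.
assert encoded == signature.hex()
```

In practice signature.hex() is enough. The hard part of a port would be reproducing the token profile that TextProfileSignature builds before hashing, which is not shown here.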