Have you looked at:
http://wiki.apache.org/solr/Deduplication
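
For the last part of the quoted method, the loop that turns the signature bytes into a hex string, a Python port is short. This is only a sketch: `to_hex` is a name made up for illustration, and it assumes `StrUtils.HEX_DIGITS` is the lowercase table 0-9a-f. It does not port `TextProfileSignature` itself; that class would still have to be reimplemented or run inside Solr.

```python
HEX_DIGITS = "0123456789abcdef"  # assumed to match StrUtils.HEX_DIGITS


def to_hex(signature: bytes) -> str:
    """Hex-encode signature bytes, two characters per byte, high nibble first."""
    chars = []
    for b in signature:  # iterating over bytes yields ints in 0..255
        chars.append(HEX_DIGITS[(b >> 4) & 0xF])
        chars.append(HEX_DIGITS[b & 0xF])
    return "".join(chars)


if __name__ == "__main__":
    print(to_hex(bytes([0x00, 0x0F, 0xA5, 0xFF])))
```

The built-in `bytes.hex()` gives the same result, so `signature.hex()` is enough in practice.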

Best
Erick

On Wed, Sep 15, 2010 at 4:58 AM, hellboy <pbon...@googlemail.com> wrote:

>
> Is it possible to rewrite this code in Python?
>
> private static String getFuzzyHashing(MediaUnit unit) {
>                TextProfileSignature tps = new TextProfileSignature();
>                // initialise with empty parameters to force default values of
>                // TextProfileSignature attributes
>                tps.init(SolrParams.toSolrParams(new NamedList<String>()));
>
>                // The following lines are copied from the
>                // SignatureUpdateProcessorFactory Solr class
>                tps.add("text");
>                tps.add(SolrComponent.extractTextFromResource(unit));
>                byte[] signature = tps.getSignature();
>                char[] arr = new char[signature.length << 1];
>                for (int i = 0; i < signature.length; i++) {
>                        int b = signature[i];
>                        int idx = i << 1;
>                        arr[idx] = StrUtils.HEX_DIGITS[(b >> 4) & 0xf];
>                        arr[idx + 1] = StrUtils.HEX_DIGITS[b & 0xf];
>                }
>                return new String(arr);
>        }
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-install-DuplicatesDetectorService-tp1472561p1478526.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
