I found
http://www.jarvana.com/jarvana/browse/org/ow2/weblab/service/solr-duplicates-detector/2.0/
Does anybody know how to install and use this library on an existing Solr instance?
--
OK.
I need to find/prevent duplicates in the database using the Solr index.
I use Django with the Haystack integration.
I use TextProfileSignature to detect near-duplicates in text fields.
My solrconfig.xml contains:
> <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
Is it possible to rewrite this code in Python?
private static String getFuzzyHashing(MediaUnit unit) {
    TextProfileSignature tps = new TextProfileSignature();
    // initialise with empty parameters to force default values of TextProfileSignature attributes
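Before porting, it helps to see what TextProfileSignature actually computes. Below is a minimal standalone Java sketch of the algorithm as I understand it from the Solr (originally Nutch) source. It has no Solr dependency, so it can be translated line by line to Python (for example with `collections.Counter` and `hashlib.md5`). Assumptions: the class name `TextProfileSketch` is hypothetical; the defaults `quantRate=0.01` and `minTokenLen=2` are taken from my reading of Solr's `init()`, so check them against your Solr version; and ties are broken alphabetically, whereas Solr's comparator (as far as I know) only compares counts. The hashes are therefore not guaranteed to match the ones Solr stores.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TextProfileSketch {
    // Assumed defaults, mirroring what Solr's TextProfileSignature appears to use.
    static final float QUANT_RATE = 0.01f;
    static final int MIN_TOKEN_LEN = 2;

    /** Builds the text profile: "token count" lines, most frequent first. */
    public static String profile(String content) {
        Map<String, Integer> counts = new HashMap<>();
        StringBuilder cur = new StringBuilder();
        // A trailing space forces the last token to be flushed inside the loop.
        String text = content + " ";
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                cur.append(Character.toLowerCase(c));
            } else if (cur.length() > 0) {
                // Only tokens strictly longer than MIN_TOKEN_LEN are counted.
                if (cur.length() > MIN_TOKEN_LEN) {
                    counts.merge(cur.toString(), 1, Integer::sum);
                }
                cur.setLength(0);
            }
        }
        int maxFreq = 0;
        for (int n : counts.values()) {
            maxFreq = Math.max(maxFreq, n);
        }
        // Quantization step: a fraction of the top frequency, clamped to 2
        // (or to 1 when no token repeats).
        int quant = Math.round(maxFreq * QUANT_RATE);
        if (quant < 2) {
            quant = (maxFreq > 1) ? 2 : 1;
        }
        List<Map.Entry<String, Integer>> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int q = (e.getValue() / quant) * quant; // round down to a multiple of quant
            if (q >= quant) {                       // drop tokens below the step
                kept.add(Map.entry(e.getKey(), q));
            }
        }
        // Highest count first; ties sorted alphabetically (deviation from Solr).
        kept.sort((x, y) -> x.getValue().equals(y.getValue())
                ? x.getKey().compareTo(y.getKey())
                : y.getValue() - x.getValue());
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : kept) {
            if (sb.length() > 0) {
                sb.append('\n');
            }
            sb.append(e.getKey()).append(' ').append(e.getValue());
        }
        return sb.toString();
    }

    /** MD5 of the profile's UTF-8 bytes, as lowercase hex. */
    public static String signature(String content) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java platform must provide MD5
        }
        byte[] digest = md5.digest(profile(content).getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        String a = "The quick brown fox jumps over the lazy dog. The dog sleeps.";
        String b = "the QUICK brown fox jumps over the lazy dog -- the dog sleeps!";
        System.out.println(profile(a));
        System.out.println(signature(a).equals(signature(b)));
    }
}
```

Running `main` prints `dog 2`, `the 2` and then `true`. Because "the" repeats, the quantization step becomes 2 and every token seen only once is dropped, so both sentences reduce to the same profile. This is also why short fields can collide more easily than one might expect.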