How to install DuplicatesDetectorService

2010-09-14 Thread hellboy

I found 

http://www.jarvana.com/jarvana/browse/org/ow2/weblab/service/solr-duplicates-detector/2.0/

Does anybody know how to install and use this lib on an existing Solr instance?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-install-DuplicatesDetectorService-tp1472561p1472561.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to install DuplicatesDetectorService

2010-09-15 Thread hellboy

OK.

I need to find/prevent duplicates in a database using the Solr index.

I use Django with Haystack integration.

I use TextProfileSignature for smart duplicate detection in text fields:


solrconfig.xml wrote:
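> 
> <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
>   <bool name="enabled">true</bool>
>   <str name="signatureField">sig</str>
>   <bool name="overwriteDupes">false</bool>
>   <str name="fields">title,description</str>
>   <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
> </processor>
> 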

But there are also some other fields.

How can I calculate the TextProfileSignature value for custom title and
description values on the Django side WITHOUT adding them to the Solr index?

I only need to detect "possible duplicates" for the title and description
entered by the user, i.e. select all records from Solr with
user_sig=TextProfileSignature(user_title,user_description)

Is there a web service interface in Solr to do this?
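
To make the lookup part concrete: once a signature value is known, the select could be a plain query against the sig field. The sketch below only builds the request URL and does not contact Solr. The base URL, the id field and the function name build_duplicate_query are assumptions for illustration, and the query can only match if the sig field is indexed in the schema.

```python
from urllib.parse import urlencode


def build_duplicate_query(signature_hex,
                          base_url="http://localhost:8983/solr/select"):
    """Return a Solr select URL that looks up documents by signature.

    base_url and the 'id' field are assumptions; 'sig' is the
    signatureField named in the quoted solrconfig.xml.
    """
    params = {
        "q": 'sig:"%s"' % signature_hex,   # exact match on the signature field
        "fl": "id,title,description",      # fields to return (assumed names)
        "wt": "json",                      # ask for a JSON response
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_duplicate_query("0123456789abcdef0123456789abcdef"))
```

The resulting URL could then be fetched from the Django side with any HTTP client, and every returned document would be a "possible duplicate" candidate.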


Re: How to install DuplicatesDetectorService

2010-09-15 Thread hellboy

Is it possible to rewrite this code in Python?

private static String getFuzzyHashing(MediaUnit unit) {
    TextProfileSignature tps = new TextProfileSignature();
    // initialise with empty parameters to force default values of
    // TextProfileSignature attributes
    tps.init(SolrParams.toSolrParams(new NamedList()));

    // The following lines are copied from the
    // SignatureUpdateProcessorFactory Solr class
    tps.add("text");
    tps.add(SolrComponent.extractTextFromResource(unit));
    byte[] signature = tps.getSignature();

    // hex-encode the signature, two characters per byte
    char[] arr = new char[signature.length << 1];
    for (int i = 0; i < signature.length; i++) {
        int b = signature[i];
        int idx = i << 1;
        arr[idx] = StrUtils.HEX_DIGITS[(b >> 4) & 0xf];
        arr[idx + 1] = StrUtils.HEX_DIGITS[b & 0xf];
    }
    return new String(arr);
}
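
For discussion, here is a rough Python sketch of what such a port might look like. It is not a line-by-line translation of the Java above: the tokenisation rule, the min_token_len and quant_rate values, the sort order and the use of MD5 are all assumptions about how a text-profile signature works, and it skips the tps.add("text") step. The result should therefore not be expected to equal the sig value that Solr stores. Only the final hex encoding corresponds directly to the loop in the Java method.

```python
import hashlib
import re


def text_profile(text, min_token_len=2, quant_rate=0.01):
    """Build a quantised token-frequency profile (assumed algorithm)."""
    # Lowercase letter/digit runs; tokens of min_token_len or fewer
    # characters are ignored.
    tokens = [t for t in re.findall(r"[^\W_]+", text.lower())
              if len(t) > min_token_len]
    counts = {}
    for tok in tokens:
        counts[tok] = counts.get(tok, 0) + 1
    if not counts:
        return ""

    # Quantisation step derived from the highest frequency.
    max_freq = max(counts.values())
    quant = int(max_freq * quant_rate + 0.5)
    if quant < 2:
        quant = 2 if max_freq > 1 else 1

    profile = []
    for tok, cnt in counts.items():
        cnt = (cnt // quant) * quant      # round down to a multiple of quant
        if cnt >= quant:                  # drop tokens below the quant
            profile.append((tok, cnt))

    # Highest frequency first; ties broken by token so the output is stable.
    profile.sort(key=lambda p: (-p[1], p[0]))
    return "\n".join("%s %d" % p for p in profile)


def fuzzy_hash(text):
    """Hash the profile and return it as lowercase hex, two chars per byte."""
    digest = hashlib.md5(text_profile(text).encode("utf-8")).digest()
    return digest.hex()   # same job as the HEX_DIGITS loop in the Java code


if __name__ == "__main__":
    print(fuzzy_hash("Solr solr SOLR search search engine"))
```

Two texts that differ only in rare words, case or punctuation give the same profile here and therefore the same hash, which is the behaviour wanted for "possible duplicates".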