I'm facing a challenge with de-duplication of Solr documents.

De-duplication is done using TextProfileSignature with the following parameters:
<str name="fields">field1, field2, field3</str> 
<str name="quantRate">0.5</str>
<str name="minTokenLen">3</str>
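For reference, below is a simplified Python sketch of the quantized text-profile idea behind TextProfileSignature. It is an approximation, not Solr's source: the ASCII-only tokenization, the half-up rounding, the strict "longer than minTokenLen" check and joining the field values with spaces are all assumptions, and the function name is hypothetical. What it illustrates is that the values of every listed field are merged into one token profile, so the resulting signature has no notion of which field a token came from.

```python
import hashlib
import math
import re
from collections import Counter


def text_profile_signature(values, quant_rate=0.5, min_token_len=3):
    """Approximate quantized text-profile signature over several field values."""
    # All field values are merged into one text (assumed separator: a space).
    text = " ".join(values)
    # Lowercase alphanumeric tokens; keep only tokens longer than min_token_len.
    tokens = [t for t in re.findall(r"[0-9a-z]+", text.lower())
              if len(t) > min_token_len]
    counts = Counter(tokens)
    if not counts:
        return hashlib.md5(b"").hexdigest()
    max_freq = max(counts.values())
    # Quantum derived from the most frequent token (half-up rounding).
    quant = math.floor(max_freq * quant_rate + 0.5)
    if quant < 2:
        quant = 2 if max_freq > 1 else 1
    # Round each count down to a multiple of quant and drop tokens below quant.
    profile = {tok: (c // quant) * quant for tok, c in counts.items()}
    profile = {tok: c for tok, c in profile.items() if c >= quant}
    # Sort by descending count, then alphabetically, and hash the profile.
    ordered = sorted(profile.items(), key=lambda kv: (-kv[1], kv[0]))
    lines = [f"{tok} {c}" for tok, c in ordered]
    return hashlib.md5("\n".join(lines).encode("utf-8")).hexdigest()
```

With quantRate 0.5, once any token repeats, a token only survives if its rounded-down count still reaches roughly half the count of the most frequent token, so documents that differ only in less frequent words can end up with the same signature.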

Here, field3 is normal text with a few lines of data.
field1 and field2 each contain up to 5 or 6 words.

I want two documents to be treated as duplicates when field1 and field2 are
exactly the same and at least 90% of the lines in field3 match those in the other document.
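To pin the rule down, here is a minimal Python sketch of the intended pairwise check. The helper names are hypothetical, and so is the definition of "90% of the lines match": it assumes lines are compared after trimming whitespace, blank lines are ignored, and the number of shared lines is divided by the line count of the longer field3.

```python
from collections import Counter


def normalize_lines(text):
    # Assumed normalization: trim whitespace and ignore blank lines.
    return [line.strip() for line in text.splitlines() if line.strip()]


def line_overlap(a, b):
    """Shared lines (as a multiset) divided by the longer document's line count."""
    la, lb = Counter(normalize_lines(a)), Counter(normalize_lines(b))
    total = max(sum(la.values()), sum(lb.values()))
    if total == 0:
        return 1.0
    common = sum((la & lb).values())
    return common / total


def is_duplicate(doc_a, doc_b, threshold=0.9):
    # field1 and field2 must be identical; field3 must share >= threshold of its lines.
    return (doc_a["field1"] == doc_b["field1"]
            and doc_a["field2"] == doc_b["field2"]
            and line_overlap(doc_a["field3"], doc_b["field3"]) >= threshold)
```

This is a pairwise comparison between two documents, whereas a signature-based approach computes one hash per document and treats documents as duplicates only when their hashes are identical.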

Is there any way to achieve this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Customzing-Solr-Dedupe-tp4196879.html
Sent from the Solr - User mailing list archive at Nabble.com.
