I'm facing a challenge with de-duplication of Solr documents. De-duplication is done using TextProfileSignature with the following parameters:

    <str name="fields">field1, field2, field3</str>
    <str name="quantRate">0.5</str>
    <str name="minTokenLen">3</str>
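For context, parameters like these normally sit inside a SignatureUpdateProcessorFactory entry in solrconfig.xml. The fragment below is only a sketch of that arrangement, not the actual configuration behind this question: the chain name `dedupe`, the `signature` field name, and the `overwriteDupes` setting are assumptions added for illustration. Only the three `<str>` parameters for fields, quantRate, and minTokenLen come from the post.

```xml
<!-- Sketch only: chain name, signatureField and overwriteDupes are assumed -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- assumed: field that stores the computed signature -->
    <str name="signatureField">signature</str>
    <!-- assumed: replace an existing document that has the same signature -->
    <bool name="overwriteDupes">true</bool>
    <!-- from the post: all three fields feed a single signature -->
    <str name="fields">field1,field2,field3</str>
    <str name="signatureClass">solr.processor.TextProfileSignature</str>
    <str name="quantRate">0.5</str>
    <str name="minTokenLen">3</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

In a setup like this, one signature is computed per document from the listed fields taken together, and two documents count as duplicates only when their signatures are equal.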
Here field3 is normal text with a few lines of data. field1 and field2 can each contain up to 5 or 6 words. I want to de-duplicate when the data in field1 and field2 is exactly the same and 90% of the lines in field3 match those in another document. Is there any way to achieve this?

--
View this message in context: http://lucene.472066.n3.nabble.com/Customzing-Solr-Dedupe-tp4196879.html
Sent from the Solr - User mailing list archive at Nabble.com.