RE: removing duplicates

Petersen, Robert Wed, 21 Aug 2013 14:34:01 -0700

Hi

Perhaps you could query for all documents, asking for only the id field to be 
returned, and facet on the field you say you can key off of for duplicates. 
Set facet.mincount to 2 so that only values shared by two or more documents 
come back. Then, for each returned facet value, filter on that value, page 
through all the doc IDs (skipping the first document, which is the copy you 
keep), and delete the rest by ID using a small app or something like that. 
Spin all the deletes into the index and then do a commit at the end. I think 
that would do it.
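
Something like the following could serve as that small app. It is only a rough
sketch, not tested against a real index: the core URL, the field name dup_key,
and all function names are made up for illustration, and it assumes the id
field is sortable and that your Solr version accepts JSON update commands
(including a list of IDs under "delete") on /update.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SOLR = "http://localhost:8983/solr/collection1"  # hypothetical core URL
DUP_FIELD = "dup_key"                            # hypothetical numeric field
ID_FIELD = "id"                                  # assumed uniqueKey field


def get_json(path, params):
    """GET a Solr handler and decode the JSON response."""
    with urlopen(f"{SOLR}/{path}?{urlencode(params)}") as resp:
        return json.load(resp)


def post_json(path, body):
    """POST a JSON update command (assumes /update accepts JSON)."""
    req = Request(f"{SOLR}/{path}?wt=json",
                  data=json.dumps(body).encode("utf-8"),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)


def duplicated_values(facet_list):
    """Solr's default JSON facet output is a flat
    [value, count, value, count, ...] list; keep values seen 2+ times."""
    return [v for v, c in zip(facet_list[::2], facet_list[1::2]) if c >= 2]


def find_duplicate_ids(get=get_json, page_size=500):
    """Return every doc ID except the first one for each duplicated value."""
    facets = get("select", {
        "q": "*:*", "rows": 0, "wt": "json",
        "facet": "true", "facet.field": DUP_FIELD,
        "facet.mincount": 2, "facet.limit": -1,
    })
    values = duplicated_values(
        facets["facet_counts"]["facet_fields"][DUP_FIELD])

    doomed = []
    for value in values:
        ids, start = [], 0
        while True:
            page = get("select", {
                "q": "*:*", "wt": "json",
                "fq": f"{{!term f={DUP_FIELD}}}{value}",  # filter on one value
                "fl": ID_FIELD,
                "sort": f"{ID_FIELD} asc",  # stable order while paging
                "start": start, "rows": page_size,
            })
            docs = page["response"]["docs"]
            ids.extend(d[ID_FIELD] for d in docs)
            start += len(docs)
            if not docs or start >= page["response"]["numFound"]:
                break
        doomed.extend(ids[1:])  # skip the first document: that copy is kept
    return doomed


def delete_and_commit(ids, post=post_json, batch=1000):
    """Send the deletes in batches, then commit once at the end."""
    for i in range(0, len(ids), batch):
        post("update", {"delete": ids[i:i + batch]})
    post("update", {"commit": {}})
```

You would run it as delete_and_commit(find_duplicate_ids()). It is probably
worth printing the ID list and checking a few entries by hand before letting
it delete anything.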

Thanks
Robi

-----Original Message-----
From: Ali, Saqib [mailto:docbook....@gmail.com] 
Sent: Wednesday, August 21, 2013 2:15 PM
To: solr-user@lucene.apache.org
Subject: removing duplicates

hello,

We have documents that are duplicates, i.e. the ID is different but the rest 
of the fields are the same. Is there a query that can remove the duplicates 
and leave just one copy of each document in Solr? There is one numeric field 
that we can key off to find duplicates.

Please advise.

Thanks
