Hi,

Perhaps you could query for all documents, asking for only the id field to be returned, and facet on the field you say you can key off of for duplicates. Set facet.mincount to 2 so that only values shared by more than one document come back. Then, for each returned facet value, filter on that value, page through all of its doc IDs (skipping the first document), and delete the rest by ID using a small app or something like that. Send all the deletes to the index and then do a single commit at the end. I think that would do it.
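A minimal sketch of that procedure in Python, kept free of network calls so it can be tried offline. The field name `dup_key`, the helper names and the `fetch_page` callback (which would run a Solr select request with the given params and return the list of IDs) are all hypothetical. The request parameters (`facet.mincount`, `facet.limit`, `fq`, `sort`, `start`, `rows`) are standard Solr select parameters; the `{!term f=...}` filter and the `{"delete": [...]}` JSON update body are assumptions about your Solr version, so check them against your setup.

```python
import json
from typing import Callable, Dict, List

# Hypothetical name for the numeric field used to spot duplicates.
DUP_FIELD = "dup_key"

def facet_params(field: str = DUP_FIELD) -> Dict[str, str]:
    """Step 1: ask only for facet counts on the duplicate key."""
    return {
        "q": "*:*",
        "fl": "id",
        "rows": "0",            # only the facet counts are needed here
        "facet": "true",
        "facet.field": field,
        "facet.mincount": "2",  # only values shared by two or more docs
        "facet.limit": "-1",    # every qualifying value, not just the top 100
        "wt": "json",
    }

def duplicate_values(response: dict, field: str = DUP_FIELD) -> List[str]:
    """Read Solr's default flat facet list: [value1, count1, value2, count2, ...]."""
    flat = response["facet_counts"]["facet_fields"][field]
    # The count check is redundant if facet.mincount=2 was sent, but harmless.
    return [flat[i] for i in range(0, len(flat), 2) if flat[i + 1] >= 2]

def filter_params(value: str, start: int, rows: int,
                  field: str = DUP_FIELD) -> Dict[str, str]:
    """Step 2: one page of IDs for the docs sharing a single key value."""
    return {
        "q": "*:*",
        "fq": "{!term f=%s}%s" % (field, value),  # term parser avoids escaping
        "fl": "id",
        "sort": "id asc",  # stable order so "the first document" is well defined
        "start": str(start),
        "rows": str(rows),
        "wt": "json",
    }

def all_ids(value: str, fetch_page: Callable[[Dict[str, str]], List[str]],
            rows: int = 1000) -> List[str]:
    """Page through every ID for one value until a short page comes back."""
    ids: List[str] = []
    start = 0
    while True:
        page = fetch_page(filter_params(value, start, rows))
        ids.extend(page)
        if len(page) < rows:
            return ids
        start += rows

def plan_deletes(values: List[str],
                 fetch_page: Callable[[Dict[str, str]], List[str]],
                 rows: int = 1000) -> List[str]:
    """Collect every ID except the first one for each duplicated value."""
    doomed: List[str] = []
    for value in values:
        ids = all_ids(value, fetch_page, rows)
        doomed.extend(ids[1:])  # keep ids[0], delete the rest
    return doomed

def delete_payload(ids: List[str]) -> str:
    """JSON body for the update handler; send it, then commit once at the end."""
    return json.dumps({"delete": ids})
```

Sorting on id means the surviving copy is the one with the lowest ID; use a different sort if a particular copy should be kept. Because all IDs are collected before any delete is sent, the paging offsets don't shift underneath the loop. The final request can carry `commit=true` (or a separate commit) so the index is committed only once.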
Thanks
Robi

-----Original Message-----
From: Ali, Saqib [mailto:docbook....@gmail.com]
Sent: Wednesday, August 21, 2013 2:15 PM
To: solr-user@lucene.apache.org
Subject: removing duplicates

Hello,

We have documents that are duplicates, i.e. the ID is different, but the rest of the fields are the same. Is there a query that can remove the duplicates and leave just one copy of each document in Solr? There is one numeric field that we can key off of to find duplicates.

Please advise.

Thanks