Hmmm, with that setup you should _not_ be getting duplicate documents. So, when you see duplicate documents, you're seeing the exact same UUID on two shards, correct? My best guess is that you've done something innocent-seeming (that perhaps you forgot!) that resulted in this. Otherwise there would be a lot more complaints about duplicate documents.
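One way to answer that question for sure is to pull the id field from each shard separately and compare. Below is a minimal sketch that assumes you've already fetched the ids per shard, for example by querying each shard's core with q=*:*, fl=id and distrib=false. The shard names and ids are made up. It reports ids that really are identical on two shards. It also reports ids that differ only in case or surrounding whitespace. Solr treats those as different uniqueKey values, so they aren't true duplicates.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def ids_on_multiple_shards(ids_by_shard: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each id that occurs on more than one shard to the sorted list of those shards."""
    shards_for_id = defaultdict(set)
    for shard, ids in ids_by_shard.items():
        for doc_id in ids:
            shards_for_id[doc_id].add(shard)
    return {doc_id: sorted(shards) for doc_id, shards in shards_for_id.items() if len(shards) > 1}


def near_duplicate_ids(all_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group ids that differ only in case or surrounding whitespace.

    Such ids are *different* uniqueKey values to Solr, so they can hash to
    different shards and look like "duplicates" when you eyeball them.
    """
    groups = defaultdict(set)
    for doc_id in all_ids:
        groups[doc_id.strip().lower()].add(doc_id)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


if __name__ == "__main__":
    # Hypothetical shard names and ids in the {host}-{db}-{uuid} shape.
    ids_by_shard = {
        "shard1": ["hostA-db1-111", "hostA-db1-222"],
        "shard2": ["hostA-db1-222", "hostA-db1-333 "],
    }
    print(ids_on_multiple_shards(ids_by_shard))
    # {'hostA-db1-222': ['shard1', 'shard2']}
```

If the first function comes back empty but the second doesn't, the ids aren't really the same. Look at how they're being built.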
In fact, what I'd do is create a new collection where you're absolutely sure that nothing "interesting" has been done. You can use "collection aliasing" to switch to that one after you've re-indexed all your docs and are satisfied with it.

And I'm assuming that your UUID field is 1> defined as the <uniqueKey> and 2> a string type (NOT text).

Best,
Erick

On Mon, Jul 27, 2015 at 3:21 AM, mesenthil1 <senthilkumar.arumu...@viacomcontractor.com> wrote:
> Thanks Erick. As I understand now that the entire cluster goes down if any
> one shard is down, my first confusion is clarified.
>
> Following are the other details
>
> We really need to see details since I'm guessing we're talking
> past each other. So:
> *1> exactly how are you indexing documents?*
> /using HTTPSolrServer and placing all update requests to leader1/shard1.
> Enabled autoCommit with 60 seconds and not placing any commit from client
> application./
> *2> exactly how are you assigning a UUID to a doc?*
> /defined a unique field in schema.xml and it is generated by the
> client application, ID format is {mongoDBHostName}-{mongoDBName}-{UUID}. /
> *3> do you ever re-index documents? If so, how are you
> assuring that the UUID generated for any re-indexing operations
> are the same ones used the first time? *
> /Yes we are re-indexing documents. We are getting the UUID from mongodb and
> the ID generated is the same while we are doing updates as well, using the same
> code. /
>
> We are unable to guess the root cause for having duplicate documents in
> multiple shards. Also, it looks like reindexing is the only solution for
> removing the duplicates.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Cloud-Duplicate-documents-in-multiple-shards-tp4218162p4219251.html
> Sent from the Solr - User mailing list archive at Nabble.com.
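For the "collection aliasing" step, here's a minimal sketch that assumes the Collections API CREATEALIAS action. It takes `action`, `name` and `collections` parameters. The node URL and the collection names "docs" and "docs_v2" are hypothetical. The idea is that the application always queries the alias name. Once the freshly built collection checks out, the alias is pointed at it and clients don't need to change.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def create_alias_url(solr_base: str, alias: str, collection: str) -> str:
    """Build a Collections API CREATEALIAS request pointing `alias` at `collection`."""
    params = urlencode({"action": "CREATEALIAS", "name": alias, "collections": collection})
    return f"{solr_base.rstrip('/')}/admin/collections?{params}"


def create_alias(solr_base: str, alias: str, collection: str, timeout: float = 30.0) -> str:
    """Send the request and return Solr's raw response body."""
    with urlopen(create_alias_url(solr_base, alias, collection), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical node URL and collection names.
    print(create_alias_url("http://localhost:8983/solr", "docs", "docs_v2"))
    # http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=docs&collections=docs_v2
```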