Re: Solr Cloud: Duplicate documents in multiple shards

2015-07-28 Thread mesenthil1
Thanks Erick. We cannot recall what might have happened in between. Yes, we are seeing the same document in 2 shards. "Uniquefield" is set as uuid in the schema and declared as String. We will go with reindexing. schema.xml : Query: http://localhost:1004/solr/collection1/select?q=id:%22

Re: Solr Cloud: Duplicate documents in multiple shards

2015-07-27 Thread Erick Erickson
Hmmm, with that setup you should _not_ be getting duplicate documents. So, when you see duplicate documents, you're seeing the exact same UUID on two shards, correct? My best guess is that you've done something innocent-seeming (that perhaps you forgot!) that resulted in this. Otherwise there would
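
One way to confirm that the exact same id really lives on two shards is to query each core directly with distrib=false, so that each core answers only from its own index. The following is a minimal sketch that only builds the query URLs; every core URL other than the localhost:1004 one quoted in this thread is a hypothetical placeholder, and the field name "id" is assumed from the query shown in the thread.

```python
from urllib.parse import urlencode


def per_core_queries(core_urls, doc_id):
    """Build one non-distributed select URL per core for a single id.

    distrib=false asks the core to answer from its local index only,
    so a hit on more than one shard's core means the id is duplicated.
    """
    params = urlencode({
        "q": f'id:"{doc_id}"',   # field name "id" assumed from the thread's query
        "distrib": "false",      # do not fan the query out to other shards
        "fl": "id",              # only the id is needed to confirm a hit
        "wt": "json",
    })
    return [f"{base.rstrip('/')}/select?{params}" for base in core_urls]


if __name__ == "__main__":
    # Hypothetical core URLs; substitute one core per shard from your cluster.
    cores = [
        "http://localhost:1004/solr/collection1",
        "http://localhost:1005/solr/collection1",
    ]
    for url in per_core_queries(cores, "some-uuid"):
        print(url)
```

Fetching each URL and comparing numFound across cores shows which shards hold a copy of the document.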

Re: Solr Cloud: Duplicate documents in multiple shards

2015-07-27 Thread mesenthil1
Thanks Erick. Now that I understand that the entire cluster goes down if any one shard is down, my first confusion is cleared up. The other details follow. We really need to see details since I'm guessing we're talking past each other. So: *1> exactly how are you indexing documents?* /u

Re: Solr Cloud: Duplicate documents in multiple shards

2015-07-22 Thread Erick Erickson
bq: What happens if a shard (both leader and replica) goes down? If the document on the "dead shard" is updated, will it forward the document to the new shard? If so, when the "dead shard" comes up again, will this not be considered for the same hash key range? No. The index operation will just fa

Re: Solr Cloud: Duplicate documents in multiple shards

2015-07-22 Thread mesenthil1
Alessandro, Thanks. I see some confusion here. *First of all you need a smart client that will load balance the docs to index. Let's say the CloudSolrClient . * All these 5 shards are configured behind a load balancer; requests are sent to the load balancer, and whichever server is up will accept t

Re: Solr Cloud: Duplicate documents in multiple shards

2015-07-21 Thread Alessandro Benedetti
f the hash dominate the > distribution of data. > > -Original Message- > From: Reitzel, Charles > Sent: Tuesday, July 21, 2015 9:55 AM > To: solr-user@lucene.apache.org > Subject: RE: Solr Cloud: Duplicate documents in multiple shards > > When are you generating the UUID

RE: Solr Cloud: Duplicate documents in multiple shards

2015-07-21 Thread Reitzel, Charles
the distribution of data. -Original Message- From: Reitzel, Charles Sent: Tuesday, July 21, 2015 9:55 AM To: solr-user@lucene.apache.org Subject: RE: Solr Cloud: Duplicate documents in multiple shards When are you generating the UUID exactly? If you set the unique ID field on an "updat

RE: Solr Cloud: Duplicate documents in multiple shards

2015-07-21 Thread Reitzel, Charles
t: Tuesday, July 21, 2015 4:11 AM To: solr-user@lucene.apache.org Subject: Re: Solr Cloud: Duplicate documents in multiple shards Unable to delete by passing distrib=false as well. Also it is difficult to identify those duplicate documents among the 130 million. Is there a way we can see the gene

Re: Solr Cloud: Duplicate documents in multiple shards

2015-07-21 Thread mesenthil1
Unable to delete by passing distrib=false as well. Also, it is difficult to identify those duplicate documents among the 130 million. Is there a way we can see the generated hash key and map it to the specific shard?
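
On the question of seeing the hash and mapping it to a shard: as far as I know, the compositeId router hashes an id that contains no "!" separator with 32-bit MurmurHash3 (seed 0) and assigns it to the shard whose hash range, as listed in the cluster state, contains the signed result. The sketch below works under that assumption, and further assumes ASCII ids and ranges written as two hex values joined by "-". The two-shard ranges in the usage part are illustrative only; read the real ranges for your five shards from the cluster state rather than computing them.

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """Plain-Python MurmurHash3 x86 32-bit; returns an unsigned 32-bit int."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF

    def mix(k: int) -> int:
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask  # rotate left by 15
        return (k * c2) & mask

    h = seed & mask
    n = len(data)
    body_end = n & ~3
    for i in range(0, body_end, 4):
        h ^= mix(int.from_bytes(data[i:i + 4], "little"))
        h = ((h << 13) | (h >> 19)) & mask  # rotate left by 13
        h = (h * 5 + 0xE6546B64) & mask
    tail = data[body_end:]
    if tail:
        h ^= mix(int.from_bytes(tail, "little"))
    # Finalization mix
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def to_signed(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit int."""
    return value - (1 << 32) if value & 0x80000000 else value


def in_range(signed_hash: int, hex_range: str) -> bool:
    """Check a signed hash against a range string such as '80000000-ffffffff'."""
    lo_hex, hi_hex = hex_range.split("-", 1)
    return to_signed(int(lo_hex, 16)) <= signed_hash <= to_signed(int(hi_hex, 16))


def shard_for(doc_id: str, shard_ranges: dict):
    """Return the name of the shard whose range contains the id's hash, or None."""
    h = to_signed(murmur3_x86_32(doc_id.encode("utf-8")))
    for name, hex_range in shard_ranges.items():
        if in_range(h, hex_range):
            return name
    return None


if __name__ == "__main__":
    # Illustrative two-shard layout, not the ranges of the cluster in this thread.
    ranges = {"shard1": "80000000-ffffffff", "shard2": "0-7fffffff"}
    doc_id = "some-uuid"
    print(hex(murmur3_x86_32(doc_id.encode("utf-8"))), shard_for(doc_id, ranges))
```

If a document turns up on a shard other than the one this predicts, that copy is the likely stray, though the prediction is only as good as the routing assumption stated above.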

Re: Solr Cloud: Duplicate documents in multiple shards

2015-07-21 Thread Upayavira
I suspect you can delete a document from the wrong shard by using update?distrib=false. I also suspect there are people here who would like to help you debug this, because it has been reported before, but we haven't yet been able to see whether it occurred due to human or software error. Upayavir
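
The suggestion above, sketched as a request: a JSON delete-by-id posted to one core's update handler with distrib=false, so the delete is meant to apply only to that core. This is a sketch under assumptions, not a verified fix; note that elsewhere in this thread the original poster reports that deleting with distrib=false did not work for them. The core URL is the one quoted in the thread, and the id is a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_local_delete(core_url, doc_id, commit=True):
    """Build (but do not send) a non-distributed delete-by-id request."""
    params = {"distrib": "false"}  # keep the update on this core only
    if commit:
        params["commit"] = "true"  # make the delete visible right away
    url = f"{core_url.rstrip('/')}/update?{urlencode(params)}"
    body = json.dumps({"delete": {"id": doc_id}}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_local_delete("http://localhost:1004/solr/collection1", "some-uuid")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```

Point the request at the core that holds the stray copy, not at a load balancer, otherwise the request may land on a different node.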

Re: Solr Cloud: Duplicate documents in multiple shards

2015-07-20 Thread mesenthil1
Thanks, Erick, for clarifying. We are not explicitly setting the compositeId. We only pass numShards=5 as part of the server startup. We are using uuid as the unique field. One sample id is: possting.mongo-v2.services.com-intl-staging-c2d2a376-5e4a-11e2-8963-0026b9414f30 Not sure how it wou

Re: Solr Cloud: Duplicate documents in multiple shards

2015-07-20 Thread Erick Erickson
bq: We have 130 million documents in our set up and the routing key is set as "compositeId". The most likely explanation is that somehow you've sent the same document out with different routing keys. So what is the ID field (or, more generally, your field) for a pair of duplicated documents? My b