Can you check if those IDs are on shard8? You can do this by pointing the URL at the core and specifying &distrib=false...
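
As a minimal sketch of that check, something like the following builds the non-distributed query URL. The host, port, and the uniqueKey field name "id" are assumptions; the core name is the one from the quoted log line, and core_query_url is just an illustrative helper, not part of any Solr client.

```python
from urllib.parse import urlencode


def core_query_url(base_url, core, doc_ids, id_field="id"):
    """Build a /select URL that queries one core only (distrib=false)."""
    # Quote each id so characters such as '/' or '+' are treated literally
    # by the query parser; escape backslashes and embedded quotes.
    quoted = " OR ".join(
        '"{}"'.format(d.replace("\\", "\\\\").replace('"', '\\"'))
        for d in doc_ids
    )
    params = {
        "q": "{}:({})".format(id_field, quoted),
        "distrib": "false",  # do not fan the query out to other shards
        "fl": id_field,
        "wt": "json",
    }
    # urlencode percent-encodes the query so ids survive the URL intact.
    return "{}/{}/select?{}".format(base_url.rstrip("/"), core, urlencode(params))


if __name__ == "__main__":
    # Hypothetical host; replace with the node that hosts shard8.
    print(core_query_url(
        "http://localhost:8983/solr",
        "testcollection_shard8_replica1",
        ["BQECDwZGTCEBHZZBBiIP", "BQEBBQZFnhOJRj+m9RJC"],
    ))
```

Fetching the printed URL with a browser or curl should return only documents held by that core, so any IDs that come back are stored on shard8.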
Best,
Erick

On Thu, Jun 1, 2017 at 1:42 AM, Amrit Sarkar <sarkaramr...@gmail.com> wrote:
> Sorry, the confluence link:
> https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>
> On Thu, Jun 1, 2017 at 2:11 PM, Amrit Sarkar <sarkaramr...@gmail.com> wrote:
>
>> Sathyam,
>>
>> It seems your interpretation is wrong, as CloudSolrClient calculates
>> which shard the incoming document belongs to (it hashes the document id
>> and determines the range it falls in). As you have 10 shards, the
>> document will belong to one of them; that is what is being calculated,
>> and the document is eventually pushed to the leader of that shard.
>>
>> The confluence link provides the insights in much detail:
>> https://lucidworks.com/2013/06/13/solr-cloud-document-routing/
>> Another useful link:
>> https://lucidworks.com/2013/06/13/solr-cloud-document-routing/
>>
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>>
>> On Thu, Jun 1, 2017 at 11:52 AM, Sathyam <sathyam.dorasw...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am indexing documents to a 10-shard collection (testcollection, with
>>> no replicas) in a solr6 cluster using CloudSolrClient. When I looked at
>>> the solr logs, I saw that there is a lot of peer-to-peer document
>>> distribution going on.
>>>
>>> An example log statement is as follows:
>>> 2017-06-01 06:07:28.378 INFO (qtp1358444045-3673692) [c:testcollection
>>> s:shard8 r:core_node7 x:testcollection_shard8_replica1]
>>> o.a.s.u.p.LogUpdateProcessorFactory [testcollection_shard8_replica1]
>>> webapp=/solr path=/update
>>> params={update.distrib=TOLEADER&distrib.from=http://10.199.42.29:8983/solr/testcollection_shard7_replica1/&wt=javabin&version=2}{add=[BQECDwZGTCEBHZZBBiIP
>>> (1568981383488995328), BQEBBQZB2il3wGT/0/mB (1568981383490043904),
>>> BQEBBQZFnhOJRj+m9RJC (1568981383491092480), BQEGBgZIeBE1klHS4fxk
>>> (1568981383492141056), BQEBBQZFVTmRx2VuCgfV (1568981383493189632)]} 0 25
>>>
>>> When I went through the code of CloudSolrClient on grepcode, I saw that
>>> the client itself finds out which server it needs to hit by using the
>>> message id hash and getting the shard range information from state.json.
>>> So it is quite confusing to me why data is being distributed between
>>> peers, as there is no replication and each shard is a leader.
>>>
>>> I would like to know why this is happening and how to avoid it, or
>>> whether the above log statement means something else and I am
>>> misinterpreting something.
>>>
>>> --
>>> Sathyam Doraswamy
>>>
>>
>>
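
The routing discussed in the thread above (hash the document id, then find the shard whose hash range contains it) can be sketched roughly as follows. This is not SolrJ code. It assumes the default compositeId router, ids without a "!" routing prefix, MurmurHash3 (x86, 32-bit, seed 0) over the UTF-8 bytes of the id, and ranges written as hex "min-max" pairs the way state.json shows them. The two ranges in the example are illustrative values for a 2-shard collection, not the 10-shard layout from the thread; the real ones would have to be read from your own state.json.

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit; returns an unsigned 32-bit value."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    n = len(data)
    body_len = n - (n % 4)

    def mix_k(k: int) -> int:
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF  # rotate left by 15
        return (k * c2) & 0xFFFFFFFF

    # Body: process 4-byte little-endian blocks.
    for i in range(0, body_len, 4):
        h ^= mix_k(int.from_bytes(data[i:i + 4], "little"))
        h = ((h << 13) | (h >> 19)) & 0xFFFFFFFF  # rotate left by 13
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF

    # Tail: the remaining 1 to 3 bytes, if any.
    tail = data[body_len:]
    if tail:
        h ^= mix_k(int.from_bytes(tail, "little"))

    # Finalization mix.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return value - (1 << 32) if value >= (1 << 31) else value


def parse_range(text: str):
    """Parse a hex 'min-max' range string into signed 32-bit bounds."""
    low, high = text.split("-")
    return to_signed32(int(low, 16)), to_signed32(int(high, 16))


def shard_for_id(doc_id: str, ranges: dict):
    """Return the name of the shard whose range contains the id's hash."""
    h = to_signed32(murmur3_32(doc_id.encode("utf-8")))
    for shard, text in ranges.items():
        low, high = parse_range(text)
        if low <= h <= high:
            return shard
    return None  # no range matched (ranges do not cover the hash space)


if __name__ == "__main__":
    # Illustrative ranges only (assumption); use the ones from state.json.
    example_ranges = {"shard1": "80000000-ffffffff", "shard2": "0-7fffffff"}
    print(shard_for_id("BQECDwZGTCEBHZZBBiIP", example_ranges))
```

If a sketch like this, fed with the real ranges, places the logged IDs on shard8, then the documents ended up on the right shard and the log line only shows that they were first sent to shard7 and forwarded from there.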