On 7/25/2018 3:49 PM, Oakley, Craig (NIH/NLM/NCBI) [C] wrote:
> I end up with four cores instead of two, as expected. The problem is that
> three of the four cores (col_shard1_0_replica_n5, col_shard1_0_replica0 and
> col_shard1_1_replica_n6) are *all on hostname1*. Only col_shard1_1_replica0
> was placed on hostname2.

<snip>

> My question is: How can I tell Solr "avoid putting two replicas of the same
> shard on the same node"?

Somehow I missed that there were three cores on host1 when you first described the problem. Looking back, I see that you did have that information there. I was more focused on the fact that host2 only had one core. My apologies for not reading closely enough.

Is this collection using compositeId or implicit? I think it would have to be compositeId for a split to work correctly. I wouldn't expect split to be supported on a collection with the implicit router.

Are you running one Solr node per host? If you have multiple Solr nodes (instances) on one host, Solr will have no idea that this is the case -- the entire node identifier (including host name, port, and context path) is compared to distinguish nodes from each other. The assumption in SolrCloud's internals is that each node is completely separate from every other node. Running multiple nodes per host is only recommended when the heap requirements are *very* high, and in that situation, making sure that replicas are distributed properly will require extra effort. For most installations, it is strongly recommended to only have one Solr node per physical host.

If you are only running one Solr node per host, then the way it's behaving for you is certainly not the design intent, and sounds like a bug in SPLITSHARD. Solr should try very hard to not place multiple replicas of one shard on the same *node*.

A side question for devs that know about SolrCloud internals: Could SolrCloud avoid putting multiple replicas of the same shard on the same host when there are multiple nodes per host? It seems to me that it would not be supremely difficult to have SolrCloud detect a match in the host name and use that information to prefer nodes on different hosts when possible. I am thinking about creating an issue for this enhancement.

Thanks,
Shawn
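
To make the host-matching idea above concrete, here is a minimal sketch in Python. It is not Solr's placement code and does not reflect how SPLITSHARD actually chooses nodes. It assumes node identifiers of the form host:port_context (for example hostname1:8983_solr), and the function names host_of and pick_node are hypothetical.

```python
def host_of(node_name):
    """Return the host part of a node identifier.

    Assumes identifiers look like 'hostname1:8983_solr'
    (host, port, context path); this format is an assumption.
    """
    return node_name.split(":", 1)[0]


def pick_node(candidates, nodes_with_shard):
    """Pick a node for a new replica of one shard (hypothetical helper).

    Preference order:
      0. a node on a host that holds no replica of the shard
      1. a node that holds no replica, even if its host already does
      2. any candidate at all
    """
    used_nodes = set(nodes_with_shard)
    used_hosts = {host_of(n) for n in used_nodes}

    def rank(node):
        if host_of(node) not in used_hosts:
            return 0
        if node not in used_nodes:
            return 1
        return 2

    # min() returns the first candidate among those with the best rank.
    return min(candidates, key=rank)


if __name__ == "__main__":
    live = ["hostname1:8983_solr", "hostname1:7574_solr", "hostname2:8983_solr"]
    print(pick_node(live, ["hostname1:8983_solr"]))
```

Comparing whole identifiers alone would treat hostname1:7574_solr as just as good a target as hostname2:8983_solr; ranking by host first is what lets the second physical machine win when one is available.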