On 7/25/2018 3:49 PM, Oakley, Craig (NIH/NLM/NCBI) [C] wrote:
> I end up with four cores instead of two, as expected. The problem is that 
> three of the four cores (col_shard1_0_replica_n5, col_shard1_0_replica0 and 
> col_shard1_1_replica_n6) are *all on hostname1*. Only col_shard1_1_replica0 
> was placed on hostname2.
<snip>
> My question is: How can I tell Solr "avoid putting two replicas of the same 
> shard on the same node"?

Somehow I missed that there were three cores on host1 when you first
described the problem.  Looking back, I see that you did have that
information there.  I was more focused on the fact that host2 only had
one core.  My apologies for not reading closely enough.

Is this collection using the compositeId router or the implicit router?
I think it would have to be compositeId for a split to work correctly.
I wouldn't expect split to be supported on a collection with the
implicit router.

Are you running one Solr node per host?  If you have multiple Solr nodes
(instances) on one host, Solr will have no idea that this is the case --
the entire node identifier (including host name, port, and context path)
is compared to distinguish nodes from each other.  The assumption in
SolrCloud's internals is that each node is completely separate from
every other node.  Running multiple nodes per host is only recommended
when the heap requirements are *very* high, and in that situation,
making sure that replicas are distributed properly will require extra
effort.  For most installations, it is strongly recommended to have only
one Solr node per physical host.
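
To make the node-identifier point concrete, here is a minimal Python
sketch.  It assumes node names shaped like host:port_context (for
example hostname1:8983_solr).  The port numbers and the host_of helper
are invented for illustration, and none of this is Solr's own code.

```python
def host_of(node_name):
    """Return the host part of a node name shaped like 'host:port_context'."""
    # Hypothetical helper; assumes the host itself contains no ':'.
    return node_name.split(":", 1)[0]

# Two Solr instances on one machine (ports are invented for this example).
node_a = "hostname1:8983_solr"
node_b = "hostname1:7574_solr"

# Compared as whole strings, these are two unrelated nodes ...
same_node = node_a == node_b
# ... even though both run on the same physical host.
same_host = host_of(node_a) == host_of(node_b)

print(same_node, same_host)  # prints: False True
```

A placement check that only asks "is this the same node?" would treat
node_a and node_b as fully independent, which is why multiple nodes per
host need extra care.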

If you are only running one Solr node per host, then the way it's
behaving for you is certainly not the design intent, and sounds like a
bug in SPLITSHARD.  Solr should try very hard not to place multiple
replicas of one shard on the same *node*.
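
One way to confirm whether a shard really has two replicas on one node
is to group the replicas by shard and node name.  The sketch below does
that over a hand-written dictionary.  The core names come from the
report above, but the core_node keys, the port, the assignment of
col_shard1_0_replica0 to shard1_0, and the simplified shard-to-replicas
layout are all assumptions; real cluster state output would need to be
reshaped to match.

```python
from collections import defaultdict

def colocated_replicas(shards):
    """Map (shard, node_name) -> core names, keeping only pairs with >1 core."""
    by_shard_node = defaultdict(list)
    for shard, replicas in shards.items():
        for replica in replicas.values():
            by_shard_node[(shard, replica["node_name"])].append(replica["core"])
    return {key: cores for key, cores in by_shard_node.items() if len(cores) > 1}

# Hand-written example data (core_node keys and port 8983 are assumptions).
shards = {
    "shard1_0": {
        "core_node5": {"core": "col_shard1_0_replica_n5",
                       "node_name": "hostname1:8983_solr"},
        "core_node7": {"core": "col_shard1_0_replica0",
                       "node_name": "hostname1:8983_solr"},
    },
    "shard1_1": {
        "core_node6": {"core": "col_shard1_1_replica_n6",
                       "node_name": "hostname1:8983_solr"},
        "core_node8": {"core": "col_shard1_1_replica0",
                       "node_name": "hostname2:8983_solr"},
    },
}

problems = colocated_replicas(shards)
for (shard, node), cores in problems.items():
    print(shard, "has", len(cores), "replicas on", node)
```

With this data only shard1_0 is flagged, because both of its replicas
sit on the same node name.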

A side question for devs that know about SolrCloud internals:  Could
SolrCloud avoid putting multiple replicas of the same shard on the same
host when there are multiple nodes per host?  It seems to me that it
would not be supremely difficult to have SolrCloud detect a match in the
host name and use that information to prefer nodes on different hosts
when possible.  I am thinking about creating an issue for this enhancement.
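
A rough sketch of what such a preference could look like, written in
Python purely for illustration.  This is not how SolrCloud chooses nodes
today; pick_node, host_of, and the node names are hypothetical, and node
names are assumed to look like host:port_context.

```python
def host_of(node_name):
    """Return the host part of a node name shaped like 'host:port_context'."""
    # Hypothetical helper; assumes the host itself contains no ':'.
    return node_name.split(":", 1)[0]

def pick_node(candidates, nodes_with_shard):
    """Pick a node for a new replica of a shard.

    Preference order: a node on a host that has no replica of the shard,
    then any node that has no replica of the shard, then any candidate.
    """
    used_nodes = set(nodes_with_shard)
    used_hosts = {host_of(node) for node in used_nodes}

    def rank(node):
        if host_of(node) not in used_hosts:
            return 0  # different host: best choice
        if node not in used_nodes:
            return 1  # same host, different node: acceptable fallback
        return 2      # same node: last resort

    # min() returns the first candidate with the lowest rank.
    return min(candidates, key=rank)

candidates = ["hostname1:8983_solr", "hostname1:7574_solr", "hostname2:8983_solr"]
existing = ["hostname1:8983_solr"]
print(pick_node(candidates, existing))  # prints: hostname2:8983_solr
```

If no node on another host is available, the sketch falls back to a
different node on the same host rather than failing, which matches the
"when possible" wording above.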

Thanks,
Shawn
