Hi, I am seeing some unexpected behavior when adding a new machine to my cluster. I am running Solr 4.10.3.
My setup has multiple collections, and each collection has a single shard. I am using core auto-discovery on the hosts (my deployment mechanism ensures that the directory structure is created and the core.properties file is in the right place).

To add a new machine I have to stop the cluster. If I add a new machine and start the cluster, and this new machine is elected leader for the shard, peer recovery fails. So now I have a leader with no content, and replicas with content. Depending on where the read request is sent, I may or may not get the response I am expecting.

2015-06-04 14:26:09.595 -0700 (,,,) coreZkRegister-1-thread-3 : INFO org.apache.solr.cloud.ShardLeaderElectionContext - Running the leader process for shard shard1
2015-06-04 14:26:09.607 -0700 (,,,) coreZkRegister-1-thread-9 : INFO org.apache.solr.cloud.ShardLeaderElectionContext - Waiting until we see more replicas up for shard shard1: total=2 found=1 timeoutin=1.14707356E15ms
2015-06-04 14:26:10.108 -0700 (,,,) coreZkRegister-1-thread-3 : INFO org.apache.solr.cloud.ShardLeaderElectionContext - Enough replicas found to continue.
2015-06-04 14:26:10.108 -0700 (,,,) coreZkRegister-1-thread-3 : INFO org.apache.solr.cloud.ShardLeaderElectionContext - I may be the new leader - try and sync
2015-06-04 14:26:10.115 -0700 (,,,) coreZkRegister-1-thread-3 : INFO org.apache.solr.update.PeerSync - PeerSync: core=domain url=http://10.36.9.70:11000/solr START replicas=[http://mlim:11000/solr/domain/] nUpdates=100
2015-06-04 14:26:10.121 -0700 (,,,) coreZkRegister-1-thread-3 : INFO org.apache.solr.update.PeerSync - PeerSync: core=domain url=http://10.36.9.70:11000/solr DONE. We have no versions. sync failed.
2015-06-04 14:26:10.121 -0700 (,,,) coreZkRegister-1-thread-3 : INFO org.apache.solr.cloud.ShardLeaderElectionContext - We failed sync, but we have no versions - we can't sync in that case - we were active before, so become leader anyway
2015-06-04 14:26:10.121 -0700 (,,,) coreZkRegister-1-thread-3 : INFO org.apache.solr.cloud.ShardLeaderElectionContext - I am the new leader: http://10.36.9.70:11000/solr/domain/ shard1
2015-06-04 14:26:11.153 -0700 (,,,) coreZkRegister-1-thread-3 : INFO org.apache.solr.cloud.ZkController - No LogReplay needed for core=domain baseURL=http://10.36.9.70:11000/solr
2015-06-04 14:26:11.153 -0700 (,,,) coreZkRegister-1-thread-3 : INFO org.apache.solr.cloud.ZkController - I am the leader, no recovery necessary

This seems like a fairly common scenario, so I suspect that either I am doing something incorrectly or I have an incorrect assumption about how this is supposed to work. Does anyone have any suggestions?

Thanks,
Mike