Hi,

I am seeing unexpected behavior when adding a new machine to my SolrCloud 
cluster. I am running Solr 4.10.3.

My setup has multiple collections, each with a single shard. I am using core 
auto-discovery on the hosts (my deployment mechanism ensures that the 
directory structure is created and the core.properties file is in the right 
place).

To add a new machine I have to stop the cluster.

When I add a new machine and start the cluster, peer sync (PeerSync) fails if 
the new machine is elected leader for the shard. I then have a leader with no 
content and replicas with content, so depending on where a read request is 
sent, I may or may not get the response I am expecting.
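
One way to make the mismatch visible is to ask each replica for its own 
document count with distrib=false instead of going through the collection. 
The following is only a rough sketch: replica_count_urls is a hypothetical 
helper, and it assumes the JSON layout returned by the Collections API 
CLUSTERSTATUS action (cluster -> collections -> shards -> replicas, with 
"leader" reported as the string "true"). Fetching the URLs and comparing 
numFound is left out.

```python
from urllib.parse import urlencode


def replica_count_urls(cluster_status, collection):
    """Build one non-distributed count query per replica of a collection.

    cluster_status is the parsed JSON of a CLUSTERSTATUS response (layout
    assumed, see above). Returns a list of tuples:
    (shard name, core URL, is_leader, query URL).
    """
    rows = []
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    for shard_name, shard in sorted(shards.items()):
        for replica in shard["replicas"].values():
            core_url = replica["base_url"].rstrip("/") + "/" + replica["core"]
            # distrib=false keeps the query on this core only; rows=0 returns
            # just the hit count (numFound) without any documents.
            params = urlencode(
                {"q": "*:*", "rows": 0, "distrib": "false", "wt": "json"}
            )
            # Assumption: the leader flag is absent on non-leader replicas.
            is_leader = str(replica.get("leader", "false")).lower() == "true"
            rows.append(
                (shard_name, core_url, is_leader, core_url + "/select?" + params)
            )
    return rows
```

If the leader's numFound is 0 while the other replica reports the full count, 
that would match what the log below shows.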

2015-06-04 14:26:09.595 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  
org.apache.solr.cloud.ShardLeaderElectionContext - Running the leader process 
for shard shard1
2015-06-04 14:26:09.607 -0700 (,,,) coreZkRegister-1-thread-9 : INFO  
org.apache.solr.cloud.ShardLeaderElectionContext - Waiting until we see more 
replicas up for shard shard1: total=2 found=1 timeoutin=1.14707356E15ms
2015-06-04 14:26:10.108 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  
org.apache.solr.cloud.ShardLeaderElectionContext - Enough replicas found to 
continue.
2015-06-04 14:26:10.108 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  
org.apache.solr.cloud.ShardLeaderElectionContext - I may be the new leader - 
try and sync
2015-06-04 14:26:10.115 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  
org.apache.solr.update.PeerSync - PeerSync: core=domain 
url=http://10.36.9.70:11000/solr START 
replicas=[http://mlim:11000/solr/domain/] nUpdates=100
2015-06-04 14:26:10.121 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  
org.apache.solr.update.PeerSync - PeerSync: core=domain 
url=http://10.36.9.70:11000/solr DONE.  We have no versions.  sync failed.
2015-06-04 14:26:10.121 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  
org.apache.solr.cloud.ShardLeaderElectionContext - We failed sync, but we have 
no versions - we can't sync in that case - we were active before, so become 
leader anyway
2015-06-04 14:26:10.121 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  
org.apache.solr.cloud.ShardLeaderElectionContext - I am the new leader: 
http://10.36.9.70:11000/solr/domain/ shard1
2015-06-04 14:26:11.153 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  
org.apache.solr.cloud.ZkController - No LogReplay needed for core=domain 
baseURL=http://10.36.9.70:11000/solr
2015-06-04 14:26:11.153 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  
org.apache.solr.cloud.ZkController - I am the leader, no recovery necessary

This seems like a fairly common scenario, so I suspect that either I am doing 
something incorrectly or I have an incorrect assumption about how this is 
supposed to work.

Does anyone have any suggestions?

Thanks

Mike.
