Peer Sync fails when newly added node is elected leader.

Michael Roberts Thu, 04 Jun 2015 16:31:39 -0700

Hi,

I am seeing some unexpected behavior when adding a new machine to my cluster. I 
am running 4.10.3.


My setup has multiple collections, and each collection has a single shard. I am 
using core auto discovery on the hosts (my deployment mechanism ensures that 
the directory structure is created and the core.properties file is in the right 
place).

To add a new machine I have to stop the cluster.

If I add a new machine and start the cluster, and this new machine is elected 
leader for the shard, peer sync fails. I then have a leader with no content 
and replicas with content, so depending on where a read request is sent, I 
may or may not get the response I am expecting.

2015-06-04 14:26:09.595 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  org.apache.solr.cloud.ShardLeaderElectionContext - Running the leader process for shard shard1
2015-06-04 14:26:09.607 -0700 (,,,) coreZkRegister-1-thread-9 : INFO  org.apache.solr.cloud.ShardLeaderElectionContext - Waiting until we see more replicas up for shard shard1: total=2 found=1 timeoutin=1.14707356E15ms
2015-06-04 14:26:10.108 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  org.apache.solr.cloud.ShardLeaderElectionContext - Enough replicas found to continue.
2015-06-04 14:26:10.108 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  org.apache.solr.cloud.ShardLeaderElectionContext - I may be the new leader - try and sync
2015-06-04 14:26:10.115 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  org.apache.solr.update.PeerSync - PeerSync: core=domain url=http://10.36.9.70:11000/solr START replicas=[http://mlim:11000/solr/domain/] nUpdates=100
2015-06-04 14:26:10.121 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  org.apache.solr.update.PeerSync - PeerSync: core=domain url=http://10.36.9.70:11000/solr DONE.  We have no versions.  sync failed.
2015-06-04 14:26:10.121 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  org.apache.solr.cloud.ShardLeaderElectionContext - We failed sync, but we have no versions - we can't sync in that case - we were active before, so become leader anyway
2015-06-04 14:26:10.121 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  org.apache.solr.cloud.ShardLeaderElectionContext - I am the new leader: http://10.36.9.70:11000/solr/domain/ shard1
2015-06-04 14:26:11.153 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  org.apache.solr.cloud.ZkController - No LogReplay needed for core=domain baseURL=http://10.36.9.70:11000/solr
2015-06-04 14:26:11.153 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  org.apache.solr.cloud.ZkController - I am the leader, no recovery necessary
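
One way to confirm the mismatch described above (a leader with no content, 
replicas with content) is to query each core directly with distrib=false, so 
that the request is answered by that core alone, and compare the document 
counts. The following is only a minimal sketch: the helper names (count_url, 
fetch_count, find_mismatches) are made up for illustration, and it assumes 
the cores are reachable at the URLs that appear in the log and that the 
default /select handler returns JSON with response.numFound.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def count_url(core_url):
    """Build a match-all query URL that is answered by this core only."""
    # distrib=false keeps Solr from fanning the request out to other cores.
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json", "distrib": "false"})
    return core_url.rstrip("/") + "/select?" + params


def fetch_count(core_url, timeout=10):
    """Return numFound for a single core (performs an HTTP request)."""
    with urlopen(count_url(core_url), timeout=timeout) as resp:
        return json.load(resp)["response"]["numFound"]


def find_mismatches(counts):
    """Return the cores whose count differs from the largest count seen."""
    if not counts:
        return {}
    expected = max(counts.values())
    return {core: n for core, n in counts.items() if n != expected}


def check_cores(core_urls):
    """Fetch the count from every core and report the ones that lag behind."""
    counts = {url: fetch_count(url) for url in core_urls}
    return find_mismatches(counts)
```

Calling check_cores(["http://10.36.9.70:11000/solr/domain", 
"http://mlim:11000/solr/domain"]) would return the cores whose count is below 
the largest one. An empty new leader would show up there with a count of 0.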

This seems like a fairly common scenario, so I suspect that either I am doing 
something incorrectly or I have an incorrect assumption about how this is 
supposed to work.

Does anyone have any suggestions?

Thanks

Mike.
