Hi,

I have a SolrCloud cluster with 3 nodes running on AWS. My collection was
created with numShards=1 and replicationFactor=3. Recently, for a stress
test, our ops team cloned a new machine with exactly the same configuration
as one of the nodes in the existing cluster (let's say the new machine is
node4 and the node it was cloned from is node1).
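
For reference, the collection was created roughly like this (the config set
name below is just a placeholder):

    curl "http://node1:8983/solr/admin/collections?action=CREATE&name=items&numShards=1&replicationFactor=3&collection.configName=items_conf"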


However, after I mistakenly started node4 (it was supposed to run in
standalone mode; I just forgot to remove the configuration regarding
ZooKeeper), I could see that node4 had taken the place of node1 in the
Admin UI. Then I found that the directory 'items_shard1_replica_n1' under
'../solr/server/solr/' no longer existed on node1; instead, the directory
was now on node4.
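
By "the configuration regarding ZooKeeper" I mean the ZK_HOST setting in
solr.in.sh, which the clone inherited from node1 (hosts below are
illustrative):

    # solr.in.sh, inherited from node1; should have been removed or
    # commented out so that node4 would start in standalone mode
    ZK_HOST="zk1:2181,zk2:2181,zk3:2181/solr"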


I tried stopping Solr on node4 and restarting Solr on node1, but to no
avail. It seems node1 couldn't rejoin the cluster automatically. I also
found that even when I started Solr on node4 again, its status stayed
'Down' and never became 'Recovering', while the rest of the nodes in the
cluster were 'Active'.
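
I was watching the replica state in the Admin UI; the same information can
also be pulled from the Collections API, e.g.:

    curl "http://node1:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=items"

There the replica kept reporting state 'down' and never moved to
'recovering'.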

So the final solution was to copy the directory 'items_shard1_replica_n1'
from node4 back to node1 and restart Solr on node1. After that, node1
rejoined the cluster automatically and everything seems fine.
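
Concretely, the steps were roughly the following (the destination path is
abbreviated, as above, and the ZooKeeper hosts are the illustrative ones
from earlier):

    # on node4: stop Solr, then copy the replica directory back to node1
    bin/solr stop -all
    scp -r server/solr/items_shard1_replica_n1 node1:.../solr/server/solr/

    # on node1: start Solr in cloud mode again
    bin/solr start -c -z zk1:2181,zk2:2181,zk3:2181/solr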


My question is: why did this happen? And is there any documentation on how
SolrCloud manages the cluster behind the scenes?


Thanks,
Teddie



