patsonluk opened a new pull request, #2673:
URL: https://github.com/apache/lucene-solr/pull/2673

   ## Description
   We found that certain data nodes in our prod env have "ghost replicas": 
cores that have a core directory and a `core.properties` file but no data dir. 
A replica with the same name actually resides on a different node, as defined 
in `state.json`. Such "ghost replicas" can cause a `DOWN` replica state to be 
published even though the real replica (with the same name) is healthy on 
another node.
   
   More details of the issue can be found in 
https://app.shortcut.com/fullstory/story/217252/investigate-replica-that-failed-to-come-up-as-active-during-restart-deployment#activity-217734
   
   ## Solution
   While we do not yet know the exact cause of those "ghost replicas" (probably 
from some migration hiccup during c82 creation?), it seems to be a rare 
occurrence now (8 replicas in c82). 
   
   Therefore we will add a new exception `InconsistentClusterStateException`, 
which is thrown from `checkStateInZk` if the node name of a replica defined 
in `state.json` differs from the current node that is trying to spin up the 
core. The exception interrupts core creation, so a `DOWN` state is no longer 
published.
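
   A minimal, standalone sketch of the node-name comparison described above. 
It is not the code in this PR: the class `GhostReplicaCheck`, the method 
`checkReplicaOwner`, the `Map` used in place of `state.json`, and the node 
names are all hypothetical, and the exception here is a simplified stand-in 
for `InconsistentClusterStateException`.

```java
import java.util.Map;

public class GhostReplicaCheck {

    /** Simplified stand-in for the exception named in this PR; the real class may differ. */
    public static class InconsistentClusterStateException extends RuntimeException {
        public InconsistentClusterStateException(String message) {
            super(message);
        }
    }

    /**
     * Compares the node recorded for a replica with the node trying to start it.
     * replicaToNode is a hypothetical model of the replica -> node_name mapping
     * that state.json holds for a collection.
     */
    public static void checkReplicaOwner(Map<String, String> replicaToNode,
                                         String replicaName,
                                         String currentNodeName) {
        String ownerNode = replicaToNode.get(replicaName);
        if (ownerNode == null) {
            // A replica missing from the state is a separate case that this sketch ignores.
            return;
        }
        if (!ownerNode.equals(currentNodeName)) {
            // Fail before any state is published, so the healthy replica is not marked DOWN.
            throw new InconsistentClusterStateException(
                "Replica " + replicaName + " belongs to node " + ownerNode
                    + " but node " + currentNodeName + " tried to start it");
        }
    }

    public static void main(String[] args) {
        Map<String, String> state = Map.of("core_node3", "host-a:8983_solr");

        // Same node as recorded in the state: the check passes silently.
        checkReplicaOwner(state, "core_node3", "host-a:8983_solr");

        // Different node: the check aborts instead of letting the core come up.
        try {
            checkReplicaOwner(state, "core_node3", "host-b:8983_solr");
        } catch (InconsistentClusterStateException e) {
            System.out.println("Skipping core creation: " + e.getMessage());
        }
    }
}
```

   The point of throwing rather than returning a flag is that the caller 
cannot accidentally continue into the code path that publishes a state.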
   
   For now, we will NOT provide a cleanup in the Solr code, as this seems to 
be an edge case and cleanup (i.e. unloading the core and removing the physical 
directory) could be risky. 
   
   Take note that we will probably still need to "clean up" those ghost 
replicas later on, perhaps by manually purging them. 
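
   As an aid for such a manual purge, a read-only scan along the following 
lines could list candidates. This is a sketch under stated assumptions, not 
part of this PR: the class `GhostCoreScanner` is hypothetical, it assumes each 
core directory sits directly under the scanned root, and it assumes the data 
dir is a `data` subdirectory of the core directory (a core with a custom data 
location would be reported incorrectly). It deletes nothing, and each hit 
should still be checked against `state.json` before removal.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GhostCoreScanner {

    /**
     * Returns the direct child directories of coreRoot that contain a
     * core.properties file but no "data" subdirectory (assumed layout).
     */
    public static List<Path> findCandidates(Path coreRoot) throws IOException {
        try (Stream<Path> children = Files.list(coreRoot)) {
            return children
                .filter(Files::isDirectory)
                .filter(dir -> Files.isRegularFile(dir.resolve("core.properties")))
                .filter(dir -> !Files.isDirectory(dir.resolve("data")))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the directory that holds the core directories as the first argument.
        Path coreRoot = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path candidate : findCandidates(coreRoot)) {
            System.out.println("Possible ghost replica: " + candidate);
        }
    }
}
```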


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


