Hey Mark,

Thanks again for your reply.

/*"The way we know that its no longer connected to zookeeper is looking at live_nodes - which are ephemeral and will go away if a node goes away"*/
I am not sure that this is really the case. As far as I remember, even after a node was dead, live_nodes still reported that node as active, /but/ the leader was changed to the one that was /really/ alive.

I had a look at the Overseer's code, and it seems to loop on a FIFO queue, waiting for new state update requests. So if a node is killed, it never sends a state update request, and I guess that's why the state is out of sync. Could we set up a wait time for each known node and declare a node INACTIVE if the Overseer does not hear from that node within the wait time? That would be similar to the heartbeats used in several other systems.
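To make the heartbeat idea concrete, here is a rough sketch of what I have in mind. It is written in Python for brevity and is not based on the actual Overseer code: `HeartbeatMonitor`, `record_heartbeat`, `inactive_nodes`, and the 10-second wait time are all hypothetical names and values chosen for illustration.

```python
import time


class HeartbeatMonitor:
    """Hypothetical sketch: remembers when each known node was last heard from."""

    def __init__(self, wait_time_s, clock=time.monotonic):
        self.wait_time_s = wait_time_s  # how long a node may stay silent
        self._clock = clock             # injectable so the logic can be tested
        self._last_seen = {}            # node name -> time of last heartbeat

    def record_heartbeat(self, node):
        """Called whenever any message (e.g. a state update) arrives from a node."""
        self._last_seen[node] = self._clock()

    def inactive_nodes(self):
        """Return the nodes that have been silent for longer than the wait time."""
        now = self._clock()
        return sorted(
            node
            for node, seen in self._last_seen.items()
            if now - seen > self.wait_time_s
        )


# Usage with a fake clock, so no real waiting is needed.
fake_now = [0.0]
monitor = HeartbeatMonitor(wait_time_s=10.0, clock=lambda: fake_now[0])
monitor.record_heartbeat("node1")
monitor.record_heartbeat("node2")

fake_now[0] = 5.0
monitor.record_heartbeat("node2")   # node2 keeps reporting, node1 was killed

fake_now[0] = 12.0
print(monitor.inactive_nodes())     # node1 has been silent for 12s > 10s
```

The point of the sketch is only that the check is driven by elapsed time rather than by incoming messages, so a node that dies silently is still noticed.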