Hi, I'm seeing some odd behavior that I am hoping someone could explain to me.
The configuration I'm using to repro the issue has a ZK cluster and a single Solr instance. The instance has 10 cores, and none of the cores are sharded. The initial startup is fine: the Solr instance comes up and we build our index. However, if the Solr instance exits uncleanly (killed rather than sent a SIGINT), the next time it starts I see the following in the logs:

2015-01-22 09:56:23.236 -0800 (,,,) localhost-startStop-1 : INFO org.apache.solr.common.cloud.ZkStateReader - Updating cluster state from ZooKeeper...
2015-01-22 09:56:30.008 -0800 (,,,) localhost-startStop-1-EventThread : DEBUG org.apache.solr.common.cloud.SolrZkClient - Submitting job to respond to event WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes
2015-01-22 09:56:30.008 -0800 (,,,) zkCallback-2-thread-1 : DEBUG org.apache.solr.common.cloud.ZkStateReader - Updating live nodes... (0)
2015-01-22 09:57:24.102 -0800 (,,,) localhost-startStop-1 : WARN org.apache.solr.cloud.ZkController - Timed out waiting to see all nodes published as DOWN in our cluster state.
2015-01-22 09:57:24.102 -0800 (,,,) localhost-startStop-1 : INFO org.apache.solr.cloud.ZkController - Register node as live in ZooKeeper:/live_nodes/10.18.8.113:11000_solr

My question is about "Timed out waiting to see all nodes published as DOWN in our cluster state." From a cursory look at the code, we seem to iterate through all collections/shards and mark each state as DOWN. These notifications are offered to the Overseer, which I believe updates the ZK cluster state. We then wait for the ZK state to reflect the change, with a 60-second timeout. However, it looks like the Overseer is not started until after that wait completes, so in a single-instance scenario we'll always have to sit out the full timeout.

Is this the expected behavior (and just a side effect of running a single instance in cloud mode), or is my understanding of the Overseer/ZK relationship incorrect?

Thanks.
.Mike
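P.S. To make sure I'm describing the sequence I mean, here is a tiny standalone Java sketch (not Solr code; the class, queue, and latch names are made up) of the pattern I think I'm seeing: the DOWN-state messages are queued, but nothing consumes the queue until after the waiter has already given up, so with a single node the wait always runs the full timeout.

import java.util.concurrent.*;

public class OverseerWaitSketch {
    public static void main(String[] args) throws Exception {
        // Stand-in for the Overseer's state-update queue in ZK.
        BlockingQueue<String> stateUpdateQueue = new LinkedBlockingQueue<>();
        // Stand-in for the piece of cluster state we are waiting to see change.
        CountDownLatch allCoresMarkedDown = new CountDownLatch(1);

        // Step 1: publish DOWN states for every core (just one message here).
        stateUpdateQueue.offer("core1=DOWN");

        // Step 2: wait for the cluster state to reflect the DOWN states.
        // Nothing is consuming the queue yet, so this always times out
        // (3 seconds here instead of Solr's 60 so the demo finishes quickly).
        boolean sawDownStates = allCoresMarkedDown.await(3, TimeUnit.SECONDS);
        System.out.println(sawDownStates
                ? "Saw DOWN states before timeout"
                : "Timed out waiting to see all nodes published as DOWN");

        // Step 3: only now start the "Overseer", which drains the queue and
        // would have flipped the state we were waiting on in step 2.
        ExecutorService overseer = Executors.newSingleThreadExecutor();
        overseer.submit(() -> {
            String msg;
            while ((msg = stateUpdateQueue.poll()) != null) {
                System.out.println("Overseer processed: " + msg);
                allCoresMarkedDown.countDown();
            }
        });
        overseer.shutdown();
        overseer.awaitTermination(5, TimeUnit.SECONDS);
    }
}

If the real startup order is step 2 before step 3, that would explain the 60-second pause I see on every restart after an unclean shutdown.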