Hi Mike,

This is a bug that slows down cluster restarts; it was fixed in Solr 4.10.3 via http://issues.apache.org/jira/browse/SOLR-6610. Since you have a single-node cluster, you will run into it on every restart.
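To make the sequence described in your mail concrete, here is a rough, self-contained sketch (plain Java with made-up names such as StartupOrderSketch and allDownSeen; this is not the actual ZkController/Overseer code): the node publishes its down states, waits with a timeout for the cluster state to reflect them, and only afterwards starts the component that would process them.

import java.util.concurrent.*;

public class StartupOrderSketch {
    // Stand-in for the Overseer's state-update queue in ZooKeeper.
    static final BlockingQueue<String> stateQueue = new LinkedBlockingQueue<>();
    // Stand-in for "cluster state shows all local cores as down".
    static final CountDownLatch allDownSeen = new CountDownLatch(1);

    public static void main(String[] args) throws Exception {
        // 1. Publish a "down" state for each local core (offered to the Overseer).
        stateQueue.offer("core1=down");

        // 2. Wait, with a timeout, for the cluster state to reflect the change.
        //    Nothing is consuming the queue yet, so this waits the full timeout.
        boolean seen = allDownSeen.await(5, TimeUnit.SECONDS); // 60s in Solr
        System.out.println(seen ? "Saw all cores down"
                : "Timed out waiting to see all nodes published as DOWN");

        // 3. Only now does this (single) node start its Overseer, which would
        //    have consumed the queue and updated the cluster state.
        ExecutorService overseer = Executors.newSingleThreadExecutor();
        overseer.submit(() -> {
            while (stateQueue.poll() != null) {
                allDownSeen.countDown();
            }
        });
        overseer.shutdown();
    }
}

Since nothing is consuming the queued state updates during the wait on a single-node cluster, the wait always runs the full timeout, which is the warning you see in your log.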
On Thu, Jan 22, 2015 at 6:42 PM, Michael Roberts <mrobe...@tableau.com> wrote:
> Hi,
>
> I'm seeing some odd behavior that I am hoping someone could explain to me.
>
> The configuration I'm using to repro the issue has a ZK cluster and a
> single Solr instance. The instance has 10 cores, and none of the cores are
> sharded.
>
> The initial startup is fine; the Solr instance comes up and we build our
> index. However, if the Solr instance exits uncleanly (killed rather than
> sent a SIGINT), the next time it starts I see the following in the logs:
>
> 2015-01-22 09:56:23.236 -0800 (,,,) localhost-startStop-1 : INFO  org.apache.solr.common.cloud.ZkStateReader - Updating cluster state from ZooKeeper...
> 2015-01-22 09:56:30.008 -0800 (,,,) localhost-startStop-1-EventThread : DEBUG org.apache.solr.common.cloud.SolrZkClient - Submitting job to respond to event WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes
> 2015-01-22 09:56:30.008 -0800 (,,,) zkCallback-2-thread-1 : DEBUG org.apache.solr.common.cloud.ZkStateReader - Updating live nodes... (0)
> 2015-01-22 09:57:24.102 -0800 (,,,) localhost-startStop-1 : WARN  org.apache.solr.cloud.ZkController - Timed out waiting to see all nodes published as DOWN in our cluster state.
> 2015-01-22 09:57:24.102 -0800 (,,,) localhost-startStop-1 : INFO  org.apache.solr.cloud.ZkController - Register node as live in ZooKeeper:/live_nodes/10.18.8.113:11000_solr
>
> My question is about "Timed out waiting to see all nodes published as DOWN
> in our cluster state."
>
> From a cursory look at the code, we seem to iterate through all
> Collections/Shards and mark the state as Down. These notifications are
> offered to the Overseer, which I believe updates the ZK state. We then wait
> for the ZK state to update, with a 60-second timeout.
>
> However, it looks like the Overseer is not started until after we wait for
> the timeout. So, in a single-instance scenario we'll always have to wait
> for the timeout.
>
> Is this the expected behavior (and just a side effect of running a single
> instance in cloud mode), or is my understanding of the Overseer/ZK
> relationship incorrect?
>
> Thanks.
>
> .Mike
>

--
Regards,
Shalin Shekhar Mangar.