Hi Mike,

This is a bug that slows down cluster restarts; it was fixed in Solr 4.10.3 via http://issues.apache.org/jira/browse/SOLR-6610. Since you have a single-node cluster, you will run into it on every restart.
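To make the sequence described in your mail concrete, here is a rough, self-contained sketch (plain Java with made-up names such as StartupOrderSketch and allDownSeen; this is not the actual ZkController/Overseer code): the node publishes its down states, waits with a timeout for the cluster state to reflect them, and only afterwards starts the component that would process them.

import java.util.concurrent.*;

public class StartupOrderSketch {
    // Stand-in for the Overseer's state-update queue in ZooKeeper.
    static final BlockingQueue<String> stateQueue = new LinkedBlockingQueue<>();
    // Stand-in for "cluster state shows all local cores as down".
    static final CountDownLatch allDownSeen = new CountDownLatch(1);

    public static void main(String[] args) throws Exception {
        // 1. Publish a "down" state for each local core (offered to the Overseer).
        stateQueue.offer("core1=down");

        // 2. Wait, with a timeout, for the cluster state to reflect the change.
        //    Nothing is consuming the queue yet, so this waits the full timeout.
        boolean seen = allDownSeen.await(5, TimeUnit.SECONDS); // 60s in Solr
        System.out.println(seen ? "Saw all cores down"
                : "Timed out waiting to see all nodes published as DOWN");

        // 3. Only now does this (single) node start its Overseer, which would
        //    have consumed the queue and updated the cluster state.
        ExecutorService overseer = Executors.newSingleThreadExecutor();
        overseer.submit(() -> {
            while (stateQueue.poll() != null) {
                allDownSeen.countDown();
            }
        });
        overseer.shutdown();
    }
}

Since nothing is consuming the queued state updates during the wait on a single-node cluster, the wait always runs the full timeout, which is the warning you see in your log.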
On Thu, Jan 22, 2015 at 6:42 PM, Michael Roberts <mrobe...@tableau.com> wrote:
> Hi,
>
> I'm seeing some odd behavior that I am hoping someone could explain to me.
>
> The configuration I'm using to repro the issue has a ZK cluster and a
> single Solr instance. The instance has 10 cores, and none of the cores are
> sharded.
>
> The initial startup is fine; the Solr instance comes up and we build our
> index. However, if the Solr instance exits uncleanly (killed rather than
> sent a SIGINT), the next time it starts I see the following in the logs:
>
> 2015-01-22 09:56:23.236 -0800 (,,,) localhost-startStop-1 : INFO  org.apache.solr.common.cloud.ZkStateReader - Updating cluster state from ZooKeeper...
> 2015-01-22 09:56:30.008 -0800 (,,,) localhost-startStop-1-EventThread : DEBUG org.apache.solr.common.cloud.SolrZkClient - Submitting job to respond to event WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes
> 2015-01-22 09:56:30.008 -0800 (,,,) zkCallback-2-thread-1 : DEBUG org.apache.solr.common.cloud.ZkStateReader - Updating live nodes... (0)
> 2015-01-22 09:57:24.102 -0800 (,,,) localhost-startStop-1 : WARN  org.apache.solr.cloud.ZkController - Timed out waiting to see all nodes published as DOWN in our cluster state.
> 2015-01-22 09:57:24.102 -0800 (,,,) localhost-startStop-1 : INFO  org.apache.solr.cloud.ZkController - Register node as live in ZooKeeper:/live_nodes/10.18.8.113:11000_solr
>
> My question is about "Timed out waiting to see all nodes published as DOWN
> in our cluster state."
>
> From a cursory look at the code, we seem to iterate through all
> Collections/Shards and mark the state as Down. These notifications are
> offered to the Overseer, which I believe updates the ZK state. We then wait
> for the ZK state to update, with a 60-second timeout.
>
> However, it looks like the Overseer is not started until after we wait for
> the timeout. So, in a single-instance scenario we'll always have to wait
> for the timeout.
>
> Is this the expected behavior (and just a side effect of running a single
> instance in cloud mode), or is my understanding of the Overseer/ZK
> relationship incorrect?
>
> Thanks.
>
> .Mike
>

--
Regards,
Shalin Shekhar Mangar.