Hi Jeff,

Leader election relies on ephemeral nodes in ZooKeeper to detect when the leader or other nodes have gone down abruptly. These ephemeral nodes are automatically deleted by ZooKeeper after the ZK session timeout, which is 30 seconds by default. So if you kill a node, it can take up to 30 seconds for the cluster to detect the failure and start a new leader election. This is not necessary during a graceful shutdown, because on shutdown the node gives up its leader position so that a new leader can be elected immediately.

You could tune the ZK session timeout to a lower value, but that makes the cluster more sensitive to GC pauses, which can also trigger new leader elections.
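For reference, the session timeout is controlled by zkClientTimeout, which can be set in the solrcloud section of solr.xml (this shape is from later Solr versions and may differ in 4.7, so treat it as a sketch rather than a drop-in config):

```xml
<solr>
  <solrcloud>
    <!-- ZK session timeout in milliseconds; default is 30000 (30 seconds).
         Lowering it speeds up dead-node detection, but long GC pauses can
         then exceed the timeout and trigger spurious leader elections. -->
    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
  </solrcloud>
</solr>
```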
On Mon, Sep 21, 2015 at 5:55 PM, Jeff Wu <wuhai...@gmail.com> wrote:

> Our environment still runs Solr 4.7. Recently we noticed in a test: when
> we stopped one Solr server (solr02, via an OS shutdown), all the cores of
> solr02 were shown as "down", but a few cores still remained leaders. After
> that, all the other servers kept sending requests to the downed server,
> and we therefore saw many TCP waiting threads in the thread pools of the
> other Solr servers, since solr02 was already down.
>
> "shard53":{
>   "range":"26660000-2998ffff",
>   "state":"active",
>   "replicas":{
>     "core_node102":{
>       "state":"down",
>       "base_url":"https://solr02.myhost/solr",
>       "core":"collection2_shard53_replica1",
>       "node_name":"https://solr02.myhost_solr",
>       "leader":"true"},
>     "core_node104":{
>       "state":"active",
>       "base_url":"https://solr04.myhost/solr",
>       "core":"collection2_shard53_replica2",
>       "node_name":"https://solr04.myhost/solr_solr"}}},
>
> Is this a known bug in 4.7 that was fixed later? Is there a reference JIRA
> issue we can study? If the Solr service is stopped gracefully, we see the
> leader election happen and the leader role switch to another active core.
> But if we shut down the Solr host's OS directly, we can reproduce in our
> environment that some "down" cores remain "leader" in the ZK
> clusterstate.json.

-- 
Regards,
Shalin Shekhar Mangar.