[
https://issues.apache.org/jira/browse/HADOOP-12569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133513#comment-15133513
]
Tao Jie commented on HADOOP-12569:
----------------------------------
In our version, we simply force-kill the NameNode process with a shell
command before zkfc quits due to a ZooKeeper connection failure.
Maybe we should add a STOP command to HAServiceProtocol?
Any thoughts?
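
For illustration only, here is a minimal sketch of a "kill before quit" step
like the one described above. It is not the actual patch and not ZKFC code.
The class name NameNodeKiller, the pid-file argument and the 5-second timeout
are all hypothetical; the sketch assumes a POSIX `kill` binary on the PATH and
a pid file containing only the NameNode pid.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Hadoop: force-kills a process whose pid
// is stored in a pid file, meant to be called right before the caller exits.
final class NameNodeKiller {

    // Parse the pid from the pid file content (surrounding whitespace allowed).
    static long parsePid(String pidFileContent) {
        return Long.parseLong(pidFileContent.trim());
    }

    // Build the shell command; assumes a POSIX `kill` is available.
    static List<String> killCommand(long pid) {
        return List.of("kill", "-9", Long.toString(pid));
    }

    // Run `kill -9 <pid>`; returns true only if the command exited with 0
    // within the timeout.
    static boolean forceKill(Path pidFile, long timeoutSeconds)
            throws IOException, InterruptedException {
        long pid = parsePid(Files.readString(pidFile));
        Process kill = new ProcessBuilder(killCommand(pid)).inheritIO().start();
        if (!kill.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            kill.destroyForcibly(); // the kill command itself hung
            return false;
        }
        return kill.exitValue() == 0;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: NameNodeKiller <namenode-pid-file>");
            System.exit(2);
        }
        // The 5-second timeout is an arbitrary choice for this sketch.
        System.exit(forceKill(Path.of(args[0]), 5) ? 0 : 1);
    }
}
```

A STOP command in HAServiceProtocol would presumably replace the shell call
with an RPC, but that only works while the NameNode still responds, so a
local kill like this may still be needed as a fallback.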
> ZKFC should stop namenode before itself quit in some circumstances
> ------------------------------------------------------------------
>
> Key: HADOOP-12569
> URL: https://issues.apache.org/jira/browse/HADOOP-12569
> Project: Hadoop Common
> Issue Type: Bug
> Components: ha
> Affects Versions: 2.6.0
> Reporter: Tao Jie
>
> We encountered the following HA scenario:
> NN1 (active) and zkfc1 on node1;
> NN2 (standby) and zkfc2 on node2.
> 1. Stop the network on node1; NN2 becomes active. On node1, zkfc1 kills
> itself since it cannot connect to ZooKeeper, but leaves NN1 still running.
> 2. Several minutes later, the network on node1 recovers. NN1 is still running
> but is no longer controlled by any zkfc. NN1 and NN2 both run as active NN.
> Maybe zkfc should stop the NN before it quits in such circumstances.