[
https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16185434#comment-16185434
]
Prasanna Gautam commented on KAFKA-5473:
----------------------------------------
[~ijuma] I added a new configuration that's consistent with what [~junrao]
mentioned previously. I have added zookeeper.connection.retry.timeout.ms to
set an upper bound on how long to wait before killing the connection and
triggering the shutdown. This is looking like a bigger structural change than
I'd originally anticipated, so I want to make sure I'm on the right track.
Since ZkUtils is initialized and needs to be closed/reconnected in the
ZKServer object, does it make sense to pass the state of the connection to
KafkaServer so that the timeout can be guaranteed and the services shut down
cleanly?
This is different from other examples in the codebase where ZK is used to
share state. Since this case involves ZK not being available, we need a
different mechanism to inform KafkaServer that it needs to start reconnecting
and then use the ZkUtils instance thereafter. If the reconnect retry timeout
is reached, it should start the shutdown process. IZkStateListener is used in
multiple places in the code, so I think it's easier to add another class,
such as ZKSessionTimeoutRecovery, that only handles reconnects and a clean
exit if they fail.
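To make the proposal concrete, here is a minimal Java sketch of what a class
like ZKSessionTimeoutRecovery could look like. It is an illustration only, not
code from the patch: the constructor parameters, the Outcome enum, the retry
backoff, and the reconnect/shutdown callbacks are all assumptions. In a real
change the reconnect callback would wrap the ZkUtils reconnect, the shutdown
callback would call into KafkaServer, and the timeout would be read from
zookeeper.connection.retry.timeout.ms.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: retries a reconnect until a deadline, then triggers shutdown. */
class ZKSessionTimeoutRecovery {
    enum Outcome { RECONNECTED, SHUTDOWN_TRIGGERED }

    private final long retryTimeoutMs;       // assumed to come from zookeeper.connection.retry.timeout.ms
    private final long retryBackoffMs;       // assumed pause between attempts (not in the original proposal)
    private final BooleanSupplier reconnect; // returns true once a new session is established
    private final Runnable shutdown;         // starts the clean shutdown of the broker

    ZKSessionTimeoutRecovery(long retryTimeoutMs, long retryBackoffMs,
                             BooleanSupplier reconnect, Runnable shutdown) {
        if (retryTimeoutMs < 0 || retryBackoffMs < 0) {
            throw new IllegalArgumentException("timeouts must be non-negative");
        }
        this.retryTimeoutMs = retryTimeoutMs;
        this.retryBackoffMs = retryBackoffMs;
        this.reconnect = reconnect;
        this.shutdown = shutdown;
    }

    /** Tries to reconnect at least once; gives up and runs shutdown when the deadline passes. */
    Outcome recover() {
        long deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(retryTimeoutMs);
        while (true) {
            if (reconnect.getAsBoolean()) {
                return Outcome.RECONNECTED;
            }
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadlineNanos - System.nanoTime());
            if (remainingMs <= 0) {
                break; // upper bound reached without a new session
            }
            try {
                Thread.sleep(Math.min(retryBackoffMs, remainingMs));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt, stop retrying
                break;
            }
        }
        shutdown.run();
        return Outcome.SHUTDOWN_TRIGGERED;
    }
}
```

Keeping the reconnect and shutdown steps behind callbacks means the class does
not need to know about ZkUtils or KafkaServer directly, so the existing
IZkStateListener implementations could stay as they are and delegate to it on
session expiration.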
> handle ZK session expiration properly when a new session can't be established
> -----------------------------------------------------------------------------
>
> Key: KAFKA-5473
> URL: https://issues.apache.org/jira/browse/KAFKA-5473
> Project: Kafka
> Issue Type: Sub-task
> Affects Versions: 0.9.0.0
> Reporter: Jun Rao
> Assignee: Prasanna Gautam
> Fix For: 1.0.0
>
>
> In https://issues.apache.org/jira/browse/KAFKA-2405, we changed the logic in
> handling ZK session expiration a bit. If a new ZK session can't be
> established after session expiration, we just log an error and continue.
> However, this can leave the broker in a bad state since it's up, but not
> registered from the controller's perspective. Replicas on this broker may
> never be in sync.