[
https://issues.apache.org/jira/browse/KAFKA-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705560#comment-16705560
]
Guozhang Wang commented on KAFKA-4748:
--------------------------------------
Just a heads-up: we are working on KIP-345, which should resolve this issue. cc
[~bchen225242]
> Need a way to shutdown all workers in a Streams application at the same time
> ----------------------------------------------------------------------------
>
> Key: KAFKA-4748
> URL: https://issues.apache.org/jira/browse/KAFKA-4748
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 0.10.1.1
> Reporter: Elias Levy
> Priority: Major
>
> If you have a fleet of Stream workers for an application and attempt to shut
> them down simultaneously (e.g. via SIGTERM and
> Runtime.getRuntime().addShutdownHook() and streams.close()), a large number
> of the workers fail to shut down.
> The problem appears to be a race condition between the shutdown signal and
> the consumer rebalancing that is triggered by some of the workers exiting
> before others. Workers that receive the signal later apparently fail to
> exit because they are caught in the rebalance.
> Terminating workers in a rolling fashion is not advisable in some situations.
> A rolling shutdown results in many unnecessary rebalances and may
> fail, as the application may have a large amount of local state that a smaller
> number of nodes may not be able to store.
> It would appear that there is a need for a protocol change to allow the
> coordinator to signal a consumer group to shut down without triggering a
> rebalance.
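
For readers unfamiliar with the shutdown path described in the report, here is a
minimal sketch of a SIGTERM-triggered shutdown hook that calls close() on a
worker. It is an illustration, not code from the issue and not a fix for the
rebalance race. The Worker interface is a hypothetical stand-in for the Streams
client, and closeWithTimeout() and the 30-second limit are assumptions made for
this example. The sketch only bounds how long the hook waits when close() is
stuck.

```java
import java.time.Duration;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class ShutdownHookSketch {

    /** Hypothetical stand-in for the Streams client; only close() matters here. */
    interface Worker {
        void close();
    }

    /**
     * Runs worker.close() on a separate thread and waits up to the timeout.
     * Returns true if close() finished in time, false if it was still blocked
     * (or if the waiting thread was interrupted).
     */
    static boolean closeWithTimeout(Worker worker, Duration timeout) {
        CountDownLatch closed = new CountDownLatch(1);
        Thread closer = new Thread(() -> {
            worker.close();
            closed.countDown();
        }, "worker-closer");
        closer.setDaemon(true); // a blocked close() should not keep the JVM alive
        closer.start();
        try {
            return closed.await(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: a real application would pass its Streams instance here.
        Worker worker = () -> System.out.println("closing worker");

        // The hook runs when the JVM begins shutting down, e.g. on SIGTERM.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            boolean clean = closeWithTimeout(worker, Duration.ofSeconds(30));
            if (!clean) {
                System.err.println("close() did not finish within the timeout");
            }
        }, "shutdown-hook"));

        System.out.println("worker running; shutdown hook registered");
    }
}
```

A worker caught in a rebalance would make closeWithTimeout() return false here
rather than hang the hook indefinitely. The race itself remains, which is why
the report asks for a protocol-level change.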