[ https://issues.apache.org/jira/browse/KAFKA-5571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eno Thereska resolved KAFKA-5571.
---------------------------------
Resolution: Fixed
> Possible deadlock during shutdown in setState in kafka streams 10.2
> -------------------------------------------------------------------
>
> Key: KAFKA-5571
> URL: https://issues.apache.org/jira/browse/KAFKA-5571
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 0.10.2.1
> Reporter: Greg Fodor
> Assignee: Eno Thereska
> Attachments: kafka-streams.deadlock.log
>
>
> I'm running a 0.10.2 job across 5 nodes with 32 stream threads on each node,
> and I find that when I gracefully shut down all of them at once via an
> Ansible script, some of the nodes end up freezing. At a glance, the attached
> thread dump implies a deadlock between stream threads trying to update their
> state via setState. We haven't had this problem before, but it may or may not
> be related to changes in 0.10.2 (we are upgrading from 0.10.0 to 0.10.2).
> When we gracefully shut down all nodes simultaneously, what typically happens
> is that some subset of the nodes do not shut down completely but go through
> a rebalance first. It seems this deadlock requires the rebalance to occur
> simultaneously with the graceful shutdown. If we shut them down and no
> rebalance happens, I don't believe this deadlock is triggered.
> The deadlock appears related to the state change handlers being subscribed
> across threads and the fact that StreamThread#setState and
> StreamStateListener#onChange are both synchronized methods.
> Another thing worth mentioning is that one of the transformers used in the
> job has a close() method that can take 10-15 seconds to finish since it needs
> to flush some data to a database. Having a long close() method combined with
> a rebalance during a shutdown across many threads may be necessary for
> reproduction.
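
For readers unfamiliar with this failure mode, here is a minimal sketch of
how two synchronized methods on different objects can deadlock when each is
entered from a different thread and then calls into the other. It is not the
Kafka Streams code: Worker, Listener, currentState, rendezvous and
provokeAndDetect are hypothetical names made up for illustration, and the
latch only exists to force the unlucky interleaving that a rebalance during
shutdown might produce by chance.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class DeadlockSketch {

    /** Hypothetical stand-in for a state listener; NOT the real StreamStateListener. */
    static final class Listener {
        // Synchronized, like the onChange handler described in the report.
        synchronized void onChange(Worker source, Runnable whileHoldingLock) {
            whileHoldingLock.run();   // we now hold the Listener monitor
            source.currentState();    // needs the Worker monitor
        }
    }

    /** Hypothetical stand-in for a stream thread; NOT the real StreamThread. */
    static final class Worker {
        private String state = "RUNNING";

        // Synchronized, like the setState method described in the report.
        synchronized void setState(String newState, Listener listener,
                                   Runnable whileHoldingLock) {
            state = newState;
            whileHoldingLock.run();               // we now hold the Worker monitor
            listener.onChange(this, () -> { });   // needs the Listener monitor
        }

        synchronized String currentState() {
            return state;
        }
    }

    /** Blocks until both threads have entered their synchronized method. */
    static void rendezvous(CountDownLatch bothHoldLocks) {
        bothHoldLocks.countDown();
        try {
            bothHoldLocks.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /**
     * Starts two daemon threads that acquire the two monitors in opposite
     * order, then polls the JVM for monitor deadlocks. Returns the ids of
     * the deadlocked threads, or null if none were found before the timeout.
     */
    static long[] provokeAndDetect(long timeoutMillis) {
        Worker worker = new Worker();
        Listener listener = new Listener();
        CountDownLatch bothHoldLocks = new CountDownLatch(2);
        Runnable meet = () -> rendezvous(bothHoldLocks);

        Thread a = new Thread(
            () -> worker.setState("PENDING_SHUTDOWN", listener, meet), "sketch-worker");
        Thread b = new Thread(
            () -> listener.onChange(worker, meet), "sketch-listener");
        a.setDaemon(true);   // daemon threads let the JVM exit despite the deadlock
        b.setDaemon(true);
        a.start();
        b.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null) {
                return ids;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        long[] ids = provokeAndDetect(5000);
        System.out.println(ids == null
            ? "no deadlock detected"
            : "deadlocked threads: " + ids.length);
    }
}
```

In this toy model, a slow step executed while one of the monitors is held
(for example a long close()) would simply widen the window in which the
other thread can take the second monitor. Whether that matches the actual
lock cycle in this issue would need to be confirmed against the attached
thread dump.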
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)