[ https://issues.apache.org/jira/browse/KAFKA-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Varsha Abhinandan updated KAFKA-8673:
-------------------------------------
Description:
We observed a deadlock-like situation in our Kafka Streams application after
we accidentally shut down all the brokers. The Kafka cluster was brought back
up after about an hour.
Observations:
# Regular Kafka producers and consumers worked fine once the brokers were back
up.
# The Kafka Streams applications were stuck in the "rebalancing" state.
# The Kafka Streams applications have exactly-once semantics enabled.
# The stack traces showed most of the stream threads sending JoinGroup requests
to the group coordinator.
# A few stream threads could not initiate the JoinGroup request because their
call to
[org.apache.kafka.clients.producer.KafkaProducer#sendOffsetsToTransaction|https://jira.corp.appdynamics.com/browse/ANLYTCS_ES-2062#sendOffsetsToTransaction%20which%20was%20hung]
was stuck.
# It appears that the JoinGroup requests were being parked at the coordinator
because the expected members had not sent their own JoinGroup requests.
# After the timeout, the stream threads that were not stuck sent new JoinGroup
requests.
# Possibly (6) and (7) repeat indefinitely.
# Sample values of the GroupMetadata object on the group coordinator:
!Screen Shot 2019-07-11 at 12.08.09 PM.png|width=319,height=53!
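The hypothesis in items (5) to (8) can be illustrated with a toy model. This is not Kafka's GroupCoordinator. It only assumes, as the observations above suggest, that a join round completes once every expected member has sent a JoinGroup request, and that members that are not blocked simply rejoin after each timeout. The class name JoinRoundSketch, the method completingRound and the blockedUntil map are hypothetical, introduced only for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class JoinRoundSketch {

    /**
     * Toy model (hypothetical, not Kafka's GroupCoordinator): in each round the
     * coordinator parks JoinGroup requests until every expected member has joined,
     * otherwise the round times out and a new round starts.
     *
     * blockedUntil maps a member to the first round in which its blocked
     * sendOffsetsToTransaction call has returned; members absent from the map are
     * never blocked. Returns the round in which the join completes, or -1 if it
     * does not complete within maxRounds.
     */
    static int completingRound(List<String> members, Map<String, Integer> blockedUntil, int maxRounds) {
        Set<String> expected = new HashSet<>(members);
        for (int round = 1; round <= maxRounds; round++) {
            Set<String> joined = new HashSet<>();
            for (String member : members) {
                // A member can send JoinGroup only once its blocked call has returned.
                if (blockedUntil.getOrDefault(member, 1) <= round) {
                    joined.add(member);
                }
            }
            if (joined.containsAll(expected)) {
                return round; // every expected member joined, so the rebalance can proceed
            }
            // Otherwise this round times out and the unblocked members rejoin in the next one.
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> members = List.of("StreamThread-1", "StreamThread-2", "StreamThread-3");
        // No thread blocked: prints 1.
        System.out.println(completingRound(members, Map.of(), 10));
        // StreamThread-3's call returns in round 4: prints 4.
        System.out.println(completingRound(members, Map.of("StreamThread-3", 4), 10));
        // StreamThread-3's call never returns: prints -1.
        System.out.println(completingRound(members, Map.of("StreamThread-3", Integer.MAX_VALUE), 10));
    }
}
```

In this model the outcome depends entirely on whether the blocked call ever returns. It does not capture any member-eviction behavior the real coordinator may apply when the rebalance timeout expires.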
> Kafka stream threads stuck while sending offsets to transaction preventing
> join group from completing
> -----------------------------------------------------------------------------------------------------
>
> Key: KAFKA-8673
> URL: https://issues.apache.org/jira/browse/KAFKA-8673
> Project: Kafka
> Issue Type: Bug
> Components: consumer, streams
> Affects Versions: 2.2.0
> Reporter: Varsha Abhinandan
> Priority: Major
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)