[ 
https://issues.apache.org/jira/browse/KAFKA-17115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Egerton updated KAFKA-17115:
----------------------------------
    Description: 
When a dynamic consumer (i.e., one with no group instance ID configured) first 
tries to join a group, the group coordinator normally responds with the 
MEMBER_ID_REQUIRED error, under the assumption that the member will retry soon 
after. During this step, the group coordinator will also generate a new member 
ID for the consumer, include it in the error response for the initial join 
group request, and expect that a member with that ID will participate in future 
rebalances.
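
The first round trip described above can be modeled roughly as follows. This is a toy sketch, not Kafka's coordinator code: the class names (JoinHandshakeSketch, ToyCoordinator, JoinResponse) and the generated ID format are hypothetical, and only the error names MEMBER_ID_REQUIRED and UNKNOWN_MEMBER_ID are taken from the protocol.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Toy model of the coordinator side of the initial JoinGroup exchange.
// All class and field names are hypothetical; this is not Kafka's code.
final class JoinHandshakeSketch {

    static final class JoinResponse {
        final String error;
        final String memberId;

        JoinResponse(String error, String memberId) {
            this.error = error;
            this.memberId = memberId;
        }
    }

    static final class ToyCoordinator {
        // Member IDs the coordinator expects to see in future rebalances.
        final Set<String> expectedMembers = new HashSet<>();

        JoinResponse handleJoin(String memberId) {
            if (memberId.isEmpty()) {
                // First attempt from a dynamic member: generate an ID, remember
                // it, and return it inside the error response.
                String generated = "member-" + UUID.randomUUID();
                expectedMembers.add(generated);
                return new JoinResponse("MEMBER_ID_REQUIRED", generated);
            }
            if (!expectedMembers.contains(memberId)) {
                // Simplification: any ID the coordinator did not hand out is rejected.
                return new JoinResponse("UNKNOWN_MEMBER_ID", "");
            }
            // Retry with the generated ID: the member is admitted.
            return new JoinResponse("NONE", memberId);
        }
    }

    public static void main(String[] args) {
        ToyCoordinator coordinator = new ToyCoordinator();
        JoinResponse first = coordinator.handleJoin("");
        System.out.println(first.error + " -> " + first.memberId);
        JoinResponse retry = coordinator.handleJoin(first.memberId);
        System.out.println(retry.error + " -> " + retry.memberId);
    }
}
```

Note that the generated ID is added to expectedMembers before the client has seen it, which is what leaves the coordinator waiting if the client never retries.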

If a consumer is closed in between the time that it sends the JoinGroup request 
and the time that it receives the response from the group coordinator, it will 
not attempt to leave the group, since it doesn't have a member ID to include in 
a LeaveGroup request.

This will cause future rebalances to hang, since the group coordinator will 
still expect a member with the ID for the now-closed consumer to join. 
Eventually, the group coordinator may remove the closed consumer from the 
group, but with default configuration settings, this could take as long as five 
minutes.

One possible fix is to send a LeaveGroup request with the member ID if the 
consumer receives a JoinGroup response with a member ID after it has been 
closed.
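
A rough illustration of that idea, from the client side, might look like the sketch below. It is a toy model under stated assumptions, not the legacy consumer's actual code: ToyConsumer, its fields, and the string-encoded "requests" are all hypothetical and exist only to show the ordering of events.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the proposed client-side behavior. Names are hypothetical and
// do not correspond to the real consumer internals.
final class LeaveOnLateJoinResponseSketch {

    static final class ToyConsumer {
        private boolean closed = false;
        private String memberId = "";
        // Records what would be sent to the coordinator, in order.
        final List<String> sentRequests = new ArrayList<>();

        void sendJoinGroup() {
            sentRequests.add("JoinGroup(memberId='" + memberId + "')");
        }

        void close() {
            closed = true;
            // Current behavior as described: with no member ID yet, there is
            // nothing to put in a LeaveGroup request, so none is sent.
            if (!memberId.isEmpty()) {
                sentRequests.add("LeaveGroup(memberId='" + memberId + "')");
            }
        }

        void onJoinGroupResponse(String assignedMemberId) {
            if (closed) {
                // Proposed fix: the response arrived after close(), so tell the
                // coordinator right away that this member ID will not be used.
                if (!assignedMemberId.isEmpty()) {
                    sentRequests.add("LeaveGroup(memberId='" + assignedMemberId + "')");
                }
                return;
            }
            memberId = assignedMemberId;
        }
    }

    public static void main(String[] args) {
        ToyConsumer consumer = new ToyConsumer();
        consumer.sendJoinGroup();                 // initial join, no member ID yet
        consumer.close();                         // closed before the response arrives
        consumer.onJoinGroupResponse("member-1"); // late response carrying the new ID
        System.out.println(consumer.sentRequests);
    }
}
```

In this model the LeaveGroup is sent only from the response handler, since that is the first point at which the closed consumer knows the ID the coordinator generated for it.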

This ticket applies only to the legacy consumer. There is a similar issue with 
the new consumer that is tracked separately in KAFKA-17116.

  was:
When a dynamic consumer (i.e., one with no group instance ID configured) first 
tries to join a group, the group coordinator normally responds with the 
MEMBER_ID_REQUIRED error, under the assumption that the member will retry soon 
after. During this step, the group coordinator will also generate a new member 
ID for the consumer, include it in the error response for the initial join 
group request, and expect that a member with that ID will participate in future 
rebalances.

If a consumer is closed in between the time that it sends the JoinGroup request 
and the time that it receives the response from the group coordinator, it will 
not attempt to leave the group, since it doesn't have a member ID to include in 
that request.

This will cause future rebalances to hang, since the group coordinator will 
still expect a member with the ID for the now-closed consumer to join. 
Eventually, the group coordinator may remove the closed consumer from the 
group, but with default configuration settings, this could take as long as five 
minutes.

One possible fix is to send a LeaveGroup response with the member ID if the 
consumer receives a JoinGroup response with a member ID after it has been 
closed.

 

This applies to the legacy consumer; I have not verified yet with the new async 
consumer.


> Closing newly-created consumers during rebalance can cause rebalances to hang
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-17115
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17115
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 3.9.0
>            Reporter: Chris Egerton
>            Assignee: Chris Egerton
>            Priority: Major
>
> When a dynamic consumer (i.e., one with no group instance ID configured) 
> first tries to join a group, the group coordinator normally responds with the 
> MEMBER_ID_REQUIRED error, under the assumption that the member will retry 
> soon after. During this step, the group coordinator will also generate a new 
> member ID for the consumer, include it in the error response for the initial 
> join group request, and expect that a member with that ID will participate in 
> future rebalances.
> If a consumer is closed in between the time that it sends the JoinGroup 
> request and the time that it receives the response from the group 
> coordinator, it will not attempt to leave the group, since it doesn't have a 
> member ID to include in a LeaveGroup request.
> This will cause future rebalances to hang, since the group coordinator will 
> still expect a member with the ID for the now-closed consumer to join. 
> Eventually, the group coordinator may remove the closed consumer from the 
> group, but with default configuration settings, this could take as long as 
> five minutes.
> One possible fix is to send a LeaveGroup request with the member ID if the 
> consumer receives a JoinGroup response with a member ID after it has been 
> closed.
> This ticket applies only to the legacy consumer. There is a similar issue 
> with the new consumer that is tracked separately in KAFKA-17116.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
