[ https://issues.apache.org/jira/browse/KAFKA-16185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lianet Magrans updated KAFKA-16185:
-----------------------------------
Labels: client-transitions-issues (was: )
> Fix client reconciliation of same assignment received in different epochs
> --------------------------------------------------------------------------
>
> Key: KAFKA-16185
> URL: https://issues.apache.org/jira/browse/KAFKA-16185
> Project: Kafka
> Issue Type: Sub-task
> Components: clients, consumer
> Reporter: Lianet Magrans
> Assignee: Lianet Magrans
> Priority: Major
> Labels: client-transitions-issues
>
> Currently, the intention in the client state machine is that the client
> always reconciles whatever it has pending that has not been removed by the
> coordinator.
> There is still an edge case where this does not happen and the client can
> get stuck JOINING/RECONCILING: it has a pending (delayed) reconciliation and
> then receives the same assignment in a new epoch (e.g. after being FENCED).
> The first time it receives the assignment it takes no action, because it
> already has that assignment pending reconciliation. When the reconciliation
> completes, however, it discards the result because the epoch changed, which
> is wrong. Note that after sending the assignment with the new epoch once,
> the broker continues to send null assignments.
> Here is a sample sequence leading to the client stuck JOINING:
> - client joins, epoch 0
> - client receives assignment tp1, stuck RECONCILING, epoch 1
> - member gets FENCED on the coord, coord bumps epoch to 2
> - client tries to rejoin (JOINING), epoch 0 provided by the client
> - new member added to the group (group epoch bumped to 3), client receives
> the same assignment it is currently trying to reconcile (tp1), but with epoch 3
> - previous reconciliation completes, but the client discards the result
> because it notices that the memberHasRejoined (memberEpochOnReconciliationStart
> != memberEpoch). Client is stuck JOINING, with the server sending a null
> target assignment because it hasn't changed since the last one sent (tp1)
> (We should end up with a test similar to the existing
> #testDelayedReconciliationResultDiscardedIfMemberRejoins, but covering the
> case where the member receives the same assignment after being fenced and
> rejoining.)
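
The sequence above can be replayed with a minimal Java sketch. This is not
the Kafka client code: the class, fields and methods below are hypothetical
stand-ins, and the model assumes (per the description) that the pending
assignment survives fencing and that a null assignment means "unchanged".
The "assignment check" branch is only one possible alternative to the epoch
comparison, shown for contrast, not necessarily the fix that should be applied.

```java
import java.util.Set;

// Hypothetical, simplified model of the member state machine described in
// this issue. Names are illustrative and do not match the real client classes.
public class ReconciliationSketch {
    enum State { JOINING, RECONCILING, STABLE }

    static final class Member {
        private final boolean discardOnEpochChange;
        State state = State.JOINING;
        int memberEpoch = 0;
        Set<String> target = Set.of();   // latest assignment received
        Set<String> inFlight = null;     // assignment being reconciled (delayed)
        int epochOnStart = -1;           // member epoch when reconciliation began
        Set<String> owned = Set.of();

        Member(boolean discardOnEpochChange) {
            this.discardOnEpochChange = discardOnEpochChange;
        }

        void onHeartbeat(int epoch, Set<String> assignment) {
            memberEpoch = epoch;
            // null means "unchanged"; an assignment equal to the pending one
            // triggers no new reconciliation.
            if (assignment == null || assignment.equals(target)) return;
            target = assignment;
            inFlight = assignment;
            epochOnStart = epoch;
            state = State.RECONCILING;
        }

        void onFenced() {
            // Assumption: the member rejoins with epoch 0 while the delayed
            // reconciliation is still in flight.
            memberEpoch = 0;
            state = State.JOINING;
        }

        void completeReconciliation() {
            if (inFlight == null) return;
            boolean stale = discardOnEpochChange
                    ? epochOnStart != memberEpoch   // behaviour described in this issue
                    : !inFlight.equals(target);     // hypothetical alternative check
            Set<String> done = inFlight;
            inFlight = null;
            if (stale) return;                      // result discarded
            owned = done;
            state = State.STABLE;
        }
    }

    // Replays the sample sequence and returns the state the member ends in.
    static State replay(boolean discardOnEpochChange) {
        Member m = new Member(discardOnEpochChange);
        m.onHeartbeat(1, Set.of("tp1"));   // receives tp1, epoch 1, RECONCILING
        m.onFenced();                      // fenced, rejoins with epoch 0
        m.onHeartbeat(3, Set.of("tp1"));   // same assignment, new epoch 3
        m.completeReconciliation();        // delayed reconciliation finishes
        m.onHeartbeat(3, null);            // server now sends null (unchanged)
        return m.state;
    }

    public static void main(String[] args) {
        System.out.println("epoch check:      " + replay(true));
        System.out.println("assignment check: " + replay(false));
    }
}
```

In this model, replay(true) ends in JOINING because the result is discarded
and no further assignment arrives, while replay(false) reaches STABLE.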