[
https://issues.apache.org/jira/browse/KAFKA-16106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Jacot resolved KAFKA-16106.
---------------------------------
Fix Version/s: 4.0.0
Assignee: Dongnuo Lyu (was: Jeff Kim)
Resolution: Fixed
> group size counters do not reflect the actual sizes when operations fail
> ------------------------------------------------------------------------
>
> Key: KAFKA-16106
> URL: https://issues.apache.org/jira/browse/KAFKA-16106
> Project: Kafka
> Issue Type: Sub-task
> Reporter: Jeff Kim
> Assignee: Dongnuo Lyu
> Priority: Major
> Fix For: 4.0.0
>
>
> An expire-group-metadata operation generates tombstone records, updates the
> `groups` state, decrements the group size counters, and then writes to the
> log. If a __consumer_offsets partition reassignment occurs, the operation
> fails. The `groups` state is reverted to an earlier snapshot, but the classic
> group size counters are not, so the metrics drift from the actual group
> sizes. The same applies to every unsuccessful write operation that alters
> the `groups` state.
>
> The problem compounds because the expire-group-metadata operation can be
> retried multiple times until the partition is fully unloaded, and each failed
> attempt decrements the counters again.
>
> The solution is to make the counters a timeline data structure as well
> (TimelineLong), so that a failed write operation also reverts the counters.
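
The revert behaviour described above can be illustrated with a minimal,
self-contained sketch. It does not use Kafka's actual TimelineLong or
SnapshotRegistry classes; `RevertibleCounter`, `expireWithFailedWrite`, and the
snapshot id 100 are hypothetical names and values chosen only for this
illustration, and the failed log write is simulated with a constant.

```java
import java.util.TreeMap;

/** Hypothetical stand-in for a timeline counter; NOT Kafka's TimelineLong. */
final class RevertibleCounter {
    private long value;
    // Snapshot id (for example a log offset) -> counter value captured at that point.
    private final TreeMap<Long, Long> snapshots = new TreeMap<>();

    long get() { return value; }
    void increment() { value++; }
    void decrement() { value--; }

    /** Records the current value under the given snapshot id. */
    void snapshot(long id) { snapshots.put(id, value); }

    /**
     * Restores the value recorded at the given snapshot id. The snapshot itself
     * is kept so that a retried operation can revert to it again; only later
     * snapshots are dropped.
     */
    void revertTo(long id) {
        Long saved = snapshots.get(id);
        if (saved == null) {
            throw new IllegalArgumentException("No snapshot with id " + id);
        }
        value = saved;
        snapshots.tailMap(id, false).clear();
    }
}

final class TimelineCounterSketch {
    /**
     * Simulates an expire operation whose log write fails: the counter is
     * decremented first, then rolled back to the last snapshot.
     */
    static long expireWithFailedWrite(RevertibleCounter groupCount, long lastSnapshotId) {
        groupCount.decrement();              // in-memory state changes before the write
        boolean writeSucceeded = false;      // assumption: the write always fails here
        if (!writeSucceeded) {
            groupCount.revertTo(lastSnapshotId);
        }
        return groupCount.get();
    }

    public static void main(String[] args) {
        RevertibleCounter groupCount = new RevertibleCounter();
        groupCount.increment();
        groupCount.increment();
        groupCount.increment();
        groupCount.snapshot(100L);           // state as of the last successful write

        // Three failed attempts, as in the retry scenario described above.
        for (int attempt = 0; attempt < 3; attempt++) {
            expireWithFailedWrite(groupCount, 100L);
        }
        System.out.println(groupCount.get()); // prints 3
    }
}
```

With a plain long field and no revert, the same three failed attempts would
leave the counter at 0 while three groups still exist, which is the drift the
issue describes.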