Jeff Kim created KAFKA-16106:
--------------------------------
Summary: group size counters do not reflect the actual sizes when
operations fail
Key: KAFKA-16106
URL: https://issues.apache.org/jira/browse/KAFKA-16106
Project: Kafka
Issue Type: Sub-task
Reporter: Jeff Kim
Assignee: Jeff Kim
An expire-group-metadata operation generates tombstone records, updates the
`groups` state, decrements the group size counters, and then writes to the
log. If there is a __consumer_offsets partition reassignment, the operation
fails. The `groups` state is reverted to an earlier snapshot, but the classic
group size counters are not, so the metrics diverge from the actual group
sizes. This applies to every unsuccessful write operation that alters the
`groups` state.
The issue is exacerbated because the expire-group-metadata operation is
retried, possibly indefinitely, so the discrepancy can grow with each failed
attempt.
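To make the failure mode concrete, here is a minimal sketch of the sequence described above. It is a simplified model, not Kafka code: the class name, the `HashSet` standing in for the `groups` state, and the hard-coded write failure are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, simplified model of the failure mode; not Kafka code. */
final class DriftingCounterDemo {
    /**
     * Tries to expire a single group `attempts` times while every log write fails.
     * Returns the value of the group size counter afterwards.
     */
    static long counterAfterFailedExpirations(int attempts) {
        Set<String> groups = new HashSet<>();
        groups.add("group-a");
        long groupCount = 1; // plain counter, not covered by the snapshot

        for (int i = 0; i < attempts; i++) {
            Set<String> snapshot = new HashSet<>(groups); // state before the operation
            groups.remove("group-a");       // update the `groups` state
            groupCount--;                   // decrement the group size counter
            boolean writeSucceeded = false; // assumed: the log write always fails here
            if (!writeSucceeded) {
                groups = snapshot;          // the state is reverted, the counter is not
            }
        }
        return groupCount;
    }
}
```

In this model, three failed attempts leave the counter at -2 while `groups` still holds one group.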
The fix is to make the counters a timeline data structure as well
(TimelineLong), so that a failed write operation reverts the counters along
with the `groups` state.
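The idea of a counter that can be rolled back can be sketched as follows. `RevertibleLong` is a hypothetical stand-in written for this example. It does not reproduce the API of Kafka's TimelineLong and only illustrates the snapshot-and-revert behaviour the fix relies on. The hard-coded write failure is likewise an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for a snapshot-aware counter; not Kafka's TimelineLong. */
final class RevertibleLong {
    private long value;
    private final Deque<Long> snapshots = new ArrayDeque<>();

    long get() { return value; }
    void increment() { value++; }
    void decrement() { value--; }

    /** Records the current value so that a later failure can restore it. */
    void snapshot() { snapshots.push(value); }

    /** Restores the most recent snapshot; used when the write fails. */
    void revert() { value = snapshots.pop(); }

    /** Discards the most recent snapshot; used when the write succeeds. */
    void commit() { snapshots.pop(); }
}

final class RevertibleCounterDemo {
    /** Repeats a failing expire operation; returns the counter value afterwards. */
    static long counterAfterFailedExpirations(int attempts) {
        RevertibleLong groupCount = new RevertibleLong();
        groupCount.increment(); // one group exists

        for (int i = 0; i < attempts; i++) {
            groupCount.snapshot();          // taken together with the state snapshot
            groupCount.decrement();         // the operation updates the counter
            boolean writeSucceeded = false; // assumed: the log write always fails here
            if (writeSucceeded) {
                groupCount.commit();
            } else {
                groupCount.revert();        // the counter is rolled back with the state
            }
        }
        return groupCount.get();
    }
}
```

With the counter reverted on every failed write, it stays at 1 no matter how many times the operation is retried.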
--
This message was sent by Atlassian Jira
(v8.20.10#820010)