[
https://issues.apache.org/jira/browse/KAFKA-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849391#comment-17849391
]
Edoardo Comar commented on KAFKA-15905:
---------------------------------------
backported to 3.7.1
> Restarts of MirrorCheckpointTask should not permanently interrupt offset
> translation
> ------------------------------------------------------------------------------------
>
> Key: KAFKA-15905
> URL: https://issues.apache.org/jira/browse/KAFKA-15905
> Project: Kafka
> Issue Type: Improvement
> Components: mirrormaker
> Affects Versions: 3.6.0
> Reporter: Greg Harris
> Assignee: Edoardo Comar
> Priority: Major
> Fix For: 3.7.1, 3.8
>
>
> Executive summary: When the MirrorCheckpointTask restarts, it loses the state
> of checkpointsPerConsumerGroup, which limits offset translation to records
> mirrored after the latest restart.
> For example, if 1000 records are mirrored and the OffsetSyncs are read by
> MirrorCheckpointTask, the emitted checkpoints are cached, and translation can
> happen at the ~500th record. If MirrorCheckpointTask restarts, and 1000 more
> records are mirrored, translation can happen at the ~1500th record, but no
> longer at the ~500th record.
> Context:
> Before KAFKA-13659, MM2 made translation decisions based on the
> incompletely-initialized OffsetSyncStore, and the checkpoint could appear to
> go backwards temporarily during restarts. To fix this, we forced the
> OffsetSyncStore to initialize completely before translation could take place,
> ensuring that the latest OffsetSync had been read, and thus providing the
> most accurate translation.
> Before KAFKA-14666, MM2 translated offsets only off of the latest OffsetSync.
> Afterwards, an in-memory sparse cache of historical OffsetSyncs was kept, to
> allow for translation of earlier offsets. This came with the caveat that the
> cache's sparseness allowed translations to go backwards permanently. To
> prevent this behavior, a cache of the latest Checkpoints was kept in the
> MirrorCheckpointTask#checkpointsPerConsumerGroup variable, and offset
> translation remained restricted to the fully-initialized OffsetSyncStore.
> Effectively, the MirrorCheckpointTask ensures that it translates based on an
> OffsetSync emitted during it's lifetime, to ensure that no previous
> MirrorCheckpointTask emitted a later sync. If we can read the checkpoints
> emitted by previous generations of MirrorCheckpointTask, we can still ensure
> that checkpoints are monotonic, while allowing translation further back in
> history.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)