[
https://issues.apache.org/jira/browse/KAFKA-16622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842117#comment-17842117
]
Greg Harris commented on KAFKA-16622:
-------------------------------------
Yeah [~ecomar], from the final state of the OffsetSyncStore, this appears to be
working as intended:
{noformat}
[2024-04-26 10:58:44,557] TRACE [MirrorCheckpointConnector|task-0] New sync
OffsetSync{topicPartition=mytopic-0, upstreamOffset=19998,
downstreamOffset=19998} applied, new state is
[19998:19998,19987:19987,19965:19965,19921:19921,19822:19822,19635:19635,19415:19415,18964:18964,18095:18095,16500:16500,9999:9999]
(org.apache.kafka.connect.mirror.OffsetSyncStore:176){noformat}
The gaps are 11, 22, 44, 99, 187, 220, 451, 869, 1595, which follow the
approximately exponential progression I would expect. Instead of the ~5 syncs I
expected there are 9, which is better than I estimated because you have
offset.lag.max set low.
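For reference, the exponential spacing can be read straight off the logged store
state. This is just arithmetic on the TRACE line above (the store holds
identical upstream and downstream offsets in this run, so only one number per
sync is needed), not the actual OffsetSyncStore implementation:

```python
# Final OffsetSyncStore state from the TRACE log above (upstream offsets only,
# since upstream == downstream in this identity-mirroring run).
syncs = [19998, 19987, 19965, 19921, 19822, 19635, 19415, 18964, 18095, 16500, 9999]

# Gap between each adjacent pair of retained syncs, newest to oldest.
gaps = [newer - older for newer, older in zip(syncs, syncs[1:])]
print(gaps)  # [11, 22, 44, 99, 187, 220, 451, 869, 1595, 6501]

# Spacing never shrinks as syncs get older, i.e. the store keeps dense syncs
# near the tip of the topic and progressively sparser syncs further back.
assert all(g2 >= g1 for g1, g2 in zip(gaps, gaps[1:]))
```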
I would say the title of this issue isn't quite accurate now that we've
investigated it, as the translation can happen at these intermediate points in
addition to the end of the topic. If you had a consumer group with offset 19635
or 19636, that would be translated exactly, but a consumer group with offset
19700 would translate to 19636 and have some lag/reprocessing. This is
intentional: we made a trade-off between memory usage and precision in order to
prioritize accuracy in the offset translation algorithm. You can see the
discussion about this here:
[https://lists.apache.org/thread/7qzxm1727y8rtrw6ds7t6hltkm55j5po] and more
motivation for the current algorithm here:
[https://github.com/apache/kafka/pull/13178].
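To illustrate the round-down behavior described above, here is a simplified
sketch of the translation step: pick the nearest recorded sync at or below the
consumer's upstream offset. This is only a toy model of the idea behind
OffsetSyncStore's translation, not the real code, and it omits edge cases (no
usable sync, topic gaps, renegotiated stores, etc.):

```python
# (upstream, downstream) pairs from the final store state above; identity
# mirroring, so the two sides happen to be equal in this run.
SYNCS = [(9999, 9999), (16500, 16500), (18095, 18095), (18964, 18964),
         (19415, 19415), (19635, 19635), (19822, 19822), (19921, 19921),
         (19965, 19965), (19987, 19987), (19998, 19998)]

def translate(upstream_offset):
    # Largest sync whose upstream offset does not exceed the consumer offset.
    up, down = max((s for s in SYNCS if s[0] <= upstream_offset),
                   key=lambda s: s[0])
    # If the consumer is strictly ahead of the sync point, advance one past the
    # downstream sync offset; otherwise the translation is exact.
    return down + 1 if upstream_offset > up else down

print(translate(19635))  # 19635 -- exact, lands on a sync
print(translate(19636))  # 19636 -- exact, one past a sync
print(translate(19700))  # 19636 -- rounded down, some lag/reprocessing
```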
I understand your concern though, and you're correct that KAFKA-15905 will help
for the offsets between 0 and 9999, and KAFKA-16364 will help with offsets
close to the end of the topic.
I also just opened this ticket:
https://issues.apache.org/jira/browse/KAFKA-16641 for another improvement I
thought of. It has a risk of mis-translating offsets for topics with gaps, but
should be better than the old pre-KAFKA-12468 algorithm, so we can discuss
whether it requires a configuration, and maybe it can be included in a KIP with
KAFKA-16364.
> MirrorMaker2 first Checkpoint not emitted until consumer group fully catches
> up once
> -----------------------------------------------------------------------------------
>
> Key: KAFKA-16622
> URL: https://issues.apache.org/jira/browse/KAFKA-16622
> Project: Kafka
> Issue Type: Bug
> Components: mirrormaker
> Affects Versions: 3.7.0, 3.6.2, 3.8.0
> Reporter: Edoardo Comar
> Priority: Major
> Attachments: connect.log.2024-04-26-10.zip,
> edo-connect-mirror-maker-sourcetarget.properties
>
>
> We observed an excessively delayed emission of the MM2 Checkpoint record.
> It only gets created when the source consumer reaches the end of a topic.
> This does not seem reasonable.
> In a very simple setup:
> Tested with a standalone single-process MirrorMaker2 mirroring between two
> single-node Kafka clusters (MirrorMaker config attached) with quick refresh
> intervals (e.g. 5 sec) and a small offset.lag.max (e.g. 10):
> create a single topic in the source cluster
> produce data to it (e.g. 10000 records)
> start a slow consumer - e.g. fetching 50 records/poll, pausing 1 sec
> between polls, and committing after each poll
> watch the Checkpoint topic in the target cluster
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 \
> --topic source.checkpoints.internal \
> --formatter org.apache.kafka.connect.mirror.formatters.CheckpointFormatter \
> --from-beginning
> -> no record appears in the checkpoint topic until the consumer reaches the
> end of the topic (i.e. its consumer group lag gets down to 0).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)