[ https://issues.apache.org/jira/browse/KAFKA-13499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias J. Sax reopened KAFKA-13499:
-------------------------------------

We found a bug in the merged PR and reverted the change in the 4.3 branch.
Re-opening this ticket with fix version 4.4 so we can either fix forward or,
if we don't get to it, have a signal to also revert the change for the 4.4.0
release if necessary.

> Avoid restoring outdated records
> --------------------------------
>
>                 Key: KAFKA-13499
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13499
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Assignee: Ziyun Fu
>            Priority: Major
>             Fix For: 4.4.0
>
>
> Kafka Streams has the config `windowstore.changelog.additional.retention.ms` 
> to allow for an increased retention time.
> While an increased retention time can be useful, it can also lead to 
> unnecessary restore cost, especially for stream-stream joins. Assume a 
> stream-stream join with a 1h window size and a grace period of 1h. In this 
> case, we only need 2h of data to restore. If we lag, 
> `windowstore.changelog.additional.retention.ms` helps to prevent the broker 
> from truncating data too early. However, if we don't lag and need to 
> restore, we still restore everything from the changelog.
> Instead of doing a seek-to-beginning, we could use the timestamp index to 
> seek to the first offset that is not older than the 2h "window" of data we 
> need to restore, and skip the unnecessary work of restoring older records.
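
A rough sketch of the proposed seek, using a simulated in-memory timestamp index instead of a real broker. Following the example in the description, the restore needs only window size plus grace period of history (1h + 1h = 2h), so the start position is the earliest offset whose timestamp is at or after `stream time - 2h`. In the real consumer API, `Consumer#offsetsForTimes(Map<TopicPartition, Long>)` returns that earliest offset per partition, and `Consumer#seek(TopicPartition, long)` moves the position there. The class and method names below (`RestoreStartSketch`, `IndexEntry`, `restoreStartOffset`) are hypothetical and not part of Kafka Streams. Using stream time as the reference point is also an assumption, and the binary search assumes record timestamps never decrease in offset order, which real changelogs do not guarantee.

```java
import java.time.Duration;
import java.util.List;

/**
 * Hypothetical sketch (not Kafka Streams code): choose a changelog restore
 * start offset from a timestamp index instead of seeking to the beginning.
 */
public final class RestoreStartSketch {

    /** One entry of a simulated index: (offset, record timestamp in ms). */
    record IndexEntry(long offset, long timestampMs) {}

    /** History a stream-stream join still needs, per the ticket: window size + grace. */
    static Duration requiredHistory(Duration windowSize, Duration grace) {
        return windowSize.plus(grace);
    }

    /**
     * Earliest offset whose timestamp is >= targetMs (the documented semantics
     * of Consumer#offsetsForTimes), or endOffset if no such record exists.
     * Assumption: timestamps are non-decreasing, so binary search is valid.
     */
    static long earliestOffsetAtOrAfter(List<IndexEntry> index, long targetMs, long endOffset) {
        int lo = 0;
        int hi = index.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (index.get(mid).timestampMs() < targetMs) {
                lo = mid + 1; // too old: the answer lies to the right
            } else {
                hi = mid;     // candidate: keep searching to the left
            }
        }
        return lo < index.size() ? index.get(lo).offset() : endOffset;
    }

    /** Restore start offset: skip every record older than streamTime - (window + grace). */
    static long restoreStartOffset(List<IndexEntry> index, long streamTimeMs,
                                   Duration windowSize, Duration grace, long endOffset) {
        long cutoffMs = streamTimeMs - requiredHistory(windowSize, grace).toMillis();
        return earliestOffsetAtOrAfter(index, cutoffMs, endOffset);
    }

    public static void main(String[] args) {
        long hour = Duration.ofHours(1).toMillis();
        // Simulated changelog: one record per hour, offsets 0..5, timestamps 0h..5h.
        List<IndexEntry> index = List.of(
                new IndexEntry(0, 0), new IndexEntry(1, hour), new IndexEntry(2, 2 * hour),
                new IndexEntry(3, 3 * hour), new IndexEntry(4, 4 * hour), new IndexEntry(5, 5 * hour));
        long start = restoreStartOffset(index, 5 * hour, Duration.ofHours(1), Duration.ofHours(1), 6);
        System.out.println("restore from offset " + start);
    }
}
```

Running `main` prints `restore from offset 3`. With stream time at 5h and 2h of required history, the cutoff is 3h, so offsets 0 to 2 (timestamps 0h to 2h) are skipped instead of being restored by a seek-to-beginning.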



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
