[ https://issues.apache.org/jira/browse/KAFKA-17249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Francois Visconte updated KAFKA-17249:
--------------------------------------
Affects Version/s: 3.9.0
> Failures when building remote log aux state can make the leader epoch cache
> inconsistent
> ----------------------------------------------------------------------------------------
>
> Key: KAFKA-17249
> URL: https://issues.apache.org/jira/browse/KAFKA-17249
> Project: Kafka
> Issue Type: Bug
> Components: Tiered-Storage
> Affects Versions: 3.8.0, 3.7.1, 3.9.0
> Reporter: Kyle Phelps
> Priority: Major
>
> When a follower has to run `buildRemoteLogAuxState`, it first truncates the
> local log and then attempts to rebuild the leader epoch cache from the
> checkpoint in remote storage. However, if the rebuild fails and the broker is
> restarted, the cache is missing the entries associated with remote segments.
> Reproduction steps:
> # Take an existing tiered storage partition and move the latest index file
> out of remote storage so that it is inaccessible.
> # Stop one of the follower brokers and delete the partition's local data.
> # Restart the follower - it should fail to build the aux state.
> # Restart the follower again. Since the log's offsets have been updated, it
> can now successfully fetch and join the ISR.
> # Promote the follower to the leader.
> In this scenario the leader becomes unable to serve tiered fetch requests.
> I _think_ the root of the problem here is that the leader epoch cache isn't
> recovering the epoch data for remote segments.
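
To illustrate the suspected root cause, here is a minimal sketch of an epoch cache modelled as a sorted map from leader epoch to the first offset written in that epoch. It is an assumption-laden toy, not Kafka's actual leader epoch cache: the class `EpochCacheSketch`, its methods, and all offsets and epochs are hypothetical. It only shows why a cache that holds nothing but the entries learned after the restart cannot resolve an epoch for offsets that live in remote segments.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical, simplified model of a leader epoch cache (not Kafka's real class).
class EpochCacheSketch {
    // leader epoch -> first offset written in that epoch
    private final TreeMap<Integer, Long> startOffsets = new TreeMap<>();

    void assign(int epoch, long startOffset) {
        startOffsets.put(epoch, startOffset);
    }

    // Returns the latest epoch whose start offset is <= the given offset, if any.
    Optional<Integer> epochForOffset(long offset) {
        Optional<Integer> found = Optional.empty();
        for (Map.Entry<Integer, Long> e : startOffsets.entrySet()) {
            if (e.getValue() <= offset) {
                found = Optional.of(e.getKey());
            } else {
                break; // start offsets grow with the epoch, so we can stop here
            }
        }
        return found;
    }

    // Assumed healthy state: epochs 0 and 1 cover offsets held in remote segments.
    static EpochCacheSketch complete() {
        EpochCacheSketch cache = new EpochCacheSketch();
        cache.assign(0, 0L);
        cache.assign(1, 400L);
        cache.assign(2, 1000L);
        return cache;
    }

    // Assumed state after a failed aux-state build and a restart:
    // only the epoch learned from fetching the local tail is present.
    static EpochCacheSketch afterFailedRebuild() {
        EpochCacheSketch cache = new EpochCacheSketch();
        cache.assign(2, 1000L);
        return cache;
    }

    public static void main(String[] args) {
        long remoteOffset = 200L; // an offset assumed to live in a remote segment
        System.out.println("complete cache:       " + complete().epochForOffset(remoteOffset));
        System.out.println("after failed rebuild: " + afterFailedRebuild().epochForOffset(remoteOffset));
    }
}
```

In this toy model, the lookup for a remote offset succeeds on the complete cache and comes back empty on the incomplete one, which would be consistent with a promoted leader being unable to serve tiered fetch requests.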