[
https://issues.apache.org/jira/browse/KAFKA-16297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Igor Soarez updated KAFKA-16297:
--------------------------------
Fix Version/s: 3.8.0
> Race condition while promoting future replica can lead to partition
> unavailability.
> -----------------------------------------------------------------------------------
>
> Key: KAFKA-16297
> URL: https://issues.apache.org/jira/browse/KAFKA-16297
> Project: Kafka
> Issue Type: Sub-task
> Components: jbod
> Affects Versions: 3.7.0
> Reporter: Igor Soarez
> Assignee: Igor Soarez
> Priority: Major
> Fix For: 3.8.0, 3.7.1
>
>
> KIP-858 proposed that when a directory failure occurs after changing the
> assignment of a replica that's moved between two directories in the same
> broker, but before the future replica promotion completes, the broker should
> reassign the replica to inform the controller of its correct status. But this
> hasn't yet been implemented, and without it this failure may lead to
> indefinite partition unavailability.
> Example scenario:
> # A broker which leads partition P receives a request to alter the replica
> from directory A to directory B.
> # The broker creates a future replica in directory B and starts a replica
> fetcher.
> # Once the future replica first catches up, the broker queues a reassignment
> to inform the controller of the directory change.
> # The next time the replica catches up, the broker briefly blocks appends
> and promotes the replica. However, before the promotion is attempted,
> directory A fails.
> # The controller was informed that P is now in directory B before it
> received the notification that directory A has failed, so it does not elect a
> new leader, and as long as the broker is online, partition P remains
> unavailable.
>
>
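The ordering problem in the example scenario can be sketched with a toy model of the controller's view. This is only an illustration under stated assumptions: the Controller class, its methods, and the run helper are hypothetical names, not Kafka's API, and the model assumes the controller marks a partition for leader election whenever its recorded directory is known to have failed.

```python
class Controller:
    """Toy model of the controller's view. Hypothetical, not Kafka code."""

    def __init__(self):
        self.assigned_dir = {}       # partition -> directory the controller believes hosts it
        self.failed_dirs = set()     # directories reported as failed
        self.needs_election = set()  # partitions the controller would re-elect a leader for

    def assign(self, partition, directory):
        # Record a directory assignment reported by the broker.
        self.assigned_dir[partition] = directory
        # Assumption: an assignment to an already failed directory marks
        # the replica offline and triggers a leader election.
        if directory in self.failed_dirs:
            self.needs_election.add(partition)

    def directory_failed(self, directory):
        # Only partitions the controller believes live in this directory
        # are considered affected by the failure.
        self.failed_dirs.add(directory)
        for partition, assigned in self.assigned_dir.items():
            if assigned == directory:
                self.needs_election.add(partition)


def run(correct_after_failure):
    """Replay the scenario; return True if the controller would elect a new leader for P."""
    controller = Controller()
    controller.assign("P", "A")
    actual_dir = "A"                  # where the broker's current replica really lives

    controller.assign("P", "B")       # step 3: queued reassignment reaches the controller
    controller.directory_failed("A")  # step 4: A fails before promotion, actual_dir is still A

    if correct_after_failure:
        # Behaviour proposed in KIP-858: the broker reassigns the replica so
        # the controller learns where it actually is.
        controller.assign("P", actual_dir)

    return "P" in controller.needs_election


print(run(correct_after_failure=False))
print(run(correct_after_failure=True))
```

Without the corrective reassignment, the model's controller never marks P for election, because it already believes P is in directory B when the failure of A arrives. With it, P is assigned back to a directory known to have failed, so the controller can react.
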
--
This message was sent by Atlassian Jira
(v8.20.10#820010)