[https://issues.apache.org/jira/browse/KAFKA-16431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Chia-Ping Tsai resolved KAFKA-16431.
------------------------------------
Fix Version/s: (was: 3.7.1)
Resolution: Duplicate
> Handle log dir failure in hybrid mode
> -------------------------------------
>
> Key: KAFKA-16431
> URL: https://issues.apache.org/jira/browse/KAFKA-16431
> Project: Kafka
> Issue Type: Bug
> Components: jbod
> Affects Versions: 3.7.0
> Reporter: Igor Soarez
> Assignee: Igor Soarez
> Priority: Critical
>
> As part of the KRaft migration, the KRaft Controller implements some of the
> ZK-mode controller functionality used during the migration, in what is
> known as "hybrid mode".
> In hybrid mode some brokers may still be running in ZK-mode and some brokers
> may have already been restarted into KRaft mode.
> The ZK-mode Controller implementation in KRaft lacks the ZK-based logic for
> handling directory failures, so it cannot re-elect leaders for partitions
> whose leader replicas reside on failed directories.
> This leaves a gap in JBOD support during the ZK-to-KRaft migration. There are
> two main ways to address it:
> # Implement the ZK-mode functionality for handling failed directories. As in
> ZK mode, the controller needs to subscribe to events in the
> `/log_dir_event_notification` ZNode and rely on per-partition errors in full
> LeaderAndIsr responses to detect directory failures.
> # Another, simpler way to address this would be to have a migrating ZK
> broker stop upon any directory failure. This would sacrifice some
> availability and operational flexibility, but it may be much more
> straightforward to implement.
> Without a solution, a directory failure during the migration may lead to
> indefinite partition unavailability.
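The first option in the description above can be modeled roughly as follows. This is a minimal, language-neutral sketch in Python, not Kafka's controller code: `HybridController`, `PartitionState`, and the `send_full_leader_and_isr` callback are hypothetical names, ZooKeeper I/O is replaced by a plain method call, and leader election is simplified (first replica in assignment order that remains in the ISR, no unclean election). It assumes each notification identifies only the broker, so the controller sends a full LeaderAndIsr covering every partition assigned to that broker and treats a `KAFKA_STORAGE_ERROR` in the response as an offline replica.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

# Per-partition error codes a broker may return in a full LeaderAndIsr
# response; in this sketch only KAFKA_STORAGE_ERROR marks a failed directory.
NONE = "NONE"
KAFKA_STORAGE_ERROR = "KAFKA_STORAGE_ERROR"

# Hypothetical transport: (broker_id, partitions) -> {partition: error_code}.
SendFn = Callable[[int, List[str]], Dict[str, str]]


@dataclass
class PartitionState:
    replicas: List[int]    # assignment order, used to pick a new leader
    isr: List[int]
    leader: Optional[int]  # None means the partition has no leader


@dataclass
class HybridController:
    partitions: Dict[str, PartitionState]
    offline_replicas: Set[Tuple[str, int]] = field(default_factory=set)

    def on_log_dir_event(self, broker_id: int, send_full_leader_and_isr: SendFn) -> List[str]:
        """Handle one /log_dir_event_notification entry naming broker_id."""
        # The notification names the broker, not the directory, so ask the
        # broker about every partition assigned to it.
        hosted = [tp for tp, st in self.partitions.items() if broker_id in st.replicas]
        errors = send_full_leader_and_isr(broker_id, hosted)
        failed = [tp for tp in hosted if errors.get(tp) == KAFKA_STORAGE_ERROR]
        for tp in failed:
            self.offline_replicas.add((tp, broker_id))
            self._reelect(tp)
        return failed

    def _reelect(self, tp: str) -> None:
        """Drop offline replicas from the ISR and move leadership if needed."""
        st = self.partitions[tp]
        live_isr = [r for r in st.isr if (tp, r) not in self.offline_replicas]
        if not live_isr:
            # No in-sync replica left: the partition stays leaderless
            # (unclean leader election is not modeled here).
            st.leader = None
            return
        if st.leader is None or (tp, st.leader) in self.offline_replicas:
            # First replica in assignment order that is still in the ISR.
            st.leader = next(r for r in st.replicas if r in live_isr)
        st.isr = live_isr
```

If broker 1 reports `KAFKA_STORAGE_ERROR` for two of its partitions, the one that still has other in-sync replicas gets a new leader, while one whose only in-sync replica sat on the failed directory is left leaderless, which is the unavailability the issue warns about.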
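The second option in the description is mostly a broker-side decision. The sketch below is again hypothetical Python: `ZkBroker`, `BrokerHalted`, and `handle_log_dir_failure` are illustrative names, raising an exception stands in for halting the process, and the `migration_enabled` flag is assumed to mirror the broker's `zookeeper.metadata.migration.enable` setting. The premise, taken as an assumption here, is that a stopped broker is handled by the controller's ordinary broker-failure path, so its partitions can get new leaders without any directory-specific logic.

```python
from dataclasses import dataclass, field
from typing import Set


class BrokerHalted(Exception):
    """Stands in for halting the broker process."""


@dataclass
class ZkBroker:
    # Assumed to mirror zookeeper.metadata.migration.enable on a ZK broker.
    migration_enabled: bool
    offline_dirs: Set[str] = field(default_factory=set)

    def handle_log_dir_failure(self, log_dir: str) -> None:
        if self.migration_enabled:
            # Fail fast: the hybrid-mode controller cannot re-elect leaders for
            # partitions on one failed directory, so take the whole broker down.
            raise BrokerHalted(f"log dir {log_dir} failed during ZK to KRaft migration")
        # Outside the migration, keep the usual ZK-mode behavior: mark only this
        # directory offline (notifying the controller is not modeled here).
        self.offline_dirs.add(log_dir)
```

The trade-off is the one the issue describes: a single failed directory takes every replica on that broker offline, not just those stored on the failed directory.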
--
This message was sent by Atlassian Jira
(v8.20.10#820010)