Gantigmaa Selenge created KAFKA-16620:
-----------------------------------------
Summary: Kraft quorum cannot be formed if all controllers are
restarted at the same time
Key: KAFKA-16620
URL: https://issues.apache.org/jira/browse/KAFKA-16620
Project: Kafka
Issue Type: Bug
Reporter: Gantigmaa Selenge
Controller quorum cannot seem to form at all after accidentally restarting all
controller nodes at the same time in a test environment. This is reproducible,
happens almost everytime when restarting all controller nodes of the cluster.
Started a cluster with 3 controller nodes and 3 broker nodes. After restarting
the controller nodes, one of them becomes the active controller but resigns due
to fetch timeout. The quorum leadership bounces off like this between the nodes
indefinitely.
The controller.quorum.fetch.timeout.ms was set to the default of 2 seconds.
Logs from an active controller:
```
2024-04-17 14:00:48,250 INFO [QuorumController id=0] Becoming the active
controller at epoch 34, next write offset 1116.
(org.apache.kafka.controller.QuorumController)
[quorum-controller-0-event-handler]
2024-04-17 14:00:48,250 WARN [QuorumController id=0] Performing controller
activation. Loaded ZK migration state of NONE.
(org.apache.kafka.controller.QuorumController)
[quorum-controller-0-event-handler]
2024-04-17 14:00:48,701 INFO [RaftManager id=0] Node 1 disconnected.
(org.apache.kafka.clients.NetworkClient) [kafka-0-raft-outbound-request-thread]
2024-04-17 14:00:48,701 WARN [RaftManager id=0] Connection to node 1
(my-cluster-controller-1.my-cluster-kafka-brokers.roller.svc.cluster.local/10.244.0.68:9090)
could not be established. Node may not be available.
(org.apache.kafka.clients.NetworkClient) [kafka-0-raft-outbound-request-thread]
2024-04-17 14:00:48,776 DEBUG [UnifiedLog partition=__cluster_metadata-0,
dir=/var/lib/kafka/data/kafka-log0] Flushing log up to offset 1117
(exclusive)with recovery point 1117, last flushed: 1713362448239, current
time: 1713362448776,unflushed: 1 (kafka.log.UnifiedLog) [kafka-0-raft-io-thread]
2024-04-17 14:00:49,277 DEBUG [UnifiedLog partition=__cluster_metadata-0,
dir=/var/lib/kafka/data/kafka-log0] Flushing log up to offset 1118
(exclusive)with recovery point 1118, last flushed: 1713362448777, current
time:
...
2024-04-17 14:01:35,934 DEBUG [UnifiedLog partition=__cluster_metadata-0,
dir=/var/lib/kafka/data/kafka-log0] Flushing log up to offset 1200
(exclusive)with recovery point 1200, last flushed: 1713362489371, current
time: 1713362495934,unflushed: 1 (kafka.log.UnifiedLog) [kafka-0-raft-io-thread]
2024-04-17 14:01:36,121 INFO [RaftManager id=0] Did not receive fetch request
from the majority of the voters within 3000ms. Current fetched voters are [].
(org.apache.kafka.raft.LeaderState) [kafka-0-raft-io-thread]
2024-04-17 14:01:36,223 WARN [QuorumController id=0] Renouncing the leadership
due to a metadata log event. We were the leader at epoch 34, but in the new
epoch 35, the leader is (none). Reverting to last stable offset 1198.
(org.apache.kafka.controller.QuorumController)
[quorum-controller-0-event-handler]
2024-04-17 14:01:36,223 INFO [QuorumController id=0]
failAll(NotControllerException): failing writeNoOpRecord(152156824).
(org.apache.kafka.deferred.DeferredEventQueue)
[quorum-controller-0-event-handler]
2024-04-17 14:01:36,223 INFO [QuorumController id=0] writeNoOpRecord: event
failed with NotControllerException in 6291037 microseconds.
(org.apache.kafka.controller.QuorumController)
[quorum-controller-0-event-handler]
```
Logs from the follower:
```
024-04-17 14:00:48,242 INFO [RaftManager id=2] Completed transition to
FollowerState(fetchTimeoutMs=2000, epoch=34, leaderId=0, voters=[0, 1, 2],
highWatermark=Optional[LogOffsetMetadata(offset=1113,
metadata=Optional.empty)], fetchingSnapshot=Optional.empty) from
Voted(epoch=34, votedId=0, voters=[0, 1, 2], electionTimeoutMs=1794)
(org.apache.kafka.raft.QuorumState) [kafka-2-raft-io-thread]
2024-04-17 14:00:48,242 INFO [QuorumController id=2] In the new epoch 34, the
leader is 0. (org.apache.kafka.controller.QuorumController)
[quorum-controller-2-event-handler]
2024-04-17 14:00:48,247 DEBUG [UnifiedLog partition=__cluster_metadata-0,
dir=/var/lib/kafka/data/kafka-log2] Flushing log up to offset 1116
(exclusive)with recovery point 1116, last flushed: 1713362442238, current
time: 1713362448247,unflushed: 2 (kafka.log.UnifiedLog) [kafka-2-raft-io-thread]
2024-04-17 14:00:48,777 DEBUG [UnifiedLog partition=__cluster_metadata-0,
dir=/var/lib/kafka/data/kafka-log2] Flushing log up to offset 1117
(exclusive)with recovery point 1117, last flushed: 1713362448249, current
time: 1713362448777,unflushed: 1 (kafka.log.UnifiedLog) [kafka-2-raft-io-thread]
2024-04-17 14:00:49,278 DEBUG [UnifiedLog partition=__cluster_metadata-0,
dir=/var/lib/kafka/data/kafka-log2] Flushing log up to offset 1118
(exclusive)with recovery point 1118, last flushed: 1713362448811, current
time: 1713362449278,unflushed
...
2024-04-17 14:01:29,371 DEBUG [UnifiedLog partition=__cluster_metadata-0,
dir=/var/lib/kafka/data/kafka-log2] Flushing log up to offset 1199
(exclusive)with recovery point 1199, last flushed: 1713362488919, current
time: 1713362489371,unflushed: 1 (kafka.log.UnifiedLog) [kafka-2-raft-io-thread]
2024-04-17 14:01:35,718 INFO [RaftManager id=2] Become candidate due to fetch
timeout (org.apache.kafka.raft.KafkaRaftClient) [kafka-2-raft-io-thread]
```
--
This message was sent by Atlassian Jira
(v8.20.10#820010)