[ https://issues.apache.org/jira/browse/KAFKA-19643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zzshine updated KAFKA-19643:
----------------------------
Attachment: controller_every_min_change.png
part_leader_to_one_node.png
> Controller keeps switching and occasionally goes offline.
> ---------------------------------------------------------
>
> Key: KAFKA-19643
> URL: https://issues.apache.org/jira/browse/KAFKA-19643
> Project: Kafka
> Issue Type: Bug
> Components: controller, kraft
> Affects Versions: 3.9.1
> Environment: CentOS Linux 7, kernel release 4.19.325
> Java 21
> Reporter: zzshine
> Priority: Major
> Attachments: controller_every_min_change.png,
> part_leader_to_one_node.png
>
>
> Network communication between the cluster nodes is normal, with no packet
> loss, and the cluster is properly configured.
> The Kafka server repeatedly prints the following log messages:
> {code:java}
> [2025-08-25 19:08:55,581] INFO [RaftManager id=1] Become candidate due to fetch timeout (org.apache.kafka.raft.KafkaRaftClient)
> [2025-08-25 19:08:55,686] INFO [RaftManager id=1] Disconnecting from node 2 due to request timeout. (org.apache.kafka.clients.NetworkClient)
> [2025-08-25 19:08:55,686] INFO [RaftManager id=1] Cancelled in-flight FETCH request with correlation id 128927 due to node 2 being disconnected (elapsed time since creation: 5147ms, elapsed time since send: 5146ms, throttle time: 0ms, request timeout: 5000ms) (org.apache.kafka.clients.NetworkClient)
> [2025-08-25 19:09:33,274] INFO [NodeToControllerChannelManager id=1 name=heartbeat] Disconnecting from node 3 due to request timeout. (org.apache.kafka.clients.NetworkClient)
> [2025-08-25 19:09:33,274] INFO [NodeToControllerChannelManager id=1 name=heartbeat] Cancelled in-flight BROKER_HEARTBEAT request with correlation id 871 due to node 3 being disconnected (elapsed time since creation: 4004ms, elapsed time since send: 4004ms, throttle time: 0ms, request timeout: 4000ms) (org.apache.kafka.clients.NetworkClient)
> [2025-08-25 19:09:33,807] INFO [RaftManager id=1] Disconnecting from node 3 due to request timeout. (org.apache.kafka.clients.NetworkClient)
> [2025-08-25 19:09:33,807] INFO [RaftManager id=1] Cancelled in-flight FETCH request with correlation id 128995 due to node 3 being disconnected (elapsed time since creation: 5720ms, elapsed time since send: 5720ms, throttle time: 0ms, request timeout: 5000ms) (org.apache.kafka.clients.NetworkClient)
> {code}
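Each cancelled request in these logs ran only slightly past its request timeout: 146 ms for the FETCH to node 2 (5146 - 5000), 4 ms for the BROKER_HEARTBEAT to node 3 (4004 - 4000), and 720 ms for the FETCH to node 3 (5720 - 5000). A minimal Java sketch that extracts these figures from the NetworkClient "Cancelled in-flight" messages is shown here; the class, record, and method names are hypothetical, and the regular expression assumes the exact message wording quoted above.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TimeoutOvershoot {
    /** Request API name and how many ms the time since send ran past the request timeout. */
    record Overshoot(String api, long ms) {}

    // Assumed format, copied from the log excerpt above; other wordings will not match.
    private static final Pattern CANCELLED = Pattern.compile(
        "Cancelled in-flight (\\w+) request.*?elapsed time since send: (\\d+)ms"
            + ".*?request timeout: (\\d+)ms");

    /** Returns the overshoot for a matching line, or empty for any other log line. */
    static Optional<Overshoot> parse(String line) {
        Matcher m = CANCELLED.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        long sinceSendMs = Long.parseLong(m.group(2));
        long requestTimeoutMs = Long.parseLong(m.group(3));
        return Optional.of(new Overshoot(m.group(1), sinceSendMs - requestTimeoutMs));
    }

    public static void main(String[] args) {
        // Message bodies from the report, with timestamps and logger prefixes trimmed.
        List<String> lines = List.of(
            "Cancelled in-flight FETCH request with correlation id 128927 due to node 2 being disconnected "
                + "(elapsed time since creation: 5147ms, elapsed time since send: 5146ms, throttle time: 0ms, request timeout: 5000ms)",
            "Cancelled in-flight BROKER_HEARTBEAT request with correlation id 871 due to node 3 being disconnected "
                + "(elapsed time since creation: 4004ms, elapsed time since send: 4004ms, throttle time: 0ms, request timeout: 4000ms)",
            "Cancelled in-flight FETCH request with correlation id 128995 due to node 3 being disconnected "
                + "(elapsed time since creation: 5720ms, elapsed time since send: 5720ms, throttle time: 0ms, request timeout: 5000ms)");
        for (String line : lines) {
            parse(line).ifPresent(o ->
                System.out.printf("%s ran %d ms past its request timeout%n", o.api(), o.ms()));
        }
    }
}
```

The sketch measures from "elapsed time since send" rather than "since creation", so time spent queued before the request was sent is not counted.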
> The Kafka parameters were adjusted as follows:
> {code:java}
> # default 2000
> broker.heartbeat.interval.ms=4000
> # default 9000
> broker.session.timeout.ms=10000
> # default 2000
> controller.quorum.request.timeout.ms=5000
> # default 1000
> controller.quorum.election.timeout.ms=5000
> # default 1000
> controller.quorum.election.backoff.max.ms=3000
> # default 2000
> controller.quorum.fetch.timeout.ms=6000
> {code}
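For comparison, the Java sketch below tabulates each changed setting against the default stated in its "# default" comment. It also derives one rule-of-thumb figure: how many heartbeat intervals fit into one broker session. That works out to 4.5 with the defaults (9000 / 2000) and 2.5 with the adjusted values (10000 / 4000). The class name and the "heartbeats per session" metric are illustrative assumptions; Kafka itself does not compute or validate this figure.

```java
import java.util.List;

final class QuorumTimeoutDiff {
    /** One timeout setting: its stated default and the value used in this report, in ms. */
    record Setting(String name, long defaultMs, long adjustedMs) {
        double scale() {
            return (double) adjustedMs / defaultMs;
        }
    }

    // Values copied from the properties block above; defaults from its "# default" comments.
    static final List<Setting> SETTINGS = List.of(
        new Setting("broker.heartbeat.interval.ms", 2000, 4000),
        new Setting("broker.session.timeout.ms", 9000, 10000),
        new Setting("controller.quorum.request.timeout.ms", 2000, 5000),
        new Setting("controller.quorum.election.timeout.ms", 1000, 5000),
        new Setting("controller.quorum.election.backoff.max.ms", 1000, 3000),
        new Setting("controller.quorum.fetch.timeout.ms", 2000, 6000));

    static Setting find(String name) {
        return SETTINGS.stream().filter(s -> s.name().equals(name)).findFirst().orElseThrow();
    }

    /** Heartbeat intervals per session timeout: a hypothetical headroom metric, not a Kafka check. */
    static double heartbeatsPerSession(long heartbeatIntervalMs, long sessionTimeoutMs) {
        return (double) sessionTimeoutMs / heartbeatIntervalMs;
    }

    public static void main(String[] args) {
        for (Setting s : SETTINGS) {
            System.out.printf("%-42s %6d -> %6d ms (x%.1f)%n",
                s.name(), s.defaultMs(), s.adjustedMs(), s.scale());
        }
        Setting hb = find("broker.heartbeat.interval.ms");
        Setting session = find("broker.session.timeout.ms");
        System.out.printf("heartbeats per session: default %.1f, adjusted %.1f%n",
            heartbeatsPerSession(hb.defaultMs(), session.defaultMs()),
            heartbeatsPerSession(hb.adjustedMs(), session.adjustedMs()));
    }
}
```

The sketch shows that the heartbeat interval was doubled while the session timeout grew by only about 11%. As a result, a broker can miss fewer consecutive heartbeats before its session expires than it could with the defaults.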
--
This message was sent by Atlassian Jira
(v8.20.10#820010)