[
https://issues.apache.org/jira/browse/KAFKA-19318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
dennis lucero updated KAFKA-19318:
----------------------------------
Description:
Despite several attempts, the migration from the ZooKeeper cluster to KRaft never
completes properly.
We spawned a new, fully healthy cluster with 18 Kafka nodes connected to 5
ZooKeeper nodes; 9 additional Kafka nodes serve as the new KRaft controllers.
This was tested with Kafka 3.9.0.
The controllers start with the following logs:
{code:java}
==> /var/log/kafka/kafka-server.log <==
[2025-05-20 15:04:28,798] INFO Registered kafka:type=kafka.Log4jController
MBean (kafka.utils.Log4jControllerRegistration$)
[2025-05-20 15:04:29,207] INFO [ControllerServer id=111] Starting controller
(kafka.server.ControllerServer)
[2025-05-20 15:04:29,656] INFO Updated connection-accept-rate max connection
creation rate to 2147483647 (kafka.network.ConnectionQuotas)
[2025-05-20 15:04:29,784] INFO [SocketServer listenerType=CONTROLLER,
nodeId=111] Created data-plane acceptor and processors for endpoint :
ListenerName(CONTROLLER) (kafka.network.SocketServer)
[2025-05-20 15:04:29,792] INFO [SharedServer id=111] Starting SharedServer
(kafka.server.SharedServer)
[2025-05-20 15:04:29,841] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Recovering unflushed segment 0. 0/1 recovered for
__cluster_metadata-0. (kafka.log.LogLoader)
[2025-05-20 15:04:29,842] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Loading producer state till offset 0 with message
format version 2 (kafka.log.UnifiedLog$)
[2025-05-20 15:04:29,843] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Reloading from producer snapshot and rebuilding
producer state from offset 0 (kafka.log.UnifiedLog$)
[2025-05-20 15:04:29,843] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Producer state recovery took 0ms for snapshot load and
0ms for segment recovery from offset 0 (kafka.log.UnifiedLog$)
[2025-05-20 15:04:29,905] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Loading producer state till offset 2709 with message
format version 2 (kafka.log.UnifiedLog$)
[2025-05-20 15:04:29,905] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Reloading from producer snapshot and rebuilding
producer state from offset 2709 (kafka.log.UnifiedLog$)
[2025-05-20 15:04:29,906] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Producer state recovery took 1ms for snapshot load and
0ms for segment recovery from offset 2709 (kafka.log.UnifiedLog$)
[2025-05-20 15:04:29,940] INFO Initialized snapshots with IDs
SortedSet(OffsetAndEpoch(offset=0, epoch=0)) from
/data/kafka1/kafka/__cluster_metadata-0 (kafka.raft.KafkaMetadataLog$)
[2025-05-20 15:04:29,952] INFO [raft-expiration-reaper]: Starting
(kafka.raft.TimingWheelExpirationService$ExpiredOperationReaper)
[2025-05-20 15:04:30,191] INFO [ControllerServer id=111] Waiting for controller
quorum voters future (kafka.server.ControllerServer)
[2025-05-20 15:04:30,191] INFO [ControllerServer id=111] Finished waiting for
controller quorum voters future (kafka.server.ControllerServer)
[2025-05-20 15:04:30,233] INFO [ZooKeeperClient KRaft Migration] Initializing a
new session to
prod-zk-11.ae-rus.net:2181,prod-zk-12.ae-rus.net:2181,prod-zk-13.ae-rus.net:2181,prod-zk-14.ae-rus.net:2181,prod-zk-15.ae-rus.net:2181.
(kafka.zookeeper.ZooKeeperClient)
[2025-05-20 15:04:30,248] INFO [ZooKeeperClient KRaft Migration] Waiting until
connected. (kafka.zookeeper.ZooKeeperClient)
[2025-05-20 15:04:30,261] INFO [ZooKeeperClient KRaft Migration] Connected.
(kafka.zookeeper.ZooKeeperClient)
[2025-05-20 15:04:30,293] INFO [controller-111-ThrottledChannelReaper-Fetch]:
Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2025-05-20 15:04:30,293] INFO [controller-111-ThrottledChannelReaper-Produce]:
Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2025-05-20 15:04:30,294] INFO [controller-111-ThrottledChannelReaper-Request]:
Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2025-05-20 15:04:30,297] INFO
[controller-111-ThrottledChannelReaper-ControllerMutation]: Starting
(kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2025-05-20 15:04:30,316] INFO [ExpirationReaper-111-AlterAcls]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:04:30,392] INFO [ControllerServer id=111] Waiting for the
controller metadata publishers to be installed (kafka.server.ControllerServer)
[2025-05-20 15:04:30,393] INFO [ControllerServer id=111] Finished waiting for
the controller metadata publishers to be installed
(kafka.server.ControllerServer)
[2025-05-20 15:04:30,393] INFO [SocketServer listenerType=CONTROLLER,
nodeId=111] Enabling request processing. (kafka.network.SocketServer)
[2025-05-20 15:04:30,394] INFO [ControllerRegistrationManager id=111
incarnation=mbreItYzTmySXeahJkXxFA] Found registration for
UwcgQ_r6T_uBVDMCdWLG6A instead of our incarnation.
(kafka.server.ControllerRegistrationManager)
[2025-05-20 15:04:30,395] INFO Awaiting socket connections on 0.0.0.0:9091.
(kafka.network.DataPlaneAcceptor)
[2025-05-20 15:04:30,455] INFO
[controller-111-to-controller-registration-channel-manager]: Starting
(kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:04:30,455] INFO [ControllerRegistrationManager id=111
incarnation=mbreItYzTmySXeahJkXxFA] initialized channel manager.
(kafka.server.ControllerRegistrationManager)
[2025-05-20 15:04:30,455] INFO [ControllerServer id=111] Waiting for all of the
authorizer futures to be completed (kafka.server.ControllerServer)
[2025-05-20 15:04:30,456] INFO [ControllerServer id=111] Finished waiting for
all of the authorizer futures to be completed (kafka.server.ControllerServer)
[2025-05-20 15:04:30,456] INFO [ControllerServer id=111] Waiting for all of the
SocketServer Acceptors to be started (kafka.server.ControllerServer)
[2025-05-20 15:04:30,456] INFO [ControllerServer id=111] Finished waiting for
all of the SocketServer Acceptors to be started (kafka.server.ControllerServer)
[2025-05-20 15:04:30,457] INFO [ControllerRegistrationManager id=111
incarnation=mbreItYzTmySXeahJkXxFA] sendControllerRegistration: attempting to
send ControllerRegistrationRequestData(controllerId=111,
incarnationId=mbreItYzTmySXeahJkXxFA, zkMigrationReady=true,
listeners=[Listener(name='CONTROLLER', host='prodkafkacontr11z1.h.ae-rus.net',
port=9091, securityProtocol=0)], features=[Feature(name='kraft.version',
minSupportedVersion=0, maxSupportedVersion=1), Feature(name='metadata.version',
minSupportedVersion=1, maxSupportedVersion=21)])
(kafka.server.ControllerRegistrationManager)
[2025-05-20 15:04:30,471] INFO
[controller-111-to-controller-registration-channel-manager]: Recorded new KRaft
controller, from now on will use node prodkafkacontr11z1.h.ae-rus.net:9091 (id:
111 rack: null) (kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:04:30,505] INFO [KafkaRaftServer nodeId=111] Kafka Server
started (kafka.server.KafkaRaftServer)
[2025-05-20 15:04:30,579] INFO [ControllerRegistrationManager id=111
incarnation=mbreItYzTmySXeahJkXxFA] Our registration has been persisted to the
metadata log. (kafka.server.ControllerRegistrationManager)
[2025-05-20 15:04:30,581] INFO [ControllerRegistrationManager id=111
incarnation=mbreItYzTmySXeahJkXxFA] RegistrationResponseHandler: controller
acknowledged ControllerRegistrationRequest.
(kafka.server.ControllerRegistrationManager){code}
{code:java}
cat /var/log/kafka/controller.log
[2025-05-20 15:04:30,357] TRACE [KRaftMigrationDriver id=111] Recovered
migration state from ZK in 68765354 ns. Transitioned migration state from
ZkMigrationLeadershipState{kraftControllerId=-1, kraftControllerEpoch=-1,
kraftMetadataOffset=-1, kraftMetadataEpoch=-1, lastUpdatedTimeMs=-1,
migrationZkVersion=-2, controllerZkEpoch=-1, controllerZkVersion=-2} to
ZkMigrationLeadershipState{kraftControllerId=-1, kraftControllerEpoch=-1,
kraftMetadataOffset=-1, kraftMetadataEpoch=-1, lastUpdatedTimeMs=1747648116174,
migrationZkVersion=0, controllerZkEpoch=-1, controllerZkVersion=-2}
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,358] INFO [KRaftMigrationDriver id=111] Initial migration
of ZK metadata is not done.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,358] INFO [KRaftMigrationDriver id=111] 111 transitioning
from UNINITIALIZED to INACTIVE state
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,388] INFO [KRaftMigrationDriver id=111] Became active
migration driver in 27621312 ns. Transitioned migration state from
ZkMigrationLeadershipState{kraftControllerId=-1, kraftControllerEpoch=-1,
kraftMetadataOffset=-1, kraftMetadataEpoch=-1, lastUpdatedTimeMs=1747648116174,
migrationZkVersion=0, controllerZkEpoch=-1, controllerZkVersion=-2} to
ZkMigrationLeadershipState{kraftControllerId=111, kraftControllerEpoch=2,
kraftMetadataOffset=-1, kraftMetadataEpoch=-1, lastUpdatedTimeMs=1747648116174,
migrationZkVersion=0, controllerZkEpoch=-1, controllerZkVersion=-2}
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,388] INFO [KRaftMigrationDriver id=111] 111 transitioning
from INACTIVE to WAIT_FOR_CONTROLLER_QUORUM state
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,388] INFO [KRaftMigrationDriver id=111] Controller Quorum
is ready for Zk to KRaft migration. Now waiting for ZK brokers.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,388] INFO [KRaftMigrationDriver id=111] 111 transitioning
from WAIT_FOR_CONTROLLER_QUORUM to WAIT_FOR_BROKERS state
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,389] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,494] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,579] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,580] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,645] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,645] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,701] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,701] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,730] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,730] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,778] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,861] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,861] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,905] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,905] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:31,202] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:31,202] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:31,278] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:31,778] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver) {code}
The config on the controller node is the following:
{code:java}
cat /etc/kafka/server.properties
##
# Ansible managed
#host.name=prodkafkacontr14z4.h.ae-rus.net
cluster.name=MrgOne
node.id=114
controller.rack=z4
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector
process.roles=controller
delete.topic.enable=True
auto.create.topics.enable=True
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
listeners=CONTROLLER://:9091
advertised.listeners=CONTROLLER://prodkafkacontr14z4.h.ae-rus.net:9091
controller.listener.names=CONTROLLER
inter.broker.listener.name=EXTERNAL
controller.quorum.voters=1...@prodkafkacontr11z1.h.ae-rus.net:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091
controller.quorum.bootstrap.servers=prodkafkacontr11z1.h.ae-rus.net:9091,prodkafkacontr12z1.h.ae-rus.net:9091,prodkafkacontr13z1.h.ae-rus.net:9091,prodkafkacontr17z3.h.ae-rus.net:9091,prodkafkacontr18z3.h.ae-rus.net:9091,prodkafkacontr19z3.h.ae-rus.net:9091,prodkafkacontr14z4.h.ae-rus.net:9091,prodkafkacontr15z4.h.ae-rus.net:9091,prodkafkacontr16z4.h.ae-rus.net:9091
num.network.threads=128
background.threads=128
num.io.threads=128
socket.send.buffer.bytes=2097152
socket.receive.buffer.bytes=2097152
socket.request.max.bytes=104857600
queued.max.requests=32768
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
log.dirs=/data/kafka1/kafka
num.recovery.threads.per.data.dir=16
log.flush.interval.messages=10240
log.flush.interval.ms=5000
log.retention.hours=48
log.retention.bytes=1073741824
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
group.initial.rebalance.delay.ms=1000
auto.leader.rebalance.enable=True
leader.imbalance.check.interval.seconds=300
leader.imbalance.per.broker.percentage=20
default.replication.factor=3
num.replica.fetchers=64
replica.fetch.max.bytes=41943040
replica.fetch.wait.max.ms=1000
replica.high.watermark.checkpoint.interval.ms=5000
replica.socket.timeout.ms=30000
replica.socket.receive.buffer.bytes=65536
replica.lag.time.max.ms=10000
############################# Extra Properties #############################
offsets.topic.num.partitions=150
message.max.bytes=20971520
batch.size=65536
zookeeper.connect=prod-zk-11.ae-rus.net:2181,prod-zk-12.ae-rus.net:2181,prod-zk-13.ae-rus.net:2181,prod-zk-14.ae-rus.net:2181,prod-zk-15.ae-rus.net:2181
rack.aware.assignment.strategy=balance_subtopology
rack.aware.assignment.traffic_cost=org.apache.kafka.streams.processor.internals.assignment.RackAwareTaskAssignor
compression.gzip.level=5
zookeeper.metadata.migration.enable=True
zookeeper.metadata.migration.min.batch.size=512
initial.broker.registration.timeout.ms=18000 {code}
The config on the broker node is the following:
{code:java}
$ cat /etc/kafka/server.properties
##
# Ansible managed
#host.name=prodkafka2z1.h.ae-rus.net
cluster.name=MrgOne
node.id=2
broker.rack=z1
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector
process.roles=broker
delete.topic.enable=True
auto.create.topics.enable=True
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
listeners=EXTERNAL://:9092,INTERNAL://:29092
advertised.listeners=EXTERNAL://prod-kafka-12.ae-rus.net:9092,INTERNAL://prod-kafka-12.ae-rus.net:29092
controller.listener.names=CONTROLLER
inter.broker.listener.name=EXTERNAL
controller.quorum.voters=1...@prodkafkacontr11z1.h.ae-rus.net:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091
controller.quorum.bootstrap.servers=prodkafkacontr11z1.h.ae-rus.net:9091,prodkafkacontr12z1.h.ae-rus.net:9091,prodkafkacontr13z1.h.ae-rus.net:9091,prodkafkacontr17z3.h.ae-rus.net:9091,prodkafkacontr18z3.h.ae-rus.net:9091,prodkafkacontr19z3.h.ae-rus.net:9091,prodkafkacontr14z4.h.ae-rus.net:9091,prodkafkacontr15z4.h.ae-rus.net:9091,prodkafkacontr16z4.h.ae-rus.net:9091
num.network.threads=128
background.threads=128
num.io.threads=128
socket.send.buffer.bytes=2097152
socket.receive.buffer.bytes=2097152
socket.request.max.bytes=104857600
queued.max.requests=32768
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
log.dirs=/data/kafka1/kafka,/data/kafka2/kafka,/data/kafka3/kafka,/data/kafka4/kafka,/data/kafka5/kafka,/data/kafka6/kafka
num.recovery.threads.per.data.dir=16
log.flush.interval.messages=10240
log.flush.interval.ms=5000
log.retention.hours=48
log.retention.bytes=1073741824
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
group.initial.rebalance.delay.ms=1000
auto.leader.rebalance.enable=True
leader.imbalance.check.interval.seconds=300
leader.imbalance.per.broker.percentage=20
default.replication.factor=3
num.replica.fetchers=64
replica.fetch.max.bytes=41943040
replica.fetch.wait.max.ms=1000
replica.high.watermark.checkpoint.interval.ms=5000
replica.socket.timeout.ms=30000
replica.socket.receive.buffer.bytes=65536
replica.lag.time.max.ms=10000
############################# Extra Properties #############################
offsets.topic.num.partitions=150
message.max.bytes=20971520
batch.size=65536
zookeeper.connect=prod-zk-11.ae-rus.net:2181,prod-zk-12.ae-rus.net:2181,prod-zk-13.ae-rus.net:2181,prod-zk-14.ae-rus.net:2181,prod-zk-15.ae-rus.net:2181
rack.aware.assignment.strategy=balance_subtopology
rack.aware.assignment.traffic_cost=org.apache.kafka.streams.processor.internals.assignment.RackAwareTaskAssignor
compression.gzip.level=5
zookeeper.metadata.migration.enable=True
zookeeper.metadata.migration.min.batch.size=512
initial.broker.registration.timeout.ms=18000
{code}
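As a side note, the two property files above can be sanity-checked against the broker-side prerequisites for the ZK-to-KRaft dual-write migration (KIP-866). The helper below is a hypothetical sketch, not part of this report; the {{REQUIRED}} key set is our assumption taken from the migration documentation, not from the logs.

```python
# Sketch: report which ZK-to-KRaft migration keys are missing from a
# server.properties file. REQUIRED is an assumption based on KIP-866.
REQUIRED = {
    "zookeeper.metadata.migration.enable",
    "zookeeper.connect",
    "controller.quorum.voters",
    "controller.listener.names",
}

def missing_migration_keys(path: str) -> list[str]:
    """Return the required migration keys absent from a properties file."""
    present = set()
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            key, _, _ = line.partition("=")
            present.add(key.strip())
    return sorted(REQUIRED - present)
```

Both configs shown above do carry all four keys, so a check like this would pass here; it only rules out the most basic misconfiguration.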
When we try to move to the next step (`Migrating brokers to KRaft`), the broker
fails to register with the controller quorum and crashes.
{code:java}
==> /var/log/kafka/kafka-server.log <==
[2025-05-20 15:24:07,521] INFO Registered kafka:type=kafka.Log4jController
MBean (kafka.utils.Log4jControllerRegistration$)
[2025-05-20 15:24:07,826] INFO [BrokerServer id=2] Transition from SHUTDOWN to
STARTING (kafka.server.BrokerServer)
[2025-05-20 15:24:07,827] INFO [SharedServer id=2] Starting SharedServer
(kafka.server.SharedServer)
[2025-05-20 15:24:07,884] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Recovering unflushed segment 21124. 0/1 recovered for
__cluster_metadata-0. (kafka.log.LogLoader)
[2025-05-20 15:24:07,885] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Loading producer state till offset 21124 with message
format version 2 (kafka.log.UnifiedLog$)
[2025-05-20 15:24:07,886] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Reloading from producer snapshot and rebuilding
producer state from offset 21124 (kafka.log.UnifiedLog$)
[2025-05-20 15:24:07,893] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Producer state recovery took 0ms for snapshot load and
7ms for segment recovery from offset 21124 (kafka.log.UnifiedLog$)
[2025-05-20 15:24:07,958] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Loading producer state till offset 31424 with message
format version 2 (kafka.log.UnifiedLog$)
[2025-05-20 15:24:07,958] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Reloading from producer snapshot and rebuilding
producer state from offset 31424 (kafka.log.UnifiedLog$)
[2025-05-20 15:24:07,960] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Producer state recovery took 2ms for snapshot load and
0ms for segment recovery from offset 31424 (kafka.log.UnifiedLog$)
[2025-05-20 15:24:07,995] INFO Initialized snapshots with IDs
SortedSet(OffsetAndEpoch(offset=21124, epoch=7)) from
/data/kafka1/kafka/__cluster_metadata-0 (kafka.raft.KafkaMetadataLog$)
[2025-05-20 15:24:08,023] INFO [raft-expiration-reaper]: Starting
(kafka.raft.TimingWheelExpirationService$ExpiredOperationReaper)
[2025-05-20 15:24:08,301] INFO [BrokerServer id=2] Starting broker
(kafka.server.BrokerServer)
[2025-05-20 15:24:08,318] INFO [broker-2-ThrottledChannelReaper-Fetch]:
Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2025-05-20 15:24:08,319] INFO [broker-2-ThrottledChannelReaper-Produce]:
Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2025-05-20 15:24:08,320] INFO [broker-2-ThrottledChannelReaper-Request]:
Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2025-05-20 15:24:08,322] INFO
[broker-2-ThrottledChannelReaper-ControllerMutation]: Starting
(kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2025-05-20 15:24:08,365] INFO [BrokerServer id=2] Waiting for controller
quorum voters future (kafka.server.BrokerServer)
[2025-05-20 15:24:08,365] INFO [BrokerServer id=2] Finished waiting for
controller quorum voters future (kafka.server.BrokerServer)
[2025-05-20 15:24:08,371] INFO
[broker-2-to-controller-forwarding-channel-manager]: Starting
(kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:24:08,592] INFO Updated connection-accept-rate max connection
creation rate to 2147483647 (kafka.network.ConnectionQuotas)
[2025-05-20 15:24:08,687] INFO [SocketServer listenerType=BROKER, nodeId=2]
Created data-plane acceptor and processors for endpoint :
ListenerName(EXTERNAL) (kafka.network.SocketServer)
[2025-05-20 15:24:08,688] INFO Updated connection-accept-rate max connection
creation rate to 2147483647 (kafka.network.ConnectionQuotas)
[2025-05-20 15:24:08,759] INFO [SocketServer listenerType=BROKER, nodeId=2]
Created data-plane acceptor and processors for endpoint :
ListenerName(INTERNAL) (kafka.network.SocketServer)
[2025-05-20 15:24:08,765] INFO
[broker-2-to-controller-alter-partition-channel-manager]: Starting
(kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:24:08,771] INFO
[broker-2-to-controller-directory-assignments-channel-manager]: Starting
(kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:24:08,788] INFO [ExpirationReaper-2-Produce]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:24:08,788] INFO [ExpirationReaper-2-Fetch]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:24:08,789] INFO [ExpirationReaper-2-DeleteRecords]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:24:08,790] INFO [ExpirationReaper-2-ElectLeader]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:24:08,790] INFO [ExpirationReaper-2-RemoteFetch]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:24:08,808] INFO [ExpirationReaper-2-Heartbeat]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:24:08,808] INFO [ExpirationReaper-2-Rebalance]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:24:08,863] INFO
[broker-2-to-controller-heartbeat-channel-manager]: Starting
(kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:24:08,865] INFO [BrokerLifecycleManager id=2] Incarnation
ZvarJwVqQZOUzzobd91HlQ of broker 2 in cluster 9ca1s2tbTGm1Bi35aynf6Q is now
STARTING. (kafka.server.BrokerLifecycleManager)
[2025-05-20 15:24:08,898] INFO [ExpirationReaper-2-AlterAcls]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:24:08,978] INFO [BrokerServer id=2] Waiting for the broker
metadata publishers to be installed (kafka.server.BrokerServer)
[2025-05-20 15:24:08,979] INFO [BrokerServer id=2] Finished waiting for the
broker metadata publishers to be installed (kafka.server.BrokerServer)
[2025-05-20 15:24:08,979] INFO [BrokerServer id=2] Waiting for the controller
to acknowledge that we are caught up (kafka.server.BrokerServer)
{code}
{code:java}
==> /var/log/kafka/kafka-server.log <==
[2025-05-20 15:24:13,380] INFO [BrokerLifecycleManager id=2] Unable to register
the broker because the RPC got timed out before it could be sent.
(kafka.server.BrokerLifecycleManager)
[2025-05-20 15:24:17,990] INFO [BrokerLifecycleManager id=2] Unable to register
the broker because the RPC got timed out before it could be sent.
(kafka.server.BrokerLifecycleManager)
[2025-05-20 15:24:22,701] INFO [BrokerLifecycleManager id=2] Unable to register
the broker because the RPC got timed out before it could be sent.
(kafka.server.BrokerLifecycleManager)
[2025-05-20 15:24:26,864] ERROR [BrokerLifecycleManager id=2] Shutting down
because we were unable to register with the controller quorum.
(kafka.server.BrokerLifecycleManager)
[2025-05-20 15:24:26,864] INFO [BrokerLifecycleManager id=2] Transitioning from
STARTING to SHUTTING_DOWN. (kafka.server.BrokerLifecycleManager)
[2025-05-20 15:24:26,865] INFO
[broker-2-to-controller-heartbeat-channel-manager]: Shutting down
(kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:24:26,865] INFO
[broker-2-to-controller-heartbeat-channel-manager]: Stopped
(kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:24:26,865] INFO
[broker-2-to-controller-heartbeat-channel-manager]: Shutdown completed
(kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:24:26,865] ERROR [BrokerServer id=2] Received a fatal error
while waiting for the controller to acknowledge that we are caught up
(kafka.server.BrokerServer)
java.util.concurrent.CancellationException
at
java.base/java.util.concurrent.CompletableFuture.cancel(CompletableFuture.java:2510)
at
kafka.server.BrokerLifecycleManager$ShutdownEvent.run(BrokerLifecycleManager.scala:638)
at
org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:191)
at java.base/java.lang.Thread.run(Thread.java:1583)
[2025-05-20 15:24:26,867] INFO Node to controller channel manager for heartbeat
shutdown (kafka.server.NodeToControllerChannelManagerImpl)
[2025-05-20 15:24:26,868] INFO [BrokerServer id=2] Transition from STARTING to
STARTED (kafka.server.BrokerServer)
[2025-05-20 15:24:26,870] ERROR [BrokerServer id=2] Fatal error during broker
startup. Prepare to shutdown (kafka.server.BrokerServer)
java.lang.RuntimeException: Received a fatal error while waiting for the
controller to acknowledge that we are caught up
at
org.apache.kafka.server.util.FutureUtils.waitWithLogging(FutureUtils.java:73)
at kafka.server.BrokerServer.startup(BrokerServer.scala:502)
at kafka.server.KafkaRaftServer.$anonfun$startup$2(KafkaRaftServer.scala:99)
at
kafka.server.KafkaRaftServer.$anonfun$startup$2$adapted(KafkaRaftServer.scala:99)
at scala.Option.foreach(Option.scala:437)
at kafka.server.KafkaRaftServer.startup(KafkaRaftServer.scala:99)
at kafka.Kafka$.main(Kafka.scala:112)
at kafka.Kafka.main(Kafka.scala)
Caused by: java.util.concurrent.CancellationException
at
java.base/java.util.concurrent.CompletableFuture.cancel(CompletableFuture.java:2510)
at
kafka.server.BrokerLifecycleManager$ShutdownEvent.run(BrokerLifecycleManager.scala:638)
at
org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:191)
at java.base/java.lang.Thread.run(Thread.java:1583)
[2025-05-20 15:24:26,870] INFO [BrokerServer id=2] Transition from STARTED to
SHUTTING_DOWN (kafka.server.BrokerServer)
[2025-05-20 15:24:26,871] INFO [BrokerServer id=2] shutting down
(kafka.server.BrokerServer)
[2025-05-20 15:24:26,871] INFO [SocketServer listenerType=BROKER, nodeId=2]
Stopping socket server request processors (kafka.network.SocketServer)
[2025-05-20 15:24:26,874] ERROR Cannot invoke
"java.nio.channels.ServerSocketChannel.close()" because the return value of
"kafka.network.Acceptor.serverChannel()" is null
(kafka.network.DataPlaneAcceptor)
java.lang.NullPointerException: Cannot invoke
"java.nio.channels.ServerSocketChannel.close()" because the return value of
"kafka.network.Acceptor.serverChannel()" is null
at kafka.network.Acceptor.$anonfun$closeAll$2(SocketServer.scala:711)
at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:68)
at kafka.network.Acceptor.closeAll(SocketServer.scala:711)
at kafka.network.Acceptor.close(SocketServer.scala:678)
at
kafka.network.SocketServer.$anonfun$stopProcessingRequests$4(SocketServer.scala:287)
at
kafka.network.SocketServer.$anonfun$stopProcessingRequests$4$adapted(SocketServer.scala:287)
at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619)
at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617)
at scala.collection.AbstractIterable.foreach(Iterable.scala:935)
at kafka.network.SocketServer.stopProcessingRequests(SocketServer.scala:287)
at kafka.server.BrokerServer.$anonfun$shutdown$5(BrokerServer.scala:638)
at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:68)
at kafka.server.BrokerServer.shutdown(BrokerServer.scala:638)
at kafka.server.KafkaBroker.shutdown(KafkaBroker.scala:98)
at kafka.server.KafkaBroker.shutdown$(KafkaBroker.scala:98)
at kafka.server.BrokerServer.shutdown(BrokerServer.scala:67)
at kafka.server.BrokerServer.startup(BrokerServer.scala:555)
at kafka.server.KafkaRaftServer.$anonfun$startup$2(KafkaRaftServer.scala:99)
at
kafka.server.KafkaRaftServer.$anonfun$startup$2$adapted(KafkaRaftServer.scala:99)
at scala.Option.foreach(Option.scala:437)
at kafka.server.KafkaRaftServer.startup(KafkaRaftServer.scala:99)
at kafka.Kafka$.main(Kafka.scala:112)
at kafka.Kafka.main(Kafka.scala)
[2025-05-20 15:24:26,906] ERROR Cannot invoke
"java.nio.channels.ServerSocketChannel.close()" because the return value of
"kafka.network.Acceptor.serverChannel()" is null
(kafka.network.DataPlaneAcceptor)
java.lang.NullPointerException: Cannot invoke
"java.nio.channels.ServerSocketChannel.close()" because the return value of
"kafka.network.Acceptor.serverChannel()" is null
at kafka.network.Acceptor.$anonfun$closeAll$2(SocketServer.scala:711)
at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:68)
at kafka.network.Acceptor.closeAll(SocketServer.scala:711)
at kafka.network.Acceptor.close(SocketServer.scala:678)
at
kafka.network.SocketServer.$anonfun$stopProcessingRequests$4(SocketServer.scala:287)
at
kafka.network.SocketServer.$anonfun$stopProcessingRequests$4$adapted(SocketServer.scala:287)
at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619)
at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617)
at scala.collection.AbstractIterable.foreach(Iterable.scala:935)
at kafka.network.SocketServer.stopProcessingRequests(SocketServer.scala:287)
at kafka.server.BrokerServer.$anonfun$shutdown$5(BrokerServer.scala:638)
at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:68)
at kafka.server.BrokerServer.shutdown(BrokerServer.scala:638)
at kafka.server.KafkaBroker.shutdown(KafkaBroker.scala:98)
at kafka.server.KafkaBroker.shutdown$(KafkaBroker.scala:98)
at kafka.server.BrokerServer.shutdown(BrokerServer.scala:67)
at kafka.server.BrokerServer.startup(BrokerServer.scala:555)
at kafka.server.KafkaRaftServer.$anonfun$startup$2(KafkaRaftServer.scala:99)
at
kafka.server.KafkaRaftServer.$anonfun$startup$2$adapted(KafkaRaftServer.scala:99)
at scala.Option.foreach(Option.scala:437)
at kafka.server.KafkaRaftServer.startup(KafkaRaftServer.scala:99)
at kafka.Kafka$.main(Kafka.scala:112)
at kafka.Kafka.main(Kafka.scala)
[2025-05-20 15:24:26,932] INFO [SocketServer listenerType=BROKER, nodeId=2]
Stopped socket server request processors (kafka.network.SocketServer) {code}
The quorum is healthy and reachable from the brokers:
{code:java}
root@prodkafka2z1:/data# /opt/kafka/bin/kafka-metadata-quorum.sh --bootstrap-controller prodkafkacontr11z1.h.ae-rus.net:9091 describe --replication
NodeId  DirectoryId             LogEndOffset  Lag  LastFetchTimestamp  LastCaughtUpTimestamp  Status
111     L_5_4pNVTlyBl6_go_TkMA  5457          0    1747754839735       1747754839735          Leader
117     uXDetGxGYrzMssuEw-TsKQ  5457          0    1747754839705       1747754839705          Observer
118     EAqK0bysSLhV57oMWfeRbw  5457          0    1747754839705       1747754839705          Observer
112     utKjIX3i9UJU-dZf5e5rfQ  5457          0    1747754839700       1747754839700          Observer
113     gqu36Z80ogEmZ0cflxvAAw  5457          0    1747754839699       1747754839699          Observer
119     UcFbHUd5RGqXoqmsNXYYsA  5457          0    1747754839705       1747754839705          Observer
115     jH9oLOSeGoth0R7w6CPT3A  5457          0    1747754839701       1747754839701          Observer
114     nMGKPes0EA8gB4YS1E7KJw  5457          0    1747754839701       1747754839701          Observer
116     eoiyLM_huf88Ny5ZexL_7w  5457          0    1747754839701       1747754839701          Observer
root@prodkafka2z1:/data# /opt/kafka/bin/kafka-metadata-quorum.sh --bootstrap-controller prodkafkacontr11z1.h.ae-rus.net:9091 describe --status
ClusterId: 9ca1s2tbTGm1Bi35aynf6Q
LeaderId: 111
LeaderEpoch: 2
HighWatermark: 5467
MaxFollowerLag: 0
MaxFollowerLagTimeMs: 0
CurrentVoters: [{"id": 111, "directoryId": "L_5_4pNVTlyBl6_go_TkMA", "endpoints": ["CONTROLLER://prodkafkacontr11z1.h.ae-rus.net:9091"]}]
CurrentObservers: [{"id": 117, "directoryId": "uXDetGxGYrzMssuEw-TsKQ"}, {"id": 118, "directoryId": "EAqK0bysSLhV57oMWfeRbw"}, {"id": 112, "directoryId": "utKjIX3i9UJU-dZf5e5rfQ"}, {"id": 113, "directoryId": "gqu36Z80ogEmZ0cflxvAAw"}, {"id": 119, "directoryId": "UcFbHUd5RGqXoqmsNXYYsA"}, {"id": 115, "directoryId": "jH9oLOSeGoth0R7w6CPT3A"}, {"id": 114, "directoryId": "nMGKPes0EA8gB4YS1E7KJw"}, {"id": 116, "directoryId": "eoiyLM_huf88Ny5ZexL_7w"}] {code}
was:
Despite several attempts to migrate from a ZooKeeper cluster to KRaft, the
migration never completed properly.
We spawned a new, fully healthy cluster of 18 Kafka nodes connected to 5
ZooKeeper nodes, plus 9 new Kafka nodes intended as the new controllers.
This was tested with Kafka 3.9.0.
On startup, the controllers log:
{code:java}
==> /var/log/kafka/kafka-server.log <==
[2025-05-20 15:04:28,798] INFO Registered kafka:type=kafka.Log4jController
MBean (kafka.utils.Log4jControllerRegistration$)
[2025-05-20 15:04:29,207] INFO [ControllerServer id=111] Starting controller
(kafka.server.ControllerServer)
[2025-05-20 15:04:29,656] INFO Updated connection-accept-rate max connection
creation rate to 2147483647 (kafka.network.ConnectionQuotas)
[2025-05-20 15:04:29,784] INFO [SocketServer listenerType=CONTROLLER,
nodeId=111] Created data-plane acceptor and processors for endpoint :
ListenerName(CONTROLLER) (kafka.network.SocketServer)
[2025-05-20 15:04:29,792] INFO [SharedServer id=111] Starting SharedServer
(kafka.server.SharedServer)
[2025-05-20 15:04:29,841] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Recovering unflushed segment 0. 0/1 recovered for
__cluster_metadata-0. (kafka.log.LogLoader)
[2025-05-20 15:04:29,842] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Loading producer state till offset 0 with message
format version 2 (kafka.log.UnifiedLog$)
[2025-05-20 15:04:29,843] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Reloading from producer snapshot and rebuilding
producer state from offset 0 (kafka.log.UnifiedLog$)
[2025-05-20 15:04:29,843] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Producer state recovery took 0ms for snapshot load and
0ms for segment recovery from offset 0 (kafka.log.UnifiedLog$)
[2025-05-20 15:04:29,905] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Loading producer state till offset 2709 with message
format version 2 (kafka.log.UnifiedLog$)
[2025-05-20 15:04:29,905] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Reloading from producer snapshot and rebuilding
producer state from offset 2709 (kafka.log.UnifiedLog$)
[2025-05-20 15:04:29,906] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Producer state recovery took 1ms for snapshot load and
0ms for segment recovery from offset 2709 (kafka.log.UnifiedLog$)
[2025-05-20 15:04:29,940] INFO Initialized snapshots with IDs
SortedSet(OffsetAndEpoch(offset=0, epoch=0)) from
/data/kafka1/kafka/__cluster_metadata-0 (kafka.raft.KafkaMetadataLog$)
[2025-05-20 15:04:29,952] INFO [raft-expiration-reaper]: Starting
(kafka.raft.TimingWheelExpirationService$ExpiredOperationReaper)
[2025-05-20 15:04:30,191] INFO [ControllerServer id=111] Waiting for controller
quorum voters future (kafka.server.ControllerServer)
[2025-05-20 15:04:30,191] INFO [ControllerServer id=111] Finished waiting for
controller quorum voters future (kafka.server.ControllerServer)
[2025-05-20 15:04:30,233] INFO [ZooKeeperClient KRaft Migration] Initializing a
new session to
prod-zk-11.ae-rus.net:2181,prod-zk-12.ae-rus.net:2181,prod-zk-13.ae-rus.net:2181,prod-zk-14.ae-rus.net:2181,prod-zk-15.ae-rus.net:2181.
(kafka.zookeeper.ZooKeeperClient)
[2025-05-20 15:04:30,248] INFO [ZooKeeperClient KRaft Migration] Waiting until
connected. (kafka.zookeeper.ZooKeeperClient)
[2025-05-20 15:04:30,261] INFO [ZooKeeperClient KRaft Migration] Connected.
(kafka.zookeeper.ZooKeeperClient)
[2025-05-20 15:04:30,293] INFO [controller-111-ThrottledChannelReaper-Fetch]:
Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2025-05-20 15:04:30,293] INFO [controller-111-ThrottledChannelReaper-Produce]:
Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2025-05-20 15:04:30,294] INFO [controller-111-ThrottledChannelReaper-Request]:
Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2025-05-20 15:04:30,297] INFO
[controller-111-ThrottledChannelReaper-ControllerMutation]: Starting
(kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2025-05-20 15:04:30,316] INFO [ExpirationReaper-111-AlterAcls]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:04:30,392] INFO [ControllerServer id=111] Waiting for the
controller metadata publishers to be installed (kafka.server.ControllerServer)
[2025-05-20 15:04:30,393] INFO [ControllerServer id=111] Finished waiting for
the controller metadata publishers to be installed
(kafka.server.ControllerServer)
[2025-05-20 15:04:30,393] INFO [SocketServer listenerType=CONTROLLER,
nodeId=111] Enabling request processing. (kafka.network.SocketServer)
[2025-05-20 15:04:30,394] INFO [ControllerRegistrationManager id=111
incarnation=mbreItYzTmySXeahJkXxFA] Found registration for
UwcgQ_r6T_uBVDMCdWLG6A instead of our incarnation.
(kafka.server.ControllerRegistrationManager)
[2025-05-20 15:04:30,395] INFO Awaiting socket connections on 0.0.0.0:9091.
(kafka.network.DataPlaneAcceptor)
[2025-05-20 15:04:30,455] INFO
[controller-111-to-controller-registration-channel-manager]: Starting
(kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:04:30,455] INFO [ControllerRegistrationManager id=111
incarnation=mbreItYzTmySXeahJkXxFA] initialized channel manager.
(kafka.server.ControllerRegistrationManager)
[2025-05-20 15:04:30,455] INFO [ControllerServer id=111] Waiting for all of the
authorizer futures to be completed (kafka.server.ControllerServer)
[2025-05-20 15:04:30,456] INFO [ControllerServer id=111] Finished waiting for
all of the authorizer futures to be completed (kafka.server.ControllerServer)
[2025-05-20 15:04:30,456] INFO [ControllerServer id=111] Waiting for all of the
SocketServer Acceptors to be started (kafka.server.ControllerServer)
[2025-05-20 15:04:30,456] INFO [ControllerServer id=111] Finished waiting for
all of the SocketServer Acceptors to be started (kafka.server.ControllerServer)
[2025-05-20 15:04:30,457] INFO [ControllerRegistrationManager id=111
incarnation=mbreItYzTmySXeahJkXxFA] sendControllerRegistration: attempting to
send ControllerRegistrationRequestData(controllerId=111,
incarnationId=mbreItYzTmySXeahJkXxFA, zkMigrationReady=true,
listeners=[Listener(name='CONTROLLER', host='prodkafkacontr11z1.h.ae-rus.net',
port=9091, securityProtocol=0)], features=[Feature(name='kraft.version',
minSupportedVersion=0, maxSupportedVersion=1), Feature(name='metadata.version',
minSupportedVersion=1, maxSupportedVersion=21)])
(kafka.server.ControllerRegistrationManager)
[2025-05-20 15:04:30,471] INFO
[controller-111-to-controller-registration-channel-manager]: Recorded new KRaft
controller, from now on will use node prodkafkacontr11z1.h.ae-rus.net:9091 (id:
111 rack: null) (kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:04:30,505] INFO [KafkaRaftServer nodeId=111] Kafka Server
started (kafka.server.KafkaRaftServer)
[2025-05-20 15:04:30,579] INFO [ControllerRegistrationManager id=111
incarnation=mbreItYzTmySXeahJkXxFA] Our registration has been persisted to the
metadata log. (kafka.server.ControllerRegistrationManager)
[2025-05-20 15:04:30,581] INFO [ControllerRegistrationManager id=111
incarnation=mbreItYzTmySXeahJkXxFA] RegistrationResponseHandler: controller
acknowledged ControllerRegistrationRequest.
(kafka.server.ControllerRegistrationManager){code}
{code:java}
cat /var/log/kafka/controller.log
[2025-05-20 15:04:30,357] TRACE [KRaftMigrationDriver id=111] Recovered
migration state from ZK in 68765354 ns. Transitioned migration state from
ZkMigrationLeadershipState{kraftControllerId=-1, kraftControllerEpoch=-1,
kraftMetadataOffset=-1, kraftMetadataEpoch=-1, lastUpdatedTimeMs=-1,
migrationZkVersion=-2, controllerZkEpoch=-1, controllerZkVersion=-2} to
ZkMigrationLeadershipState{kraftControllerId=-1, kraftControllerEpoch=-1,
kraftMetadataOffset=-1, kraftMetadataEpoch=-1, lastUpdatedTimeMs=1747648116174,
migrationZkVersion=0, controllerZkEpoch=-1, controllerZkVersion=-2}
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,358] INFO [KRaftMigrationDriver id=111] Initial migration
of ZK metadata is not done.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,358] INFO [KRaftMigrationDriver id=111] 111 transitioning
from UNINITIALIZED to INACTIVE state
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,388] INFO [KRaftMigrationDriver id=111] Became active
migration driver in 27621312 ns. Transitioned migration state from
ZkMigrationLeadershipState{kraftControllerId=-1, kraftControllerEpoch=-1,
kraftMetadataOffset=-1, kraftMetadataEpoch=-1, lastUpdatedTimeMs=1747648116174,
migrationZkVersion=0, controllerZkEpoch=-1, controllerZkVersion=-2} to
ZkMigrationLeadershipState{kraftControllerId=111, kraftControllerEpoch=2,
kraftMetadataOffset=-1, kraftMetadataEpoch=-1, lastUpdatedTimeMs=1747648116174,
migrationZkVersion=0, controllerZkEpoch=-1, controllerZkVersion=-2}
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,388] INFO [KRaftMigrationDriver id=111] 111 transitioning
from INACTIVE to WAIT_FOR_CONTROLLER_QUORUM state
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,388] INFO [KRaftMigrationDriver id=111] Controller Quorum
is ready for Zk to KRaft migration. Now waiting for ZK brokers.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,388] INFO [KRaftMigrationDriver id=111] 111 transitioning
from WAIT_FOR_CONTROLLER_QUORUM to WAIT_FOR_BROKERS state
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,389] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,494] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,579] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,580] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,645] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,645] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,701] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,701] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,730] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,730] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,778] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,861] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,861] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,905] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,905] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:31,202] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:31,202] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:31,278] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:31,778] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver) {code}
The configuration on the controller node is the following:
{code:java}
cat /etc/kafka/server.properties
##
# Ansible managed
#host.name=prodkafkacontr14z4.h.ae-rus.net
cluster.name=MrgOne
node.id=114
controller.rack=z4
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector
process.roles=controller
delete.topic.enable=True
auto.create.topics.enable=True
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
listeners=CONTROLLER://:9091
advertised.listeners=CONTROLLER://prodkafkacontr14z4.h.ae-rus.net:9091
controller.listener.names=CONTROLLER
inter.broker.listener.name=EXTERNAL
controller.quorum.voters=1...@prodkafkacontr11z1.h.ae-rus.net:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091
controller.quorum.bootstrap.servers=prodkafkacontr11z1.h.ae-rus.net:9091,prodkafkacontr12z1.h.ae-rus.net:9091,prodkafkacontr13z1.h.ae-rus.net:9091,prodkafkacontr17z3.h.ae-rus.net:9091,prodkafkacontr18z3.h.ae-rus.net:9091,prodkafkacontr19z3.h.ae-rus.net:9091,prodkafkacontr14z4.h.ae-rus.net:9091,prodkafkacontr15z4.h.ae-rus.net:9091,prodkafkacontr16z4.h.ae-rus.net:9091
num.network.threads=128
background.threads=128
num.io.threads=128
socket.send.buffer.bytes=2097152
socket.receive.buffer.bytes=2097152
socket.request.max.bytes=104857600
queued.max.requests=32768
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
log.dirs=/data/kafka1/kafka
num.recovery.threads.per.data.dir=16
log.flush.interval.messages=10240
log.flush.interval.ms=5000
log.retention.hours=48
log.retention.bytes=1073741824
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
group.initial.rebalance.delay.ms=1000
auto.leader.rebalance.enable=True
leader.imbalance.check.interval.seconds=300
leader.imbalance.per.broker.percentage=20
default.replication.factor=3
num.replica.fetchers=64
replica.fetch.max.bytes=41943040
replica.fetch.wait.max.ms=1000
replica.high.watermark.checkpoint.interval.ms=5000
replica.socket.timeout.ms=30000
replica.socket.receive.buffer.bytes=65536
replica.lag.time.max.ms=10000
############################# Extra Properties #############################
offsets.topic.num.partitions=150
message.max.bytes=20971520
batch.size=65536
zookeeper.connect=prod-zk-11.ae-rus.net:2181,prod-zk-12.ae-rus.net:2181,prod-zk-13.ae-rus.net:2181,prod-zk-14.ae-rus.net:2181,prod-zk-15.ae-rus.net:2181
rack.aware.assignment.strategy=balance_subtopology
rack.aware.assignment.traffic_cost=org.apache.kafka.streams.processor.internals.assignment.RackAwareTaskAssignor
compression.gzip.level=5
zookeeper.metadata.migration.enable=True
zookeeper.metadata.migration.min.batch.size=512
initial.broker.registration.timeout.ms=18000 {code}
The configuration on the broker node is the following:
{code:java}
$ cat /etc/kafka/server.properties
##
# Ansible managed
#host.name=prodkafka2z1.h.ae-rus.net
cluster.name=MrgOne
node.id=2
broker.rack=z1
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector
process.roles=broker
delete.topic.enable=True
auto.create.topics.enable=True
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
listeners=EXTERNAL://:9092,INTERNAL://:29092
advertised.listeners=EXTERNAL://prod-kafka-12.ae-rus.net:9092,INTERNAL://prod-kafka-12.ae-rus.net:29092
controller.listener.names=CONTROLLER
inter.broker.listener.name=EXTERNAL
controller.quorum.voters=1...@prodkafkacontr11z1.h.ae-rus.net:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091
controller.quorum.bootstrap.servers=prodkafkacontr11z1.h.ae-rus.net:9091,prodkafkacontr12z1.h.ae-rus.net:9091,prodkafkacontr13z1.h.ae-rus.net:9091,prodkafkacontr17z3.h.ae-rus.net:9091,prodkafkacontr18z3.h.ae-rus.net:9091,prodkafkacontr19z3.h.ae-rus.net:9091,prodkafkacontr14z4.h.ae-rus.net:9091,prodkafkacontr15z4.h.ae-rus.net:9091,prodkafkacontr16z4.h.ae-rus.net:9091
num.network.threads=128
background.threads=128
num.io.threads=128
socket.send.buffer.bytes=2097152
socket.receive.buffer.bytes=2097152
socket.request.max.bytes=104857600
queued.max.requests=32768
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
log.dirs=/data/kafka1/kafka,/data/kafka2/kafka,/data/kafka3/kafka,/data/kafka4/kafka,/data/kafka5/kafka,/data/kafka6/kafka
num.recovery.threads.per.data.dir=16
log.flush.interval.messages=10240
log.flush.interval.ms=5000
log.retention.hours=48
log.retention.bytes=1073741824
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
group.initial.rebalance.delay.ms=1000
auto.leader.rebalance.enable=True
leader.imbalance.check.interval.seconds=300
leader.imbalance.per.broker.percentage=20
default.replication.factor=3
num.replica.fetchers=64
replica.fetch.max.bytes=41943040
replica.fetch.wait.max.ms=1000
replica.high.watermark.checkpoint.interval.ms=5000
replica.socket.timeout.ms=30000
replica.socket.receive.buffer.bytes=65536
replica.lag.time.max.ms=10000
############################# Extra Properties #############################
offsets.topic.num.partitions=150
message.max.bytes=20971520
batch.size=65536
zookeeper.connect=prod-zk-11.ae-rus.net:2181,prod-zk-12.ae-rus.net:2181,prod-zk-13.ae-rus.net:2181,prod-zk-14.ae-rus.net:2181,prod-zk-15.ae-rus.net:2181
rack.aware.assignment.strategy=balance_subtopology
rack.aware.assignment.traffic_cost=org.apache.kafka.streams.processor.internals.assignment.RackAwareTaskAssignor
compression.gzip.level=5
zookeeper.metadata.migration.enable=True
zookeeper.metadata.migration.min.batch.size=512
initial.broker.registration.timeout.ms=18000
{code}
When we try to move to the next step (`Migrating brokers to KRaft`), the
broker fails to register with the controller quorum and crashes:
{code:java}
==> /var/log/kafka/kafka-server.log <==
[2025-05-20 15:24:07,521] INFO Registered kafka:type=kafka.Log4jController
MBean (kafka.utils.Log4jControllerRegistration$)
[2025-05-20 15:24:07,826] INFO [BrokerServer id=2] Transition from SHUTDOWN to
STARTING (kafka.server.BrokerServer)
[2025-05-20 15:24:07,827] INFO [SharedServer id=2] Starting SharedServer
(kafka.server.SharedServer)
[2025-05-20 15:24:07,884] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Recovering unflushed segment 21124. 0/1 recovered for
__cluster_metadata-0. (kafka.log.LogLoader)
[2025-05-20 15:24:07,885] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Loading producer state till offset 21124 with message
format version 2 (kafka.log.UnifiedLog$)
[2025-05-20 15:24:07,886] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Reloading from producer snapshot and rebuilding
producer state from offset 21124 (kafka.log.UnifiedLog$)
[2025-05-20 15:24:07,893] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Producer state recovery took 0ms for snapshot load and
7ms for segment recovery from offset 21124 (kafka.log.UnifiedLog$)
[2025-05-20 15:24:07,958] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Loading producer state till offset 31424 with message
format version 2 (kafka.log.UnifiedLog$)
[2025-05-20 15:24:07,958] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Reloading from producer snapshot and rebuilding
producer state from offset 31424 (kafka.log.UnifiedLog$)
[2025-05-20 15:24:07,960] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Producer state recovery took 2ms for snapshot load and
0ms for segment recovery from offset 31424 (kafka.log.UnifiedLog$)
[2025-05-20 15:24:07,995] INFO Initialized snapshots with IDs
SortedSet(OffsetAndEpoch(offset=21124, epoch=7)) from
/data/kafka1/kafka/__cluster_metadata-0 (kafka.raft.KafkaMetadataLog$)
[2025-05-20 15:24:08,023] INFO [raft-expiration-reaper]: Starting
(kafka.raft.TimingWheelExpirationService$ExpiredOperationReaper)
[2025-05-20 15:24:08,301] INFO [BrokerServer id=2] Starting broker
(kafka.server.BrokerServer)
[2025-05-20 15:24:08,318] INFO [broker-2-ThrottledChannelReaper-Fetch]:
Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2025-05-20 15:24:08,319] INFO [broker-2-ThrottledChannelReaper-Produce]:
Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2025-05-20 15:24:08,320] INFO [broker-2-ThrottledChannelReaper-Request]:
Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2025-05-20 15:24:08,322] INFO
[broker-2-ThrottledChannelReaper-ControllerMutation]: Starting
(kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2025-05-20 15:24:08,365] INFO [BrokerServer id=2] Waiting for controller
quorum voters future (kafka.server.BrokerServer)
[2025-05-20 15:24:08,365] INFO [BrokerServer id=2] Finished waiting for
controller quorum voters future (kafka.server.BrokerServer)
[2025-05-20 15:24:08,371] INFO
[broker-2-to-controller-forwarding-channel-manager]: Starting
(kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:24:08,592] INFO Updated connection-accept-rate max connection
creation rate to 2147483647 (kafka.network.ConnectionQuotas)
[2025-05-20 15:24:08,687] INFO [SocketServer listenerType=BROKER, nodeId=2]
Created data-plane acceptor and processors for endpoint :
ListenerName(EXTERNAL) (kafka.network.SocketServer)
[2025-05-20 15:24:08,688] INFO Updated connection-accept-rate max connection
creation rate to 2147483647 (kafka.network.ConnectionQuotas)
[2025-05-20 15:24:08,759] INFO [SocketServer listenerType=BROKER, nodeId=2]
Created data-plane acceptor and processors for endpoint :
ListenerName(INTERNAL) (kafka.network.SocketServer)
[2025-05-20 15:24:08,765] INFO
[broker-2-to-controller-alter-partition-channel-manager]: Starting
(kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:24:08,771] INFO
[broker-2-to-controller-directory-assignments-channel-manager]: Starting
(kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:24:08,788] INFO [ExpirationReaper-2-Produce]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:24:08,788] INFO [ExpirationReaper-2-Fetch]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:24:08,789] INFO [ExpirationReaper-2-DeleteRecords]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:24:08,790] INFO [ExpirationReaper-2-ElectLeader]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:24:08,790] INFO [ExpirationReaper-2-RemoteFetch]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:24:08,808] INFO [ExpirationReaper-2-Heartbeat]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:24:08,808] INFO [ExpirationReaper-2-Rebalance]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:24:08,863] INFO
[broker-2-to-controller-heartbeat-channel-manager]: Starting
(kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:24:08,865] INFO [BrokerLifecycleManager id=2] Incarnation
ZvarJwVqQZOUzzobd91HlQ of broker 2 in cluster 9ca1s2tbTGm1Bi35aynf6Q is now
STARTING. (kafka.server.BrokerLifecycleManager)
[2025-05-20 15:24:08,898] INFO [ExpirationReaper-2-AlterAcls]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:24:08,978] INFO [BrokerServer id=2] Waiting for the broker
metadata publishers to be installed (kafka.server.BrokerServer)
[2025-05-20 15:24:08,979] INFO [BrokerServer id=2] Finished waiting for the
broker metadata publishers to be installed (kafka.server.BrokerServer)
[2025-05-20 15:24:08,979] INFO [BrokerServer id=2] Waiting for the controller
to acknowledge that we are caught up (kafka.server.BrokerServer)
{code}
{code:java}
==> /var/log/kafka/kafka-server.log <==
[2025-05-20 15:24:13,380] INFO [BrokerLifecycleManager id=2] Unable to register
the broker because the RPC got timed out before it could be sent.
(kafka.server.BrokerLifecycleManager)
[2025-05-20 15:24:17,990] INFO [BrokerLifecycleManager id=2] Unable to register
the broker because the RPC got timed out before it could be sent.
(kafka.server.BrokerLifecycleManager)
[2025-05-20 15:24:22,701] INFO [BrokerLifecycleManager id=2] Unable to register
the broker because the RPC got timed out before it could be sent.
(kafka.server.BrokerLifecycleManager)
[2025-05-20 15:24:26,864] ERROR [BrokerLifecycleManager id=2] Shutting down
because we were unable to register with the controller quorum.
(kafka.server.BrokerLifecycleManager)
[2025-05-20 15:24:26,864] INFO [BrokerLifecycleManager id=2] Transitioning from
STARTING to SHUTTING_DOWN. (kafka.server.BrokerLifecycleManager)
[2025-05-20 15:24:26,865] INFO
[broker-2-to-controller-heartbeat-channel-manager]: Shutting down
(kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:24:26,865] INFO
[broker-2-to-controller-heartbeat-channel-manager]: Stopped
(kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:24:26,865] INFO
[broker-2-to-controller-heartbeat-channel-manager]: Shutdown completed
(kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:24:26,865] ERROR [BrokerServer id=2] Received a fatal error
while waiting for the controller to acknowledge that we are caught up
(kafka.server.BrokerServer)
java.util.concurrent.CancellationException
at
java.base/java.util.concurrent.CompletableFuture.cancel(CompletableFuture.java:2510)
at
kafka.server.BrokerLifecycleManager$ShutdownEvent.run(BrokerLifecycleManager.scala:638)
at
org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:191)
at java.base/java.lang.Thread.run(Thread.java:1583)
[2025-05-20 15:24:26,867] INFO Node to controller channel manager for heartbeat
shutdown (kafka.server.NodeToControllerChannelManagerImpl)
[2025-05-20 15:24:26,868] INFO [BrokerServer id=2] Transition from STARTING to
STARTED (kafka.server.BrokerServer)
[2025-05-20 15:24:26,870] ERROR [BrokerServer id=2] Fatal error during broker
startup. Prepare to shutdown (kafka.server.BrokerServer)
java.lang.RuntimeException: Received a fatal error while waiting for the
controller to acknowledge that we are caught up
at
org.apache.kafka.server.util.FutureUtils.waitWithLogging(FutureUtils.java:73)
at kafka.server.BrokerServer.startup(BrokerServer.scala:502)
at kafka.server.KafkaRaftServer.$anonfun$startup$2(KafkaRaftServer.scala:99)
at
kafka.server.KafkaRaftServer.$anonfun$startup$2$adapted(KafkaRaftServer.scala:99)
at scala.Option.foreach(Option.scala:437)
at kafka.server.KafkaRaftServer.startup(KafkaRaftServer.scala:99)
at kafka.Kafka$.main(Kafka.scala:112)
at kafka.Kafka.main(Kafka.scala)
Caused by: java.util.concurrent.CancellationException
at
java.base/java.util.concurrent.CompletableFuture.cancel(CompletableFuture.java:2510)
at
kafka.server.BrokerLifecycleManager$ShutdownEvent.run(BrokerLifecycleManager.scala:638)
at
org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:191)
at java.base/java.lang.Thread.run(Thread.java:1583)
[2025-05-20 15:24:26,870] INFO [BrokerServer id=2] Transition from STARTED to
SHUTTING_DOWN (kafka.server.BrokerServer)
[2025-05-20 15:24:26,871] INFO [BrokerServer id=2] shutting down
(kafka.server.BrokerServer)
[2025-05-20 15:24:26,871] INFO [SocketServer listenerType=BROKER, nodeId=2]
Stopping socket server request processors (kafka.network.SocketServer)
[2025-05-20 15:24:26,874] ERROR Cannot invoke
"java.nio.channels.ServerSocketChannel.close()" because the return value of
"kafka.network.Acceptor.serverChannel()" is null
(kafka.network.DataPlaneAcceptor)
java.lang.NullPointerException: Cannot invoke "java.nio.channels.ServerSocketChannel.close()" because the return value of "kafka.network.Acceptor.serverChannel()" is null
    at kafka.network.Acceptor.$anonfun$closeAll$2(SocketServer.scala:711)
    at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:68)
    at kafka.network.Acceptor.closeAll(SocketServer.scala:711)
    at kafka.network.Acceptor.close(SocketServer.scala:678)
    at kafka.network.SocketServer.$anonfun$stopProcessingRequests$4(SocketServer.scala:287)
    at kafka.network.SocketServer.$anonfun$stopProcessingRequests$4$adapted(SocketServer.scala:287)
    at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619)
    at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:935)
    at kafka.network.SocketServer.stopProcessingRequests(SocketServer.scala:287)
    at kafka.server.BrokerServer.$anonfun$shutdown$5(BrokerServer.scala:638)
    at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:68)
    at kafka.server.BrokerServer.shutdown(BrokerServer.scala:638)
    at kafka.server.KafkaBroker.shutdown(KafkaBroker.scala:98)
    at kafka.server.KafkaBroker.shutdown$(KafkaBroker.scala:98)
    at kafka.server.BrokerServer.shutdown(BrokerServer.scala:67)
    at kafka.server.BrokerServer.startup(BrokerServer.scala:555)
    at kafka.server.KafkaRaftServer.$anonfun$startup$2(KafkaRaftServer.scala:99)
    at kafka.server.KafkaRaftServer.$anonfun$startup$2$adapted(KafkaRaftServer.scala:99)
    at scala.Option.foreach(Option.scala:437)
    at kafka.server.KafkaRaftServer.startup(KafkaRaftServer.scala:99)
    at kafka.Kafka$.main(Kafka.scala:112)
    at kafka.Kafka.main(Kafka.scala)
[2025-05-20 15:24:26,906] ERROR Cannot invoke
"java.nio.channels.ServerSocketChannel.close()" because the return value of
"kafka.network.Acceptor.serverChannel()" is null
(kafka.network.DataPlaneAcceptor)
java.lang.NullPointerException: Cannot invoke "java.nio.channels.ServerSocketChannel.close()" because the return value of "kafka.network.Acceptor.serverChannel()" is null
    at kafka.network.Acceptor.$anonfun$closeAll$2(SocketServer.scala:711)
    at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:68)
    at kafka.network.Acceptor.closeAll(SocketServer.scala:711)
    at kafka.network.Acceptor.close(SocketServer.scala:678)
    at kafka.network.SocketServer.$anonfun$stopProcessingRequests$4(SocketServer.scala:287)
    at kafka.network.SocketServer.$anonfun$stopProcessingRequests$4$adapted(SocketServer.scala:287)
    at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619)
    at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:935)
    at kafka.network.SocketServer.stopProcessingRequests(SocketServer.scala:287)
    at kafka.server.BrokerServer.$anonfun$shutdown$5(BrokerServer.scala:638)
    at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:68)
    at kafka.server.BrokerServer.shutdown(BrokerServer.scala:638)
    at kafka.server.KafkaBroker.shutdown(KafkaBroker.scala:98)
    at kafka.server.KafkaBroker.shutdown$(KafkaBroker.scala:98)
    at kafka.server.BrokerServer.shutdown(BrokerServer.scala:67)
    at kafka.server.BrokerServer.startup(BrokerServer.scala:555)
    at kafka.server.KafkaRaftServer.$anonfun$startup$2(KafkaRaftServer.scala:99)
    at kafka.server.KafkaRaftServer.$anonfun$startup$2$adapted(KafkaRaftServer.scala:99)
    at scala.Option.foreach(Option.scala:437)
    at kafka.server.KafkaRaftServer.startup(KafkaRaftServer.scala:99)
    at kafka.Kafka$.main(Kafka.scala:112)
    at kafka.Kafka.main(Kafka.scala)
[2025-05-20 15:24:26,932] INFO [SocketServer listenerType=BROKER, nodeId=2]
Stopped socket server request processors (kafka.network.SocketServer) {code}
> Zookeeper-Kraft failing migration - RPC got timed out before it could be sent
> -----------------------------------------------------------------------------
>
> Key: KAFKA-19318
> URL: https://issues.apache.org/jira/browse/KAFKA-19318
> Project: Kafka
> Issue Type: Bug
> Components: kraft
> Affects Versions: 3.9.0
> Reporter: dennis lucero
> Priority: Major
>
> Despite several attempts to migrate from a ZooKeeper cluster to KRaft, the
> migration never completed properly.
> We spawned a new, fully healthy cluster of 18 Kafka nodes connected to 5
> ZooKeeper nodes, plus 9 additional Kafka nodes that host the new controllers.
> This was tested with Kafka 3.9.0.
> The controllers are started with:
> {code:java}
> ==> /var/log/kafka/kafka-server.log <==
> [2025-05-20 15:04:28,798] INFO Registered kafka:type=kafka.Log4jController
> MBean (kafka.utils.Log4jControllerRegistration$)
> [2025-05-20 15:04:29,207] INFO [ControllerServer id=111] Starting controller
> (kafka.server.ControllerServer)
> [2025-05-20 15:04:29,656] INFO Updated connection-accept-rate max connection
> creation rate to 2147483647 (kafka.network.ConnectionQuotas)
> [2025-05-20 15:04:29,784] INFO [SocketServer listenerType=CONTROLLER,
> nodeId=111] Created data-plane acceptor and processors for endpoint :
> ListenerName(CONTROLLER) (kafka.network.SocketServer)
> [2025-05-20 15:04:29,792] INFO [SharedServer id=111] Starting SharedServer
> (kafka.server.SharedServer)
> [2025-05-20 15:04:29,841] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Recovering unflushed segment 0. 0/1 recovered for
> __cluster_metadata-0. (kafka.log.LogLoader)
> [2025-05-20 15:04:29,842] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Loading producer state till offset 0 with message
> format version 2 (kafka.log.UnifiedLog$)
> [2025-05-20 15:04:29,843] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Reloading from producer snapshot and rebuilding
> producer state from offset 0 (kafka.log.UnifiedLog$)
> [2025-05-20 15:04:29,843] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Producer state recovery took 0ms for snapshot load
> and 0ms for segment recovery from offset 0 (kafka.log.UnifiedLog$)
> [2025-05-20 15:04:29,905] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Loading producer state till offset 2709 with message
> format version 2 (kafka.log.UnifiedLog$)
> [2025-05-20 15:04:29,905] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Reloading from producer snapshot and rebuilding
> producer state from offset 2709 (kafka.log.UnifiedLog$)
> [2025-05-20 15:04:29,906] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Producer state recovery took 1ms for snapshot load
> and 0ms for segment recovery from offset 2709 (kafka.log.UnifiedLog$)
> [2025-05-20 15:04:29,940] INFO Initialized snapshots with IDs
> SortedSet(OffsetAndEpoch(offset=0, epoch=0)) from
> /data/kafka1/kafka/__cluster_metadata-0 (kafka.raft.KafkaMetadataLog$)
> [2025-05-20 15:04:29,952] INFO [raft-expiration-reaper]: Starting
> (kafka.raft.TimingWheelExpirationService$ExpiredOperationReaper)
> [2025-05-20 15:04:30,191] INFO [ControllerServer id=111] Waiting for
> controller quorum voters future (kafka.server.ControllerServer)
> [2025-05-20 15:04:30,191] INFO [ControllerServer id=111] Finished waiting for
> controller quorum voters future (kafka.server.ControllerServer)
> [2025-05-20 15:04:30,233] INFO [ZooKeeperClient KRaft Migration] Initializing
> a new session to
> prod-zk-11.ae-rus.net:2181,prod-zk-12.ae-rus.net:2181,prod-zk-13.ae-rus.net:2181,prod-zk-14.ae-rus.net:2181,prod-zk-15.ae-rus.net:2181.
> (kafka.zookeeper.ZooKeeperClient)
> [2025-05-20 15:04:30,248] INFO [ZooKeeperClient KRaft Migration] Waiting
> until connected. (kafka.zookeeper.ZooKeeperClient)
> [2025-05-20 15:04:30,261] INFO [ZooKeeperClient KRaft Migration] Connected.
> (kafka.zookeeper.ZooKeeperClient)
> [2025-05-20 15:04:30,293] INFO [controller-111-ThrottledChannelReaper-Fetch]:
> Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
> [2025-05-20 15:04:30,293] INFO
> [controller-111-ThrottledChannelReaper-Produce]: Starting
> (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
> [2025-05-20 15:04:30,294] INFO
> [controller-111-ThrottledChannelReaper-Request]: Starting
> (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
> [2025-05-20 15:04:30,297] INFO
> [controller-111-ThrottledChannelReaper-ControllerMutation]: Starting
> (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
> [2025-05-20 15:04:30,316] INFO [ExpirationReaper-111-AlterAcls]: Starting
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2025-05-20 15:04:30,392] INFO [ControllerServer id=111] Waiting for the
> controller metadata publishers to be installed (kafka.server.ControllerServer)
> [2025-05-20 15:04:30,393] INFO [ControllerServer id=111] Finished waiting for
> the controller metadata publishers to be installed
> (kafka.server.ControllerServer)
> [2025-05-20 15:04:30,393] INFO [SocketServer listenerType=CONTROLLER,
> nodeId=111] Enabling request processing. (kafka.network.SocketServer)
> [2025-05-20 15:04:30,394] INFO [ControllerRegistrationManager id=111
> incarnation=mbreItYzTmySXeahJkXxFA] Found registration for
> UwcgQ_r6T_uBVDMCdWLG6A instead of our incarnation.
> (kafka.server.ControllerRegistrationManager)
> [2025-05-20 15:04:30,395] INFO Awaiting socket connections on 0.0.0.0:9091.
> (kafka.network.DataPlaneAcceptor)
> [2025-05-20 15:04:30,455] INFO
> [controller-111-to-controller-registration-channel-manager]: Starting
> (kafka.server.NodeToControllerRequestThread)
> [2025-05-20 15:04:30,455] INFO [ControllerRegistrationManager id=111
> incarnation=mbreItYzTmySXeahJkXxFA] initialized channel manager.
> (kafka.server.ControllerRegistrationManager)
> [2025-05-20 15:04:30,455] INFO [ControllerServer id=111] Waiting for all of
> the authorizer futures to be completed (kafka.server.ControllerServer)
> [2025-05-20 15:04:30,456] INFO [ControllerServer id=111] Finished waiting for
> all of the authorizer futures to be completed (kafka.server.ControllerServer)
> [2025-05-20 15:04:30,456] INFO [ControllerServer id=111] Waiting for all of
> the SocketServer Acceptors to be started (kafka.server.ControllerServer)
> [2025-05-20 15:04:30,456] INFO [ControllerServer id=111] Finished waiting for
> all of the SocketServer Acceptors to be started
> (kafka.server.ControllerServer)
> [2025-05-20 15:04:30,457] INFO [ControllerRegistrationManager id=111
> incarnation=mbreItYzTmySXeahJkXxFA] sendControllerRegistration: attempting to
> send ControllerRegistrationRequestData(controllerId=111,
> incarnationId=mbreItYzTmySXeahJkXxFA, zkMigrationReady=true,
> listeners=[Listener(name='CONTROLLER',
> host='prodkafkacontr11z1.h.ae-rus.net', port=9091, securityProtocol=0)],
> features=[Feature(name='kraft.version', minSupportedVersion=0,
> maxSupportedVersion=1), Feature(name='metadata.version',
> minSupportedVersion=1, maxSupportedVersion=21)])
> (kafka.server.ControllerRegistrationManager)
> [2025-05-20 15:04:30,471] INFO
> [controller-111-to-controller-registration-channel-manager]: Recorded new
> KRaft controller, from now on will use node
> prodkafkacontr11z1.h.ae-rus.net:9091 (id: 111 rack: null)
> (kafka.server.NodeToControllerRequestThread)
> [2025-05-20 15:04:30,505] INFO [KafkaRaftServer nodeId=111] Kafka Server
> started (kafka.server.KafkaRaftServer)
> [2025-05-20 15:04:30,579] INFO [ControllerRegistrationManager id=111
> incarnation=mbreItYzTmySXeahJkXxFA] Our registration has been persisted to
> the metadata log. (kafka.server.ControllerRegistrationManager)
> [2025-05-20 15:04:30,581] INFO [ControllerRegistrationManager id=111
> incarnation=mbreItYzTmySXeahJkXxFA] RegistrationResponseHandler: controller
> acknowledged ControllerRegistrationRequest.
> (kafka.server.ControllerRegistrationManager){code}
> {code:java}
> cat /var/log/kafka/controller.log
> [2025-05-20 15:04:30,357] TRACE [KRaftMigrationDriver id=111] Recovered
> migration state from ZK in 68765354 ns. Transitioned migration state from
> ZkMigrationLeadershipState{kraftControllerId=-1, kraftControllerEpoch=-1,
> kraftMetadataOffset=-1, kraftMetadataEpoch=-1, lastUpdatedTimeMs=-1,
> migrationZkVersion=-2, controllerZkEpoch=-1, controllerZkVersion=-2} to
> ZkMigrationLeadershipState{kraftControllerId=-1, kraftControllerEpoch=-1,
> kraftMetadataOffset=-1, kraftMetadataEpoch=-1,
> lastUpdatedTimeMs=1747648116174, migrationZkVersion=0, controllerZkEpoch=-1,
> controllerZkVersion=-2}
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,358] INFO [KRaftMigrationDriver id=111] Initial
> migration of ZK metadata is not done.
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,358] INFO [KRaftMigrationDriver id=111] 111
> transitioning from UNINITIALIZED to INACTIVE state
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,388] INFO [KRaftMigrationDriver id=111] Became active
> migration driver in 27621312 ns. Transitioned migration state from
> ZkMigrationLeadershipState{kraftControllerId=-1, kraftControllerEpoch=-1,
> kraftMetadataOffset=-1, kraftMetadataEpoch=-1,
> lastUpdatedTimeMs=1747648116174, migrationZkVersion=0, controllerZkEpoch=-1,
> controllerZkVersion=-2} to ZkMigrationLeadershipState{kraftControllerId=111,
> kraftControllerEpoch=2, kraftMetadataOffset=-1, kraftMetadataEpoch=-1,
> lastUpdatedTimeMs=1747648116174, migrationZkVersion=0, controllerZkEpoch=-1,
> controllerZkVersion=-2}
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,388] INFO [KRaftMigrationDriver id=111] 111
> transitioning from INACTIVE to WAIT_FOR_CONTROLLER_QUORUM state
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,388] INFO [KRaftMigrationDriver id=111] Controller
> Quorum is ready for Zk to KRaft migration. Now waiting for ZK brokers.
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,388] INFO [KRaftMigrationDriver id=111] 111
> transitioning from WAIT_FOR_CONTROLLER_QUORUM to WAIT_FOR_BROKERS state
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,389] INFO [KRaftMigrationDriver id=111] No brokers are
> known to KRaft, waiting for brokers to register.
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,494] INFO [KRaftMigrationDriver id=111] No brokers are
> known to KRaft, waiting for brokers to register.
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,579] TRACE [KRaftMigrationDriver id=111] Received
> metadata delta, but the controller is not in dual-write mode. Ignoring this
> metadata update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,580] INFO [KRaftMigrationDriver id=111] No brokers are
> known to KRaft, waiting for brokers to register.
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,645] TRACE [KRaftMigrationDriver id=111] Received
> metadata delta, but the controller is not in dual-write mode. Ignoring this
> metadata update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,645] INFO [KRaftMigrationDriver id=111] No brokers are
> known to KRaft, waiting for brokers to register.
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,701] TRACE [KRaftMigrationDriver id=111] Received
> metadata delta, but the controller is not in dual-write mode. Ignoring this
> metadata update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,701] INFO [KRaftMigrationDriver id=111] No brokers are
> known to KRaft, waiting for brokers to register.
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,730] TRACE [KRaftMigrationDriver id=111] Received
> metadata delta, but the controller is not in dual-write mode. Ignoring this
> metadata update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,730] INFO [KRaftMigrationDriver id=111] No brokers are
> known to KRaft, waiting for brokers to register.
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,778] TRACE [KRaftMigrationDriver id=111] Received
> metadata delta, but the controller is not in dual-write mode. Ignoring this
> metadata update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,861] TRACE [KRaftMigrationDriver id=111] Received
> metadata delta, but the controller is not in dual-write mode. Ignoring this
> metadata update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,861] INFO [KRaftMigrationDriver id=111] No brokers are
> known to KRaft, waiting for brokers to register.
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,905] TRACE [KRaftMigrationDriver id=111] Received
> metadata delta, but the controller is not in dual-write mode. Ignoring this
> metadata update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,905] INFO [KRaftMigrationDriver id=111] No brokers are
> known to KRaft, waiting for brokers to register.
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:31,202] TRACE [KRaftMigrationDriver id=111] Received
> metadata delta, but the controller is not in dual-write mode. Ignoring this
> metadata update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:31,202] INFO [KRaftMigrationDriver id=111] No brokers are
> known to KRaft, waiting for brokers to register.
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:31,278] TRACE [KRaftMigrationDriver id=111] Received
> metadata delta, but the controller is not in dual-write mode. Ignoring this
> metadata update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:31,778] TRACE [KRaftMigrationDriver id=111] Received
> metadata delta, but the controller is not in dual-write mode. Ignoring this
> metadata update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> {code}
> The config on the controller node is the following:
> {code:java}
> cat /etc/kafka/server.properties
> ##
> # Ansible managed
> #host.name=prodkafkacontr14z4.h.ae-rus.net
> cluster.name=MrgOne
> node.id=114
> controller.rack=z4
> replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector
> process.roles=controller
> delete.topic.enable=True
> auto.create.topics.enable=True
> listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
> listeners=CONTROLLER://:9091
> advertised.listeners=CONTROLLER://prodkafkacontr14z4.h.ae-rus.net:9091
> controller.listener.names=CONTROLLER
> inter.broker.listener.name=EXTERNAL
> controller.quorum.voters=1...@prodkafkacontr11z1.h.ae-rus.net:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091
> controller.quorum.bootstrap.servers=prodkafkacontr11z1.h.ae-rus.net:9091,prodkafkacontr12z1.h.ae-rus.net:9091,prodkafkacontr13z1.h.ae-rus.net:9091,prodkafkacontr17z3.h.ae-rus.net:9091,prodkafkacontr18z3.h.ae-rus.net:9091,prodkafkacontr19z3.h.ae-rus.net:9091,prodkafkacontr14z4.h.ae-rus.net:9091,prodkafkacontr15z4.h.ae-rus.net:9091,prodkafkacontr16z4.h.ae-rus.net:9091
> num.network.threads=128
> background.threads=128
> num.io.threads=128
> socket.send.buffer.bytes=2097152
> socket.receive.buffer.bytes=2097152
> socket.request.max.bytes=104857600
> queued.max.requests=32768
> offsets.topic.replication.factor=3
> transaction.state.log.replication.factor=3
> log.dirs=/data/kafka1/kafka
> num.recovery.threads.per.data.dir=16
> log.flush.interval.messages=10240
> log.flush.interval.ms=5000
> log.retention.hours=48
> log.retention.bytes=1073741824
> log.segment.bytes=1073741824
> log.retention.check.interval.ms=300000
> group.initial.rebalance.delay.ms=1000
> auto.leader.rebalance.enable=True
> leader.imbalance.check.interval.seconds=300
> leader.imbalance.per.broker.percentage=20
> default.replication.factor=3
> num.replica.fetchers=64
> replica.fetch.max.bytes=41943040
> replica.fetch.wait.max.ms=1000
> replica.high.watermark.checkpoint.interval.ms=5000
> replica.socket.timeout.ms=30000
> replica.socket.receive.buffer.bytes=65536
> replica.lag.time.max.ms=10000
> ############################# Extra Properties #############################
> offsets.topic.num.partitions=150
> message.max.bytes=20971520
> batch.size=65536
> zookeeper.connect=prod-zk-11.ae-rus.net:2181,prod-zk-12.ae-rus.net:2181,prod-zk-13.ae-rus.net:2181,prod-zk-14.ae-rus.net:2181,prod-zk-15.ae-rus.net:2181
> rack.aware.assignment.strategy=balance_subtopology
> rack.aware.assignment.traffic_cost=org.apache.kafka.streams.processor.internals.assignment.RackAwareTaskAssignor
> compression.gzip.level=5
> zookeeper.metadata.migration.enable=True
> zookeeper.metadata.migration.min.batch.size=512
> initial.broker.registration.timeout.ms=18000
> {code}
> The config on the broker node is the following:
> {code:java}
> $ cat /etc/kafka/server.properties
> ##
> # Ansible managed
> #host.name=prodkafka2z1.h.ae-rus.net
> cluster.name=MrgOne
> node.id=2
> broker.rack=z1
> replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector
> process.roles=broker
> delete.topic.enable=True
> auto.create.topics.enable=True
> listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
> listeners=EXTERNAL://:9092,INTERNAL://:29092
> advertised.listeners=EXTERNAL://prod-kafka-12.ae-rus.net:9092,INTERNAL://prod-kafka-12.ae-rus.net:29092
> controller.listener.names=CONTROLLER
> inter.broker.listener.name=EXTERNAL
> controller.quorum.voters=1...@prodkafkacontr11z1.h.ae-rus.net:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091
> controller.quorum.bootstrap.servers=prodkafkacontr11z1.h.ae-rus.net:9091,prodkafkacontr12z1.h.ae-rus.net:9091,prodkafkacontr13z1.h.ae-rus.net:9091,prodkafkacontr17z3.h.ae-rus.net:9091,prodkafkacontr18z3.h.ae-rus.net:9091,prodkafkacontr19z3.h.ae-rus.net:9091,prodkafkacontr14z4.h.ae-rus.net:9091,prodkafkacontr15z4.h.ae-rus.net:9091,prodkafkacontr16z4.h.ae-rus.net:9091
> num.network.threads=128
> background.threads=128
> num.io.threads=128
> socket.send.buffer.bytes=2097152
> socket.receive.buffer.bytes=2097152
> socket.request.max.bytes=104857600
> queued.max.requests=32768
> offsets.topic.replication.factor=3
> transaction.state.log.replication.factor=3
> log.dirs=/data/kafka1/kafka,/data/kafka2/kafka,/data/kafka3/kafka,/data/kafka4/kafka,/data/kafka5/kafka,/data/kafka6/kafka
> num.recovery.threads.per.data.dir=16
> log.flush.interval.messages=10240
> log.flush.interval.ms=5000
> log.retention.hours=48
> log.retention.bytes=1073741824
> log.segment.bytes=1073741824
> log.retention.check.interval.ms=300000
> group.initial.rebalance.delay.ms=1000
> auto.leader.rebalance.enable=True
> leader.imbalance.check.interval.seconds=300
> leader.imbalance.per.broker.percentage=20
> default.replication.factor=3
> num.replica.fetchers=64
> replica.fetch.max.bytes=41943040
> replica.fetch.wait.max.ms=1000
> replica.high.watermark.checkpoint.interval.ms=5000
> replica.socket.timeout.ms=30000
> replica.socket.receive.buffer.bytes=65536
> replica.lag.time.max.ms=10000
> ############################# Extra Properties #############################
> offsets.topic.num.partitions=150
> message.max.bytes=20971520
> batch.size=65536
> zookeeper.connect=prod-zk-11.ae-rus.net:2181,prod-zk-12.ae-rus.net:2181,prod-zk-13.ae-rus.net:2181,prod-zk-14.ae-rus.net:2181,prod-zk-15.ae-rus.net:2181
> rack.aware.assignment.strategy=balance_subtopology
> rack.aware.assignment.traffic_cost=org.apache.kafka.streams.processor.internals.assignment.RackAwareTaskAssignor
> compression.gzip.level=5
> zookeeper.metadata.migration.enable=True
> zookeeper.metadata.migration.min.batch.size=512
> initial.broker.registration.timeout.ms=18000
> {code}
> When trying to move to the next step (`Migrating brokers to KRaft`), the
> broker fails to register with the controller quorum and crashes:
> {code:java}
> ==> /var/log/kafka/kafka-server.log <==
> [2025-05-20 15:24:07,521] INFO Registered kafka:type=kafka.Log4jController
> MBean (kafka.utils.Log4jControllerRegistration$)
> [2025-05-20 15:24:07,826] INFO [BrokerServer id=2] Transition from SHUTDOWN
> to STARTING (kafka.server.BrokerServer)
> [2025-05-20 15:24:07,827] INFO [SharedServer id=2] Starting SharedServer
> (kafka.server.SharedServer)
> [2025-05-20 15:24:07,884] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Recovering unflushed segment 21124. 0/1 recovered for
> __cluster_metadata-0. (kafka.log.LogLoader)
> [2025-05-20 15:24:07,885] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Loading producer state till offset 21124 with message
> format version 2 (kafka.log.UnifiedLog$)
> [2025-05-20 15:24:07,886] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Reloading from producer snapshot and rebuilding
> producer state from offset 21124 (kafka.log.UnifiedLog$)
> [2025-05-20 15:24:07,893] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Producer state recovery took 0ms for snapshot load
> and 7ms for segment recovery from offset 21124 (kafka.log.UnifiedLog$)
> [2025-05-20 15:24:07,958] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Loading producer state till offset 31424 with message
> format version 2 (kafka.log.UnifiedLog$)
> [2025-05-20 15:24:07,958] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Reloading from producer snapshot and rebuilding
> producer state from offset 31424 (kafka.log.UnifiedLog$)
> [2025-05-20 15:24:07,960] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Producer state recovery took 2ms for snapshot load
> and 0ms for segment recovery from offset 31424 (kafka.log.UnifiedLog$)
> [2025-05-20 15:24:07,995] INFO Initialized snapshots with IDs
> SortedSet(OffsetAndEpoch(offset=21124, epoch=7)) from
> /data/kafka1/kafka/__cluster_metadata-0 (kafka.raft.KafkaMetadataLog$)
> [2025-05-20 15:24:08,023] INFO [raft-expiration-reaper]: Starting
> (kafka.raft.TimingWheelExpirationService$ExpiredOperationReaper)
> [2025-05-20 15:24:08,301] INFO [BrokerServer id=2] Starting broker
> (kafka.server.BrokerServer)
> [2025-05-20 15:24:08,318] INFO [broker-2-ThrottledChannelReaper-Fetch]:
> Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
> [2025-05-20 15:24:08,319] INFO [broker-2-ThrottledChannelReaper-Produce]:
> Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
> [2025-05-20 15:24:08,320] INFO [broker-2-ThrottledChannelReaper-Request]:
> Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
> [2025-05-20 15:24:08,322] INFO
> [broker-2-ThrottledChannelReaper-ControllerMutation]: Starting
> (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
> [2025-05-20 15:24:08,365] INFO [BrokerServer id=2] Waiting for controller
> quorum voters future (kafka.server.BrokerServer)
> [2025-05-20 15:24:08,365] INFO [BrokerServer id=2] Finished waiting for
> controller quorum voters future (kafka.server.BrokerServer)
> [2025-05-20 15:24:08,371] INFO
> [broker-2-to-controller-forwarding-channel-manager]: Starting
> (kafka.server.NodeToControllerRequestThread)
> [2025-05-20 15:24:08,592] INFO Updated connection-accept-rate max connection
> creation rate to 2147483647 (kafka.network.ConnectionQuotas)
> [2025-05-20 15:24:08,687] INFO [SocketServer listenerType=BROKER, nodeId=2]
> Created data-plane acceptor and processors for endpoint :
> ListenerName(EXTERNAL) (kafka.network.SocketServer)
> [2025-05-20 15:24:08,688] INFO Updated connection-accept-rate max connection
> creation rate to 2147483647 (kafka.network.ConnectionQuotas)
> [2025-05-20 15:24:08,759] INFO [SocketServer listenerType=BROKER, nodeId=2]
> Created data-plane acceptor and processors for endpoint :
> ListenerName(INTERNAL) (kafka.network.SocketServer)
> [2025-05-20 15:24:08,765] INFO
> [broker-2-to-controller-alter-partition-channel-manager]: Starting
> (kafka.server.NodeToControllerRequestThread)
> [2025-05-20 15:24:08,771] INFO
> [broker-2-to-controller-directory-assignments-channel-manager]: Starting
> (kafka.server.NodeToControllerRequestThread)
> [2025-05-20 15:24:08,788] INFO [ExpirationReaper-2-Produce]: Starting
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2025-05-20 15:24:08,788] INFO [ExpirationReaper-2-Fetch]: Starting
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2025-05-20 15:24:08,789] INFO [ExpirationReaper-2-DeleteRecords]: Starting
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2025-05-20 15:24:08,790] INFO [ExpirationReaper-2-ElectLeader]: Starting
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2025-05-20 15:24:08,790] INFO [ExpirationReaper-2-RemoteFetch]: Starting
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2025-05-20 15:24:08,808] INFO [ExpirationReaper-2-Heartbeat]: Starting
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2025-05-20 15:24:08,808] INFO [ExpirationReaper-2-Rebalance]: Starting
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2025-05-20 15:24:08,863] INFO
> [broker-2-to-controller-heartbeat-channel-manager]: Starting
> (kafka.server.NodeToControllerRequestThread)
> [2025-05-20 15:24:08,865] INFO [BrokerLifecycleManager id=2] Incarnation
> ZvarJwVqQZOUzzobd91HlQ of broker 2 in cluster 9ca1s2tbTGm1Bi35aynf6Q is now
> STARTING. (kafka.server.BrokerLifecycleManager)
> [2025-05-20 15:24:08,898] INFO [ExpirationReaper-2-AlterAcls]: Starting
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2025-05-20 15:24:08,978] INFO [BrokerServer id=2] Waiting for the broker
> metadata publishers to be installed (kafka.server.BrokerServer)
> [2025-05-20 15:24:08,979] INFO [BrokerServer id=2] Finished waiting for the
> broker metadata publishers to be installed (kafka.server.BrokerServer)
> [2025-05-20 15:24:08,979] INFO [BrokerServer id=2] Waiting for the controller
> to acknowledge that we are caught up (kafka.server.BrokerServer)
> {code}
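> The repeated "RPC got timed out before it could be sent" messages below suggest
> the broker never establishes a connection to any controller. As a diagnostic
> sketch (not part of the original report; assumes the stock tooling shipped with
> Kafka 3.9), quorum reachability can be checked from the failing broker host:
> {code:java}
> # Query the controller quorum status directly from a controller endpoint
> bin/kafka-metadata-quorum.sh \
>   --bootstrap-controller prodkafkacontr11z1.h.ae-rus.net:9091 \
>   describe --status
> 
> # Basic TCP reachability to the controller listener port
> nc -vz prodkafkacontr11z1.h.ae-rus.net 9091
> {code}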
{code:java}
==> /var/log/kafka/kafka-server.log <==
[2025-05-20 15:24:13,380] INFO [BrokerLifecycleManager id=2] Unable to register the broker because the RPC got timed out before it could be sent. (kafka.server.BrokerLifecycleManager)
[2025-05-20 15:24:17,990] INFO [BrokerLifecycleManager id=2] Unable to register the broker because the RPC got timed out before it could be sent. (kafka.server.BrokerLifecycleManager)
[2025-05-20 15:24:22,701] INFO [BrokerLifecycleManager id=2] Unable to register the broker because the RPC got timed out before it could be sent. (kafka.server.BrokerLifecycleManager)
[2025-05-20 15:24:26,864] ERROR [BrokerLifecycleManager id=2] Shutting down because we were unable to register with the controller quorum. (kafka.server.BrokerLifecycleManager)
[2025-05-20 15:24:26,864] INFO [BrokerLifecycleManager id=2] Transitioning from STARTING to SHUTTING_DOWN. (kafka.server.BrokerLifecycleManager)
[2025-05-20 15:24:26,865] INFO [broker-2-to-controller-heartbeat-channel-manager]: Shutting down (kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:24:26,865] INFO [broker-2-to-controller-heartbeat-channel-manager]: Stopped (kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:24:26,865] INFO [broker-2-to-controller-heartbeat-channel-manager]: Shutdown completed (kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:24:26,865] ERROR [BrokerServer id=2] Received a fatal error while waiting for the controller to acknowledge that we are caught up (kafka.server.BrokerServer)
java.util.concurrent.CancellationException
    at java.base/java.util.concurrent.CompletableFuture.cancel(CompletableFuture.java:2510)
    at kafka.server.BrokerLifecycleManager$ShutdownEvent.run(BrokerLifecycleManager.scala:638)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:191)
    at java.base/java.lang.Thread.run(Thread.java:1583)
[2025-05-20 15:24:26,867] INFO Node to controller channel manager for heartbeat shutdown (kafka.server.NodeToControllerChannelManagerImpl)
[2025-05-20 15:24:26,868] INFO [BrokerServer id=2] Transition from STARTING to STARTED (kafka.server.BrokerServer)
[2025-05-20 15:24:26,870] ERROR [BrokerServer id=2] Fatal error during broker startup. Prepare to shutdown (kafka.server.BrokerServer)
java.lang.RuntimeException: Received a fatal error while waiting for the controller to acknowledge that we are caught up
    at org.apache.kafka.server.util.FutureUtils.waitWithLogging(FutureUtils.java:73)
    at kafka.server.BrokerServer.startup(BrokerServer.scala:502)
    at kafka.server.KafkaRaftServer.$anonfun$startup$2(KafkaRaftServer.scala:99)
    at kafka.server.KafkaRaftServer.$anonfun$startup$2$adapted(KafkaRaftServer.scala:99)
    at scala.Option.foreach(Option.scala:437)
    at kafka.server.KafkaRaftServer.startup(KafkaRaftServer.scala:99)
    at kafka.Kafka$.main(Kafka.scala:112)
    at kafka.Kafka.main(Kafka.scala)
Caused by: java.util.concurrent.CancellationException
    at java.base/java.util.concurrent.CompletableFuture.cancel(CompletableFuture.java:2510)
    at kafka.server.BrokerLifecycleManager$ShutdownEvent.run(BrokerLifecycleManager.scala:638)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:191)
    at java.base/java.lang.Thread.run(Thread.java:1583)
[2025-05-20 15:24:26,870] INFO [BrokerServer id=2] Transition from STARTED to SHUTTING_DOWN (kafka.server.BrokerServer)
[2025-05-20 15:24:26,871] INFO [BrokerServer id=2] shutting down (kafka.server.BrokerServer)
[2025-05-20 15:24:26,871] INFO [SocketServer listenerType=BROKER, nodeId=2] Stopping socket server request processors (kafka.network.SocketServer)
[2025-05-20 15:24:26,874] ERROR Cannot invoke "java.nio.channels.ServerSocketChannel.close()" because the return value of "kafka.network.Acceptor.serverChannel()" is null (kafka.network.DataPlaneAcceptor)
java.lang.NullPointerException: Cannot invoke "java.nio.channels.ServerSocketChannel.close()" because the return value of "kafka.network.Acceptor.serverChannel()" is null
    at kafka.network.Acceptor.$anonfun$closeAll$2(SocketServer.scala:711)
    at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:68)
    at kafka.network.Acceptor.closeAll(SocketServer.scala:711)
    at kafka.network.Acceptor.close(SocketServer.scala:678)
    at kafka.network.SocketServer.$anonfun$stopProcessingRequests$4(SocketServer.scala:287)
    at kafka.network.SocketServer.$anonfun$stopProcessingRequests$4$adapted(SocketServer.scala:287)
    at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619)
    at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:935)
    at kafka.network.SocketServer.stopProcessingRequests(SocketServer.scala:287)
    at kafka.server.BrokerServer.$anonfun$shutdown$5(BrokerServer.scala:638)
    at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:68)
    at kafka.server.BrokerServer.shutdown(BrokerServer.scala:638)
    at kafka.server.KafkaBroker.shutdown(KafkaBroker.scala:98)
    at kafka.server.KafkaBroker.shutdown$(KafkaBroker.scala:98)
    at kafka.server.BrokerServer.shutdown(BrokerServer.scala:67)
    at kafka.server.BrokerServer.startup(BrokerServer.scala:555)
    at kafka.server.KafkaRaftServer.$anonfun$startup$2(KafkaRaftServer.scala:99)
    at kafka.server.KafkaRaftServer.$anonfun$startup$2$adapted(KafkaRaftServer.scala:99)
    at scala.Option.foreach(Option.scala:437)
    at kafka.server.KafkaRaftServer.startup(KafkaRaftServer.scala:99)
    at kafka.Kafka$.main(Kafka.scala:112)
    at kafka.Kafka.main(Kafka.scala)
[2025-05-20 15:24:26,906] ERROR Cannot invoke "java.nio.channels.ServerSocketChannel.close()" because the return value of "kafka.network.Acceptor.serverChannel()" is null (kafka.network.DataPlaneAcceptor)
java.lang.NullPointerException: Cannot invoke "java.nio.channels.ServerSocketChannel.close()" because the return value of "kafka.network.Acceptor.serverChannel()" is null
    at kafka.network.Acceptor.$anonfun$closeAll$2(SocketServer.scala:711)
    at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:68)
    at kafka.network.Acceptor.closeAll(SocketServer.scala:711)
    at kafka.network.Acceptor.close(SocketServer.scala:678)
    at kafka.network.SocketServer.$anonfun$stopProcessingRequests$4(SocketServer.scala:287)
    at kafka.network.SocketServer.$anonfun$stopProcessingRequests$4$adapted(SocketServer.scala:287)
    at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619)
    at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:935)
    at kafka.network.SocketServer.stopProcessingRequests(SocketServer.scala:287)
    at kafka.server.BrokerServer.$anonfun$shutdown$5(BrokerServer.scala:638)
    at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:68)
    at kafka.server.BrokerServer.shutdown(BrokerServer.scala:638)
    at kafka.server.KafkaBroker.shutdown(KafkaBroker.scala:98)
    at kafka.server.KafkaBroker.shutdown$(KafkaBroker.scala:98)
    at kafka.server.BrokerServer.shutdown(BrokerServer.scala:67)
    at kafka.server.BrokerServer.startup(BrokerServer.scala:555)
    at kafka.server.KafkaRaftServer.$anonfun$startup$2(KafkaRaftServer.scala:99)
    at kafka.server.KafkaRaftServer.$anonfun$startup$2$adapted(KafkaRaftServer.scala:99)
    at scala.Option.foreach(Option.scala:437)
    at kafka.server.KafkaRaftServer.startup(KafkaRaftServer.scala:99)
    at kafka.Kafka$.main(Kafka.scala:112)
    at kafka.Kafka.main(Kafka.scala)
[2025-05-20 15:24:26,932] INFO [SocketServer listenerType=BROKER, nodeId=2] Stopped socket server request processors (kafka.network.SocketServer)
{code}
The controller quorum itself is healthy and reachable from the brokers:
{code:java}
root@prodkafka2z1:/data# /opt/kafka/bin/kafka-metadata-quorum.sh --bootstrap-controller prodkafkacontr11z1.h.ae-rus.net:9091 describe --replication
NodeId  DirectoryId             LogEndOffset  Lag  LastFetchTimestamp  LastCaughtUpTimestamp  Status
111     L_5_4pNVTlyBl6_go_TkMA  5457          0    1747754839735       1747754839735          Leader
117     uXDetGxGYrzMssuEw-TsKQ  5457          0    1747754839705       1747754839705          Observer
118     EAqK0bysSLhV57oMWfeRbw  5457          0    1747754839705       1747754839705          Observer
112     utKjIX3i9UJU-dZf5e5rfQ  5457          0    1747754839700       1747754839700          Observer
113     gqu36Z80ogEmZ0cflxvAAw  5457          0    1747754839699       1747754839699          Observer
119     UcFbHUd5RGqXoqmsNXYYsA  5457          0    1747754839705       1747754839705          Observer
115     jH9oLOSeGoth0R7w6CPT3A  5457          0    1747754839701       1747754839701          Observer
114     nMGKPes0EA8gB4YS1E7KJw  5457          0    1747754839701       1747754839701          Observer
116     eoiyLM_huf88Ny5ZexL_7w  5457          0    1747754839701       1747754839701          Observer
root@prodkafka2z1:/data# /opt/kafka/bin/kafka-metadata-quorum.sh --bootstrap-controller prodkafkacontr11z1.h.ae-rus.net:9091 describe --status
ClusterId: 9ca1s2tbTGm1Bi35aynf6Q
LeaderId: 111
LeaderEpoch: 2
HighWatermark: 5467
MaxFollowerLag: 0
MaxFollowerLagTimeMs: 0
CurrentVoters: [{"id": 111, "directoryId": "L_5_4pNVTlyBl6_go_TkMA", "endpoints": ["CONTROLLER://prodkafkacontr11z1.h.ae-rus.net:9091"]}]
CurrentObservers: [{"id": 117, "directoryId": "uXDetGxGYrzMssuEw-TsKQ"}, {"id": 118, "directoryId": "EAqK0bysSLhV57oMWfeRbw"}, {"id": 112, "directoryId": "utKjIX3i9UJU-dZf5e5rfQ"}, {"id": 113, "directoryId": "gqu36Z80ogEmZ0cflxvAAw"}, {"id": 119, "directoryId": "UcFbHUd5RGqXoqmsNXYYsA"}, {"id": 115, "directoryId": "jH9oLOSeGoth0R7w6CPT3A"}, {"id": 114, "directoryId": "nMGKPes0EA8gB4YS1E7KJw"}, {"id": 116, "directoryId": "eoiyLM_huf88Ny5ZexL_7w"}]
{code}
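One detail worth highlighting in the {{describe --status}} output: CurrentVoters contains only controller 111, while the other eight controllers (112-119) appear only as observers. A minimal Python sketch, reusing the JSON fragments printed above, that makes this visible (the expected-voter range 111-119 is an assumption based on the controller ids listed in this ticket):

```python
import json

# CurrentVoters as printed by `kafka-metadata-quorum.sh describe --status` above.
current_voters = json.loads(
    '[{"id": 111, "directoryId": "L_5_4pNVTlyBl6_go_TkMA", '
    '"endpoints": ["CONTROLLER://prodkafkacontr11z1.h.ae-rus.net:9091"]}]'
)
# Node ids from the CurrentObservers list above.
current_observer_ids = [117, 118, 112, 113, 119, 115, 114, 116]

voter_ids = [v["id"] for v in current_voters]

# Assumption: after a completed migration one would expect all nine
# controllers (ids 111-119) in the voter set, not just the leader.
expected_controllers = set(range(111, 120))
missing_voters = expected_controllers - set(voter_ids)

print(f"voters: {voter_ids}")
print(f"observers: {sorted(current_observer_ids)}")
print(f"controllers not in the voter set: {sorted(missing_voters)}")
```

This would suggest the registration timeouts may stem from a single-voter quorum rather than from network reachability, since all observers are fully caught up (Lag 0).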
--
This message was sent by Atlassian Jira
(v8.20.10#820010)