[
https://issues.apache.org/jira/browse/KAFKA-19318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
dennis lucero updated KAFKA-19318:
----------------------------------
Fix Version/s: (was: 3.7.1)
Affects Version/s: 3.9.0
(was: 3.7.0)
(was: 3.6.1)
(was: 3.6.2)
Description:
Despite several attempts to migrate from a ZooKeeper cluster to KRaft, the migration never completes properly.
We spawned a new, fully healthy cluster with 18 Kafka nodes connected to 5
ZooKeeper nodes; 9 additional Kafka nodes are there for the new controllers.
This was tested with Kafka 3.9.0.
The controllers start with:
{code:java}
==> /var/log/kafka/kafka-server.log <==
[2025-05-20 15:04:28,798] INFO Registered kafka:type=kafka.Log4jController
MBean (kafka.utils.Log4jControllerRegistration$)
[2025-05-20 15:04:29,207] INFO [ControllerServer id=111] Starting controller
(kafka.server.ControllerServer)
[2025-05-20 15:04:29,656] INFO Updated connection-accept-rate max connection
creation rate to 2147483647 (kafka.network.ConnectionQuotas)
[2025-05-20 15:04:29,784] INFO [SocketServer listenerType=CONTROLLER,
nodeId=111] Created data-plane acceptor and processors for endpoint :
ListenerName(CONTROLLER) (kafka.network.SocketServer)
[2025-05-20 15:04:29,792] INFO [SharedServer id=111] Starting SharedServer
(kafka.server.SharedServer)
[2025-05-20 15:04:29,841] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Recovering unflushed segment 0. 0/1 recovered for
__cluster_metadata-0. (kafka.log.LogLoader)
[2025-05-20 15:04:29,842] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Loading producer state till offset 0 with message
format version 2 (kafka.log.UnifiedLog$)
[2025-05-20 15:04:29,843] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Reloading from producer snapshot and rebuilding
producer state from offset 0 (kafka.log.UnifiedLog$)
[2025-05-20 15:04:29,843] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Producer state recovery took 0ms for snapshot load and
0ms for segment recovery from offset 0 (kafka.log.UnifiedLog$)
[2025-05-20 15:04:29,905] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Loading producer state till offset 2709 with message
format version 2 (kafka.log.UnifiedLog$)
[2025-05-20 15:04:29,905] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Reloading from producer snapshot and rebuilding
producer state from offset 2709 (kafka.log.UnifiedLog$)
[2025-05-20 15:04:29,906] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Producer state recovery took 1ms for snapshot load and
0ms for segment recovery from offset 2709 (kafka.log.UnifiedLog$)
[2025-05-20 15:04:29,940] INFO Initialized snapshots with IDs
SortedSet(OffsetAndEpoch(offset=0, epoch=0)) from
/data/kafka1/kafka/__cluster_metadata-0 (kafka.raft.KafkaMetadataLog$)
[2025-05-20 15:04:29,952] INFO [raft-expiration-reaper]: Starting
(kafka.raft.TimingWheelExpirationService$ExpiredOperationReaper)
[2025-05-20 15:04:30,191] INFO [ControllerServer id=111] Waiting for controller
quorum voters future (kafka.server.ControllerServer)
[2025-05-20 15:04:30,191] INFO [ControllerServer id=111] Finished waiting for
controller quorum voters future (kafka.server.ControllerServer)
[2025-05-20 15:04:30,233] INFO [ZooKeeperClient KRaft Migration] Initializing a
new session to
prod-zk-11.ae-rus.net:2181,prod-zk-12.ae-rus.net:2181,prod-zk-13.ae-rus.net:2181,prod-zk-14.ae-rus.net:2181,prod-zk-15.ae-rus.net:2181.
(kafka.zookeeper.ZooKeeperClient)
[2025-05-20 15:04:30,248] INFO [ZooKeeperClient KRaft Migration] Waiting until
connected. (kafka.zookeeper.ZooKeeperClient)
[2025-05-20 15:04:30,261] INFO [ZooKeeperClient KRaft Migration] Connected.
(kafka.zookeeper.ZooKeeperClient)
[2025-05-20 15:04:30,293] INFO [controller-111-ThrottledChannelReaper-Fetch]:
Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2025-05-20 15:04:30,293] INFO [controller-111-ThrottledChannelReaper-Produce]:
Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2025-05-20 15:04:30,294] INFO [controller-111-ThrottledChannelReaper-Request]:
Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2025-05-20 15:04:30,297] INFO
[controller-111-ThrottledChannelReaper-ControllerMutation]: Starting
(kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2025-05-20 15:04:30,316] INFO [ExpirationReaper-111-AlterAcls]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:04:30,392] INFO [ControllerServer id=111] Waiting for the
controller metadata publishers to be installed (kafka.server.ControllerServer)
[2025-05-20 15:04:30,393] INFO [ControllerServer id=111] Finished waiting for
the controller metadata publishers to be installed
(kafka.server.ControllerServer)
[2025-05-20 15:04:30,393] INFO [SocketServer listenerType=CONTROLLER,
nodeId=111] Enabling request processing. (kafka.network.SocketServer)
[2025-05-20 15:04:30,394] INFO [ControllerRegistrationManager id=111
incarnation=mbreItYzTmySXeahJkXxFA] Found registration for
UwcgQ_r6T_uBVDMCdWLG6A instead of our incarnation.
(kafka.server.ControllerRegistrationManager)
[2025-05-20 15:04:30,395] INFO Awaiting socket connections on 0.0.0.0:9091.
(kafka.network.DataPlaneAcceptor)
[2025-05-20 15:04:30,455] INFO
[controller-111-to-controller-registration-channel-manager]: Starting
(kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:04:30,455] INFO [ControllerRegistrationManager id=111
incarnation=mbreItYzTmySXeahJkXxFA] initialized channel manager.
(kafka.server.ControllerRegistrationManager)
[2025-05-20 15:04:30,455] INFO [ControllerServer id=111] Waiting for all of the
authorizer futures to be completed (kafka.server.ControllerServer)
[2025-05-20 15:04:30,456] INFO [ControllerServer id=111] Finished waiting for
all of the authorizer futures to be completed (kafka.server.ControllerServer)
[2025-05-20 15:04:30,456] INFO [ControllerServer id=111] Waiting for all of the
SocketServer Acceptors to be started (kafka.server.ControllerServer)
[2025-05-20 15:04:30,456] INFO [ControllerServer id=111] Finished waiting for
all of the SocketServer Acceptors to be started (kafka.server.ControllerServer)
[2025-05-20 15:04:30,457] INFO [ControllerRegistrationManager id=111
incarnation=mbreItYzTmySXeahJkXxFA] sendControllerRegistration: attempting to
send ControllerRegistrationRequestData(controllerId=111,
incarnationId=mbreItYzTmySXeahJkXxFA, zkMigrationReady=true,
listeners=[Listener(name='CONTROLLER', host='prodkafkacontr11z1.h.ae-rus.net',
port=9091, securityProtocol=0)], features=[Feature(name='kraft.version',
minSupportedVersion=0, maxSupportedVersion=1), Feature(name='metadata.version',
minSupportedVersion=1, maxSupportedVersion=21)])
(kafka.server.ControllerRegistrationManager)
[2025-05-20 15:04:30,471] INFO
[controller-111-to-controller-registration-channel-manager]: Recorded new KRaft
controller, from now on will use node prodkafkacontr11z1.h.ae-rus.net:9091 (id:
111 rack: null) (kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:04:30,505] INFO [KafkaRaftServer nodeId=111] Kafka Server
started (kafka.server.KafkaRaftServer)
[2025-05-20 15:04:30,579] INFO [ControllerRegistrationManager id=111
incarnation=mbreItYzTmySXeahJkXxFA] Our registration has been persisted to the
metadata log. (kafka.server.ControllerRegistrationManager)
[2025-05-20 15:04:30,581] INFO [ControllerRegistrationManager id=111
incarnation=mbreItYzTmySXeahJkXxFA] RegistrationResponseHandler: controller
acknowledged ControllerRegistrationRequest.
(kafka.server.ControllerRegistrationManager){code}
{code:java}
cat /var/log/kafka/controller.log
[2025-05-20 15:04:30,357] TRACE [KRaftMigrationDriver id=111] Recovered
migration state from ZK in 68765354 ns. Transitioned migration state from
ZkMigrationLeadershipState{kraftControllerId=-1, kraftControllerEpoch=-1,
kraftMetadataOffset=-1, kraftMetadataEpoch=-1, lastUpdatedTimeMs=-1,
migrationZkVersion=-2, controllerZkEpoch=-1, controllerZkVersion=-2} to
ZkMigrationLeadershipState{kraftControllerId=-1, kraftControllerEpoch=-1,
kraftMetadataOffset=-1, kraftMetadataEpoch=-1, lastUpdatedTimeMs=1747648116174,
migrationZkVersion=0, controllerZkEpoch=-1, controllerZkVersion=-2}
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,358] INFO [KRaftMigrationDriver id=111] Initial migration
of ZK metadata is not done.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,358] INFO [KRaftMigrationDriver id=111] 111 transitioning
from UNINITIALIZED to INACTIVE state
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,388] INFO [KRaftMigrationDriver id=111] Became active
migration driver in 27621312 ns. Transitioned migration state from
ZkMigrationLeadershipState{kraftControllerId=-1, kraftControllerEpoch=-1,
kraftMetadataOffset=-1, kraftMetadataEpoch=-1, lastUpdatedTimeMs=1747648116174,
migrationZkVersion=0, controllerZkEpoch=-1, controllerZkVersion=-2} to
ZkMigrationLeadershipState{kraftControllerId=111, kraftControllerEpoch=2,
kraftMetadataOffset=-1, kraftMetadataEpoch=-1, lastUpdatedTimeMs=1747648116174,
migrationZkVersion=0, controllerZkEpoch=-1, controllerZkVersion=-2}
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,388] INFO [KRaftMigrationDriver id=111] 111 transitioning
from INACTIVE to WAIT_FOR_CONTROLLER_QUORUM state
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,388] INFO [KRaftMigrationDriver id=111] Controller Quorum
is ready for Zk to KRaft migration. Now waiting for ZK brokers.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,388] INFO [KRaftMigrationDriver id=111] 111 transitioning
from WAIT_FOR_CONTROLLER_QUORUM to WAIT_FOR_BROKERS state
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,389] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,494] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,579] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,580] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,645] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,645] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,701] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,701] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,730] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,730] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,778] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,861] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,861] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,905] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:30,905] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:31,202] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:31,202] INFO [KRaftMigrationDriver id=111] No brokers are
known to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:31,278] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2025-05-20 15:04:31,778] TRACE [KRaftMigrationDriver id=111] Received metadata
delta, but the controller is not in dual-write mode. Ignoring this metadata
update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver) {code}
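For reference, the ZkMigrationLeadershipState shown above is persisted in ZooKeeper by the migration driver (KIP-866), so it can be cross-checked directly from a ZooKeeper host. The znode paths below are assumptions based on the migration implementation; adjust for your deployment:
{code:bash}
# Inspect the migration state recorded by the KRaft migration driver
# (znode path assumed from the KIP-866 implementation)
bin/zookeeper-shell.sh prod-zk-11.ae-rus.net:2181 get /migration

# Check which controller currently holds the ZK controller epoch
bin/zookeeper-shell.sh prod-zk-11.ae-rus.net:2181 get /controller
{code}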
The configuration on a controller node is the following:
{code:java}
cat /etc/kafka/server.properties
##
# Ansible managed
#host.name=prodkafkacontr14z4.h.ae-rus.net
cluster.name=MrgOne
node.id=114
controller.rack=z4
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector
process.roles=controller
delete.topic.enable=True
auto.create.topics.enable=True
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
listeners=CONTROLLER://:9091
advertised.listeners=CONTROLLER://prodkafkacontr14z4.h.ae-rus.net:9091
controller.listener.names=CONTROLLER
inter.broker.listener.name=EXTERNAL
controller.quorum.voters=1...@prodkafkacontr11z1.h.ae-rus.net:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091
controller.quorum.bootstrap.servers=prodkafkacontr11z1.h.ae-rus.net:9091,prodkafkacontr12z1.h.ae-rus.net:9091,prodkafkacontr13z1.h.ae-rus.net:9091,prodkafkacontr17z3.h.ae-rus.net:9091,prodkafkacontr18z3.h.ae-rus.net:9091,prodkafkacontr19z3.h.ae-rus.net:9091,prodkafkacontr14z4.h.ae-rus.net:9091,prodkafkacontr15z4.h.ae-rus.net:9091,prodkafkacontr16z4.h.ae-rus.net:9091
num.network.threads=128
background.threads=128
num.io.threads=128
socket.send.buffer.bytes=2097152
socket.receive.buffer.bytes=2097152
socket.request.max.bytes=104857600
queued.max.requests=32768
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
log.dirs=/data/kafka1/kafka
num.recovery.threads.per.data.dir=16
log.flush.interval.messages=10240
log.flush.interval.ms=5000
log.retention.hours=48
log.retention.bytes=1073741824
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
group.initial.rebalance.delay.ms=1000
auto.leader.rebalance.enable=True
leader.imbalance.check.interval.seconds=300
leader.imbalance.per.broker.percentage=20
default.replication.factor=3
num.replica.fetchers=64
replica.fetch.max.bytes=41943040
replica.fetch.wait.max.ms=1000
replica.high.watermark.checkpoint.interval.ms=5000
replica.socket.timeout.ms=30000
replica.socket.receive.buffer.bytes=65536
replica.lag.time.max.ms=10000
############################# Extra Properties #############################
offsets.topic.num.partitions=150
message.max.bytes=20971520
batch.size=65536
zookeeper.connect=prod-zk-11.ae-rus.net:2181,prod-zk-12.ae-rus.net:2181,prod-zk-13.ae-rus.net:2181,prod-zk-14.ae-rus.net:2181,prod-zk-15.ae-rus.net:2181
rack.aware.assignment.strategy=balance_subtopology
rack.aware.assignment.traffic_cost=org.apache.kafka.streams.processor.internals.assignment.RackAwareTaskAssignor
compression.gzip.level=5
zookeeper.metadata.migration.enable=True
zookeeper.metadata.migration.min.batch.size=512
initial.broker.registration.timeout.ms=18000 {code}
The configuration on a broker node is the following:
{code:java}
$ cat /etc/kafka/server.properties
##
# Ansible managed
#host.name=prodkafka2z1.h.ae-rus.net
cluster.name=MrgOne
node.id=2
broker.rack=z1
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector
process.roles=broker
delete.topic.enable=True
auto.create.topics.enable=True
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
listeners=EXTERNAL://:9092,INTERNAL://:29092
advertised.listeners=EXTERNAL://prod-kafka-12.ae-rus.net:9092,INTERNAL://prod-kafka-12.ae-rus.net:29092
controller.listener.names=CONTROLLER
inter.broker.listener.name=EXTERNAL
controller.quorum.voters=1...@prodkafkacontr11z1.h.ae-rus.net:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091
controller.quorum.bootstrap.servers=prodkafkacontr11z1.h.ae-rus.net:9091,prodkafkacontr12z1.h.ae-rus.net:9091,prodkafkacontr13z1.h.ae-rus.net:9091,prodkafkacontr17z3.h.ae-rus.net:9091,prodkafkacontr18z3.h.ae-rus.net:9091,prodkafkacontr19z3.h.ae-rus.net:9091,prodkafkacontr14z4.h.ae-rus.net:9091,prodkafkacontr15z4.h.ae-rus.net:9091,prodkafkacontr16z4.h.ae-rus.net:9091
num.network.threads=128
background.threads=128
num.io.threads=128
socket.send.buffer.bytes=2097152
socket.receive.buffer.bytes=2097152
socket.request.max.bytes=104857600
queued.max.requests=32768
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
log.dirs=/data/kafka1/kafka,/data/kafka2/kafka,/data/kafka3/kafka,/data/kafka4/kafka,/data/kafka5/kafka,/data/kafka6/kafka
num.recovery.threads.per.data.dir=16
log.flush.interval.messages=10240
log.flush.interval.ms=5000
log.retention.hours=48
log.retention.bytes=1073741824
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
group.initial.rebalance.delay.ms=1000
auto.leader.rebalance.enable=True
leader.imbalance.check.interval.seconds=300
leader.imbalance.per.broker.percentage=20
default.replication.factor=3
num.replica.fetchers=64
replica.fetch.max.bytes=41943040
replica.fetch.wait.max.ms=1000
replica.high.watermark.checkpoint.interval.ms=5000
replica.socket.timeout.ms=30000
replica.socket.receive.buffer.bytes=65536
replica.lag.time.max.ms=10000
############################# Extra Properties #############################
offsets.topic.num.partitions=150
message.max.bytes=20971520
batch.size=65536
zookeeper.connect=prod-zk-11.ae-rus.net:2181,prod-zk-12.ae-rus.net:2181,prod-zk-13.ae-rus.net:2181,prod-zk-14.ae-rus.net:2181,prod-zk-15.ae-rus.net:2181
rack.aware.assignment.strategy=balance_subtopology
rack.aware.assignment.traffic_cost=org.apache.kafka.streams.processor.internals.assignment.RackAwareTaskAssignor
compression.gzip.level=5
zookeeper.metadata.migration.enable=True
zookeeper.metadata.migration.min.batch.size=512
initial.broker.registration.timeout.ms=18000
{code}
When we try to move to the next step (`Migrating brokers to KRaft`), the broker
fails to register with the controller quorum and crashes.
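For reference, the quorum state can be inspected from one of the hosts with the bundled metadata quorum tool (the controller endpoint below is one of ours; `--bootstrap-controller` assumes Kafka 3.7+):
{code:bash}
# Check whether the KRaft quorum has an elected leader and its current epoch
bin/kafka-metadata-quorum.sh \
  --bootstrap-controller prodkafkacontr11z1.h.ae-rus.net:9091 \
  describe --status

# Per-replica log end offsets, to see which voters/observers are caught up
bin/kafka-metadata-quorum.sh \
  --bootstrap-controller prodkafkacontr11z1.h.ae-rus.net:9091 \
  describe --replication
{code}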
{code:java}
==> /var/log/kafka/kafka-server.log <==
[2025-05-20 15:24:07,521] INFO Registered kafka:type=kafka.Log4jController
MBean (kafka.utils.Log4jControllerRegistration$)
[2025-05-20 15:24:07,826] INFO [BrokerServer id=2] Transition from SHUTDOWN to
STARTING (kafka.server.BrokerServer)
[2025-05-20 15:24:07,827] INFO [SharedServer id=2] Starting SharedServer
(kafka.server.SharedServer)
[2025-05-20 15:24:07,884] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Recovering unflushed segment 21124. 0/1 recovered for
__cluster_metadata-0. (kafka.log.LogLoader)
[2025-05-20 15:24:07,885] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Loading producer state till offset 21124 with message
format version 2 (kafka.log.UnifiedLog$)
[2025-05-20 15:24:07,886] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Reloading from producer snapshot and rebuilding
producer state from offset 21124 (kafka.log.UnifiedLog$)
[2025-05-20 15:24:07,893] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Producer state recovery took 0ms for snapshot load and
7ms for segment recovery from offset 21124 (kafka.log.UnifiedLog$)
[2025-05-20 15:24:07,958] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Loading producer state till offset 31424 with message
format version 2 (kafka.log.UnifiedLog$)
[2025-05-20 15:24:07,958] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Reloading from producer snapshot and rebuilding
producer state from offset 31424 (kafka.log.UnifiedLog$)
[2025-05-20 15:24:07,960] INFO [LogLoader partition=__cluster_metadata-0,
dir=/data/kafka1/kafka] Producer state recovery took 2ms for snapshot load and
0ms for segment recovery from offset 31424 (kafka.log.UnifiedLog$)
[2025-05-20 15:24:07,995] INFO Initialized snapshots with IDs
SortedSet(OffsetAndEpoch(offset=21124, epoch=7)) from
/data/kafka1/kafka/__cluster_metadata-0 (kafka.raft.KafkaMetadataLog$)
[2025-05-20 15:24:08,023] INFO [raft-expiration-reaper]: Starting
(kafka.raft.TimingWheelExpirationService$ExpiredOperationReaper)
[2025-05-20 15:24:08,301] INFO [BrokerServer id=2] Starting broker
(kafka.server.BrokerServer)
[2025-05-20 15:24:08,318] INFO [broker-2-ThrottledChannelReaper-Fetch]:
Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2025-05-20 15:24:08,319] INFO [broker-2-ThrottledChannelReaper-Produce]:
Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2025-05-20 15:24:08,320] INFO [broker-2-ThrottledChannelReaper-Request]:
Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2025-05-20 15:24:08,322] INFO
[broker-2-ThrottledChannelReaper-ControllerMutation]: Starting
(kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2025-05-20 15:24:08,365] INFO [BrokerServer id=2] Waiting for controller
quorum voters future (kafka.server.BrokerServer)
[2025-05-20 15:24:08,365] INFO [BrokerServer id=2] Finished waiting for
controller quorum voters future (kafka.server.BrokerServer)
[2025-05-20 15:24:08,371] INFO
[broker-2-to-controller-forwarding-channel-manager]: Starting
(kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:24:08,592] INFO Updated connection-accept-rate max connection
creation rate to 2147483647 (kafka.network.ConnectionQuotas)
[2025-05-20 15:24:08,687] INFO [SocketServer listenerType=BROKER, nodeId=2]
Created data-plane acceptor and processors for endpoint :
ListenerName(EXTERNAL) (kafka.network.SocketServer)
[2025-05-20 15:24:08,688] INFO Updated connection-accept-rate max connection
creation rate to 2147483647 (kafka.network.ConnectionQuotas)
[2025-05-20 15:24:08,759] INFO [SocketServer listenerType=BROKER, nodeId=2]
Created data-plane acceptor and processors for endpoint :
ListenerName(INTERNAL) (kafka.network.SocketServer)
[2025-05-20 15:24:08,765] INFO
[broker-2-to-controller-alter-partition-channel-manager]: Starting
(kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:24:08,771] INFO
[broker-2-to-controller-directory-assignments-channel-manager]: Starting
(kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:24:08,788] INFO [ExpirationReaper-2-Produce]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:24:08,788] INFO [ExpirationReaper-2-Fetch]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:24:08,789] INFO [ExpirationReaper-2-DeleteRecords]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:24:08,790] INFO [ExpirationReaper-2-ElectLeader]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:24:08,790] INFO [ExpirationReaper-2-RemoteFetch]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:24:08,808] INFO [ExpirationReaper-2-Heartbeat]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:24:08,808] INFO [ExpirationReaper-2-Rebalance]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:24:08,863] INFO
[broker-2-to-controller-heartbeat-channel-manager]: Starting
(kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:24:08,865] INFO [BrokerLifecycleManager id=2] Incarnation
ZvarJwVqQZOUzzobd91HlQ of broker 2 in cluster 9ca1s2tbTGm1Bi35aynf6Q is now
STARTING. (kafka.server.BrokerLifecycleManager)
[2025-05-20 15:24:08,898] INFO [ExpirationReaper-2-AlterAcls]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2025-05-20 15:24:08,978] INFO [BrokerServer id=2] Waiting for the broker
metadata publishers to be installed (kafka.server.BrokerServer)
[2025-05-20 15:24:08,979] INFO [BrokerServer id=2] Finished waiting for the
broker metadata publishers to be installed (kafka.server.BrokerServer)
[2025-05-20 15:24:08,979] INFO [BrokerServer id=2] Waiting for the controller
to acknowledge that we are caught up (kafka.server.BrokerServer)
{code}
{code:java}
==> /var/log/kafka/kafka-server.log <==
[2025-05-20 15:24:13,380] INFO [BrokerLifecycleManager id=2] Unable to register
the broker because the RPC got timed out before it could be sent.
(kafka.server.BrokerLifecycleManager)
[2025-05-20 15:24:17,990] INFO [BrokerLifecycleManager id=2] Unable to register
the broker because the RPC got timed out before it could be sent.
(kafka.server.BrokerLifecycleManager)
[2025-05-20 15:24:22,701] INFO [BrokerLifecycleManager id=2] Unable to register
the broker because the RPC got timed out before it could be sent.
(kafka.server.BrokerLifecycleManager)
[2025-05-20 15:24:26,864] ERROR [BrokerLifecycleManager id=2] Shutting down
because we were unable to register with the controller quorum.
(kafka.server.BrokerLifecycleManager)
[2025-05-20 15:24:26,864] INFO [BrokerLifecycleManager id=2] Transitioning from
STARTING to SHUTTING_DOWN. (kafka.server.BrokerLifecycleManager)
[2025-05-20 15:24:26,865] INFO
[broker-2-to-controller-heartbeat-channel-manager]: Shutting down
(kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:24:26,865] INFO
[broker-2-to-controller-heartbeat-channel-manager]: Stopped
(kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:24:26,865] INFO
[broker-2-to-controller-heartbeat-channel-manager]: Shutdown completed
(kafka.server.NodeToControllerRequestThread)
[2025-05-20 15:24:26,865] ERROR [BrokerServer id=2] Received a fatal error
while waiting for the controller to acknowledge that we are caught up
(kafka.server.BrokerServer)
java.util.concurrent.CancellationException
at
java.base/java.util.concurrent.CompletableFuture.cancel(CompletableFuture.java:2510)
at
kafka.server.BrokerLifecycleManager$ShutdownEvent.run(BrokerLifecycleManager.scala:638)
at
org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:191)
at java.base/java.lang.Thread.run(Thread.java:1583)
[2025-05-20 15:24:26,867] INFO Node to controller channel manager for heartbeat
shutdown (kafka.server.NodeToControllerChannelManagerImpl)
[2025-05-20 15:24:26,868] INFO [BrokerServer id=2] Transition from STARTING to
STARTED (kafka.server.BrokerServer)
[2025-05-20 15:24:26,870] ERROR [BrokerServer id=2] Fatal error during broker
startup. Prepare to shutdown (kafka.server.BrokerServer)
java.lang.RuntimeException: Received a fatal error while waiting for the
controller to acknowledge that we are caught up
at
org.apache.kafka.server.util.FutureUtils.waitWithLogging(FutureUtils.java:73)
at kafka.server.BrokerServer.startup(BrokerServer.scala:502)
at kafka.server.KafkaRaftServer.$anonfun$startup$2(KafkaRaftServer.scala:99)
at
kafka.server.KafkaRaftServer.$anonfun$startup$2$adapted(KafkaRaftServer.scala:99)
at scala.Option.foreach(Option.scala:437)
at kafka.server.KafkaRaftServer.startup(KafkaRaftServer.scala:99)
at kafka.Kafka$.main(Kafka.scala:112)
at kafka.Kafka.main(Kafka.scala)
Caused by: java.util.concurrent.CancellationException
at
java.base/java.util.concurrent.CompletableFuture.cancel(CompletableFuture.java:2510)
at
kafka.server.BrokerLifecycleManager$ShutdownEvent.run(BrokerLifecycleManager.scala:638)
at
org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:191)
at java.base/java.lang.Thread.run(Thread.java:1583)
[2025-05-20 15:24:26,870] INFO [BrokerServer id=2] Transition from STARTED to
SHUTTING_DOWN (kafka.server.BrokerServer)
[2025-05-20 15:24:26,871] INFO [BrokerServer id=2] shutting down
(kafka.server.BrokerServer)
[2025-05-20 15:24:26,871] INFO [SocketServer listenerType=BROKER, nodeId=2]
Stopping socket server request processors (kafka.network.SocketServer)
[2025-05-20 15:24:26,874] ERROR Cannot invoke
"java.nio.channels.ServerSocketChannel.close()" because the return value of
"kafka.network.Acceptor.serverChannel()" is null
(kafka.network.DataPlaneAcceptor)
java.lang.NullPointerException: Cannot invoke
"java.nio.channels.ServerSocketChannel.close()" because the return value of
"kafka.network.Acceptor.serverChannel()" is null
at kafka.network.Acceptor.$anonfun$closeAll$2(SocketServer.scala:711)
at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:68)
at kafka.network.Acceptor.closeAll(SocketServer.scala:711)
at kafka.network.Acceptor.close(SocketServer.scala:678)
at
kafka.network.SocketServer.$anonfun$stopProcessingRequests$4(SocketServer.scala:287)
at
kafka.network.SocketServer.$anonfun$stopProcessingRequests$4$adapted(SocketServer.scala:287)
at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619)
at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617)
at scala.collection.AbstractIterable.foreach(Iterable.scala:935)
at kafka.network.SocketServer.stopProcessingRequests(SocketServer.scala:287)
at kafka.server.BrokerServer.$anonfun$shutdown$5(BrokerServer.scala:638)
at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:68)
at kafka.server.BrokerServer.shutdown(BrokerServer.scala:638)
at kafka.server.KafkaBroker.shutdown(KafkaBroker.scala:98)
at kafka.server.KafkaBroker.shutdown$(KafkaBroker.scala:98)
at kafka.server.BrokerServer.shutdown(BrokerServer.scala:67)
at kafka.server.BrokerServer.startup(BrokerServer.scala:555)
at kafka.server.KafkaRaftServer.$anonfun$startup$2(KafkaRaftServer.scala:99)
at
kafka.server.KafkaRaftServer.$anonfun$startup$2$adapted(KafkaRaftServer.scala:99)
at scala.Option.foreach(Option.scala:437)
at kafka.server.KafkaRaftServer.startup(KafkaRaftServer.scala:99)
at kafka.Kafka$.main(Kafka.scala:112)
at kafka.Kafka.main(Kafka.scala)
[2025-05-20 15:24:26,906] ERROR Cannot invoke
"java.nio.channels.ServerSocketChannel.close()" because the return value of
"kafka.network.Acceptor.serverChannel()" is null
(kafka.network.DataPlaneAcceptor)
java.lang.NullPointerException: Cannot invoke
"java.nio.channels.ServerSocketChannel.close()" because the return value of
"kafka.network.Acceptor.serverChannel()" is null
at kafka.network.Acceptor.$anonfun$closeAll$2(SocketServer.scala:711)
at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:68)
at kafka.network.Acceptor.closeAll(SocketServer.scala:711)
at kafka.network.Acceptor.close(SocketServer.scala:678)
at
kafka.network.SocketServer.$anonfun$stopProcessingRequests$4(SocketServer.scala:287)
at
kafka.network.SocketServer.$anonfun$stopProcessingRequests$4$adapted(SocketServer.scala:287)
at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619)
at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617)
at scala.collection.AbstractIterable.foreach(Iterable.scala:935)
at kafka.network.SocketServer.stopProcessingRequests(SocketServer.scala:287)
at kafka.server.BrokerServer.$anonfun$shutdown$5(BrokerServer.scala:638)
at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:68)
at kafka.server.BrokerServer.shutdown(BrokerServer.scala:638)
at kafka.server.KafkaBroker.shutdown(KafkaBroker.scala:98)
at kafka.server.KafkaBroker.shutdown$(KafkaBroker.scala:98)
at kafka.server.BrokerServer.shutdown(BrokerServer.scala:67)
at kafka.server.BrokerServer.startup(BrokerServer.scala:555)
at kafka.server.KafkaRaftServer.$anonfun$startup$2(KafkaRaftServer.scala:99)
at
kafka.server.KafkaRaftServer.$anonfun$startup$2$adapted(KafkaRaftServer.scala:99)
at scala.Option.foreach(Option.scala:437)
at kafka.server.KafkaRaftServer.startup(KafkaRaftServer.scala:99)
at kafka.Kafka$.main(Kafka.scala:112)
at kafka.Kafka.main(Kafka.scala)
[2025-05-20 15:24:26,932] INFO [SocketServer listenerType=BROKER, nodeId=2]
Stopped socket server request processors (kafka.network.SocketServer) {code}
was:
Despite several attempts to migrate from a ZooKeeper cluster to KRaft, the
migration never completed.
We spawned a new, fully healthy cluster with 3 Kafka nodes connected to 3
ZooKeeper nodes; 3 additional Kafka nodes host the new controllers.
It was tested with Kafka 3.6.1, 3.6.2 and 3.7.0.
It might be linked to KAFKA-15330.
The controllers start without issue. When the brokers are then configured
for the migration, the migration does not start. Once the last broker is
restarted, we get the following logs:
{code:java}
[2024-06-03 15:11:48,192] INFO [ReplicaFetcherThread-0-11]: Stopped
(kafka.server.ReplicaFetcherThread)
[2024-06-03 15:11:48,193] INFO [ReplicaFetcherThread-0-11]: Shutdown completed
(kafka.server.ReplicaFetcherThread)
{code}
Then we only get the following message every 30 s:
{code:java}
[2024-06-03 15:12:04,163] INFO [BrokerLifecycleManager id=12 isZkBroker=true]
Unable to register the broker because the RPC got timed out before it could be
sent. (kafka.server.BrokerLifecycleManager)
[2024-06-03 15:12:34,297] INFO [BrokerLifecycleManager id=12 isZkBroker=true]
Unable to register the broker because the RPC got timed out before it could be
sent. (kafka.server.BrokerLifecycleManager)
[2024-06-03 15:13:04,536] INFO [BrokerLifecycleManager id=12 isZkBroker=true]
Unable to register the broker because the RPC got timed out before it could be
sent. (kafka.server.BrokerLifecycleManager){code}
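The 30 s cadence can be checked directly against the three timestamps in the excerpt above; a minimal Python sketch (timestamps copied verbatim from the log):

```python
from datetime import datetime

# Timestamps of the repeated "Unable to register the broker" messages,
# copied from the log excerpt above.
stamps = ["15:12:04,163", "15:12:34,297", "15:13:04,536"]
times = [datetime.strptime(s, "%H:%M:%S,%f") for s in stamps]
intervals = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
print(intervals)  # each gap is roughly 30 s, matching the retry cadence
```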
The config on the controller node is the following
{code:java}
kafka0202e1 ~]$ sudo grep -v '^\s*$\|^\s*\#' /etc/kafka/server.properties |
grep -v password | sort
advertised.host.name=kafka0202e1.ahub.sb.eu.ginfra.net
broker.rack=e1
controller.listener.names=CONTROLLER
[email protected]:9093,[email protected]:9093,[email protected]:9093
default.replication.factor=3
delete.topic.enable=false
group.initial.rebalance.delay.ms=3000
inter.broker.protocol.version=3.7
listeners=CONTROLLER://kafka0202e1.ahub.sb.eu.ginfra.net:9093
listener.security.protocol.map=CONTROLLER:SSL,PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
log.dirs=/data/kafka
log.message.format.version=3.6
log.retention.check.interval.ms=300000
log.retention.hours=240
log.segment.bytes=1073741824
min.insync.replicas=2
node.id=20
num.io.threads=8
num.network.threads=3
num.partitions=1
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=3
process.roles=controller
security.inter.broker.protocol=SSL
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
socket.send.buffer.bytes=102400
ssl.cipher.suites=TLS_AES_256_GCM_SHA384
ssl.client.auth=required
ssl.enabled.protocols=TLSv1.3
ssl.endpoint.identification.algorithm=HTTPS
ssl.keystore.location=/etc/kafka/ssl/keystore.ts
ssl.keystore.type=JKS
ssl.secure.random.implementation=SHA1PRNG
ssl.truststore.location=/etc/kafka/ssl/truststore.ts
transaction.state.log.min.isr=3
transaction.state.log.replication.factor=3
unclean.leader.election.enable=false
zookeeper.connect=10.135.65.199:2181,10.133.65.199:2181,10.137.64.56:2181,
zookeeper.metadata.migration.enable=true
{code}
The config on the broker node is the following
{code}
$ sudo grep -v '^\s*$\|^\s*\#' /etc/kafka/server.properties | grep -v password
| sort
advertised.host.name=kafka0201e3.ahub.sb.eu.ginfra.net
advertised.listeners=SSL://kafka0201e3.ahub.sb.eu.ginfra.net:9092
broker.id=12
broker.rack=e3
controller.listener.names=CONTROLLER # added once all controllers were started
[email protected]:9093,[email protected]:9093,[email protected]:9093
# added once all controllers were started
default.replication.factor=3
delete.topic.enable=false
group.initial.rebalance.delay.ms=3000
inter.broker.protocol.version=3.7
listener.security.protocol.map=CONTROLLER:SSL,PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
listeners=SSL://kafka0201e3.ahub.sb.eu.ginfra.net:9092
log.dirs=/data/kafka
log.retention.check.interval.ms=300000
log.retention.hours=240
log.segment.bytes=1073741824
min.insync.replicas=2
num.io.threads=8
num.network.threads=3
num.partitions=1
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=3
security.inter.broker.protocol=SSL
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
socket.send.buffer.bytes=102400
ssl.cipher.suites=TLS_AES_256_GCM_SHA384
ssl.client.auth=required
ssl.enabled.protocols=TLSv1.3
ssl.endpoint.identification.algorithm=HTTPS
ssl.keystore.location=/etc/kafka/ssl/keystore.ts
ssl.keystore.type=JKS
ssl.secure.random.implementation=SHA1PRNG
ssl.truststore.location=/etc/kafka/ssl/truststore.ts
transaction.state.log.min.isr=3
transaction.state.log.replication.factor=3
unclean.leader.election.enable=false
zookeeper.connect=10.133.65.199:2181,10.135.65.199:2181,10.137.64.56:2181,
zookeeper.connection.timeout.ms=6000
zookeeper.metadata.migration.enable=true # added once all controllers were started
{code}
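One detail worth noting when reading the broker config above: several properties carry inline `# added once all controllers were started` comments. In Java-style `.properties` parsing, `#` starts a comment only at the beginning of a line, so an inline `#` and everything after it become part of the value. The sketch below uses a simplified, hypothetical `parse_property` helper that mimics this rule; it is not Kafka's actual config loader:

```python
def parse_property(line: str):
    """Mimic java.util.Properties for a simple key=value line:
    '#' or '!' starts a comment only at the beginning of a line."""
    stripped = line.strip()
    if not stripped or stripped[0] in "#!":
        return None  # whole-line comment or blank line
    key, _, value = stripped.partition("=")
    return key.strip(), value.strip()

# The inline comment from the broker config above is NOT stripped:
print(parse_property("controller.listener.names=CONTROLLER # added once all controllers were started"))
# → ('controller.listener.names', 'CONTROLLER # added once all controllers were started')
```

If the comments shown above were really present in the file rather than added for the report, the affected values would include the trailing comment text.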
When trying to move to the next step (`Migrating brokers to KRaft`), the
broker fails to reach the controller quorum and crashes.
{code}
[2024-06-03 15:33:21,553] INFO [BrokerLifecycleManager id=12] Unable to
register the broker because the RPC got timed out before it could be sent.
(kafka.server.BrokerLifecycleManager)
[2024-06-03 15:33:32,549] ERROR [BrokerLifecycleManager id=12] Shutting down
because we were unable to register with the controller quorum.
(kafka.server.BrokerLifecycleManager)
[2024-06-03 15:33:32,550] INFO [BrokerLifecycleManager id=12] Transitioning
from STARTING to SHUTTING_DOWN. (kafka.server.BrokerLifecycleManager)
[2024-06-03 15:33:32,551] INFO
[broker-12-to-controller-heartbeat-channel-manager]: Shutting down
(kafka.server.NodeToControllerRequestThread)
[2024-06-03 15:33:32,551] INFO
[broker-12-to-controller-heartbeat-channel-manager]: Shutdown completed
(kafka.server.NodeToControllerRequestThread)
[2024-06-03 15:33:32,551] ERROR [BrokerServer id=12] Received a fatal error
while waiting for the controller to acknowledge that we are caught up
(kafka.server.BrokerServer)
java.util.concurrent.CancellationException
{code}
> Zookeeper-Kraft failing migration - RPC got timed out before it could be sent
> -----------------------------------------------------------------------------
>
> Key: KAFKA-19318
> URL: https://issues.apache.org/jira/browse/KAFKA-19318
> Project: Kafka
> Issue Type: Bug
> Components: kraft
> Affects Versions: 3.9.0
> Reporter: dennis lucero
> Priority: Major
>
> Despite several attempts to migrate from a ZooKeeper cluster to KRaft, the
> migration never completed.
> We spawned a new, fully healthy cluster with 18 Kafka nodes connected to 5
> ZooKeeper nodes; 9 additional Kafka nodes host the new controllers.
> It was tested with Kafka 3.9.0.
> The controllers are started with:
> {code:java}
> ==> /var/log/kafka/kafka-server.log <==
> [2025-05-20 15:04:28,798] INFO Registered kafka:type=kafka.Log4jController
> MBean (kafka.utils.Log4jControllerRegistration$)
> [2025-05-20 15:04:29,207] INFO [ControllerServer id=111] Starting controller
> (kafka.server.ControllerServer)
> [2025-05-20 15:04:29,656] INFO Updated connection-accept-rate max connection
> creation rate to 2147483647 (kafka.network.ConnectionQuotas)
> [2025-05-20 15:04:29,784] INFO [SocketServer listenerType=CONTROLLER,
> nodeId=111] Created data-plane acceptor and processors for endpoint :
> ListenerName(CONTROLLER) (kafka.network.SocketServer)
> [2025-05-20 15:04:29,792] INFO [SharedServer id=111] Starting SharedServer
> (kafka.server.SharedServer)
> [2025-05-20 15:04:29,841] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Recovering unflushed segment 0. 0/1 recovered for
> __cluster_metadata-0. (kafka.log.LogLoader)
> [2025-05-20 15:04:29,842] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Loading producer state till offset 0 with message
> format version 2 (kafka.log.UnifiedLog$)
> [2025-05-20 15:04:29,843] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Reloading from producer snapshot and rebuilding
> producer state from offset 0 (kafka.log.UnifiedLog$)
> [2025-05-20 15:04:29,843] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Producer state recovery took 0ms for snapshot load
> and 0ms for segment recovery from offset 0 (kafka.log.UnifiedLog$)
> [2025-05-20 15:04:29,905] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Loading producer state till offset 2709 with message
> format version 2 (kafka.log.UnifiedLog$)
> [2025-05-20 15:04:29,905] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Reloading from producer snapshot and rebuilding
> producer state from offset 2709 (kafka.log.UnifiedLog$)
> [2025-05-20 15:04:29,906] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Producer state recovery took 1ms for snapshot load
> and 0ms for segment recovery from offset 2709 (kafka.log.UnifiedLog$)
> [2025-05-20 15:04:29,940] INFO Initialized snapshots with IDs
> SortedSet(OffsetAndEpoch(offset=0, epoch=0)) from
> /data/kafka1/kafka/__cluster_metadata-0 (kafka.raft.KafkaMetadataLog$)
> [2025-05-20 15:04:29,952] INFO [raft-expiration-reaper]: Starting
> (kafka.raft.TimingWheelExpirationService$ExpiredOperationReaper)
> [2025-05-20 15:04:30,191] INFO [ControllerServer id=111] Waiting for
> controller quorum voters future (kafka.server.ControllerServer)
> [2025-05-20 15:04:30,191] INFO [ControllerServer id=111] Finished waiting for
> controller quorum voters future (kafka.server.ControllerServer)
> [2025-05-20 15:04:30,233] INFO [ZooKeeperClient KRaft Migration] Initializing
> a new session to
> prod-zk-11.ae-rus.net:2181,prod-zk-12.ae-rus.net:2181,prod-zk-13.ae-rus.net:2181,prod-zk-14.ae-rus.net:2181,prod-zk-15.ae-rus.net:2181.
> (kafka.zookeeper.ZooKeeperClient)
> [2025-05-20 15:04:30,248] INFO [ZooKeeperClient KRaft Migration] Waiting
> until connected. (kafka.zookeeper.ZooKeeperClient)
> [2025-05-20 15:04:30,261] INFO [ZooKeeperClient KRaft Migration] Connected.
> (kafka.zookeeper.ZooKeeperClient)
> [2025-05-20 15:04:30,293] INFO [controller-111-ThrottledChannelReaper-Fetch]:
> Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
> [2025-05-20 15:04:30,293] INFO
> [controller-111-ThrottledChannelReaper-Produce]: Starting
> (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
> [2025-05-20 15:04:30,294] INFO
> [controller-111-ThrottledChannelReaper-Request]: Starting
> (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
> [2025-05-20 15:04:30,297] INFO
> [controller-111-ThrottledChannelReaper-ControllerMutation]: Starting
> (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
> [2025-05-20 15:04:30,316] INFO [ExpirationReaper-111-AlterAcls]: Starting
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2025-05-20 15:04:30,392] INFO [ControllerServer id=111] Waiting for the
> controller metadata publishers to be installed (kafka.server.ControllerServer)
> [2025-05-20 15:04:30,393] INFO [ControllerServer id=111] Finished waiting for
> the controller metadata publishers to be installed
> (kafka.server.ControllerServer)
> [2025-05-20 15:04:30,393] INFO [SocketServer listenerType=CONTROLLER,
> nodeId=111] Enabling request processing. (kafka.network.SocketServer)
> [2025-05-20 15:04:30,394] INFO [ControllerRegistrationManager id=111
> incarnation=mbreItYzTmySXeahJkXxFA] Found registration for
> UwcgQ_r6T_uBVDMCdWLG6A instead of our incarnation.
> (kafka.server.ControllerRegistrationManager)
> [2025-05-20 15:04:30,395] INFO Awaiting socket connections on 0.0.0.0:9091.
> (kafka.network.DataPlaneAcceptor)
> [2025-05-20 15:04:30,455] INFO
> [controller-111-to-controller-registration-channel-manager]: Starting
> (kafka.server.NodeToControllerRequestThread)
> [2025-05-20 15:04:30,455] INFO [ControllerRegistrationManager id=111
> incarnation=mbreItYzTmySXeahJkXxFA] initialized channel manager.
> (kafka.server.ControllerRegistrationManager)
> [2025-05-20 15:04:30,455] INFO [ControllerServer id=111] Waiting for all of
> the authorizer futures to be completed (kafka.server.ControllerServer)
> [2025-05-20 15:04:30,456] INFO [ControllerServer id=111] Finished waiting for
> all of the authorizer futures to be completed (kafka.server.ControllerServer)
> [2025-05-20 15:04:30,456] INFO [ControllerServer id=111] Waiting for all of
> the SocketServer Acceptors to be started (kafka.server.ControllerServer)
> [2025-05-20 15:04:30,456] INFO [ControllerServer id=111] Finished waiting for
> all of the SocketServer Acceptors to be started
> (kafka.server.ControllerServer)
> [2025-05-20 15:04:30,457] INFO [ControllerRegistrationManager id=111
> incarnation=mbreItYzTmySXeahJkXxFA] sendControllerRegistration: attempting to
> send ControllerRegistrationRequestData(controllerId=111,
> incarnationId=mbreItYzTmySXeahJkXxFA, zkMigrationReady=true,
> listeners=[Listener(name='CONTROLLER',
> host='prodkafkacontr11z1.h.ae-rus.net', port=9091, securityProtocol=0)],
> features=[Feature(name='kraft.version', minSupportedVersion=0,
> maxSupportedVersion=1), Feature(name='metadata.version',
> minSupportedVersion=1, maxSupportedVersion=21)])
> (kafka.server.ControllerRegistrationManager)
> [2025-05-20 15:04:30,471] INFO
> [controller-111-to-controller-registration-channel-manager]: Recorded new
> KRaft controller, from now on will use node
> prodkafkacontr11z1.h.ae-rus.net:9091 (id: 111 rack: null)
> (kafka.server.NodeToControllerRequestThread)
> [2025-05-20 15:04:30,505] INFO [KafkaRaftServer nodeId=111] Kafka Server
> started (kafka.server.KafkaRaftServer)
> [2025-05-20 15:04:30,579] INFO [ControllerRegistrationManager id=111
> incarnation=mbreItYzTmySXeahJkXxFA] Our registration has been persisted to
> the metadata log. (kafka.server.ControllerRegistrationManager)
> [2025-05-20 15:04:30,581] INFO [ControllerRegistrationManager id=111
> incarnation=mbreItYzTmySXeahJkXxFA] RegistrationResponseHandler: controller
> acknowledged ControllerRegistrationRequest.
> (kafka.server.ControllerRegistrationManager){code}
> {code:java}
> cat /var/log/kafka/controller.log
> [2025-05-20 15:04:30,357] TRACE [KRaftMigrationDriver id=111] Recovered
> migration state from ZK in 68765354 ns. Transitioned migration state from
> ZkMigrationLeadershipState{kraftControllerId=-1, kraftControllerEpoch=-1,
> kraftMetadataOffset=-1, kraftMetadataEpoch=-1, lastUpdatedTimeMs=-1,
> migrationZkVersion=-2, controllerZkEpoch=-1, controllerZkVersion=-2} to
> ZkMigrationLeadershipState{kraftControllerId=-1, kraftControllerEpoch=-1,
> kraftMetadataOffset=-1, kraftMetadataEpoch=-1,
> lastUpdatedTimeMs=1747648116174, migrationZkVersion=0, controllerZkEpoch=-1,
> controllerZkVersion=-2}
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,358] INFO [KRaftMigrationDriver id=111] Initial
> migration of ZK metadata is not done.
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,358] INFO [KRaftMigrationDriver id=111] 111
> transitioning from UNINITIALIZED to INACTIVE state
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,388] INFO [KRaftMigrationDriver id=111] Became active
> migration driver in 27621312 ns. Transitioned migration state from
> ZkMigrationLeadershipState{kraftControllerId=-1, kraftControllerEpoch=-1,
> kraftMetadataOffset=-1, kraftMetadataEpoch=-1,
> lastUpdatedTimeMs=1747648116174, migrationZkVersion=0, controllerZkEpoch=-1,
> controllerZkVersion=-2} to ZkMigrationLeadershipState{kraftControllerId=111,
> kraftControllerEpoch=2, kraftMetadataOffset=-1, kraftMetadataEpoch=-1,
> lastUpdatedTimeMs=1747648116174, migrationZkVersion=0, controllerZkEpoch=-1,
> controllerZkVersion=-2}
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,388] INFO [KRaftMigrationDriver id=111] 111
> transitioning from INACTIVE to WAIT_FOR_CONTROLLER_QUORUM state
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,388] INFO [KRaftMigrationDriver id=111] Controller
> Quorum is ready for Zk to KRaft migration. Now waiting for ZK brokers.
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,388] INFO [KRaftMigrationDriver id=111] 111
> transitioning from WAIT_FOR_CONTROLLER_QUORUM to WAIT_FOR_BROKERS state
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,389] INFO [KRaftMigrationDriver id=111] No brokers are
> known to KRaft, waiting for brokers to register.
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,494] INFO [KRaftMigrationDriver id=111] No brokers are
> known to KRaft, waiting for brokers to register.
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,579] TRACE [KRaftMigrationDriver id=111] Received
> metadata delta, but the controller is not in dual-write mode. Ignoring this
> metadata update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,580] INFO [KRaftMigrationDriver id=111] No brokers are
> known to KRaft, waiting for brokers to register.
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,645] TRACE [KRaftMigrationDriver id=111] Received
> metadata delta, but the controller is not in dual-write mode. Ignoring this
> metadata update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,645] INFO [KRaftMigrationDriver id=111] No brokers are
> known to KRaft, waiting for brokers to register.
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,701] TRACE [KRaftMigrationDriver id=111] Received
> metadata delta, but the controller is not in dual-write mode. Ignoring this
> metadata update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,701] INFO [KRaftMigrationDriver id=111] No brokers are
> known to KRaft, waiting for brokers to register.
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,730] TRACE [KRaftMigrationDriver id=111] Received
> metadata delta, but the controller is not in dual-write mode. Ignoring this
> metadata update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,730] INFO [KRaftMigrationDriver id=111] No brokers are
> known to KRaft, waiting for brokers to register.
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,778] TRACE [KRaftMigrationDriver id=111] Received
> metadata delta, but the controller is not in dual-write mode. Ignoring this
> metadata update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,861] TRACE [KRaftMigrationDriver id=111] Received
> metadata delta, but the controller is not in dual-write mode. Ignoring this
> metadata update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,861] INFO [KRaftMigrationDriver id=111] No brokers are
> known to KRaft, waiting for brokers to register.
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,905] TRACE [KRaftMigrationDriver id=111] Received
> metadata delta, but the controller is not in dual-write mode. Ignoring this
> metadata update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:30,905] INFO [KRaftMigrationDriver id=111] No brokers are
> known to KRaft, waiting for brokers to register.
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:31,202] TRACE [KRaftMigrationDriver id=111] Received
> metadata delta, but the controller is not in dual-write mode. Ignoring this
> metadata update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:31,202] INFO [KRaftMigrationDriver id=111] No brokers are
> known to KRaft, waiting for brokers to register.
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:31,278] TRACE [KRaftMigrationDriver id=111] Received
> metadata delta, but the controller is not in dual-write mode. Ignoring this
> metadata update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> [2025-05-20 15:04:31,778] TRACE [KRaftMigrationDriver id=111] Received
> metadata delta, but the controller is not in dual-write mode. Ignoring this
> metadata update. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> {code}
> The config on the controller node is the following
> {code:java}
> cat /etc/kafka/server.properties
> ##
> # Ansible managed
> #host.name=prodkafkacontr14z4.h.ae-rus.net
> cluster.name=MrgOne
> node.id=114
> controller.rack=z4
> replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector
> process.roles=controller
> delete.topic.enable=True
> auto.create.topics.enable=True
> listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
> listeners=CONTROLLER://:9091
> advertised.listeners=CONTROLLER://prodkafkacontr14z4.h.ae-rus.net:9091
> controller.listener.names=CONTROLLER
> inter.broker.listener.name=EXTERNAL
> controller.quorum.voters=1...@prodkafkacontr11z1.h.ae-rus.net:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091
> controller.quorum.bootstrap.servers=prodkafkacontr11z1.h.ae-rus.net:9091,prodkafkacontr12z1.h.ae-rus.net:9091,prodkafkacontr13z1.h.ae-rus.net:9091,prodkafkacontr17z3.h.ae-rus.net:9091,prodkafkacontr18z3.h.ae-rus.net:9091,prodkafkacontr19z3.h.ae-rus.net:9091,prodkafkacontr14z4.h.ae-rus.net:9091,prodkafkacontr15z4.h.ae-rus.net:9091,prodkafkacontr16z4.h.ae-rus.net:9091
> num.network.threads=128
> background.threads=128
> num.io.threads=128
> socket.send.buffer.bytes=2097152
> socket.receive.buffer.bytes=2097152
> socket.request.max.bytes=104857600
> queued.max.requests=32768
> offsets.topic.replication.factor=3
> transaction.state.log.replication.factor=3
> log.dirs=/data/kafka1/kafka
> num.recovery.threads.per.data.dir=16
> log.flush.interval.messages=10240
> log.flush.interval.ms=5000
> log.retention.hours=48
> log.retention.bytes=1073741824
> log.segment.bytes=1073741824
> log.retention.check.interval.ms=300000
> group.initial.rebalance.delay.ms=1000
> auto.leader.rebalance.enable=True
> leader.imbalance.check.interval.seconds=300
> leader.imbalance.per.broker.percentage=20
> default.replication.factor=3
> num.replica.fetchers=64
> replica.fetch.max.bytes=41943040
> replica.fetch.wait.max.ms=1000
> replica.high.watermark.checkpoint.interval.ms=5000
> replica.socket.timeout.ms=30000
> replica.socket.receive.buffer.bytes=65536
> replica.lag.time.max.ms=10000
> ############################# Extra Properties #############################
> offsets.topic.num.partitions=150
> message.max.bytes=20971520
> batch.size=65536
> zookeeper.connect=prod-zk-11.ae-rus.net:2181,prod-zk-12.ae-rus.net:2181,prod-zk-13.ae-rus.net:2181,prod-zk-14.ae-rus.net:2181,prod-zk-15.ae-rus.net:2181
> rack.aware.assignment.strategy=balance_subtopology
> rack.aware.assignment.traffic_cost=org.apache.kafka.streams.processor.internals.assignment.RackAwareTaskAssignor
> compression.gzip.level=5
> zookeeper.metadata.migration.enable=True
> zookeeper.metadata.migration.min.batch.size=512
> initial.broker.registration.timeout.ms=18000 {code}
> The config on the broker node is the following
> {code:java}
> $ cat /etc/kafka/server.properties
> ##
> # Ansible managed
> #host.name=prodkafka2z1.h.ae-rus.net
> cluster.name=MrgOne
> node.id=2
> broker.rack=z1
> replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector
> process.roles=broker
> delete.topic.enable=True
> auto.create.topics.enable=True
> listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
> listeners=EXTERNAL://:9092,INTERNAL://:29092
> advertised.listeners=EXTERNAL://prod-kafka-12.ae-rus.net:9092,INTERNAL://prod-kafka-12.ae-rus.net:29092
> controller.listener.names=CONTROLLER
> inter.broker.listener.name=EXTERNAL
> controller.quorum.voters=1...@prodkafkacontr11z1.h.ae-rus.net:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091,[email protected]:9091
> controller.quorum.bootstrap.servers=prodkafkacontr11z1.h.ae-rus.net:9091,prodkafkacontr12z1.h.ae-rus.net:9091,prodkafkacontr13z1.h.ae-rus.net:9091,prodkafkacontr17z3.h.ae-rus.net:9091,prodkafkacontr18z3.h.ae-rus.net:9091,prodkafkacontr19z3.h.ae-rus.net:9091,prodkafkacontr14z4.h.ae-rus.net:9091,prodkafkacontr15z4.h.ae-rus.net:9091,prodkafkacontr16z4.h.ae-rus.net:9091
> num.network.threads=128
> background.threads=128
> num.io.threads=128
> socket.send.buffer.bytes=2097152
> socket.receive.buffer.bytes=2097152
> socket.request.max.bytes=104857600
> queued.max.requests=32768
> offsets.topic.replication.factor=3
> transaction.state.log.replication.factor=3
> log.dirs=/data/kafka1/kafka,/data/kafka2/kafka,/data/kafka3/kafka,/data/kafka4/kafka,/data/kafka5/kafka,/data/kafka6/kafka
> num.recovery.threads.per.data.dir=16
> log.flush.interval.messages=10240
> log.flush.interval.ms=5000
> log.retention.hours=48
> log.retention.bytes=1073741824
> log.segment.bytes=1073741824
> log.retention.check.interval.ms=300000
> group.initial.rebalance.delay.ms=1000
> auto.leader.rebalance.enable=True
> leader.imbalance.check.interval.seconds=300
> leader.imbalance.per.broker.percentage=20
> default.replication.factor=3
> num.replica.fetchers=64
> replica.fetch.max.bytes=41943040
> replica.fetch.wait.max.ms=1000
> replica.high.watermark.checkpoint.interval.ms=5000
> replica.socket.timeout.ms=30000
> replica.socket.receive.buffer.bytes=65536
> replica.lag.time.max.ms=10000
> ############################# Extra Properties #############################
> offsets.topic.num.partitions=150
> message.max.bytes=20971520
> batch.size=65536
> zookeeper.connect=prod-zk-11.ae-rus.net:2181,prod-zk-12.ae-rus.net:2181,prod-zk-13.ae-rus.net:2181,prod-zk-14.ae-rus.net:2181,prod-zk-15.ae-rus.net:2181
> rack.aware.assignment.strategy=balance_subtopology
> rack.aware.assignment.traffic_cost=org.apache.kafka.streams.processor.internals.assignment.RackAwareTaskAssignor
> compression.gzip.level=5
> zookeeper.metadata.migration.enable=True
> zookeeper.metadata.migration.min.batch.size=512
> initial.broker.registration.timeout.ms=18000
> {code}
> When trying to move to the next step (`Migrating brokers to KRaft`), the
> broker fails to reach the controller quorum and crashes.
> {code:java}
> ==> /var/log/kafka/kafka-server.log <==
> [2025-05-20 15:24:07,521] INFO Registered kafka:type=kafka.Log4jController
> MBean (kafka.utils.Log4jControllerRegistration$)
> [2025-05-20 15:24:07,826] INFO [BrokerServer id=2] Transition from SHUTDOWN
> to STARTING (kafka.server.BrokerServer)
> [2025-05-20 15:24:07,827] INFO [SharedServer id=2] Starting SharedServer
> (kafka.server.SharedServer)
> [2025-05-20 15:24:07,884] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Recovering unflushed segment 21124. 0/1 recovered for
> __cluster_metadata-0. (kafka.log.LogLoader)
> [2025-05-20 15:24:07,885] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Loading producer state till offset 21124 with message
> format version 2 (kafka.log.UnifiedLog$)
> [2025-05-20 15:24:07,886] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Reloading from producer snapshot and rebuilding
> producer state from offset 21124 (kafka.log.UnifiedLog$)
> [2025-05-20 15:24:07,893] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Producer state recovery took 0ms for snapshot load
> and 7ms for segment recovery from offset 21124 (kafka.log.UnifiedLog$)
> [2025-05-20 15:24:07,958] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Loading producer state till offset 31424 with message
> format version 2 (kafka.log.UnifiedLog$)
> [2025-05-20 15:24:07,958] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Reloading from producer snapshot and rebuilding
> producer state from offset 31424 (kafka.log.UnifiedLog$)
> [2025-05-20 15:24:07,960] INFO [LogLoader partition=__cluster_metadata-0,
> dir=/data/kafka1/kafka] Producer state recovery took 2ms for snapshot load
> and 0ms for segment recovery from offset 31424 (kafka.log.UnifiedLog$)
> [2025-05-20 15:24:07,995] INFO Initialized snapshots with IDs
> SortedSet(OffsetAndEpoch(offset=21124, epoch=7)) from
> /data/kafka1/kafka/__cluster_metadata-0 (kafka.raft.KafkaMetadataLog$)
> [2025-05-20 15:24:08,023] INFO [raft-expiration-reaper]: Starting
> (kafka.raft.TimingWheelExpirationService$ExpiredOperationReaper)
> [2025-05-20 15:24:08,301] INFO [BrokerServer id=2] Starting broker
> (kafka.server.BrokerServer)
> [2025-05-20 15:24:08,318] INFO [broker-2-ThrottledChannelReaper-Fetch]:
> Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
> [2025-05-20 15:24:08,319] INFO [broker-2-ThrottledChannelReaper-Produce]:
> Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
> [2025-05-20 15:24:08,320] INFO [broker-2-ThrottledChannelReaper-Request]:
> Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
> [2025-05-20 15:24:08,322] INFO
> [broker-2-ThrottledChannelReaper-ControllerMutation]: Starting
> (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
> [2025-05-20 15:24:08,365] INFO [BrokerServer id=2] Waiting for controller
> quorum voters future (kafka.server.BrokerServer)
> [2025-05-20 15:24:08,365] INFO [BrokerServer id=2] Finished waiting for
> controller quorum voters future (kafka.server.BrokerServer)
> [2025-05-20 15:24:08,371] INFO
> [broker-2-to-controller-forwarding-channel-manager]: Starting
> (kafka.server.NodeToControllerRequestThread)
> [2025-05-20 15:24:08,592] INFO Updated connection-accept-rate max connection
> creation rate to 2147483647 (kafka.network.ConnectionQuotas)
> [2025-05-20 15:24:08,687] INFO [SocketServer listenerType=BROKER, nodeId=2]
> Created data-plane acceptor and processors for endpoint :
> ListenerName(EXTERNAL) (kafka.network.SocketServer)
> [2025-05-20 15:24:08,688] INFO Updated connection-accept-rate max connection
> creation rate to 2147483647 (kafka.network.ConnectionQuotas)
> [2025-05-20 15:24:08,759] INFO [SocketServer listenerType=BROKER, nodeId=2]
> Created data-plane acceptor and processors for endpoint :
> ListenerName(INTERNAL) (kafka.network.SocketServer)
> [2025-05-20 15:24:08,765] INFO
> [broker-2-to-controller-alter-partition-channel-manager]: Starting
> (kafka.server.NodeToControllerRequestThread)
> [2025-05-20 15:24:08,771] INFO
> [broker-2-to-controller-directory-assignments-channel-manager]: Starting
> (kafka.server.NodeToControllerRequestThread)
> [2025-05-20 15:24:08,788] INFO [ExpirationReaper-2-Produce]: Starting
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2025-05-20 15:24:08,788] INFO [ExpirationReaper-2-Fetch]: Starting
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2025-05-20 15:24:08,789] INFO [ExpirationReaper-2-DeleteRecords]: Starting
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2025-05-20 15:24:08,790] INFO [ExpirationReaper-2-ElectLeader]: Starting
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2025-05-20 15:24:08,790] INFO [ExpirationReaper-2-RemoteFetch]: Starting
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2025-05-20 15:24:08,808] INFO [ExpirationReaper-2-Heartbeat]: Starting
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2025-05-20 15:24:08,808] INFO [ExpirationReaper-2-Rebalance]: Starting
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2025-05-20 15:24:08,863] INFO
> [broker-2-to-controller-heartbeat-channel-manager]: Starting
> (kafka.server.NodeToControllerRequestThread)
> [2025-05-20 15:24:08,865] INFO [BrokerLifecycleManager id=2] Incarnation
> ZvarJwVqQZOUzzobd91HlQ of broker 2 in cluster 9ca1s2tbTGm1Bi35aynf6Q is now
> STARTING. (kafka.server.BrokerLifecycleManager)
> [2025-05-20 15:24:08,898] INFO [ExpirationReaper-2-AlterAcls]: Starting
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2025-05-20 15:24:08,978] INFO [BrokerServer id=2] Waiting for the broker
> metadata publishers to be installed (kafka.server.BrokerServer)
> [2025-05-20 15:24:08,979] INFO [BrokerServer id=2] Finished waiting for the
> broker metadata publishers to be installed (kafka.server.BrokerServer)
> [2025-05-20 15:24:08,979] INFO [BrokerServer id=2] Waiting for the controller
> to acknowledge that we are caught up (kafka.server.BrokerServer)
> {code}
> {code:java}
> ==> /var/log/kafka/kafka-server.log <==
> [2025-05-20 15:24:13,380] INFO [BrokerLifecycleManager id=2] Unable to
> register the broker because the RPC got timed out before it could be sent.
> (kafka.server.BrokerLifecycleManager)
> [2025-05-20 15:24:17,990] INFO [BrokerLifecycleManager id=2] Unable to
> register the broker because the RPC got timed out before it could be sent.
> (kafka.server.BrokerLifecycleManager)
> [2025-05-20 15:24:22,701] INFO [BrokerLifecycleManager id=2] Unable to
> register the broker because the RPC got timed out before it could be sent.
> (kafka.server.BrokerLifecycleManager)
> [2025-05-20 15:24:26,864] ERROR [BrokerLifecycleManager id=2] Shutting down
> because we were unable to register with the controller quorum.
> (kafka.server.BrokerLifecycleManager)
> [2025-05-20 15:24:26,864] INFO [BrokerLifecycleManager id=2] Transitioning
> from STARTING to SHUTTING_DOWN. (kafka.server.BrokerLifecycleManager)
> [2025-05-20 15:24:26,865] INFO
> [broker-2-to-controller-heartbeat-channel-manager]: Shutting down
> (kafka.server.NodeToControllerRequestThread)
> [2025-05-20 15:24:26,865] INFO
> [broker-2-to-controller-heartbeat-channel-manager]: Stopped
> (kafka.server.NodeToControllerRequestThread)
> [2025-05-20 15:24:26,865] INFO
> [broker-2-to-controller-heartbeat-channel-manager]: Shutdown completed
> (kafka.server.NodeToControllerRequestThread)
> [2025-05-20 15:24:26,865] ERROR [BrokerServer id=2] Received a fatal error
> while waiting for the controller to acknowledge that we are caught up
> (kafka.server.BrokerServer)
> java.util.concurrent.CancellationException
> at
> java.base/java.util.concurrent.CompletableFuture.cancel(CompletableFuture.java:2510)
> at
> kafka.server.BrokerLifecycleManager$ShutdownEvent.run(BrokerLifecycleManager.scala:638)
> at
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:191)
> at java.base/java.lang.Thread.run(Thread.java:1583)
> [2025-05-20 15:24:26,867] INFO Node to controller channel manager for
> heartbeat shutdown (kafka.server.NodeToControllerChannelManagerImpl)
> [2025-05-20 15:24:26,868] INFO [BrokerServer id=2] Transition from STARTING
> to STARTED (kafka.server.BrokerServer)
> [2025-05-20 15:24:26,870] ERROR [BrokerServer id=2] Fatal error during broker
> startup. Prepare to shutdown (kafka.server.BrokerServer)
> java.lang.RuntimeException: Received a fatal error while waiting for the
> controller to acknowledge that we are caught up
> at
> org.apache.kafka.server.util.FutureUtils.waitWithLogging(FutureUtils.java:73)
> at kafka.server.BrokerServer.startup(BrokerServer.scala:502)
> at
> kafka.server.KafkaRaftServer.$anonfun$startup$2(KafkaRaftServer.scala:99)
> at
> kafka.server.KafkaRaftServer.$anonfun$startup$2$adapted(KafkaRaftServer.scala:99)
> at scala.Option.foreach(Option.scala:437)
> at kafka.server.KafkaRaftServer.startup(KafkaRaftServer.scala:99)
> at kafka.Kafka$.main(Kafka.scala:112)
> at kafka.Kafka.main(Kafka.scala)
> Caused by: java.util.concurrent.CancellationException
> at
> java.base/java.util.concurrent.CompletableFuture.cancel(CompletableFuture.java:2510)
> at
> kafka.server.BrokerLifecycleManager$ShutdownEvent.run(BrokerLifecycleManager.scala:638)
> at
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:191)
> at java.base/java.lang.Thread.run(Thread.java:1583)
> [2025-05-20 15:24:26,870] INFO [BrokerServer id=2] Transition from STARTED to
> SHUTTING_DOWN (kafka.server.BrokerServer)
> [2025-05-20 15:24:26,871] INFO [BrokerServer id=2] shutting down
> (kafka.server.BrokerServer)
> [2025-05-20 15:24:26,871] INFO [SocketServer listenerType=BROKER, nodeId=2]
> Stopping socket server request processors (kafka.network.SocketServer)
> [2025-05-20 15:24:26,874] ERROR Cannot invoke
> "java.nio.channels.ServerSocketChannel.close()" because the return value of
> "kafka.network.Acceptor.serverChannel()" is null
> (kafka.network.DataPlaneAcceptor)
> java.lang.NullPointerException: Cannot invoke
> "java.nio.channels.ServerSocketChannel.close()" because the return value of
> "kafka.network.Acceptor.serverChannel()" is null
> at kafka.network.Acceptor.$anonfun$closeAll$2(SocketServer.scala:711)
> at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:68)
> at kafka.network.Acceptor.closeAll(SocketServer.scala:711)
> at kafka.network.Acceptor.close(SocketServer.scala:678)
> at
> kafka.network.SocketServer.$anonfun$stopProcessingRequests$4(SocketServer.scala:287)
> at
> kafka.network.SocketServer.$anonfun$stopProcessingRequests$4$adapted(SocketServer.scala:287)
> at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619)
> at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:935)
> at
> kafka.network.SocketServer.stopProcessingRequests(SocketServer.scala:287)
> at kafka.server.BrokerServer.$anonfun$shutdown$5(BrokerServer.scala:638)
> at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:68)
> at kafka.server.BrokerServer.shutdown(BrokerServer.scala:638)
> at kafka.server.KafkaBroker.shutdown(KafkaBroker.scala:98)
> at kafka.server.KafkaBroker.shutdown$(KafkaBroker.scala:98)
> at kafka.server.BrokerServer.shutdown(BrokerServer.scala:67)
> at kafka.server.BrokerServer.startup(BrokerServer.scala:555)
> at
> kafka.server.KafkaRaftServer.$anonfun$startup$2(KafkaRaftServer.scala:99)
> at
> kafka.server.KafkaRaftServer.$anonfun$startup$2$adapted(KafkaRaftServer.scala:99)
> at scala.Option.foreach(Option.scala:437)
> at kafka.server.KafkaRaftServer.startup(KafkaRaftServer.scala:99)
> at kafka.Kafka$.main(Kafka.scala:112)
> at kafka.Kafka.main(Kafka.scala)
> [2025-05-20 15:24:26,906] ERROR Cannot invoke
> "java.nio.channels.ServerSocketChannel.close()" because the return value of
> "kafka.network.Acceptor.serverChannel()" is null
> (kafka.network.DataPlaneAcceptor)
> java.lang.NullPointerException: Cannot invoke
> "java.nio.channels.ServerSocketChannel.close()" because the return value of
> "kafka.network.Acceptor.serverChannel()" is null
> at kafka.network.Acceptor.$anonfun$closeAll$2(SocketServer.scala:711)
> at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:68)
> at kafka.network.Acceptor.closeAll(SocketServer.scala:711)
> at kafka.network.Acceptor.close(SocketServer.scala:678)
> at
> kafka.network.SocketServer.$anonfun$stopProcessingRequests$4(SocketServer.scala:287)
> at
> kafka.network.SocketServer.$anonfun$stopProcessingRequests$4$adapted(SocketServer.scala:287)
> at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619)
> at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:935)
> at
> kafka.network.SocketServer.stopProcessingRequests(SocketServer.scala:287)
> at kafka.server.BrokerServer.$anonfun$shutdown$5(BrokerServer.scala:638)
> at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:68)
> at kafka.server.BrokerServer.shutdown(BrokerServer.scala:638)
> at kafka.server.KafkaBroker.shutdown(KafkaBroker.scala:98)
> at kafka.server.KafkaBroker.shutdown$(KafkaBroker.scala:98)
> at kafka.server.BrokerServer.shutdown(BrokerServer.scala:67)
> at kafka.server.BrokerServer.startup(BrokerServer.scala:555)
> at
> kafka.server.KafkaRaftServer.$anonfun$startup$2(KafkaRaftServer.scala:99)
> at
> kafka.server.KafkaRaftServer.$anonfun$startup$2$adapted(KafkaRaftServer.scala:99)
> at scala.Option.foreach(Option.scala:437)
> at kafka.server.KafkaRaftServer.startup(KafkaRaftServer.scala:99)
> at kafka.Kafka$.main(Kafka.scala:112)
> at kafka.Kafka.main(Kafka.scala)
> [2025-05-20 15:24:26,932] INFO [SocketServer listenerType=BROKER, nodeId=2]
> Stopped socket server request processors (kafka.network.SocketServer)
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)