[
https://issues.apache.org/jira/browse/KAFKA-17752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Juha Mynttinen updated KAFKA-17752:
-----------------------------------
Description:
Hey,
Tested using 3.9.0 RC0.
It seems that "kafka-metadata-quorum.sh remove-controller" causes the removed controller to crash if it is one of the controllers that were listed in "--initial-controllers" when the storage was formatted.
Steps to reproduce:
Clean up and set up the environment:
rm -rf /tmp/controllers && \
mkdir -p /tmp/controllers/c1 && \
mkdir -p /tmp/controllers/c2 && \
mkdir -p /tmp/controllers/c3
export KAFKA_HOME=<your_kafka_3_9_home>
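The report references c1.properties, c2.properties and c3.properties but does not show them. A minimal sketch of what they might contain follows. It is an assumption, not the reporter's files: controller N (N=1..3) gets node.id 100N, listens on localhost:1000N and keeps its metadata log in /tmp/controllers/cN. controller.quorum.bootstrap.servers is used instead of a static controller.quorum.voters list because the quorum is formatted with --initial-controllers.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical contents: the bug report does not show c1..c3.properties.
# Controller N (N=1..3) gets node.id 100N, listens on localhost:1000N and
# stores its metadata log under /tmp/controllers/cN.
for n in 1 2 3; do
  cat > "c${n}.properties" <<EOF
process.roles=controller
node.id=100${n}
listeners=CONTROLLER://localhost:1000${n}
controller.listener.names=CONTROLLER
listener.security.protocol.map=CONTROLLER:PLAINTEXT
controller.quorum.bootstrap.servers=localhost:10001,localhost:10002,localhost:10003
log.dirs=/tmp/controllers/c${n}
EOF
done
```

Run it from the directory in which the format and start commands are executed, since they refer to the files by relative path.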
Format the controllers:
$KAFKA_HOME/bin/kafka-storage.sh format \
  --cluster-id 00000000-0000-0000-0000-000000000001 \
  --initial-controllers 1001@localhost:10001:AAAAAAAAAAEAAAAAAAAAAA,1002@localhost:10002:AAAAAAAAAAEAAAAAAAAAAA,1003@localhost:10003:AAAAAAAAAAEAAAAAAAAAAA \
  --config c1.properties
$KAFKA_HOME/bin/kafka-storage.sh format \
  --cluster-id 00000000-0000-0000-0000-000000000001 \
  --initial-controllers 1001@localhost:10001:AAAAAAAAAAEAAAAAAAAAAA,1002@localhost:10002:AAAAAAAAAAEAAAAAAAAAAA,1003@localhost:10003:AAAAAAAAAAEAAAAAAAAAAA \
  --config c2.properties
$KAFKA_HOME/bin/kafka-storage.sh format \
  --cluster-id 00000000-0000-0000-0000-000000000001 \
  --initial-controllers 1001@localhost:10001:AAAAAAAAAAEAAAAAAAAAAA,1002@localhost:10002:AAAAAAAAAAEAAAAAAAAAAA,1003@localhost:10003:AAAAAAAAAAEAAAAAAAAAAA \
  --config c3.properties
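The three format commands differ only in --config. Each --initial-controllers entry has the form id@host:port:directoryId, and the report reuses the same directory ID for all three entries. A minimal sketch that builds the same string in a loop and, only when KAFKA_HOME points at a Kafka installation, runs the format step for each controller (the variable names are illustrative):

```shell
#!/usr/bin/env bash
set -euo pipefail

CLUSTER_ID="00000000-0000-0000-0000-000000000001"
DIR_ID="AAAAAAAAAAEAAAAAAAAAAA"   # same directory ID for every entry, as in the report

# Build "id@host:port:directoryId" entries for controllers 1001..1003.
entries=()
for n in 1 2 3; do
  entries+=("100${n}@localhost:1000${n}:${DIR_ID}")
done
# Join the entries with commas (first character of IFS) in a subshell.
INITIAL_CONTROLLERS=$(IFS=,; echo "${entries[*]}")
echo "Using --initial-controllers ${INITIAL_CONTROLLERS}"

# Only format when KAFKA_HOME points at a Kafka installation.
if [ -x "${KAFKA_HOME:-}/bin/kafka-storage.sh" ]; then
  for n in 1 2 3; do
    "$KAFKA_HOME/bin/kafka-storage.sh" format \
      --cluster-id "$CLUSTER_ID" \
      --initial-controllers "$INITIAL_CONTROLLERS" \
      --config "c${n}.properties"
  done
fi
```

Building the list once keeps all three controllers formatted with an identical initial voter set, which the format step expects.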
Start the controllers in separate terminals:
$KAFKA_HOME/bin/kafka-run-class.sh -name kafkaService kafka.Kafka c1.properties
$KAFKA_HOME/bin/kafka-run-class.sh -name kafkaService kafka.Kafka c2.properties
$KAFKA_HOME/bin/kafka-run-class.sh -name kafkaService kafka.Kafka c3.properties
Remove a controller:
$KAFKA_HOME/bin/kafka-metadata-quorum.sh \
  --bootstrap-controller localhost:10001,localhost:10002,localhost:10003,localhost:10004 \
  remove-controller --controller-id 1001 --controller-directory-id AAAAAAAAAAEAAAAAAAAAAA
The process crashes with the following error:
[2024-10-09 15:19:15,574] ERROR Encountered fatal fault: exception while renouncing leadership (org.apache.kafka.server.fault.ProcessTerminatingFaultHandler)
java.lang.RuntimeException: Unable to reset to last stable offset 55. No in-memory snapshot found for this offset.
    at org.apache.kafka.controller.OffsetControlManager.deactivate(OffsetControlManager.java:268)
    at org.apache.kafka.controller.QuorumController.renounce(QuorumController.java:1281)
    at org.apache.kafka.controller.QuorumController.handleEventException(QuorumController.java:552)
    at org.apache.kafka.controller.QuorumController.access$800(QuorumController.java:180)
    at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:885)
    at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:875)
    at org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:153)
    at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:142)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:215)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:186)
    at java.base/java.lang.Thread.run(Thread.java:840)
If the process that died is restarted, it rejoins the cluster and becomes an observer, as expected.
The crash doesn't happen in a slightly different case. I don't have the exact steps, but the idea is this:
1. Create a 3-controller cluster as above.
2. Format and start a 4th controller.
3. Add the 4th controller as a voter.
4. Remove the 4th controller to make it an observer. It becomes an observer, as expected.
Because this case works, I'm guessing the crash is somehow related to the controller being one of the initial controllers.
I didn't dig deeper into why the crash occurs.
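The exact commands for the 4th-controller case are missing from the report. One possible way to run steps 3 and 4 is sketched below; it is not the reporter's procedure. It assumes controller 1004 has already been formatted (for example with kafka-storage.sh format --no-initial-controllers) and started from a hypothetical c4.properties that mirrors the other files with node.id=1004, port 10004 and log.dirs=/tmp/controllers/c4. Since that controller is not part of --initial-controllers, its directory ID is generated at format time, so the hypothetical dir_id_of helper reads it back from meta.properties.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the directory.id recorded in the
# meta.properties file of a formatted log directory.
dir_id_of() {
  sed -n 's/^directory\.id=//p' "$1/meta.properties"
}

QUORUM="${KAFKA_HOME:-}/bin/kafka-metadata-quorum.sh"
BOOTSTRAP="localhost:10001,localhost:10002,localhost:10003"

# Only talk to the cluster when KAFKA_HOME points at a Kafka installation.
if [ -x "$QUORUM" ]; then
  # Step 3: add the running controller 1004 as a voter. add-controller is
  # assumed to take the new controller's settings from --command-config.
  "$QUORUM" --command-config c4.properties \
    --bootstrap-controller "$BOOTSTRAP" add-controller

  # Step 4: remove it again so that it becomes an observer.
  "$QUORUM" --bootstrap-controller "$BOOTSTRAP" remove-controller \
    --controller-id 1004 \
    --controller-directory-id "$(dir_id_of /tmp/controllers/c4)"
fi
```

Comparing this flow with the failing one isolates the single difference the report points to: whether the removed controller was in the initial voter set.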
> Controller crashes when removed if it is an initial controller
> --------------------------------------------------------------
>
> Key: KAFKA-17752
> URL: https://issues.apache.org/jira/browse/KAFKA-17752
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 3.9.0
> Reporter: Juha Mynttinen
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)