[
https://issues.apache.org/jira/browse/KAFKA-15648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845658#comment-17845658
]
sanghyeok An commented on KAFKA-15648:
--------------------------------------
Hey, [~davidarthur] !
How did you test it locally? I execute test code repeatly, bit that error not
occurs.
this is command lists which i executed.
# gradle --parallel test
# gradle --parallel metadata:test
# gradle metadata:test
> QuorumControllerTest#testBootstrapZkMigrationRecord is flaky
> ------------------------------------------------------------
>
> Key: KAFKA-15648
> URL: https://issues.apache.org/jira/browse/KAFKA-15648
> Project: Kafka
> Issue Type: Bug
> Components: controller, unit tests
> Reporter: David Arthur
> Priority: Minor
> Labels: flaky-test, good-first-issue
>
> Noticed that this test failed on Jenkins with
> {code}
> org.apache.kafka.server.fault.FaultHandlerException: fatalFaultHandler:
> exception while completing controller activation: Should not have ZK
> migrations enabled on a cluster running metadata.version 3.0-IV1
> at
> app//org.apache.kafka.controller.ActivationRecordsGenerator.recordsForNonEmptyLog(ActivationRecordsGenerator.java:154)
> at
> app//org.apache.kafka.controller.ActivationRecordsGenerator.generate(ActivationRecordsGenerator.java:229)
> at
> app//org.apache.kafka.controller.QuorumController$CompleteActivationEvent.generateRecordsAndResult(QuorumController.java:1237)
> at
> app//org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:784)
> at
> app//org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
> at
> app//org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
> at
> app//org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
> at [email protected]/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.RuntimeException: Should not have ZK migrations enabled
> on a cluster running metadata.version 3.0-IV1
> ... 8 more
> {code}
> I think this exception is a red herring (it's expected from one of the
> negative test cases). When trying to reproduce this issue locally, I do
> occasionally see the following type of failure
> {code}
> [2023-10-19 13:42:10,091] INFO Elected new leader:
> LeaderAndEpoch(leaderId=OptionalInt[0], epoch=1).
> (org.apache.kafka.metalog.LocalLogManager$SharedLogData:300)
> [2023-10-19 13:42:10,091] DEBUG
> append(batch=LeaderChangeBatch(newLeader=LeaderAndEpoch(leaderId=OptionalInt[0],
> epoch=1)), nextEndOffset=0)
> (org.apache.kafka.metalog.LocalLogManager$SharedLogData:276)
> [2023-10-19 13:42:10,091] DEBUG [LocalLogManager 0] Node 0: running log
> check. (org.apache.kafka.metalog.LocalLogManager:536)
> [2023-10-19 13:42:10,091] DEBUG [LocalLogManager 0] initialized local log
> manager for node 0 (org.apache.kafka.metalog.LocalLogManager:685)
> [2023-10-19 13:42:10,091] DEBUG [QuorumController id=0] Creating in-memory
> snapshot -1 (org.apache.kafka.timeline.SnapshotRegistry:203)
> [2023-10-19 13:42:10,091] INFO [QuorumController id=0] Creating new
> QuorumController with clusterId 6xRUXZ_kQ1GfuaHK42AS9Q. ZK migration mode is
> enabled. (org.apache.kafka.controller.QuorumController:1912)
> [2023-10-19 13:42:10,092] INFO [LocalLogManager 0] Node 0: registered
> MetaLogListener 1082153924 (org.apache.kafka.metalog.LocalLogManager:703)
> [2023-10-19 13:42:10,092] DEBUG [LocalLogManager 0] Node 0: running log
> check. (org.apache.kafka.metalog.LocalLogManager:536)
> [2023-10-19 13:42:10,092] DEBUG [LocalLogManager 0] Node 0: Executing
> handleLeaderChange LeaderAndEpoch(leaderId=OptionalInt[0], epoch=1)
> (org.apache.kafka.metalog.LocalLogManager:578)
> [2023-10-19 13:42:10,092] DEBUG [QuorumController id=0] Executing
> handleLeaderChange[1]. (org.apache.kafka.controller.QuorumController:577)
> [2023-10-19 13:42:10,092] INFO [QuorumController id=0] In the new epoch 1,
> the leader is (none). (org.apache.kafka.controller.QuorumController:1179)
> [2023-10-19 13:42:10,092] DEBUG [QuorumController id=0] Processed
> handleLeaderChange[1] in 50 us
> (org.apache.kafka.controller.QuorumController:510)
> [2023-10-19 13:42:10,092] DEBUG [QuorumController id=0] Executing
> handleLeaderChange[1]. (org.apache.kafka.controller.QuorumController:577)
> [2023-10-19 13:42:10,092] INFO [QuorumController id=0] Becoming the active
> controller at epoch 1, next write offset 1.
> (org.apache.kafka.controller.QuorumController:1175)
> [2023-10-19 13:42:10,092] DEBUG [QuorumController id=0] Processed
> handleLeaderChange[1] in 77 us
> (org.apache.kafka.controller.QuorumController:510)
> [2023-10-19 13:42:10,092] WARN [QuorumController id=0] Performing controller
> activation. The metadata log appears to be empty. Appending 1 bootstrap
> record(s) at metadata.version 3.4-IV0 from bootstrap source 'test'. Putting
> the controller into pre-migration mode. No metadata updates will be allowed
> until the ZK metadata has been migrated.
> (org.apache.kafka.controller.QuorumController:108)
> [2023-10-19 13:42:10,092] INFO [QuorumController id=0] Replayed a
> FeatureLevelRecord setting metadata version to 3.4-IV0
> (org.apache.kafka.controller.FeatureControlManager:400)
> [2023-10-19 13:42:10,092] INFO [QuorumController id=0] Replayed a
> ZkMigrationStateRecord changing the migration state from NONE to
> PRE_MIGRATION. (org.apache.kafka.controller.FeatureControlManager:421)
> [2023-10-19 13:42:10,092] DEBUG [QuorumController id=0] Got exception while
> running completeActivation[1](342832465). Invoking handleException.
> (org.apache.kafka.queue.KafkaEventQueue:132)
> org.apache.kafka.raft.errors.NotLeaderException: Append failed because the
> given epoch 0 is stale. Current leader epoch = 1
> at
> org.apache.kafka.metalog.LocalLogManager$SharedLogData.tryAppend(LocalLogManager.java:253)
> at
> org.apache.kafka.metalog.LocalLogManager$SharedLogData.tryAppend(LocalLogManager.java:233)
> at
> org.apache.kafka.metalog.LocalLogManager.scheduleAtomicAppend(LocalLogManager.java:795)
> at
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent$1.apply(QuorumController.java:839)
> at
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent$1.apply(QuorumController.java:816)
> at
> org.apache.kafka.controller.QuorumController.appendRecords(QuorumController.java:918)
> at
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:815)
> at
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
> at
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
> at
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
> at java.base/java.lang.Thread.run(Thread.java:833)
> [2023-10-19 13:42:10,092] INFO [QuorumController id=0] completeActivation[1]:
> event failed with NotLeaderException (treated as NotControllerException) at
> epoch 1 in 387 microseconds. Renouncing leadership and reverting to the last
> committed offset -1. (org.apache.kafka.controller.QuorumController:542)
> [2023-10-19 13:42:10,092] ERROR Encountered fatalFaultHandler fault:
> exception while renouncing leadership
> (org.apache.kafka.server.fault.MockFaultHandler:44)
> java.lang.IllegalArgumentException: Attempt to resign from epoch 1 which is
> larger than the current epoch 0
> at
> org.apache.kafka.metalog.LocalLogManager.resign(LocalLogManager.java:808)
> at
> org.apache.kafka.controller.QuorumController.renounce(QuorumController.java:1268)
> at
> org.apache.kafka.controller.QuorumController.handleEventException(QuorumController.java:545)
> at
> org.apache.kafka.controller.QuorumController.access$800(QuorumController.java:177)
> at
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:879)
> at
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:869)
> at
> org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:148)
> at
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:137)
> at
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
> at
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
> at java.base/java.lang.Thread.run(Thread.java:833)
> [2023-10-19 13:42:10,093] DEBUG [QuorumController id=0] Processed read
> migration state(176059074) in 4 us
> (org.apache.kafka.controller.QuorumController:510)
> [2023-10-19 13:42:10,093] INFO [QuorumController id=0]
> QuorumController#beginShutdown: shutting down event queue.
> (org.apache.kafka.queue.KafkaEventQueue:492)
> [2023-10-19 13:42:10,093] DEBUG [QuorumController id=0]
> KafkaEventQueue#close: Event queue is already shutting down.
> (org.apache.kafka.queue.KafkaEventQueue:489)
> [2023-10-19 13:42:10,093] INFO [QuorumController id=0] closed event queue.
> (org.apache.kafka.queue.KafkaEventQueue:514)
> [2023-10-19 13:42:10,093] INFO [LocalLogManager 0] beginShutdown: shutting
> down event queue. (org.apache.kafka.queue.KafkaEventQueue:492)
> [2023-10-19 13:42:10,093] DEBUG [LocalLogManager 0] Node 0: closing.
> (org.apache.kafka.metalog.LocalLogManager:653)
> [2023-10-19 13:42:10,094] DEBUG [LocalLogManager 0] beginShutdown: Event
> queue is already shutting down. (org.apache.kafka.queue.KafkaEventQueue:489)
> [2023-10-19 13:42:10,094] DEBUG [LocalLogManager 0] KafkaEventQueue#close:
> Event queue is already shutting down.
> (org.apache.kafka.queue.KafkaEventQueue:489)
> [2023-10-19 13:42:10,094] DEBUG [LocalLogManager 0] Node 0: beginning
> shutdown. (org.apache.kafka.metalog.LocalLogManager:637)
> [2023-10-19 13:42:10,094] DEBUG
> append(batch=LeaderChangeBatch(newLeader=LeaderAndEpoch(leaderId=OptionalInt.empty,
> epoch=2)), nextEndOffset=1)
> (org.apache.kafka.metalog.LocalLogManager$SharedLogData:276)
> [2023-10-19 13:42:10,094] INFO Elected new leader:
> LeaderAndEpoch(leaderId=OptionalInt[0], epoch=3).
> (org.apache.kafka.metalog.LocalLogManager$SharedLogData:300)
> [2023-10-19 13:42:10,094] DEBUG
> append(batch=LeaderChangeBatch(newLeader=LeaderAndEpoch(leaderId=OptionalInt[0],
> epoch=3)), nextEndOffset=2)
> (org.apache.kafka.metalog.LocalLogManager$SharedLogData:276)
> [2023-10-19 13:42:10,094] DEBUG [QuorumController id=0]
> MetaLogManager.Listener: Event queue is already shutting down.
> (org.apache.kafka.queue.KafkaEventQueue:489)
> [2023-10-19 13:42:10,094] INFO [LocalLogManager 0] closed event queue.
> (org.apache.kafka.queue.KafkaEventQueue:514)
> org.apache.kafka.server.fault.FaultHandlerException: fatalFaultHandler:
> exception while renouncing leadership: Attempt to resign from epoch 1 which
> is larger than the current epoch 0
> at
> org.apache.kafka.metalog.LocalLogManager.resign(LocalLogManager.java:808)
> at
> org.apache.kafka.controller.QuorumController.renounce(QuorumController.java:1268)
> at
> org.apache.kafka.controller.QuorumController.handleEventException(QuorumController.java:545)
> at
> org.apache.kafka.controller.QuorumController.access$800(QuorumController.java:177)
> at
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:879)
> at
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:869)
> at
> org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:148)
> at
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:137)
> at
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
> at
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
> at java.base/java.lang.Thread.run(Thread.java:833)
> Caused by: java.lang.IllegalArgumentException: Attempt to resign from epoch 1
> which is larger than the current epoch 0
> ... 11 more
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)