[
https://issues.apache.org/jira/browse/KAFKA-18981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937328#comment-17937328
]
PoAn Yang commented on KAFKA-18981:
-----------------------------------
The root cause of this flaky test: if broker 1's heartbeat isn't processed
promptly and the broker is fenced after the topic creation, broker 1 cannot be
in the ISR. The session timeout is 300ms. The following logs are from
[https://develocity.apache.org/s/tjs4dzxiphmwc/tests/task/:metadata:test/details/org.apache.kafka.controller.QuorumControllerTest/testMinIsrUpdateWithElr()/1/output]:
{noformat}
...
[2025-03-18 07:14:45,121] DEBUG [QuorumController id=0] Processed
processBrokerHeartbeat(1474681863) in 140186 us
(org.apache.kafka.controller.QuorumController:542) <-- heartbeat for broker 1
...
[2025-03-18 07:14:45,147] DEBUG [QuorumController id=0] Processed
processBrokerHeartbeat(399642596) in 10723 us
(org.apache.kafka.controller.QuorumController:542) <-- heartbeat for broker 2
...
[2025-03-18 07:14:45,172] DEBUG [QuorumController id=0] Processed
processBrokerHeartbeat(168471181) in 13236 us
(org.apache.kafka.controller.QuorumController:542) <-- heartbeat for broker 3
...
[2025-03-18 07:14:45,288] INFO [QuorumController id=0] CreateTopics result(s):
CreatableTopic(name='foo', ...)
(org.apache.kafka.metalog.LocalLogManager$SharedLogData:258) <-- topic creation
...
[2025-03-18 07:14:45,455] INFO [QuorumController id=0] Fencing broker 1 at
epoch 6 because its session has timed out.
(org.apache.kafka.controller.ReplicationControlManager:1693) <-- broker 1
session timeout
{noformat}
At 07:14:45,121, the heartbeat for broker 1 is processed and the broker is
active. However, no further heartbeat for it arrives before 07:14:45,421 (the
last heartbeat plus the 300ms session timeout), so it's fenced at 07:14:45,455.
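The timing can be checked with a few lines of Java. This is only a sketch of the arithmetic: SessionDeadline and its methods are hypothetical names, not Kafka code, and it assumes the session clock starts at the logged heartbeat timestamp and that the session counts as expired once the deadline has passed.

```java
import java.time.Duration;
import java.time.LocalTime;

// Hypothetical helper for illustration only; not part of Kafka.
final class SessionDeadline {
    // Latest moment by which the next heartbeat must have arrived.
    static LocalTime deadline(LocalTime lastHeartbeat, Duration sessionTimeout) {
        return lastHeartbeat.plus(sessionTimeout);
    }

    // True when `now` is strictly after the deadline (assumed expiry rule).
    static boolean expiredAt(LocalTime lastHeartbeat, Duration sessionTimeout, LocalTime now) {
        return now.isAfter(deadline(lastHeartbeat, sessionTimeout));
    }

    public static void main(String[] args) {
        // Timestamps taken from the log excerpt above.
        LocalTime lastHeartbeat = LocalTime.parse("07:14:45.121");
        LocalTime topicCreated = LocalTime.parse("07:14:45.288");
        LocalTime fencedAt = LocalTime.parse("07:14:45.455");
        Duration sessionTimeout = Duration.ofMillis(300);

        System.out.println("deadline: " + deadline(lastHeartbeat, sessionTimeout));
        System.out.println("expired at topic creation: "
            + expiredAt(lastHeartbeat, sessionTimeout, topicCreated));
        System.out.println("expired at fencing: "
            + expiredAt(lastHeartbeat, sessionTimeout, fencedAt));
    }
}
```

Under these assumptions, broker 1 is still within its session when the topic is created and past its deadline by the time the fencing message is logged, which matches the order of events in the log.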
We can reproduce this by adding Thread.sleep(300) just after active.createTopics
[0]. IMO, to address the root cause, we can use a separate thread to send
heartbeat requests, so that broker 1 has no chance of being fenced.
[0]
https://github.com/apache/kafka/blob/1ded681684e771b16aa98ae751f39b9816345a83/metadata/src/test/java/org/apache/kafka/controller/QuorumControllerTest.java#L663-L665
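A minimal sketch of the "separate heartbeat thread" idea, using only the JDK. BackgroundHeartbeater is a hypothetical helper, not something that exists in QuorumControllerTest; the Runnable passed in stands for whatever heartbeat call the test already makes for broker 1 (for example the processBrokerHeartbeat request seen in the log), and the period is an assumption that should stay well below the 300ms session timeout.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical test helper: runs a heartbeat action at a fixed period on its
// own daemon thread until it is closed.
final class BackgroundHeartbeater implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "test-heartbeater");
            t.setDaemon(true); // never keep the JVM alive because of this thread
            return t;
        });

    BackgroundHeartbeater(Runnable sendHeartbeat, long periodMs) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                sendHeartbeat.run();
            } catch (RuntimeException e) {
                // Swallow so that one failed heartbeat does not cancel the schedule.
            }
        }, 0, periodMs, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
        try {
            scheduler.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger beats = new AtomicInteger();
        // Stand-in for the real heartbeat call; here it only counts invocations.
        try (BackgroundHeartbeater hb = new BackgroundHeartbeater(beats::incrementAndGet, 50)) {
            Thread.sleep(400); // simulates slow test steps such as topic creation
        }
        System.out.println("heartbeats sent while the test was busy: " + beats.get());
    }
}
```

In the test, the try-with-resources block would wrap the section between topic creation and the ISR assertion, so heartbeats keep flowing even if the main test thread stalls for longer than the session timeout.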
> Fix flaky QuorumControllerTest.testMinIsrUpdateWithElr
> ------------------------------------------------------
>
> Key: KAFKA-18981
> URL: https://issues.apache.org/jira/browse/KAFKA-18981
> Project: Kafka
> Issue Type: Improvement
> Reporter: Chia-Ping Tsai
> Assignee: Chia-Ping Tsai
> Priority: Major
>
> {code:java}
> org.opentest4j.AssertionFailedError: PartitionRegistration(replicas=[3, 1,
> 2], directories=[vFsJEjZDRlONBTr8543h6A, gQ7fQdQaTFmyno0jsZ2LVA,
> 4OWYkhmOTO2eaTFzTyiMEg], isr=[], removingReplicas=[], addingReplicas=[],
> elr=[3], lastKnownElr=[3], leader=-1, leaderRecoveryState=RECOVERED,
> leaderEpoch=1, partitionEpoch=3) ==> array lengths differ, expected: <1> but
> was: <0>
> at
> app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
> at
> app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
> at
> app//org.junit.jupiter.api.AssertArrayEquals.assertArraysHaveSameLength(AssertArrayEquals.java:428)
> at
> app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:237)
> at
> app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:87)
> at
> app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1290)
> at
> app//org.apache.kafka.controller.QuorumControllerTest.testMinIsrUpdateWithElr(QuorumControllerTest.java:699)
> at [email protected]/java.lang.reflect.Method.invoke(Method.java:569)
> at [email protected]/java.util.ArrayList.forEach(ArrayList.java:1511)
> at [email protected]/java.util.ArrayList.forEach(ArrayList.java:1511)
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)