[
https://issues.apache.org/jira/browse/KAFKA-12892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke Chen updated KAFKA-12892:
------------------------------
Description:
In KAFKA-12866, we fixed the issue that Kafka required ZK root access even when
using a chroot. But after the PR was merged (build #183), the trunk build keeps
failing in at least one test group (mostly JDK 15 and Scala 2.13). The build
result says nothing useful:
{code:java}
> Task :core:integrationTest FAILED
[2021-06-04T03:19:18.974Z]
[2021-06-04T03:19:18.974Z] FAILURE: Build failed with an exception.
[2021-06-04T03:19:18.974Z]
[2021-06-04T03:19:18.974Z] * What went wrong:
[2021-06-04T03:19:18.974Z] Execution failed for task ':core:integrationTest'.
[2021-06-04T03:19:18.974Z] > Process 'Gradle Test Executor 128' finished with
non-zero exit value 1
[2021-06-04T03:19:18.974Z] This problem might be caused by incorrect test
process configuration.
[2021-06-04T03:19:18.974Z] Please refer to the test execution section in the
User Manual at
https://docs.gradle.org/7.0.2/userguide/java_testing.html#sec:test_execution
{code}
After investigation, I found that the tests fail because many
`InvalidACLException`s are thrown during the tests, e.g.:
{code:java}
GssapiAuthenticationTest > testServerNotFoundInKerberosDatabase() FAILED
[2021-06-04T02:25:45.419Z]
org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode =
InvalidACL for /config/topics/__consumer_offsets
[2021-06-04T02:25:45.419Z] at
org.apache.zookeeper.KeeperException.create(KeeperException.java:128)
[2021-06-04T02:25:45.419Z] at
org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
[2021-06-04T02:25:45.419Z] at
kafka.zookeeper.AsyncResponse.maybeThrow(ZooKeeperClient.scala:583)
[2021-06-04T02:25:45.419Z] at
kafka.zk.KafkaZkClient.createRecursive(KafkaZkClient.scala:1729)
[2021-06-04T02:25:45.419Z] at
kafka.zk.KafkaZkClient.createOrSet$1(KafkaZkClient.scala:366)
[2021-06-04T02:25:45.419Z] at
kafka.zk.KafkaZkClient.setOrCreateEntityConfigs(KafkaZkClient.scala:376)
[2021-06-04T02:25:45.419Z] at
kafka.zk.AdminZkClient.createTopicWithAssignment(AdminZkClient.scala:109)
[2021-06-04T02:25:45.419Z] at
kafka.zk.AdminZkClient.createTopic(AdminZkClient.scala:60)
[2021-06-04T02:25:45.419Z] at
kafka.utils.TestUtils$.$anonfun$createTopic$1(TestUtils.scala:357)
[2021-06-04T02:25:45.419Z] at
kafka.utils.TestUtils$.createTopic(TestUtils.scala:848)
[2021-06-04T02:25:45.419Z] at
kafka.utils.TestUtils$.createOffsetsTopic(TestUtils.scala:428)
[2021-06-04T02:25:45.419Z] at
kafka.api.IntegrationTestHarness.doSetup(IntegrationTestHarness.scala:109)
[2021-06-04T02:25:45.419Z] at
kafka.api.IntegrationTestHarness.setUp(IntegrationTestHarness.scala:84)
[2021-06-04T02:25:45.419Z] at
kafka.server.GssapiAuthenticationTest.setUp(GssapiAuthenticationTest.scala:68)
{code}
The log can be found
[here|https://ci-builds.apache.org/blue/rest/organizations/jenkins/pipelines/Kafka/pipelines/kafka/branches/trunk/runs/195/nodes/14/steps/145/log/?start=0].
After tracing back, I found the likely cause: KAFKA-12866 added a test,
testChrootExistsAndRootIsLocked, that locks root access in ZooKeeper, but
somehow the root is not unlocked after the test. Also, all the tests that
failed with InvalidACLException ran shortly after
testChrootExistsAndRootIsLocked. For example, testChrootExistsAndRootIsLocked
below completed at 02:24:30, and the failed test above ran at 02:25:45
(followed by more than 10 tests failing with the same InvalidACLException).
{code:java}
[2021-06-04T02:24:29.370Z] ZkClientAclTest > testChrootExistsAndRootIsLocked()
STARTED
[2021-06-04T02:24:30.321Z]
[2021-06-04T02:24:30.321Z] ZkClientAclTest > testChrootExistsAndRootIsLocked()
PASSED{code}
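To make the timing concrete, here is a small sketch that computes the gap
between the two log timestamps quoted above. The class and method names are
only illustrative; they are not part of Kafka or of the Jenkins tooling.

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative helper (hypothetical, not part of Kafka): measures how long
// after the ACL-locking test the first InvalidACLException was logged.
class AclFailureGap {
    // Returns the elapsed time between two ISO-8601 log timestamps.
    static Duration gap(String earlier, String later) {
        return Duration.between(Instant.parse(earlier), Instant.parse(later));
    }

    public static void main(String[] args) {
        // Timestamps copied from the log excerpts in this issue.
        Duration d = gap("2021-06-04T02:24:30.321Z", "2021-06-04T02:25:45.419Z");
        System.out.println(d.toMillis() + " ms"); // prints "75098 ms"
    }
}
```

So the first failure appears about 75 seconds after
testChrootExistsAndRootIsLocked passed, which fits the suspicion but does not
prove it.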
!image-2021-06-04-21-05-57-222.png|width=489,height=1111!
We should investigate further to see how to improve the test so that it does
not break the build. Until then, we can disable the test. Thanks.
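One possible direction, assuming the leftover root ACL really is the cause, is
to save the root ACL before locking it and restore it in a finally block, so
that later tests see the original state even if the test body throws. The
sketch below uses an in-memory map as a hypothetical stand-in for ZooKeeper;
none of the names are real Kafka or ZooKeeper APIs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an in-memory map stands in for ZooKeeper's ACL store.
class RootAclRestoreSketch {
    // path -> ACL entries (plain strings here, only for illustration)
    static final Map<String, List<String>> acls = new HashMap<>();

    // Locks the root with the given ACL, runs the test body, and always
    // restores the previous root ACL afterwards.
    static void withLockedRoot(List<String> lockedAcl, Runnable testBody) {
        List<String> original = acls.get("/");
        acls.put("/", lockedAcl);
        try {
            testBody.run();
        } finally {
            if (original == null) {
                acls.remove("/");
            } else {
                acls.put("/", original);
            }
        }
    }

    public static void main(String[] args) {
        acls.put("/", List.of("world:anyone:cdrwa"));
        withLockedRoot(List.of("sasl:kafka:cdrwa"), () -> {
            // the chroot checks would run here while the root is locked
        });
        System.out.println(acls.get("/")); // prints "[world:anyone:cdrwa]"
    }
}
```

Whether the real test can restore the ACL this way depends on how the shared
ZooKeeper instance is set up in the test harness, which still needs to be
checked.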
> InvalidACLException thrown in tests caused jenkins build unstable
> -----------------------------------------------------------------
>
> Key: KAFKA-12892
> URL: https://issues.apache.org/jira/browse/KAFKA-12892
> Project: Kafka
> Issue Type: Bug
> Reporter: Luke Chen
> Priority: Major
> Attachments: image-2021-06-04-21-05-57-222.png
--
This message was sent by Atlassian Jira
(v8.3.4#803005)