[ https://issues.apache.org/jira/browse/GEODE-9872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456612#comment-17456612 ]
ASF subversion and git services commented on GEODE-9872: -------------------------------------------------------- Commit 68b9080e84054f059b8c3e9b4aff9034fb302353 in geode's branch refs/heads/develop from Dale Emery [ https://gitbox.apache.org/repos/asf?p=geode.git;h=68b9080 ] GEODE-9872: Make test framework code assign ports (#7176) * GEODE-9872: Make test framework code assign ports PROBLEM `DistTXPersistentDebugDUnitTest ` failed in CI because it accidentally connected to a locator from another test (`ClusterConfigLocatorRestartDUnitTest`). CAUSE `ClusterConfigLocatorRestartDUnitTest` attempts to restart a locator on a port in the ephemeral port range. Here is the sequence of events: 1. `ClusterConfigLocatorRestartDUnitTest ` started a locator on an ephemeral port. In this CI run it got port 37877. 2. `ClusterConfigLocatorRestartDUnitTest` stopped the locator on port 37877. 3. `DistTXPersistentDebugDUnitTest` started a locator on an ephemeral port. In this CI run it got 37877. 4. `ClusterConfigLocatorRestartDUnitTest ` attempted to restart the locator on port 37877. That port was already in use in `DistTXPersistentDebugDUnitTest`'s locator, and as a result the two tests became entangled. CONTRIBUTING FACTORS `DistTXPersistentDebugDUnitTest` uses `DUnitLauncher` to start its locator. By default, `DUnitLauncher` starts its locator on an ephemeral port. `ClusterConfigLocatorRestartDUnitTest` uses `ClusterStartupRule` to start several locators. By default, `ClusterStartupRule` starts each locator on an ephemeral port. SOLUTION Change `DUnitLauncher` and `ClusterStartupRule` to assign locator ports via `AvailablePortHelper` if the test does not specify a particular port. I considered changing only `ClusterConfigLogatorRestartDUnitTest` to assign the port that it intends to reuse. But: - That would fix only this one test, though an unknown number of tests similarly attempt to reuse ports assigned by framework code. Numerous of those tests have already been changed to assign ports explicitly, but an unknown number remain. - It is quite reasonable for this test and others to assume that, if the test framework assigns a port on the test's behalf, then the test will enjoy exclusive use of that port for the entire life of the test. I think the key problem is not that tests make this assumption, but that the framework code violates it. Changing the test framework classes that tacitly assign ports (`DUnitLauncher` and `ClusterStartupRule`) makes them behave in a way that tests expect. * Add new port var to dunit sanctioned serializables > DistTXPersistentDebugDUnitTest tests fail because "cluster configuration > service not available" > ----------------------------------------------------------------------------------------------- > > Key: GEODE-9872 > URL: https://issues.apache.org/jira/browse/GEODE-9872 > Project: Geode > Issue Type: Bug > Components: tests > Reporter: Bill Burcham > Assignee: Dale Emery > Priority: Major > Labels: GeodeOperationAPI, pull-request-available > > I suspect this failure is due to something in the test framework, or perhaps > one or more tests failing to manage ports correctly, allowing two or more > tests to interfere with one another. > In this distributed test: > [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/388] > we see two failures. Here's the first full stack trace: > > > {code:java} > [error 2021/12/04 20:40:53.796 UTC <RMI TCP Connection(1)-10.0.2.28> > tid=33] org.apache.geode.GemFireConfigException: cluster configuration > service not available > at > org.junit.vintage.engine.execution.TestRun.getStoredResultOrSuccessful(TestRun.java:196) > at > org.junit.vintage.engine.execution.RunListenerAdapter.fireExecutionFinished(RunListenerAdapter.java:226) > at > org.junit.vintage.engine.execution.RunListenerAdapter.testFinished(RunListenerAdapter.java:192) > at > org.junit.vintage.engine.execution.RunListenerAdapter.testFinished(RunListenerAdapter.java:79) > at > org.junit.runner.notification.SynchronizedRunListener.testFinished(SynchronizedRunListener.java:87) > at > org.junit.runner.notification.RunNotifier$9.notifyListener(RunNotifier.java:225) > at > org.junit.runner.notification.RunNotifier$SafeNotifier.run(RunNotifier.java:72) > at > org.junit.runner.notification.RunNotifier.fireTestFinished(RunNotifier.java:222) > at > org.junit.internal.runners.model.EachTestNotifier.fireTestFinished(EachTestNotifier.java:38) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:372) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at org.junit.runners.ParentRunner.run(ParentRunner.java:413) > at org.junit.runner.JUnitCore.run(JUnitCore.java:137) > at org.junit.runner.JUnitCore.run(JUnitCore.java:115) > at > org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) > at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at > java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) > at > java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) > at > java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at > java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485) > at > org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:82) > at > org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:73) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:108) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:67) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:52) > at > org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:96) > at > org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:75) > at > org.gradle.api.internal.tasks.testing.junitplatform.JUnitPlatformTestClassProcessor$CollectAllTestClassesExecutor.processAllTestClasses(JUnitPlatformTestClassProcessor.java:99) > at > org.gradle.api.internal.tasks.testing.junitplatform.JUnitPlatformTestClassProcessor$CollectAllTestClassesExecutor.access$000(JUnitPlatformTestClassProcessor.java:79) > at > org.gradle.api.internal.tasks.testing.junitplatform.JUnitPlatformTestClassProcessor.stop(JUnitPlatformTestClassProcessor.java:75) > at > org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.stop(SuiteTestClassProcessor.java:61) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) > at > org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33) > at > org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94) > at com.sun.proxy.$Proxy2.stop(Unknown Source) > at > org.gradle.api.internal.tasks.testing.worker.TestWorker.stop(TestWorker.java:133) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) > at > org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:182) > at > org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:164) > at > org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:414) > at > org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64) > at > org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at > org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56) > at java.lang.Thread.run(Thread.java:748) > Caused by: > org.apache.geode.test.dunit.RMIException: While invoking > org.apache.geode.disttx.DistTXDebugDUnitTest$$Lambda$267/1763955613.run in VM > 0 running on Host > heavy-lifter-2278eb98-196d-5ad5-aae6-cb3cdb6f4c62.c.apachegeode-ci.internal > with 4 VMs > at > org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:631) > at org.apache.geode.test.dunit.VM.invoke(VM.java:448) > at > org.apache.geode.disttx.DistTXDebugDUnitTest.createCacheInAllVms(DistTXDebugDUnitTest.java:91) > at > org.apache.geode.disttx.DistTXPersistentDebugDUnitTest.testBasicDistributedTX(DistTXPersistentDebugDUnitTest.java:99) > Caused by: > org.apache.geode.GemFireConfigException: cluster configuration > service not available > at > org.apache.geode.internal.cache.GemFireCacheImpl.requestSharedConfiguration(GemFireCacheImpl.java:1342) > at > org.apache.geode.internal.cache.GemFireCacheImpl.<init>(GemFireCacheImpl.java:970) > at > org.apache.geode.internal.cache.GemFireCacheImpl.<init>(GemFireCacheImpl.java:870) > at > org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:187) > at > org.apache.geode.internal.cache.CacheFactoryStatics.create(CacheFactoryStatics.java:61) > at > org.apache.geode.cache.CacheFactory.create(CacheFactory.java:352) > at > org.apache.geode.test.dunit.cache.internal.JUnit4CacheTestCase.createCache(JUnit4CacheTestCase.java:120) > at > org.apache.geode.test.dunit.cache.internal.JUnit4CacheTestCase.getCache(JUnit4CacheTestCase.java:222) > at > org.apache.geode.test.dunit.cache.internal.JUnit4CacheTestCase.getCache(JUnit4CacheTestCase.java:209) > at > org.apache.geode.test.dunit.cache.internal.JUnit4CacheTestCase.getCache(JUnit4CacheTestCase.java:196) > at > org.apache.geode.disttx.DistTXDebugDUnitTest.createCacheInVm(DistTXDebugDUnitTest.java:87) > at > org.apache.geode.disttx.DistTXDebugDUnitTest.lambda$createCacheInAllVms$bb17a952$1(DistTXDebugDUnitTest.java:91) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.geode.test.dunit.internal.MethodInvoker.executeObject(MethodInvoker.java:123) > at > org.apache.geode.test.dunit.internal.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:78) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357) > at sun.rmi.transport.Transport$1.run(Transport.java:200) > at sun.rmi.transport.Transport$1.run(Transport.java:197) > at java.security.AccessController.doPrivileged(Native Method) > at sun.rmi.transport.Transport.serviceCall(Transport.java:196) > at > sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688) > at java.security.AccessController.doPrivileged(Native Method) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: > > org.apache.geode.internal.config.ClusterConfigurationNotAvailableException: > Unable to retrieve cluster configuration from the locator. > at > org.apache.geode.internal.cache.ClusterConfigurationLoader.requestConfigurationFromLocators(ClusterConfigurationLoader.java:302) > at > org.apache.geode.internal.cache.GemFireCacheImpl.requestSharedConfiguration(GemFireCacheImpl.java:1310) > ... 34 more {code} > > Second stack trace starts with: > > {code:java} > DistTXPersistentDebugDUnitTest > testTXPRRR2_putall FAILED > org.gradle.internal.exceptions.DefaultMultiCauseException: Multiple > Failures (2 failures) > org.apache.geode.test.dunit.RMIException: While invoking > org.apache.geode.disttx.DistTXDebugDUnitTest$$Lambda$267/1763955613.run in VM > 0 running on Host > heavy-lifter-2278eb98-196d-5ad5-aae6-cb3cdb6f4c62.c.apachegeode-ci.internal > with 4 VMs > java.lang.AssertionError: Suspicious strings were written to the log > during this run. > Fix the strings or use IgnoredException.addIgnoredException to ignore. > ----------------------------------------------------------------------- > Found suspect string in 'dunit_suspect-vm0.log' at line 446 > [error 2021/12/04 20:41:28.201 UTC <RMI TCP Connection(1)-10.0.2.28> > tid=33] org.apache.geode.GemFireConfigException: A server must use cluster > configuration when joining a secured cluster. {code} > > Looking at stdout for the first failing test (accessed from > [http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0705/test-results/distributedTest/1638653700/|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0705/test-results/distributedTest/1638653700/] > ) we see the first view, inexplicably contains two locators (I don't see any > locators at all configured by {color:#000000}DistTXPersistentDebugDUnitTest > or its base classes so I'd expect there to be at most one locator):{color} > {code:java} > [locator] [info 2021/12/04 20:39:49.735 UTC <Geode Membership View Creator> > tid=0x3a] preparing new view > View[heavy-lifter-2278eb98-196d-5ad5-aae6-cb3cdb6f4c62(658614:locator)<ec><v0>:45166|1] > members: > [heavy-lifter-2278eb98-196d-5ad5-aae6-cb3cdb6f4c62(658614:locator)<ec><v0>:45166, > > heavy-lifter-2278eb98-196d-5ad5-aae6-cb3cdb6f4c62(locator-1:607836:locator)<ec><v1>:41005] > {code} > Here are all the log messages pertaining to launching JVMs, notice none of > these mentions locator-1's pid 607836 so that locator seems to have infected > this test run from some other test run: > {code:java} > [locator] [info 2021/12/04 20:39:47.322 UTC <main> tid=0x1] VM-2 is > launching with PID 658614 > [vm1] [info 2021/12/04 20:39:51.410 UTC <main> tid=0x1] VM1 is launching > with PID 660767 > [vm0] [info 2021/12/04 20:39:51.424 UTC <main> tid=0x1] VM0 is launching > with PID 660707 > [vm2] [info 2021/12/04 20:39:51.435 UTC <main> tid=0x1] VM2 is launching > with PID 660806 > [vm3] [info 2021/12/04 20:39:51.438 UTC <main> tid=0x1] VM3 is launching > with PID 660832 {code} > The I see that the locator that is the coordinator has cluster config > disabled: > {code:java} > [locator] [info 2021/12/04 20:39:50.379 UTC <RMI TCP > Connection(1)-10.0.2.28> tid=0x21] Cluster configuration service is disabled > {code} > View 2 adds a server member with another PID not mentioned above. > I wonder if this test meant to run with no cluster config service. But > somehow the test saw a locator from another test and tried (and failed) to > access cluster config on that other locator. -- This message was sent by Atlassian Jira (v8.20.1#820001)