[ https://issues.apache.org/jira/browse/GEODE-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441885#comment-17441885 ]
Kamilla Aslami commented on GEODE-9738: --------------------------------------- The failure happens after the test executes stopLocator on vm2 (during the cleanup phase of the test) : {noformat} [vm2] [info 2021/10/14 00:24:14.501 UTC vm2 <RMI TCP Connection(1)-10.0.0.146> tid=0x23] Received method: org.apache.geode.internal.cache.rollingupgrade.RollingUpgradeDUnitTest$9.run with 0 args on object: @1553b8f5(0, "execute: stopLocator") [vm2] [info 2021/10/14 00:24:14.502 UTC vm2 <RMI TCP Connection(1)-10.0.0.146> tid=0x23] Stopping Locator on heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a.c.apachegeode-ci.internal/10.0.0.146:26909:26909 [vm2] [info 2021/10/14 00:24:14.504 UTC vm2 <Distribution Locator on heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a.c.apachegeode-ci.internal: 26909> tid=0x24] locator shutting down [vm2] [info 2021/10/14 00:24:14.507 UTC vm2 <Distribution Locator on heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a.c.apachegeode-ci.internal: 26909> tid=0x24] Closing locator's cache {noformat} FederatingManager thread on vm2 attempts to reconnect to vm0 and re-establishes the connection: {noformat} [vm2] [warn 2021/10/14 00:24:14.508 UTC vm2 <FederatingManager6> tid=0x73] Connection: Attempting reconnect to peer heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm0:584249)<v23>:54752 [vm2] [info 2021/10/14 00:24:14.514 UTC vm2 <FederatingManager6> tid=0x73] heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm2:580278:locator)<ec><v8>:54751: Successfully reestablished connection to peer heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm0:584249)<v23>:54752 {noformat} It looks like this thread doesn't know that the locator is shutting down and continues to do something. {noformat} [vm2] [info 2021/10/14 00:24:14.536 UTC vm2 <Distribution Locator on heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a.c.apachegeode-ci.internal: 26909> tid=0x24] GemFireCache[id = 481915635; isClosing = true; isShutDownAll = false; created = Thu Oct 14 00:23:52 UTC 2021; server = false; copyOnRead = false; lockLease = 120; lockTimeout = 60]: Now closing. [vm2] [warn 2021/10/14 00:24:14.628 UTC vm2 <FederatingManager6> tid=0x73] Attempting TCP/IP reconnect to heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm0:584249)<v23>:54752 [vm2] [info 2021/10/14 00:24:14.634 UTC vm2 <FederatingManager6> tid=0x73] Successfully reconnected to member heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm0:584249)<v23>:54752 [vm2] [info 2021/10/14 00:24:14.732 UTC vm2 <org.apache.geode.distributed.internal.InternalLocator$$Lambda$356/0x000000010046e440@5dcbb6c4> tid=0x77] Distribution Locator on heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a.c.apachegeode-ci.internal[26909] is stopped [vm2] [info 2021/10/14 00:24:14.735 UTC vm2 <Distribution Locator on heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a.c.apachegeode-ci.internal: 26909> tid=0x24] Shutting down DistributionManager heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm2:580278:locator)<ec><v8>:54751. [vm2] [warn 2021/10/14 00:24:14.735 UTC vm2 <FederatingManager6> tid=0x73] Attempting TCP/IP reconnect to heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm1:585440)<v25>:54753 [vm0] [info 2021/10/14 00:24:14.737 UTC vm0 <Pooled High Priority Message Processor 2> tid=0x3d] received leave request from heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm2:580278:locator)<ec><v8>:54751 for heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm2:580278:locator)<ec><v8>:54751 [vm2] [info 2021/10/14 00:24:14.737 UTC vm2 <FederatingManager6> tid=0x73] Successfully reconnected to member heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm1:585440)<v25>:54753 [vm0] [info 2021/10/14 00:24:14.738 UTC vm0 <Pooled High Priority Message Processor 2> tid=0x3d] JoinLeave.processLeaveRequestMessage(LeaveRequestMessage) invoked. isCoordinator=false; isStopping=false; cancelInProgress=false [vm0] [info 2021/10/14 00:24:14.738 UTC vm0 <Pooled High Priority Message Processor 2> tid=0x3d] Checking to see if I should become coordinator. My address is heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm0:584249)<v23>:54752 [vm0] [info 2021/10/14 00:24:14.738 UTC vm0 <Pooled High Priority Message Processor 2> tid=0x3d] View with removed and left members removed is View[heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm2:580278:locator)<ec><v8>:54751|26] members: [heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm0:584249)<v23>:54752\{lead}, heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm1:585440)<v25>:54753] and coordinator would be heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm0:584249)<v23>:54752 [vm0] [info 2021/10/14 00:24:14.739 UTC vm0 <Pooled High Priority Message Processor 2> tid=0x3d] This member is becoming the membership coordinator with address heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm0:584249)<v23>:54752 [vm1] [info 2021/10/14 00:24:14.738 UTC vm1 <Pooled High Priority Message Processor 2> tid=0x4b] received leave request from heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm2:580278:locator)<ec><v8>:54751 for heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm2:580278:locator)<ec><v8>:54751 {noformat} At some point this thread fails with "Message distribution has terminated" error: {noformat} [vm2] [fatal 2021/10/14 00:24:14.739 UTC vm2 <FederatingManager6> tid=0x73] Uncaught exception in thread Thread[FederatingManager6,5,RMI Runtime] [vm2] org.apache.geode.management.ManagementException: org.apache.geode.distributed.DistributedSystemDisconnectedException: Distribution manager on heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm2:580278:locator)<ec><v8>:54751 started at Thu Oct 14 00:23:52 UTC 2021: Message distribution has terminated [vm2] at org.apache.geode.management.internal.FederatingManager.addMemberArtifacts(FederatingManager.java:486) [vm2] at org.apache.geode.management.internal.FederatingManager$AddMemberTask.call(FederatingManager.java:596) [vm2] at org.apache.geode.management.internal.FederatingManager.lambda$addMember$1(FederatingManager.java:199) [vm2] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [vm2] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [vm2] at java.base/java.lang.Thread.run(Thread.java:829) [vm2] Caused by: org.apache.geode.distributed.DistributedSystemDisconnectedException: Distribution manager on heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm2:580278:locator)<ec><v8>:54751 started at Thu Oct 14 00:23:52 UTC 2021: Message distribution has terminated [vm2] at org.apache.geode.distributed.internal.ClusterDistributionManager$Stopper.generateCancelledException(ClusterDistributionManager.java:2885) [vm2] at org.apache.geode.distributed.internal.InternalDistributedSystem$Stopper.generateCancelledException(InternalDistributedSystem.java:1177) [vm2] at org.apache.geode.internal.cache.GemFireCacheImpl$Stopper.generateCancelledException(GemFireCacheImpl.java:5212) [vm2] at org.apache.geode.CancelCriterion.checkCancelInProgress(CancelCriterion.java:83) [vm2] at org.apache.geode.internal.cache.CreateRegionProcessor.initializeRegion(CreateRegionProcessor.java:121) [vm2] at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1164) [vm2] at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1095) [vm2] at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3108) [vm2] at org.apache.geode.internal.cache.InternalRegionFactory.create(InternalRegionFactory.java:78) [vm2] at org.apache.geode.management.internal.FederatingManager.addMemberArtifacts(FederatingManager.java:429) [vm2] ... 5 more {noformat} > CI failure: RollingUpgradeRollServersOnReplicatedRegion_dataserializable > failed with DistributedSystemDisconnectedException > --------------------------------------------------------------------------------------------------------------------------- > > Key: GEODE-9738 > URL: https://issues.apache.org/jira/browse/GEODE-9738 > Project: Geode > Issue Type: Bug > Components: membership, messaging > Affects Versions: 1.15.0 > Reporter: Kamilla Aslami > Assignee: Bill Burcham > Priority: Major > Labels: needsTriage > Attachments: controller.log, locator.log, vm0.log, vm1.log, vm2.log, > vm3.log > > > {noformat} > RollingUpgradeRollServersOnReplicatedRegion_dataserializable > > testRollServersOnReplicatedRegion_dataserializable[from_v1.13.4] FAILED > java.lang.AssertionError: Suspicious strings were written to the log > during this run. > Fix the strings or use IgnoredException.addIgnoredException to ignore. > ----------------------------------------------------------------------- > Found suspect string in 'dunit_suspect-vm2.log' at line 685 [fatal > 2021/10/14 00:24:14.739 UTC <FederatingManager6> tid=115] Uncaught exception > in thread Thread[FederatingManager6,5,RMI Runtime] > org.apache.geode.management.ManagementException: > org.apache.geode.distributed.DistributedSystemDisconnectedException: > Distribution manager on > heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm2:580278:locator)<ec><v8>:54751 > started at Thu Oct 14 00:23:52 UTC 2021: Message distribution has terminated > at > org.apache.geode.management.internal.FederatingManager.addMemberArtifacts(FederatingManager.java:486) > at > org.apache.geode.management.internal.FederatingManager$AddMemberTask.call(FederatingManager.java:596) > at > org.apache.geode.management.internal.FederatingManager.lambda$addMember$1(FederatingManager.java:199) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:829) > Caused by: > org.apache.geode.distributed.DistributedSystemDisconnectedException: > Distribution manager on > heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm2:580278:locator)<ec><v8>:54751 > started at Thu Oct 14 00:23:52 UTC 2021: Message distribution has terminated > at > org.apache.geode.distributed.internal.ClusterDistributionManager$Stopper.generateCancelledException(ClusterDistributionManager.java:2885) > at > org.apache.geode.distributed.internal.InternalDistributedSystem$Stopper.generateCancelledException(InternalDistributedSystem.java:1177) > at > org.apache.geode.internal.cache.GemFireCacheImpl$Stopper.generateCancelledException(GemFireCacheImpl.java:5212) > at > org.apache.geode.CancelCriterion.checkCancelInProgress(CancelCriterion.java:83) > at > org.apache.geode.internal.cache.CreateRegionProcessor.initializeRegion(CreateRegionProcessor.java:121) > at > org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1164) > at > org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1095) > at > org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3108) > at > org.apache.geode.internal.cache.InternalRegionFactory.create(InternalRegionFactory.java:78) > at > org.apache.geode.management.internal.FederatingManager.addMemberArtifacts(FederatingManager.java:429) > ... 5 more > at org.junit.Assert.fail(Assert.java:89) > at > org.apache.geode.test.dunit.internal.DUnitLauncher.closeAndCheckForSuspects(DUnitLauncher.java:420) > at > org.apache.geode.test.dunit.internal.DUnitLauncher.closeAndCheckForSuspects(DUnitLauncher.java:436) > at > org.apache.geode.test.dunit.internal.JUnit4DistributedTestCase.cleanupAllVms(JUnit4DistributedTestCase.java:551) > at > org.apache.geode.test.dunit.internal.JUnit4DistributedTestCase.doTearDownDistributedTestCase(JUnit4DistributedTestCase.java:498) > at > org.apache.geode.test.dunit.internal.JUnit4DistributedTestCase.tearDownDistributedTestCase(JUnit4DistributedTestCase.java:481) > at jdk.internal.reflect.GeneratedMethodAccessor11.invoke(Unknown > Source) > at > jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.RunAfters.invokeMethod(RunAfters.java:46) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > at org.junit.runners.ParentRunner.run(ParentRunner.java:413) > at org.junit.runners.Suite.runChild(Suite.java:128) > at org.junit.runners.Suite.runChild(Suite.java:27) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at org.junit.runners.ParentRunner.run(ParentRunner.java:413) > at org.junit.runner.JUnitCore.run(JUnitCore.java:137) > at org.junit.runner.JUnitCore.run(JUnitCore.java:115) > at > org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) > at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195) > at java.util.Iterator.forEachRemaining(Iterator.java:133) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at > java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) > at > java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) > at > java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at > java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497) > at > org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:82) > at > org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:73) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:108) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:67) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:52) > at > org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:96) > at > org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:75) > at > org.gradle.api.internal.tasks.testing.junitplatform.JUnitPlatformTestClassProcessor$CollectAllTestClassesExecutor.processAllTestClasses(JUnitPlatformTestClassProcessor.java:99) > at > org.gradle.api.internal.tasks.testing.junitplatform.JUnitPlatformTestClassProcessor$CollectAllTestClassesExecutor.access$000(JUnitPlatformTestClassProcessor.java:79) > at > org.gradle.api.internal.tasks.testing.junitplatform.JUnitPlatformTestClassProcessor.stop(JUnitPlatformTestClassProcessor.java:75) > at > org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.stop(SuiteTestClassProcessor.java:61) > at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > at > jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:566) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) > at > org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33) > at > org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94) > at com.sun.proxy.$Proxy2.stop(Unknown Source) > at > org.gradle.api.internal.tasks.testing.worker.TestWorker.stop(TestWorker.java:133) > at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > at > jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:566) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) > at > org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:182) > at > org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:164) > at > org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:414) > at > org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64) > at > org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at > org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56) > at java.lang.Thread.run(Thread.java:829) > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)