[ 
https://issues.apache.org/jira/browse/GEODE-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441885#comment-17441885
 ] 

Kamilla Aslami commented on GEODE-9738:
---------------------------------------

The failure happens after the test executes stopLocator on vm2 (during the 
cleanup phase of the test) :
{noformat}
[vm2] [info 2021/10/14 00:24:14.501 UTC vm2 <RMI TCP 
Connection(1)-10.0.0.146> tid=0x23] Received method: 
org.apache.geode.internal.cache.rollingupgrade.RollingUpgradeDUnitTest$9.run 
with 0 args on object: @1553b8f5(0, "execute: stopLocator")

[vm2] [info 2021/10/14 00:24:14.502 UTC vm2 <RMI TCP 
Connection(1)-10.0.0.146> tid=0x23] Stopping Locator on 
heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a.c.apachegeode-ci.internal/10.0.0.146:26909:26909

[vm2] [info 2021/10/14 00:24:14.504 UTC vm2 <Distribution Locator on 
heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a.c.apachegeode-ci.internal: 
26909> tid=0x24] locator shutting down

[vm2] [info 2021/10/14 00:24:14.507 UTC vm2 <Distribution Locator on 
heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a.c.apachegeode-ci.internal: 
26909> tid=0x24] Closing locator's cache
{noformat}

FederatingManager thread on vm2 attempts to reconnect to vm0 and re-establishes 
the connection:
{noformat}
[vm2] [warn 2021/10/14 00:24:14.508 UTC vm2 <FederatingManager6> 
tid=0x73] Connection: Attempting reconnect to peer 
heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm0:584249)<v23>:54752

[vm2] [info 2021/10/14 00:24:14.514 UTC vm2 <FederatingManager6> 
tid=0x73] 
heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm2:580278:locator)<ec><v8>:54751:
 Successfully reestablished connection to peer 
heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm0:584249)<v23>:54752
{noformat}
It looks like this thread doesn't know that the locator is shutting down and 
continues to do something.
{noformat}
[vm2] [info 2021/10/14 00:24:14.536 UTC vm2 <Distribution Locator on 
heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a.c.apachegeode-ci.internal: 
26909> tid=0x24] GemFireCache[id = 481915635; isClosing = true; 
isShutDownAll = false; created = Thu Oct 14 00:23:52 UTC 2021; server = false; 
copyOnRead = false; lockLease = 120; lockTimeout = 60]: Now closing.

[vm2] [warn 2021/10/14 00:24:14.628 UTC vm2 <FederatingManager6> 
tid=0x73] Attempting TCP/IP reconnect to  
heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm0:584249)<v23>:54752

[vm2] [info 2021/10/14 00:24:14.634 UTC vm2 <FederatingManager6> 
tid=0x73] Successfully reconnected to member 
heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm0:584249)<v23>:54752

[vm2] [info 2021/10/14 00:24:14.732 UTC vm2 
<org.apache.geode.distributed.internal.InternalLocator$$Lambda$356/0x000000010046e440@5dcbb6c4>
 tid=0x77] Distribution Locator on 
heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a.c.apachegeode-ci.internal[26909]
 is stopped

[vm2] [info 2021/10/14 00:24:14.735 UTC vm2 <Distribution Locator on 
heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a.c.apachegeode-ci.internal: 
26909> tid=0x24] Shutting down DistributionManager 
heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm2:580278:locator)<ec><v8>:54751.
 

[vm2] [warn 2021/10/14 00:24:14.735 UTC vm2 <FederatingManager6> 
tid=0x73] Attempting TCP/IP reconnect to  
heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm1:585440)<v25>:54753

[vm0] [info 2021/10/14 00:24:14.737 UTC vm0 <Pooled High Priority Message 
Processor 2> tid=0x3d] received leave request from 
heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm2:580278:locator)<ec><v8>:54751
 for 
heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm2:580278:locator)<ec><v8>:54751

[vm2] [info 2021/10/14 00:24:14.737 UTC vm2 <FederatingManager6> 
tid=0x73] Successfully reconnected to member 
heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm1:585440)<v25>:54753

[vm0] [info 2021/10/14 00:24:14.738 UTC vm0 <Pooled High Priority Message 
Processor 2> tid=0x3d] 
JoinLeave.processLeaveRequestMessage(LeaveRequestMessage) invoked.  
isCoordinator=false; isStopping=false; cancelInProgress=false

[vm0] [info 2021/10/14 00:24:14.738 UTC vm0 <Pooled High Priority Message 
Processor 2> tid=0x3d] Checking to see if I should become coordinator.  My 
address is 
heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm0:584249)<v23>:54752

[vm0] [info 2021/10/14 00:24:14.738 UTC vm0 <Pooled High Priority Message 
Processor 2> tid=0x3d] View with removed and left members removed is 
View[heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm2:580278:locator)<ec><v8>:54751|26]
 members: 
[heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm0:584249)<v23>:54752\{lead},
 
heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm1:585440)<v25>:54753] 
and coordinator would be 
heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm0:584249)<v23>:54752

[vm0] [info 2021/10/14 00:24:14.739 UTC vm0 <Pooled High Priority Message 
Processor 2> tid=0x3d] This member is becoming the membership coordinator 
with address 
heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm0:584249)<v23>:54752

[vm1] [info 2021/10/14 00:24:14.738 UTC vm1 <Pooled High Priority Message 
Processor 2> tid=0x4b] received leave request from 
heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm2:580278:locator)<ec><v8>:54751
 for 
heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm2:580278:locator)<ec><v8>:54751
{noformat}

At some point this thread fails with "Message distribution has terminated" 
error:
{noformat}
[vm2] [fatal 2021/10/14 00:24:14.739 UTC vm2 <FederatingManager6> 
tid=0x73] Uncaught exception in thread Thread[FederatingManager6,5,RMI Runtime]
[vm2] org.apache.geode.management.ManagementException: 
org.apache.geode.distributed.DistributedSystemDisconnectedException: 
Distribution manager on 
heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm2:580278:locator)<ec><v8>:54751
 started at Thu Oct 14 00:23:52 UTC 2021: Message distribution has terminated
[vm2]     at 
org.apache.geode.management.internal.FederatingManager.addMemberArtifacts(FederatingManager.java:486)
[vm2]     at 
org.apache.geode.management.internal.FederatingManager$AddMemberTask.call(FederatingManager.java:596)
[vm2]     at 
org.apache.geode.management.internal.FederatingManager.lambda$addMember$1(FederatingManager.java:199)
[vm2]     at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[vm2]     at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[vm2]     at java.base/java.lang.Thread.run(Thread.java:829)
[vm2] Caused by: 
org.apache.geode.distributed.DistributedSystemDisconnectedException: 
Distribution manager on 
heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm2:580278:locator)<ec><v8>:54751
 started at Thu Oct 14 00:23:52 UTC 2021: Message distribution has terminated
[vm2]     at 
org.apache.geode.distributed.internal.ClusterDistributionManager$Stopper.generateCancelledException(ClusterDistributionManager.java:2885)
[vm2]     at 
org.apache.geode.distributed.internal.InternalDistributedSystem$Stopper.generateCancelledException(InternalDistributedSystem.java:1177)
[vm2]     at 
org.apache.geode.internal.cache.GemFireCacheImpl$Stopper.generateCancelledException(GemFireCacheImpl.java:5212)
[vm2]     at 
org.apache.geode.CancelCriterion.checkCancelInProgress(CancelCriterion.java:83)
[vm2]     at 
org.apache.geode.internal.cache.CreateRegionProcessor.initializeRegion(CreateRegionProcessor.java:121)
[vm2]     at 
org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1164)
[vm2]     at 
org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1095)
[vm2]     at 
org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3108)
[vm2]     at 
org.apache.geode.internal.cache.InternalRegionFactory.create(InternalRegionFactory.java:78)
[vm2]     at 
org.apache.geode.management.internal.FederatingManager.addMemberArtifacts(FederatingManager.java:429)
[vm2]     ... 5 more
{noformat}

> CI failure: RollingUpgradeRollServersOnReplicatedRegion_dataserializable 
> failed with DistributedSystemDisconnectedException
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-9738
>                 URL: https://issues.apache.org/jira/browse/GEODE-9738
>             Project: Geode
>          Issue Type: Bug
>          Components: membership, messaging
>    Affects Versions: 1.15.0
>            Reporter: Kamilla Aslami
>            Assignee: Bill Burcham
>            Priority: Major
>              Labels: needsTriage
>         Attachments: controller.log, locator.log, vm0.log, vm1.log, vm2.log, 
> vm3.log
>
>
> {noformat}
> RollingUpgradeRollServersOnReplicatedRegion_dataserializable > 
> testRollServersOnReplicatedRegion_dataserializable[from_v1.13.4] FAILED
>     java.lang.AssertionError: Suspicious strings were written to the log 
> during this run.
>     Fix the strings or use IgnoredException.addIgnoredException to ignore.
>     -----------------------------------------------------------------------
>     Found suspect string in 'dunit_suspect-vm2.log' at line 685    [fatal 
> 2021/10/14 00:24:14.739 UTC <FederatingManager6> tid=115] Uncaught exception 
> in thread Thread[FederatingManager6,5,RMI Runtime]
>     org.apache.geode.management.ManagementException: 
> org.apache.geode.distributed.DistributedSystemDisconnectedException: 
> Distribution manager on 
> heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm2:580278:locator)<ec><v8>:54751
>  started at Thu Oct 14 00:23:52 UTC 2021: Message distribution has terminated
>         at 
> org.apache.geode.management.internal.FederatingManager.addMemberArtifacts(FederatingManager.java:486)
>         at 
> org.apache.geode.management.internal.FederatingManager$AddMemberTask.call(FederatingManager.java:596)
>         at 
> org.apache.geode.management.internal.FederatingManager.lambda$addMember$1(FederatingManager.java:199)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at java.base/java.lang.Thread.run(Thread.java:829)
>     Caused by: 
> org.apache.geode.distributed.DistributedSystemDisconnectedException: 
> Distribution manager on 
> heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm2:580278:locator)<ec><v8>:54751
>  started at Thu Oct 14 00:23:52 UTC 2021: Message distribution has terminated
>         at 
> org.apache.geode.distributed.internal.ClusterDistributionManager$Stopper.generateCancelledException(ClusterDistributionManager.java:2885)
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem$Stopper.generateCancelledException(InternalDistributedSystem.java:1177)
>         at 
> org.apache.geode.internal.cache.GemFireCacheImpl$Stopper.generateCancelledException(GemFireCacheImpl.java:5212)
>         at 
> org.apache.geode.CancelCriterion.checkCancelInProgress(CancelCriterion.java:83)
>         at 
> org.apache.geode.internal.cache.CreateRegionProcessor.initializeRegion(CreateRegionProcessor.java:121)
>         at 
> org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1164)
>         at 
> org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1095)
>         at 
> org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3108)
>         at 
> org.apache.geode.internal.cache.InternalRegionFactory.create(InternalRegionFactory.java:78)
>         at 
> org.apache.geode.management.internal.FederatingManager.addMemberArtifacts(FederatingManager.java:429)
>         ... 5 more
>         at org.junit.Assert.fail(Assert.java:89)
>         at 
> org.apache.geode.test.dunit.internal.DUnitLauncher.closeAndCheckForSuspects(DUnitLauncher.java:420)
>         at 
> org.apache.geode.test.dunit.internal.DUnitLauncher.closeAndCheckForSuspects(DUnitLauncher.java:436)
>         at 
> org.apache.geode.test.dunit.internal.JUnit4DistributedTestCase.cleanupAllVms(JUnit4DistributedTestCase.java:551)
>         at 
> org.apache.geode.test.dunit.internal.JUnit4DistributedTestCase.doTearDownDistributedTestCase(JUnit4DistributedTestCase.java:498)
>         at 
> org.apache.geode.test.dunit.internal.JUnit4DistributedTestCase.tearDownDistributedTestCase(JUnit4DistributedTestCase.java:481)
>         at jdk.internal.reflect.GeneratedMethodAccessor11.invoke(Unknown 
> Source)
>         at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:566)
>         at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>         at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>         at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>         at 
> org.junit.internal.runners.statements.RunAfters.invokeMethod(RunAfters.java:46)
>         at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
>         at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
>         at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>         at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>         at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>         at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>         at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>         at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>         at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>         at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>         at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>         at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>         at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>         at org.junit.runners.Suite.runChild(Suite.java:128)
>         at org.junit.runners.Suite.runChild(Suite.java:27)
>         at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>         at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>         at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>         at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>         at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>         at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>         at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>         at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>         at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>         at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
>         at 
> org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43)
>         at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
>         at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
>         at java.util.Iterator.forEachRemaining(Iterator.java:133)
>         at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
>         at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
>         at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
>         at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
>         at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
>         at 
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>         at 
> java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
>         at 
> org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:82)
>         at 
> org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:73)
>         at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:108)
>         at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88)
>         at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54)
>         at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:67)
>         at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:52)
>         at 
> org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:96)
>         at 
> org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:75)
>         at 
> org.gradle.api.internal.tasks.testing.junitplatform.JUnitPlatformTestClassProcessor$CollectAllTestClassesExecutor.processAllTestClasses(JUnitPlatformTestClassProcessor.java:99)
>         at 
> org.gradle.api.internal.tasks.testing.junitplatform.JUnitPlatformTestClassProcessor$CollectAllTestClassesExecutor.access$000(JUnitPlatformTestClassProcessor.java:79)
>         at 
> org.gradle.api.internal.tasks.testing.junitplatform.JUnitPlatformTestClassProcessor.stop(JUnitPlatformTestClassProcessor.java:75)
>         at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.stop(SuiteTestClassProcessor.java:61)
>         at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>         at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:566)
>         at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
>         at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>         at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
>         at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
>         at com.sun.proxy.$Proxy2.stop(Unknown Source)
>         at 
> org.gradle.api.internal.tasks.testing.worker.TestWorker.stop(TestWorker.java:133)
>         at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>         at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:566)
>         at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
>         at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>         at 
> org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:182)
>         at 
> org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:164)
>         at 
> org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:414)
>         at 
> org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
>         at 
> org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at 
> org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
>         at java.lang.Thread.run(Thread.java:829)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to