[ https://issues.apache.org/jira/browse/GEODE-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bruce J Schuchardt resolved GEODE-8473.
---------------------------------------
    Fix Version/s: 1.14.0
       Resolution: Fixed

> Hang in ReplyProcessor21 when forced-disconnect does not establish a cancellation cause
> ---------------------------------------------------------------------------------------
>
>                 Key: GEODE-8473
>                 URL: https://issues.apache.org/jira/browse/GEODE-8473
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>    Affects Versions: 1.13.0
>            Reporter: Bruce J Schuchardt
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.14.0
>
>
> I suspect this is due to the recent Membership refactoring. In a test that
> exposed GEODE-8467 I saw an application thread from before the
> forced-disconnect still hanging around waiting for a response.
> {noformat}
>    java.lang.Thread.State: TIMED_WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for <0x00000000ea5c43c0> (a java.util.concurrent.CountDownLatch$Sync)
> 	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
> 	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
> 	at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
> 	at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:731)
> 	at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:802)
> 	at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:779)
> 	at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:865)
> 	at org.apache.geode.internal.cache.partitioned.SizeMessage$SizeResponse.waitBucketSizes(SizeMessage.java:344)
> 	at org.apache.geode.internal.cache.PartitionedRegion.getSizeRemotely(PartitionedRegion.java:6752)
> 	at org.apache.geode.internal.cache.PartitionedRegion.entryCount(PartitionedRegion.java:6703)
> 	at org.apache.geode.internal.cache.PartitionedRegion.entryCount(PartitionedRegion.java:6685)
> 	at org.apache.geode.internal.cache.PartitionedRegion.getRegionSize(PartitionedRegion.java:6657)
> 	at org.apache.geode.internal.cache.LocalRegionDataView.entryCount(LocalRegionDataView.java:99)
> 	at org.apache.geode.internal.cache.LocalRegion.entryCount(LocalRegion.java:2078)
> 	at org.apache.geode.internal.cache.LocalRegion.size(LocalRegion.java:8288)
> 	at util.TestHelper.getRegionStr(TestHelper.java:1669)
> 	at util.TestHelper.regionHierarchyToString(TestHelper.java:1654)
> 	at util.TestHelper.logRegionHierarchy(TestHelper.java:1639)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at hydra.MethExecutor.execute(MethExecutor.java:173)
> 	at hydra.MethExecutor.execute(MethExecutor.java:141)
> 	at hydra.TestTask.execute(TestTask.java:197)
> 	at hydra.RemoteTestModule$1.run(RemoteTestModule.java:213)
> {noformat}
> ReplyProcessor21 uses a StoppableCountDownLatch to wait for a response. This
> latch loops waiting for the countdown but also checks
> ClusterDistributionManager's CancelCriterion to see if the system is shutting
> down. If so, it stops waiting for a response.
> Due to GEODE-8467 the thread that sets the CancelCriterion's shutdown
> "rootCause" is never started.
> Either Membership needs to ensure that this upward notification happens or
> ClusterDistributionManager's CancelCriterion needs to check with the
> Services.Stopper in GMSMembership to see if a "rootCause" has been
> established there.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
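
The failure mode described in the issue can be illustrated with a small, self-contained sketch. It is not Geode's code: `CancelCheck`, `StoppableLatch`, `RETRY_MILLIS` and `waitForReply` are hypothetical names chosen for this example, and the 50 ms poll interval is an assumption. The sketch only shows the general pattern of a latch that waits in short timed slices and consults a cancellation check between slices, and why a waiter never gives up if nothing ever records a root cause.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class StoppableLatchSketch {

  /** Hypothetical stand-in for a cancel criterion: remembers the shutdown root cause. */
  static class CancelCheck {
    private final AtomicReference<Throwable> rootCause = new AtomicReference<>();

    void cancel(Throwable cause) {
      rootCause.compareAndSet(null, cause);
    }

    /** Throws if a root cause has been established, otherwise returns normally. */
    void checkCancelInProgress() {
      Throwable cause = rootCause.get();
      if (cause != null) {
        throw new IllegalStateException("system is shutting down", cause);
      }
    }
  }

  /** Hypothetical latch that polls the cancel check between timed waits. */
  static class StoppableLatch {
    private static final long RETRY_MILLIS = 50; // assumed poll interval
    private final CountDownLatch delegate;
    private final CancelCheck stopper;

    StoppableLatch(int count, CancelCheck stopper) {
      this.delegate = new CountDownLatch(count);
      this.stopper = stopper;
    }

    void countDown() {
      delegate.countDown();
    }

    void await() throws InterruptedException {
      // Loop until the latch opens; bail out as soon as cancellation is visible.
      do {
        stopper.checkCancelInProgress();
      } while (!delegate.await(RETRY_MILLIS, TimeUnit.MILLISECONDS));
    }
  }

  /** Waits up to one second for a reply that never arrives and reports what the waiter did. */
  static String waitForReply(boolean rootCauseEstablished) {
    CancelCheck stopper = new CancelCheck();
    StoppableLatch latch = new StoppableLatch(1, stopper);
    AtomicReference<String> outcome = new AtomicReference<>("still waiting");

    Thread waiter = new Thread(() -> {
      try {
        latch.await();
        outcome.set("replied");
      } catch (IllegalStateException e) {
        outcome.set("cancelled");
      } catch (InterruptedException e) {
        outcome.set("interrupted");
      }
    });
    waiter.setDaemon(true);
    waiter.start();

    if (rootCauseEstablished) {
      stopper.cancel(new RuntimeException("forced disconnect"));
    }
    try {
      waiter.join(1000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    String result = outcome.get(); // read before cleaning up the waiter
    waiter.interrupt();
    return result;
  }

  public static void main(String[] args) {
    System.out.println("root cause set:   " + waitForReply(true));
    System.out.println("root cause unset: " + waitForReply(false));
  }
}
```

With a root cause recorded the waiter exits on its next poll; without one it keeps looping, which matches the `TIMED_WAITING (parking)` state in the thread dump above. Either remedy proposed in the issue amounts to making sure the check consulted by the latch can see that a forced disconnect has happened.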