[ https://issues.apache.org/jira/browse/GEODE-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bruce J Schuchardt resolved GEODE-8473.
---------------------------------------
Fix Version/s: 1.14.0
Resolution: Fixed
> Hang in ReplyProcessor21 when forced-disconnect does not establish a cancellation cause
> ---------------------------------------------------------------------------------------
>
> Key: GEODE-8473
> URL: https://issues.apache.org/jira/browse/GEODE-8473
> Project: Geode
> Issue Type: Bug
> Components: membership
> Affects Versions: 1.13.0
> Reporter: Bruce J Schuchardt
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.14.0
>
>
> I suspect this is due to the recent Membership refactoring. In a test that
> exposed GEODE-8467, I saw an application thread from before the
> forced-disconnect still blocked waiting for a response.
> {noformat}
> java.lang.Thread.State: TIMED_WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000000ea5c43c0> (a java.util.concurrent.CountDownLatch$Sync)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>     at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:731)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:802)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:779)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:865)
>     at org.apache.geode.internal.cache.partitioned.SizeMessage$SizeResponse.waitBucketSizes(SizeMessage.java:344)
>     at org.apache.geode.internal.cache.PartitionedRegion.getSizeRemotely(PartitionedRegion.java:6752)
>     at org.apache.geode.internal.cache.PartitionedRegion.entryCount(PartitionedRegion.java:6703)
>     at org.apache.geode.internal.cache.PartitionedRegion.entryCount(PartitionedRegion.java:6685)
>     at org.apache.geode.internal.cache.PartitionedRegion.getRegionSize(PartitionedRegion.java:6657)
>     at org.apache.geode.internal.cache.LocalRegionDataView.entryCount(LocalRegionDataView.java:99)
>     at org.apache.geode.internal.cache.LocalRegion.entryCount(LocalRegion.java:2078)
>     at org.apache.geode.internal.cache.LocalRegion.size(LocalRegion.java:8288)
>     at util.TestHelper.getRegionStr(TestHelper.java:1669)
>     at util.TestHelper.regionHierarchyToString(TestHelper.java:1654)
>     at util.TestHelper.logRegionHierarchy(TestHelper.java:1639)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at hydra.MethExecutor.execute(MethExecutor.java:173)
>     at hydra.MethExecutor.execute(MethExecutor.java:141)
>     at hydra.TestTask.execute(TestTask.java:197)
>     at hydra.RemoteTestModule$1.run(RemoteTestModule.java:213)
> {noformat}
> ReplyProcessor21 uses a StoppableCountDownLatch to wait for a response. This
> latch loops waiting for the count to reach zero, but it also checks
> ClusterDistributionManager's CancelCriterion to see whether the system is
> shutting down. If so, it stops waiting for a response.
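A minimal sketch of the wait loop described above, assuming a simplified model: `StoppableLatchSketch`, `CancelledException`, `StoppableLatchDemo`, and the `Supplier<String>` cancel check are hypothetical stand-ins, not Geode's actual `StoppableCountDownLatch` or `CancelCriterion` API. The latch waits in bounded slices and, after each timed-out slice, asks the cancel check whether shutdown has begun. If no reply arrives and that check never reports a reason, the loop never exits, which matches the hang in the thread dump.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical demo driver; not part of Geode.
class StoppableLatchDemo {
  public static void main(String[] args) throws InterruptedException {
    // Simulated cancel criterion: null until some thread publishes a shutdown reason.
    AtomicReference<String> reason = new AtomicReference<>();
    StoppableLatchSketch latch = new StoppableLatchSketch(1, reason::get, 50);

    // No reply ever arrives; another thread reports a forced disconnect after ~200 ms.
    Thread disconnector = new Thread(() -> {
      try {
        Thread.sleep(200);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      reason.set("forced disconnect");
    });
    disconnector.start();

    try {
      latch.await();
      System.out.println("reply received");
    } catch (StoppableLatchSketch.CancelledException e) {
      System.out.println("stopped waiting: " + e.getMessage());
    }
    disconnector.join();
  }
}

// Hypothetical simplified model of a latch that can be abandoned on shutdown.
class StoppableLatchSketch {
  /** Thrown when the cancel check reports that the system is shutting down. */
  static class CancelledException extends RuntimeException {
    CancelledException(String reason) {
      super(reason);
    }
  }

  private final CountDownLatch latch;
  // Returns a shutdown reason once cancellation has begun, or null otherwise (assumption).
  private final Supplier<String> cancelReason;
  private final long pollMillis;

  StoppableLatchSketch(int count, Supplier<String> cancelReason, long pollMillis) {
    this.latch = new CountDownLatch(count);
    this.cancelReason = cancelReason;
    this.pollMillis = pollMillis;
  }

  /** Called once per reply received. */
  void countDown() {
    latch.countDown();
  }

  /**
   * Waits in bounded slices until the count reaches zero, consulting the cancel
   * check after each timed-out slice. If no reply arrives and the check never
   * returns a reason, this loop never exits.
   */
  void await() throws InterruptedException {
    while (!latch.await(pollMillis, TimeUnit.MILLISECONDS)) {
      String reason = cancelReason.get();
      if (reason != null) {
        throw new CancelledException(reason);
      }
    }
  }
}
```

Running `StoppableLatchDemo` should print `stopped waiting: forced disconnect` after roughly 200 ms. Removing the `reason.set(...)` call reproduces the hang: the main thread cycles through timed waits forever, consistent with the TIMED_WAITING state in the thread dump.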
> Due to GEODE-8467, the thread that sets the CancelCriterion's shutdown
> "rootCause" is never started, so the latch never learns that the system is
> shutting down and the waiting thread hangs. Either Membership needs to ensure
> that this upward notification happens, or ClusterDistributionManager's
> CancelCriterion needs to check with the Services.Stopper in GMSMembership to
> see whether a "rootCause" has been established there.
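A minimal sketch of the second option, assuming hypothetical classes: `MembershipStopper` stands in for `Services.Stopper` in GMSMembership and `DistributionCancelCheck` for ClusterDistributionManager's CancelCriterion; neither mirrors Geode's real signatures, and `CancelFallbackDemo` is only a driver. The cancel check first looks at its own root cause and, if none was delivered, falls back to the cause recorded by membership, so a lost upward notification no longer leaves waiters blocked.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo driver; not part of Geode.
class CancelFallbackDemo {
  public static void main(String[] args) {
    MembershipStopper stopper = new MembershipStopper();
    DistributionCancelCheck check = new DistributionCancelCheck(stopper);

    // Membership records the forced-disconnect cause, but the upward
    // notification that would call check.setRootCause(...) never runs.
    stopper.stop(new Exception("forced disconnect"));

    // The fallback still reports the shutdown, so waiters can stop waiting.
    System.out.println(check.cancelInProgress());
  }
}

// Hypothetical stand-in for Services.Stopper in GMSMembership.
class MembershipStopper {
  private final AtomicReference<Throwable> shutdownCause = new AtomicReference<>();

  /** Records the first shutdown cause; later calls do not overwrite it. */
  void stop(Throwable cause) {
    shutdownCause.compareAndSet(null, cause);
  }

  Throwable shutdownCause() {
    return shutdownCause.get();
  }
}

// Hypothetical stand-in for ClusterDistributionManager's CancelCriterion.
class DistributionCancelCheck {
  private final MembershipStopper stopper;
  // Normally delivered by an upward notification from membership; may never be set.
  private volatile Throwable rootCause;

  DistributionCancelCheck(MembershipStopper stopper) {
    this.stopper = stopper;
  }

  void setRootCause(Throwable cause) {
    rootCause = cause;
  }

  /** Returns a reason string if shutdown is in progress, otherwise null. */
  String cancelInProgress() {
    Throwable cause = rootCause;
    if (cause == null) {
      // Fallback: ask membership directly instead of relying on the notification.
      cause = stopper.shutdownCause();
    }
    return cause == null ? null : "shutting down: " + cause.getMessage();
  }
}
```

Running `CancelFallbackDemo` prints `shutting down: forced disconnect` even though `setRootCause` is never called, because the fallback reads the cause the stopper recorded. The `compareAndSet` in `stop` keeps the first cause, so a later, less specific shutdown reason does not overwrite the original forced-disconnect cause.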
--
This message was sent by Atlassian Jira
(v8.3.4#803005)