[jira] [Resolved] (GEODE-8473) Hang in ReplyProcessor21 when forced-disconnect does not establish a cancellation cause

Bruce J Schuchardt (Jira) Wed, 16 Sep 2020 09:18:32 -0700


     [ 
https://issues.apache.org/jira/browse/GEODE-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Bruce J Schuchardt resolved GEODE-8473.
---------------------------------------
    Fix Version/s: 1.14.0
       Resolution: Fixed

> Hang in ReplyProcessor21 when forced-disconnect does not establish a 
> cancellation cause
> ---------------------------------------------------------------------------------------
>
>                 Key: GEODE-8473
>                 URL: https://issues.apache.org/jira/browse/GEODE-8473
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>    Affects Versions: 1.13.0
>            Reporter: Bruce J Schuchardt
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.14.0
>
>
> I suspect this is due to the recent Membership refactoring.  In a test that 
> exposed GEODE-8467 I saw an application thread from before the 
> forced-disconnect still hanging around waiting for a response.
> {noformat}
>    java.lang.Thread.State: TIMED_WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for  <0x00000000ea5c43c0> (a java.util.concurrent.CountDownLatch$Sync)
>  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>  at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>  at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>  at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
>  at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:731)
>  at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:802)
>  at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:779)
>  at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:865)
>  at org.apache.geode.internal.cache.partitioned.SizeMessage$SizeResponse.waitBucketSizes(SizeMessage.java:344)
>  at org.apache.geode.internal.cache.PartitionedRegion.getSizeRemotely(PartitionedRegion.java:6752)
>  at org.apache.geode.internal.cache.PartitionedRegion.entryCount(PartitionedRegion.java:6703)
>  at org.apache.geode.internal.cache.PartitionedRegion.entryCount(PartitionedRegion.java:6685)
>  at org.apache.geode.internal.cache.PartitionedRegion.getRegionSize(PartitionedRegion.java:6657)
>  at org.apache.geode.internal.cache.LocalRegionDataView.entryCount(LocalRegionDataView.java:99)
>  at org.apache.geode.internal.cache.LocalRegion.entryCount(LocalRegion.java:2078)
>  at org.apache.geode.internal.cache.LocalRegion.size(LocalRegion.java:8288)
>  at util.TestHelper.getRegionStr(TestHelper.java:1669)
>  at util.TestHelper.regionHierarchyToString(TestHelper.java:1654)
>  at util.TestHelper.logRegionHierarchy(TestHelper.java:1639)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at hydra.MethExecutor.execute(MethExecutor.java:173)
>  at hydra.MethExecutor.execute(MethExecutor.java:141)
>  at hydra.TestTask.execute(TestTask.java:197)
>  at hydra.RemoteTestModule$1.run(RemoteTestModule.java:213)
> {noformat}
> ReplyProcessor21 uses a StoppableCountDownLatch to wait for a response.  This 
> latch loops waiting for the countdown but also checks 
> ClusterDistributionManager's CancelCriterion to see whether the system is 
> shutting down.  If it is, the latch stops waiting for a response.
> Due to GEODE-8467 the thread that sets the CancelCriterion's shutdown 
> "rootCause" is never started, so the waiting thread never sees a 
> cancellation and keeps waiting.  Either Membership needs to ensure that this 
> upward notification happens, or ClusterDistributionManager's CancelCriterion 
> needs to check with the Services.Stopper in GMSMembership to see whether a 
> "rootCause" has been established there.
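
The wait-and-check loop described above, together with the second proposed remedy (consulting a second source for the cancellation cause), can be sketched roughly as follows. This is an illustration only and is not Geode's code: `CancelCheck`, `PollingLatch`, and `withFallback` are hypothetical names standing in for CancelCriterion, StoppableCountDownLatch, and the proposed check against Services.Stopper, and the retry interval is an arbitrary assumption.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class StoppableLatchSketch {

  /** Hypothetical stand-in for a cancel criterion: returns null while the system is running. */
  interface CancelCheck {
    Throwable cancelCause();
  }

  /** Hypothetical latch whose await() polls a cancel check between timed waits. */
  static final class PollingLatch {
    private final CountDownLatch latch;
    private final CancelCheck check;
    private final long retryMillis;

    PollingLatch(int count, CancelCheck check, long retryMillis) {
      this.latch = new CountDownLatch(count);
      this.check = check;
      this.retryMillis = retryMillis;
    }

    void countDown() {
      latch.countDown();
    }

    /** Returns once the count reaches zero; throws if a cancellation cause is observed first. */
    void await() throws InterruptedException {
      while (!latch.await(retryMillis, TimeUnit.MILLISECONDS)) {
        Throwable cause = check.cancelCause();
        if (cause != null) {
          throw new IllegalStateException("cancelled while waiting for replies", cause);
        }
        // No cause established: keep waiting. If nobody ever sets one, this loops forever.
      }
    }
  }

  /** Consults a second source when the primary one has no cause (the second remedy above). */
  static CancelCheck withFallback(CancelCheck primary, CancelCheck fallback) {
    return () -> {
      Throwable cause = primary.cancelCause();
      return cause != null ? cause : fallback.cancelCause();
    };
  }

  public static void main(String[] args) throws Exception {
    // The upper layer's cause is never set, mimicking the missing upward notification.
    AtomicReference<Throwable> upperLayerCause = new AtomicReference<>();
    // The lower (membership-like) layer does record the forced disconnect.
    AtomicReference<Throwable> lowerLayerCause = new AtomicReference<>();
    lowerLayerCause.set(new RuntimeException("forced disconnect"));

    CancelCheck combined = withFallback(upperLayerCause::get, lowerLayerCause::get);
    PollingLatch latch = new PollingLatch(1, combined, 20);
    try {
      latch.await(); // no reply ever arrives in this sketch
      System.out.println("reply received");
    } catch (IllegalStateException e) {
      System.out.println("wait abandoned: " + e.getCause().getMessage());
    }
  }
}
```

With only `upperLayerCause::get` as the check, `await()` in this sketch would never return, which mirrors the hang in the thread dump. The fallback lets the waiter see a cause recorded by the lower layer.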



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
