[
https://issues.apache.org/jira/browse/GEODE-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Owen Nichols closed GEODE-9887.
-------------------------------
> Deadlock when shutting down gws threads unnecessarily delay shutdown of
> server for 15 seconds
> ---------------------------------------------------------------------------------------------
>
> Key: GEODE-9887
> URL: https://issues.apache.org/jira/browse/GEODE-9887
> Project: Geode
> Issue Type: Bug
> Components: wan
> Reporter: Jakov Varenina
> Assignee: Jakov Varenina
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.15.0
>
>
> See deadlock in below logs:
> 1. "Distributed system shutdown hook" takes lock 0x00000000c445e988, initiate
> "ConcurrentParallelGatewaySenderEventProcessor Stopper Thread" threads and
> waits for them to finish.
> 2. "ConcurrentParallelGatewaySenderEventProcessor Stopper Thread5" set flag
> AckReaderThread.shutdown to true and wait for shutdown to finish by joining
> threads for max 15 seconds.
> 3. "AckReaderThread for : Event Processor for GatewaySender_sender1_4" thread
> waits for the lock 0x00000000c445e988 owned by "Distributed system shutdown
> hook" thread
> This deadlock only last for 15 seconds, because thread join will expire for
> all "ConcurrentParallelGatewaySenderEventProcessor Stopper Thread" threads
> forcing them to finish. After these threads finish then "Distributed system
> shutdown hook" can continue the execution, release the lock and conclude the
> shutdown of the server.
>
> {code:java}
> "Distributed system shutdown hook" #14 prio=5 os_prio=0 cpu=20.78ms
> elapsed=11.33s tid=0x00007f848c005000 nid=0x1e04 waiting on condition
> [0x00007f83ec415000]
> java.lang.Thread.State: WAITING (parking)
> at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
> - parking to wait for <0x00000000fcc00e50> (a
> java.util.concurrent.FutureTask)
> at
> java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:194)
> at
> java.util.concurrent.FutureTask.awaitDone([email protected]/FutureTask.java:447)
> at
> java.util.concurrent.FutureTask.get([email protected]/FutureTask.java:190)
> at
> java.util.concurrent.AbstractExecutorService.invokeAll([email protected]/AbstractExecutorService.java:247)
> at
> org.apache.geode.internal.cache.wan.parallel.ConcurrentParallelGatewaySenderEventProcessor.stopProcessing(ConcurrentParallelGatewaySenderEventProcessor.java:258)
> at
> org.apache.geode.internal.cache.wan.AbstractGatewaySender.stopProcessing(AbstractGatewaySender.java:726)
> at
> org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderImpl.stop(ParallelGatewaySenderImpl.java:118)
> at
> org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2165)
> - locked <0x00000000c11a7400> (a java.lang.Class for
> org.apache.geode.internal.cache.GemFireCacheImpl)
> at
> org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1559)
> - locked <0x00000000c11a7400> (a java.lang.Class for
> org.apache.geode.internal.cache.GemFireCacheImpl)
> at
> org.apache.geode.distributed.internal.InternalDistributedSystem.lambda$static$7(InternalDistributedSystem.java:2202)
> at
> org.apache.geode.distributed.internal.InternalDistributedSystem$$Lambda$110/0x0000000100226840.run(Unknown
> Source)
> at java.lang.Thread.run([email protected]/Thread.java:829)
> Locked ownable synchronizers:
> - <0x00000000c445e988> (a
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> "AckReaderThread for : Event Processor for GatewaySender_sender1_4" #402
> daemon prio=5 os_prio=0 cpu=3168.26ms elapsed=640.74s tid=0x00007f8434023000
> nid=0x1181 waiting on condition [0x00007f83eda2b000]
> java.lang.Thread.State: WAITING (parking)
> at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
> - parking to wait for <0x00000000c445e988> (a
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at
> java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:194)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt([email protected]/AbstractQueuedSynchronizer.java:885)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued([email protected]/AbstractQueuedSynchronizer.java:917)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire([email protected]/AbstractQueuedSynchronizer.java:1240)
> at
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock([email protected]/ReentrantReadWriteLock.java:959)
> at
> org.apache.geode.internal.cache.wan.GatewaySenderEventRemoteDispatcher$AckReaderThread.run(GatewaySenderEventRemoteDispatcher.java:665)
> Locked ownable synchronizers:
> - None
> "ConcurrentParallelGatewaySenderEventProcessor Stopper Thread5" #872 daemon
> prio=5 os_prio=0 cpu=1.39ms elapsed=14.09s tid=0x00007f849801a000 nid=0x1e13
> in Object.wait() [0x00007f849c442000]
> java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait([email protected]/Native Method)
> - waiting on <no object reference available>
> at java.lang.Thread.join([email protected]/Thread.java:1308)
> - waiting to re-lock in wait() <0x00000000c542ce20> (a
> org.apache.geode.internal.cache.wan.GatewaySenderEventRemoteDispatcher$AckReaderThread)
> at
> org.apache.geode.internal.cache.wan.GatewaySenderEventRemoteDispatcher$AckReaderThread.shutdown(GatewaySenderEventRemoteDispatcher.java:771)
> at
> org.apache.geode.internal.cache.wan.GatewaySenderEventRemoteDispatcher.stopAckReaderThread(GatewaySenderEventRemoteDispatcher.java:802)
> at
> org.apache.geode.internal.cache.wan.GatewaySenderEventRemoteDispatcher.stop(GatewaySenderEventRemoteDispatcher.java:826)
> at
> org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.stopProcessing(AbstractGatewaySenderEventProcessor.java:1222)
> at
> org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor$SenderStopperCallable.call(AbstractGatewaySenderEventProcessor.java:1399)
> at
> org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor$SenderStopperCallable.call(AbstractGatewaySenderEventProcessor.java:1387)
> at
> java.util.concurrent.FutureTask.run([email protected]/FutureTask.java:264)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1128)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:628)
> at java.lang.Thread.run([email protected]/Thread.java:829) Locked
> ownable synchronizers:
> - <0x00000000fcf4daa8> (a
> java.util.concurrent.ThreadPoolExecutor$Worker)
> {code}
>
--
This message was sent by Atlassian Jira
(v8.20.7#820007)