[ https://issues.apache.org/jira/browse/GEODE-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen Nichols closed GEODE-9887.
-------------------------------

> Deadlock when shutting down gws threads unnecessarily delay shutdown of
> server for 15 seconds
> ---------------------------------------------------------------------------------------------
>
>                 Key: GEODE-9887
>                 URL: https://issues.apache.org/jira/browse/GEODE-9887
>             Project: Geode
>          Issue Type: Bug
>          Components: wan
>            Reporter: Jakov Varenina
>            Assignee: Jakov Varenina
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.15.0
>
>
> See the deadlock in the logs below:
> 1. "Distributed system shutdown hook" takes lock 0x00000000c445e988, starts
> the "ConcurrentParallelGatewaySenderEventProcessor Stopper Thread" threads,
> and waits for them to finish.
> 2. "ConcurrentParallelGatewaySenderEventProcessor Stopper Thread5" sets the
> flag AckReaderThread.shutdown to true and waits for the shutdown to finish
> by joining the thread for at most 15 seconds.
> 3. The "AckReaderThread for : Event Processor for GatewaySender_sender1_4"
> thread waits for the lock 0x00000000c445e988, which is owned by the
> "Distributed system shutdown hook" thread.
>
> This deadlock lasts only 15 seconds, because the thread join times out for
> all "ConcurrentParallelGatewaySenderEventProcessor Stopper Thread" threads,
> forcing them to finish. Once these threads finish, "Distributed system
> shutdown hook" can continue execution, release the lock, and conclude the
> shutdown of the server.
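
The three steps quoted above can be reproduced in miniature. The following is only a sketch under stated assumptions, not Geode code: the class `ShutdownStallSketch`, the method `runShutdown`, the `holdLockWhileStopping` switch, and the 1-second timeout are all hypothetical names and values chosen for illustration. The switch exists only to show that the stall comes from holding the lock while waiting; it is not meant to represent the fix that went into 1.15.0.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the shutdown hook / stopper / ack reader trio.
final class ShutdownStallSketch {

  /** Runs a simulated shutdown and returns its duration in milliseconds. */
  static long runShutdown(long joinTimeoutMillis, boolean holdLockWhileStopping)
      throws InterruptedException {
    ReentrantReadWriteLock lifecycleLock = new ReentrantReadWriteLock();
    ExecutorService stopperPool = Executors.newSingleThreadExecutor();
    long start = System.nanoTime();

    // Step 1: the "shutdown hook" (this thread) takes the write lock.
    if (holdLockWhileStopping) {
      lifecycleLock.writeLock().lock();
    }
    try {
      // Step 3: the "ack reader" needs the same write lock before it can exit.
      Thread ackReader = new Thread(() -> {
        lifecycleLock.writeLock().lock();
        lifecycleLock.writeLock().unlock();
      }, "ack-reader-sketch");
      ackReader.setDaemon(true);
      ackReader.start();

      // Step 2: the "stopper" waits for the ack reader with a bounded join.
      Callable<Void> stopper = () -> {
        ackReader.join(joinTimeoutMillis);
        return null;
      };

      // invokeAll blocks until the stopper returns. If the lock is held here,
      // the ack reader cannot finish and the join runs to its timeout.
      stopperPool.invokeAll(List.of(stopper));
    } finally {
      if (holdLockWhileStopping) {
        lifecycleLock.writeLock().unlock();
      }
      stopperPool.shutdown();
    }
    return (System.nanoTime() - start) / 1_000_000L;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("lock held while stopping:     " + runShutdown(1000, true) + " ms");
    System.out.println("lock not held while stopping: " + runShutdown(1000, false) + " ms");
  }
}
```

With the lock held, the run should take roughly the join timeout, mirroring the 15-second delay in the report; without it, the ack reader stand-in exits right away and the join returns early.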
>
> {code:java}
> "Distributed system shutdown hook" #14 prio=5 os_prio=0 cpu=20.78ms elapsed=11.33s tid=0x00007f848c005000 nid=0x1e04 waiting on condition [0x00007f83ec415000]
>    java.lang.Thread.State: WAITING (parking)
>         at jdk.internal.misc.Unsafe.park(java.base@11.0.13/Native Method)
>         - parking to wait for <0x00000000fcc00e50> (a java.util.concurrent.FutureTask)
>         at java.util.concurrent.locks.LockSupport.park(java.base@11.0.13/LockSupport.java:194)
>         at java.util.concurrent.FutureTask.awaitDone(java.base@11.0.13/FutureTask.java:447)
>         at java.util.concurrent.FutureTask.get(java.base@11.0.13/FutureTask.java:190)
>         at java.util.concurrent.AbstractExecutorService.invokeAll(java.base@11.0.13/AbstractExecutorService.java:247)
>         at org.apache.geode.internal.cache.wan.parallel.ConcurrentParallelGatewaySenderEventProcessor.stopProcessing(ConcurrentParallelGatewaySenderEventProcessor.java:258)
>         at org.apache.geode.internal.cache.wan.AbstractGatewaySender.stopProcessing(AbstractGatewaySender.java:726)
>         at org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderImpl.stop(ParallelGatewaySenderImpl.java:118)
>         at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2165)
>         - locked <0x00000000c11a7400> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
>         at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1559)
>         - locked <0x00000000c11a7400> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
>         at org.apache.geode.distributed.internal.InternalDistributedSystem.lambda$static$7(InternalDistributedSystem.java:2202)
>         at org.apache.geode.distributed.internal.InternalDistributedSystem$$Lambda$110/0x0000000100226840.run(Unknown Source)
>         at java.lang.Thread.run(java.base@11.0.13/Thread.java:829)
>    Locked ownable synchronizers:
>         - <0x00000000c445e988> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>
> "AckReaderThread for : Event Processor for GatewaySender_sender1_4" #402 daemon prio=5 os_prio=0 cpu=3168.26ms elapsed=640.74s tid=0x00007f8434023000 nid=0x1181 waiting on condition [0x00007f83eda2b000]
>    java.lang.Thread.State: WAITING (parking)
>         at jdk.internal.misc.Unsafe.park(java.base@11.0.13/Native Method)
>         - parking to wait for <0x00000000c445e988> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(java.base@11.0.13/LockSupport.java:194)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.13/AbstractQueuedSynchronizer.java:885)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.base@11.0.13/AbstractQueuedSynchronizer.java:917)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@11.0.13/AbstractQueuedSynchronizer.java:1240)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(java.base@11.0.13/ReentrantReadWriteLock.java:959)
>         at org.apache.geode.internal.cache.wan.GatewaySenderEventRemoteDispatcher$AckReaderThread.run(GatewaySenderEventRemoteDispatcher.java:665)
>    Locked ownable synchronizers:
>         - None
>
> "ConcurrentParallelGatewaySenderEventProcessor Stopper Thread5" #872 daemon prio=5 os_prio=0 cpu=1.39ms elapsed=14.09s tid=0x00007f849801a000 nid=0x1e13 in Object.wait() [0x00007f849c442000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(java.base@11.0.13/Native Method)
>         - waiting on <no object reference available>
>         at java.lang.Thread.join(java.base@11.0.13/Thread.java:1308)
>         - waiting to re-lock in wait() <0x00000000c542ce20> (a org.apache.geode.internal.cache.wan.GatewaySenderEventRemoteDispatcher$AckReaderThread)
>         at org.apache.geode.internal.cache.wan.GatewaySenderEventRemoteDispatcher$AckReaderThread.shutdown(GatewaySenderEventRemoteDispatcher.java:771)
>         at org.apache.geode.internal.cache.wan.GatewaySenderEventRemoteDispatcher.stopAckReaderThread(GatewaySenderEventRemoteDispatcher.java:802)
>         at org.apache.geode.internal.cache.wan.GatewaySenderEventRemoteDispatcher.stop(GatewaySenderEventRemoteDispatcher.java:826)
>         at org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.stopProcessing(AbstractGatewaySenderEventProcessor.java:1222)
>         at org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor$SenderStopperCallable.call(AbstractGatewaySenderEventProcessor.java:1399)
>         at org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor$SenderStopperCallable.call(AbstractGatewaySenderEventProcessor.java:1387)
>         at java.util.concurrent.FutureTask.run(java.base@11.0.13/FutureTask.java:264)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.13/ThreadPoolExecutor.java:1128)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.13/ThreadPoolExecutor.java:628)
>         at java.lang.Thread.run(java.base@11.0.13/Thread.java:829)
>    Locked ownable synchronizers:
>         - <0x00000000fcf4daa8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> {code}
>

--
This message was sent by Atlassian Jira
(v8.20.7#820007)