[ 
https://issues.apache.org/jira/browse/GEODE-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen Nichols closed GEODE-9887.
-------------------------------

> Deadlock when shutting down gws threads unnecessarily delay shutdown of 
> server for 15 seconds
> ---------------------------------------------------------------------------------------------
>
>                 Key: GEODE-9887
>                 URL: https://issues.apache.org/jira/browse/GEODE-9887
>             Project: Geode
>          Issue Type: Bug
>          Components: wan
>            Reporter: Jakov Varenina
>            Assignee: Jakov Varenina
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.15.0
>
>
> The thread dumps below show the deadlock:
> 1. "Distributed system shutdown hook" takes lock 0x00000000c445e988, starts
> the "ConcurrentParallelGatewaySenderEventProcessor Stopper Thread" threads,
> and waits for them to finish.
> 2. "ConcurrentParallelGatewaySenderEventProcessor Stopper Thread5" sets the
> flag AckReaderThread.shutdown to true and waits for the shutdown to finish
> by joining the thread for at most 15 seconds.
> 3. The "AckReaderThread for : Event Processor for GatewaySender_sender1_4"
> thread waits for lock 0x00000000c445e988, which is owned by the "Distributed
> system shutdown hook" thread.
> The deadlock lasts only 15 seconds, because the thread join times out in
> every "ConcurrentParallelGatewaySenderEventProcessor Stopper Thread" thread,
> forcing it to finish. Once these threads have finished, the "Distributed
> system shutdown hook" thread can continue, release the lock, and complete
> the shutdown of the server.
>  
> {code:java}
> "Distributed system shutdown hook" #14 prio=5 os_prio=0 cpu=20.78ms 
> elapsed=11.33s tid=0x00007f848c005000 nid=0x1e04 waiting on condition  
> [0x00007f83ec415000]
>    java.lang.Thread.State: WAITING (parking)
>         at jdk.internal.misc.Unsafe.park(java.base@11.0.13/Native Method)
>         - parking to wait for  <0x00000000fcc00e50> (a 
> java.util.concurrent.FutureTask)
>         at 
> java.util.concurrent.locks.LockSupport.park(java.base@11.0.13/LockSupport.java:194)
>         at 
> java.util.concurrent.FutureTask.awaitDone(java.base@11.0.13/FutureTask.java:447)
>         at 
> java.util.concurrent.FutureTask.get(java.base@11.0.13/FutureTask.java:190)
>         at 
> java.util.concurrent.AbstractExecutorService.invokeAll(java.base@11.0.13/AbstractExecutorService.java:247)
>         at 
> org.apache.geode.internal.cache.wan.parallel.ConcurrentParallelGatewaySenderEventProcessor.stopProcessing(ConcurrentParallelGatewaySenderEventProcessor.java:258)
>         at 
> org.apache.geode.internal.cache.wan.AbstractGatewaySender.stopProcessing(AbstractGatewaySender.java:726)
>         at 
> org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderImpl.stop(ParallelGatewaySenderImpl.java:118)
>         at 
> org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2165)
>         - locked <0x00000000c11a7400> (a java.lang.Class for 
> org.apache.geode.internal.cache.GemFireCacheImpl)
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1559)
>         - locked <0x00000000c11a7400> (a java.lang.Class for 
> org.apache.geode.internal.cache.GemFireCacheImpl)
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.lambda$static$7(InternalDistributedSystem.java:2202)
>         at 
> org.apache.geode.distributed.internal.InternalDistributedSystem$$Lambda$110/0x0000000100226840.run(Unknown Source)
>         at java.lang.Thread.run(java.base@11.0.13/Thread.java:829)
>    Locked ownable synchronizers:
>         - <0x00000000c445e988> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> "AckReaderThread for : Event Processor for GatewaySender_sender1_4" #402 
> daemon prio=5 os_prio=0 cpu=3168.26ms elapsed=640.74s tid=0x00007f8434023000 
> nid=0x1181 waiting on condition  [0x00007f83eda2b000]
>    java.lang.Thread.State: WAITING (parking)
>     at jdk.internal.misc.Unsafe.park(java.base@11.0.13/Native Method)
>     - parking to wait for  <0x00000000c445e988> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>     at 
> java.util.concurrent.locks.LockSupport.park(java.base@11.0.13/LockSupport.java:194)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.13/AbstractQueuedSynchronizer.java:885)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.base@11.0.13/AbstractQueuedSynchronizer.java:917)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@11.0.13/AbstractQueuedSynchronizer.java:1240)
>     at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(java.base@11.0.13/ReentrantReadWriteLock.java:959)
>     at 
> org.apache.geode.internal.cache.wan.GatewaySenderEventRemoteDispatcher$AckReaderThread.run(GatewaySenderEventRemoteDispatcher.java:665)
>   Locked ownable synchronizers:
>     - None
> "ConcurrentParallelGatewaySenderEventProcessor Stopper Thread5" #872 daemon 
> prio=5 os_prio=0 cpu=1.39ms elapsed=14.09s tid=0x00007f849801a000 nid=0x1e13 
> in Object.wait()  [0x00007f849c442000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(java.base@11.0.13/Native Method)
>         - waiting on <no object reference available>
>         at java.lang.Thread.join(java.base@11.0.13/Thread.java:1308)
>         - waiting to re-lock in wait() <0x00000000c542ce20> (a 
> org.apache.geode.internal.cache.wan.GatewaySenderEventRemoteDispatcher$AckReaderThread)
>         at 
> org.apache.geode.internal.cache.wan.GatewaySenderEventRemoteDispatcher$AckReaderThread.shutdown(GatewaySenderEventRemoteDispatcher.java:771)
>         at 
> org.apache.geode.internal.cache.wan.GatewaySenderEventRemoteDispatcher.stopAckReaderThread(GatewaySenderEventRemoteDispatcher.java:802)
>         at 
> org.apache.geode.internal.cache.wan.GatewaySenderEventRemoteDispatcher.stop(GatewaySenderEventRemoteDispatcher.java:826)
>         at 
> org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.stopProcessing(AbstractGatewaySenderEventProcessor.java:1222)
>         at 
> org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor$SenderStopperCallable.call(AbstractGatewaySenderEventProcessor.java:1399)
>         at 
> org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor$SenderStopperCallable.call(AbstractGatewaySenderEventProcessor.java:1387)
>         at 
> java.util.concurrent.FutureTask.run(java.base@11.0.13/FutureTask.java:264)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.13/ThreadPoolExecutor.java:1128)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.13/ThreadPoolExecutor.java:628)
>         at java.lang.Thread.run(java.base@11.0.13/Thread.java:829)
>    Locked ownable synchronizers:
>         - <0x00000000fcf4daa8> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker)
> {code}
>  
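
The three-step wait cycle described in the issue can be reproduced in miniature with plain JDK classes. The following is only a sketch under stated assumptions, not Geode code: the class name ShutdownStallSketch, the method simulateShutdown, and the thread name "ack-reader" are hypothetical, and the join timeout is a parameter so the stall can be observed without waiting the full 15 seconds. The calling thread plays the shutdown hook: it takes the write lock, submits a stopper task through invokeAll, and therefore keeps holding the lock while the stopper joins a thread that itself needs that lock.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the threads in the dump; not Geode code.
class ShutdownStallSketch {

  /** Runs the simulated shutdown and returns its duration in milliseconds. */
  static long simulateShutdown(long joinTimeoutMillis) throws Exception {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    ExecutorService stoppers = Executors.newSingleThreadExecutor();

    long start = System.nanoTime();
    // Step 1: the "shutdown hook" (this thread) takes the write lock.
    lock.writeLock().lock();
    try {
      // Step 3: the "ack reader" must acquire the same lock before it can exit.
      Thread ackReader = new Thread(() -> {
        lock.writeLock().lock();
        lock.writeLock().unlock();
      }, "ack-reader");
      ackReader.setDaemon(true);
      ackReader.start();

      // Step 2: the "stopper" joins the ack reader with a timeout.
      Callable<Void> stopper = () -> {
        ackReader.join(joinTimeoutMillis);
        return null;
      };

      // invokeAll blocks until the stopper returns, while the lock is still
      // held, so the ack reader cannot finish and the join runs to its timeout.
      stoppers.invokeAll(List.of(stopper));
    } finally {
      lock.writeLock().unlock();
      stoppers.shutdown();
    }
    return (System.nanoTime() - start) / 1_000_000;
  }

  public static void main(String[] args) throws Exception {
    System.out.println("simulated shutdown took " + simulateShutdown(500) + " ms");
  }
}
```

In this sketch the stall always lasts roughly as long as the join timeout, which mirrors the 15-second delay reported above: nothing is wrong with any single thread, but the lock is released only after the join has expired.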



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
