[ https://issues.apache.org/jira/browse/GEODE-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bruce Schuchardt resolved GEODE-2024. ------------------------------------- Resolution: Fixed Fixed in GEODE-2228. FutureResult wasn't checking for cancellation in its get(time,timeunit) method. > Deadlock creating a new distributed lock service Grantor while transactions > are in progress > ------------------------------------------------------------------------------------------- > > Key: GEODE-2024 > URL: https://issues.apache.org/jira/browse/GEODE-2024 > Project: Geode > Issue Type: Bug > Components: distributed lock service > Reporter: Bruce Schuchardt > Assignee: Bruce Schuchardt > Fix For: 1.1.0 > > > An integration test running transactions in client caches while servers are > killed and restarted one by one encountered a hang when the server currrently > serving as the distributed-lock "grantor" was killed and another server > attempted to take on the role. > The stack traces below are from a build prior to package renaming. > TX commit thread: > {noformat} > vm_0_bridge1_w1-gst-dev23_17330:ServerConnection on port 21566 Thread 352 > ID=861 state=TIMED_WAITING > waiting to lock <java.util.concurrent.CountDownLatch$Sync@41c71f7d> > at sun.misc.Unsafe.park(Native Method) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326) > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282) > at > com.gemstone.gemfire.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:55) > at > com.gemstone.gemfire.internal.util.concurrent.FutureResult.get(FutureResult.java:54) > at > com.gemstone.gemfire.distributed.internal.locks.DLockService.waitForLockGrantorFutureResult(DLockService.java:774) > at > com.gemstone.gemfire.distributed.internal.locks.DLockService.notLockGrantorId(DLockService.java:837) > at > com.gemstone.gemfire.distributed.internal.locks.DLockService.releaseTryLocks(DLockService.java:2216) > at > com.gemstone.gemfire.internal.cache.locks.TXLockServiceImpl.release(TXLockServiceImpl.java:222) > locked <java.util.ArrayList@34bbf1cf> > at > com.gemstone.gemfire.internal.cache.TXLockRequest.releaseDistributed(TXLockRequest.java:91) > at > com.gemstone.gemfire.internal.cache.TXLockRequest.cleanup(TXLockRequest.java:120) > at com.gemstone.gemfire.internal.cache.TXState.cleanup(TXState.java:730) > at com.gemstone.gemfire.internal.cache.TXState.commit(TXState.java:447) > at > com.gemstone.gemfire.internal.cache.TXStateProxyImpl.commit(TXStateProxyImpl.java:234) > at > com.gemstone.gemfire.internal.cache.TXManagerImpl.commit(TXManagerImpl.java:325) > at > com.gemstone.gemfire.internal.cache.tier.sockets.command.CommitCommand.cmdExecute(CommitCommand.java:80) > at > com.gemstone.gemfire.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:177) > at > com.gemstone.gemfire.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:799) > at > com.gemstone.gemfire.internal.cache.tier.sockets.ServerConnection.doOneMessage(ServerConnection.java:930) > at > com.gemstone.gemfire.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1179) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at > com.gemstone.gemfire.internal.cache.tier.sockets.AcceptorImpl$1$1.run(AcceptorImpl.java:555) > at java.lang.Thread.run(Thread.java:745) > Locked synchronizers: > java.util.concurrent.ThreadPoolExecutor$Worker@549109017 > java.util.concurrent.locks.ReentrantLock$NonfairSync@317942158 > {noformat} > Grantor trying to recover: > {noformat} > "ServerConnection on port 21566 Thread 356" daemon prio=10 > tid=0x00007f9cf8020800 nid=0x5c9e waiting on condition [0x00007f9c698d6000] > java.lang.Thread.State: TIMED_WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00000000f4f2e4d0> (a > java.util.concurrent.CountDownLatch$Sync) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326) > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282) > at > com.gemstone.gemfire.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:55) > at > com.gemstone.gemfire.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:743) > at > com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:819) > at > com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:796) > at > com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:886) > at > com.gemstone.gemfire.distributed.internal.locks.DLockRecoverGrantorProcessor.recoverLockGrantor(DLockRecoverGrantorProcessor.java:96) > at > com.gemstone.gemfire.distributed.internal.locks.DLockService.makeLocalGrantor(DLockService.java:457) > at > com.gemstone.gemfire.distributed.internal.locks.DLockService.createLocalGrantor(DLockService.java:408) > at > com.gemstone.gemfire.distributed.internal.locks.DLockService.getLockGrantorId(DLockService.java:347) > at > com.gemstone.gemfire.distributed.internal.locks.DLockService.acquireTryLocks(DLockService.java:2278) > at > com.gemstone.gemfire.internal.cache.locks.TXLockServiceImpl.txLock(TXLockServiceImpl.java:132) > at > com.gemstone.gemfire.internal.cache.TXLockRequest.obtain(TXLockRequest.java:72) > at > com.gemstone.gemfire.internal.cache.TXState.reserveAndCheck(TXState.java:287) > at com.gemstone.gemfire.internal.cache.TXState.commit(TXState.java:346) > at > com.gemstone.gemfire.internal.cache.TXStateProxyImpl.commit(TXStateProxyImpl.java:234) > at > com.gemstone.gemfire.internal.cache.TXManagerImpl.commit(TXManagerImpl.java:325) > at > com.gemstone.gemfire.internal.cache.tier.sockets.command.CommitCommand.cmdExecute(CommitCommand.java:80) > at > com.gemstone.gemfire.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:177) > at > com.gemstone.gemfire.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:799) > at > com.gemstone.gemfire.internal.cache.tier.sockets.ServerConnection.doOneMessage(ServerConnection.java:930) > at > com.gemstone.gemfire.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1179) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at > com.gemstone.gemfire.internal.cache.tier.sockets.AcceptorImpl$1$1.run(AcceptorImpl.java:555) > at java.lang.Thread.run(Thread.java:745) > {noformat} > And the message-processing thread that is gathering lock service state for > the new grantor but is blocked: > {noformat} > "Pooled Waiting Message Processor 16" daemon prio=10 tid=0x00007f9c9c01d800 > nid=0x618c waiting on condition [0x00007f9c6a1e0000] > java.lang.Thread.State: TIMED_WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00000000f0d906c0> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:929) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1245) > at > java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1115) > at > com.gemstone.gemfire.internal.util.concurrent.StoppableReentrantReadWriteLock$StoppableWriteLock.lockInterruptibly(StoppableReentrantReadWriteLock.java:198) > at > com.gemstone.gemfire.internal.cache.locks.TXLockServiceImpl.acquireRecoveryWriteLock(TXLockServiceImpl.java:240) > at > com.gemstone.gemfire.internal.cache.locks.TXRecoverGrantorMessageProcessor.processDLockRecoverGrantorMessage(TXRecoverGrantorMessageProcessor.java:79) > at > com.gemstone.gemfire.internal.cache.locks.TXRecoverGrantorMessageProcessor$1.run(TXRecoverGrantorMessageProcessor.java:44) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at > com.gemstone.gemfire.distributed.internal.DistributionManager.runUntilShutdown(DistributionManager.java:692) > at > com.gemstone.gemfire.distributed.internal.DistributionManager$6$1.run(DistributionManager.java:1029) > at java.lang.Thread.run(Thread.java:745) > {noformat} > There were no other threads in the distributed system blocked in > TXLockServiceImpl.acquireRecoveryWriteLock so in this case it was a simple > deadlock involving three threads in a single server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)