[ https://issues.apache.org/jira/browse/GEODE-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188644#comment-17188644 ]
ASF GitHub Bot commented on GEODE-8475:
---------------------------------------

agingade commented on a change in pull request #5492:
URL: https://github.com/apache/geode/pull/5492#discussion_r481287853

##########
File path: geode-core/src/main/java/org/apache/geode/internal/cache/wan/parallel/ParallelGatewaySenderQueue.java
##########
@@ -755,12 +755,16 @@ public boolean put(Object object) throws InterruptedException, CacheException {
             bucketFullPath, brq);
       }
       if (brq != null) {
+        boolean intializingLocked = brq.lockWhenRegionIsInitializing();

Review comment:
  Can we add unit tests to make sure the failed initialization lock is held during put?

##########
File path: geode-core/src/main/java/org/apache/geode/internal/cache/wan/parallel/ParallelGatewaySenderQueue.java
##########
@@ -755,12 +755,16 @@ public boolean put(Object object) throws InterruptedException, CacheException {
             bucketFullPath, brq);
       }
       if (brq != null) {
+        boolean intializingLocked = brq.lockWhenRegionIsInitializing();
         brq.getInitializationLock().readLock().lock();
         try {
           putIntoBucketRegionQueue(brq, key, value);
           putDone = true;
         } finally {
           brq.getInitializationLock().readLock().unlock();
+          if (intializingLocked) {

Review comment:
  I assume we don't have to worry about the above unlock code throwing any exception...

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Resolve a potential dead lock in ParallelGatewaySenderQueue
> ------------------------------------------------------------
>
>                 Key: GEODE-8475
>                 URL: https://issues.apache.org/jira/browse/GEODE-8475
>             Project: Geode
>          Issue Type: Improvement
>            Reporter: Xiaojian Zhou
>            Assignee: Xiaojian Zhou
>            Priority: Major
>              Labels: GeodeOperationAPI, pull-request-available
>
> When brq is created but encounters a failed GII, enqueuing to it can
> deadlock:
> Thread-1:
> ParallelGatewaySenderQueue.put() acquires
> brq.getInitializationLock().readLock().lock() (lock-A’s read lock). Then,
> during the put operation, it calls lockWhenRegionIsInitializing() to
> acquire failedInitialImageLock.readLock().lock() (lock-B’s read lock).
> Thread-2:
> PRDS.createBucketRegion() triggers a GII that fails, so it calls
> cleanUpAfterFailedGII(). That method first calls
> lockFailedInitialImageWriteLock() to acquire lock-B’s write lock, and then
> calls BucketRegionQueue.clearEntries(), which calls
> getInitializationLock().writeLock().lock() (lock-A’s write lock).
> To fix it, Thread-1 must acquire failedInitialImageLock.readLock()
> (lock-B) before requesting lock-A.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
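
The lock ordering described in the issue can be illustrated with a small, self-contained sketch. This is not Geode code: the class LockOrderingSketch and its methods are hypothetical stand-ins, the two ReentrantReadWriteLock fields only play the roles of lock-A (the initialization lock) and lock-B (failedInitialImageLock), and the put path here always takes lock-B rather than taking it conditionally as lockWhenRegionIsInitializing() appears to do in the pull request. Both paths acquire lock-B before lock-A, and the nested try/finally blocks mean the outer lock is still released if the inner unlock throws, which is the concern raised in the second review comment.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderingSketch {
  // Hypothetical stand-in for brq.getInitializationLock() (lock-A).
  private final ReentrantReadWriteLock initializationLock = new ReentrantReadWriteLock();
  // Hypothetical stand-in for failedInitialImageLock (lock-B).
  private final ReentrantReadWriteLock failedInitialImageLock = new ReentrantReadWriteLock();
  // Stand-in for the queue contents; atomic because read locks are shared.
  private final AtomicInteger entries = new AtomicInteger();

  // Thread-1 path: lock-B's read lock first, then lock-A's read lock.
  public boolean put() {
    failedInitialImageLock.readLock().lock();
    try {
      initializationLock.readLock().lock();
      try {
        entries.incrementAndGet(); // the enqueue would happen here
        return true;
      } finally {
        initializationLock.readLock().unlock();
      }
    } finally {
      // Runs even if the inner unlock throws.
      failedInitialImageLock.readLock().unlock();
    }
  }

  // Thread-2 path: lock-B's write lock first, then lock-A's write lock.
  public void cleanUpAfterFailedGII() {
    failedInitialImageLock.writeLock().lock();
    try {
      initializationLock.writeLock().lock();
      try {
        entries.set(0); // stand-in for clearEntries()
      } finally {
        initializationLock.writeLock().unlock();
      }
    } finally {
      failedInitialImageLock.writeLock().unlock();
    }
  }

  // Runs both paths concurrently; returns false if either is still stuck after 10 seconds.
  public static boolean runBothPaths(int iterations) {
    LockOrderingSketch sketch = new LockOrderingSketch();
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      Future<?> putter = pool.submit(() -> {
        for (int i = 0; i < iterations; i++) {
          sketch.put();
        }
      });
      Future<?> cleaner = pool.submit(() -> {
        for (int i = 0; i < iterations; i++) {
          sketch.cleanUpAfterFailedGII();
        }
      });
      putter.get(10, TimeUnit.SECONDS);
      cleaner.get(10, TimeUnit.SECONDS);
      return true;
    } catch (TimeoutException e) {
      return false; // a thread did not finish, which would suggest a deadlock
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("both paths completed: " + runBothPaths(10_000));
  }
}
```

Because every thread takes lock-B before lock-A, no thread can hold lock-A while waiting for lock-B, so the circular wait from the issue description cannot form. A unit test along the lines of the first review comment could follow the same pattern as runBothPaths, although the real test would need to exercise the actual Geode classes.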