[ https://issues.apache.org/jira/browse/GEODE-8686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234118#comment-17234118 ]
ASF subversion and git services commented on GEODE-8686: -------------------------------------------------------- Commit aaa3d13e9c604806624e43ba86d959473226a674 in geode's branch refs/heads/master from Donal Evans [ https://gitbox.apache.org/repos/asf?p=geode.git;h=aaa3d13 ] GEODE-8686: Prevent potential deadlock during GII and tombstone GC (#5707) - Do not call AbstractRegionMap.removeTombstone() outside of TombstoneService class - Add test to confirm that tombstones are correctly scheduled and collected with this change Authored-by: Donal Evans <doev...@vmware.com> (cherry picked from commit 70b1ee8b98f1458620539c4a962e82f8ef35707b) > Tombstone removal optimization during GII could cause deadlock > -------------------------------------------------------------- > > Key: GEODE-8686 > URL: https://issues.apache.org/jira/browse/GEODE-8686 > Project: Geode > Issue Type: Improvement > Affects Versions: 1.10.0, 1.11.0, 1.12.0, 1.13.0, 1.14.0 > Reporter: Donal Evans > Assignee: Donal Evans > Priority: Major > Labels: pull-request-available > Fix For: 1.12.1, 1.14.0, 1.13.1 > > > Similar to the issue described in GEODE-6526, if the condition in the below > if statement in {{AbstractRegionMap.initialImagePut()}} evaluates to true, a > call to {{AbstractRegionMap.removeTombstone()}} will be triggered that could > lead to deadlock between the calling thread and a Tombstone GC thread calling > {{TombstoneService.gcTombstones()}}. > {code:java} > if (owner.getServerProxy() == null && > owner.getVersionVector().isTombstoneTooOld( entryVersion.getMemberID(), > entryVersion.getRegionVersion())) { > // the received tombstone has already been reaped, so don't retain it > if (owner.getIndexManager() != null) { > owner.getIndexManager().updateIndexes(oldRe, IndexManager.REMOVE_ENTRY, > IndexProtocol.REMOVE_DUE_TO_GII_TOMBSTONE_CLEANUP); > } > removeTombstone(oldRe, entryVersion, false, false); > return false; > } else { > owner.scheduleTombstone(oldRe, entryVersion); > lruEntryDestroy(oldRe); > } > {code} > The proposed change is to remove this if statement and allow the old > tombstone to be collected later by calling {{scheduleTombstone()}} in all > cases. The call to {{AbstractRegionMap.removeTombstone()}} in > {{AbstractRegionMap.initialImagePut()}} is intended to be an optimization to > allow immediate removal of tombstones that we know have already been > collected on other members, but since the conditions to trigger it are rare > (the old entry must be a tombstone, the new entry received during GII must be > a tombstone with a newer version, and we must have already collected a > tombstone with a newer version than the new entry) and the overhead of > scheduling a tombstone to be collected is comparatively low, the performance > impact of removing this optimization in favour of simply scheduling the > tombstone to be collected in all cases should be insignificant. > The solution to the deadlock observed in GEODE-6526 was also to remove the > call to {{AbstractRegionMap.removeTombstone()}} and allow the tombstone to be > collected later and did not result in any unwanted behaviour, so the proposed > fix should be similarly low-impact. > Also of note is that with this proposed change, there will be no calls to > {{AbstractRegionMap.removeTombstone()}} outside of the {{TombstoneService}} > class, which should ensure that other deadlocks involving this method are not > possible. -- This message was sent by Atlassian Jira (v8.3.4#803005)