[
https://issues.apache.org/jira/browse/GEODE-8686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Donal Evans reassigned GEODE-8686:
----------------------------------
Assignee: Donal Evans
> Tombstone removal optimization during GII could cause deadlock
> --------------------------------------------------------------
>
> Key: GEODE-8686
> URL: https://issues.apache.org/jira/browse/GEODE-8686
> Project: Geode
> Issue Type: Improvement
> Affects Versions: 1.10.0, 1.11.0, 1.12.0, 1.13.0, 1.14.0
> Reporter: Donal Evans
> Assignee: Donal Evans
> Priority: Major
>
> Similar to the issue described in GEODE-6526, if the condition in the below
> if statement in {{AbstractRegionMap.initialImagePut()}} evaluates to true, a
> call to {{AbstractRegionMap.removeTombstone()}} will be triggered that could
> lead to deadlock between the calling thread and a Tombstone GC thread calling
> {{TombstoneService.gcTombstones()}}.
> {code:java}
> if (owner.getServerProxy() == null &&
> owner.getVersionVector().isTombstoneTooOld( entryVersion.getMemberID(),
> entryVersion.getRegionVersion())) {
> // the received tombstone has already been reaped, so don't retain it
> if (owner.getIndexManager() != null) {
> owner.getIndexManager().updateIndexes(oldRe, IndexManager.REMOVE_ENTRY,
> IndexProtocol.REMOVE_DUE_TO_GII_TOMBSTONE_CLEANUP);
> }
> removeTombstone(oldRe, entryVersion, false, false);
> return false;
> } else {
> owner.scheduleTombstone(oldRe, entryVersion);
> lruEntryDestroy(oldRe);
> }
> {code}
> The proposed change is to remove this if statement and allow the old
> tombstone to be collected later by calling {{scheduleTombstone()}} in all
> cases. The call to {{AbstractRegionMap.removeTombstone()}} in
> {{AbstractRegionMap.initialImagePut()}} is intended to be an optimization to
> allow immediate removal of tombstones that we know have already been
> collected on other members, but since the conditions to trigger it are rare
> (the old entry must be a tombstone, the new entry received during GII must be
> a tombstone with a newer version, and we must have already collected a
> tombstone with a newer version than the new entry) and the overhead of
> scheduling a tombstone to be collected is comparatively low, the performance
> impact of removing this optimization in favour of simply scheduling the
> tombstone to be collected in all cases should be insignificant.
> The solution to the deadlock observed in GEODE-6526 was also to remove the
> call to {{AbstractRegionMap.removeTombstone()}} and allow the tombstone to be
> collected later and did not result in any unwanted behaviour, so the proposed
> fix should be similarly low-impact.
> Also of note is that with this proposed change, there will be no calls to
> {{AbstractRegionMap.removeTombstone()}} outside of the {{TombstoneService}}
> class, which should ensure that other deadlocks involving this method are not
> possible.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)