Donal Evans created GEODE-8686:
----------------------------------

             Summary: Tombstone removal optimization during GII could cause 
deadlock
                 Key: GEODE-8686
                 URL: https://issues.apache.org/jira/browse/GEODE-8686
             Project: Geode
          Issue Type: Improvement
    Affects Versions: 1.13.0, 1.12.0, 1.11.0, 1.10.0, 1.14.0
            Reporter: Donal Evans


Similar to the issue described in GEODE-6526, if the condition in the below if 
statement in {{AbstractRegionMap.initialImagePut()}} evaluates to true, a call 
to {{AbstractRegionMap.removeTombstone()}} will be triggered that could lead to 
deadlock between the calling thread and a Tombstone GC thread calling 
{{TombstoneService.gcTombstones()}}. 
{code:java}
if (owner.getServerProxy() == null && 
owner.getVersionVector().isTombstoneTooOld( entryVersion.getMemberID(), 
entryVersion.getRegionVersion())) { 
  // the received tombstone has already been reaped, so don't retain it 
  if (owner.getIndexManager() != null) { 
    owner.getIndexManager().updateIndexes(oldRe, IndexManager.REMOVE_ENTRY, 
IndexProtocol.REMOVE_DUE_TO_GII_TOMBSTONE_CLEANUP); 
  } 
  removeTombstone(oldRe, entryVersion, false, false); 
  return false; 
} else { 
  owner.scheduleTombstone(oldRe, entryVersion); 
  lruEntryDestroy(oldRe); 
}
{code}
The proposed change is to remove this if statement and allow the old tombstone 
to be collected later by calling {{scheduleTombstone()}} in all cases{{.}} The 
call to {{AbstractRegionMap.removeTombstone()}} in 
{{AbstractRegionMap.initialImagePut()}} is intended to be an optimization to 
allow immediate removal of tombstones that we know have already been collected 
on other members, but since the conditions to trigger it are rare (the old 
entry must be a tombstone, the new entry received during GII must be a 
tombstone with a newer version, and we must have already collected a tombstone 
with a newer version than the new entry) and the overhead of scheduling a 
tombstone to be collected is comparatively low, the performance impact of 
removing this optimization in favour of simply scheduling the tombstone to be 
collected in all cases should be insignificant.

The solution to the deadlock observed in GEODE-6526 was also to remove the call 
to {{AbstractRegionMap.removeTombstone()}} and allow the tombstone to be 
collected later and did not result in any unwanted behaviour, so the proposed 
fix should be similarly low-impact.

Also of note is that with this proposed change, there will be no calls to 
{{AbstractRegionMap.removeTombstone()}} outside of the {{TombstoneService}} 
class, which should ensure that other deadlocks involving this method are not 
possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to