[ https://issues.apache.org/jira/browse/GEODE-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen Nichols closed GEODE-9881.
-------------------------------

> Fully recoverd Oplogs object indicating unrecoveredRegionCount>0 preventing compaction
> --------------------------------------------------------------------------------------
>
>                 Key: GEODE-9881
>                 URL: https://issues.apache.org/jira/browse/GEODE-9881
>             Project: Geode
>          Issue Type: Bug
>          Components: persistence
>            Reporter: Jakov Varenina
>            Assignee: Jakov Varenina
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.15.0
>
>
> We have found a problem that occurs when a region is closed with
> Region.close() and then recreated to start recovery. If you inspect the
> code of the close() function, you will notice that it does not make sense:
> {code:java}
> void close(DiskRegion dr) {
>   // while a krf is being created can not close a region
>   lockCompactor();
>   try {
>     if (!isDrfOnly()) {
>       DiskRegionInfo dri = getDRI(dr);
>       if (dri != null) {
>         long clearCount = dri.clear(null);
>         if (clearCount != 0) {
>           totalLiveCount.addAndGet(-clearCount);
>           // no need to call handleNoLiveValues because we now have an
>           // unrecovered region.
>         }
>         regionMap.get().remove(dr.getId(), dri);
>       }
>       addUnrecoveredRegion(dr.getId());
>     }
>   } finally {
>     unlockCompactor();
>   }
> }
> {code}
> Notice that addUnrecoveredRegion() marks the DiskRegionInfo object as
> unrecovered and increments the unrecoveredRegionCount counter. This
> DiskRegionInfo object is held in the regionMap structure. close() then
> removes that same DiskRegionInfo object (the one just marked as
> unrecovered) from regionMap. This does not make sense: the object is
> updated and then removed from the map, leaving it to be garbage collected.
> As described further down, this causes issues when the region is recovered.
> Consider this code, which runs at recovery:
> {code:java}
> /**
>  * For each dri that this oplog has that is currently unrecoverable check to see if a DiskRegion
>  * that is recoverable now exists.
>  */
> void checkForRecoverableRegion(DiskRegionView dr) {
>   if (unrecoveredRegionCount.get() > 0) {
>     DiskRegionInfo dri = getDRI(dr);
>     if (dri != null) {
>       if (dri.testAndSetRecovered(dr)) {
>         unrecoveredRegionCount.decrementAndGet();
>       }
>     }
>   }
> }
> {code}
> The problem is that Geode does not reset the unrecoveredRegionCount counter
> in Oplog objects after recovery is done. checkForRecoverableRegion checks
> the unrecoveredRegionCount counter and calls testAndSetRecovered, but
> testAndSetRecovered always returns false, because none of the
> DiskRegionInfo objects in the region map have the unrecovered flag set to
> true (all objects marked as unrecovered were deleted by close() and then
> recreated during recovery; see the note below).
> As a result, all Oplogs are fully recovered while the counter incorrectly
> indicates unrecoveredRegionCount>0. This later prevents compaction of the
> recovered Oplogs (the files that have .crf, .drf and .krf) when they reach
> the compaction threshold.
> Note: During recovery, regionMap is recreated from the Oplog files. Since
> all DiskRegionInfo objects are deleted from regionMap during close(), they
> are recreated by the initRecoveredEntry function during recovery. All
> DiskRegionInfo objects are created with the unrecovered flag set to false.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
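
The close/recover sequence described in the issue can be reproduced in isolation with a minimal sketch. It is not Geode code. OplogCounterSketch, RegionInfo, recoverEntry and simulate are hypothetical names invented for illustration, locking and live-count bookkeeping are omitted, and close() is assumed to have the net effect the description states: the counter is incremented and the flagged object is dropped from the map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class OplogCounterSketch {
  /** Hypothetical stand-in for DiskRegionInfo: models only the unrecovered flag. */
  static final class RegionInfo {
    private boolean unrecovered;

    void markUnrecovered() {
      unrecovered = true;
    }

    /** Returns true only if the flag was set, clearing it as a side effect. */
    boolean testAndSetRecovered() {
      if (unrecovered) {
        unrecovered = false;
        return true;
      }
      return false;
    }
  }

  final Map<Long, RegionInfo> regionMap = new ConcurrentHashMap<>();
  final AtomicInteger unrecoveredRegionCount = new AtomicInteger();

  /** Assumed net effect of close(): bump the counter, then drop the flagged object. */
  void close(long regionId) {
    RegionInfo info = regionMap.get(regionId);
    if (info != null) {
      info.markUnrecovered();
    }
    unrecoveredRegionCount.incrementAndGet();
    regionMap.remove(regionId); // the flagged object is now unreachable
  }

  /** Stand-in for recovery recreating the entry with the flag set to false. */
  void recoverEntry(long regionId) {
    regionMap.putIfAbsent(regionId, new RegionInfo());
  }

  /** Same shape as the checkForRecoverableRegion logic quoted in the issue. */
  void checkForRecoverableRegion(long regionId) {
    if (unrecoveredRegionCount.get() > 0) {
      RegionInfo info = regionMap.get(regionId);
      if (info != null && info.testAndSetRecovered()) {
        unrecoveredRegionCount.decrementAndGet();
      }
    }
  }

  /** Runs create, close, recover, check and returns the final counter value. */
  static int simulate() {
    OplogCounterSketch oplog = new OplogCounterSketch();
    oplog.recoverEntry(1L);              // region initially present
    oplog.close(1L);                     // counter becomes 1, entry removed
    oplog.recoverEntry(1L);              // fresh entry, flag is false
    oplog.checkForRecoverableRegion(1L); // testAndSetRecovered() returns false
    return oplog.unrecoveredRegionCount.get();
  }

  public static void main(String[] args) {
    System.out.println("unrecoveredRegionCount after recovery: " + simulate());
  }
}
```

In this sketch the counter is still 1 after the region has been fully recovered, which is the stale state the issue reports as blocking compaction.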