[
https://issues.apache.org/jira/browse/GEODE-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Owen Nichols closed GEODE-9881.
-------------------------------
> Fully recovered Oplogs object indicating unrecoveredRegionCount>0 preventing
> compaction
> --------------------------------------------------------------------------------------
>
> Key: GEODE-9881
> URL: https://issues.apache.org/jira/browse/GEODE-9881
> Project: Geode
> Issue Type: Bug
> Components: persistence
> Reporter: Jakov Varenina
> Assignee: Jakov Varenina
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.15.0
>
>
> We found a problem that occurs when a region is closed with Region.close() and
> then recreated to start recovery. If you inspect the close() function below,
> you will notice that its logic does not make sense:
> {code:java}
> void close(DiskRegion dr) {
>   // while a krf is being created can not close a region
>   lockCompactor();
>   try {
>     if (!isDrfOnly()) {
>       DiskRegionInfo dri = getDRI(dr);
>       if (dri != null) {
>         long clearCount = dri.clear(null);
>         if (clearCount != 0) {
>           totalLiveCount.addAndGet(-clearCount);
>           // no need to call handleNoLiveValues because we now have an
>           // unrecovered region.
>         }
>         regionMap.get().remove(dr.getId(), dri);
>       }
>       addUnrecoveredRegion(dr.getId());
>     }
>   } finally {
>     unlockCompactor();
>   }
> }
> {code}
> Notice that addUnrecoveredRegion() marks the DiskRegionInfo object as
> unrecovered and increments the unrecoveredRegionCount counter. This
> DiskRegionInfo object is held in the regionMap structure. close() then
> removes that DiskRegionInfo object (the one just marked as unrecovered)
> from the regionMap. This makes no sense: the object is updated and then
> removed from the map, leaving it to be garbage collected. As explained
> further down, this causes problems when the region is recovered.
> Now look at this code, which runs at recovery:
> {code:java}
> /**
>  * For each dri that this oplog has that is currently unrecoverable check to see if a DiskRegion
>  * that is recoverable now exists.
>  */
> void checkForRecoverableRegion(DiskRegionView dr) {
>   if (unrecoveredRegionCount.get() > 0) {
>     DiskRegionInfo dri = getDRI(dr);
>     if (dri != null) {
>       if (dri.testAndSetRecovered(dr)) {
>         unrecoveredRegionCount.decrementAndGet();
>       }
>     }
>   }
> }
> {code}
> The problem is that Geode does not clear the unrecoveredRegionCount counter in
> Oplog objects after recovery is done. checkForRecoverableRegion checks the
> unrecoveredRegionCount counter and calls testAndSetRecovered, but
> testAndSetRecovered always returns false, because none of the DiskRegionInfo
> objects in the region map have the unrecovered flag set to true (all objects
> marked as unrecovered were deleted by close() and then recreated during
> recovery; see the note below).
> As a result, all Oplogs are fully recovered while the counter still
> incorrectly indicates unrecoveredRegionCount>0. This later prevents
> the compaction of recovered Oplogs (the files that have .crf, .drf and .krf)
> when they reach the compaction threshold.
> Note: During recovery, regionMap is rebuilt from the Oplog files. Since
> all DiskRegionInfo objects are deleted from regionMap during close(),
> they are recreated by the initRecoveredEntry function during
> recovery. Every recreated DiskRegionInfo has its unrecovered flag set to
> false.
>
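The sequence described above can be reproduced with a small standalone model. This is only a sketch, not Geode code: the class names StaleUnrecoveredCountSketch, RegionInfo and OplogModel, the method recreateDuringRecovery() (standing in for what initRecoveredEntry does to regionMap) and the method isCompactionBlocked() are hypothetical. Locking, live-value counting and the drf-only case are left out. The close() step follows the order given in the description (mark as unrecovered, increment the counter, then remove the object from the map), and the compaction gate is assumed to be a plain check on the counter.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class StaleUnrecoveredCountSketch {

  /** Hypothetical stand-in for DiskRegionInfo; only the unrecovered flag is modelled. */
  static final class RegionInfo {
    private boolean unrecovered; // a freshly created info starts as "recovered"

    boolean testAndSetUnrecovered() {
      if (unrecovered) {
        return false;
      }
      unrecovered = true;
      return true;
    }

    boolean testAndSetRecovered() {
      if (!unrecovered) {
        return false;
      }
      unrecovered = false;
      return true;
    }
  }

  /** Hypothetical stand-in for Oplog; keeps only the region map and the counter. */
  static final class OplogModel {
    final Map<Long, RegionInfo> regionMap = new HashMap<>();
    final AtomicInteger unrecoveredRegionCount = new AtomicInteger();

    /** Mirrors the described close(): mark, count, then drop the marked object. */
    void close(long regionId) {
      RegionInfo info = regionMap.get(regionId);
      if (info != null && info.testAndSetUnrecovered()) {
        unrecoveredRegionCount.incrementAndGet();
      }
      regionMap.remove(regionId); // the marked object is now unreachable
    }

    /** Assumed effect of recovery on the map: a new info with unrecovered == false. */
    void recreateDuringRecovery(long regionId) {
      regionMap.put(regionId, new RegionInfo());
    }

    /** Same shape as checkForRecoverableRegion in the description. */
    void checkForRecoverableRegion(long regionId) {
      if (unrecoveredRegionCount.get() > 0) {
        RegionInfo info = regionMap.get(regionId);
        if (info != null && info.testAndSetRecovered()) {
          unrecoveredRegionCount.decrementAndGet();
        }
      }
    }

    /** Assumption: compaction is skipped while any region counts as unrecovered. */
    boolean isCompactionBlocked() {
      return unrecoveredRegionCount.get() > 0;
    }
  }

  public static void main(String[] args) {
    OplogModel oplog = new OplogModel();
    long regionId = 1L;
    oplog.regionMap.put(regionId, new RegionInfo());

    oplog.close(regionId);                  // counter goes to 1, info is removed
    oplog.recreateDuringRecovery(regionId); // new info, flag is false
    oplog.checkForRecoverableRegion(regionId);

    System.out.println("unrecoveredRegionCount = " + oplog.unrecoveredRegionCount.get());
    System.out.println("compaction blocked = " + oplog.isCompactionBlocked());
  }
}
```

In this model the counter never returns to zero, because the only object whose flag was set is no longer in the map when checkForRecoverableRegion runs.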
--
This message was sent by Atlassian Jira
(v8.20.7#820007)