[ https://issues.apache.org/jira/browse/GEODE-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166997#comment-17166997 ]
ASF GitHub Bot commented on GEODE-7846:
---------------------------------------

BenjaminPerryRoss commented on a change in pull request #5391:
URL: https://github.com/apache/geode/pull/5391#discussion_r461218632

##########
File path: geode-core/src/main/java/org/apache/geode/internal/cache/PartitionedRegionClear.java
##########
@@ -162,6 +162,7 @@ protected void waitForPrimary(PartitionedRegion.RetryTimeKeeper retryTimer) {
         doAfterClear(regionEvent);
       }
     }
+    incClearCount();

Review comment:
   This is part of a larger discussion about what we're actually trying to record with the duration. It's something Gester and I have touched on in the operations standup, and I'll probably post something about it in the Slack channel. In short, the duration in the initial version (the one you commented on) measured the total duration of the entire clear operation from the perspective of the coordinator. In line with your suggestion, it has been moved into the actual local clear. That makes the stat present on every member that hosts the region (as long as it has data), but it also means the value ONLY captures the actual process of clearing the region and not anything that happens in between (messaging, work done on the coordinator other than the local clear, etc.). For now it's been changed to match your thinking here, but if you have other ideas on this I would love to hear them.

##########
File path: geode-core/src/main/java/org/apache/geode/internal/cache/CachePerfStats.java
##########
@@ -442,7 +451,10 @@
         f.createIntCounter("retries",
             "Number of times a concurrent destroy followed by a create has caused an entry operation to need to retry.",
             "operations"),
-        f.createLongCounter("clears", clearsDesc, "operations"),
+        f.createLongCounter("regionClears", regionClearsDesc, "operations"),

Review comment:
   Agreed. I recall there was a documentation story for PR clear, but I'm not sure what happened to it. I'll hunt it down, and if this documentation isn't covered by it I'll make sure to add it to this PR.

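For background on the two hunks above, the sketch below shows how a region-level clear counter, a bucket-level clear counter, and a local-clear duration stat could be declared and updated through Geode's public Statistics API. The descriptor names ("regionClears", "bucketClears", "partitionedRegionClearDuration") follow the names discussed in this PR, but the class, type name, and wiring are illustrative only; the PR's real counters live in CachePerfStats/RegionPerfStats rather than in a standalone type like this.

import org.apache.geode.StatisticDescriptor;
import org.apache.geode.Statistics;
import org.apache.geode.StatisticsFactory;
import org.apache.geode.StatisticsType;
import org.apache.geode.distributed.DistributedSystem;

public class PrClearStatsSketch {
  private final StatisticsType type;
  private final int regionClearsId;
  private final int bucketClearsId;
  private final int clearDurationId;
  private final Statistics stats;

  public PrClearStatsSketch(DistributedSystem system, String regionName) {
    // DistributedSystem implements StatisticsFactory, so it can both define the
    // statistics type and create an instance of it.
    StatisticsFactory f = system;
    StatisticDescriptor[] descriptors = new StatisticDescriptor[] {
        f.createLongCounter("regionClears",
            "Number of clear operations performed on the partitioned region", "operations"),
        f.createLongCounter("bucketClears",
            "Number of clear operations performed on individual buckets", "operations"),
        f.createLongCounter("partitionedRegionClearDuration",
            "Time spent performing the local part of a partitioned region clear", "nanoseconds")};
    type = f.createType("PartitionedRegionClearStatsSketch",
        "Illustrative clear statistics for a partitioned region", descriptors);
    regionClearsId = type.nameToId("regionClears");
    bucketClearsId = type.nameToId("bucketClears");
    clearDurationId = type.nameToId("partitionedRegionClearDuration");
    stats = f.createAtomicStatistics(type, regionName);
  }

  // Record one completed local clear on this member: bump the region-level counter and
  // add the elapsed time, matching the "measure only the local clear" decision above.
  public void endLocalClear(long startNanos) {
    stats.incLong(regionClearsId, 1L);
    stats.incLong(clearDurationId, System.nanoTime() - startNanos);
  }

  // Record that one bucket was cleared on this member.
  public void incBucketClears() {
    stats.incLong(bucketClearsId, 1L);
  }
}

A caller would capture long start = System.nanoTime() before the local clear and invoke endLocalClear(start) in a finally block, mirroring the incClearDuration(System.nanoTime() - clearStartTime) call in the hunks below.
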
##########
File path: geode-core/src/main/java/org/apache/geode/internal/cache/PartitionedRegionClear.java
##########
@@ -367,12 +370,38 @@ void doClear(RegionEventImpl regionEvent, boolean cacheWrite) {
           releaseLockForClear(regionEvent);
         }
       }
-
     } finally {
       releaseDistributedClearLock(lockName);
+      incClearDuration(System.nanoTime() - clearStartTime);
+    }
+  }
+
+  void incClearCount() {
+    if (partitionedRegion != null && partitionedRegion.getDataStore() != null
+        && partitionedRegion.getDataStore().getAllLocalBucketRegions() != null
+        && partitionedRegion.getDataStore().getAllLocalBucketRegions().size() != 0) {
+      CachePerfStats stats = partitionedRegion.getCachePerfStats();
+      if (stats != null) {
+        logger.info("BR inc PR Region count:" + stats.getClass().getName() + ":"

Review comment:
   Done

##########
File path: geode-core/src/main/java/org/apache/geode/internal/cache/PartitionedRegionClear.java
##########
@@ -367,12 +370,38 @@ void doClear(RegionEventImpl regionEvent, boolean cacheWrite) {
           releaseLockForClear(regionEvent);
         }
       }
-
     } finally {
       releaseDistributedClearLock(lockName);
+      incClearDuration(System.nanoTime() - clearStartTime);
+    }
+  }
+
+  void incClearCount() {
+    if (partitionedRegion != null && partitionedRegion.getDataStore() != null
+        && partitionedRegion.getDataStore().getAllLocalBucketRegions() != null
+        && partitionedRegion.getDataStore().getAllLocalBucketRegions().size() != 0) {
+      CachePerfStats stats = partitionedRegion.getCachePerfStats();
+      if (stats != null) {
+        logger.info("BR inc PR Region count:" + stats.getClass().getName() + ":"
+            + partitionedRegion.getFullPath(), new Exception());
+        stats.incClearCount();
+      }
     }
   }
+  void incClearDuration(long durationNanos) {
+    if (partitionedRegion != null && partitionedRegion.getTotalNumberOfBuckets() != 0) {
+      CachePerfStats stats = partitionedRegion.getCachePerfStats();
+      if (stats != null) {
+        logger.info("BR inc PR Duration by + " + durationNanos + " ns:" + stats.getClass().getName()

Review comment:
   Done.

##########
File path: geode-core/src/main/java/org/apache/geode/internal/cache/PartitionedRegionClear.java
##########
@@ -367,12 +370,38 @@ void doClear(RegionEventImpl regionEvent, boolean cacheWrite) {
           releaseLockForClear(regionEvent);
         }
       }
-
     } finally {
       releaseDistributedClearLock(lockName);
+      incClearDuration(System.nanoTime() - clearStartTime);
+    }
+  }
+
+  void incClearCount() {
+    if (partitionedRegion != null && partitionedRegion.getDataStore() != null
+        && partitionedRegion.getDataStore().getAllLocalBucketRegions() != null
+        && partitionedRegion.getDataStore().getAllLocalBucketRegions().size() != 0) {

Review comment:
   Checking that the partitioned region is non-null might be overkill, but the other checks are needed to determine whether this member has data. If there's no data on the member, no clear is actually taking place and we don't want to record a count. I added this many checks because I saw inconsistent behavior in my tests: sometimes the PR would have an empty list of local buckets, and sometimes the list would be null on my accessor. I don't think it's necessary to throw an exception in this case.

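The shape the reviewer seems to be suggesting for the condition in incClearCount() could look roughly like the fragment below (methods as they might sit inside PartitionedRegionClear): the "does this member host data?" test is pulled out into a named predicate, and the increment method returns early instead of nesting. This is a sketch of one possible refactor, not the code actually adopted in the PR; the helper name hasLocalData() is invented for illustration, while the other fields and methods come from the diff above.

  // Sketch: early-return version of incClearCount(), with the compound null/size check
  // extracted into a named predicate.
  void incClearCount() {
    if (!hasLocalData()) {
      // No buckets live on this member, so no local clear happened here; don't count it.
      return;
    }
    CachePerfStats stats = partitionedRegion.getCachePerfStats();
    if (stats != null) {
      stats.incClearCount();
    }
  }

  // True when this member hosts at least one bucket of the partitioned region. The null
  // checks mirror the inconsistent accessor behavior described in the review comment.
  private boolean hasLocalData() {
    return partitionedRegion != null
        && partitionedRegion.getDataStore() != null
        && partitionedRegion.getDataStore().getAllLocalBucketRegions() != null
        && !partitionedRegion.getDataStore().getAllLocalBucketRegions().isEmpty();
  }

The trade-off the author raises later in the review still applies: if the predicate is instead checked at the call site, every future caller has to remember to perform the same check before incrementing.
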
##########
File path: geode-core/src/main/java/org/apache/geode/internal/cache/RegionPerfStats.java
##########
@@ -519,9 +519,15 @@ public void incEvictWorkTime(long delta) {
     cachePerfStats.incEvictWorkTime(delta);
   }

+  @Override
+  public void incRegionClearCount() {
+    stats.incLong(regionClearsId, 1L);
+    cachePerfStats.incClearCount();
+  }
+
   @Override
   public void incClearCount() {
-    stats.incLong(clearsId, 1L);
+    stats.incLong(bucketClearsId, 1L);

Review comment:
   It took me a while to understand what you meant here - but you were correct. There was some weird inheritance going on, stemming from the fact that I didn't understand the relationship between RegionStats, RegionPerfStats, and CachePerfStats. These should be fixed now.

##########
File path: geode-core/src/main/java/org/apache/geode/internal/cache/RegionStats.java
##########
@@ -135,7 +135,9 @@
   void incEvictWorkTime(long delta);

-  void incClearCount();
+  void incBucketClearCount();
+
+  void incRegionClearCount();

Review comment:
   RegionPerfStats and the other classes that called "incClearCount()" now call "incRegionClearCount()", since that name is more descriptive and you're right - they should be consistent. This was previously being blocked by the error you identified in your earlier comment.

##########
File path: geode-core/src/distributedTest/java/org/apache/geode/internal/cache/PartitionedRegionClearDUnitTest.java
##########
@@ -170,6 +170,13 @@ private void verifyClientRegionSize(int expectedNum) {
     return destroys;
   };

+  SerializableCallableIF<Integer> getStatClearCount = () -> {

Review comment:
   Removed.

##########
File path: geode-core/src/distributedTest/java/org/apache/geode/internal/cache/PartitionedRegionClearDUnitTest.java
##########
@@ -330,6 +337,101 @@ public void normalClearFromAccessorWithoutWriterButWithWriterOnDataStore() {
        .isEqualTo(0);
   }

+  @Test
+  public void normalClearFromDataStoreUpdatesStats() {
+    configureServers(false, true);
+    client1.invoke(this::initClientCache);
+    client2.invoke(this::initClientCache);
+
+    // Verify no clears have been recorded in stats
+    dataStore1.invoke(() -> {
+      PartitionedRegion region = (PartitionedRegion) getRegion(false);
+
+      for (BucketRegion bucket : region.getDataStore().getAllLocalBucketRegions()) {
+        long clearCount = bucket.getCachePerfStats().getRegionClearCount();
+        assertThat(clearCount).isEqualTo(0);
+      }
+    });
+
+    accessor.invoke(() -> feed(false));
+    verifyServerRegionSize(NUM_ENTRIES);
+    dataStore1.invoke(() -> getRegion(false).clear());
+    verifyServerRegionSize(0);
+
+    // Verify the stats were properly updated for the bucket regions
+    dataStore1.invoke(() -> {
+      PartitionedRegion region = (PartitionedRegion) getRegion(false);
+      long clearCount = 0L;

Review comment:
   The trick here is that we don't know exactly how many buckets will end up on each datastore. In the current iteration of the test, two members will have 7 buckets and one will have 6 (although I'd imagine you could end up with a slightly different balance in specific scenarios). Originally I changed the test to use a total number of buckets that was evenly divisible by the number of datastores - but that felt like a hardcoded trap for anyone trying to modify the test in the future. In the end I decided it was good enough to verify that the count increments to a consistent non-zero value, even if I can't match that value against a hardcoded expectation - but if you feel this is insufficient I can take another shot at this check.

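Returning to the RegionPerfStats hunk at the top of this comment: once the rename described above is in place, the corrected overrides would presumably update the region's own counter and then delegate to the cache-wide stat of the same name, roughly as sketched here. This is an illustration of the fix the author describes ("these should be fixed now"), not the literal code from the PR, and it assumes CachePerfStats exposes matching incRegionClearCount()/incBucketClearCount() methods after the rename.

  @Override
  public void incRegionClearCount() {
    stats.incLong(regionClearsId, 1L);       // this region's "regionClears" counter
    cachePerfStats.incRegionClearCount();    // roll the operation up into the cache-wide stat
  }

  @Override
  public void incBucketClearCount() {
    stats.incLong(bucketClearsId, 1L);       // this region's "bucketClears" counter
    cachePerfStats.incBucketClearCount();    // roll the operation up into the cache-wide stat
  }
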
##########
File path: geode-core/src/distributedTest/java/org/apache/geode/internal/cache/PartitionedRegionClearDUnitTest.java
##########
@@ -330,6 +337,101 @@ public void normalClearFromAccessorWithoutWriterButWithWriterOnDataStore() {
        .isEqualTo(0);
   }

+  @Test
+  public void normalClearFromDataStoreUpdatesStats() {
+    configureServers(false, true);
+    client1.invoke(this::initClientCache);
+    client2.invoke(this::initClientCache);
+
+    // Verify no clears have been recorded in stats
+    dataStore1.invoke(() -> {
+      PartitionedRegion region = (PartitionedRegion) getRegion(false);
+
+      for (BucketRegion bucket : region.getDataStore().getAllLocalBucketRegions()) {
+        long clearCount = bucket.getCachePerfStats().getRegionClearCount();
+        assertThat(clearCount).isEqualTo(0);
+      }
+    });
+
+    accessor.invoke(() -> feed(false));
+    verifyServerRegionSize(NUM_ENTRIES);
+    dataStore1.invoke(() -> getRegion(false).clear());
+    verifyServerRegionSize(0);
+
+    // Verify the stats were properly updated for the bucket regions
+    dataStore1.invoke(() -> {
+      PartitionedRegion region = (PartitionedRegion) getRegion(false);
+      long clearCount = 0L;
+
+      for (BucketRegion bucket : region.getDataStore().getAllLocalBucketRegions()) {
+        if (clearCount == 0) {
+          clearCount = bucket.getCachePerfStats().getBucketClearCount();
+        }
+        assertThat(bucket.getCachePerfStats().getBucketClearCount()).isEqualTo(clearCount)
+            .isNotEqualTo(0);
+      }
+
+      assertThat(region.getCachePerfStats().getRegionClearCount()).isEqualTo(1);
+      assertThat(region.getCachePerfStats().getPartitionedRegionClearDuration()).isGreaterThan(0);
+    });
+
+    dataStore2.invoke(() -> {
+      PartitionedRegion region = (PartitionedRegion) getRegion(false);
+      long clearCount = 0L;
+
+      for (BucketRegion bucket : region.getDataStore().getAllLocalBucketRegions()) {
+        if (clearCount == 0) {
+          clearCount = bucket.getCachePerfStats().getBucketClearCount();
+        }
+        assertThat(bucket.getCachePerfStats().getBucketClearCount()).isEqualTo(clearCount)
+            .isNotEqualTo(0);
+      }
+
+      assertThat(region.getCachePerfStats().getRegionClearCount()).isEqualTo(1);

Review comment:
   Good call - done.

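One way to tighten the per-bucket loop above without hardcoding the bucket balance discussed earlier would be to derive the expectation from the member's own local bucket set, i.e. assert that after a single clear() the member-wide bucket-clear count equals however many buckets that member happens to host. A sketch only, reusing names from this test (configureServers, feed, getRegion, accessor, dataStore1; java.util.Set imported) and assuming each hosted bucket is cleared exactly once by one clear() and that buckets share their parent region's stats:

  @Test
  public void bucketClearCountMatchesLocalBucketCount() {
    configureServers(false, true);
    accessor.invoke(() -> feed(false));
    dataStore1.invoke(() -> getRegion(false).clear());

    dataStore1.invoke(() -> {
      PartitionedRegion region = (PartitionedRegion) getRegion(false);
      Set<BucketRegion> localBuckets = region.getDataStore().getAllLocalBucketRegions();

      for (BucketRegion bucket : localBuckets) {
        // One clear() should clear every bucket hosted here exactly once, so the
        // member-wide bucketClears value should equal the local bucket count,
        // whatever the balance across datastores turns out to be (7/7/6 or otherwise).
        assertThat(bucket.getCachePerfStats().getBucketClearCount())
            .isEqualTo(localBuckets.size());
      }
    });
  }
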
##########
File path: geode-core/src/distributedTest/java/org/apache/geode/internal/cache/PartitionedRegionClearDUnitTest.java
##########
@@ -330,6 +337,101 @@ public void normalClearFromAccessorWithoutWriterButWithWriterOnDataStore() {
        .isEqualTo(0);
   }

+  @Test
+  public void normalClearFromDataStoreUpdatesStats() {
+    configureServers(false, true);
+    client1.invoke(this::initClientCache);
+    client2.invoke(this::initClientCache);
+
+    // Verify no clears have been recorded in stats
+    dataStore1.invoke(() -> {
+      PartitionedRegion region = (PartitionedRegion) getRegion(false);
+
+      for (BucketRegion bucket : region.getDataStore().getAllLocalBucketRegions()) {
+        long clearCount = bucket.getCachePerfStats().getRegionClearCount();
+        assertThat(clearCount).isEqualTo(0);
+      }
+    });
+
+    accessor.invoke(() -> feed(false));
+    verifyServerRegionSize(NUM_ENTRIES);
+    dataStore1.invoke(() -> getRegion(false).clear());
+    verifyServerRegionSize(0);
+
+    // Verify the stats were properly updated for the bucket regions
+    dataStore1.invoke(() -> {
+      PartitionedRegion region = (PartitionedRegion) getRegion(false);
+      long clearCount = 0L;
+
+      for (BucketRegion bucket : region.getDataStore().getAllLocalBucketRegions()) {
+        if (clearCount == 0) {
+          clearCount = bucket.getCachePerfStats().getBucketClearCount();
+        }
+        assertThat(bucket.getCachePerfStats().getBucketClearCount()).isEqualTo(clearCount)
+            .isNotEqualTo(0);
+      }
+
+      assertThat(region.getCachePerfStats().getRegionClearCount()).isEqualTo(1);
+      assertThat(region.getCachePerfStats().getPartitionedRegionClearDuration()).isGreaterThan(0);
+    });
+
+    dataStore2.invoke(() -> {
+      PartitionedRegion region = (PartitionedRegion) getRegion(false);
+      long clearCount = 0L;
+
+      for (BucketRegion bucket : region.getDataStore().getAllLocalBucketRegions()) {
+        if (clearCount == 0) {
+          clearCount = bucket.getCachePerfStats().getBucketClearCount();
+        }
+        assertThat(bucket.getCachePerfStats().getBucketClearCount()).isEqualTo(clearCount)
+            .isNotEqualTo(0);
+      }
+
+      assertThat(region.getCachePerfStats().getRegionClearCount()).isEqualTo(1);
+    });
+
+    dataStore3.invoke(() -> {
+      PartitionedRegion region = (PartitionedRegion) getRegion(false);
+      long clearCount = 0L;
+
+      for (BucketRegion bucket : region.getDataStore().getAllLocalBucketRegions()) {
+        if (clearCount == 0) {
+          clearCount = bucket.getCachePerfStats().getBucketClearCount();
+        }
+        assertThat(bucket.getCachePerfStats().getBucketClearCount()).isEqualTo(clearCount)
+            .isNotEqualTo(0);
+      }
+
+      assertThat(region.getCachePerfStats().getRegionClearCount()).isEqualTo(1);
+    });
+
+
+    // The accessor shouldn't increment the region clear count
+    accessor.invoke(() -> {
+      PartitionedRegion region = (PartitionedRegion) getRegion(false);
+      assertThat(region.getCachePerfStats().getRegionClearCount()).isEqualTo(0);
+    });
+
+    // do the region destroy to compare that the same callbacks will be triggered
+    dataStore1.invoke(() -> {
+      Region region = getRegion(false);
+      region.destroyRegion();
+    });
+
+    assertThat(dataStore1.invoke(getWriterDestroys)).isEqualTo(dataStore1.invoke(getWriterClears))
+        .isEqualTo(0);
+    assertThat(dataStore2.invoke(getWriterDestroys)).isEqualTo(dataStore2.invoke(getWriterClears))
+        .isEqualTo(0);
+    assertThat(dataStore3.invoke(getWriterDestroys)).isEqualTo(dataStore3.invoke(getWriterClears))
+        .isEqualTo(0);
+    assertThat(accessor.invoke(getWriterDestroys)).isEqualTo(accessor.invoke(getWriterClears))
+        .isEqualTo(1);
+
+    assertThat(accessor.invoke(getBucketRegionWriterDestroys))
+        .isEqualTo(accessor.invoke(getBucketRegionWriterClears))
+        .isEqualTo(0);

Review comment:
   These assertions mostly just check that the cache writer behaved as expected. While they're not strictly necessary, they're present in every other test in this suite, so I opted to leave them in.

##########
File path: geode-core/src/main/java/org/apache/geode/internal/cache/PartitionedRegionClear.java
##########
@@ -367,12 +370,38 @@ void doClear(RegionEventImpl regionEvent, boolean cacheWrite) {
           releaseLockForClear(regionEvent);
         }
       }
-
     } finally {
       releaseDistributedClearLock(lockName);
+      incClearDuration(System.nanoTime() - clearStartTime);
+    }
+  }
+
+  void incClearCount() {
+    if (partitionedRegion != null && partitionedRegion.getDataStore() != null
+        && partitionedRegion.getDataStore().getAllLocalBucketRegions() != null
+        && partitionedRegion.getDataStore().getAllLocalBucketRegions().size() != 0) {
+      CachePerfStats stats = partitionedRegion.getCachePerfStats();
+      if (stats != null) {
+        logger.info("BR inc PR Region count:" + stats.getClass().getName() + ":"
+            + partitionedRegion.getFullPath(), new Exception());
+        stats.incClearCount();
+      }
     }
   }
+  void incClearDuration(long durationNanos) {
+    if (partitionedRegion != null && partitionedRegion.getTotalNumberOfBuckets() != 0) {

Review comment:
   Same reason as above, but I wanted to add that this one was simpler because in the original PR this call only happened on the coordinator. Now it can happen on any member, so I've changed it to check everything that was being checked for the clear count.

##########
File path: geode-core/src/main/java/org/apache/geode/internal/cache/PartitionedRegionClear.java
##########
@@ -367,12 +370,38 @@ void doClear(RegionEventImpl regionEvent, boolean cacheWrite) {
           releaseLockForClear(regionEvent);
         }
       }
-
     } finally {
       releaseDistributedClearLock(lockName);
+      incClearDuration(System.nanoTime() - clearStartTime);
+    }
+  }
+
+  void incClearCount() {
+    if (partitionedRegion != null && partitionedRegion.getDataStore() != null
+        && partitionedRegion.getDataStore().getAllLocalBucketRegions() != null
+        && partitionedRegion.getDataStore().getAllLocalBucketRegions().size() != 0) {

Review comment:
   It could, but that pattern would have to be followed anywhere this clear count was ever incremented elsewhere in the code base, since it would rely on being called inside a conditional. Is that preferable? Personally, I feel the conditions - while a little messy - communicate the requirements of the method pretty clearly, and while I wouldn't promote that approach for a more complex method that is already busy with functionality, this method is simple enough to read even with the conditions. Aside from that, I think what you're suggesting is certainly possible.

> Clear in Partitioned Region should have its own stats
> -----------------------------------------------------
>
>                 Key: GEODE-7846
>                 URL: https://issues.apache.org/jira/browse/GEODE-7846
>             Project: Geode
>          Issue Type: Improvement
>          Components: core
>            Reporter: Xiaojian Zhou
>            Priority: Major
>              Labels: GeodeCommons, GeodeOperationAPI, pull-request-available
>
> Clear operation in PR should have its own stats:
> 1) clear operations executed
> 2) clear operations failed
> 3) clear messages sent to buckets