[ https://issues.apache.org/jira/browse/GEODE-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166997#comment-17166997 ]
ASF GitHub Bot commented on GEODE-7846:
---------------------------------------

BenjaminPerryRoss commented on a change in pull request #5391:
URL: https://github.com/apache/geode/pull/5391#discussion_r461218632

##########
File path: geode-core/src/main/java/org/apache/geode/internal/cache/PartitionedRegionClear.java
##########
@@ -162,6 +162,7 @@ protected void waitForPrimary(PartitionedRegion.RetryTimeKeeper retryTimer) {
         doAfterClear(regionEvent);
       }
     }
+    incClearCount();

Review comment:
   This is part of a larger discussion about what we're actually trying to record with the duration. It's something Gester and I have touched on in the operations standup, and I'll probably post something about it in the Slack channel. In short, the duration in the initial version (the one you commented on) measured the total duration of the entire clear operation from the perspective of the coordinator. In line with your suggestion, it has been moved into the actual local clear. That makes the stat present on every member that hosts the region (as long as it has data), but it also means the value ONLY captures the actual process of clearing the region and not anything that happens in between (messaging, work done on the coordinator other than the local clear, etc.). For now it's been changed to match your thinking here, but if you have other ideas on this I would love to hear them.

##########
File path: geode-core/src/main/java/org/apache/geode/internal/cache/CachePerfStats.java
##########
@@ -442,7 +451,10 @@
         f.createIntCounter("retries",
             "Number of times a concurrent destroy followed by a create has caused an entry operation to need to retry.",
             "operations"),
-        f.createLongCounter("clears", clearsDesc, "operations"),
+        f.createLongCounter("regionClears", regionClearsDesc, "operations"),

Review comment:
   Agreed. I recall there was a documentation story for PR clear, but I'm not sure what happened to it. I'll hunt it down, and if this documentation isn't covered by it I'll make sure to add it to this PR.

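For background on the two hunks above, the sketch below shows how a region-level clear counter, a bucket-level clear counter, and a local-clear duration stat could be declared and updated through Geode's public Statistics API. The descriptor names ("regionClears", "bucketClears", "partitionedRegionClearDuration") follow the names discussed in this PR, but the class, type name, and wiring are illustrative only; the PR's real counters live in CachePerfStats/RegionPerfStats rather than in a standalone type like this.

import org.apache.geode.StatisticDescriptor;
import org.apache.geode.Statistics;
import org.apache.geode.StatisticsFactory;
import org.apache.geode.StatisticsType;
import org.apache.geode.distributed.DistributedSystem;

public class PrClearStatsSketch {
  private final StatisticsType type;
  private final int regionClearsId;
  private final int bucketClearsId;
  private final int clearDurationId;
  private final Statistics stats;

  public PrClearStatsSketch(DistributedSystem system, String regionName) {
    // DistributedSystem implements StatisticsFactory, so it can both define the
    // statistics type and create an instance of it.
    StatisticsFactory f = system;
    StatisticDescriptor[] descriptors = new StatisticDescriptor[] {
        f.createLongCounter("regionClears",
            "Number of clear operations performed on the partitioned region", "operations"),
        f.createLongCounter("bucketClears",
            "Number of clear operations performed on individual buckets", "operations"),
        f.createLongCounter("partitionedRegionClearDuration",
            "Time spent performing the local part of a partitioned region clear", "nanoseconds")};
    type = f.createType("PartitionedRegionClearStatsSketch",
        "Illustrative clear statistics for a partitioned region", descriptors);
    regionClearsId = type.nameToId("regionClears");
    bucketClearsId = type.nameToId("bucketClears");
    clearDurationId = type.nameToId("partitionedRegionClearDuration");
    stats = f.createAtomicStatistics(type, regionName);
  }

  // Record one completed local clear on this member: bump the region-level counter and
  // add the elapsed time, matching the "measure only the local clear" decision above.
  public void endLocalClear(long startNanos) {
    stats.incLong(regionClearsId, 1L);
    stats.incLong(clearDurationId, System.nanoTime() - startNanos);
  }

  // Record that one bucket was cleared on this member.
  public void incBucketClears() {
    stats.incLong(bucketClearsId, 1L);
  }
}

A caller would capture long start = System.nanoTime() before the local clear and invoke endLocalClear(start) in a finally block, mirroring the incClearDuration(System.nanoTime() - clearStartTime) call in the hunks below.
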
##########
File path: geode-core/src/main/java/org/apache/geode/internal/cache/PartitionedRegionClear.java
##########
@@ -367,12 +370,38 @@ void doClear(RegionEventImpl regionEvent, boolean cacheWrite) {
           releaseLockForClear(regionEvent);
         }
       }
-
     } finally {
       releaseDistributedClearLock(lockName);
+      incClearDuration(System.nanoTime() - clearStartTime);
+    }
+  }
+
+  void incClearCount() {
+    if (partitionedRegion != null && partitionedRegion.getDataStore() != null
+        && partitionedRegion.getDataStore().getAllLocalBucketRegions() != null
+        && partitionedRegion.getDataStore().getAllLocalBucketRegions().size() != 0) {
+      CachePerfStats stats = partitionedRegion.getCachePerfStats();
+      if (stats != null) {
+        logger.info("BR inc PR Region count:" + stats.getClass().getName() + ":"

Review comment:
   Done

##########
File path: geode-core/src/main/java/org/apache/geode/internal/cache/PartitionedRegionClear.java
##########
@@ -367,12 +370,38 @@ void doClear(RegionEventImpl regionEvent, boolean cacheWrite) {
           releaseLockForClear(regionEvent);
         }
       }
-
     } finally {
       releaseDistributedClearLock(lockName);
+      incClearDuration(System.nanoTime() - clearStartTime);
+    }
+  }
+
+  void incClearCount() {
+    if (partitionedRegion != null && partitionedRegion.getDataStore() != null
+        && partitionedRegion.getDataStore().getAllLocalBucketRegions() != null
+        && partitionedRegion.getDataStore().getAllLocalBucketRegions().size() != 0) {
+      CachePerfStats stats = partitionedRegion.getCachePerfStats();
+      if (stats != null) {
+        logger.info("BR inc PR Region count:" + stats.getClass().getName() + ":"
+            + partitionedRegion.getFullPath(), new Exception());
+        stats.incClearCount();
+      }
     }
   }
+  void incClearDuration(long durationNanos) {
+    if (partitionedRegion != null && partitionedRegion.getTotalNumberOfBuckets() != 0) {
+      CachePerfStats stats = partitionedRegion.getCachePerfStats();
+      if (stats != null) {
+        logger.info("BR inc PR Duration by + " + durationNanos + " ns:" + stats.getClass().getName()

Review comment:
   Done.

##########
File path: geode-core/src/main/java/org/apache/geode/internal/cache/PartitionedRegionClear.java
##########
@@ -367,12 +370,38 @@ void doClear(RegionEventImpl regionEvent, boolean cacheWrite) {
           releaseLockForClear(regionEvent);
         }
       }
-
     } finally {
       releaseDistributedClearLock(lockName);
+      incClearDuration(System.nanoTime() - clearStartTime);
+    }
+  }
+
+  void incClearCount() {
+    if (partitionedRegion != null && partitionedRegion.getDataStore() != null
+        && partitionedRegion.getDataStore().getAllLocalBucketRegions() != null
+        && partitionedRegion.getDataStore().getAllLocalBucketRegions().size() != 0) {

Review comment:
   Checking that the partitioned region is non-null might be overkill, but the other checks are needed to determine whether this member has data. If there's no data on the member, no clear is actually taking place and we don't want to record a count. I added this many checks because I saw inconsistent behavior in my tests: sometimes the PR would have an empty list of local buckets, and sometimes the list would be null on my accessor. I don't think it's necessary to throw an exception in this case.

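The shape the reviewer seems to be suggesting for the condition in incClearCount() could look roughly like the fragment below (methods as they might sit inside PartitionedRegionClear): the "does this member host data?" test is pulled out into a named predicate, and the increment method returns early instead of nesting. This is a sketch of one possible refactor, not the code actually adopted in the PR; the helper name hasLocalData() is invented for illustration, while the other fields and methods come from the diff above.

  // Sketch: early-return version of incClearCount(), with the compound null/size check
  // extracted into a named predicate.
  void incClearCount() {
    if (!hasLocalData()) {
      // No buckets live on this member, so no local clear happened here; don't count it.
      return;
    }
    CachePerfStats stats = partitionedRegion.getCachePerfStats();
    if (stats != null) {
      stats.incClearCount();
    }
  }

  // True when this member hosts at least one bucket of the partitioned region. The null
  // checks mirror the inconsistent accessor behavior described in the review comment.
  private boolean hasLocalData() {
    return partitionedRegion != null
        && partitionedRegion.getDataStore() != null
        && partitionedRegion.getDataStore().getAllLocalBucketRegions() != null
        && !partitionedRegion.getDataStore().getAllLocalBucketRegions().isEmpty();
  }

The trade-off the author raises later in the review still applies: if the predicate is instead checked at the call site, every future caller has to remember to perform the same check before incrementing.
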
##########
File path: geode-core/src/main/java/org/apache/geode/internal/cache/RegionPerfStats.java
##########
@@ -519,9 +519,15 @@ public void incEvictWorkTime(long delta) {
     cachePerfStats.incEvictWorkTime(delta);
   }

+  @Override
+  public void incRegionClearCount() {
+    stats.incLong(regionClearsId, 1L);
+    cachePerfStats.incClearCount();
+  }
+
   @Override
   public void incClearCount() {
-    stats.incLong(clearsId, 1L);
+    stats.incLong(bucketClearsId, 1L);

Review comment:
   It took me a while to understand what you meant here - but you were correct. There was some weird inheritance going on, stemming from the fact that I didn't understand the relationship between RegionStats, RegionPerfStats, and CachePerfStats. These should be fixed now.

##########
File path: geode-core/src/main/java/org/apache/geode/internal/cache/RegionStats.java
##########
@@ -135,7 +135,9 @@
   void incEvictWorkTime(long delta);

-  void incClearCount();
+  void incBucketClearCount();
+
+  void incRegionClearCount();

Review comment:
   RegionPerfStats and the other classes that called "incClearCount()" now call "incRegionClearCount()", since that name is more descriptive and you're right - they should be consistent. This was previously being blocked by the error you identified in your earlier comment.

##########
File path: geode-core/src/distributedTest/java/org/apache/geode/internal/cache/PartitionedRegionClearDUnitTest.java
##########
@@ -170,6 +170,13 @@ private void verifyClientRegionSize(int expectedNum) {
     return destroys;
   };

+  SerializableCallableIF<Integer> getStatClearCount = () -> {

Review comment:
   Removed.

##########
File path: geode-core/src/distributedTest/java/org/apache/geode/internal/cache/PartitionedRegionClearDUnitTest.java
##########
@@ -330,6 +337,101 @@ public void normalClearFromAccessorWithoutWriterButWithWriterOnDataStore() {
        .isEqualTo(0);
   }

+  @Test
+  public void normalClearFromDataStoreUpdatesStats() {
+    configureServers(false, true);
+    client1.invoke(this::initClientCache);
+    client2.invoke(this::initClientCache);
+
+    // Verify no clears have been recorded in stats
+    dataStore1.invoke(() -> {
+      PartitionedRegion region = (PartitionedRegion) getRegion(false);
+
+      for (BucketRegion bucket : region.getDataStore().getAllLocalBucketRegions()) {
+        long clearCount = bucket.getCachePerfStats().getRegionClearCount();
+        assertThat(clearCount).isEqualTo(0);
+      }
+    });
+
+    accessor.invoke(() -> feed(false));
+    verifyServerRegionSize(NUM_ENTRIES);
+    dataStore1.invoke(() -> getRegion(false).clear());
+    verifyServerRegionSize(0);
+
+    // Verify the stats were properly updated for the bucket regions
+    dataStore1.invoke(() -> {
+      PartitionedRegion region = (PartitionedRegion) getRegion(false);
+      long clearCount = 0L;

Review comment:
   The trick here is that we don't know exactly how many buckets will end up on each datastore. In the current iteration of the test, two members will have 7 buckets and one will have 6 (although I'd imagine you could end up with a slightly different balance in specific scenarios). Originally I changed the test to use a total number of buckets that was evenly divisible by the number of datastores - but that felt like a hardcoded trap for anyone trying to modify the test in the future. In the end I decided it was good enough to verify that the count increments to a consistent non-zero value, even if I can't match that value against a hardcoded expectation - but if you feel this is insufficient I can take another shot at this check.

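Returning to the RegionPerfStats hunk at the top of this comment: once the rename described above is in place, the corrected overrides would presumably update the region's own counter and then delegate to the cache-wide stat of the same name, roughly as sketched here. This is an illustration of the fix the author describes ("these should be fixed now"), not the literal code from the PR, and it assumes CachePerfStats exposes matching incRegionClearCount()/incBucketClearCount() methods after the rename.

  @Override
  public void incRegionClearCount() {
    stats.incLong(regionClearsId, 1L);       // this region's "regionClears" counter
    cachePerfStats.incRegionClearCount();    // roll the operation up into the cache-wide stat
  }

  @Override
  public void incBucketClearCount() {
    stats.incLong(bucketClearsId, 1L);       // this region's "bucketClears" counter
    cachePerfStats.incBucketClearCount();    // roll the operation up into the cache-wide stat
  }
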
##########
File path: geode-core/src/distributedTest/java/org/apache/geode/internal/cache/PartitionedRegionClearDUnitTest.java
##########
@@ -330,6 +337,101 @@ public void normalClearFromAccessorWithoutWriterButWithWriterOnDataStore() {
        .isEqualTo(0);
   }

+  @Test
+  public void normalClearFromDataStoreUpdatesStats() {
+    configureServers(false, true);
+    client1.invoke(this::initClientCache);
+    client2.invoke(this::initClientCache);
+
+    // Verify no clears have been recorded in stats
+    dataStore1.invoke(() -> {
+      PartitionedRegion region = (PartitionedRegion) getRegion(false);
+
+      for (BucketRegion bucket : region.getDataStore().getAllLocalBucketRegions()) {
+        long clearCount = bucket.getCachePerfStats().getRegionClearCount();
+        assertThat(clearCount).isEqualTo(0);
+      }
+    });
+
+    accessor.invoke(() -> feed(false));
+    verifyServerRegionSize(NUM_ENTRIES);
+    dataStore1.invoke(() -> getRegion(false).clear());
+    verifyServerRegionSize(0);
+
+    // Verify the stats were properly updated for the bucket regions
+    dataStore1.invoke(() -> {
+      PartitionedRegion region = (PartitionedRegion) getRegion(false);
+      long clearCount = 0L;
+
+      for (BucketRegion bucket : region.getDataStore().getAllLocalBucketRegions()) {
+        if (clearCount == 0) {
+          clearCount = bucket.getCachePerfStats().getBucketClearCount();
+        }
+        assertThat(bucket.getCachePerfStats().getBucketClearCount()).isEqualTo(clearCount)
+            .isNotEqualTo(0);
+      }
+
+      assertThat(region.getCachePerfStats().getRegionClearCount()).isEqualTo(1);
+      assertThat(region.getCachePerfStats().getPartitionedRegionClearDuration()).isGreaterThan(0);
+    });
+
+    dataStore2.invoke(() -> {
+      PartitionedRegion region = (PartitionedRegion) getRegion(false);
+      long clearCount = 0L;
+
+      for (BucketRegion bucket : region.getDataStore().getAllLocalBucketRegions()) {
+        if (clearCount == 0) {
+          clearCount = bucket.getCachePerfStats().getBucketClearCount();
+        }
+        assertThat(bucket.getCachePerfStats().getBucketClearCount()).isEqualTo(clearCount)
+            .isNotEqualTo(0);
+      }
+
+      assertThat(region.getCachePerfStats().getRegionClearCount()).isEqualTo(1);

Review comment:
   Good call - done.

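One way to tighten the per-bucket loop above without hardcoding the bucket balance discussed earlier would be to derive the expectation from the member's own local bucket set, i.e. assert that after a single clear() the member-wide bucket-clear count equals however many buckets that member happens to host. A sketch only, reusing names from this test (configureServers, feed, getRegion, accessor, dataStore1; java.util.Set imported) and assuming each hosted bucket is cleared exactly once by one clear() and that buckets share their parent region's stats:

  @Test
  public void bucketClearCountMatchesLocalBucketCount() {
    configureServers(false, true);
    accessor.invoke(() -> feed(false));
    dataStore1.invoke(() -> getRegion(false).clear());

    dataStore1.invoke(() -> {
      PartitionedRegion region = (PartitionedRegion) getRegion(false);
      Set<BucketRegion> localBuckets = region.getDataStore().getAllLocalBucketRegions();

      for (BucketRegion bucket : localBuckets) {
        // One clear() should clear every bucket hosted here exactly once, so the
        // member-wide bucketClears value should equal the local bucket count,
        // whatever the balance across datastores turns out to be (7/7/6 or otherwise).
        assertThat(bucket.getCachePerfStats().getBucketClearCount())
            .isEqualTo(localBuckets.size());
      }
    });
  }
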
##########
File path: geode-core/src/distributedTest/java/org/apache/geode/internal/cache/PartitionedRegionClearDUnitTest.java
##########
@@ -330,6 +337,101 @@ public void normalClearFromAccessorWithoutWriterButWithWriterOnDataStore() {
        .isEqualTo(0);
   }

+  @Test
+  public void normalClearFromDataStoreUpdatesStats() {
+    configureServers(false, true);
+    client1.invoke(this::initClientCache);
+    client2.invoke(this::initClientCache);
+
+    // Verify no clears have been recorded in stats
+    dataStore1.invoke(() -> {
+      PartitionedRegion region = (PartitionedRegion) getRegion(false);
+
+      for (BucketRegion bucket : region.getDataStore().getAllLocalBucketRegions()) {
+        long clearCount = bucket.getCachePerfStats().getRegionClearCount();
+        assertThat(clearCount).isEqualTo(0);
+      }
+    });
+
+    accessor.invoke(() -> feed(false));
+    verifyServerRegionSize(NUM_ENTRIES);
+    dataStore1.invoke(() -> getRegion(false).clear());
+    verifyServerRegionSize(0);
+
+    // Verify the stats were properly updated for the bucket regions
+    dataStore1.invoke(() -> {
+      PartitionedRegion region = (PartitionedRegion) getRegion(false);
+      long clearCount = 0L;
+
+      for (BucketRegion bucket : region.getDataStore().getAllLocalBucketRegions()) {
+        if (clearCount == 0) {
+          clearCount = bucket.getCachePerfStats().getBucketClearCount();
+        }
+        assertThat(bucket.getCachePerfStats().getBucketClearCount()).isEqualTo(clearCount)
+            .isNotEqualTo(0);
+      }
+
+      assertThat(region.getCachePerfStats().getRegionClearCount()).isEqualTo(1);
+      assertThat(region.getCachePerfStats().getPartitionedRegionClearDuration()).isGreaterThan(0);
+    });
+
+    dataStore2.invoke(() -> {
+      PartitionedRegion region = (PartitionedRegion) getRegion(false);
+      long clearCount = 0L;
+
+      for (BucketRegion bucket : region.getDataStore().getAllLocalBucketRegions()) {
+        if (clearCount == 0) {
+          clearCount = bucket.getCachePerfStats().getBucketClearCount();
+        }
+        assertThat(bucket.getCachePerfStats().getBucketClearCount()).isEqualTo(clearCount)
+            .isNotEqualTo(0);
+      }
+
+      assertThat(region.getCachePerfStats().getRegionClearCount()).isEqualTo(1);
+    });
+
+    dataStore3.invoke(() -> {
+      PartitionedRegion region = (PartitionedRegion) getRegion(false);
+      long clearCount = 0L;
+
+      for (BucketRegion bucket : region.getDataStore().getAllLocalBucketRegions()) {
+        if (clearCount == 0) {
+          clearCount = bucket.getCachePerfStats().getBucketClearCount();
+        }
+        assertThat(bucket.getCachePerfStats().getBucketClearCount()).isEqualTo(clearCount)
+            .isNotEqualTo(0);
+      }
+
+      assertThat(region.getCachePerfStats().getRegionClearCount()).isEqualTo(1);
+    });
+
+
+    // The accessor shouldn't increment the region clear count
+    accessor.invoke(() -> {
+      PartitionedRegion region = (PartitionedRegion) getRegion(false);
+      assertThat(region.getCachePerfStats().getRegionClearCount()).isEqualTo(0);
+    });
+
+    // do the region destroy to compare that the same callbacks will be triggered
+    dataStore1.invoke(() -> {
+      Region region = getRegion(false);
+      region.destroyRegion();
+    });
+
+    assertThat(dataStore1.invoke(getWriterDestroys)).isEqualTo(dataStore1.invoke(getWriterClears))
+        .isEqualTo(0);
+    assertThat(dataStore2.invoke(getWriterDestroys)).isEqualTo(dataStore2.invoke(getWriterClears))
+        .isEqualTo(0);
+    assertThat(dataStore3.invoke(getWriterDestroys)).isEqualTo(dataStore3.invoke(getWriterClears))
+        .isEqualTo(0);
+    assertThat(accessor.invoke(getWriterDestroys)).isEqualTo(accessor.invoke(getWriterClears))
+        .isEqualTo(1);
+
+    assertThat(accessor.invoke(getBucketRegionWriterDestroys))
+        .isEqualTo(accessor.invoke(getBucketRegionWriterClears))
+        .isEqualTo(0);

Review comment:
   These assertions mostly just check that the cache writer behaved as expected. While they're not strictly necessary, they're present in every other test in this suite, so I opted to leave them in.

##########
File path: geode-core/src/main/java/org/apache/geode/internal/cache/PartitionedRegionClear.java
##########
@@ -367,12 +370,38 @@ void doClear(RegionEventImpl regionEvent, boolean cacheWrite) {
           releaseLockForClear(regionEvent);
         }
       }
-
     } finally {
       releaseDistributedClearLock(lockName);
+      incClearDuration(System.nanoTime() - clearStartTime);
+    }
+  }
+
+  void incClearCount() {
+    if (partitionedRegion != null && partitionedRegion.getDataStore() != null
+        && partitionedRegion.getDataStore().getAllLocalBucketRegions() != null
+        && partitionedRegion.getDataStore().getAllLocalBucketRegions().size() != 0) {
+      CachePerfStats stats = partitionedRegion.getCachePerfStats();
+      if (stats != null) {
+        logger.info("BR inc PR Region count:" + stats.getClass().getName() + ":"
+            + partitionedRegion.getFullPath(), new Exception());
+        stats.incClearCount();
+      }
     }
   }
+  void incClearDuration(long durationNanos) {
+    if (partitionedRegion != null && partitionedRegion.getTotalNumberOfBuckets() != 0) {

Review comment:
   Same reason as above, but I wanted to add that this one was simpler because in the original PR this call only happened on the coordinator. Now it can happen on any member, so I've changed it to check everything that was being checked for the clear count.

##########
File path: geode-core/src/main/java/org/apache/geode/internal/cache/PartitionedRegionClear.java
##########
@@ -367,12 +370,38 @@ void doClear(RegionEventImpl regionEvent, boolean cacheWrite) {
           releaseLockForClear(regionEvent);
         }
       }
-
     } finally {
       releaseDistributedClearLock(lockName);
+      incClearDuration(System.nanoTime() - clearStartTime);
+    }
+  }
+
+  void incClearCount() {
+    if (partitionedRegion != null && partitionedRegion.getDataStore() != null
+        && partitionedRegion.getDataStore().getAllLocalBucketRegions() != null
+        && partitionedRegion.getDataStore().getAllLocalBucketRegions().size() != 0) {

Review comment:
   It could, but that pattern would have to be followed anywhere this clear count was ever incremented elsewhere in the code base, since it would rely on being called inside a conditional. Is that preferable? Personally, I feel the conditions - while a little messy - communicate the requirements of the method pretty clearly, and while I wouldn't promote that approach for a more complex method that is already busy with functionality, this method is simple enough to read even with the conditions. Aside from that, I think what you're suggesting is certainly possible.

> Clear in Partitioned Region should have its own stats
> -----------------------------------------------------
>
>                 Key: GEODE-7846
>                 URL: https://issues.apache.org/jira/browse/GEODE-7846
>             Project: Geode
>          Issue Type: Improvement
>          Components: core
>            Reporter: Xiaojian Zhou
>            Priority: Major
>              Labels: GeodeCommons, GeodeOperationAPI, pull-request-available
>
> Clear operation in PR should have its own stats:
> 1) clear operations executed
> 2) clear operations failed
> 3) clear messages sent to buckets