Kirk Lund created GEODE-10323: --------------------------------- Summary: OffHeapStorageJUnitTest testCreateOffHeapStorage fails with AssertionError: expected:<100> but was:<1048576> Key: GEODE-10323 URL: https://issues.apache.org/jira/browse/GEODE-10323 Project: Geode Issue Type: Bug Components: offheap Reporter: Kirk Lund
{{OffHeapStorageJUnitTest testCreateOffHeapStorage}} started failing intermittently due to commit a350ed2 which went in as "[GEODE-10087|https://issues.apache.org/jira/browse/GEODE-10087]: Enhance off-heap fragmentation visibility. ([#7407|https://github.com/apache/geode/pull/7407/commits/1640effc5c760afa8d9ec4c2743950bb1de0ef8f])". The failure stack looks like: {noformat} OffHeapStorageJUnitTest > testCreateOffHeapStorage FAILED java.lang.AssertionError: expected:<100> but was:<1048576> at org.junit.Assert.fail(Assert.java:89) at org.junit.Assert.failNotEquals(Assert.java:835) at org.junit.Assert.assertEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:633) at org.apache.geode.internal.offheap.OffHeapStorageJUnitTest.testCreateOffHeapStorage(OffHeapStorageJUnitTest.java:220) {noformat} The cause is in {{MemoryAllocatorImpl}}. A new scheduled executor {{updateNonRealTimeStatsExecutor}} was added near the bottom of the constructor which immediately gets scheduled to invoke {{freeList::updateNonRealTimeStats}} repeatedly at the specified frequency of {{updateOffHeapStatsFrequencyMs}} whenever an instance of {{MemoryAllocatorImpl}} is created. {{freeList::updateNonRealTimeStats}} then updates {{setLargestFragment}} and {{setFreedChunks}}. Scheduling or starting any sort of threads from a constructor is considered a bad practice and should only be done via some sort of activation method such as {{start()}} which can be invoked from the static factory method for {{MemoryAllocatorImpl}}. The unit tests would then continue to construct instances via a constructor that does NOT invoke {{start()}}. {{OffHeapStorageJUnitTest testCreateOffHeapStorage}}, and possibly other tests, is written with the assumption that no other thread or component can or will be updating the {{OffHeapStats}}. Hence it is then safe for the test to setLargestFragment and then assert that it has the value passed in: {noformat} 219: stats.setLargestFragment(100); 220: assertEquals(100, stats.getLargestFragment()); {noformat} Line 220 is the source of the assertion failure because {{updateNonRealTimeStatsExecutor}} is racing with the JUnit thread and changes the value of {{setLargestFragment}} immediately after the test sets the value to 100, but before the test invokes the assertion. Looking back at the [PR test results|https://cio.hdb.gemfire-ci.info/commits/pr/7407/junit?days=90] we can see that stress-new-test failed 5 times with this exact failure before being merged to develop: * http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-7407/test-results/repeatTest/1646140652/ * http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-7407/test-results/repeatTest/1646154134/ (x2) * http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-7407/test-results/repeatTest/1646156997/ * http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-7407/test-results/repeatTest/1646170346/ -- This message was sent by Atlassian Jira (v8.20.7#820007)