[
https://issues.apache.org/jira/browse/GEODE-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kirk Lund updated GEODE-10323:
------------------------------
Description:
[Please see 1st comment on this ticket below the description for recommendation
on how to fix.]
{{OffHeapStorageJUnitTest testCreateOffHeapStorage}} started failing
intermittently due to commit a350ed2 which went in as
"[GEODE-10087|https://issues.apache.org/jira/browse/GEODE-10087]: Enhance
off-heap fragmentation visibility.
([#7407|https://github.com/apache/geode/pull/7407/commits/1640effc5c760afa8d9ec4c2743950bb1de0ef8f])".
The failure stack looks like:
{noformat}
OffHeapStorageJUnitTest > testCreateOffHeapStorage FAILED
java.lang.AssertionError: expected:<100> but was:<1048576>
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.failNotEquals(Assert.java:835)
at org.junit.Assert.assertEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:633)
at
org.apache.geode.internal.offheap.OffHeapStorageJUnitTest.testCreateOffHeapStorage(OffHeapStorageJUnitTest.java:220)
{noformat}
The cause is in {{MemoryAllocatorImpl}}. A new scheduled executor
{{updateNonRealTimeStatsExecutor}} was added near the bottom of the constructor
which immediately gets scheduled to invoke {{freeList::updateNonRealTimeStats}}
repeatedly at the specified frequency of {{updateOffHeapStatsFrequencyMs}}
whenever an instance of {{MemoryAllocatorImpl}} is created.
{{freeList::updateNonRealTimeStats}} then updates {{setLargestFragment}} and
{{setFreedChunks}}.
Scheduling or starting any sort of threads from a constructor is considered a
bad practice and should only be done via some sort of activation method such as
{{start()}} which can be invoked from the static factory method for
{{MemoryAllocatorImpl}}. The unit tests would then continue to construct
instances via a constructor that does NOT invoke {{start()}}.
{{OffHeapStorageJUnitTest testCreateOffHeapStorage}}, and possibly other tests,
is written with the assumption that no other thread or component can or will be
updating the {{OffHeapStats}}. Hence it is then safe for the test to
setLargestFragment and then assert that it has the value passed in:
{noformat}
219: stats.setLargestFragment(100);
220: assertEquals(100, stats.getLargestFragment());
{noformat}
Line 220 is the source of the assertion failure because
{{updateNonRealTimeStatsExecutor}} is racing with the JUnit thread and changes
the value of {{setLargestFragment}} immediately after the test sets the value
to 100, but before the test invokes the assertion.
Looking back at the [PR test
results|https://cio.hdb.gemfire-ci.info/commits/pr/7407/junit?days=90] we can
see that stress-new-test failed 5 times with this exact failure before being
merged to develop:
*
http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-7407/test-results/repeatTest/1646140652/
*
http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-7407/test-results/repeatTest/1646154134/
(x2)
*
http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-7407/test-results/repeatTest/1646156997/
*
http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-7407/test-results/repeatTest/1646170346/
was:
{{OffHeapStorageJUnitTest testCreateOffHeapStorage}} started failing
intermittently due to commit a350ed2 which went in as
"[GEODE-10087|https://issues.apache.org/jira/browse/GEODE-10087]: Enhance
off-heap fragmentation visibility.
([#7407|https://github.com/apache/geode/pull/7407/commits/1640effc5c760afa8d9ec4c2743950bb1de0ef8f])".
The failure stack looks like:
{noformat}
OffHeapStorageJUnitTest > testCreateOffHeapStorage FAILED
java.lang.AssertionError: expected:<100> but was:<1048576>
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.failNotEquals(Assert.java:835)
at org.junit.Assert.assertEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:633)
at
org.apache.geode.internal.offheap.OffHeapStorageJUnitTest.testCreateOffHeapStorage(OffHeapStorageJUnitTest.java:220)
{noformat}
The cause is in {{MemoryAllocatorImpl}}. A new scheduled executor
{{updateNonRealTimeStatsExecutor}} was added near the bottom of the constructor
which immediately gets scheduled to invoke {{freeList::updateNonRealTimeStats}}
repeatedly at the specified frequency of {{updateOffHeapStatsFrequencyMs}}
whenever an instance of {{MemoryAllocatorImpl}} is created.
{{freeList::updateNonRealTimeStats}} then updates {{setLargestFragment}} and
{{setFreedChunks}}.
Scheduling or starting any sort of threads from a constructor is considered a
bad practice and should only be done via some sort of activation method such as
{{start()}} which can be invoked from the static factory method for
{{MemoryAllocatorImpl}}. The unit tests would then continue to construct
instances via a constructor that does NOT invoke {{start()}}.
{{OffHeapStorageJUnitTest testCreateOffHeapStorage}}, and possibly other tests,
is written with the assumption that no other thread or component can or will be
updating the {{OffHeapStats}}. Hence it is then safe for the test to
setLargestFragment and then assert that it has the value passed in:
{noformat}
219: stats.setLargestFragment(100);
220: assertEquals(100, stats.getLargestFragment());
{noformat}
Line 220 is the source of the assertion failure because
{{updateNonRealTimeStatsExecutor}} is racing with the JUnit thread and changes
the value of {{setLargestFragment}} immediately after the test sets the value
to 100, but before the test invokes the assertion.
Looking back at the [PR test
results|https://cio.hdb.gemfire-ci.info/commits/pr/7407/junit?days=90] we can
see that stress-new-test failed 5 times with this exact failure before being
merged to develop:
*
http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-7407/test-results/repeatTest/1646140652/
*
http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-7407/test-results/repeatTest/1646154134/
(x2)
*
http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-7407/test-results/repeatTest/1646156997/
*
http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-7407/test-results/repeatTest/1646170346/
> OffHeapStorageJUnitTest testCreateOffHeapStorage fails with AssertionError:
> expected:<100> but was:<1048576>
> ------------------------------------------------------------------------------------------------------------
>
> Key: GEODE-10323
> URL: https://issues.apache.org/jira/browse/GEODE-10323
> Project: Geode
> Issue Type: Bug
> Components: offheap
> Reporter: Kirk Lund
> Priority: Major
> Labels: needsTriage
>
> [Please see 1st comment on this ticket below the description for
> recommendation on how to fix.]
> {{OffHeapStorageJUnitTest testCreateOffHeapStorage}} started failing
> intermittently due to commit a350ed2 which went in as
> "[GEODE-10087|https://issues.apache.org/jira/browse/GEODE-10087]: Enhance
> off-heap fragmentation visibility.
> ([#7407|https://github.com/apache/geode/pull/7407/commits/1640effc5c760afa8d9ec4c2743950bb1de0ef8f])".
> The failure stack looks like:
> {noformat}
> OffHeapStorageJUnitTest > testCreateOffHeapStorage FAILED
> java.lang.AssertionError: expected:<100> but was:<1048576>
> at org.junit.Assert.fail(Assert.java:89)
> at org.junit.Assert.failNotEquals(Assert.java:835)
> at org.junit.Assert.assertEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:633)
> at
> org.apache.geode.internal.offheap.OffHeapStorageJUnitTest.testCreateOffHeapStorage(OffHeapStorageJUnitTest.java:220)
> {noformat}
> The cause is in {{MemoryAllocatorImpl}}. A new scheduled executor
> {{updateNonRealTimeStatsExecutor}} was added near the bottom of the
> constructor which immediately gets scheduled to invoke
> {{freeList::updateNonRealTimeStats}} repeatedly at the specified frequency of
> {{updateOffHeapStatsFrequencyMs}} whenever an instance of
> {{MemoryAllocatorImpl}} is created.
> {{freeList::updateNonRealTimeStats}} then updates {{setLargestFragment}} and
> {{setFreedChunks}}.
> Scheduling or starting any sort of threads from a constructor is considered a
> bad practice and should only be done via some sort of activation method such
> as {{start()}} which can be invoked from the static factory method for
> {{MemoryAllocatorImpl}}. The unit tests would then continue to construct
> instances via a constructor that does NOT invoke {{start()}}.
> {{OffHeapStorageJUnitTest testCreateOffHeapStorage}}, and possibly other
> tests, is written with the assumption that no other thread or component can
> or will be updating the {{OffHeapStats}}. Hence it is then safe for the test
> to setLargestFragment and then assert that it has the value passed in:
> {noformat}
> 219: stats.setLargestFragment(100);
> 220: assertEquals(100, stats.getLargestFragment());
> {noformat}
> Line 220 is the source of the assertion failure because
> {{updateNonRealTimeStatsExecutor}} is racing with the JUnit thread and
> changes the value of {{setLargestFragment}} immediately after the test sets
> the value to 100, but before the test invokes the assertion.
> Looking back at the [PR test
> results|https://cio.hdb.gemfire-ci.info/commits/pr/7407/junit?days=90] we can
> see that stress-new-test failed 5 times with this exact failure before being
> merged to develop:
> *
> http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-7407/test-results/repeatTest/1646140652/
> *
> http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-7407/test-results/repeatTest/1646154134/
> (x2)
> *
> http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-7407/test-results/repeatTest/1646156997/
> *
> http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-7407/test-results/repeatTest/1646170346/
--
This message was sent by Atlassian Jira
(v8.20.7#820007)