[jira] [Commented] (GEODE-10323) OffHeapStorageJUnitTest testCreateOffHeapStorage fails with AssertionError: expected:<100> but was:<1048576>

2022-07-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567111#comment-17567111
 ] 

ASF subversion and git services commented on GEODE-10323:
-

Commit e67e38e5374282fb2fae868dbc2f9aa3dbe935fe in geode's branch 
refs/heads/develop from Alberto Gomez
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=e67e38e537 ]

GEODE-10323: Add small changes after review (#7819)



> OffHeapStorageJUnitTest testCreateOffHeapStorage fails with AssertionError: 
> expected:<100> but was:<1048576>
> 
>
> Key: GEODE-10323
> URL: https://issues.apache.org/jira/browse/GEODE-10323
> Project: Geode
>  Issue Type: Bug
>  Components: offheap
>Affects Versions: 1.16.0
>Reporter: Kirk Lund
>Assignee: Alberto Gomez
>Priority: Major
>  Labels: pull-request-available
>
> [Please see 1st comment on this ticket below the description for 
> recommendation on how to fix.]
> {{OffHeapStorageJUnitTest testCreateOffHeapStorage}} started failing 
> intermittently due to commit a350ed2 which went in as 
> "[GEODE-10087|https://issues.apache.org/jira/browse/GEODE-10087]: Enhance 
> off-heap fragmentation visibility. 
> ([#7407|https://github.com/apache/geode/pull/7407/commits/1640effc5c760afa8d9ec4c2743950bb1de0ef8f])".
> The failure stack looks like:
> {noformat}
> OffHeapStorageJUnitTest > testCreateOffHeapStorage FAILED
> java.lang.AssertionError: expected:<100> but was:<1048576>
> at org.junit.Assert.fail(Assert.java:89)
> at org.junit.Assert.failNotEquals(Assert.java:835)
> at org.junit.Assert.assertEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:633)
> at 
> org.apache.geode.internal.offheap.OffHeapStorageJUnitTest.testCreateOffHeapStorage(OffHeapStorageJUnitTest.java:220)
> {noformat}
> The cause is in {{MemoryAllocatorImpl}}. A new scheduled executor 
> {{updateNonRealTimeStatsExecutor}} was added near the bottom of the 
> constructor which immediately gets scheduled to invoke 
> {{freeList::updateNonRealTimeStats}} repeatedly at the specified frequency of 
> {{updateOffHeapStatsFrequencyMs}} whenever an instance of 
> {{MemoryAllocatorImpl}} is created.
> {{freeList::updateNonRealTimeStats}} then updates {{setLargestFragment}} and 
> {{setFreedChunks}}.
> Scheduling or starting any sort of threads from a constructor is considered a 
> bad practice and should only be done via some sort of activation method such 
> as {{start()}} which can be invoked from the static factory method for 
> {{MemoryAllocatorImpl}}. The unit tests would then continue to construct 
> instances via a constructor that does NOT invoke {{start()}}.
> {{OffHeapStorageJUnitTest testCreateOffHeapStorage}}, and possibly other 
> tests, is written with the assumption that no other thread or component can 
> or will be updating the {{OffHeapStats}}. Hence it is then safe for the test 
> to setLargestFragment and then assert that it has the value passed in:
> {noformat}
> 219:  stats.setLargestFragment(100);
> 220:  assertEquals(100, stats.getLargestFragment());
> {noformat}
> Line 220 is the source of the assertion failure because 
> {{updateNonRealTimeStatsExecutor}} is racing with the JUnit thread and 
> changes the value of {{setLargestFragment}} immediately after the test sets 
> the value to 100, but before the test invokes the assertion.
> Looking back at the [PR test 
> results|https://cio.hdb.gemfire-ci.info/commits/pr/7407/junit?days=90] we can 
> see that stress-new-test failed 5 times with this exact failure before being 
> merged to develop:
> * 
> http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-7407/test-results/repeatTest/1646140652/
> * 
> http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-7407/test-results/repeatTest/1646154134/
>  (x2)
> * 
> http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-7407/test-results/repeatTest/1646156997/
> * 
> http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-7407/test-results/repeatTest/1646170346/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (GEODE-10344) Send alert when thread stuck for longer than configured time

2022-07-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567112#comment-17567112
 ] 

ASF subversion and git services commented on GEODE-10344:
-

Commit acdc18e31e5ea6e7e34d0d3475ea17e8efc5b65d in geode's branch 
refs/heads/develop from Alberto Gomez
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=acdc18e31e ]

GEODE-10344: Send alert when thread stuck for long (#7747)

This is the implementation of the feature
described in RFC:
https://cwiki.apache.org/confluence/display/GEODE/Management+of+threads+stuck+for+a+long+time+in+Geode

> Send alert when thread stuck for longer than configured time
> 
>
> Key: GEODE-10344
> URL: https://issues.apache.org/jira/browse/GEODE-10344
> Project: Geode
>  Issue Type: New Feature
>  Components: management
>Reporter: Alberto Gomez
>Assignee: Alberto Gomez
>Priority: Major
>  Labels: pull-request-available
>
> As per RFC 
> [https://cwiki.apache.org/confluence/display/GEODE/Management+of+threads+stuck+for+a+long+time+in+Geode],
>  it is proposed to send an alert when a thread in a member has been stuck for 
> a maximum configurable value.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (GEODE-10344) Send alert when thread stuck for longer than configured time

2022-07-14 Thread Alberto Gomez (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alberto Gomez resolved GEODE-10344.
---
Fix Version/s: 1.16.0
   Resolution: Fixed

> Send alert when thread stuck for longer than configured time
> 
>
> Key: GEODE-10344
> URL: https://issues.apache.org/jira/browse/GEODE-10344
> Project: Geode
>  Issue Type: New Feature
>  Components: management
>Reporter: Alberto Gomez
>Assignee: Alberto Gomez
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> As per RFC 
> [https://cwiki.apache.org/confluence/display/GEODE/Management+of+threads+stuck+for+a+long+time+in+Geode],
>  it is proposed to send an alert when a thread in a member has been stuck for 
> a maximum configurable value.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)