[jira] [Assigned] (GEODE-8687) Durable client is continuously re-registering CQs on all servers when event de-serialization fails causing resource exhaustion on servers

2020-11-05 Thread Jakov Varenina (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakov Varenina reassigned GEODE-8687:
-

Assignee: Jakov Varenina

> Durable client is continuously re-registering CQs on all servers when event 
> de-serialization fails causing resource exhaustion on servers 
> --
>
> Key: GEODE-8687
> URL: https://issues.apache.org/jira/browse/GEODE-8687
> Project: Geode
>  Issue Type: Bug
>  Components: client/server
>Affects Versions: 1.13.0
>Reporter: Jakov Varenina
>Assignee: Jakov Varenina
>Priority: Major
> Attachments: deserialzationFault.log
>
>
> When ReflectionBasedAutoSerializer is misconfigured or not set at all, the 
> client hits a serialization exception when it receives CQ events. The 
> exception is not logged, which is misleading and makes it hard to discover 
> that ReflectionBasedAutoSerializer is the actual cause. The only visible log 
> entries say that the client/server subscription connections were closed due 
> to EOF. This is because the client destroys the subscription connections 
> intentionally but does not log the reason (PdxSerializationException) that 
> led to it. Serialization exceptions should be logged at error or warn level.
> The client destroys the subscription connection and performs a server 
> fail-over whenever a serialization issue occurs. In addition, when the 
> subscription connection to a particular server fails multiple times, that 
> server is put on the deny list for 10 seconds (configurable with 
> {{ping-interval}}). Once the 10s expire, the server is removed from the list 
> and becomes available for a subscription connection again, which is then 
> destroyed again because of the serialization issue. This repeats 
> indefinitely, so in this case the client subscribes to each server at least 
> once roughly every 10s. Because of the serialization issue, events are not 
> delivered to the client and remain in the subscription queues.
> When the connection fails because of a serialization issue and the client is 
> not durable, the subscription queue is closed and the events are lost.
> The biggest problem arises when the client is durable, because the 
> subscription queue then remains on the server for a configurable period 
> (e.g. 300s), waiting for the client to reconnect. When the client fails over 
> to another server, it creates a new subscription queue using the initial 
> image from the old queue, which is currently paused. All events from the old 
> queue are therefore transferred to the new subscription queue hosted by the 
> current primary server. This happens on every server, so all of them end up 
> with a copy of the queue even if subscription redundancy is not configured. 
> Since the client periodically (every 10s in this case) establishes a 
> connection to each server, the configured timeout (e.g. 300s) never expires; 
> it is renewed each time the client registers. This can cause serious 
> problems, because memory and disk usage (if overflow is configured for the 
> queue) keep increasing on all servers.
> The attached logs cover the problematic case with a durable client:
> vm0 -> locator
> vm1, vm2 -> servers
> vm3 -> durable client with subscription enabled, handling CQ events
> vm4 -> client generating traffic that should trigger the registered CQ
>  
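
For illustration only, the timeout-renewal effect described in GEODE-8687 can be modelled with a small self-contained simulation. This is a sketch under stated assumptions, not Geode code: the class and method names are hypothetical, time is counted in whole seconds, and the 300s durable timeout and 10s reconnect interval are simply the example values from the description.

```java
public class DurableQueueTimeoutSketch {

    /**
     * Hypothetical model: the server drops a durable queue once
     * durableTimeoutSec seconds have passed since the client last registered.
     * The client re-registers every reconnectIntervalSec seconds
     * (0 means it never comes back).
     *
     * Returns the second at which the queue is dropped, or -1 if it is still
     * alive at the end of the observed horizon.
     */
    public static long queueExpiry(long durableTimeoutSec,
                                   long reconnectIntervalSec,
                                   long horizonSec) {
        long lastRegistration = 0;
        for (long now = 0; now <= horizonSec; now++) {
            // Expiry is checked first, so a tie goes to the timeout.
            if (now - lastRegistration >= durableTimeoutSec) {
                return now;
            }
            // Each re-registration renews the timeout.
            if (reconnectIntervalSec > 0 && now > 0 && now % reconnectIntervalSec == 0) {
                lastRegistration = now;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Client re-registers every 10s: the 300s timeout is never reached.
        System.out.println(queueExpiry(300, 10, 3600)); // prints -1
        // Client never returns: the queue is dropped after 300s.
        System.out.println(queueExpiry(300, 0, 3600));  // prints 300
    }
}
```

In this model the queue is released only when the reconnect interval is at least as long as the durable timeout, which matches the behaviour reported above: a 10s deny-list cycle keeps a 300s queue alive indefinitely.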



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8547) Command "show missing-disk-stores" not working, when all servers are down

2020-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226620#comment-17226620
 ] 

ASF GitHub Bot commented on GEODE-8547:
---

mivanac merged pull request #5567:
URL: https://github.com/apache/geode/pull/5567


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Command "show missing-disk-stores" not working, when all servers are down
> -
>
> Key: GEODE-8547
> URL: https://issues.apache.org/jira/browse/GEODE-8547
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Reporter: Mario Ivanac
>Assignee: Mario Ivanac
>Priority: Major
>  Labels: needs-review, pull-request-available
>
> If a cluster with 2 locators and 2 servers is shut down ungracefully, the 
> locators that are able to start up may not have the most recent data needed 
> to bring up the Cluster Configuration Service.
> If we execute the command "show missing-disk-stores", it does not work because 
> all servers are down, so we are stuck in this situation.





[jira] [Resolved] (GEODE-8547) Command "show missing-disk-stores" not working, when all servers are down

2020-11-05 Thread Mario Ivanac (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mario Ivanac resolved GEODE-8547.
-
Fix Version/s: 1.14.0
   Resolution: Fixed

> Command "show missing-disk-stores" not working, when all servers are down
> -
>
> Key: GEODE-8547
> URL: https://issues.apache.org/jira/browse/GEODE-8547
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Reporter: Mario Ivanac
>Assignee: Mario Ivanac
>Priority: Major
>  Labels: needs-review, pull-request-available
> Fix For: 1.14.0
>
>
> If a cluster with 2 locators and 2 servers is shut down ungracefully, the 
> locators that are able to start up may not have the most recent data needed 
> to bring up the Cluster Configuration Service.
> If we execute the command "show missing-disk-stores", it does not work because 
> all servers are down, so we are stuck in this situation.





[jira] [Commented] (GEODE-8547) Command "show missing-disk-stores" not working, when all servers are down

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226626#comment-17226626
 ] 

ASF subversion and git services commented on GEODE-8547:


Commit 7cc14eef52e06fe1e8c56bd766df56297b9c9ff8 in geode's branch 
refs/heads/develop from Mario Ivanac
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=7cc14ee ]

GEODE-8547: Added impacts to show missing disk-stores (#5567)

* GEODE-8547: Added impacts to show missing disk-stores

* GEODE-8547: Added DUnit test

* GEODE-8547: update after comments

* GEODE-8547: remove unused variables

* GEODE-8547: update test

> Command "show missing-disk-stores" not working, when all servers are down
> -
>
> Key: GEODE-8547
> URL: https://issues.apache.org/jira/browse/GEODE-8547
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Reporter: Mario Ivanac
>Assignee: Mario Ivanac
>Priority: Major
>  Labels: needs-review, pull-request-available
> Fix For: 1.14.0
>
>
> If a cluster with 2 locators and 2 servers is shut down ungracefully, the 
> locators that are able to start up may not have the most recent data needed 
> to bring up the Cluster Configuration Service.
> If we execute the command "show missing-disk-stores", it does not work because 
> all servers are down, so we are stuck in this situation.





[jira] [Assigned] (GEODE-8688) Flaxy C++ Native client integration test cases: PartitionRegionOpsTest.[get|put]PartitionedRegionWithRedundancyServerGoesDownSingleHop

2020-11-05 Thread Alberto Gomez (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alberto Gomez reassigned GEODE-8688:


Assignee: Alberto Gomez

> Flaxy C++ Native client integration test cases: 
> PartitionRegionOpsTest.[get|put]PartitionedRegionWithRedundancyServerGoesDownSingleHop
> --
>
> Key: GEODE-8688
> URL: https://issues.apache.org/jira/browse/GEODE-8688
> Project: Geode
>  Issue Type: Bug
>  Components: native client
>Affects Versions: 1.13.0
>Reporter: Alberto Gomez
>Assignee: Alberto Gomez
>Priority: Major
>
> The following test cases for the C++ native client are flaky:
> PartitionRegionOpsTest.getPartitionedRegionWithRedundancyServerGoesDownSingleHop
> PartitionRegionOpsTest.putPartitionedRegionWithRedundancyServerGoesDownSingleHop
>  
> They fail very often when run in CI although I have not seen them fail when 
> executed manually.
>  





[jira] [Created] (GEODE-8688) Flaxy C++ Native client integration test cases: PartitionRegionOpsTest.[get|put]PartitionedRegionWithRedundancyServerGoesDownSingleHop

2020-11-05 Thread Alberto Gomez (Jira)
Alberto Gomez created GEODE-8688:


 Summary: Flaxy C++ Native client integration test cases: 
PartitionRegionOpsTest.[get|put]PartitionedRegionWithRedundancyServerGoesDownSingleHop
 Key: GEODE-8688
 URL: https://issues.apache.org/jira/browse/GEODE-8688
 Project: Geode
  Issue Type: Bug
  Components: native client
Affects Versions: 1.13.0
Reporter: Alberto Gomez


The following test cases for the C++ native client are flaky:

PartitionRegionOpsTest.getPartitionedRegionWithRedundancyServerGoesDownSingleHop

PartitionRegionOpsTest.putPartitionedRegionWithRedundancyServerGoesDownSingleHop

 

They fail very often when run in CI although I have not seen them fail when 
executed manually.

 





[jira] [Commented] (GEODE-8688) Flaxy C++ Native client integration test cases: PartitionRegionOpsTest.[get|put]PartitionedRegionWithRedundancyServerGoesDownSingleHop

2020-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226730#comment-17226730
 ] 

ASF GitHub Bot commented on GEODE-8688:
---

albertogpz opened a new pull request #686:
URL: https://github.com/apache/geode-native/pull/686


   The following integration test cases under
   integration/test (new integration tests)
   are flaky (they do not normally fail when run locally
   but fail very often when run in CI).
   
   - PartitionRegionOpsTest.getPartitionedRegionWithRedundancyServerGoesDownSingleHop
   - PartitionRegionOpsTest.putPartitionedRegionWithRedundancyServerGoesDownSingleHop
   
   There are two reasons they can fail.
   
   One is that sometimes the connections to the server have expired
   before the server is restarted and therefore, when traffic is sent
   to the restarted server, no errors are found. To fix this,
   the pool configuration for the test client
   has been changed so that connections do not expire.
   
   The other is that sometimes the connection error is
   found by the ping thread, which invokes the
   ThinClientPoolDM::sendRequestToEP() method, and in this method,
   when an IO error or TIMEOUT error is encountered,
   the endpoint is not removed from the metadata (by means of the
   removeBucketServerLocation method).
   The code has been updated to remove the endpoint from the metadata in this
   case as well.
   
   With these two changes, the test cases are no longer flaky.
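
   The second fix can be pictured with a minimal sketch. It is written in Java purely for illustration and is not the geode-native C++ code: the class, the map layout, and `onRequestResult` are hypothetical, and only the name `removeBucketServerLocation` is taken from the description above. The idea shown is that any request path, including a ping, that sees an IO or timeout error drops the endpoint from the bucket metadata.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BucketMetadataSketch {

    /** Outcome of a request sent to an endpoint (hypothetical). */
    public enum Result { OK, IO_ERROR, TIMEOUT }

    // bucket id -> servers believed to host that bucket
    private final Map<Integer, List<String>> bucketToServers = new HashMap<>();

    public void addLocation(int bucketId, String server) {
        bucketToServers.computeIfAbsent(bucketId, k -> new ArrayList<>()).add(server);
    }

    /** Forget a server for every bucket it was listed under. */
    public void removeBucketServerLocation(String server) {
        for (List<String> servers : bucketToServers.values()) {
            servers.remove(server);
        }
    }

    public List<String> serversFor(int bucketId) {
        return bucketToServers.getOrDefault(bucketId, new ArrayList<>());
    }

    /**
     * Called for every request result, whichever thread sent the request.
     * IO and timeout errors both invalidate the endpoint's metadata.
     */
    public void onRequestResult(String server, Result result) {
        if (result == Result.IO_ERROR || result == Result.TIMEOUT) {
            removeBucketServerLocation(server);
        }
    }

    public static void main(String[] args) {
        BucketMetadataSketch metadata = new BucketMetadataSketch();
        metadata.addLocation(0, "server1");
        metadata.addLocation(0, "server2");
        // A ping to server1 times out, so single-hop routing must stop using it.
        metadata.onRequestResult("server1", Result.TIMEOUT);
        System.out.println(metadata.serversFor(0)); // prints [server2]
    }
}
```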





> Flaxy C++ Native client integration test cases: 
> PartitionRegionOpsTest.[get|put]PartitionedRegionWithRedundancyServerGoesDownSingleHop
> --
>
> Key: GEODE-8688
> URL: https://issues.apache.org/jira/browse/GEODE-8688
> Project: Geode
>  Issue Type: Bug
>  Components: native client
>Affects Versions: 1.13.0
>Reporter: Alberto Gomez
>Assignee: Alberto Gomez
>Priority: Major
>
> The following test cases for the C++ native client are flaky:
> PartitionRegionOpsTest.getPartitionedRegionWithRedundancyServerGoesDownSingleHop
> PartitionRegionOpsTest.putPartitionedRegionWithRedundancyServerGoesDownSingleHop
>  
> They fail very often when run in CI although I have not seen them fail when 
> executed manually.
>  





[jira] [Updated] (GEODE-8688) Flaxy C++ Native client integration test cases: PartitionRegionOpsTest.[get|put]PartitionedRegionWithRedundancyServerGoesDownSingleHop

2020-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated GEODE-8688:
--
Labels: pull-request-available  (was: )

> Flaxy C++ Native client integration test cases: 
> PartitionRegionOpsTest.[get|put]PartitionedRegionWithRedundancyServerGoesDownSingleHop
> --
>
> Key: GEODE-8688
> URL: https://issues.apache.org/jira/browse/GEODE-8688
> Project: Geode
>  Issue Type: Bug
>  Components: native client
>Affects Versions: 1.13.0
>Reporter: Alberto Gomez
>Assignee: Alberto Gomez
>Priority: Major
>  Labels: pull-request-available
>
> The following test cases for the C++ native client are flaky:
> PartitionRegionOpsTest.getPartitionedRegionWithRedundancyServerGoesDownSingleHop
> PartitionRegionOpsTest.putPartitionedRegionWithRedundancyServerGoesDownSingleHop
>  
> They fail very often when run in CI although I have not seen them fail when 
> executed manually.
>  





[jira] [Commented] (GEODE-8689) CI Failure: DistributedAckPersistentRegionCCEDUnitTest > testConcurrentEventsOnEmptyRegion FAILED

2020-11-05 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226737#comment-17226737
 ] 

Geode Integration commented on GEODE-8689:
--

Seen in [DistributedTestOpenJDK11 #574|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK11/builds/574]
... see [test results|http://files.apachegeode-ci.info/builds/apache-develop-main/1.14.0-build.0467/test-results/distributedTest/1604579602/]
or download [artifacts|http://files.apachegeode-ci.info/builds/apache-develop-main/1.14.0-build.0467/test-artifacts/1604579602/distributedtestfiles-OpenJDK11-1.14.0-build.0467.tgz].

> CI Failure: DistributedAckPersistentRegionCCEDUnitTest > 
> testConcurrentEventsOnEmptyRegion FAILED
> -
>
> Key: GEODE-8689
> URL: https://issues.apache.org/jira/browse/GEODE-8689
> Project: Geode
>  Issue Type: Bug
>Reporter: Sarah Abbey
>Priority: Major
>
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK11/builds/574
> {code:java}
> org.awaitility.core.ConditionTimeoutException: Condition with alias 'Wait for 
> the members to eventually be consistent' didn't complete within 5 minutes 
> because assertion condition defined as a lambda expression in 
> org.apache.geode.cache30.MultiVMRegionTestCase [region contents are not 
> consistent for cckey7] expected:<"ccvalue513398912"> but was:.
>   at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:165)
>   at 
> org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119)
>   at 
> org.awaitility.core.AssertionCondition.await(AssertionCondition.java:31)
>   at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:895)
>   at 
> org.awaitility.core.ConditionFactory.untilAsserted(ConditionFactory.java:679)
> ...
> {code}





[jira] [Created] (GEODE-8689) CI Failure: DistributedAckPersistentRegionCCEDUnitTest > testConcurrentEventsOnEmptyRegion FAILED

2020-11-05 Thread Sarah Abbey (Jira)
Sarah Abbey created GEODE-8689:
--

 Summary: CI Failure: DistributedAckPersistentRegionCCEDUnitTest > 
testConcurrentEventsOnEmptyRegion FAILED
 Key: GEODE-8689
 URL: https://issues.apache.org/jira/browse/GEODE-8689
 Project: Geode
  Issue Type: Bug
Reporter: Sarah Abbey


https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK11/builds/574


{code:java}
org.awaitility.core.ConditionTimeoutException: Condition with alias 'Wait for 
the members to eventually be consistent' didn't complete within 5 minutes 
because assertion condition defined as a lambda expression in 
org.apache.geode.cache30.MultiVMRegionTestCase [region contents are not 
consistent for cckey7] expected:<"ccvalue513398912"> but was:.
at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:165)
at 
org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119)
at 
org.awaitility.core.AssertionCondition.await(AssertionCondition.java:31)
at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:895)
at 
org.awaitility.core.ConditionFactory.untilAsserted(ConditionFactory.java:679)
...
{code}







[jira] [Commented] (GEODE-8537) Memory increases whenever LRU eviction is enabled

2020-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226923#comment-17226923
 ] 

ASF GitHub Bot commented on GEODE-8537:
---

gaussianrecurrence opened a new pull request #687:
URL: https://github.com/apache/geode-native/pull/687


   - Whenever LRU eviction was enabled, a slight increase in memory usage was
     observed, specifically in a scenario in which a set of entries is
     continuously created and destroyed.
   - The problem was that entries were inserted into the LRUList but not removed
     until LRU eviction happened, and in the case described above that never
     happened.
   - The solution was to replace the LRUList with a refactored version called
     LRUQueue and to remove entries from the queue upon destroy or invalidation.
   - A deadlock between EvictionController and EvictionThread has also been
     solved. However, this part still needs a refactor.
   - Unit tests have been added for the LRUQueue.
   - Integration tests for LRU eviction have been added to the new integration
     tests.
   - A wrongly implemented LRU eviction test was removed from the old
     integration tests.
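
   The idea behind removing entries on destroy can be sketched as follows. This is a Java illustration only, not the geode-native LRUQueue: the class and its methods are hypothetical, and a `LinkedHashSet` merely stands in for whatever structure the real queue uses. It shows that when destroy also removes the key from the LRU bookkeeping, a create/destroy churn no longer grows the queue.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;

public class LruQueueSketch {

    // Insertion order doubles as recency order: oldest key first.
    private final LinkedHashSet<String> order = new LinkedHashSet<>();

    /** Record a create or access: move the key to the most-recent end. */
    public void touch(String key) {
        order.remove(key);
        order.add(key);
    }

    /** Record a destroy or invalidation: forget the key immediately. */
    public void remove(String key) {
        order.remove(key);
    }

    /** Evict and return the least recently used key, or null if empty. */
    public String evictOldest() {
        Iterator<String> it = order.iterator();
        if (!it.hasNext()) {
            return null;
        }
        String oldest = it.next();
        it.remove();
        return oldest;
    }

    public int size() {
        return order.size();
    }

    public static void main(String[] args) {
        LruQueueSketch queue = new LruQueueSketch();
        // Entries are continuously created and destroyed.
        for (int i = 0; i < 1000; i++) {
            queue.touch("key" + i);
            queue.remove("key" + i);
        }
        System.out.println(queue.size()); // prints 0
    }
}
```

   Without the `remove` call in the loop, the same churn would leave 1000 stale keys in the queue until an eviction pass happened to run.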
   
   





> Memory increases whenever LRU eviction is enabled
> -
>
> Key: GEODE-8537
> URL: https://issues.apache.org/jira/browse/GEODE-8537
> Project: Geode
>  Issue Type: Bug
>  Components: native client
>Affects Versions: 1.13.0
>Reporter: Mario Salazar de Torres
>Assignee: Mario Salazar de Torres
>Priority: Major
> Attachments: massif-8419.png, massif.out.8419
>
>
> *HAVING* configured concurrency-checks-enabled=false in the client-cache.xml 
> for a region
> *HAVING* configured heap-lru-limit=10 in the client-cache.xml for the region
> *HAVING* configured heap-lru-delta=10 in the client-cache.xml for the region
> *HAVING* configured subscription-notification for the pool on which the 
> region is defined
> *HAVING* registered interest on all the keys of this region, values included
> *AFTER* receiving lots of LOCAL_CREATE and LOCAL_DESTROY notifications
> *THEN* memory increases continuously over time, even going over the LRU limit.
> See the massif tool report massif.out.8419, which shows the memory increase.
> Also this is a capture of massif-visualizer for the report:
> !massif-8419.png!





[jira] [Updated] (GEODE-8537) Memory increases whenever LRU eviction is enabled

2020-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated GEODE-8537:
--
Labels: pull-request-available  (was: )

> Memory increases whenever LRU eviction is enabled
> -
>
> Key: GEODE-8537
> URL: https://issues.apache.org/jira/browse/GEODE-8537
> Project: Geode
>  Issue Type: Bug
>  Components: native client
>Affects Versions: 1.13.0
>Reporter: Mario Salazar de Torres
>Assignee: Mario Salazar de Torres
>Priority: Major
>  Labels: pull-request-available
> Attachments: massif-8419.png, massif.out.8419
>
>
> *HAVING* configured concurrency-checks-enabled=false in the client-cache.xml 
> for a region
> *HAVING* configured heap-lru-limit=10 in the client-cache.xml for the region
> *HAVING* configured heap-lru-delta=10 in the client-cache.xml for the region
> *HAVING* configured subscription-notification for the pool on which the 
> region is defined
> *HAVING* registered interest on all the keys of this region, values included
> *AFTER* receiving lots of LOCAL_CREATE and LOCAL_DESTROY notifications
> *THEN* memory increases continuously over time, even going over the LRU limit.
> See the massif tool report massif.out.8419, which shows the memory increase.
> Also this is a capture of massif-visualizer for the report:
> !massif-8419.png!





[jira] [Commented] (GEODE-8666) Enforce warning no-non-virtual-dtor

2020-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226926#comment-17226926
 ] 

ASF GitHub Bot commented on GEODE-8666:
---

gaussianrecurrence commented on pull request #680:
URL: https://github.com/apache/geode-native/pull/680#issuecomment-722570740


   > I'll close this after @gaussianrecurrence reports back with any learnings from why the ABI compliance tool says it is fine (when I also agree it is not). Will also update the JIRA story to reflect that it can't be done yet
   
   Sadly, nobody seems to know the reason. I guess, at least from my side, it will 
   have to remain a mystery. If you happen to ever find out why, please let me 
   know :S





> Enforce warning no-non-virtual-dtor
> ---
>
> Key: GEODE-8666
> URL: https://issues.apache.org/jira/browse/GEODE-8666
> Project: Geode
>  Issue Type: Improvement
>  Components: native client
>Reporter: Michael Oleske
>Priority: Major
>  Labels: pull-request-available
>
> Given I compile the code without exempting no-non-virtual-dtor
> Then it should compile
> Note - was marked as a todo





[jira] [Updated] (GEODE-8614) Provide an specific client-side exception for server LowMemoryException

2020-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated GEODE-8614:
--
Labels: pull-request-available  (was: )

> Provide an specific client-side exception for server LowMemoryException
> ---
>
> Key: GEODE-8614
> URL: https://issues.apache.org/jira/browse/GEODE-8614
> Project: Geode
>  Issue Type: Improvement
>  Components: native client
>Affects Versions: 1.11.0, 1.12.0, 1.13.0
>Reporter: Mario Salazar de Torres
>Priority: Major
>  Labels: pull-request-available
>
> *AS A* native client contributor
>  *I WANT* to have a client-side exception for LowMemoryException
>  *SO THAT* I can notify accordingly from the client-side upon server 
> memory-depletion.
> —
> *Additional information*
>  This is the callstack of the LowMemoryException:
> {noformat}
> [error 2020/10/13 09:54:14.401405 UTC 140522117220352] Region::put: An 
> exception (org.apache.geode.cache.LowMemoryException: PartitionedRegion: 
> /part_a cannot process operation on key foo|0 because members 
> [192.168.240.14(dms-server-1:1):41000] are running low on memory
> at 
> org.apache.geode.internal.cache.partitioned.RegionAdvisor.checkIfBucketSick(RegionAdvisor.java:482)
> at 
> org.apache.geode.internal.cache.PartitionedRegion.checkIfAboveThreshold(PartitionedRegion.java:2278)
> at 
> org.apache.geode.internal.cache.PartitionedRegion.putInBucket(PartitionedRegion.java:2982)
> at 
> org.apache.geode.internal.cache.PartitionedRegion.virtualPut(PartitionedRegion.java:2212)
> at 
> org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:170)
> at 
> org.apache.geode.internal.cache.LocalRegion.basicUpdate(LocalRegion.java:5573)
> at 
> org.apache.geode.internal.cache.LocalRegion.basicUpdate(LocalRegion.java:5533)
> at 
> org.apache.geode.internal.cache.LocalRegion.basicBridgePut(LocalRegion.java:5212)
> at 
> org.apache.geode.internal.cache.tier.sockets.command.Put65.cmdExecute(Put65.java:411)
> at 
> org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:183)
> at 
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMessage(ServerConnection.java:848)
> at 
> org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:72)
> at 
> org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1212)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at 
> org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$initializeServerConnectionThreadPool$3(AcceptorImpl.java:676)
> at 
> org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:119)
> at java.base/java.lang.Thread.run(Thread.java:834) ) happened at remote 
> server.
> {noformat}
> The idea would be to modify *ThinClientRegion::handleServerException* so that 
> it returns a new error, which is later mapped to a newly created exception.
> *Suggestions*
>  The new exception could be called:
>  * CacheServerLowMemoryException
>  * ...
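
As a rough illustration of the two-step idea described above (first turn the server's exception text into an error, then map that error to a dedicated exception), here is a minimal sketch. It is written in Java only for brevity and is not the native client's C++ code: the `ErrorCode` enum and the `classify` and `toException` helpers are hypothetical, and `CacheServerLowMemoryException` is simply the name suggested in the issue.

```java
public class LowMemoryMappingSketch {
  // Hypothetical error codes; the real client has its own error enumeration.
  enum ErrorCode { NO_ERROR, CACHE_SERVER_EXCEPTION, LOW_MEMORY }

  // Name taken from the suggestion in the issue; the class body is illustrative.
  static class CacheServerLowMemoryException extends RuntimeException {
    CacheServerLowMemoryException(String message) {
      super(message);
    }
  }

  // Step 1: classify the exception text reported by the server.
  static ErrorCode classify(String remoteExceptionText) {
    if (remoteExceptionText == null || remoteExceptionText.isEmpty()) {
      return ErrorCode.NO_ERROR;
    }
    if (remoteExceptionText.contains("org.apache.geode.cache.LowMemoryException")) {
      return ErrorCode.LOW_MEMORY;
    }
    return ErrorCode.CACHE_SERVER_EXCEPTION;
  }

  // Step 2: map the error code to a client-side exception (null means no error).
  static RuntimeException toException(ErrorCode code, String detail) {
    switch (code) {
      case LOW_MEMORY:
        return new CacheServerLowMemoryException(detail);
      case CACHE_SERVER_EXCEPTION:
        return new RuntimeException(detail);
      default:
        return null;
    }
  }
}
```

Keeping classification and exception construction separate means a caller can branch on the error code without having to catch anything.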





[jira] [Commented] (GEODE-8614) Provide a specific client-side exception for server LowMemoryException

2020-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226929#comment-17226929
 ] 

ASF GitHub Bot commented on GEODE-8614:
---

gaussianrecurrence opened a new pull request #688:
URL: https://github.com/apache/geode-native/pull/688


   - Added LowMemoryException to be thrown in the client whenever the server
     runs out of memory.
   - Added QueryExecutionLowMemoryException to be thrown in the client whenever
     the monitoring queries feature on the server detects that the member is
     running low on memory.
   - Added UTs to verify that error-to-exception translation is working.
   - Added new integration tests for both exceptions.










[jira] [Created] (GEODE-8690) Member that fails availability check is never suspected again

2020-11-05 Thread Bruce J Schuchardt (Jira)
Bruce J Schuchardt created GEODE-8690:
-

 Summary: Member that fails availability check is never suspected 
again
 Key: GEODE-8690
 URL: https://issues.apache.org/jira/browse/GEODE-8690
 Project: Geode
  Issue Type: Bug
  Components: membership
Affects Versions: 1.13.0, 1.12.0, 1.14.0
Reporter: Bruce J Schuchardt


In a test run on support/1.12 there was a cluster with 3 locators and a number 
of servers.  It had a membership view like this:
{noformat}
[ loc1, loc2, loc3, server1, server2, etc]
{noformat}

The test killed loc1 and loc2 and tried to restart loc2.  In this scenario loc3 
should have detected the loss of the other two locators and it should have 
become the membership coordinator but it didn't.  Loc3 detected the loss of 
loc2 and then received a LEAVE request from loc1.  At that point it ought to 
have either started examining loc2 again or perhaps just become the 
coordinator, but it did neither of these and the cluster had no coordinator.

This is similar to GEODE-3780 but in that case an earlier availability check 
passed.

In the test run the names of the locators are
loc1=locatorgemfire_4_3
loc2=locatorgemfire_4_4 and
loc3=locatorgemfire_4_2

{noformat}
[info 2020/10/30 21:51:51.197 PDT :41005 shared unordered 
uid=2 port=42550> tid=0x36] Performing availability check for suspect member 
(locatorgemfire_4_4_host2_3884:3884:locator):41005 reason=member 
unexpectedly shut down shared, unordered connection

[info 2020/10/30 21:51:51.309 PDT  
tid=0x51] received leave request from 
(locatorgemfire_4_3_host2_3866:3866:locator):41004 for 
(locatorgemfire_4_3_host2_3866:3866:locator):41004

[info 2020/10/30 21:51:51.345 PDT  
tid=0x51] Checking to see if I should become coordinator.  My address is 
(locatorgemfire_4_2_host2_3852:3852:locator):41007

[info 2020/10/30 21:51:51.346 PDT  
tid=0x51] View with removed and left members removed is 
View[rs-(locatorgemfire_4_3_host2_3866:3866:locator):41004|3] members: 
[(locatorgemfire_4_4_host2_3884:3884:locator):41005, 
(locatorgemfire_4_2_host2_3852:3852:locator):41007, 
(locatorgemfire_4_1_host2_3843:3843:locator):41006, 
(peergemfire_4_1_host2_3959:3959):41010{lead}, 
(peergemfire_4_2_host2_3967:3967):41009] and coordinator would be 
(locatorgemfire_4_4_host2_3884:3884:locator):41005
{noformat}
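
To make the last log line easier to follow, here is a minimal sketch of how a coordinator candidate could be derived from a view, assuming the candidate is simply the first locator that remains once removed and departed members are filtered out. This is a simplification for illustration, not Geode's membership code: the `candidate` helper and the convention that locator names start with `loc` are assumptions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class CoordinatorSketch {
  // Returns the first locator, in view order, that is not known to be gone.
  // Assumption: locators are recognised by a "loc" name prefix.
  static Optional<String> candidate(List<String> view, Set<String> removedOrLeft) {
    return view.stream()
        .filter(member -> member.startsWith("loc"))
        .filter(member -> !removedOrLeft.contains(member))
        .findFirst();
  }

  public static void main(String[] args) {
    List<String> view = Arrays.asList("loc1", "loc2", "loc3", "server1", "server2");

    // loc1 sent a LEAVE request; loc2 is only suspected, so it stays in the view.
    Set<String> gone = new HashSet<>(Collections.singletonList("loc1"));
    System.out.println(candidate(view, gone)); // prints Optional[loc2]

    // Once loc2 is also removed, loc3 becomes the candidate.
    gone.add("loc2");
    System.out.println(candidate(view, gone)); // prints Optional[loc3]
  }
}
```

Under these assumptions, loc3 cannot become coordinator while loc2 remains in the view, which matches the "coordinator would be" entry in the log.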






[jira] [Commented] (GEODE-8688) Flaky C++ Native client integration test cases: PartitionRegionOpsTest.[get|put]PartitionedRegionWithRedundancyServerGoesDownSingleHop

2020-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226938#comment-17226938
 ] 

ASF GitHub Bot commented on GEODE-8688:
---

pdxcodemonkey commented on a change in pull request #686:
URL: https://github.com/apache/geode-native/pull/686#discussion_r518308858



##
File path: cppcache/integration/test/PartitionRegionOpsTest.cpp
##
@@ -144,6 +146,10 @@ void verifyMetadataWasRemovedAtFirstError() {
   }
 }
   }
+  std::cout << "timeoutErrors: " << timeoutErrors << ", ioErrors: " << ioErrors
+<< ", metadataRemovedDueToTimeout: " << metadataRemovedDueToTimeout
+<< ", metadataRemovedDueToIoErr: " << metadataRemovedDueToIoErr
+<< std::endl;

Review comment:
   If you really want to see this output when running your test, I believe 
it's best to use std::cerr rather than cout.  This looks like leftover 
debugging trace stuff to me, tho.
   







> Flaky C++ Native client integration test cases: 
> PartitionRegionOpsTest.[get|put]PartitionedRegionWithRedundancyServerGoesDownSingleHop
> --
>
> Key: GEODE-8688
> URL: https://issues.apache.org/jira/browse/GEODE-8688
> Project: Geode
>  Issue Type: Bug
>  Components: native client
>Affects Versions: 1.13.0
>Reporter: Alberto Gomez
>Assignee: Alberto Gomez
>Priority: Major
>  Labels: pull-request-available
>
> The following test cases for the C++ native client are flaky:
> PartitionRegionOpsTest.getPartitionedRegionWithRedundancyServerGoesDownSingleHop
> PartitionRegionOpsTest.putPartitionedRegionWithRedundancyServerGoesDownSingleHop
>  
> They fail very often when run in CI although I have not seen them fail when 
> executed manually.
>  





[jira] [Commented] (GEODE-8681) peer-to-peer message loss due to sending connection closing with TLS enabled

2020-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226948#comment-17226948
 ] 

ASF GitHub Bot commented on GEODE-8681:
---

echobravopapa opened a new pull request #5706:
URL: https://github.com/apache/geode/pull/5706


   …ng with TLS enabled (#5699)
   
   A socket-read could pick up more than one message and a single unwrap()
   could decrypt multiple messages.
   Normally the engine isn't closed and it reports normal
   status from an unwrap() operation, and Connection.processInputBuffer
   picks up each message, one by one, from the buffer and dispatches them.
   But if the SSLEngine is closed we were ignoring any already-decrypted
   data sitting in the unwrapped buffer and instead we were throwing an 
SSLException.
   
   (cherry picked from commit 7da8f9b516ac1e2525a1dfc922af7bfb8995f2c6)
   (cherry picked from commit 03bbc2ac54998cbb015d533e4fe6e75b3e973146)
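
The point that one read can yield several messages can be sketched as follows. This is only an illustration under a stated assumption: each message is framed with a hypothetical 4-byte length prefix, which is not Geode's actual wire format, and `drain` and `frames` are made-up helper names. Only `java.nio.ByteBuffer` is real API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FramingSketch {
  // Drains every complete [int length][payload] frame from the buffer.
  static List<String> drain(ByteBuffer buf) {
    List<String> messages = new ArrayList<>();
    while (buf.remaining() >= Integer.BYTES) {
      buf.mark();
      int length = buf.getInt();
      if (buf.remaining() < length) {
        buf.reset(); // partial frame: leave it for the next read
        break;
      }
      byte[] payload = new byte[length];
      buf.get(payload);
      messages.add(new String(payload, StandardCharsets.UTF_8));
    }
    return messages;
  }

  // Builds a readable buffer that holds all given messages back to back.
  static ByteBuffer frames(String... msgs) {
    ByteBuffer buf = ByteBuffer.allocate(1024);
    for (String msg : msgs) {
      byte[] bytes = msg.getBytes(StandardCharsets.UTF_8);
      buf.putInt(bytes.length).put(bytes);
    }
    buf.flip();
    return buf;
  }
}
```

If the buffer were discarded as soon as the engine reported that it was closed, every frame still waiting in it would be lost, which is the message loss described in this change.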
   
   Thank you for submitting a contribution to Apache Geode.
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   ### For all changes:
   - [ ] Is there a JIRA ticket associated with this PR? Is it referenced in 
the commit message?
   
   - [ ] Has your PR been rebased against the latest commit within the target 
branch (typically `develop`)?
   
   - [ ] Is your initial contribution a single, squashed commit?
   
   - [ ] Does `gradlew build` run cleanly?
   
   - [ ] Have you written or updated unit tests to verify your changes?
   
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   
   ### Note:
   Please ensure that once the PR is submitted, check Concourse for build 
issues and
   submit an update to your PR as soon as possible. If you need help, please 
send an
   email to d...@geode.apache.org.
   





> peer-to-peer message loss due to sending connection closing with TLS enabled
> 
>
> Key: GEODE-8681
> URL: https://issues.apache.org/jira/browse/GEODE-8681
> Project: Geode
>  Issue Type: Bug
>  Components: membership, messaging
>Affects Versions: 1.10.0, 1.11.0, 1.12.0, 1.13.0
>Reporter: Bruce J Schuchardt
>Assignee: Bruce J Schuchardt
>Priority: Major
>  Labels: pull-request-available, release-blocker
>
> We have observed message loss when TLS is enabled and a distributed lock is 
> released right after sending a message that doesn't require acknowledgement 
> if the sending socket is immediately closed. The closing of sockets 
> immediately after sending a message is frequently seen in function execution 
> threads or server-side application threads that use this pattern:
> {code:java}
> try {
>   DistributedSystem.setThreadsSocketPolicy(false);
>   acquireDistributedLock(lockName);
>   // perform one or more cache operations
> } finally {
>   distLockService.unlock(lockName);
>   DistributedSystem.releaseThreadsSockets(); // closes the socket
> }
> {code}
> The fault seems to be in NioSSLEngine.unwrap(), which throws an 
> SSLException() if it finds the SSLEngine is closed even though there is valid 
> data in its decrypt buffer.  It shouldn't throw an exception in that case.
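
The behaviour asked for here can be sketched as follows: hand out any plaintext that is already buffered, and only report the closed engine once the buffer is empty. This is a toy model, not the real NioSSLEngine: the `UnwrapSketch` class and its `simulateDecrypted` and `simulateClose` hooks are hypothetical stand-ins for the actual decrypt and close paths.

```java
import java.nio.ByteBuffer;
import javax.net.ssl.SSLException;

public class UnwrapSketch {
  private final ByteBuffer decrypted = ByteBuffer.allocate(1024);
  private boolean engineClosed;

  UnwrapSketch() {
    decrypted.flip(); // start empty, in read mode
  }

  // Hypothetical hook standing in for a real decrypt step.
  void simulateDecrypted(byte[] data) {
    decrypted.compact(); // switch to write mode, keeping unread bytes
    decrypted.put(data);
    decrypted.flip(); // back to read mode
  }

  // Hypothetical hook standing in for the engine being closed.
  void simulateClose() {
    engineClosed = true;
  }

  // Buffered plaintext wins; fail only when the engine is closed AND drained.
  ByteBuffer unwrap() throws SSLException {
    if (decrypted.hasRemaining()) {
      return decrypted;
    }
    if (engineClosed) {
      throw new SSLException("engine closed and no decrypted data left");
    }
    return decrypted; // empty: the caller should read more from the socket
  }
}
```

The order of the two checks is the whole point: testing `engineClosed` first would reproduce the reported behaviour and drop valid data.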





[jira] [Updated] (GEODE-8686) Tombstone removal optimization during GII could cause deadlock

2020-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated GEODE-8686:
--
Labels: pull-request-available  (was: )

> Tombstone removal optimization during GII could cause deadlock
> --
>
> Key: GEODE-8686
> URL: https://issues.apache.org/jira/browse/GEODE-8686
> Project: Geode
>  Issue Type: Improvement
>Affects Versions: 1.10.0, 1.11.0, 1.12.0, 1.13.0, 1.14.0
>Reporter: Donal Evans
>Assignee: Donal Evans
>Priority: Major
>  Labels: pull-request-available
>
> Similar to the issue described in GEODE-6526, if the condition in the below 
> if statement in {{AbstractRegionMap.initialImagePut()}} evaluates to true, a 
> call to {{AbstractRegionMap.removeTombstone()}} will be triggered that could 
> lead to deadlock between the calling thread and a Tombstone GC thread calling 
> {{TombstoneService.gcTombstones()}}. 
> {code:java}
> if (owner.getServerProxy() == null
>     && owner.getVersionVector().isTombstoneTooOld(
>         entryVersion.getMemberID(), entryVersion.getRegionVersion())) {
>   // the received tombstone has already been reaped, so don't retain it
>   if (owner.getIndexManager() != null) {
>     owner.getIndexManager().updateIndexes(oldRe, IndexManager.REMOVE_ENTRY,
>         IndexProtocol.REMOVE_DUE_TO_GII_TOMBSTONE_CLEANUP);
>   }
>   removeTombstone(oldRe, entryVersion, false, false);
>   return false;
> } else {
>   owner.scheduleTombstone(oldRe, entryVersion);
>   lruEntryDestroy(oldRe);
> }
> {code}
> The proposed change is to remove this if statement and allow the old 
> tombstone to be collected later by calling {{scheduleTombstone()}} in all 
> cases. The call to {{AbstractRegionMap.removeTombstone()}} in 
> {{AbstractRegionMap.initialImagePut()}} is intended as an optimization that 
> allows immediate removal of tombstones that we know have already been 
> collected on other members. However, the conditions that trigger it are rare: 
> the old entry must be a tombstone, the new entry received during GII must be 
> a tombstone with a newer version, and we must have already collected a 
> tombstone with a newer version than the new entry. Since the overhead of 
> scheduling a tombstone to be collected is also comparatively low, the 
> performance impact of removing this optimization in favour of simply 
> scheduling the tombstone in all cases should be insignificant.
> The solution to the deadlock observed in GEODE-6526 was also to remove the 
> call to {{AbstractRegionMap.removeTombstone()}} and allow the tombstone to be 
> collected later and did not result in any unwanted behaviour, so the proposed 
> fix should be similarly low-impact.
> Also of note is that with this proposed change, there will be no calls to 
> {{AbstractRegionMap.removeTombstone()}} outside of the {{TombstoneService}} 
> class, which should ensure that other deadlocks involving this method are not 
> possible.
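
The "schedule everywhere, remove in one place" shape of the proposed change can be sketched with a toy model. The class below is not Geode code: the method names mirror the ones in the issue for readability only, and the map and queue are assumptions standing in for the region map and the tombstone service.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

public class TombstoneSketch {
  private final Map<String, Long> tombstones = new ConcurrentHashMap<>();
  private final Queue<String> scheduled = new ConcurrentLinkedQueue<>();

  // GII path: never removes directly, only records and schedules the tombstone.
  void initialImagePut(String key, long version) {
    tombstones.put(key, version);
    scheduled.add(key);
  }

  // The only place tombstones are removed, so a single code path owns removal.
  synchronized int gcTombstones() {
    int removed = 0;
    String key;
    while ((key = scheduled.poll()) != null) {
      if (tombstones.remove(key) != null) {
        removed++;
      }
    }
    return removed;
  }

  int size() {
    return tombstones.size();
  }
}
```

Because the GII path in this sketch never takes the lock used by `gcTombstones()`, the two paths cannot wait on each other. The cost is that a tombstone may live slightly longer before it is collected.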





[jira] [Commented] (GEODE-8686) Tombstone removal optimization during GII could cause deadlock

2020-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226963#comment-17226963
 ] 

ASF GitHub Bot commented on GEODE-8686:
---

DonalEvans opened a new pull request #5707:
URL: https://github.com/apache/geode/pull/5707


   - Do not call AbstractRegionMap.removeTombstone() outside of
   TombstoneService class
   - Add test to confirm that tombstones are correctly scheduled and
   collected with this change
   
   Authored-by: Donal Evans 
   
   Thank you for submitting a contribution to Apache Geode.
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   ### For all changes:
   - [x] Is there a JIRA ticket associated with this PR? Is it referenced in 
the commit message?
   
   - [x] Has your PR been rebased against the latest commit within the target 
branch (typically `develop`)?
   
   - [x] Is your initial contribution a single, squashed commit?
   
   - [x] Does `gradlew build` run cleanly?
   
   - [x] Have you written or updated unit tests to verify your changes?
   
   - [N/A] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   
   ### Note:
   Please ensure that once the PR is submitted, check Concourse for build 
issues and
   submit an update to your PR as soon as possible. If you need help, please 
send an
   email to d...@geode.apache.org.
   










[jira] [Commented] (GEODE-8652) member hung in Connection.notifyHandshakeWaiter() during disconnect waiting for a lock held by another thread in Connection.readAck()

2020-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226964#comment-17226964
 ] 

ASF GitHub Bot commented on GEODE-8652:
---

Bill opened a new pull request #5708:
URL: https://github.com/apache/geode/pull/5708


   Reverts apache/geode#5694





> member hung in Connection.notifyHandshakeWaiter() during disconnect waiting 
> for a lock held by another thread in Connection.readAck() 
> --
>
> Key: GEODE-8652
> URL: https://issues.apache.org/jira/browse/GEODE-8652
> Project: Geode
>  Issue Type: Bug
>  Components: membership, messaging
>Affects Versions: 1.12.0, 1.13.0, 1.14.0
>Reporter: Bill Burcham
>Assignee: Bill Burcham
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.1, 1.14.0, 1.13.1
>
>
> An application encountered the following hang in a TLS-enabled cluster.
> Let's call the cluster members ds3 -> ds1. 
> ds3 sends a {{PutAllPRMessage}} to ds1 and is stuck in 
> {{SocketChannel.read()}} waiting for the acknowledgement.
> {{ClusterDistributionManager.uncleanShutdown()}} is invoked on ds3 to shut 
> the member down. That thread blocks trying to acquire a lock on the 
> {{NioSslEngine}} held by the first thread (the one waiting for the ack to the 
> put-all).
> Somehow the shutdown thread must be allowed to proceed.
> Here's the hung thread in ds3 (a.k.a. vm_3_thr_34_dataStore3_host1_8592) 
> trying to shut down the member but it's stuck waiting for the monitor on the 
> {{NioSslEngine}}:
> {noformat}
> "vm_3_thr_34_dataStore3_host1_8592" #795 daemon prio=5 os_prio=0 
> tid=0x7fdb9c011000 nid=0x2fcc waiting for monitor entry 
> [0x7fdb6f4b7000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.geode.internal.tcp.Connection.notifyHandshakeWaiter(Connection.java:804)
>   - waiting to lock <0xf2635b28> (a 
> org.apache.geode.internal.net.NioSslEngine)
>   at org.apache.geode.internal.tcp.Connection.close(Connection.java:1350)
>   at 
> org.apache.geode.internal.tcp.Connection.closePartialConnect(Connection.java:1278)
>   at 
> org.apache.geode.internal.tcp.ConnectionTable.closeCon(ConnectionTable.java:612)
>   at 
> org.apache.geode.internal.tcp.ConnectionTable.closeCon(ConnectionTable.java:604)
>   at 
> org.apache.geode.internal.tcp.ConnectionTable.close(ConnectionTable.java:661)
>   - locked <0xf2678cf8> (a java.util.ArrayList)
>   - locked <0xf1187348> (a java.util.concurrent.ConcurrentHashMap)
>   at org.apache.geode.internal.tcp.TCPConduit.stop(TCPConduit.java:487)
>   at 
> org.apache.geode.distributed.internal.direct.DirectChannel.disconnect(DirectChannel.java:644)
>   - locked <0xf11867a8> (a 
> org.apache.geode.distributed.internal.direct.DirectChannel)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl.disconnectDirectChannel(DistributionImpl.java:631)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl.access$200(DistributionImpl.java:82)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl$LifecycleListenerImpl.disconnect(DistributionImpl.java:904)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.stop(GMSMembership.java:1908)
>   at 
> org.apache.geode.distributed.internal.membership.gms.Services.stop(Services.java:302)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.shutdown(GMSMembership.java:1262)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.disconnect(GMSMembership.java:1847)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl.disconnect(DistributionImpl.java:501)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.uncleanShutdown(ClusterDistributionManager.java:1291)
> {noformat}
> That thread is waiting on a lock held by this thread (in ds3) which is 
> waiting on an acknowledgement to a PutAllPRMessage sent to ds1.
> {noformat}
> "vm_3_thr_37_dataStore3_host1_8592" #857 daemon prio=5 os_prio=0 
> tid=0x7fdb9c030800 nid=0x30d1 runnable [0x7fdb732f]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.
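
One general way to let a shutdown thread proceed in a situation like the one described above is to make the close path avoid the monitor that the blocked reader holds. The sketch below illustrates only that general idea and is not the actual Geode fix: `CloseSketch` is a hypothetical class, and a `CountDownLatch` stands in for a blocking socket read.

```java
import java.util.concurrent.CountDownLatch;

public class CloseSketch {
  private final CountDownLatch channelClosed = new CountDownLatch(1);
  private final CountDownLatch readerInside = new CountDownLatch(1);

  // Stands in for the ack reader: holds the monitor while blocked in a "read".
  synchronized void readAck() throws InterruptedException {
    readerInside.countDown();
    channelClosed.await(); // blocks the way a socket read would
  }

  // Deliberately NOT synchronized: closing must not wait for the reader's monitor.
  void close() {
    channelClosed.countDown(); // unblocks the reader, as closing a socket would
  }

  // Lets a caller wait until the reader is inside readAck() and holds the monitor.
  void awaitReader() throws InterruptedException {
    readerInside.await();
  }
}
```

If `close()` were declared `synchronized`, a shutdown thread calling it would block exactly as in the thread dump above, because the reader never leaves `readAck()` on its own.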

[jira] [Commented] (GEODE-8652) member hung in Connection.notifyHandshakeWaiter() during disconnect waiting for a lock held by another thread in Connection.readAck()

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226966#comment-17226966
 ] 

ASF subversion and git services commented on GEODE-8652:


Commit 9ef2718f243f34306880efc749d46d2d25172b4b in geode's branch 
refs/heads/revert-5694-backport-1-12-GEODE-8652-and-friends from Bill Burcham
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=9ef2718 ]

Revert "GEODE-8652: NioSslEngine.close() Bypasses Locks (#5666)"

This reverts commit 06642ead279c500180f396c865b6277cb92ae27d.


> member hung in Connection.notifyHandshakeWaiter() during disconnect waiting 
> for a lock held by another thread in Connection.readAck() 
> --
>
> Key: GEODE-8652
> URL: https://issues.apache.org/jira/browse/GEODE-8652
> Project: Geode
>  Issue Type: Bug
>  Components: membership, messaging
>Affects Versions: 1.12.0, 1.13.0, 1.14.0
>Reporter: Bill Burcham
>Assignee: Bill Burcham
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.1, 1.14.0, 1.13.1
>
>
> An application encountered the following hang in a TLS-enabled cluster.
> Let's call the cluster members ds3 -> ds1. 
> ds3 sends a {{PutAllPRMessage}} to ds1 and is stuck in 
> {{SocketChannel.read()}} waiting for the acknowledgement.
> {{ClusterDistributionManager.uncleanShutdown()}} is invoked on ds3 to shut 
> the member down. That thread blocks trying to acquire a lock on the 
> {{NioSslEngine}} held by the first thread (the one waiting for the ack to the 
> put-all).
> Somehow the shutdown thread must be allowed to proceed.
> Here's the hung thread in ds3 (a.k.a. vm_3_thr_34_dataStore3_host1_8592) 
> trying to shut down the member but it's stuck waiting for the monitor on the 
> {{NioSslEngine}}:
> {noformat}
> "vm_3_thr_34_dataStore3_host1_8592" #795 daemon prio=5 os_prio=0 
> tid=0x7fdb9c011000 nid=0x2fcc waiting for monitor entry 
> [0x7fdb6f4b7000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.geode.internal.tcp.Connection.notifyHandshakeWaiter(Connection.java:804)
>   - waiting to lock <0xf2635b28> (a 
> org.apache.geode.internal.net.NioSslEngine)
>   at org.apache.geode.internal.tcp.Connection.close(Connection.java:1350)
>   at 
> org.apache.geode.internal.tcp.Connection.closePartialConnect(Connection.java:1278)
>   at 
> org.apache.geode.internal.tcp.ConnectionTable.closeCon(ConnectionTable.java:612)
>   at 
> org.apache.geode.internal.tcp.ConnectionTable.closeCon(ConnectionTable.java:604)
>   at 
> org.apache.geode.internal.tcp.ConnectionTable.close(ConnectionTable.java:661)
>   - locked <0xf2678cf8> (a java.util.ArrayList)
>   - locked <0xf1187348> (a java.util.concurrent.ConcurrentHashMap)
>   at org.apache.geode.internal.tcp.TCPConduit.stop(TCPConduit.java:487)
>   at 
> org.apache.geode.distributed.internal.direct.DirectChannel.disconnect(DirectChannel.java:644)
>   - locked <0xf11867a8> (a 
> org.apache.geode.distributed.internal.direct.DirectChannel)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl.disconnectDirectChannel(DistributionImpl.java:631)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl.access$200(DistributionImpl.java:82)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl$LifecycleListenerImpl.disconnect(DistributionImpl.java:904)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.stop(GMSMembership.java:1908)
>   at 
> org.apache.geode.distributed.internal.membership.gms.Services.stop(Services.java:302)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.shutdown(GMSMembership.java:1262)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.disconnect(GMSMembership.java:1847)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl.disconnect(DistributionImpl.java:501)
>   at 
> org.apache.geode.distributed.internal.ClusterDistributionManager.uncleanShutdown(ClusterDistributionManager.java:1291)
> {noformat}
> That thread is waiting on a lock held by this thread (in ds3) which is 
> waiting on an acknowledgement to a PutAllPRMessage sent to ds1.
> {noformat}
> "vm_3_thr_37_dataStore3_host1_8592" #857 daemon prio=5 os_prio=0 
> tid=0x7fdb9c030800 nid=0x30d1 runnable [0x7fdb732f]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> at sun.

[jira] [Commented] (GEODE-8652) member hung in Connection.notifyHandshakeWaiter() during disconnect waiting for a lock held by another thread in Connection.readAck()

2020-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226965#comment-17226965
 ] 

ASF GitHub Bot commented on GEODE-8652:
---

Bill opened a new pull request #5709:
URL: https://github.com/apache/geode/pull/5709


   Reverts apache/geode#5693



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> member hung in Connection.notifyHandshakeWaiter() during disconnect waiting 
> for a lock held by another thread in Connection.readAck() 
> --
>
> Key: GEODE-8652
> URL: https://issues.apache.org/jira/browse/GEODE-8652
> Project: Geode
> Issue Type: Bug
> Components: membership, messaging
> Affects Versions: 1.12.0, 1.13.0, 1.14.0
> Reporter: Bill Burcham
> Assignee: Bill Burcham
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.12.1, 1.14.0, 1.13.1
>
>
> An application encountered the following hang in a TLS-enabled cluster.
> Let's call the cluster members ds3 -> ds1. 
> ds3 sends a {{PutAllPRMessage}} to ds1 and is stuck in 
> {{SocketChannel.read()}} waiting for the acknowledgement.
> {{ClusterDistributionManager.uncleanShutdown()}} is invoked on ds3 to shut 
> the member down. That thread blocks trying to acquire a lock on the 
> {{NioSslEngine}} held by the first thread (the one waiting for the ack 
> to the put-all).
> Somehow the shutdown thread must be allowed to proceed.
> Here's the hung thread in ds3 (a.k.a. vm_3_thr_34_dataStore3_host1_8592) 
> trying to shut down the member but it's stuck waiting for the monitor on the 
> {{NioSslEngine}}:
> {noformat}
> "vm_3_thr_34_dataStore3_host1_8592" #795 daemon prio=5 os_prio=0 tid=0x7fdb9c011000 nid=0x2fcc waiting for monitor entry [0x7fdb6f4b7000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>   at org.apache.geode.internal.tcp.Connection.notifyHandshakeWaiter(Connection.java:804)
>   - waiting to lock <0xf2635b28> (a org.apache.geode.internal.net.NioSslEngine)
>   at org.apache.geode.internal.tcp.Connection.close(Connection.java:1350)
>   at org.apache.geode.internal.tcp.Connection.closePartialConnect(Connection.java:1278)
>   at org.apache.geode.internal.tcp.ConnectionTable.closeCon(ConnectionTable.java:612)
>   at org.apache.geode.internal.tcp.ConnectionTable.closeCon(ConnectionTable.java:604)
>   at org.apache.geode.internal.tcp.ConnectionTable.close(ConnectionTable.java:661)
>   - locked <0xf2678cf8> (a java.util.ArrayList)
>   - locked <0xf1187348> (a java.util.concurrent.ConcurrentHashMap)
>   at org.apache.geode.internal.tcp.TCPConduit.stop(TCPConduit.java:487)
>   at org.apache.geode.distributed.internal.direct.DirectChannel.disconnect(DirectChannel.java:644)
>   - locked <0xf11867a8> (a org.apache.geode.distributed.internal.direct.DirectChannel)
>   at org.apache.geode.distributed.internal.DistributionImpl.disconnectDirectChannel(DistributionImpl.java:631)
>   at org.apache.geode.distributed.internal.DistributionImpl.access$200(DistributionImpl.java:82)
>   at org.apache.geode.distributed.internal.DistributionImpl$LifecycleListenerImpl.disconnect(DistributionImpl.java:904)
>   at org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.stop(GMSMembership.java:1908)
>   at org.apache.geode.distributed.internal.membership.gms.Services.stop(Services.java:302)
>   at org.apache.geode.distributed.internal.membership.gms.GMSMembership.shutdown(GMSMembership.java:1262)
>   at org.apache.geode.distributed.internal.membership.gms.GMSMembership.disconnect(GMSMembership.java:1847)
>   at org.apache.geode.distributed.internal.DistributionImpl.disconnect(DistributionImpl.java:501)
>   at org.apache.geode.distributed.internal.ClusterDistributionManager.uncleanShutdown(ClusterDistributionManager.java:1291)
> {noformat}
> That thread is waiting on a lock held by this thread (in ds3) which is 
> waiting on an acknowledgement to a PutAllPRMessage sent to ds1.
> {noformat}
> "vm_3_thr_37_dataStore3_host1_8592" #857 daemon prio=5 os_prio=0 tid=0x7fdb9c030800 nid=0x30d1 runnable [0x7fdb732f]
>    java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.
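
The hang described in GEODE-8652 can be reproduced in miniature. The sketch below is not Geode code: the class name, the `engine` object (standing in for the `NioSslEngine` monitor), and the latch that simulates the never-arriving acknowledgement are all assumptions made for illustration. One thread takes the monitor and then blocks waiting for the "ack"; a second thread, playing the shutdown path, tries to take the same monitor and ends up `BLOCKED`. In the real report the first thread sits in a native socket read and shows as `RUNNABLE`; the latch here is a simplification of that.

```java
import java.util.concurrent.CountDownLatch;

public class MonitorHangSketch {
  // Hypothetical stand-in for the NioSslEngine instance both threads lock on.
  private static final Object engine = new Object();

  /** Returns the state observed for the "shutdown" thread while the "reader" holds the monitor. */
  static Thread.State observeShutdownThreadState() {
    CountDownLatch readerHoldsMonitor = new CountDownLatch(1);
    CountDownLatch ackArrives = new CountDownLatch(1);

    // Simulates the thread waiting for the put-all acknowledgement while holding the monitor.
    Thread reader = new Thread(() -> {
      synchronized (engine) {
        readerHoldsMonitor.countDown();
        try {
          ackArrives.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    });

    // Simulates the shutdown path, which needs the same monitor before it can close.
    Thread shutdown = new Thread(() -> {
      synchronized (engine) {
        // The connection would be closed here.
      }
    });

    try {
      reader.start();
      readerHoldsMonitor.await();
      shutdown.start();
      // Poll (bounded) until the shutdown thread is parked on the monitor.
      for (int i = 0; i < 500 && shutdown.getState() != Thread.State.BLOCKED; i++) {
        Thread.sleep(10);
      }
      Thread.State observed = shutdown.getState();
      // Release the reader so the demo terminates; the real hang has no such release.
      ackArrives.countDown();
      reader.join();
      shutdown.join();
      return observed;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(observeShutdownThreadState());
  }
}
```

The demo only terminates because the latch is released at the end. In the reported scenario nothing releases the reader, which is why the issue says the shutdown thread must somehow be allowed to proceed.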

[jira] [Commented] (GEODE-8652) member hung in Connection.notifyHandshakeWaiter() during disconnect waiting for a lock held by another thread in Connection.readAck()

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226967#comment-17226967
 ] 

ASF subversion and git services commented on GEODE-8652:


Commit 9ef2718f243f34306880efc749d46d2d25172b4b in geode's branch 
refs/heads/revert-5694-backport-1-12-GEODE-8652-and-friends from Bill Burcham
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=9ef2718 ]

Revert "GEODE-8652: NioSslEngine.close() Bypasses Locks (#5666)"

This reverts commit 06642ead279c500180f396c865b6277cb92ae27d.



[jira] [Commented] (GEODE-8652) member hung in Connection.notifyHandshakeWaiter() during disconnect waiting for a lock held by another thread in Connection.readAck()

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226969#comment-17226969
 ] 

ASF subversion and git services commented on GEODE-8652:


Commit 4954648d5801148db42973315ab439fad86d4c1a in geode's branch 
refs/heads/revert-5693-backport-1-13-GEODE-8652-and-friends from Bill Burcham
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=4954648 ]

Revert "GEODE-8652: NioSslEngine.close() Bypasses Locks (#5666)"

This reverts commit b2af727ce23fd155f3665e3db2ecee6e8f80fba7.



[jira] [Commented] (GEODE-8540) Repackage DUnitBlackboard in a JUnit Rule

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226970#comment-17226970
 ] 

ASF subversion and git services commented on GEODE-8540:


Commit b06e328798d27c30683a850241681338ac7fed55 in geode's branch 
refs/heads/revert-5693-backport-1-13-GEODE-8652-and-friends from Bill Burcham
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=b06e328 ]

Revert "GEODE-8540: Create new DistributedBlackboard Rule (#5557)"

This reverts commit cde469c6b6955a334e6bbf22accfc0735f0c70f4.


> Repackage DUnitBlackboard in a JUnit Rule
> -
>
> Key: GEODE-8540
> URL: https://issues.apache.org/jira/browse/GEODE-8540
> Project: Geode
> Issue Type: Wish
> Components: tests
> Reporter: Kirk Lund
> Assignee: Kirk Lund
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.14.0, 1.13.1
>
>
> Repackage DUnitBlackboard in a JUnit Rule



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8136) Repackage and improve javadocs for UncheckedUtils

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226971#comment-17226971
 ] 

ASF subversion and git services commented on GEODE-8136:


Commit ba3b156ec6907b773b66c8628f6507cd5f5d2d4f in geode's branch 
refs/heads/revert-5693-backport-1-13-GEODE-8652-and-friends from Bill Burcham
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=ba3b156 ]

Revert "GEODE-8136: Move UncheckedUtils to geode-common (#5123)"

This reverts commit 10af7ea015ec85ef02b2e972c7a3dd3ec23bcb7f.


> Repackage and improve javadocs for UncheckedUtils
> -
>
> Key: GEODE-8136
> URL: https://issues.apache.org/jira/browse/GEODE-8136
> Project: Geode
> Issue Type: Wish
> Components: core
> Reporter: Kirk Lund
> Assignee: Kirk Lund
> Priority: Major
> Fix For: 1.14.0, 1.13.1
>
>
> UncheckedUtils is a collection of simple utilities for unchecked casts in 
> both test and product code. We should move it to the most common module it 
> can live in, rename methods to be more descriptive, and add javadocs.





[jira] [Commented] (GEODE-8136) Repackage and improve javadocs for UncheckedUtils

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226974#comment-17226974
 ] 

ASF subversion and git services commented on GEODE-8136:


Commit ba3b156ec6907b773b66c8628f6507cd5f5d2d4f in geode's branch 
refs/heads/revert-5693-backport-1-13-GEODE-8652-and-friends from Bill Burcham
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=ba3b156 ]

Revert "GEODE-8136: Move UncheckedUtils to geode-common (#5123)"

This reverts commit 10af7ea015ec85ef02b2e972c7a3dd3ec23bcb7f.




[jira] [Commented] (GEODE-8540) Repackage DUnitBlackboard in a JUnit Rule

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226973#comment-17226973
 ] 

ASF subversion and git services commented on GEODE-8540:


Commit b06e328798d27c30683a850241681338ac7fed55 in geode's branch 
refs/heads/revert-5693-backport-1-13-GEODE-8652-and-friends from Bill Burcham
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=b06e328 ]

Revert "GEODE-8540: Create new DistributedBlackboard Rule (#5557)"

This reverts commit cde469c6b6955a334e6bbf22accfc0735f0c70f4.




[jira] [Commented] (GEODE-8652) member hung in Connection.notifyHandshakeWaiter() during disconnect waiting for a lock held by another thread in Connection.readAck()

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226972#comment-17226972
 ] 

ASF subversion and git services commented on GEODE-8652:


Commit 4954648d5801148db42973315ab439fad86d4c1a in geode's branch 
refs/heads/revert-5693-backport-1-13-GEODE-8652-and-friends from Bill Burcham
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=4954648 ]

Revert "GEODE-8652: NioSslEngine.close() Bypasses Locks (#5666)"

This reverts commit b2af727ce23fd155f3665e3db2ecee6e8f80fba7.



[jira] [Commented] (GEODE-8652) member hung in Connection.notifyHandshakeWaiter() during disconnect waiting for a lock held by another thread in Connection.readAck()

2020-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226981#comment-17226981
 ] 

ASF GitHub Bot commented on GEODE-8652:
---

Bill opened a new pull request #5710:
URL: https://github.com/apache/geode/pull/5710


   This reverts commit 08e9e9673d0ed0a3d74c6d16e706817cab09.
   
   






[jira] [Commented] (GEODE-8686) Tombstone removal optimization during GII could cause deadlock

2020-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226990#comment-17226990
 ] 

ASF GitHub Bot commented on GEODE-8686:
---

lgtm-com[bot] commented on pull request #5707:
URL: https://github.com/apache/geode/pull/5707#issuecomment-722646152


   This pull request **fixes 1 alert** when merging 
f42efca780a24a697672b6d4f04fd66d82fa730a into 
7cc14eef52e06fe1e8c56bd766df56297b9c9ff8 - [view on 
LGTM.com](https://lgtm.com/projects/g/apache/geode/rev/pr-28c30f2e70e63f14ced912c0945c9dd00166b91c)
   
   **fixed alerts:**
   
   * 1 for Dereferenced variable may be null





> Tombstone removal optimization during GII could cause deadlock
> --
>
> Key: GEODE-8686
> URL: https://issues.apache.org/jira/browse/GEODE-8686
> Project: Geode
> Issue Type: Improvement
> Affects Versions: 1.10.0, 1.11.0, 1.12.0, 1.13.0, 1.14.0
> Reporter: Donal Evans
> Assignee: Donal Evans
> Priority: Major
> Labels: pull-request-available
>
> Similar to the issue described in GEODE-6526, if the condition in the if 
> statement below, in {{AbstractRegionMap.initialImagePut()}}, evaluates to 
> true, a call to {{AbstractRegionMap.removeTombstone()}} is triggered that 
> could lead to deadlock between the calling thread and a Tombstone GC thread 
> calling {{TombstoneService.gcTombstones()}}. 
> {code:java}
> if (owner.getServerProxy() == null
>     && owner.getVersionVector().isTombstoneTooOld(
>         entryVersion.getMemberID(), entryVersion.getRegionVersion())) {
>   // the received tombstone has already been reaped, so don't retain it
>   if (owner.getIndexManager() != null) {
>     owner.getIndexManager().updateIndexes(oldRe, IndexManager.REMOVE_ENTRY,
>         IndexProtocol.REMOVE_DUE_TO_GII_TOMBSTONE_CLEANUP);
>   }
>   removeTombstone(oldRe, entryVersion, false, false);
>   return false;
> } else {
>   owner.scheduleTombstone(oldRe, entryVersion);
>   lruEntryDestroy(oldRe);
> }
> {code}
> The proposed change is to remove this if statement and allow the old 
> tombstone to be collected later by calling {{scheduleTombstone()}} in all 
> cases. The call to {{AbstractRegionMap.removeTombstone()}} in 
> {{AbstractRegionMap.initialImagePut()}} is intended as an optimization that 
> immediately removes tombstones we know have already been collected on other 
> members. However, the conditions that trigger it are rare (the old entry must 
> be a tombstone, the new entry received during GII must be a tombstone with a 
> newer version, and we must have already collected a tombstone with a newer 
> version than the new entry), and the overhead of scheduling a tombstone for 
> collection is comparatively low. The performance impact of removing this 
> optimization in favour of always scheduling the tombstone for collection 
> should therefore be insignificant.
> The solution to the deadlock observed in GEODE-6526 was also to remove the 
> call to {{AbstractRegionMap.removeTombstone()}} and allow the tombstone to be 
> collected later and did not result in any unwanted behaviour, so the proposed 
> fix should be similarly low-impact.
> Also of note is that with this proposed change, there will be no calls to 
> {{AbstractRegionMap.removeTombstone()}} outside of the {{TombstoneService}} 
> class, which should ensure that other deadlocks involving this method are not 
> possible.
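
For readers following along, the shape of the proposed change can be sketched as below. This is only an illustration under stated assumptions, not the actual Geode patch: `Owner`, `RecordingOwner` and `handleReceivedTombstone` are hypothetical stand-ins, and entries are modelled as plain strings. Only the two calls taken from the else branch quoted above ({{scheduleTombstone}} and {{lruEntryDestroy}}) come from the description.

```java
import java.util.ArrayList;
import java.util.List;

public class InitialImagePutSketch {

  /** Hypothetical stand-in for the region that owns the map. */
  public interface Owner {
    void scheduleTombstone(String entry, long version);

    void lruEntryDestroy(String entry);
  }

  /** Records calls so the behaviour of the sketch can be observed. */
  public static class RecordingOwner implements Owner {
    public final List<String> scheduled = new ArrayList<>();
    public final List<String> lruDestroyed = new ArrayList<>();

    @Override
    public void scheduleTombstone(String entry, long version) {
      scheduled.add(entry + "@" + version);
    }

    @Override
    public void lruEntryDestroy(String entry) {
      lruDestroyed.add(entry);
    }
  }

  /**
   * Sketch of the path after the proposed change: there is no "tombstone too
   * old" branch and no direct removeTombstone() call; the tombstone is always
   * scheduled for later collection.
   */
  public static void handleReceivedTombstone(Owner owner, String oldEntry, long version) {
    owner.scheduleTombstone(oldEntry, version);
    owner.lruEntryDestroy(oldEntry);
  }

  public static void main(String[] args) {
    RecordingOwner owner = new RecordingOwner();
    handleReceivedTombstone(owner, "key1", 7L);
    System.out.println("scheduled=" + owner.scheduled + " lruDestroyed=" + owner.lruDestroyed);
  }
}
```

The point of the sketch is that the GII thread no longer has a code path that removes a tombstone directly, which is what the description says keeps {{removeTombstone()}} calls inside {{TombstoneService}}.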



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8652) member hung in Connection.notifyHandshakeWaiter() during disconnect waiting for a lock held by another thread in Connection.readAck()

2020-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226996#comment-17226996
 ] 

ASF GitHub Bot commented on GEODE-8652:
---

Bill merged pull request #5708:
URL: https://github.com/apache/geode/pull/5708


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> member hung in Connection.notifyHandshakeWaiter() during disconnect waiting 
> for a lock held by another thread in Connection.readAck() 
> --
>
> Key: GEODE-8652
> URL: https://issues.apache.org/jira/browse/GEODE-8652
> Project: Geode
> Issue Type: Bug
> Components: membership, messaging
> Affects Versions: 1.12.0, 1.13.0, 1.14.0
> Reporter: Bill Burcham
> Assignee: Bill Burcham
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.12.1, 1.14.0, 1.13.1
>
>
> An application encountered the following hang in a TLS-enabled cluster.
> Let's call the cluster members ds3 -> ds1. 
> ds3 sends a {{PutAllPRMessage}} to ds1 and is stuck in 
> {{SocketChannel.read()}} waiting for the acknowledgement.
> {{ClusterDistributionManager.uncleanShutdown()}} is invoked on ds3 to shut 
> the member down. That thread blocks trying to acquire a lock on the 
> {{NioSslEngine}} held by the first thread (the one waiting for the ack to 
> the put-all).
> Somehow the shutdown thread must be allowed to proceed.
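
One general way to let a closing thread make progress when a lock is held by a thread stuck in a blocking read is to wait for the lock only for a bounded time. The sketch below illustrates that idea with `java.util.concurrent.locks.ReentrantLock`. It is an assumption-laden illustration, not the fix that was merged or reverted for this ticket: the class `BoundedLockCloseSketch`, its methods, and the latch that stands in for the socket read are all hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class BoundedLockCloseSketch {
  private final ReentrantLock engineLock = new ReentrantLock();
  private volatile boolean closed;

  /** Simulates the reader: holds the lock while blocked waiting for an ack. */
  public void readAck(CountDownLatch lockHeld, CountDownLatch ackArrived)
      throws InterruptedException {
    engineLock.lock();
    try {
      lockHeld.countDown();
      ackArrived.await(); // stands in for the blocking socket read
    } finally {
      engineLock.unlock();
    }
  }

  /**
   * Waits at most the given time for the lock, then closes either way.
   * Returns true if the lock was obtained before closing.
   */
  public boolean close(long timeout, TimeUnit unit) throws InterruptedException {
    boolean locked = engineLock.tryLock(timeout, unit);
    try {
      closed = true;
      return locked;
    } finally {
      if (locked) {
        engineLock.unlock();
      }
    }
  }

  public boolean isClosed() {
    return closed;
  }

  public static void main(String[] args) throws Exception {
    BoundedLockCloseSketch conn = new BoundedLockCloseSketch();
    CountDownLatch lockHeld = new CountDownLatch(1);
    CountDownLatch ack = new CountDownLatch(1);
    Thread reader = new Thread(() -> {
      try {
        conn.readAck(lockHeld, ack);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    reader.start();
    lockHeld.await(); // make sure the reader owns the lock first
    boolean gotLock = conn.close(100, TimeUnit.MILLISECONDS);
    System.out.println("closed=" + conn.isClosed() + " gotLock=" + gotLock);
    ack.countDown(); // release the simulated reader
    reader.join();
  }
}
```

Closing without the lock trades safety for liveness: any state the lock protects may be observed mid-update by the closer, so a real implementation would need to limit what is touched on the unlocked path.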
> Here's the hung thread in ds3 (a.k.a. vm_3_thr_34_dataStore3_host1_8592) 
> trying to shut down the member but it's stuck waiting for the monitor on the 
> {{NioSslEngine}}:
> {noformat}
> "vm_3_thr_34_dataStore3_host1_8592" #795 daemon prio=5 os_prio=0 tid=0x7fdb9c011000 nid=0x2fcc waiting for monitor entry [0x7fdb6f4b7000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>   at org.apache.geode.internal.tcp.Connection.notifyHandshakeWaiter(Connection.java:804)
>   - waiting to lock <0xf2635b28> (a org.apache.geode.internal.net.NioSslEngine)
>   at org.apache.geode.internal.tcp.Connection.close(Connection.java:1350)
>   at org.apache.geode.internal.tcp.Connection.closePartialConnect(Connection.java:1278)
>   at org.apache.geode.internal.tcp.ConnectionTable.closeCon(ConnectionTable.java:612)
>   at org.apache.geode.internal.tcp.ConnectionTable.closeCon(ConnectionTable.java:604)
>   at org.apache.geode.internal.tcp.ConnectionTable.close(ConnectionTable.java:661)
>   - locked <0xf2678cf8> (a java.util.ArrayList)
>   - locked <0xf1187348> (a java.util.concurrent.ConcurrentHashMap)
>   at org.apache.geode.internal.tcp.TCPConduit.stop(TCPConduit.java:487)
>   at org.apache.geode.distributed.internal.direct.DirectChannel.disconnect(DirectChannel.java:644)
>   - locked <0xf11867a8> (a org.apache.geode.distributed.internal.direct.DirectChannel)
>   at org.apache.geode.distributed.internal.DistributionImpl.disconnectDirectChannel(DistributionImpl.java:631)
>   at org.apache.geode.distributed.internal.DistributionImpl.access$200(DistributionImpl.java:82)
>   at org.apache.geode.distributed.internal.DistributionImpl$LifecycleListenerImpl.disconnect(DistributionImpl.java:904)
>   at org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.stop(GMSMembership.java:1908)
>   at org.apache.geode.distributed.internal.membership.gms.Services.stop(Services.java:302)
>   at org.apache.geode.distributed.internal.membership.gms.GMSMembership.shutdown(GMSMembership.java:1262)
>   at org.apache.geode.distributed.internal.membership.gms.GMSMembership.disconnect(GMSMembership.java:1847)
>   at org.apache.geode.distributed.internal.DistributionImpl.disconnect(DistributionImpl.java:501)
>   at org.apache.geode.distributed.internal.ClusterDistributionManager.uncleanShutdown(ClusterDistributionManager.java:1291)
> {noformat}
> That thread is waiting on a lock held by this thread (in ds3) which is 
> waiting on an acknowledgement to a PutAllPRMessage sent to ds1.
> {noformat}
> "vm_3_thr_37_dataStore3_host1_8592" #857 daemon prio=5 os_prio=0 tid=0x7fdb9c030800 nid=0x30d1 runnable [0x7fdb732f]
>    java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(

[jira] [Commented] (GEODE-8652) member hung in Connection.notifyHandshakeWaiter() during disconnect waiting for a lock held by another thread in Connection.readAck()

2020-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226997#comment-17226997
 ] 

ASF GitHub Bot commented on GEODE-8652:
---

Bill merged pull request #5709:
URL: https://github.com/apache/geode/pull/5709


   






[jira] [Commented] (GEODE-8652) member hung in Connection.notifyHandshakeWaiter() during disconnect waiting for a lock held by another thread in Connection.readAck()

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226999#comment-17226999
 ] 

ASF subversion and git services commented on GEODE-8652:


Commit bec47047dec2ddb64e000b71004fbef8ed3b2b88 in geode's branch 
refs/heads/support/1.12 from Bill Burcham
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=bec4704 ]

Revert "GEODE-8652: NioSslEngine.close() Bypasses Locks (#5666)"

This reverts commit 06642ead279c500180f396c865b6277cb92ae27d.



[jira] [Commented] (GEODE-8652) member hung in Connection.notifyHandshakeWaiter() during disconnect waiting for a lock held by another thread in Connection.readAck()

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227000#comment-17227000
 ] 

ASF subversion and git services commented on GEODE-8652:


Commit ef74657254c2b2707a31b43af52af1734b71e961 in geode's branch 
refs/heads/support/1.13 from Bill Burcham
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=ef74657 ]

Revert "GEODE-8652: NioSslEngine.close() Bypasses Locks (#5666)"

This reverts commit b2af727ce23fd155f3665e3db2ecee6e8f80fba7.



[jira] [Commented] (GEODE-8136) Repackage and improve javadocs for UncheckedUtils

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227002#comment-17227002
 ] 

ASF subversion and git services commented on GEODE-8136:


Commit 986334e9198a1756b839d0d13028f4a846ea29b5 in geode's branch 
refs/heads/support/1.13 from Bill Burcham
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=986334e ]

Revert "GEODE-8136: Move UncheckedUtils to geode-common (#5123)"

This reverts commit 10af7ea015ec85ef02b2e972c7a3dd3ec23bcb7f.


> Repackage and improve javadocs for UncheckedUtils
> -
>
> Key: GEODE-8136
> URL: https://issues.apache.org/jira/browse/GEODE-8136
> Project: Geode
> Issue Type: Wish
> Components: core
> Reporter: Kirk Lund
> Assignee: Kirk Lund
> Priority: Major
> Fix For: 1.14.0, 1.13.1
>
>
> UncheckedUtils is a collection of simple utilities for unchecked casts in 
> both test and product code. We should move it to the most common module it 
> can live in, rename methods to be more descriptive, and add javadocs.





[jira] [Commented] (GEODE-8540) Repackage DUnitBlackboard in a JUnit Rule

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227001#comment-17227001
 ] 

ASF subversion and git services commented on GEODE-8540:


Commit 4886d2055f9cd0792694d0edb61537429a037439 in geode's branch 
refs/heads/support/1.13 from Bill Burcham
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=4886d20 ]

Revert "GEODE-8540: Create new DistributedBlackboard Rule (#5557)"

This reverts commit cde469c6b6955a334e6bbf22accfc0735f0c70f4.


> Repackage DUnitBlackboard in a JUnit Rule
> -
>
> Key: GEODE-8540
> URL: https://issues.apache.org/jira/browse/GEODE-8540
> Project: Geode
> Issue Type: Wish
> Components: tests
> Reporter: Kirk Lund
> Assignee: Kirk Lund
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.14.0, 1.13.1
>
>
> Repackage DUnitBlackboard in a JUnit Rule





[jira] [Commented] (GEODE-8652) member hung in Connection.notifyHandshakeWaiter() during disconnect waiting for a lock held by another thread in Connection.readAck()

2020-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227018#comment-17227018
 ] 

ASF GitHub Bot commented on GEODE-8652:
---

Bill merged pull request #5710:
URL: https://github.com/apache/geode/pull/5710


   






[jira] [Commented] (GEODE-8652) member hung in Connection.notifyHandshakeWaiter() during disconnect waiting for a lock held by another thread in Connection.readAck()

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227019#comment-17227019
 ] 

ASF subversion and git services commented on GEODE-8652:


Commit 9653a0b6e490272fa77d375049f0e9f1cb6c8929 in geode's branch 
refs/heads/develop from Bill Burcham
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=9653a0b ]

Revert "GEODE-8652: NioSslEngine.close() Bypasses Locks (#5666)"

This reverts commit 08e9e9673d0ed0a3d74c6d16e706817cab09.



[jira] [Commented] (GEODE-8652) member hung in Connection.notifyHandshakeWaiter() during disconnect waiting for a lock held by another thread in Connection.readAck()

2020-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227030#comment-17227030
 ] 

ASF GitHub Bot commented on GEODE-8652:
---

Bill opened a new pull request #5712:
URL: https://github.com/apache/geode/pull/5712


   This is a second attempt at fixing GEODE-8652.
   
   We committed the change a week ago but then found other problems in some 
applications. This latest PR includes a new concurrency test that validates 
that the issue is resolved.
   - [x] Is there a JIRA ticket associated with this PR? Is it referenced in 
the commit message?
   
   - [x] Has your PR been rebased against the latest commit within the target 
branch (typically `develop`)?
   
   - [ ] Is your initial contribution a single, squashed commit?
   
   - [x] Does `gradlew build` run cleanly?
   
   - [x] Have you written or updated unit tests to verify your changes?
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> member hung in Connection.notifyHandshakeWaiter() during disconnect waiting 
> for a lock held by another thread in Connection.readAck() 
> --
>
> Key: GEODE-8652
> URL: https://issues.apache.org/jira/browse/GEODE-8652
> Project: Geode
>  Issue Type: Bug
>  Components: membership, messaging
>Affects Versions: 1.12.0, 1.13.0, 1.14.0
>Reporter: Bill Burcham
>Assignee: Bill Burcham
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.1, 1.14.0, 1.13.1
>
>
> An application encountered the following hang in a TLS-enabled cluster.
> Let's call the cluster members ds3 -> ds1. 
> ds3 sends a {{PutAllPRMessage}} to ds1 and is stuck in 
> {{SocketChannel.read()}} waiting for the acknowledgement.
> {{ClusterDistributionManager.uncleanShutdown()}} is invoked on ds3 to shut 
> the member down. That thread blocks trying to acquire a lock on the 
> {{NioSslEngine}} held by the first thread (the one waiting for the ack 
> to the put-all).
> Somehow the shutdown thread must be allowed to proceed.
> Here's the hung thread in ds3 (a.k.a. vm_3_thr_34_dataStore3_host1_8592) 
> trying to shut down the member but it's stuck waiting for the monitor on the 
> {{NioSslEngine}}:
> {noformat}
> "vm_3_thr_34_dataStore3_host1_8592" #795 daemon prio=5 os_prio=0 
> tid=0x7fdb9c011000 nid=0x2fcc waiting for monitor entry 
> [0x7fdb6f4b7000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.geode.internal.tcp.Connection.notifyHandshakeWaiter(Connection.java:804)
>   - waiting to lock <0xf2635b28> (a 
> org.apache.geode.internal.net.NioSslEngine)
>   at org.apache.geode.internal.tcp.Connection.close(Connection.java:1350)
>   at 
> org.apache.geode.internal.tcp.Connection.closePartialConnect(Connection.java:1278)
>   at 
> org.apache.geode.internal.tcp.ConnectionTable.closeCon(ConnectionTable.java:612)
>   at 
> org.apache.geode.internal.tcp.ConnectionTable.closeCon(ConnectionTable.java:604)
>   at 
> org.apache.geode.internal.tcp.ConnectionTable.close(ConnectionTable.java:661)
>   - locked <0xf2678cf8> (a java.util.ArrayList)
>   - locked <0xf1187348> (a java.util.concurrent.ConcurrentHashMap)
>   at org.apache.geode.internal.tcp.TCPConduit.stop(TCPConduit.java:487)
>   at 
> org.apache.geode.distributed.internal.direct.DirectChannel.disconnect(DirectChannel.java:644)
>   - locked <0xf11867a8> (a 
> org.apache.geode.distributed.internal.direct.DirectChannel)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl.disconnectDirectChannel(DistributionImpl.java:631)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl.access$200(DistributionImpl.java:82)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl$LifecycleListenerImpl.disconnect(DistributionImpl.java:904)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.stop(GMSMembership.java:1908)
>   at 
> org.apache.geode.distributed.internal.membership.gms.Services.stop(Services.java:302)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.shutdown(GMSMembership.java:1262)
>   at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.disconnect(GMSMembership.java:1847)
>   at 
> org.apache.geode.distributed.internal.DistributionImpl.disconnect(DistributionImpl.java:501)
>   at 
> or

[jira] [Updated] (GEODE-8540) Repackage DUnitBlackboard in a JUnit Rule

2020-11-05 Thread Owen Nichols (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen Nichols updated GEODE-8540:

Fix Version/s: (was: 1.13.1)

> Repackage DUnitBlackboard in a JUnit Rule
> -
>
> Key: GEODE-8540
> URL: https://issues.apache.org/jira/browse/GEODE-8540
> Project: Geode
>  Issue Type: Wish
>  Components: tests
>Reporter: Kirk Lund
>Assignee: Kirk Lund
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.14.0
>
>
> Repackage DUnitBlackboard in a JUnit Rule



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-8136) Repackage and improve javadocs for UncheckedUtils

2020-11-05 Thread Owen Nichols (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen Nichols updated GEODE-8136:

Fix Version/s: (was: 1.13.1)

> Repackage and improve javadocs for UncheckedUtils
> -
>
> Key: GEODE-8136
> URL: https://issues.apache.org/jira/browse/GEODE-8136
> Project: Geode
>  Issue Type: Wish
>  Components: core
>Reporter: Kirk Lund
>Assignee: Kirk Lund
>Priority: Major
> Fix For: 1.14.0
>
>
> UncheckedUtils is a collection of simple utilities for unchecked casts in 
> both test and product code. We should move it to the most common module it 
> can live in, rename methods to be more descriptive, and add javadocs.





[jira] [Commented] (GEODE-7727) Geode P2P connection hanging

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227041#comment-17227041
 ] 

ASF subversion and git services commented on GEODE-7727:


Commit 798a245147835c1e1b0026e863b9816a3ce2c551 in geode's branch 
refs/heads/support/1.12 from Mario Ivanac
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=798a245 ]

GEODE-7727: modify sender thread to detect release of connection (#4751)

* GEODE-7727: modify sender thread to detect release of connection

* GEODE-7727: Update solution only for shared connections

* GEODE-7727: added test

* GEODE-7727: update after comments

* GEODE-7727: update test

* GEODE-7727: fix for async write hanging

* GEODE-7727: Test of region operations in the face of closed connections

Adding a test for what happens to region operations when a connection is closed
out from under the system. This test hangs without the changes to let the
reader thread keep running.

Fix to test

* GEODE-7727: Preventing a double release of the input buffer

The releaseInputBuffer method was not thread safe. If it is called
concurrently, the buffer ends up being released twice, which adds it to the
buffer pool twice. Later, this could result in two threads using the same
buffer, corrupting its contents.

With the changes for GEODE-7727, we made it likely that releaseInputBuffer
would be called concurrently. If a member departs, one thread will call
Connection.close. Connection.close will close the socket and call
releaseInputBuffer. However, closing the socket will wake up the reader thread,
which will also call releaseInputBuffer concurrently.

Making releaseInputBuffer thread safe by introducing a lock.

* GEODE-7727: update after merge

* GEODE-7727: update test name

Co-authored-by: Dan Smith 
(cherry picked from commit c8413592e5573f675c538c63ef9ee9f97a349e73)
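
The double-release problem described in the commit message above can be sketched as follows. This is a simplified illustration, not Geode's actual Connection code: the class `InputBufferHolder`, its `pool` queue, and the `releaseLock` field are hypothetical names. The idea shown is only the one the commit message states: guard the release with a lock so that a second concurrent caller finds nothing left to release.

```java
import java.nio.ByteBuffer;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;

class InputBufferHolder {
  private final Queue<ByteBuffer> pool;             // hypothetical shared buffer pool
  private final Object releaseLock = new Object();  // guards inputBuffer
  private ByteBuffer inputBuffer;

  InputBufferHolder(Queue<ByteBuffer> pool, ByteBuffer inputBuffer) {
    this.pool = pool;
    this.inputBuffer = inputBuffer;
  }

  /** Returns the buffer to the pool at most once, however many threads call this. */
  void releaseInputBuffer() {
    synchronized (releaseLock) {
      if (inputBuffer == null) {
        return; // already released by another caller
      }
      pool.add(inputBuffer);
      inputBuffer = null;
    }
  }

  /** Releases one buffer from several threads at once and reports the pool size. */
  static int poolSizeAfterConcurrentRelease(int threads) throws InterruptedException {
    Queue<ByteBuffer> pool = new ConcurrentLinkedQueue<>();
    InputBufferHolder holder = new InputBufferHolder(pool, ByteBuffer.allocate(64));
    CountDownLatch start = new CountDownLatch(1);
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        try {
          start.await(); // line all threads up so the calls overlap
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        holder.releaseInputBuffer();
      });
      workers[i].start();
    }
    start.countDown();
    for (Thread worker : workers) {
      worker.join();
    }
    return pool.size();
  }
}
```

Without the lock and the null check, two overlapping callers could both add the same buffer to the pool, which is the corruption scenario the commit message describes.
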


> Geode P2P connection hanging
> 
>
> Key: GEODE-7727
> URL: https://issues.apache.org/jira/browse/GEODE-7727
> Project: Geode
>  Issue Type: Bug
>Reporter: Mario Ivanac
>Assignee: Mario Ivanac
>Priority: Major
>  Labels: needs-review, pull-request-available
> Fix For: 1.13.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Geode P2P handshake reader stops listening to its socket once 
> the handshake between 2 peers is established. This seems to be a design 
> choice. 
> The problem arises when the connection gets killed (TCP FIN). 
> Since nothing is listening on the socket, nothing will get that FIN packet 
> and close the connection. The connection is left hanging (CLOSE-WAIT state). 
> The peers are then unable to establish proper P2P communication later.
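
The behavior the description asks for can be sketched in plain java.net terms. This is an illustration under assumptions, not Geode's handshake reader: the class and method names are hypothetical. It shows a reader thread that keeps reading after the connection is established, treats a return value of -1 from `read()` as the peer's FIN, and closes its own side so the socket does not linger in CLOSE-WAIT.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class FinDetectingReader {
  /** Returns true if the local side noticed the peer's close and closed itself. */
  static boolean detectsPeerClose() throws IOException, InterruptedException {
    try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
      Socket local = new Socket(server.getInetAddress(), server.getLocalPort());
      Socket remote = server.accept();
      CountDownLatch localClosed = new CountDownLatch(1);

      Thread reader = new Thread(() -> {
        try {
          InputStream in = local.getInputStream();
          byte[] buffer = new byte[256];
          while (in.read(buffer) != -1) {
            // Data arriving after the handshake would be handled here.
          }
          // read() returned -1: the peer sent FIN.
        } catch (IOException readFailure) {
          // A reset or other failure also means the connection is unusable.
        } finally {
          try {
            local.close(); // leave CLOSE-WAIT by closing our side too
          } catch (IOException closeFailure) {
            // Nothing more to do for this sketch.
          }
          localClosed.countDown();
        }
      });
      reader.setDaemon(true);
      reader.start();

      remote.close(); // the peer goes away, which sends FIN
      return localClosed.await(5, TimeUnit.SECONDS) && local.isClosed();
    }
  }
}
```

If no thread is reading the socket, nothing observes the -1 and the local side stays open, which matches the hanging connection described above.
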





[jira] [Commented] (GEODE-7727) Geode P2P connection hanging

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227047#comment-17227047
 ] 

ASF subversion and git services commented on GEODE-7727:


Commit 798a245147835c1e1b0026e863b9816a3ce2c551 in geode's branch 
refs/heads/support/1.12 from Mario Ivanac
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=798a245 ]

GEODE-7727: modify sender thread to detect release of connection (#4751)

* GEODE-7727: modify sender thread to detect release of connection

* GEODE-7727: Update solution only for shared connections

* GEODE-7727: added test

* GEODE-7727: update after comments

* GEODE-7727: update test

* GEODE-7727: fix for async write hanging

* GEODE-7727: Test of region operations in the face of closed connections

Adding a test for what happens to region operations when a connection is closed
out from under the system. This test hangs without the changes to let the
reader thread keep running.

Fix to test

* GEODE-7727: Preventing a double release of the input buffer

The releaseInputBuffer method was not thread safe. If it is called
concurrently, the buffer can be released twice, which adds it to the buffer
pool twice. Later, this could result in two threads using the same buffer,
corrupting its contents.

With the changes for GEODE-7727, we made it likely that releaseInputBuffer
would be called concurrently. If a member departs, one thread will call
Connection.close. Connection.close will close the socket and call
releaseInputBuffer. However, closing the socket will wake up the reader thread,
which will also call releaseInputBuffer concurrently.

Making releaseInputBuffer thread safe by introducing a lock.

* GEODE-7727: update after merge

* GEODE-7727: update test name

Co-authored-by: Dan Smith 
(cherry picked from commit c8413592e5573f675c538c63ef9ee9f97a349e73)


> Geode P2P connection hanging
> 
>
> Key: GEODE-7727
> URL: https://issues.apache.org/jira/browse/GEODE-7727
> Project: Geode
>  Issue Type: Bug
>Reporter: Mario Ivanac
>Assignee: Mario Ivanac
>Priority: Major
>  Labels: needs-review, pull-request-available
> Fix For: 1.13.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Geode P2P handshake reader stops listening to its socket once 
> the handshake between 2 peers is established. This seems to be a design 
> choice. 
>
> The problem is when the connection gets killed (TCP FIN). 
> Since nothing is listening on the socket, nothing will get that FIN packet 
> and close the connection. The connection is left hanging (CLOSE-WAIT state). 
> The peers are then unable to establish proper P2P communication later.





[jira] [Commented] (GEODE-7727) Geode P2P connection hanging

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227043#comment-17227043
 ] 

ASF subversion and git services commented on GEODE-7727:


Commit 798a245147835c1e1b0026e863b9816a3ce2c551 in geode's branch 
refs/heads/support/1.12 from Mario Ivanac
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=798a245 ]

GEODE-7727: modify sender thread to detect release of connection (#4751)

* GEODE-7727: modify sender thread to detect release of connection

* GEODE-7727: Update solution only for shared connections

* GEODE-7727: added test

* GEODE-7727: update after comments

* GEODE-7727: update test

* GEODE-7727: fix for async write hanging

* GEODE-7727: Test of region operations in the face of closed connections

Adding a test for what happens to region operations when a connection is closed
out from under the system. This test hangs without the changes to let the
reader thread keep running.

Fix to test

* GEODE-7727: Preventing a double release of the input buffer

The releaseInputBuffer method was not thread safe. If it is called
concurrently, the buffer can be released twice, which adds the buffer to
the buffer pool twice. Later, this could result in two threads using the
same buffer, corrupting its contents.

With the changes for GEODE-7727, we made it likely that releaseInputBuffer
would be called concurrently. If a member departs, one thread will call
Connection.close. Connection.close will close the socket and call
releaseInputBuffer. However, closing the socket will wake up the reader thread,
which will also call releaseInputBuffer concurrently.

Making releaseInputBuffer thread safe by introducing a lock.

* GEODE-7727: update after merge

* GEODE-7727: update test name

Co-authored-by: Dan Smith 
(cherry picked from commit c8413592e5573f675c538c63ef9ee9f97a349e73)


> Geode P2P connection hanging
> 
>
> Key: GEODE-7727
> URL: https://issues.apache.org/jira/browse/GEODE-7727
> Project: Geode
>  Issue Type: Bug
>Reporter: Mario Ivanac
>Assignee: Mario Ivanac
>Priority: Major
>  Labels: needs-review, pull-request-available
> Fix For: 1.13.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Geode P2P handshake reader stops listening to its socket once 
> the handshake between 2 peers is established. This seems to be a design 
> choice. 
> The problem arises when the connection gets killed (TCP FIN). 
> Since nothing is listening on the socket, nothing will receive that FIN packet 
> and close the connection. The connection is left hanging (CLOSE-WAIT state). 
> The peers are then unable to establish proper P2P communication later.





[jira] [Resolved] (GEODE-8689) CI Failure: DistributedAckPersistentRegionCCEDUnitTest > testConcurrentEventsOnEmptyRegion FAILED

2020-11-05 Thread Sarah Abbey (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sarah Abbey resolved GEODE-8689.

Resolution: Duplicate

> CI Failure: DistributedAckPersistentRegionCCEDUnitTest > 
> testConcurrentEventsOnEmptyRegion FAILED
> -
>
> Key: GEODE-8689
> URL: https://issues.apache.org/jira/browse/GEODE-8689
> Project: Geode
>  Issue Type: Bug
>Reporter: Sarah Abbey
>Priority: Major
>
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK11/builds/574
> {code:java}
> org.awaitility.core.ConditionTimeoutException: Condition with alias 'Wait for 
> the members to eventually be consistent' didn't complete within 5 minutes 
> because assertion condition defined as a lambda expression in 
> org.apache.geode.cache30.MultiVMRegionTestCase [region contents are not 
> consistent for cckey7] expected:<"ccvalue513398912"> but was:.
>   at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:165)
>   at 
> org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119)
>   at 
> org.awaitility.core.AssertionCondition.await(AssertionCondition.java:31)
>   at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:895)
>   at 
> org.awaitility.core.ConditionFactory.untilAsserted(ConditionFactory.java:679)
> ...
> {code}





[jira] [Commented] (GEODE-7472) DistributedAckOverflowRegionCCEOffHeapDUnitTest.testConcurrentEventsOnEmptyRegion failed in DistributedTestOpenJDK8

2020-11-05 Thread Sarah Abbey (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-7472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227088#comment-17227088
 ] 

Sarah Abbey commented on GEODE-7472:


Re-opening issue due to CI failure: 
https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK11/builds/574

> DistributedAckOverflowRegionCCEOffHeapDUnitTest.testConcurrentEventsOnEmptyRegion
>  failed in DistributedTestOpenJDK8
> ---
>
> Key: GEODE-7472
> URL: https://issues.apache.org/jira/browse/GEODE-7472
> Project: Geode
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 1.12.0
>Reporter: Mark Hanson
>Assignee: Ernest Burghardt
>Priority: Major
>  Labels: flaky
> Fix For: 1.12.0
>
>
> testConcurrentEvents is failing in testing.
>  
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/mhansonp-mhanson-mass-test-ru-main/jobs/DistributedTestOpenJDK8/builds/1680]
> {noformat}
> org.apache.geode.cache30.DistributedAckOverflowRegionCCEOffHeapDUnitTest > 
> testConcurrentEventsOnEmptyRegion FAILED
> org.awaitility.core.ConditionTimeoutException: Condition with alias 'Wait 
> for the members to eventually be consistent' didn't complete within 300 
> seconds because assertion condition defined as a lambda expression in 
> org.apache.geode.cache30.MultiVMRegionTestCase that uses 
> org.apache.geode.test.dunit.VM, 
> org.apache.geode.test.dunit.VMorg.apache.geode.test.dunit.VM, 
> org.apache.geode.test.dunit.VMorg.apache.geode.test.dunit.VM [r2 contents are 
> not consistent with r1 for cckey7] expected:<"ccvalue1212233561"> but 
> was:.
> Caused by:
> org.junit.ComparisonFailure: [r2 contents are not consistent with r1 
> for cckey7] expected:<"ccvalue1212233561"> but was:
> {noformat}
>  
> {noformat}
>  =-=-=-=-=-=-=-=-=-=-=-=-=-=-=  Test Results URI 
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> http://files.apachegeode-ci.info/builds/mhansonp-mhanson-mass-test-ru-main/1.10.0-SNAPSHOT.0007/test-results/distributedTest/1574073134/
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Test report artifacts from this job are available at:
> http://files.apachegeode-ci.info/builds/mhansonp-mhanson-mass-test-ru-main/1.10.0-SNAPSHOT.0007/test-artifacts/1574073134/distributedtestfiles-OpenJDK8-1.10.0-SNAPSHOT.0007.tgz{noformat}
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/mhansonp-mhanson-mass-test-ru-main/jobs/DistributedTestOpenJDK8/builds/1613]
> {noformat}
> org.apache.geode.cache30.DistributedAckOverflowRegionCCEOffHeapDUnitTest > 
> testConcurrentEventsOnEmptyRegion FAILED
> 22:20:16org.awaitility.core.ConditionTimeoutException: Condition with 
> alias 'Wait for the members to eventually be consistent' didn't complete 
> within 300 seconds because assertion condition defined as a lambda expression 
> in org.apache.geode.cache30.MultiVMRegionTestCase that uses 
> org.apache.geode.test.dunit.VM, 
> org.apache.geode.test.dunit.VMorg.apache.geode.test.dunit.VM, 
> org.apache.geode.test.dunit.VMorg.apache.geode.test.dunit.VM [r2 contents are 
> not consistent with r1 for subkey cckey3-1] expected:<"ccvalue[-1702102599]"> 
> but was:<"ccvalue[2145556138]">.
> 22:20:16
> 22:20:16Caused by:
> 22:20:16org.junit.ComparisonFailure: [r2 contents are not consistent 
> with r1 for subkey cckey3-1] expected:<"ccvalue[-1702102599]"> but 
> was:<"ccvalue[2145556138]">
> 23:12:55 {noformat}
>  
> {noformat}
>  =-=-=-=-=-=-=-=-=-=-=-=-=-=-=  Test Results URI 
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> http://files.apachegeode-ci.info/builds/mhansonp-mhanson-mass-test-ru-main/1.10.0-SNAPSHOT.0007/test-results/distributedTest/1573975176/
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Test report artifacts from this job are available at:
> http://files.apachegeode-ci.info/builds/mhansonp-mhanson-mass-test-ru-main/1.10.0-SNAPSHOT.0007/test-artifacts/1573975176/distributedtestfiles-OpenJDK8-1.10.0-SNAPSHOT.0007.tgz{noformat}





[jira] [Assigned] (GEODE-7472) DistributedAckOverflowRegionCCEOffHeapDUnitTest.testConcurrentEventsOnEmptyRegion failed in DistributedTestOpenJDK8

2020-11-05 Thread Sarah Abbey (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-7472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sarah Abbey reassigned GEODE-7472:
--

Assignee: (was: Ernest Burghardt)

> DistributedAckOverflowRegionCCEOffHeapDUnitTest.testConcurrentEventsOnEmptyRegion
>  failed in DistributedTestOpenJDK8
> ---
>
> Key: GEODE-7472
> URL: https://issues.apache.org/jira/browse/GEODE-7472
> Project: Geode
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 1.12.0
>Reporter: Mark Hanson
>Priority: Major
>  Labels: flaky
> Fix For: 1.12.0
>
>
> testConcurrentEvents is failing in testing.
>  
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/mhansonp-mhanson-mass-test-ru-main/jobs/DistributedTestOpenJDK8/builds/1680]
> {noformat}
> org.apache.geode.cache30.DistributedAckOverflowRegionCCEOffHeapDUnitTest > 
> testConcurrentEventsOnEmptyRegion FAILED
> org.awaitility.core.ConditionTimeoutException: Condition with alias 'Wait 
> for the members to eventually be consistent' didn't complete within 300 
> seconds because assertion condition defined as a lambda expression in 
> org.apache.geode.cache30.MultiVMRegionTestCase that uses 
> org.apache.geode.test.dunit.VM, 
> org.apache.geode.test.dunit.VMorg.apache.geode.test.dunit.VM, 
> org.apache.geode.test.dunit.VMorg.apache.geode.test.dunit.VM [r2 contents are 
> not consistent with r1 for cckey7] expected:<"ccvalue1212233561"> but 
> was:.
> Caused by:
> org.junit.ComparisonFailure: [r2 contents are not consistent with r1 
> for cckey7] expected:<"ccvalue1212233561"> but was:
> {noformat}
>  
> {noformat}
>  =-=-=-=-=-=-=-=-=-=-=-=-=-=-=  Test Results URI 
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> http://files.apachegeode-ci.info/builds/mhansonp-mhanson-mass-test-ru-main/1.10.0-SNAPSHOT.0007/test-results/distributedTest/1574073134/
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Test report artifacts from this job are available at:
> http://files.apachegeode-ci.info/builds/mhansonp-mhanson-mass-test-ru-main/1.10.0-SNAPSHOT.0007/test-artifacts/1574073134/distributedtestfiles-OpenJDK8-1.10.0-SNAPSHOT.0007.tgz{noformat}
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/mhansonp-mhanson-mass-test-ru-main/jobs/DistributedTestOpenJDK8/builds/1613]
> {noformat}
> org.apache.geode.cache30.DistributedAckOverflowRegionCCEOffHeapDUnitTest > 
> testConcurrentEventsOnEmptyRegion FAILED
> 22:20:16org.awaitility.core.ConditionTimeoutException: Condition with 
> alias 'Wait for the members to eventually be consistent' didn't complete 
> within 300 seconds because assertion condition defined as a lambda expression 
> in org.apache.geode.cache30.MultiVMRegionTestCase that uses 
> org.apache.geode.test.dunit.VM, 
> org.apache.geode.test.dunit.VMorg.apache.geode.test.dunit.VM, 
> org.apache.geode.test.dunit.VMorg.apache.geode.test.dunit.VM [r2 contents are 
> not consistent with r1 for subkey cckey3-1] expected:<"ccvalue[-1702102599]"> 
> but was:<"ccvalue[2145556138]">.
> 22:20:16
> 22:20:16Caused by:
> 22:20:16org.junit.ComparisonFailure: [r2 contents are not consistent 
> with r1 for subkey cckey3-1] expected:<"ccvalue[-1702102599]"> but 
> was:<"ccvalue[2145556138]">
> 23:12:55 {noformat}
>  
> {noformat}
>  =-=-=-=-=-=-=-=-=-=-=-=-=-=-=  Test Results URI 
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> http://files.apachegeode-ci.info/builds/mhansonp-mhanson-mass-test-ru-main/1.10.0-SNAPSHOT.0007/test-results/distributedTest/1573975176/
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Test report artifacts from this job are available at:
> http://files.apachegeode-ci.info/builds/mhansonp-mhanson-mass-test-ru-main/1.10.0-SNAPSHOT.0007/test-artifacts/1573975176/distributedtestfiles-OpenJDK8-1.10.0-SNAPSHOT.0007.tgz{noformat}





[jira] [Commented] (GEODE-7472) DistributedAckOverflowRegionCCEOffHeapDUnitTest.testConcurrentEventsOnEmptyRegion failed in DistributedTestOpenJDK8

2020-11-05 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-7472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227089#comment-17227089
 ] 

Geode Integration commented on GEODE-7472:
--

Seen in [DistributedTestOpenJDK11 
#574|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK11/builds/574]
 ... see [test 
results|http://files.apachegeode-ci.info/builds/apache-develop-main/1.14.0-build.0467/test-results/distributedTest/1604579602/]
 or download 
[artifacts|http://files.apachegeode-ci.info/builds/apache-develop-main/1.14.0-build.0467/test-artifacts/1604579602/distributedtestfiles-OpenJDK11-1.14.0-build.0467.tgz].

> DistributedAckOverflowRegionCCEOffHeapDUnitTest.testConcurrentEventsOnEmptyRegion
>  failed in DistributedTestOpenJDK8
> ---
>
> Key: GEODE-7472
> URL: https://issues.apache.org/jira/browse/GEODE-7472
> Project: Geode
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 1.12.0
>Reporter: Mark Hanson
>Priority: Major
>  Labels: flaky
> Fix For: 1.12.0
>
>
> testConcurrentEvents is failing in testing.
>  
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/mhansonp-mhanson-mass-test-ru-main/jobs/DistributedTestOpenJDK8/builds/1680]
> {noformat}
> org.apache.geode.cache30.DistributedAckOverflowRegionCCEOffHeapDUnitTest > 
> testConcurrentEventsOnEmptyRegion FAILED
> org.awaitility.core.ConditionTimeoutException: Condition with alias 'Wait 
> for the members to eventually be consistent' didn't complete within 300 
> seconds because assertion condition defined as a lambda expression in 
> org.apache.geode.cache30.MultiVMRegionTestCase that uses 
> org.apache.geode.test.dunit.VM, 
> org.apache.geode.test.dunit.VMorg.apache.geode.test.dunit.VM, 
> org.apache.geode.test.dunit.VMorg.apache.geode.test.dunit.VM [r2 contents are 
> not consistent with r1 for cckey7] expected:<"ccvalue1212233561"> but 
> was:.
> Caused by:
> org.junit.ComparisonFailure: [r2 contents are not consistent with r1 
> for cckey7] expected:<"ccvalue1212233561"> but was:
> {noformat}
>  
> {noformat}
>  =-=-=-=-=-=-=-=-=-=-=-=-=-=-=  Test Results URI 
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> http://files.apachegeode-ci.info/builds/mhansonp-mhanson-mass-test-ru-main/1.10.0-SNAPSHOT.0007/test-results/distributedTest/1574073134/
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Test report artifacts from this job are available at:
> http://files.apachegeode-ci.info/builds/mhansonp-mhanson-mass-test-ru-main/1.10.0-SNAPSHOT.0007/test-artifacts/1574073134/distributedtestfiles-OpenJDK8-1.10.0-SNAPSHOT.0007.tgz{noformat}
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/mhansonp-mhanson-mass-test-ru-main/jobs/DistributedTestOpenJDK8/builds/1613]
> {noformat}
> org.apache.geode.cache30.DistributedAckOverflowRegionCCEOffHeapDUnitTest > 
> testConcurrentEventsOnEmptyRegion FAILED
> 22:20:16org.awaitility.core.ConditionTimeoutException: Condition with 
> alias 'Wait for the members to eventually be consistent' didn't complete 
> within 300 seconds because assertion condition defined as a lambda expression 
> in org.apache.geode.cache30.MultiVMRegionTestCase that uses 
> org.apache.geode.test.dunit.VM, 
> org.apache.geode.test.dunit.VMorg.apache.geode.test.dunit.VM, 
> org.apache.geode.test.dunit.VMorg.apache.geode.test.dunit.VM [r2 contents are 
> not consistent with r1 for subkey cckey3-1] expected:<"ccvalue[-1702102599]"> 
> but was:<"ccvalue[2145556138]">.
> 22:20:16
> 22:20:16Caused by:
> 22:20:16org.junit.ComparisonFailure: [r2 contents are not consistent 
> with r1 for subkey cckey3-1] expected:<"ccvalue[-1702102599]"> but 
> was:<"ccvalue[2145556138]">
> 23:12:55 {noformat}
>  
> {noformat}
>  =-=-=-=-=-=-=-=-=-=-=-=-=-=-=  Test Results URI 
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> http://files.apachegeode-ci.info/builds/mhansonp-mhanson-mass-test-ru-main/1.10.0-SNAPSHOT.0007/test-results/distributedTest/1573975176/
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Test report artifacts from this job are available at:
> http://files.apachegeode-ci.info/builds/mhansonp-mhanson-mass-test-ru-main/1.10.0-SNAPSHOT.0007/test-artifacts/1573975176/distributedtestfiles-OpenJDK8-1.10.0-SNAPSHOT.0007.tgz{noformat}





[jira] [Commented] (GEODE-8681) peer-to-peer message loss due to sending connection closing with TLS enabled

2020-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227092#comment-17227092
 ] 

ASF GitHub Bot commented on GEODE-8681:
---

echobravopapa closed pull request #5706:
URL: https://github.com/apache/geode/pull/5706


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> peer-to-peer message loss due to sending connection closing with TLS enabled
> 
>
> Key: GEODE-8681
> URL: https://issues.apache.org/jira/browse/GEODE-8681
> Project: Geode
>  Issue Type: Bug
>  Components: membership, messaging
>Affects Versions: 1.10.0, 1.11.0, 1.12.0, 1.13.0
>Reporter: Bruce J Schuchardt
>Assignee: Bruce J Schuchardt
>Priority: Major
>  Labels: pull-request-available, release-blocker
>
> We have observed message loss when TLS is enabled and a distributed lock is 
> released right after sending a message that doesn't require acknowledgement 
> if the sending socket is immediately closed. The closing of sockets 
> immediately after sending a message is frequently seen in function execution 
> threads or server-side application threads that use this pattern:
> {code:java}
> try {
>   DistributedSystem.setThreadsSocketPolicy(false);
>   acquireDistributedLock(lockName);
>   // perform one or more cache operations
> } finally {
>   distLockService.unlock(lockName);
>   DistributedSystem.releaseThreadsSockets(); // closes the socket
> }
> {code}
> The fault seems to be in NioSSLEngine.unwrap(), which throws an 
> SSLException() if it finds the SSLEngine is closed even though there is valid 
> data in its decrypt buffer.  It shouldn't throw an exception in that case.
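
The behaviour the description asks for can be sketched as a small decision function, assuming the decrypted-data buffer is in write mode (its position is the number of bytes decrypted so far). `UnwrapStatusPolicy` and `afterUnwrap` are hypothetical names and this is not the code in NioSSLEngine; only `SSLEngineResult.Status` and `SSLException` are real JDK types. The point is that a CLOSED status raises an error only when there is nothing left to deliver.

```java
import java.nio.ByteBuffer;
import javax.net.ssl.SSLEngineResult;
import javax.net.ssl.SSLException;

final class UnwrapStatusPolicy {
  // Decides what to do after an unwrap() call, given the reported status and
  // the buffer holding decrypted application data.
  static ByteBuffer afterUnwrap(SSLEngineResult.Status status, ByteBuffer decrypted)
      throws SSLException {
    switch (status) {
      case OK:
      case BUFFER_UNDERFLOW:
        // Normal cases: hand back whatever has been decrypted so far.
        return decrypted;
      case CLOSED:
        if (decrypted.position() > 0) {
          // The engine is closed, but valid data was already decrypted:
          // deliver it instead of discarding it.
          return decrypted;
        }
        throw new SSLException("SSLEngine is closed and no decrypted data remains");
      default:
        // Simplification: a real implementation would typically grow the
        // buffer and retry on BUFFER_OVERFLOW.
        throw new SSLException("unhandled unwrap status: " + status);
    }
  }

  public static void main(String[] args) throws SSLException {
    ByteBuffer decrypted = ByteBuffer.allocate(64);
    decrypted.put(new byte[] {10, 20, 30});
    ByteBuffer result = afterUnwrap(SSLEngineResult.Status.CLOSED, decrypted);
    System.out.println("bytes still deliverable: " + result.position());
  }
}
```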





[jira] [Commented] (GEODE-8688) Flaky C++ Native client integration test cases: PartitionRegionOpsTest.[get|put]PartitionedRegionWithRedundancyServerGoesDownSingleHop

2020-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227093#comment-17227093
 ] 

ASF GitHub Bot commented on GEODE-8688:
---

pivotal-jbarrett commented on a change in pull request #686:
URL: https://github.com/apache/geode-native/pull/686#discussion_r518456823



##
File path: cppcache/integration/test/PartitionRegionOpsTest.cpp
##
@@ -144,6 +146,10 @@ void verifyMetadataWasRemovedAtFirstError() {
   }
 }
   }
+  std::cout << "timeoutErrors: " << timeoutErrors << ", ioErrors: " << ioErrors
+<< ", metadataRemovedDueToTimeout: " << metadataRemovedDueToTimeout
+<< ", metadataRemovedDueToIoErr: " << metadataRemovedDueToIoErr
+<< std::endl;

Review comment:
   Tests should never output anything other than assertion failures. If 
more information is useful to log then it is useful to assert.







> Flaky C++ Native client integration test cases: 
> PartitionRegionOpsTest.[get|put]PartitionedRegionWithRedundancyServerGoesDownSingleHop
> --
>
> Key: GEODE-8688
> URL: https://issues.apache.org/jira/browse/GEODE-8688
> Project: Geode
>  Issue Type: Bug
>  Components: native client
>Affects Versions: 1.13.0
>Reporter: Alberto Gomez
>Assignee: Alberto Gomez
>Priority: Major
>  Labels: pull-request-available
>
> The following test cases for the C++ native client are flaky:
> PartitionRegionOpsTest.getPartitionedRegionWithRedundancyServerGoesDownSingleHop
> PartitionRegionOpsTest.putPartitionedRegionWithRedundancyServerGoesDownSingleHop
>  
> They fail very often when run in CI although I have not seen them fail when 
> executed manually.
>  





[jira] [Commented] (GEODE-8681) peer-to-peer message loss due to sending connection closing with TLS enabled

2020-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227094#comment-17227094
 ] 

ASF GitHub Bot commented on GEODE-8681:
---

echobravopapa opened a new pull request #5713:
URL: https://github.com/apache/geode/pull/5713


   …ng with TLS enabled (#5699)
   
   A socket-read could pick up more than one message and a single unwrap()
   could decrypt multiple messages.
   Normally the engine isn't closed and it reports normal
   status from an unwrap() operation, and Connection.processInputBuffer
   picks up each message, one by one, from the buffer and dispatches them.
   But if the SSLEngine is closed we were ignoring any already-decrypted
   data sitting in the unwrapped buffer and instead throwing an
   SSLException.
   
   (cherry picked from commit 7da8f9b516ac1e2525a1dfc922af7bfb8995f2c6)
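
To illustrate why a single read can yield several messages that must each be dispatched, here is a minimal sketch that pulls every complete message out of one decrypted buffer. The framing (a 4-byte length prefix followed by the payload) and the names `FrameSplitter` and `drainCompleteMessages` are assumptions made for this example; the message header format Geode actually uses is not stated in this thread.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class FrameSplitter {
  // Extracts every complete length-prefixed message from a buffer that is in
  // read mode. A trailing partial message is left untouched in the buffer so
  // it can be completed by a later read.
  static List<byte[]> drainCompleteMessages(ByteBuffer buffer) {
    List<byte[]> messages = new ArrayList<>();
    while (buffer.remaining() >= Integer.BYTES) {
      buffer.mark();
      int length = buffer.getInt();
      if (length < 0) {
        throw new IllegalArgumentException("negative message length: " + length);
      }
      if (buffer.remaining() < length) {
        buffer.reset(); // incomplete message: rewind to its length prefix
        break;
      }
      byte[] payload = new byte[length];
      buffer.get(payload);
      messages.add(payload);
    }
    return messages;
  }

  public static void main(String[] args) {
    ByteBuffer buffer = ByteBuffer.allocate(64);
    buffer.putInt(2).put(new byte[] {1, 2});
    buffer.putInt(3).put(new byte[] {3, 4, 5});
    buffer.flip();
    List<byte[]> messages = drainCompleteMessages(buffer);
    System.out.println("messages dispatched: " + messages.size());
  }
}
```

If the engine reports that it is closed after such a read, the messages already sitting in the buffer are still complete and can be dispatched before the close is handled.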
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> peer-to-peer message loss due to sending connection closing with TLS enabled
> 
>
> Key: GEODE-8681
> URL: https://issues.apache.org/jira/browse/GEODE-8681
> Project: Geode
> Issue Type: Bug
> Components: membership, messaging
> Affects Versions: 1.10.0, 1.11.0, 1.12.0, 1.13.0
> Reporter: Bruce J Schuchardt
> Assignee: Bruce J Schuchardt
> Priority: Major
> Labels: pull-request-available, release-blocker
>
> We have observed message loss when TLS is enabled and a distributed lock is 
> released right after sending a message that doesn't require acknowledgement 
> if the sending socket is immediately closed. The closing of sockets 
> immediately after sending a message is frequently seen in function execution 
> threads or server-side application threads that use this pattern:
> {code:java}
> try {
>   DistributedSystem.setThreadsSocketPolicy(false);
>   acquireDistributedLock(lockName);
>   // perform one or more cache operations
> } finally {
>   distLockService.unlock(lockName);
>   DistributedSystem.releaseThreadsSockets(); // closes the socket
> }
> {code}
> The fault seems to be in NioSslEngine.unwrap(), which throws an 
> SSLException if it finds the SSLEngine is closed even though there is valid 
> data in its decrypt buffer.  It shouldn't throw an exception in that case.
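
The behavior the description asks for can be sketched as follows. This is a minimal illustration and not the actual NioSslEngine code: the class `UnwrapSketch` and the method `afterUnwrap` are hypothetical names, and the decision is reduced to the engine status plus the number of bytes already decrypted.

```java
import java.nio.ByteBuffer;
import javax.net.ssl.SSLEngineResult.Status;
import javax.net.ssl.SSLException;

/** Hypothetical helper; illustrates the rule only, it does not drive a real SSLEngine. */
final class UnwrapSketch {
  /**
   * Decides what to hand back after an unwrap attempt.
   *
   * @param status the status reported by the unwrap operation
   * @param decrypted buffer in write mode; position() is the count of decrypted bytes
   */
  static ByteBuffer afterUnwrap(Status status, ByteBuffer decrypted) throws SSLException {
    if (status == Status.CLOSED && decrypted.position() == 0) {
      // Closed and nothing left to deliver: this is the only case that should fail.
      throw new SSLException("SSLEngine is closed and no decrypted data remains");
    }
    // Either the engine is healthy, or it is closed but messages were already
    // decrypted. In both cases the caller should still dispatch what is buffered.
    return decrypted;
  }
}
```

The point of the sketch is the ordering of the checks: a closed engine alone is not treated as an error while decrypted bytes are still waiting to be dispatched.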





[jira] [Commented] (GEODE-8667) Duplicate online Oplog compaction after offline Oplog compaction

2020-11-05 Thread Jianxia Chen (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227100#comment-17227100
 ] 

Jianxia Chen commented on GEODE-8667:
-

When Oplog.totalCount == 0, no Oplog compaction is needed.
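
A minimal sketch of such a guard, assuming a percentage-based compaction threshold. `CompactionCheckSketch`, `needsCompaction`, and its parameters are hypothetical names chosen for illustration; this is not Geode's actual Oplog code.

```java
/** Hypothetical check; the names and the threshold semantics are assumptions. */
final class CompactionCheckSketch {
  /**
   * @param totalCount number of entries ever recorded in the oplog
   * @param liveCount number of entries that are still live
   * @param thresholdPercent compact when the live percentage is at or below this value
   */
  static boolean needsCompaction(long totalCount, long liveCount, int thresholdPercent) {
    if (totalCount == 0) {
      // Nothing was recorded, so there is no garbage to reclaim.
      return false;
    }
    double livePercent = 100.0 * Math.max(liveCount, 0) / totalCount;
    return livePercent <= thresholdPercent;
  }
}
```

With this guard, an oplog that was just compacted offline and reports a total count of zero is skipped on restart instead of being compacted again.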

> Duplicate online Oplog compaction after offline Oplog compaction
> 
>
> Key: GEODE-8667
> URL: https://issues.apache.org/jira/browse/GEODE-8667
> Project: Geode
> Issue Type: Bug
> Reporter: Jianxia Chen
> Assignee: Jianxia Chen
> Priority: Major
> Labels: GeodeOperationAPI, pull-request-available
>
> Use the `compact offline-disk-store` command to compact the Oplogs offline. 
> Then restart the servers. 
> The logs show OplogCompactor thread is compacting Oplogs again when 
> restarting the servers, even though the Oplogs were just compacted offline. 
> For example:
> ```
> [info 2020/10/27 16:32:22.534 PDT  tid=0x35] Recovered 
> values for disk store DEFAULT with unique id 
> 76393d3c-dd10-4b89-b655-821d37631774
> [info 2020/10/27 16:32:22.535 PDT  
> tid=0x35] OplogCompactor for DEFAULT compaction oplog id(s): oplog#2
> [info 2020/10/27 16:32:22.537 PDT  
> tid=0x35] compaction did 2 creates and updates in 2 ms
> [info 2020/10/27 16:32:22.537 PDT  tid=0x36] Deleted 
> oplog#2 crf for disk store DEFAULT.
> [info 2020/10/27 16:32:22.538 PDT  tid=0x36] Deleted 
> oplog#2 krf for disk store DEFAULT.
> [info 2020/10/27 16:32:22.538 PDT  tid=0x36] Deleted 
> oplog#2 drf for disk store DEFAULT.
> ```





[jira] [Commented] (GEODE-8652) member hung in Connection.notifyHandshakeWaiter() during disconnect waiting for a lock held by another thread in Connection.readAck()

2020-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227102#comment-17227102
 ] 

ASF GitHub Bot commented on GEODE-8652:
---

Bill merged pull request #5712:
URL: https://github.com/apache/geode/pull/5712


   





> member hung in Connection.notifyHandshakeWaiter() during disconnect waiting 
> for a lock held by another thread in Connection.readAck() 
> --
>
> Key: GEODE-8652
> URL: https://issues.apache.org/jira/browse/GEODE-8652
> Project: Geode
> Issue Type: Bug
> Components: membership, messaging
> Affects Versions: 1.12.0, 1.13.0, 1.14.0
> Reporter: Bill Burcham
> Assignee: Bill Burcham
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.12.1, 1.14.0, 1.13.1
>
>
> An application encountered the following hang in a TLS-enabled cluster.
> Let's call the cluster members ds3 -> ds1. 
> ds3 sends a {{PutAllPRMessage}} to ds1 and is stuck in 
> {{SocketChannel.read()}} waiting for the acknowledgement.
> {{ClusterDistributionManager.uncleanShutdown()}} is invoked on ds3 to shut 
> the member down. That thread blocks trying to acquire a lock on the 
> {{NioSslEngine}} held by the first thread (the one waiting for the ack 
> to the put-all).
> Somehow the shutdown thread must be allowed to proceed.
> Here's the hung thread in ds3 (a.k.a. vm_3_thr_34_dataStore3_host1_8592) 
> trying to shut down the member but it's stuck waiting for the monitor on the 
> {{NioSslEngine}}:
> {noformat}
> "vm_3_thr_34_dataStore3_host1_8592" #795 daemon prio=5 os_prio=0 tid=0x7fdb9c011000 nid=0x2fcc waiting for monitor entry [0x7fdb6f4b7000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>   at org.apache.geode.internal.tcp.Connection.notifyHandshakeWaiter(Connection.java:804)
>   - waiting to lock <0xf2635b28> (a org.apache.geode.internal.net.NioSslEngine)
>   at org.apache.geode.internal.tcp.Connection.close(Connection.java:1350)
>   at org.apache.geode.internal.tcp.Connection.closePartialConnect(Connection.java:1278)
>   at org.apache.geode.internal.tcp.ConnectionTable.closeCon(ConnectionTable.java:612)
>   at org.apache.geode.internal.tcp.ConnectionTable.closeCon(ConnectionTable.java:604)
>   at org.apache.geode.internal.tcp.ConnectionTable.close(ConnectionTable.java:661)
>   - locked <0xf2678cf8> (a java.util.ArrayList)
>   - locked <0xf1187348> (a java.util.concurrent.ConcurrentHashMap)
>   at org.apache.geode.internal.tcp.TCPConduit.stop(TCPConduit.java:487)
>   at org.apache.geode.distributed.internal.direct.DirectChannel.disconnect(DirectChannel.java:644)
>   - locked <0xf11867a8> (a org.apache.geode.distributed.internal.direct.DirectChannel)
>   at org.apache.geode.distributed.internal.DistributionImpl.disconnectDirectChannel(DistributionImpl.java:631)
>   at org.apache.geode.distributed.internal.DistributionImpl.access$200(DistributionImpl.java:82)
>   at org.apache.geode.distributed.internal.DistributionImpl$LifecycleListenerImpl.disconnect(DistributionImpl.java:904)
>   at org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.stop(GMSMembership.java:1908)
>   at org.apache.geode.distributed.internal.membership.gms.Services.stop(Services.java:302)
>   at org.apache.geode.distributed.internal.membership.gms.GMSMembership.shutdown(GMSMembership.java:1262)
>   at org.apache.geode.distributed.internal.membership.gms.GMSMembership.disconnect(GMSMembership.java:1847)
>   at org.apache.geode.distributed.internal.DistributionImpl.disconnect(DistributionImpl.java:501)
>   at org.apache.geode.distributed.internal.ClusterDistributionManager.uncleanShutdown(ClusterDistributionManager.java:1291)
> {noformat}
> That thread is waiting on a lock held by this thread (in ds3) which is 
> waiting on an acknowledgement to a PutAllPRMessage sent to ds1.
> {noformat}
> "vm_3_thr_37_dataStore3_host1_8592" #857 daemon prio=5 os_prio=0 
> tid=0x7fdb9c030800 nid=0x30d1 runnable [0x7fdb732f]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(

[jira] [Commented] (GEODE-8652) member hung in Connection.notifyHandshakeWaiter() during disconnect waiting for a lock held by another thread in Connection.readAck()

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227103#comment-17227103
 ] 

ASF subversion and git services commented on GEODE-8652:


Commit af267c005a63317cbb8528cdb38eccf6a8747818 in geode's branch 
refs/heads/develop from Bill Burcham
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=af267c0 ]

* GEODE-8652: NioSslEngine.close() Bypasses Locks (#5712)

- NioSslEngine.close() proceeds even if readers (or writers) are
  operating on its ByteBuffers, allowing Connection.close() to close
  its socket and proceed.

- NioSslEngine.close() needed a lock only on the output buffer, so
  we split what was a single lock into two. Also instead of using
  synchronized we use a ReentrantLock so we can
  call tryLock() and time out if needed in NioSslEngine.close().

- Since readers/writers may hold locks on these input/output buffers
  when NioSslEngine.close() is called a reference count is maintained
  and the buffers are returned to the pool only when the last user
  is done.

- To manage the locking and reference counting a new AutoCloseable
  ByteBufferSharing interface is introduced with a trivial
  implementation: ByteBufferSharingNoOp and a real implementation:
  ByteBufferSharingImpl.

- Added a new unit test, and a new concurrency test for
  ByteBufferSharingImpl: both ensure that ByteBuffers are returned
  to the pool exactly once. Added a new DUnit test for the interaction
  between ByteBufferSharingImpl and NioSslEngine and Connection.

Co-authored-by: Bill Burcham 
Co-authored-by: Darrel Schneider 
Co-authored-by: Ernie Burghardt 
Co-authored-by: Dan Smith 
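
The locking and reference counting described in the commit message above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the ByteBufferSharingImpl from the commit: `SharedBufferSketch`, `Sharing`, `open`, and `timesReturnedToPool` are hypothetical names, and a counter stands in for the buffer pool.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for a pooled buffer guarded by a lock and a reference count. */
final class SharedBufferSketch {
  /** AutoCloseable without a checked exception, convenient for try-with-resources. */
  interface Sharing extends AutoCloseable {
    @Override
    void close();
  }

  private final ReentrantLock lock = new ReentrantLock();
  private final AtomicInteger refCount = new AtomicInteger(1); // the owner holds one reference
  private final AtomicBoolean ownerClosed = new AtomicBoolean();
  private final AtomicInteger timesReturnedToPool = new AtomicInteger();

  /** A reader or writer takes a reference first, then the lock. */
  Sharing open() {
    int current;
    do {
      current = refCount.get();
      if (current == 0) {
        throw new IllegalStateException("buffer was already returned to the pool");
      }
    } while (!refCount.compareAndSet(current, current + 1));
    lock.lock();
    return () -> {
      lock.unlock();
      release();
    };
  }

  /** The owner waits a bounded time for the lock, then drops its reference regardless. */
  boolean close(long timeout, TimeUnit unit) throws InterruptedException {
    if (!ownerClosed.compareAndSet(false, true)) {
      return false; // already closed, nothing left to do
    }
    boolean locked = false;
    try {
      locked = lock.tryLock(timeout, unit);
      // A real engine would write its closing handshake here, only when locked is true.
      return locked;
    } finally {
      if (locked) {
        lock.unlock();
      }
      release();
    }
  }

  private void release() {
    if (refCount.decrementAndGet() == 0) {
      timesReturnedToPool.incrementAndGet(); // stand-in for handing the buffer back
    }
  }

  int timesReturnedToPool() {
    return timesReturnedToPool.get();
  }
}
```

In this sketch a thread blocked in a read keeps its reference, so `close` can time out on `tryLock` and return without the buffer being handed back underneath the reader. The buffer goes back to the pool exactly once, when the last holder releases it.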

> member hung in Connection.notifyHandshakeWaiter() during disconnect waiting 
> for a lock held by another thread in Connection.readAck() 
> --
>
> Key: GEODE-8652
> URL: https://issues.apache.org/jira/browse/GEODE-8652
> Project: Geode
> Issue Type: Bug
> Components: membership, messaging
> Affects Versions: 1.12.0, 1.13.0, 1.14.0
> Reporter: Bill Burcham
> Assignee: Bill Burcham
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.12.1, 1.14.0, 1.13.1
>
>
> An application encountered the following hang in a TLS-enabled cluster.
> Let's call the cluster members ds3 -> ds1. 
> ds3 sends a {{PutAllPRMessage}} to ds1 and is stuck in 
> {{SocketChannel.read()}} waiting for the acknowledgement.
> {{ClusterDistributionManager.uncleanShutdown()}} is invoked on ds3 to shut 
> the member down. That thread blocks trying to acquire a lock on the 
> {{NioSslEngine}} held by the first thread (the one waiting for the ack 
> to the put-all).
> Somehow the shutdown thread must be allowed to proceed.
> Here's the hung thread in ds3 (a.k.a. vm_3_thr_34_dataStore3_host1_8592) 
> trying to shut down the member but it's stuck waiting for the monitor on the 
> {{NioSslEngine}}:
> {noformat}
> "vm_3_thr_34_dataStore3_host1_8592" #795 daemon prio=5 os_prio=0 tid=0x7fdb9c011000 nid=0x2fcc waiting for monitor entry [0x7fdb6f4b7000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>   at org.apache.geode.internal.tcp.Connection.notifyHandshakeWaiter(Connection.java:804)
>   - waiting to lock <0xf2635b28> (a org.apache.geode.internal.net.NioSslEngine)
>   at org.apache.geode.internal.tcp.Connection.close(Connection.java:1350)
>   at org.apache.geode.internal.tcp.Connection.closePartialConnect(Connection.java:1278)
>   at org.apache.geode.internal.tcp.ConnectionTable.closeCon(ConnectionTable.java:612)
>   at org.apache.geode.internal.tcp.ConnectionTable.closeCon(ConnectionTable.java:604)
>   at org.apache.geode.internal.tcp.ConnectionTable.close(ConnectionTable.java:661)
>   - locked <0xf2678cf8> (a java.util.ArrayList)
>   - locked <0xf1187348> (a java.util.concurrent.ConcurrentHashMap)
>   at org.apache.geode.internal.tcp.TCPConduit.stop(TCPConduit.java:487)
>   at org.apache.geode.distributed.internal.direct.DirectChannel.disconnect(DirectChannel.java:644)
>   - locked <0xf11867a8> (a org.apache.geode.distributed.internal.direct.DirectChannel)
>   at org.apache.geode.distributed.internal.DistributionImpl.disconnectDirectChannel(DistributionImpl.java:631)
>   at org.apache.geode.distributed.internal.DistributionImpl.access$200(DistributionImpl.java:82)
>   at org.apache.geode.distributed.internal.DistributionImpl$LifecycleListenerImpl.disconnect(DistributionImpl.java:904)
>   at org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.stop(GMSMembership.java:1908)
>   at 
> 

[jira] [Commented] (GEODE-8681) peer-to-peer message loss due to sending connection closing with TLS enabled

2020-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227105#comment-17227105
 ] 

ASF GitHub Bot commented on GEODE-8681:
---

echobravopapa opened a new pull request #5714:
URL: https://github.com/apache/geode/pull/5714


   …ng with TLS enabled (#5699)
   
   A socket-read could pick up more than one message and a single unwrap()
   could decrypt multiple messages.
   Normally the engine isn't closed and it reports normal
   status from an unwrap() operation, and Connection.processInputBuffer
   picks up each message, one by one, from the buffer and dispatches them.
   But if the SSLEngine is closed we were ignoring any already-decrypted
   data sitting in the unwrapped buffer and instead we were throwing an SSLException.
   
   (cherry picked from commit 7da8f9b516ac1e2525a1dfc922af7bfb8995f2c6)
   





> peer-to-peer message loss due to sending connection closing with TLS enabled
> 
>
> Key: GEODE-8681
> URL: https://issues.apache.org/jira/browse/GEODE-8681
> Project: Geode
> Issue Type: Bug
> Components: membership, messaging
> Affects Versions: 1.10.0, 1.11.0, 1.12.0, 1.13.0
> Reporter: Bruce J Schuchardt
> Assignee: Bruce J Schuchardt
> Priority: Major
> Labels: pull-request-available, release-blocker
>
> We have observed message loss when TLS is enabled and a distributed lock is 
> released right after sending a message that doesn't require acknowledgement 
> if the sending socket is immediately closed. The closing of sockets 
> immediately after sending a message is frequently seen in function execution 
> threads or server-side application threads that use this pattern:
> {code:java}
> try {
>   DistributedSystem.setThreadsSocketPolicy(false);
>   acquireDistributedLock(lockName);
>   // perform one or more cache operations
> } finally {
>   distLockService.unlock(lockName);
>   DistributedSystem.releaseThreadsSockets(); // closes the socket
> }
> {code}
> The fault seems to be in NioSslEngine.unwrap(), which throws an 
> SSLException if it finds the SSLEngine is closed even though there is valid 
> data in its decrypt buffer.  It shouldn't throw an exception in that case.





[jira] [Commented] (GEODE-8466) Create a ClassLoaderService to abstract away dealing with the default ClassLoader directly

2020-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227111#comment-17227111
 ] 

ASF GitHub Bot commented on GEODE-8466:
---

lgtm-com[bot] commented on pull request #5658:
URL: https://github.com/apache/geode/pull/5658#issuecomment-722765604


   This pull request **introduces 3 alerts** and **fixes 1** when merging 43000b9fa166477601cb64bb14dba9a7439e2c2d into af267c005a63317cbb8528cdb38eccf6a8747818 - [view on LGTM.com](https://lgtm.com/projects/g/apache/geode/rev/pr-8eea1ae86ef56849af481e8b8e63b58fa5cd0422)
   
   **new alerts:**
   
   * 2 for Potential input resource leak
   * 1 for Use of a broken or risky cryptographic algorithm
   
   **fixed alerts:**
   
   * 1 for Use of a broken or risky cryptographic algorithm





> Create a ClassLoaderService to abstract away dealing with the default 
> ClassLoader directly
> --
>
> Key: GEODE-8466
> URL: https://issues.apache.org/jira/browse/GEODE-8466
> Project: Geode
> Issue Type: New Feature
> Components: core
> Reporter: Udo Kohlmeyer
> Assignee: Udo Kohlmeyer
> Priority: Major
> Labels: pull-request-available
>
> With the addition of ClassLoader isolation using JBoss Modules (GEODE-8067), 
> the manner in which we interact with the ClassLoader needs to change.
> An abstraction is required around the default functions like 
> `findResourceAsStream`, `loadClass` and `loadService`.
> As these features will behave differently between different ClassLoader 
> implementations, it is best to have a single service that will expose that 
> functionality in a transparent manner.
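
One possible shape for such a service is sketched below. Only the three method names come from the issue text; the signatures, the interface name `ClassLoaderServiceSketch`, and the implementation `DefaultClassLoaderServiceSketch` are assumptions made for illustration.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical abstraction; method names follow the issue, signatures are assumed. */
interface ClassLoaderServiceSketch {
  InputStream findResourceAsStream(String resourcePath);

  Class<?> loadClass(String className) throws ClassNotFoundException;

  <T> List<T> loadService(Class<T> serviceType);
}

/** One possible implementation backed by a single plain ClassLoader. */
final class DefaultClassLoaderServiceSketch implements ClassLoaderServiceSketch {
  private final ClassLoader loader;

  DefaultClassLoaderServiceSketch(ClassLoader loader) {
    this.loader = loader;
  }

  @Override
  public InputStream findResourceAsStream(String resourcePath) {
    return loader.getResourceAsStream(resourcePath); // null when the resource is absent
  }

  @Override
  public Class<?> loadClass(String className) throws ClassNotFoundException {
    return Class.forName(className, true, loader);
  }

  @Override
  public <T> List<T> loadService(Class<T> serviceType) {
    List<T> services = new ArrayList<>();
    for (T service : ServiceLoader.load(serviceType, loader)) {
      services.add(service);
    }
    return services;
  }
}
```

Callers depend only on the interface, so an implementation built on module-isolated class loaders could replace the default one without changes at the call sites.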





[jira] [Commented] (GEODE-8603) Potentially expand classes identified for CI stressing to include subclasses

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227120#comment-17227120
 ] 

ASF subversion and git services commented on GEODE-8603:


Commit 61ad7515f2bae57ea8a4966604f8db5778daf99d in geode's branch 
refs/heads/support/1.13 from Jens Deppe
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=61ad751 ]

GEODE-8603: Potentially expand classes identified for CI stressing to include 
subclasses (#5601) (#5674)

- Make StressNewTestHelper create the complete gradle test task commands
- Since some tests may have subclasses in different source sets (which
  would require a different repeat task name), it's easier for the
  command generation to all happen in the java helper rather than a
  combination of bash and java.
- Include candidate test class if it is not abstract
- Output a fake Gradle param so that scripts can determine the number of
  tests included.
- Change the CI stress job timeout from 6 to 10 hours.
- Increase the test count threshold from 25 to 35 changed tests. This
  number also includes any tests inferred by this new code.

(cherry picked from commit 4039a363a4b057bca322f29dcf33aa0664f1a912)
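
The selection rule in the commit message above (include the changed class unless it is abstract, plus its subclasses) can be sketched as follows. This is not the StressNewTestHelper code: `StressCandidatesSketch` and `expand` are hypothetical names, the list of known test classes is passed in rather than discovered, and skipping abstract subclasses is an assumption.

```java
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical selection step for stress-test candidates. */
final class StressCandidatesSketch {
  /**
   * Returns the changed class (unless abstract) followed by every concrete
   * strict subclass found among the known test classes.
   */
  static List<Class<?>> expand(Class<?> changed, List<Class<?>> knownTestClasses) {
    List<Class<?>> selected = new ArrayList<>();
    if (!Modifier.isAbstract(changed.getModifiers())) {
      selected.add(changed);
    }
    for (Class<?> candidate : knownTestClasses) {
      boolean strictSubclass = candidate != changed && changed.isAssignableFrom(candidate);
      if (strictSubclass && !Modifier.isAbstract(candidate.getModifiers())) {
        selected.add(candidate);
      }
    }
    return selected;
  }
}
```

Because inferred subclasses are added to the result, the size of the returned list is the number that would be compared against the changed-test threshold mentioned in the commit.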


> Potentially expand classes identified for CI stressing to include subclasses
> 
>
> Key: GEODE-8603
> URL: https://issues.apache.org/jira/browse/GEODE-8603
> Project: Geode
> Issue Type: Test
> Components: ci, tests
> Reporter: Jens Deppe
> Assignee: Jens Deppe
> Priority: Major
> Labels: pull-request-available
>






[jira] [Commented] (GEODE-8603) Potentially expand classes identified for CI stressing to include subclasses

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227131#comment-17227131
 ] 

ASF subversion and git services commented on GEODE-8603:


Commit 9b2aea942d162f6ee43e3a7bcf8e654d5fbb9d3d in geode's branch 
refs/heads/support/1.12 from Jens Deppe
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=9b2aea9 ]

GEODE-8603: Potentially expand classes identified for CI stressing to include 
subclasses (#5601) (#5674)

- Make StressNewTestHelper create the complete gradle test task commands
- Since some tests may have subclasses in different source sets (which
  would require a different repeat task name), it's easier for the
  command generation to all happen in the java helper rather than a
  combination of bash and java.
- Include candidate test class if it is not abstract
- Output a fake Gradle param so that scripts can determine the number of
  tests included.
- Change the CI stress job timeout from 6 to 10 hours.
- Increase the test count threshold from 25 to 35 changed tests. This
  number also includes any tests inferred by this new code.

(cherry picked from commit 4039a363a4b057bca322f29dcf33aa0664f1a912)


> Potentially expand classes identified for CI stressing to include subclasses
> 
>
> Key: GEODE-8603
> URL: https://issues.apache.org/jira/browse/GEODE-8603
> Project: Geode
> Issue Type: Test
> Components: ci, tests
> Reporter: Jens Deppe
> Assignee: Jens Deppe
> Priority: Major
> Labels: pull-request-available
>






[jira] [Commented] (GEODE-8603) Potentially expand classes identified for CI stressing to include subclasses

2020-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227167#comment-17227167
 ] 

ASF GitHub Bot commented on GEODE-8603:
---

onichols-pivotal opened a new pull request #5717:
URL: https://github.com/apache/geode/pull/5717


   It seems that devBuild is not sufficient to ensure that StressNewTestHelper 
and all tests are built before running the helper. Therefore, explicitly call 
`compileTestJava` so that the required classes are compiled regardless of branch.





> Potentially expand classes identified for CI stressing to include subclasses
> 
>
> Key: GEODE-8603
> URL: https://issues.apache.org/jira/browse/GEODE-8603
> Project: Geode
> Issue Type: Test
> Components: ci, tests
> Reporter: Jens Deppe
> Assignee: Jens Deppe
> Priority: Major
> Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8603) Potentially expand classes identified for CI stressing to include subclasses

2020-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227202#comment-17227202
 ] 

ASF GitHub Bot commented on GEODE-8603:
---

onichols-pivotal merged pull request #5717:
URL: https://github.com/apache/geode/pull/5717


   





[jira] [Commented] (GEODE-8603) Potentially expand classes identified for CI stressing to include subclasses

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227205#comment-17227205
 ] 

ASF subversion and git services commented on GEODE-8603:


Commit d1e003c822463e20ce43109e6dc3f6f72a110586 in geode's branch 
refs/heads/develop from Owen Nichols
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=d1e003c ]

GEODE-8603: fix StressNew for support branches (#5717)

* GEODE-8603: fix StressNew for support branches

* all three test compile targets are needed



[jira] [Commented] (GEODE-8603) Potentially expand classes identified for CI stressing to include subclasses

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227204#comment-17227204
 ] 

ASF subversion and git services commented on GEODE-8603:


Commit d1e003c822463e20ce43109e6dc3f6f72a110586 in geode's branch 
refs/heads/develop from Owen Nichols
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=d1e003c ]

GEODE-8603: fix StressNew for support branches (#5717)

* GEODE-8603: fix StressNew for support branches

* all three test compile targets are needed



[jira] [Commented] (GEODE-8603) Potentially expand classes identified for CI stressing to include subclasses

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227208#comment-17227208
 ] 

ASF subversion and git services commented on GEODE-8603:


Commit d1e003c822463e20ce43109e6dc3f6f72a110586 in geode's branch 
refs/heads/develop from Owen Nichols
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=d1e003c ]

GEODE-8603: fix StressNew for support branches (#5717)

* GEODE-8603: fix StressNew for support branches

* all three test compile targets are needed



[jira] [Commented] (GEODE-8603) Potentially expand classes identified for CI stressing to include subclasses

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227207#comment-17227207
 ] 

ASF subversion and git services commented on GEODE-8603:


Commit d1e003c822463e20ce43109e6dc3f6f72a110586 in geode's branch 
refs/heads/develop from Owen Nichols
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=d1e003c ]

GEODE-8603: fix StressNew for support branches (#5717)

* GEODE-8603: fix StressNew for support branches

* all three test compile targets are needed



[jira] [Commented] (GEODE-8603) Potentially expand classes identified for CI stressing to include subclasses

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227210#comment-17227210
 ] 

ASF subversion and git services commented on GEODE-8603:


Commit d1e003c822463e20ce43109e6dc3f6f72a110586 in geode's branch 
refs/heads/develop from Owen Nichols
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=d1e003c ]

GEODE-8603: fix StressNew for support branches (#5717)

* GEODE-8603: fix StressNew for support branches

* all three test compile targets are needed



[jira] [Commented] (GEODE-8603) Potentially expand classes identified for CI stressing to include subclasses

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227211#comment-17227211
 ] 

ASF subversion and git services commented on GEODE-8603:


Commit d1e003c822463e20ce43109e6dc3f6f72a110586 in geode's branch 
refs/heads/develop from Owen Nichols
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=d1e003c ]

GEODE-8603: fix StressNew for support branches (#5717)

* GEODE-8603: fix StressNew for support branches

* all three test compile targets are needed



[jira] [Commented] (GEODE-8603) Potentially expand classes identified for CI stressing to include subclasses

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227213#comment-17227213
 ] 

ASF subversion and git services commented on GEODE-8603:


Commit d1e003c822463e20ce43109e6dc3f6f72a110586 in geode's branch 
refs/heads/develop from Owen Nichols
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=d1e003c ]

GEODE-8603: fix StressNew for support branches (#5717)

* GEODE-8603: fix StressNew for support branches

* all three test compile targets are needed



[jira] [Commented] (GEODE-8603) Potentially expand classes identified for CI stressing to include subclasses

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227214#comment-17227214
 ] 

ASF subversion and git services commented on GEODE-8603:


Commit d1e003c822463e20ce43109e6dc3f6f72a110586 in geode's branch 
refs/heads/develop from Owen Nichols
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=d1e003c ]

GEODE-8603: fix StressNew for support branches (#5717)

* GEODE-8603: fix StressNew for support branches

* all three test compile targets are needed



[jira] [Commented] (GEODE-8603) Potentially expand classes identified for CI stressing to include subclasses

2020-11-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227215#comment-17227215
 ] 

ASF subversion and git services commented on GEODE-8603:


Commit d1e003c822463e20ce43109e6dc3f6f72a110586 in geode's branch 
refs/heads/develop from Owen Nichols
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=d1e003c ]

GEODE-8603: fix StressNew for support branches (#5717)

* GEODE-8603: fix StressNew for support branches

* all three test compile targets are needed


