[jira] [Commented] (GEODE-8329) Durable CQ not registered as durable after server failover

2020-07-13 ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/GEODE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156559#comment-17156559 ]

ASF GitHub Bot commented on GEODE-8329:
---

jvarenina commented on a change in pull request #5360:
URL: https://github.com/apache/geode/pull/5360#discussion_r453489804



##
File path: geode-core/src/main/java/org/apache/geode/cache/client/internal/QueueManagerImpl.java
##
@@ -1112,7 +1112,8 @@ private void recoverCqs(Connection recoveredConnection, boolean isDurable) {
 .set(((DefaultQueryService) this.pool.getQueryService()).getUserAttributes(name));
   }
   try {
-if (((CqStateImpl) cqi.getState()).getState() != CqStateImpl.INIT) {
+if (((CqStateImpl) cqi.getState()).getState() != CqStateImpl.INIT

Review comment:
   I have tested the TC with redundancy configured, and it seems that CQ recovery is done differently in this case. The remaining server sends all relevant CQ information to the starting server in an `InitialImageOperation$FilterInfoMessage`. On receiving that message, the starting server registers the CQs as durable, so no problem was observed in this case.
   
   **Primary server:**
   ```
   [debug 2020/07/10 13:30:54.916 CEST :41001 shared unordered uid=1 local port=53683 remote port=45674> tid=0x57] Received message 'InitialImageOperation$RequestFilterInfoMessage(region path='/_gfe_durable_client_with_id_AppCounters_1_queue'; sender=192.168.1.102(server3:31347):41001; processorId=27)' from <192.168.1.102(server3:31347):41001>
   ```
   
   **Starting server:**
   ```
   [debug 2020/07/10 13:30:54.916 CEST  tid=0x48] Sending (InitialImageOperation$RequestFilterInfoMessage(region path='/_gfe_durable_client_with_id_AppCounters_1_queue'; sender=192.168.1.102(server3:31347):41001; processorId=27)) to 1 peers ([192.168.1.102(server1:30862):41000]) via tcp/ip
   
   [debug 2020/07/10 13:30:54.918 CEST :41000 shared unordered uid=5 local port=52175 remote port=46552> tid=0x30] Received message 'InitialImageOperation$FilterInfoMessage processorId=27 from 192.168.1.102(server1:30862):41000; NON_DURABLE allKeys=0; allKeysInv=0; keysOfInterest=0; keysOfInterestInv=0; patternsOfInterest=0; patternsOfInterestInv=0; filtersOfInterest=0; filtersOfInterestInv=0; DURABLE allKeys=0; allKeysInv=0; keysOfInterest=0; keysOfInterestInv=0; patternsOfInterest=0; patternsOfInterestInv=0; filtersOfInterest=0; filtersOfInterestInv=0; cqs=1' from <192.168.1.102(server1:30862):41000>
   
   [debug 2020/07/10 13:30:54.919 CEST  tid=0x3d] Processing FilterInfo for proxy: CacheClientProxy[identity(192.168.1.102(31226:loner):45576:8b927d38,connection=1,durableAttributes=DurableClientAttributes[id=AppCounters; timeout=200]); port=57552; primary=false; version=GEODE 1.12.0] : InitialImageOperation$FilterInfoMessage processorId=27 from 192.168.1.102(server1:30862):41000; NON_DURABLE allKeys=0; allKeysInv=0; keysOfInterest=0; keysOfInterestInv=0; patternsOfInterest=0; patternsOfInterestInv=0; filtersOfInterest=0; filtersOfInterestInv=0; DURABLE allKeys=0; allKeysInv=0; keysOfInterest=0; keysOfInterestInv=0; patternsOfInterest=0; patternsOfInterestInv=0; filtersOfInterest=0; filtersOfInterestInv=0; cqs=1
   
   [debug 2020/07/10 13:30:54.944 CEST  tid=0x3d] Server side query for the cq: randomTracker is: SELECT * FROM /example-region i where i > 70
   [debug 2020/07/10 13:30:54.944 CEST  tid=0x3d] Added CQ to the base region: /example-region With key as: randomTracker__AppCounters
   [debug 2020/07/10 13:30:54.944 CEST  tid=0x3d] Adding CQ into MatchingCQ map, CQName: randomTracker__AppCounters Number of matched querys are: 1
   [debug 2020/07/10 13:30:54.945 CEST  tid=0x3d] Adding to CQ Repository. CqName : randomTracker ServerCqName : randomTracker__AppCounters
   [debug 2020/07/10 13:30:54.945 CEST  tid=0x3d] Adding CQ randomTracker__AppCounters to this members FilterProfile.
   [debug 2020/07/10 13:30:54.945 CEST  tid=0x3d] Successfully created CQ on the server. CqName : randomTracker
   ```
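The `FilterInfoMessage` debug line packs every interest and CQ counter into one long string, which makes it hard to compare runs with and without redundancy at a glance. A small test helper can split it into its NON_DURABLE group, DURABLE group and `cqs` count. This is a hypothetical sketch: `FilterInfoSummary` is not a Geode class, and it assumes only the `NON_DURABLE ... DURABLE ... cqs=N` layout visible in the log above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical test helper (not part of Geode): splits the counters printed in an
// InitialImageOperation$FilterInfoMessage debug line into the NON_DURABLE group,
// the DURABLE group and the trailing cqs count. Assumes the layout seen in the log.
final class FilterInfoSummary {
  private static final Pattern COUNTER = Pattern.compile("(\\w+)=(\\d+)");

  final Map<String, Integer> nonDurable = new LinkedHashMap<>();
  final Map<String, Integer> durable = new LinkedHashMap<>();
  int cqs = -1; // stays -1 if the line has no cqs= counter

  static FilterInfoSummary parse(String line) {
    int nonDurableAt = line.indexOf("NON_DURABLE");
    // " DURABLE" (with a leading space) cannot match inside "NON_DURABLE".
    int durableAt = nonDurableAt < 0 ? -1 : line.indexOf(" DURABLE", nonDurableAt);
    int cqsAt = durableAt < 0 ? -1 : line.indexOf("cqs=", durableAt);
    if (cqsAt < 0) {
      throw new IllegalArgumentException("Not a FilterInfoMessage summary: " + line);
    }
    FilterInfoSummary summary = new FilterInfoSummary();
    collect(line.substring(nonDurableAt, durableAt), summary.nonDurable);
    collect(line.substring(durableAt, cqsAt), summary.durable);
    Matcher m = COUNTER.matcher(line.substring(cqsAt));
    if (m.find()) {
      summary.cqs = Integer.parseInt(m.group(2));
    }
    return summary;
  }

  // Adds every name=number pair found in the segment to the target map.
  private static void collect(String segment, Map<String, Integer> target) {
    Matcher m = COUNTER.matcher(segment);
    while (m.find()) {
      target.put(m.group(1), Integer.parseInt(m.group(2)));
    }
  }
}
```

Parsed against the starting server's line above, this yields `cqs == 1` with all eight interest counters at 0 in both groups, i.e. only the CQ was carried over by the initial image.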
   
   I can attach the full logs if you need them. I also found the following comment in the client code:
   ```
   // Even though the new redundant queue will usually recover
   // subscription information (see bug #39014) from its initial
   // image provider, in bug #42280 we found that this is not always
   // the case, so clients must always register interest with the new
   // redundant server.
   if (recoverInterest) {
     recoverInterest(queueConnection, isFirstNewConnection);
   }
   ```
   This comment states that the redundant queue is not always recovered through `InitialImageOperation$FilterInfoMessage`, but I haven't been able to reproduce that case. Do you see any benefit in finding that scenario and creating a TC for it, given that recovery of durable CQs is already covered by the TC without redundancy?




[jira] [Commented] (GEODE-8119) Threads are not properly closed when offline disk-store commands are invoked

2020-07-13 ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/GEODE-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156655#comment-17156655 ]

ASF GitHub Bot commented on GEODE-8119:
---

jujoramos commented on pull request #5175:
URL: https://github.com/apache/geode/pull/5175#issuecomment-657507636


   Hello @mkevo,
   
   This PR has been inactive for quite some time now. Should we close it, or are you planning to continue working on it?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Threads are not properly closed when offline disk-store commands are invoked
> 
>
> Key: GEODE-8119
> URL: https://issues.apache.org/jira/browse/GEODE-8119
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Reporter: Mario Kevo
>Assignee: Mario Kevo
>Priority: Major
>
> Threads can be started both online and offline, but they are closed only when
> online. Once an offline command has started a thread, that thread cannot be
> closed, and over time a large number of such threads can lead to an
> OutOfMemoryError.
> Another problem is that only the disk-dirs are validated, not the disk-store
> name, so a thread can be created even when no disk store with that name
> exists, and it will also hang.
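Both problems described above (threads never shut down on the offline path, and no check on the disk-store name) can be illustrated with plain `java.util.concurrent`. This is a hypothetical sketch, not the gfsh code: `OfflineDiskStoreTask` is an invented name, and it assumes that a disk store named `X` leaves an init file named `BACKUPX.if` in one of its directories, which matches Geode's usual disk-store file naming but should be confirmed against the disk-store code before relying on it.

```java
import java.io.File;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: validate both the directories and the disk-store name
// before starting any thread, and always shut the executor down afterwards.
final class OfflineDiskStoreTask {

  // Assumption: a disk store named `name` keeps its init file as BACKUP<name>.if.
  static boolean diskStoreExists(String name, File[] dirs) {
    for (File dir : dirs) {
      if (dir.isDirectory() && new File(dir, "BACKUP" + name + ".if").isFile()) {
        return true;
      }
    }
    return false;
  }

  static <T> T run(String name, File[] dirs, Callable<T> work)
      throws InterruptedException, ExecutionException {
    if (!diskStoreExists(name, dirs)) {
      // Fail fast instead of starting a thread that would hang.
      throw new IllegalArgumentException("No disk store named " + name);
    }
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      return executor.submit(work).get();
    } finally {
      // Runs on success and on failure, so no worker thread is left behind.
      executor.shutdownNow();
      executor.awaitTermination(10, TimeUnit.SECONDS);
    }
  }
}
```

The `finally` block is the design point: the offline path gets the same cleanup as the online one, whatever the command's outcome.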



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8119) Threads are not properly closed when offline disk-store commands are invoked

2020-07-13 ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/GEODE-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156659#comment-17156659 ]

ASF GitHub Bot commented on GEODE-8119:
---

mkevo commented on pull request #5175:
URL: https://github.com/apache/geode/pull/5175#issuecomment-657509799


   > Hello @mkevo,
   > 
   > This PR has been inactive for quite some time now. Should we close it, or are you planning to continue working on it?
   
   Hi @jujoramos,
   I have some other commitments at the moment; I will come back to this as soon as possible.





[jira] [Commented] (GEODE-8329) Durable CQ not registered as durable after server failover

2020-07-13 ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/GEODE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156682#comment-17156682 ]

ASF GitHub Bot commented on GEODE-8329:
---

jvarenina commented on a change in pull request #5360:
URL: https://github.com/apache/geode/pull/5360#discussion_r453607399



##
File path: geode-core/src/main/java/org/apache/geode/cache/client/internal/QueueManagerImpl.java
##
@@ -1112,7 +1112,8 @@ private void recoverCqs(Connection recoveredConnection, boolean isDurable) {
 .set(((DefaultQueryService) this.pool.getQueryService()).getUserAttributes(name));
   }
   try {
-if (((CqStateImpl) cqi.getState()).getState() != CqStateImpl.INIT) {
+if (((CqStateImpl) cqi.getState()).getState() != CqStateImpl.INIT

Review comment:
   Thanks for your comments!
   
   Regarding interest recovery, I tried the following case:
   ```
   Start three servers. Start a client with the following config:
   - redundancy set to 0
   - non-durable interests registered
   - durable id configured
   ```
   
   After I shut down the primary server, I expected the client to register/recover its interests on another running server. I tried exactly the same case with and without the code you suggested, and I noticed that some steps related to the recovery of non-durable interests are missing when using the solution you suggested (please check the logs below). Is this expected?
   
   without your code:
   ```
   [debug 2020/07/13 13:56:37.574 CEST  tid=0x5a] SubscriptionManager redundancy satisfier - Non backup server was made primary. Recovering interest jakov:26486
   [info 2020/07/13 13:56:37.574 CEST :41002 port 26486> tid=0x5b] Cache Client Updater Thread  on 192.168.1.101(12538):41002 port 26486 (jakov:26486) : ready to process messages.
   [debug 2020/07/13 13:56:37.576 CEST  tid=0x5a] org.apache.geode.cache.client.internal.QueueManagerImpl@69610f07.recoverSingleRegion starting kind=KEY region=/HAInterestBaseTest_region: {k1=KEYS, k2=KEYS}
   [debug 2020/07/13 13:56:37.576 CEST  tid=0x5a] registerInterestsStarted: new count = 1
   [debug 2020/07/13 13:56:37.578 CEST  tid=0x5a] localDestroyNoCallbacks key=k2
   [debug 2020/07/13 13:56:37.579 CEST  tid=0x5a] basicDestroyPart2: k2, version=null
   [debug 2020/07/13 13:56:37.580 CEST  tid=0x5a] VersionedThinRegionEntryHeapStringKey1@1f47aafa (key=k2; rawValue=REMOVED_PHASE1; version={v1; rv2; mbr=192.168.1.101(12538):41002; time=1594641397170};member=192.168.1.101(12538):41002) dispatching event EntryEventImpl[op=LOCAL_DESTROY;region=/HAInterestBaseTest_region;key=k2;callbackArg=null;originRemote=false;originMember=jakov(12395:loner):57906:02b10848]
   [debug 2020/07/13 13:56:37.580 CEST  tid=0x5a] localDestroyNoCallbacks key=k1
   [debug 2020/07/13 13:56:37.580 CEST  tid=0x5a] basicDestroyPart2: k1, version=null
   [debug 2020/07/13 13:56:37.580 CEST  tid=0x5a] VersionedThinRegionEntryHeapStringKey1@5ecf6c9c (key=k1; rawValue=REMOVED_PHASE1; version={v1; rv1; mbr=192.168.1.101(12538):41002; time=1594641397148};member=192.168.1.101(12538):41002) dispatching event EntryEventImpl[op=LOCAL_DESTROY;region=/HAInterestBaseTest_region;key=k1;callbackArg=null;originRemote=false;originMember=jakov(12395:loner):57906:02b10848]
   [debug 2020/07/13 13:56:37.580 CEST  tid=0x5a] org.apache.geode.cache.client.internal.QueueManagerImpl@69610f07.recoverSingleRegion :Endpoint recovered is primary so clearing the keys of interest starting kind=KEY region=/HAInterestBaseTest_region: [k1, k2]
   [debug 2020/07/13 13:56:37.584 CEST  tid=0x5a] org.apache.geode.internal.cache.LocalRegion[path='/HAInterestBaseTest_region';scope=LOCAL';dataPolicy=NORMAL; concurrencyChecksEnabled] refreshEntriesFromServerKeys count=2 policy=KEYS
 k1
 k2
   [debug 2020/07/13 13:56:37.584 CEST  tid=0x5a] refreshEntries region=/HAInterestBaseTest_region
   [debug 2020/07/13 13:56:37.585 CEST  tid=0x5a] registerInterestCompleted: new value = 0
   [debug 2020/07/13 13:56:37.585 CEST  tid=0x5a] registerInterestCompleted: Signalling end of register-interest
   [debug 2020/07/13 13:56:37.586 CEST  tid=0x5a] Primary recovery not needed
   
   ```
   
   with your code:
   ```
   [debug 2020/07/13 13:44:20.028 CEST  tid=0x5a] SubscriptionManager redundancy satisfier - Non backup server was made primary. Recovering interest jakov:28101
   [info 2020/07/13 13:44:20.028 CEST :41002 port 28101> tid=0x5b] Cache Client Updater Thread  on 192.168.1.101(11053):41002 port 28101 (jakov:28101) : ready to process messages.
   [debug 2020/07/13 13:44:20.030 CEST  tid=0x5a] Primary recovery not needed
   ```
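Comparing the two traces by eye is error-prone. A hypothetical helper (not part of Geode's test utilities) can check which recovery-step markers, copied verbatim from the "without your code" trace, are absent from a captured log. The choice of markers is an assumption; extend the list as needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: reports which expected interest-recovery steps
// do not appear anywhere in a captured client debug log.
final class RecoveryStepCheck {
  // Marker strings copied from the "without your code" trace.
  static final List<String> EXPECTED_STEPS = Arrays.asList(
      "recoverSingleRegion",
      "registerInterestsStarted",
      "refreshEntries",
      "registerInterestCompleted",
      "Primary recovery not needed");

  static List<String> missingSteps(String log) {
    List<String> missing = new ArrayList<>();
    for (String step : EXPECTED_STEPS) {
      if (!log.contains(step)) {
        missing.add(step);
      }
    }
    return missing;
  }
}
```

Run on the "with your code" trace, it reports the four interest-recovery markers as missing while "Primary recovery not needed" is still present, which is the gap described in the comment.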
   






[jira] [Commented] (GEODE-8329) Durable CQ not registered as durable after server failover

2020-07-13 ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/GEODE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156683#comment-17156683 ]

ASF GitHub Bot commented on GEODE-8329:
---

jvarenina commented on a change in pull request #5360:
URL: https://github.com/apache/geode/pull/5360#discussion_r453489804



##
File path: geode-core/src/main/java/org/apache/geode/cache/client/internal/QueueManagerImpl.java
##
@@ -1112,7 +1112,8 @@ private void recoverCqs(Connection recoveredConnection, boolean isDurable) {
 .set(((DefaultQueryService) this.pool.getQueryService()).getUserAttributes(name));
   }
   try {
-if (((CqStateImpl) cqi.getState()).getState() != CqStateImpl.INIT) {
+if (((CqStateImpl) cqi.getState()).getState() != CqStateImpl.INIT

Review comment:
   I have tested the TC with redundancy configured, and it seems that CQ recovery is done differently in this case. The primary server sends all relevant CQ information to the starting server in an `InitialImageOperation$FilterInfoMessage`. On receiving that message, the starting server registers the CQs as durable, so no problem was observed in this case.
   
   **Primary server:**
   ```
   [debug 2020/07/10 13:30:54.916 CEST :41001 shared unordered uid=1 local port=53683 remote port=45674> tid=0x57] Received message 'InitialImageOperation$RequestFilterInfoMessage(region path='/_gfe_durable_client_with_id_AppCounters_1_queue'; sender=192.168.1.102(server3:31347):41001; processorId=27)' from <192.168.1.102(server3:31347):41001>
   ```
   
   **Starting server:**
   ```
   [debug 2020/07/10 13:30:54.916 CEST  tid=0x48] Sending (InitialImageOperation$RequestFilterInfoMessage(region path='/_gfe_durable_client_with_id_AppCounters_1_queue'; sender=192.168.1.102(server3:31347):41001; processorId=27)) to 1 peers ([192.168.1.102(server1:30862):41000]) via tcp/ip
   
   [debug 2020/07/10 13:30:54.918 CEST :41000 shared unordered uid=5 local port=52175 remote port=46552> tid=0x30] Received message 'InitialImageOperation$FilterInfoMessage processorId=27 from 192.168.1.102(server1:30862):41000; NON_DURABLE allKeys=0; allKeysInv=0; keysOfInterest=0; keysOfInterestInv=0; patternsOfInterest=0; patternsOfInterestInv=0; filtersOfInterest=0; filtersOfInterestInv=0; DURABLE allKeys=0; allKeysInv=0; keysOfInterest=0; keysOfInterestInv=0; patternsOfInterest=0; patternsOfInterestInv=0; filtersOfInterest=0; filtersOfInterestInv=0; cqs=1' from <192.168.1.102(server1:30862):41000>
   
   [debug 2020/07/10 13:30:54.919 CEST  tid=0x3d] Processing FilterInfo for proxy: CacheClientProxy[identity(192.168.1.102(31226:loner):45576:8b927d38,connection=1,durableAttributes=DurableClientAttributes[id=AppCounters; timeout=200]); port=57552; primary=false; version=GEODE 1.12.0] : InitialImageOperation$FilterInfoMessage processorId=27 from 192.168.1.102(server1:30862):41000; NON_DURABLE allKeys=0; allKeysInv=0; keysOfInterest=0; keysOfInterestInv=0; patternsOfInterest=0; patternsOfInterestInv=0; filtersOfInterest=0; filtersOfInterestInv=0; DURABLE allKeys=0; allKeysInv=0; keysOfInterest=0; keysOfInterestInv=0; patternsOfInterest=0; patternsOfInterestInv=0; filtersOfInterest=0; filtersOfInterestInv=0; cqs=1
   
   [debug 2020/07/10 13:30:54.944 CEST  tid=0x3d] Server side query for the cq: randomTracker is: SELECT * FROM /example-region i where i > 70
   [debug 2020/07/10 13:30:54.944 CEST  tid=0x3d] Added CQ to the base region: /example-region With key as: randomTracker__AppCounters
   [debug 2020/07/10 13:30:54.944 CEST  tid=0x3d] Adding CQ into MatchingCQ map, CQName: randomTracker__AppCounters Number of matched querys are: 1
   [debug 2020/07/10 13:30:54.945 CEST  tid=0x3d] Adding to CQ Repository. CqName : randomTracker ServerCqName : randomTracker__AppCounters
   [debug 2020/07/10 13:30:54.945 CEST  tid=0x3d] Adding CQ randomTracker__AppCounters to this members FilterProfile.
   [debug 2020/07/10 13:30:54.945 CEST  tid=0x3d] Successfully created CQ on the server. CqName : randomTracker
   ```
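The CQ repository entries in this log pair the client-side CQ name with a server-side name: `randomTracker` becomes `randomTracker__AppCounters` for durable client id `AppCounters`. A test asserting that the CQ is registered for the durable client after failover could build the expected server-side name as sketched below. The `<cqName>__<durableClientId>` pattern is inferred from this single log sample, not from Geode's source, so treat it as an assumption; `ServerCqNames` and its methods are hypothetical.

```java
// Hypothetical helper: builds the server-side CQ name observed in the log
// ("randomTracker" + "__" + "AppCounters"). The pattern is inferred from a
// single log sample and should be checked against Geode's CQ service code.
final class ServerCqNames {
  private static final String SEPARATOR = "__";

  static String serverCqName(String clientCqName, String durableClientId) {
    if (durableClientId == null || durableClientId.isEmpty()) {
      // Non-durable clients are outside the scope of this sketch.
      throw new IllegalArgumentException("durableClientId is required");
    }
    return clientCqName + SEPARATOR + durableClientId;
  }

  // True if the server-side name carries the given durable client id suffix.
  static boolean belongsToClient(String serverCqName, String durableClientId) {
    return serverCqName.endsWith(SEPARATOR + durableClientId);
  }
}
```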
   
   I can attach the full logs if you need them. I also found the following comment in the client code:
   ```
   // Even though the new redundant queue will usually recover
   // subscription information (see bug #39014) from its initial
   // image provider, in bug #42280 we found that this is not always
   // the case, so clients must always register interest with the new
   // redundant server.
   if (recoverInterest) {
     recoverInterest(queueConnection, isFirstNewConnection);
   }
   ```
   This comment states that the redundant queue is not always recovered through `InitialImageOperation$FilterInfoMessage`, but I haven't been able to reproduce that case. Do you see any benefit in finding that scenario and creating a TC for it, given that recovery of durable CQs is already covered by the TC without redundancy?





[jira] [Commented] (GEODE-8351) DUnit tests for Delta Propagation

2020-07-13 ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/GEODE-8351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156768#comment-17156768 ]

ASF GitHub Bot commented on GEODE-8351:
---

sabbeyPivotal commented on a change in pull request #5364:
URL: https://github.com/apache/geode/pull/5364#discussion_r453711863



##
File path: geode-redis/src/distributedTest/java/org/apache/geode/redis/internal/data/DeltaDUnitTest.java
##
@@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements. See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License. You may obtain a
+ * copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.geode.redis.internal.data;
+
+import static org.apache.geode.distributed.ConfigurationProperties.MAX_WAIT_TIME_RECONNECT;
+import static org.assertj.core.api.Assertions.assertThat;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.Set;
+
+import org.junit.AfterClass;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Test;
+import redis.clients.jedis.Jedis;
+
+import org.apache.geode.cache.Region;
+import org.apache.geode.cache.partition.PartitionRegionHelper;
+import org.apache.geode.internal.cache.InternalCache;
+import org.apache.geode.test.awaitility.GeodeAwaitility;
+import org.apache.geode.test.dunit.rules.ClusterStartupRule;
+import org.apache.geode.test.dunit.rules.MemberVM;
+import org.apache.geode.test.dunit.rules.RedisClusterStartupRule;
+
+public class DeltaDUnitTest {
+
+  @ClassRule
+  public static RedisClusterStartupRule clusterStartUp = new RedisClusterStartupRule(4);
+
+  private static final String LOCAL_HOST = "127.0.0.1";
+  private static final int SET_SIZE = 10;
+  private static final int JEDIS_TIMEOUT =
+      Math.toIntExact(GeodeAwaitility.getTimeout().toMillis());
+  private static Jedis jedis1;
+  private static Jedis jedis2;
+
+  private static Properties locatorProperties;
+
+  private static MemberVM locator;
+  private static MemberVM server1;
+  private static MemberVM server2;
+
+  private static int redisServerPort1;
+  private static int redisServerPort2;
+
+  @BeforeClass
+  public static void classSetup() {
+    locatorProperties = new Properties();
+    locatorProperties.setProperty(MAX_WAIT_TIME_RECONNECT, "15000");
+
+    locator = clusterStartUp.startLocatorVM(0, locatorProperties);
+    server1 = clusterStartUp.startRedisVM(1, locator.getPort());
+    server2 = clusterStartUp.startRedisVM(2, locator.getPort());
+
+    redisServerPort1 = clusterStartUp.getRedisPort(1);
+    redisServerPort2 = clusterStartUp.getRedisPort(2);
+
+    jedis1 = new Jedis(LOCAL_HOST, redisServerPort1, JEDIS_TIMEOUT);
+    jedis2 = new Jedis(LOCAL_HOST, redisServerPort2, JEDIS_TIMEOUT);
+  }
+
+  @Before
+  public void testSetup() {
+    jedis1.flushAll();
+  }
+
+  @AfterClass
+  public static void tearDown() {
+    jedis1.disconnect();
+    jedis2.disconnect();
+
+    server1.stop();
+    server2.stop();
+  }
+
+  @Test
+  public void shouldCorrectlyPropagateDeltaToSecondaryServer_whenAppending() {
+    String key = "key";
+    String baseValue = "value-";
+    jedis1.set(key, baseValue);
+    for (int i = 0; i < SET_SIZE; i++) {
+      jedis1.set(key, String.valueOf(i));

Review comment:
   Yes, thank you!







> DUnit tests for Delta Propagation
> -
>
> Key: GEODE-8351
> URL: https://issues.apache.org/jira/browse/GEODE-8351
> Project: Geode
>  Issue Type: Test
>  Components: redis, tests
>Reporter: Sarah Abbey
>Priority: Major
>
> Need to confirm that when deltas are propagated, the data is correctly stored 
> on the secondary
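The point of these tests is that, with delta propagation, the primary ships only the change (for an append, just the appended bytes) and the secondary applies it to its own copy, so the copies have to be compared after every update. The toy model below shows that check in plain Java. It is loosely shaped like Geode's `org.apache.geode.Delta` callbacks (`hasDelta`, `toDelta`, `fromDelta`) but is not that API: `AppendableValue` is an invented name, and its `toDelta` returns a `byte[]` instead of writing to a `DataOutput`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Toy model of append-delta propagation between a primary and a secondary copy.
final class AppendableValue {
  private final StringBuilder value;
  private String pendingAppend = ""; // change not yet shipped to the secondary

  AppendableValue(String initial) {
    value = new StringBuilder(initial);
  }

  // Primary side: apply the append locally and remember it as the pending delta.
  void append(String suffix) {
    value.append(suffix);
    pendingAppend = pendingAppend + suffix;
  }

  boolean hasDelta() {
    return !pendingAppend.isEmpty();
  }

  // Serialize only the appended text, not the whole value, then clear it.
  byte[] toDelta() throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeUTF(pendingAppend);
    }
    pendingAppend = "";
    return bytes.toByteArray();
  }

  // Secondary side: apply a received delta to the local copy.
  void fromDelta(byte[] delta) throws IOException {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(delta))) {
      value.append(in.readUTF());
    }
  }

  @Override
  public String toString() {
    return value.toString();
  }
}
```

A test in the spirit of `shouldCorrectlyPropagateDeltaToSecondaryServer_whenAppending` appends on the primary, ships the delta, and asserts after each step that both copies' `toString()` values are equal.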





[jira] [Commented] (GEODE-8351) DUnit tests for Delta Propagation

2020-07-13 ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/GEODE-8351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156769#comment-17156769 ]

ASF GitHub Bot commented on GEODE-8351:
---

sabbeyPivotal commented on a change in pull request #5364:
URL: https://github.com/apache/geode/pull/5364#discussion_r453712504



##
File path: geode-redis/src/distributedTest/java/org/apache/geode/redis/internal/data/DeltaDUnitTest.java
##
@@ -0,0 +1,339 @@
+  @Test
+  public void shouldCorrectlyPropagateDeltaToSecondaryServer_whenAppending() {
+    String key = "key";
+    String baseValue = "value-";
+    jedis1.set(key, baseValue);
+    for (int i = 0; i < SET_SIZE; i++) {

Review comment:
   Thank you, meant to change that.







[jira] [Commented] (GEODE-8351) DUnit tests for Delta Propagation

2020-07-13 ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/GEODE-8351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156775#comment-17156775 ]

ASF GitHub Bot commented on GEODE-8351:
---

sabbeyPivotal commented on a change in pull request #5364:
URL: https://github.com/apache/geode/pull/5364#discussion_r453718436



##
File path: geode-redis/src/distributedTest/java/org/apache/geode/redis/internal/data/DeltaDUnitTest.java
##
@@ -0,0 +1,339 @@
+  @Test
+  public void shouldCorrectlyPropagateDeltaToSecondaryServer_whenAppending() {
+    String key = "key";
+    String baseValue = "value-";
+    jedis1.set(key, baseValue);
+    for (int i = 0; i < SET_SIZE; i++) {
+      jedis1.set(key, String.valueOf(i));
+
+      String server1LocalValue = server1.invoke(() -> {
+        InternalCache cache = ClusterStartupRule.getCache();
+        Region region = cache.getRegion("__REDIS_DATA");
+        Region localRegion =
+            PartitionRegionHelper.getLocalData(region);
+
+        RedisData localValue = localRegion.get(new ByteArrayWrapper(key.getBytes()));
+        return localValue.toString();

Review comment:
   good call, updating it!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> DUnit tests for Delta Propagation
> -
>
> Key: GEODE-8351
> URL: https://

[jira] [Created] (GEODE-8354) CI failure: DescribeClientCommandDUnitTest > describeClientWithoutSubscription FAILED

2020-07-13 Thread Owen Nichols (Jira)
Owen Nichols created GEODE-8354:
---

 Summary: CI failure: DescribeClientCommandDUnitTest > describeClientWithoutSubscription FAILED
 Key: GEODE-8354
 URL: https://issues.apache.org/jira/browse/GEODE-8354
 Project: Geode
  Issue Type: Bug
  Components: management
Reporter: Owen Nichols


java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.geode.management.internal.cli.functions.ContinuousQueryFunction$ClientInfo
{noformat}
> Task :geode-cq:distributedTest

org.apache.geode.management.internal.cli.commands.DescribeClientCommandDUnitTest > describeClientWithoutSubscription FAILED
java.lang.AssertionError: Suspicious strings were written to the log during this run.
Fix the strings or use IgnoredException.addIgnoredException to ignore.
---
Found suspect string in log4j at line 2887

[error 2020/07/10 23:22:52.403 GMT  tid=110] Could not execute "describe client --clientID=10.0.0.97(11700:loner):54634:b3e7093b".
java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.geode.management.internal.cli.functions.ContinuousQueryFunction$ClientInfo
  at org.apache.geode.management.internal.cli.commands.DescribeClientCommand.describeClient(DescribeClientCommand.java:123)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.springframework.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:282)
  at org.apache.geode.management.internal.cli.remote.CommandExecutor.callInvokeMethod(CommandExecutor.java:151)
  at org.apache.geode.management.internal.cli.remote.CommandExecutor.invokeCommand(CommandExecutor.java:161)
  at org.apache.geode.management.internal.cli.remote.CommandExecutor.execute(CommandExecutor.java:88)
  at org.apache.geode.management.internal.cli.remote.CommandExecutor.execute(CommandExecutor.java:71)
  at org.apache.geode.management.internal.cli.remote.OnlineCommandProcessor.executeCommand(OnlineCommandProcessor.java:130)
  at org.apache.geode.management.internal.cli.remote.OnlineCommandProcessor.executeCommandReturningJson(OnlineCommandProcessor.java:136)
  at org.apache.geode.management.internal.beans.MemberMBeanBridge.processCommand(MemberMBeanBridge.java:1237)
  at org.apache.geode.management.internal.beans.MemberMBean.processCommand(MemberMBean.java:424)
{noformat}
 

seen in [WindowsGfshDistributedTestOpenJDK8 #335|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/WindowsGfshDistributedTestOpenJDK8/builds/335#A]

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=  Test Results URI =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
[*http://files.apachegeode-ci.info/builds/apache-develop-main/1.14.0-build.0217/test-results/distributedTest/1594428192/*]
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Test report artifacts from this job are available at:

[*http://files.apachegeode-ci.info/builds/apache-develop-main/1.14.0-build.0217/test-artifacts/1594428192/windows-gfshdistributedtest-OpenJDK8-1.14.0-build.0217.tgz*]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8340) Enforce Switch compiler warnings as errors

2020-07-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156820#comment-17156820
 ] 

ASF GitHub Bot commented on GEODE-8340:
---

pdxcodemonkey commented on a change in pull request #625:
URL: https://github.com/apache/geode-native/pull/625#discussion_r453764965



##
File path: cppcache/src/ExceptionTypes.cpp
##
@@ -297,7 +297,25 @@ const std::string& getThreadLocalExceptionMessage();
   PutAllPartialResultException ex(message);
   throw ex;
 }
-default: {
+case GF_NOERR:

Review comment:
   Same as above - adding a zillion case statements doesn't make code clearer.

##
File path: cppcache/include/geode/DataInput.hpp
##
@@ -313,7 +313,53 @@ class APACHE_GEODE_EXPORT DataInput {
 // empty string
 break;
   // TODO: What's the right response here?
-  default:
+  case internal::DSCode::FixedIDDefault:

Review comment:
   Wow, I really, really don't like this.  Is there anything we can do to fix the warning _besides_ adding ~65 case statements?  This makes the code less readable, not more, IMO.
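For reference, one way to keep a `default:` label without enumerating every value is to suppress -Wswitch-enum for just that function with a push/pop diagnostic pragma. The warning then stays enforced everywhere else. This is a sketch under assumptions: `TypeCode` and `describeTypeCode` are hypothetical stand-ins for `internal::DSCode` and the `DataInput` switch. GCC and Clang both accept the `#pragma GCC diagnostic` spelling, whereas MSVC would need its own `#pragma warning` form.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a large enum such as internal::DSCode.
enum class TypeCode : int8_t { Null, Int, Long, String };

// Silence -Wswitch-enum for this one function only. "push" saves the
// current diagnostic state and "pop" restores it afterwards.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wswitch-enum"
std::string describeTypeCode(TypeCode code) {
  switch (code) {
    case TypeCode::Null:
      return "null";
    case TypeCode::String:
      return "string";
    default:  // deliberately groups every remaining enumerator
      return "other";
  }
}
#pragma GCC diagnostic pop
```

The trade-off is that any value added to the enum later falls into `default:` without producing a diagnostic.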

##
File path: cppcache/src/TcrMessage.cpp
##
@@ -1015,6 +1015,36 @@ void TcrMessage::processChunk(const std::vector& chunk, int32_t len,
 break;
   }
   // fall-through for other cases
+  if (m_chunkedResult != nullptr) {

Review comment:
   What's this block of code doing?  I don't immediately see a place where it was removed/moved, so is it new?







> Enforce Switch compiler warnings as errors
> --
>
> Key: GEODE-8340
> URL: https://issues.apache.org/jira/browse/GEODE-8340
> Project: Geode
>  Issue Type: Improvement
>  Components: native client
>Reporter: Michael Oleske
>Priority: Major
>
> Given I compile the code without the -Wno-switch-enum, -Wno-implicit-fallthrough, and -Wno-covered-switch-default exemptions
> Then it should compile
> Note: this was marked as a TODO; it seems reasonable to tackle all of these at once.





[jira] [Commented] (GEODE-8326) CI Failure: FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled times out waiting for client metadata

2020-07-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156821#comment-17156821
 ] 

ASF GitHub Bot commented on GEODE-8326:
---

pivotal-eshu merged pull request #5358:
URL: https://github.com/apache/geode/pull/5358


   





> CI Failure: FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled times out waiting for client metadata
> ---
>
> Key: GEODE-8326
> URL: https://issues.apache.org/jira/browse/GEODE-8326
> Project: Geode
>  Issue Type: Bug
>  Components: client/server, tests
>Affects Versions: 1.13.0
>Reporter: Kirk Lund
>Assignee: Eric Shu
>Priority: Major
>  Labels: caching-applications
>
> CI Failure: http://files.apachegeode-ci.info/builds/apache-support-1-13-main/1.13.0-build.0296/test-results/distributedTest/1592846714/
> {noformat}
> org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest > clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled[ExecuteFunctionByObject] FAILED
> org.awaitility.core.ConditionTimeoutException: Condition with lambda expression in org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest that uses org.apache.geode.cache.client.internal.ClientMetadataService was not fulfilled within 5 minutes.
>   at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:165)
>   at org.awaitility.core.CallableCondition.await(CallableCondition.java:78)
>   at org.awaitility.core.CallableCondition.await(CallableCondition.java:26)
>   at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:895)
>   at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:864)
>   at org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest.forceClientMetadataUpdate(FixedPartitioningWithTransactionDistributedTest.java:241)
>   at org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest.doFunctionTransactionAndSuspend(FixedPartitioningWithTransactionDistributedTest.java:458)
>   at org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled(FixedPartitioningWithTransactionDistributedTest.java:254)
> {noformat}
> The failure occurs after waiting 5 minutes for the ClientMetadataService to 
> stabilize. See ClientMetadataService#isMetadataStable.
> The timeout occurs within a block of test code that was introduced by Jake in 
> PR #3840:
> {noformat}
> GEODE-7006: Fixes function execution by id with transactions. (#3840)  
> * Fixes test to force and wait for PR metadata to update.
> {noformat}





[jira] [Commented] (GEODE-8326) CI Failure: FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled times out waiting for client metadata

2020-07-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156823#comment-17156823
 ] 

ASF subversion and git services commented on GEODE-8326:


Commit 9cd8e7d82c90aed804c39bd0fadd31c5d2eac18c in geode's branch 
refs/heads/develop from Eric Shu
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=9cd8e7d ]

GEODE-8326: remove 5 minutes wait to get stack dump (#5358)





[jira] [Commented] (GEODE-8342) Remove non-inclusive language

2020-07-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156825#comment-17156825
 ] 

ASF subversion and git services commented on GEODE-8342:


Commit 729185236e66377e5b367d40e43c8654314c60ed in geode-native's branch 
refs/heads/develop from Jacob Barrett
[ https://gitbox.apache.org/repos/asf?p=geode-native.git;h=7291852 ]

GEODE-8342: Replace non-inclusive language. (#626)

* 'blacklist' wasn't effectively in use anyway, so just remove it.

> Remove non-inclusive language
> -
>
> Key: GEODE-8342
> URL: https://issues.apache.org/jira/browse/GEODE-8342
> Project: Geode
>  Issue Type: Improvement
>  Components: native client
>Reporter: Jacob Barrett
>Priority: Major
>
> Geode native includes some non-inclusive language that should be replaced.





[jira] [Commented] (GEODE-8351) DUnit tests for Delta Propagation

2020-07-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156824#comment-17156824
 ] 

ASF GitHub Bot commented on GEODE-8351:
---

sabbeyPivotal commented on a change in pull request #5364:
URL: https://github.com/apache/geode/pull/5364#discussion_r453770437



##
File path: 
geode-redis/src/distributedTest/java/org/apache/geode/redis/internal/data/DeltaDUnitTest.java
##
@@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements. See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License. You may obtain a
+ * copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.geode.redis.internal.data;
+
+import static org.apache.geode.distributed.ConfigurationProperties.MAX_WAIT_TIME_RECONNECT;
+import static org.assertj.core.api.Assertions.assertThat;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.Set;
+
+import org.junit.AfterClass;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Test;
+import redis.clients.jedis.Jedis;
+
+import org.apache.geode.cache.Region;
+import org.apache.geode.cache.partition.PartitionRegionHelper;
+import org.apache.geode.internal.cache.InternalCache;
+import org.apache.geode.test.awaitility.GeodeAwaitility;
+import org.apache.geode.test.dunit.rules.ClusterStartupRule;
+import org.apache.geode.test.dunit.rules.MemberVM;
+import org.apache.geode.test.dunit.rules.RedisClusterStartupRule;
+
+public class DeltaDUnitTest {
+
+  @ClassRule
+  public static RedisClusterStartupRule clusterStartUp = new RedisClusterStartupRule(4);
+
+  private static final String LOCAL_HOST = "127.0.0.1";
+  private static final int SET_SIZE = 10;
+  private static final int JEDIS_TIMEOUT =
+      Math.toIntExact(GeodeAwaitility.getTimeout().toMillis());
+  private static Jedis jedis1;
+  private static Jedis jedis2;
+
+  private static Properties locatorProperties;
+
+  private static MemberVM locator;
+  private static MemberVM server1;
+  private static MemberVM server2;
+
+  private static int redisServerPort1;
+  private static int redisServerPort2;
+
+  @BeforeClass
+  public static void classSetup() {
+    locatorProperties = new Properties();
+    locatorProperties.setProperty(MAX_WAIT_TIME_RECONNECT, "15000");
+
+    locator = clusterStartUp.startLocatorVM(0, locatorProperties);
+    server1 = clusterStartUp.startRedisVM(1, locator.getPort());
+    server2 = clusterStartUp.startRedisVM(2, locator.getPort());
+
+    redisServerPort1 = clusterStartUp.getRedisPort(1);
+    redisServerPort2 = clusterStartUp.getRedisPort(2);
+
+    jedis1 = new Jedis(LOCAL_HOST, redisServerPort1, JEDIS_TIMEOUT);
+    jedis2 = new Jedis(LOCAL_HOST, redisServerPort2, JEDIS_TIMEOUT);
+  }
+
+  @Before
+  public void testSetup() {
+    jedis1.flushAll();
+  }
+
+  @AfterClass
+  public static void tearDown() {
+    jedis1.disconnect();
+    jedis2.disconnect();
+
+    server1.stop();
+    server2.stop();
+  }
+
+  @Test
+  public void shouldCorrectlyPropagateDeltaToSecondaryServer_whenAppending() {
+    String key = "key";
+    String baseValue = "value-";
+    jedis1.set(key, baseValue);
+    for (int i = 0; i < SET_SIZE; i++) {
+      jedis1.set(key, String.valueOf(i));
+
+      String server1LocalValue = server1.invoke(() -> {

Review comment:
   That is true. I made a generic method for getting the local region and getting the correct data.  I'm not sure how much further we could go without sacrificing readability?







> DUnit tests for Delta Propagation
> -
>
> Key: GEODE-8351
> URL: https://issues.apache.org/jira/browse/GEODE-8351
> Project: Geode
>  Issue Type: Test
>  Components: redis, tests
>Reporter: Sarah Abbey
>Prio

[jira] [Commented] (GEODE-8340) Enforce Switch compiler warnings as errors

2020-07-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156822#comment-17156822
 ] 

ASF GitHub Bot commented on GEODE-8340:
---

moleske commented on a change in pull request #625:
URL: https://github.com/apache/geode-native/pull/625#discussion_r453770010



##
File path: cppcache/src/TcrMessage.cpp
##
@@ -1015,6 +1015,36 @@ void TcrMessage::processChunk(const std::vector& chunk, int32_t len,
 break;
   }
   // fall-through for other cases
+  if (m_chunkedResult != nullptr) {

Review comment:
   The previous code fell through to the next case if it didn't reach the `break;` statement on line 999 or 1015 (which is buried in an `if` and `else if`).  This is the same code as on line 1054.  I thought about extracting a function but got lazy.
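The two ways of satisfying -Wimplicit-fallthrough mentioned here can be sketched as follows: mark the fall-through explicitly, or extract the shared block into a function rather than duplicating it. This is an illustration under assumptions. `ChunkKind`, `processChunk`, and `handleChunkedResult` are hypothetical simplifications, not the real `TcrMessage::processChunk` logic.

```cpp
#include <cassert>
#include <string>

// Hypothetical chunk kinds; stand-ins, not TcrMessage's real types.
enum class ChunkKind { Partial, Last, Other };

// The block that a case previously reached by implicitly falling into the
// next case label, extracted so that each case can call it directly.
std::string handleChunkedResult(bool hasChunkedResult) {
  return hasChunkedResult ? "chunked" : "unhandled";
}

std::string processChunk(ChunkKind kind, bool earlyExit,
                         bool hasChunkedResult) {
  switch (kind) {
    case ChunkKind::Partial:
      if (earlyExit) {
        return "partial";
      }
      // Formerly an implicit fall-through into ChunkKind::Other. A C++17
      // [[fallthrough]]; would also satisfy -Wimplicit-fallthrough;
      // calling the helper avoids both the fall-through and a copied block.
      return handleChunkedResult(hasChunkedResult);
    case ChunkKind::Last:
      return "last";
    case ChunkKind::Other:
      return handleChunkedResult(hasChunkedResult);
  }
  return "invalid";  // only reachable for out-of-range enum values
}
```

Each case now ends in a `return`, so no case falls through and no `default:` is needed.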









[jira] [Commented] (GEODE-8342) Remove non-inclusive language

2020-07-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156826#comment-17156826
 ] 

ASF GitHub Bot commented on GEODE-8342:
---

pdxcodemonkey merged pull request #626:
URL: https://github.com/apache/geode-native/pull/626


   







[jira] [Commented] (GEODE-8348) CI does not build benchmarks image

2020-07-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156829#comment-17156829
 ] 

ASF GitHub Bot commented on GEODE-8348:
---

smgoller opened a new pull request #131:
URL: https://github.com/apache/geode-benchmarks/pull/131


   Add the ability to choose which purpose tag is searched for when launching benchmarks.





> CI does not build benchmarks image
> --
>
> Key: GEODE-8348
> URL: https://issues.apache.org/jira/browse/GEODE-8348
> Project: Geode
>  Issue Type: Bug
>  Components: ci
>Reporter: Sean Goller
>Priority: Major
>
> The CI infrastructure relies on the existence of a Google Compute Engine image in
> order to function. Currently, that image is not built anywhere in CI.





[jira] [Commented] (GEODE-8340) Enforce Switch compiler warnings as errors

2020-07-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156831#comment-17156831
 ] 

ASF GitHub Bot commented on GEODE-8340:
---

moleske commented on a change in pull request #625:
URL: https://github.com/apache/geode-native/pull/625#discussion_r453775780



##
File path: cppcache/include/geode/DataInput.hpp
##
@@ -313,7 +313,53 @@ class APACHE_GEODE_EXPORT DataInput {
 // empty string
 break;
   // TODO: What's the right response here?
-  default:
+  case internal::DSCode::FixedIDDefault:

Review comment:
   I actually like it since it forces you to acknowledge all possible values of an enum (which is enumerable).  That way you don't accidentally go to the default when adding a new enumerator.  If the general consensus is that we prefer not to change `-Wno-switch-enum`, then we can revert that commit.  I could make it worse by adding `break;` statements in between all the cases :-p
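Here is a minimal sketch of the pattern argued for here: list every enumerator, omit `default:`, and handle unexpected raw values after the switch. It assumes a small hypothetical `WireCode` enum rather than the real `DSCode`; the names and values are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wire-format codes standing in for an enum such as
// internal::DSCode; names and values are illustrative only.
enum class WireCode : int8_t { Null = 0, Int = 1, String = 2 };

// Every enumerator has its own case and there is no default label, so
// neither -Wswitch-enum nor Clang's -Wcovered-switch-default fires. If an
// enumerator is added later, -Wswitch (part of -Wall) flags this switch as
// incomplete instead of the new value silently taking a default branch.
int fixedPayloadSize(WireCode code) {
  switch (code) {
    case WireCode::Null:
      return 0;
    case WireCode::Int:
      return 4;
    case WireCode::String:
      return -1;  // variable length, decoded elsewhere
  }
  // Reached only when code holds a value that is not a named enumerator,
  // e.g. a byte read off the wire and cast without validation.
  return -2;
}
```

Because the enum has a fixed underlying type, casting an arbitrary `int8_t` to it is well defined, which is why the fallback after the switch is still worth keeping.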









[jira] [Created] (GEODE-8355) long-running-test details need to be world-readable

2020-07-13 Thread Robert Houghton (Jira)
Robert Houghton created GEODE-8355:
--

 Summary: long-running-test details need to be world-readable
 Key: GEODE-8355
 URL: https://issues.apache.org/jira/browse/GEODE-8355
 Project: Geode
  Issue Type: Improvement
  Components: ci
Reporter: Robert Houghton








[jira] [Updated] (GEODE-8356) Gfsh export logs to support capturing thread dumps in the logs

2020-07-13 Thread Anilkumar Gingade (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anilkumar Gingade updated GEODE-8356:
-
Labels: GeodeOperationAPI  (was: )

> Gfsh export logs to support capturing thread dumps in the logs
> --
>
> Key: GEODE-8356
> URL: https://issues.apache.org/jira/browse/GEODE-8356
> Project: Geode
>  Issue Type: Bug
>  Components: gfsh
>Reporter: Anilkumar Gingade
>Priority: Major
>  Labels: GeodeOperationAPI
>
> With an option to say "--with-thread-dump"





[jira] [Updated] (GEODE-8356) Gfsh export logs to support capturing thread dumps in the logs

2020-07-13 Thread Anilkumar Gingade (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anilkumar Gingade updated GEODE-8356:
-
Issue Type: Improvement  (was: Bug)

> Gfsh export logs to support capturing thread dumps in the logs
> --
>
> Key: GEODE-8356
> URL: https://issues.apache.org/jira/browse/GEODE-8356
> Project: Geode
>  Issue Type: Improvement
>  Components: gfsh
>Reporter: Anilkumar Gingade
>Priority: Major
>  Labels: GeodeOperationAPI
>
> With an option to say "--with-thread-dump"





[jira] [Created] (GEODE-8356) Gfsh export logs to support capturing thread dumps in the logs

2020-07-13 Thread Anilkumar Gingade (Jira)
Anilkumar Gingade created GEODE-8356:


 Summary: Gfsh export logs to support capturing thread dumps in the logs
 Key: GEODE-8356
 URL: https://issues.apache.org/jira/browse/GEODE-8356
 Project: Geode
  Issue Type: Bug
  Components: gfsh
Reporter: Anilkumar Gingade


With an option to say "--with-thread-dump"





[jira] [Commented] (GEODE-8355) long-running-test details need to be world-readable

2020-07-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156834#comment-17156834
 ] 

ASF GitHub Bot commented on GEODE-8355:
---

rhoughton-pivot opened a new pull request #5366:
URL: https://github.com/apache/geode/pull/5366


   Authored-by: Robert Houghton 
   
   Thank you for submitting a contribution to Apache Geode.
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   ### For all changes:
   - [X] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
   
   - [X] Has your PR been rebased against the latest commit within the target branch (typically `develop`)?
   
   - [X] Is your initial contribution a single, squashed commit?
   
   - [X] Does `gradlew build` run cleanly?
   
   - [n/a] Have you written or updated unit tests to verify your changes?
   
   - [n/a] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   
   ### Note:
   Please ensure that once the PR is submitted, check Concourse for build issues and
   submit an update to your PR as soon as possible. If you need help, please send an
   email to d...@geode.apache.org.
   





> long-running-test details need to be world-readable
> ---
>
> Key: GEODE-8355
> URL: https://issues.apache.org/jira/browse/GEODE-8355
> Project: Geode
>  Issue Type: Improvement
>  Components: ci
>Reporter: Robert Houghton
>Assignee: Robert Houghton
>Priority: Major
>






[jira] [Assigned] (GEODE-8355) long-running-test details need to be world-readable

2020-07-13 Thread Robert Houghton (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Houghton reassigned GEODE-8355:
--

Assignee: Robert Houghton



[jira] [Commented] (GEODE-8348) CI does not build benchmarks image

2020-07-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156837#comment-17156837
 ] 

ASF GitHub Bot commented on GEODE-8348:
---

smgoller opened a new pull request #5367:
URL: https://github.com/apache/geode/pull/5367


   * Add EC2 builder job to images.
   * Benchmarks job uses branch-specific image.
   * Change benchmarks source repository location to the deployed fork's repo instead of forcing apache.
   * Download our own copy of fly to use when deploying pipelines via deploy_meta.sh.
   
   Thank you for submitting a contribution to Apache Geode.
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   ### For all changes:
   - [ ] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
   
   - [ ] Has your PR been rebased against the latest commit within the target branch (typically `develop`)?
   
   - [ ] Is your initial contribution a single, squashed commit?
   
   - [ ] Does `gradlew build` run cleanly?
   
   - [ ] Have you written or updated unit tests to verify your changes?
   
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   
   ### Note:
   Please ensure that once the PR is submitted, check Concourse for build issues and
   submit an update to your PR as soon as possible. If you need help, please send an
   email to d...@geode.apache.org.
   





> CI does not build benchmarks image
> --
>
> Key: GEODE-8348
> URL: https://issues.apache.org/jira/browse/GEODE-8348
> Project: Geode
>  Issue Type: Bug
>  Components: ci
>Reporter: Sean Goller
>Priority: Major
>
> The CI infrastructure relies on the existence of a Google Compute image in 
> order to function. Currently that image is not built anywhere in CI. 





[jira] [Updated] (GEODE-8350) Official Docker Image Needs ENTRYPOINT

2020-07-13 Thread Bill Burcham (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Burcham updated GEODE-8350:

Labels: starter  (was: )

> Official Docker Image Needs ENTRYPOINT
> -
>
> Key: GEODE-8350
> URL: https://issues.apache.org/jira/browse/GEODE-8350
> Project: Geode
>  Issue Type: Bug
>Reporter: Bill Burcham
>Priority: Major
>  Labels: starter
>
> The official Docker image defines a {{CMD ["gfsh"]}} but no {{ENTRYPOINT}}. 
> As a result, it's easy to run {{gfsh}} interactively:
> {noformat}
> docker run -it apachegeode/geode
> {noformat}
> but to run a non-interactive {{gfsh}} command/script takes extra effort:
> {noformat}
> docker run --entrypoint gfsh apachegeode/geode -e version
> {noformat}
> When this story is complete, the official Docker image will define an 
> {{ENTRYPOINT ["gfsh"]}} that will allow execution of a non-interactive 
> script like:
> {noformat}
> docker run apachegeode/geode -e version
> {noformat}
> As before, it will be possible to enter an interactive {{gfsh}} session via:
> {noformat}
> docker run -it apachegeode/geode
> {noformat}
> Note, the Dockerfile probably won't need to define any {{CMD}} at all when 
> this story is complete.





[jira] [Created] (GEODE-8357) Exhausting the high priority message pool can result in deadlock

2020-07-13 Thread Kirk Lund (Jira)
Kirk Lund created GEODE-8357:


 Summary: Exhausting the high priority message pool can result in 
deadlock
 Key: GEODE-8357
 URL: https://issues.apache.org/jira/browse/GEODE-8357
 Project: Geode
  Issue Type: Bug
  Components: messaging
Reporter: Kirk Lund


The system property "DistributionManager.MAX_THREADS" defaults to 100:
{noformat}
int MAX_THREADS = Integer.getInteger("DistributionManager.MAX_THREADS", 100);
{noformat}
The system property used to be defined in geode-core ClusterDistributionManager 
and has moved to geode-core OperationExecutors.
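As the snippet shows, `Integer.getInteger` falls back to the default whenever the property is unset or does not parse as an integer. The limit is therefore raised by setting the property before the value is read, typically on the server's JVM command line (for example `-DDistributionManager.MAX_THREADS=200`; this assumes the property name above is the one your Geode version reads). A minimal sketch of the lookup, with a hypothetical holder class:

```java
// Hypothetical holder mirroring the lookup shown above (not a Geode class).
final class MaxThreadsProperty {
  static final String PROPERTY = "DistributionManager.MAX_THREADS";
  static final int DEFAULT_MAX_THREADS = 100;

  private MaxThreadsProperty() {}

  // Integer.getInteger returns the default when the property is missing or
  // is not a valid integer; it does not propagate NumberFormatException.
  static int resolve() {
    return Integer.getInteger(PROPERTY, DEFAULT_MAX_THREADS);
  }
}
```

Raising the limit changes how many threads can block before a pool is exhausted. By itself, it does not remove the dependency described below.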

The value is used to limit ClusterOperationExecutors threadPool and 
highPriorityPool:
{noformat}
threadPool =
CoreLoggingExecutors.newThreadPoolWithFeedStatistics("Pooled Message Processor ",
thread -> stats.incProcessingThreadStarts(), this::doProcessingThread,
MAX_THREADS, stats.getNormalPoolHelper(), threadMonitor,
INCOMING_QUEUE_LIMIT, stats.getOverflowQueueHelper());

highPriorityPool = CoreLoggingExecutors.newThreadPoolWithFeedStatistics(
"Pooled High Priority Message Processor ",
thread -> stats.incHighPriorityThreadStarts(), this::doHighPriorityThread,
MAX_THREADS, stats.getHighPriorityPoolHelper(), threadMonitor,
INCOMING_QUEUE_LIMIT, stats.getHighPriorityQueueHelper());
{noformat}
I have seen server startup hang when recovering lots of expired entries from 
disk while using PDX. The hang looks like a dlock request for the PDX lock that 
is not receiving a response. The distributionStats#highPriorityQueueSize 
statistic (in VSD) is maxed out and never drops.

The dlock response granting the PDX lock is stuck in the highPriorityQueue 
because there are no more highPriorityQueue threads available to process the 
response. The stack dumps of all highPriorityQueue threads show tasks, such as 
recovering buckets from disk, blocked waiting for the PDX lock.
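The failure mode can be reproduced in miniature with a plain `java.util.concurrent` fixed-size pool. This is a hypothetical model, not Geode's executor. Each worker blocks on a latch that stands in for the PDX dlock grant, and the task that would release the latch (the "dlock response") is queued on the same pool. Once every thread is blocked, the response can never run:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the hang described above, not Geode code.
final class PoolExhaustionDemo {
  private PoolExhaustionDemo() {}

  // Returns true if the "response" task ran within the timeout, false if it
  // stayed queued behind workers that are all blocked waiting for it.
  static boolean responseDelivered(int poolSize, int blockedTasks, long timeoutMillis) {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    CountDownLatch lockGranted = new CountDownLatch(1); // stands in for the PDX dlock grant
    CountDownLatch running = new CountDownLatch(Math.min(poolSize, blockedTasks));
    try {
      for (int i = 0; i < blockedTasks; i++) {
        pool.execute(() -> {
          running.countDown();
          try {
            lockGranted.await(); // like bucket recovery waiting for the PDX lock
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      running.await(); // make sure the blocked tasks occupy their threads first
      pool.execute(lockGranted::countDown); // the "dlock response", queued on the same pool
      return lockGranted.await(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      pool.shutdownNow(); // interrupt the stuck workers so the demo can exit
    }
  }
}
```

With `responseDelivered(2, 2, 200)`, both threads are blocked and the call returns `false` after the timeout. With one spare thread, `responseDelivered(3, 2, 5000)` returns `true`.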

Several changes could improve this situation, either in conjunction or 
separately:
# improve observability so that support can identify when this situation has occurred
# automatically identify this situation and warn the user with a log statement
# automatically prevent this situation
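For option 2, here is one possible shape for a detector. It is sketched against a standard `ThreadPoolExecutor` rather than Geode's `CoreLoggingExecutors` pools, whose internals this ticket does not show. It reports a pool as saturated when every thread is busy and work is still queued, which matches the highPriorityQueueSize symptom described above. `PoolSaturationCheck` is hypothetical and logs through `java.util.logging` only to stay self-contained. A busy pool with a short queue is normal under load, so a real check would probably require the condition to persist across several samples before warning.

```java
import java.util.concurrent.ThreadPoolExecutor;
import java.util.logging.Logger;

// Hypothetical helper, not an existing Geode API.
final class PoolSaturationCheck {
  private static final Logger LOG = Logger.getLogger(PoolSaturationCheck.class.getName());

  private PoolSaturationCheck() {}

  // Saturated: every allowed thread is running a task and more work is waiting.
  // getActiveCount() is documented as approximate, so this is a heuristic.
  static boolean isSaturated(ThreadPoolExecutor pool) {
    return pool.getActiveCount() >= pool.getMaximumPoolSize()
        && !pool.getQueue().isEmpty();
  }

  // Logs a warning and returns true when the pool looks saturated.
  static boolean warnIfSaturated(String poolName, ThreadPoolExecutor pool) {
    if (!isSaturated(pool)) {
      return false;
    }
    LOG.warning(String.format(
        "%s: all %d threads busy with %d tasks queued; a queued reply may be unable to run",
        poolName, pool.getMaximumPoolSize(), pool.getQueue().size()));
    return true;
  }
}
```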





[jira] [Assigned] (GEODE-8357) Exhausting the high priority message pool can result in deadlock

2020-07-13 Thread Kirk Lund (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk Lund reassigned GEODE-8357:


Assignee: Kirk Lund

> Exhausting the high priority message pool can result in deadlock
> 
>
> Key: GEODE-8357
> URL: https://issues.apache.org/jira/browse/GEODE-8357
> Project: Geode
>  Issue Type: Bug
>  Components: messaging
>Reporter: Kirk Lund
>Assignee: Kirk Lund
>Priority: Major
>


[jira] [Updated] (GEODE-8357) Exhausting the high priority message pool can result in deadlock

2020-07-13 Thread Kirk Lund (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk Lund updated GEODE-8357:
-
Description: 
The system property "DistributionManager.MAX_THREADS" defaults to 100:
{noformat}
int MAX_THREADS = Integer.getInteger("DistributionManager.MAX_THREADS", 100);
{noformat}
The system property used to be defined in geode-core ClusterDistributionManager 
and has moved to geode-core OperationExecutors.

The value is used to limit ClusterOperationExecutors threadPool and 
highPriorityPool:
{noformat}
threadPool =
CoreLoggingExecutors.newThreadPoolWithFeedStatistics("Pooled Message Processor ",
thread -> stats.incProcessingThreadStarts(), this::doProcessingThread,
MAX_THREADS, stats.getNormalPoolHelper(), threadMonitor,
INCOMING_QUEUE_LIMIT, stats.getOverflowQueueHelper());

highPriorityPool = CoreLoggingExecutors.newThreadPoolWithFeedStatistics(
"Pooled High Priority Message Processor ",
thread -> stats.incHighPriorityThreadStarts(), this::doHighPriorityThread,
MAX_THREADS, stats.getHighPriorityPoolHelper(), threadMonitor,
INCOMING_QUEUE_LIMIT, stats.getHighPriorityQueueHelper());
{noformat}
I have seen server startup hang when recovering lots of expired entries from 
disk while using PDX. The hang looks like a dlock request for the PDX lock that 
is not receiving a response. The distributionStats#highPriorityQueueSize 
statistic (in VSD) is maxed out and never drops.

The dlock response granting the PDX lock is stuck in the highPriorityQueue 
because there are no more highPriorityQueue threads available to process the 
response. The stack dumps of all highPriorityQueue threads show tasks, such as 
recovering buckets from disk, blocked waiting for the PDX lock.

Several changes could improve this situation, either in conjunction or 
individually:
# improve observability so that support can identify when this situation has occurred
# automatically identify this situation and warn the user with a log statement
# automatically prevent this situation

  was:
The system property "DistributionManager.MAX_THREADS" defaults to 100:
{noformat}
int MAX_THREADS = Integer.getInteger("DistributionManager.MAX_THREADS", 100);
{noformat}
The system property used to be defined in geode-core ClusterDistributionManager 
and has moved to geode-core OperationExecutors.

The value is used to limit ClusterOperationExecutors threadPool and 
highPriorityPool:
{noformat}
threadPool =
CoreLoggingExecutors.newThreadPoolWithFeedStatistics("Pooled Message Processor ",
thread -> stats.incProcessingThreadStarts(), this::doProcessingThread,
MAX_THREADS, stats.getNormalPoolHelper(), threadMonitor,
INCOMING_QUEUE_LIMIT, stats.getOverflowQueueHelper());

highPriorityPool = CoreLoggingExecutors.newThreadPoolWithFeedStatistics(
"Pooled High Priority Message Processor ",
thread -> stats.incHighPriorityThreadStarts(), this::doHighPriorityThread,
MAX_THREADS, stats.getHighPriorityPoolHelper(), threadMonitor,
INCOMING_QUEUE_LIMIT, stats.getHighPriorityQueueHelper());
{noformat}
I have seen server startup hang when recovering lots of expired entries from 
disk while using PDX. The hang looks like a dlock request for the PDX lock that 
is not receiving a response. The distributionStats#highPriorityQueueSize 
statistic (in VSD) is maxed out and never drops.

The dlock response granting the PDX lock is stuck in the highPriorityQueue 
because there are no more highPriorityQueue threads available to process the 
response. The stack dumps of all highPriorityQueue threads show tasks, such as 
recovering buckets from disk, blocked waiting for the PDX lock.

Several changes could improve this situation, either in conjunction or 
separately:
# improve observability so that support can identify when this situation has occurred
# automatically identify this situation and warn the user with a log statement
# automatically prevent this situation


> Exhausting the high priority message pool can result in deadlock
> 
>
> Key: GEODE-8357
> URL: https://issues.apache.org/jira/browse/GEODE-8357
> Project: Geode
>  Issue Type: Bug
>  Components: messaging
>Reporter: Kirk Lund
>Priority: Major
>

[jira] [Updated] (GEODE-8357) Exhausting the high priority message pool can result in deadlock

2020-07-13 Thread Kirk Lund (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk Lund updated GEODE-8357:
-
Labels: GeodeOperationAPI  (was: )

> Exhausting the high priority message pool can result in deadlock
> 
>
> Key: GEODE-8357
> URL: https://issues.apache.org/jira/browse/GEODE-8357
> Project: Geode
>  Issue Type: Bug
>  Components: messaging
>Reporter: Kirk Lund
>Assignee: Kirk Lund
>Priority: Major
>  Labels: GeodeOperationAPI
>


[jira] [Updated] (GEODE-8357) Exhausting the high priority message pool can result in deadlock

2020-07-13 Thread Kirk Lund (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk Lund updated GEODE-8357:
-
Affects Version/s: 1.0.0-incubating
   1.2.0
   1.3.0
   1.4.0
   1.5.0
   1.6.0
   1.7.0
   1.8.0
   1.9.0
   1.10.0
   1.11.0
   1.12.0

> Exhausting the high priority message pool can result in deadlock
> 
>
> Key: GEODE-8357
> URL: https://issues.apache.org/jira/browse/GEODE-8357
> Project: Geode
>  Issue Type: Bug
>  Components: messaging
>Affects Versions: 1.0.0-incubating, 1.2.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, 
> 1.7.0, 1.8.0, 1.9.0, 1.10.0, 1.11.0, 1.12.0
>Reporter: Kirk Lund
>Assignee: Kirk Lund
>Priority: Major
>  Labels: GeodeOperationAPI
>


[jira] [Commented] (GEODE-8326) CI Failure: FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled times out waiting for client metadata

2020-07-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156848#comment-17156848
 ] 

ASF GitHub Bot commented on GEODE-8326:
---

onichols-pivotal commented on a change in pull request #5358:
URL: https://github.com/apache/geode/pull/5358#discussion_r453805506



##
File path: 
geode-core/src/distributedTest/java/org/apache/geode/internal/cache/partitioned/fixed/FixedPartitioningWithTransactionDistributedTest.java
##
@@ -238,7 +238,7 @@ private void forceClientMetadataUpdate(Region region) {
 ClientMetadataService clientMetadataService =
 ((InternalCache) clientCacheRule.getClientCache()).getClientMetadataService();
 clientMetadataService.scheduleGetPRMetaData((InternalRegion) region, true);
-await().atMost(5, MINUTES).until(clientMetadataService::isMetadataStable);
+await().atMost(5, HOURS).until(clientMetadataService::isMetadataStable);

Review comment:
   Seems like this experiment could have been conducted in the PR pipeline; it shouldn't have needed to be merged to develop.







> CI Failure: 
> FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled
>  times out waiting for client metadata
> ---
>
> Key: GEODE-8326
> URL: https://issues.apache.org/jira/browse/GEODE-8326
> Project: Geode
>  Issue Type: Bug
>  Components: client/server, tests
>Affects Versions: 1.13.0
>Reporter: Kirk Lund
>Assignee: Eric Shu
>Priority: Major
>  Labels: caching-applications
>
> CI Failure: 
> http://files.apachegeode-ci.info/builds/apache-support-1-13-main/1.13.0-build.0296/test-results/distributedTest/1592846714/
> {noformat}
> org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest
>  > 
> clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled[ExecuteFunctionByObject]
>  FAILED
> org.awaitility.core.ConditionTimeoutException: Condition with lambda 
> expression in 
> org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest
>  that uses org.apache.geode.cache.client.internal.ClientMetadataService was 
> not fulfilled within 5 minutes.
> at 
> org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:165)
> at 
> org.awaitility.core.CallableCondition.await(CallableCondition.java:78)
> at 
> org.awaitility.core.CallableCondition.await(CallableCondition.java:26)
> at 
> org.awaitility.core.ConditionFactory.until(ConditionFactory.java:895)
> at 
> org.awaitility.core.ConditionFactory.until(ConditionFactory.java:864)
> at 
> org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest.forceClientMetadataUpdate(FixedPartitioningWithTransactionDistributedTest.java:241)
> at 
> org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest.doFunctionTransactionAndSuspend(FixedPartitioningWithTransactionDistributedTest.java:458)
> at 
> org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled(FixedPartitioningWithTransactionDistributedTest.java:254)
> {noformat}
> The failure occurs after waiting 5 minutes for the ClientMetadataService to 
> stabilize. See ClientMetadataService#isMetadataStable.
> The timeout occurs within a block of test code that was introduced by Jake in 
> PR #3840:
> {noformat}
> GEODE-7006: Fixes function execution by id with transactions. (#3840)  
> * Fixes test to force and wait for PR metadata to update.
> {noformat}





[jira] [Commented] (GEODE-8326) CI Failure: FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled times out waiting for client metadata

2020-07-13 Thread Kirk Lund (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156870#comment-17156870
 ] 

Kirk Lund commented on GEODE-8326:
--

[~onichols] Thanks for pointing that out! That wasn't supposed to get merged to 
develop.

> CI Failure: 
> FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled
>  times out waiting for client metadata
> ---
>
> Key: GEODE-8326
> URL: https://issues.apache.org/jira/browse/GEODE-8326
> Project: Geode
>  Issue Type: Bug
>  Components: client/server, tests
>Affects Versions: 1.13.0
>Reporter: Kirk Lund
>Assignee: Eric Shu
>Priority: Major
>  Labels: caching-applications
>


[jira] [Issue Comment Deleted] (GEODE-8326) CI Failure: FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled times out waiting for clie

2020-07-13 Thread Kirk Lund (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk Lund updated GEODE-8326:
-
Comment: was deleted

(was: [~onichols] Thanks for pointing that out! That wasn't supposed to get 
merged to develop.)

> CI Failure: 
> FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled
>  times out waiting for client metadata
> ---
>
> Key: GEODE-8326
> URL: https://issues.apache.org/jira/browse/GEODE-8326
> Project: Geode
>  Issue Type: Bug
>  Components: client/server, tests
>Affects Versions: 1.13.0
>Reporter: Kirk Lund
>Assignee: Eric Shu
>Priority: Major
>  Labels: caching-applications
>


[jira] [Updated] (GEODE-8298) member version comparison sense inconsistent when deciding on multicast

2020-07-13 Thread Bill Burcham (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Burcham updated GEODE-8298:

Description: 
Since about 2014 when we introduced the {{Version}} class to replace use of 
{{short}} s all over the place for serialization versions, these two loops in 
{{GMSMembership.processView()}} have used comparisons that disagree in sense:

{code}
// We perform the update under a global lock so that other
// incoming events will not be lost in terms of our global view.
latestViewWriteLock.lock();
try {
  // first determine the version for multicast message serialization
  VersionOrdinal version = Version.CURRENT;
  for (final Entry internalIDLongEntry : surpriseMembers
  .entrySet()) {
ID mbr = internalIDLongEntry.getKey();
final VersionOrdinal itsVersion = mbr.getVersionObject();
if (itsVersion != null && version.compareTo(itsVersion) < 0) {
  version = itsVersion;
}
  }
  for (ID mbr : newView.getMembers()) {
final VersionOrdinal itsVersion = mbr.getVersionObject();
if (itsVersion != null && itsVersion.compareTo(version) < 0) {
  version = mbr.getVersionObject();
}
  }
  disableMulticastForRollingUpgrade = !version.equals(Version.CURRENT);
{code}

The goal here is to find the oldest version and, if that version is older than 
our local version, disable multicast. So we want to put the minimum into 
{{version}}, which means the first loop's comparison is wrong (it keeps the 
maximum) and the second one is right.

While we are in here, let's combine the two loops using 
{{Stream.concat(surpriseMembers.entrySet().stream().map(entry->entry.getKey()), newView.getMembers().stream()).forEach(member -> ...)}}.

Alternatives are described here: 
https://www.baeldung.com/java-combine-multiple-collections

Once we have the combined {{Iterable}} we can use something like 
{{Collections.min()}} to find the minimum in one swell foop and this whole 
thing collapses to one or two declarative expressions.
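Below is a sketch of the combined, declarative version. It assumes simplified stand-in types: `MemberSketch` replaces the real member ID type, and version ordinals are modeled as plain `short` values. Neither is Geode's actual API. The sketch streams the surprise members' keys (equivalent to `entrySet().stream().map(entry -> entry.getKey())`) together with the view members. Like the original loops, it drops unknown (`null`) versions. It then takes the minimum and keeps the local version unless an older one is found, which is what both loops were meant to compute.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Stream;

// Hypothetical stand-in for a membership ID: only the version matters here.
final class MemberSketch {
  final Short version; // null models a member whose version is unknown

  MemberSketch(Short version) {
    this.version = version;
  }
}

final class MulticastVersionCheck {
  private MulticastVersionCheck() {}

  // Oldest version across surprise members and the new view, never above the
  // local version (mirrors starting the original loops at Version.CURRENT).
  static short oldestVersion(Map<MemberSketch, Long> surpriseMembers,
      Collection<MemberSketch> viewMembers, short localVersion) {
    return Stream.concat(surpriseMembers.keySet().stream(), viewMembers.stream())
        .map(member -> member.version)
        .filter(Objects::nonNull)                 // skip unknown versions, like the null checks
        .min(Comparator.naturalOrder())           // the oldest member version, if any
        .filter(oldest -> oldest < localVersion)  // only an older member lowers the result
        .orElse(localVersion);
  }

  // Multicast is disabled whenever some member is older than this one.
  static boolean disableMulticastForRollingUpgrade(Map<MemberSketch, Long> surpriseMembers,
      Collection<MemberSketch> viewMembers, short localVersion) {
    return oldestVersion(surpriseMembers, viewMembers, localVersion) != localVersion;
  }
}
```

The result is never allowed above the local version. A newer member therefore cannot turn multicast off; only an older one can.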

When this story is complete, the functionality will be in a separate method and 
we'll have a unit test for it.

  was:
Since about 2014 when we introduced the {{Version}} class to replace use of 
{{short}}s all over the place for serialization versions, these two loops in 
{{GMSMembership.processView()}} have used comparisons that disagree in sense:

{code}
// We perform the update under a global lock so that other
// incoming events will not be lost in terms of our global view.
latestViewWriteLock.lock();
try {
  // first determine the version for multicast message serialization
  VersionOrdinal version = Version.CURRENT;
  for (final Entry internalIDLongEntry : surpriseMembers
  .entrySet()) {
ID mbr = internalIDLongEntry.getKey();
final VersionOrdinal itsVersion = mbr.getVersionObject();
if (itsVersion != null && version.compareTo(itsVersion) < 0) {
  version = itsVersion;
}
  }
  for (ID mbr : newView.getMembers()) {
final VersionOrdinal itsVersion = mbr.getVersionObject();
if (itsVersion != null && itsVersion.compareTo(version) < 0) {
  version = mbr.getVersionObject();
}
  }
  disableMulticastForRollingUpgrade = !version.equals(Version.CURRENT);
{code}

The goal here is to find the oldest version and, if that version is older than 
our local version, disable multicast. So we want to put the minimum into 
{{version}}, which means the first loop's comparison is wrong (it keeps the 
maximum) and the second one is right.

While we are in here, let's combine the two loops using 
{{Stream.concat(surpriseMembers.entrySet().stream().map(entry->entry.getKey()), newView.getMembers().stream()).forEach(member -> ...)}}.

Alternatives are described here: 
https://www.baeldung.com/java-combine-multiple-collections

Once we have the combined {{Iterable}} we can use something like 
{{Collections.min()}} to find the minimum in one swell foop and this whole 
thing collapses to one or two declarative expressions.

When this story is complete, the functionality will be in a separate method and 
we'll have a unit test for it.


> member version comparison sense inconsistent when deciding on multicast
> ---
>
> Key: GEODE-8298
> URL: https://issues.apache.org/jira/browse/GEODE-8298
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bill Burcham
>Priority: Major
>  Labels: starter
>
> Since about 2014 when we introduced the {{Version}} class to replace use of 
> {{short}} s all over the place for serialization versions, these two loops in 
> {{GMSMembership.processView()}} have used comparisons that disagree in sense:
> {code}
> // We perform the update under a

[jira] [Commented] (GEODE-8340) Enforce Switch compiler warnings as errors

2020-07-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156878#comment-17156878
 ] 

ASF GitHub Bot commented on GEODE-8340:
---

moleske commented on a change in pull request #625:
URL: https://github.com/apache/geode-native/pull/625#discussion_r453770010



##
File path: cppcache/src/TcrMessage.cpp
##
@@ -1015,6 +1015,36 @@ void TcrMessage::processChunk(const 
std::vector& chunk, int32_t len,
 break;
   }
   // fall-through for other cases
+  if (m_chunkedResult != nullptr) {

Review comment:
   The previous code fell through if it didn't reach the `break;` 
statement on line 999 or 1015 (which is buried in an `if` and `else if`).  This 
is the same code as on line 1054.  I thought about extracting a function but 
got lazy.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Enforce Switch compiler warnings as errors
> --
>
> Key: GEODE-8340
> URL: https://issues.apache.org/jira/browse/GEODE-8340
> Project: Geode
>  Issue Type: Improvement
>  Components: native client
>Reporter: Michael Oleske
>Priority: Major
>
> Given I compile the code without exempting no-switch-enum and 
> no-implicit-fallthrough and no-covered-switch-default
> Then it should compile
> Note - was marked as a todo, seems reasonable to tackle all these at once



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8356) Gfsh export logs to support capturing thread dumps in the logs

2020-07-13 Thread Anilkumar Gingade (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156888#comment-17156888
 ] 

Anilkumar Gingade commented on GEODE-8356:
--

There is already a gfsh command to get stack traces. This is to make the 
entire process of getting the logs/stats/dumps happen in one single go.

> Gfsh export logs to support capturing thread dumps in the logs
> --
>
> Key: GEODE-8356
> URL: https://issues.apache.org/jira/browse/GEODE-8356
> Project: Geode
>  Issue Type: Improvement
>  Components: gfsh
>Reporter: Anilkumar Gingade
>Priority: Major
>  Labels: GeodeOperationAPI
>
> With an option to say "--with-thread-dump"
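The issue does not say how the dump would be captured. As a hedged sketch, a member JVM can produce a thread dump of itself with the standard {{java.lang.management.ThreadMXBean}} API. The class and method names below are hypothetical and are not part of gfsh. The sketch formats every frame explicitly instead of relying on {{ThreadInfo.toString()}}, which truncates long stacks:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper; not part of gfsh. */
final class ThreadDumpSketch {

  private ThreadDumpSketch() {}

  /** Returns a plain-text thread dump of the current JVM, with full stack traces. */
  static String captureThreadDump() {
    ThreadMXBean threads = ManagementFactory.getThreadMXBean();
    StringBuilder dump = new StringBuilder();
    // Only ask for lock details that the JVM can actually report.
    ThreadInfo[] infos = threads.dumpAllThreads(
        threads.isObjectMonitorUsageSupported(), threads.isSynchronizerUsageSupported());
    for (ThreadInfo info : infos) {
      if (info == null) {
        continue; // defensive: skip a thread that could not be described
      }
      dump.append('"').append(info.getThreadName()).append("\" state=")
          .append(info.getThreadState()).append('\n');
      for (StackTraceElement frame : info.getStackTrace()) {
        dump.append("    at ").append(frame).append('\n');
      }
      dump.append('\n');
    }
    return dump.toString();
  }
}
```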





[jira] [Assigned] (GEODE-8298) member version comparison sense inconsistent when deciding on multicast

2020-07-13 Thread Bill Burcham (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Burcham reassigned GEODE-8298:
---

Assignee: Kamilla Aslami

> member version comparison sense inconsistent when deciding on multicast
> ---
>
> Key: GEODE-8298
> URL: https://issues.apache.org/jira/browse/GEODE-8298
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bill Burcham
>Assignee: Kamilla Aslami
>Priority: Major
>  Labels: starter
>
> Since about 2014 when we introduced the {{Version}} class to replace use of 
> {{short}} s all over the place for serialization versions, these two loops in 
> {{GMSMembership.processView()}} have used comparisons that disagree in sense:
> {code}
> // We perform the update under a global lock so that other
> // incoming events will not be lost in terms of our global view.
> latestViewWriteLock.lock();
> try {
>   // first determine the version for multicast message serialization
>   VersionOrdinal version = Version.CURRENT;
>   for (final Entry internalIDLongEntry : surpriseMembers
>   .entrySet()) {
> ID mbr = internalIDLongEntry.getKey();
> final VersionOrdinal itsVersion = mbr.getVersionObject();
> if (itsVersion != null && version.compareTo(itsVersion) < 0) {
>   version = itsVersion;
> }
>   }
>   for (ID mbr : newView.getMembers()) {
> final VersionOrdinal itsVersion = mbr.getVersionObject();
> if (itsVersion != null && itsVersion.compareTo(version) < 0) {
>   version = mbr.getVersionObject();
> }
>   }
>   disableMulticastForRollingUpgrade = !version.equals(Version.CURRENT);
> {code}
> The goal here is to find the oldest version and if that version is older than 
> our local version we disable multicast. So we want to put the minimum into 
> {{version}}. So the first loop's comparison is wrong and the second one is 
> right.
> While we are in here let's combine the two loops using 
> {{Stream.concat(surpriseMembers.entrySet().stream().map(entry->entry.getKey()),
>   newView.getMembers().stream()).forEach(member -> ...)}}.
> Alternatives are described here: 
> https://www.baeldung.com/java-combine-multiple-collections
> Once we have the combined {{Iterable}} we can use something like 
> {{Collections.min()}} to find the minimum in one swell foop and this whole 
> thing collapses to one or two declarative expressions.
> When this story is complete, the functionality will be in a separate method 
> and we'll have a unit test for it.





[jira] [Assigned] (GEODE-8298) member version comparison sense inconsistent when deciding on multicast

2020-07-13 Thread Bill Burcham (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Burcham reassigned GEODE-8298:
---

Assignee: (was: Kamilla Aslami)

> member version comparison sense inconsistent when deciding on multicast
> ---
>
> Key: GEODE-8298
> URL: https://issues.apache.org/jira/browse/GEODE-8298
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bill Burcham
>Priority: Major
>  Labels: starter
>


[jira] [Assigned] (GEODE-8298) member version comparison sense inconsistent when deciding on multicast

2020-07-13 Thread Kamilla Aslami (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamilla Aslami reassigned GEODE-8298:
-

Assignee: Kamilla Aslami

> member version comparison sense inconsistent when deciding on multicast
> ---
>
> Key: GEODE-8298
> URL: https://issues.apache.org/jira/browse/GEODE-8298
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bill Burcham
>Assignee: Kamilla Aslami
>Priority: Major
>  Labels: starter
>


[jira] [Resolved] (GEODE-8302) WAN Conflation stats are being incorrectly incremented

2020-07-13 Thread Donal Evans (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donal Evans resolved GEODE-8302.

Fix Version/s: 1.14.0
   Resolution: Fixed

> WAN Conflation stats are being incorrectly incremented
> --
>
> Key: GEODE-8302
> URL: https://issues.apache.org/jira/browse/GEODE-8302
> Project: Geode
>  Issue Type: Bug
>  Components: statistics, wan
>Affects Versions: 1.14.0
>Reporter: Donal Evans
>Assignee: Alberto Gomez
>Priority: Major
> Fix For: 1.14.0
>
>
> When the below diff (which adds checks to confirm that conflation stats are 
> not incremented in WAN tests with conflation disabled) is applied, the 
> modified tests fail due to conflation stats being incorrectly incremented. 
> This behaviour has only been observed since the changes included in this PR 
> were introduced: https://github.com/apache/geode/pull/4928
> {noformat}
> diff --git 
> a/geode-wan/src/distributedTest/java/org/apache/geode/internal/cache/wan/serial/SerialWANStatsDUnitTest.java
>  
> b/geode-wan/src/distributedTest/java/org/apache/geode/internal/cache/wan/serial/SerialWANStatsDUnitTest.java
> index b2ed76728f..bc6beb0002 100644
> --- 
> a/geode-wan/src/distributedTest/java/org/apache/geode/internal/cache/wan/serial/SerialWANStatsDUnitTest.java
> +++ 
> b/geode-wan/src/distributedTest/java/org/apache/geode/internal/cache/wan/serial/SerialWANStatsDUnitTest.java
> @@ -209,6 +209,7 @@ public class SerialWANStatsDUnitTest extends WANTestBase {
>  
>  vm4.invoke(() -> WANTestBase.checkQueueStats("ln", 0, entries, entries, 
> entries));
>  vm4.invoke(() -> WANTestBase.checkBatchStats("ln", 1, true));
> +vm4.invoke(() -> WANTestBase.checkConflatedStats("ln", 0));
>  
>  // wait until queue is empty
>  vm5.invoke(() -> await()
> @@ -354,6 +355,7 @@ public class SerialWANStatsDUnitTest extends WANTestBase {
>  
>  vm4.invoke(() -> WANTestBase.checkQueueStats("ln", 0, entries, entries, 
> entries));
>  vm4.invoke(() -> WANTestBase.checkBatchStats("ln", 2, true, true));
> +vm4.invoke(() -> WANTestBase.checkConflatedStats("ln", 0));
>  
>  // wait until queue is empty
>  vm5.invoke(() -> await()
> {noformat}
> In addition to the tests above, 
> SerialWANPropagation_PartitionedRegionDUnitTest.testPartitionedSerialPropagationHA()
>  fails with incorrectly incremented conflation stats if a similar check is 
> introduced at the end of the test. Again, without the changes introduced by 
> PR #4928, this modified test passes.





[jira] [Commented] (GEODE-8340) Enforce Switch compiler warnings as errors

2020-07-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156943#comment-17156943
 ] 

ASF GitHub Bot commented on GEODE-8340:
---

pdxcodemonkey commented on a change in pull request #625:
URL: https://github.com/apache/geode-native/pull/625#discussion_r453895218



##
File path: cppcache/include/geode/DataInput.hpp
##
@@ -313,7 +313,53 @@ class APACHE_GEODE_EXPORT DataInput {
 // empty string
 break;
   // TODO: What's the right response here?
-  default:
+  case internal::DSCode::FixedIDDefault:

Review comment:
   Sorry, I don't think the chance of getting a warning for free is worth 
the cost of some egregious cluttering of the code.  We're only using 4 of the 
61 possible values for a DSCode, so it was kind of an abuse of a switch 
statement to begin with.  This should probably just be a 4-way if-else block.







[jira] [Updated] (GEODE-8357) Exhausting the high priority message pool can result in deadlock

2020-07-13 Thread Kirk Lund (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk Lund updated GEODE-8357:
-
Description: 
The system property "DistributionManager.MAX_THREADS" defaults to 100:
{noformat}
int MAX_THREADS = Integer.getInteger("DistributionManager.MAX_THREADS", 100);
{noformat}
The system property used to be defined in geode-core ClusterDistributionManager 
and has moved to geode-core OperationExecutors.

The value is used to limit ClusterOperationExecutors threadPool and 
highPriorityPool:
{noformat}
threadPool =
    CoreLoggingExecutors.newThreadPoolWithFeedStatistics("Pooled Message Processor ",
        thread -> stats.incProcessingThreadStarts(), this::doProcessingThread,
        MAX_THREADS, stats.getNormalPoolHelper(), threadMonitor,
        INCOMING_QUEUE_LIMIT, stats.getOverflowQueueHelper());

highPriorityPool = CoreLoggingExecutors.newThreadPoolWithFeedStatistics(
    "Pooled High Priority Message Processor ",
    thread -> stats.incHighPriorityThreadStarts(), this::doHighPriorityThread,
    MAX_THREADS, stats.getHighPriorityPoolHelper(), threadMonitor,
    INCOMING_QUEUE_LIMIT, stats.getHighPriorityQueueHelper());
{noformat}
I have seen server startup hang when recovering lots of expired entries from 
disk while using PDX. The hang looks like a dlock request for the PDX lock is 
not receiving a response. Checking the value for the 
distributionStats#highPriorityQueueSize statistic (in VSD) shows the value 
maxed out and never dropping.

The dlock response granting the PDX lock is stuck in the highPriorityQueue 
because there are no more highPriorityQueue threads available to process the 
response. All of the highPriorityQueue thread stack dumps show that tasks such as 
recovering a bucket from disk are blocked waiting for the PDX lock.

Several changes could improve this situation, either in conjunction or 
individually:
# improve observability to enable support to identify that this situation has 
occurred
# automatically identify this situation and warn the user with a log statement
# automatically prevent this situation
# identify the messages that are prone to causing deadlocks and move them to a 
dedicated thread pool with a higher limit
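The hang can be modelled in miniature with a plain {{java.util.concurrent.ThreadPoolExecutor}}. This is a hedged sketch, not Geode code. A fixed-size pool stands in for highPriorityPool, a {{CountDownLatch}} stands in for the PDX dlock grant, and all names are hypothetical. The sketch also illustrates item 2, a heuristic that logs a warning when every thread is busy and work is queued, and item 4, processing the grant on a dedicated pool:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of the hang; not Geode's ClusterOperationExecutors. */
final class PoolExhaustionSketch {

  private PoolExhaustionSketch() {}

  static ThreadPoolExecutor newFixedPool(int threads) {
    return new ThreadPoolExecutor(
        threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
  }

  /** Item 2 heuristic: every thread is busy and work is still waiting in the queue. */
  static boolean looksExhausted(ThreadPoolExecutor pool) {
    return pool.getActiveCount() >= pool.getMaximumPoolSize() && !pool.getQueue().isEmpty();
  }

  /**
   * Fills the pool with tasks that wait for a "lock grant", then submits the grant either
   * to the same pool (the deadlock) or to a dedicated one-thread pool (item 4).
   * Returns true if the grant was processed within the timeout.
   */
  static boolean grantProcessed(int maxThreads, boolean useDedicatedPool, long timeoutMillis) {
    ThreadPoolExecutor pool = newFixedPool(maxThreads);
    ThreadPoolExecutor responsePool = newFixedPool(1);
    CountDownLatch lockGranted = new CountDownLatch(1);
    CountDownLatch allWaiting = new CountDownLatch(maxThreads);
    try {
      for (int i = 0; i < maxThreads; i++) {
        pool.execute(() -> {
          allWaiting.countDown();
          try {
            lockGranted.await(); // like bucket recovery blocked on the PDX lock
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      allWaiting.await(); // every pool thread is now blocked
      (useDedicatedPool ? responsePool : pool).execute(lockGranted::countDown);
      boolean granted = lockGranted.await(timeoutMillis, TimeUnit.MILLISECONDS);
      if (!granted && looksExhausted(pool)) {
        System.err.println("WARN: all " + pool.getMaximumPoolSize() + " threads are busy and "
            + pool.getQueue().size() + " task(s) are queued; possible thread pool deadlock");
      }
      return granted;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      pool.shutdownNow(); // interrupts the blocked waiters so the sketch can exit
      responsePool.shutdownNow();
    }
  }
}
```

With {{grantProcessed(2, false, 200)}}, the grant sits in the queue behind the two blocked tasks, and the call logs the warning and returns false. With {{grantProcessed(2, true, 5000)}}, the dedicated pool processes the grant and the call returns true.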

  was:
The system property "DistributionManager.MAX_THREADS" defaults to 100:
{noformat}
int MAX_THREADS = Integer.getInteger("DistributionManager.MAX_THREADS", 100);
{noformat}
The system property used to be defined in geode-core ClusterDistributionManager 
and has moved to geode-core OperationExecutors.

The value is used to limit ClusterOperationExecutors threadPool and 
highPriorityPool:
{noformat}
threadPool =
    CoreLoggingExecutors.newThreadPoolWithFeedStatistics("Pooled Message Processor ",
        thread -> stats.incProcessingThreadStarts(), this::doProcessingThread,
        MAX_THREADS, stats.getNormalPoolHelper(), threadMonitor,
        INCOMING_QUEUE_LIMIT, stats.getOverflowQueueHelper());

highPriorityPool = CoreLoggingExecutors.newThreadPoolWithFeedStatistics(
    "Pooled High Priority Message Processor ",
    thread -> stats.incHighPriorityThreadStarts(), this::doHighPriorityThread,
    MAX_THREADS, stats.getHighPriorityPoolHelper(), threadMonitor,
    INCOMING_QUEUE_LIMIT, stats.getHighPriorityQueueHelper());
{noformat}
I have seen server startup hang when recovering lots of expired entries from 
disk while using PDX. The hang looks like a dlock request for the PDX lock is 
not receiving a response. Checking the value for the 
distributionStats#highPriorityQueueSize statistic (in VSD) shows the value 
maxed out and never dropping.

The dlock response granting the PDX lock is stuck in the highPriorityQueue 
because there are no more highPriorityQueue threads available to process the 
response. All of the highPriorityQueue thread stack dumps show that tasks such as 
recovering a bucket from disk are blocked waiting for the PDX lock.

Several changes could improve this situation, either in conjunction or 
individually:
# improve observability to enable support to identify that this situation has 
occurred
# automatically identify this situation and warn the user with a log statement
# automatically prevent this situation


> Exhausting the high priority message pool can result in deadlock
> 
>
> Key: GEODE-8357
> URL: https://issues.apache.org/jira/browse/GEODE-8357
> Project: Geode
>  Issue Type: Bug
>  Components: messaging
>Affects Versions: 1.0.0-incubating, 1.2.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, 
> 1.7.0, 1.8.0, 1.9.0, 1.10.0, 1.11.0, 1.12.0
>Reporter: Kirk Lund
>Assignee: Kirk Lund
>Priority: Major
>  Labels: GeodeOperationAPI
>
> The system property "DistributionManager.MAX_THREADS" defaults to 100:
> {noformat}
> int MAX_THREADS = Integer.getInteger("DistributionManager.MAX_THREADS",

[jira] [Updated] (GEODE-8357) Exhausting the high priority message thread pool can result in deadlock

2020-07-13 Thread Kirk Lund (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk Lund updated GEODE-8357:
-
Summary: Exhausting the high priority message thread pool can result in 
deadlock  (was: Exhausting the high priority message pool can result in 
deadlock)

> Exhausting the high priority message thread pool can result in deadlock
> ---
>
> Key: GEODE-8357
> URL: https://issues.apache.org/jira/browse/GEODE-8357
> Project: Geode
>  Issue Type: Bug
>  Components: messaging
>Affects Versions: 1.0.0-incubating, 1.2.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, 
> 1.7.0, 1.8.0, 1.9.0, 1.10.0, 1.11.0, 1.12.0
>Reporter: Kirk Lund
>Assignee: Kirk Lund
>Priority: Major
>  Labels: GeodeOperationAPI
>


[jira] [Commented] (GEODE-8355) long-running-test details need to be world-readable

2020-07-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156947#comment-17156947
 ] 

ASF GitHub Bot commented on GEODE-8355:
---

rhoughton-pivot merged pull request #5366:
URL: https://github.com/apache/geode/pull/5366


   





> long-running-test details need to be world-readable
> ---
>
> Key: GEODE-8355
> URL: https://issues.apache.org/jira/browse/GEODE-8355
> Project: Geode
>  Issue Type: Improvement
>  Components: ci
>Reporter: Robert Houghton
>Assignee: Robert Houghton
>Priority: Major
>






[jira] [Resolved] (GEODE-8355) long-running-test details need to be world-readable

2020-07-13 Thread Robert Houghton (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Houghton resolved GEODE-8355.

Fix Version/s: 1.14.0
   Resolution: Fixed

> long-running-test details need to be world-readable
> ---
>
> Key: GEODE-8355
> URL: https://issues.apache.org/jira/browse/GEODE-8355
> Project: Geode
>  Issue Type: Improvement
>  Components: ci
>Reporter: Robert Houghton
>Assignee: Robert Houghton
>Priority: Major
> Fix For: 1.14.0
>
>






[jira] [Commented] (GEODE-8355) long-running-test details need to be world-readable

2020-07-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156948#comment-17156948
 ] 

ASF subversion and git services commented on GEODE-8355:


Commit c41e3b4b559bfbc744c8c21844cd126de2ad2fb9 in geode's branch 
refs/heads/develop from Robert Houghton
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=c41e3b4 ]

GEODE-8355: add `public: true` to the test job in long-running-test (#5366)

Authored-by: Robert Houghton 



[jira] [Commented] (GEODE-8340) Enforce Switch compiler warnings as errors

2020-07-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156961#comment-17156961
 ] 

ASF GitHub Bot commented on GEODE-8340:
---

moleske commented on a change in pull request #625:
URL: https://github.com/apache/geode-native/pull/625#discussion_r453915203



##
File path: cppcache/include/geode/DataInput.hpp
##
@@ -313,7 +313,53 @@ class APACHE_GEODE_EXPORT DataInput {
 // empty string
 break;
   // TODO: What's the right response here?
-  default:
+  case internal::DSCode::FixedIDDefault:

Review comment:
   I'll switch it to an if-else block and we'll see if that cleans up better







[jira] [Commented] (GEODE-8340) Enforce Switch compiler warnings as errors

2020-07-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156981#comment-17156981
 ] 

ASF GitHub Bot commented on GEODE-8340:
---

pdxcodemonkey commented on a change in pull request #625:
URL: https://github.com/apache/geode-native/pull/625#discussion_r453936057



##
File path: cppcache/include/geode/DataInput.hpp
##
@@ -313,7 +313,53 @@ class APACHE_GEODE_EXPORT DataInput {
 // empty string
 break;
   // TODO: What's the right response here?
-  default:
+  case internal::DSCode::FixedIDDefault:

Review comment:
   Just gonna leave this here, cause I spent the time & effort to figure it 
out.  Probably slightly better performance than the if-else approach, but I'm 
still not sure about readability:
   
   ```
 template 
 inline std::basic_string readString() {
   std::basic_string value;
   std::map<internal::DSCode, std::function<void(std::string&)>> readers;
   
   readers.insert(std::make_pair(
   internal::DSCode::CacheableString,
   [=](std::string& val) { this->readJavaModifiedUtf8(val); }));
   
   readers.insert(
   std::make_pair(internal::DSCode::CacheableStringHuge,
  [=](std::string& val) { this->readUtf16Huge(val); }));
   
   readers.insert(
   std::make_pair(internal::DSCode::CacheableASCIIString,
  [=](std::string& val) { this->readAscii(val); }));
   
   readers.insert(
   std::make_pair(internal::DSCode::CacheableASCIIStringHuge,
  [=](std::string& val) { this->readAsciiHuge(val); }));
   
   auto type = static_cast<internal::DSCode>(read());
   auto it = readers.find(type);  // read the type code only once
   
   if (it != readers.end()) {
 it->second(value);
   }
   return value;
 }
   
   ```
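For comparison, here is the same lookup-table idea as a Java analog. Java is the language of the Geode server code elsewhere in this digest, not of geode-native. The enum, class and decoders are hypothetical placeholders, not the real DSCode wire formats. The table is built once per class rather than on every call, and a missing entry yields an empty string, as in the C++ snippet:

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical Java analog of the C++ lookup-table idea; not geode-native code. */
final class StringReaderTableSketch {

  private StringReaderTableSketch() {}

  /** Stand-ins for the four string DSCode values, plus one code with no reader. */
  enum TypeCode { STRING, STRING_HUGE, ASCII_STRING, ASCII_STRING_HUGE, OTHER }

  // Built once when the class is initialized, not on every readString() call.
  private static final Map<TypeCode, Function<byte[], String>> READERS = buildReaders();

  private static Map<TypeCode, Function<byte[], String>> buildReaders() {
    Map<TypeCode, Function<byte[], String>> readers = new EnumMap<>(TypeCode.class);
    // Placeholder decoders; the real wire formats (e.g. Java modified UTF-8) differ.
    readers.put(TypeCode.STRING, bytes -> new String(bytes, StandardCharsets.UTF_8));
    readers.put(TypeCode.STRING_HUGE, bytes -> new String(bytes, StandardCharsets.UTF_16BE));
    readers.put(TypeCode.ASCII_STRING, bytes -> new String(bytes, StandardCharsets.US_ASCII));
    readers.put(TypeCode.ASCII_STRING_HUGE, bytes -> new String(bytes, StandardCharsets.US_ASCII));
    return Collections.unmodifiableMap(readers);
  }

  /** Dispatches on a type code the caller has already read once; unknown codes give "". */
  static String readString(TypeCode code, byte[] payload) {
    Function<byte[], String> reader = READERS.get(code);
    return reader == null ? "" : reader.apply(payload);
  }
}
```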







[jira] [Commented] (GEODE-8067) ClassLoader Isolation

2020-07-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157002#comment-17157002
 ] 

ASF GitHub Bot commented on GEODE-8067:
---

lgtm-com[bot] commented on pull request #5357:
URL: https://github.com/apache/geode/pull/5357#issuecomment-657812961


   This pull request **introduces 2 alerts** and **fixes 2** when merging 
9cb762ed91581557c8c0fca0ac8983834a8e595c into 
c41e3b4b559bfbc744c8c21844cd126de2ad2fb9 - [view on 
LGTM.com](https://lgtm.com/projects/g/apache/geode/rev/pr-4bdd1467f1ba8de5c3639c0500e1641205fbfb8f)
   
   **new alerts:**
   
   * 2 for Potential input resource leak
   
   **fixed alerts:**
   
   * 2 for Unused variable, import, function or class





> ClassLoader Isolation
> -
>
> Key: GEODE-8067
> URL: https://issues.apache.org/jira/browse/GEODE-8067
> Project: Geode
>  Issue Type: New Feature
>  Components: client/server
>Reporter: Udo Kohlmeyer
>Assignee: Udo Kohlmeyer
>Priority: Major
>
> This is the root jira for the first pass implementation for [ClassLoader 
> Isolation|https://cwiki.apache.org/confluence/display/GEODE/ClassLoader+Isolation]





[jira] [Commented] (GEODE-8340) Enforce Switch compiler warnings as errors

2020-07-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157003#comment-17157003
 ] 

ASF GitHub Bot commented on GEODE-8340:
---

moleske commented on a change in pull request #625:
URL: https://github.com/apache/geode-native/pull/625#discussion_r453958895



##
File path: cppcache/include/geode/DataInput.hpp
##
@@ -313,7 +313,53 @@ class APACHE_GEODE_EXPORT DataInput {
 // empty string
 break;
   // TODO: What's the right response here?
-  default:
+  case internal::DSCode::FixedIDDefault:

Review comment:
   I checked `Allow edits and access to secrets by maintainers`, so feel free to add this as a commit and see if it works (I'm assuming the commit will go to the branch and trigger the [CI pipeline](https://github.com/moleske/geode-native/actions) I've been playing around with).







> Enforce Switch compiler warnings as errors
> --
>
> Key: GEODE-8340
> URL: https://issues.apache.org/jira/browse/GEODE-8340
> Project: Geode
>  Issue Type: Improvement
>  Components: native client
>Reporter: Michael Oleske
>Priority: Major
>
> Given I compile the code without exempting {{no-switch-enum}}, 
> {{no-implicit-fallthrough}}, and {{no-covered-switch-default}}
> Then it should compile
> Note - was marked as a todo, seems reasonable to tackle all these at once





[jira] [Updated] (GEODE-6950) Locator can't start if a lot of clients already started

2020-07-13 Thread Eugene Nedzvetsky (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Nedzvetsky updated GEODE-6950:
-
Affects Version/s: 1.10.0
   1.11.0
   1.12.0

> Locator can't start if a lot of clients already started
> ---
>
> Key: GEODE-6950
> URL: https://issues.apache.org/jira/browse/GEODE-6950
> Project: Geode
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.7.0, 1.8.0, 1.9.0, 1.10.0, 1.11.0, 1.12.0
>Reporter: Eugene Nedzvetsky
>Priority: Major
> Attachments: 1.log
>
>
> Locator can't start if a few hundred clients are already running.
> Steps to reproduce:
> 1. Start Locator
> 2. Start 300 Geode clients
> 3. Stop Locator
> 4. Start Locator again
> Observe 100% CPU load; after some time the Locator app crashes with timeout 
> exceptions in the log.
> The problem is in the method 
> org.apache.geode.distributed.internal.InternalLocator.PrimaryHandler#processRequest.
> On Locator startup, handlerMapping doesn't yet have handlers for LocatorListRequest 
> and ClientConnectionRequest requests, so execution reaches the code guarded by 
> 'if(giveup == 0)' (InternalLocator:1185).
> The Thread.sleep(1000) pause runs only on the first iteration; after that 
> giveup > 0, and the CPU spins through the loop without any pause.
> The Thread.sleep(1000) call should be moved after the 'if(giveup == 0)' block so 
> that it is called on every iteration.
>  





[jira] [Commented] (GEODE-6950) Locator can't start if a lot of clients already started

2020-07-13 Thread Eugene Nedzvetsky (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157007#comment-17157007
 ] 

Eugene Nedzvetsky commented on GEODE-6950:
--

Current version:
{code:java}
  @Override
  public Object processRequest(Object request) throws IOException {
    long giveup = 0;
    while (giveup == 0 || System.currentTimeMillis() < giveup) {
      TcpHandler handler;
      if (request instanceof PeerLocatorRequest) {
        handler = handlerMapping.get(PeerLocatorRequest.class);
      } else {
        handler = handlerMapping.get(request.getClass());
      }

      if (handler != null) {
        return handler.processRequest(request);
      }

      if (locatorListener != null) {
        return locatorListener.handleRequest(request);
      }

      // either there is a configuration problem or the locator is still starting up
      if (giveup == 0) {
        int locatorWaitTime = internalLocator.getConfig().getLocatorWaitTime();
        if (locatorWaitTime <= 0) {
          // always retry some number of times
          locatorWaitTime = 30;
        }
        giveup = System.currentTimeMillis() + locatorWaitTime * 1000L;
        try {
          Thread.sleep(1000);
        } catch (InterruptedException ignored) {
          // running in an executor - no need to set the interrupted flag on the thread
          return null;
        }
      }
    }
    logger.info(
        "Received a location request of class {} but the handler for this is either not enabled or is not ready to process requests",
        request.getClass().getSimpleName());
    return null;
  }
{code}
Fix:
{code:java}
  @Override
  public Object processRequest(Object request) throws IOException {
    long giveup = 0;
    while (giveup == 0 || System.currentTimeMillis() < giveup) {
      TcpHandler handler;
      if (request instanceof PeerLocatorRequest) {
        handler = handlerMapping.get(PeerLocatorRequest.class);
      } else {
        handler = handlerMapping.get(request.getClass());
      }

      if (handler != null) {
        return handler.processRequest(request);
      }

      if (locatorListener != null) {
        return locatorListener.handleRequest(request);
      }

      // either there is a configuration problem or the locator is still starting up
      if (giveup == 0) {
        int locatorWaitTime = internalLocator.getConfig().getLocatorWaitTime();
        if (locatorWaitTime <= 0) {
          // always retry some number of times
          locatorWaitTime = 30;
        }
        giveup = System.currentTimeMillis() + locatorWaitTime * 1000L;
      }

      // pause on every iteration, not only the first, so the retry loop cannot spin the CPU
      try {
        Thread.sleep(1000);
      } catch (InterruptedException ignored) {
        // running in an executor - no need to set the interrupted flag on the thread
        return null;
      }
    }
    logger.info(
        "Received a location request of class {} but the handler for this is either not enabled or is not ready to process requests",
        request.getClass().getSimpleName());
    return null;
  }
{code}
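The effect of moving the pause can be reproduced outside Geode. The class below is a hypothetical sketch (`RetryLoopSketch`, `pollUntil`, and the `Supplier`-based lookup are illustrative stand-ins for `handlerMapping` and the locator wait time, not Geode APIs). It keeps the same `giveup` structure as the fix, with the sleep outside the `if (giveup == 0)` block, and counts how many lookups happen before the deadline.

```java
import java.util.Optional;
import java.util.function.Supplier;

/**
 * Hypothetical, Geode-independent sketch of the fixed retry loop: look up a handler until a
 * deadline, pausing on EVERY iteration so a missing handler never causes a busy spin.
 */
final class RetryLoopSketch {
  private RetryLoopSketch() {}

  /** Outcome of polling: the value (if found) and how many lookups were attempted. */
  static final class Result<T> {
    final Optional<T> value;
    final int attempts;

    Result(Optional<T> value, int attempts) {
      this.value = value;
      this.attempts = attempts;
    }
  }

  static <T> Result<T> pollUntil(Supplier<T> lookup, long waitMillis, long pauseMillis) {
    long giveup = 0;
    int attempts = 0;
    while (giveup == 0 || System.currentTimeMillis() < giveup) {
      attempts++;
      T value = lookup.get();
      if (value != null) {
        return new Result<>(Optional.of(value), attempts);
      }
      if (giveup == 0) {
        // compute the deadline once, on the first miss
        giveup = System.currentTimeMillis() + waitMillis;
      }
      // the fix: sleep outside the "giveup == 0" block, so every iteration pauses
      try {
        Thread.sleep(pauseMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
        return new Result<>(Optional.empty(), attempts);
      }
    }
    return new Result<>(Optional.empty(), attempts);
  }

  public static void main(String[] args) {
    // A lookup that never succeeds: with a 200 ms budget and a 50 ms pause the loop runs
    // only a handful of times instead of spinning until the deadline.
    Result<String> miss = pollUntil(() -> null, 200, 50);
    System.out.println("lookups before giving up: " + miss.attempts);
  }
}
```

With the sleep inside `if (giveup == 0)`, as in the current version, the same never-succeeding lookup would run as fast as the CPU allows for the entire wait period after the first iteration, which is consistent with the 100% CPU symptom in the report.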






[jira] [Comment Edited] (GEODE-6950) Locator can't start if a lot of clients already started

2020-07-13 Thread Eugene Nedzvetsky (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157007#comment-17157007
 ] 

Eugene Nedzvetsky edited comment on GEODE-6950 at 7/13/20, 10:06 PM:
-

org.apache.geode.distributed.internal.PrimaryHandler:85
Current version:
{code:java}
  @Override
  public Object processRequest(Object request) throws IOException {
    long giveup = 0;
    while (giveup == 0 || System.currentTimeMillis() < giveup) {
      TcpHandler handler;
      if (request instanceof PeerLocatorRequest) {
        handler = handlerMapping.get(PeerLocatorRequest.class);
      } else {
        handler = handlerMapping.get(request.getClass());
      }

      if (handler != null) {
        return handler.processRequest(request);
      }

      if (locatorListener != null) {
        return locatorListener.handleRequest(request);
      }

      // either there is a configuration problem or the locator is still starting up
      if (giveup == 0) {
        int locatorWaitTime = internalLocator.getConfig().getLocatorWaitTime();
        if (locatorWaitTime <= 0) {
          // always retry some number of times
          locatorWaitTime = 30;
        }
        giveup = System.currentTimeMillis() + locatorWaitTime * 1000L;
        try {
          Thread.sleep(1000);
        } catch (InterruptedException ignored) {
          // running in an executor - no need to set the interrupted flag on the thread
          return null;
        }
      }
    }
    logger.info(
        "Received a location request of class {} but the handler for this is either not enabled or is not ready to process requests",
        request.getClass().getSimpleName());
    return null;
  }
{code}
Fix:
{code:java}
  @Override
  public Object processRequest(Object request) throws IOException {
    long giveup = 0;
    while (giveup == 0 || System.currentTimeMillis() < giveup) {
      TcpHandler handler;
      if (request instanceof PeerLocatorRequest) {
        handler = handlerMapping.get(PeerLocatorRequest.class);
      } else {
        handler = handlerMapping.get(request.getClass());
      }

      if (handler != null) {
        return handler.processRequest(request);
      }

      if (locatorListener != null) {
        return locatorListener.handleRequest(request);
      }

      // either there is a configuration problem or the locator is still starting up
      if (giveup == 0) {
        int locatorWaitTime = internalLocator.getConfig().getLocatorWaitTime();
        if (locatorWaitTime <= 0) {
          // always retry some number of times
          locatorWaitTime = 30;
        }
        giveup = System.currentTimeMillis() + locatorWaitTime * 1000L;
      }

      // pause on every iteration, not only the first, so the retry loop cannot spin the CPU
      try {
        Thread.sleep(1000);
      } catch (InterruptedException ignored) {
        // running in an executor - no need to set the interrupted flag on the thread
        return null;
      }
    }
    logger.info(
        "Received a location request of class {} but the handler for this is either not enabled or is not ready to process requests",
        request.getClass().getSimpleName());
    return null;
  }
{code}



[jira] [Commented] (GEODE-7670) Partitioned Region clear operations can occur during concurrent data operations

2020-07-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-7670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157014#comment-17157014
 ] 

ASF GitHub Bot commented on GEODE-7670:
---

gesterzhou commented on a change in pull request #4848:
URL: https://github.com/apache/geode/pull/4848#discussion_r453976769



##
File path: 
geode-core/src/distributedTest/java/org/apache/geode/internal/cache/PartitionedRegionClearWithConcurrentOperationsDUnitTest.java
##
@@ -0,0 +1,715 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements. See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License. You may obtain a
+ * copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.geode.internal.cache;
+
+import static org.apache.geode.internal.util.ArrayUtils.asList;
+import static org.apache.geode.test.awaitility.GeodeAwaitility.await;
+import static org.apache.geode.test.dunit.VM.getVM;
+import static org.assertj.core.api.Assertions.assertThat;
+import static org.assertj.core.api.Assertions.assertThatThrownBy;
+
+import java.io.Serializable;
+import java.time.Instant;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.stream.IntStream;
+
+import junitparams.JUnitParamsRunner;
+import junitparams.Parameters;
+import junitparams.naming.TestCaseName;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+
+import org.apache.geode.ForcedDisconnectException;
+import org.apache.geode.cache.Cache;
+import org.apache.geode.cache.CacheWriter;
+import org.apache.geode.cache.CacheWriterException;
+import org.apache.geode.cache.PartitionAttributes;
+import org.apache.geode.cache.PartitionAttributesFactory;
+import org.apache.geode.cache.PartitionedRegionPartialClearException;
+import org.apache.geode.cache.Region;
+import org.apache.geode.cache.RegionEvent;
+import org.apache.geode.cache.RegionShortcut;
+import org.apache.geode.cache.partition.PartitionRegionHelper;
+import org.apache.geode.cache.util.CacheWriterAdapter;
+import org.apache.geode.distributed.DistributedSystemDisconnectedException;
+import org.apache.geode.distributed.internal.DMStats;
+import org.apache.geode.distributed.internal.InternalDistributedSystem;
+import org.apache.geode.distributed.internal.membership.api.MembershipManagerHelper;
+import org.apache.geode.internal.cache.versions.RegionVersionHolder;
+import org.apache.geode.internal.cache.versions.RegionVersionVector;
+import org.apache.geode.internal.cache.versions.VersionSource;
+import org.apache.geode.test.dunit.AsyncInvocation;
+import org.apache.geode.test.dunit.VM;
+import org.apache.geode.test.dunit.rules.CacheRule;
+import org.apache.geode.test.dunit.rules.DistributedRule;
+
+/**
+ * Tests to verify that {@link PartitionedRegion#clear()} operation can be executed multiple times
+ * on the same region while other cache operations are being executed concurrently and members are
+ * added or removed.
+ */
+@RunWith(JUnitParamsRunner.class)
+public class PartitionedRegionClearWithConcurrentOperationsDUnitTest implements Serializable {
+  private static final Integer BUCKETS = 13;
+  private static final String REGION_NAME = "PartitionedRegion";
+  private static final String TEST_CASE_NAME =
+      "[{index}] {method}(Coordinator:{0}, RegionType:{1})";
+
+  @Rule
+  public DistributedRule distributedRule = new DistributedRule(3);
+
+  @Rule
+  public CacheRule cacheRule = CacheRule.builder().createCacheInAll().build();
+
+  private VM accessor, server1, server2;
+
+  private enum TestVM {
+    ACCESSOR(0), SERVER1(1), SERVER2(2);
+
+    final int vmNumber;
+
+    TestVM(int vmNumber) {
+      this.vmNumber = vmNumber;
+    }
+  }
+
+  @SuppressWarnings("unused")
+  static RegionShortcut[] regionTypes() {
+    return new RegionShortcut[] {
+        RegionShortcut.PARTITION, RegionShortcut.PARTITION_REDUNDANT
+    };
+  }
+
+  @SuppressWarnings("unused")
+  static TestVM[] coordinators() {
+    return new TestVM[] {
+        TestVM.SERVER1, TestVM.ACCESSOR
+    };
+  }
+
+  @SuppressWarnings("unused")
+  static Object[] coordinatorsAndRegionTypes() {
+    ArrayList parameters = new Ar

[jira] [Updated] (GEODE-8320) SerialWANStatsDUnitTest.testReplicatedSerialPropagationHAWithGroupTransactionEvents is failing

2020-07-13 Thread Owen Nichols (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen Nichols updated GEODE-8320:

Fix Version/s: (was: 1.14.0)

> SerialWANStatsDUnitTest.testReplicatedSerialPropagationHAWithGroupTransactionEvents
>  is failing
> ---
>
> Key: GEODE-8320
> URL: https://issues.apache.org/jira/browse/GEODE-8320
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Reporter: Mark Hanson
>Assignee: Alberto Gomez
>Priority: Major
>
> {noformat}
> org.apache.geode.internal.cache.wan.serial.SerialWANStatsDUnitTest > 
> testReplicatedSerialPropagationHAWithGroupTransactionEvents FAILED
> 11:55:01
>  org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.internal.cache.wan.serial.SerialWANStatsDUnitTest$$Lambda$134/1929719983.run
>  in VM 2 running on Host 249227cf2774 with 8 VMs
> 11:55:01
>  at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:610)
> 11:55:01
>  at org.apache.geode.test.dunit.VM.invoke(VM.java:437)
> 11:55:01
>  at 
> org.apache.geode.internal.cache.wan.serial.SerialWANStatsDUnitTest.testReplicatedSerialPropagationHAWithGroupTransactionEvents(SerialWANStatsDUnitTest.java:578)
> 11:55:01
> 11:55:01
>  Caused by:
> 11:55:01
>  org.awaitility.core.ConditionTimeoutException: Assertion condition defined 
> as a lambda expression in org.apache.geode.internal.cache.wan.WANTestBase 
> that uses int, intorg.apache.geode.cache.Region Expected region entries: 
> 2 but actual entries: 1 present region keyset [7435200, <* 
> Intentionally cut out by Jira submitter *> 8851200] expected:<2> but 
> was:<1> within 5 minutes.
> 11:55:01
>  at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:165)
> 11:55:01
>  at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119)
> 11:55:01
>  at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:31)
> 11:55:01
>  at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:895)
> 11:55:01
>  at 
> org.awaitility.core.ConditionFactory.untilAsserted(ConditionFactory.java:679)
> 11:55:01
>  at 
> org.apache.geode.internal.cache.wan.WANTestBase.validateRegionSize(WANTestBase.java:2942)
> 11:55:01
>  at 
> org.apache.geode.internal.cache.wan.serial.SerialWANStatsDUnitTest.lambda$testReplicatedSerialPropagationHAWithGroupTransactionEvents$bb17a952$8(SerialWANStatsDUnitTest.java:578)
> 11:55:01
> 11:55:01
>  Caused by:
> 11:55:01
>  java.lang.AssertionError: Expected region entries: 2 but actual entries: 
> 1 present region keyset [7435200, <*Intentionally cut out by Jira 
> submitter*> ] expected:<2> but was:<1>
> 12:31:11 
> {noformat}
>  
>  
>  
>  
> {noformat}
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=  Test Results URI 
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.14.0-build.0193/test-results/distributedTest/1593463337/
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Test report artifacts from this job are available at:
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.14.0-build.0193/test-artifacts/1593463337/distributedtestfiles-OpenJDK8-1.14.0-build.0193.tgz
>  {noformat}
>  





[jira] [Resolved] (GEODE-8327) Upgrade buildSrc guava version

2020-07-13 Thread Robert Houghton (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Houghton resolved GEODE-8327.

Fix Version/s: 1.14.0
   Resolution: Fixed

> Upgrade buildSrc guava version
> --
>
> Key: GEODE-8327
> URL: https://issues.apache.org/jira/browse/GEODE-8327
> Project: Geode
>  Issue Type: Improvement
>  Components: build
>Reporter: Robert Houghton
>Priority: Major
> Fix For: 1.14.0
>
>
> Gradle buildSrc uses an old Guava library that comes in transitively via 
> palantir.docker. Upgrade it to a modern version to allow better and more plugin 
> integrations (such as Jib).





[jira] [Resolved] (GEODE-8313) Improve RedisData synchronization for toData

2020-07-13 Thread Jens Deppe (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jens Deppe resolved GEODE-8313.
---
Fix Version/s: 1.14.0
   Resolution: Fixed

> Improve RedisData synchronization for toData
> 
>
> Key: GEODE-8313
> URL: https://issues.apache.org/jira/browse/GEODE-8313
> Project: Geode
>  Issue Type: Bug
>  Components: redis
>Reporter: Jens Deppe
>Assignee: Jens Deppe
>Priority: Major
> Fix For: 1.14.0
>
>
> During GII, redis data structures may throw 
> {{ConcurrentModificationException}}s from {{toData}}.
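One common way to rule out `ConcurrentModificationException` when a structure is serialized while other threads mutate it is to have `toData` and every mutator synchronize on the same monitor. The sketch below is hypothetical: `SynchronizedSetData` and its methods are illustrative, not Geode's `RedisData` code, and the issue does not say which technique the actual fix used. It serializes a `HashSet` through `java.io.DataOutput` while a background thread keeps adding members.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch (not Geode's RedisData code): a mutable set whose serializer and
 * mutators share one monitor, so toData never iterates while another thread modifies it.
 */
final class SynchronizedSetData {
  private final Set<String> members = new HashSet<>();

  synchronized boolean add(String member) {
    return members.add(member);
  }

  synchronized boolean remove(String member) {
    return members.remove(member);
  }

  /** Writes a size prefix and then each member, holding the lock for the whole iteration. */
  synchronized void toData(DataOutput out) throws IOException {
    out.writeInt(members.size());
    for (String member : members) {
      out.writeUTF(member);
    }
  }

  static Set<String> fromData(DataInput in) throws IOException {
    int size = in.readInt();
    Set<String> result = new HashSet<>();
    for (int i = 0; i < size; i++) {
      result.add(in.readUTF());
    }
    return result;
  }

  /** Serializes to bytes and reads the result back. */
  Set<String> roundTrip() {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      toData(new DataOutputStream(bytes));
      return fromData(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Adds n members on a background thread while this thread serializes repeatedly. */
  static int concurrentStress(int n) {
    SynchronizedSetData data = new SynchronizedSetData();
    Thread writer = new Thread(() -> {
      for (int i = 0; i < n; i++) {
        data.add("member-" + i);
      }
    });
    writer.start();
    while (writer.isAlive()) {
      // without the shared lock, this iteration could fail with ConcurrentModificationException
      data.roundTrip();
    }
    try {
      writer.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return data.roundTrip().size();
  }
}
```

Holding the lock for all of `toData` blocks writers while serialization runs; copying a snapshot under the lock and serializing the copy outside it is an alternative with a different memory/latency trade-off.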





[jira] [Commented] (GEODE-8067) ClassLoader Isolation

2020-07-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157018#comment-17157018
 ] 

ASF GitHub Bot commented on GEODE-8067:
---

lgtm-com[bot] commented on pull request #5357:
URL: https://github.com/apache/geode/pull/5357#issuecomment-657836650


   This pull request **fixes 2 alerts** when merging 
ff1286006fddc924485d7668155cc56c58a63933 into 
c41e3b4b559bfbc744c8c21844cd126de2ad2fb9 - [view on 
LGTM.com](https://lgtm.com/projects/g/apache/geode/rev/pr-ffc286a7b66fc13ae218b8aa90d7a9e74fd0b606)
   
   **fixed alerts:**
   
   * 2 for Unused variable, import, function or class










[jira] [Resolved] (GEODE-8305) PubSubIntegrationTest failing on Windows

2020-07-13 Thread Jens Deppe (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jens Deppe resolved GEODE-8305.
---
Fix Version/s: 1.14.0
   Resolution: Fixed

> PubSubIntegrationTest failing on Windows
> 
>
> Key: GEODE-8305
> URL: https://issues.apache.org/jira/browse/GEODE-8305
> Project: Geode
>  Issue Type: Bug
>  Components: redis
>Reporter: Darrel Schneider
>Assignee: Jens Deppe
>Priority: Major
> Fix For: 1.14.0
>
>
> {noformat}
> org.apache.geode.redis.internal.executor.pubsub.PubSubIntegrationTest > 
> testSubscribeAndPublishUsingBinaryData FAILED
> org.awaitility.core.ConditionTimeoutException: Condition with lambda 
> expression in 
> org.apache.geode.redis.internal.executor.pubsub.PubSubIntegrationTest that 
> uses org.apache.geode.redis.mocks.MockBinarySubscriber was not fulfilled 
> within 5 minutes.
> at 
> org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:165)
> at 
> org.awaitility.core.CallableCondition.await(CallableCondition.java:78)
> at 
> org.awaitility.core.CallableCondition.await(CallableCondition.java:26)
> at 
> org.awaitility.core.ConditionFactory.until(ConditionFactory.java:895)
> at 
> org.awaitility.core.ConditionFactory.until(ConditionFactory.java:864)
> at 
> org.apache.geode.redis.internal.executor.pubsub.PubSubIntegrationTest.waitFor(PubSubIntegrationTest.java:680)
> at 
> org.apache.geode.redis.internal.executor.pubsub.PubSubIntegrationTest.testSubscribeAndPublishUsingBinaryData(PubSubIntegrationTest.java:327)
> org.apache.geode.redis.internal.executor.pubsub.PubSubIntegrationTest > 
> testUnsubscribingImplicitlyFromAllChannels FAILED
> org.awaitility.core.ConditionTimeoutException: Condition with lambda 
> expression in 
> org.apache.geode.redis.internal.executor.pubsub.PubSubIntegrationTest that 
> uses org.apache.geode.redis.mocks.MockSubscriber was not fulfilled within 5 
> minutes.
> at 
> org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:165)
> at 
> org.awaitility.core.CallableCondition.await(CallableCondition.java:78)
> at 
> org.awaitility.core.CallableCondition.await(CallableCondition.java:26)
> at 
> org.awaitility.core.ConditionFactory.until(ConditionFactory.java:895)
> at 
> org.awaitility.core.ConditionFactory.until(ConditionFactory.java:864)
> at 
> org.apache.geode.redis.internal.executor.pubsub.PubSubIntegrationTest.waitFor(PubSubIntegrationTest.java:680)
> at 
> org.apache.geode.redis.internal.executor.pubsub.PubSubIntegrationTest.testUnsubscribingImplicitlyFromAllChannels(PubSubIntegrationTest.java:400)
> org.apache.geode.redis.internal.executor.pubsub.PubSubIntegrationTest > 
> testPatternSubscribe FAILED
> org.awaitility.core.ConditionTimeoutException: Condition with lambda 
> expression in 
> org.apache.geode.redis.internal.executor.pubsub.PubSubIntegrationTest that 
> uses org.apache.geode.redis.mocks.MockSubscriber was not fulfilled within 5 
> minutes.
> at 
> org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:165)
> at 
> org.awaitility.core.CallableCondition.await(CallableCondition.java:78)
> at 
> org.awaitility.core.CallableCondition.await(CallableCondition.java:26)
> at 
> org.awaitility.core.ConditionFactory.until(ConditionFactory.java:895)
> at 
> org.awaitility.core.ConditionFactory.until(ConditionFactory.java:864)
> at 
> org.apache.geode.redis.internal.executor.pubsub.PubSubIntegrationTest.waitFor(PubSubIntegrationTest.java:680)
> at 
> org.apache.geode.redis.internal.executor.pubsub.PubSubIntegrationTest.testPatternSubscribe(PubSubIntegrationTest.java:560)
> {noformat}





[jira] [Resolved] (GEODE-8319) NPE due to locator missing cluster config folder

2020-07-13 Thread Jinmei Liao (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinmei Liao resolved GEODE-8319.

Fix Version/s: 1.14.0
   Resolution: Fixed

> NPE due to locator missing cluster config folder
> 
>
> Key: GEODE-8319
> URL: https://issues.apache.org/jira/browse/GEODE-8319
> Project: Geode
>  Issue Type: Bug
>  Components: management
>Reporter: Jinmei Liao
>Assignee: Jinmei Liao
>Priority: Major
> Fix For: 1.14.0
>
>
> Opening a JIRA because I believe any NPE is unacceptable in the product.
> Please provide a more explanatory exception if the cluster configuration is 
> missing from the locator.
> Exception in thread "main" java.lang.NullPointerException
> at 
> org.apache.geode.distributed.internal.InternalConfigurationPersistenceService.loadSharedConfigurationFromDir(InternalConfigurationPersistenceService.java:672)
> at 
> org.apache.geode.distributed.internal.InternalConfigurationPersistenceService.initSharedConfiguration(InternalConfigurationPersistenceService.java:435)
> at 
> org.apache.geode.distributed.internal.InternalLocator.startConfigurationPersistenceService(InternalLocator.java:1348)
> at 
> org.apache.geode.distributed.internal.InternalLocator.startClusterManagementService(InternalLocator.java:733)
> at 
> org.apache.geode.distributed.internal.InternalLocator.startCache(InternalLocator.java:729)
> at 
> org.apache.geode.distributed.internal.InternalLocator.startDistributedSystem(InternalLocator.java:708)
> at 
> org.apache.geode.distributed.internal.InternalLocator.startLocator(InternalLocator.java:374)
> at 
> org.apache.geode.distributed.LocatorLauncher.start(LocatorLauncher.java:679)
> at 
> com.citi.cate.gemfire.CitiGemFireLocatorStart.run(CitiGemFireLocatorStart.java:47)
> at 
> com.citi.cate.gemfire.CitiGemFireLocatorStart.main(CitiGemFireLocatorStart.java:115)
> Environment
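The stack trace does not show which value was null, so the following is only an illustration of the behavior the reporter asks for: validate the directory up front and fail with a message that names it. `ConfigDirCheck`, `listConfigEntries`, and the example path in `main` are hypothetical, not Geode code. The comment notes one common source of such NPEs in Java generally: `java.io.File.listFiles()` returns `null`, rather than throwing, when the path is not a readable directory.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: validate a cluster-configuration directory before using it and fail with
 * a descriptive message, instead of letting a null surface later as a bare NullPointerException.
 */
final class ConfigDirCheck {
  private ConfigDirCheck() {}

  static List<File> listConfigEntries(File configDir) {
    if (!configDir.isDirectory()) {
      throw new IllegalStateException(
          "Cluster configuration directory is missing or is not a directory: "
              + configDir.getAbsolutePath());
    }
    // File.listFiles() returns null (it does not throw) when the directory cannot be read
    File[] entries = configDir.listFiles();
    if (entries == null) {
      throw new IllegalStateException(
          "Unable to list cluster configuration directory: " + configDir.getAbsolutePath());
    }
    return Arrays.asList(entries);
  }

  public static void main(String[] args) {
    // example path only; pass a real directory as the first argument
    File dir = new File(args.length > 0 ? args[0] : "cluster_config");
    try {
      System.out.println("entries: " + listConfigEntries(dir).size());
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```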





[jira] [Resolved] (GEODE-8239) Gradle configuration to create manifests for all Geode jars

2020-07-13 Thread Patrick Johnsn (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Johnsn resolved GEODE-8239.
---
Fix Version/s: 1.14.0
   Resolution: Fixed

> Gradle configuration to create manifests for all Geode jars
> ---
>
> Key: GEODE-8239
> URL: https://issues.apache.org/jira/browse/GEODE-8239
> Project: Geode
>  Issue Type: Sub-task
>  Components: client/server
>Reporter: Patrick Johnsn
>Assignee: Patrick Johnsn
>Priority: Major
> Fix For: 1.14.0
>
>
> Modify the Gradle configuration to generate a manifest file with "Class-Path" 
> and "Dependent-Modules" attributes inside the jars. This manifest will be 
> used when defining modules using the jars.
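Reading such attributes back can be done with `java.util.jar.Manifest`. The helper below is a hypothetical sketch: the jar and module names are examples, and treating `Dependent-Modules` as a whitespace-separated list (the way the JAR specification defines `Class-Path`) is an assumption, since the issue does not describe its format.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.jar.Manifest;

/** Hypothetical helper for reading list-valued main attributes from a jar manifest. */
final class ManifestAttributesSketch {
  private ManifestAttributesSketch() {}

  /** Returns the attribute split on whitespace, or an empty list if it is absent or blank. */
  static List<String> spaceSeparated(Manifest manifest, String attributeName) {
    String value = manifest.getMainAttributes().getValue(attributeName);
    if (value == null || value.trim().isEmpty()) {
      return Collections.emptyList();
    }
    return Arrays.asList(value.trim().split("\\s+"));
  }

  /** Parses manifest text; with a real jar, JarFile.getManifest() would supply the Manifest. */
  static Manifest parse(String text) {
    try (InputStream in = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8))) {
      return new Manifest(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Manifest manifest = parse("Manifest-Version: 1.0\n"
        + "Class-Path: geode-common.jar geode-logging.jar\n"
        + "Dependent-Modules: geode-common geode-logging\n\n");
    System.out.println(spaceSeparated(manifest, "Class-Path"));
    System.out.println(spaceSeparated(manifest, "Dependent-Modules"));
  }
}
```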





[jira] [Updated] (GEODE-8200) Rebalance operations stuck in "IN_PROGRESS" state forever

2020-07-13 Thread Jinmei Liao (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinmei Liao updated GEODE-8200:
---
Fix Version/s: 1.14.0

> Rebalance operations stuck in "IN_PROGRESS" state forever
> -
>
> Key: GEODE-8200
> URL: https://issues.apache.org/jira/browse/GEODE-8200
> Project: Geode
>  Issue Type: Bug
>  Components: management
>Reporter: Aaron Lindsey
>Assignee: Jianxia Chen
>Priority: Major
>  Labels: GeodeOperationAPI
> Fix For: 1.14.0
>
> Attachments: GEODE-8200-exportedLogs.zip
>
>
> We use the management REST API to call rebalance immediately before stopping 
> a server to limit the possibility of data loss. In a cluster with 3 locators, 
> 3 servers, and no regions, we noticed that sometimes the rebalance operation 
> never ends if one of the locators is restarting concurrently with the 
> rebalance operation.
> More specifically, the scenario where we see this issue crop up is during an 
> automated "rolling restart" operation in a Kubernetes environment which 
> proceeds as follows:
> * At most one locator and one server are restarting at any point in time
> * Each locator/server waits until the previous locator/server is fully online 
> before restarting
> * Immediately before stopping a server, a rebalance operation is performed 
> and the server is not stopped until the rebalance operation is completed
> The impact of this issue is that the "rolling restart" operation will never 
> complete, because it cannot proceed with stopping a server until the 
> rebalance operation is completed. A human is then required to intervene and 
> manually trigger a rebalance and stop the server. This type of "rolling 
> restart" operation is triggered fairly often in Kubernetes — any time part of 
> the configuration of the locators or servers changes. 
> The following JSON is a sample response from the management REST API that 
> shows the rebalance operation stuck in "IN_PROGRESS".
> {code}
> {
>   "statusCode": "IN_PROGRESS",
>   "links": {
>     "self": "http://geodecluster-sample-locator.default/management/v1/operations/rebalances/a47f23c8-02b3-443c-a367-636fd6921ea7",
>     "list": "http://geodecluster-sample-locator.default/management/v1/operations/rebalances"
>   },
>   "operationStart": "2020-05-27T22:38:30.619Z",
>   "operationId": "a47f23c8-02b3-443c-a367-636fd6921ea7",
>   "operation": {
>     "simulate": false
>   }
> }
> {code}
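Until the underlying bug is fixed, automation that waits on this status can at least avoid waiting forever. The following is a minimal sketch, assuming a hypothetical `awaitCompletion` helper whose `Supplier` returns the `statusCode` field obtained by polling the operation's `self` link (the HTTP call and JSON parsing are left out). `"TIMED_OUT"` is a sentinel invented for this sketch, not a Geode status code. It does not fix the rebalance; it only turns an indefinite hang into a timeout that the rolling-restart logic can act on.

```java
import java.time.Duration;
import java.util.concurrent.locks.LockSupport;
import java.util.function.Supplier;

final class RebalanceWait {
    /**
     * Hypothetical helper: polls until the status is no longer "IN_PROGRESS",
     * giving up once the timeout elapses. Returns the last status seen, or the
     * sketch-only sentinel "TIMED_OUT".
     */
    static String awaitCompletion(Supplier<String> statusCode, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            String status = statusCode.get();
            if (!"IN_PROGRESS".equals(status)) {
                return status; // finished (successfully or not); caller decides what to do
            }
            if (System.nanoTime() - deadline >= 0) {
                return "TIMED_OUT"; // stuck like the response above; escalate instead of waiting forever
            }
            LockSupport.parkNanos(pollInterval.toNanos()); // back off before polling again
        }
    }

    public static void main(String[] args) {
        // Simulates the stuck operation shown in the JSON response above.
        String result = awaitCompletion(() -> "IN_PROGRESS", Duration.ofMillis(200), Duration.ofMillis(50));
        System.out.println(result); // TIMED_OUT
    }
}
```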



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (GEODE-8357) Exhausting the high priority message thread pool can result in deadlock

2020-07-13 Thread Kirk Lund (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk Lund updated GEODE-8357:
-
Description: 
The system property "DistributionManager.MAX_THREADS" defaults to 100:
{noformat}
int MAX_THREADS = Integer.getInteger("DistributionManager.MAX_THREADS", 100);
{noformat}
The system property used to be defined in geode-core ClusterDistributionManager 
and has moved to geode-core OperationExecutors.
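`Integer.getInteger(name, default)` returns the system property parsed as an integer, or the default when the property is absent or not numeric. The small standalone sketch below shows that behavior with the same property name. It assumes (not confirmed here) that Geode reads `MAX_THREADS` once into a constant when the class loads, so in a real member the override would have to be supplied as a JVM argument at startup, for example `gfsh start server --J=-DDistributionManager.MAX_THREADS=1000`, rather than set at runtime.

```java
final class MaxThreadsProperty {
    // Same lookup as the snippet above: the property's value if set and numeric, else 100.
    static int maxThreads() {
        return Integer.getInteger("DistributionManager.MAX_THREADS", 100);
    }

    public static void main(String[] args) {
        System.clearProperty("DistributionManager.MAX_THREADS");
        System.out.println("default: " + maxThreads());    // default: 100
        // Equivalent to launching the JVM with -DDistributionManager.MAX_THREADS=1000
        System.setProperty("DistributionManager.MAX_THREADS", "1000");
        System.out.println("overridden: " + maxThreads()); // overridden: 1000
        // A non-numeric value silently falls back to the default
        System.setProperty("DistributionManager.MAX_THREADS", "lots");
        System.out.println("invalid: " + maxThreads());    // invalid: 100
    }
}
```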

The value caps the size of both the threadPool and the highPriorityPool in 
ClusterOperationExecutors:
{noformat}
threadPool =
    CoreLoggingExecutors.newThreadPoolWithFeedStatistics("Pooled Message Processor ",
        thread -> stats.incProcessingThreadStarts(), this::doProcessingThread,
        MAX_THREADS, stats.getNormalPoolHelper(), threadMonitor,
        INCOMING_QUEUE_LIMIT, stats.getOverflowQueueHelper());

highPriorityPool = CoreLoggingExecutors.newThreadPoolWithFeedStatistics(
    "Pooled High Priority Message Processor ",
    thread -> stats.incHighPriorityThreadStarts(), this::doHighPriorityThread,
    MAX_THREADS, stats.getHighPriorityPoolHelper(), threadMonitor,
    INCOMING_QUEUE_LIMIT, stats.getHighPriorityQueueHelper());
{noformat}
I have seen server startup hang when recovering lots of expired entries from 
disk while using PDX. The hang appears to be a dlock request for the PDX lock 
that never receives a response. The distributionStats#highPriorityQueueSize 
statistic (viewed in VSD) is maxed out and never drops.

The dlock response granting the PDX lock is stuck in the highPriorityQueue 
because no high priority threads are left to process it. The stack dumps of 
all the high priority threads show tasks, such as recovering a bucket from 
disk, blocked waiting for the PDX lock.
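The failure mode described above is classic thread-pool starvation: every thread of a bounded pool is blocked waiting for a result that can only be produced by a task queued on that same pool. The sketch below is a simplified model, not Geode code; `PoolStarvationDemo` and `grantProcessed` are hypothetical names, a `CountDownLatch` stands in for the PDX dlock, and a plain `Executors.newFixedThreadPool` stands in for the high priority pool. Timeouts are used so the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class PoolStarvationDemo {
    /**
     * `waiters` tasks each occupy a pool thread and block until a "lock grant"
     * runs, but the grant is queued on the same pool. Returns true if the grant
     * ran within one second. Assumes waiters <= poolSize.
     */
    static boolean grantProcessed(int poolSize, int waiters) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch waitersStarted = new CountDownLatch(waiters);
        CountDownLatch lockGranted = new CountDownLatch(1);
        try {
            for (int i = 0; i < waiters; i++) {
                pool.execute(() -> {
                    waitersStarted.countDown();
                    try {
                        // Stand-in for a bucket-recovery task blocked on the PDX lock.
                        lockGranted.await(5, TimeUnit.SECONDS);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // Make sure every waiter holds a pool thread before the grant is queued.
            waitersStarted.await(5, TimeUnit.SECONDS);
            // The response that would unblock the waiters, queued on the same pool.
            pool.execute(lockGranted::countDown);
            return lockGranted.await(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow(); // interrupt blocked waiters so the demo exits promptly
        }
    }

    public static void main(String[] args) {
        System.out.println("pool=2, waiters=2, grant processed: " + grantProcessed(2, 2));
        System.out.println("pool=3, waiters=2, grant processed: " + grantProcessed(3, 2));
    }
}
```

With every thread occupied, the grant cannot run and the first call should report `false`; a single spare thread, which is what a larger limit or a dedicated pool for the response would provide, lets the second call report `true`.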

Several changes could improve this situation, either in conjunction or 
individually:
# improve observability to enable support to identify that this situation has 
occurred
# increase MAX_THREADS default to 1000
# automatically identify this situation and warn the user with a log statement
# automatically prevent this situation
# identify the messages that are prone to causing deadlocks and move them to a 
dedicated thread pool with a higher limit

  was:
The system property "DistributionManager.MAX_THREADS" defaults to 100:
{noformat}
int MAX_THREADS = Integer.getInteger("DistributionManager.MAX_THREADS", 100);
{noformat}
The system property used to be defined in geode-core ClusterDistributionManager 
and has moved to geode-core OperationExecutors.

The value caps the size of both the threadPool and the highPriorityPool in 
ClusterOperationExecutors:
{noformat}
threadPool =
    CoreLoggingExecutors.newThreadPoolWithFeedStatistics("Pooled Message Processor ",
        thread -> stats.incProcessingThreadStarts(), this::doProcessingThread,
        MAX_THREADS, stats.getNormalPoolHelper(), threadMonitor,
        INCOMING_QUEUE_LIMIT, stats.getOverflowQueueHelper());

highPriorityPool = CoreLoggingExecutors.newThreadPoolWithFeedStatistics(
    "Pooled High Priority Message Processor ",
    thread -> stats.incHighPriorityThreadStarts(), this::doHighPriorityThread,
    MAX_THREADS, stats.getHighPriorityPoolHelper(), threadMonitor,
    INCOMING_QUEUE_LIMIT, stats.getHighPriorityQueueHelper());
{noformat}
I have seen server startup hang when recovering lots of expired entries from 
disk while using PDX. The hang appears to be a dlock request for the PDX lock 
that never receives a response. The distributionStats#highPriorityQueueSize 
statistic (viewed in VSD) is maxed out and never drops.

The dlock response granting the PDX lock is stuck in the highPriorityQueue 
because no high priority threads are left to process it. The stack dumps of 
all the high priority threads show tasks, such as recovering a bucket from 
disk, blocked waiting for the PDX lock.

Several changes could improve this situation, either in conjunction or 
individually:
# improve observability to enable support to identify that this situation has 
occurred
# automatically identify this situation and warn the user with a log statement
# automatically prevent this situation
# identify the messages that are prone to causing deadlocks and move them to a 
dedicated thread pool with a higher limit


> Exhausting the high priority message thread pool can result in deadlock
> ---
>
> Key: GEODE-8357
> URL: https://issues.apache.org/jira/browse/GEODE-8357
> Project: Geode
>  Issue Type: Bug
>  Components: messaging
>Affects Versions: 1.0.0-incubating, 1.2.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, 
> 1.7.0, 1.8.0, 1.9.0, 1.10.0, 1.11.0, 1.12.0
>Reporter: Kirk Lund
>Assignee: Kirk Lund
>Priority: Major
>  Labels: Geo

[jira] [Commented] (GEODE-8067) ClassLoader Isolation

2020-07-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157040#comment-17157040
 ] 

ASF GitHub Bot commented on GEODE-8067:
---

lgtm-com[bot] commented on pull request #5357:
URL: https://github.com/apache/geode/pull/5357#issuecomment-65711


   This pull request **introduces 2 alerts** and **fixes 2** when merging 
591c7469807ff61ab71176003f906c986845957a into 
c41e3b4b559bfbc744c8c21844cd126de2ad2fb9 - [view on 
LGTM.com](https://lgtm.com/projects/g/apache/geode/rev/pr-89c2bb8fd7b4c28a26432c538ccc3f17af02)
   
   **new alerts:**
   
   * 2 for Potential input resource leak
   
   **fixed alerts:**
   
   * 2 for Unused variable, import, function or class



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> ClassLoader Isolation
> -
>
> Key: GEODE-8067
> URL: https://issues.apache.org/jira/browse/GEODE-8067
> Project: Geode
>  Issue Type: New Feature
>  Components: client/server
>Reporter: Udo Kohlmeyer
>Assignee: Udo Kohlmeyer
>Priority: Major
>
> This is the root jira for the first pass implementation for [ClassLoader 
> Isolation|https://cwiki.apache.org/confluence/display/GEODE/ClassLoader+Isolation]





[jira] [Commented] (GEODE-8333) Fix PUBSUB hang

2020-07-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157148#comment-17157148
 ] 

ASF GitHub Bot commented on GEODE-8333:
---

jdeppe-pivotal opened a new pull request #5368:
URL: https://github.com/apache/geode/pull/5368


   - Introduce the notion of a Subscription being 'active'. This flag is only
     set once a subscriber has been moved to the 'subscribers' EventLoopGroup.
     This prevents a subscriber from processing a publish message while it is
     still on the 'worker' EventLoopGroup, which may cause a hang.
   - Refactor various MockSubscribers to a single class.
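The 'active' flag described above can be sketched as a gate checked at publish time. This is a minimal illustration only: `ActiveSubscriptionSketch` and its nested `Subscription` are hypothetical stand-ins, not the classes in the pull request, and this version simply skips inactive subscriptions, whereas the actual change may queue or otherwise handle their messages.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

final class ActiveSubscriptionSketch {
    /** Hypothetical stand-in for a PUBSUB subscription. */
    static final class Subscription {
        private final AtomicBoolean active = new AtomicBoolean(false);
        private final List<String> received = new CopyOnWriteArrayList<>();

        // Called only after the subscriber has been moved to the 'subscribers' group.
        void activate() {
            active.set(true);
        }

        boolean isActive() {
            return active.get();
        }

        void deliver(String message) {
            received.add(message);
        }

        List<String> received() {
            return received;
        }
    }

    private final List<Subscription> subscriptions = new CopyOnWriteArrayList<>();

    void subscribe(Subscription subscription) {
        subscriptions.add(subscription);
    }

    // Delivers only to active subscriptions; returns the number of deliveries.
    int publish(String message) {
        int delivered = 0;
        for (Subscription s : subscriptions) {
            if (s.isActive()) {
                s.deliver(message);
                delivered++;
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        ActiveSubscriptionSketch broker = new ActiveSubscriptionSketch();
        Subscription sub = new Subscription();
        broker.subscribe(sub);
        System.out.println(broker.publish("before")); // 0: still on the 'worker' group
        sub.activate();
        System.out.println(broker.publish("after"));  // 1
        System.out.println(sub.received());           // [after]
    }
}
```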
   
   Authored-by: Jens Deppe 
   
   Thank you for submitting a contribution to Apache Geode.
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   ### For all changes:
   - [ ] Is there a JIRA ticket associated with this PR? Is it referenced in 
the commit message?
   
   - [ ] Has your PR been rebased against the latest commit within the target 
branch (typically `develop`)?
   
   - [ ] Is your initial contribution a single, squashed commit?
   
   - [ ] Does `gradlew build` run cleanly?
   
   - [ ] Have you written or updated unit tests to verify your changes?
   
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   
   ### Note:
   Please ensure that, once the PR is submitted, you check Concourse for build issues and
   submit an update to your PR as soon as possible. If you need help, please send an
   email to d...@geode.apache.org.
   





> Fix PUBSUB hang
> ---
>
> Key: GEODE-8333
> URL: https://issues.apache.org/jira/browse/GEODE-8333
> Project: Geode
>  Issue Type: Bug
>  Components: redis
>Reporter: Sarah Abbey
>Priority: Major
>
> PUBSUB hangs with concurrent publishers and subscribers on multiple servers


