[jira] [Commented] (GEODE-8329) Durable CQ not registered as durable after server failover
[ https://issues.apache.org/jira/browse/GEODE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156559#comment-17156559 ]

ASF GitHub Bot commented on GEODE-8329:
---

jvarenina commented on a change in pull request #5360:
URL: https://github.com/apache/geode/pull/5360#discussion_r453489804

## File path: geode-core/src/main/java/org/apache/geode/cache/client/internal/QueueManagerImpl.java ##
@@ -1112,7 +1112,8 @@ private void recoverCqs(Connection recoveredConnection, boolean isDurable) {
 .set(((DefaultQueryService) this.pool.getQueryService()).getUserAttributes(name));
 }
 try {
-if (((CqStateImpl) cqi.getState()).getState() != CqStateImpl.INIT) {
+if (((CqStateImpl) cqi.getState()).getState() != CqStateImpl.INIT

Review comment:
I have tested the TC with redundancy configured, and it seems that the recovery of CQs is done differently in this case. The remaining server sends all relevant CQ information to the starting server within `InitialImageOperation$FilterInfoMessage`. On receiving the message, the starting server registers the CQs as durable (so no problem was observed in this case).

**Primary server:**
```
[debug 2020/07/10 13:30:54.916 CEST :41001 shared unordered uid=1 local port=53683 remote port=45674> tid=0x57] Received message 'InitialImageOperation$RequestFilterInfoMessage(region path='/_gfe_durable_client_with_id_AppCounters_1_queue'; sender=192.168.1.102(server3:31347):41001; processorId=27)' from <192.168.1.102(server3:31347):41001>
```
**Starting server:**
```
[debug 2020/07/10 13:30:54.916 CEST tid=0x48] Sending (InitialImageOperation$RequestFilterInfoMessage(region path='/_gfe_durable_client_with_id_AppCounters_1_queue'; sender=192.168.1.102(server3:31347):41001; processorId=27)) to 1 peers ([192.168.1.102(server1:30862):41000]) via tcp/ip
[debug 2020/07/10 13:30:54.918 CEST :41000 shared unordered uid=5 local port=52175 remote port=46552> tid=0x30] Received message 'InitialImageOperation$FilterInfoMessage processorId=27 from 192.168.1.102(server1:30862):41000; NON_DURABLE allKeys=0; allKeysInv=0; keysOfInterest=0; keysOfInterestInv=0; patternsOfInterest=0; patternsOfInterestInv=0; filtersOfInterest=0; filtersOfInterestInv=0; DURABLE allKeys=0; allKeysInv=0; keysOfInterest=0; keysOfInterestInv=0; patternsOfInterest=0; patternsOfInterestInv=0; filtersOfInterest=0; filtersOfInterestInv=0; cqs=1' from <192.168.1.102(server1:30862):41000>
[debug 2020/07/10 13:30:54.919 CEST tid=0x3d] Processing FilterInfo for proxy: CacheClientProxy[identity(192.168.1.102(31226:loner):45576:8b927d38,connection=1,durableAttributes=DurableClientAttributes[id=AppCounters; timeout=200]); port=57552; primary=false; version=GEODE 1.12.0] : InitialImageOperation$FilterInfoMessage processorId=27 from 192.168.1.102(server1:30862):41000; NON_DURABLE allKeys=0; allKeysInv=0; keysOfInterest=0; keysOfInterestInv=0; patternsOfInterest=0; patternsOfInterestInv=0; filtersOfInterest=0; filtersOfInterestInv=0; DURABLE allKeys=0; allKeysInv=0; keysOfInterest=0; keysOfInterestInv=0; patternsOfInterest=0; patternsOfInterestInv=0; filtersOfInterest=0; filtersOfInterestInv=0; cqs=1
[debug 2020/07/10 13:30:54.944 CEST tid=0x3d] Server side query for the cq: randomTracker is: SELECT * FROM /example-region i where i > 70
[debug 2020/07/10 13:30:54.944 CEST tid=0x3d] Added CQ to the base region: /example-region With key as: randomTracker__AppCounters
[debug 2020/07/10 13:30:54.944 CEST tid=0x3d] Adding CQ into MatchingCQ map, CQName: randomTracker__AppCounters Number of matched querys are: 1
[debug 2020/07/10 13:30:54.945 CEST tid=0x3d] Adding to CQ Repository. CqName : randomTracker ServerCqName : randomTracker__AppCounters
[debug 2020/07/10 13:30:54.945 CEST tid=0x3d] Adding CQ randomTracker__AppCounters to this members FilterProfile.
[debug 2020/07/10 13:30:54.945 CEST tid=0x3d] Successfully created CQ on the server. CqName : randomTracker
```
I can attach the full logs if you need them.

Also, I found the following comment in the client code:
```
// Even though the new redundant queue will usually recover
// subscription information (see bug #39014) from its initial
// image provider, in bug #42280 we found that this is not always
// the case, so clients must always register interest with the new
// redundant server.
if (recoverInterest) {
  recoverInterest(queueConnection, isFirstNewConnection);
}
```
It states that there is a possible case in which the redundant queue isn't recovered by `InitialImageOperation$FilterInfoMessage`, but I haven't been able to reproduce that case. Do you see any benefit in finding this scenario and creating a TC for it, given that recovery of durable CQs is already tested by the TC without redundancy?
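The check described above (did the starting server receive CQ information in the `FilterInfoMessage`?) can be repeated against a debug log with a few lines of code. The following is only a sketch: `FilterInfoLogCheck` and `cqCount` are hypothetical names, not part of Geode, and the pattern assumes the `cqs=N` field appears exactly as in the log lines quoted in this comment.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Geode: reads the "cqs=N" field of a FilterInfoMessage line. */
final class FilterInfoLogCheck {
  // Assumes the field is printed as "cqs=<number>", as in the debug output quoted above.
  private static final Pattern CQS = Pattern.compile("\\bcqs=(\\d+)");

  /** Returns the CQ count if the line is a FilterInfoMessage line that carries a cqs field. */
  static OptionalInt cqCount(String logLine) {
    // RequestFilterInfoMessage lines do not contain this exact substring, so they are skipped.
    if (!logLine.contains("InitialImageOperation$FilterInfoMessage")) {
      return OptionalInt.empty();
    }
    Matcher m = CQS.matcher(logLine);
    return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
  }

  public static void main(String[] args) {
    String line = "Received message 'InitialImageOperation$FilterInfoMessage processorId=27 "
        + "from 192.168.1.102(server1:30862):41000; filtersOfInterestInv=0; cqs=1'";
    System.out.println(cqCount(line));
  }
}
```

A non-empty result greater than zero on the starting server would match the observation that the CQ was handed over during the initial image transfer; it says nothing about whether the CQ was registered as durable.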
[jira] [Commented] (GEODE-8119) Threads are not properly closed when offline disk-store commands are invoked
[ https://issues.apache.org/jira/browse/GEODE-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156655#comment-17156655 ]

ASF GitHub Bot commented on GEODE-8119:
---

jujoramos commented on pull request #5175:
URL: https://github.com/apache/geode/pull/5175#issuecomment-657507636

Hello @mkevo,

This PR has been inactive for quite some time now. Should we close it, or are you planning to continue working on it?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Threads are not properly closed when offline disk-store commands are invoked
>
> Key: GEODE-8119
> URL: https://issues.apache.org/jira/browse/GEODE-8119
> Project: Geode
> Issue Type: Bug
> Components: gfsh
> Reporter: Mario Kevo
> Assignee: Mario Kevo
> Priority: Major
>
> Threads can be opened when you are online and offline, but they are closed only when you are online. Once an offline command has started a thread, it cannot be closed, and after some time a large number of these threads can lead to an OOM exception.
> Another problem is that only disk-dirs is validated, not the diskStore name. So a thread can be created even though there is no diskStore with that name, and it will also hang.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
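The issue description above comes down to worker threads that are started by a command and never released. As a general illustration only (this is not the change proposed in PR #5175, and `OfflineCommandSketch`, `runOffline` and the thread-pool size are assumptions made for the example), one common pattern is to let the command own its executor and shut it down in a `finally` block:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration, not Geode code: a command that always releases its worker threads. */
final class OfflineCommandSketch {

  /** Runs the work on a private pool and shuts the pool down whether or not the work succeeds. */
  static String runOffline(String diskStoreName) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      // Placeholder for the real work; here it only echoes the name.
      Future<String> result = pool.submit(() -> "validated " + diskStoreName);
      return result.get(10, TimeUnit.SECONDS);
    } finally {
      // Without this, the non-daemon worker threads would outlive the command.
      pool.shutdownNow();
      pool.awaitTermination(5, TimeUnit.SECONDS);
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(runOffline("exampleStore"));
  }
}
```

The `finally` block runs on both the success and the failure path, which is the property the issue says is missing for offline commands.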
[jira] [Commented] (GEODE-8119) Threads are not properly closed when offline disk-store commands are invoked
[ https://issues.apache.org/jira/browse/GEODE-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156659#comment-17156659 ]

ASF GitHub Bot commented on GEODE-8119:
---

mkevo commented on pull request #5175:
URL: https://github.com/apache/geode/pull/5175#issuecomment-657509799

> Hello @mkevo,
>
> This PR has been inactive for quite some time now. Should we close it, or are you planning to continue working on it?

Hi @jujoramos,

I have some other commitments; I will come back to this as soon as possible.
[jira] [Commented] (GEODE-8329) Durable CQ not registered as durable after server failover
[ https://issues.apache.org/jira/browse/GEODE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156682#comment-17156682 ]

ASF GitHub Bot commented on GEODE-8329:
---

jvarenina commented on a change in pull request #5360:
URL: https://github.com/apache/geode/pull/5360#discussion_r453607399

## File path: geode-core/src/main/java/org/apache/geode/cache/client/internal/QueueManagerImpl.java ##
@@ -1112,7 +1112,8 @@ private void recoverCqs(Connection recoveredConnection, boolean isDurable) {
 .set(((DefaultQueryService) this.pool.getQueryService()).getUserAttributes(name));
 }
 try {
-if (((CqStateImpl) cqi.getState()).getState() != CqStateImpl.INIT) {
+if (((CqStateImpl) cqi.getState()).getState() != CqStateImpl.INIT

Review comment:
Thanks for your comments! Regarding Interest recovery, I tried the following case:
```
Start three servers.
Start client with the following config:
- redundancy set to 0
- register non-durable Interests
- configure durable id
```
After shutting down the primary server, I expected the client to register/recover the Interests on another running server. I tried exactly the same case with and without the code you suggested. I noticed that some steps related to the recovery of non-durable Interests are missing when using the solution you suggested (please check the logs below). Is this expected?

Without your code:
```
[debug 2020/07/13 13:56:37.574 CEST tid=0x5a] SubscriptionManager redundancy satisfier - Non backup server was made primary. Recovering interest jakov:26486
[info 2020/07/13 13:56:37.574 CEST :41002 port 26486> tid=0x5b] Cache Client Updater Thread on 192.168.1.101(12538):41002 port 26486 (jakov:26486) : ready to process messages.
[debug 2020/07/13 13:56:37.576 CEST tid=0x5a] org.apache.geode.cache.client.internal.QueueManagerImpl@69610f07.recoverSingleRegion starting kind=KEY region=/HAInterestBaseTest_region: {k1=KEYS, k2=KEYS}
[debug 2020/07/13 13:56:37.576 CEST tid=0x5a] registerInterestsStarted: new count = 1
[debug 2020/07/13 13:56:37.578 CEST tid=0x5a] localDestroyNoCallbacks key=k2
[debug 2020/07/13 13:56:37.579 CEST tid=0x5a] basicDestroyPart2: k2, version=null
[debug 2020/07/13 13:56:37.580 CEST tid=0x5a] VersionedThinRegionEntryHeapStringKey1@1f47aafa (key=k2; rawValue=REMOVED_PHASE1; version={v1; rv2; mbr=192.168.1.101(12538):41002; time=1594641397170};member=192.168.1.101(12538):41002) dispatching event EntryEventImpl[op=LOCAL_DESTROY;region=/HAInterestBaseTest_region;key=k2;callbackArg=null;originRemote=false;originMember=jakov(12395:loner):57906:02b10848]
[debug 2020/07/13 13:56:37.580 CEST tid=0x5a] localDestroyNoCallbacks key=k1
[debug 2020/07/13 13:56:37.580 CEST tid=0x5a] basicDestroyPart2: k1, version=null
[debug 2020/07/13 13:56:37.580 CEST tid=0x5a] VersionedThinRegionEntryHeapStringKey1@5ecf6c9c (key=k1; rawValue=REMOVED_PHASE1; version={v1; rv1; mbr=192.168.1.101(12538):41002; time=1594641397148};member=192.168.1.101(12538):41002) dispatching event EntryEventImpl[op=LOCAL_DESTROY;region=/HAInterestBaseTest_region;key=k1;callbackArg=null;originRemote=false;originMember=jakov(12395:loner):57906:02b10848]
[debug 2020/07/13 13:56:37.580 CEST tid=0x5a] org.apache.geode.cache.client.internal.QueueManagerImpl@69610f07.recoverSingleRegion :Endpoint recovered is primary so clearing the keys of interest starting kind=KEY region=/HAInterestBaseTest_region: [k1, k2]
[debug 2020/07/13 13:56:37.584 CEST tid=0x5a] org.apache.geode.internal.cache.LocalRegion[path='/HAInterestBaseTest_region';scope=LOCAL';dataPolicy=NORMAL; concurrencyChecksEnabled] refreshEntriesFromServerKeys count=2 policy=KEYS k1 k2
[debug 2020/07/13 13:56:37.584 CEST tid=0x5a] refreshEntries region=/HAInterestBaseTest_region
[debug 2020/07/13 13:56:37.585 CEST tid=0x5a] registerInterestCompleted: new value = 0
[debug 2020/07/13 13:56:37.585 CEST tid=0x5a] registerInterestCompleted: Signalling end of register-interest
[debug 2020/07/13 13:56:37.586 CEST tid=0x5a] Primary recovery not needed
```
With your code:
```
[debug 2020/07/13 13:44:20.028 CEST tid=0x5a] SubscriptionManager redundancy satisfier - Non backup server was made primary. Recovering interest jakov:28101
[info 2020/07/13 13:44:20.028 CEST :41002 port 28101> tid=0x5b] Cache Client Updater Thread on 192.168.1.101(11053):41002 port 28101 (jakov:28101) : ready to process messages.
[debug 2020/07/13 13:44:20.030 CEST tid=0x5a] Primary recovery not needed
```
[jira] [Commented] (GEODE-8351) DUnit tests for Delta Propagation
[ https://issues.apache.org/jira/browse/GEODE-8351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156768#comment-17156768 ]

ASF GitHub Bot commented on GEODE-8351:
---

sabbeyPivotal commented on a change in pull request #5364:
URL: https://github.com/apache/geode/pull/5364#discussion_r453711863

## File path: geode-redis/src/distributedTest/java/org/apache/geode/redis/internal/data/DeltaDUnitTest.java ##
@@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements. See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License. You may obtain a
+ * copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.geode.redis.internal.data;
+
+import static org.apache.geode.distributed.ConfigurationProperties.MAX_WAIT_TIME_RECONNECT;
+import static org.assertj.core.api.Assertions.assertThat;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.Set;
+
+import org.junit.AfterClass;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Test;
+import redis.clients.jedis.Jedis;
+
+import org.apache.geode.cache.Region;
+import org.apache.geode.cache.partition.PartitionRegionHelper;
+import org.apache.geode.internal.cache.InternalCache;
+import org.apache.geode.test.awaitility.GeodeAwaitility;
+import org.apache.geode.test.dunit.rules.ClusterStartupRule;
+import org.apache.geode.test.dunit.rules.MemberVM;
+import org.apache.geode.test.dunit.rules.RedisClusterStartupRule;
+
+public class DeltaDUnitTest {
+
+  @ClassRule
+  public static RedisClusterStartupRule clusterStartUp = new RedisClusterStartupRule(4);
+
+  private static final String LOCAL_HOST = "127.0.0.1";
+  private static final int SET_SIZE = 10;
+  private static final int JEDIS_TIMEOUT =
+      Math.toIntExact(GeodeAwaitility.getTimeout().toMillis());
+  private static Jedis jedis1;
+  private static Jedis jedis2;
+
+  private static Properties locatorProperties;
+
+  private static MemberVM locator;
+  private static MemberVM server1;
+  private static MemberVM server2;
+
+  private static int redisServerPort1;
+  private static int redisServerPort2;
+
+  @BeforeClass
+  public static void classSetup() {
+    locatorProperties = new Properties();
+    locatorProperties.setProperty(MAX_WAIT_TIME_RECONNECT, "15000");
+
+    locator = clusterStartUp.startLocatorVM(0, locatorProperties);
+    server1 = clusterStartUp.startRedisVM(1, locator.getPort());
+    server2 = clusterStartUp.startRedisVM(2, locator.getPort());
+
+    redisServerPort1 = clusterStartUp.getRedisPort(1);
+    redisServerPort2 = clusterStartUp.getRedisPort(2);
+
+    jedis1 = new Jedis(LOCAL_HOST, redisServerPort1, JEDIS_TIMEOUT);
+    jedis2 = new Jedis(LOCAL_HOST, redisServerPort2, JEDIS_TIMEOUT);
+  }
+
+  @Before
+  public void testSetup() {
+    jedis1.flushAll();
+  }
+
+  @AfterClass
+  public static void tearDown() {
+    jedis1.disconnect();
+    jedis2.disconnect();
+
+    server1.stop();
+    server2.stop();
+  }
+
+  @Test
+  public void shouldCorrectlyPropagateDeltaToSecondaryServer_whenAppending() {
+    String key = "key";
+    String baseValue = "value-";
+    jedis1.set(key, baseValue);
+    for (int i = 0; i < SET_SIZE; i++) {
+      jedis1.set(key, String.valueOf(i));

Review comment:
Yes, thank you!

> DUnit tests for Delta Propagation
>
> Key: GEODE-8351
> URL: https://issues.apache.org/jira/browse/GEODE-8351
> Project: Geode
> Issue Type: Test
> Components: redis, tests
> Reporter: Sarah Abbey
> Priority: Major
>
> Need to confirm that when deltas are propagated, the data is correctly stored on the secondary
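The quoted test is named `..._whenAppending`, yet the loop shown calls `set`, which replaces the whole value. The reviewer's remark is not included in this message, so whether that is what was flagged is a guess. As a rough, Geode-free sketch of what the issue asks to confirm (a delta applied on the primary leaves the secondary with the same data), the toy model below forwards only the appended suffix to a second copy. `AppendDeltaSketch` and all of its methods are invented for the illustration and do not reflect how geode-redis implements deltas.

```java
/** Toy model, not Geode code: a primary that forwards append deltas to one secondary copy. */
final class AppendDeltaSketch {
  private final StringBuilder primary = new StringBuilder();
  private final StringBuilder secondary = new StringBuilder();

  /** Full-value write: both copies are replaced, no delta is involved. */
  void set(String value) {
    primary.setLength(0);
    primary.append(value);
    secondary.setLength(0);
    secondary.append(value);
  }

  /** Append: only the suffix (the delta) is shipped to the secondary. */
  void append(String suffix) {
    primary.append(suffix);
    applyDelta(suffix);
  }

  /** Applies a received delta to the secondary copy. */
  private void applyDelta(String delta) {
    secondary.append(delta);
  }

  String primaryValue() {
    return primary.toString();
  }

  String secondaryValue() {
    return secondary.toString();
  }

  public static void main(String[] args) {
    AppendDeltaSketch store = new AppendDeltaSketch();
    store.set("value-");
    for (int i = 0; i < 10; i++) {
      store.append(String.valueOf(i));
    }
    System.out.println(store.primaryValue().equals(store.secondaryValue()));
  }
}
```

With `set` inside the loop instead of `append`, every iteration would be a full-value write and the delta path would never be exercised, which is why the choice of command matters for this kind of test.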
[jira] [Commented] (GEODE-8351) DUnit tests for Delta Propagation
[ https://issues.apache.org/jira/browse/GEODE-8351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156769#comment-17156769 ]

ASF GitHub Bot commented on GEODE-8351:
---

sabbeyPivotal commented on a change in pull request #5364:
URL: https://github.com/apache/geode/pull/5364#discussion_r453712504

## File path: geode-redis/src/distributedTest/java/org/apache/geode/redis/internal/data/DeltaDUnitTest.java ##
@@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements. See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License. You may obtain a
+ * copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.geode.redis.internal.data;
+
+import static org.apache.geode.distributed.ConfigurationProperties.MAX_WAIT_TIME_RECONNECT;
+import static org.assertj.core.api.Assertions.assertThat;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.Set;
+
+import org.junit.AfterClass;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Test;
+import redis.clients.jedis.Jedis;
+
+import org.apache.geode.cache.Region;
+import org.apache.geode.cache.partition.PartitionRegionHelper;
+import org.apache.geode.internal.cache.InternalCache;
+import org.apache.geode.test.awaitility.GeodeAwaitility;
+import org.apache.geode.test.dunit.rules.ClusterStartupRule;
+import org.apache.geode.test.dunit.rules.MemberVM;
+import org.apache.geode.test.dunit.rules.RedisClusterStartupRule;
+
+public class DeltaDUnitTest {
+
+  @ClassRule
+  public static RedisClusterStartupRule clusterStartUp = new RedisClusterStartupRule(4);
+
+  private static final String LOCAL_HOST = "127.0.0.1";
+  private static final int SET_SIZE = 10;
+  private static final int JEDIS_TIMEOUT =
+      Math.toIntExact(GeodeAwaitility.getTimeout().toMillis());
+  private static Jedis jedis1;
+  private static Jedis jedis2;
+
+  private static Properties locatorProperties;
+
+  private static MemberVM locator;
+  private static MemberVM server1;
+  private static MemberVM server2;
+
+  private static int redisServerPort1;
+  private static int redisServerPort2;
+
+  @BeforeClass
+  public static void classSetup() {
+    locatorProperties = new Properties();
+    locatorProperties.setProperty(MAX_WAIT_TIME_RECONNECT, "15000");
+
+    locator = clusterStartUp.startLocatorVM(0, locatorProperties);
+    server1 = clusterStartUp.startRedisVM(1, locator.getPort());
+    server2 = clusterStartUp.startRedisVM(2, locator.getPort());
+
+    redisServerPort1 = clusterStartUp.getRedisPort(1);
+    redisServerPort2 = clusterStartUp.getRedisPort(2);
+
+    jedis1 = new Jedis(LOCAL_HOST, redisServerPort1, JEDIS_TIMEOUT);
+    jedis2 = new Jedis(LOCAL_HOST, redisServerPort2, JEDIS_TIMEOUT);
+  }
+
+  @Before
+  public void testSetup() {
+    jedis1.flushAll();
+  }
+
+  @AfterClass
+  public static void tearDown() {
+    jedis1.disconnect();
+    jedis2.disconnect();
+
+    server1.stop();
+    server2.stop();
+  }
+
+  @Test
+  public void shouldCorrectlyPropagateDeltaToSecondaryServer_whenAppending() {
+    String key = "key";
+    String baseValue = "value-";
+    jedis1.set(key, baseValue);
+    for (int i = 0; i < SET_SIZE; i++) {

Review comment:
Thank you, meant to change that.
[jira] [Commented] (GEODE-8351) DUnit tests for Delta Propagation
[ https://issues.apache.org/jira/browse/GEODE-8351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156775#comment-17156775 ]

ASF GitHub Bot commented on GEODE-8351:
---

sabbeyPivotal commented on a change in pull request #5364:
URL: https://github.com/apache/geode/pull/5364#discussion_r453718436

## File path: geode-redis/src/distributedTest/java/org/apache/geode/redis/internal/data/DeltaDUnitTest.java ##
@@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements. See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License. You may obtain a
+ * copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.geode.redis.internal.data;
+
+import static org.apache.geode.distributed.ConfigurationProperties.MAX_WAIT_TIME_RECONNECT;
+import static org.assertj.core.api.Assertions.assertThat;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.Set;
+
+import org.junit.AfterClass;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Test;
+import redis.clients.jedis.Jedis;
+
+import org.apache.geode.cache.Region;
+import org.apache.geode.cache.partition.PartitionRegionHelper;
+import org.apache.geode.internal.cache.InternalCache;
+import org.apache.geode.test.awaitility.GeodeAwaitility;
+import org.apache.geode.test.dunit.rules.ClusterStartupRule;
+import org.apache.geode.test.dunit.rules.MemberVM;
+import org.apache.geode.test.dunit.rules.RedisClusterStartupRule;
+
+public class DeltaDUnitTest {
+
+  @ClassRule
+  public static RedisClusterStartupRule clusterStartUp = new RedisClusterStartupRule(4);
+
+  private static final String LOCAL_HOST = "127.0.0.1";
+  private static final int SET_SIZE = 10;
+  private static final int JEDIS_TIMEOUT =
+      Math.toIntExact(GeodeAwaitility.getTimeout().toMillis());
+  private static Jedis jedis1;
+  private static Jedis jedis2;
+
+  private static Properties locatorProperties;
+
+  private static MemberVM locator;
+  private static MemberVM server1;
+  private static MemberVM server2;
+
+  private static int redisServerPort1;
+  private static int redisServerPort2;
+
+  @BeforeClass
+  public static void classSetup() {
+    locatorProperties = new Properties();
+    locatorProperties.setProperty(MAX_WAIT_TIME_RECONNECT, "15000");
+
+    locator = clusterStartUp.startLocatorVM(0, locatorProperties);
+    server1 = clusterStartUp.startRedisVM(1, locator.getPort());
+    server2 = clusterStartUp.startRedisVM(2, locator.getPort());
+
+    redisServerPort1 = clusterStartUp.getRedisPort(1);
+    redisServerPort2 = clusterStartUp.getRedisPort(2);
+
+    jedis1 = new Jedis(LOCAL_HOST, redisServerPort1, JEDIS_TIMEOUT);
+    jedis2 = new Jedis(LOCAL_HOST, redisServerPort2, JEDIS_TIMEOUT);
+  }
+
+  @Before
+  public void testSetup() {
+    jedis1.flushAll();
+  }
+
+  @AfterClass
+  public static void tearDown() {
+    jedis1.disconnect();
+    jedis2.disconnect();
+
+    server1.stop();
+    server2.stop();
+  }
+
+  @Test
+  public void shouldCorrectlyPropagateDeltaToSecondaryServer_whenAppending() {
+    String key = "key";
+    String baseValue = "value-";
+    jedis1.set(key, baseValue);
+    for (int i = 0; i < SET_SIZE; i++) {
+      jedis1.set(key, String.valueOf(i));
+
+      String server1LocalValue = server1.invoke(() -> {
+        InternalCache cache = ClusterStartupRule.getCache();
+        Region region = cache.getRegion("__REDIS_DATA");
+        Region localRegion =
+            PartitionRegionHelper.getLocalData(region);
+
+        RedisData localValue = localRegion.get(new ByteArrayWrapper(key.getBytes()));
+        return localValue.toString();

Review comment:
good call, updating it!
[jira] [Created] (GEODE-8354) CI failure: DescribeClientCommandDUnitTest > describeClientWithoutSubscription FAILED
Owen Nichols created GEODE-8354:
---

Summary: CI failure: DescribeClientCommandDUnitTest > describeClientWithoutSubscription FAILED
Key: GEODE-8354
URL: https://issues.apache.org/jira/browse/GEODE-8354
Project: Geode
Issue Type: Bug
Components: management
Reporter: Owen Nichols

java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.geode.management.internal.cli.functions.ContinuousQueryFunction$ClientInfo

{noformat}
> Task :geode-cq:distributedTest

org.apache.geode.management.internal.cli.commands.DescribeClientCommandDUnitTest > describeClientWithoutSubscription FAILED
    java.lang.AssertionError: Suspicious strings were written to the log during this run.
    Fix the strings or use IgnoredException.addIgnoredException to ignore.
    ---
    Found suspect string in log4j at line 2887

    [error 2020/07/10 23:22:52.403 GMT tid=110] Could not execute "describe client --clientID=10.0.0.97(11700:loner):54634:b3e7093b".
    java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.geode.management.internal.cli.functions.ContinuousQueryFunction$ClientInfo
        at org.apache.geode.management.internal.cli.commands.DescribeClientCommand.describeClient(DescribeClientCommand.java:123)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.springframework.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:282)
        at org.apache.geode.management.internal.cli.remote.CommandExecutor.callInvokeMethod(CommandExecutor.java:151)
        at org.apache.geode.management.internal.cli.remote.CommandExecutor.invokeCommand(CommandExecutor.java:161)
        at org.apache.geode.management.internal.cli.remote.CommandExecutor.execute(CommandExecutor.java:88)
        at org.apache.geode.management.internal.cli.remote.CommandExecutor.execute(CommandExecutor.java:71)
        at org.apache.geode.management.internal.cli.remote.OnlineCommandProcessor.executeCommand(OnlineCommandProcessor.java:130)
        at org.apache.geode.management.internal.cli.remote.OnlineCommandProcessor.executeCommandReturningJson(OnlineCommandProcessor.java:136)
        at org.apache.geode.management.internal.beans.MemberMBeanBridge.processCommand(MemberMBeanBridge.java:1237)
        at org.apache.geode.management.internal.beans.MemberMBean.processCommand(MemberMBean.java:424)
{noformat}

seen in [WindowsGfshDistributedTestOpenJDK8 #335|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/WindowsGfshDistributedTestOpenJDK8/builds/335#A]

=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
[*http://files.apachegeode-ci.info/builds/apache-develop-main/1.14.0-build.0217/test-results/distributedTest/1594428192/*]
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Test report artifacts from this job are available at:

[*http://files.apachegeode-ci.info/builds/apache-develop-main/1.14.0-build.0217/test-artifacts/1594428192/windows-gfshdistributedtest-OpenJDK8-1.14.0-build.0217.tgz*]
[jira] [Commented] (GEODE-8340) Enforce Switch compiler warnings as errors
[ https://issues.apache.org/jira/browse/GEODE-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156820#comment-17156820 ] ASF GitHub Bot commented on GEODE-8340: --- pdxcodemonkey commented on a change in pull request #625: URL: https://github.com/apache/geode-native/pull/625#discussion_r453764965 ## File path: cppcache/src/ExceptionTypes.cpp ## @@ -297,7 +297,25 @@ const std::string& getThreadLocalExceptionMessage(); PutAllPartialResultException ex(message); throw ex; } -default: { +case GF_NOERR: Review comment: Same as above - adding a zillion case statements doesn't make code clearer. ## File path: cppcache/include/geode/DataInput.hpp ## @@ -313,7 +313,53 @@ class APACHE_GEODE_EXPORT DataInput { // empty string break; // TODO: What's the right response here? - default: + case internal::DSCode::FixedIDDefault: Review comment: Wow I really really don't like this. Is there anything we can do to fix the warning _besides_ adding ~65 case statements? This makes the code less readable, not more IMO. ## File path: cppcache/src/TcrMessage.cpp ## @@ -1015,6 +1015,36 @@ void TcrMessage::processChunk(const std::vector& chunk, int32_t len, break; } // fall-through for other cases + if (m_chunkedResult != nullptr) { Review comment: What's this block of code doing? I don't immediately see a place where it was removed/moved, so is it new? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Enforce Switch compiler warnings as errors > -- > > Key: GEODE-8340 > URL: https://issues.apache.org/jira/browse/GEODE-8340 > Project: Geode > Issue Type: Improvement > Components: native client >Reporter: Michael Oleske >Priority: Major > > Given I compile the code without exempting no-switch-enum and > no-implicit-fallthrough and no-covered-switch-default > Then it should compile > Note - was marked as a todo, seems reasonable to tackle all these at once -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8326) CI Failure: FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled times out waiting for client metadata
[ https://issues.apache.org/jira/browse/GEODE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156821#comment-17156821 ] ASF GitHub Bot commented on GEODE-8326: --- pivotal-eshu merged pull request #5358: URL: https://github.com/apache/geode/pull/5358 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > CI Failure: > FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled > times out waiting for client metadata > --- > > Key: GEODE-8326 > URL: https://issues.apache.org/jira/browse/GEODE-8326 > Project: Geode > Issue Type: Bug > Components: client/server, tests >Affects Versions: 1.13.0 >Reporter: Kirk Lund >Assignee: Eric Shu >Priority: Major > Labels: caching-applications > > CI Failure: > http://files.apachegeode-ci.info/builds/apache-support-1-13-main/1.13.0-build.0296/test-results/distributedTest/1592846714/ > {noformat} > org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest > > > clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled[ExecuteFunctionByObject] > FAILED > org.awaitility.core.ConditionTimeoutException: Condition with lambda > expression in > org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest > that uses org.apache.geode.cache.client.internal.ClientMetadataService was > not fulfilled within 5 minutes. 
> at > org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:165) > at > org.awaitility.core.CallableCondition.await(CallableCondition.java:78) > at > org.awaitility.core.CallableCondition.await(CallableCondition.java:26) > at > org.awaitility.core.ConditionFactory.until(ConditionFactory.java:895) > at > org.awaitility.core.ConditionFactory.until(ConditionFactory.java:864) > at > org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest.forceClientMetadataUpdate(FixedPartitioningWithTransactionDistributedTest.java:241) > at > org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest.doFunctionTransactionAndSuspend(FixedPartitioningWithTransactionDistributedTest.java:458) > at > org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled(FixedPartitioningWithTransactionDistributedTest.java:254) > {noformat} > The failure occurs after waiting 5 minutes for the ClientMetadataService to > stabilize. See ClientMetadataService#isMetadataStable. > The timeout occurs within a block of test code that was introduced by Jake in > PR #3840: > {noformat} > GEODE-7006: Fixes function execution by id with transactions. (#3840) > * Fixes test to force and wait for PR metadata to update. > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8326) CI Failure: FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled times out waiting for client metadata
[ https://issues.apache.org/jira/browse/GEODE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156823#comment-17156823 ] ASF subversion and git services commented on GEODE-8326: Commit 9cd8e7d82c90aed804c39bd0fadd31c5d2eac18c in geode's branch refs/heads/develop from Eric Shu [ https://gitbox.apache.org/repos/asf?p=geode.git;h=9cd8e7d ] GEODE-8326: remove 5 minutes wait to get stack dump (#5358) > CI Failure: > FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled > times out waiting for client metadata > --- > > Key: GEODE-8326 > URL: https://issues.apache.org/jira/browse/GEODE-8326 > Project: Geode > Issue Type: Bug > Components: client/server, tests >Affects Versions: 1.13.0 >Reporter: Kirk Lund >Assignee: Eric Shu >Priority: Major > Labels: caching-applications > > CI Failure: > http://files.apachegeode-ci.info/builds/apache-support-1-13-main/1.13.0-build.0296/test-results/distributedTest/1592846714/ > {noformat} > org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest > > > clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled[ExecuteFunctionByObject] > FAILED > org.awaitility.core.ConditionTimeoutException: Condition with lambda > expression in > org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest > that uses org.apache.geode.cache.client.internal.ClientMetadataService was > not fulfilled within 5 minutes. 
> at > org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:165) > at > org.awaitility.core.CallableCondition.await(CallableCondition.java:78) > at > org.awaitility.core.CallableCondition.await(CallableCondition.java:26) > at > org.awaitility.core.ConditionFactory.until(ConditionFactory.java:895) > at > org.awaitility.core.ConditionFactory.until(ConditionFactory.java:864) > at > org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest.forceClientMetadataUpdate(FixedPartitioningWithTransactionDistributedTest.java:241) > at > org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest.doFunctionTransactionAndSuspend(FixedPartitioningWithTransactionDistributedTest.java:458) > at > org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled(FixedPartitioningWithTransactionDistributedTest.java:254) > {noformat} > The failure occurs after waiting 5 minutes for the ClientMetadataService to > stabilize. See ClientMetadataService#isMetadataStable. > The timeout occurs within a block of test code that was introduced by Jake in > PR #3840: > {noformat} > GEODE-7006: Fixes function execution by id with transactions. (#3840) > * Fixes test to force and wait for PR metadata to update. > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8342) Remove non-inclusive language
[ https://issues.apache.org/jira/browse/GEODE-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156825#comment-17156825 ] ASF subversion and git services commented on GEODE-8342: Commit 729185236e66377e5b367d40e43c8654314c60ed in geode-native's branch refs/heads/develop from Jacob Barrett [ https://gitbox.apache.org/repos/asf?p=geode-native.git;h=7291852 ] GEODE-8342: Replace non-inclusive language. (#626) * 'blacklist' wasn't effectively in use anyway, so just remove it. > Remove non-inclusive language > - > > Key: GEODE-8342 > URL: https://issues.apache.org/jira/browse/GEODE-8342 > Project: Geode > Issue Type: Improvement > Components: native client >Reporter: Jacob Barrett >Priority: Major > > Geode native includes some non-inclusive language that should be replaced. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8351) DUnit tests for Delta Propagation
[ https://issues.apache.org/jira/browse/GEODE-8351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156824#comment-17156824 ] ASF GitHub Bot commented on GEODE-8351: --- sabbeyPivotal commented on a change in pull request #5364: URL: https://github.com/apache/geode/pull/5364#discussion_r453770437 ## File path: geode-redis/src/distributedTest/java/org/apache/geode/redis/internal/data/DeltaDUnitTest.java ## @@ -0,0 +1,339 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more contributor license + * agreements. See the NOTICE file distributed with this work for additional information regarding + * copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance with the License. You may obtain a + * copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software distributed under the License + * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express + * or implied. See the License for the specific language governing permissions and limitations under + * the License. 
+ */ + +package org.apache.geode.redis.internal.data; + +import static org.apache.geode.distributed.ConfigurationProperties.MAX_WAIT_TIME_RECONNECT; +import static org.assertj.core.api.Assertions.assertThat; + +import java.util.ArrayList; +import java.util.Collection; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Properties; +import java.util.Set; + +import org.junit.AfterClass; +import org.junit.Before; +import org.junit.BeforeClass; +import org.junit.ClassRule; +import org.junit.Test; +import redis.clients.jedis.Jedis; + +import org.apache.geode.cache.Region; +import org.apache.geode.cache.partition.PartitionRegionHelper; +import org.apache.geode.internal.cache.InternalCache; +import org.apache.geode.test.awaitility.GeodeAwaitility; +import org.apache.geode.test.dunit.rules.ClusterStartupRule; +import org.apache.geode.test.dunit.rules.MemberVM; +import org.apache.geode.test.dunit.rules.RedisClusterStartupRule; + +public class DeltaDUnitTest { + + @ClassRule + public static RedisClusterStartupRule clusterStartUp = new RedisClusterStartupRule(4); + + private static final String LOCAL_HOST = "127.0.0.1"; + private static final int SET_SIZE = 10; + private static final int JEDIS_TIMEOUT = + Math.toIntExact(GeodeAwaitility.getTimeout().toMillis()); + private static Jedis jedis1; + private static Jedis jedis2; + + private static Properties locatorProperties; + + private static MemberVM locator; + private static MemberVM server1; + private static MemberVM server2; + + private static int redisServerPort1; + private static int redisServerPort2; + + @BeforeClass + public static void classSetup() { +locatorProperties = new Properties(); +locatorProperties.setProperty(MAX_WAIT_TIME_RECONNECT, "15000"); + +locator = clusterStartUp.startLocatorVM(0, locatorProperties); +server1 = clusterStartUp.startRedisVM(1, locator.getPort()); +server2 = clusterStartUp.startRedisVM(2, locator.getPort()); + +redisServerPort1 = 
clusterStartUp.getRedisPort(1); +redisServerPort2 = clusterStartUp.getRedisPort(2); + +jedis1 = new Jedis(LOCAL_HOST, redisServerPort1, JEDIS_TIMEOUT); +jedis2 = new Jedis(LOCAL_HOST, redisServerPort2, JEDIS_TIMEOUT); + } + + @Before + public void testSetup() { +jedis1.flushAll(); + } + + @AfterClass + public static void tearDown() { +jedis1.disconnect(); +jedis2.disconnect(); + +server1.stop(); +server2.stop(); + } + + @Test + public void shouldCorrectlyPropagateDeltaToSecondaryServer_whenAppending() { +String key = "key"; +String baseValue = "value-"; +jedis1.set(key, baseValue); +for (int i = 0; i < SET_SIZE; i++) { + jedis1.set(key, String.valueOf(i)); + + String server1LocalValue = server1.invoke(() -> { Review comment: That is true. I made a generic method for getting the local region and getting the correct data. I'm not sure how much further we could go without sacrificing readability? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > DUnit tests for Delta Propagation > - > > Key: GEODE-8351 > URL: https://issues.apache.org/jira/browse/GEODE-8351 > Project: Geode > Issue Type: Test > Components: redis, tests >Reporter: Sarah Abbey >Prio
[jira] [Commented] (GEODE-8340) Enforce Switch compiler warnings as errors
[ https://issues.apache.org/jira/browse/GEODE-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156822#comment-17156822 ] ASF GitHub Bot commented on GEODE-8340: --- moleske commented on a change in pull request #625: URL: https://github.com/apache/geode-native/pull/625#discussion_r453770010 ## File path: cppcache/src/TcrMessage.cpp ## @@ -1015,6 +1015,36 @@ void TcrMessage::processChunk(const std::vector& chunk, int32_t len, break; } // fall-through for other cases + if (m_chunkedResult != nullptr) { Review comment: The previous code did a fallthrough if it didn't make it to the `break;` statement on line 999 or 1015 (which is buried in an `if` and `if else`). This is the same code as on line 1054. I thought about extracting a function but got lazy This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Enforce Switch compiler warnings as errors > -- > > Key: GEODE-8340 > URL: https://issues.apache.org/jira/browse/GEODE-8340 > Project: Geode > Issue Type: Improvement > Components: native client >Reporter: Michael Oleske >Priority: Major > > Given I compile the code without exempting no-switch-enum and > no-implicit-fallthrough and no-covered-switch-default > Then it should compile > Note - was marked as a todo, seems reasonable to tackle all these at once -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8342) Remove non-inclusive language
[ https://issues.apache.org/jira/browse/GEODE-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156826#comment-17156826 ] ASF GitHub Bot commented on GEODE-8342: --- pdxcodemonkey merged pull request #626: URL: https://github.com/apache/geode-native/pull/626 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Remove non-inclusive language > - > > Key: GEODE-8342 > URL: https://issues.apache.org/jira/browse/GEODE-8342 > Project: Geode > Issue Type: Improvement > Components: native client >Reporter: Jacob Barrett >Priority: Major > > Geode native includes some non-inclusive language that should be replaced. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8348) CI does not build benchmarks image
[ https://issues.apache.org/jira/browse/GEODE-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156829#comment-17156829 ] ASF GitHub Bot commented on GEODE-8348: --- smgoller opened a new pull request #131: URL: https://github.com/apache/geode-benchmarks/pull/131 Add the ability to choose what purpose tag is searched for when launching benchmarks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > CI does not build benchmarks image > -- > > Key: GEODE-8348 > URL: https://issues.apache.org/jira/browse/GEODE-8348 > Project: Geode > Issue Type: Bug > Components: ci >Reporter: Sean Goller >Priority: Major > > The CI infrastructure relies on the existence of a google compute image in > order to function. Currently that image is not built anywhere in CI. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8340) Enforce Switch compiler warnings as errors
[ https://issues.apache.org/jira/browse/GEODE-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156831#comment-17156831 ] ASF GitHub Bot commented on GEODE-8340: --- moleske commented on a change in pull request #625: URL: https://github.com/apache/geode-native/pull/625#discussion_r453775780 ## File path: cppcache/include/geode/DataInput.hpp ## @@ -313,7 +313,53 @@ class APACHE_GEODE_EXPORT DataInput { // empty string break; // TODO: What's the right response here? - default: + case internal::DSCode::FixedIDDefault: Review comment: I actually like it since it forces you to acknowledge all possible switches for an enum (which is enumerable). That way you don't accidentally go to the default when adding a new enum. If the general consensus is we prefer not to change `-Wno-switch-enum` then we can revert that commit. I could make it worse by adding `break;` statements in between all the cases :-p This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Enforce Switch compiler warnings as errors > -- > > Key: GEODE-8340 > URL: https://issues.apache.org/jira/browse/GEODE-8340 > Project: Geode > Issue Type: Improvement > Components: native client >Reporter: Michael Oleske >Priority: Major > > Given I compile the code without exempting no-switch-enum and > no-implicit-fallthrough and no-covered-switch-default > Then it should compile > Note - was marked as a todo, seems reasonable to tackle all these at once -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GEODE-8355) long-running-test details need to be world-readable
Robert Houghton created GEODE-8355: -- Summary: long-running-test details need to be world-readable Key: GEODE-8355 URL: https://issues.apache.org/jira/browse/GEODE-8355 Project: Geode Issue Type: Improvement Components: ci Reporter: Robert Houghton -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GEODE-8356) Gfsh export logs to support capturing thread dumps in the logs
[ https://issues.apache.org/jira/browse/GEODE-8356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anilkumar Gingade updated GEODE-8356: - Labels: GeodeOperationAPI (was: ) > Gfsh export logs to support capturing thread dumps in the logs > -- > > Key: GEODE-8356 > URL: https://issues.apache.org/jira/browse/GEODE-8356 > Project: Geode > Issue Type: Bug > Components: gfsh >Reporter: Anilkumar Gingade >Priority: Major > Labels: GeodeOperationAPI > > With an option to say "--with-thread-dump" -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GEODE-8356) Gfsh export logs to support capturing thread dumps in the logs
[ https://issues.apache.org/jira/browse/GEODE-8356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anilkumar Gingade updated GEODE-8356: - Issue Type: Improvement (was: Bug) > Gfsh export logs to support capturing thread dumps in the logs > -- > > Key: GEODE-8356 > URL: https://issues.apache.org/jira/browse/GEODE-8356 > Project: Geode > Issue Type: Improvement > Components: gfsh >Reporter: Anilkumar Gingade >Priority: Major > Labels: GeodeOperationAPI > > With an option to say "--with-thread-dump" -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GEODE-8356) Gfsh export logs to support capturing thread dumps in the logs
Anilkumar Gingade created GEODE-8356: Summary: Gfsh export logs to support capturing thread dumps in the logs Key: GEODE-8356 URL: https://issues.apache.org/jira/browse/GEODE-8356 Project: Geode Issue Type: Bug Components: gfsh Reporter: Anilkumar Gingade With an option to say "--with-thread-dump" -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8355) long-running-test details need to be world-readable
[ https://issues.apache.org/jira/browse/GEODE-8355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156834#comment-17156834 ] ASF GitHub Bot commented on GEODE-8355: --- rhoughton-pivot opened a new pull request #5366: URL: https://github.com/apache/geode/pull/5366 Authored-by: Robert Houghton Thank you for submitting a contribution to Apache Geode. In order to streamline the review of the contribution we ask you to ensure the following steps have been taken: ### For all changes: - [X] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message? - [X] Has your PR been rebased against the latest commit within the target branch (typically `develop`)? - [X] Is your initial contribution a single, squashed commit? - [X] Does `gradlew build` run cleanly? - [n/a] Have you written or updated unit tests to verify your changes? - [n/a] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? ### Note: Please ensure that once the PR is submitted, check Concourse for build issues and submit an update to your PR as soon as possible. If you need help, please send an email to d...@geode.apache.org. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > long-running-test details need to be world-readable > --- > > Key: GEODE-8355 > URL: https://issues.apache.org/jira/browse/GEODE-8355 > Project: Geode > Issue Type: Improvement > Components: ci >Reporter: Robert Houghton >Assignee: Robert Houghton >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (GEODE-8355) long-running-test details need to be world-readable
[ https://issues.apache.org/jira/browse/GEODE-8355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Houghton reassigned GEODE-8355: -- Assignee: Robert Houghton > long-running-test details need to be world-readable > --- > > Key: GEODE-8355 > URL: https://issues.apache.org/jira/browse/GEODE-8355 > Project: Geode > Issue Type: Improvement > Components: ci >Reporter: Robert Houghton >Assignee: Robert Houghton >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8348) CI does not build benchmarks image
[ https://issues.apache.org/jira/browse/GEODE-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156837#comment-17156837 ] ASF GitHub Bot commented on GEODE-8348: --- smgoller opened a new pull request #5367: URL: https://github.com/apache/geode/pull/5367 * Add EC2 builder job to images. * Benchmarks job uses branch-specific image. * Change benchmarks source repository location to the deployed fork's repo instead of forcing apache. * download our own copy of fly to use when deploying pipelines via deploy_meta.sh. Thank you for submitting a contribution to Apache Geode. In order to streamline the review of the contribution we ask you to ensure the following steps have been taken: ### For all changes: - [ ] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message? - [ ] Has your PR been rebased against the latest commit within the target branch (typically `develop`)? - [ ] Is your initial contribution a single, squashed commit? - [ ] Does `gradlew build` run cleanly? - [ ] Have you written or updated unit tests to verify your changes? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? ### Note: Please ensure that once the PR is submitted, check Concourse for build issues and submit an update to your PR as soon as possible. If you need help, please send an email to d...@geode.apache.org. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > CI does not build benchmarks image > -- > > Key: GEODE-8348 > URL: https://issues.apache.org/jira/browse/GEODE-8348 > Project: Geode > Issue Type: Bug > Components: ci >Reporter: Sean Goller >Priority: Major > > The CI infrastructure relies on the existence of a google compute image in > order to function. Currently that image is not built anywhere in CI. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GEODE-8350) Official Docker Image Needs ENTRYPOINT
[ https://issues.apache.org/jira/browse/GEODE-8350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Burcham updated GEODE-8350:
Labels: starter (was: )

> Official Docker Image Needs ENTRYPOINT
> -
>
> Key: GEODE-8350
> URL: https://issues.apache.org/jira/browse/GEODE-8350
> Project: Geode
> Issue Type: Bug
> Reporter: Bill Burcham
> Priority: Major
> Labels: starter
>
> The official Docker image defines a {{CMD ["gfsh"]}} but no {{ENTRYPOINT}}.
> As a result, it's easy to run {{gfsh}} interactively:
> {noformat}
> docker run -it apachegeode/geode
> {noformat}
> but running a non-interactive {{gfsh}} command/script takes extra effort:
> {noformat}
> docker run --entrypoint gfsh apachegeode/geode -e version
> {noformat}
> When this story is complete, the official Docker image will define an
> {{ENTRYPOINT ["gfsh"]}} that will allow execution of a non-interactive
> script like:
> {noformat}
> docker run apachegeode/geode -e version
> {noformat}
> As before, it will be possible to enter an interactive {{gfsh}} session via:
> {noformat}
> docker run -it apachegeode/geode
> {noformat}
> Note, the Dockerfile probably won't need to define any {{CMD}} at all when
> this story is complete.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (GEODE-8357) Exhausting the high priority message pool can result in deadlock
Kirk Lund created GEODE-8357:
Summary: Exhausting the high priority message pool can result in deadlock
Key: GEODE-8357
URL: https://issues.apache.org/jira/browse/GEODE-8357
Project: Geode
Issue Type: Bug
Components: messaging
Reporter: Kirk Lund

The system property "DistributionManager.MAX_THREADS" defaults to 100:
{noformat}
int MAX_THREADS = Integer.getInteger("DistributionManager.MAX_THREADS", 100);
{noformat}
The system property used to be defined in geode-core ClusterDistributionManager and has moved to geode-core OperationExecutors.

The value is used to limit ClusterOperationExecutors threadPool and highPriorityPool:
{noformat}
threadPool =
    CoreLoggingExecutors.newThreadPoolWithFeedStatistics("Pooled Message Processor ",
        thread -> stats.incProcessingThreadStarts(), this::doProcessingThread,
        MAX_THREADS, stats.getNormalPoolHelper(), threadMonitor,
        INCOMING_QUEUE_LIMIT, stats.getOverflowQueueHelper());

highPriorityPool = CoreLoggingExecutors.newThreadPoolWithFeedStatistics(
    "Pooled High Priority Message Processor ",
    thread -> stats.incHighPriorityThreadStarts(), this::doHighPriorityThread,
    MAX_THREADS, stats.getHighPriorityPoolHelper(), threadMonitor,
    INCOMING_QUEUE_LIMIT, stats.getHighPriorityQueueHelper());
{noformat}
I have seen server startup hang when recovering lots of expired entries from disk while using PDX. The hang looks like a dlock request for the PDX lock is not receiving a response. Checking the value for the distributionStats#highPriorityQueueSize statistic (in VSD) shows the value maxed out and never dropping.

The dlock response granting the PDX lock is stuck in the highPriorityQueue because there are no more highPriorityQueue threads available to process the response. All of the highPriorityQueue thread stack dumps show that tasks such as recovering a bucket from disk are blocked waiting for the PDX lock.

Several changes could improve this situation, either in conjunction or separately:
# improve observability to enable support to identify that this situation has occurred
# automatically identify this situation and warn the user with a log statement
# automatically prevent this situation

--
This message was sent by Atlassian Jira (v8.3.4#803005)
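The hang described in GEODE-8357 can be reproduced in miniature with a plain java.util.concurrent pool. The sketch below is only an illustration under stated assumptions: it uses Executors.newFixedThreadPool instead of Geode's CoreLoggingExecutors, a CountDownLatch stands in for the PDX dlock, and the class and method names (PoolExhaustionSketch, allTasksFinish) are hypothetical and do not exist in Geode. It shows that once every pool thread is blocked waiting for the lock, the task that would grant the lock sits in the queue and never runs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names come from Geode.
class PoolExhaustionSketch {

  /**
   * Submits blockedTasks tasks that wait for a "lock", then submits the task
   * that grants the lock to the same pool. Returns true if all waiting tasks
   * finished within the timeout, false if the pool stalled.
   */
  static boolean allTasksFinish(int maxThreads, int blockedTasks, long timeoutMillis) {
    ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
    CountDownLatch lockGranted = new CountDownLatch(1); // stands in for the PDX dlock
    CountDownLatch finished = new CountDownLatch(blockedTasks);
    try {
      for (int i = 0; i < blockedTasks; i++) {
        pool.execute(() -> {
          try {
            lockGranted.await(); // e.g. bucket recovery waiting for the lock
            finished.countDown();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      // The "lock granted" response is processed by the same pool. If every
      // thread is already blocked above, this task stays queued forever.
      pool.execute(lockGranted::countDown);
      return finished.await(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      pool.shutdownNow(); // interrupts any threads still blocked
    }
  }
}
```

Calling allTasksFinish(2, 2, 300) stalls because both threads are occupied by waiters, while allTasksFinish(3, 2, 5000) completes because one thread remains free to process the grant. The same reasoning suggests why a full highPriorityQueueSize that never drops is a useful signal for this condition.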
[jira] [Assigned] (GEODE-8357) Exhausting the high priority message pool can result in deadlock
[ https://issues.apache.org/jira/browse/GEODE-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk Lund reassigned GEODE-8357: Assignee: Kirk Lund > Exhausting the high priority message pool can result in deadlock > > > Key: GEODE-8357 > URL: https://issues.apache.org/jira/browse/GEODE-8357 > Project: Geode > Issue Type: Bug > Components: messaging >Reporter: Kirk Lund >Assignee: Kirk Lund >Priority: Major > > The system property "DistributionManager.MAX_THREADS" default to 100: > {noformat} > int MAX_THREADS = Integer.getInteger("DistributionManager.MAX_THREADS", 100); > {noformat} > The system property used to be defined in geode-core > ClusterDistributionManager and has moved to geode-core OperationExecutors. > The value is used to limit ClusterOperationExecutors threadPool and > highPriorityPool: > {noformat} > threadPool = > CoreLoggingExecutors.newThreadPoolWithFeedStatistics("Pooled Message > Processor ", > thread -> stats.incProcessingThreadStarts(), this::doProcessingThread, > MAX_THREADS, stats.getNormalPoolHelper(), threadMonitor, > INCOMING_QUEUE_LIMIT, stats.getOverflowQueueHelper()); > highPriorityPool = CoreLoggingExecutors.newThreadPoolWithFeedStatistics( > "Pooled High Priority Message Processor ", > thread -> stats.incHighPriorityThreadStarts(), this::doHighPriorityThread, > MAX_THREADS, stats.getHighPriorityPoolHelper(), threadMonitor, > INCOMING_QUEUE_LIMIT, stats.getHighPriorityQueueHelper()); > {noformat} > I have seen server startup hang when recovering lots of expired entries from > disk while using PDX. The hang looks like a dlock request for the PDX lock is > not receiving a response. Checking the value for the > distributionStats#highPriorityQueueSize statistic (in VSD) shows the value > maxed out and never dropping. > The dlock response granting the PDX lock is stuck in the highPriorityQueue > because there are no more highPriorityQueue threads available to process the > response. 
All of the highPriorityQueue thread stack dumps show tasks such as > recovering bucket from disk are blocked waiting for the PDX lock. > Several changes could improve this situation, either in conjunction or > individually: > # improve observability to enable support to identify that this situation has > occurred > # automatically identify this situation and warn the user with a log statement > # automatically prevent this situation -- This message was sent by Atlassian Jira (v8.3.4#803005)
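The hang described above can be reproduced in miniature with plain java.util.concurrent classes. The sketch below is not Geode code: the class name PoolExhaustionSketch, the method lockGrantedInTime, and the one-second timeout are all hypothetical, and a CountDownLatch stands in for the PDX dlock. It assumes only the shape of the problem from the description: every pool thread blocks waiting for a grant, and the task that would deliver the grant is queued behind them in the same bounded pool.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolExhaustionSketch {

  /**
   * Fills a fixed pool of maxThreads with tasks that block until a "lock grant"
   * arrives, then submits the grant itself. Returns true if every waiter was
   * released within one second, false if the pool stalled.
   */
  static boolean lockGrantedInTime(int maxThreads, boolean separateResponsePool)
      throws InterruptedException {
    ExecutorService workPool = Executors.newFixedThreadPool(maxThreads);
    // Hypothetical mitigation: deliver the grant on a different executor.
    ExecutorService responsePool =
        separateResponsePool ? Executors.newSingleThreadExecutor() : workPool;
    CountDownLatch lockGranted = new CountDownLatch(1);
    CountDownLatch waitersDone = new CountDownLatch(maxThreads);
    try {
      for (int i = 0; i < maxThreads; i++) {
        workPool.execute(() -> {
          try {
            lockGranted.await(); // stands in for "blocked waiting for the PDX lock"
            waitersDone.countDown();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      // Stands in for the dlock response. With a shared pool it sits in the queue
      // because all maxThreads workers are already blocked above.
      responsePool.execute(lockGranted::countDown);
      return waitersDone.await(1, TimeUnit.SECONDS);
    } finally {
      // Interrupt any stuck waiters so the sketch always terminates.
      workPool.shutdownNow();
      responsePool.shutdownNow();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("shared pool released waiters: " + lockGrantedInTime(4, false));
    System.out.println("separate pool released waiters: " + lockGrantedInTime(4, true));
  }
}
```

With a shared pool the grant never runs, which mirrors the symptom of a queue-size statistic that stays maxed out. Routing the grant through a separate executor is only one possible way to "automatically prevent this situation"; the issue itself does not say which approach will be taken.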
[jira] [Updated] (GEODE-8357) Exhausting the high priority message pool can result in deadlock
[ https://issues.apache.org/jira/browse/GEODE-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk Lund updated GEODE-8357: - Description: The system property "DistributionManager.MAX_THREADS" default to 100: {noformat} int MAX_THREADS = Integer.getInteger("DistributionManager.MAX_THREADS", 100); {noformat} The system property used to be defined in geode-core ClusterDistributionManager and has moved to geode-core OperationExecutors. The value is used to limit ClusterOperationExecutors threadPool and highPriorityPool: {noformat} threadPool = CoreLoggingExecutors.newThreadPoolWithFeedStatistics("Pooled Message Processor ", thread -> stats.incProcessingThreadStarts(), this::doProcessingThread, MAX_THREADS, stats.getNormalPoolHelper(), threadMonitor, INCOMING_QUEUE_LIMIT, stats.getOverflowQueueHelper()); highPriorityPool = CoreLoggingExecutors.newThreadPoolWithFeedStatistics( "Pooled High Priority Message Processor ", thread -> stats.incHighPriorityThreadStarts(), this::doHighPriorityThread, MAX_THREADS, stats.getHighPriorityPoolHelper(), threadMonitor, INCOMING_QUEUE_LIMIT, stats.getHighPriorityQueueHelper()); {noformat} I have seen server startup hang when recovering lots of expired entries from disk while using PDX. The hang looks like a dlock request for the PDX lock is not receiving a response. Checking the value for the distributionStats#highPriorityQueueSize statistic (in VSD) shows the value maxed out and never dropping. The dlock response granting the PDX lock is stuck in the highPriorityQueue because there are no more highPriorityQueue threads available to process the response. All of the highPriorityQueue thread stack dumps show tasks such as recovering bucket from disk are blocked waiting for the PDX lock. 
Several changes could improve this situation, either in conjunction or individually: # improve observability to enable support to identify that this situation has occurred # automatically identify this situation and warn the user with a log statement # automatically prevent this situation was: The system property "DistributionManager.MAX_THREADS" default to 100: {noformat} int MAX_THREADS = Integer.getInteger("DistributionManager.MAX_THREADS", 100); {noformat} The system property used to be defined in geode-core ClusterDistributionManager and has moved to geode-core OperationExecutors. The value is used to limit ClusterOperationExecutors threadPool and highPriorityPool: {noformat} threadPool = CoreLoggingExecutors.newThreadPoolWithFeedStatistics("Pooled Message Processor ", thread -> stats.incProcessingThreadStarts(), this::doProcessingThread, MAX_THREADS, stats.getNormalPoolHelper(), threadMonitor, INCOMING_QUEUE_LIMIT, stats.getOverflowQueueHelper()); highPriorityPool = CoreLoggingExecutors.newThreadPoolWithFeedStatistics( "Pooled High Priority Message Processor ", thread -> stats.incHighPriorityThreadStarts(), this::doHighPriorityThread, MAX_THREADS, stats.getHighPriorityPoolHelper(), threadMonitor, INCOMING_QUEUE_LIMIT, stats.getHighPriorityQueueHelper()); {noformat} I have seen server startup hang when recovering lots of expired entries from disk while using PDX. The hang looks like a dlock request for the PDX lock is not receiving a response. Checking the value for the distributionStats#highPriorityQueueSize statistic (in VSD) shows the value maxed out and never dropping. The dlock response granting the PDX lock is stuck in the highPriorityQueue because there are no more highPriorityQueue threads available to process the response. All of the highPriorityQueue thread stack dumps show tasks such as recovering bucket from disk are blocked waiting for the PDX lock. 
Several changes could improve this situation, either in conjunction or separately: # improve observability to enable support to identify that this situation has occurred # automatically identify this situation and warn the user with a log statement # automatically prevent this situation > Exhausting the high priority message pool can result in deadlock > > > Key: GEODE-8357 > URL: https://issues.apache.org/jira/browse/GEODE-8357 > Project: Geode > Issue Type: Bug > Components: messaging >Reporter: Kirk Lund >Priority: Major > > The system property "DistributionManager.MAX_THREADS" default to 100: > {noformat} > int MAX_THREADS = Integer.getInteger("DistributionManager.MAX_THREADS", 100); > {noformat} > The system property used to be defined in geode-core > ClusterDistributionManager and has moved to geode-core OperationExecutors. > The value is used to limit ClusterOperationExecutors threadPool and > highPriorityPool: > {noformat} > threadPool = > CoreLoggingExecutors.newThreadPoolWithFeedS
[jira] [Updated] (GEODE-8357) Exhausting the high priority message pool can result in deadlock
[ https://issues.apache.org/jira/browse/GEODE-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk Lund updated GEODE-8357: - Labels: GeodeOperationAPI (was: ) > Exhausting the high priority message pool can result in deadlock > > > Key: GEODE-8357 > URL: https://issues.apache.org/jira/browse/GEODE-8357 > Project: Geode > Issue Type: Bug > Components: messaging >Reporter: Kirk Lund >Assignee: Kirk Lund >Priority: Major > Labels: GeodeOperationAPI > > The system property "DistributionManager.MAX_THREADS" default to 100: > {noformat} > int MAX_THREADS = Integer.getInteger("DistributionManager.MAX_THREADS", 100); > {noformat} > The system property used to be defined in geode-core > ClusterDistributionManager and has moved to geode-core OperationExecutors. > The value is used to limit ClusterOperationExecutors threadPool and > highPriorityPool: > {noformat} > threadPool = > CoreLoggingExecutors.newThreadPoolWithFeedStatistics("Pooled Message > Processor ", > thread -> stats.incProcessingThreadStarts(), this::doProcessingThread, > MAX_THREADS, stats.getNormalPoolHelper(), threadMonitor, > INCOMING_QUEUE_LIMIT, stats.getOverflowQueueHelper()); > highPriorityPool = CoreLoggingExecutors.newThreadPoolWithFeedStatistics( > "Pooled High Priority Message Processor ", > thread -> stats.incHighPriorityThreadStarts(), this::doHighPriorityThread, > MAX_THREADS, stats.getHighPriorityPoolHelper(), threadMonitor, > INCOMING_QUEUE_LIMIT, stats.getHighPriorityQueueHelper()); > {noformat} > I have seen server startup hang when recovering lots of expired entries from > disk while using PDX. The hang looks like a dlock request for the PDX lock is > not receiving a response. Checking the value for the > distributionStats#highPriorityQueueSize statistic (in VSD) shows the value > maxed out and never dropping. 
> The dlock response granting the PDX lock is stuck in the highPriorityQueue > because there are no more highPriorityQueue threads available to process the > response. All of the highPriorityQueue thread stack dumps show tasks such as > recovering bucket from disk are blocked waiting for the PDX lock. > Several changes could improve this situation, either in conjunction or > individually: > # improve observability to enable support to identify that this situation has > occurred > # automatically identify this situation and warn the user with a log statement > # automatically prevent this situation -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GEODE-8357) Exhausting the high priority message pool can result in deadlock
[ https://issues.apache.org/jira/browse/GEODE-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk Lund updated GEODE-8357: - Affects Version/s: 1.0.0-incubating 1.2.0 1.3.0 1.4.0 1.5.0 1.6.0 1.7.0 1.8.0 1.9.0 1.10.0 1.11.0 1.12.0 > Exhausting the high priority message pool can result in deadlock > > > Key: GEODE-8357 > URL: https://issues.apache.org/jira/browse/GEODE-8357 > Project: Geode > Issue Type: Bug > Components: messaging >Affects Versions: 1.0.0-incubating, 1.2.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, > 1.7.0, 1.8.0, 1.9.0, 1.10.0, 1.11.0, 1.12.0 >Reporter: Kirk Lund >Assignee: Kirk Lund >Priority: Major > Labels: GeodeOperationAPI > > The system property "DistributionManager.MAX_THREADS" default to 100: > {noformat} > int MAX_THREADS = Integer.getInteger("DistributionManager.MAX_THREADS", 100); > {noformat} > The system property used to be defined in geode-core > ClusterDistributionManager and has moved to geode-core OperationExecutors. > The value is used to limit ClusterOperationExecutors threadPool and > highPriorityPool: > {noformat} > threadPool = > CoreLoggingExecutors.newThreadPoolWithFeedStatistics("Pooled Message > Processor ", > thread -> stats.incProcessingThreadStarts(), this::doProcessingThread, > MAX_THREADS, stats.getNormalPoolHelper(), threadMonitor, > INCOMING_QUEUE_LIMIT, stats.getOverflowQueueHelper()); > highPriorityPool = CoreLoggingExecutors.newThreadPoolWithFeedStatistics( > "Pooled High Priority Message Processor ", > thread -> stats.incHighPriorityThreadStarts(), this::doHighPriorityThread, > MAX_THREADS, stats.getHighPriorityPoolHelper(), threadMonitor, > INCOMING_QUEUE_LIMIT, stats.getHighPriorityQueueHelper()); > {noformat} > I have seen server startup hang when recovering lots of expired entries from > disk while using PDX. The hang looks like a dlock request for the PDX lock is > not receiving a response. 
Checking the value for the > distributionStats#highPriorityQueueSize statistic (in VSD) shows the value > maxed out and never dropping. > The dlock response granting the PDX lock is stuck in the highPriorityQueue > because there are no more highPriorityQueue threads available to process the > response. All of the highPriorityQueue thread stack dumps show tasks such as > recovering bucket from disk are blocked waiting for the PDX lock. > Several changes could improve this situation, either in conjunction or > individually: > # improve observability to enable support to identify that this situation has > occurred > # automatically identify this situation and warn the user with a log statement > # automatically prevent this situation -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8326) CI Failure: FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled times out waiting for client metadata
[ https://issues.apache.org/jira/browse/GEODE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156848#comment-17156848 ] ASF GitHub Bot commented on GEODE-8326: --- onichols-pivotal commented on a change in pull request #5358: URL: https://github.com/apache/geode/pull/5358#discussion_r453805506 ## File path: geode-core/src/distributedTest/java/org/apache/geode/internal/cache/partitioned/fixed/FixedPartitioningWithTransactionDistributedTest.java ## @@ -238,7 +238,7 @@ private void forceClientMetadataUpdate(Region region) { ClientMetadataService clientMetadataService = ((InternalCache) clientCacheRule.getClientCache()).getClientMetadataService(); clientMetadataService.scheduleGetPRMetaData((InternalRegion) region, true); -await().atMost(5, MINUTES).until(clientMetadataService::isMetadataStable); +await().atMost(5, HOURS).until(clientMetadataService::isMetadataStable); Review comment: It seems like this experiment could have been conducted in the PR pipeline; it shouldn't have needed a merge to develop. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > CI Failure: > FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled > times out waiting for client metadata > --- > > Key: GEODE-8326 > URL: https://issues.apache.org/jira/browse/GEODE-8326 > Project: Geode > Issue Type: Bug > Components: client/server, tests >Affects Versions: 1.13.0 >Reporter: Kirk Lund >Assignee: Eric Shu >Priority: Major > Labels: caching-applications > > CI Failure: > http://files.apachegeode-ci.info/builds/apache-support-1-13-main/1.13.0-build.0296/test-results/distributedTest/1592846714/ > {noformat} > org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest > > > clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled[ExecuteFunctionByObject] > FAILED > org.awaitility.core.ConditionTimeoutException: Condition with lambda > expression in > org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest > that uses org.apache.geode.cache.client.internal.ClientMetadataService was > not fulfilled within 5 minutes. 
> at > org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:165) > at > org.awaitility.core.CallableCondition.await(CallableCondition.java:78) > at > org.awaitility.core.CallableCondition.await(CallableCondition.java:26) > at > org.awaitility.core.ConditionFactory.until(ConditionFactory.java:895) > at > org.awaitility.core.ConditionFactory.until(ConditionFactory.java:864) > at > org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest.forceClientMetadataUpdate(FixedPartitioningWithTransactionDistributedTest.java:241) > at > org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest.doFunctionTransactionAndSuspend(FixedPartitioningWithTransactionDistributedTest.java:458) > at > org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled(FixedPartitioningWithTransactionDistributedTest.java:254) > {noformat} > The failure occurs after waiting 5 minutes for the ClientMetadataService to > stabilize. See ClientMetadataService#isMetadataStable. > The timeout occurs within a block of test code that was introduced by Jake in > PR #3840: > {noformat} > GEODE-7006: Fixes function execution by id with transactions. (#3840) > * Fixes test to force and wait for PR metadata to update. > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8326) CI Failure: FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled times out waiting for client metadata
[ https://issues.apache.org/jira/browse/GEODE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156870#comment-17156870 ] Kirk Lund commented on GEODE-8326: -- [~onichols] Thanks for pointing that out! That wasn't supposed to get merged to develop. > CI Failure: > FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled > times out waiting for client metadata > --- > > Key: GEODE-8326 > URL: https://issues.apache.org/jira/browse/GEODE-8326 > Project: Geode > Issue Type: Bug > Components: client/server, tests >Affects Versions: 1.13.0 >Reporter: Kirk Lund >Assignee: Eric Shu >Priority: Major > Labels: caching-applications > > CI Failure: > http://files.apachegeode-ci.info/builds/apache-support-1-13-main/1.13.0-build.0296/test-results/distributedTest/1592846714/ > {noformat} > org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest > > > clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled[ExecuteFunctionByObject] > FAILED > org.awaitility.core.ConditionTimeoutException: Condition with lambda > expression in > org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest > that uses org.apache.geode.cache.client.internal.ClientMetadataService was > not fulfilled within 5 minutes. 
> at > org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:165) > at > org.awaitility.core.CallableCondition.await(CallableCondition.java:78) > at > org.awaitility.core.CallableCondition.await(CallableCondition.java:26) > at > org.awaitility.core.ConditionFactory.until(ConditionFactory.java:895) > at > org.awaitility.core.ConditionFactory.until(ConditionFactory.java:864) > at > org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest.forceClientMetadataUpdate(FixedPartitioningWithTransactionDistributedTest.java:241) > at > org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest.doFunctionTransactionAndSuspend(FixedPartitioningWithTransactionDistributedTest.java:458) > at > org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled(FixedPartitioningWithTransactionDistributedTest.java:254) > {noformat} > The failure occurs after waiting 5 minutes for the ClientMetadataService to > stabilize. See ClientMetadataService#isMetadataStable. > The timeout occurs within a block of test code that was introduced by Jake in > PR #3840: > {noformat} > GEODE-7006: Fixes function execution by id with transactions. (#3840) > * Fixes test to force and wait for PR metadata to update. > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Issue Comment Deleted] (GEODE-8326) CI Failure: FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled times out waiting for clie
[ https://issues.apache.org/jira/browse/GEODE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk Lund updated GEODE-8326: - Comment: was deleted (was: [~onichols] Thanks for pointing that out! That wasn't supposed to get merged to develop.) > CI Failure: > FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled > times out waiting for client metadata > --- > > Key: GEODE-8326 > URL: https://issues.apache.org/jira/browse/GEODE-8326 > Project: Geode > Issue Type: Bug > Components: client/server, tests >Affects Versions: 1.13.0 >Reporter: Kirk Lund >Assignee: Eric Shu >Priority: Major > Labels: caching-applications > > CI Failure: > http://files.apachegeode-ci.info/builds/apache-support-1-13-main/1.13.0-build.0296/test-results/distributedTest/1592846714/ > {noformat} > org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest > > > clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled[ExecuteFunctionByObject] > FAILED > org.awaitility.core.ConditionTimeoutException: Condition with lambda > expression in > org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest > that uses org.apache.geode.cache.client.internal.ClientMetadataService was > not fulfilled within 5 minutes. 
> at > org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:165) > at > org.awaitility.core.CallableCondition.await(CallableCondition.java:78) > at > org.awaitility.core.CallableCondition.await(CallableCondition.java:26) > at > org.awaitility.core.ConditionFactory.until(ConditionFactory.java:895) > at > org.awaitility.core.ConditionFactory.until(ConditionFactory.java:864) > at > org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest.forceClientMetadataUpdate(FixedPartitioningWithTransactionDistributedTest.java:241) > at > org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest.doFunctionTransactionAndSuspend(FixedPartitioningWithTransactionDistributedTest.java:458) > at > org.apache.geode.internal.cache.partitioned.fixed.FixedPartitioningWithTransactionDistributedTest.clientCanRollbackFunctionOnRegionWithoutFilterAndWithSingleHopEnabled(FixedPartitioningWithTransactionDistributedTest.java:254) > {noformat} > The failure occurs after waiting 5 minutes for the ClientMetadataService to > stabilize. See ClientMetadataService#isMetadataStable. > The timeout occurs within a block of test code that was introduced by Jake in > PR #3840: > {noformat} > GEODE-7006: Fixes function execution by id with transactions. (#3840) > * Fixes test to force and wait for PR metadata to update. > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GEODE-8298) member version comparison sense inconsistent when deciding on multicast
[ https://issues.apache.org/jira/browse/GEODE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Burcham updated GEODE-8298: Description: Since about 2014 when we introduced the {{Version}} class to replace use of {{short}} s all over the place for serialization versions, these two loops in {{GMSMembership.processView()}} have used comparisons that disagree in sense: {code} // We perform the update under a global lock so that other // incoming events will not be lost in terms of our global view. latestViewWriteLock.lock(); try { // first determine the version for multicast message serialization VersionOrdinal version = Version.CURRENT; for (final Entry internalIDLongEntry : surpriseMembers .entrySet()) { ID mbr = internalIDLongEntry.getKey(); final VersionOrdinal itsVersion = mbr.getVersionObject(); if (itsVersion != null && version.compareTo(itsVersion) < 0) { version = itsVersion; } } for (ID mbr : newView.getMembers()) { final VersionOrdinal itsVersion = mbr.getVersionObject(); if (itsVersion != null && itsVersion.compareTo(version) < 0) { version = mbr.getVersionObject(); } } disableMulticastForRollingUpgrade = !version.equals(Version.CURRENT); {code} The goal here is to find the oldest version and if that version is older than our local version we disable multicast. So we want to put the minimum into {{version}}. So the first loop's comparison is wrong and the second one is right. While we are in here let's combine the two loops using {{Stream.concat(surpriseMembers.entrySet().stream().map(entry->entry.getKey()), newView.getMembers().stream()).forEach(member -> ...)}}. Alternatives are described here: https://www.baeldung.com/java-combine-multiple-collections Once we have the combined {{Iterable}} we can use something like {{Collections.min()}} to find the minimum in one swell foop and this whole thing collapses to one or two declarative expressions. 
When this story is complete, the functionality will be in a separate method and we'll have a unit test for it. was: Since about 2014 when we introduced the {{Version}} class to replace use of {{short}}s all over the place for serialization versions, these two loops in {{GMSMembership.processView()}} have used comparisons that disagree in sense: {code} // We perform the update under a global lock so that other // incoming events will not be lost in terms of our global view. latestViewWriteLock.lock(); try { // first determine the version for multicast message serialization VersionOrdinal version = Version.CURRENT; for (final Entry internalIDLongEntry : surpriseMembers .entrySet()) { ID mbr = internalIDLongEntry.getKey(); final VersionOrdinal itsVersion = mbr.getVersionObject(); if (itsVersion != null && version.compareTo(itsVersion) < 0) { version = itsVersion; } } for (ID mbr : newView.getMembers()) { final VersionOrdinal itsVersion = mbr.getVersionObject(); if (itsVersion != null && itsVersion.compareTo(version) < 0) { version = mbr.getVersionObject(); } } disableMulticastForRollingUpgrade = !version.equals(Version.CURRENT); {code} The goal here is to find the oldest version and if that version is older than our local version we disable multicast. So we want to put the minimum into {{version}}. So the first loop's comparison is wrong and the second one is right. While we are in here let's combine the two loops using {{Stream.concat(surpriseMembers.entrySet().stream().map(entry->entry.getKey()), newView.getMembers().stream()).forEach(member -> ...)}}. Alternatives are described here: https://www.baeldung.com/java-combine-multiple-collections Once we have the combined {{Iterable}} we can use something like {{Collections.min()}} to find the minimum in one swell foop and this whole thing collapses to one or two declarative expressions. When this story is complete, the functionality will be in a separate method and we'll have a unit test for it. 
> member version comparison sense inconsistent when deciding on multicast > --- > > Key: GEODE-8298 > URL: https://issues.apache.org/jira/browse/GEODE-8298 > Project: Geode > Issue Type: Bug > Components: membership >Reporter: Bill Burcham >Priority: Major > Labels: starter > > Since about 2014 when we introduced the {{Version}} class to replace use of > {{short}} s all over the place for serialization versions, these two loops in > {{GMSMembership.processView()}} have used comparisons that disagree in sense: > {code} > // We perform the update under a
[jira] [Commented] (GEODE-8340) Enforce Switch compiler warnings as errors
[ https://issues.apache.org/jira/browse/GEODE-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156878#comment-17156878 ] ASF GitHub Bot commented on GEODE-8340: --- moleske commented on a change in pull request #625: URL: https://github.com/apache/geode-native/pull/625#discussion_r453770010 ## File path: cppcache/src/TcrMessage.cpp ## @@ -1015,6 +1015,36 @@ void TcrMessage::processChunk(const std::vector& chunk, int32_t len, break; } // fall-through for other cases + if (m_chunkedResult != nullptr) { Review comment: The previous code fell through whenever it didn't reach the `break;` statement on line 999 or 1015 (each of which is buried in an `if` or `else if`). The added block is the same code as on line 1054. I thought about extracting a function but got lazy. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Enforce Switch compiler warnings as errors > -- > > Key: GEODE-8340 > URL: https://issues.apache.org/jira/browse/GEODE-8340 > Project: Geode > Issue Type: Improvement > Components: native client >Reporter: Michael Oleske >Priority: Major > > Given I compile the code without exempting no-switch-enum and > no-implicit-fallthrough and no-covered-switch-default > Then it should compile > Note - was marked as a todo, seems reasonable to tackle all these at once -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8356) Gfsh export logs to support capturing thread dumps in the logs
[ https://issues.apache.org/jira/browse/GEODE-8356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156888#comment-17156888 ] Anilkumar Gingade commented on GEODE-8356: -- There is already a gfsh command to get the stack trace. This request is to make it possible to collect the logs, stats, and thread dumps in a single step. > Gfsh export logs to support capturing thread dumps in the logs > -- > > Key: GEODE-8356 > URL: https://issues.apache.org/jira/browse/GEODE-8356 > Project: Geode > Issue Type: Improvement > Components: gfsh >Reporter: Anilkumar Gingade >Priority: Major > Labels: GeodeOperationAPI > > With an option to say "--with-thread-dump" -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (GEODE-8298) member version comparison sense inconsistent when deciding on multicast
[ https://issues.apache.org/jira/browse/GEODE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Burcham reassigned GEODE-8298: --- Assignee: Kamilla Aslami > member version comparison sense inconsistent when deciding on multicast > --- > > Key: GEODE-8298 > URL: https://issues.apache.org/jira/browse/GEODE-8298 > Project: Geode > Issue Type: Bug > Components: membership >Reporter: Bill Burcham >Assignee: Kamilla Aslami >Priority: Major > Labels: starter > > Since about 2014 when we introduced the {{Version}} class to replace use of > {{short}} s all over the place for serialization versions, these two loops in > {{GMSMembership.processView()}} have used comparisons that disagree in sense: > {code} > // We perform the update under a global lock so that other > // incoming events will not be lost in terms of our global view. > latestViewWriteLock.lock(); > try { > // first determine the version for multicast message serialization > VersionOrdinal version = Version.CURRENT; > for (final Entry internalIDLongEntry : surpriseMembers > .entrySet()) { > ID mbr = internalIDLongEntry.getKey(); > final VersionOrdinal itsVersion = mbr.getVersionObject(); > if (itsVersion != null && version.compareTo(itsVersion) < 0) { > version = itsVersion; > } > } > for (ID mbr : newView.getMembers()) { > final VersionOrdinal itsVersion = mbr.getVersionObject(); > if (itsVersion != null && itsVersion.compareTo(version) < 0) { > version = mbr.getVersionObject(); > } > } > disableMulticastForRollingUpgrade = !version.equals(Version.CURRENT); > {code} > The goal here is to find the oldest version and if that version is older than > our local version we disable multicast. So we want to put the minimum into > {{version}}. So the first loop's comparison is wrong and the second one is > right. 
> While we are in here let's combine the two loops using > {{Stream.concat(surpriseMembers.entrySet().stream().map(entry->entry.getKey()), > newView.getMembers().stream()).forEach(member -> ...)}}. > Alternatives are described here: > https://www.baeldung.com/java-combine-multiple-collections > Once we have the combined {{Iterable}} we can use something like > {{Collections.min()}} to find the minimum in one swell foop and this whole > thing collapses to one or two declarative expressions. > When this story is complete, the functionality will be in a separate method > and we'll have a unit test for it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
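The refactoring proposed in the description above could look roughly like the following. This is a sketch under stated assumptions, not the change that was eventually made in GMSMembership: the class OldestVersionSketch and its methods are hypothetical, and a generic Comparable type stands in for VersionOrdinal so the example runs on its own without Geode on the classpath. It uses Stream.min rather than Collections.min so that members with no version can be filtered out first.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.Objects;
import java.util.stream.Stream;

public class OldestVersionSketch {

  /**
   * Returns the oldest (minimum) version among the local version and all known
   * members. Null entries model members whose version is unknown and are skipped,
   * as in the original loops.
   */
  static <V extends Comparable<V>> V oldestVersion(
      V current, Collection<V> surpriseMemberVersions, Collection<V> viewMemberVersions) {
    return Stream.concat(surpriseMemberVersions.stream(), viewMemberVersions.stream())
        .filter(Objects::nonNull)
        .min(Comparator.naturalOrder())
        // Only a strictly older member version replaces the local one.
        .filter(oldest -> oldest.compareTo(current) < 0)
        .orElse(current);
  }

  /** Multicast is disabled when some member is older than the local version. */
  static <V extends Comparable<V>> boolean disableMulticast(
      V current, Collection<V> surpriseMemberVersions, Collection<V> viewMemberVersions) {
    return !oldestVersion(current, surpriseMemberVersions, viewMemberVersions).equals(current);
  }
}
```

Because both member collections pass through the same min computation, the comparison can no longer disagree in sense between them, and the method is small enough to unit test in isolation as the description asks.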
[jira] [Assigned] (GEODE-8298) member version comparison sense inconsistent when deciding on multicast
[ https://issues.apache.org/jira/browse/GEODE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Burcham reassigned GEODE-8298: --- Assignee: (was: Kamilla Aslami) > member version comparison sense inconsistent when deciding on multicast > --- > > Key: GEODE-8298 > URL: https://issues.apache.org/jira/browse/GEODE-8298 > Project: Geode > Issue Type: Bug > Components: membership >Reporter: Bill Burcham >Priority: Major > Labels: starter > > Since about 2014 when we introduced the {{Version}} class to replace use of > {{short}} s all over the place for serialization versions, these two loops in > {{GMSMembership.processView()}} have used comparisons that disagree in sense: > {code} > // We perform the update under a global lock so that other > // incoming events will not be lost in terms of our global view. > latestViewWriteLock.lock(); > try { > // first determine the version for multicast message serialization > VersionOrdinal version = Version.CURRENT; > for (final Entry internalIDLongEntry : surpriseMembers > .entrySet()) { > ID mbr = internalIDLongEntry.getKey(); > final VersionOrdinal itsVersion = mbr.getVersionObject(); > if (itsVersion != null && version.compareTo(itsVersion) < 0) { > version = itsVersion; > } > } > for (ID mbr : newView.getMembers()) { > final VersionOrdinal itsVersion = mbr.getVersionObject(); > if (itsVersion != null && itsVersion.compareTo(version) < 0) { > version = mbr.getVersionObject(); > } > } > disableMulticastForRollingUpgrade = !version.equals(Version.CURRENT); > {code} > The goal here is to find the oldest version and if that version is older than > our local version we disable multicast. So we want to put the minimum into > {{version}}. So the first loop's comparison is wrong and the second one is > right. > While we are in here let's combine the two loops using > {{Stream.concat(surpriseMembers.entrySet().stream().map(entry->entry.getKey()), > newView.getMembers().stream()).forEach(member -> ...)}}. 
> Alternatives are described here: > https://www.baeldung.com/java-combine-multiple-collections > Once we have the combined {{Iterable}} we can use something like > {{Collections.min()}} to find the minimum in one swell foop and this whole > thing collapses to one or two declarative expressions. > When this story is complete, the functionality will be in a separate method > and we'll have a unit test for it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
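The "one or two declarative expressions" proposed in the description could look roughly like the sketch below. It is only an illustration under stated assumptions: `Member` and its `Short` version field are hypothetical stand-ins for the real member ID and `VersionOrdinal` types, and `oldestVersion` / `disableMulticast` are made-up names, not Geode methods.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Stream;

final class OldestVersionSketch {
  /** Hypothetical stand-in for a member ID; version is null when unknown. */
  static final class Member {
    final String name;
    final Short version;

    Member(String name, Short version) {
      this.name = name;
      this.version = version;
    }
  }

  /** Returns the oldest (minimum) version among all members, capped at the local version. */
  static short oldestVersion(short current, Map<Member, Long> surpriseMembers,
      Collection<Member> viewMembers) {
    return Stream.concat(surpriseMembers.keySet().stream(), viewMembers.stream())
        .map(member -> member.version)
        .filter(Objects::nonNull) // members with an unknown version are skipped
        .min(Comparator.naturalOrder()) // same comparison sense for both collections
        .filter(version -> version < current) // only an older version replaces ours
        .orElse(current);
  }

  /** Multicast is disabled when some member is older than the local version. */
  static boolean disableMulticast(short current, Map<Member, Long> surpriseMembers,
      Collection<Member> viewMembers) {
    return oldestVersion(current, surpriseMembers, viewMembers) != current;
  }
}
```

Because both collections flow through a single `min`, the two loops can no longer disagree in comparison sense, and the method is small enough to unit test on its own.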
[jira] [Assigned] (GEODE-8298) member version comparison sense inconsistent when deciding on multicast
[ https://issues.apache.org/jira/browse/GEODE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamilla Aslami reassigned GEODE-8298: - Assignee: Kamilla Aslami > member version comparison sense inconsistent when deciding on multicast > --- > > Key: GEODE-8298 > URL: https://issues.apache.org/jira/browse/GEODE-8298 > Project: Geode > Issue Type: Bug > Components: membership >Reporter: Bill Burcham >Assignee: Kamilla Aslami >Priority: Major > Labels: starter > > Since about 2014 when we introduced the {{Version}} class to replace use of > {{short}} s all over the place for serialization versions, these two loops in > {{GMSMembership.processView()}} have used comparisons that disagree in sense: > {code} > // We perform the update under a global lock so that other > // incoming events will not be lost in terms of our global view. > latestViewWriteLock.lock(); > try { > // first determine the version for multicast message serialization > VersionOrdinal version = Version.CURRENT; > for (final Entry internalIDLongEntry : surpriseMembers > .entrySet()) { > ID mbr = internalIDLongEntry.getKey(); > final VersionOrdinal itsVersion = mbr.getVersionObject(); > if (itsVersion != null && version.compareTo(itsVersion) < 0) { > version = itsVersion; > } > } > for (ID mbr : newView.getMembers()) { > final VersionOrdinal itsVersion = mbr.getVersionObject(); > if (itsVersion != null && itsVersion.compareTo(version) < 0) { > version = mbr.getVersionObject(); > } > } > disableMulticastForRollingUpgrade = !version.equals(Version.CURRENT); > {code} > The goal here is to find the oldest version and if that version is older than > our local version we disable multicast. So we want to put the minimum into > {{version}}. So the first loop's comparison is wrong and the second one is > right. 
> While we are in here let's combine the two loops using > {{Stream.concat(surpriseMembers.entrySet().stream().map(entry->entry.getKey()), > newView.getMembers().stream()).forEach(member -> ...)}}. > Alternatives are described here: > https://www.baeldung.com/java-combine-multiple-collections > Once we have the combined {{Iterable}} we can use something like > {{Collections.min()}} to find the minimum in one swell foop and this whole > thing collapses to one or two declarative expressions. > When this story is complete, the functionality will be in a separate method > and we'll have a unit test for it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GEODE-8302) WAN Conflation stats are being incorrectly incremented
[ https://issues.apache.org/jira/browse/GEODE-8302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Donal Evans resolved GEODE-8302. Fix Version/s: 1.14.0 Resolution: Fixed > WAN Conflation stats are being incorrectly incremented > -- > > Key: GEODE-8302 > URL: https://issues.apache.org/jira/browse/GEODE-8302 > Project: Geode > Issue Type: Bug > Components: statistics, wan >Affects Versions: 1.14.0 >Reporter: Donal Evans >Assignee: Alberto Gomez >Priority: Major > Fix For: 1.14.0 > > > When the below diff (which adds checks to confirm that conflation stats are > not incremented in WAN tests with conflation disabled) is applied, the > modified tests fail due to conflation stats being incorrectly incremented. > This behaviour is only observed since the changes included in this PR were > introduced: https://github.com/apache/geode/pull/4928 > {noformat} > diff --git > a/geode-wan/src/distributedTest/java/org/apache/geode/internal/cache/wan/serial/SerialWANStatsDUnitTest.java > > b/geode-wan/src/distributedTest/java/org/apache/geode/internal/cache/wan/serial/SerialWANStatsDUnitTest.java > index b2ed76728f..bc6beb0002 100644 > --- > a/geode-wan/src/distributedTest/java/org/apache/geode/internal/cache/wan/serial/SerialWANStatsDUnitTest.java > +++ > b/geode-wan/src/distributedTest/java/org/apache/geode/internal/cache/wan/serial/SerialWANStatsDUnitTest.java > @@ -209,6 +209,7 @@ public class SerialWANStatsDUnitTest extends WANTestBase { > > vm4.invoke(() -> WANTestBase.checkQueueStats("ln", 0, entries, entries, > entries)); > vm4.invoke(() -> WANTestBase.checkBatchStats("ln", 1, true)); > +vm4.invoke(() -> WANTestBase.checkConflatedStats("ln", 0)); > > // wait until queue is empty > vm5.invoke(() -> await() > @@ -354,6 +355,7 @@ public class SerialWANStatsDUnitTest extends WANTestBase { > > vm4.invoke(() -> WANTestBase.checkQueueStats("ln", 0, entries, entries, > entries)); > vm4.invoke(() -> WANTestBase.checkBatchStats("ln", 2, true, true)); > 
+vm4.invoke(() -> WANTestBase.checkConflatedStats("ln", 0)); > > // wait until queue is empty > vm5.invoke(() -> await() > {noformat} > In addition to the tests above, > SerialWANPropagation_PartitionedRegionDUnitTest.testPartitionedSerialPropagationHA() > fails with incorrectly incremented conflation stats if a similar check is > introduced at the end of the test. Again, without the changes introduced by > PR #4928, this modified test passes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8340) Enforce Switch compiler warnings as errors
[ https://issues.apache.org/jira/browse/GEODE-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156943#comment-17156943 ] ASF GitHub Bot commented on GEODE-8340: --- pdxcodemonkey commented on a change in pull request #625: URL: https://github.com/apache/geode-native/pull/625#discussion_r453895218 ## File path: cppcache/include/geode/DataInput.hpp ## @@ -313,7 +313,53 @@ class APACHE_GEODE_EXPORT DataInput { // empty string break; // TODO: What's the right response here? - default: + case internal::DSCode::FixedIDDefault: Review comment: Sorry, I don't think the chance of getting a warning for free is worth the cost of some egregious cluttering of the code. We're only using 4 of the 61 possible values for a DSCode, so it was kind of an abuse of a switch statement to begin with. This should probably just be a 4-way if-else block. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Enforce Switch compiler warnings as errors > -- > > Key: GEODE-8340 > URL: https://issues.apache.org/jira/browse/GEODE-8340 > Project: Geode > Issue Type: Improvement > Components: native client >Reporter: Michael Oleske >Priority: Major > > Given I compile the code without exempting no-switch-enum and > no-implicit-fallthrough and no-covered-switch-default > Then it should compile > Note - was marked as a todo, seems reasonable to tackle all these at once -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GEODE-8357) Exhausting the high priority message pool can result in deadlock
[ https://issues.apache.org/jira/browse/GEODE-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk Lund updated GEODE-8357: - Description: The system property "DistributionManager.MAX_THREADS" default to 100: {noformat} int MAX_THREADS = Integer.getInteger("DistributionManager.MAX_THREADS", 100); {noformat} The system property used to be defined in geode-core ClusterDistributionManager and has moved to geode-core OperationExecutors. The value is used to limit ClusterOperationExecutors threadPool and highPriorityPool: {noformat} threadPool = CoreLoggingExecutors.newThreadPoolWithFeedStatistics("Pooled Message Processor ", thread -> stats.incProcessingThreadStarts(), this::doProcessingThread, MAX_THREADS, stats.getNormalPoolHelper(), threadMonitor, INCOMING_QUEUE_LIMIT, stats.getOverflowQueueHelper()); highPriorityPool = CoreLoggingExecutors.newThreadPoolWithFeedStatistics( "Pooled High Priority Message Processor ", thread -> stats.incHighPriorityThreadStarts(), this::doHighPriorityThread, MAX_THREADS, stats.getHighPriorityPoolHelper(), threadMonitor, INCOMING_QUEUE_LIMIT, stats.getHighPriorityQueueHelper()); {noformat} I have seen server startup hang when recovering lots of expired entries from disk while using PDX. The hang looks like a dlock request for the PDX lock is not receiving a response. Checking the value for the distributionStats#highPriorityQueueSize statistic (in VSD) shows the value maxed out and never dropping. The dlock response granting the PDX lock is stuck in the highPriorityQueue because there are no more highPriorityQueue threads available to process the response. All of the highPriorityQueue thread stack dumps show tasks such as recovering bucket from disk are blocked waiting for the PDX lock. 
Several changes could improve this situation, either in conjunction or individually: # improve observability to enable support to identify that this situation has occurred # automatically identify this situation and warn the user with a log statement # automatically prevent this situation # identify the messages that are prone to causing deadlocks and move them to a dedicated thread pool with a higher limit was: The system property "DistributionManager.MAX_THREADS" default to 100: {noformat} int MAX_THREADS = Integer.getInteger("DistributionManager.MAX_THREADS", 100); {noformat} The system property used to be defined in geode-core ClusterDistributionManager and has moved to geode-core OperationExecutors. The value is used to limit ClusterOperationExecutors threadPool and highPriorityPool: {noformat} threadPool = CoreLoggingExecutors.newThreadPoolWithFeedStatistics("Pooled Message Processor ", thread -> stats.incProcessingThreadStarts(), this::doProcessingThread, MAX_THREADS, stats.getNormalPoolHelper(), threadMonitor, INCOMING_QUEUE_LIMIT, stats.getOverflowQueueHelper()); highPriorityPool = CoreLoggingExecutors.newThreadPoolWithFeedStatistics( "Pooled High Priority Message Processor ", thread -> stats.incHighPriorityThreadStarts(), this::doHighPriorityThread, MAX_THREADS, stats.getHighPriorityPoolHelper(), threadMonitor, INCOMING_QUEUE_LIMIT, stats.getHighPriorityQueueHelper()); {noformat} I have seen server startup hang when recovering lots of expired entries from disk while using PDX. The hang looks like a dlock request for the PDX lock is not receiving a response. Checking the value for the distributionStats#highPriorityQueueSize statistic (in VSD) shows the value maxed out and never dropping. The dlock response granting the PDX lock is stuck in the highPriorityQueue because there are no more highPriorityQueue threads available to process the response. 
All of the highPriorityQueue thread stack dumps show tasks such as recovering bucket from disk are blocked waiting for the PDX lock. Several changes could improve this situation, either in conjunction or individually: # improve observability to enable support to identify that this situation has occurred # automatically identify this situation and warn the user with a log statement # automatically prevent this situation > Exhausting the high priority message pool can result in deadlock > > > Key: GEODE-8357 > URL: https://issues.apache.org/jira/browse/GEODE-8357 > Project: Geode > Issue Type: Bug > Components: messaging >Affects Versions: 1.0.0-incubating, 1.2.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, > 1.7.0, 1.8.0, 1.9.0, 1.10.0, 1.11.0, 1.12.0 >Reporter: Kirk Lund >Assignee: Kirk Lund >Priority: Major > Labels: GeodeOperationAPI > > The system property "DistributionManager.MAX_THREADS" default to 100: > {noformat} > int MAX_THREADS = Integer.getInteger("DistributionManager.MAX_THREADS",
[jira] [Updated] (GEODE-8357) Exhausting the high priority message thread pool can result in deadlock
[ https://issues.apache.org/jira/browse/GEODE-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk Lund updated GEODE-8357: - Summary: Exhausting the high priority message thread pool can result in deadlock (was: Exhausting the high priority message pool can result in deadlock) > Exhausting the high priority message thread pool can result in deadlock > --- > > Key: GEODE-8357 > URL: https://issues.apache.org/jira/browse/GEODE-8357 > Project: Geode > Issue Type: Bug > Components: messaging >Affects Versions: 1.0.0-incubating, 1.2.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, > 1.7.0, 1.8.0, 1.9.0, 1.10.0, 1.11.0, 1.12.0 >Reporter: Kirk Lund >Assignee: Kirk Lund >Priority: Major > Labels: GeodeOperationAPI > > The system property "DistributionManager.MAX_THREADS" default to 100: > {noformat} > int MAX_THREADS = Integer.getInteger("DistributionManager.MAX_THREADS", 100); > {noformat} > The system property used to be defined in geode-core > ClusterDistributionManager and has moved to geode-core OperationExecutors. > The value is used to limit ClusterOperationExecutors threadPool and > highPriorityPool: > {noformat} > threadPool = > CoreLoggingExecutors.newThreadPoolWithFeedStatistics("Pooled Message > Processor ", > thread -> stats.incProcessingThreadStarts(), this::doProcessingThread, > MAX_THREADS, stats.getNormalPoolHelper(), threadMonitor, > INCOMING_QUEUE_LIMIT, stats.getOverflowQueueHelper()); > highPriorityPool = CoreLoggingExecutors.newThreadPoolWithFeedStatistics( > "Pooled High Priority Message Processor ", > thread -> stats.incHighPriorityThreadStarts(), this::doHighPriorityThread, > MAX_THREADS, stats.getHighPriorityPoolHelper(), threadMonitor, > INCOMING_QUEUE_LIMIT, stats.getHighPriorityQueueHelper()); > {noformat} > I have seen server startup hang when recovering lots of expired entries from > disk while using PDX. The hang looks like a dlock request for the PDX lock is > not receiving a response. 
Checking the value for the > distributionStats#highPriorityQueueSize statistic (in VSD) shows the value > maxed out and never dropping. > The dlock response granting the PDX lock is stuck in the highPriorityQueue > because there are no more highPriorityQueue threads available to process the > response. All of the highPriorityQueue thread stack dumps show tasks such as > recovering bucket from disk are blocked waiting for the PDX lock. > Several changes could improve this situation, either in conjunction or > individually: > # improve observability to enable support to identify that this situation has > occurred > # automatically identify this situation and warn the user with a log statement > # automatically prevent this situation > # identify the messages that are prone to causing deadlocks and move them to > a dedicated thread pool with a higher limit -- This message was sent by Atlassian Jira (v8.3.4#803005)
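The hang described above, and the second suggestion (detect it and warn), can be sketched on a tiny pool with plain `java.util.concurrent` classes. This is an illustration only, not Geode code: `PoolStarvationSketch`, `isSaturated`, and `reproduceAndDetect` are hypothetical names, the latch stands in for the PDX lock, and the queued task stands in for the dlock response.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class PoolStarvationSketch {
  /** True when every worker is busy and work is still waiting in the queue. */
  static boolean isSaturated(ThreadPoolExecutor pool) {
    return pool.getActiveCount() >= pool.getMaximumPoolSize() && !pool.getQueue().isEmpty();
  }

  /** Reproduces the shape of the hang and reports whether the check detects it. */
  static boolean reproduceAndDetect(int maxThreads) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(maxThreads, maxThreads, 0L,
        TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    CountDownLatch lockGranted = new CountDownLatch(1); // stand-in for the PDX lock grant
    CountDownLatch allBlocked = new CountDownLatch(maxThreads);
    try {
      // Fill every worker with a task that waits for the lock grant.
      for (int i = 0; i < maxThreads; i++) {
        pool.execute(() -> {
          allBlocked.countDown();
          try {
            lockGranted.await();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      allBlocked.await();
      // The "response" that would grant the lock is queued behind the blocked workers.
      pool.execute(lockGranted::countDown);
      return isSaturated(pool);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      pool.shutdownNow(); // interrupt the blocked workers so the demo terminates
    }
  }
}
```

A real detector would presumably sample this condition over time and log a warning only when it persists, since a briefly full pool is normal under load.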
[jira] [Commented] (GEODE-8355) long-running-test details need to be world-readable
[ https://issues.apache.org/jira/browse/GEODE-8355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156947#comment-17156947 ] ASF GitHub Bot commented on GEODE-8355: --- rhoughton-pivot merged pull request #5366: URL: https://github.com/apache/geode/pull/5366 > long-running-test details need to be world-readable > --- > > Key: GEODE-8355 > URL: https://issues.apache.org/jira/browse/GEODE-8355 > Project: Geode > Issue Type: Improvement > Components: ci >Reporter: Robert Houghton >Assignee: Robert Houghton >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GEODE-8355) long-running-test details need to be world-readable
[ https://issues.apache.org/jira/browse/GEODE-8355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Houghton resolved GEODE-8355. Fix Version/s: 1.14.0 Resolution: Fixed > long-running-test details need to be world-readable > --- > > Key: GEODE-8355 > URL: https://issues.apache.org/jira/browse/GEODE-8355 > Project: Geode > Issue Type: Improvement > Components: ci >Reporter: Robert Houghton >Assignee: Robert Houghton >Priority: Major > Fix For: 1.14.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8355) long-running-test details need to be world-readable
[ https://issues.apache.org/jira/browse/GEODE-8355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156948#comment-17156948 ] ASF subversion and git services commented on GEODE-8355: Commit c41e3b4b559bfbc744c8c21844cd126de2ad2fb9 in geode's branch refs/heads/develop from Robert Houghton [ https://gitbox.apache.org/repos/asf?p=geode.git;h=c41e3b4 ] GEODE-8355: add `public: true` to the test job in long-running-test (#5366) Authored-by: Robert Houghton > long-running-test details need to be world-readable > --- > > Key: GEODE-8355 > URL: https://issues.apache.org/jira/browse/GEODE-8355 > Project: Geode > Issue Type: Improvement > Components: ci >Reporter: Robert Houghton >Assignee: Robert Houghton >Priority: Major > Fix For: 1.14.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8340) Enforce Switch compiler warnings as errors
[ https://issues.apache.org/jira/browse/GEODE-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156961#comment-17156961 ] ASF GitHub Bot commented on GEODE-8340: --- moleske commented on a change in pull request #625: URL: https://github.com/apache/geode-native/pull/625#discussion_r453915203 ## File path: cppcache/include/geode/DataInput.hpp ## @@ -313,7 +313,53 @@ class APACHE_GEODE_EXPORT DataInput { // empty string break; // TODO: What's the right response here? - default: + case internal::DSCode::FixedIDDefault: Review comment: I'll switch it to an if-else block and we'll see if that cleans up better This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Enforce Switch compiler warnings as errors > -- > > Key: GEODE-8340 > URL: https://issues.apache.org/jira/browse/GEODE-8340 > Project: Geode > Issue Type: Improvement > Components: native client >Reporter: Michael Oleske >Priority: Major > > Given I compile the code without exempting no-switch-enum and > no-implicit-fallthrough and no-covered-switch-default > Then it should compile > Note - was marked as a todo, seems reasonable to tackle all these at once -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8340) Enforce Switch compiler warnings as errors
[ https://issues.apache.org/jira/browse/GEODE-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156981#comment-17156981 ] ASF GitHub Bot commented on GEODE-8340: --- pdxcodemonkey commented on a change in pull request #625: URL: https://github.com/apache/geode-native/pull/625#discussion_r453936057 ## File path: cppcache/include/geode/DataInput.hpp ## @@ -313,7 +313,53 @@ class APACHE_GEODE_EXPORT DataInput { // empty string break; // TODO: What's the right response here? - default: + case internal::DSCode::FixedIDDefault: Review comment: Just gonna leave this here, cause I spent the time & effort to figure it out. Probably slightly better performance than the if-else approach, but I'm still not sure about readability:
```
template <class CharT = char, class... Tail>
inline std::basic_string<CharT, Tail...> readString() {
  std::basic_string<CharT, Tail...> value;
  // Dispatch table: one reader per string type code.
  std::map<internal::DSCode, std::function<void(std::string&)>> readers;
  readers.insert(std::make_pair(
      internal::DSCode::CacheableString,
      [=](std::string& val) { this->readJavaModifiedUtf8(val); }));
  readers.insert(
      std::make_pair(internal::DSCode::CacheableStringHuge,
                     [=](std::string& val) { this->readUtf16Huge(val); }));
  readers.insert(
      std::make_pair(internal::DSCode::CacheableASCIIString,
                     [=](std::string& val) { this->readAscii(val); }));
  readers.insert(
      std::make_pair(internal::DSCode::CacheableASCIIStringHuge,
                     [=](std::string& val) { this->readAsciiHuge(val); }));
  // Read the type code once; a second read() would consume a payload byte.
  auto type = static_cast<internal::DSCode>(read());
  auto it = readers.find(type);
  if (it != readers.end()) {
    it->second(value);
  }
  return value;
}
```
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Enforce Switch compiler warnings as errors > -- > > Key: GEODE-8340 > URL: https://issues.apache.org/jira/browse/GEODE-8340 > Project: Geode > Issue Type: Improvement > Components: native client >Reporter: Michael Oleske >Priority: Major > > Given I compile the code without exempting no-switch-enum and > no-implicit-fallthrough and no-covered-switch-default > Then it should compile > Note - was marked as a todo, seems reasonable to tackle all these at once -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8067) ClassLoader Isolation
[ https://issues.apache.org/jira/browse/GEODE-8067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157002#comment-17157002 ] ASF GitHub Bot commented on GEODE-8067: --- lgtm-com[bot] commented on pull request #5357: URL: https://github.com/apache/geode/pull/5357#issuecomment-657812961 This pull request **introduces 2 alerts** and **fixes 2** when merging 9cb762ed91581557c8c0fca0ac8983834a8e595c into c41e3b4b559bfbc744c8c21844cd126de2ad2fb9 - [view on LGTM.com](https://lgtm.com/projects/g/apache/geode/rev/pr-4bdd1467f1ba8de5c3639c0500e1641205fbfb8f) **new alerts:** * 2 for Potential input resource leak **fixed alerts:** * 2 for Unused variable, import, function or class This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > ClassLoader Isolation > - > > Key: GEODE-8067 > URL: https://issues.apache.org/jira/browse/GEODE-8067 > Project: Geode > Issue Type: New Feature > Components: client/server >Reporter: Udo Kohlmeyer >Assignee: Udo Kohlmeyer >Priority: Major > > This is the root jira for the first pass implementation for [ClassLoader > Isolation|https://cwiki.apache.org/confluence/display/GEODE/ClassLoader+Isolation] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8340) Enforce Switch compiler warnings as errors
[ https://issues.apache.org/jira/browse/GEODE-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157003#comment-17157003 ] ASF GitHub Bot commented on GEODE-8340: --- moleske commented on a change in pull request #625: URL: https://github.com/apache/geode-native/pull/625#discussion_r453958895 ## File path: cppcache/include/geode/DataInput.hpp ## @@ -313,7 +313,53 @@ class APACHE_GEODE_EXPORT DataInput { // empty string break; // TODO: What's the right response here? - default: + case internal::DSCode::FixedIDDefault: Review comment: I checked the `Allow edits and access to secrets by maintainers` so feel free to add this as commit and see if it works (I'm assuming the commit will go to the branch and trigger the [CI pipeline](https://github.com/moleske/geode-native/actions) I've been playing around with) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Enforce Switch compiler warnings as errors > -- > > Key: GEODE-8340 > URL: https://issues.apache.org/jira/browse/GEODE-8340 > Project: Geode > Issue Type: Improvement > Components: native client >Reporter: Michael Oleske >Priority: Major > > Given I compile the code without exempting no-switch-enum and > no-implicit-fallthrough and no-covered-switch-default > Then it should compile > Note - was marked as a todo, seems reasonable to tackle all these at once -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GEODE-6950) Locator can't start if a lot of clients already started
[ https://issues.apache.org/jira/browse/GEODE-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Nedzvetsky updated GEODE-6950: - Affects Version/s: 1.10.0 1.11.0 1.12.0 > Locator can't start if a lot of clients already started > --- > > Key: GEODE-6950 > URL: https://issues.apache.org/jira/browse/GEODE-6950 > Project: Geode > Issue Type: Bug > Components: core >Affects Versions: 1.7.0, 1.8.0, 1.9.0, 1.10.0, 1.11.0, 1.12.0 >Reporter: Eugene Nedzvetsky >Priority: Major > Attachments: 1.log > > > Locator can't start if a few hundred clients are already running. > Steps to reproduce: > 1. Start Locator > 2. Start 300 Geode clients > 3. Stop Locator > 4. Start Locator again > Observe 100% CPU load; after some time the Locator app crashes with timeout > exceptions in the log. > The problem is in the method > org.apache.geode.distributed.internal.InternalLocator.PrimaryHandler#processRequest. > On Locator startup, handlerMapping does not yet have handlers for > LocatorListRequest and ClientConnectionRequest requests, so execution falls > through to the block guarded by 'if (giveup == 0)' (InternalLocator:1185). > The Thread.sleep(1000) pause inside that block runs only on the first > iteration; after that giveup > 0, so the loop spins without any pause and > burns CPU until the deadline. > The Thread.sleep(1000) call should be moved after the 'if (giveup == 0)' > block so that it is called on each iteration. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
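The loop shape the description asks for (set the deadline once, pause on every iteration) can be sketched in isolation as follows. This is a simplified stand-in, not the Geode method: `PollWithPause`, `awaitHandler`, and the `Supplier`-based lookup are hypothetical, and unlike the original this sketch restores the interrupt flag.

```java
import java.util.function.Supplier;

final class PollWithPause {
  /**
   * Polls the lookup until it yields a non-null value or the wait time elapses.
   * The deadline is set once, on the first miss; the pause runs after every miss.
   */
  static <T> T awaitHandler(Supplier<T> lookup, long waitMillis, long pauseMillis) {
    long giveUp = 0;
    while (giveUp == 0 || System.currentTimeMillis() < giveUp) {
      T handler = lookup.get();
      if (handler != null) {
        return handler;
      }
      if (giveUp == 0) {
        // First miss: start the clock, but do not sleep only here.
        giveUp = System.currentTimeMillis() + waitMillis;
      }
      try {
        Thread.sleep(pauseMillis); // runs on every iteration, so the loop never spins
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return null;
      }
    }
    return null;
  }
}
```

With a 1000 ms pause and a 30 second wait, such a loop performs roughly 30 lookups per request instead of spinning continuously until the deadline.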
[jira] [Commented] (GEODE-6950) Locator can't start if a lot of clients already started
[ https://issues.apache.org/jira/browse/GEODE-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157007#comment-17157007 ] Eugene Nedzvetsky commented on GEODE-6950: -- Current version: {code:java} @Override public Object processRequest(Object request) throws IOException { long giveup = 0; while (giveup == 0 || System.currentTimeMillis() < giveup) { TcpHandler handler; if (request instanceof PeerLocatorRequest) { handler = handlerMapping.get(PeerLocatorRequest.class); } else { handler = handlerMapping.get(request.getClass()); } if (handler != null) { return handler.processRequest(request); } if (locatorListener != null) { return locatorListener.handleRequest(request); } // either there is a configuration problem or the locator is still starting up if (giveup == 0) { int locatorWaitTime = internalLocator.getConfig().getLocatorWaitTime(); if (locatorWaitTime <= 0) { // always retry some number of times locatorWaitTime = 30; } giveup = System.currentTimeMillis() + locatorWaitTime * 1000L; try { Thread.sleep(1000); } catch (InterruptedException ignored) { // running in an executor - no need to set the interrupted flag on the thread return null; } } } logger.info( "Received a location request of class {} but the handler for this is either not enabled or is not ready to process requests", request.getClass().getSimpleName()); return null; } {code} Fix: {code} @Override public Object processRequest(Object request) throws IOException { long giveup = 0; while (giveup == 0 || System.currentTimeMillis() < giveup) { TcpHandler handler; if (request instanceof PeerLocatorRequest) { handler = handlerMapping.get(PeerLocatorRequest.class); } else { handler = handlerMapping.get(request.getClass()); } if (handler != null) { return handler.processRequest(request); } if (locatorListener != null) { return locatorListener.handleRequest(request); } // either there is a configuration problem or the locator is still starting up if (giveup == 0) { 
int locatorWaitTime = internalLocator.getConfig().getLocatorWaitTime(); if (locatorWaitTime <= 0) { // always retry some number of times locatorWaitTime = 30; } giveup = System.currentTimeMillis() + locatorWaitTime * 1000L; } try { Thread.sleep(1000); } catch (InterruptedException ignored) { // running in an executor - no need to set the interrupted flag on the thread return null; } } logger.info( "Received a location request of class {} but the handler for this is either not enabled or is not ready to process requests", request.getClass().getSimpleName()); return null; } {code} > Locator can't start if a lot of clients already started > --- > > Key: GEODE-6950 > URL: https://issues.apache.org/jira/browse/GEODE-6950 > Project: Geode > Issue Type: Bug > Components: core >Affects Versions: 1.7.0, 1.8.0, 1.9.0, 1.10.0, 1.11.0, 1.12.0 >Reporter: Eugene Nedzvetsky >Priority: Major > Attachments: 1.log > > > Locator can't start if a few hundred clients are already running. > Steps to reproduce: > 1. Start Locator > 2. Start 300 Geode clients > 3. Stop Locator > 4. Start Locator again > Observe 100% CPU load; after some time the Locator app crashes with timeout > exceptions in the log. > The problem is in the method > org.apache.geode.distributed.internal.InternalLocator.PrimaryHandler#processRequest. > On Locator startup, handlerMapping does not yet have handlers for > LocatorListRequest and ClientConnectionRequest requests, so execution falls > through to the block guarded by 'if (giveup == 0)' (InternalLocator:1185). > The Thread.sleep(1000) pause inside that block runs only on the first > iteration; after that giveup > 0, so the loop spins without any pause and > burns CPU until the deadline. > The Thread.sleep(1000) call should be moved after the 'if (giveup == 0)' > block so that it is called on each iteration. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (GEODE-6950) Locator can't start if a lot of clients already started
[ https://issues.apache.org/jira/browse/GEODE-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157007#comment-17157007 ]

Eugene Nedzvetsky edited comment on GEODE-6950 at 7/13/20, 10:06 PM:
-

org.apache.geode.distributed.internal.PrimaryHandler:85

Current version:
{code:java}
@Override
public Object processRequest(Object request) throws IOException {
  long giveup = 0;
  while (giveup == 0 || System.currentTimeMillis() < giveup) {
    TcpHandler handler;
    if (request instanceof PeerLocatorRequest) {
      handler = handlerMapping.get(PeerLocatorRequest.class);
    } else {
      handler = handlerMapping.get(request.getClass());
    }
    if (handler != null) {
      return handler.processRequest(request);
    }
    if (locatorListener != null) {
      return locatorListener.handleRequest(request);
    }
    // either there is a configuration problem or the locator is still starting up
    if (giveup == 0) {
      int locatorWaitTime = internalLocator.getConfig().getLocatorWaitTime();
      if (locatorWaitTime <= 0) {
        // always retry some number of times
        locatorWaitTime = 30;
      }
      giveup = System.currentTimeMillis() + locatorWaitTime * 1000L;
      try {
        Thread.sleep(1000);
      } catch (InterruptedException ignored) {
        // running in an executor - no need to set the interrupted flag on the thread
        return null;
      }
    }
  }
  logger.info(
      "Received a location request of class {} but the handler for this is either not enabled or is not ready to process requests",
      request.getClass().getSimpleName());
  return null;
}
{code}

Fix:
{code}
@Override
public Object processRequest(Object request) throws IOException {
  long giveup = 0;
  while (giveup == 0 || System.currentTimeMillis() < giveup) {
    TcpHandler handler;
    if (request instanceof PeerLocatorRequest) {
      handler = handlerMapping.get(PeerLocatorRequest.class);
    } else {
      handler = handlerMapping.get(request.getClass());
    }
    if (handler != null) {
      return handler.processRequest(request);
    }
    if (locatorListener != null) {
      return locatorListener.handleRequest(request);
    }
    // either there is a configuration problem or the locator is still starting up
    if (giveup == 0) {
      int locatorWaitTime = internalLocator.getConfig().getLocatorWaitTime();
      if (locatorWaitTime <= 0) {
        // always retry some number of times
        locatorWaitTime = 30;
      }
      giveup = System.currentTimeMillis() + locatorWaitTime * 1000L;
    }
    try {
      Thread.sleep(1000);
    } catch (InterruptedException ignored) {
      // running in an executor - no need to set the interrupted flag on the thread
      return null;
    }
  }
  logger.info(
      "Received a location request of class {} but the handler for this is either not enabled or is not ready to process requests",
      request.getClass().getSimpleName());
  return null;
}
{code}

was (Author: eugenex9):

Current version:
{code:java}
@Override
public Object processRequest(Object request) throws IOException {
  long giveup = 0;
  while (giveup == 0 || System.currentTimeMillis() < giveup) {
    TcpHandler handler;
    if (request instanceof PeerLocatorRequest) {
      handler = handlerMapping.get(PeerLocatorRequest.class);
    } else {
      handler = handlerMapping.get(request.getClass());
    }
    if (handler != null) {
      return handler.processRequest(request);
    }
    if (locatorListener != null) {
      return locatorListener.handleRequest(request);
    }
    // either there is a configuration problem or the locator is still starting up
    if (giveup == 0) {
      int locatorWaitTime = internalLocator.getConfig().getLocatorWaitTime();
      if (locatorWaitTime <= 0) {
        // always retry some number of times
        locatorWaitTime = 30;
      }
      giveup = System.currentTimeMillis() + locatorWaitTime * 1000L;
      try {
        Thread.sleep(1000);
      } catch (InterruptedException ignored) {
        // running in an executor - no need to set the interrupted flag on the thread
        return null;
      }
    }
  }
  logger.info(
      "Received a location request of class {} but the handler for this is either not enabled or is not ready to process requests",
      request.getClass().getSimpleName());
  return null;
}
{code}

Fix:
{code}
@Override
public Object processRequest(Object request) throws IOException {
  long giveup = 0;
  while (giveup == 0 || System.currentTimeMillis() < giveup) {
    TcpHandler handler;
    if (request instanceof PeerLocatorRequest) {
      handler = handlerMapping.get(PeerLocator
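The two versions quoted above differ only in where the sleep sits. In the current version `Thread.sleep(1000)` is inside `if (giveup == 0)`, so it runs once and every later pass through the loop spins without pausing until the deadline; the fix moves it out so that each pass sleeps. A minimal standalone sketch of the fixed loop shape follows. `DeadlineRetry`, `retry`, and its parameters are illustrative names, not Geode APIs, and unlike the quoted code this sketch restores the interrupt flag.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class DeadlineRetry {
  /**
   * Calls attempt until it returns non-null or the deadline passes.
   * The deadline is armed after the first failed attempt, and the
   * thread sleeps after EVERY failed attempt, not just the first one.
   */
  static <T> T retry(Supplier<T> attempt, long waitMillis, long sleepMillis) {
    long giveup = 0;
    while (giveup == 0 || System.currentTimeMillis() < giveup) {
      T result = attempt.get();
      if (result != null) {
        return result;
      }
      if (giveup == 0) {
        giveup = System.currentTimeMillis() + waitMillis;
      }
      try {
        Thread.sleep(sleepMillis); // outside the if: runs on each pass
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // deviation from the quoted code
        return null;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    AtomicInteger calls = new AtomicInteger();
    // The hypothetical handler becomes available on the third attempt.
    String answer = retry(() -> calls.incrementAndGet() < 3 ? null : "ready", 2_000, 50);
    System.out.println(answer + " after " + calls.get() + " attempts");
  }
}
```

With the sleep on every pass, the number of attempts is bounded by roughly `waitMillis / sleepMillis` instead of being limited only by CPU speed.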
[jira] [Commented] (GEODE-7670) Partitioned Region clear operations can occur during concurrent data operations
[ https://issues.apache.org/jira/browse/GEODE-7670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157014#comment-17157014 ] ASF GitHub Bot commented on GEODE-7670: --- gesterzhou commented on a change in pull request #4848: URL: https://github.com/apache/geode/pull/4848#discussion_r453976769 ## File path: geode-core/src/distributedTest/java/org/apache/geode/internal/cache/PartitionedRegionClearWithConcurrentOperationsDUnitTest.java ## @@ -0,0 +1,715 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more contributor license + * agreements. See the NOTICE file distributed with this work for additional information regarding + * copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance with the License. You may obtain a + * copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software distributed under the License + * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express + * or implied. See the License for the specific language governing permissions and limitations under + * the License. 
+ */ +package org.apache.geode.internal.cache; + +import static org.apache.geode.internal.util.ArrayUtils.asList; +import static org.apache.geode.test.awaitility.GeodeAwaitility.await; +import static org.apache.geode.test.dunit.VM.getVM; +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.assertThatThrownBy; + +import java.io.Serializable; +import java.time.Instant; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Optional; +import java.util.stream.IntStream; + +import junitparams.JUnitParamsRunner; +import junitparams.Parameters; +import junitparams.naming.TestCaseName; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.runner.RunWith; + +import org.apache.geode.ForcedDisconnectException; +import org.apache.geode.cache.Cache; +import org.apache.geode.cache.CacheWriter; +import org.apache.geode.cache.CacheWriterException; +import org.apache.geode.cache.PartitionAttributes; +import org.apache.geode.cache.PartitionAttributesFactory; +import org.apache.geode.cache.PartitionedRegionPartialClearException; +import org.apache.geode.cache.Region; +import org.apache.geode.cache.RegionEvent; +import org.apache.geode.cache.RegionShortcut; +import org.apache.geode.cache.partition.PartitionRegionHelper; +import org.apache.geode.cache.util.CacheWriterAdapter; +import org.apache.geode.distributed.DistributedSystemDisconnectedException; +import org.apache.geode.distributed.internal.DMStats; +import org.apache.geode.distributed.internal.InternalDistributedSystem; +import org.apache.geode.distributed.internal.membership.api.MembershipManagerHelper; +import org.apache.geode.internal.cache.versions.RegionVersionHolder; +import org.apache.geode.internal.cache.versions.RegionVersionVector; +import org.apache.geode.internal.cache.versions.VersionSource; +import 
org.apache.geode.test.dunit.AsyncInvocation; +import org.apache.geode.test.dunit.VM; +import org.apache.geode.test.dunit.rules.CacheRule; +import org.apache.geode.test.dunit.rules.DistributedRule; + +/** + * Tests to verify that {@link PartitionedRegion#clear()} operation can be executed multiple times + * on the same region while other cache operations are being executed concurrently and members are + * added or removed. + */ +@RunWith(JUnitParamsRunner.class) +public class PartitionedRegionClearWithConcurrentOperationsDUnitTest implements Serializable { + private static final Integer BUCKETS = 13; + private static final String REGION_NAME = "PartitionedRegion"; + private static final String TEST_CASE_NAME = + "[{index}] {method}(Coordinator:{0}, RegionType:{1})"; + + @Rule + public DistributedRule distributedRule = new DistributedRule(3); + + @Rule + public CacheRule cacheRule = CacheRule.builder().createCacheInAll().build(); + + private VM accessor, server1, server2; + + private enum TestVM { +ACCESSOR(0), SERVER1(1), SERVER2(2); + +final int vmNumber; + +TestVM(int vmNumber) { + this.vmNumber = vmNumber; +} + } + + @SuppressWarnings("unused") + static RegionShortcut[] regionTypes() { +return new RegionShortcut[] { +RegionShortcut.PARTITION, RegionShortcut.PARTITION_REDUNDANT +}; + } + + @SuppressWarnings("unused") + static TestVM[] coordinators() { +return new TestVM[] { +TestVM.SERVER1, TestVM.ACCESSOR +}; + } + + @SuppressWarnings("unused") + static Object[] coordinatorsAndRegionTypes() { +ArrayList parameters = new Ar
[jira] [Updated] (GEODE-8320) SerialWANStatsDUnitTest.testReplicatedSerialPropagationHAWithGroupTransactionEvents is failiing
[ https://issues.apache.org/jira/browse/GEODE-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen Nichols updated GEODE-8320: Fix Version/s: (was: 1.14.0) > SerialWANStatsDUnitTest.testReplicatedSerialPropagationHAWithGroupTransactionEvents > is failiing > --- > > Key: GEODE-8320 > URL: https://issues.apache.org/jira/browse/GEODE-8320 > Project: Geode > Issue Type: Bug > Components: wan >Reporter: Mark Hanson >Assignee: Alberto Gomez >Priority: Major > > {noformat} > org.apache.geode.internal.cache.wan.serial.SerialWANStatsDUnitTest > > testReplicatedSerialPropagationHAWithGroupTransactionEvents FAILED > 11:55:01 > org.apache.geode.test.dunit.RMIException: While invoking > org.apache.geode.internal.cache.wan.serial.SerialWANStatsDUnitTest$$Lambda$134/1929719983.run > in VM 2 running on Host 249227cf2774 with 8 VMs > 11:55:01 > at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:610) > 11:55:01 > at org.apache.geode.test.dunit.VM.invoke(VM.java:437) > 11:55:01 > at > org.apache.geode.internal.cache.wan.serial.SerialWANStatsDUnitTest.testReplicatedSerialPropagationHAWithGroupTransactionEvents(SerialWANStatsDUnitTest.java:578) > 11:55:01 > 11:55:01 > Caused by: > 11:55:01 > org.awaitility.core.ConditionTimeoutException: Assertion condition defined > as a lambda expression in org.apache.geode.internal.cache.wan.WANTestBase > that uses int, intorg.apache.geode.cache.Region Expected region entries: > 2 but actual entries: 1 present region keyset [7435200, <* > Intentionally cut out by Jira submitter *> 8851200] expected:<2> but > was:<1> within 5 minutes. 
> 11:55:01 > at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:165) > 11:55:01 > at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119) > 11:55:01 > at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:31) > 11:55:01 > at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:895) > 11:55:01 > at > org.awaitility.core.ConditionFactory.untilAsserted(ConditionFactory.java:679) > 11:55:01 > at > org.apache.geode.internal.cache.wan.WANTestBase.validateRegionSize(WANTestBase.java:2942) > 11:55:01 > at > org.apache.geode.internal.cache.wan.serial.SerialWANStatsDUnitTest.lambda$testReplicatedSerialPropagationHAWithGroupTransactionEvents$bb17a952$8(SerialWANStatsDUnitTest.java:578) > 11:55:01 > 11:55:01 > Caused by: > 11:55:01 > java.lang.AssertionError: Expected region entries: 2 but actual entries: > 1 present region keyset [7435200, <*Intentionally cut out by Jira > submitter*> ] expected:<2> but was:<1> > 12:31:11 > {noformat} > > > > > {noformat} > =-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI > =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= > http://files.apachegeode-ci.info/builds/apache-develop-main/1.14.0-build.0193/test-results/distributedTest/1593463337/ > =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= > Test report artifacts from this job are available at: > http://files.apachegeode-ci.info/builds/apache-develop-main/1.14.0-build.0193/test-artifacts/1593463337/distributedtestfiles-OpenJDK8-1.14.0-build.0193.tgz > {noformat} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GEODE-8327) Upgrade buildSrc guava version
[ https://issues.apache.org/jira/browse/GEODE-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Houghton resolved GEODE-8327. Fix Version/s: 1.14.0 Resolution: Fixed > Upgrade buildSrc guava version > -- > > Key: GEODE-8327 > URL: https://issues.apache.org/jira/browse/GEODE-8327 > Project: Geode > Issue Type: Improvement > Components: build >Reporter: Robert Houghton >Priority: Major > Fix For: 1.14.0 > > > Gradle buildSrc uses an old guava library that is transitive via > palantir.docker. Pull it up to a modern version to allow better/more plugin > integrations (like Jib) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GEODE-8313) Improve RedisData synchronization for toData
[ https://issues.apache.org/jira/browse/GEODE-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jens Deppe resolved GEODE-8313. --- Fix Version/s: 1.14.0 Resolution: Fixed > Improve RedisData synchronization for toData > > > Key: GEODE-8313 > URL: https://issues.apache.org/jira/browse/GEODE-8313 > Project: Geode > Issue Type: Bug > Components: redis >Reporter: Jens Deppe >Assignee: Jens Deppe >Priority: Major > Fix For: 1.14.0 > > > During GII, redis data structures may throw > {{ConcurrentModificationException}}s from {{toData}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
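A `ConcurrentModificationException` of the kind described in this issue typically arises when one thread iterates a plain `java.util` collection to serialize it while another thread mutates it. The sketch below shows one way to avoid that, by holding the same lock for mutation and serialization. `SynchronizedSetData` is a hypothetical stand-in, not Geode's `RedisData`, and the actual fix for this issue may be structured differently.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

final class SynchronizedSetData {
  private final Set<String> members = new HashSet<>();

  /** Mutations take the object's monitor. */
  synchronized boolean add(String member) {
    return members.add(member);
  }

  /** Serialization takes the same monitor, so it sees a stable snapshot. */
  synchronized byte[] toData() {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(members.size());
      for (String member : members) {
        out.writeUTF(member);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws InterruptedException {
    SynchronizedSetData data = new SynchronizedSetData();
    Thread writer = new Thread(() -> {
      for (int i = 0; i < 10_000; i++) {
        data.add("member-" + i);
      }
    });
    writer.start();
    // Serialize repeatedly while the writer is still adding members.
    while (writer.isAlive()) {
      data.toData();
    }
    writer.join();
    System.out.println("serialized " + data.toData().length + " bytes");
  }
}
```

The trade-off is that a long serialization blocks writers for its whole duration; copying the collection under the lock and serializing the copy outside it is a common alternative.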
[jira] [Commented] (GEODE-8067) ClassLoader Isolation
[ https://issues.apache.org/jira/browse/GEODE-8067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157018#comment-17157018 ] ASF GitHub Bot commented on GEODE-8067: --- lgtm-com[bot] commented on pull request #5357: URL: https://github.com/apache/geode/pull/5357#issuecomment-657836650 This pull request **fixes 2 alerts** when merging ff1286006fddc924485d7668155cc56c58a63933 into c41e3b4b559bfbc744c8c21844cd126de2ad2fb9 - [view on LGTM.com](https://lgtm.com/projects/g/apache/geode/rev/pr-ffc286a7b66fc13ae218b8aa90d7a9e74fd0b606) **fixed alerts:** * 2 for Unused variable, import, function or class This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > ClassLoader Isolation > - > > Key: GEODE-8067 > URL: https://issues.apache.org/jira/browse/GEODE-8067 > Project: Geode > Issue Type: New Feature > Components: client/server >Reporter: Udo Kohlmeyer >Assignee: Udo Kohlmeyer >Priority: Major > > This is the root jira for the first pass implementation for [ClassLoader > Isolation|https://cwiki.apache.org/confluence/display/GEODE/ClassLoader+Isolation] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GEODE-8305) PubSubIntegrationTest failing on Windows
[ https://issues.apache.org/jira/browse/GEODE-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jens Deppe resolved GEODE-8305. --- Fix Version/s: 1.14.0 Resolution: Fixed > PubSubIntegrationTest failing on Windows > > > Key: GEODE-8305 > URL: https://issues.apache.org/jira/browse/GEODE-8305 > Project: Geode > Issue Type: Bug > Components: redis >Reporter: Darrel Schneider >Assignee: Jens Deppe >Priority: Major > Fix For: 1.14.0 > > > {noformat} > org.apache.geode.redis.internal.executor.pubsub.PubSubIntegrationTest > > testSubscribeAndPublishUsingBinaryData FAILED > org.awaitility.core.ConditionTimeoutException: Condition with lambda > expression in > org.apache.geode.redis.internal.executor.pubsub.PubSubIntegrationTest that > uses org.apache.geode.redis.mocks.MockBinarySubscriber was not fulfilled > within 5 minutes. > at > org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:165) > at > org.awaitility.core.CallableCondition.await(CallableCondition.java:78) > at > org.awaitility.core.CallableCondition.await(CallableCondition.java:26) > at > org.awaitility.core.ConditionFactory.until(ConditionFactory.java:895) > at > org.awaitility.core.ConditionFactory.until(ConditionFactory.java:864) > at > org.apache.geode.redis.internal.executor.pubsub.PubSubIntegrationTest.waitFor(PubSubIntegrationTest.java:680) > at > org.apache.geode.redis.internal.executor.pubsub.PubSubIntegrationTest.testSubscribeAndPublishUsingBinaryData(PubSubIntegrationTest.java:327) > org.apache.geode.redis.internal.executor.pubsub.PubSubIntegrationTest > > testUnsubscribingImplicitlyFromAllChannels FAILED > org.awaitility.core.ConditionTimeoutException: Condition with lambda > expression in > org.apache.geode.redis.internal.executor.pubsub.PubSubIntegrationTest that > uses org.apache.geode.redis.mocks.MockSubscriber was not fulfilled within 5 > minutes. 
> at > org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:165) > at > org.awaitility.core.CallableCondition.await(CallableCondition.java:78) > at > org.awaitility.core.CallableCondition.await(CallableCondition.java:26) > at > org.awaitility.core.ConditionFactory.until(ConditionFactory.java:895) > at > org.awaitility.core.ConditionFactory.until(ConditionFactory.java:864) > at > org.apache.geode.redis.internal.executor.pubsub.PubSubIntegrationTest.waitFor(PubSubIntegrationTest.java:680) > at > org.apache.geode.redis.internal.executor.pubsub.PubSubIntegrationTest.testUnsubscribingImplicitlyFromAllChannels(PubSubIntegrationTest.java:400) > org.apache.geode.redis.internal.executor.pubsub.PubSubIntegrationTest > > testPatternSubscribe FAILED > org.awaitility.core.ConditionTimeoutException: Condition with lambda > expression in > org.apache.geode.redis.internal.executor.pubsub.PubSubIntegrationTest that > uses org.apache.geode.redis.mocks.MockSubscriber was not fulfilled within 5 > minutes. > at > org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:165) > at > org.awaitility.core.CallableCondition.await(CallableCondition.java:78) > at > org.awaitility.core.CallableCondition.await(CallableCondition.java:26) > at > org.awaitility.core.ConditionFactory.until(ConditionFactory.java:895) > at > org.awaitility.core.ConditionFactory.until(ConditionFactory.java:864) > at > org.apache.geode.redis.internal.executor.pubsub.PubSubIntegrationTest.waitFor(PubSubIntegrationTest.java:680) > at > org.apache.geode.redis.internal.executor.pubsub.PubSubIntegrationTest.testPatternSubscribe(PubSubIntegrationTest.java:560) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GEODE-8319) NPE due to locator missing cluster config folder
[ https://issues.apache.org/jira/browse/GEODE-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinmei Liao resolved GEODE-8319.
Fix Version/s: 1.14.0
Resolution: Fixed

> NPE due to locator missing cluster config folder
>
>
> Key: GEODE-8319
> URL: https://issues.apache.org/jira/browse/GEODE-8319
> Project: Geode
> Issue Type: Bug
> Components: management
> Reporter: Jinmei Liao
> Assignee: Jinmei Liao
> Priority: Major
> Fix For: 1.14.0
>
>
> Opening a JIRA because I believe any NPE is unacceptable in the product.
> Please throw a more explanatory exception when the cluster configuration
> folder is missing on the locator.
> Exception in thread "main" java.lang.NullPointerException
> at org.apache.geode.distributed.internal.InternalConfigurationPersistenceService.loadSharedConfigurationFromDir(InternalConfigurationPersistenceService.java:672)
> at org.apache.geode.distributed.internal.InternalConfigurationPersistenceService.initSharedConfiguration(InternalConfigurationPersistenceService.java:435)
> at org.apache.geode.distributed.internal.InternalLocator.startConfigurationPersistenceService(InternalLocator.java:1348)
> at org.apache.geode.distributed.internal.InternalLocator.startClusterManagementService(InternalLocator.java:733)
> at org.apache.geode.distributed.internal.InternalLocator.startCache(InternalLocator.java:729)
> at org.apache.geode.distributed.internal.InternalLocator.startDistributedSystem(InternalLocator.java:708)
> at org.apache.geode.distributed.internal.InternalLocator.startLocator(InternalLocator.java:374)
> at org.apache.geode.distributed.LocatorLauncher.start(LocatorLauncher.java:679)
> at com.citi.cate.gemfire.CitiGemFireLocatorStart.run(CitiGemFireLocatorStart.java:47)
> at com.citi.cate.gemfire.CitiGemFireLocatorStart.main(CitiGemFireLocatorStart.java:115)
> Environment

-- This message was sent by Atlassian Jira (v8.3.4#803005)
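As an illustration of the request above, the sketch below replaces a bare NPE with an exception that names the missing folder. It relies on the fact that `File.listFiles` returns `null` when the path is not a readable directory. Whether that is what happens at `InternalConfigurationPersistenceService.java:672` is an assumption; `ConfigDirLoader` and `listConfigGroups` are hypothetical names and this is not the change that resolved the issue.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class ConfigDirLoader {
  /**
   * Returns the names of the sub-directories of configDir.
   * Throws a descriptive exception instead of letting a null array
   * surface later as a NullPointerException.
   */
  static List<String> listConfigGroups(File configDir) {
    File[] children = configDir.listFiles(File::isDirectory);
    if (children == null) {
      throw new IllegalStateException(
          "Cluster configuration directory is missing or unreadable: "
              + configDir.getAbsolutePath());
    }
    List<String> names = new ArrayList<>();
    for (File child : children) {
      names.add(child.getName());
    }
    return names;
  }

  public static void main(String[] args) {
    try {
      listConfigGroups(new File("no-such-cluster-config-dir"));
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```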
[jira] [Resolved] (GEODE-8239) Gradle configuration to create manifests for all Geode jars
[ https://issues.apache.org/jira/browse/GEODE-8239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Johnsn resolved GEODE-8239. --- Fix Version/s: 1.14.0 Resolution: Fixed > Gradle configuration to create manifests for all Geode jars > --- > > Key: GEODE-8239 > URL: https://issues.apache.org/jira/browse/GEODE-8239 > Project: Geode > Issue Type: Sub-task > Components: client/server >Reporter: Patrick Johnsn >Assignee: Patrick Johnsn >Priority: Major > Fix For: 1.14.0 > > > Modify the Gradle configuration to generate a manifest file with "Class-Path" > and "Dependent-Modules" attributes inside the jars. This manifest will be > used when defining modules using the jars. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (GEODE-8200) Rebalance operations stuck in "IN_PROGRESS" state forever
[ https://issues.apache.org/jira/browse/GEODE-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinmei Liao updated GEODE-8200: --- Fix Version/s: 1.14.0 > Rebalance operations stuck in "IN_PROGRESS" state forever > - > > Key: GEODE-8200 > URL: https://issues.apache.org/jira/browse/GEODE-8200 > Project: Geode > Issue Type: Bug > Components: management >Reporter: Aaron Lindsey >Assignee: Jianxia Chen >Priority: Major > Labels: GeodeOperationAPI > Fix For: 1.14.0 > > Attachments: GEODE-8200-exportedLogs.zip > > > We use the management REST API to call rebalance immediately before stopping > a server to limit the possibility of data loss. In a cluster with 3 locators, > 3 servers, and no regions, we noticed that sometimes the rebalance operation > never ends if one of the locators is restarting concurrently with the > rebalance operation. > More specifically, the scenario where we see this issue crop up is during an > automated "rolling restart" operation in a Kubernetes environment which > proceeds as follows: > * At most one locator and one server are restarting at any point in time > * Each locator/server waits until the previous locator/server is fully online > before restarting > * Immediately before stopping a server, a rebalance operation is performed > and the server is not stopped until the rebalance operation is completed > The impact of this issue is that the "rolling restart" operation will never > complete, because it cannot proceed with stopping a server until the > rebalance operation is completed. A human is then required to intervene and > manually trigger a rebalance and stop the server. This type of "rolling > restart" operation is triggered fairly often in Kubernetes — any time part of > the configuration of the locators or servers changes. > The following JSON is a sample response from the management REST API that > shows the rebalance operation stuck in "IN_PROGRESS". 
> {code}
> {
>   "statusCode": "IN_PROGRESS",
>   "links": {
>     "self": "http://geodecluster-sample-locator.default/management/v1/operations/rebalances/a47f23c8-02b3-443c-a367-636fd6921ea7",
>     "list": "http://geodecluster-sample-locator.default/management/v1/operations/rebalances"
>   },
>   "operationStart": "2020-05-27T22:38:30.619Z",
>   "operationId": "a47f23c8-02b3-443c-a367-636fd6921ea7",
>   "operation": {
>     "simulate": false
>   }
> }
> {code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
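On the caller's side, a rolling restart that polls this status can guard against waiting forever by putting a deadline on the poll. The sketch below is one way to do that and does not address the underlying bug. `OperationPoller` and the status supplier are hypothetical; in practice the supplier would fetch the `self` link and extract `statusCode`. Only `IN_PROGRESS` comes from the report above, and every other value is treated as terminal by assumption.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Optional;
import java.util.function.Supplier;

final class OperationPoller {
  /**
   * Polls statusSource (which must not return null) until it reports
   * something other than IN_PROGRESS. Returns the terminal status, or
   * empty if the deadline passes or the thread is interrupted first.
   */
  static Optional<String> awaitCompletion(Supplier<String> statusSource,
      long timeoutMillis, long pollMillis) {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (true) {
      String status = statusSource.get();
      if (!"IN_PROGRESS".equals(status)) {
        return Optional.of(status);
      }
      if (System.currentTimeMillis() >= deadline) {
        return Optional.empty();
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return Optional.empty();
      }
    }
  }

  public static void main(String[] args) {
    // "DONE" is a placeholder terminal status, not a documented value.
    Iterator<String> canned = Arrays.asList("IN_PROGRESS", "IN_PROGRESS", "DONE").iterator();
    System.out.println(awaitCompletion(canned::next, 1_000, 10));
    // A status that never changes yields an empty result at the deadline.
    System.out.println(awaitCompletion(() -> "IN_PROGRESS", 100, 10));
  }
}
```

An empty result gives the automation a point at which to alert or retry instead of blocking the restart indefinitely.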
[jira] [Updated] (GEODE-8357) Exhausting the high priority message thread pool can result in deadlock
[ https://issues.apache.org/jira/browse/GEODE-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kirk Lund updated GEODE-8357:
-
Description:

The system property "DistributionManager.MAX_THREADS" defaults to 100:
{noformat}
int MAX_THREADS = Integer.getInteger("DistributionManager.MAX_THREADS", 100);
{noformat}
The system property used to be defined in geode-core ClusterDistributionManager and has moved to geode-core OperationExecutors. The value is used to limit ClusterOperationExecutors threadPool and highPriorityPool:
{noformat}
threadPool =
    CoreLoggingExecutors.newThreadPoolWithFeedStatistics("Pooled Message Processor ",
        thread -> stats.incProcessingThreadStarts(), this::doProcessingThread,
        MAX_THREADS, stats.getNormalPoolHelper(), threadMonitor,
        INCOMING_QUEUE_LIMIT, stats.getOverflowQueueHelper());

highPriorityPool = CoreLoggingExecutors.newThreadPoolWithFeedStatistics(
    "Pooled High Priority Message Processor ",
    thread -> stats.incHighPriorityThreadStarts(), this::doHighPriorityThread,
    MAX_THREADS, stats.getHighPriorityPoolHelper(), threadMonitor,
    INCOMING_QUEUE_LIMIT, stats.getHighPriorityQueueHelper());
{noformat}
I have seen server startup hang when recovering lots of expired entries from disk while using PDX. The hang looks as if a dlock request for the PDX lock never receives a response. Checking the value of the distributionStats#highPriorityQueueSize statistic (in VSD) shows the value maxed out and never dropping.

The dlock response granting the PDX lock is stuck in the highPriorityQueue because there are no more highPriorityQueue threads available to process the response. All of the highPriorityQueue thread stack dumps show tasks, such as recovering a bucket from disk, blocked waiting for the PDX lock.

Several changes could improve this situation, either in conjunction or individually:
# improve observability to enable support to identify that this situation has occurred
# increase MAX_THREADS default to 1000
# automatically identify this situation and warn the user with a log statement
# automatically prevent this situation
# identify the messages that are prone to causing deadlocks and move them to a dedicated thread pool with a higher limit

was:

The system property "DistributionManager.MAX_THREADS" defaults to 100:
{noformat}
int MAX_THREADS = Integer.getInteger("DistributionManager.MAX_THREADS", 100);
{noformat}
The system property used to be defined in geode-core ClusterDistributionManager and has moved to geode-core OperationExecutors. The value is used to limit ClusterOperationExecutors threadPool and highPriorityPool:
{noformat}
threadPool =
    CoreLoggingExecutors.newThreadPoolWithFeedStatistics("Pooled Message Processor ",
        thread -> stats.incProcessingThreadStarts(), this::doProcessingThread,
        MAX_THREADS, stats.getNormalPoolHelper(), threadMonitor,
        INCOMING_QUEUE_LIMIT, stats.getOverflowQueueHelper());

highPriorityPool = CoreLoggingExecutors.newThreadPoolWithFeedStatistics(
    "Pooled High Priority Message Processor ",
    thread -> stats.incHighPriorityThreadStarts(), this::doHighPriorityThread,
    MAX_THREADS, stats.getHighPriorityPoolHelper(), threadMonitor,
    INCOMING_QUEUE_LIMIT, stats.getHighPriorityQueueHelper());
{noformat}
I have seen server startup hang when recovering lots of expired entries from disk while using PDX. The hang looks as if a dlock request for the PDX lock never receives a response. Checking the value of the distributionStats#highPriorityQueueSize statistic (in VSD) shows the value maxed out and never dropping.

The dlock response granting the PDX lock is stuck in the highPriorityQueue because there are no more highPriorityQueue threads available to process the response. All of the highPriorityQueue thread stack dumps show tasks, such as recovering a bucket from disk, blocked waiting for the PDX lock.

Several changes could improve this situation, either in conjunction or individually:
# improve observability to enable support to identify that this situation has occurred
# automatically identify this situation and warn the user with a log statement
# automatically prevent this situation
# identify the messages that are prone to causing deadlocks and move them to a dedicated thread pool with a higher limit

> Exhausting the high priority message thread pool can result in deadlock
> ---
>
> Key: GEODE-8357
> URL: https://issues.apache.org/jira/browse/GEODE-8357
> Project: Geode
> Issue Type: Bug
> Components: messaging
> Affects Versions: 1.0.0-incubating, 1.2.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, 1.7.0, 1.8.0, 1.9.0, 1.10.0, 1.11.0, 1.12.0
> Reporter: Kirk Lund
> Assignee: Kirk Lund
> Priority: Major
> Labels: Geo
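The hang described in this issue has the shape of bounded-pool starvation: every pool thread is blocked waiting for a response that can only be processed by a thread from the same pool. The sketch below reproduces that shape with plain `java.util.concurrent` executors rather than Geode's `CoreLoggingExecutors`, and also shows the last option in the list above, routing the response through a dedicated pool. `PoolExhaustionDemo`, `run`, and the latch standing in for the PDX lock are illustrative assumptions.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class PoolExhaustionDemo {
  /**
   * Submits `workers` tasks to workerPool that each block until a
   * "lock granted" response arrives, then submits that response to
   * responsePool. Returns true if all workers finish within the timeout.
   */
  static boolean run(ExecutorService workerPool, ExecutorService responsePool,
      int workers, long timeoutMillis) {
    CountDownLatch lockGranted = new CountDownLatch(1);
    CountDownLatch allDone = new CountDownLatch(workers);
    for (int i = 0; i < workers; i++) {
      workerPool.execute(() -> {
        try {
          lockGranted.await(); // stands in for "blocked waiting for the PDX lock"
          allDone.countDown();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    // The response is queued behind the workers when the pools are the same.
    responsePool.execute(lockGranted::countDown);
    try {
      return allDone.await(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    int maxThreads = 4; // plays the role of MAX_THREADS

    ExecutorService shared = Executors.newFixedThreadPool(maxThreads);
    System.out.println("shared pool completed: " + run(shared, shared, maxThreads, 500));
    shared.shutdownNow();

    ExecutorService workers = Executors.newFixedThreadPool(maxThreads);
    ExecutorService responses = Executors.newSingleThreadExecutor();
    System.out.println("dedicated pool completed: " + run(workers, responses, maxThreads, 500));
    workers.shutdownNow();
    responses.shutdownNow();
  }
}
```

In the shared case the response never runs because all `maxThreads` threads are parked on the latch, so raising the limit only moves the threshold, while a separate pool for responses removes the cycle.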
[jira] [Commented] (GEODE-8067) ClassLoader Isolation
[ https://issues.apache.org/jira/browse/GEODE-8067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157040#comment-17157040 ] ASF GitHub Bot commented on GEODE-8067: --- lgtm-com[bot] commented on pull request #5357: URL: https://github.com/apache/geode/pull/5357#issuecomment-65711 This pull request **introduces 2 alerts** and **fixes 2** when merging 591c7469807ff61ab71176003f906c986845957a into c41e3b4b559bfbc744c8c21844cd126de2ad2fb9 - [view on LGTM.com](https://lgtm.com/projects/g/apache/geode/rev/pr-89c2bb8fd7b4c28a26432c538ccc3f17af02) **new alerts:** * 2 for Potential input resource leak **fixed alerts:** * 2 for Unused variable, import, function or class This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > ClassLoader Isolation > - > > Key: GEODE-8067 > URL: https://issues.apache.org/jira/browse/GEODE-8067 > Project: Geode > Issue Type: New Feature > Components: client/server >Reporter: Udo Kohlmeyer >Assignee: Udo Kohlmeyer >Priority: Major > > This is the root jira for the first pass implementation for [ClassLoader > Isolation|https://cwiki.apache.org/confluence/display/GEODE/ClassLoader+Isolation] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-8333) Fix PUBSUB hang
[ https://issues.apache.org/jira/browse/GEODE-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157148#comment-17157148 ] ASF GitHub Bot commented on GEODE-8333: --- jdeppe-pivotal opened a new pull request #5368: URL: https://github.com/apache/geode/pull/5368 - Introduce notion of a Subscription being 'active'. This flag is only set once a subscriber has been moved to the 'subscribers' EventLoopGroup. This avoids a subscriber processing a publish message when it is still on the 'worker' EventLoopGroup which may cause a hang. - Refactor various MockSubscribers to a single class. Authored-by: Jens Deppe Thank you for submitting a contribution to Apache Geode. In order to streamline the review of the contribution we ask you to ensure the following steps have been taken: ### For all changes: - [ ] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message? - [ ] Has your PR been rebased against the latest commit within the target branch (typically `develop`)? - [ ] Is your initial contribution a single, squashed commit? - [ ] Does `gradlew build` run cleanly? - [ ] Have you written or updated unit tests to verify your changes? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? ### Note: Please ensure that once the PR is submitted, check Concourse for build issues and submit an update to your PR as soon as possible. If you need help, please send an email to d...@geode.apache.org. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Fix PUBSUB hang > --- > > Key: GEODE-8333 > URL: https://issues.apache.org/jira/browse/GEODE-8333 > Project: Geode > Issue Type: Bug > Components: redis >Reporter: Sarah Abbey >Priority: Major > > PUBSUB hangs with concurrent publishers and subscribers on multiple servers -- This message was sent by Atlassian Jira (v8.3.4#803005)
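The 'active' flag described in the pull request above can be pictured as a gate that a publish checks before delivering. The sketch below is a minimal illustration under the assumption that a publish simply skips subscriptions that are not yet active; the pull request may instead defer or queue such messages. `ActiveFlagDemo`, `Subscription`, `markActive`, and `publish` are hypothetical names, and no Netty `EventLoopGroup` is involved here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

final class ActiveFlagDemo {
  static final class Subscription {
    private final AtomicBoolean active = new AtomicBoolean(false);
    private final List<String> received = new CopyOnWriteArrayList<>();

    /** Called once the subscriber's hand-off to its final event loop is complete. */
    void markActive() {
      active.set(true);
    }

    boolean isActive() {
      return active.get();
    }

    List<String> received() {
      return received;
    }
  }

  /** Delivers message to active subscriptions only; returns how many received it. */
  static int publish(List<Subscription> subscriptions, String message) {
    int delivered = 0;
    for (Subscription subscription : subscriptions) {
      if (subscription.isActive()) {
        subscription.received.add(message);
        delivered++;
      }
    }
    return delivered;
  }

  public static void main(String[] args) {
    Subscription ready = new Subscription();
    Subscription stillMoving = new Subscription();
    ready.markActive();
    int delivered = publish(Arrays.asList(ready, stillMoving), "hello");
    System.out.println("delivered to " + delivered + " of 2 subscriptions");
  }
}
```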