[jira] [Created] (GEODE-9881) Fully recovered Oplog objects indicating unrecoveredRegionCount>0
Jakov Varenina created GEODE-9881: - Summary: Fully recovered Oplog objects indicating unrecoveredRegionCount>0 Key: GEODE-9881 URL: https://issues.apache.org/jira/browse/GEODE-9881 Project: Geode Issue Type: Bug Components: persistence Reporter: Jakov Varenina -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (GEODE-9881) Fully recovered Oplog objects indicating unrecoveredRegionCount>0
[ https://issues.apache.org/jira/browse/GEODE-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakov Varenina updated GEODE-9881: -- Description: We have found a problem when a region is closed and then recreated to start recovery. Inspecting this code in the close() function shows that it is inconsistent:
{code:java}
void close(DiskRegion dr) {
  // while a krf is being created can not close a region
  lockCompactor();
  try {
    if (!isDrfOnly()) {
      DiskRegionInfo dri = getDRI(dr);
      if (dri != null) {
        long clearCount = dri.clear(null);
        if (clearCount != 0) {
          totalLiveCount.addAndGet(-clearCount);
          // no need to call handleNoLiveValues because we now have an
          // unrecovered region.
        }
        regionMap.get().remove(dr.getId(), dri);
      }
      addUnrecoveredRegion(dr.getId());
    }
  } finally {
    unlockCompactor();
  }
}
{code}
Notice that addUnrecoveredRegion() marks the DiskRegionInfo object as unrecovered and increments the unrecoveredRegionCount counter. That DiskRegionInfo object lives in the regionMap structure, yet close() also removes it from the regionMap: the object is updated and then dropped from the map to be garbage collected, so the unrecovered mark is lost. As shown below, this causes problems when the region is recovered. Now check this code at recovery:
{code:java}
/**
 * For each dri that this oplog has that is currently unrecoverable check to see if a DiskRegion
 * that is recoverable now exists.
 */
void checkForRecoverableRegion(DiskRegionView dr) {
  if (unrecoveredRegionCount.get() > 0) {
    DiskRegionInfo dri = getDRI(dr);
    if (dri != null) {
      if (dri.testAndSetRecovered(dr)) {
        unrecoveredRegionCount.decrementAndGet();
      }
    }
  }
}
{code}
The problem is that Geode never clears the unrecoveredRegionCount counter in the Oplog objects after recovery is done. checkForRecoverableRegion() checks the unrecoveredRegionCount counter and calls testAndSetRecovered(), but testAndSetRecovered() always returns false, because none of the DiskRegionInfo objects in the region map have the unrecovered flag set to true (every object marked as unrecovered was deleted by close() and then recreated during recovery; see the note below). As a result, all Oplogs end up fully recovered while the counter incorrectly indicates unrecoveredRegionCount>0, which later prevents compaction of the recovered Oplogs (the files that have .crf, .drf and .krf) once they reach the compaction threshold. Note: During recovery the regionMap is recreated from the Oplog files. Since all DiskRegionInfo objects were removed from the regionMap by close(), they are recreated via initRecoveredEntry during recovery, each with the unrecovered flag set to false.
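For illustration, a minimal sketch of one possible direction for a fix, assuming the Oplog fields shown above; this is not the committed Geode patch. The idea is to keep the marked DiskRegionInfo in the regionMap so that checkForRecoverableRegion() can later find it, flip it with testAndSetRecovered(), and decrement the counter:
{code:java}
// Illustrative sketch only, not the actual fix: leave the marked
// DiskRegionInfo in the regionMap instead of removing it, so recovery
// can find it and decrement unrecoveredRegionCount.
void close(DiskRegion dr) {
  // while a krf is being created can not close a region
  lockCompactor();
  try {
    if (!isDrfOnly()) {
      DiskRegionInfo dri = getDRI(dr);
      if (dri != null) {
        long clearCount = dri.clear(null);
        if (clearCount != 0) {
          totalLiveCount.addAndGet(-clearCount);
        }
        // NOTE: no regionMap.get().remove(...) here; the entry stays so
        // that testAndSetRecovered() has an unrecovered flag to clear.
      }
      addUnrecoveredRegion(dr.getId()); // marks dri, increments the counter
    }
  } finally {
    unlockCompactor();
  }
}
{code}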
[jira] [Assigned] (GEODE-9881) Fully recovered Oplog objects indicating unrecoveredRegionCount>0
[ https://issues.apache.org/jira/browse/GEODE-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakov Varenina reassigned GEODE-9881: - Assignee: Jakov Varenina > Fully recovered Oplog objects indicating unrecoveredRegionCount>0 > > > Key: GEODE-9881 > URL: https://issues.apache.org/jira/browse/GEODE-9881 > Project: Geode > Issue Type: Bug > Components: persistence > Reporter: Jakov Varenina > Assignee: Jakov Varenina > Priority: Major > > (quoted issue description omitted; identical to the description above) -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (GEODE-9881) Fully recovered Oplog objects indicating unrecoveredRegionCount>0
[ https://issues.apache.org/jira/browse/GEODE-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakov Varenina updated GEODE-9881: -- Description: (same description as above; duplicate text trimmed) > Fully recovered Oplog objects indicating unrecoveredRegionCount>0 > > > Key: GEODE-9881 > URL: https://issues.apache.org/jira/browse/GEODE-9881 > Project: Geode > Issue Type: Bug > Components: persistence > Reporter: Jakov Varenina > Priority: Major > > (quoted issue description omitted; identical to the description above)
[jira] [Updated] (GEODE-9881) Fully recovered Oplog objects indicating unrecoveredRegionCount>0 preventing compaction
[ https://issues.apache.org/jira/browse/GEODE-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakov Varenina updated GEODE-9881: -- Summary: Fully recovered Oplog objects indicating unrecoveredRegionCount>0 preventing compaction (was: Fully recovered Oplog objects indicating unrecoveredRegionCount>0) > Fully recovered Oplog objects indicating unrecoveredRegionCount>0 preventing > compaction > -- > > Key: GEODE-9881 > URL: https://issues.apache.org/jira/browse/GEODE-9881 > Project: Geode > Issue Type: Bug > Components: persistence > Reporter: Jakov Varenina > Assignee: Jakov Varenina > Priority: Major > > (quoted issue description omitted; identical to the description above) -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (GEODE-9880) Cluster with multiple locators in an environment with no host name resolution leads to null pointer exception
[ https://issues.apache.org/jira/browse/GEODE-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456357#comment-17456357 ] Tigran Ghahramanyan commented on GEODE-9880: Experimenting with _builder.setHostnameForClients()_ and setting the corresponding IP address, represented as a string, to override the host name for each locator works around the problem described above, allowing the cluster to start. > Cluster with multiple locators in an environment with no host name > resolution leads to null pointer exception > -- > > Key: GEODE-9880 > URL: https://issues.apache.org/jira/browse/GEODE-9880 > Project: Geode > Issue Type: Bug > Components: locator > Affects Versions: 1.12.5 > Reporter: Tigran Ghahramanyan > Priority: Major > > In our use case we have two locators that are initially configured with IP > addresses, but the _AutoConnectionSourceImpl.UpdateLocatorList()_ flow keeps > adding their corresponding host names to the locator list, even though these > host names are not resolvable. > Later, in {_}AutoConnectionSourceImpl.queryLocators(){_}, whenever a client > tries to use such a non-resolvable host name to connect to a locator, it > tries to establish a connection to {_}socketaddr=0.0.0.0{_}, as written in > {_}SocketCreator.connect(){_}, which seems strange. > Then, if there is no locator running on the same host, the next locator in > the list is contacted, until a locator contact configured with an IP address > is reached, which eventually succeeds. > But when there happens to be a locator listening on the same host, we get a > null pointer exception in the second line below, because _inetadd=null_: > _socket.connect(sockaddr, Math.max(timeout, 0)); // sockaddr=0.0.0.0, > connects to a locator listening on the same host_ > _configureClientSSLSocket(socket, inetadd.getHostName(), timeout); // inetadd > = null_ > > As a result, the cluster comes to a failed state, unable to recover. -- This message was sent by Atlassian Jira (v8.20.1#820001)
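For reference, a hedged sketch of the workaround described in the comment above, using the public LocatorLauncher.Builder API; the member name, port, and IP address are illustrative placeholders:
{code:java}
import org.apache.geode.distributed.LocatorLauncher;

// Advertise the raw IP to clients so they never attempt host name
// resolution; values here stand in for the real deployment's settings.
LocatorLauncher locator = new LocatorLauncher.Builder()
    .setMemberName("locator1")
    .setPort(10334)
    .setHostnameForClients("10.0.0.5") // IP address as a string
    .build();
locator.start();
{code}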
[jira] [Updated] (GEODE-9881) Fully recovered Oplog objects indicating unrecoveredRegionCount>0 preventing compaction
[ https://issues.apache.org/jira/browse/GEODE-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakov Varenina updated GEODE-9881: -- Description: We have found a problem when a region is closed with {color:#ff}Region.close(){color} and then recreated to start recovery. (rest of the description and the quoted previous revision are identical to the text above; duplicate text trimmed)
[jira] [Updated] (GEODE-9881) Fully recovered Oplog objects indicating unrecoveredRegionCount>0 preventing compaction
[ https://issues.apache.org/jira/browse/GEODE-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakov Varenina updated GEODE-9881: -- Description: We have found a problem when a region is closed with Region.close() and then recreated to start recovery. (rest of the description and the quoted previous revision, which differed only in color markup, are identical to the text above; duplicate text trimmed)
[jira] [Commented] (GEODE-9814) Add an example of geode-for-redis to the geode examples project
[ https://issues.apache.org/jira/browse/GEODE-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456467#comment-17456467 ] ASF GitHub Bot commented on GEODE-9814: --- jomartin-999 commented on a change in pull request #110: URL: https://github.com/apache/geode-examples/pull/110#discussion_r765826888 ## File path: geodeForRedis/scripts/start.gfsh ## @@ -0,0 +1,19 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more contributor license + * agreements. See the NOTICE file distributed with this work for additional information regarding + * copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance with the License. You may obtain a + * copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software distributed under the License + * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express + * or implied. See the License for the specific language governing permissions and limitations under + * the License. + */ + +start locator --name=locator --bind-address=localhost + +start server --name=redisServer1 --locators=localhost[10334] --server-port=0 --J=-Dgemfire.geode-for-redis-enabled=true --J=-Dgemfire.geode-for-redis-port=6379 --J=-Dgemfire.geode-for-redis-bind-address=127.0.0.1 Review comment: @DonalEvans It might be nice for this example to also set the redundancy level. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@geode.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add an example of geode-for-redis to the geode examples project > --- > > Key: GEODE-9814 > URL: https://issues.apache.org/jira/browse/GEODE-9814 > Project: Geode > Issue Type: Improvement > Components: redis >Reporter: Dan Smith >Assignee: Donal Evans >Priority: Major > Labels: pull-request-available > > Add an example to the geode-examples project/repo demonstrating how to turn > on and use geode-for-redis. > This is just a script. User must download native Redis to get command line > tool. > Cluster Mode must be used. > Start Server with gfsh. > Use JedisCluster client to: > * Perform Sets > * Perform Gets > Have a readme that speaks to using native Redis. -- This message was sent by Atlassian Jira (v8.20.1#820001)
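The ticket calls for using a JedisCluster client to perform sets and gets against the server started by the reviewed script. A minimal sketch, assuming the geode-for-redis endpoint 127.0.0.1:6379 configured in start.gfsh above:
{code:java}
import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;

// Connect in cluster mode to the geode-for-redis server started by
// start.gfsh, then perform a simple set and get.
try (JedisCluster client = new JedisCluster(new HostAndPort("127.0.0.1", 6379))) {
  client.set("greeting", "hello from geode-for-redis");
  System.out.println(client.get("greeting")); // prints the stored value
}
{code}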
[jira] [Commented] (GEODE-9880) Cluster with multiple locators in an environment with no host name resolution leads to null pointer exception
[ https://issues.apache.org/jira/browse/GEODE-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456560#comment-17456560 ] Anthony Baker commented on GEODE-9880: -- The locator list returned to the client contained {ip1, host1, ip2, host2}. The discovered list provided to the client should have contained only 2 entries, corresponding to the 2 locators. Second, the advertised address of the locator should follow these semantics: 1) If hostname-for-clients is set, use that. 2) If bind-address is set, use that interface. 3) Otherwise select an available network interface, with no guarantees about ordering or DNS resolution. > Cluster with multiple locators in an environment with no host name > resolution leads to null pointer exception > -- > > Key: GEODE-9880 > URL: https://issues.apache.org/jira/browse/GEODE-9880 > Project: Geode > Issue Type: Bug > Components: locator > Affects Versions: 1.12.5 > Reporter: Tigran Ghahramanyan > Priority: Major > > (quoted issue description omitted; identical to the description above) -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (GEODE-9622) CI Failure: ClientServerTransactionFailoverWithMixedVersionServersDistributedTest fails with BindException
[ https://issues.apache.org/jira/browse/GEODE-9622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456562#comment-17456562 ] ASF subversion and git services commented on GEODE-9622: Commit 20c417710673faf1d2afb2d72ed14fcaadc17926 in geode's branch refs/heads/develop from Dale Emery [ https://gitbox.apache.org/repos/asf?p=geode.git;h=20c4177 ] GEODE-9622: Make failover test not use ephemeral port (#7178) PROBLEM `ClientServerTransactionFailoverWithMixedVersionServersDistributedTest` misused ephemeral ports. Some tests start a locator on an ephemeral port, stop the locator, and attempt to restart it on the same port. During the time the locator is stopped, the OS can assign that port to another process. When that happens, as in these failures, the test is unable to restart the locator. SOLUTION Change the test to use `AvailablePortHelper` to assign an available port, rather than requesting an ephemeral port. > CI Failure: > ClientServerTransactionFailoverWithMixedVersionServersDistributedTest fails > with BindException > -- > > Key: GEODE-9622 > URL: https://issues.apache.org/jira/browse/GEODE-9622 > Project: Geode > Issue Type: Bug > Components: client/server, tests >Reporter: Kirk Lund >Assignee: Dale Emery >Priority: Major > Labels: GeodeOperationAPI, pull-request-available > > {noformat} > org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest > > clientTransactionOperationsAreNotLostIfTransactionIsOnRolledServer FAILED > org.apache.geode.test.dunit.RMIException: While invoking > org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest$$Lambda$68/194483270.call > in VM 5 running on Host > heavy-lifter-2b8702f1-fe32-5895-9b14-832b7049b607.c.apachegeode-ci.internal > with 6 VMs > at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:631) > at org.apache.geode.test.dunit.VM.invoke(VM.java:473) > at > org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.rollLocatorToCurrent(ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.java:216) > at > org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.setupPartiallyRolledVersion(ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.java:171) > at > org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.clientTransactionOperationsAreNotLostIfTransactionIsOnRolledServer(ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.java:127) > Caused by: > java.net.BindException: Failed to create server socket on > heavy-lifter-2b8702f1-fe32-5895-9b14-832b7049b607.c.apachegeode-ci.internal/10.0.0.60[43535] > at > org.apache.geode.distributed.internal.tcpserver.ClusterSocketCreatorImpl.createServerSocket(ClusterSocketCreatorImpl.java:75) > at > org.apache.geode.internal.net.SCClusterSocketCreator.createServerSocket(SCClusterSocketCreator.java:55) > at > org.apache.geode.distributed.internal.tcpserver.ClusterSocketCreatorImpl.createServerSocket(ClusterSocketCreatorImpl.java:54) > at > org.apache.geode.distributed.internal.tcpserver.TcpServer.initializeServerSocket(TcpServer.java:196) > at > org.apache.geode.distributed.internal.tcpserver.TcpServer.startServerThread(TcpServer.java:183) > at > org.apache.geode.distributed.internal.tcpserver.TcpServer.start(TcpServer.java:178) > at > 
org.apache.geode.distributed.internal.membership.gms.locator.MembershipLocatorImpl.start(MembershipLocatorImpl.java:112) > at > org.apache.geode.distributed.internal.InternalLocator.startPeerLocation(InternalLocator.java:653) > at > org.apache.geode.distributed.internal.InternalLocator.startLocator(InternalLocator.java:394) > at > org.apache.geode.distributed.internal.InternalLocator.startLocator(InternalLocator.java:345) > at > org.apache.geode.distributed.Locator.startLocator(Locator.java:261) > at > org.apache.geode.distributed.Locator.startLocatorAndDS(Locator.java:207) > at > org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.startLocator(ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.java:180) > at > org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.lambda$rollLocatorToCurrent$92d5d92a$1(ClientServerTransa
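A sketch of the pattern the commit message describes, assuming the AvailablePortHelper and Locator APIs as used in Geode's test code; the log file name and properties are placeholders:
{code:java}
import java.io.File;
import java.util.Properties;
import org.apache.geode.distributed.Locator;
import org.apache.geode.internal.AvailablePortHelper;

// Reserve a concrete port up front instead of binding to port 0 (an
// ephemeral port), so the locator can later be restarted on the same port.
int locatorPort = AvailablePortHelper.getRandomAvailableTCPPort();
Properties properties = new Properties();
Locator locator = Locator.startLocatorAndDS(locatorPort, new File("locator.log"), properties);
// ... roll the locator ...
locator.stop();
// Restart on the SAME reserved port; with port 0 the OS could have handed
// the previously assigned port to another process in the meantime.
locator = Locator.startLocatorAndDS(locatorPort, new File("locator.log"), properties);
{code}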
[jira] [Updated] (GEODE-9877) GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse failed
[ https://issues.apache.org/jira/browse/GEODE-9877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated GEODE-9877: -- Labels: pull-request-available (was: ) > GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse failed > -- > > Key: GEODE-9877 > URL: https://issues.apache.org/jira/browse/GEODE-9877 > Project: Geode > Issue Type: Bug > Components: redis >Affects Versions: 1.15.0 >Reporter: Mark Hanson >Priority: Major > Labels: pull-request-available > > [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/acceptance-test-openjdk8/builds/43] > failed with > GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse > {noformat} > java.net.BindException: Address already in use (Bind failed) > at java.net.PlainSocketImpl.socketBind(Native Method) > at > java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387) > at java.net.Socket.bind(Socket.java:662) > at > org.apache.geode.redis.GeodeRedisServerStartupDUnitTest.startupFailsGivenPortAlreadyInUse(GeodeRedisServerStartupDUnitTest.java:115) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.apache.geode.test.dunit.rules.ClusterStartupRule$1.evaluate(ClusterStartupRule.java:138) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > at > org.apache.geode.test.junit.rules.DescribedExternalResource$1.evaluate(DescribedExternalResource.java:40) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at org.junit.runners.ParentRunner.run(ParentRunner.java:413) > at org.junit.runner.JUnitCore.run(JUnitCore.java:137) > at org.junit.runner.JUnitCore.run(JUnitCore.java:115) > at > org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) > at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) > at > 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) > at > java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at > java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485) > at > org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:82) > at > org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:73) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:108) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54) > at > org.junit
[jira] [Resolved] (GEODE-9870) JedisMovedDataException exception in testReconnectionWithAuthAndServerRestarts
[ https://issues.apache.org/jira/browse/GEODE-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jens Deppe resolved GEODE-9870. --- Fix Version/s: 1.15.0 Resolution: Fixed > JedisMovedDataException exception in testReconnectionWithAuthAndServerRestarts > -- > > Key: GEODE-9870 > URL: https://issues.apache.org/jira/browse/GEODE-9870 > Project: Geode > Issue Type: Bug > Components: redis >Affects Versions: 1.15.0 >Reporter: Bill Burcham >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > > CI failure here > [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/315|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/315]: > > {code:java} > AuthWhileServersRestartDUnitTest > testReconnectionWithAuthAndServerRestarts > FAILED > redis.clients.jedis.exceptions.JedisMovedDataException: MOVED 12539 > 127.0.0.1:26259 > at redis.clients.jedis.Protocol.processError(Protocol.java:119) > at redis.clients.jedis.Protocol.process(Protocol.java:169) > at redis.clients.jedis.Protocol.read(Protocol.java:223) > at > redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:352) > at > redis.clients.jedis.Connection.getStatusCodeReply(Connection.java:270) > at redis.clients.jedis.BinaryJedis.flushAll(BinaryJedis.java:826) > at > org.apache.geode.test.dunit.rules.RedisClusterStartupRule.flushAll(RedisClusterStartupRule.java:147) > at > org.apache.geode.test.dunit.rules.RedisClusterStartupRule.flushAll(RedisClusterStartupRule.java:131) > at > org.apache.geode.redis.internal.executor.auth.AuthWhileServersRestartDUnitTest.after(AuthWhileServersRestartDUnitTest.java:88){code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (GEODE-8644) SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() intermittently fails when queues drain too slowly
[ https://issues.apache.org/jira/browse/GEODE-8644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456592#comment-17456592 ] Xiaojian Zhou commented on GEODE-8644: -- The "Failed to connect to localhost/127.0.0.1:0" error message was introduced in GEODE-7751, but introducing this error message is not itself the root cause. > SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() > intermittently fails when queues drain too slowly > --- > > Key: GEODE-8644 > URL: https://issues.apache.org/jira/browse/GEODE-8644 > Project: Geode > Issue Type: Bug > Affects Versions: 1.15.0 > Reporter: Benjamin P Ross > Assignee: Mark Hanson > Priority: Major > Labels: GeodeOperationAPI, needsTriage, pull-request-available > > Currently the test > SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() > relies on a 2-second delay to allow the queues to finish draining after the > put operation completes. If the queues take longer than 2 seconds to drain, > the test fails. We should change the test to wait for the queues to be > empty with a long timeout, in case the queues never fully drain. -- This message was sent by Atlassian Jira (v8.20.1#820001)
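A sketch of the suggested change, assuming the Geode test utility GeodeAwaitility and a hypothetical queue-size accessor on the sender under test:
{code:java}
import static org.apache.geode.test.awaitility.GeodeAwaitility.await;
import static org.assertj.core.api.Assertions.assertThat;

// Poll until the sender queue is fully drained instead of sleeping a fixed
// 2 seconds; this fails only if the queue never empties within the
// framework's long default timeout. sender.getQueue() is illustrative.
await().untilAsserted(
    () -> assertThat(sender.getQueue().size()).isZero());
{code}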
[jira] [Commented] (GEODE-9872) DistTXPersistentDebugDUnitTest tests fail because "cluster configuration service not available"
[ https://issues.apache.org/jira/browse/GEODE-9872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456612#comment-17456612 ] ASF subversion and git services commented on GEODE-9872: Commit 68b9080e84054f059b8c3e9b4aff9034fb302353 in geode's branch refs/heads/develop from Dale Emery [ https://gitbox.apache.org/repos/asf?p=geode.git;h=68b9080 ] GEODE-9872: Make test framework code assign ports (#7176) * GEODE-9872: Make test framework code assign ports PROBLEM `DistTXPersistentDebugDUnitTest ` failed in CI because it accidentally connected to a locator from another test (`ClusterConfigLocatorRestartDUnitTest`). CAUSE `ClusterConfigLocatorRestartDUnitTest` attempts to restart a locator on a port in the ephemeral port range. Here is the sequence of events: 1. `ClusterConfigLocatorRestartDUnitTest ` started a locator on an ephemeral port. In this CI run it got port 37877. 2. `ClusterConfigLocatorRestartDUnitTest` stopped the locator on port 37877. 3. `DistTXPersistentDebugDUnitTest` started a locator on an ephemeral port. In this CI run it got 37877. 4. `ClusterConfigLocatorRestartDUnitTest ` attempted to restart the locator on port 37877. That port was already in use in `DistTXPersistentDebugDUnitTest`'s locator, and as a result the two tests became entangled. CONTRIBUTING FACTORS `DistTXPersistentDebugDUnitTest` uses `DUnitLauncher` to start its locator. By default, `DUnitLauncher` starts its locator on an ephemeral port. `ClusterConfigLocatorRestartDUnitTest` uses `ClusterStartupRule` to start several locators. By default, `ClusterStartupRule` starts each locator on an ephemeral port. SOLUTION Change `DUnitLauncher` and `ClusterStartupRule` to assign locator ports via `AvailablePortHelper` if the test does not specify a particular port. I considered changing only `ClusterConfigLogatorRestartDUnitTest` to assign the port that it intends to reuse. But: - That would fix only this one test, though an unknown number of tests similarly attempt to reuse ports assigned by framework code. Numerous of those tests have already been changed to assign ports explicitly, but an unknown number remain. - It is quite reasonable for this test and others to assume that, if the test framework assigns a port on the test's behalf, then the test will enjoy exclusive use of that port for the entire life of the test. I think the key problem is not that tests make this assumption, but that the framework code violates it. Changing the test framework classes that tacitly assign ports (`DUnitLauncher` and `ClusterStartupRule`) makes them behave in a way that tests expect. * Add new port var to dunit sanctioned serializables > DistTXPersistentDebugDUnitTest tests fail because "cluster configuration > service not available" > --- > > Key: GEODE-9872 > URL: https://issues.apache.org/jira/browse/GEODE-9872 > Project: Geode > Issue Type: Bug > Components: tests >Reporter: Bill Burcham >Assignee: Dale Emery >Priority: Major > Labels: GeodeOperationAPI, pull-request-available > > I suspect this failure is due to something in the test framework, or perhaps > one or more tests failing to manage ports correctly, allowing two or more > tests to interfere with one another. > In this distributed test: > [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/388] > we see two failures. 
Here's the first full stack trace: > > > {code:java} > [error 2021/12/04 20:40:53.796 UTC > tid=33] org.apache.geode.GemFireConfigException: cluster configuration > service not available > at > org.junit.vintage.engine.execution.TestRun.getStoredResultOrSuccessful(TestRun.java:196) > at > org.junit.vintage.engine.execution.RunListenerAdapter.fireExecutionFinished(RunListenerAdapter.java:226) > at > org.junit.vintage.engine.execution.RunListenerAdapter.testFinished(RunListenerAdapter.java:192) > at > org.junit.vintage.engine.execution.RunListenerAdapter.testFinished(RunListenerAdapter.java:79) > at > org.junit.runner.notification.SynchronizedRunListener.testFinished(SynchronizedRunListener.java:87) > at > org.junit.runner.notification.RunNotifier$9.notifyListener(RunNotifier.java:225) > at > org.junit.runner.notification.RunNotifier$SafeNotifier.run(RunNotifier.java:72) > at > org.junit.runner.notification.RunNotifier.fireTestFinished(RunNotifier.java:222) > at > org.junit.internal.runners.model.EachTestNotifier.fireTestFinished(EachTestNotifier.java:38) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:372) > at > org.jun
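A hedged sketch of the test-side pattern the commit above argues for, where a test that intends to restart a locator pins its own port; the exact ClusterStartupRule and LocatorStarterRule overloads vary across Geode versions, and `cluster` stands in for the test's rule instance:
{code:java}
// Reserve a port so the restart below reuses it, instead of asking the OS
// for a fresh ephemeral port that another test could grab in the meantime.
int locatorPort = AvailablePortHelper.getRandomAvailableTCPPort();
MemberVM locator = cluster.startLocatorVM(0, l -> l.withPort(locatorPort));
locator.stop();
cluster.startLocatorVM(0, l -> l.withPort(locatorPort)); // same reserved port
{code}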
[jira] [Resolved] (GEODE-9871) CI failure: InfoStatsIntegrationTest > networkKiloBytesReadOverLastSecond_shouldBeCloseToBytesReadOverLastSecond
[ https://issues.apache.org/jira/browse/GEODE-9871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jens Deppe resolved GEODE-9871. --- Fix Version/s: 1.15.0 Assignee: Jens Deppe Resolution: Fixed > CI failure: InfoStatsIntegrationTest > > networkKiloBytesReadOverLastSecond_shouldBeCloseToBytesReadOverLastSecond > > > Key: GEODE-9871 > URL: https://issues.apache.org/jira/browse/GEODE-9871 > Project: Geode > Issue Type: Bug > Components: redis, statistics >Affects Versions: 1.15.0 >Reporter: Bill Burcham >Assignee: Jens Deppe >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > > link: > [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/integration-test-openjdk8/builds/38] > stack trace: > {code:java} > InfoStatsIntegrationTest > > networkKiloBytesReadOverLastSecond_shouldBeCloseToBytesReadOverLastSecond > FAILED > org.opentest4j.AssertionFailedError: > expected: 0.0 > but was: 0.01 > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at > org.apache.geode.redis.internal.commands.executor.server.AbstractRedisInfoStatsIntegrationTest.networkKiloBytesReadOverLastSecond_shouldBeCloseToBytesReadOverLastSecond(AbstractRedisInfoStatsIntegrationTest.java:228) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.apache.geode.test.junit.rules.serializable.SerializableExternalResource$1.evaluate(SerializableExternalResource.java:38) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at org.junit.runners.ParentRunner.run(ParentRunner.java:413) > at org.junit.runner.JUnitCore.run(JUnitCore.java:137) > at 
org.junit.runner.JUnitCore.run(JUnitCore.java:115) > at > org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) > at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at > java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) > at > java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) > at > java.util
[jira] [Commented] (GEODE-9758) Configure locator serialization filtering by default on Java 8
[ https://issues.apache.org/jira/browse/GEODE-9758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456650#comment-17456650 ] ASF subversion and git services commented on GEODE-9758: Commit db64b4948e790d61e82f95ae6163a62adc4c67fb in geode's branch refs/heads/develop from Kirk Lund [ https://gitbox.apache.org/repos/asf?p=geode.git;h=db64b49 ] GEODE-9758: Move SanctionedSerializables to filter package (#7165) Move SanctionedSerializables to new package org.apache.geode.internal.serialization.filter. > Configure locator serialization filtering by default on Java 8 > -- > > Key: GEODE-9758 > URL: https://issues.apache.org/jira/browse/GEODE-9758 > Project: Geode > Issue Type: Improvement >Affects Versions: 1.12.7 >Reporter: Jianxia Chen >Assignee: Jianxia Chen >Priority: Major > Labels: pull-request-available > > When Geode locator is running on Java 8 JVM, the serialization filter should > be configured by default to accept only JDK classes and Geode classes. -- This message was sent by Atlassian Jira (v8.20.1#820001)
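Illustrative only: the general shape of a JVM-wide deserialization accept-list on Java 8 (JEP 290, backported in 8u121+). The actual Geode change configures the filter from its sanctioned-serializables list rather than a hard-coded pattern like this one:
{code:java}
// Accept JDK and Geode classes, reject everything else. Must be set
// before the first ObjectInputStream is created in the JVM.
System.setProperty("jdk.serialFilter",
    "java.**;javax.**;org.apache.geode.**;!*");
{code}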
[jira] [Commented] (GEODE-9871) CI failure: InfoStatsIntegrationTest > networkKiloBytesReadOverLastSecond_shouldBeCloseToBytesReadOverLastSecond
[ https://issues.apache.org/jira/browse/GEODE-9871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456656#comment-17456656 ] ASF subversion and git services commented on GEODE-9871: Commit c65f048b5327fcd36694dfe9ab20251ed944eeb1 in geode's branch refs/heads/develop from Jens Deppe [ https://gitbox.apache.org/repos/asf?p=geode.git;h=c65f048 ] GEODE-9871: Improve Radish test for network KB/s verification (#7170) > CI failure: InfoStatsIntegrationTest > > networkKiloBytesReadOverLastSecond_shouldBeCloseToBytesReadOverLastSecond > > > Key: GEODE-9871 > URL: https://issues.apache.org/jira/browse/GEODE-9871 > Project: Geode > Issue Type: Bug > Components: redis, statistics >Affects Versions: 1.15.0 >Reporter: Bill Burcham >Assignee: Jens Deppe >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > > link: > [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/integration-test-openjdk8/builds/38] > stack trace: > {code:java} > InfoStatsIntegrationTest > > networkKiloBytesReadOverLastSecond_shouldBeCloseToBytesReadOverLastSecond > FAILED > org.opentest4j.AssertionFailedError: > expected: 0.0 > but was: 0.01 > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at > org.apache.geode.redis.internal.commands.executor.server.AbstractRedisInfoStatsIntegrationTest.networkKiloBytesReadOverLastSecond_shouldBeCloseToBytesReadOverLastSecond(AbstractRedisInfoStatsIntegrationTest.java:228) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.apache.geode.test.junit.rules.serializable.SerializableExternalResource$1.evaluate(SerializableExternalResource.java:38) > at 
org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at org.junit.runners.ParentRunner.run(ParentRunner.java:413) > at org.junit.runner.JUnitCore.run(JUnitCore.java:137) > at org.junit.runner.JUnitCore.run(JUnitCore.java:115) > at > org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) > at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at > java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) > at > java.util.stream.AbstractPipeline.wrapAndC
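The fix referenced above tightens how the test compares a kilobytes-per-second statistic against a byte count sampled at a slightly different instant. A tolerance-based comparison along these lines is one way to express that check (a minimal AssertJ sketch; the variable names are illustrative, not the test's actual code):
{code:java}
import static org.assertj.core.api.Assertions.assertThat;

import org.assertj.core.data.Offset;

// Hypothetical sketch: the stat and the byte count are sampled at slightly
// different moments, so compare with a tolerance instead of exact equality.
double kiloBytesReadStat = 0.01;          // KB/s value reported by INFO stats
double bytesReadOverLastSecond = 10.24;   // bytes the test counted itself
assertThat(kiloBytesReadStat)
    .isCloseTo(bytesReadOverLastSecond / 1024.0, Offset.offset(0.5));
{code}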
[jira] [Assigned] (GEODE-9854) Orphaned .drf files causing memory leak
[ https://issues.apache.org/jira/browse/GEODE-9854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Darrel Schneider reassigned GEODE-9854: --- Assignee: Darrel Schneider > Orphaned .drf files causing memory leak > --- > > Key: GEODE-9854 > URL: https://issues.apache.org/jira/browse/GEODE-9854 > Project: Geode > Issue Type: Bug >Reporter: Jakov Varenina >Assignee: Darrel Schneider >Priority: Major > Labels: pull-request-available > Attachments: screenshot-1.png, screenshot-2.png, server1.log > > > Issue: > OpLog files are compacted, but the .drf file is left behind because it contains > deletes of entries in previous .crfs. The .crf file is deleted, but the > orphaned .drf is not deleted until all > previous .crf files (.crfs with smaller ids) are deleted. > The problem is that the compacted Oplog object representing the orphaned .drf file > holds a structure in memory (Oplog.regionMap) that contains information that > is not useful > after the compaction, and it takes a certain amount of memory. Besides, there is > a race condition in the code when creating .krf files that, depending on the > execution order, > could make the problem more severe (it could leave the pendingKrfTags structure > on the regionMap, and this could take up a significant amount of memory). This > pendingKrfTags HashMap is actually empty, but it consumes memory because it was > used previously and the size of the HashMap was not reduced after it was > cleared. > This race condition usually happens when a new Oplog is rolled out and the previous > Oplog is immediately marked as eligible for compaction. Compaction and .krf > creation start at > about the same time, and the compactor cancels creation of the .krf if it executes > first. The pendingKrfTags structure is usually cleared when the .krf file is > created, but since compaction canceled creation of the .krf, the pendingKrfTags > structure remains in memory until the Oplog representing the orphaned .drf file is > deleted. > Below it can be seen that the .krf is never created for the orphaned .drf > Oplog object that has memory allocated in pendingKrfTags: > {code:java} > server1.log:1956:[info 2021/11/25 21:52:26.866 CET server1 > tid=0x34] Created oplog#129 > drf for disk store store1. > server1.log:1958:[info 2021/11/25 21:52:26.867 CET server1 > tid=0x34] Created oplog#129 > crf for disk store store1. > server1.log:1974:[info 2021/11/25 21:52:39.490 CET server1 store1 for oplog oplog#129> tid=0x5c] OplogCompactor for store1 compaction > oplog id(s): oplog#129 > server1.log:1980:[info 2021/11/25 21:52:39.532 CET server1 store1 for oplog oplog#129> tid=0x5c] compaction did 3685 creates and updates > in 41 ms > server1.log:1982:[info 2021/11/25 21:52:39.532 CET server1 Task4> tid=0x5d] Deleted oplog#129 crf for disk store store1. > {code} > !screenshot-1.png|width=1123,height=268! > Below you can see the log and heap dump of an orphaned .drf Oplog that doesn't have > pendingKrfTags allocated in memory. This is because pendingKrfTags is cleared > when the .krf is created, as can be seen in the logs below. > {code:java} > server1.log:1976:[info 2021/11/25 21:52:39.491 CET server1 > tid=0x34] Created oplog#130 > drf for disk store store1. > server1.log:1978:[info 2021/11/25 21:52:39.493 CET server1 > tid=0x34] Created oplog#130 > crf for disk store store1. > server1.log:1998:[info 2021/11/25 21:52:41.131 CET server1 OplogCompactor> tid=0x5c] Created oplog#130 krf for disk store store1. 
> server1.log:2000:[info 2021/11/25 21:52:41.893 CET server1 store1 for oplog oplog#130> tid=0x5c] OplogCompactor for > store1 compaction oplog id(s): oplog#130 > server1.log:2002:[info 2021/11/25 21:52:41.958 CET server1 store1 for oplog oplog#130> tid=0x5c] compaction did 9918 > creates and updates in 64 ms > server1.log:2004:[info 2021/11/25 21:52:41.958 CET server1 Task4> tid=0x5d] Deleted oplog#130 crf for disk store store1. > server1.log:2006:[info 2021/11/25 21:52:41.958 CET server1 Task4> tid=0x5d] Deleted oplog#130 krf for disk store store1. > {code} > !screenshot-2.png|width=1123,height=268! -- This message was sent by Atlassian Jira (v8.20.1#820001)
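The observation above that an empty pendingKrfTags HashMap still consumes memory follows from how java.util.HashMap behaves: clear() nulls out the entries but keeps the backing table at its grown capacity. A small self-contained illustration (generic names, not Geode code):
{code:java}
import java.util.HashMap;
import java.util.Map;

// After heavy use, clear() leaves size() == 0 but the internal table keeps
// its expanded capacity, so the "empty" map still occupies memory until the
// map object itself becomes unreachable.
Map<Long, byte[]> tags = new HashMap<>();
for (long i = 0; i < 1_000_000; i++) {
  tags.put(i, new byte[0]);
}
tags.clear();           // empty, but the large backing array remains allocated
tags = new HashMap<>(); // dropping the old map lets the GC reclaim it
{code}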
[jira] [Commented] (GEODE-9854) Orphaned .drf files causing memory leak
[ https://issues.apache.org/jira/browse/GEODE-9854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456669#comment-17456669 ] ASF subversion and git services commented on GEODE-9854: Commit 324ed89c3d43a53466cf5aeb614b63e757ba8b23 in geode's branch refs/heads/develop from Jakov Varenina [ https://gitbox.apache.org/repos/asf?p=geode.git;h=324ed89 ] GEODE-9854: Orphaned .drf file causing memory leak (#7145) * GEODE-9854: Orphaned .drf file causing memory leak Issue: OpLog files are compacted, but the .drf file is left behind because it contains deletes of entries in previous .crfs. The .crf file is deleted, but the orphaned .drf is not deleted until all previous .crf files (.crfs with smaller ids) are deleted. The problem is that the compacted Oplog object representing the orphaned .drf file holds a structure in memory (Oplog.regionMap) that contains information that is not useful after the compaction, and it takes a certain amount of memory. Besides, there is a race condition in the code when creating .krf files that, depending on the execution order, could make the problem more severe (it could leave the pendingKrfTags structure on the regionMap, and this could take up a significant amount of memory). This pendingKrfTags HashMap is actually empty, but it consumes memory because it was used previously and the size of the HashMap was not reduced after it was cleared. This race condition usually happens when a new Oplog is rolled out and the previous Oplog is immediately marked as eligible for compaction. Compaction and .krf creation start at about the same time, and the compactor cancels creation of the .krf if it executes first. The pendingKrfTags structure is usually cleared when the .krf file is created, but since compaction canceled creation of the .krf, the pendingKrfTags structure remains in memory until the Oplog representing the orphaned .drf file is deleted. Solution: Clear the regionMap data structure of the Oplog when it is compacted (currently it is deleted only when the Oplog is destroyed). * Introduced inner static class RegionMap in Oplog. * RegionMap.get() will always return an empty map if it was closed before. * When closing a disk region, skip adding a drf-only oplog to the unrecovered map, and also don't try to remove it from regionMap (it was already removed during compaction). * The following test cases are introduced: 1. Recovery of a single region after the cache is closed and then recreated (testCompactorRegionMapDeletedForOnlyDrfOplogAfterCompactionAndRecoveryAfterCacheClosed) 2. Recovery of a single region after the region is closed and then recreated (testCompactorRegionMapDeletedForOnlyDrfOplogAfterCompactionAndRecoveryAfterRegionClose) Co-authored-by: Alberto Gomez > Orphaned .drf files causing memory leak > --- > > Key: GEODE-9854 > URL: https://issues.apache.org/jira/browse/GEODE-9854 > Project: Geode > Issue Type: Bug >Reporter: Jakov Varenina >Assignee: Jakov Varenina >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > Attachments: screenshot-1.png, screenshot-2.png, server1.log > > > Issue: > OpLog files are compacted, but the .drf file is left behind because it contains > deletes of entries in previous .crfs. The .crf file is deleted, but the > orphaned .drf is not deleted until all > previous .crf files (.crfs with smaller ids) are deleted. > The problem is that the compacted Oplog object representing the orphaned .drf file > holds a structure in memory (Oplog.regionMap) that contains information that > is not useful > after the compaction, and it takes a certain amount of memory. Besides, there is > a race condition in the code when creating .krf files that, depending on the > execution order, > could make the problem more severe (it could leave the pendingKrfTags structure > on the regionMap, and this could take up a significant amount of memory). This > pendingKrfTags HashMap is actually empty, but it consumes memory because it was > used previously and the size of the HashMap was not reduced after it was > cleared. > This race condition usually happens when a new Oplog is rolled out and the previous > Oplog is immediately marked as eligible for compaction. Compaction and .krf > creation start at > about the same time, and the compactor cancels creation of the .krf if it executes > first. The pendingKrfTags structure is usually cleared when the .krf file is > created, but since compaction canceled creation of the .krf, the pendingKrfTags > structure remains in memory until the Oplog representing the orphaned .drf file is > deleted. > Below it can be seen that the .krf is never created for the orphaned .drf > Oplog object that has memory allocated in pendingKrfTags: > {code:java} > server1.log:1956:[info 2021/11/25 21:52:26.866 CET server1 > tid=0x34] Created oplog#129 > drf for disk store store1. > server1
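Based on the bullet points in the commit message above, the new wrapper presumably behaves roughly as follows; this is a hedged sketch with assumed field and type names, not the actual Oplog.RegionMap code:
{code:java}
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of the described inner class: once close() runs (e.g. when the
// oplog is compacted), get() always returns an empty map, so the stale
// per-region data becomes eligible for garbage collection.
static final class RegionMapSketch<V> {
  private final AtomicReference<Map<Long, V>> ref =
      new AtomicReference<>(new ConcurrentHashMap<>());

  Map<Long, V> get() {
    Map<Long, V> m = ref.get();
    return m != null ? m : Collections.emptyMap();
  }

  void close() {
    ref.set(null); // every later get() sees the empty map
  }
}
{code}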
[jira] [Resolved] (GEODE-9854) Orphaned .drf files causing memory leak
[ https://issues.apache.org/jira/browse/GEODE-9854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Darrel Schneider resolved GEODE-9854. - Fix Version/s: 1.15.0 Assignee: Jakov Varenina (was: Darrel Schneider) Resolution: Fixed > Orphaned .drf files causing memory leak > --- > > Key: GEODE-9854 > URL: https://issues.apache.org/jira/browse/GEODE-9854 > Project: Geode > Issue Type: Bug >Reporter: Jakov Varenina >Assignee: Jakov Varenina >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > Attachments: screenshot-1.png, screenshot-2.png, server1.log > > > Issue: > OpLog files are compacted, but the .drf file is left behind because it contains > deletes of entries in previous .crfs. The .crf file is deleted, but the > orphaned .drf is not deleted until all > previous .crf files (.crfs with smaller ids) are deleted. > The problem is that the compacted Oplog object representing the orphaned .drf file > holds a structure in memory (Oplog.regionMap) that contains information that > is not useful > after the compaction, and it takes a certain amount of memory. Besides, there is > a race condition in the code when creating .krf files that, depending on the > execution order, > could make the problem more severe (it could leave the pendingKrfTags structure > on the regionMap, and this could take up a significant amount of memory). This > pendingKrfTags HashMap is actually empty, but it consumes memory because it was > used previously and the size of the HashMap was not reduced after it was > cleared. > This race condition usually happens when a new Oplog is rolled out and the previous > Oplog is immediately marked as eligible for compaction. Compaction and .krf > creation start at > about the same time, and the compactor cancels creation of the .krf if it executes > first. The pendingKrfTags structure is usually cleared when the .krf file is > created, but since compaction canceled creation of the .krf, the pendingKrfTags > structure remains in memory until the Oplog representing the orphaned .drf file is > deleted. > Below it can be seen that the .krf is never created for the orphaned .drf > Oplog object that has memory allocated in pendingKrfTags: > {code:java} > server1.log:1956:[info 2021/11/25 21:52:26.866 CET server1 > tid=0x34] Created oplog#129 > drf for disk store store1. > server1.log:1958:[info 2021/11/25 21:52:26.867 CET server1 > tid=0x34] Created oplog#129 > crf for disk store store1. > server1.log:1974:[info 2021/11/25 21:52:39.490 CET server1 store1 for oplog oplog#129> tid=0x5c] OplogCompactor for store1 compaction > oplog id(s): oplog#129 > server1.log:1980:[info 2021/11/25 21:52:39.532 CET server1 store1 for oplog oplog#129> tid=0x5c] compaction did 3685 creates and updates > in 41 ms > server1.log:1982:[info 2021/11/25 21:52:39.532 CET server1 Task4> tid=0x5d] Deleted oplog#129 crf for disk store store1. > {code} > !screenshot-1.png|width=1123,height=268! > Below you can see the log and heap dump of an orphaned .drf Oplog that doesn't have > pendingKrfTags allocated in memory. This is because pendingKrfTags is cleared > when the .krf is created, as can be seen in the logs below. > {code:java} > server1.log:1976:[info 2021/11/25 21:52:39.491 CET server1 > tid=0x34] Created oplog#130 > drf for disk store store1. > server1.log:1978:[info 2021/11/25 21:52:39.493 CET server1 > tid=0x34] Created oplog#130 > crf for disk store store1. > server1.log:1998:[info 2021/11/25 21:52:41.131 CET server1 OplogCompactor> tid=0x5c] Created oplog#130 krf for disk store store1. 
> server1.log:2000:[info 2021/11/25 21:52:41.893 CET server1 store1 for oplog oplog#130> tid=0x5c] OplogCompactor for > store1 compaction oplog id(s): oplog#130 > server1.log:2002:[info 2021/11/25 21:52:41.958 CET server1 store1 for oplog oplog#130> tid=0x5c] compaction did 9918 > creates and updates in 64 ms > server1.log:2004:[info 2021/11/25 21:52:41.958 CET server1 Task4> tid=0x5d] Deleted oplog#130 crf for disk store store1. > server1.log:2006:[info 2021/11/25 21:52:41.958 CET server1 Task4> tid=0x5d] Deleted oplog#130 krf for disk store store1. > {code} > !screenshot-2.png|width=1123,height=268! -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (GEODE-9882) User Guide, Micrometer section, fix product_name typo
Dave Barnes created GEODE-9882: -- Summary: User Guide, Micrometer section, fix product_name typo Key: GEODE-9882 URL: https://issues.apache.org/jira/browse/GEODE-9882 Project: Geode Issue Type: Bug Components: docs Reporter: Dave Barnes On page https://geode.apache.org/docs/guide/114/tools_modules/micrometer/micrometer-meters.html, the product name fails to display due to a typo in the variable syntax. Fix it. There are other types of meters available in Micrometer, but they are not currently being used in . Should be "used in Apache Geode." Change `<%vars.product_name%>` to `<%=vars.product_name%>`. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (GEODE-9882) User Guide, Micrometer section, fix product_name typo
[ https://issues.apache.org/jira/browse/GEODE-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dave Barnes reassigned GEODE-9882: -- Assignee: Dave Barnes > User Guide, Micrometer section, fix product_name typo > - > > Key: GEODE-9882 > URL: https://issues.apache.org/jira/browse/GEODE-9882 > Project: Geode > Issue Type: Bug > Components: docs >Reporter: Dave Barnes >Assignee: Dave Barnes >Priority: Major > > On page > https://geode.apache.org/docs/guide/114/tools_modules/micrometer/micrometer-meters.html, > the product name fails to display due to a typo in the variable syntax. Fix > it. > There are other types of meters available in Micrometer, but they are not > currently being used in . > Should be "used in Apache Geode." > Change `<%vars.product_name%>` to `<%=vars.product_name%>`. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (GEODE-9882) User Guide, Micrometer section, fix product_name typo
[ https://issues.apache.org/jira/browse/GEODE-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated GEODE-9882: -- Labels: pull-request-available (was: ) > User Guide, Micrometer section, fix product_name typo > - > > Key: GEODE-9882 > URL: https://issues.apache.org/jira/browse/GEODE-9882 > Project: Geode > Issue Type: Bug > Components: docs >Reporter: Dave Barnes >Assignee: Dave Barnes >Priority: Major > Labels: pull-request-available > > On page > https://geode.apache.org/docs/guide/114/tools_modules/micrometer/micrometer-meters.html, > the product name fails to display due to a typo in the variable syntax. Fix > it. > There are other types of meters available in Micrometer, but they are not > currently being used in . > Should be "used in Apache Geode." > Change `<%vars.product_name%>` to `<%=vars.product_name%>`. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (GEODE-9882) User Guide, Micrometer section, fix product_name typo
[ https://issues.apache.org/jira/browse/GEODE-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456679#comment-17456679 ] ASF subversion and git services commented on GEODE-9882: Commit 3b133c3088a2397c19c935979aa2ab2fd751a765 in geode's branch refs/heads/develop from Dave Barnes [ https://gitbox.apache.org/repos/asf?p=geode.git;h=3b133c3 ] GEODE-9882: User Guide, Micrometer section, fix product_name typo (#7181) > User Guide, Micrometer section, fix product_name typo > - > > Key: GEODE-9882 > URL: https://issues.apache.org/jira/browse/GEODE-9882 > Project: Geode > Issue Type: Bug > Components: docs >Reporter: Dave Barnes >Assignee: Dave Barnes >Priority: Major > Labels: pull-request-available > > On page > https://geode.apache.org/docs/guide/114/tools_modules/micrometer/micrometer-meters.html, > the product name fails to display due to a typo in the variable syntax. Fix > it. > There are other types of meters available in Micrometer, but they are not > currently being used in . > Should be "used in Apache Geode." > Change `<%vars.product_name%>` to `<%=vars.product_name%>`. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (GEODE-9882) User Guide, Micrometer section, fix product_name typo
[ https://issues.apache.org/jira/browse/GEODE-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456682#comment-17456682 ] ASF subversion and git services commented on GEODE-9882: Commit 47465165256e076112cfcaaadeb7aa365cb1b29d in geode's branch refs/heads/support/1.12 from Dave Barnes [ https://gitbox.apache.org/repos/asf?p=geode.git;h=4746516 ] GEODE-9882: User Guide, Micrometer section, fix product_name typo (#7181) > User Guide, Micrometer section, fix product_name typo > - > > Key: GEODE-9882 > URL: https://issues.apache.org/jira/browse/GEODE-9882 > Project: Geode > Issue Type: Bug > Components: docs >Reporter: Dave Barnes >Assignee: Dave Barnes >Priority: Major > Labels: pull-request-available > > On page > https://geode.apache.org/docs/guide/114/tools_modules/micrometer/micrometer-meters.html, > the product name fails to display due to a typo in the variable syntax. Fix > it. > There are other types of meters available in Micrometer, but they are not > currently being used in . > Should be "used in Apache Geode." > Change `<%vars.product_name%>` to `<%=vars.product_name%>`. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (GEODE-9882) User Guide, Micrometer section, fix product_name typo
[ https://issues.apache.org/jira/browse/GEODE-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456683#comment-17456683 ] ASF subversion and git services commented on GEODE-9882: Commit baacba121f98dcc860bdb954550e6e01c4d9e6e4 in geode's branch refs/heads/support/1.13 from Dave Barnes [ https://gitbox.apache.org/repos/asf?p=geode.git;h=baacba1 ] GEODE-9882: User Guide, Micrometer section, fix product_name typo (#7181) > User Guide, Micrometer section, fix product_name typo > - > > Key: GEODE-9882 > URL: https://issues.apache.org/jira/browse/GEODE-9882 > Project: Geode > Issue Type: Bug > Components: docs >Reporter: Dave Barnes >Assignee: Dave Barnes >Priority: Major > Labels: pull-request-available > > On page > https://geode.apache.org/docs/guide/114/tools_modules/micrometer/micrometer-meters.html, > the product name fails to display due to a typo in the variable syntax. Fix > it. > There are other types of meters available in Micrometer, but they are not > currently being used in . > Should be "used in Apache Geode." > Change `<%vars.product_name%>` to `<%=vars.product_name%>`. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (GEODE-9882) User Guide, Micrometer section, fix product_name typo
[ https://issues.apache.org/jira/browse/GEODE-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dave Barnes resolved GEODE-9882. Fix Version/s: 1.12.6 1.13.5 1.14.1 1.15.0 Resolution: Fixed > User Guide, Micrometer section, fix product_name typo > - > > Key: GEODE-9882 > URL: https://issues.apache.org/jira/browse/GEODE-9882 > Project: Geode > Issue Type: Bug > Components: docs >Reporter: Dave Barnes >Assignee: Dave Barnes >Priority: Major > Labels: pull-request-available > Fix For: 1.12.6, 1.13.5, 1.14.1, 1.15.0 > > > On page > https://geode.apache.org/docs/guide/114/tools_modules/micrometer/micrometer-meters.html, > the product name fails to display due to a typo in the variable syntax. Fix > it. > There are other types of meters available in Micrometer, but they are not > currently being used in . > Should be "used in Apache Geode." > Change `<%vars.product_name%>` to `<%=vars.product_name%>`. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (GEODE-9882) User Guide, Micrometer section, fix product_name typo
[ https://issues.apache.org/jira/browse/GEODE-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456685#comment-17456685 ] ASF subversion and git services commented on GEODE-9882: Commit 6b0413ccdf22d216f9f8d855b6159ecaff29c1ce in geode's branch refs/heads/support/1.14 from Dave Barnes [ https://gitbox.apache.org/repos/asf?p=geode.git;h=6b0413c ] GEODE-9882: User Guide, Micrometer section, fix product_name typo (#7181) > User Guide, Micrometer section, fix product_name typo > - > > Key: GEODE-9882 > URL: https://issues.apache.org/jira/browse/GEODE-9882 > Project: Geode > Issue Type: Bug > Components: docs >Reporter: Dave Barnes >Assignee: Dave Barnes >Priority: Major > Labels: pull-request-available > Fix For: 1.12.6, 1.13.5, 1.14.1, 1.15.0 > > > On page > https://geode.apache.org/docs/guide/114/tools_modules/micrometer/micrometer-meters.html, > the product name fails to display due to a typo in the variable syntax. Fix > it. > There are other types of meters available in Micrometer, but they are not > currently being used in . > Should be "used in Apache Geode." > Change `<%vars.product_name%>` to `<%=vars.product_name%>`. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (GEODE-9822) Split-brain Certain During Network Partition in Two-Locator Cluster
[ https://issues.apache.org/jira/browse/GEODE-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Burcham updated GEODE-9822: Description: In a two-locator cluster with default member weights and default setting (true) of enable-network-partition-detection, if a long-lived network partition separates the two members, a split-brain will arise: there will be two coordinators at the same time. The reason for this can be found in the GMSJoinLeave.isNetworkPartition() method. That method's name is misleading. A name like isMajorityLost() would probably be more apt. It needs to return true iff the weight of "crashed" members (in the prospective view) is greater-than-or-equal-to half (50%) of the total weight (of all members in the current view). What the method actually does is return true iff the weight of "crashed" members is greater-than 51% of the total weight. As a result, if we have two members of equal weight, and the coordinator sees that the non-coordinator is "crashed", the coordinator will keep running. If a network partition is happening, and the non-coordinator is still running, then it will become a coordinator and start producing views. Now we'll have two coordinators producing views concurrently. For this discussion "crashed" members are members for which the coordinator has received a RemoveMemberRequest message. These are members that the failure detector has deemed failed. Keep in mind the failure detector is imperfect (it's not always right), and that's kind of the whole point of this ticket: we've lost contact with the non-coordinator member, but that doesn't mean it can't still be running (on the other side of a partition). This bug is not limited to the two-locator scenario. Any set of members that can be partitioned into two equal sets is susceptible. In fact it's even a little worse than that. Any set of members that can be partitioned into two sets, both of which still have 49% or more of the total weight, will result in a split-brain. was: In a two-locator cluster with default member weights and default setting (true) of enable-network-partition-detection, if a long-lived network partition separates the two members, a split-brain will arise: there will be two coordinators at the same time. The reason for this can be found in the GMSJoinLeave.isNetworkPartition() method. That method's name is misleading. A name like isMajorityLost() would probably be more apt. It needs to return true iff the weight of "crashed" members (in the prospective view) is greater-than-or-equal-to half (50%) of the total weight (of all members in the current view). What the method actually does is return true iff the weight of "crashed" members is greater-than 51% of the total weight. As a result, if we have two members of equal weight, and the coordinator sees that the non-coordinator is "crashed", the coordinator will keep running. If a network partition is happening, and the non-coordinator is still running, then it will become a coordinator and start producing views. Now we'll have two coordinators producing views concurrently. For this discussion "crashed" members are members for which the coordinator has received a RemoveMemberRequest message. These are members that the failure detector has deemed failed. Keep in mind the failure detector is imperfect (it's not always right), and that's kind of the whole point of this ticket: we've lost contact with the non-coordinator member, but that doesn't mean it can't still be running (on the other side of a partition). 
> Split-brain Certain During Network Partition in Two-Locator Cluster > --- > > Key: GEODE-9822 > URL: https://issues.apache.org/jira/browse/GEODE-9822 > Project: Geode > Issue Type: Bug > Components: membership >Reporter: Bill Burcham >Priority: Major > Labels: pull-request-available > > In a two-locator cluster with default member weights and default setting > (true) of enable-network-partition-detection, if a long-lived network > partition separates the two members, a split-brain will arise: there will be > two coordinators at the same time. > The reason for this can be found in the GMSJoinLeave.isNetworkPartition() > method. That method's name is misleading. A name like isMajorityLost() would > probably be more apt. It needs to return true iff the weight of "crashed" > members (in the prospective view) is greater-than-or-equal-to half (50%) of > the total weight (of all members in the current view). > What the method actually does is return true iff the weight of "crashed" > members is greater-than 51% of the total weight. As a result, if we have two > members of equal weight, and the coordinator sees that the non-coordinator is > "crashed", the coordina
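The corrected predicate described in this ticket amounts to comparing the crashed weight against half of the total weight with >= rather than against a 51% threshold with >. A minimal sketch (hypothetical method and parameter names, not the actual GMSJoinLeave code):
{code:java}
// Majority is lost when crashed members hold at least half the total weight.
// The reported bug is equivalent to "crashedWeight > 0.51 * totalWeight",
// which lets two equal 50/50 partitions both keep running as coordinators.
static boolean isMajorityLost(int crashedWeight, int totalWeight) {
  return crashedWeight * 2 >= totalWeight; // >= 50%, integer math avoids rounding
}
{code}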
[jira] [Updated] (GEODE-9883) Review and Cleanup geode_for_redis.html.md.erb
[ https://issues.apache.org/jira/browse/GEODE-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wayne updated GEODE-9883: - Affects Version/s: 1.15.0 > Review and Cleanup geode_for_redis.html.md.erb > -- > > Key: GEODE-9883 > URL: https://issues.apache.org/jira/browse/GEODE-9883 > Project: Geode > Issue Type: Task > Components: redis >Affects Versions: 1.15.0 >Reporter: Wayne >Priority: Major > > Looking at what we have in the geode docs, a few things are out of date. > * This page still references the geode-for-redis-password, which doesn't > exist any more. It should probably talk about how redis interacts with the > security manager. > * Probably should mention how to configure TLS properties for > geode-for-redis. > * The redis-cli command, I think, is missing the -c option to use cluster mode. > * The supported redis commands listed on that page are incomplete. > * The Advantages section doesn't mention synchronous replication. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (GEODE-9883) Review and Cleanup geode_for_redis.html.md.erb
Wayne created GEODE-9883: Summary: Review and Cleanup geode_for_redis.html.md.erb Key: GEODE-9883 URL: https://issues.apache.org/jira/browse/GEODE-9883 Project: Geode Issue Type: Task Components: redis Reporter: Wayne Looking at what we have in the geode docs, a few things are out of date. * This page still references the geode-for-redis-password, which doesn't exist any more. It should probably talk about how redis interacts with the security manager. * Probably should mention how to configure TLS properties for geode-for-redis. * The redis-cli command, I think, is missing the -c option to use cluster mode. * The supported redis commands listed on that page are incomplete. * The Advantages section doesn't mention synchronous replication. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (GEODE-9883) Review and Cleanup geode_for_redis.html.md.erb
[ https://issues.apache.org/jira/browse/GEODE-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wayne updated GEODE-9883: - Labels: release-blocker (was: ) > Review and Cleanup geode_for_redis.html.md.erb > -- > > Key: GEODE-9883 > URL: https://issues.apache.org/jira/browse/GEODE-9883 > Project: Geode > Issue Type: Task > Components: redis >Affects Versions: 1.15.0 >Reporter: Wayne >Priority: Major > Labels: release-blocker > > Looking at what we have in the geode docs, a few things are out of date. > * This page still references the geode-for-redis-password, which doesn't > exist any more. It should probably talk about how redis interacts with the > security manager. > * Probably should mention how to configure TLS properties for > geode-for-redis. > * The redis-cli command, I think, is missing the -c option to use cluster mode. > * The supported redis commands listed on that page are incomplete. > * The Advantages section doesn't mention synchronous replication. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (GEODE-9880) Cluster with multiple locators in an environment with no host name resolution, leads to null pointer exception
[ https://issues.apache.org/jira/browse/GEODE-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ernest Burghardt updated GEODE-9880: Labels: membership (was: ) > Cluster with multiple locators in an environment with no host name > resolution, leads to null pointer exception > -- > > Key: GEODE-9880 > URL: https://issues.apache.org/jira/browse/GEODE-9880 > Project: Geode > Issue Type: Bug > Components: locator >Affects Versions: 1.12.5 >Reporter: Tigran Ghahramanyan >Priority: Major > Labels: membership > > In our use case we have two locators that are initially configured with IP > addresses, but the _AutoConnectionSourceImpl.UpdateLocatorList()_ flow keeps > adding their corresponding host names to the locator list, even though these host > names are not resolvable. > Later, in {_}AutoConnectionSourceImpl.queryLocators(){_}, whenever a client > tries to use such a non-resolvable host name to connect to a locator, it tries > to establish a connection to {_}socketaddr=0.0.0.0{_}, as written in > {_}SocketCreator.connect(){_}, which seems strange. > Then, if there is no locator running on the same host, the next locator in > the list is contacted, until reaching a locator contact configured with an IP > address, which eventually succeeds. > But when there happens to be a locator listening on the same host, we get > a null pointer exception in the second line below, because _inetadd=null_: > _socket.connect(sockaddr, Math.max(timeout, 0)); // sockaddr=0.0.0.0, > connects to a locator listening on the same host_ > _configureClientSSLSocket(socket, inetadd.getHostName(), timeout); // inetadd > = null_ > > As a result, the cluster ends up in a failed state, unable to recover. -- This message was sent by Atlassian Jira (v8.20.1#820001)
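From the two lines quoted in the report, a null check before the SSL configuration call would avoid the NPE when host name resolution fails; the sketch below is illustrative (the surrounding identifiers are taken from the report, but the guard itself is an assumption, not the actual Geode fix):
{code:java}
// Per the report, sockaddr resolves to 0.0.0.0 and inetadd stays null when
// the configured host name cannot be resolved.
socket.connect(sockaddr, Math.max(timeout, 0));
String peerHostName = (inetadd != null)
    ? inetadd.getHostName()
    : sockaddr.getHostString(); // fall back to the configured host string
configureClientSSLSocket(socket, peerHostName, timeout);
{code}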
[jira] [Created] (GEODE-9884) update CI max_in_flight limits
Owen Nichols created GEODE-9884: --- Summary: update CI max_in_flight limits Key: GEODE-9884 URL: https://issues.apache.org/jira/browse/GEODE-9884 Project: Geode Issue Type: Improvement Components: ci Reporter: Owen Nichols max_in_flight limits are set on the main CI pipeline to avoid overloading concourse when a large number of commits come through at the same time. These limits were last calculated a few years ago based on the average time each job takes; many jobs now take much longer, so the limits should be recalculated. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (GEODE-9738) CI failure: RollingUpgradeRollServersOnReplicatedRegion_dataserializable failed with DistributedSystemDisconnectedException
[ https://issues.apache.org/jira/browse/GEODE-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ernest Burghardt reassigned GEODE-9738: --- Assignee: (was: Bill Burcham) > CI failure: RollingUpgradeRollServersOnReplicatedRegion_dataserializable > failed with DistributedSystemDisconnectedException > --- > > Key: GEODE-9738 > URL: https://issues.apache.org/jira/browse/GEODE-9738 > Project: Geode > Issue Type: Bug > Components: membership, messaging >Affects Versions: 1.15.0 >Reporter: Kamilla Aslami >Priority: Major > Labels: needsTriage > Attachments: GEODE-9738-short.log.all, controller.log, locator.log, > vm0.log, vm1.log, vm2.log, vm3.log > > > {noformat} > RollingUpgradeRollServersOnReplicatedRegion_dataserializable > > testRollServersOnReplicatedRegion_dataserializable[from_v1.13.4] FAILED > java.lang.AssertionError: Suspicious strings were written to the log > during this run. > Fix the strings or use IgnoredException.addIgnoredException to ignore. > --- > Found suspect string in 'dunit_suspect-vm2.log' at line 685[fatal > 2021/10/14 00:24:14.739 UTC tid=115] Uncaught exception > in thread Thread[FederatingManager6,5,RMI Runtime] > org.apache.geode.management.ManagementException: > org.apache.geode.distributed.DistributedSystemDisconnectedException: > Distribution manager on > heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm2:580278:locator):54751 > started at Thu Oct 14 00:23:52 UTC 2021: Message distribution has terminated > at > org.apache.geode.management.internal.FederatingManager.addMemberArtifacts(FederatingManager.java:486) > at > org.apache.geode.management.internal.FederatingManager$AddMemberTask.call(FederatingManager.java:596) > at > org.apache.geode.management.internal.FederatingManager.lambda$addMember$1(FederatingManager.java:199) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:829) > Caused by: > org.apache.geode.distributed.DistributedSystemDisconnectedException: > Distribution manager on > heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm2:580278:locator):54751 > started at Thu Oct 14 00:23:52 UTC 2021: Message distribution has terminated > at > org.apache.geode.distributed.internal.ClusterDistributionManager$Stopper.generateCancelledException(ClusterDistributionManager.java:2885) > at > org.apache.geode.distributed.internal.InternalDistributedSystem$Stopper.generateCancelledException(InternalDistributedSystem.java:1177) > at > org.apache.geode.internal.cache.GemFireCacheImpl$Stopper.generateCancelledException(GemFireCacheImpl.java:5212) > at > org.apache.geode.CancelCriterion.checkCancelInProgress(CancelCriterion.java:83) > at > org.apache.geode.internal.cache.CreateRegionProcessor.initializeRegion(CreateRegionProcessor.java:121) > at > org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1164) > at > org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1095) > at > org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3108) > at > org.apache.geode.internal.cache.InternalRegionFactory.create(InternalRegionFactory.java:78) > at > org.apache.geode.management.internal.FederatingManager.addMemberArtifacts(FederatingManager.java:429) > ... 
5 more > at org.junit.Assert.fail(Assert.java:89) > at > org.apache.geode.test.dunit.internal.DUnitLauncher.closeAndCheckForSuspects(DUnitLauncher.java:420) > at > org.apache.geode.test.dunit.internal.DUnitLauncher.closeAndCheckForSuspects(DUnitLauncher.java:436) > at > org.apache.geode.test.dunit.internal.JUnit4DistributedTestCase.cleanupAllVms(JUnit4DistributedTestCase.java:551) > at > org.apache.geode.test.dunit.internal.JUnit4DistributedTestCase.doTearDownDistributedTestCase(JUnit4DistributedTestCase.java:498) > at > org.apache.geode.test.dunit.internal.JUnit4DistributedTestCase.tearDownDistributedTestCase(JUnit4DistributedTestCase.java:481) > at jdk.internal.reflect.GeneratedMethodAccessor11.invoke(Unknown > Source) > at > jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.m
[jira] [Assigned] (GEODE-9877) GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse failed
[ https://issues.apache.org/jira/browse/GEODE-9877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jens Deppe reassigned GEODE-9877: - Assignee: Jens Deppe > GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse failed > -- > > Key: GEODE-9877 > URL: https://issues.apache.org/jira/browse/GEODE-9877 > Project: Geode > Issue Type: Bug > Components: redis >Affects Versions: 1.15.0 >Reporter: Mark Hanson >Assignee: Jens Deppe >Priority: Major > Labels: pull-request-available > > [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/acceptance-test-openjdk8/builds/43] > failed with > GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse > {noformat} > java.net.BindException: Address already in use (Bind failed) > at java.net.PlainSocketImpl.socketBind(Native Method) > at > java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387) > at java.net.Socket.bind(Socket.java:662) > at > org.apache.geode.redis.GeodeRedisServerStartupDUnitTest.startupFailsGivenPortAlreadyInUse(GeodeRedisServerStartupDUnitTest.java:115) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.apache.geode.test.dunit.rules.ClusterStartupRule$1.evaluate(ClusterStartupRule.java:138) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > at > org.apache.geode.test.junit.rules.DescribedExternalResource$1.evaluate(DescribedExternalResource.java:40) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at org.junit.runners.ParentRunner.run(ParentRunner.java:413) > at org.junit.runner.JUnitCore.run(JUnitCore.java:137) > at org.junit.runner.JUnitCore.run(JUnitCore.java:115) > at > org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) > at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) > at > 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) > at > java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at > java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485) > at > org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:82) > at > org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:73) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:108) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54) > a
[jira] [Commented] (GEODE-9877) GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse failed
[ https://issues.apache.org/jira/browse/GEODE-9877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456712#comment-17456712 ] ASF subversion and git services commented on GEODE-9877: Commit 310c647da6ee4cc4a1eadc6df174d998e69afb31 in geode's branch refs/heads/develop from Jens Deppe [ https://gitbox.apache.org/repos/asf?p=geode.git;h=310c647 ] GEODE-9877: Use ServerSocket to create interfering port (#7180) - For some unknown reason `startupFailsGivenPortAlreadyInUse` started to fail after a seemingly innocuous Ubuntu base image bump. The problem may also have been triggered by arbitrary test ordering changes since the test did not fail on its own, but only in conjunction with running other tests beforehand. Specifically, the test was failing when binding the interfering port (bind exception). The port used was always in the TIME_WAIT state left from previous tests. Using a `ServerSocket`, instead of a regular socket, fixes the problem since it actually 'uses' the port and implicitly allows for port reuse. - Use ServerSocket consistently. Rename test to be more appropriate > GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse failed > -- > > Key: GEODE-9877 > URL: https://issues.apache.org/jira/browse/GEODE-9877 > Project: Geode > Issue Type: Bug > Components: redis >Affects Versions: 1.15.0 >Reporter: Mark Hanson >Priority: Major > Labels: pull-request-available > > [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/acceptance-test-openjdk8/builds/43] > failed with > GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse > {noformat} > java.net.BindException: Address already in use (Bind failed) > at java.net.PlainSocketImpl.socketBind(Native Method) > at > java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387) > at java.net.Socket.bind(Socket.java:662) > at > org.apache.geode.redis.GeodeRedisServerStartupDUnitTest.startupFailsGivenPortAlreadyInUse(GeodeRedisServerStartupDUnitTest.java:115) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.apache.geode.test.dunit.rules.ClusterStartupRule$1.evaluate(ClusterStartupRule.java:138) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) 
> at > org.apache.geode.test.junit.rules.DescribedExternalResource$1.evaluate(DescribedExternalResource.java:40) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at org.junit.runners.ParentRunner.run(ParentRunner.java:413) > at org.junit.runner.JUnitCore.run(JUnitCore.java:137) > at org.junit.runner.JUnitCore.run(JUnitCore.java:115) > at > org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) > at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) > at > java.util.stream.AbstractPipeline.wrapAndCo
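The approach in the commit message above can be illustrated with a small sketch: occupy the port with a bound, listening ServerSocket so that a second bind deterministically fails, even if the port lingers in TIME_WAIT from an earlier test (illustrative code, not the test itself; `port` is assumed to be chosen elsewhere):
{code:java}
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Occupy the port with a listening ServerSocket. Setting SO_REUSEADDR
// before binding lets the bind succeed even when the port is in TIME_WAIT
// from earlier tests, while still making the port "in use" for the server.
try (ServerSocket interfering = new ServerSocket()) {
  interfering.setReuseAddress(true);
  interfering.bind(new InetSocketAddress("localhost", port));
  // ... start the server under test on the same port and assert startup fails ...
}
{code}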
[jira] [Resolved] (GEODE-9877) GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse failed
[ https://issues.apache.org/jira/browse/GEODE-9877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jens Deppe resolved GEODE-9877. --- Fix Version/s: 1.15.0 Resolution: Fixed > GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse failed > -- > > Key: GEODE-9877 > URL: https://issues.apache.org/jira/browse/GEODE-9877 > Project: Geode > Issue Type: Bug > Components: redis >Affects Versions: 1.15.0 >Reporter: Mark Hanson >Assignee: Jens Deppe >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > > [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/acceptance-test-openjdk8/builds/43] > failed with > GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse > {noformat} > java.net.BindException: Address already in use (Bind failed) > at java.net.PlainSocketImpl.socketBind(Native Method) > at > java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387) > at java.net.Socket.bind(Socket.java:662) > at > org.apache.geode.redis.GeodeRedisServerStartupDUnitTest.startupFailsGivenPortAlreadyInUse(GeodeRedisServerStartupDUnitTest.java:115) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.apache.geode.test.dunit.rules.ClusterStartupRule$1.evaluate(ClusterStartupRule.java:138) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > at > org.apache.geode.test.junit.rules.DescribedExternalResource$1.evaluate(DescribedExternalResource.java:40) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at org.junit.runners.ParentRunner.run(ParentRunner.java:413) > at org.junit.runner.JUnitCore.run(JUnitCore.java:137) > at org.junit.runner.JUnitCore.run(JUnitCore.java:115) > at > org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) > at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) > 
at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) > at > java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at > java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485) > at > org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:82) > at > org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:73) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:108) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$exe
[jira] [Commented] (GEODE-9885) StringsDUnitTest.givenBucketsMoveDuringAppend_thenDataIsNotLost fails with duplicated append
[ https://issues.apache.org/jira/browse/GEODE-9885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456713#comment-17456713 ] Geode Integration commented on GEODE-9885: -- Seen in [distributed-test-openjdk11 #47|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/distributed-test-openjdk11/builds/47] ... see [test results|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0723/test-results/distributedTest/1639077554/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0723/test-artifacts/1639077554/distributedtestfiles-openjdk11-1.15.0-build.0723.tgz]. > StringsDUnitTest.givenBucketsMoveDuringAppend_thenDataIsNotLost fails with > duplicated append > > > Key: GEODE-9885 > URL: https://issues.apache.org/jira/browse/GEODE-9885 > Project: Geode > Issue Type: Bug > Components: redis >Affects Versions: 1.15.0 >Reporter: Ray Ingles >Priority: Major > > The test appends a lot of strings to a key. It wound up adding (at least one) > extra string to the stored string: > > {{java.util.concurrent.ExecutionException: java.lang.AssertionError: > unexpected -\{append0}-key-3-27680- at index 27681 iterationCount=61995 in > string}} > > The string "\{append0}-key-3-27680-" appeared twice in sequence. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (GEODE-9885) StringsDUnitTest.givenBucketsMoveDuringAppend_thenDataIsNotLost fails with duplicated append
Ray Ingles created GEODE-9885: - Summary: StringsDUnitTest.givenBucketsMoveDuringAppend_thenDataIsNotLost fails with duplicated append Key: GEODE-9885 URL: https://issues.apache.org/jira/browse/GEODE-9885 Project: Geode Issue Type: Bug Components: redis Affects Versions: 1.15.0 Reporter: Ray Ingles The test appends a lot of strings to a key. It wound up adding (at least one) extra string to the stored string: {{java.util.concurrent.ExecutionException: java.lang.AssertionError: unexpected -\{append0}-key-3-27680- at index 27681 iterationCount=61995 in string}} The string "\{append0}-key-3-27680-" appeared twice in sequence. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (GEODE-9872) DistTXPersistentDebugDUnitTest tests fail because "cluster configuration service not available"
[ https://issues.apache.org/jira/browse/GEODE-9872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dale Emery resolved GEODE-9872. --- Fix Version/s: 1.15.0 Resolution: Fixed > DistTXPersistentDebugDUnitTest tests fail because "cluster configuration > service not available" > --- > > Key: GEODE-9872 > URL: https://issues.apache.org/jira/browse/GEODE-9872 > Project: Geode > Issue Type: Bug > Components: tests >Reporter: Bill Burcham >Assignee: Dale Emery >Priority: Major > Labels: GeodeOperationAPI, pull-request-available > Fix For: 1.15.0 > > > I suspect this failure is due to something in the test framework, or perhaps > one or more tests failing to manage ports correctly, allowing two or more > tests to interfere with one another. > In this distributed test: > [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/388] > we see two failures. Here's the first full stack trace: > > > {code:java} > [error 2021/12/04 20:40:53.796 UTC > tid=33] org.apache.geode.GemFireConfigException: cluster configuration > service not available > at > org.junit.vintage.engine.execution.TestRun.getStoredResultOrSuccessful(TestRun.java:196) > at > org.junit.vintage.engine.execution.RunListenerAdapter.fireExecutionFinished(RunListenerAdapter.java:226) > at > org.junit.vintage.engine.execution.RunListenerAdapter.testFinished(RunListenerAdapter.java:192) > at > org.junit.vintage.engine.execution.RunListenerAdapter.testFinished(RunListenerAdapter.java:79) > at > org.junit.runner.notification.SynchronizedRunListener.testFinished(SynchronizedRunListener.java:87) > at > org.junit.runner.notification.RunNotifier$9.notifyListener(RunNotifier.java:225) > at > org.junit.runner.notification.RunNotifier$SafeNotifier.run(RunNotifier.java:72) > at > org.junit.runner.notification.RunNotifier.fireTestFinished(RunNotifier.java:222) > at > org.junit.internal.runners.model.EachTestNotifier.fireTestFinished(EachTestNotifier.java:38) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:372) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at org.junit.runners.ParentRunner.run(ParentRunner.java:413) > at org.junit.runner.JUnitCore.run(JUnitCore.java:137) > at org.junit.runner.JUnitCore.run(JUnitCore.java:115) > at > org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) > at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at > java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) > at > 
java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) > at > java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at > java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485) > at > org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:82) > at > org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:73) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:108) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88) > at > org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54) > at > org.junit.p
[jira] [Resolved] (GEODE-9622) CI Failure: ClientServerTransactionFailoverWithMixedVersionServersDistributedTest fails with BindException
[ https://issues.apache.org/jira/browse/GEODE-9622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dale Emery resolved GEODE-9622. --- Fix Version/s: 1.15.0 Resolution: Fixed > CI Failure: > ClientServerTransactionFailoverWithMixedVersionServersDistributedTest fails > with BindException > -- > > Key: GEODE-9622 > URL: https://issues.apache.org/jira/browse/GEODE-9622 > Project: Geode > Issue Type: Bug > Components: client/server, tests >Reporter: Kirk Lund >Assignee: Dale Emery >Priority: Major > Labels: GeodeOperationAPI, pull-request-available > Fix For: 1.15.0 > > > {noformat} > org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest > > clientTransactionOperationsAreNotLostIfTransactionIsOnRolledServer FAILED > org.apache.geode.test.dunit.RMIException: While invoking > org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest$$Lambda$68/194483270.call > in VM 5 running on Host > heavy-lifter-2b8702f1-fe32-5895-9b14-832b7049b607.c.apachegeode-ci.internal > with 6 VMs > at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:631) > at org.apache.geode.test.dunit.VM.invoke(VM.java:473) > at > org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.rollLocatorToCurrent(ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.java:216) > at > org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.setupPartiallyRolledVersion(ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.java:171) > at > org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.clientTransactionOperationsAreNotLostIfTransactionIsOnRolledServer(ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.java:127) > Caused by: > java.net.BindException: Failed to create server socket on > heavy-lifter-2b8702f1-fe32-5895-9b14-832b7049b607.c.apachegeode-ci.internal/10.0.0.60[43535] > at > org.apache.geode.distributed.internal.tcpserver.ClusterSocketCreatorImpl.createServerSocket(ClusterSocketCreatorImpl.java:75) > at > org.apache.geode.internal.net.SCClusterSocketCreator.createServerSocket(SCClusterSocketCreator.java:55) > at > org.apache.geode.distributed.internal.tcpserver.ClusterSocketCreatorImpl.createServerSocket(ClusterSocketCreatorImpl.java:54) > at > org.apache.geode.distributed.internal.tcpserver.TcpServer.initializeServerSocket(TcpServer.java:196) > at > org.apache.geode.distributed.internal.tcpserver.TcpServer.startServerThread(TcpServer.java:183) > at > org.apache.geode.distributed.internal.tcpserver.TcpServer.start(TcpServer.java:178) > at > org.apache.geode.distributed.internal.membership.gms.locator.MembershipLocatorImpl.start(MembershipLocatorImpl.java:112) > at > org.apache.geode.distributed.internal.InternalLocator.startPeerLocation(InternalLocator.java:653) > at > org.apache.geode.distributed.internal.InternalLocator.startLocator(InternalLocator.java:394) > at > org.apache.geode.distributed.internal.InternalLocator.startLocator(InternalLocator.java:345) > at > org.apache.geode.distributed.Locator.startLocator(Locator.java:261) > at > org.apache.geode.distributed.Locator.startLocatorAndDS(Locator.java:207) > at > org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.startLocator(ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.java:180) > at > 
org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.lambda$rollLocatorToCurrent$92d5d92a$1(ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.java:216) > Caused by: > java.net.BindException: Address already in use (Bind failed) > at java.net.PlainSocketImpl.socketBind(Native Method) > at > java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387) > at java.net.ServerSocket.bind(ServerSocket.java:390) > at > org.apache.geode.distributed.internal.tcpserver.ClusterSocketCreatorImpl.createServerSocket(ClusterSocketCreatorImpl.java:72) > ... 13 more > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (GEODE-9622) CI Failure: ClientServerTransactionFailoverWithMixedVersionServersDistributedTest fails with BindException
[ https://issues.apache.org/jira/browse/GEODE-9622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dale Emery closed GEODE-9622. - -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (GEODE-9872) DistTXPersistentDebugDUnitTest tests fail because "cluster configuration service not available"
[ https://issues.apache.org/jira/browse/GEODE-9872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dale Emery closed GEODE-9872. -
[jira] [Created] (GEODE-9886) Remove(key) needs to return bool
Michael Martell created GEODE-9886: -- Summary: Remove(key) needs to return bool Key: GEODE-9886 URL: https://issues.apache.org/jira/browse/GEODE-9886 Project: Geode Issue Type: Bug Components: native client Reporter: Michael Martell The Remove(key) API in c-bindings needs to return a bool. Currently it returns void. -- This message was sent by Atlassian Jira (v8.20.1#820001)
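For context, a minimal sketch of the semantics a boolean result could carry, using a plain ConcurrentMap stand-in rather than the real Region or c-bindings API: with Map-style remove(key), which returns the previous value (null if the key was absent), "was anything removed?" falls out as a null check. Class and variable names here are illustrative assumptions.
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustration only, not the c-bindings or Region code: derive a boolean
// "removed" result from Map-style remove(key) semantics.
public class RemoveReturnsBool {
  public static void main(String[] args) {
    ConcurrentMap<String, String> region = new ConcurrentHashMap<>();
    region.put("k", "v");
    boolean removed = region.remove("k") != null;      // true: key was present
    boolean removedAgain = region.remove("k") != null; // false: already gone
    System.out.println(removed + " " + removedAgain);
  }
}
{code}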
[jira] [Updated] (GEODE-9883) Review and Cleanup geode_for_redis.html.md.erb
[ https://issues.apache.org/jira/browse/GEODE-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Donal Evans updated GEODE-9883: --- Labels: blocks-1.15.0 (was: release-blocker) > Review and Cleanup geode_for_redis.html.md.erb > -- > > Key: GEODE-9883 > URL: https://issues.apache.org/jira/browse/GEODE-9883 > Project: Geode > Issue Type: Task > Components: redis >Affects Versions: 1.15.0 >Reporter: Wayne >Priority: Major > Labels: blocks-1.15.0 > > Looking at what we have in the geode docs, a few things are out of date: > * This page still references geode-for-redis-password, which no longer exists. It should probably describe how redis interacts with the security manager instead. > * It should probably also mention how to configure TLS properties for geode-for-redis. > * The redis-cli command appears to be missing the -c option needed to use cluster mode. > * The list of supported redis commands on that page is incomplete. > * The Advantages section doesn't mention synchronous replication. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (GEODE-7876) OldFreeListOffHeapRegionJUnitTest testPersistentChangeFromHeapToOffHeap
[ https://issues.apache.org/jira/browse/GEODE-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456742#comment-17456742 ] Geode Integration commented on GEODE-7876: -- Seen in [integration-test-openjdk8 #52|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/integration-test-openjdk8/builds/52] ... see [test results|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0728/test-results/integrationTest/1639079873/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0728/test-artifacts/1639079873/integrationtestfiles-openjdk8-1.15.0-build.0728.tgz]. > OldFreeListOffHeapRegionJUnitTest testPersistentChangeFromHeapToOffHeap > --- > > Key: GEODE-7876 > URL: https://issues.apache.org/jira/browse/GEODE-7876 > Project: Geode > Issue Type: Bug > Components: tests >Affects Versions: 1.12.0 >Reporter: Mark Hanson >Priority: Major > Labels: GeodeOperationAPI, flaky > > CI Failure > https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/IntegrationTestOpenJDK11/builds/1587 > > Test worker" #27 prio=5 os_prio=0 cpu=7943.09ms elapsed=1126.77s > tid=0x7f60e0b7e000 nid=0x19 in Object.wait() [0x7f60a2a4b000] >java.lang.Thread.State: TIMED_WAITING (on object monitor) > at java.lang.Object.wait(java.base@11.0.6/Native Method) > - waiting on > at > org.apache.geode.internal.cache.DiskStoreImpl.waitForBackgroundTasks(DiskStoreImpl.java:2630) > - waiting to re-lock in wait() <0xffbb6438> (a > java.util.concurrent.atomic.AtomicInteger) > at > org.apache.geode.internal.cache.DiskStoreImpl.close(DiskStoreImpl.java:2386) > at > org.apache.geode.internal.cache.DiskStoreImpl.close(DiskStoreImpl.java:2296) > at > org.apache.geode.internal.cache.GemFireCacheImpl.closeDiskStores(GemFireCacheImpl.java:2476) > at > org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2234) > - locked <0xd0a42a00> (a java.lang.Class for > org.apache.geode.internal.cache.GemFireCacheImpl) > at > org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1931) > at > org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1921) > at > org.apache.geode.internal.offheap.OffHeapRegionBase.closeCache(OffHeapRegionBase.java:106) > at > org.apache.geode.internal.offheap.OffHeapRegionBase.testPersistentChangeFromHeapToOffHeap(OffHeapRegionBase.java:675) > at > jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.6/Native > Method) > at > jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.6/NativeMethodAccessorImpl.java:62) > at > jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.6/DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(java.base@11.0.6/Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > 
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) > at > org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62) > at > org.gradle.api.internal.tasks.testing.S
[jira] [Commented] (GEODE-9814) Add an example of geode-for-redis to the geode examples project
[ https://issues.apache.org/jira/browse/GEODE-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456773#comment-17456773 ] ASF GitHub Bot commented on GEODE-9814: --- davebarnes97 commented on pull request #110: URL: https://github.com/apache/geode-examples/pull/110#issuecomment-990336503 > > > However, ../gradlew run failed ... > > > > > > Guessing that I might be missing prerequisites, I installed redis according to the quick-start instructions referenced in the README. The `../gradlew run` command still failed in the same way. > > Of interest, the optional `redis-cli` commands suggested in the README worked like a champ. > > @davebarnes97 Thanks for this Dave. If it's not too much trouble, could you try changing the `SORTED_SET_KEY` constant in geodeForRedis/Example.java to be `SORTED_SET_KEY = "{tag}leaderboard";` and try running again? I'm not able to reproduce this failure on my machine, so it's difficult to tell what might fix it. Per your offline suggestion, I re-ran the example using a version of Geode built from the develop branch. All good. I'll approve the README.md file portion of the PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@geode.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add an example of geode-for-redis to the geode examples project > --- > > Key: GEODE-9814 > URL: https://issues.apache.org/jira/browse/GEODE-9814 > Project: Geode > Issue Type: Improvement > Components: redis >Reporter: Dan Smith >Assignee: Donal Evans >Priority: Major > Labels: pull-request-available > > Add an example to the geode-examples project/repo demonstrating how to turn > on and use geode-for-redis. > This is just a script. User must download native Redis to get command line > tool. > Cluster Mode must be used. > Start Server with gfsh. > Use JedisCluster client to: > * Perform Sets > * Perform Gets > Have a readme that speaks to using native Redis. -- This message was sent by Atlassian Jira (v8.20.1#820001)
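The `{tag}` prefix discussed above is a Redis cluster hash tag: keys that share the text between the braces hash to the same cluster slot, which is what multi-key sorted-set operations need in cluster mode. A rough hedged sketch of what the example drives, assuming the Jedis client and a geode-for-redis server on localhost:6379 (key names and port are illustrative, not the example's actual code):
{code:java}
import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;

public class GeodeForRedisSketch {
  // Keys sharing the "{tag}" hash tag map to the same cluster slot.
  private static final String SORTED_SET_KEY = "{tag}leaderboard";

  public static void main(String[] args) throws Exception {
    try (JedisCluster jedis = new JedisCluster(new HostAndPort("localhost", 6379))) {
      jedis.set("{tag}greeting", "hello");            // perform a set
      System.out.println(jedis.get("{tag}greeting")); // perform a get
      jedis.zadd(SORTED_SET_KEY, 42.0, "player1");    // sorted-set entry
      System.out.println(jedis.zscore(SORTED_SET_KEY, "player1"));
    }
  }
}
{code}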
[jira] [Updated] (GEODE-9884) update CI max_in_flight limits
[ https://issues.apache.org/jira/browse/GEODE-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated GEODE-9884: -- Labels: pull-request-available (was: ) > update CI max_in_flight limits > -- > > Key: GEODE-9884 > URL: https://issues.apache.org/jira/browse/GEODE-9884 > Project: Geode > Issue Type: Improvement > Components: ci >Reporter: Owen Nichols >Priority: Major > Labels: pull-request-available > > max_in_flight limits are set on the main CI pipeline to avoid overloading > concourse when a large number of commits come through at the same time. > These limits were last calculated a few years ago based on the average time > each job takes; many jobs now take much longer, so the limits should be > recalculated. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (GEODE-9814) Add an example of geode-for-redis to the geode examples project
[ https://issues.apache.org/jira/browse/GEODE-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456820#comment-17456820 ] ASF subversion and git services commented on GEODE-9814: Commit 380669b16f8b0d40fd265be2e1fcb41f016166fe in geode-examples's branch refs/heads/develop from Donal Evans [ https://gitbox.apache.org/repos/asf?p=geode-examples.git;h=380669b ] GEODE-9814: Add geode-for-redis example (#110) Authored-by: Donal Evans Co-authored-by: Dave Barnes > Add an example of geode-for-redis to the geode examples project > --- > > Key: GEODE-9814 > URL: https://issues.apache.org/jira/browse/GEODE-9814 > Project: Geode > Issue Type: Improvement > Components: redis >Reporter: Dan Smith >Assignee: Donal Evans >Priority: Major > Labels: pull-request-available > > Add an example to the geode-examples project/repo demonstrating how to turn > on and use geode-for-redis. > This is just a script. User must download native Redis to get command line > tool. > Cluster Mode must be used. > Start Server with gfsh. > Use JedisCluster client to: > * Perform Sets > * Perform Gets > Have a readme that speaks to using native Redis. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (GEODE-9814) Add an example of geode-for-redis to the geode examples project
[ https://issues.apache.org/jira/browse/GEODE-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Donal Evans resolved GEODE-9814. Fix Version/s: 1.15.0 Resolution: Fixed > Add an example of geode-for-redis to the geode examples project > --- > > Key: GEODE-9814 > URL: https://issues.apache.org/jira/browse/GEODE-9814 > Project: Geode > Issue Type: Improvement > Components: redis >Reporter: Dan Smith >Assignee: Donal Evans >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > > Add an example to the geode-examples project/repo demonstrating how to turn > on and use geode-for-redis. > This is just a script. User must download native Redis to get command line > tool. > Cluster Mode must be used. > Start Server with gfsh. > Use JedisCluster client to: > * Perform Sets > * Perform Gets > Have a readme that speaks to using native Redis. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (GEODE-9814) Add an example of geode-for-redis to the geode examples project
[ https://issues.apache.org/jira/browse/GEODE-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456821#comment-17456821 ] ASF GitHub Bot commented on GEODE-9814: --- DonalEvans merged pull request #110: URL: https://github.com/apache/geode-examples/pull/110 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@geode.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add an example of geode-for-redis to the geode examples project > --- > > Key: GEODE-9814 > URL: https://issues.apache.org/jira/browse/GEODE-9814 > Project: Geode > Issue Type: Improvement > Components: redis >Reporter: Dan Smith >Assignee: Donal Evans >Priority: Major > Labels: pull-request-available > > Add an example to the geode-examples project/repo demonstrating how to turn > on and use geode-for-redis. > This is just a script. User must download native Redis to get command line > tool. > Cluster Mode must be used. > Start Server with gfsh. > Use JedisCluster client to: > * Perform Sets > * Perform Gets > Have a readme that speaks to using native Redis. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (GEODE-9880) Cluster with multiple locators in an environment with no host name resolution leads to a null pointer exception
[ https://issues.apache.org/jira/browse/GEODE-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Burcham updated GEODE-9880: Component/s: membership > Cluster with multiple locators in an environment with no host name > resolution leads to a null pointer exception > -- > > Key: GEODE-9880 > URL: https://issues.apache.org/jira/browse/GEODE-9880 > Project: Geode > Issue Type: Bug > Components: locator, membership >Affects Versions: 1.12.5 >Reporter: Tigran Ghahramanyan >Priority: Major > Labels: membership > > In our use case we have two locators that are initially configured with IP > addresses, but the _AutoConnectionSourceImpl.UpdateLocatorList()_ flow keeps > adding their corresponding host names to the locator list, even though these > host names are not resolvable. > Later, in {_}AutoConnectionSourceImpl.queryLocators(){_}, whenever a client > uses such a non-resolvable host name to connect to a locator, it tries > to establish a connection to {_}socketaddr=0.0.0.0{_} (as written in > {_}SocketCreator.connect(){_}), which seems strange. > Then, if there is no locator running on the same host, the next locator in > the list is contacted, until a locator configured with an IP address is > reached - which eventually succeeds. > But when a locator happens to be listening on the same host, the second line > below throws a null pointer exception, because _inetadd_ is null: > _socket.connect(sockaddr, Math.max(timeout, 0)); // sockaddr=0.0.0.0, > connects to a locator listening on the same host_ > _configureClientSSLSocket(socket, inetadd.getHostName(), timeout); // inetadd > = null_ > > As a result, the cluster comes to a failed state, unable to recover. -- This message was sent by Atlassian Jira (v8.20.1#820001)
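A hedged sketch of the kind of guard that would avoid the NPE: resolve the host before connecting and fail fast on unresolvable names. Method and variable names follow the snippet quoted in the description, not the actual SocketCreator source.
{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

// Sketch only: reject unresolvable host names up front instead of falling
// through to a 0.0.0.0 connect and a NullPointerException later.
public final class ResolveBeforeConnectSketch {
  static Socket connect(String hostName, int port, int timeout) throws IOException {
    InetSocketAddress sockaddr = new InetSocketAddress(hostName, port);
    if (sockaddr.isUnresolved()) {
      // inetadd would be null here; fail fast with a clear error instead.
      throw new UnknownHostException("locator host is not resolvable: " + hostName);
    }
    Socket socket = new Socket();
    socket.connect(sockaddr, Math.max(timeout, 0));
    return socket;
  }
}
{code}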
[jira] [Updated] (GEODE-9822) Split-brain Certain During Network Partition in Two-Locator Cluster
[ https://issues.apache.org/jira/browse/GEODE-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Burcham updated GEODE-9822: Description: In a two-locator cluster with default member weights and default setting (true) of enable-network-partition-detection, if a long-lived network partition separates the two members, a split-brain will arise: there will be two coordinators at the same time. The reason for this can be found in the GMSJoinLeave.isNetworkPartition() method. That method's name is misleading. A name like isMajorityLost() would probably be more apt. It needs to return true iff the weight of "crashed" members (in the prospective view) is greater-than-or-equal-to half (50%) of the total weight (of all members in the current view). What the method actually does is return true iff the weight of "crashed" members is greater-than 51% of the total weight. As a result, if we have two members of equal weight, and the coordinator sees that the non-coordinator is "crashed", the coordinator will keep running. If a network partition is happening, and the non-coordinator is still running, then it will become a coordinator and start producing views. Now we'll have two coordinators producing views concurrently. For this discussion "crashed" members are members for which the coordinator has received a RemoveMemberRequest message. These are members that the failure detector has deemed failed. Keep in mind the failure detector is imperfect (it's not always right), and that's kind of the whole point of this ticket: we've lost contact with the non-coordinator member, but that doesn't mean it can't still be running (on the other side of a partition). This bug is not limited to the two-locator scenario. Any set of members that can be partitioned into two equal sets is susceptible. In fact it's even a little worse than that. Any set of members that can be partitioned (into more than one set), where two or more of the resulting sets each retain 49% or more of the total weight, will result in a split-brain. was: In a two-locator cluster with default member weights and default setting (true) of enable-network-partition-detection, if a long-lived network partition separates the two members, a split-brain will arise: there will be two coordinators at the same time. The reason for this can be found in the GMSJoinLeave.isNetworkPartition() method. That method's name is misleading. A name like isMajorityLost() would probably be more apt. It needs to return true iff the weight of "crashed" members (in the prospective view) is greater-than-or-equal-to half (50%) of the total weight (of all members in the current view). What the method actually does is return true iff the weight of "crashed" members is greater-than 51% of the total weight. As a result, if we have two members of equal weight, and the coordinator sees that the non-coordinator is "crashed", the coordinator will keep running. If a network partition is happening, and the non-coordinator is still running, then it will become a coordinator and start producing views. Now we'll have two coordinators producing views concurrently. For this discussion "crashed" members are members for which the coordinator has received a RemoveMemberRequest message. These are members that the failure detector has deemed failed.
Keep in mind the failure detector is imperfect (it's not always right), and that's kind of the whole point of this ticket: we've lost contact with the non-coordinator member, but that doesn't mean it can't still be running (on the other side of a partition). This bug is not limited to the two-locator scenario. Any set of members that can be partitioned into two equal sets is susceptible. In fact it's even a little worse than that. Any set of members that can be partitioned into two sets, both of which still have 49% or more of the total weight, will result in a split-brain. > Split-brain Certain During Network Partition in Two-Locator Cluster > --- > > Key: GEODE-9822 > URL: https://issues.apache.org/jira/browse/GEODE-9822 > Project: Geode > Issue Type: Bug > Components: membership >Reporter: Bill Burcham >Priority: Major > Labels: pull-request-available > > In a two-locator cluster with default member weights and default setting > (true) of enable-network-partition-detection, if a long-lived network > partition separates the two members, a split-brain will arise: there will be > two coordinators at the same time. > The reason for this can be found in the GMSJoinLeave.isNetworkPartition() > method. That method's name is misleading. A name like isMajorityLost() would > probably be more apt. It needs to return true iff the weight of "crashed" > members (in the prospective view) is gre
[jira] [Resolved] (GEODE-9822) Split-brain Certain During Network Partition in Two-Locator Cluster
[ https://issues.apache.org/jira/browse/GEODE-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Burcham resolved GEODE-9822. - Fix Version/s: 1.15.0 Resolution: Fixed > Split-brain Certain During Network Partition in Two-Locator Cluster > --- > > Key: GEODE-9822 > URL: https://issues.apache.org/jira/browse/GEODE-9822 > Project: Geode > Issue Type: Bug > Components: membership >Reporter: Bill Burcham >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > > In a two-locator cluster with default member weights and default setting > (true) of enable-network-partition-detection, if a long-lived network > partition separates the two members, a split-brain will arise: there will be > two coordinators at the same time. > The reason for this can be found in the GMSJoinLeave.isNetworkPartition() > method. That method's name is misleading. A name like isMajorityLost() would > probably be more apt. It needs to return true iff the weight of "crashed" > members (in the prospective view) is greater-than-or-equal-to half (50%) of > the total weight (of all members in the current view). > What the method actually does is return true iff the weight of "crashed" > members is greater-than 51% of the total weight. As a result, if we have two > members of equal weight, and the coordinator sees that the non-coordinator is > "crashed", the coordinator will keep running. If a network partition is > happening, and the non-coordinator is still running, then it will become a > coordinator and start producing views. Now we'll have two coordinators > producing views concurrently. > For this discussion "crashed" members are members for which the coordinator > has received a RemoveMemberRequest message. These are members that the > failure detector has deemed failed. Keep in mind the failure detector is > imperfect (it's not always right), and that's kind of the whole point of this > ticket: we've lost contact with the non-coordinator member, but that doesn't > mean it can't still be running (on the other side of a partition). > This bug is not limited to the two-locator scenario. Any set of members that > can be partitioned into two equal sets is susceptible. In fact it's even a > little worse than that. Any set of members that can be partitioned (into more > than one set), where any two-or-more sets, each still have 49% or more of the > total weight, will result in a split-brain -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (GEODE-9822) Split-brain Certain During Network Partition in Two-Locator Cluster
[ https://issues.apache.org/jira/browse/GEODE-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456824#comment-17456824 ] ASF subversion and git services commented on GEODE-9822: Commit d89fdf67d091d5bb4e8bc60c9996b667dba3cab3 in geode's branch refs/heads/develop from Bill Burcham [ https://gitbox.apache.org/repos/asf?p=geode.git;h=d89fdf6 ] GEODE-9822: Quorum Calculation Requires Majority (#7126) * Quorum didn't formerly require a majority—it required only 49% of the member weight. * Because of that two member partitions could survive at once, resulting in split-brain. * Quorum now requires a majority. > Split-brain Certain During Network Partition in Two-Locator Cluster > --- > > Key: GEODE-9822 > URL: https://issues.apache.org/jira/browse/GEODE-9822 > Project: Geode > Issue Type: Bug > Components: membership >Reporter: Bill Burcham >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > > In a two-locator cluster with default member weights and default setting > (true) of enable-network-partition-detection, if a long-lived network > partition separates the two members, a split-brain will arise: there will be > two coordinators at the same time. > The reason for this can be found in the GMSJoinLeave.isNetworkPartition() > method. That method's name is misleading. A name like isMajorityLost() would > probably be more apt. It needs to return true iff the weight of "crashed" > members (in the prospective view) is greater-than-or-equal-to half (50%) of > the total weight (of all members in the current view). > What the method actually does is return true iff the weight of "crashed" > members is greater-than 51% of the total weight. As a result, if we have two > members of equal weight, and the coordinator sees that the non-coordinator is > "crashed", the coordinator will keep running. If a network partition is > happening, and the non-coordinator is still running, then it will become a > coordinator and start producing views. Now we'll have two coordinators > producing views concurrently. > For this discussion "crashed" members are members for which the coordinator > has received a RemoveMemberRequest message. These are members that the > failure detector has deemed failed. Keep in mind the failure detector is > imperfect (it's not always right), and that's kind of the whole point of this > ticket: we've lost contact with the non-coordinator member, but that doesn't > mean it can't still be running (on the other side of a partition). > This bug is not limited to the two-locator scenario. Any set of members that > can be partitioned into two equal sets is susceptible. In fact it's even a > little worse than that. Any set of members that can be partitioned (into more > than one set), where any two-or-more sets, each still have 49% or more of the > total weight, will result in a split-brain -- This message was sent by Atlassian Jira (v8.20.1#820001)
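The arithmetic behind "Quorum now requires a majority" is easy to illustrate. A sketch of the corrected rule, with assumed names and weights rather than the actual GMSJoinLeave code: with two members of equal weight, each side sees the other's 50% as crashed, so both must declare quorum lost instead of each surviving as a coordinator.
{code:java}
// Sketch with assumed names; not the actual GMSJoinLeave code.
public class QuorumSketch {
  // Quorum is lost when crashed members hold half or more of the total weight.
  // Writing it as 2 * crashed >= total avoids integer-division rounding.
  static boolean isMajorityLost(int crashedWeight, int totalWeight) {
    return 2 * crashedWeight >= totalWeight;
  }

  public static void main(String[] args) {
    // Two locators of equal weight, partitioned: crashed weight is exactly 50%.
    System.out.println(isMajorityLost(10, 20)); // true with the fix
    // A rule requiring crashed weight > 51% would return false here, letting
    // both sides keep producing views concurrently (split-brain).
  }
}
{code}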
[jira] [Resolved] (GEODE-9884) update CI max_in_flight limits
[ https://issues.apache.org/jira/browse/GEODE-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen Nichols resolved GEODE-9884. - Fix Version/s: 1.15.0 Resolution: Fixed > update CI max_in_flight limits > -- > > Key: GEODE-9884 > URL: https://issues.apache.org/jira/browse/GEODE-9884 > Project: Geode > Issue Type: Improvement > Components: ci >Reporter: Owen Nichols >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > > max_in_flight limits are set on the main CI pipeline to avoid overloading > concourse when a large number of commits come through at the same time. > These limits were last calculated a few years ago based on the average time > each job takes; many jobs now take much longer, so the limits should be > recalculated. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (GEODE-9877) GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse failed
[ https://issues.apache.org/jira/browse/GEODE-9877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456830#comment-17456830 ] ASF subversion and git services commented on GEODE-9877: Commit 9099e1fe70b02886ec7d65d21d8b9c0e60b94677 in geode's branch refs/heads/support/1.14 from Jens Deppe [ https://gitbox.apache.org/repos/asf?p=geode.git;h=9099e1f ] GEODE-9877: Use ServerSocket to create interfering port (#7180) - For some unknown reason `startupFailsGivenPortAlreadyInUse` started to fail after a seemingly innocuous Ubuntu base image bump. The problem may also have been triggered by arbitrary test ordering changes since the test did not fail on its own, but only in conjunction with running other tests beforehand. Specifically, the test was failing when binding the interfering port (bind exception). The port used was always in the TIME_WAIT state left from previous tests. Using a `ServerSocket`, instead of a regular socket, fixes the problem since it actually 'uses' the port and implicitly allows for port reuse. - Use ServerSocket consistently. Rename test to be more appropriate (cherry picked from commit 310c647da6ee4cc4a1eadc6df174d998e69afb31) > GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse failed > -- > > Key: GEODE-9877 > URL: https://issues.apache.org/jira/browse/GEODE-9877 > Project: Geode > Issue Type: Bug > Components: redis >Affects Versions: 1.15.0 >Reporter: Mark Hanson >Assignee: Jens Deppe >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > > [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/acceptance-test-openjdk8/builds/43] > failed with > GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse > {noformat} > java.net.BindException: Address already in use (Bind failed) > at java.net.PlainSocketImpl.socketBind(Native Method) > at > java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387) > at java.net.Socket.bind(Socket.java:662) > at > org.apache.geode.redis.GeodeRedisServerStartupDUnitTest.startupFailsGivenPortAlreadyInUse(GeodeRedisServerStartupDUnitTest.java:115) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.apache.geode.test.dunit.rules.ClusterStartupRule$1.evaluate(ClusterStartupRule.java:138) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at 
org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > at > org.apache.geode.test.junit.rules.DescribedExternalResource$1.evaluate(DescribedExternalResource.java:40) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at org.junit.runners.ParentRunner.run(ParentRunner.java:413) > at org.junit.runner.JUnitCore.run(JUnitCore.java:137) > at org.junit.runner.JUnitCore.run(JUnitCore.java:115) > at > org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) > at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:18
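A hedged sketch of the technique the commit message describes (names and the loopback/ephemeral-port choice are illustrative, not the test's actual code): a listening ServerSocket with SO_REUSEADDR genuinely occupies the port, so a subsequent bind by the product code fails deterministically even when earlier tests left the port in TIME_WAIT.
{code:java}
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public final class InterferingPortSketch {
  public static void main(String[] args) throws Exception {
    // Bind a listening ServerSocket on an ephemeral port; setReuseAddress(true)
    // lets the bind succeed even if the port lingers in TIME_WAIT.
    try (ServerSocket interfering = new ServerSocket()) {
      interfering.setReuseAddress(true);
      interfering.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
      int port = interfering.getLocalPort();
      // While this socket is open, any other attempt to bind 'port' fails with
      // java.net.BindException: Address already in use.
      System.out.println("holding port " + port);
    }
  }
}
{code}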
[jira] [Commented] (GEODE-9851) Use strongly typed enums rather than int for enumeration-like values.
[ https://issues.apache.org/jira/browse/GEODE-9851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456895#comment-17456895 ] ASF subversion and git services commented on GEODE-9851: Commit 79475fa4a5e3cb82def90a1a8b7fa22a023eb57c in geode's branch refs/heads/develop from Jacob Barrett [ https://gitbox.apache.org/repos/asf?p=geode.git;h=79475fa ] GEODE-9851: Use InterestType and DataPolicy over ordinal int. (#7103) * Make InterestType an enum and use strong type in method parameters. * Use strong type DataPolicy in method parameters. * Prepare for migration to enum. > Use strongly typed enums rather than int for enumeration-like values. > - > > Key: GEODE-9851 > URL: https://issues.apache.org/jira/browse/GEODE-9851 > Project: Geode > Issue Type: Improvement >Reporter: Jacob Barrett >Priority: Major > Labels: pull-request-available > > Internally, register interest passes around both an interest policy and a > data storage policy as `int`. Since these are finite sets of well-defined > values, it makes sense to pass them as proper Java enums. Strongly typing > them provides compile-time checks on acceptable values and makes the code > more readable. -- This message was sent by Atlassian Jira (v8.20.1#820001)
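To illustrate the motivation (names assumed; these are not the actual Geode signatures or enum constants), replacing an int parameter with an enum turns an out-of-range value into a compile-time error rather than a runtime surprise:
{code:java}
// Illustrative sketch only: strong typing rejects invalid values at compile time.
public class InterestExample {
  enum InterestType { KEY, REGULAR_EXPRESSION, FILTER_CLASS, OQL_QUERY }

  static void registerInterest(Object key, InterestType type) {
    System.out.println("registering " + key + " with " + type);
  }

  public static void main(String[] args) {
    registerInterest("k1", InterestType.KEY); // compiles, self-documenting
    // registerInterest("k1", 42);            // would no longer compile
  }
}
{code}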
[jira] [Resolved] (GEODE-9851) Use strongly typed enums rather than int for enumeration-like values.
[ https://issues.apache.org/jira/browse/GEODE-9851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacob Barrett resolved GEODE-9851. -- Fix Version/s: 1.15.0 Resolution: Fixed > Use strongly typed enums rather than int for enumeration-like values. > - > > Key: GEODE-9851 > URL: https://issues.apache.org/jira/browse/GEODE-9851 > Project: Geode > Issue Type: Improvement >Reporter: Jacob Barrett >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > > Internally, register interest passes around both an interest policy and a > data storage policy as `int`. Since these are finite sets of well-defined > values, it makes sense to pass them as proper Java enums. Strongly typing > them provides compile-time checks on acceptable values and makes the code > more readable. -- This message was sent by Atlassian Jira (v8.20.1#820001)