[jira] [Created] (GEODE-9881) Fully recovered Oplogs object indicating unrecoveredRegionCount>0

2021-12-09 Thread Jakov Varenina (Jira)
Jakov Varenina created GEODE-9881:
-

 Summary: Fully recovered Oplogs object indicating 
unrecoveredRegionCount>0
 Key: GEODE-9881
 URL: https://issues.apache.org/jira/browse/GEODE-9881
 Project: Geode
  Issue Type: Bug
  Components: persistence
Reporter: Jakov Varenina






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (GEODE-9881) Fully recovered Oplogs object indicating unrecoveredRegionCount>0

2021-12-09 Thread Jakov Varenina (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakov Varenina updated GEODE-9881:
--
Description: 
We have found a problem when a region is closed and then recreated to start 
recovery. If you inspect this code in the close() function, you will notice 
that it doesn't make sense:
{code:java}
  void close(DiskRegion dr) {
// while a krf is being created can not close a region
lockCompactor();
try {
  if (!isDrfOnly()) {
DiskRegionInfo dri = getDRI(dr);
if (dri != null) {
  long clearCount = dri.clear(null);
  if (clearCount != 0) {
totalLiveCount.addAndGet(-clearCount);
// no need to call handleNoLiveValues because we now have an
// unrecovered region.
  }
  regionMap.get().remove(dr.getId(), dri);
}
addUnrecoveredRegion(dr.getId());
  }
} finally {
  unlockCompactor();
}
  }

{code}
Please notice that addUnrecoveredRegion() marks the DiskRegionInfo object as 
unrecovered and increments the unrecoveredRegionCount counter. This 
DiskRegionInfo object is contained in the regionMap structure. Afterwards, the 
DiskRegionInfo object (that was previously marked as unrecovered) is removed 
from the regionMap. This doesn't make sense: the object is updated and then 
removed from the map to be garbage collected. As you will see later on, this 
causes issues when the region is recovered.

Please check this code at recovery:
{code:java}
/**
 * For each dri that this oplog has that is currently unrecoverable check to see
 * if a DiskRegion that is recoverable now exists.
 */
void checkForRecoverableRegion(DiskRegionView dr) {
  if (unrecoveredRegionCount.get() > 0) {
DiskRegionInfo dri = getDRI(dr);
if (dri != null) {
  if (dri.testAndSetRecovered(dr)) {
unrecoveredRegionCount.decrementAndGet();
  }
}
  }
}
{code}
The problem is that Geode will not clear the unrecoveredRegionCount counter in 
Oplog objects after recovery is done. This is because checkForRecoverableRegion 
checks the unrecoveredRegionCount counter and performs testAndSetRecovered. The 
testAndSetRecovered call will always return false, because none of the 
DiskRegionInfo objects in the region map have the unrecovered flag set to true 
(all objects marked as unrecovered were deleted by close() and then recreated 
during recovery; see the note below). The result is that all Oplogs will be 
fully recovered while the counter incorrectly indicates 
unrecoveredRegionCount>0. This will later prevent the compaction of recovered 
Oplogs (the files that have .crf, .drf and .krf) when they reach the 
compaction threshold.

Note: During recovery, the regionMap is recreated from the Oplog files. Since 
all DiskRegionInfo objects were deleted from the regionMap during close(), they 
are recreated via the initRecoveredEntry function during recovery. All 
DiskRegionInfo objects are created with the unrecovered flag set to false.
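
A minimal sketch of one way the two paths could be made consistent (an 
illustration only, not necessarily the fix that was merged for this ticket): 
keep the DiskRegionInfo in the regionMap when the region is closed, so the 
unrecovered flag set by addUnrecoveredRegion() stays visible and 
checkForRecoverableRegion() can later succeed in testAndSetRecovered() and 
decrement unrecoveredRegionCount back to zero.
{code:java}
  void close(DiskRegion dr) {
    lockCompactor();
    try {
      if (!isDrfOnly()) {
        DiskRegionInfo dri = getDRI(dr);
        if (dri != null) {
          long clearCount = dri.clear(null);
          if (clearCount != 0) {
            totalLiveCount.addAndGet(-clearCount);
          }
          // Intentionally do NOT remove dri from regionMap here: leaving the
          // (soon-to-be unrecovered) DiskRegionInfo in place lets the recovery
          // path find it, so testAndSetRecovered() can return true and the
          // unrecoveredRegionCount counter can be cleared.
        }
        addUnrecoveredRegion(dr.getId());
      }
    } finally {
      unlockCompactor();
    }
  }
{code}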

 

  was:
We have found a problem when a region is closed and then recreated to start 
recovery. If you inspect this code in the close() function, you will notice 
that it doesn't make sense:
{code:java}
  void close(DiskRegion dr) {
// while a krf is being created can not close a region
lockCompactor();
try {
  if (!isDrfOnly()) {
DiskRegionInfo dri = getDRI(dr);
if (dri != null) {
  long clearCount = dri.clear(null);
  if (clearCount != 0) {
totalLiveCount.addAndGet(-clearCount);
// no need to call handleNoLiveValues because we now have an
// unrecovered region.
  }
  regionMap.get().remove(dr.getId(), dri);
}
addUnrecoveredRegion(dr.getId());
  }
} finally {
  unlockCompactor();
}
  }

{code}
Please notice that addUnrecoveredRegion() marks the DiskRegionInfo object as 
unrecovered and increments the unrecoveredRegionCount counter. This 
DiskRegionInfo object is contained in the regionMap structure. Afterwards, the 
DiskRegionInfo object (that was previously marked as unrecovered) is removed 
from the regionMap. This doesn't make sense: the object is updated and then 
removed from the map to be garbage collected. As you will see later on, this 
causes issues when the region is recovered.

Please check this code at recovery:
{code:java}
/**
 * For each dri that this oplog has that is currently unrecoverable check to see
 * if a DiskRegion that is recoverable now exists.
 */
void checkForRecoverableRegion(DiskRegionView dr) {
  if (unrecoveredRegionCount.get() > 0) {
DiskRegionInfo dri = getDRI(dr);
if (dri != null) {
  if (dri.testAndSetRecovered(dr)) {
unrecoveredRegionCount.decrementAndGet();
  }
}
  }
}
{code}
The problem is that Geode will not clear the unrecoveredRegionCount counter in 
Oplog objects after recovery is done. Thi

[jira] [Assigned] (GEODE-9881) Fully recovered Oplogs object indicating unrecoveredRegionCount>0

2021-12-09 Thread Jakov Varenina (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakov Varenina reassigned GEODE-9881:
-

Assignee: Jakov Varenina

> Fully recovered Oplogs object indicating unrecoveredRegionCount>0
> 
>
> Key: GEODE-9881
> URL: https://issues.apache.org/jira/browse/GEODE-9881
> Project: Geode
>  Issue Type: Bug
>  Components: persistence
>Reporter: Jakov Varenina
>Assignee: Jakov Varenina
>Priority: Major
>
> We have found a problem when a region is closed and then recreated to start 
> recovery. If you inspect this code in the close() function, you will notice 
> that it doesn't make sense:
> {code:java}
>   void close(DiskRegion dr) {
> // while a krf is being created can not close a region
> lockCompactor();
> try {
>   if (!isDrfOnly()) {
> DiskRegionInfo dri = getDRI(dr);
> if (dri != null) {
>   long clearCount = dri.clear(null);
>   if (clearCount != 0) {
> totalLiveCount.addAndGet(-clearCount);
> // no need to call handleNoLiveValues because we now have an
> // unrecovered region.
>   }
>   regionMap.get().remove(dr.getId(), dri);
> }
> addUnrecoveredRegion(dr.getId());
>   }
> } finally {
>   unlockCompactor();
> }
>   }
> {code}
> Please notice that addUnrecoveredRegion() marks the DiskRegionInfo object as 
> unrecovered and increments the unrecoveredRegionCount counter. This 
> DiskRegionInfo object is contained in the regionMap structure. Afterwards, the 
> DiskRegionInfo object (that was previously marked as unrecovered) is removed 
> from the regionMap. This doesn't make sense: the object is updated and then 
> removed from the map to be garbage collected. As you will see later on, this 
> causes issues when the region is recovered.
> Please check this code at recovery:
> {code:java}
> /**
>  * For each dri that this oplog has that is currently unrecoverable check to
>  * see if a DiskRegion that is recoverable now exists.
>  */
> void checkForRecoverableRegion(DiskRegionView dr) {
>   if (unrecoveredRegionCount.get() > 0) {
> DiskRegionInfo dri = getDRI(dr);
> if (dri != null) {
>   if (dri.testAndSetRecovered(dr)) {
> unrecoveredRegionCount.decrementAndGet();
>   }
> }
>   }
> }
> {code}
> The problem is that Geode will not clear the unrecoveredRegionCount counter 
> in Oplog objects after recovery is done. This is because 
> checkForRecoverableRegion checks the unrecoveredRegionCount counter and 
> performs testAndSetRecovered. The testAndSetRecovered call will always return 
> false, because none of the DiskRegionInfo objects in the region map have the 
> unrecovered flag set to true (all objects marked as unrecovered were deleted 
> by close() and then recreated during recovery; see the note below). The 
> result is that all Oplogs will be fully recovered while the counter 
> incorrectly indicates unrecoveredRegionCount>0. This will later prevent the 
> compaction of recovered Oplogs (the files that have .crf, .drf and .krf) 
> when they reach the compaction threshold.
> Note: During recovery, the regionMap is recreated from the Oplog files. Since 
> all DiskRegionInfo objects were deleted from the regionMap during close(), 
> they are recreated via the initRecoveredEntry function during recovery. All 
> DiskRegionInfo objects are created with the unrecovered flag set to 
> false.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (GEODE-9881) Fully recovered Oplogs object indicating unrecoveredRegionCount>0

2021-12-09 Thread Jakov Varenina (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakov Varenina updated GEODE-9881:
--
Description: 
We have found a problem when a region is closed and then recreated to start 
recovery. If you inspect this code in the close() function, you will notice 
that it doesn't make sense:
{code:java}
  void close(DiskRegion dr) {
// while a krf is being created can not close a region
lockCompactor();
try {
  if (!isDrfOnly()) {
DiskRegionInfo dri = getDRI(dr);
if (dri != null) {
  long clearCount = dri.clear(null);
  if (clearCount != 0) {
totalLiveCount.addAndGet(-clearCount);
// no need to call handleNoLiveValues because we now have an
// unrecovered region.
  }
  regionMap.get().remove(dr.getId(), dri);
}
addUnrecoveredRegion(dr.getId());
  }
} finally {
  unlockCompactor();
}
  }

{code}
Please notice that addUnrecoveredRegion() marks the DiskRegionInfo object as 
unrecovered and increments the unrecoveredRegionCount counter. This 
DiskRegionInfo object is contained in the regionMap structure. Afterwards, the 
DiskRegionInfo object (that was previously marked as unrecovered) is removed 
from the regionMap. This doesn't make sense: the object is updated and then 
removed from the map to be garbage collected. As you will see later on, this 
causes issues when the region is recovered.

Please check this code at recovery:
{code:java}
/**
 * For each dri that this oplog has that is currently unrecoverable check to see
 * if a DiskRegion that is recoverable now exists.
 */
void checkForRecoverableRegion(DiskRegionView dr) {
  if (unrecoveredRegionCount.get() > 0) {
DiskRegionInfo dri = getDRI(dr);
if (dri != null) {
  if (dri.testAndSetRecovered(dr)) {
unrecoveredRegionCount.decrementAndGet();
  }
}
  }
}
{code}
The problem is that Geode will not clear the unrecoveredRegionCount counter in 
Oplog objects after recovery is done. This is because checkForRecoverableRegion 
checks the unrecoveredRegionCount counter and performs testAndSetRecovered. The 
testAndSetRecovered call will always return false, because none of the 
DiskRegionInfo objects in the region map have the unrecovered flag set to true 
(all objects marked as unrecovered were deleted by close() and then recreated 
during recovery; see the note below). The result is that all Oplogs will be 
fully recovered while the counter incorrectly indicates 
unrecoveredRegionCount>0. This will later prevent the compaction of recovered 
Oplogs (the files that have .crf, .drf and .krf) when they reach the 
compaction threshold.

Note: During recovery, the regionMap is recreated from the Oplog files. Since 
all DiskRegionInfo objects were deleted from the regionMap during close(), they 
are recreated via the initRecoveredEntry function during recovery. All 
DiskRegionInfo objects are created with the unrecovered flag set to false.

 

> Fully recovered Oplogs object indicating unrecoveredRegionCount>0
> 
>
> Key: GEODE-9881
> URL: https://issues.apache.org/jira/browse/GEODE-9881
> Project: Geode
>  Issue Type: Bug
>  Components: persistence
>Reporter: Jakov Varenina
>Priority: Major
>
> We have found a problem when a region is closed and then recreated to start 
> recovery. If you inspect this code in the close() function, you will notice 
> that it doesn't make sense:
> {code:java}
>   void close(DiskRegion dr) {
> // while a krf is being created can not close a region
> lockCompactor();
> try {
>   if (!isDrfOnly()) {
> DiskRegionInfo dri = getDRI(dr);
> if (dri != null) {
>   long clearCount = dri.clear(null);
>   if (clearCount != 0) {
> totalLiveCount.addAndGet(-clearCount);
> // no need to call handleNoLiveValues because we now have an
> // unrecovered region.
>   }
>   regionMap.get().remove(dr.getId(), dri);
> }
> addUnrecoveredRegion(dr.getId());
>   }
> } finally {
>   unlockCompactor();
> }
>   }
> {code}
> Please notice that addUnrecoveredRegion() marks the DiskRegionInfo object as 
> unrecovered and increments the unrecoveredRegionCount counter. This 
> DiskRegionInfo object is contained in the regionMap structure. Afterwards, the 
> DiskRegionInfo object (that was previously marked as unrecovered) is removed 
> from the regionMap. This doesn't make sense: the object is updated and then 
> removed from the map to be garbage collected. As you will see later on, this 
> causes issues when the region is recovered.
> Please check this code at recovery:
> {code:java}
> /**
>  * For each dri that this oplog has that is currently unrecoverable

[jira] [Updated] (GEODE-9881) Fully recovered Oplogs object indicating unrecoveredRegionCount>0 preventing compaction

2021-12-09 Thread Jakov Varenina (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakov Varenina updated GEODE-9881:
--
Summary: Fully recovered Oplogs object indicating unrecoveredRegionCount>0 
preventing compaction  (was: Fully recovered Oplogs object indicating 
unrecoveredRegionCount>0)

> Fully recovered Oplogs object indicating unrecoveredRegionCount>0 preventing 
> compaction
> --
>
> Key: GEODE-9881
> URL: https://issues.apache.org/jira/browse/GEODE-9881
> Project: Geode
>  Issue Type: Bug
>  Components: persistence
>Reporter: Jakov Varenina
>Assignee: Jakov Varenina
>Priority: Major
>
> We have found a problem when a region is closed and then recreated to start 
> recovery. If you inspect this code in the close() function, you will notice 
> that it doesn't make sense:
> {code:java}
>   void close(DiskRegion dr) {
> // while a krf is being created can not close a region
> lockCompactor();
> try {
>   if (!isDrfOnly()) {
> DiskRegionInfo dri = getDRI(dr);
> if (dri != null) {
>   long clearCount = dri.clear(null);
>   if (clearCount != 0) {
> totalLiveCount.addAndGet(-clearCount);
> // no need to call handleNoLiveValues because we now have an
> // unrecovered region.
>   }
>   regionMap.get().remove(dr.getId(), dri);
> }
> addUnrecoveredRegion(dr.getId());
>   }
> } finally {
>   unlockCompactor();
> }
>   }
> {code}
> Please notice that addUnrecoveredRegion() marks the DiskRegionInfo object as 
> unrecovered and increments the unrecoveredRegionCount counter. This 
> DiskRegionInfo object is contained in the regionMap structure. Afterwards, the 
> DiskRegionInfo object (that was previously marked as unrecovered) is removed 
> from the regionMap. This doesn't make sense: the object is updated and then 
> removed from the map to be garbage collected. As you will see later on, this 
> causes issues when the region is recovered.
> Please check this code at recovery:
> {code:java}
> /**
>  * For each dri that this oplog has that is currently unrecoverable check to
>  * see if a DiskRegion that is recoverable now exists.
>  */
> void checkForRecoverableRegion(DiskRegionView dr) {
>   if (unrecoveredRegionCount.get() > 0) {
> DiskRegionInfo dri = getDRI(dr);
> if (dri != null) {
>   if (dri.testAndSetRecovered(dr)) {
> unrecoveredRegionCount.decrementAndGet();
>   }
> }
>   }
> }
> {code}
> The problem is that Geode will not clear the unrecoveredRegionCount counter 
> in Oplog objects after recovery is done. This is because 
> checkForRecoverableRegion checks the unrecoveredRegionCount counter and 
> performs testAndSetRecovered. The testAndSetRecovered call will always return 
> false, because none of the DiskRegionInfo objects in the region map have the 
> unrecovered flag set to true (all objects marked as unrecovered were deleted 
> by close() and then recreated during recovery; see the note below). The 
> result is that all Oplogs will be fully recovered while the counter 
> incorrectly indicates unrecoveredRegionCount>0. This will later prevent the 
> compaction of recovered Oplogs (the files that have .crf, .drf and .krf) 
> when they reach the compaction threshold.
> Note: During recovery, the regionMap is recreated from the Oplog files. Since 
> all DiskRegionInfo objects were deleted from the regionMap during close(), 
> they are recreated via the initRecoveredEntry function during recovery. All 
> DiskRegionInfo objects are created with the unrecovered flag set to 
> false.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-9880) Cluster with multiple locators in an environment with no host name resolution, leads to null pointer exception

2021-12-09 Thread Tigran Ghahramanyan (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456357#comment-17456357
 ] 

Tigran Ghahramanyan commented on GEODE-9880:


Experimenting with 

_builder.setHostnameForClients();_

and setting the corresponding IP address, represented as a string, to override 
the host name for each locator works around the problem described above, 
allowing the cluster to start.
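
For reference, a hedged sketch of what that workaround could look like with 
the LocatorLauncher API (the member name, port, and IP address below are 
placeholders, not values from this ticket):
{code:java}
import org.apache.geode.distributed.LocatorLauncher;

public class LocatorWorkaroundSketch {
  public static void main(String[] args) {
    // Advertise the locator to clients by its IP address (as a string) rather
    // than a host name that may not be resolvable in this environment.
    LocatorLauncher locator = new LocatorLauncher.Builder()
        .setMemberName("locator1")          // placeholder member name
        .setPort(10334)                     // placeholder port
        .setHostnameForClients("10.0.0.5")  // placeholder IP address string
        .build();
    locator.start();
  }
}
{code}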

> Cluster with multiple locators in an environment with no host name 
> resolution, leads to null pointer exception
> --
>
> Key: GEODE-9880
> URL: https://issues.apache.org/jira/browse/GEODE-9880
> Project: Geode
>  Issue Type: Bug
>  Components: locator
>Affects Versions: 1.12.5
>Reporter: Tigran Ghahramanyan
>Priority: Major
>
> In our use case we have two locators that are initially configured with IP 
> addresses, but the _AutoConnectionSourceImpl.UpdateLocatorList()_ flow keeps 
> adding their corresponding host names to the locator list, even though these 
> host names are not resolvable.
> Later, in {_}AutoConnectionSourceImpl.queryLocators(){_}, whenever a client 
> tries to use such a non-resolvable host name to connect to a locator, it 
> tries to establish a connection to {_}socketaddr=0.0.0.0{_}, as written in 
> {_}SocketCreator.connect(){_}, which seems strange.
> Then, if there is no locator running on the same host, the next locator in 
> the list is contacted, until a locator contact configured with an IP address 
> is reached - which eventually succeeds.
> But when there happens to be a locator listening on the same host, we get a 
> null pointer exception in the second line below, because _inetadd=null_:
> _socket.connect(sockaddr, Math.max(timeout, 0)); // sockaddr=0.0.0.0, 
> connects to a locator listening on the same host_
> _configureClientSSLSocket(socket, inetadd.getHostName(), timeout); // inetadd 
> = null_
>  
> As a result, the cluster comes to a failed state, unable to recover.
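
One conceivable guard for the NPE described above (a sketch only, not the 
actual fix; it assumes sockaddr is an InetSocketAddress, as the 0.0.0.0 
connect suggests):
{code:java}
// Fall back to the requested address string when the InetAddress could not be
// resolved, instead of dereferencing inetadd unconditionally.
String hostName = (inetadd != null) ? inetadd.getHostName() : sockaddr.getHostString();
configureClientSSLSocket(socket, hostName, timeout);
{code}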



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (GEODE-9881) Fully recovered Oplogs object indicating unrecoveredRegionCount>0 preventing compaction

2021-12-09 Thread Jakov Varenina (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakov Varenina updated GEODE-9881:
--
Description: 
We have found a problem when a region is closed with 
{color:#ff}Region.close(){color} and then recreated to start recovery. If you 
inspect this code in the close() function, you will notice that it doesn't 
make sense:
{code:java}
  void close(DiskRegion dr) {
// while a krf is being created can not close a region
lockCompactor();
try {
  if (!isDrfOnly()) {
DiskRegionInfo dri = getDRI(dr);
if (dri != null) {
  long clearCount = dri.clear(null);
  if (clearCount != 0) {
totalLiveCount.addAndGet(-clearCount);
// no need to call handleNoLiveValues because we now have an
// unrecovered region.
  }
  regionMap.get().remove(dr.getId(), dri);
}
addUnrecoveredRegion(dr.getId());
  }
} finally {
  unlockCompactor();
}
  }

{code}
Please notice that addUnrecoveredRegion() marks the DiskRegionInfo object as 
unrecovered and increments the unrecoveredRegionCount counter. This 
DiskRegionInfo object is contained in the regionMap structure. Afterwards, the 
DiskRegionInfo object (that was previously marked as unrecovered) is removed 
from the regionMap. This doesn't make sense: the object is updated and then 
removed from the map to be garbage collected. As you will see later on, this 
causes issues when the region is recovered.

Please check this code at recovery:
{code:java}
/**
 * For each dri that this oplog has that is currently unrecoverable check to see
 * if a DiskRegion that is recoverable now exists.
 */
void checkForRecoverableRegion(DiskRegionView dr) {
  if (unrecoveredRegionCount.get() > 0) {
DiskRegionInfo dri = getDRI(dr);
if (dri != null) {
  if (dri.testAndSetRecovered(dr)) {
unrecoveredRegionCount.decrementAndGet();
  }
}
  }
}
{code}
The problem is that Geode will not clear the unrecoveredRegionCount counter in 
Oplog objects after recovery is done. This is because checkForRecoverableRegion 
checks the unrecoveredRegionCount counter and performs testAndSetRecovered. The 
testAndSetRecovered call will always return false, because none of the 
DiskRegionInfo objects in the region map have the unrecovered flag set to true 
(all objects marked as unrecovered were deleted by close() and then recreated 
during recovery; see the note below). The result is that all Oplogs will be 
fully recovered while the counter incorrectly indicates 
unrecoveredRegionCount>0. This will later prevent the compaction of recovered 
Oplogs (the files that have .crf, .drf and .krf) when they reach the 
compaction threshold.

Note: During recovery, the regionMap is recreated from the Oplog files. Since 
all DiskRegionInfo objects were deleted from the regionMap during close(), they 
are recreated via the initRecoveredEntry function during recovery. All 
DiskRegionInfo objects are created with the unrecovered flag set to false.

 

  was:
We have found a problem when a region is closed and then recreated to start 
recovery. If you inspect this code in the close() function, you will notice 
that it doesn't make sense:
{code:java}
  void close(DiskRegion dr) {
// while a krf is being created can not close a region
lockCompactor();
try {
  if (!isDrfOnly()) {
DiskRegionInfo dri = getDRI(dr);
if (dri != null) {
  long clearCount = dri.clear(null);
  if (clearCount != 0) {
totalLiveCount.addAndGet(-clearCount);
// no need to call handleNoLiveValues because we now have an
// unrecovered region.
  }
  regionMap.get().remove(dr.getId(), dri);
}
addUnrecoveredRegion(dr.getId());
  }
} finally {
  unlockCompactor();
}
  }

{code}
Please notice that addUnrecoveredRegion() marks the DiskRegionInfo object as 
unrecovered and increments the unrecoveredRegionCount counter. This 
DiskRegionInfo object is contained in the regionMap structure. Afterwards, the 
DiskRegionInfo object (that was previously marked as unrecovered) is removed 
from the regionMap. This doesn't make sense: the object is updated and then 
removed from the map to be garbage collected. As you will see later on, this 
causes issues when the region is recovered.

Please check this code at recovery:
{code:java}
/**
 * For each dri that this oplog has that is currently unrecoverable check to see
 * if a DiskRegion that is recoverable now exists.
 */
void checkForRecoverableRegion(DiskRegionView dr) {
  if (unrecoveredRegionCount.get() > 0) {
DiskRegionInfo dri = getDRI(dr);
if (dri != null) {
  if (dri.testAndSetRecovered(dr)) {
unrecoveredRegionCount.decrementAndGet();
  }
}
  }
}
{code}
The problem is that geode will not clear counter unrecoveredRegion

[jira] [Updated] (GEODE-9881) Fully recovered Oplogs object indicating unrecoveredRegionCount>0 preventing compaction

2021-12-09 Thread Jakov Varenina (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakov Varenina updated GEODE-9881:
--
Description: 
We have found a problem when a region is closed with Region.close() and then 
recreated to start recovery. If you inspect this code in the close() function, 
you will notice that it doesn't make sense:
{code:java}
  void close(DiskRegion dr) {
// while a krf is being created can not close a region
lockCompactor();
try {
  if (!isDrfOnly()) {
DiskRegionInfo dri = getDRI(dr);
if (dri != null) {
  long clearCount = dri.clear(null);
  if (clearCount != 0) {
totalLiveCount.addAndGet(-clearCount);
// no need to call handleNoLiveValues because we now have an
// unrecovered region.
  }
  regionMap.get().remove(dr.getId(), dri);
}
addUnrecoveredRegion(dr.getId());
  }
} finally {
  unlockCompactor();
}
  }

{code}
Please notice that addUnrecoveredRegion() marks the DiskRegionInfo object as 
unrecovered and increments the unrecoveredRegionCount counter. This 
DiskRegionInfo object is contained in the regionMap structure. Afterwards, the 
DiskRegionInfo object (that was previously marked as unrecovered) is removed 
from the regionMap. This doesn't make sense: the object is updated and then 
removed from the map to be garbage collected. As you will see later on, this 
causes issues when the region is recovered.

Please check this code at recovery:
{code:java}
/**
 * For each dri that this oplog has that is currently unrecoverable check to see
 * if a DiskRegion that is recoverable now exists.
 */
void checkForRecoverableRegion(DiskRegionView dr) {
  if (unrecoveredRegionCount.get() > 0) {
DiskRegionInfo dri = getDRI(dr);
if (dri != null) {
  if (dri.testAndSetRecovered(dr)) {
unrecoveredRegionCount.decrementAndGet();
  }
}
  }
}
{code}
The problem is that Geode will not clear the unrecoveredRegionCount counter in 
Oplog objects after recovery is done. This is because checkForRecoverableRegion 
checks the unrecoveredRegionCount counter and performs testAndSetRecovered. The 
testAndSetRecovered call will always return false, because none of the 
DiskRegionInfo objects in the region map have the unrecovered flag set to true 
(all objects marked as unrecovered were deleted by close() and then recreated 
during recovery; see the note below). The result is that all Oplogs will be 
fully recovered while the counter incorrectly indicates 
unrecoveredRegionCount>0. This will later prevent the compaction of recovered 
Oplogs (the files that have .crf, .drf and .krf) when they reach the 
compaction threshold.

Note: During recovery, the regionMap is recreated from the Oplog files. Since 
all DiskRegionInfo objects were deleted from the regionMap during close(), they 
are recreated via the initRecoveredEntry function during recovery. All 
DiskRegionInfo objects are created with the unrecovered flag set to false.

 

  was:
We have found a problem when a region is closed with 
{color:#ff}Region.close(){color} and then recreated to start recovery. If you 
inspect this code in the close() function, you will notice that it doesn't 
make sense:
{code:java}
  void close(DiskRegion dr) {
// while a krf is being created can not close a region
lockCompactor();
try {
  if (!isDrfOnly()) {
DiskRegionInfo dri = getDRI(dr);
if (dri != null) {
  long clearCount = dri.clear(null);
  if (clearCount != 0) {
totalLiveCount.addAndGet(-clearCount);
// no need to call handleNoLiveValues because we now have an
// unrecovered region.
  }
  regionMap.get().remove(dr.getId(), dri);
}
addUnrecoveredRegion(dr.getId());
  }
} finally {
  unlockCompactor();
}
  }

{code}
Please notice that addUnrecoveredRegion() marks the DiskRegionInfo object as 
unrecovered and increments the unrecoveredRegionCount counter. This 
DiskRegionInfo object is contained in the regionMap structure. Afterwards, the 
DiskRegionInfo object (that was previously marked as unrecovered) is removed 
from the regionMap. This doesn't make sense: the object is updated and then 
removed from the map to be garbage collected. As you will see later on, this 
causes issues when the region is recovered.

Please check this code at recovery:
{code:java}
/**
 * For each dri that this oplog has that is currently unrecoverable check to see
 * if a DiskRegion that is recoverable now exists.
 */
void checkForRecoverableRegion(DiskRegionView dr) {
  if (unrecoveredRegionCount.get() > 0) {
DiskRegionInfo dri = getDRI(dr);
if (dri != null) {
  if (dri.testAndSetRecovered(dr)) {
unrecoveredRegionCount.decrementAndGet();
  }
}
  }
}
{code}
The problem is that geode will not clear count

[jira] [Commented] (GEODE-9814) Add an example of geode-for-redis to the geode examples project

2021-12-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456467#comment-17456467
 ] 

ASF GitHub Bot commented on GEODE-9814:
---

jomartin-999 commented on a change in pull request #110:
URL: https://github.com/apache/geode-examples/pull/110#discussion_r765826888



##
File path: geodeForRedis/scripts/start.gfsh
##
@@ -0,0 +1,19 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more 
contributor license
+ * agreements. See the NOTICE file distributed with this work for additional 
information regarding
+ * copyright ownership. The ASF licenses this file to You under the Apache 
License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the 
License. You may obtain a
+ * copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software 
distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 
KIND, either express
+ * or implied. See the License for the specific language governing permissions 
and limitations under
+ * the License.
+ */
+
+start locator --name=locator --bind-address=localhost
+
+start server --name=redisServer1 --locators=localhost[10334] --server-port=0 
--J=-Dgemfire.geode-for-redis-enabled=true 
--J=-Dgemfire.geode-for-redis-port=6379 
--J=-Dgemfire.geode-for-redis-bind-address=127.0.0.1

Review comment:
   @DonalEvans  It might be nice for this example to also set the 
redundancy level.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@geode.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add an example of geode-for-redis to the geode examples project
> ---
>
> Key: GEODE-9814
> URL: https://issues.apache.org/jira/browse/GEODE-9814
> Project: Geode
>  Issue Type: Improvement
>  Components: redis
>Reporter: Dan Smith
>Assignee: Donal Evans
>Priority: Major
>  Labels: pull-request-available
>
> Add an example to the geode-examples project/repo demonstrating how to turn 
> on and use geode-for-redis.
> This is just a script. The user must download native Redis to get the 
> command-line tool.
> Cluster Mode must be used.
> Start the server with gfsh.
> Use the JedisCluster client to (see the sketch after this quote):
>  * Perform Sets
>  * Perform Gets
> Have a readme that speaks to using native Redis.
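
A hedged sketch of what the client side of such an example could look like 
(the key and value are illustrative; it assumes a geode-for-redis server 
listening on 127.0.0.1:6379, as in the start.gfsh script discussed above):
{code:java}
import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;

public class GeodeForRedisExample {
  public static void main(String[] args) {
    // JedisCluster follows the MOVED redirections used by cluster mode.
    try (JedisCluster jedis = new JedisCluster(new HostAndPort("127.0.0.1", 6379))) {
      jedis.set("greeting", "hello from geode-for-redis"); // perform a Set
      System.out.println(jedis.get("greeting"));           // perform a Get
    }
  }
}
{code}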



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-9880) Cluster with multiple locators in an environment with no host name resolution, leads to null pointer exception

2021-12-09 Thread Anthony Baker (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456560#comment-17456560
 ] 

Anthony Baker commented on GEODE-9880:
--

The locator list returned to the client contained \{ip1, host1, ip2, host2}. 
The discovered list provided to the client should have contained only 2 
entries, corresponding to the 2 locators.

Second, the advertised address of the locator should follow these semantics:

1) If hostname-for-clients is set, use that.

2) If bind-address is set, use that interface.

3) Otherwise, select an available network interface, but there are no 
guarantees about ordering or DNS resolution.

> Cluster with multiple locators in an environment with no host name 
> resolution, leads to null pointer exception
> --
>
> Key: GEODE-9880
> URL: https://issues.apache.org/jira/browse/GEODE-9880
> Project: Geode
>  Issue Type: Bug
>  Components: locator
>Affects Versions: 1.12.5
>Reporter: Tigran Ghahramanyan
>Priority: Major
>
> In our use case we have two locators that are initially configured with IP 
> addresses, but the _AutoConnectionSourceImpl.UpdateLocatorList()_ flow keeps 
> adding their corresponding host names to the locator list, even though these 
> host names are not resolvable.
> Later, in {_}AutoConnectionSourceImpl.queryLocators(){_}, whenever a client 
> tries to use such a non-resolvable host name to connect to a locator, it 
> tries to establish a connection to {_}socketaddr=0.0.0.0{_}, as written in 
> {_}SocketCreator.connect(){_}, which seems strange.
> Then, if there is no locator running on the same host, the next locator in 
> the list is contacted, until a locator contact configured with an IP address 
> is reached - which eventually succeeds.
> But when there happens to be a locator listening on the same host, we get a 
> null pointer exception in the second line below, because _inetadd=null_:
> _socket.connect(sockaddr, Math.max(timeout, 0)); // sockaddr=0.0.0.0, 
> connects to a locator listening on the same host_
> _configureClientSSLSocket(socket, inetadd.getHostName(), timeout); // inetadd 
> = null_
>  
> As a result, the cluster comes to a failed state, unable to recover.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-9622) CI Failure: ClientServerTransactionFailoverWithMixedVersionServersDistributedTest fails with BindException

2021-12-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456562#comment-17456562
 ] 

ASF subversion and git services commented on GEODE-9622:


Commit 20c417710673faf1d2afb2d72ed14fcaadc17926 in geode's branch 
refs/heads/develop from Dale Emery
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=20c4177 ]

GEODE-9622: Make failover test not use ephemeral port (#7178)

PROBLEM

`ClientServerTransactionFailoverWithMixedVersionServersDistributedTest`
misused ephemeral ports. Some tests start a locator on an ephemeral
port, stop the locator, and attempt to restart it on the same port.

During the time the locator is stopped, the OS can assign that port to another 
process. When that happens, as in these failures, the test is unable to restart 
the locator.

SOLUTION

Change the test to use `AvailablePortHelper` to assign an available
port, rather than requesting an ephemeral port.
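
A hedged sketch of the pattern (the log file and properties are placeholders; 
the actual test code may differ):
{code:java}
import java.io.File;
import java.io.IOException;
import java.util.Properties;

import org.apache.geode.distributed.Locator;
import org.apache.geode.internal.AvailablePortHelper;

public class LocatorRestartSketch {
  public static void main(String[] args) throws IOException {
    // Reserve a concrete port up front rather than passing port 0 and letting
    // the OS choose an ephemeral port that may be handed to another process
    // while the locator is stopped.
    int locatorPort = AvailablePortHelper.getRandomAvailableTCPPort();

    Locator locator =
        Locator.startLocatorAndDS(locatorPort, new File("locator.log"), new Properties());
    locator.stop();

    // Restart on the same known port instead of asking for a new ephemeral one.
    locator =
        Locator.startLocatorAndDS(locatorPort, new File("locator.log"), new Properties());
    locator.stop();
  }
}
{code}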

> CI Failure: 
> ClientServerTransactionFailoverWithMixedVersionServersDistributedTest fails 
> with BindException
> --
>
> Key: GEODE-9622
> URL: https://issues.apache.org/jira/browse/GEODE-9622
> Project: Geode
>  Issue Type: Bug
>  Components: client/server, tests
>Reporter: Kirk Lund
>Assignee: Dale Emery
>Priority: Major
>  Labels: GeodeOperationAPI, pull-request-available
>
> {noformat}
> org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest
>  > clientTransactionOperationsAreNotLostIfTransactionIsOnRolledServer FAILED
> org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest$$Lambda$68/194483270.call
>  in VM 5 running on Host 
> heavy-lifter-2b8702f1-fe32-5895-9b14-832b7049b607.c.apachegeode-ci.internal 
> with 6 VMs
> at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:631)
> at org.apache.geode.test.dunit.VM.invoke(VM.java:473)
> at 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.rollLocatorToCurrent(ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.java:216)
> at 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.setupPartiallyRolledVersion(ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.java:171)
> at 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.clientTransactionOperationsAreNotLostIfTransactionIsOnRolledServer(ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.java:127)
> Caused by:
> java.net.BindException: Failed to create server socket on 
> heavy-lifter-2b8702f1-fe32-5895-9b14-832b7049b607.c.apachegeode-ci.internal/10.0.0.60[43535]
> at 
> org.apache.geode.distributed.internal.tcpserver.ClusterSocketCreatorImpl.createServerSocket(ClusterSocketCreatorImpl.java:75)
> at 
> org.apache.geode.internal.net.SCClusterSocketCreator.createServerSocket(SCClusterSocketCreator.java:55)
> at 
> org.apache.geode.distributed.internal.tcpserver.ClusterSocketCreatorImpl.createServerSocket(ClusterSocketCreatorImpl.java:54)
> at 
> org.apache.geode.distributed.internal.tcpserver.TcpServer.initializeServerSocket(TcpServer.java:196)
> at 
> org.apache.geode.distributed.internal.tcpserver.TcpServer.startServerThread(TcpServer.java:183)
> at 
> org.apache.geode.distributed.internal.tcpserver.TcpServer.start(TcpServer.java:178)
> at 
> org.apache.geode.distributed.internal.membership.gms.locator.MembershipLocatorImpl.start(MembershipLocatorImpl.java:112)
> at 
> org.apache.geode.distributed.internal.InternalLocator.startPeerLocation(InternalLocator.java:653)
> at 
> org.apache.geode.distributed.internal.InternalLocator.startLocator(InternalLocator.java:394)
> at 
> org.apache.geode.distributed.internal.InternalLocator.startLocator(InternalLocator.java:345)
> at 
> org.apache.geode.distributed.Locator.startLocator(Locator.java:261)
> at 
> org.apache.geode.distributed.Locator.startLocatorAndDS(Locator.java:207)
> at 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.startLocator(ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.java:180)
> at 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.lambda$rollLocatorToCurrent$92d5d92a$1(ClientServerTransa

[jira] [Updated] (GEODE-9877) GeodeRedisServerStartupDUnitTest.startupFailsGivenPortAlreadyInUse failed

2021-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated GEODE-9877:
--
Labels: pull-request-available  (was: )

> GeodeRedisServerStartupDUnitTest.startupFailsGivenPortAlreadyInUse failed
> --
>
> Key: GEODE-9877
> URL: https://issues.apache.org/jira/browse/GEODE-9877
> Project: Geode
>  Issue Type: Bug
>  Components: redis
>Affects Versions: 1.15.0
>Reporter: Mark Hanson
>Priority: Major
>  Labels: pull-request-available
>
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/acceptance-test-openjdk8/builds/43]
>  failed with 
> GeodeRedisServerStartupDUnitTest.startupFailsGivenPortAlreadyInUse
> {noformat}
> java.net.BindException: Address already in use (Bind failed)
>   at java.net.PlainSocketImpl.socketBind(Native Method)
>   at 
> java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
>   at java.net.Socket.bind(Socket.java:662)
>   at 
> org.apache.geode.redis.GeodeRedisServerStartupDUnitTest.startupFailsGivenPortAlreadyInUse(GeodeRedisServerStartupDUnitTest.java:115)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.geode.test.dunit.rules.ClusterStartupRule$1.evaluate(ClusterStartupRule.java:138)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>   at 
> org.apache.geode.test.junit.rules.DescribedExternalResource$1.evaluate(DescribedExternalResource.java:40)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
>   at 
> org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43)
>   at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
>   at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at java.util.Iterator.forEachRemaining(Iterator.java:116)
>   at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
>   at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
>   at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
>   at 
> org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:82)
>   at 
> org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:73)
>   at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:108)
>   at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88)
>   at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54)
>   at 
> org.junit

[jira] [Resolved] (GEODE-9870) JedisMovedDataException exception in testReconnectionWithAuthAndServerRestarts

2021-12-09 Thread Jens Deppe (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jens Deppe resolved GEODE-9870.
---
Fix Version/s: 1.15.0
   Resolution: Fixed

> JedisMovedDataException exception in testReconnectionWithAuthAndServerRestarts
> --
>
> Key: GEODE-9870
> URL: https://issues.apache.org/jira/browse/GEODE-9870
> Project: Geode
>  Issue Type: Bug
>  Components: redis
>Affects Versions: 1.15.0
>Reporter: Bill Burcham
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> CI failure here 
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/315|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/315]:
>  
> {code:java}
> AuthWhileServersRestartDUnitTest > testReconnectionWithAuthAndServerRestarts 
> FAILED
> redis.clients.jedis.exceptions.JedisMovedDataException: MOVED 12539 
> 127.0.0.1:26259
> at redis.clients.jedis.Protocol.processError(Protocol.java:119)
> at redis.clients.jedis.Protocol.process(Protocol.java:169)
> at redis.clients.jedis.Protocol.read(Protocol.java:223)
> at 
> redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:352)
> at 
> redis.clients.jedis.Connection.getStatusCodeReply(Connection.java:270)
> at redis.clients.jedis.BinaryJedis.flushAll(BinaryJedis.java:826)
> at 
> org.apache.geode.test.dunit.rules.RedisClusterStartupRule.flushAll(RedisClusterStartupRule.java:147)
> at 
> org.apache.geode.test.dunit.rules.RedisClusterStartupRule.flushAll(RedisClusterStartupRule.java:131)
> at 
> org.apache.geode.redis.internal.executor.auth.AuthWhileServersRestartDUnitTest.after(AuthWhileServersRestartDUnitTest.java:88){code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-8644) SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() intermittently fails when queues drain too slowly

2021-12-09 Thread Xiaojian Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456592#comment-17456592
 ] 

Xiaojian Zhou commented on GEODE-8644:
--

"Failed to connect to localhost/127.0.0.1:0" error message was introduced in 
Geode-7751. But introducing this error message itself is not the root cause. 

> SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() 
> intermittently fails when queues drain too slowly
> ---
>
> Key: GEODE-8644
> URL: https://issues.apache.org/jira/browse/GEODE-8644
> Project: Geode
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Benjamin P Ross
>Assignee: Mark Hanson
>Priority: Major
>  Labels: GeodeOperationAPI, needsTriage, pull-request-available
>
> Currently the test 
> SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() 
> relies on a 2-second delay to allow the queues to finish draining after the 
> put operation completes. If the queues take longer than 2 seconds to drain, 
> the test fails. We should change the test to wait for the queues to be 
> empty, with a long timeout in case the queues never fully drain.
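
A hedged sketch of the suggested change (regionQueueSize() is a hypothetical 
stand-in for however the test reads the queue size):
{code:java}
import static org.apache.geode.test.awaitility.GeodeAwaitility.await;
import static org.assertj.core.api.Assertions.assertThat;

// Instead of sleeping for a fixed 2 seconds after the puts, poll until the
// sender queue is empty, failing only after GeodeAwaitility's long default
// timeout.
private void waitForQueuesToDrain() {
  await().untilAsserted(() -> assertThat(regionQueueSize()).isEqualTo(0));
}
{code}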



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-9872) DistTXPersistentDebugDUnitTest tests fail because "cluster configuration service not available"

2021-12-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456612#comment-17456612
 ] 

ASF subversion and git services commented on GEODE-9872:


Commit 68b9080e84054f059b8c3e9b4aff9034fb302353 in geode's branch 
refs/heads/develop from Dale Emery
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=68b9080 ]

GEODE-9872: Make test framework code assign ports (#7176)

* GEODE-9872: Make test framework code assign ports

PROBLEM

`DistTXPersistentDebugDUnitTest` failed in CI because it accidentally
connected to a locator from another test
(`ClusterConfigLocatorRestartDUnitTest`).

CAUSE

`ClusterConfigLocatorRestartDUnitTest` attempts to restart a
locator on a port in the ephemeral port range.

Here is the sequence of events:
1. `ClusterConfigLocatorRestartDUnitTest` started a locator on an
   ephemeral port. In this CI run it got port 37877.
2. `ClusterConfigLocatorRestartDUnitTest` stopped the locator on port
   37877.
3. `DistTXPersistentDebugDUnitTest` started a locator on an ephemeral
   port. In this CI run it got 37877.
4. `ClusterConfigLocatorRestartDUnitTest` attempted to restart the
   locator on port 37877. That port was already in use by
   `DistTXPersistentDebugDUnitTest`'s locator, and as a result the two
   tests became entangled.

CONTRIBUTING FACTORS

`DistTXPersistentDebugDUnitTest` uses `DUnitLauncher` to start its
locator. By default, `DUnitLauncher` starts its locator on an ephemeral
port.

`ClusterConfigLocatorRestartDUnitTest` uses `ClusterStartupRule` to
start several locators. By default, `ClusterStartupRule` starts each
locator on an ephemeral port.

SOLUTION

Change `DUnitLauncher` and `ClusterStartupRule` to assign locator ports
via `AvailablePortHelper` if the test does not specify a particular
port.
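
A hedged sketch of that framework-side fallback (variable names are 
illustrative; this is not the actual `DUnitLauncher` or `ClusterStartupRule` 
code):
{code:java}
import org.apache.geode.internal.AvailablePortHelper;

// If the test asked for a specific port, honor it; otherwise reserve a
// concrete port instead of passing 0 and getting an ephemeral one that the
// test cannot safely reuse after a restart.
int locatorPort = (requestedPort != 0)
    ? requestedPort
    : AvailablePortHelper.getRandomAvailableTCPPort();
{code}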

I considered changing only `ClusterConfigLocatorRestartDUnitTest` to
assign the port that it intends to reuse. But:
- That would fix only this one test, though an unknown number of tests
  similarly attempt to reuse ports assigned by framework code. Numerous
  of those tests have already been changed to assign ports explicitly,
  but an unknown number remain.
- It is quite reasonable for this test and others to assume that, if the
  test framework assigns a port on the test's behalf, then the test will
  enjoy exclusive use of that port for the entire life of the test. I
  think the key problem is not that tests make this assumption, but that
  the framework code violates it.

Changing the test framework classes that tacitly assign ports
(`DUnitLauncher` and `ClusterStartupRule`) makes them behave in a way
that tests expect.

* Add new port var to dunit sanctioned serializables

> DistTXPersistentDebugDUnitTest tests fail because "cluster configuration 
> service not available"
> ---
>
> Key: GEODE-9872
> URL: https://issues.apache.org/jira/browse/GEODE-9872
> Project: Geode
>  Issue Type: Bug
>  Components: tests
>Reporter: Bill Burcham
>Assignee: Dale Emery
>Priority: Major
>  Labels: GeodeOperationAPI, pull-request-available
>
> I suspect this failure is due to something in the test framework, or perhaps 
> one or more tests failing to manage ports correctly, allowing two or more 
> tests to interfere with one another.
> In this distributed test: 
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/388]
>  we see two failures. Here's the first full stack trace:
>  
>  
> {code:java}
> [error 2021/12/04 20:40:53.796 UTC  
> tid=33] org.apache.geode.GemFireConfigException: cluster configuration 
> service not available
> at 
> org.junit.vintage.engine.execution.TestRun.getStoredResultOrSuccessful(TestRun.java:196)
> at 
> org.junit.vintage.engine.execution.RunListenerAdapter.fireExecutionFinished(RunListenerAdapter.java:226)
> at 
> org.junit.vintage.engine.execution.RunListenerAdapter.testFinished(RunListenerAdapter.java:192)
> at 
> org.junit.vintage.engine.execution.RunListenerAdapter.testFinished(RunListenerAdapter.java:79)
> at 
> org.junit.runner.notification.SynchronizedRunListener.testFinished(SynchronizedRunListener.java:87)
> at 
> org.junit.runner.notification.RunNotifier$9.notifyListener(RunNotifier.java:225)
> at 
> org.junit.runner.notification.RunNotifier$SafeNotifier.run(RunNotifier.java:72)
> at 
> org.junit.runner.notification.RunNotifier.fireTestFinished(RunNotifier.java:222)
> at 
> org.junit.internal.runners.model.EachTestNotifier.fireTestFinished(EachTestNotifier.java:38)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:372)
> at 
> org.jun

[jira] [Resolved] (GEODE-9871) CI failure: InfoStatsIntegrationTest > networkKiloBytesReadOverLastSecond_shouldBeCloseToBytesReadOverLastSecond

2021-12-09 Thread Jens Deppe (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jens Deppe resolved GEODE-9871.
---
Fix Version/s: 1.15.0
 Assignee: Jens Deppe
   Resolution: Fixed

> CI failure: InfoStatsIntegrationTest > 
> networkKiloBytesReadOverLastSecond_shouldBeCloseToBytesReadOverLastSecond
> 
>
> Key: GEODE-9871
> URL: https://issues.apache.org/jira/browse/GEODE-9871
> Project: Geode
>  Issue Type: Bug
>  Components: redis, statistics
>Affects Versions: 1.15.0
>Reporter: Bill Burcham
>Assignee: Jens Deppe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> link: 
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/integration-test-openjdk8/builds/38]
> stack trace:
> {code:java}
> InfoStatsIntegrationTest > 
> networkKiloBytesReadOverLastSecond_shouldBeCloseToBytesReadOverLastSecond 
> FAILED
> org.opentest4j.AssertionFailedError: 
> expected: 0.0
>  but was: 0.01
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at 
> org.apache.geode.redis.internal.commands.executor.server.AbstractRedisInfoStatsIntegrationTest.networkKiloBytesReadOverLastSecond_shouldBeCloseToBytesReadOverLastSecond(AbstractRedisInfoStatsIntegrationTest.java:228)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at 
> org.apache.geode.test.junit.rules.serializable.SerializableExternalResource$1.evaluate(SerializableExternalResource.java:38)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
> at 
> org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43)
> at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
> at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
> at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
> at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
> at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
> at 
> java.util

[jira] [Commented] (GEODE-9758) Configure locator serialization filtering by default on Java 8

2021-12-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456650#comment-17456650
 ] 

ASF subversion and git services commented on GEODE-9758:


Commit db64b4948e790d61e82f95ae6163a62adc4c67fb in geode's branch 
refs/heads/develop from Kirk Lund
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=db64b49 ]

GEODE-9758: Move SanctionedSerializables to filter package (#7165)

Move SanctionedSerializables to new package
org.apache.geode.internal.serialization.filter.

> Configure locator serialization filtering by default on Java 8
> --
>
> Key: GEODE-9758
> URL: https://issues.apache.org/jira/browse/GEODE-9758
> Project: Geode
>  Issue Type: Improvement
>Affects Versions: 1.12.7
>Reporter: Jianxia Chen
>Assignee: Jianxia Chen
>Priority: Major
>  Labels: pull-request-available
>
> When Geode locator is running on Java 8 JVM, the serialization filter should 
> be configured by default to accept only JDK classes and Geode classes.
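
As a rough, hedged illustration of that kind of default (a sketch only; the
pattern below is an assumption for illustration, not Geode's actual filter), a
JEP 290-style process-wide filter accepting only JDK and Geode classes could
look like:

{code:java}
public class LocatorSerialFilterSketch {
  public static void main(String[] args) {
    // Must run before the first deserialization in the JVM.
    // "!*" rejects every class not matched by an earlier pattern.
    System.setProperty("jdk.serialFilter",
        "java.**;javax.**;org.apache.geode.**;!*");
    // ... start the locator here ...
  }
}
{code}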



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-9871) CI failure: InfoStatsIntegrationTest > networkKiloBytesReadOverLastSecond_shouldBeCloseToBytesReadOverLastSecond

2021-12-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456656#comment-17456656
 ] 

ASF subversion and git services commented on GEODE-9871:


Commit c65f048b5327fcd36694dfe9ab20251ed944eeb1 in geode's branch 
refs/heads/develop from Jens Deppe
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=c65f048 ]

GEODE-9871: Improve Radish test for network KB/s verification (#7170)
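
The commit above tightens the closeness check in the test. A minimal sketch of
a tolerance-based assertion (AssertJ; the values and tolerance here are
hypothetical, not the ones the test actually uses):

{code:java}
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.within;

public class NetworkStatToleranceSketch {
  public static void main(String[] args) {
    double expectedKiloBytesRead = 0.0;  // hypothetical expected value
    double actualKiloBytesRead = 0.01;   // hypothetical sampled statistic
    // Compare with an explicit tolerance instead of exact equality so that
    // sampler timing jitter does not fail the test.
    assertThat(actualKiloBytesRead)
        .isCloseTo(expectedKiloBytesRead, within(0.05));
  }
}
{code}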



> CI failure: InfoStatsIntegrationTest > 
> networkKiloBytesReadOverLastSecond_shouldBeCloseToBytesReadOverLastSecond
> 
>
> Key: GEODE-9871
> URL: https://issues.apache.org/jira/browse/GEODE-9871
> Project: Geode
>  Issue Type: Bug
>  Components: redis, statistics
>Affects Versions: 1.15.0
>Reporter: Bill Burcham
>Assignee: Jens Deppe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> link: 
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/integration-test-openjdk8/builds/38]
> stack trace:
> {code:java}
> InfoStatsIntegrationTest > 
> networkKiloBytesReadOverLastSecond_shouldBeCloseToBytesReadOverLastSecond 
> FAILED
> org.opentest4j.AssertionFailedError: 
> expected: 0.0
>  but was: 0.01
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at 
> org.apache.geode.redis.internal.commands.executor.server.AbstractRedisInfoStatsIntegrationTest.networkKiloBytesReadOverLastSecond_shouldBeCloseToBytesReadOverLastSecond(AbstractRedisInfoStatsIntegrationTest.java:228)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at 
> org.apache.geode.test.junit.rules.serializable.SerializableExternalResource$1.evaluate(SerializableExternalResource.java:38)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
> at 
> org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43)
> at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
> at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
> at 
> java.util.stream.AbstractPipeline.wrapAndC

[jira] [Assigned] (GEODE-9854) Orphaned .drf files causing memory leak

2021-12-09 Thread Darrel Schneider (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Darrel Schneider reassigned GEODE-9854:
---

Assignee: Darrel Schneider

> Orphaned .drf files causing memory leak
> ---
>
> Key: GEODE-9854
> URL: https://issues.apache.org/jira/browse/GEODE-9854
> Project: Geode
>  Issue Type: Bug
>Reporter: Jakov Varenina
>Assignee: Darrel Schneider
>Priority: Major
>  Labels: pull-request-available
> Attachments: screenshot-1.png, screenshot-2.png, server1.log
>
>
> Issue:
> OpLog files are compacted, but the .drf file is left behind because it
> contains deletes of entries in previous .crfs. The .crf file is deleted, but
> the orphaned .drf is not deleted until all previous .crf files (.crfs with
> smaller ids) are deleted.
> The problem is that the compacted Oplog object representing the orphaned .drf
> file holds a structure in memory (Oplog.regionMap) that contains information
> that is no longer useful after compaction, and this takes a certain amount of
> memory. In addition, there is a race condition in the code when creating .krf
> files that, depending on the execution order, can make the problem more
> severe (it can leave the pendingKrfTags structure on the regionMap, and this
> can take up a significant amount of memory). This pendingKrfTags HashMap is
> actually empty, but it consumes memory because it was used previously and its
> internal capacity was not reduced when it was cleared.
> This race condition usually happens when a new Oplog is rolled out and the
> previous Oplog is immediately marked as eligible for compaction. Compaction
> and .krf creation start at about the same time, and the compactor cancels
> creation of the .krf if it executes first. The pendingKrfTags structure is
> normally cleared when the .krf file is created, but since compaction canceled
> creation of the .krf, the pendingKrfTags structure remains in memory until
> the Oplog representing the orphaned .drf file is deleted.
> Below it can be seen that the .krf is never created for the orphaned .drf
> Oplog object that has memory allocated in pendingKrfTags:
> {code:java}
> server1.log:1956:[info 2021/11/25 21:52:26.866 CET server1 
>  tid=0x34] Created oplog#129 
> drf for disk store store1.
> server1.log:1958:[info 2021/11/25 21:52:26.867 CET server1 
>  tid=0x34] Created oplog#129 
> crf for disk store store1.
> server1.log:1974:[info 2021/11/25 21:52:39.490 CET server1  store1 for oplog oplog#129> tid=0x5c] OplogCompactor for store1 compaction 
> oplog id(s): oplog#129
> server1.log:1980:[info 2021/11/25 21:52:39.532 CET server1  store1 for oplog oplog#129> tid=0x5c] compaction did 3685 creates and updates 
> in 41 ms
> server1.log:1982:[info 2021/11/25 21:52:39.532 CET server1  Task4> tid=0x5d] Deleted oplog#129 crf for disk store store1.
> {code}
> !screenshot-1.png|width=1123,height=268!
> Below you can see the log and heap dump of an orphaned .drf Oplog that does
> not have pendingKrfTags allocated in memory. This is because pendingKrfTags
> is cleared when the .krf is created, as can be seen in the logs below.
> {code:java}
> server1.log:1976:[info 2021/11/25 21:52:39.491 CET server1 
>  tid=0x34] Created oplog#130 
> drf for disk store store1.
> server1.log:1978:[info 2021/11/25 21:52:39.493 CET server1 
>  tid=0x34] Created oplog#130 
> crf for disk store store1.
> server1.log:1998:[info 2021/11/25 21:52:41.131 CET server1  OplogCompactor> tid=0x5c] Created oplog#130 krf for disk store store1.
> server1.log:2000:[info 2021/11/25 21:52:41.893 CET server1  store1 for oplog oplog#130> tid=0x5c|#130> tid=0x5c] OplogCompactor for 
> store1 compaction oplog id(s): oplog#130
> server1.log:2002:[info 2021/11/25 21:52:41.958 CET server1  store1 for oplog oplog#130> tid=0x5c|#130> tid=0x5c] compaction did 9918 
> creates and updates in 64 ms
> server1.log:2004:[info 2021/11/25 21:52:41.958 CET server1  Task4> tid=0x5d] Deleted oplog#130 crf for disk store store1.
> server1.log:2006:[info 2021/11/25 21:52:41.958 CET server1  Task4> tid=0x5d] Deleted oplog#130 krf for disk store store1.
> {code}
> !screenshot-2.png|width=1123,height=268!
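
The "cleared but still large" HashMap behavior described above is standard JDK
behavior rather than anything Geode-specific; a self-contained sketch:

{code:java}
import java.util.HashMap;
import java.util.Map;

public class ClearedMapRetainsCapacity {
  public static void main(String[] args) {
    Map<Long, byte[]> pendingKrfTags = new HashMap<>();
    for (long i = 0; i < 1_000_000; i++) {
      pendingKrfTags.put(i, new byte[0]);
    }
    pendingKrfTags.clear();
    // HashMap.clear() only nulls the slots of the internal table; the table
    // array keeps its grown length, so the now-empty map still pins memory
    // until the map object itself becomes unreachable.
    System.out.println("size after clear: " + pendingKrfTags.size()); // 0
  }
}
{code}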



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-9854) Orphaned .drf files causing memory leak

2021-12-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456669#comment-17456669
 ] 

ASF subversion and git services commented on GEODE-9854:


Commit 324ed89c3d43a53466cf5aeb614b63e757ba8b23 in geode's branch 
refs/heads/develop from Jakov Varenina
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=324ed89 ]

GEODE-9854: Orphaned .drf file causing memory leak (#7145)

* GEODE-9854: Orphaned .drf file causing memory leak

Issue:
OpLog files are compacted, but the .drf file is left behind because it contains
deletes of entries in previous .crfs. The .crf file is deleted, but the orphaned
.drf is not deleted until all previous .crf files (.crfs with smaller ids) are
deleted.

The problem is that the compacted Oplog object representing the orphaned .drf
file holds a structure in memory (Oplog.regionMap) that contains information
that is no longer useful after compaction, and this takes a certain amount of
memory. In addition, there is a race condition in the code when creating .krf
files that, depending on the execution order, can make the problem more severe
(it can leave the pendingKrfTags structure on the regionMap, and this can take
up a significant amount of memory). This pendingKrfTags HashMap is actually
empty, but it consumes memory because it was used previously and its internal
capacity was not reduced when it was cleared.
This race condition usually happens when a new Oplog is rolled out and the
previous Oplog is immediately marked as eligible for compaction. Compaction and
.krf creation start at about the same time, and the compactor cancels creation
of the .krf if it executes first. The pendingKrfTags structure is normally
cleared when the .krf file is created, but since compaction canceled creation
of the .krf, the pendingKrfTags structure remains in memory until the Oplog
representing the orphaned .drf file is deleted.

Solution:
Clear the regionMap data structure of the Oplog when it is compacted (currently
it is deleted only when the Oplog is destroyed).

* Introduced an inner static class RegionMap in Oplog.
* RegionMap.get() always returns an empty map if the RegionMap was closed
beforehand (see the sketch after this commit message).
* When closing a disk region, skip adding a drf-only oplog to the unrecovered
map, and also don't try to remove it from the regionMap (it was already
removed during compaction).

* The following test cases are introduced:

1. Recovery of a single region after the cache is closed and then recreated
(testCompactorRegionMapDeletedForOnlyDrfOplogAfterCompactionAndRecoveryAfterCacheClosed)

2. Recovery of a single region after the region is closed and then recreated
(testCompactorRegionMapDeletedForOnlyDrfOplogAfterCompactionAndRecoveryAfterRegionClose)

Co-authored-by: Alberto Gomez 
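
A minimal sketch of the close-then-empty behavior described in the solution
bullets above (everything beyond the get()/close() idea is hypothetical, not
the actual Geode code):

{code:java}
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class RegionMapSketch<K, V> {
  private volatile Map<K, V> map = new ConcurrentHashMap<>();

  Map<K, V> get() {
    // Always returns an empty map once close() has run.
    return map;
  }

  void close() {
    // Drop the backing map when the Oplog is compacted so its memory can be
    // reclaimed; later callers see an immutable empty map, never stale entries.
    map = Collections.emptyMap();
  }
}
{code}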

> Orphaned .drf files causing memory leak
> ---
>
> Key: GEODE-9854
> URL: https://issues.apache.org/jira/browse/GEODE-9854
> Project: Geode
>  Issue Type: Bug
>Reporter: Jakov Varenina
>Assignee: Jakov Varenina
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
> Attachments: screenshot-1.png, screenshot-2.png, server1.log
>
>
> Issue:
> OpLog files are compacted, but the .drf file is left behind because it
> contains deletes of entries in previous .crfs. The .crf file is deleted, but
> the orphaned .drf is not deleted until all previous .crf files (.crfs with
> smaller ids) are deleted.
> The problem is that the compacted Oplog object representing the orphaned .drf
> file holds a structure in memory (Oplog.regionMap) that contains information
> that is no longer useful after compaction, and this takes a certain amount of
> memory. In addition, there is a race condition in the code when creating .krf
> files that, depending on the execution order, can make the problem more
> severe (it can leave the pendingKrfTags structure on the regionMap, and this
> can take up a significant amount of memory). This pendingKrfTags HashMap is
> actually empty, but it consumes memory because it was used previously and its
> internal capacity was not reduced when it was cleared.
> This race condition usually happens when a new Oplog is rolled out and the
> previous Oplog is immediately marked as eligible for compaction. Compaction
> and .krf creation start at about the same time, and the compactor cancels
> creation of the .krf if it executes first. The pendingKrfTags structure is
> normally cleared when the .krf file is created, but since compaction canceled
> creation of the .krf, the pendingKrfTags structure remains in memory until
> the Oplog representing the orphaned .drf file is deleted.
> Below it can be seen that the .krf is never created for the orphaned .drf
> Oplog object that has memory allocated in pendingKrfTags:
> {code:java}
> server1.log:1956:[info 2021/11/25 21:52:26.866 CET server1 
>  tid=0x34] Created oplog#129 
> drf for disk store store1.
> server1

[jira] [Resolved] (GEODE-9854) Orphaned .drf files causing memory leak

2021-12-09 Thread Darrel Schneider (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Darrel Schneider resolved GEODE-9854.
-
Fix Version/s: 1.15.0
 Assignee: Jakov Varenina  (was: Darrel Schneider)
   Resolution: Fixed

> Orphaned .drf files causing memory leak
> ---
>
> Key: GEODE-9854
> URL: https://issues.apache.org/jira/browse/GEODE-9854
> Project: Geode
>  Issue Type: Bug
>Reporter: Jakov Varenina
>Assignee: Jakov Varenina
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
> Attachments: screenshot-1.png, screenshot-2.png, server1.log
>
>
> Issue:
> OpLog files are compacted, but the .drf file is left behind because it
> contains deletes of entries in previous .crfs. The .crf file is deleted, but
> the orphaned .drf is not deleted until all previous .crf files (.crfs with
> smaller ids) are deleted.
> The problem is that the compacted Oplog object representing the orphaned .drf
> file holds a structure in memory (Oplog.regionMap) that contains information
> that is no longer useful after compaction, and this takes a certain amount of
> memory. In addition, there is a race condition in the code when creating .krf
> files that, depending on the execution order, can make the problem more
> severe (it can leave the pendingKrfTags structure on the regionMap, and this
> can take up a significant amount of memory). This pendingKrfTags HashMap is
> actually empty, but it consumes memory because it was used previously and its
> internal capacity was not reduced when it was cleared.
> This race condition usually happens when a new Oplog is rolled out and the
> previous Oplog is immediately marked as eligible for compaction. Compaction
> and .krf creation start at about the same time, and the compactor cancels
> creation of the .krf if it executes first. The pendingKrfTags structure is
> normally cleared when the .krf file is created, but since compaction canceled
> creation of the .krf, the pendingKrfTags structure remains in memory until
> the Oplog representing the orphaned .drf file is deleted.
> Below it can be seen that the .krf is never created for the orphaned .drf
> Oplog object that has memory allocated in pendingKrfTags:
> {code:java}
> server1.log:1956:[info 2021/11/25 21:52:26.866 CET server1 
>  tid=0x34] Created oplog#129 
> drf for disk store store1.
> server1.log:1958:[info 2021/11/25 21:52:26.867 CET server1 
>  tid=0x34] Created oplog#129 
> crf for disk store store1.
> server1.log:1974:[info 2021/11/25 21:52:39.490 CET server1  store1 for oplog oplog#129> tid=0x5c] OplogCompactor for store1 compaction 
> oplog id(s): oplog#129
> server1.log:1980:[info 2021/11/25 21:52:39.532 CET server1  store1 for oplog oplog#129> tid=0x5c] compaction did 3685 creates and updates 
> in 41 ms
> server1.log:1982:[info 2021/11/25 21:52:39.532 CET server1  Task4> tid=0x5d] Deleted oplog#129 crf for disk store store1.
> {code}
> !screenshot-1.png|width=1123,height=268!
> Below you can see the log and heap dump of an orphaned .drf Oplog that does
> not have pendingKrfTags allocated in memory. This is because pendingKrfTags
> is cleared when the .krf is created, as can be seen in the logs below.
> {code:java}
> server1.log:1976:[info 2021/11/25 21:52:39.491 CET server1 
>  tid=0x34] Created oplog#130 
> drf for disk store store1.
> server1.log:1978:[info 2021/11/25 21:52:39.493 CET server1 
>  tid=0x34] Created oplog#130 
> crf for disk store store1.
> server1.log:1998:[info 2021/11/25 21:52:41.131 CET server1  OplogCompactor> tid=0x5c] Created oplog#130 krf for disk store store1.
> server1.log:2000:[info 2021/11/25 21:52:41.893 CET server1  store1 for oplog oplog#130> tid=0x5c|#130> tid=0x5c] OplogCompactor for 
> store1 compaction oplog id(s): oplog#130
> server1.log:2002:[info 2021/11/25 21:52:41.958 CET server1  store1 for oplog oplog#130> tid=0x5c|#130> tid=0x5c] compaction did 9918 
> creates and updates in 64 ms
> server1.log:2004:[info 2021/11/25 21:52:41.958 CET server1  Task4> tid=0x5d] Deleted oplog#130 crf for disk store store1.
> server1.log:2006:[info 2021/11/25 21:52:41.958 CET server1  Task4> tid=0x5d] Deleted oplog#130 krf for disk store store1.
> {code}
> !screenshot-2.png|width=1123,height=268!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (GEODE-9882) User Guide, Micrometer section, fix product_name typo

2021-12-09 Thread Dave Barnes (Jira)
Dave Barnes created GEODE-9882:
--

 Summary: User Guide, Micrometer section, fix product_name typo
 Key: GEODE-9882
 URL: https://issues.apache.org/jira/browse/GEODE-9882
 Project: Geode
  Issue Type: Bug
  Components: docs
Reporter: Dave Barnes


On page 
https://geode.apache.org/docs/guide/114/tools_modules/micrometer/micrometer-meters.html,
 the product name fails to display due to a typo in the variable syntax. Fix it.

There are other types of meters available in Micrometer, but they are not 
currently being used in .

Should be "used in Apache Geode."

Change `<%vars.product_name%>` to `<%=vars.product_name%>`.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (GEODE-9882) User Guide, Micrometer section, fix product_name typo

2021-12-09 Thread Dave Barnes (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Barnes reassigned GEODE-9882:
--

Assignee: Dave Barnes

> User Guide, Micrometer section, fix product_name typo
> -
>
> Key: GEODE-9882
> URL: https://issues.apache.org/jira/browse/GEODE-9882
> Project: Geode
>  Issue Type: Bug
>  Components: docs
>Reporter: Dave Barnes
>Assignee: Dave Barnes
>Priority: Major
>
> On page 
> https://geode.apache.org/docs/guide/114/tools_modules/micrometer/micrometer-meters.html,
>  the product name fails to display due to a typo in the variable syntax. Fix 
> it.
> There are other types of meters available in Micrometer, but they are not 
> currently being used in .
> Should be "used in Apache Geode."
> Change `<%vars.product_name%>` to `<%=vars.product_name%>`.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (GEODE-9882) User Guide, Micrometer section, fix product_name typo

2021-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated GEODE-9882:
--
Labels: pull-request-available  (was: )

> User Guide, Micrometer section, fix product_name typo
> -
>
> Key: GEODE-9882
> URL: https://issues.apache.org/jira/browse/GEODE-9882
> Project: Geode
>  Issue Type: Bug
>  Components: docs
>Reporter: Dave Barnes
>Assignee: Dave Barnes
>Priority: Major
>  Labels: pull-request-available
>
> On page 
> https://geode.apache.org/docs/guide/114/tools_modules/micrometer/micrometer-meters.html,
>  the product name fails to display due to a typo in the variable syntax. Fix 
> it.
> There are other types of meters available in Micrometer, but they are not 
> currently being used in .
> Should be "used in Apache Geode."
> Change `<%vars.product_name%>` to `<%=vars.product_name%>`.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-9882) User Guide, Micrometer section, fix product_name typo

2021-12-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456679#comment-17456679
 ] 

ASF subversion and git services commented on GEODE-9882:


Commit 3b133c3088a2397c19c935979aa2ab2fd751a765 in geode's branch 
refs/heads/develop from Dave Barnes
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=3b133c3 ]

GEODE-9882: User Guide, Micrometer section, fix product_name typo (#7181)



> User Guide, Micrometer section, fix product_name typo
> -
>
> Key: GEODE-9882
> URL: https://issues.apache.org/jira/browse/GEODE-9882
> Project: Geode
>  Issue Type: Bug
>  Components: docs
>Reporter: Dave Barnes
>Assignee: Dave Barnes
>Priority: Major
>  Labels: pull-request-available
>
> On page 
> https://geode.apache.org/docs/guide/114/tools_modules/micrometer/micrometer-meters.html,
>  the product name fails to display due to a typo in the variable syntax. Fix 
> it.
> There are other types of meters available in Micrometer, but they are not 
> currently being used in .
> Should be "used in Apache Geode."
> Change `<%vars.product_name%>` to `<%=vars.product_name%>`.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-9882) User Guide, Micrometer section, fix product_name typo

2021-12-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456682#comment-17456682
 ] 

ASF subversion and git services commented on GEODE-9882:


Commit 47465165256e076112cfcaaadeb7aa365cb1b29d in geode's branch 
refs/heads/support/1.12 from Dave Barnes
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=4746516 ]

GEODE-9882: User Guide, Micrometer section, fix product_name typo (#7181)



> User Guide, Micrometer section, fix product_name typo
> -
>
> Key: GEODE-9882
> URL: https://issues.apache.org/jira/browse/GEODE-9882
> Project: Geode
>  Issue Type: Bug
>  Components: docs
>Reporter: Dave Barnes
>Assignee: Dave Barnes
>Priority: Major
>  Labels: pull-request-available
>
> On page 
> https://geode.apache.org/docs/guide/114/tools_modules/micrometer/micrometer-meters.html,
>  the product name fails to display due to a typo in the variable syntax. Fix 
> it.
> There are other types of meters available in Micrometer, but they are not 
> currently being used in .
> Should be "used in Apache Geode."
> Change `<%vars.product_name%>` to `<%=vars.product_name%>`.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-9882) User Guide, Micrometer section, fix product_name typo

2021-12-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456683#comment-17456683
 ] 

ASF subversion and git services commented on GEODE-9882:


Commit baacba121f98dcc860bdb954550e6e01c4d9e6e4 in geode's branch 
refs/heads/support/1.13 from Dave Barnes
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=baacba1 ]

GEODE-9882: User Guide, Micrometer section, fix product_name typo (#7181)



> User Guide, Micrometer section, fix product_name typo
> -
>
> Key: GEODE-9882
> URL: https://issues.apache.org/jira/browse/GEODE-9882
> Project: Geode
>  Issue Type: Bug
>  Components: docs
>Reporter: Dave Barnes
>Assignee: Dave Barnes
>Priority: Major
>  Labels: pull-request-available
>
> On page 
> https://geode.apache.org/docs/guide/114/tools_modules/micrometer/micrometer-meters.html,
>  the product name fails to display due to a typo in the variable syntax. Fix 
> it.
> There are other types of meters available in Micrometer, but they are not 
> currently being used in .
> Should be "used in Apache Geode."
> Change `<%vars.product_name%>` to `<%=vars.product_name%>`.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (GEODE-9882) User Guide, Micrometer section, fix product_name typo

2021-12-09 Thread Dave Barnes (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Barnes resolved GEODE-9882.

Fix Version/s: 1.12.6
   1.13.5
   1.14.1
   1.15.0
   Resolution: Fixed

> User Guide, Micrometer section, fix product_name typo
> -
>
> Key: GEODE-9882
> URL: https://issues.apache.org/jira/browse/GEODE-9882
> Project: Geode
>  Issue Type: Bug
>  Components: docs
>Reporter: Dave Barnes
>Assignee: Dave Barnes
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.6, 1.13.5, 1.14.1, 1.15.0
>
>
> On page 
> https://geode.apache.org/docs/guide/114/tools_modules/micrometer/micrometer-meters.html,
>  the product name fails to display due to a typo in the variable syntax. Fix 
> it.
> There are other types of meters available in Micrometer, but they are not 
> currently being used in .
> Should be "used in Apache Geode."
> Change `<%vars.product_name%>` to `<%=vars.product_name%>`.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-9882) User Guide, Micrometer section, fix product_name typo

2021-12-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456685#comment-17456685
 ] 

ASF subversion and git services commented on GEODE-9882:


Commit 6b0413ccdf22d216f9f8d855b6159ecaff29c1ce in geode's branch 
refs/heads/support/1.14 from Dave Barnes
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=6b0413c ]

GEODE-9882: User Guide, Micrometer section, fix product_name typo (#7181)



> User Guide, Micrometer section, fix product_name typo
> -
>
> Key: GEODE-9882
> URL: https://issues.apache.org/jira/browse/GEODE-9882
> Project: Geode
>  Issue Type: Bug
>  Components: docs
>Reporter: Dave Barnes
>Assignee: Dave Barnes
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.6, 1.13.5, 1.14.1, 1.15.0
>
>
> On page 
> https://geode.apache.org/docs/guide/114/tools_modules/micrometer/micrometer-meters.html,
>  the product name fails to display due to a typo in the variable syntax. Fix 
> it.
> There are other types of meters available in Micrometer, but they are not 
> currently being used in .
> Should be "used in Apache Geode."
> Change `<%vars.product_name%>` to `<%=vars.product_name%>`.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (GEODE-9822) Split-brain Certain During Network Partition in Two-Locator Cluster

2021-12-09 Thread Bill Burcham (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Burcham updated GEODE-9822:

Description: 
In a two-locator cluster with default member weights and default setting (true) 
of enable-network-partition-detection, if a long-lived network partition 
separates the two members, a split-brain will arise: there will be two 
coordinators at the same time.

The reason for this can be found in the GMSJoinLeave.isNetworkPartition() 
method. That method's name is misleading. A name like isMajorityLost() would 
probably be more apt. It needs to return true iff the weight of "crashed" 
members (in the prospective view) is greater-than-or-equal-to half (50%) of the 
total weight (of all members in the current view).

What the method actually does is return true iff the weight of "crashed" 
members is greater-than 51% of the total weight. As a result, if we have two 
members of equal weight, and the coordinator sees that the non-coordinator is 
"crashed", the coordinator will keep running. If a network partition is 
happening, and the non-coordinator is still running, then it will become a 
coordinator and start producing views. Now we'll have two coordinators 
producing views concurrently.

For this discussion "crashed" members are members for which the coordinator has 
received a RemoveMemberRequest message. These are members that the failure 
detector has deemed failed. Keep in mind the failure detector is imperfect 
(it's not always right), and that's kind of the whole point of this ticket: 
we've lost contact with the non-coordinator member, but that doesn't mean it 
can't still be running (on the other side of a partition).

This bug is not limited to the two-locator scenario. Any set of members that 
can be partitioned into two equal sets is susceptible. In fact it's even a 
little worse than that. Any set of members that can be partitioned into two 
sets, both of which still have 49% or more of the total weight, will result in 
a split-brain.

  was:
In a two-locator cluster with default member weights and default setting (true) 
of enable-network-partition-detection, if a long-lived network partition 
separates the two members, a split-brain will arise: there will be two 
coordinators at the same time.

The reason for this can be found in the GMSJoinLeave.isNetworkPartition() 
method. That method's name is misleading. A name like isMajorityLost() would 
probably be more apt. It needs to return true iff the weight of "crashed" 
members (in the prospective view) is greater-than-or-equal-to half (50%) of the 
total weight (of all members in the current view).

What the method actually does is return true iff the weight of "crashed" 
members is greater-than 51% of the total weight. As a result, if we have two 
members of equal weight, and the coordinator sees that the non-coordinator is 
"crashed", the coordinator will keep running. If a network partition is 
happening, and the non-coordinator is still running, then it will become a 
coordinator and start producing views. Now we'll have two coordinators 
producing views concurrently.

For this discussion "crashed" members are members for which the coordinator has 
received a RemoveMemberRequest message. These are members that the failure 
detector has deemed failed. Keep in mind the failure detector is imperfect 
(it's not always right), and that's kind of the whole point of this ticket: 
we've lost contact with the non-coordinator member, but that doesn't mean it 
can't still be running (on the other side of a partition).
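
A sketch contrasting the two checks described above (illustrative names and
integer weights; this is not the actual GMSJoinLeave code):

{code:java}
public class MajorityCheckSketch {

  // Behavior as described in this ticket: a partition is declared only when
  // crashed weight exceeds 51% of the total, so an exact 50/50 split passes.
  static boolean isNetworkPartitionAsImplemented(int crashedWeight, int totalWeight) {
    return crashedWeight > totalWeight * 0.51;
  }

  // Intended behavior: the majority is lost as soon as crashed members hold
  // half or more of the total weight.
  static boolean isMajorityLost(int crashedWeight, int totalWeight) {
    return crashedWeight * 2 >= totalWeight;
  }

  public static void main(String[] args) {
    // Two members of equal weight, one on each side of the partition:
    System.out.println(isNetworkPartitionAsImplemented(10, 20)); // false -> both sides keep running
    System.out.println(isMajorityLost(10, 20));                  // true  -> one side shuts down
  }
}
{code}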


> Split-brain Certain During Network Partition in Two-Locator Cluster
> ---
>
> Key: GEODE-9822
> URL: https://issues.apache.org/jira/browse/GEODE-9822
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bill Burcham
>Priority: Major
>  Labels: pull-request-available
>
> In a two-locator cluster with default member weights and default setting 
> (true) of enable-network-partition-detection, if a long-lived network 
> partition separates the two members, a split-brain will arise: there will be 
> two coordinators at the same time.
> The reason for this can be found in the GMSJoinLeave.isNetworkPartition() 
> method. That method's name is misleading. A name like isMajorityLost() would 
> probably be more apt. It needs to return true iff the weight of "crashed" 
> members (in the prospective view) is greater-than-or-equal-to half (50%) of 
> the total weight (of all members in the current view).
> What the method actually does is return true iff the weight of "crashed" 
> members is greater-than 51% of the total weight. As a result, if we have two 
> members of equal weight, and the coordinator sees that the non-coordinator is 
> "crashed", the coordina

[jira] [Updated] (GEODE-9883) Review and Cleanup geode_for_redis.html.md.erb

2021-12-09 Thread Wayne (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wayne updated GEODE-9883:
-
Affects Version/s: 1.15.0

> Review and Cleanup geode_for_redis.html.md.erb
> --
>
> Key: GEODE-9883
> URL: https://issues.apache.org/jira/browse/GEODE-9883
> Project: Geode
>  Issue Type: Task
>  Components: redis
>Affects Versions: 1.15.0
>Reporter: Wayne
>Priority: Major
>
> Looking at what we have in the geode docs, a few things are out of date.
>  * This page still references the geode-for-redis-password, which doesn't
> exist anymore. It should probably talk about how redis interacts with the
> security manager.
>  * It should probably mention how to configure TLS properties for
> geode-for-redis.
>  * The redis-cli command is, I think, missing the -c option to use cluster
> mode.
>  * The list of supported redis commands on that page is incomplete.
>  * The Advantages section doesn't mention synchronous replication.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (GEODE-9883) Review and Cleanup geode_for_redis.html.md.erb

2021-12-09 Thread Wayne (Jira)
Wayne created GEODE-9883:


 Summary: Review and Cleanup geode_for_redis.html.md.erb
 Key: GEODE-9883
 URL: https://issues.apache.org/jira/browse/GEODE-9883
 Project: Geode
  Issue Type: Task
  Components: redis
Reporter: Wayne


Looking at what we have in the geode docs, a few things are out of date.
 * This page still references the geode-for-redis-password, which doesn't exist
anymore. It should probably talk about how redis interacts with the security
manager.

 * It should probably mention how to configure TLS properties for geode-for-redis.

 * The redis-cli command is, I think, missing the -c option to use cluster mode.

 * The list of supported redis commands on that page is incomplete.

 * The Advantages section doesn't mention synchronous replication.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (GEODE-9883) Review and Cleanup geode_for_redis.html.md.erb

2021-12-09 Thread Wayne (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wayne updated GEODE-9883:
-
Labels: release-blocker  (was: )

> Review and Cleanup geode_for_redis.html.md.erb
> --
>
> Key: GEODE-9883
> URL: https://issues.apache.org/jira/browse/GEODE-9883
> Project: Geode
>  Issue Type: Task
>  Components: redis
>Affects Versions: 1.15.0
>Reporter: Wayne
>Priority: Major
>  Labels: release-blocker
>
> Looking at what we have in the geode docs, a few things are out of date.
>  * This page still references the geode-for-redis-password, which doesn't
> exist anymore. It should probably talk about how redis interacts with the
> security manager.
>  * It should probably mention how to configure TLS properties for
> geode-for-redis.
>  * The redis-cli command is, I think, missing the -c option to use cluster
> mode.
>  * The list of supported redis commands on that page is incomplete.
>  * The Advantages section doesn't mention synchronous replication.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (GEODE-9880) Cluster with multiple locators in an environment with no host name resolution, leads to null pointer exception

2021-12-09 Thread Ernest Burghardt (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ernest Burghardt updated GEODE-9880:

Labels: membership  (was: )

> Cluster with multiple locators in an environment with no host name 
> resolution, leads to null pointer exception
> --
>
> Key: GEODE-9880
> URL: https://issues.apache.org/jira/browse/GEODE-9880
> Project: Geode
>  Issue Type: Bug
>  Components: locator
>Affects Versions: 1.12.5
>Reporter: Tigran Ghahramanyan
>Priority: Major
>  Labels: membership
>
> In our use case we have two locators that are initially configured with IP
> addresses, but the _AutoConnectionSourceImpl.UpdateLocatorList()_ flow keeps
> adding their corresponding host names to the locator list, even though these
> host names are not resolvable.
> Later, in {_}AutoConnectionSourceImpl.queryLocators(){_}, whenever a client
> tries to use such a non-resolvable host name to connect to a locator, it
> tries to establish a connection to {_}socketaddr=0.0.0.0{_}, as written in
> {_}SocketCreator.connect(){_}, which seems strange.
> Then, if there is no locator running on the same host, the next locator in
> the list is contacted, until a locator contact configured with an IP address
> is reached - which eventually succeeds.
> But when there happens to be a locator listening on the same host, we get a
> null pointer exception in the second line below, because _inetadd=null_:
> _socket.connect(sockaddr, Math.max(timeout, 0)); // sockaddr=0.0.0.0,
> connects to a locator listening on the same host_
> _configureClientSSLSocket(socket, inetadd.getHostName(), timeout); // inetadd
> = null_
>  
> As a result, the cluster comes to a failed state, unable to recover.
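
A hedged sketch of guarding against the unresolved-address case with standard
java.net APIs (the host and port are hypothetical; this is not the actual
SocketCreator code):

{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class UnresolvedAddressGuardSketch {
  public static void main(String[] args) throws IOException {
    InetSocketAddress sockaddr = new InetSocketAddress("no-such-host", 10334);
    // An unresolved InetSocketAddress carries a null InetAddress; passing it
    // along and later calling inetadd.getHostName() is what produces the NPE
    // described above, so fail fast instead.
    if (sockaddr.isUnresolved()) {
      throw new IOException("cannot resolve locator host: " + sockaddr.getHostString());
    }
    try (Socket socket = new Socket()) {
      socket.connect(sockaddr, 5000);
    }
  }
}
{code}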



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (GEODE-9884) update CI max_in_flight limits

2021-12-09 Thread Owen Nichols (Jira)
Owen Nichols created GEODE-9884:
---

 Summary: update CI max_in_flight limits
 Key: GEODE-9884
 URL: https://issues.apache.org/jira/browse/GEODE-9884
 Project: Geode
  Issue Type: Improvement
  Components: ci
Reporter: Owen Nichols


max_in_flight limits are set on the main CI pipeline to avoid overloading
Concourse when a large number of commits come through at the same time.

These limits were last calculated a few years ago based on the average time
each job takes; many jobs now take much longer, so the limits should be
recalculated.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (GEODE-9738) CI failure: RollingUpgradeRollServersOnReplicatedRegion_dataserializable failed with DistributedSystemDisconnectedException

2021-12-09 Thread Ernest Burghardt (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ernest Burghardt reassigned GEODE-9738:
---

Assignee: (was: Bill Burcham)

> CI failure: RollingUpgradeRollServersOnReplicatedRegion_dataserializable 
> failed with DistributedSystemDisconnectedException
> ---
>
> Key: GEODE-9738
> URL: https://issues.apache.org/jira/browse/GEODE-9738
> Project: Geode
>  Issue Type: Bug
>  Components: membership, messaging
>Affects Versions: 1.15.0
>Reporter: Kamilla Aslami
>Priority: Major
>  Labels: needsTriage
> Attachments: GEODE-9738-short.log.all, controller.log, locator.log, 
> vm0.log, vm1.log, vm2.log, vm3.log
>
>
> {noformat}
> RollingUpgradeRollServersOnReplicatedRegion_dataserializable > 
> testRollServersOnReplicatedRegion_dataserializable[from_v1.13.4] FAILED
> java.lang.AssertionError: Suspicious strings were written to the log 
> during this run.
> Fix the strings or use IgnoredException.addIgnoredException to ignore.
> ---
> Found suspect string in 'dunit_suspect-vm2.log' at line 685[fatal 
> 2021/10/14 00:24:14.739 UTC  tid=115] Uncaught exception 
> in thread Thread[FederatingManager6,5,RMI Runtime]
> org.apache.geode.management.ManagementException: 
> org.apache.geode.distributed.DistributedSystemDisconnectedException: 
> Distribution manager on 
> heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm2:580278:locator):54751
>  started at Thu Oct 14 00:23:52 UTC 2021: Message distribution has terminated
> at 
> org.apache.geode.management.internal.FederatingManager.addMemberArtifacts(FederatingManager.java:486)
> at 
> org.apache.geode.management.internal.FederatingManager$AddMemberTask.call(FederatingManager.java:596)
> at 
> org.apache.geode.management.internal.FederatingManager.lambda$addMember$1(FederatingManager.java:199)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: 
> org.apache.geode.distributed.DistributedSystemDisconnectedException: 
> Distribution manager on 
> heavy-lifter-10ae5f9d-2528-5e02-b707-d968eb54d50a(vm2:580278:locator):54751
>  started at Thu Oct 14 00:23:52 UTC 2021: Message distribution has terminated
> at 
> org.apache.geode.distributed.internal.ClusterDistributionManager$Stopper.generateCancelledException(ClusterDistributionManager.java:2885)
> at 
> org.apache.geode.distributed.internal.InternalDistributedSystem$Stopper.generateCancelledException(InternalDistributedSystem.java:1177)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl$Stopper.generateCancelledException(GemFireCacheImpl.java:5212)
> at 
> org.apache.geode.CancelCriterion.checkCancelInProgress(CancelCriterion.java:83)
> at 
> org.apache.geode.internal.cache.CreateRegionProcessor.initializeRegion(CreateRegionProcessor.java:121)
> at 
> org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1164)
> at 
> org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1095)
> at 
> org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3108)
> at 
> org.apache.geode.internal.cache.InternalRegionFactory.create(InternalRegionFactory.java:78)
> at 
> org.apache.geode.management.internal.FederatingManager.addMemberArtifacts(FederatingManager.java:429)
> ... 5 more
> at org.junit.Assert.fail(Assert.java:89)
> at 
> org.apache.geode.test.dunit.internal.DUnitLauncher.closeAndCheckForSuspects(DUnitLauncher.java:420)
> at 
> org.apache.geode.test.dunit.internal.DUnitLauncher.closeAndCheckForSuspects(DUnitLauncher.java:436)
> at 
> org.apache.geode.test.dunit.internal.JUnit4DistributedTestCase.cleanupAllVms(JUnit4DistributedTestCase.java:551)
> at 
> org.apache.geode.test.dunit.internal.JUnit4DistributedTestCase.doTearDownDistributedTestCase(JUnit4DistributedTestCase.java:498)
> at 
> org.apache.geode.test.dunit.internal.JUnit4DistributedTestCase.tearDownDistributedTestCase(JUnit4DistributedTestCase.java:481)
> at jdk.internal.reflect.GeneratedMethodAccessor11.invoke(Unknown 
> Source)
> at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:566)
> at 
> org.junit.runners.m

[jira] [Assigned] (GEODE-9877) GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse failed

2021-12-09 Thread Jens Deppe (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jens Deppe reassigned GEODE-9877:
-

Assignee: Jens Deppe

> GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse failed
> --
>
> Key: GEODE-9877
> URL: https://issues.apache.org/jira/browse/GEODE-9877
> Project: Geode
>  Issue Type: Bug
>  Components: redis
>Affects Versions: 1.15.0
>Reporter: Mark Hanson
>Assignee: Jens Deppe
>Priority: Major
>  Labels: pull-request-available
>
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/acceptance-test-openjdk8/builds/43]
>  failed with 
> GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse
> {noformat}
> java.net.BindException: Address already in use (Bind failed)
>   at java.net.PlainSocketImpl.socketBind(Native Method)
>   at 
> java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
>   at java.net.Socket.bind(Socket.java:662)
>   at 
> org.apache.geode.redis.GeodeRedisServerStartupDUnitTest.startupFailsGivenPortAlreadyInUse(GeodeRedisServerStartupDUnitTest.java:115)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.geode.test.dunit.rules.ClusterStartupRule$1.evaluate(ClusterStartupRule.java:138)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>   at 
> org.apache.geode.test.junit.rules.DescribedExternalResource$1.evaluate(DescribedExternalResource.java:40)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
>   at 
> org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43)
>   at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
>   at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at java.util.Iterator.forEachRemaining(Iterator.java:116)
>   at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
>   at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
>   at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
>   at 
> org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:82)
>   at 
> org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:73)
>   at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:108)
>   at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88)
>   at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54)
>   a

[jira] [Commented] (GEODE-9877) GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse failed

2021-12-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456712#comment-17456712
 ] 

ASF subversion and git services commented on GEODE-9877:


Commit 310c647da6ee4cc4a1eadc6df174d998e69afb31 in geode's branch 
refs/heads/develop from Jens Deppe
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=310c647 ]

GEODE-9877: Use ServerSocket to create interfering port (#7180)

- For some unknown reason `startupFailsGivenPortAlreadyInUse` started to
  fail after a seemingly innocuous Ubuntu base image bump. The problem
  may also have been triggered by arbitrary test ordering changes since
  the test did not fail on its own, but only in conjunction with running
  other tests beforehand.
  Specifically, the test was failing when binding the interfering port
  (bind exception). The port used was always in the TIME_WAIT state left
  from previous tests.
  Using a `ServerSocket`, instead of a regular socket, fixes the problem
  since it actually 'uses' the port and implicitly allows for port
  reuse.

- Use ServerSocket consistently. Rename test to be more appropriate
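
A minimal sketch of the approach the commit describes, occupying the port with
a reusable ServerSocket (the port number is hypothetical):

{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class InterferingPortSketch {
  public static void main(String[] args) throws IOException {
    int port = 6379; // hypothetical port for the server under test
    try (ServerSocket interfering = new ServerSocket()) {
      // SO_REUSEADDR lets this bind succeed even if the port is lingering in
      // TIME_WAIT from an earlier test, and the listening socket genuinely
      // occupies the port so the server under test must fail to start.
      interfering.setReuseAddress(true);
      interfering.bind(new InetSocketAddress("localhost", port));
      // ... start the redis server here and assert that startup fails ...
    }
  }
}
{code}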

> GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse failed
> --
>
> Key: GEODE-9877
> URL: https://issues.apache.org/jira/browse/GEODE-9877
> Project: Geode
>  Issue Type: Bug
>  Components: redis
>Affects Versions: 1.15.0
>Reporter: Mark Hanson
>Priority: Major
>  Labels: pull-request-available
>
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/acceptance-test-openjdk8/builds/43]
>  failed with 
> GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse
> {noformat}
> java.net.BindException: Address already in use (Bind failed)
>   at java.net.PlainSocketImpl.socketBind(Native Method)
>   at 
> java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
>   at java.net.Socket.bind(Socket.java:662)
>   at 
> org.apache.geode.redis.GeodeRedisServerStartupDUnitTest.startupFailsGivenPortAlreadyInUse(GeodeRedisServerStartupDUnitTest.java:115)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.geode.test.dunit.rules.ClusterStartupRule$1.evaluate(ClusterStartupRule.java:138)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>   at 
> org.apache.geode.test.junit.rules.DescribedExternalResource$1.evaluate(DescribedExternalResource.java:40)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
>   at 
> org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43)
>   at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
>   at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at java.util.Iterator.forEachRemaining(Iterator.java:116)
>   at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCo

[jira] [Resolved] (GEODE-9877) GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse failed

2021-12-09 Thread Jens Deppe (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jens Deppe resolved GEODE-9877.
---
Fix Version/s: 1.15.0
   Resolution: Fixed

> GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse failed
> --
>
> Key: GEODE-9877
> URL: https://issues.apache.org/jira/browse/GEODE-9877
> Project: Geode
>  Issue Type: Bug
>  Components: redis
>Affects Versions: 1.15.0
>Reporter: Mark Hanson
>Assignee: Jens Deppe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/acceptance-test-openjdk8/builds/43]
>  failed with 
> GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse
> {noformat}
> java.net.BindException: Address already in use (Bind failed)
>   at java.net.PlainSocketImpl.socketBind(Native Method)
>   at 
> java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
>   at java.net.Socket.bind(Socket.java:662)
>   at 
> org.apache.geode.redis.GeodeRedisServerStartupDUnitTest.startupFailsGivenPortAlreadyInUse(GeodeRedisServerStartupDUnitTest.java:115)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.geode.test.dunit.rules.ClusterStartupRule$1.evaluate(ClusterStartupRule.java:138)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>   at 
> org.apache.geode.test.junit.rules.DescribedExternalResource$1.evaluate(DescribedExternalResource.java:40)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
>   at 
> org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43)
>   at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
>   at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at java.util.Iterator.forEachRemaining(Iterator.java:116)
>   at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
>   at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
>   at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
>   at 
> org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:82)
>   at 
> org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:73)
>   at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:108)
>   at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88)
>   at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54)

[jira] [Commented] (GEODE-9885) StringsDUnitTest.givenBucketsMoveDuringAppend_thenDataIsNotLost fails with duplicated append

2021-12-09 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456713#comment-17456713
 ] 

Geode Integration commented on GEODE-9885:
--

Seen in [distributed-test-openjdk11 
#47|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/distributed-test-openjdk11/builds/47]
 ... see [test 
results|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0723/test-results/distributedTest/1639077554/]
 or download 
[artifacts|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0723/test-artifacts/1639077554/distributedtestfiles-openjdk11-1.15.0-build.0723.tgz].

> StringsDUnitTest.givenBucketsMoveDuringAppend_thenDataIsNotLost fails with 
> duplicated append
> 
>
> Key: GEODE-9885
> URL: https://issues.apache.org/jira/browse/GEODE-9885
> Project: Geode
>  Issue Type: Bug
>  Components: redis
>Affects Versions: 1.15.0
>Reporter: Ray Ingles
>Priority: Major
>
> The test appends a large number of strings to a key. It wound up adding (at 
> least one) extra string to the stored value:
>  
> {{java.util.concurrent.ExecutionException: java.lang.AssertionError: 
> unexpected -\{append0}-key-3-27680- at index 27681 iterationCount=61995 in 
> string}}
>  
> The string "\{append0}-key-3-27680-" appeared twice in sequence.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (GEODE-9885) StringsDUnitTest.givenBucketsMoveDuringAppend_thenDataIsNotLost fails with duplicated append

2021-12-09 Thread Ray Ingles (Jira)
Ray Ingles created GEODE-9885:
-

 Summary: 
StringsDUnitTest.givenBucketsMoveDuringAppend_thenDataIsNotLost fails with 
duplicated append
 Key: GEODE-9885
 URL: https://issues.apache.org/jira/browse/GEODE-9885
 Project: Geode
  Issue Type: Bug
  Components: redis
Affects Versions: 1.15.0
Reporter: Ray Ingles


The test appends a large number of strings to a key. It wound up adding (at 
least one) extra string to the stored value:

 

{{java.util.concurrent.ExecutionException: java.lang.AssertionError: unexpected 
-\{append0}-key-3-27680- at index 27681 iterationCount=61995 in string}}

 

The string "\{append0}-key-3-27680-" appeared twice in sequence.
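
For context, a rough sketch of the test's append pattern using the Jedis 
client. This is a hedged approximation, not the actual test code: the host, 
port, iteration count, and class name are placeholders, and the key and token 
formats are inferred from the assertion message above.

{code:java}
import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;

// Hedged sketch of the test's shape: repeatedly APPEND unique tokens to one
// key, then check that each token occurs exactly once. A client retry after
// a bucket move can apply the same APPEND twice, producing the duplicate
// reported above.
final class AppendPatternSketch {
  public static void main(String[] args) throws Exception {
    try (JedisCluster jedis = new JedisCluster(new HostAndPort("127.0.0.1", 6379))) {
      String key = "{append0}-key-3"; // the {append0} hash tag pins the slot
      for (int i = 0; i < 30000; i++) {
        jedis.append(key, "-" + key + "-" + i + "-");
      }
      // The failing assertion scans jedis.get(key) for each token
      // "-{append0}-key-3-<i>-" and found one of them twice in sequence.
    }
  }
}
{code}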



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (GEODE-9872) DistTXPersistentDebugDUnitTest tests fail because "cluster configuration service not available"

2021-12-09 Thread Dale Emery (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Emery resolved GEODE-9872.
---
Fix Version/s: 1.15.0
   Resolution: Fixed

> DistTXPersistentDebugDUnitTest tests fail because "cluster configuration 
> service not available"
> ---
>
> Key: GEODE-9872
> URL: https://issues.apache.org/jira/browse/GEODE-9872
> Project: Geode
>  Issue Type: Bug
>  Components: tests
>Reporter: Bill Burcham
>Assignee: Dale Emery
>Priority: Major
>  Labels: GeodeOperationAPI, pull-request-available
> Fix For: 1.15.0
>
>
> I suspect this failure is due to something in the test framework, or perhaps 
> one or more tests failing to manage ports correctly, allowing two or more 
> tests to interfere with one another.
> In this distributed test: 
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/388]
>  we see two failures. Here's the first full stack trace:
>  
>  
> {code:java}
> [error 2021/12/04 20:40:53.796 UTC  
> tid=33] org.apache.geode.GemFireConfigException: cluster configuration 
> service not available
> at 
> org.junit.vintage.engine.execution.TestRun.getStoredResultOrSuccessful(TestRun.java:196)
> at 
> org.junit.vintage.engine.execution.RunListenerAdapter.fireExecutionFinished(RunListenerAdapter.java:226)
> at 
> org.junit.vintage.engine.execution.RunListenerAdapter.testFinished(RunListenerAdapter.java:192)
> at 
> org.junit.vintage.engine.execution.RunListenerAdapter.testFinished(RunListenerAdapter.java:79)
> at 
> org.junit.runner.notification.SynchronizedRunListener.testFinished(SynchronizedRunListener.java:87)
> at 
> org.junit.runner.notification.RunNotifier$9.notifyListener(RunNotifier.java:225)
> at 
> org.junit.runner.notification.RunNotifier$SafeNotifier.run(RunNotifier.java:72)
> at 
> org.junit.runner.notification.RunNotifier.fireTestFinished(RunNotifier.java:222)
> at 
> org.junit.internal.runners.model.EachTestNotifier.fireTestFinished(EachTestNotifier.java:38)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:372)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
> at 
> org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43)
> at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
> at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
> at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
> at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
> at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
> at 
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at 
> java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
> at 
> org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:82)
> at 
> org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:73)
> at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:108)
> at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88)
> at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54)
> at 
> org.junit.p

[jira] [Resolved] (GEODE-9622) CI Failure: ClientServerTransactionFailoverWithMixedVersionServersDistributedTest fails with BindException

2021-12-09 Thread Dale Emery (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Emery resolved GEODE-9622.
---
Fix Version/s: 1.15.0
   Resolution: Fixed

> CI Failure: 
> ClientServerTransactionFailoverWithMixedVersionServersDistributedTest fails 
> with BindException
> --
>
> Key: GEODE-9622
> URL: https://issues.apache.org/jira/browse/GEODE-9622
> Project: Geode
>  Issue Type: Bug
>  Components: client/server, tests
>Reporter: Kirk Lund
>Assignee: Dale Emery
>Priority: Major
>  Labels: GeodeOperationAPI, pull-request-available
> Fix For: 1.15.0
>
>
> {noformat}
> org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest
>  > clientTransactionOperationsAreNotLostIfTransactionIsOnRolledServer FAILED
> org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest$$Lambda$68/194483270.call
>  in VM 5 running on Host 
> heavy-lifter-2b8702f1-fe32-5895-9b14-832b7049b607.c.apachegeode-ci.internal 
> with 6 VMs
> at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:631)
> at org.apache.geode.test.dunit.VM.invoke(VM.java:473)
> at 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.rollLocatorToCurrent(ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.java:216)
> at 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.setupPartiallyRolledVersion(ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.java:171)
> at 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.clientTransactionOperationsAreNotLostIfTransactionIsOnRolledServer(ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.java:127)
> Caused by:
> java.net.BindException: Failed to create server socket on 
> heavy-lifter-2b8702f1-fe32-5895-9b14-832b7049b607.c.apachegeode-ci.internal/10.0.0.60[43535]
> at 
> org.apache.geode.distributed.internal.tcpserver.ClusterSocketCreatorImpl.createServerSocket(ClusterSocketCreatorImpl.java:75)
> at 
> org.apache.geode.internal.net.SCClusterSocketCreator.createServerSocket(SCClusterSocketCreator.java:55)
> at 
> org.apache.geode.distributed.internal.tcpserver.ClusterSocketCreatorImpl.createServerSocket(ClusterSocketCreatorImpl.java:54)
> at 
> org.apache.geode.distributed.internal.tcpserver.TcpServer.initializeServerSocket(TcpServer.java:196)
> at 
> org.apache.geode.distributed.internal.tcpserver.TcpServer.startServerThread(TcpServer.java:183)
> at 
> org.apache.geode.distributed.internal.tcpserver.TcpServer.start(TcpServer.java:178)
> at 
> org.apache.geode.distributed.internal.membership.gms.locator.MembershipLocatorImpl.start(MembershipLocatorImpl.java:112)
> at 
> org.apache.geode.distributed.internal.InternalLocator.startPeerLocation(InternalLocator.java:653)
> at 
> org.apache.geode.distributed.internal.InternalLocator.startLocator(InternalLocator.java:394)
> at 
> org.apache.geode.distributed.internal.InternalLocator.startLocator(InternalLocator.java:345)
> at 
> org.apache.geode.distributed.Locator.startLocator(Locator.java:261)
> at 
> org.apache.geode.distributed.Locator.startLocatorAndDS(Locator.java:207)
> at 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.startLocator(ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.java:180)
> at 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.lambda$rollLocatorToCurrent$92d5d92a$1(ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.java:216)
> Caused by:
> java.net.BindException: Address already in use (Bind failed)
> at java.net.PlainSocketImpl.socketBind(Native Method)
> at 
> java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
> at java.net.ServerSocket.bind(ServerSocket.java:390)
> at 
> org.apache.geode.distributed.internal.tcpserver.ClusterSocketCreatorImpl.createServerSocket(ClusterSocketCreatorImpl.java:72)
> ... 13 more
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (GEODE-9622) CI Failure: ClientServerTransactionFailoverWithMixedVersionServersDistributedTest fails with BindException

2021-12-09 Thread Dale Emery (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Emery closed GEODE-9622.
-

> CI Failure: 
> ClientServerTransactionFailoverWithMixedVersionServersDistributedTest fails 
> with BindException
> --
>
> Key: GEODE-9622
> URL: https://issues.apache.org/jira/browse/GEODE-9622
> Project: Geode
>  Issue Type: Bug
>  Components: client/server, tests
>Reporter: Kirk Lund
>Assignee: Dale Emery
>Priority: Major
>  Labels: GeodeOperationAPI, pull-request-available
> Fix For: 1.15.0
>
>
> {noformat}
> org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest
>  > clientTransactionOperationsAreNotLostIfTransactionIsOnRolledServer FAILED
> org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest$$Lambda$68/194483270.call
>  in VM 5 running on Host 
> heavy-lifter-2b8702f1-fe32-5895-9b14-832b7049b607.c.apachegeode-ci.internal 
> with 6 VMs
> at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:631)
> at org.apache.geode.test.dunit.VM.invoke(VM.java:473)
> at 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.rollLocatorToCurrent(ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.java:216)
> at 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.setupPartiallyRolledVersion(ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.java:171)
> at 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.clientTransactionOperationsAreNotLostIfTransactionIsOnRolledServer(ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.java:127)
> Caused by:
> java.net.BindException: Failed to create server socket on 
> heavy-lifter-2b8702f1-fe32-5895-9b14-832b7049b607.c.apachegeode-ci.internal/10.0.0.60[43535]
> at 
> org.apache.geode.distributed.internal.tcpserver.ClusterSocketCreatorImpl.createServerSocket(ClusterSocketCreatorImpl.java:75)
> at 
> org.apache.geode.internal.net.SCClusterSocketCreator.createServerSocket(SCClusterSocketCreator.java:55)
> at 
> org.apache.geode.distributed.internal.tcpserver.ClusterSocketCreatorImpl.createServerSocket(ClusterSocketCreatorImpl.java:54)
> at 
> org.apache.geode.distributed.internal.tcpserver.TcpServer.initializeServerSocket(TcpServer.java:196)
> at 
> org.apache.geode.distributed.internal.tcpserver.TcpServer.startServerThread(TcpServer.java:183)
> at 
> org.apache.geode.distributed.internal.tcpserver.TcpServer.start(TcpServer.java:178)
> at 
> org.apache.geode.distributed.internal.membership.gms.locator.MembershipLocatorImpl.start(MembershipLocatorImpl.java:112)
> at 
> org.apache.geode.distributed.internal.InternalLocator.startPeerLocation(InternalLocator.java:653)
> at 
> org.apache.geode.distributed.internal.InternalLocator.startLocator(InternalLocator.java:394)
> at 
> org.apache.geode.distributed.internal.InternalLocator.startLocator(InternalLocator.java:345)
> at 
> org.apache.geode.distributed.Locator.startLocator(Locator.java:261)
> at 
> org.apache.geode.distributed.Locator.startLocatorAndDS(Locator.java:207)
> at 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.startLocator(ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.java:180)
> at 
> org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.lambda$rollLocatorToCurrent$92d5d92a$1(ClientServerTransactionFailoverWithMixedVersionServersDistributedTest.java:216)
> Caused by:
> java.net.BindException: Address already in use (Bind failed)
> at java.net.PlainSocketImpl.socketBind(Native Method)
> at 
> java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
> at java.net.ServerSocket.bind(ServerSocket.java:390)
> at 
> org.apache.geode.distributed.internal.tcpserver.ClusterSocketCreatorImpl.createServerSocket(ClusterSocketCreatorImpl.java:72)
> ... 13 more
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (GEODE-9872) DistTXPersistentDebugDUnitTest tests fail because "cluster configuration service not available"

2021-12-09 Thread Dale Emery (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dale Emery closed GEODE-9872.
-

> DistTXPersistentDebugDUnitTest tests fail because "cluster configuration 
> service not available"
> ---
>
> Key: GEODE-9872
> URL: https://issues.apache.org/jira/browse/GEODE-9872
> Project: Geode
>  Issue Type: Bug
>  Components: tests
>Reporter: Bill Burcham
>Assignee: Dale Emery
>Priority: Major
>  Labels: GeodeOperationAPI, pull-request-available
> Fix For: 1.15.0
>
>
> I suspect this failure is due to something in the test framework, or perhaps 
> one or more tests failing to manage ports correctly, allowing two or more 
> tests to interfere with one another.
> In this distributed test: 
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/388]
>  we see two failures. Here's the first full stack trace:
>  
>  
> {code:java}
> [error 2021/12/04 20:40:53.796 UTC  
> tid=33] org.apache.geode.GemFireConfigException: cluster configuration 
> service not available
> at 
> org.junit.vintage.engine.execution.TestRun.getStoredResultOrSuccessful(TestRun.java:196)
> at 
> org.junit.vintage.engine.execution.RunListenerAdapter.fireExecutionFinished(RunListenerAdapter.java:226)
> at 
> org.junit.vintage.engine.execution.RunListenerAdapter.testFinished(RunListenerAdapter.java:192)
> at 
> org.junit.vintage.engine.execution.RunListenerAdapter.testFinished(RunListenerAdapter.java:79)
> at 
> org.junit.runner.notification.SynchronizedRunListener.testFinished(SynchronizedRunListener.java:87)
> at 
> org.junit.runner.notification.RunNotifier$9.notifyListener(RunNotifier.java:225)
> at 
> org.junit.runner.notification.RunNotifier$SafeNotifier.run(RunNotifier.java:72)
> at 
> org.junit.runner.notification.RunNotifier.fireTestFinished(RunNotifier.java:222)
> at 
> org.junit.internal.runners.model.EachTestNotifier.fireTestFinished(EachTestNotifier.java:38)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:372)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
> at 
> org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43)
> at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
> at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
> at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
> at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
> at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
> at 
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at 
> java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
> at 
> org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:82)
> at 
> org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:73)
> at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:108)
> at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88)
> at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54)
> at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.withI

[jira] [Created] (GEODE-9886) Remove(key) needs to return bool

2021-12-09 Thread Michael Martell (Jira)
Michael Martell created GEODE-9886:
--

 Summary: Remove(key) needs to return bool
 Key: GEODE-9886
 URL: https://issues.apache.org/jira/browse/GEODE-9886
 Project: Geode
  Issue Type: Bug
  Components: native client
Reporter: Michael Martell


The Remove(key) API in c-bindings needs to return a bool. Currently it returns 
void.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (GEODE-9883) Review and Cleanup geode_for_redis.html.md.erb

2021-12-09 Thread Donal Evans (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donal Evans updated GEODE-9883:
---
Labels: blocks-1.15.0  (was: release-blocker)

> Review and Cleanup geode_for_redis.html.md.erb
> --
>
> Key: GEODE-9883
> URL: https://issues.apache.org/jira/browse/GEODE-9883
> Project: Geode
>  Issue Type: Task
>  Components: redis
>Affects Versions: 1.15.0
>Reporter: Wayne
>Priority: Major
>  Labels: blocks-1.15.0
>
> Looking at what we have in the geode docs, a few things are out of date:
>  * This page still references the geode-for-redis-password, which doesn't 
> exist any more. It should probably describe how redis interacts with the 
> security manager instead.
>  * It should probably also mention how to configure TLS properties for 
> geode-for-redis.
>  * The redis-cli command is, I think, missing the -c option to use cluster 
> mode.
>  * The list of supported redis commands on that page is incomplete.
>  * The Advantages section doesn't mention synchronous replication.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-7876) OldFreeListOffHeapRegionJUnitTest testPersistentChangeFromHeapToOffHeap

2021-12-09 Thread Geode Integration (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456742#comment-17456742
 ] 

Geode Integration commented on GEODE-7876:
--

Seen in [integration-test-openjdk8 
#52|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/integration-test-openjdk8/builds/52]
 ... see [test 
results|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0728/test-results/integrationTest/1639079873/]
 or download 
[artifacts|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0728/test-artifacts/1639079873/integrationtestfiles-openjdk8-1.15.0-build.0728.tgz].

> OldFreeListOffHeapRegionJUnitTest testPersistentChangeFromHeapToOffHeap
> ---
>
> Key: GEODE-7876
> URL: https://issues.apache.org/jira/browse/GEODE-7876
> Project: Geode
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 1.12.0
>Reporter: Mark Hanson
>Priority: Major
>  Labels: GeodeOperationAPI, flaky
>
> CI Failure 
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/IntegrationTestOpenJDK11/builds/1587
>  
> Test worker" #27 prio=5 os_prio=0 cpu=7943.09ms elapsed=1126.77s 
> tid=0x7f60e0b7e000 nid=0x19 in Object.wait()  [0x7f60a2a4b000]
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
>   at java.lang.Object.wait(java.base@11.0.6/Native Method)
>   - waiting on 
>   at 
> org.apache.geode.internal.cache.DiskStoreImpl.waitForBackgroundTasks(DiskStoreImpl.java:2630)
>   - waiting to re-lock in wait() <0xffbb6438> (a 
> java.util.concurrent.atomic.AtomicInteger)
>   at 
> org.apache.geode.internal.cache.DiskStoreImpl.close(DiskStoreImpl.java:2386)
>   at 
> org.apache.geode.internal.cache.DiskStoreImpl.close(DiskStoreImpl.java:2296)
>   at 
> org.apache.geode.internal.cache.GemFireCacheImpl.closeDiskStores(GemFireCacheImpl.java:2476)
>   at 
> org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2234)
>   - locked <0xd0a42a00> (a java.lang.Class for 
> org.apache.geode.internal.cache.GemFireCacheImpl)
>   at 
> org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1931)
>   at 
> org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1921)
>   at 
> org.apache.geode.internal.offheap.OffHeapRegionBase.closeCache(OffHeapRegionBase.java:106)
>   at 
> org.apache.geode.internal.offheap.OffHeapRegionBase.testPersistentChangeFromHeapToOffHeap(OffHeapRegionBase.java:675)
>   at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.6/Native 
> Method)
>   at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.6/NativeMethodAccessorImpl.java:62)
>   at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.6/DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(java.base@11.0.6/Method.java:566)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>   at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>   at 
> org.gradle.api.internal.tasks.testing.S

[jira] [Commented] (GEODE-9814) Add an example of geode-for-redis to the geode examples project

2021-12-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456773#comment-17456773
 ] 

ASF GitHub Bot commented on GEODE-9814:
---

davebarnes97 commented on pull request #110:
URL: https://github.com/apache/geode-examples/pull/110#issuecomment-990336503


   > > > However, ../gradlew run failed ...
   > > 
   > > 
   > > Guessing that I might be missing prerequisites, I installed redis 
according to the quick-start instructions referenced in the README. The 
`../gradlew run` command still failed in the same way.
   > > Of interest, the optional `redis-cli` commands suggested in the README 
worked like a champ.
   > 
   > @davebarnes97 Thanks for this Dave. If it's not too much trouble, could 
you try changing the `SORTED_SET_KEY` constant in geodeForRedis/Example.java to 
be `SORTED_SET_KEY = "{tag}leaderboard";` and try running again? I'm not able 
to reproduce this failure on my machine, so it's difficult to tell what might 
fix it.
   
   Per your offline suggestion, I re-ran the example using a version of Geode 
built from the develop branch. All good. I'll approve the README.md file 
portion of the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@geode.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add an example of geode-for-redis to the geode examples project
> ---
>
> Key: GEODE-9814
> URL: https://issues.apache.org/jira/browse/GEODE-9814
> Project: Geode
>  Issue Type: Improvement
>  Components: redis
>Reporter: Dan Smith
>Assignee: Donal Evans
>Priority: Major
>  Labels: pull-request-available
>
> Add an example to the geode-examples project/repo demonstrating how to turn 
> on and use geode-for-redis.
> This is just a script. The user must download native Redis to get the 
> command line tool.
> Cluster Mode must be used.
> Start Server with gfsh.
> Use JedisCluster client to:
>  * Perform Sets
>  * Perform Gets
> Have a readme that speaks to using native Redis.
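
A minimal sketch of the flow the ticket asks the example to demonstrate. The 
host, port, key, and class name below are placeholders; the actual example in 
the geode-examples repo differs.

{code:java}
import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;

// Hedged sketch: connect a JedisCluster client to a geode-for-redis server
// (started with gfsh) and perform a set and a get.
public class GeodeForRedisSketch {
  public static void main(String[] args) throws Exception {
    try (JedisCluster jedis = new JedisCluster(new HostAndPort("127.0.0.1", 6379))) {
      jedis.set("hello", "world");            // Perform Sets
      System.out.println(jedis.get("hello")); // Perform Gets
    }
  }
}
{code}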



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (GEODE-9884) update CI max_in_flight limits

2021-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated GEODE-9884:
--
Labels: pull-request-available  (was: )

> update CI max_in_flight limits
> --
>
> Key: GEODE-9884
> URL: https://issues.apache.org/jira/browse/GEODE-9884
> Project: Geode
>  Issue Type: Improvement
>  Components: ci
>Reporter: Owen Nichols
>Priority: Major
>  Labels: pull-request-available
>
> max_in_flight limits are set on the main CI pipeline to avoid overloading 
> concourse when a large number of commits come through at the same time.
> These limits were last calculated a few years ago based on the average time 
> each job takes; many jobs now take much longer, so the limits should be 
> recalculated.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-9814) Add an example of geode-for-redis to the geode examples project

2021-12-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456820#comment-17456820
 ] 

ASF subversion and git services commented on GEODE-9814:


Commit 380669b16f8b0d40fd265be2e1fcb41f016166fe in geode-examples's branch 
refs/heads/develop from Donal Evans
[ https://gitbox.apache.org/repos/asf?p=geode-examples.git;h=380669b ]

GEODE-9814: Add geode-for-redis example (#110)

Authored-by: Donal Evans 
Co-authored-by: Dave Barnes 

> Add an example of geode-for-redis to the geode examples project
> ---
>
> Key: GEODE-9814
> URL: https://issues.apache.org/jira/browse/GEODE-9814
> Project: Geode
>  Issue Type: Improvement
>  Components: redis
>Reporter: Dan Smith
>Assignee: Donal Evans
>Priority: Major
>  Labels: pull-request-available
>
> Add an example to the geode-examples project/repo demonstrating how to turn 
> on and use geode-for-redis.
> This is just a script. The user must download native Redis to get the 
> command line tool.
> Cluster Mode must be used.
> Start Server with gfsh.
> Use JedisCluster client to:
>  * Perform Sets
>  * Perform Gets
> Have a readme that speaks to using native Redis.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (GEODE-9814) Add an example of geode-for-redis to the geode examples project

2021-12-09 Thread Donal Evans (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donal Evans resolved GEODE-9814.

Fix Version/s: 1.15.0
   Resolution: Fixed

> Add an example of geode-for-redis to the geode examples project
> ---
>
> Key: GEODE-9814
> URL: https://issues.apache.org/jira/browse/GEODE-9814
> Project: Geode
>  Issue Type: Improvement
>  Components: redis
>Reporter: Dan Smith
>Assignee: Donal Evans
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> Add an example to the geode-examples project/repo demonstrating how to turn 
> on and use geode-for-redis.
> This is just a script. The user must download native Redis to get the 
> command line tool.
> Cluster Mode must be used.
> Start Server with gfsh.
> Use JedisCluster client to:
>  * Perform Sets
>  * Perform Gets
> Have a readme that speaks to using native Redis.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-9814) Add an example of geode-for-redis to the geode examples project

2021-12-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456821#comment-17456821
 ] 

ASF GitHub Bot commented on GEODE-9814:
---

DonalEvans merged pull request #110:
URL: https://github.com/apache/geode-examples/pull/110


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@geode.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add an example of geode-for-redis to the geode examples project
> ---
>
> Key: GEODE-9814
> URL: https://issues.apache.org/jira/browse/GEODE-9814
> Project: Geode
>  Issue Type: Improvement
>  Components: redis
>Reporter: Dan Smith
>Assignee: Donal Evans
>Priority: Major
>  Labels: pull-request-available
>
> Add an example to the geode-examples project/repo demonstrating how to turn 
> on and use geode-for-redis.
> This is just a script. The user must download native Redis to get the 
> command line tool.
> Cluster Mode must be used.
> Start Server with gfsh.
> Use JedisCluster client to:
>  * Perform Sets
>  * Perform Gets
> Have a readme that speaks to using native Redis.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (GEODE-9880) Cluster with multiple locators in an environment with no host name resolution, leads to null pointer exception

2021-12-09 Thread Bill Burcham (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Burcham updated GEODE-9880:

Component/s: membership

> Cluster with multiple locators in an environment with no host name 
> resolution, leads to null pointer exception
> --
>
> Key: GEODE-9880
> URL: https://issues.apache.org/jira/browse/GEODE-9880
> Project: Geode
>  Issue Type: Bug
>  Components: locator, membership
>Affects Versions: 1.12.5
>Reporter: Tigran Ghahramanyan
>Priority: Major
>  Labels: membership
>
> In our use case we have two locators that are initially configured with IP 
> addresses, but the _AutoConnectionSourceImpl.UpdateLocatorList()_ flow keeps 
> adding their corresponding host names to the locator list, even though these 
> host names are not resolvable.
> Later, in {_}AutoConnectionSourceImpl.queryLocators(){_}, whenever a client 
> uses such an unresolvable host name to connect to a locator, it tries to 
> establish a connection to {_}socketaddr=0.0.0.0{_}, as written in 
> {_}SocketCreator.connect(){_} - which seems strange.
> If there is no locator running on the same host, the next locator in the 
> list is contacted, until a locator contact configured with an IP address is 
> reached - which eventually succeeds.
> But when there happens to be a locator listening on the same host, we get a 
> null pointer exception on the second line below, because _inetadd_ is null:
> _socket.connect(sockaddr, Math.max(timeout, 0)); // sockaddr=0.0.0.0, 
> connects to a locator listening on the same host_
> _configureClientSSLSocket(socket, inetadd.getHostName(), timeout); // inetadd 
> = null_
>  
> As a result, the cluster comes to a failed state, unable to recover.
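
A small, self-contained illustration of the failure mode described above, 
using only JDK APIs. The Geode internals named in the ticket differ in 
detail; the host name, port, and class name here are placeholders.

{code:java}
import java.net.InetAddress;
import java.net.InetSocketAddress;

// Hedged demo of the NPE mechanism: an unresolvable host name produces an
// InetSocketAddress whose getAddress() is null, so any unguarded call such
// as inetadd.getHostName() throws NullPointerException.
class UnresolvedLocatorDemo {
  public static void main(String[] args) {
    // 10334 is the default locator port; the host name is deliberately bogus.
    InetSocketAddress sockaddr = new InetSocketAddress("no-such-host.invalid", 10334);
    System.out.println(sockaddr.isUnresolved()); // true
    InetAddress inetadd = sockaddr.getAddress(); // null when unresolved
    System.out.println(inetadd == null);         // true
    // inetadd.getHostName() here would throw the NPE described above;
    // sockaddr.getHostString() is the null-safe way to get the host name.
  }
}
{code}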



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (GEODE-9822) Split-brain Certain During Network Partition in Two-Locator Cluster

2021-12-09 Thread Bill Burcham (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Burcham updated GEODE-9822:

Description: 
In a two-locator cluster with default member weights and default setting (true) 
of enable-network-partition-detection, if a long-lived network partition 
separates the two members, a split-brain will arise: there will be two 
coordinators at the same time.

The reason for this can be found in the GMSJoinLeave.isNetworkPartition() 
method. That method's name is misleading. A name like isMajorityLost() would 
probably be more apt. It needs to return true iff the weight of "crashed" 
members (in the prospective view) is greater-than-or-equal-to half (50%) of the 
total weight (of all members in the current view).

What the method actually does is return true iff the weight of "crashed" 
members is greater-than 51% of the total weight. As a result, if we have two 
members of equal weight, and the coordinator sees that the non-coordinator is 
"crashed", the coordinator will keep running. If a network partition is 
happening, and the non-coordinator is still running, then it will become a 
coordinator and start producing views. Now we'll have two coordinators 
producing views concurrently.

For this discussion "crashed" members are members for which the coordinator has 
received a RemoveMemberRequest message. These are members that the failure 
detector has deemed failed. Keep in mind the failure detector is imperfect 
(it's not always right), and that's kind of the whole point of this ticket: 
we've lost contact with the non-coordinator member, but that doesn't mean it 
can't still be running (on the other side of a partition).

This bug is not limited to the two-locator scenario. Any set of members that 
can be partitioned into two equal sets is susceptible. In fact it's even a 
little worse than that. Any set of members that can be partitioned (into more 
than one set), where two or more of the sets each still have 49% or more of 
the total weight, will result in a split-brain.

  was:
In a two-locator cluster with default member weights and default setting (true) 
of enable-network-partition-detection, if a long-lived network partition 
separates the two members, a split-brain will arise: there will be two 
coordinators at the same time.

The reason for this can be found in the GMSJoinLeave.isNetworkPartition() 
method. That method's name is misleading. A name like isMajorityLost() would 
probably be more apt. It needs to return true iff the weight of "crashed" 
members (in the prospective view) is greater-than-or-equal-to half (50%) of the 
total weight (of all members in the current view).

What the method actually does is return true iff the weight of "crashed" 
members is greater-than 51% of the total weight. As a result, if we have two 
members of equal weight, and the coordinator sees that the non-coordinator is 
"crashed", the coordinator will keep running. If a network partition is 
happening, and the non-coordinator is still running, then it will become a 
coordinator and start producing views. Now we'll have two coordinators 
producing views concurrently.

For this discussion "crashed" members are members for which the coordinator has 
received a RemoveMemberRequest message. These are members that the failure 
detector has deemed failed. Keep in mind the failure detector is imperfect 
(it's not always right), and that's kind of the whole point of this ticket: 
we've lost contact with the non-coordinator member, but that doesn't mean it 
can't still be running (on the other side of a partition).

This bug is not limited to the two-locator scenario. Any set of members that 
can be partitioned into two equal sets is susceptible. In fact it's even a 
little worse than that. Any set of members that can be partitioned into two 
sets, both of which still have 49% or more of the total weight, will result in 
a split-brain.


> Split-brain Certain During Network Partition in Two-Locator Cluster
> ---
>
> Key: GEODE-9822
> URL: https://issues.apache.org/jira/browse/GEODE-9822
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bill Burcham
>Priority: Major
>  Labels: pull-request-available
>
> In a two-locator cluster with default member weights and default setting 
> (true) of enable-network-partition-detection, if a long-lived network 
> partition separates the two members, a split-brain will arise: there will be 
> two coordinators at the same time.
> The reason for this can be found in the GMSJoinLeave.isNetworkPartition() 
> method. That method's name is misleading. A name like isMajorityLost() would 
> probably be more apt. It needs to return true iff the weight of "crashed" 
> members (in the prospective view) is greater-than-or-equal-to half (50%) of 
> the total weight (of all members in the current view).
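
To make the proposed rule concrete, here is a minimal sketch of the majority 
check the ticket calls for, assuming integer member weights. The method and 
class names are illustrative; the actual logic lives in 
GMSJoinLeave.isNetworkPartition().

{code:java}
// Hedged sketch of the corrected quorum rule: quorum is lost when the weight
// of "crashed" members reaches half or more of the total view weight.
final class QuorumSketch {
  static boolean isMajorityLost(int crashedWeight, int totalWeight) {
    // Multiply instead of dividing to stay in integer arithmetic.
    return crashedWeight * 2 >= totalWeight;
  }

  public static void main(String[] args) {
    // Two members of equal weight, one deemed crashed: 10 * 2 >= 20 -> true,
    // so both sides of an even partition now declare lost quorum instead of
    // both surviving as coordinators under the old greater-than-51% rule.
    System.out.println(isMajorityLost(10, 20)); // true
  }
}
{code}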

[jira] [Resolved] (GEODE-9822) Split-brain Certain During Network Partition in Two-Locator Cluster

2021-12-09 Thread Bill Burcham (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Burcham resolved GEODE-9822.
-
Fix Version/s: 1.15.0
   Resolution: Fixed

> Split-brain Certain During Network Partition in Two-Locator Cluster
> ---
>
> Key: GEODE-9822
> URL: https://issues.apache.org/jira/browse/GEODE-9822
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bill Burcham
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> In a two-locator cluster with default member weights and default setting 
> (true) of enable-network-partition-detection, if a long-lived network 
> partition separates the two members, a split-brain will arise: there will be 
> two coordinators at the same time.
> The reason for this can be found in the GMSJoinLeave.isNetworkPartition() 
> method. That method's name is misleading. A name like isMajorityLost() would 
> probably be more apt. It needs to return true iff the weight of "crashed" 
> members (in the prospective view) is greater-than-or-equal-to half (50%) of 
> the total weight (of all members in the current view).
> What the method actually does is return true iff the weight of "crashed" 
> members is greater-than 51% of the total weight. As a result, if we have two 
> members of equal weight, and the coordinator sees that the non-coordinator is 
> "crashed", the coordinator will keep running. If a network partition is 
> happening, and the non-coordinator is still running, then it will become a 
> coordinator and start producing views. Now we'll have two coordinators 
> producing views concurrently.
> For this discussion "crashed" members are members for which the coordinator 
> has received a RemoveMemberRequest message. These are members that the 
> failure detector has deemed failed. Keep in mind the failure detector is 
> imperfect (it's not always right), and that's kind of the whole point of this 
> ticket: we've lost contact with the non-coordinator member, but that doesn't 
> mean it can't still be running (on the other side of a partition).
> This bug is not limited to the two-locator scenario. Any set of members that 
> can be partitioned into two equal sets is susceptible. In fact it's even a 
> little worse than that. Any set of members that can be partitioned (into more 
> than one set), where two or more of the sets each still have 49% or more of 
> the total weight, will result in a split-brain.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-9822) Split-brain Certain During Network Partition in Two-Locator Cluster

2021-12-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456824#comment-17456824
 ] 

ASF subversion and git services commented on GEODE-9822:


Commit d89fdf67d091d5bb4e8bc60c9996b667dba3cab3 in geode's branch 
refs/heads/develop from Bill Burcham
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=d89fdf6 ]

GEODE-9822: Quorum Calculation Requires Majority (#7126)

* Quorum didn't formerly require a majority—it required only 49% of the member 
weight.
* Because of that two member partitions could survive at once, resulting in 
split-brain.
* Quorum now requires a majority.

> Split-brain Certain During Network Partition in Two-Locator Cluster
> ---
>
> Key: GEODE-9822
> URL: https://issues.apache.org/jira/browse/GEODE-9822
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bill Burcham
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> In a two-locator cluster with default member weights and default setting 
> (true) of enable-network-partition-detection, if a long-lived network 
> partition separates the two members, a split-brain will arise: there will be 
> two coordinators at the same time.
> The reason for this can be found in the GMSJoinLeave.isNetworkPartition() 
> method. That method's name is misleading. A name like isMajorityLost() would 
> probably be more apt. It needs to return true iff the weight of "crashed" 
> members (in the prospective view) is greater-than-or-equal-to half (50%) of 
> the total weight (of all members in the current view).
> What the method actually does is return true iff the weight of "crashed" 
> members is greater-than 51% of the total weight. As a result, if we have two 
> members of equal weight, and the coordinator sees that the non-coordinator is 
> "crashed", the coordinator will keep running. If a network partition is 
> happening, and the non-coordinator is still running, then it will become a 
> coordinator and start producing views. Now we'll have two coordinators 
> producing views concurrently.
> For this discussion "crashed" members are members for which the coordinator 
> has received a RemoveMemberRequest message. These are members that the 
> failure detector has deemed failed. Keep in mind the failure detector is 
> imperfect (it's not always right), and that's kind of the whole point of this 
> ticket: we've lost contact with the non-coordinator member, but that doesn't 
> mean it can't still be running (on the other side of a partition).
> This bug is not limited to the two-locator scenario. Any set of members that 
> can be partitioned into two equal sets is susceptible. In fact it's even a 
> little worse than that. Any set of members that can be partitioned (into more 
> than one set), where two or more of the sets each still have 49% or more of 
> the total weight, will result in a split-brain.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (GEODE-9884) update CI max_in_flight limits

2021-12-09 Thread Owen Nichols (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen Nichols resolved GEODE-9884.
-
Fix Version/s: 1.15.0
   Resolution: Fixed

> update CI max_in_flight limits
> --
>
> Key: GEODE-9884
> URL: https://issues.apache.org/jira/browse/GEODE-9884
> Project: Geode
>  Issue Type: Improvement
>  Components: ci
>Reporter: Owen Nichols
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> max_in_flight limits are set on the main CI pipeline to avoid overloading 
> concourse when a large number of commits come through at the same time.
> These limits were last calculated a few years ago based on the average time 
> each job takes; many jobs now take much longer, so the limits should be 
> recalculated.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (GEODE-9877) GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse failed

2021-12-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456830#comment-17456830
 ] 

ASF subversion and git services commented on GEODE-9877:


Commit 9099e1fe70b02886ec7d65d21d8b9c0e60b94677 in geode's branch 
refs/heads/support/1.14 from Jens Deppe
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=9099e1f ]

GEODE-9877: Use ServerSocket to create interfering port (#7180)

- For some unknown reason `startupFailsGivenPortAlreadyInUse` started to
  fail after a seemingly innocuous Ubuntu base image bump. The problem
  may also have been triggered by arbitrary test ordering changes since
  the test did not fail on its own, but only in conjunction with running
  other tests beforehand.
  Specifically, the test was failing when binding the interfering port
  (bind exception). The port used was always in the TIME_WAIT state left
  from previous tests.
  Using a `ServerSocket`, instead of a regular socket, fixes the problem
  since it actually 'uses' the port and implicitly allows for port
  reuse.

- Use ServerSocket consistently. Rename test to be more appropriate

(cherry picked from commit 310c647da6ee4cc4a1eadc6df174d998e69afb31)
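
As an aside, a minimal sketch of the technique the commit message describes. 
The helper class and method names are hypothetical; the real test code 
differs.

{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hedged sketch: occupy a port with a ServerSocket so a later bind by the
// server under test fails deterministically. SO_REUSEADDR lets this bind
// succeed even when the port lingers in TIME_WAIT from earlier tests.
final class InterferingPort {
  static ServerSocket occupy(int port) throws IOException {
    ServerSocket interfering = new ServerSocket();
    interfering.setReuseAddress(true); // tolerate TIME_WAIT remnants
    interfering.bind(new InetSocketAddress(port));
    return interfering; // the test closes it during teardown
  }
}
{code}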


> GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse failed
> --
>
> Key: GEODE-9877
> URL: https://issues.apache.org/jira/browse/GEODE-9877
> Project: Geode
>  Issue Type: Bug
>  Components: redis
>Affects Versions: 1.15.0
>Reporter: Mark Hanson
>Assignee: Jens Deppe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/acceptance-test-openjdk8/builds/43]
>  failed with 
> GeodeRedisServerStartupDUnitTest. startupFailsGivenPortAlreadyInUse
> {noformat}
> java.net.BindException: Address already in use (Bind failed)
>   at java.net.PlainSocketImpl.socketBind(Native Method)
>   at 
> java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
>   at java.net.Socket.bind(Socket.java:662)
>   at 
> org.apache.geode.redis.GeodeRedisServerStartupDUnitTest.startupFailsGivenPortAlreadyInUse(GeodeRedisServerStartupDUnitTest.java:115)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.geode.test.dunit.rules.ClusterStartupRule$1.evaluate(ClusterStartupRule.java:138)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>   at 
> org.apache.geode.test.junit.rules.DescribedExternalResource$1.evaluate(DescribedExternalResource.java:40)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
>   at 
> org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43)
>   at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
>   at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at java.util.Iterator.forEachRemaining(Iterator.java:116)
>   at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)

[jira] [Commented] (GEODE-9851) Use strongly typed enums rather than int for enumeration like values.

2021-12-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456895#comment-17456895
 ] 

ASF subversion and git services commented on GEODE-9851:


Commit 79475fa4a5e3cb82def90a1a8b7fa22a023eb57c in geode's branch 
refs/heads/develop from Jacob Barrett
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=79475fa ]

GEODE-9851:  Use InterestType and DataPolicy over ordinal int. (#7103)

* Make InterestType an enum and use strong type in method parameters.
* Use strong type DataPolicy in method parameters.
 * Prepare for migration to enum.


> Use strongly typed enums rather than int for enumeration like values.
> -
>
> Key: GEODE-9851
> URL: https://issues.apache.org/jira/browse/GEODE-9851
> Project: Geode
>  Issue Type: Improvement
>Reporter: Jacob Barrett
>Priority: Major
>  Labels: pull-request-available
>
> Internally, register interest has both an interest policy and a data storage 
> policy that it passes around as `int`. Since these values are finite and 
> well defined, it makes sense to pass them as proper Java enums. 
> Strongly typing them provides compile-time checks on acceptable values and 
> makes the code more readable. 
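
For illustration, a minimal sketch of the proposed style. The enum constants 
and class name below are made up for this sketch and are not Geode's actual 
InterestType values.

{code:java}
// Hedged illustration: a strongly typed enum turns invalid ordinal values
// into compile-time errors instead of runtime surprises.
enum InterestType { KEY, REGEX, FILTER_CLASS, OQL_QUERY }

final class RegisterInterestSketch {
  // Before: registerInterest(Object key, int interestType) accepts any int.
  // After: only a declared InterestType constant compiles.
  void registerInterest(Object key, InterestType type) {
    switch (type) {
      case KEY:
        // register a single key
        break;
      case REGEX:
        // register all keys matching a pattern
        break;
      default:
        // remaining policies
        break;
    }
  }
}
{code}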



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (GEODE-9851) Use strongly typed enums rather than int for enumeration like values.

2021-12-09 Thread Jacob Barrett (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Barrett resolved GEODE-9851.
--
Fix Version/s: 1.15.0
   Resolution: Fixed

> Use strongly typed enums rather than int for enumeration like values.
> -
>
> Key: GEODE-9851
> URL: https://issues.apache.org/jira/browse/GEODE-9851
> Project: Geode
>  Issue Type: Improvement
>Reporter: Jacob Barrett
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> Internally, register interest has both an interest policy and a data storage 
> policy that it passes around as `int`. Since these values are finite and 
> well defined, it makes sense to pass them as proper Java enums. 
> Strongly typing them provides compile-time checks on acceptable values and 
> makes the code more readable. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)