[jira] [Commented] (GEODE-9632) Wrong output for the range query with wildcard character

2022-07-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-9632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572798#comment-17572798
 ] 

ASF subversion and git services commented on GEODE-9632:


Commit 0ecd6f673801cbdcc9cfeba7da425c83502d66f8 in geode's branch 
refs/heads/develop from Mario Ivanac
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=0ecd6f6738 ]

GEODE-9632: fix for queries with multiple operations and indexes (#7824)



> Wrong output for the range query with wildcard character
> 
>
> Key: GEODE-9632
> URL: https://issues.apache.org/jira/browse/GEODE-9632
> Project: Geode
>  Issue Type: Bug
>  Components: querying
>Affects Versions: 1.13.1, 1.14.0
>Reporter: Mario Kevo
>Assignee: Mario Kevo
>Priority: Major
>  Labels: pull-request-available
>
> We are using a range index on an attribute that is defined as a HashMap.
> The problem is that when we use the wildcard character (%), the query returns 
> no results even though some entries meet the condition we are checking.
> Here is an example:
>  
> {code:java}
> gfsh>query --query="SELECT e.key, e.value from 
> /example-region.entrySet e where e.value.positions['SUN'] LIKE 
> '342234525745'" 
> Result  : true
> Limit   : 100
> Rows: 1
> Query Trace : Query Executed in 9.082156 ms; indexesUsed(1):index1(Results: 1)
> gfsh>query --query="SELECT e.key, e.value from 
> /example-region.entrySet e where e.value.positions['SUN'] LIKE 
> '34223452574%'" 
> Result  : true
> Limit   : 100
> Rows: 0
> Query Trace : Query Executed in 4.677162 ms; indexesUsed(1):index1(Results: 
> 100)
> {code}
>  Since we use indexes for better query performance, the engine first looks up 
> which entries contain the field we are querying on. It stores them in the 
> index results and then checks how many of them meet the condition defined in 
> the query.
> The problem is the parameter INDEX_THRESHOLD_SIZE, whose default value is 100. 
> If the region contains many entries, only the first 100 entries found are 
> written to the index results.
> This parameter can be changed when starting servers by adding 
> *-Dgemfire.Query.INDEX_THRESHOLD_SIZE=*, but if it is set to a value higher 
> than the query's limit, it overrides that limit. So some changes are needed to 
> take this attribute properly into account.
>  
>  
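
For reference, the reported scenario can also be reproduced through Geode's 
Java query API. The following is a minimal sketch, not taken from the report: 
it assumes an already-running cache whose /example-region entries carry a 
'positions' HashMap, and it reuses the index name index1 from the trace above.

{code:java}
import org.apache.geode.cache.Cache;
import org.apache.geode.cache.query.QueryService;
import org.apache.geode.cache.query.SelectResults;

public class LikeQueryRepro {
  // Sketch only: 'cache' is assumed to be a running Cache with /example-region
  // populated as described in the report.
  static void run(Cache cache) throws Exception {
    QueryService qs = cache.getQueryService();

    // Range index on the map entry used by the predicate (same shape as index1).
    qs.createIndex("index1", "e.value.positions['SUN']",
        "/example-region.entrySet e");

    // The wildcard LIKE query that returned 0 rows despite matching entries.
    SelectResults<?> results = (SelectResults<?>) qs.newQuery(
        "SELECT e.key, e.value FROM /example-region.entrySet e"
            + " WHERE e.value.positions['SUN'] LIKE '34223452574%'")
        .execute();

    System.out.println("rows: " + results.size());
  }
}
{code}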



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (GEODE-9632) Wrong output for the range query with wildcard character

2022-07-29 Thread Mario Kevo (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-9632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mario Kevo resolved GEODE-9632.
---
Fix Version/s: 1.16.0
   Resolution: Fixed

> Wrong output for the range query with wildcard character
> 
>
> Key: GEODE-9632
> URL: https://issues.apache.org/jira/browse/GEODE-9632
> Project: Geode
>  Issue Type: Bug
>  Components: querying
>Affects Versions: 1.13.1, 1.14.0
>Reporter: Mario Kevo
>Assignee: Mario Kevo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> We are using a range index on an attribute that is defined as a HashMap.
> The problem is that when we use the wildcard character (%), the query returns 
> no results even though some entries meet the condition we are checking.
> Here is an example:
>  
> {code:java}
> gfsh>query --query="SELECT e.key, e.value from 
> /example-region.entrySet e where e.value.positions['SUN'] LIKE 
> '342234525745'" 
> Result  : true
> Limit   : 100
> Rows: 1
> Query Trace : Query Executed in 9.082156 ms; indexesUsed(1):index1(Results: 1)
> gfsh>query --query="SELECT e.key, e.value from 
> /example-region.entrySet e where e.value.positions['SUN'] LIKE 
> '34223452574%'" 
> Result  : true
> Limit   : 100
> Rows: 0
> Query Trace : Query Executed in 4.677162 ms; indexesUsed(1):index1(Results: 
> 100)
> {code}
>  Since we use indexes for better query performance, the engine first looks up 
> which entries contain the field we are querying on. It stores them in the 
> index results and then checks how many of them meet the condition defined in 
> the query.
> The problem is the parameter INDEX_THRESHOLD_SIZE, whose default value is 100. 
> If the region contains many entries, only the first 100 entries found are 
> written to the index results.
> This parameter can be changed when starting servers by adding 
> *-Dgemfire.Query.INDEX_THRESHOLD_SIZE=*, but if it is set to a value higher 
> than the query's limit, it overrides that limit. So some changes are needed to 
> take this attribute properly into account.
>  
>  
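
As a workaround on affected versions, the threshold can be raised before the 
cache starts. A minimal sketch, assuming the server JVM is launched from your 
own code (with gfsh-started servers, the equivalent is passing the -D flag 
through --J); the value 1000 is an arbitrary illustration, not a 
recommendation:

{code:java}
public class RaiseIndexThreshold {
  public static void main(String[] args) {
    // Must be set before the cache and query engine are initialized.
    System.setProperty("gemfire.Query.INDEX_THRESHOLD_SIZE", "1000");
    // ... create the cache and start the server as usual ...
  }
}
{code}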



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (GEODE-10403) Distributed deadlock when stopping gateway sender

2022-07-29 Thread Alberto Gomez (Jira)
Alberto Gomez created GEODE-10403:
-

 Summary: Distributed deadlock when stopping gateway sender
 Key: GEODE-10403
 URL: https://issues.apache.org/jira/browse/GEODE-10403
 Project: Geode
  Issue Type: Bug
  Components: wan
Affects Versions: 1.15.0, 1.14.4, 1.13.8, 1.12.9
Reporter: Alberto Gomez


A distributed deadlock has been found during tests of a Geode system with WAN 
replication, when stopping the gateway sender while a fair amount of 
operations were being sent to the servers.

The distributed deadlock manifests as the gateway sender stop command hanging 
forever and all normal Geode operations from clients (gets, puts, ...) going 
unanswered.
The situation is provoked by the gateway sender stop command, which first takes 
the lifecycle lock and then, at a given point, tries to retrieve the size of 
the gateway sender queue. This operation, which requires communication with the 
other peers, never finishes, probably because the response from one of the 
peers is never received.
Another thread is blocked trying to acquire the lifecycle lock in 
AbstractGatewaySender.distribute().
Finally, many threads handling Geode operations (get, put, ...) are blocked in 
the DistributedCacheOperation._distribute() call waiting for a response from 
another peer.
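
The lock ordering described above can be condensed into a short sketch. The 
class and method names below are illustrative, not Geode's actual 
implementation: stop() holds the lifecycle lock across a remote wait, while 
distribute() needs the same lock to make progress.

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the reported lock ordering; names are illustrative.
class SenderLifecycleSketch {
  private final ReentrantReadWriteLock lifecycleLock = new ReentrantReadWriteLock();

  // Stop command path: takes the lifecycle lock, then waits on remote replies.
  void stop() {
    lifecycleLock.writeLock().lock();
    try {
      awaitRemoteQueueSizes(); // blocks forever if a peer's reply never arrives
    } finally {
      lifecycleLock.writeLock().unlock();
    }
  }

  // Cache-operation path (cf. AbstractGatewaySender.distribute()):
  // stuck as long as stop() holds the write lock.
  void distribute() {
    lifecycleLock.readLock().lock();
    try {
      // enqueue the event for WAN replication ...
    } finally {
      lifecycleLock.readLock().unlock();
    }
  }

  private void awaitRemoteQueueSizes() {
    // stands in for waiting on SizeMessage replies from remote peers
  }
}
{code}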

Thread dump section from blocked gateway sender stop command in call to get 
size of queue:
"ConcurrentParallelGatewaySenderEventProcessor Stopper Thread4" #1319 daemon 
prio=10 os_prio=0 cpu=46.95ms elapsed=4152.76s tid=0x7f92bc1bb000 
nid=0x2157 waiting on condition  [0x7f9179bd1000]
   java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@11.0.11/Native Method)
- parking to wait for  <0x00031ca2cbd8> (a 
java.util.concurrent.CountDownLatch$Sync)
at 
java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.11/LockSupport.java:234)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(java.base@11.0.11/AbstractQueuedSynchronizer.java:1079)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(java.base@11.0.11/AbstractQueuedSynchronizer.java:1369)
at 
java.util.concurrent.CountDownLatch.await(java.base@11.0.11/CountDownLatch.java:278)
at 
org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
at 
org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:731)
at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:802)
at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:779)
at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:865)
at 
org.apache.geode.internal.cache.partitioned.SizeMessage$SizeResponse.waitBucketSizes(SizeMessage.java:344)
at 
org.apache.geode.internal.cache.PartitionedRegion.getSizeRemotely(PartitionedRegion.java:6758)
at 
org.apache.geode.internal.cache.PartitionedRegion.entryCount(PartitionedRegion.java:6709)
at 
org.apache.geode.internal.cache.PartitionedRegion.entryCount(PartitionedRegion.java:6691)
at 
org.apache.geode.internal.cache.PartitionedRegion.getRegionSize(PartitionedRegion.java:6663)
at 
org.apache.geode.internal.cache.LocalRegionDataView.entryCount(LocalRegionDataView.java:99)
at 
org.apache.geode.internal.cache.LocalRegion.entryCount(LocalRegion.java:2078)
at 
org.apache.geode.internal.cache.LocalRegion.size(LocalRegion.java:8301)
at 
org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue.size(ParallelGatewaySenderQueue.java:1670)
at 
org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.closeProcessor(AbstractGatewaySenderEventProcessor.java:1259)
at 
org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.stopProcessing(AbstractGatewaySenderEventProcessor.java:1247)
at 
org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor$SenderStopperCallable.call(AbstractGatewaySenderEventProcessor.java:1399)
at 
org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor$SenderStopperCallable.call(AbstractGatewaySenderEventProcessor.java:1387)
at 
java.util.concurrent.FutureTask.run(java.base@11.0.11/FutureTask.java:264)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.11/ThreadPoolExecutor.java:1128)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.11/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base@11.0.11/Thread.java:829)


Thread dump section from blocked call to AbstractGatewaySender.distribute() 
trying to acquire the lifecycle lock:

[jira] [Updated] (GEODE-10403) Distributed deadlock when stopping gateway sender

2022-07-29 Thread Alexander Murmann (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Murmann updated GEODE-10403:
--
Labels: needsTriage  (was: )

> Distributed deadlock when stopping gateway sender
> -
>
> Key: GEODE-10403
> URL: https://issues.apache.org/jira/browse/GEODE-10403
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Affects Versions: 1.12.9, 1.13.8, 1.14.4, 1.15.0
>Reporter: Alberto Gomez
>Priority: Major
>  Labels: needsTriage
>
> A distributed deadlock has been found during tests of a Geode system with WAN 
> replication, when stopping the gateway sender while a fair amount of 
> operations were being sent to the servers.
> The distributed deadlock manifests as the gateway sender stop command hanging 
> forever and all normal Geode operations from clients (gets, puts, ...) going 
> unanswered.
> The situation is provoked by the gateway sender stop command, which first 
> takes the lifecycle lock and then, at a given point, tries to retrieve the 
> size of the gateway sender queue. This operation, which requires communication 
> with the other peers, never finishes, probably because the response from one 
> of the peers is never received.
> Another thread is blocked trying to acquire the lifecycle lock in 
> AbstractGatewaySender.distribute().
> Finally, many threads handling Geode operations (get, put, ...) are blocked in 
> the DistributedCacheOperation._distribute() call waiting for a response from 
> another peer.
> Thread dump section from blocked gateway sender stop command in call to get 
> size of queue:
> "ConcurrentParallelGatewaySenderEventProcessor Stopper Thread4" #1319 daemon 
> prio=10 os_prio=0 cpu=46.95ms elapsed=4152.76s tid=0x7f92bc1bb000 
> nid=0x2157 waiting on condition  [0x7f9179bd1000]
>java.lang.Thread.State: TIMED_WAITING (parking)
> at jdk.internal.misc.Unsafe.park(java.base@11.0.11/Native Method)
> - parking to wait for  <0x00031ca2cbd8> (a 
> java.util.concurrent.CountDownLatch$Sync)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.11/LockSupport.java:234)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(java.base@11.0.11/AbstractQueuedSynchronizer.java:1079)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(java.base@11.0.11/AbstractQueuedSynchronizer.java:1369)
> at 
> java.util.concurrent.CountDownLatch.await(java.base@11.0.11/CountDownLatch.java:278)
> at 
> org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
> at 
> org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:731)
> at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:802)
> at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:779)
> at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:865)
> at 
> org.apache.geode.internal.cache.partitioned.SizeMessage$SizeResponse.waitBucketSizes(SizeMessage.java:344)
> at 
> org.apache.geode.internal.cache.PartitionedRegion.getSizeRemotely(PartitionedRegion.java:6758)
> at 
> org.apache.geode.internal.cache.PartitionedRegion.entryCount(PartitionedRegion.java:6709)
> at 
> org.apache.geode.internal.cache.PartitionedRegion.entryCount(PartitionedRegion.java:6691)
> at 
> org.apache.geode.internal.cache.PartitionedRegion.getRegionSize(PartitionedRegion.java:6663)
> at 
> org.apache.geode.internal.cache.LocalRegionDataView.entryCount(LocalRegionDataView.java:99)
> at 
> org.apache.geode.internal.cache.LocalRegion.entryCount(LocalRegion.java:2078)
> at 
> org.apache.geode.internal.cache.LocalRegion.size(LocalRegion.java:8301)
> at 
> org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue.size(ParallelGatewaySenderQueue.java:1670)
> at 
> org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.closeProcessor(AbstractGatewaySenderEventProcessor.java:1259)
> at 
> org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.stopProcessing(AbstractGatewaySenderEventProcessor.java:1247)
> at 
> org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor$SenderStopperCallable.call(AbstractGatewaySenderEventProcessor.java:1399)
> at 
> org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor$SenderStopperCallable.call(AbstractGatewaySenderEventProcessor.java:1387)
> at 
> java.util.concurrent.FutureTask.run

[jira] [Assigned] (GEODE-10403) Distributed deadlock when stopping gateway sender

2022-07-29 Thread Alberto Gomez (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alberto Gomez reassigned GEODE-10403:
-

Assignee: Alberto Gomez

> Distributed deadlock when stopping gateway sender
> -
>
> Key: GEODE-10403
> URL: https://issues.apache.org/jira/browse/GEODE-10403
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Affects Versions: 1.12.9, 1.13.8, 1.14.4, 1.15.0
>Reporter: Alberto Gomez
>Assignee: Alberto Gomez
>Priority: Major
>  Labels: needsTriage
>
> A distributed deadlock has been found during tests of a Geode system with WAN 
> replication, when stopping the gateway sender while a fair amount of 
> operations were being sent to the servers.
> The distributed deadlock manifests as the gateway sender stop command hanging 
> forever and all normal Geode operations from clients (gets, puts, ...) going 
> unanswered.
> The situation is provoked by the gateway sender stop command, which first 
> takes the lifecycle lock and then, at a given point, tries to retrieve the 
> size of the gateway sender queue. This operation, which requires communication 
> with the other peers, never finishes, probably because the response from one 
> of the peers is never received.
> Another thread is blocked trying to acquire the lifecycle lock in 
> AbstractGatewaySender.distribute().
> Finally, many threads handling Geode operations (get, put, ...) are blocked in 
> the DistributedCacheOperation._distribute() call waiting for a response from 
> another peer.
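
For reference, the stop command discussed throughout is gfsh's gateway-sender 
stop; the sender id below is illustrative:

{code}
gfsh>stop gateway-sender --id=sender1
{code}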
> Thread dump section from blocked gateway sender stop command in call to get 
> size of queue:
> "ConcurrentParallelGatewaySenderEventProcessor Stopper Thread4" #1319 daemon 
> prio=10 os_prio=0 cpu=46.95ms elapsed=4152.76s tid=0x7f92bc1bb000 
> nid=0x2157 waiting on condition  [0x7f9179bd1000]
>java.lang.Thread.State: TIMED_WAITING (parking)
> at jdk.internal.misc.Unsafe.park(java.base@11.0.11/Native Method)
> - parking to wait for  <0x00031ca2cbd8> (a 
> java.util.concurrent.CountDownLatch$Sync)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.11/LockSupport.java:234)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(java.base@11.0.11/AbstractQueuedSynchronizer.java:1079)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(java.base@11.0.11/AbstractQueuedSynchronizer.java:1369)
> at 
> java.util.concurrent.CountDownLatch.await(java.base@11.0.11/CountDownLatch.java:278)
> at 
> org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
> at 
> org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:731)
> at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:802)
> at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:779)
> at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:865)
> at 
> org.apache.geode.internal.cache.partitioned.SizeMessage$SizeResponse.waitBucketSizes(SizeMessage.java:344)
> at 
> org.apache.geode.internal.cache.PartitionedRegion.getSizeRemotely(PartitionedRegion.java:6758)
> at 
> org.apache.geode.internal.cache.PartitionedRegion.entryCount(PartitionedRegion.java:6709)
> at 
> org.apache.geode.internal.cache.PartitionedRegion.entryCount(PartitionedRegion.java:6691)
> at 
> org.apache.geode.internal.cache.PartitionedRegion.getRegionSize(PartitionedRegion.java:6663)
> at 
> org.apache.geode.internal.cache.LocalRegionDataView.entryCount(LocalRegionDataView.java:99)
> at 
> org.apache.geode.internal.cache.LocalRegion.entryCount(LocalRegion.java:2078)
> at 
> org.apache.geode.internal.cache.LocalRegion.size(LocalRegion.java:8301)
> at 
> org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue.size(ParallelGatewaySenderQueue.java:1670)
> at 
> org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.closeProcessor(AbstractGatewaySenderEventProcessor.java:1259)
> at 
> org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.stopProcessing(AbstractGatewaySenderEventProcessor.java:1247)
> at 
> org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor$SenderStopperCallable.call(AbstractGatewaySenderEventProcessor.java:1399)
> at 
> org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor$SenderStopperCallable.call(AbstractGatewaySenderEventProcessor.java:1387)
> at 
> java

[jira] [Updated] (GEODE-10403) Distributed deadlock when stopping gateway sender

2022-07-29 Thread Alberto Gomez (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alberto Gomez updated GEODE-10403:
--
Description: 
A distributed deadlock has been found during tests of a Geode system with WAN 
replication, when stopping the gateway sender while a fair amount of 
operations were being sent to the servers.

The distributed deadlock manifests as the gateway sender stop command hanging 
forever and all normal Geode operations from clients (gets, puts, ...) going 
unanswered.
The situation is provoked by the gateway sender stop command, which first takes 
the lifecycle lock and then, at a given point, tries to retrieve the size of 
the gateway sender queue. This operation, which requires communication with the 
other peers, never finishes, probably because the response from one of the 
peers is never received.
Another thread is blocked trying to acquire the lifecycle lock in 
AbstractGatewaySender.distribute().
Finally, many threads handling Geode operations (get, put, ...) are blocked in 
the DistributedCacheOperation._distribute() call waiting for a response from 
another peer.
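
When diagnosing a hang like this, the thread dump is the primary tool. A local 
JVM can also be probed programmatically, though with a caveat: ThreadMXBean 
only detects cycles among local monitors and ownable synchronizers, so a 
distributed deadlock like this one appears only as parked threads in the dump, 
never as a detected cycle. A minimal sketch:

{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Sketch: print any locally detectable deadlock cycle in this JVM.
public class LocalDeadlockProbe {
  public static void main(String[] args) {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    long[] ids = mx.findDeadlockedThreads(); // null if no local cycle exists
    if (ids == null) {
      System.out.println("no local cycle (waits on remote peers are invisible here)");
      return;
    }
    for (ThreadInfo info : mx.getThreadInfo(ids, Integer.MAX_VALUE)) {
      System.out.println(info);
    }
  }
}
{code}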

Thread dump section from blocked gateway sender stop command in call to get 
size of queue:

{{"ConcurrentParallelGatewaySenderEventProcessor Stopper Thread4" #1319 daemon 
prio=10 os_prio=0 cpu=46.95ms elapsed=4152.76s tid=0x7f92bc1bb000 
nid=0x2157 waiting on condition  [0x7f9179bd1000]
   java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@11.0.11/Native Method)
- parking to wait for  <0x00031ca2cbd8> (a 
java.util.concurrent.CountDownLatch$Sync)
at 
java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.11/LockSupport.java:234)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(java.base@11.0.11/AbstractQueuedSynchronizer.java:1079)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(java.base@11.0.11/AbstractQueuedSynchronizer.java:1369)
at 
java.util.concurrent.CountDownLatch.await(java.base@11.0.11/CountDownLatch.java:278)
at 
org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
at 
org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:731)
at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:802)
at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:779)
at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:865)
at 
org.apache.geode.internal.cache.partitioned.SizeMessage$SizeResponse.waitBucketSizes(SizeMessage.java:344)
at 
org.apache.geode.internal.cache.PartitionedRegion.getSizeRemotely(PartitionedRegion.java:6758)
at 
org.apache.geode.internal.cache.PartitionedRegion.entryCount(PartitionedRegion.java:6709)
at 
org.apache.geode.internal.cache.PartitionedRegion.entryCount(PartitionedRegion.java:6691)
at 
org.apache.geode.internal.cache.PartitionedRegion.getRegionSize(PartitionedRegion.java:6663)
at 
org.apache.geode.internal.cache.LocalRegionDataView.entryCount(LocalRegionDataView.java:99)
at 
org.apache.geode.internal.cache.LocalRegion.entryCount(LocalRegion.java:2078)
at 
org.apache.geode.internal.cache.LocalRegion.size(LocalRegion.java:8301)
at 
org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue.size(ParallelGatewaySenderQueue.java:1670)
at 
org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.closeProcessor(AbstractGatewaySenderEventProcessor.java:1259)
at 
org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.stopProcessing(AbstractGatewaySenderEventProcessor.java:1247)
at 
org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor$SenderStopperCallable.call(AbstractGatewaySenderEventProcessor.java:1399)
at 
org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor$SenderStopperCallable.call(AbstractGatewaySenderEventProcessor.java:1387)
at 
java.util.concurrent.FutureTask.run(java.base@11.0.11/FutureTask.java:264)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.11/ThreadPoolExecutor.java:1128)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.11/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base@11.0.11/Thread.java:829)
}}


Thread dump section from blocked call to AbstractGatewaySender.distribute() 
trying to acquire the lifecycle lock:
{{"P2P message reader for 
192.168.78.164(eric-data-kvdb-ag-server-0:1):41000 shared ordered uid=6 
local port=60360 remote port=57246" #56 daemon prio=10 os_prio=

[jira] [Updated] (GEODE-10403) Distributed deadlock when stopping gateway sender

2022-07-29 Thread Alberto Gomez (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alberto Gomez updated GEODE-10403:
--
Description: 
A distributed deadlock has been found during tests of a Geode system with WAN 
replication, when stopping the gateway sender while a fair amount of 
operations were being sent to the servers.

The distributed deadlock manifests as the gateway sender stop command hanging 
forever and all normal Geode operations from clients (gets, puts, ...) going 
unanswered.
The situation is provoked by the gateway sender stop command, which first takes 
the lifecycle lock and then, at a given point, tries to retrieve the size of 
the gateway sender queue. This operation, which requires communication with the 
other peers, never finishes, probably because the response from one of the 
peers is never received.
Another thread is blocked trying to acquire the lifecycle lock in 
AbstractGatewaySender.distribute().
Finally, many threads handling Geode operations (get, put, ...) are blocked in 
the DistributedCacheOperation._distribute() call waiting for a response from 
another peer.

Thread dump section from blocked gateway sender stop command in call to get 
size of queue:

{{"ConcurrentParallelGatewaySenderEventProcessor Stopper Thread4" #1319 daemon 
prio=10 os_prio=0 cpu=46.95ms elapsed=4152.76s tid=0x7f92bc1bb000 
nid=0x2157 waiting on condition  [0x7f9179bd1000]
   java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@11.0.11/Native Method)
- parking to wait for  <0x00031ca2cbd8> (a 
java.util.concurrent.CountDownLatch$Sync)
at 
java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.11/LockSupport.java:234)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(java.base@11.0.11/AbstractQueuedSynchronizer.java:1079)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(java.base@11.0.11/AbstractQueuedSynchronizer.java:1369)
at 
java.util.concurrent.CountDownLatch.await(java.base@11.0.11/CountDownLatch.java:278)
at 
org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
at 
org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:731)
at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:802)
at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:779)
at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:865)
at 
org.apache.geode.internal.cache.partitioned.SizeMessage$SizeResponse.waitBucketSizes(SizeMessage.java:344)
at 
org.apache.geode.internal.cache.PartitionedRegion.getSizeRemotely(PartitionedRegion.java:6758)
at 
org.apache.geode.internal.cache.PartitionedRegion.entryCount(PartitionedRegion.java:6709)
at 
org.apache.geode.internal.cache.PartitionedRegion.entryCount(PartitionedRegion.java:6691)
at 
org.apache.geode.internal.cache.PartitionedRegion.getRegionSize(PartitionedRegion.java:6663)
at 
org.apache.geode.internal.cache.LocalRegionDataView.entryCount(LocalRegionDataView.java:99)
at 
org.apache.geode.internal.cache.LocalRegion.entryCount(LocalRegion.java:2078)
at 
org.apache.geode.internal.cache.LocalRegion.size(LocalRegion.java:8301)
at 
org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue.size(ParallelGatewaySenderQueue.java:1670)
at 
org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.closeProcessor(AbstractGatewaySenderEventProcessor.java:1259)
at 
org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.stopProcessing(AbstractGatewaySenderEventProcessor.java:1247)
at 
org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor$SenderStopperCallable.call(AbstractGatewaySenderEventProcessor.java:1399)
at 
org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor$SenderStopperCallable.call(AbstractGatewaySenderEventProcessor.java:1387)
at 
java.util.concurrent.FutureTask.run(java.base@11.0.11/FutureTask.java:264)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.11/ThreadPoolExecutor.java:1128)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.11/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base@11.0.11/Thread.java:829)

}}

Thread dump section from blocked call to AbstractGatewaySender.distribute() 
trying to acquire the lifecycle lock:
{{"P2P message reader for 
192.168.78.164(eric-data-kvdb-ag-server-0:1):41000 shared ordered uid=6 
local port=60360 remote port=57246" #56 daemon prio=10 os_prio=

[jira] [Updated] (GEODE-10403) Distributed deadlock when stopping gateway sender

2022-07-29 Thread Alberto Gomez (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alberto Gomez updated GEODE-10403:
--
Description: 
A distributed deadlock has been found during tests of a Geode system with WAN 
replication, when stopping the gateway sender while a fair amount of 
operations were being sent to the servers.

The distributed deadlock manifests as the gateway sender stop command hanging 
forever and all normal Geode operations from clients (gets, puts, ...) going 
unanswered.
The situation is provoked by the gateway sender stop command, which first takes 
the lifecycle lock and then, at a given point, tries to retrieve the size of 
the gateway sender queue. This operation, which requires communication with the 
other peers, never finishes, probably because the response from one of the 
peers is never received.
Another thread is blocked trying to acquire the lifecycle lock in 
AbstractGatewaySender.distribute().
Finally, many threads handling Geode operations (get, put, ...) are blocked in 
the DistributedCacheOperation._distribute() call waiting for a response from 
another peer.

Thread dump section from blocked gateway sender stop command in call to get 
size of queue:

{{"ConcurrentParallelGatewaySenderEventProcessor Stopper Thread1" #1316 daemon 
prio=10 os_prio=0 cpu=45.55ms elapsed=4152.76s tid=0x7f92bc1c2000 
nid=0x2154 waiting on condition  [0x7f9179cd2000]}}
{{   java.lang.Thread.State: TIMED_WAITING (parking)}}
{{        at jdk.internal.misc.Unsafe.park(java.base@11.0.11/Native Method)}}
{{        - parking to wait for  <0x00031ca2be50> (a 
java.util.concurrent.CountDownLatch$Sync)}}
{{        at 
java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.11/LockSupport.java:234)}}
{{        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(java.base@11.0.11/AbstractQueuedSynchronizer.java:1079)}}
{{        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(java.base@11.0.11/AbstractQueuedSynchronizer.java:1369)}}
{{        at 
java.util.concurrent.CountDownLatch.await(java.base@11.0.11/CountDownLatch.java:278)}}
{{        at 
org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)}}
{{        at 
org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:731)}}
{{        at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:802)}}
{{        at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:779)}}
{{        at 
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:865)}}
{{        at 
org.apache.geode.internal.cache.partitioned.SizeMessage$SizeResponse.waitBucketSizes(SizeMessage.java:344)}}
{{        at 
org.apache.geode.internal.cache.PartitionedRegion.getSizeRemotely(PartitionedRegion.java:6758)}}
{{        at 
org.apache.geode.internal.cache.PartitionedRegion.entryCount(PartitionedRegion.java:6709)}}
{{        at 
org.apache.geode.internal.cache.PartitionedRegion.entryCount(PartitionedRegion.java:6691)}}
{{        at 
org.apache.geode.internal.cache.PartitionedRegion.getRegionSize(PartitionedRegion.java:6663)}}
{{        at 
org.apache.geode.internal.cache.LocalRegionDataView.entryCount(LocalRegionDataView.java:99)}}
{{        at 
org.apache.geode.internal.cache.LocalRegion.entryCount(LocalRegion.java:2078)}}
{{        at 
org.apache.geode.internal.cache.LocalRegion.size(LocalRegion.java:8301)}}
{{        at 
org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue.size(ParallelGatewaySenderQueue.java:1670)}}
{{        at 
org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.closeProcessor(AbstractGatewaySenderEventProcessor.java:1259)}}
{{        at 
org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.stopProcessing(AbstractGatewaySenderEventProcessor.java:1247)}}
{{        at 
org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor$SenderStopperCallable.call(AbstractGatewaySenderEventProcessor.java:1399)}}
{{        at 
org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor$SenderStopperCallable.call(AbstractGatewaySenderEventProcessor.java:1387)}}
{{        at 
java.util.concurrent.FutureTask.run(java.base@11.0.11/FutureTask.java:264)}}
{{        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.11/ThreadPoolExecutor.java:1128)}}
{{        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.11/ThreadPoolExecutor.java:628)}}
{{        at java.lang.Thread.run(java.base@11.0.11/Thread.java:829)}}

 

Thread dump section from blocked call to AbstractGatewaySender.distribute() 
trying to acquire the lifecycle lock:

{{"P2P message reader for 
192.168.78.164(eri

[jira] [Updated] (GEODE-10403) Distributed deadlock when stopping gateway sender

2022-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated GEODE-10403:
---
Labels: needsTriage pull-request-available  (was: needsTriage)

> Distributed deadlock when stopping gateway sender
> -
>
> Key: GEODE-10403
> URL: https://issues.apache.org/jira/browse/GEODE-10403
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Affects Versions: 1.12.9, 1.13.8, 1.14.4, 1.15.0
>Reporter: Alberto Gomez
>Assignee: Alberto Gomez
>Priority: Major
>  Labels: needsTriage, pull-request-available
>
> A distributed deadlock has been found during tests of a Geode system with WAN 
> replication, when stopping the gateway sender while a fair amount of 
> operations were being sent to the servers.
> The distributed deadlock manifests as the gateway sender stop command hanging 
> forever and all normal Geode operations from clients (gets, puts, ...) going 
> unanswered.
> The situation is provoked by the gateway sender stop command, which first 
> takes the lifecycle lock and then, at a given point, tries to retrieve the 
> size of the gateway sender queue. This operation, which requires communication 
> with the other peers, never finishes, probably because the response from one 
> of the peers is never received.
> Another thread is blocked trying to acquire the lifecycle lock in 
> AbstractGatewaySender.distribute().
> Finally, many threads handling Geode operations (get, put, ...) are blocked in 
> the DistributedCacheOperation._distribute() call waiting for a response from 
> another peer.
> Thread dump section from blocked gateway sender stop command in call to get 
> size of queue:
> {{"ConcurrentParallelGatewaySenderEventProcessor Stopper Thread1" #1316 
> daemon prio=10 os_prio=0 cpu=45.55ms elapsed=4152.76s tid=0x7f92bc1c2000 
> nid=0x2154 waiting on condition  [0x7f9179cd2000]}}
> {{   java.lang.Thread.State: TIMED_WAITING (parking)}}
> {{        at jdk.internal.misc.Unsafe.park(java.base@11.0.11/Native Method)}}
> {{        - parking to wait for  <0x00031ca2be50> (a 
> java.util.concurrent.CountDownLatch$Sync)}}
> {{        at 
> java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.11/LockSupport.java:234)}}
> {{        at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(java.base@11.0.11/AbstractQueuedSynchronizer.java:1079)}}
> {{        at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(java.base@11.0.11/AbstractQueuedSynchronizer.java:1369)}}
> {{        at 
> java.util.concurrent.CountDownLatch.await(java.base@11.0.11/CountDownLatch.java:278)}}
> {{        at 
> org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)}}
> {{        at 
> org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:731)}}
> {{        at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:802)}}
> {{        at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:779)}}
> {{        at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:865)}}
> {{        at 
> org.apache.geode.internal.cache.partitioned.SizeMessage$SizeResponse.waitBucketSizes(SizeMessage.java:344)}}
> {{        at 
> org.apache.geode.internal.cache.PartitionedRegion.getSizeRemotely(PartitionedRegion.java:6758)}}
> {{        at 
> org.apache.geode.internal.cache.PartitionedRegion.entryCount(PartitionedRegion.java:6709)}}
> {{        at 
> org.apache.geode.internal.cache.PartitionedRegion.entryCount(PartitionedRegion.java:6691)}}
> {{        at 
> org.apache.geode.internal.cache.PartitionedRegion.getRegionSize(PartitionedRegion.java:6663)}}
> {{        at 
> org.apache.geode.internal.cache.LocalRegionDataView.entryCount(LocalRegionDataView.java:99)}}
> {{        at 
> org.apache.geode.internal.cache.LocalRegion.entryCount(LocalRegion.java:2078)}}
> {{        at 
> org.apache.geode.internal.cache.LocalRegion.size(LocalRegion.java:8301)}}
> {{        at 
> org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue.size(ParallelGatewaySenderQueue.java:1670)}}
> {{        at 
> org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.closeProcessor(AbstractGatewaySenderEventProcessor.java:1259)}}
> {{        at 
> org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.stopProcessing(AbstractGatewaySenderEventProcessor.java:1247)}}
> {{        at 
> org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor$SenderStopperCallable.call(AbstractGatewaySenderEventProcessor.java:1399)}}
> {{        at 
> org.ap