"create region" cmd stuck on wan setup

2021-07-28 Thread Alberto Bustamante Reyes
Hi Geode devs,

I have been analyzing an issue that occurs in the following scenario:

1) I start two Geode clusters (cluster1 & cluster2) with one locator and two 
servers each.
Both clusters host a partitioned region called "testregion", which is 
replicated using a parallel gateway sender and a gateway receiver.
These are the gfsh files I have been using to create the clusters: 
https://gist.github.com/alb3rtobr/e230623255632937fa68265f31e97f3a

2) I run a client connected to cluster2 performing operations on testregion.

3) cluster1 is stopped and all persistent data is deleted. Then I create 
cluster1 again.

4) At this point, the command to create "testregion" gets stuck.


After checking the thread stack and the code, I found that the problem is the 
following.

This thread is trapped in an infinite loop waiting for a bucket primary 
election at "PartitionedRegion.waitForNoStorageOrPrimary":


"Function Execution Processor4" tid=0x55
java.lang.Thread.State: TIMED_WAITING
at java.base@11.0.11/java.lang.Object.wait(Native Method)
-  waiting on org.apache.geode.internal.cache.BucketAdvisor@28be7ae0
at app//org.apache.geode.internal.cache.BucketAdvisor.waitForPrimaryMember(BucketAdvisor.java:1433)
at app//org.apache.geode.internal.cache.BucketAdvisor.waitForNewPrimary(BucketAdvisor.java:825)
at app//org.apache.geode.internal.cache.BucketAdvisor.getPrimary(BucketAdvisor.java:794)
at app//org.apache.geode.internal.cache.partitioned.RegionAdvisor.getPrimaryMemberForBucket(RegionAdvisor.java:1032)
at app//org.apache.geode.internal.cache.PartitionedRegion.getBucketPrimary(PartitionedRegion.java:9081)
at app//org.apache.geode.internal.cache.PartitionedRegion.waitForNoStorageOrPrimary(PartitionedRegion.java:3249)
at app//org.apache.geode.internal.cache.PartitionedRegion.getNodeForBucketWrite(PartitionedRegion.java:3234)
at app//org.apache.geode.internal.cache.PartitionedRegion.shadowPRWaitForBucketRecovery(PartitionedRegion.java:10110)
at app//org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR(ParallelGatewaySenderQueue.java:564)
at app//org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR(ParallelGatewaySenderQueue.java:443)
at app//org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderEventProcessor.addShadowPartitionedRegionForUserPR(ParallelGatewaySenderEventProcessor.java:195)
at app//org.apache.geode.internal.cache.wan.parallel.ConcurrentParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR(ConcurrentParallelGatewaySenderQueue.java:183)
at app//org.apache.geode.internal.cache.PartitionedRegion.postCreateRegion(PartitionedRegion.java:1177)
at app//org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3050)
at app//org.apache.geode.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:2910)
at app//org.apache.geode.internal.cache.GemFireCacheImpl.createRegion(GemFireCacheImpl.java:2894)
at app//org.apache.geode.cache.RegionFactory.create(RegionFactory.java:773)


After testregion is created, the sender queue partitioned region is created. 
While that region's buckets are being recovered, the command is trapped in an 
infinite loop waiting for a primary bucket election at 
PartitionedRegion.waitForNoStorageOrPrimary.

This seems to be a known issue, because in 
PartitionedRegion.getNodeForBucketWrite there is the following comment before 
the call to waitForNoStorageOrPrimary (and the comment has been there since 
Geode's first commit!):

// Possible race with loss of redundancy at this point.
// This loop can possibly create a soft hang if no primary is ever selected.
// This is preferable to returning null since it will prevent obtaining the
// bucket lock for bucket creation.
return waitForNoStorageOrPrimary(bucketId, "write");
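To make the "soft hang" described in that comment concrete, here is a minimal Java model of a guarded wait for a bucket primary. It is a hypothetical sketch, not Geode's BucketAdvisor or PartitionedRegion code: the class PrimaryWait and its methods are invented for illustration, and it assumes the waiter loops over timed Object.wait calls, which is consistent with the TIMED_WAITING state in the thread dump.

```java
import java.util.Optional;

/** Hypothetical model of waiting for a bucket primary; NOT Geode's implementation. */
public class PrimaryWait {
  private final Object lock = new Object();
  private String primary; // guarded by lock; null until a primary is elected

  /** Called by the (simulated) election logic once a member becomes primary. */
  public void electPrimary(String member) {
    synchronized (lock) {
      primary = member;
      lock.notifyAll();
    }
  }

  /**
   * Unbounded variant: repeats timed waits until a primary appears. The thread
   * shows up as TIMED_WAITING in a dump, and if no primary is ever elected it
   * never returns (a "soft hang").
   */
  public String waitForPrimaryForever() throws InterruptedException {
    synchronized (lock) {
      while (primary == null) {
        lock.wait(100);
      }
      return primary;
    }
  }

  /** Bounded variant: gives up after timeoutMs and lets the caller decide what to do. */
  public Optional<String> waitForPrimary(long timeoutMs) throws InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    synchronized (lock) {
      while (primary == null) {
        long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
        if (remainingMs <= 0) {
          return Optional.empty();
        }
        lock.wait(remainingMs);
      }
      return Optional.of(primary);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    PrimaryWait noElection = new PrimaryWait();
    System.out.println(noElection.waitForPrimary(200)); // Optional.empty

    PrimaryWait withElection = new PrimaryWait();
    new Thread(() -> withElection.electPrimary("server1_1")).start();
    System.out.println(withElection.waitForPrimary(5_000)); // Optional[server1_1]
  }
}
```

The unbounded variant can only return once some member becomes primary, so if no election ever happens the caller stays stuck. The bounded variant avoids that at the cost of reporting "no primary" to the caller, which, per the comment quoted above, Geode deliberately avoids because returning null would prevent obtaining the bucket lock for bucket creation.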

Any idea why the primary bucket is not elected?

The failure seems to be related to the fact that "testregion" is receiving 
updates from the receiver before the "create region" command has finished. If 
the test is repeated without traffic on cluster2, or if I create cluster1's 
receiver after creating "testregion", the problem does not occur.

Is there any recommendation on the startup order of regions, senders and 
receivers for a scenario like the one described?

Thanks in advance,

Alberto B.


Request for review of PR: GEODE-9408: Avoid duplicate events sent by Serial Gateway Sender when group-transaction-events is true

2021-07-28 Thread Alberto Gomez
Hi,

I would like to request the review of the following PR:

https://github.com/apache/geode/pull/6663 (GEODE-9408: Avoid duplicate events 
sent by Serial Gateway Sender when group-transaction-events is true).

Thanks in advance,

Alberto


Pending review from some code owners for PR linked to GEODE-9369: Command to copy region entries from a WAN site to another

2021-07-28 Thread Alberto Gomez
Hi,

The following PR https://github.com/apache/geode/pull/6601 has been approved 
by several code owners, but some code owners' reviews are still pending.

Could those who have not yet reviewed it please have a look?

Thanks in advance,

Alberto


Re: Permissions to comment on RFCs?

2021-07-28 Thread Dan Smith
Hi Mario.

You should have full access to the Geode wiki now.

Thanks,
-Dan

From: Mario Salazar de Torres 
Sent: Tuesday, July 27, 2021 11:12 PM
To: dev@geode.apache.org 
Subject: Re: Permissions to comment on RFCs?

Hi,

Thanks for clarifying it. My username is "mario.salazar.de.torres"

Thanks,
Mario.

From: Dan Smith 
Sent: Tuesday, July 27, 2021 11:52 PM
To: dev@geode.apache.org 
Subject: Re: Permissions to comment on RFCs?

You do need permissions. What is your Confluence username? If you don't have 
one, you can create an account. Then we can add you to the Geode project (we 
add anyone who asks :).

-Dan

From: Mario Salazar de Torres 
Sent: Tuesday, July 27, 2021 2:10 AM
To: dev@geode.apache.org 
Subject: Permissions to comment on RFCs?

Hi,

I've been trying to comment on this RFC: 
https://cwiki.apache.org/confluence/display/GEODE/On+Demand+Geode+Authentication+Expiration+and+Re-authentication
 but I was not able to.
Since I have never commented on an RFC, I wonder whether I need any special permissions.

Thanks,
Mario.


Re: [discuss] RFC for Geode Authentication Expiration and Re-Authentication

2021-07-28 Thread Mark Hanson
Hi Jinmei,

How do you intend to address the "register interests and CQ"? Is that by 
unregistering or queueing?

I think this RFC looks good. 

Thanks,
Mark

On 7/27/21, 2:26 PM, "Jinmei Liao"  wrote:

Calling for more feedback on this RFC. I will move it to “Under Development” 
at the end of this Thursday if there are no objections to its general direction.

Thanks!

From: Jinmei Liao 
Date: Thursday, July 22, 2021 at 5:37 PM
To: dev@geode.apache.org 
Subject: [discuss] RFC for Geode Authentication Expiration and 
Re-Authentication
Hi, Fellow devs,

Here is the feature proposal for this topic. Please review it and provide 
your feedback. Thanks!


https://cwiki.apache.org/confluence/display/GEODE/On+Demand+Geode+Authentication+Expiration+and+Re-authentication

Jinmei



Volunteers to be JIRA/confluence/mailing list admins?

2021-07-28 Thread Dan Smith
Hi all,


We have a couple of admin/human spam filter jobs that I think could use a few 
more volunteers.


  *   Confluence/JIRA admins - we have a process where we grant permission to 
these resources to anyone who asks for access on the mailing list. This could 
be any committer, or really any contributor we are comfortable giving admin 
access to our Confluence and/or JIRA projects.
  *   Mailing list moderators - these probably need to be PMC members, since you 
would moderate the private list.

I'd love to get some folks outside of the US time zones so we don't leave 
people outside the US waiting for a day if they need permission.

Any volunteers?

-Dan


Re: Volunteers to be JIRA/confluence/mailing list admins?

2021-07-28 Thread Alberto Bustamante Reyes
Hi,

I could help as a Confluence/JIRA admin.

Alberto B.

From: Dan Smith 
Sent: Wednesday, July 28, 2021 7:52:23 PM
To: dev@geode.apache.org 
Subject: Volunteers to be JIRA/confluence/mailing list admins?



[VOTE] Apache Geode 1.13.4.RC1

2021-07-28 Thread Dick Cavender
Hello Geode Dev Community,

This is a release candidate for Apache Geode version 1.13.4.RC1.
Thanks to all the community members for their contributions to this release!

Please do a review and give your feedback, including the checks you
performed.

Voting deadline:
3PM PST Fri, July 30 2021
*NOTE: THIS IS AN ABBREVIATED 2-DAY VOTE*

Please note that we are voting upon the source tag:
rel/v1.13.4.RC1

Release notes:
https://cwiki.apache.org/confluence/display/GEODE/Release+Notes#ReleaseNotes-1.13.4

Source and binary distributions:
https://dist.apache.org/repos/dist/dev/geode/1.13.4.RC1/

Maven staging repo:
https://repository.apache.org/content/repositories/orgapachegeode-1087

GitHub:
https://github.com/apache/geode/tree/rel/v1.13.4.RC1
https://github.com/apache/geode-examples/tree/rel/v1.13.4.RC1
https://github.com/apache/geode-native/tree/rel/v1.13.4.RC1
https://github.com/apache/geode-benchmarks/tree/rel/v1.13.4.RC1

Pipelines:
https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-support-1-13-main
https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-support-1-13-rc

Geode's KEYS file containing PGP keys we use to sign the release:
https://github.com/apache/geode/blob/develop/KEYS

Command to run geode-examples:
./gradlew -PgeodeReleaseUrl=https://dist.apache.org/repos/dist/dev/geode/1.13.4.RC1 \
  -PgeodeRepositoryUrl=https://repository.apache.org/content/repositories/orgapachegeode-1087 \
  build runAll

Regards
Dick Cavender


Re: [VOTE] Apache Geode 1.13.4.RC1

2021-07-28 Thread Nabarun Nag
+1 based on the following:

  *   Built from source
  *   Ran gfsh
  *   Started a 2-site WAN cluster with SSL security enabled
  *   Verified data propagation between the 2 sites using puts and gets
  *   Rolled clusters from 1.12 to the release candidate
  *   Ran rebalance operations during upgrades


From: Dick Cavender 
Sent: Wednesday, July 28, 2021 2:49 PM
To: dev@geode.apache.org 
Subject: [VOTE] Apache Geode 1.13.4.RC1


Re: [VOTE] Apache Geode 1.13.4.RC1

2021-07-28 Thread Donal Evans
+1

Confirmed that performance across a variety of workloads is on par with 
previous releases.

From: Dick Cavender 
Sent: Wednesday, July 28, 2021 2:49 PM
To: dev@geode.apache.org 
Subject: [VOTE] Apache Geode 1.13.4.RC1


Re: "create region" cmd stuck on wan setup

2021-07-28 Thread Anilkumar Gingade
The recommendation for a WAN setup is:
- Create/start WAN senders first
- Create regions
- Create/start WAN receivers last

That way, when the WAN receiver is started, the regions have already been 
created on all the sites. Sorry, I have not looked at your scripts...

-Anil.



On 7/28/21, 3:31 AM, "Alberto Bustamante Reyes" 
 wrote:


Re: "create region" cmd stuck on wan setup

2021-07-28 Thread Barrett Oglesby
I reproduced your issue with your scripts.

They do:

create gateway-receiver
create disk-store
create gateway-sender
create region

With that order, I see the hang you mentioned. I'm not 100% sure why that is 
happening, but you can prevent it by reordering these elements.

As Anil said, you should start your GatewayReceiver last, like this:

create disk-store
create gateway-sender
create region
create gateway-receiver

With that order, cluster1 restarts fine.

BTW 1: with the order you had, regardless of the hang, you'll see lots of 
dropped WAN events, since the region doesn't exist yet when the receiver is 
started:

[info 2021/07/28 17:02:39.795 PDT server1_1  tid=0x3c] The GatewayReceiver started on port : 5411

[warn 2021/07/28 17:02:39.883 PDT server1_1  tid=0x4a] Server connection from [identity(192.168.1.7(server2_2:25891):41005,connection=1; port=52554]: Caught exception processing batch create request 0 for 100 events
org.apache.geode.cache.RegionDestroyedException: Region /testregion was not found during batch create request 0
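The effect of creating the receiver first can be illustrated with a toy Java simulation. It is a hypothetical sketch (the class WanStartupOrder and its timing model are invented, not Geode code) and it only models the dropped events, not the hang: it assumes cluster2 delivers one batch after every gfsh step once cluster1's receiver exists, and counts a batch that targets a missing region as dropped, like the RegionDestroyedException above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical toy model, not Geode code: once the receiver exists, one WAN
 * batch arrives after every later setup step, and a batch for a region that
 * does not exist yet is dropped.
 */
public class WanStartupOrder {

  /** Returns how many events are dropped when cluster1 runs the given steps in order. */
  static int droppedEvents(List<String> steps, String regionName, int eventsPerBatch) {
    Set<String> regions = new HashSet<>();
    boolean receiverUp = false;
    int dropped = 0;
    for (String step : steps) {
      if (step.equals("create gateway-receiver")) {
        receiverUp = true;
      } else if (step.equals("create region")) {
        regions.add(regionName);
      }
      // Assumed timing: the remote site delivers one batch after each step once the receiver is up.
      if (receiverUp && !regions.contains(regionName)) {
        dropped += eventsPerBatch;
      }
    }
    return dropped;
  }

  public static void main(String[] args) {
    List<String> scriptOrder = List.of(
        "create gateway-receiver", "create disk-store", "create gateway-sender", "create region");
    List<String> recommendedOrder = List.of(
        "create disk-store", "create gateway-sender", "create region", "create gateway-receiver");

    System.out.println(droppedEvents(scriptOrder, "testregion", 100));      // 300
    System.out.println(droppedEvents(recommendedOrder, "testregion", 100)); // 0
  }
}
```

In this toy model the script order drops three batches (300 events), while creating the receiver last drops none.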

BTW 2: I use CacheCreation.create to see the order in which elements should be 
started. That's the object the old GemFire cache XML uses to start things in 
the right order.

Barry

From: Anilkumar Gingade 
Sent: Wednesday, July 28, 2021 3:45 PM
To: dev@geode.apache.org 
Subject: Re: "create region" cmd stuck on wan setup

On 7/28/21, 3:31 AM, "Alberto Bustamante Reyes" 
 wrote:
