Re: Questions about conserve-sockets and WAN replication

2021-08-26 Thread Alberto Gomez
@Dave Barnes, sorry for not answering your e-mail sooner.

The following is missing from the referenced documentation:

  *   State that conserve-sockets must be set to false for members that 
participate in a WAN deployment, as is already stated in other parts of the 
documentation (see 
./geode-docs/topologies_and_comm/multi_site_configuration/setting_up_a_multisite_system.html.md.erb,
./geode-docs/managing/monitor_tune/sockets_and_gateways.html.md.erb and
./geode-docs/reference/topics/gemfire_properties.html.md.erb):
  *   "To avoid hangs related to WAN messaging, always use the default setting 
of conserve-sockets=false for 
<%=vars.product_name%> members that participate in a WAN deployment."
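
For illustration, here is a minimal Java sketch of pinning the setting when a WAN member's properties are assembled. Only the property name `conserve-sockets` comes from the documentation quoted above. The class and method names (`WanMemberProperties`, `forWanMember`, `requireDedicatedSockets`) are hypothetical, and the sketch stops short of starting a cache.

```java
import java.util.Properties;

// Sketch only: assembles the properties a WAN member could be started with.
// All class and method names here are hypothetical, not part of Geode.
final class WanMemberProperties {

  // Property name as used in the Geode documentation quoted in this thread.
  static final String CONSERVE_SOCKETS = "conserve-sockets";

  // Returns properties with conserve-sockets explicitly set to false.
  static Properties forWanMember() {
    Properties props = new Properties();
    props.setProperty(CONSERVE_SOCKETS, "false");
    return props;
  }

  // Hypothetical guard: fails fast unless the value is explicitly "false",
  // so the check does not depend on what the product default happens to be.
  static void requireDedicatedSockets(Properties props) {
    String value = props.getProperty(CONSERVE_SOCKETS);
    if (!"false".equalsIgnoreCase(value)) {
      throw new IllegalStateException(
          CONSERVE_SOCKETS + " must be false for WAN members, but was: " + value);
    }
  }

  public static void main(String[] args) {
    Properties props = forWanMember();
    requireDedicatedSockets(props);
    // A real member would hand these properties to Geode when creating the cache.
    System.out.println(CONSERVE_SOCKETS + "=" + props.getProperty(CONSERVE_SOCKETS));
  }
}
```

The guard is just one way to catch a misconfigured member at startup. As noted later in this message, function execution reportedly does not honor the setting, so the check says nothing about that code path.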

@dev@geode.apache.org, besides the above, I think 
the documentation is missing a very important piece of information that I have 
found in [1]:
"even with conserve-sockets set to false, function executions do not use this 
setting and defaults to conserve-sockets=true behavior, regardless of the 
conserve-sockets setting. "
and in [2]:
"a Function Execution Processor does not honor the conserve-sockets setting so 
a shared P2P message reader is used in the remote server"

I wonder if this should be stated in the Geode documentation or rather if 
function execution behavior should be changed to honor the conserve-sockets 
setting.
Any thoughts on this?

Best regards,

Alberto

[1] 
https://community.pivotal.io/s/article/GemFire-Function-Executions-and-conserve-sockets-behavior?language=en_US
[2] 
https://medium.com/swlh/threads-used-in-apache-geode-function-execution-9dd707cf227c#bd8c




From: Dave Barnes 
Sent: Wednesday, July 7, 2021 1:05 AM
To: dev@geode.apache.org 
Subject: Re: Questions about conserve-sockets and WAN replication

Alberto,
I recently updated some of the descriptions regarding conserve-sockets.
Please check out this PR and see if it addresses any of your concerns.
https://github.com/apache/geode/pull/6516

On Tue, Jul 6, 2021 at 9:57 AM Alberto Gomez wrote:

> Hi,
>
> The Geode documentation states the following about conserve-sockets and
> WAN deployments in [1]:
>
> "WAN deployments increase the messaging demands on a Geode system. To
> avoid hangs related to WAN messaging, always set `conserve-sockets=false`
> for Geode members that participate in a WAN deployment."
>
> Could anyone please provide some more detailed information about why and
> where these hangs could happen? Is this a hard limitation or something to
> be considered under certain circumstances?
>
> We have run into an unexpected situation which we wonder if it is related
> to the documentation statement above:
>
> In a system like the following:
>  - 2 WAN sites and 3 servers each
>  - several partitioned regions with parallel senders
>  - several replicated regions with serial senders
>  - conserve-sockets set to true
>
> We have sometimes observed, when trying to stop a parallel gateway sender
> while puts are being sent to both sites, that the thread stopping the
> gateway sender in one of the members gets stuck waiting to receive a reply
> from the other members (trying to get the size of the queue, see [2]). We
> see also other threads stuck, some trying to get a lock held by the stuck
> thread and others waiting in
> ReplyProcessor21.waitForRepliesUninterruptibly() trying to put or get data
> remotely (See [3] and [4]).
> If we set conserve-sockets to false we do not experience any hang.
>
> Could these stuck threads be related to what is stated in the
> documentation about WAN deployments and conserve-sockets set to true or
> should we rather think that it is an unrelated bug that needs to be solved?
>
> Thanks in advance for your help,
>
> Alberto
>
> [1]
> https://geode.apache.org/docs/guide/113/managing/monitor_tune/sockets_and_gateways.html
>
> [2]
> "ConcurrentParallelGatewaySenderEventProcessor Stopper Thread1" #1316 daemon prio=10 os_prio=0 cpu=18.86ms elapsed=1544.80s tid=0x7f92bc1c2000 nid=0x2154 waiting on condition  [0x7f9179cd2000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
> at jdk.internal.misc.Unsafe.park(java.base@11.0.11/Native Method)
> - parking to wait for  <0x00031ca2be50> (a java.util.concurrent.CountDownLatch$Sync)
> at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.11/LockSupport.java:234)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(java.base@11.0.11/AbstractQueuedSynchronizer.java:1079)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(java.base@11.0.11/AbstractQueuedSynchronizer.java:1369)
> at java.util.concurrent.CountDownLatch.await(java.base@11.0.11/CountDownLatch.java:278)
> at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
> at
> org.apache.geode

Re: Questions about conserve-sockets and WAN replication

2021-08-26 Thread Dave Barnes
Alberto,
As you point out, the recommendation to use `conserve-sockets=false` in WAN
configurations already appears in at least three places in the Geode User
Guide.
We can insert an additional mention into the guide -- did you have a
location in mind? (Sorry, this is an old thread and I don't recall whether
we already discussed this.)

With regard to function execution ignoring the global setting, I suspect
that changing function behavior would break existing applications, so that
is probably not an option here.
Again, if you help me identify locations in the doc where you think we need
to insert a note regarding this behavior, we can do that.

Let's state these changes in one or two JIRA tickets for the docs
component. I'm happy to work with you on creating that ticket, but need
your help in identifying target locations in the guide.
Thanks,
Dave



Re: [INFO] Apache Geode 1.14.0 Release Manager

2021-08-26 Thread Nabarun Nag
Hi Alberto,

We are starting tests this weekend and, provided the tests all pass, we 
should have a release candidate for voting next week. I will keep 
you updated!

Regards
Nabarun

From: Alberto Gomez 
Sent: Tuesday, August 24, 2021 2:58 AM
To: dev@geode.apache.org 
Subject: Re: [INFO] Apache Geode 1.14.0 Release Manager

Hi Naba,

Is there any new information about the 1.14 Geode release?

Thanks in advance,

Alberto

From: Nabarun Nag 
Sent: Tuesday, June 1, 2021 6:46 PM
To: dev@geode.apache.org 
Subject: Re: [INFO] Apache Geode 1.14.0 Release Manager

Hi Alberto,

For releasing 1.14 we are waiting on two more backports.

GEODE-8609 - DUnit runners were not checking the logs for suspicious 
errors/fatal messages in VMs that were being restarted in a test. A fix is 
ready and is in the PR review stage. [1]

GEODE-9289 - A problem with Cluster Configuration in which a new locator sends 
its configuration to an old locator that is unable to deserialize it, which 
causes NPEs and clears out certain fields in the configuration. A PR is ready 
and in review, but it needs GEODE-8609 checked in first. [2]

Once these two PRs are merged and backported, we are going to start the process 
for voting on the release branch and release candidates. My personal estimate 
is that we can start the voting process within the next week, if we do not 
detect any serious issues.

Please do reach out if you require any additional information.

Regards,
Nabarun

[1] https://github.com/apache/geode/pull/6526
[2] https://github.com/apache/geode/pull/6495

From: Alberto Gomez 
Sent: Monday, May 31, 2021 8:08 AM
To: dev@geode.apache.org 
Subject: Re: [INFO] Apache Geode 1.14.0 Release Manager

Hi Naba,

Can you please provide some information about how the 1.14 release is going and 
whether there is a planned date for it?

Thanks in advance,

Alberto

From: Nabarun Nag 
Sent: Monday, March 22, 2021 5:27 PM
To: dev@geode.apache.org 
Subject: [INFO] Apache Geode 1.14.0 Release Manager

Hi everyone,

I hope you all are doing well. This is to inform the Apache Geode community 
that I will be volunteering as the Release Manager for the 1.14.0 release. Thank 
you, Owen, for all the work that has been done to get the release to this point.

As for backporting, as a developer, you just need to create a PR against the 
support/1.14 branch, and you are done. As a release manager, I will take over 
from there.

Just ensure the following:

  *   The PR is a cherry-pick (cherry-pick -x) of a commit that is already in 
develop
  *   Ensure that there are no merge conflicts.

Regards
Nabarun Nag



Revert GEODE-9408: Avoid duplicate events sent by Serial Gateway Sender when group-transaction-events is true (#6663)

2021-08-26 Thread Kirk Lund
I have submitted PR #6809 to temporarily revert GEODE-9408 until it's ready
to go back in.

https://github.com/apache/geode/pull/6809

I've reviewed the code that went in with #b377e3, and I think we should
revert it to rework WanCopyRegionFunction and its unit test. The unit test
currently mocks three methods in WanCopyRegionFunction (known as partial
mocking) which is a sign that dependencies are hidden inside the class and
need to be pulled up to the constructor.

WanCopyRegionFunction also has a static final ExecutorService which is
never shut down. The ExecutorService should live outside
WanCopyRegionFunction, be passed in via the constructor, and be shut down
when the Cache closes.
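
As a rough sketch of that shape (not the actual WanCopyRegionFunction code), the executor can be owned by something with a lifecycle and handed to the function through its constructor. Every name below (`CopyFunctionSketch`, `ExecutorOwnerSketch`, `ExecutorInjectionDemo`) is hypothetical, and the owner merely stands in for whatever would tie shutdown to the Cache closing.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a function that runs work asynchronously.
final class CopyFunctionSketch {
  private final ExecutorService executor;

  // The executor is supplied by the caller instead of living in a static final field,
  // so a unit test can pass in its own executor without partial mocking.
  CopyFunctionSketch(ExecutorService executor) {
    this.executor = executor;
  }

  Future<Integer> copy(int entries) {
    return executor.submit(() -> entries); // placeholder for the real copy work
  }
}

// Hypothetical owner that plays the role of the cache lifecycle.
final class ExecutorOwnerSketch implements AutoCloseable {
  final ExecutorService executor = Executors.newSingleThreadExecutor();

  @Override
  public void close() {
    executor.shutdown();
    try {
      executor.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}

final class ExecutorInjectionDemo {
  // Runs one task through the injected executor, closes the owner,
  // and reports whether the executor was shut down as a result.
  static boolean run() {
    ExecutorOwnerSketch owner = new ExecutorOwnerSketch();
    try {
      CopyFunctionSketch function = new CopyFunctionSketch(owner.executor);
      if (function.copy(3).get(5, TimeUnit.SECONDS) != 3) {
        return false;
      }
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      owner.close();
    }
    return owner.executor.isShutdown();
  }

  public static void main(String[] args) {
    System.out.println("executor shut down: " + run());
  }
}
```

With the dependency visible in the constructor, a test can supply a same-thread or otherwise controllable executor, and nothing outlives the component that created it.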

The intermittent test failures are not specific to Windows and are likely
to occur on a slow Linux machine as well.

We really need to avoid thread sleeps, especially in unit tests. The sleeps
and the partial mocking are signs that the class needs to change to better
enable unit testing.
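
To illustrate one common alternative to sleeping (a general technique, not what the reverted test does), a test can block on an explicit completion signal with a timeout. The class and method names in this sketch are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical example: wait for background work via a latch rather than Thread.sleep.
final class LatchInsteadOfSleep {

  // Returns true if the background task signalled completion within the timeout,
  // false if the timeout elapsed or the waiting thread was interrupted.
  static boolean awaitBackgroundWork(long timeoutMillis) {
    CountDownLatch done = new CountDownLatch(1);
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      executor.execute(done::countDown); // placeholder work that signals when finished
      return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("completed in time: " + awaitBackgroundWork(5_000));
  }
}
```

The wait returns as soon as the work completes, so the timeout can be generous enough for a slow machine without slowing the test down on a fast one.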

I'll be more than happy to help but I think we need to revert it so we have
time to discuss it without everyone else being affected by failing tests.

Thanks,
Kirk