Re: On conserve-sockets=true with WAN and/or transactions - Follow-up on April's Geode Community Meeting

2022-04-15 Thread Barry Oglesby
Alberto,

I can only speak to the WAN question in your email. The conserve-sockets 
setting was (or is) a limitation on serial WAN, but I just ran a few tests, and 
it is not deadlocking. It's been a while since I've tried serial WAN with 
conserve-sockets=true, but I'm pretty sure a test with several servers in each 
site and a multi-threaded client doing puts would cause the deadlock. That is 
not happening in my tests. We would need way more than a few simple tests to 
prove that it doesn't deadlock in other scenarios, though.

Barry

From: Alberto Gomez 
Sent: Friday, April 8, 2022 4:17 AM
To: dev@geode.apache.org ; u...@geode.apache.org 

Subject: On conserve-sockets=true with WAN and/or transactions - Follow-up on 
April's Geode Community Meeting

Hi,

Following up on the discussion we had yesterday in the Apache Geode Community 
meeting around the "Reflections on conserve-sockets setting in Apache Geode" 
topic, I'd like to post here some questions that could not be fully answered 
during the meeting:

The Geode documentation states the following about conserve-sockets and WAN 
deployments in [1]:
"WAN deployments increase the messaging demands on a Geode system. To avoid 
hangs related to WAN messaging, always set `conserve-sockets=false` for Geode 
members that participate in a WAN deployment."

It also states the following about conserve-sockets and transactions in [2]:
"When you have transactions operating on EMPTY, NORMAL or PARTITION regions, 
make sure that conserve-sockets is set to false to avoid distributed deadlocks."
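
For reference, here is a minimal sketch of the setting that both passages recommend, assuming it is supplied as an ordinary member property (for example through a gemfire.properties file). The class and method names are hypothetical and only illustrate the idea; the only thing taken from the documentation quoted above is the property name `conserve-sockets` and the value `false`.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of Geode: it only builds and prints the property.
public class ConserveSocketsConfig {

    // Member properties with conserve-sockets disabled, as the docs advise
    // for WAN members and for transactions on the listed region types.
    static Properties memberProperties() {
        Properties props = new Properties();
        props.setProperty("conserve-sockets", "false");
        return props;
    }

    // Renders the properties in java.util.Properties file format.
    static String render(Properties props) {
        StringWriter out = new StringWriter();
        try {
            props.store(out, "hypothetical gemfire.properties fragment");
        } catch (IOException e) {
            // A StringWriter does not fail in practice; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(memberProperties()));
    }
}
```

The rendered text could be saved as a properties file for each member, or the same `Properties` object could be handed to whatever code starts the cache. The resource cost of this setting is what the questions below are about.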

Doing a search on the Geode tests, the only test case related to deadlocks with 
conserve-sockets=true that I have found is:
https://github.com/apache/geode/blob/41eb49989f25607acfcbf9ac5afe3d4c0721bb35/geode-wan/src/distributedTest/java/org/apache/geode/internal/cache/wan/serial/SerialGatewaySenderDistributedDeadlockDUnitTest.java#L176
The comments in the test say that it always causes a distributed deadlock and 
that it is commented out for that reason. However, the test case is actually 
NOT commented out and, if you execute it, it passes without any failure or 
deadlock.

And here are the questions:

Could it be that deadlocks with conserve-sockets=true and WAN and/or 
transactions over partitioned regions were a legacy issue that has already 
been fixed?

Otherwise, could someone please provide some more information about why these 
deadlocks could happen? It would be great if there were test cases that 
showcase this possibility.

It looks like a big limitation of Geode that you are forced to set 
conserve-sockets to false (with the implications this has on resource usage) 
when you are using WAN replication and/or transactions on partitioned regions.

Could it be that there are other elements (for example also using 
CacheListeners as Anthony Baker pointed out) that would increase the risk of 
hitting a distributed deadlock?

Thanks in advance,

Alberto

[1]: 
https://geode.apache.org/docs/guide/114/managing/monitor_tune/sockets_and_gateways.html

[2]: 
https://geode.apache.org/docs/guide/114/managing/monitor_tune/performance_controls_controlling_socket_use.html





Re: [discuss] Future of the geode-redis module

2022-04-15 Thread Dave Barnes
I regularly contribute to the documentation, and I support this change.
Removal of the experimental Redis module would lighten the burden for those
who maintain the Geode documentation.

On Thu, Apr 14, 2022 at 10:39 PM Owen Nichols wrote:

> Well said, Donal.
>
> As a past Release Manager, geode-for-redis impacted release timelines for
> 1.13, 1.14, and most recently 1.15.  I support this proposal in hopes that
> refocusing on core Geode makes future releases a little more predictable.
>
> Now might also be a good time to review other features in Geode that are
> marked @Experimental, such as manageability REST API, JDBC connector,
> micrometer, SSLParameterExtension, GfshCommand, or AutoBalancer.  I hope
> this proposal encourages more proposals to either shelve or promote some of
> these other experiments.
>
> From: Donal Evans 
> Date: Thursday, April 14, 2022 at 9:10 PM
> To: dev@geode.apache.org 
> Subject: Re: [discuss] Future of the geode-redis module
> I agree with this proposal.
>
> As someone who has contributed a great deal of time and effort to the
> geode-for-redis module in the past year or so, it's sad to see it go, but
> realistically, the last thing that Geode needs is more unmaintained code.
> Our track record for removing dead or deprecated code is pretty poor, so
> removing this before it gets a chance to add to the burden seems like a
> sensible idea. In addition to this, the tests in the geode-for-redis module
> contribute significantly to the total run time of CI pipeline jobs,
> particularly the acceptance tests, in which geode-for-redis tests account
> for over 50% of the total run time.
> 
> From: Alexander Murmann 
> Sent: Thursday, April 14, 2022 5:15 PM
> To: geode 
> Subject: [discuss] Future of the geode-redis module
>
> Hi Geode community!
>
> A while ago, engineers from VMware started to improve the geode-redis
> module, which has been in Geode for several years but had never been
> completed. This involved adding more tests for existing functionality, as
> well as adding support for more commands.
>
> VMware will not continue this investment in the geode-redis module. While
> the geode-redis module is in a much better place than two years ago, there
> is still a lot of work left to make it a viable option for most of our
> users, and the module is still in an experimental state.
>
> For the community this poses the question of what we want to do with the
> existing code. This work has now been started and stopped twice. All code
> comes with a maintenance burden which adds up
> <https://ben.balter.com/2016/07/21/removing-a-feature-is-a-feature/>.
> Dependencies might need to be updated and flaky tests fixed, and it brings
> cognitive overhead for us and our users. Unless someone else in the
> community steps up to take on the task of maintaining the geode-redis
> module, I propose to remove the code from our develop branch and give
> everyone more bandwidth to use elsewhere.
>
> Thank you all in advance for your thoughts
>