Passing the DUnit tests is not enough. It might only mean we don't have enough test coverage.
We need to inspect the code to see what the behavior will be when two servers are configured with different conserve-sockets values.

On 11/20/20, 3:30 PM, "Donal Evans" <doev...@vmware.com> wrote:

Regarding behaviour during RollingUpgrade; I created a draft PR with this change to test the feasibility and see what problems, if any, would be caused by tests assuming the default setting to be true. After fixing two DUnit tests that were not explicitly setting the value of conserve-sockets to true, no test failures were observed. I also ran a large suite of proprietary tests that include rolling upgrade and observed no problems there. This doesn't mean that there would definitely be no problems caused by this change, but I can at least say that none of the testing we currently have showed any problems.

________________________________
From: Anthony Baker <bak...@vmware.com>
Sent: Friday, November 20, 2020 8:52 AM
To: dev@geode.apache.org <dev@geode.apache.org>
Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to false

Question: how would this work with a rolling upgrade? If the user did not set this property and we changed the default, I believe that we would prevent the upgraded member from rejoining the cluster. Of course the user could explicitly set this property as you point out.

Anthony

> On Nov 20, 2020, at 8:49 AM, Donal Evans <doev...@vmware.com> wrote:
>
> While I agree that the potential impact of having the setting changed out from under a user may be high, the cost of addressing that change is very small. All users have to do is explicitly set the conserve-sockets value to true if they were previously using the default, and they will be back to where they started with no change in behaviour or resource requirements. This could be as simple as adding a single line to a properties file, which seems like a pretty small inconvenience.
>
> ________________________________
> From: Anthony Baker <bak...@vmware.com>
> Sent: Thursday, November 19, 2020 5:57:33 PM
> To: dev@geode.apache.org <dev@geode.apache.org>
> Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to false
>
> I think there are many good reasons to flip the default value for this property. I do question whether requiring a user to allocate new hardware to support the changed resource requirements is appropriate for a minor version bump. In most cases I think that would come as an unwelcome surprise during the upgrade.
>
> Anthony
>
>> On Nov 19, 2020, at 10:42 AM, Dan Smith <dasm...@vmware.com> wrote:
>>
>> Personally, this has caused enough grief in the past (both ways, actually!) that I'd say this is a major version change.
>> I agree with John. Either value of conserve-sockets can crash or hang your system depending on your use case.
>>
>> If this were just a matter of slowing down or speeding up performance, I think we could change it. But users that are impacted won't just see their system slow down. It will crash or hang. Potentially only with production sized workloads.
>>
>> With conserve-sockets=false every thread on the server creates its own sockets to other servers.
>> With N servers that's N sockets per thread. With our default of a max of 800 threads for client connections and a 20-server cluster, you are looking at a worst case of 800 * 20 = 16K sending sockets per server, with another 16K receiving sockets and 16K receiving threads. That's before considering function execution threads, WAN receivers, and various other executors we have on the server. Users with too many threads will hit their file descriptor or thread limits. Or they will run out of memory for thread stacks, socket buffers, etc.
>>
>> -Dan
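
To make the worst-case numbers in Dan's reply concrete, here is a rough back-of-the-envelope sketch. It is purely illustrative (the class and variable names are made up for this sketch); the thread count and cluster size are the figures quoted above, and it deliberately ignores the function execution threads, WAN receivers, and other executors Dan mentions:

    public class SocketEstimate {
        public static void main(String[] args) {
            // Figures quoted in Dan's reply: a default max of 800 threads for
            // client connections and a 20-server cluster.
            int clientConnectionThreads = 800;
            int servers = 20;

            // With conserve-sockets=false each of those threads can open its own
            // socket to the other servers, so the worst case per server is roughly:
            int sendingSockets = clientConnectionThreads * servers; // 800 * 20 = 16,000
            int receivingSockets = sendingSockets;                  // the matching receive side
            int receivingThreads = sendingSockets;                  // one reader thread per receiving socket

            System.out.printf("worst-case sending sockets per server:   %,d%n", sendingSockets);
            System.out.printf("worst-case receiving sockets per server: %,d%n", receivingSockets);
            System.out.printf("worst-case receiving threads per server: %,d%n", receivingThreads);
        }
    }

Those roughly 32K sockets plus 16K reader threads per server are where the file descriptor, thread, and memory pressure Dan describes comes from.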
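
Donal's point about explicitly pinning the old behaviour is also easy to illustrate. The following is a minimal sketch, not taken from the thread (the class name is illustrative): it assumes the java.util.Properties-based CacheFactory constructor, and the equivalent gemfire.properties change is the single line shown in the comment.

    import java.util.Properties;

    import org.apache.geode.cache.Cache;
    import org.apache.geode.cache.CacheFactory;

    public class ConserveSocketsPinned {
        public static void main(String[] args) {
            // Equivalent to adding the single line
            //     conserve-sockets=true
            // to gemfire.properties, so a release that flips the default
            // leaves this member's behaviour unchanged.
            Properties props = new Properties();
            props.setProperty("conserve-sockets", "true");

            Cache cache = new CacheFactory(props).create();
            try {
                // normal server/cache work goes here
            } finally {
                cache.close();
            }
        }
    }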