Passing dunit tests is not enough. It might only mean we don't have enough test 
coverage. 

We need to inspect the code to see what the behavior will be when two servers 
are configured with different conserve-sockets values.
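
To make that mixed-configuration scenario concrete, here is a minimal sketch 
(the startServer helper is hypothetical; the CacheFactory API and property 
name are standard Geode) of how two members could end up with different 
settings during a rolling upgrade:

    import java.util.Properties;
    import org.apache.geode.cache.Cache;
    import org.apache.geode.cache.CacheFactory;

    public class ConserveSocketsExample {
      // Illustration only: a member that sets conserve-sockets explicitly
      // keeps that value; a member that omits it inherits its own version's
      // default -- so during a rolling upgrade, one server could run with
      // true and another with false once the default changes.
      static Cache startServer(String conserveSockets) {
        Properties props = new Properties();
        if (conserveSockets != null) {
          props.setProperty("conserve-sockets", conserveSockets); // explicit
        }
        return new CacheFactory(props).create(); // omitted -> version default
      }
    }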

On 11/20/20, 3:30 PM, "Donal Evans" <doev...@vmware.com> wrote:

    Regarding behaviour during rolling upgrade: I created a draft PR with this change to test the feasibility and see what problems, if any, would be caused by tests assuming the default setting to be true. After fixing two DUnit tests that were not explicitly setting the value of conserve-sockets to true, no test failures were observed. I also ran a large suite of proprietary tests that include rolling upgrade and observed no problems there. This doesn't mean that there would definitely be no problems caused by this change, but I can at least say that none of the testing we currently have showed any problems.
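
    For illustration (a sketch, not the actual PR diff), a DUnit test that depends on the old behaviour can pin the property explicitly instead of relying on the default:

        import static org.apache.geode.distributed.ConfigurationProperties.CONSERVE_SOCKETS;

        import java.util.Properties;

        public class PinnedPropertiesExample {
          // Build member properties that do not depend on the cluster-wide
          // default, so the test keeps passing if the default flips.
          static Properties memberProperties() {
            Properties props = new Properties();
            props.setProperty(CONSERVE_SOCKETS, "true"); // explicit, default-proof
            return props;
          }
        }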
    ________________________________
    From: Anthony Baker <bak...@vmware.com>
    Sent: Friday, November 20, 2020 8:52 AM
    To: dev@geode.apache.org <dev@geode.apache.org>
    Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to false

    Question: how would this work with a rolling upgrade? If the user did not set this property and we changed the default, I believe we would prevent the upgraded member from rejoining the cluster.

    Of course the user could explicitly set this property as you point out.


    Anthony


    > On Nov 20, 2020, at 8:49 AM, Donal Evans <doev...@vmware.com> wrote:
    >
    > While I agree that the potential impact of having the setting changed out from under a user may be high, the cost of addressing that change is very small. All users have to do is explicitly set the conserve-sockets value to true if they were previously using the default, and they will be back to where they started with no change in behaviour or resource requirements. This could be as simple as adding a single line to a properties file, which seems like a pretty small inconvenience.
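
For reference, that one-line opt-out in each member's gemfire.properties would be:

    conserve-sockets=true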
    >
    > ________________________________
    > From: Anthony Baker <bak...@vmware.com>
    > Sent: Thursday, November 19, 2020 5:57:33 PM
    > To: dev@geode.apache.org <dev@geode.apache.org>
    > Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to false
    >
    > I think there are many good reasons to flip the default value for this property. I do question whether requiring a user to allocate new hardware to support the changed resource requirements is appropriate for a minor version bump. In most cases I think that would come as an unwelcome surprise during the upgrade.
    >
    > Anthony
    >
    >> On Nov 19, 2020, at 10:42 AM, Dan Smith <dasm...@vmware.com> wrote:
    >>
    >> Personally, this has caused enough grief in the past (both ways, actually!) that I'd say this is a major version change.
    >> I agree with John. Either value of conserve-sockets can crash or hang your system depending on your use case.
    >>
    >> If this were just a matter of slowing down or speeding up performance, I think we could change it. But users who are impacted won't just see their system slow down; it will crash or hang, potentially only with production-sized workloads.
    >>
    >> With conserve-sockets=false, every thread on the server creates its own sockets to other servers. With N servers, that's N sockets per thread. With our default of a max of 800 threads for client connections and a 20-server cluster, you are looking at a worst case of 800 * 20 = 16K sending sockets per server, with another 16K receiving sockets and 16K receiving threads. That's before considering function execution threads, WAN receivers, and various other executors we have on the server. Users with too many threads will hit their file descriptor or thread limits, or they will run out of memory for thread stacks, socket buffers, etc.
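
As a back-of-the-envelope check of that arithmetic (the 800-thread and 20-server figures come from the example above; the breakdown into sending/receiving counts mirrors it):

    public class SocketEstimate {
      public static void main(String[] args) {
        // Worst case with conserve-sockets=false, per the example above.
        int clientThreads = 800;                      // default max client-connection threads
        int servers = 20;                             // cluster size in the example
        int sendingSockets = clientThreads * servers; // 16,000 per server
        int receivingSockets = sendingSockets;        // one peer socket per sender
        int receivingThreads = sendingSockets;        // thread-per-socket reads
        System.out.printf("send=%,d receive=%,d reader-threads=%,d%n",
            sendingSockets, receivingSockets, receivingThreads);
      }
    }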
    >>
    >> -Dan
    >>
    >

