Gester,

Clearly the prior implementation had some problems, but except in pathological 
cases it provided the behavior users expected.  That’s why I think we need a 
characterization test(s) to show exactly what we want the behavior to be.  
Merging in changes that make the user experience worse in the more common 
scenarios isn’t a good tradeoff IMO.  I see this work as integral to GEODE-5591 
and shouldn’t be deferred to a separate ticket.

Anthony


> On Sep 5, 2018, at 10:43 AM, Xiaojian Zhou <gz...@pivotal.io> wrote:
> 
> The fix intend to resolve 2 issues:
> 1) change the exception handling (for a linux version).
> 2) prevent random picking port number to loop forever. In old code, for
> example, if the range only contains one port, random will always pick the
> same port and it will loop forever. The fix will stop after all available
> ports in the range are tried. There's a test
> 
> test_ValidateGatewayReceiverAttributes_WrongBindAddress
> 
> 
> For 2-minute-wait, it's still possible. The fix did not resolve it
> (when random() happened to return same port for different receiver in
> the same member), but I did not make things worse either.
> 
> 
> There's discussion on if we can reduce the 2-minute-timeout to a few
> second. This is definitely another ticket.
> 
> Regards
> 
> Gester
> 
> 
> On Wed, Sep 5, 2018 at 10:35 AM, Anthony Baker <aba...@pivotal.io> wrote:
> 
>> Before this improvement is re-merged I’d like to see:
>> 
>> 1) A test that characterizes the current behavior (e.g. doesn’t wait 2 min
>> when there’s a port conflict)
>> 2) A test that demonstrates how the current logic is insufficient
>> 
>> Anthony
>> 
>> 
>>> On Sep 5, 2018, at 10:20 AM, Nabarun Nag <n...@apache.org> wrote:
>>> 
>>> GEODE-5591 has been reverted in develop
>>> ref: 901da27f227a8ce2b7d6b681619782a1accd9330
>>> 
>>> Regards
>>> Nabarun Nag
>>> 
>>> On Wed, Sep 5, 2018 at 10:14 AM Ryan McMahon <rmcma...@pivotal.io>
>> wrote:
>>> 
>>>> +1 for reverting in both places.
>>>> 
>>>> I see that there is already an isGatewayReceiver flag in the
>> AcceptorImpl
>>>> constructor.  It's not ideal, but could we use this flag to prevent the
>> 2
>>>> minute retry logic for happening if this flag is true?
>>>> 
>>>> Ryan
>>>> 
>>>> On Wed, Sep 5, 2018 at 10:01 AM, Lynn Hughes-Godfrey <
>>>> lhughesgodf...@pivotal.io> wrote:
>>>> 
>>>>> +1 for reverting in both places.
>>>>> 
>>>>> On Wed, Sep 5, 2018 at 9:50 AM, Dan Smith <dsm...@pivotal.io> wrote:
>>>>> 
>>>>>> +1 for reverting in both places. The current fix is not better, that's
>>>>> why
>>>>>> we are reverting it on the release branch!
>>>>>> 
>>>>>> -Dan
>>>>>> 
>>>>>> On Wed, Sep 5, 2018 at 9:47 AM, Jacob Barrett <jbarr...@pivotal.io>
>>>>> wrote:
>>>>>> 
>>>>>>> I’m not ok with reverting in develop. Revert in 1.7 and modify in
>>>>>> develop.
>>>>>>> We shouldn’t go backwards in develop. The current fix is better than
>>>>> the
>>>>>>> bug it fixes.
>>>>>>> 
>>>>>>>> On Sep 5, 2018, at 9:40 AM, Nabarun Nag <n...@apache.org> wrote:
>>>>>>>> 
>>>>>>>> If everyone is okay with it, I will revert that change in develop
>>>> and
>>>>>>> then
>>>>>>>> cherry pick it to release/1.7.0 branch.
>>>>>>>> Please do comment.
>>>>>>>> 
>>>>>>>> Regards
>>>>>>>> Nabarun Nag
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Wed, Sep 5, 2018 at 9:30 AM Dan Smith <dsm...@pivotal.io>
>>>> wrote:
>>>>>>>>> 
>>>>>>>>> +1 to yank it and rework the fix.
>>>>>>>>> 
>>>>>>>>> Gester's change helps, but it just means that you will sometimes
>>>>>>> randomly
>>>>>>>>> have a 2 minute delay starting up a gateway receiver. I don't
>>>> think
>>>>>>> that is
>>>>>>>>> a great user experience either.
>>>>>>>>> 
>>>>>>>>> -Dan
>>>>>>>>> 
>>>>>>>>> On Wed, Sep 5, 2018 at 8:20 AM, Bruce Schuchardt <
>>>>>>> bschucha...@pivotal.io>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Let's yank it
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On 9/4/18 5:04 PM, Sean Goller wrote:
>>>>>>>>>>> 
>>>>>>>>>>> If it's to get the release out, I'm fine with reverting. I don't
>>>>>> like
>>>>>>>>> it,
>>>>>>>>>>> but I'm not willing to die on that hill. :)
>>>>>>>>>>> 
>>>>>>>>>>> -S.
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, Sep 4, 2018 at 4:38 PM Dan Smith <dsm...@pivotal.io>
>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Spitting this into a separate thread.
>>>>>>>>>>>> 
>>>>>>>>>>>> I see the issue. The two minute timeout is the constructor for
>>>>>>>>>>>> AcceptorImpl, where it retries to bind for 2 minutes.
>>>>>>>>>>>> 
>>>>>>>>>>>> That behavior makes sense for CacheServer.start.
>>>>>>>>>>>> 
>>>>>>>>>>>> But it doesn't make sense for the new logic in
>>>>>>> GatewayReceiver.start()
>>>>>>>>>>>> from
>>>>>>>>>>>> GEODE-5591. That code is trying to use CacheServer.start to
>>>> scan
>>>>>> for
>>>>>>> an
>>>>>>>>>>>> available port, trying each port in a range. That free port
>>>>> finding
>>>>>>>>> logic
>>>>>>>>>>>> really doesn't want to have two minutes of retries for each
>>>> port.
>>>>>> It
>>>>>>>>>>>> seems
>>>>>>>>>>>> like we need to rework the fix for GEODE-5591.
>>>>>>>>>>>> 
>>>>>>>>>>>> Does it make sense to hold up the release to rework this fix,
>>>> or
>>>>>>> should
>>>>>>>>>>>> we
>>>>>>>>>>>> just revert it? Have we switched concourse over to using alpine
>>>>>>> linux,
>>>>>>>>>>>> which I think was the original motivation for this fix?
>>>>>>>>>>>> 
>>>>>>>>>>>> -Dan
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, Sep 4, 2018 at 4:25 PM, Dan Smith <dsm...@pivotal.io>
>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Why is it waiting at all in this case? Where is this 2 minute
>>>>>> timeout
>>>>>>>>>>>>> coming from?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -Dan
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Tue, Sep 4, 2018 at 4:12 PM, Sai Boorlagadda <
>>>>>>>>>>>>> 
>>>>>>>>>>>> sai.boorlaga...@gmail.com
>>>>>>>>>>>> 
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> So the issue is that it takes longer to start than previous
>>>>>>> releases?
>>>>>>>>>>>>>> Also, is this wait time only when using Gfsh to create
>>>>>>>>>>>>>> gateway-receiver?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, Sep 4, 2018 at 4:03 PM Nabarun Nag <n...@apache.org>
>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Currently we have a minor issue in the release branch as
>>>>> pointed
>>>>>>> out
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> by
>>>>>>>>>>>> 
>>>>>>>>>>>>> Barry O.
>>>>>>>>>>>>>>> We will wait till a resolution is figured out for this
>>>> issue.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Steps:
>>>>>>>>>>>>>>> 1. create locator
>>>>>>>>>>>>>>> 2. start server --name=server1 --server-port=40404
>>>>>>>>>>>>>>> 3. start server --name=server2 --server-port=40405
>>>>>>>>>>>>>>> 4. create gateway-receiver --member=server1
>>>>>>>>>>>>>>> 5. create gateway-receiver --member=server2 `This gets stuck
>>>>>> for 2
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> minutes`
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Is the 2 minute wait time acceptable? Should we document it?
>>>>>> When
>>>>>>> we
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> revert
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> GEODE-5591, this issue does not happen.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>> Nabarun Nag
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> 
>> 

Reply via email to