OK, after discussion with Jason and Ryan. a PR #2425 is ready. It contains fix for 3 issues, including skipping the 2-minute-timeout.
On Wed, Sep 5, 2018 at 11:03 AM, Udo Kohlmeyer <u...@apache.org> wrote: > +1 > > > > On 9/5/18 10:35, Anthony Baker wrote: > >> Before this improvement is re-merged I’d like to see: >> >> 1) A test that characterizes the current behavior (e.g. doesn’t wait 2 >> min when there’s a port conflict) >> 2) A test that demonstrates how the current logic is insufficient >> >> Anthony >> >> >> On Sep 5, 2018, at 10:20 AM, Nabarun Nag <n...@apache.org> wrote: >>> >>> GEODE-5591 has been reverted in develop >>> ref: 901da27f227a8ce2b7d6b681619782a1accd9330 >>> >>> Regards >>> Nabarun Nag >>> >>> On Wed, Sep 5, 2018 at 10:14 AM Ryan McMahon <rmcma...@pivotal.io> >>> wrote: >>> >>> +1 for reverting in both places. >>>> >>>> I see that there is already an isGatewayReceiver flag in the >>>> AcceptorImpl >>>> constructor. It's not ideal, but could we use this flag to prevent the >>>> 2 >>>> minute retry logic for happening if this flag is true? >>>> >>>> Ryan >>>> >>>> On Wed, Sep 5, 2018 at 10:01 AM, Lynn Hughes-Godfrey < >>>> lhughesgodf...@pivotal.io> wrote: >>>> >>>> +1 for reverting in both places. >>>>> >>>>> On Wed, Sep 5, 2018 at 9:50 AM, Dan Smith <dsm...@pivotal.io> wrote: >>>>> >>>>> +1 for reverting in both places. The current fix is not better, that's >>>>>> >>>>> why >>>>> >>>>>> we are reverting it on the release branch! >>>>>> >>>>>> -Dan >>>>>> >>>>>> On Wed, Sep 5, 2018 at 9:47 AM, Jacob Barrett <jbarr...@pivotal.io> >>>>>> >>>>> wrote: >>>>> >>>>>> I’m not ok with reverting in develop. Revert in 1.7 and modify in >>>>>>> >>>>>> develop. >>>>>> >>>>>>> We shouldn’t go backwards in develop. The current fix is better than >>>>>>> >>>>>> the >>>>> >>>>>> bug it fixes. >>>>>>> >>>>>>> On Sep 5, 2018, at 9:40 AM, Nabarun Nag <n...@apache.org> wrote: >>>>>>>> >>>>>>>> If everyone is okay with it, I will revert that change in develop >>>>>>>> >>>>>>> and >>>> >>>>> then >>>>>>> >>>>>>>> cherry pick it to release/1.7.0 branch. >>>>>>>> Please do comment. >>>>>>>> >>>>>>>> Regards >>>>>>>> Nabarun Nag >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Sep 5, 2018 at 9:30 AM Dan Smith <dsm...@pivotal.io> >>>>>>>>> >>>>>>>> wrote: >>>> >>>>> +1 to yank it and rework the fix. >>>>>>>>> >>>>>>>>> Gester's change helps, but it just means that you will sometimes >>>>>>>>> >>>>>>>> randomly >>>>>>> >>>>>>>> have a 2 minute delay starting up a gateway receiver. I don't >>>>>>>>> >>>>>>>> think >>>> >>>>> that is >>>>>>> >>>>>>>> a great user experience either. >>>>>>>>> >>>>>>>>> -Dan >>>>>>>>> >>>>>>>>> On Wed, Sep 5, 2018 at 8:20 AM, Bruce Schuchardt < >>>>>>>>> >>>>>>>> bschucha...@pivotal.io> >>>>>>> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Let's yank it >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 9/4/18 5:04 PM, Sean Goller wrote: >>>>>>>>>>> >>>>>>>>>>> If it's to get the release out, I'm fine with reverting. I don't >>>>>>>>>>> >>>>>>>>>> like >>>>>> >>>>>>> it, >>>>>>>>> >>>>>>>>>> but I'm not willing to die on that hill. :) >>>>>>>>>>> >>>>>>>>>>> -S. >>>>>>>>>>> >>>>>>>>>>> On Tue, Sep 4, 2018 at 4:38 PM Dan Smith <dsm...@pivotal.io> >>>>>>>>>>> >>>>>>>>>> wrote: >>>>> >>>>>> Spitting this into a separate thread. >>>>>>>>>>> >>>>>>>>>>>> I see the issue. The two minute timeout is the constructor for >>>>>>>>>>>> AcceptorImpl, where it retries to bind for 2 minutes. >>>>>>>>>>>> >>>>>>>>>>>> That behavior makes sense for CacheServer.start. >>>>>>>>>>>> >>>>>>>>>>>> But it doesn't make sense for the new logic in >>>>>>>>>>>> >>>>>>>>>>> GatewayReceiver.start() >>>>>>> >>>>>>>> from >>>>>>>>>>>> GEODE-5591. That code is trying to use CacheServer.start to >>>>>>>>>>>> >>>>>>>>>>> scan >>>> >>>>> for >>>>>> >>>>>>> an >>>>>>> >>>>>>>> available port, trying each port in a range. That free port >>>>>>>>>>>> >>>>>>>>>>> finding >>>>> >>>>>> logic >>>>>>>>> >>>>>>>>>> really doesn't want to have two minutes of retries for each >>>>>>>>>>>> >>>>>>>>>>> port. >>>> >>>>> It >>>>>> >>>>>>> seems >>>>>>>>>>>> like we need to rework the fix for GEODE-5591. >>>>>>>>>>>> >>>>>>>>>>>> Does it make sense to hold up the release to rework this fix, >>>>>>>>>>>> >>>>>>>>>>> or >>>> >>>>> should >>>>>>> >>>>>>>> we >>>>>>>>>>>> just revert it? Have we switched concourse over to using alpine >>>>>>>>>>>> >>>>>>>>>>> linux, >>>>>>> >>>>>>>> which I think was the original motivation for this fix? >>>>>>>>>>>> >>>>>>>>>>>> -Dan >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Sep 4, 2018 at 4:25 PM, Dan Smith <dsm...@pivotal.io> >>>>>>>>>>>> >>>>>>>>>>> wrote: >>>>>> >>>>>>> Why is it waiting at all in this case? Where is this 2 minute >>>>>>>>>>>> >>>>>>>>>>> timeout >>>>>> >>>>>>> coming from? >>>>>>>>>>>>> >>>>>>>>>>>>> -Dan >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Sep 4, 2018 at 4:12 PM, Sai Boorlagadda < >>>>>>>>>>>>> >>>>>>>>>>>>> sai.boorlaga...@gmail.com >>>>>>>>>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> So the issue is that it takes longer to start than previous >>>>>>>>>>>>>> >>>>>>>>>>>>> releases? >>>>>>> >>>>>>>> Also, is this wait time only when using Gfsh to create >>>>>>>>>>>>>> gateway-receiver? >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, Sep 4, 2018 at 4:03 PM Nabarun Nag <n...@apache.org> >>>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>> >>>>>>>> Currently we have a minor issue in the release branch as >>>>>>>>>>>>>> >>>>>>>>>>>>> pointed >>>>> >>>>>> out >>>>>>> >>>>>>>> by >>>>>>>>>>>>>> >>>>>>>>>>>>> Barry O. >>>>>>>>>>>>> >>>>>>>>>>>>>> We will wait till a resolution is figured out for this >>>>>>>>>>>>>>> >>>>>>>>>>>>>> issue. >>>> >>>>> Steps: >>>>>>>>>>>>>>> 1. create locator >>>>>>>>>>>>>>> 2. start server --name=server1 --server-port=40404 >>>>>>>>>>>>>>> 3. start server --name=server2 --server-port=40405 >>>>>>>>>>>>>>> 4. create gateway-receiver --member=server1 >>>>>>>>>>>>>>> 5. create gateway-receiver --member=server2 `This gets stuck >>>>>>>>>>>>>>> >>>>>>>>>>>>>> for 2 >>>>>> >>>>>>> minutes` >>>>>>>>>>>>>> >>>>>>>>>>>>>> Is the 2 minute wait time acceptable? Should we document it? >>>>>>>>>>>>>>> >>>>>>>>>>>>>> When >>>>>> >>>>>>> we >>>>>>> >>>>>>>> revert >>>>>>>>>>>>>> >>>>>>>>>>>>>> GEODE-5591, this issue does not happen. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Regards >>>>>>>>>>>>>>> Nabarun Nag >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >