As downstream consumers of Geode, we do not want to be exposed to this.
Please revert and fix on develop. Also, could we add a test case to guard
against this in the future?

Thanks,

*Pulkit Chandra*



On Wed, Sep 5, 2018 at 1:07 AM Xiaojian Zhou <gz...@pivotal.io> wrote:

> Yes. The current fix lets every gateway receiver (hydra tests start a lot
> of them) compete for port 5500. Only one member wins; all the other
> members time out after 2 minutes. Then they compete for port 5501, and
> again only one member wins.
>
> In that case, if there are 5 receivers, it will take 10 minutes to start
> all the receivers.
>
> So I enhanced the current fix (see the attached diff) to let each receiver
> pick a random port to start from. If binding fails, only that receiver
> tries currPort++. When it reaches endPort, it continues from startPort
> until it gets back to its random port.
>
> Improving the 2-minute timeout is definitely a separate issue.
>
> Regards
> Gester
>
> On Tue, Sep 4, 2018 at 4:38 PM, Dan Smith <dsm...@pivotal.io> wrote:
>
>> Splitting this into a separate thread.
>>
>> I see the issue. The two-minute timeout comes from the constructor of
>> AcceptorImpl, where it retries the bind for 2 minutes.
>>
>> That behavior makes sense for CacheServer.start.
>>
>> But it doesn't make sense for the new logic in GatewayReceiver.start() from
>> GEODE-5591. That code is trying to use CacheServer.start to scan for an
>> available port, trying each port in a range. That free port finding logic
>> really doesn't want to have two minutes of retries for each port. It seems
>> like we need to rework the fix for GEODE-5591.
>>
>> Does it make sense to hold up the release to rework this fix, or should we
>> just revert it? Have we switched Concourse over to using Alpine Linux,
>> which I think was the original motivation for this fix?
>>
>> -Dan
>>
>> On Tue, Sep 4, 2018 at 4:25 PM, Dan Smith <dsm...@pivotal.io> wrote:
>>
>> > Why is it waiting at all in this case? Where is this 2 minute timeout
>> > coming from?
>> >
>> > -Dan
>> >
>> > On Tue, Sep 4, 2018 at 4:12 PM, Sai Boorlagadda <sai.boorlaga...@gmail.com> wrote:
>> >
>> >> So the issue is that it takes longer to start than in previous releases?
>> >> Also, does this wait happen only when using Gfsh to create a
>> >> gateway-receiver?
>> >>
>> >> On Tue, Sep 4, 2018 at 4:03 PM Nabarun Nag <n...@apache.org> wrote:
>> >>
>> >> > Currently we have a minor issue in the release branch, as pointed out by
>> >> > Barry O.
>> >> > We will wait until a resolution is figured out for this issue.
>> >> >
>> >> > Steps:
>> >> > 1. create locator
>> >> > 2. start server --name=server1 --server-port=40404
>> >> > 3. start server --name=server2 --server-port=40405
>> >> > 4. create gateway-receiver --member=server1
>> >> > 5. create gateway-receiver --member=server2 `This gets stuck for 2 minutes`
>> >> >
>> >> > Is the 2 minute wait time acceptable? Should we document it? When we
>> >> > revert GEODE-5591, this issue does not happen.
>> >> >
>> >> > Regards
>> >> > Nabarun Nag
>> >> >
>> >>
>> >
>>
>
>
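
For anyone following the thread, here is a rough sketch of the random-start scan Gester describes above. It is not the attached diff and not Geode code: the class RandomStartPortScan and the methods scan, randomPortIn and canBindOnce are hypothetical names made up for illustration, and the plain ServerSocket probe only stands in for whatever the receiver really does to bind. The idea is that each receiver begins at a random port in [startPort, endPort], moves forward one port at a time, wraps from endPort back to startPort, and gives up once it is back at its first port. Each port gets a single bind attempt rather than two minutes of retries.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.IntPredicate;

/** Hypothetical helper, not part of Geode: scans a port range from a random start. */
public class RandomStartPortScan {

  /**
   * Visits every port in [startPort, endPort] exactly once, beginning at firstPort
   * and wrapping from endPort back to startPort. Returns the first port accepted
   * by tryBind, or -1 if no port in the range was accepted.
   */
  public static int scan(int startPort, int endPort, int firstPort, IntPredicate tryBind) {
    int rangeSize = endPort - startPort + 1;
    for (int i = 0; i < rangeSize; i++) {
      // Offset from startPort, advanced by i and wrapped around the range.
      int port = startPort + (firstPort - startPort + i) % rangeSize;
      if (tryBind.test(port)) {
        return port;
      }
    }
    return -1;
  }

  /** Picks a uniformly random port in [startPort, endPort]. */
  public static int randomPortIn(int startPort, int endPort) {
    return ThreadLocalRandom.current().nextInt(startPort, endPort + 1);
  }

  /** One bind attempt with no retry loop; the probe socket is closed right away. */
  public static boolean canBindOnce(int port) {
    try (ServerSocket socket = new ServerSocket(port)) {
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    int startPort = 5500;
    int endPort = 5504;
    int first = randomPortIn(startPort, endPort);
    int chosen = scan(startPort, endPort, first, RandomStartPortScan::canBindOnce);
    System.out.println("first=" + first + " chosen=" + chosen);
  }
}
```

Note that probing a port and then closing the socket leaves a window in which another member can grab it, so a real fix would presumably pass the receiver's actual start attempt as tryBind and keep the port once the bind succeeds.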
