As downstream consumers of Geode, we do not want to be exposed to this. Please revert and fix on develop. Also, could we add a test case to guard us against this in the future?
Thanks,
*Pulkit Chandra*

On Wed, Sep 5, 2018 at 1:07 AM Xiaojian Zhou <gz...@pivotal.io> wrote:

> Yes. The current fix lets each gateway receiver (in hydra tests, there are
> a lot of them) compete for port 5500. Only one member will win; all other
> members will time out after 2 minutes. Then they keep competing for port
> 5501. Again, only one member will win.
>
> In that case, if there are 5 receivers, it will take 10 minutes to start
> all the receivers.
>
> So I enhanced the current fix (see the diff attached) to let each receiver
> pick a random port to start with. If one fails, only that receiver will try
> currPort++. If it reaches endPort, it continues from startPort, until it
> reaches its random port again.
>
> Improving the 2-minute timeout is definitely a separate issue.
>
> Regards
> Gester
>
> On Tue, Sep 4, 2018 at 4:38 PM, Dan Smith <dsm...@pivotal.io> wrote:
>
>> Splitting this into a separate thread.
>>
>> I see the issue. The two-minute timeout is in the constructor for
>> AcceptorImpl, where it retries the bind for 2 minutes.
>>
>> That behavior makes sense for CacheServer.start.
>>
>> But it doesn't make sense for the new logic in GatewayReceiver.start()
>> from GEODE-5591. That code is trying to use CacheServer.start to scan for
>> an available port, trying each port in a range. That free-port-finding
>> logic really doesn't want two minutes of retries for each port. It seems
>> like we need to rework the fix for GEODE-5591.
>>
>> Does it make sense to hold up the release to rework this fix, or should we
>> just revert it? Have we switched Concourse over to using Alpine Linux,
>> which I think was the original motivation for this fix?
>>
>> -Dan
>>
>> On Tue, Sep 4, 2018 at 4:25 PM, Dan Smith <dsm...@pivotal.io> wrote:
>>
>> > Why is it waiting at all in this case? Where is this 2-minute timeout
>> > coming from?
>> >
>> > -Dan
>> >
>> > On Tue, Sep 4, 2018 at 4:12 PM, Sai Boorlagadda <sai.boorlaga...@gmail.com> wrote:
>> >
>> >> So the issue is that it takes longer to start than in previous releases?
>> >> Also, does this wait only happen when using Gfsh to create a
>> >> gateway-receiver?
>> >>
>> >> On Tue, Sep 4, 2018 at 4:03 PM Nabarun Nag <n...@apache.org> wrote:
>> >>
>> >> > Currently we have a minor issue in the release branch, as pointed out
>> >> > by Barry O.
>> >> > We will wait until a resolution is figured out for this issue.
>> >> >
>> >> > Steps:
>> >> > 1. create locator
>> >> > 2. start server --name=server1 --server-port=40404
>> >> > 3. start server --name=server2 --server-port=40405
>> >> > 4. create gateway-receiver --member=server1
>> >> > 5. create gateway-receiver --member=server2 `This gets stuck for 2 minutes`
>> >> >
>> >> > Is the 2-minute wait time acceptable? Should we document it? When we
>> >> > revert GEODE-5591, this issue does not happen.
>> >> >
>> >> > Regards
>> >> > Nabarun Nag