[ https://issues.apache.org/jira/browse/GEODE-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136150#comment-17136150 ]
ASF GitHub Bot commented on GEODE-8241: --------------------------------------- aaronlindsey commented on a change in pull request #5236: URL: https://github.com/apache/geode/pull/5236#discussion_r440451304 ########## File path: geode-membership/src/integrationTest/java/org/apache/geode/distributed/internal/membership/gms/MembershipIntegrationTest.java ########## @@ -172,6 +181,92 @@ public void secondMembershipCanJoinUsingTheSecondLocatorToStart() stop(locator2, locator1); } + @Test + public void secondMembershipPausesForLocatorWaitTime() + throws IOException, MemberStartupException { + + /* + * Start a locator for the coordinator (membership) so we have a port for it. + * + * Its locator-wait-time is set to 0 so it eventually (soon after membership is started) forms a + * distributed system and becomes a coordinator. + */ + + final MembershipLocator<MemberIdentifier> coordinatorLocator = createLocator(0); + coordinatorLocator.start(); + final int coordinatorLocatorPort = coordinatorLocator.getPort(); + + final Membership<MemberIdentifier> coordinatorMembership = + createMembership(coordinatorLocator, coordinatorLocatorPort); + + /* + * We have not even started the membership yet — connection attempts will certainly fail until + * we do. This is a bit like the locator (host) not being present in DNS (yet). + */ + + /* + * Start a second locator and membership trying to join via the coordinator (membership) that + * hasn't yet started behind the port. + * + * Set its locator-wait-time so it'll not become a coordinator right away, allowing time for the + * other member to start and become a coordinator. + * + * Calculate the locator-wait-time to be greater than the minimum wait time for connecting to a + * locator. + */ + + final MembershipLocator<MemberIdentifier> lateJoiningLocator = createLocator(0); + lateJoiningLocator.start(); + final int lateJoiningLocatorPort = lateJoiningLocator.getPort(); + + final int[] lateJoiningMembershipLocatorPorts = + new int[] {coordinatorLocatorPort, lateJoiningLocatorPort}; + + final Duration minimumJoinWaitTime = Duration + .ofMillis(JOIN_RETRY_SLEEP + FIND_LOCATOR_RETRY_SLEEP) // amount of sleep time per retry + .multipliedBy(lateJoiningMembershipLocatorPorts.length * 2); // expected number of retries + final int locatorWaitTime = (int) (3 * minimumJoinWaitTime.getSeconds()); + + final MembershipConfig lateJoiningMembershipConfig = + createMembershipConfig(true, locatorWaitTime, lateJoiningMembershipLocatorPorts); + final Membership<MemberIdentifier> lateJoiningMembership = + createMembership(lateJoiningMembershipConfig, lateJoiningLocator); + + CompletableFuture<Void> lateJoiningMembershipStartup = executorServiceRule.runAsync(() -> { + try { + start(lateJoiningMembership); + } catch (MemberStartupException e) { + throw new RuntimeException(e); + } + }); + + /* + * Now start the coordinator (membership), after waiting longer than the minimum wait time for + * connecting to a locator but shorter than the locator-wait-time. + */ + + CompletableFuture<Void> coordinatorMembershipStartup = executorServiceRule.runAsync(() -> { + try { + Thread.sleep(2 * minimumJoinWaitTime.toMillis()); Review comment: Again, `minimumJoinWaitTime` is the minimum amount of time a locator will wait to become a coordinator. We are just trying to make sure we sleep for longer than the minimum but shorter than the `locator-wait-time` so we can detect whether the `lateJoiningMembership` is waiting for the full `locator-wait-time` and not just the minimum wait time. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Locator does not observe locator-wait-time > ------------------------------------------ > > Key: GEODE-8241 > URL: https://issues.apache.org/jira/browse/GEODE-8241 > Project: Geode > Issue Type: Bug > Reporter: Aaron Lindsey > Assignee: Aaron Lindsey > Priority: Major > > In the case where a locator starts up and is unable to connect to any other > locators, it may decide to become the membership coordinator even if > locator-wait-time has not elapsed. > The following conditional from GMSJoinLeave.java causes the issue. There > should be an additional check for locator-wait-time before becoming > coordinator. > {code:java} > if (state.joinedMembersContacted <= 0 && > (tries >= minimumRetriesBeforeBecomingCoordinator || > state.locatorsContacted >= locators.size())) { > synchronized (viewInstallationLock) { > becomeCoordinator(); > } > return true; > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)