[ https://issues.apache.org/jira/browse/GEODE-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804281#comment-16804281 ]
ASF subversion and git services commented on GEODE-6570:
--------------------------------------------------------

Commit 522da6dc84fb2b0d945830a9514192cd9812f09b in geode's branch refs/heads/feature/GEODE-6570 from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=522da6d ]

GEODE-6570 processing of cached join request delays view installation

Ignore join requests from a member that's already joined. Also, never
overwrite an established vmViewID in a member's ID because the member
knows about the old vmViewID and will ignore the newly assigned number.


> processing of cached join request delays view installation
> ----------------------------------------------------------
>
>                 Key: GEODE-6570
>                 URL: https://issues.apache.org/jira/browse/GEODE-6570
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>            Reporter: Bruce Schuchardt
>            Assignee: Bruce Schuchardt
>            Priority: Major
>
> In a test that kills and restarts locators one of the restarting locators
> times out trying to join the distributed system. Logs show that another
> locator was becoming the membership coordinator and was delayed in sending
> out a membership view when it processed a different join request for a member
> that was already in the distributed system.
> locator A gets join request from node 1 and sends a PREPARE
> node 1 sets its identity's view ID using the PREPAREd view
> locator A is killed
> node 1 sends a join request to locator B. Its identity has a view ID set.
> node 2 sends a join request to locator B and gets a PREPARE
> locator B processes node 1's join request and assigns a new view ID to it
> locator B processes node 2's join request and assigns a new view ID to it
> locator B sends the PREPARE with these two new nodes. It also has node 1's
> original ID
> locator B times out waiting for a response from node 1 with the new view ID
> and declares it crashed. It sends out a new PREPARE w/o that address.
> node 2 gives up waiting
> locator B gets no response from node 2 and declares it crashed, sends out a
> new PREPARE without node 2 and succeeds.
> Here are log snippets showing the problem. Process 616 has a JoinRequest
> queued when this locator becomes coordinator. The JoinRequest ID has v46
> already in it, showing that a PREPARE has already been sent with this member
> in it.
> The locator then creates a new View that has process 616's ID in it twice -
> once with v46 and once with v60
> {noformat}
> locatorgemfire_2_2_29835/system.log: [fine 2019/03/27 22:22:22.817 PDT
> locatorgemfire_2_2_host2_29835 <Geode Membership View Creator> tid=0xba]
> processing request
> JoinRequestMessage(rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_1_host2_616:616)<ec><v46>:41004)
> failureDetectionPort:43747
> locatorgemfire_2_2_29835/system.log: [fine 2019/03/27 22:22:22.817 PDT
> locatorgemfire_2_2_host2_29835 <Geode Membership View Creator> tid=0xba]
> processing request
> JoinRequestMessage(rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_3_host2_746:746:locator)<ec>:41002)
> failureDetectionPort:52188
> locatorgemfire_2_2_29835/system.log: [info 2019/03/27 22:22:22.818 PDT
> locatorgemfire_2_2_host2_29835 <Geode Membership View Creator> tid=0xba]
> preparing new view
> View[rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_2_host2_29835:29835:locator)<ec><v24>:41001|60]
> members:
> [rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_2_host2_29835:29835:locator)<ec><v24>:41001,
> rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_2_host2_30052:30052)<ec><v25>:41007{lead},
> rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_4_host2_31300:31300:locator)<ec><v29>:41003,
> rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_1_host2_31671:31671:locator)<ec><v41>:41000,
> rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_2_host2_31856:31856)<ec><v42>:41006,
> rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_1_host2_32560:32560)<ec><v44>:41005,
> rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_1_host2_616:616)<ec><v46>:41004,
> rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_1_host2_616:616)<ec><v60>:41004,
> rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_3_host2_746:746:locator)<ec><v60>:41002]
> {noformat}
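
The two rules in the commit message (ignore a join request from a member that has already joined, and never overwrite an established vmViewID) can be illustrated with a small standalone sketch. This is not the code from the commit and does not use Geode's membership classes: JoinRequestSketch, MemberId, sameAddress and prepareView are hypothetical names invented here, and a member's address is reduced to host plus port for brevity.

```java
import java.util.ArrayList;
import java.util.List;

public class JoinRequestSketch {

  /** Hypothetical stand-in for a member identity; not Geode's actual class. */
  public static final class MemberId {
    public final String host;
    public final int port;
    public int vmViewId; // -1 means no view ID has been assigned yet

    public MemberId(String host, int port) {
      this(host, port, -1);
    }

    public MemberId(String host, int port, int vmViewId) {
      this.host = host;
      this.port = port;
      this.vmViewId = vmViewId;
    }

    /** Compares addresses only, deliberately ignoring the view ID. */
    public boolean sameAddress(MemberId other) {
      return host.equals(other.host) && port == other.port;
    }

    @Override
    public String toString() {
      return host + ":" + port + (vmViewId < 0 ? "" : "<v" + vmViewId + ">");
    }
  }

  /**
   * Builds the member list for the next view from the current members and
   * the queued join requests.
   */
  public static List<MemberId> prepareView(
      List<MemberId> currentMembers, List<MemberId> joinRequests, int newViewId) {
    List<MemberId> next = new ArrayList<>(currentMembers);
    for (MemberId requester : joinRequests) {
      // Rule 1: a request from a member that is already present is ignored,
      // so the same process cannot appear twice with different view IDs.
      boolean alreadyJoined = next.stream().anyMatch(m -> m.sameAddress(requester));
      if (alreadyJoined) {
        continue;
      }
      // Rule 2: an established view ID is kept; only an unset one is assigned.
      if (requester.vmViewId < 0) {
        requester.vmViewId = newViewId;
      }
      next.add(requester);
    }
    return next;
  }

  public static void main(String[] args) {
    // Mirrors the log excerpt: process 616 is already known with v46 and has
    // a cached join request queued; locator 746 is a genuinely new member.
    List<MemberId> current = new ArrayList<>();
    current.add(new MemberId("hydra-client-17", 41004, 46));

    List<MemberId> requests = new ArrayList<>();
    requests.add(new MemberId("hydra-client-17", 41004, 46));
    requests.add(new MemberId("hydra-client-17", 41002));

    System.out.println(prepareView(current, requests, 60));
  }
}
```

Under these assumptions the prepared view lists the process on port 41004 once, still with v46, and only the new locator on port 41002 receives v60, so the coordinator has no second identity to wait on.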