[ 
https://issues.apache.org/jira/browse/GEODE-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804281#comment-16804281
 ] 

ASF subversion and git services commented on GEODE-6570:
--------------------------------------------------------

Commit 522da6dc84fb2b0d945830a9514192cd9812f09b in geode's branch 
refs/heads/feature/GEODE-6570 from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=522da6d ]

GEODE-6570 processing of cached join request delays view installation

Ignore join requests from a member that's already joined.  Also, never
overwrite an established vmViewID in a member's ID because the member
knows about the old vmViewID and will ignore the newly assigned number.
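
The two rules in this commit message can be sketched as follows. This is only an
illustration under stated assumptions, not the code from the commit: the class
JoinRequestSketch, the MemberId type, the address() helper and the admit()
method are all hypothetical names, and matching members by host and port alone
is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class JoinRequestSketch {

    /** Hypothetical stand-in for a member ID; a vmViewId of -1 means "not assigned yet". */
    public static final class MemberId {
        public final String host;
        public final int port;
        public int vmViewId;

        public MemberId(String host, int port) {
            this(host, port, -1);
        }

        public MemberId(String host, int port, int vmViewId) {
            this.host = host;
            this.port = port;
            this.vmViewId = vmViewId;
        }

        /** Identity of the member ignoring its view ID (an assumption of this sketch). */
        public String address() {
            return host + ":" + port;
        }
    }

    /**
     * Returns the join requests that should be added to the next view.
     * Rule 1: a request from a member that is already in the view is ignored.
     * Rule 2: an established vmViewId is never overwritten with the new view number.
     */
    public static List<MemberId> admit(List<MemberId> currentMembers,
                                       List<MemberId> joinRequests,
                                       int newViewId) {
        Set<String> known = new HashSet<>();
        for (MemberId member : currentMembers) {
            known.add(member.address());
        }
        List<MemberId> admitted = new ArrayList<>();
        for (MemberId request : joinRequests) {
            if (!known.add(request.address())) {
                continue; // already joined, or a repeated request in this batch
            }
            if (request.vmViewId < 0) {
                request.vmViewId = newViewId; // only assign when none is established
            }
            admitted.add(request);
        }
        return admitted;
    }
}
```

With a current view holding the member at port 41004 with view ID 46, a cached
join request from that same member is dropped, while a request from a member
with no view ID is admitted and stamped with the new view number.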


> processing of cached join request delays view installation
> ----------------------------------------------------------
>
>                 Key: GEODE-6570
>                 URL: https://issues.apache.org/jira/browse/GEODE-6570
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>            Reporter: Bruce Schuchardt
>            Assignee: Bruce Schuchardt
>            Priority: Major
>
> In a test that kills and restarts locators, one of the restarting locators 
> times out trying to join the distributed system.  Logs show that another 
> locator was becoming the membership coordinator and was delayed in sending 
> out a membership view when it processed a different join request for a member 
> that was already in the distributed system.
>
> 1. locator A gets join request from node 1 and sends a PREPARE
> 2. node 1 sets its identity's view ID using the PREPAREd view
> 3. locator A is killed
> 4. node 1 sends a join request to locator B.  Its identity has a view ID set.
> 5. node 2 sends a join request to locator B and gets a PREPARE
> 6. locator B processes node 1's join request and assigns a new view ID to it
> 7. locator B processes node 2's join request and assigns a new view ID to it
> 8. locator B sends the PREPARE with these two new nodes.  It also has node 1's 
>    original ID
> 9. locator B times out waiting for a response from node 1 with the new view ID 
>    and declares it crashed.  It sends out a new PREPARE w/o that address.
> 10. node 2 gives up waiting
> 11. locator B gets no response from node 2 and declares it crashed, sends out a 
>     new PREPARE without node 2 and succeeds.
>
> Here are log snippets showing the problem.  Process 616 has a JoinRequest 
> queued when this locator becomes coordinator.  The JoinRequest ID has v46 
> already in it, showing that a PREPARE has already been sent with this member 
> in it.
> The locator then creates a new View that has process 616's ID in it twice: 
> once with v46 and once with v60.
> {noformat}
> locatorgemfire_2_2_29835/system.log: [fine 2019/03/27 22:22:22.817 PDT locatorgemfire_2_2_host2_29835 <Geode Membership View Creator> tid=0xba] processing request JoinRequestMessage(rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_1_host2_616:616)<ec><v46>:41004) failureDetectionPort:43747
> locatorgemfire_2_2_29835/system.log: [fine 2019/03/27 22:22:22.817 PDT locatorgemfire_2_2_host2_29835 <Geode Membership View Creator> tid=0xba] processing request JoinRequestMessage(rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_3_host2_746:746:locator)<ec>:41002) failureDetectionPort:52188
> locatorgemfire_2_2_29835/system.log: [info 2019/03/27 22:22:22.818 PDT locatorgemfire_2_2_host2_29835 <Geode Membership View Creator> tid=0xba] preparing new view View[rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_2_host2_29835:29835:locator)<ec><v24>:41001|60] members: 
> [rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_2_host2_29835:29835:locator)<ec><v24>:41001,
>  rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_2_host2_30052:30052)<ec><v25>:41007{lead},
>  rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_4_host2_31300:31300:locator)<ec><v29>:41003,
>  rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_1_host2_31671:31671:locator)<ec><v41>:41000,
>  rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_2_host2_31856:31856)<ec><v42>:41006,
>  rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_1_host2_32560:32560)<ec><v44>:41005,
>  rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_1_host2_616:616)<ec><v46>:41004,
>  rs-GEM-2463-1622a0i32xlarge-hydra-client-17(peergemfire_2_1_host2_616:616)<ec><v60>:41004,
>  rs-GEM-2463-1622a0i32xlarge-hydra-client-17(locatorgemfire_2_3_host2_746:746:locator)<ec><v60>:41002]
> {noformat}
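
The duplicate entry in the view above can be spotted mechanically. The sketch
below is a log-reading aid only, not part of Geode: the class
DuplicateMemberCheck and its methods are hypothetical, and it assumes the view
ID always appears as a "<vNN>" marker in the printed member ID, as it does in
the snippet above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateMemberCheck {

    /** Strips the "<vNN>" view-ID marker from a member ID as printed in the log. */
    public static String withoutViewId(String memberId) {
        return memberId.replaceAll("<v\\d+>", "");
    }

    /** Returns each member address that occurs more than once once view IDs are ignored. */
    public static List<String> duplicates(List<String> members) {
        Set<String> seen = new HashSet<>();
        List<String> repeated = new ArrayList<>();
        for (String member : members) {
            String key = withoutViewId(member);
            // Set.add returns false when the address was already seen.
            if (!seen.add(key) && !repeated.contains(key)) {
                repeated.add(key);
            }
        }
        return repeated;
    }
}
```

Fed the member list from the view above, this reports the process 616 address
once, since it appears with both v46 and v60.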



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
