[ https://issues.apache.org/jira/browse/GEODE-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573984#comment-16573984 ]
ASF subversion and git services commented on GEODE-5546:
--------------------------------------------------------

Commit c8da2631cb5d51222d3e301fb628838fff140d0f in geode's branch refs/heads/feature/GEODE-5546 from [~bschuchardt]
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=c8da263 ]

GEODE-5546 auto-reconnecting member reuses old address including vmViewId

Old membership IDs are now retained in JGroupsMessenger and GMSJoinLeave
uses a new method, Messenger.isOldMembershipIdentifier(), to avoid accepting
a prepared view that contains an old identity.

GMSJoinLeave is also modified to send an immediate removal message to
servers that are no longer members of the cluster but are attempting to
interact with the cluster.


> auto-reconnecting member reuses old address including vmViewId
> --------------------------------------------------------------
>
>                 Key: GEODE-5546
>                 URL: https://issues.apache.org/jira/browse/GEODE-5546
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>    Affects Versions: 1.6.0
>            Reporter: Bruce Schuchardt
>            Assignee: Bruce Schuchardt
>            Priority: Major
>
> During network-down testing I found that if I restore the network immediately
> after all "losing side" servers go into auto-reconnect that sometimes they
> receive a view-preparation message from the surviving cluster that holds
> their old membership ID. They use this ID instead of waiting for a valid new
> ID and end up being shut down as rogue processes.
>
> For instance, this process used to have an identifier with <v3> before it
> went into auto-reconnect. When it tried to rejoin it ended up using that
> same identifier due to receiving a view-preparation message holding it:
>
> [info 2018/07/28 22:17:14.588 PDT
> gemfire1_rs-FullRegression29040205a1i3xlarge-hydra-client-18_15643
> <ReconnectThread> tid=0x2d2] Attempting to join the distributed system
> through coordinator
> 10.32.110.93(gemfire6_rs-FullRegression29040205a1i3xlarge-hydra-client-50_13624:13624:locator)<ec><v1>:1024
> using address
> 10.32.108.125(gemfire1_rs-FullRegression29040205a1i3xlarge-hydra-client-18_15643:15643)<v3>:1026
>
> In this run it then proceeded to hang trying to send startup messages to the
> cluster. Cluster members rejected all of its attempts to contact them but
> were also unsuccessful in getting the rogue process to shut down.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
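
For illustration only, the check described in the commit message (retain old
membership IDs, then refuse a prepared view that contains one) might look
roughly like the sketch below. This is not the code from the commit: MemberId,
IdentityTracker, retire() and acceptPreparedView() are hypothetical names, and
the signature given to isOldMembershipIdentifier() here is an assumption that
only mirrors the method name mentioned above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch; none of these types are Geode's real classes.
final class OldIdentitySketch {

    /** Simplified stand-in for a membership ID: host, port and view ID. */
    static final class MemberId {
        final String host;
        final int port;
        final int vmViewId;

        MemberId(String host, int port, int vmViewId) {
            this.host = host;
            this.port = port;
            this.vmViewId = vmViewId;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof MemberId)) return false;
            MemberId other = (MemberId) o;
            return port == other.port
                && vmViewId == other.vmViewId
                && host.equals(other.host);
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, port, vmViewId);
        }

        @Override
        public String toString() {
            return host + "<v" + vmViewId + ">:" + port;
        }
    }

    /** Remembers the identities this process used before auto-reconnecting. */
    static final class IdentityTracker {
        private final Set<MemberId> oldIds = new HashSet<>();

        /** Record an identity that must never be reused after a reconnect. */
        void retire(MemberId id) {
            oldIds.add(id);
        }

        /** Assumed signature; only the method name comes from the commit message. */
        boolean isOldMembershipIdentifier(MemberId id) {
            return oldIds.contains(id);
        }

        /** Accept a prepared view only if it holds none of our retired identities. */
        boolean acceptPreparedView(List<MemberId> view) {
            for (MemberId member : view) {
                if (isOldMembershipIdentifier(member)) {
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        IdentityTracker tracker = new IdentityTracker();
        MemberId oldSelf = new MemberId("10.32.108.125", 1026, 3);
        MemberId locator = new MemberId("10.32.110.93", 1024, 1);
        tracker.retire(oldSelf); // done when the process enters auto-reconnect

        // A stale view still listing the <v3> identity should be refused.
        List<MemberId> staleView = Arrays.asList(locator, oldSelf);
        System.out.println("accept stale view: " + tracker.acceptPreparedView(staleView));
    }
}
```

In this sketch the reconnecting process would keep waiting for a view that
carries a fresh identity instead of adopting the stale <v3> one.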