[jira] [Commented] (GEODE-5546) auto-reconnecting member reuses old address including vmViewId

ASF subversion and git services (JIRA) Wed, 08 Aug 2018 15:29:17 -0700


    [ 
https://issues.apache.org/jira/browse/GEODE-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573984#comment-16573984
 ]


ASF subversion and git services commented on GEODE-5546:
--------------------------------------------------------

Commit c8da2631cb5d51222d3e301fb628838fff140d0f in geode's branch 
refs/heads/feature/GEODE-5546 from [~bschuchardt]
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=c8da263 ]

GEODE-5546 auto-reconnecting member reuses old address including vmViewId

Old membership IDs are now retained in JGroupsMessenger, and GMSJoinLeave
uses a new method, Messenger.isOldMembershipIdentifier(), to avoid accepting
a prepared view that contains an old identity.
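
The check described above can be sketched roughly as follows. This is an
illustration only, not the code from the commit: the class name, the use of
plain strings for membership IDs, and the retireIdentity() and
acceptPreparedView() helpers are all assumptions. Only the method name
isOldMembershipIdentifier() comes from the commit message.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the messenger/join-leave interaction.
// Membership IDs are modelled as plain strings for brevity.
class OldIdentitySketch {
  // IDs this process used before it went into auto-reconnect.
  private final Set<String> oldIds = ConcurrentHashMap.newKeySet();

  // Assumed helper: remember an identity when the member leaves the cluster.
  void retireIdentity(String id) {
    oldIds.add(id);
  }

  // Name taken from the commit message; the body is an assumption.
  boolean isOldMembershipIdentifier(String id) {
    return oldIds.contains(id);
  }

  // Assumed helper: reject a prepared view that still lists a retired identity.
  boolean acceptPreparedView(List<String> viewMembers) {
    for (String member : viewMembers) {
      if (isOldMembershipIdentifier(member)) {
        return false;
      }
    }
    return true;
  }
}
```

In this sketch a reconnecting member that retired an ID ending in <v3> would
refuse any prepared view still listing that ID and keep waiting for a view
that carries a new one.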

GMSJoinLeave is also modified to send an immediate removal message to
servers that are no longer members of the cluster but are attempting to interact
with the cluster.
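
A minimal sketch of the second change, again under assumptions: the class
name, the string-based IDs, and the "REMOVE" text are hypothetical and do
not reflect the actual GMSJoinLeave message types. It only shows the idea
of answering a non-member immediately instead of silently ignoring it.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical model of "tell non-members to go away right away".
class RemovalSketch {
  private final Set<String> currentView;

  RemovalSketch(Set<String> currentView) {
    // Defensive copy so later changes by the caller do not affect the view.
    this.currentView = new HashSet<>(currentView);
  }

  // Returns a removal notice for senders outside the current view,
  // or an empty Optional when the sender is a legitimate member.
  Optional<String> removalFor(String sender) {
    if (currentView.contains(sender)) {
      return Optional.empty();
    }
    return Optional.of("REMOVE " + sender);
  }
}
```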


> auto-reconnecting member reuses old address including vmViewId
> --------------------------------------------------------------
>
>                 Key: GEODE-5546
>                 URL: https://issues.apache.org/jira/browse/GEODE-5546
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>    Affects Versions: 1.6.0
>            Reporter: Bruce Schuchardt
>            Assignee: Bruce Schuchardt
>            Priority: Major
>
> During network-down testing I found that if I restore the network immediately 
> after all "losing side" servers go into auto-reconnect, they sometimes 
> receive a view-preparation message from the surviving cluster that holds 
> their old membership ID.  They use this ID instead of waiting for a valid new 
> ID and end up being shut down as rogue processes.
> For instance, this process used to have an identifier with <v3> before it 
> went into auto-reconnect.  When it tried to rejoin it ended up using that 
> same identifier due to receiving a view-preparation message holding it:
> [info 2018/07/28 22:17:14.588 PDT 
> gemfire1_rs-FullRegression29040205a1i3xlarge-hydra-client-18_15643 
> <ReconnectThread> tid=0x2d2] Attempting to join the distributed system 
> through coordinator 
> 10.32.110.93(gemfire6_rs-FullRegression29040205a1i3xlarge-hydra-client-50_13624:13624:locator)<ec><v1>:1024
>  using address 
> 10.32.108.125(gemfire1_rs-FullRegression29040205a1i3xlarge-hydra-client-18_15643:15643)<v3>:1026
> In this run it then proceeded to hang trying to send startup messages to the 
> cluster.  Cluster members rejected all of its attempts to contact them but 
> were also unsuccessful in getting the rogue process to shut down.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
