[ https://issues.apache.org/jira/browse/GEODE-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bruce J Schuchardt reassigned GEODE-8721:
-----------------------------------------

    Assignee: Bruce J Schuchardt

> member that should become coordinator never detects loss of current coordinator
> -------------------------------------------------------------------------------
>
>                 Key: GEODE-8721
>                 URL: https://issues.apache.org/jira/browse/GEODE-8721
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>    Affects Versions: 1.14.0
>            Reporter: Bruce J Schuchardt
>            Assignee: Bruce J Schuchardt
>            Priority: Major
>              Labels: release-blocker
>
> During a network partition, a server that should have become membership
> coordinator and shut down its side of the partition never detected the loss
> of a server on the other side of the partition. Instead it continually
> performed availability checks on that other server, and the checks passed.
> Its log file showed continually increasing timestamps for when it claimed the
> other server had last contacted it, which was not possible because of the
> network partition (which was formed through iptables manipulation).
> At least one other server on its side of the network partition was doing the
> same thing. It looks like they were interfering with each other's
> availability checks in some way.
> {noformat}
> locatorp1_26023/system.log: [info 2020/10/20 22:23:16.227 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Availability check detected recent message traffic for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue Oct 20 22:23:12 PDT 2020
> locatorp1_26023/system.log: [info 2020/10/20 22:23:16.228 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Availability check passed for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000
> bridgep1_25995/system.log: [info 2020/10/20 22:23:16.229 PDT <unicast receiver,rs-F21040449a0i3large-72-61636> tid=0x23] No longer suspecting 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000
> bridgep1_25998/system.log: [info 2020/10/20 22:23:17.212 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Availability check detected recent message traffic for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue Oct 20 22:23:14 PDT 2020
> bridgep1_25998/system.log: [info 2020/10/20 22:23:17.213 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Availability check passed for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000
> locatorp1_26023/system.log: [info 2020/10/20 22:23:17.232 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Performing availability check for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 reason=Unable to send messages to this member via JGroups
> bridgep1_25998/system.log: [info 2020/10/20 22:23:18.215 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Performing availability check for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 reason=Unable to send messages to this member via JGroups
> bridgep1_25995/system.log: [info 2020/10/20 22:23:21.006 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-61636> tid=0x21] Availability check detected recent message traffic for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue Oct 20 22:23:16 PDT 2020
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
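
One way to confirm the symptom from the logs is to collect the "recent message traffic" times claimed for the suspect member and check whether they keep moving forward after the partition was created. The following is only a rough sketch for log triage, not part of Geode: the class and method names (SuspectTrafficScan, claimedTrafficTimes, keepsAdvancing) are hypothetical, and it assumes each log entry sits on a single line in the form quoted in the excerpt.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical triage helper; not a Geode class.
public class SuspectTrafficScan {

    // Matches entries like "... detected recent message traffic for suspect
    // member <id> at time Tue Oct 20 22:23:12 PDT 2020" (format taken from the excerpt).
    private static final Pattern TRAFFIC = Pattern.compile(
        "Availability check detected recent message traffic for suspect member (\\S+)"
            + " at time \\w{3} (\\w{3} \\d{1,2} \\d{2}:\\d{2}:\\d{2}) \\w+ (\\d{4})");

    // The zone abbreviation is skipped on purpose; all entries are assumed
    // to come from hosts logging in the same zone.
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("MMM d HH:mm:ss uuuu", Locale.US);

    // Returns the claimed last-contact times for one suspect, in log order.
    static List<LocalDateTime> claimedTrafficTimes(List<String> lines, String suspect) {
        List<LocalDateTime> times = new ArrayList<>();
        for (String line : lines) {
            Matcher m = TRAFFIC.matcher(line);
            if (m.find() && m.group(1).equals(suspect)) {
                times.add(LocalDateTime.parse(m.group(2) + " " + m.group(3), FORMAT));
            }
        }
        return times;
    }

    // True when there are at least two claims and each is later than the previous one.
    static boolean keepsAdvancing(List<LocalDateTime> times) {
        for (int i = 1; i < times.size(); i++) {
            if (!times.get(i).isAfter(times.get(i - 1))) {
                return false;
            }
        }
        return times.size() > 1;
    }

    public static void main(String[] args) {
        String suspect = "10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000";
        // Two entries from the excerpt, shortened to the parts the pattern needs.
        List<String> lines = List.of(
            "[info 2020/10/20 22:23:16.227 PDT] Availability check detected recent message"
                + " traffic for suspect member " + suspect + " at time Tue Oct 20 22:23:12 PDT 2020",
            "[info 2020/10/20 22:23:17.212 PDT] Availability check detected recent message"
                + " traffic for suspect member " + suspect + " at time Tue Oct 20 22:23:14 PDT 2020");
        List<LocalDateTime> times = claimedTrafficTimes(lines, suspect);
        System.out.println(times + " advancing=" + keepsAdvancing(times));
    }
}
```

If the claimed times keep advancing after the moment the iptables rules were applied, the member is crediting the suspect with traffic it could not have sent, which matches the behavior described in this report.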