[
https://issues.apache.org/jira/browse/GEODE-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bruce J Schuchardt reassigned GEODE-8721:
-----------------------------------------
Assignee: Bruce J Schuchardt
> member that should become coordinator never detects loss of current
> coordinator
> -------------------------------------------------------------------------------
>
> Key: GEODE-8721
> URL: https://issues.apache.org/jira/browse/GEODE-8721
> Project: Geode
> Issue Type: Bug
> Components: membership
> Affects Versions: 1.14.0
> Reporter: Bruce J Schuchardt
> Assignee: Bruce J Schuchardt
> Priority: Major
> Labels: release-blocker
>
> During a network partition, a server that should have become membership
> coordinator and shut down its side of the partition never detected the loss
> of a server on the other side of the partition. Instead, it continually
> performed availability checks on that other server, and the checks passed.
> Its log file had continually increasing timestamps for when it claimed the
> other server had contacted it, which was not possible due to the network
> partition (which was formed through iptables manipulation).
> At least one other server on its side of the network partition was doing the
> same thing. It looks like they were interfering with each other's
> availability checks in some way.
> {noformat}
> locatorp1_26023/system.log: [info 2020/10/20 22:23:16.227 PDT <Geode UDP
> Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Availability check detected
> recent message traffic for suspect member
> 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue
> Oct 20 22:23:12 PDT 2020
> locatorp1_26023/system.log: [info 2020/10/20 22:23:16.228 PDT <Geode UDP
> Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Availability check passed
> for suspect member
> 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000
> bridgep1_25995/system.log: [info 2020/10/20 22:23:16.229 PDT <unicast
> receiver,rs-F21040449a0i3large-72-61636> tid=0x23] No longer suspecting
> 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000
> bridgep1_25998/system.log: [info 2020/10/20 22:23:17.212 PDT <Geode UDP
> Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Availability check detected
> recent message traffic for suspect member
> 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue
> Oct 20 22:23:14 PDT 2020
> bridgep1_25998/system.log: [info 2020/10/20 22:23:17.213 PDT <Geode UDP
> Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Availability check passed
> for suspect member
> 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000
> locatorp1_26023/system.log: [info 2020/10/20 22:23:17.232 PDT <Geode UDP
> Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Performing availability
> check for suspect member
> 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000
> reason=Unable to send messages to this member via JGroups
> bridgep1_25998/system.log: [info 2020/10/20 22:23:18.215 PDT <Geode UDP
> Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Performing availability
> check for suspect member
> 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000
> reason=Unable to send messages to this member via JGroups
> bridgep1_25995/system.log: [info 2020/10/20 22:23:21.006 PDT <Geode UDP
> Timer-2,rs-F21040449a0i3large-72-61636> tid=0x21] Availability check detected
> recent message traffic for suspect member
> 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue
> Oct 20 22:23:16 PDT 2020
> {noformat}
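The symptom described above (the claimed last-contact time for the suspect keeps moving forward during the partition) can be checked mechanically against the logs. The following is a minimal sketch and not part of Geode: the class `SuspectTrafficScan` and its methods are hypothetical names, and it assumes each log entry has been joined back onto a single line (the excerpt above is wrapped by the mail system). The zone abbreviation is deliberately skipped rather than parsed.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Geode: scans unwrapped system.log lines for
// the "recent message traffic" entries quoted in this ticket.
class SuspectTrafficScan {
  // Matches "... for suspect member <id> at time Tue Oct 20 22:23:12 PDT 2020".
  // The day-of-week and zone abbreviation are matched but not parsed.
  private static final Pattern TRAFFIC = Pattern.compile(
      "Availability check detected recent message traffic for suspect member (\\S+)"
          + " at time \\w{3} (\\w{3} \\d{1,2} \\d{2}:\\d{2}:\\d{2}) \\w+ (\\d{4})");

  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("MMM d HH:mm:ss uuuu", Locale.US);

  /** Returns the claimed last-contact times for the given suspect, in log order. */
  static List<LocalDateTime> claimedContactTimes(List<String> lines, String suspect) {
    List<LocalDateTime> times = new ArrayList<>();
    for (String line : lines) {
      Matcher m = TRAFFIC.matcher(line);
      if (m.find() && m.group(1).equals(suspect)) {
        times.add(LocalDateTime.parse(m.group(2) + " " + m.group(3), FORMAT));
      }
    }
    return times;
  }

  /** True if there are at least two claimed times and each is later than the previous one. */
  static boolean keepsAdvancing(List<LocalDateTime> times) {
    for (int i = 1; i < times.size(); i++) {
      if (!times.get(i).isAfter(times.get(i - 1))) {
        return false;
      }
    }
    return times.size() > 1;
  }

  public static void main(String[] args) {
    String suspect = "10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000";
    String msg = "Availability check detected recent message traffic for suspect member ";
    // The three "recent message traffic" entries from the excerpt, unwrapped.
    List<String> lines = List.of(
        "locatorp1_26023/system.log: [info 2020/10/20 22:23:16.227 PDT <Geode UDP Timer-2,"
            + "rs-F21040449a0i3large-72-47481> tid=0x23] " + msg + suspect
            + " at time Tue Oct 20 22:23:12 PDT 2020",
        "bridgep1_25998/system.log: [info 2020/10/20 22:23:17.212 PDT <Geode UDP Timer-2,"
            + "rs-F21040449a0i3large-72-2074> tid=0x21] " + msg + suspect
            + " at time Tue Oct 20 22:23:14 PDT 2020",
        "bridgep1_25995/system.log: [info 2020/10/20 22:23:21.006 PDT <Geode UDP Timer-2,"
            + "rs-F21040449a0i3large-72-61636> tid=0x21] " + msg + suspect
            + " at time Tue Oct 20 22:23:16 PDT 2020");

    List<LocalDateTime> times = claimedContactTimes(lines, suspect);
    System.out.println("claimed contact times: " + times);
    System.out.println("keeps advancing: " + keepsAdvancing(times));
  }
}
```

Run against the three entries in the excerpt, the claimed contact times are 22:23:12, 22:23:14 and 22:23:16, so they keep advancing even though the partition should have prevented any contact. Note that the excerpt interleaves three members' logs; to examine a single member, filter the input to that member's system.log first.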
--
This message was sent by Atlassian Jira
(v8.3.4#803005)