[ 
https://issues.apache.org/jira/browse/GEODE-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruce J Schuchardt reassigned GEODE-8721:
-----------------------------------------

    Assignee: Bruce J Schuchardt

> member that should become coordinator never detects loss of current 
> coordinator
> -------------------------------------------------------------------------------
>
>                 Key: GEODE-8721
>                 URL: https://issues.apache.org/jira/browse/GEODE-8721
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>    Affects Versions: 1.14.0
>            Reporter: Bruce J Schuchardt
>            Assignee: Bruce J Schuchardt
>            Priority: Major
>              Labels: release-blocker
>
> During a network partition, a server that should have become membership 
> coordinator and shut down its side of the partition never detected the loss 
> of a server on the other side of the partition.  Instead, it continually 
> performed availability checks on that other server, and the checks passed.  
> Its log file showed continually increasing timestamps for when it claimed the 
> other server had contacted it, which was not possible given the network 
> partition (which was created through iptables manipulation).
> At least one other server on its side of the network partition was doing the 
> same thing.  It looks like they were interfering with each other's 
> availability checks in some way.
> {noformat}
> locatorp1_26023/system.log: [info 2020/10/20 22:23:16.227 PDT <Geode UDP 
> Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Availability check detected 
> recent message traffic for suspect member 
> 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue 
> Oct 20 22:23:12 PDT 2020
> locatorp1_26023/system.log: [info 2020/10/20 22:23:16.228 PDT <Geode UDP 
> Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Availability check passed 
> for suspect member 
> 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000
> bridgep1_25995/system.log: [info 2020/10/20 22:23:16.229 PDT <unicast 
> receiver,rs-F21040449a0i3large-72-61636> tid=0x23] No longer suspecting 
> 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000
> bridgep1_25998/system.log: [info 2020/10/20 22:23:17.212 PDT <Geode UDP 
> Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Availability check detected 
> recent message traffic for suspect member 
> 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue 
> Oct 20 22:23:14 PDT 2020
> bridgep1_25998/system.log: [info 2020/10/20 22:23:17.213 PDT <Geode UDP 
> Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Availability check passed 
> for suspect member 
> 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000
> locatorp1_26023/system.log: [info 2020/10/20 22:23:17.232 PDT <Geode UDP 
> Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Performing availability 
> check for suspect member 
> 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 
> reason=Unable to send messages to this member via JGroups
> bridgep1_25998/system.log: [info 2020/10/20 22:23:18.215 PDT <Geode UDP 
> Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Performing availability 
> check for suspect member 
> 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 
> reason=Unable to send messages to this member via JGroups
> bridgep1_25995/system.log: [info 2020/10/20 22:23:21.006 PDT <Geode UDP 
> Timer-2,rs-F21040449a0i3large-72-61636> tid=0x21] Availability check detected 
> recent message traffic for suspect member 
> 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue 
> Oct 20 22:23:16 PDT 2020
> {noformat}
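
One way to inspect the timestamps described above is to pull the "at time" value out of each "recent message traffic" entry and compare it with the time the entry was logged. The following is a minimal Python sketch, not part of Geode: `TRAFFIC` and `traffic_claims` are hypothetical names, and it assumes each wrapped log entry has first been joined into a single line. The time zone abbreviation is matched but not parsed.

```python
import re
from datetime import datetime

# Pattern for one unwrapped "recent message traffic" entry as quoted above.
# Assumption: the wrapped lines of each entry were joined with single spaces.
TRAFFIC = re.compile(
    r"^(?P<log>\S+): \[info (?P<logged>\d{4}/\d\d/\d\d \d\d:\d\d:\d\d)\.\d+ \S+ "
    r".*recent message traffic for suspect member (?P<member>\S+) "
    r"at time \w+ (?P<heard>\w+ \d+ \d\d:\d\d:\d\d) \S+ (?P<year>\d{4})"
)


def traffic_claims(entries):
    """Return (log, member, logged_at, heard_at) for each matching entry."""
    claims = []
    for entry in entries:
        m = TRAFFIC.search(entry)
        if m is None:
            continue  # not a "recent message traffic" entry
        logged_at = datetime.strptime(m["logged"], "%Y/%m/%d %H:%M:%S")
        heard_at = datetime.strptime(
            f"{m['heard']} {m['year']}", "%b %d %H:%M:%S %Y"
        )
        claims.append((m["log"], m["member"], logged_at, heard_at))
    return claims


# The three "recent message traffic" entries from the excerpt, unwrapped.
ENTRIES = [
    "locatorp1_26023/system.log: [info 2020/10/20 22:23:16.227 PDT <Geode UDP "
    "Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Availability check detected "
    "recent message traffic for suspect member "
    "10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue "
    "Oct 20 22:23:12 PDT 2020",
    "bridgep1_25998/system.log: [info 2020/10/20 22:23:17.212 PDT <Geode UDP "
    "Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Availability check detected "
    "recent message traffic for suspect member "
    "10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue "
    "Oct 20 22:23:14 PDT 2020",
    "bridgep1_25995/system.log: [info 2020/10/20 22:23:21.006 PDT <Geode UDP "
    "Timer-2,rs-F21040449a0i3large-72-61636> tid=0x21] Availability check detected "
    "recent message traffic for suspect member "
    "10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue "
    "Oct 20 22:23:16 PDT 2020",
]

for log, member, logged_at, heard_at in traffic_claims(ENTRIES):
    lag = (logged_at - heard_at).total_seconds()
    print(f"{log}: claims contact from {member} {lag:.0f}s before the check")
```

Run over a full log rather than this excerpt, the same extraction would show whether the claimed contact times keep advancing after the partition was created.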



--
This message was sent by Atlassian Jira
(v8.3.4#803005)