[ 
https://issues.apache.org/jira/browse/GEODE-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234042#comment-17234042
 ] 

ASF GitHub Bot commented on GEODE-8721:
---------------------------------------

bschuchardt opened a new pull request #5758:
URL: https://github.com/apache/geode/pull/5758


   If a server is performing an availability check on another server, it
   should not update the contact timestamp for the suspected server based
   on gossip from a third server.  Doing so makes the availability check
   pass and sends out another gossip message that likewise updates the
   other servers' timestamps for the suspected server, perpetuating the
   notion that the suspect is still around.
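
A minimal sketch of the guard described above, under stated assumptions: the
class `SuspectAwareContactTracker` and all of its method names are hypothetical
and chosen for illustration; they are not Geode's actual health-monitor code.
The idea is only that second-hand (gossip) timestamps are dropped while a
member is under an availability check, whereas direct contact still counts.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; not the code changed by this PR.
class SuspectAwareContactTracker {
  // Last time (in millis) each member is believed to have been in contact.
  private final Map<String, Long> lastContactMillis = new ConcurrentHashMap<>();
  // Members this server is currently running an availability check on.
  private final Set<String> underAvailabilityCheck = ConcurrentHashMap.newKeySet();

  void beginAvailabilityCheck(String member) {
    underAvailabilityCheck.add(member);
  }

  void endAvailabilityCheck(String member) {
    underAvailabilityCheck.remove(member);
  }

  // Traffic received directly from the member is always recorded.
  void recordDirectContact(String member, long timestampMillis) {
    lastContactMillis.merge(member, timestampMillis, Long::max);
  }

  // A timestamp reported by some other server. It is ignored while the
  // member is under an availability check, so gossip alone cannot make
  // the check pass. Returns true if the timestamp was applied.
  boolean recordGossip(String member, long timestampMillis) {
    if (underAvailabilityCheck.contains(member)) {
      return false;
    }
    lastContactMillis.merge(member, timestampMillis, Long::max);
    return true;
  }

  // Returns 0 if the member has never been recorded.
  long lastContact(String member) {
    return lastContactMillis.getOrDefault(member, 0L);
  }
}
```

In this sketch the check-then-update in `recordGossip` is not atomic; a real
implementation would need to decide how to synchronize it with the start and
end of an availability check.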
   
   Thank you for submitting a contribution to Apache Geode.
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   @kamilla1201 
   
   ### For all changes:
   - [x] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
   
   - [x] Has your PR been rebased against the latest commit within the target branch (typically `develop`)?
   
   - [x] Is your initial contribution a single, squashed commit?
   
   - [x] Does `gradlew build` run cleanly?
   
   - [x] Have you written or updated unit tests to verify your changes?
   
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   
   ### Note:
   Once the PR is submitted, please check Concourse for build issues and
   submit an update to your PR as soon as possible. If you need help, please
   send an email to d...@geode.apache.org.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> member that should become coordinator never detects loss of current 
> coordinator
> -------------------------------------------------------------------------------
>
>                 Key: GEODE-8721
>                 URL: https://issues.apache.org/jira/browse/GEODE-8721
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>    Affects Versions: 1.14.0
>            Reporter: Bruce J Schuchardt
>            Assignee: Bruce J Schuchardt
>            Priority: Major
>              Labels: release-blocker
>
> During a network partition a server that should have become membership 
> coordinator and shut down its side of the partition never detected the loss 
> of a server on the other side of the partition.  Instead it continually 
> performed availability checks on that other server and the checks passed.  
> Its log file had continually increasing timestamps for when it claimed the 
> other server had contacted it, which was not possible due to the network 
> partition (which was formed through iptables manipulation).
> At least one other server on its side of the network partition was doing the 
> same thing.  It looks like they were interfering with each other's 
> availability checks in some way.
> {noformat}
> locatorp1_26023/system.log: [info 2020/10/20 22:23:16.227 PDT <Geode UDP 
> Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Availability check detected 
> recent message traffic for suspect member 
> 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue 
> Oct 20 22:23:12 PDT 2020
> locatorp1_26023/system.log: [info 2020/10/20 22:23:16.228 PDT <Geode UDP 
> Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Availability check passed 
> for suspect member 
> 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000
> bridgep1_25995/system.log: [info 2020/10/20 22:23:16.229 PDT <unicast 
> receiver,rs-F21040449a0i3large-72-61636> tid=0x23] No longer suspecting 
> 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000
> bridgep1_25998/system.log: [info 2020/10/20 22:23:17.212 PDT <Geode UDP 
> Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Availability check detected 
> recent message traffic for suspect member 
> 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue 
> Oct 20 22:23:14 PDT 2020
> bridgep1_25998/system.log: [info 2020/10/20 22:23:17.213 PDT <Geode UDP 
> Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Availability check passed 
> for suspect member 
> 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000
> locatorp1_26023/system.log: [info 2020/10/20 22:23:17.232 PDT <Geode UDP 
> Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Performing availability 
> check for suspect member 
> 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 
> reason=Unable to send messages to this member via JGroups
> bridgep1_25998/system.log: [info 2020/10/20 22:23:18.215 PDT <Geode UDP 
> Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Performing availability 
> check for suspect member 
> 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 
> reason=Unable to send messages to this member via JGroups
> bridgep1_25995/system.log: [info 2020/10/20 22:23:21.006 PDT <Geode UDP 
> Timer-2,rs-F21040449a0i3large-72-61636> tid=0x21] Availability check detected 
> recent message traffic for suspect member 
> 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue 
> Oct 20 22:23:16 PDT 2020
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
