Bruce Schuchardt created GEODE-2534:
---------------------------------------

             Summary: concurrently started locators fail to create a unified system
                 Key: GEODE-2534
                 URL: https://issues.apache.org/jira/browse/GEODE-2534
             Project: Geode
          Issue Type: Bug
          Components: locator
            Reporter: Bruce Schuchardt


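During startup, a locator responded to a "find coordinator" request before it
knew its own identity, so its own registration was not yet counted. This caused
it to answer subsequent requests differently during concurrent locator startup:
its first response named locator-default-2 as coordinator (registrants=1,
senderId=null), while a later one named itself (registrants=2). As a result, it
created its own distributed system while the locator that received the initial
response created a different one.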

{noformat}
[fine 2017/02/23 15:32:02.031 UTC locator-default-0 <main> tid=0x1] LogWriter is created.

[fine 2017/02/23 15:32:02.031 UTC locator-default-0 <main> tid=0x1] Responding to a property change event. Property name is config.

[info 2017/02/23 15:32:02.886 UTC locator-default-0 <main> tid=0x1] Peer locator is connecting to local membership services

[fine 2017/02/23 15:32:02.887 UTC locator-default-0 <locator request thread[1]> tid=0x14] Peer locator: coordinator from registrations is 10.85.100.166(locator-default-2:8706:locator)<ec>:49152

[fine 2017/02/23 15:32:02.887 UTC locator-default-0 <locator request thread[1]> tid=0x14] Peer locator returning FindCoordinatorResponse(coordinator=10.85.100.166(locator-default-2:8706:locator)<ec>:49152, fromView=false, viewId=nul, registrants=1, senderId=null, network partition detection enabled=true, locators preferred as coordinators=true)

[info 2017/02/23 15:32:02.891 UTC locator-default-0 <main> tid=0x1] Starting membership services

[fine 2017/02/23 15:32:02.891 UTC locator-default-0 <main> tid=0x1] starting Authenticator

[fine 2017/02/23 15:32:02.891 UTC locator-default-0 <main> tid=0x1] starting Messenger

...

[fine 2017/02/23 15:32:03.369 UTC locator-default-0 <main> tid=0x1] All membership services have been started

[fine 2017/02/23 15:32:03.369 UTC locator-default-0 <main> tid=0x1] join timeout is set to 24000

[fine 2017/02/23 15:32:03.370 UTC locator-default-0 <main> tid=0x1] searching for the membership coordinator

[fine 2017/02/23 15:32:03.370 UTC locator-default-0 <main> tid=0x1] sending FindCoordinatorRequest(memberID=10.85.100.165(locator-default-0:8873:locator)<ec>:49152, rejected=[], lastViewId=-1) to [/10.85.100.165:55221, /10.85.100.166:55221, /10.85.100.167:55221]

...

[fine 2017/02/23 15:32:03.376 UTC locator-default-0 <locator request thread[1]> tid=0x14] Peer locator: coordinator from registrations is 10.85.100.165(locator-default-0:8873:locator)<ec>:49152

[fine 2017/02/23 15:32:03.376 UTC locator-default-0 <locator request thread[1]> tid=0x14] Peer locator returning FindCoordinatorResponse(coordinator=10.85.100.165(locator-default-0:8873:locator)<ec>:49152, fromView=false, viewId=nul, registrants=2, senderId=10.85.100.165(locator-default-0:8873:locator)<ec>:49152, network partition detection enabled=true, locators preferred as coordinators=true)
{noformat}
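
A minimal Java sketch of the race, under stated assumptions. The issue does not
say how a locator picks a coordinator from its registrations, so this model
ASSUMES the registrant whose member ID sorts first wins, which happens to match
both answers in the log. It also assumes the single early registrant is
locator-default-2. RegistrationTable and RaceDemo are hypothetical names, not
Geode classes. The point is only that an answer computed before the locator has
registered itself (registrants=1) can differ from one computed afterwards
(registrants=2).

```java
import java.util.TreeSet;

// Hypothetical model of a peer locator's registration table (not a Geode class).
// ASSUMPTION: the coordinator is the registrant whose member ID sorts first.
class RegistrationTable {
  private final TreeSet<String> registrants = new TreeSet<>();

  void register(String memberId) {
    registrants.add(memberId);
  }

  int size() {
    return registrants.size();
  }

  // Returns the assumed coordinator, or null if nobody has registered yet.
  String chooseCoordinator() {
    return registrants.isEmpty() ? null : registrants.first();
  }
}

class RaceDemo {
  static final String LOCATOR_0 = "10.85.100.165(locator-default-0:8873:locator)<ec>:49152";
  static final String LOCATOR_2 = "10.85.100.166(locator-default-2:8706:locator)<ec>:49152";

  public static void main(String[] args) {
    RegistrationTable table = new RegistrationTable();

    // locator-default-0 answers before it knows its own ID, so only one other
    // locator (assumed here to be locator-default-2) is registered.
    table.register(LOCATOR_2);
    System.out.println("early answer (" + table.size() + " registrant): " + table.chooseCoordinator());

    // Once locator-default-0 knows its ID and registers itself, the same
    // question gets a different answer.
    table.register(LOCATOR_0);
    System.out.println("later answer (" + table.size() + " registrants): " + table.chooseCoordinator());
  }
}
```

Any requester that acted on the early answer and one that acted on the later
answer can each go on to form a separate system, which is the split described
above.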

The locator should not respond to requests to find the coordinator before it 
knows its own identity.
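
One possible shape for that fix, sketched in Java: gate the request handler on
the locator's own identity, and register that identity before any answer is
computed. GatedPeerLocator, setLocalIdentity and handleFindCoordinator are
hypothetical names, not Geode's API. The selection rule (the lowest member ID
wins) is an ASSUMPTION for illustration, since the issue does not describe it.

```java
import java.util.Optional;
import java.util.TreeSet;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Geode's API: a peer-locator handler that refuses
// to answer "find coordinator" requests until it knows its own member ID.
class GatedPeerLocator {
  private final CountDownLatch identityKnown = new CountDownLatch(1);
  private final TreeSet<String> registrants = new TreeSet<>();

  // Called once membership services have assigned this locator its ID.
  void setLocalIdentity(String memberId) {
    synchronized (this) {
      registrants.add(memberId); // register self before answering anyone
    }
    identityKnown.countDown();
  }

  // Waits up to timeoutMs for the local identity. Returns empty (no answer)
  // if it is still unknown, so no answer is built from a partial registry.
  Optional<String> handleFindCoordinator(String requesterId, long timeoutMs) {
    try {
      if (!identityKnown.await(timeoutMs, TimeUnit.MILLISECONDS)) {
        return Optional.empty();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve interrupt status
      return Optional.empty();
    }
    synchronized (this) {
      registrants.add(requesterId);
      // ASSUMED selection rule: the lowest member ID is the coordinator.
      return Optional.of(registrants.first());
    }
  }
}

class GatedDemo {
  public static void main(String[] args) {
    String loc0 = "10.85.100.165(locator-default-0:8873:locator)<ec>:49152";
    String loc2 = "10.85.100.166(locator-default-2:8706:locator)<ec>:49152";
    GatedPeerLocator locator0 = new GatedPeerLocator();

    // Identity unknown: no answer rather than a misleading one.
    System.out.println(locator0.handleFindCoordinator(loc2, 10));

    // Identity known: the answer already accounts for this locator.
    locator0.setLocalIdentity(loc0);
    System.out.println(locator0.handleFindCoordinator(loc2, 10));
  }
}
```

Returning no answer after a bounded wait, rather than blocking indefinitely,
assumes the requester keeps searching for the coordinator until its join
timeout expires, so it can simply retry once this locator is ready.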



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
