Bruce Schuchardt created GEODE-2534:
---------------------------------------
Summary: concurrently started locators fail to create a unified system
Key: GEODE-2534
URL: https://issues.apache.org/jira/browse/GEODE-2534
Project: Geode
Issue Type: Bug
Components: locator
Reporter: Bruce Schuchardt
During startup, a locator responded to a "find coordinator" request before it
knew its own identity. As a result, it answered later requests differently
during concurrent locator startup: in the log below, its first response names
locator-default-2 as coordinator (with senderId=null), while a later response
names locator-default-0 itself. The locator then created its own distributed
system, while the locator that received the initial response created a
different one.
{noformat}
[fine 2017/02/23 15:32:02.031 UTC locator-default-0 <main> tid=0x1] LogWriter
is created.
[fine 2017/02/23 15:32:02.031 UTC locator-default-0 <main> tid=0x1] Responding
to a property change event. Property name is config.
[info 2017/02/23 15:32:02.886 UTC locator-default-0 <main> tid=0x1] Peer
locator is connecting to local membership services
[fine 2017/02/23 15:32:02.887 UTC locator-default-0 <locator request thread[1]>
tid=0x14] Peer locator: coordinator from registrations is
10.85.100.166(locator-default-2:8706:locator)<ec>:49152
[fine 2017/02/23 15:32:02.887 UTC locator-default-0 <locator request thread[1]>
tid=0x14] Peer locator returning
FindCoordinatorResponse(coordinator=10.85.100.166(locator-default-2:8706:locator)<ec>:49152,
fromView=false, viewId=nul, registrants=1, senderId=null, network partition
detection enabled=true, locators preferred as coordinators=true)
[info 2017/02/23 15:32:02.891 UTC locator-default-0 <main> tid=0x1] Starting
membership services
[fine 2017/02/23 15:32:02.891 UTC locator-default-0 <main> tid=0x1] starting
Authenticator
[fine 2017/02/23 15:32:02.891 UTC locator-default-0 <main> tid=0x1] starting
Messenger
...
[fine 2017/02/23 15:32:03.369 UTC locator-default-0 <main> tid=0x1] All
membership services have been started
[fine 2017/02/23 15:32:03.369 UTC locator-default-0 <main> tid=0x1] join
timeout is set to 24000
[fine 2017/02/23 15:32:03.370 UTC locator-default-0 <main> tid=0x1] searching
for the membership coordinator
[fine 2017/02/23 15:32:03.370 UTC locator-default-0 <main> tid=0x1] sending
FindCoordinatorRequest(memberID=10.85.100.165(locator-default-0:8873:locator)<ec>:49152,
rejected=[], lastViewId=-1) to [/10.85.100.165:55221, /10.85.100.166:55221,
/10.85.100.167:55221]
...
[fine 2017/02/23 15:32:03.376 UTC locator-default-0 <locator request thread[1]>
tid=0x14] Peer locator: coordinator from registrations is
10.85.100.165(locator-default-0:8873:locator)<ec>:49152
[fine 2017/02/23 15:32:03.376 UTC locator-default-0 <locator request thread[1]>
tid=0x14] Peer locator returning
FindCoordinatorResponse(coordinator=10.85.100.165(locator-default-0:8873:locator)<ec>:49152,
fromView=false, viewId=nul, registrants=2,
senderId=10.85.100.165(locator-default-0:8873:locator)<ec>:49152, network
partition detection enabled=true, locators preferred as coordinators=true)
{noformat}
The locator should not respond to requests to find the coordinator before it
knows its own identity.
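
One way to picture the proposed behavior is the minimal sketch below. It is not
Geode's implementation: the class LocatorSketch, its methods, the use of plain
strings as member IDs, and the "lowest ID wins" coordinator rule are all
assumptions made for illustration. The only point it tries to show is that a
locator can record a requester but decline to name a coordinator until its own
identity has been set.

```java
import java.util.Optional;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical stand-in for a peer locator; not Geode's actual class. */
class LocatorSketch {
    // Member IDs seen so far, kept sorted so a coordinator can be picked.
    private final SortedSet<String> registrants = new TreeSet<>();

    // Null until membership services have started and assigned an identity.
    private String localId;

    /** Called once the locator knows its own identity. */
    synchronized void setLocalId(String id) {
        localId = id;
        registrants.add(id);
    }

    /**
     * Handles a "find coordinator" request. The requester is always
     * registered, but no coordinator is returned until the local identity
     * is known, so early and late answers cannot disagree.
     */
    synchronized Optional<String> findCoordinator(String requesterId) {
        if (requesterId != null) {
            registrants.add(requesterId);
        }
        if (localId == null) {
            return Optional.empty(); // caller is expected to retry later
        }
        // Assumed selection rule: lowest ID among all registrants.
        return Optional.of(registrants.first());
    }
}
```

In this sketch an empty result means "ask again later", so a concurrently
starting locator would retry rather than act on an answer computed without the
responder's own ID among the registrants.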