[jira] [Commented] (GEODE-8739) Split brain when locators exhaust join attempts on non existant servers

Bill Burcham (Jira) Fri, 20 Nov 2020 13:07:08 -0800


    [ 
https://issues.apache.org/jira/browse/GEODE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236435#comment-17236435
 ]


Bill Burcham commented on GEODE-8739:
-------------------------------------

I believe the Geode processes in question were running inside Kubernetes (pods).

The purpose of the (persistent view) {{.dat}} files is to enable a locator 
(process) restarted on a particular host to rejoin a distributed system 
(cluster). The assumption is that the distributed system has had continuity 
across the locator's restart, i.e. that the cluster still exists.

If you want to do a Kubernetes maintenance operation that kills an entire Geode 
distributed system and starts up a new one in its place, you should probably 
make sure no {{.dat}} files are visible to the (new) locator processes.
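
As a rough illustration of that cleanup step, here is a minimal Python sketch 
that lists (and optionally deletes) leftover view files in a locator's working 
directory before the new locator starts. The glob {{locator*view.dat}}, the 
helper names, and the idea of running this as a pre-start step are assumptions 
for illustration only; check the actual file names in your deployment before 
deleting anything.

```python
from pathlib import Path

# Assumption: persisted view files match this glob. Adjust to your deployment.
VIEW_FILE_GLOB = "locator*view.dat"


def find_view_files(workdir, pattern=VIEW_FILE_GLOB):
    """Return the persisted view files found directly in workdir, sorted by name."""
    return sorted(p for p in Path(workdir).glob(pattern) if p.is_file())


def remove_view_files(workdir, pattern=VIEW_FILE_GLOB, dry_run=True):
    """List the persisted view files; delete them only when dry_run is False."""
    found = find_view_files(workdir, pattern)
    if not dry_run:
        for path in found:
            path.unlink()
    return found
```

Calling {{remove_view_files(workdir)}} with the default {{dry_run=True}} only 
reports what would be removed, which is the safer default when the directory 
sits on a persistent volume shared across pod restarts.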

Alternatively, for the use case where a locator process (running inside a pod) 
crashes (the pod crashes) and a new pod is started that is "logically" the same 
one, the right handling is still TBD.

> Split brain when locators exhaust join attempts on non existant servers
> -----------------------------------------------------------------------
>
>                 Key: GEODE-8739
>                 URL: https://issues.apache.org/jira/browse/GEODE-8739
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>            Reporter: Jason Huynh
>            Priority: Major
>         Attachments: exportedLogs_locator-0.zip, exportedLogs_locator-1.zip
>
>
> The hypothesis: "if there is a locator view .dat file with several 
> non-existent servers then locators will waste all of their join attempts 
> on the servers instead of finding each other"
> The scenario is that a test/user attempts to recreate a cluster with existing 
> .dat and persistent files.  The locators are spun up in parallel and, from the 
> analysis, it looks like they are able to communicate with each other, but then 
> end up forming their own ds.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
