On 18/10/14 12:18 AM, Andrei Borzenkov wrote:
On Mon, 06 Oct 2014 10:27:49 -0400
Digimer <[email protected]> wrote:

On 06/10/14 02:11 AM, Andrei Borzenkov wrote:
On Mon, Oct 6, 2014 at 9:03 AM, Digimer <[email protected]> wrote:
If stonith were configured, then after the timeout the first node would
fence the second node ("unable to reach" != "off").

Alternatively, you can set corosync to 'wait_for_all' and have the first
node do nothing until it sees the peer.
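
For reference, a minimal sketch of the relevant corosync.conf section on corosync 2.x, assuming the votequorum provider in a two-node cluster. The rest of the file (totem, nodelist, logging) is omitted and would have to match your own setup:

```
quorum {
    provider: corosync_votequorum
    two_node: 1
    wait_for_all: 1
}
```

As far as I know from votequorum(5), two_node: 1 already turns wait_for_all on by default; it is spelled out here only to make the intent obvious.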


Am I right that wait_for_all is available only in corosync 2.x and not in 1.x?

You are correct, yes.

To do otherwise would be to risk a split-brain. Each node needs to know the
state of its peer in order to run services safely. When both start at the
same time, each knows what the other is doing. By disabling quorum, you
allow one node to continue to operate when the other leaves, but it still
needs that initial connection to know for sure what the peer is doing.


Does that apply to both corosync 1.x and 2.x, or only to 2.x with
wait_for_all? I ask because I was also confused about the precise
meaning of disabling quorum in pacemaker (setting
no-quorum-policy=ignore). So if I have a two-node cluster with
pacemaker 1.x and corosync 1.x, with no-quorum-policy=ignore and no
fencing, what happens when a single node starts?

Quorum tells the cluster that if a peer leaves (gracefully or was
fenced), the remaining node is allowed to continue providing services.

Stonith is needed to put a node that is in an unknown state into a known
state, be it because the node couldn't be reached when starting or because
the node stopped responding.

So quorum and stonith play rather different roles.

Without stonith, regardless of quorum, you risk split-brains and/or data
corruption. To operate a cluster without stonith is to operate a cluster
in an undetermined state, and it should never be done.


OK, let me try to rephrase. Is it possible to achieve the same effect as
wait_for_all in corosync 2.x with a combination of pacemaker 1.1.x and
corosync 1.x? I.e. ensure that the cluster does not come up *on the
first startup* until all nodes are present? So just make the cluster nodes
wait for the others to join instead of trying to stonith them?

No, not that I know of. To achieve the same behaviour, I wrote my own program[1] to do this. It is called on boot and waits for the peer to become reachable, then it starts the cluster stack. So the same effect is gained, but it's done outside corosync directly.

Note that I wrote it for corosync 1.x + cman + rgmanager, but the concepts port trivially.
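
To illustrate the idea (this is not the linked tool, just a minimal Python sketch of the same concept): block at boot until the peer answers, and only then start the cluster stack. Everything named here is an assumption for illustration: the peer host name, probing TCP port 22 as the "reachable" test, and the two init commands.

```python
import socket
import subprocess
import time


def peer_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_peer(host, port=22, interval=5.0, max_wait=None,
                  probe=peer_reachable, sleep=time.sleep):
    """Poll until the peer is reachable.

    Returns True once the probe succeeds, or False if max_wait
    seconds of sleeping have elapsed first. With max_wait=None it
    waits forever, which mirrors "do nothing until the peer is seen".
    """
    waited = 0.0
    while not probe(host, port):
        if max_wait is not None and waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
    return True


def start_cluster_stack(commands, run=subprocess.check_call):
    """Run each start command in order; check_call raises on failure."""
    for cmd in commands:
        run(cmd)


def main():
    # Hypothetical values; adjust to your own nodes and init system.
    peer = "node2.example.com"
    stack = [
        ["service", "corosync", "start"],
        ["service", "pacemaker", "start"],
    ]
    if wait_for_peer(peer):
        start_cluster_stack(stack)
```

Calling main() from a boot-time script, with the cluster services themselves disabled at boot, would give roughly the behaviour described above. Note that it only delays startup; it is no replacement for stonith once the cluster is running.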

digimer

1. https://github.com/digimer/an-cdb/blob/master/tools/safe_anvil_start

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?

_______________________________________________
Pacemaker mailing list: [email protected]
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
