On 07/25/2016 04:56 PM, Thomas Lamprecht wrote:
> Thanks for the fast reply :)
>
>
> On 07/25/2016 03:51 PM, Christine Caulfield wrote:
>> On 25/07/16 14:29, Thomas Lamprecht wrote:
>>> Hi all,
>>>
>>> I'm currently testing the new features of corosync 2.4, especially
>>> qdevices. First tests show quite nice results, such as keeping quorum
>>> on the single node left out of a three-node cluster.
>>>
>>> What worries me a bit, though, is what happens if the server where
>>> qnetd runs, or the qdevice daemon, fails. In that case the cluster
>>> cannot afford any other loss of a node in my three-node setup, as
>>> expected votes are 5 and thus 3 are needed for quorum, which I cannot
>>> fulfil if qnetd does not run or has failed.
>> We're looking into ways of making this more resilient. It might be
>> possible to cluster a qnetd (though this is not currently supported) in
>> a separate cluster from the arbitrated one, obviously.
>
> Yeah, I saw that in the QDevice document; that would be a way.
>
> The qnetd daemons would then act as a cluster of their own, I guess, as
> there would be a need to communicate which node sees which qnetd
> daemon, so that a decision about the quorate partition can be made.
>
> But it still binds the reliability of the cluster to that of a single
> node, adding a dependency: failures of components outside the cluster,
> which would otherwise have no effect on cluster behaviour, may now
> affect it, which could be a problem.
> I know that's a worst-case scenario, but with only one qnetd running
> on a single (external) node it can happen, and if the reliability of
> the node running qnetd is the same as that of each cluster node, the
> reliability of the whole cluster in the three-node case would be,
> quite simplified (if I remember my introduction course to this topic
> somewhat correctly):
>
> Without qnetd: 1 - ((1 - R1) * (1 - R2) * (1 - R3))
>
> With qnetd:    (1 - ((1 - R1) * (1 - R2) * (1 - R3))) * Rqnetd
>
> Where R1, R2, R3 are the reliabilities of the cluster nodes and
> Rqnetd is the reliability of the node running qnetd.
> While that's a really, really simplified model that does not quite
> correctly depict reality, the basic concept is that the reliability of
> the whole cluster becomes dependent on that of the node running qnetd,
> or?
> With lms and ffsplit I guess the calculation is not that simple anymore ...
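Your simplified model can be sanity-checked numerically. Here is a quick sketch in plain Python; the reliability values are made-up placeholders, and it ignores majority requirements just as the formulas above do:

```python
# Numeric check of the simplified reliability model quoted above.
# Without qnetd: the cluster "survives" as long as not all nodes fail.
# With qnetd: additionally the qnetd host itself must be up.

def cluster_reliability(node_reliabilities):
    """Probability that not all nodes fail simultaneously."""
    p_all_fail = 1.0
    for r in node_reliabilities:
        p_all_fail *= (1.0 - r)
    return 1.0 - p_all_fail

nodes = [0.99, 0.99, 0.99]   # R1, R2, R3 (placeholder values)
r_qnetd = 0.99               # Rqnetd (placeholder value)

without_qnetd = cluster_reliability(nodes)
with_qnetd = cluster_reliability(nodes) * r_qnetd

print(f"without qnetd: {without_qnetd:.6f}")
print(f"with qnetd:    {with_qnetd:.6f}")
```

With equal reliabilities the second figure is always the smaller one, which is exactly the dependency you describe: the qnetd host becomes a factor in the whole cluster's reliability.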
Correct me if I'm wrong, but I think a bottom line to understanding the
benefits of qdevice is this: classic quorum generation basically takes
a snapshot of the situation at a certain point in time and derives its
reactions from that, whereas qdevice tries to benefit from knowledge of
the past (that is, of how we got into the current situation).

>>
>> The LMS algorithm is quite smart about how it doles out its vote and can
>> handle isolation from the main qnetd provided that the main core of the
>> cluster (the majority in a split) retains quorum, but any more serious
>> changes to the cluster config will cause it to be withdrawn. So in this
>> case you should find that your 3 node cluster will continue to work in
>> the absence of the qnetd server or link, provided you don't lose any
>> nodes.
>
> Yes, I read that in the documents and saw it during testing too;
> really good work!
>
> The point of my mail was exactly the failure of qnetd itself and the
> resulting situation that the cluster then cannot afford to lose any
> node, while without qnetd it could afford to lose (n - 1) / 2 nodes.
>
> Or do I also have to enable quorum.last_man_standing together with
> quorum.wait_for_all to allow scaling down the expected votes if qnetd
> fails completely? I will test that.
>
> I just want to be sure that my thoughts are correct, or at least not
> completely flawed: qnetd as it is makes sense in an even-node-count
> cluster with the ffsplit algorithm, but not in an uneven-node-count
> cluster, unless the reliability of the node running qnetd can be
> guaranteed, e.g. by adding HA to the service (VM or container) running
> qnetd.
>
> best regards,
> Thomas
>
>>
>> In a 3 node setup obviously LMS is more appropriate than ffsplit anyway.
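To make the vote arithmetic in this thread concrete, here is a toy calculation (not corosync code; it just applies the numbers discussed above, with the qdevice contributing node count - 1 votes in the uneven case):

```python
# Toy illustration of the vote arithmetic discussed in this thread.

def quorum_threshold(expected_votes):
    """Simple majority: more than half of the expected votes."""
    return expected_votes // 2 + 1

node_count = 3
qdevice_votes = node_count - 1          # 2 votes from the qdevice

expected = node_count + qdevice_votes   # 5 expected votes
needed = quorum_threshold(expected)     # 3 votes required for quorum

# With qnetd down, only the 3 node votes remain: all of them are
# needed to reach 3, so no node failure can be tolerated.
tolerated_with_qnetd_down = node_count - needed

# Without any qdevice, expected votes would be 3 and quorum 2,
# tolerating (n - 1) // 2 = 1 node failure.
tolerated_without_qdevice = node_count - quorum_threshold(node_count)

print(needed, tolerated_with_qnetd_down, tolerated_without_qdevice)
```

This is the asymmetry being discussed: with qnetd failed, the three-node cluster tolerates zero further failures, whereas without a qdevice at all it would tolerate one.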
>>
>> Chrissie
>>
>>> So in this case I'm bound to the reliability of the server providing
>>> the qnetd service; if it fails I cannot afford to lose any other node
>>> in my three-node example, or in any other example with an uneven node
>>> count, as the qdevice vote subsystem provides (node count - 1) votes.
>>>
>>> So if I see it correctly, QDevices only make sense with even node
>>> counts, maybe especially two-node setups: if qnetd works we have one
>>> more node which may fail, and if qnetd fails we are as good as
>>> without it, as qnetd provides only one vote here.
>>>
>>> Am I missing something, or any thoughts on that?
>>>
>>>
>>>
>>> _______________________________________________
>>> Users mailing list: [email protected]
>>> http://clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
