On 02/07/2017 01:11 AM, Ulrich Windl wrote:
>>>> Ken Gaillot <[email protected]> wrote on 06.02.2017 at 16:13 in
> message
> <[email protected]>:
>> On 02/06/2017 03:28 AM, Ulrich Windl wrote:
>>>>>> RaSca <[email protected]> wrote on 03.02.2017 at 14:00 in
>>> message
>>> <[email protected]>:
>>>
>>>> On 03/02/2017 11:06, Ferenc Wágner wrote:
>>>>> Ken Gaillot <[email protected]> writes:
>>>>>
>>>>>> On 01/10/2017 04:24 AM, Stefan Schloesser wrote:
>>>>>>
>>>>>>> I am currently testing a 2 node cluster under Ubuntu 16.04. The setup
>>>>>>> seems to be working ok including the STONITH.
>>>>>>> For test purposes I issued a "pkill -f pace" killing all pacemaker
>>>>>>> processes on one node.
>>>>>>>
>>>>>>> Result:
>>>>>>> The node is marked as "pending", all resources stay on it. If I
>>>>>>> manually kill a resource it is not noticed. On the other node a drbd
>>>>>>> "promote" command fails (drbd is still running as master on the first
>>>>>>> node).
>>>>>>
>>>>>> I suspect that, when you kill pacemakerd, systemd respawns it quickly
>>>>>> enough that fencing is unnecessary. Try "pkill -f pace; systemctl stop
>>>>>> pacemaker".
>>>>>
>>>>> What exactly is "quickly enough"?
>>>>
>>>> What Ken is saying is that Pacemaker, as a service managed by systemd,
>>>> has in its service definition file
>>>> (/usr/lib/systemd/system/pacemaker.service) this option:
>>>>
>>>> Restart=on-failure
>>>>
>>>> Looking at [1] it is explained: systemd immediately restarts the process
>>>> if it ends for some unexpected reason (like a forced kill).
>>>
>>> Isn't the question: Is crmd a process that is expected to die (and thus need
>>> restarting)? Or wouldn't one prefer to debug this situation? I fear that
>>> restarting it might just cover some fatal failure...
>>
>> If crmd or corosync dies, the node will be fenced (if fencing is enabled
>> and working). If one of the crmd's persistent connections (such as to
>> the cib) fails, it will exit, so it ends up the same.
>> But the other
>
> But isn't it due to crmd not responding to network packets? So if the timeout
> is long enough, and crmd is started fast enough, will the node really be
> fenced?

If crmd dies, it leaves its corosync process group, and I'm pretty sure
the other nodes will fence it for that reason, regardless of the duration.

>> daemons (such as pacemakerd or attrd) can die and respawn without any
>> risk to services.
>>
>> The failure will be logged, but it will not be reported in cluster
>> status, so there is a chance of not noticing it.
>
> I don't understand: A node is fenced, but it will not be noted in the cluster
> status???

I meant the case where pacemakerd or attrd respawns quickly. The node is
not fenced in that case, and the only indication will be in the logs.
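
Since the logs are the only place such a respawn shows up, one rough way
to avoid missing it is to filter the log for respawn messages. The sketch
below is only an illustration: the function name find_respawn_lines and
the match pattern are assumptions on my part, not anything Pacemaker
ships, and the exact wording of the log message may differ between
versions, so adjust the pattern to what your logs actually contain.

```python
import re
from typing import Iterable, List

# Assumption: a respawn is logged with the word "respawn" somewhere in the
# message. Check your own logs and adjust this pattern as needed.
RESPAWN_PATTERN = re.compile(r"respawn", re.IGNORECASE)


def find_respawn_lines(
    lines: Iterable[str], pattern: "re.Pattern[str]" = RESPAWN_PATTERN
) -> List[str]:
    """Return the log lines (without trailing newline) that match the pattern."""
    return [line.rstrip("\n") for line in lines if pattern.search(line)]
```

It could be fed from a log file opened in Python, or from the output of
"journalctl -u pacemaker" on a systemd host, with any hit raised as an
alert by whatever monitoring is already in place.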
