Ken Gaillot <[email protected]> writes:

> On 02/07/2017 01:11 AM, Ulrich Windl wrote:
>
>> Ken Gaillot <[email protected]> writes:
>>
>>> On 02/06/2017 03:28 AM, Ulrich Windl wrote:
>>>
>>>> Isn't the question: Is crmd a process that is expected to die (and
>>>> thus need restarting)? Or wouldn't one prefer to debug this
>>>> situation. I fear that restarting it might just cover some fatal
>>>> failure...
>>>
>>> If crmd or corosync dies, the node will be fenced (if fencing is enabled
>>> and working). If one of the crmd's persistent connections (such as to
>>> the cib) fails, it will exit, so it ends up the same.
>>
>> But isn't it due to crmd not responding to network packets? So if the
>> timeout is long enough, and crmd is started fast enough, will the
>> node really be fenced?
>
> If crmd dies, it leaves its corosync process group, and I'm pretty sure
> the other nodes will fence it for that reason, regardless of the duration.

See http://lists.clusterlabs.org/pipermail/users/2016-March/002415.html
for a case when a Pacemaker cluster survived a crmd failure and restart.
Re-reading the thread, I'm still unsure what saved our ass from
resources being started in parallel and losing massive data. I'd fully
expect fencing in such cases...
-- 
Feri
