Re: [ClusterLabs] Antw: Re: Antw: Re: Pacemaker kill does not cause node fault ???

Ferenc Wágner Wed, 08 Feb 2017 00:52:42 -0800

Ken Gaillot <[email protected]> writes:

> On 02/07/2017 01:11 AM, Ulrich Windl wrote:
>
>> Ken Gaillot <[email protected]> writes:
>>
>>> On 02/06/2017 03:28 AM, Ulrich Windl wrote:
>>>
>>>> Isn't the question: Is crmd a process that is expected to die (and
>>>> thus need restarting)? Or wouldn't one prefer to debug this
>>>> situation? I fear that restarting it might just cover some fatal
>>>> failure...
>>>
>>> If crmd or corosync dies, the node will be fenced (if fencing is enabled
>>> and working). If one of the crmd's persistent connections (such as to
>>> the cib) fails, it will exit, so it ends up the same.
>> 
>> But isn't it due to crmd not responding to network packets? So if the
>> timeout is long enough, and crmd is started fast enough, will the
>> node really be fenced?
>
> If crmd dies, it leaves its corosync process group, and I'm pretty sure
> the other nodes will fence it for that reason, regardless of the duration.


See http://lists.clusterlabs.org/pipermail/users/2016-March/002415.html
for a case where a Pacemaker cluster survived a crmd failure and restart.
Re-reading the thread, I'm still unsure what saved our ass from
resources being started in parallel and losing massive data.  I'd fully
expect fencing in such cases...
-- 
Feri

_______________________________________________
Users mailing list: [email protected]
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
