[ClusterLabs] Antw: [EXT] Coming in Pacemaker 2.1.2: better display of internal failures

Ulrich Windl Wed, 20 Oct 2021 00:36:12 -0700

>>> Ken Gaillot <[email protected]> wrote on 19.10.2021 at 19:16 in
message
<[email protected]>:
> Hi all,
> 
> I hope to get the first release candidate for Pacemaker 2.1.2 out in a
> couple of weeks.
> 
> One improvement will be in status displays (crm_mon, and the
> crm_resource --force-* options) for failed actions.
> 
> OCF resource agents already have the ability to output an "exit reason"
> for failures. These are displayed in the status, to give more detailed
> information than just "error".
> 
> Now, Pacemaker will set exit reasons for internal failures as well.
> This includes problems such as an agent or systemd unit not being
> installed, timeouts in Pacemaker communication as opposed to the agent
> itself, an agent process being killed by a signal, etc.
> 
> As an example, sending a kill -9 to a running agent monitor would
> previously result in status with no explanation, requiring some log
> diving to figure it out:
> 
>  * rsc1_monitor_60000 on node1 'error' (1): call=188, status='Error',
> exitreason='', last-rc-change='Fri Sep 24 14:45:02 2021', queued=0ms,
> exec=0ms
> 
> Now, the exit reason will plainly say what happened:
> 
>  * rsc1_monitor_60000 on node1 'error' (1): call=188, status='Error',
> exitreason='Process interrupted by signal', last-rc-change='Fri Sep 24
> 14:45:02 2021', queued=0ms, exec=0ms


If you can detect that a process was terminated by a signal, you also know
_which_ signal, so why not log it as well?
Also: do you detect and log whether a core dump was created?

That would seem only logical to me.
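
To illustrate the point, here is a minimal sketch in Python (not Pacemaker's actual C code) of how the wait status returned for a child process already carries both the signal number and the core-dump flag. The function names describe_wait_status and spawn_and_kill, and the wording of the messages, are hypothetical and only serve the example; it assumes a POSIX system.

```python
import os
import signal
import sys


def describe_wait_status(status):
    """Turn a raw wait status (as returned by os.waitpid) into a readable reason."""
    if os.WIFSIGNALED(status):
        signum = os.WTERMSIG(status)
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number without a symbolic name on this platform
            name = "unknown"
        reason = f"Process terminated by signal {signum} ({name})"
        if os.WCOREDUMP(status):
            reason += ", core dumped"
        return reason
    if os.WIFEXITED(status):
        return f"Process exited with status {os.WEXITSTATUS(status)}"
    return "Process stopped or continued, not terminated"


def spawn_and_kill(signum=signal.SIGKILL):
    """Start a sleeping child, send it a signal, and return its raw wait status."""
    argv = [sys.executable, "-c", "import time; time.sleep(60)"]
    pid = os.posix_spawn(sys.executable, argv, dict(os.environ))
    os.kill(pid, signum)
    _, status = os.waitpid(pid, 0)
    return status


if __name__ == "__main__":
    print(describe_wait_status(spawn_and_kill()))
```

Run as a script on Linux, this should report signal 9 (SIGKILL) for the killed child. Whether a core dump is flagged depends on the signal and on the system's core-dump limits, so that part of the message is only appended when the status says so.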

Regards,
Ulrich


_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
