On Thu, 2021-07-08 at 10:00 +0200, Ulrich Windl wrote:
> >>> Amol Shinde <[email protected]> wrote on 08.07.2021 at 08:58 in message
> <mw3pr20mb3385122ebac1ab9282c3f91ae9...@mw3pr20mb3385.namprd20.prod.outlook.com>
>
> > Hello everyone!!!
> > Hope you are doing well.
> > I need some help regarding pacemaker alerts. I have a 36-node
> > cluster setup with some IP and dummy resources. I have also
> > deployed an alert script for the cluster that monitors the node and
> > resources and generates alerts on events occurrence. The alert
> > script is present on all nodes and sends the captured alert to a
> > Web-UI using a message bus. So, for example, when a node goes
> > offline pacemaker triggers the alert agent script on other nodes in
> > the cluster and logs the event as "Node is lost". This message is
> > then sent to the message bus by the script.
> >
> > The problem is that since the alert is triggered on every node the
> > agent script sends multiple duplicate log messages to the message
> > bus. Multiple duplicate log messages from all the live nodes are
> > reported to the Web-UI thus clogging up the interface and making
> > parsing through it difficult and ruining the user experience.
> >
> > Is there any way in the pacemaker itself through which when an
> > event occurs the pacemaker calls the agent on any one node and logs
> > the message rather than calling the agent on all live nodes within
> > the cluster? For example, when a node goes offline, the agent is
> > triggered on any one of the live nodes on the cluster thus
> > generating one log, rather than generating multiple duplicate logs
> > for the same event.

Not currently. It's not straightforward -- cluster partitions can
happen in many ways besides just one node leaving (splitting into two
active partitions, every node in its own partition, etc.). Pacemaker
coordinates nodes within a partition by electing a DC, but that could
unnecessarily delay alerts. Basically we decided that it's up to
whatever is receiving the alerts to de-duplicate them.

> If there were (actually I don't know) a cluster-wide "event-ID" (e.g.
> sequence number) and that event ID would be passed to the alerting
> function, then you'd still create multiple events, but the backend
> could suppress multiple events about the same event ID.

No, there isn't. There's a CRM_alert_node_sequence passed to the agent,
but it's node-local, so the agent can reliably detect the order of
alerts on a single node.

A timestamp is also passed to the agent, both in a format specified by
the user and in seconds and microseconds since the epoch, so if the
clocks are closely synchronized, it should be feasible to de-duplicate
on the receiving end.

> Regards,
> Ulrich

-- 
Ken Gaillot <[email protected]>
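
For illustration, here is a rough sketch of what de-duplicating on the
receiving end could look like. It is not part of Pacemaker. It assumes
the agent forwards the alert environment variables to the message bus
unchanged, that the epoch timestamp arrives as
CRM_alert_timestamp_epoch and CRM_alert_timestamp_usec, and that the
other fields (CRM_alert_kind, CRM_alert_node, CRM_alert_rsc,
CRM_alert_task, CRM_alert_desc) are identical on every reporting node;
check the alerts documentation for your Pacemaker version before
relying on any of these. The names event_from_env and Deduplicator and
the 2-second window are made up for the example.

```python
from typing import Dict, Mapping, Tuple

# Assumption: these fields describe *what* happened and are the same on
# every node that reports the event. Adjust for your Pacemaker version.
KEY_FIELDS = (
    "CRM_alert_kind",
    "CRM_alert_node",
    "CRM_alert_rsc",
    "CRM_alert_task",
    "CRM_alert_desc",
)


def event_from_env(env: Mapping[str, str]) -> dict:
    """Agent side: reduce the alert environment to a key and a timestamp."""
    seconds = int(env.get("CRM_alert_timestamp_epoch", "0"))
    usec = int(env.get("CRM_alert_timestamp_usec", "0"))
    return {
        "key": tuple(env.get(name, "") for name in KEY_FIELDS),
        "time": seconds + usec / 1_000_000,
    }


class Deduplicator:
    """Receiver side: drop events that repeat a key within a time window."""

    def __init__(self, window: float = 2.0) -> None:
        # The window should exceed the worst expected clock skew between nodes.
        self.window = window
        self._first_seen: Dict[Tuple[str, ...], float] = {}

    def is_duplicate(self, event: dict) -> bool:
        key, when = event["key"], event["time"]
        first = self._first_seen.get(key)
        # abs() tolerates copies that arrive out of order.
        if first is not None and abs(when - first) <= self.window:
            return True
        # New event, or the same kind of event recurring later on.
        self._first_seen[key] = when
        return False
```

In the agent you would send event_from_env(os.environ) to the bus, and
the consumer would call is_duplicate() on each message before showing
it in the Web-UI. A long-running consumer would also need to expire old
keys, which is left out here. If the clocks are not closely
synchronized, the window has to grow, and genuinely repeated events
inside it will be suppressed as well.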
