If you have a large cluster with many nodes or resources, you may have seen Pacemaker become unresponsive, with a log message about "evicting client..." around the same time. This can happen when a sudden spike of IPC messages between daemons creates a backlog that cannot be handled fast enough. When the backlog gets too large, Pacemaker assumes that the daemon has died and restarts it.
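To make the failure mode above concrete, here is a toy model in Python. It is not Pacemaker code: the simulate function, its parameters, and all the numbers are hypothetical and only illustrate how a receiver that keeps processing messages can still cross a fixed backlog threshold during a spike.

```python
from dataclasses import dataclass


@dataclass
class Result:
    evicted: bool    # True if the backlog threshold was crossed
    tick: int        # tick at which the simulation stopped
    processed: int   # messages the receiver handled before stopping
    backlog: int     # messages still queued when the simulation stopped


def simulate(arrivals, drain_per_tick, backlog_limit):
    """Toy model: evict the receiver once its backlog exceeds backlog_limit.

    arrivals: messages sent to the receiver on each tick (made-up workload)
    drain_per_tick: messages the receiver can process per tick
    backlog_limit: stand-in for a fixed backlog threshold (hypothetical value)
    """
    backlog = 0
    processed = 0
    for tick, sent in enumerate(arrivals):
        backlog += sent
        if backlog > backlog_limit:
            # The receiver is alive and has been making progress,
            # but it is treated as dead because the queue is too long.
            return Result(True, tick, processed, backlog)
        handled = min(backlog, drain_per_tick)
        backlog -= handled
        processed += handled
    return Result(False, len(arrivals) - 1, processed, backlog)


# Steady load: the receiver keeps up and is never evicted.
steady = simulate([40] * 10, drain_per_tick=50, backlog_limit=500)

# Short spike: the receiver is still working, but the backlog crosses the limit.
spike = simulate([40, 40, 300, 300, 300, 40, 40], drain_per_tick=50, backlog_limit=500)

print(steady)
print(spike)
```

In the spike run the receiver has already handled 130 messages when it is cut off, which is the situation where treating it as dead is the wrong call.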
Most of the time, that assumption is wrong - the daemon has not died, it's just not processing IPC messages as fast as the other end of the connection is sending them, so its backlog grows. One way to avoid this problem is the cluster-ipc-limit cluster option, but you need to know to set it beforehand, and the backlog can always grow beyond whatever value you choose.

Starting with Pacemaker 3.0.2, the daemons will no longer be subject to cluster-ipc-limit or to eviction as long as we can detect that they are still processing messages. Other IPC clients will still be subject to these restrictions - we don't believe a client (which could be a command line program like crm_mon or a third-party application) should be allowed to crash a daemon. Additionally, a daemon can still be evicted if it has truly crashed or is taking a very long time to process a single message.

The majority of users should never run into the eviction problem in the first place. For those that do, these changes should result in improved cluster stability. If you've set cluster-ipc-limit at some point, you may want to experiment with disabling it in 3.0.2, though leaving it set won't cause any harm.

- Chris
