lkxjtu,



Corosync.log has kept printing the following logs for several days. What's 
wrong with the corosync cluster? Now the cpu load is not high.

The interesting messages from the logs you sent are:

Sep 30 01:23:28 [127667] paas-controller-172-21-0-2 corosync warning [MAIN ] timer_function_scheduler_timeout Corosync main process was not scheduled for 10470.3652 ms (threshold is 2400.0000 ms). Consider token timeout increase.

and

Sep 30 01:23:29 [127667] paas-controller-172-21-0-2 corosync notice [TOTEM ] pause_flush Process pause detected for 8760 ms, flushing membership messages.
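
To see how often and for how long this happens over several days, the two message types above can be pulled out of corosync.log with a small script. This is only a rough sketch: the function name find_pauses and the regular expressions are my own, written against the two lines quoted above, and may need adjusting for other corosync versions or log formats.

```python
import re

# Matches the "[MAIN ]" scheduling warning quoted above.
SCHED_RE = re.compile(
    r"not scheduled for (?P<pause>[\d.]+) ms "
    r"\(threshold is (?P<threshold>[\d.]+) ms\)"
)
# Matches the "[TOTEM ]" pause notice quoted above.
PAUSE_RE = re.compile(r"Process pause detected for (?P<pause>\d+) ms")


def find_pauses(lines):
    """Return (kind, pause_ms, threshold_ms) tuples; threshold is None for TOTEM lines."""
    found = []
    for line in lines:
        m = SCHED_RE.search(line)
        if m:
            found.append(
                ("sched", float(m.group("pause")), float(m.group("threshold")))
            )
            continue
        m = PAUSE_RE.search(line)
        if m:
            found.append(("totem", float(m.group("pause")), None))
    return found
```

It could be used as find_pauses(open(path)) where path points at your corosync.log (the location depends on your logging configuration). Pauses that are several times larger than the threshold, as in the lines above, suggest the node is being starved of CPU time rather than a token timeout that is merely a little too tight.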


This means that corosync was unable to get the CPU time it needed to run. This can happen because of:
- (Most often) the cluster is running in highly overloaded VMs (quite often in cloud environments)
- Corosync doesn't have RT priority, or another RT-priority task is using most of the CPU time
- I/O problem
- Misbehaving watchdog device
- Bug in corosync
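
The RT priority point in the list above can be checked on a running node. A minimal sketch, assuming Linux and Python 3; the helper names is_realtime and process_is_realtime are hypothetical, and only os.sched_getscheduler and the os.SCHED_* constants come from the standard library:

```python
import os

# Real-time scheduling policies on Linux.
REALTIME_POLICIES = {os.SCHED_FIFO, os.SCHED_RR}


def is_realtime(policy):
    """True if the scheduling policy value is a real-time one."""
    # The kernel may OR the reset-on-fork flag into the returned value.
    return (policy & ~os.SCHED_RESET_ON_FORK) in REALTIME_POLICIES


def process_is_realtime(pid):
    """Ask the kernel for the scheduling policy of pid (Linux only)."""
    return is_realtime(os.sched_getscheduler(pid))
```

Calling process_is_realtime with the PID of the corosync process (127667 in the log lines above, though it changes after a restart) returns False if corosync is running without RT scheduling. A True result does not rule out the second half of that point, since another RT task may still be taking most of the CPU time.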

Honza


Cluster version information:
[root@paas-controller-172-167-40-24:~]$ rpm -q corosync
corosync-2.4.0-9.el7_4.2.x86_64
[root@paas-controller-172-167-40-24:~]$ rpm -q pacemaker
pacemaker-1.1.16-12.el7_4.2.x86_64



Sep 30 01:23:27 [128232] paas-controller-172-21-0-2        cib:     info: crm_cs_flush: Sent 0 CPG messages  (13 remaining, last=363): Try again (6)
...
_______________________________________________
Users mailing list: [email protected]
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
