[email protected] (Ferenc Wágner) writes:

> Jan Friesse <[email protected]> writes:
>
>> [email protected] writes:
>>
>>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day
>>> (in August; in May, it happened 0-2 times a day only, it's slowly
>>> ramping up):
>>>
>>> vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new
>>> configuration.
>>> vhbl03 corosync[3890]: [TOTEM ] A processor failed, forming new
>>> configuration.
>>> vhbl07 corosync[3805]: [MAIN ] Corosync main process was not scheduled
>>> for 4317.0054 ms (threshold is 2400.0000 ms). Consider token timeout
>>> increase.
>>
>> ^^^ This is main problem you have to solve. It usually means that
>> machine is too overloaded. It is happening quite often when corosync
>> is running inside VM where host machine is unable to schedule regular
>> VM running.
>
> After some extensive tracing, I think the problem lies elsewhere: my
> IPMI watchdog device is slow beyond imagination.

Confirmed: setting watchdog_device: off cluster wide got rid of the
above warnings.

> Its ioctl operations can take seconds, starving all other functions.
> At least, it seems to block the main thread of Corosync. Is this a
> plausible scenario?

Corosync has two threads, what are their roles?
-- 
Regards,
Feri

_______________________________________________
Users mailing list: [email protected]
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
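
To illustrate the suspected mechanism, here is a minimal Python sketch of a single-threaded loop in which one blocking call delays the next iteration beyond a threshold. It is a toy model only, not Corosync's code. The names run_loop and slow_watchdog_tickle, the 100 ms threshold and the 0.3 s delay are all hypothetical, and time.sleep merely stands in for a slow watchdog ioctl.

```python
import time

THRESHOLD_MS = 100.0  # hypothetical; the log above reports a 2400 ms threshold


def slow_watchdog_tickle(delay_s):
    """Stand-in for a watchdog keepalive call that blocks for delay_s seconds."""
    time.sleep(delay_s)


def run_loop(iterations, tick_s, tickle_delays):
    """Run a single-threaded loop and return the gaps (in ms) above THRESHOLD_MS.

    tickle_delays maps an iteration index to how long the simulated
    watchdog call blocks on that iteration (default: not at all).
    """
    gaps = []
    last = time.monotonic()
    for i in range(iterations):
        time.sleep(tick_s)  # normal wait until the next timer tick
        slow_watchdog_tickle(tickle_delays.get(i, 0.0))
        now = time.monotonic()
        elapsed_ms = (now - last) * 1000.0
        if elapsed_ms > THRESHOLD_MS:
            gaps.append(elapsed_ms)
            print(f"main loop was not scheduled for {elapsed_ms:.1f} ms "
                  f"(threshold is {THRESHOLD_MS:.1f} ms)")
        last = now
    return gaps


if __name__ == "__main__":
    # Only iteration 2 hits the slow call, so only one warning is expected.
    run_loop(iterations=5, tick_s=0.01, tickle_delays={2: 0.3})
```

The warning fires without any CPU overload: everything else in the loop simply waits behind the one slow call, which would match the symptom going away once the watchdog is turned off.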
