Jan Friesse <[email protected]> writes:

> [email protected] writes:
>
>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day
>> (in August; in May, it happened 0-2 times a day only, it's slowly
>> ramping up):
>>
>> vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new
>> configuration.
>> vhbl03 corosync[3890]: [TOTEM ] A processor failed, forming new
>> configuration.
>> vhbl07 corosync[3805]: [MAIN ] Corosync main process was not scheduled
>> for 4317.0054 ms (threshold is 2400.0000 ms). Consider token timeout
>> increase.
>
> ^^^ This is main problem you have to solve. It usually means that
> machine is too overloaded. It is happening quite often when corosync
> is running inside VM where host machine is unable to schedule regular
> VM running.

After some extensive tracing, I think the problem lies elsewhere: my
IPMI watchdog device is slow beyond imagination. Its ioctl operations
can take seconds, starving all other functions. At least, it seems to
block the main thread of Corosync. Is this a plausible scenario?
Corosync has two threads; what are their roles?
--
Thanks,
Feri

_______________________________________________
Users mailing list: [email protected]
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
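
For anyone wanting to check a similar suspicion, here is a minimal sketch of how one might time a single blocking call and compare it against a threshold, in the spirit of the "not scheduled for ... ms (threshold is ... ms)" message quoted above. It is not Corosync code. The names timed_call and slow_device_call are hypothetical, and the sleep is only a stand-in for a slow device operation such as the watchdog ioctl; the sketch deliberately does not touch a real watchdog device.

```python
import time


def timed_call(fn, threshold_ms):
    """Run fn() and report how long it blocked the calling thread.

    Returns (result, elapsed_ms, exceeded), where exceeded is True
    when the call took longer than threshold_ms.
    """
    start = time.monotonic()  # monotonic clock: immune to wall-clock jumps
    result = fn()
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return result, elapsed_ms, elapsed_ms > threshold_ms


def slow_device_call():
    """Hypothetical stand-in for a slow blocking device operation."""
    time.sleep(0.05)  # pretend the device takes about 50 ms to answer
    return "ok"


result, elapsed_ms, too_slow = timed_call(slow_device_call, threshold_ms=10.0)
if too_slow:
    print(f"call blocked for {elapsed_ms:.1f} ms (threshold is 10.0 ms)")
```

If the real call regularly exceeds the threshold reported in the log (2400 ms here), that would be consistent with the blocking theory, though it would not by itself prove which thread is being held up.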
