The nodes are called node-0 and node-1. The problem does not happen regularly, only occasionally: among the roughly 50 two-node clusters we have in house, I have seen it in the journal of 2 clusters. Looking at the logs, the pattern is this: Pacemaker and Corosync are stopped on node-1 and then started again. When Corosync starts, the two nodes don't see each other (judging by the log messages), and both start to complain about "Digest does not match". Node-1, where Corosync (and Pacemaker) was shut down and restarted, kills node-0. Then node-0 kills node-1. After that the cluster forms successfully.
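For checking the remaining clusters for the same pattern, something along these lines could be run on each node. This is only a rough sketch: it assumes Corosync logs to the systemd journal under the unit name `corosync`, and the function names `find_digest_errors` and `read_corosync_journal` are made up for illustration.

```python
import subprocess

# The message both nodes log, as quoted above.
PATTERN = "Digest does not match"


def find_digest_errors(lines, pattern=PATTERN):
    """Return the log lines that contain the digest-mismatch message."""
    return [line.rstrip("\n") for line in lines if pattern in line]


def read_corosync_journal(since="yesterday"):
    """Read recent journal entries for the corosync unit.

    Assumption: the unit is named "corosync" and journalctl is available.
    """
    result = subprocess.run(
        ["journalctl", "-u", "corosync", "--since", since, "--no-pager"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.splitlines()
```

Calling `find_digest_errors(read_corosync_journal())` on a node returns the matching journal lines, if any. Keeping the filter separate from the journal reader means it can also be applied to log files that were already collected.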
I didn't manage to reproduce it manually. It might also be connected to network problems inside a chassis (it turned out there might be an internal switch causing these problems). I am investigating this now.

_______________________________________________
Users mailing list: [email protected]
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
