Source: corosync
Version: 3.0.1-2+deb10u1
Severity: important

Hi,

Some weeks ago I upgraded corosync (3.0.1-2 -> 3.0.1-2+deb10u1) and started
to notice these messages on my nodes (two-node cluster):

Jun 2 01:10:13 patty corosync[2346]: [KNET ] link: host: 2 link: 0 is down
Jun 2 01:10:13 patty corosync[2346]: [KNET ] host: host: 2 (passive) best link: 1 (pri: 1)
Jun 2 01:10:14 patty corosync[2346]: [KNET ] rx: host: 2 link: 0 is up
Jun 2 01:10:14 patty corosync[2346]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Jun 3 03:11:07 patty corosync[2346]: [KNET ] link: host: 2 link: 1 is down
Jun 3 03:11:07 patty corosync[2346]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Jun 3 03:11:08 patty corosync[2346]: [KNET ] rx: host: 2 link: 1 is up
Jun 3 03:11:08 patty corosync[2346]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)

Notice that the failure happens on both links. One of the links is a
cross-over cable. The other uses a bond with two interfaces.

These errors are more common on one of the nodes than on the other.
Sometimes they match (both nodes log the link failure), but most of the
time only one node complains:

Jun 4 01:16:23 selma corosync[52890]: [KNET ] link: host: 1 link: 0 is down
Jun 4 01:16:23 selma corosync[52890]: [KNET ] host: host: 1 (passive) best link: 1 (pri: 1)
Jun 4 01:16:24 selma corosync[52890]: [KNET ] rx: host: 1 link: 0 is up
Jun 4 01:16:24 selma corosync[52890]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)

Jun 4 01:16:55 patty corosync[2346]: [KNET ] link: host: 2 link: 0 is down
Jun 4 01:16:55 patty corosync[2346]: [KNET ] host: host: 2 (passive) best link: 1 (pri: 1)
Jun 4 01:16:56 patty corosync[2346]: [KNET ] rx: host: 2 link: 0 is up
Jun 4 01:16:56 patty corosync[2346]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)

Here's my config:

totem {
    version: 2
    cluster_name: web
    crypto_cipher: none
    crypto_hash: none
    interface {
        linknumber: 0
    }
    interface {
        linknumber: 1
    }
}

logging {
    fileline: off
    to_stderr: yes
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    to_syslog: yes
    debug: off
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}

quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
}

nodelist {
    node {
        name: patty
        nodeid: 1
        ring0_addr: 192.168.144.1
        ring1_addr: 10.10.1.5
    }
    node {
        name: selma
        nodeid: 2
        ring0_addr: 192.168.144.2
        ring1_addr: 10.10.1.6
    }
}

Any help is appreciated.

Thanks,
Alberto

-- System Information:
Debian Release: bullseye/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.6.0-1-amd64 (SMP w/4 CPU cores)
Kernel taint flags: TAINT_FIRMWARE_WORKAROUND
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE= (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
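
P.S. In case it helps with triage, here is a minimal sketch (plain Python, not part of corosync) of how the "is down" events quoted above could be tallied per reporting node and link. The regular expression assumes the syslog layout shown in this report; the names count_link_downs and SAMPLE are illustrative only, and SAMPLE holds a few lines copied from the logs above.

```python
import re
from collections import Counter

# Matches only the "link: host: N link: M is down" lines quoted in this report.
# Assumption: syslog layout "<month> <day> <time> <node> corosync[pid]: [KNET ] ...".
DOWN_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]{8}\s+(?P<node>\S+)\s+corosync\[\d+\]:\s+"
    r"\[KNET\s*\]\s+link: host: (?P<host>\d+) link: (?P<link>\d+) is down"
)


def count_link_downs(lines):
    """Count 'is down' events per (reporting node, remote host id, link number)."""
    counts = Counter()
    for line in lines:
        m = DOWN_RE.match(line.strip())
        if m:
            key = (m.group("node"), int(m.group("host")), int(m.group("link")))
            counts[key] += 1
    return counts


# A few lines copied from the logs in this report.
SAMPLE = [
    "Jun 2 01:10:13 patty corosync[2346]: [KNET ] link: host: 2 link: 0 is down",
    "Jun 2 01:10:14 patty corosync[2346]: [KNET ] rx: host: 2 link: 0 is up",
    "Jun 3 03:11:07 patty corosync[2346]: [KNET ] link: host: 2 link: 1 is down",
    "Jun 4 01:16:23 selma corosync[52890]: [KNET ] link: host: 1 link: 0 is down",
    "Jun 4 01:16:55 patty corosync[2346]: [KNET ] link: host: 2 link: 0 is down",
]

if __name__ == "__main__":
    for (node, host, link), n in sorted(count_link_downs(SAMPLE).items()):
        print(f"{node}: host {host} link {link} went down {n} time(s)")
```

Feeding it the full syslog instead of SAMPLE (for example, the lines of an open log file) would show whether one node or one link accounts for most of the flaps.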