Hi everyone,

I have a 16-node asynchronous cluster configured with the Corosync redundant ring feature.
Each node has two similarly connected and configured NICs: one NIC is connected to the public network,
the other to our private VLAN. When I checked the status of the Corosync rings, I found:
# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id = 192.168.1.54
status = Marking ringid 0 interface 192.168.1.54 FAULTY
RING ID 1
id = 111.11.11.1
status = ring 1 active with no faults
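
For anyone who wants to watch for this state automatically, here is a minimal Python sketch that parses the output above and lists the rings reported as FAULTY. It assumes the output format is exactly as pasted in this post; the function names parse_ring_status and faulty_rings are hypothetical helpers, not part of Corosync.

```python
import re

# Sample text copied from the corosync-cfgtool -s output in this post.
SAMPLE = """\
Printing ring status.
Local node ID 1
RING ID 0
id = 192.168.1.54
status = Marking ringid 0 interface 192.168.1.54 FAULTY
RING ID 1
id = 111.11.11.1
status = ring 1 active with no faults
"""


def parse_ring_status(text):
    """Return {ring_number: {"id": address, "status": text, "faulty": bool}}."""
    rings = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"RING ID (\d+)$", line)
        if match:
            # A new ring section starts here.
            current = int(match.group(1))
            rings[current] = {"id": None, "status": None, "faulty": False}
            continue
        if current is None or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "id":
            rings[current]["id"] = value
        elif key == "status":
            rings[current]["status"] = value
            # Assumption: a faulty ring always has "FAULTY" in its status line.
            rings[current]["faulty"] = "FAULTY" in value
    return rings


def faulty_rings(text):
    """Return the sorted ring numbers whose status line contains FAULTY."""
    return sorted(n for n, info in parse_ring_status(text).items() if info["faulty"])


if __name__ == "__main__":
    print(faulty_rings(SAMPLE))
```

On a live node the text could come from subprocess.run(["corosync-cfgtool", "-s"], capture_output=True, text=True).stdout instead of the pasted sample.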
After some digging, I found that if I re-enable the failed ring with the command:
# corosync-cfgtool -r
RING ID 0 is marked as "active" for a few minutes, but after that it is marked as faulty again, permanently.
The log has no useful information, just a single message:
corosync[21740]: [TOTEM ] Marking ringid 0 interface 192.168.1.54 FAULTY
And there is no message like:
[TOTEM ] Automatically recovered ring 1
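
To check a whole log for these two kinds of message, a rough Python sketch like the following could count them per ring. The regular expressions are based only on the two message formats quoted in this post, so other Corosync versions may word them differently; summarize_ring_events is a hypothetical helper name, and the second sample line is illustrative (it is the kind of message I do not see in my log).

```python
import re
from collections import defaultdict

# Patterns derived from the two messages quoted in this post (assumption:
# the wording is stable for the Corosync version in use).
FAULTY_RE = re.compile(r"\[TOTEM\s*\] Marking ringid (\d+) interface (\S+) FAULTY")
RECOVERED_RE = re.compile(r"\[TOTEM\s*\] Automatically recovered ring (\d+)")


def summarize_ring_events(lines):
    """Count FAULTY and recovery messages per ring number."""
    counts = defaultdict(lambda: {"faulty": 0, "recovered": 0})
    for line in lines:
        match = FAULTY_RE.search(line)
        if match:
            counts[int(match.group(1))]["faulty"] += 1
            continue
        match = RECOVERED_RE.search(line)
        if match:
            counts[int(match.group(1))]["recovered"] += 1
    return {ring: dict(c) for ring, c in counts.items()}


# Illustrative input: the first line is from my log, the second is made up
# to show what a recovery message would look like.
SAMPLE_LOG = [
    "corosync[21740]: [TOTEM ] Marking ringid 0 interface 192.168.1.54 FAULTY",
    "corosync[21740]: [TOTEM ] Automatically recovered ring 1",
]

if __name__ == "__main__":
    print(summarize_ring_events(SAMPLE_LOG))
```

With the logfile setting from my configuration, the input could be an open file handle on /var/log/cluster/corosync.log, since the function only iterates over lines.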
My corosync.conf looks like this:
compatibility: whitetank
totem {
    version: 2
    secauth: on
    threads: 4
    rrp_mode: passive
    interface {
        member {
            memberaddr: PRIVATE_IP_1
        }
        ...
        member {
            memberaddr: PRIVATE_IP_16
        }
        ringnumber: 0
        bindnetaddr: PRIVATE_NET_ADDR
        mcastaddr: 226.0.0.1
        mcastport: 5505
        ttl: 1
    }
    interface {
        member {
            memberaddr: PUBLIC_IP_1
        }
        ...
        member {
            memberaddr: PUBLIC_IP_16
        }
        ringnumber: 1
        bindnetaddr: PUBLIC_NET_ADDR
        mcastaddr: 224.0.0.1
        mcastport: 5405
        ttl: 1
    }
    transport: udpu
logging {
    to_stderr: no
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    logfile_priority: info
    to_syslog: yes
    syslog_priority: warning
    debug: on
    timestamp: on
}
I tried changing rrp_mode and the mcastaddr/mcastport for ringnumber: 0,
but the result was the same.
I checked multicast and unicast connectivity with the omping utility and didn't find any issues.
No errors were found on the network equipment for our private VLAN either.

Why did Corosync decide to permanently disable the second ring? How can I debug this issue?
Other properties:
Corosync Cluster Engine, version '1.4.7'

Pacemaker properties:
cluster-infrastructure: cman
cluster-recheck-interval: 5min
dc-version: 1.1.14-8.el6-70404b0
expected-quorum-votes: 3
have-watchdog: false
last-lrm-refresh: 1484068350
maintenance-mode: false
no-quorum-policy: ignore
pe-error-series-max: 1000
pe-input-series-max: 1000
pe-warn-series-max: 1000
stonith-action: reboot
stonith-enabled: false
symmetric-cluster: false

Thank you.

--
Regards
Denis Gribkov
