Re: [ClusterLabs] Antw: Corosync ring marked as FAULTY

bliu Wed, 22 Feb 2017 00:07:31 -0800

Hi, Denis

Could you run tcpdump with the filter "udp port 5505" on the private network to see whether any packets arrive?
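A minimal sketch of such a capture, wrapped in a small helper so the command can be previewed before it is run. The function name capture_ring, the DRY_RUN switch, and the interface name eth1 are assumptions for illustration only; substitute the NIC that is attached to the private VLAN. Port 5505 is the mcastport of ring 0 from the configuration quoted below.

```shell
# Hypothetical helper: build a tcpdump capture for one ring's UDP port.
# With DRY_RUN=1 the command is only printed, not executed.
capture_ring() {
    iface=$1          # NIC on the ring's network (assumption: e.g. eth1)
    port=$2           # mcastport of the ring, e.g. 5505 for ring 0
    count=${3:-20}    # stop after this many packets
    # -n: no name resolution, -i: interface, -c: packet count
    set -- tcpdump -n -i "$iface" -c "$count" udp port "$port"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, "DRY_RUN=1 capture_ring eth1 5505" prints the command, and "capture_ring eth1 5505" (as root) runs it. Running it on two nodes at once should show whether packets leave one node and arrive at the other.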



On 02/22/2017 03:47 PM, Denis Gribkov wrote:

In our case it does not create problems, since all nodes are located in a few networks which are served by a single router.

There are also no errors detected on public ring 1, unlike private ring 0.

I suspect that this error could be related to the private VLAN settings, but unfortunately I have no good idea how to find the issue.


On 22/02/17 09:37, Ulrich Windl wrote:

Is "ttl 1" a good idea for a public network?

Denis Gribkov <[email protected]> wrote on 21.02.2017 at 18:26 in message

<[email protected]>:

Hi Everyone.

I have a 16-node asynchronous cluster configured with the Corosync redundant
ring feature.

Each node has 2 similarly connected/configured NICs. One NIC is connected
to the public network, the other to our private VLAN. When I checked the
operability of the Corosync rings I found:

# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
          id      = 192.168.1.54
          status  = Marking ringid 0 interface 192.168.1.54 FAULTY
RING ID 1
          id      = 111.11.11.1
          status  = ring 1 active with no faults
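To run the same check across all 16 nodes, the status output can be reduced to just the faulty ring IDs. This is only a sketch: the function name faulty_rings is made up here, and it assumes the output layout shown above (a "RING ID n" line followed by a "status" line).

```shell
# Hypothetical helper: print the ID of every ring whose status line
# contains FAULTY. Reads `corosync-cfgtool -s` style output on stdin.
faulty_rings() {
    awk '
        /^RING ID/           { ring = $3 }    # remember the current ring ID
        /status/ && /FAULTY/ { print ring }   # report it if marked FAULTY
    '
}
```

Usage on a node: corosync-cfgtool -s | faulty_rings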

After some digging I found that if I re-enable the failed ring with the
command:

# corosync-cfgtool -r

RING ID 0 is marked as "active" for a few minutes, but after that it is
permanently marked as faulty.

The log has no useful info, just a single message:

corosync[21740]:   [TOTEM ] Marking ringid 0 interface 192.168.1.54 FAULTY

And there is no message like:

[TOTEM ] Automatically recovered ring 1
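To see how long the ring stays active after each re-enable, the two message types can be pulled out of the log together. A sketch only: the function name ring_events is made up, and the patterns are taken from the two messages quoted above, so other Corosync versions may word them differently.

```shell
# Hypothetical helper: keep only TOTEM ring state changes
# (FAULTY markings and automatic recoveries). Reads a log on stdin.
ring_events() {
    grep -E 'TOTEM.*(Marking ringid [0-9]+ interface .* FAULTY|Automatically recovered ring [0-9]+)'
}
```

Usage: ring_events < /var/log/cluster/corosync.log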


My corosync.conf looks like:

compatibility: whitetank

totem {
          version: 2
          secauth: on
          threads: 4
          rrp_mode: passive

          interface {

                  member {
                          memberaddr: PRIVATE_IP_1
                  }

...

                  member {
                          memberaddr: PRIVATE_IP_16
                  }

                  ringnumber: 0
                  bindnetaddr: PRIVATE_NET_ADDR
                  mcastaddr: 226.0.0.1
                  mcastport: 5505
                  ttl: 1
          }

          interface {

                  member {
                          memberaddr: PUBLIC_IP_1
                  }
...

                  member {
                          memberaddr: PUBLIC_IP_16
                  }

                  ringnumber: 1
                  bindnetaddr: PUBLIC_NET_ADDR
                  mcastaddr: 224.0.0.1
                  mcastport: 5405
                  ttl: 1
          }

          transport: udpu
}

logging {
          to_stderr: no
          to_logfile: yes
          logfile: /var/log/cluster/corosync.log
          logfile_priority: info
          to_syslog: yes
          syslog_priority: warning
          debug: on
          timestamp: on
}

I tried changing rrp_mode and the mcastaddr/mcastport for ringnumber: 0,
but the result was the same.

I checked multicast/unicast operability using the omping utility and didn't
find any issues.

Also, no errors were found on the network equipment for our private VLAN.

Why did Corosync decide to permanently disable the second ring? How can I
debug the issue?

Other properties:

Corosync Cluster Engine, version '1.4.7'

Pacemaker properties:
   cluster-infrastructure: cman
   cluster-recheck-interval: 5min
   dc-version: 1.1.14-8.el6-70404b0
   expected-quorum-votes: 3
   have-watchdog: false
   last-lrm-refresh: 1484068350
   maintenance-mode: false
   no-quorum-policy: ignore
   pe-error-series-max: 1000
   pe-input-series-max: 1000
   pe-warn-series-max: 1000
   stonith-action: reboot
   stonith-enabled: false
   symmetric-cluster: false

Thank you.

--
Regards Denis Gribkov




_______________________________________________
Users mailing list:[email protected]
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home:http://www.clusterlabs.org
Getting started:http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs:http://bugs.clusterlabs.org


--
Regards Denis Gribkov

