Michael,

Jan,

Actually, we are using this:

[root@lvm-nfscpdata-05ct::~ 100 ]# apt show corosync
Package: corosync
Version: 3.0.1-2+deb10u1

[root@lvm-nfscpdata-05ct::~]# apt show libknet1
Package: libknet1
Version: 1.8-2

These are the newest versions provided by the mirror.

Yup, but these are pretty old anyway, and there are quite a few bugs (many of which may explain the behavior you see).

I would suggest you either try compiling the upstream code or (if you need to stick with packages) give the sid packages or the Proxmox repositories a try (where knet is at version 1.15 and corosync at 3.0.3).
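For reference, pulling the newer packages from the Proxmox repository on Debian Buster could look roughly like the fragment below. The file name and repository line are assumptions on my side; verify them (and import the Proxmox signing key first) against the current Proxmox documentation before use:

```
# /etc/apt/sources.list.d/pve-no-subscription.list  (hypothetical path;
# the Proxmox signing key must be imported before running apt update)
deb http://download.proxmox.com/debian/pve buster pve-no-subscription
```

After an apt update, `apt show corosync` and `apt show libknet1` should then report the newer versions.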

Regards,
  Honza





Sitz der Gesellschaft / Corporate Headquarters: Deutsche Lufthansa 
Aktiengesellschaft, Koeln, Registereintragung / Registration: Amtsgericht Koeln 
HR B 2168
Vorsitzender des Aufsichtsrats / Chairman of the Supervisory Board: Dr. 
Karl-Ludwig Kley
Vorstand / Executive Board: Carsten Spohr (Vorsitzender / Chairman), Thorsten 
Dirks, Christina Foerster, Harry Hohmeister, Dr. Detlef Kayser, Dr. Michael 
Niggemann


-----Original Message-----
From: Jan Friesse <[email protected]>
Sent: Wednesday, 10 June 2020 09:24
To: Cluster Labs - All topics related to open-source clustering welcomed 
<[email protected]>; ROHWEDER-NEUBECK, MICHAEL (EXTERN) 
<[email protected]>; [email protected]
Subject: Re: [ClusterLabs] Redundant Ring Network failure

Michael,
what version of knet are you using? We had quite a few problems with older 
versions of knet, so the current stable release (1.16) is recommended. The same 
applies to corosync, because 3.0.4 has a vastly improved display of link status.

Hello,
We have massive problems with the redundant ring operation of our Corosync/
Pacemaker 3-node NFS clusters.

Most of the nodes either have an entire ring offline or only one node reachable in a ring.
Example: (Node1 Ring0 333 Ring1 n33 | Node2 Ring0 033 Ring1 3n3 |
Node3 Ring0 333 Ring1 33n)

Doesn't seem completely wrong. You can ignore the 'n' for ring 1, because that 
is localhost, which is connected only on ring 0 (3.0.4 makes this output more 
consistent), so all nodes are connected at least via ring 1.
Ring 0 on node 2 seems to have some trouble connecting to node 1, but nodes 1 
and 3 seem to be connected to node 2 just fine, so I think it is either some 
bug in knet (probably already fixed) or some kind of firewall blocking just 
the connection from node 2 to node 1 on ring 0.



corosync-cfgtool -R doesn't help.
All nodes are VMs that build the ring together using 2 VLANs.
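For a setup like that (three nodes, two VLANs as two knet links), the relevant corosync.conf pieces would look roughly like the following sketch; the cluster name, node names, and addresses are placeholders, not the actual configuration:

```
totem {
    version: 2
    cluster_name: nfs-cluster   # placeholder
    transport: knet
}

nodelist {
    node {
        name: node1             # placeholder
        nodeid: 1
        ring0_addr: 10.0.0.1    # VLAN 1 (knet link 0)
        ring1_addr: 10.0.1.1    # VLAN 2 (knet link 1)
    }
    # node2 and node3 defined analogously
}
```

With transport: knet, each ringX_addr defines one link, and the per-node, per-link status digits quoted above presumably come from `corosync-cfgtool -s`.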
Which logs do you need so you can hopefully help me?

syslog/journal should contain everything needed, especially when debug is 
enabled (corosync.conf - logging.debug: on).
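The debug switch mentioned above goes into the logging section of corosync.conf, e.g.:

```
logging {
    to_syslog: yes
    debug: on
}
```

A reload (`corosync-cfgtool -R`) or a corosync restart should then pick the change up.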

Regards,
    Honza


Corosync Cluster Engine, version '3.0.1'
Copyright (c) 2006-2018 Red Hat, Inc.
Debian Buster


--
With kind regards
    Michael Rohweder-Neubeck

NSB GmbH – Nguyen Softwareentwicklung & Beratung GmbH Röntgenstraße 27
D-64291 Darmstadt
E-Mail: [email protected]
Manager: Van-Hien Nguyen, Jörg Jaspert
USt-ID: DE 195 703 354; HRB 7131 Amtsgericht Darmstadt








_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


