Richard,

To clarify my problem, this is more of a Qdevice issue that I want to fix.

The question is how much this is really a qdevice problem and, if it is, whether there is really anything we can do about it.

Qdevice itself just uses the standard connect(2) call on a standard TCP socket. So from qdevice's point of view, deciding where to route packets to reach qnetd is really the kernel's job.

It is clear that ifdown made qdevice lose its connection to qnetd (that's why the IP changed from the ens192 address to the ens256 address), and standard qdevice behavior is to try to reconnect. Qdevice itself does not bind to any specific address (it is really just a client), so after calling connect(2), qdevice reached qnetd via the other (working) interface.
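
The behavior described above (a client that calls connect(2) without binding first) can be illustrated with a minimal Python sketch. This is not qdevice code; the function names and the loopback listener standing in for qnetd are assumptions made only for illustration. It shows that the source address of an unbound client socket is whatever the kernel's routing decision yields.

```python
import socket


def source_address_chosen_by_kernel(host, port):
    """Connect like a plain TCP client (no bind) and report the local address.

    Because the socket is never bound, the kernel selects the outgoing
    route and the source address when connect() is called.
    """
    with socket.create_connection((host, port), timeout=5) as sock:
        return sock.getsockname()[0]


def demo():
    """Hypothetical stand-in for qnetd: a throwaway listener on loopback."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        return source_address_chosen_by_kernel("127.0.0.1", port)


if __name__ == "__main__":
    print("kernel-selected source address:", demo())
```

On a host with two adapters in the same subnet, the same call made against the qnetd address would report the address of whichever adapter the routing table currently prefers, which is consistent with the client address switching from ens192 to ens256 after ifdown.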

So I would suggest trying the method recommended by Andrei (add a host route).

Regards,
  Honza

See below for more detail.
Thank you,
Richard

     ----- Original message -----
     From: Andrei Borzenkov <[email protected]>
     Sent by: "Users" <[email protected]>
     To: [email protected]
     Cc:
     Subject: [EXTERNAL] Re: [ClusterLabs] Two ethernet adapter within same
     subnet causing issue on Qdevice
     Date: Thu, Oct 1, 2020 2:45 PM
     01.10.2020 20:09, Richard Seo wrote:
      > Hello everyone,
      > I'm trying to set up a cluster with two hosts:
      > both have two ethernet adapters, all within the same subnet.
      > I've created a resource for one adapter on each host.
      > Here is the example:
      > Stack: corosync
      > Current DC: <host 1> (version 2.0.2-1.el8-744a30d655) - partition with quorum
      > Last updated: Thu Oct  1 12:50:48 2020
      > Last change: Thu Oct  1 12:32:53 2020 by root via cibadmin on <host 1>
      > 2 nodes configured
      > 2 resources configured
      > Online: [ <host1> <host2> ]
      > Active resources:
      > db2_<host1>_ens192    (ocf::heartbeat:db2ethmon):     Started <host1>
      > db2_<host2>_ens192    (ocf::heartbeat:db2ethmon):     Started <host2>
      > I also have a qdevice setup:
      > # corosync-qnetd-tool -l
      > Cluster "hadom":
      >      Algorithm:        LMS
      >      Tie-breaker:    Node with lowest node ID
      >      Node ID 2:
      >          Client address:        ::ffff:<ip for ens192 for host 2>:40044
      >          Configured node list:    1, 2
      >          Membership node list:    1, 2
      >          Vote:            ACK (ACK)
      >      Node ID 1:
      >          Client address:        ::ffff:<*ip for ens192 for host 1*>:37906
      >          Configured node list:    1, 2
      >          Membership node list:    1, 2
      >          Vote:            ACK (ACK)
      > When I ifconfig down ens192 for host 1, it looks like qdevice changes the
      > Client address to the other adapter and still gives quorum to the lowest
      > node ID (which is host 1 in this case) even when the network is down for
      > host 1.

     Network on host 1 is obviously not down as this host continues to
     communicate with the outside world. Network may be down for your
     specific application but then it is up to resource agent for this
     application to detect it and initiate failover.
     The network (ens192) on host 1 is down. host1 can still communicate with
     the world because host1 has another network adapter (ens256). However,
     only ens192 was configured as a resource. I've also specifically
     configured the ens192 IP address in corosync.conf.
     I want the network on host 1 to be down. That way, I can reproduce the
     problem where quorum is given to the wrong node.

      > Cluster "hadom":
      >      Algorithm:        LMS
      >      Tie-breaker:    Node with lowest node ID
      >      Node ID 2:
      >          Client address:        ::ffff:<ip for ens192 for host 2>:40044
      >          Configured node list:    1, 2
      >          Membership node list:    1, 2
      >          Vote:            ACK (ACK)
      >      Node ID 1:
      >          Client address:        ::ffff:<*ip for ens256 for host 1*>:37906
      >          Configured node list:    1, 2
      >          Membership node list:    1, 2
      >          Vote:            ACK (ACK)
      > Is there a way we can force qdevice to only route through a specified
      > adapter (ens192 in this case)?

     Create host route via specific device.
     I've looked over the docs and haven't found a way to do this. I've tried
     configuring corosync.conf using the specific IP addresses. Could you
     specify how to route to a specific network adapter from a qdevice?

      > Also, while I'm on this topic, are multiple communication rings
      > supported with pacemaker, or will they be supported in the near future?

     What exactly do you mean? What communication are you talking about?

     You seem to confuse multiple layers here. qnetd and pacemaker are two
     independent things.
     So this is a separate question regarding Pacemaker and Corosync. I want to
     know if having multiple communication rings in the nodelist in
     corosync.conf is supported by Pacemaker with Corosync right now. The
     communication protocol is called Redundant Ring Protocol.

     _______________________________________________
     Manage your subscription:
     https://lists.clusterlabs.org/mailman/listinfo/users

     ClusterLabs home: https://www.clusterlabs.org/



