Richard,
To clarify my problem, this is more of a Qdevice issue that I want to fix.
The question is how much of this is really a qdevice problem and, if it
is, whether there is really anything we can do about it.
Qdevice itself just uses the standard connect(2) call on a standard TCP
socket. So from qdevice's point of view, deciding where to route packets
to reach qnetd is really a kernel matter.
It is clear that ifdown made qdevice lose its connection to qnetd
(that's why the IP changed from the ens192 one to the ens256 one), and
the standard qdevice behavior is to try to reconnect. Qdevice itself does
not bind to any specific address (it is really just a client), so after
calling connect(2) qdevice reached qnetd via the other (working) interface.
So I would suggest trying the method recommended by Andrei (add a host
route).
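To illustrate the point about connect(2) on an unbound client socket, here is a minimal Python sketch. It is not qdevice code; the function name connect_and_report and the loopback listener standing in for qnetd are assumptions made up for this example. It only shows that, when a client does not bind, the kernel chooses the local address at connect() time.

```python
import socket
from typing import Optional, Tuple


def connect_and_report(server: Tuple[str, int],
                       source_ip: Optional[str] = None) -> Tuple[str, int]:
    """Connect to `server` over TCP and return the local (ip, port) in use.

    With source_ip=None the socket is left unbound, so the kernel picks
    the source address from its routing table when connect() is called.
    (Hypothetical helper, for illustration only.)
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        if source_ip is not None:
            # Port 0 asks the kernel for any free ephemeral port.
            sock.bind((source_ip, 0))
        sock.connect(server)
        return sock.getsockname()[:2]


if __name__ == "__main__":
    # A throwaway listener on loopback stands in for qnetd (assumption).
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(5)
        # Unbound client: the kernel selects the local address.
        print(connect_and_report(listener.getsockname()))
```

Note that binding to a source address only fixes the address the client uses; which interface the packet actually leaves through is still decided by the kernel routing table, which is why a routing change is the suggested fix.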
Regards,
Honza
See below for more detail.
Thank you,
Richard
----- Original message -----
From: Andrei Borzenkov <[email protected]>
Sent by: "Users" <[email protected]>
To: [email protected]
Cc:
Subject: [EXTERNAL] Re: [ClusterLabs] Two ethernet adapter within same
subnet causing issue on Qdevice
Date: Thu, Oct 1, 2020 2:45 PM
01.10.2020 20:09, Richard Seo wrote:
> Hello everyone,
> I'm trying to set up a cluster with two hosts;
> both have two Ethernet adapters, all within the same subnet.
> I've created a resource for one adapter on each host.
> Here is an example:
> Stack: corosync
> Current DC: <host 1> (version 2.0.2-1.el8-744a30d655) - partition with quorum
> Last updated: Thu Oct 1 12:50:48 2020
> Last change: Thu Oct 1 12:32:53 2020 by root via cibadmin on <host 1>
> 2 nodes configured
> 2 resources configured
> Online: [ <host1> <host2> ]
> Active resources:
> db2_<host1>_ens192 (ocf::heartbeat:db2ethmon): Started <host1>
> db2_<host2>_ens192 (ocf::heartbeat:db2ethmon): Started <host2>
> I also have a qdevice setup:
> # corosync-qnetd-tool -l
> Cluster "hadom":
> Algorithm: LMS
> Tie-breaker: Node with lowest node ID
> Node ID 2:
> Client address: ::ffff:<ip for ens192 for host 2>:40044
> Configured node list: 1, 2
> Membership node list: 1, 2
> Vote: ACK (ACK)
> Node ID 1:
> Client address: ::ffff:<*ip for ens192 for host 1*>:37906
> Configured node list: 1, 2
> Membership node list: 1, 2
> Vote: ACK (ACK)
> When I ifconfig down ens192 on host 1, it looks like qdevice changes the
> Client address to the other adapter and still gives quorum to the node
> with the lowest node ID (which is host 1 in this case), even though the
> network is down for host 1.
The network on host 1 is obviously not down, as this host continues to
communicate with the outside world. The network may be down for your
specific application, but then it is up to the resource agent for this
application to detect that and initiate failover.
The network (ens192) on host 1 is down. host1 can still communicate with
the world because it has another network adapter (ens256). However, only
ens192 was configured as a resource. I've also specifically configured
the ens192 IP address in corosync.conf.
I want the network on host 1 to be down. That way, I can reproduce the
problem where quorum is given to the wrong node.
> Cluster "hadom":
> Algorithm: LMS
> Tie-breaker: Node with lowest node ID
> Node ID 2:
> Client address: ::ffff:<ip for ens192 for host 2>:40044
> Configured node list: 1, 2
> Membership node list: 1, 2
> Vote: ACK (ACK)
> Node ID 1:
> Client address: ::ffff:<*ip for ens256 for host 1*>:37906
> Configured node list: 1, 2
> Membership node list: 1, 2
> Vote: ACK (ACK)
> Is there a way we can force qdevice to only route through a specified
> adapter (ens192 in this case)?
Create a host route via the specific device.
I've looked over the docs and haven't found a way to do this. I've tried
configuring corosync.conf using the specific IP addresses. Could you
specify how to route to a specific network adapter from a qdevice?
> Also, while I'm on this topic, is multiple communication ring support
> with pacemaker supported, or will it be supported in the near future?
What exactly do you mean? What communication are you talking about?
You seem to confuse multiple layers here. qnetd and pacemaker are two
independent things.
So this is a separate question regarding Pacemaker and Corosync. I want to
know whether having multiple communication rings in the nodelist in
corosync.conf is currently supported by Pacemaker with Corosync. The
communication protocol is called the Redundant Ring Protocol.
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/