On Thu, 2023-06-15 at 12:58 +0200, Kadlecsik József wrote:
> Hello,
>
> We had a strange issue here: 7 node cluster, one node was put into standby
> mode to test a new iscsi setting on it. During configuring the machine it
> was rebooted and after the reboot the iscsi didn't come up. That caused a
> malformed communication (atlas5 is the node in standby) with the cluster:
>
> Jun 15 10:10:13 atlas0 pacemaker-schedulerd[7153]: warning: Unexpected result (error) was recorded for probe of ocsi on atlas5 at Jun 15 10:09:32 2023
> Jun 15 10:10:13 atlas0 pacemaker-schedulerd[7153]: notice: If it is not possible for ocsi to run on atlas5, see the resource-discovery option for location constraints
> Jun 15 10:10:13 atlas0 pacemaker-schedulerd[7153]: error: Resource ocsi is active on 2 nodes (attempting recovery)

Newer versions reword this as "might be active". The idea is that if the
probe returns an error, we don't know the state of the resource on that
node. From an HA perspective, we have to assume the worst, that the
resource could be active there.

> The resource was definitely not active on 2 nodes. And that caused a storm
> of killing all virtual machines as resources.

The cluster would first try to stop ocsi on that node as well as the node
where it's known to be running. If a stop fails, then the cluster will try
to fence that node.

> How could one prevent such cases to come up?

It sounds like maybe the agent can't probe or stop in certain situations.
It may be possible to improve the agent. For example, some agents return
an error if key software isn't installed, but for a probe or stop, that's
fine -- if the software isn't installed, it's definitely not running.

>
> Best regards,
> Jozsef
> --
> E-mail : [email protected]
> PGP key: https://wigner.hu/~kadlec/pgp_public_key.txt
> Address: Wigner Research Centre for Physics
>          H-1525 Budapest 114, POB. 49, Hungary
--
Ken Gaillot <[email protected]>
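
To illustrate the point about probe and stop, here is a minimal sketch of
how an agent might pick its exit code when its key software is missing.
It is written in Python only for brevity (agents are more commonly shell
scripts), and the helper names are hypothetical, not taken from any
existing agent. The exit code values are the standard OCF ones, and a
probe is assumed to be a monitor action with interval 0.

```python
# Standard OCF exit code values.
OCF_SUCCESS = 0
OCF_ERR_INSTALLED = 5
OCF_NOT_RUNNING = 7


def is_probe(action: str, interval_ms: int) -> bool:
    """Treat a monitor with interval 0 as a probe (one-time state check)."""
    return action == "monitor" and interval_ms == 0


def result_when_software_missing(action: str, interval_ms: int = 0) -> int:
    """Hypothetical helper: exit code to use when key software is absent."""
    if is_probe(action, interval_ms):
        # Nothing installed, so the resource cannot be running here.
        return OCF_NOT_RUNNING
    if action == "stop":
        # Already stopped; reporting an error here could lead to fencing.
        return OCF_SUCCESS
    # start, recurring monitor, etc. genuinely cannot work without it.
    return OCF_ERR_INSTALLED
```

With this logic, a probe on a node where the software (or, as an
assumption, the storage it depends on) is unavailable reports "not
running" rather than an error, so the cluster has no reason to suspect
the resource is active there.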
