On 26.03.2019 18:33, JCA wrote:
> Making some progress with Pacemaker/DRBD, but still trying to grasp some of
> the basics of this framework. Here is my current situation:
>
> I have a two-node cluster, pmk1 and pmk2, with resources ClusterIP and
> DrbdFS. In what follows, commands preceded by '[pmk1] #' are to be
> understood as commands issued by the superuser in pmk1, whereas those
> preceded by '[pmk2] #' are issued by the superuser in pmk2 (pretty obvious,
> but better make it crystal clear).
>
> [pmk1] # pcs status resources
>  ClusterIP (ocf::heartbeat:IPaddr2): Started pmk1
>  Master/Slave Set: DrbdDataClone [DrbdData]
>      Masters: [ pmk1 ]
>      Slaves: [ pmk2 ]
>  DrbdFS (ocf::heartbeat:Filesystem): Started pmk1
>
> [pmk2] # pcs status resources
>  ClusterIP (ocf::heartbeat:IPaddr2): Started pmk1
>  Master/Slave Set: DrbdDataClone [DrbdData]
>      Masters: [ pmk1 ]
>      Slaves: [ pmk2 ]
>  DrbdFS (ocf::heartbeat:Filesystem): Started pmk2
>
> There is an ext4 filesystem in the DRBD device, mounted at /var/lib/pmk.
> When things are as described above, in pmk1 this directory contains the
> data that I used when I populated the DRBD filesystem in pmk1, whereas in
> pmk2 it contains nothing. I.e. everything is as expected.
>
> Then I did
>
> [pmk1] # pcs cluster stop pmk1
> pmk1: Stopping Cluster (pacemaker)...
> pmk1: Stopping Cluster (corosync)...
>
> [pmk2] # pcs status resources
>  ClusterIP (ocf::heartbeat:IPaddr2): Started pmk2
>  Master/Slave Set: DrbdDataClone [DrbdData]
>      Masters: [ pmk2 ]
>      Stopped: [ pmk2 ]
>  DrbdFS (ocf::heartbeat:Filesystem): Started pmk2
>
> After this the contents of /var/lib/pmk in pmk2 are those that were used to
> populate the DRBD filesystem in pmk1 (plus any changes introduced by pmk1
> before I stopped it), whereas /var/lib/pmk in pmk1 is now empty. Which
> implies that things seem to be behaving OK - or, at least, the way I was
> expecting them to behave.
>
> Next I tried to bring pmk1 back on:
>
> [pmk1] # pcs cluster start pmk1
> pmk1: Starting Cluster (corosync)...
> pmk1: Starting Cluster (pacemaker)...
>
> [pmk1] # pcs status resources
>  ClusterIP (ocf::heartbeat:IPaddr2): Stopped
>  Master/Slave Set: DrbdDataClone [DrbdData]
>      Stopped: [ pmk1 pmk2 ]
>  DrbdFS (ocf::heartbeat:Filesystem): Stopped
>
> [pmk2] # pcs status resources
>  ClusterIP (ocf::heartbeat:IPaddr2): Started pmk2
>  Master/Slave Set: DrbdDataClone [DrbdData]
>      Masters: [ pmk2 ]
>      Stopped: [ pmk2 ]
>  DrbdFS (ocf::heartbeat:Filesystem): Started pmk2
>
> Node pmk1 is back up, but ClusterIP and DrbdFS are not, at least on pmk1.
> And pmk2 remains in charge. I clumsily tried to restart those resources by
> hand in pmk1, to no avail:
>
> [pmk1] # pcs resource restart ClusterIP
> Error: Error performing operation: No such device or address
> ClusterIP is not running anywhere and so cannot be restarted
>
This sounds like pmk1 did not actually join the cluster. You need to check
the logs to see what happened when Pacemaker on pmk1 was restarted.

> I also tried stopping and starting the pmk1 node from pmk1, and also from
> pmk2, several times, to no avail.
>
> How can I bring back the pmk1 node on correctly, so that everything is how
> it originally was - i.e. with pmk1 up and running, and with the resources
> also up and running in pmk1?
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
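
One way to check whether pmk1 actually rejoined, and to pull the relevant
logs if it did not, is sketched below. It assumes a systemd host where the
services are named corosync and pacemaker, and that `pcs status nodes` lists
members on lines starting with "Online:". Both are assumptions about this
setup, and node_is_online and show_join_logs are hypothetical helper names.

```shell
# Succeeds if the node named in $1 appears on an "Online:" line of the
# text on stdin (assumed format: " Online: pmk1 pmk2").
node_is_online() {
    awk -v node="$1" '
        $1 == "Online:" { for (i = 2; i <= NF; i++) if ($i == node) found = 1 }
        END { exit !found }
    '
}

# Print recent corosync/pacemaker log entries on the local node.
# Assumes systemd; on other systems the logs may be in files under /var/log.
show_join_logs() {
    journalctl -u corosync -u pacemaker --since "15 minutes ago" --no-pager
}
```

Run on pmk1 right after `pcs cluster start pmk1`, for example as
`pcs status nodes | node_is_online pmk1 || show_join_logs`.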
