I don't know why this happens, but I run into it often. My workaround is to kill the cluster daemons by hand:
killall -9 pacemakerd; killall pengine; killall lrmd; killall cib; killall corosync

> On May 10, 2018, at 11:26 PM, 范国腾 <[email protected]> wrote:
>
> Hi,
>
> When I run "pcs cluster stop --all", it sometimes hangs with no
> response. The log is below. Can we find out from the log why it hangs,
> and how to make the cluster stop right away?
>
> [root@node2 pg_log]# pcs status
> Cluster name: hgpurog
> Stack: corosync
> Current DC: sds1 (version 1.1.16-12.el7-94ff4df) - partition with quorum
> Last updated: Fri May 11 01:11:26 2018
> Last change: Fri May 11 01:09:24 2018 by hacluster via crmd on sds1
>
> 2 nodes configured
> 3 resources configured
>
> Online: [ sds1 sds2 ]
>
> Full list of resources:
>
>  Master/Slave Set: pgsql-ha [pgsqld]
>      Stopped: [ sds1 sds2 ]
>  Resource Group: mastergroup
>      master-vip (ocf::heartbeat:IPaddr2): Started sds1
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> [root@node2 pg_log]# pcs cluster stop --all
>
>
> The /var/log/messages is as below:
> May 11 01:07:50 node2 crmd[5365]: notice: State transition S_PENDING -> S_NOT_DC
> May 11 01:07:50 node2 crmd[5365]: notice: State transition S_NOT_DC -> S_PENDING
> May 11 01:07:50 node2 crmd[5365]: notice: State transition S_PENDING -> S_NOT_DC
> May 11 01:07:51 node2 pgsqlms(pgsqld)[5371]: INFO: Execute action monitor and the result 7
> May 11 01:07:51 node2 pgsqlms(undef)[5408]: INFO: Execute action meta-data and the result 0
> May 11 01:07:51 node2 crmd[5365]: notice: Result of probe operation for pgsqld on sds2: 7 (not running)
> May 11 01:07:51 node2 crmd[5365]: notice: sds2-pgsqld_monitor_0:6 [ /tmp:5866 - no response\n ]
> May 11 01:07:51 node2 crmd[5365]: notice: Result of probe operation for master-vip on sds2: 7 (not running)
> May 11 01:10:02 node2 systemd: Started Session 16 of user root.
> May 11 01:10:02 node2 systemd: Starting Session 16 of user root.
> May 11 01:11:33 node2 pacemakerd[5357]: notice: Caught 'Terminated' signal
> May 11 01:11:33 node2 systemd: Stopping Pacemaker High Availability Cluster Manager...
> May 11 01:11:33 node2 pacemakerd[5357]: notice: Shutting down Pacemaker
> May 11 01:11:33 node2 pacemakerd[5357]: notice: Stopping crmd
> May 11 01:11:33 node2 crmd[5365]: notice: Caught 'Terminated' signal
> May 11 01:11:33 node2 crmd[5365]: notice: Shutting down cluster resource manager
> May 11 01:12:49 node2 systemd: Started Session 17 of user root.
> May 11 01:12:49 node2 systemd-logind: New session 17 of user root.
> May 11 01:12:49 node2 gdm-launch-environment]: AccountsService: ActUserManager: user (null) has no username (object path: /org/freedesktop/Accounts/User0, uid: 0)
> May 11 01:12:49 node2 journal: ActUserManager: user (null) has no username (object path: /org/freedesktop/Accounts/User0, uid: 0)
> May 11 01:12:49 node2 systemd: Starting Session 17 of user root.
> May 11 01:12:49 node2 dbus[648]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
> May 11 01:12:49 node2 dbus-daemon: dbus[648]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
> May 11 01:12:49 node2 dbus[648]: [system] Successfully activated service 'org.freedesktop.problems'
> May 11 01:12:49 node2 dbus-daemon: dbus[648]: [system] Successfully activated service 'org.freedesktop.problems'
> May 11 01:12:49 node2 journal: g_dbus_interface_skeleton_unexport: assertion 'interface_->priv->connections != NULL' failed
>
> Here is the log in the peer node
> May 11 01:09:08 node1 pgsqlms(pgsqld)[28599]: WARNING: No secondary connected to the master
> May 11 01:09:08 node1 pgsqlms(pgsqld)[28599]: WARNING: "sds2" is not connected to the primary
> May 11 01:09:08 node1 pgsqlms(pgsqld)[28599]: INFO: Execute action monitor and the result 8
> May 11 01:09:18 node1 pgsqlms(pgsqld)[28679]: WARNING: No secondary connected to the master
> May 11 01:09:18 node1 pgsqlms(pgsqld)[28679]: WARNING: "sds2" is not connected to the primary
> May 11 01:09:18 node1 pgsqlms(pgsqld)[28679]: INFO: Execute action monitor and the result 8
> May 11 01:09:24 node1 crmd[1111]: notice: sds1-pgsqld_monitor_10000:19 [ /tmp:5866 - accepting connections\n ]
> May 11 01:09:24 node1 crmd[1111]: notice: Transition aborted by deletion of lrm_resource[@id='pgsqld']: Resource state removal
> May 11 01:10:02 node1 systemd: Started Session 17 of user root.
> May 11 01:10:02 node1 systemd: Starting Session 17 of user root.
> May 11 01:11:33 node1 pacemakerd[1042]: notice: Caught 'Terminated' signal
> May 11 01:11:33 node1 systemd: Stopping Pacemaker High Availability Cluster Manager...
> May 11 01:11:33 node1 pacemakerd[1042]: notice: Shutting down Pacemaker
> May 11 01:11:33 node1 pacemakerd[1042]: notice: Stopping crmd
> May 11 01:11:33 node1 crmd[1111]: notice: Caught 'Terminated' signal
> May 11 01:11:33 node1 crmd[1111]: notice: Shutting down cluster resource manager
> May 11 01:11:33 node1 crmd[1111]: warning: Input I_SHUTDOWN received in state S_TRANSITION_ENGINE from crm_shutdown
>
>
> _______________________________________________
> Users mailing list: [email protected]
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
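
P.S. For anyone who wants to script the workaround from the top of this message, here is a minimal sketch that wraps the same killall sequence in a shell function. The function names and the DRY_RUN switch are my own additions for illustration, not part of pcs or Pacemaker. It keeps the original order and signals: SIGKILL for pacemakerd first, then killall's default signal for the rest. Treat it as a last resort on a node that is already hung, since it bypasses any orderly resource shutdown.

```shell
#!/bin/sh
# Sketch only: the killall workaround as a function. Run as root on the
# hung node. DRY_RUN and the function names are hypothetical additions.

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

force_stop_cluster() {
    # pacemakerd gets SIGKILL first, as in the original command.
    run killall -9 pacemakerd
    # The remaining daemons get killall's default signal (SIGTERM).
    for daemon in pengine lrmd cib corosync; do
        run killall "$daemon"
    done
    # killall exits non-zero when no process matched; that is not
    # treated as an error here.
    return 0
}
```

To preview what would be killed, source the file, set DRY_RUN=1 and call force_stop_cluster; unset DRY_RUN to run it for real.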
