What about this: Configure fencing, then if everything works OK, try without fencing.
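For reference, configuring fencing with pcs typically means defining a STONITH device per node and making sure `stonith-enabled` is on. The agent choice (`fence_ipmilan`), the BMC addresses, and the credentials below are placeholders rather than details from this cluster, and parameter names vary between fence-agent versions; a minimal sketch might look like:

```shell
# Hypothetical sketch: fence each node via its IPMI/BMC interface.
# Addresses and credentials are placeholders -- adapt to your hardware.
# (Older fence_ipmilan uses ipaddr/login/passwd; newer releases use
# ip/username/password.)
pcs stonith create fence_sds1 fence_ipmilan \
    pcmk_host_list="sds1" ipaddr="10.0.0.101" \
    login="admin" passwd="secret" \
    op monitor interval=60s

pcs stonith create fence_sds2 fence_ipmilan \
    pcmk_host_list="sds2" ipaddr="10.0.0.102" \
    login="admin" passwd="secret" \
    op monitor interval=60s

# Ensure fencing is actually enabled cluster-wide.
pcs property set stonith-enabled=true
```

With something like this in place, a failed stop on sds2 would get sds2 fenced instead of leaving the resource blocked, letting sds1 promote safely.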
>>> 范国腾 <[email protected]> wrote on 07.05.2018 at 08:54 in message
<[email protected]>:
> Thank you, Klaus. There is no fencing device in our network according to the
> requirements. Is there any other way to configure the cluster to make it work?
>
>
> From: Klaus Wenninger [mailto:[email protected]]
> Sent: May 7, 2018 14:40
> To: Cluster Labs - All topics related to open-source clustering welcomed
> <[email protected]>; 范国腾 <[email protected]>
> Subject: Re: [ClusterLabs] The slave does not promote to master
>
> On 05/07/2018 07:39 AM, 范国腾 wrote:
> > Hi,
> >
> > We have a two-node cluster using PAF to manage PostgreSQL. Node sds2 is the master:
> >
> >  Master/Slave Set: pgsql-ha [pgsqld]
> >      Master: [ sds2 ]
> >      Slaves: [ sds1 ]
> >
> > On the master node (sds2), I removed the data directory of PostgreSQL. I expected
> > the master node (sds2) to stop and the slave node (sds1) to be promoted to master.
> > The sds2 log shows that it executes monitor->notify->demote->notify->stop. The
> > sds1 log also shows "Promote pgsqld:0#011(Slave -> Master sds1)". But "pcs
> > status" shows the status below. Could you please help check what is preventing
> > the promotion on sds1? What should I do if I want to recover the system?
>
> I didn't check all the details, but it looks as if stopping the resource
> failed, so the cluster doesn't know the state on sds2 and thus can't
> promote on sds1.
> If you had enabled fencing, this would have led to sds2 being fenced
> so that sds1 could take over.
>
> As digimer would say: "use fencing!"
>
> Regards,
> Klaus
>
> > 2 nodes configured
> > 3 resources configured
> >
> > Online: [ sds1 sds2 ]
> >
> > Full list of resources:
> >
> >  Master/Slave Set: pgsql-ha [pgsqld]
> >      pgsqld (ocf::heartbeat:pgsqlms): FAILED Master sds2 (blocked)
> >      Slaves: [ sds1 ]
> >  Resource Group: mastergroup
> >      master-vip (ocf::heartbeat:IPaddr2): Started sds2
> >
> > Failed Actions:
> > * pgsqld_stop_0 on sds2 'invalid parameter' (2): call=42, status=complete,
> >     exitreason='PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists',
> >     last-rc-change='Mon May 7 00:39:06 2018', queued=1ms, exec=72ms
> >
> > Here is the sds2 log:
> > May 7 00:38:46 node2 pgsqlms(pgsqld)[14000]: INFO: Execute action monitor and the result 8
> > May 7 00:38:56 node2 pgsqlms(pgsqld)[14077]: INFO: Execute action monitor and the result 8
> > May 7 00:39:06 node2 pgsqlms(pgsqld)[14152]: ERROR: PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists
> > May 7 00:39:06 node2 lrmd[1126]: notice: pgsqld_monitor_10000:14152:stderr [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists ]
> > May 7 00:39:06 node2 crmd[1129]: notice: sds2-pgsqld_monitor_10000:36 [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists\n ]
> > May 7 00:39:06 node2 pgsqlms(pgsqld)[14162]: ERROR: PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists
> > May 7 00:39:06 node2 lrmd[1126]: notice: pgsqld_notify_0:14162:stderr [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists ]
> > May 7 00:39:06 node2 crmd[1129]: notice: Result of notify operation for pgsqld on sds2: 0 (ok)
> > May 7 00:39:06 node2 crmd[1129]: notice: sds2-pgsqld_monitor_10000:36 [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists\n ]
> > May 7 00:39:06 node2 pgsqlms(pgsqld)[14172]: ERROR: PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists
> > May 7 00:39:06 node2 lrmd[1126]: notice: pgsqld_demote_0:14172:stderr [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists ]
> > May 7 00:39:06 node2 crmd[1129]: notice: Result of demote operation for pgsqld on sds2: 2 (invalid parameter)
> > May 7 00:39:06 node2 crmd[1129]: notice: sds2-pgsqld_demote_0:39 [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists\n ]
> > May 7 00:39:06 node2 pgsqlms(pgsqld)[14182]: ERROR: PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists
> > May 7 00:39:06 node2 lrmd[1126]: notice: pgsqld_notify_0:14182:stderr [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists ]
> > May 7 00:39:06 node2 crmd[1129]: notice: Result of notify operation for pgsqld on sds2: 0 (ok)
> > May 7 00:39:06 node2 pgsqlms(pgsqld)[14192]: ERROR: PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists
> > May 7 00:39:06 node2 lrmd[1126]: notice: pgsqld_notify_0:14192:stderr [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists ]
> > May 7 00:39:06 node2 crmd[1129]: notice: Result of notify operation for pgsqld on sds2: 0 (ok)
> > May 7 00:39:06 node2 pgsqlms(pgsqld)[14202]: ERROR: PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists
> > May 7 00:39:06 node2 lrmd[1126]: notice: pgsqld_stop_0:14202:stderr [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists ]
> > May 7 00:39:06 node2 crmd[1129]: notice: Result of stop operation for pgsqld on sds2: 2 (invalid parameter)
> > May 7 00:39:06 node2 crmd[1129]: notice: sds2-pgsqld_stop_0:42 [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists\n ]
> > May 7 00:40:01 node2 systemd: Started Session 4 of user root.
> > May 7 00:40:01 node2 systemd: Starting Session 4 of user root.
> > May 7 00:47:21 node2 pacemakerd[1063]: notice: Caught 'Terminated' signal
> > May 7 00:47:21 node2 systemd: Stopping Pacemaker High Availability Cluster Manager...
> > May 7 00:47:21 node2 pacemakerd[1063]: notice: Shutting down Pacemaker
> > May 7 00:47:21 node2 pacemakerd[1063]: notice: Stopping crmd
> > May 7 00:47:21 node2 crmd[1129]: notice: Caught 'Terminated' signal
> > May 7 00:47:21 node2 crmd[1129]: notice: Shutting down cluster resource manager
> >
> > Here is the sds1 log (in the attachment):
> > May 7 00:38:47 node1 pgsqlms(pgsqld)[4426]: INFO: Execute action monitor and the result 0
> > May 7 00:39:03 node1 pgsqlms(pgsqld)[4442]: INFO: Execute action monitor and the result 0
> > May 7 00:39:06 node1 crmd[1133]: notice: State transition S_IDLE -> S_POLICY_ENGINE
> > May 7 00:39:06 node1 pengine[1132]: warning: Processing failed op monitor for pgsqld:1 on sds2: invalid parameter (2)
> > May 7 00:39:06 node1 pengine[1132]: error: Preventing pgsql-ha from re-starting on sds2: operation monitor failed 'invalid parameter' (2)
> > May 7 00:39:06 node1 pengine[1132]: notice: Promote pgsqld:0#011(Slave -> Master sds1)
> > May 7 00:39:06 node1 pengine[1132]: notice: Demote pgsqld:1#011(Master -> Stopped sds2)
> > May 7 00:39:06 node1 pengine[1132]: notice: Move master-vip#011(Started sds2 -> sds1)
> > May 7 00:39:06 node1 pengine[1132]: notice: Calculated transition 31, saving inputs in /var/lib/pacemaker/pengine/pe-input-97.bz2
> > May 7 00:39:06 node1 pengine[1132]: warning: Processing failed op monitor for pgsqld:1 on sds2: invalid parameter (2)
> > May 7 00:39:06 node1 pengine[1132]: error: Preventing pgsql-ha from re-starting on sds2: operation monitor failed 'invalid parameter' (2)
> > May 7 00:39:06 node1 pengine[1132]: notice: Promote pgsqld:0#011(Slave -> Master sds1)
> > May 7 00:39:06 node1 pengine[1132]: notice: Demote pgsqld:1#011(Master -> Stopped sds2)
> > May 7 00:39:06 node1 pengine[1132]: notice: Move master-vip#011(Started sds2 -> sds1)
> > May 7 00:39:06 node1 pengine[1132]: notice: Calculated transition 32, saving inputs in /var/lib/pacemaker/pengine/pe-input-98.bz2
> > May 7 00:39:06 node1 crmd[1133]: notice: Initiating cancel operation pgsqld_monitor_16000 locally on sds1
> > May 7 00:39:06 node1 crmd[1133]: notice: Initiating notify operation pgsqld_pre_notify_demote_0 locally on sds1
> > May 7 00:39:06 node1 crmd[1133]: notice: Initiating notify operation pgsqld_pre_notify_demote_0 on sds2
> >
> > _______________________________________________
> > Users mailing list: [email protected]
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
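On the "how do I recover the system" question asked earlier in the thread: once the failed stop leaves pgsqld blocked, one common manual path is to clear the failure so the cluster can finish promoting sds1, then rebuild the removed data directory on sds2 from the new master before the standby runs again. This is only a sketch: `replication_user` is a placeholder, and the right pg_basebackup options depend on the PostgreSQL version and PAF configuration; only the PGDATA path is taken from the logs above.

```shell
# Sketch only -- verify each step against your own PAF/PostgreSQL setup.

# 1. On sds2, make sure no leftover postgres processes are running.

# 2. Clear the failed stop so the cluster can re-evaluate the resource
#    and complete the promotion of sds1 (pgsqld is currently "blocked").
pcs resource cleanup pgsqld

# 3. On sds2, rebuild the removed data directory from the new master.
#    replication_user is a placeholder for your replication role.
pg_basebackup -h sds1 -U replication_user \
    -D /home/highgo/highgo/database/4.3.1/data -X stream
```

Whether `pcs resource cleanup` alone is enough depends on why the stop failed; without fencing the cluster cannot be certain of sds2's state, which is exactly the point made at the top of the thread.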
