Re: [ClusterLabs] Postgres clone resource does not get "notice" events

vitaly via Users Tue, 05 Jul 2022 15:04:15 -0700

Hello,
Yes, the snippet has everything there was for the full second of Jul 05 
11:54:34. I did not cut anything between the last line of 11:54:33 and first 
line of 11:54:35.


Here is the grep of the pacemaker config:

d19-25-left.lab.archivas.com ~ # egrep -v '^($|#)' /etc/sysconfig/pacemaker
PCMK_logfile=/var/log/pacemaker.log
SBD_SYNC_RESOURCE_STARTUP="no"
PCMK_trace_functions=services_action_sync,svc_read_output
d19-25-left.lab.archivas.com ~ # 

I also grepped the current pacemaker.log for services_action_sync and got just
4 records, from a time that does not seem to match the failures:

d19-25-left.lab.archivas.com ~ # grep services_action_sync /var/log/pacemaker.log
Jul 05 21:20:21 d19-25-left.lab.archivas.com pacemaker-fenced    [47287] 
([email protected]:901)  trace:  > (null)_(null)_0: 
/usr/sbin/fence_ipmilan = 0
Jul 05 21:20:21 d19-25-left.lab.archivas.com pacemaker-fenced    [47287] 
([email protected]:903)  trace:  >  stdout: <?xml version="1.0" ?>
Jul 05 21:20:21 d19-25-left.lab.archivas.com pacemaker-fenced    [47287] 
([email protected]:901)  trace:  > (null)_(null)_0: 
/usr/sbin/fence_sbd = 0
Jul 05 21:20:21 d19-25-left.lab.archivas.com pacemaker-fenced    [47287] 
([email protected]:903)  trace:  >  stdout: <?xml version="1.0" ?>

This is the grep of /var/log/messages for the failures:

d19-25-left.lab.archivas.com ~ # grep " 5 21:[23].*Failed to .*pgsql-rhino" /var/log/messages
Jul  5 21:20:43 d19-25-left pacemaker-controld[47291]: error: Failed to receive 
meta-data for ocf:heartbeat:pgsql-rhino
Jul  5 21:20:43 d19-25-left pacemaker-controld[47291]: warning: Failed to get 
metadata for postgres (ocf:heartbeat:pgsql-rhino)
Jul  5 21:20:43 d19-25-left pacemaker-controld[47291]: error: Failed to receive 
meta-data for ocf:heartbeat:pgsql-rhino
Jul  5 21:20:43 d19-25-left pacemaker-controld[47291]: warning: Failed to get 
metadata for postgres (ocf:heartbeat:pgsql-rhino)
Jul  5 21:20:43 d19-25-left pacemaker-controld[47291]: error: Failed to receive 
meta-data for ocf:heartbeat:pgsql-rhino
Jul  5 21:20:43 d19-25-left pacemaker-controld[47291]: warning: Failed to get 
metadata for postgres (ocf:heartbeat:pgsql-rhino)
Jul  5 21:20:44 d19-25-left pacemaker-controld[47291]: error: Failed to receive 
meta-data for ocf:heartbeat:pgsql-rhino
Jul  5 21:20:44 d19-25-left pacemaker-controld[47291]: warning: Failed to get 
metadata for postgres (ocf:heartbeat:pgsql-rhino)
Jul  5 21:20:47 d19-25-left pacemaker-controld[47291]: error: Failed to receive 
meta-data for ocf:heartbeat:pgsql-rhino
Jul  5 21:20:47 d19-25-left pacemaker-controld[47291]: warning: Failed to get 
metadata for postgres (ocf:heartbeat:pgsql-rhino)
Jul  5 21:20:48 d19-25-left pacemaker-controld[47291]: error: Failed to receive 
meta-data for ocf:heartbeat:pgsql-rhino
Jul  5 21:20:48 d19-25-left pacemaker-controld[47291]: warning: Failed to get 
metadata for postgres (ocf:heartbeat:pgsql-rhino)
Jul  5 21:20:48 d19-25-left pacemaker-controld[47291]: error: Failed to receive 
meta-data for ocf:heartbeat:pgsql-rhino
Jul  5 21:20:48 d19-25-left pacemaker-controld[47291]: warning: Failed to get 
metadata for postgres (ocf:heartbeat:pgsql-rhino)
Jul  5 21:20:49 d19-25-left pacemaker-controld[47291]: error: Failed to receive 
meta-data for ocf:heartbeat:pgsql-rhino
Jul  5 21:20:49 d19-25-left pacemaker-controld[47291]: warning: Failed to get 
metadata for postgres (ocf:heartbeat:pgsql-rhino)
Jul  5 21:30:26 d19-25-left pacemaker-controld[47291]: error: Failed to receive 
meta-data for ocf:heartbeat:pgsql-rhino
Jul  5 21:30:26 d19-25-left pacemaker-controld[47291]: warning: Failed to get 
metadata for postgres (ocf:heartbeat:pgsql-rhino)
Jul  5 21:30:26 d19-25-left pacemaker-controld[47291]: error: Failed to receive 
meta-data for ocf:heartbeat:pgsql-rhino
Jul  5 21:30:26 d19-25-left pacemaker-controld[47291]: warning: Failed to get 
metadata for postgres (ocf:heartbeat:pgsql-rhino)
Jul  5 21:30:26 d19-25-left pacemaker-controld[47291]: error: Failed to receive 
meta-data for ocf:heartbeat:pgsql-rhino
Jul  5 21:30:26 d19-25-left pacemaker-controld[47291]: warning: Failed to get 
metadata for postgres (ocf:heartbeat:pgsql-rhino)
Jul  5 21:30:26 d19-25-left pacemaker-controld[47291]: error: Failed to receive 
meta-data for ocf:heartbeat:pgsql-rhino
Jul  5 21:30:26 d19-25-left pacemaker-controld[47291]: warning: Failed to get 
metadata for postgres (ocf:heartbeat:pgsql-rhino)
d19-25-left.lab.archivas.com ~ # 

Sorry, these logs are not from the same time as this morning's, because I 
reinstalled the cluster a couple of times today.
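
For lining the two logs up by time, something along these lines might help. 
It is only a rough sketch: the helper names to_pcmk_ts and failure_seconds are 
made up for illustration, and it assumes the two timestamp styles seen in this 
thread ("Jul  5 21:20:43" in /var/log/messages, "Jul 05 21:20:43" in 
pacemaker.log).

```shell
# Hypothetical helpers, not part of pacemaker.

# Rewrite a syslog-style line prefix ("Jul  5 21:20:43 ...") as a
# pacemaker.log-style timestamp ("Jul 05 21:20:43"), one per input line.
to_pcmk_ts() {
    awk '{ printf "%s %02d %s\n", $1, $2, $3 }'
}

# Print the distinct seconds at which the metadata failure was logged
# in the syslog-style file given as $1.
failure_seconds() {
    grep 'Failed to receive meta-data' "$1" | to_pcmk_ts | sort -u
}

# Usage (paths as used in this thread):
#   failure_seconds /var/log/messages | while read -r ts; do
#       grep "^$ts" /var/log/pacemaker.log
#   done
```

If the inner grep prints nothing for a given second, no pacemaker.log lines 
were written in that second at all, which is a different situation from 
trace lines being present but unrelated to the meta-data call.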

Thanks,
_Vitaly


> On 07/05/2022 3:19 PM Reid Wahl <[email protected]> wrote:
> 
>  
> On Tue, Jul 5, 2022 at 5:17 AM vitaly <[email protected]> wrote:
> >
> > Hello,
> > Thanks for looking at this issue!
> > Snippets from /var/log/messages and /var/log/pacemaker.log are below.
> > _Vitaly
> >
> > Here is /var/log/pacemaker.log snippet around the failure:
> >
> > Jul 05 11:54:33 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:277)      trace: Reading 
> > M_IP_monitor_10000 stdout into offset 177
> > Jul 05 11:54:34  tomcat-rhino(tomcat-instance)[2295103]:    INFO: [tomcat] 
> > Leave tomcat start 0
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:280)      trace: Reading 
> > tomcat-instance_start_0 stderr into offset 0
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:277)      trace: Reading 
> > tomcat-instance_start_0 stdout into offset 10505
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (log_finished@execd_commands.c:214)         info: tomcat-instance start 
> > (call 59, PID 2295103) exited with status 0 (execution time 110997ms, queue 
> > time 0ms)
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (log_execute@execd_commands.c:232)  info: executing - rsc:N1F1 action:start 
> > call_id:66
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (log_execute@execd_commands.c:232)  info: executing - rsc:fs_monitor 
> > action:start call_id:67
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:280)      trace: Reading 
> > fs_monitor_start_0 stdout into offset 0
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:287)      trace: Got 54 chars: 2298369 
> > (process ID) old priority 0, new priority -10
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:280)      trace: Reading N1F1_start_0 
> > stdout into offset 0
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:287)      trace: Got 175 chars: 8: bond0: 
> > <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:280)      trace: Reading 
> > tomcat-instance_monitor_10000 stderr into offset 0
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:280)      trace: Reading 
> > tomcat-instance_monitor_10000 stdout into offset 0
> > Jul 05 11:54:34  fs_monitor-rhino(fs_monitor)[2298359]:    INFO: Started 
> > fs_monitor.sh, pid=2298369
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:280)      trace: Reading 
> > fs_monitor_start_0 stderr into offset 0
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:277)      trace: Reading 
> > fs_monitor_start_0 stdout into offset 54
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (log_finished@execd_commands.c:214)         info: fs_monitor start (call 
> > 67, PID 2298359) exited with status 0 (execution time 31ms, queue time 0ms)
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (log_execute@execd_commands.c:232)  info: executing - rsc:ClusterMonitor 
> > action:start call_id:69
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:280)      trace: Reading 
> > fs_monitor_monitor_10000 stderr into offset 0
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:280)      trace: Reading 
> > fs_monitor_monitor_10000 stdout into offset 0
> > Jul 05 11:54:34  IPaddr2-rhino(N1F1)[2298357]:    INFO: Adding inet address 
> > 172.18.51.93/23 with broadcast address 172.18.51.255 to device bond0 (with 
> > label bond0:N1F1)
> > Jul 05 11:54:34  IPaddr2-rhino(N1F1)[2298357]:    INFO: Bringing device 
> > bond0 up
> > Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log
> > Jul 05 11:54:34  IPaddr2-rhino(N1F1)[2298357]:    INFO: 
> > /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p 
> > /run/resource-agents/send_arp-172.18.51.93 bond0 172.18.51.93 auto not_used 
> > not_used
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:280)      trace: Reading N1F1_start_0 
> > stderr into offset 0
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:277)      trace: Reading N1F1_start_0 
> > stdout into offset 175
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (log_finished@execd_commands.c:214)         info: N1F1 start (call 66, PID 
> > 2298357) exited with status 0 (execution time 68ms, queue time 0ms)
> > Jul 05 11:54:34  cluster_monitor-rhino(ClusterMonitor)[2298481]:    INFO: 
> > Started cluster_monitor.sh, pid=2298549
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (log_finished@execd_commands.c:214)         info: ClusterMonitor start 
> > (call 69, PID 2298481) exited with status 0 (execution time 40ms, queue 
> > time 0ms)
> > Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:280)      trace: Reading 
> > ClusterMonitor_monitor_10000 stdout into offset 0
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:287)      trace: Got 8 chars: 2298549
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:280)      trace: Reading 
> > N1F1_monitor_10000 stdout into offset 0
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:287)      trace: Got 175 chars: 8: bond0: 
> > <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state
> > Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:280)      trace: Reading 
> > N1F1_monitor_10000 stderr into offset 0
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:277)      trace: Reading 
> > N1F1_monitor_10000 stdout into offset 175
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:277)      trace: Reading 
> > ClusterMonitor_monitor_10000 stdout into offset 8
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:287)      trace: Got 499 chars: root     
> > 2298549  0.0  0.0 127584 13156 ?        S    11:54   0:00 /sbin/crm_mon
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:287)      trace: Got 19 chars: ep 
> > cluster_monitor
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:280)      trace: Reading 
> > ClusterMonitor_monitor_10000 stderr into offset 0
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:277)      trace: Reading 
> > ClusterMonitor_monitor_10000 stdout into offset 526
> > Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log
> > Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log
> > Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:280)      trace: Reading 
> > ethmonitor_monitor_10000 stderr into offset 0
> > Jul 05 11:54:34 d19-25-right.lab.archivas.com pacemaker-execd     [2294543] 
> > (svc_read_output@services_linux.c:280)      trace: Reading 
> > ethmonitor_monitor_10000 stdout into offset 0
> > Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log
> > Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log
> > Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log
> > Jul 05 11:54:34  pgsql-rhino(postgres)[2298353]:    INFO: Changing 
> > pgsql-data-status on d19-25-left.lab.archivas.com : 
> > DISCONNECT->STREAMING|ASYNC.
> > Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log
> > Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log
> > Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log
> > Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log
> > Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log
> > Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log
> > Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log
> > Jul 05 11:54:35  pgsql-rhino(postgres)[2298353]:    INFO: Setup 
> > d19-25-left.lab.archivas.com into sync mode.
> >
> >
> > *********
> > Here is /var/log/messages snippet around the failure:
> >
> > Jul  5 11:54:02 d19-25-right systemd[1]: session-1133.scope: Succeeded.
> > Jul  5 11:54:34 d19-25-right tomcat-rhino(tomcat-instance)[2295103]: INFO: 
> > [tomcat] Leave tomcat start 0
> > Jul  5 11:54:34 d19-25-right pacemaker-controld[2294546]: notice: Result of 
> > start operation for tomcat-instance on d19-25-right.lab.archivas.com: ok
> > Jul  5 11:54:34 d19-25-right pacemaker-controld[2294546]: notice: 
> > Requesting local execution of monitor operation for postgres on 
> > d19-25-right.lab.archivas.com
> > Jul  5 11:54:34 d19-25-right pacemaker-controld[2294546]: error: Failed to 
> > receive meta-data for ocf:heartbeat:pgsql-rhino
> > Jul  5 11:54:34 d19-25-right pacemaker-controld[2294546]: warning: Failed 
> > to get metadata for postgres (ocf:heartbeat:pgsql-rhino)
> > Jul  5 11:54:34 d19-25-right pacemaker-controld[2294546]: notice: 
> > Requesting local execution of monitor operation for tomcat-instance on 
> > d19-25-right.lab.archivas.com
> > Jul  5 11:54:34 d19-25-right pacemaker-controld[2294546]: notice: 
> > Requesting local execution of monitor operation for ethmonitor on 
> > d19-25-right.lab.archivas.com
> > Jul  5 11:54:34 d19-25-right pacemaker-controld[2294546]: notice: 
> > Requesting local execution of start operation for N1F1 on 
> > d19-25-right.lab.archivas.com
> > Jul  5 11:54:34 d19-25-right pacemaker-controld[2294546]: notice: 
> > Requesting local execution of start operation for fs_monitor on 
> > d19-25-right.lab.archivas.com
> > Jul  5 11:54:34 d19-25-right fs_monitor-rhino(fs_monitor)[2298359]: INFO: 
> > Started fs_monitor.sh, pid=2298369
> > Jul  5 11:54:34 d19-25-right pacemaker-controld[2294546]: notice: Result of 
> > monitor operation for tomcat-instance on d19-25-right.lab.archivas.com: ok
> > Jul  5 11:54:34 d19-25-right pacemaker-controld[2294546]: notice: Result of 
> > start operation for fs_monitor on d19-25-right.lab.archivas.com: ok
> > Jul  5 11:54:34 d19-25-right pacemaker-controld[2294546]: notice: 
> > Requesting local execution of monitor operation for fs_monitor on 
> > d19-25-right.lab.archivas.com
> > Jul  5 11:54:34 d19-25-right pacemaker-controld[2294546]: notice: 
> > Requesting local execution of start operation for ClusterMonitor on 
> > d19-25-right.lab.archivas.com
> > Jul  5 11:54:34 d19-25-right pacemaker-controld[2294546]: notice: Result of 
> > monitor operation for fs_monitor on d19-25-right.lab.archivas.com: ok
> > Jul  5 11:54:34 d19-25-right IPaddr2-rhino(N1F1)[2298357]: INFO: Adding 
> > inet address 172.18.51.93/23 with broadcast address 172.18.51.255 to device 
> > bond0 (with label bond0:N1F1)
> > Jul  5 11:54:34 d19-25-right IPaddr2-rhino(N1F1)[2298357]: INFO: Bringing 
> > device bond0 up
> > Jul  5 11:54:34 d19-25-right IPaddr2-rhino(N1F1)[2298357]: INFO: 
> > /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p 
> > /run/resource-agents/send_arp-172.18.51.93 bond0 172.18.51.93 auto not_used 
> > not_used
> > Jul  5 11:54:34 d19-25-right pacemaker-controld[2294546]: notice: Result of 
> > start operation for N1F1 on d19-25-right.lab.archivas.com: ok
> > Jul  5 11:54:34 d19-25-right pacemaker-controld[2294546]: notice: 
> > Requesting local execution of monitor operation for N1F1 on 
> > d19-25-right.lab.archivas.com
> > Jul  5 11:54:34 d19-25-right 
> > cluster_monitor-rhino(ClusterMonitor)[2298481]: INFO: Started 
> > cluster_monitor.sh, pid=2298549
> > Jul  5 11:54:34 d19-25-right pacemaker-controld[2294546]: notice: Result of 
> > start operation for ClusterMonitor on d19-25-right.lab.archivas.com: ok
> > Jul  5 11:54:34 d19-25-right pacemaker-controld[2294546]: notice: 
> > Requesting local execution of monitor operation for ClusterMonitor on 
> > d19-25-right.lab.archivas.com
> > Jul  5 11:54:34 d19-25-right pacemaker-controld[2294546]: notice: Result of 
> > monitor operation for N1F1 on d19-25-right.lab.archivas.com: ok
> > Jul  5 11:54:34 d19-25-right pacemaker-controld[2294546]: notice: Result of 
> > monitor operation for ClusterMonitor on d19-25-right.lab.archivas.com: ok
> > Jul  5 11:54:34 d19-25-right pacemaker-controld[2294546]: notice: Result of 
> > monitor operation for ethmonitor on d19-25-right.lab.archivas.com: ok
> > Jul  5 11:54:34 d19-25-right pgsql-rhino(postgres)[2298353]: INFO: Changing 
> > pgsql-data-status on d19-25-left.lab.archivas.com : 
> > DISCONNECT->STREAMING|ASYNC.
> > Jul  5 11:54:35 d19-25-right pgsql-rhino(postgres)[2298353]: INFO: Setup 
> > d19-25-left.lab.archivas.com into sync mode.
> 
> Is that the full pacemaker.log from the 11:54:33-35 interval? The
> services_action_sync() function should have logged at least
> **something**, if we reached the "Failed to receive" message.
> 
> If that's not the full log snippet from that interval, then please share it.
> 
> Also please share the /etc/sysconfig/pacemaker settings:
> 
> # egrep -v '^($|#)' /etc/sysconfig/pacemaker
> 
> 
> > > On 07/04/2022 3:57 PM Reid Wahl <[email protected]> wrote:
> > >
> > >
> > > On Mon, Jul 4, 2022 at 7:19 AM vitaly <[email protected]> wrote:
> > > >
> > > > I get printout of metadata as follows:
> > > > d19-25-left.lab.archivas.com ~ # OCF_ROOT=/usr/lib/ocf 
> > > > /usr/lib/ocf/resource.d/heartbeat/pgsql-rhino meta-data
> > > > <?xml version="1.0"?>
> > > > <!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
> > > > <resource-agent name="pgsql-rhino">
> > > > <version>1.0</version>
> > > >
> > > > <longdesc lang="en">
> > > > Resource script for PostgreSQL. It manages a PostgreSQL as an HA 
> > > > resource.
> > > > </longdesc>
> > > > <shortdesc lang="en">Manages a PostgreSQL database instance</shortdesc>
> > > >
> > > > <parameters>
> > > > <parameter name="pgctl" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > Path to pg_ctl command.
> > > > </longdesc>
> > > > <shortdesc lang="en">pgctl</shortdesc>
> > > > <content type="string" default="/usr/bin/pg_ctl" />
> > > > </parameter>
> > > >
> > > > <parameter name="start_opt" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > Start options (-o start_opt in pg_ctl). "-i -p 5432" for example.
> > > > </longdesc>
> > > > <shortdesc lang="en">start_opt</shortdesc>
> > > > <content type="string" default="" />
> > > >
> > > > </parameter>
> > > > <parameter name="ctl_opt" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > Additional pg_ctl options (-w, -W etc..).
> > > > </longdesc>
> > > > <shortdesc lang="en">ctl_opt</shortdesc>
> > > > <content type="string" default="" />
> > > > </parameter>
> > > >
> > > > <parameter name="psql" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > Path to psql command.
> > > > </longdesc>
> > > > <shortdesc lang="en">psql</shortdesc>
> > > > <content type="string" default="/usr/bin/psql" />
> > > > </parameter>
> > > >
> > > > <parameter name="pgdata" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > Path to PostgreSQL data directory.
> > > > </longdesc>
> > > > <shortdesc lang="en">pgdata</shortdesc>
> > > > <content type="string" default="/var/lib/pgsql/data" />
> > > > </parameter>
> > > >
> > > > <parameter name="pgdba" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > User that owns PostgreSQL.
> > > > </longdesc>
> > > > <shortdesc lang="en">pgdba</shortdesc>
> > > > <content type="string" default="postgres" />
> > > > </parameter>
> > > >
> > > > <parameter name="pghost" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > Hostname/IP address where PostgreSQL is listening
> > > > </longdesc>
> > > > <shortdesc lang="en">pghost</shortdesc>
> > > > <content type="string" default="" />
> > > > </parameter>
> > > >
> > > > <parameter name="pgport" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > Port where PostgreSQL is listening
> > > > </longdesc>
> > > > <shortdesc lang="en">pgport</shortdesc>
> > > > <content type="integer" default="5432" />
> > > > </parameter>
> > > >
> > > > <parameter name="monitor_user" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > PostgreSQL user that pgsql RA will user for monitor operations. If it's 
> > > > not set
> > > > pgdba user will be used.
> > > > </longdesc>
> > > > <shortdesc lang="en">monitor_user</shortdesc>
> > > > <content type="string" default="" />
> > > > </parameter>
> > > >
> > > > <parameter name="monitor_password" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > Password for monitor user.
> > > > </longdesc>
> > > > <shortdesc lang="en">monitor_password</shortdesc>
> > > > <content type="string" default="" />
> > > > </parameter>
> > > >
> > > > <parameter name="monitor_sql" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > SQL script that will be used for monitor operations.
> > > > </longdesc>
> > > > <shortdesc lang="en">monitor_sql</shortdesc>
> > > > <content type="string" default="select now();" />
> > > > </parameter>
> > > >
> > > > <parameter name="config" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > Path to the PostgreSQL configuration file for the instance.
> > > > </longdesc>
> > > > <shortdesc lang="en">Configuration file</shortdesc>
> > > > <content type="string" default="/var/lib/pgsql/data/postgresql.conf" />
> > > > </parameter>
> > > >
> > > > <parameter name="pgdb" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > Database that will be used for monitoring.
> > > > </longdesc>
> > > > <shortdesc lang="en">pgdb</shortdesc>
> > > > <content type="string" default="rhinodb" />
> > > > </parameter>
> > > >
> > > > <parameter name="logfile" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > Path to PostgreSQL server log output file.
> > > > </longdesc>
> > > > <shortdesc lang="en">logfile</shortdesc>
> > > > <content type="string" default="/dev/null" />
> > > > </parameter>
> > > >
> > > > <parameter name="socketdir" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > Unix socket directory for PostgeSQL
> > > > </longdesc>
> > > > <shortdesc lang="en">socketdir</shortdesc>
> > > > <content type="string" default="" />
> > > > </parameter>
> > > >
> > > > <parameter name="stop_escalate" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > Number of shutdown retries (using -m fast) before resorting to -m 
> > > > immediate
> > > > </longdesc>
> > > > <shortdesc lang="en">stop escalation</shortdesc>
> > > > <content type="integer" default="30" />
> > > > </parameter>
> > > >
> > > > <parameter name="rep_mode" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > Replication mode(none(default)/async/sync).
> > > > "async" and "sync" require PostgreSQL 9.1 or later.
> > > > If you use async or sync, it requires node_list, master_ip, 
> > > > restore_command
> > > > parameters, and needs setting postgresql.conf, pg_hba.conf up for
> > > > replication.
> > > > Please delete "include /../../rep_mode.conf" line in postgresql.conf
> > > > when you switch from sync to async.
> > > > </longdesc>
> > > > <shortdesc lang="en">rep_mode</shortdesc>
> > > > <content type="string" default="none" />
> > > > </parameter>
> > > >
> > > > <parameter name="node_list" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > All node names. Please separate each node name with a space.
> > > > This is required for replication.
> > > > </longdesc>
> > > > <shortdesc lang="en">node list</shortdesc>
> > > > <content type="string" default="" />
> > > > </parameter>
> > > >
> > > > <parameter name="restore_command" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > restore_command for recovery.conf.
> > > > This is required for replication.
> > > > </longdesc>
> > > > <shortdesc lang="en">restore_command</shortdesc>
> > > > <content type="string" default="" />
> > > > </parameter>
> > > >
> > > > <parameter name="master_ip" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > Master's floating IP address to be connected from hot standby.
> > > > This parameter is used for "primary_conninfo" in recovery.conf.
> > > > This is required for replication.
> > > > </longdesc>
> > > > <shortdesc lang="en">master ip</shortdesc>
> > > > <content type="string" default="" />
> > > > </parameter>
> > > >
> > > > <parameter name="repuser" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > User used to connect to the master server.
> > > > This parameter is used for "primary_conninfo" in recovery.conf.
> > > > This is required for replication.
> > > > </longdesc>
> > > > <shortdesc lang="en">repuser</shortdesc>
> > > > <content type="string" default="postgres" />
> > > > </parameter>
> > > >
> > > > <parameter name="remote_wals_dir" unique="0" required="1">
> > > > <longdesc lang="en">
> > > > Location of WALS archived by the other node
> > > > </longdesc>
> > > > <shortdesc lang="en">remote_wals_dir</shortdesc>
> > > > <content type="string" default="" />
> > > > </parameter>
> > > >
> > > > <parameter name="xlogs_dir" unique="0" required="1">
> > > > <longdesc lang="en">
> > > > Location of WALS on current node in Rhino before 2.2.0
> > > > </longdesc>
> > > > <shortdesc lang="en">xlogs_dir</shortdesc>
> > > > <content type="string" default="" />
> > > > </parameter>
> > > >
> > > > <parameter name="wals_dir" unique="0" required="1">
> > > > <longdesc lang="en">
> > > > Location of WALS on current node in Rhino 2.2.0 and later
> > > > </longdesc>
> > > > <shortdesc lang="en">wals_dir</shortdesc>
> > > > <content type="string" default="" />
> > > > </parameter>
> > > >
> > > > <parameter name="reppassword" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > User used to connect to the master server.
> > > > This parameter is used for "primary_conninfo" in recovery.conf.
> > > > This is required for replication.
> > > > </longdesc>
> > > > <shortdesc lang="en">reppassword</shortdesc>
> > > > <content type="string" default="" />
> > > > </parameter>
> > > >
> > > > <parameter name="primary_conninfo_opt" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > primary_conninfo options of recovery.conf except host, port, user and 
> > > > application_name.
> > > > This is optional for replication.
> > > > </longdesc>
> > > > <shortdesc lang="en">primary_conninfo_opt</shortdesc>
> > > > <content type="string" default="" />
> > > > </parameter>
> > > >
> > > > <parameter name="tmpdir" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > Path to temporary directory.
> > > > This is optional for replication.
> > > > </longdesc>
> > > > <shortdesc lang="en">tmpdir</shortdesc>
> > > > <content type="string" default="/var/lib/pgsql/tmp" />
> > > > </parameter>
> > > >
> > > > <parameter name="xlog_check_count" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > Number of checking xlog on monitor before promote.
> > > > This is optional for replication.
> > > > </longdesc>
> > > > <shortdesc lang="en">xlog check count</shortdesc>
> > > > <content type="integer" default="" />
> > > > </parameter>
> > > >
> > > > <parameter name="crm_attr_timeout" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > The timeout of crm_attribute forever update command.
> > > > Default value is 5 seconds.
> > > > This is optional for replication.
> > > > </longdesc>
> > > > <shortdesc lang="en">The timeout of crm_attribute forever update 
> > > > command.</shortdesc>
> > > > <content type="integer" default="5" />
> > > > </parameter>
> > > >
> > > > <parameter name="stop_escalate_in_slave" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > Number of shutdown retries (using -m fast) before resorting to -m 
> > > > immediate
> > > > in Slave state.
> > > > This is optional for replication.
> > > > </longdesc>
> > > > <shortdesc lang="en">stop escalation_in_slave</shortdesc>
> > > > <content type="integer" default="30" />
> > > > </parameter>
> > > >
> > > > <parameter name="process_start_timeout" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > Number of seconds to wait for a postgreSQL process to be running but 
> > > > not necessarilly usable
> > > > </longdesc>
> > > > <shortdesc lang="en">Seconds to wait for a process to be 
> > > > running</shortdesc>
> > > > <content type="integer" default="30" />
> > > > </parameter>
> > > >
> > > > <parameter name="start_attempts_force_recover" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > Number of failed starts before the system forces a recovery from the 
> > > > master database
> > > > </longdesc>
> > > > <shortdesc lang="en">Start failures before recovery</shortdesc>
> > > > <content type="integer" default="20" />
> > > > </parameter>
> > > >
> > > > <parameter name="rhino_config_file" unique="0" required="0">
> > > > <longdesc lang="en">
> > > > Configuration file with overrides for pgsql-rhino.
> > > > </longdesc>
> > > > <shortdesc lang="en">Rhino configuration file</shortdesc>
> > > > <content type="string" default="/opt/rhino/config/pgsql-rhino.conf" />
> > > > </parameter>
> > > >
> > > > </parameters>
> > > >
> > > > <actions>
> > > > <action name="start" timeout="120" />
> > > > <action name="stop" timeout="120" />
> > > > <action name="status" timeout="60" />
> > > > <action name="monitor" depth="0" timeout="30" interval="30"/>
> > > > <action name="monitor" depth="0" timeout="30" interval="29" 
> > > > role="Master" />
> > > > <action name="promote" timeout="120" />
> > > > <action name="demote" timeout="120" />
> > > > <action name="notify"   timeout="90" />
> > > > <action name="meta-data" timeout="5" />
> > > > <action name="validate-all" timeout="5" />
> > > > <action name="methods" timeout="5" />
> > > > </actions>
> > > > </resource-agent>
> > >
> > > Hmm, seems reasonable. No permissions issues, and it looks like we
> > > should only print the "Failed to receive" message if we don't receive
> > > any stdout at all from the meta-data action.
> > >
> > > Can you add the following to /etc/sysconfig/pacemaker and restart
> > > pacemaker? Then monitor /var/log/pacemaker/pacemaker.log for relevant
> > > trace-level messages around the same time as the "Failed to receive
> > > meta-data" messages.
> > >
> > > PCMK_trace_functions=services_action_sync,svc_read_output
> > >
> > > This will get fairly verbose if you have more than a couple of
> > > resources, so after you've grabbed any relevant logs, comment that
> > > line out and restart pacemaker again.
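
The enable-then-disable cycle described above can be scripted. What follows is only a minimal sketch: the helper names enable_trace and disable_trace are hypothetical, CONF defaults to the sysconfig path mentioned in this thread, and the pacemaker restart is deliberately left out because how to restart safely depends on the cluster.

```shell
#!/bin/sh
# Hypothetical helpers; CONF defaults to the path used in this thread.
CONF="${CONF:-/etc/sysconfig/pacemaker}"
TRACE_LINE='PCMK_trace_functions=services_action_sync,svc_read_output'

# Append the trace setting unless an active (uncommented) copy already exists.
enable_trace() {
    grep -qxF "$TRACE_LINE" "$CONF" || printf '%s\n' "$TRACE_LINE" >> "$CONF"
}

# Comment out the trace setting once the relevant logs have been captured.
# Uses GNU sed's in-place option (-i).
disable_trace() {
    sed -i "s|^${TRACE_LINE}\$|#${TRACE_LINE}|" "$CONF"
}
```

Run enable_trace, restart pacemaker, collect the trace-level messages, then run disable_trace and restart again.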
> > >
> > > >
> > > > > On 07/04/2022 5:39 AM Reid Wahl <[email protected]> wrote:
> > > > >
> > > > >
> > > > > On Mon, Jul 4, 2022 at 1:06 AM Reid Wahl <[email protected]> wrote:
> > > > > >
> > > > > > On Sat, Jul 2, 2022 at 1:12 PM vitaly <[email protected]> wrote:
> > > > > > >
> > > > > > > Sorry, I noticed that I was missing the meta attribute 
> > > > > > > "notify=true". After adding it to the postgres-ms configuration, 
> > > > > > > "notify" events started to come through.
> > > > > > > Item 1 still needs an explanation, as pacemaker-controld keeps 
> > > > > > > complaining.
> > > > > >
> > > > > > What happens when you run `OCF_ROOT=/usr/lib/ocf
> > > > > > /usr/lib/ocf/resource.d/heartbeat/pgsql-rhino meta-data`?
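
When running the meta-data action by hand, it helps to look at both the exit status and whether anything arrived on stdout. A minimal sketch follows; check_metadata is a hypothetical helper, and the 5-second limit simply mirrors the meta-data timeout the agent advertises in its own XML, which is an assumption and not necessarily the timeout Pacemaker applies.

```shell
#!/bin/sh
# Hypothetical helper: run an OCF agent's meta-data action by hand and
# report the exit status and whether any stdout was produced.
check_metadata() {
    agent="$1"
    rc=0
    # stderr is discarded so that only stdout is measured.
    out=$(OCF_ROOT="${OCF_ROOT:-/usr/lib/ocf}" timeout 5 "$agent" meta-data 2>/dev/null) || rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "meta-data exited with status $rc"
        return 1
    fi
    if [ -z "$out" ]; then
        echo "meta-data exited 0 but produced no stdout"
        return 1
    fi
    echo "meta-data OK (${#out} characters on stdout)"
}
```

Usage: check_metadata /usr/lib/ocf/resource.d/heartbeat/pgsql-rhino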
> > > > >
> > > > > This may also be relevant:
> > > > > https://lists.clusterlabs.org/pipermail/users/2022-June/030391.html
> > > > >
> > > > > >
> > > > > > > Thanks!
> > > > > > > _Vitaly
> > > > > > >
> > > > > > > > On 07/02/2022 2:04 PM vitaly <[email protected]> wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > Hello Everybody.
> > > > > > > > I have a 2-node cluster with a clone resource “postgres-ms”. We 
> > > > > > > > are running the following versions of pacemaker/corosync:
> > > > > > > > d19-25-left.lab.archivas.com ~ # rpm -qa | grep 
> > > > > > > > "pacemaker\|corosync"
> > > > > > > > pacemaker-cluster-libs-2.0.5-9.el8.x86_64
> > > > > > > > pacemaker-libs-2.0.5-9.el8.x86_64
> > > > > > > > pacemaker-cli-2.0.5-9.el8.x86_64
> > > > > > > > corosynclib-3.1.0-5.el8.x86_64
> > > > > > > > pacemaker-schemas-2.0.5-9.el8.noarch
> > > > > > > > corosync-3.1.0-5.el8.x86_64
> > > > > > > > pacemaker-2.0.5-9.el8.x86_64
> > > > > > > >
> > > > > > > > There are a couple of issues that could be related.
> > > > > > > > 1. The following messages from pacemaker-controld appear in the 
> > > > > > > > logs:
> > > > > > > > Jul  2 14:59:27 d19-25-right pacemaker-controld[1489734]: 
> > > > > > > > error: Failed to receive meta-data for ocf:heartbeat:pgsql-rhino
> > > > > > > > Jul  2 14:59:27 d19-25-right pacemaker-controld[1489734]: 
> > > > > > > > warning: Failed to get metadata for postgres 
> > > > > > > > (ocf:heartbeat:pgsql-rhino)
> > > > > > > >
> > > > > > > > 2. ocf:heartbeat:pgsql-rhino does not get any "notify" 
> > > > > > > > operations, which causes multiple issues with postgres 
> > > > > > > > synchronization during availability events.
> > > > > > > >
> > > > > > > > 3. Item 2 raises another question. Who sets these values:
> > > > > > > > ${OCF_RESKEY_CRM_meta_notify_type}
> > > > > > > > ${OCF_RESKEY_CRM_meta_notify_operation}
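
For context: Pacemaker itself exports these two variables into the agent's environment when it invokes the notify action, which it does only for clones that have the notify=true meta attribute. OCF_RESKEY_CRM_meta_notify_type is "pre" or "post", and OCF_RESKEY_CRM_meta_notify_operation names the action being announced (start, stop, promote, demote). Below is a minimal sketch of how an agent might dispatch on them; handle_notify and the echoed messages are hypothetical and are not taken from pgsql-rhino.

```shell
#!/bin/sh
# Hypothetical notify dispatcher for a promotable clone agent.
handle_notify() {
    n_type="${OCF_RESKEY_CRM_meta_notify_type:-}"
    n_op="${OCF_RESKEY_CRM_meta_notify_operation:-}"
    case "${n_type}-${n_op}" in
        pre-promote)  echo "a promotion is about to happen" ;;
        post-promote) echo "a promotion has completed" ;;
        pre-demote)   echo "a demotion is about to happen" ;;
        post-demote)  echo "a demotion has completed" ;;
        post-start)   echo "an instance has started" ;;
        post-stop)    echo "an instance has stopped" ;;
        *)            echo "ignoring notification: ${n_type}-${n_op}" ;;
    esac
    return 0   # 0 is OCF_SUCCESS
}
```

If neither variable is set when the agent runs, the action was not invoked as a notification, and the sketch falls through to the catch-all branch.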
> > > > > > > >
> > > > > > > > Here is an excerpt from the cluster config:
> > > > > > > >
> > > > > > > > d19-25-left.lab.archivas.com ~ # pcs config
> > > > > > > >
> > > > > > > > Cluster Name:
> > > > > > > > Corosync Nodes:
> > > > > > > >  d19-25-right.lab.archivas.com d19-25-left.lab.archivas.com
> > > > > > > > Pacemaker Nodes:
> > > > > > > >  d19-25-left.lab.archivas.com d19-25-right.lab.archivas.com
> > > > > > > >
> > > > > > > > Resources:
> > > > > > > >  Clone: postgres-ms
> > > > > > > >   Meta Attrs: promotable=true target-role=started
> > > > > > > >   Resource: postgres (class=ocf provider=heartbeat 
> > > > > > > > type=pgsql-rhino)
> > > > > > > >    Attributes: master_ip=172.16.1.6 
> > > > > > > > node_list="d19-25-left.lab.archivas.com 
> > > > > > > > d19-25-right.lab.archivas.com" pgdata=/pg_data 
> > > > > > > > remote_wals_dir=/remote/walarchive rep_mode=sync 
> > > > > > > > reppassword=XXXXXX repuser=XXXXXXX 
> > > > > > > > restore_command="/opt/rhino/sil/bin/script_wrapper.sh 
> > > > > > > > wal_restore.py  %f %p" tmpdir=/pg_data/tmp 
> > > > > > > > wals_dir=/pg_data/pg_wal xlogs_dir=/pg_data/pg_xlog
> > > > > > > >    Meta Attrs: is-managed=true
> > > > > > > >    Operations: demote interval=0 on-fail=restart timeout=120s 
> > > > > > > > (postgres-demote-interval-0)
> > > > > > > >                methods interval=0s timeout=5 
> > > > > > > > (postgres-methods-interval-0s)
> > > > > > > >                monitor interval=10s on-fail=restart 
> > > > > > > > timeout=300s (postgres-monitor-interval-10s)
> > > > > > > >                monitor interval=5s on-fail=restart role=Master 
> > > > > > > > timeout=300s (postgres-monitor-interval-5s)
> > > > > > > >                notify interval=0 on-fail=restart timeout=90s 
> > > > > > > > (postgres-notify-interval-0)
> > > > > > > >                promote interval=0 on-fail=restart timeout=120s 
> > > > > > > > (postgres-promote-interval-0)
> > > > > > > >                start interval=0 on-fail=restart timeout=1800s 
> > > > > > > > (postgres-start-interval-0)
> > > > > > > >                stop interval=0 on-fail=fence timeout=120s 
> > > > > > > > (postgres-stop-interval-0)
> > > > > > > > Thank you very much!
> > > > > > > > _Vitaly
> > > > > > > > _______________________________________________
> > > > > > > > Manage your subscription:
> > > > > > > > https://lists.clusterlabs.org/mailman/listinfo/users
> > > > > > > >
> > > > > > > > ClusterLabs home: https://www.clusterlabs.org/
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Regards,
> > > > > >
> > > > > > Reid Wahl (He/Him), RHCA
> > > > > > Senior Software Maintenance Engineer, Red Hat
> > > > > > CEE - Platform Support Delivery - ClusterHA
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> >
> 
> 
