We found one of our cluster nodes down this morning. The server itself was up,
but the cluster services were not running. Examining the logs, we found that
the cluster simply stopped around 09:40:31, and I started it back up manually
(pcs cluster start) at 11:49:48. I can't imagine Pacemaker just randomly
terminating. Any thoughts on why it would behave this way?
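For context, here is roughly how I have been trying to narrow down what killed
the daemons in that window (a sketch for a systemd/CentOS 7 box; the timestamps
come from the excerpt below, and the 2021-05-27 date is inferred from the epoch
in the pe_calc refs):

  # Everything logged around the time the cluster went quiet:
  journalctl --since "2021-05-27 09:35:00" --until "2021-05-27 09:45:00"

  # Did systemd stop the units, or did the daemons die on their own?
  journalctl -u pacemaker -u corosync --since "2021-05-27 09:00:00"

  # Rule out the kernel OOM killer (ring buffer permitting):
  dmesg -T | grep -iE 'oom|killed process'

The logs: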
May 27 09:25:31 [92170] 001store01a pengine: notice: process_pe_message:
Calculated transition 91482, saving inputs in
/var/lib/pacemaker/pengine/pe-input-756.bz2
May 27 09:25:31 [92171] 001store01a crmd: info: do_state_transition:
State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response
May 27 09:25:31 [92171] 001store01a crmd: info: do_te_invoke:
Processing graph 91482 (ref=pe_calc-dc-1622121931-124396) derived from
/var/lib/pacemaker/pengine/pe-input-756.bz2
May 27 09:25:31 [92171] 001store01a crmd: notice: run_graph:
Transition 91482 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-756.bz2): Complete
May 27 09:25:31 [92171] 001store01a crmd: info: do_log: Input
I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd
May 27 09:25:31 [92171] 001store01a crmd: notice: do_state_transition:
State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd
May 27 09:40:31 [92171] 001store01a crmd: info: crm_timer_popped:
PEngine Recheck Timer (I_PE_CALC) just popped (900000ms)
May 27 09:40:31 [92171] 001store01a crmd: notice: do_state_transition:
State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC
cause=C_TIMER_POPPED origin=crm_timer_popped
May 27 09:40:31 [92171] 001store01a crmd: info: do_state_transition:
Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
May 27 09:40:31 [92170] 001store01a pengine: info: process_pe_message:
Input has not changed since last time, not saving to disk
May 27 09:40:31 [92170] 001store01a pengine: info:
determine_online_status: Node 001store01a is online
May 27 09:40:31 [92170] 001store01a pengine: info: determine_op_status:
Operation monitor found resource p_pure-ftpd-itls active on 001store01a
May 27 09:40:31 [92170] 001store01a pengine: warning:
unpack_rsc_op_failure: Processing failed op monitor for p_vip_ftpclust01
on 001store01a: unknown error (1)
May 27 09:40:31 [92170] 001store01a pengine: info: determine_op_status:
Operation monitor found resource p_pure-ftpd-etls active on 001store01a
May 27 09:40:31 [92170] 001store01a pengine: info: unpack_node_loop:
Node 1 is already processed
May 27 09:40:31 [92170] 001store01a pengine: info: unpack_node_loop:
Node 1 is already processed
May 27 09:40:31 [92170] 001store01a pengine: info: common_print:
p_vip_ftpclust01 (ocf::heartbeat:IPaddr2): Started 001store01a
May 27 09:40:31 [92170] 001store01a pengine: info: common_print:
p_replicator (systemd:pure-replicator): Started 001store01a
May 27 09:40:31 [92170] 001store01a pengine: info: common_print:
p_pure-ftpd-etls (systemd:pure-ftpd-etls): Started 001store01a
May 27 09:40:31 [92170] 001store01a pengine: info: common_print:
p_pure-ftpd-itls (systemd:pure-ftpd-itls): Started 001store01a
May 27 09:40:31 [92170] 001store01a pengine: info: LogActions: Leave
p_vip_ftpclust01 (Started 001store01a)
May 27 09:40:31 [92170] 001store01a pengine: info: LogActions: Leave
p_replicator (Started 001store01a)
May 27 09:40:31 [92170] 001store01a pengine: info: LogActions: Leave
p_pure-ftpd-etls (Started 001store01a)
May 27 09:40:31 [92170] 001store01a pengine: info: LogActions: Leave
p_pure-ftpd-itls (Started 001store01a)
May 27 09:40:31 [92170] 001store01a pengine: notice: process_pe_message:
Calculated transition 91483, saving inputs in
/var/lib/pacemaker/pengine/pe-input-756.bz2
May 27 09:40:31 [92171] 001store01a crmd: info: do_state_transition:
State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response
May 27 09:40:31 [92171] 001store01a crmd: info: do_te_invoke:
Processing graph 91483 (ref=pe_calc-dc-1622122831-124397) derived from
/var/lib/pacemaker/pengine/pe-input-756.bz2
May 27 09:40:31 [92171] 001store01a crmd: notice: run_graph:
Transition 91483 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-756.bz2): Complete
May 27 09:40:31 [92171] 001store01a crmd: info: do_log: Input
I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd
May 27 09:40:31 [92171] 001store01a crmd: notice: do_state_transition:
State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd
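Note: that entire 09:40:31 block is just the cluster-recheck timer firing.
900000 ms is the default 15-minute cluster-recheck-interval, and transition
91483 is a no-op, so Pacemaker's own log shows nothing abnormal before it goes
silent; the next entries are corosync coming up fresh (the pacemakerd lines
that follow are stamped 11:49:48). If it helps, the interval can be confirmed
on a pcs-managed cluster with:

  pcs property list --all | grep cluster-recheck-interval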
[10667] 001store01a.ccnva.local corosync notice [MAIN ] Corosync Cluster
Engine ('2.4.3'): started and ready to provide service.
[10667] 001store01a.ccnva.local corosync info [MAIN ] Corosync built-in
features: dbus systemd xmlconf qdevices qnetd snmp libcgroup pie relro bindnow
[10667] 001store01a.ccnva.local corosync notice [TOTEM ] Initializing transport
(UDP/IP Unicast).
[10667] 001store01a.ccnva.local corosync notice [TOTEM ] Initializing
transmit/receive security (NSS) crypto: none hash: none
[10667] 001store01a.ccnva.local corosync notice [TOTEM ] The network interface
[10.51.14.40] is now up.
[10667] 001store01a.ccnva.local corosync notice [SERV ] Service engine loaded:
corosync configuration map access [0]
[10667] 001store01a.ccnva.local corosync info [QB ] server name: cmap
[10667] 001store01a.ccnva.local corosync notice [SERV ] Service engine loaded:
corosync configuration service [1]
[10667] 001store01a.ccnva.local corosync info [QB ] server name: cfg
[10667] 001store01a.ccnva.local corosync notice [SERV ] Service engine loaded:
corosync cluster closed process group service v1.01 [2]
[10667] 001store01a.ccnva.local corosync info [QB ] server name: cpg
[10667] 001store01a.ccnva.local corosync notice [SERV ] Service engine loaded:
corosync profile loading service [4]
[10667] 001store01a.ccnva.local corosync notice [QUORUM] Using quorum provider
corosync_votequorum
[10667] 001store01a.ccnva.local corosync notice [VOTEQ ] Waiting for all
cluster members. Current votes: 1 expected_votes: 2
[10667] 001store01a.ccnva.local corosync notice [SERV ] Service engine loaded:
corosync vote quorum service v1.0 [5]
[10667] 001store01a.ccnva.local corosync info [QB ] server name: votequorum
[10667] 001store01a.ccnva.local corosync notice [SERV ] Service engine loaded:
corosync cluster quorum service v0.1 [3]
[10667] 001store01a.ccnva.local corosync info [QB ] server name: quorum
[10667] 001store01a.ccnva.local corosync notice [TOTEM ] adding new UDPU member
{10.51.14.40}
[10667] 001store01a.ccnva.local corosync notice [TOTEM ] adding new UDPU member
{10.51.14.41}
[10667] 001store01a.ccnva.local corosync notice [TOTEM ] A new membership
(10.51.14.40:6412) was formed. Members joined: 1
[10667] 001store01a.ccnva.local corosync notice [VOTEQ ] Waiting for all
cluster members. Current votes: 1 expected_votes: 2
[10667] 001store01a.ccnva.local corosync notice [VOTEQ ] Waiting for all
cluster members. Current votes: 1 expected_votes: 2
[10667] 001store01a.ccnva.local corosync notice [VOTEQ ] Waiting for all
cluster members. Current votes: 1 expected_votes: 2
[10667] 001store01a.ccnva.local corosync notice [QUORUM] Members[1]: 1
[10667] 001store01a.ccnva.local corosync notice [MAIN ] Completed service
synchronization, ready to provide service.
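The repeated "Waiting for all cluster members. Current votes: 1
expected_votes: 2" lines above are just this node waiting for its peer after a
fresh start. In a two-node cluster, that behaviour typically comes from a
votequorum stanza along these lines (a generic sketch, not our actual
corosync.conf):

  quorum {
      provider: corosync_votequorum
      expected_votes: 2
      two_node: 1    # implies wait_for_all, hence the waiting above
  }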
May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: notice: main:
Starting Pacemaker 1.1.18-11.el7_5.3 | build=2b07d5c5a9 features:
generated-manpages agent-manpages ncurses libqb-logging libqb-ipc systemd
nagios corosync-native atomic-attrd acls
May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info: main:
Maximum core file size is: 18446744073709551615
May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info:
qb_ipcs_us_publish: server name: pacemakerd
May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info:
crm_get_peer: Created entry
05ad8b08-25a3-4a2d-84cb-1fc355fb697c/0x55d844a446b0 for node 001store01a/1 (1
total)
May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info:
crm_get_peer: Node 1 is now known as 001store01a
May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info:
crm_get_peer: Node 1 has uuid 1
May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info:
crm_update_peer_proc: cluster_connect_cpg: Node 001store01a[1] -
corosync-cpg is now online
May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: warning:
cluster_connect_quorum: Quorum lost
May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info:
crm_get_peer: Created entry
2f1f038e-9cc1-4a43-bab9-e7c91ca0bf3f/0x55d844a45ee0 for node 001store01b/2 (2
total)
May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info:
crm_get_peer: Node 2 is now known as 001store01b
May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info:
crm_get_peer: Node 2 has uuid 2
May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info:
start_child: Using uid=189 and group=189 for process cib
May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info:
start_child: Forked child 10682 for process cib
May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info:
start_child: Forked child 10683 for process stonith-ng
May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info:
start_child: Forked child 10684 for process lrmd
May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info:
start_child: Using uid=189 and group=189 for process attrd
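For completeness, after the manual pcs cluster start, the obvious way to
confirm membership and quorum recovered is:

  corosync-quorumtool -s    # expect "Quorate: Yes" once both nodes are up
  pcs status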
[cid:[email protected]]