Hi! I think "Processing failed op start for vmware_fence on q-gp2-dbpg57-3: unknown error (1)" is the reason. You should investigate why it could not be started.
Regards,
Ulrich

>>> Casey Allen Shobe <[email protected]> wrote on 01.08.2018 at 21:43 in message <[email protected]>:
> Here is the corosync.log for the first host in the list at the indicated time. Not sure what it's doing or why - all cluster nodes were up and running the entire time...no fencing events.
>
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: Diff: --- 0.700.4 2
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: Diff: +++ 0.700.5 (null)
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: + /cib: @num_updates=5
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: + /cib/status/node_state[@id='3']/lrm[@id='3']/lrm_resources/lrm_resource[@id='vmware_fence']/lrm_rsc_op[@id='vmware_fence_last_0']: @operation_key=vmware_fence_start_0, @operation=start, @transition-key=42:5084:0:68fc0c5a-8a09-4d53-90d5-c1a237542060, @transition-magic=4:1;42:5084:0:68fc0c5a-8a09-4d53-90d5-c1a237542060, @call-id=42, @rc-code=1, @op-status=4, @exec-time=1510
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: + /cib/status/node_state[@id='3']/lrm[@id='3']/lrm_resources/lrm_resource[@id='vmware_fence']/lrm_rsc_op[@id='vmware_fence_last_failure_0']: @operation_key=vmware_fence_start_0, @operation=start, @transition-key=42:5084:0:68fc0c5a-8a09-4d53-90d5-c1a237542060, @transition-magic=4:1;42:5084:0:68fc0c5a-8a09-4d53-90d5-c1a237542060, @call-id=42, @interval=0, @last-rc-change=1532987187, @exec-time=1510, @op-digest=8653f310a5c96a63ab95a
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=q-gp2-dbpg57-3/crmd/32, version=0.700.5)
> Jul 30 21:46:30 [3883] q-gp2-dbpg57-1 crmd: notice: abort_transition_graph: Transition aborted by vmware_fence_start_0 'modify' on q-gp2-dbpg57-3: Event failed (magic=4:1;42:5084:0:68fc0c5a-8a09-4d53-90d5-c1a237542060, cib=0.700.5, source=match_graph_event:381, 0)
> Jul 30 21:46:30 [3883] q-gp2-dbpg57-1 crmd: info: abort_transition_graph: Transition aborted by vmware_fence_start_0 'modify' on q-gp2-dbpg57-3: Event failed (magic=4:1;42:5084:0:68fc0c5a-8a09-4d53-90d5-c1a237542060, cib=0.700.5, source=match_graph_event:381, 0)
> Jul 30 21:46:30 [3883] q-gp2-dbpg57-1 crmd: notice: run_graph: Transition 5084 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-729.bz2): Complete
> Jul 30 21:46:30 [3883] q-gp2-dbpg57-1 crmd: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/attrd/46)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_online_status_fencing: Node q-gp2-dbpg57-1 is active
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_online_status: Node q-gp2-dbpg57-1 is online
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_online_status_fencing: Node q-gp2-dbpg57-3 is active
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_online_status: Node q-gp2-dbpg57-3 is online
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_online_status_fencing: Node q-gp2-dbpg57-2 is active
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_online_status: Node q-gp2-dbpg57-2 is online
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-master-vip active on q-gp2-dbpg57-1
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-master-vip active on q-gp2-dbpg57-1
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-10-main:0 active in master mode on q-gp2-dbpg57-1
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-10-main:0 active in master mode on q-gp2-dbpg57-1
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-10-main:1 active on q-gp2-dbpg57-3
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-10-main:1 active on q-gp2-dbpg57-3
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: warning: unpack_rsc_op_failure: Processing failed op start for vmware_fence on q-gp2-dbpg57-3: unknown error (1)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: warning: unpack_rsc_op_failure: Processing failed op start for vmware_fence on q-gp2-dbpg57-3: unknown error (1)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: warning: unpack_rsc_op_failure: Processing failed op monitor for vmware_fence on q-gp2-dbpg57-2: unknown error (1)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-10-main:2 active on q-gp2-dbpg57-2
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-10-main:2 active on q-gp2-dbpg57-2
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: native_print: postgresql-master-vip (ocf::heartbeat:IPaddr2): Started q-gp2-dbpg57-1
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: clone_print: Master/Slave Set: postgresql-ha [postgresql-10-main]
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: short_print: Masters: [ q-gp2-dbpg57-1 ]
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: short_print: Slaves: [ q-gp2-dbpg57-2 q-gp2-dbpg57-3 ]
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: native_print: vmware_fence (stonith:fence_vmware_rest): FAILED q-gp2-dbpg57-3
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: get_failcount_full: vmware_fence has failed 5 times on q-gp2-dbpg57-2
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: warning: common_apply_stickiness: Forcing vmware_fence away from q-gp2-dbpg57-2 after 5 failures (max=5)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: get_failcount_full: vmware_fence has failed 1 times on q-gp2-dbpg57-3
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: common_apply_stickiness: vmware_fence can fail 4 more times on q-gp2-dbpg57-3 before being forced off
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: master_color: Promoting postgresql-10-main:0 (Master q-gp2-dbpg57-1)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: master_color: postgresql-ha: Promoted 1 instances of a possible 1 to master
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: RecurringOp: Start recurring monitor (60s) for vmware_fence on q-gp2-dbpg57-3
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: LogActions: Leave postgresql-master-vip (Started q-gp2-dbpg57-1)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: LogActions: Leave postgresql-10-main:0 (Master q-gp2-dbpg57-1)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: LogActions: Leave postgresql-10-main:1 (Slave q-gp2-dbpg57-3)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: LogActions: Leave postgresql-10-main:2 (Slave q-gp2-dbpg57-2)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: notice: LogActions: Recover vmware_fence (Started q-gp2-dbpg57-3)
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: Diff: --- 0.700.5 2
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: Diff: +++ 0.700.6 (null)
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: + /cib: @num_updates=6
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: + /cib/status/node_state[@id='3']/transient_attributes[@id='3']/instance_attributes[@id='status-3']/nvpair[@id='status-3-fail-count-vmware_fence']: @value=INFINITY
> Jul 30 21:46:30 [3883] q-gp2-dbpg57-1 crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: notice: process_pe_message: Calculated Transition 5085: /var/lib/pacemaker/pengine/pe-input-730.bz2
> Jul 30 21:46:30 [3883] q-gp2-dbpg57-1 crmd: notice: abort_transition_graph: Transition aborted by status-3-fail-count-vmware_fence, fail-count-vmware_fence=INFINITY: Transient attribute change (modify cib=0.700.6, source=abort_unless_down:329, path=/cib/status/node_state[@id='3']/transient_attributes[@id='3']/instance_attributes[@id='status-3']/nvpair[@id='status-3-fail-count-vmware_fence'], 0)
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=q-gp2-dbpg57-1/attrd/46, version=0.700.6)
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/attrd/47)
> Jul 30 21:46:30 [3881] q-gp2-dbpg57-1 attrd: info: attrd_cib_callback: Update 46 for fail-count-vmware_fence: OK (0)
> Jul 30 21:46:30 [3881] q-gp2-dbpg57-1 attrd: info: attrd_cib_callback: Update 46 for fail-count-vmware_fence[q-gp2-dbpg57-2]=5: OK (0)
> Jul 30 21:46:30 [3881] q-gp2-dbpg57-1 attrd: info: attrd_cib_callback: Update 46 for fail-count-vmware_fence[q-gp2-dbpg57-3]=INFINITY: OK (0)
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: Diff: --- 0.700.6 2
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: Diff: +++ 0.700.7 (null)
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: + /cib: @num_updates=7
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: + /cib/status/node_state[@id='3']/lrm[@id='3']/lrm_resources/lrm_resource[@id='vmware_fence']/lrm_rsc_op[@id='vmware_fence_last_0']: @operation_key=vmware_fence_stop_0, @operation=stop, @transition-key=4:5085:0:68fc0c5a-8a09-4d53-90d5-c1a237542060, @transition-magic=0:0;4:5085:0:68fc0c5a-8a09-4d53-90d5-c1a237542060, @call-id=43, @rc-code=0, @op-status=0, @last-run=1532987190, @last-rc-change=1532987190, @exec-time=0
> Jul 30 21:46:30 [3883] q-gp2-dbpg57-1 crmd: notice: run_graph: Transition 5085 (Complete=2, Pending=0, Fired=0, Skipped=1, Incomplete=2, Source=/var/lib/pacemaker/pengine/pe-input-730.bz2): Stopped
> Jul 30 21:46:30 [3883] q-gp2-dbpg57-1 crmd: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=q-gp2-dbpg57-3/crmd/33, version=0.700.7)
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: Diff: --- 0.700.7 2
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: Diff: +++ 0.700.8 (null)
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: + /cib: @num_updates=8
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: + /cib/status/node_state[@id='3']/transient_attributes[@id='3']/instance_attributes[@id='status-3']/nvpair[@id='status-3-last-failure-vmware_fence']: @value=1532987190
> Jul 30 21:46:30 [3883] q-gp2-dbpg57-1 crmd: info: abort_transition_graph: Transition aborted by status-3-last-failure-vmware_fence, last-failure-vmware_fence=1532987190: Transient attribute change (modify cib=0.700.8, source=abort_unless_down:329, path=/cib/status/node_state[@id='3']/transient_attributes[@id='3']/instance_attributes[@id='status-3']/nvpair[@id='status-3-last-failure-vmware_fence'], 1)
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=q-gp2-dbpg57-1/attrd/47, version=0.700.8)
> Jul 30 21:46:30 [3881] q-gp2-dbpg57-1 attrd: info: attrd_cib_callback: Update 47 for last-failure-vmware_fence: OK (0)
> Jul 30 21:46:30 [3881] q-gp2-dbpg57-1 attrd: info: attrd_cib_callback: Update 47 for last-failure-vmware_fence[q-gp2-dbpg57-2]=1532448714: OK (0)
> Jul 30 21:46:30 [3881] q-gp2-dbpg57-1 attrd: info: attrd_cib_callback: Update 47 for last-failure-vmware_fence[q-gp2-dbpg57-3]=1532987190: OK (0)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_online_status_fencing: Node q-gp2-dbpg57-1 is active
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_online_status: Node q-gp2-dbpg57-1 is online
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_online_status_fencing: Node q-gp2-dbpg57-3 is active
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_online_status: Node q-gp2-dbpg57-3 is online
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_online_status_fencing: Node q-gp2-dbpg57-2 is active
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_online_status: Node q-gp2-dbpg57-2 is online
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-master-vip active on q-gp2-dbpg57-1
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-master-vip active on q-gp2-dbpg57-1
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-10-main:0 active in master mode on q-gp2-dbpg57-1
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-10-main:0 active in master mode on q-gp2-dbpg57-1
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-10-main:1 active on q-gp2-dbpg57-3
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-10-main:1 active on q-gp2-dbpg57-3
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: warning: unpack_rsc_op_failure: Processing failed op start for vmware_fence on q-gp2-dbpg57-3: unknown error (1)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: warning: unpack_rsc_op_failure: Processing failed op monitor for vmware_fence on q-gp2-dbpg57-2: unknown error (1)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-10-main:2 active on q-gp2-dbpg57-2
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-10-main:2 active on q-gp2-dbpg57-2
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: native_print: postgresql-master-vip (ocf::heartbeat:IPaddr2): Started q-gp2-dbpg57-1
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: clone_print: Master/Slave Set: postgresql-ha [postgresql-10-main]
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: short_print: Masters: [ q-gp2-dbpg57-1 ]
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: short_print: Slaves: [ q-gp2-dbpg57-2 q-gp2-dbpg57-3 ]
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: native_print: vmware_fence (stonith:fence_vmware_rest): Stopped
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: get_failcount_full: vmware_fence has failed 5 times on q-gp2-dbpg57-2
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: warning: common_apply_stickiness: Forcing vmware_fence away from q-gp2-dbpg57-2 after 5 failures (max=5)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: get_failcount_full: vmware_fence has failed INFINITY times on q-gp2-dbpg57-3
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: warning: common_apply_stickiness: Forcing vmware_fence away from q-gp2-dbpg57-3 after 1000000 failures (max=5)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: master_color: Promoting postgresql-10-main:0 (Master q-gp2-dbpg57-1)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: master_color: postgresql-ha: Promoted 1 instances of a possible 1 to master
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: RecurringOp: Start recurring monitor (60s) for vmware_fence on q-gp2-dbpg57-1
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: LogActions: Leave postgresql-master-vip (Started q-gp2-dbpg57-1)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: LogActions: Leave postgresql-10-main:0 (Master q-gp2-dbpg57-1)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: LogActions: Leave postgresql-10-main:1 (Slave q-gp2-dbpg57-3)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: LogActions: Leave postgresql-10-main:2 (Slave q-gp2-dbpg57-2)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: notice: LogActions: Start vmware_fence (q-gp2-dbpg57-1)
> Jul 30 21:46:30 [3883] q-gp2-dbpg57-1 crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: notice: process_pe_message: Calculated Transition 5086: /var/lib/pacemaker/pengine/pe-input-731.bz2
> Jul 30 21:46:30 [3880] q-gp2-dbpg57-1 lrmd: info: log_execute: executing - rsc:vmware_fence action:start call_id:77
> Jul 30 21:46:30 [3879] q-gp2-dbpg57-1 stonith-ng: warning: log_action: fence_vmware_rest[5739] stderr: [ 2018-07-30 21:46:30,895 ERROR: Unable to connect/login to fencing device ]
> Jul 30 21:46:30 [3879] q-gp2-dbpg57-1 stonith-ng: warning: log_action: fence_vmware_rest[5739] stderr: [ ]
> Jul 30 21:46:30 [3879] q-gp2-dbpg57-1 stonith-ng: warning: log_action: fence_vmware_rest[5739] stderr: [ ]
> Jul 30 21:46:30 [3879] q-gp2-dbpg57-1 stonith-ng: info: internal_stonith_action_execute: Attempt 2 to execute fence_vmware_rest (monitor). remaining timeout is 20
>
>
>> On 2018-08-01, at 1:39 PM, Casey Allen Shobe <[email protected]> wrote:
>>
>> Across our clusters, I see the fence agent stop working, with no apparent reason. It looks like shown below. I've found that I can do a `pcs resource cleanup vmware_fence` to cause it to start back up again in a few seconds, but why is this happening and how can I prevent it?
>>
>> vmware_fence (stonith:fence_vmware_rest): Stopped
>>
>> Failed Actions:
>> * vmware_fence_start_0 on q-gp2-dbpg57-1 'unknown error' (1): call=77, status=Error, exitreason='none',
>>     last-rc-change='Mon Jul 30 21:46:30 2018', queued=1ms, exec=1862ms
>> * vmware_fence_start_0 on q-gp2-dbpg57-3 'unknown error' (1): call=42, status=Error, exitreason='none',
>>     last-rc-change='Mon Jul 30 21:46:27 2018', queued=0ms, exec=1510ms
>> * vmware_fence_monitor_60000 on q-gp2-dbpg57-2 'unknown error' (1): call=84, status=Error, exitreason='none',
>>     last-rc-change='Tue Jul 24 16:11:42 2018', queued=0ms, exec=12142ms
>>
>> Thank you,
>> --
>> Casey

_______________________________________________
Users mailing list: [email protected]
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
