Hi! I think "Processing failed op start for vmware_fence on q-gp2-dbpg57-3: unknown error (1)" is the reason. You should investigate why it could not be started.
Regards,
Ulrich

>>> Casey Allen Shobe <[email protected]> wrote on 01.08.2018 at 21:43 in message <[email protected]>:
> Here is the corosync.log for the first host in the list at the indicated time. Not sure what it's doing or why - all cluster nodes were up and running the entire time...no fencing events.
>
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: Diff: --- 0.700.4 2
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: Diff: +++ 0.700.5 (null)
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: + /cib: @num_updates=5
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: + /cib/status/node_state[@id='3']/lrm[@id='3']/lrm_resources/lrm_resource[@id='vmware_fence']/lrm_rsc_op[@id='vmware_fence_last_0']: @operation_key=vmware_fence_start_0, @operation=start, @transition-key=42:5084:0:68fc0c5a-8a09-4d53-90d5-c1a237542060, @transition-magic=4:1;42:5084:0:68fc0c5a-8a09-4d53-90d5-c1a237542060, @call-id=42, @rc-code=1, @op-status=4, @exec-time=1510
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: + /cib/status/node_state[@id='3']/lrm[@id='3']/lrm_resources/lrm_resource[@id='vmware_fence']/lrm_rsc_op[@id='vmware_fence_last_failure_0']: @operation_key=vmware_fence_start_0, @operation=start, @transition-key=42:5084:0:68fc0c5a-8a09-4d53-90d5-c1a237542060, @transition-magic=4:1;42:5084:0:68fc0c5a-8a09-4d53-90d5-c1a237542060, @call-id=42, @interval=0, @last-rc-change=1532987187, @exec-time=1510, @op-digest=8653f310a5c96a63ab95a
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=q-gp2-dbpg57-3/crmd/32, version=0.700.5)
> Jul 30 21:46:30 [3883] q-gp2-dbpg57-1 crmd: notice: abort_transition_graph: Transition aborted by vmware_fence_start_0 'modify' on q-gp2-dbpg57-3: Event failed (magic=4:1;42:5084:0:68fc0c5a-8a09-4d53-90d5-c1a237542060, cib=0.700.5, source=match_graph_event:381, 0)
> Jul 30 21:46:30 [3883] q-gp2-dbpg57-1 crmd: info: abort_transition_graph: Transition aborted by vmware_fence_start_0 'modify' on q-gp2-dbpg57-3: Event failed (magic=4:1;42:5084:0:68fc0c5a-8a09-4d53-90d5-c1a237542060, cib=0.700.5, source=match_graph_event:381, 0)
> Jul 30 21:46:30 [3883] q-gp2-dbpg57-1 crmd: notice: run_graph: Transition 5084 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-729.bz2): Complete
> Jul 30 21:46:30 [3883] q-gp2-dbpg57-1 crmd: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/attrd/46)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_online_status_fencing: Node q-gp2-dbpg57-1 is active
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_online_status: Node q-gp2-dbpg57-1 is online
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_online_status_fencing: Node q-gp2-dbpg57-3 is active
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_online_status: Node q-gp2-dbpg57-3 is online
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_online_status_fencing: Node q-gp2-dbpg57-2 is active
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_online_status: Node q-gp2-dbpg57-2 is online
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-master-vip active on q-gp2-dbpg57-1
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-master-vip active on q-gp2-dbpg57-1
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-10-main:0 active in master mode on q-gp2-dbpg57-1
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-10-main:0 active in master mode on q-gp2-dbpg57-1
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-10-main:1 active on q-gp2-dbpg57-3
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-10-main:1 active on q-gp2-dbpg57-3
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: warning: unpack_rsc_op_failure: Processing failed op start for vmware_fence on q-gp2-dbpg57-3: unknown error (1)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: warning: unpack_rsc_op_failure: Processing failed op start for vmware_fence on q-gp2-dbpg57-3: unknown error (1)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: warning: unpack_rsc_op_failure: Processing failed op monitor for vmware_fence on q-gp2-dbpg57-2: unknown error (1)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-10-main:2 active on q-gp2-dbpg57-2
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-10-main:2 active on q-gp2-dbpg57-2
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: native_print: postgresql-master-vip (ocf::heartbeat:IPaddr2): Started q-gp2-dbpg57-1
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: clone_print: Master/Slave Set: postgresql-ha [postgresql-10-main]
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: short_print: Masters: [ q-gp2-dbpg57-1 ]
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: short_print: Slaves: [ q-gp2-dbpg57-2 q-gp2-dbpg57-3 ]
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: native_print: vmware_fence (stonith:fence_vmware_rest): FAILED q-gp2-dbpg57-3
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: get_failcount_full: vmware_fence has failed 5 times on q-gp2-dbpg57-2
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: warning: common_apply_stickiness: Forcing vmware_fence away from q-gp2-dbpg57-2 after 5 failures (max=5)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: get_failcount_full: vmware_fence has failed 1 times on q-gp2-dbpg57-3
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: common_apply_stickiness: vmware_fence can fail 4 more times on q-gp2-dbpg57-3 before being forced off
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: master_color: Promoting postgresql-10-main:0 (Master q-gp2-dbpg57-1)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: master_color: postgresql-ha: Promoted 1 instances of a possible 1 to master
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: RecurringOp: Start recurring monitor (60s) for vmware_fence on q-gp2-dbpg57-3
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: LogActions: Leave postgresql-master-vip (Started q-gp2-dbpg57-1)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: LogActions: Leave postgresql-10-main:0 (Master q-gp2-dbpg57-1)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: LogActions: Leave postgresql-10-main:1 (Slave q-gp2-dbpg57-3)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: LogActions: Leave postgresql-10-main:2 (Slave q-gp2-dbpg57-2)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: notice: LogActions: Recover vmware_fence (Started q-gp2-dbpg57-3)
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: Diff: --- 0.700.5 2
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: Diff: +++ 0.700.6 (null)
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: + /cib: @num_updates=6
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: + /cib/status/node_state[@id='3']/transient_attributes[@id='3']/instance_attributes[@id='status-3']/nvpair[@id='status-3-fail-count-vmware_fence']: @value=INFINITY
> Jul 30 21:46:30 [3883] q-gp2-dbpg57-1 crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: notice: process_pe_message: Calculated Transition 5085: /var/lib/pacemaker/pengine/pe-input-730.bz2
> Jul 30 21:46:30 [3883] q-gp2-dbpg57-1 crmd: notice: abort_transition_graph: Transition aborted by status-3-fail-count-vmware_fence, fail-count-vmware_fence=INFINITY: Transient attribute change (modify cib=0.700.6, source=abort_unless_down:329, path=/cib/status/node_state[@id='3']/transient_attributes[@id='3']/instance_attributes[@id='status-3']/nvpair[@id='status-3-fail-count-vmware_fence'], 0)
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=q-gp2-dbpg57-1/attrd/46, version=0.700.6)
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/attrd/47)
> Jul 30 21:46:30 [3881] q-gp2-dbpg57-1 attrd: info: attrd_cib_callback: Update 46 for fail-count-vmware_fence: OK (0)
> Jul 30 21:46:30 [3881] q-gp2-dbpg57-1 attrd: info: attrd_cib_callback: Update 46 for fail-count-vmware_fence[q-gp2-dbpg57-2]=5: OK (0)
> Jul 30 21:46:30 [3881] q-gp2-dbpg57-1 attrd: info: attrd_cib_callback: Update 46 for fail-count-vmware_fence[q-gp2-dbpg57-3]=INFINITY: OK (0)
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: Diff: --- 0.700.6 2
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: Diff: +++ 0.700.7 (null)
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: + /cib: @num_updates=7
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: + /cib/status/node_state[@id='3']/lrm[@id='3']/lrm_resources/lrm_resource[@id='vmware_fence']/lrm_rsc_op[@id='vmware_fence_last_0']: @operation_key=vmware_fence_stop_0, @operation=stop, @transition-key=4:5085:0:68fc0c5a-8a09-4d53-90d5-c1a237542060, @transition-magic=0:0;4:5085:0:68fc0c5a-8a09-4d53-90d5-c1a237542060, @call-id=43, @rc-code=0, @op-status=0, @last-run=1532987190, @last-rc-change=1532987190, @exec-time=0
> Jul 30 21:46:30 [3883] q-gp2-dbpg57-1 crmd: notice: run_graph: Transition 5085 (Complete=2, Pending=0, Fired=0, Skipped=1, Incomplete=2, Source=/var/lib/pacemaker/pengine/pe-input-730.bz2): Stopped
> Jul 30 21:46:30 [3883] q-gp2-dbpg57-1 crmd: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=q-gp2-dbpg57-3/crmd/33, version=0.700.7)
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: Diff: --- 0.700.7 2
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: Diff: +++ 0.700.8 (null)
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: + /cib: @num_updates=8
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_perform_op: + /cib/status/node_state[@id='3']/transient_attributes[@id='3']/instance_attributes[@id='status-3']/nvpair[@id='status-3-last-failure-vmware_fence']: @value=1532987190
> Jul 30 21:46:30 [3883] q-gp2-dbpg57-1 crmd: info: abort_transition_graph: Transition aborted by status-3-last-failure-vmware_fence, last-failure-vmware_fence=1532987190: Transient attribute change (modify cib=0.700.8, source=abort_unless_down:329, path=/cib/status/node_state[@id='3']/transient_attributes[@id='3']/instance_attributes[@id='status-3']/nvpair[@id='status-3-last-failure-vmware_fence'], 1)
> Jul 30 21:46:30 [3878] q-gp2-dbpg57-1 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=q-gp2-dbpg57-1/attrd/47, version=0.700.8)
> Jul 30 21:46:30 [3881] q-gp2-dbpg57-1 attrd: info: attrd_cib_callback: Update 47 for last-failure-vmware_fence: OK (0)
> Jul 30 21:46:30 [3881] q-gp2-dbpg57-1 attrd: info: attrd_cib_callback: Update 47 for last-failure-vmware_fence[q-gp2-dbpg57-2]=1532448714: OK (0)
> Jul 30 21:46:30 [3881] q-gp2-dbpg57-1 attrd: info: attrd_cib_callback: Update 47 for last-failure-vmware_fence[q-gp2-dbpg57-3]=1532987190: OK (0)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_online_status_fencing: Node q-gp2-dbpg57-1 is active
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_online_status: Node q-gp2-dbpg57-1 is online
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_online_status_fencing: Node q-gp2-dbpg57-3 is active
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_online_status: Node q-gp2-dbpg57-3 is online
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_online_status_fencing: Node q-gp2-dbpg57-2 is active
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_online_status: Node q-gp2-dbpg57-2 is online
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-master-vip active on q-gp2-dbpg57-1
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-master-vip active on q-gp2-dbpg57-1
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-10-main:0 active in master mode on q-gp2-dbpg57-1
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-10-main:0 active in master mode on q-gp2-dbpg57-1
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-10-main:1 active on q-gp2-dbpg57-3
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-10-main:1 active on q-gp2-dbpg57-3
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: warning: unpack_rsc_op_failure: Processing failed op start for vmware_fence on q-gp2-dbpg57-3: unknown error (1)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: warning: unpack_rsc_op_failure: Processing failed op monitor for vmware_fence on q-gp2-dbpg57-2: unknown error (1)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-10-main:2 active on q-gp2-dbpg57-2
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: determine_op_status: Operation monitor found resource postgresql-10-main:2 active on q-gp2-dbpg57-2
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: native_print: postgresql-master-vip (ocf::heartbeat:IPaddr2): Started q-gp2-dbpg57-1
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: clone_print: Master/Slave Set: postgresql-ha [postgresql-10-main]
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: short_print: Masters: [ q-gp2-dbpg57-1 ]
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: short_print: Slaves: [ q-gp2-dbpg57-2 q-gp2-dbpg57-3 ]
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: native_print: vmware_fence (stonith:fence_vmware_rest): Stopped
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: get_failcount_full: vmware_fence has failed 5 times on q-gp2-dbpg57-2
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: warning: common_apply_stickiness: Forcing vmware_fence away from q-gp2-dbpg57-2 after 5 failures (max=5)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: get_failcount_full: vmware_fence has failed INFINITY times on q-gp2-dbpg57-3
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: warning: common_apply_stickiness: Forcing vmware_fence away from q-gp2-dbpg57-3 after 1000000 failures (max=5)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: master_color: Promoting postgresql-10-main:0 (Master q-gp2-dbpg57-1)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: master_color: postgresql-ha: Promoted 1 instances of a possible 1 to master
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: RecurringOp: Start recurring monitor (60s) for vmware_fence on q-gp2-dbpg57-1
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: LogActions: Leave postgresql-master-vip (Started q-gp2-dbpg57-1)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: LogActions: Leave postgresql-10-main:0 (Master q-gp2-dbpg57-1)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: LogActions: Leave postgresql-10-main:1 (Slave q-gp2-dbpg57-3)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: info: LogActions: Leave postgresql-10-main:2 (Slave q-gp2-dbpg57-2)
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: notice: LogActions: Start vmware_fence (q-gp2-dbpg57-1)
> Jul 30 21:46:30 [3883] q-gp2-dbpg57-1 crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Jul 30 21:46:30 [3882] q-gp2-dbpg57-1 pengine: notice: process_pe_message: Calculated Transition 5086: /var/lib/pacemaker/pengine/pe-input-731.bz2
> Jul 30 21:46:30 [3880] q-gp2-dbpg57-1 lrmd: info: log_execute: executing - rsc:vmware_fence action:start call_id:77
> Jul 30 21:46:30 [3879] q-gp2-dbpg57-1 stonith-ng: warning: log_action: fence_vmware_rest[5739] stderr: [ 2018-07-30 21:46:30,895 ERROR: Unable to connect/login to fencing device ]
> Jul 30 21:46:30 [3879] q-gp2-dbpg57-1 stonith-ng: warning: log_action: fence_vmware_rest[5739] stderr: [ ]
> Jul 30 21:46:30 [3879] q-gp2-dbpg57-1 stonith-ng: warning: log_action: fence_vmware_rest[5739] stderr: [ ]
> Jul 30 21:46:30 [3879] q-gp2-dbpg57-1 stonith-ng: info: internal_stonith_action_execute: Attempt 2 to execute fence_vmware_rest (monitor). remaining timeout is 20
>
>
>> On 2018-08-01, at 1:39 PM, Casey Allen Shobe <[email protected]> wrote:
>>
>> Across our clusters, I see the fence agent stop working, with no apparent reason. It looks like shown below. I've found that I can do a `pcs resource cleanup vmware_fence` to cause it to start back up again in a few seconds, but why is this happening and how can I prevent it?
>>
>> vmware_fence (stonith:fence_vmware_rest): Stopped
>>
>> Failed Actions:
>> * vmware_fence_start_0 on q-gp2-dbpg57-1 'unknown error' (1): call=77, status=Error, exitreason='none',
>>     last-rc-change='Mon Jul 30 21:46:30 2018', queued=1ms, exec=1862ms
>> * vmware_fence_start_0 on q-gp2-dbpg57-3 'unknown error' (1): call=42, status=Error, exitreason='none',
>>     last-rc-change='Mon Jul 30 21:46:27 2018', queued=0ms, exec=1510ms
>> * vmware_fence_monitor_60000 on q-gp2-dbpg57-2 'unknown error' (1): call=84, status=Error, exitreason='none',
>>     last-rc-change='Tue Jul 24 16:11:42 2018', queued=0ms, exec=12142ms
>>
>> Thank you,
>> --
>> Casey

_______________________________________________
Users mailing list: [email protected]
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
