>>> <[email protected]> wrote on 26.07.2021 at 18:50 in message <[email protected]>:
> On Mon, 2021-07-26 at 12:25 -0400, Digimer wrote:
>> On 2021-07-26 9:54 a.m., [email protected] wrote:
>> > On Fri, 2021-07-23 at 21:46 -0400, Digimer wrote:
>> > > After a LOT of hassle, I finally got it updated, but OMG it was
>> > > painful.
>> > >
>> > > I degraded the cluster (unsure if needed), set maintenance mode,
>> > > deleted the stonith levels, deleted the stonith devices, recreated
>> > > them with the updated values, recreated the stonith levels, and
>> > > finally disabled maintenance mode.
>> > >
>> > > It should not have been this hard, right? Why the heck would it be
>> > > that pacemaker kept "rolling back" to old configs? I'd delete the
>> > > stonith
>> >
>> > That is bizarre. It sounds like the CIB changes were taking effect
>> > locally, then being rejected by the rest of the cluster, which would
>> > send the "correct" CIB back to the originator.
>> >
>> > The logs of interest would be pacemaker.log from both nodes at the
>> > time you made the first configuration change that failed. I'm
>> > guessing the logs you posted were from after that point?
>>
>> Below are the logs. The change appears to have first been attempted
>> at 'Jul 23 16:22:27', made on an-a02n01; I included logs from a few
>> minutes before in case they are relevant.
>>
>> * an-a02n01:
>>   https://www.alteeve.com/an-repo/files/an-a02n01.pacemaker.log
>> * an-a02n02:
>>   https://www.alteeve.com/an-repo/files/an-a02n02.pacemaker.log
>>
>> Note that the PDUs as originally configured (10.201.2.1/2) were not
>> available, so I had to disable and clean up the stonith resources.
>> They seemed to keep getting re-enabled, so I got into the habit of
>> doing this cycle of disable -> cleanup -> disable -> cleanup before I
>> could reliably get the resources to be 'stopped (disabled)' in 'pcs
>> stonith status'.
>>
>> digimer
>
> The initial change happened here:
>
> Jul 23 16:22:27 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: Diff: --- 0.337.112 2
> Jul 23 16:22:27 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: Diff: +++ 0.338.0 6a24af66df3d9f825cc2681222f8f5d6
> Jul 23 16:22:27 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: + /cib: @epoch=338, @num_updates=0
> Jul 23 16:22:27 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: + /cib/configuration/resources/primitive[@id='apc_snmp_node1_an-pdu03']/instance_attributes[@id='apc_snmp_node1_an-pdu03-instance_attributes']/nvpair[@id='apc_snmp_node1_an-pdu03-instance_attributes-ip']: @value=10.201.2.3
> Jul 23 16:22:27 an-a02n01.alteeve.com pacemaker-based [121628] (cib_replace_notify) info: Replaced: 0.337.112 -> 0.338.0 from an-a02n02
> Jul 23 16:22:27 an-a02n01.alteeve.com pacemaker-based [121628] (cib_process_request) info: Completed cib_replace operation for section configuration: OK (rc=0, origin=an-a02n02/cibadmin/2, version=0.338.0)
>
> origin=an-a02n02/cibadmin/2 means that someone or something ran the
> cibadmin tool on an-a02n02. Presumably this was your interactive pcs
> command.
>
> It was then reverted by:

I wonder about the gap between 338 and 343...

>
> Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: Diff: --- 0.343.3 2
> Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: Diff: +++ 0.344.0 (null)
> Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: + /cib: @epoch=344, @num_updates=0
> Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: ++ /cib/configuration/resources: <primitive class="stonith" id="apc_snmp_node1_an-pdu03" type="fence_apc_snmp"/>
> Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: ++ <instance_attributes id="apc_snmp_node1_an-pdu03-instance_attributes">
> Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: ++ <nvpair id="apc_snmp_node1_an-pdu03-instance_attributes-ip" name="ip" value="10.201.2.1"/>
> Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: ++ <nvpair id="apc_snmp_node1_an-pdu03-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-a02n01"/>
> Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: ++ <nvpair id="apc_snmp_node1_an-pdu03-instance_attributes-pcmk_off_action" name="pcmk_off_action" value="reboot"/>
> Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: ++ <nvpair id="apc_snmp_node1_an-pdu03-instance_attributes-port" name="port" value="5"/>
> Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: ++ </instance_attributes>
> Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: ++ <operations>
> Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: ++ <op id="apc_snmp_node1_an-pdu03-monitor-interval-60" interval="60" name="monitor"/>
> Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: ++ </operations>
> Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: ++ </primitive>
> Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_process_request) info: Completed cib_apply_diff operation for section 'all': OK (rc=0, origin=an-a02n02/cibadmin/2, version=0.344.0)
>
> Notice the origin is still cibadmin on an-a02n02. So this was either
> you, or a script or cron on that node. I don't see any additional
> details on that node.
> --
> Ken Gaillot <[email protected]>
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
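
To find out what happened between the change at 0.338.0 and the revert at 0.344.0, it may help to list every "Completed ... operation" line in pacemaker.log together with its origin and CIB version. Below is a rough sketch of that, not a tested tool. The names parse_changes, missing_epochs and CibChange are made up for illustration, and the regular expression assumes the log line format quoted in this thread, which may differ in other Pacemaker versions. The two sample lines are the "Completed" lines from Ken's excerpt, rejoined onto one line each.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple, Tuple

# Assumed format, taken from the excerpt in this thread:
#   <timestamp> <host> pacemaker-based [pid] (cib_process_request) info:
#   Completed <op> operation for section <section>: OK (rc=0,
#   origin=<node>/<client>/<call>, version=<admin_epoch>.<epoch>.<num_updates>)
COMPLETED_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) .*?"
    r"Completed (?P<op>\w+) operation for section (?P<section>\S+): .*?"
    r"origin=(?P<origin>[^,]+), version=(?P<version>\d+\.\d+\.\d+)\)"
)


class CibChange(NamedTuple):
    timestamp: str
    operation: str
    origin: str
    version: Tuple[int, ...]  # (admin_epoch, epoch, num_updates)


def parse_changes(lines: Iterable[str]) -> Iterator[CibChange]:
    """Yield one CibChange per 'Completed ... operation' log line."""
    for raw in lines:
        # Mail archives sometimes turn '-' into U+2011 and prefix '> ' quotes.
        line = raw.replace("\u2011", "-").lstrip("> ")
        match = COMPLETED_RE.match(line)
        if match is None:
            continue
        version = tuple(int(part) for part in match.group("version").split("."))
        yield CibChange(match.group("ts"), match.group("op"),
                        match.group("origin"), version)


def missing_epochs(changes: Iterable[CibChange]) -> List[int]:
    """Return epochs between the lowest and highest seen that never appear."""
    seen = {change.version[1] for change in changes}
    if not seen:
        return []
    return [e for e in range(min(seen), max(seen) + 1) if e not in seen]


SAMPLE = [
    "Jul 23 16:22:27 an-a02n01.alteeve.com pacemaker-based [121628] "
    "(cib_process_request) info: Completed cib_replace operation for section "
    "configuration: OK (rc=0, origin=an-a02n02/cibadmin/2, version=0.338.0)",
    "Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] "
    "(cib_process_request) info: Completed cib_apply_diff operation for section "
    "'all': OK (rc=0, origin=an-a02n02/cibadmin/2, version=0.344.0)",
]

if __name__ == "__main__":
    changes = list(parse_changes(SAMPLE))
    for change in changes:
        print(change.timestamp, change.operation, change.origin, change.version)
    print("epochs not covered:", missing_epochs(changes))
```

Fed only the two quoted lines, this reports epochs 339 through 343 as not covered, which is exactly the gap I am wondering about. Run against the full pacemaker.log from both nodes (for example by passing an open file object to parse_changes), the origin column should show which node and client made each of the intermediate changes.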
