On Thu, 2014-01-16 at 14:49 +1100, Andrew Beekhof wrote:
>
> What crm_mon are you looking at?
> I see stuff like:
>
> virt-fencing (stonith:fence_xvm): Started rhos4-node3
> Resource Group: mysql-group
> mysql-vip (ocf::heartbeat:IPaddr2): Started rhos4-node3
> mysql-fs (ocf::heartbeat:Filesystem): Started rhos4-node3
> mysql-db (ocf::heartbeat:mysql): Started rhos4-node3
Yes, you are right. I couldn't see the forest for the trees.
I was initially optimistic that crm_mon would be more truthful than
crm_resource, but it turns out it is not.
Take, for example, these commands to add location constraints and start a
resource (which has already been defined at this point):
[21/Jan/2014:13:46:40] cibadmin -o constraints -C -X '<rsc_location
id="res1-primary" node="node5" rsc="res1" score="20"/>'
[21/Jan/2014:13:46:41] cibadmin -o constraints -C -X '<rsc_location
id="res1-secondary" node="node6" rsc="res1" score="10"/>'
[21/Jan/2014:13:46:42] crm_resource -r 'res1' -p target-role -m -v 'Started'
and then these repeated calls to "crm_mon -1" on node5:
[21/Jan/2014:13:46:42] crm_mon -1
Last updated: Tue Jan 21 13:46:42 2014
Last change: Tue Jan 21 13:46:42 2014 via crm_resource on node5
Stack: openais
Current DC: node5 - partition with quorum
Version: 1.1.10-14.el6_5.1-368c726
2 Nodes configured
2 Resources configured
Online: [ node5 node6 ]
st-fencing (stonith:fence_product): Started node5
res1 (ocf::product:Target): Started node6
[21/Jan/2014:13:46:42] crm_mon -1
Last updated: Tue Jan 21 13:46:42 2014
Last change: Tue Jan 21 13:46:42 2014 via crm_resource on node5
Stack: openais
Current DC: node5 - partition with quorum
Version: 1.1.10-14.el6_5.1-368c726
2 Nodes configured
2 Resources configured
Online: [ node5 node6 ]
st-fencing (stonith:fence_product): Started node5
res1 (ocf::product:Target): Started node6
[21/Jan/2014:13:46:49] crm_mon -1 -r
Last updated: Tue Jan 21 13:46:49 2014
Last change: Tue Jan 21 13:46:42 2014 via crm_resource on node5
Stack: openais
Current DC: node5 - partition with quorum
Version: 1.1.10-14.el6_5.1-368c726
2 Nodes configured
2 Resources configured
Online: [ node5 node6 ]
Full list of resources:
st-fencing (stonith:fence_product): Started node5
res1 (ocf::product:Target): Started node5
The first two outputs are incorrect, showing the resource started on node6
when it was actually started on node5. Only 7 seconds later does crm_mon
report it correctly. The logs on node5 and node6 bear this out: the resource
was only ever started on node5 and never on node6.
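As a stopgap until the reporting settles, one can poll rather than trust a
single snapshot. A minimal sketch (the wait_for_output helper is my own
invention, not a Pacemaker tool; I'm assuming crm_resource's --locate option
is available on this 1.1.10 build):

```shell
#!/bin/sh
# wait_for_output CMD EXPECTED [TRIES]
# Re-run CMD (a simple command string, split on whitespace) up to TRIES
# times, one second apart, until its stdout contains EXPECTED.
# Returns 0 on a match, 1 if it never appeared.
wait_for_output() {
    cmd=$1
    expected=$2
    tries=${3:-10}
    i=0
    while [ "$i" -lt "$tries" ]; do
        if $cmd 2>/dev/null | grep -q "$expected"; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Intended use against the cluster above: keep asking where res1 is
# running until the answer names the node the constraints prefer.
#   wait_for_output "crm_resource -r res1 --locate" "node5" 30
```

This only papers over the stale view, of course; the underlying question of
why crm_mon lags the actual resource placement remains.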
Here's the log for node5:
Jan 21 13:42:00 node5 pacemaker: Starting Pacemaker Cluster Manager
Jan 21 13:42:00 node5 pacemakerd[8684]: notice: main: Starting Pacemaker
1.1.10-14.el6_5.1 (Build: 368c726): generated-manpages agent-manpages
ascii-docs publican-docs ncurses libqb-logging libqb-ipc nagios
corosync-plugin cman
Jan 21 13:42:00 node5 pacemakerd[8684]: notice: get_node_name: Defaulting to
uname -n for the local classic openais (with plugin) node name
Jan 21 13:42:00 node5 stonith-ng[8691]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: classic openais (with plugin)
Jan 21 13:42:00 node5 cib[8690]: notice: main: Using new config location:
/var/lib/pacemaker/cib
Jan 21 13:42:00 node5 cib[8690]: warning: retrieveCib: Cluster configuration
not found: /var/lib/pacemaker/cib/cib.xml
Jan 21 13:42:00 node5 cib[8690]: warning: readCibXmlFile: Primary
configuration corrupt or unusable, trying backups
Jan 21 13:42:00 node5 cib[8690]: warning: readCibXmlFile: Continuing with an
empty configuration.
Jan 21 13:42:00 node5 attrd[8693]: notice: crm_cluster_connect: Connecting to
cluster infrastructure: classic openais (with plugin)
Jan 21 13:42:00 node5 crmd[8695]: notice: main: CRM Git Version: 368c726
Jan 21 13:42:00 node5 attrd[8693]: notice: get_node_name: Defaulting to uname
-n for the local classic openais (with plugin) node name
Jan 21 13:42:00 node5 corosync[8646]: [pcmk ] info: pcmk_ipc: Recorded
connection 0x1cbc3c0 for attrd/0
Jan 21 13:42:00 node5 stonith-ng[8691]: notice: get_node_name: Defaulting to
uname -n for the local classic openais (with plugin) node name
Jan 21 13:42:00 node5 corosync[8646]: [pcmk ] info: pcmk_ipc: Recorded
connection 0x1cb8040 for stonith-ng/0
Jan 21 13:42:00 node5 attrd[8693]: notice: get_node_name: Defaulting to uname
-n for the local classic openais (with plugin) node name
Jan 21 13:42:00 node5 stonith-ng[8691]: notice: get_node_name: Defaulting to
uname -n for the local classic openais (with plugin) node name
Jan 21 13:42:00 node5 attrd[8693]: notice: main: Starting mainloop...
Jan 21 13:42:00 node5 cib[8690]: notice: crm_cluster_connect: Connecting to
cluster infrastructure: classic openais (with plugin)
Jan 21 13:42:00 node5 cib[8690]: notice: get_node_name: Defaulting to uname
-n for the local classic openais (with plugin) node name
Jan 21 13:42:00 node5 corosync[8646]: [pcmk ] info: pcmk_ipc: Recorded
connection 0x1cc0740 for cib/0
Jan 21 13:42:00 node5 corosync[8646]: [pcmk ] info: pcmk_ipc: Sending
membership update 4 to cib
Jan 21 13:42:00 node5 cib[8690]: notice: get_node_name: Defaulting to uname
-n for the local classic openais (with plugin) node name
Jan 21 13:42:00 node5 cib[8690]: notice: crm_update_peer_state:
plugin_handle_membership: Node node5[307367434] - state is now member (was
(null))
Jan 21 13:42:01 node5 crmd[8695]: notice: crm_cluster_connect: Connecting to
cluster infrastructure: classic openais (with plugin)
Jan 21 13:42:01 node5 crmd[8695]: notice: get_node_name: Defaulting to uname
-n for the local classic openais (with plugin) node name
Jan 21 13:42:01 node5 corosync[8646]: [pcmk ] info: pcmk_ipc: Recorded
connection 0x1cc4c00 for crmd/0
Jan 21 13:42:01 node5 corosync[8646]: [pcmk ] info: pcmk_ipc: Sending
membership update 4 to crmd
Jan 21 13:42:01 node5 crmd[8695]: notice: get_node_name: Defaulting to uname
-n for the local classic openais (with plugin) node name
Jan 21 13:42:01 node5 stonith-ng[8691]: notice: setup_cib: Watching for
stonith topology changes
Jan 21 13:42:01 node5 crmd[8695]: notice: crm_update_peer_state:
plugin_handle_membership: Node node5[307367434] - state is now member (was
(null))
Jan 21 13:42:01 node5 crmd[8695]: notice: do_started: The local CRM is
operational
Jan 21 13:42:01 node5 crmd[8695]: notice: do_state_transition: State
transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL
origin=do_started ]
Jan 21 13:42:22 node5 crmd[8695]: warning: do_log: FSA: Input I_DC_TIMEOUT
from crm_timer_popped() received in state S_PENDING
Jan 21 13:42:22 node5 crmd[8695]: notice: do_state_transition: State
transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=do_election_check ]
Jan 21 13:42:22 node5 cib[8690]: notice: cib_perform_op: Configuration
ordering change detected
Jan 21 13:42:22 node5 stonith-ng[8691]: notice: log_cib_diff:
cib_process_diff: Local-only Change: 0.1.1
Jan 21 13:42:22 node5 stonith-ng[8691]: notice: cib_process_diff: -- <cib
num_updates="1"/>
Jan 21 13:42:22 node5 stonith-ng[8691]: notice: cib_process_diff: ++ <cib
epoch="1" num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2"
cib-last-written="Tue Jan 21 13:42:01 2014" update-origin="node5"
update-client="crmd"/>
Jan 21 13:42:22 node5 stonith-ng[8691]: notice: cib_process_diff: Diff 0.1.1
-> 0.1.1 from local not applied to 0.1.1: + and - versions in the diff did not
change
Jan 21 13:42:22 node5 stonith-ng[8691]: notice: update_cib_cache_cb:
[cib_diff_notify] Patch aborted: Application of an update diff failed (-206)
Jan 21 13:42:22 node5 stonith-ng[8691]: notice: update_cib_cache_cb:
[cib_diff_notify] Patch aborted: Application of an update diff failed (-206)
Jan 21 13:42:22 node5 cib[8690]: notice: cib:diff: Diff: --- 0.1.2
Jan 21 13:42:22 node5 cib[8690]: notice: cib:diff: Diff: +++ 0.2.1
5ea115fd649f5fa3419b7f1c9abf8992
Jan 21 13:42:22 node5 cib[8690]: notice: cib:diff: -- <cib admin_epoch="0"
epoch="1" num_updates="2"/>
Jan 21 13:42:22 node5 cib[8690]: notice: cib:diff: ++
<cluster_property_set id="cib-bootstrap-options">
Jan 21 13:42:22 node5 cib[8690]: notice: cib:diff: ++ <nvpair
id="cib-bootstrap-options-dc-version" name="dc-version"
value="1.1.10-14.el6_5.1-368c726"/>
Jan 21 13:42:22 node5 cib[8690]: notice: cib:diff: ++
</cluster_property_set>
Jan 21 13:42:22 node5 stonith-ng[8691]: notice: update_cib_cache_cb:
[cib_diff_notify] Patch aborted: Application of an update diff failed (-206)
Jan 21 13:42:22 node5 cib[8690]: notice: log_cib_diff: cib:diff: Local-only
Change: 0.3.1
Jan 21 13:42:22 node5 cib[8690]: notice: cib:diff: -- <cib admin_epoch="0"
epoch="2" num_updates="1"/>
Jan 21 13:42:22 node5 cib[8690]: notice: cib:diff: ++ <nvpair
id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure"
value="classic openais (with plugin)"/>
Jan 21 13:42:22 node5 cib[8690]: notice: log_cib_diff: cib:diff: Local-only
Change: 0.4.1
Jan 21 13:42:22 node5 cib[8690]: notice: cib:diff: -- <cib admin_epoch="0"
epoch="3" num_updates="1"/>
Jan 21 13:42:22 node5 cib[8690]: notice: cib:diff: ++ <nvpair
id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes"
value="2"/>
Jan 21 13:42:22 node5 stonith-ng[8691]: notice: update_cib_cache_cb:
[cib_diff_notify] Patch aborted: Application of an update diff failed,
requesting a full refresh (-207)
Jan 21 13:42:22 node5 attrd[8693]: notice: attrd_local_callback: Sending full
refresh (origin=crmd)
Jan 21 13:42:22 node5 pengine[8694]: warning: cluster_status: We do not have
quorum - fencing and resource management disabled
Jan 21 13:42:22 node5 pengine[8694]: error: unpack_resources: Resource
start-up disabled since no STONITH resources have been defined
Jan 21 13:42:22 node5 pengine[8694]: error: unpack_resources: Either
configure some or disable STONITH with the stonith-enabled option
Jan 21 13:42:22 node5 pengine[8694]: error: unpack_resources: NOTE: Clusters
with shared data need STONITH to ensure data integrity
Jan 21 13:42:22 node5 pengine[8694]: notice: stage6: Delaying fencing
operations until there are resources to manage
Jan 21 13:42:22 node5 pengine[8694]: notice: process_pe_message: Calculated
Transition 0: /var/lib/pacemaker/pengine/pe-input-71.bz2
Jan 21 13:42:22 node5 pengine[8694]: notice: process_pe_message:
Configuration ERRORs found during PE processing. Please run "crm_verify -L" to
identify issues.
Jan 21 13:42:22 node5 crmd[8695]: notice: te_rsc_command: Initiating action
2: probe_complete probe_complete on node5 (local) - no waiting
Jan 21 13:42:22 node5 crmd[8695]: notice: run_graph: Transition 0
(Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-71.bz2): Complete
Jan 21 13:42:22 node5 crmd[8695]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan 21 13:42:22 node5 attrd[8693]: notice: attrd_trigger_update: Sending
flush op to all hosts for: probe_complete (true)
Jan 21 13:42:22 node5 stonith-ng[8691]: notice: update_cib_cache_cb:
[cib_diff_notify] Patch aborted: Application of an update diff failed (-206)
Jan 21 13:42:22 node5 attrd[8693]: notice: attrd_perform_update: Sent update
4: probe_complete=true
Jan 21 13:42:36 node5 cibadmin[8745]: notice: crm_log_args: Invoked: cibadmin
--delete --obj_type resources -X <primitive class="stonith" id="st-fencing"
type="fence_product"/>
Jan 21 13:42:36 node5 cibadmin[8746]: notice: crm_log_args: Invoked: cibadmin
--create --obj_type resources -X <primitive class="stonith" id="st-fencing"
type="fence_product"/>
Jan 21 13:42:36 node5 crmd[8695]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
origin=abort_transition_graph ]
Jan 21 13:42:36 node5 cib[8690]: notice: cib:diff: Diff: --- 0.4.4
Jan 21 13:42:36 node5 cib[8690]: notice: cib:diff: Diff: +++ 0.5.1
e1637d693af73b0a1ff6beeabaa16dbe
Jan 21 13:42:36 node5 cib[8690]: notice: cib:diff: -- <cib admin_epoch="0"
epoch="4" num_updates="4"/>
Jan 21 13:42:36 node5 cib[8690]: notice: cib:diff: ++ <primitive
class="stonith" id="st-fencing" type="fence_product"/>
Jan 21 13:42:36 node5 pengine[8694]: warning: cluster_status: We do not have
quorum - fencing and resource management disabled
Jan 21 13:42:36 node5 pengine[8694]: notice: LogActions: Start
st-fencing#011(node5 - blocked)
Jan 21 13:42:36 node5 pengine[8694]: notice: process_pe_message: Calculated
Transition 1: /var/lib/pacemaker/pengine/pe-input-72.bz2
Jan 21 13:42:36 node5 crmd[8695]: notice: te_rsc_command: Initiating action
4: monitor st-fencing_monitor_0 on node5 (local)
Jan 21 13:42:37 node5 cibadmin[8748]: notice: crm_log_args: Invoked: cibadmin
--modify --allow-create -o crm_config -X <cluster_property_set
id="cib-bootstrap-options">#012<nvpair
id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy"
value="ignore"/>#012<nvpair id="cib-bootstrap-options-symmetric-cluster"
name="symmetric-cluster" value="true"/>#012<nvpair
id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure"
value="openais"/>#012<nvpair id="cib-bootstrap-options-stonith-enabled"
name="stonith-enabled" value="true"
Jan 21 13:42:37 node5 cib[8690]: notice: log_cib_diff: cib:diff: Local-only
Change: 0.6.1
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: -- <nvpair
value="classic openais (with plugin)"
id="cib-bootstrap-options-cluster-infrastructure"/>
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: ++ <nvpair
id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure"
value="openais"/>
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: ++ <nvpair
id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy"
value="ignore"/>
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: ++ <nvpair
id="cib-bootstrap-options-symmetric-cluster" name="symmetric-cluster"
value="true"/>
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: ++ <nvpair
id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="true"/>
Jan 21 13:42:37 node5 cib[8690]: notice: log_cib_diff: cib:diff: Local-only
Change: 0.7.1
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: -- <cib admin_epoch="0"
epoch="6" num_updates="1"/>
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: ++ <rsc_defaults>
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: ++ <meta_attributes
id="rsc_defaults-options">
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: ++ <nvpair
id="rsc_defaults-options-resource-stickiness" name="resource-stickiness"
value="1000"/>
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: ++ </meta_attributes>
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: ++ </rsc_defaults>
Jan 21 13:42:37 node5 cib[8690]: notice: log_cib_diff: cib:diff: Local-only
Change: 0.8.1
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: -- <cib admin_epoch="0"
epoch="7" num_updates="1"/>
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: ++ <nvpair
id="rsc_defaults-options-failure-timeout" name="failure-timeout" value="20m"/>
Jan 21 13:42:37 node5 cib[8690]: notice: log_cib_diff: cib:diff: Local-only
Change: 0.9.1
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: -- <cib admin_epoch="0"
epoch="8" num_updates="1"/>
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: ++ <nvpair
id="rsc_defaults-options-migration-threshold" name="migration-threshold"
value="3"/>
Jan 21 13:42:37 node5 stonith-ng[8691]: notice: stonith_device_register:
Added 'st-fencing' to the device list (1 active devices)
Jan 21 13:42:39 node5 crmd[8695]: notice: te_rsc_command: Initiating action
3: probe_complete probe_complete on node5 (local) - no waiting
Jan 21 13:42:39 node5 crmd[8695]: notice: run_graph: Transition 1
(Complete=2, Pending=0, Fired=0, Skipped=1, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-72.bz2): Stopped
Jan 21 13:42:39 node5 pengine[8694]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jan 21 13:42:39 node5 pengine[8694]: notice: LogActions: Start
st-fencing#011(node5)
Jan 21 13:42:39 node5 pengine[8694]: notice: process_pe_message: Calculated
Transition 2: /var/lib/pacemaker/pengine/pe-input-73.bz2
Jan 21 13:42:39 node5 crmd[8695]: notice: te_rsc_command: Initiating action
4: start st-fencing_start_0 on node5 (local)
Jan 21 13:42:40 node5 stonith-ng[8691]: notice: stonith_device_register:
Device 'st-fencing' already existed in device list (1 active devices)
Jan 21 13:42:40 node5 crmd[8695]: notice: process_lrm_event: LRM operation
st-fencing_start_0 (call=8, rc=0, cib-update=42, confirmed=true) ok
Jan 21 13:42:40 node5 crmd[8695]: notice: run_graph: Transition 2
(Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-73.bz2): Complete
Jan 21 13:42:40 node5 crmd[8695]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan 21 13:45:02 node5 ntpd[8470]: 0.0.0.0 c614 04 freq_mode
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] notice: pcmk_peer_update:
Transitional membership event on ring 8: memb=1, new=0, lost=0
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] info: pcmk_peer_update: memb:
node5 307367434
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] notice: pcmk_peer_update:
Stable membership event on ring 8: memb=2, new=1, lost=0
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] info: update_member: Creating
entry for node 290590218 born on 8
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] info: update_member: Node
290590218/unknown is now: member
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] info: pcmk_peer_update: NEW:
.pending. 290590218
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] info: pcmk_peer_update: MEMB:
.pending. 290590218
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] info: pcmk_peer_update: MEMB:
node5 307367434
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] info:
send_member_notification: Sending membership update 8 to 2 children
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] info: update_member: 0x1ca5c80
Node 307367434 ((null)) born on: 4
Jan 21 13:46:23 node5 corosync[8646]: [TOTEM ] A processor joined or left the
membership and a new membership was formed.
Jan 21 13:46:23 node5 cib[8690]: notice: crm_update_peer_state:
plugin_handle_membership: Node (null)[290590218] - state is now member (was
(null))
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] info: update_member: 0x1cc9e50
Node 290590218 (node6) born on: 8
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] info: update_member: 0x1cc9e50
Node 290590218 now known as node6 (was: (null))
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] info: update_member: Node
node6 now has 1 quorum votes (was 0)
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] info:
send_member_notification: Sending membership update 8 to 2 children
Jan 21 13:46:23 node5 crmd[8695]: notice: crm_update_peer_state:
plugin_handle_membership: Node (null)[290590218] - state is now member (was
(null))
Jan 21 13:46:23 node5 cib[8690]: notice: plugin_handle_membership: Membership
8: quorum acquired
Jan 21 13:46:23 node5 corosync[8646]: [CPG ] chosen downlist: sender r(0)
ip(10.14.82.17) r(1) ip(10.128.2.17) ; members(old:1 left:0)
Jan 21 13:46:23 node5 corosync[8646]: [MAIN ] Completed service
synchronization, ready to provide service.
Jan 21 13:46:23 node5 crmd[8695]: notice: plugin_handle_membership:
Membership 8: quorum acquired
Jan 21 13:46:23 node5 cib[8690]: notice: cib:diff: Diff: --- 0.9.3
Jan 21 13:46:23 node5 cib[8690]: notice: cib:diff: Diff: +++ 0.10.1
7be5e06f43347759896f381ff0e09476
Jan 21 13:46:23 node5 cib[8690]: notice: cib:diff: -- <cib admin_epoch="0"
epoch="9" num_updates="3"/>
Jan 21 13:46:23 node5 cib[8690]: notice: cib:diff: ++ <node id="node6"
uname="node6"/>
Jan 21 13:46:23 node5 crmd[8695]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
origin=abort_transition_graph ]
Jan 21 13:46:30 node5 attrd[8693]: notice: attrd_local_callback: Sending full
refresh (origin=crmd)
Jan 21 13:46:30 node5 attrd[8693]: notice: attrd_trigger_update: Sending
flush op to all hosts for: probe_complete (true)
Jan 21 13:46:31 node5 pengine[8694]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jan 21 13:46:31 node5 crmd[8695]: notice: te_rsc_command: Initiating action
5: monitor st-fencing_monitor_0 on node6
Jan 21 13:46:31 node5 pengine[8694]: notice: process_pe_message: Calculated
Transition 3: /var/lib/pacemaker/pengine/pe-input-74.bz2
Jan 21 13:46:32 node5 crmd[8695]: notice: te_rsc_command: Initiating action
4: probe_complete probe_complete on node6 - no waiting
Jan 21 13:46:32 node5 crmd[8695]: notice: run_graph: Transition 3
(Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-74.bz2): Complete
Jan 21 13:46:32 node5 crmd[8695]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan 21 13:46:40 node5 cibadmin[9025]: notice: crm_log_args: Invoked: cibadmin
-o resources -C -x /tmp/tmp80fPna
Jan 21 13:46:40 node5 stonith-ng[8691]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jan 21 13:46:40 node5 crmd[8695]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
origin=abort_transition_graph ]
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: Diff: --- 0.10.10
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: Diff: +++ 0.11.1
50c098c8e9b5f371c281ae77c05120e2
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: -- <cib admin_epoch="0"
epoch="10" num_updates="10"/>
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++ <primitive
class="ocf" provider="product" type="Target" id="res1">
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++
<meta_attributes id="res1-meta_attributes">
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++ <nvpair
name="target-role" id="res1-meta_attributes-target-role" value="Stopped"/>
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++
</meta_attributes>
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++ <operations
id="res1-operations">
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++ <op
id="res1-monitor-5" interval="5" name="monitor" timeout="60"/>
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++ <op
id="res1-start-0" interval="0" name="start" timeout="300"/>
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++ <op
id="res1-stop-0" interval="0" name="stop" timeout="300"/>
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++ </operations>
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++
<instance_attributes id="res1-instance_attributes">
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++ <nvpair
id="res1-instance_attributes-target" name="target"
value="63e7a581-08fd-4a6c-bcb8-153469bd36a2"/>
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++
</instance_attributes>
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++ </primitive>
Jan 21 13:46:40 node5 pengine[8694]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jan 21 13:46:40 node5 pengine[8694]: notice: process_pe_message: Calculated
Transition 4: /var/lib/pacemaker/pengine/pe-input-75.bz2
Jan 21 13:46:40 node5 crmd[8695]: notice: te_rsc_command: Initiating action
6: monitor res1_monitor_0 on node6
Jan 21 13:46:40 node5 crmd[8695]: notice: te_rsc_command: Initiating action
4: monitor res1_monitor_0 on node5 (local)
Jan 21 13:46:40 node5 cibadmin[9028]: notice: crm_log_args: Invoked: cibadmin
-o constraints -C -X <rsc_location id="res1-primary" node="node5" rsc="res1"
score="20"/>
Jan 21 13:46:40 node5 cib[8690]: notice: log_cib_diff: cib:diff: Local-only
Change: 0.12.1
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: -- <cib admin_epoch="0"
epoch="11" num_updates="1"/>
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++ <rsc_location
id="res1-primary" node="node5" rsc="res1" score="20"/>
Jan 21 13:46:41 node5 stonith-ng[8691]: notice: stonith_device_register:
Device 'st-fencing' already existed in device list (1 active devices)
Jan 21 13:46:41 node5 stonith-ng[8691]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jan 21 13:46:41 node5 kernel: LDISKFS-fs (sde): mounted filesystem with ordered
data mode. quota=on. Opts:
Jan 21 13:46:41 node5 crmd[8695]: warning: status_from_rc: Action 6
(res1_monitor_0) on node6 failed (target: 7 vs. rc: 0): Error
Jan 21 13:46:41 node5 crmd[8695]: notice: te_rsc_command: Initiating action
5: probe_complete probe_complete on node6 - no waiting
Jan 21 13:46:41 node5 stonith-ng[8691]: notice: stonith_device_register:
Device 'st-fencing' already existed in device list (1 active devices)
Jan 21 13:46:41 node5 stonith-ng[8691]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jan 21 13:46:41 node5 cib[8690]: notice: cib:diff: Diff: --- 0.12.2
Jan 21 13:46:41 node5 cib[8690]: notice: cib:diff: Diff: +++ 0.13.1
48a53642acc94d5f541c3297baf08880
Jan 21 13:46:41 node5 cib[8690]: notice: cib:diff: -- <cib admin_epoch="0"
epoch="12" num_updates="2"/>
Jan 21 13:46:41 node5 cib[8690]: notice: cib:diff: ++ <rsc_location
id="res1-secondary" node="node6" rsc="res1" score="10"/>
Jan 21 13:46:41 node5 stonith-ng[8691]: notice: stonith_device_register:
Device 'st-fencing' already existed in device list (1 active devices)
Jan 21 13:46:41 node5 crmd[8695]: notice: process_lrm_event: LRM operation
res1_monitor_0 (call=13, rc=7, cib-update=68, confirmed=true) not running
Jan 21 13:46:41 node5 crmd[8695]: notice: te_rsc_command: Initiating action
3: probe_complete probe_complete on node5 (local) - no waiting
Jan 21 13:46:41 node5 crmd[8695]: notice: run_graph: Transition 4
(Complete=4, Pending=0, Fired=0, Skipped=1, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-75.bz2): Stopped
Jan 21 13:46:41 node5 pengine[8694]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jan 21 13:46:41 node5 pengine[8694]: notice: LogActions: Stop
res1#011(node6)
Jan 21 13:46:41 node5 pengine[8694]: notice: process_pe_message: Calculated
Transition 5: /var/lib/pacemaker/pengine/pe-input-76.bz2
Jan 21 13:46:41 node5 crmd[8695]: notice: te_rsc_command: Initiating action
7: stop res1_stop_0 on node6
Jan 21 13:46:42 node5 stonith-ng[8691]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jan 21 13:46:42 node5 cib[8690]: notice: cib:diff: Diff: --- 0.13.2
Jan 21 13:46:42 node5 cib[8690]: notice: cib:diff: Diff: +++ 0.14.1
36575d506c0ed5769e4655c88ef3510d
Jan 21 13:46:42 node5 cib[8690]: notice: cib:diff: -- <nvpair
value="Stopped" id="res1-meta_attributes-target-role"/>
Jan 21 13:46:42 node5 cib[8690]: notice: cib:diff: ++ <nvpair
name="target-role" id="res1-meta_attributes-target-role" value="Started"/>
Jan 21 13:46:42 node5 crmd[8695]: notice: run_graph: Transition 5
(Complete=1, Pending=0, Fired=0, Skipped=1, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-76.bz2): Stopped
Jan 21 13:46:42 node5 pengine[8694]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jan 21 13:46:42 node5 pengine[8694]: notice: LogActions: Start
res1#011(node5)
Jan 21 13:46:42 node5 pengine[8694]: notice: process_pe_message: Calculated
Transition 6: /var/lib/pacemaker/pengine/pe-input-77.bz2
Jan 21 13:46:42 node5 crmd[8695]: notice: te_rsc_command: Initiating action
7: start res1_start_0 on node5 (local)
Jan 21 13:46:42 node5 stonith-ng[8691]: notice: stonith_device_register:
Device 'st-fencing' already existed in device list (1 active devices)
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished:
res1_start_0:9059:stderr [ [ ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished:
res1_start_0:9059:stderr [ { ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished:
res1_start_0:9059:stderr [ "args": [ ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished:
res1_start_0:9059:stderr [ "mount", ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished:
res1_start_0:9059:stderr [ "-t", ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished:
res1_start_0:9059:stderr [ "ext4", ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished:
res1_start_0:9059:stderr [
"/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk1", ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished:
res1_start_0:9059:stderr [ "/mnt/res1" ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished:
res1_start_0:9059:stderr [ ], ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished:
res1_start_0:9059:stderr [ "rc": 0, ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished:
res1_start_0:9059:stderr [ "stderr": "", ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished:
res1_start_0:9059:stderr [ "stdout": "" ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished:
res1_start_0:9059:stderr [ } ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished:
res1_start_0:9059:stderr [ ] ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished:
res1_start_0:9059:stderr [ ]
Jan 21 13:46:43 node5 crmd[8695]: notice: process_lrm_event: LRM operation
res1_start_0 (call=16, rc=0, cib-update=71, confirmed=true) ok
Jan 21 13:46:43 node5 crmd[8695]: notice: te_rsc_command: Initiating action
8: monitor res1_monitor_5000 on node5 (local)
Jan 21 13:46:43 node5 crmd[8695]: notice: process_lrm_event: LRM operation
res1_monitor_5000 (call=19, rc=0, cib-update=72, confirmed=false) ok
Jan 21 13:46:43 node5 crmd[8695]: notice: run_graph: Transition 6
(Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-77.bz2): Complete
Jan 21 13:46:43 node5 crmd[8695]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
And here's the log for node6:
Jan 21 13:46:26 node6 pacemaker: Starting Pacemaker Cluster Manager
Jan 21 13:46:26 node6 pacemakerd[9322]: notice: main: Starting Pacemaker
1.1.10-14.el6_5.1 (Build: 368c726): generated-manpages agent-manpages
ascii-docs publican-docs ncurses libqb-logging libqb-ipc nagios
corosync-plugin cman
Jan 21 13:46:26 node6 pacemakerd[9322]: notice: get_node_name: Defaulting to
uname -n for the local classic openais (with plugin) node name
Jan 21 13:46:26 node6 corosync[9288]: [pcmk ] WARN: route_ais_message:
Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 corosync[9288]: [pcmk ] WARN: route_ais_message:
Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 corosync[9288]: [pcmk ] WARN: route_ais_message:
Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 corosync[9288]: [pcmk ] WARN: route_ais_message:
Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 corosync[9288]: [pcmk ] WARN: route_ais_message:
Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 corosync[9288]: [pcmk ] WARN: route_ais_message:
Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 corosync[9288]: [pcmk ] WARN: route_ais_message:
Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 corosync[9288]: [pcmk ] WARN: route_ais_message:
Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 corosync[9288]: [pcmk ] WARN: route_ais_message:
Sending message to local.cib failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 stonith-ng[9329]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: classic openais (with plugin)
Jan 21 13:46:26 node6 attrd[9331]: notice: crm_cluster_connect: Connecting to
cluster infrastructure: classic openais (with plugin)
Jan 21 13:46:26 node6 crmd[9333]: notice: main: CRM Git Version: 368c726
Jan 21 13:46:26 node6 cib[9328]: notice: main: Using new config location:
/var/lib/pacemaker/cib
Jan 21 13:46:26 node6 cib[9328]: warning: retrieveCib: Cluster configuration
not found: /var/lib/pacemaker/cib/cib.xml
Jan 21 13:46:26 node6 cib[9328]: warning: readCibXmlFile: Primary
configuration corrupt or unusable, trying backups
Jan 21 13:46:26 node6 cib[9328]: warning: readCibXmlFile: Continuing with an
empty configuration.
Jan 21 13:46:26 node6 attrd[9331]: notice: get_node_name: Defaulting to uname
-n for the local classic openais (with plugin) node name
Jan 21 13:46:26 node6 corosync[9288]: [pcmk ] info: pcmk_ipc: Recorded
connection 0x1b1fe50 for attrd/0
Jan 21 13:46:26 node6 stonith-ng[9329]: notice: get_node_name: Defaulting to
uname -n for the local classic openais (with plugin) node name
Jan 21 13:46:26 node6 corosync[9288]: [pcmk ] info: pcmk_ipc: Recorded
connection 0x1b1bad0 for stonith-ng/0
Jan 21 13:46:26 node6 attrd[9331]: notice: get_node_name: Defaulting to uname
-n for the local classic openais (with plugin) node name
Jan 21 13:46:26 node6 stonith-ng[9329]: notice: get_node_name: Defaulting to
uname -n for the local classic openais (with plugin) node name
Jan 21 13:46:26 node6 attrd[9331]: notice: main: Starting mainloop...
Jan 21 13:46:26 node6 cib[9328]: notice: crm_cluster_connect: Connecting to
cluster infrastructure: classic openais (with plugin)
Jan 21 13:46:26 node6 cib[9328]: notice: get_node_name: Defaulting to uname
-n for the local classic openais (with plugin) node name
Jan 21 13:46:26 node6 corosync[9288]: [pcmk ] info: pcmk_ipc: Recorded
connection 0x1b241d0 for cib/0
Jan 21 13:46:26 node6 corosync[9288]: [pcmk ] info: pcmk_ipc: Sending
membership update 8 to cib
Jan 21 13:46:26 node6 cib[9328]: notice: get_node_name: Defaulting to uname
-n for the local classic openais (with plugin) node name
Jan 21 13:46:26 node6 cib[9328]: notice: plugin_handle_membership: Membership
8: quorum acquired
Jan 21 13:46:26 node6 cib[9328]: notice: crm_update_peer_state:
plugin_handle_membership: Node node6[290590218] - state is now member (was
(null))
Jan 21 13:46:26 node6 cib[9328]: notice: crm_update_peer_state:
plugin_handle_membership: Node node5[307367434] - state is now member (was
(null))
Jan 21 13:46:27 node6 crmd[9333]: notice: crm_cluster_connect: Connecting to
cluster infrastructure: classic openais (with plugin)
Jan 21 13:46:27 node6 crmd[9333]: notice: get_node_name: Defaulting to uname
-n for the local classic openais (with plugin) node name
Jan 21 13:46:27 node6 corosync[9288]: [pcmk ] info: pcmk_ipc: Recorded
connection 0x1b28550 for crmd/0
Jan 21 13:46:27 node6 corosync[9288]: [pcmk ] info: pcmk_ipc: Sending
membership update 8 to crmd
Jan 21 13:46:27 node6 crmd[9333]: notice: get_node_name: Defaulting to uname
-n for the local classic openais (with plugin) node name
Jan 21 13:46:27 node6 crmd[9333]: notice: plugin_handle_membership:
Membership 8: quorum acquired
Jan 21 13:46:27 node6 crmd[9333]: notice: crm_update_peer_state:
plugin_handle_membership: Node node6[290590218] - state is now member (was
(null))
Jan 21 13:46:27 node6 crmd[9333]: notice: crm_update_peer_state:
plugin_handle_membership: Node node5[307367434] - state is now member (was
(null))
Jan 21 13:46:27 node6 stonith-ng[9329]: notice: setup_cib: Watching for
stonith topology changes
Jan 21 13:46:27 node6 crmd[9333]: notice: do_started: The local CRM is
operational
Jan 21 13:46:27 node6 crmd[9333]: notice: do_state_transition: State
transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL
origin=do_started ]
Jan 21 13:46:29 node6 stonith-ng[9329]: warning: cib_process_diff: Diff 0.1.1
-> 0.10.4 from local not applied to 0.1.1: Failed application of an update diff
Jan 21 13:46:29 node6 stonith-ng[9329]: notice: update_cib_cache_cb:
[cib_diff_notify] Patch aborted: Application of an update diff failed (-206)
Jan 21 13:46:29 node6 crmd[9333]: notice: do_state_transition: State
transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE
origin=do_cl_join_finalize_respond ]
Jan 21 13:46:29 node6 attrd[9331]: notice: attrd_local_callback: Sending full
refresh (origin=crmd)
Jan 21 13:46:29 node6 stonith-ng[9329]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jan 21 13:46:30 node6 stonith-ng[9329]: notice: stonith_device_register:
Added 'st-fencing' to the device list (1 active devices)
Jan 21 13:46:32 node6 attrd[9331]: notice: attrd_trigger_update: Sending
flush op to all hosts for: probe_complete (true)
Jan 21 13:46:32 node6 attrd[9331]: notice: attrd_perform_update: Sent update
7: probe_complete=true
Jan 21 13:46:40 node6 stonith-ng[9329]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jan 21 13:46:41 node6 crmd[9333]: notice: process_lrm_event: LRM operation
res1_monitor_0 (call=10, rc=0, cib-update=10, confirmed=true) ok
Jan 21 13:46:41 node6 cibadmin[9408]: notice: crm_log_args: Invoked: cibadmin
-o constraints -C -X <rsc_location id="res1-secondary" node="node6" rsc="res1"
score="10"/>
Jan 21 13:46:41 node6 stonith-ng[9329]: notice: stonith_device_register:
Added 'st-fencing' to the device list (1 active devices)
Jan 21 13:46:41 node6 stonith-ng[9329]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jan 21 13:46:42 node6 crmd[9333]: notice: process_lrm_event: LRM operation
res1_stop_0 (call=13, rc=0, cib-update=11, confirmed=true) ok
Jan 21 13:46:42 node6 stonith-ng[9329]: notice: stonith_device_register:
Added 'st-fencing' to the device list (1 active devices)
Jan 21 13:46:42 node6 stonith-ng[9329]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jan 21 13:46:43 node6 stonith-ng[9329]: notice: stonith_device_register:
Added 'st-fencing' to the device list (1 active devices)
Jan 21 13:46:43 node6 stonith-ng[9329]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jan 21 13:46:44 node6 stonith-ng[9329]: notice: stonith_device_register:
Added 'st-fencing' to the device list (1 active devices)
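For what it's worth, crm_mon ultimately reports resource state from the lrm operation history in the CIB status section, so when that section is stale (e.g. after the failed diff application above) the display lags the real placement. Below is a rough sketch of reading that history directly; the XML fragment is a made-up, heavily simplified status section for illustration, not output from this cluster, and real CIBs carry many more attributes per element:

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal CIB status fragment (illustration only):
# res1 last stopped cleanly on node5 and last started cleanly on node6.
CIB_STATUS = """
<status>
  <node_state uname="node5">
    <lrm><lrm_resources>
      <lrm_resource id="res1">
        <lrm_rsc_op operation="stop" rc-code="0"/>
      </lrm_resource>
    </lrm_resources></lrm>
  </node_state>
  <node_state uname="node6">
    <lrm><lrm_resources>
      <lrm_resource id="res1">
        <lrm_rsc_op operation="start" rc-code="0"/>
      </lrm_resource>
    </lrm_resources></lrm>
  </node_state>
</status>
"""

def locate(status_xml, rsc_id):
    """Return nodes whose last recorded op suggests rsc_id is active there."""
    active = []
    root = ET.fromstring(status_xml)
    for node_state in root.iter("node_state"):
        for res in node_state.iter("lrm_resource"):
            if res.get("id") != rsc_id:
                continue
            ops = list(res.iter("lrm_rsc_op"))
            if not ops:
                continue
            last = ops[-1]
            # A successful start, or a successful recurring monitor,
            # implies the resource is running on this node.
            if last.get("operation") in ("start", "monitor") \
                    and last.get("rc-code") == "0":
                active.append(node_state.get("uname"))
    return active

print(locate(CIB_STATUS, "res1"))
```

Against a live cluster one would feed this the output of `cibadmin -Q -o status`; the point is just that the answer is only as fresh as the last recorded lrm operation.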
Thoughts?
b.
_______________________________________________
Pacemaker mailing list: [email protected]
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org