Hi Andrew and Emi,
Please find attached the new Pacemaker configuration and syslog. The
attached log is from when I power off the working node (server) and the Xen
domain does not migrate.
After some time it does start, but shows only a blank white screen in VNC Viewer.
Thanks in advance
On Wed, Jun 4, 2014 at 2:52 PM, emmanuel segura <[email protected]> wrote:
> Because you haven't configured fencing.
>
>
> 2014-06-04 9:20 GMT+02:00 kamal kishi <[email protected]>:
>
> Hi emi,
>>
>> Cluster logs?
>> Right now I'm getting all the logs in syslog itself.
>>
>> Another thing I found is that OCFS2 has some issues whenever either
>> server is offline or powered off. Can you suggest whether using OCFS2
>> here is a good option or not?
>>
>> Thank you
>>
>>
>> On Tue, Jun 3, 2014 at 6:31 PM, emmanuel segura <[email protected]>
>> wrote:
>>
>>> Maybe I'm wrong, but I think you forgot the cluster logs.
>>>
>>>
>>> 2014-06-03 14:34 GMT+02:00 kamal kishi <[email protected]>:
>>>
>>>> Hi all,
>>>>
>>>> I'm sure many have come across the same question, and yes, I've gone
>>>> through most of the blogs and mailing lists without much result.
>>>> I'm trying to configure a Xen HVM DomU on a DRBD-replicated partition
>>>> with an OCFS2 filesystem, managed by Pacemaker.
>>>>
>>>> My question is: what changes need to be made to the Xen files mentioned
>>>> below for them to work properly with Pacemaker?
>>>> /etc/xen/xend-config.sxp
>>>> /etc/default/xendomains
>>>>
>>>> Let me know if any other files need to be edited.
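For live migration of a Xen domain under Pacemaker, the changes usually
needed in /etc/xen/xend-config.sxp concern the relocation server. A sketch,
assuming the classic xend toolstack (the permissive empty hosts-allow is an
example; restrict it to the peer node in production):

```
# /etc/xen/xend-config.sxp - enable the relocation (migration) server.
(xend-relocation-server yes)
(xend-relocation-port 8002)
(xend-relocation-address '')
(xend-relocation-hosts-allow '')
```

In /etc/default/xendomains it is commonly advised to set XENDOMAINS_SAVE=""
and XENDOMAINS_RESTORE=false, so the xendomains init script does not save or
restore domains behind Pacemaker's back.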
>>>>
>>>> Find my configuration files attached.
>>>> Many times the Xen resource doesn't start.
>>>> Even when it does start, migration doesn't take place.
>>>> I checked the logs; some "unknown error" is printed.
>>>>
>>>> It would be helpful if someone could guide me through the configuration.
>>>>
>>>> Thanks in advance guys
>>>>
>>>> --
>>>> Regards,
>>>> Kamal Kishore B V
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list: [email protected]
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started:
>>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>>>
>>>>
>>>
>>>
>>> --
>>> this is my life and I live it as long as God wills
>>>
>>>
>>>
>>
>>
>> --
>> Regards,
>> Kamal Kishore B V
>>
>
>
>
> --
> this is my life and I live it as long as God wills
>
--
Regards,
Kamal Kishore B V
node server1
node server2
primitive Clu-FS-DRBD ocf:linbit:drbd \
params drbd_resource="r0" \
operations $id="Clu-FS-DRBD-ops" \
op start interval="0" timeout="49s" \
op stop interval="0" timeout="50s" \
op monitor interval="40s" role="Master" timeout="50s" \
op monitor interval="41s" role="Slave" timeout="51s" \
meta target-role="started"
primitive Clu-FS-Mount ocf:heartbeat:Filesystem \
params device="/dev/drbd/by-res/r0" directory="/cluster" fstype="ocfs2" \
op monitor interval="120s" \
meta target-role="started"
primitive xenwin7 ocf:heartbeat:Xen \
params xmfile="/home/cluster/xen/win7.cfg" \
op monitor interval="40s" \
meta target-role="started" is-managed="true" allow-migrate="true"
ms Clu-FS-DRBD-Master Clu-FS-DRBD \
meta resource-stickiness="100" master-max="2" notify="true" interleave="true"
clone Clu-FS-Mount-Clone Clu-FS-Mount \
meta interleave="true" ordered="true"
location drbd-fence-by-handler-Clu-FS-DRBD-Master Clu-FS-DRBD-Master \
rule $id="drbd-fence-by-handler-rule-Clu-FS-DRBD-Master" $role="Master" -inf: #uname ne server1
colocation Clu-Clo-DRBD inf: Clu-FS-Mount-Clone Clu-FS-DRBD-Master:Master
colocation win7-Xen-Clu-Clo inf: xenwin7 Clu-FS-Mount-Clone
order Cluster-FS-After-DRBD inf: Clu-FS-DRBD-Master:promote Clu-FS-Mount-Clone:start
property $id="cib-bootstrap-options" \
dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
no-quorum-policy="ignore" \
stonith-enabled="false" \
default-resource-stickiness="1000" \
last-lrm-refresh="1401960233"
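The configuration above has stonith-enabled="false", which is what
Emmanuel's reply points at: without fencing, OCFS2 and the DRBD fence-peer
handler cannot recover cleanly from a node failure. A minimal sketch of
adding STONITH in crm shell syntax, assuming IPMI-capable management
controllers (the addresses and credentials below are placeholders, not taken
from the original post):

```
# Hypothetical BMC addresses and credentials - adapt to your hardware.
primitive st-server1 stonith:external/ipmi \
params hostname="server1" ipaddr="192.168.1.101" userid="admin" passwd="secret" interface="lan"
primitive st-server2 stonith:external/ipmi \
params hostname="server2" ipaddr="192.168.1.102" userid="admin" passwd="secret" interface="lan"
# A node must not run the device that fences itself.
location l-st-server1 st-server1 -inf: server1
location l-st-server2 st-server2 -inf: server2
property stonith-enabled="true"
```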
Jun 5 15:11:39 server1 NetworkManager[887]: <info> (eth0): carrier now OFF
(device state 10)
Jun 5 15:11:39 server1 kernel: [ 2127.112852] bnx2 0000:01:00.0: eth0: NIC
Copper Link is Down
Jun 5 15:11:39 server1 kernel: [ 2127.113876] xenbr0: port 1(eth0) entering
forwarding state
Jun 5 15:11:41 server1 NetworkManager[887]: <info> (eth0): carrier now ON
(device state 10)
Jun 5 15:11:41 server1 kernel: [ 2129.231687] bnx2 0000:01:00.0: eth0: NIC
Copper Link is Up, 100 Mbps full duplex, receive & transmit flow control ON
Jun 5 15:11:41 server1 kernel: [ 2129.232672] xenbr0: port 1(eth0) entering
forwarding state
Jun 5 15:11:41 server1 kernel: [ 2129.232696] xenbr0: port 1(eth0) entering
forwarding state
Jun 5 15:11:42 server1 corosync[1556]: [TOTEM ] A processor failed, forming
new configuration.
Jun 5 15:11:43 server1 NetworkManager[887]: <info> (eth0): carrier now OFF
(device state 10)
Jun 5 15:11:43 server1 kernel: [ 2130.624346] bnx2 0000:01:00.0: eth0: NIC
Copper Link is Down
Jun 5 15:11:43 server1 kernel: [ 2130.625274] xenbr0: port 1(eth0) entering
forwarding state
Jun 5 15:11:45 server1 corosync[1556]: [pcmk ] notice: pcmk_peer_update:
Transitional membership event on ring 64: memb=1, new=0, lost=1
Jun 5 15:11:45 server1 corosync[1556]: [pcmk ] info: pcmk_peer_update:
memb: server1 16777226
Jun 5 15:11:45 server1 corosync[1556]: [pcmk ] info: pcmk_peer_update:
lost: server2 33554442
Jun 5 15:11:45 server1 corosync[1556]: [pcmk ] notice: pcmk_peer_update:
Stable membership event on ring 64: memb=1, new=0, lost=0
Jun 5 15:11:45 server1 corosync[1556]: [pcmk ] info: pcmk_peer_update:
MEMB: server1 16777226
Jun 5 15:11:45 server1 corosync[1556]: [pcmk ] info:
ais_mark_unseen_peer_dead: Node server2 was not seen in the previous transition
Jun 5 15:11:45 server1 corosync[1556]: [pcmk ] info: update_member: Node
33554442/server2 is now: lost
Jun 5 15:11:45 server1 corosync[1556]: [pcmk ] info:
send_member_notification: Sending membership update 64 to 2 children
Jun 5 15:11:45 server1 corosync[1556]: [TOTEM ] A processor joined or left
the membership and a new membership was formed.
Jun 5 15:11:45 server1 corosync[1556]: [CPG ] chosen downlist: sender r(0)
ip(10.0.0.1) ; members(old:2 left:1)
Jun 5 15:11:45 server1 corosync[1556]: [MAIN ] Completed service
synchronization, ready to provide service.
Jun 5 15:11:45 server1 cib: [1595]: notice: ais_dispatch_message: Membership
64: quorum lost
Jun 5 15:11:45 server1 cib: [1595]: info: crm_update_peer: Node server2:
id=33554442 state=lost (new) addr=r(0) ip(10.0.0.2) votes=1 born=60 seen=60
proc=00000000000000000000000000111312
Jun 5 15:11:45 server1 crmd: [1600]: notice: ais_dispatch_message: Membership
64: quorum lost
Jun 5 15:11:45 server1 crmd: [1600]: info: ais_status_callback: status:
server2 is now lost (was member)
Jun 5 15:11:45 server1 crmd: [1600]: info: crm_update_peer: Node server2:
id=33554442 state=lost (new) addr=r(0) ip(10.0.0.2) votes=1 born=60 seen=60
proc=00000000000000000000000000111312
Jun 5 15:11:45 server1 crmd: [1600]: info: erase_node_from_join: Removed node
server2 from join calculations: welcomed=0 itegrated=0 finalized=0 confirmed=1
Jun 5 15:11:45 server1 cib: [1595]: info: cib_process_request: Operation
complete: op cib_modify for section nodes (origin=local/crmd/146,
version=0.52.3): ok (rc=0)
Jun 5 15:11:45 server1 crmd: [1600]: info: crm_update_quorum: Updating quorum
status to false (call=148)
Jun 5 15:11:45 server1 cib: [1595]: info: cib_process_request: Operation
complete: op cib_modify for section cib (origin=local/crmd/148,
version=0.52.5): ok (rc=0)
Jun 5 15:11:45 server1 crmd: [1600]: info: crmd_ais_dispatch: Setting expected
votes to 2
Jun 5 15:11:45 server1 crmd: [1600]: WARN: match_down_event: No match for
shutdown action on server2
Jun 5 15:11:45 server1 crmd: [1600]: info: te_update_diff: Stonith/shutdown of
server2 not matched
Jun 5 15:11:45 server1 crmd: [1600]: info: abort_transition_graph:
te_update_diff:215 - Triggered transition abort (complete=1, tag=node_state,
id=server2, magic=NA, cib=0.52.4) : Node failure
Jun 5 15:11:45 server1 crmd: [1600]: info: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
origin=abort_transition_graph ]
Jun 5 15:11:45 server1 crmd: [1600]: info: do_state_transition: All 1 cluster
nodes are eligible to run resources.
Jun 5 15:11:45 server1 crmd: [1600]: info: do_pe_invoke: Query 151: Requesting
the current CIB: S_POLICY_ENGINE
Jun 5 15:11:45 server1 cib: [1595]: info: cib_process_request: Operation
complete: op cib_modify for section crm_config (origin=local/crmd/150,
version=0.52.6): ok (rc=0)
Jun 5 15:11:45 server1 crmd: [1600]: info: do_pe_invoke_callback: Invoking the
PE: query=151, ref=pe_calc-dc-1401961305-161, seq=64, quorate=0
Jun 5 15:11:45 server1 pengine: [1599]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jun 5 15:11:45 server1 pengine: [1599]: WARN: unpack_rsc_op: Processing failed
op xenwin7_last_failure_0 on server1: unknown error (1)
Jun 5 15:11:45 server1 pengine: [1599]: notice: common_apply_stickiness:
Clu-FS-DRBD-Master can fail 999999 more times on server2 before being forced off
Jun 5 15:11:45 server1 pengine: [1599]: notice: common_apply_stickiness:
Clu-FS-DRBD-Master can fail 999999 more times on server2 before being forced off
Jun 5 15:11:45 server1 pengine: [1599]: notice: RecurringOp: Start recurring
monitor (40s) for xenwin7 on server1
Jun 5 15:11:45 server1 pengine: [1599]: notice: LogActions: Leave
Clu-FS-DRBD:0#011(Master server1)
Jun 5 15:11:45 server1 pengine: [1599]: notice: LogActions: Leave
Clu-FS-DRBD:1#011(Stopped)
Jun 5 15:11:45 server1 pengine: [1599]: notice: LogActions: Leave
Clu-FS-Mount:0#011(Started server1)
Jun 5 15:11:45 server1 pengine: [1599]: notice: LogActions: Leave
Clu-FS-Mount:1#011(Stopped)
Jun 5 15:11:45 server1 pengine: [1599]: notice: LogActions: Start
xenwin7#011(server1)
Jun 5 15:11:45 server1 crmd: [1600]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Jun 5 15:11:45 server1 crmd: [1600]: info: unpack_graph: Unpacked transition
38: 2 actions in 2 synapses
Jun 5 15:11:45 server1 crmd: [1600]: info: do_te_invoke: Processing graph 38
(ref=pe_calc-dc-1401961305-161) derived from /var/lib/pengine/pe-input-93.bz2
Jun 5 15:11:45 server1 crmd: [1600]: info: te_rsc_command: Initiating action
40: start xenwin7_start_0 on server1 (local)
Jun 5 15:11:45 server1 crmd: [1600]: info: do_lrm_rsc_op: Performing
key=40:38:0:43add4e5-6270-43de-8ca9-8a4939271b5b op=xenwin7_start_0 )
Jun 5 15:11:45 server1 lrmd: [1596]: info: rsc:xenwin7 start[41] (pid 9270)
Jun 5 15:11:45 server1 pengine: [1599]: notice: process_pe_message: Transition
38: PEngine Input stored in: /var/lib/pengine/pe-input-93.bz2
Jun 5 15:11:58 server1 kernel: [ 2146.278476] block drbd0: PingAck did not
arrive in time.
Jun 5 15:11:58 server1 kernel: [ 2146.278488] block drbd0: peer( Primary ->
Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
susp( 0 -> 1 )
Jun 5 15:11:58 server1 kernel: [ 2146.278686] block drbd0: asender terminated
Jun 5 15:11:58 server1 kernel: [ 2146.278693] block drbd0: Terminating
drbd0_asender
Jun 5 15:11:58 server1 kernel: [ 2146.278771] block drbd0: Connection closed
Jun 5 15:11:58 server1 kernel: [ 2146.278849] block drbd0: conn(
NetworkFailure -> Unconnected )
Jun 5 15:11:58 server1 kernel: [ 2146.278860] block drbd0: helper command:
/sbin/drbdadm fence-peer minor-0
Jun 5 15:11:58 server1 kernel: [ 2146.278864] block drbd0: receiver terminated
Jun 5 15:11:58 server1 kernel: [ 2146.278868] block drbd0: Restarting
drbd0_receiver
Jun 5 15:11:58 server1 kernel: [ 2146.278872] block drbd0: receiver (re)started
Jun 5 15:11:58 server1 kernel: [ 2146.278881] block drbd0: conn( Unconnected
-> WFConnection )
Jun 5 15:11:58 server1 crm-fence-peer.sh[9353]: invoked for r0
Jun 5 15:11:59 server1 cib: [1595]: info: cib:diff: - <cib admin_epoch="0"
epoch="52" num_updates="6" />
Jun 5 15:11:59 server1 cib: [1595]: info: cib:diff: + <cib epoch="53"
num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2"
crm_feature_set="3.0.5" update-origin="server1" update-client="crm_resource"
cib-last-written="Thu Jun 5 15:10:26 2014" have-quorum="0" dc-uuid="server1" >
Jun 5 15:11:59 server1 cib: [1595]: info: cib:diff: + <configuration >
Jun 5 15:11:59 server1 cib: [1595]: info: cib:diff: + <constraints >
Jun 5 15:11:59 server1 cib: [1595]: info: cib:diff: + <rsc_location
rsc="Clu-FS-DRBD-Master" id="drbd-fence-by-handler-Clu-FS-DRBD-Master"
__crm_diff_marker__="added:top" >
Jun 5 15:11:59 server1 cib: [1595]: info: cib:diff: + <rule
role="Master" score="-INFINITY"
id="drbd-fence-by-handler-rule-Clu-FS-DRBD-Master" >
Jun 5 15:11:59 server1 cib: [1595]: info: cib:diff: + <expression
attribute="#uname" operation="ne" value="server1"
id="drbd-fence-by-handler-expr-Clu-FS-DRBD-Master" />
Jun 5 15:11:59 server1 cib: [1595]: info: cib:diff: + </rule>
Jun 5 15:11:59 server1 cib: [1595]: info: cib:diff: + </rsc_location>
Jun 5 15:11:59 server1 cib: [1595]: info: cib:diff: + </constraints>
Jun 5 15:11:59 server1 cib: [1595]: info: cib:diff: + </configuration>
Jun 5 15:11:59 server1 cib: [1595]: info: cib:diff: + </cib>
Jun 5 15:11:59 server1 cib: [1595]: info: cib_process_request: Operation
complete: op cib_create for section constraints (origin=local/cibadmin/2,
version=0.53.1): ok (rc=0)
Jun 5 15:11:59 server1 crmd: [1600]: info: abort_transition_graph:
te_update_diff:124 - Triggered transition abort (complete=0, tag=diff,
id=(null), magic=NA, cib=0.53.1) : Non-status change
Jun 5 15:11:59 server1 crmd: [1600]: info: update_abort_priority: Abort
priority upgraded from 0 to 1000000
Jun 5 15:11:59 server1 crmd: [1600]: info: update_abort_priority: Abort action
done superceeded by restart
Jun 5 15:11:59 server1 crm-fence-peer.sh[9353]: INFO peer is reachable, my
disk is UpToDate: placed constraint 'drbd-fence-by-handler-Clu-FS-DRBD-Master'
Jun 5 15:11:59 server1 kernel: [ 2147.428617] block drbd0: helper command:
/sbin/drbdadm fence-peer minor-0 exit code 4 (0x400)
Jun 5 15:11:59 server1 kernel: [ 2147.428623] block drbd0: fence-peer helper
returned 4 (peer was fenced)
Jun 5 15:11:59 server1 kernel: [ 2147.428632] block drbd0: pdsk( DUnknown ->
Outdated )
Jun 5 15:11:59 server1 kernel: [ 2147.428680] block drbd0: new current UUID
C7AE32BDEB8201AF:41DEB2849956CF9F:CE91A410F5C9F940:CE90A410F5C9F940
Jun 5 15:11:59 server1 kernel: [ 2147.428861] block drbd0: susp( 1 -> 0 )
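The fence-peer sequence above (drbdadm invoking the fence-peer helper, and
crm-fence-peer.sh placing the drbd-fence-by-handler constraint) corresponds
to a DRBD handler configuration along these lines; a sketch, assuming DRBD
8.3/8.4 and the stock helper scripts shipped in /usr/lib/drbd:

```
resource r0 {
  disk {
    fencing resource-only;   # resource-and-stonith once STONITH is working
  }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```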
Jun 5 15:12:05 server1 lrmd: [1596]: WARN: xenwin7:start process (PID 9270)
timed out (try 1). Killing with signal SIGTERM (15).
Jun 5 15:12:05 server1 lrmd: [1596]: WARN: operation start[41] on xenwin7 for
client 1600: pid 9270 timed out
Jun 5 15:12:05 server1 crmd: [1600]: ERROR: process_lrm_event: LRM operation
xenwin7_start_0 (41) Timed Out (timeout=20000ms)
Jun 5 15:12:05 server1 crmd: [1600]: WARN: status_from_rc: Action 40
(xenwin7_start_0) on server1 failed (target: 0 vs. rc: -2): Error
Jun 5 15:12:05 server1 crmd: [1600]: WARN: update_failcount: Updating
failcount for xenwin7 on server1 after failed start: rc=-2 (update=INFINITY,
time=1401961325)
Jun 5 15:12:05 server1 crmd: [1600]: info: abort_transition_graph:
match_graph_event:277 - Triggered transition abort (complete=0, tag=lrm_rsc_op,
id=xenwin7_last_failure_0,
magic=2:-2;40:38:0:43add4e5-6270-43de-8ca9-8a4939271b5b, cib=0.53.2) : Event
failed
Jun 5 15:12:05 server1 crmd: [1600]: info: match_graph_event: Action
xenwin7_start_0 (40) confirmed on server1 (rc=4)
Jun 5 15:12:05 server1 crmd: [1600]: info: run_graph:
====================================================
Jun 5 15:12:05 server1 crmd: [1600]: notice: run_graph: Transition 38
(Complete=1, Pending=0, Fired=0, Skipped=1, Incomplete=0,
Source=/var/lib/pengine/pe-input-93.bz2): Stopped
Jun 5 15:12:05 server1 crmd: [1600]: info: te_graph_trigger: Transition 38 is
now complete
Jun 5 15:12:05 server1 crmd: [1600]: info: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=notify_crmd ]
Jun 5 15:12:05 server1 crmd: [1600]: info: do_state_transition: All 1 cluster
nodes are eligible to run resources.
Jun 5 15:12:05 server1 crmd: [1600]: info: do_pe_invoke: Query 153: Requesting
the current CIB: S_POLICY_ENGINE
Jun 5 15:12:05 server1 attrd: [1597]: notice: attrd_trigger_update: Sending
flush op to all hosts for: fail-count-xenwin7 (INFINITY)
Jun 5 15:12:05 server1 crmd: [1600]: info: do_pe_invoke_callback: Invoking the
PE: query=153, ref=pe_calc-dc-1401961325-163, seq=64, quorate=0
Jun 5 15:12:05 server1 pengine: [1599]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jun 5 15:12:05 server1 pengine: [1599]: WARN: unpack_rsc_op: Processing failed
op xenwin7_last_failure_0 on server1: unknown exec error (-2)
Jun 5 15:12:05 server1 pengine: [1599]: notice: common_apply_stickiness:
Clu-FS-DRBD-Master can fail 999999 more times on server2 before being forced off
Jun 5 15:12:05 server1 pengine: [1599]: notice: common_apply_stickiness:
Clu-FS-DRBD-Master can fail 999999 more times on server2 before being forced off
Jun 5 15:12:05 server1 pengine: [1599]: notice: RecurringOp: Start recurring
monitor (40s) for xenwin7 on server1
Jun 5 15:12:05 server1 pengine: [1599]: notice: LogActions: Leave
Clu-FS-DRBD:0#011(Master server1)
Jun 5 15:12:05 server1 pengine: [1599]: notice: LogActions: Leave
Clu-FS-DRBD:1#011(Stopped)
Jun 5 15:12:05 server1 pengine: [1599]: notice: LogActions: Leave
Clu-FS-Mount:0#011(Started server1)
Jun 5 15:12:05 server1 pengine: [1599]: notice: LogActions: Leave
Clu-FS-Mount:1#011(Stopped)
Jun 5 15:12:05 server1 pengine: [1599]: notice: LogActions: Recover
xenwin7#011(Started server1)
Jun 5 15:12:05 server1 crmd: [1600]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Jun 5 15:12:05 server1 crmd: [1600]: info: unpack_graph: Unpacked transition
39: 4 actions in 4 synapses
Jun 5 15:12:05 server1 crmd: [1600]: info: do_te_invoke: Processing graph 39
(ref=pe_calc-dc-1401961325-163) derived from /var/lib/pengine/pe-input-94.bz2
Jun 5 15:12:05 server1 crmd: [1600]: info: te_rsc_command: Initiating action
3: stop xenwin7_stop_0 on server1 (local)
Jun 5 15:12:05 server1 attrd: [1597]: notice: attrd_perform_update: Sent
update 124: fail-count-xenwin7=INFINITY
Jun 5 15:12:05 server1 crmd: [1600]: info: do_lrm_rsc_op: Performing
key=3:39:0:43add4e5-6270-43de-8ca9-8a4939271b5b op=xenwin7_stop_0 )
Jun 5 15:12:05 server1 attrd: [1597]: notice: attrd_trigger_update: Sending
flush op to all hosts for: last-failure-xenwin7 (1401961325)
Jun 5 15:12:05 server1 lrmd: [1596]: info: rsc:xenwin7 stop[42] (pid 9401)
Jun 5 15:12:05 server1 crmd: [1600]: info: abort_transition_graph:
te_update_diff:164 - Triggered transition abort (complete=0, tag=nvpair,
id=status-server1-fail-count-xenwin7, name=fail-count-xenwin7, value=INFINITY,
magic=NA, cib=0.53.3) : Transient attribute: update
Jun 5 15:12:05 server1 crmd: [1600]: info: update_abort_priority: Abort
priority upgraded from 0 to 1000000
Jun 5 15:12:05 server1 crmd: [1600]: info: update_abort_priority: Abort action
done superceeded by restart
Jun 5 15:12:05 server1 attrd: [1597]: notice: attrd_perform_update: Sent
update 126: last-failure-xenwin7=1401961325
Jun 5 15:12:05 server1 crmd: [1600]: info: abort_transition_graph:
te_update_diff:164 - Triggered transition abort (complete=0, tag=nvpair,
id=status-server1-last-failure-xenwin7, name=last-failure-xenwin7,
value=1401961325, magic=NA, cib=0.53.4) : Transient attribute: update
Jun 5 15:12:05 server1 pengine: [1599]: notice: process_pe_message: Transition
39: PEngine Input stored in: /var/lib/pengine/pe-input-94.bz2
Jun 5 15:12:07 server1 kernel: [ 2155.458452] o2net: Connection to node
server2 (num 1) at 10.0.0.2:7777 has been idle for 30.84 secs, shutting it down.
Jun 5 15:12:07 server1 kernel: [ 2155.458486] o2net: No longer connected to
node server2 (num 1) at 10.0.0.2:7777
Jun 5 15:12:07 server1 kernel: [ 2155.458531]
(xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -112 when
sending message 504 (key 0x649b059e) to node 1
Jun 5 15:12:07 server1 kernel: [ 2155.458538] o2dlm: Waiting on the death of
node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:12:13 server1 kernel: [ 2160.562477]
(xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -107 when
sending message 504 (key 0x649b059e) to node 1
Jun 5 15:12:13 server1 kernel: [ 2160.562484] o2dlm: Waiting on the death of
node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:12:18 server1 kernel: [ 2165.666468]
(xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -107 when
sending message 504 (key 0x649b059e) to node 1
Jun 5 15:12:18 server1 kernel: [ 2165.666475] o2dlm: Waiting on the death of
node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:12:23 server1 kernel: [ 2170.770473]
(xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -107 when
sending message 504 (key 0x649b059e) to node 1
Jun 5 15:12:23 server1 kernel: [ 2170.770481] o2dlm: Waiting on the death of
node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:12:25 server1 lrmd: [1596]: WARN: xenwin7:stop process (PID 9401)
timed out (try 1). Killing with signal SIGTERM (15).
Jun 5 15:12:25 server1 lrmd: [1596]: WARN: operation stop[42] on xenwin7 for
client 1600: pid 9401 timed out
Jun 5 15:12:25 server1 crmd: [1600]: ERROR: process_lrm_event: LRM operation
xenwin7_stop_0 (42) Timed Out (timeout=20000ms)
Jun 5 15:12:25 server1 crmd: [1600]: WARN: status_from_rc: Action 3
(xenwin7_stop_0) on server1 failed (target: 0 vs. rc: -2): Error
Jun 5 15:12:25 server1 crmd: [1600]: WARN: update_failcount: Updating
failcount for xenwin7 on server1 after failed stop: rc=-2 (update=INFINITY,
time=1401961345)
Jun 5 15:12:25 server1 crmd: [1600]: info: abort_transition_graph:
match_graph_event:277 - Triggered transition abort (complete=0, tag=lrm_rsc_op,
id=xenwin7_last_failure_0,
magic=2:-2;3:39:0:43add4e5-6270-43de-8ca9-8a4939271b5b, cib=0.53.5) : Event
failed
Jun 5 15:12:25 server1 crmd: [1600]: info: match_graph_event: Action
xenwin7_stop_0 (3) confirmed on server1 (rc=4)
Jun 5 15:12:25 server1 crmd: [1600]: info: run_graph:
====================================================
Jun 5 15:12:25 server1 crmd: [1600]: notice: run_graph: Transition 39
(Complete=1, Pending=0, Fired=0, Skipped=3, Incomplete=0,
Source=/var/lib/pengine/pe-input-94.bz2): Stopped
Jun 5 15:12:25 server1 crmd: [1600]: info: te_graph_trigger: Transition 39 is
now complete
Jun 5 15:12:25 server1 crmd: [1600]: info: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=notify_crmd ]
Jun 5 15:12:25 server1 crmd: [1600]: info: do_state_transition: All 1 cluster
nodes are eligible to run resources.
Jun 5 15:12:25 server1 crmd: [1600]: info: do_pe_invoke: Query 155: Requesting
the current CIB: S_POLICY_ENGINE
Jun 5 15:12:25 server1 attrd: [1597]: notice: attrd_trigger_update: Sending
flush op to all hosts for: last-failure-xenwin7 (1401961345)
Jun 5 15:12:25 server1 crmd: [1600]: info: do_pe_invoke_callback: Invoking the
PE: query=155, ref=pe_calc-dc-1401961345-165, seq=64, quorate=0
Jun 5 15:12:25 server1 attrd: [1597]: notice: attrd_perform_update: Sent
update 128: last-failure-xenwin7=1401961345
Jun 5 15:12:25 server1 pengine: [1599]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jun 5 15:12:25 server1 pengine: [1599]: WARN: unpack_rsc_op: Processing failed
op xenwin7_last_failure_0 on server1: unknown exec error (-2)
Jun 5 15:12:25 server1 pengine: [1599]: notice: common_apply_stickiness:
Clu-FS-DRBD-Master can fail 999999 more times on server2 before being forced off
Jun 5 15:12:25 server1 pengine: [1599]: notice: common_apply_stickiness:
Clu-FS-DRBD-Master can fail 999999 more times on server2 before being forced off
Jun 5 15:12:25 server1 pengine: [1599]: WARN: common_apply_stickiness: Forcing
xenwin7 away from server1 after 1000000 failures (max=1000000)
Jun 5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave
Clu-FS-DRBD:0#011(Master server1)
Jun 5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave
Clu-FS-DRBD:1#011(Stopped)
Jun 5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave
Clu-FS-Mount:0#011(Started server1)
Jun 5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave
Clu-FS-Mount:1#011(Stopped)
Jun 5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave
xenwin7#011(Started unmanaged)
Jun 5 15:12:25 server1 crmd: [1600]: info: abort_transition_graph:
te_update_diff:164 - Triggered transition abort (complete=1, tag=nvpair,
id=status-server1-last-failure-xenwin7, name=last-failure-xenwin7,
value=1401961345, magic=NA, cib=0.53.6) : Transient attribute: update
Jun 5 15:12:25 server1 crmd: [1600]: info: handle_response: pe_calc
calculation pe_calc-dc-1401961345-165 is obsolete
Jun 5 15:12:25 server1 crmd: [1600]: info: do_pe_invoke: Query 156: Requesting
the current CIB: S_POLICY_ENGINE
Jun 5 15:12:25 server1 crmd: [1600]: info: do_pe_invoke_callback: Invoking the
PE: query=156, ref=pe_calc-dc-1401961345-166, seq=64, quorate=0
Jun 5 15:12:25 server1 pengine: [1599]: notice: process_pe_message: Transition
40: PEngine Input stored in: /var/lib/pengine/pe-input-95.bz2
Jun 5 15:12:25 server1 pengine: [1599]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jun 5 15:12:25 server1 pengine: [1599]: WARN: unpack_rsc_op: Processing failed
op xenwin7_last_failure_0 on server1: unknown exec error (-2)
Jun 5 15:12:25 server1 pengine: [1599]: notice: common_apply_stickiness:
Clu-FS-DRBD-Master can fail 999999 more times on server2 before being forced off
Jun 5 15:12:25 server1 pengine: [1599]: notice: common_apply_stickiness:
Clu-FS-DRBD-Master can fail 999999 more times on server2 before being forced off
Jun 5 15:12:25 server1 pengine: [1599]: WARN: common_apply_stickiness: Forcing
xenwin7 away from server1 after 1000000 failures (max=1000000)
Jun 5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave
Clu-FS-DRBD:0#011(Master server1)
Jun 5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave
Clu-FS-DRBD:1#011(Stopped)
Jun 5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave
Clu-FS-Mount:0#011(Started server1)
Jun 5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave
Clu-FS-Mount:1#011(Stopped)
Jun 5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave
xenwin7#011(Started unmanaged)
Jun 5 15:12:25 server1 crmd: [1600]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Jun 5 15:12:25 server1 crmd: [1600]: info: unpack_graph: Unpacked transition
41: 0 actions in 0 synapses
Jun 5 15:12:25 server1 crmd: [1600]: info: do_te_invoke: Processing graph 41
(ref=pe_calc-dc-1401961345-166) derived from /var/lib/pengine/pe-input-96.bz2
Jun 5 15:12:25 server1 crmd: [1600]: info: run_graph:
====================================================
Jun 5 15:12:25 server1 crmd: [1600]: notice: run_graph: Transition 41
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pengine/pe-input-96.bz2): Complete
Jun 5 15:12:25 server1 crmd: [1600]: info: te_graph_trigger: Transition 41 is
now complete
Jun 5 15:12:25 server1 crmd: [1600]: info: notify_crmd: Transition 41 status:
done - <null>
Jun 5 15:12:25 server1 crmd: [1600]: info: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Jun 5 15:12:25 server1 crmd: [1600]: info: do_state_transition: Starting
PEngine Recheck Timer
Jun 5 15:12:25 server1 pengine: [1599]: notice: process_pe_message: Transition
41: PEngine Input stored in: /var/lib/pengine/pe-input-96.bz2
Jun 5 15:12:28 server1 kernel: [ 2175.874477]
(xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -107 when
sending message 504 (key 0x649b059e) to node 1
Jun 5 15:12:28 server1 kernel: [ 2175.874485] o2dlm: Waiting on the death of
node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:12:33 server1 kernel: [ 2180.978498]
(xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -107 when
sending message 504 (key 0x649b059e) to node 1
Jun 5 15:12:33 server1 kernel: [ 2180.978506] o2dlm: Waiting on the death of
node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:12:38 server1 kernel: [ 2185.538465] o2net: No connection established
with node 1 after 30.0 seconds, giving up.
Jun 5 15:12:38 server1 kernel: [ 2186.082473]
(xend,9339,0):dlm_send_remote_convert_request:395 ERROR: Error -107 when
sending message 504 (key 0x649b059e) to node 1
Jun 5 15:12:38 server1 kernel: [ 2186.082480] o2dlm: Waiting on the death of
node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:12:43 server1 kernel: [ 2191.186466]
(xend,9339,0):dlm_send_remote_convert_request:395 ERROR: Error -107 when
sending message 504 (key 0x649b059e) to node 1
Jun 5 15:12:43 server1 kernel: [ 2191.186474] o2dlm: Waiting on the death of
node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:12:44 server1 kernel: [ 2191.603442]
(pool,9480,3):dlm_do_master_request:1332 ERROR: link to 1 went down!
Jun 5 15:12:44 server1 kernel: [ 2191.603449]
(pool,9480,3):dlm_get_lock_resource:917 ERROR: status = -107
Jun 5 15:12:48 server1 kernel: [ 2196.290472]
(xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -107 when
sending message 504 (key 0x649b059e) to node 1
Jun 5 15:12:48 server1 kernel: [ 2196.290480] o2dlm: Waiting on the death of
node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:12:53 server1 kernel: [ 2201.394470] (xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0x649b059e) to node 1
Jun 5 15:12:53 server1 kernel: [ 2201.394477] o2dlm: Waiting on the death of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:12:58 server1 kernel: [ 2206.498469] (xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0x649b059e) to node 1
Jun 5 15:12:58 server1 kernel: [ 2206.498476] o2dlm: Waiting on the death of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:13:00 server1 kernel: [ 2207.550684] o2cb: o2dlm has evicted node 1 from domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:13:01 server1 kernel: [ 2208.562466] o2dlm: Waiting on the recovery of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:13:03 server1 kernel: [ 2211.434473] o2dlm: Begin recovery on domain F18CB82626444DD0913312B7AE741C5B for node 1
Jun 5 15:13:03 server1 kernel: [ 2211.434501] o2dlm: Node 0 (me) is the Recovery Master for the dead node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:13:03 server1 kernel: [ 2211.434597] o2dlm: End recovery on domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:13:04 server1 kernel: [ 2211.602493] (pool,9480,3):dlm_restart_lock_mastery:1221 ERROR: node down! 1
Jun 5 15:13:04 server1 kernel: [ 2211.602502] (pool,9480,3):dlm_wait_for_lock_mastery:1038 ERROR: status = -11
Jun 5 15:13:05 server1 kernel: [ 2212.606674] ocfs2: Begin replay journal (node 1, slot 1) on device (147,0)
Jun 5 15:13:06 server1 kernel: [ 2214.350572] ocfs2: End replay journal (node 1, slot 1) on device (147,0)
Jun 5 15:13:06 server1 kernel: [ 2214.360790] ocfs2: Beginning quota recovery on device (147,0) for slot 1
Jun 5 15:13:06 server1 kernel: [ 2214.386783] ocfs2: Finishing quota recovery on device (147,0) for slot 1
Jun 5 15:13:07 server1 logger: /etc/xen/scripts/block: add XENBUS_PATH=backend/vbd/4/768
Jun 5 15:13:07 server1 logger: /etc/xen/scripts/block: add XENBUS_PATH=backend/vbd/4/5632
Jun 5 15:13:07 server1 kernel: [ 2214.638622] device tap4.0 entered promiscuous mode
Jun 5 15:13:07 server1 kernel: [ 2214.638685] xenbr1: port 2(tap4.0) entering forwarding state
Jun 5 15:13:07 server1 kernel: [ 2214.638699] xenbr1: port 2(tap4.0) entering forwarding state
Jun 5 15:13:07 server1 NetworkManager[887]: SCPlugin-Ifupdown: devices added (path: /sys/devices/vif-4-0/net/vif4.0, iface: vif4.0)
Jun 5 15:13:07 server1 NetworkManager[887]: SCPlugin-Ifupdown: device added (path: /sys/devices/vif-4-0/net/vif4.0, iface: vif4.0): no ifupdown configuration found.
Jun 5 15:13:07 server1 NetworkManager[887]: <warn> failed to allocate link cache: (-10) Operation not supported
Jun 5 15:13:07 server1 NetworkManager[887]: <info> (vif4.0): carrier is OFF
Jun 5 15:13:07 server1 NetworkManager[887]: <error> [1401961387.118193] [nm-device-ethernet.c:456] real_update_permanent_hw_address(): (vif4.0): unable to read permanent MAC address (error 0)
Jun 5 15:13:07 server1 NetworkManager[887]: <info> (vif4.0): new Ethernet device (driver: 'vif' ifindex: 12)
Jun 5 15:13:07 server1 NetworkManager[887]: <info> (vif4.0): exported as /org/freedesktop/NetworkManager/Devices/6
Jun 5 15:13:07 server1 NetworkManager[887]: <info> (vif4.0): now managed
Jun 5 15:13:07 server1 NetworkManager[887]: <info> (vif4.0): device state change: unmanaged -> unavailable (reason 'managed') [10 20 2]
Jun 5 15:13:07 server1 NetworkManager[887]: <info> (vif4.0): bringing up device.
Jun 5 15:13:07 server1 NetworkManager[887]: <info> (vif4.0): preparing device.
Jun 5 15:13:07 server1 NetworkManager[887]: <info> (vif4.0): deactivating device (reason 'managed') [2]
Jun 5 15:13:07 server1 NetworkManager[887]: <info> Unmanaged Device found; state CONNECTED forced. (see http://bugs.launchpad.net/bugs/191889)
Jun 5 15:13:07 server1 NetworkManager[887]: <info> Unmanaged Device found; state CONNECTED forced. (see http://bugs.launchpad.net/bugs/191889)
Jun 5 15:13:07 server1 NetworkManager[887]: <info> Added default wired connection 'Wired connection 5' for /sys/devices/vif-4-0/net/vif4.0
Jun 5 15:13:07 server1 kernel: [ 2214.659589] ADDRCONF(NETDEV_UP): vif4.0: link is not ready
Jun 5 15:13:07 server1 kernel: [ 2214.660699] ADDRCONF(NETDEV_UP): vif4.0: link is not ready
Jun 5 15:13:07 server1 logger: /etc/xen/scripts/vif-bridge: online type_if=vif XENBUS_PATH=backend/vif/4/0
Jun 5 15:13:07 server1 logger: /etc/xen/scripts/vif-bridge: add type_if=tap XENBUS_PATH=
Jun 5 15:13:07 server1 logger: /etc/xen/scripts/block: Writing backend/vbd/4/768/node /dev/loop0 to xenstore.
Jun 5 15:13:07 server1 logger: /etc/xen/scripts/block: Writing backend/vbd/4/768/physical-device 7:0 to xenstore.
Jun 5 15:13:07 server1 logger: /etc/xen/scripts/block: Writing backend/vbd/4/768/hotplug-status connected to xenstore.
Jun 5 15:13:07 server1 kernel: [ 2214.842610] xenbr1: port 2(tap4.0) entering forwarding state
Jun 5 15:13:07 server1 kernel: [ 2214.852647] device vif4.0 entered promiscuous mode
Jun 5 15:13:07 server1 kernel: [ 2214.858373] ADDRCONF(NETDEV_UP): vif4.0: link is not ready
Jun 5 15:13:07 server1 kernel: [ 2214.861475] xenbr1: port 2(tap4.0) entering forwarding state
Jun 5 15:13:07 server1 kernel: [ 2214.861487] xenbr1: port 2(tap4.0) entering forwarding state
Jun 5 15:13:07 server1 logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge add for tap4.0, bridge xenbr1.
Jun 5 15:13:07 server1 NetworkManager[887]: SCPlugin-Ifupdown: devices added (path: /sys/devices/virtual/net/tap4.0, iface: tap4.0)
Jun 5 15:13:07 server1 NetworkManager[887]: SCPlugin-Ifupdown: device added (path: /sys/devices/virtual/net/tap4.0, iface: tap4.0): no ifupdown configuration found.
Jun 5 15:13:07 server1 NetworkManager[887]: <warn> /sys/devices/virtual/net/tap4.0: couldn't determine device driver; ignoring...
Jun 5 15:13:07 server1 logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge online for vif4.0, bridge xenbr1.
Jun 5 15:13:07 server1 logger: /etc/xen/scripts/vif-bridge: Writing backend/vif/4/0/hotplug-status connected to xenstore.
Jun 5 15:13:07 server1 logger: /etc/xen/scripts/block: Writing backend/vbd/4/5632/node /dev/loop1 to xenstore.
Jun 5 15:13:07 server1 logger: /etc/xen/scripts/block: Writing backend/vbd/4/5632/physical-device 7:1 to xenstore.
Jun 5 15:13:07 server1 logger: /etc/xen/scripts/block: Writing backend/vbd/4/5632/hotplug-status connected to xenstore.
Jun 5 15:13:08 server1 avahi-daemon[898]: Joining mDNS multicast group on interface tap4.0.IPv6 with address fe80::fcff:ffff:feff:ffff.
Jun 5 15:13:08 server1 avahi-daemon[898]: New relevant interface tap4.0.IPv6 for mDNS.
Jun 5 15:13:08 server1 avahi-daemon[898]: Registering new address record for fe80::fcff:ffff:feff:ffff on tap4.0.*.
Jun 5 15:13:17 server1 kernel: [ 2225.202456] tap4.0: no IPv6 routers present
_______________________________________________
Pacemaker mailing list: [email protected]
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org