On 13 Jun 2014, at 9:21 pm, Jason Hendry <[email protected]> wrote:
>
> Hi Everyone,
>
> This is my first post, please let me know if I am missing any
> standard/essential information to help with debugging...
>
> I have a 2-node cluster with node-level fencing. The cluster appears to be
> configured with "Blind Faith", but my nodes are still killing each other if
> the host is up while the cluster is not running on it. To reproduce this I:
>
> Power on both nodes
> Stop the cluster on both nodes [pcs cluster stop]
> Start the cluster on a single node [pcs cluster start]
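>
> Concretely, those steps come down to (a sketch; run as root, node names as
> used in my configuration below):
>
>   pcs cluster stop     # run on both ha-nfs1 and ha-nfs2
>   pcs cluster start    # run on ha-nfs1 only; ha-nfs2 stays booted but
>                        # never joins the cluster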
>
> After starting the cluster I get this message in the cluster logs:
>
> Jun 13 09:59:48 [15756] dev-drbd01.london.mintel.ad pengine: warning:
> unpack_nodes: Blind faith: not fencing unseen nodes
> Jun 13 09:59:48 [15756] dev-drbd01.london.mintel.ad pengine: info:
> determine_online_status_fencing: Node ha-nfs1 is active
> Jun 13 09:59:48 [15756] dev-drbd01.london.mintel.ad pengine: info:
> determine_online_status: Node ha-nfs1 is online
> Jun 13 09:59:48 [15756] dev-drbd01.london.mintel.ad pengine: warning:
> pe_fence_node: Node ha-nfs2 will be fenced because the peer has not been seen
> by the cluster
>
> Am I misunderstanding the meaning of "Blind faith", or is something
> misconfigured?
Looks like you might have found a bug.
"Blind faith" is a particularly dangerous option to turn on, so it doesn't get
tested very often.
A few lines further down in your logs should be a message from pengine that
looks something like:
Jun 13 09:59:48 [15756] dev-drbd01.london.mintel.ad pengine: warning:
process_pe_message: Calculated Transition ${X}:
/var/lib/pacemaker/pengine/pe-warn-${Y}.bz2
If you can send us that file, I'll make sure it gets fixed.
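In the meantime, if you want to inspect it yourself, you can replay that
input locally. A rough sketch, assuming default Pacemaker paths and running
on the DC; the "-0" in the file name below is only an example, substitute
the ${Y} from your logs:

  # newest policy engine input on the DC
  ls -t /var/lib/pacemaker/pengine/pe-warn-*.bz2 | head -1
  # recreate the cluster state it captures and the transition calculated
  crm_simulate -S -x /var/lib/pacemaker/pengine/pe-warn-0.bz2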
> Both my nodes are:
>
> CentOS 6.5 (Final) (uname -a: Linux dev-drbd01.london.mintel.ad
> 2.6.32-431.17.1.el6.x86_64 #1 SMP Wed May 7 23:32:49 UTC 2014 x86_64 x86_64
> x86_64 GNU/Linux)
> pacemakerd --version: Pacemaker 1.1.10-14.el6_5.3
>
> Here is my cluster configuration:
>
>
> pcs resource create nfsDRBD ocf:linbit:drbd drbd_resource=nfs
> op monitor interval=8s meta migration-threshold=0
> pcs resource create nfsLVM ocf:heartbeat:LVM
> volgrpname="vg_drbd" op monitor interval=7s meta migration-threshold=0
> pcs resource create nfsDir ocf:heartbeat:Filesystem
> device=/dev/vg_drbd/lv_nfs_home directory=/data/nfs/home fstype=ext4
> run_fsck=force op monitor interval=6s meta migration-threshold=0
> pcs resource create nfsService lsb:nfs op monitor interval=5s meta
> migration-threshold=0
> pcs resource create nfsIP ocf:heartbeat:IPaddr2 ip=a.b.c.d
> cidr_netmask=32 op monitor interval=9s meta migration-threshold=0
> pcs resource create network_ping ocf:pacemaker:ping name=network_ping
> multiplier=5 host_list="a.b.c.d w.x.y.z" attempts=3 timeout=1
> failure_score=10 op monitor interval=4s
> pcs resource clone network_ping meta interleave=true
>
> pcs resource master nfsDRBD_ms nfsDRBD master-max=1 master-node-max=1
> clone-max=2 clone-node-max=1 notify=true target-role=Started is-managed=true
> pcs resource group add nfsGroup nfsLVM nfsDir nfsService nfsIP
>
> pcs constraint order promote nfsDRBD_ms then start nfsGroup kind=Mandatory
> symmetrical=false
> pcs constraint order stop nfsGroup then demote nfsDRBD_ms kind=Optional
> symmetrical=false
> pcs constraint colocation add nfsGroup with master nfsDRBD_ms INFINITY
>
> pcs property set no-quorum-policy=ignore
> pcs property set expected-quorum-votes=1
> pcs property set stonith-enabled=true
> pcs property set default-resource-stickiness=200
> pcs property set batch-limit=1
> pcs property set startup-fencing=false
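>
> The relevant option can be double-checked after setting it (a quick sanity
> check; property listing output may vary between pcs versions):
>
>   pcs property list | grep startup-fencing
>   # expected: startup-fencing: false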
>
> pcs stonith create ha-nfs1_poweroff fence_virsh action=off ipaddr=a.b.c.d
> login=stonith secure=yes identity_file=/data/stonith_id_rsa
> port=dev-drbd01.london pcmk_host_map="ha-nfs1:dev-drbd01.london"
> meta priority=200
> pcs stonith create ha-nfs2_poweroff fence_virsh action=off ipaddr=w.x.y.z
> login=stonith secure=yes identity_file=/data/stonith_id_rsa
> port=dev-drbd02.london pcmk_host_map="ha-nfs2:dev-drbd02.london"
> meta priority=200
>
> pcs stonith level add 1 ha-nfs1 ha-nfs1_poweroff
> pcs stonith level add 1 ha-nfs2 ha-nfs2_poweroff
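>
> Each fence device can also be tested by hand with the agent's CLI (a sketch
> using the standard fence-agent flags and the credentials from my config
> above; -o status only queries the domain, it does not power it off):
>
>   fence_virsh -a w.x.y.z -l stonith -x -k /data/stonith_id_rsa \
>       -n dev-drbd02.london -o status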
>
> pcs constraint location ha-nfs1_poweroff prefers ha-nfs1=-INFINITY
> pcs constraint location ha-nfs2_poweroff prefers ha-nfs2=-INFINITY
> pcs constraint location nfsDRBD rule role=Master defined network_ping
>
> Jason H
_______________________________________________
Pacemaker mailing list: [email protected]
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
