On Tue, 2017-12-12 at 15:30 +0200, Антон Сацкий wrote:
> Hi list,
> I need your help.
> I have 2 servers using Pacemaker, Corosync and DRBD.
>
> [root@voipserver ~]# pcs config
> Cluster Name: ClusterKrusher
> Corosync Nodes:
>  voipserver.primary voipserver.backup
> Pacemaker Nodes:
>  voipserver.backup voipserver.primary
>
> Resources:
>  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: cidr_netmask=32 ip=172.20.11.10
>   Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
>               start interval=0s timeout=20s (ClusterIP-start-interval-0s)
>               stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
>  Master: WebDataClone
>   Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1 clone-node-max=1
>   Resource: WebData (class=ocf provider=linbit type=drbd)
>    Attributes: drbd_resource=r0
>    Operations: demote interval=0s timeout=90 (WebData-demote-interval-0s)
>                monitor interval=60s (WebData-monitor-interval-60s)

My drbd is too rusty to comment on your specific issue, but a couple of
general notes:

* master/slave resources need two monitors, the regular one (which
  monitors the slave role) and a second one (with a different interval)
  monitoring the master role

* fencing needs to be configured in pacemaker, and drbd needs to be
  configured to use pacemaker fencing, otherwise split-brain can more
  easily happen

>                promote interval=0s timeout=90 (WebData-promote-interval-0s)
>                start interval=0s timeout=240 (WebData-start-interval-0s)
>                stop interval=0s timeout=100 (WebData-stop-interval-0s)
>  Resource: WebFS (class=ocf provider=heartbeat type=Filesystem)
>   Attributes: device=/dev/drbd1 directory=/replica fstype=ext3
>   Operations: monitor interval=20 timeout=40 (WebFS-monitor-interval-20)
>               start interval=0s timeout=60 (WebFS-start-interval-0s)
>               stop interval=0s timeout=60 (WebFS-stop-interval-0s)
>  Resource: Asterisk (class=lsb type=asterisk)
>   Operations: monitor interval=15 timeout=15 (Asterisk-monitor-interval-15)
>               start interval=0s timeout=15 (Asterisk-start-interval-0s)
>               stop interval=0s timeout=15 (Asterisk-stop-interval-0s)
>  Resource: MYSQL (class=lsb type=mysql)
>   Operations: monitor interval=15 timeout=15 (MYSQL-monitor-interval-15)
>               start interval=0s timeout=15 (MYSQL-start-interval-0s)
>               stop interval=0s timeout=15 (MYSQL-stop-interval-0s)
>
> Stonith Devices:
> Fencing Levels:
>
> Location Constraints:
> Ordering Constraints:
>   promote WebDataClone then start WebFS (kind:Mandatory)
>   start WebFS then start MYSQL (kind:Mandatory)
>   start ClusterIP then start Asterisk (kind:Mandatory)
> Colocation Constraints:
>   WebFS with WebDataClone (score:INFINITY) (with-rsc-role:Master)
>   MYSQL with WebFS (score:INFINITY)
>   Asterisk with ClusterIP (score:INFINITY)
> Ticket Constraints:
>
> Alerts:
>  No alerts defined
>
> Resources Defaults:
>  resource-stickiness: 100
> Operations Defaults:
>  No defaults set
>
> Cluster Properties:
>  cluster-infrastructure: corosync
>  cluster-name: ClusterKrusher
>  dc-version: 1.1.16-12.el7_4.2-94ff4df
>  have-watchdog: false
>  stonith-enabled: false
>
> Quorum:
>   Options:
> ===================
>
>
> After some time I got this in the logs:
> [root@voipserver ~]# cat /var/log/messages |grep drbd
> Dec 12 14:08:52 voipserver kernel: block drbd1: role( Secondary -> Primary )
> Dec 12 14:08:52 voipserver Filesystem(WebFS)[64935]: INFO: Running start for /dev/drbd1 on /replica
> Dec 12 14:08:52 voipserver kernel: EXT4-fs (drbd1): mounting ext3 file system using the ext4 subsystem
> Dec 12 14:08:53 voipserver kernel: EXT4-fs (drbd1): mounted filesystem with ordered data mode. Opts: (null)
> Dec 12 14:18:13 voipserver Filesystem(WebFS)[3134]: INFO: Running stop for /dev/drbd1 on /replica
> Dec 12 14:18:17 voipserver Filesystem(WebFS)[3319]: INFO: Running start for /dev/drbd1 on /replica
> Dec 12 14:18:17 voipserver kernel: EXT4-fs (drbd1): mounting ext3 file system using the ext4 subsystem
> Dec 12 14:18:17 voipserver kernel: EXT4-fs (drbd1): mounted filesystem with ordered data mode. Opts: (null)
> Dec 12 14:44:07 voipserver Filesystem(WebFS)[11669]: INFO: Running stop for /dev/drbd1 on /replica
> Dec 12 14:44:07 voipserver kernel: block drbd1: role( Primary -> Secondary )
> Dec 12 14:44:07 voipserver kernel: block drbd1: 3552 KB (888 bits) marked out-of-sync by on disk bit-map.
> Dec 12 14:44:08 voipserver kernel: block drbd1: disk( UpToDate -> Failed )
> Dec 12 14:44:08 voipserver kernel: block drbd1: 3552 KB (888 bits) marked out-of-sync by on disk bit-map.
> Dec 12 14:44:08 voipserver kernel: block drbd1: disk( Failed -> Diskless )
> Dec 12 14:44:08 voipserver kernel: drbd r0: Terminating drbd_w_r0
> Dec 12 14:44:19 voipserver kernel: drbd: loading out-of-tree module taints kernel.
> Dec 12 14:44:19 voipserver kernel: drbd: module verification failed: signature and/or required key missing - tainting kernel
> Dec 12 14:44:19 voipserver systemd-modules-load: Inserted module 'drbd'
> Dec 12 14:44:19 voipserver kernel: drbd: initialized. Version: 8.4.10-1 (api:1/proto:86-101)
> Dec 12 14:44:19 voipserver kernel: drbd: GIT-hash: a4d5de01fffd7e4cde48a080e2c686f9e8cebf4c build by mockbuild@, 2017-09-15 14:23:22
> Dec 12 14:44:19 voipserver kernel: drbd: registered as block device major 147
> Dec 12 14:45:02 voipserver Filesystem(WebFS)[1400]: WARNING: Couldn't find device [/dev/drbd1]. Expected /dev/??? to exist
> Dec 12 14:45:03 voipserver kernel: drbd r0: Starting worker thread (from drbdsetup-84 [1524])
> Dec 12 14:45:03 voipserver kernel: block drbd1: disk( Diskless -> Attaching )
> Dec 12 14:45:03 voipserver kernel: drbd r0: Method to ensure write ordering: flush
> Dec 12 14:45:03 voipserver kernel: block drbd1: max BIO size = 524288
> Dec 12 14:45:03 voipserver kernel: block drbd1: drbd_bm_resize called with capacity == 419153344
> Dec 12 14:45:03 voipserver kernel: block drbd1: resync bitmap: bits=52394168 words=818659 pages=1599
> Dec 12 14:45:03 voipserver kernel: block drbd1: size = 200 GB (209576672 KB)
> Dec 12 14:45:03 voipserver kernel: block drbd1: recounting of set bits took additional 1 jiffies
> Dec 12 14:45:03 voipserver kernel: block drbd1: 3552 KB (888 bits) marked out-of-sync by on disk bit-map.
> Dec 12 14:45:03 voipserver kernel: block drbd1: disk( Attaching -> UpToDate )
> Dec 12 14:45:03 voipserver kernel: block drbd1: attached to UUIDs FBA12F26BE1DEE73:EE5942173C75DE98:1BF4DECFE20D51E2:1BF3DECFE20D51E3
> Dec 12 14:45:03 voipserver kernel: drbd r0: conn( StandAlone -> Unconnected )
> Dec 12 14:45:03 voipserver kernel: drbd r0: Starting receiver thread (from drbd_w_r0 [1525])
> Dec 12 14:45:03 voipserver kernel: drbd r0: receiver (re)started
> Dec 12 14:45:03 voipserver kernel: drbd r0: conn( Unconnected -> WFConnection )
> Dec 12 14:45:03 voipserver kernel: drbd r0: Handshake successful: Agreed network protocol version 101
> Dec 12 14:45:03 voipserver kernel: drbd r0: Feature flags enabled on protocol level: 0x7 TRIM THIN_RESYNC WRITE_SAME.
> Dec 12 14:45:03 voipserver kernel: drbd r0: conn( WFConnection -> WFReportParams )
> Dec 12 14:45:03 voipserver kernel: drbd r0: Starting ack_recv thread (from drbd_r_r0 [1534])
> Dec 12 14:45:03 voipserver kernel: block drbd1: drbd_sync_handshake:
> Dec 12 14:45:03 voipserver kernel: block drbd1: self FBA12F26BE1DEE72:EE5942173C75DE98:1BF4DECFE20D51E2:1BF3DECFE20D51E3 bits:888 flags:0
> Dec 12 14:45:03 voipserver kernel: block drbd1: peer 93BB6F0A5075345D:EE5942173C75DE99:1BF4DECFE20D51E3:1BF3DECFE20D51E3 bits:38004 flags:2
> Dec 12 14:45:03 voipserver kernel: block drbd1: uuid_compare()=100 by rule 90
> Dec 12 14:45:03 voipserver kernel: block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1
> Dec 12 14:45:03 voipserver kernel: block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1 exit code 0 (0x0)
> Dec 12 14:45:03 voipserver kernel: block drbd1: Split-Brain detected but unresolved, dropping connection!
> Dec 12 14:45:03 voipserver kernel: block drbd1: helper command: /sbin/drbdadm split-brain minor-1
> Dec 12 14:45:03 voipserver kernel: block drbd1: helper command: /sbin/drbdadm split-brain minor-1 exit code 0 (0x0)
> Dec 12 14:45:03 voipserver kernel: drbd r0: conn( WFReportParams -> Disconnecting )
> Dec 12 14:45:03 voipserver kernel: drbd r0: error receiving ReportState, e: -5 l: 0!
> Dec 12 14:45:03 voipserver kernel: drbd r0: ack_receiver terminated
> Dec 12 14:45:03 voipserver kernel: drbd r0: Terminating drbd_a_r0
> Dec 12 14:45:03 voipserver kernel: drbd r0: Connection closed
> Dec 12 14:45:03 voipserver kernel: drbd r0: conn( Disconnecting -> StandAlone )
> Dec 12 14:45:03 voipserver kernel: drbd r0: receiver terminated
> Dec 12 14:45:03 voipserver kernel: drbd r0: Terminating drbd_r_r0
>
>
> So I now need to decide on the best way to configure split-brain
> recovery; config files appreciated.
>
> Primary
> [root@voipserver ~]# drbd-overview
> NOTE: drbd-overview will be deprecated soon.
> Please consider using drbdtop.
>
>  1:r0/0 WFConnection Primary/Unknown UpToDate/DUnknown /replica ext3 197G 720M 186G 1%
>
> Secondary
>
> [root@voipserver ~]# drbd-overview
> NOTE: drbd-overview will be deprecated soon.
> Please consider using drbdtop.
>
>  1:r0/0 StandAlone Secondary/Unknown UpToDate/DUnknown
>
>
> THANKS
>
> --
> Best regards
> Antony
> tel. +380669197533
> tel2. +380636564340
> Paypal http://paypal.me/Satskiy
> [email protected]
> _______________________________________________
> Users mailing list: [email protected]
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
-- 
Ken Gaillot <[email protected]>
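
As an illustration of the second general note in the reply (drbd configured
to use pacemaker fencing), here is a minimal sketch of what the r0 resource
might look like on DRBD 8.4, the version shown in the quoted logs. It is not
taken from the poster's actual configuration: the handler script paths are
the usual drbd-utils locations and are an assumption, and
resource-and-stonith only makes sense once a stonith device exists in
pacemaker and stonith-enabled is true.

```
# Sketch only; merge into the existing definition of r0.
resource r0 {
  disk {
    # On loss of the peer, freeze I/O and call the fence-peer handler.
    # Assumes working stonith in pacemaker (not the case in the quoted config).
    fencing resource-and-stonith;
  }
  handlers {
    # Assumed install paths of the scripts shipped with drbd-utils.
    # fence-peer places a constraint in the CIB so the outdated peer
    # cannot be promoted; after-resync-target removes it after resync.
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  # existing device, disk, meta-disk and address settings stay unchanged
}
```

The first note (a second monitor for the master role) is a pacemaker-side
change rather than a drbd one; with the pcs version on el7 it could probably
be added with something like
"pcs resource op add WebData monitor interval=59s role=Master", where the
59s interval is only an example value chosen to differ from the existing 60s
monitor.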
