I've been testing an implementation of an HA MySQL cluster for a few months now.
I came to this project with no prior knowledge of what was involved/needed
and have learned organically via various online how-tos and web sites, which in
many cases ranged from slightly out of date to missing large chunks of pertinent
information. That's not a criticism at all of those still helpful aids, but
more an indication of how there are huge holes in my knowledge.
So with that background ...
The cluster consists of 2 CentOS 7 servers (estrela and rafeiro) running
DRBD90
corosync 2.4.5
pacemaker 0.9.169
On the whole it's all running fine, with some squeaks that we are hoping are down
to underlying SAN issues.
However...
Earlier this week we had some split-brain issues, some of which seem to have
fixed themselves, others not. What we did notice was that whilst the split-brain
was being reported the overall cluster remained up (of course?) in that the VIP
remained up, and the MySQL instance remained available via the VIP on port
3306. The underlying concern being of course that had a "flip" occurred from
the previous master to the previous slave, the new master's DRBD device (mounted
on /var/lib/mysql) may well be out of sync and thus contain "old" data.
So - system logs recently show this
ESTRELA
Oct 18th
Oct 18 04:04:28 wp-vldyn-estrela kernel: [584651.491139] drbd mysql01/0 drbd0:
Split-Brain detected, 1 primaries, automatically solved. Sync from peer node
Oct 18 04:04:28 wp-vldyn-estrela kernel: [584651.491139] drbd mysql01/0 drbd0:
Split-Brain detected, 1 primaries, automatically solved. Sync from peer node
Oct 19th
Oct 19 03:45:43 wp-vldyn-estrela kernel: [47892.092191] drbd mysql01/0 drbd0:
Split-Brain detected but unresolved, dropping connection!
Oct 19 03:45:43 wp-vldyn-estrela kernel: [47892.092191] drbd mysql01/0 drbd0:
Split-Brain detected but unresolved, dropping connection!
RAFEIRO
Oct 18
Oct 18 04:04:28 wp-vldyn-rafeiro kernel: [584652.907126] drbd mysql01/0 drbd0:
Split-Brain detected, 1 primaries, automatically solved. Sync from this node
Oct 18 04:04:28 wp-vldyn-rafeiro kernel: [584652.907126] drbd mysql01/0 drbd0:
Split-Brain detected, 1 primaries, automatically solved. Sync from this node
Oct 19
Oct 19 03:45:43 wp-vldyn-rafeiro kernel: [47864.401284] drbd mysql01/0 drbd0:
Split-Brain detected but unresolved, dropping connection!
Oct 19 03:45:43 wp-vldyn-rafeiro kernel: [47864.401284] drbd mysql01/0 drbd0:
Split-Brain detected but unresolved, dropping connection!
So on the 18th the split-brain issue was detected but (automatically?) fixed.
But on the 19th it wasn't...
Any ideas how to investigate why it worked on the 18th and not the 19th? I am
presuming the DRBD config is set up to automatically fix stuff, but maybe we
just got lucky and it isn't? (I've googled automatic fixes but I am afraid I
can't follow what I'm being told/reading :-( )
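In case it helps anyone reproduce that summary, here is a minimal Python sketch of how the kernel messages could be tallied per host, day and outcome. It assumes each message sits on a single line in syslog (the ones quoted in this mail are wrapped), and the regex, function names and sample lines are purely illustrative, not anything that ships with DRBD.

```python
import re
from collections import Counter

# Assumed single-line shape of a DRBD split-brain kernel message in syslog.
SPLIT_BRAIN = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>[\d:]{8})\s+(?P<host>\S+)\s+kernel:"
    r".*?drbd\s+(?P<resource>\S+)\s+\S+:\s+Split-Brain detected(?P<rest>.*)$"
)


def classify(rest):
    """Map the text after 'Split-Brain detected' to an outcome label."""
    if "automatically solved" in rest:
        return "resolved"
    if "unresolved" in rest:
        return "unresolved"
    return "unknown"


def summarise(lines):
    """Count split-brain events per (host, date, outcome).

    Byte-identical repeated lines are counted once.
    """
    seen = set()
    counts = Counter()
    for raw in lines:
        line = raw.strip()
        match = SPLIT_BRAIN.match(line)
        if not match or line in seen:
            continue
        seen.add(line)
        date = f'{match["month"]} {match["day"]}'
        counts[(match["host"], date, classify(match["rest"]))] += 1
    return counts


# Sample lines taken from the logs in this mail, re-joined onto one line each.
SAMPLE = [
    "Oct 18 04:04:28 wp-vldyn-estrela kernel: [584651.491139] drbd mysql01/0 drbd0: "
    "Split-Brain detected, 1 primaries, automatically solved. Sync from peer node",
    "Oct 19 03:45:43 wp-vldyn-estrela kernel: [47892.092191] drbd mysql01/0 drbd0: "
    "Split-Brain detected but unresolved, dropping connection!",
]

if __name__ == "__main__":
    for (host, date, outcome), n in sorted(summarise(SAMPLE).items()):
        print(host, date, outcome, n)
```

Feeding it the lines from the system log on each node should give one row per host, day and outcome, which makes it easier to line up the two nodes' views of the same event.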
drbd config below
ta
ian
==================
ESTRELA
resource mysql01 {
    protocol C;
    meta-disk internal;
    device /dev/drbd0;
    disk /dev/vg_mysql/lv_mysql;
    handlers {
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    }
    net {
        allow-two-primaries no;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }
    disk {
        on-io-error detach;
    }
    syncer {
        verify-alg sha1;
    }
    on estrela {
        address 10.108.248.165:7789;
    }
    on rafeiro {
        address 10.108.248.166:7789;
    }
}
RAFEIRO
resource mysql01 {
    protocol C;
    meta-disk internal;
    device /dev/drbd0;
    disk /dev/vg_mysql/lv_mysql;
    handlers {
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    }
    net {
        allow-two-primaries no;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }
    disk {
        on-io-error detach;
    }
    syncer {
        verify-alg sha1;
    }
    on estrela {
        address 10.108.248.165:7789;
    }
    on rafeiro {
        address 10.108.248.166:7789;
    }
}