On Tue, 18 Aug 2020 14:27:01 +0200 Lukas Straub <[email protected]> wrote:
> On Tue, 4 Aug 2020 12:46:29 +0200
> Lukas Straub <[email protected]> wrote:
>
> > Hello Everyone,
> > So here is v3. Patch 1 can already be merged independently of the others.
> > Please review.
> >
> > Regards,
> > Lukas Straub
> >
> > Based-on: <[email protected]>
> > "Introduce 'yank' oob qmp command to recover from hanging qemu"
> >
> > Changes:
> >
> > v3:
> >  -resource-agent: Don't determine local qemu state by remote master-score,
> >   query directly via qmp instead
> >  -resource-agent: Add max_queue_size parameter for colo-compare
> >  -resource-agent: Fix monitor action on secondary returning error during
> >   clean shutdown
> >  -resource-agent: Fix stop action setting master-score to 0 on primary on
> >   clean shutdown
> >
> > v2:
> >  -use new yank api
> >  -drop disk_size parameter
> >  -introduce pick_qemu_util function and use it
> >
> > Overview:
> >
> > Hello Everyone,
> > These patches introduce a resource agent for fully automatic management of
> > colo and a test suite building upon the resource agent to extensively test
> > colo.
> >
> > Test suite features:
> >  -Tests failover with peer crashing and hanging and failover during
> >   checkpoint
> >  -Tests network using ssh and iperf3
> >  -Quick test requires no special configuration
> >  -Network test for testing colo-compare
> >  -Stress test: failover all the time with network load
> >
> > Resource agent features:
> >  -Fully automatic management of colo
> >  -Handles many failures: hanging/crashing qemu, replication error, disk
> >   error, ...
> >  -Recovers from hanging qemu by using the "yank" oob command
> >  -Tracks which node has up-to-date data
> >  -Works well in clusters with more than 2 nodes
> >
> > Run times on my laptop:
> >  Quick test: 200s
> >  Network test: 800s (tagged as slow)
> >  Stress test: 1300s (tagged as slow)
> >
> > For the last two tests, the test suite needs access to a network bridge to
> > properly test the network, so some parameters need to be given to the test
> > run. See tests/acceptance/colo.py for more information.
> >
> > Regards,
> > Lukas Straub
> >
> > Lukas Straub (7):
> >   block/quorum.c: stable children names
> >   avocado_qemu: Introduce pick_qemu_util to pick qemu utility binaries
> >   boot_linux.py: Use pick_qemu_util
> >   colo: Introduce resource agent
> >   colo: Introduce high-level test suite
> >   configure,Makefile: Install colo resource-agent
> >   MAINTAINERS: Add myself as maintainer for COLO resource agent
> >
> >  MAINTAINERS                               |    6 +
> >  Makefile                                  |    5 +
> >  block/quorum.c                            |   20 +-
> >  configure                                 |   10 +
> >  scripts/colo-resource-agent/colo          | 1501 +++++++++++++++++++++
> >  scripts/colo-resource-agent/crm_master    |   44 +
> >  scripts/colo-resource-agent/crm_resource  |   12 +
> >  tests/acceptance/avocado_qemu/__init__.py |   15 +
> >  tests/acceptance/boot_linux.py            |   11 +-
> >  tests/acceptance/colo.py                  |  677 ++++++++++
> >  10 files changed, 2286 insertions(+), 15 deletions(-)
> >  create mode 100755 scripts/colo-resource-agent/colo
> >  create mode 100755 scripts/colo-resource-agent/crm_master
> >  create mode 100755 scripts/colo-resource-agent/crm_resource
> >  create mode 100644 tests/acceptance/colo.py
> >
> > --
> > 2.20.1
>
> Ping...

Ping 2...

Kevin, can you already apply patch 1 "block/quorum.c: stable children names"?
It resolves the following bug:
https://bugs.launchpad.net/qemu/+bug/1881231

Regards,
Lukas Straub
