Cédric Le Goater <[email protected]> writes:
> On 1/5/24 19:04, Fabiano Rosas wrote:
>> The migration tests have support for being passed two QEMU binaries to
>> test migration compatibility.
>>
>> Add a CI job that builds the latest release of QEMU and another job
>> that uses that version plus an already-present build of the current
>> version and runs the migration tests with the two, both as source and
>> destination. I.e.:
>>
>> old QEMU (n-1) -> current QEMU (development tree)
>> current QEMU (development tree) -> old QEMU (n-1)
>>
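(In migration-test terms, as in the script section of the diff below,
that corresponds roughly to:

  # old -> new: previous release as migration source
  QTEST_QEMU_BINARY_SRC=../build-previous/qemu-system-x86_64 \
      QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test

  # new -> old: previous release as migration destination
  QTEST_QEMU_BINARY_DST=../build-previous/qemu-system-x86_64 \
      QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test

with x86_64 standing in for whichever target is being tested.)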
>> The purpose of this CI job is to ensure the code we're about to merge
>> will not cause a migration compatibility problem when migrating the
>> next release (which will contain that code) to/from the previous
>> release.
>>
>> I'm leaving the jobs as manual for now because using an older QEMU in
>> tests could hit bugs that have already been fixed in the current
>> development tree, and we need to handle those case by case.
>>
>> Note: for user forks, the version tags need to be pushed to gitlab,
>> otherwise the job won't be able to check out a different version.
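
(Concretely, that would be something like

  git push gitlab v8.2.0

assuming a remote called "gitlab" pointing at the fork, and with
v8.2.0 as an example of the previous-release tag.)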
>>
>> Signed-off-by: Fabiano Rosas <[email protected]>
>> ---
>> .gitlab-ci.d/buildtest.yml | 53 ++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 53 insertions(+)
>>
>> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
>> index 91663946de..81163a3f6a 100644
>> --- a/.gitlab-ci.d/buildtest.yml
>> +++ b/.gitlab-ci.d/buildtest.yml
>> @@ -167,6 +167,59 @@ build-system-centos:
>>         x86_64-softmmu rx-softmmu sh4-softmmu nios2-softmmu
>>      MAKE_CHECK_ARGS: check-build
>>  
>> +build-previous-qemu:
>> +  extends: .native_build_job_template
>> +  artifacts:
>> +    when: on_success
>> +    expire_in: 2 days
>> +    paths:
>> +      - build-previous
>> +    exclude:
>> +      - build-previous/**/*.p
>> +      - build-previous/**/*.a.p
>> +      - build-previous/**/*.fa.p
>> +      - build-previous/**/*.c.o
>> +      - build-previous/**/*.c.o.d
>> +      - build-previous/**/*.fa
>> +  needs:
>> +    job: amd64-opensuse-leap-container
>> +  variables:
>> +    QEMU_JOB_OPTIONAL: 1
>> +    IMAGE: opensuse-leap
>> +    TARGETS: x86_64-softmmu aarch64-softmmu
>> +  before_script:
>> +    - export QEMU_PREV_VERSION="$(sed 's/\([0-9.]*\)\.[0-9]*/v\1.0/' VERSION)"
>> +    - git checkout $QEMU_PREV_VERSION
>> +  after_script:
>> +    - mv build build-previous
>> +
>> +.migration-compat-common:
>> +  extends: .common_test_job_template
>> +  needs:
>> +    - job: build-previous-qemu
>> +    - job: build-system-opensuse
>> +  allow_failure: true
>> +  variables:
>> +    QEMU_JOB_OPTIONAL: 1
>> +    IMAGE: opensuse-leap
>> +    MAKE_CHECK_ARGS: check-build
>> +  script:
>> +    - cd build
>> +    - QTEST_QEMU_BINARY_SRC=../build-previous/qemu-system-${TARGET}
>> +        QTEST_QEMU_BINARY=./qemu-system-${TARGET} ./tests/qtest/migration-test
>> +    - QTEST_QEMU_BINARY_DST=../build-previous/qemu-system-${TARGET}
>> +        QTEST_QEMU_BINARY=./qemu-system-${TARGET} ./tests/qtest/migration-test
>> +
>> +migration-compat-aarch64:
>> +  extends: .migration-compat-common
>> +  variables:
>> +    TARGET: aarch64
>> +
>> +migration-compat-x86_64:
>> +  extends: .migration-compat-common
>> +  variables:
>> +    TARGET: x86_64
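
(Aside: the sed in before_script derives the previous release tag from
the development tree's VERSION file by replacing the micro version
with .0, e.g.:

  $ echo 8.2.50 | sed 's/\([0-9.]*\)\.[0-9]*/v\1.0/'
  v8.2.0

where 8.2.50 is an example of what VERSION contains during the
development cycle.)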
>
>
> What about the other archs, s390x and ppc? Do you lack the resources
> or are there any problems to address?
Currently s390x and ppc are only tested on KVM, which means they are
not tested at all unless someone runs migration-test on a custom
runner. The same is true for this test.

The TCG tests have been disabled:

  /*
   * On ppc64, the test only works with kvm-hv, but not with kvm-pr and TCG
   * is touchy due to race conditions on dirty bits (especially on PPC for
   * some reason)
   */

  /*
   * Similar to ppc64, s390x seems to be touchy with TCG, so disable it
   * there until the problems are resolved
   */

It would be great if we could figure out what these issues are and fix
them so we can at least test with TCG like we do for aarch64.
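
With those guards removed, a TCG run amounts to something like (using
ppc64 as the example, on a host without the respective KVM support):

  cd build
  QTEST_QEMU_BINARY=./qemu-system-ppc64 ./tests/qtest/migration-test

which is what the numbers below come from.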
Doing a TCG run of migration-test with both archs (one binary only, not
this series):

- ppc survived one run, taking 6 minutes longer than x86_64/aarch64;
- s390x survived one run, taking 40s less than x86_64/aarch64.

I'll leave them enabled on my machine and do some runs here and there
to see if I spot something. If nothing comes up, we can consider
re-enabling them once we figure out why ppc takes so long.