Daniel P. Berrangé <berra...@redhat.com> writes:

> On Fri, Oct 18, 2024 at 10:46:55AM +0100, Peter Maydell wrote:
>> On Fri, 18 Oct 2024 at 10:01, Daniel P. Berrangé <berra...@redhat.com> wrote:
>> >
>> > On Thu, Oct 17, 2024 at 01:29:35PM -0300, Fabiano Rosas wrote:
>> > > Daniel P. Berrangé <berra...@redhat.com> writes:
>> > >
>> > > > On Thu, Oct 17, 2024 at 11:32:11AM -0300, Fabiano Rosas wrote:
>> > > >> Recent changes to how we invoke the migration tests have
>> > > >> (intentionally) caused them to not be part of the check-qtest target
>> > > >> anymore. Add the check-migration-quick target so we don't lose
>> > > >> migration code testing in this job.
>> > > >
>> > > > But 'check-migration-quick' is only the subset of migration tests,
>> > > > 'check-migration' is all of the migration tests. So surely this is
>> > > > a massive regression in coverage in CI pipelines.
>> > >
>> > > I'm not sure it is. There are tests there already for all the major
>> > > parts of the code: precopy, postcopy, multifd, socket. Besides, we can
>> > > tweak migration-quick to cover spots where we think we're losing
>> > > coverage.
>> >
>> > Each of the tests in migration-test was added for a good reason,
>> > generally to address testing gaps where we had functional regressions
>> > in the past. I don't think it's a good idea to stop running such tests
>> > in CI as gating on new contributions. Any time we've had optional
>> > tests in QEMU, we've seen repeated regressions in the area in question.
>> >
>> > > Since our CI offers nothing in terms of reproducibility or
>> > > debuggability, I don't think it's productive to have an increasing
>> > > amount of tests running in CI if that means we'll be dealing with
>> > > timeouts and intermittent crashes constantly.
>> >
>> > Test reliability is a different thing. If a particular test is
>> > flaky, it needs to either be fixed or disabled. Splitting into
>> > a fast & slow grouping doesn't address reliability, just hides
>> > the problem from view.
>> 
>> A lot of the current reliability issue is timeouts -- sometimes
>> our CI runners just run really slow (I have seen an example where
>> between a normal and a slow run on the same commit both the
>> compile and test times were 10x different...) So any test
>> that is not fast to complete is much, much more likely to
>> hit its timeout if the runner is running slowly. When I am
>> doing CI testing for merges "migration test timed out again"
>> is really really common.
>
> If it's frequently timing out, then we've got the timeouts
> wrong, or we have some genuine bugs in there to be fixed.
>
>> > > No disagreement here. But then I'm going to need advice on what to do
>> > > when other maintainers ask us to stop writing migration tests because
>> > > they take too long. I cannot send contributors away nor merge code
>> > > without tests.
>> >
>> > In general, I think it is unreasonable for other maintainers to
>> > tell us to stop adding test coverage for migration, and would
>> > push back against such a request.
>> 
>> We do not have infinite CI resources, unfortunately. Migration
>> is competing with everything else for time on CI. You have to
>> find a balance between "what do we run every time" and "what
>> do we only run when specifically testing a migration pullreq".
>> Similarly, there's a lot of iotests but we don't run all of them
>> for every block backend for every CI job via "make check".
>
> The combos we don't run for iotests are a good source of
> regressions too :-(
>
>> Long test times for tests run under "make check" are also bad
>> for individual developers -- if I'm running "make check" to
>> test a target/arm change I've made I don't really want that
>> to then spend 15 minutes testing the migration code that
>> I haven't touched and that is vanishingly unlikely to be
>> affected by my patches.
>
> Migration-test *used* to take 15 minutes to run, but that was a
> very long time ago. A run of it today is around 1m20.
>
> That said, if you are building multiple system emulators, we
> run the same test multiple times, and with the number of
> targets we have, that will be painful.
>
> That could be a good reason to split the migration-test into
> two distinct programs. One program that runs for every target,
> and one that is only run once, for some arbitrary "primary"
> target?

What do you mean by distinct programs? It's not the migration-test that
decides on which targets it runs, it's meson.build. We register a test()
for each target, same as with any other qtest. Maybe I misunderstood
you...

> Or could we make use of glib's g_test_thorough
> for this - a primary target runs with "SPEED=thorough" and
> all other targets with normal settings. That would give us
> a way to optimize any of the qtests to reduce redundant
> testing where appropriate.

This still requires a new make target I think. Otherwise we'd run *all*
thorough tests for a QEMU target and not only migration-test in thorough
mode.
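
To make the concern concrete, here is a rough sketch of the selection we
would actually want: thorough mode only for migration-test, and only on one
"primary" target. This is illustrative Python, not QEMU build code. The
function name, the target names and the "quick"/"thorough" labels are all
made up for the example.

```python
# Hypothetical sketch: none of these names exist in QEMU's build system.
# It only models which (target, test) pairs would run in thorough mode.

PRIMARY_TARGET = "x86_64"  # assumption: an arbitrarily chosen "primary" target


def plan_qtest_runs(targets, tests,
                    thorough_tests=("migration-test",),
                    primary=PRIMARY_TARGET):
    """Return a list of (target, test, speed) tuples.

    A test runs in "thorough" mode only if it is listed in thorough_tests
    AND the target is the primary one; everything else stays "quick".
    """
    plan = []
    for target in targets:
        for test in tests:
            thorough = target == primary and test in thorough_tests
            plan.append((target, test, "thorough" if thorough else "quick"))
    return plan


if __name__ == "__main__":
    for entry in plan_qtest_runs(["x86_64", "aarch64"],
                                 ["migration-test", "qom-test"]):
        print(entry)
```

Setting a single speed for a whole target corresponds to marking every test
in thorough_tests as thorough on that target, which is the behaviour I would
like to avoid.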

>
>
> If we move a lot of testing out into a migration unit test,
> this also solves the redundancy problem.
>
>
> With regards,
> Daniel
