Re: [PATCH] tester: Limit simultaneous QEMU jobs to 1

Kinsey Moore Tue, 31 Aug 2021 19:25:13 -0700

On 8/31/2021 18:00, Chris Johns wrote:

On 31/8/21 6:30 pm, Sebastian Huber wrote:

On 31/08/2021 09:00, Chris Johns wrote:

On 31/8/21 3:20 pm, Sebastian Huber wrote:

On 30/08/2021 20:32, Kinsey Moore wrote:

On 8/30/2021 12:12, Sebastian Huber wrote:

On 24/08/2021 20:45, Kinsey Moore wrote:

diff --git a/tester/rtems/testing/bsps/a53_ilp32_qemu.ini b/tester/rtems/testing/bsps/a53_ilp32_qemu.ini
index 3beba06..581c59c 100644
--- a/tester/rtems/testing/bsps/a53_ilp32_qemu.ini
+++ b/tester/rtems/testing/bsps/a53_ilp32_qemu.ini
@@ -36,3 +36,4 @@ bsp           = a53_ilp32_qemu
    arch          = aarch64
    tester        = %{_rtscripts}/qemu.cfg
    bsp_qemu_opts = %{qemu_opts_base} -serial mon:stdio -machine virt,gic-version=3 -cpu cortex-a53 -m 4096
+jobs          = 1

Does this overwrite the command line option or is this a default value?

When this is set in the tester configuration, the command line switch has no
effect but it can be overridden in the user-config.

Overruling the command line option is not that great. I have a vastly different
test run duration with --jobs=1 vs. --jobs=48 with more or less the same test
results.

What does more or less mean?

On Qemu some tests have no reliable outcome. If I run with --jobs=48 only two of
these tests fail compared to --jobs=1.

It seems the experience varies between archs and hosts. It is the origin of this
patch series.

I appreciate the efforts Kinsey has gone to looking into why we have this
happening and I also believe we need to keep pushing towards repeatable results.
If limiting to 1 gives us repeatable results on qemu then I prefer this over
tainted test results with intermittent tags.

During development waiting one minute is much better than waiting 13 minutes.
Repeatable tests are one aspect, but there are other aspects too. Overruling
command line options is not that great. If you run with default values, it is
all right to trade off repeatable results against a fast test run. However, if I
want to run with --jobs=N, I want to run with N jobs and not just one.

Yes I agree. How we manage this so it is apparent seems to be the key issue 
here.

I think this option should be split into a "force-jobs" and
"default-jobs" option.

I am sorry, I do not understand these options.

force-jobs forces the jobs to N regardless of what is specified on the command
line. Maybe a warning or error should be emitted if the command line option
conflicts with the configuration option.

default-jobs selects the job count if no --jobs command line option is 
specified.
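
The two semantics described above can be sketched as follows. This is only an
illustration, not code from the tester: the function name `select_jobs`, its
parameters, and the `fallback` value are hypothetical, and the warning text is
made up.

```python
def select_jobs(cli_jobs=None, force_jobs=None, default_jobs=None, fallback=1):
    """Return (jobs, warning) for the proposed force-jobs/default-jobs options.

    cli_jobs     -- value of --jobs, or None if the option was not given
    force_jobs   -- hypothetical `force-jobs` config value, or None if unset
    default_jobs -- hypothetical `default-jobs` config value, or None if unset
    fallback     -- assumption: job count used when nothing else is set
    """
    if force_jobs is not None:
        # force-jobs wins regardless of the command line; warn on a conflict.
        warning = None
        if cli_jobs is not None and cli_jobs != force_jobs:
            warning = ('warning: --jobs=%d ignored, configuration forces %d'
                       % (cli_jobs, force_jobs))
        return force_jobs, warning
    if cli_jobs is not None:
        # No forced value: the command line overrides the default.
        return cli_jobs, None
    if default_jobs is not None:
        # default-jobs applies only when --jobs was not specified.
        return default_jobs, None
    return fallback, None
```

With `default_jobs=1` a run without --jobs uses one job while --jobs=48 still
gets 48; with `force_jobs=1` the same --jobs=48 is reduced to one job and a
warning is returned.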

What about adding a `max-jobs` field which is 0 for no limit? This cannot be
exceeded?

Then `default-jobs` can be used as the default, again 0 means no limit?

The command line is ignored because the value is fixed on purpose, and I am
not seeing a reason to change this.

Ignoring command line options is not really a pleasant user experience.

Yes it is not. It was added in a hurry without much thought when I added the TFTP
support.

When specified in a config it is a physical limit. A user being able to change
the number of TFTP jobs on the command line does not make sense.

Yes, for physical limits this makes sense.

We need to manage this case for new users.

This tool's focus is testing on hardware and I see that as more important. And
as I have said before if we have problematic tests maybe the test or the tool
generating the results needs to be investigated.

I see this issue as something specific to the design of qemu and a few of our
tests. I can guess at some of the reasons qemu does this but also being able to
have the tick timer's clock be sync'ed with the CPU clock is important in some
types of simulation, i.e. our case and these problematic tests. We are a real-time
operating system so needing this to be closer to right in simulation does not
seem unreasonable.

This discussion sends a clear message: tier 1 archs and BSPs are very important
to this project.

There are several ways to address the sporadic test failures on Qemu. You could
for example also change the tests to make them independent of the simulator
timing. For now, my approach would be to change the default jobs count for the
Qemu BSPs and still let the user overrule the default with a custom value to get
for example a faster test run.

This is sensible. In summary:

1. Add `max-jobs` as a config-file-only setting with a default of 0.

2. Change the config `jobs` to `default-jobs`, again with 0 as the default.

3. Let the command line override the default jobs and raise an error if over the
maximum jobs allowed.

4. Provide a clear notice at the start and end of a run if the jobs used do not
match the default.
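
A minimal sketch of how the four points could fit together, assuming that 0
("no limit") is mapped to the host CPU count and that the default is itself
capped by `max-jobs`. Neither detail is settled in this thread. The names
`resolve_jobs`, `JobsError` and `host_cpus` are hypothetical and not part of
the tester.

```python
import os


class JobsError(Exception):
    """Raised when the requested job count is not allowed."""


def resolve_jobs(cli_jobs=None, default_jobs=0, max_jobs=0, host_cpus=None):
    """Return (jobs, notice) following the four summary points.

    cli_jobs     -- value of --jobs, or None if the option was not given
    default_jobs -- `default-jobs` from the config file, 0 means no limit
    max_jobs     -- `max-jobs` from the config file, 0 means no limit
    host_cpus    -- assumption: "no limit" resolves to this many jobs
    """
    if host_cpus is None:
        host_cpus = os.cpu_count() or 1

    if cli_jobs is not None and cli_jobs < 1:
        raise JobsError('--jobs must be at least 1')

    # Point 3: the command line may not exceed the configured maximum.
    if cli_jobs is not None and max_jobs > 0 and cli_jobs > max_jobs:
        raise JobsError('--jobs=%d exceeds max-jobs=%d' % (cli_jobs, max_jobs))

    # Points 1 and 2: work out the default, treating 0 as "no limit".
    default = default_jobs if default_jobs > 0 else host_cpus
    if max_jobs > 0:
        default = min(default, max_jobs)

    # Point 3: an explicit --jobs overrides the default.
    jobs = cli_jobs if cli_jobs is not None else default

    # Point 4: text to report at the start and end of the run.
    notice = None
    if jobs != default:
        notice = ('notice: running with %d jobs, the default is %d'
                  % (jobs, default))
    return jobs, notice
```

For a Qemu BSP with `default-jobs = 1` and no `max-jobs`, a plain run uses one
job, while --jobs=48 runs 48 jobs and yields a notice. For a TFTP style
configuration with `max-jobs = 1`, --jobs=48 raises an error instead of being
silently ignored.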


I'll work toward this solution.


Kinsey

_______________________________________________
devel mailing list
[email protected]
http://lists.rtems.org/mailman/listinfo/devel
