Thanks to the amazing work of Outreachy participant Nikki Sharpley, our
test result logging infrastructure now supports marking tests as having
multiple expected statuses. This can be used instead of disabling tests
where the results are intermittent. Harness-level support is currently
available in the web-platform-tests harness, but the feature could be
added to any of our mozlog-using test harnesses.
== Bugzilla ==
This work is tracked in bug 1552879 [1]; if you notice any problems with
the changes, please file bugs that block that one.
== Technical Details ==
This change adds a new field, `known_intermittent`, to the mozlog
`test_end` and `test_status` actions. This field contains a list of
statuses that represent known intermittent results of the test. The
`expected` field remains unaffected, so this is a backwards-compatible
change. Known mozlog consumers have been updated, but any other
consumers may require patches to correctly handle known intermittent
statuses.
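As a rough illustration of what handling the new field could look like in a log consumer, here is a minimal Python sketch. It works on plain dicts shaped like `test_end` / `test_status` messages. The function name `classify_result` and the returned labels are hypothetical. The handling of a missing `expected` field is also an assumption, not a description of how mozlog itself serializes messages.

```python
def classify_result(message):
    """Classify one test_end/test_status message given as a plain dict.

    Hypothetical helper, not part of mozlog. Returns one of
    "expected", "known-intermittent" or "unexpected".
    """
    status = message["status"]
    # Consumers that predate the new field simply never see it,
    # so default to an empty list.
    known_intermittent = message.get("known_intermittent", [])
    # Assumption: if `expected` is absent, treat the status as expected.
    expected = message.get("expected", status)

    # Check the intermittent list first so the result does not depend
    # on whether `expected` was present in the message.
    if status in known_intermittent:
        return "known-intermittent"
    if status == expected:
        return "expected"
    return "unexpected"


if __name__ == "__main__":
    sample = {
        "action": "test_end",
        "test": "/example/test.html",
        "status": "FAIL",
        "expected": "PASS",
        "known_intermittent": ["FAIL"],
    }
    print(classify_result(sample))
```

A consumer that ignores `known_intermittent` would report the sample above as an unexpected FAIL, which is the kind of case where a patch may be needed.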
The in-tree logger formatters, including the tbpl formatter that’s used
in CI, have been updated so that statuses matching a known intermittent
are called out explicitly with e.g. `TEST-KNOWN-INTERMITTENT`. The
mozharness code has been updated so that known intermittents don’t turn
jobs orange.
== Integration with wpt ==
wpt tests can be annotated with multiple statuses in the metadata files
by replacing the single `expected` value with a list, e.g.
```
[test.html]
  expected:
    if os == "linux": [PASS, FAIL]
    [PASS, TIMEOUT]
```
In this case the expected status is PASS on all platforms; FAIL is a
known intermittent on Linux, and TIMEOUT is a known intermittent on all
other platforms. This is documented on MDN [2].
The sync bot has been updated to prefer marking imported tests that are
intermittent with multiple statuses rather than disabling them.
== Isn’t this a bad idea? ==
Obviously, marking tests with multiple statuses represents a loss of
coverage compared to having a single status for each test. In general
it’s always preferable to fix a test or fix Gecko so that tests reliably
return a single result. However, marking tests with multiple statuses is
a clear improvement over disabling tests:
* Continuing to run the test allows us to detect more serious
regressions like crashes.
* It is possible to detect cases where the annotation is inaccurate
because some of the statuses are no longer recorded (e.g. a [PASS, FAIL]
that starts passing all the time or failing all the time).
The fact that marked tests will no longer turn treeherder orange does
mean that, initially at least, we won’t have the sheriffing data we
currently have to know when an intermittent becomes more frequent or
turns into a permafail. For this reason it makes sense to only mark
tests with known intermittent statuses in cases where they would
otherwise be disabled, and not for tests that only fail very infrequently.
== So I should never disable a wpt? ==
In general the order of priority is:
* First, try to fix the underlying issue.
* If that isn’t possible, mark the test as intermittent.
* If marking as intermittent is insufficient, e.g. because the test is
affecting others in the job, then it should be disabled.
== Future Work ==
Nikki will spend the remainder of her internship starting work on the
tooling to detect cases where the range of allowed statuses doesn’t
match the range of observed statuses for a specific test. In the future
this will make it possible to automatically remove superfluous
expectations, or to flag tests whose intermittent statuses become more
frequent as likely regressions.
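The comparison described above can be pictured with a small Python sketch. This is not the planned tooling, just an illustration under simple assumptions: the allowed statuses and the statuses observed over some number of runs are already available as plain lists, and both function names are hypothetical.

```python
def stale_statuses(allowed, observed):
    """Return allowed statuses that never appear in the observed results.

    These are candidates for removal from the annotation, e.g. the FAIL
    in [PASS, FAIL] for a test that now passes every time.
    """
    seen = set(observed)
    return [status for status in allowed if status not in seen]


def unlisted_statuses(allowed, observed):
    """Return observed statuses that the annotation does not allow."""
    return sorted(set(observed) - set(allowed))


if __name__ == "__main__":
    allowed = ["PASS", "FAIL"]
    observed = ["PASS"] * 50  # hypothetical run history
    print(stale_statuses(allowed, observed))
    print(unlisted_statuses(allowed, observed))
```

How many runs are needed before an unobserved status can safely be called stale is a policy decision that this sketch leaves out.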
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1552879
[2] https://developer.mozilla.org/en-US/docs/Mozilla/QA/web-platform-tests#Metadata
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform