Thanks to the amazing work of Outreachy participant Nikki Sharpley, our test result logging infrastructure now supports marking tests as having multiple expected statuses. This can be used instead of disabling tests where the results are intermittent. Harness-level support is currently available in the web-platform-tests harness, but the feature could be added to any of our mozlog-using test harnesses.

== Bugzilla ==

This work is tracked in bug 1552879 [1]; if you notice any problems with the changes, please file bugs that block that one.

== Technical Details ==

This change adds a new field, `known_intermittent`, to the mozlog `test_end` and `test_status` actions. This field contains a list of statuses that represent known intermittent results of the test. The `expected` field is unaffected, so the change is backwards-compatible. Known mozlog consumers have been updated, but any other consumers may require patches to handle known intermittent statuses correctly.
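
As a rough illustration of what a consumer might need to change, the sketch below classifies a single `test_end` or `test_status` message. It assumes messages arrive as plain dicts and that `expected` is only present when the actual status differs from the expected one. The `classify` helper and its return labels are hypothetical and not part of mozlog.

```python
def classify(message):
    """Classify a mozlog-style test_end/test_status message (hypothetical helper)."""
    status = message["status"]
    # Assumption: `expected` is omitted when the status matched expectations.
    expected = message.get("expected", status)
    # New field: statuses known to occur intermittently; may be absent.
    known_intermittent = message.get("known_intermittent", [])

    if status == expected:
        return "expected"
    if status in known_intermittent:
        return "known-intermittent"
    return "unexpected"


if __name__ == "__main__":
    message = {
        "action": "test_end",
        "test": "/example/test.html",  # made-up test id
        "status": "FAIL",
        "expected": "PASS",
        "known_intermittent": ["FAIL"],
    }
    print(classify(message))
```

A consumer that ignores the new field would treat the message above as an unexpected failure, which is why consumers that have not been updated may need a patch.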

The in-tree logger formatters, including the tbpl formatter that’s used in CI, have been updated so that statuses matching a known intermittent are called out explicitly with e.g. `TEST-KNOWN-INTERMITTENT`. The mozharness code has been updated so that known intermittents don’t turn jobs orange.

== Integration with wpt ==

wpt tests can be annotated with multiple statuses in the metadata files by replacing the single `expected` value with a list, e.g.

```
[test.html]
  expected:
    if os == "linux": [PASS, FAIL]
    [PASS, TIMEOUT]
```

In this case the expected status is PASS on all platforms; FAIL is a known intermittent on Linux, and TIMEOUT is a known intermittent on all other platforms. This is documented on MDN [2].
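
To make the reading of such a list concrete, here is a minimal sketch of how the example above resolves for a given platform. It does not parse metadata files and is not the harness's implementation: the conditions are modelled as hypothetical (predicate, value) pairs, and `split_expected` and `resolve` are illustrative names. The only rule taken from the description above is that the first status in a list is the expected one and the rest are known intermittents.

```python
def split_expected(value):
    """Split an `expected` value into (expected, known_intermittent)."""
    if isinstance(value, str):
        # A single status has no known intermittents.
        return value, []
    # Assumption: lists are non-empty; the first entry is the expected status.
    expected, *known_intermittent = value
    return expected, known_intermittent


def resolve(conditions, default, run_info):
    """Return the first matching conditional value, else the default."""
    for predicate, value in conditions:
        if predicate(run_info):
            return split_expected(value)
    return split_expected(default)


# The example metadata, modelled by hand (hypothetical representation).
conditions = [(lambda info: info["os"] == "linux", ["PASS", "FAIL"])]
default = ["PASS", "TIMEOUT"]

if __name__ == "__main__":
    print(resolve(conditions, default, {"os": "linux"}))
    print(resolve(conditions, default, {"os": "win"}))
```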

The sync bot has been updated to prefer marking imported tests that are intermittent with multiple statuses rather than disabling them.

== Isn’t this a bad idea? ==

Obviously, marking tests with multiple statuses represents a loss of coverage compared to having a single status for each test. In general it’s always preferable to fix a test or fix Gecko so that tests reliably return a single result. However, marking tests with multiple statuses is a clear improvement over disabling tests:

* Continuing to run the test allows us to detect more serious regressions like crashes.
* It is possible to detect cases where the annotation is inaccurate because some of the statuses are no longer recorded (e.g. a [PASS, FAIL] that starts passing all the time or failing all the time).
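
The second point can be sketched as a simple comparison between the annotated statuses and the statuses actually observed over a number of runs. This is only an illustration of the idea, not existing tooling: `unused_statuses` is a hypothetical helper, and it does not address how many runs are needed before an annotation can be considered stale.

```python
def unused_statuses(allowed, observed):
    """Return annotated statuses that never appear in the observed results."""
    seen = set(observed)
    # Preserve annotation order so the expected status, if unused, comes first.
    return [status for status in allowed if status not in seen]


if __name__ == "__main__":
    # A [PASS, FAIL] annotation on a test that has started passing every time.
    print(unused_statuses(["PASS", "FAIL"], ["PASS"] * 20))
```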

The fact that marked tests will no longer turn Treeherder orange does mean that, initially at least, we won’t have the sheriffing data we currently have to know when an intermittent becomes more frequent or turns into a permafail. For this reason it makes sense to mark tests with known intermittent statuses only in cases where they would otherwise be disabled, and not for tests that fail only very infrequently.

== So I should never disable a wpt? ==

In general the order of priority is:
* First try to fix the underlying issue
* If that isn’t possible then mark as intermittent
* If marking as intermittent is insufficient e.g. because the test is affecting others in the job, then it should be disabled.

== Future Work ==

Nikki will spend the remainder of her internship starting work on the tooling to detect cases where the range of allowed statuses doesn’t match the range of observed statuses for a specific test. In the future, this will make it possible to automatically remove superfluous expectations, or to flag tests that show intermittent statuses more frequently as likely regressions.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1552879
[2] https://developer.mozilla.org/en-US/docs/Mozilla/QA/web-platform-tests#Metadata