Support for annotating intermittent tests with multiple statuses

James Graham Thu, 25 Jul 2019 11:47:39 -0700

Thanks to the amazing work of Outreachy participant Nikki Sharpley, our test result logging infrastructure now supports marking tests as having multiple expected statuses. This can be used instead of disabling tests where the results are intermittent. Harness-level support is currently available in the web-platform-tests harness, but the feature could be added to any of our mozlog-using test harnesses.


== Bugzilla ==

This work is tracked in bug 1552879 [1]; if you notice any problems with the changes, please file bugs that block that one.


== Technical Details ==

This change adds a new field, `known_intermittent`, to the mozlog `test_end` and `test_status` actions. This field contains a list of statuses that represent known intermittent results of the test. The `expected` field is unaffected, so this is a backwards-compatible change. Known mozlog consumers have been updated, but any other consumers may require patches to handle known intermittent statuses correctly.
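To illustrate why other consumers may need patching: a consumer that only compares `status` with `expected` will still count a known intermittent result as unexpected. The sketch below uses plain dicts whose shape is assumed from the fields named above (the test path is hypothetical), and it does not call the mozlog API itself.

```python
from typing import Any, Dict


def is_unexpected_legacy(entry: Dict[str, Any]) -> bool:
    # A consumer written before this change only compares status with expected.
    # "expected" may be omitted when it equals "status", so default to status.
    return entry["status"] != entry.get("expected", entry["status"])


def is_unexpected(entry: Dict[str, Any]) -> bool:
    # An updated consumer also accepts any status listed in known_intermittent.
    # The field may be absent (e.g. logs from older producers), so default to [].
    status = entry["status"]
    if status == entry.get("expected", status):
        return False
    return status not in entry.get("known_intermittent", [])


# Hypothetical test_end message: PASS is expected, FAIL is a known intermittent.
msg = {
    "action": "test_end",
    "test": "/example/test.html",
    "status": "FAIL",
    "expected": "PASS",
    "known_intermittent": ["FAIL"],
}

print(is_unexpected_legacy(msg))  # True: the old consumer reports a failure
print(is_unexpected(msg))         # False: the updated consumer treats it as known
```

Defaulting the missing field to an empty list is what keeps the updated check working on logs produced before `known_intermittent` existed.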

The in-tree logger formatters, including the tbpl formatter that’s used in CI, have been updated so that statuses matching a known intermittent are called out explicitly with e.g. `TEST-KNOWN-INTERMITTENT`. The mozharness code has been updated so that known intermittents don’t turn jobs orange.
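The following is a simplified sketch of the behaviour described above: each result is labelled, and only genuinely unexpected results make the job orange. The prefixes and the `job_colour` helper are illustrative assumptions; neither the real tbpl formatter output nor the actual mozharness logic is reproduced here.

```python
from typing import Any, Dict, Iterable


def classify(entry: Dict[str, Any]) -> str:
    # Label one result as "expected", "known_intermittent" or "unexpected".
    status = entry["status"]
    if status == entry.get("expected", status):
        return "expected"
    if status in entry.get("known_intermittent", []):
        return "known_intermittent"
    return "unexpected"


def format_line(entry: Dict[str, Any]) -> str:
    # Illustrative tbpl-like prefixes; real formatter output is more detailed.
    kind = classify(entry)
    status = entry["status"]
    if kind == "expected":
        prefix = f"TEST-{status}"
    elif kind == "known_intermittent":
        prefix = f"TEST-KNOWN-INTERMITTENT-{status}"
    else:
        prefix = f"TEST-UNEXPECTED-{status}"
    return f"{prefix} | {entry['test']}"


def job_colour(entries: Iterable[Dict[str, Any]]) -> str:
    # Hypothetical rule: only unexpected results turn the job orange.
    return "orange" if any(classify(e) == "unexpected" for e in entries) else "green"


results = [
    {"test": "/a.html", "status": "PASS", "expected": "PASS"},
    {"test": "/b.html", "status": "TIMEOUT", "expected": "PASS",
     "known_intermittent": ["TIMEOUT"]},
]
for r in results:
    print(format_line(r))
print(job_colour(results))  # green
```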


== Integration with wpt ==

wpt tests can be annotated with multiple statuses in the metadata files by replacing the single `expected` value with a list, e.g.:


```
[test.html]
  expected:
    if os == "linux": [PASS, FAIL]
    [PASS, TIMEOUT]
```

In this case the expected status is PASS in all cases; FAIL is a known intermittent on Linux and TIMEOUT is a known intermittent on all other platforms. This is documented on MDN [2].
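As a rough model of how the example above resolves, here is a minimal sketch that assumes conditions are tried in order with the unconditional value as the fallback, and that the first status in the list is the expected one while the rest are known intermittents. It is not wptrunner’s metadata parser: the conditions are hand-translated into Python lambdas, and the `"win"` property value is hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Props = Dict[str, str]
# Each rule: (condition, or None for the unconditional default, list of statuses).
Rule = Tuple[Optional[Callable[[Props], bool]], List[str]]

# Hand-translated version of the metadata example above.
RULES: List[Rule] = [
    (lambda p: p["os"] == "linux", ["PASS", "FAIL"]),
    (None, ["PASS", "TIMEOUT"]),
]


def resolve(rules: List[Rule], props: Props) -> Tuple[str, List[str]]:
    # The first rule whose condition matches wins; None acts as the default.
    for cond, statuses in rules:
        if cond is None or cond(props):
            # First entry is the expected status; the rest are known intermittents.
            return statuses[0], statuses[1:]
    raise LookupError("no matching expectation")


print(resolve(RULES, {"os": "linux"}))  # ('PASS', ['FAIL'])
print(resolve(RULES, {"os": "win"}))    # ('PASS', ['TIMEOUT'])
```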

The sync bot has been updated to prefer marking imported tests that are intermittent with multiple statuses rather than disabling them.


== Isn’t this a bad idea? ==

Obviously marking tests with multiple statuses represents a loss of coverage compared to having a single status for each test. In general it’s always preferable to fix a test or fix Gecko so that tests reliably return a single result. However, marking tests with multiple statuses is a clear improvement over disabling tests:

* Continuing to run the test allows us to detect more serious regressions, like crashes.
* It is possible to detect cases where the annotation is inaccurate because some of the statuses are no longer recorded (e.g. a [PASS, FAIL] test that starts passing all the time or failing all the time).
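The second point amounts to a set comparison between the annotated statuses and the statuses actually seen across many runs. A minimal sketch, assuming the observed statuses for one test have already been collected (the helper name is hypothetical); a single run says little, so this only makes sense over a reasonably large sample.

```python
from typing import Iterable, List, Set, Tuple


def check_annotation(allowed: List[str],
                     observed: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Compare annotated statuses with statuses seen across many runs.

    Returns (unused, unexpected): annotated statuses that never occurred,
    and observed statuses the annotation does not allow (e.g. a new CRASH).
    """
    seen = set(observed)
    allowed_set = set(allowed)
    return allowed_set - seen, seen - allowed_set


# A [PASS, FAIL] test that has started passing every time.
print(check_annotation(["PASS", "FAIL"], ["PASS"] * 20))  # ({'FAIL'}, set())
```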

The fact that marked tests will no longer turn Treeherder orange does mean that, initially at least, we won’t have the sheriffing data we currently have to know when an intermittent becomes more frequent or turns into a permafail. For this reason it makes sense to mark tests with known intermittent statuses only in cases where they would otherwise be disabled, and not for tests that fail only very infrequently.


== So I should never disable a wpt? ==

In general the order of priority is:
* First try to fix the underlying issue
* If that isn’t possible then mark as intermittent
* If marking as intermittent is insufficient, e.g. because the test is affecting others in the job, then it should be disabled.


== Future Work ==

Nikki will spend the remainder of her internship starting work on the tooling to detect cases where the range of allowed statuses doesn’t match the range of observed statuses for a specific test. In future, this will make it possible to auto-remove superfluous expectations, or to flag tests that start showing intermittent statuses more frequently as likely regressions.
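One way the “more frequently” check could work, sketched under stated assumptions: compare the fraction of runs with a non-primary status in a recent window against a baseline window, and flag the test if the rate has grown by some factor. The windows, the helper names and the default factor of 2 are all hypothetical; the actual tooling is still to be written, and small samples will be noisy.

```python
import math
from collections import Counter
from typing import Iterable


def intermittent_rate(expected: str, observed: Iterable[str]) -> float:
    # Fraction of runs whose status differs from the primary expected status.
    counts = Counter(observed)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - counts[expected]) / total


def looks_like_regression(expected: str, baseline: Iterable[str],
                          recent: Iterable[str], factor: float = 2.0) -> bool:
    # Hypothetical rule: flag when the recent rate is non-zero and at least
    # `factor` times the baseline rate.
    base_rate = intermittent_rate(expected, baseline)
    recent_rate = intermittent_rate(expected, recent)
    return recent_rate > 0 and recent_rate >= factor * base_rate


baseline = ["PASS"] * 18 + ["FAIL"] * 2  # 10% FAIL
recent = ["PASS"] * 6 + ["FAIL"] * 4     # 40% FAIL
print(looks_like_regression("PASS", baseline, recent))  # True
```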


[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1552879

[2] https://developer.mozilla.org/en-US/docs/Mozilla/QA/web-platform-tests#Metadata

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
