On 09/01/16 15:43, Benjamin Smedberg wrote:
On 1/8/2016 6:02 PM, James Graham wrote:
On 08/01/16 22:41, Robert O'Callahan wrote:
On Sat, Jan 9, 2016 at 10:27 AM, Benjamin Smedberg <benja...@smedbergs.us> wrote:

What are the implications of this?

The web-platform tests are pass/fail, right? So is it a bug if they pass
but have different behaviors in e10s and non-e10s mode?


Yeah, I'm confused.

If a wpt test passes but with different output, then either there is no
problem or the test is incomplete and should be changed.

Maybe I should clarify.

web-platform-tests are slightly different to most tests in that we run
both tests we currently pass and tests that we currently don't pass.
On treeherder all we check is that we got the same result in this run
as we expected on the basis of previous runs.

Is this "same as previous run" behavior automatic, or manually
annotated? Running tests which don't pass is supported and common on
many other test suites: fail-if and random-if are used to mark tests as
a known fail but still run them.

It is semi-automated. There are explicit annotations in separate metadata files for each test. These have to be updated by hand (or using the output from running the affected tests) when a change introduces different test results (e.g. by fixing tests or adding new non-passing ones), but they are generated automatically, using a try run for an otherwise known-good build, when we update to a new version of the testsuite from upstream.
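
To give a flavour of what those metadata files look like (the file and subtest names below are invented, and I'm writing the syntax from memory, so treat it as approximate rather than authoritative), a per-subtest expectation is recorded roughly like this:

    [some-test.html]
      [Some subtest name]
        expected: FAIL

I believe expectations can also be made conditional on run info, so an e10s-only failure would be expressed with something along the lines of "expected: if e10s: FAIL".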

Is this a temporary state, where the end goal is to have the
web-platform tests use similar manifests to all of our other tests? Can
you provide some context about why web-platform tests are using
different infrastructure from everywhere else?

I will first note that "similar manifests to our other tests" isn't very specific, since we already use multiple manifest formats. I will assume you mean manifestparser manifests as used for mochitest, but note that web-platform-tests contain a mix of js-based tests and reftests, so no single existing format would be sufficient; below I will mainly concentrate on the tests that could be well described by manifestparser-style manifests, although much of this applies to both cases.
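
For comparison, a manifestparser entry is purely per-file and looks something like the following (the test name is invented and the syntax is from memory):

    [test_something.html]
    fail-if = e10s
    skip-if = os == "win"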

Unfortunately the web-platform-tests have some rather different constraints to other testsuites that make the manifestparser format insufficient.

web-platform-tests js tests can contain multiple tests per file (sometimes many thousands), so purely per-file metadata is inadequate. As I understand it, for other test types we supplement this with in-file annotations. In order for us to bidirectionally sync web-platform-tests it is essential that we never make local modifications to the test files other than intentional bugfixes or additions that are suitable to be upstreamed. This means that we must be able to set the expected result for each subtest (i.e. individual testcase within a file) in a separate local-only file. This is not supported in manifestparser files, nor did it seem easy to add.

The restriction on not modifying tests also means that things like prefs cannot be set in the tests themselves; it is convenient to use the same expectation data files to store this additional information. Rather more trivially, web-platform-tests may have an expected result of CRASH or ERROR in production, which other test types cannot. Support for this would admittedly be easier to add to manifestparser than per-subtest expectations, but its absence avoids confusion for the multiple test types where those statuses wouldn't make sense.
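
Concretely (again with invented names, and with syntax from memory that may not be exact), the same expectation file can carry both kinds of extra information:

    [another-test.html]
      prefs: [dom.example.enabled:true]
      expected: CRASH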

At this point I don't see any real advantages to trying to move all web-platform-tests to manifestparser, and many drawbacks, so I don't think it will happen. I am also not convinced that it's very relevant to the problem at hand; I don't see how the different manifest format is causing any issues. Indeed, now that most of our testsuites produce structured log output, you don't actually need to look at the input manifests at all.

The right thing to do is look at the log files produced from a test run. This is what wptview provides a GUI for, and what the test-informant tool ahal mentioned elsewhere does on the backend, but anyone with a little bit of time and a basic knowledge of the mozlog format (and treeherder API, perhaps) could have a go at making a one-off tool to answer this specific question efficiently. To do this one would consume all the structured logs for the e10s and non-e10s jobs on a push, and look for cases where the result is different for the same test in the two run types (this would also cover disabled tests that are recorded as SKIP).
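
As a very rough sketch of the kind of one-off tool I mean (this reads the raw JSON lines of a structured log directly rather than going through the mozlog API, the file paths are placeholders, and actually fetching the logs from treeherder is left as an exercise):

    import json
    import sys

    def results(path):
        # Map test id (or (test, subtest) pair) -> status from one raw log.
        # Disabled tests show up here too, with a SKIP status.
        out = {}
        with open(path) as f:
            for line in f:
                try:
                    data = json.loads(line)
                except ValueError:
                    continue
                action = data.get("action")
                if action == "test_end":
                    out[data["test"]] = data["status"]
                elif action == "test_status":
                    out[(data["test"], data["subtest"])] = data["status"]
        return out

    e10s = results(sys.argv[1])        # e.g. a raw log from an e10s job
    non_e10s = results(sys.argv[2])    # e.g. a raw log from a non-e10s job

    for key in sorted(set(e10s) & set(non_e10s), key=str):
        if e10s[key] != non_e10s[key]:
            print("%s: e10s=%s non-e10s=%s" % (key, e10s[key], non_e10s[key]))

A real tool would want to aggregate over all the chunks on a push, but something like this is enough to produce a first list of differences.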

The effect of all of this is that in order to understand what's
actually needed to bring e10s builds up to par with non-e10s builds
you need to look at the actual test results rather than just the list
of disabled tests. I believe that there are both instances of tests
that pass in non-e10s but not in e10s builds, and the reverse. wptview
gives you the ability to do that using data directly from treeherder.
The actual action to take on the basis of this data is obviously
something for the people working on e10s to determine.

This is not the responsibility of the e10s team; this is an all-hands
effort as we switch to making e10s the default configuration and soon
the only configuration for Firefox. If having different results for e10s
and non-e10s is not expected, who is the module owner/responsible for
the web platform tests and can create a list of the problem results and
find owners to get each one fixed?

web-platform-tests isn't a module, but I am responsible for it. I can draw up a list of differing results when I am back at work. However, using the instructions provided by Kalpesh, it's possible for anyone, for example each module owner, to get the information about any affected tests (e.g. in their module) immediately.

In general it makes more sense to me that we treat the results of web-platform-tests in the same way we treat the results of, say, mochitests, i.e. with the people who own a specific area of code taking responsibility for our behaviour with respect to the tests. For the specific case of making a complete list of all behaviour differences between two configurations, as here, it may well make sense to split up the work on a one-person-per-suite basis; but more generally I really don't want people to feel that this testsuite is "someone else's problem" and so not notice when it is finding bugs in code they own.
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
