On 09/01/16 15:43, Benjamin Smedberg wrote:
On 1/8/2016 6:02 PM, James Graham wrote:
On 08/01/16 22:41, Robert O'Callahan wrote:
On Sat, Jan 9, 2016 at 10:27 AM, Benjamin Smedberg <benja...@smedbergs.us> wrote:

What are the implications of this?

The web-platform tests are pass/fail, right? So is it a bug if they pass
but have different behaviors in e10s and non-e10s mode?


Yeah, I'm confused.

If a wpt test passes but with different output, then either there is no
problem or the test is incomplete and should be changed.

Maybe I should clarify.

web-platform-tests are slightly different to most tests in that we run
both tests we currently pass and tests that we currently don't pass.
On treeherder all we check is that we got the same result in this run
as we expected on the basis of previous runs.

Is this "same as previous run" behavior automatic, or manually
annotated? Running tests which don't pass is supported and common on
many other test suites: fail-if and random-if are used to mark tests as
a known fail but still run them.

It is semi-automated. There are explicit annotations in separate metadata files for each test. These have to be updated by hand (or using the output from running the affected tests) when a change introduces different test results (e.g. by fixing tests or adding new non-passing ones), but they are generated automatically, using a try run for an otherwise known-good build, when we update to a new version of the testsuite from upstream.
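
To give a flavour of what those metadata files look like (the file and subtest names below are invented, and I'm writing the syntax from memory, so treat it as approximate rather than authoritative), a per-subtest expectation is recorded roughly like this:

    [some-test.html]
      [Some subtest name]
        expected: FAIL

I believe expectations can also be made conditional on run info, so an e10s-only failure would be expressed with something along the lines of "expected: if e10s: FAIL".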

Is this a temporary state, where the end goal is to have the
web-platform tests use similar manifests to all of our other tests? Can
you provide some context about why web-platform tests are using
different infrastructure from everywhere else?

I will first note that "similar manifests to our other tests" isn't very specific, since we already use multiple manifest formats. I will assume you mean manifestparser manifests as used for mochitest, but note that web-platform-tests contain a mix of js-based tests and reftests, so no single existing format would be sufficient; below I will mainly concentrate on the tests that could be well described by manifestparser-style manifests, although much of this applies to both cases.
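
For comparison, a manifestparser entry is purely per-file and looks something like the following (the test name is invented and the syntax is from memory):

    [test_something.html]
    fail-if = e10s
    skip-if = os == "win"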

Unfortunately the web-platform-tests have some rather different constraints to other testsuites that make the manifestparser format insufficient.

web-platform-tests js tests can contain multiple tests per file (sometimes many thousands), so purely per-file metadata is inadequate. As I understand it, for other test types we supplement this with in-file annotations. In order for us to bidirectionally sync web-platform-tests it is essential that we never make local modifications to the test files other than intentional bugfixes or additions that are suitable to be upstreamed. This means that we must be able to set the expected result for each subtest (i.e. individual testcase within a file) in a separate local-only file. This is not supported in manifestparser files, nor did it seem easy to add.

The restriction on not modifying tests also means that things like prefs cannot be set in the tests themselves; it is convenient to use the same expectation data files to store this additional information. Rather more trivially, web-platform-tests may have an expected result of CRASH or ERROR in production, which other test types cannot. Support for this would admittedly be easier to add to manifestparser than per-subtest expectations, but its absence avoids confusion for the multiple test types where those statuses wouldn't make sense.
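
Concretely (again with invented names, and with syntax from memory that may not be exact), the same expectation file can carry both kinds of extra information:

    [another-test.html]
      prefs: [dom.example.enabled:true]
      expected: CRASH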

At this point I don't see any real advantages to trying to move all web-platform-tests to manifestparser, and many drawbacks, so I don't think it will happen. I am also not convinced that it's very relevant to the problem at hand; I don't see how the different manifest format is causing any issues. Indeed, now that most of our testsuites produce structured log output, you don't actually need to look at the input manifests at all.

The right thing to do is look at the log files produced from a test run. This is what wptview provides a GUI for, and what the test-informant tool ahal mentioned elsewhere does on the backend, but anyone with a little bit of time and a basic knowledge of the mozlog format (and treeherder API, perhaps) could have a go at making a one-off tool to answer this specific question efficiently. To do this one would consume all the structured logs for the e10s and non-e10s jobs on a push, and look for cases where the result is different for the same test in the two run types (this would also cover disabled tests that are recorded as SKIP).
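
As a very rough sketch of the kind of one-off tool I mean (this reads the raw JSON lines of a structured log directly rather than going through the mozlog API, the file paths are placeholders, and actually fetching the logs from treeherder is left as an exercise):

    import json
    import sys

    def results(path):
        # Map test id (or (test, subtest) pair) -> status from one raw log.
        # Disabled tests show up here too, with a SKIP status.
        out = {}
        with open(path) as f:
            for line in f:
                try:
                    data = json.loads(line)
                except ValueError:
                    continue
                action = data.get("action")
                if action == "test_end":
                    out[data["test"]] = data["status"]
                elif action == "test_status":
                    out[(data["test"], data["subtest"])] = data["status"]
        return out

    e10s = results(sys.argv[1])        # e.g. a raw log from an e10s job
    non_e10s = results(sys.argv[2])    # e.g. a raw log from a non-e10s job

    for key in sorted(set(e10s) & set(non_e10s), key=str):
        if e10s[key] != non_e10s[key]:
            print("%s: e10s=%s non-e10s=%s" % (key, e10s[key], non_e10s[key]))

A real tool would want to aggregate over all the chunks on a push, but something like this is enough to produce a first list of differences.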

The effect of all of this is that in order to understand what's
actually needed to bring e10s builds up to par with non-e10s builds
you need to look at the actual test results rather than just the list
of disabled tests. I believe that there are both instances of tests
that pass in non-e10s but not in e10s builds, and the reverse. wptview
gives you the ability to do that using data directly from treeherder.
The actual action to take on the basis of this data is obviously
something for the people working on e10s to determine.

This is not the responsibility of the e10s team; this is an all-hands
effort as we switch to making e10s the default configuration and soon
the only configuration for Firefox. If having different results for e10s
and non-e10s is not expected, who is the module owner/responsible for
the web platform tests and can create a list of the problem results and
find owners to get each one fixed?

web-platform-tests isn't a module, but I am responsible for it. I can draw up a list of differing results when I am back at work. However, using the instructions provided by Kalpesh, it's possible for anyone, for example each module owner, to get the information about any affected tests (e.g. in their module) immediately.

In general it makes more sense to me that we treat the results of web-platform-tests in the same way we treat the results of, say, mochitests, i.e. with the people who own a specific area of code taking responsibility for our behaviour with respect to the tests. For the specific case of making a complete list of all behaviour differences between two configurations, as here, it may well make sense to split up the work on a one-person-per-suite basis; but more generally I really don't want people to feel that this testsuite is "someone else's problem" and so not notice when it is finding bugs in code they own.
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
