On 09/01/16 15:43, Benjamin Smedberg wrote:
> On 1/8/2016 6:02 PM, James Graham wrote:
>> On 08/01/16 22:41, Robert O'Callahan wrote:
>>> On Sat, Jan 9, 2016 at 10:27 AM, Benjamin Smedberg
>>> <benja...@smedbergs.us> wrote:
>>>> What are the implications of this? The web-platform tests are
>>>> pass/fail, right? So is it a bug if they pass but have different
>>>> behaviors in e10s and non-e10s mode?
>>> Yeah, I'm confused. If a wpt test passes but with different output,
>>> then either there is no problem or the test is incomplete and should
>>> be changed.
>> Maybe I should clarify. web-platform-tests are slightly different to
>> most tests in that we run both tests we currently pass and tests that
>> we currently don't pass. On treeherder all we check is that we got
>> the same result in this run as we expected on the basis of previous
>> runs.
> Is this "same as previous run" behavior automatic, or manually
> annotated? Running tests which don't pass is supported and common on
> many other test suites: fail-if and random-if are used to mark tests
> as a known fail but still run them.
It is semi-automated. There are explicit annotations in separate
metadata files for each test. These have to be updated by hand (or
using the output from running the affected tests) when a change
introduces different test results (e.g. by making previously failing
tests pass, or by adding new non-passing ones), but they are
regenerated automatically, using a try run on an otherwise known-good
build, when we update to a new version of the testsuite from upstream.
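For concreteness, those expectations live in per-test metadata files
under testing/web-platform/meta/, mirroring the layout of the tests
themselves. A made-up example (the test and subtest names here are
purely illustrative, and I'm assuming e10s is exposed as a run-info
property usable in conditions) looks something like:

    [example-test.html]
      [A subtest that we never pass]
        expected: FAIL

      [A subtest that we only fail with e10s]
        expected:
          if e10s: FAIL

Subtests not mentioned in the file are simply expected to PASS.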
> Is this a temporary state, where the end goal is to have the
> web-platform tests use similar manifests to all of our other tests?
> Can you provide some context about why web-platform tests are using
> different infrastructure from everywhere else?
I will first note that "similar manifests to our other tests" isn't very
specific; we already use multiple manifest formats. I will assume you
mean manifestparser manifests as used for mochitest, but note that
web-platform-tests contain a mix of both js-based tests and reftests, so
a single existing format would be insufficient; below I will mainly
concentrate on the tests that could be well described by
manifestparser-style manifests, although much applies to both cases.
Unfortunately the web-platform-tests have some rather different
constraints from other testsuites, which make the manifestparser
format a poor fit.
web-platform-tests js tests can contain multiple tests per file
(sometimes many thousands), so purely per-file metadata is inadequate.
As I understand it, for other test types we supplement this with in-file
annotations. In order for us to bidirectionally sync web-platform-tests
it is essential that we never make local modifications to the test files
other than intentional bugfixes or additions that are suitable to be
upstreamed. This means that we must be able to set the expected result
for each subtest (i.e. individual testcase within a file) in a separate
local-only file. This is not supported in manifestparser files, nor did
it seem easy to add.
The restriction on not modifying tests also means that things like
prefs cannot be set in the tests themselves; it is convenient to use
the same expectation data files to store this additional information.
Rather more trivially, web-platform-tests may have CRASH or ERROR as
expected statuses in production, which other test types cannot
express. Support for this would obviously be easy enough to add to
manifestparser, but its absence avoids confusion in the multiple test
types where those statuses wouldn't make sense.
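Again purely as an illustration (the pref name is invented, and the
exact prefs syntax should be checked against the wptrunner docs), the
same per-test metadata file can carry this extra information alongside
expected statuses, including harness-level ones like CRASH:

    [example-test.html]
      prefs: [dom.example_feature.enabled:true]
      expected:
        if e10s: CRASH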
At this point I don't see any real advantages to trying to move to
manifestparser for all web-platform-tests and many drawbacks, so I don't
think it will happen. I am also not convinced that it's very relevant to
the problem at hand; I don't see how the different manifest format is
causing any issues. Indeed, now that most of our testsuites produce
structured log output, you don't actually need to look at the input
manifests at all.
The right thing to do is to look at the log files produced by a test
run. This is what wptview provides a GUI for, and what the
test-informant tool ahal mentioned elsewhere does on the backend. But
anyone with a little time and a basic knowledge of the mozlog format
(and perhaps the treeherder API) could have a go at making a one-off
tool to answer this specific question efficiently. To do this one
would consume all the structured logs for the e10s and non-e10s jobs
on a push, and look for cases where the result is different for the
same test in the two run types (this would also cover disabled tests,
which are recorded as SKIP).
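As a rough sketch of what such a tool might look like (assuming you
have already fetched the two raw structured logs, e.g. via the
treeherder API, and ignoring retriggers, intermittents and the fact
that each job is split across multiple chunks):

    # compare_logs.py: report results that differ between two mozlog
    # raw structured logs (newline-delimited JSON, one object per line).
    import json
    import sys

    def read_results(path):
        """Map (test, subtest) -> status for one raw structured log."""
        results = {}
        with open(path) as f:
            for line in f:
                data = json.loads(line)
                action = data.get("action")
                if action == "test_status":
                    # Result for an individual subtest within a file
                    results[(data["test"], data["subtest"])] = data["status"]
                elif action == "test_end":
                    # Harness-level result for the whole file
                    # (OK/ERROR/CRASH/TIMEOUT/SKIP/...)
                    results[(data["test"], None)] = data["status"]
        return results

    def main(e10s_log, non_e10s_log):
        e10s = read_results(e10s_log)
        non_e10s = read_results(non_e10s_log)
        for key in sorted(set(e10s) | set(non_e10s),
                          key=lambda k: (k[0], k[1] or "")):
            a = e10s.get(key, "MISSING")
            b = non_e10s.get(key, "MISSING")
            if a != b:
                test, subtest = key
                print("%s | %s: e10s %s, non-e10s %s"
                      % (test, subtest or "<harness>", a, b))

    if __name__ == "__main__":
        main(*sys.argv[1:3])

Anything it prints is a candidate for the list of behaviour
differences; disabled tests show up because one side reports SKIP.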
>> The effect of all of this is that in order to understand what's
>> actually needed to bring e10s builds up to par with non-e10s builds
>> you need to look at the actual test results rather than just the
>> list of disabled tests. I believe that there are both instances of
>> tests that pass in non-e10s but not in e10s builds, and the reverse.
>> wptview gives you the ability to do that using data directly from
>> treeherder. The actual action to take on the basis of this data is
>> obviously something for the people working on e10s to determine.
> This is not the responsibility of the e10s team; this is an all-hands
> effort as we switch to making e10s the default configuration and soon
> the only configuration for Firefox. If having different results for
> e10s and non-e10s is not expected, who is the module owner/responsible
> for the web platform tests and can create a list of the problem
> results and find owners to get each one fixed?
web-platform-tests isn't a module, but I am responsible for it. I can
draw up a list of differing results when I am back at work. However,
using the instructions provided by Kalpesh, it's possible for anyone,
for example each module owner, to get the information about any
affected tests (e.g. in their module) immediately.
In general it makes more sense to me that we treat the results of
web-platform-tests in the same way we treat the results of, say,
mochitests, i.e. with the people who own a specific area of code
taking responsibility for our behaviour with respect to the tests.
That is not to say that it can't make sense, in a specific case like
this one (making a complete list of all behaviour differences between
two configurations), to split up the work on a one-person-per-suite
basis; but in other cases I really don't want people to feel that this
testsuite is "someone else's problem" and so fail to notice when it is
finding bugs in code they own.
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform