I also agree about coalescing better. We are looking at ways to do that in conjunction with https://wiki.mozilla.org/Auto-tools/Projects/Autoland, which we'll have a prototype of by the end of the quarter. In this model, commits that are going through autoland could be coalesced when landing on inbound, which would reduce slave load on all platforms.

Until that's deployed and in widespread use, we have other options to decrease slave load, and this experiment is the simplest. It won't result in reduced test coverage, since sheriffs will backfill in the case of a regression. Essentially, we're not running tests that would have passed anyway.
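
As a rough illustration of the backfill idea (a sketch only, not the sheriffs' actual tooling; the function name and the data shape are hypothetical): when a failure first shows up on a push where tests ran, the pushes skipped since the last passing run are the ones that need to be retriggered to find the culprit.

```python
def pushes_to_backfill(results):
    """Return the push ids whose tests were skipped between the last
    passing run and the first failing run.

    `results` is a list of (push_id, outcome) tuples in landing order.
    `outcome` is "pass", "fail", or None when the tests were skipped.
    This data shape is an assumption made for illustration.
    """
    last_pass = -1  # index of the most recent passing run seen so far
    for i, (_, outcome) in enumerate(results):
        if outcome == "pass":
            last_pass = i
        elif outcome == "fail":
            # Only skipped pushes in the suspect window need a backfill.
            window = results[last_pass + 1:i]
            return [push_id for push_id, o in window if o is None]
    return []  # no failure seen, nothing to backfill


history = [("a", "pass"), ("b", None), ("c", None), ("d", "fail")]
print(pushes_to_backfill(history))
```

Here pushes "b" and "c" would be retriggered, since either of them (or "d" itself) could have introduced the failure.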

Depending on feedback we receive after this experiment, we may opt to change our approach in the future: e.g., run tests on every Nth opt build instead of every Nth debug build, or try to identify sets of "never failing" tests and just run those less frequently, or always include at least one flavor of Windows, OSX and Linux on every commit.

Regards,

Jonathan


On 8/19/2014 1:55 PM, Benoit Girard wrote:
I completely agree with Jeff Gilbert on this one.

I think we should try to coalesce -better-. I just checked the current
state of mozilla-inbound and it doesn't feel like any of the current
patches really need their own set of tests, because they are not time
sensitive or sufficiently complex. Right now developers are asked to
create a separate bug, with its own patch, for each change. This leads
to a lot of little patches being landed by individual developers, which
seems to reflect the current state of mozilla-inbound.

Perhaps we should instead promote checkin-needed (or a similarly simple
mechanism) to coalesce simple changes together. Opting into this means
that your patch may take significantly longer to get merged if it's
landed with another bad patch, so it should only be used when that's
acceptable. Right now developers with commit access are not encouraged
to make use of checkin-needed AFAIK. If we started recommending against
individual landings for simple changes, and improved the process, we
could probably significantly cut the number of test jobs by cutting the
number of pushes.

On Tue, Aug 19, 2014 at 3:57 PM, Jeff Gilbert <jgilb...@mozilla.com> wrote:
I would actually say that debug tests are more important for continuous 
integration than opt tests. At least in code I deal with, we have a ton of 
asserts to guarantee behavior, and we really want test coverage with these via 
CI. If a test passes on debug, it should almost certainly pass on opt, just 
faster. The opposite is not true.

"They take a long time and then break" is part of what I believe caused us to 
not bother with debug testing on much of Android and B2G, which we still haven't 
completely fixed. It should be unacceptable to ship without CI on debug tests, but here 
we are anyway. (This is finally nearly fixed, though there is still some work to do.)

I'm not saying running debug tests less often is on the same scale of bad, but 
I would like to express my concerns about heading in that direction.

-Jeff

----- Original Message -----
From: "Jonathan Griffin" <jgrif...@mozilla.com>
To: dev-platform@lists.mozilla.org
Sent: Tuesday, August 19, 2014 12:22:21 PM
Subject: Experiment with running debug tests less often on mozilla-inbound the week of August 25

Our pools of test slaves are often at or over capacity, and this has the
effect of increasing job coalescing and test wait times.  This, in turn,
can lead to longer tree closures caused by test bustage, and can cause
try runs to be very slow to complete.

One of the easiest ways to mitigate this is to run tests less often.

To assess the impact of doing this, we will be performing an experiment
the week of August 25, in which we will run debug tests on
mozilla-inbound on most desktop platforms every other run, instead of
every run as we do now.  Debug tests on linux64 will continue to run
every time.  Non-desktop platforms and trees other than mozilla-inbound
will not be affected.
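
The scheduling rule described above could be sketched roughly as follows. This is an illustration only, not the actual buildbot configuration: the function name, the platform labels, and the choice of which desktop platforms count as "most" are all assumptions.

```python
# Hypothetical platform labels; the real scheduler names may differ.
DESKTOP_PLATFORMS = {"linux32", "linux64", "osx", "windows"}
# Per the announcement, linux64 debug tests keep running on every run.
ALWAYS_RUN_DEBUG = {"linux64"}


def should_run_debug_tests(platform, tree, run_index, every_nth=2):
    """Decide whether debug tests run for a given run.

    `run_index` counts runs on the tree; with every_nth=2 the debug
    tests run on every other run, as in the experiment.
    """
    if tree != "mozilla-inbound":
        return True  # other trees are not affected
    if platform not in DESKTOP_PLATFORMS:
        return True  # non-desktop platforms are not affected
    if platform in ALWAYS_RUN_DEBUG:
        return True  # linux64 continues to run every time
    return run_index % every_nth == 0


for run in range(4):
    print(run, should_run_debug_tests("windows", "mozilla-inbound", run))
```

Keeping `every_nth` as a parameter makes it easy to try a different skip rate if the data from the experiment suggests one.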

This approach is based on the premise that the number of debug-only
platform-specific failures on desktop is low enough to be manageable,
and that the extra burden this imposes on the sheriffs will be small
enough compared to the improvement in test slave metrics to justify the
cost.

While this experiment is in progress, we will be monitoring job
coalescing and test wait times, as well as impacts on sheriffs and
developers.  If the experiment causes sheriffs to be unable to perform
their job effectively, it can be terminated prematurely.

We intend to use the data we collect during the experiment to inform
decisions about additional tooling we need to make this or a similar
plan permanent at some point in the future, as well as validating the
premise on which this experiment is based.

After the conclusion of this experiment, a follow-up post will be made
which will discuss our findings.  If you have any concerns, feel free to
reach out to me.

Jonathan

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform