On Wed, Nov 6, 2013 at 5:49 PM, Ryan VanderMeulen <rya...@gmail.com> wrote:
> What do we gain by having results that can't be trusted?

The same thing we gain from allowing any try push that doesn't run
every single test.  It's a tradeoff between reliability and time, not
a black-and-white choice.  For instance, if I change a few lines in a
.cpp file in editor/, I know there are only a limited number of tests
the change could plausibly affect, and there's no reason to run
mochitests in gfx/ just so I can run mochitests in editor/ and
dom/imptests/editing/.
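
To make the idea concrete, here's a rough sketch (in Python) of the
kind of mapping I have in mind; the directory lists and the helper
are purely illustrative, nothing like this exists in the tree as far
as I know:

    # Illustrative mapping from the directory a change touches to the
    # test directories that could plausibly be affected by it.
    AFFECTED_TESTS = {
        "editor/": ["editor/", "dom/imptests/editing/"],
        "gfx/": ["gfx/"],
    }

    def tests_for_change(changed_path):
        """Return the test directories worth running for a change."""
        for src_dir, test_dirs in AFFECTED_TESTS.items():
            if changed_path.startswith(src_dir):
                return test_dirs
        return ["all"]  # unknown territory: fall back to everything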

Likewise, if I get try failures that I can't reproduce locally and
push a revised patch to test the fix, it would be nice to run just
the affected tests.  More than once I've submitted a series of
patches where each iteration had to run an entire test suite instead
of just a few tests, and therefore took 20 minutes or so longer than
necessary.

On Wed, Nov 6, 2013 at 6:46 PM, Ryan VanderMeulen <rya...@gmail.com> wrote:
> I'm just afraid we're going to end up in the same situation we're already in
> with intermittent failures where the developer looks at it and says "that
> couldn't possibly be from me" and ignores it. We already see "Try results
> look good" backouts on a depressingly-regular basis.

The entire situation with how intermittent failures are handled
strikes me as mostly a technical problem.  Known intermittent
failures should be flagged and automatically suppressed, not require
a manual judgment call every single time.  To catch the case where a
known intermittent failure turns into a permanent one, the failing
test could be automatically rerun a couple of times (just the file,
not the whole suite) to make sure it passes at least once, and
reported as a real failure if it fails five times in a row or
something.  Trying to persuade people to be careful about something
that isn't a problem 90% of the time is a losing battle -- the
signal-to-noise ratio needs to be a lot higher before people will pay
attention.
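
Here's a minimal sketch (in Python) of the retry policy I mean; the
function names and the threshold of five are made up for
illustration, not anything the harness actually exposes:

    # Hypothetical handling of a failing test file; run_test_file()
    # and is_known_intermittent() stand in for whatever the harness
    # would actually provide.
    MAX_RERUNS = 5  # illustrative threshold

    def classify_failure(test_file, run_test_file, is_known_intermittent):
        """Classify a test file that just failed."""
        if not is_known_intermittent(test_file):
            return "real-failure"
        # Rerun just the file, not the whole suite, until it passes
        # once or exhausts its retry budget.
        for _ in range(MAX_RERUNS):
            if run_test_file(test_file):
                return "intermittent"  # passed at least once: suppress
        return "real-failure"  # failed every time: report for real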

On Wed, Nov 6, 2013 at 9:56 PM, Steve Fink <sf...@mozilla.com> wrote:
> As for "faster", I'm skeptical of the reliability of incremental try
> builds. We have too many clobbers. And the same slaves do a bunch of
> different build types, so it may have been quite a while since the last
> build of the type you're doing. (Sure, we could tweak the scheduling to
> add some affinity in, but that's more complexity and a richer set of
> failure patterns.)

Right, good point.  I didn't think it through.

> I'm still a bit curious as to whether allowing slaves
> to collaborate via a fast distcc network going to remote ccaches would
> work, but it also feels complex and potentially counterproductive (I'm
> not at all sure that it's faster to look up and transfer a remotely
> cached build result than it would be to recompile it locally.)

Why would it not be faster, in terms of throughput?  The network
transfer should use roughly no CPU time, and builds are mostly
CPU-bound, right?  Even in terms of latency, I wouldn't expect
hashing the input file, making a local network request, and
transferring the response to take much more than a millisecond if the
cached file is in memory, and not much longer if it's on an SSD, so I
don't see how actually compiling the file could compete except for
very small files.
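
Here's the back-of-envelope I'm doing, with every number being an
assumption rather than a measurement of the actual build network:

    # Rough latency estimate for a remote cache hit (all numbers are
    # guesses, not measurements).
    hash_input_ms     = 0.1   # hashing the preprocessed source
    lan_round_trip_ms = 0.5   # request/response on the same LAN
    transfer_ms       = 0.8   # ~100 KB object file over gigabit
    cache_hit_ms      = hash_input_ms + lan_round_trip_ms + transfer_ms

    local_compile_ms  = 2000  # a typical large C++ translation unit
    print(local_compile_ms / cache_hit_ms)  # hit wins by ~3 orders of magnitude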
