Also note this has happened before. mccr8 was looking into similar
leak-checking-is-totally-busted-but-nobody-noticed issues a few years ago
in https://bugzilla.mozilla.org/show_bug.cgi?id=1045316

Glad to hear you're looking into end-to-end tests!

-e

On Thu, Dec 29, 2016 at 8:37 AM, Andrew Halberstadt <[email protected]> wrote:

> Over the holidays, we noticed that leaks in mochitest and reftest were not
> turning jobs orange, and that the test harnesses had been running in that
> state for quite some time. During this time, several leak-related test
> failures landed; they can be tracked with this dependency tree:
> https://bugzilla.mozilla.org/showdependencytree.cgi?id=1325148&hide_resolved=0
>
> The issue causing jobs to remain green has been fixed; however, the known
> leak regressions had to be whitelisted to allow this fix to land. So while
> future leak regressions will properly fail, the existing ones (in the
> dependency tree) still need to be fixed. For mochitest, the whitelist can
> be found here:
> https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.py#2218
>
> Other than that, leak checking is only disabled on Linux crashtests.
>
> Please take a quick look to see if there is a leak in a component for
> which you could help out. I will continue to help with triage and bisection
> for the remaining issues until they are all fixed. Also big thanks to all
> the people who are currently working on a fix or have already landed a fix.
>
> Read on only if you are interested in the details.
>
>
>
> *Why wasn't this caught earlier?*
> The short answer to this question is that we do not have adequate testing
> of our CI.
>
> The problem happened at the intersection between mozharness and the test
> harnesses. Basically, a change in mozharness exposed a latent bug in the
> test harnesses, and the change was able to land because it appeared as if nothing went
> wrong. Catching errors like this is tricky because regular unit tests would
> not have detected it either. It requires integration tests of the CI system
> as a whole (spanning test harnesses, mozharness and buildbot/taskcluster).
>
>
> *How will we prevent this in the future?*
>
> Historically, integration testing our test harnesses has been a hard
> problem. However, with recent work in taskcluster, Python tests and some
> refactoring on the build frontend, I believe there is a path forward that
> will allow us to stand up this kind of test. I will commit some of my time
> to fix this and hope to have *something* running that would have caught
> this by the end of Q1.
>
> I would also like to stand up a test harness designed to test command line
> applications in CI, which would provide another avenue for writing test
> harness unit and integration tests. Bug 1311991
> <https://bugzilla.mozilla.org/show_bug.cgi?id=1311991> will track this
> work.
>
> It is important that developers are able to trust our tests, and when bugs
> like this happen, that trust is eroded. For that, I'd like to apologize and
> express my hope that a major test-result bug like this will not happen
> again. At the very least, we need to have the capability
> of adding a regression test when a bug like this happens in the future.
>
> Thanks for your help and understanding.
> - Andrew
>
> _______________________________________________
> firefox-dev mailing list
> [email protected]
> https://mail.mozilla.org/listinfo/firefox-dev
>
>
_______________________________________________
dev-platform mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-platform
