Yea, I definitely agree with you there. Is this going to end up with @expectedFlakeyWindows, @expectedFlakeyLinux, @expectedFlakeyDarwin, @expectedFlakeyAndroid, @expectedFlakeyFreeBSD?
It's starting to get a little crazy. At some point I think we just need something we can use like this: @test_status(status=flaky, host=[win, linux, android, darwin, bsd], target=[win, linux, android, darwin, bsd], compiler=[gcc, clang], debug_info=[dsym, dwarf, dwo])

On Mon, Oct 19, 2015 at 4:35 PM Todd Fiala <todd.fi...@gmail.com> wrote:

> My initial proposal was an attempt to not entirely skip running them on our end, while still getting them to generate actionable signals without conflating them with unexpected successes (which, semantically, they absolutely are not).
>
> On Mon, Oct 19, 2015 at 4:33 PM, Todd Fiala <todd.fi...@gmail.com> wrote:
>
>> Nope, I have no issue with what you said. We don't want to run them over here at all because we don't see enough useful info come out of them. You need time-series data for that to be somewhat useful, and even then it is only useful if you see a sharp change in it after a specific change.
>>
>> So I really don't want to be running flaky tests at all, as their signals are not useful on a per-run basis.
>>
>> On Mon, Oct 19, 2015 at 4:16 PM, Zachary Turner <ztur...@google.com> wrote:
>>
>>> Don't get me wrong, I like the idea of running flakey tests a couple of times and seeing if one passes (Chromium does this as well, so it's not without precedent). If I sounded harsh, it's because I *want* to be harsh on flaky tests. Flaky tests indicate literally the *worst* kind of bugs, because you don't even know what kind of problems they're causing in the wild, so by increasing the amount of pain they cause people (test suite running longer, etc.) the hope is that it will motivate someone to fix them.
>>>
>>> On Mon, Oct 19, 2015 at 4:04 PM Todd Fiala <todd.fi...@gmail.com> wrote:
>>>
>>>> Okay, so I'm not a fan of the flaky tests myself, nor of test suites taking longer to run than needed.
>>>>
>>>> Enrico is going to add a new 'flakey' category to the test categorization.
>>>>
>>>> Scratch all the other complexity I offered up. What we're going to ask is: if a test is flakey, please add it to the 'flakey' category. We won't do anything different with the category by default, so everyone will still get flakey tests running in the same manner they do now. However, on our test runners we will be disabling the category entirely using the skipCategories mechanism, since those tests are generating too much noise.
>>>>
>>>> We may need to add a per-test-method category mechanism, since right now our only mechanisms for adding categories are to (1) add a dot-file to a directory so that everything in it gets tagged with a category, or (2) override the categorization via the TestCase getCategories() mechanism.
>>>>
>>>> -Todd
>>>>
>>>> On Mon, Oct 19, 2015 at 1:03 PM, Zachary Turner <ztur...@google.com> wrote:
>>>>
>>>>> On Mon, Oct 19, 2015 at 12:50 PM Todd Fiala via lldb-dev <lldb-dev@lists.llvm.org> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I'd like unexpected successes (i.e. tests marked as expected failure that in fact pass) to retain the actionable meaning that something is wrong.
>>>>>> The wrong part is that either (1) the test now passes consistently and the author of the fix just missed updating the test definition (or perhaps was unaware of the test), or (2) the test is not completely covering the condition it is testing, and some change to the code just happened to make the test pass (because the test is not comprehensive enough). Either of those requires some sort of adjustment by the developers.
>>>>>
>>>>> I'd add (3): the test is actually flaky but is tagged incorrectly.
>>>>>
>>>>>> We have a category of test known as "flaky" or "flakey" (both are valid spellings, for those who care: http://www.merriam-webster.com/dictionary/flaky, although flaky is considered the primary). Flaky tests are tests that we can't get to pass 100% of the time. This might be because writing the test robustly is extremely difficult and deemed not worth the effort, or because the condition being tested is simply not going to present itself successfully 100% of the time.
>>>>>
>>>>> IMO, if it's not worth the effort to write the test correctly, we should delete the test. Flaky is useful as a temporary status, but if nobody ends up fixing the flakiness, I think the test should be deleted (more reasons follow).
>>>>>
>>>>>> These are tests we still want to exercise, but we don't want them to start generating test failures if they don't pass 100% of the time. Currently the flaky test mechanism requires a test to pass one out of two runs. That is okay for a test that exhibits a slim degree of flakiness. For others, that is not a large enough sample of runs to elicit a successful result. Those tests get marked as XFAIL, and generate a non-actionable "unexpected success" result when they do happen to pass.
>>>>>>
>>>>>> GOAL
>>>>>>
>>>>>> * Enhance the expectedFlakey* test decorators. Allow specification of the number of times a flaky test should be run in order for it to be expected to pass at least once. Call that MAX_RUNS.
>>>>>
>>>>> I think it's worth considering whether it's a good idea to include the date at which a test was declared flakey. After a certain amount of time has passed, if it's still flakey it can be relegated to a hard failure. I don't think flakey should be a permanent state.
>>>>>
>>>>>> * When running a flaky test, run it up to MAX_RUNS times. The first time it passes, mark it as a successful test completion. The test event system will be given the number of times it was run before passing. Whether we consume this info or not is TBD (and falls into the purview of the test results formatter).
>>>>>>
>>>>>> * If the test does not pass within MAX_RUNS runs, mark it as a flaky fail. For purposes of the standard output, this can look like "FAIL: (flaky)" or something similar so fail scanners still see it. (Note it's highly likely I'll do the normal output counts with the TestResults formatter-based output at the same time, so we get accurate test method counts and the like.)
>>>>>
>>>>> The concern I have here (and the reason I would like to delete flakey tests if the flakiness isn't removed after a certain amount of time) is that some of our tests are slow. Repeating them many times is going to have an impact on how long the test suite takes to run.
>>>>> It's already tripled over the past 3 weeks, and I think we need to be careful to keep out things that have the potential to lead to significant slowness of the test suite runner.
>>>>
>>>> --
>>>> -Todd
>>
>> --
>> -Todd
>
> --
> -Todd
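
For concreteness, here is a rough sketch of what a single parameterized decorator along the lines of the @test_status idea above might look like. Nothing below exists in the lldb test suite today: test_status, _current_host, and the __flaky__ attribute are all hypothetical names, and a real version would hook into the suite's existing expected-failure machinery and check target, compiler, and debug_info as well.

import platform
import unittest


def _current_host():
    # Best-effort mapping from platform.system() to the short host keys
    # used in the decorator arguments (hypothetical naming).
    system = platform.system().lower()
    return {"windows": "win"}.get(system, system)


def test_status(status, host=None, target=None, compiler=None, debug_info=None):
    # Hypothetical single decorator replacing the per-platform
    # expectedFlakey* variants. Only the host check is sketched here.
    def decorator(func):
        applies = host is None or _current_host() in host
        if applies and status == "xfail":
            return unittest.expectedFailure(func)
        if applies and status == "flaky":
            func.__flaky__ = True  # picked up by a (hypothetical) retrying runner
        return func
    return decorator


class RacyTest(unittest.TestCase):
    @test_status(status="flaky", host=["win", "linux"], debug_info=["dwo"])
    def test_something_racy(self):
        self.assertTrue(True)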
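
Similarly, a minimal sketch of the MAX_RUNS behavior described in the quoted proposal: run a flaky test up to MAX_RUNS times, count the first pass as a success, and otherwise report a flaky failure while recording how many attempts were used. run_flaky_test and sometimes_fails are illustrative stand-ins only; the real change would live in the expectedFlakey* decorators and the test-event/ResultsFormatter plumbing, not in a standalone helper.

import random


def run_flaky_test(run_once, max_runs=3):
    # Run a flaky test up to max_runs times. Returns (passed, attempts):
    # passed is True as soon as any attempt succeeds, and attempts is the
    # number of runs used, which the test-event system could hand to the
    # results formatter.
    for attempt in range(1, max_runs + 1):
        try:
            run_once()
        except AssertionError:
            continue  # flaky failure on this attempt; try again
        return True, attempt
    return False, max_runs


def sometimes_fails():
    # Deliberately unreliable check standing in for a flaky test method.
    assert random.random() < 0.5, "simulated intermittent failure"


passed, attempts = run_flaky_test(sometimes_fails, max_runs=5)
print(("PASS" if passed else "FAIL: (flaky)") + " after %d run(s)" % attempts)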