On Fri, Oct 26, 2012 at 12:43 PM, Alexey Proskuryakov <[email protected]> wrote:
>
> On 26.10.2012, at 11:04, Antti Koivisto <[email protected]> wrote:
>
>> The reality is that this "test coverage" today shows up as flakiness and
>> so is ignored anyway, meaning we don't actually have useful coverage here.
>> Even when flakiness is investigated, the "fix" is to cache-bust using unique
>> URL params, which just means we "lose" the coverage you describe for that
>> test, anyway.
>
> I think that this is the real issue here. Test flakiness is very important
> to investigate; this often leads to the discovery of bad bugs, including
> security ones. The phrase "flaky test" often misplaces the blame.
>
> When making cache-related changes, I have frequently found bugs in my
> patches because some seemingly random test started failing and I
> investigated. Without the test coverage, some of those bugs would probably
> now be in the tree.
>
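(For readers unfamiliar with the workaround Antti mentions: "cache-busting
with unique URL params" in a test usually amounts to a small helper like the
sketch below. The helper name and parameter format are illustrative, not
taken from any particular WebKit test.)

```javascript
// Illustrative sketch: append a throwaway query parameter so the browser
// fetches a fresh copy of the resource instead of reusing a cached one
// left over from an earlier test run.
function cacheBust(url) {
  // Use '?' if the URL has no query string yet, '&' otherwise.
  const sep = url.indexOf('?') === -1 ? '?' : '&';
  // Date.now() alone can collide within a millisecond, so add a random suffix.
  return url + sep + 'nocache=' + Date.now() + '-' + Math.random();
}
```

The cost Antti points out is exactly that this makes every load a cache
miss, so the cached code path is never exercised by that test again.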
I agree strongly with both of these sentiments. My experience, though, is
that most people are disinclined to actually spend the time figuring out why
a test is flaky; in addition, it can be very difficult to even figure out
whether a test has just started becoming flaky or whether your change
introduced the flakiness. As a result, people tend to just suppress or skip
flaky tests.

> I agree with Antti. Finding regressions is what tests are for, and it would
> be difficult to make enough explicit tests to compensate for such a loss of
> coverage. It would certainly be very unfortunate to lose test coverage
> without even an attempt to compensate for that.

Because of what I've written above, having flaky tests is causing us to lose
coverage today. So I suspect that with this change we'll be able to
unsuppress a number of failures and regain coverage we are losing now.
Whether this offsets Antti's concern, I am not informed enough to know.

Moreover, in my experience, flaky tests cause far more pain than they are
worth, and as a result it is much more important to have tests that run
consistently every time than to keep running tests that fail intermittently.
I believe this is a generally accepted industry / QA principle (i.e., I
don't think I'm in a minority here). A corollary of this is that a change
that fixes or removes test flakiness is valued highly, even if it causes the
underlying problems to stop manifesting themselves.

Of course, we have to balance the desire to find bugs against other sources
of productivity gain and loss as well. For example, there is no question
that running the layout tests in parallel also increases test flakiness, and
yet we think that is an acceptable tradeoff generally (although perhaps not
everyone agrees with this choice).

Given all this, Elliot's suggestion seems like a near-ideal compromise. You
can have your cake and eat it ...
we get less flakiness by default, and if you want to test for more flakiness
/ additional code paths, you can still do so.

Perhaps a slight variant of this is that we agree to make the change on the
Chromium port to clear the cache (much as the Qt and EFL ports already do),
and you continue not to clear the cache on the Apple Mac port until you feel
comfortable that you've added the additional tests? WDYT?

-- Dirk
_______________________________________________
webkit-dev mailing list
[email protected]
http://lists.webkit.org/mailman/listinfo/webkit-dev

