On Wed, Aug 15, 2012 at 3:06 PM, Filip Pizlo <[email protected]> wrote:
> Apparently I was somewhat unclear. Let me restate. We have the following
> mechanisms available when a test fails:
>
> 1) Check in a new -expected.* file.
>
> 2) Modify the test.
>
> 3) Modify a TestExpectations file.
>
> 4) Add the test to a Skipped file.
>
> 5) Remove the test entirely.
>
> I have no problem with (1) unless it is intended to mark the test as
> expected-to-fail-but-not-crash. I agree that using -expected.* to accomplish
> what TestExpectations accomplishes is not valuable, but I further believe
> that even TestExpectations is not valuable.
>
> I broadly prefer (2) whenever possible.
>
> I believe that (3) and (4) are redundant, and I don't buy the value of (3).
>
> I don't like (5) but we should probably do more of it for tests that have a
> chronically low signal-to-noise ratio.
>
Thank you for clarifying. I had actually written an almost identical list but
didn't send it, so I think we're on the same page at least as far as
understanding the problem goes ...

So, I would describe my suggestion as an improved variant of the kind of (1)
that can be used as "expected-to-fail-but-not-crash" (which I'll call 1-fail),
one that we would use in the cases where we use (3), (4), or (1-fail) today.

I would also agree that we should do (2) where possible, but I don't think
this is easily possible for a large class of tests, especially pixel tests,
although I am currently working on other things that will hopefully help here.

Chromium certainly does a lot of (3) today, and some (1-fail). Other ports
definitely use (1-fail) or (4) today, because (2) is rarely possible for many,
many tests.

We know that doing (1-fail), (3), or (4) causes real maintenance woes down the
road, but also that doing (1-fail) or (3) catches real problems that simply
skipping the test would not -- at some cost. (There's a rough sketch of what
(3) and (4) actually look like at the end of this message, for concreteness.)
Whether the benefit is worth the cost is not known, of course, but I believe
it is. I am hoping that my suggestion will have a lower overall cost than
doing (1-fail) or (3).

> You're proposing a new mechanism. I'm arguing that given the sheer number of
> tests, and the overheads associated with maintaining them, (4) is the broadly
> more productive strategy in terms of bugs-fixed/person-hours. And,
> increasing the number of mechanisms for dealing with tests by 20% is likely
> to reduce overall productivity rather than helping anyone.
>

Why do you believe this to be true? I'm not being flippant here ... I think
this is a very plausible argument, and it may well be true, but I don't know
what criteria we would use to evaluate it. Some of the possible factors are:

* the complexity of the test infrastructure and the cognitive load it
  introduces on developers
* the cost of bugs that are missed because we're skipping the tests intended
  to catch those bugs
* the cost of looking at "regressions" and trying to figure out if the
  regression is something you care about or not
* the cost of looking at the "-expected" results and trying to figure out if
  what is "expected" is correct or not

There may be others as well, but the last three are all very real in my
experience, and I believe they significantly outweigh the first one. I don't
know how to objectively assess that, though (and I don't think it's even
possible, since different people/teams/ports will weigh these things
differently).

-- Dirk
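
P.S. Since a lot of this hinges on what (3) and (4) actually involve, here is
roughly what the two look like on disk. I'm writing the syntax from memory,
and the path and bug number below are made up, so treat this as an
approximation rather than a reference:

    # (3) A TestExpectations line: the test still runs, the listed failure is
    # tolerated, and anything else (a crash or a timeout, say) still shows up
    # as unexpected.
    webkit.org/b/12345 [ Mac ] fast/forms/example-test.html [ Failure ]

    # (4) A Skipped entry: just a path; the test is not run at all.
    fast/forms/example-test.html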

