On Aug 15, 2012, at 4:02 PM, Dirk Pranke <[email protected]> wrote:
> On Wed, Aug 15, 2012 at 3:06 PM, Filip Pizlo <[email protected]> wrote:
>> Apparently I was somewhat unclear. Let me restate. We have the following
>> mechanisms available when a test fails:
>>
>> 1) Check in a new -expected.* file.
>>
>> 2) Modify the test.
>>
>> 3) Modify a TestExpectations file.
>>
>> 4) Add the test to a Skipped file.
>>
>> 5) Remove the test entirely.
>>
>> I have no problem with (1) unless it is intended to mark the test as
>> expected-to-fail-but-not-crash. I agree that using -expected.* to
>> accomplish what TestExpectations accomplishes is not valuable, but I further
>> believe that even TestExpectations is not valuable.
>>
>> I broadly prefer (2) whenever possible.
>>
>> I believe that (3) and (4) are redundant, and I don't buy the value of (3).
>>
>> I don't like (5) but we should probably do more of it for tests that have a
>> chronically low signal-to-noise ratio.
>>
>
> Thank you for clarifying. I had actually written an almost identical
> list but didn't send it, so I think we're on the same page at least as
> far as understanding the problem goes ...
>
> So, I would describe my suggestion as an improved variant of the kind
> of (1) that can be used as "expected-to-fail-but-not-crash" (which
> I'll call 1-fail), and that we would use this in cases where we use
> (3), (4), or (1-fail) today.
>
> I would also agree that we should do (2) where possible, but I don't
> think this is easily possible for a large class of tests, especially
> pixel tests, although I am currently working on other things that will
> hopefully help here.
>
> Chromium certainly does a lot of (3) today, and some (1-fail). Other
> ports definitely use (1-fail) or (4) today, because (2) is rarely
> possible for many, many tests.
>
> We know that doing (1-fail), (3), or (4) causes real maintenance woes
> down the road, but also that doing (1-fail) or (3) catches real
> problems that simply skipping the test would not -- at some cost.
> Whether the benefit is worth the cost, is not known, of course, but I
> believe it is. I am hoping that my suggestion will have a lower
> overall cost than doing (1-fail) or (3).

I also believe that the trade-off is known and, specifically, I believe
that having any tests in the (1-fail) or (3) states is more costly than
having them in (4) or (5).

>
>> You're proposing a new mechanism. I'm arguing that given the sheer number of
>> tests, and the overheads associated with maintaining them, (4) is the
>> broadly more productive strategy in terms of bugs-fixed/person-hours. And,
>> increasing the number of mechanisms for dealing with tests by 20% is likely
>> to reduce overall productivity rather than helping anyone.
>>
>
> Why do you believe this to be true? I'm not being flippant here ... I
> think this is a very plausible argument, and it may well be true, but
> I don't know what the criteria we would use to evaluate it are. Some
> of the possible factors are:
>
> * the complexity of the test infrastructure and the cognitive load it
> introduces on developers
> * the cost of bugs that are missed because we're skipping the tests
> intended to catch those bugs
> * the cost of looking at "regressions" and trying to figure out if the
> regression is something you care about or not
> * the cost of looking at the "-expected" results and trying to figure
> out if what is "expected" is correct or not
>
> There may be others as well, but the last three are all very real in
> my experience, and I believe they significantly outweigh the first
> one, but I don't know how to objectively assess that (and I don't
> think it's even possible since different people/teams/ports will weigh
> these things differently).

I believe that the cognitive load is greater than any benefit from
catching bugs incidentally by continuing to run a (1-fail) or (3) test
and continuing to evaluate whether or not the expectation matches some
notion of desired behavior.
And therein lies one possible source of disagreement. But there is
another source of disagreement: would adding a sixth facility that
overlaps with (1-fail) or (3) help? No, I don't believe it would. It's
just another mechanism leading to more possible arguments about which
mechanism is better.

-F
_______________________________________________
webkit-dev mailing list
[email protected]
http://lists.webkit.org/mailman/listinfo/webkit-dev

