Re: [webkit-dev] A proposal for handling "failing" layout tests and TestExpectations

Filip Pizlo Wed, 15 Aug 2012 16:59:09 -0700

On Aug 15, 2012, at 4:02 PM, Dirk Pranke <[email protected]> wrote:


> On Wed, Aug 15, 2012 at 3:06 PM, Filip Pizlo <[email protected]> wrote:
>> Apparently I was somewhat unclear.  Let me restate.  We have the following 
>> mechanisms available when a test fails:
>> 
>> 1) Check in a new -expected.* file.
>> 
>> 2) Modify the test.
>> 
>> 3) Modify a TestExpectations file.
>> 
>> 4) Add the test to a Skipped file.
>> 
>> 5) Remove the test entirely.
>> 
>> I have no problem with (1) unless it is intended to mark the test as 
>> expected-to-fail-but-not-crash.  I agree that using -expected.* to 
>> accomplish what TestExpectations accomplishes is not valuable, but I further 
>> believe that even TestExpectations is not valuable.
>> 
>> I broadly prefer (2) whenever possible.
>> 
>> I believe that (3) and (4) are redundant, and I don't buy the value of (3).
>> 
>> I don't like (5) but we should probably do more of it for tests that have a 
>> chronically low signal-to-noise ratio.
>> 
> 
> Thank you for clarifying. I had actually written an almost identical
> list but didn't send it, so I think we're on the same page at least as
> far as understanding the problem goes ...
> 
> So, I would describe my suggestion as an improved variant of the kind
> of (1) that can be used as "expected-to-fail-but-not-crash" (which
> I'll call 1-fail), and that we would use this in cases where we use
> (3), (4), or (1-fail) today.
> 
> I would also agree that we should do (2) where possible, but I don't
> think this is easily possible for a large class of tests, especially
> pixel tests, although I am currently working on other things that will
> hopefully help here.
> 
> Chromium certainly does a lot of (3) today, and some (1-fail). Other
> ports definitely use (1-fail) or (4) today, because (2) is rarely
> possible for many, many tests.
> 
> We know that doing (1-fail), (3), or (4) causes real maintenance woes
> down the road, but also that doing (1-fail) or (3) catches real
> problems that simply skipping the test would not -- at some cost.
> Whether the benefit is worth the cost is not known, of course, but I
> believe it is. I am hoping that my suggestion will have a lower
> overall cost than doing (1-fail) or (3).

I also believe that the trade-off is known, and specifically, I believe 
that the cost of having any tests in the (1-fail) or (3) states is higher 
than the cost of having them in (4) or (5).

> 
>> You're proposing a new mechanism. I'm arguing that given the sheer number of 
>> tests, and the overheads associated with maintaining them, (4) is the 
>> broadly more productive strategy in terms of bugs-fixed/person-hours.  And, 
>> increasing the number of mechanisms for dealing with tests by 20% is likely 
>> to reduce overall productivity rather than helping anyone.
>> 
> 
> Why do you believe this to be true? I'm not being flippant here ... I
> think this is a very plausible argument, and it may well be true, but
> I don't know what criteria we would use to evaluate it. Some
> of the possible factors are:
> 
> * the complexity of the test infrastructure and the cognitive load it
> introduces on developers
> * the cost of bugs that are missed because we're skipping the tests
> intended to catch those bugs
> * the cost of looking at "regressions" and trying to figure out if the
> regression is something you care about or not
> * the cost of looking at the "-expected" results and trying to figure
> out if what is "expected" is correct or not
> 
> There may be others as well, but the last three are all very real in
> my experience, and I believe they significantly outweigh the first
> one, but I don't know how to objectively assess that (and I don't
> think it's even possible since different people/teams/ports will weigh
> these things differently).

I believe that the cognitive load is greater than any benefit from catching 
bugs incidentally by continuing to run a (1-fail) or (3) test, and continuing 
to evaluate whether or not the expectation matches some notion of desired 
behavior.

And therein lies one possible source of disagreement.

But there is another source of disagreement: would adding a sixth facility that 
overlaps with (1-fail) or (3) help?  No, I don't believe it would.  It's just 
another mechanism leading to more possible arguments about which mechanism is 
better.

-F

_______________________________________________
webkit-dev mailing list
[email protected]
http://lists.webkit.org/mailman/listinfo/webkit-dev
