On Thu, Aug 16, 2012 at 3:50 PM, Stephen Chenney <[email protected]> wrote:
> On Thu, Aug 16, 2012 at 6:15 PM, Ryosuke Niwa <[email protected]> wrote:
>>
>> On Thu, Aug 16, 2012 at 3:04 PM, Dirk Pranke <[email protected]> wrote:
>>>
>>> On Thu, Aug 16, 2012 at 2:23 PM, Ryosuke Niwa <[email protected]> wrote:
>>> > Like Filip, I'm extremely concerned about the prospect of us
>>> > introducing yet-another-way-of-doing-things, and not being able to
>>> > get rid of it later.
>>>
>>> Presumably the only way we'd not be able to get rid of it would be if
>>> some port actually liked it, in which case, why would we get rid of it?
>>
>> Because we already have too many ways to deal with test failures.
>>
>> Why don't we at least merge Skipped and TestExpectations first?
>
> I agree with the priorities above, at least. I also agree with the
> overall goal of making our implementation match our philosophy on
> testing.
>
> Ryosuke has raised a very valid point: it is not clear what a gardener
> should do with a test that has changed behavior. Personally, the area I
> deal with is very susceptible to the "single pixel differences matter"
> issue, and I would prefer gardeners to flag updates in some way so that
> I can look at them myself (or one of the very few other experts can).
> Maybe I'm just a control freak.
>
> In the current situation, gardeners are not likely to roll out a patch
> short of build failures, obvious test or perf regressions, or security
> issues. For the bulk of cases where a test result has changed, the
> gardener will either add it to expectations or go ahead and update the
> result.
>
> The result is a mixture of incorrect expected results and files listed
> in TestExpectations that may or may not be correct. The way I deal with
> this right now is to maintain a snapshot of TestExpectations and
> occasionally go through and look for newly added results, and check if
> they are in fact correct. And every now and then I go look at older
> entries to see if they have been updated. Or I get lucky and notice
> that a new expected result has been checked in that is incorrect (I can
> mostly catch that by following checkins in the code).
>
> The proposed change does not alter this fundamental dynamic. Depending
> on what policy gardeners adopt, they might now rename the result as
> failing and remove the expected, or maybe they'll just update the
> expected. I'm still in a situation where I don't know the exact status
> of any given result, and where I have no easy way of knowing whether a
> gardener has correctly updated an expected or failing image.
So, if we had -passing/-failing, then people who knew what the results
were supposed to be in a given directory (call them "owners") could
rename the existing -expected files into one category or the other, and
then a gardener could add newly-failing tests as -expected; it would
then be trivial for the owner to look for new -expected files in that
directory.

Right now, an owner would have to periodically scan the directory for
files changed since the last time they scanned it, and would have no
way of knowing whether an updated file was "correct" or not (perhaps
you could filter by who committed the change, but it's certainly harder
than ls *-expected).

So, I would contend that my proposal would make it easier for us to
have a process by which gardeners could punt on changes they're unsure
about and owners could subsequently review them. I think you could have
a similar model if we just checked in *all* new results as -expected,
but it would be harder to implement (perhaps not impossible, though; I
haven't given it much thought yet).

On the other hand, if you follow the Chromium model of just suppressing
the file, then we have no idea if the failure you currently see has any
relationship to the failure that was originally seen, or if the file
that is currently checked in has any relationship to what "correct"
might now need to be (we know that the file was correct at some point
but is now stale). So, in the Chromium model, at least, you can grep
through TestExpectations to look for failing tests, but you don't have
the history you would get by looking at a history of changes to
-expected/-passing/-failing files.

In a world where all failures are triaged promptly by qualified owners,
all of this mechanism is unnecessary. Unfortunately, we don't live in
that world at the moment.

-- Dirk

> Two questions come out of this:
> 1) What policy do we want gardeners to follow? Giving "one more
> option" is not helpful as far as I can tell.
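[Editor's note: a minimal sketch of the owner workflow described above,
with invented file names, assuming triaged results are renamed to
-passing/-failing while untriaged gardener check-ins keep the plain
-expected suffix:]

```shell
# Sketch of the proposed naming scheme (file names are invented for
# illustration). Results an owner has already triaged carry -passing or
# -failing suffixes; results a gardener checked in without triage keep
# the plain -expected suffix.
dir=$(mktemp -d)
cd "$dir"
touch box-sizing-passing.png float-clear-failing.png
touch new-feature-expected.png   # untriaged: needs an owner's review

# Untriaged results are then trivial for an owner to spot:
ls *-expected*   # prints: new-feature-expected.png
```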
> The new method does nothing that TestExpectations could not already
> do, so I suppose the goal is to not put known failing tests in
> expectations.

I believe it does do something TestExpectations cannot do (see above),
and yes, we would then not put known failing tests in TestExpectations.

> 2) How will it be simpler to know if something is really failing,
> really correct, or whether a non-expert gardener made an
> understandable mistake?

This part I'm not sure about, but I believe it would be easier to guess
whether a new result is newly failing, failing differently, or
potentially passing if we know how the prior result was categorized,
assuming it was categorized correctly. Perhaps this implies a much
stronger rule: you shouldn't categorize something as -passing or
-failing unless you're *sure* about what you're doing.

I think I have already addressed your remaining comments ...

-- Dirk
_______________________________________________
webkit-dev mailing list
[email protected]
http://lists.webkit.org/mailman/listinfo/webkit-dev

