On Thu, Aug 16, 2012 at 2:05 PM, Dirk Pranke <[email protected]> wrote:
> I think your observations are correct, but at least my experience as a
> gardener/sheriff leads me to a different conclusion. Namely, when I'm
> looking at a newly failing test, it is difficult if not impossible for
> me to know if the existing baseline was previously believed to be
> correct or not, and thus it's hard for me to tell if the new baseline
> should be considered worse, better, or different.

How does the proposal solve this problem? Right now gardeners have two options:
- Rebaseline
- Add a test expectation

Once we implement the proposal, we have at least three options:
- Rebaseline correct.png
- Add/rebaseline expected.png
- Add/rebaseline failure.png
- (Optionally) Add a test expectation.

That's a lot of options to choose from. The more options we have, the more likely people are to make mistakes. We're already inconsistent in how -expected.png is used because some people make mistakes. I'm afraid that adding another set of expected results will result in even more mistakes and an unrecoverable mess. This is why I want to test this theory :).

> It seems like if we got
> experience with this on one (or more) ports for a couple of months we
> would have a much more well-informed opinion, and I'm not seeing a
> huge downside to at least trying this idea out.

Sure. But if we're doing this experiment on trunk, thereby imposing significant cognitive load on every other port, then I'd like to see us set *both* an exact date at which we decide whether this approach is good or not *and* the criteria by which we decide this, *before* the experiment starts.

Like Filip, I'm *extremely* concerned about the prospect of us introducing yet-another-way-of-doing-things and not being able to get rid of it later.

- Ryosuke
_______________________________________________
webkit-dev mailing list
[email protected]
http://lists.webkit.org/mailman/listinfo/webkit-dev

