Re: GCC Buildbot Update - Definition of regression

Joseph Myers Wed, 11 Oct 2017 06:19:29 -0700

On Wed, 11 Oct 2017, Paulo Matos wrote:

> On 10/10/17 23:25, Joseph Myers wrote:
> > On Tue, 10 Oct 2017, Paulo Matos wrote:
> > 
> >>     new test -> FAIL        ; New test starts as fail
> > 
> > No, that's not a regression, but you might want to treat it as one (in the 
> > sense that it's a regression at the higher level of "testsuite run should 
> > have no unexpected failures", even if the test in question would have 
> > failed all along if added earlier and so the underlying compiler bug, if 
> > any, is not a regression).  It should have human attention to classify it 
> > and either fix the test or XFAIL it (with issue filed in Bugzilla if a 
> > bug), but it's not a regression.  (Exception: where a test failing results 
> > in its name changing, e.g. through adding "(internal compiler error)".)
> > 
> 
> When someone adds a new test to the testsuite, isn't it supposed to not
> FAIL? If it does FAIL, shouldn't this be considered a regression?


Only a regression at the whole-testsuite level (in that "no FAILs" is the 
desired state).  Not a regression in the sense of a regression bug in GCC 
that might be relevant for release management (something user-visible that 
worked in a previous GCC version but no longer works).

And if someone added, for example, a dg-require-effective-target line to a 
testcase, every line number in that test is incremented.  Every PASS / 
FAIL assertion in that test then has its line number increase by 1 and is 
therefore renamed, which results in spurious detection of a regression if 
you consider new FAILs as regressions (even at the whole-testsuite level, 
an increased line number on an existing FAIL is not meaningfully a 
regression).
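
To illustrate the renaming problem, here is a rough Python sketch of a 
naive "new FAIL" check.  The helper new_fails and the test name 
gcc.dg/foo.c are hypothetical, and real .sum names may be formatted 
differently; the point is only that a name-based comparison reports the 
renamed FAIL as new.

```python
def new_fails(old, new):
    """Return test names that FAIL in `new` but did not FAIL in `old`.

    `old` and `new` map test name -> status, as read from two .sum files.
    This is a naive, purely name-based comparison (an assumption for
    illustration, not how any existing script necessarily works).
    """
    return sorted(name for name, status in new.items()
                  if status == "FAIL" and old.get(name) != "FAIL")


# Hypothetical results before and after one directive line was added at
# the top of foo.c: the same failing assertion moves from line 10 to 11.
old = {"gcc.dg/foo.c (test for warnings, line 10)": "FAIL",
       "gcc.dg/foo.c (test for excess errors)": "PASS"}
new = {"gcc.dg/foo.c (test for warnings, line 11)": "FAIL",
       "gcc.dg/foo.c (test for excess errors)": "PASS"}

# The existing FAIL shows up as "new" purely because it was renamed.
print(new_fails(old, new))
```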

> For this reason all of these issues need to be taken care of straight away

Well, I think it *does* make sense to do sufficient analysis on existing 
FAILs to decide if they are testsuite issues or compiler bugs, fix if they 
are testsuite issues and XFAIL with reference to a bug in Bugzilla if 
compiler bugs.  That is, try to get to the point where no-FAILs is the 
normal expected testsuite state and it's Bugzilla, not 
expected-FAILs-not-marked-as-XFAIL, that is used to track regressions and 
other bugs.

> By not being unique, you mean between languages?

Yes (e.g. c-c++-common tests in both gcc and g++ tests might have the same 
name in both .sum files, but should still be counted as different tests).
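
One way to keep such tests distinct in a comparison script is to key each 
result on the tool (or .sum file) as well as the name.  A minimal Python 
sketch, assuming the usual "STATUS: name" line layout; parse_sum, the 
status list and the example test name are hypothetical:

```python
# Result statuses to recognise (assumed list; extend as needed).
STATUSES = {"PASS", "FAIL", "XPASS", "XFAIL",
            "UNSUPPORTED", "UNRESOLVED", "UNTESTED"}


def parse_sum(tool, text):
    """Map (tool, test name) -> status for the result lines in `text`.

    `tool` identifies which .sum file the text came from (e.g. "gcc" or
    "g++"), so identical names from different files do not collide.
    """
    results = {}
    for line in text.splitlines():
        status, sep, name = line.partition(": ")
        if sep and status in STATUSES:
            results[(tool, name)] = status
    return results


# Hypothetical shared test appearing under the same name in both files.
gcc = parse_sum("gcc", "PASS: c-c++-common/example.c (test for excess errors)\n")
gxx = parse_sum("g++", "FAIL: c-c++-common/example.c (test for excess errors)\n")

merged = {**gcc, **gxx}
print(len(merged))  # both entries survive because the keys differ
```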

> I assume that two gcc.sum from different builds will always refer to the
> same test/configuration when referring to (for example):
> PASS: gcc.c-torture/compile/20000105-1.c   -O1  (test for excess errors)

The problem is when e.g. multiple diagnostics are being tested for on the 
same line but the "test name" field in the dg-* directive is an empty 
string for all of them.  One possible approach is to automatically (in 
your regression checking scripts) append a serial number to the first, 
second, third etc. cases of any given repeated test name in a .sum file.  
Or you could count such duplicates as being errors that automatically 
result in red test results, and get fixes for them into GCC as soon as 
possible.
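
Both approaches can be sketched in a few lines of Python.  The function 
names, the " #N" suffix format and the example test names are my own 
choices for illustration, not anything existing scripts use; the sketch 
also ignores the (unlikely) case of a suffixed name colliding with a real 
one.

```python
from collections import Counter


def uniquify(names):
    """Append a serial number to each occurrence of a repeated name.

    Names that occur only once in the .sum file are left unchanged.
    """
    totals = Counter(names)
    seen = Counter()
    unique = []
    for name in names:
        if totals[name] > 1:
            seen[name] += 1
            unique.append(f"{name} #{seen[name]}")
        else:
            unique.append(name)
    return unique


def duplicates(names):
    """Return the repeated names, for treating duplicates as errors."""
    return sorted(n for n, count in Counter(names).items() if count > 1)


# Hypothetical names, in .sum order: two unnamed checks on the same line.
names = ["a.c (test for warnings, line 3)",
         "a.c (test for warnings, line 3)",
         "a.c (test for excess errors)"]

print(uniquify(names))
print(duplicates(names))
```

Note that serial numbers depend on the order of results within the file, 
so this only helps if that order is stable between the runs compared.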

-- 
Joseph S. Myers
jos...@codesourcery.com
