PASS -> ANY ; Test moves away from PASS
No, it's only a regression if the destination result is FAIL (if it's
UNRESOLVED then there may be a separate regression - an execution test
becoming UNRESOLVED should be accompanied by its compilation becoming FAIL).
If it's XFAIL, it might formally be a regression, but one already being
tracked in another way (presumably Bugzilla) which should not turn the bot
red. If it's XPASS, that simply means the XFAILing conditions are slightly
wider than necessary in order to mark a failure in another configuration as
expected.
My suggestion is:
PASS -> FAIL is an unambiguous regression.
Anything else -> FAIL, as well as new FAILing tests, isn't a regression
at the individual test level, but may be treated as one at the whole
testsuite level (a rough sketch of this classification follows).
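
Not a concrete proposal for the bot, just a rough Python sketch of that
classification, assuming each test has been reduced to an (old status,
new status) pair; the function name and statuses handled are my own
invention, not anything that exists today:

# Hypothetical classifier for status transitions between two runs.
# PASS -> FAIL is an individual regression; any other transition to
# FAIL (including a brand-new FAILing test, old_status None) only
# counts toward a whole-testsuite signal.
def classify(old_status, new_status):
    """Return 'regression', 'suite-level', or 'ok' for one transition."""
    if new_status != "FAIL":
        # Destination is not FAIL (XFAIL, XPASS, UNRESOLVED, ...):
        # not an individual-test regression under this scheme.
        return "ok"
    if old_status == "PASS":
        return "regression"      # unambiguous PASS -> FAIL
    return "suite-level"         # new test, or e.g. UNSUPPORTED -> FAIL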
I don't have a strong opinion on the definition of a Regression
in this context but I would very much like to see status changes
highlighted in the test results to indicate that something that
worked before no longer works as well, to help us spot the kinds
of problems I've run into and had trouble with. (Showing the
SVN revision number along with each transition would be great.)
Here are a couple of examples.
A recent change of mine caused a check in the target_supports.exp
file to fail to detect attribute ifunc support. That in turn
prevented regression tests for the attribute from being compiled
(changing them from PASS to UNSUPPORTED), which ultimately masked
a bug my change had introduced.
My script that looks for regressions in my own test results would
normally catch this before I commit such a change. Unfortunately,
the script ignores results with the UNSUPPORTED status, so this
bug slipped in unnoticed.
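
For what it's worth, here is a rough Python sketch of the kind of
comparison that would have flagged it, assuming the usual
"STATUS: test name" lines that DejaGnu writes to .sum files; this is
an illustration, not the script I actually use, and all names in it
are hypothetical:

import sys

STATUSES = ("PASS", "FAIL", "XFAIL", "XPASS",
            "UNRESOLVED", "UNSUPPORTED", "UNTESTED")

def read_sum(path):
    """Map test name -> status from a DejaGnu .sum file."""
    results = {}
    with open(path, errors="replace") as f:
        for line in f:
            status, sep, name = line.partition(": ")
            if sep and status in STATUSES:
                results[name.strip()] = status
    return results

old = read_sum(sys.argv[1])
new = read_sum(sys.argv[2])
for name, old_status in old.items():
    new_status = new.get(name, "missing")
    # Report PASS -> UNSUPPORTED (and any other move away from PASS)
    # instead of ignoring UNSUPPORTED, so a disabled check can't
    # silently mask a bug.
    if old_status == "PASS" and new_status != "PASS":
        print(f"{old_status} -> {new_status}: {name}")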
Regardless of whether or not these types of errors are considered
Regressions, highlighting them, perhaps in different colors, would
be helpful.
Any transition where the destination result is not FAIL is not a
regression.
ERRORs in the .sum or .log files should be watched for as well,
however, since they sometimes indicate broken Tcl syntax in the
testsuite, which can cause many tests not to be run.
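
Something as simple as the following would be enough to surface them
along with a bit of surrounding context (a rough Python sketch; it
assumes the lines of interest contain the literal string "ERROR:",
which is how they usually appear, and the context size is arbitrary):

import sys

def report_errors(path, context=2):
    """Print any line containing 'ERROR:' plus a little context."""
    with open(path, errors="replace") as f:
        lines = f.readlines()
    for i, line in enumerate(lines):
        if "ERROR:" in line:
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            print(f"--- {path}:{i + 1} ---")
            sys.stdout.write("".join(lines[start:end]))

for path in sys.argv[1:]:
    report_errors(path)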
Yes, please. I ran into a problem with a test that had a bad DejaGnu
directive. The test failed in a non-obvious way (I think it caused
an ERROR in the log) which caused a small number of tests that ran
after it to fail. Because of parallel make (I run tests with make
-j96) the failing tests changed from one run of the test suite to
the next, and the whole problem ended up being quite hard to debug.
(The ultimate root cause was a stray backslash in a dg-warning
directive, introduced by copying and pasting between an Emacs session
in one terminal and a vi session in another. The backslash was in
column 80 and so virtually impossible to see.)
Martin