On Tue, 2025-07-01 at 23:04 +0100, Joern Wolfgang Rennecke wrote:
> Quite often I see a test quickly written to test some new feature
> (bug fix, extension or optimization) that has a couple of functions
> to cover various aspects of the feature, checked all together with a
> single scan-tree-dump-times, scan-rtl-dump-times etc. check, using
> the value expected on the test writer's target.
> Or worse, it's all packed into one giant function, with unpredictable
> interactions between the different pieces of code.  I think we have
> fewer of those recently, but please don't interpret this post as a
> suggestion to fall back to that practice.
> 
> Quite often it turns out that the feature applies only to some of the
> functions / sites on some targets.  The first reaction is often to
> create multiple copies of the scan-*-dump-times stanza, with mutually
> exclusive conditions for each copy.  That might look harmless when
> there are only two cases, but as more are added, it quickly turns
> into an unmaintainable mess of dejagnu directives with complicated
> target conditions.
> 
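To make the problem concrete, a made-up illustration of that pattern
(the counts, dump text and target selectors here are all invented, not
taken from a real test) might look like this, with every copy kept
consistent by hand:

  /* Each copy must be updated whenever the expected count changes.  */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" { target aarch64*-*-* } } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target { i?86-*-* x86_64-*-* } } } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! { aarch64*-*-* i?86-*-* x86_64-*-* } } } } } */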
> This can get even worse if, on some targets, the compiler can emit
> the pattern multiple times for the same piece of source, like for
> vectorization that is tried with different vectorization factors.
> 
> I think we should discuss what is best practice to address these
> problems efficiently, and preferably how to write new tests that
> avoid them in the first place.
> 
> When each function has a single site per feature where success is
> given if the pattern appears at least once, a straightforward
> solution that has already been used a number of times is to split the
> test into multiple smaller tests.  The main disadvantages of this
> approach are that a large set of small files can clutter the
> directory where they appear, making it less maintainable, and that
> the compiler is invoked more often, generally with the same set of
> include files read each time, thus making the test runs slower.
> 
> Another approach would be to use source line numbers, where present
> and distinctive, in the scan pattern to make it specific to the site
> in question.  That should, for instance, work for vectorization
> scan-tree-dump-times tests.  The disadvantage of that approach is
> that the tests become more brittle, as the line numbers in the
> pattern have to be adjusted whenever the source site moves, like when
> new include files, dejagnu directives at the file start, or typedefs
> are needed.
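For illustration, a line-anchored check might look something like this
(the file name, line number and dump text are schematic, not taken
from a real test), and it has to be touched every time the code above
it moves:

  /* Assume the loop of interest starts on line 21 of line-anchored.c.  */
  /* { dg-final { scan-tree-dump-times "line-anchored\\.c:21:\[0-9\]+: .*loop vectorized" 1 "vect" } } */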

Brainstorming some ideas on other possible approaches to making our
tests less brittle; for context, I did some investigation back in 2018
into implementing "optimization remarks" like clang does: diagnostics
about optimization decisions, so you could have a dg directive like
this on a particular line:

  foo ();  /* { dg-remark "inlined call to 'foo' into 'bar'" } */

which eventually became this series of patches:

[PATCH 00/10] RFC: Prototype of compiler-assisted performance analysis
https://gcc.gnu.org/legacy-ml/gcc-patches/2018-05/msg01675.html

[PATCH] v3 of optinfo, remarks and optimization records
https://gcc.gnu.org/legacy-ml/gcc-patches/2018-06/msg01267.html

[PATCH 0/2] v4: optinfo framework and remarks
https://gcc.gnu.org/legacy-ml/gcc-patches/2018-07/msg00066.html

[PATCH 0/5] [RFC v2] Higher-level reporting of vectorization problems
https://gcc.gnu.org/ml/gcc-patches/2018-07/msg00446.html

where the "remark" idea eventually got dropped in favor of optimization
records (compressed json), which landed in GCC 9 as -fsave-
optimization-record.

Ideas for further approaches:
(a) we could revisit adding optimization remarks: perhaps the dump
subsystem could be extended so it also reports diagnostics, and we
could have DejaGnu directives that check for a remark relating to a
particular source line
(b) have a script that reads the compressed json and turns it into
something that's queryable from DejaGnu tests (a rough sketch of what
the testsuite-facing side might look like follows this list).  This
might be more flexible in that it can potentially distinguish between
different copies of code (e.g. due to different inlining sites), but
might be less easy to work with in terms of testsuite management.
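As a very rough sketch of what the testsuite-facing side of (b) could
look like (the directive name and its semantics are entirely
hypothetical; nothing like it exists today):

  /* Hypothetical directive: ask a helper script whether the optimization
     record for this translation unit contains a "loop vectorized" entry
     attributed to the following line.  */
  for (int i = 0; i < n; i++)  /* { dg-opt-record "loop vectorized" } */
    a[i] += b[i];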

Another idea: perhaps add a new dump format for RTL that resembles
diagnostics, with line information, and then use per-line dg directives
on it so that e.g. you can test that at a particular line we do or
don't have some particular construct after a given RTL pass (e.g. that
the asm for a particular line does/doesn't match a regex).
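Again purely hypothetically (neither the dump format nor the directive
exists), that might end up looking something like:

  /* Hypothetical directive: assert that the dump lines attributed to this
     source line after the "combine" pass do not match the given regex.  */
  x = y / z;  /* { dg-rtl-line-not "combine" "div" } */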

Hope this is constructive
Dave

> 
> Maybe we could get the best of both worlds if we add a new dump
> option?  Say, if we make that option add the (for polymorphic
> languages like C++: mangled) name of the current function to each
> dumped line that is interesting to scan for.  Or just every line, if
> that's simpler.
> 
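If something along those lines existed, a single check could perhaps
be keyed on the function name instead of on target selectors; purely
as a hypothetical illustration (neither the dump option nor the exact
dump text exists today):

  /* Hypothetical: each interesting dump line is prefixed with the mangled
     name of the containing function, so counts can be made per-function.  */
  /* { dg-final { scan-tree-dump-times "_Z3barv: .*loop vectorized" 1 "vect" } } */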
