Re: [PATCH 2/3] x86: Update memcpy/memset inline strategies for Skylake family CPUs

2021-04-06 Thread Hongyu Wang via Gcc-patches
> Do you know what of the three changes (preferring reps/stosb,
> CLEAR_RATIO and algorithm choice changes) cause the two speedups
> on EEMBC?

An extracted testcase from nnet_test is at https://godbolt.org/z/c8KdsohTP.

This loop is transformed into builtin_memcpy and builtin_memset calls of size 280.

The current strategy for Skylake is {512, unrolled_loop, false} for such a
size, so it generates unrolled loops with mov, while the patch generates a
memcpy/memset libcall, which uses vector moves.

For idctrn01, it is a memset of size 512, so the speedups come from the
algorithm change.
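To make the size-280 and size-512 cases concrete, here is a minimal C model of how a stringop_algs-style descriptor can be read: entries are scanned in order and the first one whose max is at least the block size (or is -1) wins. The struct and enum names are hypothetical simplifications, not GCC's types; the sketch assumes the second array element is the one used for 64-bit code, and it ignores the noalign flag and any other target checks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the algorithm names in the patch.  */
enum model_alg { M_LIBCALL, M_REP_PREFIX_1_BYTE, M_LOOP, M_UNROLLED_LOOP };

struct model_entry
{
  long max;           /* Largest size (bytes) this entry covers; -1 = no limit.  */
  enum model_alg alg; /* Algorithm chosen for sizes up to MAX.  */
};

/* Old 64-bit memcpy rules: {16, loop}, {512, unrolled_loop}, {-1, libcall}.  */
static const struct model_entry old_memcpy[] = {
  {16, M_LOOP}, {512, M_UNROLLED_LOOP}, {-1, M_LIBCALL}
};

/* Old 64-bit memset rules: {24, loop}, {512, unrolled_loop}, {-1, libcall}.  */
static const struct model_entry old_memset[] = {
  {24, M_LOOP}, {512, M_UNROLLED_LOOP}, {-1, M_LIBCALL}
};

/* New rules from the patch (the same for memcpy and memset).  */
static const struct model_entry new_rules[] = {
  {256, M_REP_PREFIX_1_BYTE}, {256, M_LOOP}, {-1, M_LIBCALL}
};

#define N_RULES(a) (sizeof (a) / sizeof ((a)[0]))

/* Return the algorithm of the first entry whose limit covers SIZE.  */
static enum model_alg
first_match (const struct model_entry *rules, size_t n, long size)
{
  assert (rules != NULL && size >= 0);
  for (size_t i = 0; i < n; i++)
    if (rules[i].max == -1 || size <= rules[i].max)
      return rules[i].alg;
  return M_LIBCALL;
}
```

Under this reading, a 280-byte copy selects unrolled_loop with the old table but libcall with the new one, and a 512-byte memset likewise moves from unrolled_loop to libcall.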

On Tue, Apr 6, 2021 at 5:55 AM H.J. Lu via Gcc-patches wrote:
>
> On Mon, Apr 5, 2021 at 2:14 PM Jan Hubicka  wrote:
> >
> > > >  /* skylake_cost should produce code tuned for Skylake familly of CPUs.  */
> > > >  static stringop_algs skylake_memcpy[2] =   {
> > > > -  {libcall, {{1024, rep_prefix_4_byte, true}, {-1, libcall, false}}},
> > > > -  {libcall, {{16, loop, false}, {512, unrolled_loop, false},
> > > > -             {-1, libcall, false}}}};
> > > > +  {libcall,
> > > > +   {{256, rep_prefix_1_byte, true},
> > > > +    {256, loop, false},
> > > > +    {-1, libcall, false}}},
> > > > +  {libcall,
> > > > +   {{256, rep_prefix_1_byte, true},
> > > > +    {256, loop, false},
> > > > +    {-1, libcall, false}}}};
> > > >
> > > >  static stringop_algs skylake_memset[2] = {
> > > > -  {libcall, {{6, loop_1_byte, true},
> > > > -             {24, loop, true},
> > > > -             {8192, rep_prefix_4_byte, true},
> > > > -             {-1, libcall, false}}},
> > > > -  {libcall, {{24, loop, true}, {512, unrolled_loop, false},
> > > > -             {-1, libcall, false}}}};
> > > > +  {libcall,
> > > > +   {{256, rep_prefix_1_byte, true},
> > > > +    {256, loop, false},
> > > > +    {-1, libcall, false}}},
> > > > +  {libcall,
> > > > +   {{256, rep_prefix_1_byte, true},
> > > > +    {256, loop, false},
> > > > +    {-1, libcall, false}}}};
> > > >
> > >
> > > If there are no objections, I will check it in on Wednesday.
> >
> > On my skylake notebook if I run the benchmarking script I get:
> >
> > jan@skylake:~/trunk/contrib> ./bench-stringop 64 64000 gcc -march=native
> > memcpy
> >   block size  libcall rep1    noalg   rep4    noalg   rep8    noalg   loop    noalg   unrl    noalg   sse     noalg   byte    PGO     dynamic BEST
> >      8192000  0:00.23 0:00.21 0:00.21 0:00.21 0:00.21 0:00.22 0:00.24 0:00.28 0:00.22 0:00.20 0:00.21 0:00.19 0:00.19 0:00.77 0:00.18 0:00.18 0:00.19 sse
> >       819200  0:00.09 0:00.18 0:00.18 0:00.18 0:00.18 0:00.18 0:00.20 0:00.19 0:00.16 0:00.15 0:00.16 0:00.13 0:00.14 0:00.63 0:00.09 0:00.09 0:00.09 libcall
> >        81920  0:00.06 0:00.07 0:00.07 0:00.06 0:00.06 0:00.06 0:00.06 0:00.12 0:00.11 0:00.11 0:00.10 0:00.07 0:00.08 0:00.66 0:00.11 0:00.06 0:00.06 libcall
> >        20480  0:00.06 0:00.07 0:00.05 0:00.06 0:00.07 0:00.07 0:00.08 0:00.14 0:00.14 0:00.10 0:00.11 0:00.06 0:00.07 0:01.11 0:00.07 0:00.09 0:00.05 rep1noalign
> >         8192  0:00.06 0:00.05 0:00.04 0:00.05 0:00.06 0:00.07 0:00.07 0:00.12 0:00.15 0:00.11 0:00.10 0:00.06 0:00.06 0:00.64 0:00.06 0:00.05 0:00.04 rep1noalign
> >         4096  0:00.05 0:00.05 0:00.05 0:00.06 0:00.07 0:00.05 0:00.05 0:00.09 0:00.14 0:00.11 0:00.10 0:00.07 0:00.06 0:00.61 0:00.05 0:00.07 0:00.05 libcall
> >         2048  0:00.04 0:00.05 0:00.05 0:00.05 0:00.05 0:00.05 0:00.05 0:00.10 0:00.14 0:00.09 0:00.09 0:00.09 0:00.07 0:00.64 0:00.06 0:00.07 0:00.04 libcall
> >         1024  0:00.06 0:00.08 0:00.08 0:00.10 0:00.11 0:00.06 0:00.06 0:00.12 0:00.15 0:00.09 0:00.09 0:00.16 0:00.09 0:00.63 0:00.05 0:00.06 0:00.06 libcall
> >          512  0:00.06 0:00.07 0:00.08 0:00.12 0:00.08 0:00.10 0:00.09 0:00.13 0:00.16 0:00.10 0:00.10 0:00.28 0:00.18 0:00.66 0:00.13 0:00.08 0:00.06 libcall
> >          256  0:00.10 0:00.12 0:00.11 0:00.14 0:00.11 0:00.12 0:00.13 0:00.14 0:00.16 0:00.13 0:00.12 0:00.49 0:00.30 0:00.68 0:00.14 0:00.12 0:00.10 libcall
> >          128  0:00.15 0:00.19 0:00.18 0:00.20 0:00.19 0:00.20 0:00.18 0:00.19 0:00.21 0:00.17 0:00.15 0:00.49 0:00.43 0:00.72 0:00.17 0:00.17 0:00.15 libcall
> >           64  0:00.29 0:00.28 0:00.29 0:00.33 0:00.33 0:00.34 0:00.29 0:00.25 0:00.29 0:00.26 0:00.26 0:01.01 0:00.97 0:01.13 0:00.32 0:00.28 0:00.25 loop
> >           48  0:00.37 0:00.39 0:00.38 0:00.45 0:00.41 0:00.45 0:00.44 0:00.45 0:00.33 0:00.32 0:00.33 0:02.21 0:02.22 0:00.87 0:00.32 0:00.31 0:00.32 unrl
> >           32  0:00.54 0:00.52 0:00.50 0:00.60 0:00.62 0:00.61 0:00.52 0:00.42 0:00.43 0:00.40 0:00.42 0:01.18 0:01.16 0:01.14 0:00.39 0:00.40 0:00.40 unrl
> >           24  0:00.71 0:00.74 0:00.77 0:00.83 0:00.78 0:00.81 0:00.75 0:00.52 0:00.52 0:00.52 0:00.50 0:02.28 0:02.27 0:00.94 0:00.49 0:00.50 0:00.50 unrlnoalign
> >           16  0:00.97 0:01.03 0:01

[committed] testsuite: Fix up g++.dg/ext/vector40.C test [PR97900]

2021-04-06 Thread Jakub Jelinek via Gcc-patches
On Sat, Apr 03, 2021 at 01:53:16AM -0400, Jason Merrill via Gcc-patches wrote:
> We were copying attributes from the template to the instantiation without
> considering that they might be dependent.  To make sure that the new parms
> have the appropriate properties for the code pattern, let's just regenerate
> them.

The test FAILs on i686-linux due to -Wpsabi diagnostics (when neither -mmmx
nor -msse is enabled).
Fixed the usual way; tested on x86_64-linux with -m32/-mno-mmx/-mno-sse,-m32,-m64,
both with the PR97900 fix reverted and with current trunk, and committed to
trunk as obvious.

2021-04-06  Jakub Jelinek  

PR c++/97900
* g++.dg/ext/vector40.C: Add -Wno-psabi -w to dg-options.

--- gcc/testsuite/g++.dg/ext/vector40.C.jj  2021-04-03 10:00:54.309544456 +0200
+++ gcc/testsuite/g++.dg/ext/vector40.C 2021-04-06 11:45:12.520060058 +0200
@@ -1,4 +1,5 @@
 // PR c++/97900
+// { dg-options "-Wno-psabi -w" }
 
 template
 T test(T __attribute__((vector_size(2 * sizeof(T)))) vec) {

Jakub



Re: [PATCH 2/3] x86: Update memcpy/memset inline strategies for Skylake family CPUs

2021-04-06 Thread Jan Hubicka
> > Do you know what of the three changes (preferring reps/stosb,
> > CLEAR_RATIO and algorithm choice changes) cause the two speedups
> > on EEMBC?
> 
> A extracted testcase from nnet_test in https://godbolt.org/z/c8KdsohTP
> 
> This loop is transformed to builtin_memcpy and builtin_memset with size 280.
> 
> Current strategy for skylake is {512, unrolled_loop, false} for such
> size, so it will generate unrolled loops with mov, while the patch
> generates memcpy/memset libcall and uses vector move.

This is good. I originally set the table based on this
micro-benchmarking script, and apparently the glibc in use at that time
had a more expensive memcpy for small blocks.

One thing to consider, however, is that calling an external memcpy also
has the additional cost of clobbering all caller-saved registers.  This
is especially painful for code that uses SSE, since all live values in
SSE registers then have to be spilled to the stack.  So I am not
completely sure how representative the micro-benchmark is in this
respect, since it does not use any SSE and register pressure is
generally low.

So with current glibc it seems a libcall is a win for blocks larger
than 64 or 128 bytes, at least if register pressure is not high.
In this respect your change looks good.
> >
> > My patch generates "rep movsb" only in very limited cases:
> >
> > 1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector
> >    load and store for up to 16 * 16 (256) bytes when the data size is
> >    fixed and known.
> > 2. Inline only if data size is known to be <= 256.
> >    a. Use "rep movsb/stosb" with a simple code sequence if the data size
> >       is a constant.
> >    b. Use loop if data size is not a constant.

Aha, this is very hard to read from the algorithm descriptor.  So we
still have the check that maxsize == minsize, and use rep movsb only for
constant-sized blocks when the corresponding TARGET macro is defined.

I think it would be more readable if we introduced rep_1_byte_constant.
The descriptor is supposed to read as a sequence of rules where the
first matching one applies.  It is not obvious that there is another
TARGET_* macro that causes rep_1_byte to be ignored in some cases.
(The TARGET macro will also interfere with the micro-benchmarking script.)
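The readability problem shows up in a small model. The sketch below is hypothetical (it is not GCC's decide_alg) and assumes the TARGET_* macro simply makes a rep_prefix_1_byte entry inapplicable when the block size is not a compile-time constant. Read as plain first-match rules, the {256, loop} entry can never be reached, because the {256, rep_prefix_1_byte} entry before it covers exactly the same sizes; it only becomes reachable through the extra, invisible rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the algorithm names in the descriptor.  */
enum model_alg { M_LIBCALL, M_REP_PREFIX_1_BYTE, M_LOOP };

struct model_entry { long max; enum model_alg alg; };  /* max -1 = no limit */

/* The patch's table: {256, rep_prefix_1_byte}, {256, loop}, {-1, libcall}.  */
static const struct model_entry skylake_rules[] = {
  {256, M_REP_PREFIX_1_BYTE}, {256, M_LOOP}, {-1, M_LIBCALL}
};

/* Pick the first entry covering SIZE (the known maximum size in bytes).
   When REP_NEEDS_CONSTANT is true, model the assumed hidden TARGET_*
   check by skipping rep_prefix_1_byte for non-constant sizes.  */
static enum model_alg
pick_alg (const struct model_entry *rules, size_t n, long size,
          bool size_is_constant, bool rep_needs_constant)
{
  assert (size >= 0);
  for (size_t i = 0; i < n; i++)
    {
      if (rules[i].max != -1 && size > rules[i].max)
        continue;
      if (rules[i].alg == M_REP_PREFIX_1_BYTE
          && rep_needs_constant && !size_is_constant)
        continue;  /* The rule that the descriptor itself does not show.  */
      return rules[i].alg;
    }
  return M_LIBCALL;
}
```

With rep_needs_constant set to false, a 200-byte non-constant copy still selects rep_prefix_1_byte, so the loop entry is dead under a plain first-match reading; a distinct name such as the proposed rep_1_byte_constant would put the rule into the table itself.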

Still, I do not understand why a compile-time constant size makes rep
movsb/stosb better than a loop.  Is the CPU special-casing it at decode
time and requiring an explicit mov instruction?  Or is it only because
rep movsb is not good for blocks smaller than 128bit?

> >
> > As a result,  "rep stosb" is generated only when 128 < data size < 256
> > with -mno-sse.
> >
> > > Do you have some data for blocks in size 8...256 to be faster with rep1
> > > compared to unrolled loop for perhaps more real world benchmarks?
> >
> > "rep movsb" isn't generated with my patch in this case since
> > MOVE_RATIO == 17 can copy up to 16 * 16 (256) bytes with
> > XMM registers.
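H.J.'s rules can be summarized as a small decision function. This is a hypothetical model, not GCC code: it assumes a MOVE_RATIO/CLEAR_RATIO of 17 allows up to 16 moves of MOVE_WIDTH bytes each (16 bytes with SSE, 8 with -mno-sse), and it treats a block as constant-sized when its known minimum and maximum sizes are equal, as in the maxsize == minsize check mentioned above.

```c
#include <assert.h>

/* Hypothetical names for the outcomes described in the thread.  */
enum strategy { BY_PIECES, REP_MOVSB, LOOP, LIBCALL };

#define MODEL_MOVE_RATIO 17   /* Assumed MOVE_RATIO / CLEAR_RATIO value.  */
#define MODEL_INLINE_MAX 256  /* Inline only if the size is known <= 256.  */

/* MIN_SIZE and MAX_SIZE are the known bounds on the block size in bytes;
   MOVE_WIDTH is the widest single load/store in bytes.  */
static enum strategy
choose_strategy (unsigned long min_size, unsigned long max_size,
                 unsigned long move_width)
{
  assert (min_size <= max_size && move_width > 0);
  int is_constant = (min_size == max_size);
  /* Fewer than MOVE_RATIO moves, i.e. at most 16 * MOVE_WIDTH bytes.  */
  unsigned long pieces_limit = (MODEL_MOVE_RATIO - 1) * move_width;

  if (is_constant && max_size <= pieces_limit)
    return BY_PIECES;                       /* Rule 1: loads and stores.  */
  if (max_size <= MODEL_INLINE_MAX)
    return is_constant ? REP_MOVSB : LOOP;  /* Rules 2a and 2b.  */
  return LIBCALL;
}
```

With a 16-byte move width every constant size up to 256 is handled by pieces, so REP_MOVSB is only reachable with an 8-byte move width, for constant sizes above 128 bytes up to the 256-byte limit, in line with the -mno-sse observation quoted above.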

OK, so I guess:
  {libcall,
   {{256, rep_prefix_1_byte, true},
    {256, unrolled_loop, false},
    {-1, libcall, false}}},
  {libcall,
   {{256, rep_prefix_1_byte, true},
    {256, unrolled_loop, false},
    {-1, libcall, false}}}};

may still perform better, but the difference between loop and unrolled
loop is within a 10% margin.

So I guess the patch is OK, and we should look into cleaning up the
descriptors.  I can make a patch for that once I understand the logic above.

Honza
> >
> > > The difference seems to get quite big for small blocks in the range 8...16
> > > bytes.  I noticed that before and sort of concluded that it is probably
> > > the branch prediction playing relatively well for those small block
> > > sizes.  On the other hand, winding up the relatively long unrolled loop is
> > > not very cool just to catch this case.
> > >
> > > Do you know what of the three changes (preferring reps/stosb,
> > > CLEAR_RATIO and algorithm choice changes) cause the two speedups
> > > on EEMBC?
> >
> > Hongyu, can you find out where the speedup came from?
> >
> > Thanks.
> >
> > --
> > H.J.


vect: Don't split store groups if we have IFN_STORE_LANES [PR99873]

2021-04-06 Thread Richard Sandiford via Gcc-patches
As noted in the PR, we were no longer using ST3 for the testcase and
instead stored each lane individually.  This is because we'd split
the store group during SLP and couldn't recover when SLP failed.

However, we seem to get better code with ST3 and ST4 even if
SLP would have succeeded, such as for vect-complex-5.c.
I think the best thing for GCC 11 would therefore be to skip
the split entirely if we could use IFN_STORE_LANES.  A downside
of this is that SLP can handle smaller iteration counts than
IFN_STORE_LANES can, but we don't have the infrastructure to
choose reliably based on that.
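For readers unfamiliar with store-lanes instructions: an ST3-style store writes three vectors to memory interleaved element by element, so a group of three consecutive stores per iteration maps onto a single instruction. The scalar C model below only illustrates that memory layout; the function name and the fixed lane count are assumptions for illustration, not GCC internals or Arm intrinsics.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_LANES 4  /* Hypothetical number of elements per vector.  */

/* Scalar model of a 3-way interleaving store: lane I of the K-th input
   vector goes to OUT[3 * I + K], giving a0 b0 c0 a1 b1 c1 ...  */
static void
model_store_lanes3 (int *out, const int a[MODEL_LANES],
                    const int b[MODEL_LANES], const int c[MODEL_LANES])
{
  assert (out != NULL);
  for (size_t i = 0; i < MODEL_LANES; i++)
    {
      out[3 * i + 0] = a[i];
      out[3 * i + 1] = b[i];
      out[3 * i + 2] = c[i];
    }
}
```

Without such an instruction, each lane may end up being stored individually, as the PR describes.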

Tested on aarch64-linux-gnu (with and without SVE), arm-linux-gnueabihf,
armeb-eabi and x86_64-linux-gnu.  OK to install?

Richard


gcc/
PR tree-optimization/99873
* tree-vect-slp.c (vect_could_use_store_lanes_p): New function.
(vect_build_slp_instance): Don't split store groups that could
use IFN_STORE_LANES.

gcc/testsuite/
* gcc.dg/vect/slp-21.c: Only expect 2 of the loops to use SLP
if IFN_STORE_LANES is available.
* vect/vect-complex-5.c: Expect no loops to use SLP if
IFN_STORE_LANES is available.
* gcc.target/aarch64/pr99873.c: New test.
* gcc.target/aarch64/sve/pr99873.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/slp-21.c |  4 ++--
 gcc/testsuite/gcc.dg/vect/vect-complex-5.c |  3 ++-
 gcc/testsuite/gcc.target/aarch64/pr99873.c | 17 +
 gcc/testsuite/gcc.target/aarch64/sve/pr99873.c | 15 +++
 gcc/tree-vect-slp.c| 15 ++-
 5 files changed, 50 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr99873.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr99873.c

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index ceec7f5c410..b0c03da3aeb 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -2458,6 +2458,18 @@ vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
   return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, visited);
 }
 
+/* Return true if the store group in STMT_INFO could use IFN_STORE_LANES.
+   GROUP_SIZE is the number of elements in the group.  */
+
+static bool
+vect_could_use_store_lanes_p (vec_info *vinfo, stmt_vec_info stmt_info,
+ unsigned int group_size)
+{
+  tree scalar_type = TREE_TYPE (DR_REF (STMT_VINFO_DATA_REF (stmt_info)));
+  tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
+  return vectype && vect_store_lanes_supported (vectype, group_size, false);
+}
+
 /* Analyze an SLP instance starting from a group of grouped stores.  Call
vect_build_slp_tree to build a tree of packed stmts if possible.
Return FALSE if it's impossible to SLP any stmt in the loop.  */
@@ -2693,7 +2705,8 @@ vect_build_slp_instance (vec_info *vinfo,
 
   /* For loop vectorization split into arbitrary pieces of size > 1.  */
  if (is_a <loop_vec_info> (vinfo)
- && (i > 1 && i < group_size))
+ && (i > 1 && i < group_size)
+ && !vect_could_use_store_lanes_p (vinfo, stmt_info, group_size))
{
  unsigned group1_size = i;
 
diff --git a/gcc/testsuite/gcc.dg/vect/slp-21.c b/gcc/testsuite/gcc.dg/vect/slp-21.c
index bf8f434dd50..85393975b45 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-21.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-21.c
@@ -210,7 +210,7 @@ int main (void)
 
Not all vect_perm targets support that, and it's a bit too specific to have
its own effective-target selector, so we just test targets directly.  */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target { aarch64*-*-* arm*-*-* powerpc64*-*-* } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_strided4 && { ! { aarch64*-*-* arm*-*-* powerpc64*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target powerpc64*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_strided4 && { ! powerpc64*-*-* } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  { target { ! { vect_strided4 } } } } } */
   
diff --git a/gcc/testsuite/gcc.dg/vect/vect-complex-5.c b/gcc/testsuite/gcc.dg/vect/vect-complex-5.c
index 81fdb67ce81..addcf60438c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-complex-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-complex-5.c
@@ -40,4 +40,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail { ! vect_hw_misalign } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target vect_load_lanes } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { ! vect_load_lanes } xfail { ! vect_hw_misalign } } } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/pr99873.c 

vect: Restore variable-length SLP permutes [PR97513]

2021-04-06 Thread Richard Sandiford via Gcc-patches
Many of the gcc.target/sve/slp-perm*.c tests started failing
after the introduction of separate SLP permute nodes.
This patch adds variable-length support using a similar
technique to vect_transform_slp_perm_load.

As there, the idea is to detect when every permute mask vector
is the same and can be generated using a regular stepped sequence.
We can easily handle those cases for variable-length, but still
need to restrict the general case to constant-length.
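As a rough illustration of why one mask can suffice: assume every vector holds N / L copies of the node's L lanes, where N (elements per vector) is a multiple of L. Output element j of each result vector then takes lane lane[j % L] of input src[j % L] from the same group of lanes, so its mask entry is src * N + (j / L) * L + lane, independent of which result vector is being built, and each of the L patterns advances by a constant step of L. The C model below works under those assumptions (two inputs, a fixed N) and follows the VEC_PERM_EXPR convention that mask values index the concatenation of the two inputs; the helper names are hypothetical and this is not the vectorizer's code.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NELTS 4  /* Hypothetical number of elements per vector.  */

/* Build the mask for a node with L lanes whose output lane K comes from
   lane LANE[K] of input SRC[K] (0 = first input, 1 = second input).  */
static void
build_repeating_mask (size_t l, const int src[], const int lane[],
                      int mask[MODEL_NELTS])
{
  assert (l > 0 && MODEL_NELTS % l == 0);
  for (size_t j = 0; j < MODEL_NELTS; j++)
    mask[j] = src[j % l] * MODEL_NELTS + (int) ((j / l) * l) + lane[j % l];
}

/* Model of VEC_PERM_EXPR <V0, V1, MASK>: MASK indexes V0 followed by V1.  */
static void
model_vec_perm (const int v0[MODEL_NELTS], const int v1[MODEL_NELTS],
                const int mask[MODEL_NELTS], int out[MODEL_NELTS])
{
  for (size_t j = 0; j < MODEL_NELTS; j++)
    out[j] = mask[j] < MODEL_NELTS ? v0[mask[j]] : v1[mask[j] - MODEL_NELTS];
}
```

For two lanes taken as (second input, lane 0) and (first input, lane 1), the mask is 4, 1, 6, 3: two interleaved patterns (4, 6, ... and 1, 3, ...) that each step by L = 2, and the same mask serves every result vector.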

Again copying vect_transform_slp_perm_load, the idea is to distinguish
the two cases regardless of whether the length is variable or not,
partly to increase testing coverage and partly because it avoids
generating redundant trees.

Doing this means that we can also use SLP for the two-vector
permute in pr88834.c, which we couldn't before VEC_PERM_EXPR
nodes were introduced.  The patch therefore makes pr88834.c
check that we don't regress back to not using SLP and adds
pr88834_ld3.c to check for the original problem in the PR.

Tested on aarch64-linux-gnu (with and without SVE), arm-linux-gnueabihf,
armeb-eabi and x86_64-linux-gnu.  OK to install?

Richard


gcc/
PR tree-optimization/97513
* tree-vect-slp.c (vect_add_slp_permutation): New function,
split out from...
(vectorizable_slp_permutation): ...here.  Detect cases in which
all VEC_PERM_EXPRs are guaranteed to have the same stepped
permute vector and only generate one permute vector for that case.
Extend that case to handle variable-length vectors.

gcc/testsuite/
* gcc.target/aarch64/sve/pr88834.c: Expect the vectorizer to use SLP.
* gcc.target/aarch64/sve/pr88834_ld3.c: New test.
---
 .../gcc.target/aarch64/sve/pr88834.c  |   5 +-
 .../gcc.target/aarch64/sve/pr88834_ld3.c  |  16 ++
 gcc/tree-vect-slp.c   | 218 --
 3 files changed, 167 insertions(+), 72 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr88834_ld3.c

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index b0c03da3aeb..aef2104774a 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -5813,6 +5813,57 @@ vect_transform_slp_perm_load (vec_info *vinfo,
   return true;
 }
 
+/* Produce the next vector result for SLP permutation NODE by adding a vector
+   statement at GSI.  If MASK_VEC is nonnull, add:
+
+   = VEC_PERM_EXPR <FIRST_DEF, SECOND_DEF, MASK_VEC>
+
+   otherwise add:
+
+   = FIRST_DEF.  */
+
+static void
+vect_add_slp_permutation (vec_info *vinfo, gimple_stmt_iterator *gsi,
+ slp_tree node, tree first_def, tree second_def,
+ tree mask_vec)
+{
+  tree vectype = SLP_TREE_VECTYPE (node);
+
+  /* ???  We SLP match existing vector element extracts but
+ allow punning which we need to re-instantiate at uses
+ but have no good way of explicitly representing.  */
+  if (!types_compatible_p (TREE_TYPE (first_def), vectype))
+{
+  gassign *conv_stmt
+   = gimple_build_assign (make_ssa_name (vectype),
+  build1 (VIEW_CONVERT_EXPR, vectype, first_def));
+  vect_finish_stmt_generation (vinfo, NULL, conv_stmt, gsi);
+  first_def = gimple_assign_lhs (conv_stmt);
+}
+  gassign *perm_stmt;
+  tree perm_dest = make_ssa_name (vectype);
+  if (mask_vec)
+{
+  if (!types_compatible_p (TREE_TYPE (second_def), vectype))
+   {
+ gassign *conv_stmt
+   = gimple_build_assign (make_ssa_name (vectype),
+  build1 (VIEW_CONVERT_EXPR,
+  vectype, second_def));
+ vect_finish_stmt_generation (vinfo, NULL, conv_stmt, gsi);
+ second_def = gimple_assign_lhs (conv_stmt);
+   }
+  perm_stmt = gimple_build_assign (perm_dest, VEC_PERM_EXPR,
+  first_def, second_def,
+  mask_vec);
+}
+  else
+/* We need a copy here in case the def was external.  */
+perm_stmt = gimple_build_assign (perm_dest, first_def);
+  vect_finish_stmt_generation (vinfo, NULL, perm_stmt, gsi);
+  /* Store the vector statement in NODE.  */
+  SLP_TREE_VEC_STMTS (node).quick_push (perm_stmt);
+}
 
 /* Vectorize the SLP permutations in NODE as specified
in SLP_TREE_LANE_PERMUTATION which is a vector of pairs of SLP
@@ -5836,15 +5887,21 @@ vectorizable_slp_permutation (vec_info *vinfo, gimple_stmt_iterator *gsi,
  arbitrary mismatches.  */
   slp_tree child;
   unsigned i;
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  bool repeating_p = multiple_p (nunits, SLP_TREE_LANES (node));
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
-if (!vect_maybe_update_slp_op_vectype (child, vectype)
-   || !types_compatible_p (SLP_TREE_VECTYPE (child), vectype))
-  {
-   if (dump_enabled_p ())
- dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-  "Unsupported lane permutation\n");
-   

[PATCH] tree-optimization/99924 - visit permute nodes again when partitioning

2021-04-06 Thread Richard Biener
Since SLP graph partitioning works on scalar stmts (because it's done
for costing) we have to make sure to visit permute nodes multiple
times since they will not pull partitions together.
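A toy model of the change: in a graph walk that records visited nodes, nodes without scalar statements (such as the permute nodes mentioned here) are no longer recorded, so they are walked again each time they are reached. The node layout and names are hypothetical; only the visited condition mirrors the one-line diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CHILDREN 2

/* Hypothetical stand-in for an SLP node.  */
struct toy_node
{
  size_t n_scalar_stmts;   /* 0 for a permute-only node.  */
  bool visited;
  size_t times_walked;     /* How often the walk processed this node.  */
  size_t n_children;
  struct toy_node *children[TOY_MAX_CHILDREN];
};

/* Walk NODE and its children.  Only nodes with scalar stmts are marked
   visited, mirroring
     if (!SLP_TREE_SCALAR_STMTS (node).is_empty () && visited.add (node))  */
static void
toy_partition_walk (struct toy_node *node)
{
  if (node->n_scalar_stmts != 0)
    {
      if (node->visited)
        return;
      node->visited = true;
    }
  node->times_walked++;
  for (size_t i = 0; i < node->n_children; i++)
    toy_partition_walk (node->children[i]);
}
```

Reaching the same permute node from two instances processes it twice, while its child with scalar statements is still processed only once.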

Bootstrapped / tested on x86_64-unknown-linux-gnu, pushed.

2021-04-06  Richard Biener  

PR tree-optimization/99924
* tree-vect-slp.c (vect_bb_partition_graph_r): Do not mark
nodes w/o scalar stmts as visited.

* gfortran.dg/vect/pr99924.f90: New testcase.
---
 gcc/testsuite/gfortran.dg/vect/pr99924.f90 | 12 
 gcc/tree-vect-slp.c|  2 +-
 2 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/vect/pr99924.f90

diff --git a/gcc/testsuite/gfortran.dg/vect/pr99924.f90 b/gcc/testsuite/gfortran.dg/vect/pr99924.f90
new file mode 100644
index 000..f271ea1d0d5
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/vect/pr99924.f90
@@ -0,0 +1,12 @@
+! { dg-do compile }
+! { dg-additional-options "-march=armv8.3-a" { target aarch64-*-* } }
+subroutine cunhj (tfn, asum, bsum)
+  implicit none
+  complex :: up, tfn, asum, bsum
+  real :: ar
+
+  up = tfn * ar
+  bsum = up + ar
+  asum = up + asum
+  return
+end subroutine cunhj
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index ceec7f5c410..58dedfc35b7 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -4224,7 +4224,7 @@ vect_bb_partition_graph_r (bb_vec_info bb_vinfo,
   stmt_instance = instance;
 }
 
-  if (visited.add (node))
+  if (!SLP_TREE_SCALAR_STMTS (node).is_empty () && visited.add (node))
 return;
 
   slp_tree child;
-- 
2.26.2


Re: [PATCH] bswap: Handle bswapping of pointers [PR96573]

2021-04-06 Thread Jakub Jelinek via Gcc-patches
On Thu, Apr 01, 2021 at 02:16:55PM +0100, Alex Coplan via Gcc-patches wrote:
> FYI, I'm seeing the new test failing on aarch64:
> 
> PASS: gcc.dg/pr96573.c (test for excess errors)
> FAIL: gcc.dg/pr96573.c scan-tree-dump optimized "__builtin_bswap"

The vectorizer in the aarch64 case manages to emit a VEC_PERM_EXPR instead
(which is just as efficient).
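Why the two forms are equivalent: permuting the eight bytes of a 64-bit value with the selector 7, 6, 5, 4, 3, 2, 1, 0 reverses the byte order, which is exactly what a 64-bit bswap computes, regardless of endianness. The portable C check below applies that selector to a byte array; the helper name is hypothetical, and __builtin_bswap64 (a GCC/Clang builtin) is used only as the reference.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Apply the byte selector {7, 6, 5, 4, 3, 2, 1, 0} to the bytes of X,
   the same permutation the VEC_PERM_EXPR in the dump uses.  */
static uint64_t
permute_bytes_7_to_0 (uint64_t x)
{
  static const unsigned char sel[8] = {7, 6, 5, 4, 3, 2, 1, 0};
  unsigned char in[8], out[8];
  memcpy (in, &x, sizeof in);
  for (int i = 0; i < 8; i++)
    out[i] = in[sel[i]];
  memcpy (&x, out, sizeof out);
  return x;
}
```

This is why the updated scan pattern can accept either __builtin_bswap or a VEC_PERM_EXPR with that selector.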

So, do we want to go for the following (and/or perhaps also restrict the test to
a couple of targets where it works?  In my last distro build it failed only
on aarch64-linux, while armv7hl-linux-gnueabi and
{i686,x86_64,powerpc64le,s390x}-linux were fine)?

2021-04-06  Jakub Jelinek  

PR tree-optimization/96573
* gcc.dg/pr96573.c: Instead of __builtin_bswap accept also
VEC_PERM_EXPR with bswapping permutation.

--- gcc/testsuite/gcc.dg/pr96573.c.jj   2021-04-01 10:50:56.238629197 +0200
+++ gcc/testsuite/gcc.dg/pr96573.c  2021-04-06 12:20:16.314520746 +0200
@@ -2,7 +2,7 @@
 /* { dg-do compile { target { lp64 || ilp32 } } } */
 /* { dg-require-effective-target bswap } */
 /* { dg-options "-O3 -fdump-tree-optimized" } */
-/* { dg-final { scan-tree-dump "__builtin_bswap" "optimized" } } */
+/* { dg-final { scan-tree-dump "__builtin_bswap\|VEC_PERM_EXPR\[^\n\r]*7, 6, 5, 4, 3, 2, 1, 0" "optimized" } } */
 
 typedef __SIZE_TYPE__ size_t;
 


Jakub



Re: [PATCH] bswap: Handle bswapping of pointers [PR96573]

2021-04-06 Thread Richard Biener
On Tue, 6 Apr 2021, Jakub Jelinek wrote:

> On Thu, Apr 01, 2021 at 02:16:55PM +0100, Alex Coplan via Gcc-patches wrote:
> > FYI, I'm seeing the new test failing on aarch64:
> > 
> > PASS: gcc.dg/pr96573.c (test for excess errors)
> > FAIL: gcc.dg/pr96573.c scan-tree-dump optimized "__builtin_bswap"
> 
> The vectorizer in the aarch64 case manages to emit a VEC_PERM_EXPR instead
> (which is just as efficient).
> 
> So, do we want to go for the following (and/or perhaps also restrict the test to
> a couple of targets where it works?  In my last distro build it failed only
> on aarch64-linux, while armv7hl-linux-gnueabi and
> {i686,x86_64,powerpc64le,s390x}-linux were fine)?

Works for me.

> 2021-04-06  Jakub Jelinek  
> 
>   PR tree-optimization/96573
>   * gcc.dg/pr96573.c: Instead of __builtin_bswap accept also
>   VEC_PERM_EXPR with bswapping permutation.
> 
> --- gcc/testsuite/gcc.dg/pr96573.c.jj 2021-04-01 10:50:56.238629197 +0200
> +++ gcc/testsuite/gcc.dg/pr96573.c2021-04-06 12:20:16.314520746 +0200
> @@ -2,7 +2,7 @@
>  /* { dg-do compile { target { lp64 || ilp32 } } } */
>  /* { dg-require-effective-target bswap } */
>  /* { dg-options "-O3 -fdump-tree-optimized" } */
> -/* { dg-final { scan-tree-dump "__builtin_bswap" "optimized" } } */
> +/* { dg-final { scan-tree-dump "__builtin_bswap\|VEC_PERM_EXPR\[^\n\r]*7, 6, 5, 4, 3, 2, 1, 0" "optimized" } } */
>  
>  typedef __SIZE_TYPE__ size_t;
>  
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: vect: Don't split store groups if we have IFN_STORE_LANES [PR99873]

2021-04-06 Thread Richard Biener via Gcc-patches
On Tue, Apr 6, 2021 at 12:03 PM Richard Sandiford via Gcc-patches
 wrote:
>
> As noted in the PR, we were no longer using ST3 for the testcase and
> instead stored each lane individually.  This is because we'd split
> the store group during SLP and couldn't recover when SLP failed.
>
> However, we seem to get better code with ST3 and ST4 even if
> SLP would have succeeded, such as for vect-complex-5.c.
> I think the best thing for GCC 11 would therefore be to skip
> the split entirely if we could use IFN_STORE_LANES.  A downside
> of this is that SLP can handle smaller iteration counts than
> IFN_STORE_LANES can, but we don't have the infrastructure to
> choose reliably based on that.
>
> Tested on aarch64-linux-gnu (with and without SVE), arm-linux-gnueabihf,
> armeb-eabi and x86_64-linux-gnu.  OK to install?

One of the cases where splitting helps is if you have, say, V2DFmode
and a group size of 4: if you break up the group into sizes of 2,
you get two V2DFmode group-size-2 SLP subgraphs.  So I wonder whether,
since you look for a vector type, you want to only disable splitting
in the case where the resulting vector type has the same number of lanes
as the group size?  (And if not, instead limit where we consider splitting.)
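A hypothetical sketch of the suggested refinement: keep skipping the split only when store lanes are usable and the chosen vector type has exactly as many lanes as the group, so that a V2DF group of 4 could still be split into two groups of 2. The function and parameter names are illustrative only and are not GCC's API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate: return true if splitting the store group
   should be skipped in favour of store lanes.  */
static bool
skip_group_split_p (bool store_lanes_supported, unsigned vector_lanes,
                    unsigned group_size)
{
  assert (vector_lanes > 0 && group_size > 0);
  /* Only keep the group whole when one vector covers the whole group.  */
  return store_lanes_supported && vector_lanes == group_size;
}
```

For the V2DFmode example, vector_lanes is 2 and group_size is 4, so splitting stays allowed.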

At least ISTR that load/store-lanes instructions _do_ have an overhead
compared to plain vector loads/stores.  The original motivating testcase
for the splitting is exactly such a case (on x86 there's the complication of
having V4DF and V2DF, of course, something you do not need to deal with on Neon ;))

That said, if you think the patch is good enough for GCC 11, then go for it;
it clearly only affects ARM.

Richard.

> Richard
>
>
> gcc/
> PR tree-optimization/99873
> * tree-vect-slp.c (vect_could_use_store_lanes_p): New function.
> (vect_build_slp_instance): Don't split store groups that could
> use IFN_STORE_LANES.
>
> gcc/testsuite/
> * gcc.dg/vect/slp-21.c: Only expect 2 of the loops to use SLP
> if IFN_STORE_LANES is available.
> * vect/vect-complex-5.c: Expect no loops to use SLP if
> IFN_STORE_LANES is available.
> * gcc.target/aarch64/pr99873.c: New test.
> * gcc.target/aarch64/sve/pr99873.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/vect/slp-21.c |  4 ++--
>  gcc/testsuite/gcc.dg/vect/vect-complex-5.c |  3 ++-
>  gcc/testsuite/gcc.target/aarch64/pr99873.c | 17 +
>  gcc/testsuite/gcc.target/aarch64/sve/pr99873.c | 15 +++
>  gcc/tree-vect-slp.c| 15 ++-
>  5 files changed, 50 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr99873.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr99873.c
>
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index ceec7f5c410..b0c03da3aeb 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -2458,6 +2458,18 @@ vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
>return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, visited);
>  }
>
> +/* Return true if the store group in STMT_INFO could use IFN_STORE_LANES.
> +   GROUP_SIZE is the number of elements in the group.  */
> +
> +static bool
> +vect_could_use_store_lanes_p (vec_info *vinfo, stmt_vec_info stmt_info,
> + unsigned int group_size)
> +{
> +  tree scalar_type = TREE_TYPE (DR_REF (STMT_VINFO_DATA_REF (stmt_info)));
> +  tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
> +  return vectype && vect_store_lanes_supported (vectype, group_size, false);
> +}
> +
>  /* Analyze an SLP instance starting from a group of grouped stores.  Call
> vect_build_slp_tree to build a tree of packed stmts if possible.
> Return FALSE if it's impossible to SLP any stmt in the loop.  */
> @@ -2693,7 +2705,8 @@ vect_build_slp_instance (vec_info *vinfo,
>
>/* For loop vectorization split into arbitrary pieces of size > 1.  */
>if (is_a <loop_vec_info> (vinfo)
> - && (i > 1 && i < group_size))
> + && (i > 1 && i < group_size)
> + && !vect_could_use_store_lanes_p (vinfo, stmt_info, group_size))
> {
>   unsigned group1_size = i;
>
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-21.c b/gcc/testsuite/gcc.dg/vect/slp-21.c
> index bf8f434dd50..85393975b45 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-21.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-21.c
> @@ -210,7 +210,7 @@ int main (void)
>
>     Not all vect_perm targets support that, and it's a bit too specific to have
> its own effective-target selector, so we just test targets directly.  */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target { aarch64*-*-* arm*-*-* powerpc64*-*-* } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_strided4 && { ! { aarch64*-*-* arm*-*-* powerpc64*-*-* } } } } } } */
> +/* { dg-final { scan-tree-dump-ti

Re: vect: Restore variable-length SLP permutes [PR97513]

2021-04-06 Thread Richard Biener via Gcc-patches
On Tue, Apr 6, 2021 at 12:05 PM Richard Sandiford via Gcc-patches
 wrote:
>
> Many of the gcc.target/sve/slp-perm*.c tests started failing
> after the introduction of separate SLP permute nodes.
> This patch adds variable-length support using a similar
> technique to vect_transform_slp_perm_load.
>
> As there, the idea is to detect when every permute mask vector
> is the same and can be generated using a regular stepped sequence.
> We can easily handle those cases for variable-length, but still
> need to restrict the general case to constant-length.
>
> Again copying vect_transform_slp_perm_load, the idea is to distinguish
> the two cases regardless of whether the length is variable or not,
> partly to increase testing coverage and partly because it avoids
> generating redundant trees.
>
> Doing this means that we can also use SLP for the two-vector
> permute in pr88834.c, which we couldn't before VEC_PERM_EXPR
> nodes were introduced.  The patch therefore makes pr88834.c
> check that we don't regress back to not using SLP and adds
> pr88834_ld3.c to check for the original problem in the PR.
>
> Tested on aarch64-linux-gnu (with and without SVE), arm-linux-gnueabihf,
> armeb-eabi and x86_64-linux-gnu.  OK to install?

OK.

Thanks,
Richard.

> Richard
>
>
> gcc/
> PR tree-optimization/97513
> * tree-vect-slp.c (vect_add_slp_permutation): New function,
> split out from...
> (vectorizable_slp_permutation): ...here.  Detect cases in which
> all VEC_PERM_EXPRs are guaranteed to have the same stepped
> permute vector and only generate one permute vector for that case.
> Extend that case to handle variable-length vectors.
>
> gcc/testsuite/
> * gcc.target/aarch64/sve/pr88834.c: Expect the vectorizer to use SLP.
> * gcc.target/aarch64/sve/pr88834_ld3.c: New test.
> ---
>  .../gcc.target/aarch64/sve/pr88834.c  |   5 +-
>  .../gcc.target/aarch64/sve/pr88834_ld3.c  |  16 ++
>  gcc/tree-vect-slp.c   | 218 --
>  3 files changed, 167 insertions(+), 72 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr88834_ld3.c
>
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index b0c03da3aeb..aef2104774a 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -5813,6 +5813,57 @@ vect_transform_slp_perm_load (vec_info *vinfo,
>return true;
>  }
>
> +/* Produce the next vector result for SLP permutation NODE by adding a vector
> +   statement at GSI.  If MASK_VEC is nonnull, add:
> +
> +   = VEC_PERM_EXPR <FIRST_DEF, SECOND_DEF, MASK_VEC>
> +
> +   otherwise add:
> +
> +   = FIRST_DEF.  */
> +
> +static void
> +vect_add_slp_permutation (vec_info *vinfo, gimple_stmt_iterator *gsi,
> + slp_tree node, tree first_def, tree second_def,
> + tree mask_vec)
> +{
> +  tree vectype = SLP_TREE_VECTYPE (node);
> +
> +  /* ???  We SLP match existing vector element extracts but
> + allow punning which we need to re-instantiate at uses
> + but have no good way of explicitly representing.  */
> +  if (!types_compatible_p (TREE_TYPE (first_def), vectype))
> +{
> +  gassign *conv_stmt
> +   = gimple_build_assign (make_ssa_name (vectype),
> +  build1 (VIEW_CONVERT_EXPR, vectype, first_def));
> +  vect_finish_stmt_generation (vinfo, NULL, conv_stmt, gsi);
> +  first_def = gimple_assign_lhs (conv_stmt);
> +}
> +  gassign *perm_stmt;
> +  tree perm_dest = make_ssa_name (vectype);
> +  if (mask_vec)
> +{
> +  if (!types_compatible_p (TREE_TYPE (second_def), vectype))
> +   {
> + gassign *conv_stmt
> +   = gimple_build_assign (make_ssa_name (vectype),
> +  build1 (VIEW_CONVERT_EXPR,
> +  vectype, second_def));
> + vect_finish_stmt_generation (vinfo, NULL, conv_stmt, gsi);
> + second_def = gimple_assign_lhs (conv_stmt);
> +   }
> +  perm_stmt = gimple_build_assign (perm_dest, VEC_PERM_EXPR,
> +  first_def, second_def,
> +  mask_vec);
> +}
> +  else
> +/* We need a copy here in case the def was external.  */
> +perm_stmt = gimple_build_assign (perm_dest, first_def);
> +  vect_finish_stmt_generation (vinfo, NULL, perm_stmt, gsi);
> +  /* Store the vector statement in NODE.  */
> +  SLP_TREE_VEC_STMTS (node).quick_push (perm_stmt);
> +}
>
>  /* Vectorize the SLP permutations in NODE as specified
> in SLP_TREE_LANE_PERMUTATION which is a vector of pairs of SLP
> @@ -5836,15 +5887,21 @@ vectorizable_slp_permutation (vec_info *vinfo, gimple_stmt_iterator *gsi,
>   arbitrary mismatches.  */
>slp_tree child;
>unsigned i;
> +  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> +  bool repeating_p = multiple_p (nunits, SLP_TREE_LANES (node));
>FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (no
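
The new `vect_add_slp_permutation` helper emits `VEC_PERM_EXPR`. For readers less familiar with that GIMPLE code, here is a scalar model of `VEC_PERM_EXPR <a, b, mask>` for fixed-length vectors of N elements. As we read the definition in GCC's tree.def, each result element is taken from the 2N-element concatenation of the two inputs, and each selector value is first reduced modulo 2N. The `vec_perm` name and the `std::array` representation are hypothetical. GCC also has to handle variable-length (SVE) vectors, which this sketch does not attempt.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar model of VEC_PERM_EXPR <a, b, mask> for fixed-length
// N-element vectors; this is not GCC code.  Each result element is selected
// from the 2N-element concatenation of A and B.
template <typename T, std::size_t N>
std::array<T, N> vec_perm(const std::array<T, N>& a,
                          const std::array<T, N>& b,
                          const std::array<std::size_t, N>& mask)
{
  std::array<T, N> result{};
  for (std::size_t i = 0; i < N; ++i)
    {
      // Selector values are reduced modulo 2N before use.
      const std::size_t idx = mask[i] % (2 * N);
      result[i] = idx < N ? a[idx] : b[idx - N];
    }
  return result;
}
```

With N = 4, the mask {0, 4, 1, 5} interleaves the low halves of the two inputs. The patch's `repeating_p` check decides when a single such mask can be shared by all of the generated permutes. This model does not try to reproduce that logic.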

Re: [PATCH] testsuite/arm: Fix scan-assembler-times in pr96770.c with movt/movw

2021-04-06 Thread Christophe Lyon via Gcc-patches
ping?

On Mon, 29 Mar 2021 at 11:01, Christophe Lyon
 wrote:
>
> The previous change to this testcase missed the fact that the data may
> be accessed via an anchor, depending on the optimization level,
> leading to false failures.
>
> This patch restricts matching to upper16:lower16 followed by
> non-spaces, followed by +4 (in f4) or +320 (in f5).
>
> Using '.*' instead of '[^ \]' would match across the whole assembly
> file, which is not what we want, hence the limitation with spaces.
>
> 2021-03-29  Christophe Lyon  
>
> gcc/testsuite/
> PR target/96770
> * gcc.target/arm/pure-code/pr96770.c: Fix scan-assembler-times
> with movt/movw.
> ---
>  gcc/testsuite/gcc.target/arm/pure-code/pr96770.c | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/arm/pure-code/pr96770.c b/gcc/testsuite/gcc.target/arm/pure-code/pr96770.c
> index ae1bd10..3c69614 100644
> --- a/gcc/testsuite/gcc.target/arm/pure-code/pr96770.c
> +++ b/gcc/testsuite/gcc.target/arm/pure-code/pr96770.c
> @@ -4,12 +4,13 @@
>  int arr[1000];
>  int *f4 (void) { return &arr[1]; }
>
> -/* For cortex-m0 (thumb-1/v6m), we generate 4 movs with upper/lower:#arr+4.  */
> +/* For cortex-m0 (thumb-1/v6m), we generate 2 pairs of movs/adds with upper/lower:#arr+4.  */
>  /* { dg-final { scan-assembler-times "arr\\+4" 4 { target { { ! arm_thumb1_movt_ok } && { ! arm_thumb2_ok } } } } } */
>
>  /* For cortex-m with movt/movw (thumb-1/v8m.base or thumb-2), we
> -   generate a movt/movw pair with upper/lower:#arr+4.  */
> -/* { dg-final { scan-assembler-times "arr\\+4" 2 { target { arm_thumb1_movt_ok || arm_thumb2_ok } } } } */
> +   generate a movt/movw pair with upper/lower:#arr+4 possibly via an anchor.  */
> +/* { dg-final { scan-assembler-times "upper16:\[^ \]+.\\+4" 1 { target { arm_thumb1_movt_ok || arm_thumb2_ok } } } } */
> +/* { dg-final { scan-assembler-times "lower16:\[^ \]+\\+4" 1 { target { arm_thumb1_movt_ok || arm_thumb2_ok } } } } */
>
>  int *f5 (void) { return &arr[80]; }
>
> @@ -17,5 +18,6 @@ int *f5 (void) { return &arr[80]; }
>  /* { dg-final { scan-assembler-times "arr\\+320" 1 { target { { ! arm_thumb1_movt_ok } && { ! arm_thumb2_ok } } } } } */
>
>  /* For cortex-m with movt/movw (thumb-1/v8m.base or thumb-2), we
> -   generate a movt/movw pair with upper/lower:arr+320.  */
> -/* { dg-final { scan-assembler-times "arr\\+320" 2 { target { arm_thumb1_movt_ok || arm_thumb2_ok } } } } */
> +   generate a movt/movw pair with upper/lower:arr+320 possibly via an anchor.  */
> +/* { dg-final { scan-assembler-times "upper16:\[^ \]+\\+320" 1 { target { arm_thumb1_movt_ok || arm_thumb2_ok } } } } */
> +/* { dg-final { scan-assembler-times "lower16:\[^ \]+\\+320" 1 { target { arm_thumb1_movt_ok || arm_thumb2_ok } } } } */
> --
> 2.7.4
>
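
This sketch shows why the patch prefers `[^ ]+` over `.*`. It applies the same idea with `std::regex` (ECMAScript grammar) to a made-up assembly fragment. DejaGnu's `scan-assembler-times` actually runs a Tcl regular expression over the whole `.s` file, so this is only an approximation. `count_matches`, `first_match` and `kAsm` are hypothetical. The `.*` case assumes that `.` also matches newlines, as in Tcl's default mode, and emulates that with `(?:.|\n)`.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Approximation only: std::regex (ECMAScript) rather than Tcl regexp.

// Count the non-overlapping matches of PATTERN in TEXT.
inline std::ptrdiff_t count_matches(const std::string& text,
                                    const std::string& pattern)
{
  const std::regex re(pattern);
  return std::distance(std::sregex_iterator(text.begin(), text.end(), re),
                       std::sregex_iterator());
}

// Return the first match of PATTERN in TEXT, or "" if there is none.
inline std::string first_match(const std::string& text,
                               const std::string& pattern)
{
  std::smatch m;
  return std::regex_search(text, m, std::regex(pattern)) ? m.str() : "";
}

// Hypothetical assembly: f4's address is built from a section anchor, and
// an unrelated function later in the file also happens to use "+4".
const std::string kAsm =
  "f4:\n"
  "\tmovw\tr0, #:lower16:.LANCHOR0+4\n"
  "\tmovt\tr0, #:upper16:.LANCHOR0+4\n"
  "g:\n"
  "\tmovt\tr1, #:upper16:other\n"
  "\tldr\tr2, =other+4\n";
```

With `[^ ]+`, the match stays inside a space-free run, which in practice is a single operand. It therefore finds `upper16:.LANCHOR0+4` exactly once. The `.*`-style pattern instead starts in `f4` and runs on into the unrelated function `g`.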


c++: Simplify va_arg test

2021-04-06 Thread Nathan Sidwell


The va_arg scans are just too brittle.  Let's not be that picky.  We now
test other builtins that are less brittle anyway.


gcc/testsuite/
* g++.dg/modules/builtin-3_a.C: Remove dump scans.
* g++.dg/modules/builtin-3_b.C: Remove dump scans.

--
Nathan Sidwell
diff --git i/gcc/testsuite/g++.dg/modules/builtin-3_a.C w/gcc/testsuite/g++.dg/modules/builtin-3_a.C
index fb7da6175c0..66f712928a2 100644
--- i/gcc/testsuite/g++.dg/modules/builtin-3_a.C
+++ w/gcc/testsuite/g++.dg/modules/builtin-3_a.C
@@ -1,4 +1,4 @@
-// { dg-additional-options "-fmodules-ts -fdump-lang-module-blocks-alias-uid" }
+// { dg-additional-options -fmodules-ts }
 module;
 #include 
 export module builtins;
@@ -21,24 +21,3 @@ export inline int count (int a, ...)
 
   return c;
 }
-
-// { dg-final { scan-lang-dump-not {Cluster members:\n  \[0\]=decl declaration '::__builtin_strlen'\n  \[1\]=binding '::__builtin_strlen'\n} module } }
-// { dg-final { scan-lang-dump {Wrote GMF:-[0-9]* function_decl:'::__builtin_strlen'@builtins} module } }
-// { dg-final { scan-lang-dump {Writing:-[0-9]*'s named merge key \(decl\) function_decl:'::__builtin_strlen'} module } }
-// { dg-final { scan-lang-dump-not {Writing tree:-[0-9]* function_decl:'__builtin_strlen'\(strlen\)} module } }
-
-// The implementation details of va_list's are target-specific.
-// Usually one of two patterns though
-// { dg-final { scan-lang-dump-not { Cluster members:\n  \[0\]=decl declaration '::__builtin_va_list'\n  \[1\]=binding '::__builtin_va_list'\n} module { target i?86-*-linux* x86_64-*-linux* } } }
-// { dg-final { scan-lang-dump {Wrote GMF:-[0-9]* type_decl:'::__builtin_va_list'@builtins} module { target { { x86_64-*-linux* i?86-*-linux* } && lp64 } } } }
-// { dg-final { scan-lang-dump {Writing:-[0-9]*'s named merge key \(decl\) type_decl:'::__builtin_va_list'} module { target { { x86_64-*-linux* i?86-*-linux* } && lp64 } } } }
-
-// { dg-final { scan-lang-dump {Writing:-1's named merge key \(decl\) type_decl:'::__gnuc_va_list'} module { target i?86-*-linux* *-*-darwin* } } }
-// { dg-final { scan-lang-dump {Wrote GMF:-3 type_decl:'::__gnuc_va_list'@builtins} module { target i?86-*-linux* *-*-darwin* } } }
-
-// { dg-final { scan-lang-dump {Wrote fixed:[0-9]* record_type:'__va_list'} module { target aarch64*-*-linux* } } }
-// { dg-final { scan-lang-dump {Wrote fixed:[0-9]* pointer_type:'::__builtin_va_list'} module { target powerpc*-*-linux* } } }
-
-// { dg-final { scan-lang-dump-not { Cluster members:\n  \[0\]=decl declaration '::va_list'\n  \[1\]=binding '::va_list'\n} module } }
-// { dg-final { scan-lang-dump {Wrote GMF:-[0-9]* type_decl:'::va_list'@builtins} module } }
-// { dg-final { scan-lang-dump {Writing:-[0-9]*'s named merge key \(decl\) type_decl:'::va_list'} module } }
diff --git i/gcc/testsuite/g++.dg/modules/builtin-3_b.C w/gcc/testsuite/g++.dg/modules/builtin-3_b.C
index e0e630656d3..7ba933d9aab 100644
--- i/gcc/testsuite/g++.dg/modules/builtin-3_b.C
+++ w/gcc/testsuite/g++.dg/modules/builtin-3_b.C
@@ -1,4 +1,4 @@
-// { dg-additional-options "-fmodules-ts -fdump-lang-module-alias" }
+// { dg-additional-options -fmodules-ts }
 import builtins;
 
 int main ()
@@ -6,8 +6,3 @@ int main ()
   length ("");
   count (1, "", "", nullptr);
 }
-
-// { dg-final { scan-lang-dump {Read:-[0-9]*'s named merge key \(matched\) function_decl:'::__builtin_strlen'} module } }
-// { dg-final { scan-lang-dump {Read:-[0-9]*'s named merge key \(matched\) type_decl:'::__builtin_va_list'} module { target { { x86_64-*-linux* i?86-*-linux* } && lp64 } } } }
-// { dg-final { scan-lang-dump {Read:-[0-9]*'s named merge key \(new\) type_decl:'::va_list'} module } }
-// { dg-final { scan-lang-dump {Read:-[0-9]*'s named merge key \(new\) type_decl:'::__gnuc_va_list'} module } }


[PATCH] tree-optimization/99880 - avoid vectorizing irrelevant PHI backedge defs

2021-04-06 Thread Richard Biener
This adds a relevancy check before trying to set the vector def of
a backedge in an unvectorized PHI.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-04-06  Richard Biener  

PR tree-optimization/99880
* tree-vect-loop.c (maybe_set_vectorized_backedge_value): Only
set vectorized defs of relevant PHIs.

* gcc.dg/torture/pr99880.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr99880.c | 22 ++
 gcc/tree-vect-loop.c   |  1 +
 2 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr99880.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr99880.c b/gcc/testsuite/gcc.dg/torture/pr99880.c
new file mode 100644
index 000..7e0989987d7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr99880.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-ftree-vectorize" } */
+
+unsigned a;
+int b, c, d, e;
+void f() {
+  b = 5;
+  for (; b <= 51; b++)
+;
+  unsigned int g = -8;
+  while (g) {
+g += 5;
+int h = 10;
+do {
+  h -= a = 1;
+  for (; a; a++)
+;
+  c *= c >= d >= b;
+} while (h);
+c -= e;
+  }
+}
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 38d96fc1634..4e928c65b31 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -9148,6 +9148,7 @@ maybe_set_vectorized_backedge_value (loop_vec_info loop_vinfo,
 if (gphi *phi = dyn_cast  (USE_STMT (use_p)))
   if (gimple_bb (phi)->loop_father->header == gimple_bb (phi)
  && (phi_info = loop_vinfo->lookup_stmt (phi))
+ && STMT_VINFO_RELEVANT_P (phi_info)
  && VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (phi_info))
  && STMT_VINFO_REDUC_TYPE (phi_info) != FOLD_LEFT_REDUCTION
  && STMT_VINFO_REDUC_TYPE (phi_info) != EXTRACT_LAST_REDUCTION)
-- 
2.26.2


[PATCH][DOC] i386: move non-target attributes out of target section

2021-04-06 Thread Martin Liška
Some attributes, such as function_return, nocf_check and others, are listed as
options of the target attribute.  That is not correct, and the following patch
fixes it.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

* doc/extend.texi: Move non-target attributes to the top level.
---
 gcc/doc/extend.texi | 58 ++---
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 29ef0d67551..849c8802473 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -6924,6 +6924,35 @@ Specify which floating-point unit to use.  You must specify the
 @code{target("fpmath=sse+387")} because the comma would separate
 different options.
 nocf_check 
+@item prefer-vector-width=@var{OPT}
+@cindex @code{prefer-vector-width} function attribute, x86
+On x86 targets, the @code{prefer-vector-width} attribute informs the
+compiler to use @var{OPT}-bit vector width in instructions
+instead of the default on the selected platform.
+
+Valid @var{OPT} values are:
+
+@table @samp
+@item none
+No extra limitations applied to GCC other than defined by the selected platform.
+
+@item 128
+Prefer 128-bit vector width for instructions.
+
+@item 256
+Prefer 256-bit vector width for instructions.
+
+@item 512
+Prefer 512-bit vector width for instructions.
+@end table
+
+On the x86, the inliner does not inline a
+function that has different target options than the caller, unless the
+callee has a subset of the target options of the caller.  For example
+a function declared with @code{target("sse3")} can inline a function
+with @code{target("sse2")}, since @code{-msse3} implies @code{-msse2}.
+@end table
+
 @item indirect_branch("@var{choice}")
 @cindex @code{indirect_branch} function attribute, x86
 On x86 targets, the @code{indirect_branch} attribute causes the compiler
@@ -7027,35 +7056,6 @@ On x86 targets, the @code{fentry_section} attribute sets the name
 of the section to record function entry instrumentation calls in when
 enabled with @option{-pg -mrecord-mcount}
 
-@item prefer-vector-width=@var{OPT}
-@cindex @code{prefer-vector-width} function attribute, x86
-On x86 targets, the @code{prefer-vector-width} attribute informs the
-compiler to use @var{OPT}-bit vector width in instructions
-instead of the default on the selected platform.
-
-Valid @var{OPT} values are:
-
-@table @samp
-@item none
-No extra limitations applied to GCC other than defined by the selected platform.
-
-@item 128
-Prefer 128-bit vector width for instructions.
-
-@item 256
-Prefer 256-bit vector width for instructions.
-
-@item 512
-Prefer 512-bit vector width for instructions.
-@end table
-
-@end table
-
-On the x86, the inliner does not inline a
-function that has different target options than the caller, unless the
-callee has a subset of the target options of the caller.  For example
-a function declared with @code{target("sse3")} can inline a function
-with @code{target("sse2")}, since @code{-msse3} implies @code{-msse2}.
 @end table
 
 @node Xstormy16 Function Attributes
-- 
2.30.2



[PATCH][DOC] -flto-compression-level: improve documentation

2021-04-06 Thread Martin Liška
gcc/ChangeLog:

* doc/invoke.texi: Document minimum and maximum value of the
argument for both supported compression algorithms.
---
 gcc/doc/invoke.texi | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index f4f2fb63952..46876ea2961 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12371,10 +12371,12 @@ the link-time optimization step directly from the WPA phase.
 @opindex flto-compression-level
 This option specifies the level of compression used for intermediate
 language written to LTO object files, and is only meaningful in
-conjunction with LTO mode (@option{-flto}).  Valid
-values are 0 (no compression) to 9 (maximum compression).  Values
-outside this range are clamped to either 0 or 9.  If the option is not
-given, a default balanced compression setting is used.
+conjunction with LTO mode (@option{-flto}).  GCC currently supports two
+LTO compression algorithms.  For zstd, valid values are 0 (no compression)
+to 19 (maximum compression), while zlib supports values from 0 to 9.
+Values outside this range are clamped to either the minimum or the maximum
+supported value.  If the option is not given,
+a default balanced compression setting is used.
 
 @item -fuse-linker-plugin
 @opindex fuse-linker-plugin
-- 
2.30.2
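
The clamping rule documented above can be sketched as follows. The enum and function names are hypothetical. The ranges are simply the ones stated in the new documentation text (0 to 19 for zstd, 0 to 9 for zlib); they are not taken from GCC's LTO compression code.

```cpp
#include <algorithm>

// Hypothetical model of the documented -flto-compression-level behaviour;
// the ranges come from the documentation text, not from GCC's sources.
enum class lto_compressor { zlib, zstd };

// Clamp REQUESTED to the range supported by the compression algorithm ALG.
constexpr int clamp_lto_level(lto_compressor alg, int requested)
{
  const int max_level = alg == lto_compressor::zstd ? 19 : 9;
  return std::max(0, std::min(requested, max_level));
}
```

For example, a requested level of 25 becomes 19 with zstd but 9 with zlib, and a negative level becomes 0 with either.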



Re: [PATCH 2/3] x86: Update memcpy/memset inline strategies for Skylake family CPUs

2021-04-06 Thread H.J. Lu via Gcc-patches
On Tue, Apr 6, 2021 at 2:51 AM Jan Hubicka  wrote:
>
> > > Do you know which of the three changes (preferring reps/stosb,
> > > CLEAR_RATIO and algorithm choice changes) causes the two speedups
> > > on EEMBC?
> >
> > An extracted testcase from nnet_test is at https://godbolt.org/z/c8KdsohTP
> >
> > This loop is transformed to builtin_memcpy and builtin_memset with size 280.
> >
> > The current strategy for Skylake is {512, unrolled_loop, false} for such
> > a size, so it generates unrolled loops with mov, while the patch
> > generates memcpy/memset libcalls and uses vector moves.
>
> This is good - I originally set the table based on this
> micro-benchmarking script, and apparently the glibc used at that time had
> a more expensive memcpy for small blocks.
>
> One thing to consider, however, is that calling an external memcpy also
> has the additional cost of clobbering all caller-saved registers.
> Especially for code that uses SSE this is painful, since everything
> needs to go to the stack in that case.  So I am not completely sure how
> representative the micro-benchmark is in this respect, since it does not
> use any SSE and register pressure is generally small.
>
> So with current glibc it seems a libcall is a win for blocks larger
> than 64 or 128 bytes, at least if register pressure is not high.
> In this respect your change looks good.
> > >
> > > My patch generates "rep movsb" only in very limited cases:
> > >
> > > 1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector
> > >    load and store for up to 16 * 16 (256) bytes when the data size is
> > >    fixed and known.
> > > 2. Inline only if data size is known to be <= 256.
> > >    a. Use "rep movsb/stosb" with a simple code sequence if the data size
> > >       is a constant.
> > >    b. Use loop if data size is not a constant.
>
> Aha, this is very hard to read from the algorithm descriptor.  So we
> still have the check that maxsize==minsize and use rep movsb only for
> constant-sized blocks when the corresponding TARGET macro is defined.
>
> I think it would be more readable if we introduced rep_1_byte_constant.
> The descriptor is supposed to read as a sequence of rules where the first
> matching one applies.  It is not obvious that we have another TARGET_*
> macro that makes rep_1_byte be ignored in some cases.
> (The TARGET macro will also interfere with the microbenchmarking script.)
>
> Still, I do not understand why a compile-time constant size makes rep
> movsb/stosb better than a loop.  Is the CPU special-casing it at decode
> time and requiring an explicit mov instruction?  Or is it only because
> rep movsb is not good for blocks smaller than 128bit?

Non-constant "rep movsb" triggers more machine-clear events:

https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/reference/cpu-metrics-reference/mo-machine-clear-overhead.html

in hot loops of some workloads.
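
Jan reads the descriptor as an ordered list of rules where the first matching one wins. H.J. adds the condition that rep movsb/stosb is only chosen for compile-time-constant sizes. Together these can be modelled roughly as below. All names (`strategy`, `rule`, `descriptor`, `pick_strategy`) are hypothetical. The real selection logic in GCC's i386 back end also weighs alignment, optimization for size, and the MOVE_RATIO/CLEAR_RATIO path for small fixed sizes, and none of that is modelled here.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of an x86 stringop_algs descriptor.
enum class strategy { libcall, rep_prefix_1_byte, loop, unrolled_loop };

struct rule
{
  std::int64_t max;  // Rule applies to sizes <= max; -1 means any size.
  strategy alg;
};

struct descriptor
{
  strategy unknown_size;    // Used when no upper bound on the size is known.
  std::vector<rule> rules;  // Checked in order; the first applicable wins.
};

// SIZE_BOUND is the known upper bound on the block size (-1 if unknown);
// SIZE_IS_CONSTANT says whether the size is a compile-time constant.
inline strategy pick_strategy(const descriptor& d, std::int64_t size_bound,
                              bool size_is_constant)
{
  if (size_bound < 0)
    return d.unknown_size;
  for (const rule& r : d.rules)
    {
      if (r.max != -1 && size_bound > r.max)
        continue;
      // Model of the extra tuning check discussed in this thread:
      // rep movsb/stosb is only used for constant sizes.
      if (r.alg == strategy::rep_prefix_1_byte && !size_is_constant)
        continue;
      return r.alg;
    }
  return d.unknown_size;
}
```

Consider the Skylake-like table from the patch (`{256, rep_prefix_1_byte}`, `{256, loop}`, `{-1, libcall}`). A constant 200-byte block picks `rep_prefix_1_byte`. A non-constant block bounded by 200 bytes falls through to `loop`. The 280-byte nnet_test case becomes a libcall. Replacing `loop` with `unrolled_loop` in the second rule gives Jan's alternative.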

> > >
> > > As a result, "rep stosb" is generated only when 128 < data size < 256
> > > with -mno-sse.
> > >
> > > > Do you have data showing that blocks of size 8...256 are faster with
> > > > rep1 than with an unrolled loop, perhaps on more real-world benchmarks?
> > >
> > > "rep movsb" isn't generated with my patch in this case since
> > > MOVE_RATIO == 17 can copy up to 16 * 16 (256) bytes with
> > > XMM registers.
>
> OK, so I guess:
>   {libcall,
>    {{256, rep_1_byte, true},
>     {256, unrolled_loop, false},
>     {-1, libcall, false}}},
>   {libcall,
>    {{256, rep_1_byte, true},
>     {256, unrolled_loop, false},
>     {-1, libcall, false}}}};
>
> may still perform better, but the difference between loop and unrolled
> loop is within a 10% margin.
>
> So I guess the patch is OK, and we should look into cleaning up the
> descriptors.  I can make a patch for that once I understand the logic above.

I am checking in my patch.  We will improve it for GCC 12.  We will also revisit:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773

for GCC 12.

Thanks.

-- 
H.J.


Re: [PATCH] testsuite: Disable zero-scratch-regs-{8, 9, 10, 11}.c on s390* [PR97680]

2021-04-06 Thread Richard Sandiford via Gcc-patches
Eric Botcazou  writes:
>> It looks like the latter - I've seen no attempt by the original authors to
>> make the feature work on more targets than they cared for.
>
> On the other hand, if you hide the failures, there is essentially zero chance
> that architecture maintainers pick up the pieces (I personally implemented the
> SPARC support only because I had run into the failures in the testsuite).  So
> doing the inverse filtering sounds quite counterproductive to me, and IMO it's
> up to the architecture maintainers to decide on a case-by-case basis.

Sorry for the late reply, but +1 FWIW.

As Jakub says, this feature doesn't necessarily need work for each
target.  It doesn't for aarch64, for example, and likely doesn't
for several others.  But if it does need work, that work requires
target-specific knowledge.  It isn't reasonable or fair to expect
one person to learn enough about each architecture to do the right
thing for that architecture.

And this isn't new.  I've often seen even long-standing GCC developers
say things like “I don't know enough about target X to do that.
I'll have to leave it to someone who knows about target X”.

One of the decisions for maintainers is whether they care about
this feature at all.  If they don't, they can get GCC to emit a
“sorry, not supported”.  This decision will be based on maintainer
knowledge about how the architecture is used in practice.

If maintainers do care, the question then is: which registers should be
covered?  And what's the best way of zeroing them in the constrained
situation that the option is dealing with?  (The default does the
“obvious” thing in both cases.  Intervention is only needed if the
default isn't right.)

In other words, the ball isn't IMO in the original author's court here.

So personally I object to the second patch.  I think the first was the
right way to go.  As it stands we're hiding ICEs through skips rather
than xfails, and without a specific list of FIXMEs.

Thanks,
Richard


[PATCH 1/1] C-SKY: Describe ck802 bypass accurately.

2021-04-06 Thread Xianmiao Qu
Fix the following warning:
insn-automata.c: In function ‘int maximal_insn_latency(rtx_insn*)’:
insn-automata.c:679:37: warning: array subscript -1 is below array bounds of ‘const unsigned char [19]’ [-Warray-bounds]
  679 |   return default_latencies[insn_code];
  |  ~~~^
insn-automata.c:397:30: note: while referencing ‘default_latencies’
  397 |   static const unsigned char default_latencies[] =
  |

Tested and pushed.

gcc/
* config/csky/csky_pipeline_ck802.md: Use insn reservation names
instead of "*".
---
 gcc/config/csky/csky_pipeline_ck802.md | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/config/csky/csky_pipeline_ck802.md b/gcc/config/csky/csky_pipeline_ck802.md
index bf1c2a7031c..2406f59e776 100644
--- a/gcc/config/csky/csky_pipeline_ck802.md
+++ b/gcc/config/csky/csky_pipeline_ck802.md
@@ -70,8 +70,12 @@
 (define_bypass 3 "ck802_load,ck802_store" "ck802_pool")
 (define_bypass 3 "ck802_pool" "ck802_load,ck802_store")
 
-(define_bypass 1 "*" "ck802_alu")
+(define_bypass 1 "ck802_alu,ck802_branch,ck802_cmp,ck802_cbranch,ck802_call,\
+ ck802_load,ck802_pool,ck802_store"
+"ck802_alu")
 
-(define_bypass 1 "*" "ck802_branch")
+(define_bypass 1 "ck802_alu,ck802_branch,ck802_cmp,ck802_cbranch,ck802_call,\
+ ck802_load,ck802_pool,ck802_store"
+"ck802_branch")
 
 (define_bypass 2 "ck802_cmp" "ck802_cbranch")
-- 
2.26.2



[committed] libstdc++: Fix doxygen markup for group close commands

2021-04-06 Thread Jonathan Wakely via Gcc-patches
A change in Doxygen 1.8.16 means that "// @}" is no longer recognized by
Doxygen, so it does not close a @{ group.  A "///" comment must be used instead.

libstdc++-v3/ChangeLog:

* include/bits/atomic_base.h: Fix doxygen group close.
* include/bits/basic_ios.h: Likewise.
* include/bits/forward_list.h: Likewise.
* include/bits/fs_dir.h: Likewise.
* include/bits/fs_ops.h: Likewise.
* include/bits/fs_path.h: Likewise.
* include/bits/functional_hash.h: Likewise.
* include/bits/gslice.h: Likewise.
* include/bits/gslice_array.h: Likewise.
* include/bits/hashtable_policy.h: Likewise.
* include/bits/indirect_array.h: Likewise.
* include/bits/locale_classes.h: Likewise.
* include/bits/locale_facets.h: Likewise.
* include/bits/locale_facets_nonio.h: Likewise.
* include/bits/mask_array.h: Likewise.
* include/bits/refwrap.h: Likewise.
* include/bits/regex.h: Likewise.
* include/bits/regex_automaton.h: Likewise.
* include/bits/regex_compiler.h: Likewise.
* include/bits/regex_constants.h: Likewise.
* include/bits/regex_error.h: Likewise.
* include/bits/regex_executor.h: Likewise.
* include/bits/regex_scanner.h: Likewise.
* include/bits/shared_ptr.h: Likewise.
* include/bits/shared_ptr_atomic.h: Likewise.
* include/bits/shared_ptr_base.h: Likewise.
* include/bits/slice_array.h: Likewise.
* include/bits/specfun.h: Likewise.
* include/bits/std_function.h: Likewise.
* include/bits/std_mutex.h: Likewise.
* include/bits/stl_deque.h: Likewise.
* include/bits/stl_iterator.h: Likewise.
* include/bits/stl_iterator_base_types.h: Likewise.
* include/bits/stl_map.h: Likewise.
* include/bits/stl_multimap.h: Likewise.
* include/bits/stl_multiset.h: Likewise.
* include/bits/stl_numeric.h: Likewise.
* include/bits/stl_pair.h: Likewise.
* include/bits/stl_set.h: Likewise.
* include/bits/stl_uninitialized.h: Likewise.
* include/bits/stream_iterator.h: Likewise.
* include/bits/streambuf_iterator.h: Likewise.
* include/bits/unique_ptr.h: Likewise.
* include/bits/unordered_map.h: Likewise.
* include/bits/unordered_set.h: Likewise.
* include/decimal/decimal: Likewise.
* include/experimental/any: Likewise.
* include/experimental/array: Likewise.
* include/experimental/bits/fs_dir.h: Likewise.
* include/experimental/bits/fs_fwd.h: Likewise.
* include/experimental/bits/fs_ops.h: Likewise.
* include/experimental/bits/fs_path.h: Likewise.
* include/experimental/buffer: Likewise.
* include/experimental/internet: Likewise.
* include/experimental/optional: Likewise.
* include/experimental/propagate_const: Likewise.
* include/experimental/socket: Likewise.
* include/ext/pb_ds/assoc_container.hpp: Likewise.
* include/ext/pb_ds/detail/priority_queue_base_dispatch.hpp:
Likewise.
* include/ext/pb_ds/detail/tree_policy/node_metadata_selector.hpp: 
Likewise.
* include/ext/pb_ds/detail/trie_policy/node_metadata_selector.hpp: 
Likewise.
* include/ext/pb_ds/detail/types_traits.hpp: Likewise.
* include/ext/pb_ds/exception.hpp: Likewise.
* include/ext/pb_ds/priority_queue.hpp: Likewise.
* include/ext/pb_ds/tag_and_trait.hpp: Likewise.
* include/ext/random: Likewise.
* include/std/any: Likewise.
* include/std/atomic: Likewise.
* include/std/bitset: Likewise.
* include/std/chrono: Likewise.
* include/std/complex: Likewise.
* include/std/condition_variable: Likewise.
* include/std/fstream: Likewise.
* include/std/future: Likewise.
* include/std/iostream: Likewise.
* include/std/istream: Likewise.
* include/std/mutex: Likewise.
* include/std/numeric: Likewise.
* include/std/ostream: Likewise.
* include/std/ratio: Likewise.
* include/std/shared_mutex: Likewise.
* include/std/stdexcept: Likewise.
* include/std/streambuf: Likewise.
* include/std/system_error: Likewise.
* include/std/thread: Likewise.
* include/std/valarray: Likewise.
* include/std/variant: Likewise.
* include/tr1/cmath: Likewise.
* include/tr1/regex: Likewise.
* include/tr2/dynamic_bitset: Likewise.
* libsupc++/atomic_lockfree_defines.h: Likewise.
* libsupc++/exception: Likewise.
* libsupc++/exception.h: Likewise.
* libsupc++/exception_ptr.h: Likewise.
* libsupc++/nested_exception.h: Likewise.

Tested powerpc64le-linux. Committed to trunk.
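
The markup rule can be illustrated with a hypothetical header fragment. A group is opened with `@{` and closed with `/// @}`, the form the patch switches to, because Doxygen 1.8.16 and later no longer recognize a plain `// @}`. The group name and functions are made up for illustration.

```cpp
/** @defgroup hypothetical_helpers Hypothetical helpers
 *  Small example functions grouped for the generated documentation.
 *  @{
 */

/// Return the larger of @p a and @p b.
constexpr int max_of(int a, int b) { return a < b ? b : a; }

/// Return the smaller of @p a and @p b.
constexpr int min_of(int a, int b) { return a < b ? a : b; }

// Closing with a plain "// @}" is not recognized by Doxygen 1.8.16 and
// later; the "///" form below is.
/// @}
```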

commit f0b883464c58cb2f3f521776e65008b1fa75f79e
Author: Jonathan Wakely 
Date:   Tue Apr 6 15:52:19 2021

libstdc+

[committed] libstdc++: Fix Doxygen warnings

2021-04-06 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

* include/bits/alloc_traits.h: Use markdown for code font.
* include/bits/basic_string.h: Fix @param names.
* include/bits/max_size_type.h: Remove period after @file.
* include/bits/regex.h: Fix duplicate @retval names, and rename.
* include/ext/pb_ds/detail/priority_queue_base_dispatch.hpp: Add
group open to match existing group close.
* include/ext/pb_ds/priority_queue.hpp: Add blank line before group
open.

Tested powerpc64le-linux. Committed to trunk.

commit daef4e4d934716b933fa445a0ec6650aeb642751
Author: Jonathan Wakely 
Date:   Tue Apr 6 16:24:06 2021

libstdc++: Fix Doxygen warnings

libstdc++-v3/ChangeLog:

* include/bits/alloc_traits.h: Use markdown for code font.
* include/bits/basic_string.h: Fix @param names.
* include/bits/max_size_type.h: Remove period after @file.
* include/bits/regex.h: Fix duplicate @retval names, and rename.
* include/ext/pb_ds/detail/priority_queue_base_dispatch.hpp: Add
group open to match existing group close.
* include/ext/pb_ds/priority_queue.hpp: Add blank line before group
open.

diff --git a/libstdc++-v3/include/bits/alloc_traits.h b/libstdc++-v3/include/bits/alloc_traits.h
index 2c69e5a2b35..34412583064 100644
--- a/libstdc++-v3/include/bits/alloc_traits.h
+++ b/libstdc++-v3/include/bits/alloc_traits.h
@@ -341,7 +341,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { __a.deallocate(__p, __n); }
 
   /**
-   *  @brief  Construct an object of type @a _Tp
+   *  @brief  Construct an object of type `_Tp`
*  @param  __a  An allocator.
*  @param  __p  Pointer to memory of suitable size and alignment for Tp
*  @param  __args Constructor arguments.
diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index bfc97644bd0..7d819bb1bb7 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -4707,9 +4707,9 @@ _GLIBCXX_END_NAMESPACE_CXX11
 
   /**
*  @brief  Insert a string_view.
-   *  @param __pos  Position in string to insert at.
-   *  @param __svt  The object convertible to string_view to insert from.
-   *  @param __pos  Position in string_view to insert
+   *  @param __pos1  Position in string to insert at.
+   *  @param __svt   The object convertible to string_view to insert from.
+   *  @param __pos2  Position in string_view to insert
*  from.
*  @param __nThe number of characters to insert.
*  @return  Reference to this string.
diff --git a/libstdc++-v3/include/bits/max_size_type.h b/libstdc++-v3/include/bits/max_size_type.h
index 27d63797c49..153b1bff5f4 100644
--- a/libstdc++-v3/include/bits/max_size_type.h
+++ b/libstdc++-v3/include/bits/max_size_type.h
@@ -22,7 +22,7 @@
 // see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 // .
 
-/** @file bits/max_size_type.h.
+/** @file bits/max_size_type.h
  *  This is an internal header file, included by other library headers.
  *  Do not attempt to use it directly. @headername{iterator}
  */
diff --git a/libstdc++-v3/include/bits/regex.h b/libstdc++-v3/include/bits/regex.h
index 4d331c82e74..ac10fa184c6 100644
--- a/libstdc++-v3/include/bits/regex.h
+++ b/libstdc++-v3/include/bits/regex.h
@@ -916,9 +916,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
*
* @param __s Another matched sequence to compare to this one.
*
-   * @retval <0 this matched sequence will collate before @p __s.
-   * @retval =0 this matched sequence is equivalent to @p __s.
-   * @retval <0 this matched sequence will collate after @p __s.
+   * @retval negative  This matched sequence will collate before `__s`.
+   * @retval zero  This matched sequence is equivalent to `__s`.
+   * @retval positive  This matched sequence will collate after `__s`.
*/
   int
   compare(const sub_match& __s) const
@@ -926,13 +926,13 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 
   /**
* @{
-   * @brief Compares this sub_match to a string.
+   * @brief Compares this `sub_match` to a string.
*
-   * @param __s A string to compare to this sub_match.
+   * @param __s A string to compare to this `sub_match`.
*
-   * @retval <0 this matched sequence will collate before @p __s.
-   * @retval =0 this matched sequence is equivalent to @p __s.
-   * @retval <0 this matched sequence will collate after @p __s.
+   * @retval negative  This matched sequence will collate before `__s`.
+   * @retval zero  This matched sequence is equivalent to `__s`.
+   * @retval positive  This matched sequence will collate after `__s`.
*/
   int
   compare(const string_type& __s) const
diff --git a/libstdc++-v3/include/ext/pb_ds/detail

[committed] libstdc++: Clarify static_assert message

2021-04-06 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

* include/bits/move.h (forward): Change static_assert message
to be unambiguous about what must be true.
* testsuite/20_util/forward/c_neg.cc: Adjust dg-error.
* testsuite/20_util/forward/f_neg.cc: Likewise.

Tested powerpc64le-linux. Committed to trunk.
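
Here is a minimal sketch of the intended use that this assertion guards. Forwarding a deduced parameter through `std::forward<T>` preserves its value category. The assertion only fires for the misuse of forwarding an rvalue as an lvalue reference, for example `std::forward<int&>(42)`. That call is left out of the sketch because it would not compile. The `kind` and `relay` helpers are hypothetical.

```cpp
#include <string>
#include <utility>

// Hypothetical helpers that report the value category they receive.
inline std::string kind(int&) { return "lvalue"; }
inline std::string kind(int&&) { return "rvalue"; }

// Perfect forwarding: T deduces to int& for lvalues and to int for rvalues,
// so std::forward<T> hands the argument on with its original category.
template <typename T>
std::string relay(T&& arg)
{
  return kind(std::forward<T>(arg));
}
```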

commit 41019bfae2673a818a9b7d08742f3ef91c0deade
Author: Jonathan Wakely 
Date:   Tue Apr 6 16:34:48 2021

libstdc++: Clarify static_assert message

libstdc++-v3/ChangeLog:

* include/bits/move.h (forward): Change static_assert message
to be unambiguous about what must be true.
* testsuite/20_util/forward/c_neg.cc: Adjust dg-error.
* testsuite/20_util/forward/f_neg.cc: Likewise.

diff --git a/libstdc++-v3/include/bits/move.h b/libstdc++-v3/include/bits/move.h
index d36e4b28f37..feacae084c9 100644
--- a/libstdc++-v3/include/bits/move.h
+++ b/libstdc++-v3/include/bits/move.h
@@ -87,7 +87,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 forward(typename std::remove_reference<_Tp>::type&& __t) noexcept
 {
   static_assert(!std::is_lvalue_reference<_Tp>::value, "template argument"
-   " substituting _Tp is an lvalue reference type");
+   " substituting _Tp must not be an lvalue reference type");
   return static_cast<_Tp&&>(__t);
 }
 
diff --git a/libstdc++-v3/testsuite/20_util/forward/c_neg.cc b/libstdc++-v3/testsuite/20_util/forward/c_neg.cc
index 884100da42e..dc7ec51bde6 100644
--- a/libstdc++-v3/testsuite/20_util/forward/c_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/forward/c_neg.cc
@@ -17,7 +17,7 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-error "static assertion failed" "" { target *-*-* } 89 }
+// { dg-error "must not be an lvalue reference" "" { target *-*-* } 0 }
 
 #include 
 
diff --git a/libstdc++-v3/testsuite/20_util/forward/f_neg.cc b/libstdc++-v3/testsuite/20_util/forward/f_neg.cc
index 90d3152ffcd..4ccd7264c65 100644
--- a/libstdc++-v3/testsuite/20_util/forward/f_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/forward/f_neg.cc
@@ -17,7 +17,7 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-error "static assertion failed" "" { target *-*-* } 89 }
+// { dg-error "must not be an lvalue reference" "" { target *-*-* } 0 }
 
 #include 
 


[committed] libstdc++: Add nodiscard attribute to cast-like functions

2021-04-06 Thread Jonathan Wakely via Gcc-patches
Add [[nodiscard]] to functions that are effectively just a static_cast,
as per P2351. Also add it to std::addressof.

libstdc++-v3/ChangeLog:

* include/bits/move.h (forward, move, move_if_noexcept)
(addressof): Add _GLIBCXX_NODISCARD.
* include/bits/ranges_cmp.h (identity::operator()): Add
nodiscard attribute.
* include/c_global/cstddef (to_integer): Likewise.
* include/std/bit (bit_cast): Likewise.
* include/std/utility (as_const, to_underlying): Likewise.

Tested powerpc64le-linux. Committed to trunk.
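To illustrate what the attribute buys, here is a sketch of a cast-like helper annotated the same way. `to_underlying_value` and `Color` are hypothetical names, not the library's. Calling such a function as a bare statement does nothing useful, so GCC now warns about the ignored return value.

```cpp
#include <type_traits>

// Hypothetical cast-like helper in the style of std::to_underlying: it only
// converts its argument, so discarding the result is almost certainly a bug.
template<typename E>
[[nodiscard]] constexpr std::underlying_type_t<E>
to_underlying_value(E e) noexcept
{ return static_cast<std::underlying_type_t<E>>(e); }

enum class Color : unsigned char { red = 1, green = 2 };

// Statements such as
//   std::move(s);                     // nothing is moved, result dropped
//   to_underlying_value(Color::red);  // conversion result dropped
// now draw a warning about ignoring a [[nodiscard]] return value.
constexpr unsigned char green_value = to_underlying_value(Color::green);
static_assert(green_value == 2, "Color::green has underlying value 2");
```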

commit 406f58e1e38e92e4b881f3666b596843da308783
Author: Jonathan Wakely 
Date:   Tue Apr 6 14:41:29 2021

libstdc++: Add nodiscard attribute to cast-like functions

Add [[nodiscard]] to functions that are effectively just a static_cast,
as per P2351. Also add it to std::addressof.

libstdc++-v3/ChangeLog:

* include/bits/move.h (forward, move, move_if_noexcept)
(addressof): Add _GLIBCXX_NODISCARD.
* include/bits/ranges_cmp.h (identity::operator()): Add
nodiscard attribute.
* include/c_global/cstddef (to_integer): Likewise.
* include/std/bit (bit_cast): Likewise.
* include/std/utility (as_const, to_underlying): Likewise.

diff --git a/libstdc++-v3/include/bits/move.h b/libstdc++-v3/include/bits/move.h
index feacae084c9..3abbb37ceeb 100644
--- a/libstdc++-v3/include/bits/move.h
+++ b/libstdc++-v3/include/bits/move.h
@@ -72,6 +72,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  This function is used to implement "perfect forwarding".
*/
   template<typename _Tp>
+_GLIBCXX_NODISCARD
 constexpr _Tp&&
 forward(typename std::remove_reference<_Tp>::type& __t) noexcept
 { return static_cast<_Tp&&>(__t); }
@@ -83,6 +84,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  This function is used to implement "perfect forwarding".
*/
   template<typename _Tp>
+_GLIBCXX_NODISCARD
 constexpr _Tp&&
 forward(typename std::remove_reference<_Tp>::type&& __t) noexcept
 {
@@ -97,6 +99,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @return The parameter cast to an rvalue-reference to allow moving it.
   */
   template<typename _Tp>
+_GLIBCXX_NODISCARD
 constexpr typename std::remove_reference<_Tp>::type&&
 move(_Tp&& __t) noexcept
 { return static_cast<typename std::remove_reference<_Tp>::type&&>(__t); }
@@ -116,6 +119,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  type is copyable, in which case an lvalue-reference is returned instead.
*/
   template<typename _Tp>
+_GLIBCXX_NODISCARD
 constexpr typename
 conditional<__move_if_noexcept_cond<_Tp>::value, const _Tp&, _Tp&&>::type
 move_if_noexcept(_Tp& __x) noexcept
@@ -136,6 +140,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @return   The actual address.
   */
   template<typename _Tp>
+_GLIBCXX_NODISCARD
 inline _GLIBCXX17_CONSTEXPR _Tp*
 addressof(_Tp& __r) noexcept
 { return std::__addressof(__r); }
diff --git a/libstdc++-v3/include/bits/ranges_cmp.h b/libstdc++-v3/include/bits/ranges_cmp.h
index 3f71d31e5a6..f859a33b2c1 100644
--- a/libstdc++-v3/include/bits/ranges_cmp.h
+++ b/libstdc++-v3/include/bits/ranges_cmp.h
@@ -47,6 +47,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   struct identity
   {
 template<typename _Tp>
+  [[nodiscard]]
   constexpr _Tp&&
   operator()(_Tp&& __t) const noexcept
   { return std::forward<_Tp>(__t); }
diff --git a/libstdc++-v3/include/c_global/cstddef b/libstdc++-v3/include/c_global/cstddef
index 11c808cab28..13ef7f03c12 100644
--- a/libstdc++-v3/include/c_global/cstddef
+++ b/libstdc++-v3/include/c_global/cstddef
@@ -169,6 +169,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return __l = __l ^ __r; }
 
   template<typename _IntegerType>
+[[nodiscard]]
 constexpr _IntegerType
 to_integer(__byte_op_t<_IntegerType> __b) noexcept
 { return _IntegerType(__b); }
diff --git a/libstdc++-v3/include/std/bit b/libstdc++-v3/include/std/bit
index 8638a02c8a6..fb78578448c 100644
--- a/libstdc++-v3/include/std/bit
+++ b/libstdc++-v3/include/std/bit
@@ -54,6 +54,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   /// Create a value of type `To` from the bits of `from`.
   template<typename _To, typename _From>
+[[nodiscard]]
 constexpr _To
 bit_cast(const _From& __from) noexcept
 {
diff --git a/libstdc++-v3/include/std/utility b/libstdc++-v3/include/std/utility
index fb19d62968f..3e68f682e00 100644
--- a/libstdc++-v3/include/std/utility
+++ b/libstdc++-v3/include/std/utility
@@ -386,7 +386,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 #define  __cpp_lib_as_const 201510
   template<typename _Tp>
-constexpr add_const_t<_Tp>& as_const(_Tp& __t) noexcept { return __t; }
+[[nodiscard]]
+constexpr add_const_t<_Tp>&
+as_const(_Tp& __t) noexcept
+{ return __t; }
 
   template<typename _Tp>
 void as_const(const _Tp&&) = delete;
@@ -466,6 +469,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #define __cpp_lib_to_underlying 202102L
   /// Convert an object of enumeration type to its underlying type.
   template<typename _Tp>
+[[nodiscard]]
 constexpr underlying_type_t<_Tp>
 to_underlying(_

Re: [committed] libstdc++: Fix doxygen markup for group close commands

2021-04-06 Thread Jonathan Wakely via Gcc-patches

On 06/04/21 16:54 +0100, Jonathan Wakely wrote:

   https://godbolt.org/z/hTsT96

   A change in Doxygen 1.8.16 means that "// @}" is no longer recognized by
   Doxygen, so doesn't close a @{ group. A "///" comment needs to be used.
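As an illustration, a Doxygen member group in a header looks like this. The helper functions are hypothetical. The closing marker is written with three slashes (`/// @}`); according to the message above, the two-slash form `// @}` is ignored by Doxygen 1.8.16 and later, which leaves the group open.

```cpp
/// @name Comparison helpers (hypothetical example)
/// @{

/// Returns true if @p a is less than @p b.
constexpr bool is_less(int a, int b) noexcept { return a < b; }

/// Returns true if @p a is greater than @p b.
constexpr bool is_greater(int a, int b) noexcept { return a > b; }

/// @}
```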

   libstdc++-v3/ChangeLog:

   * include/bits/atomic_base.h: Fix doxygen group close.
   * include/bits/basic_ios.h: Likewise.
   * include/bits/forward_list.h: Likewise.
   * include/bits/fs_dir.h: Likewise.
   * include/bits/fs_ops.h: Likewise.
   * include/bits/fs_path.h: Likewise.
   * include/bits/functional_hash.h: Likewise.
   * include/bits/gslice.h: Likewise.
   * include/bits/gslice_array.h: Likewise. It wasn't clear
   in C++17 but the intent was always to require them to be
   constant expressions.
   * include/bits/hashtable_policy.h: Likewise.
   * include/bits/indirect_array.h: Likewise.
   * include/bits/locale_classes.h: Likewise.
   * include/bits/locale_facets.h: Likewise.
   * include/bits/locale_facets_nonio.h: Likewise.
   * include/bits/mask_array.h: Likewise.
   * include/bits/refwrap.h: Likewise.
   * include/bits/regex.h: Likewise.
   * include/bits/regex_automaton.h: Likewise.
   * include/bits/regex_compiler.h: Likewise.
   * include/bits/regex_constants.h: Likewise.
   * include/bits/regex_error.h: Likewise.
   * include/bits/regex_executor.h: Likewise.
   * include/bits/regex_scanner.h: Likewise.
   * include/bits/shared_ptr.h: Likewise.
   * include/bits/shared_ptr_atomic.h: Likewise.
   * include/bits/shared_ptr_base.h: Likewise.
   * include/bits/slice_array.h: Likewise.
   * include/bits/specfun.h: Likewise.
   * include/bits/std_function.h: Likewise.
   * include/bits/std_mutex.h: Likewise.
   * include/bits/stl_deque.h: Likewise.
   * include/bits/stl_iterator.h: Likewise.
   * include/bits/stl_iterator_base_types.h: Likewise.
   * include/bits/stl_map.h: Likewise.
   * include/bits/stl_multimap.h: Likewise.
   * include/bits/stl_multiset.h: Likewise.
   * include/bits/stl_numeric.h: Likewise.
   * include/bits/stl_pair.h: Likewise.
   * include/bits/stl_set.h: Likewise.
   * include/bits/stl_uninitialized.h: Likewise.
   * include/bits/stream_iterator.h: Likewise.
   * include/bits/streambuf_iterator.h: Likewise.
   * include/bits/unique_ptr.h: Likewise.
   * include/bits/unordered_map.h: Likewise.
   * include/bits/unordered_set.h: Likewise.
   * include/decimal/decimal: Likewise.
   * include/experimental/any: Likewise.
   * include/experimental/array: Likewise.
   * include/experimental/bits/fs_dir.h: Likewise.
   * include/experimental/bits/fs_fwd.h: Likewise.
   * include/experimental/bits/fs_ops.h: Likewise.
   * include/experimental/bits/fs_path.h: Likewise.
   * include/experimental/buffer: Likewise.
   * include/experimental/internet: Likewise.
   * include/experimental/optional: Likewise.
   * include/experimental/propagate_const: Likewise.
   * include/experimental/socket: Likewise.
   * include/ext/pb_ds/assoc_container.hpp: Likewise.
   * include/ext/pb_ds/detail/priority_queue_base_dispatch.hpp:
   Likewise.
   * include/ext/pb_ds/detail/tree_policy/node_metadata_selector.hpp:
   Likewise.
   * include/ext/pb_ds/detail/trie_policy/node_metadata_selector.hpp:
   Likewise.
   * include/ext/pb_ds/detail/types_traits.hpp: Likewise.
   * include/ext/pb_ds/exception.hpp: Likewise.
   * include/ext/pb_ds/priority_queue.hpp: Likewise.
   * include/ext/pb_ds/tag_and_trait.hpp: Likewise.
   * include/ext/random: Likewise.
   * include/std/any: Likewise.
   * include/std/atomic: Likewise.
   * include/std/bitset: Likewise.
   * include/std/chrono: Likewise.
   * include/std/complex: Likewise.
   * include/std/condition_variable: Likewise.
   * include/std/fstream: Likewise.
   * include/std/future: Likewise.
   * include/std/iostream: Likewise.
   * include/std/istream: Likewise.
   * include/std/mutex: Likewise.
   * include/std/numeric: Likewise.
   * include/std/ostream: Likewise.
   * include/std/ratio: Likewise.
   * include/std/shared_mutex: Likewise.
   * include/std/stdexcept: Likewise.
   * include/std/streambuf: Likewise.
   * include/std/system_error: Likewise.
   * include/std/thread: Likewise.
   * include/std/valarray: Likewise.
   * include/std/variant: Likewise.
   * include/tr1/cmath: Likewise.
   * in

Re: [Patch, fortran] 99307 - FAIL: gfortran.dg/class_assign_4.f90 execution test

2021-04-06 Thread Paul Richard Thomas via Gcc-patches
Hi Tobias,

I believe that the attached fixes the problems that you found with
gfc_find_and_cut_at_last_class_ref.

I will test:
   type1%type%array_class2 → NULL is returned  (why?)
   class1%type%array_class2 → ts = class1 but array_class2 is used later on (oops!)
   class1%...%scalar_class2 → ts = class1 but scalar_class2 is used

The ChangeLogs remain the same, apart from the date.

Regtests OK on FC33/x86_64.

Paul


On Mon, 29 Mar 2021 at 14:58, Tobias Burnus  wrote:

> Hi all,
>
> As a preliminary remark, I want to note that the testcase class_assign_4.f90
> was added for PR83118/PR96012 (fixes problems in handling class objects,
> Dec 18, 2020) and was revised for PR99124 (class defined operators,
> Feb 23, 2021).
> Both patches were then also applied to GCC 9 and 10.
>
> On 26.03.21 17:30, Paul Richard Thomas via Gcc-patches wrote:
> > This patch comes in two versions: submit.diff with Change.Logs or
> > submit2.diff with Change2.Logs.
> > The first fixes the problem by changing array temporaries from class
> > expressions into class temporaries. This permits the use of
> > gfc_get_class_from_expr to obtain the vptr for these temporaries and all
> > the good things that come with that when handling dynamic types. The
> > second part of the fix is to use the array element length from the class
> > descriptor, when reallocating on assignment. This is needed because the
> > vptr is being set too early. I will set about trying to track down why
> > this is happening and fix it after release.
> >
> > The second version does the same as the first but puts in place a load of
> > tidying up that is permitted by the fix to class array temporaries.
>
> > I couldn't readily see how to prepare a testcase - ideas?
> > Both regtest on FC33/x86_64. The first was tested by Dominique (see the
> > PR). OK for master?
>
> Typo – underscore-'c' should be a dot-'c' – both changelog files
>
> >   * trans-expr_c (gfc_trans_scalar_assign): Make use of pre and
>
> I think the second longer version is nicer in general, but at least for
> GCC 9/GCC10 the first version is simpler and, hence, less error prone.
>
> As you only ask about mainline, I would prefer the second one.
>
> However, I am not happy about gfc_find_and_cut_at_last_class_ref:
>
> > +   of refs following. If ts is non-null the cut is at the class entity
> > +   or component that is followed by an array reference, which is not
> > +   an element.  */
> > ...
> > +
> > +  if (ts)
> > +    {
> > +      if (e->symtree
> > +          && e->symtree->n.sym->ts.type == BT_CLASS)
> > +        *ts = &e->symtree->n.sym->ts;
> > +      else
> > +        *ts = NULL;
> > +    }
> > +
> >    for (ref = e->ref; ref; ref = ref->next)
> >      {
> > +      if (ts && ref->type == REF_COMPONENT
> > +          && ref->u.c.component->ts.type == BT_CLASS
> > +          && ref->next && ref->next->type == REF_COMPONENT
> > +          && strcmp (ref->next->u.c.component->name, "_data") == 0
> > +          && ref->next->next
> > +          && ref->next->next->type == REF_ARRAY
> > +          && ref->next->next->u.ar.type != AR_ELEMENT)
> > +        {
> > +          *ts = &ref->u.c.component->ts;
> > +          class_ref = ref;
> > +          break;
> > +        }
> > +
> > +      if (ts && *ts == NULL)
> > +        return NULL;
> > +
> Namely, if there is:
>   type1%array_class2 → array_class2 is used for 'ts' and later (ok)
>   type1%type%array_class2 → NULL is returned (why?)
>   class1%type%array_class2 → ts = class1 but array_class2 is used later on (oops!)
>   class1%...%scalar_class2 → ts = class1 but scalar_class2 is used
> etc.
>
> Thus this either needs to be cleaned up (a separate 'ref' loop for
> ts != NULL), including the wording in the description, which should state
> what happens if 'ts' is passed as an argument but the expr has rank == 0,
> and what value is assigned to 'ts'. (You can then also fix 'class.c::' to
> 'class.c: ' in the description above the function.)
>
> Alternatively, you can leave the current ref-handling code in place
> in build_class_array_ref, which might be the simpler option.
>
> Otherwise, it looks sensible to me.
>
> Tobias
>
> -
> Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München
> Registration court: Munich, HRB 106955; Managing Directors: Thomas Heurung, Frank Thürauf
>




RE: [PATCH] Ada: hashed container Cursor type predefined equality non-conformance

2021-04-06 Thread Richard Wai
Pinging this.

> -Original Message-
> From: Richard Wai 
> Sent: March 16, 2021 2:19 PM
> To: 'gcc-patches@gcc.gnu.org' 
> Cc: 'Arnaud Charlet' ; 'Bob Duff'
> 
> Subject: RE: [PATCH] Ada: hashed container Cursor type predefined equality
> non-conformance
> 
> Just a note that I do not have write access, so I will need someone who
> does to commit this patch, if approved.
> 
> Richard
> 
> > -Original Message-
> > From: Richard Wai 
> > Sent: March 11, 2021 9:13 AM
> > To: 'Arnaud Charlet' 
> > Cc: 'gcc-patches@gcc.gnu.org' ; 'Bob Duff'
> > 
> > Subject: RE: [PATCH] Ada: hashed container Cursor type predefined
> > equality non-conformance
> >
> > Here is the amended commit log:
> >
> > --
> >
> > Ada: Ensure correct predefined equality behavior for Cursor objects of
> > hashed containers.
> >
> > 2021-03-11  Richard Wai  
> >
> > gcc/ada/
> > * libgnat/a-cohase.ads (Cursor): Synchronize comments for the Cursor
> > type definition to be consistent with identical definitions in other
> > container packages. Add additional comments regarding the importance
> > of maintaining the "Position" component for predefined equality.
> > * libgnat/a-cohama.ads (Cursor): Likewise.
> > * libgnat/a-cihama.ads (Cursor): Likewise.
> > * libgnat/a-cohase.adb (Find, Insert): Ensure that Cursor objects
> > always have their "Position" component set to ensure predefined
> > equality works as required.
> > * libgnat/a-cohama.adb (Find, Insert): Likewise.
> > * libgnat/a-cihama.adb (Find, Insert): Likewise.
> >
> > gcc/testsuite/
> > * gnat.dg/containers2.adb: New test.
> >
> > --
> >
> > Thanks!
> >
> > Richard
> >
> > > -Original Message-
> > > From: Arnaud Charlet 
> > > Sent: March 10, 2021 11:27 AM
> > > To: Richard Wai 
> > > Cc: gcc-patches@gcc.gnu.org; 'Bob Duff' ; Arnaud
> > > Charlet 
> > > Subject: Re: [PATCH] Ada: hashed container Cursor type predefined
> > > equality non-conformance
> > >
> > > > I'm not sure I correctly understand you here, but my
> > > > interpretation is that I should no longer submit Changelog
> > > > entries, rather just the patch, and then
> > >
> > > Right.
> > >
> > > > a commit message (à la git), and then presumably the ChangeLog
> > > > entries will be generated automatically. From what I can see, GCC's
> > > > website does not talk about that, so I'm guessing this format
> > > > based on what I see from git-log, generally.
> > > >
> > > > So assuming that, attached is the "correct" patch, and here is the
> > > > commit message:
> > > >
> > > > ---
> > > >
> > > > Author: Richard Wai 
> > > >
> > > > Ada: Ensure correct Cursor predefined equality behavior for hashed
> > > > containers.
> > > >
> > > > --
> > > >
> > > > And for the record, the change log entries I've come up with as
> > > > per the previous email:
> > >
> > > And the commit log will look like:
> > >
> > > 2021-03-09  Richard Wai  
> > >
> > > gcc/ada/
> > >   * libgnat/...
> > >
> > > gcc/testsuite/
> > >   * gnat.dg/containers2.adb: ...
> > >
> > > Your patch is OK with these changes, thanks.
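The Ada fix relies on predefined equality comparing every component of the cursor record. Here is a rough C++ analogy, under the assumption that a cursor holds a container, a node, and a bucket position. `Cursor`, `Node` and `Container` are hypothetical and not the GNAT source. If one routine (say `Insert`) leaves the position unset while another (say `Find`) fills it in, two cursors that designate the same element compare unequal.

```cpp
#include <cstddef>

// Hypothetical, simplified model of a hashed-container cursor.
struct Node { int key; };
struct Container {};

struct Cursor
{
  const Container* container = nullptr;
  const Node* node = nullptr;
  std::size_t position = static_cast<std::size_t>(-1);  // "not set"
};

// Memberwise equality, analogous to Ada's predefined "=" on records:
// all three components must match.
constexpr bool operator==(const Cursor& a, const Cursor& b) noexcept
{
  return a.container == b.container && a.node == b.node
         && a.position == b.position;
}
```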



[committed] d: Fix missing call to va_end in getMatchError [PR99917]

2021-04-06 Thread Iain Buclaw via Gcc-patches
Hi,

This patch fixes a missing call to va_end in getMatchError in the
front-end, merged from upstream dmd d16195406.
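The rule behind this fix is that every `va_start` (and every `va_copy`) must be matched by a `va_end` before the function returns. The sketch below shows the same shape as `getMatchError`, formatting variadic arguments into a string. It uses a hypothetical `format_message` helper built on `std::vsnprintf` instead of dmd's `OutBuffer`.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not the dmd code): printf-style formatting into a
// std::string, with every va_start/va_copy paired with a va_end.
std::string format_message(const char *format, ...)
{
  va_list ap;
  va_start(ap, format);

  // vsnprintf consumes the va_list, so measure using a copy and release
  // the copy with its own va_end.
  va_list ap_copy;
  va_copy(ap_copy, ap);
  const int len = std::vsnprintf(nullptr, 0, format, ap_copy);
  va_end(ap_copy);

  std::string result;
  if (len > 0)
    {
      result.resize(static_cast<std::size_t>(len) + 1);
      std::vsnprintf(&result[0], result.size(), format, ap);
      result.resize(static_cast<std::size_t>(len));  // drop the trailing NUL
    }

  va_end(ap);  // the call that was missing in getMatchError
  return result;
}
```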

Bootstrapped and regression tested on x86_64-linux-gnu/-m32/-mx32 and
committed to mainline.

Regards,
Iain.

---
gcc/d/ChangeLog:

PR d/99917
* dmd/MERGE: Merge upstream dmd d16195406.
---
 gcc/d/dmd/MERGE   | 2 +-
 gcc/d/dmd/mtype.c | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/d/dmd/MERGE b/gcc/d/dmd/MERGE
index a89184498c3..98c229d8254 100644
--- a/gcc/d/dmd/MERGE
+++ b/gcc/d/dmd/MERGE
@@ -1,4 +1,4 @@
-5cc71ff830fcfba218152360014298550be9180e
+d16195406e1795ee91f2acb8f522fcb4ec698f47
 
 The first line of this file holds the git revision number of the last
 merge done from the dlang/dmd repository.
diff --git a/gcc/d/dmd/mtype.c b/gcc/d/dmd/mtype.c
index 57aa244b8b8..1c73f50c205 100644
--- a/gcc/d/dmd/mtype.c
+++ b/gcc/d/dmd/mtype.c
@@ -5220,6 +5220,7 @@ static const char *getMatchError(const char *format, ...)
 va_list ap;
 va_start(ap, format);
 buf.vprintf(format, ap);
+va_end(ap);
 return buf.extractChars();
 }
 
-- 
2.27.0



[committed] d: Increment gaggedWarnings if warning or deprecation message was suppressed

2021-04-06 Thread Iain Buclaw via Gcc-patches
Hi,

This patch increments the gaggedWarnings count if a warning or deprecation
message was suppressed.  The front-end uses this count to catch potential
errors in code that is being compiled in a speculative context.
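A rough model of the new control flow follows. The diff only shows the added `else if (global.gag)` branch, so the reporting condition used here ("not gagged and warnings enabled") is an assumption. `Global`, `warning` and `had_gagged_warnings` are hypothetical stand-ins for the front end's state and callers.

```cpp
// Hypothetical stand-in for the front end's `global' state.
struct Global
{
  unsigned gag = 0;             // > 0 while compiling speculatively
  unsigned gaggedWarnings = 0;  // warnings suppressed while gagged
  unsigned printed = 0;         // warnings actually reported (illustrative)
};

// Sketch of vwarning/vdeprecation after the patch: report when allowed,
// otherwise count the message if it was suppressed by the gag.
void warning(Global& g, bool warnings_enabled)
{
  if (!g.gag && warnings_enabled)
    ++g.printed;
  else if (g.gag)
    ++g.gaggedWarnings;
}

// A caller can snapshot the counter around a speculative compilation to
// find out whether anything was silenced.
template<typename F>
bool had_gagged_warnings(Global& g, F speculative)
{
  const unsigned before = g.gaggedWarnings;
  ++g.gag;
  speculative();
  --g.gag;
  return g.gaggedWarnings != before;
}
```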

Bootstrapped and regression tested on x86_64-linux-gnu/-m32/-mx32 and
committed to mainline.

Regards,
Iain.

---
gcc/d/ChangeLog:

* d-diagnostic.cc (vwarning): Increment gaggedWarnings if warning
message was suppressed.
(vdeprecation): Likewise for deprecation messages.
---
 gcc/d/d-diagnostic.cc | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/d/d-diagnostic.cc b/gcc/d/d-diagnostic.cc
index 659fae24459..3bf5a535edd 100644
--- a/gcc/d/d-diagnostic.cc
+++ b/gcc/d/d-diagnostic.cc
@@ -239,6 +239,8 @@ vwarning (const Loc &loc, const char *format, va_list ap)
 
   d_diagnostic_report_diagnostic (loc, 0, format, ap, DK_WARNING, false);
 }
+  else if (global.gag)
+global.gaggedWarnings++;
 }
 
 /* Print supplementary message about the last warning with explicit location
@@ -297,6 +299,8 @@ vdeprecation (const Loc &loc, const char *format, va_list ap,
  DK_WARNING, false);
   free (xformat);
 }
+  else if (global.gag)
+global.gaggedWarnings++;
 }
 
 /* Print supplementary message about the last deprecation with explicit
-- 
2.27.0



[committed] d: Use Array::find to get index of element

2021-04-06 Thread Iain Buclaw via Gcc-patches
Hi,

This patch refactors some code in the code generator to use the
Array::find method to get the index of an element, instead of looping
over the array ourselves.
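The refactored pattern finds the index once, removes the element if it is present (a "not found" index is at least the array length), and then re-inserts it at the wanted position. The sketch below uses `std::vector` and `std::find` in place of dmd's `Array::find`. The helpers `find_index` and `move_to` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Index of VALUE in V, or v.size() if absent (the same "not found means
// index >= length" convention the patch tests with `j < length').
template<typename T>
std::size_t find_index(const std::vector<T>& v, const T& value)
{
  return static_cast<std::size_t>(std::find(v.begin(), v.end(), value)
                                  - v.begin());
}

// Remove VALUE if present, then insert it at POS (POS must not exceed the
// size after removal).  VALUE is taken by copy so erasing cannot leave it
// dangling.
template<typename T>
void move_to(std::vector<T>& v, T value, std::size_t pos)
{
  const std::size_t j = find_index(v, value);
  if (j < v.size())
    v.erase(v.begin() + static_cast<std::ptrdiff_t>(j));
  v.insert(v.begin() + static_cast<std::ptrdiff_t>(pos), value);
}
```

Unlike the first removed loop, which started scanning at `i`, this lookup searches the whole array.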

Bootstrapped and regression tested on x86_64-linux-gnu/-m32/-mx32 and
committed to mainline.

Regards,
Iain.

---
gcc/d/ChangeLog:

* d-codegen.cc (build_frame_type): Use Array::find to get index of
element.
---
 gcc/d/d-codegen.cc | 28 ++--
 1 file changed, 10 insertions(+), 18 deletions(-)

diff --git a/gcc/d/d-codegen.cc b/gcc/d/d-codegen.cc
index 608abcd94f5..5fa1acd9240 100644
--- a/gcc/d/d-codegen.cc
+++ b/gcc/d/d-codegen.cc
@@ -2507,15 +2507,11 @@ build_frame_type (tree ffi, FuncDeclaration *fd)
{
  VarDeclaration *v = (*fd->parameters)[i];
  /* Remove if already in closureVars so can push to front.  */
- for (size_t j = i; j < fd->closureVars.length; j++)
-   {
- Dsymbol *s = fd->closureVars[j];
- if (s == v)
-   {
- fd->closureVars.remove (j);
- break;
-   }
-   }
+ size_t j = fd->closureVars.find (v);
+
+ if (j < fd->closureVars.length)
+   fd->closureVars.remove (j);
+
  fd->closureVars.insert (i, v);
}
}
@@ -2523,15 +2519,11 @@ build_frame_type (tree ffi, FuncDeclaration *fd)
   /* Also add hidden `this' to outer context.  */
   if (fd->vthis)
{
- for (size_t i = 0; i < fd->closureVars.length; i++)
-   {
- Dsymbol *s = fd->closureVars[i];
- if (s == fd->vthis)
-   {
- fd->closureVars.remove (i);
- break;
-   }
-   }
+ size_t i = fd->closureVars.find (fd->vthis);
+
+ if (i < fd->closureVars.length)
+   fd->closureVars.remove (i);
+
  fd->closureVars.insert (0, fd->vthis);
}
 }
-- 
2.27.0



[committed] d: Merge upstream dmd 5cc71ff83, druntime 1134b710

2021-04-06 Thread Iain Buclaw via Gcc-patches
Hi,

This patch merges the D front-end implementation with upstream dmd
5cc71ff83, and the Phobos standard library with druntime 1134b710.

D front-end changes:

 - Fix ICEs that occurred when using opaque enums.

 - Update `pragma(printf)' checking code to work on 16-bit targets.

Phobos change:

 - Don't compile in argTypes code on AArch64
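The `chkformat.c` hunk further down changes the expected type reported in `pragma(printf)` errors. Unsigned conversions now name the unsigned D type, and `%lu`/`%ld` choose between the 32- and 64-bit names based on the target's C `long` size. Here is a sketch of that mapping. The `Fmt` enum and `expected_d_type` are hypothetical simplifications of the `Format_*` cases.

```cpp
#include <string>

// Hypothetical subset of the printf conversions handled in chkformat.c.
enum class Fmt { u, d, hhu, hhd, hu, hd, lu, ld, llu, lld };

// D type name reported as expected for each conversion.  C_LONGSIZE is the
// size in bytes of C `long' on the target (target.c.longsize in the diff).
std::string expected_d_type(Fmt f, unsigned c_longsize)
{
  switch (f)
    {
    case Fmt::u:   return "uint";
    case Fmt::d:   return "int";
    case Fmt::hhu: return "ubyte";
    case Fmt::hhd: return "byte";
    case Fmt::hu:  return "ushort";
    case Fmt::hd:  return "short";
    case Fmt::lu:  return c_longsize == 4 ? "uint" : "ulong";
    case Fmt::ld:  return c_longsize == 4 ? "int" : "long";
    case Fmt::llu: return "ulong";
    case Fmt::lld: return "long";
    }
  return "";
}
```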

Bootstrapped and regression tested on x86_64-linux-gnu/-m32/-mx32 and
committed to mainline.

Regards,
Iain.

---
gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 5cc71ff83.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 1134b710.
---
 gcc/d/dmd/MERGE   |   2 +-
 gcc/d/dmd/chkformat.c |  44 +++--
 gcc/d/dmd/denum.c |  54 +++---
 gcc/d/dmd/dsymbolsem.c|  17 +-
 gcc/d/dmd/mtype.c |  17 --
 .../gdc.test/fail_compilation/chkformat.d |  34 +++-
 .../gdc.test/fail_compilation/enum_init.d | 171 ++
 .../gdc.test/fail_compilation/fail109.d   |   8 +-
 .../gdc.test/fail_compilation/ice10770.d  |  13 --
 .../gdc.test/fail_compilation/ice8511.d   |  13 --
 libphobos/libdruntime/MERGE   |   2 +-
 libphobos/libdruntime/object.d|   2 +-
 12 files changed, 287 insertions(+), 90 deletions(-)
 create mode 100644 gcc/testsuite/gdc.test/fail_compilation/enum_init.d
 delete mode 100644 gcc/testsuite/gdc.test/fail_compilation/ice10770.d
 delete mode 100644 gcc/testsuite/gdc.test/fail_compilation/ice8511.d

diff --git a/gcc/d/dmd/MERGE b/gcc/d/dmd/MERGE
index 86475c80d35..a89184498c3 100644
--- a/gcc/d/dmd/MERGE
+++ b/gcc/d/dmd/MERGE
@@ -1,4 +1,4 @@
-3b808e838bb00f527eb4ed5281cd985756237b8f
+5cc71ff830fcfba218152360014298550be9180e
 
 The first line of this file holds the git revision number of the last
 merge done from the dlang/dmd repository.
diff --git a/gcc/d/dmd/chkformat.c b/gcc/d/dmd/chkformat.c
index d00b658ca00..a4a97c9bf50 100644
--- a/gcc/d/dmd/chkformat.c
+++ b/gcc/d/dmd/chkformat.c
@@ -610,7 +610,7 @@ bool checkPrintfFormat(const Loc &loc, const char *format, Expressions &args, bo
 Type *t = e->type->toBasetype();
 Type *tnext = t->nextOf();
 const unsigned c_longsize = target.c.longsize;
-const bool is64bit = global.params.is64bit;
+const unsigned ptrsize = target.ptrsize;
 
 // Types which are promoted to int are allowed.
 // Spec: C99 6.5.2.2.7
@@ -619,46 +619,56 @@ bool checkPrintfFormat(const Loc &loc, const char *format, Expressions &args, bo
 case Format_u:  // unsigned int
 case Format_d:  // int
 if (t->ty != Tint32 && t->ty != Tuns32)
-errorPrintfFormat(NULL, slice, e, "int", t);
+errorPrintfFormat(NULL, slice, e, fmt == Format_u ? "uint" : "int", t);
 break;
 
 case Format_hhu:// unsigned char
 case Format_hhd:// signed char
 if (t->ty != Tint32 && t->ty != Tuns32 && t->ty != Tint8 && t->ty != Tuns8)
-errorPrintfFormat(NULL, slice, e, "byte", t);
+errorPrintfFormat(NULL, slice, e, fmt == Format_hhu ? "ubyte" : "byte", t);
 break;
 
 case Format_hu: // unsigned short int
 case Format_hd: // short int
 if (t->ty != Tint32 && t->ty != Tuns32 && t->ty != Tint16 && t->ty != Tuns16)
-errorPrintfFormat(NULL, slice, e, "short", t);
+errorPrintfFormat(NULL, slice, e, fmt == Format_hu ? "ushort" : "short", t);
 break;
 
 case Format_lu: // unsigned long int
 case Format_ld: // long int
 if (!(t->isintegral() && t->size() == c_longsize))
-errorPrintfFormat(NULL, slice, e, (c_longsize == 4 ? "int" : "long"), t);
+{
+if (fmt == Format_lu)
+errorPrintfFormat(NULL, slice, e, (c_longsize == 4 ? "uint" : "ulong"), t);
+else
+errorPrintfFormat(NULL, slice, e, (c_longsize == 4 ? "int" : "long"), t);
+}
 break;
 
 case Format_llu:// unsigned long long int
 case Format_lld:// long long int
 if (t->ty != Tint64 && t->ty != Tuns64)
-errorPrintfFormat(NULL, slice, e, "long", t);
+errorPrintfFormat(NULL, slice, e, fmt == Format_llu ? "ulong" : "long", t);
 break;
 
 case Format_ju: // uintmax_t
 case Format_jd: // intmax_t
 if (t->ty != Tint64 && t->ty != Tuns64)
-errorPrintfFormat(NULL, slice, e, "core.stdc.stdint.intmax_t", t);
+{
+if (fmt == Format_ju)
+

[PATCH] libiberty: d: Add support for `typeof(*null)' and function literal symbols.

2021-04-06 Thread Iain Buclaw via Gcc-patches
Hi,

This patch adds support for demangling function literals as template
value parameters, as well as adding the new bottom type `typeof(*null)'.
Null types were incorrectly demangled as `none'; they are now demangled
as `typeof(null)'.

Bootstrapped and regression tested on x86_64-linux-gnu.

OK for mainline?

Regards,
Iain.

---
libiberty/ChangeLog:

* d-demangle.c (dlang_attributes): Handle typeof(*null).
(dlang_type): Likewise.  Demangle 'n' as typeof(null).
(dlang_identifier): Skip over fake parent manglings.
(dlang_parse_arrayliteral): Add 'info' parameter.
(dlang_parse_assocarray): Likewise.
(dlang_parse_structlit): Likewise.
(dlang_value): Likewise.  Handle function literal symbols.
(dlang_template_args): Pass 'info' to dlang_value.
* testsuite/d-demangle-expected: Update tests.
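The "fake parent" handling in `dlang_identifier` recognises a length-prefixed identifier of the form `__S` followed only by decimal digits, and skips over it. Below is a sketch of just the recognition step. `is_fake_parent` is a hypothetical name, and `std::isdigit` stands in for libiberty's `ISDIGIT`.

```cpp
#include <cctype>
#include <cstddef>

// True if the LEN characters at MANGLED have the form __S<digits>, with at
// least one digit and nothing else, matching the check the patch adds.
bool is_fake_parent(const char* mangled, std::size_t len)
{
  if (len < 4 || mangled[0] != '_' || mangled[1] != '_' || mangled[2] != 'S')
    return false;
  for (std::size_t i = 3; i < len; ++i)
    if (!std::isdigit(static_cast<unsigned char>(mangled[i])))
      return false;
  return true;
}
```

Anything else, such as `__S1x`, falls through and is demangled as a plain identifier.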
---
 libiberty/d-demangle.c  | 71 +++--
 libiberty/testsuite/d-demangle-expected | 34 +++-
 2 files changed, 89 insertions(+), 16 deletions(-)

diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c
index 822c7580782..a2152cc6551 100644
--- a/libiberty/d-demangle.c
+++ b/libiberty/d-demangle.c
@@ -191,7 +191,8 @@ static const char *dlang_function_args (string *, const char *,
 
 static const char *dlang_type (string *, const char *, struct dlang_info *);
 
-static const char *dlang_value (string *, const char *, const char *, char);
+static const char *dlang_value (string *, const char *, const char *, char,
+   struct dlang_info *);
 
 static const char *dlang_parse_qualified (string *, const char *,
  struct dlang_info *, int);
@@ -573,9 +574,11 @@ dlang_attributes (string *decl, const char *mangled)
case 'g':
case 'h':
case 'k':
+   case 'n':
  /* inout parameter is represented as 'Ng'.
 vector parameter is represented as 'Nh'.
-return paramenter is represented as 'Nk'.
+return parameter is represented as 'Nk'.
+typeof(*null) parameter is represented as 'Nn'.
 If we see this, then we know we're really in the
 parameter list.  Rewind and break.  */
  mangled--;
@@ -787,6 +790,12 @@ dlang_type (string *decl, const char *mangled, struct dlang_info *info)
  string_append (decl, ")");
  return mangled;
}
+  else if (*mangled == 'n') /* typeof(*null) */
+   {
+ mangled++;
+ string_append (decl, "typeof(*null)");
+ return mangled;
+   }
   else
return NULL;
 case 'A': /* dynamic array (T[]) */
@@ -884,7 +893,7 @@ dlang_type (string *decl, const char *mangled, struct dlang_info *info)
 /* Basic types */
 case 'n':
   mangled++;
-  string_append (decl, "none");
+  string_append (decl, "typeof(null)");
   return mangled;
 case 'v':
   mangled++;
@@ -1035,6 +1044,25 @@ dlang_identifier (string *decl, const char *mangled, struct dlang_info *info)
   && (mangled[2] == 'T' || mangled[2] == 'U'))
 return dlang_parse_template (decl, mangled, info, len);
 
+  /* There can be multiple different declarations in the same function that have
+ the same mangled name.  To make the mangled names unique, a fake parent in
+ the form `__Sddd' is added to the symbol.  */
+  if (len >= 4 && mangled[0] == '_' && mangled[1] == '_' && mangled[2] == 'S')
+{
+  const char *numptr = mangled + 3;
+  while (numptr < (mangled + len) && ISDIGIT (*numptr))
+   numptr++;
+
+  if (mangled + len == numptr)
+   {
+ /* Skip over the fake parent.  */
+ mangled += len;
+ return dlang_identifier (decl, mangled, info);
+   }
+
+  /* else demangle it as a plain identifier.  */
+}
+
   return dlang_lname (decl, mangled, len);
 }
 
@@ -1378,7 +1406,8 @@ dlang_parse_string (string *decl, const char *mangled)
 /* Extract the static array value from MANGLED and append it to DECL.
Return the remaining string on success or NULL on failure.  */
 static const char *
-dlang_parse_arrayliteral (string *decl, const char *mangled)
+dlang_parse_arrayliteral (string *decl, const char *mangled,
+ struct dlang_info *info)
 {
   unsigned long elements;
 
@@ -1389,7 +1418,7 @@ dlang_parse_arrayliteral (string *decl, const char *mangled)
   string_append (decl, "[");
   while (elements--)
 {
-  mangled = dlang_value (decl, mangled, NULL, '\0');
+  mangled = dlang_value (decl, mangled, NULL, '\0', info);
   if (mangled == NULL)
return NULL;
 
@@ -1404,7 +1433,8 @@ dlang_parse_arrayliteral (string *decl, const char *mangled)
 /* Extract the associative array value from MANGLED and append it to DECL.
Return the remaining string on success or NULL on failure.  */
 static const char *
-dlang_parse_assocarray (string *decl, const char *mangled)
+d

[pushed] c++: C++17 constexpr static data member linkage [PR99901]

2021-04-06 Thread Jason Merrill via Gcc-patches
C++17 makes constexpr static data members implicitly inline variables.  In
C++14, a subsequent out-of-class declaration is the definition.  We want to
continue emitting a symbol for such a declaration in C++17 mode, for ABI
compatibility with C++14 code that wants to refer to it.
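Here is a minimal example of the pattern this change is about. `Limits` and `max_depth_address` are hypothetical. The in-class `constexpr` member is implicitly inline in C++17, while the out-of-class declaration is the definition in C++14. Per the message above, GCC now keeps emitting the symbol for that declaration in C++17 mode as well.

```cpp
struct Limits
{
  // C++17: implicitly an inline variable, and already a definition.
  static constexpr int max_depth = 42;
};

// C++14: this is the definition that emits the symbol.
// C++17: a redundant (deprecated) redeclaration, for which the symbol is
// still emitted so that C++14 translation units referring to it link.
constexpr int Limits::max_depth;

// Odr-use: taking the address needs the definition to exist somewhere.
const int* max_depth_address() { return &Limits::max_depth; }
```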

Normally I'd distinguish in- and out-of-class declarations by looking at
DECL_IN_AGGR_P, but we never set DECL_IN_AGGR_P on inline variables.  I
think that's wrong, but don't want to mess with it so close to release.
Conveniently, we already have a test for in-class declaration earlier in the
function.
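
A minimal sketch of the two cases described above, assuming a compiler in
C++17 mode (the default in recent GCC releases).  The helper names addr_a and
addr_b are hypothetical.  Taking the address odr-uses each member: A2::a links
without any out-of-class definition because it is implicitly inline, while
B3::b has the out-of-class declaration that, in C++14, is the definition
other translation units may refer to.

```cpp
#include <cassert>

struct A2 {
  static constexpr int a = 5;  // C++17: implicitly an inline variable
};

struct B3 {
  static constexpr int b = 5;
};
// In C++14 this is the one definition of B3::b; in C++17 it is a
// redundant (deprecated) redeclaration of an already-inline variable.
constexpr int B3::b;

// Taking the address odr-uses the member.  In C++17 A2::a needs no
// out-of-class definition; in C++14 this would typically fail to link.
const int *addr_a() { return &A2::a; }
const int *addr_b() { return &B3::b; }
```

For ABI compatibility, the patch keeps emitting a symbol for B3::b in C++17
mode, so C++14 objects that reference it still link.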

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

PR c++/99901
* decl.c (cp_finish_decl): mark_needed an implicitly inline
static data member with an out-of-class redeclaration.

gcc/testsuite/ChangeLog:

PR c++/99901
* g++.dg/cpp1z/inline-var9.C: New test.
---
 gcc/cp/decl.c| 18 ---
 gcc/testsuite/g++.dg/cpp1z/inline-var9.C | 40 
 2 files changed, 54 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/inline-var9.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 6789aa859cc..edab147c78d 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -7693,10 +7693,13 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p,
   if (asmspec_tree && asmspec_tree != error_mark_node)
 asmspec = TREE_STRING_POINTER (asmspec_tree);
 
-  if (current_class_type
-  && CP_DECL_CONTEXT (decl) == current_class_type
-  && TYPE_BEING_DEFINED (current_class_type)
-  && !CLASSTYPE_TEMPLATE_INSTANTIATION (current_class_type)
+  bool in_class_decl
+= (current_class_type
+   && CP_DECL_CONTEXT (decl) == current_class_type
+   && TYPE_BEING_DEFINED (current_class_type)
+   && !CLASSTYPE_TEMPLATE_INSTANTIATION (current_class_type));
+
+  if (in_class_decl
   && (DECL_INITIAL (decl) || init))
 DECL_INITIALIZED_IN_CLASS_P (decl) = 1;
 
@@ -8069,6 +8072,13 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p,
  if (!flag_weak)
/* Check again now that we have an initializer.  */
maybe_commonize_var (decl);
+ /* A class-scope constexpr variable with an out-of-class declaration.
+C++17 makes them implicitly inline, but still force it out.  */
+ if (DECL_INLINE_VAR_P (decl)
+ && !DECL_VAR_DECLARED_INLINE_P (decl)
+ && !DECL_TEMPLATE_INSTANTIATION (decl)
+ && !in_class_decl)
+   mark_needed (decl);
}
 
   if (var_definition_p
diff --git a/gcc/testsuite/g++.dg/cpp1z/inline-var9.C b/gcc/testsuite/g++.dg/cpp1z/inline-var9.C
new file mode 100644
index 000..43c9748877b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/inline-var9.C
@@ -0,0 +1,40 @@
+// PR c++/99901
+// { dg-do compile { target c++11 } }
+// { dg-final { scan-assembler-not "_ZN1A1aE" } }
+// { dg-final { scan-assembler-not "_ZN2A21aE" } }
+// { dg-final { scan-assembler-not "_ZN1CIiE1cE" } }
+// { dg-final { scan-assembler "_ZN1B1bE" } }
+// { dg-final { scan-assembler "_ZN2B21bE" } }
+// { dg-final { scan-assembler "_ZN2B31bE" } }
+
+struct A {
+  static const int a = 5;
+};
+
+struct A2 {
+  static constexpr int a = 5;
+};
+
+struct B {
+  static const int b;
+};
+constexpr int B::b = 5;
+
+struct B2 {
+  static const int b = 5;
+};
+constexpr int B2::b;
+
+struct B3 {
+  static constexpr int b = 5;
+};
+const int B3::b;
+
+template 
+struct C {
+  static constexpr int c = 5;
+};
+template 
+constexpr int C::c;
+
+int i = C::c;

base-commit: 16ea7f57891d3fe885ee55b2917208695e184714
-- 
2.27.0



[pushed] c++: access checking in aggregate initialization [PR96673]

2021-04-06 Thread Jason Merrill via Gcc-patches
We were deferring access checks while parsing B{}, and didn't adjust that
when we went to instantiate the default member initializer for B::c.  As a
result, the access check for C::C was deferred and then performed after
parsing B{}, back in the main() context, which has no access.  We need to do
the access checks in the class context of the DMI.
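
A non-template reduction of the pattern the test exercises, as a sketch; the
names Aggr, Token and read_token are hypothetical.  Token's constructor is
private and Aggr is its friend, so the default member initializer of Aggr::t
may call it, even though Aggr{} itself appears in a non-friend function.  The
failure described above arises when the DMI is instantiated (the template
case), where GCC checked access in the caller's context instead.

```cpp
#include <cassert>

class Aggr;  // plays the role of B in the test

class Token {  // plays the role of C: its constructor is private
  friend class Aggr;
  explicit Token(int v) : value(v) {}

public:
  int value;
};

class Aggr {  // an aggregate: public members, no user-declared constructors
public:
  int a = 7;
  // Default member initializer: access to Token::Token is checked in
  // the context of Aggr, which is a friend of Token.
  Token t = Token{a};
};

int read_token() {
  // Aggregate initialization from a function with no access to Token::Token.
  auto b = Aggr{};
  return b.t.value;
}
```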

I tried fixing this in push_to/pop_from_top_level, but that caused several
regressions.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

PR c++/96673
* init.c (get_nsdmi): Don't defer access checking.

gcc/testsuite/ChangeLog:

PR c++/96673
* g++.dg/cpp1y/nsdmi-aggr13.C: New test.
---
 gcc/cp/init.c |  2 ++
 gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr13.C | 33 +++
 2 files changed, 35 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr13.C

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 7d598f6196d..91b45a1a695 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -591,6 +591,7 @@ get_nsdmi (tree member, bool in_ctor, tsubst_flags_t complain)
{
  push_to_top_level ();
  push_nested_class (ctx);
+ push_deferring_access_checks (dk_no_deferred);
  pushed = true;
}
 
@@ -616,6 +617,7 @@ get_nsdmi (tree member, bool in_ctor, tsubst_flags_t complain)
 
  if (pushed)
{
+ pop_deferring_access_checks ();
  pop_nested_class ();
  pop_from_top_level ();
}
diff --git a/gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr13.C b/gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr13.C
new file mode 100644
index 000..845e26ff593
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr13.C
@@ -0,0 +1,33 @@
+// PR c++/96673
+// { dg-do compile { target c++11 } }
+
+template 
+class A {};
+
+template 
+class B;
+
+template 
+class C {
+private:
+
+friend class B;
+
+explicit C(A&) {};
+};
+
+
+template 
+class B {
+public:
+B() = default;
+//B() {};   // << This implementation of the constructor makes it work
+
+A a = {};
+C c = C{a};
+};
+
+int main() {
+auto b = B{};
+auto &c = b.c;
+}

base-commit: 8685348075d91945066dea9b564bd42cbc1d22bd
-- 
2.27.0



[pushed] c++: Add test for Core issue 1376 [PR52202]

2021-04-06 Thread Marek Polacek via Gcc-patches
As Jens says in the PR, we handle this correctly.
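
As an illustration of the behaviour being tested (a sketch, not part of the
patch; the events vector and bind_through_static_cast are hypothetical), the
order of events below shows that a temporary bound through
static_cast<T&&> lives as long as the reference, as GCC does per the message.

```cpp
#include <cassert>
#include <vector>

// Records the order of events so the extended lifetime is observable.
static std::vector<int> events;

struct T {
  ~T() { events.push_back(2); }  // 2 = temporary destroyed
};

void bind_through_static_cast() {
  // The temporary is bound to r through static_cast<T&&>; its lifetime is
  // extended to that of r instead of ending at the end of the statement.
  T&& r = static_cast<T&&>(T());
  (void)r;
  events.push_back(1);  // 1 = still inside r's scope
}  // r goes out of scope here, so ~T runs after event 1
```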

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/testsuite/ChangeLog:

PR c++/52202
* g++.dg/cpp0x/rv-life.C: New test.
---
 gcc/testsuite/g++.dg/cpp0x/rv-life.C | 12 
 1 file changed, 12 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/rv-life.C

diff --git a/gcc/testsuite/g++.dg/cpp0x/rv-life.C b/gcc/testsuite/g++.dg/cpp0x/rv-life.C
new file mode 100644
index 000..0fd1119d3ff
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/rv-life.C
@@ -0,0 +1,12 @@
+// Core 1376
+// PR c++/52202
+// { dg-do run { target c++11 } }
+
+extern "C" void abort();
+bool x;
+struct T { ~T() { if (!x) abort (); } };
+int main()
+{
+  T&& r = static_cast(T());
+  x = true;
+}

base-commit: 8cac6af6f8ba5cce69161459e572e59c2be60e75
-- 
2.30.2



[PATCH] Improve rtx insn vec output

2021-04-06 Thread Xionghu Luo via Gcc-patches
print_rtl dumps every rtx_insn from the current one until LAST, but in
print_rtx_insn_vec only the particular insn being printed is of interest.
Call print_rtl_single instead, so that the gcse and store-motion pass dumps
show just that insn.
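
To make the difference concrete, here is a toy model in which each insn
points to the next one.  Insn, dump_chain, dump_single and dump_vec are
hypothetical names, not GCC's print-rtl API, and the text format only loosely
imitates an RTL dump.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an insn in the insn chain (not the real rtx_insn).
struct Insn {
  int uid;
  const Insn *next;
};

static std::string show(const Insn *insn) {
  return "(insn " + std::to_string(insn->uid) + ")\n";
}

// Models print_rtl: everything from INSN to the end of the chain.
std::string dump_chain(const Insn *insn) {
  std::string out;
  for (const Insn *p = insn; p != nullptr; p = p->next)
    out += show(p);
  return out;
}

// Models print_rtl_single: only INSN itself.
std::string dump_single(const Insn *insn) { return show(insn); }

// Models print_rtx_insn_vec after the patch: one insn per element,
// separated by ", " as in the original loop.
std::string dump_vec(const std::vector<const Insn *> &vec) {
  std::string out;
  std::size_t len = vec.size();
  for (std::size_t i = 0; i < len; i++) {
    out += dump_single(vec[i]);
    if (i < len - 1)
      out += ", ";
  }
  return out;
}
```

With the old behaviour, printing the first element of a vector holding insns
1 and 3 would have emitted insns 1, 2 and 3 before the separator.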

2021-04-07  Xionghu Luo  

gcc/ChangeLog:

* fold-const.c (fold_single_bit_test): Fix typo.
* print-rtl.c (print_rtx_insn_vec): Call print_rtl_single
instead.
---
 gcc/fold-const.c | 2 +-
 gcc/print-rtl.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index d4c5a9c299f..2834278fd76 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -7390,7 +7390,7 @@ fold_single_bit_test (location_t loc, enum tree_code code,
   return NULL_TREE;
 }
 
-/* Test whether it is preferable two swap two operands, ARG0 and
+/* Test whether it is preferable to swap two operands, ARG0 and
ARG1, for example because ARG0 is an integer constant and ARG1
isn't.  */
 
diff --git a/gcc/print-rtl.c b/gcc/print-rtl.c
index 2a56823d3c1..c7982bce507 100644
--- a/gcc/print-rtl.c
+++ b/gcc/print-rtl.c
@@ -1237,7 +1237,7 @@ print_rtx_insn_vec (FILE *file, const vec &vec)
   unsigned int len = vec.length ();
   for (unsigned int i = 0; i < len; i++)
 {
-  print_rtl (file, vec[i]);
+  print_rtl_single (file, vec[i]);
   if (i < len - 1)
fputs (", ", file);
 }
-- 
2.25.1



Re: [PATCH] Improve rtx insn vec output

2021-04-06 Thread Richard Biener via Gcc-patches
On Wed, Apr 7, 2021 at 7:42 AM Xionghu Luo  wrote:
>
> print_rtl will dump the rtx_insn from current until LAST.  But it is only
> useful to see the particular insn that called by print_rtx_insn_vec,
> Let's call print_rtl_single to display that insn in the gcse and store-motion
> pass dump.

Can you cite a before/after dump snippet to clarify?

> 2021-04-07  Xionghu Luo  
>
> gcc/ChangeLog:
>
> * fold-const.c (fold_single_bit_test): Fix typo.
> * print-rtl.c (print_rtx_insn_vec): Call print_rtl_single
> instead.
> ---
>  gcc/fold-const.c | 2 +-
>  gcc/print-rtl.c  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> index d4c5a9c299f..2834278fd76 100644
> --- a/gcc/fold-const.c
> +++ b/gcc/fold-const.c
> @@ -7390,7 +7390,7 @@ fold_single_bit_test (location_t loc, enum tree_code code,
>return NULL_TREE;
>  }
>
> -/* Test whether it is preferable two swap two operands, ARG0 and
> +/* Test whether it is preferable to swap two operands, ARG0 and
> ARG1, for example because ARG0 is an integer constant and ARG1
> isn't.  */
>
> diff --git a/gcc/print-rtl.c b/gcc/print-rtl.c
> index 2a56823d3c1..c7982bce507 100644
> --- a/gcc/print-rtl.c
> +++ b/gcc/print-rtl.c
> @@ -1237,7 +1237,7 @@ print_rtx_insn_vec (FILE *file, const vec &vec)
>unsigned int len = vec.length ();
>for (unsigned int i = 0; i < len; i++)
>  {
> -  print_rtl (file, vec[i]);
> +  print_rtl_single (file, vec[i]);
>if (i < len - 1)
> fputs (", ", file);
>  }
> --
> 2.25.1
>