Re: [PATCH] [x86] [RFC] Prevent loop vectorization if it's in a deeply nested big loop.

2024-11-29 Thread Richard Biener
On Fri, Nov 29, 2024 at 2:30 AM Hongtao Liu wrote:
>
> On Thu, Nov 28, 2024 at 4:57 PM Richard Biener wrote:
> >
> > On Thu, Nov 28, 2024 at 3:04 AM Hongtao Liu wrote:
> > >
> > > On Wed, Nov 27, 2024 at 9:43 PM Richard Biener wrote:
> > > >
> > > > > On Wed, Nov 27, 2024 at 4:26 AM liuhongt wrote:
> > > > >
> > > > > > When a loop requires any kind of versioning, which could increase
> > > > > > register pressure too much, and it sits in a deeply nested big
> > > > > > loop, don't vectorize it.
> > > > >
> > > > > > I tested the patch with both -Ofast and -O2 on SPEC2017; apart from
> > > > > > 548.exchange_r, all other benchmarks produce the same binaries.
> > > > >
> > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> > > > > Any comments?
> > > >
> > > > > The vectorizer tries to version an outer loop when vectorizing a
> > > > > loop nest and the versioning condition is invariant.  See
> > > > > vect_loop_versioning.  This tries to handle such cases.  Often the
> > > > > generated runtime alias checks are not invariant because we do not
> > > > > consider the outer evolutions.  I think we should instead fix this
> > > > > there.
> > > >
> > > > Question below ...
> > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > > PR target/117088
> > > > > * config/i386/i386.cc
> > > > > (ix86_vector_costs::ix86_vect_in_deep_nested_loop_p): New 
> > > > > function.
> > > > > (ix86_vector_costs::finish_cost): Prevent loop vectorization
> > > > > if it's in a deeply nested loop and require versioning.
> > > > > * config/i386/i386.opt (--param=vect-max-loop-depth=): New
> > > > > param.
> > > > > ---
> > > > >  gcc/config/i386/i386.cc  | 89 
> > > > > 
> > > > >  gcc/config/i386/i386.opt |  4 ++
> > > > >  2 files changed, 93 insertions(+)
> > > > >
> > > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > > > index 526c9df7618..608f40413d2 100644
> > > > > --- a/gcc/config/i386/i386.cc
> > > > > +++ b/gcc/config/i386/i386.cc
> > > > > @@ -25019,6 +25019,8 @@ private:
> > > > >
> > > > >/* Estimate register pressure of the vectorized code.  */
> > > > >void ix86_vect_estimate_reg_pressure ();
> > > > > +  /* Check if vect_loop is in a deeply-nested loop.  */
> > > > > +  bool ix86_vect_in_deep_nested_loop_p (class loop *vect_loop);
> > > > >/* Number of GENERAL_REGS/SSE_REGS used in the vectorizer, it's 
> > > > > used for
> > > > >   estimation of register pressure.
> > > > >   ??? Currently it's only used by vec_construct/scalar_to_vec
> > > > > @@ -25324,6 +25326,84 @@ 
> > > > > ix86_vector_costs::ix86_vect_estimate_reg_pressure ()
> > > > >  }
> > > > >  }
> > > > >
> > > > > +/* Return true if vect_loop is in a deeply-nested loop.
> > > > > +   .i.e vect_loop_n in below loop structure.
> > > > > +loop1
> > > > > +{
> > > > > + loop2
> > > > > + {
> > > > > +  loop3
> > > > > +  {
> > > > > +   vect_loop_1;
> > > > > +   loop4
> > > > > +   {
> > > > > +vect_loop_2;
> > > > > +loop5
> > > > > +{
> > > > > + vect_loop_3;
> > > > > + loop6
> > > > > + {
> > > > > +  vect_loop_4;
> > > > > +  loop7
> > > > > +  {
> > > > > +   vect_loop_5;
> > > > > +   loop8
> > > > > +   {
> > > > > +   loop9
> > > > > +   }
> > > > > +  vect_loop_6;
> > > > > +  }
> > > > > + vect_loop_7;
> > > > > + }
> > > > > +}
> > > > > +   }
> > > > > + }
> > > > > + It's a big hammer to fix O2 regression for 548.exchange_r after 
> > > > > vectorization
> > > > > + is enhanced by (r15-4225-g70c3db511ba14f)  */
> > > > > +bool
> > > > > +ix86_vector_costs::ix86_vect_in_deep_nested_loop_p (class loop 
> > > > > *vect_loop)
> > > > > +{
> > > > > +  if (loop_depth (vect_loop) > (unsigned) ix86_vect_max_loop_depth)
> > > > > +return true;
> > > > > +
> > > > > +  if (loop_depth (vect_loop) < 2)
> > > > > +return false;
> > > > > +
> > > >
> > > > > While the above two are "obvious", what you check below isn't clear
> > > > > to me.  Is this trying to compute whether 'vect_loop' is inside a
> > > > > loop nest in which some sibling of vect_loop (or, recursively, a
> > > > > sibling of an outer loop of vect_loop) is a sub-nest whose loop
> > > > > depth (relative to what?) exceeds ix86_vect_max_loop_depth?
> > > > Yes, the function tries to find whether vect_loop is in a "big outer
> > > > loop" which contains an innermost loop with loop_depth >
> > > > ix86_vect_max_loop_depth.
> > > > If so, it prevents vectorization of the loop if its trip count is
> > > > not a constant multiple of VF.  ("Requires any kind of versioning" is
> > > > not accurate, and yes, it's a big hammer.)
> >
> > I'll note it also doesn't seem to look at register pressure at all or limit
> > the cut-off to the very-cheap cost model?
> The default parameter ix86_vect_max_loop_depth implies the register
> pressure, for each l
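
For readers following the thread, the check described above can be sketched on a toy loop tree. This is only an illustration under stated assumptions: `toy_loop`, `toy_loop_depth`, `max_depth_in_nest` and `in_deep_nested_loop_p` are hypothetical names, not GCC's `class loop` API, the cut-off is passed as an argument rather than read from the proposed param, and the walk from the outermost enclosing loop is one plausible reading of "big outer loop", not necessarily what the patch does.

```c
#include <assert.h>
#include <stddef.h>

/* Toy loop-tree node (hypothetical, not GCC's class loop).  */
struct toy_loop
{
  struct toy_loop *outer;  /* enclosing loop, NULL for an outermost loop */
  struct toy_loop *inner;  /* first child loop */
  struct toy_loop *next;   /* next sibling at the same nesting level */
};

/* Depth of LOOP, counting an outermost loop as depth 1.  */
unsigned
toy_loop_depth (const struct toy_loop *loop)
{
  unsigned depth = 0;
  for (; loop; loop = loop->outer)
    depth++;
  return depth;
}

/* Deepest nesting level reached in the nest rooted at ROOT, where ROOT
   itself sits at depth ROOT_DEPTH.  */
unsigned
max_depth_in_nest (const struct toy_loop *root, unsigned root_depth)
{
  unsigned max = root_depth;
  for (const struct toy_loop *l = root->inner; l; l = l->next)
    {
      unsigned d = max_depth_in_nest (l, root_depth + 1);
      if (d > max)
        max = d;
    }
  return max;
}

/* Return 1 if VECT_LOOP is itself too deep, or sits in an outermost
   nest that contains a loop deeper than MAX_DEPTH.  */
int
in_deep_nested_loop_p (const struct toy_loop *vect_loop, unsigned max_depth)
{
  assert (vect_loop != NULL);
  if (toy_loop_depth (vect_loop) > max_depth)
    return 1;
  if (toy_loop_depth (vect_loop) < 2)
    return 0;

  /* Assumption: the "big outer loop" is the outermost enclosing loop.  */
  const struct toy_loop *outermost = vect_loop;
  while (outermost->outer)
    outermost = outermost->outer;
  return max_depth_in_nest (outermost, 1) > max_depth;
}
```

With a cut-off of 2, a depth-2 loop whose sibling contains a depth-3 loop is flagged, while the outermost loop itself is not.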

Re: [PATCH] ifcombine: avoid unsound forwarder-enabled combinations [PR117723]

2024-11-29 Thread Richard Biener
On Fri, Nov 29, 2024 at 8:59 AM Alexandre Oliva wrote:
>
>
> When ifcombining contiguous blocks, we can follow forwarder blocks and
> reverse conditions to enable combinations, but when there are
> intervening blocks, we have to constrain ourselves to paths to the
> exit that share the PHI args with all intervening blocks.
>
> Avoiding considering forwarders when intervening blocks were present
> would match the preexisting test, but we can do better, recording in
> case a forwarded path corresponds to the outer block's exit path, and
> insisting on not combining through any other path but the one that was
> verified as corresponding.  The latter is what this patch implements.
>
> While at that, I've fixed some typos, introduced early testing before
> computing the exit path to avoid it when computing it would be
> wasteful, or when avoiding it can enable other sound combinations.
>
> Regstrapped on x86_64-linux-gnu.  Ok to install?

OK.

Thanks,
Richard.
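
To illustrate why the intervening block matters, here is a small sketch modelled on f0 from the new test.  `f0_ref` keeps the original control flow; `f0_merged_unsound` is a hypothetical merge that fuses the two "return 0" conditions ahead of the test on b.  It is meant only as an example of the class of unsound combination, not as the exact code GCC generated before the fix.

```c
#include <assert.h>

/* Reference semantics, same shape as f0 in the test.  */
int
f0_ref (int a, int b)
{
  if (a & 1)
    return 0;
  if (b)
    return 1;
  if (!(a & 2))
    return 0;
  else
    return 1;
}

/* Hypothetical unsound merge: (a & 1) || !(a & 2) is folded into a
   single test that runs before the intervening check of b.  */
int
f0_merged_unsound (int a, int b)
{
  if ((a & 3) != 2)
    return 0;
  if (b)
    return 1;
  return 1;
}

/* The two versions disagree for a == 0, b == 1, the input main() uses.  */
void
f0_show_difference (void)
{
  assert (f0_ref (0, 1) == 1);
  assert (f0_merged_unsound (0, 1) == 0);
}
```

The merged condition is correct only when nothing between the two tests can leave through a different path, which is what the intervening `if (b)` does here.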

>
> for  gcc/ChangeLog
>
> PR tree-optimization/117723
> * tree-ssa-ifcombine.cc (tree_ssa_ifcombine_bb): Record
> forwarder blocks in path to exit, and stick to them.  Avoid
> computing the exit if obviously not needed, and if that
> enables additional optimizations.
> (tree_ssa_ifcombine_bb_1): Fix typos.
>
> for  gcc/testsuite/ChangeLog
>
> PR tree-optimization/117723
> * gcc.dg/torture/ifcmb-1.c: New.
> ---
>  gcc/testsuite/gcc.dg/torture/ifcmb-1.c |   63 +
>  gcc/tree-ssa-ifcombine.cc  |  116 
> +++-
>  2 files changed, 161 insertions(+), 18 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/ifcmb-1.c
>
> diff --git a/gcc/testsuite/gcc.dg/torture/ifcmb-1.c 
> b/gcc/testsuite/gcc.dg/torture/ifcmb-1.c
> new file mode 100644
> index 0..2431a548598fc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/ifcmb-1.c
> @@ -0,0 +1,63 @@
> +/* { dg-do run } */
> +
> +/* Test that we do NOT perform unsound transformations for any of these 
> cases.
> +   Forwarding blocks to the exit block used to enable some of them.  */
> +
> +[[gnu::noinline]]
> +int f0 (int a, int b) {
> +  if ((a & 1))
> +return 0;
> +  if (b)
> +return 1;
> +  if (!(a & 2))
> +return 0;
> +  else
> +return 1;
> +}
> +
> +[[gnu::noinline]]
> +int f1 (int a, int b) {
> +  if (!(a & 1))
> +return 0;
> +  if (b)
> +return 1;
> +  if ((a & 2))
> +return 1;
> +  else
> +return 0;
> +}
> +
> +[[gnu::noinline]]
> +int f2 (int a, int b) {
> +  if ((a & 1))
> +return 0;
> +  if (b)
> +return 1;
> +  if (!(a & 2))
> +return 0;
> +  else
> +return 1;
> +}
> +
> +[[gnu::noinline]]
> +int f3 (int a, int b) {
> +  if (!(a & 1))
> +return 0;
> +  if (b)
> +return 1;
> +  if ((a & 2))
> +return 1;
> +  else
> +return 0;
> +}
> +
> +int main() {
> +  if (f0 (0, 1) != 1)
> +__builtin_abort();
> +  if (f1 (1, 1) != 1)
> +__builtin_abort();
> +  if (f2 (2, 1) != 1)
> +__builtin_abort();
> +  if (f3 (3, 1) != 1)
> +__builtin_abort();
> +}
> diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
> index e389b12aa37db..a87bf1210776f 100644
> --- a/gcc/tree-ssa-ifcombine.cc
> +++ b/gcc/tree-ssa-ifcombine.cc
> @@ -1077,7 +1077,7 @@ tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, 
> basic_block outer_cond_bb,
>  }
>
>/* The || form is characterized by a common then_bb with the
> - two edges leading to it mergable.  The latter is guaranteed
> + two edges leading to it mergeable.  The latter is guaranteed
>   by matching PHI arguments in the then_bb and the inner cond_bb
>   having no side-effects.  */
>if (phi_pred_bb != then_bb
> @@ -1088,7 +1088,7 @@ tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, 
> basic_block outer_cond_bb,
>
>  if (q) goto then_bb; else goto inner_cond_bb;
>
> -if (q) goto then_bb; else goto ...;
> +if (p) goto then_bb; else goto ...;
>
>  ...
> */
> @@ -1104,7 +1104,7 @@ tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, 
> basic_block outer_cond_bb,
>
>  if (q) goto inner_cond_bb; else goto then_bb;
>
> -if (q) goto then_bb; else goto ...;
> +if (p) goto then_bb; else goto ...;
>
>  ...
> */
> @@ -1139,13 +1139,18 @@ tree_ssa_ifcombine_bb (basic_block inner_cond_bb)
>   Look for an OUTER_COND_BBs to combine with INNER_COND_BB.  They need not
>   be contiguous, as long as inner and intervening blocks have no side
>   effects, and are either single-entry-single-exit or conditionals 
> choosing
> - between the same EXIT_BB with the same PHI args, and the path leading to
> - INNER_COND_BB.  ??? We could potentially handle multi-block
> - single-entry-single-exit regions, but the loop below only deals with
> - single-e

[PATCH] middle-end/117801 - failed register coalescing due to GIMPLE schedule

2024-11-29 Thread Richard Biener
For a TSVC testcase we see failed register coalescing due to a
different schedule of GIMPLE .FMA and stores fed by it.  This
can be mitigated by making direct internal functions participate
in TER - given we're using more and more of such functions to
expose target capabilities it seems to be a natural thing to not
exempt those.

Unfortunately the internal function expanding API doesn't match
what we usually have - passing in a target and returning an RTX
but instead the LHS of the call is expanded and written to.  This
makes the TER expansion of a call SSA def a bit unwieldy.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

The ccmp changes have likely not seen any coverage, the debug stmt
changes might not be optimal, and we might end up losing on replaceable
calls.

OK for trunk?  Or shall we simply call this "bad luck"?

Thanks,
Richard.

PR middle-end/117801
* tree-outof-ssa.cc (ssa_is_replaceable_p): Make
direct internal function calls replaceable.
* expr.cc (get_def_for_expr): Handle replacements with calls.
(get_def_for_expr_class): Likewise.
(optimize_bitfield_assignment_op): Likewise.
(expand_expr_real_1): Likewise.  Properly expand direct
internal function defs.
* cfgexpand.cc (expand_call_stmt): Handle replacements with calls.
(avoid_deep_ter_for_debug): Likewise, always create a debug temp
for calls.
(expand_debug_expr): Likewise, give up for calls.
(expand_gimple_basic_block): Likewise.
* ccmp.cc (ccmp_candidate_p): Likewise.
(get_compare_parts): Likewise.
---
 gcc/ccmp.cc   |  4 ++--
 gcc/cfgexpand.cc  | 14 +++---
 gcc/expr.cc   | 19 ++-
 gcc/tree-outof-ssa.cc | 15 ---
 4 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc
index 45629abadbe..4f739dfda50 100644
--- a/gcc/ccmp.cc
+++ b/gcc/ccmp.cc
@@ -100,7 +100,7 @@ ccmp_candidate_p (gimple *g, bool outer = false)
   tree_code tcode;
   basic_block bb;
 
-  if (!g)
+  if (!g || !is_gimple_assign (g))
 return false;
 
   tcode = gimple_assign_rhs_code (g);
@@ -138,7 +138,7 @@ get_compare_parts (tree t, int *up, rtx_code *rcode,
 {
   tree_code code;
   gimple *g = get_gimple_for_ssa_name (t);
-  if (g)
+  if (g && is_gimple_assign (g))
 {
   *up = TYPE_UNSIGNED (TREE_TYPE (gimple_assign_rhs1 (g)));
   code = gimple_assign_rhs_code (g);
diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index 2a984758bc7..fff2462b408 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -2848,6 +2848,7 @@ expand_call_stmt (gcall *stmt)
   if (builtin_p
  && TREE_CODE (arg) == SSA_NAME
  && (def = get_gimple_for_ssa_name (arg))
+ && is_gimple_assign (def)
  && gimple_assign_rhs_code (def) == ADDR_EXPR)
arg = gimple_assign_rhs1 (def);
   CALL_EXPR_ARG (exp, i) = arg;
@@ -4408,7 +4409,7 @@ avoid_deep_ter_for_debug (gimple *stmt, int depth)
   gimple *g = get_gimple_for_ssa_name (use);
   if (g == NULL)
continue;
-  if (depth > 6 && !stmt_ends_bb_p (g))
+  if ((depth > 6 || !is_gimple_assign (g)) && !stmt_ends_bb_p (g))
{
  if (deep_ter_debug_map == NULL)
deep_ter_debug_map = new hash_map;
@@ -5382,7 +5383,13 @@ expand_debug_expr (tree exp)
  t = *slot;
  }
if (t == NULL_TREE)
- t = gimple_assign_rhs_to_tree (g);
+ {
+   if (is_gimple_assign (g))
+ t = gimple_assign_rhs_to_tree (g);
+   else
+ /* expand_debug_expr doesn't handle CALL_EXPR right now.  */
+ return NULL;
+ }
op0 = expand_debug_expr (t);
if (!op0)
  return NULL;
@@ -5958,7 +5965,8 @@ expand_gimple_basic_block (basic_block bb, bool 
disable_tail_calls)
  /* Look for SSA names that have their last use here (TERed
 names always have only one real use).  */
  FOR_EACH_SSA_TREE_OPERAND (op, stmt, iter, SSA_OP_USE)
-   if ((def = get_gimple_for_ssa_name (op)))
+   if ((def = get_gimple_for_ssa_name (op))
+   && is_gimple_assign (def))
  {
imm_use_iterator imm_iter;
use_operand_p use_p;
diff --git a/gcc/expr.cc b/gcc/expr.cc
index cf87167ec0c..bb524f904fc 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "rtx-vector-builder.h"
 #include "tree-pretty-print.h"
 #include "flags.h"
+#include "internal-fn.h"
 
 
 /* If this is nonzero, we do not bother generating VOLATILE
@@ -3827,6 +3828,7 @@ get_def_for_expr (tree name, enum tree_code code)
 
   def_stmt = get_gimple_for_ssa_name (name);
   if (!def_stmt
+  || !is_gimple_assign (def_stmt)
   || gimple_assign_rhs_code (def_stmt) != code)
 return NULL;
 
@@ -3847,6 +3849,7 @@ g

[PATCH] ext-dce: Fix SIGN_EXTEND handling and cleanups [PR117360]

2024-11-29 Thread Jakub Jelinek
Hi!

This is mostly a blind attempt to fix the PR + various cleanups.
The PR is about a shift of a HOST_WIDE_INT by 127 invoking UB.

Most of carry_backpropagate works on GET_MODE_INNER of the operand,
mode is assigned
  enum machine_mode mode = GET_MODE_INNER (GET_MODE (x));
at the beginning and everything is done using that mode, so for
vector modes (or complex even?) we work with the element modes
rather than vector/complex modes.
But the SIGN_EXTEND handling does that inconsistently, it looks
at mode of the operand and uses GET_MODE_INNER in GET_MODE_MASK,
but doesn't use it in the shift.
Apart from the cleanups, the following patch fixes it by essentially
doing:
   mode = GET_MODE (XEXP (x, 0));
   if (mask & ~GET_MODE_MASK (GET_MODE_INNER (mode)))
-   mask |= 1ULL << (GET_MODE_BITSIZE (mode).to_constant () - 1);
+   mask |= 1ULL << (GET_MODE_BITSIZE (GET_MODE_INNER (mode)).to_constant () - 1);
i.e. also shifting by GET_MODE_BITSIZE of the GET_MODE_INNER of the
operand's mode.  We don't need to check if it is at most 64 bits,
at the start of the function we've already verified the result mode
is at most 64 bits and SIGN_EXTEND by definition extends from a narrower
mode.
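
As a rough illustration of the fix, consider a toy model in which a mode is just a total width and an element width.  `toy_mode` and `sign_bit_mask` are hypothetical names, not GCC's machine-mode API.  For a 128-bit vector of 32-bit elements, shifting by the full width minus one would be a shift by 127 in a 64-bit type, which is undefined in C; shifting by the element width minus one is not.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a machine mode.  */
struct toy_mode
{
  unsigned bitsize;        /* width of the whole mode, e.g. 128 for V4SI */
  unsigned inner_bitsize;  /* width of one element, e.g. 32 for V4SI */
};

/* Mask with only the sign bit of the operand's element mode set.
   The shift count is derived from the element width, so it stays
   below 64 even when the whole mode is wider.  */
uint64_t
sign_bit_mask (struct toy_mode operand_mode)
{
  assert (operand_mode.inner_bitsize >= 1
          && operand_mode.inner_bitsize <= 64);
  return UINT64_C (1) << (operand_mode.inner_bitsize - 1);
}
```

For scalar modes the two widths coincide, so the result is unchanged there; only vector (or complex) operands see a different shift count.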

The rest of the patch are cleanups.  For HOST_WIDE_INT we have the
HOST_WIDE_INT_{UC,1U} macros, a HWI isn't necessarily unsigned long long,
so using ULL suffixes for it is weird.

More importantly, the function does
  scalar_int_mode smode;
  if (!is_a <scalar_int_mode> (mode, &smode)
  || GET_MODE_BITSIZE (smode) > HOST_BITS_PER_WIDE_INT)
return mmask;
early, so we don't need to use GET_MODE_BITSIZE (mode) which is
a poly_int but can use GET_MODE_BITSIZE (smode) with the same value
but in unsigned short, so we don't need to use known_lt or .to_constant ()
everywhere.

Plus some formatting issues.

What I've left around is
  if (!GET_MODE_BITSIZE (GET_MODE (x)).is_constant ()
  || !GET_MODE_BITSIZE (GET_MODE (XEXP (x, 0))).is_constant ())
return -1;
at the start of SIGN_EXTEND or ZERO_EXTEND.  I'm afraid I don't know enough
about aarch64/riscv VL vectors to know why this is done (though even that
return -1; is weird, the rest of the code does return mmask; if it wants to
punt).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-11-29  Jakub Jelinek  

PR rtl-optimization/117360
* ext-dce.cc (ext_dce_process_sets): Use HOST_WIDE_INT_UC
macro instead of ULL suffixed constants.
(carry_backpropagate): Likewise.  Use HOST_WIDE_INT_1U instead of
1ULL.  Use GET_MODE_BITSIZE (smode) instead of
GET_MODE_BITSIZE (mode) and with that avoid having to use
known_lt instead of < or use .to_constant ().  Formatting fixes.
(case SIGN_EXTEND): Set mode to GET_MODE_INNER (GET_MODE (XEXP (x, 0)))
rather than GET_MODE (XEXP (x, 0)) and don't use GET_MODE_INNER (mode).
(ext_dce_process_uses): Use HOST_WIDE_INT_UC macro instead of ULL
suffixed constants.

--- gcc/ext-dce.cc.jj   2024-11-18 09:05:00.264282397 +0100
+++ gcc/ext-dce.cc  2024-11-28 16:23:28.450616681 +0100
@@ -357,8 +357,8 @@ ext_dce_process_sets (rtx_insn *insn, rt
 Note that BIT need not be a power of two, consider a
 ZERO_EXTRACT destination.  */
  int start = (bit < 8 ? 0 : bit < 16 ? 1 : bit < 32 ? 2 : 3);
- int end = ((mask & ~0xULL) ? 4
-: (mask & 0xULL) ? 3
+ int end = ((mask & ~HOST_WIDE_INT_UC (0x)) ? 4
+: (mask & HOST_WIDE_INT_UC (0x)) ? 3
 : (mask & 0xff00) ? 2 : 1);
  bitmap_clear_range (livenow, 4 * rn + start, end - start);
}
@@ -509,21 +509,21 @@ carry_backpropagate (unsigned HOST_WIDE_
 case PLUS:
 case MINUS:
 case MULT:
-  return (2ULL << floor_log2 (mask)) - 1;
+  return (HOST_WIDE_INT_UC (2) << floor_log2 (mask)) - 1;
 
 /* We propagate for the shifted operand, but not the shift
count.  The count is handled specially.  */
 case ASHIFT:
   if (CONST_INT_P (XEXP (x, 1))
- && known_lt (UINTVAL (XEXP (x, 1)), GET_MODE_BITSIZE (mode)))
-   return (HOST_WIDE_INT)mask >> INTVAL (XEXP (x, 1));
-  return (2ULL << floor_log2 (mask)) - 1;
+ && UINTVAL (XEXP (x, 1)) < GET_MODE_BITSIZE (smode))
+   return (HOST_WIDE_INT) mask >> INTVAL (XEXP (x, 1));
+  return (HOST_WIDE_INT_UC (2) << floor_log2 (mask)) - 1;
 
 /* We propagate for the shifted operand, but not the shift
count.  The count is handled specially.  */
 case LSHIFTRT:
   if (CONST_INT_P (XEXP (x, 1))
- && known_lt (UINTVAL (XEXP (x, 1)), GET_MODE_BITSIZE (mode)))
+ && UINTVAL (XEXP (x, 1)) < GET_MODE_BITSIZE (smode))
return mmask & (mask << INTVAL (XEXP (x, 1)));
   return mmask;
 
@@ -531,12 +531,12 @@ carry_backpropagate (unsigned HOST_WIDE_
count.  The count is handled

Re: [PATCH] Fortran: fix crash with bounds check writing array section [PR117791]

2024-11-29 Thread Andreas Schwab
../../gcc/fortran/trans-io.cc: In function 'tree_node* 
gfc_trans_transfer(gfc_code*)':
../../gcc/fortran/trans-io.cc:2662:24: error: enumeration value 'EXPR_UNKNOWN' 
not handled in switch [-Werror=switch]
 2662 | switch (ref->u.ar.start[n]->expr_type)
  |^
../../gcc/fortran/trans-io.cc:2662:24: error: enumeration value 'EXPR_CONSTANT' 
not handled in switch [-Werror=switch]
../../gcc/fortran/trans-io.cc:2662:24: error: enumeration value 'EXPR_VARIABLE' 
not handled in switch [-Werror=switch]
../../gcc/fortran/trans-io.cc:2662:24: error: enumeration value 
'EXPR_SUBSTRING' not handled in switch [-Werror=switch]
../../gcc/fortran/trans-io.cc:2662:24: error: enumeration value 
'EXPR_STRUCTURE' not handled in switch [-Werror=switch]
../../gcc/fortran/trans-io.cc:2662:24: error: enumeration value 'EXPR_ARRAY' 
not handled in switch [-Werror=switch]
../../gcc/fortran/trans-io.cc:2662:24: error: enumeration value 'EXPR_NULL' not 
handled in switch [-Werror=switch]
../../gcc/fortran/trans-io.cc:2662:24: error: enumeration value 'EXPR_COMPCALL' 
not handled in switch [-Werror=switch]
../../gcc/fortran/trans-io.cc:2662:24: error: enumeration value 'EXPR_PPC' not 
handled in switch [-Werror=switch]
cc1plus: all warnings being treated as errors
make[3]: *** [Makefile:1203: fortran/trans-io.o] Error 1

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PATCH] Fortran: fix crash with bounds check writing array section [PR117791]

2024-11-29 Thread Tobias Burnus

Hi Harald, hi Paul,

Harald Anlauf wrote:

Pushed as r15-5766.


This caused a build fail; see also: https://gcc.gnu.org/PR117843

It looks as if a 'default: break;' is missing.
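
For illustration, a minimal sketch of the pattern: a switch over an enum that names only some enumerators triggers -Wswitch unless it has a default label.  The enum and function below (`toy_expr_type`, `toy_needs_evaluation`) are hypothetical and much smaller than the real gfortran expression-type enum; which cases the real switch handles is an assumption.

```c
#include <assert.h>

/* Hypothetical, shortened stand-in for an expression-type enum.  */
enum toy_expr_type
{
  TOY_EXPR_UNKNOWN,
  TOY_EXPR_OP,
  TOY_EXPR_FUNCTION,
  TOY_EXPR_CONSTANT,
  TOY_EXPR_VARIABLE
};

/* Return 1 for the expression kinds the switch cares about.  Without
   the default label, -Wswitch reports every enumerator that has no
   case, and -Werror turns that into a build failure.  */
int
toy_needs_evaluation (enum toy_expr_type type)
{
  int result = 0;

  assert ((int) type <= (int) TOY_EXPR_VARIABLE);
  switch (type)
    {
    case TOY_EXPR_OP:
    case TOY_EXPR_FUNCTION:
      result = 1;
      break;
    default:
      break;
    }
  return result;
}
```

Note that a default label silences -Wswitch but not -Wswitch-enum, which still asks for every enumerator to be listed.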

…/gcc/fortran/trans-io.cc: In function 'tree_node* 
gfc_trans_transfer(gfc_code*)':
…/gcc/fortran/trans-io.cc:2662:24: error: enumeration value 
'EXPR_UNKNOWN' not handled in switch [-Werror=switch]

 2662 | switch (ref->u.ar.start[n]->expr_type)
  |^

Tobias


[PATCH] gimple-fold: Fix up type_has_padding_at_level_p [PR117065]

2024-11-29 Thread Jakub Jelinek
Hi!

The following testcase used to ICE on the trunk since the "clear small
object if it has padding" optimization, before my r15-5746 change;
now it doesn't, just because type_has_padding_at_level_p isn't called
on the testcase.

Though, as the testcase shows, structures/unions in which one or more
members have erroneous types can have error_mark_node as the TREE_TYPE
of the FIELD_DECL, on which we can crash.

E.g. the __builtin_clear_padding lowering just ignores those:
if (TREE_TYPE (field) == error_mark_node)
  continue;
and
if (ftype == error_mark_node)
  continue;
It doesn't matter much what exactly we do for those cases, as we are going
to fail the compilation anyway, but we shouldn't crash.

So, the following patch ignores those in type_has_padding_at_level_p.
For RECORD_TYPE, we already return if !DECL_SIZE (f) which I think should
cover already the erroneous fields (and we don't use TYPE_SIZE on those).
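
The idea can be sketched on toy data structures.  Everything below (`toy_type`, `toy_field`, `toy_error_mark`, `toy_union_has_padding_p`) is hypothetical and far simpler than the real tree representation; it only shows the "skip the erroneous field before looking at its size" shape of the fix for the union case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for tree types and FIELD_DECL chains.  */
struct toy_type
{
  unsigned size_bits;
};

/* Sentinel playing the role of error_mark_node; its size is never read.  */
struct toy_type toy_error_mark;

struct toy_field
{
  const struct toy_type *type;
  struct toy_field *chain;
};

/* Return 1 if any valid field is smaller than the union itself, i.e.
   the union has padding at this level.  Fields whose type is the error
   sentinel are skipped instead of being inspected.  */
int
toy_union_has_padding_p (const struct toy_field *fields,
                         unsigned union_size_bits)
{
  assert (union_size_bits > 0);
  for (const struct toy_field *f = fields; f; f = f->chain)
    {
      if (f->type == &toy_error_mark)
        continue;
      if (f->type->size_bits != union_size_bits)
        return 1;
    }
  return 0;
}
```

Since compilation fails anyway once such a field exists, the exact answer for the skipped field does not matter; the point is only that nothing dereferences the error sentinel.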

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-11-29  Jakub Jelinek  

PR middle-end/117065
* gimple-fold.cc (type_has_padding_at_level_p) :
Also continue if f has error_mark_node type.

* gcc.dg/pr117065.c: New test.

--- gcc/gimple-fold.cc.jj   2024-11-28 11:38:08.545042716 +0100
+++ gcc/gimple-fold.cc  2024-11-28 18:11:02.613232891 +0100
@@ -4863,7 +4863,7 @@ type_has_padding_at_level_p (tree type)
   any_fields = false;
   /* If any of the fields is smaller than the whole, there is padding.  */
   for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
-   if (TREE_CODE (f) != FIELD_DECL)
+   if (TREE_CODE (f) != FIELD_DECL || TREE_TYPE (f) == error_mark_node)
  continue;
else if (simple_cst_equal (TYPE_SIZE (TREE_TYPE (f)),
   TYPE_SIZE (type)) != 1)
--- gcc/testsuite/gcc.dg/pr117065.c.jj  2024-11-28 18:14:33.526291760 +0100
+++ gcc/testsuite/gcc.dg/pr117065.c 2024-11-28 18:15:15.515706162 +0100
@@ -0,0 +1,12 @@
+/* PR middle-end/117065 */
+/* { dg-do compile } */
+/* { dg-options "-std=gnu23" } */
+
+union U { struct A a; unsigned long long b; }; /* { dg-error "field 'a' has incomplete type" } */
+
+union U
+foo (void)
+{
+  union U u = { .b = 1 };
+  return u;
+}

Jakub



Re: [PATCH 1/2]middle-end: refactor type to be explicit in operand_equal_p [PR114932]

2024-11-29 Thread Richard Biener
On Tue, Aug 20, 2024 at 3:07 PM Tamar Christina wrote:
>
> Hi All,
>
> This is a refactoring with no expected behavioral change.
> The goal with this is to make the type of the expressions being used explicit.
>
> I did not change all the recursive calls to operand_equal_p () to recurse
> directly to the new function but instead this goes through the top level call
> which re-extracts the types.
>
> This was done because in most of the cases where we recurse type == arg.
> The second patch makes use of this new flexibility to implement an overload
> of operand_equal_p which checks for equality under two's complement.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> -m32, -m64 and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> PR tree-optimization/114932
> * fold-const.cc (operand_compare::operand_equal_p): Split into one 
> that
> takes explicit type parameters and use that in public one.
> * fold-const.h (class operand_compare): Add operand_equal_p private
> overload.
>
> ---
> diff --git a/gcc/fold-const.h b/gcc/fold-const.h
> index 
> b82ef137e2f2096f86c20df3c7749747e604177e..878545b1148b839e8a8e866f38e31161f0d116c8
>  100644
> --- a/gcc/fold-const.h
> +++ b/gcc/fold-const.h
> @@ -273,6 +273,12 @@ protected:
>   true is returned.  Then RET is set to corresponding comparsion result.  
> */
>bool verify_hash_value (const_tree arg0, const_tree arg1, unsigned int 
> flags,
>   bool *ret);
> +
> +private:
> +  /* Return true if two operands are equal.  The flags fields can be used
> + to specify OEP flags described in tree-core.h.  */
> +  bool operand_equal_p (tree, const_tree, tree, const_tree,
> +   unsigned int flags);
>  };
>
>  #endif // GCC_FOLD_CONST_H
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 
> 8908e7381e72cbbf4a8fd96f18cbf4436aba8441..71e82b1d76d4106c7c23c54af8b35905a1af9f1c
>  100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -3156,6 +3156,17 @@ combine_comparisons (location_t loc,
>  bool
>  operand_compare::operand_equal_p (const_tree arg0, const_tree arg1,
>   unsigned int flags)
> +{
> +  return operand_equal_p (TREE_TYPE (arg0), arg0, TREE_TYPE (arg1), arg1, 
> flags);
> +}
> +
> +/* The same as operand_equal_p however the type of ARG0 and ARG1 are assumed 
> to be
> +   the TYPE0 and TYPE1 respectively.  */
> +
> +bool
> +operand_compare::operand_equal_p (tree type0, const_tree arg0,
> + tree type1, const_tree arg1,

Did you try using const_tree for type0/type1?

> + unsigned int flags)
>  {
>bool r;
>if (verify_hash_value (arg0, arg1, flags, &r))
> @@ -3166,25 +3177,25 @@ operand_compare::operand_equal_p (const_tree arg0, 
> const_tree arg1,
>
>/* If either is ERROR_MARK, they aren't equal.  */
>if (TREE_CODE (arg0) == ERROR_MARK || TREE_CODE (arg1) == ERROR_MARK
> -  || TREE_TYPE (arg0) == error_mark_node
> -  || TREE_TYPE (arg1) == error_mark_node)
> +  || type0 == error_mark_node
> +  || type1 == error_mark_node)
>  return false;
>
>/* Similar, if either does not have a type (like a template id),
>   they aren't equal.  */
> -  if (!TREE_TYPE (arg0) || !TREE_TYPE (arg1))
> +  if (!type0 || !type1)
>  return false;
>
>/* Bitwise identity makes no sense if the values have different layouts.  
> */
>if ((flags & OEP_BITWISE)
> -  && !tree_nop_conversion_p (TREE_TYPE (arg0), TREE_TYPE (arg1)))
> +  && !tree_nop_conversion_p (type0, type1))
>  return false;
>
>/* We cannot consider pointers to different address space equal.  */
> -  if (POINTER_TYPE_P (TREE_TYPE (arg0))
> -  && POINTER_TYPE_P (TREE_TYPE (arg1))
> -  && (TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (arg0)))
> - != TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (arg1)
> +  if (POINTER_TYPE_P (type0)
> +  && POINTER_TYPE_P (type1)
> +  && (TYPE_ADDR_SPACE (TREE_TYPE (type0))
> + != TYPE_ADDR_SPACE (TREE_TYPE (type1
>  return false;
>
>/* Check equality of integer constants before bailing out due to
> @@ -3211,12 +3222,15 @@ operand_compare::operand_equal_p (const_tree arg0, 
> const_tree arg1,
>
>/* If both types don't have the same precision, then it is not safe
>  to strip NOPs.  */
> -  if (element_precision (TREE_TYPE (arg0))
> - != element_precision (TREE_TYPE (arg1)))
> +  if (element_precision (type0)
> + != element_precision (type1))
> return false;
>
>STRIP_NOPS (arg0);
>STRIP_NOPS (arg1);
> +
> +  type0 = TREE_TYPE (arg0);
> +  type1 = TREE_TYPE (arg1);
>  }
>  #if 0
>/* FIXME: Fortran FE currently produce ADDR_EXPR of NOP_EXPR. Enable the
> @@ -3275,9 +3289,9 @@ operand_compare::operand_equal_p (const_tree arg0, 
> const_tree arg1,
>
>

Re: [PATCH] gimple-fold: Fix up type_has_padding_at_level_p [PR117065]

2024-11-29 Thread Richard Biener
On Fri, 29 Nov 2024, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase used to ICE on the trunk since the "clear small
> object if it has padding" optimization, before my r15-5746 change;
> now it doesn't, just because type_has_padding_at_level_p isn't called
> on the testcase.
> 
> Though, as the testcase shows, structures/unions in which one or more
> members have erroneous types can have error_mark_node as the TREE_TYPE
> of the FIELD_DECL, on which we can crash.
> 
> E.g. the __builtin_clear_padding lowering just ignores those:
> if (TREE_TYPE (field) == error_mark_node)
>   continue;
> and
> if (ftype == error_mark_node)
>   continue;
> It doesn't matter much what exactly we do for those cases, as we are going
> to fail the compilation anyway, but we shouldn't crash.
> 
> So, the following patch ignores those in type_has_padding_at_level_p.
> For RECORD_TYPE, we already return if !DECL_SIZE (f) which I think should
> cover already the erroneous fields (and we don't use TYPE_SIZE on those).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2024-11-29  Jakub Jelinek  
> 
>   PR middle-end/117065
>   * gimple-fold.cc (type_has_padding_at_level_p) :
>   Also continue if f has error_mark_node type.
> 
>   * gcc.dg/pr117065.c: New test.
> 
> --- gcc/gimple-fold.cc.jj 2024-11-28 11:38:08.545042716 +0100
> +++ gcc/gimple-fold.cc2024-11-28 18:11:02.613232891 +0100
> @@ -4863,7 +4863,7 @@ type_has_padding_at_level_p (tree type)
>any_fields = false;
>/* If any of the fields is smaller than the whole, there is padding.  
> */
>for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
> - if (TREE_CODE (f) != FIELD_DECL)
> + if (TREE_CODE (f) != FIELD_DECL || TREE_TYPE (f) == error_mark_node)
> continue;
>   else if (simple_cst_equal (TYPE_SIZE (TREE_TYPE (f)),
>  TYPE_SIZE (type)) != 1)
> --- gcc/testsuite/gcc.dg/pr117065.c.jj2024-11-28 18:14:33.526291760 
> +0100
> +++ gcc/testsuite/gcc.dg/pr117065.c   2024-11-28 18:15:15.515706162 +0100
> @@ -0,0 +1,12 @@
> +/* PR middle-end/117065 */
> +/* { dg-do compile } */
> +/* { dg-options "-std=gnu23" } */
> +
> +union U { struct A a; unsigned long long b; };   /* { dg-error "field 
> 'a' has incomplete type" } */
> +
> +union U
> +foo (void)
> +{
> +  union U u = { .b = 1 };
> +  return u;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] middle-end/117801 - failed register coalescing due to GIMPLE schedule

2024-11-29 Thread Jakub Jelinek
On Fri, Nov 29, 2024 at 09:19:55AM +0100, Richard Biener wrote:
> For a TSVC testcase we see failed register coalescing due to a
> different schedule of GIMPLE .FMA and stores fed by it.  This
> can be mitigated by making direct internal functions participate
> in TER - given we're using more and more of such functions to
> expose target capabilities it seems to be a natural thing to not
> exempt those.
> 
> Unfortunately the internal function expanding API doesn't match
> what we usually have - passing in a target and returning an RTX
> but instead the LHS of the call is expanded and written to.  This
> makes the TER expansion of a call SSA def a bit unwieldy.

Can't we change that?
Especially if it is only for the easiest subset of internal fns
(I see you limit it only to direct_internal_fn_p), if it has just
one or a couple of easy implementations, those could be split into
one which handles the whole thing by just expanding lhs and calling
another function with the rtx target argument into which to store
stuff (or const0_rtx for ignored result?) and handle the actual expansion,
and then have an exported function from internal-fn.cc which expr.cc
could call for the TERed internal-fn case.
That function could assert it is only direct_internal_fn_p or some
other subset which it would handle.
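
A very rough sketch of the split suggested above, using toy types: one core routine takes an optional target and returns where the result lives, and the statement-level entry point merely supplies the LHS and delegates.  `toy_rtx`, `toy_expand_fma_to_target` and `toy_expand_fma_stmt` are hypothetical names; none of this is the actual internal-fn.cc interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an RTX destination.  */
struct toy_rtx
{
  double value;
};

/* Core expander: compute a * b + c into TARGET if one is given,
   otherwise into a scratch slot, and return where the result lives.
   A TER-style caller can use the returned location directly.  */
struct toy_rtx *
toy_expand_fma_to_target (struct toy_rtx *target,
                          double a, double b, double c)
{
  static struct toy_rtx scratch;
  struct toy_rtx *dest = target ? target : &scratch;

  dest->value = a * b + c;
  return dest;
}

/* Statement-level wrapper: owns the LHS and delegates the real work,
   mirroring the existing "expand the LHS and write to it" shape.  */
void
toy_expand_fma_stmt (struct toy_rtx *lhs, double a, double b, double c)
{
  assert (lhs != NULL);
  toy_expand_fma_to_target (lhs, a, b, c);
}
```

The design point is that both callers share one implementation, so the expression expander does not need to fake an LHS just to reuse the existing code.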

Jakub



Re: [PATCH v3 0/4] Hard Register Constraints

2024-11-29 Thread Stefan Schulze Frielinghaus
Ping.

On Fri, Oct 25, 2024 at 11:57:16AM +0200, Stefan Schulze Frielinghaus wrote:
> This is a follow-up to
> https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663238.html
> 
> The primary changes are about error handling and documentation updates.
> Now, we error out whenever a hard register constraint is used more than
> once across an alternative for outputs or inputs.  For example, the
> following is allowed for register asm
> 
>   register int y __asm__ ("0") = x;
>   __asm__ ("" : "=r" (y) : "0" (y), "r" (y));
> 
> and the analogue for hard register constraints
> 
>   int y = x;
>   __asm__ ("" : "={0}" (y) : "0" (y), "{0}" (y));  // invalid
> 
> is rejected.
> 
> Furthermore, for hard register constraints we fail if an output object
> is used more than once as e.g.
> 
>   int x;
>   asm ("" : "=r" (x), "={1}" (x));  // rejected
> 
> although
> 
>   int x;
>   asm ("" : "=r" (x), "=r" (x));
> 
> is accepted.
> 
> Thus, in total the changes make hard register constraints more strict in
> order to prevent subtle bugs.
> 
> Stefan Schulze Frielinghaus (4):
>   Hard register constraints
>   Error handling for hard register constraints
>   genoutput: Verify hard register constraints
>   Rewrite register asm into hard register constraints
> 
>  gcc/cfgexpand.cc  |  42 ---
>  gcc/common.opt|   4 +
>  gcc/config/cris/cris.cc   |   6 +-
>  gcc/config/i386/i386.cc   |   6 +
>  gcc/config/s390/s390.cc   |   6 +-
>  gcc/doc/extend.texi   | 178 +++
>  gcc/doc/md.texi   |   6 +
>  gcc/function.cc   | 116 
>  gcc/genoutput.cc  |  60 
>  gcc/genpreds.cc   |   4 +-
>  gcc/gimplify.cc   | 236 ++-
>  gcc/gimplify_reg_info.h   | 169 +++
>  gcc/ira.cc|  79 -
>  gcc/lra-constraints.cc|  13 +
>  gcc/output.h  |   2 +
>  gcc/recog.cc  |  11 +-
>  gcc/stmt.cc   | 278 +-
>  gcc/stmt.h|   9 +-
>  gcc/testsuite/gcc.dg/asm-hard-reg-1.c |  85 ++
>  gcc/testsuite/gcc.dg/asm-hard-reg-2.c |  33 +++
>  gcc/testsuite/gcc.dg/asm-hard-reg-3.c |  25 ++
>  gcc/testsuite/gcc.dg/asm-hard-reg-4.c |  50 
>  gcc/testsuite/gcc.dg/asm-hard-reg-5.c |  36 +++
>  gcc/testsuite/gcc.dg/asm-hard-reg-6.c |  60 
>  gcc/testsuite/gcc.dg/asm-hard-reg-7.c |  41 +++
>  gcc/testsuite/gcc.dg/asm-hard-reg-8.c |  49 +++
>  .../gcc.dg/asm-hard-reg-demotion-1.c  |  19 ++
>  .../gcc.dg/asm-hard-reg-demotion-2.c  |  19 ++
>  .../gcc.dg/asm-hard-reg-demotion-error-1.c|  29 ++
>  gcc/testsuite/gcc.dg/asm-hard-reg-demotion.h  |  52 
>  gcc/testsuite/gcc.dg/asm-hard-reg-error-1.c   |  83 ++
>  gcc/testsuite/gcc.dg/asm-hard-reg-error-2.c   |  26 ++
>  gcc/testsuite/gcc.dg/asm-hard-reg-error-3.c   |  27 ++
>  gcc/testsuite/gcc.dg/asm-hard-reg-error-4.c   |  24 ++
>  gcc/testsuite/gcc.dg/asm-hard-reg-error-5.c   |  13 +
>  gcc/testsuite/gcc.dg/pr87600-2.c  |  30 +-
>  gcc/testsuite/gcc.dg/pr87600-3.c  |  35 +++
>  .../gcc.target/s390/asm-hard-reg-1.c  | 103 +++
>  .../gcc.target/s390/asm-hard-reg-2.c  |  43 +++
>  .../gcc.target/s390/asm-hard-reg-3.c  |  42 +++
>  .../gcc.target/s390/asm-hard-reg-4.c  |   6 +
>  .../gcc.target/s390/asm-hard-reg-5.c  |   6 +
>  .../gcc.target/s390/asm-hard-reg-longdouble.h |  18 ++
>  gcc/testsuite/lib/scanasm.exp |   4 +
>  gcc/toplev.cc |   4 +
>  45 files changed, 2087 insertions(+), 100 deletions(-)
>  create mode 100644 gcc/gimplify_reg_info.h
>  create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-4.c
>  create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-5.c
>  create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-6.c
>  create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-7.c
>  create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-8.c
>  create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-demotion-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-demotion-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-demotion-error-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-demotion.h
>  create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-error-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-error-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/asm-h

[PUSHED] fortran: Add default to switch in gfc_trans_transfer [PR117843]

2024-11-29 Thread Andrew Pinski
This fixes a bootstrap failure due to a warning on enum values not being
handled. In this case, the switch only checks two values and the rest
are not handled, so adding a default case fixes the issue.

Pushed as obvious.

PR fortran/117843
gcc/fortran/ChangeLog:

* trans-io.cc (gfc_trans_transfer): Add default case.

Signed-off-by: Andrew Pinski 
---
 gcc/fortran/trans-io.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/fortran/trans-io.cc b/gcc/fortran/trans-io.cc
index 906dd7c6eb6..9b0b8cfdff9 100644
--- a/gcc/fortran/trans-io.cc
+++ b/gcc/fortran/trans-io.cc
@@ -2664,6 +2664,8 @@ gfc_trans_transfer (gfc_code * code)
  case EXPR_FUNCTION:
  case EXPR_OP:
goto scalarize;
+ default:
+   break;
  }
  }
}
-- 
2.43.0
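
For readers unfamiliar with the warning behind this fix, the pattern can be sketched in C. This is a minimal illustration only: the enum `expr_kind` and the function `needs_scalarization` are hypothetical names, not the gfortran code. With `-Wswitch` (enabled by `-Wall`), a switch over an enum that lists only some enumerators and has no `default` label draws a warning, which becomes an error when the build uses `-Werror`. An explicit `default` states that the remaining values are intentionally ignored.

```c
#include <assert.h>

/* Hypothetical stand-in for an expression-kind enum; not gfortran's.  */
enum expr_kind { KIND_CONSTANT, KIND_VARIABLE, KIND_FUNCTION, KIND_OP };

/* Return 1 for the two kinds that get special handling, 0 otherwise.  */
int
needs_scalarization (enum expr_kind kind)
{
  assert (kind <= KIND_OP);
  switch (kind)
    {
    case KIND_FUNCTION:
    case KIND_OP:
      return 1;
    default:
      /* The remaining kinds are deliberately not handled here.  Without
         this label, -Wswitch reports every enumerator that is missing.  */
      break;
    }
  return 0;
}
```

Removing the `default` label and compiling with `-Wall` should reproduce the kind of diagnostic described above.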



Re: [PATCH 2/2]middle-end: use two's complement equality when comparing IVs during candidate selection [PR114932]

2024-11-29 Thread Richard Biener
On Tue, Aug 20, 2024 at 3:08 PM Tamar Christina  wrote:
>
> Hi All,
>
> IVOPTS normally uses affine trees to perform comparisons between different IVs,
> but these seem to have been missing in two key spots and instead normal tree
> equivalences were used.
>
> In some cases where we have a two's complement equivalence but not a strict
> signedness equivalence we end up generating both a signed and unsigned IV for
> the same candidate.
>
> This patch implements a new OEP flag called OEP_STRUCTURAL_EQ.  This flag will
> check if the operands would produce the same bit values after the computations
> even if the final sign is different.

I think the name is badly chosen - we already have OEP_LEXICOGRAPHIC and
OEP_BITWISE.  I would suggest to use OEP_ASSUME_TWOS_COMPLEMENT
or OEP_ASSUME_WRAPV.
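
To make the equivalence under discussion concrete, here is a minimal C sketch. The function names are made up for illustration and do not come from the patch. It mirrors the two invariant expressions quoted below, `stride * 4` and `(unsigned long) stride * 4`: as long as the signed multiplication does not overflow, converting before or after the multiplication yields the same bit pattern.

```c
#include <assert.h>
#include <limits.h>

/* Multiply in the signed type, then convert the result to unsigned.  */
unsigned long
scale_then_convert (long stride)
{
  /* Signed overflow is undefined behavior in C, so restrict the input
     to values whose product with 4 is representable.  */
  assert (stride >= LONG_MIN / 4 && stride <= LONG_MAX / 4);
  return (unsigned long) (stride * 4);
}

/* Convert to unsigned first, then multiply with wrapping arithmetic.  */
unsigned long
convert_then_scale (long stride)
{
  return (unsigned long) stride * 4;
}
```

For any in-range `stride`, including negative values, both functions return the same value, which is why keeping two separate IVs for them is redundant.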

> This happens quite a lot with Fortran but can also happen in C because this same
> code is unable to figure out when one expression is a multiple of another.
>
> As an example in the attached testcase we get:
>
> Initial set of candidates:
>   cost: 24 (complexity 3)
>   reg_cost: 9
>   cand_cost: 15
>   cand_group_cost: 0 (complexity 3)
>   candidates: 1, 6, 8
>group:0 --> iv_cand:6, cost=(0,1)
>group:1 --> iv_cand:1, cost=(0,0)
>group:2 --> iv_cand:8, cost=(0,1)
>group:3 --> iv_cand:8, cost=(0,1)
>   invariant variables: 6
>   invariant expressions: 1, 2
>
> :
> inv_expr 1: stride.3_27 * 4
> inv_expr 2: (unsigned long) stride.3_27 * 4
>
> These end up being used in the same group:
>
> Group 1:
> cand  costcompl.  inv.expr.   inv.vars
> 1 0   0   NIL;6
> 2 0   0   NIL;6
> 3 0   0   NIL;6
>
> which ends up with IV opts picking the signed and unsigned IVs:
>
> Improved to:
>   cost: 24 (complexity 3)
>   reg_cost: 9
>   cand_cost: 15
>   cand_group_cost: 0 (complexity 3)
>   candidates: 1, 6, 8
>group:0 --> iv_cand:6, cost=(0,1)
>group:1 --> iv_cand:1, cost=(0,0)
>group:2 --> iv_cand:8, cost=(0,1)
>group:3 --> iv_cand:8, cost=(0,1)
>   invariant variables: 6
>   invariant expressions: 1, 2
>
> and so generates the same IV as both signed and unsigned:
>
> ;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq 
> 58.2545), maybe hot
> ;;prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
> ;;pred:   28 [always]  count:23622320 (estimated locally, freq 
> 6.4080) (FALLTHRU,EXECUTABLE)
> ;;25 [always]  count:191126046 (estimated locally, freq 
> 51.8465) (FALLTHRU,DFS_BACK,EXECUTABLE)
>   # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
>   # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
>   # ivtmp.26_51 = PHI 
>   # ivtmp.28_90 = PHI 
>
> ...
>
> ;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq 
> 58.2545), maybe hot
> ;;prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
> ;;pred:   22 [always]  count:95443719 (estimated locally, freq 
> 25.8909) (FALLTHRU)
> ;;21 [33.3% (guessed)]  count:71582790 (estimated locally, 
> freq 19.4182) (TRUE_VALUE,EXECUTABLE)
> ;;31 [33.3% (guessed)]  count:47721860 (estimated locally, 
> freq 12.9455) (TRUE_VALUE,EXECUTABLE)
> # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>
> ivtmp.22_82 = ivtmp.22_41 + 1;
> ivtmp.26_72 = ivtmp.26_51 + _80;
> ivtmp.28_98 = ivtmp.28_90 + _39;
>
> These two IVs are always used as unsigned, so IVOPTS generates:
>
>   _73 = stride.3_27 * 4;
>   _80 = (unsigned long) _73;
>   _54 = (unsigned long) stride.3_27;
>   _39 = _54 * 4;
>
> Which means that in e.g. exchange2 we generate a lot of duplicate code.
>
> This is because candidates 6 and 8 are equivalent under two's complement but have
> different signs.
>
> This patch changes it so that if two IVs are affine equivalent, IVOPTS
> just picks one over the other.  IVOPTS already has code for this, so the patch just
> uses affine trees instead of trees for the check.
>
> With it we get:
>
> :
> inv_expr 1: stride.3_27 * 4
>
> :
> Group 0:
>   cand  costcompl.  inv.expr.   inv.vars
>   5 0   2   NIL;NIL;
>   6 0   3   NIL;NIL;
>
> Group 1:
>   cand  costcompl.  inv.expr.   inv.vars
>   1 0   0   NIL;6
>   2 0   0   NIL;6
>   3 0   0   NIL;6
>   4 0   0   NIL;6
>
> Initial set of candidates:
>   cost: 16 (complexity 3)
>   reg_cost: 6
>   cand_cost: 10
>   cand_group_cost: 0 (complexity 3)
>   candidates: 1, 6
>group:0 --> iv_cand:6, cost=(0,3)
>group:1 --> iv_cand:1, cost=(0,0)
>   invariant variables: 6
>   invariant expressions: 1
>
> The two patches together result in a 10% performance increase in exchange2 in
> SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in compile
> time. There's also a 5% performance improvement in fotonik3d and similar
> reduction in binary size.
>
> Bootstrapped Regtested on aarch64-none-linux-

[PATCH] testsuite: Add check vect_unpack for pr117776.cc [PR117844]

2024-11-29 Thread Andrew Pinski
I had missed that you need to check vect_unpack if you are
vectorizing a conversion from char to int.

Pushed as obvious after a quick test.

PR testsuite/117844
gcc/testsuite/ChangeLog:

* g++.dg/vect/pr117776.cc: Check vect_unpack.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/g++.dg/vect/pr117776.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/g++.dg/vect/pr117776.cc 
b/gcc/testsuite/g++.dg/vect/pr117776.cc
index cbb8079bd91..71eb88c0c42 100644
--- a/gcc/testsuite/g++.dg/vect/pr117776.cc
+++ b/gcc/testsuite/g++.dg/vect/pr117776.cc
@@ -1,5 +1,6 @@
 // { dg-do compile }
 // { dg-require-effective-target vect_int }
+// { dg-require-effective-target vect_unpack }
 
 // PR tree-optimization/117776
 
-- 
2.43.0



Re: [PATCH v3] MATCH: Simplify `(trunc)copysign ((extend)x, CST)` to `copysign (x, -1.0/1.0)` [PR112472]

2024-11-29 Thread Richard Biener
On Thu, Nov 14, 2024 at 11:59 AM Eikansh Gupta
 wrote:
>
> This patch simplifies `(trunc)copysign ((extend)x, CST)` to `copysign (x, -1.0/1.0)`
> depending on the sign of CST. Previously, it was simplified to `copysign (x, CST)`.
> It can be optimized as the sign of the CST matters, not the value.
>
> The patch also simplifies `(trunc)abs (extend x)` to `abs (x)`.

Please do not mix two different changes.

> PR tree-optimization/112472
>
> gcc/ChangeLog:
>
> * match.pd ((trunc)copysign ((extend)x, -CST) --> copysign (x, 
> -1.0)): New pattern.
> ((trunc)abs (extend x) --> abs (x)): New pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/pr112472.c: New test.
>
> Signed-off-by: Eikansh Gupta 
> ---
>  gcc/match.pd | 25 +++-
>  gcc/testsuite/gcc.dg/tree-ssa/pr112472.c | 22 +
>  2 files changed, 42 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr112472.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 00988241348..5b930beb418 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -8854,19 +8854,34 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  type, OPTIMIZE_FOR_BOTH))
> (tos @0
>
> -/* Simplify (trunc)copysign ((extend)x, (extend)y) to copysignf (x, y),
> -   x,y is float value, similar for _Float16/double.  */
> +/* Simplify (trunc)copysign ((extend)x, (extend)y) to copysignf (x, y) and
> +   simplify (trunc)copysign ((extend)x, CST) to copysign (x, -1.0/1.0).
> +   x,y is float value, similar for _Float16/double. */
>  (for copysigns (COPYSIGN_ALL)
>   (simplify
> -  (convert (copysigns (convert@2 @0) (convert @1)))
> +  (convert (copysigns (convert@2 @0) (convert2? @1)))

You want to capture convert2? with @3

> (if (optimize
> && !HONOR_SNANS (@2)
> && types_match (type, TREE_TYPE (@0))
> -   && types_match (type, TREE_TYPE (@1))
> && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@2))
> && direct_internal_fn_supported_p (IFN_COPYSIGN,
>   type, OPTIMIZE_FOR_BOTH))
> -(IFN_COPYSIGN @0 @1
> + (if (TREE_CODE (@1) == REAL_CST)

and check TREE_CODE (@3) == REAL_CST, we might not always
fold a conversion of a FP constant.

> +  (if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (@1)))
> +   (IFN_COPYSIGN @0 { build_minus_one_cst (type); })
> +   (IFN_COPYSIGN @0 { build_one_cst (type); }))
> +  (if (types_match (type, TREE_TYPE (@1)))
> +   (IFN_COPYSIGN @0 @1))
> +
> +/* (trunc)abs (extend x) --> abs (x)
> +   x is a float value */
> +(simplify
> + (convert (abs (convert@1 @0)))
> +  (if (optimize
> +  && !HONOR_SNANS (@1)
> +  && types_match (type, TREE_TYPE (@0))
> +  && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@1)))
> +   (abs @0)))

This one is OK, but I don't see a testcase?  Please split it out to a
separate patch
and add one.

Richard.

>  (for froms (BUILT_IN_FMAF BUILT_IN_FMA BUILT_IN_FMAL)
>   tos (IFN_FMA IFN_FMA IFN_FMA)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr112472.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr112472.c
> new file mode 100644
> index 000..8f97278ffe8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr112472.c
> @@ -0,0 +1,22 @@
> +/* PR tree-optimization/109878 */
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-tree-optimized" } */
> +
> +/* Optimized to .COPYSIGN(a, -1.0e+0) */
> +float f(float a)
> +{
> +  return (float)__builtin_copysign(a, -3.0);
> +}
> +
> +/* This gets converted to (float) abs((double) a)
> +   With the patch it is optimized to abs(a) */
> +float f2(float a)
> +{
> +  return (float)__builtin_copysign(a, 5.0);
> +}
> +
> +/* { dg-final { scan-tree-dump-not "= __builtin_copysign" "optimized" } } */
> +/* { dg-final { scan-tree-dump-not " double " "optimized" { target ifn_copysign } } } */
> +/* { dg-final { scan-tree-dump-times ".COPYSIGN" 1 "optimized" { target ifn_copysign } } } */
> +/* { dg-final { scan-tree-dump-times "-1.0e\\+0" 1 "optimized" { target ifn_copysign } } } */
> +/* { dg-final { scan-tree-dump-times " ABS_EXPR " 1 "optimized" { target ifn_copysign } } } */
> --
> 2.17.1
>
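
The property the copysign simplification in this thread relies on can be checked from user code. The sketch below is an illustration only, with hypothetical function names. It compares the wide form, `(float) copysign ((double) x, -3.0)`, with the narrow form, `copysignf (x, -1.0f)`. Only the sign of the constant is used, and a `float` widened to `double` and narrowed back is unchanged, so both should agree for every non-NaN input.

```c
#include <assert.h>
#include <math.h>

/* Wide form: extend to double, copy the sign of -3.0, truncate back.  */
float
copysign_via_double (float x)
{
  return (float) copysign ((double) x, -3.0);
}

/* Narrow form: stay in float and use -1.0f, which has the same sign.  */
float
copysign_narrow (float x)
{
  return copysignf (x, -1.0f);
}

/* Return nonzero if both forms agree for X (NaN is excluded because
   NaN never compares equal to itself).  */
int
same_result (float x)
{
  assert (!isnan (x));
  return copysign_via_double (x) == copysign_narrow (x);
}
```

Whether the compiler actually emits the narrow form is what the `scan-tree-dump` directives in the quoted test check. This sketch only shows that the two forms compute the same value.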


Re: [PATCH v3] c++, coroutines: Fix awaiter var creation [PR116506].

2024-11-29 Thread Iain Sandoe
Hi Jason,

gentle ping for this one (sorry I forgot to ping earlier and then WG21 …)


> On 31 Oct 2024, at 08:40, Iain Sandoe  wrote:
> 
> This version tested on x86_64-darwin,linux, powerpc64-linux, on folly
> and by Sam on wider codebases,
> 
 Why don't you need a variable to preserve o across suspensions if it's a 
 call returning lvalue reference?
>>> We always need a space for the awaiter, unless it is already a 
>>> variable/parameter (or part of one).
 I suspect that the simple case is not lvalue_p, but !TREE_SIDE_EFFECTS.
>>> That is likely where I’m going wrong - we must not generate a variable for 
>>> any case that already has one (or a parm), but we must for any case that is 
>>> a temporary.
>>> So, I should adjust the logic to use !TREE_SIDE_EFFECTS.
> 
>> Or perhaps DECL_P.  The difference would be for compound lvalues like *p or 
>> a[n]; if the value of p or a or n could change across suspension, the same 
>> side-effect-free lvalue expression could refer to a different object.
> 
> Right, part of the code that was elided catered for the compound values by
> making a reference to the original entity and placing that in the frame. We
> restore that behaviour here.
> 
> Note that there is no point in making a reference to an xvalue (we'd only
> have to save the expiring value in the frame anyway), so we just go ahead
> and build that var directly.  There is one small additional optimisation,
> in that building a reference to a non-pointer component ref wastes frame
> space if the underlying entity is a variable or parameter.
> 
> Thanks for the help in working through the different cases here, OK for
> trunk now?
> thanks
> Iain.
> 
> --- 8< ---
> 
> Awaiters always need to have a coroutine state frame copy since
> they persist across potential suspensions.  It simplifies the later
> analysis considerably to assign these early which we do when
> building co_await expressions.
> 
> The cleanups in r15-3146-g47dbd69b1 unfortunately elided some of the
> processing used to cater for cases where the var is created from an
> xvalue, or is a pointer/reference type.
> 
> Corrected thus.
> 
>   PR c++/116506
>   PR c++/116880
> 
> gcc/cp/ChangeLog:
> 
>   * coroutines.cc (build_co_await): Ensure that xvalues are
>   materialised.  Handle references/pointer values in awaiter
>   access expressions.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/coroutines/pr116506.C: New test.
>   * g++.dg/coroutines/pr116880.C: New test.
> 
> Signed-off-by: Iain Sandoe 
> ---
> gcc/cp/coroutines.cc   | 82 +-
> gcc/testsuite/g++.dg/coroutines/pr116506.C | 53 ++
> gcc/testsuite/g++.dg/coroutines/pr116880.C | 36 ++
> 3 files changed, 154 insertions(+), 17 deletions(-)
> create mode 100644 gcc/testsuite/g++.dg/coroutines/pr116506.C
> create mode 100644 gcc/testsuite/g++.dg/coroutines/pr116880.C
> 
> diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
> index ba326bcd627..dde8ba4e614 100644
> --- a/gcc/cp/coroutines.cc
> +++ b/gcc/cp/coroutines.cc
> @@ -1072,6 +1072,30 @@ build_template_co_await_expr (location_t kw, tree 
> type, tree expr, tree kind)
>   return aw_expr;
> }
> 
> +/* For a component ref that is not a pointer type, decide if we can use
> +   this directly.  */
> +static bool
> +usable_component_ref (tree comp_ref)
> +{
> +  if (TREE_CODE (comp_ref) != COMPONENT_REF
> +  || TREE_SIDE_EFFECTS (comp_ref))
> +return false;
> +
> +  while (TREE_CODE (comp_ref) == COMPONENT_REF)
> +{
> +  comp_ref = TREE_OPERAND (comp_ref, 0);
> +  STRIP_NOPS (comp_ref);
> +  /* x-> */
> +  if (INDIRECT_REF_P (comp_ref))
> + return false;
> +  /* operator-> */
> +  if (TREE_CODE (comp_ref) == CALL_EXPR)
> + return false;
> +  STRIP_NOPS (comp_ref);
> +}
> +  gcc_checking_assert (VAR_P (comp_ref) || TREE_CODE (comp_ref) == 
> PARM_DECL);
> +  return true;
> +}
> 
> /*  This performs [expr.await] bullet 3.3 and validates the interface 
> obtained.
> It is also used to build the initial and final suspend points.
> @@ -1134,13 +1158,12 @@ build_co_await (location_t loc, tree a, 
> suspend_point_kind suspend_kind,
>   if (o_type && !VOID_TYPE_P (o_type))
> o_type = complete_type_or_else (o_type, o);
> 
> -  if (!o_type)
> +  if (!o_type || o_type == error_mark_node)
> return error_mark_node;
> 
>   if (TREE_CODE (o_type) != RECORD_TYPE)
> {
> -  error_at (loc, "awaitable type %qT is not a structure",
> - o_type);
> +  error_at (loc, "awaitable type %qT is not a structure", o_type);
>   return error_mark_node;
> }
> 
> @@ -1166,20 +1189,47 @@ build_co_await (location_t loc, tree a, 
> suspend_point_kind suspend_kind,
>   if (!glvalue_p (o))
> o = get_target_expr (o, tf_warning_or_error);
> 
> -  tree e_proxy = o;
> -  if (glvalue_p (o))
> +  /* We know that we need a coroutine state frame variable for the awaiter,
> + since it must pers

Re: [PATCH] middle-end/117801 - failed register coalescing due to GIMPLE schedule

2024-11-29 Thread Richard Biener
On Fri, 29 Nov 2024, Jakub Jelinek wrote:

> On Fri, Nov 29, 2024 at 09:19:55AM +0100, Richard Biener wrote:
> > For a TSVC testcase we see failed register coalescing due to a
> > different schedule of GIMPLE .FMA and stores fed by it.  This
> > can be mitigated by making direct internal functions participate
> > in TER - given we're using more and more of such functions to
> > expose target capabilities it seems to be a natural thing to not
> > exempt those.
> > 
> > Unfortunately the internal function expanding API doesn't match
> > what we usually have - passing in a target and returning an RTX
> > but instead the LHS of the call is expanded and written to.  This
> > makes the TER expansion of a call SSA def a bit unwieldy.
> 
> Can't we change that?
> Especially if it is only for the easiest subset of internal fns
> (I see you limit it only to direct_internal_fn_p), if it has just
> one or a couple of easy implementations, those could be split into
> one which handles the whole thing by just expanding lhs and calling
> another function with the rtx target argument into which to store
> stuff (or const0_rtx for ignored result?) and handle the actual expansion,
> and then have an exported function from internal-fn.cc which expr.cc
> could call for the TERed internal-fn case.
> That function could assert it is only direct_internal_fn_p or some
> other subset which it would handle.

The expander goes through macro-generated expand_FOO (see top of
internal-fn.cc), and in the end dispatches to expand_*_optab_fn
of which there is a generic one for UNARY, BINARY and TERNARY
but very many OPTAB_NAME variants, like expand_fold_len_extract_optab_fn
dispatching to expand_direct_optab_fn or complex ones like
expand_gather_load_optab_fn.  There's unfortunately no good way
to factor out a different API there, at least not easily.

Suggestions welcome, of course.

Richard.


[PATCH 1/3] arm, mve: Fix scan-assembler for test7 in dlstp-compile-asm-2.c

2024-11-29 Thread Andre Vieira

After the changes to the vctp intrinsic, codegen changed slightly, and we now
unfortunately seem to be generating unneeded moves and extends of the mask.
These are however not incorrect and we don't have a fix for the unneeded
codegen right now, so change the testcase to accept them so we can catch
other changes if they occur.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/dlstp-compile-asm-2.c (test7): Add an optional
vmsr to the check-function-bodies.
---
 gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-2.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-2.c b/gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-2.c
index c62f592a60d..fd3f68ce5b2 100644
--- a/gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-2.c
+++ b/gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-2.c
@@ -216,6 +216,7 @@ void test7 (int32_t *a, int32_t *b, int32_t *c, int n, int g)
 **...
 **	dlstp.32	lr, r3
 **	vldrw.32	q[0-9]+, \[r0\], #16
+**	(?:vmsr	p0, .*)
 **	vpst
 **	vldrwt.32	q[0-9]+, \[r1\], #16
 **	vadd.i32	(q[0-9]+), q[0-9]+, q[0-9]+


[PATCH 0/3] arm, mve: Fix DLSTP testism and issue after changes in codegen

2024-11-29 Thread Andre Vieira
This patch series does not really need to be a patch series but just makes it
easier to send it all and has one common goal which is to clean up the DLSTP
implementation new to GCC 15 after some codegen changes. The first two patches
clean up some testcases and the last fixes an actual issue that had gone by
unnoticed until now.

Only tested on arm-none-eabi mve.exp=dlstp*. OK for trunk?

Andre Vieira (3):
  arm, mve: Fix scan-assembler for test7 in dlstp-compile-asm-2.c
  arm, mve: Pass -std=c99 to dlstp-loop-form.c to avoid new warning
  arm, mve: Detect uses of vctp_vpr_generated inside subregs

 gcc/config/arm/arm.cc |  3 ++-
 .../gcc.target/arm/mve/dlstp-compile-asm-2.c  |  1 +
 .../gcc.target/arm/mve/dlstp-invalid-asm.c| 20 ++-
 .../gcc.target/arm/mve/dlstp-loop-form.c  |  2 +-
 4 files changed, 23 insertions(+), 3 deletions(-)

-- 
2.25.1



[PATCH 2/3] arm, mve: Pass -std=c99 to dlstp-loop-form.c to avoid new warning

2024-11-29 Thread Andre Vieira

This fixes a testism introduced by the warning produced with the -std=c23
default.  The testcase is a reduced piece of code meant to trigger an ICE, so
there's little value in trying to change the code itself.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/dlstp-loop-form.c: Add -std=c99 to avoid warning
message.
---
 gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c b/gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c
index 08811cef568..0f9589d7756 100644
--- a/gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c
+++ b/gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
-/* { dg-options "-Ofast" } */
+/* { dg-options "-Ofast -std=c99" } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 #pragma GCC arm "arm_mve_types.h"
 #pragma GCC arm "arm_mve.h" false


[PATCH 3/3] arm, mve: Detect uses of vctp_vpr_generated inside subregs

2024-11-29 Thread Andre Vieira

Address a problem where we were failing to detect uses of
vctp_vpr_generated in the analysis for 'arm_attempt_dlstp_transform' because
the use was inside a SUBREG and rtx_equal_p does not catch that.  Using
reg_overlap_mentioned_p is much more robust.

gcc/ChangeLog:

* gcc/config/arm/arm.cc (arm_attempt_dlstp_transform): Use
reg_overlap_mentioned_p instead of rtx_equal_p to detect uses of
vctp_vpr_generated inside subregs.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/dlstp-invalid-asm.c (test10): Renamed to...
(test10a): ... this.
(test10b): Variation of test10a with a small change to trigger wrong
codegen.
---
 gcc/config/arm/arm.cc |  3 ++-
 .../gcc.target/arm/mve/dlstp-invalid-asm.c| 20 ++-
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 7292fddef80..7f82fb94a56 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -35847,7 +35847,8 @@ arm_attempt_dlstp_transform (rtx label)
 	  df_ref insn_uses = NULL;
 	  FOR_EACH_INSN_USE (insn_uses, insn)
 	  {
-	if (rtx_equal_p (vctp_vpr_generated, DF_REF_REG (insn_uses)))
+	if (reg_overlap_mentioned_p (vctp_vpr_generated,
+	 DF_REF_REG (insn_uses)))
 	  {
 		end_sequence ();
 		return 1;
diff --git a/gcc/testsuite/gcc.target/arm/mve/dlstp-invalid-asm.c b/gcc/testsuite/gcc.target/arm/mve/dlstp-invalid-asm.c
index 26df2d30523..f26754cc482 100644
--- a/gcc/testsuite/gcc.target/arm/mve/dlstp-invalid-asm.c
+++ b/gcc/testsuite/gcc.target/arm/mve/dlstp-invalid-asm.c
@@ -128,7 +128,7 @@ void test9 (int32_t *a, int32_t *b, int32_t *c, int n)
 }
 
 /* Using a VPR that gets re-generated within the loop.  */
-void test10 (int32_t *a, int32_t *b, int32_t *c, int n)
+void test10a (int32_t *a, int32_t *b, int32_t *c, int n)
 {
   mve_pred16_t p = vctp32q (n);
   while (n > 0)
@@ -145,6 +145,24 @@ void test10 (int32_t *a, int32_t *b, int32_t *c, int n)
 }
 }
 
+/* Using a VPR that gets re-generated within the loop.  */
+void test10b (int32_t *a, int32_t *b, int32_t *c, int n)
+{
+  mve_pred16_t p = vctp32q (n-4);
+  while (n > 0)
+{
+  int32x4_t va = vldrwq_z_s32 (a, p);
+  p = vctp32q (n);
+  int32x4_t vb = vldrwq_z_s32 (b, p);
+  int32x4_t vc = vaddq_x_s32 (va, vb, p);
+  vstrwq_p_s32 (c, vc, p);
+  c += 4;
+  a += 4;
+  b += 4;
+  n -= 4;
+}
+}
+
 /* Using vctp32q_m instead of vctp32q.  */
 void test11 (int32_t *a, int32_t *b, int32_t *c, int n, mve_pred16_t p0)
 {


Re: [PATCH] arm, mve: Do not DLSTP transform loops if VCTP is not first

2024-11-29 Thread Andre Vieira (lists)

Hi Christophe,

On 28/11/2024 17:00, Christophe Lyon wrote:
> Hi Andre,
>
> Thanks, the patch LGTM except a minor nit:
>
>   /* Using a VPR that gets re-generated within the loop.  */
> -void test10 (int32_t *a, int32_t *b, int32_t *c, int n)
> +void test10a (int32_t *a, int32_t *b, int32_t *c, int n)
> [...]
>
> +/* Using a VPR that gets re-generated within the loop.  */
> +void test10b (int32_t *a, int32_t *b, int32_t *c, int n)
>
> Can you update the comment before test10b, to highlight the difference
> with test10a?
>
> Thanks,
>
> Christophe


I just sent a patch series, and patch 3 of that series, titled
"arm, mve: Detect uses of vctp_vpr_generated inside subregs",
has a better fix for this issue, less of a hammer. It addresses a 
shortcoming in the dlstp analysis where rtx_equal_p was being used on 
the DF_REF_REG of each insn use to detect uses of vctp_vpr_generated, but that 
doesn't work if the use is in a subreg; the new patch uses 
reg_overlap_mentioned_p instead. That enables the existing analysis to block the 
loops that were causing the issue that this patch was trying to address 
with a 'bigger hammer'.


So dropping this patch in favour of the new one, and I just realized I 
didn't address the comments on the testcase that the other patch shares 
with this... will do that!
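
The difference between the two checks can be pictured with a toy model. The sketch below is only an illustration under stated assumptions: `struct reg_ref`, `ref_equal` and `ref_overlaps` are hypothetical names, and the byte-range representation is a simplification, not GCC's RTL or the real `rtx_equal_p` and `reg_overlap_mentioned_p`. It shows why an exact-match test misses a use that touches only part of a register, while an overlap test catches it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a register reference; not GCC's rtx.  */
struct reg_ref
{
  unsigned regno;   /* register number */
  unsigned offset;  /* first byte referenced within the register */
  unsigned size;    /* number of bytes referenced */
};

/* Exact-match test, loosely in the spirit of rtx_equal_p.  */
bool
ref_equal (struct reg_ref a, struct reg_ref b)
{
  return a.regno == b.regno && a.offset == b.offset && a.size == b.size;
}

/* Overlap test, loosely in the spirit of reg_overlap_mentioned_p:
   same register and intersecting byte ranges.  */
bool
ref_overlaps (struct reg_ref a, struct reg_ref b)
{
  assert (a.size > 0 && b.size > 0);
  return a.regno == b.regno
         && a.offset < b.offset + b.size
         && b.offset < a.offset + a.size;
}
```

In this model, a reference to the low two bytes of a four-byte register is not equal to a reference to the whole register, but it does overlap it. That is the situation a subreg use creates.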


Re: [PATCH 1/3] arm, mve: Fix scan-assembler for test7 in dlstp-compile-asm-2.c

2024-11-29 Thread Christophe Lyon




On 11/29/24 11:30, Andre Vieira wrote:


After the changes to the vctp intrinsic, codegen changed slightly, and we now
unfortunately seem to be generating unneeded moves and extends of the mask.
These are however not incorrect and we don't have a fix for the unneeded
codegen right now, so change the testcase to accept them so we can catch
other changes if they occur.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/dlstp-compile-asm-2.c (test7): Add an optional
vmsr to the check-function-bodies.


Indeed I've been looking at avoiding these extra moves but did not find 
a good solution yet.

In the meantime I agree it's reasonable to update the testcase for gcc-15.

Thanks,

Christophe


---
  gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-2.c | 1 +
  1 file changed, 1 insertion(+)



Re: [PATCH 2/3] arm, mve: Pass -std=c99 to dlstp-loop-form.c to avoid new warning

2024-11-29 Thread Christophe Lyon




On 11/29/24 11:30, Andre Vieira wrote:


This fixes a testism introduced by the warning produced with the -std=c23
default.  The testcase is a reduced piece of code meant to trigger an ICE, so
there's little value in trying to change the code itself.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/dlstp-loop-form.c: Add -std=c99 to avoid warning
message.


Thanks, indeed that's better than my proposal:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/670267.html


---
  gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)



Re: [PATCH 3/3] arm, mve: Detect uses of vctp_vpr_generated inside subregs

2024-11-29 Thread Christophe Lyon




On 11/29/24 11:30, Andre Vieira wrote:


Address a problem where we were failing to detect uses of
vctp_vpr_generated in the analysis for 'arm_attempt_dlstp_transform' because
the use was inside a SUBREG and rtx_equal_p does not catch that.  Using
reg_overlap_mentioned_p is much more robust.

gcc/ChangeLog:

* gcc/config/arm/arm.cc (arm_attempt_dlstp_transform): Use
reg_overlap_mentioned_p instead of rtx_equal_p to detect uses of
vctp_vpr_generated inside subregs.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/dlstp-invalid-asm.c (test10): Renamed to...
(test10a): ... this.
(test10b): Variation of test10a with a small change to trigger wrong
codegen.


Thanks, this patch is OK too, provided the update in the comment before 
test10b as I requested in your previous version.



---
  gcc/config/arm/arm.cc |  3 ++-
  .../gcc.target/arm/mve/dlstp-invalid-asm.c| 20 ++-
  2 files changed, 21 insertions(+), 2 deletions(-)



Re: [PATCH 0/3] arm, mve: Fix DLSTP testism and issue after changes in codegen

2024-11-29 Thread Christophe Lyon




On 11/29/24 11:30, Andre Vieira wrote:

This patch series does not really need to be a series, but sending it as one is
easier, and the patches share one goal: cleaning up the DLSTP implementation new
to GCC 15 after some codegen changes.  The first two patches clean up some
testcases and the last fixes an actual issue that had gone unnoticed until now.

Only tested on arm-none-eabi mve.exp=dlstp*. OK for trunk?


I OK'ed individual patches modulo an update in a comment in patch 3/3.

But I just realized that patches 1 and 3 fix some of the regressions 
reported in PR target/117814, so their ChangeLog entries should mention 
that.


Thanks,

Christophe


Andre Vieira (3):
   arm, mve: Fix scan-assembler for test7 in dlstp-compile-asm-2.c
   arm, mve: Pass -std=c99 to dlstp-loop-form.c to avoid new warning
   arm, mve: Detect uses of vctp_vpr_generated inside subregs

  gcc/config/arm/arm.cc |  3 ++-
  .../gcc.target/arm/mve/dlstp-compile-asm-2.c  |  1 +
  .../gcc.target/arm/mve/dlstp-invalid-asm.c| 20 ++-
  .../gcc.target/arm/mve/dlstp-loop-form.c  |  2 +-
  4 files changed, 23 insertions(+), 3 deletions(-)



Re: [PATCH]AArch64 Suppress default options when march or mcpu used is not affected by it.

2024-11-29 Thread Kyrylo Tkachov
Hi Tamar,

> On 15 Nov 2024, at 14:24, Tamar Christina  wrote:
> 
> Hi All,
> 
> This patch makes it so that when you use any of the Cortex-A53 errata
> workarounds but have specified an -march or -mcpu we know is not affected by
> it, we suppress the errata workaround.
> 
> This is a driver-only patch, as the linker invocation needs to be changed as
> well.  The linker and cc SPECs are different because for the linker we didn't
> seem to add an inversion flag for the option.  That said, it's also not
> possible to configure the linker with it on by default, so not passing the
> flag is sufficient to turn it off.
> 
> For the compilers, however, we have an inversion flag using -mno-, which is
> needed to disable the workarounds when the compiler has been configured with
> them by default.
> 
> Note that theoretically speaking -mcpu=native on a Cortex-A53 would turn it
> off, but this should be ok because it's unlikely anyone is running GCC-15+ on
> a Cortex-A53 which needs it.  If this is a concern, I can adjust the patch so
> that, for targets that have HAVE_LOCAL_CPU_DETECT, a new custom function
> re-queries host detection to see if it's an affected system.
> 
> The workaround has the effect of suppressing certain inlining and multiply-add
> formation, which leads to a roughly 1% SPECCPU 2017 Intrate regression on
> modern cores.  This patch is needed because most distros configure GCC with
> the workaround enabled by default.
> 
> I tried writing automated testcases for these, however the testsuite doesn't
> want to scan the output of -### and it makes the excess error tests always
> fail unless you use dg-error, which also looks for "error:".  So tested
> manually:
> 
>> gcc -mcpu=neoverse-v1 -mfix-cortex-a53-835769 -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l
> 0
> 
>> gcc -mfix-cortex-a53-835769 -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l
> 5
> 
>> gcc -mfix-cortex-a53-835769 -march=armv8-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l
> 5
> 
>> gcc -mfix-cortex-a53-835769 -march=armv8.1-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l
> 0
> 
>> gcc -mfix-cortex-a53-835769 -march=armv8.1-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-\-fix" | wc -l
> 0
> 
>> gcc -mfix-cortex-a53-835769 -march=armv8-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-\-fix" | wc -l
> 1
> 
>> gcc -mfix-cortex-a53-835769 -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-\-fix" | wc -l
> 1
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> * config/aarch64/aarch64-errata.h (TARGET_SUPPRESS_OPT_SPEC,
> TARGET_TURN_OFF_OPT_SPEC, CA53_ERR_835769_COMPILE_SPEC,
> CA53_ERR_843419_COMPILE_SPEC): New.
> (CA53_ERR_835769_SPEC, CA53_ERR_843419_SPEC): Use them.
> (AARCH64_ERRATA_COMPILE_SPEC):
> * config/aarch64/aarch64-elf-raw.h (CC1_SPEC, CC1PLUS_SPEC): Add
> AARCH64_ERRATA_COMPILE_SPEC.
> * config/aarch64/aarch64-freebsd.h (CC1_SPEC, CC1PLUS_SPEC): Likewise.
> * config/aarch64/aarch64-gnu.h (CC1_SPEC, CC1PLUS_SPEC): Likewise.
> * config/aarch64/aarch64-linux.h (CC1_SPEC, CC1PLUS_SPEC): Likewise.
> * config/aarch64/aarch64-netbsd.h (CC1_SPEC, CC1PLUS_SPEC): Likewise.
> * doc/invoke.texi: Document it.
> 
> ---
> diff --git a/gcc/config/aarch64/aarch64-elf-raw.h 
> b/gcc/config/aarch64/aarch64-elf-raw.h
> index 
> 5396da9b2d626e23e4c4d56e19cd7aa70804c475..8442a664c4fdedd9696da90e6727293c4d472a3f
>  100644
> --- a/gcc/config/aarch64/aarch64-elf-raw.h
> +++ b/gcc/config/aarch64/aarch64-elf-raw.h
> @@ -38,4 +38,12 @@
>   AARCH64_ERRATA_LINK_SPEC
> #endif
> 
> +#ifndef CC1_SPEC
> +# define CC1_SPEC AARCH64_ERRATA_COMPILE_SPEC
> +#endif
> +
> +#ifndef CC1PLUS_SPEC
> +# define CC1PLUS_SPEC AARCH64_ERRATA_COMPILE_SPEC
> +#endif
> +
> #endif /* GCC_AARCH64_ELF_RAW_H */
> diff --git a/gcc/config/aarch64/aarch64-errata.h 
> b/gcc/config/aarch64/aarch64-errata.h
> index 
> c323595ee49553f2b3bc106e993c14f62aee235b..ac0156848abe3e7df669a7ff54e07e72e978c5f0
>  100644
> --- a/gcc/config/aarch64/aarch64-errata.h
> +++ b/gcc/config/aarch64/aarch64-errata.h
> @@ -21,24 +21,61 @@
> #ifndef GCC_AARCH64_ERRATA_H
> #define GCC_AARCH64_ERRATA_H
> 
> +/* Completely ignore the option if we've explicitly specify something other 
> than
> +   mcpu=cortex-a53 or march=armv8-a.  */
> +#define TARGET_SUPPRESS_OPT_SPEC(OPT) \
> +  "mcpu=*:%{!mcpu=cortex-a53:; " OPT  \
> +  "}; march=*:%{!march=armv8-a:;" OPT "}; " OPT
> +
> +/* Explicitly turn off the option if we've explicitly specify something other
> +   than mcpu=cortex-a53 or march=armv8-a.  This will also erase any other 
> usage
> +   of the flag making the order of the options not relevant.  */
> +#define TARGET_TURN_OFF_OPT_SPEC(FLAG)   \
> +  "mcpu=*:%{!mcpu=cortex-a53:% +  "}; march=*:%{!march=armv8-a:% +
> +/* Cortex-A53 835769 Errata.  */
> +
> #if TARGET_FIX_ERR_A53_835769_DEFAULT
> -#define CA53_ERR_835769_SPEC \

RE: [PATCH]AArch64 Suppress default options when march or mcpu used is not affected by it.

2024-11-29 Thread Tamar Christina
Hi Kyril,

Thanks for the review.  Unfortunately this is an old version of the patch; I
sent a new one on Thu 11/21/2024 with updates and automated tests.

Would you mind reviewing that one?  I've noted the documentation comment you
mentioned :)

Thanks,
Tamar


Re: [PATCH]AArch64 Suppress default options when march or mcpu used is not affected by it.

2024-11-29 Thread Kyrylo Tkachov


> On 21 Nov 2024, at 10:13, Tamar Christina  wrote:
> 
>>> I tried writing automated testcases for these, however the testsuite doesn't
>>> want to scan the output of -### and it makes the excess error tests always
>>> fail unless you use dg-error, which also looks for "error:".  So tested
>>> manually:
>> 
>> You might be able to use dg-message instead. dg-message does not look
>> for a `note:` (dg-note), `error:` (dg-error) or `warning:`
>> (dg-warning).
>> 
>> From gcc-dg.exp:
>> ```
>> # Look for messages that don't have standard prefixes.
>> proc dg-message { args } {
>> ```
> 
> Thanks 😊 It was mostly the excess errors that were an issue, but I found you
> can suppress them.
> Updated new version and tests.
> 
> ---
> 
> Hi All,
> 
> This patch makes it so that when you use any of the Cortex-A53 errata
> workarounds but have specified an -march or -mcpu we know is not affected by
> it, we suppress the errata workaround.
> 
> This is a driver-only patch, as the linker invocation needs to be changed as
> well.  The linker and cc SPECs are different because for the linker we didn't
> seem to add an inversion flag for the option.  That said, it's also not
> possible to configure the linker with it on by default, so not passing the
> flag is sufficient to turn it off.
> 
> For the compilers, however, we have an inversion flag using -mno-, which is
> needed to disable the workarounds when the compiler has been configured with
> them by default.
> 
> In case it's unclear how the patch does what it does (it took me a while to
> figure out the syntax):
> 
>  * Early matching will replace any -march=native or -mcpu=native with their
>    expanded forms and erase the native arguments from the buffer.
>  * Due to the above, if we ensure we handle the new code after this erasure
>    then we only have to handle the expanded form.
>  * The expanded form needs to handle -march=+extensions and
>    -mcpu=+extensions and so we can't use normal string matching but
>    instead use strstr with a custom driver function that's common between
>    native and non-native builds.
>  * For the compilers we output -mno- and for the linker we just
>    erase the --fix- option.
>  * The extra internal matching, e.g. the duplicate match of mcpu inside:
>    mcpu=*:%{%:is_local_not_armv8_base(%{mcpu=*:%*}) is so we can extract the
>    glob using %* because the outer match would otherwise reset at the %{.  The
>    reason for the outer glob at all is to skip the block early if no matches
>    are found.
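
The strstr point in the list above can be sketched in isolation. This is a minimal illustration and not the driver function from the patch: `toy_needs_a53_workaround_p` is a hypothetical name, and the rule that only armv8-a and cortex-a53 keep the workaround is an assumption read off the expected output in this message. An exact comparison fails as soon as an extension suffix such as +crc is attached, while a substring search still finds the base name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not the function added by the patch.  VALUE is the
   text after -march= or -mcpu=, possibly followed by +extensions.  Returns
   true when the value names the armv8-a base architecture or Cortex-A53,
   which this sketch assumes are the only cases that keep the workaround.  */
static bool
toy_needs_a53_workaround_p (const char *value)
{
  assert (value != NULL);
  /* strstr finds the base name even when "+crc" or similar follows it.
     Note that "armv8.1-a" does not contain the substring "armv8-a".  */
  return (strstr (value, "armv8-a") != NULL
          || strstr (value, "cortex-a53") != NULL);
}
```

A plain substring search would also accept any longer name that happens to contain one of these strings, so a real implementation would likely need to be stricter about where the match ends.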
> 
> 
> The workaround has the effect of suppressing certain inlining and multiply-add
> formation, which leads to a roughly 1% SPECCPU 2017 Intrate regression on
> modern cores.  This patch is needed because most distros configure GCC with
> the workaround enabled by default.
> 
> Expected output:
> 
>> gcc -mcpu=neoverse-v1 -mfix-cortex-a53-835769 -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l
> 0
> 
>> gcc -mfix-cortex-a53-835769 -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l
> 5
> 
>> gcc -mfix-cortex-a53-835769 -march=armv8-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l
> 5
> 
>> gcc -mfix-cortex-a53-835769 -march=armv8.1-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l
> 0
> 
>> gcc -mfix-cortex-a53-835769 -march=armv8.1-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-\-fix" | wc -l
> 0
> 
>> gcc -mfix-cortex-a53-835769 -march=armv8-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-\-fix" | wc -l
> 1
> 
>> gcc -mfix-cortex-a53-835769 -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-\-fix" | wc -l
> 1
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> cross build and regtested on aarch64-none-elf and no issues.
> 
> Ok for master?

Ok.
Thanks,
Kyrill


> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> * config/aarch64/aarch64-errata.h (TARGET_SUPPRESS_OPT_SPEC,
> TARGET_TURN_OFF_OPT_SPEC, CA53_ERR_835769_COMPILE_SPEC,
> CA53_ERR_843419_COMPILE_SPEC): New.
> (CA53_ERR_835769_SPEC, CA53_ERR_843419_SPEC): Use them.
> (AARCH64_ERRATA_COMPILE_SPEC):
> * config/aarch64/aarch64-elf-raw.h (CC1_SPEC, CC1PLUS_SPEC): Add
> AARCH64_ERRATA_COMPILE_SPEC.
> * config/aarch64/aarch64-freebsd.h (CC1_SPEC, CC1PLUS_SPEC): Likewise.
> * config/aarch64/aarch64-gnu.h (CC1_SPEC, CC1PLUS_SPEC): Likewise.
> * config/aarch64/aarch64-linux.h (CC1_SPEC, CC1PLUS_SPEC): Likewise.
> * config/aarch64/aarch64-netbsd.h (CC1_SPEC, CC1PLUS_SPEC): Likewise.
> * common/config/aarch64/aarch64-common.cc
> (is_host_cpu_not_armv8_base): New.
> * config/aarch64/aarch64.h (is_host_cpu_not_armv8_base): New.
> (MCPU_TO_MARCH_SPEC_FUNCTIONS): Add is_local_not_armv8_base.
> (EXTRA_SPEC_FUNCTIONS): Add is_local_cpu_armv8_base.
> * doc/invoke.texi: Document it.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/aarch64/cpunative/info_30: New test.
> * gcc.target/aarch64/cpunative/info_31: New test.
> * gcc.target/aarch64/cpunative/info_32: New test.
> * gcc.target/aarch64/cpunative

Re: [PATCH 1/3] aarch64: Fix up flags for vget_low_*, vget_high_* and vreinterpret intrinsics

2024-11-29 Thread Richard Sandiford
Andrew Pinski  writes:
> These 3 intrinsics will not raise an fp exception or read FPCR.  They will
> be folded into a VIEW_CONVERT_EXPR or a BIT_FIELD_REF, which are already
> treated as const expressions too.
>
> Built and tested for aarch64-linux-gnu.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-builtins.cc (VREINTERPRET_BUILTIN): Use
>   FLAG_NONE instead of FLAG_AUTO_FP.
>   (VGET_LOW_BUILTIN): Likewise.
>   (VGET_HIGH_BUILTIN): Likewise.
>
> Signed-off-by: Andrew Pinski 

OK, thanks.

Richard

> ---
>  gcc/config/aarch64/aarch64-builtins.cc | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> b/gcc/config/aarch64/aarch64-builtins.cc
> index e26ee323a2d..04ae16a0c76 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -911,7 +911,7 @@ static aarch64_fcmla_laneq_builtin_datum 
> aarch64_fcmla_lane_builtin_data[] = {
> 2, \
> { SIMD_INTR_MODE(A, L), SIMD_INTR_MODE(B, L) }, \
> { SIMD_INTR_QUAL(A), SIMD_INTR_QUAL(B) }, \
> -   FLAG_AUTO_FP, \
> +   FLAG_NONE, \
> SIMD_INTR_MODE(A, L) == SIMD_INTR_MODE(B, L) \
>   && SIMD_INTR_QUAL(A) == SIMD_INTR_QUAL(B) \
>},
> @@ -923,7 +923,7 @@ static aarch64_fcmla_laneq_builtin_datum 
> aarch64_fcmla_lane_builtin_data[] = {
> 2, \
> { SIMD_INTR_MODE(A, d), SIMD_INTR_MODE(A, q) }, \
> { SIMD_INTR_QUAL(A), SIMD_INTR_QUAL(A) }, \
> -   FLAG_AUTO_FP, \
> +   FLAG_NONE, \
> false \
>},
>  
> @@ -934,7 +934,7 @@ static aarch64_fcmla_laneq_builtin_datum 
> aarch64_fcmla_lane_builtin_data[] = {
> 2, \
> { SIMD_INTR_MODE(A, d), SIMD_INTR_MODE(A, q) }, \
> { SIMD_INTR_QUAL(A), SIMD_INTR_QUAL(A) }, \
> -   FLAG_AUTO_FP, \
> +   FLAG_NONE, \
> false \
>},


Re: [PATCH]AArch64 Suppress default options when march or mcpu used is not affected by it.

2024-11-29 Thread Kyrylo Tkachov
Hi Tamar,

> On 29 Nov 2024, at 11:25, Tamar Christina  wrote:
> 
> Hi Kyril,
> 
> Thanks for the review.  Unfortunately this is an old version of the patch; I
> sent a new one on Thu 11/21/2024 with updates and automated tests.
> 
> Would you mind reviewing that one?  I've noted the documentation comment you
> mentioned :)

Ah, I did review the latest one, but I had clicked reply on the wrong one in
the thread.
I’ve ok’ed that explicitly separately.
Kyrill



Re: [PATCH 2/3] aarch64: add attributes to the prefetch_builtins

2024-11-29 Thread Richard Sandiford
Andrew Pinski  writes:
> This adds the attributes associated with prefetch to the builtins.
> Just call aarch64_get_attributes with FLAG_PREFETCH_MEMORY to get the
> attributes.
>
> Built and tested for aarch64-linux-gnu.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-builtins.cc (aarch64_init_prefetch_builtin):
>   Update call to aarch64_general_add_builtin in
>   AARCH64_INIT_PREFETCH_BUILTIN.  Add new variable prefetch_attrs.
>
> Signed-off-by: Andrew Pinski 

OK, thanks.

Richard

> ---
>  gcc/config/aarch64/aarch64-builtins.cc | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> b/gcc/config/aarch64/aarch64-builtins.cc
> index 04ae16a0c76..9705f2de090 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -2024,10 +2024,12 @@ aarch64_init_prefetch_builtin (void)
>  {
>  #define AARCH64_INIT_PREFETCH_BUILTIN(INDEX, N)  
> \
>aarch64_builtin_decls[INDEX] = \
> -aarch64_general_add_builtin ("__builtin_aarch64_" N, ftype, INDEX)
> +aarch64_general_add_builtin ("__builtin_aarch64_" N, ftype, INDEX,  \
> +  prefetch_attrs)
>  
>tree ftype;
>tree cv_argtype;
> +  tree prefetch_attrs = aarch64_get_attributes (FLAG_PREFETCH_MEMORY, 
> DImode);
>cv_argtype = build_qualified_type (void_type_node, TYPE_QUAL_CONST
>| TYPE_QUAL_VOLATILE);
>cv_argtype = build_pointer_type (cv_argtype);


Re: [PATCH 3/3] aarch64: Add attributes to the data intrinsics.

2024-11-29 Thread Richard Sandiford
Andrew Pinski  writes:
> None of the data intrinsics read or write memory, nor are they fp related.
> So adding the attributes will improve the code generation slightly.
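
For reference, a portable model of what two of these builtins compute, as a sketch rather than anything taken from the patch. `model_rbit32` and `model_rev16_32` are hypothetical names. The `const` attribute used here is the generic GCC function attribute; it is only an approximation of whatever aarch64_get_attributes (FLAG_NONE, ...) attaches, which this message does not spell out. The point is that the result depends on the argument alone, so repeated calls with the same input can be combined.

```c
#include <assert.h>
#include <stdint.h>

/* Model of RBIT on a 32-bit value: reverse the order of all 32 bits.
   The result depends only on X, hence the const attribute.  */
static uint32_t __attribute__ ((const))
model_rbit32 (uint32_t x)
{
  uint32_t result = 0;
  for (int i = 0; i < 32; i++)
    {
      result = (result << 1) | (x & 1u);
      x >>= 1;
    }
  return result;
}

/* Model of REV16 on a 32-bit value: swap the two bytes inside each
   16-bit halfword, leaving the halfwords themselves in place.  */
static uint32_t __attribute__ ((const))
model_rev16_32 (uint32_t x)
{
  return ((x & 0x00ff00ffu) << 8) | ((x & 0xff00ff00u) >> 8);
}

/* Small self-check for the two models above.  */
static void
model_self_check (void)
{
  assert (model_rbit32 (1u) == 0x80000000u);
  assert (model_rev16_32 (0x11223344u) == 0x22114433u);
  /* Both operations are their own inverse.  */
  assert (model_rbit32 (model_rbit32 (0xdeadbeefu)) == 0xdeadbeefu);
  assert (model_rev16_32 (model_rev16_32 (0xdeadbeefu)) == 0xdeadbeefu);
}
```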
>
> Built and tested for aarch64-linux-gnu
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-builtins.cc (aarch64_init_data_intrinsics): 
> Call
>   aarch64_get_attributes and update calls to aarch64_general_add_builtin.
>
> Signed-off-by: Andrew Pinski 

OK, thanks.

Richard

> ---
>  gcc/config/aarch64/aarch64-builtins.cc | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> b/gcc/config/aarch64/aarch64-builtins.cc
> index 9705f2de090..bc1719adbaa 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -2162,6 +2162,8 @@ aarch64_init_ls64_builtins (void)
>  static void
>  aarch64_init_data_intrinsics (void)
>  {
> +  /* These intrinsics are not fp nor they read/write memory. */
> +  tree attrs = aarch64_get_attributes (FLAG_NONE, SImode);
>tree uint32_fntype = build_function_type_list (uint32_type_node,
>uint32_type_node, NULL_TREE);
>tree ulong_fntype = build_function_type_list (long_unsigned_type_node,
> @@ -2171,22 +2173,22 @@ aarch64_init_data_intrinsics (void)
>uint64_type_node, NULL_TREE);
>aarch64_builtin_decls[AARCH64_REV16]
>  = aarch64_general_add_builtin ("__builtin_aarch64_rev16", uint32_fntype,
> -AARCH64_REV16);
> +AARCH64_REV16, attrs);
>aarch64_builtin_decls[AARCH64_REV16L]
>  = aarch64_general_add_builtin ("__builtin_aarch64_rev16l", ulong_fntype,
> -AARCH64_REV16L);
> +AARCH64_REV16L, attrs);
>aarch64_builtin_decls[AARCH64_REV16LL]
>  = aarch64_general_add_builtin ("__builtin_aarch64_rev16ll", 
> uint64_fntype,
> -AARCH64_REV16LL);
> +AARCH64_REV16LL, attrs);
>aarch64_builtin_decls[AARCH64_RBIT]
>  = aarch64_general_add_builtin ("__builtin_aarch64_rbit", uint32_fntype,
> -AARCH64_RBIT);
> +AARCH64_RBIT, attrs);
>aarch64_builtin_decls[AARCH64_RBITL]
>  = aarch64_general_add_builtin ("__builtin_aarch64_rbitl", ulong_fntype,
> -AARCH64_RBITL);
> +AARCH64_RBITL, attrs);
>aarch64_builtin_decls[AARCH64_RBITLL]
>  = aarch64_general_add_builtin ("__builtin_aarch64_rbitll", uint64_fntype,
> -AARCH64_RBITLL);
> +AARCH64_RBITLL, attrs);
>  }
>  
>  /* Implement #pragma GCC aarch64 "arm_acle.h".  */


Re: [PATCH v2 0/8] aarch64: Enable C/C++ operations on SVE ACLE types.

2024-11-29 Thread Tejas Belagod

On 11/18/24 7:09 PM, Richard Sandiford wrote:

Tejas Belagod  writes:

Hi,

This is v2 of the series
   https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667743.html

based on review comments. Changes in this version include:

1. Canonicalised all index ranges for VLAs to BIT_FIELD_REF.
2. Added more initialization error tests.
3. Merged an intermediate state patch into the final state.
4. Removed duplication in new test cops.c and added tests for
constructors with variable elements.

A couple of FE patches are yet to be reviewed but just thought I should
get the ones that are fixed out while waiting for reviews.


OK for the aarch64 bits.

I responded to patch 4 individually, but otherwise the rest of the series
looks good to me as well.  OK for the generic bits in a week's time,
with the change suggested for patch 4, if there are no other reviews
before then.



Thanks for the reviews, Richard. I've now pushed this series to master.

Thanks,
Tejas.



Thanks,
Richard


This patchset enables C/C++ operations on SVE ACLE types.  The changes give
operations on SVE ACLE types the same semantics as GNU vector types.
Operations such as +, -, & and | behave exactly as they would on GNU vector
types.  The operations are self-contained, in that we still don't allow
mixing GNU and SVE vector types in, for example, binary operations, because
the type of the expression is ambiguous and this causes PCS issues.

Other operations like implicit conversions behave as they would with GNU
vectors, i.e.

gnu_uv = sv_uv; // This is possible as long as the size, shape and
                // element-signedness of both vectors are the same.
gnu_uv = sv_sv; // Error as implicit conversion from signed to unsigned is
                // not possible even though size and shape may be similar.

Such assignments would have to go through an explicit cast:

gnu_uv = (gnu_uv)sv_sv;
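
The GNU vector behaviour being mirrored can be tried on any target. The sketch below uses only fixed-length GNU vector types, since the SVE types need an SVE-enabled AArch64 compiler; `v4si`, `v4ui` and the function names are hypothetical, chosen for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Two GNU vector types with the same size and shape but different
   element signedness.  */
typedef int32_t v4si __attribute__ ((vector_size (16)));
typedef uint32_t v4ui __attribute__ ((vector_size (16)));

/* Same size, shape and element signedness: plain assignment works.  */
static v4ui
copy_unsigned (v4ui src)
{
  v4ui dst = src;
  return dst;
}

/* Signed to unsigned: the conversion is spelled with an explicit cast.
   Between same-sized vector types the cast reinterprets the bits.  */
static v4ui
signed_to_unsigned (v4si src)
{
  return (v4ui) src;
}

/* Small self-check for the two helpers above.  */
static void
check_conversions (void)
{
  v4si s = { -1, 0, 1, 2 };
  v4ui u = copy_unsigned (signed_to_unsigned (s));
  assert (u[0] == UINT32_MAX);
  assert (u[3] == 2u);
}
```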

The following unary operations are supported:

   sve_type_var[0];
   &sve_type_var[0];
   sve_type_var[n];
   &sve_type_var[n];
   +sve_type_var;
   -sve_type_var;
   ~sve_type_var;
   !sve_type_var; /* Allowed in C++ */
   *sve_type_var; /* Error! */
   __real sve_type_var; /* Error! */
   __imag sve_type_var; /* Error! */
   ++sve_type_var;
   --sve_type_var;
   sve_type_var++;
   sve_type_var--;

Following binary ops are supported:

   sve_type_var + sve_type_var;
   sve_type_var - sve_type_var;
   sve_type_var * sve_type_var;
   sve_type_var / sve_type_var;
   sve_type_var % sve_type_var;
   sve_type_var & sve_type_var;
   sve_type_var | sve_type_var;
   sve_type_var ^ sve_type_var;
   sve_type_var == sve_type_var;
   sve_type_var != sve_type_var;
   sve_type_var <= sve_type_var;
   sve_type_var < sve_type_var;
   sve_type_var > sve_type_var;
   sve_type_var >= sve_type_var;
   sve_type_var << sve_type_var;
   sve_type_var >> sve_type_var;
   sve_type_var && sve_type_var; /* Allowed in C++ */
   sve_type_var || sve_type_var; /* Allowed in C++ */

/* Vector-scalar binary arithmetic. The reverse is also supported
eg.  + sve_type_var  */

   sve_type_var + ;
   sve_type_var - ;
   sve_type_var * ;
   sve_type_var / ;
   sve_type_var % ;
   sve_type_var & ;
   sve_type_var | ;
   sve_type_var ^ ;
   sve_type_var == ;
   sve_type_var != ;
   sve_type_var <= ;
   sve_type_var < ;
   sve_type_var > ;
   sve_type_var >= ;
   sve_type_var << ;
   sve_type_var >> ;
   sve_type_var && ; /* Allowed in C++ */
   sve_type_var || ; /* Allowed in C++ */
   sve_type_var + ;
   sve_type_var - ;
   sve_type_var * ;
   sve_type_var / ;
   sve_type_var % ;
   sve_type_var & ;
   sve_type_var | ;
   sve_type_var ^ ;
   sve_type_var == ;
   sve_type_var != ;
   sve_type_var <= ;
   sve_type_var < ;
   sve_type_var > ;
   sve_type_var >= ;
   sve_type_var << ;
   sve_type_var >> ;
   sve_type_var && ; /* Allowed in C++ */
   sve_type_var || ; /* Allowed in C++ */

Ternary operations:

? sve_type_var : sve_type_var;

   sve_type_var ? sve_type_var : sve_type_var; /* Allowed in C++ */

Builtins:

   /* Vector built-ins.  */

   __builtin_shuffle (sve_type_var, sve_type_var, sve_type_var);
   __builtin_convertvector (sve_type_var, );

These operations are supported for both fixed length and variable length
vectors.
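
For anyone who wants to see the operator semantics in action without an SVE toolchain, here is a sketch using a fixed-length GNU vector type, which the SVE types are meant to match. `v4si`, `lanewise_madd` and `count_less` are hypothetical names for this illustration only.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Vector-vector and vector-scalar arithmetic: computes a * b + c in each
   lane, with the scalar C broadcast to every lane.  */
static v4si
lanewise_madd (v4si a, v4si b, int32_t c)
{
  return a * b + c;
}

/* Vector comparison: each lane of the result is -1 where the comparison
   holds and 0 where it does not.  Counts the lanes where a < b.  */
static int
count_less (v4si a, v4si b)
{
  v4si mask = a < b;
  int count = 0;
  for (int i = 0; i < 4; i++)
    count -= mask[i];
  return count;
}

/* Small self-check for the two helpers above.  */
static void
check_lanewise_ops (void)
{
  v4si a = { 1, 2, 3, 4 };
  v4si b = { 4, 3, 2, 1 };
  v4si r = lanewise_madd (a, b, 10);
  assert (r[0] == 14 && r[1] == 16 && r[2] == 16 && r[3] == 14);
  assert (count_less (a, b) == 2);
}
```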

One outstanding fail:
PASS->FAIL: g++.dg/ext/sve-sizeless-1.C  -std=gnu++11  (test for errors, line 163)

I've left another outstanding fail as is: the test where the address of an SVE
vector element is taken.  I'm not sure what the behaviour should be here.

Otherwise regression tested and bootstrapped on aarch64-linux-gnu.  Bootstrapped
on x86-linux-gnu.

OK for trunk?

Thanks,
Tejas.

Tejas Belagod (8):
   aarch64: Fix ACLE macro __ARM_FEATURE_SVE_VECTOR_OPERATORS
   aarch64: Make C/C++ operations possible on SVE ACLE types.
   c: Range-check indexing of SVE ACLE vectors
   gimple: Handle variable-sized vectors in BIT_FIELD_REF
   c: Fix constructor bounds checking for VLA and construct VLA vector
 constants
   aarch64: Add tes

Re: [PATCH] aarch64: Mark __builtin_aarch64_im_lane_boundsi as leaf and nothrow [PR117665]

2024-11-29 Thread Richard Sandiford
Andrew Pinski  writes:
> __builtin_aarch64_im_lane_boundsi is known not to throw or call back into
> another function, since it will either be folded into a NOP or produce a
> compiler error.
>
> This fixes the ICE by fixing the missed optimization. It does not fix the
> underlying issue with fold_marked_statements, which I filed as PR 117668.
>
> Built and tested for aarch64-linux-gnu.
>
>   PR target/117665
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-builtins.cc 
> (aarch64_init_simd_builtin_functions):
>   Pass nothrow and leaf as attributes to aarch64_general_add_builtin for
>   __builtin_aarch64_im_lane_boundsi.
>
> gcc/testsuite/ChangeLog:
>
>   * g++.target/aarch64/lane-bound-1.C: New test.
>   * gcc.target/aarch64/lane-bound-3.c: New test.
>
> Signed-off-by: Andrew Pinski 

Yeah, ok.  I'm a bit nervous about making this builtin easier to optimise,
but the real fix for that of course is to do the checking in the frontend
(as for the new pragma-based approach).  The C++ test is convincing that
(a) we do still emit the error for obvious dead code and (b) not marking
the function has a user-visible effect.

Thanks,
Richard

> ---
>  gcc/config/aarch64/aarch64-builtins.cc|  6 -
>  .../g++.target/aarch64/lane-bound-1.C | 21 +++
>  .../gcc.target/aarch64/lane-bound-3.c | 27 +++
>  3 files changed, 53 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.target/aarch64/lane-bound-1.C
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/lane-bound-3.c
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> b/gcc/config/aarch64/aarch64-builtins.cc
> index b860e22f01f..e26ee323a2d 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -1482,10 +1482,14 @@ aarch64_init_simd_builtin_functions (bool 
> called_from_pragma)
> size_type_node,
> intSI_type_node,
> NULL);
> +  /* aarch64_im_lane_boundsi should be leaf and nothrow as it
> +  is expanded as nop or will cause an user error.  */
> +  tree attrs = aarch64_add_attribute ("nothrow", NULL_TREE);
> +  attrs = aarch64_add_attribute ("leaf", attrs);
>aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_LANE_CHECK]
>   = aarch64_general_add_builtin ("__builtin_aarch64_im_lane_boundsi",
>  lane_check_fpr,
> -AARCH64_SIMD_BUILTIN_LANE_CHECK);
> +AARCH64_SIMD_BUILTIN_LANE_CHECK, attrs);
>  }
>  
>for (i = 0; i < ARRAY_SIZE (aarch64_simd_builtin_data); i++, fcode++)
> diff --git a/gcc/testsuite/g++.target/aarch64/lane-bound-1.C 
> b/gcc/testsuite/g++.target/aarch64/lane-bound-1.C
> new file mode 100644
> index 000..cb3e99816a1
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/aarch64/lane-bound-1.C
> @@ -0,0 +1,21 @@
> +// { dg-do compile }
> +// { dg-options "" }
> +#include 
> +
> +// vgetq_lane_u64 should not cause any
> +// exceptions to thrown so even at -O0
> +// removeme should have been removed.
> +void removeme()
> +__attribute__((error("nothrow")));
> +int _setjmp();
> +void hh(uint64x2_t c, int __b)
> +{
> +  try {
> +vgetq_lane_u64(c, __b);
> +// { dg-error "must be a constant immediate" "" { target *-*-* } 0 }
> +  } catch (...)
> +  {
> +removeme(); // { dg-bogus "declared with attribute error" }
> +  }
> +}
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/lane-bound-3.c 
> b/gcc/testsuite/gcc.target/aarch64/lane-bound-3.c
> new file mode 100644
> index 000..9e0dad372cb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/lane-bound-3.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +/* PR target/117665 */
> +/* __builtin_aarch64_im_lane_boundsi was causing an abnormal
> +   edge to the setjmp but then the builtin was folded into a nop
> +   and that edge was never removed but the edge was not needed in
> +   the first place. */
> +
> +#include 
> +
> +__attribute__((always_inline))
> +static inline
> +void h(uint64x2_t c, int __b) {
> +   /* Use vgetq_lane_u64 to get a 
> + __builtin_aarch64_im_lane_boundsi */
> +   vgetq_lane_u64(c, __b);
> +
> +  __builtin_unreachable();
> +}
> +
> +int _setjmp();
> +void hh(uint64x2_t c) {
> +  int __b = 0;
> +  if (_setjmp())
> +h(c, 0);
> +}


Re: [PATCH] wwwdocs: Align the DCO text for the GNU Toolchain to match community usage.

2024-11-29 Thread Carlos O'Donell
On 11/21/24 12:04 PM, Carlos O'Donell wrote:
> Adjust the DCO text to match the broader community usage including
> the Linux kernel use around "real names."

Ping. :-)

The glibc contribution checklist text has been updated:
https://sourceware.org/glibc/wiki/Contribution%20checklist

The binutils contribution checklist did not need updating.

I posted a status update to the projects here:
https://inbox.sourceware.org/libc-alpha/24b26a75-3b3d-4a00-af92-0b7c331e2...@redhat.com/
 
> These changes clarify what was meant by "real name" and that it is
> not required to be a "legal name" or any other stronger requirement
> than a known identity that could be contacted to discuss the
> contribution.
> 
> Link: 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/Documentation/process/submitting-patches.rst?id=d4563201f33a022fc0353033d9dfeb1606a88330
> Link: https://github.com/cncf/foundation/blob/659fd32c86dc/dco-guidelines.md
> Link: https://github.com/cncf/foundation/issues/383
> ---
>  htdocs/dco.html | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/htdocs/dco.html b/htdocs/dco.html
> index 68fa183b..7f6cb882 100644
> --- a/htdocs/dco.html
> +++ b/htdocs/dco.html
> @@ -54,8 +54,8 @@ then you just add a line saying:
>  
>  Signed-off-by: Random J Developer 
> 
>  
> -using your real name (sorry, no pseudonyms or anonymous contributions.)  This
> -will be done for you automatically if you use `git commit -s`.
> +using a known identity (sorry, no anonymous contributions.)
> +This will be done for you automatically if you use `git commit -s`.
>  
>  Some people also put extra optional tags at the end.  The GCC project does
>  not require tags from anyone other than the original author of the patch, but

-- 
Cheers,
Carlos.



[PATCH v1] Match: Refactor the unsigned SAT_SUB match patterns [NFC]

2024-11-29 Thread pan2 . li
From: Pan Li 

This patch refactors all the unsigned SAT_SUB patterns, namely:
* Hoist the type check out of the individual patterns.
* Group the related match pattern forms together.

The following test suites pass with this patch:
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.
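
For readers less familiar with the forms being grouped, here is a minimal C++ sketch of three of the equivalent unsigned SAT_SUB formulations that the patterns in the diff recognise. The function names are made up for this example and do not appear in match.pd, and the overflow form is modelled by hand (using `x < y` as the overflow flag) rather than via the internal SUB_OVERFLOW function. The masked variant follows the bit_and in the pattern itself.

```cpp
#include <cstdint>

// SAT_U_SUB = X > Y ? X - Y : 0
constexpr std::uint32_t sat_sub_cond(std::uint32_t x, std::uint32_t y) {
  return x > y ? x - y : 0u;
}

// SAT_U_SUB = (X - Y) * (X > Y)
// The comparison yields 0 or 1, so a wrapped difference is multiplied away.
constexpr std::uint32_t sat_sub_mult(std::uint32_t x, std::uint32_t y) {
  return (x - y) * static_cast<std::uint32_t>(x > y);
}

// Overflow-flag form: the flag is 1 when X - Y wrapped, so (flag - 1) is
// all-ones when there was no wrap and zero when there was.
constexpr std::uint32_t sat_sub_mask(std::uint32_t x, std::uint32_t y) {
  return (x - y) & (static_cast<std::uint32_t>(x < y) - 1u);
}
```

All three return the same value for every pair of inputs, which is why they can share a single unsigned_integer_sat_sub match.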

gcc/ChangeLog:

* match.pd: Refactor sorts of unsigned SAT_SUB match patterns.

Signed-off-by: Pan Li 
---
 gcc/match.pd | 203 ++-
 1 file changed, 89 insertions(+), 114 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 2dd67b69cf1..94202322602 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3160,6 +3160,95 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 integer_minus_onep (realpart @2))
   (if (types_match (type, @0) && int_fits_type_p (@1, type)
 
+(if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type))
+ (match (unsigned_integer_sat_sub @0 @1)
+  /* SAT_U_SUB = X > Y ? X - Y : 0  */
+  (cond^ (gt @0 @1) (minus @0 @1) integer_zerop)
+  (if (types_match (type, @0, @1
+ (match (unsigned_integer_sat_sub @0 @1)
+  /* SAT_U_SUB = X >= Y ? X - Y : 0  */
+  (cond^ (ge @0 @1) (convert? (minus (convert1? @0) (convert1? @1)))
+   integer_zerop)
+  (if (TYPE_UNSIGNED (TREE_TYPE (@0)) && types_match (@0, @1
+ (match (unsigned_integer_sat_sub @0 @1)
+  /* SAT_U_SUB = (X - Y) * (X > Y)  */
+  (mult:c (minus @0 @1) (convert (gt @0 @1)))
+  (if (types_match (type, @0, @1
+ (match (unsigned_integer_sat_sub @0 @1)
+  /* SAT_U_SUB = (X - Y) * (X >= Y)  */
+  (mult:c (minus @0 @1) (convert (ge @0 @1)))
+  (if (types_match (type, @0, @1
+ (match (unsigned_integer_sat_sub @0 @1)
+  /* DIFF = SUB_OVERFLOW (X, Y)
+ SAT_U_SUB = REALPART (DIFF) | (IMAGPART (DIFF) + (-1))  */
+  (bit_and:c (realpart (IFN_SUB_OVERFLOW@2 @0 @1))
+   (plus (imagpart @2) integer_minus_onep))
+  (if (types_match (type, @0, @1
+ (match (unsigned_integer_sat_sub @0 @1)
+  /* DIFF = SUB_OVERFLOW (X, Y)
+ SAT_U_SUB = REALPART (DIFF) * (IMAGPART (DIFF) ^ (1))  */
+  (mult:c (realpart (IFN_SUB_OVERFLOW@2 @0 @1))
+   (bit_xor (imagpart @2) integer_onep))
+  (if (types_match (type, @0, @1
+ (match (unsigned_integer_sat_sub @0 @1)
+  /* DIFF = SUB_OVERFLOW (X, Y)
+ SAT_U_SUB = IMAGPART (DIFF) == 0 ? REALPART (DIFF) : 0  */
+  (cond^ (eq (imagpart (IFN_SUB_OVERFLOW@2 @0 @1)) integer_zerop)
+(realpart @2) integer_zerop)
+  (if (types_match (type, @0, @1
+ (match (unsigned_integer_sat_sub @0 @1)
+  /* DIFF = SUB_OVERFLOW (X, Y)
+ SAT_U_SUB = IMAGPART (DIFF) != 0 ? 0 : REALPART (DIFF)  */
+  (cond^ (ne (imagpart (IFN_SUB_OVERFLOW@2 @0 @1)) integer_zerop)
+integer_zerop (realpart @2))
+  (if (types_match (type, @0, @1
+ (match (unsigned_integer_sat_sub @0 @1)
+  /* SAT_U_SUB = IMM > Y ? (IMM - Y) : 0
+ SAT_U_SUB = IMM >= Y ? (IMM - Y) : 0  */
+  (cond^ (le @1 INTEGER_CST@2) (minus INTEGER_CST@0 @1) integer_zerop)
+  (if (types_match (type, @1) && int_fits_type_p (@0, type))
+   (with
+{
+ unsigned precision = TYPE_PRECISION (type);
+ wide_int max = wi::mask (precision, false, precision);
+ wide_int c0 = wi::to_wide (@0);
+ wide_int c2 = wi::to_wide (@2);
+ wide_int c2_add_1 = wi::add (c2, wi::uhwi (1, precision));
+ bool equal_p = wi::eq_p (c0, c2);
+ bool less_than_1_p = !wi::eq_p (c2, max) && wi::eq_p (c2_add_1, c0);
+}
+(if (equal_p || less_than_1_p)
+ (match (unsigned_integer_sat_sub @0 @1)
+  /* SAT_U_SUB = (MAX - 1) >= Y ? ((MAX - 1) - Y) : 0  */
+  (cond^ (ne @1 INTEGER_CST@2) (minus INTEGER_CST@0 @1) integer_zerop)
+  (if (types_match (type, @1))
+   (with
+{
+ unsigned precision = TYPE_PRECISION (type);
+ wide_int max = wi::mask (precision, false, precision);
+ wide_int c0 = wi::to_wide (@0);
+ wide_int c2 = wi::to_wide (@2);
+ wide_int c0_add_1 = wi::add (c0, wi::uhwi (1, precision));
+}
+(if (wi::eq_p (c2, max) && wi::eq_p (c0_add_1, max))
+ (match (unsigned_integer_sat_sub @0 @1)
+  /* SAT_U_SUB = 1 >= Y ? (1 - Y) : 0  */
+  (cond^ (le @1 integer_onep@0) (bit_xor @1 integer_onep@0) integer_zerop)
+  (if (types_match (type, @1
+ (match (unsigned_integer_sat_sub @0 @1)
+  /* SAT_U_SUB = X > IMM  ? (X - IMM) : 0.
+ SAT_U_SUB = X >= IMM ? (X - IMM) : 0.  */
+  (plus (max @0 INTEGER_CST@1) INTEGER_CST@2)
+  (if (types_match (type, @1) && int_fits_type_p (@1, type))
+   (with
+{
+ unsigned precision = TYPE_PRECISION (type);
+ wide_int c1 = wi::to_wide (@1);
+ wide_int c2 = wi::to_wide (@2);
+ wide_int sum = wi::add (c1, c2);
+}
+(if (wi::eq_p (sum, wi::uhwi (0, precision
+
 /* Signed saturation add, case 1:
T sum = (T)((UT)X + (UT)Y)
SAT_S_ADD = (X ^ sum) & !(X ^ Y) < 0 ? (-(T)(X < 0) ^ MAX) : sum;
@@ -3245,120 +3334,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)
  && wi::bit_and (wi::to_wide (@1), wi::to_wide (@

Re: [PATCH] arm, testsuite: Adjust Arm tests after c23 changes

2024-11-29 Thread Christophe Lyon
FTR this patch is superseded by Andre's patch:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/670378.html

On Thu, 28 Nov 2024 at 11:12, Christophe Lyon
 wrote:
>
> After the recent C23 changes, GCC complains because the testcase calls f()
> with an argument whereas the prototype has none.
>
> gcc/testsuite/ChangeLog
> * gcc.target/arm/mve/dlstp-loop-form.c: Fix f() prototype.
> ---
>  gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c 
> b/gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c
> index 08811cef568..3039ee8f686 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c
> @@ -6,7 +6,7 @@
>  #pragma GCC arm "arm_mve.h" false
>  typedef __attribute__((aligned(2))) float16x8_t e;
>  mve_pred16_t c(long d) { return __builtin_mve_vctp16qv8bi(d); }
> -int f();
> +int f(e);
>  void n() {
>int g, h, *i, j;
>mve_pred16_t k;
> --
> 2.34.1
>


[PATCH] tree-optimization/115438 - SLP reduction vect vs. bwaves

2024-11-29 Thread Richard Biener
503.bwaves_r shows a case where the non-SLP optimization of performing
the reduction adjustment with the initial value as part of the epilogue,
rather than including it as part of the initial vector value, matters:
it allows breaking a critical dependence path.  The following restores
this ability for single-lane SLP.

On Zen2 this turns a 2.5% regression from GCC 14 into a 2.5%
improvement.
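
To make the two strategies concrete, here is a small scalar C++ model of a 4-lane sum reduction. It is only a sketch: kLanes, the function names and the sample data are invented for illustration, integers are used so that both variants give exactly the same result, and nothing here is taken from tree-vect-loop.cc.

```cpp
#include <cstddef>

constexpr std::size_t kLanes = 4;  // hypothetical vector width

// Shared model of the vector loop: lane 0 starts at lane0_init, the other
// lanes start at the neutral element 0, and the lanes are summed at the end.
constexpr long reduce_lanes(const long *a, std::size_t n, long lane0_init) {
  long acc[kLanes] = {lane0_init, 0, 0, 0};
  std::size_t i = 0;
  for (; i + kLanes <= n; i += kLanes)
    for (std::size_t l = 0; l < kLanes; ++l)
      acc[l] += a[i + l];
  long sum = 0;
  for (std::size_t l = 0; l < kLanes; ++l)
    sum += acc[l];
  for (; i < n; ++i)  // scalar tail for leftover elements
    sum += a[i];
  return sum;
}

// Variant A: the initial value is part of the initial vector.
constexpr long reduce_init_in_vector(const long *a, std::size_t n, long init) {
  return reduce_lanes(a, n, init);
}

// Variant B: the vector starts from the neutral element and the initial
// value is applied as an adjustment after the reduction.
constexpr long reduce_adjust_in_epilogue(const long *a, std::size_t n, long init) {
  return reduce_lanes(a, n, 0) + init;
}

constexpr long kSample[8] = {1, 2, 3, 4, 5, 6, 7, 8};
```

In variant B the loop does not depend on the initial value at all, at the cost of keeping that value live until the epilogue, which is the trade-off the patch comment describes.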

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/115438
* tree-vect-loop.cc (vect_transform_cycle_phi): For SLP also
try to do the reduction adjustment by the initial value
in the epilogue.
---
 gcc/tree-vect-loop.cc | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 8c9be48ef0f..5a24fb8bf4c 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -9193,6 +9193,20 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
  tree neutral_op
= neutral_op_for_reduction (TREE_TYPE (vectype_out),
code, initial_value);
+ /* Try to simplify the vector initialization by applying an
+adjustment after the reduction has been performed.  This
+can also break a critical path but on the other hand
+requires to keep the initial value live across the loop.  */
+ if (neutral_op
+ && initial_values.length () == 1
+ && !reduc_info->reused_accumulator
+ && STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def
+ && !operand_equal_p (neutral_op, initial_values[0]))
+   {
+ STMT_VINFO_REDUC_EPILOGUE_ADJUSTMENT (reduc_info)
+   = initial_values[0];
+ initial_values[0] = neutral_op;
+   }
  get_initial_defs_for_reduction (loop_vinfo, reduc_info,
  &vec_initial_defs, vec_num,
  stmts.length (), neutral_op);
-- 
2.43.0


Re: [PATCH v5 5/5] aarch64: add SVE2 FP8DOT2 and FP8DOT4 intrinsics

2024-11-29 Thread Richard Sandiford
Thanks for the update!

Claudio Bantaloukas  writes:
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 2a4f016e2df..f7440113570 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -21957,6 +21957,18 @@ Enable the fp8 (8-bit floating point) multiply 
> accumulate extension.
>  @item ssve-fp8fma
>  Enable the fp8 (8-bit floating point) multiply accumulate extension in 
> streaming
>  mode.
> +@item fp8dot4
> +Enable the fp8 (8-bit floating point) to single-precision 4-way dot product
> +extension.
> +@item ssve-fp8dot4
> +Enable the fp8 (8-bit floating point) to single-precision 4-way dot product
> +extension in streaming mode.
> +@item fp8dot2
> +Enable the fp8 (8-bit floating point) o half-precision 2-way dot product
> +extension.

typo: s/o/to/ (and below).

Since the change is so trivial, I made it locally, tweaked the ordering
of the svcvt entries in patch 3, and fixed some whitespace issues that
git am was complaining about.  Pushed to trunk with those changes.

Now that you've had at least two series applied, could you follow the
process on https://gcc.gnu.org/gitwrite.html to get write access for
future patches?  (I'll sponsor.)

Thanks,
Richard

> +@item ssve-fp8dot2
> +Enable the fp8 (8-bit floating point) o half-precision 2-way dot product
> +extension in streaming mode.
>  @item faminmax
>  Enable the Floating Point Absolute Maximum/Minimum extension.
>  @item sve-b16b16
> diff --git 
> a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c
> new file mode 100644
> index 000..9ad789a8ad2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +
> +#include 
> +
> +#pragma GCC target ("arch=armv8.2-a+sve2+fp8dot2")
> +
> +void
> +test (svfloat16_t f16, svmfloat8_t f8, fpm_t fpm, 
> +svbool_t pg, svuint8_t u8, svuint16_t u16, svint32_t s32,
> +svbfloat16_t bf16, svfloat32_t f32, svfloat64_t f64, mfloat8_t f)
> +{
> +  svdot_fpm (f16, f8, f8, fpm);
> +  svdot_fpm (f32, f8, f8, fpm);
> +
> +  svdot_fpm (f16); /* { dg-error {too few arguments to function 'svdot_fpm'} 
> } */
> +  svdot_fpm (f16, f8); /* { dg-error {too few arguments to function 
> 'svdot_fpm'} } */
> +  svdot_fpm (f16, f8, f8); /* { dg-error {too few arguments to function 
> 'svdot_fpm'} } */
> +  svdot_fpm (f8, f8, fpm); /* { dg-error {too few arguments to function 
> 'svdot_fpm'} } */
> +  svdot_fpm (f16, f8, fpm); /* { dg-error {too few arguments to function 
> 'svdot_fpm'} } */
> +  svdot_fpm (f16, f8, f8, fpm, 0); /* { dg-error {too many arguments to 
> function 'svdot_fpm'} } */
> +
> +  svdot_fpm (0, f8, f8, fpm); /* { dg-error {passing 'int' to argument 1 of 
> 'svdot_fpm', which expects an SVE type rather than a scalar} } */
> +  svdot_fpm (f16, f8, f, fpm); /* { dg-error {passing 'mfloat8_t' {aka 
> '__mfp8'} to argument 3 of 'svdot_fpm', which expects 'svmfloat8_t'} } */
> +  svdot_fpm (pg, f8, f8, fpm); /* { dg-error {'svdot_fpm' has no form that 
> takes 'svbool_t' and 'svmfloat8_t' arguments} } */
> +  svdot_fpm (u8, f8, f8, fpm); /* { dg-error {'svdot_fpm' has no form that 
> takes 'svuint8_t' and 'svmfloat8_t' arguments} } */
> +  svdot_fpm (u16, f8, f8, fpm); /* { dg-error {'svdot_fpm' has no form that 
> takes 'svuint16_t' and 'svmfloat8_t' arguments} } */
> +  svdot_fpm (f64, f8, f8, fpm); /* { dg-error {'svdot_fpm' has no form that 
> takes 'svfloat64_t' and 'svmfloat8_t' arguments} } */
> +  svdot_fpm (f16, 0, f8, fpm); /* { dg-error {passing 'int' to argument 2 of 
> 'svdot_fpm', which expects 'svmfloat8_t'} } */
> +  svdot_fpm (f16, f16, f8, fpm); /* { dg-error {passing 'svfloat16_t' to 
> argument 2 of 'svdot_fpm', which expects 'svmfloat8_t'} } */
> +  svdot_fpm (f16, f8, 0, fpm); /* { dg-error {passing 'int' to argument 3 of 
> 'svdot_fpm', which expects 'svmfloat8_t'} } */
> +  svdot_fpm (f16, f8, f16, fpm); /* { dg-error {passing 'svfloat16_t' to 
> argument 3 of 'svdot_fpm', which expects 'svmfloat8_t'} } */
> +  svdot_fpm (f16, f8, f8, f8); /* { dg-error {passing 'svmfloat8_t' to 
> argument 4 of 'svdot_fpm', which expects 'uint64_t'} } */
> +}
> diff --git 
> a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c
>  
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c
> new file mode 100644
> index 000..dec00e3abf1
> --- /dev/null
> +++ 
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c
> @@ -0,0 +1,49 @@
> +/* { dg-do compile } */
> +
> +#include 
> +
> +#pragma GCC target ("arch=armv8.2-a+ssve-fp8fma+ssve-fp8dot2")
> +
> +void
> +f1 (svfloat16_t f16, svmfloat8_t f8, fpm_t fpm, 
> +svbool_t pg, svuint8_t u8, svuint16_t u16, svint32_t s32,
> +svbfloat16_t bf16, svfloat32_t f32, svfloat64_t f64, mfloat8_t f, int i)
> +__arm_streami

[PATCH] aarch64: Add ISA requirements to some SVE/SME md comments

2024-11-29 Thread Richard Sandiford
Kyrylo Tkachov  writes:
> Hi Richard
>> On 6 Nov 2024, at 18:16, Richard Sandiford  wrote:
>> 
>> This series adds support for FEAT_SVE2p1 (-march=...+sve2p1).
>> One thing that the extension does is make some SME and SME2 instructions
>> available outside of streaming mode.  It also adds quite a few new
>> instructions.  Some of those new instructions are shared with SME2.1,
>> which will be added by a later patch.
>> 
>> Tested on aarch64-linux-gnu.  GNU binutils doesn't yet have full
>> support for SVE2.1, meaning that the aarch64_asm_sve2p1_ok target
>> selector fails and that the new aarch64-sve2-acle-asm.exp tests fall
>> back to "dg-do compile" instead of "dg-do assemble".  However, I also
>> tested aarch64-sve2-acle-asm.exp against LLVM's assembler using a
>> hacked-up script.
>> 
>> I also tried to cross-check GCC's implementation against LLVM's SVE2.1
>> ACLE tests.  There were some failures due to missing B16B16 support
>> (part of a separate follow-on series) and the fact that LLVM's stores
>> take pointers to const (raised separately), but otherwise things
>> seemed ok.
>> 
>> I'll commit this on Monday if there are no comments before then,
>> but please let me know if you'd like me to wait longer.  It will
>> likely need some minor updates due to conflicts with other
>> in-flight patches.
>
> Thanks for these!
> One suggestion I have is that for the patterns in aarch64-sve2.md we may want 
> to have SVE2p1 in the comments for the new instructions.
> For example, we have:
> ;; -
> ;;  [FP] Clamp to minimum/maximum
> ;; -
> ;; - BFCLAMP (SVE_B16B16)
> ;; - FCLAMP
> ;; -
> Which shows what extension BFCLAMP is a part of.
> I personally find these comments useful when quickly looking for what’s in 
> the base architecture and what’s not and it’d be nice to keep this scheme 
> going though I notice we’re already not totally consistent about it.
> I don’t insist on it, but it’s a suggestion.

Ah, yeah, good point.  I've applied the patch below to do that.

(And sorry for the delay.  I thought it would be fairer to wait until
Claudio's patches were in, to avoid creating more churn for that series.)

Thanks,
Richard



The SVE and SME md files are divided into sections, with each
section often starting with a comment that lists the associated
mnemonics.  These lists usually include the base architecture
requirement in parentheses, if the base requirement is greater
than the baseline for the file.  This patch tries to be more
consistent about when we do that for the recently added SVE2p1
and SME2p1 extensions.

gcc/
* config/aarch64/aarch64-sme.md: In the section comments, add the
architecture requirements alongside some mnemonics.
* config/aarch64/aarch64-sve2.md: Likewise.
---
 gcc/config/aarch64/aarch64-sme.md  |  72 ++---
 gcc/config/aarch64/aarch64-sve2.md | 162 +++--
 2 files changed, 121 insertions(+), 113 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sme.md 
b/gcc/config/aarch64/aarch64-sme.md
index 0f362671f75..e4562186bdd 100644
--- a/gcc/config/aarch64/aarch64-sme.md
+++ b/gcc/config/aarch64/aarch64-sme.md
@@ -533,7 +533,7 @@ (define_insn "@aarch64_sme_ldrn"
 ;;  Table loads
 ;; -
 ;; Includes:
-;; - LDR
+;; - LDR (SME2)
 ;; -
 
 (define_c_enum "unspec" [
@@ -635,7 +635,7 @@ (define_insn "@aarch64_sme_strn"
 ;;  Table stores
 ;; -
 ;; Includes:
-;; - STR
+;; - STR (SME2)
 ;; -
 
 (define_insn "aarch64_sme_str_zt0"
@@ -651,7 +651,7 @@ (define_insn "aarch64_sme_str_zt0"
 ;; -
 ;; Includes:
 ;; - MOVA
-;; - MOVAZ
+;; - MOVAZ (SME2p1)
 ;; -
 
 (define_insn "@aarch64_sme_"
@@ -813,7 +813,7 @@ (define_insn 
"@aarch64_sme_"
 ;; -
 ;; Includes:
 ;; - MOVA
-;; - MOVAZ
+;; - MOVAZ (SME2p1)
 ;; -
 
 (define_insn "@aarch64_sme_"
@@ -1140,12 +1140,12 @@ (define_insn "@aarch64_sme_"
 ;;  Binary arithmetic on ZA slice
 ;; -
 ;; Includes:
-;; - ADD
-;; - BFADD
-;; - BFSUB
-;; - FADD
-;; - FSUB
-;; - SUB
+;; - ADD (SME2)
+;; - BFADD (SME_B16B16)
+;; - BFSUB (SME_B16B16)
+;; - FADD (SME2)
+;; - FSUB (SME2)
+;; - SUB (SME2)
 ;; 

RE: [PATCH v2 4/4] vect: Disable `omp declare variant' tests for aarch64

2024-11-29 Thread Tamar Christina
Ping,

I'm filling in for Victor on the patch series.

Regards,
Tamar

> -----Original Message-----
> From: Victor Do Nascimento 
> Sent: Tuesday, November 5, 2024 12:38 AM
> To: gcc-patches@gcc.gnu.org
> Cc: ja...@redhat.com
> Subject: Re: [PATCH v2 4/4] vect: Disable `omp declare variant' tests for 
> aarch64
> 
> 
> cc'ing Jakub due to email address typo in original patch submission.
> 
> Apologies,
> Victor
> 
> Victor Do Nascimento  writes:
> 
> > gcc/testsuite/ChangeLog:
> >
> > * c-c++-common/gomp/declare-variant-14.c: Make i?86 and x86_64 target
> > only test.
> > * gfortran.dg/gomp/declare-variant-14.f90: Likewise.
> > ---
> >  gcc/testsuite/c-c++-common/gomp/declare-variant-14.c | 12 +---
> >  .../gfortran.dg/gomp/declare-variant-14.f90  | 10 --
> >  2 files changed, 9 insertions(+), 13 deletions(-)
> >
> > diff --git a/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c
> b/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c
> > index e3668893afe..2b71869787e 100644
> > --- a/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c
> > +++ b/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c
> > @@ -1,6 +1,6 @@
> > -/* { dg-do compile { target vect_simd_clones } } */
> > +/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && vect_simd_clones 
> > } } } */
> >  /* { dg-additional-options "-fdump-tree-gimple -fdump-tree-optimized" } */
> > -/* { dg-additional-options "-mno-sse3" { target { i?86-*-* x86_64-*-* } } 
> > } */
> > +/* { dg-additional-options "-mno-sse3" } */
> >
> >  int f01 (int);
> >  int f02 (int);
> > @@ -15,15 +15,13 @@ int
> >  test1 (int x)
> >  {
> >/* At gimplification time, we can't decide yet which function to call.  
> > */
> > -  /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" { target { 
> > !aarch64*-
> *-* } } } } */
> > +  /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" } } */
> >/* After simd clones are created, the original non-clone test1 shall
> >   call f03 (score 6), the sse2/avx/avx2 clones too, but avx512f clones
> >   shall call f01 with score 8.  */
> >/* { dg-final { scan-tree-dump-not "f04 \\\(x" "optimized" } } */
> > -  /* { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" { target 
> > {
> !aarch64*-*-* } } } } */
> > -  /* { dg-final { scan-tree-dump-times "f03 \\\(x" 10 "optimized" { target 
> > {
> aarch64*-*-* } } } } */
> > -  /* { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" { target {
> !aarch64*-*-* } } } } */
> > -  /* { dg-final { scan-tree-dump-times "f01 \\\(x" 0 "optimized" { target {
> aarch64*-*-* } } } } */
> > +  /* { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" } } */
> > +  /* { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" } } */
> >int a = f04 (x);
> >int b = f04 (x);
> >return a + b;
> > diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90
> b/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90
> > index 6319df0558f..8db341fd153 100644
> > --- a/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90
> > +++ b/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90
> > @@ -1,6 +1,6 @@
> > -! { dg-do compile { target vect_simd_clones } }
> > +! { dg-do compile { target { { i?86-*-* x86_64-*-* } && vect_simd_clones } 
> > } } */
> >  ! { dg-additional-options "-O0 -fdump-tree-gimple -fdump-tree-optimized" }
> > -! { dg-additional-options "-mno-sse3" { target { i?86-*-* x86_64-*-* } } }
> > +! { dg-additional-options "-mno-sse3" }
> >
> >  module main
> >implicit none
> > @@ -40,10 +40,8 @@ contains
> >  ! call f03 (score 6), the sse2/avx/avx2 clones too, but avx512f clones
> >  ! shall call f01 with score 8.
> >  ! { dg-final { scan-tree-dump-not "f04 \\\(x" "optimized" } }
> > -! { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" { 
> > target {
> !aarch64*-*-* } } } }
> > -! { dg-final { scan-tree-dump-times "f03 \\\(x" 6 "optimized" { target 
> > {
> aarch64*-*-* } } } }
> > -! { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" { target 
> > {
> !aarch64*-*-* } } } }
> > -! { dg-final { scan-tree-dump-times "f01 \\\(x" 0 "optimized" { target 
> > {
> aarch64*-*-* } } } }
> > +! { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" } }
> > +! { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" } }
> >  a = f04 (x)
> >  b = f04 (x)
> >  test1 = a + b


Re: [PATCH] aarch64: Extend SVE2 bit-select instructions for Neon modes.

2024-11-29 Thread Richard Sandiford
Kyrylo Tkachov  writes:
>> On 27 Nov 2024, at 09:34, Richard Sandiford  
>> wrote:
>> 
>> Soumya AR  writes:
>>> NBSL, BSL1N, and BSL2N are bit-select instructions on SVE2 with certain
>>> operands inverted. These can be extended to work with Neon modes.
>>> 
>>> Since these instructions are unpredicated, duplicate patterns were added
>>> with the predicate removed to generate these instructions for Neon modes.
>>> 
>>> The patch was bootstrapped and regtested on aarch64-linux-gnu with no
>>> regressions.
>>> OK for mainline?
>>> 
>>> Signed-off-by: Soumya AR 
>>> 
>>> gcc/ChangeLog:
>>> 
>>> * config/aarch64/aarch64-sve2.md
>>> (*aarch64_sve2_nbsl_unpred): New pattern to match unpredicated
>>> form.
>>> (*aarch64_sve2_bsl1n_unpred): Likewise.
>>> (*aarch64_sve2_bsl2n_unpred): Likewise.
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>> * gcc.target/aarch64/sve/bitsel.c: New test.
>> 
>> Thanks for the patch.  But since this is a new optimisation, and is not
>> fixing a regression, I'm not sure whether it would be appropriate during
>> stage 3.  Let's see what other maintainers say.
>
> IMO it’s not high risk but it’s a nice-to-have optimisation rather than 
> driven by a concrete motivating workload.
> Given that we have a few such patches (like the ASRD patch from Soumya) it 
> would be consistent to either take them all now or stage them all for GCC 16.

Yeah, agreed.  I'd chosen this patch somewhat arbitrarily, but it was
really a comment about the ongoing work in general.

> I’d be okay with deferring them to GCC 16 but would appreciate if they 
> received some feedback on the implementation beforehand so they can be 
> polished for next stage1.

Sure, will try to get to them soon.

I'm also not strongly opposed to the patches going in.  Full disclosure:
there are some bits of FP8 work that (despite our best efforts) slipped
into stage 3 due to unforeseen circumstances, and still need to be posted.
I'm hoping they can still go in, since the alternative would be to
disable all the existing FP8 work for GCC 15.

Given that, it probably seems hypocritical to push back on these SVE-for-
NEON patches.  The reason I did is that the work seems like an ongoing
project with no well-defined end point, so it seemed like the GCC 15
cut-off would have to be time-driven rather than feature-driven.

Thanks for all the work on this though -- it's definitely a useful project.

Richard


Re: [PATCH] c++, v2: Implement C++26 P3176R1 - The Oxford variadic comma

2024-11-29 Thread Jason Merrill

On 11/26/24 5:27 AM, Jonathan Wakely wrote:

On Tue, 26 Nov 2024 at 08:34, Jakub Jelinek  wrote:


On Mon, Nov 25, 2024 at 03:40:18PM -0500, Marek Polacek wrote:

Just "omitting a comma", I think.

"of a function parameter list"

I suppose we should also test -Wno-deprecated and/or
-Wno-deprecated-variadic-comma-omission in C++26.


Thanks, I've made those changes including adding more copies
of variadic-comma*.C with different options and diagnostics
expectations.
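
For context, a minimal C++ sketch of the two spellings the new warning distinguishes. The function names are hypothetical; the only claim made is that both declarations have the same function type, and that the first spelling is the one P3176R1 deprecates.

```cpp
#include <type_traits>

// Deprecated in C++26: the ellipsis is not preceded by a comma.
int log_old(const char *fmt...);

// Preferred spelling: ", ..." makes the C-style varargs explicit.
int log_new(const char *fmt, ...);

// Both spellings declare the same function type.
static_assert(std::is_same<decltype(log_old), decltype(log_new)>::value,
              "int(const char *...) and int(const char *, ...) are the same type");
```

A compiler with the patch applied would be expected to warn on the first declaration under -Wdeprecated-variadic-comma-omission, while older compilers accept both silently.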

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-11-26  Jakub Jelinek  

gcc/c-family/
 * c.opt: Implement C++26 P3176R1 - The Oxford variadic comma.
 (Wdeprecated-variadic-comma-omission): New option.
 * c.opt.urls: Regenerate.
 * c-opts.cc (c_common_post_options): Default to
 -Wdeprecated-variadic-comma-omission for C++26 or -Wpedantic.
gcc/cp/
 * parser.cc (cp_parser_parameter_declaration_clause): Emit
 -Wdeprecated-variadic-comma-omission warnings.
gcc/
 * doc/invoke.texi (-Wdeprecated-variadic-comma-omission): Document.
gcc/testsuite/
 * g++.dg/cpp26/variadic-comma1.C: New test.
 * g++.dg/cpp26/variadic-comma2.C: New test.
 * g++.dg/cpp26/variadic-comma3.C: New test.
 * g++.dg/cpp26/variadic-comma4.C: New test.
 * g++.dg/cpp26/variadic-comma5.C: New test.
 * g++.dg/cpp1z/fold10.C: Expect a warning for C++26.
 * g++.dg/ext/attrib33.C: Likewise.
 * g++.dg/cpp1y/lambda-generic-variadic19.C: Likewise.
 * g++.dg/cpp2a/lambda-generic10.C: Likewise.
 * g++.dg/cpp0x/lambda/lambda-const3.C: Likewise.
 * g++.dg/cpp0x/variadic164.C: Likewise.
 * g++.dg/cpp0x/variadic17.C: Likewise.
 * g++.dg/cpp0x/udlit-args-neg.C: Likewise.
 * g++.dg/cpp0x/variadic28.C: Likewise.
 * g++.dg/cpp0x/gen-attrs-33.C: Likewise.
 * g++.dg/cpp23/explicit-obj-diagnostics3.C: Likewise.
 * g++.old-deja/g++.law/operators15.C: Likewise.
 * g++.old-deja/g++.mike/p811.C: Likewise.
 * g++.old-deja/g++.mike/p12306.C (printf): Add , before ... .
 * g++.dg/analyzer/fd-bind-pr107783.C (bind): Likewise.
 * g++.dg/cpp0x/vt-65790.C (printf): Likewise.
libstdc++-v3/
 * include/std/functional (_Bind_check_arity): Add , before ... .
 * include/bits/refwrap.h (_Mem_fn_traits, _Weak_result_type_impl):
 Likewise.
 * include/tr1/type_traits (is_function): Likewise.

--- gcc/c-family/c.opt.jj   2024-11-22 19:52:19.477579338 +0100
+++ gcc/c-family/c.opt  2024-11-25 16:22:11.325028058 +0100
@@ -672,6 +672,10 @@ Wdeprecated-non-prototype
  C ObjC Var(warn_deprecated_non_prototype) Init(-1) Warning
  Warn about calls with arguments to functions declared without parameters.

+Wdeprecated-variadic-comma-omission
+C++ ObjC++ Var(warn_deprecated_variadic_comma_omission) Warning
+Warn about deprecated omission of comma before ... in variadic function declaration.
+
  Wdesignated-init
  C ObjC Var(warn_designated_init) Init(1) Warning
  Warn about positional initialization of structs requiring designated initializers.
--- gcc/c-family/c.opt.urls.jj  2024-11-18 21:59:35.872150269 +0100
+++ gcc/c-family/c.opt.urls 2024-11-25 17:38:25.396693802 +0100
@@ -310,6 +310,9 @@ UrlSuffix(gcc/C_002b_002b-Dialect-Option
  Wdeprecated-non-prototype
  UrlSuffix(gcc/Warning-Options.html#index-Wdeprecated-non-prototype)

+Wdeprecated-variadic-comma-omission
+UrlSuffix(gcc/C_002b_002b-Dialect-Options.html#index-Wdeprecated-variadic-comma-omission)
+
  Wdesignated-init
  UrlSuffix(gcc/Warning-Options.html#index-Wdesignated-init)

--- gcc/c-family/c-opts.cc.jj   2024-11-23 13:00:28.188030262 +0100
+++ gcc/c-family/c-opts.cc  2024-11-25 16:23:36.569829016 +0100
@@ -1051,6 +1051,11 @@ c_common_post_options (const char **pfil
warn_deprecated_literal_operator,
deprecated_in (cxx23));

+  /* -Wdeprecated-variadic-comma-omission is enabled by default in C++26.  */
+  SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+  warn_deprecated_variadic_comma_omission,
+  deprecated_in (cxx26));
+
/* -Wtemplate-id-cdtor is enabled by default in C++20.  */
SET_OPTION_IF_UNSET (&global_options, &global_options_set,
warn_template_id_cdtor,
--- gcc/cp/parser.cc.jj 2024-11-23 13:00:29.060017680 +0100
+++ gcc/cp/parser.cc    2024-11-25 17:02:24.551059305 +0100
@@ -25667,6 +25667,16 @@ cp_parser_parameter_declaration_clause (
   omitted.  */
else if (token->type == CPP_ELLIPSIS)
  {
+  /* Deprecated by P3176R1 in C++26.  */
+  if (warn_deprecated_variadic_comma_omission)
+   {
+ gcc_rich_location richloc (token->location);
+ richloc.add_fixit_insert_before (", ");
+ warning_at (&richloc, OPT_Wdeprecated_variadic_comma_omission,
+ "omitting of %<,%> before v

[PATCH] c++, coroutines: Make suspend index consistent for debug.

2024-11-29 Thread Iain Sandoe
Tested on x86_64-darwin, x86_64-linux,
OK for trunk?
thanks
Iain

--- 8< ---

At present, we only update the suspend index when we actually are
at the stage that the coroutine is considered suspended. This is
on the basis that it is UB to resume or destroy a coroutine that
is not suspended (and therefore we never need to access this value
otherwise).  However, it is possible that someone could set a debug
breakpoint on the resume which can be reached without suspending
if await_ready() returns true.  In that case, the debugger would
read an incorrect suspend index.  Fixed by moving the update to
just before the test for ready.

gcc/cp/ChangeLog:

* coroutines.cc (expand_one_await_expression): Update the
suspend point index before checking if the coroutine is
ready.

Signed-off-by: Iain Sandoe 
---
 gcc/cp/coroutines.cc | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 3148559d208..b36730d793c 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -1964,6 +1964,11 @@ expand_one_await_expression (tree *expr, tree *await_expr, void *d)
 
   /* Use the await_ready() call to test if we need to suspend.  */
   tree ready_cond = TREE_VEC_ELT (awaiter_calls, 0); /* await_ready().  */
+  /* We are about to pass this suspend point.  */
+  tree susp_idx = build_int_cst (short_unsigned_type_node, data->index);
+  tree r = cp_build_init_expr (data->resume_idx, susp_idx);
+  finish_expr_stmt (r);
+
   /* Convert to bool, if necessary.  */
   if (TREE_CODE (TREE_TYPE (ready_cond)) != BOOLEAN_TYPE)
 ready_cond = cp_convert (boolean_type_node, ready_cond,
@@ -1974,10 +1979,6 @@ expand_one_await_expression (tree *expr, tree *await_expr, void *d)
   ready_cond = invert_truthvalue_loc (loc, ready_cond);
   finish_if_stmt_cond (ready_cond, susp_if);
 
-  tree susp_idx = build_int_cst (short_unsigned_type_node, data->index);
-  tree r = cp_build_init_expr (data->resume_idx, susp_idx);
-  finish_expr_stmt (r);
-
   /* Find out what we have to do with the awaiter's suspend method.
  [expr.await]
  (5.1) If the result of await-ready is false, the coroutine is considered
-- 
2.39.2 (Apple Git-143)



[PATCH] c++, coroutines: Handle statement expressions part 1.

2024-11-29 Thread Iain Sandoe
Tested on x86_64-darwin, x86_64/powerpc64-linux, on folly and more
widely by Sam.  There are possibly additional BZ dups that will be
covered (this fixes 117231, a P1). OK for trunk?
thanks
Iain

--- 8< ---

In the current implementation, statement expressions were intentionally
unsupported (as a C++ extension).  However, since they are quite heavily
used by end-users and are now also emitted by the compiler in some cases,
we are now working to add them.  This first patch ensures that we
recurse into statement expressions (and therefore handle coroutine
keywords that might appear inside them).
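
For readers unfamiliar with the extension, the following is a minimal sketch
of a GNU statement expression, deliberately without any coroutine keywords.
The function name clamp_nonnegative is hypothetical, and the code assumes a
compiler that accepts the ({ ... }) extension (GCC or Clang, without
-pedantic-errors).

```cpp
#include <cassert>

// A GNU statement expression: a brace-enclosed block in parentheses whose
// value is that of its last expression statement.
int clamp_nonnegative(int v) {
  return ({
    int t = v;   // local to the statement expression
    if (t < 0)
      t = 0;
    t;           // value of the whole ({ ... }) expression
  });
}
```

The patch concerns the case where a co_await or co_yield appears inside
such a block, as in the new testcases.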

PR c++/115851
PR c++/116914
PR c++/117231

gcc/cp/ChangeLog:

* coroutines.cc (await_statement_expander): Walk into
statement expressions.
(await_statement_walker): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr115851.C: New test.
* g++.dg/coroutines/pr116914.C: New test.
* g++.dg/coroutines/pr117231.C: New test.

Signed-off-by: Iain Sandoe 
---
 gcc/cp/coroutines.cc   | 22 
 gcc/testsuite/g++.dg/coroutines/pr115851.C | 35 +++
 gcc/testsuite/g++.dg/coroutines/pr116914.C | 40 ++
 gcc/testsuite/g++.dg/coroutines/pr117231.C | 21 
 4 files changed, 118 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr115851.C
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr116914.C
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr117231.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index cdf61d89109..5764475a7de 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -2128,6 +2128,14 @@ await_statement_expander (tree *stmt, int *do_subtree, void *d)
 }
   else if (EXPR_P (*stmt))
 {
+  /* Look for ({}) at the top level - just recurse into these.  */
+  if (TREE_CODE (*stmt) == EXPR_STMT)
+   {
+ tree inner = EXPR_STMT_EXPR (*stmt);
+ if (TREE_CODE (inner) == STATEMENT_LIST
+ || TREE_CODE (inner) == BIND_EXPR)
+   return NULL_TREE; // process contents
+   }
   process_one_statement (stmt, d);
   *do_subtree = 0; /* Done subtrees.  */
 }
@@ -3857,6 +3865,20 @@ await_statement_walker (tree *stmt, int *do_subtree, void *d)
   if (!(cp_walk_tree (stmt, find_any_await, &await_ptr, &visited)))
return NULL_TREE; /* Nothing special to do here.  */
 
+  /* Handle statement expressions.  */
+  if (TREE_CODE (expr) == EXPR_STMT)
+   {
+ tree inner = EXPR_STMT_EXPR (expr);
+ if (TREE_CODE (inner) == STATEMENT_LIST
+ || TREE_CODE (inner) == BIND_EXPR)
+   {
+ res = cp_walk_tree (&EXPR_STMT_EXPR (expr),
+ await_statement_walker, d, NULL);
+ *do_subtree = 0;
+ return res;
+   }
+   }
+
   visited.empty ();
   awpts->saw_awaits = 0;
   hash_set truth_aoif_to_expand;
diff --git a/gcc/testsuite/g++.dg/coroutines/pr115851.C b/gcc/testsuite/g++.dg/coroutines/pr115851.C
new file mode 100644
index 000..0e251760574
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/pr115851.C
@@ -0,0 +1,35 @@
+// { dg-additional-options "-Wno-pedantic " }
+#include <coroutine>
+
+struct SuspendNever {
+bool await_ready() noexcept;
+void await_suspend(std::coroutine_handle<>) noexcept;
+void await_resume() noexcept;
+};
+
+struct Coroutine;
+
+struct PromiseType {
+Coroutine get_return_object();
+SuspendNever initial_suspend();
+SuspendNever final_suspend() noexcept;
+void unhandled_exception () {}
+};
+
+struct Coroutine {
+using promise_type = PromiseType;
+};
+
+struct ErrorOr {
+int release_error();
+};
+
+void warnln(int const&);
+
+Coroutine __async_test_input_basic() {
+({
+co_await SuspendNever{};
+   ErrorOr _temporary_result2;
+warnln(_temporary_result2.release_error());
+});
+}
diff --git a/gcc/testsuite/g++.dg/coroutines/pr116914.C b/gcc/testsuite/g++.dg/coroutines/pr116914.C
new file mode 100644
index 000..8f1310380fc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/pr116914.C
@@ -0,0 +1,40 @@
+//  { dg-additional-options "-std=gnu++20 -fpreprocessed" }
+
+namespace std {
+template  struct coroutine_traits : a {};
+template  struct coroutine_handle {
+  static coroutine_handle from_address(void *);
+  operator coroutine_handle<>();
+  void *address();
+};
+struct b {
+  int await_ready() noexcept;
+  void await_suspend(coroutine_handle<>) noexcept;
+  void await_resume() noexcept;
+};
+} // namespace std
+struct c;
+struct d {
+  c get_return_object();
+  std::b initial_suspend();
+  std::b final_suspend() noexcept;
+  void unhandled_exception();
+  std::b yield_value(int);
+};
+struct e {
+  void operator++();
+  int operator*();
+  int operator!=(e);
+};
+struct c {
+  using promise_type = d;
+  e begin();
+  e end();
+  c f() {
+c g;
+for (auto h : g) 

[PATCH] c++, coroutines: Ensure bind exprs are visited once [PR98935].

2024-11-29 Thread Iain Sandoe
Tested on x86_64-darwin, x86_64, powerpc64-linux,
OK for trunk?
thanks
Iain

--- 8< ---

Recent changes in the C++ FE and the coroutines implementation have
exposed a latent issue in which a bind expression containing a var
that we need to capture in the coroutine state gets visited twice.
This causes an ICE (from a checking assert).  Fixed by adding a pset
to the relevant tree walk.  Exit the callback early when the tree is
not a BIND_EXPR.
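
The general technique can be sketched outside of GCC as a walk guarded by a
visited set.  This is only an analogy under stated assumptions: Node,
count_bind_visits and shared_bind_counted_once are invented for this
example and std::unordered_set stands in for the pset; none of this is the
GCC tree-walking API.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a tree node; is_bind marks a "bind expression".
struct Node {
  bool is_bind;
  std::vector<Node *> kids;
};

// Walk the graph, counting how many times a bind node is processed.
// The visited set ensures each node is handled at most once, even when it
// is reachable through more than one parent.
int count_bind_visits(Node *n, std::unordered_set<Node *> &visited) {
  if (!n || !visited.insert(n).second)
    return 0;                        // null or already seen: skip
  int count = n->is_bind ? 1 : 0;    // non-bind nodes need no processing
  for (Node *k : n->kids)
    count += count_bind_visits(k, visited);
  return count;
}

// Builds a small graph in which one bind node is shared by two parents.
int shared_bind_counted_once() {
  Node bind{true, {}};
  Node a{false, {&bind}};
  Node b{false, {&bind}};
  Node root{false, {&a, &b}};
  std::unordered_set<Node *> visited;
  return count_bind_visits(&root, visited);
}
```

Without the visited set the shared node would be processed twice, which
corresponds to the situation that trips the checking assert.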

PR c++/98935

gcc/cp/ChangeLog:

* coroutines.cc (register_local_var_uses): Add a pset to the
tree walk to avoid visiting the same BIND_EXPR twice.  Make
an early exit for cases that the callback does not apply.
(cp_coroutine_transform::apply_transforms): Add a pset to the
tree walk for register_local_vars.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr98935.C: New test.

Signed-off-by: Iain Sandoe 
---
 gcc/cp/coroutines.cc  | 158 +++---
 gcc/testsuite/g++.dg/coroutines/pr98935.C |  27 
 2 files changed, 107 insertions(+), 78 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr98935.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 5764475a7de..f6f6256114a 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -4055,6 +4055,9 @@ coro_make_frame_entry (tree *field_list, const char *name, tree fld_type,
 static tree
 register_local_var_uses (tree *stmt, int *do_subtree, void *d)
 {
+  if (TREE_CODE (*stmt) != BIND_EXPR)
+return NULL_TREE;
+
   local_vars_frame_data *lvd = (local_vars_frame_data *) d;
 
   /* As we enter a bind expression - record the vars there and then recurse.
@@ -4062,88 +4065,86 @@ register_local_var_uses (tree *stmt, int *do_subtree, void *d)
  The bind index is a growing count of how many bind indices we've seen.
  We build a space in the frame for each local var.  */
 
-  if (TREE_CODE (*stmt) == BIND_EXPR)
+  tree lvar;
+  unsigned serial = 0;
+  for (lvar = BIND_EXPR_VARS (*stmt); lvar != NULL; lvar = DECL_CHAIN (lvar))
 {
-  tree lvar;
-  unsigned serial = 0;
-  for (lvar = BIND_EXPR_VARS (*stmt); lvar != NULL;
-  lvar = DECL_CHAIN (lvar))
-   {
- bool existed;
- local_var_info &local_var
-   = lvd->local_var_uses->get_or_insert (lvar, &existed);
- gcc_checking_assert (!existed);
- local_var.def_loc = DECL_SOURCE_LOCATION (lvar);
- tree lvtype = TREE_TYPE (lvar);
- local_var.frame_type = lvtype;
- local_var.field_idx = local_var.field_id = NULL_TREE;
-
- /* Make sure that we only present vars to the tests below.  */
- if (TREE_CODE (lvar) != PARM_DECL
- && TREE_CODE (lvar) != VAR_DECL)
-   continue;
-
- /* We don't move static vars into the frame. */
- local_var.is_static = TREE_STATIC (lvar);
- if (local_var.is_static)
-   continue;
-
- poly_uint64 size;
- if (TREE_CODE (lvtype) == ARRAY_TYPE
- && !poly_int_tree_p (DECL_SIZE_UNIT (lvar), &size))
-   {
- sorry_at (local_var.def_loc, "variable length arrays are not"
-   " yet supported in coroutines");
- /* Ignore it, this is broken anyway.  */
- continue;
-   }
+  bool existed;
+  local_var_info &local_var
+   = lvd->local_var_uses->get_or_insert (lvar, &existed);
+  gcc_checking_assert (!existed);
+  local_var.def_loc = DECL_SOURCE_LOCATION (lvar);
+  tree lvtype = TREE_TYPE (lvar);
+  local_var.frame_type = lvtype;
+  local_var.field_idx = local_var.field_id = NULL_TREE;
+
+  /* Make sure that we only present vars to the tests below.  */
+  if (TREE_CODE (lvar) != PARM_DECL
+ && TREE_CODE (lvar) != VAR_DECL)
+   continue;
 
- lvd->local_var_seen = true;
- /* If this var is a lambda capture proxy, we want to leave it alone,
-and later rewrite the DECL_VALUE_EXPR to indirect through the
-frame copy of the pointer to the lambda closure object.  */
- local_var.is_lambda_capture = is_capture_proxy (lvar);
- if (local_var.is_lambda_capture)
-   continue;
-
- /* If a variable has a value expression, then that's what needs
-to be processed.  */
- local_var.has_value_expr_p = DECL_HAS_VALUE_EXPR_P (lvar);
- if (local_var.has_value_expr_p)
-   continue;
-
- /* Make names depth+index unique, so that we can support nested
-scopes with identically named locals and still be able to
-identify them in the coroutine frame.  */
- tree lvname = DECL_NAME (lvar);
- char *buf = NULL;
-
- /* The outermost bind scope contains the artificial variables that
-we inject to implement the coro state machine.  We want to be able
-to inspect these in debugging.  */
- if (l

Re: [PATCH v5 5/5] aarch64: add SVE2 FP8DOT2 and FP8DOT4 intrinsics

2024-11-29 Thread Claudio Bantaloukas




On 11/29/2024 1:00 PM, Richard Sandiford wrote:

Thanks for the update!

Claudio Bantaloukas  writes:

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2a4f016e2df..f7440113570 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -21957,6 +21957,18 @@ Enable the fp8 (8-bit floating point) multiply accumulate extension.
  @item ssve-fp8fma
  Enable the fp8 (8-bit floating point) multiply accumulate extension in streaming
  mode.
+@item fp8dot4
+Enable the fp8 (8-bit floating point) to single-precision 4-way dot product
+extension.
+@item ssve-fp8dot4
+Enable the fp8 (8-bit floating point) to single-precision 4-way dot product
+extension in streaming mode.
+@item fp8dot2
+Enable the fp8 (8-bit floating point) o half-precision 2-way dot product
+extension.


typo: s/o/to/ (and below).

Since the change is so trivial, I made it locally, tweaked the ordering
of the svcvt entries in patch 3, and fixed some whitespace issues that
git am was complaining about.  Pushed to trunk with those changes.

Now that you've had at least two series applied, could you follow the
process on https://gcc.gnu.org/gitwrite.html to get write access for
future patches?  (I'll sponsor.)

Done

Thank you!


Thanks,
Richard


+@item ssve-fp8dot2
+Enable the fp8 (8-bit floating point) o half-precision 2-way dot product
+extension in streaming mode.
  @item faminmax
  Enable the Floating Point Absolute Maximum/Minimum extension.
  @item sve-b16b16
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c
new file mode 100644
index 000..9ad789a8ad2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+
+#include 
+
+#pragma GCC target ("arch=armv8.2-a+sve2+fp8dot2")
+
+void
+test (svfloat16_t f16, svmfloat8_t f8, fpm_t fpm,
+svbool_t pg, svuint8_t u8, svuint16_t u16, svint32_t s32,
+svbfloat16_t bf16, svfloat32_t f32, svfloat64_t f64, mfloat8_t f)
+{
+  svdot_fpm (f16, f8, f8, fpm);
+  svdot_fpm (f32, f8, f8, fpm);
+
+  svdot_fpm (f16); /* { dg-error {too few arguments to function 'svdot_fpm'} } */
+  svdot_fpm (f16, f8); /* { dg-error {too few arguments to function 'svdot_fpm'} } */
+  svdot_fpm (f16, f8, f8); /* { dg-error {too few arguments to function 'svdot_fpm'} } */
+  svdot_fpm (f8, f8, fpm); /* { dg-error {too few arguments to function 'svdot_fpm'} } */
+  svdot_fpm (f16, f8, fpm); /* { dg-error {too few arguments to function 'svdot_fpm'} } */
+  svdot_fpm (f16, f8, f8, fpm, 0); /* { dg-error {too many arguments to function 'svdot_fpm'} } */
+
+  svdot_fpm (0, f8, f8, fpm); /* { dg-error {passing 'int' to argument 1 of 'svdot_fpm', which expects an SVE type rather than a scalar} } */
+  svdot_fpm (f16, f8, f, fpm); /* { dg-error {passing 'mfloat8_t' {aka '__mfp8'} to argument 3 of 'svdot_fpm', which expects 'svmfloat8_t'} } */
+  svdot_fpm (pg, f8, f8, fpm); /* { dg-error {'svdot_fpm' has no form that takes 'svbool_t' and 'svmfloat8_t' arguments} } */
+  svdot_fpm (u8, f8, f8, fpm); /* { dg-error {'svdot_fpm' has no form that takes 'svuint8_t' and 'svmfloat8_t' arguments} } */
+  svdot_fpm (u16, f8, f8, fpm); /* { dg-error {'svdot_fpm' has no form that takes 'svuint16_t' and 'svmfloat8_t' arguments} } */
+  svdot_fpm (f64, f8, f8, fpm); /* { dg-error {'svdot_fpm' has no form that takes 'svfloat64_t' and 'svmfloat8_t' arguments} } */
+  svdot_fpm (f16, 0, f8, fpm); /* { dg-error {passing 'int' to argument 2 of 'svdot_fpm', which expects 'svmfloat8_t'} } */
+  svdot_fpm (f16, f16, f8, fpm); /* { dg-error {passing 'svfloat16_t' to argument 2 of 'svdot_fpm', which expects 'svmfloat8_t'} } */
+  svdot_fpm (f16, f8, 0, fpm); /* { dg-error {passing 'int' to argument 3 of 'svdot_fpm', which expects 'svmfloat8_t'} } */
+  svdot_fpm (f16, f8, f16, fpm); /* { dg-error {passing 'svfloat16_t' to argument 3 of 'svdot_fpm', which expects 'svmfloat8_t'} } */
+  svdot_fpm (f16, f8, f8, f8); /* { dg-error {passing 'svmfloat8_t' to argument 4 of 'svdot_fpm', which expects 'uint64_t'} } */
+}
diff --git 
a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c
 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c
new file mode 100644
index 000..dec00e3abf1
--- /dev/null
+++ 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c
@@ -0,0 +1,49 @@
+/* { dg-do compile } */
+
+#include 
+
+#pragma GCC target ("arch=armv8.2-a+ssve-fp8fma+ssve-fp8dot2")
+
+void
+f1 (svfloat16_t f16, svmfloat8_t f8, fpm_t fpm,
+svbool_t pg, svuint8_t u8, svuint16_t u16, svint32_t s32,
+svbfloat16_t bf16, svfloat32_t f32, svfloat64_t f64, mfloat8_t f, int i)
+__arm_streaming
+{
+  svdot_lane_fpm (f32, f8, f8, 0, fpm);
+  svdot_lane_fpm (f32, f8, f8, 3, fpm);
+  svdot_lane_fpm (f16, f8, f8, 0, fpm);

[PATCH] aarch64: Fix bootstrap build failure due to missing header

2024-11-29 Thread Yury Khrustalev
Inclusion of "arm_acle.h" would require stdint.h that may
not be available during first stage of cross-compilation.

libgcc/ChangeLog:

* config/aarch64/aarch64-unwind.h (_CHKFEAT_GCS): Add.

---

Regression tested on aarch64-unknown-linux-gnu and no regressions have been 
found.

Is this OK for trunk?

Thanks,
Yury

---
 libgcc/config/aarch64/aarch64-unwind.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/libgcc/config/aarch64/aarch64-unwind.h b/libgcc/config/aarch64/aarch64-unwind.h
index 85468f9685e..d11753a0e03 100644
--- a/libgcc/config/aarch64/aarch64-unwind.h
+++ b/libgcc/config/aarch64/aarch64-unwind.h
@@ -29,7 +29,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
 #include "ansidecl.h"
 #include 
-#include 
 
 #define AARCH64_DWARF_REGNUM_RA_STATE 34
 #define AARCH64_DWARF_RA_STATE_MASK   0x1
@@ -180,7 +179,7 @@ aarch64_demangle_return_addr (struct _Unwind_Context *context,
 }
 
 /* GCS enable flag for chkfeat instruction.  */
-
+#define _CHKFEAT_GCS 1
 /* SME runtime function local to libgcc, streaming compatible
and preserves more registers than the base PCS requires, but
we don't rely on that here.  */
-- 
2.39.5



Re: [PATCH] aarch64: Add ISA requirements to some SVE/SME md comments

2024-11-29 Thread Kyrylo Tkachov


> On 29 Nov 2024, at 13:04, Richard Sandiford  wrote:
> 
> Kyrylo Tkachov  writes:
>> Hi Richard
>>> On 6 Nov 2024, at 18:16, Richard Sandiford  
>>> wrote:
>>> 
>>> This series adds support for FEAT_SVE2p1 (-march=...+sve2p1).
>>> One thing that the extension does is make some SME and SME2 instructions
>>> available outside of streaming mode.  It also adds quite a few new
>>> instructions.  Some of those new instructions are shared with SME2.1,
>>> which will be added by a later patch.
>>> 
>>> Tested on aarch64-linux-gnu.  GNU binutils doesn't yet have full
>>> support for SVE2.1, meaning that the aarch64_asm_sve2p1_ok target
>>> selector fails and that the new aarch64-sve2-acle-asm.exp tests fall
>>> back to "dg-do compile" instead of "dg-do assemble".  However, I also
>>> tested aarch64-sve2-acle-asm.exp against LLVM's assembler using a
>>> hacked-up script.
>>> 
>>> I also tried to cross-check GCC's implementation against LLVM's SVE2.1
>>> ACLE tests.  There were some failures due to missing B16B16 support
>>> (part of a separate follow-on series) and the fact that LLVM's stores
>>> take pointers to const (raised separately), but otherwise things
>>> seemed ok.
>>> 
>>> I'll commit this on Monday if there are no comments before then,
>>> but please let me know if you'd like me to wait longer.  It will
>>> likely need some minor updates due to conflicts with other
>>> in-flight patches.
>> 
>> Thanks for these!
>> One suggestion I have is that for the patterns in aarch64-sve2.md we may 
>> want to have SVE2p1 in the comments for the new instructions.
>> For example, we have:
>> ;; -
>> ;;  [FP] Clamp to minimum/maximum
>> ;; -
>> ;; - BFCLAMP (SVE_B16B16)
>> ;; - FCLAMP
>> ;; -
> >> Which shows what extension BFCLAMP is a part of.
>> I personally find these comments useful when quickly looking for what’s in 
>> the base architecture and what’s not and it’d be nice to keep this scheme 
>> going though I notice we’re already not totally consistent about it.
>> I don’t insist on it, but it’s a suggestion.
> 
> Ah, yeah, good point.  I've applied the patch below to do that.
> 
> (And sorry for the delay.  I thought it would be fairer to wait until
> Claudio's patches were in, to avoid creating more churn for that series.)

Thanks Richard, this is very helpful.
Kyrill

> 
> Thanks,
> Richard
> 
> 
> 
> The SVE and SME md files are divided into sections, with each
> section often starting with a comment that lists the associated
> mnemonics.  These lists usually include the base architecture
> requirement in parentheses, if the base requirement is greater
> than the baseline for the file.  This patch tries to be more
> consistent about when we do that for the recently added SVE2p1
> and SME2p1 extensions.
> 
> gcc/
> * config/aarch64/aarch64-sme.md: In the section comments, add the
> architecture requirements alongside some mnemonics.
> * config/aarch64/aarch64-sve2.md: Likewise.
> ---
> gcc/config/aarch64/aarch64-sme.md  |  72 ++---
> gcc/config/aarch64/aarch64-sve2.md | 162 +++--
> 2 files changed, 121 insertions(+), 113 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-sme.md 
> b/gcc/config/aarch64/aarch64-sme.md
> index 0f362671f75..e4562186bdd 100644
> --- a/gcc/config/aarch64/aarch64-sme.md
> +++ b/gcc/config/aarch64/aarch64-sme.md
> @@ -533,7 +533,7 @@ (define_insn "@aarch64_sme_ldrn"
> ;;  Table loads
> ;; -
> ;; Includes:
> -;; - LDR
> +;; - LDR (SME2)
> ;; -
> 
> (define_c_enum "unspec" [
> @@ -635,7 +635,7 @@ (define_insn "@aarch64_sme_strn"
> ;;  Table stores
> ;; -
> ;; Includes:
> -;; - STR
> +;; - STR (SME2)
> ;; -
> 
> (define_insn "aarch64_sme_str_zt0"
> @@ -651,7 +651,7 @@ (define_insn "aarch64_sme_str_zt0"
> ;; -
> ;; Includes:
> ;; - MOVA
> -;; - MOVAZ
> +;; - MOVAZ (SME2p1)
> ;; -
> 
> (define_insn "@aarch64_sme_"
> @@ -813,7 +813,7 @@ (define_insn 
> "@aarch64_sme_"
> ;; -
> ;; Includes:
> ;; - MOVA
> -;; - MOVAZ
> +;; - MOVAZ (SME2p1)
> ;; -
> 
> (define_insn "@aarch64_sme_"
> @@ -1140,12 +1140,12 @@ (define_insn "@aarch64_sme_"
> ;;  Binary arithmetic on ZA slice
> ;; -
> ;; In

[PATCH] c++, v3: Implement C++26 P3176R1 - The Oxford variadic comma

2024-11-29 Thread Jakub Jelinek
On Fri, Nov 29, 2024 at 08:36:21AM -0500, Jason Merrill wrote:
> > This should be either "omission of ',' before" or "omitting ',' before"
> > 
> > But not "omitting of" :-)
> 
> Agreed.

Picked omission of.

> I also might say "varargs" instead of "variadic", since the latter could
> also refer to variadic templates?

So like this (just in the diagnostic message and description (both docs and
option help))?
Or do you want varargs also in the option name, and perhaps names of the new
testcases, too?
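
For reference, a small sketch of the two spellings under discussion.  The
function sum is hypothetical and not taken from the patch or its testcases;
the deprecated form is shown only in a comment so that the example compiles
without the new warning.

```cpp
#include <cassert>
#include <cstdarg>

// Spelling deprecated in C++26 by P3176R1 (no comma before the ellipsis):
//   int sum(int count...);
// Preferred spelling: a comma separates the last parameter from "...".
int sum(int count, ...) {
  va_list ap;
  va_start(ap, count);
  int total = 0;
  for (int i = 0; i < count; ++i)
    total += va_arg(ap, int);  // each extra argument is read as an int
  va_end(ap);
  return total;
}
```

Both declarations denote the same varargs function; the new warning only
concerns the missing comma, and the fix-it hint inserts ", " before the
ellipsis.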

2024-11-29  Jakub Jelinek  

gcc/c-family/
* c.opt: Implement C++26 P3176R1 - The Oxford variadic comma.
(Wdeprecated-variadic-comma-omission): New option.
* c.opt.urls: Regenerate.
* c-opts.cc (c_common_post_options): Default to
-Wdeprecated-variadic-comma-omission for C++26 or -Wpedantic.
gcc/cp/
* parser.cc (cp_parser_parameter_declaration_clause): Emit
-Wdeprecated-variadic-comma-omission warnings.
gcc/
* doc/invoke.texi (-Wdeprecated-variadic-comma-omission): Document.
gcc/testsuite/
* g++.dg/cpp26/variadic-comma1.C: New test.
* g++.dg/cpp26/variadic-comma2.C: New test.
* g++.dg/cpp26/variadic-comma3.C: New test.
* g++.dg/cpp26/variadic-comma4.C: New test.
* g++.dg/cpp26/variadic-comma5.C: New test.
* g++.dg/cpp1z/fold10.C: Expect a warning for C++26.
* g++.dg/ext/attrib33.C: Likewise.
* g++.dg/cpp1y/lambda-generic-variadic19.C: Likewise.
* g++.dg/cpp2a/lambda-generic10.C: Likewise.
* g++.dg/cpp0x/lambda/lambda-const3.C: Likewise.
* g++.dg/cpp0x/variadic164.C: Likewise.
* g++.dg/cpp0x/variadic17.C: Likewise.
* g++.dg/cpp0x/udlit-args-neg.C: Likewise.
* g++.dg/cpp0x/variadic28.C: Likewise.
* g++.dg/cpp0x/gen-attrs-33.C: Likewise.
* g++.dg/cpp23/explicit-obj-diagnostics3.C: Likewise.
* g++.old-deja/g++.law/operators15.C: Likewise.
* g++.old-deja/g++.mike/p811.C: Likewise.
* g++.old-deja/g++.mike/p12306.C (printf): Add , before ... .
* g++.dg/analyzer/fd-bind-pr107783.C (bind): Likewise.
* g++.dg/cpp0x/vt-65790.C (printf): Likewise.
libstdc++-v3/
* include/std/functional (_Bind_check_arity): Add , before ... .
* include/bits/refwrap.h (_Mem_fn_traits, _Weak_result_type_impl):
Likewise.
* include/tr1/type_traits (is_function): Likewise.

--- gcc/c-family/c.opt.jj   2024-11-22 19:52:19.477579338 +0100
+++ gcc/c-family/c.opt  2024-11-25 16:22:11.325028058 +0100
@@ -672,6 +672,10 @@ Wdeprecated-non-prototype
 C ObjC Var(warn_deprecated_non_prototype) Init(-1) Warning
 Warn about calls with arguments to functions declared without parameters.
 
+Wdeprecated-variadic-comma-omission
+C++ ObjC++ Var(warn_deprecated_variadic_comma_omission) Warning
+Warn about deprecated omission of comma before ... in varargs function declaration.
+
 Wdesignated-init
 C ObjC Var(warn_designated_init) Init(1) Warning
 Warn about positional initialization of structs requiring designated initializers.
--- gcc/c-family/c.opt.urls.jj  2024-11-18 21:59:35.872150269 +0100
+++ gcc/c-family/c.opt.urls 2024-11-25 17:38:25.396693802 +0100
@@ -310,6 +310,9 @@ UrlSuffix(gcc/C_002b_002b-Dialect-Option
 Wdeprecated-non-prototype
 UrlSuffix(gcc/Warning-Options.html#index-Wdeprecated-non-prototype)
 
+Wdeprecated-variadic-comma-omission
+UrlSuffix(gcc/C_002b_002b-Dialect-Options.html#index-Wdeprecated-variadic-comma-omission)
+
 Wdesignated-init
 UrlSuffix(gcc/Warning-Options.html#index-Wdesignated-init)
 
--- gcc/c-family/c-opts.cc.jj   2024-11-23 13:00:28.188030262 +0100
+++ gcc/c-family/c-opts.cc  2024-11-25 16:23:36.569829016 +0100
@@ -1051,6 +1051,11 @@ c_common_post_options (const char **pfil
   warn_deprecated_literal_operator,
   deprecated_in (cxx23));
 
+  /* -Wdeprecated-variadic-comma-omission is enabled by default in C++26.  */
+  SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+  warn_deprecated_variadic_comma_omission,
+  deprecated_in (cxx26));
+
   /* -Wtemplate-id-cdtor is enabled by default in C++20.  */
   SET_OPTION_IF_UNSET (&global_options, &global_options_set,
   warn_template_id_cdtor,
--- gcc/cp/parser.cc.jj 2024-11-23 13:00:29.060017680 +0100
+++ gcc/cp/parser.cc    2024-11-25 17:02:24.551059305 +0100
@@ -25667,6 +25667,16 @@ cp_parser_parameter_declaration_clause (
  omitted.  */
   else if (token->type == CPP_ELLIPSIS)
 {
+  /* Deprecated by P3176R1 in C++26.  */
+  if (warn_deprecated_variadic_comma_omission)
+   {
+ gcc_rich_location richloc (token->location);
+ richloc.add_fixit_insert_before (", ");
+ warning_at (&richloc, OPT_Wdeprecated_variadic_comma_omission,
+ "omission of %<,%> before varargs %<...%> is "
+ "deprecated in C++26");
+   }
+
  

Re: [PATCH] aarch64: Fix bootstrap build failure due to missing header

2024-11-29 Thread Kyrylo Tkachov
Hi Yury,

> On 29 Nov 2024, at 13:57, Yury Khrustalev  wrote:
> 
> Inclusion of "arm_acle.h" would require stdint.h that may
> not be available during first stage of cross-compilation.

Do you mean when trying to build a big-endian cross-compiler or something?
The change seems harmless to me, but the subject line says it’s fixing a
bootstrap failure while the text here says cross-compilation.
So I’m trying to understand what’s going wrong.

Thanks,
Kyrill


> 
> libgcc/ChangeLog:
> 
> * config/aarch64/aarch64-unwind.h (_CHKFEAT_GCS): Add.
> 
> ---
> 
> Regression tested on aarch64-unknown-linux-gnu and no regressions have been 
> found.
> 
> Is this OK for trunk?
> 
> Thanks,
> Yury
> 
> ---
> libgcc/config/aarch64/aarch64-unwind.h | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/libgcc/config/aarch64/aarch64-unwind.h 
> b/libgcc/config/aarch64/aarch64-unwind.h
> index 85468f9685e..d11753a0e03 100644
> --- a/libgcc/config/aarch64/aarch64-unwind.h
> +++ b/libgcc/config/aarch64/aarch64-unwind.h
> @@ -29,7 +29,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
> If not, see
> 
> #include "ansidecl.h"
> #include 
> -#include 
> 
> #define AARCH64_DWARF_REGNUM_RA_STATE 34
> #define AARCH64_DWARF_RA_STATE_MASK   0x1
> @@ -180,7 +179,7 @@ aarch64_demangle_return_addr (struct _Unwind_Context 
> *context,
> }
> 
> /* GCS enable flag for chkfeat instruction.  */
> -
> +#define _CHKFEAT_GCS 1
> /* SME runtime function local to libgcc, streaming compatible
>and preserves more registers than the base PCS requires, but
>we don't rely on that here.  */
> -- 
> 2.39.5
> 



Re: [PATCH] libgccjit: Add support for machine-dependent builtins

2024-11-29 Thread Antoni Boucher

Oh, nice.

I'll send my future patches on the forgejo instance, then.

Le 2024-11-20 à 17 h 35, Mark Wielaard a écrit :

Hi Antoni,

On Wed, Nov 20, 2024 at 11:11:01AM -0500, Antoni Boucher wrote:

 From what I understand, pull requests on forge.sourceware.org can be
removed at any time, so I could lose track of the status of my
patches.


It is an experiment, and the experiment could fail for various
reasons. At that point we could decide to just throw everything
away. But we wouldn't do that randomly and I think people are willing
to let the experiment run for at least a year before deciding it does
or doesn't work. And we would of course give people the chance to
migrate the work they want to preserve somewhere else (forgejo has
good import/export to various other forges).

We could also decide the current setup is not good (and admittedly the
-mirror/-test thing is a little odd) and change those names and/or
resetup those repos.

But interestingly it seems that wouldn't impact your workflow. Which I
hadn't even thought was possible. But I just tried on our forgejo
setup and of course it works. You can do pull request to your own fork
from one branch to another.

Seeing this already taught me something I didn't know was possible or
useful. But I can totally see now how these "self pull requests" help
someone keep track of their work.


I really like forgejo and use it for some of my personal projects.
If you still think there would be benefit in me sending patches to
forge.sourceware.org, please tell me and I'll try.


If another developer/maintainer like David is happy to try what you
already have been doing through github I think it would be
useful. Even if it doesn't work out for you that would be very
valuable feedback.

I do have to note that there are people a little nervous about reviews
completely "bypassing" the mailinglists. But that would be even more a
concern with using github for this.

Cheers,

Mark




[PATCH v2 1/3] arm, mve: Fix scan-assembler for test7 in dlstp-compile-asm-2.c

2024-11-29 Thread Andre Vieira

After the changes to the vctp intrinsic, codegen changed slightly: we now
unfortunately seem to generate unneeded moves and extends of the mask.
These are not incorrect, however, and we don't have a fix for the unneeded
codegen right now, so change the testcase to accept them so that we can still
catch other changes if they occur.

gcc/testsuite/ChangeLog:

PR target/117814
* gcc.target/arm/mve/dlstp-compile-asm-2.c (test7): Add an optional
vmsr to the check-function-bodies.
---
 gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-2.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-2.c b/gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-2.c
index c62f592a60d..21093044708 100644
--- a/gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-2.c
+++ b/gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-2.c
@@ -216,7 +216,12 @@ void test7 (int32_t *a, int32_t *b, int32_t *c, int n, int g)
 **...
 **	dlstp.32	lr, r3
 **	vldrw.32	q[0-9]+, \[r0\], #16
+** (
+**	vmsr	p0, .*
 **	vpst
+** |
+**	vpst
+** )
 **	vldrwt.32	q[0-9]+, \[r1\], #16
 **	vadd.i32	(q[0-9]+), q[0-9]+, q[0-9]+
 **	vstrw.32	\1, \[r2\], #16


Re: [PATCH v5 5/5] aarch64: add SVE2 FP8DOT2 and FP8DOT4 intrinsics

2024-11-29 Thread Kyrylo Tkachov


> On 29 Nov 2024, at 13:00, Richard Sandiford  wrote:
> 
> Thanks for the update!
> 
> Claudio Bantaloukas  writes:
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 2a4f016e2df..f7440113570 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -21957,6 +21957,18 @@ Enable the fp8 (8-bit floating point) multiply 
>> accumulate extension.
>> @item ssve-fp8fma
>> Enable the fp8 (8-bit floating point) multiply accumulate extension in 
>> streaming
>> mode.
>> +@item fp8dot4
>> +Enable the fp8 (8-bit floating point) to single-precision 4-way dot product
>> +extension.
>> +@item ssve-fp8dot4
>> +Enable the fp8 (8-bit floating point) to single-precision 4-way dot product
>> +extension in streaming mode.
>> +@item fp8dot2
>> +Enable the fp8 (8-bit floating point) o half-precision 2-way dot product
>> +extension.
> 
> typo: s/o/to/ (and below).
> 
> Since the change is so trivial, I made it locally, tweaked the ordering
> of the svcvt entries in patch 3, and fixed some whitespace issues that
> git am was complaining about.  Push to trunk with those changes.

Thanks for the patch Claudio!
One thing I just noticed (sorry for not spotting it earlier) is the cpuinfo 
strings in the aarch64-option-extensions.def file for the new extensions.
I don’t think they match up with what the Linux kernel would print in 
/proc/cpuinfo.
Could you have another look at them and the page at:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/kernel/cpuinfo.c#n137
and make sure GCC expects the right values? It could be that for some of these 
features we may need to expect two or more strings (like “paca pacg” for pauth).

Thanks,
Kyrill
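
A small sketch of the "two or more strings" case mentioned above: a feature would count as present only when every expected token appears in the Features line of /proc/cpuinfo. The helper name `has_all_features` and the sample strings are assumptions for illustration, not GCC's actual parsing code.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: true when every whitespace-separated token in
// `required` (e.g. "paca pacg") appears in the cpuinfo feature list.
inline bool has_all_features(const std::string &features_line,
                             const std::string &required)
{
  // Collect the tokens the kernel reported.
  std::istringstream have_stream(features_line);
  std::set<std::string> have;
  for (std::string tok; have_stream >> tok;)
    have.insert(tok);

  // Every required token must be among them.
  std::istringstream need_stream(required);
  for (std::string tok; need_stream >> tok;)
    if (have.count(tok) == 0)
      return false;
  return true;
}
```

For example, `has_all_features("fp asimd paca", "paca pacg")` is false because `pacg` is missing.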


> 
> Now that you've had at least two series applied, could you follow the
> process on https://gcc.gnu.org/gitwrite.html to get write access for
> future patches?  (I'll sponsor.)
> 
> Thanks,
> Richard
> 
>> +@item ssve-fp8dot2
>> +Enable the fp8 (8-bit floating point) o half-precision 2-way dot product
>> +extension in streaming mode.
>> @item faminmax
>> Enable the Floating Point Absolute Maximum/Minimum extension.
>> @item sve-b16b16
>> diff --git 
>> a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c 
>> b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c
>> new file mode 100644
>> index 000..9ad789a8ad2
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c
>> @@ -0,0 +1,33 @@
>> +/* { dg-do compile } */
>> +
>> +#include 
>> +
>> +#pragma GCC target ("arch=armv8.2-a+sve2+fp8dot2")
>> +
>> +void
>> +test (svfloat16_t f16, svmfloat8_t f8, fpm_t fpm, 
>> +svbool_t pg, svuint8_t u8, svuint16_t u16, svint32_t s32,
>> +svbfloat16_t bf16, svfloat32_t f32, svfloat64_t f64, mfloat8_t f)
>> +{
>> +  svdot_fpm (f16, f8, f8, fpm);
>> +  svdot_fpm (f32, f8, f8, fpm);
>> +
>> +  svdot_fpm (f16); /* { dg-error {too few arguments to function 
>> 'svdot_fpm'} } */
>> +  svdot_fpm (f16, f8); /* { dg-error {too few arguments to function 
>> 'svdot_fpm'} } */
>> +  svdot_fpm (f16, f8, f8); /* { dg-error {too few arguments to function 
>> 'svdot_fpm'} } */
>> +  svdot_fpm (f8, f8, fpm); /* { dg-error {too few arguments to function 
>> 'svdot_fpm'} } */
>> +  svdot_fpm (f16, f8, fpm); /* { dg-error {too few arguments to function 
>> 'svdot_fpm'} } */
>> +  svdot_fpm (f16, f8, f8, fpm, 0); /* { dg-error {too many arguments to 
>> function 'svdot_fpm'} } */
>> +
>> +  svdot_fpm (0, f8, f8, fpm); /* { dg-error {passing 'int' to argument 1 of 
>> 'svdot_fpm', which expects an SVE type rather than a scalar} } */
>> +  svdot_fpm (f16, f8, f, fpm); /* { dg-error {passing 'mfloat8_t' {aka 
>> '__mfp8'} to argument 3 of 'svdot_fpm', which expects 'svmfloat8_t'} } */
>> +  svdot_fpm (pg, f8, f8, fpm); /* { dg-error {'svdot_fpm' has no form that 
>> takes 'svbool_t' and 'svmfloat8_t' arguments} } */
>> +  svdot_fpm (u8, f8, f8, fpm); /* { dg-error {'svdot_fpm' has no form that 
>> takes 'svuint8_t' and 'svmfloat8_t' arguments} } */
>> +  svdot_fpm (u16, f8, f8, fpm); /* { dg-error {'svdot_fpm' has no form that 
>> takes 'svuint16_t' and 'svmfloat8_t' arguments} } */
>> +  svdot_fpm (f64, f8, f8, fpm); /* { dg-error {'svdot_fpm' has no form that 
>> takes 'svfloat64_t' and 'svmfloat8_t' arguments} } */
>> +  svdot_fpm (f16, 0, f8, fpm); /* { dg-error {passing 'int' to argument 2 
>> of 'svdot_fpm', which expects 'svmfloat8_t'} } */
>> +  svdot_fpm (f16, f16, f8, fpm); /* { dg-error {passing 'svfloat16_t' to 
>> argument 2 of 'svdot_fpm', which expects 'svmfloat8_t'} } */
>> +  svdot_fpm (f16, f8, 0, fpm); /* { dg-error {passing 'int' to argument 3 
>> of 'svdot_fpm', which expects 'svmfloat8_t'} } */
>> +  svdot_fpm (f16, f8, f16, fpm); /* { dg-error {passing 'svfloat16_t' to 
>> argument 3 of 'svdot_fpm', which expects 'svmfloat8_t'} } */
>> +  svdot_fpm (f16, f8, f8, f8); /* { dg-error {passing 'svmfloat8_t' to 
>> argument 4 of 'svdot_fpm', which expect

[PATCH v2 3/3] arm, mve: Detect uses of vctp_vpr_generated inside subregs

2024-11-29 Thread Andre Vieira

Fix a problem where the analysis in 'arm_attempt_dlstp_transform' failed to
detect uses of vctp_vpr_generated because the use was inside a SUBREG, which
rtx_equal_p does not catch.  Using reg_overlap_mentioned_p is much more
robust.
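
To make the difference concrete, here is a toy model in C++. It does not use GCC's real RTL types; `Rtx`, `equal_p` and `overlap_mentioned_p` are hypothetical stand-ins, meant only to show why a structural equality test misses a register wrapped in a SUBREG while an overlap test does not.

```cpp
// Toy stand-in for an RTL expression: either a register or a subreg of one.
struct Rtx {
  enum Kind { REG, SUBREG } kind;
  int regno;  // register number (for SUBREG: that of the inner register)
};

// Structural equality: a SUBREG of register N is not "equal" to register N.
inline bool equal_p(const Rtx &a, const Rtx &b)
{
  return a.kind == b.kind && a.regno == b.regno;
}

// Overlap check: true when both expressions touch the same register,
// whether or not either one is wrapped in a SUBREG.
inline bool overlap_mentioned_p(const Rtx &reg, const Rtx &in)
{
  return reg.regno == in.regno;
}
```

In this model a use such as `Rtx{Rtx::SUBREG, 7}` is missed by `equal_p` against `Rtx{Rtx::REG, 7}` but caught by `overlap_mentioned_p`.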

gcc/ChangeLog:

PR target/117814
* gcc/config/arm/arm.cc (arm_attempt_dlstp_transform): Use
reg_overlap_mentioned_p instead of rtx_equal_p to detect uses of
vctp_vpr_generated inside subregs.

gcc/testsuite/ChangeLog:

PR target/117814
* gcc.target/arm/mve/dlstp-invalid-asm.c (test10): Renamed to...
(test10a): ... this.
(test10b): Variation of test10a with a small change to trigger wrong
codegen.
---
 gcc/config/arm/arm.cc |  3 +-
 .../gcc.target/arm/mve/dlstp-invalid-asm.c| 37 ++-
 2 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 7292fddef80..7f82fb94a56 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -35847,7 +35847,8 @@ arm_attempt_dlstp_transform (rtx label)
 	  df_ref insn_uses = NULL;
 	  FOR_EACH_INSN_USE (insn_uses, insn)
 	  {
-	if (rtx_equal_p (vctp_vpr_generated, DF_REF_REG (insn_uses)))
+	if (reg_overlap_mentioned_p (vctp_vpr_generated,
+	 DF_REF_REG (insn_uses)))
 	  {
 		end_sequence ();
 		return 1;
diff --git a/gcc/testsuite/gcc.target/arm/mve/dlstp-invalid-asm.c b/gcc/testsuite/gcc.target/arm/mve/dlstp-invalid-asm.c
index 26df2d30523..eb0782ebd0d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/dlstp-invalid-asm.c
+++ b/gcc/testsuite/gcc.target/arm/mve/dlstp-invalid-asm.c
@@ -127,8 +127,15 @@ void test9 (int32_t *a, int32_t *b, int32_t *c, int n)
 }
 }
 
-/* Using a VPR that gets re-generated within the loop.  */
-void test10 (int32_t *a, int32_t *b, int32_t *c, int n)
+/* Using a VPR that gets re-generated within the loop.  Even though we
+   currently reject such loops, it would be possible to dlstp transform this
+   specific loop, as long as we make sure that the first vldrwq_z mask would
+   either:
+   * remain the same as its mask in the first iteration,
+   * become the same as the loop mask after the first iteration,
+   * become all ones, since the dlstp would then mask it the same as the loop
+   mask.  */
+void test10a (int32_t *a, int32_t *b, int32_t *c, int n)
 {
   mve_pred16_t p = vctp32q (n);
   while (n > 0)
@@ -145,6 +152,32 @@ void test10 (int32_t *a, int32_t *b, int32_t *c, int n)
 }
 }
 
+/* Using a VPR that gets re-generated within the loop, the difference between
+   this test and test10a is to make sure the two vctp calls are never the same,
+   this leads to slightly different codegen in some cases triggering the issue
+   in a different way.   This loop too would be OK to dlstp transform as long
+   as we made sure that the first vldrwq_z mask would either:
+   * remain the same as its mask in the first iteration,
+   * become the same as the loop mask after the first iteration,
+   * become all ones, since the dlstp would then mask it the same as the loop
+   mask.  */
+void test10b (int32_t *a, int32_t *b, int32_t *c, int n)
+{
+  mve_pred16_t p = vctp32q (n-4);
+  while (n > 0)
+{
+  int32x4_t va = vldrwq_z_s32 (a, p);
+  p = vctp32q (n);
+  int32x4_t vb = vldrwq_z_s32 (b, p);
+  int32x4_t vc = vaddq_x_s32 (va, vb, p);
+  vstrwq_p_s32 (c, vc, p);
+  c += 4;
+  a += 4;
+  b += 4;
+  n -= 4;
+}
+}
+
 /* Using vctp32q_m instead of vctp32q.  */
 void test11 (int32_t *a, int32_t *b, int32_t *c, int n, mve_pred16_t p0)
 {


Re: [PATCH v3] arm: [MVE intrinsics] Fix support for predicate constants [PR target/114801]

2024-11-29 Thread Christophe Lyon
Hi,

On Mon, 25 Nov 2024 at 21:08, Christophe Lyon
 wrote:
>
> In this PR, we have to handle a case where MVE predicates are supplied
> as a const_int, where individual predicates have illegal boolean
> values (such as 0xc for a 4-bit boolean predicate).  To avoid the ICE,
> fix the constant (any non-zero value is converted to all 1s) and emit
> a warning.
>
> On MVE, V8BI and V4BI multi-bit masks are interpreted byte-by-byte at
> instruction level, but end-users should describe lanes rather than
> bytes (so all bytes of a true-predicated lane should be '1'), see
> https://developer.arm.com/documentation/101028/0012/14--M-profile-Vector-Extension--MVE--intrinsics.
>
> Since gen_lowpart can ICE on a subreg, we force predicates in a subreg
> into a reg, after removing subreg of the same size as the target
> (HImode) which would be made redundant by gen_lowpart and confuse the
> DLSTP optimization.
>
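
As a rough illustration of the canonicalization described in the quoted text (any non-zero bit group becomes all ones), here is a standalone C++ sketch. The function name `canonicalize_pred` and the bit masks are assumptions chosen for this example; they are not copied from the patch.

```cpp
#include <cstdint>

// Hypothetical helper, not the GCC implementation: spread any set bit of a
// 16-bit MVE predicate constant across its whole lane group.
// bits_per_lane is 2 (V8BI, 16-bit lanes) or 4 (V4BI, 32-bit lanes).
constexpr std::uint16_t canonicalize_pred(std::uint16_t x, int bits_per_lane)
{
  std::uint32_t xi = x;
  // Make each aligned pair of bits uniform: OR each bit with its neighbour.
  xi |= ((xi & 0x5555u) << 1) | ((xi & 0xaaaau) >> 1);
  if (bits_per_lane == 4)
    // Make each aligned nibble uniform: OR each pair with its neighbour pair.
    xi |= ((xi & 0x3333u) << 2) | ((xi & 0xccccu) >> 2);
  return static_cast<std::uint16_t>(xi & 0xffffu);
}
```

Under these assumptions, `canonicalize_pred(0x00CC, 4)` yields `0x00FF`, which is consistent with the constant change made to pr108443.c in the quoted patch.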

I forgot to mention that if OK for trunk, I'd like to backport this to gcc-14.

Thanks,

Christophe

> 2024-11-20  Christophe Lyon  
> Jakub Jelinek  
>
> PR target/114801
> gcc/
> * config/arm/arm-mve-builtins.cc
> (function_expander::add_input_operand): Handle CONST_INT
> predicates.
>
> gcc/testsuite/
> * gcc.target/arm/mve/pr108443.c: Update predicate constant.
> * gcc.target/arm/mve/pr114801.c: New test.
> ---
>  gcc/config/arm/arm-mve-builtins.cc  | 37 ++-
>  gcc/testsuite/gcc.target/arm/mve/pr108443.c |  4 +--
>  gcc/testsuite/gcc.target/arm/mve/pr114801.c | 39 +
>  3 files changed, 77 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/pr114801.c
>
> diff --git a/gcc/config/arm/arm-mve-builtins.cc 
> b/gcc/config/arm/arm-mve-builtins.cc
> index 255aed25600..5ff32ce06b7 100644
> --- a/gcc/config/arm/arm-mve-builtins.cc
> +++ b/gcc/config/arm/arm-mve-builtins.cc
> @@ -2352,7 +2352,42 @@ function_expander::add_input_operand (insn_code icode, 
> rtx x)
>mode = GET_MODE (x);
>  }
>else if (VALID_MVE_PRED_MODE (mode))
> -x = gen_lowpart (mode, x);
> +{
> +  if (CONST_INT_P (x) && (mode == V8BImode || mode == V4BImode))
> +   {
> + /* In V8BI or V4BI each element has 2 or 4 bits, if those bits 
> aren't
> +all the same, gen_lowpart might ICE.  Canonicalize all the 2 or 4
> +bits to all ones if any of them is non-zero.  V8BI and V4BI
> +multi-bit masks are interpreted byte-by-byte at instruction 
> level,
> +but such constants should describe lanes, rather than bytes.  See
> +
> https://developer.arm.com/documentation/101028/0012/14--M-profile-Vector-Extension--MVE--intrinsics.
>   */
> + unsigned HOST_WIDE_INT xi = UINTVAL (x);
> + xi |= ((xi & 0x) << 1) | ((xi & 0x) >> 1);
> + if (mode == V4BImode)
> +   xi |= ((xi & 0x) << 2) | ((xi & 0x) >> 2);
> + if (xi != UINTVAL (x))
> +   inform (location, "constant predicate argument %d (%wx) does"
> +   " not map to %d lane numbers, converted to %wx",
> +   opno, UINTVAL (x) & 0x, mode == V8BImode ? 8 : 4,
> +   xi & 0x);
> +
> + x = gen_int_mode (xi, HImode);
> +   }
> +  else if (SUBREG_P (x))
> +   {
> + /* Already of the right size, drop the subreg which will be made
> +redundant by gen_lowpart below.  */
> + if (GET_MODE_SIZE (GET_MODE (x)) == GET_MODE_SIZE (HImode)
> + || SUBREG_BYTE (x) == 0)
> +   x = SUBREG_REG (x);
> +
> + /* gen_lowpart on a SUBREG can ICE.  */
> + if (gen_lowpart_common (mode, x) == 0)
> +   x = force_reg (GET_MODE (x), x);
> +   }
> +
> +  x = gen_lowpart (mode, x);
> +}
>
>m_ops.safe_grow (m_ops.length () + 1, true);
>create_input_operand (&m_ops.last (), x, mode);
> diff --git a/gcc/testsuite/gcc.target/arm/mve/pr108443.c 
> b/gcc/testsuite/gcc.target/arm/mve/pr108443.c
> index c5fbfa4a1bb..0c0e2dd6eb8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/pr108443.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/pr108443.c
> @@ -7,8 +7,8 @@
>  void
>  __attribute__ ((noipa)) partial_write_cst (uint32_t *a, uint32x4_t v)
>  {
> -  vstrwq_p_u32 (a, v, 0x00CC);
> +  vstrwq_p_u32 (a, v, 0x00FF);
>  }
>
> -/* { dg-final { scan-assembler {mov\tr[0-9]+, #204} } } */
> +/* { dg-final { scan-assembler {mov\tr[0-9]+, #255} } } */
>
> diff --git a/gcc/testsuite/gcc.target/arm/mve/pr114801.c 
> b/gcc/testsuite/gcc.target/arm/mve/pr114801.c
> new file mode 100644
> index 000..d051e309d0b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/pr114801.c
> @@ -0,0 +1,39 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-options "-O2" } */
> +/* { dg-add-options arm_v8_1m_mve } */
> +/* { dg-final { check-function-bodies "**" "" ""

Re: [PATCH v5 5/5] aarch64: add SVE2 FP8DOT2 and FP8DOT4 intrinsics

2024-11-29 Thread Claudio Bantaloukas




On 11/29/2024 2:15 PM, Kyrylo Tkachov wrote:




On 29 Nov 2024, at 13:00, Richard Sandiford  wrote:

Thanks for the update!

Claudio Bantaloukas  writes:

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2a4f016e2df..f7440113570 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -21957,6 +21957,18 @@ Enable the fp8 (8-bit floating point) multiply 
accumulate extension.
@item ssve-fp8fma
Enable the fp8 (8-bit floating point) multiply accumulate extension in streaming
mode.
+@item fp8dot4
+Enable the fp8 (8-bit floating point) to single-precision 4-way dot product
+extension.
+@item ssve-fp8dot4
+Enable the fp8 (8-bit floating point) to single-precision 4-way dot product
+extension in streaming mode.
+@item fp8dot2
+Enable the fp8 (8-bit floating point) o half-precision 2-way dot product
+extension.


typo: s/o/to/ (and below).

Since the change is so trivial, I made it locally, tweaked the ordering
of the svcvt entries in patch 3, and fixed some whitespace issues that
git am was complaining about.  Push to trunk with those changes.


Thanks for the patch Claudio!
One thing I just noticed (sorry for not spotting it earlier) is the cpuinfo 
strings in the aarch64-option-extensions.def file for the new extensions.
I don’t think they match up with what the Linux kernel would print in 
/proc/cpuinfo.
Could you have another look at them and the page at:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/kernel/cpuinfo.c#n137
and make sure GCC expects the right values? It could be that for some of these 
features we may need to expect two or more strings (like “paca pacg” for pauth).


Will do, thank you for the heads up!
Cheers,
Claudio


Thanks,
Kyrill




Now that you've had at least two series applied, could you follow the
process on https://gcc.gnu.org/gitwrite.html to get write access for
future patches?  (I'll sponsor.)

Thanks,
Richard


+@item ssve-fp8dot2
+Enable the fp8 (8-bit floating point) o half-precision 2-way dot product
+extension in streaming mode.
@item faminmax
Enable the Floating Point Absolute Maximum/Minimum extension.
@item sve-b16b16
diff --git 
a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c
new file mode 100644
index 000..9ad789a8ad2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+
+#include 
+
+#pragma GCC target ("arch=armv8.2-a+sve2+fp8dot2")
+
+void
+test (svfloat16_t f16, svmfloat8_t f8, fpm_t fpm,
+svbool_t pg, svuint8_t u8, svuint16_t u16, svint32_t s32,
+svbfloat16_t bf16, svfloat32_t f32, svfloat64_t f64, mfloat8_t f)
+{
+  svdot_fpm (f16, f8, f8, fpm);
+  svdot_fpm (f32, f8, f8, fpm);
+
+  svdot_fpm (f16); /* { dg-error {too few arguments to function 'svdot_fpm'} } 
*/
+  svdot_fpm (f16, f8); /* { dg-error {too few arguments to function 
'svdot_fpm'} } */
+  svdot_fpm (f16, f8, f8); /* { dg-error {too few arguments to function 
'svdot_fpm'} } */
+  svdot_fpm (f8, f8, fpm); /* { dg-error {too few arguments to function 
'svdot_fpm'} } */
+  svdot_fpm (f16, f8, fpm); /* { dg-error {too few arguments to function 
'svdot_fpm'} } */
+  svdot_fpm (f16, f8, f8, fpm, 0); /* { dg-error {too many arguments to 
function 'svdot_fpm'} } */
+
+  svdot_fpm (0, f8, f8, fpm); /* { dg-error {passing 'int' to argument 1 of 
'svdot_fpm', which expects an SVE type rather than a scalar} } */
+  svdot_fpm (f16, f8, f, fpm); /* { dg-error {passing 'mfloat8_t' {aka 
'__mfp8'} to argument 3 of 'svdot_fpm', which expects 'svmfloat8_t'} } */
+  svdot_fpm (pg, f8, f8, fpm); /* { dg-error {'svdot_fpm' has no form that 
takes 'svbool_t' and 'svmfloat8_t' arguments} } */
+  svdot_fpm (u8, f8, f8, fpm); /* { dg-error {'svdot_fpm' has no form that 
takes 'svuint8_t' and 'svmfloat8_t' arguments} } */
+  svdot_fpm (u16, f8, f8, fpm); /* { dg-error {'svdot_fpm' has no form that 
takes 'svuint16_t' and 'svmfloat8_t' arguments} } */
+  svdot_fpm (f64, f8, f8, fpm); /* { dg-error {'svdot_fpm' has no form that 
takes 'svfloat64_t' and 'svmfloat8_t' arguments} } */
+  svdot_fpm (f16, 0, f8, fpm); /* { dg-error {passing 'int' to argument 2 of 
'svdot_fpm', which expects 'svmfloat8_t'} } */
+  svdot_fpm (f16, f16, f8, fpm); /* { dg-error {passing 'svfloat16_t' to 
argument 2 of 'svdot_fpm', which expects 'svmfloat8_t'} } */
+  svdot_fpm (f16, f8, 0, fpm); /* { dg-error {passing 'int' to argument 3 of 
'svdot_fpm', which expects 'svmfloat8_t'} } */
+  svdot_fpm (f16, f8, f16, fpm); /* { dg-error {passing 'svfloat16_t' to 
argument 3 of 'svdot_fpm', which expects 'svmfloat8_t'} } */
+  svdot_fpm (f16, f8, f8, f8); /* { dg-error {passing 'svmfloat8_t' to 
argument 4 of 'svdot_fpm', which expects 'uint64_t'} } */
+}
diff --git 
a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c
 
b/gcc/testsuite/gcc.target/aarch64/sve/acle

Re: [PATCH] aarch64: Fix bootstrap build failure due to missing header

2024-11-29 Thread Yury Khrustalev
Hi Kyrill,

On Fri, Nov 29, 2024 at 02:06:17PM +, Kyrylo Tkachov wrote:
> Hi Yury,
> 
> > On 29 Nov 2024, at 13:57, Yury Khrustalev  wrote:
> > 
> > > Inclusion of "arm_acle.h" would require stdint.h, which may
> > > not be available during the first stage of cross-compilation.
> 
> Do you mean when trying to build a big-endian cross-compiler or something?
> The change seems harmless to me but the subject line says it’s fixing a 
> bootstrap failure but the text here says cross-compilation.
> So I’m trying to understand what’s going wrong.
> 
> Thanks,
> Kyrill

The build failure I refer to arose when cross-building GCC for the
aarch64-none-linux-gnu target (on any supporting host) using 3-stage
bootstrap build process when we build native compiler from source
and when we use the "--with-build-sysroot=..." configure flag: GCC
fails to compile libgcc due to missing header that has not been
installed yet.

This could be worked around but it's better to fix the issue altogether.

Would you recommend to re-phrase the commit message?

Thanks,
Yury



Re: [PATCH] c++, v3: Implement C++26 P3176R1 - The Oxford variadic comma

2024-11-29 Thread Jason Merrill

On 11/29/24 8:55 AM, Jakub Jelinek wrote:

On Fri, Nov 29, 2024 at 08:36:21AM -0500, Jason Merrill wrote:

This should be either "omission of ',' before" or "omitting ',' before"

But not "omitting of" :-)


Agreed.


Picked omission of.


I also might say "varargs" instead of "variadic", since the latter could
also refer to variadic templates?


So like this (just in the diagnostic message and description (both docs and
option help)?


OK.


Or do you want varargs also in the option name, and perhaps names of the new
testcases, too?

2024-11-29  Jakub Jelinek  

gcc/c-family/
* c.opt: Implement C++26 P3176R1 - The Oxford variadic comma.
(Wdeprecated-variadic-comma-omission): New option.
* c.opt.urls: Regenerate.
* c-opts.cc (c_common_post_options): Default to
-Wdeprecated-variadic-comma-omission for C++26 or -Wpedantic.
gcc/cp/
* parser.cc (cp_parser_parameter_declaration_clause): Emit
-Wdeprecated-variadic-comma-omission warnings.
gcc/
* doc/invoke.texi (-Wdeprecated-variadic-comma-omission): Document.
gcc/testsuite/
* g++.dg/cpp26/variadic-comma1.C: New test.
* g++.dg/cpp26/variadic-comma2.C: New test.
* g++.dg/cpp26/variadic-comma3.C: New test.
* g++.dg/cpp26/variadic-comma4.C: New test.
* g++.dg/cpp26/variadic-comma5.C: New test.
* g++.dg/cpp1z/fold10.C: Expect a warning for C++26.
* g++.dg/ext/attrib33.C: Likewise.
* g++.dg/cpp1y/lambda-generic-variadic19.C: Likewise.
* g++.dg/cpp2a/lambda-generic10.C: Likewise.
* g++.dg/cpp0x/lambda/lambda-const3.C: Likewise.
* g++.dg/cpp0x/variadic164.C: Likewise.
* g++.dg/cpp0x/variadic17.C: Likewise.
* g++.dg/cpp0x/udlit-args-neg.C: Likewise.
* g++.dg/cpp0x/variadic28.C: Likewise.
* g++.dg/cpp0x/gen-attrs-33.C: Likewise.
* g++.dg/cpp23/explicit-obj-diagnostics3.C: Likewise.
* g++.old-deja/g++.law/operators15.C: Likewise.
* g++.old-deja/g++.mike/p811.C: Likewise.
* g++.old-deja/g++.mike/p12306.C (printf): Add , before ... .
* g++.dg/analyzer/fd-bind-pr107783.C (bind): Likewise.
* g++.dg/cpp0x/vt-65790.C (printf): Likewise.
libstdc++-v3/
* include/std/functional (_Bind_check_arity): Add , before ... .
* include/bits/refwrap.h (_Mem_fn_traits, _Weak_result_type_impl):
Likewise.
* include/tr1/type_traits (is_function): Likewise.
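
For readers unfamiliar with P3176R1, the two spellings at issue look roughly like this. This is a minimal sketch: `sum_ints` is a made-up name, and the deprecated spelling appears only in a comment so that the block compiles without triggering the new warning.

```cpp
#include <cstdarg>

// Deprecated in C++26 by P3176R1 (ellipsis without a preceding comma):
//   int sum_ints(int count...);
// Preferred spelling, with the comma before the ellipsis:
int sum_ints(int count, ...)
{
  va_list ap;
  va_start(ap, count);
  int total = 0;
  // Read `count` int arguments from the variable argument list.
  for (int i = 0; i < count; ++i)
    total += va_arg(ap, int);
  va_end(ap);
  return total;
}
```

Both spellings declare the same function type; only the comma-less form is what the new option is meant to diagnose.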

--- gcc/c-family/c.opt.jj   2024-11-22 19:52:19.477579338 +0100
+++ gcc/c-family/c.opt  2024-11-25 16:22:11.325028058 +0100
@@ -672,6 +672,10 @@ Wdeprecated-non-prototype
  C ObjC Var(warn_deprecated_non_prototype) Init(-1) Warning
  Warn about calls with arguments to functions declared without parameters.
  
+Wdeprecated-variadic-comma-omission

+C++ ObjC++ Var(warn_deprecated_variadic_comma_omission) Warning
+Warn about deprecated omission of comma before ... in varargs function 
declaration.
+
  Wdesignated-init
  C ObjC Var(warn_designated_init) Init(1) Warning
  Warn about positional initialization of structs requiring designated 
initializers.
--- gcc/c-family/c.opt.urls.jj  2024-11-18 21:59:35.872150269 +0100
+++ gcc/c-family/c.opt.urls 2024-11-25 17:38:25.396693802 +0100
@@ -310,6 +310,9 @@ UrlSuffix(gcc/C_002b_002b-Dialect-Option
  Wdeprecated-non-prototype
  UrlSuffix(gcc/Warning-Options.html#index-Wdeprecated-non-prototype)
  
+Wdeprecated-variadic-comma-omission

+UrlSuffix(gcc/C_002b_002b-Dialect-Options.html#index-Wdeprecated-variadic-comma-omission)
+
  Wdesignated-init
  UrlSuffix(gcc/Warning-Options.html#index-Wdesignated-init)
  
--- gcc/c-family/c-opts.cc.jj	2024-11-23 13:00:28.188030262 +0100

+++ gcc/c-family/c-opts.cc  2024-11-25 16:23:36.569829016 +0100
@@ -1051,6 +1051,11 @@ c_common_post_options (const char **pfil
   warn_deprecated_literal_operator,
   deprecated_in (cxx23));
  
+  /* -Wdeprecated-variadic-comma-omission is enabled by default in C++26.  */

+  SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+  warn_deprecated_variadic_comma_omission,
+  deprecated_in (cxx26));
+
/* -Wtemplate-id-cdtor is enabled by default in C++20.  */
SET_OPTION_IF_UNSET (&global_options, &global_options_set,
   warn_template_id_cdtor,
--- gcc/cp/parser.cc.jj 2024-11-23 13:00:29.060017680 +0100
+++ gcc/cp/parser.cc2024-11-25 17:02:24.551059305 +0100
@@ -25667,6 +25667,16 @@ cp_parser_parameter_declaration_clause (
   omitted.  */
else if (token->type == CPP_ELLIPSIS)
  {
+  /* Deprecated by P3176R1 in C++26.  */
+  if (warn_deprecated_variadic_comma_omission)
+   {
+ gcc_rich_location richloc (token->location);
+ richloc.add_fixit_insert_before (", ");
+ warning_at (&richloc, OPT_Wdeprecated_variadic_comma_omission,
+ "omission of %<,%> before varargs %<...%> is "
+  

Re: [PATCH] aarch64: Fix bootstrap build failure due to missing header

2024-11-29 Thread Kyrylo Tkachov


> On 29 Nov 2024, at 14:25, Yury Khrustalev  wrote:
> 
> Hi Kyrill,
> 
> On Fri, Nov 29, 2024 at 02:06:17PM +, Kyrylo Tkachov wrote:
>> Hi Yury,
>> 
>>> On 29 Nov 2024, at 13:57, Yury Khrustalev  wrote:
>>> 
>>> Inclusion of "arm_acle.h" would require stdint.h, which may
>>> not be available during the first stage of cross-compilation.
>> 
>> Do you mean when trying to build a big-endian cross-compiler or something?
>> The change seems harmless to me but the subject line says it’s fixing a 
>> bootstrap failure but the text here says cross-compilation.
>> So I’m trying to understand what’s going wrong.
>> 
>> Thanks,
>> Kyrill
> 
> The build failure I refer to arose when cross-building GCC for the
> aarch64-none-linux-gnu target (on any supporting host) using 3-stage
> bootstrap build process when we build native compiler from source
> and when we use the "--with-build-sysroot=..." configure flag: GCC
> fails to compile libgcc due to missing header that has not been
> installed yet.
> 
> This could be worked around but it's better to fix the issue altogether.
> 
> Would you recommend to re-phrase the commit message?


Thanks for explaining. Yes, I think describing the use case a bit more in the 
commit message like you just did would be useful.
Ok with that change.
Kyrill

> 
> Thanks,
> Yury
> 



[PATCH] c++: Make sure fold_sizeof_expr returns the correct type [PR117775]

2024-11-29 Thread Simon Martin
We currently ICE on the following code, which is valid under
-Wno-pointer-arith:

=== cut here ===
int main() {
  decltype( [](auto) { return sizeof(void); } ) x;
  return x.operator()(0);
}
=== cut here ===

The problem is that "fold_sizeof_expr (sizeof(void))" returns
size_one_node, which has a different TREE_TYPE from that of the sizeof
expression; this later triggers an assert in cxx_eval_store_expression.

This patch makes sure that fold_sizeof_expr always returns a tree with
the type requested.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/117775

gcc/cp/ChangeLog:

* decl.cc (fold_sizeof_expr): Make sure the folded result has
the requested TREE_TYPE.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constexpr-117775.C: New test.

---
 gcc/cp/decl.cc|  1 +
 gcc/testsuite/g++.dg/cpp2a/constexpr-117775.C | 13 +
 2 files changed, 14 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-117775.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 80485f0a428..fbe1407a2d2 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -11686,6 +11686,7 @@ fold_sizeof_expr (tree t)
false, false);
   if (r == error_mark_node)
 r = size_one_node;
+  r = cp_fold_convert (TREE_TYPE (t), r);
   return r;
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-117775.C 
b/gcc/testsuite/g++.dg/cpp2a/constexpr-117775.C
new file mode 100644
index 000..59fc0d332b9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-117775.C
@@ -0,0 +1,13 @@
+// PR c++/117775
+// Check that we don't ICE and have sizeof(void)==1 under -Wno-pointer-arith
+// { dg-do run { target c++20 } }
+// { dg-additional-options "-Wno-pointer-arith" }
+
+int main() {
+  struct why :
+decltype( [](auto) {
+   return sizeof(void);
+ })
+  {} x;
+  return 1 - x.operator()(0);
+}
-- 
2.44.0



Re: [PATCH] Introduce feeble_inline attribute [PR93008]

2024-11-29 Thread Jason Merrill

On 11/28/24 7:57 AM, Jan Hubicka wrote:


I think a 4 state flag { never_inline, default, auto_inline, always_inline }
would be fine.  The question is how to call the macro(s) and values
and how to merge those from different decls and what we do currently
e.g. for noinline, always_inline, on the same or on different decls
of the same function.

I was also thinking a bit of the name, but it seemed too late to jump in
:)

Generally inliner has the following modes
  - noinline
  - conservative inlining (driven by -auto limits)


inline_less? inline_basic?


  - aggressive inlining (driven by -single limits)


inline_more?


  - disregarding inline limits (inline when you can, former extern inline
of GNU C)


inline_even_more?


  - always inline (error when you cannot inline, at least sometimes -
we included a design problem allowing always inline functions to have
address taken, be recursive or be exported since historically
always_inline was upgraded from disregarding to stronger
interpretation)
Moreover meaning of conservative and aggressive is sensitive to
optimization level and also can be overwritten by inline hints.

So if we go for multi stage flag we probably want to have those 5 levels
+ default option.  There is also PR about switching inline limits (O2
wrt O3) which I am not sure how to fit into this picture.

-auto and -single names are historical and not very good. -single
predates me and -auto is my fault.
Perhaps the flags to switch between conservative and aggressive inlining
could be called something like inline_conservatively and inline_aggressively
or conservative_inline and aggressive_inline. feeble_inline is the
conservative option...
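
The five levels plus a default listed above could be written down as a multi-state flag along these lines. This is only a sketch: every enumerator name is hypothetical, the thread has not settled on any of them, and the question of how to merge values from different declarations is deliberately left out.

```cpp
#include <string>

// Hypothetical names; none of these are agreed on in the thread.
enum class InlineLevel {
  Default,          // no explicit request on the declaration
  Never,            // noinline
  Conservative,     // driven by the "-auto" limits
  Aggressive,       // driven by the "-single" limits
  DisregardLimits,  // inline whenever possible
  Always            // error if inlining is not possible
};

// Human-readable description of each level.
inline std::string describe(InlineLevel level)
{
  switch (level) {
  case InlineLevel::Default:         return "default";
  case InlineLevel::Never:           return "noinline";
  case InlineLevel::Conservative:    return "conservative (-auto limits)";
  case InlineLevel::Aggressive:      return "aggressive (-single limits)";
  case InlineLevel::DisregardLimits: return "disregard inline limits";
  case InlineLevel::Always:          return "always inline";
  }
  return "unknown";
}
```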




RE: [RFC] PR81358: Enable automatic linking of libatomic

2024-11-29 Thread Prathamesh Kulkarni


> -----Original Message-----
> From: Joseph Myers 
> Sent: 28 November 2024 05:45
> To: Prathamesh Kulkarni 
> Cc: Xi Ruoyao ; Matthew Malcomson
> ; gcc-patches@gcc.gnu.org
> Subject: RE: [RFC] PR81358: Enable automatic linking of libatomic
> 
> 
> On Tue, 19 Nov 2024, Prathamesh Kulkarni wrote:
> 
> > +#ifdef USE_LD_AS_NEEDED
> > +#define LINK_LIBATOMIC_SPEC "%{!fno-link-libatomic:"
> LD_AS_NEEDED_OPTION \
> > + " -latomic " LD_NO_AS_NEEDED_OPTION "} "
> > +#else
> > +#define LINK_LIBATOMIC_SPEC ""
> > +#endif
> 
> I'd expect conditionals to be set up so that, if libatomic is not
> built (typically because an unsupported target OS resulted in
> UNSUPPORTED=1 being set in libatomic/configure.tgt), no attempt is
> ever made to link it in.  (So in that case, users might get undefined
> references to __atomic_* and it would be their responsibility to
> provide a board support package that links with appropriate
> definitions of those symbols.)
Hi Joseph,
Thanks for the suggestions. To check whether libatomic is going to be built for
the target, the patch exports TARGET_CONFIGDIRS from Makefile.tpl; gcc/configure.ac
then checks whether libatomic is present in TARGET_CONFIGDIRS and, if so, defines
TARGET_PROVIDES_LIBATOMIC to 1.
Does that look OK?
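
As a sanity check of the LINK_LIBATOMIC_SPEC definition quoted above, the sketch below shows how it expands by plain string-literal concatenation. The values given to LD_AS_NEEDED_OPTION and LD_NO_AS_NEEDED_OPTION, and the fact that USE_LD_AS_NEEDED is defined, are assumptions chosen for illustration (the real ones come from the GCC driver and depend on the linker); link_libatomic_spec is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

// Assumed values, for illustration only.
#define USE_LD_AS_NEEDED 1
#define LD_AS_NEEDED_OPTION "--as-needed"
#define LD_NO_AS_NEEDED_OPTION "--no-as-needed"

// Same shape as the definition quoted from the patch.
#ifdef USE_LD_AS_NEEDED
#define LINK_LIBATOMIC_SPEC "%{!fno-link-libatomic:" LD_AS_NEEDED_OPTION \
  " -latomic " LD_NO_AS_NEEDED_OPTION "} "
#else
#define LINK_LIBATOMIC_SPEC ""
#endif

// Hypothetical helper that exposes the expanded spec string.
std::string link_libatomic_spec() { return LINK_LIBATOMIC_SPEC; }
```

In a spec string, %{!S:X} substitutes X only when the switch -S was not given, so passing -fno-link-libatomic drops the whole group.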
> 
> > diff --git a/libatomic/Makefile.am b/libatomic/Makefile.am
> 
> > +AM_CFLAGS = $(XCFLAGS) -fno-link-libatomic AM_CCASFLAGS =
> $(XCFLAGS)
> > +-fno-link-libatomic AM_LDFLAGS = $(XLDFLAGS) $(SECTION_LDFLAGS)
> > +$(OPT_LDFLAGS) -fno-link-libatomic
> 
> > diff --git a/libatomic/configure.ac b/libatomic/configure.ac
> 
> > +CFLAGS="$CFLAGS -fno-link-libatomic"
> 
> > +XCFLAGS="$XCFLAGS $XPCFLAGS -fno-link-libatomic"
> 
> I don't see any clear conceptual design here for where this flag
> should go.  It should only need to be added in one place, not three
> times.
> Adding to CFLAGS before the default is set in configure, and before
> save_CFLAGS is set, seems especially dubious, though maybe you avoid
> problems with losing the default CFLAGS setting if libatomic is always
> configured with CFLAGS set by the toplevel Makefile.
> 
> My expectation is that CFLAGS should not be modified until after
> save_CFLAGS is set, which should not be until after configure has
> executed the logic that sets a -g -O2 default.  Is there some problem
> with that ordering (e.g. configure tests that expect to link target
> programs but run as part of the same Autoconf macro invocation that
> also generates the logic to determine default values)?  Also, the
It seems that in configure, AC_PROG_CC expands to code that sets CFLAGS to "-g -O2"
and runs conftests with those CFLAGS, so any adjustment to CFLAGS made after
invoking AC_PROG_CC doesn't help.
In the attached patch, I simply moved the save_CFLAGS and CFLAGS settings before
the AC_PROG_CC invocation and added "-fno-link-libatomic" to CFLAGS there. That
seems to work, but I'm not sure whether it's the correct approach.

Patch passes bootstrap+test on aarch64-linux-gnu.

Thanks,
Prathamesh
> comment on save_CFLAGS
> says:
> 
> # In order to override CFLAGS_FOR_TARGET, all of our special flags go
> # in XCFLAGS.  But we need them in CFLAGS during configury.  So put
> them # in both places for now and restore CFLAGS at the end of config.
> 
> So if the option is set in CFLAGS itself during configure, that should
> be after save_CFLAGS is set, meaning only the setting in XCFLAGS is
> relevant for actually building libatomic.
> 
> Also, the new command-line option should be documented in invoke.texi.
> 
> --
> Joseph S. Myers
> josmy...@redhat.com

PR81358: Enable automatic linking of libatomic.

ChangeLog:
PR driver/81358
* Makefile.def: Add dependencies so libatomic is built before target
libraries are configured.
* Makefile.tpl: Export TARGET_CONFIGDIRS.
* configure.ac: Add libatomic to bootstrap_target_libs.
* Makefile.in: Regenerate.
* configure: Regenerate.

gcc/ChangeLog:
PR driver/81358
* common.opt: New option -flink-libatomic.
* gcc.cc (LINK_LIBATOMIC_SPEC): New macro.
* config/gnu-user.h (GNU_USER_TARGET_LINK_GCC_C_SEQUENCE_SPEC): Use
LINK_LIBATOMIC_SPEC.
* doc/invoke.texi: Document -flink-libatomic.
* configure.ac: Define TARGET_PROVIDES_LIBATOMIC.
* configure: Regenerate.
* config.in: Regenerate.

libatomic/ChangeLog:
PR driver/81358
* Makefile.am: Pass -fno-link-libatomic.
New rule all.
* configure.ac: Pass -fno-link-libatomic. 
* Makefile.in: Regenerate.
* configure: Regenerate.

Signed-off-by: Prathamesh Kulkarni 
Co-authored-by: Matthew Malcomson 

diff --git a/Makefile.def b/Makefile.def
index 19954e7d731..90899fa28cf 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -656,6 +656,26 @@ lang_env_dependencies = { module=libgcc; no_gcc=true; 
no_c=true; };
 // a dependency on libgcc for nati

[PATCH v2] aarch64: Fix build failure due to missing header

2024-11-29 Thread Yury Khrustalev
Including the "arm_acle.h" header in aarch64-unwind.h requires
stdint.h to be present and it may not be available during the
first stage of cross-compilation of GCC.

When cross-building GCC for the aarch64-none-linux-gnu target
(on any supported host) using the 3-stage bootstrap build
process, in which the native compiler is built from source, libgcc
fails to compile because the header has not been installed yet.

This could be worked around but it's better to fix the issue.

libgcc/ChangeLog:

* config/aarch64/aarch64-unwind.h (_CHKFEAT_GCS): Add.

---

Regression tested on aarch64-unknown-linux-gnu and no regressions have been 
found.
Is this OK for trunk?
Applies to fe29b03825c.

Thanks,
Yury

---
 libgcc/config/aarch64/aarch64-unwind.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/libgcc/config/aarch64/aarch64-unwind.h 
b/libgcc/config/aarch64/aarch64-unwind.h
index 85468f9685e..d11753a0e03 100644
--- a/libgcc/config/aarch64/aarch64-unwind.h
+++ b/libgcc/config/aarch64/aarch64-unwind.h
@@ -29,7 +29,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 
 #include "ansidecl.h"
 #include 
-#include <arm_acle.h>
 
 #define AARCH64_DWARF_REGNUM_RA_STATE 34
 #define AARCH64_DWARF_RA_STATE_MASK   0x1
@@ -180,7 +179,7 @@ aarch64_demangle_return_addr (struct _Unwind_Context 
*context,
 }
 
 /* GCS enable flag for chkfeat instruction.  */
-
+#define _CHKFEAT_GCS 1
 /* SME runtime function local to libgcc, streaming compatible
and preserves more registers than the base PCS requires, but
we don't rely on that here.  */
-- 
2.39.5



[PATCH] libstdc++: Hide TLS variables in `std::call_once`

2024-11-29 Thread LIU Hao


--
Best regards,
LIU Hao

From 78ae9cacdfea8bab4fcc8a18068ad30401eb588d Mon Sep 17 00:00:00 2001
From: LIU Hao 
Date: Fri, 29 Nov 2024 17:17:01 +0800
Subject: [PATCH] libstdc++: Hide TLS variables in `std::call_once`

This is a transitional change for PR80881, because on Windows, thread-local
variables can't be exported from a DLL.

This commit hides `__once_callable` and `__once_call` beneath a wrapper function,
but also leaves them external for backward compatibility.

Another purpose for this wrapper function is that, if in the future we decide
to fix PR66146, the changes will be local to source files.

libstdc++-v3/ChangeLog:
PR libstdc++/66146, target/80881
* include/std/mutex (once_flag::_M_use_callable): New function
(call_once): Duplicate from old one for TLS, and make it call
`_M_use_callable` instead
* src/c++11/mutex.cc (once_flag::_M_use_callable): New function
---
 libstdc++-v3/include/std/mutex  | 48 ++---
 libstdc++-v3/src/c++11/mutex.cc | 15 +++
 2 files changed, 35 insertions(+), 28 deletions(-)

diff --git a/libstdc++-v3/include/std/mutex b/libstdc++-v3/include/std/mutex
index 8dd9b23191fd..bc1cc4bb006e 100644
--- a/libstdc++-v3/include/std/mutex
+++ b/libstdc++-v3/include/std/mutex
@@ -818,44 +818,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 // for most targets this doesn't work correctly for exceptional executions.
 __gthread_once_t _M_once = __GTHREAD_ONCE_INIT;
 
+# ifdef _GLIBCXX_HAVE_TLS
+void
+_M_use_callable(void (*__call)(void*), void* __callable);
+# else
 struct _Prepare_execution;
+# endif
 
 template
   friend void
   call_once(once_flag& __once, _Callable&& __f, _Args&&... __args);
   };
 
-  /// @cond undocumented
 # ifdef _GLIBCXX_HAVE_TLS
-  // If TLS is available use thread-local state for the type-erased callable
-  // that is being run by std::call_once in the current thread.
-  extern __thread void* __once_callable;
-  extern __thread void (*__once_call)();
+  /// Invoke a callable and synchronize with other calls using the same flag
+  template
+void
+call_once(once_flag& __once, _Callable&& __f, _Args&&... __args)
+{
+  // Closure type that runs the function
+  auto __callable = [&] {
+ std::__invoke(std::forward<_Callable>(__f),
+   std::forward<_Args>(__args)...);
+  };
 
-  // RAII type to set up state for pthread_once call.
-  struct once_flag::_Prepare_execution
-  {
-template
-  explicit
-  _Prepare_execution(_Callable& __c)
-  {
-   // Store address in thread-local pointer:
-   __once_callable = std::__addressof(__c);
-   // Trampoline function to invoke the closure via thread-local pointer:
-   __once_call = [] { (*static_cast<_Callable*>(__once_callable))(); };
-  }
+  auto __call = [](void* __p) {
+ (*static_cast(__p))();
+  };
 
-~_Prepare_execution()
-{
-  // PR libstdc++/82481
-  __once_callable = nullptr;
-  __once_call = nullptr;
+  __once._M_use_callable(__call, &__callable);
 }
 
-_Prepare_execution(const _Prepare_execution&) = delete;
-_Prepare_execution& operator=(const _Prepare_execution&) = delete;
-  };
-
 # else
   // Without TLS use a global std::mutex and store the callable in a
   // global std::function.
@@ -892,8 +885,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _Prepare_execution(const _Prepare_execution&) = delete;
 _Prepare_execution& operator=(const _Prepare_execution&) = delete;
   };
-# endif
-  /// @endcond
 
   // This function is passed to pthread_once by std::call_once.
   // It runs __once_call() or __once_functor().
@@ -916,6 +907,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   if (int __e = __gthread_once(&__once._M_once, &__once_proxy))
__throw_system_error(__e);
 }
+# endif
 
 #else // _GLIBCXX_HAS_GTHREADS
 
diff --git a/libstdc++-v3/src/c++11/mutex.cc b/libstdc++-v3/src/c++11/mutex.cc
index bfeea538a7c3..5aea0e7b60e2 100644
--- a/libstdc++-v3/src/c++11/mutex.cc
+++ b/libstdc++-v3/src/c++11/mutex.cc
@@ -41,6 +41,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 __once_call();
   }
 
+  void
+  once_flag::_M_use_callable(void (*__call)(void*), void* __callable)
+  {
+auto once_p = make_pair(__call, __callable);
+__once_callable = &once_p;
+__once_call = [] {
+   auto p = static_cast(__once_callable);
+   (*p->first)(p->second);
+};
+
+// XXX pthread_once does not reset the flag if an exception is thrown.
+if (int __e = __gthread_once(&_M_once, &__once_proxy))
+  __throw_system_error(__e);
+  }
+
 #else // ! TLS
 
   // Explicit instantiation due to -fno-implicit-instantiation.
-- 
2.47.1
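
The type-erasure scheme used by the patch above can be illustrated in isolation. The following is a minimal sketch, not the libstdc++ code: use_callable is a hypothetical stand-in for once_flag::_M_use_callable, and the "once" synchronisation and the thread-local variables are omitted entirely. It only shows how a capturing closure is passed through a plain function pointer plus a void* argument.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for the out-of-line function. It sees only a
// function pointer and an opaque pointer, never the closure type.
inline void use_callable(void (*call)(void*), void* callable) {
  call(callable);
}

template <typename Callable, typename... Args>
void call_through_erasure(Callable&& f, Args&&... args) {
  // Closure that runs the user's function with its arguments.
  auto callable = [&] {
    std::forward<Callable>(f)(std::forward<Args>(args)...);
  };
  using Closure = decltype(callable);

  // Captureless trampoline: converts to void (*)(void*) and recovers the
  // closure type from the opaque pointer.
  auto call = [](void* p) { (*static_cast<Closure*>(p))(); };

  use_callable(call, &callable);
}
```

Because the trampoline has no captures, everything that depends on the template parameters stays in the header while the non-template function can live in a source file.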





Re: [PATCH] aarch64: Fix bootstrap build failure due to missing header

2024-11-29 Thread Yury Khrustalev
On Fri, Nov 29, 2024 at 02:31:34PM +, Kyrylo Tkachov wrote:
> 
> > Would you recommend to re-phrase the commit message?
> 
> Thanks for explaining. Yes, I think describing the use case a bit more in the 
> commit message like you just did would be useful.
> Ok with that change.
> Kyrill

Thanks, I've fixed in v2:
https://inbox.sourceware.org/gcc-patches/20241129144908.2259627-1-yury.khrusta...@arm.com/

Kind regards,
Yury



[PATCH v7 03/12] RISC-V: Add CRC expander to generate faster CRC.

2024-11-29 Thread Jeff Law
This is parts of patches #3 and #4 from Mariam's series.  Specifically 
it introduces an expander in the RISC-V backend that can generate a CRC 
using a clmul based sequence.  It also has support for exploiting zbkb 
if we need to reflect values.
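
For readers unfamiliar with the terminology, the following is a minimal bitwise sketch (not the clmul sequence the expander emits) of a bit-forward CRC, a reversed CRC, and the bit reflection that relates the two. It assumes CRC-8 with polynomial 0x07 (written without its leading 1) and a zero initial value; all function names are illustrative and are not the GCC ones.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reverse the order of the low `bits` bits of `value`.
inline std::uint32_t reflect(std::uint32_t value, unsigned bits) {
  std::uint32_t out = 0;
  for (unsigned i = 0; i < bits; ++i) {
    out = (out << 1) | (value & 1u);
    value >>= 1;
  }
  return out;
}

// Bit-forward CRC-8: data is consumed most significant bit first.
inline std::uint8_t crc8_forward(const unsigned char* data, std::size_t n,
                                 std::uint8_t poly) {
  std::uint8_t crc = 0;
  for (std::size_t i = 0; i < n; ++i) {
    crc ^= data[i];
    for (int b = 0; b < 8; ++b)
      crc = (crc & 0x80) ? static_cast<std::uint8_t>((crc << 1) ^ poly)
                         : static_cast<std::uint8_t>(crc << 1);
  }
  return crc;
}

// Reversed CRC-8: data is consumed least significant bit first, using the
// reflected polynomial.
inline std::uint8_t crc8_reversed(const unsigned char* data, std::size_t n,
                                  std::uint8_t poly) {
  const std::uint8_t rpoly = static_cast<std::uint8_t>(reflect(poly, 8));
  std::uint8_t crc = 0;
  for (std::size_t i = 0; i < n; ++i) {
    crc ^= data[i];
    for (int b = 0; b < 8; ++b)
      crc = (crc & 1) ? static_cast<std::uint8_t>((crc >> 1) ^ rpoly)
                      : static_cast<std::uint8_t>(crc >> 1);
  }
  return crc;
}
```

With a zero initial value, the reversed CRC of a message equals the reflection of the bit-forward CRC computed over the bit-reflected bytes, which is why a target that can reflect values cheaply can share one implementation for both variants.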


I'm explicitly not including most of the tests.  Essentially the 
execution tests are dependent on the generic CRC tests which were part 
of a later patch.  I don't want to introduce those generic CRC tests 
until after the detection/validation/optimization are integrated, 
otherwise they'll fail.  So the bulk of the tests won't be integrated 
until later.


This has been tested with rv32 and rv64 in my tester.  I've also 
bootstrapped & regression tested riscv64-linux-gnu with the full patch 
series, but not with this patch broken out (it's in the middle of a 
bootstrap cycle now and the cycle is ~26 hours on my BPI).


Pushing to the trunk.


Jeff
commit 74eb3570e6fba73b0e2bfce2a14d7696e30b48a8
Author: Mariam Arutunian 
Date:   Thu Nov 28 14:35:23 2024 -0700

[PATCH v7 03/12] RISC-V: Add CRC expander to generate faster CRC.

If the target is ZBC or ZBKC, it uses clmul instruction for the CRC
calculation.  Otherwise, if the target is ZBKB, generates table-based
CRC, but for reversing inputs and the output uses bswap and brev8
instructions.  Add new tests to check CRC generation for ZBC, ZBKC and
ZBKB targets.

gcc/

* expr.cc (gf2n_poly_long_div_quotient): New function.
* expr.h (gf2n_poly_long_div_quotient):  New function declaration.
* hwint.cc (reflect_hwi): New function.
* hwint.h (reflect_hwi): New function declaration.
* config/riscv/bitmanip.md (crc_rev4): New
expander for reversed CRC.
(crc4): New expander for bit-forward CRC.
* config/riscv/iterators.md (SUBX1, ANYI1): New iterators.
* config/riscv/riscv-protos.h (generate_reflecting_code_using_brev):
New function declaration.
(expand_crc_using_clmul): Likewise.
(expand_reversed_crc_using_clmul): Likewise.
* config/riscv/riscv.cc (generate_reflecting_code_using_brev): New
function.
(expand_crc_using_clmul): Likewise.
(expand_reversed_crc_using_clmul): Likewise.
* config/riscv/riscv.md (UNSPEC_CRC, UNSPEC_CRC_REV):  New unspecs.

* doc/sourcebuild.texi: Document new target selectors.

gcc/testsuite
* lib/target-supports.exp (check_effective_target_riscv_zbc): New
target supports predicate.
(check_effective_target_riscv_zbkb): Likewise.
(check_effective_target_riscv_zbkc): Likewise.
(check_effective_target_zbc_ok): Likewise.
(check_effective_target_zbkb_ok): Likewise.
(check_effective_target_zbkc_ok): Likewise.
(riscv_get_arch): Add zbkb and zbkc support.

* gcc.target/riscv/crc-builtin-zbc32.c: New file.
* gcc.target/riscv/crc-builtin-zbc64.c: Likewise.

Co-author: Jeff Law  

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 06ff698bfe7..23dc47eaaef 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -1192,3 +1192,66 @@ (define_insn "riscv_clmulr_"
   "TARGET_ZBC"
   "clmulr\t%0,%1,%2"
   [(set_attr "type" "clmul")])
+
+;; Reversed CRC 8, 16, 32 for TARGET_64
+(define_expand "crc_rev4"
+   ;; return value (calculated CRC)
+  [(set (match_operand:ANYI 0 "register_operand" "=r")
+ ;; initial CRC
+   (unspec:ANYI [(match_operand:ANYI 1 "register_operand" "r")
+ ;; data
+ (match_operand:ANYI1 2 "register_operand" "r")
+ ;; polynomial without leading 1
+ (match_operand:ANYI 3)]
+ UNSPEC_CRC_REV))]
+  /* We don't support the case when data's size is bigger than CRC's size.  */
+  "mode >= mode"
+{
+  /* If we have the ZBC or ZBKC extension (ie, clmul) and
+ it is possible to store the quotient within a single variable
+ (E.g.  CRC64's quotient may need 65 bits,
+ we can't keep it in 64 bit variable.)
+ then use clmul instruction to implement the CRC,
+ otherwise (TARGET_ZBKB) generate table based using brev.  */
+  if ((TARGET_ZBKC || TARGET_ZBC) && mode < word_mode)
+expand_reversed_crc_using_clmul (mode, mode,
+operands);
+  else if (TARGET_ZBKB)
+/* Generate table-based CRC.
+   To reflect values use brev and bswap instructions.  */
+expand_reversed_crc_table_based (operands[0], operands[1],
+operands[2], operands[3],
+GET_MODE (operands[2]),
+generate_reflecting_code_using_brev);
+  else
+/* Generate table-based CRC.
+   To reflect va

Re: [PATCH] libstdc++: Hide TLS variables in `std::call_once`

2024-11-29 Thread Jonathan Wakely
The change seems reasonable but this needs a change to
config/abi/pre/gnu.ver to export the new symbol in the GLIBCXX_3.4.34
version.

Please add full stops (periods) to the ChangeLog entry, to make
complete sentences.

Is "PR libstdc++/, target/" valid like that? I don't think it
is, it should be two separate lines:

PR libstdc++/
PR target/



Re: [PATCH] libstdc++: Hide TLS variables in `std::call_once`

2024-11-29 Thread LIU Hao

On 2024-11-29 23:08, Jonathan Wakely wrote:
> The change seems reasonable but this needs a change to
> config/abi/pre/gnu.ver to export the new symbol in the GLIBCXX_3.4.34
> version.

I have added it in the attached patch. However, the symbol exists only if
`_GLIBCXX_HAVE_TLS` is defined, so it seems to require an #if?

> Please add full stops (periods) to the ChangeLog entry, to make
> complete sentences.
>
> Is "PR libstdc++/, target/" valid like that? I don't think it
> is, it should be two separate lines:
>
> PR libstdc++/
> PR target/

Fixed now.


--
Best regards,
LIU Hao
From d5cf40386b742a6c2327d2eb5b3685165272f63c Mon Sep 17 00:00:00 2001
From: LIU Hao 
Date: Fri, 29 Nov 2024 17:17:01 +0800
Subject: [PATCH] libstdc++: Hide TLS variables in `std::call_once`

This is a transitional change for PR80881, because on Windows, thread-local
variables can't be exported from a DLL.

This commit hides `__once_callable` and `__once_call` beneath a wrapper function,
but also leaves them external for backward compatibility.

Another purpose for this wrapper function is that, if in the future we decide
to fix PR66146, the changes will be local to source files.

libstdc++-v3/ChangeLog:
PR libstdc++/66146
PR target/80881
* include/std/mutex (once_flag::_M_use_callable): New function.
(call_once): Duplicate from old one for TLS, and make it call
`_M_use_callable` instead.
* src/c++11/mutex.cc (once_flag::_M_use_callable): New function.

Signed-off-by: LIU Hao 
---
 libstdc++-v3/config/abi/pre/gnu.ver |  6 
 libstdc++-v3/include/std/mutex  | 48 -
 libstdc++-v3/src/c++11/mutex.cc | 15 +
 3 files changed, 41 insertions(+), 28 deletions(-)

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver 
b/libstdc++-v3/config/abi/pre/gnu.ver
index 31449b5b87b8..fa7b2277df0e 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -2528,6 +2528,12 @@ GLIBCXX_3.4.33 {
 _ZNKSt12__basic_fileIcE13native_handleEv;
 } GLIBCXX_3.4.32;
 
+# GCC 15.1.0
+GLIBCXX_3.4.34 {
+# std::once_flag::_M_use_callable(void (*)(void*), void*)
+_ZNSt9once_flag15_M_use_callableEPFvPvES0_;
+} GLIBCXX_3.4.33;
+
 # Symbols in the support library (libsupc++) have their own tag.
 CXXABI_1.3 {
 
diff --git a/libstdc++-v3/include/std/mutex b/libstdc++-v3/include/std/mutex
index 8dd9b23191fd..bc1cc4bb006e 100644
--- a/libstdc++-v3/include/std/mutex
+++ b/libstdc++-v3/include/std/mutex
@@ -818,44 +818,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 // for most targets this doesn't work correctly for exceptional executions.
 __gthread_once_t _M_once = __GTHREAD_ONCE_INIT;
 
+# ifdef _GLIBCXX_HAVE_TLS
+void
+_M_use_callable(void (*__call)(void*), void* __callable);
+# else
 struct _Prepare_execution;
+# endif
 
 template
   friend void
   call_once(once_flag& __once, _Callable&& __f, _Args&&... __args);
   };
 
-  /// @cond undocumented
 # ifdef _GLIBCXX_HAVE_TLS
-  // If TLS is available use thread-local state for the type-erased callable
-  // that is being run by std::call_once in the current thread.
-  extern __thread void* __once_callable;
-  extern __thread void (*__once_call)();
+  /// Invoke a callable and synchronize with other calls using the same flag
+  template
+void
+call_once(once_flag& __once, _Callable&& __f, _Args&&... __args)
+{
+  // Closure type that runs the function
+  auto __callable = [&] {
+ std::__invoke(std::forward<_Callable>(__f),
+   std::forward<_Args>(__args)...);
+  };
 
-  // RAII type to set up state for pthread_once call.
-  struct once_flag::_Prepare_execution
-  {
-template
-  explicit
-  _Prepare_execution(_Callable& __c)
-  {
-   // Store address in thread-local pointer:
-   __once_callable = std::__addressof(__c);
-   // Trampoline function to invoke the closure via thread-local pointer:
-   __once_call = [] { (*static_cast<_Callable*>(__once_callable))(); };
-  }
+  auto __call = [](void* __p) {
+ (*static_cast(__p))();
+  };
 
-~_Prepare_execution()
-{
-  // PR libstdc++/82481
-  __once_callable = nullptr;
-  __once_call = nullptr;
+  __once._M_use_callable(__call, &__callable);
 }
 
-_Prepare_execution(const _Prepare_execution&) = delete;
-_Prepare_execution& operator=(const _Prepare_execution&) = delete;
-  };
-
 # else
   // Without TLS use a global std::mutex and store the callable in a
   // global std::function.
@@ -892,8 +885,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _Prepare_execution(const _Prepare_execution&) = delete;
 _Prepare_execution& operator=(const _Prepare_execution&) = delete;
   };
-# endif
-  /// @endcond
 
   // This function is passed to pthread_once by std::call_once.
   // It runs __once_call() or __once_functor().
@@ -916,6 +907,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   if (int __e = __gthread_once(&__once

Re: [PATCH v2 1/3] arm, mve: Fix scan-assembler for test7 in dlstp-compile-asm-2.c

2024-11-29 Thread Christophe Lyon




On 11/29/24 15:14, Andre Vieira wrote:


After the changes to the vctp intrinsic, codegen changed slightly: we now
unfortunately seem to generate unneeded moves and extends of the mask.
These are not incorrect, however, and we don't have a fix for the unneeded
codegen right now, so change the testcase to accept them so that we can catch
other changes if they occur.

gcc/testsuite/ChangeLog:

PR target/117814
* gcc.target/arm/mve/dlstp-compile-asm-2.c (test7): Add an optional
vmsr to the check-function-bodies.


This is OK, thanks.


---
  gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-2.c | 5 +
  1 file changed, 5 insertions(+)



Re: [PATCH] libstdc++: Hide TLS variables in `std::call_once`

2024-11-29 Thread Jonathan Wakely
On Fri, 29 Nov 2024 at 15:31, LIU Hao  wrote:
>
> On 2024-11-29 23:08, Jonathan Wakely wrote:
> > The change seems reasonable but this needs a change to
> > config/abi/pre/gnu.ver to export the new symbol in the GLIBCXX_3.4.34
> > version.
>
> I have added it in the attached patch. However it exists only if 
> `_GLIBCXX_HAVE_TLS` is defined, so
> seems to require an #if.. ?

It looks like your patch is against gcc-14 not trunk, the
GLIBCXX_15.1.0 version is already there.



Re: [PATCH] libstdc++: Hide TLS variables in `std::call_once`

2024-11-29 Thread Jonathan Wakely
On Fri, 29 Nov 2024 at 15:49, Jonathan Wakely  wrote:
>
> On Fri, 29 Nov 2024 at 15:31, LIU Hao  wrote:
> >
> > On 2024-11-29 23:08, Jonathan Wakely wrote:
> > > The change seems reasonable but this needs a change to
> > > config/abi/pre/gnu.ver to export the new symbol in the GLIBCXX_3.4.34
> > > version.
> >
> > I have added it in the attached patch. However it exists only if 
> > `_GLIBCXX_HAVE_TLS` is defined, so
> > seems to require an #if.. ?
>
> It looks like your patch is against gcc-14 not trunk, the
> GLIBCXX_15.1.0 version is already there.

Sorry, I mean GLIBCXX_3.4.34 for 15.1.0



Re: [PATCH v2 3/3] arm, mve: Detect uses of vctp_vpr_generated inside subregs

2024-11-29 Thread Christophe Lyon




On 11/29/24 15:15, Andre Vieira wrote:


Address a problem where we failed to detect uses of vctp_vpr_generated in the
analysis for 'arm_attempt_dlstp_transform' because the use was inside a SUBREG
and rtx_equal_p does not catch that.  Using reg_overlap_mentioned_p is much
more robust.

gcc/ChangeLog:

PR target/117814
* gcc/config/arm/arm.cc (arm_attempt_dlstp_transform): Use
reg_overlap_mentioned_p instead of rtx_equal_p to detect uses of
vctp_vpr_generated inside subregs.

gcc/testsuite/ChangeLog:

PR target/117814
* gcc.target/arm/mve/dlstp-invalid-asm.c (test10): Renamed to...
(test10a): ... this.
(test10b): Variation of test10a with a small change to trigger wrong
codegen.


This is OK,

Thanks.


---
  gcc/config/arm/arm.cc |  3 +-
  .../gcc.target/arm/mve/dlstp-invalid-asm.c| 37 ++-
  2 files changed, 37 insertions(+), 3 deletions(-)



Re: [PATCH] libstdc++: Hide TLS variables in `std::call_once`

2024-11-29 Thread LIU Hao

On 2024-11-29 23:50, Jonathan Wakely wrote:
> > It looks like your patch is against gcc-14 not trunk, the
> > GLIBCXX_15.1.0 version is already there.
>
> Sorry, I mean GLIBCXX_3.4.34 for 15.1.0

Oops, that's what I used to test the patch. Reapplied to master now.


--
Best regards,
LIU Hao
From bafa1604e632d621a57b50ccd9d7a87b01822fd7 Mon Sep 17 00:00:00 2001
From: LIU Hao 
Date: Fri, 29 Nov 2024 17:17:01 +0800
Subject: [PATCH] libstdc++: Hide TLS variables in `std::call_once`

This is a transitional change for PR80881, because on Windows, thread-local
variables can't be exported from a DLL.

This commit hides `__once_callable` and `__once_call` beneath a wrapper function,
but also leaves them external for backward compatibility.

Another purpose for this wrapper function is that, if in the future we decide
to fix PR66146, the changes will be local to source files.

libstdc++-v3/ChangeLog:
PR libstdc++/66146
PR target/80881
* include/std/mutex (once_flag::_M_use_callable): New function.
(call_once): Duplicate from old one for TLS, and make it call
`_M_use_callable` instead.
* src/c++11/mutex.cc (once_flag::_M_use_callable): New function.
---
 libstdc++-v3/config/abi/pre/gnu.ver |  2 ++
 libstdc++-v3/include/std/mutex  | 48 -
 libstdc++-v3/src/c++11/mutex.cc | 15 +
 3 files changed, 37 insertions(+), 28 deletions(-)

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver 
b/libstdc++-v3/config/abi/pre/gnu.ver
index ae79b371d80d..ef7141a06b6a 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -2540,6 +2540,8 @@ GLIBCXX_3.4.34 {
 
_ZNSt8__format25__locale_encoding_to_utf8ERKSt6localeSt17basic_string_viewIcSt11char_traitsIcEEPv;
 # __sso_string constructor and destructor
 _ZNSt12__sso_string[CD][12]Ev;
+# std::once_flag::_M_use_callable(void (*)(void*), void*)
+_ZNSt9once_flag15_M_use_callableEPFvPvES0_;
 } GLIBCXX_3.4.33;
 
 # Symbols in the support library (libsupc++) have their own tag.
diff --git a/libstdc++-v3/include/std/mutex b/libstdc++-v3/include/std/mutex
index e0cedc4398a9..0fe624f075c5 100644
--- a/libstdc++-v3/include/std/mutex
+++ b/libstdc++-v3/include/std/mutex
@@ -820,44 +820,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 // for most targets this doesn't work correctly for exceptional executions.
 __gthread_once_t _M_once = __GTHREAD_ONCE_INIT;
 
+# ifdef _GLIBCXX_HAVE_TLS
+void
+_M_use_callable(void (*__call)(void*), void* __callable);
+# else
 struct _Prepare_execution;
+# endif
 
 template
   friend void
   call_once(once_flag& __once, _Callable&& __f, _Args&&... __args);
   };
 
-  /// @cond undocumented
 # ifdef _GLIBCXX_HAVE_TLS
-  // If TLS is available use thread-local state for the type-erased callable
-  // that is being run by std::call_once in the current thread.
-  extern __thread void* __once_callable;
-  extern __thread void (*__once_call)();
+  /// Invoke a callable and synchronize with other calls using the same flag
+  template
+void
+call_once(once_flag& __once, _Callable&& __f, _Args&&... __args)
+{
+  // Closure type that runs the function
+  auto __callable = [&] {
+ std::__invoke(std::forward<_Callable>(__f),
+   std::forward<_Args>(__args)...);
+  };
 
-  // RAII type to set up state for pthread_once call.
-  struct once_flag::_Prepare_execution
-  {
-template
-  explicit
-  _Prepare_execution(_Callable& __c)
-  {
-   // Store address in thread-local pointer:
-   __once_callable = std::__addressof(__c);
-   // Trampoline function to invoke the closure via thread-local pointer:
-   __once_call = [] { (*static_cast<_Callable*>(__once_callable))(); };
-  }
+  auto __call = [](void* __p) {
+ (*static_cast(__p))();
+  };
 
-~_Prepare_execution()
-{
-  // PR libstdc++/82481
-  __once_callable = nullptr;
-  __once_call = nullptr;
+  __once._M_use_callable(__call, &__callable);
 }
 
-_Prepare_execution(const _Prepare_execution&) = delete;
-_Prepare_execution& operator=(const _Prepare_execution&) = delete;
-  };
-
 # else
   // Without TLS use a global std::mutex and store the callable in a
   // global std::function.
@@ -894,8 +887,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _Prepare_execution(const _Prepare_execution&) = delete;
 _Prepare_execution& operator=(const _Prepare_execution&) = delete;
   };
-# endif
-  /// @endcond
 
   // This function is passed to pthread_once by std::call_once.
   // It runs __once_call() or __once_functor().
@@ -918,6 +909,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   if (int __e = __gthread_once(&__once._M_once, &__once_proxy))
__throw_system_error(__e);
 }
+# endif
 
 #else // _GLIBCXX_HAS_GTHREADS
 
diff --git a/libstdc++-v3/src/c++11/mutex.cc b/libstdc++-v3/src/c++11/mutex.cc
index bfeea538a7c3..5aea0e7b60e2 100644
--- a/libstdc++-v3/s

[C PATCH] c: Set attributes for fields when forming a composite type [PR117806]

2024-11-29 Thread Martin Uecker


It seems we also miss a decl_attributes call for the fields
when building the composite type.


Bootstrapped and regression tested on x86_64.


c: Set attributes for fields when forming a composite type [PR117806]

We need to call decl_attributes when creating the fields for a composite
type.

PR c/117806

gcc/c/ChangeLog:
* c-typeck.cc (composite_type_internal): Call decl_attributes.

gcc/testsuite/ChangeLog:
* gcc.dg/pr117806.c: New test.

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 611daccb926..e60f89a21d9 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -791,6 +791,8 @@ composite_type_internal (tree t1, tree t2, struct 
composite_cache* cache)
  DECL_ATTRIBUTES (f) = DECL_ATTRIBUTES (a);
  C_DECL_VARIABLE_SIZE (f) = C_TYPE_VARIABLE_SIZE (t);
 
+ decl_attributes (&f, DECL_ATTRIBUTES (f), 0);
+
  finish_decl (f, input_location, NULL, NULL, NULL);
 
  if (DECL_C_BIT_FIELD (a))
diff --git a/gcc/testsuite/gcc.dg/pr117806.c b/gcc/testsuite/gcc.dg/pr117806.c
new file mode 100644
index 000..bc2c8c665e7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr117806.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-std=c23" } */
+
+struct Test {
+  double D __attribute__((packed,aligned(4)));
+} x;
+struct Test {
+  double D __attribute__((packed,aligned(4)));
+} x;
+struct Test {
+  double D __attribute__((packed,aligned(4)));
+} x;
+



Re: [PATCH] libstdc++: Make std::basic_stacktrace swappable with unequal allocators

2024-11-29 Thread Jonathan Wakely
On Thu, 28 Nov 2024 at 18:59, Jonathan Wakely  wrote:
>
> The standard says that it's undefined to swap two containers if the
> allocators are not equal and do not propagate. This ensures that swap is
> always O(1) and non-throwing, but has other undesirable consequences
> such as LWG 2152. The 2016 paper P0178 ("Allocators and swap") proposed
> making the non-member swap handle non-equal allocators, by performing an
> O(n) deep copy when needed. This ensures that a.swap(b) is still O(1)
> and non-throwing, but swap(a, b) is valid for all values of the type.
>
> This change implements that for std::basic_stacktrace. The member swap
> is changed so that for the undefined case (where we can't swap the
> allocators, but can't swap storage separately from the allocators) we
> just return without changing either object. This ensures that with
> assertions disabled we don't separate allocated storage from the
> allocator that can free it.
>
> For the non-member swap, perform deep copies of the two ranges, avoiding
> reallocation if there is already sufficient capacity.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/stacktrace (basic_stacktrace::swap): Refactor so
> that the undefined case is a no-op when assertions are disabled.
> (swap): Remove precondition and perform deep copies when member
> swap would be undefined.
> * testsuite/19_diagnostics/stacktrace/stacktrace.cc: Check
> swapping with unequal, non-propagating allocators.
> ---
>
> As part of my ongoing quest to reduce the undefined behaviour surface in
> the library, this helps to avoid UB when swapping stacktrace objects.
>
> This is an RFC to see if people like the idea. If we do it here, we
> could do it for other containers too.
>
> For the common case there should be no additional cost, because the
> 'if constexpr' conditions will be true and swap(a, b) will just call
> a.swap(b) unconditionally, which will swap the contents unconditionally.
> We only do extra work in the cases that are currently undefined.
>
> Tested x86_64-linux.
>
>  libstdc++-v3/include/std/stacktrace   | 77 ---
>  .../19_diagnostics/stacktrace/stacktrace.cc   | 23 ++
>  2 files changed, 90 insertions(+), 10 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/stacktrace 
> b/libstdc++-v3/include/std/stacktrace
> index f94a424e4cf..ab0788cde08 100644
> --- a/libstdc++-v3/include/std/stacktrace
> +++ b/libstdc++-v3/include/std/stacktrace
> @@ -476,15 +476,79 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> }
>
>// [stacktrace.basic.mod], modifiers
> +
> +  /** Exchange the contents of two stacktrace objects
> +   *
> +   * @pre The allocators must propagate on swap or must be equal.
> +   */
>void
>swap(basic_stacktrace& __other) noexcept
>{
> -   std::swap(_M_impl, __other._M_impl);
> if constexpr (_AllocTraits::propagate_on_container_swap::value)
> - std::swap(_M_alloc, __other._M_alloc);
> + {
> +   using std::swap;
> +   swap(_M_alloc, __other._M_alloc);
> + }
> else if constexpr (!_AllocTraits::is_always_equal::value)
>   {
> -   __glibcxx_assert(_M_alloc == __other._M_alloc);
> +   if (_M_alloc != __other._M_alloc)
> + {
> +   __glibcxx_assert(_M_alloc == __other._M_alloc);
> +   // If assertions are disabled but the allocators are unequal,
> +   // we can't swap pointers, so just erroneously return.
> +   return;
> + }
> + }
> +   std::swap(_M_impl, __other._M_impl);
> +  }
> +
> +  // [stacktrace.basic.nonmem], non-member functions
> +
> +  /** Exchange the contents of two stacktrace objects
> +   *
> +   * Unlike the `swap` member function, this can be used with unequal
> +   * and non-propagating allocators. If the storage cannot be efficiently
> +   * swapped then the stacktrace_entry elements will be exchanged
> +   * one-by-one, reallocating if needed.
> +   */
> +  friend void
> +  swap(basic_stacktrace& __a, basic_stacktrace& __b)
> +  noexcept(_AllocTraits::propagate_on_container_swap::value
> +|| _AllocTraits::is_always_equal::value)
> +  {
> +   if constexpr (_AllocTraits::propagate_on_container_swap::value
> +   || _AllocTraits::is_always_equal::value)
> + __a.swap(__b);
> +   else if (__a._M_alloc == __b._M_alloc) [[likely]]
> + __a.swap(__b);
> +   else // O(N) swap for non-equal non-propagating allocators
> + {
> +   basic_stacktrace* __p[2]{ std::__addressof(__a),
> + std::__addressof(__b) };
> +   if (__p[0]->size() > __p[1]->size())
> + std::swap(__p[0], __p[1]);
> +   basic_stacktrace& __a = *__p[0]; // shorter sequence
> +   basic_stacktrace& __b = *__p[1]; // longe

RE: [RFC] PR81358: Enable automatic linking of libatomic

2024-11-29 Thread Joseph Myers
On Fri, 29 Nov 2024, Prathamesh Kulkarni wrote:

> > My expectation is that CFLAGS should not be modified until after
> > save_CFLAGS is set, which should not be until after configure has
> > executed the logic that sets a -g -O2 default.  Is there some problem
> > with that ordering (e.g. configure tests that expect to link target
> > programs but run as part of the same Autoconf macro invocation that
> > also generates the logic to determine default values)?  Also, the
> It seems that in configure, AC_PROG_CC expands to setting "-g -O2" in CFLAGS,
> and running conftests using those CFLAGS, and any adjustments to CFLAGS after 
> invoking AC_PROG_CC don't help.
> In the attached patch, I simply moved save_CFLAGS and CFLAGS before invoking 
> AC_PROG_CC,
> and adding "-fno-link-libatomic" to CFLAGS, which seems to work, but not sure 
> if it's the correct approach ?

I don't think having those settings before the default from AC_PROG_CC is 
logically right, because the default from AC_PROG_CC only applies if 
CFLAGS is not already set (that is, you'd lose the default -g -O2, if 
libatomic/configure is run without CFLAGS set).

The underlying principle is that CFLAGS is a variable for the *user* to 
set as they wish, not something that should be used for any options 
required as part of the build (that's what other things such as XCFLAGS 
and AM_CFLAGS are for).  So if you need to modify CFLAGS in configure for 
use as part of configure tests, it should only be done temporarily in a 
way that doesn't interfere with the normal logic to determine that default 
setting.  If for some reason that doesn't work (if AC_PROG_CC also runs 
tests that need modified CFLAGS, giving nowhere you can modify the value 
between the default being set and it being used), you'd need an assertion 
that libatomic/configure didn't get run with unset CFLAGS so the default 
wouldn't be applicable anyway, with appropriate comments explaining the 
issues.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH v2] aarch64: Fix build failure due to missing header

2024-11-29 Thread Kyrylo Tkachov



> On 29 Nov 2024, at 14:49, Yury Khrustalev  wrote:
> 
> Including the "arm_acle.h" header in aarch64-unwind.h requires
> stdint.h to be present and it may not be available during the
> first stage of cross-compilation of GCC.
> 
> When cross-building GCC for the aarch64-none-linux-gnu target
> (on any supporting host) using the 3-stage bootstrap build
> process when we build native compiler from source, libgcc fails
> to compile due to missing header that has not been installed yet.
> 
> This could be worked around but it's better to fix the issue.
> 

Ok.
Thanks,
Kyrill

> libgcc/ChangeLog:
> 
> * config/aarch64/aarch64-unwind.h (_CHKFEAT_GCS): Add.
> 
> ---
> 
> Regression tested on aarch64-unknown-linux-gnu and no regressions have been 
> found.
> Is this OK for trunk?
> Applies to fe29b03825c.
> 
> Thanks,
> Yury
> 
> ---
> libgcc/config/aarch64/aarch64-unwind.h | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/libgcc/config/aarch64/aarch64-unwind.h 
> b/libgcc/config/aarch64/aarch64-unwind.h
> index 85468f9685e..d11753a0e03 100644
> --- a/libgcc/config/aarch64/aarch64-unwind.h
> +++ b/libgcc/config/aarch64/aarch64-unwind.h
> @@ -29,7 +29,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
> If not, see
> 
> #include "ansidecl.h"
> #include 
> -#include <arm_acle.h>
> 
> #define AARCH64_DWARF_REGNUM_RA_STATE 34
> #define AARCH64_DWARF_RA_STATE_MASK   0x1
> @@ -180,7 +179,7 @@ aarch64_demangle_return_addr (struct _Unwind_Context 
> *context,
> }
> 
> /* GCS enable flag for chkfeat instruction.  */
> -
> +#define _CHKFEAT_GCS 1
> /* SME runtime function local to libgcc, streaming compatible
>and preserves more registers than the base PCS requires, but
>we don't rely on that here.  */
> -- 
> 2.39.5
> 



Re: [PATCH] libstdc++: Make std::basic_stacktrace swappable with unequal allocators

2024-11-29 Thread Jonathan Wakely
Patch v2 fixes the bug in the slow path for swap, and improves the
test so that it fails with the old buggy code.
commit 44214429d428f4fe5a148c7636b844600a10f9a4
Author: Jonathan Wakely 
Date:   Thu Nov 28 13:59:09 2024

libstdc++: Make std::basic_stacktrace swappable with unequal allocators

The standard says that it's undefined to swap two containers if the
allocators are not equal and do not propagate. This ensures that swap is
always O(1) and non-throwing, but has other undesirable consequences
such as LWG 2152. The 2016 paper P0178 ("Allocators and swap") proposed
making the non-member swap handle non-equal allocators, by performing an
O(n) deep copy when needed. This ensures that a.swap(b) is still O(1)
and non-throwing, but swap(a, b) is valid for all values of the type.

This change implements that for std::basic_stacktrace. The member swap
is changed so that for the undefined case (where we can't swap the
allocators, but can't swap storage separately from the allocators) we
just return without changing either object. This ensures that with
assertions disabled we don't separate allocated storage from the
allocator that can free it.

For the non-member swap, perform deep copies of the two ranges, avoiding
reallocation if there is already sufficient capacity.

libstdc++-v3/ChangeLog:

* include/std/stacktrace (basic_stacktrace::swap): Refactor so
that the undefined case is a no-op when assertions are disabled.
(swap): Remove precondition and perform deep copies when member
swap would be undefined.
* testsuite/19_diagnostics/stacktrace/stacktrace.cc: Check
swapping with unequal, non-propagating allocators.
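
The split between the two swaps described above can be sketched with a toy model. This is only an illustration under stated assumptions, not the libstdc++ code: `Arena`, `Trace` and `demo_swap` are hypothetical names, `Arena` stands in for a stateful non-propagating allocator, and a std::vector stands in for the storage that allocator owns.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a stateful, non-propagating allocator:
// two arenas compare equal only if they carry the same id.
struct Arena {
  int id;
  friend bool operator==(const Arena& a, const Arena& b) { return a.id == b.id; }
  friend bool operator!=(const Arena& a, const Arena& b) { return !(a == b); }
};

// Hypothetical container; 'frames' models storage owned by 'arena'.
struct Trace {
  Arena arena;
  std::vector<int> frames;

  // Member swap: O(1), meaningful only for equal arenas.  On a mismatch
  // it leaves both objects untouched, so storage is never separated from
  // the arena that could free it.
  void swap(Trace& other) noexcept {
    if (arena != other.arena)
      return;
    frames.swap(other.frames);
  }

  // Non-member swap: valid for all values.  Unequal arenas take the O(n)
  // path, exchanging element values while each object keeps its own arena.
  friend void swap(Trace& a, Trace& b) {
    if (a.arena == b.arena) {
      a.swap(b);
      return;
    }
    const std::vector<int> tmp = a.frames;
    a.frames = b.frames;
    b.frames = tmp;
  }
};

// Usage: the member swap is a no-op for unequal arenas, the non-member
// swap still exchanges the contents.
inline void demo_swap() {
  Trace a{Arena{1}, {1, 2, 3}};
  Trace b{Arena{2}, {9}};
  a.swap(b);
  assert(a.frames.size() == 3);
  swap(a, b);
  assert(a.frames.size() == 1 && b.frames.size() == 3);
  assert(a.arena.id == 1 && b.arena.id == 2);
}
```

The toy version always copies on the slow path; the patch instead reuses existing capacity where it can and only reallocates when the shorter sequence lacks room.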

diff --git a/libstdc++-v3/include/std/stacktrace 
b/libstdc++-v3/include/std/stacktrace
index f94a424e4cf..a7d4810e886 100644
--- a/libstdc++-v3/include/std/stacktrace
+++ b/libstdc++-v3/include/std/stacktrace
@@ -476,15 +476,80 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}
 
   // [stacktrace.basic.mod], modifiers
+
+  /** Exchange the contents of two stacktrace objects
+   *
+   * @pre The allocators must propagate on swap or must be equal.
+   */
   void
   swap(basic_stacktrace& __other) noexcept
   {
-   std::swap(_M_impl, __other._M_impl);
if constexpr (_AllocTraits::propagate_on_container_swap::value)
- std::swap(_M_alloc, __other._M_alloc);
+ {
+   using std::swap;
+   swap(_M_alloc, __other._M_alloc);
+ }
else if constexpr (!_AllocTraits::is_always_equal::value)
  {
-   __glibcxx_assert(_M_alloc == __other._M_alloc);
+   if (_M_alloc != __other._M_alloc)
+ {
+   __glibcxx_assert(_M_alloc == __other._M_alloc);
+   // If assertions are disabled but the allocators are unequal,
+   // we can't swap pointers, so just erroneously return.
+   return;
+ }
+ }
+   std::swap(_M_impl, __other._M_impl);
+  }
+
+  // [stacktrace.basic.nonmem], non-member functions
+
+  /** Exchange the contents of two stacktrace objects
+   *
+   * Unlike the `swap` member function, this can be used with unequal
+   * and non-propagating allocators. If the storage cannot be efficiently
+   * swapped then the stacktrace_entry elements will be exchanged
+   * one-by-one, reallocating if needed.
+   */
+  friend void
+  swap(basic_stacktrace& __a, basic_stacktrace& __b)
+  noexcept(_AllocTraits::propagate_on_container_swap::value
+|| _AllocTraits::is_always_equal::value)
+  {
+   if constexpr (_AllocTraits::propagate_on_container_swap::value
+   || _AllocTraits::is_always_equal::value)
+ __a.swap(__b);
+   else if (__a._M_alloc == __b._M_alloc) [[likely]]
+ __a.swap(__b);
+   else // O(N) swap for non-equal non-propagating allocators
+ {
+   basic_stacktrace* __p[2]{ std::__addressof(__a),
+ std::__addressof(__b) };
+   if (__p[0]->size() > __p[1]->size())
+ std::swap(__p[0], __p[1]);
+   basic_stacktrace& __a = *__p[0]; // shorter sequence
+   basic_stacktrace& __b = *__p[1]; // longer sequence
+
+   const auto __a_sz = __a.size();
+   auto __a_begin = __a._M_impl._M_frames;
+   auto __a_end = __a._M_impl._M_frames + __a_sz;
+   auto __b_begin = __b._M_impl._M_frames;
+
+   if (__a._M_impl._M_capacity < __b.size())
+ {
+   // Reallocation needed.
+   basic_stacktrace __tmp(__b, __a._M_alloc);
+   std::copy(__a_begin, __a_end, __b_begin);
+   __b._M_impl._M_resize(__a_sz, __b._M_alloc);
+   std::swap(__tmp._M_impl, __a._M_impl);
+   return;
+ 

[Patch, fortran] PR102689 revisited - Segfault with RESHAPE of CLASS as actual argument

2024-11-29 Thread Paul Richard Thomas
Hi All,

This patch was originally pushed as r15-2739. Subsequently memory faults
were found and so the patch was reverted. At the time, I could not find where
the problem lay. This morning I had another look and found it almost
immediately :-)

The fix is the 'gfc_resize_class_size_with_len' in the chunk '@@ -1595,14
+1629,51 @@ gfc_trans_create_temp_array '. Without it, half as much memory
as needed was being provided by the allocation and so accesses were
occurring outside the allocated space. Valgrind now reports no errors.

Regression tests with flying colours - OK for mainline?

Paul
diff --git a/gcc/fortran/class.cc b/gcc/fortran/class.cc
index 59ac0d97e08..64a0e726eeb 100644
--- a/gcc/fortran/class.cc
+++ b/gcc/fortran/class.cc
@@ -884,11 +884,21 @@ static void
 add_proc_comp (gfc_symbol *vtype, const char *name, gfc_typebound_proc *tb)
 {
   gfc_component *c;
-
+  bool is_abstract = false;
 
   c = gfc_find_component (vtype, name, true, true, NULL);
 
-  if (tb->non_overridable && !tb->overridden && c)
+  /* If the present component typebound proc is abstract, the new version
+ should unconditionally be tested if it is a suitable replacement.  */
+  if (c && c->tb && c->tb->u.specific
+  && c->tb->u.specific->n.sym->attr.abstract)
+is_abstract = true;
+
+  /* Pass on the new tb being not overridable if a component is found and
+ either there is not an overridden specific or the present component
+ tb is abstract. This ensures that possible, viable replacements are
+ loaded.  */
+  if (tb->non_overridable && !tb->overridden && !is_abstract && c)
 return;
 
   if (c == NULL)
diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 0d3845f9ce3..afed8db7852 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -3229,8 +3229,8 @@ static bool check_pure_function (gfc_expr *e)
   const char *name = NULL;
   code_stack *stack;
   bool saw_block = false;
-  
-  /* A BLOCK construct within a DO CONCURRENT construct leads to 
+
+  /* A BLOCK construct within a DO CONCURRENT construct leads to
  gfc_do_concurrent_flag = 0 when the check for an impure function
  occurs.  Check the stack to see if the source code has a nested
  BLOCK construct.  */
@@ -16305,10 +16305,6 @@ resolve_fl_derived (gfc_symbol *sym)
   && sym->ns->proc_name
   && sym->ns->proc_name->attr.flavor == FL_MODULE
   && sym->attr.access != ACCESS_PRIVATE
-  && !(sym->attr.extension
-	   && sym->attr.zero_comp
-	   && !sym->f2k_derived->tb_sym_root
-	   && !sym->f2k_derived->tb_uop_root)
   && !(sym->attr.vtype || sym->attr.pdt_template))
 {
   gfc_symbol *vtab = gfc_find_derived_vtab (sym);
diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index a458af322ce..870f2920ddc 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -1325,23 +1325,28 @@ get_array_ref_dim_for_loop_dim (gfc_ss *ss, int loop_dim)
is a class expression.  */
 
 static tree
-get_class_info_from_ss (stmtblock_t * pre, gfc_ss *ss, tree *eltype)
+get_class_info_from_ss (stmtblock_t * pre, gfc_ss *ss, tree *eltype,
+			gfc_ss **fcnss)
 {
+  gfc_ss *loop_ss = ss->loop->ss;
   gfc_ss *lhs_ss;
   gfc_ss *rhs_ss;
+  gfc_ss *fcn_ss = NULL;
   tree tmp;
   tree tmp2;
   tree vptr;
-  tree rhs_class_expr = NULL_TREE;
+  tree class_expr = NULL_TREE;
   tree lhs_class_expr = NULL_TREE;
   bool unlimited_rhs = false;
   bool unlimited_lhs = false;
   bool rhs_function = false;
+  bool unlimited_arg1 = false;
   gfc_symbol *vtab;
+  tree cntnr = NULL_TREE;
 
   /* The second element in the loop chain contains the source for the
- temporary; ie. the rhs of the assignment.  */
-  rhs_ss = ss->loop->ss->loop_chain;
+ class temporary created in gfc_trans_create_temp_array.  */
+  rhs_ss = loop_ss->loop_chain;
 
   if (rhs_ss != gfc_ss_terminator
   && rhs_ss->info
@@ -1350,28 +1355,58 @@ get_class_info_from_ss (stmtblock_t * pre, gfc_ss *ss, tree *eltype)
   && rhs_ss->info->data.array.descriptor)
 {
   if (rhs_ss->info->expr->expr_type != EXPR_VARIABLE)
-	rhs_class_expr
+	class_expr
 	  = gfc_get_class_from_expr (rhs_ss->info->data.array.descriptor);
   else
-	rhs_class_expr = gfc_get_class_from_gfc_expr (rhs_ss->info->expr);
+	class_expr = gfc_get_class_from_gfc_expr (rhs_ss->info->expr);
   unlimited_rhs = UNLIMITED_POLY (rhs_ss->info->expr);
   if (rhs_ss->info->expr->expr_type == EXPR_FUNCTION)
 	rhs_function = true;
 }
 
+  /* Usually, ss points to the function. When the function call is an actual
+ argument, it is instead rhs_ss because the ss chain is shifted by one.  */
+  *fcnss = fcn_ss = rhs_function ? rhs_ss : ss;
+
+  /* If this is a transformational function with a class result, the info
+ class_container field points to the class container of arg1.  */
+  if (class_expr != NULL_TREE
+  && fcn_ss->info && fcn_ss->info->expr
+  && fcn_ss->info->expr->expr_type == EXPR_FUNCTION
+  && fcn_s

[patch,avr] PR117726: Post-reload split 2-byte and 3-byte shifts

2024-11-29 Thread Georg-Johann Lay

This patch splits 2-byte and 3-byte shifts after reload into
a 3-operand byte shift and a residual 2-operand shift.

The "2op" shift insn alternatives are not needed and removed because
all shift insn already have a "r,0,n" alternative that does the job.
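
The arithmetic behind the split can be sketched in plain C++. This only illustrates the decomposition of the shift offset, assuming a 16-bit left shift; `ShiftSplit`, `split_offset` and `shl16_split` are hypothetical names and say nothing about how the pass actually emits RTL.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: a shift offset split into a whole-byte part
// and a residual bit part.
struct ShiftSplit {
  int byte_offset;  // multiple of 8, handled by the byte shift
  int bit_offset;   // 0..7, handled by the residual shift
};

constexpr ShiftSplit split_offset(int offset) {
  return { offset / 8 * 8, offset % 8 };
}

// 16-bit left shift performed in two steps.
inline std::uint16_t shl16_split(std::uint16_t x, int offset) {
  assert(offset >= 0 && offset < 16);
  const ShiftSplit s = split_offset(offset);
  // Byte shift: the result may live in a different register than the
  // source, which is what a 3-operand form allows.
  std::uint16_t t = static_cast<std::uint16_t>(x << s.byte_offset);
  // Residual bit shift: done in place on the intermediate value.
  t = static_cast<std::uint16_t>(t << s.bit_offset);
  return t;
}
```

For example, an offset of 11 becomes a byte shift by 8 followed by a residual shift by 3.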

Ok for trunk?

Johann

--


AVR: target/117726 - Better optimize shifts.

This patch splits 2-byte and 3-byte shifts after reload into
a 3-operand byte shift and a residual 2-operand shift.

The "2op" shift insn alternatives are not needed and removed because
all shift insn already have a "r,0,n" alternative that does the job.

PR target/117726
gcc/
* config/avr/avr-passes.cc (avr_shift_is_3op, avr_emit_shift):
Also handle 2-byte and 3-byte shifts.
(avr_split_shift4, avr_split_shift3, avr_split_shift2): New
local helper functions.
(avr_split_shift): Use them.
* config/avr/avr-passes.def (avr_pass_split_after_peephole2):
Adjust comments.
* config/avr/avr.cc (avr_out_ashlpsi3, avr_out_ashrpsi3)
(avr_out_lshrpsi3): Support offset 15.
(ashrhi3_out): Support offset 7 as 3-op.
(ashrsi3_out): Support offset 15.
(avr_rtx_costs_1): Adjust shift costs.
* config/avr/avr.md (2op): Remove attribute value and all such insn
alternatives.
(ashlhi3, *ashlhi3, *ashlhi3_const): Add 3-op alternatives like C2l.
(ashrhi3, *ashrhi3, *ashrhi3_const): Add 3-op alternatives like C2a.
(lshrhi3, *lshrhi3, *lshrhi3_const): Add 3-op alternatives like C2r.
(*ashlpsi3_split, *ashlpsi3): Add 3-op alternatives C15 and C3l.
(*ashrpsi3_split, *ashrpsi3): Add 3-op alternatives C15 and C3r.
(*lshrpsi3_split, *lshrpsi3): Add 3-op alternatives C15 and C3r.
(ashlsi3, *ashlsi3, *ashlsi3_const): Remove "2op" alternative.
(ashrsi3, *ashrsi3, *ashrsi3_const): Same.
(lshrsi3, *lshrsi3, *lshrsi3_const): Same.
(constr_split_suffix): Code attr morphed from constr_split_shift4.
* config/avr/constraints.md (C2a, C2r, C2l)
(C3a, C3r, C3l): New constraints.
* doc/invoke.texi (AVR Options) <-msplit-bit-shift>: Adjust doc.

diff --git a/gcc/config/avr/avr-passes.cc b/gcc/config/avr/avr-passes.cc
index bd249b70e8d..a7528764530 100644
--- a/gcc/config/avr/avr-passes.cc
+++ b/gcc/config/avr/avr-passes.cc
@@ -4781,7 +4781,8 @@ avr_pass_fuse_add::execute1 (function *func)
 
 
 //
-// Split insns after peephole2 / befor avr-fuse-move.
+// Split shift insns after peephole2 / befor avr-fuse-move.
+
 static const pass_data avr_pass_data_split_after_peephole2 =
 {
   RTL_PASS,	// type
@@ -4816,20 +4817,19 @@ public:
 } // anonymous namespace
 
 
-/* Whether some shift insn alternatives are a 3-operand insn or a
-   2-operand insn.  This 3op alternatives allow the source and the
-   destination register of the shift to be different right from the
-   start, because the splitter will split the 3op shift into a 3op byte
-   shift and a 2op residual bit shift.
-   (When the residual shift has an offset of one less than the bitsize,
-   then the residual shift is also a 3op insn.  */
+/* Whether some shift insn alternatives are a `3op' 3-operand insn.
+   This 3op alternatives allow the source and the destination register
+   of the shift to be different right from the start, because the splitter
+   will split the 3op shift into a 3-operand byte shift and a 2-operand
+   residual bit shift.  (When the residual shift has an offset of one
+   less than the bitsize, then the residual shift is also a 3op insn.)  */
 
 bool
 avr_shift_is_3op ()
 {
   // Don't split for OPTIMIZE_SIZE_MAX (-Oz).
   // For OPTIMIZE_SIZE_BALANCED (-Os), we still split because
-  // the size overhead (if exists at all) is marginal.
+  // the size overhead (if at all) is marginal.
 
   return (avropt_split_bit_shift
 	  && optimize > 0
@@ -4837,41 +4837,77 @@ avr_shift_is_3op ()
 }
 
 
-/* Implement constraints `C4a', `C4l' and `C4r'.
+/* Implement constraints `C2a', `C2l', `C2r' ... `C4a', `C4l', `C4r'.
Whether we split an N_BYTES shift of code CODE in { ASHIFTRT,
LSHIFTRT, ASHIFT } into a byte shift and a residual bit shift.  */
 
 bool
 avr_split_shift_p (int n_bytes, int offset, rtx_code code)
 {
-  gcc_assert (n_bytes == 4);
+  gcc_assert (n_bytes == 4 || n_bytes == 3 || n_bytes == 2);
+
+  if (! avr_shift_is_3op ()
+  || offset % 8 == 0)
+return false;
 
-  if (avr_shift_is_3op ()
-  && offset % 8 != 0)
+  if (n_bytes == 4)
 return select()
-  : code == ASHIFT ? IN_RANGE (offset, 17, 30)
-  : code == ASHIFTRT ? IN_RANGE (offset, 9, 29)
+  : code == ASHIFT ? IN_RANGE (offset, 9, 30) && offset != 15
+  : code == ASHIFTRT ? IN_RANGE (offset, 9, 29) && offset != 15
   : code == LSHIFTRT ? IN_RANGE (offset, 9, 30) && offset != 15
   : bad_case ();
 
+  if (n_bytes == 3)
+return select()
+  : code

Re: [PUSHED] fortran: Add default to switch in gfc_trans_transfer [PR117843]

2024-11-29 Thread Harald Anlauf

Thanks, Andrew, for fixing this!

I did not get any reports from the pre-commit testers; I only
saw the fallout later.

And sorry for breaking bootstrap!

Harald

Am 29.11.24 um 10:16 schrieb Andrew Pinski:

This fixes a bootstrap failure due to a warning on enum values not being
handled. In this case, it is just checking two values and the rest
are not handled, so adding a default case fixes the issue.
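
A minimal sketch of the pattern, assuming the warning in question is of the -Wswitch kind (the message above does not name it). `ExprKind` and `needs_scalarization` are hypothetical names, not the gfortran ones.

```cpp
// Hypothetical enum standing in for an expression-kind enumeration.
enum ExprKind { KIND_VARIABLE, KIND_FUNCTION, KIND_OP, KIND_CONSTANT };

// Only two enumerators are of interest.  Without the default label, a
// compiler may warn that KIND_VARIABLE and KIND_CONSTANT are not handled,
// and a build that treats warnings as errors would then fail.
constexpr bool needs_scalarization(ExprKind k) {
  switch (k) {
    case KIND_FUNCTION:
    case KIND_OP:
      return true;
    default:
      break;
  }
  return false;
}

static_assert(needs_scalarization(KIND_FUNCTION), "function kinds take the special path");
```

The default label changes no behaviour here; it only states that ignoring the remaining enumerators is intentional.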

Pushed as obvious.

PR fortran/117843
gcc/fortran/ChangeLog:

* trans-io.cc (gfc_trans_transfer): Add default case.

Signed-off-by: Andrew Pinski 
---
  gcc/fortran/trans-io.cc | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/gcc/fortran/trans-io.cc b/gcc/fortran/trans-io.cc
index 906dd7c6eb6..9b0b8cfdff9 100644
--- a/gcc/fortran/trans-io.cc
+++ b/gcc/fortran/trans-io.cc
@@ -2664,6 +2664,8 @@ gfc_trans_transfer (gfc_code * code)
  case EXPR_FUNCTION:
  case EXPR_OP:
goto scalarize;
+ default:
+   break;
  }
  }
}




Re: [Patch, fortran] PR102689 revisited - Segfault with RESHAPE of CLASS as actual argument

2024-11-29 Thread Harald Anlauf

Hi Paul,

the patch seems to contain stuff that has already been pushed
(gcc/testsuite/gfortran.dg/pr117768.f90, and the chunks in
class.cc and resolve.cc).  Can you please check?

Cheers,
Harald

Am 29.11.24 um 17:34 schrieb Paul Richard Thomas:

Hi All,

This patch was originally pushed as r15-2739. Subsequently memory faults
were found and so the patch was reverted. At the time, I could not find where
the problem lay. This morning I had another look and found it almost
immediately :-)

The fix is the 'gfc_resize_class_size_with_len' in the chunk '@@ -1595,14
+1629,51 @@ gfc_trans_create_temp_array '. Without it, half as much memory
as needed was being provided by the allocation and so accesses were
occurring outside the allocated space. Valgrind now reports no errors.

Regression tests with flying colours - OK for mainline?

Paul





[patch,avr] Fix PR117681 build warning for libgcc/unwind-sjlj.c

2024-11-29 Thread Georg-Johann Lay

This patch fixes a build warning for libgcc/unwind-sjlj.c
which used word_mode for _Unwind_Word but should use Pmode.

Ok for trunk?

Johann

--

AVR: target/117681 - Set UNWIND_WORD_MODE to Pmode.

This patch fixes a build warning for libgcc/unwind-sjlj.c
which used word_mode for _Unwind_Word but should use Pmode.

PR target/117681
gcc/
* config/avr/avr.cc (TARGET_UNWIND_WORD_MODE): Define to...
(avr_unwind_word_mode): ...this new static function.

commit 9e48a5e1dc054959d1dfc2f757d5dcfbdb18e1c3
Author: Georg-Johann Lay 
Date:   Fri Nov 29 18:26:17 2024 +0100

AVR: target/117681 - Set UNWIND_WORD_MODE to Pmode.

This patch fixes a build warning for libgcc/unwind-sjlj.c
which used word_mode for _Unwind_Word but should use Pmode.

PR target/117681
gcc/
* config/avr/avr.cc (TARGET_UNWIND_WORD_MODE): Define to...
(avr_unwind_word_mode): ...this new static function.

diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index fc9f1770420..c5c39d30c47 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -15661,6 +15661,15 @@ avr_float_lib_compare_returns_bool (machine_mode mode, rtx_code)
 }
 
 
+/* Implement `TARGET_UNWIND_WORD_MODE'.  */
+
+static scalar_int_mode
+avr_unwind_word_mode ()
+{
+  return Pmode;
+}
+
+
 /* Implement `TARGET_LRA_P'.  */
 
 static bool
@@ -15867,6 +15876,9 @@ avr_use_lra_p ()
 #undef  TARGET_CANONICALIZE_COMPARISON
 #define TARGET_CANONICALIZE_COMPARISON avr_canonicalize_comparison
 
+#undef  TARGET_UNWIND_WORD_MODE
+#define TARGET_UNWIND_WORD_MODE avr_unwind_word_mode
+
 /* According to the opening comment in PR86772, the following applies:
   "If the port does not (and never will in the future) need to mitigate
against unsafe speculation."  */


Re: [patch,avr] Fix PR117681 build warning for libgcc/unwind-sjlj.c

2024-11-29 Thread Denis Chertykov
пт, 29 нояб. 2024 г. в 21:33, Georg-Johann Lay :
>
> This patch fixes a build warning for libgcc/unwind-sjlj.c
> which used word_mode for _Unwind_Word but should use Pmode.
>
> Ok for trunk?

Ok.
Please apply.

Denis.

>
> Johann
>
> --
>
> AVR: target/117681 - Set UNWIND_WORD_MODE to Pmode.
>
> This patch fixes a build warning for libgcc/unwind-sjlj.c
> which used word_mode for _Unwind_Word but should use Pmode.
>
> PR target/117681
> gcc/
> * config/avr/avr.cc (TARGET_UNWIND_WORD_MODE): Define to...
> (avr_unwind_word_mode): ...this new static function.
>
>
> diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
> index fc9f1770420..c5c39d30c47 100644
> --- a/gcc/config/avr/avr.cc
> +++ b/gcc/config/avr/avr.cc
> @@ -15661,6 +15661,15 @@ avr_float_lib_compare_returns_bool
> (machine_mode mode, rtx_code)
>   }
>
>
> +/* Implement `TARGET_UNWIND_WORD_MODE'.  */
> +
> +static scalar_int_mode
> +avr_unwind_word_mode ()
> +{
> +  return Pmode;
> +}
> +
> +
>   /* Implement `TARGET_LRA_P'.  */
>
>   static bool
> @@ -15867,6 +15876,9 @@ avr_use_lra_p ()
>   #undef  TARGET_CANONICALIZE_COMPARISON
>   #define TARGET_CANONICALIZE_COMPARISON avr_canonicalize_comparison
>
> +#undef  TARGET_UNWIND_WORD_MODE
> +#define TARGET_UNWIND_WORD_MODE avr_unwind_word_mode
> +
>   /* According to the opening comment in PR86772, the following applies:
> "If the port does not (and never will in the future) need to mitigate
>  against unsafe speculation."  */


Re: [patch,avr] PR117726: Post-reload split 2-byte and 3-byte shifts

2024-11-29 Thread Denis Chertykov
пт, 29 нояб. 2024 г. в 21:07, Georg-Johann Lay :
>
> This patch splits 2-byte and 3-byte shifts after reload into
> a 3-operand byte shift and a residual 2-operand shift.
>
> The "2op" shift insn alternatives are not needed and removed because
> all shift insn already have a "r,0,n" alternative that does the job.
>
> Ok for trunk?

Please apply.

Denis.

>
> Johann
>
> --
>
>
> AVR: target/117726 - Better optimize shifts.
>
> This patch splits 2-byte and 3-byte shifts after reload into
> a 3-operand byte shift and a residual 2-operand shift.
>
> The "2op" shift insn alternatives are not needed and removed because
> all shift insn already have a "r,0,n" alternative that does the job.
>
> PR target/117726
> gcc/
> * config/avr/avr-passes.cc (avr_shift_is_3op, avr_emit_shift):
> Also handle 2-byte and 3-byte shifts.
> (avr_split_shift4, avr_split_shift3, avr_split_shift2): New
> local helper functions.
> (avr_split_shift): Use them.
> * config/avr/avr-passes.def (avr_pass_split_after_peephole2):
> Adjust comments.
> * config/avr/avr.cc (avr_out_ashlpsi3, avr_out_ashrpsi3)
> (avr_out_lshrpsi3): Support offset 15.
> (ashrhi3_out): Support offset 7 as 3-op.
> (ashrsi3_out): Support offset 15.
> (avr_rtx_costs_1): Adjust shift costs.
> * config/avr/avr.md (2op): Remove attribute value and all such insn
> alternatives.
> (ashlhi3, *ashlhi3, *ashlhi3_const): Add 3-op alternatives like C2l.
> (ashrhi3, *ashrhi3, *ashrhi3_const): Add 3-op alternatives like C2a.
> (lshrhi3, *lshrhi3, *lshrhi3_const): Add 3-op alternatives like C2r.
> (*ashlpsi3_split, *ashlpsi3): Add 3-op alternatives C15 and C3l.
> (*ashrpsi3_split, *ashrpsi3): Add 3-op alternatives C15 and C3r.
> (*lshrpsi3_split, *lshrpsi3): Add 3-op alternatives C15 and C3r.
> (ashlsi3, *ashlsi3, *ashlsi3_const): Remove "2op" alternative.
> (ashrsi3, *ashrsi3, *ashrsi3_const): Same.
> (lshrsi3, *lshrsi3, *lshrsi3_const): Same.
> (constr_split_suffix): Code attr morphed from constr_split_shift4.
> * config/avr/constraints.md (C2a, C2r, C2l)
> (C3a, C3r, C3l): New constraints.
> * doc/invoke.texi (AVR Options) <-msplit-bit-shift>: Adjust doc.


Re: [C PATCH] c: Set attributes for fields when forming a composite type [PR117806]

2024-11-29 Thread Joseph Myers
On Fri, 29 Nov 2024, Martin Uecker wrote:

> It seems we also miss a decl_attributes call for the fields
> when building the composite type.
> 
> 
> Bootstrapped and regression tested on x86_64.
> 
> 
> c: Set attributes for fields when forming a composite type [PR117806]
> 
> We need to call decl_attributes when creating the fields for a composite
> type.
> 
> PR c/117806
> 
> gcc/c/ChangeLog:
> * c-typeck.cc (composite_type_internal): Call decl_attributes.
> 
> gcc/testsuite/ChangeLog:
> * gcc.dg/pr117806.c: New test.

OK.

-- 
Joseph S. Myers
josmy...@redhat.com



[PATCH 2/2] libstdc++: Move std::monostate to <utility> for C++26 (P0472R2)

2024-11-29 Thread Jonathan Wakely
Another C++26 paper just approved in Wrocław. The std::monostate class
is defined in <variant> since C++17, but for C++26 it should also be
available in <utility>.
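
For readers less familiar with the type, here is a small sketch of what std::monostate is typically used for. It includes the variant header only, so it does not depend on the new C++26 location; `Handle`, `MaybeHandle`, `is_empty` and `demo_monostate` are hypothetical names.

```cpp
#include <cassert>
#include <variant>

// Hypothetical type without a default constructor.
struct Handle {
  explicit Handle(int fd) : fd(fd) {}
  int fd;
};

// Putting std::monostate first makes the variant default-constructible
// even though Handle is not.
using MaybeHandle = std::variant<std::monostate, Handle>;

inline bool is_empty(const MaybeHandle& v) {
  return std::holds_alternative<std::monostate>(v);
}

// Usage: a default-constructed variant holds monostate until assigned.
inline void demo_monostate() {
  MaybeHandle v;
  assert(is_empty(v));
  v = Handle{3};
  assert(!is_empty(v));
  assert(std::get<Handle>(v).fd == 3);
}
```

All monostate values compare equal, so it behaves as a well-defined "empty" alternative.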

libstdc++-v3/ChangeLog:

* include/Makefile.am: Add bits/monostate.h.
* include/Makefile.in: Regenerate.
* include/std/utility: Include <bits/monostate.h>.
* include/std/variant (monostate, hash<monostate>): Move
definitions to ...
* include/bits/monostate.h: New file.
* testsuite/20_util/headers/utility/synopsis.cc: Add monostate
and hash<monostate> declarations.
* testsuite/20_util/monostate/requirements.cc: New test.
---

Tested x86_64-linux.

 libstdc++-v3/include/Makefile.am  |  1 +
 libstdc++-v3/include/Makefile.in  |  1 +
 libstdc++-v3/include/bits/monostate.h | 78 +++
 libstdc++-v3/include/std/utility  |  4 +
 libstdc++-v3/include/std/variant  | 31 +---
 .../20_util/headers/utility/synopsis.cc   |  5 ++
 .../20_util/monostate/requirements.cc | 38 +
 7 files changed, 128 insertions(+), 30 deletions(-)
 create mode 100644 libstdc++-v3/include/bits/monostate.h
 create mode 100644 libstdc++-v3/testsuite/20_util/monostate/requirements.cc

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 422a0f4bd0a..6efd3cd5f1c 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -133,6 +133,7 @@ bits_freestanding = \
${bits_srcdir}/iterator_concepts.h \
${bits_srcdir}/max_size_type.h \
${bits_srcdir}/memoryfwd.h \
+   ${bits_srcdir}/monostate.h \
${bits_srcdir}/move.h \
${bits_srcdir}/out_ptr.h \
${bits_srcdir}/predefined_ops.h \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 9fd4ab4848c..3b5f93ce185 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -488,6 +488,7 @@ bits_freestanding = \
${bits_srcdir}/iterator_concepts.h \
${bits_srcdir}/max_size_type.h \
${bits_srcdir}/memoryfwd.h \
+   ${bits_srcdir}/monostate.h \
${bits_srcdir}/move.h \
${bits_srcdir}/out_ptr.h \
${bits_srcdir}/predefined_ops.h \
diff --git a/libstdc++-v3/include/bits/monostate.h 
b/libstdc++-v3/include/bits/monostate.h
new file mode 100644
index 000..b6da720669a
--- /dev/null
+++ b/libstdc++-v3/include/bits/monostate.h
@@ -0,0 +1,78 @@
+// Definition of std::monostate for <variant> and <utility>  -*- C++ -*-
+
+// Copyright (C) 2016-2024 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+/** @file bits/monostate.h
+ *  This is an internal header file, included by other library headers.
+ *  Do not attempt to use it directly. @headername{utility}
+ */
+
+#ifndef _GLIBCXX_MONOSTATE_H
+#define _GLIBCXX_MONOSTATE_H 1
+
+#include 
+
+#ifdef __glibcxx_variant // C++ >= 17
+
+#include 
+#if __cplusplus >= 202002L
+# include 
+#endif
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+  struct monostate { };
+
+  constexpr bool operator==(monostate, monostate) noexcept { return true; }
+#ifdef __cpp_lib_three_way_comparison
+  constexpr strong_ordering
+  operator<=>(monostate, monostate) noexcept { return strong_ordering::equal; }
+#else
+  constexpr bool operator!=(monostate, monostate) noexcept { return false; }
+  constexpr bool operator<(monostate, monostate) noexcept { return false; }
+  constexpr bool operator>(monostate, monostate) noexcept { return false; }
+  constexpr bool operator<=(monostate, monostate) noexcept { return true; }
+  constexpr bool operator>=(monostate, monostate) noexcept { return true; }
+#endif
+
+  template<>
+struct hash
+{
+#if __cplusplus < 202002L
+  using result_type [[__deprecated__]] = size_t;
+  using argument_type [[__deprecated__]] = monostate;
+#endif
+
+  size_t
+  operator()(const monostate&) const noexcept
+  {
+   constexpr size_t __magic_monostate_hash = -;
+   re

Re: [PATCH v3 2/8] aarch64: Make C/C++ operations possible on SVE ACLE types.

2024-11-29 Thread Christophe Lyon
Hi!

On Fri, 29 Nov 2024 at 05:00, Tejas Belagod  wrote:
>
> This patch changes the TYPE_INDIVISIBLE flag to 0 to enable SVE ACLE types
> to be treated as GNU vectors and have the same semantics with operations
> that are defined on GNU vectors.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-sve-builtins.cc (register_builtin_types):
> Flip TYPE_INDIVISIBLE flag for SVE ACLE vector types.

Sorry I haven't closely followed the discussions around this patch
series, but the Linaro postcommit CI reports
1036 regressions after patch 2/8, is that expected?
Given that precommit CI detected "only" 22 regressions with all 8
patches, I suppose most of the 1036 are fixed later in the series?

Thanks,

Christophe

> ---
>  gcc/config/aarch64/aarch64-sve-builtins.cc | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
> index 0fec1cd439e..adbadd303d4 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> @@ -4576,6 +4576,9 @@ register_builtin_types ()
>   vectype = build_truth_vector_type_for_mode 
> (BYTES_PER_SVE_VECTOR,
>   VNx16BImode);
>   num_pr = 1;
> + /* Leave svbool_t as indivisible for now.  We don't yet support
> +C/C++ operators on predicates.  */
> + TYPE_INDIVISIBLE_P (vectype) = 1;
> }
>   else
> {
> @@ -4592,12 +4595,12 @@ register_builtin_types ()
>   && TYPE_ALIGN (vectype) == 128
>   && known_eq (size, BITS_PER_SVE_VECTOR));
>   num_zr = 1;
> + TYPE_INDIVISIBLE_P (vectype) = 0;
> }
>   vectype = build_distinct_type_copy (vectype);
>   gcc_assert (vectype == TYPE_MAIN_VARIANT (vectype));
>   SET_TYPE_STRUCTURAL_EQUALITY (vectype);
>   TYPE_ARTIFICIAL (vectype) = 1;
> - TYPE_INDIVISIBLE_P (vectype) = 1;
>   make_type_sizeless (vectype);
> }
>if (num_pr)
> --
> 2.25.1
>


Re: [Patch, fortran] PR102689 revisited - Segfault with RESHAPE of CLASS as actual argument

2024-11-29 Thread Paul Richard Thomas
Hi Harald,

Sorry about that - it was the standard HEAD versus HEAD~ mistake.

Thanks for pointing it out.

Paul


On Fri, 29 Nov 2024 at 17:31, Harald Anlauf  wrote:

> Hi Paul,
>
> the patch seems to contain stuff that has already been pushed
> (gcc/testsuite/gfortran.dg/pr117768.f90, and the chunks in
> class.cc and resolve.cc).  Can you please check?
>
> Cheers,
> Harald
>
> On 29.11.24 at 17:34, Paul Richard Thomas wrote:
> > Hi All,
> >
> > This patch was originally pushed as r15-2739. Subsequently memory faults
> > were found and so the patch was reverted. At the time, I could not find
> > where the problem lay. This morning I had another look and found it almost
> > immediately :-)
> >
> > The fix is the 'gfc_resize_class_size_with_len' in the chunk '@@ -1595,14
> > +1629,51 @@ gfc_trans_create_temp_array '. Without it, half as much memory
> > as needed was being provided by the allocation and so accesses were
> > occurring outside the allocated space. Valgrind now reports no errors.
> >
> > Regression tests pass with flying colours - OK for mainline?
> >
> > Paul
> >
>
>
diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index a458af322ce..870f2920ddc 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -1325,23 +1325,28 @@ get_array_ref_dim_for_loop_dim (gfc_ss *ss, int loop_dim)
is a class expression.  */
 
 static tree
-get_class_info_from_ss (stmtblock_t * pre, gfc_ss *ss, tree *eltype)
+get_class_info_from_ss (stmtblock_t * pre, gfc_ss *ss, tree *eltype,
+			gfc_ss **fcnss)
 {
+  gfc_ss *loop_ss = ss->loop->ss;
   gfc_ss *lhs_ss;
   gfc_ss *rhs_ss;
+  gfc_ss *fcn_ss = NULL;
   tree tmp;
   tree tmp2;
   tree vptr;
-  tree rhs_class_expr = NULL_TREE;
+  tree class_expr = NULL_TREE;
   tree lhs_class_expr = NULL_TREE;
   bool unlimited_rhs = false;
   bool unlimited_lhs = false;
   bool rhs_function = false;
+  bool unlimited_arg1 = false;
   gfc_symbol *vtab;
+  tree cntnr = NULL_TREE;
 
   /* The second element in the loop chain contains the source for the
- temporary; ie. the rhs of the assignment.  */
-  rhs_ss = ss->loop->ss->loop_chain;
+ class temporary created in gfc_trans_create_temp_array.  */
+  rhs_ss = loop_ss->loop_chain;
 
   if (rhs_ss != gfc_ss_terminator
   && rhs_ss->info
@@ -1350,28 +1355,58 @@ get_class_info_from_ss (stmtblock_t * pre, gfc_ss *ss, tree *eltype)
   && rhs_ss->info->data.array.descriptor)
 {
   if (rhs_ss->info->expr->expr_type != EXPR_VARIABLE)
-	rhs_class_expr
+	class_expr
 	  = gfc_get_class_from_expr (rhs_ss->info->data.array.descriptor);
   else
-	rhs_class_expr = gfc_get_class_from_gfc_expr (rhs_ss->info->expr);
+	class_expr = gfc_get_class_from_gfc_expr (rhs_ss->info->expr);
   unlimited_rhs = UNLIMITED_POLY (rhs_ss->info->expr);
   if (rhs_ss->info->expr->expr_type == EXPR_FUNCTION)
 	rhs_function = true;
 }
 
+  /* Usually, ss points to the function. When the function call is an actual
+ argument, it is instead rhs_ss because the ss chain is shifted by one.  */
+  *fcnss = fcn_ss = rhs_function ? rhs_ss : ss;
+
+  /* If this is a transformational function with a class result, the info
+ class_container field points to the class container of arg1.  */
+  if (class_expr != NULL_TREE
+  && fcn_ss->info && fcn_ss->info->expr
+  && fcn_ss->info->expr->expr_type == EXPR_FUNCTION
+  && fcn_ss->info->expr->value.function.isym
+  && fcn_ss->info->expr->value.function.isym->transformational)
+{
+  cntnr = ss->info->class_container;
+  unlimited_arg1
+	   = UNLIMITED_POLY (fcn_ss->info->expr->value.function.actual->expr);
+}
+
   /* For an assignment the lhs is the next element in the loop chain.
  If we have a class rhs, this had better be a class variable
- expression!  */
+ expression!  Otherwise, the class container from arg1 can be used
+ to set the vptr and len fields of the result class container.  */
   lhs_ss = rhs_ss->loop_chain;
-  if (lhs_ss != gfc_ss_terminator
-  && lhs_ss->info
-  && lhs_ss->info->expr
+  if (lhs_ss && lhs_ss != gfc_ss_terminator
+  && lhs_ss->info && lhs_ss->info->expr
   && lhs_ss->info->expr->expr_type ==EXPR_VARIABLE
   && lhs_ss->info->expr->ts.type == BT_CLASS)
 {
   tmp = lhs_ss->info->data.array.descriptor;
   unlimited_lhs = UNLIMITED_POLY (rhs_ss->info->expr);
 }
+  else if (cntnr != NULL_TREE)
+{
+  tmp = gfc_class_vptr_get (class_expr);
+  gfc_add_modify (pre, tmp, fold_convert (TREE_TYPE (tmp),
+	  gfc_class_vptr_get (cntnr)));
+  if (unlimited_rhs)
+	{
+	  tmp = gfc_class_len_get (class_expr);
+	  if (unlimited_arg1)
+	gfc_add_modify (pre, tmp, gfc_class_len_get (cntnr));
+	}
+  tmp = NULL_TREE;
+}
   else
 tmp = NULL_TREE;
 
@@ -1379,35 +1414,33 @@ get_class_info_from_ss (stmtblock_t * pre, gfc_ss *ss, tree *eltype)
   if (tmp != NULL_TREE && lhs_ss->loop_chain == gfc_ss_terminator)
 lhs_class_expr = gf

[PATCH 1/2] libstdc++: Improve test for synopsis

2024-11-29 Thread Jonathan Wakely
libstdc++-v3/ChangeLog:

* testsuite/20_util/headers/utility/synopsis.cc: Add
declarations from C++11 and later.
---

It's a bit messy with all the macros, but I think it's still better to
have one test that runs in every -std mode than to have 4+ tests that are
only valid in one or two -std modes. Maybe others disagree?
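
A minimal sketch of the macro approach described above, for readers who have
not seen it before. The macro name SKETCH_CONSTEXPR14 and the function
clamp_to_zero are hypothetical and exist only for illustration; they are not
part of the patch or of libstdc++. The idea is that one declaration compiles
in every -std mode, and only gains constexpr where the language allows it.

```cpp
#include <cassert>

// Hypothetical macro in the style of the patch's CONSTEXPR14: expands to
// constexpr only when the selected -std mode supports it.
#if __cplusplus >= 201402L
# define SKETCH_CONSTEXPR14 constexpr
#else
# define SKETCH_CONSTEXPR14
#endif

// One declaration that is valid in every -std mode.  The if-statement in
// the body needs C++14 relaxed constexpr, hence the C++14 macro.
SKETCH_CONSTEXPR14 int clamp_to_zero(int v)
{
  if (v < 0)
    return 0;
  return v;
}

#if __cplusplus >= 201402L
// Only checked at compile time where the macro expanded to constexpr.
static_assert(clamp_to_zero(-3) == 0, "constant expression from C++14 on");
#endif

// The runtime behaviour is the same in every mode.
inline void check_at_runtime()
{
  assert(clamp_to_zero(-3) == 0);
  assert(clamp_to_zero(7) == 7);
}
```

Compiling the same file with -std=c++98, -std=c++11 and -std=c++14 should
succeed in each case, which is the property the single synopsis test relies on.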

Tested x86_64-linux.

 .../20_util/headers/utility/synopsis.cc   | 108 +-
 1 file changed, 102 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/testsuite/20_util/headers/utility/synopsis.cc b/libstdc++-v3/testsuite/20_util/headers/utility/synopsis.cc
index dddb54fd48a..51e88b70f51 100644
--- a/libstdc++-v3/testsuite/20_util/headers/utility/synopsis.cc
+++ b/libstdc++-v3/testsuite/20_util/headers/utility/synopsis.cc
@@ -20,6 +20,36 @@
 
 #include 
 
+#if __cplusplus < 201103L
+# define CONSTEXPR
+#else
+# define CONSTEXPR constexpr
+#endif
+
+#if __cplusplus < 201402L && ! defined(_GLIBCXX_RELEASE)
+# define CONSTEXPR11x
+#else
+# define CONSTEXPR11x constexpr
+#endif
+
+#if __cplusplus < 201402L
+# define CONSTEXPR14
+#else
+# define CONSTEXPR14 constexpr
+#endif
+
+#if __cplusplus < 201703L
+# define CONSTEXPR17
+#else
+# define CONSTEXPR17 constexpr
+#endif
+
+#if __cplusplus < 202002L
+# define CONSTEXPR20
+#else
+# define CONSTEXPR20 constexpr
+#endif
+
 namespace std {
   //  lib.operators, operators:
   namespace rel_ops {
@@ -29,18 +59,84 @@ namespace std {
 template bool operator>=(const T&, const T&);
   }
 
+#if __cplusplus >= 201103L
+#if 0
+  // N.B. our std::swap doesn't actually match this due to constraints on
+  // the template parameter.
+  template
+CONSTEXPR20
+void swap(T&, T&) noexcept(is_nothrow_move_constructible::value
+  && is_nothrow_move_assignable::value);
+#endif
+
+  template
+CONSTEXPR20
+void swap(T (&a)[N], T (&b)[N]) noexcept(noexcept(swap(*a, *b)));
+
+#if __cplusplus >= 201703L
+  template 
+CONSTEXPR20
+T exchange(T& obj, U&& new_val)
+#if defined _GLIBCXX_RELEASE // This noexcept is a libstdc++ extension.
+noexcept(__and_,
+   is_nothrow_assignable>::value)
+#endif
+;
+#endif
+
+  template
+CONSTEXPR11x
+T&& forward(typename remove_reference::type& t) noexcept;
+  template
+CONSTEXPR11x
+T&& forward(typename remove_reference::type&& t) noexcept;
+
+  template
+CONSTEXPR11x
+typename remove_reference::type&& move(T&& t) noexcept;
+
+  template
+CONSTEXPR17
+typename conditional< ! is_nothrow_move_constructible::value
+ && is_copy_constructible::value,
+ const T&, T&&>::type
+move_if_noexcept(T& x) noexcept;
+
+#if __cplusplus >= 201703L
+  template
+constexpr add_const_t& as_const(T& t) noexcept;
+#endif
+
+  template 
+typename add_rvalue_reference::type declval() noexcept;
+
+#if __cplusplus >= 201402L
+  template struct integer_sequence;
+#endif
+
+#endif // C++11
+
   //  lib.pairs, pairs:
   template  struct pair;
   template 
-  _GLIBCXX_CONSTEXPR bool operator==(const pair&, const pair&);
+  CONSTEXPR bool operator==(const pair&, const pair&);
   template 
-  _GLIBCXX_CONSTEXPR bool operator< (const pair&, const pair&);
+  CONSTEXPR bool operator< (const pair&, const pair&);
   template 
-  _GLIBCXX_CONSTEXPR bool operator!=(const pair&, const pair&);
+  CONSTEXPR bool operator!=(const pair&, const pair&);
   template 
-  _GLIBCXX_CONSTEXPR bool operator> (const pair&, const pair&);
+  CONSTEXPR bool operator> (const pair&, const pair&);
   template 
-  _GLIBCXX_CONSTEXPR bool operator>=(const pair&, const pair&);
+  CONSTEXPR bool operator>=(const pair&, const pair&);
   template 
-  _GLIBCXX_CONSTEXPR bool operator<=(const pair&, const pair&);
+  CONSTEXPR bool operator<=(const pair&, const pair&);
+
+#if __cplusplus >= 201103L
+  struct piecewise_construct_t;
+#if __cplusplus >= 201703L
+  struct in_place_t;
+  template struct in_place_type_t;
+  template struct in_place_index_t;
+#endif
+#endif
 }
-- 
2.47.0



[pushed][PR117770][LRA]: Check hard regs corresponding insn operands for hard reg clobbers

2024-11-29 Thread Vladimir Makarov

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117770

The patch was successfully tested and bootstrapped on x86_64, aarch64, and
ppc64le.




Re: gimplify: Handle void BIND_EXPR as asm input [PR100501]

2024-11-29 Thread Joseph Myers
On Fri, 29 Nov 2024, Richard Biener wrote:

> I think we're trying to handle erroneous cases by setting TREE_VALUE
> to error_mark_node
> before this, so how about the following instead?

Yes, that works, and also fixes the test in bug 100792 unlike my previous 
patch.  Here's a full, tested patch using your version.


gimplify: Handle void expression as asm input [PR100501, PR100792]

As reported in bug 100501 (plus duplicates), the gimplifier ICEs for C
tests involving a statement expression not returning a value as an asm
input; this includes the variant bug 100792 where the statement
expression ends with another asm statement.

The expected diagnostic for this case (as seen for C++ input) is one
coming from the gimplifier and so it seems reasonable to fix the
gimplifier to handle the GENERIC generated for this case by the C
front end, rather than trying to make the C front end detect it
earlier.  Thus make the gimplifier handle a void expression like
other non-lvalues for such a memory input.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.  OK to commit?

PR c/100501
PR c/100792

gcc/
* gimplify.cc (gimplify_asm_expr): Handle void expressions for
memory inputs like other non-lvalues.

gcc/testsuite/
* gcc.dg/pr100501-1.c, gcc.dg/pr100792-1.c: New tests.
* gcc.dg/pr48552-1.c, gcc.dg/pr48552-2.c,
gcc.dg/torture/pr98601.c: Update expected errors.

Co-authored-by: Richard Biener 

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index fb0ca23bfb6c..aa99c0a98f73 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -7453,7 +7453,8 @@ gimplify_asm_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
  || TREE_CODE (inputv) == PREINCREMENT_EXPR
  || TREE_CODE (inputv) == POSTDECREMENT_EXPR
  || TREE_CODE (inputv) == POSTINCREMENT_EXPR
- || TREE_CODE (inputv) == MODIFY_EXPR)
+ || TREE_CODE (inputv) == MODIFY_EXPR
+ || VOID_TYPE_P (TREE_TYPE (inputv)))
TREE_VALUE (link) = error_mark_node;
  tret = gimplify_expr (&TREE_VALUE (link), pre_p, post_p,
is_gimple_lvalue, fb_lvalue | fb_mayfail);
diff --git a/gcc/testsuite/gcc.dg/pr100501-1.c b/gcc/testsuite/gcc.dg/pr100501-1.c
new file mode 100644
index ..152caac8b5d1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr100501-1.c
@@ -0,0 +1,26 @@
+/* Test ICE for statement expression returning no value as asm input (bug
+   100501).  */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+int x;
+int g ();
+
+void
+f ()
+{
+  __asm__ ("" : : "m" (({}))); /* { dg-error "memory input 0 is not directly addressable" } */
+  __asm__ ("" : : "m" (({ ; }))); /* { dg-error "memory input 0 is not directly addressable" } */
+  __asm__ ("" : : "m" (({ (void) 0; }))); /* { dg-error "memory input 0 is not directly addressable" } */
+  __asm__ ("" : : "m" (({ f (); }))); /* { dg-error "memory input 0 is not directly addressable" } */
+  __asm__ ("" : : "m" (({ f (); f (); }))); /* { dg-error "memory input 0 is not directly addressable" } */
+  __asm__ ("" : : "m" (({ x = g (); f (); }))); /* { dg-error "memory input 0 is not directly addressable" } */
+  __asm__ ("" : : "m" (({ if (1) g (); }))); /* { dg-error "memory input 0 is not directly addressable" } */
+  __asm__ ("" : : "m" (({ if (1) g (); else g (); }))); /* { dg-error "memory input 0 is not directly addressable" } */
+  __asm__ ("" : : "m" (({ test : goto test; }))); /* { dg-error "memory input 0 is not directly addressable" } */
+  __asm__ ("" : : "m" (({ return; }))); /* { dg-error "memory input 0 is not directly addressable" } */
+  __asm__ ("" : : "m" (({ while (1); }))); /* { dg-error "memory input 0 is not directly addressable" } */
+  __asm__ ("" : : "m" (({ do {} while (1); }))); /* { dg-error "memory input 0 is not directly addressable" } */
+  __asm__ ("" : : "m" (({ for (;;); }))); /* { dg-error "memory input 0 is not directly addressable" } */
+  __asm__ ("" : : "m" (({ switch (x); }))); /* { dg-error "memory input 0 is not directly addressable" } */
+}
diff --git a/gcc/testsuite/gcc.dg/pr100792-1.c b/gcc/testsuite/gcc.dg/pr100792-1.c
new file mode 100644
index ..52f3aaf83f73
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr100792-1.c
@@ -0,0 +1,10 @@
+/* Test ICE for statement expression ending with asm as asm input (bug
+   100792).  */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+void
+f ()
+{
+  __asm__ ("" : : "m" (({ __asm__ (""); }))); /* { dg-error "memory input 0 is not directly addressable" } */
+}
diff --git a/gcc/testsuite/gcc.dg/pr48552-1.c b/gcc/testsuite/gcc.dg/pr48552-1.c
index 4cd7c59011ef..b3ef453cb1d5 100644
--- a/gcc/testsuite/gcc.dg/pr48552-1.c
+++ b/gcc/testsuite/gcc.dg/pr48552-1.c
@@ -20,6 +20,7 @@ void
 f3 (void *x)
 {
   __asm volatile ("" : : "m" (*x));/* { dg-warning "dereferencing" } */
+  /* { dg-error "memory input 0 is not directly addressable" "not 

Re: [PATCH] ext-dce: Fix SIGN_EXTEND handling and cleanups [PR117360]

2024-11-29 Thread Jeff Law




On 11/29/24 1:43 AM, Jakub Jelinek wrote:

Hi!

This is mostly a blind attempt to fix the PR + various cleanups.
The PR is about a shift of a HOST_WIDE_INT by 127 invoking UB.

Most of carry_backpropagate works on GET_MODE_INNER of the operand,
mode is assigned
   enum machine_mode mode = GET_MODE_INNER (GET_MODE (x));
at the beginning and everything is done using that mode, so for
vector modes (or complex even?) we work with the element modes
rather than vector/complex modes.
But the SIGN_EXTEND handling does that inconsistently, it looks
at mode of the operand and uses GET_MODE_INNER in GET_MODE_MASK,
but doesn't use it in the shift.
The following patch, apart from the cleanups, fixes it by doing
essentially:
mode = GET_MODE (XEXP (x, 0));
if (mask & ~GET_MODE_MASK (GET_MODE_INNER (mode)))
-   mask |= 1ULL << (GET_MODE_BITSIZE (mode).to_constant () - 1);
+   mask |= 1ULL << (GET_MODE_BITSIZE (GET_MODE_INNER (mode)).to_constant () - 1);
i.e. also shifting by GET_MODE_BITSIZE of the GET_MODE_INNER of the
operand's mode.  We don't need to check if it is at most 64 bits,
at the start of the function we've already verified the result mode
is at most 64 bits and SIGN_EXTEND by definition extends from a narrower
mode.
Yea, the code was definitely not consistent in this regard.  Thanks for 
cleaning it up.
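
For readers following along, here is a small sketch of why the shift count
has to come from the element (inner) mode.  ModeSketch and
sign_bit_of_element are hypothetical stand-ins written for this note; they
are not the ext-dce.cc code and do not use GCC's machine_mode API.  With a
128-bit vector of 32-bit elements, shifting a 64-bit value by 128 - 1 is the
undefined shift by 127 from the PR, while shifting by 32 - 1 is well defined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a machine mode: total width and element
// (inner) width, both in bits.
struct ModeSketch
{
  unsigned total_bits;
  unsigned inner_bits;
};

// Returns a mask with only the sign bit of one element set.  Using
// total_bits - 1 as the shift count instead would be undefined behaviour
// whenever total_bits exceeds 64, e.g. 127 for a 128-bit vector.
inline std::uint64_t sign_bit_of_element(ModeSketch m)
{
  assert(m.inner_bits >= 1 && m.inner_bits <= 64);
  return std::uint64_t(1) << (m.inner_bits - 1);
}
```

For a scalar mode the two widths are equal, so the result is unchanged there;
only vector (and possibly complex) operands see a different shift count.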





The rest of the patch are cleanups.  For HOST_WIDE_INT we have the
HOST_WIDE_INT_{UC,1U} macros, a HWI isn't necessarily unsigned long long,
so using ULL suffixes for it is weird.
Yea, the code clearly needed a lot of cleanups related to its constants. 
 I actually just cleaned up several similar problems in the CRC bits I 
committed earlier this week.  It's an easy issue to miss.




More importantly, the function does
   scalar_int_mode smode;
   if (!is_a <scalar_int_mode> (mode, &smode)
   || GET_MODE_BITSIZE (smode) > HOST_BITS_PER_WIDE_INT)
 return mmask;
early, so we don't need to use GET_MODE_BITSIZE (mode) which is
a poly_int but can use GET_MODE_BITSIZE (smode) with the same value
but in unsigned short, so we don't need to use known_lt or .to_constant ()
everywhere.
Good cleanup.  I'd started working backwards from the bottom of the 
patch and had repeatedly asked myself if the changes were safe.  This 
explains it just fine.





Plus some formatting issues.

What I've left around is
   if (!GET_MODE_BITSIZE (GET_MODE (x)).is_constant ()
   || !GET_MODE_BITSIZE (GET_MODE (XEXP (x, 0))).is_constant ())
 return -1;
at the start of SIGN_EXTEND or ZERO_EXTEND, I'm afraid I don't know enough
about aarch64/riscv VL vectors to know why this is done (though even that
return -1; is weird, rest of the code does return mmask; if it wants to
punt).
It's just saying everything is live if we're presented with a mode that 
doesn't have a constant size.  mmask, if wide enough, should work 
better in theory.





Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-11-29  Jakub Jelinek  

PR rtl-optimization/117360
* ext-dce.cc (ext_dce_process_sets): Use HOST_WIDE_INT_UC
macro instead of ULL suffixed constants.
(carry_backpropagate): Likewise.  Use HOST_WIDE_INT_1U instead of
1ULL.  Use GET_MODE_BITSIZE (smode) instead of
GET_MODE_BITSIZE (mode) and with that avoid having to use
known_lt instead of < or use .to_constant ().  Formatting fixes.
(case SIGN_EXTEND): Set mode to GET_MODE_INNER (GET_MODE (XEXP (x, 0)))
rather than GET_MODE (XEXP (x, 0)) and don't use GET_MODE_INNER (mode).
(ext_dce_process_uses): Use HOST_WIDE_INT_UC macro instead of ULL
suffixed constants.
Thanks.  I hadn't forgotten this, but was still a few days away from 
being able to focus on it.


I still have other concerns about this code (the carry_backpropagate 
bits), but your changes don't impact those concerns at all.


OK for the trunk.

jeff



Re: [PATCH] PR target/117669 - RISC-V:The 'VEEWTRUNC4' iterator 'RVVMF2BF' type condition error

2024-11-29 Thread Hans-Peter Nilsson
On Wed, 20 Nov 2024, Feng Wang wrote:

> This patch fixes the wrong condition for RVVMF2BF. It should be
> TARGET_VECTOR_ELEN_BF_16.
> gcc/ChangeLog:
> 
>   PR target/117669
>   * config/riscv/vector-iterators.md:
> 
> Signed-off-by: Feng Wang 

There's missing text after the ":", where one would expect the 
line to be something like:

* config/riscv/vector-iterators.md (RVVMF2BF): Correct condition.

Too late to fix now as the approval and commit was quick, but 
please *edit* the result of contrib/mklog.py; it's not final.  
The commit checker is known to have unfortunate flaws.  (It can't 
rule out a ":" at the end of a line, as it is sometimes valid.)

brgds, H-P

> ---
>  gcc/config/riscv/vector-iterators.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
> index 6a621459cc4..92cb651ce49 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -365,7 +365,7 @@
>  
>(RVVM2BF "TARGET_VECTOR_ELEN_BF_16")
>(RVVM1BF "TARGET_VECTOR_ELEN_BF_16")
> -  (RVVMF2BF "TARGET_VECTOR_ELEN_FP_16")
> +  (RVVMF2BF "TARGET_VECTOR_ELEN_BF_16")
>(RVVMF4BF "TARGET_VECTOR_ELEN_BF_16 && TARGET_MIN_VLEN > 32 && TARGET_64BIT")
>  
>(RVVM2HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_64BIT")
> -- 
> 2.17.1
> 
> 


Re: [PATCH 1/2] c++: some further concepts cleanups

2024-11-29 Thread Jason Merrill

On 11/27/24 2:41 PM, Patrick Palka wrote:

On Tue, 5 Nov 2024, Jason Merrill wrote:


On 10/15/24 12:45 AM, Patrick Palka wrote:

This patch further cleans up the concepts code following the removal of
Concepts TS support:

* concept-ids are now the only kind of "concept check", so we can
  simplify some code accordingly.  In particular resolve_concept_check
  seems like a no-op and can be removed.
* In turn, deduce_constrained_parameter doesn't seem to do anything
  interesting.
* In light of the above we might as well inline finish_type_constraints
  into its only caller.  Note that the "prototype parameter" of a
  concept is just the first template parameter which the caller can
  easily obtain itself.


But it's still a defined term in the language
(https://eel.is/c++draft/temp#concept-7) so I think it's worth having an
accessor function.  I agree with doing away with the function that returns a
pair.


Done.




* placeholder_extract_concept_and_args is only ever called on a
  concept-id, so it's simpler to inline it into its callers.
* There's no such thing as a template-template-parameter with a
  type-constraint, so we can remove such handling from the parser.
  This means is_constrained_parameter is currently equivalent to
  declares_constrained_type_template_parameter, so let's prefer
  to use the latter.


Why prefer the longer name?


"is_constrained_parameter" suggests it should return true for a
constrained non-type parameter, but that's currently not the case and
callers don't expect/want that behavior, so the longer name seems
more accurate.




We might be able to remove WILDCARD_DECL and CONSTRAINED_PARM_PROTOTYPE
now as well, but I left that as future work.

@@ -18901,7 +18842,8 @@ cp_parser_template_parameter (cp_parser* parser, bool *is_non_type,
   }
   /* The parameter may have been constrained type parameter.  */
-  if (is_constrained_parameter (parameter_declarator))
+  tree type = parameter_declarator->decl_specifiers.type;
+  if (declares_constrained_type_template_parameter (type))


Why not retain a function that takes the declarator?


I added another overload of declares_constrained_type_template_parameter
taking a cp_parameter_declarator.




   return finish_constrained_parameter (parser,
parameter_declarator,
is_non_type);
@@ -20987,11 +20929,12 @@ cp_parser_placeholder_type_specifier (cp_parser *parser, location_t loc,
 tsubst_flags_t complain = tentative ? tf_none : tf_warning_or_error;
   /* Get the concept and prototype parameter for the constraint.  */
-  tree_pair info = finish_type_constraints (tmpl, args, complain);
-  tree con = info.first;
-  tree proto = info.second;
-  if (con == error_mark_node)
+  tree check = build_type_constraint (tmpl, args, complain);
+  if (check == error_mark_node)
   return error_mark_node;
+  tree con = STRIP_TEMPLATE (tmpl);
+  tree parms = DECL_INNERMOST_TEMPLATE_PARMS (tmpl);
+  tree proto = TREE_VALUE (TREE_VEC_ELT (parms, 0));


And as mentioned above, let's have a small function that returns the prototype
parameter of a concept, to use here.


Done.



Incidentally, why do we want to strip the template from con?  Is that also a
relic of the different possible forms of concept?


Oops, I think we could use DECL_TEMPLATE_RESULT here instead of STRIP_TEMPLATE,
since 'tmpl' will always be a TEMPLATE_DECL.

Here's v2 which squashes the 3rd patch (removing WILDCARD_DECL) into
this one, and addresses your feedback.


OK.


-- >8 --

Subject: [PATCH] c++: some further concepts cleanups

This patch further cleans up the concepts code following the removal of
Concepts TS support:

   * concept-ids are now the only kind of "concept check", so we can
 simplify some code accordingly.  In particular resolve_concept_check
 seems like a no-op and can be removed.
   * In turn, deduce_constrained_parameter doesn't seem to do anything
 interesting.
   * In light of the above we might as well inline finish_type_constraints
 into its only caller.
   * Introduce and use a helper for obtaining the prototype parameter of
 a concept, i.e. its first template parameter.
   * placeholder_extract_concept_and_args is only ever called on a
 concept-id, so it's simpler to inline it into its callers.
   * There's no such thing as a template-template-parameter with a
 type-constraint, so we can remove such handling from the parser.
 This means is_constrained_parameter is currently equivalent to
 declares_constrained_type_template_parameter, so let's prefer
 to use the latter.
   * Remove WILDCARD_DECL and instead use the concept's prototype parameter
 as the dummy first argument of a type-constraint during template
 argument coercion.
   * Remove a redundant concept_definition_p overload.

gcc/cp/ChangeLog:

* constraint.cc (resol

[PATCH] gimplefe: Error recovery for invalid declarations [PR117749]

2024-11-29 Thread Andrew Pinski
c_parser_declarator can return null if there was an error,
but c_parser_gimple_declaration was not ready for that.
This fixes that oversight so we don't get an ICE after the error.
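
The control-flow change can be sketched with a toy parser.  Everything in
this block (MiniParser, Declarator, the token strings) is hypothetical and
written for this note; it is not the gimple-parser.cc code and does not use
GCC's c_parser API.  It only mirrors the shape of the fix: diagnose a missing
semicolon first, then use the declarator only if it is non-null.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical miniature of a declarator; not GCC's c_declarator.
struct Declarator
{
  std::string name;
};

struct MiniParser
{
  std::vector<std::string> tokens;
  std::size_t pos;
  std::vector<std::string> errors;
  std::vector<std::string> declared;

  MiniParser() : pos(0) {}

  bool next_is(const std::string &t) const
  {
    return pos < tokens.size() && tokens[pos] == t;
  }

  // Returns a null pointer, after recording an error, when no identifier
  // is present.  This is the case the caller must be prepared for.
  const Declarator *parse_declarator(Declarator &storage)
  {
    if (pos < tokens.size() && tokens[pos] != ";")
      {
        storage.name = tokens[pos++];
        return &storage;
      }
    errors.push_back("expected identifier");
    return 0;
  }

  void parse_declaration()
  {
    Declarator storage;
    const Declarator *d = parse_declarator(storage);
    // First the syntax check, with an early return on failure ...
    if (!next_is(";"))
      {
        errors.push_back("expected ';'");
        return;
      }
    // ... then touch the declarator only if parsing it succeeded.
    if (d)
      declared.push_back(d->name);
  }
};
```

With the input `int ;` from the new test, the toy parser records one error
and declares nothing, rather than dereferencing a null declarator.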

Bootstrapped and tested on x86_64-linux-gnu.

PR c/117749

gcc/c/ChangeLog:

* gimple-parser.cc (c_parser_gimple_declaration): Check
declarator to be non-null.

gcc/testsuite/ChangeLog:

* gcc.dg/gimplefe-55.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/c/gimple-parser.cc | 12 ++--
 gcc/testsuite/gcc.dg/gimplefe-55.c | 11 +++
 2 files changed, 17 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/gimplefe-55.c

diff --git a/gcc/c/gimple-parser.cc b/gcc/c/gimple-parser.cc
index 4763cf23313..78e85d93487 100644
--- a/gcc/c/gimple-parser.cc
+++ b/gcc/c/gimple-parser.cc
@@ -2208,7 +2208,12 @@ c_parser_gimple_declaration (gimple_parser &parser)
specs->typespec_kind != ctsk_none,
C_DTR_NORMAL, &dummy);
 
-  if (c_parser_next_token_is (parser, CPP_SEMICOLON))
+  if (!c_parser_next_token_is (parser, CPP_SEMICOLON))
+{
+  c_parser_error (parser, "expected %<;%>");
+  return;
+}
+  if (declarator)
 {
   /* Handle SSA name decls specially, they do not go into the identifier
  table but we simply build the SSA name for later lookup.  */
@@ -2253,11 +2258,6 @@ c_parser_gimple_declaration (gimple_parser &parser)
 NULL_TREE);
}
 }
-  else
-{
-  c_parser_error (parser, "expected %<;%>");
-  return;
-}
 }
 
 /* Parse gimple goto statement.  */
diff --git a/gcc/testsuite/gcc.dg/gimplefe-55.c b/gcc/testsuite/gcc.dg/gimplefe-55.c
new file mode 100644
index 000..120f4ec0ac9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gimplefe-55.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-fgimple" } */
+
+/* PR c/117749 */
+/* Make sure we don't ICE after not have a full local
+   declaration in gimple fe. */
+
+__GIMPLE
+void foo ( ) {
+  int ;  /* { dg-error "" } */
+}
-- 
2.43.0



Re: [PATCH] gimplefe: Error recovery for invalid declarations [PR117749]

2024-11-29 Thread Richard Biener



> On 30.11.2024 at 05:44, Andrew Pinski wrote:
> 
> c_parser_declarator can return null if there was an error,
> but c_parser_gimple_declaration was not ready for that.
> This fixes that oversight so we don't get an ICE after the error.
> 
> Bootstrapped and tested on x86_64-linux-gnu.

Ok

Richard 

>PR c/117749
> 
> gcc/c/ChangeLog:
> 
>* gimple-parser.cc (c_parser_gimple_declaration): Check
>declarator to be non-null.
> 
> gcc/testsuite/ChangeLog:
> 
>* gcc.dg/gimplefe-55.c: New test.
> 
> Signed-off-by: Andrew Pinski 
> ---
> gcc/c/gimple-parser.cc | 12 ++--
> gcc/testsuite/gcc.dg/gimplefe-55.c | 11 +++
> 2 files changed, 17 insertions(+), 6 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/gimplefe-55.c
> 
> diff --git a/gcc/c/gimple-parser.cc b/gcc/c/gimple-parser.cc
> index 4763cf23313..78e85d93487 100644
> --- a/gcc/c/gimple-parser.cc
> +++ b/gcc/c/gimple-parser.cc
> @@ -2208,7 +2208,12 @@ c_parser_gimple_declaration (gimple_parser &parser)
>specs->typespec_kind != ctsk_none,
>C_DTR_NORMAL, &dummy);
> 
> -  if (c_parser_next_token_is (parser, CPP_SEMICOLON))
> +  if (!c_parser_next_token_is (parser, CPP_SEMICOLON))
> +{
> +  c_parser_error (parser, "expected %<;%>");
> +  return;
> +}
> +  if (declarator)
> {
>   /* Handle SSA name decls specially, they do not go into the identifier
>  table but we simply build the SSA name for later lookup.  */
> @@ -2253,11 +2258,6 @@ c_parser_gimple_declaration (gimple_parser &parser)
> NULL_TREE);
>}
> }
> -  else
> -{
> -  c_parser_error (parser, "expected %<;%>");
> -  return;
> -}
> }
> 
> /* Parse gimple goto statement.  */
> diff --git a/gcc/testsuite/gcc.dg/gimplefe-55.c b/gcc/testsuite/gcc.dg/gimplefe-55.c
> new file mode 100644
> index 000..120f4ec0ac9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/gimplefe-55.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fgimple" } */
> +
> +/* PR c/117749 */
> +/* Make sure we don't ICE after not have a full local
> +   declaration in gimple fe. */
> +
> +__GIMPLE
> +void foo ( ) {
> +  int ;  /* { dg-error "" } */
> +}
> --
> 2.43.0
> 
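
For readers following the control-flow change in the quoted patch, here is a minimal standalone sketch of the same error-recovery pattern. It is not GCC code: `Declarator`, `parse_declarator`, `parse_declaration` and `Result` are hypothetical stand-ins, and the only point illustrated is that the missing-semicolon error is handled first and the declarator is then used only if it is non-null.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed declarator (not the GCC type).
struct Declarator { std::string name; };

// Hypothetical parser step: returns nullptr when the input is invalid,
// modelling a routine that has already reported its own error.
const Declarator *parse_declarator(const std::string &text, Declarator &storage)
{
  if (text.empty())
    return nullptr;
  storage.name = text;
  return &storage;
}

enum class Result { Declared, MissingSemicolon, BadDeclarator };

// Same shape as the patched function: bail out early on a missing ';',
// then only touch the declarator when it is non-null.
Result parse_declaration(const std::string &name, char next_token,
                         std::vector<std::string> &decls)
{
  Declarator storage;
  const Declarator *declarator = parse_declarator(name, storage);

  if (next_token != ';')
    return Result::MissingSemicolon;

  if (!declarator)
    return Result::BadDeclarator;  // error already diagnosed, do not dereference

  decls.push_back(declarator->name);
  return Result::Declared;
}
```

With this shape, an input like the `int ;` in the new test takes the `BadDeclarator` path instead of dereferencing a null pointer.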


Go patch committed: Work around warning to fix bootstrap

2024-11-29 Thread Ian Lance Taylor
This patch to the Go frontend increases the size of a temporary buffer
to avoid a new warning.  The warning was breaking bootstrapping with
Go.

GCC has a new -Wformat-truncation warning that triggers on some Go
frontend code:

../../gcc/go/gofrontend/go-encode-id.cc: In function 'std::string
go_encode_id(const std::string&)':
../../gcc/go/gofrontend/go-encode-id.cc:176:48: error: '%02x'
directive output may be truncated writing between 2 and 8 bytes into a
region of size 6 [-Werror=format-truncation=]
  176 |   snprintf(buf, sizeof buf, "_x%02x", c);
  |^~~~
../../gcc/go/gofrontend/go-encode-id.cc:176:45: note: directive
argument in the range [128, 4294967295]
  176 |   snprintf(buf, sizeof buf, "_x%02x", c);
  | ^~~~
../../gcc/go/gofrontend/go-encode-id.cc:176:27: note: 'snprintf'
output between 5 and 11 bytes into a destination of size 8
  176 |   snprintf(buf, sizeof buf, "_x%02x", c);
  |   ^~

The code is safe, because the value of c is known to be >= 0 && <=
0xff.  But it's difficult for the compiler to know that.  This patch
bumps the buffer size to avoid the warning.
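
As a rough illustration of the byte counts in the warning, the sketch below measures how much space the "_x%02x" format needs. The helper names `encode_byte` and `bytes_needed` are hypothetical and not part of the Go frontend, and the 11-byte figure assumes a 32-bit unsigned int.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper mirroring the snprintf call quoted above, using the
// enlarged 16-byte buffer.  Returns an empty string if the output would
// not have fit.
std::string encode_byte(unsigned int c)
{
  char buf[16];
  int n = std::snprintf(buf, sizeof buf, "_x%02x", c);
  // snprintf returns the length the full output would have had, excluding
  // the terminating NUL; n >= sizeof buf means it was truncated.
  if (n < 0 || static_cast<std::size_t>(n) >= sizeof buf)
    return std::string();
  return std::string(buf, static_cast<std::size_t>(n));
}

// Bytes needed for the formatted output, including the terminating NUL.
// Calling snprintf with a null buffer and size 0 only computes the length.
int bytes_needed(unsigned int c)
{
  return std::snprintf(nullptr, 0, "_x%02x", c) + 1;
}
```

For c <= 0xff, bytes_needed returns 5 ("_x", two hex digits, NUL), which fits in the old 8-byte buffer. For an unconstrained 32-bit value it can reach 11, which is the upper bound the compiler reports.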

This fixes https://gcc.gnu.org/PR117833.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
1021933a9ff6e15e982c858766c90cc8e58a103a
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 59badf80f40..3bd755ce515 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-f9ea9801058aa98a421784da12b76cda0b4c6cf2
+dfe585bf82380630697e96c249de825c5f655afe
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/go-encode-id.cc b/gcc/go/gofrontend/go-encode-id.cc
index 7ab65f513b3..5c82aa74533 100644
--- a/gcc/go/gofrontend/go-encode-id.cc
+++ b/gcc/go/gofrontend/go-encode-id.cc
@@ -172,7 +172,7 @@ go_encode_id(const std::string &id)
}
  else
{
- char buf[8];
+ char buf[16];
  snprintf(buf, sizeof buf, "_x%02x", c);
  ret.append(buf);
}

