date:20250711

[PATCH v2] c++: P2036R3 - Change scope of lambda trailing-return-type [PR102610]

2025-07-11 Thread Marek Polacek

On Thu, Jul 10, 2025 at 02:13:06PM -0400, Jason Merrill wrote:
> On 7/9/25 4:27 PM, Marek Polacek wrote:
> > On Tue, Jul 08, 2025 at 12:15:03PM -0400, Jason Merrill wrote:
> > > On 7/7/25 4:52 PM, Marek Polacek wrote:
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > 
> > > > -- >8 --
> > > > This patch is an attempt to implement P2036R3 along with P2579R0, fixing
> > > > build breakages caused by P2036R3.
> > > > 
> > > > The simplest example is:
> > > > 
> > > > auto counter1 = [j=0]() mutable -> decltype(j) {
> > > > return j++;
> > > > };
> > > > 
> > > > which currently doesn't compile because the 'j' in the capture isn't
> > > > visible in the trailing return type.  With these proposals, the 'j'
> > > > will be in a lambda scope which spans the trailing return type, so
> > > > this test will compile.
> > > > 
> > > > This oughtn't be difficult but decltype and other issues made this patch
> > > > much more challenging.
> > > > 
> > > > We have to push the explicit captures before going into
> > > > _lambda_declarator_opt because that is what parses the trailing
> > > > return type.  Yet we can't build any captures until after
> > > > _lambda_body -> start_lambda_function which creates the lambda's
> > > > operator(), without which we can't build a proxy, but _lambda_body
> > > > happens only after parsing the declarator.  This patch works around
> > > > it by creating a fake operator() in make_dummy_lambda_op.
> > > 
> > > I was thinking that we could build the real operator() earlier, before the
> > > trailing return type, so that it's there for the above uses, and then splice
> > > in the trailing return type to the already-built function declaration,
> > > perhaps with apply_deduced_return_type.
> > 
> > Ah, I see what you mean.  But it's not just the return type that we don't
> > have at the point where we have to have the operator(): it's also tx_qual,
> > exception_spec, std_attrs, and trailing_requires_clause.  Especially the
> > requires clause seems to be awkward to set post grokmethod; it seems I'd
> > have to replicate the flag_concepts block in grokfndecl?
> > 
> > Maybe I could add (by that I mean add it to the lambda via
> > finish_member_declaration) a bare bones operator() for the purposes of
> > parsing the return type/noexcept/requires, then after parsing them
> > construct a real operator(), then find a slot of the bare bones op(),
> > and replace it with the complete one.  I'm not sure if that makes sense
> > to do though.
> 
> I was hoping to avoid building more than one op().  But really, why do you
> need an op() at all for building the proxies?  Could you use
> build_dummy_object instead of DECL_ARGUMENTS of some fake op()?

The problem is that we need operator() to be the var's DECL_CONTEXT
for is_capture_proxy:

  && LAMBDA_FUNCTION_P (DECL_CONTEXT (decl)));
 
> > > > Another thing is that in "-> decltype(j)" we don't have the right
> > > > current_function_decl yet, so I've added the in_lambda_declarator_p flag
> > > > to be used in finish_decltype_type so that we know this decltype appertains
> > > > to a lambda -- then current_lambda_expr should give us the right lambda,
> > > > which has another new flag tracking whether mutable was seen.
> 
> The flag to finish_decltype_type seems unneeded; we should be able to tell
> from the proxy that it belongs to a lambda.  And I would think that the new
> handling in finish_decltype_type seems right in general; always refer to
> current_lambda_expr instead of current_function_decl, etc.

Good point.  I've removed the flag and simplified the patch quite a bit.
However:
- to honor [expr.prim.id.unqual]/4, I have to know if the decltype is
  in the lambda's parameter-declaration-clause or not:

   [=]() -> decltype((x))  // float const&
   [=](decltype((x)) y)    // float&

  so I'm using LAMBDA_EXPR_CONST_QUAL_P for that.
- if we want to handle nested lambdas correctly:

   [=](decltype((x)) y) {}  // float&

   [=] {
     [](decltype((x)) y) {};  // float const&
   }

  we probably will need a new flag for decltype.  
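For comparison, the lambda-body rule that the new trailing-return-type handling has to line up with already exists in C++11: inside the body, `decltype((x))` for a local that is (or would be) captured by copy has the type of the corresponding closure member accessed through the call operator, so it is `const`-qualified unless the lambda is `mutable`. The sketch below checks only that body case; the trailing-return-type and parameter-clause cases above need a compiler implementing P2036R3 and P2579R0, so they are left out. The function names are made up for the example.

```cpp
#include <type_traits>

// Inside a non-mutable [=] lambda body, (x) behaves like the would-be
// closure member seen through a const operator(), so decltype((x)) is
// const float&.  x is only named in an unevaluated operand here.
bool body_sees_const_member() {
  float x = 1.0f;
  auto f = [=] { return std::is_same<decltype((x)), const float&>::value; };
  return f();
}

// With mutable, operator() is not const, so the same expression is float&.
bool mutable_body_sees_nonconst_member() {
  float x = 1.0f;
  auto f = [=]() mutable { return std::is_same<decltype((x)), float&>::value; };
  return f();
}
```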
 
> > @@ -3351,8 +3351,12 @@ check_local_shadow (tree decl)
> > }
> >/* Don't complain if it's from an enclosing function.  */
> >else if (DECL_CONTEXT (old) == current_function_decl
> > -  && TREE_CODE (decl) != PARM_DECL
> > -  && TREE_CODE (old) == PARM_DECL)
> > +  && ((TREE_CODE (decl) != PARM_DECL
> > +   && TREE_CODE (old) == PARM_DECL)
> > +  || (is_capture_proxy (old)
> > +  && current_lambda_expr ()
> > +  && DECL_CONTEXT (old)
> > + == lambda_function (current_lambda_expr ()
> 
> What case is this handling?  Doesn't the previous if already deal with
> parm/capture collision?

The proposal says that

  [x=1]{ int x; }

is invalid, so I wanted to give an error for it.  But since -Wshadow
warns for the case above,

Re: [PATCH] c, c++: Extend -Wunused-but-set-* warnings [PR44677]

2025-07-11 Thread Jason Merrill


On 7/11/25 2:17 PM, Jakub Jelinek wrote:
> On Fri, Jul 11, 2025 at 12:26:54PM -0400, Jason Merrill wrote:
> > On 7/11/25 9:09 AM, Jakub Jelinek wrote:
> > > On Thu, Jul 10, 2025 at 04:35:49PM -0400, Jason Merrill wrote:
> > > > > --- gcc/cp/cp-gimplify.cc.jj2025-04-12 21:41:42.660924514 +0200
> > > > > +++ gcc/cp/cp-gimplify.cc   2025-04-23 21:33:19.050931604 +0200
> > > > > @@ -3200,7 +3200,23 @@ cp_fold (tree x, fold_flags_t flags)
> > > > >   loc = EXPR_LOCATION (x);
> > > > >   op0 = cp_fold_maybe_rvalue (TREE_OPERAND (x, 0), rval_ops, flags);
> > > > > +  bool clear_decl_read;
> > > > > +  clear_decl_read = false;
> > > > > +  if (code == MODIFY_EXPR
> > > > > + && (VAR_P (op0) || TREE_CODE (op0) == PARM_DECL)
> > > > > + && !DECL_READ_P (op0)
> > > > > + && (VAR_P (op0) ? warn_unused_but_set_variable
> > > > > + : warn_unused_but_set_parameter) > 2
> > > > > + && BINARY_CLASS_P (TREE_OPERAND (x, 1))
> > > > > + && TREE_OPERAND (TREE_OPERAND (x, 1), 0) == op0)
> > > > > +   {
> > > > > + mark_exp_read (TREE_OPERAND (TREE_OPERAND (x, 1), 1));
> > > > > + if (!DECL_READ_P (op0))
> > > > > +   clear_decl_read = true;
> > > > > +   }
> > > > >   op1 = cp_fold_rvalue (TREE_OPERAND (x, 1), flags);
> > > > > +  if (clear_decl_read)
> > > > > +   DECL_READ_P (op0) = 0;
> > > >
> > > > Why does this need to happen in cp_fold?  Weren't the flags set properly at
> > > > build time?
> > >
> > > Without the cp-gimplify.cc (cp_fold) hunk there are tons of FAILs, with
> >
> > Then could we do something simpler at this point, just preserving the
> > pre-folding state of DECL_READ_P (op0) without looking at the RHS?
>
> That would be wrong.
> The goal of the patch is to just add some extra exceptions which aren't
> counted as uses.
> If a MODIFY_EXPR is say
> op0 = foo (op0);
> then we shouldn't clear DECL_READ_P, the call does or could read the value
> and use it for something, so warning that op0 is just set but not used would
> be false positive.


But by the time we get to cp_fold, DECL_READ_P should have already been 
set appropriately when we built the thing we're now folding.  And 
calling mark_exp_read on foo(op0) won't mark op0 anyway; it doesn't 
recurse.  I don't see any regressions on Wunused* after


diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index 9a98628d9e8..addbc29d104 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -3221,16 +3221,8 @@ cp_fold (tree x, fold_flags_t flags)
   clear_decl_read = false;
   if (code == MODIFY_EXPR
  && (VAR_P (op0) || TREE_CODE (op0) == PARM_DECL)
- && !DECL_READ_P (op0)
- && (VAR_P (op0) ? warn_unused_but_set_variable
- : warn_unused_but_set_parameter) > 2
- && BINARY_CLASS_P (TREE_OPERAND (x, 1))
- && TREE_OPERAND (TREE_OPERAND (x, 1), 0) == op0)
-   {
- mark_exp_read (TREE_OPERAND (TREE_OPERAND (x, 1), 1));
- if (!DECL_READ_P (op0))
-   clear_decl_read = true;
-   }
+ && !DECL_READ_P (op0))
+   clear_decl_read = true;
   op1 = cp_fold_rvalue (TREE_OPERAND (x, 1), flags);
   if (clear_decl_read)
DECL_READ_P (op0) = 0;


> > > > > @@ -3740,7 +3740,28 @@ finish_unary_op_expr (location_t op_loc,
> > > > >   tree expr_ovl = expr;
> > > > >   if (!processing_template_decl)
> > > > > -expr_ovl = cp_fully_fold (expr_ovl);
> > > > > +switch (code)
> > > > > +  {
> > > > > +  case PREINCREMENT_EXPR:
> > > > > +  case PREDECREMENT_EXPR:
> > > > > +  case POSTINCREMENT_EXPR:
> > > > > +  case POSTDECREMENT_EXPR:
> > > > > +   tree stripped_expr;
> > > > > +   stripped_expr = tree_strip_any_location_wrapper (expr);
> > > > > +   if ((VAR_P (stripped_expr) || TREE_CODE (stripped_expr) == PARM_DECL)
> > > > > +   && !DECL_READ_P (stripped_expr)
> > > > > +   && (VAR_P (stripped_expr) ? warn_unused_but_set_variable
> > > > > + : warn_unused_but_set_parameter) > 1)
> > > > > + {
> > > > > +   expr_ovl = cp_fully_fold (expr_ovl);
> > > >
> > > > Again I wonder why cp_fold is setting DECL_READ_P.
> > >
> > > See above.
> >
> > Ah, the problem here is that cp_fully_fold assumes that we're using its
> > operand as an rvalue, which is wrong for ++/--.  And we aren't going to get
> > a constant result anyway, so finish_unary_op_expr shouldn't call
> > cp_fully_fold, it should return result early for lvalue codes.
>
> Well, I'm not sure it is actually an error.  finish_unary_op_expr doesn't
> use cp_fully_fold result as an operand of {PRE,POST}{INC,DEC}REMENT_EXPR
> (that would be wrong, we don't want the operand to fold into non-lvalue), it
> is called only to find out if overflow warning should be emitted.


But it only warns if the whole expression folds to a constant, which can 
never happen for these tree codes.  So folding the operand is useless.


Jason

[PATCH] testsuite: Fix overflow in gcc.dg/vect/pr116125.c

2025-07-11 Thread Siddhesh Poyarekar

The test ends up writing a byte beyond the bounds of the buffer, which gets
trapped on some targets when the test is run with
-fstack-protector-strong.

testsuite/ChangeLog:

* gcc.dg/vect/pr116125.c (mem_overlap): Reduce iteration count
to 8.

Signed-off-by: Siddhesh Poyarekar 
---
OK for trunk and backport to gcc-15?

 gcc/testsuite/gcc.dg/vect/pr116125.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr116125.c b/gcc/testsuite/gcc.dg/vect/pr116125.c
index eab9efdc061..2f45ac3edc1 100644
--- a/gcc/testsuite/gcc.dg/vect/pr116125.c
+++ b/gcc/testsuite/gcc.dg/vect/pr116125.c
@@ -8,7 +8,7 @@ struct st
 void __attribute__((noipa))
 mem_overlap (struct st *a, struct st *b)
 {
-  for (int i = 0; i < 9; i++)
+  for (int i = 0; i < 8; i++)
 a[i].num = b[i].num + 1;
 }
 
-- 
2.50.0

[pushed][PR121007, LRA]: Fall back to reload of whole inner address in PR case and constrain iteration number of address reloads

2025-07-11 Thread Vladimir Makarov


The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121007

The patch was successfully bootstrapped and tested on amd64, arm64, ppc64le.

commit 06c41504bd4a23c3f5848793fda503c30fe51353
Author: Vladimir N. Makarov 
Date:   Fri Jul 11 11:27:54 2025 -0400

[PR121007, LRA]: Fall back to reload of whole inner address in PR case and constrain iteration number of address reloads

gcc/ChangeLog:

* lra-constraints.cc (process_address_1): When changing base reg
on a reg of the base class, fall back to reload of whole inner address.
(process_address): Constrain the iteration number.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr121007.c: New.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 68aaf863a97..274b52cd617 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -3930,6 +3930,16 @@ process_address_1 (int nop, bool check_only_p,
   enum reg_class cl;
   rtx set;
   rtx_insn *insns, *last_insn;
+
+  cl = base_reg_class (ad.mode, ad.as, ad.base_outer_code,
+			   get_index_code (&ad), curr_insn);
+
+  if (REG_P (*ad.base_term)
+	  && ira_class_subset_p[get_reg_class (REGNO (*ad.base_term))][cl])
+	/* It seems base reg is already in the base reg class and changing it
+	   does not make a progress.  So reload the whole inner address.  */
+	goto reload_inner_addr;
+
   /* Try to reload base into register only if the base is invalid
  for the address but with valid offset, case (4) above.  */
   start_sequence ();
@@ -3975,8 +3985,6 @@ process_address_1 (int nop, bool check_only_p,
 	{
 	  *ad.base_term = XEXP (SET_SRC (set), 0);
 	  *ad.disp_term = XEXP (SET_SRC (set), 1);
-	  cl = base_reg_class (ad.mode, ad.as, ad.base_outer_code,
-   get_index_code (&ad), curr_insn);
 	  regno = REGNO (*ad.base_term);
 	  if (regno >= FIRST_PSEUDO_REGISTER
 		  && cl != lra_get_allocno_class (regno))
@@ -4019,11 +4027,11 @@ process_address_1 (int nop, bool check_only_p,
 }
   else
 {
-  enum reg_class cl = base_reg_class (ad.mode, ad.as,
-	  SCRATCH, SCRATCH,
-	  curr_insn);
-  rtx addr = *ad.inner;
-
+  enum reg_class cl;
+  rtx addr;
+reload_inner_addr:
+  cl = base_reg_class (ad.mode, ad.as, SCRATCH, SCRATCH, curr_insn);
+  addr = *ad.inner;
   new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, NULL, "addr");
   /* addr => new_base.  */
   lra_emit_move (new_reg, addr);
@@ -4044,14 +4052,21 @@ process_address (int nop, bool check_only_p,
 		 rtx_insn **before, rtx_insn **after)
 {
   bool res = false;
-
-  while (process_address_1 (nop, check_only_p, before, after))
+  /* Use enough iterations to process all address parts:  */
+  for (int i = 0; i < 10; i++)
 {
-  if (check_only_p)
-	return true;
-  res = true;
+  if (!process_address_1 (nop, check_only_p, before, after))
+	{
+	  return res;
+	}
+  else
+	{
+	  if (check_only_p)
+	return true;
+	  res = true;
+	}
 }
-  return res;
+  fatal_insn ("unable to reload address in ", curr_insn);
 }
 
 /* Override the generic address_reload_context in order to
diff --git a/gcc/testsuite/gcc.target/powerpc/pr121007.c b/gcc/testsuite/gcc.target/powerpc/pr121007.c
new file mode 100644
index 000..9e6b1be7911
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr121007.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+
+typedef struct { int a; } A;
+unsigned char *a;
+char b;
+int c;
+void foo (vector char, vector char, vector char);
+
+void
+bar (long stride)
+{
+  vector char v0, v1, v2, v3, v5;
+  vector char r0 = __builtin_vec_vsx_ld (0, a);
+  vector char r2 = __builtin_vec_vsx_ld (2 * stride, a - 3);
+  vector char r3 = __builtin_vec_vsx_ld (3 * stride, a - 3);
+  vector char r4;
+  vector char r6 = __builtin_vec_vsx_ld (6 * stride, a - 3);
+  vector char r7 = __builtin_vec_vsx_ld (7 * stride, a - 3);
+  vector char r14, h, i, j;
+  if (b)
+return;
+  v1 = __builtin_vec_vsx_ld (9 * stride, a);
+  v2 = __builtin_vec_vsx_ld (10 * stride, a - 3);
+  v3 = __builtin_vec_vsx_ld (11 * stride, a - 3);
+  r3 = __builtin_vec_mergeh (r3, v3);
+  v5 = __builtin_vec_mergel (r2, r6);
+  r14 = __builtin_vec_mergeh (r3, r7);
+  r4 = __builtin_vec_mergeh (v2, r14);
+  v0 = __builtin_vec_mergeh (r0, r4);
+  union { unsigned char a[16]; A b; } temp;
+  vector signed char k;
+  h = __builtin_vec_ld (0, temp.a);
+  i = __builtin_vec_splat (h, 1);
+  temp.b.a = c;
+  k = __builtin_vec_ld (0, (signed char *) temp.a);
+  j = __builtin_vec_and (i, (vector char) k);
+  foo (v1, v0, j);
+  foo (v1, v5, j);
+}
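The process_address change replaces an unbounded `while` loop with a fixed iteration cap that ends in `fatal_insn`. Its control flow can be sketched in isolation as below; `step` stands in for `process_address_1` and throwing an exception stands in for the fatal error. Both are illustrative assumptions, not LRA code.

```cpp
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for process_address: call STEP until it reports
// that nothing changed, but give up after MAX_ITERS rounds instead of
// looping forever.  Returns true if any step made a change; with
// CHECK_ONLY, the first change is enough to answer "yes".
bool process_with_cap(const std::function<bool()>& step, bool check_only,
                      int max_iters = 10) {
  bool changed = false;
  for (int i = 0; i < max_iters; i++) {
    if (!step())
      return changed;  // converged: no further reload needed
    if (check_only)
      return true;
    changed = true;
  }
  // The LRA patch calls fatal_insn here; an exception plays that role.
  throw std::runtime_error("unable to reload address");
}
```

Per the patch's own comment, the cap of 10 is meant to be enough iterations to process all address parts, so hitting it indicates a bug rather than a slow convergence.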

Re: [PATCH] aarch64: Support unpacked SVE integer division

2025-07-11 Thread Remi Machet


On 7/11/25 08:21, Spencer Abson wrote:



This patch extends the existing patterns for SVE_INT_BINARY_SD to
support partial SVE integer modes, including those that implement the
conditional form.
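A rough illustration of where a partial ("unpacked") SVE mode can come from, assuming the vectorizer picks vector types per element size: in the loop below the division is done on 32-bit values but the results are stored as 64-bit values, so with SVE the 32-bit operands may be kept one per 64-bit container (a mode such as VNx2SI) rather than packed. Whether GCC vectorizes this particular loop that way depends on options and costs; it is an assumed example, not the div_2.c test from the patch, whose contents are not shown here.

```cpp
#include <cstdint>

// Hypothetical kernel: 32-bit signed division whose results are widened
// to 64 bits.  Built for SVE (e.g. -O3 -march=armv8.2-a+sve), this mix of
// element sizes is the kind of loop where an unpacked division may be
// used.  __restrict is a GCC/Clang extension; the scalar semantics are
// the same on any target (C++ division truncates toward zero).
void div_widen(std::int64_t *__restrict out, const std::int32_t *__restrict a,
               const std::int32_t *__restrict b, int n) {
  for (int i = 0; i < n; i++)
    out[i] = a[i] / b[i];
}
```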

gcc/ChangeLog:

* config/aarch64/aarch64-sve.md (3): Extend
to SVE_SDI_SIMD.
(@aarch64_pred_): Likewise.
(@cond_): Extend to SVE_SDI.
(*cond__2): Likewise.
(*cond__3): Likewise.
(*cond__any): Likewise.
* config/aarch64/iterators.md (SVE_SDI): New iterator for
all SVE vector modes with 32-bit or 64-bit elements.
(SVE_SDI_SIMD): New iterator.  As above, but including
V4SI and V2DI.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/sve/cond_arith_1.C: Rename TEST_SHIFT
to TEST_OP, add tests for SDIV and UDIV.
* g++.target/aarch64/sve/cond_arith_2.C: Likewise.
* g++.target/aarch64/sve/cond_arith_3.C: Likewise.
* g++.target/aarch64/sve/cond_arith_4.C: Likewise.
* gcc.target/aarch64/sve/div_2.c: New test.

---

Bootstrapped & regtested on aarch64-linux-gnu.  OK for master?

Thanks,
Spencer

---
 gcc/config/aarch64/aarch64-sve.md | 64 +--
 gcc/config/aarch64/iterators.md   |  7 ++
 .../g++.target/aarch64/sve/cond_arith_1.C | 25 +---
 .../g++.target/aarch64/sve/cond_arith_2.C | 25 +---
 .../g++.target/aarch64/sve/cond_arith_3.C | 27 +---
 .../g++.target/aarch64/sve/cond_arith_4.C | 27 +---
 gcc/testsuite/gcc.target/aarch64/sve/div_2.c  | 22 +++
 7 files changed, 127 insertions(+), 70 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/div_2.c

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 6b5113eb70f..871b31623bb 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -4712,12 +4712,12 @@
 ;; We can use it with Advanced SIMD modes to expose the V2DI and V4SI
 ;; optabs to the midend.
 (define_expand "3"
-  [(set (match_operand:SVE_FULL_SDI_SIMD 0 "register_operand")
-   (unspec:SVE_FULL_SDI_SIMD
+  [(set (match_operand:SVE_SDI_SIMD 0 "register_operand")
+   (unspec:SVE_SDI_SIMD
  [(match_dup 3)
-  (SVE_INT_BINARY_SD:SVE_FULL_SDI_SIMD
-(match_operand:SVE_FULL_SDI_SIMD 1 "register_operand")
-(match_operand:SVE_FULL_SDI_SIMD 2 "register_operand"))]
+  (SVE_INT_BINARY_SD:SVE_SDI_SIMD
+(match_operand:SVE_SDI_SIMD 1 "register_operand")
+(match_operand:SVE_SDI_SIMD 2 "register_operand"))]
  UNSPEC_PRED_X))]
   "TARGET_SVE"
   {
@@ -4727,12 +4727,12 @@

 ;; Integer division predicated with a PTRUE.
 (define_insn "@aarch64_pred_"
-  [(set (match_operand:SVE_FULL_SDI_SIMD 0 "register_operand")
-   (unspec:SVE_FULL_SDI_SIMD
+  [(set (match_operand:SVE_SDI_SIMD 0 "register_operand")
+   (unspec:SVE_SDI_SIMD
  [(match_operand: 1 "register_operand")
-  (SVE_INT_BINARY_SD:SVE_FULL_SDI_SIMD
-(match_operand:SVE_FULL_SDI_SIMD 2 "register_operand")
-(match_operand:SVE_FULL_SDI_SIMD 3 "register_operand"))]
+  (SVE_INT_BINARY_SD:SVE_SDI_SIMD
+(match_operand:SVE_SDI_SIMD 2 "register_operand")
+(match_operand:SVE_SDI_SIMD 3 "register_operand"))]
  UNSPEC_PRED_X))]
   "TARGET_SVE"
   {@ [ cons: =0 , 1   , 2 , 3 ; attrs: movprfx ]
@@ -4744,25 +4744,25 @@

 ;; Predicated integer division with merging.
 (define_expand "@cond_"
-  [(set (match_operand:SVE_FULL_SDI 0 "register_operand")
-   (unspec:SVE_FULL_SDI
+  [(set (match_operand:SVE_SDI 0 "register_operand")
+   (unspec:SVE_SDI
  [(match_operand: 1 "register_operand")
-  (SVE_INT_BINARY_SD:SVE_FULL_SDI
-(match_operand:SVE_FULL_SDI 2 "register_operand")
-(match_operand:SVE_FULL_SDI 3 "register_operand"))
-  (match_operand:SVE_FULL_SDI 4 "aarch64_simd_reg_or_zero")]
+  (SVE_INT_BINARY_SD:SVE_SDI
+(match_operand:SVE_SDI 2 "register_operand")
+(match_operand:SVE_SDI 3 "register_operand"))
+  (match_operand:SVE_SDI 4 "aarch64_simd_reg_or_zero")]
  UNSPEC_SEL))]
   "TARGET_SVE"
 )

 ;; Predicated integer division, merging with the first input.
 (define_insn "*cond__2"
-  [(set (match_operand:SVE_FULL_SDI 0 "register_operand")
-   (unspec:SVE_FULL_SDI
+  [(set (match_operand:SVE_SDI 0 "register_operand")
+   (unspec:SVE_SDI
  [(match_operand: 1 "register_operand")
-  (SVE_INT_BINARY_SD:SVE_FULL_SDI
-(match_operand:SVE_FULL_SDI 2 "register_operand")
-(match_operand:SVE_FULL_SDI 3 "register_operand"))
+  (SVE_INT_BINARY_SD:SVE_SDI
+(match_operand:SVE_SDI 2 "register_operand")
+(match_operand:SVE_SDI 3 "register_operand"))
   (match_dup 2)]
  UNSPEC_SEL))]
   "TARGET_SVE"
@@ -4774,1

Re: [PATCH] c++, v3: Implement C++26 P2786R13 - Trivial Relocatability [PR119064]

2025-07-11 Thread Jason Merrill


On 7/10/25 6:34 PM, Jakub Jelinek wrote:
> On Thu, Jul 10, 2025 at 05:46:06PM -0400, Jason Merrill wrote:
> > > +  bool trivially_relocatable_if_eligible : 1;
> > > +  bool replaceable_if_eligible : 1;
> > > +
> > > +  bool trivially_relocatable : 1;
> > > +  bool trivially_relocatable_computed : 1;
> > > +  bool replaceable : 1;
> > > +  bool replaceable_computed : 1;
> >
> > I wonder if we can get away with two bits per property?  I don't think
> > there's a way to query whether the "if_eligible" appeared on the type, only
> > whether it is in fact e.g. replaceable.
> >
> > So if replaceable is set and replaceable_computed is not, it means
> > _if_eligible was specified?
>
> Good idea.  Is it enough like done in the following updated patch, or shall
> there be also *_IF_ELIGIBLE macros that would check both bits and macros to
> set it etc.?


I think maybe just change e.g. CLASSTYPE_TRIVIALLY_RELOCATABLE to 
CLASSTYPE_TRIVIALLY_RELOCATABLE_BIT to avoid confusion.  OK with that tweak.


Jason
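Jason's suggestion folds the `_if_eligible` flag into the value bit, giving two bits per property instead of three. A minimal sketch of that encoding, with hypothetical names (GCC keeps these bits in its class-type data behind CLASSTYPE_* macros, not in a struct like this one):

```cpp
#include <functional>

// Hypothetical two-bit encoding of one class property (for example
// "replaceable").  While COMPUTED is clear, VALUE means "the
// *_if_eligible specifier was written on the class"; once COMPUTED is
// set, VALUE is the final answer and that information is gone.
struct PropertyBits {
  bool value : 1;
  bool computed : 1;
};

// Parser side: the class-property-specifier was seen.
void note_if_eligible(PropertyBits& p) { p.value = true; }

bool if_eligible_specified(const PropertyBits& p) {
  return p.value && !p.computed;
}

// Lazily compute the property.  COMPUTE receives whether the specifier
// was written and returns the final answer; it runs at most once.
bool get_property(PropertyBits& p, const std::function<bool(bool)>& compute) {
  if (!p.computed) {
    p.value = compute(p.value);
    p.computed = true;
  }
  return p.value;
}
```

The trade-off is the one Jason points out: once the property has been computed, whether the specifier was written can no longer be recovered, which is acceptable because nothing needs to query it.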

Re: [PATCH] aarch64: PR target/120999: Avoid movprfx for NBSL implementation of NOR

2025-07-11 Thread Kyrylo Tkachov



> On 11 Jul 2025, at 16:48, Richard Sandiford  wrote:
> 
> Kyrylo Tkachov  writes:
>>> On 10 Jul 2025, at 11:12, Kyrylo Tkachov  wrote:
>>> 
>>> 
>>> 
>>>> On 10 Jul 2025, at 10:40, Richard Sandiford  wrote:
>>>>
>>>> Kyrylo Tkachov  writes:
>>>>> Hi all,
>>>>>
>>>>> While the SVE2 NBSL instruction accepts MOVPRFX to add more flexibility
>>>>> due to its tied operands, the destination of the movprfx cannot be also
>>>>> a source operand. But the offending pattern in aarch64-sve2.md tries
>>>>> to do exactly that for the "=?&w,w,w" alternative and gas warns for the
>>>>> attached testcase.
>>>>>
>>>>> This patch just removes that alternative causing RA to emit a normal extra
>>>>> move.
>>>>> So for the testcase in the patch we now generate:
>>>>> nor_z:
>>>>> nbsl z1.d, z1.d, z2.d, z1.d
>>>>> mov z0.d, z1.d
>>>>> ret
>>>>>
>>>>> instead of the previous:
>>>>> nor_z:
>>>>> movprfx z0, z1
>>>>> nbsl z0.d, z0.d, z2.d, z0.d
>>>>> ret
>>>>>
>>>>> which generated a gas warning.
>>>>
>>>> Shouldn't we instead change it to:
>>>>
>>>>    [ ?&w  , w  , w ; yes] movprfx\t%0, %1\;nbsl\t%0.d, %0.d, %2.d, %1.d
>>>>
>>>> ?  The "&" ensures that %1 is still valid in the NBSL.
>>>>
>>>> (That's OK if it works.)
>>> 
>>> Yes, that seems to work, thanks.
>>> I’ll push this version after some more testing.
>>> 
>> 
>> Shall I backport this for GCC 15.2 as well?
>> The test case uses C operators which were enabled in GCC 15, though I 
>> suppose one could construct a pure ACLE intrinsics testcase too.
> 
> Sounds good to me.  It's fixing wrong code, even if the gas warning
> makes it somewhat noisy wrong code.
> 

Looks like there’s a simple merge conflict due to trunk also having
http://gcc.gnu.org/g:f260146bc05f6fba7b2a67a62063c770588b769d
Author: Richard Earnshaw 
Date:   Mon Apr 14 16:41:16 2025 +0100

aarch64: Fix up commutative and early-clobber markers on compact inns

I’d like to backport that commit as well, as it looks like a low-risk cleanup.
Both commits bootstrap and test cleanly on the branch.
Ok?
Thanks,
Kyrill

> Thanks,
> Richard
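To see why both sequences compute NOR: as I understand the Arm description, BSL takes bits from its first data operand where the select operand (the last one) is 1 and from its second data operand where it is 0, and NBSL inverts that result. With the select operand equal to the first data operand this reduces to `~(a | b)`. The scalar model below (plain C++ on one 64-bit lane, not ACLE intrinsics; the function names are made up) checks the identity for the tied form and for the movprfx form Richard suggested.

```cpp
#include <cstdint>

// Scalar model of one 64-bit lane of SVE2 NBSL Zdn, Zdn, Zm, Zk:
// bitwise select (Zdn where Zk is 1, Zm where Zk is 0), then inverted.
std::uint64_t nbsl(std::uint64_t zdn, std::uint64_t zm, std::uint64_t zk) {
  return ~((zdn & zk) | (zm & ~zk));
}

// Tied form from the new output: nbsl z1.d, z1.d, z2.d, z1.d.
std::uint64_t nor_tied(std::uint64_t z1, std::uint64_t z2) {
  return nbsl(z1, z2, z1);
}

// Suggested form: movprfx z0, z1; nbsl z0.d, z0.d, z2.d, z1.d.
// The early clobber ("&") keeps z0 distinct from z1, so z1 is still
// intact when NBSL reads it as the select operand.
std::uint64_t nor_movprfx(std::uint64_t z1, std::uint64_t z2) {
  std::uint64_t z0 = z1;  // movprfx z0, z1
  return nbsl(z0, z2, z1);
}
```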

Re: [RFC v2] c++: Quoting in -fmodules-mapper [PR110153]

2025-07-11 Thread Jason Merrill


On 7/10/25 4:41 PM, Nicolas Werner wrote:

> Users might be using a space in their build directory path. To allow
> specifying such a root for the module mapper started by GCC, we need the
> command to allow quotes. Previously quoting a path passed to the module
> mapper was not possible, so replace the custom argv parsing with
> the argv parsing logic from libiberty, which supports fairly standard
> shell quoting using single and double quotes.
>
> This also should fix PR110153, although my intention was really to fix
> passing parameters to the "--root" parameter.
>
> I don't know how to best add a test with this yet, since I am unsure
> about how to best deal with spaces in test folders.


Can you be more specific?


> -  if (!arg_no)
> -   {
> - /* @name means look in the compiler's install dir.  */
> - if (ptr[0] == '@')
> -   ptr++;
> - else
> -   full_program_name = nullptr;
> -   }
> -
> -  argv[arg_no++] = ptr;
> -  while (*ptr && *ptr != ' ')
> -   ptr++;
> -  if (!*ptr)
> -   break;
> -  *ptr = 0;
> -}
> +  // Split mapper argument into parameters
> +  char** original_argv = buildargv (name.c_str () + 1);
> +  int arg_no = countargv (original_argv);
> +  char **argv = new char *[arg_no + 1];


Can we drop original_argv so argv is the result of buildargv without 
copying?



> +  for (int i = 0; i < arg_no; i++)
> +argv[i] = original_argv[i];
> +
> +  if (arg_no && argv[0][0] == '@')
> +argv[0] = argv[0] + 1;


Let's keep the comment from the old code.


> @@ -108,8 +89,8 @@ spawn_mapper_program (char const **errmsg, std::string
> &name,


Word wrap in your mail client corrupted the patch here; it may be easier 
to attach it to avoid that.


Jason
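To show the effect the patch is after, here is a simplified stand-in for libiberty's buildargv followed by the '@' handling from the old code. It is not the libiberty implementation and its rules may differ in details; `split_mapper_args` and `strip_install_dir_marker` are hypothetical names, and the command line in the usage is only an example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Simplified shell-style splitter: whitespace separates arguments,
// '...' and "..." group characters (the quotes themselves are dropped),
// and a backslash makes the next character literal.
std::vector<std::string> split_mapper_args(const std::string& s) {
  std::vector<std::string> args;
  std::string cur;
  bool in_arg = false;
  char quote = 0;  // the currently open quote character, or 0
  for (std::size_t i = 0; i < s.size(); i++) {
    char c = s[i];
    if (c == '\\' && i + 1 < s.size()) {
      cur += s[++i];
      in_arg = true;
    } else if (quote) {
      if (c == quote)
        quote = 0;
      else
        cur += c;
    } else if (std::isspace(static_cast<unsigned char>(c))) {
      if (in_arg) {
        args.push_back(cur);
        cur.clear();
        in_arg = false;
      }
    } else {
      in_arg = true;
      if (c == '\'' || c == '"')
        quote = c;
      else
        cur += c;
    }
  }
  if (in_arg)
    args.push_back(cur);
  return args;
}

// As in the old code: "@name means look in the compiler's install dir".
// Strip the '@' and report whether it was present.
bool strip_install_dir_marker(std::vector<std::string>& args) {
  if (!args.empty() && !args[0].empty() && args[0][0] == '@') {
    args[0].erase(0, 1);
    return true;
  }
  return false;
}
```

With this splitting, `@g++-mapper-server --root "/tmp/my build"` yields three arguments, the last one being the path with its space intact.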

Re: [PATCH] c, c++: Extend -Wunused-but-set-* warnings [PR44677]

2025-07-11 Thread Jakub Jelinek

On Fri, Jul 11, 2025 at 12:26:54PM -0400, Jason Merrill wrote:
> On 7/11/25 9:09 AM, Jakub Jelinek wrote:
> > On Thu, Jul 10, 2025 at 04:35:49PM -0400, Jason Merrill wrote:
> > > > --- gcc/cp/cp-gimplify.cc.jj2025-04-12 21:41:42.660924514 +0200
> > > > +++ gcc/cp/cp-gimplify.cc   2025-04-23 21:33:19.050931604 +0200
> > > > @@ -3200,7 +3200,23 @@ cp_fold (tree x, fold_flags_t flags)
> > > >  loc = EXPR_LOCATION (x);
> > > >  op0 = cp_fold_maybe_rvalue (TREE_OPERAND (x, 0), rval_ops, flags);
> > > > +  bool clear_decl_read;
> > > > +  clear_decl_read = false;
> > > > +  if (code == MODIFY_EXPR
> > > > + && (VAR_P (op0) || TREE_CODE (op0) == PARM_DECL)
> > > > + && !DECL_READ_P (op0)
> > > > + && (VAR_P (op0) ? warn_unused_but_set_variable
> > > > + : warn_unused_but_set_parameter) > 2
> > > > + && BINARY_CLASS_P (TREE_OPERAND (x, 1))
> > > > + && TREE_OPERAND (TREE_OPERAND (x, 1), 0) == op0)
> > > > +   {
> > > > + mark_exp_read (TREE_OPERAND (TREE_OPERAND (x, 1), 1));
> > > > + if (!DECL_READ_P (op0))
> > > > +   clear_decl_read = true;
> > > > +   }
> > > >  op1 = cp_fold_rvalue (TREE_OPERAND (x, 1), flags);
> > > > +  if (clear_decl_read)
> > > > +   DECL_READ_P (op0) = 0;
> > > 
> > > Why does this need to happen in cp_fold?  Weren't the flags set properly at
> > > build time?
> > 
> > Without the cp-gimplify.cc (cp_fold) hunk there are tons of FAILs, with
> 
> Then could we do something simpler at this point, just preserving the
> pre-folding state of DECL_READ_P (op0) without looking at the RHS?

That would be wrong.
The goal of the patch is to just add some extra exceptions which aren't
counted as uses.
If a MODIFY_EXPR is say
op0 = foo (op0);
then we shouldn't clear DECL_READ_P, the call does or could read the value
and use it for something, so warning that op0 is just set but not used would
be false positive.
The intent of the extension is to handle
op0 @= expr;
where expr doesn't refer to op0, or some cases of
op0 = op0 @ expr;
Still, if it is e.g.
op0 @= foo (op0);
we want to have DECL_READ_P set.
That is why, when the code above sees op0 = op0 @ expr, it calls
mark_exp_read on expr, and if that doesn't set DECL_READ_P (op0), it
clears the flag again after cp_fold_rvalue has run on the whole RHS.
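The intended classification can be made concrete with a toy model. This is not GCC code: the Expr tree, the helper names and the treatment of `op0 @= expr` as `op0 = op0 @ expr` are assumptions made for illustration. It encodes only the rule stated above: `op0 = op0 @ expr` counts as a read of op0 only if `expr` mentions op0, while any other right-hand side counts as a read whenever it mentions op0.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical miniature expression tree, just enough for the model.
struct Expr {
  enum Kind { Var, Const, Binary, Call };
  Kind kind;
  std::string name;                        // variable name (Var only)
  std::vector<std::shared_ptr<Expr>> ops;  // operands or call arguments
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr var(const std::string& n) { return std::make_shared<Expr>(Expr{Expr::Var, n, {}}); }
ExprPtr constant() { return std::make_shared<Expr>(Expr{Expr::Const, "", {}}); }
ExprPtr binary(ExprPtr l, ExprPtr r) { return std::make_shared<Expr>(Expr{Expr::Binary, "", {l, r}}); }
ExprPtr call(std::vector<ExprPtr> args) { return std::make_shared<Expr>(Expr{Expr::Call, "", args}); }

// True if E mentions variable V anywhere (a full recursive walk).
bool mentions(const ExprPtr& e, const std::string& v) {
  if (e->kind == Expr::Var)
    return e->name == v;
  for (const ExprPtr& op : e->ops)
    if (mentions(op, v))
      return true;
  return false;
}

// Model of the stated intent: does "lhs = rhs" count as a read of LHS?
bool assignment_reads_lhs(const std::string& lhs, const ExprPtr& rhs) {
  // "lhs = lhs @ rest" (and "lhs @= rest"): only REST matters.
  if (rhs->kind == Expr::Binary && rhs->ops[0]->kind == Expr::Var
      && rhs->ops[0]->name == lhs)
    return mentions(rhs->ops[1], lhs);
  // Anything else, e.g. "lhs = foo (lhs)", reads LHS if it mentions it.
  return mentions(rhs, lhs);
}
```

The model walks `expr` recursively; whether the real implementation behaves the same for uses nested inside calls is part of what the review is probing, since mark_exp_read does not recurse.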

> > > > @@ -3740,7 +3740,28 @@ finish_unary_op_expr (location_t op_loc,
> > > >  tree expr_ovl = expr;
> > > >  if (!processing_template_decl)
> > > > -expr_ovl = cp_fully_fold (expr_ovl);
> > > > +switch (code)
> > > > +  {
> > > > +  case PREINCREMENT_EXPR:
> > > > +  case PREDECREMENT_EXPR:
> > > > +  case POSTINCREMENT_EXPR:
> > > > +  case POSTDECREMENT_EXPR:
> > > > +   tree stripped_expr;
> > > > +   stripped_expr = tree_strip_any_location_wrapper (expr);
> > > > +   if ((VAR_P (stripped_expr) || TREE_CODE (stripped_expr) == PARM_DECL)
> > > > +   && !DECL_READ_P (stripped_expr)
> > > > +   && (VAR_P (stripped_expr) ? warn_unused_but_set_variable
> > > > + : warn_unused_but_set_parameter) > 1)
> > > > + {
> > > > +   expr_ovl = cp_fully_fold (expr_ovl);
> > > 
> > > Again I wonder why cp_fold is setting DECL_READ_P.
> > 
> > See above.
> 
> Ah, the problem here is that cp_fully_fold assumes that we're using its
> operand as an rvalue, which is wrong for ++/--.  And we aren't going to get
> a constant result anyway, so finish_unary_op_expr shouldn't call
> cp_fully_fold, it should return result early for lvalue codes.

Well, I'm not sure it is actually an error.  finish_unary_op_expr doesn't
use cp_fully_fold result as an operand of {PRE,POST}{INC,DEC}REMENT_EXPR
(that would be wrong, we don't want the operand to fold into non-lvalue), it
is called only to find out if overflow warning should be emitted.  And I
think these ops act as both lvalue and rvalue, they read the old value,
increment/decrement it and store the new value.  The patch just ensures that
we don't consider it a DECL_READ_P in some cases.

The patch was written by finding, for each of the exception cases, what
sets DECL_READ_P even though I'd like it not to be set, and tweaking those
spots so that it is not set.

Jakub

Re: [PATCH] testsuite: Disable musttail tests if target uses SJLJ exceptions

2025-07-11 Thread Andrew Pinski

On Fri, Jul 11, 2025 at 9:59 AM Andi Kleen  wrote:
>
> Dimitar Dimitrov  writes:
>
> > A few tests started failing recently on pru-unknown-elf because it uses
> > SJLJ implementation for exceptions:
> >   FAIL: g++.dg/ext/musttail3.C  -std=c++11 (test for excess errors)
> >   .../gcc/gcc/testsuite/g++.dg/ext/musttail3.C:12:34: error: cannot tail-call: caller uses sjlj exceptions
> >
> > Fix by disabling those tests if target uses SJLJ for implementing
> > exceptions.
> >
> > Ensured that test results with and without this patch for
> > x86_64-pc-linux-gnu are the same.
> >
> > Ok for trunk?
>
> I would rather make it XFAIL and also open a PR, after all it is a
> limitation that could (and should) be fixed.
>
> As more and more software uses musttail it will hurt those targets
> eventually.

Actually with SJLJ exceptions there is nothing that can be done here.
```
  /* If we are using sjlj exceptions, we may need to add a call to
     _Unwind_SjLj_Unregister at exit of the function.  Which means
     that we cannot do any sibcall transformations.  */
  if (targetm_common.except_unwind_info (&global_options) == UI_SJLJ
      && current_function_has_exception_handlers ())
    {
      maybe_error_musttail (call, _("caller uses sjlj exceptions"),
                            diag_musttail);
      return false;
    }
```

Well, maybe one could swap the _Unwind_SjLj_Unregister call and the
sibcall.  But I am not sure it really matters that much for these targets,
since the overhead of SJLJ exceptions is huge even when no throw happens.
Yes, SJLJ has a smaller code footprint, but the speed cost is not worth
it.

So implementing this reordering for SJLJ targets is not worth the cost.

Thanks,
Andrew

>
> -Andi
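An EH-free way to see the constraint in the quoted code: if work has to happen on the function's exit path after the callee returns, then `return callee (x);` is not really in tail position and cannot become a sibcall. Below, a destructor stands in for the unregister call; this is an analogy, not how SJLJ is implemented, and all names are hypothetical. Andrew's "swap" idea would correspond to doing that exit work before the call instead.

```cpp
#include <string>
#include <vector>

std::vector<std::string> event_log;

// Stand-in for the SJLJ function context: registered on entry and
// unregistered on every exit (compare _Unwind_SjLj_Unregister).
struct SjljRegistration {
  SjljRegistration() { event_log.push_back("register"); }
  ~SjljRegistration() { event_log.push_back("unregister"); }
};

int callee(int x) {
  event_log.push_back("callee");
  return x + 1;
}

// Looks like a tail call, but the unregistration still runs after
// callee returns, so the call is not the last thing the caller does.
int caller(int x) {
  SjljRegistration reg;
  return callee(x);
}
```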

Re: [PATCH] aarch64: PR target/120999: Avoid movprfx for NBSL implementation of NOR

2025-07-11 Thread Richard Sandiford

Kyrylo Tkachov  writes:
>> On 10 Jul 2025, at 11:12, Kyrylo Tkachov  wrote:
>> 
>> 
>> 
>>> On 10 Jul 2025, at 10:40, Richard Sandiford  wrote:
>>> 
>>> Kyrylo Tkachov  writes:
>>>> Hi all,
>>>>
>>>> While the SVE2 NBSL instruction accepts MOVPRFX to add more flexibility
>>>> due to its tied operands, the destination of the movprfx cannot be also
>>>> a source operand. But the offending pattern in aarch64-sve2.md tries
>>>> to do exactly that for the "=?&w,w,w" alternative and gas warns for the
>>>> attached testcase.
>>>>
>>>> This patch just removes that alternative causing RA to emit a normal extra
>>>> move.
>>>> So for the testcase in the patch we now generate:
>>>> nor_z:
>>>> nbsl z1.d, z1.d, z2.d, z1.d
>>>> mov z0.d, z1.d
>>>> ret
>>>>
>>>> instead of the previous:
>>>> nor_z:
>>>> movprfx z0, z1
>>>> nbsl z0.d, z0.d, z2.d, z0.d
>>>> ret
>>>>
>>>> which generated a gas warning.
>>> 
>>> Shouldn't we instead change it to:
>>> 
>>>    [ ?&w  , w  , w ; yes] movprfx\t%0, %1\;nbsl\t%0.d, %0.d, %2.d, %1.d
>>> 
>>> ?  The "&" ensures that %1 is still valid in the NBSL.
>>> 
>>> (That's OK if it works.)
>> 
>> Yes, that seems to work, thanks.
>> I’ll push this version after some more testing.
>> 
>
> Shall I backport this for GCC 15.2 as well?
> The test case uses C operators which were enabled in GCC 15, though I suppose 
> one could construct a pure ACLE intrinsics testcase too.

Sounds good to me.  It's fixing wrong code, even if the gas warning
makes it somewhat noisy wrong code.

Thanks,
Richard

Re: [PATCH] MicroBlaze : Enhance support for atomics. Fix PR118280

2025-07-11 Thread Michael Eager


On 7/10/25 4:41 AM, Gopi Kumar Bulusu wrote:


> namaskaaram (greetings)


Hi Gopi!



> Please find the patch attached. This addresses a regression for MicroBlaze
> (PR118280).


Neal Frager posted a different patch (or an RFC) to address pr118280 on
7/1/25:  https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg376017.html

The patches are different.  Is your patch a replacement for Neal's?
Can you either reconcile the differences or tell me which patch is
correct or better?

You might also update the PR with this patch and a comment.
pr118280 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118280


Atomic support enhanced to fix existing atomic_compare_and_swapsi pattern
to handle side effects; new patterns atomic_fetch_op and atomic_test_and_set
added. As MicroBlaze has no QImode test/set instruction, use shift magic
to implement atomic_test_and_set. Make -mxl-barrel-shift the default to keep
the default atomics code tight.


Include the PR which is resolved in the patch comments.

I'm not quite sure what you mean by "keep the default atomics code tight".

Neal suggested making -mxl-barrel-shift default for linux, but not
default for bare-metal.  This might be better for backward
compatibility, but depends on whether there are any MB configurations
which do not include a barrel shifter.  If someone does have a MB config
without a barrel shifter and tries to recompile after this patch, it's
possible that invalid code would be silently generated.
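The following illustrates the general "shift magic" idea. It is not the patch's actual RTL. It assumes 32-bit words, with byte 0 as the least significant byte. A byte-sized test-and-set can be emulated with a word-sized compare-and-swap: locate the byte with a shift and mask, then retry the CAS until it succeeds. The shift amount depends on the byte's address at run time, which is presumably where a barrel shifter keeps the sequence short. subword_test_and_set is a hypothetical name; the __atomic builtins are GCC's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: atomically set byte BYTE_INDEX (0..3) of *WORD to 1
   and return whether that byte was already nonzero.  */
static bool
subword_test_and_set (uint32_t *word, unsigned byte_index)
{
  unsigned shift = 8 * (byte_index & 3);     /* position of the byte */
  uint32_t mask = (uint32_t) 0xff << shift;  /* selects just that byte */
  uint32_t old = __atomic_load_n (word, __ATOMIC_RELAXED);
  uint32_t desired;

  /* On failure the CAS refreshes OLD, so DESIRED is recomputed from the
     current word; the other three bytes are never disturbed.  */
  do
    desired = (old & ~mask) | ((uint32_t) 1 << shift);
  while (!__atomic_compare_exchange_n (word, &old, desired, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_RELAXED));
  return (old & mask) != 0;
}
```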



Files Changed

* gcc/config/microblaze/iterators.md: New
* microblaze-protos.h/microblaze.cc : Add microblaze_subword_address
* gcc/config/microblaze/microblaze.md: constants: Add UNSPECV_CAS_BOOL,
   UNSPECV_CAS_MEM, UNSPECV_CAS_VAL, UNSPECV_ATOMIC_FETCH_OP
   type: add atomic
* gcc/config/microblaze/microblaze.h: TARGET_DEFAULT : Add MASK_BARREL_SHIFT
* gcc/config/microblaze/sync.md: Add atomic_fetch_si
   atomic_test_and_set

Target Checked
microblazeel-amd-linux

Testing

deja-g++

                 === g++ Summary ===

# of expected passes            237906
# of unexpected failures        4165
# of unexpected successes       3
# of expected failures          2180
# of unresolved testcases       645
# of unsupported tests          2658


deja-libstdcpp

                 === libstdc++ Summary ===

# of expected passes            18180
# of unexpected failures        311
# of expected failures          133
# of unresolved testcases       18
# of unsupported tests          853

Includes Test case 29_atomics/atomic_flag/clear/1.cc (which checks for 
atomic_test_and_set)


I don't have a baseline to compare these test results with, or test
results before applying this patch, so the results don't have any
meaning to me.  Was the test case 29 failing before applying the patch
and succeeding after?  Were there any other differences?

Neal indicated that patch was tested using buildroot with
target=microblazeel-buildroot-linux-gnu.  It looks like you have
target=microblazeel-amd-linux.  How are you running the test suite?


--
Michael Eager

[PATCH] aarch64: Tweak handling of general SVE permutes [PR121027]

2025-07-11 Thread Richard Sandiford

This PR is partly about a code quality regression that was triggered
by g:caa7a99a052929d5970677c5b639e1fa5166e334.  That patch taught the
gimple optimisers to fold two VEC_PERM_EXPRs into one, conditional
upon either (a) the original permutations not being "native" operations
or (b) the combined permutation being a "native" operation.

Whether something is a "native" operation is tested by calling
can_vec_perm_const_p with allow_variable_p set to false.  This requires
the permutation to be supported directly by TARGET_VECTORIZE_VEC_PERM_CONST,
rather than falling back to the general vec_perm optab.

This exposed a problem with the way that we handled general 2-input
permutations for SVE.  Unlike Advanced SIMD, base SVE does not have
an instruction to do general 2-input permutations.  We do still implement
the vec_perm optab for SVE, but only when the vector length is known at
compile time.  The general expansion is pretty expensive: an AND, a SUB,
two TBLs, and an ORR.  It certainly couldn't be considered a "native"
operation.
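A scalar C model may help picture that general sequence. It assumes TBL-style lookups that return 0 for out-of-range indices, and a power-of-two element count n. The names are hypothetical, and this is not the aarch64 expander itself. The sequence works as follows:

- The selector is ANDed into [0, 2n).
- One lookup uses the index directly on the first vector.
- A second lookup uses the index minus n on the second vector.
- The two results are ORed.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a TBL lookup: out-of-range indices yield 0.  */
static uint8_t
tbl_lookup (const uint8_t *table, size_t n, size_t idx)
{
  return idx < n ? table[idx] : 0;
}

/* Model of the AND, TBL, SUB, TBL, ORR expansion for N-element vectors
   (N assumed to be a power of two).  */
static void
two_input_perm (uint8_t *out, const uint8_t *a, const uint8_t *b,
                const uint8_t *sel, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      size_t idx = sel[i] & (2 * n - 1);           /* AND */
      uint8_t from_a = tbl_lookup (a, n, idx);     /* first TBL */
      /* idx - n wraps to a huge value when idx < n, so this lookup
         yields 0 unless the element comes from B.  */
      uint8_t from_b = tbl_lookup (b, n, idx - n); /* SUB + second TBL */
      out[i] = from_a | from_b;                    /* ORR */
    }
}
```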

However, if a VEC_PERM_EXPR has a constant selector, the indices can
be wider than the elements being permuted.  This is not true for the
vec_perm optab, where the indices and permuted elements must have the
same precision.

This leads to one case where we cannot leave a general 2-input permutation
to be handled by the vec_perm optab: when permuting bytes on a target
with 2048-bit vectors.  In that case, the indices of the elements in
the second vector are in the range [256, 511], which cannot be stored
in a byte index.
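The arithmetic can be written out as a small check. This is a hypothetical helper, not GCC's selector_fits_mode_p, whose implementation is not shown in this thread. A 2-input permutation of n-element vectors needs indices up to 2n - 1. For 2048-bit byte vectors, n = 256, so the largest index is 511, which needs 9 bits.

```c
#include <limits.h>
#include <stdbool.h>

/* Can an ELEM_BITS-wide element hold every index in [0, 2*NELTS - 1]?  */
static bool
two_input_selector_fits (unsigned long nelts, unsigned elem_bits)
{
  unsigned long max_index = 2 * nelts - 1;
  /* Avoid an undefined shift by the full width of unsigned long.  */
  if (elem_bits >= sizeof (unsigned long) * CHAR_BIT)
    return true;
  return (max_index >> elem_bits) == 0;
}
```

Only the 2048-bit byte case fails the check. For example, 2048-bit vectors of 16-bit elements have 128 elements, and indices up to 255 fit easily.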

TARGET_VECTORIZE_VEC_PERM_CONST therefore has to handle 2-input SVE
permutations for one specific case.  Rather than check for that
specific case, the code went ahead and used the vec_perm expansion
whenever it worked.  But that undermines the !allow_variable_p
handling in can_vec_perm_const_p; it becomes impossible for
target-independent code to distinguish "native" operations from
the worst-case fallback.

This patch instead limits TARGET_VECTORIZE_VEC_PERM_CONST to the
cases that it has to handle.  It fixes the PR for all vector lengths
except 2048 bits.

A better fix would be to introduce some sort of costing mechanism,
which would allow us to reject the new VEC_PERM_EXPR even for
2048-bit targets.  But that would be a significant amount of work
and would not be backportable.

Tested on aarch64-linux-gnu.  OK to install?

Richard


gcc/
PR target/121027
* config/aarch64/aarch64.cc (aarch64_evpc_sve_tbl): Punt on 2-input
operations that can be handled by vec_perm.

gcc/testsuite/
PR target/121027
* gcc.target/aarch64/sve/acle/general/perm_1.c: New test.
---
 gcc/config/aarch64/aarch64.cc | 21 ++-
 .../aarch64/sve/acle/general/perm_1.c | 14 +
 2 files changed, 30 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_1.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 10b8ed5d387..6e16763f957 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -26960,12 +26960,23 @@ aarch64_evpc_tbl (struct expand_vec_perm_d *d)
 static bool
 aarch64_evpc_sve_tbl (struct expand_vec_perm_d *d)
 {
-  unsigned HOST_WIDE_INT nelt;
+  if (!d->one_vector_p)
+    {
+      /* aarch64_expand_sve_vec_perm does not yet handle variable-length
+         vectors.  */
+      if (!d->perm.length ().is_constant ())
+        return false;
 
-  /* Permuting two variable-length vectors could overflow the
-     index range.  */
-  if (!d->one_vector_p && !d->perm.length ().is_constant (&nelt))
-    return false;
+      /* This permutation reduces to the vec_perm optab if the elements are
+         large enough to hold all selector indices.  Do not handle that case
+         here, since the general TBL+SUB+TBL+ORR sequence is too expensive to
+         be considered a "native" constant permutation.
+
+         Not doing this would undermine code that queries can_vec_perm_const_p
+         with allow_variable_p set to false.  See PR121027.  */
+      if (selector_fits_mode_p (d->vmode, d->perm))
+        return false;
+    }
 
   if (d->testing_p)
 return true;
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_1.c
new file mode 100644
index 000..6b920b8b681
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_1.c
@@ -0,0 +1,14 @@
+/* { dg-options "-O2 -msve-vector-bits=256" } */
+
+#include 
+typedef svbfloat16_t vls_bfloat16_t __attribute__((arm_sve_vector_bits(32 * 8)));
+svbfloat16_t foo(vls_bfloat16_t a, vls_bfloat16_t b)
+{
+  svbfloat16_t zero = svreinterpret_bf16_f32 (svdup_n_f32 (0.0f));
+  return svzip2_bf16(zero, svuzp1_bf16(a,b));
+}
+
+
+/* { dg-final { scan-assembler-times {\tuzp1\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tzip2\t} 1 } } */
+/* { dg-final { scan-assembler-n

Re: [PATCH] c, c++: Extend -Wunused-but-set-* warnings [PR44677]

2025-07-11 Thread Jason Merrill


On 7/11/25 9:09 AM, Jakub Jelinek wrote:

On Thu, Jul 10, 2025 at 04:35:49PM -0400, Jason Merrill wrote:

--- gcc/cp/cp-gimplify.cc.jj2025-04-12 21:41:42.660924514 +0200
+++ gcc/cp/cp-gimplify.cc   2025-04-23 21:33:19.050931604 +0200
@@ -3200,7 +3200,23 @@ cp_fold (tree x, fold_flags_t flags)
 loc = EXPR_LOCATION (x);
 op0 = cp_fold_maybe_rvalue (TREE_OPERAND (x, 0), rval_ops, flags);
+  bool clear_decl_read;
+  clear_decl_read = false;
+  if (code == MODIFY_EXPR
+ && (VAR_P (op0) || TREE_CODE (op0) == PARM_DECL)
+ && !DECL_READ_P (op0)
+ && (VAR_P (op0) ? warn_unused_but_set_variable
+ : warn_unused_but_set_parameter) > 2
+ && BINARY_CLASS_P (TREE_OPERAND (x, 1))
+ && TREE_OPERAND (TREE_OPERAND (x, 1), 0) == op0)
+   {
+ mark_exp_read (TREE_OPERAND (TREE_OPERAND (x, 1), 1));
+ if (!DECL_READ_P (op0))
+   clear_decl_read = true;
+   }
 op1 = cp_fold_rvalue (TREE_OPERAND (x, 1), flags);
+  if (clear_decl_read)
+   DECL_READ_P (op0) = 0;


Why does this need to happen in cp_fold?  Weren't the flags set properly at
build time?


Without the cp-gimplify.cc (cp_fold) hunk there are tons of FAILs, with


Then could we do something simpler at this point, just preserving the 
pre-folding state of DECL_READ_P (op0) without looking at the RHS?



GXX_TESTSUITE_STDS=17 make check-g++ RUNTESTFLAGS="dg.exp='Wunused-var* Wunused-parm* name-independent-decl1.C unused-9.c memchr-3.c'"
(just 17 so that the same FAILs aren't repeated for each -std version):
FAIL: g++.dg/warn/Wunused-parm-12.C  -std=gnu++17  (test for warnings, line 14)
FAIL: g++.dg/warn/Wunused-parm-12.C  -std=gnu++17  (test for warnings, line 15)
FAIL: g++.dg/warn/Wunused-parm-12.C  -std=gnu++17  (test for warnings, line 16)
FAIL: g++.dg/warn/Wunused-parm-12.C  -std=gnu++17  (test for warnings, line 17)
FAIL: g++.dg/warn/Wunused-parm-12.C  -std=gnu++17  (test for warnings, line 18)
FAIL: g++.dg/warn/Wunused-parm-12.C  -std=gnu++17  (test for warnings, line 19)
FAIL: g++.dg/warn/Wunused-parm-12.C  -std=gnu++17  (test for warnings, line 20)
FAIL: g++.dg/warn/Wunused-parm-13.C  -std=gnu++17  (test for warnings, line 14)
FAIL: g++.dg/warn/Wunused-parm-13.C  -std=gnu++17  (test for warnings, line 15)
FAIL: g++.dg/warn/Wunused-parm-13.C  -std=gnu++17  (test for warnings, line 16)
FAIL: g++.dg/warn/Wunused-parm-13.C  -std=gnu++17  (test for warnings, line 17)
FAIL: g++.dg/warn/Wunused-parm-13.C  -std=gnu++17  (test for warnings, line 18)
FAIL: g++.dg/warn/Wunused-parm-13.C  -std=gnu++17  (test for warnings, line 19)
FAIL: g++.dg/warn/Wunused-parm-13.C  -std=gnu++17  (test for warnings, line 20)
FAIL: c-c++-common/Wunused-parm-1.c  -std=gnu++17  (test for warnings, line 13)
FAIL: c-c++-common/Wunused-parm-1.c  -std=gnu++17  (test for warnings, line 14)
FAIL: c-c++-common/Wunused-parm-1.c  -std=gnu++17  (test for warnings, line 15)
FAIL: c-c++-common/Wunused-parm-1.c  -std=gnu++17  (test for warnings, line 16)
FAIL: c-c++-common/Wunused-parm-1.c  -std=gnu++17  (test for warnings, line 17)
FAIL: c-c++-common/Wunused-parm-1.c  -std=gnu++17  (test for warnings, line 18)
FAIL: c-c++-common/Wunused-parm-1.c  -std=gnu++17  (test for warnings, line 19)
FAIL: c-c++-common/Wunused-parm-2.c  -std=gnu++17  (test for warnings, line 13)
FAIL: c-c++-common/Wunused-parm-2.c  -std=gnu++17  (test for warnings, line 14)
FAIL: c-c++-common/Wunused-parm-2.c  -std=gnu++17  (test for warnings, line 15)
FAIL: c-c++-common/Wunused-parm-2.c  -std=gnu++17  (test for warnings, line 16)
FAIL: c-c++-common/Wunused-parm-2.c  -std=gnu++17  (test for warnings, line 17)
FAIL: c-c++-common/Wunused-parm-2.c  -std=gnu++17  (test for warnings, line 18)
FAIL: c-c++-common/Wunused-parm-2.c  -std=gnu++17  (test for warnings, line 19)
FAIL: c-c++-common/Wunused-parm-3.c  -std=gnu++17  (test for warnings, line 13)
FAIL: c-c++-common/Wunused-parm-3.c  -std=gnu++17  (test for warnings, line 14)
FAIL: c-c++-common/Wunused-parm-3.c  -std=gnu++17  (test for warnings, line 15)
FAIL: c-c++-common/Wunused-parm-3.c  -std=gnu++17  (test for warnings, line 16)
FAIL: c-c++-common/Wunused-parm-3.c  -std=gnu++17  (test for warnings, line 17)
FAIL: c-c++-common/Wunused-parm-3.c  -std=gnu++17  (test for warnings, line 18)
FAIL: c-c++-common/Wunused-parm-3.c  -std=gnu++17  (test for warnings, line 19)

cp_fold_rvalue -> cp_fold_maybe_rvalue -> mark_rvalue_use -> mark_use
->mark_exp_read then sets DECL_READ_P on op0, which we want to avoid in the
op0 @= expr
case in some of the levels.
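Here is a hypothetical function that makes the "op0 @= expr" case concrete: its parameter is only ever updated by a compound assignment. Going by the "> 2" test in the hunk, -Wunused-but-set-parameter would presumably diagnose such a parameter only at level 3 and above. The RHS operand q would still count as read. This is an inference from the quoted code, not verified compiler output. The function itself compiles and runs normally.

```c
/* Hypothetical example: 'p' is read only to compute its own new value,
   and that new value is never used.  */
int
scale_q (int p, int q)
{
  p += q;          /* op0 @= expr: p is op0; q is genuinely read */
  return q * 2;
}
```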


@@ -211,8 +211,27 @@ mark_use (tree expr, bool rvalue_p, bool
}
  return expr;
}
-  gcc_fallthrough();
+  gcc_fallthrough ();
   CASE_CONVERT:
+  if (VOID_TYPE_P (TREE_TYPE (expr)))
+   switch (TREE_CODE (TREE_OPERAND (expr, 0)))
+ {
+ case PREINCREMENT_EXPR:
+ case PREDECREMENT_EXPR:
+ case POSTINCREMENT_

Re: [Patch, Fortran, Coarray, PR88076, v2] Add a shared memory multi process coarray library.

2025-07-11 Thread Jerry D


On 7/10/25 2:27 AM, Andre Vehreschild wrote:

Hi all,

after Jerry had the idea to use OpenCoarrays' tests to also test
caf_shmem, a few issues arose. Those have now been fixed in the updated
patch series.

I have put all patches into one mail to allow the CIs to pick them all
up and hopefully test on multiple (exotic) architectures and OSes.

Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline?

Regards,
Andre


I have applied and pushed V2 of the patch set to:

https://forge.sourceware.org/JerryD/gfortran-TEST

For others to test.

I have tested this on Toon's random_weather.f90 test case and it passes. I
still need to rebuild OpenCoarrays and make sure it builds OK with this
compiler, and then I will try Andre's suggestions for testing with the
OpenCoarrays test suite.


Thomas, can you suggest how to test specifically for race conditions? I am not
sure how to go about this.


Andre, do you have any thoughts on this?  Anyone else?

Jerry

Re: [PATCH] ipa: Disallow signature changes in fun->has_musttail functions [PR121023]

2025-07-11 Thread Martin Jambor

Hi,

On Fri, Jul 11 2025, Richard Biener wrote:
> On Fri, 11 Jul 2025, Jakub Jelinek wrote:
>
>> Hi!
>> 
>> As the following testcase shows e.g. on ia32, letting IPA opts change
>> signature of functions which have [[{gnu,clang}::musttail]] calls
>> can turn programs that would be compiled normally into something
>> that is rejected because the caller has fewer argument stack slots
>> than the function being tail called.
>> 
>> The following patch prevents signature changes for such functions.
>> It is perhaps too big a hammer in some cases, but it might be hard
>> to figure out at IPA time which signature changes are still acceptable
>> and which are not.
>> 
>> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/15.2?
>
> OK, but please give Martin/Honza a chance to comment.

I agree.

Martin


>
> Thanks,
> Richard.
>
>> 2025-07-11  Jakub Jelinek  
>>  Martin Jambor  
>> 
>>  PR ipa/121023
>>  * ipa-fnsummary.cc (compute_fn_summary): Disallow signature changes
>>  on cfun->has_musttail functions.
>> 
>>  * c-c++-common/musttail32.c: New test.
>> 
>> --- gcc/ipa-fnsummary.cc.jj  2025-07-04 09:01:47.507516910 +0200
>> +++ gcc/ipa-fnsummary.cc 2025-07-10 14:00:19.488185173 +0200
>> @@ -3421,6 +3421,21 @@ compute_fn_summary (struct cgraph_node *
>>   info->inlinable = tree_inlinable_function_p (node->decl);
>>  
>> bool no_signature = false;
>> +
>> +   /* Don't allow signature changes for functions which have
>> +  [[gnu::musttail]] or [[clang::musttail]] calls.  Sometimes
>> +  (more often on targets which pass everything on the stack)
>> +  signature changes can result in tail calls being impossible
>> +  even when without the signature changes they would be ok.
>> +  See PR121023.  */
>> +   if (cfun->has_musttail)
>> + {
>> +   if (dump_file)
>> +fprintf (dump_file, "No signature change:"
>> + " function has calls with musttail attribute.\n");
>> +   no_signature = true;
>> + }
>> +
>> /* Type attributes can use parameter indices to describe them.
>>Special case fn spec since we can safely preserve them in
>>modref summaries.  */
>> --- gcc/testsuite/c-c++-common/musttail32.c.jj   2025-07-10 14:00:56.760698477 +0200
>> +++ gcc/testsuite/c-c++-common/musttail32.c  2025-07-10 14:02:21.945586151 +0200
>> @@ -0,0 +1,23 @@
>> +/* PR ipa/121023 */
>> +/* { dg-do compile { target musttail } } */
>> +/* { dg-options "-O2" } */
>> +
>> +struct S { int a, b; };
>> +
>> +[[gnu::noipa]] int
>> +foo (struct S x, int y, int z)
>> +{
>> +  return x.a + y + z;
>> +}
>> +
>> +[[gnu::noinline]] static int
>> +bar (struct S x, int y, int z)
>> +{
>> +  [[gnu::musttail]] return foo ((struct S) { x.a, 0 }, y, 1);
>> +}
>> +
>> +int
>> +baz (int x)
>> +{
>> +  return bar ((struct S) { 1, 2 }, x, 2) + bar ((struct S) { 2, 3 }, x + 1, 2);
>> +}
>> 
>>  Jakub
>> 
>> 
>
> -- 
> Richard Biener 
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
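A back-of-the-envelope count shows how a signature change can break the musttail call in the testcase. It rests on two assumptions:

- An ia32-like convention, where every argument is passed on the stack in 4-byte slots.
- A hypothetical IPA-reduced signature bar (int x_a, int y), after the unused x.b and z are dropped.

The transformation GCC would actually choose may differ.

```c
#include <stddef.h>

struct S { int a, b; };

/* Number of 4-byte stack slots an argument of BYTES bytes occupies
   (assumed ia32-like convention).  */
static size_t
stack_slots (size_t bytes)
{
  return (bytes + 3) / 4;
}

/* foo (struct S, int, int): the callee of the musttail call.  */
static size_t
slots_original (void)
{
  return stack_slots (sizeof (struct S)) + 2 * stack_slots (sizeof (int));
}

/* Hypothetical reduced bar (int x_a, int y): the caller would then have
   fewer incoming argument slots than foo needs, so the tail call could
   not reuse them.  */
static size_t
slots_reduced (void)
{
  return 2 * stack_slots (sizeof (int));
}
```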

Re: [PATCH v2] x86: Update MMXMODE:*mov_internal to support all 1s vectors

2025-07-11 Thread Uros Bizjak

On Fri, Jul 11, 2025 at 9:57 AM Uros Bizjak  wrote:
>
> On Fri, Jul 11, 2025 at 6:05 AM H.J. Lu  wrote:
>
> > gcc/
> >
> > PR target/121015
> > * config/i386/constraints.md (BX): New constraint.
> > * config/i386/i386.cc (ix86_print_operand): Support CONSTM1_RTX.
> > * config/i386/mmx.md (MMXMODE:*mov_internal): Replace C with
> > BX for memory and integer register destination.  Replace 
> > with .
> > Update 32-bit MMXMODE move splitter to also split all 1s vector
> > source operand.
> > * config/i386/predicates.md (vector_const0_or_m1_operand): New
> > predicate.
> > (nonimm_or_vector_const0_or_m1_operand): Likewise.
> >
> > gcc/testsuite/
> >
> > PR target/121015
> > * gcc.target/i386/pr106022-2.c: Adjusted.
> > * gcc.target/i386/pr121015-1.c: New test.
> > * gcc.target/i386/pr121015-2.c: Likewise.
> > * gcc.target/i386/pr121015-3.c: Likewise.
> > * gcc.target/i386/pr121015-4.c: Likewise.
> > * gcc.target/i386/pr121015-5.c: Likewise.
> > * gcc.target/i386/pr121015-6.c: Likewise.
> >
> > OK for master?
>
> Please try the attached patch that introduces "all ones" handling to MMX 
> moves.

Bah, wrong version attached (missing 32bit modes in mmxconstm1) -
please try this.

Uros.
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index ad7360ec71a..c9b0ddf290c 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -5448,6 +5448,8 @@ standard_sse_constant_p (rtx x, machine_mode pred_mode)
return 2;
  break;
case 16:
+   case 8:
+   case 4:
  if (TARGET_SSE2)
return 2;
  break;
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 79202323e53..c5e4f4239e6 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -111,6 +111,13 @@ (define_mode_attr mmxinsnmode
(V4BF "DI") (V2BF "SI")
(V2SF "DI")])
 
+;; MMX constant -1 constraint
+(define_mode_attr mmxconstm1
+  [(V8QI "BC") (V4HI "BC") (V2SI "BC") (V1DI "BC")
+   (V4QI "BC") (V2HI "BC") (V1SI "BC")
+   (V4HF "BF") (V4BF "BF") (V2SF "BF")
+   (V2HF "BF") (V2BF "BF")])
+
 (define_mode_attr mmxdoublemode
   [(V8QI "V8HI") (V4HI "V4SI")])
 
@@ -174,7 +181,7 @@ (define_mode_attr Yv_Yw
 
 (define_expand "mov"
   [(set (match_operand:MMXMODE 0 "nonimmediate_operand")
-   (match_operand:MMXMODE 1 "nonimm_or_0_operand"))]
+   (match_operand:MMXMODE 1 "nonimmediate_or_sse_const_operand"))]
   "TARGET_MMX || TARGET_MMX_WITH_SSE"
 {
   ix86_expand_vector_move (mode, operands);
@@ -183,9 +190,9 @@ (define_expand "mov"
 
 (define_insn "*mov_internal"
   [(set (match_operand:MMXMODE 0 "nonimmediate_operand"
-"=r ,o ,r,r ,m ,?!y,!y,?!y,m  ,r  ,?!y,v,v,v,m,r,v,!y,*x")
-   (match_operand:MMXMODE 1 "nonimm_or_0_operand"
-"rCo,rC,C,rm,rC,C  ,!y,m  ,?!y,?!y,r  ,C,v,m,v,v,r,*x,!y"))]
+"=r ,o ,r,r ,m ,?!y,!y,?!y,m  ,r  ,?!y,v,v   ,v,v,m,r,v,!y,*x")
+   (match_operand:MMXMODE 1 "nonimmediate_or_sse_const_operand"
+"rCo,rC,C,rm,rC,C  ,!y,m  ,?!y,?!y,r  ,C,,v,m,v,v,r,*x,!y"))]
   "(TARGET_MMX || TARGET_MMX_WITH_SSE)
&& !(MEM_P (operands[0]) && MEM_P (operands[1]))
&& ix86_hardreg_mov_ok (operands[0], operands[1])"
@@ -232,9 +239,9 @@ (define_insn "*mov_internal"
  (const_string "nox64")
(eq_attr "alternative" "2,3,4,9,10")
  (const_string "x64")
-   (eq_attr "alternative" "15,16")
+   (eq_attr "alternative" "16,17")
  (const_string "x64_sse2")
-   (eq_attr "alternative" "17,18")
+   (eq_attr "alternative" "12,18,19")
  (const_string "sse2")
   ]
   (const_string "*")))
@@ -247,14 +254,14 @@ (define_insn "*mov_internal"
  (const_string "mmx")
(eq_attr "alternative" "6,7,8,9,10")
  (const_string "mmxmov")
-   (eq_attr "alternative" "11")
+   (eq_attr "alternative" "11,12")
  (const_string "sselog1")
-   (eq_attr "alternative" "17,18")
+   (eq_attr "alternative" "18,19")
  (const_string "ssecvt")
   ]
   (const_string "ssemov")))
(set (attr "prefix_rex")
- (if_then_else (eq_attr "alternative" "9,10,15,16")
+ (if_then_else (eq_attr "alternative" "9,10,16,17")
(const_string "1")
(const_string "*")))
(set (attr "prefix")
@@ -269,7 +276,7 @@ (define_insn "*mov_internal"
(set (attr "mode")
  (cond [(eq_attr "alternative" "2")
  (const_string "SI")
-   (eq_attr "alternative" "11,12")
+   (eq_attr "alternative" "11,12,13")
  (cond [(match_test "mode == V2SFmode
  || mode == V4HFmode
  || mode == V4BFmode")
@@ -280,7 +287,7 @@ (define_insn "*mov_internal"
]
(const_string "TI"))
 
-   (and (eq_attr "alternative" "13")
+   (and (eq_attr "alternative" "14")
 (ior (ior (and (match_test "mode == V2SFmode")

Re: [PATCH v2] x86: Update MMXMODE:*mov_internal to support all 1s vectors

2025-07-11 Thread H.J. Lu

On Fri, Jul 11, 2025 at 4:23 PM Uros Bizjak  wrote:
>
> On Fri, Jul 11, 2025 at 9:57 AM Uros Bizjak  wrote:
> >
> > On Fri, Jul 11, 2025 at 6:05 AM H.J. Lu  wrote:
> >
> > > gcc/
> > >
> > > PR target/121015
> > > * config/i386/constraints.md (BX): New constraint.
> > > * config/i386/i386.cc (ix86_print_operand): Support CONSTM1_RTX.
> > > * config/i386/mmx.md (MMXMODE:*mov_internal): Replace C with
> > > BX for memory and integer register destination.  Replace 
> > > with .
> > > Update 32-bit MMXMODE move splitter to also split all 1s vector
> > > source operand.
> > > * config/i386/predicates.md (vector_const0_or_m1_operand): New
> > > predicate.
> > > (nonimm_or_vector_const0_or_m1_operand): Likewise.
> > >
> > > gcc/testsuite/
> > >
> > > PR target/121015
> > > * gcc.target/i386/pr106022-2.c: Adjusted.
> > > * gcc.target/i386/pr121015-1.c: New test.
> > > * gcc.target/i386/pr121015-2.c: Likewise.
> > > * gcc.target/i386/pr121015-3.c: Likewise.
> > > * gcc.target/i386/pr121015-4.c: Likewise.
> > > * gcc.target/i386/pr121015-5.c: Likewise.
> > > * gcc.target/i386/pr121015-6.c: Likewise.
> > >
> > > OK for master?
> >
> > Please try the attached patch that introduces "all ones" handling to MMX 
> > moves.
>
> Bah, wrong version attached (missing 32bit modes in mmxconstm1) -
> please try this.
>
> Uros.

Here are the source and 2 assembly codes generated by -O2 -march=x86-64-v3.
My patch generates:

movq $-1, %rax
...
movq %rax, 4(%rcx)
...
movq %rax, 4(%rcx)
...
movq %rax, 4(%rcx)

Yours generates:

vpcmpeqd %xmm0, %xmm0, %xmm0
...
vmovlps %xmm0, 4(%rdx)
...
vpcmpeqd %xmm1, %xmm1, %xmm1
...
vmovlps %xmm1, 4(%rdx)
...
vpcmpeqd %xmm2, %xmm2, %xmm2
...
vmovlps %xmm2, 4(%rdx)

I prefer the assembly codes generated by my patch.

-- 
H.J.


pr121015-1.s.hjl
Description: Binary data


pr121015-1.s.uros
Description: Binary data
/* { dg-do compile } */
/* { dg-options "-O2 -march=x86-64-v3" } */
/* { dg-final { scan-assembler-not "\tmovl\[\\t \]+\\\$-1, %" { target { ! ia32 } } } } */
/* { dg-final { scan-assembler "\tmovq\[\\t \]+\\\$-1, " { target { ! ia32 } } } } */

extern union {
  int i;
  float f;
} int_as_float_u;

extern int render_result_from_bake_w;
extern int render_result_from_bake_h_seed_pass;
extern float *render_result_from_bake_h_primitive;
extern float *render_result_from_bake_h_seed;

float
int_as_float(int i)
{
  int_as_float_u.i = i;
  return int_as_float_u.f;
}

void
render_result_from_bake_h(int tx)
{
  while (render_result_from_bake_w) {
    for (; tx < render_result_from_bake_w; tx++)
      render_result_from_bake_h_primitive[1] =
        render_result_from_bake_h_primitive[2] = int_as_float(-1);
    if (render_result_from_bake_h_seed_pass) {
      *render_result_from_bake_h_seed = 0;
    }
  }
}
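The following smaller C sketch performs the same kind of operation, storing an all-ones 8-byte vector at a 4-byte offset, and may be easier to experiment with. It assumes the GCC/Clang vector_size extension. No assembly is claimed here: which of the two sequences above it produces depends on the compiler version and on which patch is applied. store_all_ones is a hypothetical name.

```c
#include <string.h>

/* GCC/Clang vector extension: an 8-byte vector of two ints.  */
typedef int v2si __attribute__ ((vector_size (8)));

/* Store an all-ones 8-byte vector to BUF + 4 (8 bytes are written).  */
static void
store_all_ones (unsigned char *buf)
{
  v2si ones = { -1, -1 };
  memcpy (buf + 4, &ones, sizeof ones);
}
```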

Re: [PATCH] [x86] properly compute fp/mode for scalar ops for vectorizer costing

2025-07-11 Thread Richard Biener

On Thu, 10 Jul 2025, Richard Biener wrote:

> On Thu, 10 Jul 2025, Jan Hubicka wrote:
> 
> > > The x86 add_stmt_hook relies on the passed vectype to determine
> > > the mode and whether it is FP for a scalar operation.  This is
> > > unreliable now for stmts involving patterns and in the future when
> > > there is no vector type passed for scalar operations.
> > > 
> > > To be least disruptive I've kept using the vector type if it is passed.
> > > 
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > 
> > > OK?
> > > 
> > > Thanks
> > > Richard.
> > > 
> > >   * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Use
> > >   the LHS of a scalar stmt to determine mode and whether it is FP.
> > > ---
> > >  gcc/config/i386/i386.cc | 6 ++
> > >  1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > index ad7360ec71a..26eefadea64 100644
> > > --- a/gcc/config/i386/i386.cc
> > > +++ b/gcc/config/i386/i386.cc
> > > @@ -25798,6 +25798,12 @@ ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
> > >if (scalar_p)
> > >   mode = TYPE_MODE (TREE_TYPE (vectype));
> > >  }
> > > +  else if (scalar_p && stmt_info)
> > > +if (tree lhs = gimple_get_lhs (stmt_info->stmt))
> > > +  {
> > > + fp = FLOAT_TYPE_P (TREE_TYPE (lhs));
> > > + mode = TYPE_MODE (TREE_TYPE (lhs));
> > > +  }
> > Makes sense to me, but perhaps it would be a good idea to add a comment,
> > since it looks odd at first glance?
> 
> Like
> 
>   /* When we are costing a scalar stmt use the scalar stmt to get at the
>  type of the operation.  */
> 
> ?

I have pushed with this change.

Richard.

[PATCH] testsuite: arm: Add effective-target vect_early_break to vect-tsvc-*

2025-07-11 Thread Torbjörn SVENSSON

Ok for trunk, gcc-15 and gcc-14.

I discovered that the dg-require-effective-target is missing on gcc-14,
but it's probably the right thing to add on gcc-15 and trunk too.

Without the `dg-require-effective-target vect_early_break`, the
`dg-add-options vect_early_break` will return the flags unchanged and
`dg-require-effective-target vect_early_break_hw` will succeed as it
overrides the flags, causing the tests to use the wrong target.

Let me know what you think.

--

With the -mcpu=unset/-march=unset feature introduced in
r15-3606-g7d6c6a0d15c, these tests started passing because the
cpu/arch is overridden. The proper thing to do when using
`dg-add-options vect_early_break` is to also have a
`dg-require-effective-target vect_early_break`, so add it.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/tsvc/vect-tsvc-s332.c: Add
dg-require-effective-target vect_early_break to test.
* gcc.dg/vect/tsvc/vect-tsvc-s481.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s482.c: Likewise.

Signed-off-by: Torbjörn SVENSSON 
---
 gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c | 1 +
 gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c | 1 +
 gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c | 1 +
 3 files changed, 3 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
index 21a9c5a6b2b..b4154040d1b 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
@@ -3,6 +3,7 @@
 
 /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
 /* { dg-require-effective-target vect_float } */
+/* { dg-require-effective-target vect_early_break } */
 /* { dg-require-effective-target vect_early_break_hw } */
 /* { dg-add-options vect_early_break } */
 
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
index e4433385d66..156e44972bd 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
@@ -3,6 +3,7 @@
 
 /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
 /* { dg-require-effective-target vect_float } */
+/* { dg-require-effective-target vect_early_break } */
 /* { dg-require-effective-target vect_early_break_hw } */
 /* { dg-add-options vect_early_break } */
 
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
index 146df409ecc..a1fcb18c557 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
@@ -3,6 +3,7 @@
 
 /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
 /* { dg-require-effective-target vect_float } */
+/* { dg-require-effective-target vect_early_break } */
 /* { dg-require-effective-target vect_early_break_hw } */
 /* { dg-add-options vect_early_break } */
 
-- 
2.25.1

Re: [PATCH v1 1/2] Match: Leverage BITS_PER_WORD for unsigned SAT_MUL pattern

2025-07-11 Thread Richard Biener

On Fri, Jul 11, 2025 at 9:13 AM Li, Pan2  wrote:
>
> Thanks Richard for comments.
>
> > Why is it important to constrain the widen-mult input to a
> > fixed precision at all?
>
> I suppose widen-mult only occurs when the result exceeds the max bits of a GPR.
> So here I would like to make sure the precision matches the bits of a GPR.
>
> For rv32 with 32-bits gpr, 32 bits * 32 bits => 64 bits
>
>   15   │   _1 = (long long unsigned int) a_4(D);
>   16   │   _2 = (long long unsigned int) b_5(D);
>   17   │   _9 = (unsigned int) _1;
>   18   │   _10 = (unsigned int) _2;
>   19   │   x_6 = _9 w* _10;
>
> For rv64 with 64-bits gpr, 64 bits * 64 bits => 128 bits
>
>   15   │   _1 = (__int128 unsigned) a_4(D);
>   16   │   _2 = (__int128 unsigned) b_5(D);
>   17   │   _9 = (unsigned long) _1;
>   18   │   _10 = (unsigned long) _2;
>   19   │   x_6 = _9 w* _10;
>
> But if it is a widen-mul, it looks like the ops of the widen-mul should already be
> the max bits of a GPR.
> Otherwise it will be a normal mul, and then it looks like we don't need that check
> anymore; am I understanding correctly?

A widen-mul could also be a QImode x QImode -> HImode operation, or a
QImode x QImode -> SImode operation.  The only restriction is that the
result is at least twice as wide as the inputs.

Richard.
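Richard's point can be seen at the source level. An unsigned saturating multiply written with a widening multiply has the same shape whether the inputs are bytes or words. The two functions below are a hypothetical sketch of that source shape. Whether match.pd recognises either one as SAT_MUL depends on the pattern under review and on the target.

```c
#include <stdint.h>

/* QI x QI -> HI: widen both inputs, multiply, clamp to the narrow max.  */
static uint8_t
sat_mul_u8 (uint8_t a, uint8_t b)
{
  uint16_t x = (uint16_t) ((uint16_t) a * (uint16_t) b);
  return x > UINT8_MAX ? UINT8_MAX : (uint8_t) x;
}

/* SI x SI -> DI: the same shape at word size on a 32-bit target.  */
static uint32_t
sat_mul_u32 (uint32_t a, uint32_t b)
{
  uint64_t x = (uint64_t) a * (uint64_t) b;
  return x > UINT32_MAX ? UINT32_MAX : (uint32_t) x;
}
```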

>
> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Friday, July 11, 2025 2:23 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> jeffreya...@gmail.com; rdapp@gmail.com; Chen, Ken ; 
> Liu, Hongtao 
> Subject: Re: [PATCH v1 1/2] Match: Leverage BITS_PER_WORD for unsigned 
> SAT_MUL pattern
>
> On Fri, Jul 11, 2025 at 6:51 AM  wrote:
> >
> > From: Pan Li 
> >
> > The widen mul has different source types on different platforms,
> > like rv32 or rv64.  For rv32, the source of widen mul is 32 bits,
> > while it is 64 bits on rv64.  Thus, using HOST_WIDE_INT is not
> > correct and results in pattern match failures on 32-bit systems
> > like rv32.
> >
> > Thus, leverage BITS_PER_WORD instead for this pattern.
> >
> > gcc/ChangeLog:
> >
> > * match.pd: Leverage BITS_PER_WORD instead of HOST_WIDE_INT
> > for widen mul precision check.
> >
> > Signed-off-by: Pan Li 
> > ---
> >  gcc/match.pd | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 67b33eee5f7..7f31705b652 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3605,11 +3605,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >unsigned widen_prec = TYPE_PRECISION (TREE_TYPE (@3));
> >unsigned cvt5_prec = TYPE_PRECISION (TREE_TYPE (@5));
> >unsigned cvt6_prec = TYPE_PRECISION (TREE_TYPE (@6));
> > -  unsigned hw_int_prec = sizeof (HOST_WIDE_INT) * 8;
> >wide_int c2 = wi::to_wide (@2);
> >wide_int max = wi::mask (prec, false, widen_prec);
> >bool c2_is_max_p = wi::eq_p (c2, max);
> > -  bool widen_mult_p = cvt5_prec == cvt6_prec && hw_int_prec == cvt5_prec;
> > +  bool widen_mult_p = cvt5_prec == cvt6_prec && BITS_PER_WORD == cvt5_prec;
>
> Why is it important to constrain the widen-mult input to a
> fixed precision at all?
>
> >   }
> >   (if (widen_prec > prec && c2_is_max_p && widen_mult_p)
> >  )
> > --
> > 2.43.0
> >

GCC 12 branch is now closed

2025-07-11 Thread Richard Biener



The GCC 12 branch is now closed, no further changes can be pushed
there.

Re: [PATCH v2] x86: Update MMXMODE:*mov_internal to support all 1s vectors

2025-07-11 Thread Uros Bizjak

On Fri, Jul 11, 2025 at 6:05 AM H.J. Lu  wrote:

> gcc/
>
> PR target/121015
> * config/i386/constraints.md (BX): New constraint.
> * config/i386/i386.cc (ix86_print_operand): Support CONSTM1_RTX.
> * config/i386/mmx.md (MMXMODE:*mov_internal): Replace C with
> BX for memory and integer register destination.  Replace 
> with .
> Update 32-bit MMXMODE move splitter to also split all 1s vector
> source operand.
> * config/i386/predicates.md (vector_const0_or_m1_operand): New
> predicate.
> (nonimm_or_vector_const0_or_m1_operand): Likewise.
>
> gcc/testsuite/
>
> PR target/121015
> * gcc.target/i386/pr106022-2.c: Adjusted.
> * gcc.target/i386/pr121015-1.c: New test.
> * gcc.target/i386/pr121015-2.c: Likewise.
> * gcc.target/i386/pr121015-3.c: Likewise.
> * gcc.target/i386/pr121015-4.c: Likewise.
> * gcc.target/i386/pr121015-5.c: Likewise.
> * gcc.target/i386/pr121015-6.c: Likewise.
>
> OK for master?

Please try the attached patch that introduces "all ones" handling to MMX moves.

Uros.
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index ad7360ec71a..c9b0ddf290c 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -5448,6 +5448,8 @@ standard_sse_constant_p (rtx x, machine_mode pred_mode)
return 2;
  break;
case 16:
+   case 8:
+   case 4:
  if (TARGET_SSE2)
return 2;
  break;
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 79202323e53..1cfd09ed59e 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -111,6 +111,11 @@ (define_mode_attr mmxinsnmode
(V4BF "DI") (V2BF "SI")
(V2SF "DI")])
 
+;; MMX constant -1 constraint
+(define_mode_attr mmxconstm1
+  [(V8QI "BC") (V4HI "BC") (V2SI "BC") (V1DI "BC")
+   (V4HF "BF") (V4BF "BF") (V2SF "BF")])
+
 (define_mode_attr mmxdoublemode
   [(V8QI "V8HI") (V4HI "V4SI")])
 
@@ -174,7 +179,7 @@ (define_mode_attr Yv_Yw
 
 (define_expand "mov"
   [(set (match_operand:MMXMODE 0 "nonimmediate_operand")
-   (match_operand:MMXMODE 1 "nonimm_or_0_operand"))]
+   (match_operand:MMXMODE 1 "nonimmediate_or_sse_const_operand"))]
   "TARGET_MMX || TARGET_MMX_WITH_SSE"
 {
   ix86_expand_vector_move (mode, operands);
@@ -183,9 +188,9 @@ (define_expand "mov"
 
 (define_insn "*mov_internal"
   [(set (match_operand:MMXMODE 0 "nonimmediate_operand"
-"=r ,o ,r,r ,m ,?!y,!y,?!y,m  ,r  ,?!y,v,v,v,m,r,v,!y,*x")
-   (match_operand:MMXMODE 1 "nonimm_or_0_operand"
-"rCo,rC,C,rm,rC,C  ,!y,m  ,?!y,?!y,r  ,C,v,m,v,v,r,*x,!y"))]
+"=r ,o ,r,r ,m ,?!y,!y,?!y,m  ,r  ,?!y,v,v   ,v,v,m,r,v,!y,*x")
+   (match_operand:MMXMODE 1 "nonimmediate_or_sse_const_operand"
+"rCo,rC,C,rm,rC,C  ,!y,m  ,?!y,?!y,r  ,C,,v,m,v,v,r,*x,!y"))]
   "(TARGET_MMX || TARGET_MMX_WITH_SSE)
&& !(MEM_P (operands[0]) && MEM_P (operands[1]))
&& ix86_hardreg_mov_ok (operands[0], operands[1])"
@@ -232,9 +237,9 @@ (define_insn "*mov_internal"
  (const_string "nox64")
(eq_attr "alternative" "2,3,4,9,10")
  (const_string "x64")
-   (eq_attr "alternative" "15,16")
+   (eq_attr "alternative" "16,17")
  (const_string "x64_sse2")
-   (eq_attr "alternative" "17,18")
+   (eq_attr "alternative" "12,18,19")
  (const_string "sse2")
   ]
   (const_string "*")))
@@ -247,14 +252,14 @@ (define_insn "*mov_internal"
  (const_string "mmx")
(eq_attr "alternative" "6,7,8,9,10")
  (const_string "mmxmov")
-   (eq_attr "alternative" "11")
+   (eq_attr "alternative" "11,12")
  (const_string "sselog1")
-   (eq_attr "alternative" "17,18")
+   (eq_attr "alternative" "18,19")
  (const_string "ssecvt")
   ]
   (const_string "ssemov")))
(set (attr "prefix_rex")
- (if_then_else (eq_attr "alternative" "9,10,15,16")
+ (if_then_else (eq_attr "alternative" "9,10,16,17")
(const_string "1")
(const_string "*")))
(set (attr "prefix")
@@ -269,7 +274,7 @@ (define_insn "*mov_internal"
(set (attr "mode")
  (cond [(eq_attr "alternative" "2")
  (const_string "SI")
-   (eq_attr "alternative" "11,12")
+   (eq_attr "alternative" "11,12,13")
  (cond [(match_test "mode == V2SFmode
  || mode == V4HFmode
  || mode == V4BFmode")
@@ -280,7 +285,7 @@ (define_insn "*mov_internal"
]
(const_string "TI"))
 
-   (and (eq_attr "alternative" "13")
+   (and (eq_attr "alternative" "14")
 (ior (ior (and (match_test "mode == V2SFmode")
(not (match_test "TARGET_MMX_WITH_SSE")))
   (not (match_test "TARGET_SSE2")))
@@ -288,7 +293,7 @@ (define_insn "*mov_internal"
  || mode == V4BFmode")))
  (const_string

Re: [PATCH v2] x86: Update MMXMODE:*mov_internal to support all 1s vectors

2025-07-11 Thread Uros Bizjak

On Fri, Jul 11, 2025 at 10:39 AM H.J. Lu  wrote:
>
> On Fri, Jul 11, 2025 at 4:23 PM Uros Bizjak  wrote:
> >
> > On Fri, Jul 11, 2025 at 9:57 AM Uros Bizjak  wrote:
> > >
> > > On Fri, Jul 11, 2025 at 6:05 AM H.J. Lu  wrote:
> > >
> > > > gcc/
> > > >
> > > > PR target/121015
> > > > * config/i386/constraints.md (BX): New constraint.
> > > > * config/i386/i386.cc (ix86_print_operand): Support CONSTM1_RTX.
> > > > * config/i386/mmx.md (MMXMODE:*mov_internal): Replace C with
> > > > BX for memory and integer register destination.  Replace 
> > > > with .
> > > > Update 32-bit MMXMODE move splitter to also split all 1s vector
> > > > source operand.
> > > > * config/i386/predicates.md (vector_const0_or_m1_operand): New
> > > > predicate.
> > > > (nonimm_or_vector_const0_or_m1_operand): Likewise.
> > > >
> > > > gcc/testsuite/
> > > >
> > > > PR target/121015
> > > > * gcc.target/i386/pr106022-2.c: Adjusted.
> > > > * gcc.target/i386/pr121015-1.c: New test.
> > > > * gcc.target/i386/pr121015-2.c: Likewise.
> > > > * gcc.target/i386/pr121015-3.c: Likewise.
> > > > * gcc.target/i386/pr121015-4.c: Likewise.
> > > > * gcc.target/i386/pr121015-5.c: Likewise.
> > > > * gcc.target/i386/pr121015-6.c: Likewise.
> > > >
> > > > OK for master?
> > >
> > > Please try the attached patch that introduces "all ones" handling to MMX 
> > > moves.
> >
> > Bah, wrong version attached (missing 32bit modes in mmxconstm1) -
> > please try this.
> >
> > Uros.
>
> Here are the source and the 2 assembly outputs generated by -O2 -march=x86-64-v3.
> My patch generates:
>
> movq $-1, %rax
> ...
> movq %rax, 4(%rcx)
> ...
> movq %rax, 4(%rcx)
> ...
> movq %rax, 4(%rcx)
>
> Yours generates:
>
> vpcmpeqd %xmm0, %xmm0, %xmm0
> ...
> vmovlps %xmm0, 4(%rdx)
> ...
> vpcmpeqd %xmm1, %xmm1, %xmm1
> ...
> vmovlps %xmm1, 4(%rdx)
> ...
> vpcmpeqd %xmm2, %xmm2, %xmm2
> ...
> vmovlps %xmm2, 4(%rdx)
>
> I prefer the assembly code generated by my patch.

Yes, I also noticed this issue after some more testing. The attached
patch revision adds  constraint that results in even better code:

   movq    $-1, 4(%rdx)

Please note we don't want this for 32-bit targets, where the above
would result in two stores. "vpcmpeqd %xmm1, %xmm1, %xmm1;  vmovlps
%xmm1, 4(%rdx)" should be used instead.
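H.J.'s test source is not included in the thread, so the following is only a hypothetical reduced C example of the same shape: an 8-byte all-ones vector, built with GCC's `vector_size` extension, stored at byte offset 4. The function name is an assumption; the sketch checks only the stored bytes and does not assert which sequence (`movq $-1` to memory, or `vpcmpeqd` plus `vmovlps`) a given compiler emits for it.

```c
#include <string.h>

/* Two 32-bit ints in an 8-byte vector (GCC vector extension).  */
typedef int v2si __attribute__ ((vector_size (8)));

/* Store an all-ones 8-byte vector at byte offset 4 of BUF, similar to
   the 4(%rdx) destination in the assembly above.  */
void
store_all_ones (unsigned char *buf)
{
  v2si ones = { -1, -1 };
  memcpy (buf + 4, &ones, sizeof ones);
}
```

Compiling it with -O2 for a 64-bit and a 32-bit target and comparing the assembly is one way to look at the difference discussed above.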

Uros.
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index ad7360ec71a..c9b0ddf290c 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -5448,6 +5448,8 @@ standard_sse_constant_p (rtx x, machine_mode pred_mode)
return 2;
  break;
case 16:
+   case 8:
+   case 4:
  if (TARGET_SSE2)
return 2;
  break;
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 79202323e53..f5f4782101c 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -111,6 +111,13 @@ (define_mode_attr mmxinsnmode
(V4BF "DI") (V2BF "SI")
(V2SF "DI")])
 
+;; MMX constant -1 constraint
+(define_mode_attr mmxconstm1
+  [(V8QI "BC") (V4HI "BC") (V2SI "BC") (V1DI "BC")
+   (V4QI "BC") (V2HI "BC") (V1SI "BC")
+   (V4HF "BF") (V4BF "BF") (V2SF "BF")
+   (V2HF "BF") (V2BF "BF")])
+
 (define_mode_attr mmxdoublemode
   [(V8QI "V8HI") (V4HI "V4SI")])
 
@@ -174,7 +181,7 @@ (define_mode_attr Yv_Yw
 
 (define_expand "mov"
   [(set (match_operand:MMXMODE 0 "nonimmediate_operand")
-   (match_operand:MMXMODE 1 "nonimm_or_0_operand"))]
+   (match_operand:MMXMODE 1 "nonimmediate_or_sse_const_operand"))]
   "TARGET_MMX || TARGET_MMX_WITH_SSE"
 {
   ix86_expand_vector_move (mode, operands);
@@ -183,9 +190,9 @@ (define_expand "mov"
 
 (define_insn "*mov_internal"
   [(set (match_operand:MMXMODE 0 "nonimmediate_operand"
-"=r ,o ,r,r ,m ,?!y,!y,?!y,m  ,r  ,?!y,v,v,v,m,r,v,!y,*x")
-   (match_operand:MMXMODE 1 "nonimm_or_0_operand"
-"rCo,rC,C,rm,rC,C  ,!y,m  ,?!y,?!y,r  ,C,v,m,v,v,r,*x,!y"))]
+"=r ,o ,r,r ,m ,m   ,?!y,!y,?!y,m  ,r  ,?!y,v,v   ,v,v,m,r,v,!y,*x")
+   (match_operand:MMXMODE 1 "nonimmediate_or_sse_const_operand"
+"rCo,rC,C,rm,rC,,C  ,!y,m  ,?!y,?!y,r  ,C,,v,m,v,v,r,*x,!y"))]
   "(TARGET_MMX || TARGET_MMX_WITH_SSE)
&& !(MEM_P (operands[0]) && MEM_P (operands[1]))
&& ix86_hardreg_mov_ok (operands[0], operands[1])"
@@ -230,31 +237,31 @@ (define_insn "*mov_internal"
   [(set (attr "isa")
  (cond [(eq_attr "alternative" "0,1")
  (const_string "nox64")
-   (eq_attr "alternative" "2,3,4,9,10")
+   (eq_attr "alternative" "2,3,4,10,11")
  (const_string "x64")
-   (eq_attr "alternative" "15,16")
+   (eq_attr "alternative" "5,17,18")
  (const_string "x64_sse2")
-   (eq_attr "alternative" "17,18")
+   (eq_attr "alternative" "13,19,20")
  (const_string "sse2")
   ]
   (const_string "*")))
(set (attr "type")
  (cond [(eq_attr "alternative" "0,1")
  (const_string "multi")
-   (eq_attr "alternative" "2,3,4")
+

Re: [PATCH v3 2/9] opts: use uint64_t for sanitizer flags

2025-07-11 Thread Claudiu Zissulescu-Ianculescu

Hi,
> 
> Currently, the data type of sanitizer flags is unsigned int, with
> SANITIZE_SHADOW_CALL_STACK (1UL << 31) being the highest individual
> enumerator for enum sanitize_code.  Use the 'uint64_t' data type to allow
> for more distinct instrumentation modes to be added when needed.
> 
> 
> 
> I have not looked yet, but does it make sense to use `unsigned
> HOST_WIDE_INT` instead of uint64_t? HWI should be the same as uint64_t,
> but it is more consistent with the rest of gcc.
> Plus, tree_to_uhwi would be more consistent there.
> 
That was in v2; however, the reviewers suggested using uint64_t.

Best wishes,
Claudiu
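To illustrate the width problem described in the quoted commit message: with bit 31 already taken, a new instrumentation mode would need bit 32, which an `unsigned int` (32 bits on GCC's usual hosts) cannot hold. The flag names and helpers below are hypothetical and are not GCC's `enum sanitize_code` values; `uint64_t` is used as the patch proposes.

```c
#include <stdint.h>

/* Hypothetical flag values, for illustration only.  */
#define SAN_FLAG_BIT31 (UINT64_C (1) << 31)  /* highest bit a 32-bit type holds */
#define SAN_FLAG_BIT32 (UINT64_C (1) << 32)  /* needs a 64-bit type */

/* Return nonzero if FLAG is set in FLAGS.  */
int
san_flag_set_p (uint64_t flags, uint64_t flag)
{
  return (flags & flag) != 0;
}

/* Return nonzero if FLAGS survives a round trip through unsigned int,
   i.e. whether a 32-bit field would have been wide enough (assumes a
   32-bit unsigned int).  */
int
fits_in_unsigned_int_p (uint64_t flags)
{
  return (uint64_t) (unsigned int) flags == flags;
}
```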

Re: [PATCH v4 0/1] Add warnings of potentially-uninitialized padding bits

2025-07-11 Thread Christopher Bazley


Ping.

Thanks,
Chris

On 23/06/2025 14:48, Christopher Bazley wrote:

Dear GCC Developers,

I previously received comments from Joseph and Jakub, which I believe 
I addressed more than a month ago.

Please could someone review version 4?

Thanks,

Chris

On 21/05/2025 16:13, Christopher Bazley wrote:

Commit 0547dbb725b reduced the number of cases in which
union padding bits are zeroed when the relevant language
standard does not strictly require it, unless gcc was
invoked with -fzero-init-padding-bits=unions or
-fzero-init-padding-bits=all in order to explicitly
request zeroing of padding bits.

This commit adds a closely related warning,
-Wzero-init-padding-bits=, which is intended to help
programmers to find code that might now need to be
rewritten or recompiled with
-fzero-init-padding-bits=unions or
-fzero-init-padding-bits=all in order to replicate
the behaviour that it had when compiled by older
versions of GCC. It can also be used to find struct
padding that was never previously guaranteed to be
zero initialized and still isn't unless GCC is
invoked with the -fzero-init-padding-bits=all option.

The new warning can be set to the same three states
as -fzero-init-padding-bits ('standard', 'unions'
or 'all') and has the same default value ('standard').

The two options interact as follows:

              f: standard   f: unions   f: all
w: standard   X             X           X
w: unions     U             X           X
w: all        A             S           X

X = No warnings about padding
U = Warnings about padding of unions.
S = Warnings about padding of structs.
A = Warnings about padding of structs and unions.
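For readers unfamiliar with the two kinds of padding the table distinguishes (U for unions, S for structs), here is a hypothetical C illustration. It does not claim that -Wzero-init-padding-bits= warns for these functions: as explained in the following paragraph, that depends on the optimisation level and on whether the initializer is dropped to memory. The padding sizes assume a common ABI such as x86-64 or AArch64, where int is 4-byte aligned.

```c
#include <stddef.h>

struct s { char c; int i; };   /* padding bytes between c and i */
union u { char c; int i; };    /* bytes past c are padding when c is
                                  the initialized member */

/* Initialize only the first union member; the remaining bytes of X
   are union padding with respect to that member.  */
union u
make_union (void)
{
  union u x = { .c = 1 };
  return x;
}

/* Initialize every struct member; the bytes between C and I are
   struct padding that the initializer does not name.  */
struct s
make_struct (void)
{
  struct s x = { .c = 1, .i = 2 };
  return x;
}

/* Number of padding bytes between s.c and s.i (3 on x86-64/AArch64).  */
size_t
struct_padding_after_c (void)
{
  return offsetof (struct s, i) - (offsetof (struct s, c) + sizeof (char));
}
```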

The level of optimisation and whether or not the
entire initializer is dropped to memory can both
affect whether warnings are produced when compiling
a given program. This is intentional, since tying
the warnings more closely to the relevant language
standard would require a very different approach
that would still be target-dependent, might impose
an unacceptable burden on programmers, and would
risk not satisfying the intended use-case (which
is closely tied to a specific optimisation).

Bootstrapped the compiler and tested on AArch64
and x86-64 using some new tests for
-Wzero-init-padding-bits and the existing tests
for -fzero-init-padding-bits
(check-gcc RUNTESTFLAGS="dg.exp=*-empty-init-*.c").

Base commit is a470433732e77ae29a717cf79049ceeea3cbe979

Changes in v2:
  - Added missing changelog entry.

Changes in v3:
  - Modified two tests in which I had neglected to
    ensure that initializers were not compile time
    constants. This policy prevents the entire
    initializer being dropped to memory, which
    would otherwise prevent the expected diagnostic
    message from being produced.
  - Amended the diagnostic message from "Padding bits
    might not..." to "padding might not..."

Changes in v4:
  - Removed redundant braces.
  - Added "if code relies on it being zero," to the
    diagnostic message.

Link to v1:
https://inbox.sourceware.org/gcc-patches/20250520104940.3546-1-chris.baz...@arm.com/

Link to v2:
https://inbox.sourceware.org/gcc-patches/20250520144524.5968-1-chris.baz...@arm.com/

Link to v3:
https://inbox.sourceware.org/gcc-patches/20250521124745.24592-1-chris.baz...@arm.com/

Christopher Bazley (1):
   Add warnings of potentially-uninitialized padding bits

  gcc/common.opt    |  4 +
  gcc/doc/invoke.texi   | 85 ++-
  gcc/expr.cc   | 41 -
  gcc/expr.h    |  7 +-
  gcc/gimplify.cc   | 27 +-
  gcc/testsuite/gcc.dg/c23-empty-init-warn-1.c  | 68 +++
  gcc/testsuite/gcc.dg/c23-empty-init-warn-10.c |  8 ++
  gcc/testsuite/gcc.dg/c23-empty-init-warn-11.c |  8 ++
  gcc/testsuite/gcc.dg/c23-empty-init-warn-12.c |  8 ++
  gcc/testsuite/gcc.dg/c23-empty-init-warn-13.c |  8 ++
  gcc/testsuite/gcc.dg/c23-empty-init-warn-14.c |  8 ++
  gcc/testsuite/gcc.dg/c23-empty-init-warn-15.c |  8 ++
  gcc/testsuite/gcc.dg/c23-empty-init-warn-16.c |  8 ++
  gcc/testsuite/gcc.dg/c23-empty-init-warn-17.c | 51 +++
  gcc/testsuite/gcc.dg/c23-empty-init-warn-2.c  | 69 +++
  gcc/testsuite/gcc.dg/c23-empty-init-warn-3.c  |  7 ++
  gcc/testsuite/gcc.dg/c23-empty-init-warn-4.c  | 69 +++
  gcc/testsuite/gcc.dg/c23-empty-init-warn-5.c  |  8 ++
  gcc/testsuite/gcc.dg/c23-empty-init-warn-6.c  |  8 ++
  gcc/testsuite/gcc.dg/c23-empty-init-warn-7.c  |  8 ++
  gcc/testsuite/gcc.dg/c23-empty-init-warn-8.c  |  8 ++
  gcc/testsuite/gcc.dg/c23-empty-init-warn-9.c  | 69 +++
  .../gcc.dg/gnu11-empty-init-warn-1.c  | 52 
  .../gcc.dg/gnu11-empty-init-warn-10.c |  8 ++
  .../gcc.dg/gnu11-empty-init-warn-11.c |  8 ++
  .../gcc.dg/gnu11-empty-init-warn-12.c |  8 ++
  .../gcc.dg/gnu11-empty-init-warn-13.c |  8 ++
  .../gcc.dg/gnu11-empty-init-warn-14.c

Re: [PATCH] c++, libstdc++, v5: Implement C++26 P3068R5 - constexpr exceptions [PR117785]

2025-07-11 Thread Jonathan Wakely

I think we want something like this:

--- a/libstdc++-v3/libsupc++/exception_ptr.h
+++ b/libstdc++-v3/libsupc++/exception_ptr.h
@@ -297,10 +297,13 @@ namespace std _GLIBCXX_VISIBILITY(default)
  /// Obtain an exception_ptr pointing to a copy of the supplied object.
#if (__cplusplus >= 201103L && __cpp_rtti) || __cpp_exceptions
  template
-_GLIBCXX26_CONSTEXPR exception_ptr
+#if defined __cpp_exceptions && defined __cpp_constexpr_exceptions
+constexpr
+#endif
+exception_ptr
make_exception_ptr(_Ex __ex) _GLIBCXX_USE_NOEXCEPT
{
-#if __cplusplus >= 202400L
+#if defined __cpp_exceptions && defined __cpp_constexpr_exceptions
  if consteval {
   try
 {

Otherwise I see this when testing with GLIBCXX_TESTSUITE_STDS=26

FAIL: 17_intro/headers/c++1998/all_no_exceptions.cc  -std=gnu++26
(test for excess errors)

The test just tries to include all headers using -fno-exceptions

Re: [PATCH] tree-optimization/120939 - remove uninitialized use of LOOP_VINFO_COST_MODEL_THRESHOLD

2025-07-11 Thread Richard Biener

On Thu, 10 Jul 2025, Richard Sandiford wrote:

> Richard Biener  writes:
> > The following removes an optimization that wrongly triggers right now
> > because it accesses LOOP_VINFO_COST_MODEL_THRESHOLD which might not be
> > computed yet.
> >
> > Testing on x86_64 didn't reveal any testsuite coverage.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > OK?
> >
> > PR tree-optimization/120939
> > * tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p):
> > Remove eliding an epilogue based on not computed
> > LOOP_VINFO_COST_MODEL_THRESHOLD.
> 
> This regresses:
> 
> FAIL: gcc.dg/torture/pr113026-1.c   -O3 -fomit-frame-pointer -funroll-loops 
> -fpeel-loops -ftracer -finline-functions   (test for bogus messages, line 10)
> 
> on aarch64-linux-gnu, with:
> 
> .../pr113026-1.c:10:12: warning: writing 1 byte into a region of size 0 
> [-Wstringop-overflow=]
> .../pr113026-1.c:4:6: note: at offset 16 into destination object 'dst' of 
> size 16
> 
> I haven't looked into why yet, but it does seem superficially similar
> to PR60505, which was what this code seems to have been added to fix
> (g:090cd8dc70b80183c83d9f43f1e6ab9970481efd).

Yes.  I now also see the above diagnostic (not sure why it escaped me).

Now, the issue is this elides creating a vector epilog with 8 byte
vectors so we never vectorize when n is in [8, 16].  With the proposed
patch we'd again do that, but we also have a scalar epilog then
and warn.

The fact is that neither LOOP_VINFO_COST_MODEL_THRESHOLD nor
LOOP_VINFO_VERSIONING_THRESHOLD is initialized when we apply
this heuristic.  The actual versioning condition uses
LOOP_VINFO_VERSIONING_THRESHOLD (when not zero); in addition, when
the cost model threshold and the versioning threshold are not ordered
and vect_apply_runtime_profitability_check_p holds, we apply
LOOP_VINFO_COST_MODEL_THRESHOLD as well (see vect_loop_versioning).

So besides the above mentioned missed optimization caused by
the heuristic I fear there is the chance of wrong code or at
least we are applying the heuristic at a wrong place.  It
was also added before we had any epilogue vectorization.
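For readers less familiar with the terminology, a hand-written C analogue of a vectorized loop with a scalar epilogue is sketched below. `VF`, `copy_bytes` and `needs_epilogue_p` are hypothetical; the inner loop merely stands in for one vector operation, and `needs_epilogue_p` models only the `tree_ctz (NITERS) < exact_log2 (VF)` test that the patch keeps (for a known iteration count and a power-of-two VF), not the removed threshold logic.

```c
#include <stddef.h>

/* Hypothetical vectorization factor: elements per "vector" iteration.  */
#define VF 8

/* Main loop handles whole groups of VF elements; the scalar epilogue
   handles the NITERS % VF leftover iterations.  */
void
copy_bytes (unsigned char *dst, const unsigned char *src, size_t niters)
{
  size_t i = 0;
  for (; i + VF <= niters; i += VF)   /* stand-in for the vector body */
    for (size_t j = 0; j < VF; j++)
      dst[i + j] = src[i + j];
  for (; i < niters; i++)             /* scalar epilogue */
    dst[i] = src[i];
}

/* For a known NITERS and power-of-two VF, tree_ctz (NITERS) <
   exact_log2 (VF) holds exactly when NITERS is not a multiple of VF
   (NITERS == 0 needs no epilogue).  */
int
needs_epilogue_p (size_t niters, size_t vf)
{
  return niters % vf != 0;
}
```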

What happens for the diagnostic to re-appear is that we
end up with unrolled scalar epilogues we diagnose.

I'll re-spin with an added xfail on this testcase and re-open
the bugreport?

Richard.


> Thanks,
> Richard
> 
> > ---
> >  gcc/tree-vect-loop.cc | 21 ++---
> >  1 file changed, 2 insertions(+), 19 deletions(-)
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 46a6399243d..7ac61d4dce2 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -1224,13 +1224,6 @@ static bool
> >  vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo)
> >  {
> >unsigned HOST_WIDE_INT const_vf;
> > -  HOST_WIDE_INT max_niter
> > -= likely_max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
> > -
> > -  unsigned th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
> > -  if (!th && LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo))
> > -th = LOOP_VINFO_COST_MODEL_THRESHOLD (LOOP_VINFO_ORIG_LOOP_INFO
> > - (loop_vinfo));
> >  
> >if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> >&& LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
> > @@ -1250,18 +1243,8 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo)
> >  VF * N + 1.  That's something of a niche case though.  */
> >|| LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> >|| !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&const_vf)
> > -  || ((tree_ctz (LOOP_VINFO_NITERS (loop_vinfo))
> > -  < (unsigned) exact_log2 (const_vf))
> > - /* In case of versioning, check if the maximum number of
> > -iterations is greater than th.  If they are identical,
> > -the epilogue is unnecessary.  */
> > - && (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
> > - || ((unsigned HOST_WIDE_INT) max_niter
> > - /* We'd like to use LOOP_VINFO_VERSIONING_THRESHOLD
> > -but that's only computed later based on our result.
> > -The following is the most conservative approximation.  */
> > - > (std::max ((unsigned HOST_WIDE_INT) th,
> > -  const_vf) / const_vf) * const_vf
> > +  || (tree_ctz (LOOP_VINFO_NITERS (loop_vinfo))
> > + < (unsigned) exact_log2 (const_vf)))
> >  return true;
> >  
> >return false;
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: Rewrite assign_discriminators pass

2025-07-11 Thread Jan Hubicka

> So with this the discriminator we assign might depend on whether
> we have debug stmts or not.  We output them only to debug info, so
> it should in principle not cause compare-debug issues, right?  And
> we don't use discriminators to affect code generation (hopefully).

This is the reason for the opts.cc change:
> diff --git a/gcc/opts.cc b/gcc/opts.cc
> index 6ca1ec7e865..60ad633b7ff 100644
> --- a/gcc/opts.cc
> +++ b/gcc/opts.cc
> @@ -1411,11 +1411,14 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
>   opts->x_debug_info_level = DINFO_LEVEL_NONE;
>  }
>  
> +  /* Also enable markers with -fauto-profile even when debug info is disabled,
> + so we assign same discriminators and can read back the profile info.  */
>if (!opts_set->x_debug_nonbind_markers_p)
>  opts->x_debug_nonbind_markers_p
>= (opts->x_optimize
> -  && opts->x_debug_info_level >= DINFO_LEVEL_NORMAL
> -  && (dwarf_debuginfo_p (opts) || codeview_debuginfo_p ())
> +  && ((opts->x_debug_info_level >= DINFO_LEVEL_NORMAL
> +   && (dwarf_debuginfo_p (opts) || codeview_debuginfo_p ()))
> +  || opts->x_flag_auto_profile)
>&& !(opts->x_flag_selective_scheduling
> || opts->x_flag_selective_scheduling2));

We only consume discriminators if we produce dwarf or if we do
auto-profile, and they indeed must agree.  With -Wauto-profile the
compiler now complains if they do not.

I enable debug stmt markers in both cases, so discriminators should be
the same.  I tested that on spec and it seems to work. As discussed on
IRC I will look into possibility of enabling compare_debug for
profiledbootstrap and autoprofiledbootstrap where it is currently off.
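The default that the opts.cc hunk above computes for debug_nonbind_markers_p (applied only when the option was not set explicitly) can be restated as a standalone C predicate. This is a paraphrase for illustration only: the parameter names are hypothetical stand-ins for the opts->x_* fields, and the two selective-scheduling flags are folded into one.

```c
#include <stdbool.h>

/* Paraphrase of the patched default: markers are enabled when
   optimizing and either full DWARF/CodeView debug info or
   -fauto-profile is in use, unless selective scheduling is on.  */
bool
default_nonbind_markers_p (bool optimize, bool debug_level_normal_or_more,
                           bool dwarf_or_codeview, bool auto_profile,
                           bool selective_scheduling)
{
  return optimize
         && ((debug_level_normal_or_more && dwarf_or_codeview)
             || auto_profile)
         && !selective_scheduling;
}
```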

We already have a function to remove debug statements, so I guess we could
remove them after the auto-profile annotate pass at -O0; I plan to look
into that incrementally now.  My immediate plan is to fix the create_gcov
consumer, so things can finally be properly tested.

Honza

GCC 15.1.1 Status Report (2025-07-11)

2025-07-11 Thread Richard Biener



The releases/gcc-15 branch is open for regression and documentation fixes.
This is now the time to prepare for the GCC 15.2 release - a release
candidate is planned for Friday Aug 1st, three weeks from now, with
the GCC 15.2 release following a week after that.

Please go over reported regressions for your target and maintenance
area and see which ones can be fixed and/or backported from trunk.  For
GCC 15.2 we are more permissive with what kind of fixes we allow, esp.
it is still possible to resolve missed-optimization regressions.


Quality Data
============

Priority          #   Change from last report
--------        ---   -----------------------
P1                1   +   1
P2              596   +  16
P3              185   +  84
P4              236   -   3
P5               23
--------        ---   -----------------------
Total P1-P3     782   + 101
Total          1041   +  98


Previous Report
===============

https://gcc.gnu.org/pipermail/gcc/2025-April/245972.html

Re: [PATCH] tree-optimization/120939 - remove uninitialized use of LOOP_VINFO_COST_MODEL_THRESHOLD

2025-07-11 Thread Richard Sandiford

Richard Biener  writes:
> On Thu, 10 Jul 2025, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > The following removes an optimization that wrongly triggers right now
>> > because it accesses LOOP_VINFO_COST_MODEL_THRESHOLD which might not be
>> > computed yet.
>> >
>> > Testing on x86_64 didn't reveal any testsuite coverage.
>> >
>> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> >
>> > OK?
>> >
>> >PR tree-optimization/120939
>> >* tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p):
>> >Remove eliding an epilogue based on not computed
>> >LOOP_VINFO_COST_MODEL_THRESHOLD.
>> 
>> This regresses:
>> 
>> FAIL: gcc.dg/torture/pr113026-1.c   -O3 -fomit-frame-pointer -funroll-loops 
>> -fpeel-loops -ftracer -finline-functions   (test for bogus messages, line 10)
>> 
>> on aarch64-linux-gnu, with:
>> 
>> .../pr113026-1.c:10:12: warning: writing 1 byte into a region of size 0 
>> [-Wstringop-overflow=]
>> .../pr113026-1.c:4:6: note: at offset 16 into destination object 'dst' of 
>> size 16
>> 
>> I haven't looked into why yet, but it does seem superficially similar
>> to PR60505, which was what this code seems to have been added to fix
>> (g:090cd8dc70b80183c83d9f43f1e6ab9970481efd).
>
> Yes.  I now also see the above diagnostic (not sure why it escaped me).
>
> Now, the issue is this elides creating a vector epilog with 8 byte
> vectors so we never vectorize when n is in [8, 16].  With the proposed
> patch we'd again do that, but we also have a scalar epilog then
> and warn.
>
> The fact is that neither LOOP_VINFO_COST_MODEL_THRESHOLD nor
> LOOP_VINFO_VERSIONING_THRESHOLD is initialized when we apply
> this heuristic.  The actual versioning condition uses
> LOOP_VINFO_VERSIONING_THRESHOLD (when not zero); in addition, when
> the cost model threshold and the versioning threshold are not ordered
> and vect_apply_runtime_profitability_check_p holds, we apply
> LOOP_VINFO_COST_MODEL_THRESHOLD as well (see vect_loop_versioning).
>
> So besides the above mentioned missed optimization caused by
> the heuristic I fear there is the chance of wrong code or at
> least we are applying the heuristic at a wrong place.  It
> was also added before we had any epilogue vectorization.

Hmm, yeah.  Seems like this went wrong with
g:052204fac580b21c967e57e6285d99a9828b8fac.  Before that, the test
was only applied after vect_analyze_loop_costing.

> What happens for the diagnostic to re-appear is that we
> end up with unrolled scalar epilogues we diagnose.
>
> I'll re-spin with an added xfail on this testcase and re-open
> the bugreport?

Sounds good to me FWIW.

Thanks,
Richard

RE: [PATCH v1 1/2] Match: Leverage BITS_PER_WORD for unsigned SAT_MUL pattern

2025-07-11 Thread Li, Pan2

> A widen-mul could also be a QImode x QImode -> HImode operation, or a
> QImode x QImode -> SImode operation.  The only restriction is that the
> result is at least twice as wide as the inputs.

I see, that makes sense; it looks like what we do from truncate. Will update in v2.

Pan

-----Original Message-----
From: Richard Biener  
Sent: Friday, July 11, 2025 4:49 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com; Chen, Ken ; 
Liu, Hongtao 
Subject: Re: [PATCH v1 1/2] Match: Leverage BITS_PER_WORD for unsigned SAT_MUL 
pattern

On Fri, Jul 11, 2025 at 9:13 AM Li, Pan2  wrote:
>
> Thanks Richard for comments.
>
> > Why is it important to constrain the widen-mult input to a
> > fixed precision at all?
>
> I suppose widen-mult only occurs when the result exceeds the max bits of the GPR.
> So, here I would like to make sure the precision matches the bits of the GPR.
>
> For rv32 with 32-bits gpr, 32 bits * 32 bits => 64 bits
>
>   15   │   _1 = (long long unsigned int) a_4(D);
>   16   │   _2 = (long long unsigned int) b_5(D);
>   17   │   _9 = (unsigned int) _1;
>   18   │   _10 = (unsigned int) _2;
>   19   │   x_6 = _9 w* _10;
>
> For rv64 with 64-bits gpr, 64 bits * 64 bits => 128 bits
>
>   15   │   _1 = (__int128 unsigned) a_4(D);
>   16   │   _2 = (__int128 unsigned) b_5(D);
>   17   │   _9 = (unsigned long) _1;
>   18   │   _10 = (unsigned long) _2;
>   19   │   x_6 = _9 w* _10;
>
> But if it is a widen-mul, it looks like the operands of the widen-mul should
> already be the max bits of the GPR.
> Otherwise it will be a normal mul, and then it looks like we don't need to
> check that anymore.  Am I understanding correctly?

A widen-mul could also be a QImode x QImode -> HImode operation, or a
QImode x QImode -> SImode operation.  The only restriction is that the
result is at least twice as wide as the inputs.

Richard.

>
> Pan
>
> -----Original Message-----
> From: Richard Biener 
> Sent: Friday, July 11, 2025 2:23 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> jeffreya...@gmail.com; rdapp@gmail.com; Chen, Ken ; 
> Liu, Hongtao 
> Subject: Re: [PATCH v1 1/2] Match: Leverage BITS_PER_WORD for unsigned 
> SAT_MUL pattern
>
> On Fri, Jul 11, 2025 at 6:51 AM  wrote:
> >
> > From: Pan Li 
> >
> > The widen mul has a different source type on different platforms,
> > like rv32 or rv64.  For rv32, the source of widen mul is 32 bits,
> > while it is 64 bits on rv64.  Thus, leveraging HOST_WIDE_INT is not
> > correct and results in pattern match failures on 32-bit systems
> > like rv32.
> >
> > Thus, leverage BITS_PER_WORD instead for this pattern.
> >
> > gcc/ChangeLog:
> >
> > * match.pd: Leverage BITS_PER_WORD instead of HOST_WIDE_INT
> > for widen mul precision check.
> >
> > Signed-off-by: Pan Li 
> > ---
> >  gcc/match.pd | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 67b33eee5f7..7f31705b652 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3605,11 +3605,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >unsigned widen_prec = TYPE_PRECISION (TREE_TYPE (@3));
> >unsigned cvt5_prec = TYPE_PRECISION (TREE_TYPE (@5));
> >unsigned cvt6_prec = TYPE_PRECISION (TREE_TYPE (@6));
> > -  unsigned hw_int_prec = sizeof (HOST_WIDE_INT) * 8;
> >wide_int c2 = wi::to_wide (@2);
> >wide_int max = wi::mask (prec, false, widen_prec);
> >bool c2_is_max_p = wi::eq_p (c2, max);
> > -  bool widen_mult_p = cvt5_prec == cvt6_prec && hw_int_prec == cvt5_prec;
> > +  bool widen_mult_p = cvt5_prec == cvt6_prec && BITS_PER_WORD == cvt5_prec;
>
> Why is it important to constrain the widen-mult input to a
> fixed precision at all?
>
> >   }
> >   (if (widen_prec > prec && c2_is_max_p && widen_mult_p)
> >  )
> > --
> > 2.43.0
> >
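For context, here is a minimal C sketch of the unsigned saturating multiply this pattern is meant to recognize: compute the exact product in a type twice as wide as the operands, then clamp it to the narrow type's maximum (compare `c2_is_max_p` in the hunk above). The names are hypothetical, and whether match.pd recognizes this exact form as SAT_MUL on a given target is precisely what the thread is discussing; the 8-bit variant reflects Richard's point that the operands need not be word-sized.

```c
#include <stdint.h>

/* 32 x 32 -> 64-bit widening multiply, clamped to UINT32_MAX.  */
uint32_t
sat_mul_u32 (uint32_t a, uint32_t b)
{
  uint64_t x = (uint64_t) a * (uint64_t) b;
  return x > UINT32_MAX ? UINT32_MAX : (uint32_t) x;
}

/* 8 x 8 -> 16-bit widening multiply, clamped to UINT8_MAX.  */
uint8_t
sat_mul_u8 (uint8_t a, uint8_t b)
{
  uint16_t x = (uint16_t) ((uint16_t) a * (uint16_t) b);
  return x > UINT8_MAX ? UINT8_MAX : (uint8_t) x;
}
```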

RE: [PATCH] i386: Add a new peeophole2 for PR91384 under APX_F

2025-07-11 Thread Liu, Hongtao




> -----Original Message-----
> From: Hu, Lin1 
> Sent: Wednesday, June 4, 2025 3:26 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH] i386: Add a new peeophole2 for PR91384 under APX_F
> 
> gcc/ChangeLog:
> 
>   PR target/91384
>   * config/i386/i386.md: Add new peephole2 to optimize *negsi_1
>   followed by *cmpsi_ccno_1 with APX_F.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/91384
>   * gcc.target/i386/pr91384-1.c: New test.

Ok for the trunk.

> ---
>  gcc/config/i386/i386.md   | 11 +++
>  gcc/testsuite/gcc.target/i386/pr91384-1.c | 20 
>  2 files changed, 31 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr91384-1.c
> 
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index b7a18d583da..6f87606b02b 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -14572,6 +14572,17 @@ (define_peephole2
>  (compare:CCZ (neg:SWI (match_dup 0)) (const_int 0)))
> (set (match_dup 0) (neg:SWI (match_dup 0)))])])
> 
> +;; Optimize *negsi_1 followed by *cmpsi_ccno_1 (PR target/91384) with APX_F
> +(define_peephole2
> +  [(parallel [(set (match_operand:SWI 0 "general_reg_operand")
> +(neg:SWI (match_operand:SWI 1 "general_reg_operand")))
> +   (clobber (reg:CC FLAGS_REG))])
> +   (set (reg:CCZ FLAGS_REG) (compare:CCZ (match_dup 1) (const_int 0)))]
> +  "TARGET_APX_NDD"
> +  [(parallel [(set (reg:CCZ FLAGS_REG)
> +(compare:CCZ (neg:SWI (match_dup 1)) (const_int 0)))
> +   (set (match_dup 0) (neg:SWI (match_dup 1)))])])
> +
>  ;; Special expand pattern to handle integer mode abs
> 
>  (define_expand "abs2"
> diff --git a/gcc/testsuite/gcc.target/i386/pr91384-1.c b/gcc/testsuite/gcc.target/i386/pr91384-1.c
> new file mode 100644
> index 000..4f8823d6da2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr91384-1.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mapxf" } */
> +
> +void foo (void);
> +void bar (void);
> +
> +int
> +test (int a)
> +{
> +  int r;
> +
> +  if (r = -a)
> +foo ();
> +  else
> +bar ();
> +
> +  return r;
> +}
> +
> +/* { dg-final { scan-assembler-not "testl" } } */
> --
> 2.31.1
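The new peephole2 replaces a zero test of the negation's source with the flags produced by the negation itself. A minimal C++ sketch of why that is safe for a CCZ (zero-flag-only) compare, assuming 32-bit two's-complement values: -a is zero exactly when a is zero, including for INT32_MIN, which negates to itself. The helper names are hypothetical, and the argument covers only ZF; it makes no claim that other flags such as CF would match.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper. Negation is done in unsigned arithmetic so that
// negating 0x80000000 (INT32_MIN) is well defined: it wraps to itself.
std::uint32_t neg32(std::uint32_t a) { return 0u - a; }

// Zero flag as set by "neg" (computed from the result) ...
bool zf_after_neg(std::uint32_t a) { return neg32(a) == 0u; }

// ... versus the zero flag from testing the source operand against 0.
bool zf_after_test_src(std::uint32_t a) { return a == 0u; }
```

For every input the two predicates agree, which is the property a CCZ-mode rewrite needs.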

Re: [Patch, fortran] PR106135 - Implement F2018 IMPORT statements

2025-07-11 Thread Paul Richard Thomas

Thanks, Jerry.

Pushed as r16-2189. Note however that s/pr106135/pr106035 is required
throughout. I will attend to it tomorrow.

Paul


On Mon, 23 Jun 2025 at 19:27, Jerry D  wrote:

> On 6/23/25 9:43 AM, Paul Richard Thomas wrote:
> > Hello All,
> >
> > I was mulling over the F2018 status of gfortran, when I came across the
> > additions to the IMPORT statement. This seemed like such a useful
> > addition to Fortran that I set about an implementation, thinking that
> > this would be low-hanging fruit. Parsing and checking the
> > constraints C897-8100 turned out to be straightforward. C8101 was
> > already implemented for F2008 IMPORT. C8102 required a lot more work!
> > (Please see the patch for the constraints.)
> >
> > Steve K got in touch, when he found out that we had been working in
> > parallel on the new IMPORT features. Thus encouraged by our exchanges, I
> > ground on until the patch reached its present state. I think that the
> > ChangeLog is clear enough, even if the patch came out a bit long-winded.
> >
> > Of the existing IMPORT tests, only import3.f90 needed modification by
> > setting -std=f2008 because of the change in the wording of the error
> > messages. The new test, import12.f90, is complete IMHO but I am open to
> > suggestions for additions. I cannot return to working on this until the
> > second week of July so you have plenty of time to test and comment.
> >
> > Regtests fine with x86_64 on FC42. OK for mainline?
> >
> > Paul
>
> I do think the new test case is very thorough. I reviewed that patch for
> anything glaring and it looks good. I also tried it on some things here
> and got what I expected.
>
> If you and Steve have been in touch I think it is OK for mainline.
>
> Cheers,
>
> Jerry
>

[PATCH v3 5/5] riscv: testsuite: Fix misalignment check.

2025-07-11 Thread Robin Dapp

This fixes a thinko in the misalignment check.  If we want to check for
vector misalignment support, we need to load 16-bit elements, not
8-bit elements, which can never be misaligned.
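vle8.v loads 8-bit (1-byte) elements and vle16.v loads 16-bit (2-byte) elements, and the probe starts one byte past an aligned array (the addi a7,%0,1). A minimal C++ sketch of the element-alignment rule, with a hypothetical helper name, shows why only the wider element size can be misaligned at that address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: an access is element-aligned when its address is a
// multiple of the element size in bytes.
bool element_aligned(std::uintptr_t addr, std::size_t elem_bytes)
{
  return addr % elem_bytes == 0;
}
```

Any address is aligned for 1-byte elements, so an e8 probe can never observe a misaligned access.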

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Fix misalignment check.
---
 gcc/testsuite/lib/target-supports.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 9ab46a0eab4..46c7ab3d2ab 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2428,7 +2428,7 @@ proc check_effective_target_riscv_v_misalign_ok { } {
= {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
  asm ("vsetivli zero,7,e8,m1,ta,ma");
  asm ("addi a7,%0,1" : : "r" (a) : "a7" );
- asm ("vle8.v v8,0(a7)" : : : "v8");
+ asm ("vle16.v v8,0(a7)" : : : "v8");
  return 0; } } "-march=${gcc_march}"] } {
return 1
 }
-- 
2.50.0

[PATCH v3 1/5] ifn: Add helper functions for gather/scatter.

2025-07-11 Thread Robin Dapp

This patch adds access helpers for the gather/scatter offset and scale
parameters.
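Centralizing the argument positions means that a later change to the argument layout only has to touch the helpers, not every caller. A simplified C++ sketch of that pattern; ToyIfn, ToyCall and the toy_* functions are hypothetical stand-ins rather than GCC's types, and the indices 1 and 2 mirror the values returned in this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for a few internal_fn codes.
enum class ToyIfn { GatherLoad, MaskGatherLoad, ScatterStore, MaskStore };

// Toy version of internal_gather_scatter_fn_p.
bool toy_gather_scatter_p(ToyIfn fn)
{
  return fn == ToyIfn::GatherLoad || fn == ToyIfn::MaskGatherLoad
         || fn == ToyIfn::ScatterStore;
}

// The only places that know where offset and scale live; -1 means "none".
int toy_offset_index(ToyIfn fn) { return toy_gather_scatter_p(fn) ? 1 : -1; }
int toy_scale_index(ToyIfn fn) { return toy_gather_scatter_p(fn) ? 2 : -1; }

// Toy call: the function code plus its argument list.
struct ToyCall
{
  ToyIfn fn;
  std::vector<long> args;
};

// Callers ask the helpers instead of hard-coding 1 and 2
// (at() throws if the index is -1, i.e. not a gather/scatter).
long toy_get_offset(const ToyCall &c)
{
  return c.args.at(static_cast<std::size_t>(toy_offset_index(c.fn)));
}

long toy_get_scale(const ToyCall &c)
{
  return c.args.at(static_cast<std::size_t>(toy_scale_index(c.fn)));
}
```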

gcc/ChangeLog:

* internal-fn.cc (expand_scatter_store_optab_fn): Use new
function.
(expand_gather_load_optab_fn): Ditto.
(internal_fn_offset_index): Ditto.
(internal_fn_scale_index): Ditto.
* internal-fn.h (internal_fn_offset_index): New function.
(internal_fn_scale_index): Ditto.
* optabs-query.cc (supports_vec_gather_load_p): Adjust index.
* tree-vect-data-refs.cc (vect_describe_gather_scatter_call):
Use new function.
---
 gcc/internal-fn.cc | 57 ++
 gcc/internal-fn.h  |  2 ++
 gcc/tree-vect-data-refs.cc |  6 ++--
 3 files changed, 58 insertions(+), 7 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 044bdc22481..4a9dc26e836 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3652,8 +3652,8 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
   internal_fn ifn = gimple_call_internal_fn (stmt);
   int rhs_index = internal_fn_stored_value_index (ifn);
   tree base = gimple_call_arg (stmt, 0);
-  tree offset = gimple_call_arg (stmt, 1);
-  tree scale = gimple_call_arg (stmt, 2);
+  tree offset = gimple_call_arg (stmt, internal_fn_offset_index (ifn));
+  tree scale = gimple_call_arg (stmt, internal_fn_scale_index (ifn));
   tree rhs = gimple_call_arg (stmt, rhs_index);
 
   rtx base_rtx = expand_normal (base);
@@ -3678,12 +3678,12 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
 /* Expand {MASK_,}GATHER_LOAD call CALL using optab OPTAB.  */
 
 static void
-expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
+expand_gather_load_optab_fn (internal_fn ifn, gcall *stmt, direct_optab optab)
 {
   tree lhs = gimple_call_lhs (stmt);
   tree base = gimple_call_arg (stmt, 0);
-  tree offset = gimple_call_arg (stmt, 1);
-  tree scale = gimple_call_arg (stmt, 2);
+  tree offset = gimple_call_arg (stmt, internal_fn_offset_index (ifn));
+  tree scale = gimple_call_arg (stmt, internal_fn_scale_index (ifn));
 
   rtx lhs_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
   rtx base_rtx = expand_normal (base);
@@ -5125,6 +5125,53 @@ internal_fn_stored_value_index (internal_fn fn)
 }
 }
 
+/* If FN is a gather/scatter return the index of its offset argument,
+   otherwise return -1.  */
+
+int
+internal_fn_offset_index (internal_fn fn)
+{
+  if (!internal_gather_scatter_fn_p (fn))
+return -1;
+
+  switch (fn)
+{
+case IFN_GATHER_LOAD:
+case IFN_MASK_GATHER_LOAD:
+case IFN_MASK_LEN_GATHER_LOAD:
+case IFN_SCATTER_STORE:
+case IFN_MASK_SCATTER_STORE:
+case IFN_MASK_LEN_SCATTER_STORE:
+  return 1;
+
+default:
+  return -1;
+}
+}
+
+/* If FN is a gather/scatter return the index of its scale argument,
+   otherwise return -1.  */
+
+int
+internal_fn_scale_index (internal_fn fn)
+{
+  if (!internal_gather_scatter_fn_p (fn))
+return -1;
+
+  switch (fn)
+{
+case IFN_GATHER_LOAD:
+case IFN_MASK_GATHER_LOAD:
+case IFN_MASK_LEN_GATHER_LOAD:
+case IFN_SCATTER_STORE:
+case IFN_MASK_SCATTER_STORE:
+case IFN_MASK_LEN_SCATTER_STORE:
+  return 2;
+
+default:
+  return -1;
+}
+}
 
 /* Store all supported else values for the optab referred to by ICODE
in ELSE_VALS.  The index of the else operand must be specified in
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index afd4f8e64c7..c5b533c0abd 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -239,6 +239,8 @@ extern int internal_fn_mask_index (internal_fn);
 extern int internal_fn_len_index (internal_fn);
 extern int internal_fn_else_index (internal_fn);
 extern int internal_fn_stored_value_index (internal_fn);
+extern int internal_fn_offset_index (internal_fn fn);
+extern int internal_fn_scale_index (internal_fn fn);
 extern bool internal_gather_scatter_fn_supported_p (internal_fn, tree,
tree, tree, int,
vec * = nullptr);
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index c84cd29116e..5b2cb537438 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4539,10 +4539,12 @@ vect_describe_gather_scatter_call (stmt_vec_info stmt_info,
   info->ifn = gimple_call_internal_fn (call);
   info->decl = NULL_TREE;
   info->base = gimple_call_arg (call, 0);
-  info->offset = gimple_call_arg (call, 1);
+  info->offset = gimple_call_arg
+ (call, internal_fn_offset_index (info->ifn));
   info->offset_dt = vect_unknown_def_type;
   info->offset_vectype = NULL_TREE;
-  info->scale = TREE_INT_CST_LOW (gimple_call_arg (call, 2));
+  info->scale = TREE_INT_CST_LOW (gimple_call_arg
+ (call, internal_fn_scale_index (info->ifn)));
   info->element_type = TREE_TYPE (vectype

[PATCH v3 4/5] vect: Misalign checks for gather/scatter.

2025-07-11 Thread Robin Dapp

This patch adds simple misalignment checks for gather/scatter
operations.  Previously, we assumed that those perform element accesses
internally so alignment does not matter.  The riscv vector spec however
explicitly states that vector operations are allowed to fault on
element-misaligned accesses.  Reasonable uarchs won't, but...

For gather/scatter we have two paths in the vectorizer:

 (1) Regular analysis based on datarefs.  Here we can also create
 strided loads.
 (2) Non-affine access where each gather index is relative to the
 initial address.

The assumption this patch works off is that once the alignment for the
first scalar is correct, all others will fall in line, as the index is
always a multiple of the first element's size.

For (1) we have a dataref and can check it for alignment as in other
cases.  For (2) this patch checks the object alignment of BASE and
compares it against the natural alignment of the current vectype's unit.
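A minimal C++ sketch of the reasoning for path (2), assuming power-of-two element sizes and alignments, and assuming (as the patch states) that every byte offset is a multiple of the element size. The function names are hypothetical; the real check works on trees and vectypes, not raw integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Brute-force check: is every element address BASE + OFFSET aligned?
bool all_elements_aligned(std::uintptr_t base,
                          const std::vector<std::size_t> &byte_offsets,
                          std::size_t elem_bytes)
{
  for (std::size_t off : byte_offsets)
    if ((base + off) % elem_bytes != 0)
      return false;
  return true;
}

// The cheap test: with power-of-two sizes, a base alignment at least as
// large as the element size means every multiple-of-size offset stays aligned.
bool base_alignment_suffices(std::size_t known_base_align,
                             std::size_t elem_bytes)
{
  return known_base_align >= elem_bytes;
}
```

The second test case below shows why the multiple-of-size assumption matters: with an offset of 6 and 4-byte elements, an aligned base is not enough.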

The patch also adds a pointer argument to the gather/scatter IFNs that
contains the necessary alignment.  Most of the patch is thus mechanical
in that it merely adjusts indices.

I tested the riscv version with a custom qemu version that faults on
element-misaligned vector accesses.  With this patch applied, there is
just a single fault left, which is due to PR120782 and which will be
addressed separately.

Bootstrapped and regtested on x86 and aarch64.  Regtested on
rv64gcv_zvl512b with and without unaligned vector support.

gcc/ChangeLog:

* internal-fn.cc (internal_fn_len_index): Adjust indices for new
alias_ptr param.
(internal_fn_else_index): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Ditto.
(internal_fn_alias_ptr_index): Ditto.
(internal_fn_offset_index): Ditto.
(internal_fn_scale_index): Ditto.
(internal_gather_scatter_fn_supported_p): Ditto.
* optabs-query.cc (supports_vec_gather_load_p): Ditto.
* tree-vect-data-refs.cc (vect_check_gather_scatter): Add alias
pointer.
* tree-vect-patterns.cc (vect_recog_gather_scatter_pattern): Add
alias pointer.
* tree-vect-slp.cc (vect_get_operand_map): Adjust for alias
pointer.
* tree-vect-stmts.cc (vect_truncate_gather_scatter_offset): Add
alias pointer and misalignment handling.
(get_load_store_type): Move from here...
(get_group_load_store_type): ...to here.
(vectorizable_store): Add alias pointer.
(vectorizable_load): Ditto.
* tree-vectorizer.h (struct gather_scatter_info): Ditto.
---
 gcc/internal-fn.cc |  43 ++--
 gcc/internal-fn.h  |   1 +
 gcc/optabs-query.cc|   6 +-
 gcc/tree-vect-data-refs.cc |   7 ++
 gcc/tree-vect-patterns.cc  |  17 +--
 gcc/tree-vect-slp.cc   |  16 +--
 gcc/tree-vect-stmts.cc | 214 +++--
 gcc/tree-vectorizer.h  |   4 +
 8 files changed, 198 insertions(+), 110 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 4a9dc26e836..6c0155e4c63 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4940,11 +4940,13 @@ internal_fn_len_index (internal_fn fn)
   return 2;
 
 case IFN_MASK_LEN_SCATTER_STORE:
+  return 6;
+
 case IFN_MASK_LEN_STRIDED_LOAD:
   return 5;
 
 case IFN_MASK_LEN_GATHER_LOAD:
-  return 6;
+  return 7;
 
 case IFN_COND_LEN_FMA:
 case IFN_COND_LEN_FMS:
@@ -5048,7 +5050,7 @@ internal_fn_else_index (internal_fn fn)
 
 case IFN_MASK_GATHER_LOAD:
 case IFN_MASK_LEN_GATHER_LOAD:
-  return 5;
+  return 6;
 
 default:
   return -1;
@@ -5083,7 +5085,7 @@ internal_fn_mask_index (internal_fn fn)
 case IFN_MASK_SCATTER_STORE:
 case IFN_MASK_LEN_GATHER_LOAD:
 case IFN_MASK_LEN_SCATTER_STORE:
-  return 4;
+  return 5;
 
 case IFN_VCOND_MASK:
 case IFN_VCOND_MASK_LEN:
@@ -5108,10 +5110,11 @@ internal_fn_stored_value_index (internal_fn fn)
 
 case IFN_MASK_STORE:
 case IFN_MASK_STORE_LANES:
+  return 3;
 case IFN_SCATTER_STORE:
 case IFN_MASK_SCATTER_STORE:
 case IFN_MASK_LEN_SCATTER_STORE:
-  return 3;
+  return 4;
 
 case IFN_LEN_STORE:
   return 4;
@@ -5125,6 +5128,28 @@ internal_fn_stored_value_index (internal_fn fn)
 }
 }
 
+/* If FN has an alias pointer return its index, otherwise return -1.  */
+
+int
+internal_fn_alias_ptr_index (internal_fn fn)
+{
+  switch (fn)
+{
+case IFN_MASK_LOAD:
+case IFN_MASK_LEN_LOAD:
+case IFN_GATHER_LOAD:
+case IFN_MASK_GATHER_LOAD:
+case IFN_MASK_LEN_GATHER_LOAD:
+case IFN_SCATTER_STORE:
+case IFN_MASK_SCATTER_STORE:
+case IFN_MASK_LEN_SCATTER_STORE:
+  return 1;
+
+default:
+  return -1;
+}
+}
+
 /* If FN is a gather/scatter return the index of its offset argument,
otherwise return -1.  */
 
@@ -5142,7 +5167,7 @@ internal_fn_offset_index (internal_fn fn)

Re: [PATCH] c, c++: Extend -Wunused-but-set-* warnings [PR44677]

2025-07-11 Thread Jakub Jelinek

On Thu, Jul 10, 2025 at 04:35:49PM -0400, Jason Merrill wrote:
> > --- gcc/cp/cp-gimplify.cc.jj2025-04-12 21:41:42.660924514 +0200
> > +++ gcc/cp/cp-gimplify.cc   2025-04-23 21:33:19.050931604 +0200
> > @@ -3200,7 +3200,23 @@ cp_fold (tree x, fold_flags_t flags)
> > loc = EXPR_LOCATION (x);
> > op0 = cp_fold_maybe_rvalue (TREE_OPERAND (x, 0), rval_ops, flags);
> > +  bool clear_decl_read;
> > +  clear_decl_read = false;
> > +  if (code == MODIFY_EXPR
> > + && (VAR_P (op0) || TREE_CODE (op0) == PARM_DECL)
> > + && !DECL_READ_P (op0)
> > + && (VAR_P (op0) ? warn_unused_but_set_variable
> > + : warn_unused_but_set_parameter) > 2
> > + && BINARY_CLASS_P (TREE_OPERAND (x, 1))
> > + && TREE_OPERAND (TREE_OPERAND (x, 1), 0) == op0)
> > +   {
> > + mark_exp_read (TREE_OPERAND (TREE_OPERAND (x, 1), 1));
> > + if (!DECL_READ_P (op0))
> > +   clear_decl_read = true;
> > +   }
> > op1 = cp_fold_rvalue (TREE_OPERAND (x, 1), flags);
> > +  if (clear_decl_read)
> > +   DECL_READ_P (op0) = 0;
> 
> Why does this need to happen in cp_fold?  Weren't the flags set properly at
> build time?

Without the cp-gimplify.cc (cp_fold) hunk there are tons of FAILs with
GXX_TESTSUITE_STDS=17 make check-g++ RUNTESTFLAGS="dg.exp='Wunused-var* Wunused-parm* name-independent-decl1.C unused-9.c memchr-3.c'"
(just 17 so that the same FAILs aren't repeated for each std
version):
FAIL: g++.dg/warn/Wunused-parm-12.C  -std=gnu++17  (test for warnings, line 14)
FAIL: g++.dg/warn/Wunused-parm-12.C  -std=gnu++17  (test for warnings, line 15)
FAIL: g++.dg/warn/Wunused-parm-12.C  -std=gnu++17  (test for warnings, line 16)
FAIL: g++.dg/warn/Wunused-parm-12.C  -std=gnu++17  (test for warnings, line 17)
FAIL: g++.dg/warn/Wunused-parm-12.C  -std=gnu++17  (test for warnings, line 18)
FAIL: g++.dg/warn/Wunused-parm-12.C  -std=gnu++17  (test for warnings, line 19)
FAIL: g++.dg/warn/Wunused-parm-12.C  -std=gnu++17  (test for warnings, line 20)
FAIL: g++.dg/warn/Wunused-parm-13.C  -std=gnu++17  (test for warnings, line 14)
FAIL: g++.dg/warn/Wunused-parm-13.C  -std=gnu++17  (test for warnings, line 15)
FAIL: g++.dg/warn/Wunused-parm-13.C  -std=gnu++17  (test for warnings, line 16)
FAIL: g++.dg/warn/Wunused-parm-13.C  -std=gnu++17  (test for warnings, line 17)
FAIL: g++.dg/warn/Wunused-parm-13.C  -std=gnu++17  (test for warnings, line 18)
FAIL: g++.dg/warn/Wunused-parm-13.C  -std=gnu++17  (test for warnings, line 19)
FAIL: g++.dg/warn/Wunused-parm-13.C  -std=gnu++17  (test for warnings, line 20)
FAIL: c-c++-common/Wunused-parm-1.c  -std=gnu++17  (test for warnings, line 13)
FAIL: c-c++-common/Wunused-parm-1.c  -std=gnu++17  (test for warnings, line 14)
FAIL: c-c++-common/Wunused-parm-1.c  -std=gnu++17  (test for warnings, line 15)
FAIL: c-c++-common/Wunused-parm-1.c  -std=gnu++17  (test for warnings, line 16)
FAIL: c-c++-common/Wunused-parm-1.c  -std=gnu++17  (test for warnings, line 17)
FAIL: c-c++-common/Wunused-parm-1.c  -std=gnu++17  (test for warnings, line 18)
FAIL: c-c++-common/Wunused-parm-1.c  -std=gnu++17  (test for warnings, line 19)
FAIL: c-c++-common/Wunused-parm-2.c  -std=gnu++17  (test for warnings, line 13)
FAIL: c-c++-common/Wunused-parm-2.c  -std=gnu++17  (test for warnings, line 14)
FAIL: c-c++-common/Wunused-parm-2.c  -std=gnu++17  (test for warnings, line 15)
FAIL: c-c++-common/Wunused-parm-2.c  -std=gnu++17  (test for warnings, line 16)
FAIL: c-c++-common/Wunused-parm-2.c  -std=gnu++17  (test for warnings, line 17)
FAIL: c-c++-common/Wunused-parm-2.c  -std=gnu++17  (test for warnings, line 18)
FAIL: c-c++-common/Wunused-parm-2.c  -std=gnu++17  (test for warnings, line 19)
FAIL: c-c++-common/Wunused-parm-3.c  -std=gnu++17  (test for warnings, line 13)
FAIL: c-c++-common/Wunused-parm-3.c  -std=gnu++17  (test for warnings, line 14)
FAIL: c-c++-common/Wunused-parm-3.c  -std=gnu++17  (test for warnings, line 15)
FAIL: c-c++-common/Wunused-parm-3.c  -std=gnu++17  (test for warnings, line 16)
FAIL: c-c++-common/Wunused-parm-3.c  -std=gnu++17  (test for warnings, line 17)
FAIL: c-c++-common/Wunused-parm-3.c  -std=gnu++17  (test for warnings, line 18)
FAIL: c-c++-common/Wunused-parm-3.c  -std=gnu++17  (test for warnings, line 19)

cp_fold_rvalue -> cp_fold_maybe_rvalue -> mark_rvalue_use -> mark_use
-> mark_exp_read then sets DECL_READ_P on op0, which we want to avoid in the
op0 @= expr case at some of the warning levels.
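A toy C++ model of the save-and-restore the cp_fold hunk performs. ToyVar and the toy_* functions are hypothetical, not GCC's trees, and the warning-level test (> 2) is omitted: the other operand is marked read first, then a "was not read" state is remembered, and the spurious read caused by folding the self-reference is undone.

```cpp
#include <cassert>

// Toy variable with a "read" flag, standing in for DECL_READ_P.
struct ToyVar
{
  bool read = false;
};

// Stand-in for folding an rvalue use: any use marks the variable read,
// like the mark_use -> mark_exp_read chain does.
void toy_fold_rvalue_use(ToyVar &v) { v.read = true; }

// Model of folding "v = v OP other".
void toy_fold_compound_assign(ToyVar &v, ToyVar &other)
{
  toy_fold_rvalue_use(other);   // the other operand really is read
  bool clear_read = !v.read;    // v not read so far (even via OTHER)?
  toy_fold_rvalue_use(v);       // folding "v OP other" marks v read...
  if (clear_read)
    v.read = false;             // ...so undo that for the self-reference
}
```

If OTHER is V itself (as in v = v OP v), the first call already marks V read and the flag is left alone.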

> > @@ -211,8 +211,27 @@ mark_use (tree expr, bool rvalue_p, bool
> > }
> >   return expr;
> > }
> > -  gcc_fallthrough();
> > +  gcc_fallthrough ();
> >   CASE_CONVERT:
> > +  if (VOID_TYPE_P (TREE_TYPE (expr)))
> > +   switch (TREE_CODE (TREE_OPERAND (expr, 0)))
> > + {
> > + case PREINCREMENT_EXPR:
> > + case PREDECREMENT_EXPR:
> > + case POSTINCREMENT_EXPR:
> > + case POSTDECREMENT_EXPR:
> 
> Why is this specific to these codes?  I would think we would want consistent
> han

Re: [PATCH] libstdc++: Fix constexpr exceptions for -fno-exceptions

2025-07-11 Thread Jonathan Wakely

On Fri, 11 Jul 2025 at 14:02, Jonathan Wakely  wrote:
>
> The if-consteval branches in std::make_exception_ptr and
> std::exception_ptr_cast use a try-catch block, which gives an error for
> -fno-exceptions. Just make them return a null pointer at compile-time
> when -fno-exceptions is used, because there's no way to get an active
> exception with -fno-exceptions.
>
> For std::exception_ptr_cast the consteval branch doesn't depend on RTTI
> being enabled, so we can move the check for __cpp_rtti into the runtime
> branch. We can also remove the #else group and just fall through to the
> return nullptr statement if there was no return from whichever branch of
> the if-consteval was taken.

Actually for exception_ptr_cast maybe we want:

#ifdef __cpp_rtti
  if not consteval {
    const type_info &__id = typeid(const _Ex&);
    return static_cast<const _Ex*>(__p._M_exception_ptr_cast(__id));
  }
#endif

#ifdef __cpp_exceptions
  if (__p._M_exception_object)
    try
      {
        std::rethrow_exception(__p);
      }
    catch (const _Ex& __exc)
      {
        return &__exc;
      }
    catch (...)
      {
      }
#endif
  return nullptr;

So for runtime calls with -frtti we try to use that, and then, unless
-fno-exceptions is used, we fall back to try-catch.

That will allow it to work at runtime with either -fno-rtti or
-fno-exceptions (but not both).

At compile time -fno-rtti doesn't matter, but -fno-exceptions makes
it return null.
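A standalone C++ sketch of the runtime try-catch fallback described above. exception_ptr_cast_sketch is a hypothetical name, the RTTI fast path is left out because _M_exception_ptr_cast is a libstdc++ internal, and the returned pointer is only meaningful if rethrow_exception rethrows the stored object rather than a copy (the standard leaves that unspecified).

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Hypothetical sketch: with exceptions enabled, rethrow and try to catch
// as Ex; otherwise (or on mismatch, or for a null pointer) return nullptr.
template <typename Ex>
const Ex *exception_ptr_cast_sketch(const std::exception_ptr &p) noexcept
{
#ifdef __cpp_exceptions
  if (p)
    try
      {
        std::rethrow_exception(p);
      }
    catch (const Ex &exc)
      {
        return &exc;  // refers to the stored object only if it was not copied
      }
    catch (...)
      {
      }
#endif
  return nullptr;
}
```

Under -fno-exceptions the whole #ifdef block disappears and the function simply returns nullptr, matching the compile-time behaviour proposed above.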

[committed] testsuite: Add testcase for already fixed PR [PR120954]

2025-07-11 Thread Jakub Jelinek

Hi!

This was a regression introduced by r16-1893 (and its backports) for C++,
though for C there had been a false positive warning for years.  Fixed by r16-2000
(and its backports).

Tested on x86_64-linux, committed to trunk as obvious.

2025-07-11  Jakub Jelinek  

PR c++/120954
* c-c++-common/Warray-bounds-11.c: New test.

--- gcc/testsuite/c-c++-common/Warray-bounds-11.c.jj 2025-07-11 13:12:37.438580284 +0200
+++ gcc/testsuite/c-c++-common/Warray-bounds-11.c   2025-07-11 13:11:58.215096590 +0200
@@ -0,0 +1,21 @@
+/* PR c++/120954 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -Warray-bounds=2" } */
+
+static const int a[32] = { 11, 12, 13, 14, 15 };
+static const int b[32] = { 21, 22, 23, 24, 25 };
+static const int c[32] = { 31, 32, 33, 34, 35 };
+static const int d[32] = { 111, 112, 113, 114, 115 };
+static const int e[32] = { 121, 122, 123, 124, 125 };
+static const int f[32] = { 131, 132, 133, 134, 135 };
+
+int
+foo (int x, int y)
+{
+  int r = 0;
+  if (x >= 0 && x < 32)
+r = (y >= 4 ? (y >= 0x65 ? a : b ) : c)[x];
+  else if (x >= 0x100 && x < 0x120)
+r = (y >= 4 ? (y >= 0x65 ? d : e ) : f)[x - 0x100];
+  return r;
+}

Jakub

Re: [PATCH] x86-64: Add --enable-x86-64-mfentry

2025-07-11 Thread Siddhesh Poyarekar


On 2025-07-08 18:07, Sam James wrote:

> > OK in principle, but please allow some time for distro maintainers
> > (CC'd) to voice their opinion.
>
>
> It looks good to me and I plan on us using it. I'd like opinions from
> one other group first before it goes in if possible though, as our
> perspective is different from others (e.g. we don't have to worry about
> old enterprise deployments).


Why not just switch over unconditionally?  __fentry__ seems like a 
better alternative to mcount overall and it has been around long enough 
that even older deployments should be relatively unaffected.


Sid

Re: [PATCH] libstdc++: Fix constexpr exceptions for -fno-exceptions

2025-07-11 Thread Jonathan Wakely

On Fri, 11 Jul 2025 at 14:21, Jakub Jelinek  wrote:
>
> On Fri, Jul 11, 2025 at 02:11:05PM +0100, Jonathan Wakely wrote:
> > On Fri, 11 Jul 2025 at 14:02, Jonathan Wakely  wrote:
> > >
> > > The if-consteval branches in std::make_exception_ptr and
> > > std::exception_ptr_cast use a try-catch block, which gives an error for
> > > -fno-exceptions. Just make them return a null pointer at compile-time
> > > when -fno-exceptions is used, because there's no way to get an active
> > > exception with -fno-exceptions.
> > >
> > > For std::exception_ptr_cast the consteval branch doesn't depend on RTTI
> > > being enabled, so we can move the check for __cpp_rtti into the runtime
> > > branch. We can also remove the #else group and just fall through to the
> > > return nullptr statement if there was no return from whichever branch of
> > > the if-consteval was taken.
> >
> > Actually for exception_ptr_cast maybe we want:
> >
> > #ifdef __cpp_rtti
> >   if not consteval {
>
> Agreed, except that I think 'not' is never used in libstdc++ headers, while
> if !consteval {
> is.

Yes, to work with -fno-operator-names

Re: [PATCH] libstdc++: Fix constexpr exceptions for -fno-exceptions

2025-07-11 Thread Jakub Jelinek

On Fri, Jul 11, 2025 at 01:47:18PM +0100, Jonathan Wakely wrote:
Thanks for doing this.

> @@ -301,8 +301,9 @@ namespace std _GLIBCXX_VISIBILITY(default)
>  _GLIBCXX26_CONSTEXPR exception_ptr
>  make_exception_ptr(_Ex __ex) _GLIBCXX_USE_NOEXCEPT
>  {
> -#if __cplusplus >= 202400L
> +#if __cpp_lib_exception_ptr_cast >= 202506L

I'm just not sure how make_exception_ptr is related to
__cpp_lib_exception_ptr_cast.
Perhaps __cpp_constexpr_exceptions >= 202411L ?

Jakub

Re: [PATCH] libstdc++: Fix constexpr exceptions for -fno-exceptions

2025-07-11 Thread Jakub Jelinek

On Fri, Jul 11, 2025 at 02:11:05PM +0100, Jonathan Wakely wrote:
> On Fri, 11 Jul 2025 at 14:02, Jonathan Wakely  wrote:
> >
> > The if-consteval branches in std::make_exception_ptr and
> > std::exception_ptr_cast use a try-catch block, which gives an error for
> > -fno-exceptions. Just make them return a null pointer at compile-time
> > when -fno-exceptions is used, because there's no way to get an active
> > exception with -fno-exceptions.
> >
> > For std::exception_ptr_cast the consteval branch doesn't depend on RTTI
> > being enabled, so we can move the check for __cpp_rtti into the runtime
> > branch. We can also remove the #else group and just fall through to the
> > return nullptr statement if there was no return from whichever branch of
> > the if-consteval was taken.
> 
> Actually for exception_ptr_cast maybe we want:
> 
> #ifdef __cpp_rtti
>   if not consteval {

Agreed, except that I think 'not' is never used in libstdc++ headers, while
if !consteval {
is.

Jakub

Re: [PATCH] testsuite: arm: Add effective-target vect_early_break to vect-tsvc-*

2025-07-11 Thread Christophe Lyon

Hi Torbjörn,

On Fri, 11 Jul 2025 at 10:47, Torbjörn SVENSSON
 wrote:
>
> Ok for trunk, gcc-15 and gcc-14.
>
> I discovered that the dg-require-effective-target is missing on gcc-14,
> but it's probably the right thing to add on gcc-15 and trunk too.
>
> Without the `dg-require-effective-target vect_early_break`, the
> `dg-add-options vect_early_break` will return the flags unchanged and
> `dg-require-effective-target vect_early_break_hw` will succeed as it
> overrides the flags, causing the tests to use the wrong target.
>
> Let me know what you think.
>
> --
>
> With the -mcpu=unset/-march=unset feature introduced in
> r15-3606-g7d6c6a0d15c, these tests start to pass because the
> cpu/arch is overridden. The proper thing to do when using
> `dg-add-options vect_early_break` is to also have a
> `dg-require-effective-target vect_early_break`, so add it.
>

So IIUC:
- on gcc-14 the tests are skipped because you override -march/-mcpu
when testing, and arm_v8_neon_ok fails when called from
vect_early_break ?
- on gcc-15 and trunk, the tests now pass thanks to -mcpu=unset/-march=unset
but you are concerned that 'dg-add-options vect_early_break' is used
without the corresponding effective-target?

So your patch is just "cosmetic" and has no impact on the testsuite results?

I have another concern (hence cc'ing Alexandre): vect.exp calls
check_vect_support_and_set_flags which defines dg-do-what-default
according to what it discovers, meaning that for some targets these
tests are 'run' and on others they are just 'compile'.
So I suppose we should use 'dg-require-effective-target
vect_early_break_hw' only when running the tests and
'dg-require-effective-target vect_early_break' when compiling them?

I suppose at the moment we completely skip these tests, while we
could at least compile them on some targets?


I think you can remove the 'arm' keyword in the title of your commit
message, as this patch can impact all targets.

Thanks,

Christophe



> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/tsvc/vect-tsvc-s332.c: Add
> dg-require-effective-target vect_early_break to test.
> * gcc.dg/vect/tsvc/vect-tsvc-s481.c: Likewise.
> * gcc.dg/vect/tsvc/vect-tsvc-s482.c: Likewise.
>
> Signed-off-by: Torbjörn SVENSSON 
> ---
>  gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c | 1 +
>  gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c | 1 +
>  gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c | 1 +
>  3 files changed, 3 insertions(+)
>
> diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
> index 21a9c5a6b2b..b4154040d1b 100644
> --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
> +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
> @@ -3,6 +3,7 @@
>
>  /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
>  /* { dg-require-effective-target vect_float } */
> +/* { dg-require-effective-target vect_early_break } */
>  /* { dg-require-effective-target vect_early_break_hw } */
>  /* { dg-add-options vect_early_break } */
>
> diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
> index e4433385d66..156e44972bd 100644
> --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
> +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
> @@ -3,6 +3,7 @@
>
>  /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
>  /* { dg-require-effective-target vect_float } */
> +/* { dg-require-effective-target vect_early_break } */
>  /* { dg-require-effective-target vect_early_break_hw } */
>  /* { dg-add-options vect_early_break } */
>
> diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
> index 146df409ecc..a1fcb18c557 100644
> --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
> +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
> @@ -3,6 +3,7 @@
>
>  /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
>  /* { dg-require-effective-target vect_float } */
> +/* { dg-require-effective-target vect_early_break } */
>  /* { dg-require-effective-target vect_early_break_hw } */
>  /* { dg-add-options vect_early_break } */
>
> --
> 2.25.1
>

RE: [PATCH v1 1/2] Match: Leverage BITS_PER_WORD for unsigned SAT_MUL pattern

2025-07-11 Thread Li, Pan2

Thanks Richard for comments.

> Why is it important to constrain the widen-mult input to a
> fixed precision at all?

I assume widen-mult only occurs when the result exceeds the width of a GPR,
so here I want to make sure the precision matches the GPR width.

For rv32 with 32-bit GPRs, 32 bits * 32 bits => 64 bits:

  _1 = (long long unsigned int) a_4(D);
  _2 = (long long unsigned int) b_5(D);
  _9 = (unsigned int) _1;
  _10 = (unsigned int) _2;
  x_6 = _9 w* _10;

For rv64 with 64-bit GPRs, 64 bits * 64 bits => 128 bits:

  _1 = (__int128 unsigned) a_4(D);
  _2 = (__int128 unsigned) b_5(D);
  _9 = (unsigned long) _1;
  _10 = (unsigned long) _2;
  x_6 = _9 w* _10;
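Both dumps widen the operands and multiply in the wider type, and the SAT_MUL pattern then compares the product against the narrow type's all-ones maximum. A minimal C++ sketch of that unsigned saturating-multiply shape, with a hypothetical name, assuming W is exactly twice as wide as T (uint32_t/uint64_t as on rv32; on 64-bit GCC targets, uint64_t/unsigned __int128 would model rv64).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: unsigned saturating multiply via a double-width type.
template <typename T, typename W>
T sat_mul_widen(T a, T b)
{
  static_assert(std::numeric_limits<T>::is_integer
                && !std::numeric_limits<T>::is_signed,
                "T must be an unsigned integer type");
  static_assert(sizeof(W) == 2 * sizeof(T), "W must be twice as wide as T");
  // Widening multiply ("w*" in the dumps): cannot overflow in W.
  W x = static_cast<W>(a) * static_cast<W>(b);
  // Clamp to T's maximum, i.e. compare against the all-ones constant.
  const W max = static_cast<W>(std::numeric_limits<T>::max());
  return x > max ? std::numeric_limits<T>::max() : static_cast<T>(x);
}
```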

But if it is a widen-mul, it looks like the operands of the widen-mul should
already be GPR-width; otherwise it would be a normal mul.  So it looks like we
don't need that check anymore.  Am I understanding this correctly?

Pan

-Original Message-
From: Richard Biener  
Sent: Friday, July 11, 2025 2:23 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com; Chen, Ken ; 
Liu, Hongtao 
Subject: Re: [PATCH v1 1/2] Match: Leverage BITS_PER_WORD for unsigned SAT_MUL 
pattern

On Fri, Jul 11, 2025 at 6:51 AM  wrote:
>
> From: Pan Li 
>
> The widen mul has different source types on different platforms,
> like rv32 or rv64.  For rv32, the source of widen mul is 32 bits,
> while it is 64 bits on rv64.  Thus, using HOST_WIDE_INT is not
> correct and results in pattern match failures on 32-bit systems
> like rv32.
>
> So, use BITS_PER_WORD instead for this pattern.
>
> gcc/ChangeLog:
>
> * match.pd: Leverage BITS_PER_WORD instead of HOST_WIDE_INT
> for widen mul precision check.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 67b33eee5f7..7f31705b652 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3605,11 +3605,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>unsigned widen_prec = TYPE_PRECISION (TREE_TYPE (@3));
>unsigned cvt5_prec = TYPE_PRECISION (TREE_TYPE (@5));
>unsigned cvt6_prec = TYPE_PRECISION (TREE_TYPE (@6));
> -  unsigned hw_int_prec = sizeof (HOST_WIDE_INT) * 8;
>wide_int c2 = wi::to_wide (@2);
>wide_int max = wi::mask (prec, false, widen_prec);
>bool c2_is_max_p = wi::eq_p (c2, max);
> -  bool widen_mult_p = cvt5_prec == cvt6_prec && hw_int_prec == cvt5_prec;
> +  bool widen_mult_p = cvt5_prec == cvt6_prec && BITS_PER_WORD == cvt5_prec;

Why is it important to constrain the widen-mult input to a
fixed precision at all?

>   }
>   (if (widen_prec > prec && c2_is_max_p && widen_mult_p)
>  )
> --
> 2.43.0
>

make autoprofiledbootstrap with LTO meaningful

2025-07-11 Thread Jan Hubicka

Hello,
currently autoprofiled bootstrap produces auto-profiles for the cc1 and
cc1plus binaries.  Those are used to build the respective frontend files.
For the backend, cc1plus.fda is used.  This does not work well with LTO
bootstrap, where the cc1plus backend is untrained since it is used only for
parsing and early opts.  As a result, all binaries get most of the
backend optimized for size rather than speed.

This patch adds lto1.fda and then combines the cc1, cc1plus and lto1 profiles
into all.fda, which is used when compiling common modules.  This is more or
less equivalent to what -fprofile-use effectively does, except that with
-fprofile-use we know the number of runs of every object file and scale
accordingly at LTO time.
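A toy C++ model of the difference described above, assuming a simple additive merge of per-function sample counts. ToyProfile and both merge functions are hypothetical; they do not reflect the real AutoFDO file format or the exact algorithm of the profile merger tool.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy profile: per-function sample counts plus how often the binary ran.
struct ToyProfile
{
  std::map<std::string, std::uint64_t> samples;
  std::uint64_t runs;
};

// Additive merge (in the spirit of cc1 + cc1plus + lto1 -> all.fda):
// counts are summed and run counts are ignored.
std::map<std::string, std::uint64_t>
merge_additive(const std::vector<ToyProfile> &profiles)
{
  std::map<std::string, std::uint64_t> out;
  for (const auto &p : profiles)
    for (const auto &kv : p.samples)
      out[kv.first] += kv.second;
  return out;
}

// Run-aware merge: each profile contributes its average per run, roughly
// what knowing the number of runs allows.
std::map<std::string, double>
merge_per_run(const std::vector<ToyProfile> &profiles)
{
  std::map<std::string, double> out;
  for (const auto &p : profiles)
    for (const auto &kv : p.samples)
      out[kv.first] += static_cast<double>(kv.second)
                       / static_cast<double>(p.runs);
  return out;
}
```

With a heavily exercised binary (many runs) and a rarely run one, the additive merge lets the former dominate, while the run-aware merge weights them per run.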

There is a comment disabling lto1 profiling, claiming it does not work.  Indeed I
got an ICE, which I fixed in a separate patch.

autoprofiledbootstrapped x86_64-linux with the extra fix; profiledbootstrap is
running.  OK if it passes?

gcc/ChangeLog:

* Makefile.in (ALL_FDAS): New variable.
(ALL_HOST_BACKEND_OBJS): Use all.fda instead of cc1plus.fda.
(all.fda): New target.

gcc/c/ChangeLog:

* Make-lang.in: Add c_FDAS.

gcc/cp/ChangeLog:

* Make-lang.in: Add c++_FDAS.

gcc/lto/ChangeLog:

* Make-lang.in: Add lto_FDAS; re-enable auto-profile for LTO_OBJS.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index d5ceccc424b..a9dd07f7532 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1895,6 +1895,9 @@ OBJS-libcommon-target = $(common_out_object_file) prefix.o \
 # This lists all host objects for the front ends.
 ALL_HOST_FRONTEND_OBJS = $(foreach v,$(CONFIG_LANGUAGES),$($(v)_OBJS))
 
+# All auto-profile files
+ALL_FDAS = $(foreach v,$(CONFIG_LANGUAGES),$($(v)_FDAS))
+
 ALL_HOST_BACKEND_OBJS = $(GCC_OBJS) $(OBJS) $(OBJS-libcommon) \
   $(OBJS-libcommon-target) main.o c-family/cppspec.o \
   $(COLLECT2_OBJS) $(EXTRA_GCC_OBJS) $(GCOV_OBJS) $(GCOV_DUMP_OBJS) \
@@ -1905,8 +1908,8 @@ ALL_HOST_BACKEND_OBJS = $(GCC_OBJS) $(OBJS) $(OBJS-libcommon) \
 # is likely the most exercised during the build
 ifeq ($(if $(wildcard ../stage_current),$(shell cat \
   ../stage_current)),stageautofeedback)
-$(ALL_HOST_BACKEND_OBJS): ALL_COMPILERFLAGS += -fauto-profile=cc1plus.fda
-$(ALL_HOST_BACKEND_OBJS): cc1plus.fda
+$(ALL_HOST_BACKEND_OBJS): ALL_COMPILERFLAGS += -fauto-profile=all.fda
+$(ALL_HOST_BACKEND_OBJS): all.fda
 endif
 
 # This lists all host object files, whether they are included in this
@@ -4727,6 +4730,9 @@ paranoia.o: $(srcdir)/../contrib/paranoia.cc $(CONFIG_H) $(SYSTEM_H) $(TREE_H)
 paranoia: paranoia.o real.o $(LIBIBERTY)
g++ -o $@ paranoia.o real.o $(LIBIBERTY)
 
+all.fda: $(ALL_FDAS)
+   $(PROFILE_MERGER) $(ALL_FDAS) --output_file all.fda -gcov_version 2
+
 # These exist for maintenance purposes.
 
 CTAGS=@CTAGS@
diff --git a/gcc/c/Make-lang.in b/gcc/c/Make-lang.in
index 2517b64439f..049fe12a3ea 100644
--- a/gcc/c/Make-lang.in
+++ b/gcc/c/Make-lang.in
@@ -58,6 +58,7 @@ C_AND_OBJC_OBJS = attribs.o c/c-errors.o c/c-decl.o c/c-typeck.o \
 # Language-specific object files for C.
 C_OBJS = c/c-lang.o c-family/stub-objc.o $(C_AND_OBJC_OBJS)
 c_OBJS = $(C_OBJS) cc1-checksum.o c/gccspec.o
+c_FDAS = cc1.fda
 
 # Use strict warnings for this front end.
 c-warn = $(STRICT_WARN)
diff --git a/gcc/cp/Make-lang.in b/gcc/cp/Make-lang.in
index dae3c6846e0..d47d096c886 100644
--- a/gcc/cp/Make-lang.in
+++ b/gcc/cp/Make-lang.in
@@ -123,6 +123,8 @@ CXX_OBJS = cp/cp-lang.o c-family/stub-objc.o $(CXX_AND_OBJCXX_OBJS)
 
 c++_OBJS = $(CXX_OBJS) cc1plus-checksum.o cp/g++spec.o
 
+c++_FDAS = cc1plus.fda
+
 # Use strict warnings for this front end.
 cp-warn = $(STRICT_WARN)
 
diff --git a/gcc/lto/Make-lang.in b/gcc/lto/Make-lang.in
index 553e6ddd0d2..a7dd7526bc7 100644
--- a/gcc/lto/Make-lang.in
+++ b/gcc/lto/Make-lang.in
@@ -26,18 +26,15 @@ LTO_DUMP_INSTALL_NAME := $(shell echo lto-dump|sed '$(program_transform_name)')
 # The LTO-specific object files inclued in $(LTO_EXE).
 LTO_OBJS = lto/lto-lang.o lto/lto.o lto/lto-object.o attribs.o lto/lto-partition.o lto/lto-symtab.o lto/lto-common.o
 lto_OBJS = $(LTO_OBJS)
+lto_FDAS = lto1.fda
 LTO_DUMP_OBJS = lto/lto-lang.o lto/lto-object.o attribs.o lto/lto-partition.o lto/lto-symtab.o lto/lto-dump.o lto/lto-common.o
 lto_dump_OBJS = $(LTO_DUMP_OBJS)
 
-# this is only useful in a LTO bootstrap, but this does not work right
-# now. Should reenable after this is fixed, but only when LTO bootstrap
-# is enabled.
-
-#ifeq ($(if $(wildcard ../stage_current),$(shell cat \
-#  ../stage_current)),stageautofeedback)
-#$(LTO_OBJS): CFLAGS += -fauto-profile=lto1.fda
-#$(LTO_OBJS): lto1.fda
-#endif
+ifeq ($(if $(wildcard ../stage_current),$(shell cat \
+  ../stage_current)),stageautofeedback)
+$(LTO_OBJS): CFLAGS += -fauto-profile=lto1.fda
+$(LTO_OBJS): lto1.fda
+endif
 
 # Rules

[PATCH 1/7 v2] RISC-V: Add basic XAndes vendor extension support.

2025-07-11 Thread Kuan-Lin Chen

This patch adds basic support for the following XAndes ISA extensions:

XANDESPERF
XANDESBFHCVT
XANDESVBFHCVT
XANDESVSINTLOAD
XANDESVPACKFPH
XANDESVDOT

gcc/ChangeLog:

* config/riscv/riscv-ext.def: Include riscv-ext-andes.def.
* config/riscv/riscv-ext.opt (riscv_xandes_subext): New variable.
(XANDESPERF): New mask.
(XANDESBFHCVT): Ditto.
(XANDESVBFHCVT): Ditto.
(XANDESVSINTLOAD): Ditto.
(XANDESVPACKFPH): Ditto.
(XANDESVDOT): Ditto.
* config/riscv/t-riscv: Add riscv-ext-andes.def.
* doc/riscv-ext.texi: Regenerated.
* config/riscv/riscv-ext-andes.def: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xandes-predef-1.c: New test.
* gcc.target/riscv/xandes-predef-2.c: New test.
* gcc.target/riscv/xandes-predef-3.c: New test.
* gcc.target/riscv/xandes-predef-4.c: New test.
* gcc.target/riscv/xandes-predef-5.c: New test.
* gcc.target/riscv/xandes-predef-6.c: New test.

Co-author: Lino Hsing-Yu Peng (linop...@andestech.com),
   Kai Kai-Yi Weng (kaiw...@andestech.com).
---
 gcc/config/riscv/riscv-ext-andes.def  | 100 ++
 gcc/config/riscv/riscv-ext.def|   1 +
 gcc/config/riscv/riscv-ext.opt|  15 +++
 gcc/config/riscv/t-riscv  |   3 +-
 gcc/doc/riscv-ext.texi|  24 +
 .../gcc.target/riscv/xandes-predef-1.c|  14 +++
 .../gcc.target/riscv/xandes-predef-2.c|  14 +++
 .../gcc.target/riscv/xandes-predef-3.c|  14 +++
 .../gcc.target/riscv/xandes-predef-4.c|  14 +++
 .../gcc.target/riscv/xandes-predef-5.c|  14 +++
 .../gcc.target/riscv/xandes-predef-6.c|  14 +++
 11 files changed, 226 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/riscv/riscv-ext-andes.def
 create mode 100644 gcc/testsuite/gcc.target/riscv/xandes-predef-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xandes-predef-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xandes-predef-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xandes-predef-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xandes-predef-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xandes-predef-6.c

diff --git a/gcc/config/riscv/riscv-ext-andes.def b/gcc/config/riscv/riscv-ext-andes.def
new file mode 100644
index ..4226e3ed86fe
--- /dev/null
+++ b/gcc/config/riscv/riscv-ext-andes.def
@@ -0,0 +1,100 @@
+/* Andes extension definition file for RISC-V.
+   Copyright (C) 2025 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.
+
+Please run `make riscv-regen` in build folder to make sure updated anything.
+
+Format of DEFINE_RISCV_EXT, please refer to riscv-ext.def.  */
+
+DEFINE_RISCV_EXT(
+  /* NAME */ xandesperf,
+  /* UPPERCASE_NAME */ XANDESPERF,
+  /* FULL_NAME */ "Andes performance extension",
+  /* DESC */ "",
+  /* URL */ ,
+  /* DEP_EXTS */ ({}),
+  /* SUPPORTED_VERSIONS */ ({{5, 0}}),
+  /* FLAG_GROUP */ xandes,
+  /* BITMASK_GROUP_ID */ BITMASK_NOT_YET_ALLOCATED,
+  /* BITMASK_BIT_POSITION*/ BITMASK_NOT_YET_ALLOCATED,
+  /* EXTRA_EXTENSION_FLAGS */ 0)
+
+DEFINE_RISCV_EXT(
+  /* NAME */ xandesbfhcvt,
+  /* UPPERCASE_NAME */ XANDESBFHCVT,
+  /* FULL_NAME */ "Andes bfloat16 conversion extension",
+  /* DESC */ "",
+  /* URL */ ,
+  /* DEP_EXTS */ ({}),
+  /* SUPPORTED_VERSIONS */ ({{5, 0}}),
+  /* FLAG_GROUP */ xandes,
+  /* BITMASK_GROUP_ID */ BITMASK_NOT_YET_ALLOCATED,
+  /* BITMASK_BIT_POSITION*/ BITMASK_NOT_YET_ALLOCATED,
+  /* EXTRA_EXTENSION_FLAGS */ 0)
+
+DEFINE_RISCV_EXT(
+  /* NAME */ xandesvbfhcvt,
+  /* UPPERCASE_NAME */ XANDESVBFHCVT,
+  /* FULL_NAME */ "Andes vector bfloat16 conversion extension",
+  /* DESC */ "",
+  /* URL */ ,
+  /* DEP_EXTS */ ({}),
+  /* SUPPORTED_VERSIONS */ ({{5, 0}}),
+  /* FLAG_GROUP */ xandes,
+  /* BITMASK_GROUP_ID */ BITMASK_NOT_YET_ALLOCATED,
+  /* BITMASK_BIT_POSITION*/ BITMASK_NOT_YET_ALLOCATED,
+  /* EXTRA_EXTENSION_FLAGS */ 0)
+
+DEFINE_RISCV_EXT(
+  /* NAME */ xandesvsintload,
+  /* UPPERCASE_NAME */ XANDESVSINTLOAD,
+  /* FULL_NAME */ "Andes vector INT4 load extension",
+  /* DESC */ "",
+  /* URL */ ,
+  /* DEP_EXTS */ ({}),
+  /* SUPPORTED_VERSIONS */ ({{5, 0}}),
+  /* FLAG_GROUP */ xandes,
+  /* BITMASK_GROUP_ID */ BITMASK_NOT_YET_ALLOCATED,
+  /* BITMASK_BIT_POSITION*/ BITMASK_NO

[PATCH 0/7 v2] Add XAndes vendor extension support.

2025-07-11 Thread Kuan-Lin Chen

Changes since v1:
[PATCH 1/7]
Replaced "UPPERCAE_NAME" with "UPPERCASE_NAME".

[PATCH 2/7]
Renamed predicates
  - extract_loc_imm_si → unsigned_5_bit_integer_operand
  - extract_loc_imm_di → unsigned_6_bit_integer_operand
Replaced with existing predicates
  - Used const_int6_operand in place of extract_loc_imm_di
  - Defined const_int5_operand to replace extract_loc_imm_si
Branch handling updates
  - Added support for long-branch handling
  - Removed the length attribute from nds_branch_imms7 and nds_branch_on_bit
Cost adjustments
  - For nds_branch_on_bit, set the combine-phase ZERO_EXTRACT cost to zero as intended
define_insn_and_split condition fixes
  - Retained the built-in behavior where a split condition beginning with && is ANDed with the main condition
  - Reverted unnecessary changes to the split condition logic and simplified it back to the original form
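As a rough model of what the renamed immediate predicates accept, the sketch below checks whether a constant fits in N unsigned bits. Five bits cover bit positions 0..31 (RV32) and six bits cover 0..63 (RV64), which is what a branch-on-bit pattern such as nds_branch_on_bit would need for its bit index. That reading of the predicates' intent is our assumption. The helper names are hypothetical and are not the GCC predicates themselves. `branch_if_bit5_set` shows the kind of source-level test, a single-bit AND at a constant position, that such a pattern presumably targets.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does V fit in an N-bit unsigned immediate?  */
static bool
fits_uimm (int64_t v, unsigned n)
{
  return v >= 0 && v < ((int64_t) 1 << n);
}

/* Hypothetical models of 5-bit / 6-bit unsigned immediate checks.  */
bool is_uimm5 (int64_t v) { return fits_uimm (v, 5); }
bool is_uimm6 (int64_t v) { return fits_uimm (v, 6); }

/* Source idiom: test a constant bit position and branch on it.  */
int
branch_if_bit5_set (unsigned long x)
{
  if (x & (1UL << 5))
    return 1;
  return 0;
}
```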

[PATCH 6/7]
Temporarily removed "nds_vfpmad" until the Andes pipeline model is uploaded.

Thanks for your review.


Kuan-Lin Chen (7):
  RISC-V: Add basic XAndes vendor extension support.
  RISC-V: Add support for the XAndesperf ISA extension.
  RISC-V: Add support for the XAndesbfhcvt ISA extension.
  RISC-V: Add support for the XAndesvbfhcvt ISA extension.
  RISC-V: Add support for the XAndesvsintload ISA extension.
  RISC-V: Add support for the XAndesvpackfph ISA extension.
  RISC-V: Add support for the XAndesvdot ISA extension.

 gcc/common/config/riscv/riscv-common.cc   |   3 +
 gcc/config.gcc|   4 +-
 .../riscv/andes-vector-builtins-bases.cc  | 189 +++
 .../riscv/andes-vector-builtins-bases.h   |  42 ++
 .../riscv/andes-vector-builtins-functions.def |  65 +++
 gcc/config/riscv/andes-vector.md  | 163 ++
 gcc/config/riscv/andes.def|  14 +
 gcc/config/riscv/andes.md | 469 ++
 gcc/config/riscv/andes_vector.h   |  32 ++
 gcc/config/riscv/constraints.md   |  10 +
 gcc/config/riscv/genrvv-type-indexer.cc   |   6 +-
 gcc/config/riscv/iterators.md |  12 +
 gcc/config/riscv/predicates.md|  42 ++
 gcc/config/riscv/riscv-builtins.cc|  10 +
 gcc/config/riscv/riscv-ext-andes.def  | 100 
 gcc/config/riscv/riscv-ext.def|   1 +
 gcc/config/riscv/riscv-ext.opt|  15 +
 gcc/config/riscv/riscv-ftypes.def |   3 +
 .../riscv/riscv-vector-builtins-types.def |  44 ++
 gcc/config/riscv/riscv-vector-builtins.cc | 103 
 gcc/config/riscv/riscv-vector-builtins.def|   4 +
 gcc/config/riscv/riscv-vector-builtins.h  |  20 +
 gcc/config/riscv/riscv.cc |  32 ++
 gcc/config/riscv/riscv.md |  17 +-
 gcc/config/riscv/t-riscv  |  18 +-
 gcc/config/riscv/vector-iterators.md  |  38 +-
 gcc/config/riscv/vector.md|   1 +
 gcc/doc/riscv-ext.texi|  24 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|  12 +
 .../riscv/rvv/xandesvector/nds_vfwcvt.c   |  37 ++
 .../non-policy/non-overloaded/nds_vd4dots.c   | 132 +
 .../non-policy/non-overloaded/nds_vd4dotsu.c  | 132 +
 .../non-policy/non-overloaded/nds_vd4dotu.c   | 132 +
 .../non-policy/non-overloaded/nds_vfpmadb.c   | 103 
 .../non-policy/non-overloaded/nds_vfpmadt.c   | 103 
 .../non-policy/non-overloaded/nds_vln8.c  |  62 +++
 .../non-policy/overloaded/nds_vd4dots.c   | 132 +
 .../non-policy/overloaded/nds_vd4dotsu.c  | 132 +
 .../non-policy/overloaded/nds_vd4dotu.c   | 133 +
 .../non-policy/overloaded/nds_vfpmadb.c   | 103 
 .../non-policy/overloaded/nds_vfpmadt.c   | 103 
 .../non-policy/overloaded/nds_vln8.c  |  34 ++
 .../policy/non-overloaded/nds_vd4dots.c   | 258 ++
 .../policy/non-overloaded/nds_vd4dotsu.c  | 258 ++
 .../policy/non-overloaded/nds_vd4dotu.c   | 258 ++
 .../policy/non-overloaded/nds_vfpmadb.c   | 199 
 .../policy/non-overloaded/nds_vfpmadt.c   | 199 
 .../policy/non-overloaded/nds_vln8.c  | 118 +
 .../policy/overloaded/nds_vd4dots.c   | 258 ++
 .../policy/overloaded/nds_vd4dotsu.c  | 258 ++
 .../policy/overloaded/nds_vd4dotu.c   | 258 ++
 .../policy/overloaded/nds_vfpmadb.c   | 199 
 .../policy/overloaded/nds_vfpmadt.c   | 199 
 .../xandesvector/policy/overloaded/nds_vln8.c | 118 +
 .../gcc.target/riscv/xandes-predef-1.c|  14 +
 .../gcc.target/riscv/xandes-predef-2.c|  14 +
 .../gcc.target/riscv/xandes-predef-3.c|  14 +
 .../gcc.target/riscv/xandes-predef-4.c|  14 +
 .../gcc.target/riscv/xandes-predef-5.c|  14 +
 .../gcc.target/riscv/xandes-predef-6.c|  14 +
 .../gcc.target/riscv/xandesbfhcvt-1.c |  11 +
 .../gcc.target/riscv/xandesbfhcvt-2.c |

Re: Rewrite assign_discriminators pass

2025-07-11 Thread Richard Biener

On Thu, 10 Jul 2025, Jan Hubicka wrote:

> Hi,
> To assign debug locations to corresponding statements, auto-fdo uses
> discriminators.  The documentation says that if a given statement belongs to
> multiple basic blocks, the discriminator distinguishes them.
> 
> The current implementation, however, only works for statements that expand
> into a sequence of gimple statements forming a linear sequence, since it
> essentially tracks a current location and renews it each time a new BB is found.
> This is commonly not true for C++ code, as in:
> 
>:
>   [simulator/csimplemodule.cc:379:85] _40 = std::__cxx11::basic_string::c_str ([simulator/csimplemodule.cc:379:85] &D.80680);
>   [simulator/csimplemodule.cc:379:85 discrim 13] _41 = [simulator/csimplemodule.cc:379:85] &this->D.78503.D.78106.D.72008.D.68585.D.67935.D.67879.D.67782;
>   [simulator/csimplemodule.cc:379:85 discrim 13] _42 = &this->D.78503.D.78106.D.72008.D.68585.D.67935.D.67879.D.67782;
>   [simulator/csimplemodule.cc:377:45] _43 = this->D.78503.D.78106.D.72008.D.68585.D.67935.D.67879.D.67782._vptr.cObject;
>   [simulator/csimplemodule.cc:377:45] _44 = _43 + 40;
>   [simulator/csimplemodule.cc:377:45] _45 = [simulator/csimplemodule.cc:377:45] *_44;
>   [simulator/csimplemodule.cc:379:85] D.89001 = OBJ_TYPE_REF(_45;(const struct cObject)_42->5B) (_41);
> 
> This is a fragment of code that is expanded from:
> 
> 
> 371 if (this!=simulation.getContextModule())
> 372 throw cRuntimeError("send()/sendDelayed() of module (%s)%s called in the context of "
> 373 "module (%s)%s: method called from the latter module "
> 374 "lacks Enter_Method() or Enter_Method_Silent()? "
> 375 "Also, if message to be sent is passed from that module, "
> 376 "you'll need to call take(msg) after Enter_Method() as well",
> 377 getClassName(), getFullPath().c_str(),
> 378 simulation.getContextModule()->getClassName(),
> 379 simulation.getContextModule()->getFullPath().c_str());
> 
> Notice that 379:85 is interleaved with 377:45 and the pass does not assign a
> new discriminator.
> With patch we get:
> 
>:
>   [simulator/csimplemodule.cc:379:85 discrim 7] _40 = std::__cxx11::basic_string::c_str ([simulator/csimplemodule.cc:379:85] &D.80680);
>   [simulator/csimplemodule.cc:379:85 discrim 8] _41 = [simulator/csimplemodule.cc:379:85] &this->D.78503.D.78106.D.72008.D.68585.D.67935.D.67879.D.67782;
>   [simulator/csimplemodule.cc:379:85 discrim 8] _42 = &this->D.78503.D.78106.D.72008.D.68585.D.67935.D.67879.D.67782;
>   [simulator/csimplemodule.cc:377:45 discrim 1] _43 = this->D.78503.D.78106.D.72008.D.68585.D.67935.D.67879.D.67782._vptr.cObject;
>   [simulator/csimplemodule.cc:377:45 discrim 1] _44 = _43 + 40;
>   [simulator/csimplemodule.cc:377:45 discrim 1] _45 = [simulator/csimplemodule.cc:377:45] *_44;
>   [simulator/csimplemodule.cc:379:85 discrim 8] D.89001 = OBJ_TYPE_REF(_45;(const struct cObject)_42->5B) (_41);
> 
> There are earlier statements with line number 379, which is why the call gets
> discriminator 7.  After the call the discriminator is increased, for two reasons:
>  1) AFDO requires every callsite to have a unique lineno:discriminator pair;
>  2) the call may not terminate, and thus the profile of the first statement
> may be higher than that of the rest.
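The assignment rule described above can be modelled in a few lines of C. This is a simplified sketch of our own, not the pass's code. The real pass works on GIMPLE statements and location_t, and `struct stmt`, `next_discrim` and `assign_bb_discriminators` are hypothetical. Within a basic block, the first statement of a given line takes a fresh discriminator for that line. Later statements of the same line reuse it until a call is seen, and the next statement of that line after the call takes a fresh discriminator again.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_LINE 1024

/* Hypothetical flattened statement of one basic block.  */
struct stmt
{
  int line;       /* source line, must be < MAX_LINE */
  bool is_call;   /* statement is a call */
  int discrim;    /* output: assigned discriminator */
};

/* Next unused discriminator per source line, shared by the whole function.  */
int next_discrim[MAX_LINE];

/* Assign discriminators to the statements of one basic block.  */
void
assign_bb_discriminators (struct stmt *stmts, size_t n)
{
  int cur[MAX_LINE];
  for (size_t i = 0; i < MAX_LINE; i++)
    cur[i] = -1;                     /* -1: no discriminator yet in this BB */

  for (size_t i = 0; i < n; i++)
    {
      int l = stmts[i].line;
      if (cur[l] < 0)
        cur[l] = next_discrim[l]++;  /* fresh discriminator for this line */
      stmts[i].discrim = cur[l];
      if (stmts[i].is_call)
        cur[l] = -1;                 /* keep lineno:discriminator unique per call */
    }
}
```

Run this on the seven statements of the example, with the counters for lines 379 and 377 starting at 7 and 1 to stand in for earlier uses. It yields the sequence 7, 8, 8, 1, 1, 1, 8 shown in the patched dump.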
> 
> The old pass also contained logic to skip debug statements.  This is needed so
> that discriminators at train time (with -g) and at feedback time (say
> -g0 -fauto-profile=...) are the same.  However, keeping debug statements
> with broken discriminators is not a good idea, since we output them to the
> debug output, and if the AFDO tool picks these locations up they will be
> misplaced in basic blocks.
> 
> Debug statements are naturally quite useful for tracking back the AFDO profiles,
> and in the meantime LLVM folks implemented something similar called pseudoprobe.
> I think it makes sense to enable debug statements with -fauto-profile even if
> debug info is off, and to make use of them as done in this patch.
> 
> Sadly, the AFDO tool is quite broken and built around the assumption that every
> address has at most one debug location assigned to it (i.e. debug info as it
> was before debug statements were introduced).  I have a WIP patch fixing this.
> The fact that it ignores all but the last location assigned to an address
> sort of mitigates the problem with debug statements: if they are
> immediately succeeded by another location, the tool ignores them.
> 
> Note that LLVM also has -fdebug-info-for-auto-profile (on by default, it seems)
> that controls discriminator production and some other little bits.  I wonder if
> we want to have something similar.  Should it be -g instead?
> 
> Bootstrapped/regtested x86_64-linux, OK?
> 
> 
> I am i

[PATCH 3/7 v2] RISC-V: Add support for the XAndesbfhcvt ISA extension.

2025-07-11 Thread Kuan-Lin Chen

This extension defines instructions to perform scalar floating-point
conversion between the BFLOAT16 floating-point data and the IEEE-754
32-bit single-precision floating-point (SP) data in a scalar
floating point register.
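For reference, here is a portable C model of the two scalar conversions. BFLOAT16 keeps the sign, the 8-bit exponent and the top 7 fraction bits of binary32. Widening, which is what nds.fcvt.s.bf16 is described as doing, is therefore an exact 16-bit shift, while narrowing has to round. The sketch assumes round-to-nearest-even and NaN quieting. This patch does not describe how the instruction handles rounding modes, and the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* BF16 -> binary32: exact, the 16 bits become the upper half.  */
float
bf16_to_f32 (uint16_t b)
{
  uint32_t bits = (uint32_t) b << 16;
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* binary32 -> BF16 with round-to-nearest-even (assumed rounding mode).  */
uint16_t
f32_to_bf16_rne (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  if ((bits & 0x7fffffffu) > 0x7f800000u)   /* NaN: return a quiet NaN */
    return (uint16_t) ((bits >> 16) | 0x0040u);
  uint32_t lsb = (bits >> 16) & 1u;         /* ties round to even */
  bits += 0x7fffu + lsb;
  return (uint16_t) (bits >> 16);
}
```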

gcc/ChangeLog:

* config/riscv/andes.def: Add nds_fcvt_s_bf16 and nds_fcvt_bf16_s.
* config/riscv/andes.md (riscv_nds_fcvt_bf16_s): New pattern.
(riscv_nds_fcvt_s_bf16): New pattern.
* config/riscv/riscv-builtins.cc: New AVAIL andesbfhcvt.
Add new define RISCV_ATYPE_BF and RISCV_ATYPE_SF.
* config/riscv/riscv-ftypes.def: New DEF_RISCV_FTYPE.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xandesbfhcvt-1.c: New test.
* gcc.target/riscv/xandesbfhcvt-2.c: New test.
---
 gcc/config/riscv/andes.def|  4 +++
 gcc/config/riscv/andes.md | 26 +++
 gcc/config/riscv/riscv-builtins.cc|  3 +++
 gcc/config/riscv/riscv-ftypes.def |  2 ++
 .../gcc.target/riscv/xandesbfhcvt-1.c | 11 
 .../gcc.target/riscv/xandesbfhcvt-2.c | 11 
 6 files changed, 57 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/xandesbfhcvt-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xandesbfhcvt-2.c

diff --git a/gcc/config/riscv/andes.def b/gcc/config/riscv/andes.def
index b864ae712c1d..5b5fb76bfe0e 100644
--- a/gcc/config/riscv/andes.def
+++ b/gcc/config/riscv/andes.def
@@ -8,3 +8,7 @@ RISCV_BUILTIN (nds_ffmismsi, "nds_ffmism", RISCV_BUILTIN_DIRECT, RISCV_LONG_FTYP
 RISCV_BUILTIN (nds_ffmismdi, "nds_ffmism", RISCV_BUILTIN_DIRECT, RISCV_LONG_FTYPE_ULONG_ULONG, andesperf64),
 RISCV_BUILTIN (nds_flmismsi, "nds_flmism", RISCV_BUILTIN_DIRECT, RISCV_LONG_FTYPE_ULONG_ULONG, andesperf32),
 RISCV_BUILTIN (nds_flmismdi, "nds_flmism", RISCV_BUILTIN_DIRECT, RISCV_LONG_FTYPE_ULONG_ULONG, andesperf64),
+
+/* Andes Scalar BFLOAT16 Conversion Extension */
+RISCV_BUILTIN (nds_fcvt_s_bf16, "nds_fcvt_s_bf16", RISCV_BUILTIN_DIRECT, RISCV_SF_FTYPE_BF, andesbfhcvt),
+RISCV_BUILTIN (nds_fcvt_bf16_s, "nds_fcvt_bf16_s", RISCV_BUILTIN_DIRECT, RISCV_BF_FTYPE_SF, andesbfhcvt),
diff --git a/gcc/config/riscv/andes.md b/gcc/config/riscv/andes.md
index 51f61e58e244..22aa5e5150d5 100644
--- a/gcc/config/riscv/andes.md
+++ b/gcc/config/riscv/andes.md
@@ -441,3 +441,29 @@
   "nds.flmism\t%0, %z1, %z2"
   [(set_attr "mode" "")
(set_attr "type" "arith")])
+
+;;
+;;  
+;;
+;;Bfloat16
+;;
+;;  
+;;
+
+(define_insn "riscv_nds_fcvt_bf16_s"
+  [(set (match_operand:BF   0 "register_operand" "=f")
+   (float_truncate:BF
+ (match_operand:SF 1 "register_operand" " f")))]
+  "TARGET_XANDESBFHCVT"
+  "nds.fcvt.bf16.s\t%0,%1"
+  [(set_attr "type" "fcvt")
+   (set_attr "mode" "BF")])
+
+(define_insn "riscv_nds_fcvt_s_bf16"
+  [(set (match_operand:SF   0 "register_operand" "=f")
+   (float_extend:SF
+ (match_operand:BF 1 "register_operand" " f")))]
+  "TARGET_XANDESBFHCVT"
+  "nds.fcvt.s.bf16\t%0,%1"
+  [(set_attr "type" "fcvt")
+   (set_attr "mode" "SF")])
diff --git a/gcc/config/riscv/riscv-builtins.cc b/gcc/config/riscv/riscv-builtins.cc
index 8b081e240be4..799c7a4ccd13 100644
--- a/gcc/config/riscv/riscv-builtins.cc
+++ b/gcc/config/riscv/riscv-builtins.cc
@@ -140,6 +140,7 @@ AVAIL (cvsimd, TARGET_XCVSIMD && !TARGET_64BIT)
 /* ANDES AVAIL.  */
 AVAIL (andesperf32, !TARGET_64BIT && TARGET_XANDESPERF)
 AVAIL (andesperf64, TARGET_64BIT && TARGET_XANDESPERF)
+AVAIL (andesbfhcvt, TARGET_XANDESBFHCVT)
 
 /* Construct a riscv_builtin_description from the given arguments.
 
@@ -199,6 +200,8 @@ AVAIL (andesperf64, TARGET_64BIT && TARGET_XANDESPERF)
 #define RISCV_ATYPE_INT_PTR integer_ptr_type_node
 #define RISCV_ATYPE_ULONG long_unsigned_type_node
 #define RISCV_ATYPE_LONG long_integer_type_node
+#define RISCV_ATYPE_BF bfloat16_type_node
+#define RISCV_ATYPE_SF float_type_node
 
 /* RISCV_FTYPE_ATYPESN takes N RISCV_FTYPES-like type codes and lists
their associated RISCV_ATYPEs.  */
diff --git a/gcc/config/riscv/riscv-ftypes.def b/gcc/config/riscv/riscv-ftypes.def
index fd1314d9d975..f50a37a581a6 100644
--- a/gcc/config/riscv/riscv-ftypes.def
+++ b/gcc/config/riscv/riscv-ftypes.def
@@ -37,6 +37,8 @@ DEF_RISCV_FTYPE (1, (USI, UQI))
 DEF_RISCV_FTYPE (1, (USI, UHI))
 DEF_RISCV_FTYPE (1, (SI, QI))
 DEF_RISCV_FTYPE (1, (SI, HI))
+DEF_RISCV_FTYPE (1, (BF, SF))
+DEF_RISCV_FTYPE (1, (SF, BF))
 DEF_RISCV_FTYPE (2, (USI, UQI, UQI))
 DEF_RISCV_FTYPE (2, (USI, USI, UHI))
 DEF_RISCV_FTYPE (2, (USI, USI, QI))
diff --git a/gcc/testsuite/gcc.target/riscv/xandesbfhcvt-1.c b/gcc/testsuite/gcc.target/riscv/xandesbfhcvt-1.c
new file mode 100644
index ..b174b6ef5053
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xandesbfhcvt-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_xandesbfhcvt" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc_xandesbfhcvt" { target { rv64 } } } */

[PATCH 7/7 v2] RISC-V: Add support for the XAndesvdot ISA extension.

2025-07-11 Thread Kuan-Lin Chen

This extension defines vector instructions that calculate the signed/unsigned
dot product of four SEW/4-bit elements and accumulate the result into a SEW-bit
element, for all elements in a vector register.
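To make the element layout concrete, here is a scalar C model of one SEW = 32 lane, where each lane holds four SEW/4 = 8-bit values. The lane result is acc + a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3]. The signedness flags model the three variants: vd4dots uses both operands signed, vd4dotu uses both unsigned, and for vd4dotsu we assume the first operand is signed and the second unsigned. We also assume the accumulation wraps modulo 2^32. Neither assumption is spelled out in this patch, and `d4dot_lane` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scalar model of one 32-bit lane: accumulate four byte products.
   A_SIGNED / B_SIGNED choose sign- or zero-extension of each byte.  */
uint32_t
d4dot_lane (uint32_t acc, uint32_t a, uint32_t b, bool a_signed, bool b_signed)
{
  uint32_t sum = acc;
  for (int k = 0; k < 4; k++)
    {
      uint8_t ab = (uint8_t) (a >> (8 * k));
      uint8_t bb = (uint8_t) (b >> (8 * k));
      int32_t x = a_signed ? (int8_t) ab : (int32_t) ab;
      int32_t y = b_signed ? (int8_t) bb : (int32_t) bb;
      sum += (uint32_t) (x * y);   /* |x * y| <= 255 * 255, fits in int32_t */
    }
  return sum;                      /* wraps modulo 2^32 (assumed) */
}
```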

gcc/ChangeLog:

* config/riscv/andes-vector-builtins-bases.cc (nds_vd4dot): New class.
(nds_vd4dotsu): New class.
* config/riscv/andes-vector-builtins-bases.h: New def.
* config/riscv/andes-vector-builtins-functions.def (nds_vd4dots): Ditto.
(nds_vd4dotsu): Ditto.
(nds_vd4dotu): Ditto.
* config/riscv/andes-vector.md
(@pred_nds_vd4dot): New pattern.
(@pred_nds_vd4dotsu): New pattern.
* config/riscv/genrvv-type-indexer.cc (main): Modify sew of QUAD_FIX,
QUAD_FIX_SIGNED and QUAD_FIX_UNSIGNED.
* config/riscv/riscv-vector-builtins.cc
(qexti__ops): New operand information.
(qexti_su__ops): New operand information.
(qextu__ops): New operand information.
* config/riscv/riscv-vector-builtins.h (XANDESVDOT_EXT): New def.
(required_ext_to_isa_name): Add case XANDESVDOT_EXT.
(required_extensions_specified): Ditto.
(struct function_group_info): Ditto.
* config/riscv/vector-iterators.md (NDS_QUAD_FIX): New iterator.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vd4dots.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vd4dotsu.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vd4dotu.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vd4dots.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vd4dotsu.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vd4dotu.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vd4dots.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vd4dotsu.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vd4dotu.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vd4dots.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vd4dotsu.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vd4dotu.c: New test.
---
 .../riscv/andes-vector-builtins-bases.cc  |  33 +++
 .../riscv/andes-vector-builtins-bases.h   |   3 +
 .../riscv/andes-vector-builtins-functions.def |   7 +
 gcc/config/riscv/andes-vector.md  |  53 
 gcc/config/riscv/genrvv-type-indexer.cc   |   6 +-
 gcc/config/riscv/riscv-vector-builtins.cc |  31 +++
 gcc/config/riscv/riscv-vector-builtins.h  |   5 +
 gcc/config/riscv/vector-iterators.md  |  13 +
 .../non-policy/non-overloaded/nds_vd4dots.c   | 132 +
 .../non-policy/non-overloaded/nds_vd4dotsu.c  | 132 +
 .../non-policy/non-overloaded/nds_vd4dotu.c   | 132 +
 .../non-policy/overloaded/nds_vd4dots.c   | 132 +
 .../non-policy/overloaded/nds_vd4dotsu.c  | 132 +
 .../non-policy/overloaded/nds_vd4dotu.c   | 133 +
 .../policy/non-overloaded/nds_vd4dots.c   | 258 ++
 .../policy/non-overloaded/nds_vd4dotsu.c  | 258 ++
 .../policy/non-overloaded/nds_vd4dotu.c   | 258 ++
 .../policy/overloaded/nds_vd4dots.c   | 258 ++
 .../policy/overloaded/nds_vd4dotsu.c  | 258 ++
 .../policy/overloaded/nds_vd4dotu.c   | 258 ++
 20 files changed, 2489 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vd4dots.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vd4dotsu.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vd4dotu.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vd4dots.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vd4dotsu.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vd4dotu.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vd4dots.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vd4dotsu.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vd4dotu.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vd4dots.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vd4dotsu.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xandesvector/policy/overloade

[PATCH 2/7 v2] RISC-V: Add support for the XAndesperf ISA extension.

2025-07-11 Thread Kuan-Lin Chen

This patch adds support for the XAndesperf ISA extension.
The 32-bit AndeStar V5 extension includes branch instructions,
load effective address instructions, and string processing
instructions for performance improvement.
New INSN patterns are added in the new file andes.md
as a separate vendor extension.
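The ChangeLog below mentions sign_extract/zero_extract and a cost for the 'bfo' pattern. As a hedged illustration, these are the C bit-field extraction idioms that such RTL describes: take bits msb..lsb of a word, then zero- or sign-extend them. The functions are hypothetical and cover only the msb >= lsb case. We have not checked which of these source forms GCC actually combines into an Andes bit-field instruction.

```c
#include <stdint.h>

/* Zero-extend bits MSB..LSB of X (requires 31 >= MSB >= LSB).  */
uint32_t
zext_field (uint32_t x, unsigned msb, unsigned lsb)
{
  unsigned width = msb - lsb + 1;
  uint32_t mask = width == 32 ? 0xffffffffu : ((1u << width) - 1u);
  return (x >> lsb) & mask;
}

/* Sign-extend bits MSB..LSB of X (requires 31 >= MSB >= LSB): move the
   field to the top, then shift it back down arithmetically.  Relies on
   GCC's modulo conversion to int32_t and arithmetic right shift.  */
int32_t
sext_field (uint32_t x, unsigned msb, unsigned lsb)
{
  return (int32_t) (x << (31 - msb)) >> (31 - msb + lsb);
}
```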

gcc/ChangeLog:

* config/riscv/constraints.md (Ou07): New constraint.
(ads_Bext): New constraint.
* config/riscv/iterators.md (ANYLE32): New iterator.
(sizen): New iterator.
(sh_limit): New iterator.
(sh_bit): New iterator.
* config/riscv/predicates.md (ads_branch_bbcs_operand): New predicate.
(ads_branch_bimm_operand): New predicate.
(ads_imm_extract_operand): New predicate.
(ads_extract_size_imm_si): New predicate.
(ads_extract_size_imm_di): New predicate.
(const_int5_operand): New predicate.
* config/riscv/riscv-builtins.cc:
Add new AVAIL andesperf32 and andesperf64.
Add new define RISCV_ATYPE_ULONG and RISCV_ATYPE_LONG.
* config/riscv/riscv-ftypes.def: New DEF_RISCV_FTYPE.
* config/riscv/riscv.cc
(riscv_extend_cost): Cost for pattern 'bfo'.
(riscv_rtx_costs): Cost for XAndesperf extension.
* config/riscv/riscv.md: Add support for XAndesperf to patterns
zero_extendsidi2_internal, zero_extendhi2, extendsidi2_internal,
extend2, 3
and branch_on_bit.
* config/riscv/vector-iterators.md
(sz): Add sign_extract and zero_extract.
* config/riscv/andes.def: New file for vendor Andes.
* config/riscv/andes.md: New file for vendor Andes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xandesperf-1.c: New test.
* gcc.target/riscv/xandesperf-10.c: New test.
* gcc.target/riscv/xandesperf-2.c: New test.
* gcc.target/riscv/xandesperf-3.c: New test.
* gcc.target/riscv/xandesperf-4.c: New test.
* gcc.target/riscv/xandesperf-5.c: New test.
* gcc.target/riscv/xandesperf-6.c: New test.
* gcc.target/riscv/xandesperf-7.c: New test.
* gcc.target/riscv/xandesperf-8.c: New test.
* gcc.target/riscv/xandesperf-9.c: New test.
---
 gcc/config/riscv/andes.def|  10 +
 gcc/config/riscv/andes.md | 443 ++
 gcc/config/riscv/constraints.md   |  10 +
 gcc/config/riscv/iterators.md |  12 +
 gcc/config/riscv/predicates.md|  42 ++
 gcc/config/riscv/riscv-builtins.cc|   7 +
 gcc/config/riscv/riscv-ftypes.def |   1 +
 gcc/config/riscv/riscv.cc |  32 ++
 gcc/config/riscv/riscv.md |  17 +-
 gcc/config/riscv/vector-iterators.md  |   2 +-
 gcc/testsuite/gcc.target/riscv/xandesperf-1.c |  13 +
 .../gcc.target/riscv/xandesperf-10.c  |  32 ++
 gcc/testsuite/gcc.target/riscv/xandesperf-2.c |  13 +
 gcc/testsuite/gcc.target/riscv/xandesperf-3.c |  11 +
 gcc/testsuite/gcc.target/riscv/xandesperf-4.c |  11 +
 gcc/testsuite/gcc.target/riscv/xandesperf-5.c |  11 +
 gcc/testsuite/gcc.target/riscv/xandesperf-6.c |  18 +
 gcc/testsuite/gcc.target/riscv/xandesperf-7.c |  22 +
 gcc/testsuite/gcc.target/riscv/xandesperf-8.c |  26 +
 gcc/testsuite/gcc.target/riscv/xandesperf-9.c |  31 ++
 20 files changed, 757 insertions(+), 7 deletions(-)
 create mode 100644 gcc/config/riscv/andes.def
 create mode 100644 gcc/config/riscv/andes.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/xandesperf-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xandesperf-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xandesperf-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xandesperf-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xandesperf-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xandesperf-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xandesperf-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xandesperf-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xandesperf-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xandesperf-9.c

diff --git a/gcc/config/riscv/andes.def b/gcc/config/riscv/andes.def
new file mode 100644
index ..b864ae712c1d
--- /dev/null
+++ b/gcc/config/riscv/andes.def
@@ -0,0 +1,10 @@
+// XANDESPERF
+/* Andes Performance Extension */
+RISCV_BUILTIN (nds_ffbsi, "nds_ffb", RISCV_BUILTIN_DIRECT, RISCV_LONG_FTYPE_ULONG_ULONG, andesperf32),
+RISCV_BUILTIN (nds_ffbdi, "nds_ffb", RISCV_BUILTIN_DIRECT, RISCV_LONG_FTYPE_ULONG_ULONG, andesperf64),
+RISCV_BUILTIN (nds_ffzmismsi, "nds_ffzmism", RISCV_BUILTIN_DIRECT, RISCV_LONG_FTYPE_ULONG_ULONG, andesperf32),
+RISCV_BUILTIN (nds_ffzmismdi, "nds_ffzmism", RISCV_BUILTIN_DIRECT, RISCV_LONG_FTYPE_ULONG_ULONG, andesperf64),
+RISCV_BUILTIN (nds_ffmismsi, "nds_ffmism", RISCV_BUILTIN_DIRECT, RISCV_LONG_FTYPE_ULONG_ULONG, andesperf32),
+RISCV_BUILTIN (nds_ffmismdi, "nds_ffmism", RISCV_

[PATCH 4/7 v2] RISC-V: Add support for the XAndesvbfhcvt ISA extension.

2025-07-11 Thread Kuan-Lin Chen

This patch adds support for the XAndesvbfhcvt ISA extension.
This extension defines instructions to perform vector floating-point
conversion between the BFLOAT16 floating-point data and the IEEE-754 32-bit
single-precision floating-point (SP) data in a vector register.
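As a rough scalar model of what these conversions compute per element (a sketch, not the patch's intrinsics; the helper names below are hypothetical), note that BF16 is simply the upper half of an FP32 bit pattern. Widening is therefore exact, while narrowing has to round away 16 mantissa bits. The sketch assumes round-to-nearest-even for the narrowing direction and only minimally handles NaNs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* BF16 keeps the sign, the 8-bit exponent and the top 7 mantissa bits of an
   IEEE-754 binary32 value, so widening is exact: shift into the high half.  */
static float bf16_to_f32 (uint16_t h)
{
  uint32_t bits = (uint32_t) h << 16;
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Narrowing drops 16 mantissa bits.  This sketch assumes
   round-to-nearest-even; NaNs are simply forced to stay quiet.  */
static uint16_t f32_to_bf16 (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  if ((bits & 0x7FFFFFFFu) > 0x7F800000u)
    return (uint16_t) ((bits >> 16) | 0x0040u);
  uint32_t lsb = (bits >> 16) & 1u;
  bits += 0x7FFFu + lsb;   /* Ties round towards the even result.  */
  return (uint16_t) (bits >> 16);
}

/* Element-wise models of the widening and narrowing vector conversions
   (hypothetical helper names, not the intrinsics the patch adds).  */
static void vfwcvt_bf16_model (float *dst, const uint16_t *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = bf16_to_f32 (src[i]);
}

static void vfncvt_bf16_model (uint16_t *dst, const float *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = f32_to_bf16 (src[i]);
}
```

For example, 1.00390625f lies exactly halfway between the BF16 values 0x3F80 and 0x3F81, so round-to-nearest-even yields 0x3F80.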

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc:
Turn on VECTOR_ELEN_BF_16 for XAndesvbfhcvt.
* config.gcc: Add extra_objs andes-vector-builtins-bases.o
and extra_headers andes_vector.h.
* config/riscv/riscv-vector-builtins.cc
(f32_to_bf16_nf_w_ops): New operand information.
(f32_to_bf16_nf_w_ops): New operand information.
(DEF_RVV_FUNCTION): New def.
* config/riscv/riscv-vector-builtins.def (bf16_s): Ditto.
(s_bf16): Ditto.
* config/riscv/riscv-vector-builtins.h (enum required_ext): Ditto.
(required_ext_to_isa_name): Add case XANDESVBFHCVT_EXT.
(required_extensions_specified): Ditto.
* config/riscv/t-riscv: Add andes-vector-builtins-functions.def,
andes-vector-builtins-bases.h and andes-vector-builtins-bases.o.
* config/riscv/vector-iterators.md (NDS_VWEXTBF): New iterator.
(NDS_V_DOUBLE_TRUNC_BF): New attr.
* config/riscv/andes-vector-builtins-bases.cc: New file.
* config/riscv/andes-vector-builtins-bases.h: New file.
* config/riscv/andes-vector-builtins-functions.def: New file.
* config/riscv/andes_vector.h: New file.
* config/riscv/andes-vector.md: New file.
* config/riscv/vector.md: Include andes-vector.md.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp:
* gcc.target/riscv/rvv/xandesvector/nds_vfwcvt.c: New test.
---
 gcc/common/config/riscv/riscv-common.cc   |   2 +
 gcc/config.gcc|   4 +-
 .../riscv/andes-vector-builtins-bases.cc  | 103 ++
 .../riscv/andes-vector-builtins-bases.h   |  33 ++
 .../riscv/andes-vector-builtins-functions.def |  45 
 gcc/config/riscv/andes-vector.md  |  51 +
 gcc/config/riscv/andes_vector.h   |  32 ++
 gcc/config/riscv/riscv-vector-builtins.cc |  21 
 gcc/config/riscv/riscv-vector-builtins.def|   2 +
 gcc/config/riscv/riscv-vector-builtins.h  |   5 +
 gcc/config/riscv/t-riscv  |  15 +++
 gcc/config/riscv/vector-iterators.md  |  13 +++
 gcc/config/riscv/vector.md|   1 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
 .../riscv/rvv/xandesvector/nds_vfwcvt.c   |  37 +++
 15 files changed, 364 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/andes-vector-builtins-bases.cc
 create mode 100644 gcc/config/riscv/andes-vector-builtins-bases.h
 create mode 100644 gcc/config/riscv/andes-vector-builtins-functions.def
 create mode 100644 gcc/config/riscv/andes-vector.md
 create mode 100644 gcc/config/riscv/andes_vector.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xandesvector/nds_vfwcvt.c

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 82037a334528..2e20eee87902 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1528,6 +1528,8 @@ static const riscv_extra_ext_flag_table_t riscv_extra_ext_flag_table[] =
   RISCV_EXT_FLAG_ENTRY ("xtheadvector",  x_riscv_isa_flags, MASK_FULL_V),
   RISCV_EXT_FLAG_ENTRY ("xtheadvector",  x_riscv_isa_flags, MASK_VECTOR),
 
+  RISCV_EXT_FLAG_ENTRY ("xandesvbfhcvt", x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_BF_16),
+
   {NULL, NULL, NULL, 0}
 };
 
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 8ed111392bb4..45a17fbb452b 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -549,10 +549,10 @@ riscv*)
cpu_type=riscv
extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-string.o"
extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o riscv-vector-costs.o riscv-avlprop.o riscv-vect-permconst.o"
-   extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o sifive-vector-builtins-bases.o"
+   extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o sifive-vector-builtins-bases.o andes-vector-builtins-bases.o"
extra_objs="${extra_objs} thead.o riscv-target-attr.o riscv-zicfilp.o"
d_target_objs="riscv-d.o"
-   extra_headers="riscv_vector.h riscv_crypto.h riscv_bitmanip.h riscv_th_vector.h sifive_vector.h"
+   extra_headers="riscv_vector.h riscv_crypto.h riscv_bitmanip.h riscv_th_vector.h sifive_vector.h andes_vector.h"
target_gtfiles="$target_gtfiles \$(srcdir)/config/riscv/riscv-vector-builtins.cc"
target_gtfiles="$target_gtfiles \$(srcdir)/config/riscv/riscv-vector-builtins.h"
extra_options="${extra_options} riscv/riscv-ext.opt"
diff

[PATCH v3 0/5] vect: Misalign for gather/scatter.

2025-07-11 Thread Robin Dapp

Hi,

this is v3 with the patch split up into multiple ones.  Sorry that it
took so long...
The first two patches and the fifth are independent, but (3) and (4) are
not, so they would be squashed.  Patch 4 is still the largest one and now
includes the suggested refactoring.  It's still not terribly obvious
what's happening, but maybe it is better than before.

The strided loads/stores will need the same changes at some point.
Even though we treat them as special cases of gathers/scatters right
now (and also perform all the same checks), it's not intuitive that
their IFNs don't have an alias pointer.  I deferred that for now, though.

The whole series was regtested and bootstrapped on x86, aarch64, and
power10, and I built the patches individually on x86 as well as riscv.
It was also regtested on rv64gcv_zvl512b.

Robin Dapp (5):
  ifn: Add helper functions for gather/scatter.
  vect: Add helper macros for gather/scatter.
  vect: Add is_gather_scatter argument to misalignment hook.
  vect: Misalign checks for gather/scatter.
  riscv: testsuite: Fix misalignment check.

 gcc/config/aarch64/aarch64.cc |  12 +-
 gcc/config/arm/arm.cc |  12 +-
 gcc/config/epiphany/epiphany.cc   |   8 +-
 gcc/config/gcn/gcn.cc |   6 +-
 gcc/config/loongarch/loongarch.cc |   8 +-
 gcc/config/riscv/riscv.cc |  29 +++-
 gcc/config/rs6000/rs6000.cc   |   6 +-
 gcc/config/s390/s390.cc   |   6 +-
 gcc/doc/tm.texi   |   8 +-
 gcc/internal-fn.cc|  96 +--
 gcc/internal-fn.h |   3 +
 gcc/optabs-query.cc   |   6 +-
 gcc/target.def|  14 +-
 gcc/targhooks.cc  |   2 +
 gcc/targhooks.h   |   2 +-
 gcc/testsuite/lib/target-supports.exp |   2 +-
 gcc/tree-vect-data-refs.cc|  15 +-
 gcc/tree-vect-patterns.cc |  17 +-
 gcc/tree-vect-slp.cc  |  28 ++--
 gcc/tree-vect-stmts.cc| 231 --
 gcc/tree-vectorizer.h |  12 ++
 21 files changed, 363 insertions(+), 160 deletions(-)

-- 
2.50.0

[PATCH v3 2/5] vect: Add helper macros for gather/scatter.

2025-07-11 Thread Robin Dapp

This encapsulates the ways of handling gather/scatter (via an IFN, via a
legacy builtin function, or emulated) in three defines:

  GATHER_SCATTER_IFN_P
  GATHER_SCATTER_LEGACY_P
  GATHER_SCATTER_EMULATED_P

and introduces a helper define for SLP operand handling as well.
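To make the intent of these defines concrete, here is a self-contained mock (a sketch only; `mock_gs_info`, `MOCK_*` and `count_offset_slots` are hypothetical names, not GCC's). The three predicates are modeled on the open-coded conditions the diff below replaces (`gs_info.ifn != IFN_LAST`, `gs_info.decl`, and `ifn == IFN_LAST && !decl`); the real definitions in tree-vectorizer.h are not shown here and may differ. The operand-map part assumes, from the arrays in the diff, that the first element of each map is the operand count and that -3 marks the gather/scatter offset slot.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for internal_fn and gather_scatter_info; only the
   two fields that the predicates inspect are modeled.  */
enum mock_internal_fn { MOCK_IFN_MASK_GATHER_LOAD, MOCK_IFN_LAST };

struct mock_gs_info
{
  enum mock_internal_fn ifn;  /* IFN to use, or MOCK_IFN_LAST if none.  */
  const void *decl;           /* Target builtin decl (legacy path) or NULL.  */
};

/* Modeled on the conditions replaced in vectorizable_store/load.  */
#define MOCK_GS_IFN_P(info)      ((info).ifn != MOCK_IFN_LAST)
#define MOCK_GS_LEGACY_P(info)   ((info).decl != NULL)
#define MOCK_GS_EMULATED_P(info) \
  ((info).ifn == MOCK_IFN_LAST && (info).decl == NULL)

/* Sentinel in an operand map meaning "the gather/scatter offset".  */
#define MOCK_GS_OFFSET (-3)

/* Maps list the operand count first, then one entry per operand.  */
static const int mock_off_map[] = { 1, MOCK_GS_OFFSET };
static const int mock_off_op0_map[] = { 2, MOCK_GS_OFFSET, 0 };
static const int mock_op1_op0_map[] = { 2, 1, 0 };

/* Count how many operand slots of MAP refer to the offset.  */
static int count_offset_slots (const int *map)
{
  int n = *map++;
  int count = 0;
  for (int i = 0; i < n; i++)
    if (map[i] == MOCK_GS_OFFSET)
      count++;
  return count;
}
```

Naming the sentinel means a reader of `off_op0_map` no longer has to know that a bare -3 is special.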

gcc/ChangeLog:

* tree-vect-slp.cc (GATHER_SCATTER_OFFSET): New define.
(vect_get_and_check_slp_defs): Use.
* tree-vectorizer.h (GATHER_SCATTER_LEGACY_P): New define.
(GATHER_SCATTER_IFN_P): Ditto.
(GATHER_SCATTER_EMULATED_P): Ditto.
* tree-vect-stmts.cc (vectorizable_store): Use.
(vectorizable_load): Use.
---
 gcc/tree-vect-slp.cc   | 12 +++-
 gcc/tree-vect-stmts.cc | 19 +--
 gcc/tree-vectorizer.h  |  8 
 3 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index ad75386926a..0c95ed946bb 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -507,6 +507,8 @@ vect_def_types_match (enum vect_def_type dta, enum vect_def_type dtb)
  && (dtb == vect_external_def || dtb == vect_constant_def)));
 }
 
+#define GATHER_SCATTER_OFFSET (-3)
+
 static const int no_arg_map[] = { 0 };
 static const int arg0_map[] = { 1, 0 };
 static const int arg1_map[] = { 1, 1 };
@@ -516,10 +518,10 @@ static const int arg1_arg4_arg5_map[] = { 3, 1, 4, 5 };
 static const int arg1_arg3_arg4_map[] = { 3, 1, 3, 4 };
 static const int arg3_arg2_map[] = { 2, 3, 2 };
 static const int op1_op0_map[] = { 2, 1, 0 };
-static const int off_map[] = { 1, -3 };
-static const int off_op0_map[] = { 2, -3, 0 };
-static const int off_arg2_arg3_map[] = { 3, -3, 2, 3 };
-static const int off_arg3_arg2_map[] = { 3, -3, 3, 2 };
+static const int off_map[] = { 1, GATHER_SCATTER_OFFSET };
+static const int off_op0_map[] = { 2, GATHER_SCATTER_OFFSET, 0 };
+static const int off_arg2_arg3_map[] = { 3, GATHER_SCATTER_OFFSET, 2, 3 };
+static const int off_arg3_arg2_map[] = { 3, GATHER_SCATTER_OFFSET, 3, 2 };
 static const int mask_call_maps[6][7] = {
   { 1, 1, },
   { 2, 1, 2, },
@@ -691,7 +693,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
 {
   oprnd_info = (*oprnds_info)[i];
   int opno = map ? map[i] : int (i);
-  if (opno == -3)
+  if (opno == GATHER_SCATTER_OFFSET)
{
  gcc_assert (STMT_VINFO_GATHER_SCATTER_P (stmt_info));
  if (!is_a  (vinfo)
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 4aa69da2218..57942f43c3b 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2455,7 +2455,7 @@ get_load_store_type (vec_info  *vinfo, stmt_vec_info stmt_info,
 If that failed for some reason (e.g. because another pattern
 took priority), just handle cases in which the offset already
 has the right type.  */
-  else if (gs_info->ifn != IFN_LAST
+  else if (GATHER_SCATTER_IFN_P (*gs_info)
   && !is_gimple_call (stmt_info->stmt)
   && !tree_nop_conversion_p (TREE_TYPE (gs_info->offset),
  TREE_TYPE (gs_info->offset_vectype)))
@@ -8368,7 +8368,8 @@ vectorizable_store (vec_info *vinfo,
}
   else if (memory_access_type != VMAT_LOAD_STORE_LANES
   && (memory_access_type != VMAT_GATHER_SCATTER
-  || (gs_info.decl && !VECTOR_BOOLEAN_TYPE_P (mask_vectype
+  || (GATHER_SCATTER_LEGACY_P (gs_info)
+  && !VECTOR_BOOLEAN_TYPE_P (mask_vectype
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -8376,8 +8377,7 @@ vectorizable_store (vec_info *vinfo,
  return false;
}
   else if (memory_access_type == VMAT_GATHER_SCATTER
-  && gs_info.ifn == IFN_LAST
-  && !gs_info.decl)
+  && GATHER_SCATTER_EMULATED_P (gs_info))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -9103,7 +9103,7 @@ vectorizable_store (vec_info *vinfo,
   final_mask, vec_mask, gsi);
}
 
- if (gs_info.ifn != IFN_LAST)
+ if (GATHER_SCATTER_IFN_P (gs_info))
{
  if (costing_p)
{
@@ -9166,7 +9166,7 @@ vectorizable_store (vec_info *vinfo,
  vect_finish_stmt_generation (vinfo, stmt_info, call, gsi);
  new_stmt = call;
}
-  else if (gs_info.decl)
+ else if (GATHER_SCATTER_LEGACY_P (gs_info))
{
  /* The builtin decls path for scatter is legacy, x86 only.  */
  gcc_assert (nunits.is_constant ()
@@ -10077,8 +10077,7 @@ vectorizable_load (vec_info *vinfo,
  return false;
}
   else if (memory_access_type == VMAT_GATHER_SCATTER
-  && gs_info.ifn == IFN_LAST
-  && !gs_info.decl)
+  && GATHER_SCATTER_EMULATED_P (gs_info))

[PATCH 5/7 v2] RISC-V: Add support for the XAndesvsintload ISA extension.

2025-07-11 Thread Kuan-Lin Chen

This extension defines vector load instructions to move sign-extended or
zero-extended INT4 data into 8-bit vector register elements.
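A scalar sketch of the element semantics (hypothetical helper names; not the intrinsics added by the patch). The nibble order is an assumption here: element i is taken from byte i/2, low nibble first. The signed variant maps 0x8..0xF to -8..-1, while the unsigned variant keeps 0..15.

```c
#include <stddef.h>
#include <stdint.h>

/* Signed INT4 -> int8 unpacking.  ASSUMPTION: element i lives in byte i/2,
   low nibble first.  */
static void vln8_model (int8_t *dst, const uint8_t *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      uint8_t nib = (uint8_t) ((src[i / 2] >> ((i % 2) * 4)) & 0xF);
      /* Sign-extend the 4-bit value: 0x8..0xF become -8..-1.  */
      dst[i] = (int8_t) ((nib ^ 0x8) - 0x8);
    }
}

/* Unsigned INT4 -> uint8 unpacking with the same (assumed) nibble order.  */
static void vlnu8_model (uint8_t *dst, const uint8_t *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (uint8_t) ((src[i / 2] >> ((i % 2) * 4)) & 0xF);
}
```

With the packed bytes {0xF7, 0x08}, the signed model yields {7, -1, -8, 0} and the unsigned model yields {7, 15, 8, 0}.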

gcc/ChangeLog:

* config/riscv/andes-vector-builtins-bases.cc
(nds_nibbleload): New class.
* config/riscv/andes-vector-builtins-bases.h (nds_vln8): New def.
(nds_vlnu8): Ditto.
* config/riscv/andes-vector-builtins-functions.def (nds_vln8): Ditto.
(nds_vlnu8): Ditto.
* config/riscv/andes-vector.md (@pred_intload_mov): New pattern.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_Q_OPS): New def.
(DEF_RVV_QU_OPS): Ditto.
* config/riscv/riscv-vector-builtins.cc
(q_v_void_const_ptr_ops): New operand information.
(qu_v_void_const_ptr_ops): Ditto.
* config/riscv/riscv-vector-builtins.def (void_const_ptr): New def.
* config/riscv/riscv-vector-builtins.h (enum required_ext): Ditto.
(required_ext_to_isa_name): Add case XANDESVSINTLOAD_EXT.
(required_extensions_specified): Ditto.
* config/riscv/vector-iterators.md (NDS_QVI): New iterator.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add regression for xandesvector.
* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vln8.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vln8.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vln8.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vln8.c: New test.
---
 .../riscv/andes-vector-builtins-bases.cc  |  30 -
 .../riscv/andes-vector-builtins-bases.h   |   2 +
 .../riscv/andes-vector-builtins-functions.def |   5 +
 gcc/config/riscv/andes-vector.md  |  27 
 .../riscv/riscv-vector-builtins-types.def |  30 +
 gcc/config/riscv/riscv-vector-builtins.cc |  32 +
 gcc/config/riscv/riscv-vector-builtins.def|   1 +
 gcc/config/riscv/riscv-vector-builtins.h  |   5 +
 gcc/config/riscv/vector-iterators.md  |   5 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|  10 ++
 .../non-policy/non-overloaded/nds_vln8.c  |  62 +
 .../non-policy/overloaded/nds_vln8.c  |  34 +
 .../policy/non-overloaded/nds_vln8.c  | 118 ++
 .../xandesvector/policy/overloaded/nds_vln8.c | 118 ++
 14 files changed, 478 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vln8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vln8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vln8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vln8.c

diff --git a/gcc/config/riscv/andes-vector-builtins-bases.cc b/gcc/config/riscv/andes-vector-builtins-bases.cc
index 69e16fd94543..2c19f32225b9 100644
--- a/gcc/config/riscv/andes-vector-builtins-bases.cc
+++ b/gcc/config/riscv/andes-vector-builtins-bases.cc
@@ -89,8 +89,35 @@ public:
   }
 };
 
+/* Implements Andes vln8.v/vlnu8.v.  */
+template 
+class nds_nibbleload : public function_base
+{
+public:
+  unsigned int call_properties (const function_instance &) const override
+  {
+return CP_READ_MEMORY;
+  }
+
+  bool can_be_overloaded_p (enum predication_type_index pred) const override
+  {
+return pred != PRED_TYPE_none;
+  }
+
+  rtx expand (function_expander &e) const override
+  {
+if (SIGN)
+  return e.use_contiguous_load_insn (
+   code_for_pred_intload_mov (SIGN_EXTEND, e.vector_mode ()));
+return e.use_contiguous_load_insn (
+  code_for_pred_intload_mov (ZERO_EXTEND, e.vector_mode ()));
+  }
+};
+
 static CONSTEXPR const nds_vfwcvt nds_vfwcvt_obj;
 static CONSTEXPR const nds_vfncvt nds_vfncvt_obj;
+static CONSTEXPR const nds_nibbleload nds_vln8_obj;
+static CONSTEXPR const nds_nibbleload nds_vlnu8_obj;
 
 /* Declare the function base NAME, pointing it to an instance
of class _obj.  */
@@ -99,5 +126,6 @@ static CONSTEXPR const nds_vfncvt nds_vfncvt_obj;
 
 BASE (nds_vfwcvt)
 BASE (nds_vfncvt)
-
+BASE (nds_vln8)
+BASE (nds_vlnu8)
 } // end namespace riscv_vector
diff --git a/gcc/config/riscv/andes-vector-builtins-bases.h b/gcc/config/riscv/andes-vector-builtins-bases.h
index 7d11761d8f6e..d983b44d2e9d 100644
--- a/gcc/config/riscv/andes-vector-builtins-bases.h
+++ b/gcc/config/riscv/andes-vector-builtins-bases.h
@@ -26,6 +26,8 @@ namespace riscv_vector {
 namespace bases {
 extern const function_base *const nds_vfwcvt;
 extern const function_base *const nds_vfncvt;
+extern const function_base *const nds_vln8;
+extern const function_base *const nds_vlnu8;
 }
 
 } // end namespace riscv_vector
diff --git a/gcc/config/riscv/andes-vector-builtins-functions.def b/gcc/config/riscv/andes-vector-builtins-functions.def
index 989db8c71bab..ebb0de3217ea 100644
--- a

[PATCH 6/7 v2] RISC-V: Add support for the XAndesvpackfph ISA extension.

2025-07-11 Thread Kuan-Lin Chen

This extension defines vector instructions that extract a pair of FP16
values from a floating-point register, multiply the FP16 vector elements
by the top FP16 value, and add the bottom FP16 value to the result.
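A simplified scalar model of the "top" form described above, for illustration only. It assumes that "top" means bits 31:16 and "bottom" means bits 15:0 of a 32-bit scalar. It also computes in `float`, not in FP16 precision, so its rounding can differ from the hardware's. `half_to_float` and `vfpmadt_model` are hypothetical helpers, and the "bottom" variant is not modeled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode an IEEE-754 binary16 bit pattern into a float (exact).  */
static float half_to_float (uint16_t h)
{
  uint32_t sign = (uint32_t) (h >> 15) << 31;
  uint32_t exp = (h >> 10) & 0x1Fu;
  uint32_t mant = h & 0x3FFu;
  uint32_t bits;
  if (exp == 0x1Fu)                  /* Inf or NaN.  */
    bits = sign | 0x7F800000u | (mant << 13);
  else if (exp != 0)                 /* Normal: rebias 15 -> 127.  */
    bits = sign | ((exp + 112u) << 23) | (mant << 13);
  else if (mant == 0)                /* Signed zero.  */
    bits = sign;
  else                               /* Subnormal: normalize.  */
    {
      uint32_t e = 113u;             /* Biased FP32 exponent of 2^-14.  */
      while ((mant & 0x400u) == 0)
        {
          mant <<= 1;
          e--;
        }
      bits = sign | (e << 23) | ((mant & 0x3FFu) << 13);
    }
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* dst[i] = src[i] * top + bottom, with TOP/BOTTOM taken from the (assumed)
   high/low halves of PACKED.  Computed in float, not FP16.  */
static void vfpmadt_model (float *dst, const float *src, size_t n,
                           uint32_t packed)
{
  float top = half_to_float ((uint16_t) (packed >> 16));
  float bottom = half_to_float ((uint16_t) (packed & 0xFFFFu));
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] * top + bottom;
}
```

With `packed = 0x40003800` (top 2.0, bottom 0.5) and inputs {1.0, 3.0}, the model produces {2.5, 6.5}.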

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc:
Turn on VECTOR_ELEN_FP_16 for XAndesvpackfph.
* config/riscv/andes-vector-builtins-bases.cc (nds_vfpmad): New class.
* config/riscv/andes-vector-builtins-bases.h: New def.
* config/riscv/andes-vector-builtins-functions.def (nds_vfpmadt): Ditto.
(nds_vfpmadb): Ditto.
(nds_vfpmadt_frm): Ditto.
(nds_vfpmadb_frm): Ditto.
* config/riscv/andes-vector.md (@pred_nds_vfpmad):
New pattern.
* config/riscv/riscv-vector-builtins-types.def
(DEF_RVV_F16_OPS): New def.
* config/riscv/riscv-vector-builtins.cc (f16_ops): Ditto.
* config/riscv/riscv-vector-builtins.def (float32_type_node): Ditto.
* config/riscv/riscv-vector-builtins.h (XANDESVPACKFPH_EXT): Ditto.
(required_ext_to_isa_name): Add case XANDESVPACKFPH_EXT.
(required_extensions_specified): Ditto.
* config/riscv/vector-iterators.md (VHF): New iterator.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vfpmadb.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vfpmadt.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vfpmadb.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vfpmadt.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vfpmadb.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vfpmadt.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vfpmadb.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vfpmadt.c: New test.
---
 gcc/common/config/riscv/riscv-common.cc   |   3 +-
 .../riscv/andes-vector-builtins-bases.cc  |  25 +++
 .../riscv/andes-vector-builtins-bases.h   |   4 +
 .../riscv/andes-vector-builtins-functions.def |   8 +
 gcc/config/riscv/andes-vector.md  |  32 +++
 .../riscv/riscv-vector-builtins-types.def |  14 ++
 gcc/config/riscv/riscv-vector-builtins.cc |  19 ++
 gcc/config/riscv/riscv-vector-builtins.def|   1 +
 gcc/config/riscv/riscv-vector-builtins.h  |   5 +
 gcc/config/riscv/vector-iterators.md  |   5 +
 .../non-policy/non-overloaded/nds_vfpmadb.c   | 103 +
 .../non-policy/non-overloaded/nds_vfpmadt.c   | 103 +
 .../non-policy/overloaded/nds_vfpmadb.c   | 103 +
 .../non-policy/overloaded/nds_vfpmadt.c   | 103 +
 .../policy/non-overloaded/nds_vfpmadb.c   | 199 ++
 .../policy/non-overloaded/nds_vfpmadt.c   | 199 ++
 .../policy/overloaded/nds_vfpmadb.c   | 199 ++
 .../policy/overloaded/nds_vfpmadt.c   | 199 ++
 18 files changed, 1323 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vfpmadb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vfpmadt.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vfpmadb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vfpmadt.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vfpmadb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vfpmadt.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vfpmadb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vfpmadt.c

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 2e20eee87902..85783989afbc 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1528,7 +1528,8 @@ static const riscv_extra_ext_flag_table_t riscv_extra_ext_flag_table[] =
   RISCV_EXT_FLAG_ENTRY ("xtheadvector",  x_riscv_isa_flags, MASK_FULL_V),
   RISCV_EXT_FLAG_ENTRY ("xtheadvector",  x_riscv_isa_flags, MASK_VECTOR),
 
-  RISCV_EXT_FLAG_ENTRY ("xandesvbfhcvt", x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_BF_16),
+  RISCV_EXT_FLAG_ENTRY ("xandesvbfhcvt",  x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_BF_16),
+  RISCV_EXT_FLAG_ENTRY ("xandesvpackfph", x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_FP_16),
 
   {NULL, NULL, NULL, 0}
 };
diff --git a/gcc/config/riscv/andes-vector-builtins-bases.cc b/gcc/config/riscv/andes-vector-builtins-bases.cc
index 2c19f32225b9..1bf8b9dc088e 100644
--- a/gcc/config/riscv/andes-vector-builtins-bases.cc
+++ b/gcc/config/riscv/andes-vec

[PATCH v3 3/5] vect: Add is_gather_scatter argument to misalignment hook.

2025-07-11 Thread Robin Dapp

This patch adds an is_gather_scatter argument to the
support_vector_misalignment hook.  All targets except riscv ignore
alignment for gather/scatter, so they return true when
is_gather_scatter is set.

gcc/ChangeLog:

* config/aarch64/aarch64.cc 
(aarch64_builtin_support_vector_misalignment):
Return true for gather/scatter.
* config/arm/arm.cc (arm_builtin_support_vector_misalignment):
Ditto.
* config/epiphany/epiphany.cc (epiphany_support_vector_misalignment):
Ditto.
* config/gcn/gcn.cc (gcn_vectorize_support_vector_misalignment):
Ditto.
* config/loongarch/loongarch.cc 
(loongarch_builtin_support_vector_misalignment):
Ditto.
* config/riscv/riscv.cc (riscv_support_vector_misalignment):
Add gather/scatter argument.
* config/rs6000/rs6000.cc (rs6000_builtin_support_vector_misalignment):
Return true for gather/scatter.
* config/s390/s390.cc (s390_support_vector_misalignment):
Ditto.
* doc/tm.texi: Add argument.
* target.def: Ditto.
* targhooks.cc (default_builtin_support_vector_misalignment):
Ditto.
* targhooks.h (default_builtin_support_vector_misalignment):
Ditto.
* tree-vect-data-refs.cc (vect_supportable_dr_alignment):
Ditto.
---
 gcc/config/aarch64/aarch64.cc | 12 +---
 gcc/config/arm/arm.cc | 12 +---
 gcc/config/epiphany/epiphany.cc   |  8 ++--
 gcc/config/gcn/gcn.cc |  6 +-
 gcc/config/loongarch/loongarch.cc |  8 ++--
 gcc/config/riscv/riscv.cc | 29 +++--
 gcc/config/rs6000/rs6000.cc   |  6 +-
 gcc/config/s390/s390.cc   |  6 --
 gcc/doc/tm.texi   |  8 +---
 gcc/target.def| 14 +-
 gcc/targhooks.cc  |  2 ++
 gcc/targhooks.h   |  2 +-
 gcc/tree-vect-data-refs.cc|  2 +-
 13 files changed, 85 insertions(+), 30 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 10b8ed5d387..0162c724cd2 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -356,7 +356,8 @@ static int aarch64_address_cost (rtx, machine_mode, addr_space_t, bool);
 static bool aarch64_builtin_support_vector_misalignment (machine_mode mode,
 const_tree type,
 int misalignment,
-bool is_packed);
+bool is_packed,
+bool is_gather_scatter);
 static machine_mode aarch64_simd_container_mode (scalar_mode, poly_int64);
 static bool aarch64_print_address_internal (FILE*, machine_mode, rtx,
aarch64_addr_query_type);
@@ -24401,10 +24402,14 @@ aarch64_simd_vector_alignment_reachable (const_tree type, bool is_packed)
 static bool
 aarch64_builtin_support_vector_misalignment (machine_mode mode,
 const_tree type, int misalignment,
-bool is_packed)
+bool is_packed,
+bool is_gather_scatter)
 {
   if (TARGET_SIMD && STRICT_ALIGNMENT)
 {
+  if (is_gather_scatter)
+   return true;
+
   /* Return if movmisalign pattern is not supported for this mode.  */
   if (optab_handler (movmisalign_optab, mode) == CODE_FOR_nothing)
 return false;
@@ -24414,7 +24419,8 @@ aarch64_builtin_support_vector_misalignment (machine_mode mode,
return false;
 }
   return default_builtin_support_vector_misalignment (mode, type, misalignment,
- is_packed);
+ is_packed,
+ is_gather_scatter);
 }
 
 /* If VALS is a vector constant that can be loaded into a register
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index bde06f3fa86..29b45ae96bd 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -289,7 +289,8 @@ static bool arm_vector_alignment_reachable (const_tree type, bool is_packed);
 static bool arm_builtin_support_vector_misalignment (machine_mode mode,
 const_tree type,
 int misalignment,
-bool is_packed);
+bool is_packed,
+bool is_gather_scatter);
 static void arm_conditional_register_usage (void);
 static enum flt_eval_method arm_excess_precision (enum excess_precision_t

[PATCH] aarch64: Support unpacked SVE integer division

2025-07-11 Thread Spencer Abson

This patch extends the existing patterns for SVE_INT_BINARY_SD to
support partial SVE integer modes, including the patterns that
implement the conditional form.
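For orientation, here is an illustrative loop (not taken from the patch's tests, whose contents are not shown here). It has the shape that the "predicated integer division, merging with the first input" patterns describe: lanes whose predicate is false keep `a[i]`. Because 32-bit data is mixed with a 64-bit predicate array, a vectorizer targeting SVE might choose a partial mode with 32-bit elements in 64-bit containers. Whether it actually does so depends on the compiler version and options; this is an assumption, not a verified code-generation result.

```c
#include <stddef.h>
#include <stdint.h>

/* Conditional division: active lanes compute a / b, inactive lanes keep a.
   The ternary also avoids evaluating a division for inactive lanes.  */
void cond_div (int32_t *restrict r, const int32_t *restrict a,
               const int32_t *restrict b, const int64_t *restrict pred,
               size_t n)
{
  for (size_t i = 0; i < n; i++)
    r[i] = pred[i] ? a[i] / b[i] : a[i];
}
```

Note that C integer division truncates towards zero, so -9 / 2 is -4. This matches what SVE's SDIV computes.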

gcc/ChangeLog:

* config/aarch64/aarch64-sve.md (3): Extend
to SVE_SDI_SIMD.
(@aarch64_pred_): Likewise.
(@cond_): Extend to SVE_SDI.
(*cond__2): Likewise.
(*cond__3): Likewise.
(*cond__any): Likewise.
* config/aarch64/iterators.md (SVE_SDI): New iterator for
all SVE vector modes with 32-bit or 64-bit elements.
(SVE_SDI_SIMD): New iterator.  As above, but including
V4SI and V2DI.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/sve/cond_arith_1.C: Rename TEST_SHIFT
to TEST_OP, add tests for SDIV and UDIV.
* g++.target/aarch64/sve/cond_arith_2.C: Likewise.
* g++.target/aarch64/sve/cond_arith_3.C: Likewise.
* g++.target/aarch64/sve/cond_arith_4.C: Likewise.
* gcc.target/aarch64/sve/div_2.c: New test.

---

Bootstrapped & regtested on aarch64-linux-gnu.  OK for master?

Thanks,
Spencer

---
 gcc/config/aarch64/aarch64-sve.md | 64 +--
 gcc/config/aarch64/iterators.md   |  7 ++
 .../g++.target/aarch64/sve/cond_arith_1.C | 25 +---
 .../g++.target/aarch64/sve/cond_arith_2.C | 25 +---
 .../g++.target/aarch64/sve/cond_arith_3.C | 27 +---
 .../g++.target/aarch64/sve/cond_arith_4.C | 27 +---
 gcc/testsuite/gcc.target/aarch64/sve/div_2.c  | 22 +++
 7 files changed, 127 insertions(+), 70 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/div_2.c

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 6b5113eb70f..871b31623bb 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -4712,12 +4712,12 @@
 ;; We can use it with Advanced SIMD modes to expose the V2DI and V4SI
 ;; optabs to the midend.
 (define_expand "3"
-  [(set (match_operand:SVE_FULL_SDI_SIMD 0 "register_operand")
-   (unspec:SVE_FULL_SDI_SIMD
+  [(set (match_operand:SVE_SDI_SIMD 0 "register_operand")
+   (unspec:SVE_SDI_SIMD
  [(match_dup 3)
-  (SVE_INT_BINARY_SD:SVE_FULL_SDI_SIMD
-(match_operand:SVE_FULL_SDI_SIMD 1 "register_operand")
-(match_operand:SVE_FULL_SDI_SIMD 2 "register_operand"))]
+  (SVE_INT_BINARY_SD:SVE_SDI_SIMD
+(match_operand:SVE_SDI_SIMD 1 "register_operand")
+(match_operand:SVE_SDI_SIMD 2 "register_operand"))]
  UNSPEC_PRED_X))]
   "TARGET_SVE"
   {
@@ -4727,12 +4727,12 @@
 
 ;; Integer division predicated with a PTRUE.
 (define_insn "@aarch64_pred_"
-  [(set (match_operand:SVE_FULL_SDI_SIMD 0 "register_operand")
-   (unspec:SVE_FULL_SDI_SIMD
+  [(set (match_operand:SVE_SDI_SIMD 0 "register_operand")
+   (unspec:SVE_SDI_SIMD
  [(match_operand: 1 "register_operand")
-  (SVE_INT_BINARY_SD:SVE_FULL_SDI_SIMD
-(match_operand:SVE_FULL_SDI_SIMD 2 "register_operand")
-(match_operand:SVE_FULL_SDI_SIMD 3 "register_operand"))]
+  (SVE_INT_BINARY_SD:SVE_SDI_SIMD
+(match_operand:SVE_SDI_SIMD 2 "register_operand")
+(match_operand:SVE_SDI_SIMD 3 "register_operand"))]
  UNSPEC_PRED_X))]
   "TARGET_SVE"
   {@ [ cons: =0 , 1   , 2 , 3 ; attrs: movprfx ]
@@ -4744,25 +4744,25 @@
 
 ;; Predicated integer division with merging.
 (define_expand "@cond_"
-  [(set (match_operand:SVE_FULL_SDI 0 "register_operand")
-   (unspec:SVE_FULL_SDI
+  [(set (match_operand:SVE_SDI 0 "register_operand")
+   (unspec:SVE_SDI
  [(match_operand: 1 "register_operand")
-  (SVE_INT_BINARY_SD:SVE_FULL_SDI
-(match_operand:SVE_FULL_SDI 2 "register_operand")
-(match_operand:SVE_FULL_SDI 3 "register_operand"))
-  (match_operand:SVE_FULL_SDI 4 "aarch64_simd_reg_or_zero")]
+  (SVE_INT_BINARY_SD:SVE_SDI
+(match_operand:SVE_SDI 2 "register_operand")
+(match_operand:SVE_SDI 3 "register_operand"))
+  (match_operand:SVE_SDI 4 "aarch64_simd_reg_or_zero")]
  UNSPEC_SEL))]
   "TARGET_SVE"
 )
 
 ;; Predicated integer division, merging with the first input.
 (define_insn "*cond__2"
-  [(set (match_operand:SVE_FULL_SDI 0 "register_operand")
-   (unspec:SVE_FULL_SDI
+  [(set (match_operand:SVE_SDI 0 "register_operand")
+   (unspec:SVE_SDI
  [(match_operand: 1 "register_operand")
-  (SVE_INT_BINARY_SD:SVE_FULL_SDI
-(match_operand:SVE_FULL_SDI 2 "register_operand")
-(match_operand:SVE_FULL_SDI 3 "register_operand"))
+  (SVE_INT_BINARY_SD:SVE_SDI
+(match_operand:SVE_SDI 2 "register_operand")
+(match_operand:SVE_SDI 3 "register_operand"))
   (match_dup 2)]
  UNSPEC_SEL))]
   "TARGET_SVE"
@@ -4774,12 +4774,12 @@
 
 ;; Predicated integer division, merging with the second input.
 (define_insn "*c

[PATCH] tree-optimization/121034 - fix reduction vectorization

2025-07-11 Thread Richard Biener

The following fixes the loop following the reduction chain to
properly visit all SLP nodes involved and makes the stmt info
and the SLP node we track match.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/121034
* tree-vect-loop.cc (vectorizable_reduction): Cleanup
reduction chain following code.

* gcc.dg/vect/pr121034.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr121034.c | 17 +++
 gcc/tree-vect-loop.cc| 31 ++--
 2 files changed, 32 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr121034.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr121034.c b/gcc/testsuite/gcc.dg/vect/pr121034.c
new file mode 100644
index 000..de207814aa9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr121034.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+
+int b, e;
+char c, d;
+unsigned g;
+int abs(int);
+void f() {
+  char *a = &d;
+  int h;
+  for (; e; e++) {
+h = 0;
+for (; h < 16; h++)
+  g += __builtin_abs(a[h] - c);
+a += b;
+  }
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 54ac92b8a58..21c95100aef 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7318,17 +7318,17 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   unsigned reduc_chain_length = 0;
   bool only_slp_reduc_chain = true;
   stmt_info = NULL;
-  slp_tree slp_for_stmt_info = slp_node_instance->root;
+  slp_tree slp_for_stmt_info = NULL;
+  slp_tree vdef_slp = slp_node_instance->root;
   /* For double-reductions we start SLP analysis at the inner loop LC PHI
  which is the def of the outer loop live stmt.  */
   if (STMT_VINFO_DEF_TYPE (reduc_info) == vect_double_reduction_def)
-slp_for_stmt_info = SLP_TREE_CHILDREN (slp_for_stmt_info)[0];
+vdef_slp = SLP_TREE_CHILDREN (vdef_slp)[0];
   while (reduc_def != PHI_RESULT (reduc_def_phi))
 {
   stmt_vec_info def = loop_vinfo->lookup_def (reduc_def);
   stmt_vec_info vdef = vect_stmt_to_vectorize (def);
   int reduc_idx = STMT_VINFO_REDUC_IDX (vdef);
-
   if (reduc_idx == -1)
{
  if (dump_enabled_p ())
@@ -7345,14 +7345,9 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 the SLP node with live lane zero the other live lanes also
 need to be identified as part of a reduction to be able
 to skip code generation for them.  */
-  if (slp_for_stmt_info)
-   {
- for (auto s : SLP_TREE_SCALAR_STMTS (slp_for_stmt_info))
-   if (STMT_VINFO_LIVE_P (s))
- STMT_VINFO_REDUC_DEF (vect_orig_stmt (s)) = phi_info;
-   }
-  else if (STMT_VINFO_LIVE_P (vdef))
-   STMT_VINFO_REDUC_DEF (def) = phi_info;
+  for (auto s : SLP_TREE_SCALAR_STMTS (vdef_slp))
+   if (STMT_VINFO_LIVE_P (s))
+ STMT_VINFO_REDUC_DEF (vect_orig_stmt (s)) = phi_info;
   gimple_match_op op;
   if (!gimple_extract_op (vdef->stmt, &op))
{
@@ -7371,12 +7366,16 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 "conversion in the reduction chain.\n");
  return false;
}
+ vdef_slp = SLP_TREE_CHILDREN (vdef_slp)[0];
}
   else
{
  /* First non-conversion stmt.  */
  if (!stmt_info)
-   stmt_info = vdef;
+   {
+ stmt_info = vdef;
+ slp_for_stmt_info = vdef_slp;
+   }
 
  if (lane_reducing_op_p (op.code))
{
@@ -7384,7 +7383,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 reduction.  */
  gcc_assert (reduc_idx > 0 && reduc_idx == (int) op.num_ops - 1);
 
- slp_tree op_node = SLP_TREE_CHILDREN (slp_for_stmt_info)[0];
+ slp_tree op_node = SLP_TREE_CHILDREN (vdef_slp)[0];
  tree vectype_op = SLP_TREE_VECTYPE (op_node);
  tree type_op = TREE_TYPE (op.ops[0]);
  if (!vectype_op)
@@ -7415,14 +7414,14 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   < GET_MODE_SIZE (SCALAR_TYPE_MODE (type_op
vectype_in = vectype_op;
}
- else
+ else if (!vectype_in)
vectype_in = STMT_VINFO_VECTYPE (phi_info);
+ if (!REDUC_GROUP_FIRST_ELEMENT (vdef))
+   vdef_slp = SLP_TREE_CHILDREN (vdef_slp)[reduc_idx];
}
 
   reduc_def = op.ops[reduc_idx];
   reduc_chain_length++;
-  if (!stmt_info)
-   slp_for_stmt_info = SLP_TREE_CHILDREN (slp_for_stmt_info)[0];
 }
   /* PHIs should not participate in patterns.  */
   gcc_assert (!STMT_VINFO_RELATED_STMT (phi_info));
-- 
2.43.0

[PATCH][v2] Reject single lane vector types for SLP build

2025-07-11 Thread Richard Biener

The following makes us never consider vector(1) T types for
vectorization and ensures this during SLP build.  This is a
long-standing issue for BB vectorization and when we remove
early loop vector type setting we lose the single place we have
that rejects this for loops.
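The patch rejects the type in `vect_build_slp_tree_1` with `known_le (TYPE_VECTOR_SUBPARTS (vectype), 1U)`. The lane count is a `poly_uint64`, so the test has to hold for every possible runtime vector length. The following is a minimal standalone model, assuming a two-coefficient value `c0 + c1 * x` with a runtime indeterminate `x >= 0`. `PolyNunits` and `known_le_const` are hypothetical names and are not GCC's `poly_int` API:

```cpp
#include <cstdint>

// Hypothetical model of a degree-1 poly_int lane count: c0 + c1 * x,
// where x >= 0 is only known at run time (scalable vectors).
struct PolyNunits {
  std::uint64_t c0;  // constant part
  std::uint64_t c1;  // multiplier of the runtime indeterminate
};

// "Known" <= must hold for every x >= 0.  With unsigned coefficients this
// requires the constant part to be within the limit and no scaled part.
constexpr bool known_le_const(PolyNunits n, std::uint64_t limit)
{
  return n.c0 <= limit && n.c1 == 0;
}

// A fixed-length vector(1) type is rejected...
static_assert(known_le_const({1, 0}, 1), "vector(1) is rejected");
// ...fixed-length vectors with more lanes are kept...
static_assert(!known_le_const({2, 0}, 1), "vector(2) is kept");
// ...and so is a scalable type whose lane count is 1 + 1 * x.
static_assert(!known_le_const({1, 1}, 1), "scalable [1, 1] is kept");
```

Under this model, only types that are single-lane for every runtime length are rejected; a scalable type that merely may have one lane passes the check.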

Once we implement partial loop vectorization we should revisit
this, but then use the original scalar types for the unvectorized
parts.

[v2]: With the patch adjusted, more RISC-V fallout was reported, but
I cannot reproduce it, so I think it's spurious.

* tree-vect-slp.cc (vect_build_slp_tree_1): Reject
single-lane vector types.

* gcc.dg/vect/bb-slp-39.c: Adjust.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-39.c | 3 +--
 gcc/tree-vect-slp.cc  | 9 +
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-39.c b/gcc/testsuite/gcc.dg/vect/bb-slp-39.c
index f05ce8f2847..255bb1095dc 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-39.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-39.c
@@ -16,5 +16,4 @@ void foo (double *p)
 }
 
 /* See that we vectorize three SLP instances.  */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "slp2" { target { ! { s390*-*-* riscv*-*-* } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 5 "slp2" { target {   s390*-*-* riscv*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "slp2" } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index ad75386926a..d2ce4ffaa4f 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1114,6 +1114,15 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
   matches[0] = false;
   return false;
 }
+  if (known_le (TYPE_VECTOR_SUBPARTS (vectype), 1U))
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"Build SLP failed: not using single lane "
+"vector type %T\n", vectype);
+  matches[0] = false;
+  return false;
+}
   /* Record nunits required but continue analysis, producing matches[]
  as if nunits was not an issue.  This allows splitting of groups
  to happen.  */
-- 
2.43.0

[PATCH] libstdc++: Fix constexpr exceptions for -fno-exceptions

2025-07-11 Thread Jonathan Wakely

The if-consteval branches in std::make_exception_ptr and
std::exception_ptr_cast use a try-catch block, which gives an error for
-fno-exceptions. Just make them return a null pointer at compile-time
when -fno-exceptions is used, because there's no way to get an active
exception with -fno-exceptions.

For std::exception_ptr_cast the consteval branch doesn't depend on RTTI
being enabled, so we can move the check for __cpp_rtti into the runtime
branch. We can also remove the #else group and just fall through to the
return nullptr statement if there was no return from whichever branch of
the if-consteval was taken.
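The guard described above can be illustrated in ordinary runtime C++. This is a sketch, not the libstdc++ code: `make_eptr_sketch` is a hypothetical name, and it omits the C++26 `if consteval` context that the real branch lives in. It shows only the `__cpp_exceptions` split, where the try-catch path is compiled only when exceptions are enabled and a null `exception_ptr` is returned otherwise:

```cpp
#include <exception>
#include <stdexcept>

// Hypothetical stand-in for the consteval branch of make_exception_ptr.
template<typename Ex>
std::exception_ptr make_eptr_sketch(Ex ex) noexcept
{
#ifdef __cpp_exceptions
  // Throw a copy and capture it as the currently handled exception.
  try
    {
      throw ex;
    }
  catch (...)
    {
      return std::current_exception();
    }
#else
  // With -fno-exceptions there can never be an active exception,
  // so the only meaningful result is a null exception_ptr.
  (void) ex;
  return std::exception_ptr();
#endif
}
```

Compiled with exceptions enabled, `make_eptr_sketch(std::runtime_error("boom"))` yields a non-null pointer that `std::rethrow_exception` can rethrow; compiled with `-fno-exceptions`, the same call yields a null pointer.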

Also adjust some formatting and whitespace.

libstdc++-v3/ChangeLog:

* libsupc++/exception_ptr.h (make_exception_ptr): Return null
for consteval when -fno-exceptions is used.
(exception_ptr_cast): Likewise. Allow consteval path to work
with -fno-rtti.
---

The 17_intro and 18_support tests all pass, as do the compiler's
g++.dg/cpp26/constexpr-eh* tests. The full testsuite is running now for
x86_64-linux.

 libstdc++-v3/libsupc++/exception_ptr.h | 28 --
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/libstdc++-v3/libsupc++/exception_ptr.h b/libstdc++-v3/libsupc++/exception_ptr.h
index ee009155a39c..709d5d1740ca 100644
--- a/libstdc++-v3/libsupc++/exception_ptr.h
+++ b/libstdc++-v3/libsupc++/exception_ptr.h
@@ -83,9 +83,9 @@ namespace std _GLIBCXX_VISIBILITY(default)
 
 #if __cpp_lib_exception_ptr_cast >= 202506L
   template
-  constexpr const _Ex* exception_ptr_cast(const exception_ptr&) noexcept;
+constexpr const _Ex* exception_ptr_cast(const exception_ptr&) noexcept;
   template
-  void exception_ptr_cast(const exception_ptr&&) = delete;
+void exception_ptr_cast(const exception_ptr&&) = delete;
 #endif
 
   namespace __exception_ptr
@@ -138,8 +138,8 @@ namespace std _GLIBCXX_VISIBILITY(default)
_GLIBCXX_USE_NOEXCEPT;
 #if __cpp_lib_exception_ptr_cast >= 202506L
   template
-  friend constexpr const _Ex* std::exception_ptr_cast(const exception_ptr&)
-   noexcept;
+   friend constexpr const _Ex*
+   std::exception_ptr_cast(const exception_ptr&) noexcept;
 #endif
 
   const void* _M_exception_ptr_cast(const type_info&) const
@@ -301,8 +301,9 @@ namespace std _GLIBCXX_VISIBILITY(default)
 _GLIBCXX26_CONSTEXPR exception_ptr
 make_exception_ptr(_Ex __ex) _GLIBCXX_USE_NOEXCEPT
 {
-#if __cplusplus >= 202400L
+#if __cpp_lib_exception_ptr_cast >= 202506L
   if consteval {
+#ifdef __cpp_exceptions
try
  {
throw __ex;
@@ -311,6 +312,9 @@ namespace std _GLIBCXX_VISIBILITY(default)
  {
return current_exception();
  }
+#else
+   return exception_ptr();
+#endif
   }
 #endif
 #if __cplusplus >= 201103L && __cpp_rtti
@@ -353,7 +357,8 @@ namespace std _GLIBCXX_VISIBILITY(default)
 #if __cpp_lib_exception_ptr_cast >= 202506L
   template
 [[__gnu__::__always_inline__]]
-constexpr const _Ex* exception_ptr_cast(const exception_ptr& __p) noexcept
+constexpr const _Ex*
+exception_ptr_cast(const exception_ptr& __p) noexcept
 {
   static_assert(!std::is_const_v<_Ex>);
   static_assert(!std::is_reference_v<_Ex>);
@@ -361,8 +366,9 @@ namespace std _GLIBCXX_VISIBILITY(default)
   static_assert(!std::is_array_v<_Ex>);
   static_assert(!std::is_pointer_v<_Ex>);
   static_assert(!std::is_member_pointer_v<_Ex>);
-#ifdef __cpp_rtti
+
   if consteval {
+#ifdef __cpp_exceptions
if (__p._M_exception_object)
  try
{
@@ -375,14 +381,14 @@ namespace std _GLIBCXX_VISIBILITY(default)
  catch (...)
{
}
-   return nullptr;
+#endif
   } else {
+#ifdef __cpp_rtti
const type_info &__id = typeid(const _Ex&);
return static_cast(__p._M_exception_ptr_cast(__id));
-  }
-#else
-  return nullptr;
 #endif
+  }
+  return nullptr;
 }
 #endif
 
-- 
2.50.0

Re: [PATCH] libstdc++: Fix constexpr exceptions for -fno-exceptions

2025-07-11 Thread Jonathan Wakely

On Fri, 11 Jul 2025 at 14:19, Jakub Jelinek  wrote:
>
> On Fri, Jul 11, 2025 at 01:47:18PM +0100, Jonathan Wakely wrote:
> Thanks for doing this.
>
> > @@ -301,8 +301,9 @@ namespace std _GLIBCXX_VISIBILITY(default)
> >  _GLIBCXX26_CONSTEXPR exception_ptr
> >  make_exception_ptr(_Ex __ex) _GLIBCXX_USE_NOEXCEPT
> >  {
> > -#if __cplusplus >= 202400L
> > +#if __cpp_lib_exception_ptr_cast >= 202506L
>
> I'm just not sure how make_exception_ptr is related to
> __cpp_lib_exception_ptr_cast.
> Perhaps __cpp_constexpr_exceptions >= 202411L ?

Oops, I meant to use
#if __cpp_lib_constexpr_exceptions >= 202411L
i.e. the lib macro not the core one.

Re: [PATCH] libstdc++: Fix constexpr exceptions for -fno-exceptions

2025-07-11 Thread Jakub Jelinek

On Fri, Jul 11, 2025 at 02:26:28PM +0100, Jonathan Wakely wrote:
> On Fri, 11 Jul 2025 at 14:19, Jakub Jelinek  wrote:
> >
> > On Fri, Jul 11, 2025 at 01:47:18PM +0100, Jonathan Wakely wrote:
> > Thanks for doing this.
> >
> > > @@ -301,8 +301,9 @@ namespace std _GLIBCXX_VISIBILITY(default)
> > >  _GLIBCXX26_CONSTEXPR exception_ptr
> > >  make_exception_ptr(_Ex __ex) _GLIBCXX_USE_NOEXCEPT
> > >  {
> > > -#if __cplusplus >= 202400L
> > > +#if __cpp_lib_exception_ptr_cast >= 202506L
> >
> > I'm just not sure how make_exception_ptr is related to
> > __cpp_lib_exception_ptr_cast.
> > Perhaps __cpp_constexpr_exceptions >= 202411L ?
> 
> Oops, I meant to use
> #if __cpp_lib_constexpr_exceptions >= 202411L
> i.e. the lib macro not the core one.

That is even better, agreed.

Jakub

Re: [PATCH] aarch64: PR target/120999: Avoid movprfx for NBSL implementation of NOR

2025-07-11 Thread Kyrylo Tkachov



> On 10 Jul 2025, at 11:12, Kyrylo Tkachov  wrote:
> 
> 
> 
>> On 10 Jul 2025, at 10:40, Richard Sandiford  
>> wrote:
>> 
>> Kyrylo Tkachov  writes:
>>> Hi all,
>>> 
>>> While the SVE2 NBSL instruction accepts MOVPRFX to add more flexibility
>>> due to its tied operands, the destination of the movprfx cannot also be
>>> a source operand. But the offending pattern in aarch64-sve2.md tries
>>> to do exactly that for the "=?&w,w,w" alternative, and gas warns for the
>>> attached testcase.
>>> 
>>> This patch simply removes that alternative, causing the RA to emit a
>>> normal extra move.
>>> So for the testcase in the patch we now generate:
>>> nor_z:
>>> nbsl z1.d, z1.d, z2.d, z1.d
>>> mov z0.d, z1.d
>>> ret
>>> 
>>> instead of the previous:
>>> nor_z:
>>> movprfx z0, z1
>>> nbsl z0.d, z0.d, z2.d, z0.d
>>> ret
>>> 
>>> which generated a gas warning.
>> 
>> Shouldn't we instead change it to:
>> 
>>    [ ?&w  , w  , w ; yes] movprfx\t%0, %1\;nbsl\t%0.d, %0.d, %2.d, %1.d
>> 
>> ?  The "&" ensures that %1 is still valid in the NBSL.
>> 
>> (That's OK if it works.)
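Why `%1` must still be intact can be checked with a lane-level model. The following sketch assumes Arm's description of NBSL (bitwise inverted select: take bits of the first operand where the third operand is 1, bits of the second operand where it is 0, then invert). `nbsl` and `nor_via_nbsl` are hypothetical helpers, not GCC or ACLE functions. With the select operand equal to the first input `a`, the result is `~((a & a) | (b & ~a))`, which is `~(a | b)`, i.e. NOR:

```cpp
#include <cstdint>

// Model of one 64-bit lane of SVE2 NBSL (bitwise inverted select).
constexpr std::uint64_t nbsl(std::uint64_t zdn, std::uint64_t zm,
                             std::uint64_t zk)
{
  return ~((zdn & zk) | (zm & ~zk));
}

// NOR as the revised alternative emits it.  The select operand must still
// hold the original first input, which is what the earlyclobber protects.
constexpr std::uint64_t nor_via_nbsl(std::uint64_t a, std::uint64_t b)
{
  std::uint64_t z0 = a;   // movprfx z0, z1  (z0 = copy of a)
  return nbsl(z0, b, a);  // nbsl z0.d, z0.d, z2.d, z1.d
}

static_assert(nor_via_nbsl(0xF0F0, 0x0FF0) == ~std::uint64_t{0xFFF0},
              "NBSL with zk == first input computes NOR");
```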
> 
> Yes, that seems to work, thanks.
> I’ll push this version after some more testing.
> 

Shall I backport this for GCC 15.2 as well?
The test case uses C operators, which were enabled in GCC 15, though I suppose
one could construct a pure ACLE intrinsics testcase too.

Thanks,
Kyrill 

> Kyrill
> 
>> 
>> Thanks,
>> Richard
>> 
>>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>> Ok for trunk?
>>> Do we want to backport it?
>>> 
>>> Thanks,
>>> Kyrill
>>> 
>>> 
>>> Signed-off-by: Kyrylo Tkachov 
>>> 
>>> gcc/
>>> 
>>> PR target/120999
>>> * config/aarch64/aarch64-sve2.md (*aarch64_sve2_nor):
>>> Remove movprfx alternative.
>>> 
>>> gcc/testsuite/
>>> 
>>> PR target/120999
>>> * gcc.target/aarch64/sve2/pr120999.c: New test.
>>> 
>>> From bd24ce298461ee8129befda1983acf1b37a7215a Mon Sep 17 00:00:00 2001
>>> From: Kyrylo Tkachov 
>>> Date: Wed, 9 Jul 2025 10:04:01 -0700
>>> Subject: [PATCH] aarch64: PR target/120999: Avoid movprfx for NBSL
>>> implementation of NOR
>>> 
>>> While the SVE2 NBSL instruction accepts MOVPRFX to add more flexibility
>>> due to its tied operands, the destination of the movprfx cannot also be
>>> a source operand.  But the offending pattern in aarch64-sve2.md tries
>>> to do exactly that for the "=?&w,w,w" alternative, and gas warns for the
>>> attached testcase.
>>> 
>>> This patch simply removes that alternative, causing the RA to emit a
>>> normal extra move.
>>> So for the testcase in the patch we now generate:
>>> nor_z:
>>>   nbsl z1.d, z1.d, z2.d, z1.d
>>>   mov z0.d, z1.d
>>>   ret
>>> 
>>> instead of the previous:
>>> nor_z:
>>>   movprfx z0, z1
>>>   nbsl z0.d, z0.d, z2.d, z0.d
>>>   ret
>>> 
>>> which generated a gas warning.
>>> 
>>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>> 
>>> Signed-off-by: Kyrylo Tkachov 
>>> 
>>> gcc/
>>> 
>>> PR target/120999
>>> * config/aarch64/aarch64-sve2.md (*aarch64_sve2_nor):
>>> Remove movprfx alternative.
>>> 
>>> gcc/testsuite/
>>> 
>>> PR target/120999
>>> * gcc.target/aarch64/sve2/pr120999.c: New test.
>>> ---
>>> gcc/config/aarch64/aarch64-sve2.md   | 11 ---
>>> gcc/testsuite/gcc.target/aarch64/sve2/pr120999.c | 15 +++
>>> 2 files changed, 19 insertions(+), 7 deletions(-)
>>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pr120999.c
>>> 
>>> diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
>>> index 15714712d3b..504ba6fc39b 100644
>>> --- a/gcc/config/aarch64/aarch64-sve2.md
>>> +++ b/gcc/config/aarch64/aarch64-sve2.md
>>> @@ -1616,20 +1616,17 @@
>>> 
>>> ;; Use NBSL for vector NOR.
>>> (define_insn_and_rewrite "*aarch64_sve2_nor"
>>> -  [(set (match_operand:SVE_FULL_I 0 "register_operand")
>>> +  [(set (match_operand:SVE_FULL_I 0 "register_operand" "=w")
>>> (unspec:SVE_FULL_I
>>>  [(match_operand 3)
>>>   (and:SVE_FULL_I
>>> (not:SVE_FULL_I
>>> -(match_operand:SVE_FULL_I 1 "register_operand"))
>>> +(match_operand:SVE_FULL_I 1 "register_operand" "%0"))
>>> (not:SVE_FULL_I
>>> -(match_operand:SVE_FULL_I 2 "register_operand")))]
>>> +(match_operand:SVE_FULL_I 2 "register_operand" "w")))]
>>>  UNSPEC_PRED_X))]
>>>  "TARGET_SVE2"
>>> -  {@ [ cons: =0 , %1 , 2 ; attrs: movprfx ]
>>> - [ w, 0  , w ; *  ] nbsl\t%0.d, %0.d, %2.d, %0.d
>>> - [ ?&w  , w  , w ; yes] movprfx\t%0, %1\;nbsl\t%0.d, %0.d, %2.d, %0.d
>>> -  }
>>> +  "nbsl\t%0.d, %0.d, %2.d, %0.d"
>>>  "&& !CONSTANT_P (operands[3])"
>>>  {
>>>operands[3] = CONSTM1_RTX (mode);
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pr120999.c b/gcc/testsuite/gcc.target/aarch64/sve2/pr120999.c
>>> new file mode 100644
>>> index 000..1cdfa4107ae
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pr120999.c
>>> @@ -0,0 +1,15 @@
>>> +/* PR target/120999.  */
>>> +/* { dg-do a

[PATCH v2] libstdc++: Fix constexpr exceptions for -fno-exceptions

2025-07-11 Thread Jonathan Wakely

The if-consteval branches in std::make_exception_ptr and
std::exception_ptr_cast use a try-catch block, which gives an error for
-fno-exceptions. Just make them return a null pointer at compile-time
when -fno-exceptions is used, because there's no way to get an active
exception with -fno-exceptions.

For both functions we have a runtime-only branch that depends on RTTI,
and a fallback using try-catch which works for runtime and consteval.
Rearrange both functions to express this logic more clearly.

Also adjust some formatting and whitespace elsewhere in the file.

libstdc++-v3/ChangeLog:

* libsupc++/exception_ptr.h (make_exception_ptr): Return null
for consteval when -fno-exceptions is used.
(exception_ptr_cast): Likewise. Allow consteval path to work
with -fno-rtti.
---

Testing x86_64-linux.

 libstdc++-v3/libsupc++/exception_ptr.h | 113 -
 1 file changed, 55 insertions(+), 58 deletions(-)

diff --git a/libstdc++-v3/libsupc++/exception_ptr.h b/libstdc++-v3/libsupc++/exception_ptr.h
index ee009155a39c..f673a3343338 100644
--- a/libstdc++-v3/libsupc++/exception_ptr.h
+++ b/libstdc++-v3/libsupc++/exception_ptr.h
@@ -83,9 +83,9 @@ namespace std _GLIBCXX_VISIBILITY(default)
 
 #if __cpp_lib_exception_ptr_cast >= 202506L
   template
-  constexpr const _Ex* exception_ptr_cast(const exception_ptr&) noexcept;
+constexpr const _Ex* exception_ptr_cast(const exception_ptr&) noexcept;
   template
-  void exception_ptr_cast(const exception_ptr&&) = delete;
+void exception_ptr_cast(const exception_ptr&&) = delete;
 #endif
 
   namespace __exception_ptr
@@ -138,8 +138,8 @@ namespace std _GLIBCXX_VISIBILITY(default)
_GLIBCXX_USE_NOEXCEPT;
 #if __cpp_lib_exception_ptr_cast >= 202506L
   template
-  friend constexpr const _Ex* std::exception_ptr_cast(const exception_ptr&)
-   noexcept;
+   friend constexpr const _Ex*
+   std::exception_ptr_cast(const exception_ptr&) noexcept;
 #endif
 
   const void* _M_exception_ptr_cast(const type_info&) const
@@ -296,40 +296,41 @@ namespace std _GLIBCXX_VISIBILITY(default)
   using __exception_ptr::swap; // So that std::swap(exp1, exp2) finds it.
 
   /// Obtain an exception_ptr pointing to a copy of the supplied object.
-#if (__cplusplus >= 201103L && __cpp_rtti) || __cpp_exceptions
   template
+#if !(__cplusplus >= 201103L && __cpp_rtti) && !__cpp_exceptions
+// This is always_inline so the linker will never use a useless definition
+// instead of a working one compiled with RTTI and/or exceptions enabled.
+__attribute__ ((__always_inline__)) inline
+#endif
 _GLIBCXX26_CONSTEXPR exception_ptr
 make_exception_ptr(_Ex __ex) _GLIBCXX_USE_NOEXCEPT
 {
-#if __cplusplus >= 202400L
-  if consteval {
-   try
- {
-   throw __ex;
- }
-   catch(...)
- {
-   return current_exception();
- }
-  }
-#endif
 #if __cplusplus >= 201103L && __cpp_rtti
-  using _Ex2 = typename decay<_Ex>::type;
-  void* __e = __cxxabiv1::__cxa_allocate_exception(sizeof(_Ex));
-  (void) __cxxabiv1::__cxa_init_primary_exception(
- __e, const_cast(&typeid(_Ex)),
- __exception_ptr::__dest_thunk<_Ex2>);
-  __try
+  // For runtime calls with -frtti enabled we can avoid try-catch overhead.
+  // We can't use this for C++98 because it relies on std::decay.
+#ifdef __glibcxx_constexpr_exceptions
+  if ! consteval
+#endif
{
- ::new (__e) _Ex2(__ex);
- return exception_ptr(__e);
+ using _Ex2 = typename decay<_Ex>::type;
+ void* __e = __cxxabiv1::__cxa_allocate_exception(sizeof(_Ex));
+ (void) __cxxabiv1::__cxa_init_primary_exception(
+ __e, const_cast(&typeid(_Ex)),
+ __exception_ptr::__dest_thunk<_Ex2>);
+ __try
+   {
+ ::new (__e) _Ex2(__ex);
+ return exception_ptr(__e);
+   }
+ __catch(...)
+   {
+ __cxxabiv1::__cxa_free_exception(__e);
+ return current_exception();
+   }
}
-  __catch(...)
-   {
- __cxxabiv1::__cxa_free_exception(__e);
- return current_exception();
-   }
-#else
+#endif
+
+#ifdef __cpp_exceptions
   try
{
   throw __ex;
@@ -339,21 +340,14 @@ namespace std _GLIBCXX_VISIBILITY(default)
  return current_exception();
}
 #endif
+  return exception_ptr();
 }
-#else // no RTTI and no exceptions
-  // This is always_inline so the linker will never use this useless definition
-  // instead of a working one compiled with RTTI and/or exceptions enabled.
-  template
-__attribute__ ((__always_inline__))
-_GLIBCXX26_CONSTEXPR inline exception_ptr
-make_exception_ptr(_Ex) _GLIBCXX_USE_NOEXCEPT
-{ return exception_ptr(); }
-#endif
 
 #if __cpp_lib_exception_ptr_cast >= 202506L
   template
 [[__gnu__::__always_inline__]]
-constexpr const

Fix ICE with speculative devirtualization

2025-07-11 Thread Jan Hubicka

Hi,
this patch fixes an ICE building lto1 with autoprofiledbootstrap, as well as the
one in PR114790.  What happens is that auto-fdo speculatively devirtualizes to
a wrong target.  This is due to a bug, which I need to fix as well, where it
mixes up DWARF names and linkage names of inline functions.

Later we clone at WPA time.  At ltrans time the clone is materialized and the
call is turned into a direct call (this optimization is missed by ipa-cp
propagation).  At that point we should resolve the speculation, but we don't.
As a result, after inlining, the verifier complains that there is a speculative
call whose corresponding direct call lacks the speculative flag.

This seems to be a long-standing problem in cgraph_update_edges_for_call_stmt_node,
but I suppose it does not trigger because we usually speculate correctly or notice
the direct call at WPA time already.
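For readers unfamiliar with the transformation, speculative devirtualization can be written out by hand at source level. The sketch below is only an analogy with hypothetical class names: it guards the guessed target with `typeid` for portability, whereas GCC guards on the function pointer loaded from the vtable and keeps the indirect call as the fallback:

```cpp
#include <typeinfo>

struct Base {
  virtual ~Base() = default;
  virtual int value() const = 0;
};
struct Likely final : Base { int value() const override { return 42; } };
struct Other final : Base { int value() const override { return 7; } };

// Hand-written analogy of a speculative call: guess the likely target,
// guard the guess with a cheap check, keep the indirect call as fallback.
int call_value(const Base& b)
{
  if (typeid(b) == typeid(Likely))
    return static_cast<const Likely&>(b).Likely::value();  // direct call
  return b.value();                                          // indirect call
}
```

Once cloning has already turned the indirect call into a direct one, the guard and the fallback no longer mean anything; that is the point at which the patch calls `cgraph_edge::resolve_speculation` instead of leaving a speculative edge behind.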

profiledbootstrapped and autoprofiledbootstrapped/regtested x86_64-linux

gcc/ChangeLog:

PR ipa/114790
* cgraph.cc (cgraph_update_edges_for_call_stmt_node): Resolve
devirtualization if call statement was optimized out or turned to direct
call.

gcc/testsuite/ChangeLog:

* g++.dg/lto/pr114790_0.C: New test.
* g++.dg/lto/pr114790_1.C: New test.


diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
index 94a2e6e6105..32071a84bac 100644
--- a/gcc/cgraph.cc
+++ b/gcc/cgraph.cc
@@ -1790,6 +1790,19 @@ cgraph_update_edges_for_call_stmt_node (cgraph_node *node,
 
   if (e)
{
+ /* If call was devirtualized during cloning, mark edge
+as resolved.  */
+ if (e->speculative)
+   {
+ if (new_stmt && is_gimple_call (new_stmt))
+   {
+ tree decl = gimple_call_fndecl (new_stmt);
+ if (decl)
+   e = cgraph_edge::resolve_speculation (e, decl);
+   }
+ else
+   e = cgraph_edge::resolve_speculation (e, NULL);
+   }
  /* Keep calls marked as dead dead.  */
  if (new_stmt && is_gimple_call (new_stmt) && e->callee
  && fndecl_built_in_p (e->callee->decl, BUILT_IN_UNREACHABLE,

diff --git a/gcc/testsuite/g++.dg/lto/pr114790_0.C b/gcc/testsuite/g++.dg/lto/pr114790_0.C
new file mode 100644
index 000..eed112df389
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lto/pr114790_0.C
@@ -0,0 +1,16 @@
+// { dg-lto-do link }
+// { dg-lto-options { { -w -flto -g -flto-partition=1to1 -O2 -shared -fPIC -fvisibility=hidden} } }
+// { dg-require-effective-target fpic }
+// { dg-require-effective-target shared }
+struct APITracerContext {
+  virtual ~APITracerContext() = default;
+  virtual void releaseActivetracersList() = 0;
+};
+struct APITracerContextImp : APITracerContext {
+  ~APITracerContextImp() override;
+  void releaseActivetracersList() override;
+};
+struct APITracerContextImp globalAPITracerContextImp;
+struct APITracerContextImp *pGlobalAPITracerContextImp = &globalAPITracerContextImp;
+APITracerContextImp::~APITracerContextImp() {}
+
diff --git a/gcc/testsuite/g++.dg/lto/pr114790_1.C b/gcc/testsuite/g++.dg/lto/pr114790_1.C
new file mode 100644
index 000..511fae45be8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lto/pr114790_1.C
@@ -0,0 +1,15 @@
+struct APITracerContext {
+  virtual void releaseActivetracersList() = 0;
+};
+extern struct APITracerContextImp *pGlobalAPITracerContextImp;
+struct APITracerContextImp : APITracerContext { void releaseActivetracersList();};
+int g();
+inline int
+apiTracerWrapperImp(  ) {
+  for (int i = 0; i < g(); i++) 
+  pGlobalAPITracerContextImp->releaseActivetracersList();
+}
+__attribute__((visibility("default"))) int
+zeCommandListAppendMemoryCopyTracing() {
+  return apiTracerWrapperImp(  );
+}

Re: make autoprofiledbootstrap with LTO meaningful

2025-07-11 Thread Andi Kleen

On Fri, Jul 11, 2025 at 12:14:46PM +0200, Jan Hubicka wrote:
> Hello,
> currently autoprofiled bootstrap produces auto-profiles for cc1 and
> cc1plus binaries.  Those are used to build respective frontend files.
> For backend cc1plus.fda is used.   This does not work well with LTO
> bootstrap where cc1plus backend is untrained since it is used only for
> parsing and early opts.  As a result, all binaries get most of the
> backend optimized for size rather than speed.
> 
> This patch adds lto1.fda and then combines all of cc1, cc1plus and lto1 into
> all.fda that is used compiling common modules.  This is more or less
> equivalent to what -fprofile-use effectively uses modulo that with
> -fprofile-use we know the number of runs of every object file and scale
> accordingly at LTO time.
> 
> There is a comment disabling lto1 profiling, claiming it does not work.
> Indeed, I get an ICE, which I fixed in a separate patch.
> 
> autoprofiledbootstrapped x86_64-linux with the extra fix, profiledbootstrap is
> running, OK if it passes?

Looks good to me.

We should also enable the missing languages, in particular Fortran and
maybe a few others.

-Andi

Re: [PATCH v3] libstdc++: Implement std::chrono::current_zone() for Windows

2025-07-11 Thread Jonathan Wakely


On 10/07/25 09:48 +0200, Björn Schäpers wrote:

From: Björn Schäpers 

On Windows there is no API to get the current time zone as an IANA name;
instead, Windows has its own zones.  However, the Unicode Consortium provides
a mapping.  This patch adds a script to convert the XML
file with the mapping to a lookup table and adds a Windows code path to
use that mapping.
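How the patch's generated table and its Windows code path in tzdb.cc are laid out is not visible in this excerpt. The following is therefore only a sketch of the general idea: a table sorted by (Windows name, territory) and searched with `std::lower_bound`. `ZoneMapEntry`, `zone_map`, and `windows_to_iana` are hypothetical names. The four entries follow CLDR's windowsZones.xml, where territory "001" marks the default IANA zone for a Windows zone name:

```cpp
#include <algorithm>
#include <iterator>
#include <string_view>
#include <utility>

struct ZoneMapEntry {
  std::string_view windows;    // Windows time zone key name
  std::string_view territory;  // CLDR territory code ("001" = default)
  std::string_view iana;       // IANA time zone name
};

// Illustrative excerpt, sorted by (windows, territory) for binary search.
constexpr ZoneMapEntry zone_map[] = {
  {"Eastern Standard Time", "001", "America/New_York"},
  {"GMT Standard Time", "001", "Europe/London"},
  {"Pacific Standard Time", "001", "America/Los_Angeles"},
  {"W. Europe Standard Time", "001", "Europe/Berlin"},
};

// Returns the IANA name, or an empty view if there is no mapping.
std::string_view windows_to_iana(std::string_view windows,
                                 std::string_view territory = "001")
{
  using Key = std::pair<std::string_view, std::string_view>;
  const Key key{windows, territory};
  auto it = std::lower_bound(std::begin(zone_map), std::end(zone_map), key,
                             [](const ZoneMapEntry& e, const Key& k) {
                               return Key{e.windows, e.territory} < k;
                             });
  if (it != std::end(zone_map) && it->windows == windows
      && it->territory == territory)
    return it->iana;
  return {};
}
```

Keeping the table sorted is what makes the logarithmic lookup valid, which is one reason a generator script that emits entries in a defined order is useful.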

libstdc++-v3/ChangeLog:

Implement std::chrono::current_zone() for Windows

* scripts/gen_windows_zones_map.py: New file, generates
windows_zones-map.h.
* src/c++20/windows_zones-map.h: New file, contains the lookup
table.
* src/c++20/tzdb.cc (tzdb::current_zone): Add Windows code path.

Signed-off-by: Björn Schäpers 
---
libstdc++-v3/scripts/gen_windows_zones_map.py | 127 ++
libstdc++-v3/src/c++20/tzdb.cc| 103 -
libstdc++-v3/src/c++20/windows_zones-map.h| 407 ++
3 files changed, 635 insertions(+), 2 deletions(-)
create mode 100644 libstdc++-v3/scripts/gen_windows_zones_map.py
create mode 100644 libstdc++-v3/src/c++20/windows_zones-map.h

diff --git a/libstdc++-v3/scripts/gen_windows_zones_map.py b/libstdc++-v3/scripts/gen_windows_zones_map.py
new file mode 100644


This file's mode should be 100755 (i.e. executable), but I can take
care of that.


index 000..9ac559209cc
--- /dev/null
+++ b/libstdc++-v3/scripts/gen_windows_zones_map.py
@@ -0,0 +1,127 @@
+#!/usr/bin/env python3
+#
+# Script to generate the map for libstdc++ std::chrono::current_zone under Windows.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 3, or (at your option) any later
+# version.
+#
+# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+# for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+# To update the Libstdc++ static data in src/c++20/windows_zones-map.h download the latest:
+# https://raw.githubusercontent.com/unicode-org/cldr/master/common/supplemental/windowsZones.xml
+# Then run this script and save the output to
+# src/c++20/windows_zones-map.h
+
+import os
+import sys
+import xml.etree.ElementTree as et
+
+if len(sys.argv) != 2:
+    print("Usage: %s " % sys.argv[0], file=sys.stderr)
+    sys.exit(1)
+
+self = os.path.basename(__file__)
+print("// Generated by scripts/{}, do not edit.".format(self))
+print("""
+// Copyright The GNU Toolchain Authors.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+/** @file bits/windows_zones-map.h
+ *  This is an internal header file, included by other library headers.
+ *  Do not attempt to use it directly. @headername{chrono}
+ */
+""")
+
+print("#ifndef _GLIBCXX_GET_WINDOWS_ZONES_MAP")
+print('# error "This is not a public header, do not include it directly"')
+print("#endif\n")


If we save the generated file in the src/c++20 directory then it can't
be included by users at all, so we don't need the @file doxygen
comment, nor this #ifndef and #error.


+
+class WindowsZoneMapEntry:
+    def __init__(self, windows, territory, iana):
+        self.windows = windows
+        self.territory = territory
+        self.iana = iana
+
+    def __lt__(self, other):
+        if self.windows < other.windows:
+            return True
+        if self.windows > other.windows:
+            return False
+        return self.territory < other.territory


This could be:

    return (self.windows, self.territory) < (other.windows, other.territory)

i.e. the equivalent of using std::tie(...) < std::tie(...).
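For comparison, here is the C++ side of that equivalence as a small sketch. `WindowsZoneMapEntry` is a hypothetical C++ mirror of the script's Python class, not part of the patch. Both Python tuples and `std::tie` compare lexicographically: the Windows names decide the order, and the territory is consulted only when they are equal.

```cpp
#include <string>
#include <tuple>

// Hypothetical C++ mirror of the script's WindowsZoneMapEntry class.
struct WindowsZoneMapEntry {
  std::string windows;
  std::string territory;
  std::string iana;

  // Lexicographic order on (windows, territory); iana is deliberately
  // left out of the ordering, matching the Python __lt__.
  bool operator<(const WindowsZoneMapEntry& other) const
  {
    return std::tie(windows, territory)
         < std::tie(other.windows, other.territory);
  }
};
```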


+windows_zone_map = []
+
+tree = et.parse(sys.argv[1])
+xml_zone_map = tree.getroot().find("windowsZones").find("mapTimezones")
+

Re: [PATCH v2] libstdc++: implement Philox Engine [PR119794]

2025-07-11 Thread Patrick Palka

Hi,

On Thu, 22 May 2025, 1nfocalypse wrote:

> Implements Philox Engine (P2075R6) and associated tests.
> 
> v2 corrects a multiline comment left in error in serialize.cc, and 
> additionally corrects a bug hidden by said comment, where the stream was 
> given the output of 'y()' instead of 'y', causing state to be
> incorrectly passed. Lastly, it fixes numerous whitespace issues found in the 
> original patch. My apologies for not noticing prior to the submission of the 
> original patch, which can now be disregarded.
> 
> To reiterate from the original email, the template unpacking functions are 
> placed in a private section before the public one because of an ordering
> issue: to function correctly, they must be
> placed prior to the bulk of the header. This is counter to the style 
> recommendations, but I was unable to obtain functionality any other way. 
> Additionally, while SIMD instructions are not utilized, and I do
> not think that they would integrate well with how the generator's state is 
> currently handled, some structure choices could be made that may make them of 
> interest.
> 
> Lastly, since the word width can be specified and may therefore be atypical, the
> maximum value is calculated via bit manipulation rather than numeric_limits,
> because the specified width may differ from the width of the type
> used.
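The bit manipulation mentioned above amounts to computing 2^w - 1 for a w-bit word stored in a possibly wider unsigned type, which is the value `philox_engine::max()` must return. The patch's actual code is not shown here, so this is only a sketch with a hypothetical `max_for_width` helper. It shifts an all-ones value right instead of computing `(UInt(1) << w) - 1`, so it never shifts by the full width of the type when `w` equals that width:

```cpp
#include <cstdint>
#include <limits>

// 2^w - 1 for a w-bit word stored in UInt, where w may be narrower than
// UInt itself (so numeric_limits<UInt>::max() would be wrong).
template<typename UInt, unsigned w>
constexpr UInt max_for_width() noexcept
{
  constexpr unsigned digits = std::numeric_limits<UInt>::digits;
  static_assert(std::numeric_limits<UInt>::is_integer
                && !std::numeric_limits<UInt>::is_signed,
                "UInt must be an unsigned integer type");
  static_assert(w > 0 && w <= digits, "w must be in [1, digits]");
  // All-ones shifted right by (digits - w) leaves exactly w low bits set;
  // the inner cast keeps small types such as uint8_t from going negative.
  return static_cast<UInt>(static_cast<UInt>(~UInt(0)) >> (digits - w));
}
```

For example, `max_for_width<std::uint64_t, 48>()` gives 0xFFFFFFFFFFFF, whereas `std::numeric_limits<std::uint64_t>::max()` would overstate the range.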
> 
> Built/tested on x86_64-linux-gnu.

Sorry for the delay and thanks for your patience!  Some initial review
comments below.

> 
>  *  1nfocalypse

>  > 
>  > 
>  Subject: [PATCH] [PATCH v2] libstdc++: implement Philox Engine [PR119794]
>  
>  The template unpacking functions, while private, are placed prior
>  to the public access specifier due to issues where the template
>  pack could not be unpacked and used to populate the public member
>  arrays without being declared beforehand.
>  
>  Additionally, the tests implemented attempt to mirror the tests
>  for other engines, when they apply. Changes to the random headers
>  required changing 'pr60037-neg.cc' because it suppresses
>  an error by explicit line number. It should still be correctly
>  suppressed in this patch. Lastly, v2 fixes an issue in
>  'serialize.cc' for Philox, where a portion of the test was
>  commented out, hiding a bug where 'y()' was passed to the
>  stream instead of 'y'. Both have been remedied in this patch.

This implements the changes to the original paper
https://wg21.link/lwg4134
https://wg21.link/lwg4153
right?  Maybe we could make a note of that in the commit message.

>  
>  Plus some whitespace fixes.
>  
>   PR libstdc++/119794
>  
>  libstdc++-v3/ChangeLog:
>  
>   * include/bits/random.h: Add Philox Engine components.
>   * include/bits/random.tcc: Implement Philox Engine components.
>   * testsuite/26_numerics/random/pr60037-neg.cc: Adjust line number.
>   * testsuite/26_numerics/random/inequal.cc: New test.
>   * testsuite/26_numerics/random/philox4x32.cc: New test.
>   * testsuite/26_numerics/random/philox4x64.cc: New test.
>   * testsuite/26_numerics/random/philox_engine/cons/
>   119794.cc: New test.
>   * testsuite/26_numerics/random/philox_engine/cons/
>   copy.cc: New test.
>   * testsuite/26_numerics/random/philox_engine/cons/
>   default.cc: New test.
>   * testsuite/26_numerics/random/philox_engine/cons/
>   seed.cc: New test.
>   * testsuite/26_numerics/random/philox_engine/cons/
>   seed_seq.cc: New test.
>   * testsuite/26_numerics/random/philox_engine/operators/
>   equal.cc: New test.
>   * testsuite/26_numerics/random/philox_engine/operators/
>   inequal.cc: New test.
>   * testsuite/26_numerics/random/philox_engine/operators/
>   serialize.cc: New test.
>   * testsuite/26_numerics/random/philox_engine/requirements/
>   constants.cc: New test.
>   * testsuite/26_numerics/random/philox_engine/requirements/
>   constexpr_data.cc: New test.
>   * testsuite/26_numerics/random/philox_engine/requirements/
>   constexpr_functions.cc: New test.
>   * testsuite/26_numerics/random/philox_engine/requirements/
>   typedefs.cc: New test.
>  ---
>   libstdc++-v3/include/bits/random.h| 340 ++
>   libstdc++-v3/include/bits/random.tcc  | 201 +++
>   .../testsuite/26_numerics/random/inequal.cc   |  49 +++
>   .../26_numerics/random/philox4x32.cc  |  42 +++
>   .../26_numerics/random/philox4x64.cc  |  42 +++
>   .../random/philox_engine/cons/119794.cc   |  57 +++
>   .../random/philox_engine/cons/copy.cc |  44 +++
>   .../random/philox_engine/cons/default.cc  |  46 +++
>   .../random/philox_engine/cons/seed.cc |  39 ++
>   .../random/philox_engine/cons/seed_seq.cc |  41 +++
>   .../random/philox_engine/operators/equal.cc   |  49 +++
>   .../random/philox_engine/operators/inequal.cc |  49 +++
>   .../philox_engine/operators/serialize.cc  |  68 
>   .../philox_engine/requirements/c

Patch ping (Re: [PATCH] libstdc++: library side of C++26 P2786R13 - Trivial Relocatability [PR119064])

2025-07-11 Thread Jakub Jelinek

Hi!

On Tue, Jun 17, 2025 at 01:14:03PM +0200, Jakub Jelinek wrote:
> Here is a new version of the library side of the C++26 P2786R13 paper.
> For if constexpr, the patch uses the __builtin_constant_p trick to figure
> out if __result is non-equality comparable with __first, it adds recursion
> for the is_array_v cases, adds qualification on several calls and rewrites
> the testcase, such that it is hopefully valid and also tests the constant
> evaluation.

Now that the core part is in, I'd like to ping the library side of the
paper.
https://gcc.gnu.org/pipermail/libstdc++/2025-June/062131.html

Thanks.

> 2025-06-17  Jakub Jelinek  
> 
>   PR c++/119064
>   * include/bits/version.def (trivially_relocatable): New.
>   * include/bits/version.h: Regenerate.
>   * include/std/type_traits (std::is_trivially_relocatable,
>   std::is_nothrow_relocatable, std::is_replaceable): New traits.
>   (std::is_trivially_relocatable_v, std::is_nothrow_relocatable_v,
>   std::is_replaceable_v): New trait variable templates.
>   * include/std/memory (__glibcxx_want_trivially_relocatable): Define
>   before including bits/version.h.
>   (std::trivially_relocate): New template function.
>   (std::relocate): Likewise.
>   * testsuite/std/memory/relocate/relocate.cc: New test.

Jakub

Re: [PATCH v2] libstdc++: implement Philox Engine [PR119794]

2025-07-11 Thread Jonathan Wakely

On Fri, 11 Jul 2025 at 18:12, Jonathan Wakely  wrote:
> > >  +static constexpr std::array multipliers =
> > >  +   philox_engine::__popMultArray();
> > >  +static constexpr std::array round_consts =
> > >  +   philox_engine::__popConstArray();
>
> Since you're creating static data members for these anyway, would it
> make more sense for them to just be variable templates instead of
> calling

Sorry, ignore this incomplete sentence, I changed my mind about
variable templates and wrote this instead ...

> Can we just use a single function for both of these?
>
> template
> static constexpr array
> _S_populate_array()
> {
>   if constexpr (__n == 4)
>     return {__consts...[_Ind0], __consts...[_Ind1]};
>   else
>     return {__consts...[_Ind0]};
> }
>
> then:
>
> static constexpr std::array multipliers
>   = _S_populate_array<0, 2>();
> static constexpr std::array round_consts
>   = _S_populate_array<1, 3>();
>

Or slightly simpler:

template
static constexpr array
_S_populate_array()
{
  if constexpr (__n == 4)
    return {__consts...[_Offset], __consts...[_Offset+2]};
  else
    return {__consts...[_Offset]};
}

static constexpr std::array multipliers
   = _S_populate_array<0>();
static constexpr std::array round_consts
   = _S_populate_array<1>();
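The same selection can also be written without C++26 pack indexing (`__consts...[I]`). Below is a minimal C++17 sketch, assuming (as the offsets above imply) that the constants are interleaved as multiplier, round constant, multiplier, round constant. `every_other` is a hypothetical name, not libstdc++ code, and the constant values in the test are placeholders, not the real Philox constants.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: Consts are interleaved as M0, C0, M1, C1, ...
// Offset 0 selects the multipliers (even positions) and Offset 1 the
// round constants (odd positions).
template<std::size_t Offset, typename UInt, UInt... Consts>
constexpr std::array<UInt, sizeof...(Consts) / 2>
every_other ()
{
  static_assert (Offset < 2, "Offset must be 0 (multipliers) or 1 (round constants)");
  static_assert (sizeof...(Consts) >= 2 && sizeof...(Consts) % 2 == 0,
                 "constants come in multiplier/round-constant pairs");
  constexpr UInt all[] = {Consts...};
  std::array<UInt, sizeof...(Consts) / 2> out{};
  for (std::size_t i = 0; i < out.size (); ++i)
    out[i] = all[Offset + 2 * i];
  return out;
}
```

Copying the pack into a local array and looping keeps this valid in C++17, where pack indexing is not available.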

Re: [PATCH] x86-64: Add --enable-x86-64-mfentry

2025-07-11 Thread Siddhesh Poyarekar


On 2025-07-11 15:28, Uros Bizjak wrote:

> > Why not just switch over unconditionally?  __fentry__ seems like a
> > better alternative to mcount overall and it has been around long enough
> > that even older deployments should be relatively unaffected.
>
> Actually, it is switched on by default for i?86-*-linux* |
> x86_64-*-linux*. The default for --enable-x86-64-mfentry is "auto",
> which triggers the mentioned condition. One still has a chance to use
> "yes" or "no" in addition to "auto" when configuring with
> --{enable|disable}-x86-64-mfentry.


Oh that's good then.

Thanks,
Sid

Re: [PATCH 2/2] lra: Reallow reloading user hard registers if the insn is not asm [PR 120983]

2025-07-11 Thread Vladimir Makarov




On 7/8/25 9:43 PM, Xi Ruoyao wrote:

The PR 87600 fix has disallowed reloading user hard registers to resolve
earlyclobber-induced conflict.

However before reload, recog completely ignores the constraints of
insns, so the RTL passes may produce insns where some user hard
registers violate an earlyclobber.  Then we'll get an ICE without
reloading them, like what we are recently encountering in LoongArch test
suite.

IIUC "recog does not look at constraints until reload" has been a
well-established rule in GCC for years and I don't have enough skill to
challenge it.  So reallow reloading user hard registers (but still
disallow doing so for asm) to fix the ICE.


There has been work to avoid hard regs in insns other than moves
before RA, but they can slip through, as no pass guarantees
that.


So the patch makes sense and does no harm.

The patch is ok for me for the trunk.  I don't expect any complications
with it, but it is difficult to predict for sure.  So if there are issues
with the patch on other targets, please revert it, as I'll be away for a
week.


Thank you for your work.


gcc/ChangeLog:

PR rtl-optimization/120983
* lra-constraints.cc (process_alt_operands): Allow reloading
user hard registers unless the insn is an asm.
---
  gcc/lra-constraints.cc | 19 ---
  1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 68aaf863a97..abb5e0bf237 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -2416,14 +2416,15 @@ process_alt_operands (int only_alternative)
if (curr_static_id->operand[nop].type == OP_INOUT
|| curr_static_id->operand[m].type == OP_INOUT)
  break;
-   /* Operands don't match.  If the operands are
-  different user defined explicit hard
+   /* Operands don't match.  For asm if the operands
+  are different user defined explicit hard
   registers, then we cannot make them match
   when one is early clobber operand.  */
if ((REG_P (*curr_id->operand_loc[nop])
 || SUBREG_P (*curr_id->operand_loc[nop]))
&& (REG_P (*curr_id->operand_loc[m])
-   || SUBREG_P (*curr_id->operand_loc[m])))
+   || SUBREG_P (*curr_id->operand_loc[m]))
+   && INSN_CODE (curr_insn) < 0)
  {
rtx nop_reg = *curr_id->operand_loc[nop];
if (SUBREG_P (nop_reg))
@@ -3328,19 +3329,15 @@ process_alt_operands (int only_alternative)
  first_conflict_j = j;
last_conflict_j = j;
/* Both the earlyclobber operand and conflicting operand
-  cannot both be user defined hard registers.  */
+  cannot both be user defined hard registers for asm.
+  Let curr_insn_transform diagnose it.  */
if (HARD_REGISTER_P (operand_reg[i])
&& REG_USERVAR_P (operand_reg[i])
&& operand_reg[j] != NULL_RTX
&& HARD_REGISTER_P (operand_reg[j])
-   && REG_USERVAR_P (operand_reg[j]))
- {
-   /* For asm, let curr_insn_transform diagnose it.  */
-   if (INSN_CODE (curr_insn) < 0)
+   && REG_USERVAR_P (operand_reg[j])
+   && INSN_CODE (curr_insn) < 0)
  return false;
-   fatal_insn ("unable to generate reloads for "
-   "impossible constraints:", curr_insn);
- }
  }
  if (last_conflict_j < 0)
continue;

Re: [PATCH] aarch64: Tweak handling of general SVE permutes [PR121027]

2025-07-11 Thread Richard Sandiford

Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Friday, July 11, 2025 4:23 PM
>> To: gcc-patches@gcc.gnu.org
>> Cc: Alex Coplan ; Alice Carlotti 
>> ;
>> pins...@gmail.com; ktkac...@nvidia.com; Richard Earnshaw
>> ; Tamar Christina ;
>> Wilco Dijkstra 
>> Subject: [PATCH] aarch64: Tweak handling of general SVE permutes [PR121027]
>> 
>> This PR is partly about a code quality regression that was triggered
>> by g:caa7a99a052929d5970677c5b639e1fa5166e334.  That patch taught the
>> gimple optimisers to fold two VEC_PERM_EXPRs into one, conditional
>> upon either (a) the original permutations not being "native" operations
>> or (b) the combined permutation being a "native" operation.
>> 
>> Whether something is a "native" operation is tested by calling
>> can_vec_perm_const_p with allow_variable_p set to false.  This requires
>> the permutation to be supported directly by
>> TARGET_VECTORIZE_VEC_PERM_CONST,
>> rather than falling back to the general vec_perm optab.
>> 
>> This exposed a problem with the way that we handled general 2-input
>> permutations for SVE.  Unlike Advanced SIMD, base SVE does not have
>> an instruction to do general 2-input permutations.  We do still implement
>> the vec_perm optab for SVE, but only when the vector length is known at
>> compile time.  The general expansion is pretty expensive: an AND, a SUB,
>> two TBLs, and an ORR.  It certainly couldn't be considered a "native"
>> operation.
>> 
>> However, if a VEC_PERM_EXPR has a constant selector, the indices can
>> be wider than the elements being permuted.  This is not true for the
>> vec_perm optab, where the indices and permuted elements must have the
>> same precision.
>> 
>> This leads to one case where we cannot leave a general 2-input permutation
>> to be handled by the vec_perm optab: when permuting bytes on a target
>> with 2048-bit vectors.  In that case, the indices of the elements in
>> the second vector are in the range [256, 511], which cannot be stored
>> in a byte index.
>> 
>> TARGET_VECTORIZE_VEC_PERM_CONST therefore has to handle 2-input SVE
>> permutations for one specific case.  Rather than check for that
>> specific case, the code went ahead and used the vec_perm expansion
>> whenever it worked.  But that undermines the !allow_variable_p
>> handling in can_vec_perm_const_p; it becomes impossible for
>> target-independent code to distinguish "native" operations from
>> the worst-case fallback.
>> 
>> This patch instead limits TARGET_VECTORIZE_VEC_PERM_CONST to the
>> cases that it has to handle.  It fixes the PR for all vector lengths
>> except 2048 bits.
>> 
>> A better fix would be to introduce some sort of costing mechanism,
>> which would allow us to reject the new VEC_PERM_EXPR even for
>> 2048-bit targets.  But that would be a significant amount of work
>> and would not be backportable.
>> 
>> Tested on aarch64-linux-gnu.  OK to install?
>
> Ok.

Thanks.

> Thanks!
>
> I'm somewhat surprised by
> "aarch64_expand_sve_vec_perm does not yet handle variable-length vectors"
>
> I assume cases that we could handle are if the permute values are
> a series right? It doesn't seem like we could handle an arbitrary permute 
> with VLA.

Yeah, but the constant selector series that we can handle are matched
first, trying things like ZIP, UZP, TRN, EXT, etc.  aarch64_evpc_sve_tbl
is the fallback for when all other things fail.

aarch64_expand_sve_vec_perm could probably be taught to handle 16-bit
and wider elements for VLA.  Not sure how good the code would be though.

Richard
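The byte-index limitation described in the quoted patch is plain arithmetic: a two-input permutation of vectors holding N elements each uses selector indices 0 to 2N - 1, while the vec_perm optab needs each index to fit in an element of the data's own width. A small sketch of that check (the function names are hypothetical, not GCC code):

```cpp
#include <cstdint>

// Largest selector index in a two-input permutation of vectors that
// each hold vector_bits / elem_bits elements.
constexpr std::uint64_t
max_perm_index (unsigned vector_bits, unsigned elem_bits)
{
  return 2ull * (vector_bits / elem_bits) - 1;
}

// True if every selector index can be stored in an element of the same
// width as the data being permuted (the vec_perm optab's requirement).
constexpr bool
index_fits_in_element (unsigned vector_bits, unsigned elem_bits)
{
  // 64-bit elements can hold any index for these vector sizes; testing
  // this first also avoids shifting a 64-bit value by 64.
  return elem_bits >= 64
         || max_perm_index (vector_bits, elem_bits) < (1ull << elem_bits);
}
```

Among the power-of-two vector lengths from 128 to 2048 bits, only 2048-bit vectors of bytes fail the check: each input has 256 byte elements, so the second input's indices run from 256 to 511.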

Re: [PATCH] testsuite: Disable musttail tests if target uses SJLJ exceptions

2025-07-11 Thread Andi Kleen

Dimitar Dimitrov  writes:

> A few tests started failing recently on pru-unknown-elf because it uses
> SJLJ implementation for exceptions:
>   FAIL: g++.dg/ext/musttail3.C  -std=c++11 (test for excess errors)
>   .../gcc/gcc/testsuite/g++.dg/ext/musttail3.C:12:34: error: cannot 
> tail-call: caller uses sjlj exceptions
>
> Fix by disabling those tests if target uses SJLJ for implementing
> exceptions.
>
> Ensured that test results with and without this patch for
> x86_64-pc-linux-gnu are the same.
>
> Ok for trunk?

I would rather make it XFAIL and also open a PR, after all it is a
limitation that could (and should) be fixed.

As more and more software uses musttail it will hurt those targets
eventually.

-Andi

Re: [PATCH v3 2/9] opts: use uint64_t for sanitizer flags

2025-07-11 Thread Andrew Pinski

On Fri, Jul 11, 2025 at 2:51 AM Claudiu Zissulescu-Ianculescu
 wrote:
>
> Hi,
> >
> > Currently, the data type of sanitizer flags is unsigned int, with
> > SANITIZE_SHADOW_CALL_STACK (1UL << 31) being the highest individual
> > enumerator for enum sanitize_code.  Use the 'uint64_t' data type to allow
> > for more distinct instrumentation modes to be added when needed.
> >
> >
> >
> > I have not looked yet but does it make sense to use `unsigned
> > HOST_WIDE_INT` instead of uint64_t? HWI should be the same as uint64_t
> > but it is more consistent with the rest of gcc.
> > Plus since tree_to_uhwi is more consistent there.
> >
> That was in the v2, however, the reviewers suggested to use uint64_t.

I see it now from Richard B.  Also I noticed you missed Richard S.'s
suggestion of using a typedef which will definitely help in the future
where we could even replace this with an enum class and overload the
bitwise operators to do the right thing.

Thanks,
Andrew

>
> Best wishes,
> Claudiu
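The typedef and enum class suggestions above could look roughly like the following sketch. Every name here is hypothetical, not GCC's actual sanitize_code declarations. The point is that the flag width lives in one typedef, and an enum class with overloaded bitwise operators keeps flags from mixing silently with plain integers.

```cpp
#include <cstdint>

// Hypothetical typedef: the flag word's width is chosen in one place.
typedef std::uint64_t sanitize_code_type;

// Hypothetical flags.  Bit 32 does not fit in a 32-bit unsigned int,
// which is why the wider type is needed.
enum class sanitize_flag : sanitize_code_type
{
  none = 0,
  address = sanitize_code_type (1) << 0,
  shadow_call_stack = sanitize_code_type (1) << 31,
  new_mode = sanitize_code_type (1) << 32
};

constexpr sanitize_flag
operator| (sanitize_flag a, sanitize_flag b)
{
  return static_cast<sanitize_flag> (static_cast<sanitize_code_type> (a)
                                     | static_cast<sanitize_code_type> (b));
}

constexpr sanitize_flag
operator& (sanitize_flag a, sanitize_flag b)
{
  return static_cast<sanitize_flag> (static_cast<sanitize_code_type> (a)
                                     & static_cast<sanitize_code_type> (b));
}

// enum class has no implicit conversion to bool, so test bits explicitly.
constexpr bool
has_any (sanitize_flag f)
{
  return f != sanitize_flag::none;
}
```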

Re: [PATCH] x86-64: Add --enable-x86-64-mfentry

2025-07-11 Thread Sam James

Siddhesh Poyarekar  writes:

> On 2025-07-08 18:07, Sam James wrote:
>>> OK in principle, but please allow some time for distro maintainers
>>> (CC'd) to voice their opinion.
>> It looks good to me and I plan on us using it. I'd like opinions
>> from
>> one other group first before it goes in if possible though, as our
>> perspective is different from others (e.g. we don't have to worry about
>> old enterprise deployments).
>
> Why not just switch over unconditionally?  __fentry__ seems like a
> better alternative to mcount overall and it has been around long
> enough that even older deployments should be relatively unaffected.

I think if we do that, we'll need a target tuple check for glibc, as
musl doesn't support fentry at a glance. Maybe we should have that
anyway though.

Looks like in glibc, it goes back to:

commit d22e4cc9397ed41534c9422d0b0ffef8c77bfa53
Author: Andi Kleen 
AuthorDate: Sat Aug 7 21:24:05 2010 -0700
Commit: Ulrich Drepper 
CommitDate: Sat Aug 7 21:24:05 2010 -0700

x86: Add support for frame pointer less mcount

.. which means >=glibc-2.19.

Re: Rewrite assign_discriminators pass

2025-07-11 Thread Andrew Pinski

On Fri, Jul 11, 2025 at 2:26 AM Richard Biener  wrote:
>
> On Thu, 10 Jul 2025, Jan Hubicka wrote:
>
> > Hi,
> > to assign debug locations to corresponding statements auto-fdo uses
> > discriminators.  Documentation says that if given statement belongs to 
> > multiple
> > basic blocks, the discriminator distinguishes them.
> >
> > The current implementation, however, only works for statements that expand into a
> > sequence of gimple statements forming a linear sequence, since it
> > essentially tracks the current location and renews it each time a new BB is
> > found.
> > This is commonly not true for C++ code as in:
> >
> >:
> >   [simulator/csimplemodule.cc:379:85] _40 = 
> > std::__cxx11::basic_string::c_str 
> > ([simulator/csimplemodule.cc:379:85] &D.80680);
> >   [simulator/csimplemodule.cc:379:85 discrim 13] _41 = 
> > [simulator/csimplemodule.cc:379:85] 
> > &this->D.78503.D.78106.D.72008.D.68585.D.67935.D.67879.D.67782;
> >   [simulator/csimplemodule.cc:379:85 discrim 13] _42 = 
> > &this->D.78503.D.78106.D.72008.D.68585.D.67935.D.67879.D.67782;
> >   [simulator/csimplemodule.cc:377:45] _43 = 
> > this->D.78503.D.78106.D.72008.D.68585.D.67935.D.67879.D.67782._vptr.cObject;
> >   [simulator/csimplemodule.cc:377:45] _44 = _43 + 40;
> >   [simulator/csimplemodule.cc:377:45] _45 = 
> > [simulator/csimplemodule.cc:377:45] *_44;
> >   [simulator/csimplemodule.cc:379:85] D.89001 = OBJ_TYPE_REF(_45;(const 
> > struct cObject)_42->5B) (_41);
> >
> > This is a fragment of code that is expanded from:
> >
> >
> > 371 if (this!=simulation.getContextModule())
> > 372 throw cRuntimeError("send()/sendDelayed() of module (%s)%s 
> > called in the context of "
> > 373 "module (%s)%s: method called from the 
> > latter module "
> > 374 "lacks Enter_Method() or 
> > Enter_Method_Silent()? "
> > 375 "Also, if message to be sent is passed 
> > from that module, "
> > 376 "you'll need to call take(msg) after 
> > Enter_Method() as well",
> > 377 getClassName(), getFullPath().c_str(),
> > 378 
> > simulation.getContextModule()->getClassName(),
> > 379 
> > simulation.getContextModule()->getFullPath().c_str());
> >
> > Notice that 379:85 is interleaved by 377:45 and the pass does not assign 
> > new discriminator.
> > With patch we get:
> >
> >:
> >   [simulator/csimplemodule.cc:379:85 discrim 7] _40 = 
> > std::__cxx11::basic_string::c_str 
> > ([simulator/csimplemodule.cc:379:85] &D.80680);
> >   [simulator/csimplemodule.cc:379:85 discrim 8] _41 = 
> > [simulator/csimplemodule.cc:379:85] 
> > &this->D.78503.D.78106.D.72008.D.68585.D.67935.D.67879.D.67782;
> >   [simulator/csimplemodule.cc:379:85 discrim 8] _42 = 
> > &this->D.78503.D.78106.D.72008.D.68585.D.67935.D.67879.D.67782;
> >   [simulator/csimplemodule.cc:377:45 discrim 1] _43 = 
> > this->D.78503.D.78106.D.72008.D.68585.D.67935.D.67879.D.67782._vptr.cObject;
> >   [simulator/csimplemodule.cc:377:45 discrim 1] _44 = _43 + 40;
> >   [simulator/csimplemodule.cc:377:45 discrim 1] _45 = 
> > [simulator/csimplemodule.cc:377:45] *_44;
> >   [simulator/csimplemodule.cc:379:85 discrim 8] D.89001 = 
> > OBJ_TYPE_REF(_45;(const struct cObject)_42->5B) (_41);
> >
> > There are earlier statements with line number 379, so that is why there is 
> > discriminator 7 for the call.
> > After that discriminator is increased.  There are two reasons for it
> >  1) AFDO requires every callsite to have unique lineno:discriminator pair
> >  2) a call may not terminate, and thus the profile of the first statement
> > may be higher than the rest.
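The two rules above (a fresh discriminator when a line reappears in another basic block, and a new one after each call so that every call site keeps a unique line:discriminator pair) can be modelled in a few lines. This is a simplified, hypothetical sketch, not the actual pass: it ignores columns, debug statements, and all other gimple details. Seeding line 379 with next free discriminator 7 and line 377 with 1, as if earlier blocks had used the lower values, reproduces the discriminators in the patched dump above.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical, simplified statement: a source line and whether it is a call.
struct stmt
{
  int line;
  bool is_call;
  int discrim;
};

using basic_block = std::vector<stmt>;

// NEXT_FREE maps a source line to the next discriminator not yet used
// for it anywhere in the function.
void
assign_discriminators (std::vector<basic_block> &bbs,
                       std::map<int, int> &next_free)
{
  for (basic_block &bb : bbs)
    {
      // Discriminator currently in use for each line within this block.
      std::map<int, int> current;
      for (stmt &s : bb)
        {
          auto it = current.find (s.line);
          // First statement of this line in the block, or the first one
          // after a call: take a fresh discriminator.
          if (it == current.end ())
            it = current.emplace (s.line, next_free[s.line]++).first;
          s.discrim = it->second;
          // Retire a call's discriminator so that no later call on the
          // same line can share its line:discriminator pair.
          if (s.is_call)
            current.erase (it);
        }
    }
}

// The block from the patched dump: lines 379 and 377 are seeded as if
// earlier blocks had already used discriminators 0-6 and 0 respectively.
inline void
check_dump_example ()
{
  std::map<int, int> next_free{{379, 7}, {377, 1}};
  basic_block bb{{379, true, 0},  {379, false, 0}, {379, false, 0},
                 {377, false, 0}, {377, false, 0}, {377, false, 0},
                 {379, true, 0}};
  std::vector<basic_block> bbs (1, bb);
  assign_discriminators (bbs, next_free);
  assert (bbs[0][0].discrim == 7);  // _40 = ...c_str (...)
  assert (bbs[0][1].discrim == 8);  // _41 = &this->...
  assert (bbs[0][3].discrim == 1);  // _43 = ..._vptr.cObject
  assert (bbs[0][6].discrim == 8);  // D.89001 = OBJ_TYPE_REF (...)
}
```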
> >
> > The old pass also contained logic to skip debug statements.  This is needed
> > so that discriminators at train time (with -g) and at feedback
> > time (say -g0 -fauto-profile=...) are the same.  However, keeping debug
> > statements with broken discriminators is not a good idea, since we output
> > them to the debug output and if the AFDO tool picks these locations up they
> > will be misplaced in basic blocks.
> >
> > Debug statements are naturally quite useful to track back the AFDO profiles,
> > and in the meantime LLVM folks implemented something similar called pseudoprobe.
> > I think it makes sense to enable debug statements with -fauto-profile even if
> > debug info is off, and to make use of them as done in this patch.
> >
> > Sadly, the AFDO tool is quite broken and built around the assumption that every
> > address has at most one debug location assigned to it (i.e. debug info as it
> > was before debug statements were introduced).  I have a WIP patch fixing this.
> > The fact that it ignores all but the last location assigned to an address
> > sort of mitigates the problem with debug statements.  If they are
> > immediately succeeded by another location, the tool ignores them.
> >
> > Note that LLVM also has -fdebug-info-for-a

Re: [PATCH v2] libstdc++: implement Philox Engine [PR119794]

2025-07-11 Thread Jonathan Wakely

On Fri, 11 Jul 2025 at 17:30, Patrick Palka  wrote:
>
> Hi,
>
> On Thu, 22 May 2025, 1nfocalypse wrote:
>
> > Implements Philox Engine (P2075R6) and associated tests.
> >
> > v2 corrects a multiline comment left in error in serialize.cc, and 
> > additionally corrects a bug hidden by said comment, where the stream was 
> > given the output of 'y()' instead of 'y', causing state to be
> > incorrectly passed. Lastly, it fixes numerous whitespace issues found in 
> > the original patch. My apologies for not noticing prior to the submission 
> > of the original patch, which can now be disregarded.
> >
> > To reiterate from the original email, the template unpacking functions are
> > placed in a private section before the public one because of an ordering
> > issue: to work correctly, they must be placed before the bulk of the
> > header. This is counter to the style recommendations, but I was unable to
> > make it work any other way.
> > Additionally, while SIMD instructions are not utilized, and I do
> > not think that they would integrate well with how the generator's state is 
> > currently handled, some structure choices could be made that may make them 
> > of interest.
> >
> > Lastly, since the word width can be specified, and may thus be atypical, the
> > maximum value is calculated via bit manipulation rather than numeric_limits,
> > because the specified width may differ from the width of the type
> > used.
> >
> > Built/tested on x86_64-linux-gnu.
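The width-based maximum described above can be sketched as follows. This is not the patch's code: `word_max` is a hypothetical helper, and it assumes an unsigned integer type at least `W` bits wide. It returns 2^W - 1, handling the full-width case separately because shifting a value by its type's full bit width is undefined.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Largest value representable in the low W bits of UInt, i.e. 2^W - 1.
template<typename UInt, std::size_t W>
constexpr UInt
word_max ()
{
  static_assert (std::numeric_limits<UInt>::is_integer
                 && !std::numeric_limits<UInt>::is_signed,
                 "UInt must be an unsigned integer type");
  static_assert (W > 0 && W <= std::numeric_limits<UInt>::digits,
                 "W must fit in UInt");
  if constexpr (W == std::numeric_limits<UInt>::digits)
    return std::numeric_limits<UInt>::max ();  // 1 << W would be undefined here
  else
    return static_cast<UInt> ((UInt (1) << W) - 1);
}
```

With `std::uint64_t` and `W == 32`, this yields `0xFFFFFFFF`, whereas `numeric_limits<std::uint64_t>::max()` would have all 64 bits set.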
>
> Sorry for the delay and thanks for your patience!  Some initial review
> comments below.
>
> >
> >  *  1nfocalypse
>
> >  >
> >  >
> >  Subject: [PATCH] [PATCH v2] libstdc++: implement Philox Engine [PR119794]
> >
> >  The template unpacking functions, while private, are placed prior
> >  to the public access specifier due to issues where the template
> >  pack could not be unpacked and used to populate the public member
> >  arrays without being declared beforehand.
> >
> >  Additionally, the tests implemented attempt to mirror the tests
> >  for other engines, when they apply. Changes to random
> >  provided cause for changing 'pr60037-neg.cc' because it suppresses
> >  an error by explicit line number. It should still be correctly
> >  suppressed in this patch. Lastly, v2 fixes an issue in
> >  'serialize.cc' for Philox, where a portion of the test was
> >  commented out, hiding a bug where 'y()' was passed to the
> >  stream instead of 'y'. Both have been remedied in this patch.
>
> This implements the changes to the original paper
> https://wg21.link/lwg4134
> https://wg21.link/lwg4153
> right?  Maybe we could make a note of that in the commit message.
>
> >
> >  Plus some whitespace fixes.
> >
> >   PR libstdc++/119794
> >
> >  libstdc++-v3/ChangeLog:
> >
> >   * include/bits/random.h: Add Philox Engine components.
> >   * include/bits/random.tcc: Implement Philox Engine components.
> >   * testsuite/26_numerics/random/pr60037-neg.cc: Alter line #.
> >   * testsuite/26_numerics/random/inequal.cc: New test.
> >   * testsuite/26_numerics/random/philox4x32.cc: New test.
> >   * testsuite/26_numerics/random/philox4x64.cc: New test.
> >   * testsuite/26_numerics/random/philox_engine/cons/
> >   119794.cc: New test.
> >   * testsuite/26_numerics/random/philox_engine/cons/
> >   copy.cc: New test.
> >   * testsuite/26_numerics/random/philox_engine/cons/
> >   default.cc: New test.
> >   * testsuite/26_numerics/random/philox_engine/cons/
> >   seed.cc: New test.
> >   * testsuite/26_numerics/random/philox_engine/cons/
> >   seed_seq.cc: New test.
> >   * testsuite/26_numerics/random/philox_engine/operators/
> >   equal.cc: New test.
> >   * testsuite/26_numerics/random/philox_engine/operators/
> >   inequal.cc: New test.
> >   * testsuite/26_numerics/random/philox_engine/operators/
> >   serialize.cc: New test.
> >   * testsuite/26_numerics/random/philox_engine/requirements/
> >   constants.cc: New test.
> >   * testsuite/26_numerics/random/philox_engine/requirements/
> >   constexpr_data.cc: New test.
> >   * testsuite/26_numerics/random/philox_engine/requirements/
> >   constexpr_functions.cc: New test.
> >   * testsuite/26_numerics/random/philox_engine/requirements/
> >   typedefs.cc: New test.
> >  ---
> >   libstdc++-v3/include/bits/random.h| 340 ++
> >   libstdc++-v3/include/bits/random.tcc  | 201 +++
> >   .../testsuite/26_numerics/random/inequal.cc   |  49 +++
> >   .../26_numerics/random/philox4x32.cc  |  42 +++
> >   .../26_numerics/random/philox4x64.cc  |  42 +++
> >   .../random/philox_engine/cons/119794.cc   |  57 +++
> >   .../random/philox_engine/cons/copy.cc |  44 +++
> >   .../random/philox_engine/cons/default.cc  |  46 +++
> >   .../random/philox_engine/cons/seed.cc |  39 ++
> >   .../random/philox_engine/cons/s

Re: [PATCH] c, c++: Extend -Wunused-but-set-* warnings [PR44677]

2025-07-11 Thread Jakub Jelinek

On Fri, Jul 11, 2025 at 02:34:24PM -0400, Jason Merrill wrote:
> But by the time we get to cp_fold, DECL_READ_P should have already been set
> appropriately when we built the thing we're now folding.  And calling

Clearly it hasn't been, otherwise I'd need to patch different spots as well.

> mark_exp_read on foo(op0) won't mark op0 anyway; it doesn't recurse.  I
> don't see any regressions on Wunused* after

mark_exp_read recurses a little bit, on some simple cases.
E.g. COND_EXPR, COMPOUND_EXPR, casts, INDIRECT_REF, ...
For more complex trees the expectation is that mark_exp_read will
be called when creating or folding those trees (like CALL_EXPR etc.).
Admittedly it isn't very clean, but it has mostly worked fine for years.

> diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> index 9a98628d9e8..addbc29d104 100644
> --- a/gcc/cp/cp-gimplify.cc
> +++ b/gcc/cp/cp-gimplify.cc
> @@ -3221,16 +3221,8 @@ cp_fold (tree x, fold_flags_t flags)
>clear_decl_read = false;
>if (code == MODIFY_EXPR
>   && (VAR_P (op0) || TREE_CODE (op0) == PARM_DECL)
> - && !DECL_READ_P (op0)
> - && (VAR_P (op0) ? warn_unused_but_set_variable
> - : warn_unused_but_set_parameter) > 2
> - && BINARY_CLASS_P (TREE_OPERAND (x, 1))
> - && TREE_OPERAND (TREE_OPERAND (x, 1), 0) == op0)
> -   {
> - mark_exp_read (TREE_OPERAND (TREE_OPERAND (x, 1), 1));

Guess another option instead of mark_exp_read would be some
cp_walk_tree_without_duplicates that would return if it finds
any references to op0 in that subexpression.

> > Well, I'm not sure it is actually an error.  finish_unary_op_expr doesn't
> > use cp_fully_fold result as an operand of {PRE,POST}{INC,DEC}REMENT_EXPR
> > (that would be wrong, we don't want the operand to fold into non-lvalue), it
> > is called only to find out if overflow warning should be emitted.
> 
> But it only warns if the whole expression folds to a constant, which can
> never happen for these tree codes.  So folding the operand is useless.

Seems you're right, can tweak it to punt for those codes earlier instead.
In fact, I wonder if the finish_unary_op_expr function is ever called on
POST{IN,DE}CREMENT_EXPR.

Jakub

Re: [EXT] Re: [PATCH 2/2] lra: Reallow reloading user hard registers if the insn is not asm [PR 120983]

2025-07-11 Thread Peter Bergner

On 7/11/25 10:22 AM, Vladimir Makarov wrote:
> On 7/8/25 9:43 PM, Xi Ruoyao wrote:
>>
>> IIUC "recog does not look at constraints until reload" has been a
>> well-established rule in GCC for years and I don't have enough skill to
>> challenge it.  So reallow reloading user hard registers (but still
>> disallow doing so for asm) to fix the ICE.

I agree we should allow spilling of user defined hardregs outside of inline
asms if needed.  That said, I am hesitant to spill hardregs (user defined
or not) in any other scenario.

>> However before reload, recog completely ignores the constraints of
>> insns, so the RTL passes may produce insns where some user hard
>> registers violate an earlyclobber.  Then we'll get an ICE without
>> reloading them, like what we are recently encountering in LoongArch test
>> suite.

I wonder if the correct "fix" is to actually not generate the problematical
rtl insn in the first place?  Maybe recog() before lra should check for
issues with early clobbers and constraints when there are hard registers
involved?

>> -/* Operands don't match.  If the operands are
>> -   different user defined explicit hard
>> +/* Operands don't match.  For asm if the operands
>> +   are different user defined explicit hard
>> registers, then we cannot make them match
>> when one is early clobber operand.  */

I know this existed before your patch, but:

s/when one is early clobber operand/when one is an early clobber operand/

Peter

[pushed: r16-2208] libgdiagnostics: doc fixes

2025-07-11 Thread David Malcolm

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r16-2208-g457464edf19f17.

gcc/ChangeLog:
* doc/libgdiagnostics/topics/compatibility.rst
(_LIBGDIAGNOSTICS_ABI_2): Add missing anchor.
* doc/libgdiagnostics/topics/diagnostic-manager.rst
(diagnostic_manager_add_sink_from_spec): Add links to GCC's
documentation of "-fdiagnostics-add-output=".  Fix parameter
markup.
(diagnostic_manager_set_analysis_target): Fix parameter markup.
Add link to SARIF spec.
* doc/libgdiagnostics/topics/logical-locations.rst: Markup fix.
* doc/libgdiagnostics/tutorial/02-physical-locations.rst: Clarify
wording of what "the source file" means, and that a range can't
have multiple files.

Signed-off-by: David Malcolm 
---
 .../libgdiagnostics/topics/compatibility.rst  |  2 ++
 .../topics/diagnostic-manager.rst | 26 +--
 .../topics/logical-locations.rst  |  7 ++---
 .../tutorial/02-physical-locations.rst| 11 
 4 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/gcc/doc/libgdiagnostics/topics/compatibility.rst 
b/gcc/doc/libgdiagnostics/topics/compatibility.rst
index 10adcc516ce5..6d8c92f06ea2 100644
--- a/gcc/doc/libgdiagnostics/topics/compatibility.rst
+++ b/gcc/doc/libgdiagnostics/topics/compatibility.rst
@@ -178,6 +178,8 @@ acccessing values within a 
:type:`diagnostic_logical_location`:
 
   * :func:`diagnostic_logical_location_get_decorated_name`
 
+.. _LIBGDIAGNOSTICS_ABI_2:
+
 ``LIBGDIAGNOSTICS_ABI_2``
 -
 ``LIBGDIAGNOSTICS_ABI_2`` covers the addition of these functions for
diff --git a/gcc/doc/libgdiagnostics/topics/diagnostic-manager.rst 
b/gcc/doc/libgdiagnostics/topics/diagnostic-manager.rst
index 0390704963bd..c94d19e65097 100644
--- a/gcc/doc/libgdiagnostics/topics/diagnostic-manager.rst
+++ b/gcc/doc/libgdiagnostics/topics/diagnostic-manager.rst
@@ -63,17 +63,17 @@ Responsibilities include:
   diagnostic_manager *control_mgr)
 
This function can be used to support option processing similar to GCC's
-   :option:`-fdiagnostics-add-output=`.  This allows command-line tools to
-   support the same domain-specific language for specifying output sink
-   as GCC does.
-
-   The function will attempt to parse :param:`spec` as if it were
-   an argument to GCC's :option:`-fdiagnostics-add-output=OUTPUT-SPEC`.
-   If successful, it will add an output sink to :param:`affected_mgr` and 
return zero.
-   Otherwise, it will emit an error diagnostic to :param:`control_mgr` and
+   `-fdiagnostics-add-output= 
`_.
+   This allows command-line tools to support the same domain-specific
+   language for specifying output sinks as GCC does.
+
+   The function will attempt to parse ``spec`` as if it were
+   an argument to GCC's `-fdiagnostics-add-output= 
`_.
+   If successful, it will add an output sink to ``affected_mgr`` and return 
zero.
+   Otherwise, it will emit an error diagnostic to ``control_mgr`` and
return non-zero.
 
-   :param:`affected_mgr` and :param:`control_mgr` can be the same manager,
+   ``affected_mgr`` and ``control_mgr`` can be the same manager,
or be different managers.
 
This function was added in :ref:`LIBGDIAGNOSTICS_ABI_2`; you can
@@ -83,14 +83,14 @@ Responsibilities include:
 
   #ifdef LIBDIAGNOSTICS_HAVE_diagnostic_manager_add_sink_from_spec
 
-
 .. function:: void diagnostic_manager_set_analysis_target (diagnostic_manager 
*mgr, \
   const 
diagnostic_file *file)
 
-   This function sets the "main input file" of :param:`mgr` to be
-   :param:`file`.
+   This function sets the "main input file" of ``mgr`` to be
+   ``file``.
This affects the :code:`` of generated HTML and
-   the :code:`role` of the artifact in SARIF output (SARIF v2.1.0 section 
3.24.6).
+   the :code:`role` of the :code:`artifact` in SARIF output
+   (`SARIF v2.1.0 section 3.24.6 
`_).
 
This function was added in :ref:`LIBGDIAGNOSTICS_ABI_2`; you can
test for its presence using
diff --git a/gcc/doc/libgdiagnostics/topics/logical-locations.rst 
b/gcc/doc/libgdiagnostics/topics/logical-locations.rst
index 184b56381910..294d396f677c 100644
--- a/gcc/doc/libgdiagnostics/topics/logical-locations.rst
+++ b/gcc/doc/libgdiagnostics/topics/logical-locations.rst
@@ -120,7 +120,7 @@ source location
"equal" input values on the same :type:`diagnostic_manager` will return
the same instance of :type:`diagnostic_logical_location`.  "Equal" here
includes different string buffers th

[pushed: r16-2209] json: fix null-termination of json::string

2025-07-11 Thread David Malcolm

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r16-2209-g1ea72a15031cd8.

gcc/ChangeLog:
* json.cc (string::string): When constructing from pointer and
length, ensure the new buffer is null-terminated.
(selftest::test_strcmp): New.
(selftest::json_cc_tests): Likewise.
---
 gcc/json.cc | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/gcc/json.cc b/gcc/json.cc
index f3f364598569..df0702f8e6c5 100644
--- a/gcc/json.cc
+++ b/gcc/json.cc
@@ -501,9 +501,10 @@ string::string (const char *utf8)
 string::string (const char *utf8, size_t len)
 {
   gcc_assert (utf8);
-  m_utf8 = XNEWVEC (char, len);
+  m_utf8 = XNEWVEC (char, len + 1);
   m_len = len;
   memcpy (m_utf8, utf8, len);
+  m_utf8[len] = '\0';
 }
 
 /* Implementation of json::value::print for json::string.  */
@@ -914,6 +915,15 @@ test_comparisons ()
   ASSERT_JSON_NE (arr_1, arr_2);
 }
 
+/* Ensure that json::string's get_string is usable as a C-style string.  */
+
+static void
+test_strcmp ()
+{
+  string str ("foobar", 3);
+  ASSERT_EQ (strcmp (str.get_string (), "foo"), 0);
+}
+
 /* Run all of the selftests within this file.  */
 
 void
@@ -928,6 +938,7 @@ json_cc_tests ()
   test_writing_literals ();
   test_formatting ();
   test_comparisons ();
+  test_strcmp ();
 }
 
 } // namespace selftest
-- 
2.26.3
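The bug fixed here is a common one when copying a (pointer, length) pair: `strcmp` and other C-string functions read until they find a NUL, so a buffer of exactly `len` bytes is not a valid C string. The following is a minimal standalone C++ sketch of the corrected constructor; the class name `owned_utf8` is hypothetical and stands in for `json::string`, and it uses `std::unique_ptr<char[]>` in place of GCC's `XNEWVEC`.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical stand-in for json::string's (pointer, length) constructor,
// illustrating the fix: allocate len + 1 bytes and write a terminating NUL
// so that the buffer can be passed to C-string functions.
class owned_utf8
{
public:
  owned_utf8 (const char *utf8, std::size_t len)
  : m_utf8 (new char[len + 1]), m_len (len)
  {
    std::memcpy (m_utf8.get (), utf8, len);
    m_utf8[len] = '\0';  // without this, get_string () is not a C string
  }

  const char *get_string () const { return m_utf8.get (); }
  std::size_t get_length () const { return m_len; }

private:
  std::unique_ptr<char[]> m_utf8;  // owns len + 1 bytes
  std::size_t m_len;               // length excluding the terminator
};
```

As in the new `test_strcmp` selftest, constructing from `("foobar", 3)` yields a buffer that compares equal to `"foo"`.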

[pushed: r16-2210] json: add json::value::clone

2025-07-11 Thread David Malcolm

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r16-2210-gd7c1e9b37caad5.

gcc/ChangeLog:
* json.cc (json::object::clone): New.
(json::object::clone_as_object): New.
(json::array::clone): New.
(json::float_number::clone): New.
(json::integer_number::clone): New.
(json::string::clone): New.
(json::literal::clone): New.
(selftest::test_cloning): New test.
(selftest::json_cc_tests): Call it.
* json.h (json::value::clone): New vfunc.
(json::object::clone): New decl.
(json::object::clone_as_object): New decl.
(json::array::clone): New decl.
(json::float_number::clone): New decl.
(json::integer_number::clone): New decl.
(json::string::clone): New decl.
(json::literal::clone): New decl.
---
 gcc/json.cc | 109 
 gcc/json.h  |   9 +
 2 files changed, 118 insertions(+)

diff --git a/gcc/json.cc b/gcc/json.cc
index df0702f8e6c5..7153f087a001 100644
--- a/gcc/json.cc
+++ b/gcc/json.cc
@@ -289,6 +289,30 @@ object::print (pretty_printer *pp, bool formatted) const
   pp_character (pp, '}');
 }
 
+std::unique_ptr<value>
+object::clone () const
+{
+  return clone_as_object ();
+}
+
+std::unique_ptr<object>
+object::clone_as_object () const
+{
+  auto result = std::make_unique<object> ();
+
+  /* Iterate in the order that the keys were inserted.  */
+  unsigned i;
+  const char *key;
+  FOR_EACH_VEC_ELT (m_keys, i, key)
+    {
+      map_t &mut_map = const_cast<map_t &> (m_map);
+      value *value = *mut_map.get (key);
+      result->set (key, value->clone ());
+    }
+
+  return result;
+}
+
 /* Set the json::value * for KEY, taking ownership of V
(and taking a copy of KEY if necessary).  */
 
@@ -443,6 +467,17 @@ array::print (pretty_printer *pp, bool formatted) const
   pp_character (pp, ']');
 }
 
+std::unique_ptr<value>
+array::clone () const
+{
+  auto result = std::make_unique<array> ();
+  unsigned i;
+  value *v;
+  FOR_EACH_VEC_ELT (m_elements, i, v)
+    result->append (v->clone ());
+  return result;
+}
+
 /* Append non-NULL value V to a json::array, taking ownership of V.  */
 
 void
@@ -473,6 +508,12 @@ float_number::print (pretty_printer *pp,
   pp_string (pp, tmp);
 }
 
+std::unique_ptr<value>
+float_number::clone () const
+{
+  return std::make_unique<float_number> (m_value);
+}
+
 /* class json::integer_number, a subclass of json::value, wrapping a long.  */
 
 /* Implementation of json::value::print for json::integer_number.  */
@@ -486,6 +527,11 @@ integer_number::print (pretty_printer *pp,
   pp_string (pp, tmp);
 }
 
+std::unique_ptr<value>
+integer_number::clone () const
+{
+  return std::make_unique<integer_number> (m_value);
+}
 
 /* class json::string, a subclass of json::value.  */
 
@@ -516,6 +562,12 @@ string::print (pretty_printer *pp,
   print_escaped_json_string (pp, m_utf8, m_len);
 }
 
+std::unique_ptr<value>
+string::clone () const
+{
+  return std::make_unique<string> (m_utf8, m_len);
+}
+
 /* class json::literal, a subclass of json::value.  */
 
 /* Implementation of json::value::print for json::literal.  */
@@ -540,6 +592,12 @@ literal::print (pretty_printer *pp,
 }
 }
 
+std::unique_ptr<value>
+literal::clone () const
+{
+  return std::make_unique<literal> (m_kind);
+}
+
 
 #if CHECKING_P
 
@@ -924,6 +982,56 @@ test_strcmp ()
   ASSERT_EQ (strcmp (str.get_string (), "foo"), 0);
 }
 
+static void
+test_cloning ()
+{
+  // Objects
+  {
+    object obj;
+    obj.set_string ("foo", "bar");
+
+    auto obj_clone = obj.clone ();
+    ASSERT_JSON_EQ (obj, *obj_clone);
+  }
+
+  // Arrays
+  {
+    array arr;
+    arr.append (std::make_unique<string> ("foo"));
+
+    auto arr_clone = arr.clone ();
+    ASSERT_JSON_EQ (arr, *arr_clone);
+  }
+
+  // float_number
+  {
+    float_number f_one (1.0);
+    auto f_clone = f_one.clone ();
+    ASSERT_JSON_EQ (f_one, *f_clone);
+  }
+
+  // integer_number
+  {
+    integer_number num (42);
+    auto num_clone = num.clone ();
+    ASSERT_JSON_EQ (num, *num_clone);
+  }
+
+  // string
+  {
+    string str ("foo");
+    auto str_clone = str.clone ();
+    ASSERT_JSON_EQ (str, *str_clone);
+  }
+
+  // literal
+  {
+    literal lit (JSON_TRUE);
+    auto lit_clone = lit.clone ();
+    ASSERT_JSON_EQ (lit, *lit_clone);
+  }
+}
+
 /* Run all of the selftests within this file.  */
 
 void
@@ -939,6 +1047,7 @@ json_cc_tests ()
   test_formatting ();
   test_comparisons ();
   test_strcmp ();
+  test_cloning ();
 }
 
 } // namespace selftest
diff --git a/gcc/json.h b/gcc/json.h
index da4da852a1ed..156c086d2cfd 100644
--- a/gcc/json.h
+++ b/gcc/json.h
@@ -124,6 +124,7 @@ class value
   virtual ~value () {}
   virtual enum kind get_kind () const = 0;
   virtual void print (pretty_printer *pp, bool formatted) const = 0;
+  virtual std::unique_ptr<value> clone () const = 0;
 
   void dump (FILE *, bool formatted) const;
   void DEBUG_FUNCTION dump () const;
@@ -150,6 +151,7 @@ class object : public value
 
   enum kind get_kind () const final override { return JSON_
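The pattern this patch adds is the usual one for deep-copying a polymorphic tree: a pure virtual `clone ()` in the base class that returns a `std::unique_ptr` to the base, overridden in each subclass, with container types cloning their children recursively. A simplified standalone C++ sketch follows; the class names `node`, `number_node` and `array_node` are hypothetical and deliberately much smaller than GCC's `json::value` hierarchy.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, much-simplified hierarchy (not GCC's json classes).
struct node
{
  virtual ~node () = default;
  // Every subclass returns a fresh, independently owned copy of itself.
  virtual std::unique_ptr<node> clone () const = 0;
};

struct number_node : node
{
  explicit number_node (long v) : value (v) {}

  std::unique_ptr<node> clone () const override
  {
    return std::make_unique<number_node> (value);
  }

  long value;
};

struct array_node : node
{
  void append (std::unique_ptr<node> v) { elements.push_back (std::move (v)); }

  std::unique_ptr<node> clone () const override
  {
    // Clone each child recursively, so the copy shares no nodes with
    // the original.
    auto result = std::make_unique<array_node> ();
    for (const auto &e : elements)
      result->append (e->clone ());
    return result;  // converts unique_ptr<array_node> to unique_ptr<node>
  }

  std::vector<std::unique_ptr<node>> elements;
};
```

Because every override must return `std::unique_ptr<node>` (a smart pointer return type cannot be covariant), a caller that needs the concrete type has to downcast; that is presumably why the patch also adds `object::clone_as_object`, which returns the object type directly.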

Re: [PATCH] c, c++: Extend -Wunused-but-set-* warnings [PR44677]

2025-07-11 Thread Jason Merrill


On 7/11/25 3:00 PM, Jakub Jelinek wrote:

On Fri, Jul 11, 2025 at 02:34:24PM -0400, Jason Merrill wrote:

But by the time we get to cp_fold, DECL_READ_P should have already been set
appropriately when we built the thing we're now folding.  And calling


Clearly it hasn't been, otherwise I'd need to patch different spots as well.


mark_exp_read on foo(op0) won't mark op0 anyway; it doesn't recurse.  I
don't see any regressions on Wunused* after


mark_exp_read recurses a little bit, on some simple cases.
E.g. COND_EXPR, COMPOUND_EXPR, casts, INDIRECT_REF, ...


Yes, for compound lvalues we need to recurse because forming x.y doesn't 
immediately read x.  And looking through COND/COMPOUND matches 
"potential results" of an expression

https://eel.is/c++draft/basic#def.odr-3

Hmm, including INDIRECT_REF and FLOAT_EXPR seems wrong, though.


For more complex trees the expectation is that mark_exp_read will
be called when creating or folding those trees (like CALL_EXPR etc.).
Admittedly it isn't very clean, but it has mostly worked fine for years.


diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index 9a98628d9e8..addbc29d104 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -3221,16 +3221,8 @@ cp_fold (tree x, fold_flags_t flags)
clear_decl_read = false;
if (code == MODIFY_EXPR
   && (VAR_P (op0) || TREE_CODE (op0) == PARM_DECL)
- && !DECL_READ_P (op0)
- && (VAR_P (op0) ? warn_unused_but_set_variable
- : warn_unused_but_set_parameter) > 2
- && BINARY_CLASS_P (TREE_OPERAND (x, 1))
- && TREE_OPERAND (TREE_OPERAND (x, 1), 0) == op0)
-   {
- mark_exp_read (TREE_OPERAND (TREE_OPERAND (x, 1), 1));


Guess another option instead of mark_exp_read would be some
cp_walk_tree_without_duplicates that would return if it finds
any references to op0 in that subexpression.


But my simplifying patch didn't seem to break anything, so adding more 
complexity also seems unnecessary.


Am I wrong that it doesn't break anything?


Well, I'm not sure it is actually an error.  finish_unary_op_expr doesn't
use cp_fully_fold result as an operand of {PRE,POST}{INC,DEC}REMENT_EXPR
(that would be wrong, we don't want the operand to fold into non-lvalue), it
is called only to find out if overflow warning should be emitted.


But it only warns if the whole expression folds to a constant, which can
never happen for these tree codes.  So folding the operand is useless.


Seems you're right, can tweak it to punt for those codes earlier instead.
In fact, I wonder if the finish_unary_op_expr function is ever called on
POST{IN,DE}CREMENT_EXPR.


They seem to go through finish_increment_expr instead.

Jason

Re: [PATCH] testsuite: Disable musttail tests if target uses SJLJ exceptions

2025-07-11 Thread Joern Wolfgang Rennecke


Andi Kleen:
> I would rather make it XFAIL and also open a PR, after all it is a
> limitation that could (and should) be fixed.

Huh, I didn't see this thread in time.
For the special case where the exception handling code in the caller is
empty (which is what mustcall3.C and mustcall5.C are testing), we can
handle tail calls by modifying the sjlj test to also check for empty
exception code.

To avoid unwanted code and constant duplication, I've refactored the 
code a bit.  See:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121041

Re: [PATCH] x86-64: Add --enable-x86-64-mfentry

2025-07-11 Thread Uros Bizjak

On Fri, Jul 11, 2025 at 2:33 PM Siddhesh Poyarekar  wrote:
>
> On 2025-07-08 18:07, Sam James wrote:
> >> OK in principle, but please allow some time for distro maintainers
> >> (CC'd) to voice their opinion.
> >
> > It looks good to me and I plan on us using it. I'd like opinions from
> > one other group first before it goes in if possible though, as our
> > perspective is different from others (e.g. we don't have to worry about
> > old enterprise deployments).
>
> Why not just switch over unconditionally?  __fentry__ seems like a
> better alternative to mcount overall and it has been around long enough
> that even older deployments should be relatively unaffected.

Actually, it is switched on by default for i?86-*-linux* |
x86_64-*-linux*. The default for --enable-x86-64-mfentry is "auto",
which triggers the mentioned condition. One still has a chance to use
"yes" or "no" in addition to "auto" when configuring with
--{enable|disable}-x86-64-mfentry.

Uros.

Re: [PATCH] x86-64: Add --enable-x86-64-mfentry

2025-07-11 Thread Sam James

Uros Bizjak  writes:

> On Fri, Jul 11, 2025 at 2:33 PM Siddhesh Poyarekar  
> wrote:
>>
>> On 2025-07-08 18:07, Sam James wrote:
>> >> OK in principle, but please allow some time for distro maintainers
>> >> (CC'd) to voice their opinion.
>> >
>> > It looks good to me and I plan on us using it. I'd like opinions from
>> > one other group first before it goes in if possible though, as our
>> > perspective is different from others (e.g. we don't have to worry about
>> > old enterprise deployments).
>>
>> Why not just switch over unconditionally?  __fentry__ seems like a
>> better alternative to mcount overall and it has been around long enough
>> that even older deployments should be relatively unaffected.
>
> Actually, it is switched on by default for i?86-*-linux* |
> x86_64-*-linux*. The default for --enable-x86-64-mfentry is "auto",
> which triggers the mentioned condition. One still has a chance to use
> "yes" or "no" in addition to "auto" when configuring with
> --{enable|disable}-x86-64-mfentry.

I think we need to conditionalise it on gnu too unless I'm missing
something.

>
> Uros.

Re: [PATCH] testsuite: arm: Add effective-target vect_early_break to vect-tsvc-*

2025-07-11 Thread Torbjorn SVENSSON


Hi Christophe,

On 2025-07-11 15:47, Christophe Lyon wrote:

Hi Torbjörn,

On Fri, 11 Jul 2025 at 10:47, Torbjörn SVENSSON
 wrote:


Ok for trunk, gcc-15 and gcc-14.

I discovered that the dg-require-effective-target is missing on gcc-14,
but it's probably the right thing to add on gcc-15 and trunk too.

Without the `dg-require-effective-target vect_early_break`, the
`dg-add-options vect_early_break` will return the flags unchanged and
`dg-require-effective-target vect_early_break_hw` will succeed as it
overrides the flags, causing the tests to use the wrong target.

Let me know what you think.

--

With the -mcpu=unset/-march=unset feature introduced in
r15-3606-g7d6c6a0d15c, these tests started to pass because the
cpu/arch is overridden. The proper thing to do when using
`dg-add-options vect_early_break` is to also have a
`dg-require-effective-target vect_early_break`, so adding this.



So IIUC:
- on gcc-14 the tests are skipped because you override -march/-mcpu
when testing, and arm_v8_neon_ok fails when called from
vect_early_break ?


Not quite. On gcc-14 (running with -march=armv7ve/-mcpu=cortex-a7), the 
tests are considered, but they are built with the wrong flags so they XPASS.



- on gcc-15 and trunk, the tests now pass thanks to -mcpu=unset/-march=unset
but you are concerned that 'dg-add-options vect_early_break' is used
without the corresponding effective-target?


As arm_v8_neon (part of the vect_early_break check) results in the
`-mfpu=neon-fp-armv8 -mcpu=unset -march=armv8-a` flags, the test
PASSes instead.



So your patch is just "cosmetic" and has no impact on the testsuite results?


For gcc-15 and trunk, it's cosmetic for now, but may change in the
future depending on the use/implementation of vect_early_break_hw etc.



I have another concern (hence cc'ing Alexandre): vect.exp calls
check_vect_support_and_set_flags which defines dg-do-what-default
according to what it discovers, meaning that for some targets these
tests are 'run' and on others they are just 'compile'.
So I suppose we should use 'dg-require-effective-target
vect_early_break_hw' only when running the tests and
'dg-require-effective-target vect_early_break' when compiling them?

I suppose at the moment we completely skip these tests, while we
could at least compile them on some targets?


I guess we would skip it too, but I can't find `vect_early_break_hw` in
the log for gcc-14 where the test is XPASS, so I'm not sure this is
completely true.
I'm also a bit confused about the selector used for the dg-final 
statement. I think it would make more sense to simply drop the condition 
and always expect it to be PASS and only run the test on supported targets.



I think you can remove the 'arm' keyword in the title of your commit
message, as this patch can impact all targets.


Ok, I will make sure to do that prior to pushing or a v2.

Kind regards,
Torbjörn


Thanks,

Christophe




gcc/testsuite/ChangeLog:

 * gcc.dg/vect/tsvc/vect-tsvc-s332.c: Add
 dg-require-effective-target vect_early_break to test.
 * gcc.dg/vect/tsvc/vect-tsvc-s481.c: Likewise.
 * gcc.dg/vect/tsvc/vect-tsvc-s482.c: Likewise.

Signed-off-by: Torbjörn SVENSSON 
---
  gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c | 1 +
  gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c | 1 +
  gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c | 1 +
  3 files changed, 3 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
index 21a9c5a6b2b..b4154040d1b 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
@@ -3,6 +3,7 @@

  /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
  /* { dg-require-effective-target vect_float } */
+/* { dg-require-effective-target vect_early_break } */
  /* { dg-require-effective-target vect_early_break_hw } */
  /* { dg-add-options vect_early_break } */

diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
index e4433385d66..156e44972bd 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
@@ -3,6 +3,7 @@

  /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
  /* { dg-require-effective-target vect_float } */
+/* { dg-require-effective-target vect_early_break } */
  /* { dg-require-effective-target vect_early_break_hw } */
  /* { dg-add-options vect_early_break } */

diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
index 146df409ecc..a1fcb18c557 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
@@ -3,6 +3,7 @@

  /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
  /* { dg-require-effective-target vect_float } */
+/* { dg-require-effective-target vect_early_break } */
  /* { dg-require-e

RE: [PATCH] aarch64: Tweak handling of general SVE permutes [PR121027]

2025-07-11 Thread Tamar Christina

> -Original Message-
> From: Richard Sandiford 
> Sent: Friday, July 11, 2025 4:23 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Alex Coplan ; Alice Carlotti 
> ;
> pins...@gmail.com; ktkac...@nvidia.com; Richard Earnshaw
> ; Tamar Christina ;
> Wilco Dijkstra 
> Subject: [PATCH] aarch64: Tweak handling of general SVE permutes [PR121027]
> 
> This PR is partly about a code quality regression that was triggered
> by g:caa7a99a052929d5970677c5b639e1fa5166e334.  That patch taught the
> gimple optimisers to fold two VEC_PERM_EXPRs into one, conditional
> upon either (a) the original permutations not being "native" operations
> or (b) the combined permutation being a "native" operation.
> 
> Whether something is a "native" operation is tested by calling
> can_vec_perm_const_p with allow_variable_p set to false.  This requires
> the permutation to be supported directly by
> TARGET_VECTORIZE_VEC_PERM_CONST,
> rather than falling back to the general vec_perm optab.
> 
> This exposed a problem with the way that we handled general 2-input
> permutations for SVE.  Unlike Advanced SIMD, base SVE does not have
> an instruction to do general 2-input permutations.  We do still implement
> the vec_perm optab for SVE, but only when the vector length is known at
> compile time.  The general expansion is pretty expensive: an AND, a SUB,
> two TBLs, and an ORR.  It certainly couldn't be considered a "native"
> operation.
> 
> However, if a VEC_PERM_EXPR has a constant selector, the indices can
> be wider than the elements being permuted.  This is not true for the
> vec_perm optab, where the indices and permuted elements must have the
> same precision.
> 
> This leads to one case where we cannot leave a general 2-input permutation
> to be handled by the vec_perm optab: when permuting bytes on a target
> with 2048-bit vectors.  In that case, the indices of the elements in
> the second vector are in the range [256, 511], which cannot be stored
> in a byte index.
> 
> TARGET_VECTORIZE_VEC_PERM_CONST therefore has to handle 2-input SVE
> permutations for one specific case.  Rather than check for that
> specific case, the code went ahead and used the vec_perm expansion
> whenever it worked.  But that undermines the !allow_variable_p
> handling in can_vec_perm_const_p; it becomes impossible for
> target-independent code to distinguish "native" operations from
> the worst-case fallback.
> 
> This patch instead limits TARGET_VECTORIZE_VEC_PERM_CONST to the
> cases that it has to handle.  It fixes the PR for all vector lengths
> except 2048 bits.
> 
> A better fix would be to introduce some sort of costing mechanism,
> which would allow us to reject the new VEC_PERM_EXPR even for
> 2048-bit targets.  But that would be a significant amount of work
> and would not be backportable.
> 
> Tested on aarch64-linux-gnu.  OK to install?

Ok.

Thanks!

I'm somewhat surprised by
"aarch64_expand_sve_vec_perm does not yet handle variable-length vectors"

I assume the cases we could handle are those where the permute values
form a series, right? It doesn't seem like we could handle an arbitrary
permute with VLA.

Cheers,
Tamar

> 
> Richard
> 
> 
> gcc/
>   PR target/121027
>   * config/aarch64/aarch64.cc (aarch64_evpc_sve_tbl): Punt on 2-input
>   operations that can be handled by vec_perm.
> 
> gcc/testsuite/
>   PR target/121027
>   * gcc.target/aarch64/sve/acle/general/perm_1.c: New test.
> ---
>  gcc/config/aarch64/aarch64.cc | 21 ++-
>  .../aarch64/sve/acle/general/perm_1.c | 14 +
>  2 files changed, 30 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_1.c
> 
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 10b8ed5d387..6e16763f957 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -26960,12 +26960,23 @@ aarch64_evpc_tbl (struct expand_vec_perm_d *d)
>  static bool
>  aarch64_evpc_sve_tbl (struct expand_vec_perm_d *d)
>  {
> -  unsigned HOST_WIDE_INT nelt;
> +  if (!d->one_vector_p)
> +{
> +  /* aarch64_expand_sve_vec_perm does not yet handle variable-length
> +  vectors.  */
> +  if (!d->perm.length ().is_constant ())
> + return false;
> 
> -  /* Permuting two variable-length vectors could overflow the
> - index range.  */
> -  if (!d->one_vector_p && !d->perm.length ().is_constant (&nelt))
> -return false;
> +  /* This permutation reduces to the vec_perm optab if the elements are
> +  large enough to hold all selector indices.  Do not handle that case
> +  here, since the general TBL+SUB+TBL+ORR sequence is too expensive to
> +  be considered a "native" constant permutation.
> +
> +  Not doing this would undermine code that queries can_vec_perm_const_p
> +  with allow_variable_p set to false.  See PR121027.  */
> +  if (selector_fits_mode_p (d->vmode, d->perm))
> + return false;
> +}
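The index-range argument above can be checked with a little arithmetic. SVE's architectural maximum vector length is 2048 bits, so a vector of bytes has 256 elements, and a two-input permute selects from indices 0 to 511, which cannot be represented in an 8-bit element. The C++ sketch below encodes that reasoning as compile-time checks; the helper names are hypothetical, and this is not a reimplementation of GCC's `selector_fits_mode_p`.

```cpp
#include <cstdint>

// Number of elements of ELEM_BITS each in a vector of VECTOR_BITS.
constexpr std::uint64_t
nelts (std::uint64_t vector_bits, std::uint64_t elem_bits)
{
  return vector_bits / elem_bits;
}

// A two-input permute indexes both inputs, so indices run up to 2 * nelts - 1.
constexpr std::uint64_t
max_two_input_index (std::uint64_t vector_bits, std::uint64_t elem_bits)
{
  return 2 * nelts (vector_bits, elem_bits) - 1;
}

// Whether every such index fits in an unsigned element of ELEM_BITS,
// i.e. whether a same-precision index vector (as the vec_perm optab
// requires) could hold the selector.
constexpr bool
indices_fit_in_element (std::uint64_t vector_bits, std::uint64_t elem_bits)
{
  return elem_bits >= 64
         || max_two_input_index (vector_bits, elem_bits)
              < (std::uint64_t{1} << elem_bits);
}

static_assert (max_two_input_index (2048, 8) == 511, "bytes, 2048-bit");
static_assert (!indices_fit_in_element (2048, 8), "511 does not fit in 8 bits");
static_assert (indices_fit_in_element (1024, 8), "255 fits in 8 bits");
static_assert (indices_fit_in_element (2048, 16), "511 fits in 16 bits");
```

Only the byte/2048-bit combination fails, which matches the statement that the patch fixes the PR for every vector length except 2048 bits.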

Re: [PATCH] Allow explicitly specifying the thread model for runtime libs

2025-07-11 Thread John Ericson

Hello, this 4-year-old patch of mine was never reviewed. Per 
https://github.com/NixOS/nixpkgs/pull/414299, we in a package set / distro, 
Nixpkgs/NixOS, just began (albeit on an experimental basis) packaging GCC with 
this patch (among others) applied. It would thus be nice to get it applied 
upstream --- if changes are needed, of course we can make them and resubmit.

(The patch is also visible at 
https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/compilers/gcc/ng/15/gcc/custom-threading-model.patch.)

This is, I think, the conceptually simplest patch/series of the ones we have in 
there, so a good one to start with. (I think it would be unnecessarily 
complicated to discuss them all at once.)

Thanks in advance,

John

On Wed, Aug 18, 2021, at 4:38 PM, John Ericson wrote:
> From: John Ericson 
> 
> Previously, they always scraped the thread model from `$CC -v'. Now that
> is the default, but one may pass `--with-threads=MODEL` to be explicit
> instead.
> 
> One use-case is bootstrapping with a shorter critical path. The
> traditional route was to build an entire "static stage" GCC, build
> libc, and then build GCC again supporting dynamic linking,
> multithreading, etc. But this is wasteful in that GCC itself is built
> twice.
> 
> With this change, rather than having to mess with spec files we can just
> configure the runtime libraries the way we want directly. In turn, that
> opens the door to building just libgcc twice rather than all of GCC.
> 
> Frankly, specs were always a rather indirect approach to coordinate this
> during GCC's bootstrap, since GCC itself really doesn't care what the
> threading model is, just that the runtime libraries agree among
> themselves. Relying on a hard-coded spec for this also keeps us one step
> further from the long-term goal of multi-target GCC, for what it's
> worth.
> 
> For the record, "single stage" builds of GCC have some precedent in
> downstream packaging, for example with [1]. That one, as far as I can
> tell, builds libgcc once, but with the headers of the libc
> implementation provided and then cyclic linking down later. They are
> both fine approaches, but I don't think one should have to be forced
> into cyclic dependencies if they don't want to. That opens the door to
> non-terminating programs due to, e.g., atomics used in a threads
> implementation being lowered to threads absent hardware support.
> 
> Finally, I understand that such custom bootstrapping is not officially
> supported. I don't mean to imply it should be --- a lot more cleanup
> work to the build system would be necessary before supporting it
> wouldn't be a huge additional maintainer burden --- I just hope to add a
> reasonable knob for those comfortable with doing unsupported things
> already.
> 
> [1]: https://github.com/richfelker/musl-cross-make
> ---
> config/gthr.m4  | 32 
> libatomic/configure.ac  |  4 +---
> libgcc/configure.ac |  4 +---
> libphobos/m4/druntime/os.m4 |  2 +-
> libstdc++-v3/acinclude.m4   |  8 +++-
> 5 files changed, 38 insertions(+), 12 deletions(-)
> 
> diff --git a/config/gthr.m4 b/config/gthr.m4
> index 4b937306ad0..c585b618e40 100644
> --- a/config/gthr.m4
> +++ b/config/gthr.m4
> dnl Public License, this file may be distributed as part of a program
> dnl that contains a configuration script generated by Autoconf, under
> dnl the same distribution terms as the rest of that program.
>  
> +dnl Define thread model
> +
> +dnl usage: GCC_AC_THREAD_MODEL
> +AC_DEFUN([GCC_AC_THREAD_MODEL],
> +[
> +# With threads
> +# Pass with no value to take from compiler's metadata
> +# Pass with a value to specify a thread package
> +# 'single' means single threaded -- without threads.
> +AC_ARG_WITH(threads,
> +[AS_HELP_STRING([[--with-threads=MODEL]],
> + [specify thread model for this GCC
> + runtime library])],,
> +[with_threads=''])
> +
> +if test x"$with_threads" = x'yes'; then
> +  AC_MSG_ERROR([Cannot pass bare --with-threads, must pass explicit --with-threads=MODEL])
> +elif test x"$with_threads" = x'no'; then
> +  target_thread_file=single
> +elif test x"$with_threads" = x; then
> +  AC_MSG_CHECKING([for thread model used by GCC])
> +  target_thread_file=`$CC -v 2>&1 | sed -n 's/^Thread model: //p'`
> +  AC_MSG_RESULT([$target_thread_file])
> +else
> +  target_thread_file=$with_threads
> +fi
> +])
> +
> +
> dnl Define header location by thread model
>  
> dnl usage: GCC_AC_THREAD_HEADER([thread_model])
> @@ -22,6 +51,9 @@ case $1 in
>  tpf) thread_header=config/s390/gthr-tpf.h ;;
>  vxworks) thread_header=config/gthr-vxworks.h ;;
>  win32) thread_header=config/i386/gthr-win32.h ;;
> +*)
> +AC_MSG_ERROR([No known header for threading model '$1'.])
> +;;
> esac
> AC_SUBST(thread_header)
> ])
> diff --git a/libatomic/configure.ac b/libatomic/configure.ac
> index 2a371870c2f..42d2016b7a2 100644
> --- a/libatomic/configure.ac
> +++ b/libatomic/config
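The branches of the proposed `GCC_AC_THREAD_MODEL` macro amount to a small decision table. It is restated below in C++ purely for illustration; the function name and parameters are hypothetical, `with_threads` is empty when the option was not given (as set by the `AC_ARG_WITH` default), and `compiler_model` stands for what `$CC -v` prints after `Thread model: `.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical restatement of the GCC_AC_THREAD_MODEL branches.
//   with_threads   : value of --with-threads ("" when the option is absent)
//   compiler_model : thread model reported by `$CC -v`
std::string
resolve_thread_model (const std::string &with_threads,
                      const std::string &compiler_model)
{
  if (with_threads == "yes")      // bare --with-threads is rejected
    throw std::invalid_argument ("must pass explicit --with-threads=MODEL");
  if (with_threads == "no")       // --without-threads
    return "single";
  if (with_threads.empty ())      // option absent: previous behaviour
    return compiler_model;
  return with_threads;            // explicit --with-threads=MODEL
}
```

Keeping the empty case as the fallback to `$CC -v` is what lets existing builds configure exactly as before, while an explicit model bypasses the compiler's spec entirely.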

Re: [PATCH] ipa, cgraph: Enable constant propagation to OpenMP kernels

2025-07-11 Thread Martin Jambor

Hello,

and sorry for a rather late reaction.  First and foremost, thanks for
the patch, I think it is great to have this finally implemented and
although I would like to see a couple things changed, I think the patch
is quite close to what could be committed to gcc master.

After my first pass through the patch I have the following comments.  I
admit I have not looked into the issues with GOMP_task (though it looks
like it should be just special-cased in the IPA passes but that can wait
for later) and I do not have any strong opinion about what to do with
the fact that clang already has the callback attribute (I guess I'd aim
for a slightly different name for now).

On Sun, Apr 27 2025, Josef Melcr wrote:
> This patch enables constant propagation to outlined OpenMP kernels and
> improves support for optimizing callback functions in general. It
> implements the attribute 'callback' as found in clang, though argument
> numbering is a bit different, as described below. The title says OpenMP,
> but it can be used for any function which takes a callback argument, such
> as pthread functions, qsort and others.
>
> The attribute 'callback' captures the notion of a function calling one
> of its arguments with some of its parameters as arguments. An OpenMP
> example of such function is GOMP_parallel.
> We implement the attribute with new callgraph edges called 'callback'
> edges. They are imaginary edges pointing from the caller of the function
> with the attribute (e.g. caller of GOMP_parallel) to the body function
> itself (e.g. the outlined OpenMP body). They share their call statement
> with the edge from which they are derived (direct edge caller -> GOMP_parallel
> in this case). These edges allow passes such as ipa-cp to see the
> hidden call site to the body function and optimize the function accordingly.
>
> To illustrate with an example, the body of GOMP_parallel looks something
> like this:
>
> void GOMP_parallel (void (*fn) (void *), void *data, /* ... */)
> {
>   /* ... */
>   fn (data);
>   /* ... */
> }
>
>
> If we extend it with the attribute 'callback(1, 2)', we express that the
> function calls its first argument and passes it its second argument.
> This is represented in the call graph in this manner:
>
>                 direct                    indirect
> caller ----------------> GOMP_parallel ----------> fn
>   |
>   -----------------------------------------------> fn
>                        callback
>
> The direct edge is then the parent edge, with all callback edges being
> the child edges.

I know this will sound like nit-picking but could we avoid using the
terms parent and children?  It is not just that I don't think the names
capture what the edges are, but mainly I believe that (after this gets
in) people reading the code and comments will not immediately think
about callbacks when they encounter mentions of parents and children and
will be confused.

I'd suggest to call the edges "callback-carrying" and "callback" instead
but I am open to other ideas too.

> While constant propagation is the main focus of this patch, callback
> edges can be useful for different passes (for example, it improves icf
> for OpenMP kernels), as they allow for address redirection.
> If the outlined body function gets optimized and cloned, from body_fn to
> body_fn.optimized, the callback edge allows us to replace the
> address in the arguments list:
>
> GOMP_parallel (body_fn, &data_struct, /* ... */);
>
> becomes
>
> GOMP_parallel (body_fn.optimized, &data_struct, /* ... */);
>
> This redirection is possible for any function with the attribute.
>
> This callback attribute implementation is partially compatible with
> clang's implementation. Its semantics, arguments and argument indexing style are
> the same, but we represent an unknown argument position with 0
> (precedent set by attributes such as 'format'), while clang uses -1 or '?'.
> We also allow for multiple callback attributes on the same function,
> while clang only allows one.
>
> The attribute allows us to propagate constants into body functions of
> OpenMP constructs. Currently, GCC won't propagate the value 'c' into the
> OpenMP body in the following example:
>
> int a[100];
> void test(int c) {
> #pragma omp parallel for
>   for (int i = 0; i < c; i++) {
>     if (!__builtin_constant_p(c)) {
>       __builtin_abort();
>     }
>     a[i] = i;
>   }
> }
> int main() {
>   test(100);
>   return a[5] - 5;
> }
>
> With this patch, the body function will get cloned and the constant 'c'
> will get propagated.
>
> Bootstrapped and regtested on x86_64-linux. OK for master?
>
> Thanks,
> Josef Melcr
>
> gcc/ChangeLog:
>
>   * builtin-attrs.def (0): New int list.
>   (ATTR_CALLBACK): Callback attribute identifier.
>   (DEF_CALLBACK_ATTRIBUTE): Macro for callback attribute creation.
>   (GOMP): Attributes for libgomp functions.
>   (OACC): Attribute used for oacc functions.
>   (ATTR_CALLBACK_GOMP_LIST): ATTR_NOTHROW_LIST but with the
>   callback attr
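To make the argument-numbering convention concrete, here is a hypothetical C++ model of what an annotation such as `callback(1, 2)` tells the call graph about a call like `GOMP_parallel (body_fn, &data_struct, ...)`: positions are 1-based, and, per the patch description, 0 stands for an argument whose position is unknown. The types and function names are invented for illustration and are not GCC's cgraph API.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of the proposed attribute's semantics.
struct callback_attr
{
  unsigned callee_pos;                  // 1-based position of the called function
  std::vector<unsigned> arg_positions;  // 1-based; 0 means "unknown"
};

// The hidden call described by the attribute at one call site.
struct derived_call
{
  std::string callee;
  std::vector<std::optional<std::string>> args;  // nullopt where unknown
};

// ARGS are the actual arguments at the call site, rendered as strings.
// CALLEE_POS must be nonzero; args.at () throws std::out_of_range otherwise.
derived_call
derive_callback_call (const callback_attr &attr,
                      const std::vector<std::string> &args)
{
  derived_call call;
  call.callee = args.at (attr.callee_pos - 1);
  for (unsigned pos : attr.arg_positions)
    {
      if (pos == 0)
        call.args.push_back (std::nullopt);
      else
        call.args.push_back (args.at (pos - 1));
    }
  return call;
}
```

For `GOMP_parallel (body_fn, &data_struct, 0, 0)` with `callback(1, 2)`, the model yields a call to `body_fn` with the single argument `&data_struct`, which is the hidden call site a callback edge exposes to ipa-cp.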

[committed] PR modula2/120253: Error message column numbers should start at 1 not 0

2025-07-11 Thread Gaius Mulley




This patch ensures that column numbers start at 1 rather than 0.
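GCC's diagnostics report 1-based columns by default, while a lexer typically tracks 0-based byte offsets within the current line (as `tokenpos` is in m2.flex). The minimal C++ sketch below shows the conversion the patch relies on, `tokenpos + 1`, expressed through a `FIRST_COLUMN` constant like the one the patch defines; `column_from_offset` and `column_of` are hypothetical helpers, not part of m2.flex.

```cpp
#include <string>

// 1-based columns, matching the FIRST_COLUMN define added by the patch.
constexpr int FIRST_COLUMN = 1;

// Convert a 0-based byte offset within a line into a 1-based column,
// as updatepos does with `tokenpos + 1`.
constexpr int
column_from_offset (int tokenpos)
{
  return tokenpos + FIRST_COLUMN;
}

// Column of the first occurrence of TOKEN in LINE.  Falling back to
// FIRST_COLUMN when TOKEN is absent is a choice made for this sketch only.
int
column_of (const std::string &line, const std::string &token)
{
  std::string::size_type pos = line.find (token);
  if (pos == std::string::npos)
    return FIRST_COLUMN;
  return column_from_offset (static_cast<int> (pos));
}
```

For example, in `MODULE Foo;` the keyword starts at column 1 and `Foo` at column 8.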

gcc/m2/ChangeLog:

PR modula2/120253
* m2.flex (FIRST_COLUMN): New define.
(updatepos): Remove commented code.
(consumeLine): Assign column to FIRST_COLUMN.
(initLine): Ditto.
(m2flex_GetColumnNo): Return FIRST_COLUMN if currentLine is NULL.
(m2flex_GetLineNo): Rewrite for positive logic.
(m2flex_GetLocation): Ditto.

(cherry picked from commit 9a485b83e177cb742be17faf20ac5cc7db14fee3)

Signed-off-by: Gaius Mulley 
---
 gcc/m2/m2.flex | 25 +
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/gcc/m2/m2.flex b/gcc/m2/m2.flex
index d08ac3edefa..e3cf010b590 100644
--- a/gcc/m2/m2.flex
+++ b/gcc/m2/m2.flex
@@ -48,6 +48,8 @@ static int cpreprocessor = 0;  /* Replace this with correct getter.  */
 #define EXTERN extern "C"
 #endif
 
+#define FIRST_COLUMN 1
+
   /* m2.flex provides a lexical analyser for GNU Modula-2.  */
 
   struct lineInfo {
@@ -558,7 +560,7 @@ static void consumeLine (void)
   currentLine->lineno = lineno;
   currentLine->tokenpos=0;
   currentLine->nextpos=0;
-  currentLine->column=0;
+  currentLine->column=FIRST_COLUMN;
   START_LINE (lineno, yyleng);
   yyless(1);  /* push back all but the \n */
   traceLine ();
@@ -621,7 +623,6 @@ static void updatepos (void)
   seenModuleStart  = false;
   currentLine->nextpos = currentLine->tokenpos+yyleng;
   currentLine->toklen  = yyleng;
-  /* if (currentLine->column == 0) */
   currentLine->column = currentLine->tokenpos+1;
   currentLine->location =
     M2Options_OverrideLocation (GET_LOCATION (currentLine->column,
@@ -677,7 +678,7 @@ static void initLine (void)
   currentLine->toklen = 0;
   currentLine->nextpos= 0;
   currentLine->lineno = lineno;
-  currentLine->column = 0;
+  currentLine->column = FIRST_COLUMN;
   currentLine->inuse  = true;
   currentLine->next   = NULL;
 }
@@ -812,10 +813,10 @@ EXTERN bool m2flex_OpenSource (char *s)
 
 EXTERN int m2flex_GetLineNo (void)
 {
-  if (currentLine != NULL)
-    return currentLine->lineno;
-  else
+  if (currentLine == NULL)
     return 0;
+  else
+    return currentLine->lineno;
 }
 
 /*
@@ -825,10 +826,10 @@ EXTERN int m2flex_GetLineNo (void)
 
 EXTERN int m2flex_GetColumnNo (void)
 {
-  if (currentLine != NULL)
-    return currentLine->column;
+  if (currentLine == NULL)
+    return FIRST_COLUMN;
   else
-    return 0;
+    return currentLine->column;
 }
 
 /*
@@ -837,10 +838,10 @@ EXTERN int m2flex_GetColumnNo (void)
 
 EXTERN location_t m2flex_GetLocation (void)
 {
-  if (currentLine != NULL)
-    return currentLine->location;
-  else
+  if (currentLine == NULL)
     return 0;
+  else
+    return currentLine->location;
 }
 
 /*
-- 
2.20.1
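The 1-based column handling and the positive-logic getters in the m2.flex patch above can be sketched in isolation. This is a simplified model, not the real lexer: struct lineInfo is trimmed to the fields used here, and init_line, update_pos, get_line_no and get_column_no are hypothetical names standing in for initLine, updatepos, m2flex_GetLineNo and m2flex_GetColumnNo.

```c
#include <assert.h>
#include <stddef.h>

/* Column numbers are reported starting at 1, as in the patch.  */
#define FIRST_COLUMN 1

/* Trimmed-down, hypothetical model of struct lineInfo in m2.flex,
   keeping only the fields this sketch needs.  */
struct lineInfo
{
  int lineno;
  int tokenpos;  /* 0-based offset of the current token in the line.  */
  int column;    /* 1-based column used in diagnostics.  */
};

struct lineInfo *currentLine = NULL;

/* Start a new line: the column is reset to FIRST_COLUMN rather than 0.  */
void
init_line (struct lineInfo *l, int lineno)
{
  l->lineno = lineno;
  l->tokenpos = 0;
  l->column = FIRST_COLUMN;
}

/* Record a token at 0-based offset tokenpos; mirrors
   currentLine->column = currentLine->tokenpos+1 in updatepos.  */
void
update_pos (struct lineInfo *l, int tokenpos)
{
  assert (tokenpos >= 0);
  l->tokenpos = tokenpos;
  l->column = tokenpos + 1;
}

/* Positive-logic getters: the NULL case is tested first.  */
int
get_line_no (void)
{
  if (currentLine == NULL)
    return 0;
  else
    return currentLine->lineno;
}

int
get_column_no (void)
{
  if (currentLine == NULL)
    return FIRST_COLUMN;
  else
    return currentLine->column;
}
```

Before any line has been read, get_column_no now returns 1 instead of 0, and a token at offset 4 is reported at column 5.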

Re: [PATCH v2] libstdc++: implement Philox Engine [PR119794]

2025-07-11 Thread 1nfocalypse

Good evening!

Thank you both for the review, I'll get to work on cleaning it up and send out 
a v3 soon. Additionally, don't worry about the delay, and thank you for your 
patience. Have a good weekend!

Regards,
1nfocalypse

On Friday, July 11th, 2025 at 9:29 AM, Patrick Palka  wrote:

> Hi,
> 
> On Thu, 22 May 2025, 1nfocalypse wrote:
> 
> > Implements Philox Engine (P2075R6) and associated tests.
> > 
> > v2 corrects a multiline comment left in error in serialize.cc, and 
> > additionally corrects a bug hidden by said comment, where the stream was 
> > given the output of 'y()' instead of 'y', causing state to be
> > incorrectly passed. Lastly, it fixes numerous whitespace issues found in 
> > the original patch. My apologies for not noticing prior to the submission 
> > of the original patch, which can now be disregarded.
> > 
> > To reiterate from the original email, the template unpacking functions are 
> > placed under a private access specifier prior to the public one due to an ordering
> > bug, where in order to function correctly, they must be
> > placed prior to the bulk of the header. This is counter to the style 
> > recommendations, but I was unable to obtain functionality any other way. 
> > Additionally, while SIMD instructions are not utilized, and I do
> > not think that they would integrate well with how the generator's state is 
> > currently handled, some structure choices could be made that may make them 
> > of interest.
> > 
> > Lastly, since word width can be specified, and thus atypical, maximum value 
> > is calculated via some bit manipulation rather than numeric_limits, since 
> > the specified width may differ from the width of the type
> > used.
> > 
> > Built/tested on x86_64-linux-gnu.
> 
> 
> Sorry for the delay and thanks for your patience! Some initial review
> comments below.
> 
> > * 1nfocalypse
> 
> > Subject: [PATCH] [PATCH v2] libstdc++: implement Philox Engine [PR119794]
> > 
> > The template unpacking functions, while private, are placed prior
> > to the public access specifier due to issues where the template
> > pack could not be unpacked and used to populate the public member
> > arrays without being declared beforehand.
> > 
> > Additionally, the tests implemented attempt to mirror the tests
> > for other engines, where they apply. Changes to <random>
> > required changing 'pr60037-neg.cc' because it suppresses
> > an error by explicit line number. It should still be correctly
> > suppressed in this patch. Lastly, v2 fixes an issue in
> > 'serialize.cc' for Philox, where a portion of the test was
> > commented out, hiding a bug where 'y()' was passed to the
> > stream instead of 'y'. Both have been remedied in this patch.
> 
> 
> This implements the changes to the original paper
> https://wg21.link/lwg4134
> https://wg21.link/lwg4153
> right? Maybe we could make a note of that in the commit message.
> 
> > Plus some whitespace fixes.
> > 
> > PR libstdc++/119794
> > 
> > libstdc++-v3/ChangeLog:
> > 
> > * include/bits/random.h: Add Philox Engine components.
> > * include/bits/random.tcc: Implement Philox Engine components.
> > * testsuite/26_numerics/random/pr60037-neg.cc: Alter line #.
> > * testsuite/26_numerics/random/inequal.cc: New test.
> > * testsuite/26_numerics/random/philox4x32.cc: New test.
> > * testsuite/26_numerics/random/philox4x64.cc: New test.
> > * testsuite/26_numerics/random/philox_engine/cons/
> > 119794.cc: New test.
> > * testsuite/26_numerics/random/philox_engine/cons/
> > copy.cc: New test.
> > * testsuite/26_numerics/random/philox_engine/cons/
> > default.cc: New test.
> > * testsuite/26_numerics/random/philox_engine/cons/
> > seed.cc: New test.
> > * testsuite/26_numerics/random/philox_engine/cons/
> > seed_seq.cc: New test.
> > * testsuite/26_numerics/random/philox_engine/operators/
> > equal.cc: New test.
> > * testsuite/26_numerics/random/philox_engine/operators/
> > inequal.cc: New test.
> > * testsuite/26_numerics/random/philox_engine/operators/
> > serialize.cc: New test.
> > * testsuite/26_numerics/random/philox_engine/requirements/
> > constants.cc: New test.
> > * testsuite/26_numerics/random/philox_engine/requirements/
> > constexpr_data.cc: New test.
> > * testsuite/26_numerics/random/philox_engine/requirements/
> > constexpr_functions.cc: New test.
> > * testsuite/26_numerics/random/philox_engine/requirements/
> > typedefs.cc: New test.
> > ---
> > libstdc++-v3/include/bits/random.h | 340 ++
> > libstdc++-v3/include/bits/random.tcc | 201 +++
> > .../testsuite/26_numerics/random/inequal.cc | 49 +++
> > .../26_numerics/random/philox4x32.cc | 42 +++
> > .../26_numerics/random/philox4x64.cc | 42 +++
> > .../random/philox_engine/cons/119794.cc | 57 +++
> > .../random/philox_engine/cons/copy.cc | 44 +++
> > .../random/philox_engine/cons/default.cc | 46 +++
> > .../random/philox_engine/cons/seed.cc | 39 ++
> > .../random/philox_engine/cons/seed_seq.cc | 41 +++
> > .../random/philox_engine/operators/equal.
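The maximum-value computation mentioned in the Philox message (bit manipulation instead of numeric_limits, because the word width w may be narrower than the storage type) can be sketched in C as follows. This illustrates the idea under assumptions and is not the libstdc++ code: w is a run-time parameter here rather than the engine's template parameter, and the result is held in a uint64_t, so a 48-bit word, for example, has a maximum well below UINT64_MAX.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value representable in a w-bit word.  Shifting a 64-bit value
   by 64 is undefined behaviour in C, so w == 64 is handled separately.  */
uint64_t
word_max (unsigned w)
{
  assert (w >= 1 && w <= 64);
  if (w == 64)
    return UINT64_MAX;
  return ((uint64_t) 1 << w) - 1;
}

/* Branch-free alternative: shift an all-ones value right instead.
   For 1 <= w <= 64 the shift count is in [0, 63], so it is well defined.  */
uint64_t
word_max_shift (unsigned w)
{
  assert (w >= 1 && w <= 64);
  return UINT64_MAX >> (64 - w);
}
```

Masking a result with word_max (w) keeps it within the engine's declared range even when the storage type is wider; the right-shift form avoids the special case at the cost of being slightly less obvious to read.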

Re: [Patch] Fortran/OpenACC: Permit PARAMETER as 'var' in clauses (+ ignore)

2025-07-11 Thread Tobias Burnus


Now, finally pushed as r16-2213-g451b6dbf475959.

Tobias

On June 27, 2025, Tobias Burnus wrote:

Background: In real-world code, one can find:
   !$ACC DECLARE COPYIN(c1es, c2es, ...)
as here for the ICON weather model. This clearly implies that other
compilers accept and, potentially, require those. For better
compatibility with real-world use, the just released OpenACC 3.4 now
permits PARAMETER but permits compilers to ignore those (remove them
when doing optimizations).


Thus, this patch permits now named constants (PARAMETER) as 'var'
in OpenACC [with an off-by-default warning in all but one case
(device_resident, no warning)] but then ignores them later.


Looking at the following patch, I think these points are worth pondering:

* Will skipping over PARAMETERs (named constants) in trans-openmp.cc
   clause handling break some unrelated OpenACC or OpenMP code?
   (In principle, resolving an expression should remove the parameter,
   replacing it by its value. And the called trans-openmp.cc functions
   also should only deal with non-expressions.)

* Does this handle all cases for OpenACC (or did I miss one)?
   Does it handle too much for OpenACC (or OpenMP)?

* Do you think the warning handling is fine/consistent?

I think the patch should be fine, but, of course, I might have missed
something.


Comments, remarks, suggestions about this patch?

Tobias

Re: Rewrite assign_discriminators pass

2025-07-11 Thread H.J. Lu

On Fri, Jul 11, 2025 at 7:32 PM Jan Hubicka  wrote:
>
> > So with this the discriminator we assign might depend on whether
> > we have debug stmts or not.  We output them only to debug info, so
> > it should in principle not cause compare-debug issues, right?  And
> > we don't use discriminators to affect code generation (hopefully).
>
> This is the reason for the opts.cc change:
> > diff --git a/gcc/opts.cc b/gcc/opts.cc
> > index 6ca1ec7e865..60ad633b7ff 100644
> > --- a/gcc/opts.cc
> > +++ b/gcc/opts.cc
> > @@ -1411,11 +1411,14 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
> >   opts->x_debug_info_level = DINFO_LEVEL_NONE;
> >  }
> >
> > +  /* Also enable markers with -fauto-profile even when debug info is disabled,
> > +     so we assign same discriminators and can read back the profile info.  */
> >    if (!opts_set->x_debug_nonbind_markers_p)
> >      opts->x_debug_nonbind_markers_p
> >        = (opts->x_optimize
> > -         && opts->x_debug_info_level >= DINFO_LEVEL_NORMAL
> > -         && (dwarf_debuginfo_p (opts) || codeview_debuginfo_p ())
> > +         && ((opts->x_debug_info_level >= DINFO_LEVEL_NORMAL
> > +              && (dwarf_debuginfo_p (opts) || codeview_debuginfo_p ()))
> > +             || opts->x_flag_auto_profile)
> >           && !(opts->x_flag_selective_scheduling
> >                || opts->x_flag_selective_scheduling2));
>
> We only consume discriminators if we produce dwarf or if we do
> auto-profile and they indeed must agree.  With -Wauto-profile you now
> get the compiler to complain if they do not.
>
> I enable debug stmt markers in both cases, so discriminators should be
> the same.  I tested that on spec and it seems to work. As discussed on
> IRC I will look into possibility of enabling compare_debug for
> profiledbootstrap and autoprofiledbootstrap where it is currently off.
>
> We already have function to remove debug statements, so I guess we could
> remove them after auto-profile annotate pass at -O0, that I plan to look
> into incrementally now.  My immediate plan is to fix the create_gcov
> consumer, so things can be finally properly tested.
>
> Honza

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121045

-- 
H.J.
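The effect of the opts.cc change quoted above on the default for x_debug_nonbind_markers_p can be modelled as a small C predicate. The function names, parameters and enum below are hypothetical simplifications, not GCC's real option structures; selective_scheduling stands for x_flag_selective_scheduling || x_flag_selective_scheduling2, and dwarf_or_codeview stands for dwarf_debuginfo_p (opts) || codeview_debuginfo_p ().

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified debug-info levels; only their ordering matters here.  */
enum dinfo_level
{
  DINFO_LEVEL_NONE,
  DINFO_LEVEL_TERSE,
  DINFO_LEVEL_NORMAL,
  DINFO_LEVEL_VERBOSE
};

/* Default before the patch: markers only when optimizing with DWARF or
   CodeView debug info at level "normal" or above.  */
bool
markers_default_old (int optimize, enum dinfo_level level,
                     bool dwarf_or_codeview, bool selective_scheduling)
{
  assert (optimize >= 0);
  return (optimize
          && level >= DINFO_LEVEL_NORMAL
          && dwarf_or_codeview
          && !selective_scheduling);
}

/* Default after the patch: -fauto-profile also enables the markers, so
   that (per the patch comment) the same discriminators are assigned and
   the profile can be read back even without debug info.  */
bool
markers_default_new (int optimize, enum dinfo_level level,
                     bool dwarf_or_codeview, bool auto_profile,
                     bool selective_scheduling)
{
  assert (optimize >= 0);
  return (optimize
          && ((level >= DINFO_LEVEL_NORMAL && dwarf_or_codeview)
              || auto_profile)
          && !selective_scheduling);
}
```

For an optimized build with -fauto-profile and no -g, the old predicate yields false and the new one true; that is exactly the case the patch adds. Without optimization, or with selective scheduling, both still yield false.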

Re: [PATCH] testsuite: arm: Add effective-target vect_early_break to vect-tsvc-*

2025-07-11 Thread Alexandre Oliva

On Jul 11, 2025, Christophe Lyon  wrote:

> I have another concern (hence cc'ing Alexandre): vect.exp calls
> check_vect_support_and_set_flags which defines dg-do-what-default
> according to what it discovers, meaning that for some targets these
> tests are 'run' and on others they are just 'compile'.
> So I suppose we should use 'dg-require-effective-target
> vect_early_break_hw' only when running the tests and
> 'dg-require-effective-target vect_early_break' when compiling them?

Yeah, we may have use for some new dg-require-effective-target variants,
say dg-require-effective-target-for-link or
dg-require-effective-target-for-run to account for this sort of
conditional requirement.

That said, testsuite maintainers have been, erhm, less than responsive
lately, and I've had a number of patches, including a fix for a bug I
introduced that others have also attempted to fix, that have been
unreviewed for 3 months today.

A significant part of my work has required significant attention to the
GCC testsuite; I wonder if it would make sense for me to volunteer to be
appointed as testsuite co-maintainer, or reviewer, or somesuch.  Should
I take any steps towards that end, is this enough of a hint, hint ;-),
or is the fact that this hasn't happened for so long a hint that *I*
should take that it's unwanted for some reason? ;-)

> I suppose at the moment we completely skip these tests, while we
> could at least compile them on some targets?

*nod*.  I'm not sure how much sense it makes to compile these specific
tests when the feature is not available, but it shouldn't hurt.  Perhaps
instead of xfail we should change the dg-final scans to have target
requirements instead, that would be ignored if the feature is not
present.

Now, as for the problem at hand, IIUC there's a disconnect between the
vector support that check_vect_support_and_set_flags tests for, and the
vector support that these tests check for.  The compile/run default set
by the former doesn't necessarily apply to the latter; ISTM that, for
tests with stricter vector unit requirements, we *should* IMHO consider
overriding the action in these tests with dg-do-if (*):

/* { dg-do-if compile { target vect_early_break } } */
/* { dg-do-if run { target vect_early_break_hw } } */

and drop the corresponding dg-require-effective-target directives.

I also wonder whether it would make sense for
check_vect_support_and_set_flags to try arm_v8_neon_ok and arm_v8_neon_hw, and go
for that for the default options and actions.

(*) provided that either of the identical patches that fix it get in
https://gcc.gnu.org/pipermail/gcc-patches/2025-May/684734.html

-- 
Alexandre Oliva, happy hacker  https://blog.lx.oliva.nom.br/
Free Software Activist FSFLA co-founder GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity.
Excluding neuro-others for not behaving "normal" is *not* inclusive!
