[PATCH v2] libstdc++: Fix _Padding_sink in case when predicted is between padwidth and maxwidth [PR109162]

2025-04-25 Thread Tomasz Kamiński
The _Padding_sink was behaving incorrectly when the predicted width (based on
code units count) was higher than _M_maxwidth, but lower than _M_padwidth.
In this case _M_update() returned without calling _M_force_update() and
computing the field width for Unicode encoding, because _M_buffering()
returned 'true'. As a consequence we switched to _M_ignoring() mode, while
storing a sequence with more code units but smaller field width than
_M_maxwidth.

We now call _M_force_update() if the predicted width is greater than or equal
to either _M_padwidth or _M_maxwidth.

This happened for an existing test case on 32-bit architectures.
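
The difference between the two conditions can be illustrated with a small
standalone model. This is only a sketch: the struct and function names are
hypothetical, and buffering_old is an assumed simplification of
_M_buffering() inferred from the description above (it only covers the case
where both a width and a precision are given); it is not the libstdc++ code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the widths tracked by _Padding_sink. */
struct sink_model {
  size_t printwidth; /* predicted width, counted in code units */
  size_t padwidth;   /* requested field width */
  size_t maxwidth;   /* precision, i.e. the maximum field width */
};

/* Assumed model of the old check: keep buffering while the prediction
   is still below either limit. */
static bool buffering_old(const struct sink_model *s)
{
  return s->printwidth < s->padwidth || s->printwidth < s->maxwidth;
}

/* Old behaviour: the exact width is only computed once buffering stops. */
static bool calls_force_update_old(const struct sink_model *s)
{
  return !buffering_old(s);
}

/* New behaviour, following the patched condition in _M_update(). */
static bool calls_force_update_new(const struct sink_model *s)
{
  return s->printwidth >= s->padwidth || s->printwidth >= s->maxwidth;
}
```

With a format spec such as {:*>300.200s} and a predicted width of 250, the
old model skips the exact computation while the new one performs it.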

libstdc++-v3/ChangeLog:

* include/std/format (_Padding_sink::_M_update): Fixed condition for
calling _M_force_update.
* testsuite/std/format/debug.cc: Add test that reproduces this issue
on 64bit architecture.
* testsuite/std/format/ranges/sequence.cc: Another edge value test.
---
Fixed some typos in message and replaced > with >= in check.
Tested on x86_64-linux with unix{,-std=c++98,-std=gnu++11,-std=gnu++20,
-D_GLIBCXX_USE_CXX11_ABI=0/-D_GLIBCXX_DEBUG,-D_GLIBCXX_ASSERTIONS/-m32,-std=gnu++23}.
OK for trunk?

 libstdc++-v3/include/std/format  | 7 ---
 libstdc++-v3/testsuite/std/format/debug.cc   | 9 +
 libstdc++-v3/testsuite/std/format/ranges/sequence.cc | 9 +
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 69d8d189db6..b3794b64b59 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -3697,9 +3697,10 @@ namespace __format
   _M_update(size_t __new)
   {
_M_printwidth += __new;
-   if (_M_buffering())
- return true;
-   return _M_force_update();
+   // Compute estimated witdth, to see if is not reduced.
+   if (_M_printwidth >= _M_padwidth || _M_printwidth >= _M_maxwidth)
+ return _M_force_update();
+   return true;
   }
 
   void
diff --git a/libstdc++-v3/testsuite/std/format/debug.cc 
b/libstdc++-v3/testsuite/std/format/debug.cc
index d3402f80f4b..6165a295496 100644
--- a/libstdc++-v3/testsuite/std/format/debug.cc
+++ b/libstdc++-v3/testsuite/std/format/debug.cc
@@ -596,6 +596,10 @@ void test_padding()
   VERIFY( strip_prefix(resv, 48, '*') );
   VERIFY( resv == inv );
 
+  resv = res = std::format("{:*>300.200s}", in);
+  VERIFY( strip_prefix(resv, 108, '*') );
+  VERIFY( resv == inv );
+
   resv = res = std::format("{:*>240.200s}", in);
   VERIFY( strip_prefix(resv, 48, '*') );
   VERIFY( resv == inv );
@@ -678,6 +682,11 @@ void test_padding()
   VERIFY( strip_quotes(resv) );
   VERIFY( resv == inv );
 
+  resv = res = std::format("{:*>300.200?}", in);
+  VERIFY( strip_prefix(resv, 106, '*') );
+  VERIFY( strip_quotes(resv) );
+  VERIFY( resv == inv );
+
   resv = res = std::format("{:*>240.200?}", in);
   VERIFY( strip_prefix(resv, 46, '*') );
   VERIFY( strip_quotes(resv) );
diff --git a/libstdc++-v3/testsuite/std/format/ranges/sequence.cc 
b/libstdc++-v3/testsuite/std/format/ranges/sequence.cc
index 75fe4c19a52..32242860f10 100644
--- a/libstdc++-v3/testsuite/std/format/ranges/sequence.cc
+++ b/libstdc++-v3/testsuite/std/format/ranges/sequence.cc
@@ -295,6 +295,15 @@ void test_padding()
   resv = res = std::format("{:*>10n:}", vs);
   VERIFY( check_elems(resv, false) );
 
+  resv = res = std::format("{:*>256}", vs);
+  VERIFY( strip_prefix(resv, 48, '*') );
+  VERIFY( strip_squares(resv) );
+  VERIFY( check_elems(resv, true) );
+
+  resv = res = std::format("{:*>256n}", vs);
+  VERIFY( strip_prefix(resv, 50, '*') );
+  VERIFY( check_elems(resv, true) );
+
   resv = res = std::format("{:*>240}", vs);
   VERIFY( strip_prefix(resv, 32, '*') );
   VERIFY( strip_squares(resv) );
-- 
2.49.0



[PATCH v2] tailcall: Support ERF_RETURNS_ARG for tailcall [PR67797]

2025-04-25 Thread Andrew Pinski
r15-6943-g9c4397cafc5ded added support to undo the IPA-VRP return value
optimization for tail calls. Using the same code, ERF_RETURNS_ARG can be
supported for functions which return one of their arguments.
This allows for tail calling of memset/memcpy in some cases which were not
handled before.

Note this is very similar to
https://gcc.gnu.org/legacy-ml/gcc-patches/2016-11/msg02485.html except
it has a few more checks.  Also, on the question of expand vs tail call: this
path is also used by the IPA-VRP return value path, and yes, we get a tail call.
Note the review in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83142#c2
mentions re-instantiating a LHS on the call and propagating it to dominating
uses. Even though that can be done for the ERF_RETURNS_ARG case, it is not
done for the IPA-VRP return value case already, so I don't think there is
anything to be done there.
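
As a rough illustration of the check described above, the following
standalone sketch decodes a return-flags word into the index of the returned
argument. The SKETCH_* constants and the function name are hypothetical
stand-ins chosen for this example; they only mirror the shape of the
ERF_RETURNS_ARG / ERF_RETURN_ARG_MASK test in the patch, not GCC's
definitions.

```c
/* Illustrative stand-ins for the ERF_* return flags; the values are
   assumptions made for this sketch only. */
#define SKETCH_ERF_RETURN_ARG_MASK 3
#define SKETCH_ERF_RETURNS_ARG (1 << 2)

/* Return the index of the argument the call is known to return,
   or -1 if the flags do not identify a valid argument. */
static int returned_arg_index(int flags, unsigned num_args)
{
  if ((flags & SKETCH_ERF_RETURNS_ARG) == 0)
    return -1;
  unsigned idx = (unsigned)(flags & SKETCH_ERF_RETURN_ARG_MASK);
  return idx < num_args ? (int)idx : -1;
}
```

For a memset-like call the index would be 0, so "memset (s, 0, n); return s;"
can be treated as if it returned the value of the call.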

Changes since v1:
* v2: Add a useless_type_conversion_p check as suggested by Jakub
  and add a testcase for that.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/67797

gcc/ChangeLog:

* tree-tailcall.cc (find_tail_calls): Add support for ERF_RETURNS_ARG.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/tailcall-14.c: New test.
* gcc.dg/tree-ssa/tailcall-15.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/tree-ssa/tailcall-14.c |  25 +
 gcc/testsuite/gcc.dg/tree-ssa/tailcall-15.c |  16 +++
 gcc/tree-tailcall.cc| 109 +++-
 3 files changed, 104 insertions(+), 46 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/tailcall-14.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/tailcall-15.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/tailcall-14.c 
b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-14.c
new file mode 100644
index 000..6fadff8ea00
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-14.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-tailc-details" } */
+
+/* PR tree-optimization/67797 */
+
+void *my_func(void *s, int n)
+{
+  __builtin_memset(s, 0, n);
+  return s;
+}
+void *my_func1(void *d, void *s, int n)
+{
+  __builtin_memcpy(d, s, n);
+  return d;
+}
+void *my_func2(void *s, void *p1, int n)
+{
+  if (p1)
+__builtin_memcpy(s, p1, n);
+  else
+__builtin_memset(s, 0, n);
+  return s;
+}
+
+/* { dg-final { scan-tree-dump-times "Found tail call" 4 "tailc"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/tailcall-15.c 
b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-15.c
new file mode 100644
index 000..bf24fd8562f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-15.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-tailc-details" } */
+
+/* PR tree-optimization/67797 */
+
+/* We should not get a tail call here since the
+   types don't match and we don't know how the argument
+   truncation will work. */
+
+unsigned char my_func(int n)
+{
+  __builtin_memset((void*)0, 0, n);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not "Found tail call" "tailc"} } */
diff --git a/gcc/tree-tailcall.cc b/gcc/tree-tailcall.cc
index f593363dae4..db3010f67c4 100644
--- a/gcc/tree-tailcall.cc
+++ b/gcc/tree-tailcall.cc
@@ -1083,57 +1083,74 @@ find_tail_calls (basic_block bb, struct tailcall **ret, 
bool only_musttail,
 {
   bool ok = false;
   value_range val;
-  tree valr;
-  /* If IPA-VRP proves called function always returns a singleton range,
-the return value is replaced by the only value in that range.
-For tail call purposes, pretend such replacement didn't happen.  */
   if (ass_var == NULL_TREE && !tail_recursion)
-   if (tree type = gimple_range_type (call))
- if (tree callee = gimple_call_fndecl (call))
-   if ((INTEGRAL_TYPE_P (type)
-|| SCALAR_FLOAT_TYPE_P (type)
-|| POINTER_TYPE_P (type))
-   && useless_type_conversion_p (TREE_TYPE (TREE_TYPE (callee)),
- type)
-   && useless_type_conversion_p (TREE_TYPE (ret_var), type)
-   && ipa_return_value_range (val, callee)
-   && val.singleton_p (&valr))
+   {
+ tree other_value = NULL_TREE;
+ /* If we have a function call that we know the return value is the 
same
+as the argument, try the argument too. */
+ int flags = gimple_call_return_flags (call);
+ if ((flags & ERF_RETURNS_ARG) != 0
+ && (flags & ERF_RETURN_ARG_MASK) < gimple_call_num_args (call))
+   {
+ tree arg = gimple_call_arg (call, flags & ERF_RETURN_ARG_MASK);
+ if (useless_type_conversion_p (TREE_TYPE (arg), TREE_TYPE 
(ret_var)))
+   other_value = arg;
+   }
+ /* If IPA-VRP proves called function always returns a singleton range,
+the return value is replaced by the only value in that range.

RE: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost

2025-04-25 Thread Li, Pan2
> IMHO this is how it should roughly look like:

> With GR2VR=2:
> vadd.vv: cost 4 = COST_N_INSNS (1)
> vmv.v.x: cost COST_N_INSNS (GR2VR) = 8
> vadd.vx: cost 4 + GR2VR * COST_N_INSNS (1) = 12

> With GR2VR=1:
> vadd.vv: cost 4
> vmv.v.x: cost 4
> vadd.vx: cost 4 + 4 = 8

> With GR2VR=0:
> vadd.vv: cost 4
> vmv.v.x: cost 4 (or less?)
> vadd.vx: cost 4 + 0 * COST_N_INSNS (1) = 4

> So with GR2VR > 0 we would perform the replacement when the frequency is 
> similar.  With GR2VR == 0 we should always do.

> vmv.v.x cost 4 with GR2VR cost == 0 is a bit debatable but setting it to 0 
> would also seem off.
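
The quoted figures can be reproduced with a small standalone sketch. The
function names and the SKETCH_COSTS_N_INSNS macro are hypothetical; the
comparison ignores the frequency weighting that late-combine applies and
assumes that a tie keeps the replacement.

```c
/* Mirrors the scaling used in the quoted figures: one insn costs 4. */
#define SKETCH_COSTS_N_INSNS(n) ((n) * 4)

static int cost_vadd_vv(void)
{
  return SKETCH_COSTS_N_INSNS(1);
}

/* vmv.v.x costs GR2VR insns; for GR2VR == 0 the quoted mail leaves the
   value open, so one insn (4) is used here. */
static int cost_vmv_v_x(int gr2vr)
{
  return gr2vr > 0 ? SKETCH_COSTS_N_INSNS(gr2vr) : SKETCH_COSTS_N_INSNS(1);
}

static int cost_vadd_vx(int gr2vr)
{
  return SKETCH_COSTS_N_INSNS(1) + gr2vr * SKETCH_COSTS_N_INSNS(1);
}

/* Unweighted comparison: keep the vadd.vx replacement when it is not
   more expensive than vmv.v.x followed by vadd.vv. */
static int keeps_replacement(int gr2vr)
{
  int original = cost_vmv_v_x(gr2vr) + cost_vadd_vv();
  return cost_vadd_vx(gr2vr) <= original;
}
```

Under these assumptions the two sides tie for GR2VR > 0, so the outcome there
depends on the frequency weighting, while GR2VR == 0 always favours vadd.vx.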

Makes sense to me; it looks like the combine will always take place if GR2VR is
0, 1 or 2 for now.
I tried to customize the cost here to make the combine fail, but did not
succeed with the change below.

+  if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 0)))) {
+    cost_val = 1;
+  }
+
+  if (rcode == PLUS && riscv_v_ext_mode_p (GET_MODE (XEXP (x, 0)))
+      && riscv_v_ext_mode_p (GET_MODE (XEXP (x, 1)))) {
+    cost_val = 8;
+  }
+
+  if (rcode == PLUS && riscv_v_ext_mode_p (GET_MODE (XEXP (x, 0)))
+      && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 1)))) {
+    cost_val = 2; // never picked up during combine.
+  }

It takes 8 for the original cost as well as for the replacement (see the
combine log below). Thus, combine will always keep the replacement.

  51   │ trying to combine definition of r135 in:
  52   │11: r135:RVVM1DI=vec_duplicate(r150:DI)
  53   │ into:
  54   │18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
  55   │   REG_DEAD r146:RVVM1DI
  56   │ successfully matched this instruction to *add_vx_rvvm1di:
  57   │ (set (reg:RVVM1DI 147 [ vect__6.8_16 ])
  58   │ (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
  59   │ (reg:RVVM1DI 146)))
  60   │ original cost = 4 + 32 (weighted: 262.469092), replacement cost = 32 
(weighted: 258.909092); keeping replacement
  61   │ rescanning insn with uid = 18.
  62   │ updating insn 18 in-place
  63   │ verify found no changes in insn with uid = 18.
  64   │ deleting insn 11
  65   │ deleting insn with uid = 11.

Based on the above, I made another attempt to understand how late-combine
uses RTX_COST: set vadd v1, (vec_dup(x1)) to 8 and everything else to 1.

+  if (rcode == PLUS) {
+rtx arg0 = XEXP (x, 0);
+rtx arg1 = XEXP (x, 1);
+
+if (riscv_v_ext_mode_p (GET_MODE (arg1))
+   && GET_CODE (arg0) == VEC_DUPLICATE) {
+   cost_val = 8;
+}
+  }

Then late-combine rejects the replacement as expected. Thus, the condition for
the combine to fail looks like vmv.v.x + vadd.vv < vadd.vx here, if my
understanding is correct.  If so, it will also impact the --param we would like
to introduce: a single --param=gr2vr_cost=XXX is not good enough to make sure
that the condition is true; we may also need --param=vv_cost/vx_cost=XXX.

Btw, is there any way to attach a cost to the define_insn_and_split? That
might make it easier to pick up from RTX_COST, up to a point.

  51   │ trying to combine definition of r135 in:
  52   │11: r135:RVVM1DI=vec_duplicate(r150:DI)
  53   │ into:
  54   │18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
  55   │   REG_DEAD r146:RVVM1DI
  56   │ successfully matched this instruction to *add_vx_rvvm1di:
  57   │ (set (reg:RVVM1DI 147 [ vect__6.8_16 ])
  58   │ (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
  59   │ (reg:RVVM1DI 146)))
  60   │ original cost = 4 + 4 (weighted: 35.923637), replacement cost = 32 
(weighted: 258.909092); rejecting replacement
  61   │

Pan


-Original Message-
From: Robin Dapp  
Sent: Thursday, April 24, 2025 8:13 PM
To: Li, Pan2 ; Robin Dapp ; 
gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Chen, 
Ken ; Liu, Hongtao ; Robin Dapp 

Subject: Re: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx 
on GR2VR cost

>> Ah, I see, thanks.  So vec_dup costs 1 + 2 and vadd.vv costs 1 totalling 4 
>> while vadd.vx costs 1 + 2, making it cheaper?
>
> Yes, looks we need to just assign the GR2VR when vec_dup. I also tried diff 
> cost here to see
> the impact to late-combine.
>
> +  if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 0)))) {
> +cost_val = get_vector_costs ()->regmove->GR2VR;
> +  }
>
>  cut line 
>
> If GR2VR is 2, we will perform the combine as below.
>
>  51 trying to combine definition of r135 in:
>  5211: r135:RVVM1DI=vec_duplicate(r150:DI)
>  53 into:
>  5418: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
>  55   REG_DEAD r146:RVVM1DI
>  56 successfully matched this instruction to *add_vx_rvvm1di:
>  57 (set (reg:RVVM1DI 147 [ vect__6.8_16 ])
>  58 (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
>  59 (reg:RVVM1DI 146)))
>  60 original cost = 8 + 4 (weighted: 39.483637), replacement cost = 4 
> (weighted: 32.363637); keeping replacement
>  61 rescanning insn with uid = 18.
>  62 updating

Re: [PATCH] simplify-rtx: Combine bitwise operations in more cases

2025-04-25 Thread Jeff Law




On 4/25/25 9:29 AM, Richard Sandiford wrote:


@@ -4274,6 +4286,18 @@ simplify_context::simplify_binary_operation_1 (rtx_code 
code,
return simplify_gen_binary (LSHIFTRT, mode, XEXP (op0, 0), XEXP 
(op0, 1));
}
  
+  /* Convert (and (subreg (not X) off) Y) into (and (not (subreg X off)) Y)

+to expose opportunities to combine AND and NOT.  */
+  if (GET_CODE (op0) == SUBREG


I think we should also check !paradoxical_subreg_p (op0).  There are
two reasons:

(1) (and (subreg (something-narrower)) (const_int mask)) is a common
 zero-extension idiom.  Pushing the subreg down into the first
 operand would prevent that.

(2) Applying the rule in the paradoxical case would compute the NOT
     in a wider mode, which might be more expensive.

I'd gotten as far as concluding it was safe from a correctness 
standpoint to not test for paradoxicals.  But I hadn't even considered 
the impact it could have on the zero extension idiom which seems fairly 
important.


I'm not as worried about the size of the NOT.  If NOT in a wider mode is 
more expensive, then it probably should be reflected as such in rtx 
costing for that port and the right things should just happen for the 
most part.


But yes, I'm overall supportive of the patch and I think Richard's 
comments are solid recommendations.
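
For the non-paradoxical (narrowing, lowpart) case, the correctness of the
rewrite can be checked with a small sketch in plain C. The function names are
hypothetical, and a cast to uint8_t stands in for a lowpart subreg of a 32-bit
value; this only illustrates the identity and is not code from the patch.

```c
#include <stdint.h>

/* Models (and (subreg:QI (not:SI X)) Y): invert in the wide mode,
   then take the low byte. */
static uint8_t and_subreg_of_not(uint32_t x, uint8_t y)
{
  return (uint8_t)((uint8_t)(~x) & y);
}

/* Models (and (not:QI (subreg:QI X)) Y): take the low byte first,
   then invert in the narrow mode. */
static uint8_t and_not_of_subreg(uint32_t x, uint8_t y)
{
  uint8_t low = (uint8_t)x;
  return (uint8_t)((uint8_t)(~low) & y);
}
```

Both functions agree because each bit of a NOT depends only on the same bit
of its input, so truncating before or after the inversion gives the same low
byte.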


Jeff


RE: [PATCH 2/2] RISC-V: Add testcases for signed vector SAT_ADD IMM form 1

2025-04-25 Thread Li, Pan2
Nit: there is a trailing empty line as shown below; otherwise LGTM for the RISC-V part.

+++ 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm_type_check-1-i8.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -ftree-vectorize 
-fdump-tree-optimized" } */
+
+#include "vec_sat_arith.h"
+
+DEF_VEC_SAT_S_ADD_IMM_FMT_1(0, int8_t, uint8_t, 200, INT8_MIN, INT8_MAX)
+DEF_VEC_SAT_S_ADD_IMM_FMT_1(1, int8_t, uint8_t, -300, INT8_MIN, INT8_MAX)
+
+/* { dg-final { scan-tree-dump-not ".SAT_ADD " "optimized" } } */
+ <--- here.

Pan

-Original Message-
From: Li Xu  
Sent: Thursday, January 2, 2025 4:04 PM
To: gcc-patches@gcc.gnu.org
Cc: kito.ch...@gmail.com; richard.guent...@gmail.com; tamar.christ...@arm.com; 
juzhe.zh...@rivai.ai; Li, Pan2 ; jeffreya...@gmail.com; 
rdapp@gmail.com; xuli 
Subject: [PATCH 2/2] RISC-V: Add testcases for signed vector SAT_ADD IMM form 1

From: xuli 

This patch adds test cases for form 1, as shown below:

void __attribute__((noinline))   \
vec_sat_s_add_imm_##T##_fmt_1##_##INDEX (T *out, T *op_1, unsigned limit) \
{\
  unsigned i;\
  for (i = 0; i < limit; i++)\
{\
  T x = op_1[i]; \
  T sum = (UT)x + (UT)IMM;   \
  out[i] = (x ^ IMM) < 0 \
? sum\
: (sum ^ x) >= 0 \
  ? sum  \
  : x < 0 ? MIN : MAX;   \
}\
}
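
For reference, the loop body can be written out as a scalar function for
T = int8_t and UT = uint8_t. This is a sketch with a hypothetical function
name, and it assumes that narrowing back to int8_t wraps modulo 256, which is
what GCC documents for this conversion.

```c
#include <stdint.h>

/* Scalar form of the form 1 expression with T = int8_t, UT = uint8_t. */
static int8_t sat_s_add_imm_i8(int8_t x, int8_t imm)
{
  /* Wrapping sum computed in the unsigned type. */
  int8_t sum = (int8_t)(uint8_t)((uint8_t)x + (uint8_t)imm);

  if ((x ^ imm) < 0)   /* operands differ in sign: cannot overflow */
    return sum;
  if ((sum ^ x) >= 0)  /* sign of the result matches x: no overflow */
    return sum;
  /* Overflow: saturate towards the sign of x. */
  return x < 0 ? INT8_MIN : INT8_MAX;
}
```

For example, 100 + 100 saturates to INT8_MAX and -100 + -100 saturates to
INT8_MIN, while 100 + -50 yields 50.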

Passed the rv64gcv regression test.

Signed-off-by: Li Xu 
gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/sat/vec_sat_arith.h: add signed vec 
SAT_ADD IMM form1.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_data.h: add sat_s_add_imm 
data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-1-i8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-run-1-i16.c: New 
test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-run-1-i32.c: New 
test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-run-1-i64.c: New 
test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-run-1-i8.c: New 
test.
* 
gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm_type_check-1-i16.c: New test.
* 
gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm_type_check-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm_type_check-1-i8.c: 
New test.

---
 .../riscv/rvv/autovec/sat/vec_sat_arith.h |  25 ++
 .../riscv/rvv/autovec/sat/vec_sat_data.h  | 240 ++
 .../rvv/autovec/sat/vec_sat_s_add_imm-1-i16.c |  10 +
 .../rvv/autovec/sat/vec_sat_s_add_imm-1-i32.c |  10 +
 .../rvv/autovec/sat/vec_sat_s_add_imm-1-i64.c |  10 +
 .../rvv/autovec/sat/vec_sat_s_add_imm-1-i8.c  |  10 +
 .../autovec/sat/vec_sat_s_add_imm-run-1-i16.c |  28 ++
 .../autovec/sat/vec_sat_s_add_imm-run-1-i32.c |  28 ++
 .../autovec/sat/vec_sat_s_add_imm-run-1-i64.c |  28 ++
 .../autovec/sat/vec_sat_s_add_imm-run-1-i8.c  |  28 ++
 .../sat/vec_sat_s_add_imm_type_check-1-i16.c  |   9 +
 .../sat/vec_sat_s_add_imm_type_check-1-i32.c  |   9 +
 .../sat/vec_sat_s_add_imm_type_check-1-i8.c   |  10 +
 13 files changed, 445 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-1-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-1-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-1-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-1-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-run-1-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-run-1-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-run-1-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-run-1-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm_type_check-1-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm_type_check-1-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv

Re: [PATCH] Add m32c*-*-* to the list of obsolete targets

2025-04-25 Thread Jeff Law




On 4/25/25 12:38 PM, Iain Buclaw wrote:

Hi,

This patch marks m32c*-*-* targets obsolete in GCC 16.  The target has
not had a maintainer since GCC 9 (r9-1950), and fails to compile even
the simplest of functions since GCC 8 (r8-777, as reported in PR83670).

OK for trunk?

Regards,
Iain.

---
contrib/ChangeLog:

* config-list.mk: Add m32c*-*-* to the list of obsoleted targets.

gcc/ChangeLog:

* config.gcc (LIST): --enable-obsolete for m32c-elf.

OK
jeff



[PATCH] simplify-rtx: Simplify `(zero_extend (and x CST))` -> (and (subreg x) CST)

2025-04-25 Thread Andrew Pinski
This adds the simplification of a ZERO_EXTEND of an AND. This optimization
was already handled in combine via combine_simplify_rtx and the handling
there of compound_operations (ZERO_EXTRACT).
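
The reason the constant must not have its sign bit set can be sketched in
plain C. The function names are hypothetical; WIDE models a 32-bit register
whose low byte is X and whose upper bits are unknown, and the constant is
sign-extended to the wide mode in the same way a CONST_INT is.

```c
#include <stdint.h>

/* Models (zero_extend:SI (and:QI X C)), with X the low byte of WIDE. */
static uint32_t zext_of_and(uint32_t wide, int8_t c)
{
  uint8_t x = (uint8_t)wide;
  return (uint32_t)(uint8_t)(x & (uint8_t)c);
}

/* Models (and:SI (lowpart:SI X) C): the AND is done in the wide mode
   with the sign-extended constant, so the upper bits of WIDE are only
   cleared when C is non-negative. */
static uint32_t and_of_lowpart(uint32_t wide, int8_t c)
{
  return wide & (uint32_t)(int32_t)c;
}
```

With C = 0x3c both forms agree, while with C = -128 (0x80 in QImode) the wide
AND keeps the unknown upper bits, which is why the patch requires a positive
constant.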

Built and tested for aarch64-linux-gnu.
Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* simplify-rtx.cc (simplify_context::simplify_unary_operation_1) :
Add simplification for AND with a constant.

Signed-off-by: Andrew Pinski 
---
 gcc/simplify-rtx.cc | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 88d31a71c05..06b52ca8003 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -1709,6 +1709,17 @@ simplify_context::simplify_unary_operation_1 (rtx_code 
code, machine_mode mode,
   if (GET_MODE (op) == mode)
return op;
 
+  /* (zero_extend:SI (and:QI X (const))) -> (and:SI (lowpart:SI X) const)
+where const does not sign bit set. */
+  if (GET_CODE (op) == AND
+ && CONST_INT_P (XEXP (op, 1))
+ && INTVAL (XEXP (op, 1)) > 0)
+   {
+ rtx tem = rtl_hooks.gen_lowpart_no_emit (mode, XEXP (op, 0));
+ if (tem)
+   return simplify_gen_binary (AND, mode, tem, XEXP (op, 1));
+   }
+
   /* Check for a zero extension of a subreg of a promoted
 variable, where the promotion is zero-extended, and the
 target mode is the same as the variable's promotion.  */
-- 
2.43.0



[PATCH 3/3] aarch64: Add more vector permute tests for the FMOV optimization [PR100165]

2025-04-25 Thread Pengxuan Zheng
This patch adds more tests for vector permutes which can now be optimized as
FMOV with the generic PERM change and the aarch64 AND patch.

PR target/100165

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/fmov-3.c: New test.
* gcc.target/aarch64/fmov-4.c: New test.
* gcc.target/aarch64/fmov-5.c: New test.
* gcc.target/aarch64/fmov-be-3.c: New test.
* gcc.target/aarch64/fmov-be-4.c: New test.
* gcc.target/aarch64/fmov-be-5.c: New test.

Signed-off-by: Pengxuan Zheng 
---
 gcc/testsuite/gcc.target/aarch64/fmov-3.c| 130 
 gcc/testsuite/gcc.target/aarch64/fmov-4.c|  94 
 gcc/testsuite/gcc.target/aarch64/fmov-5.c| 150 +++
 gcc/testsuite/gcc.target/aarch64/fmov-be-3.c |  75 ++
 gcc/testsuite/gcc.target/aarch64/fmov-be-4.c |  54 +++
 gcc/testsuite/gcc.target/aarch64/fmov-be-5.c | 150 +++
 6 files changed, 653 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-be-3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-be-4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-be-5.c

diff --git a/gcc/testsuite/gcc.target/aarch64/fmov-3.c 
b/gcc/testsuite/gcc.target/aarch64/fmov-3.c
new file mode 100644
index 000..e7cf5e0b5de
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/fmov-3.c
@@ -0,0 +1,130 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+typedef short v4hi __attribute__ ((vector_size (8)));
+typedef char v8qi __attribute__ ((vector_size (8)));
+typedef int v4si __attribute__ ((vector_size (16)));
+typedef float v4sf __attribute__ ((vector_size (16)));
+typedef short v8hi __attribute__ ((vector_size (16)));
+typedef char v16qi __attribute__ ((vector_size (16)));
+
+/*
+** f_v4hi:
+** fmov s0, s0
+** ret
+*/
+v4hi
+f_v4hi (v4hi x)
+{
+  return __builtin_shuffle (x, (v4hi){ 0, 0, 0, 0 }, (v4hi){ 0, 1, 4, 5 });
+}
+
+/*
+** g_v4hi:
+** uzp1 v([0-9]+).2d, v0.2d, v0.2d
+** adrp x([0-9]+), .LC0
+** ldr d([0-9]+), \[x\2, #:lo12:.LC0\]
+** tbl v0.8b, {v\1.16b}, v\3.8b
+** ret
+*/
+v4hi
+g_v4hi (v4hi x)
+{
+  return __builtin_shuffle (x, (v4hi){ 0, 0, 0, 0 }, (v4hi){ 3, 1, 4, 2 });
+}
+
+/*
+** f_v8hi:
+** fmov s0, s0
+** ret
+*/
+v8hi
+f_v8hi (v8hi x)
+{
+  return __builtin_shuffle (x, (v8hi){ 0, 0, 0, 0, 0, 0, 0, 0 },
+   (v8hi){ 0, 1, 8, 9, 10, 11, 12, 13 });
+}
+
+/*
+** f_v4si:
+** fmov d0, d0
+** ret
+*/
+v4si
+f_v4si (v4si x)
+{
+  return __builtin_shuffle (x, (v4si){ 0, 0, 0, 0 }, (v4si){ 0, 1, 4, 5 });
+}
+
+/*
+** g_v4si:
+** fmov d0, d0
+** ret
+*/
+v4si
+g_v4si (v4si x)
+{
+  return __builtin_shuffle ((v4si){ 0, 0, 0, 0 }, x, (v4si){ 4, 5, 2, 3 });
+}
+
+/*
+** h_v4si:
+** fmov s0, s0
+** ret
+*/
+v4si
+h_v4si (v4si x)
+{
+  return __builtin_shuffle (x, (v4si){ 0, 0, 0, 0 }, (v4si){ 0, 4, 5, 6 });
+}
+
+/*
+** f_v4sf:
+** fmov d0, d0
+** ret
+*/
+v4sf
+f_v4sf (v4sf x)
+{
+  return __builtin_shuffle (x, (v4sf){ 0, 0, 0, 0 }, (v4si){ 0, 1, 6, 7 });
+}
+
+/*
+** f_v8qi:
+** fmov s0, s0
+** ret
+*/
+v8qi
+f_v8qi (v8qi x)
+{
+  return __builtin_shuffle (x, (v8qi){ 0, 0, 0, 0, 0, 0, 0, 0 },
+   (v8qi){ 0, 1, 2, 3, 10, 11, 12, 13 });
+}
+
+/*
+** f_v16qi:
+** fmov d0, d0
+** ret
+*/
+v16qi
+f_v16qi (v16qi x)
+{
+  return __builtin_shuffle (
+  x, (v16qi){ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
+  (v16qi){ 0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23 });
+}
+
+/*
+** g_v16qi:
+** fmov s0, s0
+** ret
+*/
+v16qi
+g_v16qi (v16qi x)
+{
+  return __builtin_shuffle (
+  x, (v16qi){ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
+  (v16qi){ 0, 1, 2, 3, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27 });
+}
+
diff --git a/gcc/testsuite/gcc.target/aarch64/fmov-4.c 
b/gcc/testsuite/gcc.target/aarch64/fmov-4.c
new file mode 100644
index 000..ba976251354
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/fmov-4.c
@@ -0,0 +1,94 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target ("arch=armv8.2-a+fp16")
+
+typedef short v4hi __attribute__ ((vector_size (8)));
+typedef char v8qi __attribute__ ((vector_size (8)));
+typedef short v8hi __attribute__ ((vector_size (16)));
+typedef char v16qi __attribute__ ((vector_size (16)));
+
+/*
+** f_v4hi:
+** fmov h0, h0
+** ret
+*/
+v4hi
+f_v4hi (v4hi x)
+{
+  return __builtin_shuffle (x, (v4hi){ 0, 0, 0, 0 }, (v4hi){ 0, 4, 5, 6 });
+}
+
+/*
+** g_v4hi:
+** fmov h0, h0
+** ret
+*/
+v4hi
+g_v4hi (v4hi x)
+{
+  return __builtin_shuffle 

[PATCH 2/3] aarch64: Optimize AND with certain vector of immediates as FMOV [PR100165]

2025-04-25 Thread Pengxuan Zheng
We can optimize an AND with certain vectors of immediates as FMOV if the result
of the AND is as if the upper lanes of the input vector are set to zero and the
lower lanes remain unchanged.

For example, at present:

v4hi
f_v4hi (v4hi x)
{
  return x & (v4hi){ 0xffff, 0xffff, 0, 0 };
}

generates:

f_v4hi:
movi d31, 0xffffffff
and v0.8b, v0.8b, v31.8b
ret

With this patch, it generates:

f_v4hi:
fmov s0, s0
ret
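
The kind of mask that qualifies can be sketched as follows for 16-bit lanes.
The helper below is hypothetical and is not the aarch64_simd_valid_and_imm_fmov
implementation from this patch; it assumes little-endian lane numbering with
lane 0 first.

```c
#include <stddef.h>
#include <stdint.h>

/* Given an AND mask of 16-bit lanes, return the width in bits (16, 32
   or 64) of the scalar register whose FMOV would have the same effect,
   or 0 if the mask does not qualify. */
static unsigned fmov_width_for_mask(const uint16_t *mask, size_t nelt)
{
  size_t ones = 0;

  /* Count the leading lanes that are kept unchanged. */
  while (ones < nelt && mask[ones] == 0xffff)
    ones++;

  /* Every remaining lane must be cleared. */
  for (size_t i = ones; i < nelt; i++)
    if (mask[i] != 0)
      return 0;

  /* Keeping no lane or every lane is not a zeroing move. */
  if (ones == 0 || ones == nelt)
    return 0;

  unsigned bits = (unsigned)ones * 16;
  return (bits == 16 || bits == 32 || bits == 64) ? bits : 0;
}
```

A result of 32 corresponds to "fmov s0, s0" and 64 to "fmov d0, d0"; a result
of 16 corresponds to the h-register form, which the tests in this series only
use with +fp16 enabled.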

PR target/100165

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_output_fmov): New prototype.
(aarch64_simd_valid_and_imm_fmov): Likewise.
* config/aarch64/aarch64-simd.md (and3): Allow FMOV
codegen.
* config/aarch64/aarch64.cc (aarch64_simd_valid_and_imm_fmov): New
function.
(aarch64_output_fmov): Likewise.
* config/aarch64/constraints.md (Df): New constraint.
* config/aarch64/predicates.md (aarch64_reg_or_and_imm): Update
predicate to support FMOV codegen.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/fmov-1.c: New test.
* gcc.target/aarch64/fmov-2.c: New test.
* gcc.target/aarch64/fmov-be-1.c: New test.
* gcc.target/aarch64/fmov-be-2.c: New test.

Signed-off-by: Pengxuan Zheng 
---
 gcc/config/aarch64/aarch64-protos.h  |   2 +
 gcc/config/aarch64/aarch64-simd.md   |  10 +-
 gcc/config/aarch64/aarch64.cc|  75 ++
 gcc/config/aarch64/constraints.md|   7 +
 gcc/config/aarch64/predicates.md |   3 +-
 gcc/testsuite/gcc.target/aarch64/fmov-1.c| 149 +++
 gcc/testsuite/gcc.target/aarch64/fmov-2.c|  90 +++
 gcc/testsuite/gcc.target/aarch64/fmov-be-1.c | 149 +++
 gcc/testsuite/gcc.target/aarch64/fmov-be-2.c |  90 +++
 9 files changed, 569 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-be-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmov-be-2.c

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 1ca86c9d175..c461fce8896 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -933,6 +933,7 @@ char *aarch64_output_simd_mov_imm (rtx, unsigned);
 char *aarch64_output_simd_orr_imm (rtx, unsigned);
 char *aarch64_output_simd_and_imm (rtx, unsigned);
 char *aarch64_output_simd_xor_imm (rtx, unsigned);
+char *aarch64_output_fmov (rtx);
 
 char *aarch64_output_sve_mov_immediate (rtx);
 char *aarch64_output_sve_ptrues (rtx);
@@ -948,6 +949,7 @@ bool aarch64_simd_scalar_immediate_valid_for_move (rtx, 
scalar_int_mode);
 bool aarch64_simd_shift_imm_p (rtx, machine_mode, bool);
 bool aarch64_sve_ptrue_svpattern_p (rtx, struct simd_immediate_info *);
 bool aarch64_simd_valid_and_imm (rtx);
+bool aarch64_simd_valid_and_imm_fmov (rtx, unsigned int * = NULL);
 bool aarch64_simd_valid_mov_imm (rtx);
 bool aarch64_simd_valid_orr_imm (rtx);
 bool aarch64_simd_valid_xor_imm (rtx);
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index e2afe87e513..e051e6459a5 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1117,17 +1117,17 @@ (define_insn "fabd3"
   [(set_attr "type" "neon_fp_abd_")]
 )
 
-;; For AND (vector, register) and BIC (vector, immediate)
+;; For AND (vector, register), BIC (vector, immediate) and FMOV (register)
 (define_insn "and3"
   [(set (match_operand:VDQ_I 0 "register_operand")
(and:VDQ_I (match_operand:VDQ_I 1 "register_operand")
   (match_operand:VDQ_I 2 "aarch64_reg_or_and_imm")))]
   "TARGET_SIMD"
-  {@ [ cons: =0 , 1 , 2   ]
- [ w, w , w   ] and\t%0., %1., %2.
- [ w, 0 , Db  ] << aarch64_output_simd_and_imm (operands[2], 
);
+  {@ [ cons: =0 , 1 , 2  ; attrs: type   ]
+ [ w, w , w  ; neon_logic ] and\t%0., %1., 
%2.
+ [ w, 0 , Db ; neon_logic ] << aarch64_output_simd_and_imm 
(operands[2], );
+ [ w, w , Df ; fmov  ] << aarch64_output_fmov 
(operands[2]);
   }
-  [(set_attr "type" "neon_logic")]
 )
 
 ;; For ORR (vector, register) and ORR (vector, immediate)
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 38c112cc92f..54895e3f456 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -23516,6 +23516,61 @@ aarch64_simd_valid_and_imm (rtx op)
   return aarch64_simd_valid_imm (op, NULL, AARCH64_CHECK_AND);
 }
 
+/* Return true if OP is a valid SIMD and immediate which allows the and be
+   optimized as fmov.  If ELT_SIZE is nonnull, it represents the size of the
+   register for fmov.  */
+bool
+aarch64_simd_valid_and_imm_fmov (rtx op, unsigned int *elt_size)
+{
+  machine_mode mode = GET_MODE (op);
+  gcc_assert (!aarch64_sve_mode_p (mo

[PATCH 1/3] Recognize vector permute patterns which can be interpreted as AND [PR100165]

2025-04-25 Thread Pengxuan Zheng
Certain permutes that blend a vector with zero can be interpreted as an AND
with a mask. This idea was suggested by Richard Sandiford when he was reviewing
my patch which tries to optimize certain vector permutes with the FMOV
instruction for the aarch64 target. Canonicalizing this class of vector
permutes as AND can be more general and potentially benefit more targets.

For example, for the aarch64 target, at present:

v4hi
f_v4hi (v4hi x)
{
  return __builtin_shuffle (x, (v4hi){ 0, 0, 0, 0 }, (v4hi){ 4, 1, 6, 3 });
}

generates:

f_v4hi:
uzp1 v0.2d, v0.2d, v0.2d
adrp x0, .LC0
ldr d31, [x0, #:lo12:.LC0]
tbl v0.8b, {v0.16b}, v31.8b
ret
.LC0:
.byte   -1
.byte   -1
.byte   2
.byte   3
.byte   -1
.byte   -1
.byte   6
.byte   7

With this patch, it generates:

f_v4hi:
mvni v31.2s, 0xff, msl 8
and v0.8b, v0.8b, v31.8b
ret
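
The mapping from a permute selector to an AND mask can be sketched in plain C
for 16-bit lanes. The helper below is hypothetical and only illustrates the
idea; it is not the vec_perm_and_mask function added by this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* SEL indexes the concatenation of the two permute operands: 0..nelt-1
   select from the first operand, nelt..2*nelt-1 from the second.
   ZERO_OP0 is nonzero when the first operand is the zero vector,
   otherwise the second operand is.  Returns 1 and fills MASK when the
   permute is equivalent to an AND of the non-zero operand, else 0. */
static int perm_to_and_mask(const unsigned *sel, size_t nelt, int zero_op0,
                            uint16_t *mask)
{
  for (size_t i = 0; i < nelt; i++) {
    /* Index that copies lane i of the non-zero operand unchanged. */
    size_t keep = zero_op0 ? nelt + i : i;
    /* Does sel[i] pick a lane of the zero operand? */
    int from_zero = zero_op0 ? sel[i] < nelt
                             : (sel[i] >= nelt && sel[i] < 2 * nelt);

    if (sel[i] == keep)
      mask[i] = 0xffff;
    else if (from_zero)
      mask[i] = 0;
    else
      return 0; /* a lane is moved, so this is not a plain AND */
  }
  return 1;
}
```

For the selector { 4, 1, 6, 3 } from the example above, with the second
operand being zero, this gives the mask { 0, 0xffff, 0, 0xffff }, which
matches the value loaded by the mvni instruction.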

However, we do have to xfail a few i386 tests due to the new canonicalization
this patch introduces and PR119922 has been filed to track these regressions.

PR target/100165

gcc/ChangeLog:

* optabs.cc (vec_perm_and_mask): New function.
(expand_vec_perm_const): Add new AND canonicalization.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-pr94680.c: XFAIL.
* gcc.target/i386/avx10_2-vmovd-1.c: Likewise.
* gcc.target/i386/avx10_2-vmovw-1.c: Likewise.
* gcc.target/i386/avx512f-pr94680.c: Likewise.
* gcc.target/i386/avx512fp16-pr94680.c: Likewise.
* gcc.target/i386/sse2-pr94680.c: Likewise.
* gcc.target/aarch64/and-be.c: New test.
* gcc.target/aarch64/and.c: New test.

Signed-off-by: Pengxuan Zheng 
---
 gcc/optabs.cc |  69 +-
 gcc/testsuite/gcc.target/aarch64/and-be.c | 125 ++
 gcc/testsuite/gcc.target/aarch64/and.c| 125 ++
 gcc/testsuite/gcc.target/i386/avx-pr94680.c   |   3 +-
 .../gcc.target/i386/avx10_2-vmovd-1.c |   3 +-
 .../gcc.target/i386/avx10_2-vmovw-1.c |   3 +-
 .../gcc.target/i386/avx512f-pr94680.c |   3 +-
 .../gcc.target/i386/avx512fp16-pr94680.c  |   3 +-
 gcc/testsuite/gcc.target/i386/sse2-pr94680.c  |   3 +-
 9 files changed, 330 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/and-be.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/and.c

diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index 0a14b1eef8a..dca9df42673 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -6384,6 +6384,50 @@ expand_vec_perm_1 (enum insn_code icode, rtx target,
   return NULL_RTX;
 }
 
+/* Check if vec_perm mask SEL is a constant equivalent to an and operation of
+   the non-zero vec_perm operand with some mask consisting of 0xffs and 0x00s,
+   assuming the other vec_perm operand is a constant vector of zeros.  Return
+   the mask for the equivalent and operation, or NULL_RTX if the vec_perm can
+   not be modeled as an and.  MODE is the mode of the value being anded.
+   ZERO_OP0_P is true if the first operand of the vec_perm is a constant vector
+   of zeros or false if the second operand of the vec_perm is a constant vector
+   of zeros.  */
+static rtx
+vec_perm_and_mask (machine_mode mode, const vec_perm_indices &sel,
+  bool zero_op0_p)
+{
+  unsigned int nelt;
+  if (!GET_MODE_NUNITS (mode).is_constant (&nelt))
+return NULL_RTX;
+
+  rtx_vector_builder builder (mode, nelt, 1);
+  machine_mode emode = GET_MODE_INNER (mode);
+
+  for (unsigned int i = 0; i < nelt; i++)
+{
+  if (!zero_op0_p)
+   {
+ if (known_eq (sel[i], i))
+   builder.quick_push (CONSTM1_RTX (emode));
+ else if (known_ge (sel[i], nelt))
+   builder.quick_push (CONST0_RTX (emode));
+ else
+   return NULL_RTX;
+   }
+  else
+   {
+ if (known_eq (sel[i], nelt + i))
+   builder.quick_push (CONSTM1_RTX (emode));
+ else if (known_lt (sel[i], nelt))
+   builder.quick_push (CONST0_RTX (emode));
+ else
+   return NULL_RTX;
+   }
+}
+
+  return builder.build ();
+}
+
 /* Implement a permutation of vectors v0 and v1 using the permutation
vector in SEL and return the result.  Use TARGET to hold the result
if nonnull and convenient.
@@ -6422,12 +6466,18 @@ expand_vec_perm_const (machine_mode mode, rtx v0, rtx v1,
   insn_code shift_code_qi = CODE_FOR_nothing;
   optab shift_optab = unknown_optab;
   rtx v2 = v0;
+  bool zero_op0_p = false;
+  bool zero_op1_p = false;
   if (v1 == CONST0_RTX (GET_MODE (v1)))
-shift_optab = vec_shr_optab;
+{
+  shift_optab = vec_shr_optab;
+  zero_op1_p = true;
+}
   else if (v0 == CONST0_RTX (GET_MODE (v0)))
 {
   shift_optab = vec_shl_optab;
   v2 = v1;
+  zero_op0_p = true;
 }
   if (shift_optab != unknown_optab)
 {
@@ -6463,6 +6513,

RE: [PATCH v3] aarch64: Recognize vector permute patterns suitable for FMOV [PR100165]

2025-04-25 Thread quic_pzheng
> Richard Sandiford  writes:
> > I think this would also simplify the evpc detection, since the
> > requirement for using AND is the same for big-endian and
> > little-endian, namely that index I of the result must either come from
> > index I of the nonzero vector or from any element of the zero vector.
> > (What differs between big-endian and little-endian is which masks
> > correspond to FMOV.)
> 
> Or perhaps more accurately, what differs between big-endian and
> little-endian is the constant that needs to be materialised for a given
> permute mask.  I think the easiest way of handling that would be to
> construct an array of target_units (0xffs for bytes that come from the
> nonzero vector, 0x00s for bytes that come from the zero vector) and then
> get native_encode_rtx to convert that into a vector constant.
> native_encode_rtx will then do the endian correction for us.

Thanks for the great feedback, Richard! I've reworked the patch accordingly.
Please let me know if you have any other comments.

[PATCH 1/3] Recognize vector permute patterns which can be interpreted as AND [PR100165]
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681900.html

[PATCH 2/3] aarch64: Optimize AND with certain vector of immediates as FMOV [PR100165]
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681901.html

[PATCH 3/3] aarch64: Add more vector permute tests for the FMOV optimization [PR100165]
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681902.html

Thanks,
Pengxuan
> 
> Thanks,
> Richard




[PATCH] aarch64: Optimize SVE extract last to Neon lane extract for 128-bit VLS.

2025-04-25 Thread Jennifer Schmitz
For the test case
int32_t foo (svint32_t x)
{
 svbool_t pg = svpfalse ();
 return svlastb_s32 (pg, x);
}
compiled with -O3 -mcpu=grace -msve-vector-bits=128, GCC produced:
foo:
pfalse  p3.b
lastb   w0, p3, z0.s
ret
when it could use a Neon lane extract instead:
foo:
umov    w0, v0.s[3]
ret

We implemented this optimization by guarding the emission of
pfalse+lastb in the pattern vec_extract by
known_gt (BYTES_PER_SVE_VECTOR, 16). Thus, for a last-extract operation
in 128-bit VLS, the pattern *vec_extract_v128 is used instead.
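
To make the lane numbers concrete: with -msve-vector-bits=128 the vector holds
16 bytes, so the last element is lane 16 / sizeof (element) - 1. The sketch
below only models that arithmetic and the size guard for fixed-length vectors;
last_lane_128 and use_lastb_p are hypothetical names for this example, not GCC
functions, and the poly_int semantics of known_gt for length-agnostic code are
not modeled.

```cpp
#include <cstddef>

// Hypothetical model: index of the last lane in a 128-bit (16-byte) vector
// whose elements are ELT_BYTES wide.
constexpr std::size_t
last_lane_128 (std::size_t elt_bytes)
{
  return 16 / elt_bytes - 1;
}

// Hypothetical model of the guard for a fixed vector length: keep the
// pfalse+lastb sequence only when the vector is larger than 16 bytes.
constexpr bool
use_lastb_p (std::size_t fixed_vector_bytes)
{
  return fixed_vector_bytes > 16;
}

// int32_t elements: the last lane is v0.s[3], as in the example above.
static_assert (last_lane_128 (4) == 3, "s32 uses lane 3");
static_assert (!use_lastb_p (16), "128-bit VLS takes the Neon path");
```

The same arithmetic gives lanes 15, 7 and 1 for 8-bit, 16-bit and 64-bit
elements.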

The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz 

gcc/
* config/aarch64/aarch64-sve.md (vec_extract):
Prevent the emission of pfalse+lastb for 128-bit VLS.

gcc/testsuite/
* gcc.target/aarch64/sve/extract_last_128.c: New test.
---
gcc/config/aarch64/aarch64-sve.md |  7 ++--
.../gcc.target/aarch64/sve/extract_last_128.c | 33 +++
2 files changed, 37 insertions(+), 3 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/extract_last_128.c

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 3dbd65986ec..824bd877e47 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -2969,10 +2969,11 @@
  {
poly_int64 val;
if (poly_int_rtx_p (operands[2], &val)
-   && known_eq (val, GET_MODE_NUNITS (mode) - 1))
+   && known_eq (val, GET_MODE_NUNITS (mode) - 1)
+   && known_gt (BYTES_PER_SVE_VECTOR, 16))
  {
-   /* The last element can be extracted with a LASTB and a false
-  predicate.  */
+   /* Extract the last element with a LASTB and a false predicate.
+  Exclude 128-bit VLS to use *vec_extract_v128.  */
rtx sel = aarch64_pfalse_reg (mode);
emit_insn (gen_extract_last_ (operands[0], sel, operands[1]));
DONE;
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/extract_last_128.c b/gcc/testsuite/gcc.target/aarch64/sve/extract_last_128.c
new file mode 100644
index 000..71d3561ec60
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/extract_last_128.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -msve-vector-bits=128" } */
+
+#include <arm_sve.h>
+
+#define TEST(TYPE, TY) \
+  TYPE exract_last_##TY (sv##TYPE x)   \
+  {\
+svbool_t pg = svpfalse (); \
+return svlastb_##TY (pg, x);   \
+  }
+
+TEST(bfloat16_t, bf16)
+TEST(float16_t, f16)
+TEST(float32_t, f32)
+TEST(float64_t, f64)
+TEST(int8_t, s8)
+TEST(int16_t, s16)
+TEST(int32_t, s32)
+TEST(int64_t, s64)
+TEST(uint8_t, u8)
+TEST(uint16_t, u16)
+TEST(uint32_t, u32)
+TEST(uint64_t, u64)
+
+/* { dg-final { scan-assembler-times {\tdup\th0, v0\.h\[7\]} 2 } } */
+/* { dg-final { scan-assembler-times {\tdup\ts0, v0\.s\[3\]} 1 } } */
+/* { dg-final { scan-assembler-times {\tdup\td0, v0\.d\[1\]} 1 } } */
+/* { dg-final { scan-assembler-times {\tumov\tw0, v0\.h\[7\]} 2 } } */
+/* { dg-final { scan-assembler-times {\tumov\tw0, v0\.b\[15\]} 2 } } */
+/* { dg-final { scan-assembler-times {\tumov\tw0, v0\.s\[3\]} 2 } } */
+/* { dg-final { scan-assembler-times {\tumov\tx0, v0\.d\[1\]} 2 } } */
+/* { dg-final { scan-assembler-not "lastb" } } */
\ No newline at end of file
-- 
2.34.1





[committed] libstdc++: Replace leftover std::queue with Adaptor in ranges/adaptors.cc tests.

2025-04-25 Thread Tomasz Kamiński
This was a leftover from the work-in-progress state, where only std::queue was
tested.

libstdc++-v3/ChangeLog:

* testsuite/std/format/ranges/adaptors.cc: Updated test.
---
Tested on x86_64-linux.

 libstdc++-v3/testsuite/std/format/ranges/adaptors.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/std/format/ranges/adaptors.cc b/libstdc++-v3/testsuite/std/format/ranges/adaptors.cc
index 854c7eef5bd..daa73aa39bf 100644
--- a/libstdc++-v3/testsuite/std/format/ranges/adaptors.cc
+++ b/libstdc++-v3/testsuite/std/format/ranges/adaptors.cc
@@ -88,7 +88,7 @@ test_output()
   VERIFY( res == WIDEN("==[0x03, 0x02, 0x01]===") );
 
   // Sequence output is always used
-  std::queue<_CharT, std::basic_string<_CharT>> qs(
+  Adaptor<_CharT, std::basic_string<_CharT>> qs(
 std::from_range,
 std::basic_string_view<_CharT>(WIDEN("321")));
 
-- 
2.49.0



[PATCH] c++/modules: Ensure DECL_FRIEND_CONTEXT is streamed [PR119939]

2025-04-25 Thread Nathaniel Shead
Tested so far on x86_64-pc-linux-gnu (just modules.exp), OK for trunk/15
if full bootstrap+regtest succeeds?

A potentially safer approach that would slightly bloat out the size of
the built modules would be to always stream this variable rather than
having any conditions, but from what I can tell this change should be
sufficient; happy to go that way if you prefer though.

-- >8 --

An instantiated friend function relies on DECL_FRIEND_CONTEXT being set
to be able to recover the template arguments of the class that
instantiated it, despite not being a template itself.  This patch
ensures that this data is streamed even when DECL_CLASS_SCOPE_P is not
true.
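
As a reminder of the language feature involved (a minimal standalone sketch,
unrelated to the testcase in this patch and not using modules): a friend
function defined inside a class template is not itself a template, yet each
instantiation of the class produces its own function whose meaning depends on
the class's template arguments. Box and same_value are names made up for the
example.

```cpp
template <typename T>
struct Box
{
  T value;

  // Not a function template, but Box<int> and Box<long> each get their own
  // same_value, found only through argument-dependent lookup.
  friend bool
  same_value (const Box &a, const Box &b)
  {
    return a.value == b.value;
  }
};
```

Because same_value is declared at namespace scope rather than class scope,
the enclosing class has to be recorded separately for the instantiated
function to be tied back to Box<int>.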

PR c++/119939

gcc/cp/ChangeLog:

* module.cc (trees_out::lang_decl_vals): Also stream
lang->u.fn.context when DECL_UNIQUE_FRIEND_P.
(trees_in::lang_decl_vals): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/modules/concept-11_a.H: New test.
* g++.dg/modules/concept-11_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc| 4 ++--
 gcc/testsuite/g++.dg/modules/concept-11_a.H | 9 +
 gcc/testsuite/g++.dg/modules/concept-11_b.C | 9 +
 3 files changed, 20 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/concept-11_a.H
 create mode 100644 gcc/testsuite/g++.dg/modules/concept-11_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 5ff5c462e79..a2e0d6d2571 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -7386,7 +7386,7 @@ trees_out::lang_decl_vals (tree t)
WU (lang->u.fn.ovl_op_code);
}
 
-  if (DECL_CLASS_SCOPE_P (t))
+  if (DECL_CLASS_SCOPE_P (t) || DECL_UNIQUE_FRIEND_P (t))
WT (lang->u.fn.context);
 
   if (lang->u.fn.thunk_p)
@@ -7470,7 +7470,7 @@ trees_in::lang_decl_vals (tree t)
lang->u.fn.ovl_op_code = code;
}
 
-  if (DECL_CLASS_SCOPE_P (t))
+  if (DECL_CLASS_SCOPE_P (t) || DECL_UNIQUE_FRIEND_P (t))
RT (lang->u.fn.context);
 
   if (lang->u.fn.thunk_p)
diff --git a/gcc/testsuite/g++.dg/modules/concept-11_a.H b/gcc/testsuite/g++.dg/modules/concept-11_a.H
new file mode 100644
index 000..45127682812
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/concept-11_a.H
@@ -0,0 +1,9 @@
+// PR c++/119939
+// { dg-additional-options "-fmodule-header -std=c++20" }
+// { dg-module-cmi {} }
+
+template  concept A = true;
+template  concept B = requires { T{}; };
+template  struct S {
+  friend bool operator==(const S&, const S&) requires B = default;
+};
diff --git a/gcc/testsuite/g++.dg/modules/concept-11_b.C b/gcc/testsuite/g++.dg/modules/concept-11_b.C
new file mode 100644
index 000..3f6676ff965
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/concept-11_b.C
@@ -0,0 +1,9 @@
+// PR c++/119939
+// { dg-additional-options "-fmodules -std=c++20" }
+
+import "concept-11_a.H";
+
+int main() {
+  S s;
+  s == s;
+}
-- 
2.47.0



[wwwdocs] gcc-15: Add changes for Rust frontend

2025-04-25 Thread arthur . cohen
From: Arthur Cohen 

Content was validated using the Nu HTML checker per the contributing doc. 

---
 htdocs/gcc-15/changes.html | 57 ++
 1 file changed, 57 insertions(+)

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index 3e3c6655..10b1ce58 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -702,6 +702,63 @@ asm (".text; %cc0: mov %cc2, %%r0; .previous;"
   
 
 
+Rust
+
+  
+  Basic inline assembly support has been added to the frontend, which enables us to compile
+the architecture specific functions of core 1.49.
+  
+  
+  Support for for-loops has been added.
+  
+  
+  Fixes to our automatic dereferencing algorithm for Deref and
+DerefMut. This makes gccrs more correct and allows it to handle
+complicated cases where the type-checker would previously fail.
+  
+  
+  Fixes to our indexing and iterator traits handling, which was required for
+for-loops to be properly implemented.
+  
+  
+  Our parser is now fully implemented and fully capable of parsing the entirety of
+core, alloc and std. It was still lacking in some
+areas, especially around unstable features like specialization.
+  
+  
+  Support for the question-mark operator has been added. This enables gccrs to
+handle all the error handling code and machinery often used in real world Rust programs, as
+well as in core.
+  
+  
+  Fixes to our macro expansion pass which now correctly expands all of core 1.49.
+This also includes fixes to our format_args!() handling code, which received
+numerous improvements.
+  
+  
+  Support for let-else has been added. While this is not used in core
+1.49, it is used in the Rust-for-Linux project, our next major objective for
+gccrs.
+  
+  
+  Support for the unstable specialization feature has been added. This is
+required for compiling core 1.49 correctly, in which specialization is used to
+improve the runtime performance of Rust binaries.
+  
+  
+  Support for more lang-items has been added
+  
+  
+  Lowered minimum required Rust version to 1.49. This allows more systems to compile the Rust
+frontend, and also brings us closer to gccrs compiling its own dependencies down
+the line.
+  
+  
+  Rewrite of our name resolution algorithm to properly handle the complex import/export
+structure used in core 1.49
+  
+
+
 
 New Targets and Target Specific Improvements
 
-- 
2.49.0



Re: [PATCH,LRA] Do inheritance transformations for any optimization [PR118591]

2025-04-25 Thread Vladimir Makarov



On 4/19/25 3:29 PM, Denis Chertykov wrote:

Bugfix for PR118591

This bug occurs only with '-Os' option.

The function 'inherit_reload_reg ()' has a wrong condition:

static bool
inherit_reload_reg (bool def_p, int original_regno,
enum reg_class cl, rtx_insn *insn, rtx next_usage_insns)
{
   if (optimize_function_for_size_p (cfun))
--
 return false;

It's wrong because we need inheritance and we need to undo it after an
unsuccessful pass.


I applied the following patch:

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 7dbc7fe1e00..af2d2793159 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -5884,7 +5884,11 @@ inherit_reload_reg (bool def_p, int original_regno,
enum reg_class cl, rtx_insn *insn, rtx next_usage_insns)
  {
if (optimize_function_for_size_p (cfun))
-return false;
+{
+  if (lra_dump_file != NULL)
+   fprintf (lra_dump_file,
+"<< inheritance for -Os <\n");
+}


Debug output from patched gcc:
--- Fragment from t.c.323r.reload --
** Inheritance #1: **

EBB 2
EBB 5
EBB 3
 << inheritance for -Os <
 <<
 Use smallest class of LD_REGS and GENERAL_REGS
   Creating newreg=59 from oldreg=43, assigning class LD_REGS to inheritance r59
 Original reg change 43->59 (bb3):
58: r59:SI=r57:SI
 Add original<-inheritance after:
60: r43:SI=r59:SI

 Inheritance reuse change 43->59 (bb3):
59: r58:SI=r59:SI
  
 << inheritance for -Os <
 <<
 Use smallest class of ALL_REGS and GENERAL_REGS
   Creating newreg=60 from oldreg=43, assigning class ALL_REGS to inheritance r60
 Original reg change 43->60 (bb3):
56: r56:QI=r60:SI#0
 Add inheritance<-original before:
61: r60:SI=r43:SI

 Inheritance reuse change 43->60 (bb3):
57: r57:SI=r60:SI
  
 << inheritance for -Os <
 <<
 Use smallest class of ALL_REGS and GENERAL_REGS
   Creating newreg=61 from oldreg=43, assigning class ALL_REGS to inheritance r61
 Original reg change 43->61 (bb3):
55: r55:QI=r61:SI#1
 Add inheritance<-original before:
62: r61:SI=r43:SI

 Inheritance reuse change 43->61 (bb3):
61: r60:SI=r61:SI
  
 << inheritance for -Os <
 <<
 Use smallest class of ALL_REGS and GENERAL_REGS
   Creating newreg=62 from oldreg=43, assigning class ALL_REGS to inheritance r62
 Original reg change 43->62 (bb3):
54: r54:QI=r62:SI#2
 Add inheritance<-original before:
63: r62:SI=r43:SI

 Inheritance reuse change 43->62 (bb3):
62: r61:SI=r62:SI
  
 << inheritance for -Os <
 <<
 Use smallest class of ALL_REGS and GENERAL_REGS
   Creating newreg=63 from oldreg=43, assigning class ALL_REGS to inheritance r63
 Original reg change 43->63 (bb3):
53: r53:QI=r63:SI#3
 Add inheritance<-original before:
64: r63:SI=r43:SI

 Inheritance reuse change 43->63 (bb3):
63: r62:SI=r63:SI
  
EBB 4

** Pseudo live ranges #1: **

   BB 4
Insn 43: point = 0, n_alt = -1


[...]

   Assign 24 to reload r56 (freq=2000)
   Reassigning non-reload pseudos
   Assign 24 to r43 (freq=3000)

** Undoing inheritance #1: **

Inherit 5 out of 5 (100.00%)

** Local #2: **

[...]



So, we need 'Inheritance' and we need 'Undoing inheritance'.

It is difficult for me to understand AVR code but I think the reason for 
the bug is in something else.  And the fix should be different.


Inheritance can increase the code size. In fact the code was added for 
PR59535 to solve code size regression for ARM Thumb.


Sorry, I can not approve the patch.


The patch:

PR rtl-optimization/118591
gcc/
* lra-constraints.cc (inherit_reload_reg): Do inheritance for any
optimization.


diff --git a/gcc/lra-constraints.cc b/gcc/lr

[PATCH] AArch64: Optimize SVE loads/stores with ptrue predicates to unpredicated instructions.

2025-04-25 Thread Jennifer Schmitz
SVE loads and stores where the predicate is all-true can be optimized to
unpredicated instructions. For example,
svuint8_t foo (uint8_t *x)
{
  return svld1 (svptrue_b8 (), x);
}
was compiled to:
foo:
ptrue   p3.b, all
ld1bz0.b, p3/z, [x0]
ret
but can be compiled to:
foo:
ldr z0, [x0]
ret

Late_combine2 had already been trying to do this, but was missing the
instruction:
(set (reg/i:VNx16QI 32 v0)
(unspec:VNx16QI [
(const_vector:VNx16BI repeat [
(const_int 1 [0x1])
])
(mem:VNx16QI (reg/f:DI 0 x0 [orig:106 x ] [106])
  [0 MEM  [(unsigned char *)x_2(D)]+0 S[16, 16] A8])
] UNSPEC_PRED_X))

This patch adds a new define_insn_and_split that matches the missing
instruction and splits it to an unpredicated load/store. Because LDR
offers fewer addressing modes than LD1[BHWD], the pattern is
guarded under reload_completed to only apply the transform once the
address modes have been chosen during RA.

The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz 

gcc/
* config/aarch64/aarch64-sve.md (*aarch64_sve_ptrue_ldr_str):
Add define_insn_and_split to fold predicated SVE loads/stores with
ptrue predicates to unpredicated instructions.

gcc/testsuite/
* gcc.target/aarch64/sve/ptrue_ldr_str.c: New test.
* gcc.target/aarch64/sve/cost_model_14.c: Adjust expected outcome.
* gcc.target/aarch64/sve/cost_model_4.c: Adjust expected outcome.
* gcc.target/aarch64/sve/cost_model_5.c: Adjust expected outcome.
* gcc.target/aarch64/sve/cost_model_6.c: Adjust expected outcome.
* gcc.target/aarch64/sve/cost_model_7.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_f16.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_f32.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_f64.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_mf8.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_s16.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_s32.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_s64.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_s8.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_u16.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_u32.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_u64.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/varargs_2_u8.c: Adjust expected outcome.
* gcc.target/aarch64/sve/peel_ind_2.c: Adjust expected outcome.
* gcc.target/aarch64/sve/single_1.c: Adjust expected outcome.
* gcc.target/aarch64/sve/single_2.c: Adjust expected outcome.
* gcc.target/aarch64/sve/single_3.c: Adjust expected outcome.
* gcc.target/aarch64/sve/single_4.c: Adjust expected outcome.
---
 gcc/config/aarch64/aarch64-sve.md | 17 
 .../aarch64/sve/acle/general/attributes_6.c   |  8 +-
 .../gcc.target/aarch64/sve/cost_model_14.c|  4 +-
 .../gcc.target/aarch64/sve/cost_model_4.c |  3 +-
 .../gcc.target/aarch64/sve/cost_model_5.c |  3 +-
 .../gcc.target/aarch64/sve/cost_model_6.c |  3 +-
 .../gcc.target/aarch64/sve/cost_model_7.c |  3 +-
 .../aarch64/sve/pcs/varargs_2_f16.c   | 93 +--
 .../aarch64/sve/pcs/varargs_2_f32.c   | 93 +--
 .../aarch64/sve/pcs/varargs_2_f64.c   | 93 +--
 .../aarch64/sve/pcs/varargs_2_mf8.c   | 32 +++
 .../aarch64/sve/pcs/varargs_2_s16.c   | 93 +--
 .../aarch64/sve/pcs/varargs_2_s32.c   | 93 +--
 .../aarch64/sve/pcs/varargs_2_s64.c   | 93 +--
 .../gcc.target/aarch64/sve/pcs/varargs_2_s8.c | 34 +++
 .../aarch64/sve/pcs/varargs_2_u16.c   | 93 +--
 .../aarch64/sve/pcs/varargs_2_u32.c   | 93 +--
 .../aarch64/sve/pcs/varargs_2_u64.c   | 93 +--
 .../gcc.target/aarch64/sve/pcs/varargs_2_u8.c | 32 +++
 .../gcc.target/aarch64/sve/peel_ind_2.c   |  4 +-
 .../gcc.target/aarch64/sve/ptrue_ldr_str.c| 31 +++
 .../gcc.target/aarch64/sve/single_1.c | 11 ++-
 .../gcc.target/aarch64/sve/single_2.c | 11 ++-
 .../gcc.target/aarch64/sve/single_3.c | 11 ++-
 .../gcc.target/aarch64/sve/single_4.c | 11 ++-
 25 files changed, 907 insertions(+), 148 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/ptrue_ldr_str.c

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index d4af3706294..03b7194d200 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.

Re: [PATCH RFC] c++: bad pending_template recursion

2025-04-25 Thread Patrick Palka
On Fri, 18 Apr 2025, Jason Merrill wrote:

> limit_bad_template_recursion currently avoids immediate instantiation of
> templates from uses in an already ill-formed instantiation, but we still can
> get unnecessary recursive instantiation in pending_templates if the
> instantiation was queued before the error.
> 
> Currently this regresses several libstdc++ tests which seem to rely on a
> static_assert in a function called from another that is separately ill-formed.
> For instance, in the 48101_neg.cc tests, we first get an error in find(), then
> later instantiate _S_key() (called from find) and get the static_assert error
> from there.
> 
> Thoughts?  Is this a desirable change, or is the fact that the use precedes
> the error reason to go ahead with the instantiation?

Not saying we're at this point yet, but I do worry that being too
aggressive about avoiding error cascades could make the compiler less
noisy at the expense of making it less predictable, and make it more
likely that multiple edit+recompile cycles are needed before a TU is
error-free.

In recurse5.C the emitted error (in checked_add) and the suppressed
error (in add) are caused by the same bug so the suppression is clearly
an improvement, but if they're different bugs then the user will first
see only the add error, which they will fix and rightfully hope the TU
is now error-free, but recompilation will then reveal the suppressed
error in checked_add and the user might wonder why this error
wasn't reported during the first compile.

When it comes to avoiding too many errors from deferred instantiations
in particular, -fmax-errors works nicely for that since such errors are
naturally the last to be emitted.
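
For reference, the bookkeeping in the quoted patch below (an error-count
snapshot taken on push, a had_errors flag set on pop) can be modeled roughly
as follows. This is a simplified sketch under my own assumptions: level and
instantiation_stack are made-up names, and the real tinst_level objects are
refcounted and outlive the pop, which this model does not attempt to show.

```cpp
#include <vector>

// Hypothetical stand-in for the two bit-fields added to tinst_level.
struct level
{
  unsigned short errors : 15; // error count when the level was pushed
  bool had_errors : 1;        // set on pop if the count has grown since
};

struct instantiation_stack
{
  unsigned error_count = 0;   // stands in for errorcount + sorrycount
  std::vector<level> levels;

  void push ()
  {
    level l{};
    l.errors = error_count & 0x7fff; // only the low 15 bits are kept
    l.had_errors = false;
    levels.push_back (l);
  }

  level pop ()
  {
    level l = levels.back ();
    levels.pop_back ();
    if (error_count > l.errors)
      l.had_errors = true;
    return l;
  }
};
```

In this model a level popped after a new error has had_errors set, one popped
with an unchanged count does not, and once the count exceeds 15 bits every
level looks erroneous, which is the overflow behaviour the patch comment
accepts.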

> 
> > FAIL: 23_containers/map/48101_neg.cc  -std=gnu++17  (test for errors, line )
> > FAIL: 23_containers/multimap/48101_neg.cc  -std=gnu++17  (test for errors, line )
> > FAIL: 23_containers/multiset/48101_neg.cc  -std=gnu++17  (test for errors, line )
> > FAIL: 23_containers/set/48101_neg.cc  -std=gnu++17  (test for errors, line )
> > FAIL: 30_threads/packaged_task/cons/dangling_ref.cc  -std=gnu++17  (test for errors, line )
> > FAIL: 30_threads/packaged_task/cons/lwg4154_neg.cc  -std=gnu++17  (test for errors, line )
> 
> 
> gcc/cp/ChangeLog:
> 
>   * cp-tree.h (struct tinst_level): Add had_errors bit.
>   * pt.cc (push_tinst_level_loc): Clear it.
>   (pop_tinst_level): Set it.
>   (reopen_tinst_level): Check it.
>   (instantiate_pending_templates): Call limit_bad_template_recursion.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/template/recurse5.C: New test.
> ---
>  gcc/cp/cp-tree.h | 10 --
>  gcc/cp/pt.cc | 10 --
>  gcc/testsuite/g++.dg/template/recurse5.C | 17 +
>  3 files changed, 33 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/template/recurse5.C
> 
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index 7798efba3db..856202c65dd 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -6755,8 +6755,14 @@ struct GTY((chain_next ("%h.next"))) tinst_level {
>/* The location where the template is instantiated.  */
>location_t locus;
>  
> -  /* errorcount + sorrycount when we pushed this level.  */
> -  unsigned short errors;
> +  /* errorcount + sorrycount when we pushed this level.  If the value
> + overflows, it will always seem like we currently have more errors, so we
> + will limit template recursion even from non-erroneous templates.  In a TU
> + with over 32k errors, that's fine.  */
> +  unsigned short errors : 15;
> +
> +  /* set in pop_tinst_level if there have been errors since we pushed.  */
> +  bool had_errors : 1;
>  
>/* Count references to this object.  If refcount reaches
>   refcount_infinity value, we don't increment or decrement the
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index a71705fd085..e8d342f99f6 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -11418,6 +11418,7 @@ push_tinst_level_loc (tree tldcl, tree targs, location_t loc)
>new_level->targs = targs;
>new_level->locus = loc;
>new_level->errors = errorcount + sorrycount;
> +  new_level->had_errors = false;
>new_level->next = NULL;
>new_level->refcount = 0;
>new_level->path = new_level->visible = nullptr;
> @@ -11468,6 +11469,9 @@ pop_tinst_level (void)
>/* Restore the filename and line number stashed away when we started
>   this instantiation.  */
>input_location = current_tinst_level->locus;
> +  if (unsigned errs = errorcount + sorrycount)
> +if (errs > current_tinst_level->errors)
> +  current_tinst_level->had_errors = true;
>set_refcount_ptr (current_tinst_level, current_tinst_level->next);
>--tinst_depth;
>  }
> @@ -11487,7 +11491,7 @@ reopen_tinst_level (struct tinst_level *level)
>  
>set_refcount_ptr (current_tinst_level, level);
>pop_tinst_level ();
> -  if (current_t

New French PO file for 'gcc' (version 15.1.0)

2025-04-25 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the French team of translators.  The file is available at:

https://translationproject.org/latest/gcc/fr.po

(This file, 'gcc-15.1.0.fr.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH] simplify-rtx: Combine bitwise operations in more cases

2025-04-25 Thread Richard Sandiford
Pengfei Li  writes:
> This patch transforms RTL expressions of the form (subreg (not X) off)
> into (not (subreg X off)) when the subreg is an operand of a bitwise AND
> or OR. This transformation can expose opportunities to combine a NOT
> operation with the bitwise AND/OR.
>
> For example, it improves the codegen of the following AArch64 NEON
> intrinsics:
>   vandq_s64(vreinterpretq_s64_s32(vmvnq_s32(a)),
> vreinterpretq_s64_s32(b));
> from:
>   not v0.16b, v0.16b
>   and v0.16b, v0.16b, v1.16b
> to:
>   bic v0.16b, v1.16b, v0.16b
>
> Regression tested on x86_64-linux-gnu, arm-linux-gnueabihf and
> aarch64-linux-gnu.
>
> gcc/ChangeLog:
>
>   * simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
> Add RTX simplification for bitwise AND/OR.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/simd/bic_orn_1.c: New test.

Thanks for the patch.  I agree we should have this simplification,
but some comments below.

> ---
>  gcc/simplify-rtx.cc   | 24 +++
>  .../gcc.target/aarch64/simd/bic_orn_1.c   | 17 +
>  2 files changed, 41 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/bic_orn_1.c
>
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index 88d31a71c05..ed620ef5d45 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -3738,6 +3738,18 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
> && rtx_equal_p (XEXP (XEXP (op0, 0), 0), op1))
>   return simplify_gen_binary (IOR, mode, XEXP (op0, 1), op1);
>  
> +  /* Convert (ior (subreg (not X) off) Y) into (ior (not (subreg X off)) Y)
> +  to expose opportunities to combine IOR and NOT.  */
> +  if (GET_CODE (op0) == SUBREG
> +   && GET_CODE (SUBREG_REG (op0)) == NOT)
> + {
> +   rtx new_subreg = gen_rtx_SUBREG (mode,
> +XEXP (SUBREG_REG (op0), 0),
> +SUBREG_BYTE (op0));
> +   rtx new_not = simplify_gen_unary (NOT, mode, new_subreg, mode);
> +   return simplify_gen_binary (IOR, mode, new_not, op1);
> + }
> +
>tem = simplify_byte_swapping_operation (code, mode, op0, op1);
>if (tem)
>   return tem;
> @@ -4274,6 +4286,18 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
>   return simplify_gen_binary (LSHIFTRT, mode, XEXP (op0, 0), XEXP (op0, 1));
>   }
>  
> +  /* Convert (and (subreg (not X) off) Y) into (and (not (subreg X off)) Y)
> +  to expose opportunities to combine AND and NOT.  */
> +  if (GET_CODE (op0) == SUBREG

I think we should also check !paradoxical_subreg_p (op0).  There are
two reasons:

(1) (and (subreg (something-narrower)) (const_int mask)) is a common
zero-extension idiom.  Pushing the subreg down into the first
operand would prevent that.

(2) Applying the rule in the paradoxical case would compute the NOT
in a wider mode, which might be more expensive.

> +   && GET_CODE (SUBREG_REG (op0)) == NOT)
> + {
> +   rtx new_subreg = gen_rtx_SUBREG (mode,
> +XEXP (SUBREG_REG (op0), 0),
> +SUBREG_BYTE (op0));

This should use simplify_gen_subreg instead of gen_rtx_SUBREG.

> +   rtx new_not = simplify_gen_unary (NOT, mode, new_subreg, mode);
> +   return simplify_gen_binary (AND, mode, new_not, op1);
> + }
> +

The NOT might be on op1 rather than op0 in cases where the other
operand is a nested AND, so I think we want to handle both op0 and op1
in the same way.

We might as well do the same thing for XOR at the same time.
It also seems worth splitting the code out into a subroutine,
to avoid cut-&-paste and accidental divergence.

Thanks,
Richard

>tem = simplify_byte_swapping_operation (code, mode, op0, op1);
>if (tem)
>   return tem;
> diff --git a/gcc/testsuite/gcc.target/aarch64/simd/bic_orn_1.c 
> b/gcc/testsuite/gcc.target/aarch64/simd/bic_orn_1.c
> new file mode 100644
> index 000..1c66f21424e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/simd/bic_orn_1.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#include 
> +
> +int64x2_t bic_16b (int32x4_t a, int32x4_t b) {
> +  return vandq_s64 (vreinterpretq_s64_s32 (vmvnq_s32 (a)),
> + vreinterpretq_s64_s32 (b));
> +}
> +
> +int16x4_t orn_8b (int32x2_t a, int32x2_t b) {
> +  return vorr_s16 (vreinterpret_s16_s32 (a),
> +vreinterpret_s16_s32 (vmvn_s32 (b)));
> +}
> +
> +/* { dg-final { scan-assembler {\tbic\tv[0-9]+\.16b} } } */
> +/* { dg-final { scan-assembler {\torn\tv[0-9]+\.8b} } } */
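
A side note on why the rewrite under review is sound for non-paradoxical subregs: bitwise NOT acts on each bit independently, so taking the low part before or after the NOT yields the same bits. The following is a minimal plain-C model of that identity, not GCC code. The function names are hypothetical, and a uint32_t to uint16_t truncation stands in for a lowpart subreg.

```c
#include <stdint.h>

/* Model of (subreg:HI (not:SI x) 0): NOT in the wide mode, then keep
   the low 16 bits.  Illustrative only; not part of GCC.  */
static uint16_t
not_then_lowpart (uint32_t x)
{
  return (uint16_t) ~x;
}

/* Model of (not:HI (subreg:HI x 0)): keep the low 16 bits, then NOT in
   the narrow mode.  */
static uint16_t
lowpart_then_not (uint32_t x)
{
  uint16_t low = (uint16_t) x;
  return (uint16_t) ~low;
}

/* Model of the AND case: returns nonzero when both orderings of the
   NOT and the lowpart give the same result once combined with y.  */
static int
and_forms_agree (uint32_t x, uint16_t y)
{
  return (uint16_t) (not_then_lowpart (x) & y)
         == (uint16_t) (lowpart_then_not (x) & y);
}
```

The model covers only the narrowing case; the paradoxical case is excluded for the two reasons given in the review.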


[PATCH] libstdc++: Add some makefile dependencies

2025-04-25 Thread Jonathan Wakely
This ensures that wstring-inst.o and similar files will be rebuilt when
string-inst.cc changes.

libstdc++-v3/ChangeLog:

* src/c++11/Makefile.am: Add prerequisites for targets that
depend on string-inst.cc.
* src/c++11/Makefile.in: Regenerate.
---

We could make a lot of other changes like this, but for now I'm only
bothered by the one that affected me recently. We could also make .o
files depend on the headers they use, but that's a lot more work.

I've been using this in my local tree for a couple of weeks without
issues.

 libstdc++-v3/src/c++11/Makefile.am | 8 
 libstdc++-v3/src/c++11/Makefile.in | 8 
 2 files changed, 16 insertions(+)

diff --git a/libstdc++-v3/src/c++11/Makefile.am 
b/libstdc++-v3/src/c++11/Makefile.am
index b39115832e2..26d6fa0e01a 100644
--- a/libstdc++-v3/src/c++11/Makefile.am
+++ b/libstdc++-v3/src/c++11/Makefile.am
@@ -168,7 +168,15 @@ localename.lo: localename.cc
 localename.o: localename.cc
$(CXXCOMPILE) -fchar8_t -c $<
 
+wstring-inst.lo: wstring-inst.cc string-inst.cc
+wstring-inst.o: wstring-inst.cc string-inst.cc
+
 if ENABLE_DUAL_ABI
+cow-string-inst.lo: cow-string-inst.cc string-inst.cc
+cow-string-inst.o: cow-string-inst.cc string-inst.cc
+cow-wstring-inst.lo: cow-wstring-inst.cc string-inst.cc
+cow-wstring-inst.o: cow-wstring-inst.cc string-inst.cc
+
 # Rewrite the type info for __ios_failure.
 rewrite_ios_failure_typeinfo = sed -e 
'/^_*_ZTISt13__ios_failure:/,/_ZTVN10__cxxabiv120__si_class_type_infoE/s/_ZTVN10__cxxabiv120__si_class_type_infoE/_ZTVSt19__iosfail_type_info/'
 
diff --git a/libstdc++-v3/src/c++11/Makefile.in 
b/libstdc++-v3/src/c++11/Makefile.in
index 770e948a98a..dafdb260ec1 100644
--- a/libstdc++-v3/src/c++11/Makefile.in
+++ b/libstdc++-v3/src/c++11/Makefile.in
@@ -896,6 +896,14 @@ localename.lo: localename.cc
 localename.o: localename.cc
$(CXXCOMPILE) -fchar8_t -c $<
 
+wstring-inst.lo: wstring-inst.cc string-inst.cc
+wstring-inst.o: wstring-inst.cc string-inst.cc
+
+@ENABLE_DUAL_ABI_TRUE@cow-string-inst.lo: cow-string-inst.cc string-inst.cc
+@ENABLE_DUAL_ABI_TRUE@cow-string-inst.o: cow-string-inst.cc string-inst.cc
+@ENABLE_DUAL_ABI_TRUE@cow-wstring-inst.lo: cow-wstring-inst.cc string-inst.cc
+@ENABLE_DUAL_ABI_TRUE@cow-wstring-inst.o: cow-wstring-inst.cc string-inst.cc
+
 @ENABLE_DUAL_ABI_TRUE@cxx11-ios_failure-lt.s: cxx11-ios_failure.cc
 @ENABLE_DUAL_ABI_TRUE@ $(LTCXXCOMPILE) -gno-as-loc-support -S $< -o 
tmp-cxx11-ios_failure-lt.s
 @ENABLE_DUAL_ABI_TRUE@ -test -f tmp-cxx11-ios_failure-lt.o && mv -f 
tmp-cxx11-ios_failure-lt.o tmp-cxx11-ios_failure-lt.s
-- 
2.49.0



Re: [PATCH] OpenMP, GCN: Add interop-hsa testcase

2025-04-25 Thread Tobias Burnus

Andrew Stubbs wrote:

This testcase ensures that the interop HSA support is sufficient to run
a kernel manually on the same device.  It reuses an OpenMP kernel in
order to avoid all the complication of compiling a custom kernel in
Dejagnu (although, this does mean matching the OpenMP runtime
environment, which might be a maintenance issue in future).

OK for mainline and 15?

LGTM — thanks.

Tobias

PS: I think we are still officially in GCC 15 release freeze; but since 
GCC 15.1 has now been released, the branch should open up very soon.




[PATCH] s390: Allow 5+ argument tail-calls in some -m31 -mzarch special cases [PR119873]

2025-04-25 Thread Jakub Jelinek
Hi!

Here is a patch to handle the PARALLEL case too.
I think we can just use rtx_equal_p there, because it will always use
SImode in the EXPR_LIST REGs in that case.

Bootstrapped/regtested on s390x-linux, ok for trunk and 15.2 (with
CALL_EXPR_MUST_TAIL_CALL (call_expr) && added in that case)?

2025-04-25  Jakub Jelinek  

PR target/119873
* config/s390/s390.cc (s390_call_saved_register_used): Don't return
true if default definition of PARM_DECL SSA_NAME of the same register
is passed in call saved register in the PARALLEL case either.

* gcc.target/s390/pr119873-5.c: New test.

--- gcc/config/s390/s390.cc.jj  2025-04-24 20:04:23.252274117 +0200
+++ gcc/config/s390/s390.cc 2025-04-24 20:18:44.832835186 +0200
@@ -14524,7 +14524,17 @@ s390_call_saved_register_used (tree call
  gcc_assert (REG_NREGS (r) == 1);
 
  if (!call_used_or_fixed_reg_p (REGNO (r)))
-   return true;
+   {
+ rtx parm;
+ if (TREE_CODE (parameter) == SSA_NAME
+ && SSA_NAME_IS_DEFAULT_DEF (parameter)
+ && SSA_NAME_VAR (parameter)
+ && TREE_CODE (SSA_NAME_VAR (parameter)) == PARM_DECL
+ && (parm = DECL_INCOMING_RTL (SSA_NAME_VAR (parameter)))
+ && rtx_equal_p (parm_rtx, parm))
+   break;
+ return true;
+   }
}
}
 }
--- gcc/testsuite/gcc.target/s390/pr119873-5.c.jj   2025-04-24 
20:23:36.469962609 +0200
+++ gcc/testsuite/gcc.target/s390/pr119873-5.c  2025-04-24 20:24:03.078609253 
+0200
@@ -0,0 +1,11 @@
+/* PR target/119873 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -m31 -mzarch" } */
+
+extern void foo (int x, int y, int z, long long w, int v);
+
+void
+bar (int x, int y, int z, long long w, int v)
+{
+  [[gnu::musttail]] return foo (x, y, z, w, v);
+}

Jakub



Re: [PATCH v2] Document AArch64 changes for GCC 15

2025-04-25 Thread Kyrylo Tkachov


> On 25 Apr 2025, at 12:06, Richard Sandiford  wrote:
> 
> Kyrylo Tkachov  writes:
>> Hi Richard,
>> 
>>> On 23 Apr 2025, at 13:47, Richard Sandiford  
>>> wrote:
>>> 
>>> Thanks for all the feedback.  I've tried to address it in the version
>>> below.  I'll push later today if there are no further comments.
>>> 
>>> Richard
>>> 
>>> 
>>> The list is structured as:
>>> 
>>> - new configurations
>>> - command-line changes
>>> - ACLE changes
>>> - everything else
>>> 
>>> As usual, the list of new architectures, CPUs, and features is from a
>>> purely mechanical trawl of the associated .def files.  I've identified
>>> features by their architectural name to try to improve searchability.
>>> Similarly, the list of ACLE changes includes the associated ACLE
>>> feature macros, again to try to improve searchability.
>>> 
>>> The list summarises some of the target-specific optimisations because
>>> it sounded like Tamar had received feedback that people found such
>>> information interesting.
>>> 
>>> I've used the passive voice for most entries, to try to follow the
>>> style used elsewhere.
>>> 
>>> We don't yet define __ARM_FEATURE_FAMINMAX, but I'll fix that
>>> separately.
>> 
>> Thanks again for doing this…
>> 
>>> 
>>> +  
>>> +  Support has been added for the following features of the Arm C
>>> +Language Extensions
>>> +(ACLE, https://github.com/ARM-software/acle):
>>> +
>>> +  guarded control stacks
>>> +  lookup table instructions with 2-bit and 4-bit indices
>>> +(predefined macro
>>> +__ARM_FEATURE_LUT, enabled by +lut)
>>> +  
>>> +  floating-point absolute minimum and maximum instructions
>>> +(predefined macro __ARM_FEATURE_FAMINMAX,
>>> +enabled by +faminmax)
>>> +  
>>> +  FP8 conversions (predefined macro
>>> +__ARM_FEATURE_FP8, enabled by +fp8)
>>> +  
>>> +  FP8 2-way dot product to half precision instructions
>>> +(predefined macro __ARM_FEATURE_FP8DOT2,
>>> +enabled by +fp8dot2)
>>> +  
>>> +  FP8 4-way dot product to single precision instructions
>>> +(predefined macro __ARM_FEATURE_FP8DOT4,
>>> +enabled by +fp8dot4)
>>> +  
>>> +  FP8 multiply-accumulate to half precision and single precision
>>> +instructions (predefined macro __ARM_FEATURE_FP8FMA,
>>> +enabled by +fp8fma)
>>> +  
>>> +  SVE FP8 2-way dot product to half precision instructions
>>> +(predefined macro __ARM_FEATURE_SSVE_FP8DOT2,
>>> +enabled by +ssve-fp8dot2)
>>> +  
>>> +  SVE FP8 4-way dot product to single precision instructions
>>> +(predefined macro __ARM_FEATURE_SSVE_FP8DOT4,
>>> +enabled by +ssve-fp8dot4)
>>> +  
>>> +  SVE FP8 multiply-accumulate to half precision and single 
>>> precision
>>> +instructions (predefined macro 
>>> __ARM_FEATURE_SSVE_FP8FMA,
>>> +enabled by +ssve-fp8fma)
>> 
>> 
>> … Should these FP8 entries say “SSVE FP8” rather than “SVE FP8”?
> 
> The official description is "SVE(2) ... instructions in Streaming
> SVE mode".  But yeah, I suppose dropping the "in Streaming SVE mode"
> was a mistake.  I've pushed the following incremental patch.

Thanks, that looks clearer.
Kyrill


> 
> Thanks,
> Richard
> 
> 
> diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
> index a71249ff..3cec4ff4 100644
> --- a/htdocs/gcc-15/changes.html
> +++ b/htdocs/gcc-15/changes.html
> @@ -847,17 +847,20 @@ asm (".text; %cc0: mov %cc2, %%r0; .previous;"
> instructions (predefined macro __ARM_FEATURE_FP8FMA,
> enabled by +fp8fma)
>   
> -  SVE FP8 2-way dot product to half precision instructions
> -(predefined macro __ARM_FEATURE_SSVE_FP8DOT2,
> -enabled by +ssve-fp8dot2)
> +  SVE FP8 2-way dot product to half precision instructions in
> +Streaming SVE mode (predefined macro
> +__ARM_FEATURE_SSVE_FP8DOT2, enabled by
> ++ssve-fp8dot2)
>   
> -  SVE FP8 4-way dot product to single precision instructions
> -(predefined macro __ARM_FEATURE_SSVE_FP8DOT4,
> -enabled by +ssve-fp8dot4)
> +  SVE FP8 4-way dot product to single precision instructions in
> +Streaming SVE mode (predefined macro
> +__ARM_FEATURE_SSVE_FP8DOT4, enabled by
> ++ssve-fp8dot4)
>   
>   SVE FP8 multiply-accumulate to half precision and single precision
> -instructions (predefined macro 
> __ARM_FEATURE_SSVE_FP8FMA,
> -enabled by +ssve-fp8fma)
> +instructions in Streaming SVE mode (predefined macro
> +__ARM_FEATURE_SSVE_FP8FMA, enabled by
> ++ssve-fp8fma)
>   
>   SVE2.1 instructions (predefined macro
> __ARM_FEATURE_SVE2p1, enabled by +sve2p1)
> -- 
> 2.43.0
> 



Re: [PATCH] libstdc++: Minimalize temporary allocations when width is specified [PR109162]

2025-04-25 Thread Tomasz Kaminski
On Fri, Apr 25, 2025 at 11:49 AM Jonathan Wakely  wrote:

> On 23/04/25 13:56 +0200, Tomasz Kamiński wrote:
> >When a width parameter is specified for formatting a range, tuple or escaped
> >presentation of a string, we used to format characters to a temporary string,
> >and write the produced sequence padded according to the spec. However, once the
> >estimated width of the formatted representation of the input is larger than the
> >value of the spec width, it can be written directly to the output. This limits
> >the size of the required allocation, especially for large ranges.
> >
> >Similarly, if a precision (maximum) width is provided for string presentation,
> >only a prefix of the sequence with estimated width not greater than the
> >precision needs to be buffered.
> >
> >To realize the above, this commit implements a new _Padding_sink specialization.
> >This sink holds an output iterator, a value of padding width, (optionally)
> >maximum width and a string buffer inherited from _Str_sink.
> >Then any incoming characters are treated in one of the following ways,
> >depending on the estimated width W of the written sequence:
> >* written to string if W is smaller than padding width and maximum width (if present)
> >* ignored, if W is greater than maximum width
> >* written to output iterator, if W is greater than padding width
> >
> >The padding sink is used instead of _Str_sink in the __format::__format_padded
> >and __formatter_str::_M_format_escaped functions.
> >
> >Furthermore, the __formatter_str::_M_format implementation was reworked so that:
> >* the number of instantiations is reduced by delegating to _Rg& and const _Rg&
> >  overloads,
> >* non-debug presentation is written to _Out directly or via _Padding_sink,
> >* if maximum width is specified for debug format with non-unicode encoding,
> >  string size is limited to that number.
> >
> >   PR libstdc++/109162
> >
> >libstdc++-v3/ChangeLog:
> >
> >   * include/bits/formatfwd.h (__simply_formattable_range): Moved from
> >   std/format.
> >   * include/std/format (__formatter_str::_format): Extracted escaped
> >   string handling to separate method...
> >   (__formatter_str::_M_format_escaped): Use __Padding_sink.
> >   (__formatter_str::_M_format): Adjusted implementation.
> >   (__formatter_str::_S_trunc): Extracted as namespace function...
> >   (__format::_truncate): Extracted from __formatter_str::_S_trunc.
> >   (__format::_Seq_sink): Removed forward declarations, made members
> >   protected and non-final.
> >   (_Seq_sink::_M_trim): Define.
> >   (_Seq_sink::_M_span): Renamed from view.
> >   (_Seq_sink::view): Returns string_view instead of span.
> >   (__format::_Str_sink): Moved after _Seq_sink.
> >   (__format::__format_padded): Use _Padding_sink.
> >   * testsuite/std/format/debug.cc: Add timeout and new tests.
> >   * testsuite/std/format/ranges/sequence.cc: Specify unicode as
> >   encoding and new tests.
> >   * testsuite/std/format/ranges/string.cc: Likewise.
> >   * testsuite/std/format/tuple.cc: Likewise.
> >---
> >This is for sure 16 material, and nothing to backport.
> >This addressed the TODO I created in __format_padded.
> >OK for trunk after 15.1?
>
>
> This is a nice improvement.
>
> OK with the spelling and minor tweaks mentioned below ...
>
>
> > libstdc++-v3/include/bits/formatfwd.h |   8 +
> > libstdc++-v3/include/std/format   | 396 +-
> > libstdc++-v3/testsuite/std/format/debug.cc| 386 -
> > .../testsuite/std/format/ranges/sequence.cc   | 116 +
> > .../testsuite/std/format/ranges/string.cc |  63 +++
> > libstdc++-v3/testsuite/std/format/tuple.cc|  93 
> > 6 files changed, 957 insertions(+), 105 deletions(-)
> >
> >diff --git a/libstdc++-v3/include/bits/formatfwd.h
> b/libstdc++-v3/include/bits/formatfwd.h
> >index 9ba658b078a..2d54ee5d30b 100644
> >--- a/libstdc++-v3/include/bits/formatfwd.h
> >+++ b/libstdc++-v3/include/bits/formatfwd.h
> >@@ -131,6 +131,14 @@ namespace __format
> >   = ranges::input_range
> > && formattable, _CharT>;
> >
> >+  // _Rg& and const _Rg& are both formattable and use same formatter
> >+  // specialization for their references.
> >+  template
> >+concept __simply_formattable_range
> >+  = __const_formattable_range<_Rg, _CharT>
> >+&& same_as>,
> >+   remove_cvref_t>>;
> >+
> >   template
> > using __maybe_const_range
> >   = __conditional_t<__const_formattable_range<_Rg, _CharT>, const
> _Rg, _Rg>;
> >diff --git a/libstdc++-v3/include/std/format
> b/libstdc++-v3/include/std/format
> >index 7d3067098be..355db5f2a60 100644
> >--- a/libstdc++-v3/include/std/format
> >+++ b/libstdc++-v3/include/std/format
> >@@ -56,7 +56,7 @@
> > #include   // input_range, range_reference_t
> > #include   // subrange
> > #include  // ranges::copy
> >-#include  // back_insert_iterator
> >+#include  // back_insert_iterator, counted_iterator
> > #include  // __is_pair

Re: [PATCH] libstdc++: Add some makefile dependencies

2025-04-25 Thread Tomasz Kaminski
On Fri, Apr 25, 2025 at 12:55 PM Jonathan Wakely  wrote:

> This ensures that wstring-inst.o and similar files will be rebuilt when
> string-inst.cc changes.
>
> libstdc++-v3/ChangeLog:
>
> * src/c++11/Makefile.am: Add prerequisites for targets that
> depend on string-inst.cc.
> * src/c++11/Makefile.in: Regenerate.
> ---
>
> We could make a lot of other changes like this, but for now I'm only
> bothered by the one that affected me recently. We could also make .o
> files depend on the headers they use, but that's a lot more work.
>
> I've been using this in my local tree for a couple of weeks without
> issues.
>
LGTM.

>
>  libstdc++-v3/src/c++11/Makefile.am | 8 
>  libstdc++-v3/src/c++11/Makefile.in | 8 
>  2 files changed, 16 insertions(+)
>
> diff --git a/libstdc++-v3/src/c++11/Makefile.am
> b/libstdc++-v3/src/c++11/Makefile.am
> index b39115832e2..26d6fa0e01a 100644
> --- a/libstdc++-v3/src/c++11/Makefile.am
> +++ b/libstdc++-v3/src/c++11/Makefile.am
> @@ -168,7 +168,15 @@ localename.lo: localename.cc
>  localename.o: localename.cc
> $(CXXCOMPILE) -fchar8_t -c $<
>
> +wstring-inst.lo: wstring-inst.cc string-inst.cc
> +wstring-inst.o: wstring-inst.cc string-inst.cc
> +
>  if ENABLE_DUAL_ABI
> +cow-string-inst.lo: cow-string-inst.cc string-inst.cc
> +cow-string-inst.o: cow-string-inst.cc string-inst.cc
> +cow-wstring-inst.lo: cow-wstring-inst.cc string-inst.cc
> +cow-wstring-inst.o: cow-wstring-inst.cc string-inst.cc
> +
>  # Rewrite the type info for __ios_failure.
>  rewrite_ios_failure_typeinfo = sed -e
> '/^_*_ZTISt13__ios_failure:/,/_ZTVN10__cxxabiv120__si_class_type_infoE/s/_ZTVN10__cxxabiv120__si_class_type_infoE/_ZTVSt19__iosfail_type_info/'
>
> diff --git a/libstdc++-v3/src/c++11/Makefile.in
> b/libstdc++-v3/src/c++11/Makefile.in
> index 770e948a98a..dafdb260ec1 100644
> --- a/libstdc++-v3/src/c++11/Makefile.in
> +++ b/libstdc++-v3/src/c++11/Makefile.in
> @@ -896,6 +896,14 @@ localename.lo: localename.cc
>  localename.o: localename.cc
> $(CXXCOMPILE) -fchar8_t -c $<
>
> +wstring-inst.lo: wstring-inst.cc string-inst.cc
> +wstring-inst.o: wstring-inst.cc string-inst.cc
> +
> +@ENABLE_DUAL_ABI_TRUE@cow-string-inst.lo: cow-string-inst.cc
> string-inst.cc
> +@ENABLE_DUAL_ABI_TRUE@cow-string-inst.o: cow-string-inst.cc
> string-inst.cc
> +@ENABLE_DUAL_ABI_TRUE@cow-wstring-inst.lo: cow-wstring-inst.cc
> string-inst.cc
> +@ENABLE_DUAL_ABI_TRUE@cow-wstring-inst.o: cow-wstring-inst.cc
> string-inst.cc
> +
>  @ENABLE_DUAL_ABI_TRUE@cxx11-ios_failure-lt.s: cxx11-ios_failure.cc
>  @ENABLE_DUAL_ABI_TRUE@ $(LTCXXCOMPILE) -gno-as-loc-support -S $< -o
> tmp-cxx11-ios_failure-lt.s
>  @ENABLE_DUAL_ABI_TRUE@ -test -f tmp-cxx11-ios_failure-lt.o && mv -f
> tmp-cxx11-ios_failure-lt.o tmp-cxx11-ios_failure-lt.s
> --
> 2.49.0
>
>


Re: [PATCH] s390: Allow 5+ argument tail-calls in some -m31 -mzarch special cases [PR119873]

2025-04-25 Thread Stefan Schulze Frielinghaus
On Fri, Apr 25, 2025 at 01:21:46PM +0200, Jakub Jelinek wrote:
> Hi!
> 
> Here is a patch to handle the PARALLEL case too.
> I think we can just use rtx_equal_p there, because it will always use
> SImode in the EXPR_LIST REGs in that case.
> 
> Bootstrapped/regtested on s390x-linux, ok for trunk and 15.2 (with
> CALL_EXPR_MUST_TAIL_CALL (call_expr) && added in that case)?

Ok.  Thanks for taking care of this!

Cheers,
Stefan


Re: [GCC16 stage 1][PATCH v2 0/3] extend "counted_by" attribute to pointer fields of structures

2025-04-25 Thread Qing Zhao


> On Apr 24, 2025, at 19:56, Kees Cook  wrote:
> 
> 
> 
> On April 24, 2025 1:44:23 PM PDT, Qing Zhao  wrote:
>> 
>> 
>>> On Apr 24, 2025, at 15:43, Bill Wendling  wrote:
>>> 
>>> On Thu, Apr 24, 2025 at 8:15 AM Qing Zhao  wrote:
 
 Hi,
 
 Kees reported a segmentation fault when he used the patch to compile the 
 kernel,
 and the reduced test case is something like the following:
 
 struct f {
 void *g __attribute__((__counted_by__(h)));
 long h;
 };
 
 extern struct f *my_alloc (int);
 
 int i(void) {
 struct f *iov = my_alloc (10);
 int *j = (int *)iov->g;
 return __builtin_dynamic_object_size(iov->g, 0);
 }
 
 Basically, the problem relates to the pointee type of the pointer array
 being “void”. As a result, the element size of the array is not available
 in the IR, which leads to a segmentation fault when calculating the size
 of the whole object.
 
 Although it’s easy to fix this segmentation fault, I am not quite sure
 what the best solution to this issue is:
 
 1. Reject such usage of “counted_by” at the very beginning by reporting a
 warning to the user, and delete the counted_by attribute from the field.
 
 Or:
 
 2. Accept such usage, but issue warnings when calculating the object_size
 in the middle end.
 
 Personally, I prefer option 1 above: when the pointee type is void, we
 don’t know the type of the elements of the pointer array, so there is no
 way to decide the size of the pointer array.
 
 So, the counted_by information is not useful for
 __builtin_dynamic_object_size.
 
 But I am not sure whether counted_by can still be used by the bounds
 sanitizer?
 
 Thanks for suggestions and help.
 
>>> Clang supports __sized_by that can handle a 'void *', where it defaults to 
>>> 'u8'.
> 
> I would like to be able to use counted_by (and not sized_by) so that users of 
> the annotation don't have to change the marking just because it's 
> "void *". Everything else operates on "void *" as if it were u8 ...
> 
> Regardless, ignoring "void *", the rest of my initial testing (of both GCC 
> and Clang) is positive. The test cases are all behaving as expected! Yay! :) 
> I will try to construct some more goofy stuff to find more corner cases.

Thanks a lot for the help!
> 
> And at some future point we may want to think about 
> -fsanitize=pointer-overflow using this information too, to catch arithmetic 
> and increments past the bounds...
> 
> struct foo {
>  u8 *buf __counted_by(len);
>  int len;
> } str;
> u8 *walk;
> str.buf = malloc(10);
> str.len = 10;
> 
> walk = str.buf + 12; // trip!
> for (walk = str.buf; ; walk++) // trip after 10 loops
>   ;
> 
Add this to my todo list.  -:)

thanks.

Qing
> 
> -Kees
> 
> -- 
> Kees Cook




[PATCH] AArch64: Fold LD1/ST1 with ptrue to LDR/STR for 128-bit VLS

2025-04-25 Thread Jennifer Schmitz
If -msve-vector-bits=128, SVE loads and stores (LD1 and ST1) with a
ptrue predicate can be replaced by neon instructions (LDR and STR),
thus avoiding the predicate altogether. This also enables formation of
LDP/STP pairs.

For example, the test cases

svfloat64_t
ptrue_load (float64_t *x)
{
  svbool_t pg = svptrue_b64 ();
  return svld1_f64 (pg, x);
}
void
ptrue_store (float64_t *x, svfloat64_t data)
{
  svbool_t pg = svptrue_b64 ();
  return svst1_f64 (pg, x, data);
}

were previously compiled to
(with -O2 -march=armv8.2-a+sve -msve-vector-bits=128):

ptrue_load:
ptrue   p3.b, vl16
ld1d    z0.d, p3/z, [x0]
ret
ptrue_store:
ptrue   p3.b, vl16
st1d    z0.d, p3, [x0]
ret

Now they are compiled to:

ptrue_load:
ldr q0, [x0]
ret
ptrue_store:
str q0, [x0]
ret

The implementation includes the if-statement
if (known_eq (BYTES_PER_SVE_VECTOR, 16)
&& known_eq (GET_MODE_SIZE (mode), 16))
which checks for 128-bit VLS and excludes partial modes with a
mode size < 128 (e.g. VNx2QI).
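
As a rough illustration of the guard, the full condition used in the patch below can be modelled as a plain predicate. This is a sketch only, not the code in aarch64.cc: the struct and function names are hypothetical, and ordinary integers stand in for the poly_int values that known_eq compares.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one predicated SVE move.  */
struct sve_move
{
  bool dest_is_mem;         /* Models MEM_P (dest).  */
  bool src_is_mem;          /* Models MEM_P (src).  */
  size_t sve_vector_bytes;  /* Models BYTES_PER_SVE_VECTOR.  */
  size_t mode_bytes;        /* Models GET_MODE_SIZE (mode).  */
  bool big_endian;          /* Models BYTES_BIG_ENDIAN.  */
};

/* Return true if the move is a load or store of a full 128-bit vector
   on a little-endian 128-bit VLS target, i.e. a candidate for LDR/STR.  */
static bool
can_fold_to_ldr_str (const struct sve_move *m)
{
  return (m->dest_is_mem || m->src_is_mem)
         && m->sve_vector_bytes == 16
         && m->mode_bytes == 16
         && !m->big_endian;
}
```

A partial mode fails the mode_bytes check even when the vector length is 128 bits, so it keeps the predicated LD1/ST1 form.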

The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz 

gcc/
* config/aarch64/aarch64.cc (aarch64_emit_sve_pred_move):
Fold LD1/ST1 with ptrue to LDR/STR for 128-bit VLS.

gcc/testsuite/
* gcc.target/aarch64/sve/ldst_ptrue_128_to_neon.c: New test.
* gcc.target/aarch64/sve/cond_arith_6.c: Adjust expected outcome.
* gcc.target/aarch64/sve/pcs/return_4_128.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_5_128.c: Likewise.
* gcc.target/aarch64/sve/pcs/struct_3_128.c: Likewise.
---
 gcc/config/aarch64/aarch64.cc | 27 ++--
 .../gcc.target/aarch64/sve/cond_arith_6.c |  3 +-
 .../aarch64/sve/ldst_ptrue_128_to_neon.c  | 36 +++
 .../gcc.target/aarch64/sve/pcs/return_4_128.c | 39 ---
 .../gcc.target/aarch64/sve/pcs/return_5_128.c | 39 ---
 .../gcc.target/aarch64/sve/pcs/struct_3_128.c | 64 +--
 6 files changed, 102 insertions(+), 106 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/ldst_ptrue_128_to_neon.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index f7bccf532f8..ac01149276b 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -6416,13 +6416,28 @@ aarch64_stack_protect_canary_mem (machine_mode mode, 
rtx decl_rtl,
 void
 aarch64_emit_sve_pred_move (rtx dest, rtx pred, rtx src)
 {
-  expand_operand ops[3];
   machine_mode mode = GET_MODE (dest);
-  create_output_operand (&ops[0], dest, mode);
-  create_input_operand (&ops[1], pred, GET_MODE(pred));
-  create_input_operand (&ops[2], src, mode);
-  temporary_volatile_ok v (true);
-  expand_insn (code_for_aarch64_pred_mov (mode), 3, ops);
+  if ((MEM_P (dest) || MEM_P (src))
+  && known_eq (BYTES_PER_SVE_VECTOR, 16)
+  && known_eq (GET_MODE_SIZE (mode), 16)
+  && !BYTES_BIG_ENDIAN)
+{
+  rtx tmp = gen_reg_rtx (V16QImode);
+  emit_move_insn (tmp, lowpart_subreg (V16QImode, src, mode));
+  if (MEM_P (src))
+   emit_move_insn (dest, lowpart_subreg (mode, tmp, V16QImode));
+  else
+   emit_move_insn (adjust_address (dest, V16QImode, 0), tmp);
+}
+  else
+{
+  expand_operand ops[3];
+  create_output_operand (&ops[0], dest, mode);
+  create_input_operand (&ops[1], pred, GET_MODE(pred));
+  create_input_operand (&ops[2], src, mode);
+  temporary_volatile_ok v (true);
+  expand_insn (code_for_aarch64_pred_mov (mode), 3, ops);
+}
 }
 
 /* Expand a pre-RA SVE data move from SRC to DEST in which at least one
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_arith_6.c 
b/gcc/testsuite/gcc.target/aarch64/sve/cond_arith_6.c
index 4085ab12444..d5a12f1df07 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_arith_6.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_arith_6.c
@@ -8,7 +8,8 @@ f (float *x)
   x[i] -= 1.0f;
 }
 
-/* { dg-final { scan-assembler {\tld1w\tz} } } */
+/* { dg-final { scan-assembler {\tld1w\tz} { target aarch64_big_endian } } } */
+/* { dg-final { scan-assembler {\tldr\tq} { target aarch64_little_endian } } } 
*/
 /* { dg-final { scan-assembler {\tfcmgt\tp} } } */
 /* { dg-final { scan-assembler {\tfsub\tz} } } */
 /* { dg-final { scan-assembler {\tst1w\tz} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/ldst_ptrue_128_to_neon.c 
b/gcc/testsuite/gcc.target/aarch64/sve/ldst_ptrue_128_to_neon.c
new file mode 100644
index 000..69f42b121ad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/ldst_ptrue_128_to_neon.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msve-vector-bits=128" } */
+/* { dg-require-effective-target aarch64_little_endian } */
+
+#include 
+
+#define TEST(TYPE, TY, B)  \
+  sv##TYPE \
+  ld1_##TY##B (TYPE *x)

Re: [committed] libstdc++: Replace leftover std::queue with Adaptor in ranges/adaptors.cc tests.

2025-04-25 Thread Jonathan Wakely
On Fri, 25 Apr 2025 at 15:31, Tomasz Kamiński  wrote:
>
> This was left over from a work-in-progress state, where only std::queue was
> tested.
>
> libstdc++-v3/ChangeLog:
>
> * testsuite/std/format/ranges/adaptors.cc: Updated test.
> ---
> Tested on x86_64-linux.

OK for trunk and gcc-15.

>
>  libstdc++-v3/testsuite/std/format/ranges/adaptors.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/testsuite/std/format/ranges/adaptors.cc 
> b/libstdc++-v3/testsuite/std/format/ranges/adaptors.cc
> index 854c7eef5bd..daa73aa39bf 100644
> --- a/libstdc++-v3/testsuite/std/format/ranges/adaptors.cc
> +++ b/libstdc++-v3/testsuite/std/format/ranges/adaptors.cc
> @@ -88,7 +88,7 @@ test_output()
>VERIFY( res == WIDEN("==[0x03, 0x02, 0x01]===") );
>
>// Sequence output is always used
> -  std::queue<_CharT, std::basic_string<_CharT>> qs(
> +  Adaptor<_CharT, std::basic_string<_CharT>> qs(
>  std::from_range,
>  std::basic_string_view<_CharT>(WIDEN("321")));
>
> --
> 2.49.0
>



[PUSHED] phiopt: Remove calls.h include [PR119811]

2025-04-25 Thread Andrew Pinski
When the patch
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660807.html was reworked
into r15-3047-g404d947d8ddd3c, the include of calls.h was left in place even
though it was no longer needed.

Pushed as obvious.

PR tree-optimization/119811
gcc/ChangeLog:

* tree-ssa-phiopt.cc: Remove calls.h include.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssa-phiopt.cc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index a194bf675e4..e27166c55a5 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -54,7 +54,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "dbgcnt.h"
 #include "tree-ssa-propagate.h"
 #include "tree-ssa-dce.h"
-#include "calls.h"
 #include "tree-ssa-loop-niter.h"
 
 /* Return the singleton PHI in the SEQ of PHIs for edges E0 and E1. */
-- 
2.43.0



[committed] cobol: New testcases.

2025-04-25 Thread Robert Dubner
Fifty-eight new testcases for COBOL.  These cover a range of fundamental
operations and data representation.

From 591831dcd4bc9cb9c089d952e73ec8bfcb6cb3fb Mon Sep 17 00:00:00 2001
From: Robert Dubner mailto:rdub...@symas.com
Date: Fri, 25 Apr 2025 10:19:35 -0400
Subject: [PATCH] cobol: New testcases.

These testcases are derived from the cobolworx run_fundamental.at file.

gcc/testsuite

* cobol.dg/group2/88_level_with_FALSE_IS_clause.cob: New testcase.
* cobol.dg/group2/88_level_with_FILLER.cob: Likewise.
* cobol.dg/group2/88_level_with_THRU.cob: Likewise.
* cobol.dg/group2/ADD_CORRESPONDING.cob: Likewise.
* cobol.dg/group2/ADD_SUBTRACT_CORR_mixed_fix___float.cob:
Likewise.
* cobol.dg/group2/ALPHABETIC-LOWER_test.cob: Likewise.
* cobol.dg/group2/ALPHABETIC_test.cob: Likewise.
* cobol.dg/group2/ALPHABETIC-UPPER_test.cob: Likewise.
* cobol.dg/group2/BLANK_WHEN_ZERO.cob: Likewise.
* cobol.dg/group2/Check_for_equality_of_COMP-1___COMP-2.cob:
Likewise.
* cobol.dg/group2/Compare_COMP-2_with_floating-point_literal.cob:
Likewise.
* cobol.dg/group2/Contained_program_visibility__3_.cob: Likewise.
* cobol.dg/group2/Contained_program_visibility__4_.cob: Likewise.
* cobol.dg/group2/Context_sensitive_words__1_.cob: Likewise.
* cobol.dg/group2/Context_sensitive_words__2_.cob: Likewise.
* cobol.dg/group2/Context_sensitive_words__3_.cob: Likewise.
* cobol.dg/group2/Context_sensitive_words__4_.cob: Likewise.
* cobol.dg/group2/Context_sensitive_words__5_.cob: Likewise.
* cobol.dg/group2/Context_sensitive_words__6_.cob: Likewise.
* cobol.dg/group2/Context_sensitive_words__7_.cob: Likewise.
* cobol.dg/group2/Context_sensitive_words__8_.cob: Likewise.
* cobol.dg/group2/debugging_lines__not_active_.cob: Likewise.
* cobol.dg/group2/debugging_lines__WITH_DEBUGGING_MODE_.cob:
Likewise.
* cobol.dg/group2/DEBUG_Line.cob: Likewise.
* cobol.dg/group2/DISPLAY_and_assignment_NumericDisplay.cob:
Likewise.
* cobol.dg/group2/DISPLAY_data_items_with_MOVE_statement.cob:
Likewise.
* cobol.dg/group2/DISPLAY_data_items_with_VALUE_clause.cob:
Likewise.
* cobol.dg/group2/DISPLAY_literals__DECIMAL-POINT_is_COMMA.cob:
Likewise.
* cobol.dg/group2/GLOBAL_at_lower_level.cob: Likewise.
* cobol.dg/group2/GLOBAL_at_same_level.cob: Likewise.
* cobol.dg/group2/GLOBAL_FD__1_.cob: Likewise.
* cobol.dg/group2/GLOBAL_FD__2_.cob: Likewise.
* cobol.dg/group2/GLOBAL_FD__3_.cob: Likewise.
* cobol.dg/group2/GLOBAL_FD__4_.cob: Likewise.
* cobol.dg/group2/Hexadecimal_literal.cob: Likewise.
* cobol.dg/group2/integer_arithmetic_on_floating-point_var.cob:
Likewise.
* cobol.dg/group2/MULTIPLY_BY_literal_in_INITIAL_program.cob:
Likewise.
*
cobol.dg/group2/Named_conditionals_-_fixed__float__and_alphabetic.cob:
Likewise.
* cobol.dg/group2/Numeric_operations__1_.cob: Likewise.
* cobol.dg/group2/Numeric_operations__2_.cob: Likewise.
* cobol.dg/group2/Numeric_operations__3_.cob: Likewise.
* cobol.dg/group2/Numeric_operations__4_.cob: Likewise.
* cobol.dg/group2/Numeric_operations__5_.cob: Likewise.
* cobol.dg/group2/Numeric_operations__7_.cob: Likewise.
* cobol.dg/group2/Numeric_operations__8_.cob: Likewise.
* cobol.dg/group2/ROUNDED_AWAY-FROM-ZERO.cob: Likewise.
* cobol.dg/group2/ROUNDED_NEAREST-AWAY-FROM-ZERO.cob: Likewise.
* cobol.dg/group2/ROUNDED_NEAREST-EVEN.cob: Likewise.
* cobol.dg/group2/ROUNDED_NEAREST-TOWARD-ZERO.cob: Likewise.
* cobol.dg/group2/ROUNDED_TOWARD-GREATER.cob: Likewise.
* cobol.dg/group2/ROUNDED_TOWARD-LESSER.cob: Likewise.
* cobol.dg/group2/ROUNDED_TRUNCATION.cob: Likewise.
*
cobol.dg/group2/ROUNDING_omnibus_Floating-Point_from_COMPUTE.cob:
Likewise.
*
cobol.dg/group2/ROUNDING_omnibus_NumericDisplay_from_COMPUTE.cob:
Likewise.
* cobol.dg/group2/Separate_sign_positions__1_.cob: Likewise.
* cobol.dg/group2/Separate_sign_positions__2_.cob: Likewise.
* cobol.dg/group2/Simple_p-scaling.cob: Likewise.
* cobol.dg/group2/Simple_TYPEDEF.cob: Likewise.
* cobol.dg/group2/ADD_SUBTRACT_CORR_mixed_fix___float.out: New
known-good result.
* cobol.dg/group2/BLANK_WHEN_ZERO.out: Likewise.
* cobol.dg/group2/Contained_program_visibility__4_.out: Likewise.
* cobol.dg/group2/Context_sensitive_words__1_.out: Likewise.
* cobol.dg/group2/Context_sensitive_words__2_.out: Likewise.
* cobol.dg/group2/Context_sensitive_words__3_.out: Likewise.
* cobol.dg/group2/Context_sensitive_words__4_.out: Likewise.
* cobol.dg/group2/Context_sensitive_words__5_.out: Likewise.
* cobol.dg/group2/Context_sensitive_words__6_.out: Likewise.
* cobol.dg/group2/Cont

Re: [committed] libstdc++: Replace leftover std::queue with Adaptor in ranges/adaptors.cc tests.

2025-04-25 Thread Jonathan Wakely
On Fri, 25 Apr 2025 at 15:41, Jonathan Wakely  wrote:
>
> On Fri, 25 Apr 2025 at 15:31, Tomasz Kamiński  wrote:
> >
> > This was a leftover from the work-in-progress state, where only std::queue
> > was tested.
> >
> > libstdc++-v3/ChangeLog:
> >
> > * testsuite/std/format/ranges/adaptors.cc: Updated test.
> > ---
> > Tested on x86_64-linux.
>
> OK for trunk and gcc-15.

Oh sorry, I just saw it's already [committed] - great.

>
> >
> >  libstdc++-v3/testsuite/std/format/ranges/adaptors.cc | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/libstdc++-v3/testsuite/std/format/ranges/adaptors.cc 
> > b/libstdc++-v3/testsuite/std/format/ranges/adaptors.cc
> > index 854c7eef5bd..daa73aa39bf 100644
> > --- a/libstdc++-v3/testsuite/std/format/ranges/adaptors.cc
> > +++ b/libstdc++-v3/testsuite/std/format/ranges/adaptors.cc
> > @@ -88,7 +88,7 @@ test_output()
> >VERIFY( res == WIDEN("==[0x03, 0x02, 0x01]===") );
> >
> >// Sequence output is always used
> > -  std::queue<_CharT, std::basic_string<_CharT>> qs(
> > +  Adaptor<_CharT, std::basic_string<_CharT>> qs(
> >  std::from_range,
> >  std::basic_string_view<_CharT>(WIDEN("321")));
> >
> > --
> > 2.49.0
> >



Re: [PATCH v3] RISC-V: Fix riscv_modes_tieable_p

2025-04-25 Thread Jeff Law




On 1/12/25 10:44 PM, Zhijin Zeng wrote:

Compared to patch v2, I added a Zfinx check and a Zfh check. Please review
again.

Thanks,
Zhijin

 From 9ddb402cebe868050ebc2f75e4d87238161411b4 Mon Sep 17 00:00:00 2001
From: Zhijin Zeng 
Date: Sat, 11 Jan 2025 12:09:11 +0800
Subject: [PATCH] RISC-V: Fix mode compatibility between floating-point and
  integer value

I found some unnecessary fmv instructions in the glibc math exp function,
and reduced it to the attached test case. The unnecessary fmv
instructions are as follows:

```
         fld     fa4,16(a4)
         fmadd.d fa2,fa2,fa0,fa5
         fld     fa0,56(a4)
         fmv.x.d a5,fa2             *
         fld     fa2,48(a4)
         fmv.d.x fa1,a5             *
         andi    a3,a5,127
         addi    a3,a3,15
         fsub.d  fa5,fa1,fa5
         slli    a3,a3,3
```
So just to be clear, this is one of those cases where we want to work on 
the value as both an FP value and an integer value.  These cases are 
notoriously hard to make "perfect".
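
For readers without the attached test case at hand, here is a minimal C++ sketch of the kind of source that leads to this situation. It is not the submitter's test case: the names as_uint64, Reduced, reduce and check_reduce are hypothetical, and the sketch assumes IEEE double arithmetic with round-to-nearest. The point is only that kd is consumed both by FP arithmetic and, through its raw bits, by integer arithmetic.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a double as a 64-bit integer; no value conversion.
inline std::uint64_t as_uint64(double x)
{
  std::uint64_t u;
  std::memcpy(&u, &x, sizeof u);
  return u;
}

struct Reduced
{
  double remainder;     // result of the FP use of kd
  std::uint64_t index;  // result of the integer use of kd's bits
};

// Hypothetical reduction step: kd is needed as a double and as raw bits.
inline Reduced reduce(double z, double shift)
{
  double kd = z + shift;             // FP context
  std::uint64_t ki = as_uint64(kd);  // integer context, same 64 bits
  kd -= shift;                       // FP context again
  return { z - kd, ki & 127 };
}

// Self-check with shift = 1.5 * 2^52, which rounds z to the nearest integer
// and leaves that integer in the low mantissa bits of kd.
inline void check_reduce()
{
  const Reduced r = reduce(3.25, 6755399441055744.0);
  assert(r.index == 3);
  assert(r.remainder == 0.25);
}
```

In source like this, the value behind kd has to be visible to both an FP subtraction and an integer mask, which is where the choice of register file for the pseudo starts to matter.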





fa2 and fa1 hold the same data, so we should use fa2 directly
rather than fa1 in the following instructions and save one fmv instruction.
Ideally, yes.  But note that you may well fix this case and make others 
perform poorly.   That really just means we have to be careful.





The `fmv.x.d a5,fa2` corresponds to the pattern `(subreg:DI (reg/v:DF 143`.
In the ira pass, virtual register r143 is assigned to GP_REGS, so its data
needs to be copied to FP_REGS before `fsub.d fa5,fa1,fa5` by the reload pass,
and that's exactly the `fmv.d.x fa1,a5` instruction.
Right.  We've a pseudo that is primarily used in FP instructions, but 
which is also used in an integer context.  This is an artifact of fwprop.


Prior to fwprop we had:


(insn 13 12 14 2 (set (reg:DI 144 [ _16 ])
(subreg:DI (reg/v:DF 143 [ kd ]) 0)) "j.c":44:5 277 {*movdi_64bit}
 (nil))


The net is that prior to fwprop things looked pretty sensible.  We had 
r143 for use in the FP instructions and r144 for use in integer 
instructions.


fwprop propagates away the subreg copy resulting in using r143 in both 
the scalar and FP contexts.  Given the current definition of 
modes_tieable_p, that's a sensible decision, though it's causing 
problems later.


Changing modes_tieable_p would be the way to prevent that behavior in 
fwprop.  But I don't think that's really the right change.


Note carefully modes_tieable_p has no notion of register files.  It 
works strictly on modes.  Some ports do use modes as a rough proxy for 
register files, but it's far from clear if it's the best way forward for 
RISC-V.


So if we're going to change modes_tieable_p like this we really need to 
do deeper analysis than looking at a few routines from glibc.  It's the 
kind of change that would normally be tested on designs with spec.


I don't know if you have access to spec or not.  If not, then this 
probably needs to wait for someone who has the time to do a deeper dive.




3. JALR_REGS is also a subset of GR_REGS and needs to be taken
into account in riscv_register_move_cost, otherwise it will get
an incorrect cost.
This looks like a pretty straightforward bugfix.  Note that we probably 
also want to handle SIBCALL_REGS in a manner similar to JALR_REGS since 
it's a subset of the GPR register file.


Can you break out that change separately and include SIBCALL_REGS? 
Barring any surprises that should be able to go forward immediately.


Jeff


New template for 'gcc' made available

2025-04-25 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.  (If you have
any questions, send them to .)

A new POT file for textual domain 'gcc' has been made available
to the language teams for translation.  It is archived as:

https://translationproject.org/POT-files/gcc-15.1.0.pot

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

Below is the URL which has been provided to the translators of your
package.  Please inform the translation coordinator, at the address
at the bottom, if this information is not current:

https://gcc.gnu.org/pub/gcc/releases/gcc-15.1.0/gcc-15.1.0.tar.xz

Translated PO files will later be automatically e-mailed to you.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH] aarch64: Optimize SVE extract last to Neon lane extract for 128-bit VLS.

2025-04-25 Thread Richard Sandiford
Jennifer Schmitz  writes:
> For the test case
> int32_t foo (svint32_t x)
> {
>  svbool_t pg = svpfalse ();
>  return svlastb_s32 (pg, x);
> }
> compiled with -O3 -mcpu=grace -msve-vector-bits=128, GCC produced:
> foo:
>   pfalse  p3.b
>   lastb   w0, p3, z0.s
>   ret
> when it could use a Neon lane extract instead:
> foo:
>   umov    w0, v0.s[3]
>   ret
>
> We implemented this optimization by guarding the emission of
> pfalse+lastb in the pattern vec_extract by
> known_gt (BYTES_PER_SVE_VECTOR, 16). Thus, for a last-extract operation
> in 128-bit VLS, the pattern *vec_extract_v128 is used instead.
>
> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz 
>
> gcc/
>   * config/aarch64/aarch64-sve.md (vec_extract):
>   Prevent the emission of pfalse+lastb for 128-bit VLS.
>
> gcc/testsuite/
>   * gcc.target/aarch64/sve/extract_last_128.c: New test.

OK, thanks.

Richard
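
As a quick cross-check of the lane numbers that the new test scans for (b[15], h[7], s[3], d[1]), the arithmetic can be sketched in plain C++. This is only an illustration: vector_bytes and last_lane are hypothetical names, and the sketch assumes the fixed 16-byte vector implied by -msve-vector-bits=128.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption for this sketch: a fixed 128-bit vector, i.e. 16 bytes.
constexpr std::size_t vector_bytes = 16;

// Index of the last lane when the vector holds elements of type T.
template<typename T>
constexpr std::size_t last_lane()
{ return vector_bytes / sizeof(T) - 1; }

static_assert(last_lane<std::int8_t>() == 15, "byte lanes: b[15]");
static_assert(last_lane<std::int16_t>() == 7, "halfword lanes: h[7]");
static_assert(last_lane<std::int32_t>() == 3, "word lanes: s[3]");
static_assert(last_lane<std::int64_t>() == 1, "doubleword lanes: d[1]");
```

With a known 128-bit length the last element has a compile-time lane index, which is what makes a plain lane extract possible in place of pfalse+lastb.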

> ---
> gcc/config/aarch64/aarch64-sve.md |  7 ++--
> .../gcc.target/aarch64/sve/extract_last_128.c | 33 +++
> 2 files changed, 37 insertions(+), 3 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/extract_last_128.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve.md 
> b/gcc/config/aarch64/aarch64-sve.md
> index 3dbd65986ec..824bd877e47 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -2969,10 +2969,11 @@
>   {
> poly_int64 val;
> if (poly_int_rtx_p (operands[2], &val)
> - && known_eq (val, GET_MODE_NUNITS (mode) - 1))
> + && known_eq (val, GET_MODE_NUNITS (mode) - 1)
> + && known_gt (BYTES_PER_SVE_VECTOR, 16))
>   {
> - /* The last element can be extracted with a LASTB and a false
> -predicate.  */
> + /* Extract the last element with a LASTB and a false predicate.
> +Exclude 128-bit VLS to use *vec_extract_v128.  */
>   rtx sel = aarch64_pfalse_reg (mode);
>   emit_insn (gen_extract_last_ (operands[0], sel, operands[1]));
>   DONE;
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/extract_last_128.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/extract_last_128.c
> new file mode 100644
> index 000..71d3561ec60
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/extract_last_128.c
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -msve-vector-bits=128" } */
> +
> +#include 
> +
> +#define TEST(TYPE, TY)   \
> +  TYPE exract_last_##TY (sv##TYPE x) \
> +  {  \
> +svbool_t pg = svpfalse ();   \
> +return svlastb_##TY (pg, x); \
> +  }
> +
> +TEST(bfloat16_t, bf16)
> +TEST(float16_t, f16)
> +TEST(float32_t, f32)
> +TEST(float64_t, f64)
> +TEST(int8_t, s8)
> +TEST(int16_t, s16)
> +TEST(int32_t, s32)
> +TEST(int64_t, s64)
> +TEST(uint8_t, u8)
> +TEST(uint16_t, u16)
> +TEST(uint32_t, u32)
> +TEST(uint64_t, u64)
> +
> +/* { dg-final { scan-assembler-times {\tdup\th0, v0\.h\[7\]} 2 } } */
> +/* { dg-final { scan-assembler-times {\tdup\ts0, v0\.s\[3\]} 1 } } */
> +/* { dg-final { scan-assembler-times {\tdup\td0, v0\.d\[1\]} 1 } } */
> +/* { dg-final { scan-assembler-times {\tumov\tw0, v0\.h\[7\]} 2 } } */
> +/* { dg-final { scan-assembler-times {\tumov\tw0, v0\.b\[15\]} 2 } } */
> +/* { dg-final { scan-assembler-times {\tumov\tw0, v0\.s\[3\]} 2 } } */
> +/* { dg-final { scan-assembler-times {\tumov\tx0, v0\.d\[1\]} 2 } } */
> +/* { dg-final { scan-assembler-not "lastb" } } */
> \ No newline at end of file


[pushed 2/2] c++: pruning non-captures in noexcept lambda [PR119764]

2025-04-25 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

The patch for PR87185 fixed the ICE without fixing the underlying problem,
that we were failing to find the declaration of the capture proxy that we
are trying to decide whether to prune.  Fixed by looking at the right index
in stmt_list_stack.

Since this changes captures, it changes the ABI of noexcept lambdas; we
haven't worked hard to maintain lambda capture ABI, but it's easy enough to
control here.
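
To make the affected case concrete, here is a small sketch of a noexcept lambda whose only capture use can be folded away. It is a hypothetical example, not the committed lambda-noexcept1.C test; read_constant and check_read_constant are names made up for the illustration, and whether the capture is actually pruned depends on the compiler and the selected -fabi-version.

```cpp
#include <cassert>

// Hypothetical example of a prunable capture in a noexcept lambda.
inline int read_constant()
{
  const int x = 123;
  // x is usable in constant expressions, so 'return x' can be folded to 123,
  // leaving only the capture proxy declaration behind.
  auto a = [&]() noexcept { return x; };
  return a();
}

// The observable result is the same whether or not the capture is pruned;
// only the closure layout (and therefore the ABI) can differ.
inline void check_read_constant()
{
  assert(read_constant() == 123);
}
```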

PR c++/119764
PR c++/87185

gcc/cp/ChangeLog:

* lambda.cc (insert_capture_proxy): Handle noexcept lambda.
(prune_lambda_captures): Likewise, in ABI v21.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/lambda/lambda-noexcept1.C: New test.
---
 gcc/cp/lambda.cc  | 41 ---
 .../g++.dg/cpp0x/lambda/lambda-noexcept1.C| 10 +
 2 files changed, 37 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-noexcept1.C

diff --git a/gcc/cp/lambda.cc b/gcc/cp/lambda.cc
index b2e0ecdd67e..a2bed9fb36a 100644
--- a/gcc/cp/lambda.cc
+++ b/gcc/cp/lambda.cc
@@ -348,7 +348,11 @@ insert_capture_proxy (tree var)
 
   /* And put a DECL_EXPR in the STATEMENT_LIST for the same block.  */
   var = build_stmt (DECL_SOURCE_LOCATION (var), DECL_EXPR, var);
-  tree stmt_list = (*stmt_list_stack)[1];
+  /* The first stmt_list is from start_preparsed_function.  Then there's a
+ possible stmt_list from begin_eh_spec_block, then the one from the
+ lambda's outer {}.  */
+  unsigned index = 1 + use_eh_spec_block (current_function_decl);
+  tree stmt_list = (*stmt_list_stack)[index];
   gcc_assert (stmt_list);
   append_to_statement_list_force (var, &stmt_list);
 }
@@ -1859,11 +1863,10 @@ prune_lambda_captures (tree body)
   cp_walk_tree_without_duplicates (&body, mark_const_cap_r, &const_vars);
 
   tree bind_expr = expr_single (DECL_SAVED_TREE (lambda_function (lam)));
-  if (bind_expr && TREE_CODE (bind_expr) == MUST_NOT_THROW_EXPR)
+  bool noexcept_p = (bind_expr
+&& TREE_CODE (bind_expr) == MUST_NOT_THROW_EXPR);
+  if (noexcept_p)
 bind_expr = expr_single (TREE_OPERAND (bind_expr, 0));
-  /* FIXME: We don't currently handle noexcept lambda captures correctly,
- so bind_expr may not be set; see PR c++/119764.  */
-  gcc_assert (!bind_expr || TREE_CODE (bind_expr) == BIND_EXPR);
 
   tree *fieldp = &TYPE_FIELDS (LAMBDA_EXPR_CLOSURE (lam));
   for (tree *capp = &LAMBDA_EXPR_CAPTURE_LIST (lam); *capp; )
@@ -1872,11 +1875,23 @@ prune_lambda_captures (tree body)
   if (tree var = var_to_maybe_prune (cap))
{
  tree **use = const_vars.get (var);
- if (use && TREE_CODE (**use) == DECL_EXPR)
+ if (TREE_CODE (**use) == DECL_EXPR)
{
  /* All uses of this capture were folded away, leaving only the
 proxy declaration.  */
 
+ if (noexcept_p)
+   {
+ /* We didn't handle noexcept lambda captures correctly before
+the fix for PR c++/119764.  */
+ if (abi_version_crosses (21))
+   warning_at (location_of (lam), OPT_Wabi, "%qD is no longer"
+   " captured in noexcept lambda in ABI v21 "
+   "(GCC 16)", var);
+ if (!abi_version_at_least (21))
+   goto next;
+   }
+
  /* Splice the capture out of LAMBDA_EXPR_CAPTURE_LIST.  */
  *capp = TREE_CHAIN (cap);
 
@@ -1894,14 +1909,11 @@ prune_lambda_captures (tree body)
 
  /* And maybe out of the vars declared in the containing
 BIND_EXPR, if it's listed there.  */
- if (bind_expr)
-   {
- tree *bindp = &BIND_EXPR_VARS (bind_expr);
- while (*bindp && *bindp != DECL_EXPR_DECL (**use))
-   bindp = &DECL_CHAIN (*bindp);
- if (*bindp)
-   *bindp = DECL_CHAIN (*bindp);
-   }
+ tree *bindp = &BIND_EXPR_VARS (bind_expr);
+ while (*bindp && *bindp != DECL_EXPR_DECL (**use))
+   bindp = &DECL_CHAIN (*bindp);
+ if (*bindp)
+   *bindp = DECL_CHAIN (*bindp);
 
  /* And remove the capture proxy declaration.  */
  **use = void_node;
@@ -1909,6 +1921,7 @@ prune_lambda_captures (tree body)
}
}
 
+next:
   capp = &TREE_CHAIN (cap);
 }
 }
diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-noexcept1.C 
b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-noexcept1.C
new file mode 100644
index 000..d7445569801
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-noexcept1.C
@@ -0,0 +1,10 @@
+// PR c++/119764
+// { dg-do compile { target c++11 } }
+// { dg-additional-options "-fabi-version=0 -Wabi=20" }
+
+int main() {
+  const int x = 123;
+  auto a = [&]() { return x; };

[Patch] nvptx/nvptx.opt: Update -march-map= for newer sm_xxx

2025-04-25 Thread Tobias Burnus

The idea of -march-map= is to provide a simple, future-proof way to select
the best -march for a certain arch, without requiring that the compiler
has support for it (such as having a special multilib for it), while
-march= sets the '.target' that is actually used (and the compiler might
actually generate specialized code for it).

The patch updates the sm_X for the CUDA 12.8 additions, namely for
three Blackwell GPU architectures.

Cf. 
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes
or also https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

OK for mainline?

Tobias

PS: CUDA 12.7 seems to be an internal release, which shows up as
PTX version but was not released to the public.
PTX 8.6/CUDA 12.7 added sm_100/sm_101 - and PTX 8.7/CUDA 12.8 added sm_120.

PPS: sm_80 (Ampere) was added in PTX ISA 7.0 (CUDA 11.0),
sm_89 (Ada) in PTX ISA 7.8 (CUDA).
As sm_90 (Hopper) + sm_100/101/120 (Blackwell) currently/now map to
sm_89, GCC generates PTX ISA .version 7.8 for them.
Otherwise, sm_80 and sm_89 produce (for now) identical code.
nvptx/nvptx.opt: Update -march-map= for newer sm_xxx

Usage of the -march-map=: "Select the closest available '-march=' value
that is not more capable."

As PTX ISA 8.6/8.7 (= unreleased CUDA 12.7 + CUDA 12.8) added the
Nvidia Blackwell GPUs SM_100, SM_101, and SM_120, it makes sense to
add them as well. Note that all three come as sm_XXX and sm_XXXa.

Internally, GCC currently generates the same code for >= sm_80 (Ampere);
however, as GCC's -march= also supports sm_89 (Ada), the sm_1xx values
(Blackwell) added here will map to sm_89.

[Naming note: while ptx code generated for sm_X can also run with sm_Y
if Y > X, code generated for sm_XXXa can (generally) only run on
the specific hardware.]
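
The "closest available value that is not more capable" rule can be sketched as follows. This is an illustration only: in GCC the mapping is a static list of Alias entries in nvptx.opt, not a function, and both map_arch and the contents of the available list are assumptions made for the example (the list is not GCC's actual set of -march= values). The 'a' suffix variants are ignored here.

```cpp
#include <cassert>

// Return the largest available sm value that is <= requested, or -1 if none.
inline int map_arch(int requested)
{
  // Illustrative list only; not GCC's actual table of -march= values.
  static const int available[] = { 53, 70, 75, 80, 89 };
  int best = -1;
  for (int sm : available)
    if (sm <= requested && sm > best)
      best = sm;
  return best;
}

// With 89 as the newest available value, every Blackwell number lands on it.
inline void check_map_arch()
{
  assert(map_arch(100) == 89);
  assert(map_arch(101) == 89);
  assert(map_arch(120) == 89);
}
```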

gcc/ChangeLog:

	* config/nvptx/nvptx.opt (march-map=): Add sm_100, sm_100a,
	sm_101, sm_101a, sm_120, and sm_120a.

 gcc/config/nvptx/nvptx.opt | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index ce9fbc7312e..8c160874648 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -120,6 +120,24 @@ Target RejectNegative Alias(misa=,sm_89)
 march-map=sm_90a
 Target RejectNegative Alias(misa=,sm_89)
 
+march-map=sm_100
+Target RejectNegative Alias(misa=,sm_89)
+
+march-map=sm_100a
+Target RejectNegative Alias(misa=,sm_89)
+
+march-map=sm_101
+Target RejectNegative Alias(misa=,sm_89)
+
+march-map=sm_101a
+Target RejectNegative Alias(misa=,sm_89)
+
+march-map=sm_120
+Target RejectNegative Alias(misa=,sm_89)
+
+march-map=sm_120a
+Target RejectNegative Alias(misa=,sm_89)
+
 Enum
 Name(ptx_version) Type(enum ptx_version)
 Known PTX ISA versions (for use with the -mptx= option):


Re: [PATCH 1/2] libstdc++: Add _M_key_compare helper to associative containers

2025-04-25 Thread Tomasz Kaminski
On Fri, Apr 25, 2025 at 12:19 AM Jonathan Wakely  wrote:

> In r10-452-ge625ccc21a91f3 I noted that we don't have an accessor for
> invoking _M_impl._M_key_compare in the associative containers. That
> meant that the static assertions to check for valid comparison functions
> were squirrelled away in _Rb_tree::_S_key instead. As Jason noted in
> https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681436.html this
> means that the static assertions fail later than we'd like.
>
> This change adds a new _Rb_tree::_M_key_compare member function which
> invokes the _M_impl._M_key_compare function object, and then moves the
> static_assert from _S_key into _M_key_compare.
>
> Because the new function is const-qualified, we now treat LWG 2542 as a
> DR for older standards, requiring the comparison function to be const
> invocable. Previously we only enforced the LWG 2542 rule for C++17 and
> later.
>
> I did consider deprecating support for comparisons which aren't const
> invocable, something like this:
>
>   // Before LWG 2542 it wasn't strictly necessary for _Compare to be
>   // const invocable, if you only used non-const container members.
>   // Define a non-const overload for pre-C++17, deprecated for C++11/14.
>   #if __cplusplus < 201103L
>   bool
>   _M_key_compare(const _Key& __k1, const _Key& __k2)
>   { return _M_impl._M_key_compare(__k1, __k2); }
>   #elif __cplusplus < 201703L
>   template
> [[__deprecated__("support for comparison functions that are not "
>  "const invocable is deprecated")]]
> __enable_if_t<
> __and_<__is_invocable<_Compare&, const _Key1&, const _Key2&>,
>__not_<__is_invocable _Key2&>>>::value,
>bool>
> _M_key_compare(const _Key1& __k1, const _Key2& __k2)
> {
>   static_assert(
> __is_invocable<_Compare&, const _Key&, const _Key&>::value,
> "comparison object must be invocable with two arguments of key
> type"
>   );
>   return _M_impl._M_key_compare(__k1, __k2);
> }
>   #endif // < C++17
>
> But I decided that this isn't necessary, because we've been enforcing
> the C++17 rule since GCC 8.4 and 9.2, and C++17 has been the default
> since GCC 11.1. Users have had plenty of time to fix their invalid
> comparison functions.
>
LGTM with one very small change to comment.
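
For context on what "const invocable" means for user code, here is a small sketch using the standard C++17 traits rather than the internal __is_invocable. ConstLess, MutableLess and const_invocable_v are hypothetical names for this illustration; the patch itself performs the check inside _Rb_tree::_M_key_compare.

```cpp
#include <type_traits>

// A comparison object that can be called through a const reference.
struct ConstLess
{
  bool operator()(int a, int b) const { return a < b; }
};

// A comparison object that forgot the const qualifier on operator().
struct MutableLess
{
  bool operator()(int a, int b) { return a < b; }
};

// True if a const Compare can be invoked with two const Key arguments.
template<typename Compare, typename Key>
inline constexpr bool const_invocable_v
  = std::is_invocable_v<const Compare&, const Key&, const Key&>;

static_assert(const_invocable_v<ConstLess, int>);
static_assert(!const_invocable_v<MutableLess, int>);
// MutableLess is still invocable through a non-const lvalue.
static_assert(std::is_invocable_v<MutableLess&, const int&, const int&>);
```

A type like MutableLess is the kind of comparison function that a const-qualified accessor rejects in every language mode.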

>
> libstdc++-v3/ChangeLog:
>
> * include/bits/stl_tree.h (_Rb_tree::_M_key_compare): New member
> function to invoke comparison function.
> (_Rb_tree): Use new member function instead of accessing the
> comparison function directly.
> ---
>
> Tested x86_64-linux.
>
>  libstdc++-v3/include/bits/stl_tree.h | 108 +--
>  1 file changed, 50 insertions(+), 58 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/stl_tree.h
> b/libstdc++-v3/include/bits/stl_tree.h
> index 6b35f99a25a..c7352093933 100644
> --- a/libstdc++-v3/include/bits/stl_tree.h
> +++ b/libstdc++-v3/include/bits/stl_tree.h
> @@ -1390,27 +1390,25 @@ namespace __rb_tree
>_M_end() const _GLIBCXX_NOEXCEPT
>{ return this->_M_impl._M_header._M_base_ptr(); }
>
> +  // _GLIBCXX_RESOLVE_LIB_DEFECTS
> +  // 2542. Missing const requirements for associative containers
> +  template
> +   bool
> +   _M_key_compare(const _Key1& __k1, const _Key2& __k2) const
> +   {
> +#if __cplusplus >= 201103L
> + // Enforce this here with a user-friendly message.
> + static_assert(
> +   __is_invocable _Key&>::value,
> +   "comparison object must be invocable with two arguments of key
> type"
> + );
> +# endif // C++17
>
This does not match the condition in if, should be C++11

> + return _M_impl._M_key_compare(__k1, __k2);
> +   }
> +
>static const _Key&
>_S_key(const _Node& __node)
> -  {
> -#if __cplusplus >= 201103L
> -   // If we're asking for the key we're presumably using the
> comparison
> -   // object, and so this is a good place to sanity check it.
> -   static_assert(__is_invocable<_Compare&, const _Key&, const
> _Key&>{},
> - "comparison object must be invocable "
> - "with two arguments of key type");
> -# if __cplusplus >= 201703L
> -   // _GLIBCXX_RESOLVE_LIB_DEFECTS
> -   // 2542. Missing const requirements for associative containers
> -   if constexpr (__is_invocable<_Compare&, const _Key&, const
> _Key&>{})
> - static_assert(
> - is_invocable_v,
> - "comparison object must be invocable as const");
> -# endif // C++17
> -#endif // C++11
>
I checked that the result of _S_key is mostly passed to _M_key_compare.
There is one function, _M_key of _Auto_node, that calls it, but its result
gets passed to _M_get_insert_(hint_)?(equal|uniq)_pos, and that function
calls _M_key_compare.
In short, yes, this was a duplicate check.

> -
> -   return _KeyOfValue()(*__node._M_valptr());
> -  }
> +  { return _KeyOfValue()(*_

Re: [PATCH 2/2] libstdc++: Improve diagnostics for std::packaged_task invocable checks

2025-04-25 Thread Tomasz Kaminski
On Fri, Apr 25, 2025 at 12:20 AM Jonathan Wakely  wrote:

> Moving the static_assert that checks is_invocable_r_v into _Task_state
> means it is checked when we instantiate that class template.
>
> Replacing the __create_task_state function with a static member function
> _Task_state::_S_create ensures we instantiate _Task_state and trigger
> the static_assert immediately, not deep inside the implementation of
> std::allocate_shared. This results in shorter diagnostics that don't
> show deeply-nested template instantiations before the static_assert
> failure.
>
> Placing the static_assert at class scope also helps us to fail earlier
> than waiting until when the _Task_state::_M_run virtual function is
> instantiated. That also makes the diagnostics shorter and easier to read
> (although for C++11 and C++14 modes the class-scope static_assert
> doesn't check is_invocable_r, so dangling references aren't detected
> until _M_run is instantiated).
>
> libstdc++-v3/ChangeLog:
>
> * include/std/future (__future_base::_Task_state): Check
> invocable requirement here.
> (__future_base::_Task_state::_S_create): New static member
> function.
> (__future_base::_Task_state::_M_reset): Use _S_create.
> (__create_task_state): Remove.
> (packaged_task): Use _Task_state::_S_create instead of
> __create_task_state.
> * testsuite/30_threads/packaged_task/cons/dangling_ref.cc:
> Adjust dg-error patterns.
> * testsuite/30_threads/packaged_task/cons/lwg4154_neg.cc:
> Likewise.
> ---
>
> Tested x86_64-linux.
>
LGTM.
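
The effect of a class-scope static_assert combined with a static factory can be sketched outside libstdc++ like this. The code is a simplified, hypothetical analogue (TaskState, create, run and check_task_state are made-up names, and std::make_shared stands in for std::allocate_shared); it assumes C++17 for std::is_invocable_r_v.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

template<typename Fn, typename Signature>
struct TaskState;

template<typename Fn, typename Res, typename... Args>
struct TaskState<Fn, Res(Args...)>
{
  // Checked as soon as this specialization is instantiated, before any
  // member function body is looked at.
  static_assert(std::is_invocable_r_v<Res, Fn&, Args...>,
		"Fn& must be invocable with Args... and yield Res");

  template<typename Fn2>
    explicit TaskState(Fn2&& fn) : fn_(std::forward<Fn2>(fn)) { }

  // Naming TaskState in the return type forces the instantiation here,
  // rather than somewhere inside the shared_ptr machinery.
  template<typename Fn2>
    static std::shared_ptr<TaskState>
    create(Fn2&& fn)
    { return std::make_shared<TaskState>(std::forward<Fn2>(fn)); }

  Res run(Args... args)
  { return fn_(std::forward<Args>(args)...); }

  Fn fn_;
};

inline void check_task_state()
{
  auto add = [](int a, int b) { return a + b; };
  auto state = TaskState<decltype(add), int(int, int)>::create(add);
  assert(state->run(2, 3) == 5);
}
```

Instantiating TaskState with a callable that does not match the signature fails at the static_assert, with a short instantiation trace.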

>
>  libstdc++-v3/include/std/future   | 66 +--
>  .../packaged_task/cons/dangling_ref.cc|  3 +-
>  .../packaged_task/cons/lwg4154_neg.cc | 10 +--
>  3 files changed, 38 insertions(+), 41 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/future
> b/libstdc++-v3/include/std/future
> index b7ab233b85f..080690064a9 100644
> --- a/libstdc++-v3/include/std/future
> +++ b/libstdc++-v3/include/std/future
> @@ -1486,12 +1486,28 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  struct __future_base::_Task_state<_Fn, _Alloc, _Res(_Args...)> final
>  : __future_base::_Task_state_base<_Res(_Args...)>
>  {
> +#ifdef __cpp_lib_is_invocable // C++ >= 17
> +  static_assert(is_invocable_r_v<_Res, _Fn&, _Args...>);
> +#else
> +  static_assert(__is_invocable<_Fn&, _Args...>::value,
> +   "_Fn& is invocable with _Args...");
> +#endif
> +
>template
> _Task_state(_Fn2&& __fn, const _Alloc& __a)
> : _Task_state_base<_Res(_Args...)>(__a),
>   _M_impl(std::forward<_Fn2>(__fn), __a)
> { }
>
> +  template
> +   static shared_ptr<_Task_state_base<_Res(_Args...)>>
> +   _S_create(_Fn2&& __fn, const _Alloc& __a)
> +   {
> + return std::allocate_shared<_Task_state>(__a,
> +
> std::forward<_Fn2>(__fn),
> +  __a);
> +   }
> +
>  private:
>virtual void
>_M_run(_Args&&... __args)
> @@ -1515,7 +1531,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>}
>
>virtual shared_ptr<_Task_state_base<_Res(_Args...)>>
> -  _M_reset();
> +  _M_reset()
> +  { return _S_create(std::move(_M_impl._M_fn), _M_impl); }
>
>struct _Impl : _Alloc
>{
> @@ -1525,38 +1542,27 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> _Fn _M_fn;
>} _M_impl;
>  };
> -
> -  template -  typename _Alloc = std::allocator>
> -shared_ptr<__future_base::_Task_state_base<_Signature>>
> -__create_task_state(_Fn&& __fn, const _Alloc& __a = _Alloc())
> -{
> -  typedef typename decay<_Fn>::type _Fn2;
> -  typedef __future_base::_Task_state<_Fn2, _Alloc, _Signature> _State;
> -  return std::allocate_shared<_State>(__a, std::forward<_Fn>(__fn),
> __a);
> -}
> -
> -  template _Args>
> -shared_ptr<__future_base::_Task_state_base<_Res(_Args...)>>
> -__future_base::_Task_state<_Fn, _Alloc, _Res(_Args...)>::_M_reset()
> -{
> -  return __create_task_state<_Res(_Args...)>(std::move(_M_impl._M_fn),
> -
> static_cast<_Alloc&>(_M_impl));
> -}
>/// @endcond
>
>/// packaged_task
>template
>  class packaged_task<_Res(_ArgTypes...)>
>  {
> -  typedef __future_base::_Task_state_base<_Res(_ArgTypes...)>
> _State_type;
> +  using _State_type =
> __future_base::_Task_state_base<_Res(_ArgTypes...)>;
>shared_ptr<_State_type>   _M_state;
>
>// _GLIBCXX_RESOLVE_LIB_DEFECTS
>// 3039. Unnecessary decay in thread and packaged_task
>template>
> -   using __not_same
> - = typename enable_if::value>::type;
> +   using __not_same = __enable_if_t _Fn2>::value>;
> +
> +  // _GLIBCXX_RESOLVE_LIB_DEFECTS
> +  // 4154. The Mandates for std::packaged_task's constructor
> +  // from a callable entity should consider decaying.
> +  templ

[committed] Adjust gcc_release for id href web transformations

2025-04-25 Thread Jakub Jelinek
Hi!

We now have some script which transforms e.g.
GCC 15.1
line in gcc-15/changes.html to
GCC 15.1

This unfortunately breaks the gcc_release script, which looks for
GCC 15.1 appearing in gennews after optional blanks from the start of
the line in the NEWS file. That is no longer the case; there is now
[129]GCC 15.1
or something like that, with a URL later on:
 129. https://gcc.gnu.org/gcc-15/changes.html#15.1

The following patch handles this.
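
For readers who want to try the relaxed pattern outside the script, here is a rough C++ equivalent using std::regex. It is only a sketch: mentions_release and check_mentions_release are hypothetical names, the default ECMAScript syntax is used instead of grep's basic regular expressions, blanks are written as [ \t], and the dot between major and minor is escaped here although the script leaves it unescaped.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if line starts with optional blanks, an optional "[N]" link marker,
// and then "GCC major.minor".
inline bool mentions_release(const std::string& line,
			     const std::string& major,
			     const std::string& minor)
{
  const std::regex re("^[ \\t]*(\\[[0-9]+\\][ \\t]*)?GCC "
		      + major + "\\." + minor);
  return std::regex_search(line, re);
}

inline void check_mentions_release()
{
  assert(mentions_release("GCC 15.1", "15", "1"));       // old NEWS layout
  assert(mentions_release("[129]GCC 15.1", "15", "1"));  // new NEWS layout
  assert(!mentions_release("See GCC 15.1", "15", "1"));  // not at line start
}
```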

Tested during gcc 15.1 release process, committed to trunk.

2025-04-25  Jakub Jelinek  

* gcc_release: Allow optional \[[0-9]+\] before GCC major.minor
in the NEWS file.

--- maintainer-scripts/gcc_release.jj
+++ maintainer-scripts/gcc_release
@@ -141,7 +141,7 @@ build_sources() {
 "in gcc-${RELEASE_MAJOR}/index.html"
 
 sed -n -e "/^${thischanges}/,/^${previndex}/p" NEWS |\
-  grep -q "^[[:blank:]]*GCC ${RELEASE_MAJOR}.${RELEASE_MINOR}" ||\
+  grep -q "^[[:blank:]]*\(\[[0-9]\{1,\}\][[:blank:]]*\)\{0,1\}GCC 
${RELEASE_MAJOR}.${RELEASE_MINOR}" ||\
   error "GCC ${RELEASE_MAJOR}.${RELEASE_MINOR} not mentioned "\
 "in gcc-${RELEASE_MAJOR}/changes.html"
 


Jakub



Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-25 Thread Hongtao Liu
>
> I am not so sure about this when it comes to relatively common
> instructions.  Hiding things in unspec prevents combine and other RTL
> passes from doing their job. I would say that it only makes sense for
> situations where the RTL equivalent is very inconvenient.
>
In the direction of using general rtl instead of unspec, I'm not
opposed to it, I'm just not sure how much good it will do.



-- 
BR,
Hongtao


Re: [PATCH 1/2] libstdc++: Add _M_key_compare helper to associative containers

2025-04-25 Thread Jonathan Wakely
On Fri, 25 Apr 2025 at 09:13, Tomasz Kaminski  wrote:
>
>
>
> On Fri, Apr 25, 2025 at 12:19 AM Jonathan Wakely  wrote:
>>
>> In r10-452-ge625ccc21a91f3 I noted that we don't have an accessor for
>> invoking _M_impl._M_key_compare in the associative containers. That
>> meant that the static assertions to check for valid comparison functions
>> were squirrelled away in _Rb_tree::_S_key instead. As Jason noted in
>> https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681436.html this
>> means that the static assertions fail later than we'd like.
>>
>> This change adds a new _Rb_tree::_M_key_compare member function which
>> invokes the _M_impl._M_key_compare function object, and then moves the
>> static_assert from _S_key into _M_key_compare.
>>
>> Because the new function is const-qualified, we now treat LWG 2542 as a
>> DR for older standards, requiring the comparison function to be const
>> invocable. Previously we only enforced the LWG 2542 rule for C++17 and
>> later.
>>
>> I did consider deprecating support for comparisons which aren't const
>> invocable, something like this:
>>
>>   // Before LWG 2542 it wasn't strictly necessary for _Compare to be
>>   // const invocable, if you only used non-const container members.
>>   // Define a non-const overload for pre-C++17, deprecated for C++11/14.
>>   #if __cplusplus < 201103L
>>   bool
>>   _M_key_compare(const _Key& __k1, const _Key& __k2)
>>   { return _M_impl._M_key_compare(__k1, __k2); }
>>   #elif __cplusplus < 201703L
>>   template
>> [[__deprecated__("support for comparison functions that are not "
>>  "const invocable is deprecated")]]
>> __enable_if_t<
>> __and_<__is_invocable<_Compare&, const _Key1&, const _Key2&>,
>>__not_<__is_invocable> _Key2&>>>::value,
>>bool>
>> _M_key_compare(const _Key1& __k1, const _Key2& __k2)
>> {
>>   static_assert(
>> __is_invocable<_Compare&, const _Key&, const _Key&>::value,
>> "comparison object must be invocable with two arguments of key type"
>>   );
>>   return _M_impl._M_key_compare(__k1, __k2);
>> }
>>   #endif // < C++17
>>
>> But I decided that this isn't necessary, because we've been enforcing
>> the C++17 rule since GCC 8.4 and 9.2, and C++17 has been the default
>> since GCC 11.1. Users have had plenty of time to fix their invalid
>> comparison functions.
>
> LGTM with one very small change to comment.
>>
>>
>> libstdc++-v3/ChangeLog:
>>
>> * include/bits/stl_tree.h (_Rb_tree::_M_key_compare): New member
>> function to invoke comparison function.
>> (_Rb_tree): Use new member function instead of accessing the
>> comparison function directly.
>> ---
>>
>> Tested x86_64-linux.
>>
>>  libstdc++-v3/include/bits/stl_tree.h | 108 +--
>>  1 file changed, 50 insertions(+), 58 deletions(-)
>>
>> diff --git a/libstdc++-v3/include/bits/stl_tree.h 
>> b/libstdc++-v3/include/bits/stl_tree.h
>> index 6b35f99a25a..c7352093933 100644
>> --- a/libstdc++-v3/include/bits/stl_tree.h
>> +++ b/libstdc++-v3/include/bits/stl_tree.h
>> @@ -1390,27 +1390,25 @@ namespace __rb_tree
>>_M_end() const _GLIBCXX_NOEXCEPT
>>{ return this->_M_impl._M_header._M_base_ptr(); }
>>
>> +  // _GLIBCXX_RESOLVE_LIB_DEFECTS
>> +  // 2542. Missing const requirements for associative containers
>> +  template
>> +   bool
>> +   _M_key_compare(const _Key1& __k1, const _Key2& __k2) const
>> +   {
>> +#if __cplusplus >= 201103L
>> + // Enforce this here with a user-friendly message.
>> + static_assert(
>> +   __is_invocable::value,
>> +   "comparison object must be invocable with two arguments of key 
>> type"
>> + );
>> +# endif // C++17
>
> This does not match the condition in if, should be C++11

Oh yes, thanks, I'll fix it.

>>
>> + return _M_impl._M_key_compare(__k1, __k2);
>> +   }
>> +
>>static const _Key&
>>_S_key(const _Node& __node)
>> -  {
>> -#if __cplusplus >= 201103L
>> -   // If we're asking for the key we're presumably using the comparison
>> -   // object, and so this is a good place to sanity check it.
>> -   static_assert(__is_invocable<_Compare&, const _Key&, const _Key&>{},
>> - "comparison object must be invocable "
>> - "with two arguments of key type");
>> -# if __cplusplus >= 201703L
>> -   // _GLIBCXX_RESOLVE_LIB_DEFECTS
>> -   // 2542. Missing const requirements for associative containers
>> -   if constexpr (__is_invocable<_Compare&, const _Key&, const _Key&>{})
>> - static_assert(
>> - is_invocable_v,
>> - "comparison object must be invocable as const");
>> -# endif // C++17
>> -#endif // C++11
>
> I checked that _S_key seems to be mostly passed to _M_key_compare.
> There is one function, _M_key of _Auto_node, that calls it, but it gets passed
> to
> M_get_insert_(hint_)?

Re: [PATCH] libstdc++: Constrain formatter for thread:id [PR119918]

2025-04-25 Thread Jonathan Wakely

s/thread:id/thread::id/ in the subject line

On 24/04/25 12:50 +0200, Tomasz Kamiński wrote:

This patch add constrains __formatter::__char to _CharT type parameter


s/constrains/constraint/


of formatter specialization, matching the constrains


same change here


of formatting of integer/pointers that are used as native handles.

The dependency on  header, is changed to .
To achieve that, formatting of pointers is extraced from void const*


s/extraced/extracted/


specialization to internal __formatter_ptr<_CharT>, that can be forward
declared.

Finally, the handle representation is now printed directly to __fc.out(),
by the formatter for handle type. To support this, internal formatters
can now be constructed from _Spec object as alternative to invoking parse
method.

PR libstdc++/119918

libstdc++-v3/ChangeLog:

* include/bits/formatfwd.h (__format::_Align): Moved from std/format.
(std::__throw_format_error, __format::__formatter_str)
(__format::__formatter_ptr): Declare.
* include/std/format (__format::_Align): Moved to bits/formatfwd.h.
(__formatter_int::__formatter_int): Define.
(__format::__formatter_ptr): Extracted from formatter for const void*.
(std::formatter, formatter)
(std::formatter): Delegate to 
__formatter_ptr<_CharT>.
* include/std/thread (std::formatter): Constrain
_CharT template parameter.
(formatter::parse): Specify default alignment, and
qualify __throw_format_error to disable ADL.
(formatter::format): Use formatters to write 
directly
to output.
* testsuite/30_threads/thread/id/output.cc: Tests for formatting 
thread::id
representing not-a-thread with padding and formattable concept.
---
Tested on x86_64-linux. OK for trunk, after GCC 15.1?


OK for trunk with the minor review comments addressed, thanks.


libstdc++-v3/include/bits/formatfwd.h |  19 +-
libstdc++-v3/include/std/format   | 257 ++
libstdc++-v3/include/std/thread   |  46 ++--
.../testsuite/30_threads/thread/id/output.cc  |  30 ++
4 files changed, 211 insertions(+), 141 deletions(-)

diff --git a/libstdc++-v3/include/bits/formatfwd.h 
b/libstdc++-v3/include/bits/formatfwd.h
index a6b5ac8c8ce..8a84af4d6e4 100644
--- a/libstdc++-v3/include/bits/formatfwd.h
+++ b/libstdc++-v3/include/bits/formatfwd.h
@@ -50,6 +50,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  // [format.formatter], formatter
  template struct formatter;

+  /// @cond undocumented
+  [[noreturn]]
+  inline void
+  __throw_format_error(const char* __what);
+
namespace __format
{
#ifdef _GLIBCXX_USE_WCHAR_T
@@ -60,8 +65,18 @@ namespace __format
concept __char = same_as<_CharT, char>;
#endif

-  template<__char _CharT>
-struct __formatter_int;
+  enum _Align {
+_Align_default,
+_Align_left,
+_Align_right,
+_Align_centre,
+  };
+
+  template struct _Spec;
+
+  template<__char _CharT> struct __formatter_str;
+  template<__char _CharT> struct __formatter_int;
+  template<__char _CharT> struct __formatter_ptr;
}

_GLIBCXX_END_NAMESPACE_VERSION
diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index e557e104d74..8f47e5acb80 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -485,13 +485,6 @@ namespace __format
_Pres_esc = 0xf,  // For strings, charT and ranges
  };

-  enum _Align {
-_Align_default,
-_Align_left,
-_Align_right,
-_Align_centre,
-  };
-
  enum _Sign {
_Sign_default,
_Sign_plus,
@@ -1434,6 +1427,13 @@ namespace __format
  static constexpr _Pres_type _AsBool = _Pres_s;
  static constexpr _Pres_type _AsChar = _Pres_c;

+  __formatter_int() = default;
+
+  constexpr
+  __formatter_int(_Spec<_CharT> __spec)


This could be noexcept, but it probably doesn't make any difference.


+  : _M_spec(__spec)
+  { }
+
  constexpr typename basic_format_parse_context<_CharT>::iterator
  _M_do_parse(basic_format_parse_context<_CharT>& __pc, _Pres_type __type)
  {
@@ -2381,6 +2381,134 @@ namespace __format
  _Spec<_CharT> _M_spec{};
};

+  template<__format::__char _CharT>
+struct __formatter_ptr
+{
+  __formatter_ptr() = default;
+
+  constexpr
+  __formatter_ptr(_Spec<_CharT> __spec)
+  : _M_spec(__spec)
+  { }
+
+  constexpr typename basic_format_parse_context<_CharT>::iterator
+  parse(basic_format_parse_context<_CharT>& __pc)
+  {
+   __format::_Spec<_CharT> __spec{};
+   const auto __last = __pc.end();
+   auto __first = __pc.begin();
+
+   auto __finalize = [this, &__spec] {
+ _M_spec = __spec;
+   };
+
+   auto __finished = [&] {
+ if (__first == __last || *__first == '}')
+   {
+ __finalize();
+ return true;
+   }
+ return false;
+   };
+
+   if (__finished())
+ return __

Re: [PATCH] Fortran: fix procedure pointer handling with -fcheck=pointer [PR102900]

2025-04-25 Thread Harald Anlauf

Hi Jerry,

Am 24.04.25 um 22:56 schrieb Jerry D:

On 4/24/25 12:59 PM, Harald Anlauf wrote:

Dear all,

the attached patch is the result of my attempts to fix an ICE when
compiling gfortran.dg/proc_ptr_52.f90 with -fcheck=all.  While
trying to reduce this, I found several oddities with functions
returning class(*), pointer that ICE'd too.

The original ICE in the PR turned out to be a bug in the pointer
checking code when passing a procedure pointer to a CLASS procedure
dummy that tried to access the container of the procedure pointer.
I believe that this should not be done, and one should only check
that the procedure pointer is not null.
I am not too experienced with class-valued functions, so if any
of the experts (Paul, Andre', ...) could have a look?

(After fixing the issue with -fcheck=pointer, I ran into a bogus
error with -Wexternal-argument-mismatch for the same testcase.
This is now pr119928.)

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Cheers,
Harald


OK Harald,


thanks for the review!

Pushed as r16-150-gcc8d86ee4680d5.

Harald


Jerry






[PATCH] libstdc++: Replace AC_LANG_CPLUSPLUS with AC_LANG_PUSH

2025-04-25 Thread Jonathan Wakely
Autoconf documents AC_LANG_SAVE, AC_LANG_CPLUSPLUS etc. as deprecated.

libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_LANG_PUSH, GLIBCXX_LANG_POP): New
macros to replace uses of AC_LANG_SAVE, AC_LANG_CPLUSPLUS, and
AC_LANG_RESTORE.
* configure: Regenerate.
---

Tomasz pointed out that we're using deprecated macros. This would
replace them with the modern Autoconf macros, but I'm not sure it's
really an improvement, or necessary.

Should we bother?

Tested x86_64-linux.

 libstdc++-v3/acinclude.m4 | 290 +--
 libstdc++-v3/configure| 307 ++
 2 files changed, 349 insertions(+), 248 deletions(-)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index a0094c2dd95..86f974feb6f 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -118,6 +118,27 @@ AC_DEFUN([GLIBCXX_CONFIGURE], [
   GLIBCXX_CHECK_HOST
 ])
 
+dnl
+dnl Sets the current language for compilation tests to C++ and saves the
+dnl current values of CXXFLAGS and LIBS so they can be restored afterwards.
+dnl This should be paired with GLIBCXX_LANG_POP.
+dnl
+AC_DEFUN([GLIBCXX_LANG_PUSH],[
+  AC_LANG_PUSH(C++)
+  ac_save_CXXFLAGS="$CXXFLAGS"
+  ac_save_LIBS="$LIBS"
+])
+
+dnl
+dnl Restores the current language for compilation tests to the previous value
+dnl and restores the previous values of CXXFLAGS and LIBS.
+dnl This should be paired with GLIBCXX_LANG_PUSH.
+dnl
+AC_DEFUN([GLIBCXX_LANG_POP],[
+  CXXFLAGS="$ac_save_CXXFLAGS"
+  LIBS="$ac_save_LIBS"
+  AC_LANG_POP(C++)
+])
 
 dnl
 dnl Tests for newer compiler features, or features that are present in newer
@@ -135,27 +156,18 @@ AC_DEFUN([GLIBCXX_CHECK_COMPILER_FEATURES], [
   # All these tests are for C++; save the language and the compiler flags.
   # The CXXFLAGS thing is suspicious, but based on similar bits previously
   # found in GLIBCXX_CONFIGURE.
-  AC_LANG_SAVE
-  AC_LANG_CPLUSPLUS
-  ac_test_CXXFLAGS="${CXXFLAGS+set}"
-  ac_save_CXXFLAGS="$CXXFLAGS"
+  GLIBCXX_LANG_PUSH
 
   # Check for -ffunction-sections -fdata-sections
   AC_MSG_CHECKING([for g++ that supports -ffunction-sections -fdata-sections])
   CXXFLAGS='-g -Werror -ffunction-sections -fdata-sections'
   AC_TRY_COMPILE([int foo; void bar() { };],, [ac_fdsections=yes], 
[ac_fdsections=no])
-  if test "$ac_test_CXXFLAGS" = set; then
-CXXFLAGS="$ac_save_CXXFLAGS"
-  else
-# this is the suspicious part
-CXXFLAGS=''
-  fi
   if test x"$ac_fdsections" = x"yes"; then
 SECTION_FLAGS='-ffunction-sections -fdata-sections'
   fi
   AC_MSG_RESULT($ac_fdsections)
 
-  AC_LANG_RESTORE
+  GLIBCXX_LANG_POP
   AC_SUBST(SECTION_FLAGS)
 ])
 
@@ -329,8 +341,7 @@ AC_DEFUN([GLIBCXX_CHECK_SETRLIMIT_ancilliary], [
 ])
 
 AC_DEFUN([GLIBCXX_CHECK_SETRLIMIT], [
-  AC_LANG_SAVE
-  AC_LANG_CPLUSPLUS
+  GLIBCXX_LANG_PUSH
   setrlimit_have_headers=yes
   AC_CHECK_HEADERS(unistd.h sys/time.h sys/resource.h,
   [],
@@ -364,7 +375,7 @@ AC_DEFUN([GLIBCXX_CHECK_SETRLIMIT], [
"make check"])
 fi
   fi
-  AC_LANG_RESTORE
+  GLIBCXX_LANG_POP
 ])
 
 
@@ -374,9 +385,7 @@ dnl Define HAVE_S_ISREG / HAVE_S_IFREG appropriately.
 dnl
 AC_DEFUN([GLIBCXX_CHECK_S_ISREG_OR_S_IFREG], [
 
-  AC_LANG_SAVE
-  AC_LANG_CPLUSPLUS
-  ac_save_CXXFLAGS="$CXXFLAGS"
+  GLIBCXX_LANG_PUSH
   CXXFLAGS="$CXXFLAGS -fno-exceptions"
 
   AC_MSG_CHECKING([for S_ISREG or S_IFREG])
@@ -410,8 +419,7 @@ AC_DEFUN([GLIBCXX_CHECK_S_ISREG_OR_S_IFREG], [
   fi
   AC_MSG_RESULT($res)
 
-  CXXFLAGS="$ac_save_CXXFLAGS"
-  AC_LANG_RESTORE
+  GLIBCXX_LANG_POP
 ])
 
 
@@ -420,9 +428,7 @@ dnl Check whether poll is available in , and define 
HAVE_POLL.
 dnl
 AC_DEFUN([GLIBCXX_CHECK_POLL], [
 
-  AC_LANG_SAVE
-  AC_LANG_CPLUSPLUS
-  ac_save_CXXFLAGS="$CXXFLAGS"
+  GLIBCXX_LANG_PUSH
   CXXFLAGS="$CXXFLAGS -fno-exceptions"
 
   AC_CACHE_CHECK([for poll], glibcxx_cv_POLL, [
@@ -438,8 +444,7 @@ AC_DEFUN([GLIBCXX_CHECK_POLL], [
 AC_DEFINE(HAVE_POLL, 1, [Define if poll is available in .])
   fi
 
-  CXXFLAGS="$ac_save_CXXFLAGS"
-  AC_LANG_RESTORE
+  GLIBCXX_LANG_POP
 ])
 
 
@@ -448,9 +453,7 @@ dnl Check whether writev is available in , and 
define HAVE_WRITEV.
 dnl
 AC_DEFUN([GLIBCXX_CHECK_WRITEV], [
 
-  AC_LANG_SAVE
-  AC_LANG_CPLUSPLUS
-  ac_save_CXXFLAGS="$CXXFLAGS"
+  GLIBCXX_LANG_PUSH
   CXXFLAGS="$CXXFLAGS -fno-exceptions"
 
   AC_CACHE_CHECK([for writev], glibcxx_cv_WRITEV, [
@@ -465,8 +468,7 @@ AC_DEFUN([GLIBCXX_CHECK_WRITEV], [
 AC_DEFINE(HAVE_WRITEV, 1, [Define if writev is available in .])
   fi
 
-  CXXFLAGS="$ac_save_CXXFLAGS"
-  AC_LANG_RESTORE
+  GLIBCXX_LANG_POP
 ])
 
 
@@ -474,9 +476,7 @@ dnl
 dnl Check whether LFS support is available.
 dnl
 AC_DEFUN([GLIBCXX_CHECK_LFS], [
-  AC_LANG_SAVE
-  AC_LANG_CPLUSPLUS
-  ac_save_CXXFLAGS="$CXXFLAGS"
+  GLIBCXX_LANG_PUSH
   CXXFLAGS="$CXXFLAGS -fno-exceptions"
   AC_CACHE_CHECK([for LFS support], glibcxx_cv_LFS, [
 GCC_TRY_COMPILE_OR_LINK(
@@ -513,8 +513,7 @@ AC_DEF

Re: [GCC16 stage 1][PATCH v2 0/3] extend "counted_by" attribute to pointer fields of structures

2025-04-25 Thread Qing Zhao


> On Apr 24, 2025, at 13:07, Kees Cook  wrote:
> 
> On Thu, Apr 24, 2025 at 04:36:14PM +, Qing Zhao wrote:
>> 
>>> On Apr 24, 2025, at 11:59, Martin Uecker  wrote:
>>> 
>>> Am Donnerstag, dem 24.04.2025 um 15:15 + schrieb Qing Zhao:
 Hi, 
 
 Kees reported a segmentation fault when he used the patch to compile the
 kernel, and the reduced test case is something like the following:
 
 struct f {
 void *g __attribute__((__counted_by__(h)));
 long h;
 };
 
 extern struct f *my_alloc (int);
 
 int i(void) {
 struct f *iov = my_alloc (10);
 int *j = (int *)iov->g;
 return __builtin_dynamic_object_size(iov->g, 0);
 }
 
 Basically, the problem is related to the pointee type of the pointer
 array being “void”. As a result, the element size of the array is not
 available in the IR, which leads to a segmentation fault when calculating
 the size of the whole object.
 
 Although it’s easy to fix this segmentation fault, I am not quite sure
 what the best solution to this issue is:
 
 1. Reject such usage of “counted_by” at the very beginning by reporting a
 warning to the user, and delete the counted_by attribute from the field.
 
 Or:
 
 2. Accept such usage, but issue warnings when calculating the object_size
 in the middle end.
 
 Personally, I prefer option 1: when the pointee type is void, we don’t know
 the type of the elements of the pointer array, so there is no way to decide
 the size of the pointer array.
 
 So, the counted_by information is not useful for
 __builtin_dynamic_object_size.
 
 But I am not sure whether counted_by can still be used by the bounds
 sanitizer?
 
 Thanks for suggestions and help.
>>> 
>>> GNU C allows pointer arithmetic and sizeof on void pointers and
>>> that treats void as having size 1.  So you could also allow counted_by
>>> and assume as size 1 for void.
>>> 
>>> https://gcc.gnu.org/onlinedocs/gcc/Pointer-Arith.html
>> 
>> Okay, thanks for the info.
>> So, 
>> 1. should we issue warnings when doing this?
> 
> Please don't, Linux would very much like to track these allocation sizes
> still. Performing pointer arithmetic and bounds checking (via __bdos) on
> "void *" is wanted (and such a calculation was what tripped the
> segfault).
> 
>> 2. If the compilation option is explicitly asking for standard C,
>>shall we issue warning and delete the counted_by attribute from the field?
> 
> I think it needs to stay attached for __bdos. And from the looks of it,
> even array access works with 1-byte values too:
> 
> extern void *ptr;
> void *foo(int num) {
>return &ptr[num];
> }
> 
> The assembly output of this shows it's doing byte addition. Clang
> doesn't warn about this, but GCC does:
> 
> :5:16: warning: dereferencing 'void *' pointer
>5 | return &ptr[num];
>  |^
> 
> So, I think even the bounds sanitizer should handle it, even if a
> warning ultimately gets emitted.

I tried to come up with a test case for the array bounds sanitizer on void
pointers, as follows:

#include <stdlib.h>

struct annotated {
  int b;
  void *c __attribute__ ((counted_by (b)));
} *array_annotated;

void __attribute__((__noinline__)) setup (int annotated_count)
{
  array_annotated
= (struct annotated *)malloc (sizeof (struct annotated));
  array_annotated->c = malloc (sizeof (char) * annotated_count);
  array_annotated->b = annotated_count;

  return;
}

void __attribute__((__noinline__)) test (int annotated_index)
{
  array_annotated->c[annotated_index] = 2;
}

int main(int argc, char *argv[])
{
  setup (10);
  test (10);
  return 0;
}

When I compile it, I always get the following error:

: In function ‘test’:
:25:21: warning: dereferencing ‘void *’ pointer
:25:39: error: invalid use of void expression

It looks like the void pointer cannot be accessed as an array?

thanks.

Qing
> 
> -Kees
> 
> -- 
> Kees Cook



[PING][PATCH v3] Add new warning Wmissing-designated-initializers [PR39589]

2025-04-25 Thread Peter Frost

Ping https://gcc.gnu.org/pipermail/gcc-patches/2025-January/672568.html

Missed the version 15 freeze with the last ping, I believe the project 
is open for general development again now?


Re: [GCC16 stage 1][PATCH v2 0/3] extend "counted_by" attribute to pointer fields of structures

2025-04-25 Thread Bill Wendling
[sorry for hijacking this thread]

Does anyone have any response to Yeoul Na's suggestion to hold off on
implementing __counted_by_expr() until the next C standards committee?

  
https://discourse.llvm.org/t/rfc-bounds-safety-in-c-syntax-compatibility-with-gcc/85885/17?u=void

I'm fine with it, but I can't make a decision by fiat.

-bw


[committed] libstdc++: Remove c++26 dg-error lines for -Wdelete-incomplete errors

2025-04-25 Thread Jonathan Wakely
This fixes:
FAIL: tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc  -std=gnu++26  (test 
for errors, line 283)
FAIL: tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc  -std=gnu++26  (test 
for errors, line 305)

This is another consequence of r16-133-g8acea9ffa82ed8 which prevents
the -Wdelete-incomplete errors that happen after the first error.

libstdc++-v3/ChangeLog:

* testsuite/tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc:
Remove dg-error directives for additional c++26 errors.
---

Tested x86_64-linux. Pushed to trunk.

 .../tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc   | 3 ---
 1 file changed, 3 deletions(-)

diff --git 
a/libstdc++-v3/testsuite/tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc 
b/libstdc++-v3/testsuite/tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc
index 7b5ede4f00a..03fad1486c7 100644
--- 
a/libstdc++-v3/testsuite/tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc
+++ 
b/libstdc++-v3/testsuite/tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc
@@ -39,9 +39,6 @@ void test01()
   // { dg-error "incomplete" "" { target *-*-* } 600 }
 }
 
-// { dg-error "-Wdelete-incomplete" "" { target c++26 } 283 }
-// { dg-error "-Wdelete-incomplete" "" { target c++26 } 305 }
-
 // Ignore additional diagnostic given with -Wsystem-headers:
 // { dg-prune-output "has incomplete type" }
 // { dg-prune-output "possible problem detected" }
-- 
2.49.0



[committed] libstdc++: Micro-optimization for std::addressof

2025-04-25 Thread Jonathan Wakely
Currently std::addressof calls std::__addressof which uses
__builtin_addressof. This leads to me preferring std::__addressof in some
code, to avoid the extra hop. But it's not as though the implementation
of std::__addressof is complicated and reusing it avoids any code
duplication.

So let's just make std::addressof use the built-in directly, and then we
only need to use std::__addressof in C++98 code. (Transitioning existing
uses of std::__addressof to std::addressof isn't included in this
change.)

The front end does fold std::addressof with -ffold-simple-inlines but
this change still seems worthwhile.
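
As background for readers, a minimal sketch of what std::addressof guarantees, which this change leaves untouched: it returns the real address of an object even when the type overloads unary operator&. The type Evil and the function addressof_bypasses_overload are hypothetical names used only for this illustration. The sketch calls only the public std::addressof and does not depend on which internal function or built-in it forwards to.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type whose unary operator& is overloaded to lie.
struct Evil
{
  int value;
  Evil* operator&() { return nullptr; }
};

// Returns true if std::addressof yields the real address even though
// the overloaded operator& returns a null pointer.
inline bool addressof_bypasses_overload()
{
  Evil e{42};
  Evil* real = std::addressof(e);  // bypasses Evil::operator&
  Evil* fake = &e;                 // calls Evil::operator&
  return real != nullptr && fake == nullptr && real->value == 42;
}
```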

libstdc++-v3/ChangeLog:

* include/bits/move.h (addressof): Use __builtin_addressof
directly.
---

Tested x86_64-linux. Pushed to trunk.

 libstdc++-v3/include/bits/move.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/move.h b/libstdc++-v3/include/bits/move.h
index e91b003e695..085ca074fc8 100644
--- a/libstdc++-v3/include/bits/move.h
+++ b/libstdc++-v3/include/bits/move.h
@@ -174,7 +174,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 [[__nodiscard__,__gnu__::__always_inline__]]
 inline _GLIBCXX17_CONSTEXPR _Tp*
 addressof(_Tp& __r) noexcept
-{ return std::__addressof(__r); }
+{ return __builtin_addressof(__r); }
 
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 2598. addressof works on temporaries
-- 
2.49.0



[committed] libstdc++: Use markdown in some Doxygen comments

2025-04-25 Thread Jonathan Wakely
libstdc++-v3/ChangeLog:

* include/bits/ptr_traits.h (to_address): Use markdown for
formatting in Doxygen comments.
---

Pushed to trunk.

 libstdc++-v3/include/bits/ptr_traits.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/ptr_traits.h 
b/libstdc++-v3/include/bits/ptr_traits.h
index d3c17652426..4308669e03b 100644
--- a/libstdc++-v3/include/bits/ptr_traits.h
+++ b/libstdc++-v3/include/bits/ptr_traits.h
@@ -223,7 +223,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /**
* @brief Obtain address referenced by a pointer to an object
* @param __ptr A pointer to an object
-   * @return @c __ptr
+   * @return `__ptr`
* @ingroup pointer_abstractions
   */
   template
@@ -239,8 +239,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /**
* @brief Obtain address referenced by a pointer to an object
* @param __ptr A pointer to an object
-   * @return @c pointer_traits<_Ptr>::to_address(__ptr) if that expression is
- well-formed, otherwise @c to_address(__ptr.operator->())
+   * @return `pointer_traits<_Ptr>::to_address(__ptr)` if that expression is
+   * well-formed, otherwise `to_address(__ptr.operator->())`.
* @ingroup pointer_abstractions
   */
   template
-- 
2.49.0



[PATCH] libstdc++: Use 'if constexpr' to slightly simplify <regex>

2025-04-25 Thread Jonathan Wakely
This will hardly make a dent in the very slow compile times for <regex>
but it seems worth doing anyway.
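
For readers unfamiliar with the two idioms, here is a simplified sketch of the transformation this patch applies, loosely modelled on the "match any character except a line terminator" check in the diff below. It is not the library code: the real matcher passes characters through a translator object, which is omitted here, and the names any_tag, matches_any_old and matches_any_new are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Before: tag dispatching picks an overload via true_type/false_type.
template<typename CharT>
  constexpr bool any_tag(CharT ch, std::true_type)   // CharT is char
  { return ch != '\n' && ch != '\r'; }

template<typename CharT>
  constexpr bool any_tag(CharT ch, std::false_type)  // wider character types
  {
    return ch != CharT('\n') && ch != CharT('\r')
      && ch != CharT(0x2028) && ch != CharT(0x2029);
  }

template<typename CharT>
  constexpr bool matches_any_old(CharT ch)
  { return any_tag(ch, typename std::is_same<CharT, char>::type()); }

// After: a single function; the branch that does not apply to CharT
// is discarded at compile time.
template<typename CharT>
  constexpr bool matches_any_new(CharT ch)
  {
    if constexpr (std::is_same<CharT, char>::value)
      return ch != '\n' && ch != '\r';
    else
      return ch != CharT('\n') && ch != CharT('\r')
        && ch != CharT(0x2028) && ch != CharT(0x2029);
  }
```

Both versions give the same answers; the second has one function template to parse and instantiate instead of three.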

libstdc++-v3/ChangeLog:

* include/bits/regex_compiler.h (_AnyMatcher::operator()):
Replace tag dispatching with 'if constexpr'.
(_AnyMatcher::_M_apply): Remove both overloads.
(_BracketMatcher::operator(), _BracketMatcher::_M_ready):
Replace tag dispatching with 'if constexpr'.
(_BracketMatcher::_M_apply(_CharT, true_type)): Remove.
(_BracketMatcher::_M_apply(_CharT, false_type)): Remove second
parameter.
(_BracketMatcher::_M_make_cache): Remove both overloads.
* include/bits/regex_compiler.tcc (_BracketMatcher::_M_apply):
Remove second parameter.
---

Tested x86_64-linux.

 libstdc++-v3/include/bits/regex_compiler.h   | 59 +---
 libstdc++-v3/include/bits/regex_compiler.tcc |  2 +-
 2 files changed, 26 insertions(+), 35 deletions(-)

diff --git a/libstdc++-v3/include/bits/regex_compiler.h 
b/libstdc++-v3/include/bits/regex_compiler.h
index f24c7e3baa6..3931790091a 100644
--- a/libstdc++-v3/include/bits/regex_compiler.h
+++ b/libstdc++-v3/include/bits/regex_compiler.h
@@ -376,26 +376,21 @@ namespace __detail
 
   bool
   operator()(_CharT __ch) const
-  { return _M_apply(__ch, typename is_same<_CharT, char>::type()); }
-
-  bool
-  _M_apply(_CharT __ch, true_type) const
   {
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
auto __c = _M_translator._M_translate(__ch);
auto __n = _M_translator._M_translate('\n');
auto __r = _M_translator._M_translate('\r');
-   return __c != __n && __c != __r;
-  }
-
-  bool
-  _M_apply(_CharT __ch, false_type) const
-  {
-   auto __c = _M_translator._M_translate(__ch);
-   auto __n = _M_translator._M_translate('\n');
-   auto __r = _M_translator._M_translate('\r');
-   auto __u2028 = _M_translator._M_translate(u'\u2028');
-   auto __u2029 = _M_translator._M_translate(u'\u2029');
-   return __c != __n && __c != __r && __c != __u2028 && __c != __u2029;
+   if constexpr (is_same<_CharT, char>::value)
+ return __c != __n && __c != __r;
+   else
+ {
+   auto __ls = _M_translator._M_translate(u'\u2028'); // line sep
+   auto __ps = _M_translator._M_translate(u'\u2029'); // para sep
+   return __c != __n && __c != __r && __c != __ls && __c != __ps;
+ }
+#pragma GCC diagnostic pop
   }
 
   _TransT _M_translator;
@@ -440,8 +435,14 @@ namespace __detail
   bool
   operator()(_CharT __ch) const
   {
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
_GLIBCXX_DEBUG_ASSERT(_M_is_ready);
-   return _M_apply(__ch, _UseCache());
+   if constexpr (_UseCache::value)
+ if (!(__ch & 0x80)) [[__likely__]]
+   return _M_cache[static_cast<_UnsignedCharT>(__ch)];
+   return _M_apply(__ch);
+#pragma GCC diagnostic pop
   }
 
   void
@@ -509,11 +510,16 @@ namespace __detail
   void
   _M_ready()
   {
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
std::sort(_M_char_set.begin(), _M_char_set.end());
auto __end = std::unique(_M_char_set.begin(), _M_char_set.end());
_M_char_set.erase(__end, _M_char_set.end());
-   _M_make_cache(_UseCache());
+   if constexpr (_UseCache::value)
+ for (unsigned __i = 0; __i < 128; __i++) // Only cache 7-bit chars
+   _M_cache[__i] = _M_apply(static_cast<_CharT>(__i));
_GLIBCXX_DEBUG_ONLY(_M_is_ready = true);
+#pragma GCC diagnostic pop
   }
 
 private:
@@ -531,22 +537,7 @@ namespace __detail
   using _UnsignedCharT = typename std::make_unsigned<_CharT>::type;
 
   bool
-  _M_apply(_CharT __ch, false_type) const;
-
-  bool
-  _M_apply(_CharT __ch, true_type) const
-  { return _M_cache[static_cast<_UnsignedCharT>(__ch)]; }
-
-  void
-  _M_make_cache(true_type)
-  {
-   for (unsigned __i = 0; __i < _M_cache.size(); __i++)
- _M_cache[__i] = _M_apply(static_cast<_CharT>(__i), false_type());
-  }
-
-  void
-  _M_make_cache(false_type)
-  { }
+  _M_apply(_CharT __ch) const;
 
 private:
   _GLIBCXX_STD_C::vector<_CharT>_M_char_set;
diff --git a/libstdc++-v3/include/bits/regex_compiler.tcc 
b/libstdc++-v3/include/bits/regex_compiler.tcc
index cd0db2761b5..59b79fdd106 100644
--- a/libstdc++-v3/include/bits/regex_compiler.tcc
+++ b/libstdc++-v3/include/bits/regex_compiler.tcc
@@ -598,7 +598,7 @@ namespace __detail
   template
 bool
 _BracketMatcher<_TraitsT, __icase, __collate>::
-_M_apply(_CharT __ch, false_type) const
+_M_apply(_CharT __ch) const
 {
   return [this, __ch]
   {
-- 
2.49.0



Re: [EXT] Re: [PATCH] RISC-V: Add tt-ascalon-d8 integer and floating point scheduling model

2025-04-25 Thread Anton Blanchard
Hi Jeff,

On Fri, Apr 25, 2025 at 6:04 AM Jeff Law  wrote:
>
> On 4/24/25 2:37 AM, Anton Blanchard wrote:
> > Add integer and floating point scheduling models for the Tenstorrent
> > Ascalon 8 wide CPU.
> >
> > gcc/ChangeLog:
> >   * config/riscv/riscv-cores.def (RISCV_TUNE): Update.
> >   * config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
> > Add tt_ascalon_d8.
> >   * config/riscv/riscv.md: Update tune attribute and include
> > tt-ascalon-d8.md.
> >   * config/riscv/tenstorrent-ascalon.md: New file.
> This looks pretty sensible.  The only worry would be insns types that
> don't have a mapping to anything in the DFA -- those will cause an ICE
> in the scheduler as we require all insns to have a type and map to a
> reservation in the DFA.
>
> So for example someone could ask for rv64gcv code generation, but
> ascalon-d8 scheduling.  The compiler will ultimately fault in the
> scheduler because you don't have a mapping of vector insns to a
> reservation in the DFA.

Thanks for the review. I was able to build cpu2006 with -mcpu=rv64gcv
-mtune=tt-ascalon-d8. I think this is because the generic-vector-ooo
description doesn't check tune so is applied always.

Thanks,
Anton


Re: [PATCH] libstdc++: Strip reference and cv-qual in range deduction guides for maps.

2025-04-25 Thread Jonathan Wakely
On Tue, 22 Apr 2025 at 08:13, Tomasz Kaminski  wrote:
>
> The tests cover constructors introduced in [PATCH v2] libstdc++-v3: Implement 
> missing allocator-aware constructors for unordered containers,
> so I will merge that after.

This change is OK for trunk now too.

>
> On Fri, Apr 18, 2025 at 5:18 PM Jonathan Wakely  wrote:
>>
>>
>>
>> On Fri, 18 Apr 2025, 12:55 Tomasz Kamiński,  wrote:
>>>
>>> This implements part of LWG4223 that adjusts the deduction guides for map
>>> types (map, unordered_map, flat_map and non-unique equivalents) from "range",
>>> such that reference and cv qualification are stripped from the elements of
>>> the pair-like value_type.
>>>
>>> In combination with r15-8296-gd50171bc07006d, the LWG4223 is fully 
>>> implemented now.
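
A minimal sketch of the effect described above, assuming a pair-like value_type whose elements are a reference and a const type. The aliases key_old, mapped_old, key_new and mapped_new are hypothetical stand-ins for the internal helper aliases touched by the patch, and strip_cvref_t is a hand-written C++17 equivalent of std::remove_cvref_t so that the example does not require C++20.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>
#include <utility>

// Hand-written equivalent of C++20 std::remove_cvref_t (hypothetical name).
template<typename T>
  using strip_cvref_t = std::remove_cv_t<std::remove_reference_t<T>>;

// Old behaviour: only top-level const is removed from the key,
// and the mapped type is used as-is.
template<typename Value>
  using key_old = std::remove_const_t<std::tuple_element_t<0, Value>>;
template<typename Value>
  using mapped_old = std::tuple_element_t<1, Value>;

// New behaviour: references and cv-qualifiers are stripped from both.
template<typename Value>
  using key_new = strip_cvref_t<std::tuple_element_t<0, Value>>;
template<typename Value>
  using mapped_new = strip_cvref_t<std::tuple_element_t<1, Value>>;

// A pair-like value_type with a reference key and a const mapped type.
using Value = std::pair<const int&, const long>;

// remove_const_t does nothing to 'const int&': the const is not top-level.
static_assert(std::is_same_v<key_old<Value>, const int&>);
static_assert(std::is_same_v<mapped_old<Value>, const long>);
static_assert(std::is_same_v<key_new<Value>, int>);
static_assert(std::is_same_v<mapped_new<Value>, long>);
```

With the old aliases a deduction guide would try to name a map with a reference key and a const mapped type; with the new ones it deduces a map from int to long.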
>>>
>>> libstdc++-v3/ChangeLog:
>>>
>>> * include/bits/ranges_base.h (__detail::__range_key_type):
>>> Replace remove_const_t with remove_cvref_t.
>>> (__detail::__range_mapped_type): Apply remove_cvref_t.
>>> * include/bits/stl_iterator.h: (__detail::__iter_key_t):
>>> Replace remove_const_t with __remove_cvref_t.
>>> (__detail::__iter_val_t): Apply __remove_cvref_t.
>>> * testsuite/23_containers/flat_map/1.cc: New tests.
>>> * testsuite/23_containers/flat_multimap/1.cc: New tests.
>>> * testsuite/23_containers/map/cons/deduction.cc: New tests.
>>> * testsuite/23_containers/map/cons/from_range.cc: New tests.
>>> * testsuite/23_containers/multimap/cons/deduction.cc: New tests.
>>> * testsuite/23_containers/multimap/cons/from_range.cc: New tests.
>>> * testsuite/23_containers/unordered_map/cons/deduction.cc: New 
>>> tests.
>>> * testsuite/23_containers/unordered_map/cons/from_range.cc: New 
>>> tests.
>>> * testsuite/23_containers/unordered_multimap/cons/deduction.cc:
>>> New tests.
>>> * testsuite/23_containers/unordered_multimap/cons/from_range.cc:
>>> New tests.
>>> ---
>>> Despite there being some discussion about this on the reflector, I think we
>>> should go ahead with this, to avoid creating maps of references/const types.
>>> As we are at the beginning of development of 16, this seems like a good time
>>> to do it.
>>> OK for trunk?
>>
>>
>> OK for trunk.
>>
>>
>>>
>>>
>>>  libstdc++-v3/include/bits/ranges_base.h   |  4 +-
>>>  libstdc++-v3/include/bits/stl_iterator.h  | 11 +--
>>>  .../testsuite/23_containers/flat_map/1.cc | 21 +++---
>>>  .../23_containers/flat_multimap/1.cc  | 21 +++---
>>>  .../23_containers/map/cons/deduction.cc   | 46 +
>>>  .../23_containers/map/cons/from_range.cc  |  8 +--
>>>  .../23_containers/multimap/cons/deduction.cc  | 46 +
>>>  .../23_containers/multimap/cons/from_range.cc |  8 +--
>>>  .../unordered_map/cons/deduction.cc   | 69 +++
>>>  .../unordered_map/cons/from_range.cc  | 10 ++-
>>>  .../unordered_multimap/cons/deduction.cc  | 69 +++
>>>  .../unordered_multimap/cons/from_range.cc | 10 +--
>>>  12 files changed, 279 insertions(+), 44 deletions(-)
>>>
>>> diff --git a/libstdc++-v3/include/bits/ranges_base.h 
>>> b/libstdc++-v3/include/bits/ranges_base.h
>>> index 488907da446..dde16498856 100644
>>> --- a/libstdc++-v3/include/bits/ranges_base.h
>>> +++ b/libstdc++-v3/include/bits/ranges_base.h
>>> @@ -1103,11 +1103,11 @@ namespace __detail
>>>// 4223. Deduction guides for maps are mishandling tuples and references
>>>template
>>>  using __range_key_type
>>> -  = remove_const_t>>;
>>> +  = remove_cvref_t>>;
>>>
>>>template
>>>  using __range_mapped_type
>>> -  = tuple_element_t<1, ranges::range_value_t<_Range>>;
>>> +  = remove_cvref_t>>;
>>>
>>>// The allocator's value_type for map-like containers.
>>>template
>>> diff --git a/libstdc++-v3/include/bits/stl_iterator.h 
>>> b/libstdc++-v3/include/bits/stl_iterator.h
>>> index 9203a66b2ff..bed72955d0c 100644
>>> --- a/libstdc++-v3/include/bits/stl_iterator.h
>>> +++ b/libstdc++-v3/include/bits/stl_iterator.h
>>> @@ -3086,8 +3086,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>  #if __cpp_deduction_guides >= 201606
>>>// These helper traits are used for deduction guides
>>>// of associative containers.
>>> +
>>> +  // _GLIBCXX_RESOLVE_LIB_DEFECTS
>>> +  // 4223. Deduction guides for maps are mishandling tuples and references
>>>template
>>> -using __iter_key_t = remove_const_t<
>>> +using __iter_key_t = __remove_cvref_t<
>>>  #ifdef __glibcxx_tuple_like // >= C++23
>>>tuple_element_t<0, typename 
>>> iterator_traits<_InputIterator>::value_type>>;
>>>  #else
>>> @@ -3095,11 +3098,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>  #endif
>>>
>>>template
>>> -using __iter_val_t
>>> +using __iter_val_t = __remove_cvref_t<
>>>  #ifdef __glibcxx_tuple_like // >= C++23
>>> -  = tuple_element_t<1, typename 
>>> iterator_traits<_InputIterator>::va

Re: [PATCH v2] libstdc++-v3: Implement missing allocator-aware constructors for unordered containers.

2025-04-25 Thread Jonathan Wakely
On Fri, 18 Apr 2025 at 09:11, Tomasz Kamiński  wrote:
>
> This patch implements the remainder of LWG2713 (after r15-8293-g64f5c854597759)
> by adding the missing allocator-aware versions of the unordered associative
> container constructors that accept a pair of iterators or an initializer_list,
> and the corresponding deduction guides.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/unordered_map.h (unordered_map):
> Define constructors accepting:
> (_InputIterator, _InputIterator, const allocator_type&),
> (initializer_list, const allocator_type&),
> (unordered_multimap): Likewise.
> * include/debug/unordered_map (unordered_map): Likewise.
> (unordered_multimap): Likewise.
> * include/bits/unordered_set.h (unordered_set):
> Define constructors and deduction guide accepting:
> (_InputIterator, _InputIterator, const allocator_type&),
> (initializer_list, const allocator_type&),
> (unordered_multiset): Likewise.
> * include/debug/unordered_set (unordered_set): Likewise.
> (unordered_multiset): Likewise.
> * testsuite/23_containers/unordered_map/cons/66055.cc: New tests.
> * testsuite/23_containers/unordered_map/cons/deduction.cc: New tests.
> * testsuite/23_containers/unordered_multimap/cons/66055.cc: New tests.
> * testsuite/23_containers/unordered_multimap/cons/deduction.cc: New
> tests.
> * testsuite/23_containers/unordered_multiset/cons/66055.cc: New tests.
> * testsuite/23_containers/unordered_multiset/cons/deduction.cc: New
> tests.
> * testsuite/23_containers/unordered_set/cons/66055.cc: New tests.
> * testsuite/23_containers/unordered_set/cons/deduction.cc: New tests.
> ---
> I would like to merge the rest of the changes, now that we are in 16.
> Tested on x86_64-linux. OK for trunk?

OK for trunk, thanks.
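
For readers following along: per the quoted patch, each new allocator-only overload delegates to the existing full constructor with a bucket count of 0 and default-constructed hasher() and key_equal(). A minimal sketch of that equivalence, spelled out with the long-standing constructors so it does not depend on a library that already has LWG2713 (the helper names make_map_from_range and make_map_from_list are illustrative assumptions, not part of the patch):

```cpp
#include <cassert>
#include <functional>
#include <initializer_list>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

using Map = std::unordered_map<int, int>;

// Hypothetical helper: builds the map the way the new
// (first, last, alloc) overload delegates in the quoted patch, i.e.
// bucket count 0, default hasher, default key_equal, caller's allocator.
Map make_map_from_range(const std::vector<std::pair<int, int>>& src,
                        const Map::allocator_type& alloc)
{
  return Map(src.begin(), src.end(), 0, Map::hasher(), Map::key_equal(), alloc);
}

// Hypothetical helper: same idea for the (initializer_list, alloc) overload.
Map make_map_from_list(std::initializer_list<Map::value_type> il,
                       const Map::allocator_type& alloc)
{
  return Map(il, 0, Map::hasher(), Map::key_equal(), alloc);
}
```

With the patch applied, user code should be able to write Map m(v.begin(), v.end(), alloc) or Map m({{1, 10}}, alloc) directly instead of spelling out the defaulted arguments.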


>
>  libstdc++-v3/include/bits/unordered_map.h | 30 
>  libstdc++-v3/include/bits/unordered_set.h | 71 +++
>  libstdc++-v3/include/debug/unordered_map  | 22 ++
>  libstdc++-v3/include/debug/unordered_set  | 54 ++
>  .../23_containers/unordered_map/cons/66055.cc | 11 +--
>  .../unordered_map/cons/deduction.cc   | 28 
>  .../unordered_multimap/cons/66055.cc  | 10 +--
>  .../unordered_multimap/cons/deduction.cc  | 34 +
>  .../unordered_multiset/cons/66055.cc  | 10 +--
>  .../unordered_multiset/cons/deduction.cc  | 28 
>  .../23_containers/unordered_set/cons/66055.cc | 10 +--
>  .../unordered_set/cons/deduction.cc   | 28 
>  12 files changed, 320 insertions(+), 16 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/unordered_map.h b/libstdc++-v3/include/bits/unordered_map.h
> index 5bc58e849ff..fc07ffc998c 100644
> --- a/libstdc++-v3/include/bits/unordered_map.h
> +++ b/libstdc++-v3/include/bits/unordered_map.h
> @@ -251,6 +251,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>: unordered_map(__n, __hf, key_equal(), __a)
>{ }
>
> +  // _GLIBCXX_RESOLVE_LIB_DEFECTS
> +  // 2713. More missing allocator-extended constructors for unordered containers
> +  template
> +   unordered_map(_InputIterator __first, _InputIterator __last,
> + const allocator_type& __a)
> +   : unordered_map(__first, __last, 0, hasher(), key_equal(), __a)
> +   { }
> +
>template
> unordered_map(_InputIterator __first, _InputIterator __last,
>   size_type __n,
> @@ -271,6 +279,13 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>: unordered_map(__l, __n, hasher(), key_equal(), __a)
>{ }
>
> +  // _GLIBCXX_RESOLVE_LIB_DEFECTS
> +  // 2713. More missing allocator-extended constructors for unordered containers
> +  unordered_map(initializer_list __l,
> +   const allocator_type& __a)
> +  : unordered_map(__l, 0, hasher(), key_equal(), __a)
> +  { }
> +
>unordered_map(initializer_list __l,
> size_type __n, const hasher& __hf,
> const allocator_type& __a)
> @@ -1504,6 +1519,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>: unordered_multimap(__n, __hf, key_equal(), __a)
>{ }
>
> +  // _GLIBCXX_RESOLVE_LIB_DEFECTS
> +  // 2713. More missing allocator-extended constructors for unordered containers
> +  template
> +   unordered_multimap(_InputIterator __first, _InputIterator __last,
> +  const allocator_type& __a)
> +   : unordered_multimap(__first, __last, 0, hasher(), key_equal(), __a)
> +   { }
> +
>template
> unordered_multimap(_InputIterator __first, _InputIterator __last,
>size_type __n,
> @@ -1518,6 +1541,13 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
> : unordered_multimap(__first, __last, __n, __hf, key_equal(), __a)
> { }
>
> +  // _GLIB

[committed] libstdc++: Remove c++98_only dg-error

2025-04-25 Thread Jonathan Wakely
This fixes
FAIL: 22_locale/ctype/is/string/89728_neg.cc  -std=gnu++98  (test for errors, line )

Since r16-133-g8acea9ffa82ed8 we don't keep issuing more errors after
the first one, so this dg-error no longer matches anything.

libstdc++-v3/ChangeLog:

* testsuite/22_locale/ctype/is/string/89728_neg.cc: Remove
dg-error for c++98_only effective target.
---

Tested x86_64-linux, pushed to trunk.

 libstdc++-v3/testsuite/22_locale/ctype/is/string/89728_neg.cc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/22_locale/ctype/is/string/89728_neg.cc b/libstdc++-v3/testsuite/22_locale/ctype/is/string/89728_neg.cc
index a34b2aed971..24aba99376f 100644
--- a/libstdc++-v3/testsuite/22_locale/ctype/is/string/89728_neg.cc
+++ b/libstdc++-v3/testsuite/22_locale/ctype/is/string/89728_neg.cc
@@ -18,7 +18,6 @@
 // .
 
 // { dg-error "invalid use of incomplete type" "" { target *-*-* } 0 }
-// { dg-error "invalid 'static_cast'" "" { target c++98_only } 0 }
 
 #include 
 
-- 
2.49.0



[PATCH] libstdc++: Rename std::latch data member

2025-04-25 Thread Jonathan Wakely
Rename _M_a to match the name of the exposition-only data member shown
in the standard, i.e. 'counter'.

libstdc++-v3/ChangeLog:

* include/std/latch (latch::_M_a): Rename to _M_counter.
---

IMHO this makes it a little easier to compare the implementation to the
spec.

Tested x86_64-linux.

 libstdc++-v3/include/std/latch | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/include/std/latch b/libstdc++-v3/include/std/latch
index cf648545629..dc147c24bbe 100644
--- a/libstdc++-v3/include/std/latch
+++ b/libstdc++-v3/include/std/latch
@@ -62,7 +62,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 constexpr explicit
 latch(ptrdiff_t __expected) noexcept
-: _M_a(__expected)
+: _M_counter(__expected)
 { __glibcxx_assert(__expected >= 0 && __expected <= max()); }
 
 ~latch() = default;
@@ -74,23 +74,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 count_down(ptrdiff_t __update = 1)
 {
   __glibcxx_assert(__update >= 0 && __update <= max());
-  auto const __old = __atomic_impl::fetch_sub(&_M_a, __update,
+  auto const __old = __atomic_impl::fetch_sub(&_M_counter, __update,
  memory_order::release);
   if (std::cmp_equal(__old, __update))
-   __atomic_impl::notify_all(&_M_a);
+   __atomic_impl::notify_all(&_M_counter);
   else
__glibcxx_assert(std::cmp_less(__update, __old));
 }
 
 _GLIBCXX_ALWAYS_INLINE bool
 try_wait() const noexcept
-{ return __atomic_impl::load(&_M_a, memory_order::acquire) == 0; }
+{ return __atomic_impl::load(&_M_counter, memory_order::acquire) == 0; }
 
 _GLIBCXX_ALWAYS_INLINE void
 wait() const noexcept
 {
   auto const __pred = [this] { return this->try_wait(); };
-  std::__atomic_wait_address(&_M_a, __pred);
+  std::__atomic_wait_address(&_M_counter, __pred);
 }
 
 _GLIBCXX_ALWAYS_INLINE void
@@ -102,7 +102,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   private:
 alignas(__detail::__platform_wait_alignment)
-  __detail::__platform_wait_t _M_a;
+  __detail::__platform_wait_t _M_counter;
   };
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
-- 
2.49.0



RE: [PATCH 1/3][GCC16-Stage-1] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost

2025-04-25 Thread Li, Pan2
Thanks Jeff for comments.

> The point is the separation isn't neat and clean in rtx_costs.   We've 
> kind of let it go as-is with the vector short-cut out, but I don't think 
> that's really where we want to be, it was just an expedient decision to 
> allow us to focus on more important stuff.  Now that we're looking at 
> using rtx costing in more meaningful ways we probably need to rethink 
> the hack we've got in place.

I see, will have a try in v3 for this part.

Pan

-Original Message-
From: Jeff Law  
Sent: Friday, April 25, 2025 4:26 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com; Chen, Ken 
; Liu, Hongtao ; Paul-Antoine Arras 
; Alexandre Oliva 
Subject: Re: [PATCH 1/3][GCC16-Stage-1] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost



On 4/18/25 9:28 AM, Li, Pan2 wrote:
> Thanks Jeff for comments.
> 
>> So we've got 3 patches all touching on the same basic area, so we need
>> to be careful about staging in.
> 
> Agree, thanks Jeff for paying attention.
> 
>> So don't be surprised if most review time is focused on how the costing
>> model works since that's a common theme across these patches.
> 
> I see, Robin also mentioned this last year.
> 
>> This feels very much like a hack to me.  Why not just handle
>> VEC_DUPLICATE like the rest of the opcodes in riscv_rtx_costs?
>> Ultimately all that code needs to work together rather than having
>> separate paths for vector/scalar.
> 
> The idea is to separate the vector-related code into another sub-function
> for readability, instead of unrolling the vector cost logic in riscv_rtx_costs,
> given that the riscv_rtx_costs function body is already quite long. Currently
> we may only have vec_duplicate, but more cases may be added later.
All true and understandable.  The problem is rtx_cost isn't really 
separable like that.  RTL in there can be fairly arbitrary and mixed, 
just because we have a vector mode doesn't mean we won't have scalar 
ops.  Worse yet those scalar ops might be something more complex than a 
simple register.

You should expect to get arbitrary RTL in there with arbitrary operands 
-- it doesn't have to match anything actually supported by the target.

The point is the separation isn't neat and clean in rtx_costs.   We've 
kind of let it go as-is with the vector short-cut out, but I don't think 
that's really where we want to be, it was just an expedient decision to 
allow us to focus on more important stuff.  Now that we're looking at 
using rtx costing in more meaningful ways we probably need to rethink 
the hack we've got in place.


Jeff


Re: [PATCH] libstdc++: Rename std::latch data member

2025-04-25 Thread Tomasz Kaminski
On Fri, Apr 25, 2025 at 1:51 PM Jonathan Wakely  wrote:

> Rename _M_a to match the name of the exposition-only data member shown
> in the standard, i.e. 'counter'.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/latch (latch::_M_a): Rename to _M_counter.
> ---
>
> IMHO this makes it a little easier to compare the implementation to the
> spec.
>
> Tested x86_64-linux.
>
LGTM

>
>  libstdc++-v3/include/std/latch | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/latch b/libstdc++-v3/include/std/latch
> index cf648545629..dc147c24bbe 100644
> --- a/libstdc++-v3/include/std/latch
> +++ b/libstdc++-v3/include/std/latch
> @@ -62,7 +62,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>  constexpr explicit
>  latch(ptrdiff_t __expected) noexcept
> -: _M_a(__expected)
> +: _M_counter(__expected)
>  { __glibcxx_assert(__expected >= 0 && __expected <= max()); }
>
>  ~latch() = default;
> @@ -74,23 +74,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  count_down(ptrdiff_t __update = 1)
>  {
>__glibcxx_assert(__update >= 0 && __update <= max());
> -  auto const __old = __atomic_impl::fetch_sub(&_M_a, __update,
> +  auto const __old = __atomic_impl::fetch_sub(&_M_counter, __update,
>   memory_order::release);
>if (std::cmp_equal(__old, __update))
> -   __atomic_impl::notify_all(&_M_a);
> +   __atomic_impl::notify_all(&_M_counter);
>else
> __glibcxx_assert(std::cmp_less(__update, __old));
>  }
>
>  _GLIBCXX_ALWAYS_INLINE bool
>  try_wait() const noexcept
> -{ return __atomic_impl::load(&_M_a, memory_order::acquire) == 0; }
> +{ return __atomic_impl::load(&_M_counter, memory_order::acquire) == 0; }
>
>  _GLIBCXX_ALWAYS_INLINE void
>  wait() const noexcept
>  {
>auto const __pred = [this] { return this->try_wait(); };
> -  std::__atomic_wait_address(&_M_a, __pred);
> +  std::__atomic_wait_address(&_M_counter, __pred);
>  }
>
>  _GLIBCXX_ALWAYS_INLINE void
> @@ -102,7 +102,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>private:
>  alignas(__detail::__platform_wait_alignment)
> -  __detail::__platform_wait_t _M_a;
> +  __detail::__platform_wait_t _M_counter;
>};
>  _GLIBCXX_END_NAMESPACE_VERSION
>  } // namespace
> --
> 2.49.0
>
>


Re: [PATCH v2] Document AArch64 changes for GCC 15

2025-04-25 Thread Richard Sandiford
Kyrylo Tkachov  writes:
> Hi Richard,
>
>> On 23 Apr 2025, at 13:47, Richard Sandiford wrote:
>> 
>> Thanks for all the feedback.  I've tried to address it in the version
>> below.  I'll push later today if there are no further comments.
>> 
>> Richard
>> 
>> 
>> The list is structured as:
>> 
>> - new configurations
>> - command-line changes
>> - ACLE changes
>> - everything else
>> 
>> As usual, the list of new architectures, CPUs, and features is from a
>> purely mechanical trawl of the associated .def files.  I've identified
>> features by their architectural name to try to improve searchability.
>> Similarly, the list of ACLE changes includes the associated ACLE
>> feature macros, again to try to improve searchability.
>> 
>> The list summarises some of the target-specific optimisations because
>> it sounded like Tamar had received feedback that people found such
>> information interesting.
>> 
>> I've used the passive voice for most entries, to try to follow the
>> style used elsewhere.
>> 
>> We don't yet define __ARM_FEATURE_FAMINMAX, but I'll fix that
>> separately.
>
> Thanks again for doing this…
>
>> 
>> +  
>> +  Support has been added for the following features of the Arm C
>> +Language Extensions
>> +(ACLE, https://github.com/ARM-software/acle):
>> +
>> +  guarded control stacks
>> +  lookup table instructions with 2-bit and 4-bit indices
>> +(predefined macro
>> +__ARM_FEATURE_LUT, enabled by +lut)
>> +  
>> +  floating-point absolute minimum and maximum instructions
>> +(predefined macro __ARM_FEATURE_FAMINMAX,
>> +enabled by +faminmax)
>> +  
>> +  FP8 conversions (predefined macro
>> +__ARM_FEATURE_FP8, enabled by +fp8)
>> +  
>> +  FP8 2-way dot product to half precision instructions
>> +(predefined macro __ARM_FEATURE_FP8DOT2,
>> +enabled by +fp8dot2)
>> +  
>> +  FP8 4-way dot product to single precision instructions
>> +(predefined macro __ARM_FEATURE_FP8DOT4,
>> +enabled by +fp8dot4)
>> +  
>> +  FP8 multiply-accumulate to half precision and single precision
>> +instructions (predefined macro __ARM_FEATURE_FP8FMA,
>> +enabled by +fp8fma)
>> +  
>> +  SVE FP8 2-way dot product to half precision instructions
>> +(predefined macro __ARM_FEATURE_SSVE_FP8DOT2,
>> +enabled by +ssve-fp8dot2)
>> +  
>> +  SVE FP8 4-way dot product to single precision instructions
>> +(predefined macro __ARM_FEATURE_SSVE_FP8DOT4,
>> +enabled by +ssve-fp8dot4)
>> +  
>> +  SVE FP8 multiply-accumulate to half precision and single precision
>> +instructions (predefined macro __ARM_FEATURE_SSVE_FP8FMA,
>> +enabled by +ssve-fp8fma)
>
>
> … Should these FP8 entries say “SSVE FP8” rather than “SVE FP8”?

The official description is "SVE(2) ... instructions in Streaming
SVE mode".  But yeah, I suppose dropping the "in Streaming SVE mode"
was a mistake.  I've pushed the following incremental patch.

Thanks,
Richard


diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index a71249ff..3cec4ff4 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -847,17 +847,20 @@ asm (".text; %cc0: mov %cc2, %%r0; .previous;"
 instructions (predefined macro __ARM_FEATURE_FP8FMA,
 enabled by +fp8fma)
   
-  SVE FP8 2-way dot product to half precision instructions
-(predefined macro __ARM_FEATURE_SSVE_FP8DOT2,
-enabled by +ssve-fp8dot2)
+  SVE FP8 2-way dot product to half precision instructions in
+Streaming SVE mode (predefined macro
+__ARM_FEATURE_SSVE_FP8DOT2, enabled by
++ssve-fp8dot2)
   
-  SVE FP8 4-way dot product to single precision instructions
-(predefined macro __ARM_FEATURE_SSVE_FP8DOT4,
-enabled by +ssve-fp8dot4)
+  SVE FP8 4-way dot product to single precision instructions in
+Streaming SVE mode (predefined macro
+__ARM_FEATURE_SSVE_FP8DOT4, enabled by
++ssve-fp8dot4)
   
   SVE FP8 multiply-accumulate to half precision and single precision
-instructions (predefined macro __ARM_FEATURE_SSVE_FP8FMA,
-enabled by +ssve-fp8fma)
+instructions in Streaming SVE mode (predefined macro
+__ARM_FEATURE_SSVE_FP8FMA, enabled by
++ssve-fp8fma)
   
   SVE2.1 instructions (predefined macro
 __ARM_FEATURE_SVE2p1, enabled by +sve2p1)
-- 
2.43.0
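
As an aside on the feature macros listed in the entries above: user code would typically test them with the preprocessor before using the corresponding intrinsics. The sketch below only reports which of the FP8 macros named in the list are defined in the current translation unit; the function name enabled_fp8_features is an assumption for illustration, and the code deliberately uses no ACLE intrinsics so it builds on any target.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: collects the names of the FP8-related ACLE feature
// macros (taken from the changes.html entries) that the compiler has
// predefined for this translation unit.
std::vector<std::string> enabled_fp8_features()
{
  std::vector<std::string> names;
#ifdef __ARM_FEATURE_FP8
  names.push_back("__ARM_FEATURE_FP8");        // listed as enabled by +fp8
#endif
#ifdef __ARM_FEATURE_FP8DOT2
  names.push_back("__ARM_FEATURE_FP8DOT2");    // listed as enabled by +fp8dot2
#endif
#ifdef __ARM_FEATURE_FP8DOT4
  names.push_back("__ARM_FEATURE_FP8DOT4");    // listed as enabled by +fp8dot4
#endif
#ifdef __ARM_FEATURE_FP8FMA
  names.push_back("__ARM_FEATURE_FP8FMA");     // listed as enabled by +fp8fma
#endif
  return names;
}
```

On a non-AArch64 host, or without the relevant extensions enabled, the returned list is simply empty; the same #ifdef pattern applies to the SSVE variants.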



[wwwdocs] Update documentation URL in robots.txt

2025-04-25 Thread Jonathan Wakely
---
Pushed to wwwdocs because the old URL gives a 404 error.

 htdocs/robots.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/robots.txt b/htdocs/robots.txt
index 01ae43dd..526fa1dd 100644
--- a/htdocs/robots.txt
+++ b/htdocs/robots.txt
@@ -1,4 +1,4 @@
-# See http://www.robotstxt.org/wc/robots.html
+# See https://www.robotstxt.org/robotstxt.html
 # for information about the file format.
 # Contact g...@gcc.gnu.org for questions.
 
-- 
2.49.0



[wwwdocs v2] Document some more libstdc++ additions in gcc-15

2025-04-25 Thread Jonathan Wakely
---
Pushed to wwwdocs.

 htdocs/gcc-15/changes.html | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index 3cec4ff4..3e3c6655 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -564,9 +564,24 @@ asm (".text; %cc0: mov %cc2, %%r0; .previous;"
   Debug assertions are now enabled by default for unoptimized builds.
 Use -D_GLIBCXX_NO_ASSERTIONS to override this.
   
+  
+Associative containers and lists now use custom pointer
+types internally, instead of only when interacting with their allocator.
+  
   Improved experimental support for C++26, including:
 
-views::concat.
+views::concat, views::to_input,
+  views::cache_latest.
+
+
+  Sorting algorithms and raw memory algorithms are constexpr
+  so can be used during constant evaluation.
+
+
+   and 
+  headers.
+
+std::is_virtual_base_of type trait.
 Member visit.
 Type-checking std::format args.
 
@@ -580,6 +595,14 @@ asm (".text; %cc0: mov %cc2, %%r0; .previous;"
 
   std::flat_map and std::flat_set.
 
+
+  std::from_range_t constructors added to all containers,
+  as well as new member functions such as insert_range.
+
+
+  Formatting of ranges and tuples with std::format,
+  as well as string escaping for debug formats.
+
 
   Clarify handling of encodings in localized formatting of chrono types.
 
-- 
2.49.0



[PUSHED] Add 'libgomp.c-c++-common/target-cdtor-1.c'

2025-04-25 Thread Thomas Schwinge
libgomp/
* testsuite/libgomp.c-c++-common/target-cdtor-1.c: New.
---
 .../libgomp.c-c++-common/target-cdtor-1.c | 89 +++
 1 file changed, 89 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-cdtor-1.c

diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-cdtor-1.c b/libgomp/testsuite/libgomp.c-c++-common/target-cdtor-1.c
new file mode 100644
index 000..e6099cf23b8
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-cdtor-1.c
@@ -0,0 +1,89 @@
+/* Offloaded 'constructor' and 'destructor' functions.  */
+
+#include 
+
+#pragma omp declare target
+
+static void
+__attribute__((constructor))
+initHD1()
+{
+  __builtin_printf("%s, %d\n", __FUNCTION__, omp_is_initial_device());
+}
+
+static void
+__attribute__((constructor))
+initHD2()
+{
+  __builtin_printf("%s, %d\n", __FUNCTION__, omp_is_initial_device());
+}
+
+static void
+__attribute__((destructor))
+finiHD1()
+{
+  __builtin_printf("%s, %d\n", __FUNCTION__, omp_is_initial_device());
+}
+
+static void
+__attribute__((destructor))
+finiHD2()
+{
+  __builtin_printf("%s, %d\n", __FUNCTION__, omp_is_initial_device());
+}
+
+#pragma omp end declare target
+
+static void
+__attribute__((constructor))
+initH1()
+{
+  __builtin_printf("%s, %d\n", __FUNCTION__, omp_is_initial_device());
+}
+
+static void
+__attribute__((destructor))
+finiH2()
+{
+  __builtin_printf("%s, %d\n", __FUNCTION__, omp_is_initial_device());
+}
+
+int main()
+{
+  int c = 0;
+
+  __builtin_printf("%s:%d, %d\n", __FUNCTION__, ++c, omp_is_initial_device());
+
+#pragma omp target map(c)
+  {
+__builtin_printf("%s:%d, %d\n", __FUNCTION__, ++c, omp_is_initial_device());
+  }
+
+#pragma omp target map(c)
+  {
+__builtin_printf("%s:%d, %d\n", __FUNCTION__, ++c, omp_is_initial_device());
+  }
+
+  __builtin_printf("%s:%d, %d\n", __FUNCTION__, ++c, omp_is_initial_device());
+
+  return 0;
+}
+
+/* The order is undefined, in which same-priority 'constructor' functions, and 'destructor' functions are run.
+   { dg-output {init[^,]+, 1[\r\n]+} }
+   { dg-output {init[^,]+, 1[\r\n]+} }
+   { dg-output {init[^,]+, 1[\r\n]+} }
+   { dg-output {main:1, 1[\r\n]+} }
+   { dg-output {initHD[^,]+, 0[\r\n]+} { target offload_device } }
+   { dg-output {initHD[^,]+, 0[\r\n]+} { target offload_device } }
+   { dg-output {main:2, 1[\r\n]+} { target { ! offload_device } } }
+   { dg-output {main:2, 0[\r\n]+} { target offload_device } }
+   { dg-output {main:3, 1[\r\n]+} { target  { ! offload_device } } }
+   { dg-output {main:3, 0[\r\n]+} { target offload_device } }
+   { dg-output {main:4, 1[\r\n]+} }
+   { dg-output {finiHD[^,]+, 0[\r\n]+} { target offload_device } }
+   { dg-output {finiHD[^,]+, 0[\r\n]+} { target offload_device } }
+   { dg-output {fini[^,]+, 1[\r\n]+} }
+   { dg-output {fini[^,]+, 1[\r\n]+} }
+   { dg-output {fini[^,]+, 1[\r\n]+} }
+*/
-- 
2.34.1



GCN, nvptx offloading: Host/device compatibility: Itanium C++ ABI, DSO Object Destruction API [PR119853, PR119854]

2025-04-25 Thread Thomas Schwinge
Hi!

On 2025-04-23T21:49:27+0200, I wrote:
> [...]

With 'libgomp.c++/target-cdtor-2.C' changed to
'dg-require-effective-target init_priority', and
'libgomp.c++/target-cdtor-1.C', 'libgomp.c++/target-cdtor-2.C' changed to
scan for '__cxa_atexit' only for 'target cxa_atexit', I've pushed to
trunk branch commit aafe942227baf8c2bcd4cac2cb150e49a4b895a9
"GCN, nvptx offloading: Host/device compatibility: Itanium C++ ABI, DSO Object 
Destruction API [PR119853, PR119854]",
see attached.


Grüße
 Thomas
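
For context, the attached patch relies on a weak, empty default definition of '__GCC_offload___cxa_finalize' that libgomp can override when needed. A minimal host-side sketch of that pattern follows. It assumes GCC or Clang (for the 'weak' attribute) and uses a hypothetical hook name, 'example_finalize_hook', rather than the real symbol; it is not the libgcc code itself.

```cpp
#include <cassert>

// Hypothetical hook, standing in for the offload finalizer in the patch.
extern "C" void example_finalize_hook(void *dso_handle);

// Weak default: does nothing.  A strong definition of the same symbol in
// another object file would take precedence at link time.
extern "C" __attribute__((weak)) void
example_finalize_hook(void *dso_handle)
{
  (void) dso_handle;  // unused in the dummy
}

// Counts completed teardown runs, purely so the sketch is observable.
static int teardown_steps = 0;

// Mirrors the shape of the patched teardown code: call the hook first
// (always safe, since the weak default exists), then do the usual work.
void run_teardown()
{
  example_finalize_hook(nullptr);  // no DSOs here: null handle
  ++teardown_steps;
}
```

Because a default definition always exists, the caller needs no null check before invoking the hook, which matches the unconditional call in the nvptx part of the patch.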


>From aafe942227baf8c2bcd4cac2cb150e49a4b895a9 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 23 Apr 2025 10:51:48 +0200
Subject: [PATCH] GCN, nvptx offloading: Host/device compatibility: Itanium C++
 ABI, DSO Object Destruction API [PR119853, PR119854]

'__dso_handle' for '__cxa_atexit', '__cxa_finalize'.  See
.

	PR target/119853
	PR target/119854
	libgcc/
	* config/gcn/crt0.c (_fini_array): Call
	'__GCC_offload___cxa_finalize'.
	* config/nvptx/gbl-ctors.c (__static_do_global_dtors): Likewise.
	libgomp/
	* target-cxa-dso-dtor.c: New.
	* config/accel/target-cxa-dso-dtor.c: Likewise.
	* Makefile.am (libgomp_la_SOURCES): Add it.
	* Makefile.in: Regenerate.
	* testsuite/libgomp.c++/target-cdtor-1.C: New.
	* testsuite/libgomp.c++/target-cdtor-2.C: Likewise.
---
 libgcc/config/gcn/crt0.c  |  32 
 libgcc/config/nvptx/gbl-ctors.c   |  16 ++
 libgomp/Makefile.am   |   2 +-
 libgomp/Makefile.in   |   7 +-
 libgomp/config/accel/target-cxa-dso-dtor.c|  62 
 libgomp/target-cxa-dso-dtor.c |   3 +
 .../testsuite/libgomp.c++/target-cdtor-1.C| 104 +
 .../testsuite/libgomp.c++/target-cdtor-2.C| 140 ++
 8 files changed, 363 insertions(+), 3 deletions(-)
 create mode 100644 libgomp/config/accel/target-cxa-dso-dtor.c
 create mode 100644 libgomp/target-cxa-dso-dtor.c
 create mode 100644 libgomp/testsuite/libgomp.c++/target-cdtor-1.C
 create mode 100644 libgomp/testsuite/libgomp.c++/target-cdtor-2.C

diff --git a/libgcc/config/gcn/crt0.c b/libgcc/config/gcn/crt0.c
index dbd6749a47f..cc23e214cf9 100644
--- a/libgcc/config/gcn/crt0.c
+++ b/libgcc/config/gcn/crt0.c
@@ -24,6 +24,28 @@ typedef long long size_t;
 /* Provide an entry point symbol to silence a linker warning.  */
 void _start() {}
 
+
+#define PR119369_fixed 0
+
+
+/* Host/device compatibility: '__cxa_finalize'.  Dummy; if necessary,
+   overridden via libgomp 'target-cxa-dso-dtor.c'.  */
+
+#if PR119369_fixed
+extern void __GCC_offload___cxa_finalize (void *) __attribute__((weak));
+#else
+void __GCC_offload___cxa_finalize (void *) __attribute__((weak));
+
+void __attribute__((weak))
+__GCC_offload___cxa_finalize (void *dso_handle __attribute__((unused)))
+{
+}
+#endif
+
+/* There are no DSOs; this is the main program.  */
+static void * const __dso_handle = 0;
+
+
 #ifdef USE_NEWLIB_INITFINI
 
 extern void __libc_init_array (void) __attribute__((weak));
@@ -38,6 +60,11 @@ void _init_array()
 __attribute__((amdgpu_hsa_kernel ()))
 void _fini_array()
 {
+#if PR119369_fixed
+  if (__GCC_offload___cxa_finalize)
+#endif
+__GCC_offload___cxa_finalize (__dso_handle);
+
   __libc_fini_array ();
 }
 
@@ -70,6 +97,11 @@ void _init_array()
 __attribute__((amdgpu_hsa_kernel ()))
 void _fini_array()
 {
+#if PR119369_fixed
+  if (__GCC_offload___cxa_finalize)
+#endif
+__GCC_offload___cxa_finalize (__dso_handle);
+
   size_t count;
   size_t i;
 
diff --git a/libgcc/config/nvptx/gbl-ctors.c b/libgcc/config/nvptx/gbl-ctors.c
index 26268116ee0..10954ee3ab6 100644
--- a/libgcc/config/nvptx/gbl-ctors.c
+++ b/libgcc/config/nvptx/gbl-ctors.c
@@ -31,6 +31,20 @@
 extern int atexit (void (*function) (void));
 
 
+/* Host/device compatibility: '__cxa_finalize'.  Dummy; if necessary,
+   overridden via libgomp 'target-cxa-dso-dtor.c'.  */
+
+extern void __GCC_offload___cxa_finalize (void *);
+
+void __attribute__((weak))
+__GCC_offload___cxa_finalize (void *dso_handle __attribute__((unused)))
+{
+}
+
+/* There are no DSOs; this is the main program.  */
+static void * const __dso_handle = 0;
+
+
 /* Handler functions ('static', in contrast to the 'gbl-ctors.h'
prototypes).  */
 
@@ -49,6 +63,8 @@ static void __static_do_global_dtors (void);
 static void
 __static_do_global_dtors (void)
 {
+  __GCC_offload___cxa_finalize (__dso_handle);
+
   func_ptr *p = __DTOR_LIST__;
   ++p;
   for (; *p; ++p)
diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index e3202aeb0e0..19479aea462 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -70,7 +70,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c error.c \
 	target.c splay-tree.c libgomp-plugin.c oacc-parallel.c oacc-host.c \
 	oacc-init.c oacc-mem.c oacc-async.c oacc-plugin.c oacc-cuda.c \
 	priority_queue.c affinity-fmt.c teams.c allocator.c oacc-profiling.c \
-	oacc-target.c target-indirect.c
+	

[PATCH] OpenMP, GCN: Add interop-hsa testcase

2025-04-25 Thread Andrew Stubbs
This testcase ensures that the interop HSA support is sufficient to run
a kernel manually on the same device.  It reuses an OpenMP kernel in
order to avoid all the complication of compiling a custom kernel in
DejaGnu (although this does mean matching the OpenMP runtime
environment, which might be a maintenance issue in future).

OK for mainline and 15?

Andrew

libgomp/ChangeLog:

* testsuite/libgomp.c/interop-hsa.c: New test.
---
 libgomp/testsuite/libgomp.c/interop-hsa.c | 203 ++
 1 file changed, 203 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.c/interop-hsa.c

diff --git a/libgomp/testsuite/libgomp.c/interop-hsa.c b/libgomp/testsuite/libgomp.c/interop-hsa.c
new file mode 100644
index 000..cf8bc90bb9c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/interop-hsa.c
@@ -0,0 +1,203 @@
+/* { dg-additional-options "-ldl" } */
+/* { dg-require-effective-target offload_device_gcn } */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "../../../include/hsa.h"
+#include "../../config/gcn/libgomp-gcn.h"
+
+#define STACKSIZE (100 * 1024)
+#define HEAPSIZE (10 * 1024 * 1024)
+#define ARENASIZE HEAPSIZE
+
+/* This code fragment must be optimized or else the host-fallback kernel has
+ * invalid ASM inserts.  The rest of the file can be compiled safely at -O0.  */
+#pragma omp declare target
+uintptr_t __attribute__((optimize("O1")))
+get_kernel_ptr ()
+{
+  uintptr_t val;
+  if (!omp_is_initial_device ())
+/* "main._omp_fn.0" is the name GCC gives the first OpenMP target
+ * region in the "main" function.
+ * The ".kd" suffix is added by the LLVM assembler when it creates the
+ * kernel meta-data, and this is what we need to launch a kernel.  */
+asm ("s_getpc_b64 %0\n\t"
+"s_add_u32 %L0, %L0, main._omp_fn.0.kd@rel32@lo+4\n\t"
+"s_addc_u32 %H0, %H0, main._omp_fn.0.kd@rel32@hi+4"
+: "=Sg"(val));
+  return val;
+}
+#pragma omp end declare target
+
+int
+main(int argc, char** argv)
+{
+
+  /* Load the HSA runtime DLL.  */
+  void *hsalib = dlopen ("libhsa-runtime64.so.1", RTLD_LAZY);
+  assert (hsalib);
+
+  hsa_status_t (*hsa_signal_create) (hsa_signal_value_t initial_value,
+uint32_t num_consumers,
+const hsa_agent_t *consumers,
+hsa_signal_t *signal)
+= dlsym (hsalib, "hsa_signal_create");
+  assert (hsa_signal_create);
+
+  uint64_t (*hsa_queue_load_write_index_relaxed) (const hsa_queue_t *queue)
+= dlsym (hsalib, "hsa_queue_load_write_index_relaxed");
+  assert (hsa_queue_load_write_index_relaxed);
+
+  void (*hsa_signal_store_relaxed) (hsa_signal_t signal,
+   hsa_signal_value_t value)
+= dlsym (hsalib, "hsa_signal_store_relaxed");
+  assert (hsa_signal_store_relaxed);
+
+  hsa_signal_value_t (*hsa_signal_wait_relaxed) (hsa_signal_t signal,
+hsa_signal_condition_t condition,
+hsa_signal_value_t compare_value,
+uint64_t timeout_hint,
+hsa_wait_state_t wait_state_hint)
+= dlsym (hsalib, "hsa_signal_wait_relaxed");
+  assert (hsa_signal_wait_relaxed);
+
+  void (*hsa_queue_store_write_index_relaxed) (const hsa_queue_t *queue,
+  uint64_t value)
+= dlsym (hsalib, "hsa_queue_store_write_index_relaxed");
+  assert (hsa_queue_store_write_index_relaxed);
+
+  hsa_status_t (*hsa_signal_destroy) (hsa_signal_t signal)
+= dlsym (hsalib, "hsa_signal_destroy");
+  assert (hsa_signal_destroy);
+
+  /* Set up the device data environment.  */
+  int test_data_value = 0;
+#pragma omp target enter data map(test_data_value)
+
+  /* Get the interop details.  */
+  int device_num = omp_get_default_device();
+  hsa_agent_t *gpu_agent;
+  hsa_queue_t *hsa_queue = NULL;
+
+  omp_interop_t interop = omp_interop_none;
+#pragma omp interop init(target, targetsync, prefer_type("hsa"): interop) device(device_num)
+  assert (interop != omp_interop_none);
+
+  omp_interop_rc_t retcode;
+  omp_interop_fr_t fr = omp_get_interop_int (interop, omp_ipr_fr_id, &retcode);
+  assert (retcode == omp_irc_success);
+  assert (fr == omp_ifr_hsa);
+
+  gpu_agent = omp_get_interop_ptr(interop, omp_ipr_device, &retcode);
+  assert (retcode == omp_irc_success);
+
+  hsa_queue = omp_get_interop_ptr(interop, omp_ipr_targetsync, &retcode);
+  assert (retcode == omp_irc_success);
+  assert (hsa_queue);
+
+  /* Call an offload kernel via OpenMP/libgomp.
+   *
+   * This kernel serves two purposes:
+   *   1) Lookup the device-side load-address of itself (thus avoiding the
+   *   need to access the libgomp internals).
+   *   2) Count how many times it is called.
+   * We then call it once using OpenMP, and once manually, and check
+   * the counter

Re: [PATCH RFC] c++: bad pending_template recursion

2025-04-25 Thread Jonathan Wakely
On Thu, 24 Apr 2025 at 15:48, Jonathan Wakely  wrote:
>
> On Fri, 18 Apr 2025 at 23:08, Jason Merrill  wrote:
> >
> > limit_bad_template_recursion currently avoids immediate instantiation of
> > templates from uses in an already ill-formed instantiation, but we still can
> > get unnecessary recursive instantiation in pending_templates if the
> > instantiation was queued before the error.
> >
> > Currently this regresses several libstdc++ tests which seem to rely on a
> > static_assert in a function called from another that is separately
> > ill-formed.  For instance, in the 48101_neg.cc tests, we first get an
> > error in find(), then later instantiate _S_key() (called from find) and
> > get the static_assert error from there.
> >
> > Thoughts?  Is this a desirable change, or is the fact that the use
> > precedes the error reason to go ahead with the instantiation?
> >
> > > FAIL: 23_containers/map/48101_neg.cc  -std=gnu++17  (test for errors, line )
> > > FAIL: 23_containers/multimap/48101_neg.cc  -std=gnu++17  (test for errors, line )
> > > FAIL: 23_containers/multiset/48101_neg.cc  -std=gnu++17  (test for errors, line )
> > > FAIL: 23_containers/set/48101_neg.cc  -std=gnu++17  (test for errors, line )
>
> As discussed the other day, I will tweak the diagnostics for the
> associative containers so that these tests don't regress. I'll post
> that for review shortly.
>
> > > FAIL: 30_threads/packaged_task/cons/dangling_ref.cc  -std=gnu++17  (test for errors, line )
> > > FAIL: 30_threads/packaged_task/cons/lwg4154_neg.cc  -std=gnu++17  (test for errors, line )
>
> These two don't strictly need to use dg-error for the messages that
> disappear with your patch, they could have used dg-prune-output
> instead. What the test intends to check is that we get a failed
> static_assert for std::is_invocable_r_v, and that's still the case.
> The other errors are collateral damage, and your patch prevents that.
> Which is a good thing. So we can just remove the failing dg-error
> lines.
>
> I'll finish testing my associative container changes, then push your
> patch after mine lands.

The front end patch has been pushed as r16-133-g8acea9ffa82ed8
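
As a side note on the comment in the quoted cp-tree.h hunk below, the effect of storing the error count in 15 bits can be seen in a small standalone model. This is only a sketch: `struct level`, `level_push`, `level_pop` and `had_errors_after` are hypothetical stand-ins, not the GCC functions, and `unsigned int` bit-fields are used here for portability instead of the types in the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two bit-fields added to tinst_level.  */
struct level
{
  unsigned int errors : 15;	/* error count at push time, modulo 2^15 */
  unsigned int had_errors : 1;	/* set at pop time if new errors were seen */
};

/* Record the current error count; values >= 32768 wrap around.  */
void
level_push (struct level *l, unsigned count)
{
  l->errors = count;
  l->had_errors = 0;
}

/* Mirror the check in the quoted pop_tinst_level hunk.  */
bool
level_pop (struct level *l, unsigned count)
{
  if (count != 0 && count > l->errors)
    l->had_errors = 1;
  return l->had_errors;
}

/* Convenience wrapper: push with one count, pop with another.  */
bool
had_errors_after (unsigned at_push, unsigned at_pop)
{
  struct level l;
  level_push (&l, at_push);
  return level_pop (&l, at_pop);
}
```

With fewer than 32768 errors, `had_errors_after (5, 5)` is false and `had_errors_after (5, 6)` is true. Once the count no longer fits, `had_errors_after (40000, 40000)` is true even though no new error occurred, which is the "always seem like we currently have more errors" case the comment accepts.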


>
> >
> >
> > gcc/cp/ChangeLog:
> >
> > * cp-tree.h (struct tinst_level): Add had_errors bit.
> > * pt.cc (push_tinst_level_loc): Clear it.
> > (pop_tinst_level): Set it.
> > (reopen_tinst_level): Check it.
> > (instantiate_pending_templates): Call limit_bad_template_recursion.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * g++.dg/template/recurse5.C: New test.
> > ---
> >  gcc/cp/cp-tree.h | 10 --
> >  gcc/cp/pt.cc | 10 --
> >  gcc/testsuite/g++.dg/template/recurse5.C | 17 +
> >  3 files changed, 33 insertions(+), 4 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.dg/template/recurse5.C
> >
> > diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> > index 7798efba3db..856202c65dd 100644
> > --- a/gcc/cp/cp-tree.h
> > +++ b/gcc/cp/cp-tree.h
> > @@ -6755,8 +6755,14 @@ struct GTY((chain_next ("%h.next"))) tinst_level {
> >/* The location where the template is instantiated.  */
> >location_t locus;
> >
> > -  /* errorcount + sorrycount when we pushed this level.  */
> > -  unsigned short errors;
> > > +  /* errorcount + sorrycount when we pushed this level.  If the value
> > > + overflows, it will always seem like we currently have more errors, so we
> > > + will limit template recursion even from non-erroneous templates.  In a TU
> > > + with over 32k errors, that's fine.  */
> > +  unsigned short errors : 15;
> > +
> > +  /* set in pop_tinst_level if there have been errors since we pushed.  */
> > +  bool had_errors : 1;
> >
> >/* Count references to this object.  If refcount reaches
> >   refcount_infinity value, we don't increment or decrement the
> > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > index a71705fd085..e8d342f99f6 100644
> > --- a/gcc/cp/pt.cc
> > +++ b/gcc/cp/pt.cc
> > @@ -11418,6 +11418,7 @@ push_tinst_level_loc (tree tldcl, tree targs, 
> > location_t loc)
> >new_level->targs = targs;
> >new_level->locus = loc;
> >new_level->errors = errorcount + sorrycount;
> > +  new_level->had_errors = false;
> >new_level->next = NULL;
> >new_level->refcount = 0;
> >new_level->path = new_level->visible = nullptr;
> > @@ -11468,6 +11469,9 @@ pop_tinst_level (void)
> >/* Restore the filename and line number stashed away when we started
> >   this instantiation.  */
> >input_location = current_tinst_level->locus;
> > +  if (unsigned errs = errorcount + sorrycount)
> > +if (errs > current_tinst_level->errors)
> > +  current_tinst_level->had_errors = true;
> >set_refcount_ptr (current_tinst_level, current_tinst_level->next);
> >--tinst_depth;
> >  }
> > @@ -11487,7 +11491,7 @@ reopen_tinst_level (struct tins

Re: [PATCH] libstdc++: Minimalize temporary allocations when width is specified [PR109162]

2025-04-25 Thread Jonathan Wakely

On 23/04/25 13:56 +0200, Tomasz Kamiński wrote:

When a width parameter is specified for formatting a range, tuple or escaped
presentation of a string, we used to format characters to a temporary string,
and then write the produced sequence padded according to the spec. However, once
the estimated width of the formatted representation of the input is larger than
the value of the spec width, it can be written directly to the output. This
limits the size of the required allocation, especially for large ranges.

Similarly, if a precision (maximum) width is provided for string presentation,
only a prefix of the sequence with estimated width not greater than the
precision needs to be buffered.

To realize the above, this commit implements a new _Padding_sink specialization.
This sink holds an output iterator, a value of padding width, (optionally)
maximum width and a string buffer inherited from _Str_sink.
Then any incoming characters are treated in one of the following ways, depending
on the estimated width W of the written sequence:
* written to string if W is smaller than padding width and maximum width (if present)
* ignored, if W is greater than maximum width
* written to output iterator, if W is greater than padding width
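
The three cases above can be sketched as a small decision function. This is only an illustrative model in C, not the libstdc++ implementation: the names `sink_action`, `classify` and `NO_MAXWIDTH` are hypothetical, and treating a width exactly equal to the padding width as "no longer buffered" is an assumption that the list above does not settle.

```c
#include <stddef.h>

/* Hypothetical marker for "no maximum width was given".  */
#define NO_MAXWIDTH ((size_t) -1)

enum sink_action
{
  SINK_BUFFER,	/* keep in the temporary string, padding may still be needed */
  SINK_DIRECT,	/* write straight to the output iterator */
  SINK_IGNORE	/* beyond the maximum width, drop */
};

/* Classify incoming characters by the estimated width W of everything
   written so far, including those characters.  */
enum sink_action
classify (size_t w, size_t padwidth, size_t maxwidth)
{
  if (w > maxwidth)
    return SINK_IGNORE;
  if (w >= padwidth)	/* assumption: a tie needs no padding */
    return SINK_DIRECT;
  return SINK_BUFFER;
}
```

For example, with a padding width of 10 and no maximum, `classify (3, 10, NO_MAXWIDTH)` stays in the buffering case, while `classify (12, 10, NO_MAXWIDTH)` goes directly to the output.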

The padding sink is used instead of _Str_sink in __format::__format_padded,
__formatter_str::_M_format_escaped functions.

Furthermore, the __formatter_str::_M_format implementation was reworked, so that:
* the number of instantiations is reduced by delegating to _Rg& and const _Rg& overloads,
* non-debug presentation is written to _Out directly or via _Padding_sink,
* if maximum width is specified for debug format with non-unicode encoding,
 string size is limited to that number.

PR libstdc++/109162

libstdc++-v3/ChangeLog:

* include/bits/formatfwd.h (__simply_formattable_range): Moved from
std/format.
* include/std/format (__formatter_str::_format): Extracted escaped
string handling to separate method...
(__formatter_str::_M_format_escaped): Use __Padding_sink.
(__formatter_str::_M_format): Adjusted implementation.
(__formatter_str::_S_trunc): Extracted as namespace function...
(__format::_truncate): Extracted from __formatter_str::_S_trunc.
(__format::_Seq_sink): Removed forward declarations, made members
protected and non-final.
(_Seq_sink::_M_trim): Define.
(_Seq_sink::_M_span): Renamed from view.
(_Seq_sink::view): Returns string_view instead of span.
(__format::_Str_sink): Moved after _Seq_sink.
(__format::__format_padded): Use _Padding_sink.
* testsuite/std/format/debug.cc: Add timeout and new tests.
* testsuite/std/format/ranges/sequence.cc: Specify unicode as
encoding and new tests.
* testsuite/std/format/ranges/string.cc: Likewise.
* testsuite/std/format/tuple.cc: Likewise.
---
This is for sure 16 material, and nothing to backport.
This addressed the TODO I created in __format_padded.
OK for trunk after 15.1?



This is a nice improvement.

OK with the spelling and minor tweaks mentioned below ...



libstdc++-v3/include/bits/formatfwd.h |   8 +
libstdc++-v3/include/std/format   | 396 +-
libstdc++-v3/testsuite/std/format/debug.cc| 386 -
.../testsuite/std/format/ranges/sequence.cc   | 116 +
.../testsuite/std/format/ranges/string.cc |  63 +++
libstdc++-v3/testsuite/std/format/tuple.cc|  93 
6 files changed, 957 insertions(+), 105 deletions(-)

diff --git a/libstdc++-v3/include/bits/formatfwd.h b/libstdc++-v3/include/bits/formatfwd.h
index 9ba658b078a..2d54ee5d30b 100644
--- a/libstdc++-v3/include/bits/formatfwd.h
+++ b/libstdc++-v3/include/bits/formatfwd.h
@@ -131,6 +131,14 @@ namespace __format
  = ranges::input_range
  && formattable, _CharT>;

+  // _Rg& and const _Rg& are both formattable and use same formatter
+  // specialization for their references.
+  template
+concept __simply_formattable_range
+  = __const_formattable_range<_Rg, _CharT>
+ && same_as>,
+remove_cvref_t>>;
+
  template
using __maybe_const_range
  = __conditional_t<__const_formattable_range<_Rg, _CharT>, const _Rg, _Rg>;
diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 7d3067098be..355db5f2a60 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -56,7 +56,7 @@
#include   // input_range, range_reference_t
#include   // subrange
#include  // ranges::copy
-#include  // back_insert_iterator
+#include  // back_insert_iterator, counted_iterator
#include  // __is_pair
#include   // __is_scalar_value, _Utf_view, etc.
#include   // tuple_size_v
@@ -99,19 +99,12 @@ namespace __format

  // Size for stack located buffer
  template
-  constexpr size_t __stackbuf_size = 32 * sizeof(void*) / sizeof(_CharT);
+constexpr size_t __stackbuf_size = 32 * sizeof(void*) / sizeof(_CharT);

  // Type-erased character sinks.
  templ

[committed] wwwdocs: Rotate news

2025-04-25 Thread Jakub Jelinek
Hi!

I've committed the following patch to rotate news, chose to move
202{2,3}-ish items to news.html.  We'll soon have 14.3/13.4 releases
so the first column will be about the same length as the second.

diff --git a/htdocs/index.html b/htdocs/index.html
index 8bdc4071..a662ab09 100644
--- a/htdocs/index.html
+++ b/htdocs/index.html
@@ -92,48 +92,6 @@ mission statement.
 [2024-05-07]
 
 
-https://inbox.sourceware.org/gcc/36fadb0549c3dca716eb3b923d66a11be2c67a61.ca...@redhat.com/";>GCC
 developer room at FOSDEM 2024: Call for Participation open
-[2023-11-20]
-FOSDEM 2024: Brussels, Belgium, February 3-4 2024
-
-https://gcc.gnu.org/wiki/cauldron2023";>GNU Tools Cauldron 
2023
-[2023-09-05]
-Cambridge, United Kingdom, September 22-24 2023
-
-GCC 13.2 released
-[2023-07-27]
-
-
-GCC 10.5 released
-[2023-07-07]
-
-
-GCC Code of Conduct adopted
-[2023-06-16]
-
-GCC 11.4 released
-[2023-05-29]
-
-
-GCC 12.3 released
-[2023-05-08]
-
-
-GCC 13.1 released
-[2023-04-26]
-
-
-https://godbolt.org/z/GT1vGdzMb";>GCC BPF in Compiler 
Explorer
- [2022-12-23]
-Support for a nightly build of the bpf-unknown-none-gcc compiler
-  has been contributed to Compiler Explorer (aka godbolt.org) by Marc
-  Poulhiès
-
-Modula-2 front end 
added
-  [2022-12-14]
-The Modula-2 programming language front end has been added to GCC.
-  This front end was contributed by Gaius Mulley.
-
 
 
 
diff --git a/htdocs/news.html b/htdocs/news.html
index 71c2a5bb..df1a969c 100644
--- a/htdocs/news.html
+++ b/htdocs/news.html
@@ -16,6 +16,48 @@
 
 
 
+https://inbox.sourceware.org/gcc/36fadb0549c3dca716eb3b923d66a11be2c67a61.ca...@redhat.com/";>GCC
 developer room at FOSDEM 2024: Call for Participation open
+[2023-11-20]
+FOSDEM 2024: Brussels, Belgium, February 3-4 2024
+
+https://gcc.gnu.org/wiki/cauldron2023";>GNU Tools Cauldron 
2023
+[2023-09-05]
+Cambridge, United Kingdom, September 22-24 2023
+
+GCC 13.2 released
+[2023-07-27]
+
+
+GCC 10.5 released
+[2023-07-07]
+
+
+GCC Code of Conduct adopted
+[2023-06-16]
+
+GCC 11.4 released
+[2023-05-29]
+
+
+GCC 12.3 released
+[2023-05-08]
+
+
+GCC 13.1 released
+[2023-04-26]
+
+
+https://godbolt.org/z/GT1vGdzMb";>GCC BPF in Compiler 
Explorer
+ [2022-12-23]
+Support for a nightly build of the bpf-unknown-none-gcc compiler
+  has been contributed to Compiler Explorer (aka godbolt.org) by Marc
+  Poulhiès
+
+Modula-2 front end 
added
+  [2022-12-14]
+The Modula-2 programming language front end has been added to GCC.
+  This front end was contributed by Gaius Mulley.
+
 https://gcc.gnu.org/wiki/cauldron2022";>GNU Tools Cauldron 
2022
 [2022-09-02]
 Prague, Czech Republic and online, September 16-18 2022

Jakub



[PATCH] c++/modules: Fix imported CNTTPs being considered non-constant [PR119938]

2025-04-25 Thread Nathaniel Shead
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk/15?

-- >8 --

When importing a CNTTP object, since r15-3031-g0b7904e274fbd6 we
shortcut the processing of the generated NTTP so that we don't attempt
to recursively load pendings.  However, due to an oversight we do not
properly set TREE_CONSTANT or DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P
on the decl, which confuses later processing.  This patch ensures that
this happens correctly.

PR c++/119938

gcc/cp/ChangeLog:

* pt.cc (get_template_parm_object): When !check_init, add assert
that expr really is constant and mark decl as such.

gcc/testsuite/ChangeLog:

* g++.dg/modules/tpl-nttp-2_a.H: New test.
* g++.dg/modules/tpl-nttp-2_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/pt.cc|  7 ++-
 gcc/testsuite/g++.dg/modules/tpl-nttp-2_a.H | 14 ++
 gcc/testsuite/g++.dg/modules/tpl-nttp-2_b.C | 10 ++
 3 files changed, 30 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/tpl-nttp-2_a.H
 create mode 100644 gcc/testsuite/g++.dg/modules/tpl-nttp-2_b.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index a71705fd085..75d34532426 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -7492,8 +7492,13 @@ get_template_parm_object (tree expr, tree name, bool check_init/*=true*/)
 {
   /* The EXPR is the already processed initializer, set it on the NTTP
 object now so that cp_finish_decl doesn't do it again later.  */
+  gcc_checking_assert (reduced_constant_expression_p (expr));
   DECL_INITIAL (decl) = expr;
-  DECL_INITIALIZED_P (decl) = 1;
+  DECL_INITIALIZED_P (decl) = true;
+  DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (decl) = true;
+  /* FIXME setting TREE_CONSTANT on refs breaks the back end.  */
+  if (!TYPE_REF_P (type))
+   TREE_CONSTANT (decl) = true;
 }
 
   pushdecl_top_level_and_finish (decl, expr);
diff --git a/gcc/testsuite/g++.dg/modules/tpl-nttp-2_a.H b/gcc/testsuite/g++.dg/modules/tpl-nttp-2_a.H
new file mode 100644
index 000..bfae11cd185
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/tpl-nttp-2_a.H
@@ -0,0 +1,14 @@
+// PR c++/119938
+// { dg-additional-options "-fmodules -std=c++20" }
+// { dg-module-cmi {} }
+
+struct A { int x; };
+
+template  struct B { static_assert(a.x == 1); };
+using C = B;
+
+template  void D() { static_assert(a.x == 2); };
+inline void E() { D(); }
+
+template  struct F { static constexpr int result = a.x; };
+template  constexpr int G() { return F::result; };
diff --git a/gcc/testsuite/g++.dg/modules/tpl-nttp-2_b.C b/gcc/testsuite/g++.dg/modules/tpl-nttp-2_b.C
new file mode 100644
index 000..7e661cbdef0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/tpl-nttp-2_b.C
@@ -0,0 +1,10 @@
+// PR c++/119938
+// { dg-additional-options "-fmodules -std=c++20" }
+
+import "tpl-nttp-2_a.H";
+
+int main() {
+  C c;
+  E();
+  static_assert(G() == 3);
+}
-- 
2.47.0



Re: [PATCHv2] modulo-sched: reject loop conditions when not decrementing with one [PR 116479]

2025-04-25 Thread Jakub Jelinek
On Fri, Apr 25, 2025 at 01:55:58PM +0100, Andre Vieira (lists) wrote:
> Differences from v1:
> 
> Made suggested changes to testcase and moved to target-agnostic area.
> 
> gcc/ChangeLog:
> 
>   PR rtl-optimization/116479
>   * modulo-sched.cc (doloop_register_get): Reject conditions with
>   decrements that are not 1.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/pr116479.c: New test.

Ok.

Jakub



[PATCHv2] modulo-sched: reject loop conditions when not decrementing with one [PR 116479]

2025-04-25 Thread Andre Vieira (lists)

Differences from v1:

Made suggested changes to testcase and moved to target-agnostic area.

gcc/ChangeLog:

PR rtl-optimization/116479
* modulo-sched.cc (doloop_register_get): Reject conditions with
decrements that are not 1.

gcc/testsuite/ChangeLog:

* gcc.dg/pr116479.c: New test.

On 23/04/2025 16:51, Jakub Jelinek wrote:

On Wed, Apr 23, 2025 at 04:46:04PM +0100, Andre Vieira (lists) wrote:

On 23/04/2025 16:22, Jakub Jelinek wrote:

On Wed, Apr 23, 2025 at 03:57:58PM +0100, Andre Vieira (lists) wrote:

+++ b/gcc/testsuite/gcc.target/aarch64/pr116479.c
@@ -0,0 +1,20 @@
+/* PR 116479 */
+/* { dg-do run } */
+/* { dg-additional-options "-O -funroll-loops -finline-stringops -fmodulo-sched --param=max-iterations-computation-cost=637924687 -static -std=c23" } */
+_BitInt (13577) b;
+
+void
+foo (char *ret)
+{
+  __builtin_memset (&b, 4, 697);
+  *ret = 0;
+}
+
+int
+main ()
+{
+  char x;
+  foo (&x);
+  for (unsigned i = 0; i < sizeof (x); i++)
+__builtin_printf ("%02x", i[(volatile unsigned char *) &x]);


Shouldn't these 2 lines instead be
if (x != 0)
  __builtin_abort ();
?



Fair, I copied the testcase verbatim from the PR, the error-mode was a
segfault. But I agree a check !=0 with __builtin_abort here seems more
appropriate.  Any opinions on whether I should move it to dg with a bitint
target?


I think there isn't anything aarch64 specific on the test, so yes,
I'd move it to gcc/testsuite/gcc.dg/bitint-123.c,
/* { dg-do run { target bitint } } */
and wrap b/foo definitions into #if __BITINT_MAXWIDTH__ >= 13577
and the main body as well (just in case some target supports smaller maximum
width than that).
Also, drop -static from dg-additional-options?

Jakub

diff --git a/gcc/modulo-sched.cc b/gcc/modulo-sched.cc
index 08af5a929e148df8b3f6f4f9c4ada564aac22cdb..002346778f447ffe4fbad803872ba03880236e34 100644
--- a/gcc/modulo-sched.cc
+++ b/gcc/modulo-sched.cc
@@ -356,7 +356,13 @@ doloop_register_get (rtx_insn *head, rtx_insn *tail)
 reg = XEXP (condition, 0);
   else if (GET_CODE (XEXP (condition, 0)) == PLUS
   && REG_P (XEXP (XEXP (condition, 0), 0)))
-reg = XEXP (XEXP (condition, 0), 0);
+{
+  if (CONST_INT_P (XEXP (condition, 1))
+ && INTVAL (XEXP (condition, 1)) == -1)
+   reg = XEXP (XEXP (condition, 0), 0);
+  else
+   return NULL_RTX;
+}
   else
 gcc_unreachable ();
 
diff --git a/gcc/testsuite/gcc.dg/pr116479.c b/gcc/testsuite/gcc.dg/pr116479.c
new file mode 100644
index 
..dbbcb9aaf5753c500d56aa2e12c2c34cf8d5d6d4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr116479.c
@@ -0,0 +1,26 @@
+/* PR 116479 */
+/* { dg-do run { target { bitint } } } */
+/* { dg-additional-options "-O -funroll-loops -finline-stringops -fmodulo-sched --param=max-iterations-computation-cost=637924687 -std=c23" } */
+
+#if __BITINT_MAXWIDTH__ >= 13577
+_BitInt (13577) b;
+
+void
+foo (char *ret)
+{
+  __builtin_memset (&b, 4, 697);
+  *ret = 0;
+}
+#endif
+
+int
+main ()
+{
+#if __BITINT_MAXWIDTH__ >= 13577
+  char x;
+  foo (&x);
+  for (unsigned i = 0; i < sizeof (x); i++)
+if (x != 0)
+  __builtin_abort ();
+#endif
+}


Re: [PATCH] Add a bootstrap-native build config

2025-04-25 Thread Andi Kleen

On 2025-04-23 10:18, Richard Biener wrote:

On Tue, Apr 22, 2025 at 5:43 PM Andi Kleen  wrote:


On 2025-04-22 13:22, Richard Biener wrote:
> On Sat, Apr 12, 2025 at 5:09 PM Andi Kleen  wrote:
>>
>> From: Andi Kleen 
>>
>> ... that uses -march=native -mtune=native to build a compiler
>> optimized
>> for the host.
>
> -march=native implies -mtune=native so I think the latter is redundant.

Ok with that change?


Put the list back in the loop.



>
>> config/ChangeLog:
>>
>> * bootstrap-native.mk: New file.
>>
>> gcc/ChangeLog:
>>
>> * doc/install.texi: Document bootstrap-native.
>> ---
>>  config/bootstrap-native.mk | 1 +
>>  gcc/doc/install.texi   | 7 +++
>>  2 files changed, 8 insertions(+)
>>  create mode 100644 config/bootstrap-native.mk
>>
>> diff --git a/config/bootstrap-native.mk b/config/bootstrap-native.mk
>> new file mode 100644
>> index 000..a4a3d859408
>> --- /dev/null
>> +++ b/config/bootstrap-native.mk
>> @@ -0,0 +1 @@
>> +BOOT_CFLAGS := -march=native -mtune=native $(BOOT_CFLAGS)
>
> bootstrap-O3 uses
>
> BOOT_CFLAGS := -O3 $(filter-out -O%, $(BOOT_CFLAGS))
>
> so do you want to filter-out other -march/-mtune/-mcpu options?

I don't think that is needed because these are usually not used (unlike -O)

>
> Some targets know -mcpu= instead of -march=, did you check whether
> any of those have =native?

There are some like Alpha, and others don't have it at all. That is
why the documentation says "if supported".


I see.

So yes, OK with the above change.


Based on Tamar's comment, the original patch seems better because it works
correctly on aarch, which should have many more users than the exotic
architectures. Also, many of them are likely cross-compiler only. So I
would like to commit the original patch if it's OK.


Thanks,
Andi



Richard.


>
>> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
>> index 4973f195daf..04a2256b97a 100644
>> --- a/gcc/doc/install.texi
>> +++ b/gcc/doc/install.texi
>> @@ -3052,6 +3052,13 @@ Removes any @option{-O}-started option from
>> @code{BOOT_CFLAGS}, and adds
>>  @itemx @samp{bootstrap-Og}
>>  Analogous to @code{bootstrap-O1}.
>>
>> +@item @samp{bootstrap-native}
>> +@itemx @samp{bootstrap-native}
>> +Optimize the compiler code for the build host, if supported by the
>> +architecture. Note this only affects the compiler, not the targeted
>> +code. If you want the later, choose options suitable to the target
>> you
>> +are looking for. For example @samp{--with-cpu} would be a good
>> starting point.
>> +
>>  @item @samp{bootstrap-lto}
>>  Enables Link-Time Optimization for host tools during bootstrapping.
>>  @samp{BUILD_CONFIG=bootstrap-lto} is equivalent to adding
>> --
>> 2.47.1
>>


[pushed 1/2] c++: add -fabi-version=21

2025-04-25 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

I'm about to add a bugfix that changes the ABI of noexcept lambdas, so first
let's add the new ABI version.  And I think it's time to update the
compatibility version; let's bump to GCC 13, before the addition of concepts
mangling.
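
For readers unfamiliar with how the requested version is normalised, here is a minimal standalone sketch in C. The function names are hypothetical; `clamp_abi_version` mirrors the clamp macro visible in the c-opts.cc hunk below, and the `1000 + version` relation is inferred from the macro0.C change (1020 becoming 1021), so treat it as an assumption for other versions.

```c
/* Values taken from the c-opts.cc hunk in this patch.  */
enum { LATEST_ABI_VERSION = 21, ABI_COMPAT_DEFAULT = 18 };

/* 0 means "latest", and anything newer than the latest known version
   is also clamped down to the latest.  */
int
clamp_abi_version (int requested)
{
  if (requested == 0 || requested > LATEST_ABI_VERSION)
    return LATEST_ABI_VERSION;
  return requested;
}

/* Assumed mapping from ABI version to the __GXX_ABI_VERSION value,
   matching the 20 -> 1020 and 21 -> 1021 pair in the testcase.  */
int
gxx_abi_version_macro (int version)
{
  return 1000 + version;
}
```

Under these assumptions, `gxx_abi_version_macro (clamp_abi_version (0))` gives 1021, which is the value macro0.C now checks for with -fabi-version=0.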

gcc/ChangeLog:

* common.opt: Add ABI v21.

gcc/c-family/ChangeLog:

* c-opts.cc (c_common_post_options): Bump default ABI to 21
and compat ABI to 18.
---
 gcc/common.opt| 3 +++
 gcc/c-family/c-opts.cc| 6 +++---
 gcc/testsuite/g++.dg/abi/macro0.C | 2 +-
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index e3fa0dacec4..d10a6b7e533 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1061,6 +1061,9 @@ Driver Undocumented
 ; 20: Fix mangling of lambdas in static data member initializers.
 ; Default in G++ 15.
 ;
+; 21:
+; Default in G++ 16.
+;
 ; Additional positive integers will be assigned as new versions of
 ; the ABI become the default version of the ABI.
 fabi-version=
diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index d43b3aef102..40163821948 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -1084,9 +1084,9 @@ c_common_post_options (const char **pfilename)
 
   /* Change flag_abi_version to be the actual current ABI level, for the
  benefit of c_cpp_builtins, and to make comparison simpler.  */
-  const int latest_abi_version = 20;
-  /* Generate compatibility aliases for ABI v13 (8.2) by default.  */
-  const int abi_compat_default = 13;
+  const int latest_abi_version = 21;
+  /* Generate compatibility aliases for ABI v18 (GCC 13) by default.  */
+  const int abi_compat_default = 18;
 
 #define clamp(X) if (X == 0 || X > latest_abi_version) X = latest_abi_version
   clamp (flag_abi_version);
diff --git a/gcc/testsuite/g++.dg/abi/macro0.C b/gcc/testsuite/g++.dg/abi/macro0.C
index f6a57c11ae7..3dd44fcbae9 100644
--- a/gcc/testsuite/g++.dg/abi/macro0.C
+++ b/gcc/testsuite/g++.dg/abi/macro0.C
@@ -1,6 +1,6 @@
 // This testcase will need to be kept in sync with c_common_post_options.
 // { dg-options "-fabi-version=0" }
 
-#if __GXX_ABI_VERSION != 1020
+#if __GXX_ABI_VERSION != 1021
 #error "Incorrect value of __GXX_ABI_VERSION"
 #endif

base-commit: 79ad792bf04dfa1109d1afdae93cee9fb457439b
-- 
2.49.0



Re: [GCC16 stage 1][PATCH v2 0/3] extend "counted_by" attribute to pointer fields of structures

2025-04-25 Thread Kees Cook
On Fri, Apr 25, 2025 at 04:56:52PM +, Qing Zhao wrote:
> 
> 
> > On Apr 24, 2025, at 13:07, Kees Cook  wrote:
> > 
> > On Thu, Apr 24, 2025 at 04:36:14PM +, Qing Zhao wrote:
> >> 
> >>> On Apr 24, 2025, at 11:59, Martin Uecker  wrote:
> >>> 
> >>> Am Donnerstag, dem 24.04.2025 um 15:15 + schrieb Qing Zhao:
>  Hi, 
>  
>  Kees reported a segmentation fault when he used the patch to compile the
>  kernel, and the reduced test case is something like the following:
>  
>  struct f {
>  void *g __attribute__((__counted_by__(h)));
>  long h;
>  };
>  
>  extern struct f *my_alloc (int);
>  
>  int i(void) {
>  struct f *iov = my_alloc (10);
>  int *j = (int *)iov->g;
>  return __builtin_dynamic_object_size(iov->g, 0);
>  }
>  
>  Basically, the problem is related to the pointee type of the pointer
>  array being “void”.
>  As a result, the element size of the array is not available in the IR,
>  which leads to a segmentation fault when calculating the size of the
>  whole object.
>  
>  Although it’s easy to fix this segmentation fault, I am not quite sure
>  what the best solution to this issue is:
>  
>  1. Reject such usage of “counted_by” at the very beginning by issuing a
>  warning to the user, and delete the counted_by attribute from the field.
>  
>  Or:
>  
>  2. Accept such usage, but issue warnings when calculating the
>  object_size in the middle end.
>  
>  Personally, I prefer option 1, since I think that when the pointee
>  type is void, we don’t know the type of the element of the pointer array,
>  so there is no way to decide the size of the pointer array.
>  
>  So, the counted_by information is not useful for
>  __builtin_dynamic_object_size.
>  
>  But I am not sure whether counted_by can still be used by the bounds
>  sanitizer?
>  
>  Thanks for suggestions and help.
> >>> 
> >>> GNU C allows pointer arithmetic and sizeof on void pointers and
> >>> that treats void as having size 1.  So you could also allow counted_by
> >>> and assume as size 1 for void.
> >>> 
> >>> https://gcc.gnu.org/onlinedocs/gcc/Pointer-Arith.html
> >> 
> >> Okay, thanks for the info.
> >> So, 
> >> 1. should we issue warnings when doing this?
> > 
> > Please don't, Linux would very much like to track these allocation sizes
> > still. Performing pointer arithmetic and bounds checking (via __bdos) on
> > "void *" is wanted (and such a calculation was what tripped the
> > segfault).
> > 
> >> 2. If the compilation option is explicitly asking for standard C,
> >>shall we issue warning and delete the counted_by attribute from the 
> >> field?
> > 
> > I think it needs to stay attached for __bdos. And from the looks of it,
> > even array access works with 1-byte values too:
> > 
> > extern void *ptr;
> > void *foo(int num) {
> >return &ptr[num];
> > }
> > 
> > The assembly output of this shows it's doing byte addition. Clang
> > doesn't warn about this, but GCC does:
> > 
> > :5:16: warning: dereferencing 'void *' pointer
> >5 | return &ptr[num];
> >  |^
> > 
> > So, I think even the bounds sanitizer should handle it, even if a
> > warning ultimately gets emitted.
> 
> I tried to come up with a testing case for array sanitizer on void pointers 
> as following:
> 
> #include 
> 
> struct annotated {
>   int b;
>   void *c __attribute__ ((counted_by (b)));
> } *array_annotated;
> 
> void __attribute__((__noinline__)) setup (int annotated_count)
> {
>   array_annotated
> = (struct annotated *)malloc (sizeof (struct annotated));
>   array_annotated->c = malloc (sizeof (char) * annotated_count);
>   array_annotated->b = annotated_count;
> 
>   return;
> }
> 
> void __attribute__((__noinline__)) test (int annotated_index)
> {
>   array_annotated->c[annotated_index] = 2;

Have this be:

void * __attribute__((__noinline__)) test (int annotated_index)
{
return &array_annotated->c[annotated_index];
}

We stay dereferenced, but we can take the address.
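
To make the size-1 convention for "void *" concrete, here is a minimal sketch in GNU C. It relies on the GNU extension that allows arithmetic on void pointers (so it is not standard C), and the helper name `void_step` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Return how many bytes a 'void *' advances when N is added to it.  */
size_t
void_step (size_t n)
{
  static char buf[64];
  assert (n <= sizeof buf);	/* stay within (or one past) the object */

  void *base = buf;
  void *moved = base + n;	/* GNU extension: steps in units of 1 byte */
  return (size_t) ((char *) moved - buf);
}
```

Here `void_step (3)` yields 3, so under this convention a counted_by count of N on a "void *" field would describe N bytes.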

Actually, the index will likely need to be "annotated_index + 1" because
of this bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119132
(I still don't accept this as "invalid", but I have other hills to die on)


-- 
Kees Cook


[PATCH] libstdc++: Fix _Padding_sink in case when predicted is between padwidht and maxwidth [PR109162]

2025-04-25 Thread Tomasz Kamiński
The _Padding_sink was behaving incorrectly when the predicted width (based on
code unit count) was higher than _M_maxwidth, but lower than _M_padwidth.
In this case _M_update() returned without calling _M_force_update() and
computing field width for Unicode encodings, because _M_buffering() returns 'true'.
As a consequence we switched to _M_ignoring() mode with a prefix of smaller length.

We now call _M_force_update() if the predicted width is greater than either
_M_padwidth or _M_maxwidth.

This was detected for an existing test case on a 32-bit architecture.
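
The change in condition can be modelled with plain integers. The sketch below is a simplification in C, not the libstdc++ code: `old_recomputes` assumes the old check only compared the predicted width against the padding width (consistent with the description above, but an assumption about _M_buffering()), while `new_recomputes` mirrors the condition added by this patch. Both function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed model of the old behaviour: the real width was recomputed
   only once the predicted width reached the padding width.  */
bool
old_recomputes (size_t printwidth, size_t padwidth, size_t maxwidth)
{
  (void) maxwidth;	/* not consulted in this model */
  return printwidth >= padwidth;
}

/* Condition from the patch: recompute when the predicted width
   exceeds either limit.  */
bool
new_recomputes (size_t printwidth, size_t padwidth, size_t maxwidth)
{
  return printwidth > padwidth || printwidth > maxwidth;
}
```

For a spec such as {:*>300.200s} (padding width 300, maximum width 200) and a predicted width of 250 code units, `old_recomputes` returns false while `new_recomputes` returns true, which is the case described above.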

libstdc++-v3/ChangeLog:

* include/std/format (_Padding_sink::_M_update): Fixed condition for
calling _M_force_update.
* testsuite/std/format/debug.cc: Add test that reproduces this issue
on 64bit architecture.
---
Tested only debug.cc now for x86_64-linux with and without -m32.
Will perform a full test on Monday, but sending the patch now to inform about the issue.
This is GCC 16 only.

 libstdc++-v3/include/std/format| 6 +++---
 libstdc++-v3/testsuite/std/format/debug.cc | 9 +
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 69d8d189db6..a9e4beca9ca 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -3697,9 +3697,9 @@ namespace __format
   _M_update(size_t __new)
   {
_M_printwidth += __new;
-   if (_M_buffering())
- return true;
-   return _M_force_update();
+   if (_M_printwidth > _M_padwidth || _M_printwidth > _M_maxwidth)
+ return _M_force_update();
+   return true;
   }
 
   void
diff --git a/libstdc++-v3/testsuite/std/format/debug.cc b/libstdc++-v3/testsuite/std/format/debug.cc
index d3402f80f4b..6165a295496 100644
--- a/libstdc++-v3/testsuite/std/format/debug.cc
+++ b/libstdc++-v3/testsuite/std/format/debug.cc
@@ -596,6 +596,10 @@ void test_padding()
   VERIFY( strip_prefix(resv, 48, '*') );
   VERIFY( resv == inv );
 
+  resv = res = std::format("{:*>300.200s}", in);
+  VERIFY( strip_prefix(resv, 108, '*') );
+  VERIFY( resv == inv );
+
   resv = res = std::format("{:*>240.200s}", in);
   VERIFY( strip_prefix(resv, 48, '*') );
   VERIFY( resv == inv );
@@ -678,6 +682,11 @@ void test_padding()
   VERIFY( strip_quotes(resv) );
   VERIFY( resv == inv );
 
+  resv = res = std::format("{:*>300.200?}", in);
+  VERIFY( strip_prefix(resv, 106, '*') );
+  VERIFY( strip_quotes(resv) );
+  VERIFY( resv == inv );
+
   resv = res = std::format("{:*>240.200?}", in);
   VERIFY( strip_prefix(resv, 46, '*') );
   VERIFY( strip_quotes(resv) );
-- 
2.49.0



[PATCH] Add m32c*-*-* to the list of obsolete targets

2025-04-25 Thread Iain Buclaw
Hi,

This patch marks m32c*-*-* targets obsolete in GCC 16.  The target has
not had a maintainer since GCC 9 (r9-1950), and fails to compile even
the simplest of functions since GCC 8 (r8-777, as reported in PR83670).

OK for trunk?

Regards,
Iain.

---
contrib/ChangeLog:

* config-list.mk: Add m32c*-*-* to the list of obsoleted targets.

gcc/ChangeLog:

* config.gcc (LIST): --enable-obsolete for m32c-elf.
---
 contrib/config-list.mk | 2 +-
 gcc/config.gcc | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/config-list.mk b/contrib/config-list.mk
index fc9fc9902bf..58bb4c5c186 100644
--- a/contrib/config-list.mk
+++ b/contrib/config-list.mk
@@ -65,7 +65,7 @@ LIST = \
   ia64-hp-vmsOPT-enable-obsolete iq2000-elf lm32-elf \
   lm32-rtems lm32-uclinux \
   loongarch64-linux-gnuf64 loongarch64-linux-gnuf32 loongarch64-linux-gnusf \
-  m32c-elf m32r-elf m32rle-elf \
+  m32c-elfOPT-enable-obsolete m32r-elf m32rle-elf \
   m68k-elf m68k-netbsdelf \
   m68k-uclinux m68k-linux m68k-rtems \
   mcore-elf microblaze-linux microblaze-elf \
diff --git a/gcc/config.gcc b/gcc/config.gcc
index d98df883fce..6dbe880c9d4 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -273,6 +273,7 @@ esac
 # Obsolete configurations.
 case ${target} in
  ia64*-*-hpux* | ia64*-*-*vms* | ia64*-*-elf*  \
+   | m32c*-*-* \
  )
 if test "x$enable_obsolete" != xyes; then
   echo "*** Configuration ${target} is obsolete." >&2
-- 
2.43.0



Re: [PATCH] c++/modules: Ensure DECL_FRIEND_CONTEXT is streamed [PR119939]

2025-04-25 Thread Jason Merrill

On 4/25/25 10:30 AM, Nathaniel Shead wrote:

Tested so far on x86_64-pc-linux-gnu (just modules.exp), OK for trunk/15
if full bootstrap+regtest succeeds?


OK.


A potentially safer approach that would slightly bloat out the size of
the built modules would be to always stream this variable rather than
having any conditions, but from what I can tell this change should be
sufficient; happy to go that way if you prefer though.

-- >8 --

An instantiated friend function relies on DECL_FRIEND_CONTEXT being set
to be able to recover the template arguments of the class that
instantiated it, despite not being a template itself.  This patch
ensures that this data is streamed even when DECL_CLASS_SCOPE_P is not
true.
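
As a rough illustration of the kind of declaration involved (this is not the
PR's test case, and the names Box and same are made up for the example): a
friend function defined inside a class template is an ordinary function, not
a function template, yet each instantiation of the class produces its own
copy whose meaning depends on the class's template arguments.

```cpp
#include <cassert>

// Hypothetical class template, for illustration only.
template <typename T>
struct Box {
  T value;

  // Not a function template: every Box<T> instantiation injects its own
  // ordinary function, whose body still depends on T.
  friend bool same(const Box& a, const Box& b) { return a.value == b.value; }
};

// Usage: the friends are found by argument-dependent lookup.
void example() {
  Box<int> a{1}, b{1}, c{2};
  Box<char> x{'q'}, y{'q'};
  assert(same(a, b));
  assert(!same(a, c));
  assert(same(x, y));
}
```

A compiler that serializes such a friend has to remember which class
specialization it came from, which is the role the message above assigns to
DECL_FRIEND_CONTEXT.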

PR c++/119939

gcc/cp/ChangeLog:

* module.cc (trees_out::lang_decl_vals): Also stream
lang->u.fn.context when DECL_UNIQUE_FRIEND_P.
(trees_in::lang_decl_vals): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/modules/concept-11_a.H: New test.
* g++.dg/modules/concept-11_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc| 4 ++--
  gcc/testsuite/g++.dg/modules/concept-11_a.H | 9 +
  gcc/testsuite/g++.dg/modules/concept-11_b.C | 9 +
  3 files changed, 20 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/concept-11_a.H
  create mode 100644 gcc/testsuite/g++.dg/modules/concept-11_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 5ff5c462e79..a2e0d6d2571 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -7386,7 +7386,7 @@ trees_out::lang_decl_vals (tree t)
WU (lang->u.fn.ovl_op_code);
}
  
-  if (DECL_CLASS_SCOPE_P (t))

+  if (DECL_CLASS_SCOPE_P (t) || DECL_UNIQUE_FRIEND_P (t))
WT (lang->u.fn.context);
  
if (lang->u.fn.thunk_p)

@@ -7470,7 +7470,7 @@ trees_in::lang_decl_vals (tree t)
lang->u.fn.ovl_op_code = code;
}
  
-  if (DECL_CLASS_SCOPE_P (t))

+  if (DECL_CLASS_SCOPE_P (t) || DECL_UNIQUE_FRIEND_P (t))
RT (lang->u.fn.context);
  
if (lang->u.fn.thunk_p)

diff --git a/gcc/testsuite/g++.dg/modules/concept-11_a.H 
b/gcc/testsuite/g++.dg/modules/concept-11_a.H
new file mode 100644
index 000..45127682812
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/concept-11_a.H
@@ -0,0 +1,9 @@
+// PR c++/119939
+// { dg-additional-options "-fmodule-header -std=c++20" }
+// { dg-module-cmi {} }
+
+template <typename T> concept A = true;
+template <typename T> concept B = requires { T{}; };
+template <typename T> struct S {
+  friend bool operator==(const S&, const S&) requires B<T> = default;
+};
diff --git a/gcc/testsuite/g++.dg/modules/concept-11_b.C 
b/gcc/testsuite/g++.dg/modules/concept-11_b.C
new file mode 100644
index 000..3f6676ff965
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/concept-11_b.C
@@ -0,0 +1,9 @@
+// PR c++/119939
+// { dg-additional-options "-fmodules -std=c++20" }
+
+import "concept-11_a.H";
+
+int main() {
+  S s;
+  s == s;
+}




Re: [PATCH] AArch64: Fold LD1/ST1 with ptrue to LDR/STR for 128-bit VLS

2025-04-25 Thread Richard Sandiford
Jennifer Schmitz  writes:
> If -msve-vector-bits=128, SVE loads and stores (LD1 and ST1) with a
> ptrue predicate can be replaced by neon instructions (LDR and STR),
> thus avoiding the predicate altogether. This also enables formation of
> LDP/STP pairs.
>
> For example, the test cases
>
> svfloat64_t
> ptrue_load (float64_t *x)
> {
>   svbool_t pg = svptrue_b64 ();
>   return svld1_f64 (pg, x);
> }
> void
> ptrue_store (float64_t *x, svfloat64_t data)
> {
>   svbool_t pg = svptrue_b64 ();
>   return svst1_f64 (pg, x, data);
> }
>
> were previously compiled to
> (with -O2 -march=armv8.2-a+sve -msve-vector-bits=128):
>
> ptrue_load:
> ptrue   p3.b, vl16
> ld1d    z0.d, p3/z, [x0]
> ret
> ptrue_store:
> ptrue   p3.b, vl16
> st1d    z0.d, p3, [x0]
> ret
>
> Now they are compiled to:
>
> ptrue_load:
> ldr q0, [x0]
> ret
> ptrue_store:
> str q0, [x0]
> ret
>
> The implementation includes the if-statement
> if (known_eq (BYTES_PER_SVE_VECTOR, 16)
> && known_eq (GET_MODE_SIZE (mode), 16))
>
> which checks for 128-bit VLS and excludes partial modes with a
> mode size < 128 (e.g. VNx2QI).

I think it would be better to use:

if (known_eq (GET_MODE_SIZE (mode), 16)
&& aarch64_classify_vector_mode (mode) == VEC_SVE_DATA

to defend against any partial structure modes that might be added in future.
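
To make the combined condition easier to read, here is a standalone model of
it.  It is only a sketch: MoveInfo, VecClass and can_use_ldr_str are invented
for the example and are not GCC interfaces, and mode_bytes is assumed to be a
plain compile-time constant, which in the real code only holds for
fixed-length (VLS) compilation.

```cpp
#include <cassert>

// Hypothetical stand-in for the result of aarch64_classify_vector_mode.
enum class VecClass { AdvSimd, SveData, SvePartialData };

// Hypothetical summary of one predicated move.
struct MoveInfo {
  bool dest_is_mem;
  bool src_is_mem;
  unsigned mode_bytes;  // assumed to be a known constant size
  VecClass vec_class;
  bool big_endian;
};

// Models the patch's condition with the suggested mode check substituted:
// a memory operand, a full 16-byte SVE data mode, and little-endian.
constexpr bool can_use_ldr_str(const MoveInfo& m) {
  return (m.dest_is_mem || m.src_is_mem)
         && m.mode_bytes == 16
         && m.vec_class == VecClass::SveData
         && !m.big_endian;
}

void example() {
  MoveInfo full_load = {false, true, 16, VecClass::SveData, false};
  MoveInfo partial_load = {false, true, 2, VecClass::SvePartialData, false};
  MoveInfo reg_to_reg = {false, false, 16, VecClass::SveData, false};
  MoveInfo big_endian_store = {true, false, 16, VecClass::SveData, true};
  assert(can_use_ldr_str(full_load));
  assert(!can_use_ldr_str(partial_load));
  assert(!can_use_ldr_str(reg_to_reg));
  assert(!can_use_ldr_str(big_endian_store));
}
```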

>
> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz 
>
> gcc/
>   * config/aarch64/aarch64.cc (aarch64_emit_sve_pred_move):
>   Fold LD1/ST1 with ptrue to LDR/STR for 128-bit VLS.
>
> gcc/testsuite/
>   * gcc.target/aarch64/sve/ldst_ptrue_128_to_neon.c: New test.
>   * gcc.target/aarch64/sve/cond_arith_6.c: Adjust expected outcome.
>   * gcc.target/aarch64/sve/pcs/return_4_128.c: Likewise.
>   * gcc.target/aarch64/sve/pcs/return_5_128.c: Likewise.
>   * gcc.target/aarch64/sve/pcs/struct_3_128.c: Likewise.
> ---
>  gcc/config/aarch64/aarch64.cc | 27 ++--
>  .../gcc.target/aarch64/sve/cond_arith_6.c |  3 +-
>  .../aarch64/sve/ldst_ptrue_128_to_neon.c  | 36 +++
>  .../gcc.target/aarch64/sve/pcs/return_4_128.c | 39 ---
>  .../gcc.target/aarch64/sve/pcs/return_5_128.c | 39 ---
>  .../gcc.target/aarch64/sve/pcs/struct_3_128.c | 64 +--
>  6 files changed, 102 insertions(+), 106 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sve/ldst_ptrue_128_to_neon.c
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index f7bccf532f8..ac01149276b 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -6416,13 +6416,28 @@ aarch64_stack_protect_canary_mem (machine_mode mode, 
> rtx decl_rtl,
>  void
>  aarch64_emit_sve_pred_move (rtx dest, rtx pred, rtx src)
>  {
> -  expand_operand ops[3];
>machine_mode mode = GET_MODE (dest);
> -  create_output_operand (&ops[0], dest, mode);
> -  create_input_operand (&ops[1], pred, GET_MODE(pred));
> -  create_input_operand (&ops[2], src, mode);
> -  temporary_volatile_ok v (true);
> -  expand_insn (code_for_aarch64_pred_mov (mode), 3, ops);
> +  if ((MEM_P (dest) || MEM_P (src))
> +  && known_eq (BYTES_PER_SVE_VECTOR, 16)
> +  && known_eq (GET_MODE_SIZE (mode), 16)
> +  && !BYTES_BIG_ENDIAN)
> +{
> +  rtx tmp = gen_reg_rtx (V16QImode);
> +  emit_move_insn (tmp, lowpart_subreg (V16QImode, src, mode));
> +  if (MEM_P (src))
> + emit_move_insn (dest, lowpart_subreg (mode, tmp, V16QImode));
> +  else
> + emit_move_insn (adjust_address (dest, V16QImode, 0), tmp);

We shouldn't usually need a temporary register for the store case.
Also, using lowpart_subreg for a source memory leads to the best-avoided
subregs of mems when the mem is volatile, due to:

  /* Allow splitting of volatile memory references in case we don't
 have instruction to move the whole thing.  */
  && (! MEM_VOLATILE_P (op)
  || ! have_insn_for (SET, innermode))

in simplify_subreg.  So how about:

  if (MEM_P (src))
{
  rtx tmp = force_reg (V16QImode, adjust_address (src, V16QImode, 0));
  emit_move_insn (dest, lowpart_subreg (mode, tmp, V16QImode));
}
  else
emit_move_insn (adjust_address (dest, V16QImode, 0),
force_lowpart_subreg (V16QImode, src, mode));

It might be good to test the volatile case too.  That case does work
with your patch, since the subreg gets ironed out later.  It's just for
completeness.

Thanks,
Richard

> +}
> +  else
> +{
> +  expand_operand ops[3];
> +  create_output_operand (&ops[0], dest, mode);
> +  create_input_operand (&ops[1], pred, GET_MODE(pred));
> +  create_input_operand (&ops[2], src, mode);
> +  temporary_volatile_ok v (true);
> +  expand_insn (code_for_aarch64_pred_mov (mode), 3, ops)

[PUSHED] gimple: Fix comment before gimple_cond_make_false/gimple_cond_make_true

2025-04-25 Thread Andrew Pinski
I noticed the comments and the code don't match.
The correct form is:
'if (0 != 0)': false
and
'if (1 != 0)': true

That is always NE and always 0 as the second operand.
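
A tiny model of the two canonical forms, for readers who want to see them
side by side.  The types below (Code, Cond) are simplified stand-ins invented
for this sketch, not the real gcond or tree codes.

```cpp
#include <cassert>

// Hypothetical stand-ins for the comparison code and the condition.
enum class Code { EQ, NE };
struct Cond {
  Code code;
  long lhs;
  long rhs;
};

// Canonical "always false": 'if (0 != 0)'.
constexpr Cond make_false() { return {Code::NE, 0, 0}; }

// Canonical "always true": 'if (1 != 0)'.
constexpr Cond make_true() { return {Code::NE, 1, 0}; }

// Evaluate a constant condition.
constexpr bool evaluate(const Cond& c) {
  return c.code == Code::EQ ? c.lhs == c.rhs : c.lhs != c.rhs;
}

static_assert(!evaluate(make_false()), "0 != 0 is false");
static_assert(evaluate(make_true()), "1 != 0 is true");

void example() {
  // Both forms use NE with 0 as the second operand; only the first differs.
  assert(make_false().code == Code::NE && make_true().code == Code::NE);
  assert(make_false().rhs == 0 && make_true().rhs == 0);
  assert(make_false().lhs == 0 && make_true().lhs == 1);
}
```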

Also there is a spello for statement in the comment in
front of gimple_cond_true_p.

Pushed as obvious.

gcc/ChangeLog:

* gimple.h (gimple_cond_make_false): Fix comment.
(gimple_cond_make_true): Likewise.
(gimple_cond_true_p): Fix spello for statement in comment.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple.h b/gcc/gimple.h
index 112e5ae472d..7e3086f5632 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -3829,7 +3829,7 @@ gimple_cond_false_label (const gcond *gs)
 }
 
 
-/* Set the conditional COND_STMT to be of the form 'if (1 == 0)'.  */
+/* Set the conditional COND_STMT to be of the form 'if (0 != 0)'.  */
 
 inline void
 gimple_cond_make_false (gcond *gs)
@@ -3840,7 +3840,7 @@ gimple_cond_make_false (gcond *gs)
 }
 
 
-/* Set the conditional COND_STMT to be of the form 'if (1 == 1)'.  */
+/* Set the conditional COND_STMT to be of the form 'if (1 != 0)'.  */
 
 inline void
 gimple_cond_make_true (gcond *gs)
@@ -3850,7 +3850,7 @@ gimple_cond_make_true (gcond *gs)
   gs->subcode = NE_EXPR;
 }
 
-/* Check if conditional statemente GS is of the form 'if (1 == 1)',
+/* Check if conditional statement GS is of the form 'if (1 == 1)',
   'if (0 == 0)', 'if (1 != 0)' or 'if (0 != 1)' */
 
 inline bool
-- 
2.43.0



Re: [GCC16 stage 1][PATCH v2 0/3] extend "counted_by" attribute to pointer fields of structures

2025-04-25 Thread Qing Zhao


> On Apr 25, 2025, at 13:18, Kees Cook  wrote:
> 
> On Fri, Apr 25, 2025 at 04:56:52PM +, Qing Zhao wrote:
>> 
>> 
>>> On Apr 24, 2025, at 13:07, Kees Cook  wrote:
>>> 
>>> On Thu, Apr 24, 2025 at 04:36:14PM +, Qing Zhao wrote:
 
>>>>> On Apr 24, 2025, at 11:59, Martin Uecker  wrote:
>>>>>
>>>>> On Thursday, 24.04.2025 at 15:15 +, Qing Zhao wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Kees reported a segmentation failure when he used the patch to compile the
>>>>>> kernel,
>>>>>> and the reduced test case is something like the following:
>>>>>>
>>>>>> struct f {
>>>>>> void *g __attribute__((__counted_by__(h)));
>>>>>> long h;
>>>>>> };
>>>>>>
>>>>>> extern struct f *my_alloc (int);
>>>>>>
>>>>>> int i(void) {
>>>>>> struct f *iov = my_alloc (10);
>>>>>> int *j = (int *)iov->g;
>>>>>> return __builtin_dynamic_object_size(iov->g, 0);
>>>>>> }
>>>>>>
>>>>>> Basically, the problem is relating to the pointee type of the pointer
>>>>>> array being “void”,
>>>>>> As a result, the element size of the array is not available in the IR.
>>>>>> Therefore segmentation
>>>>>> fault when calculating the size of the whole object.
>>>>>>
>>>>>> Although it’s easy to fix this segmentation failure, I am not quite sure
>>>>>> what’s the best
>>>>>> solution to this issue:
>>>>>>
>>>>>> 1. Reject such usage of “counted_by” in the very beginning by reporting
>>>>>> warning to the
>>>>>> User, and delete the counted_by attribute from the field.
>>>>>>
>>>>>> Or:
>>>>>>
>>>>>> 2. Accept such usage, but issue warnings when calculating the
>>>>>> object_size in Middle-end.
>>>>>>
>>>>>> Personally, I prefer the above 1 since I think that when the pointee
>>>>>> type is void, we don’t know
>>>>>> The type of the element of the pointer array, there is no way to decide
>>>>>> the size of the pointer array.
>>>>>>
>>>>>> So, the counted_by information is not useful for the
>>>>>> __builtin_dynamic_object_size.
>>>>>>
>>>>>> But I am not sure whether the counted_by still can be used for bound
>>>>>> sanitizer?
>>>>>>
>>>>>> Thanks for suggestions and help.
>>>>>
>>>>> GNU C allows pointer arithmetic and sizeof on void pointers and
>>>>> that treats void as having size 1.  So you could also allow counted_by
>>>>> and assume as size 1 for void.
>>>>>
>>>>> https://gcc.gnu.org/onlinedocs/gcc/Pointer-Arith.html
>>>>
>>>> Okay, thanks for the info.
>>>> So,
>>>> 1. should we issue warnings when doing this?
>>> 
>>> Please don't, Linux would very much like to track these allocation sizes
>>> still. Performing pointer arithmetic and bounds checking (via __bdos) on
>>> "void *" is wanted (and such a calculation was what tripped the
>>> segfault).
>>> 
>>>> 2. If the compilation option is explicitly asking for standard C,
>>>>   shall we issue warning and delete the counted_by attribute from the
>>>> field?
>>> 
>>> I think it needs to stay attached for __bdos. And from the looks of it,
>>> even array access works with 1-byte values too:
>>> 
>>> extern void *ptr;
>>> void *foo(int num) {
>>>   return &ptr[num];
>>> }
>>> 
>>> The assembly output of this shows it's doing byte addition. Clang
>>> doesn't warn about this, but GCC does:
>>> 
>>> :5:16: warning: dereferencing 'void *' pointer
>>>   5 | return &ptr[num];
>>> |^
>>> 
>>> So, I think even the bounds sanitizer should handle it, even if a
>>> warning ultimately gets emitted.
>> 
>> I tried to come up with a testing case for array sanitizer on void pointers 
>> as following:
>> 
>> #include 
>> 
>> struct annotated {
>>  int b;
>>  void *c __attribute__ ((counted_by (b)));
>> } *array_annotated;
>> 
>> void __attribute__((__noinline__)) setup (int annotated_count)
>> {
>>  array_annotated
>>= (struct annotated *)malloc (sizeof (struct annotated));
>>  array_annotated->c = malloc (sizeof (char) * annotated_count);
>>  array_annotated->b = annotated_count;
>> 
>>  return;
>> }
>> 
>> void __attribute__((__noinline__)) test (int annotated_index)
>> {
>>  array_annotated->c[annotated_index] = 2;
> 
> Have this be:
> 
> void * __attribute__((__noinline__)) test (int annotated_index)
> {
> return &array_annotated->c[annotated_index];
> }
> 
> We never dereference, but we can take the address.

This works.  (Is this the only valid usage for this?)

So, you want the bound sanitizer to catch the out-of-bound access for the 
following?

return &array_annotated->c[annotated_index + 1];

Qing
> 
> Actually, the index will likely need to be "annotated_index + 1" because
> of this bug:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119132
> (I still don't accept this as "invalid", but I have other hills to die on)
> 
> 
> -- 
> Kees Cook




Re: [GCC16 stage 1][PATCH v2 0/3] extend "counted_by" attribute to pointer fields of structures

2025-04-25 Thread Bill Wendling
On Thu, Apr 24, 2025 at 4:56 PM Kees Cook  wrote:
> On April 24, 2025 1:44:23 PM PDT, Qing Zhao  wrote:
> >> On Apr 24, 2025, at 15:43, Bill Wendling  wrote:
> >>
> >> On Thu, Apr 24, 2025 at 8:15 AM Qing Zhao  wrote:
> >>>
> >>> Hi,
> >>>
> >>> Kees reported a segmentation failure when he used the patch to compile the
> >>> kernel,
> >>> and the reduced test case is something like the following:
> >>>
> >>> struct f {
> >>> void *g __attribute__((__counted_by__(h)));
> >>> long h;
> >>> };
> >>>
> >>> extern struct f *my_alloc (int);
> >>>
> >>> int i(void) {
> >>> struct f *iov = my_alloc (10);
> >>> int *j = (int *)iov->g;
> >>> return __builtin_dynamic_object_size(iov->g, 0);
> >>> }
> >>>
> >>> Basically, the problem is relating to the pointee type of the pointer 
> >>> array being “void”,
> >>> As a result, the element size of the array is not available in the IR. 
> >>> Therefore segmentation
> >>> fault when calculating the size of the whole object.
> >>>
> >>> Although it’s easy to fix this segmentation failure, I am not quite sure 
> >>> what’s the best
> >>> solution to this issue:
> >>>
> >>> 1. Reject such usage of “counted_by” in the very beginning by reporting 
> >>> warning to the
> >>> User, and delete the counted_by attribute from the field.
> >>>
> >>> Or:
> >>>
> >>> 2. Accept such usage, but issue warnings when calculating the object_size 
> >>> in Middle-end.
> >>>
> >>> Personally, I prefer the above 1 since I think that when the pointee type 
> >>> is void, we don’t know
> >>> The type of the element of the pointer array, there is no way to decide 
> >>> the size of the pointer array.
> >>>
> >>> So, the counted_by information is not useful for the 
> >>> __builtin_dynamic_object_size.
> >>>
> >>> But I am not sure whether the counted_by still can be used for bound 
> >>> sanitizer?
> >>>
> >>> Thanks for suggestions and help.
> >>>
> >> Clang supports __sized_by that can handle a 'void *', where it defaults to 
> >> 'u8'.
>
> I would like to be able to use counted_by (and not sized_by) so that users of 
> the annotation don't need to have to change the marking just because it's 
> "void *". Everything else operates on "void *" as if it were u8 ...
>
I'll float this idea past the Clang people. I don't have any immediate
objections to it.
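
For reference, the "treat void * as u8" behaviour being discussed can be
modelled in a few lines.  This is only a sketch under stated assumptions: the
Annotated struct and both helpers are invented here, the count field merely
stands in for what a counted_by annotation would name, a negative count is
assumed to mean an empty object, and the sketch rejects the one-past-the-end
index, which a real sanitizer may choose to handle differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model: 'data' is treated as if annotated counted_by(count).
struct Annotated {
  long count;
  void* data;
};

// Size in bytes the annotation would imply when void is given size 1
// (the GNU C convention for arithmetic on void pointers).
inline std::size_t implied_size(const Annotated& a) {
  const std::size_t element_size = 1;  // assumption: void behaves like u8
  return a.count > 0 ? static_cast<std::size_t>(a.count) * element_size : 0;
}

// Address of byte 'index', or nullptr when index is outside [0, count).
inline void* checked_address(const Annotated& a, long index) {
  if (index < 0 || index >= a.count)
    return nullptr;
  return static_cast<unsigned char*>(a.data) + index;
}

void example() {
  unsigned char storage[10];
  Annotated a = {10, storage};
  assert(implied_size(a) == 10);
  assert(checked_address(a, 0) == storage);
  assert(checked_address(a, 9) == storage + 9);
  assert(checked_address(a, 10) == nullptr);
  assert(checked_address(a, -1) == nullptr);
}
```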

-bw

> Regardless, ignoring "void *", the rest of my initial testing (of both GCC 
> and Clang) is positive. The test cases are all behaving as expected! Yay! :) 
> I will try to construct some more goofy stuff to find more corner cases.
>
> And at some future point we may want to think about 
> -fsanitize=pointer-overflow using this information too, to catch arithmetic 
> and increments past the bounds...
>
> struct foo {
>   u8 *buf __counted_by(len);
>   int len;
> } str;
> u8 *walk;
> str->buf = malloc(10);
> str->len = 10;
>
> walk = str->buf + 12; // trip!
> for (walk = str->buf; ; walk++) // trip after 10 loops
>;
>
>
> -Kees
>
> --
> Kees Cook


Re: [PATCH] tailcall: Support ERF_RETURNS_ARG for tailcall [PR67797]

2025-04-25 Thread Jakub Jelinek
On Sat, Apr 19, 2025 at 08:33:48PM -0700, Andrew Pinski wrote:
> > +   {
> > + tree other_value = NULL_TREE;
> > + /* If we have a function call that we know the return value is 
> > the same
> > +as the argument, try the argument too. */
> > + int flags = gimple_call_return_flags (call);
> > + if ((flags & ERF_RETURNS_ARG) != 0
> > + && (flags & ERF_RETURN_ARG_MASK) < gimple_call_num_args 
> > (call))
> > +   other_value = gimple_call_arg (call, flags & 
> > ERF_RETURN_ARG_MASK);

I think this needs to verify that other_value's type is uselessly
convertible to TREE_TYPE (ret_var).
Because just relying on operand_equal_p returning false otherwise wouldn't
work.
E.g. if call is memcpy, and you pass say NULL as the first argument to it
(and some pointer and some non-constant as last), but the function returns
unsigned char 0, you don't want to do a tail call just because NULL is equal
to that unsigned char 0.  Sure, on lots of targets/ABIs that will in this
particular case work fine, but isn't guaranteed to work on all.
Perhaps on 32-bit arches if function returns 0ULL?
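
To make the distinction concrete, here is a small sketch; the function names
are made up for the example.  The first function returns exactly the value
memcpy itself returns (its first argument), so the call's result can stand in
for the return value.  The second returns an integer zero of a different
type, which should not be equated with a pointer argument even if that
argument happened to be a null constant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// memcpy returns its first argument, so 'return dst' after the call is the
// same value, with the same type, as 'return memcpy (dst, src, n)'.
void* copy_then_return_dst(void* dst, const void* src, std::size_t n) {
  std::memcpy(dst, src, n);
  return dst;
}

// Here the returned value is an unsigned char, not the pointer: the types
// differ, so the argument must not be substituted for the return value.
unsigned char copy_then_return_zero(void* dst, const void* src,
                                    std::size_t n) {
  std::memcpy(dst, src, n);
  return 0;
}

void example() {
  const char src[4] = "abc";
  char dst[4];
  assert(copy_then_return_dst(dst, src, sizeof src) == dst);
  assert(std::strcmp(dst, "abc") == 0);
  assert(copy_then_return_zero(dst, src, sizeof src) == 0);
}
```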

Jakub



[PATCH] asf: Fix calling of emit_move_insn on registers of different modes [PR119884]

2025-04-25 Thread Konstantinos Eleftheriou
During the base register initialization, when we are eliminating the load
instruction, we were calling `emit_move_insn` on registers of the same
size but of different mode in some cases, causing an ICE.

This patch fixes this, by adding a check for the modes to match before
calling `emit_move_insn`.
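
The idea behind the new guard can be modelled as follows.  This is a sketch
only: Mode, Reg, bitsize and can_emit_plain_move are stand-ins invented for
the example, and what the pass does when the check fails is not modelled.

```cpp
#include <cassert>

// Hypothetical stand-ins for machine modes.
enum class Mode { SI, DI, V8QI };

// Size of each stand-in mode in bits.
constexpr unsigned bitsize(Mode m) { return m == Mode::SI ? 32u : 64u; }

// Hypothetical register: only its mode matters here.
struct Reg {
  Mode mode;
};

// A plain register move is attempted only when a base register exists and
// its mode is identical to the destination mode; equal size is not enough.
constexpr bool can_emit_plain_move(Mode dest_mode, const Reg* base_reg) {
  return base_reg != nullptr && base_reg->mode == dest_mode;
}

void example() {
  Reg scalar = {Mode::DI};
  Reg vector = {Mode::V8QI};
  // Same size, different mode: the case that used to reach the move.
  assert(bitsize(Mode::DI) == bitsize(vector.mode));
  assert(!can_emit_plain_move(Mode::DI, &vector));
  assert(can_emit_plain_move(Mode::DI, &scalar));
  assert(!can_emit_plain_move(Mode::DI, nullptr));
}
```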

Bootstrapped/regtested on AArch64 and x86_64.

PR rtl-optimization/119884

gcc/ChangeLog:

* avoid-store-forwarding.cc (process_store_forwarding):
Added check to ensure that the register modes match
before calling `emit_move_insn`.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr119884.c: New test.
---
 gcc/avoid-store-forwarding.cc|  3 ++-
 gcc/testsuite/gcc.target/i386/pr119884.c | 13 +
 2 files changed, 15 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr119884.c

diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc
index ded8d7e596e0..aec05c22ac37 100644
--- a/gcc/avoid-store-forwarding.cc
+++ b/gcc/avoid-store-forwarding.cc
@@ -244,7 +244,8 @@ process_store_forwarding (vec &stores, 
rtx_insn *load_insn,
GET_MODE_BITSIZE (GET_MODE (it->mov_reg
base_reg = gen_rtx_ZERO_EXTEND (dest_mode, it->mov_reg);
 
- if (base_reg)
+ /* Generate a move instruction, only when the modes match.  */
+ if (base_reg && dest_mode == GET_MODE (base_reg))
{
  rtx_insn *move0 = emit_move_insn (dest, base_reg);
  if (recog_memoized (move0) >= 0)
diff --git a/gcc/testsuite/gcc.target/i386/pr119884.c 
b/gcc/testsuite/gcc.target/i386/pr119884.c
new file mode 100644
index ..34d5b689244d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr119884.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-dse -favoid-store-forwarding" } */
+
+typedef __attribute__((__vector_size__(64))) char V;
+char c;
+V v;
+
+char
+foo()
+{
+  v *= c;
+  return v[0];
+}
\ No newline at end of file
-- 
2.49.0



[committed v2] testsuite: Skip tests incompatible with generic thunk support

2025-04-25 Thread Dimitar Dimitrov
Some backends do not define TARGET_ASM_OUTPUT_MI_THUNK and so rely on the
generic thunk support, which cannot emit code for calling variadic methods of
multiple-inheritance classes.  Example error for pru-unknown-elf:

 .../gcc/gcc/testsuite/g++.dg/ipa/pr83549.C:7:24: error: generic thunk code 
fails for method 'virtual void C::_ZThn4_N1C3fooEz(...)' which uses '...'

Disable the affected tests for all targets which do not define
TARGET_ASM_OUTPUT_MI_THUNK.
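
For readers unfamiliar with the construct, a minimal example of the kind of
code these tests exercise is sketched below.  The class names are invented
for illustration and this is not one of the affected tests; it is expected to
build on targets that implement the thunk hook, such as x86_64.

```cpp
#include <cassert>
#include <cstdarg>

struct Base1 {
  virtual ~Base1() {}
  int pad = 0;
};

struct Base2 {
  virtual ~Base2() {}
  virtual int sum(int n, ...) = 0;
};

// With two polymorphic bases, a call through Base2* typically has to adjust
// 'this' before reaching Derived::sum.  That adjustment is done by a thunk,
// and here the thunk must also forward the variadic arguments untouched.
struct Derived : Base1, Base2 {
  int sum(int n, ...) override {
    va_list ap;
    va_start(ap, n);
    int total = 0;
    for (int i = 0; i < n; ++i)
      total += va_arg(ap, int);
    va_end(ap);
    return total;
  }
};

void example() {
  Derived d;
  Base2* b = &d;  // call goes through the second base
  assert(b->sum(3, 10, 20, 30) == 60);
}
```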

Ensured that test results with and without this patch for
x86_64-pc-linux-gnu are the same.

Pushed to trunk. V1 of this patch was provisionally approved in
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681791.html

gcc/ChangeLog:

* doc/sourcebuild.texi: Document variadic_mi_thunk effective
target check.

gcc/testsuite/ChangeLog:

* g++.dg/ipa/pr83549.C: Require effective target
variadic_mi_thunk.
* g++.dg/ipa/pr83667.C: Ditto.
* g++.dg/torture/pr81812.C: Ditto.
* g++.old-deja/g++.jason/thunk3.C: Ditto.
* lib/target-supports.exp
(check_effective_target_variadic_mi_thunk): New function.

Signed-off-by: Dimitar Dimitrov 
---
Changes since v1:
 - Added documentation to sourcebuild.texi.

 gcc/doc/sourcebuild.texi  |  3 ++
 gcc/testsuite/g++.dg/ipa/pr83549.C|  1 +
 gcc/testsuite/g++.dg/ipa/pr83667.C|  1 +
 gcc/testsuite/g++.dg/torture/pr81812.C|  1 +
 gcc/testsuite/g++.old-deja/g++.jason/thunk3.C |  3 +-
 gcc/testsuite/lib/target-supports.exp | 31 +++
 6 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 0bd98737156..65eeeccb264 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2957,6 +2957,9 @@ Target supports @code{sysconf}.
 @item trampolines
 Target supports trampolines.
 
+@item variadic_mi_thunk
+Target supports C++ virtual variadic function calls with multiple inheritance.
+
 @item two_plus_gigs
 Target supports linking programs with 2+GiB of data.
 
diff --git a/gcc/testsuite/g++.dg/ipa/pr83549.C 
b/gcc/testsuite/g++.dg/ipa/pr83549.C
index 90cf8fe7e0d..3b4547b71df 100644
--- a/gcc/testsuite/g++.dg/ipa/pr83549.C
+++ b/gcc/testsuite/g++.dg/ipa/pr83549.C
@@ -1,5 +1,6 @@
 // PR ipa/83549
 // { dg-do compile }
+// { dg-require-effective-target variadic_mi_thunk }
 // { dg-options "-O2" }
 
 struct A { virtual ~A (); };
diff --git a/gcc/testsuite/g++.dg/ipa/pr83667.C 
b/gcc/testsuite/g++.dg/ipa/pr83667.C
index a8a5a5adb3a..ec91a2ebd33 100644
--- a/gcc/testsuite/g++.dg/ipa/pr83667.C
+++ b/gcc/testsuite/g++.dg/ipa/pr83667.C
@@ -1,6 +1,7 @@
 // { dg-require-alias "" }
 // { dg-options "-fdump-ipa-inline" }
 // c++/83667 ICE dumping a static thunk when TARGET_USE_LOCAL_THUNK_ALIAS_P
+// { dg-require-effective-target variadic_mi_thunk }
 
 
 struct a
diff --git a/gcc/testsuite/g++.dg/torture/pr81812.C 
b/gcc/testsuite/g++.dg/torture/pr81812.C
index b5c621d2beb..80aed8eb2b6 100644
--- a/gcc/testsuite/g++.dg/torture/pr81812.C
+++ b/gcc/testsuite/g++.dg/torture/pr81812.C
@@ -1,4 +1,5 @@
 // { dg-xfail-if "PR108277" { arm_thumb1 } }
+// { dg-require-effective-target variadic_mi_thunk }
 
 struct Error {
   virtual void error(... ) const;
diff --git a/gcc/testsuite/g++.old-deja/g++.jason/thunk3.C 
b/gcc/testsuite/g++.old-deja/g++.jason/thunk3.C
index 4e684f9367d..e894194db1f 100644
--- a/gcc/testsuite/g++.old-deja/g++.jason/thunk3.C
+++ b/gcc/testsuite/g++.old-deja/g++.jason/thunk3.C
@@ -1,5 +1,6 @@
 // { dg-do run }
-// { dg-skip-if "fails with generic thunk support" { rs6000-*-* powerpc-*-eabi 
v850-*-* sh-*-* h8*-*-* xtensa*-*-* m32r*-*-* lm32-*-* } }
+// { dg-skip-if "fails with generic thunk support" { rs6000-*-* powerpc-*-eabi 
sh-*-* xtensa*-*-* } }
+// { dg-require-effective-target variadic_mi_thunk }
 // Test that variadic function calls using thunks work right.
 // Note that this will break on any target that uses the generic thunk
 //  support, because it doesn't support variadic functions.
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 16bb2ae4426..4ccac18accc 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -659,6 +659,37 @@ proc check_effective_target_trampolines { } {
 return 1
 }
 
+# Return 1 if target supports calling virtual variadic methods
+# of multi-inheritance classes.
+
+proc check_effective_target_variadic_mi_thunk { } {
+# These targets do not implement TARGET_ASM_OUTPUT_MI_THUNK.
+if { [istarget avr-*-*]
+|| [istarget bpf-*-*]
+|| [istarget fr30-*-*]
+|| [istarget ft32-*-*]
+|| [istarget amdgcn-*-*]
+|| [istarget h8300-*-*]
+|| [istarget iq2000-*-*]
+|| [istarget lm32-*-*]
+|| [istarget m32c-*-*]
+|| [istarget m32r-*-*]
+|| [istarget mcore-*-*]
+|| [istarget moxie-*-*]
+|| [istarget msp430-*-*]
+||

Re: [PATCH] tailcall: Support ERF_RETURNS_ARG for tailcall [PR67797]

2025-04-25 Thread Andrew Pinski
On Fri, Apr 25, 2025 at 11:54 AM Jakub Jelinek  wrote:
>
> On Sat, Apr 19, 2025 at 08:33:48PM -0700, Andrew Pinski wrote:
> > > +   {
> > > + tree other_value = NULL_TREE;
> > > + /* If we have a function call that we know the return value is 
> > > the same
> > > +as the argument, try the argument too. */
> > > + int flags = gimple_call_return_flags (call);
> > > + if ((flags & ERF_RETURNS_ARG) != 0
> > > + && (flags & ERF_RETURN_ARG_MASK) < gimple_call_num_args 
> > > (call))
> > > +   other_value = gimple_call_arg (call, flags & 
> > > ERF_RETURN_ARG_MASK);
>
> I think this needs to verify that other_value's type is uselessly
> convertible to TREE_TYPE (ret_var).
> Because just relying on operand_equal_p returning false otherwise wouldn't
> work.
> E.g. if call is memcpy, and you pass say NULL as the first argument to it
> (and some pointer and some non-constant as last), but the function returns
> unsigned char 0, you don't want to do a tail call just because NULL is equal
> to that unsigned char 0.  Sure, on lots of targets/ABIs that will in this
> particular case work fine, but isn't guaranteed to work on all.
> Perhaps on 32-bit arches if function returns 0ULL?

Yes, you are right. Along similar lines, I noticed that
useless_type_conversion_p is also too strong in some cases; I filed PR
119945 for that. I will implement the fix for that too. It does not
matter for musttail compatibility since clang requires the same return
types, so we didn't see that come up until now.
Anyway, I will fix that one too.

Thanks,
Andrew

>
> Jakub
>


[committed v2] testsuite: Add require target for SJLJ exception implementation

2025-04-25 Thread Dimitar Dimitrov
Testcases for musttail call optimization fail on pru-unknown-elf:
  FAIL: c-c++-common/musttail14.c  -std=gnu++17 (test for excess errors)
  Excess errors:
  .../gcc/gcc/testsuite/c-c++-common/musttail14.c:37:14: error: cannot 
tail-call: caller uses sjlj exceptions

Silence these errors by disabling the tests if target uses SJLJ for
implementing exceptions.  Use a new effective target check for this.

Ensured that test results with and without this patch for
x86_64-pc-linux-gnu are the same.

Pushed to trunk. V1 of this patch was provisionally approved in
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681802.html

gcc/ChangeLog:

* doc/sourcebuild.texi: Document effective target
using_sjlj_exceptions.

gcc/testsuite/ChangeLog:

* c-c++-common/musttail14.c: Disable test if effective target
using_sjlj_exceptions.
* c-c++-common/musttail22.c: Ditto.
* g++.dg/musttail8.C: Ditto.
* g++.dg/musttail9.C: Ditto.
* g++.dg/opt/musttail3.C: Ditto.
* g++.dg/opt/musttail4.C: Ditto.
* g++.dg/opt/musttail5.C: Ditto.
* g++.dg/opt/pr119613.C: Ditto.
* lib/target-supports.exp
(check_effective_target_using_sjlj_exceptions): New check.

Signed-off-by: Dimitar Dimitrov 
---
Changes since v1:
 - Added documentation to sourcebuild.texi.

 gcc/doc/sourcebuild.texi|  3 +++
 gcc/testsuite/c-c++-common/musttail14.c |  2 +-
 gcc/testsuite/c-c++-common/musttail22.c |  2 +-
 gcc/testsuite/g++.dg/musttail8.C|  2 +-
 gcc/testsuite/g++.dg/musttail9.C|  2 +-
 gcc/testsuite/g++.dg/opt/musttail3.C|  2 +-
 gcc/testsuite/g++.dg/opt/musttail4.C|  2 +-
 gcc/testsuite/g++.dg/opt/musttail5.C|  2 +-
 gcc/testsuite/g++.dg/opt/pr119613.C |  2 +-
 gcc/testsuite/lib/target-supports.exp   | 12 
 10 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index c29cd3f5207..0bd98737156 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -3012,6 +3012,9 @@ Note that this is orthogonal to effective-target 
@code{exceptions_enabled}.
 Testing configuration has exception handling enabled.
 Note that this is orthogonal to effective-target @code{exceptions}.
 
+@item using_sjlj_exceptions
+Target uses @code{setjmp} and @code{longjmp} for implementing exceptions.
+
 @item fgraphite
 Target supports Graphite optimizations.
 
diff --git a/gcc/testsuite/c-c++-common/musttail14.c 
b/gcc/testsuite/c-c++-common/musttail14.c
index 56a52b8638b..5bda742b36f 100644
--- a/gcc/testsuite/c-c++-common/musttail14.c
+++ b/gcc/testsuite/c-c++-common/musttail14.c
@@ -1,5 +1,5 @@
 /* PR tree-optimization/118430 */
-/* { dg-do compile { target musttail } } */
+/* { dg-do compile { target { musttail && { ! using_sjlj_exceptions } } } } */
 /* { dg-options "-O2 -fdump-tree-optimized" } */
 /* { dg-final { scan-tree-dump-times "  \[^\n\r]* = bar \\\(\[^\n\r]*\\\); 
\\\[tail call\\\] \\\[must tail call\\\]" 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "  \[^\n\r]* = freddy \\\(\[^\n\r]*\\\); 
\\\[tail call\\\] \\\[must tail call\\\]" 1 "optimized" } } */
diff --git a/gcc/testsuite/c-c++-common/musttail22.c 
b/gcc/testsuite/c-c++-common/musttail22.c
index eb812494f44..7dc0f199ea9 100644
--- a/gcc/testsuite/c-c++-common/musttail22.c
+++ b/gcc/testsuite/c-c++-common/musttail22.c
@@ -1,5 +1,5 @@
 /* PR tree-optimization/118430 */
-/* { dg-do compile { target musttail } } */
+/* { dg-do compile { target { musttail && { ! using_sjlj_exceptions } } } } */
 /* { dg-options "-O2 -fdump-tree-optimized" } */
 /* { dg-final { scan-tree-dump-times "  \[^\n\r]* = bar \\\(\[^\n\r]*\\\); 
\\\[tail call\\\] \\\[must tail call\\\]" 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "  \[^\n\r]* = freddy \\\(\[^\n\r]*\\\); 
\\\[tail call\\\] \\\[must tail call\\\]" 1 "optimized" } } */
diff --git a/gcc/testsuite/g++.dg/musttail8.C b/gcc/testsuite/g++.dg/musttail8.C
index 0f1b68bd269..18de9c87935 100644
--- a/gcc/testsuite/g++.dg/musttail8.C
+++ b/gcc/testsuite/g++.dg/musttail8.C
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { musttail } } } */
+/* { dg-do compile { target { musttail && { ! using_sjlj_exceptions } } } } */
 /* { dg-options "-std=gnu++11" } */
 /* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
 
diff --git a/gcc/testsuite/g++.dg/musttail9.C b/gcc/testsuite/g++.dg/musttail9.C
index 85937dcdcd3..1c3a744a4e4 100644
--- a/gcc/testsuite/g++.dg/musttail9.C
+++ b/gcc/testsuite/g++.dg/musttail9.C
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { musttail } } } */
+/* { dg-do compile { target { musttail && { ! using_sjlj_exceptions } } } } */
 /* { dg-options "-std=gnu++11" } */
 /* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
 
diff --git a/gcc/testsuite/g++.dg/opt/musttail3.C 
b/gcc/testsuite/g++.dg/opt/musttail3.C
index 1c4e54952b1..a2db4479ec1 100644
--- a/gcc/testsuite/g++.dg/opt/mustt

Re: [PATCH] testsuite: Skip tests incompatible with generic thunk support

2025-04-25 Thread Dimitar Dimitrov
On Thu, Apr 24, 2025 at 11:56:52PM +0200, Bernhard Reutner-Fischer wrote:
> 
> >>* lib/target-supports.exp
> >>(check_effective_target_variadic_mi_thunk): New function.
> >OK.
> >jeff
> >
> 
> Please document new effective_target checks in sourcebuild.texi
> 
> thanks

Pushed with added documentation: 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b1cb7a5e273eb3442259981216295d286a7892c7

Thanks,
Dimitar


[pushed] wwwdocs: my GCC 15 changes

2025-04-25 Thread David Malcolm
I forgot to push this before going on vacation; sorry.

Pushed to trunk.

---
 htdocs/gcc-15/changes.html  | 260 +++-
 htdocs/gcc-15/diag-color-screenshot.png | Bin 0 -> 33062 bytes
 2 files changed, 257 insertions(+), 3 deletions(-)
 create mode 100644 htdocs/gcc-15/diag-color-screenshot.png

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index 3e3c6655..1b7d0e1b 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -38,6 +38,15 @@ You may also want to check out our
 padding bits is desirable, use {} (valid in C23 or C++)
 or use -fzero-init-padding-bits=unions option to restore
 old GCC behavior.
+  https://gcc.gnu.org/onlinedocs/gcc/Static-Analyzer-Options.html";>-fanalyzer
+is still only suitable for analyzing C code.
+In particular, using it on C++ is unlikely to give meaningful output.
+  
+  The json format for
+https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-format";>-fdiagnostics-format=
+is deprecated and may be removed in a future release.
+Users seeking machine-readable diagnostics from GCC should use
+https://gcc.gnu.org/wiki/SARIF";>SARIF.
 
 
 
@@ -79,6 +88,20 @@ You may also want to check out our
 significantly improved. The compiler can now track columnn numbers larger
 than 4096. Very large source files have more accurate location reporting.
   
+  GCC can now emit diagnostics in multiple formats simultaneously,
+via the new option
+https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-add-output";>-fdiagnostics-add-output=.
+For example, use
+https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-add-output";>-fdiagnostics-add-output=sarif
+to get both GCC's classic text output on stderr and
+https://gcc.gnu.org/wiki/SARIF";>SARIF output to a file.
+There is also a new option
+ https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-set-output";>-fdiagnostics-set-output=
+which exposes more control than existing options for some experimental cases.
+These new options are an alternative to the existing
+https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-format";>-fdiagnostics-format=
+which only supports a single output format at a time.
+  
 
 
 
@@ -301,7 +324,7 @@ procedure Initialize (Obj : in out T);
   New constraints have been added for defining symbols or using symbols
   inside of inline assembler, and a new generic operand modifier has
   been added to allow printing those regardless of PIC.  For example:
-
+
 struct S { int a, b, c; };
 extern foo (void);
 extern char var;
@@ -313,7 +336,7 @@ asm (".text; %cc0: mov %cc2, %%r0; .previous;"
  "-s" (var2), /* Tell compiler asm uses var2 variable.  */
   /* "s" would work too but might not work with -fpic.  */
  "i" (sizeof (struct S))); /* It is possible to pass constants to toplevel 
asm.  */
-
+
 
 The "redzone" clobber is now allowed in inline
 assembler statements to describe that the assembler can overwrite
@@ -333,6 +356,42 @@ asm (".text; %cc0: mov %cc2, %%r0; .previous;"
 -Wheader-guard warning has been added and enabled
 in -Wall to warn about some inconsistencies in header
 file guarding macros.
+The C and C++ frontends now provide fix-it hints for some cases of missing
+'&' and '*'.
+For example, note the ampersand fix-it hint in the following:
+
+demo.c: In function 'int main()':
+demo.c:5:22: error: invalid conversion from 'pthread_key_t' {aka 'unsigned int'}
+   to 'pthread_key_t*' {aka 'unsigned int*'} [-fpermissive]
+5 |   pthread_key_create(key, NULL);
+  |  ^~~
+  |  |
+  |  pthread_key_t {aka unsigned int}
+demo.c:5:22: note: possible fix: take the address with '&'
+5 |   pthread_key_create(key, NULL);
+  |  ^~~
+  |  &
+In file included from demo.c:1:
+/usr/include/pthread.h:1122:47: note:   initializing argument 1 of
+   'int pthread_key_create(pthread_key_t*, void (*)(void*))'
+ 1122 | extern int pthread_key_create (pthread_key_t *__key,
+  |~~~^
+
+
+Diagnostic messages referring to attributes now provide URLs
+  to the documentation of the pertinent attributes in sufficiently
+  capable terminals, and in SARIF output.
+
+Diagnostics in which two different things in the source are
+  being contrasted (such as type mismatches) now use color to
+  visually highlight and distinguish the differences, in both the
+  text message of the diagnostic, and the 

RE: [PATCH] RISC-V: Add testcases for unsigned imm vec SAT_SUB form2~4

2025-04-25 Thread Li, Pan2
> Form2:
> void __attribute__((noinline)) \
> vec_sat_u_sub_imm##IMM##_##T##_fmt_2 (T *out, T *in, unsigned limit)  \
> {   \
>   unsigned i;   \
>   for (i = 0; i < limit; i++)   \
>     out[i] = in[i] >= (T)IMM ? in[i] - (T)IMM : 0;  \
> }
>
> Form3:
> void __attribute__((noinline)) \
> vec_sat_u_sub_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit)  \
> {   \
>   unsigned i;   \
>   for (i = 0; i < limit; i++)   \
>     out[i] = (T)IMM > in[i] ? (T)IMM - in[i] : 0;   \
> }
>
> Form4:
> void __attribute__((noinline)) \
> vec_sat_u_sub_imm##IMM##_##T##_fmt_4 (T *out, T *in, unsigned limit)  \
> {   \
>   unsigned i;   \
>   for (i = 0; i < limit; i++)   \
>     out[i] = in[i] > (T)IMM ? in[i] - (T)IMM : 0;   \
> }

Ideally there should be 4 forms here, shouldn't there? I mean both
comparisons together with their commuted variants, 4 forms in total:

in[i] >= IMM
in[i] > IMM

IMM > in[i]
IMM >= in[i]
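
Written out as scalar C, the four comparisons above look roughly like the
sketch below. The element type (uint8_t), the immediate (9) and the function
names are picked for illustration only and are not taken from the patch. Each
pair gives the same result, because the "equal" case yields 0 on either
branch.

```c
#include <stdint.h>

#define IMM 9  /* illustrative immediate, not taken from the patch */

/* in >= IMM and in > IMM: both compute in - IMM, clamped at 0.  */
static uint8_t sub_ge (uint8_t in) { return in >= (uint8_t)IMM ? in - (uint8_t)IMM : 0; }
static uint8_t sub_gt (uint8_t in) { return in >  (uint8_t)IMM ? in - (uint8_t)IMM : 0; }

/* IMM > in and IMM >= in: both compute IMM - in, clamped at 0.  */
static uint8_t rsub_gt (uint8_t in) { return (uint8_t)IMM >  in ? (uint8_t)IMM - in : 0; }
static uint8_t rsub_ge (uint8_t in) { return (uint8_t)IMM >= in ? (uint8_t)IMM - in : 0; }
```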



Otherwise, LGTM. Please also double check if we need to rebase to upstream.

Pan

From: 钟居哲 <juzhe.zh...@rivai.ai>
Sent: Thursday, January 2, 2025 4:04 PM
To: xuli1 <xu...@eswincomputing.com>; gcc-patches <gcc-patches@gcc.gnu.org>
Cc: kito.cheng <kito.ch...@gmail.com>; palmer <pal...@dabbelt.com>; Li, Pan2 <pan2...@intel.com>; xuli1 <xu...@eswincomputing.com>
Subject: Re: [PATCH] RISC-V: Add testcases for unsigned imm vec SAT_SUB form2~4

LGTM.


juzhe.zh...@rivai.ai

From: Li Xu
Date: 2025-01-02 16:02
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; pan2.li; xuli
Subject: [PATCH] RISC-V: Add testcases for unsigned imm vec SAT_SUB form2~4
From: xuli <xu...@eswincomputing.com>

Form2:
void __attribute__((noinline)) \
vec_sat_u_sub_imm##IMM##_##T##_fmt_2 (T *out, T *in, unsigned limit)  \
{   \
  unsigned i;   \
  for (i = 0; i < limit; i++)   \
out[i] = in[i] >= (T)IMM ? in[i] - (T)IMM : 0;  \
}

Form3:
void __attribute__((noinline)) \
vec_sat_u_sub_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit)  \
{   \
  unsigned i;   \
  for (i = 0; i < limit; i++)   \
out[i] = (T)IMM > in[i] ? (T)IMM - in[i] : 0;   \
}

Form4:
void __attribute__((noinline)) \
vec_sat_u_sub_imm##IMM##_##T##_fmt_4 (T *out, T *in, unsigned limit)  \
{   \
  unsigned i;   \
  for (i = 0; i < limit; i++)   \
out[i] = in[i] > (T)IMM ? in[i] - (T)IMM : 0;   \
}

Passed the rv64gcv full regression test.

Signed-off-by: Li Xu <xu...@eswincomputing.com>

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/sat/vec_sat_arith.h: Add unsigned imm vec sat_sub form2~4.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_data.h: Add data for vec sat_sub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-2-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-2-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-2-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-2-u8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-3-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-3-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-3-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-3-u8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-4-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-4-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-4-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-4-u8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-2-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-2-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-2-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-2-u8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-3-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-3-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-3-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat

RE: [PATCH 2/2] RISC-V:Add testcases for signed .SAT_ADD IMM form 1 with IMM = -1.

2025-04-25 Thread Li, Pan2
LGTM for the RISC-V part.

Pan

-----Original Message-----
From: Li Xu  
Sent: Thursday, January 2, 2025 4:35 PM
To: gcc-patches@gcc.gnu.org
Cc: kito.ch...@gmail.com; richard.guent...@gmail.com; tamar.christ...@arm.com; 
juzhe.zh...@rivai.ai; Li, Pan2 ; jeffreya...@gmail.com; 
rdapp@gmail.com; xuli 
Subject: [PATCH 2/2] RISC-V:Add testcases for signed .SAT_ADD IMM form 1 with IMM = -1.

From: xuli 

This patch adds testcases for form 1, as shown below:

T __attribute__((noinline))  \
sat_s_add_imm_##T##_fmt_1##_##INDEX (T x) \
{\
  T sum = (UT)x + (UT)IMM; \
  return (x ^ IMM) < 0 \
? sum\
: (sum ^ x) >= 0 \
  ? sum  \
  : x < 0 ? MIN : MAX;   \
}
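
To make the control flow easier to follow, here is a sketch of the macro
written out by hand for T = int8_t, UT = uint8_t and IMM = -1, with MIN and
MAX taken as INT8_MIN and INT8_MAX. The function name is hypothetical and not
the one generated by the testsuite macro, and the narrowing conversion back to
int8_t relies on GCC's modulo behavior.

```c
#include <stdint.h>

/* Hypothetical hand expansion of form 1 for int8_t with IMM = -1.  */
static int8_t
sat_s_add_imm_int8_m1 (int8_t x)
{
  /* Wrapping add done in the unsigned type, as in the macro.  The
     conversion back to int8_t is modulo 2^8 with GCC.  */
  int8_t sum = (int8_t)(uint8_t)((uint8_t)x + (uint8_t)-1);

  return (x ^ -1) < 0            /* Signs differ: the add cannot overflow.  */
    ? sum
    : (sum ^ x) >= 0             /* Sign preserved: no overflow happened.  */
      ? sum
      : x < 0 ? INT8_MIN : INT8_MAX;  /* Overflow: saturate toward x's sign.  */
}
```

With IMM = -1 the only saturating input is INT8_MIN, which stays at INT8_MIN
instead of wrapping around to INT8_MAX.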

Passed the rv64gcv regression test.

Signed-off-by: Li Xu 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat/sat_s_add_imm-2.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm-1-i16.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-3.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm-1-i32.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-4.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm-1-i64.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-1.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm-1-i8.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-run-2.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm-run-1-i16.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-run-3.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm-run-1-i32.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-run-4.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm-run-1-i64.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-run-1.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm-run-1-i8.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-2-1.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm_type_check-1-i16.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-3-1.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm_type_check-1-i32.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-1-1.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm_type_check-1-i8.c: ...here.
---
 ...at_s_add_imm-2.c => sat_s_add_imm-1-i16.c} | 27 ++-
 ...at_s_add_imm-3.c => sat_s_add_imm-1-i32.c} | 26 +-
 ...at_s_add_imm-4.c => sat_s_add_imm-1-i64.c} | 22 ++-
 ...sat_s_add_imm-1.c => sat_s_add_imm-1-i8.c} | 22 ++-
 ..._imm-run-2.c => sat_s_add_imm-run-1-i16.c} |  6 +
 ..._imm-run-3.c => sat_s_add_imm-run-1-i32.c} |  6 +
 ..._imm-run-4.c => sat_s_add_imm-run-1-i64.c} |  6 +
 ...d_imm-run-1.c => sat_s_add_imm-run-1-i8.c} |  6 +
 ...2-1.c => sat_s_add_imm_type_check-1-i16.c} |  0
 ...3-1.c => sat_s_add_imm_type_check-1-i32.c} |  0
 ...-1-1.c => sat_s_add_imm_type_check-1-i8.c} |  0
 11 files changed, 117 insertions(+), 4 deletions(-)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-2.c => sat_s_add_imm-1-i16.c} (53%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-3.c => sat_s_add_imm-1-i32.c} (53%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-4.c => sat_s_add_imm-1-i64.c} (55%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-1.c => sat_s_add_imm-1-i8.c} (57%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-run-2.c => sat_s_add_imm-run-1-i16.c} (84%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-run-3.c => sat_s_add_imm-run-1-i32.c} (84%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-run-4.c => sat_s_add_imm-run-1-i64.c} (84%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-run-1.c => sat_s_add_imm-run-1-i8.c} (84%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-2-1.c => sat_s_add_imm_type_check-1-i16.c} (100%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-3-1.c => sat_s_add_imm_type_check-1-i32.c} (100%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-1-1.c => sat_s_add_imm_type_check-1-i8.c} (100%)

diff --git a/gcc/testsuite/gcc.target/riscv/sat/sat_s_add_imm-2.c b/gcc/testsuite/gcc.target/riscv/sat/sat_s_add_imm-1-i16.c
similarity index 53%
rename from gcc/testsuite/gcc.target/riscv/sat/sat_s_add_imm-2.c
rename to gcc/testsuite/gcc.target/riscv/sat/sat_s_add_imm-1-i16.c
index 3878286d207..2e23af5d86b 100644
--- a/gcc/testsuite/gcc.target/riscv/sat/sat_s_add_imm-2.c
+++ b/gcc/testsuite/gcc.target/riscv/sat/sat_s_add_imm-1-i16.c
@@ -29,4 +29,29 @@
 */
 DEF_SAT_S_ADD_IMM_FMT_1(0, int16_t, uint16_t, -7, INT16_MIN, INT16_MAX)
 
-/* { dg-final { scan-tree-dump-times ".SAT_ADD " 1 "optimized" } } */
+/*
+** sat_s_add_imm_int16_t_fmt_1_1:
+** addi\s+[atx][0-9]+,\s*a0,\s*-1
+** not\s+[atx][0-9]+,\s*a0
+** xor\s+[atx][0-

Re: [PATCH] simplify-rtx: Simplify `(zero_extend (and x CST))` -> (and (subreg x) CST)

2025-04-25 Thread Jeff Law




On 4/25/25 9:25 PM, Andrew Pinski wrote:

> This adds the simplification of a ZERO_EXTEND of an AND. This optimization
> was already handled in combine via combine_simplify_rtx and the handling
> there of compound_operations (ZERO_EXTRACT).
>
> Built and tested for aarch64-linux-gnu.
> Bootstrapped and tested on x86_64-linux-gnu.
>
> gcc/ChangeLog:
>
> * simplify-rtx.cc (simplify_context::simplify_unary_operation_1) :
> Add simplification for and with a constant.

Funny.  That might allow simplification of a patch Shreya and I were 
about to submit.


On the RISC-V port we do not define SHIFT_COUNT_TRUNCATED so if we want 
to eliminate any explicit masking of the shift count/bit position we 
need patterns which incorporate the masking operation.
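
For illustration only (the function name is made up), this is the kind of
source where the explicit mask shows up. The RV64 shift instructions already
use only the low six bits of the count register, so a pattern that matches the
shift together with the AND allows the AND to be dropped.

```c
#include <stdint.h>

/* The "& 63" is the explicit masking of the shift count.  */
static uint64_t
shl_masked (uint64_t x, unsigned count)
{
  return x << (count & 63);
}
```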


While putting those together recently we found that we often had an 
extraneous zero_extend.  That was the last technical question on that 
patchkit.


OK for the trunk.

jeff



Re: [PATCH] simplify-rtx: Simplify `(zero_extend (and x CST))` -> (and (subreg x) CST)

2025-04-25 Thread Andrew Pinski
On Fri, Apr 25, 2025 at 8:53 PM Jeff Law  wrote:
>
>
>
> On 4/25/25 9:25 PM, Andrew Pinski wrote:
> > This adds the simplification of a ZERO_EXTEND of an AND. This optimization
> > was already handled in combine via combine_simplify_rtx and the handling
> > there of compound_operations (ZERO_EXTRACT).
> >
> > Built and tested for aarch64-linux-gnu.
> > Bootstrapped and tested on x86_64-linux-gnu.
> >
> > gcc/ChangeLog:
> >
> >   * simplify-rtx.cc (simplify_context::simplify_unary_operation_1) :
> >   Add simplification for and with a constant.
> Funny.  That might allow simplification of a patch Shreya and I were
> about to submit.
>
> On the RISC-V port we do not define SHIFT_COUNT_TRUNCATED so if we want
> to eliminate any explicit masking of the shift count/bit position we
> need patterns which incorporate the masking operation.
>
> While putting those together recently we found that we often had an
> extraneous zero_extend.  That was the last technical question on that
> patchkit.
>
> OK for the trunk.

Thanks for the approval. I forgot to mention where I noticed this: it
was while looking into the gcc.target/aarch64/ccmp_3.c failure (which
got fixed a different way),
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118755#c2. That is, it
shows up when dealing with booleans on aarch64. It was unrelated to
shifts, though I have recently seen a similar issue with zero_extend and
shifts too; I can't remember where.
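
As a rough C-level analogy of the identity (this is not the RTL, and the
exact conditions the patch checks are not restated here): masking the low
32 bits and then zero-extending gives the same value as masking in the wide
type, because the zero-extended constant has no high bits set. The function
names and the constant are illustrative.

```c
#include <stdint.h>

#define CST 0xffu  /* illustrative constant; any 32-bit value works here */

/* (zero_extend (and x CST)): mask in the narrow type, then widen.  */
static uint64_t
extend_of_and (uint64_t wide)
{
  uint32_t narrow = (uint32_t) wide;  /* Low part of the wide value.  */
  return (uint64_t) (narrow & CST);
}

/* (and (subreg x) CST): mask directly in the wide type.  */
static uint64_t
and_in_wide (uint64_t wide)
{
  return wide & (uint64_t) CST;
}
```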

Thanks,
Andrew Pinski


>
> jeff
>