date:20250526

[PATCH v1 2/3] RISC-V: Add test for vec_duplicate + vxor.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-05-26 Thread pan2 . li

From: Pan Li 

Add asm dump check test for vec_duplicate + vxor.vv combine to vxor.vx,
with GR2VR cost 0, 2 and 15.

The following test suites passed for this patch.
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check
for vxor.vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add test
data for vxor run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vxor-run-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vxor-run-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vxor-run-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vxor-run-1-i8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vxor-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vxor-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vxor-run-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vxor-run-1-u8.c: New test.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/vx_vf/vx-1-i16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-i32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-i64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-i8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-i16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-i32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-i64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-i8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-i16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-i32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-i64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-i8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx_binary_data.h  | 392 ++
 .../rvv/autovec/vx_vf/vx_vxor-run-1-i16.c |  15 +
 .../rvv/autovec/vx_vf/vx_vxor-run-1-i32.c |  15 +
 .../rvv/autovec/vx_vf/vx_vxor-run-1-i64.c |  15 +
 .../rvv/autovec/vx_vf/vx_vxor-run-1-i8.c  |  15 +
 .../rvv/autovec/vx_vf/vx_vxor-run-1-u16.c |  15 +
 .../rvv/autovec/vx_vf/vx_vxor-run-1-u32.c |  15 +
 .../rvv/autovec/vx_vf/vx_vxor-run-1-u64.c |  15 +
 .../rvv/autovec/vx_vf/vx_vxor-run-1-u8.c  |  15 +
 33 files changed, 560 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vxor-run-1-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vxor-run-1-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vxor-run-1-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vxor-run-1-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vxor-run-1-u16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vxor-run-1-u32.c
 create mode 100644 
gcc/testsuite/gcc.target/risc

[PATCH] i386: Add more forms peephole2 for adc/sbb

2025-05-26 Thread Hu, Lin1

Hi, all

Enabling -mapxf changes some of the adc/sbb patterns.

As a result, gcc emits an extra mov, like
 movq 8(%rdi), %rax
 adcq %rax, 8(%rsi), %rax
 movq %rax, 8(%rdi)
rather than
 movq 8(%rsi), %rax
 adcq %rax, 8(%rdi)

The patch adds more kinds of peephole2 to eliminate the extra mov.
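
For context, adc sequences like the ones above typically come from multi-word
additions with carry propagation. The following is only a minimal sketch of
such source, not one of the pr79173 testcases; the names add_two_limbs and
demo_add_two_limbs are made up for illustration, and whether a given compiler
and option set produces exactly the adc form shown above is not guaranteed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not from the patch: adds the two-limb number q into p
// in place and returns the carry out of the top limb.
inline unsigned
add_two_limbs (std::uint64_t *p, const std::uint64_t *q)
{
  unsigned carry = 0;
  for (int i = 0; i < 2; ++i)
    {
      std::uint64_t t;
      // __builtin_add_overflow returns true when the addition wraps.
      const bool c1 = __builtin_add_overflow (p[i], q[i], &t);
      const bool c2 = __builtin_add_overflow (t, carry, &p[i]);
      // At most one of the two additions can wrap for a given limb.
      carry = c1 | c2;
    }
  return carry;
}

// Usage: (2^64 - 1) + 1 carries into the second limb.
inline void
demo_add_two_limbs ()
{
  std::uint64_t p[2] = {~std::uint64_t{0}, 0};
  const std::uint64_t q[2] = {1, 0};
  const unsigned carry = add_two_limbs (p, q);
  assert (carry == 0 && p[0] == 0 && p[1] == 1);
}
```

The destination limb p[i] is both read and written here, which is the
read-modify-write shape that the memory-destination adc form targets.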

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

BRs,
Lin

gcc/ChangeLog:

* config/i386/i386.md: Add 4 new peephole2 patterns by swapping the
operand order of the original peephole2 patterns to support the new pattern.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr79173-13.c: New test.
* gcc.target/i386/pr79173-14.c: Ditto.
* gcc.target/i386/pr79173-15.c: Ditto.
* gcc.target/i386/pr79173-16.c: Ditto.
* gcc.target/i386/pr79173-17.c: Ditto.
* gcc.target/i386/pr79173-18.c: Ditto.
---
 gcc/config/i386/i386.md| 186 +
 gcc/testsuite/gcc.target/i386/pr79173-13.c |  59 +++
 gcc/testsuite/gcc.target/i386/pr79173-14.c |  59 +++
 gcc/testsuite/gcc.target/i386/pr79173-15.c |  61 +++
 gcc/testsuite/gcc.target/i386/pr79173-16.c |  61 +++
 gcc/testsuite/gcc.target/i386/pr79173-17.c |  32 
 gcc/testsuite/gcc.target/i386/pr79173-18.c |  33 
 7 files changed, 491 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr79173-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr79173-14.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr79173-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr79173-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr79173-17.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr79173-18.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index b7a18d583da..4c9cb81d5f9 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -8719,6 +8719,34 @@ (define_peephole2
  (set (match_dup 1)
   (minus:SWI (match_dup 1) (match_dup 0)))])])
 
+;; Under APX NDD, 'sub reg, mem, reg' is valid.
+;; New format for
+;; mov reg0, mem1
+;; sub reg0, mem2, reg0
+;; mov mem2, reg0
+;; to
+;; mov reg0, mem1
+;; sub mem2, reg0
+(define_peephole2
+  [(set (match_operand:SWI 0 "general_reg_operand")
+   (match_operand:SWI 1 "memory_operand"))
+   (parallel [(set (reg:CC FLAGS_REG)
+  (compare:CC (match_operand:SWI 2 "memory_operand")
+  (match_dup 0)))
+ (set (match_dup 0)
+  (minus:SWI (match_dup 2) (match_dup 0)))])
+   (set (match_dup 2) (match_dup 0))]
+  "TARGET_APX_NDD
+   && (TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+   && peep2_reg_dead_p (3, operands[0])
+   && !reg_overlap_mentioned_p (operands[0], operands[1])
+   && !reg_overlap_mentioned_p (operands[0], operands[2])"
+  [(set (match_dup 0) (match_dup 1))
+   (parallel [(set (reg:CC FLAGS_REG)
+  (compare:CC (match_dup 2) (match_dup 0)))
+ (set (match_dup 2)
+  (minus:SWI (match_dup 2) (match_dup 0)))])])
+
 ;; decl %eax; cmpl $-1, %eax; jne .Lxx; can be optimized into
 ;; subl $1, %eax; jnc .Lxx;
 (define_peephole2
@@ -9166,6 +9194,118 @@ (define_peephole2
   (match_dup 1))
   (match_dup 0)))])])
 
+;; Under APX NDD, 'adc reg, mem, reg' is valid.
+;;
+;; New format for
+;; mov reg0, mem1
+;; adc reg0, mem2, reg0
+;; mov mem1, reg0
+;; to
+;; mov reg0, mem2
+;; adc mem1, reg0
+(define_peephole2
+  [(set (match_operand:SWI48 0 "general_reg_operand")
+   (match_operand:SWI48 1 "memory_operand"))
+   (parallel [(set (reg:CCC FLAGS_REG)
+  (compare:CCC
+(zero_extend:
+  (plus:SWI48
+(plus:SWI48
+  (match_operator:SWI48 5 "ix86_carry_flag_operator"
+[(match_operand 3 "flags_reg_operand")
+ (const_int 0)])
+  (match_operand:SWI48 2 "memory_operand"))
+(match_dup 0)))
+(plus:
+  (match_operator: 4 "ix86_carry_flag_operator"
+[(match_dup 3) (const_int 0)])
+  (zero_extend: (match_dup 0)
+ (set (match_dup 0)
+  (plus:SWI48 (plus:SWI48 (match_op_dup 5
+[(match_dup 3) (const_int 0)])
+  (match_dup 2))
+  (match_dup 0)))])
+   (set (match_dup 1) (match_dup 0))]
+  "TARGET_APX_NDD
+   && (TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+   && peep2_reg_dead_p (3, operands[0])
+   && !reg_overlap_mentioned_p (operands[0], operands[1])
+   && !reg_overlap_mentioned_p (operands[0], operands[2])"
+  [(set (match_dup 0) (match_dup 2))
+   (parallel [(set (reg:CCC FLAGS_REG)
+  (comp

Re: [PATCH] expmed: Prevent non-canonical subreg generation in store_bit_field [PR118873]

2025-05-26 Thread Richard Biener

On Mon, 26 May 2025, Konstantinos Eleftheriou wrote:

> In `store_bit_field_1`, when the value to be written in the bitfield
> and/or the bitfield itself have vector modes, non-canonical subregs
> are generated, like `(subreg:V4SI (reg:V8SI x) 0)`. If one of them is
> a scalar, this happens only when the scalar mode is different than the
> vector's inner mode.
> 
> This patch tries to prevent this, using vec_set patterns when
> possible.

I know almost nothing about this code, but why does the patch
fixup things after the fact rather than avoid generating the
SUBREG in the first place?

ISTR it also (unfortunately) depends on the target which forms
are considered canonical.

I'm also not sure you got endianness right for all possible
values of SUBREG_BYTE.  One more reason to not generate such
subreg in the first place but stick to vec_select/concat.

Richard.

> Bootstrapped/regtested on AArch64 and x86_64.
> 
>   PR rtl-optimization/118873
> 
> gcc/ChangeLog:
> 
>   * expmed.cc (generate_vec_concat): New function.
>   (store_bit_field_1): Check for cases where the value
>   to be written and/or the bitfield have vector modes
>   and try to generate the corresponding vec_set patterns
>   instead of subregs.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/pr118873.c: New test.
> ---
>  gcc/expmed.cc| 174 ++-
>  gcc/testsuite/gcc.target/i386/pr118873.c |  33 +
>  2 files changed, 200 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr118873.c
> 
> diff --git a/gcc/expmed.cc b/gcc/expmed.cc
> index 8cf10d9c73bf..8c641f55b9c6 100644
> --- a/gcc/expmed.cc
> +++ b/gcc/expmed.cc
> @@ -740,6 +740,42 @@ store_bit_field_using_insv (const extraction_insn *insv, 
> rtx op0,
>return false;
>  }
>  
> +/* Helper function for store_bit_field_1, used in the case that the bitfield
> +   and the destination are both vectors.  It extracts the elements of OP from
> +   LOWER_BOUND to UPPER_BOUND using a vec_select and uses a vec_concat to
> +   concatenate the extracted elements with the VALUE.  */
> +
> +rtx
> +generate_vec_concat (machine_mode fieldmode, rtx op, rtx value,
> +  HOST_WIDE_INT lower_bound,
> +  HOST_WIDE_INT upper_bound)
> +{
> +  if (!VECTOR_MODE_P (fieldmode))
> +return NULL_RTX;
> +
> +  rtvec vec = rtvec_alloc (GET_MODE_NUNITS (fieldmode).to_constant ());
> +  machine_mode outermode = GET_MODE (op);
> +
> +  for (HOST_WIDE_INT i = lower_bound; i < upper_bound; ++i)
> +RTVEC_ELT (vec, i) = GEN_INT (i);
> +  rtx par = gen_rtx_PARALLEL (VOIDmode, vec);
> +  rtx select = gen_rtx_VEC_SELECT (fieldmode, op, par);
> +  if (BYTES_BIG_ENDIAN)
> +{
> +  if (lower_bound > 0)
> + return gen_rtx_VEC_CONCAT (outermode, select, value);
> +  else
> + return gen_rtx_VEC_CONCAT (outermode, value, select);
> +}
> +  else
> +{
> +  if (lower_bound > 0)
> + return gen_rtx_VEC_CONCAT (outermode, value, select);
> +  else
> + return gen_rtx_VEC_CONCAT (outermode, select, value);
> +}
> +}
> +
>  /* A subroutine of store_bit_field, with the same arguments.  Return true
> if the operation could be implemented.
>  
> @@ -778,18 +814,142 @@ store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, 
> poly_uint64 bitnum,
>if (VECTOR_MODE_P (outermode)
>&& !MEM_P (op0)
>&& optab_handler (vec_set_optab, outermode) != CODE_FOR_nothing
> -  && fieldmode == innermode
> -  && known_eq (bitsize, GET_MODE_PRECISION (innermode))
>&& multiple_p (bitnum, GET_MODE_PRECISION (innermode), &pos))
>  {
> +  /* Cases where the destination's inner mode is not equal to the
> +  value's mode need special treatment.  */
> +
>class expand_operand ops[3];
>enum insn_code icode = optab_handler (vec_set_optab, outermode);
>  
> -  create_fixed_operand (&ops[0], op0);
> -  create_input_operand (&ops[1], value, innermode);
> -  create_integer_operand (&ops[2], pos);
> -  if (maybe_expand_insn (icode, 3, ops))
> - return true;
> +  /* Subreg expressions should operate on scalars only.  Subregs on
> +  vectors are not canonical.  Extractions from vectors should use
> +  vector operations instead.  */
> +  bool is_non_canon_subreg = GET_CODE (value) == SUBREG
> +  && VECTOR_MODE_P (fieldmode)
> +  && !VECTOR_MODE_P (
> + GET_MODE (SUBREG_REG (value)));
> +
> +  /* If the value to be written is a memory expression or a non-canonical
> +  scalar to vector subreg, don't try to generate a vec_set pattern.
> +  Instead, fall back and try to generate an instruction without
> +  touching the operands.  */
> +  if (!MEM_P (value) && !is_non_canon_subreg)
> +  {
> + if (VECTOR_MODE_P (fieldmode))
> +   {
> + /* Handle the case where both the v

[PATCH] Fixup gcc.target/i386/vect-epilogues-5.c

2025-05-26 Thread Richard Biener

The following adjusts the expected messages after -fopt-info-vec
was improved for (masked) epilogues.

Pushed.

* gcc.target/i386/vect-epilogues-5.c: Adjust.
---
 gcc/testsuite/gcc.target/i386/vect-epilogues-5.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/vect-epilogues-5.c 
b/gcc/testsuite/gcc.target/i386/vect-epilogues-5.c
index 6772cabeb4a..d7c75dfe5cc 100644
--- a/gcc/testsuite/gcc.target/i386/vect-epilogues-5.c
+++ b/gcc/testsuite/gcc.target/i386/vect-epilogues-5.c
@@ -9,5 +9,6 @@ int test (signed char *data, int n)
   return sum;
 }
 
-/* { dg-final { scan-tree-dump-times "loop vectorized using 64 byte vectors" 2 
"vect" } } */
+/* { dg-final { scan-tree-dump-times "loop vectorized using 64 byte vectors" 1 
"vect" } } */
+/* { dg-final { scan-tree-dump-times "epilogue loop vectorized using masked 64 
byte vectors" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-not "loop vectorized using 32 byte vectors" 
"vect" } } */
-- 
2.43.0

Re: simple frm save/restore strategy (was Re: [PATCH 3/6] RISC-V: frm/mode-switch: remove dubious frm edge insertion before call_insn)

2025-05-26 Thread Robin Dapp


2. OK'ish: A bunch of testcases see more reads/writes as PRE of redundant
read/writes is punted to later passes which obviously needs more work.

3. NOK: We lose the ability to instrument local RM writes - especially in the
testsuite.
  e.g.
     a. intrinsic setting a static RM
     b. get_frm() to ensure that happened (inline asm to read out frm)

The tightly coupled restore kicks in before get_frm can be emitted, so get_frm
fails to observe #a. This is a deal breaker for the testsuite, as many of the
frm tests report as failing even if the actual codegen is sane.


I'd say that most of the tests we have right now are written with the existing 
behavior in mind and don't necessarily translate well to a changed behavior.


We mostly test the proper LCM and backup update behavior and backup updates 
don't happen with a local-only approach.


I haven't really understood how the FRM-changing intrinsics are used.

There are two extremes: 

- A single intrinsic using a different rounding mode and a lot of other 
 arithmetic before and after it.  In that case we cannot optimize anyway 
 because the rest must operate with the global rounding mode.


- A longer code sequence, like a function, that uses a different rounding mode 
 and every intrinsic being FRM-changing.  In that case we would need to 
 optimize a lot of saves and restores away until we only have a single save at 
 the beginning and a single restore at the end.


I suppose we don't handle the latter case well right now.  But on the other 
hand it's also not very interesting as explicit fegetround (), fesetround (), 
fesetround () is what the user would/should have done anyway.


So IMHO the only interesting cases are somewhere in the middle.  It would 
really help to have some examples here that could tell us whether the simple 
approach leaves a lot on the table (in terms of redundant save/restore).
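
To make the second extreme concrete, here is a minimal sketch of the explicit
save/set/restore shape mentioned above, written with the standard <cfenv>
functions rather than RVV intrinsics.  The function names are made up for
illustration, and it assumes the target defines FE_UPWARD.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical example: sum N doubles with rounding toward +infinity,
// bracketing the whole sequence with one save and one restore.
// Note: compilers need not honor dynamic rounding modes unless told to
// (e.g. GCC's -frounding-math); this only shows the save/restore shape.
inline double
sum_round_up (const double *p, int n)
{
  const int saved = std::fegetround ();   // save the global mode once
  std::fesetround (FE_UPWARD);            // switch for the whole sequence
  double sum = 0.0;
  for (int i = 0; i < n; ++i)
    sum += p[i];
  std::fesetround (saved);                // restore once at the end
  return sum;
}

// Usage: the caller's rounding mode is unchanged after the call.
inline void
demo_sum_round_up ()
{
  const double v[] = {1.0, 2.0};
  const int before = std::fegetround ();
  const double s = sum_round_up (v, 2);
  assert (s == 3.0);
  assert (std::fegetround () == before);
}
```

A per-operation approach would instead wrap every addition in its own
save/restore pair, which is the redundancy the optimization would have to
remove.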



--
Regards
Robin

Re: [PATCH v3 0/9] Implement layouts from mdspan.

2025-05-26 Thread Luc Grosheintz





On 5/22/25 15:21, Tomasz Kaminski wrote:


For the stride and product computation, we should perform them in
Extent::size_type, not index_type.
The latter may be signed, and we may hit UB in multiplying non-zero
extents, before reaching the zero.



Then I observe the following issues:

1. When computing products, the integer promotion rules can interfere.
For simplicity let's assume that int is a 32 bit integer. Then the
relevant case is `uint16_t` (or unsigned short). Which is unsigned; and
therefore overflow shouldn't be UB. I observe that the expression

  prod *= n;

will overflow as `int` (for large enough `n`). I believe that during the
computation of `prod * n` both sides are promoted to int (because the
range of uint16_t is contained in the range of `int`) and then
overflows, e.g. for n = 2**16-1.

Note that many other small, both signed and unsigned, integers
semantically also overflow, but it's neither UB that's detected by
-fsanitize=undefined, nor a compiler error. Likely because the
"overflow" happens during conversion, which (in C++23) is uniquely
defined in [conv.integral], i.e. not UB.

draft: https://eel.is/c++draft/conv.integral
N4950: 7.3.9 on p. 101

The solution I've come up with is to not use `size_type` but
  make_unsigned_t

Please let me know if there's a better solution to forcing unsigned
math.

Godbolt: https://godbolt.org/z/PnvaYT7vd

2. Let's assume we compute `__extents_prod` safely, e.g. by doing all
math as unsigned integers. There are several places where we need to be careful:

  2.1. layout_{right,left}::stride, these still compute products, that
  overflow and might not be multiplied by `0` to make the answer
  unambiguous. For an empty extent, any number is a valid stride. Hence,
  this only requires that we don't run into UB.

  2.2. The default ctor of layout_stride computes the layout_right
  strides on the fly. We can use __unsigned_prod to keep computing the
  extents in linear time. The only requirement I'm aware of is that the
  strides are the same as those for layout_right (but the actual value
  is not defined directly).

  2.3 layout_stride::required_span_size, the current implementation
  first scans for zeros; and only if there are none does it proceed with
  computing the required span size in index_type. This is safe, because
  all terms in the sum are non-negative and the mandate states that
  the total is a representable number. Hence, all the involved terms are
  representable too.

3. For those interested in what the other two implementations do: both
fail in some subset of the corner cases.

Godbolt: https://godbolt.org/z/vEYxEvMWs
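
The promotion issue from point 1 and one way of forcing unsigned math can be
sketched as follows.  This is only an illustration under assumptions: the
helper name unsigned_prod and the choice of the intermediate type U are made
up here and are not necessarily what the patch series uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

// When int can hold every uint16_t value, both operands of the product are
// promoted to (signed) int, so 65535 * 65535 overflows a 32-bit int (UB).
static_assert (std::numeric_limits<int>::max () < 65535
	       || std::is_same_v<decltype (std::uint16_t{} * std::uint16_t{}),
				 int>);

// Hypothetical helper (the name and the choice of U are assumptions):
// multiply in an unsigned type at least as wide as unsigned int, where
// wraparound is well defined, and convert back only at the end.
template<typename T>
  constexpr T
  unsigned_prod (const T *exts, std::size_t n)
  {
    using U = std::common_type_t<std::make_unsigned_t<T>, unsigned int>;
    U prod = 1;
    for (std::size_t i = 0; i < n; ++i)
      prod *= static_cast<U> (exts[i]);
    return static_cast<T> (prod);
  }

// Usage: a zero extent late in the list still yields 0, with no UB on the
// way there.
inline void
demo_unsigned_prod ()
{
  const std::uint16_t exts[] = {65535, 65535, 0};
  assert (unsigned_prod (exts, 3) == 0);
}
```

The key point is that U never has a lower conversion rank than unsigned int,
so the multiplication is not subject to promotion to a signed type.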

Re: [PATCH v1] libstdc++: Fix bug in default ctor of extents.

2025-05-26 Thread Tomasz Kaminski

On Sat, May 24, 2025 at 1:29 PM Luc Grosheintz 
wrote:

> The array that stores the dynamic extents used to be default
> initialized. The standard requires value initialization. This
> commit fixes the bug and adds a test.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/mdspan: Value initialize the array storing the
> dynamic extents.
> * testsuite/23_containers/mdspan/extents/ctor_default.cc: New
> test.
>
> Signed-off-by: Luc Grosheintz 
> ---
>
LGTM, thanks for noticing and fixing it.
 We also need approval from the maintainer.

>  libstdc++-v3/include/std/mdspan   |  2 +-
>  .../mdspan/extents/ctor_default.cc| 41 +++
>  2 files changed, 42 insertions(+), 1 deletion(-)
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_default.cc
>
> diff --git a/libstdc++-v3/include/std/mdspan
> b/libstdc++-v3/include/std/mdspan
> index 47cfa405e44..bcf2fa60fea 100644
> --- a/libstdc++-v3/include/std/mdspan
> +++ b/libstdc++-v3/include/std/mdspan
> @@ -146,7 +146,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>private:
> using _S_storage = __array_traits<_IndexType,
> _S_rank_dynamic>::_Type;
> -   [[no_unique_address]] _S_storage _M_dynamic_extents;
> +   [[no_unique_address]] _S_storage _M_dynamic_extents{};
>
We know that these are integral types, so we can use {}.

>};
>
>  template
> diff --git
> a/libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_default.cc
> b/libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_default.cc
> new file mode 100644
> index 000..eec300f6896
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_default.cc
> @@ -0,0 +1,41 @@
> +// { dg-do run { target c++23 } }
> +#include 
> +
> +#include 
> +#include 
> +
> +constexpr auto dyn = std::dynamic_extent;
> +
> +template
> +  constexpr void
> +  test_default_ctor()
> +  {
> +Extents exts;
> +for(size_t i = 0; i < Extents::rank(); ++i)
> +  if(exts.static_extent(i) == std::dynamic_extent)
> +   VERIFY(exts.extent(i) == 0);
> +  else
> +   VERIFY(exts.extent(i) == Extents::static_extent(i));
> +  }
> +
> +constexpr bool
> +test_default_ctor_all()
> +{
> +  test_default_ctor>();
> +  test_default_ctor>();
> +  test_default_ctor>();
> +  test_default_ctor>();
> +  test_default_ctor>();
> +  test_default_ctor>();
> +  test_default_ctor>();
> +  test_default_ctor>();
> +  return true;
> +}
> +
> +int
> +main()
> +{
> +  test_default_ctor_all();
> +  static_assert(test_default_ctor_all());
> +  return 0;
> +}
> --
> 2.49.0
>
>
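
The difference between default- and value-initialization that this fix relies
on can be sketched as follows.  The type dyn_storage is a made-up stand-in for
illustration, not the actual libstdc++ member layout.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the storage of the dynamic extents.
template<std::size_t N>
  struct dyn_storage
  {
    // The trailing {} value-initializes the array, so every element is 0.
    // Without it, `dyn_storage<N> s;` would leave the elements indeterminate.
    std::array<std::size_t, N> values{};
  };

constexpr bool
all_zero_after_default_ctor ()
{
  dyn_storage<3> s;   // no explicit initializer; the member's {} still runs
  for (std::size_t v : s.values)
    if (v != 0)
      return false;
  return true;
}

// Reading an indeterminate value is not allowed in a constant expression,
// so this would fail to compile if the {} above were missing.
static_assert (all_zero_after_default_ctor ());

// Usage: the same check at run time.
inline void
demo_value_init ()
{
  assert (all_zero_after_default_ctor ());
}
```

This is also why evaluating the default constructor in a static_assert is a
sufficient test: accidental zeros cannot hide a missing initializer there.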

Re: [PATCH v1 0/3] RISC-V: Combine vec_duplicate + vxor.vv to vxor.vx on GR2VR cost

2025-05-26 Thread Robin Dapp


OK, thanks.

--
Regards
Robin

Re: [PATCH v2] libstdc++: Implement C++26 std::indirect [PR119152]

2025-05-26 Thread Tomasz Kaminski

On Sat, May 24, 2025 at 5:06 PM NightStrike  wrote:

>
>
> On Thu, May 22, 2025 at 08:54 Tomasz Kamiński  wrote:
>
>> From: Jonathan Wakely 
>>
>> This papers implements C++26 std::indirect as specified
>
>
> “This patch”?
>
Indeed. I will fix it before committing. Thank you.

Re: [PATCH] libstdc++: Support std::abs for 128-bit integers and floats [PR96710]

2025-05-26 Thread Tomasz Kaminski

On Fri, May 23, 2025 at 6:58 PM Jonathan Wakely  wrote:

> Currently we only provide std::abs(__int128) and std::abs(__float128)
> for non-strict modes, i.e. -std=gnu++NN but not -std=c++NN.
>
> This defines those overloads for strict modes too, as a small step
> towards resolving PR 96710 (which will eventually mean that __int128
> satisfies the std::integral concept).
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/96710
> * include/bits/std_abs.h [__SIZEOF_INT128__] (abs(__int128)):
> Define.
> [_GLIBCXX_USE_FLOAT128] (abs(__float128)): Enable definition for
> strict modes.
> * testsuite/26_numerics/headers/cmath/82644.cc: Use strict_std
> instead of defining __STRICT_ANSI__.
> * testsuite/26_numerics/headers/cstdlib/abs128.cc: New test.
> ---
>
> Even before we make std::is_integral_v<__int128> true, I don't see why
> we can't overload std::abs for it. Likewise for __float128.
>
> Tested x86_64-linux and sparc-solaris11.3  (-m32 and -m64 for both).
>
LGTM.

>
>  libstdc++-v3/include/bits/std_abs.h  |  9 -
>  .../testsuite/26_numerics/headers/cmath/82644.cc |  3 ++-
>  .../26_numerics/headers/cstdlib/abs128.cc| 16 
>  3 files changed, 26 insertions(+), 2 deletions(-)
>  create mode 100644
> libstdc++-v3/testsuite/26_numerics/headers/cstdlib/abs128.cc
>
> diff --git a/libstdc++-v3/include/bits/std_abs.h
> b/libstdc++-v3/include/bits/std_abs.h
> index 35ec4d374b6e..3d805e6d6f04 100644
> --- a/libstdc++-v3/include/bits/std_abs.h
> +++ b/libstdc++-v3/include/bits/std_abs.h
> @@ -103,6 +103,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>abs(__GLIBCXX_TYPE_INT_N_3 __x) { return __x >= 0 ? __x : -__x; }
>  #endif
>
> +#if defined __STRICT_ANSI__ && defined __SIZEOF_INT128__
> +  // In strict modes __GLIBCXX_TYPE_INT_N_0 is not defined for __int128,
> +  // but we want to always define std::abs(__int128).
> +  __extension__ inline _GLIBCXX_CONSTEXPR __int128
> +  abs(__int128 __x) { return __x >= 0 ? __x : -__x; }
> +#endif
> +
>  #if defined(__STDCPP_FLOAT16_T__) &&
> defined(_GLIBCXX_FLOAT_IS_IEEE_BINARY32)
>constexpr _Float16
>abs(_Float16 __x)
> @@ -137,7 +144,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>{ return __gnu_cxx::__bfloat16_t(__builtin_fabsf(__x)); }
>  #endif
>
> -#if !defined(__STRICT_ANSI__) && defined(_GLIBCXX_USE_FLOAT128)
> +#if defined(_GLIBCXX_USE_FLOAT128)
>__extension__ inline _GLIBCXX_CONSTEXPR
>__float128
>abs(__float128 __x)
> diff --git a/libstdc++-v3/testsuite/26_numerics/headers/cmath/82644.cc
> b/libstdc++-v3/testsuite/26_numerics/headers/cmath/82644.cc
> index 3274f0564c4d..40abb2ced668 100644
> --- a/libstdc++-v3/testsuite/26_numerics/headers/cmath/82644.cc
> +++ b/libstdc++-v3/testsuite/26_numerics/headers/cmath/82644.cc
> @@ -15,8 +15,9 @@
>  // with this library; see the file COPYING3.  If not see
>  // .
>
> -// { dg-options "-D__STDCPP_WANT_MATH_SPEC_FUNCS__ -D__STRICT_ANSI__" }
> +// { dg-options "-D__STDCPP_WANT_MATH_SPEC_FUNCS__" }
>  // { dg-do compile { target c++11 } }
> +// // { dg-add-options strict_std }
>
>  #define conf_hyperg 1
>  #define conf_hypergf 2
> diff --git a/libstdc++-v3/testsuite/26_numerics/headers/cstdlib/abs128.cc
> b/libstdc++-v3/testsuite/26_numerics/headers/cstdlib/abs128.cc
> new file mode 100644
> index ..cfb056219b29
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/26_numerics/headers/cstdlib/abs128.cc
> @@ -0,0 +1,16 @@
> +// { dg-do compile }
> +// { dg-add-options strict_std }
> +
> +#include 
> +
> +template T same_type(T, T) { return T(); }
> +
> +#ifdef __SIZEOF_INT128__
> +__int128 i = 0;
> +__int128 j = same_type(std::abs(i), i);
> +#endif
> +
> +#ifdef __SIZEOF_FLOAT128__
> +__float128 f = 0.0;
> +__float128 g = same_type(std::abs(f), f);
> +#endif
> --
> 2.49.0
>
>

[PATCH] expmed: Prevent non-canonical subreg generation in store_bit_field [PR118873]

2025-05-26 Thread Konstantinos Eleftheriou

In `store_bit_field_1`, when the value to be written in the bitfield
and/or the bitfield itself have vector modes, non-canonical subregs
are generated, like `(subreg:V4SI (reg:V8SI x) 0)`. If one them is
a scalar, this happens only when the scalar mode is different than the
vector's inner mode.

This patch tries to prevent this, using vec_set patterns when
possible.

Bootstrapped/regtested on AArch64 and x86_64.

PR rtl-optimization/118873

gcc/ChangeLog:

* expmed.cc (generate_vec_concat): New function.
(store_bit_field_1): Check for cases where the value
to be written and/or the bitfield have vector modes
and try to generate the corresponding vec_set patterns
instead of subregs.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr118873.c: New test.
---
 gcc/expmed.cc| 174 ++-
 gcc/testsuite/gcc.target/i386/pr118873.c |  33 +
 2 files changed, 200 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr118873.c

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 8cf10d9c73bf..8c641f55b9c6 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -740,6 +740,42 @@ store_bit_field_using_insv (const extraction_insn *insv, 
rtx op0,
   return false;
 }
 
+/* Helper function for store_bit_field_1, used in the case that the bitfield
+   and the destination are both vectors.  It extracts the elements of OP from
+   LOWER_BOUND to UPPER_BOUND using a vec_select and uses a vec_concat to
+   concatenate the extracted elements with the VALUE.  */
+
+rtx
+generate_vec_concat (machine_mode fieldmode, rtx op, rtx value,
+HOST_WIDE_INT lower_bound,
+HOST_WIDE_INT upper_bound)
+{
+  if (!VECTOR_MODE_P (fieldmode))
+return NULL_RTX;
+
+  rtvec vec = rtvec_alloc (GET_MODE_NUNITS (fieldmode).to_constant ());
+  machine_mode outermode = GET_MODE (op);
+
+  for (HOST_WIDE_INT i = lower_bound; i < upper_bound; ++i)
+RTVEC_ELT (vec, i) = GEN_INT (i);
+  rtx par = gen_rtx_PARALLEL (VOIDmode, vec);
+  rtx select = gen_rtx_VEC_SELECT (fieldmode, op, par);
+  if (BYTES_BIG_ENDIAN)
+{
+  if (lower_bound > 0)
+   return gen_rtx_VEC_CONCAT (outermode, select, value);
+  else
+   return gen_rtx_VEC_CONCAT (outermode, value, select);
+}
+  else
+{
+  if (lower_bound > 0)
+   return gen_rtx_VEC_CONCAT (outermode, value, select);
+  else
+   return gen_rtx_VEC_CONCAT (outermode, select, value);
+}
+}
+
 /* A subroutine of store_bit_field, with the same arguments.  Return true
if the operation could be implemented.
 
@@ -778,18 +814,142 @@ store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, 
poly_uint64 bitnum,
   if (VECTOR_MODE_P (outermode)
   && !MEM_P (op0)
   && optab_handler (vec_set_optab, outermode) != CODE_FOR_nothing
-  && fieldmode == innermode
-  && known_eq (bitsize, GET_MODE_PRECISION (innermode))
   && multiple_p (bitnum, GET_MODE_PRECISION (innermode), &pos))
 {
+  /* Cases where the destination's inner mode is not equal to the
+value's mode need special treatment.  */
+
   class expand_operand ops[3];
   enum insn_code icode = optab_handler (vec_set_optab, outermode);
 
-  create_fixed_operand (&ops[0], op0);
-  create_input_operand (&ops[1], value, innermode);
-  create_integer_operand (&ops[2], pos);
-  if (maybe_expand_insn (icode, 3, ops))
-   return true;
+  /* Subreg expressions should operate on scalars only.  Subregs on
+vectors are not canonical.  Extractions from vectors should use
+vector operations instead.  */
+  bool is_non_canon_subreg = GET_CODE (value) == SUBREG
+&& VECTOR_MODE_P (fieldmode)
+&& !VECTOR_MODE_P (
+   GET_MODE (SUBREG_REG (value)));
+
+  /* If the value to be written is a memory expression or a non-canonical
+scalar to vector subreg, don't try to generate a vec_set pattern.
+Instead, fall back and try to generate an instruction without
+touching the operands.  */
+  if (!MEM_P (value) && !is_non_canon_subreg)
+  {
+   if (VECTOR_MODE_P (fieldmode))
+ {
+   /* Handle the case where both the value to be written and the
+  destination are vectors.  */
+
+   HOST_WIDE_INT op_elem_num
+ = GET_MODE_NUNITS (outermode).to_constant ();
+   rtx concat_rtx = value;
+   rtx_insn *last_insn = get_last_insn ();
+   HOST_WIDE_INT index = 0;
+   /* If the store position is not at the start of the bitfield,
+  store the value by selecting the first pos elements of the
+  vector and then placing the value after them, using
+  a vec_concat.  */
+   if (pos.to_constant () > 0)
+ {
+   concat_rtx = generate_vec_concat (fieldmode,

Re: [AUTOFDO] Merge profiles of clones before annotating

2025-05-26 Thread Kugan Vivekanandarajah

> On 26 May 2025, at 5:34 pm, Jan Hubicka  wrote:
> 
> Hi,
> also, please, can you add a testcase?  We should have some coverage for
> auto-fdo specific issues
I was looking for this too. AFAIK we don't do any testing currently.
We could

1. Add gcov files as part of the test. However, this would make updating gcov
versions difficult.
2. Add an execution test that also uses the autofdo tools to generate .gcov.
This would make them slow.
Also we may not be able to match exact profile values and only see if afdo
annotations are there.

Any thoughts?

Thanks,
Kugan

> 
> Honza
> <0002-AUTOFDO-Merge-profiles-of-clones-before-annotating.patch>

Re: [PATCH v3 0/9] Implement layouts from mdspan.

2025-05-26 Thread Tomasz Kaminski

On Fri, May 23, 2025 at 5:25 PM Tomasz Kaminski  wrote:

>
>
> On Fri, May 23, 2025 at 4:22 PM Luc Grosheintz 
> wrote:
>
>>
>>
>> On 5/22/25 15:21, Tomasz Kaminski wrote:
>> > Thanks for working on the patches, they look solid; comments below.
>> >
>> > Could you prepare a separate patch to fix the default-initialization
>> > of extents that you have noticed? The standard requires them to be
>> > value-initialized. Please also add a corresponding test.
>>
>> Yes, I was thinking of something that statically allocates
>> sizeof(Extents) bytes, sets all the bits to 1 and then uses placement
>> new to "run" the default ctor. That way we can be confident that
>> the zeros we're observing aren't accidental.
>>
> Running the constructor at compile time will be sufficient to detect
> whether the zeros were set explicitly, so there is no need to go further.
>
>>
>> We do a renaming within this patch series that affects the value
>> initialization patch. Should I submit a v4 that applies cleanly
>> to master *after* applying the value initialization patch? (And
>> cause conflict if applied to the current state of master.)
>>
> Yes, I think we will land the initialization patch before the rest of the
> series.
>
>>
>> >
>> > Similarly, we have a test for the default constructor of stride_mapping;
>> > I would add them for the other layouts, and check:
>> >  * all extents being static -> depends on mapping
>> >  * first extent being dynamic, and rest static -> all strides are zero
>> >  * middle extent being dynamic
>>
>> Yes, definitely.
>>
>> >
>> > For the stride and product computation, we should perform them in
>> > Extent::size_type, not index_type.
>> > The latter may be signed, and we may hit UB in multiplying non-zero
>> > extents, before reaching the zero.
>>
>> This seems unfortunate, because if I understood correctly, the idea
>> behind allowing signed `index_type`, was that signed integer operations
>> are faster (exactly because they don't need to behave "well" on
>> overflow).
>>
>
>> I've made the changes to use unsigned integers. However, I'd like to add
>> this to my pile of performance questions to be investigated.
>>
> Just to clarify, we only need to perform stride/required_span_size
> computation
> using the size_type. For the operator() we should keep using an index_type,
> as:
>  * we check that required_span_size() fits in index_type
>  * if any extent is zero the operator() cannot be invoked.
>
> This way we will still use signed (potentially faster) arithmetic when
> retrieving elements, and unsigned arithmetic will be limited to stride and
> other functions that perform layout conversions.
> And I am less concerned with them.
>
>>
>> I believe that since constexpr code must be UB free, we should be able
>> to create tests, that can detect overflow in the naive formulas.
>>
>> >
>> > For the is_exhaustive question, I will write to the LWG reflector to
>> > ask the authors and see what their opinion is.
>> > Will keep you posted.
>>
>> Thank you! Somehow I'm failing to figure out a reasonable algorithm
>> to check it in a conformant way, so this is definitely welcome.
>> (Brute-force is not reasonable.)
>>
> I should soon have an LWG issue number. Then you will add a comment in the
> implementation:
>  // _GLIBCXX_RESOLVE_LIB_DEFECTS
>  // XXX. Issue title
> And also mention in the commit message that you implemented it as modified
> by the issue.
>
The issue number is https://cplusplus.github.io/LWG/issue4266. I suggested
also adjusting is_always_exhaustive, so is_exhaustive can be implemented as:
  if constexpr (!is_always_exhaustive())
    if (size_t __size = __mdspan::__fwd_prod(_M_extents, extents_type::rank()))
      return __size == required_span_size();
  return true;
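
As a standalone illustration of the shape of that check (a sketch only, not
the libstdc++ code): `fwd_prod` and `is_exhaustive_sketch` are hypothetical
names, the extents and the required span size are passed in as plain values,
and the always-exhaustive shortcut is left out.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for __mdspan::__fwd_prod: product of the first
// `count` extents, computed in std::size_t.
template <std::size_t N>
constexpr std::size_t
fwd_prod(const std::array<std::size_t, N>& exts, std::size_t count)
{
  std::size_t prod = 1;
  for (std::size_t i = 0; i < count; ++i)
    prod *= exts[i];
  return prod;
}

// Same shape as the check above: a non-empty mapping is treated as
// exhaustive when its size equals the required span size; an empty
// mapping (size 0) falls through to `return true`.
template <std::size_t N>
constexpr bool
is_exhaustive_sketch(const std::array<std::size_t, N>& exts,
                     std::size_t required_span_size)
{
  if (std::size_t size = fwd_prod(exts, N))
    return size == required_span_size;
  return true;
}
```

For example, a 3x4 mapping with strides (5, 1) needs a span of
2*5 + 3*1 + 1 = 14 elements but only has 12, so the sketch reports it as
not exhaustive.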

>
>> >
>> > In a lot of tests we are doing the following, where I believe we could skip
>> > the template parameters and deduce them from the argument:
>> >verify_nothrow_convertible>(
>> > +   std::extents{});
>> > Could you look into doing it?
>>
>> I could only find them in cases like:
>>
>>template
>>const void
>>verify_nothrow_convertible(OExtents oexts);
>>
>> i.e. the first is explicitly given and the second is deduced. This
>> happens when checking properties of mappings of the same layout with
>> different extent_types. I didn't add anything to reduce the amount
>> of code when Extents == OExtents, because it felt non-uniform (I
>> don't like remembering special cases) and a little error prone. Let
>> me know if you see it differently and would like me to add a default
>> template argument to handle the case Extents == OExtents.
>>
> In that case, I believe this is OK.
>
>>
>> >
>> > Regards,
>> > Tomasz
>> >
>> > On Thu, May 22, 2025 at 2:21 PM Tomasz Kaminski 
>> wrote:
>> >
>> >>
>> >>
>> >> On Wed, May 21, 2025 at 4:21 PM Luc Grosheintz <
>> luc.groshei...@gmail.com>
>> >> wrote:
>> >>
>> >>> It's missing the "registration" of the three new classes in
>> >>> std.cc.in.
>> >>>
>> >> Please remember to add it in next revisions.
>> >>
>> >>>
>> >>>

[PATCH] or1k: Support long jump offsets with -mcmodel=large

2025-05-26 Thread Stafford Horne

The -mcmodel=large option was originally added to handle generation of
large binaries with large PLTs.  However, when compiling the Linux
kernel with allyesconfig the output binary is so large that the jump
instruction 26-bit immediate is not large enough to store the jump
offset to some symbols when linking.  Example error:

  relocation truncated to fit: R_OR1K_INSN_REL_26 against symbol `do_fpe_trap' 
defined in .text section in arch/openrisc/kernel/traps.o

We fix this by forcing jump offsets to registers when -mcmodel=large.

Note, to get the Linux kernel allyesconfig config to work with OpenRISC,
this patch is needed along with some other patches to the Linux
hand-coded assembly bits.

gcc/ChangeLog:

* config/or1k/predicates.md (call_insn_operand): Add condition
  to not allow symbol_ref operands with TARGET_CMODEL_LARGE.
* config/or1k/or1k.opt: Document new -mcmodel=large
  implications.
* doc/invoke.texi: Likewise.
---
If anyone is interested in the kernel patches I have them on this branch:

 https://github.com/stffrdhrn/linux/commits/or1k-allyesconfig-2/

 gcc/config/or1k/or1k.opt  | 4 ++--
 gcc/config/or1k/predicates.md | 3 ++-
 gcc/doc/invoke.texi   | 7 ---
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/gcc/config/or1k/or1k.opt b/gcc/config/or1k/or1k.opt
index 00c55603300..d252de08204 100644
--- a/gcc/config/or1k/or1k.opt
+++ b/gcc/config/or1k/or1k.opt
@@ -69,8 +69,8 @@ are used to perform unordered floating point compare and set 
flag operations.
 mcmodel=
 Target RejectNegative Joined Enum(or1k_cmodel_type) Var(or1k_code_model) 
Init(CMODEL_SMALL)
 Specify the code model used for accessing memory addresses.  Specifying large
-enables generating binaries with large global offset tables.  By default the
-value is small.
+enables generating binaries with large global offset tables and calling
+functions anywhere in an executable.  By default the value is small.
 
 Enum
 Name(or1k_cmodel_type) Type(enum or1k_cmodel_type)
diff --git a/gcc/config/or1k/predicates.md b/gcc/config/or1k/predicates.md
index 11bb5181436..144f4d7b577 100644
--- a/gcc/config/or1k/predicates.md
+++ b/gcc/config/or1k/predicates.md
@@ -61,7 +61,8 @@
 (match_test "TARGET_ROR"
 
 (define_predicate "call_insn_operand"
-  (ior (match_code "symbol_ref")
+  (ior (and (match_code "symbol_ref")
+   (match_test "!TARGET_CMODEL_LARGE"))
(match_operand 0 "register_operand")))
 
 (define_predicate "high_operand"
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ab89686256d..cfbb9eda083 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -30879,12 +30879,13 @@ to store the immediate to a register first.
 @opindex mcmodel=
 @opindex mcmodel=small
 @item -mcmodel=small
-Generate OpenRISC code for the small model: The GOT is limited to 64k. This is
-the default model.
+Generate OpenRISC code for the small model: The GOT is limited to 64k and
+function call jumps are limited to 64M offsets. This is the default model.
 
 @opindex mcmodel=large
 @item -mcmodel=large
-Generate OpenRISC code for the large model: The GOT may grow up to 4G in size.
+Generate OpenRISC code for the large model: The GOT may grow up to 4G in size
+and function call jumps can target the full 4G address space.
 
 
 @end table
-- 
2.49.0

Re: [PATCH v3 0/9] Implement layouts from mdspan.

2025-05-26 Thread Tomasz Kaminski

On Mon, May 26, 2025 at 11:35 AM Luc Grosheintz 
wrote:

>
>
> On 5/22/25 15:21, Tomasz Kaminski wrote:
> >
> > For the stride and product computation, we should perform them in
> > Extent::size_type, not index_type.
> > The latter may be signed, and we may hit UB in multiplying non-zero
> > extents, before reaching the zero.
> >
>
> Then I observe the following issues:
>
> 1. When computing products, the integer promotion rules can interfere.
> For simplicity let's assume that int is a 32 bit integer. Then the
> relevant case is `uint16_t` (or unsigned short). Which is unsigned; and
> therefore overflow shouldn't be UB. I observe that the expression
>
>prod *= n;
>
> will overflow as `int` (for large enough `n`). I believe that during the
> computation of `prod * n` both sides are promoted to int (because the
> range of uint16_t is contained in the range of `int`) and then
> overflows, e.g. for n = 2**16-1.
>
> Note that many other small, both signed and unsigned, integers
> semantically also overflow, but it's neither UB that's detected by
> -fsanitize=undefined, nor a compiler error. Likely because the
> "overflow" happens during conversion, which (in C++23) is uniquely
> defined in [conv.integral], i.e. not UB.
>
> draft: https://eel.is/c++draft/conv.integral
> N4950: 7.3.9 on p. 101
>
> The solution I've come up with is to not use `size_type` but
>make_unsigned_t
>
> Please let me know if there's a better solution to forcing unsigned
> math.
>
I think at this point we should perform stride computation in std::size_t.
Because accessors are defined to accept size_t, the required_span_size()
cannot be greater than the maximum of size_t, and that limits our product
of extents.
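
A minimal sketch of that idea, assuming a platform where int is wider than
16 bits: the first assertion shows the promotion described in the quoted
text, and `extents_prod` is a hypothetical helper (not a libstdc++ name)
that multiplies in std::size_t regardless of the index type.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Assumption: int is wider than 16 bits. Both uint16_t operands are then
// promoted to (signed) int before the multiplication, so `prod * n` can
// overflow as int even though uint16_t itself is unsigned.
static_assert(
  std::is_same<decltype(std::uint16_t{} * std::uint16_t{}), int>::value,
  "uint16_t operands are promoted to int");

// Hypothetical helper: convert every extent to std::size_t first, so the
// multiplication is unsigned and never happens in a promoted int.
template <typename IndexType, std::size_t N>
constexpr std::size_t
extents_prod(const std::array<IndexType, N>& exts)
{
  std::size_t prod = 1;
  for (std::size_t i = 0; i < N; ++i)
    prod *= static_cast<std::size_t>(exts[i]);
  return prod;
}
```

Because the helper is constexpr, a static_assert on it doubles as a check
that the evaluation is free of UB, which is the testing approach suggested
earlier in the thread.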

>
> Godbolt: https://godbolt.org/z/PnvaYT7vd
>
> 2. Let's assume we compute `__extents_prod` safely, e.g. by doing all
> math as unsigned integers. There's several places we need to be careful:
>
>2.1. layout_{right,left}::stride, these still compute products, that
>overflow and might not be multiplied by `0` to make the answer
>unambiguous. For an empty extent, any number is a valid stride. Hence,
>this only requires that we don't run into UB.
>
>2.2. The default ctor of layout_stride computes the layout_right
>strides on the fly. We can use __unsigned_prod to keep computing the
>extents in linear time. The only requirement I'm aware of is that the
>strides are the same as those for layout_right (but the actual value
>is not defined directly).
>
>2.3 layout_stride::required_span_size, the current implementation
>first scans for zeros; and only if there are none does it proceed with
>computing the required span size in index_type. This is safe, because
>all terms in the sum are non-negative and the mandate states that
>the total is a representable number. Hence, all the involved terms are
>representable too.
>
> 3. For those interested in what the other two implementations do: both
> fail in some subset of the corner cases.
>
> Godbolt: https://godbolt.org/z/vEYxEvMWs
>
>

Re: [AUTOFDO] Merge profiles of clones before annotating

2025-05-26 Thread Jan Hubicka

> 
> 
> > On 26 May 2025, at 5:34 pm, Jan Hubicka  wrote:
> > 
> > Hi,
> > also, please, can you add a testcase?  We should have some coverage for
> > auto-fdo specific issues
> I was looking for this too. AFAIK we don't do any testing currently.
> We could
>
> 1. Add gcov files as part of the test. However, this would make updating gcov
> versions difficult.
> 2. Add an execution test that also uses the autofdo tools to generate .gcov.
> This would make the tests slow.
> Also, we may not be able to match exact profile values and could only check
> whether afdo annotations are there.

There is testsuite coverage, but it is currently enabled only for Intel-based
x86_64 CPUs and I think no one runs it regularly.  To get AutoFDO
into good shape we definitely need to enable it on more setups and also
start testing/benchmarking regularly.

For a long time I had no easy access to a CPU with AutoFDO support, but
now I have a zen3-based desktop and also use a zen5-based box for testing.
I think the attached patch makes the testsuite do the right thing on AMD
Zen 3, 4 and 5.

I get the following failures on Zen5:
FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining add1/1 
into main/4."
FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining sub1/2 
into main/4."
FAIL: gcc.dg/tree-prof/inliner-1.c scan-tree-dump optimized "cold_function ..;"
FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 times"
FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump cunroll "Peeled loop 2, 1 times"

while on an Intel CPU I get:
FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining add1/1 
into main/4."
FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining sub1/2 
into main/4."
FAIL: gcc.dg/tree-prof/indir-call-prof.c scan-tree-dump-not optimized "Invalid 
sum"
FAIL: gcc.dg/tree-prof/inliner-1.c scan-tree-dump optimized "cold_function ..;"
FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 times"
FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump cunroll "Peeled loop 2, 1 times"
FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump ch2 "Peeled likely exits: likely 
decreased number of iterations of loop 1"
FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump ch2 "Peeled all exits: decreased 
number of iterations of loop 2"
FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-tree-dump-not optimized 
"Invalid sum"

I have not yet looked into where the differences come from.

Andy, does the patch make sense to you?  I simply followed the kernel's
auto-fdo instructions for clang and built the current git version of
create_gcov.  In the past I always had trouble getting create_gcov
working with the version of perf distributed by openSUSE, but this time it
seems to work even though it complains:

[WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1322]
 Skipping 4228 bytes of metadata: HEADER_CPU_TOPOLOGY
[WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069]
 Skipping unsupported event PERF_RECORD_ID_INDEX
[WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069]
 Skipping unsupported event PERF_RECORD_EVENT_UPDATE
[WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069]
 Skipping unsupported event PERF_RECORD_CPU_MAP
[WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069]
 Skipping unsupported event UNKNOWN_EVENT_82
[INFO:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1060]
 Number of events stored: 2178
[INFO:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_parser.cc:272]
 Parser processed: 5 MMAP/MMAP2 events, 2 COMM events, 0 FORK events, 1 EXIT 
events, 2108 SAMPLE events, 2099 of these were mapped, 0 SAMPLE events with a 
data address, 0 of these were mapped
WARNING: Logging before InitGoogleLogging() is written to STDERR
I20250525 22:10:18.478610 1692721 sample_reader.cc:289] No buildid found in 
binary
W20250525 22:10:18.479000 1692721 sample_reader.cc:345] Bogus LBR data (range 
is negative): 1050->0 index=4
W20250525 22:10:18.479007 1692721 sample_reader.cc:345] Bogus LBR data (range 
is negative): 1057->0 index=2
W20250525 22:10:18.479010 1692721 sample_reader.cc:345] Bogus LBR data (range 
is negative): 1050->0 index=6
W20250525 22:10:18.479013 1692721 sample_reader.cc:345] Bogus LBR data (range 
is negative): 1050->0 index=6
W20250525 22:10:18.479017 1692721 sample_reader.cc:345] Bogus LBR data (range 
is negative): 1057->0 index=8
W20250525 22:10:18.479019 1692721 sample_reader.cc:345] Bogus LBR data (range 
is negative): 1050->0 index=c
I20250525 22:10:18.479228 1692721 symbol_map.cc:477] Adding loadable exec 
segment: offset=1000 vaddr=401000

Did someone run SPEC recently? I made auto-FDO spec config and tested
-Ofast with ipa-icf, ipa-cp-clone and ipa-sra disabled (to get rid of
the cl

Fwd: [PATCH] testsuite: Fix up dg-do-if

2025-05-26 Thread Xi Ruoyao

I forgot to send this to the list :(.

 Forwarded Message 
From: Xi Ruoyao 
To: Alexandre Oliva 
Cc: Xi Ruoyao 
Subject: [PATCH] testsuite: Fix up dg-do-if
Date: 05/26/25 17:59:32

The line number needs to be passed to dg-do, instead of being stripped.

Fixes 'compile: syntax error for " dg-do-if 1 compile { target {
sse2_runtime && { ! sse4_runtime } } } "' errors in
vect-simd-clone-{16f,17f,18f,20}.c on an old K8.

gcc/testsuite/ChangeLog:

* lib/target-supports-dg.exp (dg-do-if): Pass the line number
to dg-do.
---

Ok for trunk and gcc-15 branch?

 gcc/testsuite/lib/target-supports-dg.exp | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports-dg.exp
b/gcc/testsuite/lib/target-supports-dg.exp
index 422ea838084..2dca8e15c42 100644
--- a/gcc/testsuite/lib/target-supports-dg.exp
+++ b/gcc/testsuite/lib/target-supports-dg.exp
@@ -422,9 +422,8 @@ proc check-flags { args } {
 # (possibly the default) prevails.
 
 proc dg-do-if { args } {
-    set args [lreplace $args 0 0]
 # Verify the number of arguments.
-    if { [llength $args] != 2 } {
+    if { [llength $args] != 3 } {
    error "syntax error, need a single action and target selector"
 }
 
@@ -435,7 +434,7 @@ proc dg-do-if { args } {
 }
 
 # Evaluate selector, return if it does not match.
-    switch [dg-process-target-1 [lindex $args 1]] {
+    switch [dg-process-target-1 [lindex $args 2]] {
    "N" { return }
    "P" { return }
 }
-- 
2.49.0


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [AUTOFDO] Merge profiles of clones before annotating

2025-05-26 Thread Dhruv Chawla


On 26/05/25 12:58, Jan Hubicka wrote:

Hi,

Ping?

Sorry for the delay. I think I finally got auto-fdo running on my box
and indeed I see that if a function is cloned later, the profile is lost.
There are .suffixes added before the afdo pass (such as openmp offloading or
nested functions) and there are .suffixes added after afdo (by ipa
cloning and LTO privatization).  I see we want to merge those created by
ipa cloning (after the afdo pass).  But I do not think we want to merge
those for e.g. nested functions, since those are actually different
functions, or for openmp offloading.

I also wonder what happens with LTO privatization - i.e. how do we look up
which static function the symbol belongs to?


Hi,

LTO privatization is a known issue - I had filed a bugzilla report for it
at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120229. One method to fix
it is to attach file names to the GCOV profile data, and we are working on
a patch to do this.

--
Regards,
Dhruv



Overwriting the data by the last clone is definitely bad, so the patch
is OK, but we should figure out what happens in the cases above.

Also if we merge, it may happen that the clone is noticeably different
from the original - for example with ipa split it may be missing part of the
body. Is merging the tables elementwise safe then?

Honza


Thanks,
Kugan




On 9 May 2025, at 11:54 am, Kugan Vivekanandarajah  
wrote:

This patch adds support for merging profiles from multiple clones.
That is, when optimized binaries have clones such as IPA-CP clones or SRA
clones, the generated gcov will have profiled them separately.
Currently we pick one and ignore the rest. This patch fixes this by
merging the profiles.


Regression tested on aarch64-linux-gnu with no new regressions.
Also successfully done autoprofiledbootstrap with the relevant patch.

Is this OK for trunk?
Thanks,
Kugan

[PATCH v1 0/3] Refine the avg_floor with fixed point vaadd

2025-05-26 Thread pan2 . li

From: Pan Li 

The RVV spec is not very clear about the difference between
floating point and fixed point for the rounding that
discards least-significant information.

For floating point, which is not two's complement, "discard
least-significant information" indicates truncation.  For
example:

* 3.5 -> 3
* -2.3 -> -2

For fixed point, which is two's complement, "discard
least-significant information" indicates rounding down.  For
example:

* 3.5 -> 3
* -2.3 -> -3

And vaadd rounds down, which exactly matches
the semantics of avg_floor.  Thus, leverage it to implement
avg_floor.
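
The difference between the two roundings can be sketched in scalar code.
This is only an illustration: `avg_floor_i8` and `avg_trunc_i8` are
hypothetical names, and the sketch assumes that right-shifting a negative
value is an arithmetic shift (guaranteed since C++20, and what GCC does in
earlier modes too).

```cpp
#include <cstdint>

// Round-down (floor) average: widen, add, then arithmetic shift right by
// one.  Same shape as the loop body (NT)(((WT)a[i] + (WT)b[i]) >> 1) in
// the avg.h test header of patch 3/3.
constexpr std::int8_t avg_floor_i8(std::int8_t a, std::int8_t b)
{
  return static_cast<std::int8_t>(
    (static_cast<std::int16_t>(a) + static_cast<std::int16_t>(b)) >> 1);
}

// Truncating average, for contrast: integer division rounds toward zero.
constexpr std::int8_t avg_trunc_i8(std::int8_t a, std::int8_t b)
{
  return static_cast<std::int8_t>(
    (static_cast<std::int16_t>(a) + static_cast<std::int16_t>(b)) / 2);
}
```

For a = -3 and b = -2 the exact average is -2.5; the shift form gives -3
(round down) while the division form gives -2 (truncation).  For
non-negative sums the two agree.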

The test suites below pass for this patch series.
* The rv64gcv full regression test.

Pan Li (3):
  RISC-V: Leverage vaadd.vv for signed standard name avg_floor
  RISC-V: Reconcile the existing test for avg_floor
  RISC-V: Add test cases for avg_floor vaadd implementation

 gcc/config/riscv/autovec.md   |  32 +--
 .../gcc.target/riscv/rvv/autovec/avg.h|  23 +++
 .../gcc.target/riscv/rvv/autovec/avg_data.h   | 185 ++
 .../rvv/autovec/avg_floor-1-i16-from-i32.c|  12 ++
 .../rvv/autovec/avg_floor-1-i16-from-i64.c|  12 ++
 .../rvv/autovec/avg_floor-1-i32-from-i64.c|  12 ++
 .../rvv/autovec/avg_floor-1-i8-from-i16.c |  12 ++
 .../rvv/autovec/avg_floor-1-i8-from-i32.c |  12 ++
 .../rvv/autovec/avg_floor-1-i8-from-i64.c |  12 ++
 .../autovec/avg_floor-run-1-i16-from-i32.c|  16 ++
 .../autovec/avg_floor-run-1-i16-from-i64.c|  16 ++
 .../autovec/avg_floor-run-1-i32-from-i64.c|  16 ++
 .../rvv/autovec/avg_floor-run-1-i8-from-i16.c |  16 ++
 .../rvv/autovec/avg_floor-run-1-i8-from-i32.c |  16 ++
 .../rvv/autovec/avg_floor-run-1-i8-from-i64.c |  16 ++
 .../gcc.target/riscv/rvv/autovec/avg_run.h|  28 +++
 .../gcc.target/riscv/rvv/autovec/vls/avg-1.c  |   5 +-
 .../gcc.target/riscv/rvv/autovec/vls/avg-2.c  |   5 +-
 .../gcc.target/riscv/rvv/autovec/vls/avg-3.c  |   5 +-
 .../riscv/rvv/autovec/widen/vec-avg-rv32gcv.c |   7 +-
 .../riscv/rvv/autovec/widen/vec-avg-rv64gcv.c |   7 +-
 21 files changed, 424 insertions(+), 41 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_data.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-1-i16-from-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-1-i16-from-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-1-i32-from-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-1-i8-from-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-1-i8-from-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-1-i8-from-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-run-1-i16-from-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-run-1-i16-from-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-run-1-i32-from-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-run-1-i8-from-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-run-1-i8-from-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-run-1-i8-from-i64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_run.h

-- 
2.43.0

[PATCH v1 2/3] RISC-V: Reconcile the existing test for avg_floor

2025-05-26 Thread pan2 . li

From: Pan Li 

Some existing avg_floor tests need to be updated due to the change to
leverage vaadd.vv directly.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/avg-1.c: Update asm check
to vaadd.
* gcc.target/riscv/rvv/autovec/vls/avg-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/avg-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/vec-avg-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/vec-avg-rv64gcv.c: Ditto.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-1.c | 5 ++---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-2.c | 5 ++---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-3.c | 5 ++---
 .../gcc.target/riscv/rvv/autovec/widen/vec-avg-rv32gcv.c   | 7 ++-
 .../gcc.target/riscv/rvv/autovec/widen/vec-avg-rv64gcv.c   | 7 ++-
 5 files changed, 10 insertions(+), 19 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-1.c
index 30e60d520d6..4920fa6ad41 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-1.c
@@ -25,9 +25,8 @@ DEF_AVG_FLOOR (uint8_t, uint16_t, 512)
 DEF_AVG_FLOOR (uint8_t, uint16_t, 1024)
 DEF_AVG_FLOOR (uint8_t, uint16_t, 2048)
 
-/* { dg-final { scan-assembler-times {vwadd\.vv} 10 } } */
-/* { dg-final { scan-assembler-times {csrwi\s*vxrm,\s*2} 10 } } */
-/* { dg-final { scan-assembler-times {vnsra\.wi} 10 } } */
+/* { dg-final { scan-assembler-times {csrwi\s*vxrm,\s*2} 20 } } */
+/* { dg-final { scan-assembler-times {vaadd\.vv} 10 } } */
 /* { dg-final { scan-assembler-times {vaaddu\.vv} 10 } } */
 /* { dg-final { scan-assembler-not {csrr} } } */
 /* { dg-final { scan-tree-dump-not "1,1" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-2.c
index 33df429a634..c6a120b7613 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-2.c
@@ -23,9 +23,8 @@ DEF_AVG_FLOOR (uint16_t, uint32_t, 256)
 DEF_AVG_FLOOR (uint16_t, uint32_t, 512)
 DEF_AVG_FLOOR (uint16_t, uint32_t, 1024)
 
-/* { dg-final { scan-assembler-times {vwadd\.vv} 9 } } */
-/* { dg-final { scan-assembler-times {csrwi\s*vxrm,\s*2} 9 } } */
-/* { dg-final { scan-assembler-times {vnsra\.wi} 9 } } */
+/* { dg-final { scan-assembler-times {csrwi\s*vxrm,\s*2} 18 } } */
+/* { dg-final { scan-assembler-times {vaadd\.vv} 9 } } */
 /* { dg-final { scan-assembler-times {vaaddu\.vv} 9 } } */
 /* { dg-final { scan-assembler-not {csrr} } } */
 /* { dg-final { scan-tree-dump-not "1,1" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-3.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-3.c
index 9058905e3f5..2838c1ed106 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-3.c
@@ -21,9 +21,8 @@ DEF_AVG_FLOOR (uint32_t, uint64_t, 128)
 DEF_AVG_FLOOR (uint32_t, uint64_t, 256)
 DEF_AVG_FLOOR (uint32_t, uint64_t, 512)
 
-/* { dg-final { scan-assembler-times {vwadd\.vv} 8 } } */
-/* { dg-final { scan-assembler-times {csrwi\s*vxrm,\s*2} 8 } } */
-/* { dg-final { scan-assembler-times {vnsra\.wi} 8 } } */
+/* { dg-final { scan-assembler-times {csrwi\s*vxrm,\s*2} 16 } } */
+/* { dg-final { scan-assembler-times {vaadd\.vv} 8 } } */
 /* { dg-final { scan-assembler-times {vaaddu\.vv} 8 } } */
 /* { dg-final { scan-assembler-not {csrr} } } */
 /* { dg-final { scan-tree-dump-not "1,1" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/vec-avg-rv32gcv.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/vec-avg-rv32gcv.c
index 5880ccca477..b7246a38dba 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/vec-avg-rv32gcv.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/vec-avg-rv32gcv.c
@@ -3,9 +3,6 @@
 
 #include "vec-avg-template.h"
 
-/* { dg-final { scan-assembler-times {\tvwadd\.vv} 6 } } */
-/* { dg-final { scan-assembler-times {csrwi\s*vxrm,\s*0} 3 } } */
-/* { dg-final { scan-assembler-times {csrwi\s*vxrm,\s*2} 3 } } */
-/* { dg-final { scan-assembler-times {\tvadd\.vi} 3 } } */
-/* { dg-final { scan-assembler-times {\tvnsra.wi} 6 } } */
+/* { dg-final { scan-assembler-times {csrwi\s*vxrm,\s*2} 6 } } */
 /* { dg-final { scan-assembler-times {vaaddu\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {vaadd\.vv} 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/vec-avg-rv64gcv.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/vec-avg-rv64gcv.c
index 916f33d9f13..3ffe0ef39ee 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/vec-avg-rv64gcv.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/vec-avg-rv64gcv.c
@@ -3,9 +3,6 @@
 
 #include "vec-avg-template.h"
 
-/* { dg-final { scan-assembler-times {\tvwadd\.vv} 6 } } */
-/* { dg-final { s

[PATCH v1 3/3] RISC-V: Add test cases for avg_floor vaadd implementation

2025-05-26 Thread pan2 . li

From: Pan Li 

Add asm and run test cases for the avg_floor vaadd implementation.

The test suites below pass for this patch series.
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/avg.h: New test.
* gcc.target/riscv/rvv/autovec/avg_data.h: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-1-i16-from-i32.c: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-1-i16-from-i64.c: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-1-i32-from-i64.c: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-1-i8-from-i16.c: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-1-i8-from-i32.c: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-1-i8-from-i64.c: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i16-from-i32.c: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i16-from-i64.c: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i32-from-i64.c: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i8-from-i16.c: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i8-from-i32.c: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i8-from-i64.c: New test.
* gcc.target/riscv/rvv/autovec/avg_run.h: New test.

Signed-off-by: Pan Li 
---
 .../gcc.target/riscv/rvv/autovec/avg.h|  23 +++
 .../gcc.target/riscv/rvv/autovec/avg_data.h   | 185 ++
 .../rvv/autovec/avg_floor-1-i16-from-i32.c|  12 ++
 .../rvv/autovec/avg_floor-1-i16-from-i64.c|  12 ++
 .../rvv/autovec/avg_floor-1-i32-from-i64.c|  12 ++
 .../rvv/autovec/avg_floor-1-i8-from-i16.c |  12 ++
 .../rvv/autovec/avg_floor-1-i8-from-i32.c |  12 ++
 .../rvv/autovec/avg_floor-1-i8-from-i64.c |  12 ++
 .../autovec/avg_floor-run-1-i16-from-i32.c|  16 ++
 .../autovec/avg_floor-run-1-i16-from-i64.c|  16 ++
 .../autovec/avg_floor-run-1-i32-from-i64.c|  16 ++
 .../rvv/autovec/avg_floor-run-1-i8-from-i16.c |  16 ++
 .../rvv/autovec/avg_floor-run-1-i8-from-i32.c |  16 ++
 .../rvv/autovec/avg_floor-run-1-i8-from-i64.c |  16 ++
 .../gcc.target/riscv/rvv/autovec/avg_run.h|  28 +++
 15 files changed, 404 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_data.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-1-i16-from-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-1-i16-from-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-1-i32-from-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-1-i8-from-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-1-i8-from-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-1-i8-from-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-run-1-i16-from-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-run-1-i16-from-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-run-1-i32-from-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-run-1-i8-from-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-run-1-i8-from-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-run-1-i8-from-i64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_run.h

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg.h
new file mode 100644
index 000..746c635ae57
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg.h
@@ -0,0 +1,23 @@
+#ifndef HAVE_DEFINED_AVG_H
+#define HAVE_DEFINED_AVG_H
+
+#include 
+
+#define DEF_AVG_0(NT, WT, NAME) \
+__attribute__((noinline))   \
+void\
+test_##NAME##_##WT##_##NT##_0(NT * restrict a, NT * restrict b, \
+ NT * restrict out, int n) \
+{   \
+  for (int i = 0; i < n; i++) { \
+out[i] = (NT)(((WT)a[i] + (WT)b[i]) >> 1);  \
+  } \
+}
+#define DEF_AVG_0_WRAP(NT, WT, NAME) DEF_AVG_0(NT, WT, NAME)
+
+#define RUN_AVG_0(NT, WT, NAME, a, b, out, n) \
+  test_##NAME##_##WT##_##NT##_0(a, b, out, n)
+#define RUN_AVG_0_WRAP(NT, WT, NAME, a, b, out, n) \
+  RUN_AVG_0(NT, WT, NAME, a, b, out, n)
+
+#endif
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_data.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_data.h
new file mode 100644
index 000..cbeed147a56
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/a

[PATCH v1 1/3] RISC-V: Leverage vaadd.vv for signed standard name avg_floor

2025-05-26 Thread pan2 . li

From: Pan Li 

The signed avg_floor exactly matches the semantics of the fixed-point
RVV insn vaadd with the round-down (rdn) rounding mode.  Thus, leverage
it directly to implement avg_floor.

The RVV spec is not very clear about how floating point and fixed
point differ when rounding discards least-significant information.

For floating point, which is not two's complement, "discard
least-significant information" means truncation (round toward zero).
For example:

* 3.5 -> 3
* -2.3 -> -2

For fixed point, which is two's complement, "discard
least-significant information" means round down (toward negative
infinity).  For example:

* 3.5 -> 3
* -2.3 -> -3

vaadd with round down therefore exactly matches the semantics of
avg_floor.
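
For illustration only (this is not part of the patch), the difference
between round down and truncation can be sketched in scalar C++.  The
function names are hypothetical; the floor variant mirrors the
widen/add/shift expression used by DEF_AVG_0 in the tests, and it
assumes an arithmetic right shift for negative values (guaranteed since
C++20, and what GCC does in practice).

```cpp
#include <cstdint>

// Floor average: widen, add, then arithmetic shift right by one.
// The shift rounds toward negative infinity.
inline std::int8_t avg_floor_i8(std::int8_t a, std::int8_t b) {
  return static_cast<std::int8_t>(
      (static_cast<std::int16_t>(a) + static_cast<std::int16_t>(b)) >> 1);
}

// Truncating average, for comparison: integer division rounds toward zero.
inline std::int8_t avg_trunc_i8(std::int8_t a, std::int8_t b) {
  return static_cast<std::int8_t>(
      (static_cast<std::int16_t>(a) + static_cast<std::int16_t>(b)) / 2);
}
```

With a = -3 and b = -2 the sum is -5, so the floor variant yields -3
while the truncating variant yields -2; for non-negative sums the two
agree.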

The below test suites are passed for this patch series.
* The rv64gcv fully regression test.

gcc/ChangeLog:

* config/riscv/autovec.md (avg3_floor): Remove.
(avg3_floor): Add new mode for avg_floor to leverage
vaadd directly.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec.md | 32 ++--
 1 file changed, 10 insertions(+), 22 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 9e51e3ce6a3..9de786014a7 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2481,29 +2481,17 @@ (define_expand "len_fold_extract_last_"
 ;;  op[0] = (narrow) ((wide) op[1] + (wide) op[2] + 1)) >> 1;
 ;; -
 
-(define_expand "avg3_floor"
- [(set (match_operand: 0 "register_operand")
-   (truncate:
-(ashiftrt:VWEXTI
- (plus:VWEXTI
-  (sign_extend:VWEXTI
-   (match_operand: 1 "register_operand"))
-  (sign_extend:VWEXTI
-   (match_operand: 2 "register_operand"))]
+(define_expand "avg3_floor"
+ [(match_operand:V_VLSI 0 "register_operand")
+  (match_operand:V_VLSI 1 "register_operand")
+  (match_operand:V_VLSI 2 "register_operand")]
   "TARGET_VECTOR"
-{
-  /* First emit a widening addition.  */
-  rtx tmp1 = gen_reg_rtx (mode);
-  rtx ops1[] = {tmp1, operands[1], operands[2]};
-  insn_code icode = code_for_pred_dual_widen (PLUS, SIGN_EXTEND, mode);
-  riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops1);
-
-  /* Then a narrowing shift.  */
-  rtx ops2[] = {operands[0], tmp1, const1_rtx};
-  icode = code_for_pred_narrow_scalar (ASHIFTRT, mode);
-  riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops2);
-  DONE;
-})
+  {
+insn_code icode = code_for_pred (UNSPEC_VAADD, mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP_VXRM_RDN, 
operands);
+DONE;
+  }
+)
 
 (define_expand "avg3_ceil"
  [(set (match_operand: 0 "register_operand")
-- 
2.43.0

Re: [PATCH v2] c++: Unwrap type traits defined in terms of builtins within diagnostics [PR117294]

2025-05-26 Thread Nathaniel Shead

On Wed, Nov 27, 2024 at 11:45:40AM -0500, Patrick Palka wrote:
> On Fri, 8 Nov 2024, Nathaniel Shead wrote:
> 
> > Does this approach seem reasonable?  I'm pretty sure that the way I've
> > handled the templating here is unideal but I'm not sure what a neat way
> > to do what I'm trying to do here would be; any comments are welcome.
> 
> Clever approach, I like it!
> 
> > 
> > -- >8 --
> > 
> > Currently, concept failures of standard type traits just report
> > 'expression X evaluates to false'.  However, many type traits are
> > actually defined in terms of compiler builtins; we can do better here.
> > For instance, 'is_constructible_v' could go on to explain why the type
> > is not constructible, or 'is_invocable_v' could list potential
> > candidates.
> 
> That'd be great improvement.
> 
> > 
> > As a first step to supporting that we need to be able to map the
> > standard type traits to the builtins that they use.  Rather than adding
> > another list that would need to be kept up-to-date whenever a builtin is
> > added, this patch instead tries to detect any variable template defined
> > directly in terms of a TRAIT_EXPR.
> > 
> > To avoid false positives, we ignore any variable templates that have any
> > specialisations (partial or explicit), even if we wouldn't have chosen
> > that specialisation anyway.  This shouldn't affect any of the standard
> > library type traits that I could see.
> 
> You should be able to tsubst the TEMPLATE_ID_EXPR directly and look at
> its TI_PARTIAL_INFO in order to determine which (if any) partial
> specialization was selected.  And if an explicit specialization was
> selected the resulting VAR_DECL will have DECL_TEMPLATE_SPECIALIZATION
> set.
> 
> > ...[snip]...
> 
> If we substituted the TEMPLATE_ID_EXPR as a whole we could use the
> DECL_TI_ARGS of that IIUC?
> 

Thanks for your comments, they were very helpful.  Here's a totally new
approach which I'm much happier with.  I've also removed the "disable in
case any specialisation exists" logic, as on further reflection I don't
imagine this to be the kind of issue I thought it might have been.

With this patch,

  template <typename T>
  constexpr bool is_default_constructible_v = __is_constructible(T);

  template <typename T>
  concept default_constructible = is_default_constructible_v<T>;

  static_assert(default_constructible<void>);

now emits the following error:

  test.cpp:6:15: error: static assertion failed
  6 | static_assert(default_constructible);
|   ^~~
  test.cpp:6:15: note: constraints not satisfied
  test.cpp:4:9:   required by the constraints of ‘template concept 
default_constructible’
  test.cpp:4:33: note:   ‘void’ is not default constructible
  4 | concept default_constructible = is_default_constructible_v;
| ^

There's still a lot of improvements to be made in this area, I think:

- I haven't yet looked into updating the specific diagnostics emitted by
  the traits; I'd like to try to avoid too much code duplication with
  the implementation in cp/semantics.cc.  (I also don't think the manual
  indentation at the start of the message is particularly helpful?)

- The message doesn't print the mapping '[with T = void]'; I tried a
  couple of things but this doesn't currently look especially
  straight-forward, as we don't currently associate the args with the
  normalised atomic constraint of the declaration.

- Just generally I think there's still a lot of noise in the diagnostic,
  and I find the back-to-front ordering of 'required by...' confusing.

Depending on how much time I find myself with I might take a look at
some of these further issues later, but in the meantime, does this look
like an improvement over the status quo?

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

Currently, concept failures of standard type traits just report
'expression X evaluates to false'.  However, many type traits are
actually defined in terms of compiler builtins; we can do better here.
For instance, 'is_constructible_v' could go on to explain why the type
is not constructible, or 'is_invocable_v' could list potential
candidates.

As a first step to supporting that we need to be able to map the
standard type traits to the builtins that they use.  Rather than adding
another list that would need to be kept up-to-date whenever a builtin is
added, this patch instead tries to detect any variable template defined
directly in terms of a TRAIT_EXPR.

The new diagnostics from this patch are not immediately much better;
however, it would be relatively straight-forward to update the messages
in 'diagnose_trait_expr' to provide these new details, which can be done
in a future patch.

Apart from concept diagnostics, this is also useful when using such
traits in a 'static_assert' directly, so this patch also adjusts the
diagnostics in that context.

PR c++/117294
PR c++/113854

gcc/cp/Change

[PATCH v1 0/1] Add error message to cmp_* and in_range.

2025-05-26 Thread Luc Grosheintz

While reading the compiler output of

make check-target-libstdc++-v3

for buggy code, e.g. cmp_equal(1.0, 1.0), the error message
was very short, and I saw no hint that neither of the two
template arguments was an integer. Essentially, the trace
was:

  1. my faulty line
  2. required from here
  3. static_assert(false)

On regular builds with g++ the error message mentions the
static_assert(__is_standard_integer), and is much less
cryptic. Please ignore if this is intended behaviour.

Tested on x86_64 with:

make check-target-libstdc++-v3

(in a no PCH build).

Luc Grosheintz (1):
  libstdc++: Improve diagnostic message for `cmp_*` and `in_range`.

 libstdc++-v3/include/std/utility | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

-- 
2.49.0

[PATCH v1 1/1] libstdc++: Improve diagnostic message for `cmp_*` and `in_range`.

2025-05-26 Thread Luc Grosheintz

Without the message, the compiler output can be very short, e.g.
as short as a `required from here`. If the output includes the
line of code that triggers the static_assert, the user might
interpret it as "must be a standard integer", which is incorrect,
because that term doesn't cover extended integers.

This commit adds a diagnostic message that states that the template
argument must be a signed or unsigned integer.
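
As a rough sketch of the effect (not the libstdc++ code), here is a
self-contained C++17 version.  The trait is_plain_integer and the
function same_sign_equal are hypothetical stand-ins invented for this
example; the trait is deliberately simplified and only excludes bool
and char.

```cpp
#include <type_traits>

// Hypothetical, simplified stand-in for the library-internal trait:
// integral types other than bool and char.
template <typename T>
struct is_plain_integer
  : std::bool_constant<std::is_integral_v<T>
                       && !std::is_same_v<std::remove_cv_t<T>, bool>
                       && !std::is_same_v<std::remove_cv_t<T>, char>> {};

// Equality for two integers of matching signedness.  The message on each
// static_assert is what the user sees when a wrong type is passed.
template <typename T, typename U>
constexpr bool same_sign_equal(T t, U u) noexcept {
  static_assert(is_plain_integer<T>::value,
                "T must be a signed or unsigned integer");
  static_assert(is_plain_integer<U>::value,
                "U must be a signed or unsigned integer");
  static_assert(std::is_signed_v<T> == std::is_signed_v<U>,
                "this sketch only handles matching signedness");
  return t == u;
}
```

Calling same_sign_equal(1.0, 1.0) should then fail to compile with the
"T must be ..." text in the diagnostic, rather than a bare failed
assertion.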

libstdc++-v3/ChangeLog:

* include/std/utility (cmp_equal): Add message to the
static_assert.
(cmp_less): Ditto.
(in_range): Ditto.
---
 libstdc++-v3/include/std/utility | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/include/std/utility b/libstdc++-v3/include/std/utility
index 8a85ccfd09b..8a79ee9cc21 100644
--- a/libstdc++-v3/include/std/utility
+++ b/libstdc++-v3/include/std/utility
@@ -133,8 +133,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 constexpr bool
 cmp_equal(_Tp __t, _Up __u) noexcept
 {
-  static_assert(__is_standard_integer<_Tp>::value);
-  static_assert(__is_standard_integer<_Up>::value);
+  static_assert(__is_standard_integer<_Tp>::value,
+   "T must be a signed or unsigned integer");
+  static_assert(__is_standard_integer<_Up>::value,
+   "U must be a signed or unsigned integer");
 
   if constexpr (is_signed_v<_Tp> == is_signed_v<_Up>)
return __t == __u;
@@ -153,8 +155,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 constexpr bool
 cmp_less(_Tp __t, _Up __u) noexcept
 {
-  static_assert(__is_standard_integer<_Tp>::value);
-  static_assert(__is_standard_integer<_Up>::value);
+  static_assert(__is_standard_integer<_Tp>::value,
+   "T must be a signed or unsigned integer");
+  static_assert(__is_standard_integer<_Up>::value,
+   "U must be a signed or unsigned integer");
 
   if constexpr (is_signed_v<_Tp> == is_signed_v<_Up>)
return __t < __u;
@@ -183,8 +187,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 constexpr bool
 in_range(_Tp __t) noexcept
 {
-  static_assert(__is_standard_integer<_Res>::value);
-  static_assert(__is_standard_integer<_Tp>::value);
+  static_assert(__is_standard_integer<_Res>::value,
+   "R must be a signed or unsigned integer");
+  static_assert(__is_standard_integer<_Tp>::value,
+   "T must be a signed or unsigned integer");
   using __gnu_cxx::__int_traits;
 
   if constexpr (is_signed_v<_Tp> == is_signed_v<_Res>)
-- 
2.49.0

Re: [PATCH] arm_neon.h: remove useless push/pop pragmas

2025-05-26 Thread Christophe Lyon

On Mon, 26 May 2025 at 18:14, Christophe Lyon
 wrote:
>
> Remove #pragma GCC target ("arch=armv8.2-a+bf16") and preceding
> target and is thus useless.
I guess this should read:
Remove #pragma GCC target ("arch=armv8.2-a+bf16") since it matches the preceding
pragma GCC target and is thus useless.

Sorry for the typo,

Christophe


>
> gcc/ChangeLog:
>
> * config/arm/arm_neon.h: Remove useless push/pop pragmas.
> ---
>  gcc/config/arm/arm_neon.h | 5 -
>  1 file changed, 5 deletions(-)
>
> diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
> index cba50de0720..105385f7f5d 100644
> --- a/gcc/config/arm/arm_neon.h
> +++ b/gcc/config/arm/arm_neon.h
> @@ -20938,11 +20938,6 @@ vbfdotq_lane_f32 (float32x4_t __r, bfloat16x8_t __a, 
> bfloat16x4_t __b,
>return __builtin_neon_vbfdot_lanev4bfv4sf (__r, __a, __b, __index);
>  }
>
> -#pragma GCC pop_options
> -
> -#pragma GCC push_options
> -#pragma GCC target ("arch=armv8.2-a+bf16")
> -
>  typedef struct bfloat16x4x2_t
>  {
>bfloat16x4_t val[2];
> --
> 2.34.1
>

Re: [PATCH] arm: always enable both simd and mve builtins

2025-05-26 Thread Christophe Lyon

On Mon, 26 May 2025 at 18:35, Christophe Lyon
 wrote:
>
> We get lots of error messages when compiling arm_neon.h under
> e.g. -mcpu=cortex-m55, because Neon builtins are enabled only when
> !TARGET_HAVE_MVE.  This has been the case since MVE support was
> introduced.
>
> This patch uses an approach similar to what we do on aarch64, but only
> partially since Neon intrinsics do not use the "new" framework.
>
> We register all types and Neon intrinsics, whether MVE is enabled or
> not, which makes it possible to compile arm_neon.h.  However, we need to
> introduce a "switcher" similar to aarch64's to avoid ICEs when LTO is
> enabled: in that case, since we have to enable the MVE intrinsics, we
> temporarily change arm_active_target.isa to enable MVE bits.  This
> enables hooks like arm_vector_mode_supported_p and arm_array_mode to
> behave as expected by the MVE intrinsics framework.  We switch patch

s/patch/back/ :-)

> to the previous arm_active_target.isa immediately after.
>
> There is no impact on the testsuite results, except that gcc.log is no
> longer full of error messages when trying to compile arm_neon.h if
> MVE is forced somehow.
>
> gcc/ChangeLog:
>
> * config/arm/arm-builtins.cc (arm_init_simd_builtin_types): Remove
> TARGET_HAVE_MVE condition.
> (arm_init_mve_builtins): Remove calls to
> arm_init_simd_builtin_types and
> arm_init_simd_builtin_scalar_types.  Switch to MVE isa flags.
> (arm_init_neon_builtins): Remove calls to
> arm_init_simd_builtin_types and
> arm_init_simd_builtin_scalar_types.
> (arm_target_switcher::arm_target_switcher): New.
> (arm_target_switcher::~arm_target_switcher): New.
> (arm_init_builtins): Call arm_init_simd_builtin_scalar_types and
> arm_init_simd_builtin_types.  Always call arm_init_mve_builtins
> and arm_init_neon_builtins.
> * config/arm/arm-protos.h (class arm_target_switcher): New.
> ---
>  gcc/config/arm/arm-builtins.cc | 131 ++---
>  gcc/config/arm/arm-protos.h|  15 
>  2 files changed, 101 insertions(+), 45 deletions(-)
>
> diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
> index 3bb2566f9a2..2e4f3595ed2 100644
> --- a/gcc/config/arm/arm-builtins.cc
> +++ b/gcc/config/arm/arm-builtins.cc
> @@ -48,6 +48,7 @@
>  #include "basic-block.h"
>  #include "gimple.h"
>  #include "ssa.h"
> +#include "regs.h"
>
>  #define SIMD_MAX_BUILTIN_ARGS 7
>
> @@ -1105,37 +1106,35 @@ arm_init_simd_builtin_types (void)
>   an entry in our mangling table, consequently, they get default
>   mangling.  As a further gotcha, poly8_t and poly16_t are signed
>   types, poly64_t and poly128_t are unsigned types.  */
> -  if (!TARGET_HAVE_MVE)
> -{
> -  arm_simd_polyQI_type_node
> -   = build_distinct_type_copy (intQI_type_node);
> -  (*lang_hooks.types.register_builtin_type) (arm_simd_polyQI_type_node,
> -"__builtin_neon_poly8");
> -  arm_simd_polyHI_type_node
> -   = build_distinct_type_copy (intHI_type_node);
> -  (*lang_hooks.types.register_builtin_type) (arm_simd_polyHI_type_node,
> -"__builtin_neon_poly16");
> -  arm_simd_polyDI_type_node
> -   = build_distinct_type_copy (unsigned_intDI_type_node);
> -  (*lang_hooks.types.register_builtin_type) (arm_simd_polyDI_type_node,
> -"__builtin_neon_poly64");
> -  arm_simd_polyTI_type_node
> -   = build_distinct_type_copy (unsigned_intTI_type_node);
> -  (*lang_hooks.types.register_builtin_type) (arm_simd_polyTI_type_node,
> -"__builtin_neon_poly128");
> -  /* Init poly vector element types with scalar poly types.  */
> -  arm_simd_types[Poly8x8_t].eltype = arm_simd_polyQI_type_node;
> -  arm_simd_types[Poly8x16_t].eltype = arm_simd_polyQI_type_node;
> -  arm_simd_types[Poly16x4_t].eltype = arm_simd_polyHI_type_node;
> -  arm_simd_types[Poly16x8_t].eltype = arm_simd_polyHI_type_node;
> -  /* Note: poly64x2_t is defined in arm_neon.h, to ensure it gets default
> -mangling.  */
> -
> -  /* Prevent front-ends from transforming poly vectors into string
> -literals.  */
> -  TYPE_STRING_FLAG (arm_simd_polyQI_type_node) = false;
> -  TYPE_STRING_FLAG (arm_simd_polyHI_type_node) = false;
> -}
> +  arm_simd_polyQI_type_node
> += build_distinct_type_copy (intQI_type_node);
> +  (*lang_hooks.types.register_builtin_type) (arm_simd_polyQI_type_node,
> +"__builtin_neon_poly8");
> +  arm_simd_polyHI_type_node
> += build_distinct_type_copy (intHI_type_node);
> +  (*lang_hooks.types.register_builtin_type) (arm_simd_polyHI_type_node,
> +"__builtin_neon_poly16");
> +  arm_simd_polyDI_type_no

Re: [PATCH v1 1/3] RISC-V: Leverage vaadd.vv for signed standard name avg_floor

2025-05-26 Thread Robin Dapp


-(define_expand "avg3_floor"
- [(set (match_operand: 0 "register_operand")
-   (truncate:
-(ashiftrt:VWEXTI
- (plus:VWEXTI
-  (sign_extend:VWEXTI
-   (match_operand: 1 "register_operand"))
-  (sign_extend:VWEXTI
-   (match_operand: 2 "register_operand"))]
+(define_expand "avg3_floor"
+ [(match_operand:V_VLSI 0 "register_operand")
+  (match_operand:V_VLSI 1 "register_operand")
+  (match_operand:V_VLSI 2 "register_operand")]
   "TARGET_VECTOR"


Couldn't we keep the RTL in order for other optimizations?  I'm not really 
expecting any but at least we'd still have the opportunity.  Or does that 
interfere with the tests?


Apart from that it LGTM, thanks for digging deeper here.

--
Regards
Robin

[PATCH 2/2] forwprop: Add stats for memcpy->memset

2025-05-26 Thread Andrew Pinski

As part of the review of copy prop for aggregates, it was
mentioned that some statistics should be added, and I noticed
that the memcpy->memset transformation was missing statistics too.
So this adds them.

gcc/ChangeLog:

* tree-ssa-forwprop.cc (optimize_memcpy_to_memset): Add
statistics when the statement changed.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssa-forwprop.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index e457a69ed48..81ea7d4195e 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -1324,6 +1324,7 @@ optimize_memcpy_to_memset (gimple_stmt_iterator *gsip, 
tree dest, tree src, tree
   tree ctor = build_constructor (TREE_TYPE (dest), NULL);
   gimple_assign_set_rhs_from_tree (gsip, ctor);
   update_stmt (stmt);
+  statistics_counter_event (cfun, "copy zeroing propagation of aggregate", 
1);
 }
   else /* If stmt is memcpy, transform it into memset.  */
 {
@@ -1333,6 +1334,7 @@ optimize_memcpy_to_memset (gimple_stmt_iterator *gsip, 
tree dest, tree src, tree
   gimple_call_set_fntype (call, TREE_TYPE (fndecl));
   gimple_call_set_arg (call, 1, val);
   update_stmt (stmt);
+  statistics_counter_event (cfun, "memcpy to memset changed", 1);
 }
 
   if (dump_file && (dump_flags & TDF_DETAILS))
-- 
2.43.0

[PATCH 1/2] forwprop: Change test in loop of optimize_memcpy_to_memset

2025-05-26 Thread Andrew Pinski

This was noticed in the review of the copy propagation for aggregates
patch: instead of checking for a NULL or non-SSA-name vuse, we should
check whether the vuse is a default definition and stop there.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-forwprop.cc (optimize_memcpy_to_memset): Change check
from NULL/non-ssa name to default name.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssa-forwprop.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index 4c048a9a298..e457a69ed48 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -1226,7 +1226,8 @@ optimize_memcpy_to_memset (gimple_stmt_iterator *gsip, 
tree dest, tree src, tree
   gimple *defstmt;
   unsigned limit = param_sccvn_max_alias_queries_per_access;
   do {
-if (vuse == NULL || TREE_CODE (vuse) != SSA_NAME)
+/* If the vuse is the default definition, then there are no stores beforehand.  */
+if (SSA_NAME_IS_DEFAULT_DEF (vuse))
   return false;
 defstmt = SSA_NAME_DEF_STMT (vuse);
 if (is_a (defstmt))
-- 
2.43.0

Re: simple frm save/restore strategy (was Re: [PATCH 3/6] RISC-V: frm/mode-switch: remove dubious frm edge insertion before call_insn)

2025-05-26 Thread Vineet Gupta

On 5/26/25 01:18, Robin Dapp wrote:
>> 2. OK'ish: A bunch of testcases see more reads/writes as PRE of redundant
>> read/writes is punted to later passes which obviously needs more work.
>>
>> 3. NOK: We lose the ability to instrument local RM writes - especially in
>> the testsuite.
>>   e.g.
>>      a. intrinsic setting a static RM
>>      b. get_frm() to ensure that happened (inline asm to read out frm)
>>
>> The tightly coupled restore kicks in before get_frm could be emitted, which
>> fails to observe #a. This is a deal breaker for the testsuite as many of the
>> frm tests report as fail even if the actual codegen is sane.
> I'd say that most of the tests we have right now are written with the
> existing behavior in mind and don't necessarily translate well to a changed
> behavior.
>
> We mostly test the proper LCM and backup update behavior and backup updates
> don't happen with a local-only approach.
>
> I haven't really understood how the FRM-changing intrinsics are used.
>
> There are two extremes:
>
> - A single intrinsic using a different rounding mode and a lot of other
>   arithmetic before and after it.  In that case we cannot optimize anyway
>   because the rest must operate with the global rounding mode.
>
> - A longer code sequence, like a function, that uses a different rounding
>   mode and every intrinsic being FRM-changing.  In that case we would need to
>   optimize a lot of saves and restores away until we only have a single save
>   at the beginning and a single restore at the end.
>
> I suppose we don't handle the latter case well right now.  But on the other
> hand it's also not very interesting as explicit fegetround (), fesetround (),
> fesetround () is what the user would/should have done anyway.
>
> So IMHO the only interesting cases are somewhere in the middle.  It would
> really help to have some examples here that could tell us whether the simple
> approach leaves a lot on the table (in terms of redundant save/restore).
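
For reference, the explicit fegetround (), fesetround (), fesetround ()
pattern mentioned above looks roughly like the following in user code.
This is only a sketch using the standard <cfenv> interface, not the RVV
intrinsics; the function name round_with_mode is made up for the
example, and the volatile temporaries are just a crude way to keep the
compiler from folding or moving the operation (real code would more
likely be built with -frounding-math).

```cpp
#include <cfenv>
#include <cmath>

// Round x to an integral value under rounding mode `mode`
// (e.g. FE_UPWARD), restoring the caller's rounding mode afterwards.
double round_with_mode(double x, int mode) {
  const int saved = std::fegetround();       // single save at the beginning
  std::fesetround(mode);                     // switch to the local mode
  volatile double in = x;                    // read after the mode switch
  volatile double out = std::nearbyint(in);  // uses the current mode
  std::fesetround(saved);                    // single restore at the end
  return out;
}
```

Any number of operations can sit between the one save and the one
restore, which is the shape the second extreme would ideally be
optimized into.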

As I mentioned earlier (3. above), the main issue with this approach is that the
get_frm () testsuite instrumentation is now broken.
FRM is already restored before it is read back (by inline asm), which cripples
most of the testsuite machinery.
e.g. float-point-frm-run-1.c won't even pass the test of the local RM being set
to some static value.

And indeed there are cases where the rest of the passes fail to eliminate
extraneous save/restores:
float-point-dynamic-frm-13.c now generates 3 pairs of save/restores vs. 1 save
and 3 restores.

I do have yet another implementation which is midway between the 2 extremes. It
is not as stateless as the tight save/restore and still transitions on calls and
jumps and seems to be a better compromise.
I have a little implementation issue, where the inline asm reg is clobbering the
backup reg, but get_frm () works as expected.

We can discuss some more in the call tomorrow.

-Vineet

[PATCH V2] For datarefs with big gap, split them into different groups.

2025-05-26 Thread liuhongt

> > It's https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119181
>
> Please mention that in the changelog.  Also ...

Changed.

> Please put this condition in the set of conds we test in the else branch of 
> ...
>
> > >           /* Do not place the same access in the interleaving chain 
> > > twice.  */
> > >           if (init_b == init_prev)
> > >             {
>
> ... this if.  There we have conditions grouped splitting groups.
>
Changed.


Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.

The patch tries to fix a missed vectorization for the case below.

void
foo (int* a, int* restrict b)
{
b[0] = a[0] * a[64];
b[1] = a[65] * a[1];
b[2] = a[2] * a[66];
b[3] = a[67] * a[3];
b[4] = a[68] * a[4];
b[5] = a[69] * a[5];
b[6] = a[6] * a[70];
b[7] = a[7] * a[71];
}

In vect_analyze_data_ref_accesses, a[0], a[1], ..., a[7], a[64], ...,
a[71] are placed in the same group with a size of 71, which makes
vectorization unprofitable.
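
The idea of the new check can be sketched outside of GCC as follows.
This is a simplified standalone model, not the vectorizer code: the
function split_by_gap and the constant kMaxGapBytes are hypothetical,
offsets are assumed to be element indices sorted in ascending order,
and kMaxGapBytes merely stands in for the target-dependent
MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT;
// the real value depends on the target.
constexpr std::uint64_t kMaxGapBytes = 64;

// Split element offsets (sorted ascending, each element elem_size bytes)
// into groups, starting a new group whenever the byte gap to the
// previous access exceeds kMaxGapBytes.
std::vector<std::vector<std::int64_t>>
split_by_gap(const std::vector<std::int64_t>& offsets,
             std::uint64_t elem_size) {
  std::vector<std::vector<std::int64_t>> groups;
  for (std::size_t i = 0; i < offsets.size(); ++i) {
    const bool big_gap =
        i > 0
        && static_cast<std::uint64_t>(offsets[i] - offsets[i - 1]) * elem_size
               > kMaxGapBytes;
    if (i == 0 || big_gap)
      groups.emplace_back();  // start a new group
    groups.back().push_back(offsets[i]);
  }
  return groups;
}
```

Under these assumptions, the accesses from foo above (indices 0..7 and
64..71 with 4-byte elements) end up in two groups of eight instead of
one large group.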

gcc/ChangeLog:

PR tree-optimization/119181
* tree-vect-data-refs.cc (vect_analyze_data_ref_accesses):
Split datarefs when there's a gap bigger than
MAX_BITSIZE_MODE_ANY_MODE.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/bb-slp-pr119181.c: New test.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr119181.c | 15 +++
 gcc/tree-vect-data-refs.cc  |  7 +++
 2 files changed, 22 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr119181.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr119181.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-pr119181.c
new file mode 100644
index 000..b0d3e5a3cb8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr119181.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+void
+foo (int* a, int* restrict b)
+{
+b[0] = a[0] * a[64];
+b[1] = a[65] * a[1];
+b[2] = a[2] * a[66];
+b[3] = a[67] * a[3];
+b[4] = a[68] * a[4];
+b[5] = a[69] * a[5];
+b[6] = a[6] * a[70];
+b[7] = a[7] * a[71];
+}
+
+/* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp2" { 
target vect_int_mult } } } */
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 9fd1ef29650..f2deb751ed9 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -3682,6 +3682,13 @@ vect_analyze_data_ref_accesses (vec_info *vinfo,
  != type_size_a))
break;
 
+ /* For datarefs with big gap, it's better to split them into 
different
+groups.
+.i.e a[0], a[1], a[2], .. a[7], a[100], a[101],..., a[107]  */
+ if ((unsigned HOST_WIDE_INT)(init_b - init_prev) * tree_to_uhwi 
(szb)
+ > MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT)
+   break;
+
  /* If the step (if not zero or non-constant) is smaller than the
 difference between data-refs' inits this splits groups into
 suitable sizes.  */
-- 
2.34.1

Re: [PATCH v2] driver: Fix multilib_os_dir and multiarch_dir for those target use TARGET_COMPUTE_MULTILIB

2025-05-26 Thread Kito Cheng

Pushed to trunk :)

On Wed, May 21, 2025 at 2:35 AM Jeff Law  wrote:
>
>
>
> On 5/19/25 12:48 AM, Kito Cheng wrote:
> > Hi Jin:
> >
> > Thanks for heads up:)
> >
> > Hi Jeff:
> >
> > I've rebased that on the trunk and everything seems right, do you think
> > it's OK for the trunk?
> Yea, let's get it on the trunk and get it some soak time.  We can then
> look at backporting it to gcc-15's release branch.
>
> jeff
>

Re: [ping^2] [PATCH v2] MIPS: Fix the issue with the '-fpatchable-function-entry=' feature.

2025-05-26 Thread Lulu Cheng


Ping^2

On 2025/5/13 at 2:06 PM, Lulu Cheng wrote:

Ping?

On 2025/5/9 at 10:14 AM, Lulu Cheng wrote:

From: ChengLulu 

PR target/99217

gcc/ChangeLog:

* config/mips/mips.cc (mips_start_function_definition):
Implements the functionality of '-fpatchable-function-entry='.
(mips_print_patchable_function_entry): Define empty function.
(TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY): Define macro.

gcc/testsuite/ChangeLog:

* gcc.target/mips/pr99217.c: New test.

---
v1 -> v2:
Add testsuite.
---
  gcc/config/mips/mips.cc | 33 +
  gcc/testsuite/gcc.target/mips/pr99217.c | 10 
  2 files changed, 43 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/mips/pr99217.c

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 24a28dcf817..f4ec59713b4 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -7478,6 +7478,9 @@ static void
  mips_start_function_definition (const char *name, bool mips16_p,
  tree decl ATTRIBUTE_UNUSED)
  {
+  unsigned HOST_WIDE_INT patch_area_size = crtl->patch_area_size;
+  unsigned HOST_WIDE_INT patch_area_entry = crtl->patch_area_entry;
+
    if (mips16_p)
  fprintf (asm_out_file, "\t.set\tmips16\n");
    else
@@ -7490,6 +7493,10 @@ mips_start_function_definition (const char 
*name, bool mips16_p,

  fprintf (asm_out_file, "\t.set\tnomicromips\n");
  #endif
  +  /* Emit the patching area before the entry label, if any. */
+  if (patch_area_entry > 0)
+    default_print_patchable_function_entry (asm_out_file,
+    patch_area_entry, true);
    if (!flag_inhibit_size_directive)
  {
    fputs ("\t.ent\t", asm_out_file);
@@ -7501,6 +7508,13 @@ mips_start_function_definition (const char 
*name, bool mips16_p,

      /* Start the definition proper.  */
    ASM_OUTPUT_FUNCTION_LABEL (asm_out_file, name, decl);
+
+  /* And the area after the label.  Record it if we haven't done so 
yet.  */

+  if (patch_area_size > patch_area_entry)
+    default_print_patchable_function_entry (asm_out_file,
+    patch_area_size
+    - patch_area_entry,
+    patch_area_entry == 0);
  }
    /* End a function definition started by 
mips_start_function_definition.  */
@@ -23338,6 +23352,21 @@ mips_bit_clear_p (enum machine_mode mode, 
unsigned HOST_WIDE_INT m)

    return false;
  }
  +/* define TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY */
+
+/* The MIPS function start is implemented in the prologue function.
+   TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY needs to be inserted
+   before or after the function name, so this function does not
+   use a public implementation. This function is implemented in
+   mips_start_function_definition. */
+
+void
+mips_print_patchable_function_entry (FILE *file ATTRIBUTE_UNUSED,
+ unsigned HOST_WIDE_INT
+ patch_area_size ATTRIBUTE_UNUSED,
+ bool record_p ATTRIBUTE_UNUSED)
+{}
+
  /* Initialize the GCC target structure.  */
  #undef TARGET_ASM_ALIGNED_HI_OP
  #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
@@ -23651,6 +23680,10 @@ mips_bit_clear_p (enum machine_mode mode, 
unsigned HOST_WIDE_INT m)

  #undef TARGET_DOCUMENTATION_NAME
  #define TARGET_DOCUMENTATION_NAME "MIPS"
  +#undef TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY
+#define TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY \
+mips_print_patchable_function_entry
+
  struct gcc_target targetm = TARGET_INITIALIZER;
  
  #include "gt-mips.h"
diff --git a/gcc/testsuite/gcc.target/mips/pr99217.c 
b/gcc/testsuite/gcc.target/mips/pr99217.c

new file mode 100644
index 000..f5851bb1606
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/pr99217.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fpatchable-function-entry=1" } */
+/* { dg-final { scan-assembler 
"foo:*.*.LPFE0:\n\t.set\tnoreorder\n\tnop\n\t.set\treorder" } } */

+
+/* Test the placement of the .LPFE0 label.  */
+
+void
+foo (void)
+{
+}

Re: [PATCH] testsuite: Fix pr101145inf*.c testcases [PR117494]

2025-05-26 Thread Christophe Lyon

Hi Andrew,

On Sun, 17 Nov 2024 at 22:49, Andrew Pinski  wrote:
>
> Instead of doing a dg-run with a specific target check for linux.
> Use signal as the effective-target since this requires the use
> of ALARM signal to do the testing.
> Also use check_vect in the main and renames main to main1 to make sure
> we don't use the registers.
>
> Tested on x86_64-linux-gnu.

Can you explain the context of this change? Are you testing on a
target which matches *-*-linux* *-*-gnu* *-*-uclinux* but does not
support 'alarm'?

I was looking at backporting this and a later patch from Torbjorn to
gcc-14, but I noticed that the default dg-do-what in this directory is
'compile' and the previous
dg-do run { target *-*-linux* *-*-gnu* *-*-uclinux* }  changed it to
'run' on linux targets (thus on linux we had 2 PASS, one for
compilation, one for execution).
And the test was skipped on non-linux targets.

After your patch, the test became compile-only on linux, was this intentional?

Maybe we should use instead:
/* { dg-do run } */ (without target selector)
/* { dg-require-effective-target signal } */
not tested, but I think this would restore the previous behaviour on
*linux* targets, but skip the test on targets which do not support
alarm.

Am I missing something?
I can send a patch to do this.

Thanks,

Christophe
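
For readers unfamiliar with why these tests need the signal effective-target: they rely on an alarm to bound a loop that would otherwise never finish. The following is only a minimal sketch of that mechanism, not the contents of pr101145inf.inc. It uses signal() and a flag where the real test uses sigaction and exits from the handler, and the names spin_until_alarm and on_alarm are made up for illustration.

```c
#include <signal.h>
#include <unistd.h>

/* Set by the handler; sig_atomic_t is safe to write from a signal handler.  */
static volatile sig_atomic_t alarm_fired;

static void
on_alarm (int sig)
{
  (void) sig;
  alarm_fired = 1;
}

/* Spin until SIGALRM arrives after SECONDS seconds (SECONDS must be
   nonzero, since alarm (0) only cancels a pending alarm).  Returns 1
   once the handler has run, 0 if the handler could not be installed.  */
int
spin_until_alarm (unsigned seconds)
{
  alarm_fired = 0;
  if (signal (SIGALRM, on_alarm) == SIG_ERR)
    return 0;
  alarm (seconds);
  while (!alarm_fired)
    ;
  return 1;
}
```

On a target without alarm support this cannot work at all, which is the case the effective-target requirement is meant to skip.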


>
> PR testsuite/117494
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/pr101145inf.c: Remove dg-do and replace
> with dg-require-effective-target of signal.
> * gcc.dg/vect/pr101145inf_1.c: Likewise.
> * gcc.dg/vect/pr101145inf.inc: Rename main to main1
> and mark as noinline.
> Include tree-vect.h. Have main call check_vect and main1.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/testsuite/gcc.dg/vect/pr101145inf.c   | 2 +-
>  gcc/testsuite/gcc.dg/vect/pr101145inf.inc | 9 -
>  gcc/testsuite/gcc.dg/vect/pr101145inf_1.c | 2 +-
>  3 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr101145inf.c 
> b/gcc/testsuite/gcc.dg/vect/pr101145inf.c
> index 3ad8c1a2dd7..aa598875aa5 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr101145inf.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr101145inf.c
> @@ -1,4 +1,4 @@
> -/* { dg-do run { target *-*-linux* *-*-gnu* *-*-uclinux* } } */
> +/* { dg-require-effective-target signal } */
>  /* { dg-additional-options "-O3" } */
>  #include 
>  #include "pr101145inf.inc"
> diff --git a/gcc/testsuite/gcc.dg/vect/pr101145inf.inc 
> b/gcc/testsuite/gcc.dg/vect/pr101145inf.inc
> index 4aa3d049187..eb855b9881a 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr101145inf.inc
> +++ b/gcc/testsuite/gcc.dg/vect/pr101145inf.inc
> @@ -1,6 +1,7 @@
>  #include 
>  #include 
>  #include 
> +#include "tree-vect.h"
>
>  void test_finite ();
>  void test_infinite ();
> @@ -10,7 +11,8 @@ void do_exit (int i)
>exit (0);
>  }
>
> -int main(void)
> +__attribute__((noinline))
> +int main1(void)
>  {
>test_finite ();
>struct sigaction s;
> @@ -26,3 +28,8 @@ int main(void)
>return 1;
>  }
>
> +int main(void)
> +{
> +  check_vect ();
> +  return main1();
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/pr101145inf_1.c 
> b/gcc/testsuite/gcc.dg/vect/pr101145inf_1.c
> index e3e9dd46d10..0465788c3cc 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr101145inf_1.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr101145inf_1.c
> @@ -1,4 +1,4 @@
> -/* { dg-do run { target *-*-linux* *-*-gnu* *-*-uclinux* } } */
> +/* { dg-require-effective-target signal } */
>  /* { dg-additional-options "-O3" } */
>  #include 
>  #include "pr101145inf.inc"
> --
> 2.43.0
>

Re: [PATCH] For datarefs with big gap, split them into different groups.

2025-05-26 Thread Richard Biener

On Fri, May 16, 2025 at 4:05 AM Hongtao Liu  wrote:
>
> It's https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119181

Please mention that in the changelog.  Also ...

> On Fri, May 16, 2025 at 10:02 AM liuhongt  wrote:
> >
> > The patch tries to fix a missed vectorization for the case below.
> >
> > void
> > foo (int* a, int* restrict b)
> > {
> > b[0] = a[0] * a[64];
> > b[1] = a[65] * a[1];
> > b[2] = a[2] * a[66];
> > b[3] = a[67] * a[3];
> > b[4] = a[68] * a[4];
> > b[5] = a[69] * a[5];
> > b[6] = a[6] * a[70];
> > b[7] = a[7] * a[71];
> > }
> >
> > In vect_analyze_data_ref_accesses, a[0], a[1], ..., a[7], a[64], ...,
> > a[71] are placed in the same group with size of 71, which makes
> > vectorization unprofitable.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> > * tree-vect-data-refs.cc (vect_analyze_data_ref_accesses):
> > Split datarefs when there's a gap bigger than
> > MAX_BITSIZE_MODE_ANY_MODE.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/vect/bb-slp-pr119181.c: New test.
> > ---
> >  gcc/testsuite/gcc.dg/vect/bb-slp-pr119181.c | 15 +++
> >  gcc/tree-vect-data-refs.cc  |  6 ++
> >  2 files changed, 21 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr119181.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr119181.c 
> > b/gcc/testsuite/gcc.dg/vect/bb-slp-pr119181.c
> > new file mode 100644
> > index 000..b0d3e5a3cb8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr119181.c
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile } */
> > +void
> > +foo (int* a, int* restrict b)
> > +{
> > +b[0] = a[0] * a[64];
> > +b[1] = a[65] * a[1];
> > +b[2] = a[2] * a[66];
> > +b[3] = a[67] * a[3];
> > +b[4] = a[68] * a[4];
> > +b[5] = a[69] * a[5];
> > +b[6] = a[6] * a[70];
> > +b[7] = a[7] * a[71];
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp2" { 
> > target vect_int_mult } } } */
> > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> > index 9fd1ef29650..387e8ac8b61 100644
> > --- a/gcc/tree-vect-data-refs.cc
> > +++ b/gcc/tree-vect-data-refs.cc
> > @@ -3657,6 +3657,12 @@ vect_analyze_data_ref_accesses (vec_info *vinfo,
> >   && init_a <= init_prev
> >   && init_prev <= init_b);
> >
> > + /* For datarefs with big gap, it's better to split them into 
> > different
> > +groups.
> > +.i.e a[0], a[1], a[2], .. a[7], a[100], a[101],..., a[107]  */
> > + if ((unsigned HOST_WIDE_INT)(init_b - init_prev) * tree_to_uhwi 
> > (szb)
> > + > MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT)
> > +   break;

Please put this condition in the set of conds we test in the else branch of ...

> >   /* Do not place the same access in the interleaving chain twice.  
> > */
> >   if (init_b == init_prev)
> > {

... this if.  There we have the conditions for splitting groups grouped together.

OK with those changes.

Richard.

> > --
> > 2.34.1
> >
>
>
> --
> BR,
> Hongtao
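
The splitting rule discussed above can be sketched outside the vectorizer as follows. This is a simplified model, not the code in tree-vect-data-refs.cc: the accesses are given as ascending element indices, MAX_GAP_BYTES is a hypothetical stand-in for MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT (whose real value is target-dependent), and count_groups is an invented name.

```c
#include <stddef.h>

/* Hypothetical stand-in for MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT;
   the real value depends on the target.  */
#define MAX_GAP_BYTES 64

/* Count the groups formed from the ascending element indices
   INIT[0..N-1] when a new group is started whenever the byte distance
   to the previous access exceeds MAX_GAP_BYTES.  */
size_t
count_groups (const long *init, size_t n, unsigned long elt_size)
{
  if (n == 0)
    return 0;

  size_t groups = 1;
  for (size_t i = 1; i < n; i++)
    {
      unsigned long gap
        = (unsigned long) (init[i] - init[i - 1]) * elt_size;
      if (gap > MAX_GAP_BYTES)
        groups++;
    }
  return groups;
}
```

With the indices from the testcase (0 to 7 and 64 to 71) and 4-byte elements, the only large gap is between a[7] and a[64], so this model yields two groups instead of one.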

Re: [PATCH] gimple-fold: Implement simple copy propagation for aggregates [PR14295]

2025-05-26 Thread Richard Biener

On Sun, May 18, 2025 at 10:58 PM Andrew Pinski  wrote:
>
> This implements a simple copy propagation for aggregates in a similar
> fashion to what we already do for copy prop of zeroing.
>
> Right now this only looks at the previous vdef statement but this allows us
> to catch a lot of cases that show up in C++ code.
>
> Also deletes aggregate copies that are to the same location (PR57361). This
> was already done in DSE, but we should do it here too since it is simple to
> add, and a copy to a temporary and back to itself should be deleted as well.
> So we need a variant that tests DSE and one for forwprop.
>
> Also adds a variant of pr22237.c which was found while working on this patch.
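
To make the quoted description concrete, here is a small sketch of the two source-level shapes it talks about: a copy through a temporary, and a copy that ends up back at the same location. The function names are invented for illustration, and whether a given GCC version actually simplifies them depends on the pass pipeline and options. The sketch only shows that the observable result is the same either way.

```c
#include <string.h>

struct s { unsigned char a[256]; };

/* Copy through a temporary: "tmp = *src; *dst = tmp;".  Aggregate copy
   propagation would let this become a single "*dst = *src;".  */
void
copy_via_temp (struct s *dst, const struct s *src)
{
  struct s tmp = *src;
  *dst = tmp;
}

/* Copy to a temporary and back to the same location, which leaves *p
   unchanged and so could be deleted entirely.  */
void
copy_to_self (struct s *p)
{
  struct s tmp = *p;
  *p = tmp;
}

/* Helper for checking results: nonzero if both objects hold the same
   bytes.  */
int
same_bytes (const struct s *x, const struct s *y)
{
  return memcmp (x->a, y->a, sizeof x->a) == 0;
}
```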
>
> PR tree-optimization/14295
> PR tree-optimization/108358
> PR tree-optimization/114169
>
> gcc/ChangeLog:
>
> * tree-ssa-forwprop.cc (optimize_agr_copyprop): New function.
> (pass_forwprop::execute): Call optimize_agr_copyprop for load/store 
> statements.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/20031106-6.c: Un-xfail. Add scan for forwprop1.
> * g++.dg/opt/pr66119.C: Disable forwprop since that does
> the copy prop now.
> * gcc.dg/tree-ssa/pr108358-a.c: New test.
> * gcc.dg/tree-ssa/pr114169-1.c: New test.
> * gcc.c-torture/execute/builtins/pr22237-1-lib.c: New test.
> * gcc.c-torture/execute/builtins/pr22237-1.c: New test.
> * gcc.dg/tree-ssa/pr57361.c: Disable forwprop1.
> * gcc.dg/tree-ssa/pr57361-1.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/testsuite/g++.dg/opt/pr66119.C|   2 +-
>  .../execute/builtins/pr22237-1-lib.c  |  27 +
>  .../execute/builtins/pr22237-1.c  |  57 ++
>  gcc/testsuite/gcc.dg/tree-ssa/20031106-6.c|   8 +-
>  gcc/testsuite/gcc.dg/tree-ssa/pr108358-a.c|  33 ++
>  gcc/testsuite/gcc.dg/tree-ssa/pr114169-1.c|  39 +++
>  gcc/testsuite/gcc.dg/tree-ssa/pr57361-1.c |   9 ++
>  gcc/testsuite/gcc.dg/tree-ssa/pr57361.c   |   2 +-
>  gcc/tree-ssa-forwprop.cc  | 103 ++
>  9 files changed, 276 insertions(+), 4 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1-lib.c
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr108358-a.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr114169-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr57361-1.c
>
> diff --git a/gcc/testsuite/g++.dg/opt/pr66119.C 
> b/gcc/testsuite/g++.dg/opt/pr66119.C
> index d1b1845a258..52362e44434 100644
> --- a/gcc/testsuite/g++.dg/opt/pr66119.C
> +++ b/gcc/testsuite/g++.dg/opt/pr66119.C
> @@ -3,7 +3,7 @@
> the value of MOVE_RATIO now is.  */
>
>  /* { dg-do compile  { target { { i?86-*-* x86_64-*-* } && c++11 } }  }  */
> -/* { dg-options "-O3 -mavx -fdump-tree-sra -march=slm -mtune=slm 
> -fno-early-inlining" } */
> +/* { dg-options "-O3 -mavx -fdump-tree-sra -fno-tree-forwprop -march=slm 
> -mtune=slm -fno-early-inlining" } */
>  // { dg-skip-if "requires hosted libstdc++ for cstdlib malloc" { ! hostedlib 
> } }
>
>  #include 
> diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1-lib.c 
> b/gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1-lib.c
> new file mode 100644
> index 000..44032357405
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1-lib.c
> @@ -0,0 +1,27 @@
> +extern void abort (void);
> +
> +void *
> +memcpy (void *dst, const void *src, __SIZE_TYPE__ n)
> +{
> +  const char *srcp;
> +  char *dstp;
> +
> +  srcp = src;
> +  dstp = dst;
> +
> +  if (dst < src)
> +{
> +  if (dst + n > src)
> +   abort ();
> +}
> +  else
> +{
> +  if (src + n > dst)
> +   abort ();
> +}
> +
> +  while (n-- != 0)
> +*dstp++ = *srcp++;
> +
> +  return dst;
> +}
> diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1.c 
> b/gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1.c
> new file mode 100644
> index 000..0a12b0fc9a1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1.c
> @@ -0,0 +1,57 @@
> +extern void abort (void);
> +extern void exit (int);
> +struct s { unsigned char a[256]; };
> +union u { struct { struct s b; int c; } d; struct { int c; struct s b; } e; 
> };
> +static union u v;
> +static union u v0;
> +static struct s *p = &v.d.b;
> +static struct s *q = &v.e.b;
> +
> +struct outers
> +{
> +  struct s inner;
> +};
> +
> +static inline struct s rp (void) { return *p; }
> +static inline struct s rq (void) { return *q; }
> +static void pq (void)
> +{
> +  struct outers o = {rq () };
> +  *p = o.inner;
> +}
> +static void qp (void)
> +{
> +  struct outers o = {rp () };
> +  *q  = o.inner;
> +}
> +
> +static void
> +init (struct s *sp)
> +{
> +  int i;
> +  for (i = 0; i < 256; i++)
> +sp->a[i] = i;
> +}

Re: [PATCH] testsuite: Fix pr101145inf*.c testcases [PR117494]

2025-05-26 Thread Andrew Pinski

On Mon, May 26, 2025 at 4:57 AM Christophe Lyon
 wrote:
>
> ,,
>
> On Mon, 26 May 2025 at 12:54, Andrew Pinski (QUIC)
>  wrote:
> >
> > > -Original Message-
> > > From: Christophe Lyon 
> > > Sent: Monday, May 26, 2025 3:09 AM
> > > To: Andrew Pinski (QUIC) 
> > > Cc: gcc-patches@gcc.gnu.org
> > > Subject: Re: [PATCH] testsuite: Fix pr101145inf*.c testcases
> > > [PR117494]
> > >
> > > Hi Andrew,
> > >
> > > On Sun, 17 Nov 2024 at 22:49, Andrew Pinski
> > >  wrote:
> > > >
> > > > Instead of doing a dg-run with a specific target check for
> > > linux.
> > > > Use signal as the effective-target since this requires the use
> > > of
> > > > ALARM signal to do the testing.
> > > > Also use check_vect in the main and renames main to main1
> > > to make sure
> > > > we don't use the registers.
> > > >
> > > > Tested on x86_64-linux-gnu.
> > >
> > > Can you explain the context of this change? Are you testing on
> > > a target which matches *-*-linux* *-*-gnu* *-*-uclinux* but
> > > does not support 'alarm'?
> > >
> > > I was looking at backporting this and a later patch from
> > > Torbjorn to gcc-14, but I noticed that the default dg-do-what
> > > in this directory is 'compile' and the previous dg-do run {
> > > target *-*-linux* *-*-gnu* *-*-uclinux* }  changed it to 'run' on
> > > linux targets (thus on linux we had 2 PASS, one for
> > > compilation, one for execution).
> > > And the test was skipped on non-linux targets.
> > >
> > > After your patch, the test became compile-only on linux, was
> > > this intentional?
> >
> > So vect.exp has some extra code in there where the default to dg-run unless 
> > the target you are running with does not support executing with the vector 
> > options selected.
> > So for an example on x86_64-linux-gnu with we get the test being executed:
> > ```
> > Setting LD_LIBRARY_PATH to 
> > :/bajas/pinskia/src/upstream-gcc-isel/gcc/objdir/gcc:/bajas/pinskia/src/upstream-gcc-isel/gcc/objdir/gcc/32::/bajas/pinskia/src/upstream-gcc-isel/gcc/objdir/gcc:/bajas/pinskia/src/upstream-gcc-isel/gcc/objdir/gcc/32
> > Execution timeout is: 300
> > spawn [open ...]^M
> > PASS: gcc.dg/vect/pr101145inf.c execution test
> > ```
> >
> > If you are not getting the executed test any more then the target which you 
> > are running on does not support using the options that are added by 
> > vect.exp.
> >
> > This is the exact behavior we need in this case; otherwise you get 
> > executable failures in some cases. With dg-run, you will get executable 
> > failures on RISC-V because the check_vect functionality is not currently 
> > implemented.
> >
> > I don't know which exact linux target you are testing on since you didn't 
> > mention either. I can only assume it is arm-linux-gnueabi (as 
> > aarch64-linux-gnu just does ` set dg-do-what-default run` so it runs 
> > always) which does the following for check_vect_support_and_set_flags:
> > ```
> > } elseif [is-effective-target arm_neon_ok] {
> > eval lappend DEFAULT_VECTCFLAGS [add_options_for_arm_neon ""]
> > # NEON does not support denormals, so is not used for vectorization 
> > by
> > # default to avoid loss of precision.  We must pass -ffast-math to 
> > test
> > # vectorization of float operations.
> > lappend DEFAULT_VECTCFLAGS "-ffast-math"
> > if [is-effective-target arm_neon_hw] {
> > set dg-do-what-default run
> > } else {
> > set dg-do-what-default compile
> > }
> > }
> > ```
> >
> > So ` is-effective-target arm_neon_hw` must be returning false on the 
> > hardware you are running with. Which also means check_vect would have just 
> > done an `exit (0)` and not do the full test for the hardware you are 
> > running on anyways.
> >
>
> I was looking at aarch64-linux-gnu results, but initially for
> gcc.dg/pr114052-1.c where there's a discussion in the PR about its
> backport to gcc-14.
> Looking at where the dg-require-effective-target alarm was coming
> from, I found r15-7152-g57b706d141b87c which also modified
> gcc.dg/vect/pr101145inf.c.
>
> I forgot that vect.exp does set dg-do run, so you are right of course.
> However, I think for the other testcases updated by r15-7152-g57b706d141b87c,
> we probably want to keep the dg-do run line, as currently
> gcc.dg/pr78185.c, gcc.dg/pr116906-1.c and gcc.dg/pr116906-2.c are
> compile-only tests on aarch64-linux-gnu.

It looks like you got two commits/patches mixed up here. The commit
associated with my patch is r15-5377-g0dc389f21bfd4e .
I have no comment on r15-7152-g57b706d141b87c since it is unrelated to
this patch (except it touched pr101145inf.c adding the alarm check).
You should move the discussion over to
https://inbox.sourceware.org/gcc-patches/20250119201401.4082622-1-torbjorn.svens...@foss.st.com/
patch instead.

Thanks,
Andrew


>
> Thanks,
>
> Christophe
>
> >
> > Thanks,
> > Andrew Pinski
> >
> >
> > >
> > > Maybe we should use instead:
> > > /* { dg-do run } */ (without target

Re: [PATCH 2/4] c++/modules: Implement streaming of uncontexted TYPE_DECLs [PR98735]

2025-05-26 Thread Nathaniel Shead

On Fri, May 23, 2025 at 11:31:26AM -0400, Jason Merrill wrote:
> On 5/21/25 10:15 PM, Nathaniel Shead wrote:
> > Another approach would be to fix 'write_class_def' to handle these
> > declarations better, but that ended up being more work and felt fragile.
> > It also meant streaming a lot more information that we don't need.
> > 
> > Long term I had been playing around with reworking ubsan.cc entirely to
> > have a fixed set of types it would use that we could merge with, but
> > given that there seems to be at least one other place we are creating
> > ad-hoc types (the struct for constexpr new allocations), and I couldn't
> > see an easy way of reworking that, I thought we should support this.
> > 
> > Finally, I'm not 100% certain about the hard-coding MK_unique for fields
> > of contextless types, but given that we've given up merging the owning
> > TYPE_DECL with anything anyway I think it should be OK.
> > 
> > -- >8 --
> > 
> > Currently, most declarations must have a DECL_CONTEXT for modules
> > streaming to behave correctly, so that they can have an appropriate
> > merge key generated and be correctly deduplicated on import.
> > 
> > There are a few exceptions, however, for internally generated
> > declarations that will never be merged and don't necessarily have an
> > appropriate parent to key off for the context.  One case that's come up
> > a few times is TYPE_DECLs, especially temporary RECORD_TYPEs used as
> > intermediaries within expressions.
> > 
> > Previously I've tried to give all such types a DECL_CONTEXT, but in some
> > cases that has ended up being infeasible, such as with the types
> > generated by UBSan (which are shared with the C frontend and don't know
> > their context, especially when created at global scope).  Additionally,
> > these types often don't have many of the parts that a normal struct
> > declaration created via parsing user code would have, which confuses
> > module streaming.
> > 
> > Given that these types are typically intended to be one-off and unique
> > anyway, this patch instead adds support for by-value streaming of
> > uncontexted TYPE_DECLs.  The patch only supports streaming the bare
> > minimum number of fields needed for the cases I've come across so far;
> > in general the preference should still be to ensure that DECL_CONTEXT is
> > set where possible.
> 
> We should be able to distinguish such types by CLASS_TYPE_P, which is false
> for them.
> 
> Jason
> 

Right, thanks; here's an updated version that adds this check to
'trees_out::tree_node'.  I think leaving the other asserts is still
valuable to catch if we ever have cases that start using these bits
in the future.

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

Currently, most declarations must have a DECL_CONTEXT for modules
streaming to behave correctly, so that they can have an appropriate
merge key generated and be correctly deduplicated on import.

There are a few exceptions, however, for internally generated
declarations that will never be merged and don't necessarily have an
appropriate parent to key off for the context.  One case that's come up
a few times is TYPE_DECLs, especially temporary RECORD_TYPEs used as
intermediaries within expressions.

Previously I've tried to give all such types a DECL_CONTEXT, but in some
cases that has ended up being infeasible, such as with the types
generated by UBSan (which are shared with the C frontend and don't know
their context, especially when created at global scope).  Additionally,
these types often don't have many of the parts that a normal struct
declaration created via parsing user code would have, which confuses
module streaming.

Given that these types are typically intended to be one-off and unique
anyway, this patch instead adds support for by-value streaming of
uncontexted TYPE_DECLs.  The patch only supports streaming the bare
minimum number of fields needed for the cases I've come across so far;
in general the preference should still be to ensure that DECL_CONTEXT is
set where possible.

PR c++/98735
PR c++/120040

gcc/cp/ChangeLog:

* module.cc (trees_out::tree_value): Write TYPE_DECLs.
(trees_in::tree_value): Read TYPE_DECLs.
(trees_out::tree_node): Support uncontexted TYPE_DECLs, and
ensure that all parts of a by-value decl are marked for
streaming.
(trees_out::get_merge_kind): Treat members of uncontexted types
as always unique.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr120040_a.C: New test.
* g++.dg/modules/pr120040_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc | 104 ---
 1 file changed, 99 insertions(+), 5 deletions(-)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index f267c3e5fda..d587835dd4f 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -9654,9 +9654,10 @@ trees_out::tree_value (tree t)

   if (DECL_P (t))
 /* No templa

Re: [PATCH] testsuite: Fix pr101145inf*.c testcases [PR117494]

2025-05-26 Thread Christophe Lyon


On Mon, 26 May 2025 at 12:54, Andrew Pinski (QUIC)
 wrote:
>
> > -Original Message-
> > From: Christophe Lyon 
> > Sent: Monday, May 26, 2025 3:09 AM
> > To: Andrew Pinski (QUIC) 
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH] testsuite: Fix pr101145inf*.c testcases
> > [PR117494]
> >
> > Hi Andrew,
> >
> > On Sun, 17 Nov 2024 at 22:49, Andrew Pinski
> >  wrote:
> > >
> > > Instead of doing a dg-run with a specific target check for
> > linux.
> > > Use signal as the effective-target since this requires the use
> > of
> > > ALARM signal to do the testing.
> > > Also use check_vect in the main and renames main to main1
> > to make sure
> > > we don't use the registers.
> > >
> > > Tested on x86_64-linux-gnu.
> >
> > Can you explain the context of this change? Are you testing on
> > a target which matches *-*-linux* *-*-gnu* *-*-uclinux* but
> > does not support 'alarm'?
> >
> > I was looking at backporting this and a later patch from
> > Torbjorn to gcc-14, but I noticed that the default dg-do-what
> > in this directory is 'compile' and the previous dg-do run {
> > target *-*-linux* *-*-gnu* *-*-uclinux* }  changed it to 'run' on
> > linux targets (thus on linux we had 2 PASS, one for
> > compilation, one for execution).
> > And the test was skipped on non-linux targets.
> >
> > After your patch, the test became compile-only on linux, was
> > this intentional?
>
> So vect.exp has some extra code in there where the default to dg-run unless 
> the target you are running with does not support executing with the vector 
> options selected.
> So for an example on x86_64-linux-gnu with we get the test being executed:
> ```
> Setting LD_LIBRARY_PATH to 
> :/bajas/pinskia/src/upstream-gcc-isel/gcc/objdir/gcc:/bajas/pinskia/src/upstream-gcc-isel/gcc/objdir/gcc/32::/bajas/pinskia/src/upstream-gcc-isel/gcc/objdir/gcc:/bajas/pinskia/src/upstream-gcc-isel/gcc/objdir/gcc/32
> Execution timeout is: 300
> spawn [open ...]^M
> PASS: gcc.dg/vect/pr101145inf.c execution test
> ```
>
> If you are not getting the executed test any more then the target which you 
> are running on does not support using the options that are added by vect.exp.
>
> This is the exact behavior we need in this case; otherwise you get executable 
> failures in some cases. With dg-run, you will get executable failures on 
> RISC-V because the check_vect functionality is not currently implemented.
>
> I don't know which exact linux target you are testing on since you didn't 
> mention either. I can only assume it is arm-linux-gnueabi (as 
> aarch64-linux-gnu just does ` set dg-do-what-default run` so it runs always) 
> which does the following for check_vect_support_and_set_flags:
> ```
> } elseif [is-effective-target arm_neon_ok] {
> eval lappend DEFAULT_VECTCFLAGS [add_options_for_arm_neon ""]
> # NEON does not support denormals, so is not used for vectorization by
> # default to avoid loss of precision.  We must pass -ffast-math to 
> test
> # vectorization of float operations.
> lappend DEFAULT_VECTCFLAGS "-ffast-math"
> if [is-effective-target arm_neon_hw] {
> set dg-do-what-default run
> } else {
> set dg-do-what-default compile
> }
> }
> ```
>
> So ` is-effective-target arm_neon_hw` must be returning false on the hardware 
> you are running with. Which also means check_vect would have just done an 
> `exit (0)` and not do the full test for the hardware you are running on 
> anyways.
>

I was looking at aarch64-linux-gnu results, but initially for
gcc.dg/pr114052-1.c where there's a discussion in the PR about its
backport to gcc-14.
Looking at where the dg-require-effective-target alarm was coming
from, I found r15-7152-g57b706d141b87c which also modified
gcc.dg/vect/pr101145inf.c.

I forgot that vect.exp does set dg-do run, so you are right of course.
However, I think for the other testcases updated by r15-7152-g57b706d141b87c,
we probably want to keep the dg-do run line, as currently
gcc.dg/pr78185.c, gcc.dg/pr116906-1.c and gcc.dg/pr116906-2.c are
compile-only tests on aarch64-linux-gnu.

Thanks,

Christophe

>
> Thanks,
> Andrew Pinski
>
>
> >
> > Maybe we should use instead:
> > /* { dg-do run } */ (without target selector)
> > /* { dg-require-effective-target signal } */ not tested, but I
> > think this would restore the previous behaviour on
> > *linux* targets, but skip the test on targets which do not
> > support alarm.
> >
> > Am I missing something?
> > I can send a patch to do this.
> >
> > Thanks,
> >
> > Christophe
> >
> >
> > >
> > > PR testsuite/117494
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.dg/vect/pr101145inf.c: Remove dg-do and replace
> > > with dg-require-effective-target of signal.
> > > * gcc.dg/vect/pr101145inf_1.c: Likewise.
> > > * gcc.dg/vect/pr101145inf.inc: Rename main to main1
> > > and mark as noinline.
> > > Include tree-vect.h. Have mai

[PATCH v1 1/3] RISC-V: Combine vec_duplicate + vxor.vv to vxor.vx on GR2VR cost

2025-05-26 Thread pan2 . li

From: Pan Li 

This patch combines vec_duplicate + vxor.vv into vxor.vx, as in the
example code below.  The related pattern depends on the cost of the
vec_duplicate from GR2VR.  Late-combine performs the combination if the
GR2VR cost is zero, and rejects it if the GR2VR cost is greater than
zero.

Assume we have the example code below, with a GR2VR cost of 0.

  #define DEF_VX_BINARY(T, OP)                                        \
  void                                                                \
  test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
  {                                                                   \
    for (unsigned i = 0; i < n; i++)                                  \
      out[i] = in[i] OP x;                                            \
  }

  DEF_VX_BINARY(int32_t, ^)

Before this patch:
  test_vx_binary_xor_int32_t_case_0:
    beq      a3,zero,.L8
    vsetvli  a5,zero,e32,m1,ta,ma
    vmv.v.x  v2,a2
    slli     a3,a3,32
    srli     a3,a3,32
  .L3:
    vsetvli  a5,a3,e32,m1,ta,ma
    vle32.v  v1,0(a1)
    slli     a4,a5,2
    sub      a3,a3,a5
    add      a1,a1,a4
    vxor.vv  v1,v1,v2
    vse32.v  v1,0(a0)
    add      a0,a0,a4
    bne      a3,zero,.L3

After this patch:
  test_vx_binary_xor_int32_t_case_0:
    beq      a3,zero,.L8
    slli     a3,a3,32
    srli     a3,a3,32
  .L3:
    vsetvli  a5,a3,e32,m1,ta,ma
    vle32.v  v1,0(a1)
    slli     a4,a5,2
    sub      a3,a3,a5
    add      a1,a1,a4
    vxor.vx  v1,v1,a2
    vse32.v  v1,0(a0)
    add      a0,a0,a4
    bne      a3,zero,.L3
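
As a sanity check on the semantics both sequences must preserve, the loop can be written out for int32_t and XOR as plain C. This is only a sketch: the function name below is illustrative rather than the one generated by the testsuite macros, and whether the compiler emits vxor.vv or vxor.vx for it depends on the target options and the GR2VR cost.

```c
#include <stdint.h>

/* Same shape as the DEF_VX_BINARY expansion for int32_t and ^: every
   element of IN is XORed with the scalar X, which the vectorizer sees
   as a broadcast (vec_duplicate) of X.  */
void
test_vx_binary_xor_i32 (int32_t *restrict out, int32_t *restrict in,
                        int32_t x, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = in[i] ^ x;
}
```

Both the vmv.v.x + vxor.vv form and the vxor.vx form compute exactly this, so the combination only removes the broadcast from ahead of the loop.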

The following test suite passes with this patch.
* The full rv64gcv regression test.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add
new case for XOR op.
(expand_vx_binary_vec_vec_dup): Ditto.
* config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
* config/riscv/vector-iterators.md: Add new op xor to no_shift_vx_ops.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv-v.cc  | 2 ++
 gcc/config/riscv/riscv.cc| 1 +
 gcc/config/riscv/vector-iterators.md | 2 +-
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index a6ee582f87e..eedcda2b8ff 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -5535,6 +5535,7 @@ expand_vx_binary_vec_dup_vec (rtx op_0, rtx op_1, rtx 
op_2,
 case PLUS:
 case AND:
 case IOR:
+case XOR:
   icode = code_for_pred_scalar (code, mode);
   break;
 case MINUS:
@@ -5563,6 +5564,7 @@ expand_vx_binary_vec_vec_dup (rtx op_0, rtx op_1, rtx 
op_2,
 case MINUS:
 case AND:
 case IOR:
+case XOR:
   icode = code_for_pred_scalar (code, mode);
   break;
 default:
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 18c8e188f23..7f013d022ce 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3918,6 +3918,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
  case MINUS:
  case AND:
  case IOR:
+ case XOR:
{
  rtx op_0 = XEXP (x, 0);
  rtx op_1 = XEXP (x, 1);
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index a50b7fde9c6..77d72a78c1b 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -4042,7 +4042,7 @@ (define_code_iterator any_int_binop [plus minus and ior 
xor ashift ashiftrt lshi
 ])
 
 (define_code_iterator any_int_binop_no_shift_vx [
-  plus minus and ior
+  plus minus and ior xor
 ])
 
 (define_code_iterator any_int_unop [neg not])
-- 
2.43.0

RE: [PATCH] testsuite: Fix pr101145inf*.c testcases [PR117494]

2025-05-26 Thread Andrew Pinski (QUIC)

> -Original Message-
> From: Christophe Lyon 
> Sent: Monday, May 26, 2025 3:09 AM
> To: Andrew Pinski (QUIC) 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] testsuite: Fix pr101145inf*.c testcases
> [PR117494]
> 
> Hi Andrew,
> 
> On Sun, 17 Nov 2024 at 22:49, Andrew Pinski
>  wrote:
> >
> > Instead of doing a dg-run with a specific target check for
> linux.
> > Use signal as the effective-target since this requires the use
> of
> > ALARM signal to do the testing.
> > Also use check_vect in the main and renames main to main1
> to make sure
> > we don't use the registers.
> >
> > Tested on x86_64-linux-gnu.
> 
> Can you explain the context of this change? Are you testing on
> a target which matches *-*-linux* *-*-gnu* *-*-uclinux* but
> does not support 'alarm'?
> 
> I was looking at backporting this and a later patch from
> Torbjorn to gcc-14, but I noticed that the default dg-do-what
> in this directory is 'compile' and the previous dg-do run {
> target *-*-linux* *-*-gnu* *-*-uclinux* }  changed it to 'run' on
> linux targets (thus on linux we had 2 PASS, one for
> compilation, one for execution).
> And the test was skipped on non-linux targets.
> 
> After your patch, the test became compile-only on linux, was
> this intentional?

So vect.exp has some extra code in there where the default is dg-do run unless
the target you are running on does not support executing with the vector
options selected.
So, for example, on x86_64-linux-gnu we get the test being executed:
```
Setting LD_LIBRARY_PATH to 
:/bajas/pinskia/src/upstream-gcc-isel/gcc/objdir/gcc:/bajas/pinskia/src/upstream-gcc-isel/gcc/objdir/gcc/32::/bajas/pinskia/src/upstream-gcc-isel/gcc/objdir/gcc:/bajas/pinskia/src/upstream-gcc-isel/gcc/objdir/gcc/32
Execution timeout is: 300
spawn [open ...]^M
PASS: gcc.dg/vect/pr101145inf.c execution test
```

If you are not getting the executed test any more then the target which you are 
running on does not support using the options that are added by vect.exp.

This is the exact behavior we need in this case; otherwise you get execution
failures in some cases. With dg-do run, you will get execution failures on
RISC-V because the check_vect functionality is not currently implemented.

I don't know which exact linux target you are testing on since you didn't
mention it. I can only assume it is arm-linux-gnueabi (as aarch64-linux-gnu
just does `set dg-do-what-default run` so it always runs) which does the
following for check_vect_support_and_set_flags:
```
} elseif [is-effective-target arm_neon_ok] {
eval lappend DEFAULT_VECTCFLAGS [add_options_for_arm_neon ""]
# NEON does not support denormals, so is not used for vectorization by
# default to avoid loss of precision.  We must pass -ffast-math to test
# vectorization of float operations.
lappend DEFAULT_VECTCFLAGS "-ffast-math"
if [is-effective-target arm_neon_hw] {
set dg-do-what-default run
} else {
set dg-do-what-default compile
}
}
```

So `is-effective-target arm_neon_hw` must be returning false on the hardware
you are running on. That also means check_vect would have just done an
`exit (0)` and not run the full test on that hardware anyway.

Thanks,
Andrew Pinski

> 
> Maybe we should use instead:
> /* { dg-do run } */ (without target selector)
> /* { dg-require-effective-target signal } */ not tested, but I
> think this would restore the previous behaviour on
> *linux* targets, but skip the test on targets which do not
> support alarm.
> 
> Am I missing something?
> I can send a patch to do this.
> 
> Thanks,
> 
> Christophe
> 
> 
> >
> > PR testsuite/117494
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/vect/pr101145inf.c: Remove dg-do and replace
> > with dg-require-effective-target of signal.
> > * gcc.dg/vect/pr101145inf_1.c: Likewise.
> > * gcc.dg/vect/pr101145inf.inc: Rename main to main1
> > and mark as noinline.
> > Include tree-vect.h. Have main call check_vect and
> main1.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/testsuite/gcc.dg/vect/pr101145inf.c   | 2 +-
> >  gcc/testsuite/gcc.dg/vect/pr101145inf.inc | 9 -
> > gcc/testsuite/gcc.dg/vect/pr101145inf_1.c | 2 +-
> >  3 files changed, 10 insertions(+), 3 deletions(-)
> >
> > diff --git a/gcc/testsuite/gcc.dg/vect/pr101145inf.c
> > b/gcc/testsuite/gcc.dg/vect/pr101145inf.c
> > index 3ad8c1a2dd7..aa598875aa5 100644
> > --- a/gcc/testsuite/gcc.dg/vect/pr101145inf.c
> > +++ b/gcc/testsuite/gcc.dg/vect/pr101145inf.c
> > @@ -1,4 +1,4 @@
> > -/* { dg-do run { target *-*-linux* *-*-gnu* *-*-uclinux* } } */
> > +/* { dg-require-effective-target signal } */
> >  /* { dg-additional-options "-O3" } */  #include 
> #include
> > "pr101145inf.inc"
> > diff --git a/gcc/testsuite/gcc.dg/vect/pr101145inf.inc
> > b/gcc/testsuite/gcc.dg/vect/pr101145inf.inc
> > index 4aa3d049187..eb855b988

Re: [PATCH v3 0/9] Implement layouts from mdspan.

2025-05-26 Thread Tomasz Kaminski

On Mon, May 26, 2025 at 1:32 PM Luc Grosheintz 
wrote:

>
>
> On 5/26/25 11:43, Tomasz Kaminski wrote:
> > On Mon, May 26, 2025 at 11:35 AM Luc Grosheintz <
> luc.groshei...@gmail.com>
> > wrote:
> >
> >>
> >>
> >> On 5/22/25 15:21, Tomasz Kaminski wrote:
> >>>
> >>> For the stride and product computation, we should perform them in
> >>> Extent::size_type, not index_type.
> >>> The latter may be signed, and we may hit UB in multiplying non-zero
> >>> extents, before reaching the zero.
> >>>
> >>
> >> Then I observe the following issues:
> >>
> >> 1. When computing products, the integer promotion rules can interfere.
> >> For simplicity let's assume that int is a 32 bit integer. Then the
> >> relevant case is `uint16_t` (or unsigned short). Which is unsigned; and
> >> therefore overflow shouldn't be UB. I observe that the expression
> >>
> >> prod *= n;
> >>
> >> will overflow as `int` (for large enough `n`). I believe that during the
> >> computation of `prod * n` both sides are promoted to int (because the
> >> range of uint16_t is contained in the range of `int`) and then
> >> overflows, e.g. for n = 2**16-1.
> >>
> >> Note that many other small, both signed and unsigned, integers
> >> semantically also overflow, but it's neither UB that's detected by
> >> -fsanitize=undefined, nor a compiler error. Likely because the
> >> "overflow" happens during conversion, which (in C++23) is uniquely
> >> defined in [conv.integral], i.e. not UB.
> >>
> >> draft: https://eel.is/c++draft/conv.integral
> >> N4950: 7.3.9 on p. 101
> >>
> >> The solution I've come up with is to not use `size_type` but
> >> make_unsigned_t
> >>
> >> Please let me know if there's a better solution to forcing unsigned
> >> math.
> >>
> > I think at this point we should perform stride computation in std::size_t.
> > Because accessors are defined to accept size_t, the required_span_size()
> > cannot be greater than the maximum of size_t, and that limits our product
> > of extents.
> >
>
> I looked into this in the context of computing the product of
> static extents. The stumbling block was that I couldn't find
> a clear statement that sizeof(int) <= sizeof(size_t), or that
> size_t is exempted from the integer conversion rules.
>
> Therefore, the concern was that the overflow issue would come
> back on systems with 16-bit size_t and 32-bit int.
>
We could cast the elements of __dyn_exts to size_t before multiplying in
__ext_prod, or even use size_t as the loop variable type:
for (size_t x : __dyn_exts).
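
A minimal sketch of that idea, not the libstdc++ code: `ext_prod` is a
hypothetical stand-in for __ext_prod, and the extents are held in a plain
std::array. It assumes size_t is at least as wide as int, so that no promotion
to int takes place; that is exactly the open question for a target with 16-bit
size_t and 32-bit int.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: multiply all extents in std::size_t.
// Each element is converted to size_t *before* the multiplication, so a
// uint16_t extent is never promoted to (signed) int, and a wrap-around is
// well-defined unsigned arithmetic rather than signed-overflow UB.
template<typename IndexType, std::size_t N>
std::size_t
ext_prod(const std::array<IndexType, N>& exts)
{
  std::size_t prod = 1;
  for (IndexType e : exts)
    prod *= static_cast<std::size_t>(e);
  return prod;
}
```

With uint16_t extents {65535, 65535} the product is computed as an unsigned
value, whereas `prod *= n` on two uint16_t operands would first promote both
to int on a 32-bit-int target.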

>
> I'm slightly unhappy that (on common systems) we need to use
> 64-bit integers for 32-bit (or less) operations; but as you
> point out, this only affects code that shouldn't be performance
> sensitive.
>
> >>
> >> Godbolt: https://godbolt.org/z/PnvaYT7vd
> >>
> >> 2. Let's assume we compute `__extents_prod` safely, e.g. by doing all
> >> math as unsigned integers. There's several places we need to be careful:
> >>
> >> 2.1. layout_{right,left}::stride, these still compute products, that
> >> overflow and might not be multiplied by `0` to make the answer
> >> unambiguous. For an empty extent, any number is a valid stride. Hence,
> >> this only requires that we don't run into UB.
> >>
> >> 2.2. The default ctor of layout_stride computes the layout_right
> >> strides on the fly. We can use __unsigned_prod to keep computing the
> >> extents in linear time. The only requirement I'm aware of is that the
> >> strides are the same as those for layout_right (but the actual value
> >> is not defined directly).
> >>
> >> 2.3 layout_stride::required_span_size, the current implementation
> >> first scans for zeros; and only if there are none does it proceed with
> >> computing the required span size in index_type. This is safe, because
> >> all the terms in the sum are non-negative and the mandate states that
> >> the total is a representable number. Hence, all the involved terms are
> >> representable too.
> >>
> >> 3. For those interested in what the other two implementations do: both
> >> fail in some subset of the corner cases.
> >>
> >> Godbolt: https://godbolt.org/z/vEYxEvMWs
> >>
> >>
> >
>
>

Re: [PATCH 1/2] Match:Support IMM=-1 for signed scalar SAT_ADD IMM form1

2025-05-26 Thread Richard Biener

On Mon, May 19, 2025 at 10:41 AM Li Xu  wrote:
>
> From: xuli 
>
> This patch would like to support .SAT_ADD when IMM=-1.
>
> Form1:
> T __attribute__((noinline))  \
> sat_s_add_imm_##T##_fmt_1##_##INDEX (T x) \
> {\
>   T sum = (UT)x + (UT)IMM; \
>   return (x ^ IMM) < 0 \
> ? sum\
> : (sum ^ x) >= 0 \
>   ? sum  \
>   : x < 0 ? MIN : MAX;   \
> }
>
> Take below form1 as example:
> DEF_SAT_S_ADD_IMM_FMT_1(0, int8_t, uint8_t, -1, INT8_MIN, INT8_MAX)
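
For readers who want to check form 1 by hand, here is a sketch of what the
macro instance above expands to for T = int8_t, UT = uint8_t and IMM = -1,
next to a widen-and-clamp reference. The function names are hypothetical and
are not part of the patch or the testsuite.

```cpp
#include <cstdint>

// Form 1 expanded by hand for T = int8_t, UT = uint8_t, IMM = -1.
std::int8_t
sat_s_add_imm_m1_fmt_1(std::int8_t x)
{
  const std::int8_t imm = -1;
  // Unsigned 8-bit addition wraps modulo 256; converting the result back to
  // int8_t is modular on GCC (and guaranteed to be since C++20).
  std::uint8_t usum = static_cast<std::uint8_t>(static_cast<std::uint8_t>(x)
                                                + static_cast<std::uint8_t>(imm));
  std::int8_t sum = static_cast<std::int8_t>(usum);
  return static_cast<std::int8_t>((x ^ imm) < 0
                                    ? sum
                                    : (sum ^ x) >= 0
                                      ? sum
                                      : x < 0 ? INT8_MIN : INT8_MAX);
}

// Reference: add in a wider type, then clamp to the int8_t range.
std::int8_t
sat_add_ref(std::int8_t x, std::int8_t imm)
{
  int wide = static_cast<int>(x) + static_cast<int>(imm);
  if (wide < INT8_MIN)
    return INT8_MIN;
  if (wide > INT8_MAX)
    return INT8_MAX;
  return static_cast<std::int8_t>(wide);
}

// True when form 1 and the reference agree for every int8_t input.
bool
forms_agree()
{
  for (int v = INT8_MIN; v <= INT8_MAX; ++v)
    {
      std::int8_t x = static_cast<std::int8_t>(v);
      if (sat_s_add_imm_m1_fmt_1(x) != sat_add_ref(x, -1))
        return false;
    }
  return true;
}
```

The only input that saturates with IMM = -1 is x = INT8_MIN, which is why the
"before" dump below branches on `(x & -x) < 0`.
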
>
> Before this patch:
> __attribute__((noinline))
> int8_t sat_s_add_imm_int8_t_fmt_1_0 (int8_t x)
> {
>   unsigned char x.0_1;
>   unsigned char _2;
>   unsigned char _3;
>   int8_t iftmp.1_4;
>   signed char _8;
>   unsigned char _9;
>   signed char _10;
>
>[local count: 1073741824]:
>   x.0_1 = (unsigned char) x_5(D);
>   _3 = -x.0_1;
>   _10 = (signed char) _3;
>   _8 = x_5(D) & _10;
>   if (_8 < 0)
> goto ; [1.40%]
>   else
> goto ; [98.60%]
>
>[local count: 434070867]:
>   _2 = x.0_1 + 255;
>
>[local count: 1073741824]:
>   # _9 = PHI <_2(3), 128(2)>
>   iftmp.1_4 = (int8_t) _9;
>   return iftmp.1_4;
>
> }
>
> After this patch:
> __attribute__((noinline))
> int8_t sat_s_add_imm_int8_t_fmt_1_0 (int8_t x)
> {
>   int8_t _4;
>
>[local count: 1073741824]:
>   gimple_call <.SAT_ADD, _4, x_5(D), 255> [tail call]
>   gimple_return <_4>
>
> }
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 fully regression tests.

OK.

Richard.

> Signed-off-by: Li Xu 
>
> gcc/ChangeLog:
>
> * match.pd: Add signed scalar SAT_ADD IMM form1 with IMM=-1 matching.
> * tree-ssa-math-opts.cc (match_unsigned_saturation_add): Adapt 
> function name.
> (match_saturation_add_with_assign): Match signed and unsigned SAT_ADD 
> with assign.
> (math_opts_dom_walker::after_dom_children): Match imm=-1 signed 
> SAT_ADD with NOP_EXPR case.
>
> ---
>  gcc/match.pd  | 19 ++-
>  gcc/tree-ssa-math-opts.cc | 30 +-
>  2 files changed, 43 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 98411af3940..a07dbb808d2 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3403,7 +3403,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (bit_xor:c @0 INTEGER_CST@3)) integer_zerop)
>  (signed_integer_sat_val @0)
>  @2)
> -  (if (wi::bit_and (wi::to_wide (@1), wi::to_wide (@3)) == 0
> +  (if (wi::bit_and (wi::to_wide (@1), wi::to_wide (@3)) == 0)))
> +
> +(match (signed_integer_sat_add @0 @1)
> +  /* T SUM = (T)((UT)X + (UT)-1);
> + SAT_S_ADD = (X ^ -1) < 0 ? SUM : (X ^ SUM) >= 0 ? SUM
> + : (x < 0) ? MIN : MAX  
> */
> +  (convert (cond^ (lt (bit_and:c @0 (nop_convert (negate (nop_convert @0
> + integer_zerop)
> +INTEGER_CST@2
> +(plus (nop_convert @0) integer_all_onesp@1)))
> +   (with
> +{
> + unsigned precision = TYPE_PRECISION (type);
> + wide_int c1 = wi::to_wide (@1);
> + wide_int c2 = wi::to_wide (@2);
> + wide_int sum = wi::add (c1, c2);
> +}
> +(if (wi::eq_p (sum, wi::max_value (precision, SIGNED)))
>
>  /* Saturation sub for signed integer.  */
>  (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type))
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index 292eb852f2d..f6a1bea2002 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -4064,15 +4064,34 @@ build_saturation_binary_arith_call_and_insert 
> (gimple_stmt_iterator *gsi,
>   *   _10 = -_9;
>   *   _12 = _7 | _10;
>   *   =>
> - *   _12 = .SAT_ADD (_4, _6);  */
> + *   _12 = .SAT_ADD (_4, _6);
> + *
> + * Try to match IMM=-1 saturation signed add with assign.
> + *  [local count: 1073741824]:
> + * x.0_1 = (unsigned char) x_5(D);
> + * _3 = -x.0_1;
> + * _10 = (signed char) _3;
> + * _8 = x_5(D) & _10;
> + * if (_8 < 0)
> + *   goto ; [1.40%]
> + * else
> + *   goto ; [98.60%]
> + *  [local count: 434070867]:
> + * _2 = x.0_1 + 255;
> + *  [local count: 1073741824]:
> + * # _9 = PHI <_2(3), 128(2)>
> + * _4 = (int8_t) _9;
> + *   =>
> + * _4 = .SAT_ADD (x_5, -1); */
>
>  static void
> -match_unsigned_saturation_add (gimple_stmt_iterator *gsi, gassign *stmt)
> +match_saturation_add_with_assign (gimple_stmt_iterator *gsi, gassign *stmt)
>  {
>tree ops[2];
>tree lhs = gimple_assign_lhs (stmt);
>
> -  if (gimple_unsigned_integer_sat_add (lhs, ops, NULL))
> +  if (gimple_unsigned_integer_sat_add (lhs, ops, NULL)
> +  || gimple_signed_integer_sat_add (lhs, ops, NULL))
>  build_saturation_binary_arith_call_and_replace (gsi, IFN_SAT_ADD, lhs,
>

Re: [PATCH 1/2] Match:Support signed vector SAT_ADD IMM form 1

2025-05-26 Thread Richard Biener

On Mon, May 19, 2025 at 10:42 AM Li Xu  wrote:
>
> From: xuli 
>
> This patch would like to support vector SAT_ADD when one of the operands
> is a signed IMM.
>
> void __attribute__((noinline))   \
> vec_sat_s_add_imm_##T##_fmt_1##_##INDEX (T *out, T *op_1, unsigned limit) \
> {\
>   unsigned i;\
>   for (i = 0; i < limit; i++)\
> {\
>   T x = op_1[i]; \
>   T sum = (UT)x + (UT)IMM;   \
>   out[i] = (x ^ IMM) < 0 \
> ? sum\
> : (sum ^ x) >= 0 \
>   ? sum  \
>   : x < 0 ? MIN : MAX;   \
> }\
> }
>
> Take below form1 as example:
> DEF_VEC_SAT_S_ADD_IMM_FMT_1(0, int8_t, uint8_t, 9, INT8_MIN, INT8_MAX)
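
As a rough illustration of what that instance computes per element, the loop
can be expanded by hand for T = int8_t, UT = uint8_t and IMM = 9. The function
name below is hypothetical; the real test uses the macro-generated name.

```cpp
#include <cstdint>

// Hand expansion of the vector form 1 loop for int8_t with IMM = 9.
void
vec_sat_s_add_imm_9(std::int8_t* out, const std::int8_t* op_1, unsigned limit)
{
  const std::int8_t imm = 9;
  for (unsigned i = 0; i < limit; i++)
    {
      std::int8_t x = op_1[i];
      // Wrapping add done in the unsigned type, then viewed as signed again
      // (modular on GCC, guaranteed since C++20).
      std::uint8_t usum = static_cast<std::uint8_t>(static_cast<std::uint8_t>(x)
                                                    + static_cast<std::uint8_t>(imm));
      std::int8_t sum = static_cast<std::int8_t>(usum);
      // Different signs cannot overflow; same signs overflow only when the
      // sign of the sum differs from the sign of x.
      out[i] = static_cast<std::int8_t>((x ^ imm) < 0
                                          ? sum
                                          : (sum ^ x) >= 0
                                            ? sum
                                            : x < 0 ? INT8_MIN : INT8_MAX);
    }
}
```

With IMM = 9, inputs of 119 and above saturate to INT8_MAX, while negative
inputs can never overflow.
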
>
> Before this patch:
> __attribute__((noinline))
> void vec_sat_s_add_imm_int8_t_fmt_1_0 (int8_t * restrict out, int8_t * 
> restrict op_1, unsigned int limit)
> {
>   vector([16,16]) signed char * vectp_out.28;
>   vector([16,16]) signed char vect_iftmp.27;
>   vector([16,16])  mask__28.26;
>   vector([16,16])  mask__29.25;
>   vector([16,16])  mask__19.19;
>   vector([16,16])  mask__31.18;
>   vector([16,16]) signed char vect__6.17;
>   vector([16,16]) signed char vect__5.16;
>   vector([16,16]) signed char vect_sum_15.15;
>   vector([16,16]) unsigned char vect__4.14;
>   vector([16,16]) unsigned char vect_x.13;
>   vector([16,16]) signed char vect_x_14.12;
>   vector([16,16]) signed char * vectp_op_1.10;
>   vector([16,16])  _78;
>   vector([16,16]) unsigned char _79;
>   vector([16,16]) unsigned char _80;
>   unsigned long _92;
>   unsigned long ivtmp_93;
>   unsigned long ivtmp_94;
>   unsigned long _95;
>
>[local count: 118111598]:
>   if (limit_12(D) != 0)
> goto ; [89.00%]
>   else
> goto ; [11.00%]
>
>[local count: 105119322]:
>   _92 = (unsigned long) limit_12(D);
>
>[local count: 955630226]:
>   # vectp_op_1.10_62 = PHI 
>   # vectp_out.28_89 = PHI 
>   # ivtmp_93 = PHI 
>   _95 = .SELECT_VL (ivtmp_93, POLY_INT_CST [16, 16]);
>   vect_x_14.12_64 = .MASK_LEN_LOAD (vectp_op_1.10_62, 8B, { -1, ... }, _95, 
> 0);
>   vect_x.13_65 = VIEW_CONVERT_EXPR char>(vect_x_14.12_64);
>   vect__4.14_67 = vect_x.13_65 + { 9, ... };
>   vect_sum_15.15_68 = VIEW_CONVERT_EXPR char>(vect__4.14_67);
>   vect__5.16_70 = vect_x_14.12_64 ^ { 9, ... };
>   vect__6.17_71 = vect_x_14.12_64 ^ vect_sum_15.15_68;
>   mask__31.18_73 = vect__5.16_70 >= { 0, ... };
>   mask__19.19_75 = vect_x_14.12_64 < { 0, ... };
>   mask__29.25_85 = vect__6.17_71 < { 0, ... };
>   mask__28.26_86 = mask__31.18_73 & mask__29.25_85;
>   _78 = ~mask__28.26_86;
>   _79 = .VCOND_MASK (mask__19.19_75, { 128, ... }, { 127, ... });
>   _80 = .COND_ADD (_78, vect_x.13_65, { 9, ... }, _79);
>   vect_iftmp.27_87 = VIEW_CONVERT_EXPR(_80);
>   .MASK_LEN_STORE (vectp_out.28_89, 8B, { -1, ... }, _95, 0, 
> vect_iftmp.27_87);
>   vectp_op_1.10_63 = vectp_op_1.10_62 + _95;
>   vectp_out.28_90 = vectp_out.28_89 + _95;
>   ivtmp_94 = ivtmp_93 - _95;
>   if (ivtmp_94 != 0)
> goto ; [89.00%]
>   else
> goto ; [11.00%]
>
>[local count: 118111600]:
>   return;
>
> }
>
> After this patch:
> __attribute__((noinline))
> void vec_sat_s_add_imm_int8_t_fmt_1_0 (int8_t * restrict out, int8_t * 
> restrict op_1, unsigned int limit)
> {
>   vector([16,16]) signed char * vectp_out.12;
>   vector([16,16]) signed char vect_patt_10.11;
>   vector([16,16]) signed char vect_x_14.10;
>   vector([16,16]) signed char D.2852;
>   vector([16,16]) signed char * vectp_op_1.8;
>   vector([16,16]) signed char _73(D);
>   unsigned long _80;
>   unsigned long ivtmp_81;
>   unsigned long ivtmp_82;
>   unsigned long _83;
>
>[local count: 118111598]:
>   if (limit_12(D) != 0)
> goto ; [89.00%]
>   else
> goto ; [11.00%]
>
>[local count: 105119322]:
>   _80 = (unsigned long) limit_12(D);
>
>[local count: 955630226]:
>   # vectp_op_1.8_71 = PHI 
>   # vectp_out.12_77 = PHI 
>   # ivtmp_81 = PHI 
>   _83 = .SELECT_VL (ivtmp_81, POLY_INT_CST [16, 16]);
>   vect_x_14.10_74 = .MASK_LEN_LOAD (vectp_op_1.8_71, 8B, { -1, ... }, _73(D), 
> _83, 0);
>   vect_patt_10.11_75 = .SAT_ADD (vect_x_14.10_74, { 9, ... });
>   .MASK_LEN_STORE (vectp_out.12_77, 8B, { -1, ... }, _83, 0, 
> vect_patt_10.11_75);
>   vectp_op_1.8_72 = vectp_op_1.8_71 + _83;
>   vectp_out.12_78 = vectp_out.12_77 + _83;
>   ivtmp_82 = ivtmp_81 - _83;
>   if (ivtmp_82 != 0)
> goto ; [89.00%]
>   else
> goto ; [11.

Re: [AUTOFDO] Merge profiles of clones before annotating

2025-05-26 Thread Jan Hubicka

Hi,
> Ping?
Sorry for the delay. I think I finally got auto-fdo running on my box
and indeed I see that if a function is cloned later, the profile is lost.
There are .suffixes added before the afdo pass (such as for openmp offloading
or nested functions) and there are .suffixes added after afdo (by ipa
cloning and LTO privatization).  I see we want to merge those created by
ipa cloning (after the afdo pass).  But I do not think we want to merge
those for e.g. nested functions, since those are actually different
functions, or for openmp offloading.

I also wonder what happens with LTO privatization - i.e. how do we look up
which static function the symbol belongs to?

Overwriting the data by the last clone is definitely bad, so the patch
is OK, but we should figure out what happens in the cases above.

Also if we merge, it may happen that the clone is noticeably different
from the original - for example with ipa split it may be missing part of the
body. Is merging the tables elementwise safe then?
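
To make the question concrete, one reading of "elementwise" merging is
sketched below. This is not the code from the patch: `count_table` and
`merge_elementwise` are hypothetical names, and the table is simplified to a
map from a source position to a sample count.

```cpp
#include <cstdint>
#include <map>

// Hypothetical profile table: source position -> sample count.
using count_table = std::map<unsigned, std::uint64_t>;

// Add every entry of `clone` into `merged`.  A position that is missing from
// one of the tables simply contributes a count of zero for that table.
void
merge_elementwise(count_table& merged, const count_table& clone)
{
  for (const auto& entry : clone)
    merged[entry.first] += entry.second;
}
```

Under this reading, a clone that lacks part of the body (as after ipa split)
only leaves the corresponding positions unchanged; whether the real tables
behave the same way is the open question.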

Honza
> 
> Thanks,
> Kugan
> 
> 
> 
> > On 9 May 2025, at 11:54 am, Kugan Vivekanandarajah 
> >  wrote:
> >
> > This patch adds support for merging profiles from multiple clones.
> > That is, when optimized binaries have clones such as IPA-CP clones or SRA
> > clones, the generated gcov will have profiled them separately.
> > Currently we pick one and ignore the rest. This patch fixes this by
> > merging the profiles.
> >
> >
> > Regression tested on aarch64-linux-gnu with no new regression.
> > Also successfully ran autoprofiledbootstrap with the relevant patch.
> >
> > Is this OK for trunk?
> > Thanks,
> > Kugan
> >
>

Re: [AUTOFDO] Merge profiles of clones before annotating

2025-05-26 Thread Jan Hubicka

Hi,
also, please, can you add a testcase?  We should have some coverage for
auto-fdo specific issues.

Honza


0002-AUTOFDO-Merge-profiles-of-clones-before-annotating.patch
Description: 0002-AUTOFDO-Merge-profiles-of-clones-before-annotating.patch

Re: [PATCH 1/2] allow contraction to synthetic single-element vector FMA

2025-05-26 Thread Richard Biener

On Fri, May 23, 2025 at 2:31 PM Alexander Monakov  wrote:
>
> In PR 105965 we accepted a request to form FMA instructions when the
> source code is using a narrow generic vector that contains just one
> element, corresponding to V1SF or V1DF mode, while the backend does not
> expand fma patterns for such modes.
>
> For this to work under -ffp-contract=on, we either need to modify
> backends, or emulate such degenerate-vector FMA via scalar FMA in
> tree-vect-generic.  Do the latter.
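
For reference, the kind of source in question looks roughly like the sketch
below. It relies on the GNU vector extension (`vector_size`) and uses the
hypothetical names `v1sf` and `mul_add`; whether the multiply and add are
actually contracted into an FMA depends on the target and on -ffp-contract.

```cpp
// A generic vector type holding exactly one float (V1SF mode).
typedef float v1sf __attribute__((vector_size(sizeof(float))));

// A multiply followed by an add within one expression: a candidate for
// FMA contraction under -ffp-contract=on.
v1sf
mul_add(v1sf a, v1sf b, v1sf c)
{
  return a * b + c;
}
```
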

Can you instead apply the lowering during gimplification?  That is because
having an unsupported internal-function in the IL the user could not have
emitted directly is somewhat bad.  I thought the vector lowering could
be generalized for more single-argument internal functions but then no
such unsupported calls should exist in the first place.

Richard.

> gcc/c-family/ChangeLog:
>
> * c-gimplify.cc (fma_supported_p): Allow forming single-element
> vector FMA when scalar FMA is available.
> (c_gimplify_expr): Allow vector types.
>
> gcc/ChangeLog:
>
> * tree-vect-generic.cc (expand_vec1_fma): New helper.  Use it...
> (expand_vector_operations_1): ... here to handle IFN_FMA.
> ---
>  gcc/c-family/c-gimplify.cc | 10 ++--
>  gcc/tree-vect-generic.cc   | 48 --
>  2 files changed, 54 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
> index c6fb764656..1942d5019e 100644
> --- a/gcc/c-family/c-gimplify.cc
> +++ b/gcc/c-family/c-gimplify.cc
> @@ -875,7 +875,13 @@ c_build_bind_expr (location_t loc, tree block, tree body)
>  static bool
>  fma_supported_p (enum internal_fn fn, tree type)
>  {
> -  return direct_internal_fn_supported_p (fn, type, OPTIMIZE_FOR_BOTH);
> +  return (direct_internal_fn_supported_p (fn, type, OPTIMIZE_FOR_BOTH)
> + /* Accept single-element vector FMA (see PR 105965) when the
> +backend handles the scalar but not the vector mode.  */
> + || (VECTOR_TYPE_P (type)
> + && known_eq (TYPE_VECTOR_SUBPARTS (type),  1U)
> + && direct_internal_fn_supported_p (fn, TREE_TYPE (type),
> +OPTIMIZE_FOR_BOTH)));
>  }
>
>  /* Gimplification of expression trees.  */
> @@ -939,7 +945,7 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p 
> ATTRIBUTE_UNUSED,
> /* For -ffp-contract=on we need to attempt FMA contraction only
>during initial gimplification.  Late contraction across statement
>boundaries would violate language semantics.  */
> -   if (SCALAR_FLOAT_TYPE_P (type)
> +   if ((SCALAR_FLOAT_TYPE_P (type) || VECTOR_FLOAT_TYPE_P (type))
> && flag_fp_contract_mode == FP_CONTRACT_ON
> && cfun && !(cfun->curr_properties & PROP_gimple_any)
> && fma_supported_p (IFN_FMA, type))
> diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
> index 3c68361870..954b84edce 100644
> --- a/gcc/tree-vect-generic.cc
> +++ b/gcc/tree-vect-generic.cc
> @@ -1983,6 +1983,36 @@ expand_vector_conversion (gimple_stmt_iterator *gsi)
>gsi_replace (gsi, g, false);
>  }
>
> +/* Expand IFN_FMA, assuming vector contains just one scalar.
> +   c_gimplify_expr can introduce it when performing FMA contraction.  */
> +
> +static void
> +expand_vec1_fma (gimple_stmt_iterator *gsi)
> +{
> +  gcall *call = as_a  (gsi_stmt (*gsi));
> +  tree type = TREE_TYPE (gimple_call_arg (call, 0));
> +  if (!VECTOR_TYPE_P (type))
> +return;
> +  gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), 1U));
> +
> +  for (int i = 0; i < 3; i++)
> +{
> +  tree arg = gimple_call_arg (call, i);
> +  arg = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, TREE_TYPE (type), arg);
> +  gimple_call_set_arg (call, i, arg);
> +}
> +  tree lhs = gimple_call_lhs (call);
> +  if (lhs)
> +{
> +  tree new_lhs = make_ssa_name (TREE_TYPE (type));
> +  gimple_call_set_lhs (call, new_lhs);
> +  tree ctor = build_constructor_single (type, 0, new_lhs);
> +  gimple *g = gimple_build_assign (lhs, CONSTRUCTOR, ctor);
> +  gsi_insert_after (gsi, g, GSI_NEW_STMT);
> +}
> +  update_stmt (call);
> +}
> +
>  /* Process one statement.  If we identify a vector operation, expand it.  */
>
>  static void
> @@ -1998,8 +2028,22 @@ expand_vector_operations_1 (gimple_stmt_iterator *gsi)
>gassign *stmt = dyn_cast  (gsi_stmt (*gsi));
>if (!stmt)
>  {
> -  if (gimple_call_internal_p (gsi_stmt (*gsi), IFN_VEC_CONVERT))
> -   expand_vector_conversion (gsi);
> +  gcall *call = dyn_cast  (gsi_stmt (*gsi));
> +  if (!call || !gimple_call_internal_p (call))
> +   return;
> +  switch (gimple_call_internal_fn (call))
> +   {
> +   case IFN_VEC_CONVERT:
> + return expand_vector_conversion (gsi);
> +   case IFN_FMA:
> +   case IFN_FMS:
> +   case IFN_FNMA:
> +   case IFN_FNMS:
> + if (!direct_in

Re: [PATCH RFA] fold: DECL_VALUE_EXPR isn't simple [PR120400]

2025-05-26 Thread Iain Sandoe

Hi Jason

> On 26 May 2025, at 15:07, Jason Merrill  wrote:
> 
> Tested x86_64-pc-linux-gnu, OK for trunk?
> 
> Iain, will you verify that one of your coroutine testcases breaks without this
> fix?  

Yes; all current coroutine ramp cleanups are exposed to (potential) UB at
-O > 0. This patch resolves the issue (together with a typo fix that I
should get applied later today).

Note: this is very hard to test by execution since, on most platforms I’ve
tried, the coroutine frame content remains after it is freed and so the
dangling pointer still sees something that looks valid.  I’ve been manually
checking the gimple; possibly we might find some not-too-fragile test that way.

Iain.

> I don't think lambda or anonymous union uses of DECL_VALUE_EXPR can break
> in the same way, though this change is also correct for them.
> 
> -- 8< --
> 
> This PR noted that fold_truth_andor was wrongly changing && to & where the
> RHS is a VAR_DECL with DECL_VALUE_EXPR; we can't assume that such can be
> evaluated unconditionally.
> 
> To be more precise we could recurse into DECL_VALUE_EXPR, but that doesn't
> seem worth bothering with since typical uses involve a COMPONENT_REF, which
> is not simple.
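
A source-level analogy of the problem, not the PR testcase: `Frame` and
`ready` are hypothetical names. A variable with DECL_VALUE_EXPR stands for
some other expression, typically a member access through a pointer, so reading
it is not the harmless local read that the && to & transformation assumes.

```cpp
struct Frame
{
  bool done;
};

// Short-circuit form: `f->done` is read only when `f` is non-null.
// Rewriting this as `(f != nullptr) & f->done` would evaluate `f->done`
// unconditionally and dereference a null (or dangling) pointer.
bool
ready(const Frame* f)
{
  return f && f->done;
}
```
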
> 
>   PR c++/120400
> 
> gcc/ChangeLog:
> 
>   * fold-const.cc (simple_operand_p): False for vars with
>   DECL_VALUE_EXPR.
> ---
> gcc/fold-const.cc | 5 +
> 1 file changed, 5 insertions(+)
> 
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 5f48ced5063..014f4218793 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -5085,6 +5085,11 @@ simple_operand_p (const_tree exp)
>#pragma weak, etc).  */
> && ! TREE_PUBLIC (exp)
> && ! DECL_EXTERNAL (exp)
> +   /* DECL_VALUE_EXPR will expand to something non-simple.  */
> +   && ! ((VAR_P (exp)
> +  || TREE_CODE (exp) == PARM_DECL
> +  || TREE_CODE (exp) == RESULT_DECL)
> + && DECL_HAS_VALUE_EXPR_P (exp))
> /* Weakrefs are not safe to be read, since they can be NULL.
>They are !TREE_PUBLIC && !DECL_EXTERNAL but still
>have DECL_WEAK flag set.  */
> 
> base-commit: f59ff19bc3d37f4dd159db541ed4f07efb10fcc8
> -- 
> 2.49.0
>

PING² — Re: PING (and v2) – [Patch] nvptx/nvptx.opt: Update -march-map= for newer sm_xxx

2025-05-26 Thread Tobias Burnus


PING²

On May 12, 2025, Tobias Burnus wrote:

PING.

There is actually a minor update as meanwhile CUDA 12.9 was
released, which added the 'f' suffix and sm_103 and sm_121.
Still, the pattern remains the same; hence, a normal PING.

On April 25, 2025, Tobias Burnus wrote:


The idea of -march-map= is to select the best -march for a certain arch
in a simple and future-proof way, without requiring that the compiler
has support for it (like having a special multilib for it) - while
-march= sets the actually used '.target' (and the compiler might
actually generate specialized code for it).

The patch updates the sm_X for the CUDA 12.8 additions, namely for
three Blackwell GPU architectures.

Cf. 
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes
or also 
https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html


OK for mainline?

Tobias

PS: CUDA 12.7 seems to be an internal release, which shows up as a
PTX version but was not released to the public.
PTX 8.6/CUDA 12.7 added sm_100/sm_101 - and PTX 8.7/CUDA 12.8 added
sm_120.


PPS: sm_80 (Ampere) was added in PTX ISA 7.0 (CUDA 11.0),
sm_89 (Ada) in PTX ISA 7.8 (CUDA).
As sm_90 (Hopper) + sm_100/101/120 (Blackwell) currently/now map to
sm_89, GCC generates PTX ISA .version 7.8 for them.
Otherwise, sm_80 and sm_89 produce (for now) identical code.

nvptx/nvptx.opt: Update -march-map= for newer sm_xxx

Usage of the -march-map=: "Select the closest available '-march=' value
that is not more capable."

As PTX ISA 8.6/8.7 (= unreleased CUDA 12.7 + CUDA 12.8) added the
Nvidia Blackwell GPUs SM_100, SM_101, and SM_120, it makes sense to
add them as well. Note that all three come as sm_XXX and sm_XXXa.

PTX ISA 8.8 (CUDA 12.9) added SM_103 and SM_121 and the new 'f' suffix
for all SM_1xx.

Internally, GCC currently generates the same code for >= sm_80 (Ampere);
however, as GCC's -march= also supports sm_89 (Ada), the sm_1xx
(Blackwell) values added here will map to sm_89.
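
The selection rule can be pictured with the sketch below. It is only an
illustration: in nvptx.opt the mapping is spelled out as one Alias entry per
value rather than computed, the list of -march= values here is an assumed
subset, and `closest_march` is a hypothetical helper that ignores the 'a' and
'f' suffixes.

```cpp
// Assumed, illustrative subset of the sm_XX values accepted by -march=.
const int supported_sm[] = {30, 35, 53, 70, 75, 80, 89};

// Return the largest supported sm value that is not more capable than the
// requested one, or -1 if every supported value is more capable.
int
closest_march(int requested_sm)
{
  int best = -1;
  for (int sm : supported_sm)
    if (sm <= requested_sm && sm > best)
      best = sm;
  return best;
}
```

Under these assumptions, sm_100, sm_103 and sm_121 all select sm_89, while
sm_86 selects sm_80.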

[Naming note: while ptx code generated for sm_X can also run with sm_Y
if Y > X, code generated for sm_XXXa can (generally) only run on
the specific hardware; and sm_XXXf implies compatibility with only
subsequent targets in the same family.]

gcc/ChangeLog:

	* config/nvptx/nvptx.opt (march-map=): Add sm_100{,f,a},
	sm_101{,f,a}, sm_103{,a,f}, sm_120{,a,f} and sm_121{,f,a}.

 gcc/config/nvptx/nvptx.opt | 45 +
 1 file changed, 45 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index d326ca4ad26..9796839f8df 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -120,6 +120,51 @@ Target RejectNegative Alias(misa=,sm_89)
 march-map=sm_90a
 Target RejectNegative Alias(misa=,sm_89)
 
+march-map=sm_100
+Target RejectNegative Alias(misa=,sm_89)
+
+march-map=sm_100f
+Target RejectNegative Alias(misa=,sm_89)
+
+march-map=sm_100a
+Target RejectNegative Alias(misa=,sm_89)
+
+march-map=sm_101
+Target RejectNegative Alias(misa=,sm_89)
+
+march-map=sm_101f
+Target RejectNegative Alias(misa=,sm_89)
+
+march-map=sm_101a
+Target RejectNegative Alias(misa=,sm_89)
+
+march-map=sm_103
+Target RejectNegative Alias(misa=,sm_89)
+
+march-map=sm_103f
+Target RejectNegative Alias(misa=,sm_89)
+
+march-map=sm_103a
+Target RejectNegative Alias(misa=,sm_89)
+
+march-map=sm_120
+Target RejectNegative Alias(misa=,sm_89)
+
+march-map=sm_120f
+Target RejectNegative Alias(misa=,sm_89)
+
+march-map=sm_120a
+Target RejectNegative Alias(misa=,sm_89)
+
+march-map=sm_121
+Target RejectNegative Alias(misa=,sm_89)
+
+march-map=sm_121f
+Target RejectNegative Alias(misa=,sm_89)
+
+march-map=sm_121a
+Target RejectNegative Alias(misa=,sm_89)
+
 Enum
 Name(ptx_version) Type(enum ptx_version)
 Known PTX ISA versions (for use with the -mptx= option):

[PATCH] testsuite: Restore dg-do run on pr116906 and pr78185 tests

2025-05-26 Thread Christophe Lyon

Commit r15-7152-g57b706d141b87c removed
/* { dg-do run { target *-*-linux* *-*-gnu* *-*-uclinux* } } */

from these tests, turning them into 'compile' only tests, even when
they could be executed.

This patch adds
/* { dg-do run } */

which is OK since the tests are correctly skipped if needed thanks to
the following effective-targets (alarm and signal).

With this patch we have again two entries for these tests on linux targets:
* compile (test for excess errors)
* execution test
---
 gcc/testsuite/gcc.dg/pr116906-1.c | 1 +
 gcc/testsuite/gcc.dg/pr116906-2.c | 1 +
 gcc/testsuite/gcc.dg/pr78185.c| 1 +
 3 files changed, 3 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/pr116906-1.c 
b/gcc/testsuite/gcc.dg/pr116906-1.c
index 7187507a60d..ee60ad67e93 100644
--- a/gcc/testsuite/gcc.dg/pr116906-1.c
+++ b/gcc/testsuite/gcc.dg/pr116906-1.c
@@ -1,3 +1,4 @@
+/* { dg-do run } */
 /* { dg-require-effective-target alarm } */
 /* { dg-require-effective-target signal } */
 /* { dg-options "-O2" } */
diff --git a/gcc/testsuite/gcc.dg/pr116906-2.c 
b/gcc/testsuite/gcc.dg/pr116906-2.c
index 41a352bf837..4172ec3644a 100644
--- a/gcc/testsuite/gcc.dg/pr116906-2.c
+++ b/gcc/testsuite/gcc.dg/pr116906-2.c
@@ -1,3 +1,4 @@
+/* { dg-do run } */
 /* { dg-require-effective-target alarm } */
 /* { dg-require-effective-target signal } */
 /* { dg-options "-O2 -fno-tree-ch" } */
diff --git a/gcc/testsuite/gcc.dg/pr78185.c b/gcc/testsuite/gcc.dg/pr78185.c
index ada8b1b9f90..4c3af4f2890 100644
--- a/gcc/testsuite/gcc.dg/pr78185.c
+++ b/gcc/testsuite/gcc.dg/pr78185.c
@@ -1,3 +1,4 @@
+/* { dg-do run } */
 /* { dg-require-effective-target alarm } */
 /* { dg-require-effective-target signal } */
 /* { dg-options "-O" } */
-- 
2.34.1

Re: [PATCH] testsuite: Restore dg-do run on pr116906 and pr78185 tests

2025-05-26 Thread Christophe Lyon

On Mon, 26 May 2025 at 17:14, Christophe Lyon
 wrote:
>
> Commit r15-7152-g57b706d141b87c removed
> /* { dg-do run { target *-*-linux* *-*-gnu* *-*-uclinux* } } */
>
> from these tests, turning them into 'compile' only tests, even when
> they could be executed.
>
> This patch adds
> /* { dg-do run } */
>
> which is OK since the tests are correctly skipped if needed thanks to
> the following effective-targets (alarm and signal).
>
> With this patch we have again two entries for these tests on linux targets:
> * compile (test for excess errors)
> * execution test

Gasp, I forgot to add a ChangeLog entry, but it would be an obvious one:
Add 'dg-do run' :-)


> ---
>  gcc/testsuite/gcc.dg/pr116906-1.c | 1 +
>  gcc/testsuite/gcc.dg/pr116906-2.c | 1 +
>  gcc/testsuite/gcc.dg/pr78185.c| 1 +
>  3 files changed, 3 insertions(+)
>
> diff --git a/gcc/testsuite/gcc.dg/pr116906-1.c 
> b/gcc/testsuite/gcc.dg/pr116906-1.c
> index 7187507a60d..ee60ad67e93 100644
> --- a/gcc/testsuite/gcc.dg/pr116906-1.c
> +++ b/gcc/testsuite/gcc.dg/pr116906-1.c
> @@ -1,3 +1,4 @@
> +/* { dg-do run } */
>  /* { dg-require-effective-target alarm } */
>  /* { dg-require-effective-target signal } */
>  /* { dg-options "-O2" } */
> diff --git a/gcc/testsuite/gcc.dg/pr116906-2.c 
> b/gcc/testsuite/gcc.dg/pr116906-2.c
> index 41a352bf837..4172ec3644a 100644
> --- a/gcc/testsuite/gcc.dg/pr116906-2.c
> +++ b/gcc/testsuite/gcc.dg/pr116906-2.c
> @@ -1,3 +1,4 @@
> +/* { dg-do run } */
>  /* { dg-require-effective-target alarm } */
>  /* { dg-require-effective-target signal } */
>  /* { dg-options "-O2 -fno-tree-ch" } */
> diff --git a/gcc/testsuite/gcc.dg/pr78185.c b/gcc/testsuite/gcc.dg/pr78185.c
> index ada8b1b9f90..4c3af4f2890 100644
> --- a/gcc/testsuite/gcc.dg/pr78185.c
> +++ b/gcc/testsuite/gcc.dg/pr78185.c
> @@ -1,3 +1,4 @@
> +/* { dg-do run } */
>  /* { dg-require-effective-target alarm } */
>  /* { dg-require-effective-target signal } */
>  /* { dg-options "-O" } */
> --
> 2.34.1
>

[PATCH] testsuite, arm: factorize arm_v8_neon_ok flags

2025-05-26 Thread Christophe Lyon

Like we do in other effective-targets, add "-mcpu=unset
-march=armv8-a" directly when setting et_arm_v8_neon_flags in
arm_v8_neon_ok_nocache, to avoid having to add these two flags in all
users of arm_v8_neon_ok.

This avoids duplication and possible typos.

gcc/testsuite/ChangeLog:
* lib/target-supports.exp
(check_effective_target_arm_v8_neon_ok_nocache): Add "-mcpu=unset
-march=armv8-a" to et_arm_v8_neon_flags.
(add_options_for_vect_early_break): Remove useless "-mcpu=unset
-march=armv8-a".
(add_options_for_arm_v8_neon): Likewise.
---
 gcc/testsuite/lib/target-supports.exp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 3cbc13fc8a7..7ace678518e 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4574,7 +4574,7 @@ proc add_options_for_vect_early_break { flags } {
 
 if { [check_effective_target_arm_v8_neon_ok] } {
global et_arm_v8_neon_flags
-   return "$flags $et_arm_v8_neon_flags -mcpu=unset -march=armv8-a"
+   return "$flags $et_arm_v8_neon_flags"
 }
 
 if { [check_effective_target_sse4] } {
@@ -5397,7 +5397,7 @@ proc add_options_for_arm_v8_neon { flags } {
return "$flags"
 }
 global et_arm_v8_neon_flags
-return "$flags $et_arm_v8_neon_flags -mcpu=unset -march=armv8-a"
+return "$flags $et_arm_v8_neon_flags"
 }
 
 # Add the options needed for ARMv8.1 Adv.SIMD.  Also adds the ARMv8 NEON
@@ -5856,7 +5856,7 @@ proc check_effective_target_arm_v8_neon_ok_nocache { } {
__asm__ volatile ("vrintn.f32 q0, q0");
}
} "$flags -mcpu=unset -march=armv8-a"] } {
-   set et_arm_v8_neon_flags $flags
+   set et_arm_v8_neon_flags "$flags -mcpu=unset -march=armv8-a"
return 1
}
 }
-- 
2.34.1

Re: [Patch] OpenMP/C++: Avoid ICE for BIND_EXPR with empty BIND_EXPR_BLOCK [PR120413]

2025-05-26 Thread Tobias Burnus


Jakub Jelinek wrote:

There is also BIND_EXPR_VARS, dunno if that should be walked instead or
in addition.


The usage is to ensure that variables are mapped with lambdas (→ 
closure_vars_accessed.add (…)) but not if they are local variables (→ 
data->local_decls.add (var)).


The 'closure_vars_accessed.add(…)' code currently only triggers for 
libgomp.c++/target-lambda-1.C.


I will leave the PR open for some more testing and test-case adding.

However, as an ICE is bad, I now committed the patch with the suggested 
change:



Anyway, from formatting POV, it would be nicer to do

...

   if (tree block = BIND_EXPR_BLOCK (t))


Done so - and committed as r16-881-g45b849d05b733a.

Thanks,

Tobias

[PATCH] testsuite: arm: add needed -mcpu / -march to arm_crypto_ok

2025-05-26 Thread Christophe Lyon

This effective target implicitly expects -march=armv8-a, otherwise
with a toolchain configured for instance with
--with-cpu=cortex-m0 --with-float=soft,
it fails even when trying
-mfpu=crypto-neon-fp-armv8 -mfloat-abi=softfp:
arm_neon.h:45:2: error: #error "NEON intrinsics not available with the 
soft-float ABI.  Please use -mfloat-abi=softfp or -mfloat-abi=hard"

With this patch, the effective target succeeds using
-mfpu=crypto-neon-fp-armv8 -mfloat-abi=softfp -mcpu=unset -march=armv8-a
thus enabling a few more tests.

For instance with a toolchain defaulting to cortex-m0, we now enable:
gcc.target/arm/aes-fuse-1.c
gcc.target/arm/aes-fuse-2.c
gcc.target/arm/aes_xor_combine.c
gcc.target/arm/attr-neon3.c
gcc.target/arm/crypto-*
gcc.target/arm/simd: several *p64* tests

Out of these, a few are failing, but this should be addressed
separately:
FAIL: gcc.target/arm/aes-fuse-1.c scan-assembler-times crypto_aese_fused 6
FAIL: gcc.target/arm/aes-fuse-2.c scan-assembler-times crypto_aesd_fused 6

With a toolchain defaulting to cortex-m55, we have these additional failures:
FAIL: gcc.target/arm/attr-neon3.c (test for excess errors)
FAIL: gcc.target/arm/crypto-vsha1cq_u32.c scan-assembler-times 
vdup.32\\tq[0-9]+, r[0-9]+ 4
FAIL: gcc.target/arm/crypto-vsha1h_u32.c scan-assembler-times 
vdup.32\\tq[0-9]+, r[0-9]+ 4
FAIL: gcc.target/arm/crypto-vsha1mq_u32.c scan-assembler-times 
vdup.32\\tq[0-9]+, r[0-9]+ 4
FAIL: gcc.target/arm/crypto-vsha1pq_u32.c scan-assembler-times 
vdup.32\\tq[0-9]+, r[0-9]+ 4

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_arm_crypto_ok_nocache): Add "-mcpu=unset
-march=armv8-a".
---
 gcc/testsuite/lib/target-supports.exp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 7ace678518e..a88f9be8851 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5346,8 +5346,8 @@ proc check_effective_target_arm_crypto_ok_nocache { } {
{
  return vaeseq_u8 (a, b);
}
-   } "$flags"] } {
-   set et_arm_crypto_flags $flags
+   } "$flags -mcpu=unset -march=armv8-a"] } {
+   set et_arm_crypto_flags "$flags -mcpu=unset -march=armv8-a"
return 1
}
}
-- 
2.34.1

[PATCH] arm_neon.h: remove useless push/pop pragmas

2025-05-26 Thread Christophe Lyon

Remove the pop_options / push_options pair and the
#pragma GCC target ("arch=armv8.2-a+bf16") that follows it: the pragma
selects the same target as the preceding one and is thus useless.

gcc/ChangeLog:

* config/arm/arm_neon.h: Remove useless push/pop pragmas.
---
 gcc/config/arm/arm_neon.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index cba50de0720..105385f7f5d 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -20938,11 +20938,6 @@ vbfdotq_lane_f32 (float32x4_t __r, bfloat16x8_t __a, 
bfloat16x4_t __b,
   return __builtin_neon_vbfdot_lanev4bfv4sf (__r, __a, __b, __index);
 }
 
-#pragma GCC pop_options
-
-#pragma GCC push_options
-#pragma GCC target ("arch=armv8.2-a+bf16")
-
 typedef struct bfloat16x4x2_t
 {
   bfloat16x4_t val[2];
-- 
2.34.1

Re: [PATCH v4 2/8] libstdc++: Implement layout_left from mdspan.

2025-05-26 Thread Tomasz Kaminski

On Mon, May 26, 2025 at 4:15 PM Luc Grosheintz 
wrote:

> Implements the parts of layout_left that don't depend on any of the
> other layouts.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/mdspan (layout_left): New class.
> * src/c++23/std.cc.in: Add layout_left.
>
> Signed-off-by: Luc Grosheintz 
> ---
>  libstdc++-v3/include/std/mdspan  | 304 ++-
>  libstdc++-v3/src/c++23/std.cc.in |   1 +
>  2 files changed, 304 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/std/mdspan
> b/libstdc++-v3/include/std/mdspan
> index 0f49b0e09a0..d81072596b4 100644
> --- a/libstdc++-v3/include/std/mdspan
> +++ b/libstdc++-v3/include/std/mdspan
> @@ -144,6 +144,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   { return __exts[__i]; });
>   }
>
> +   static constexpr span
> +   _S_static_extents(size_t __begin, size_t __end) noexcept
> +   {
> + return {_Extents.data() + __begin, _Extents.data() + __end};
> +   }
>
Oh, I think I was very unclear regarding removing the dependency on
index_type.
What I was thinking of is changing this function to:
+   static consteval array&
+   _S_static_extents() noexcept
+   {
+ return _Extents;
+   }

> +
> +   constexpr span
> +   _M_dynamic_extents(size_t __begin, size_t __end) const noexcept
> +   requires (_Extents.size() > 0)
> +   {
> + return {_M_dyn_exts + _S_dynamic_index[__begin],
> + _M_dyn_exts + _S_dynamic_index[__end]};
> +   }
> +
>private:
> using _S_storage = __array_traits<_IndexType,
> _S_rank_dynamic>::_Type;
> [[no_unique_address]] _S_storage _M_dyn_exts{};
> @@ -160,6 +174,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> || _Extent <= numeric_limits<_IndexType>::max();
>}
>
> +  namespace __mdspan
> +  {
> +template
> +  constexpr span
> +  __static_extents(size_t __begin = 0, size_t __end =
> _Extents::rank())
> +  { return _Extents::_S_storage::_S_static_extents(__begin, __end); }

Also adjusting this one to have:
   template
 constexpr const std::array&
__static_extents()
+  { return _Extents::_S_storage::_S_static_extents; }

> +
> +template
> +  constexpr span
> +  __dynamic_extents(const _Extents& __exts, size_t __begin = 0,
> +   size_t __end = _Extents::rank())
> +  {
> +   return __exts._M_exts._M_dynamic_extents(__begin, __end);
> +  }
> +  }
> +
>template
>  class extents
>  {
> @@ -251,7 +281,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> : _M_exts(span(__exts))
> { }
>
> -
>template<__mdspan::__valid_index_type _OIndexType,
> size_t _Nm>
> requires (_Nm == rank() || _Nm == rank_dynamic())
> constexpr explicit(_Nm != rank_dynamic())
> @@ -276,6 +305,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> }
>
>  private:
> +  friend span
> +  __mdspan::__static_extents(size_t, size_t);
> +
> +  friend span
> +  __mdspan::__dynamic_extents(const extents&, size_t,
> size_t);
> +
>using _S_storage = __mdspan::_ExtentsStorage<
> _IndexType, array{_Extents...}>;
>[[no_unique_address]] _S_storage _M_exts;
> @@ -286,6 +321,58 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>namespace __mdspan
>{
> +template
> +  constexpr bool
> +  __contains_zero(span<_Tp, _Nm> __exts)
> +  {
> +   for (size_t __i = 0; __i < __exts.size(); ++__i)
> + if (__exts[__i] == 0)
> +   return true;
> +   return false;
> +  }
> +
> +constexpr size_t
> +__static_extents_prod(const auto& __sta_exts)
>
Then this could be implemented as:
   template
   __static_extents_prod(size_t __begin, size_t __end);
// We provide the array as template parameter.
   {
  }

> +{
> +  size_t __ret = 1;
> +  for (auto __factor : span(_Ext).subspan(__begin, __end))
> +   if (__factor != dynamic_extent)
> + __ret *= __factor;
> +  return __ret;
> +}
> +
> +template
> +  constexpr typename _Extents::index_type
> +  __exts_prod(const _Extents& __exts, size_t __begin, size_t __end)
> noexcept
> +  {
> +   using _IndexType = typename _Extents::index_type;
> +
> +   auto __sta_exts = __static_extents<_Extents>(__begin, __end);
> +   size_t __sta_prod = __static_extents_prod(__sta_exts);
>
And then it will be called as follows:
   constexpr auto& __sta_exts = __static_extents<_Extents>();
size_t __sta_prod = __static_extents_prod<__sta_exts>(__begin, __end);
This way __static_extents_prod will not depend on index type.

We seem to also compute the product twice.
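
To illustrate the suggestion above, here is a minimal standalone sketch,
not the proposed libstdc++ code.  The names static_extents_prod, dyn and
example_exts are hypothetical; dyn merely stands in for
std::dynamic_extent, and the sketch assumes the range is the half-open
[begin, end).  The array is passed as a reference template parameter, so
the function depends only on the extents and not on any index type:

```cpp
#include <array>
#include <cstddef>
#include <limits>

// Hypothetical stand-in for std::dynamic_extent.
constexpr std::size_t dyn = std::numeric_limits<std::size_t>::max();

// The static extents are supplied as a template argument (a reference to
// a constexpr array), so no index_type appears in the signature.
template <const auto& Exts>
constexpr std::size_t
static_extents_prod(std::size_t begin, std::size_t end) noexcept
{
  std::size_t ret = 1;
  for (std::size_t i = begin; i < end; ++i)
    if (Exts[i] != dyn)   // dynamic extents do not contribute
      ret *= Exts[i];
  return ret;
}

// Example data, assumed for illustration only.
constexpr std::array<std::size_t, 4> example_exts{2, dyn, 3, 5};

static_assert(static_extents_prod<example_exts>(0, 4) == 30);
static_assert(static_extents_prod<example_exts>(1, 2) == 1);
```

Since one instantiation exists per extents array rather than per
(array, index_type) pair, this should reduce the number of template
instantiations, assuming C++17 or later for the auto template parameter.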

> +
> +   size_t __ret = 1;
> +   if constexpr (_Extents::rank_dynamic() != _Extents::rank())
> + __ret = __static_extents_prod(__sta_exts);
> +
> +   if (__ret == 0)
> + return 0;
> +
> +   if constexpr (_Extents::rank_dynamic() > 0)
> + for (auto __factor : __

[committed] libstdc++: Run in_place constructor test for std::indirect [PR119152]

2025-05-26 Thread Tomasz Kamiński

In indirect/ctor.cc test_inplace_ctor function was defined, but never
called.

PR libstdc++/119152

libstdc++-v3/ChangeLog:

* testsuite/std/memory/indirect/ctor.cc: Run test_inplace_ctor.
---
Tested on x86_64-linux. Pushed to trunk.

 libstdc++-v3/testsuite/std/memory/indirect/ctor.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/std/memory/indirect/ctor.cc 
b/libstdc++-v3/testsuite/std/memory/indirect/ctor.cc
index 67e7a8aba03..124874d02fe 100644
--- a/libstdc++-v3/testsuite/std/memory/indirect/ctor.cc
+++ b/libstdc++-v3/testsuite/std/memory/indirect/ctor.cc
@@ -139,7 +139,6 @@ test_inplace_ctor()
 
   std::indirect> i5(std::in_place);
   VERIFY( i5->size() == 0 );
-  VERIFY( i5->at(0) == 13 );
 
   std::indirect> i6(std::in_place, 5, 13);
   VERIFY( i6->size() == 5 );
@@ -194,10 +193,12 @@ int main()
 {
   test_default_ctor();
   test_forwarding_ctor();
+  test_inplace_ctor();
 
   static_assert([] {
 test_default_ctor();
 test_forwarding_ctor();
+test_inplace_ctor();
 return true;
   });
 }
-- 
2.49.0

[PATCH] arm: always enable both simd and mve builtins

2025-05-26 Thread Christophe Lyon

We get lots of error messages when compiling arm_neon.h under
e.g. -mcpu=cortex-m55, because Neon builtins are enabled only when
!TARGET_HAVE_MVE.  This has been the case since MVE support was
introduced.

This patch uses an approach similar to what we do on aarch64, but only
partially since Neon intrinsics do not use the "new" framework.

We register all types and Neon intrinsics, whether MVE is enabled or
not, which makes it possible to compile arm_neon.h.  However, we need to
introduce a "switcher" similar to aarch64's to avoid ICEs when LTO is
enabled: in that case, since we have to enable the MVE intrinsics, we
temporarily change arm_active_target.isa to enable MVE bits.  This
enables hooks like arm_vector_mode_supported_p and arm_array_mode to
behave as expected by the MVE intrinsics framework.  We switch back
to the previous arm_active_target.isa immediately after.

There is no impact on the testsuite results, except that gcc.log is no
longer full of error messages when trying to compile arm_neon.h if
MVE is forced somehow.
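
The "switcher" idea can be sketched as a small RAII guard.  This is only
an illustration under simplifying assumptions, not the code in the patch:
active_isa is a plain bitmask standing in for arm_active_target.isa, and
isa_switcher, ISA_NEON, ISA_MVE and register_mve_builtins are
hypothetical names.

```cpp
#include <cstdint>

// Hypothetical stand-in for arm_active_target.isa: a plain bitmask.
using isa_bits = std::uint32_t;
constexpr isa_bits ISA_NEON = 1u << 0;
constexpr isa_bits ISA_MVE  = 1u << 1;

static isa_bits active_isa = ISA_NEON;

// RAII guard: enable extra ISA bits on construction and restore the
// previous value on destruction, even on early return.
class isa_switcher
{
public:
  explicit isa_switcher (isa_bits extra) : saved_ (active_isa)
  { active_isa |= extra; }

  ~isa_switcher () { active_isa = saved_; }

  isa_switcher (const isa_switcher &) = delete;
  isa_switcher &operator= (const isa_switcher &) = delete;

private:
  isa_bits saved_;
};

// Hypothetical registration step: code in this scope sees MVE enabled.
static bool
register_mve_builtins ()
{
  isa_switcher guard (ISA_MVE);
  return (active_isa & ISA_MVE) != 0;
}
```

Because the restore happens in the destructor, the previous ISA state
comes back as soon as the registration scope ends, which matches the
"switch back immediately after" behaviour described above.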

gcc/ChangeLog:

* config/arm/arm-builtins.cc (arm_init_simd_builtin_types): Remove
TARGET_HAVE_MVE condition.
(arm_init_mve_builtins): Remove calls to
arm_init_simd_builtin_types and
arm_init_simd_builtin_scalar_types.  Switch to MVE isa flags.
(arm_init_neon_builtins): Remove calls to
arm_init_simd_builtin_types and
arm_init_simd_builtin_scalar_types.
(arm_target_switcher::arm_target_switcher): New.
(arm_target_switcher::~arm_target_switcher): New.
(arm_init_builtins): Call arm_init_simd_builtin_scalar_types and
arm_init_simd_builtin_types.  Always call arm_init_mve_builtins
and arm_init_neon_builtins.
* config/arm/arm-protos.h (class arm_target_switcher): New.
---
 gcc/config/arm/arm-builtins.cc | 131 ++---
 gcc/config/arm/arm-protos.h|  15 
 2 files changed, 101 insertions(+), 45 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index 3bb2566f9a2..2e4f3595ed2 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -48,6 +48,7 @@
 #include "basic-block.h"
 #include "gimple.h"
 #include "ssa.h"
+#include "regs.h"
 
 #define SIMD_MAX_BUILTIN_ARGS 7
 
@@ -1105,37 +1106,35 @@ arm_init_simd_builtin_types (void)
  an entry in our mangling table, consequently, they get default
  mangling.  As a further gotcha, poly8_t and poly16_t are signed
  types, poly64_t and poly128_t are unsigned types.  */
-  if (!TARGET_HAVE_MVE)
-{
-  arm_simd_polyQI_type_node
-   = build_distinct_type_copy (intQI_type_node);
-  (*lang_hooks.types.register_builtin_type) (arm_simd_polyQI_type_node,
-"__builtin_neon_poly8");
-  arm_simd_polyHI_type_node
-   = build_distinct_type_copy (intHI_type_node);
-  (*lang_hooks.types.register_builtin_type) (arm_simd_polyHI_type_node,
-"__builtin_neon_poly16");
-  arm_simd_polyDI_type_node
-   = build_distinct_type_copy (unsigned_intDI_type_node);
-  (*lang_hooks.types.register_builtin_type) (arm_simd_polyDI_type_node,
-"__builtin_neon_poly64");
-  arm_simd_polyTI_type_node
-   = build_distinct_type_copy (unsigned_intTI_type_node);
-  (*lang_hooks.types.register_builtin_type) (arm_simd_polyTI_type_node,
-"__builtin_neon_poly128");
-  /* Init poly vector element types with scalar poly types.  */
-  arm_simd_types[Poly8x8_t].eltype = arm_simd_polyQI_type_node;
-  arm_simd_types[Poly8x16_t].eltype = arm_simd_polyQI_type_node;
-  arm_simd_types[Poly16x4_t].eltype = arm_simd_polyHI_type_node;
-  arm_simd_types[Poly16x8_t].eltype = arm_simd_polyHI_type_node;
-  /* Note: poly64x2_t is defined in arm_neon.h, to ensure it gets default
-mangling.  */
-
-  /* Prevent front-ends from transforming poly vectors into string
-literals.  */
-  TYPE_STRING_FLAG (arm_simd_polyQI_type_node) = false;
-  TYPE_STRING_FLAG (arm_simd_polyHI_type_node) = false;
-}
+  arm_simd_polyQI_type_node
+= build_distinct_type_copy (intQI_type_node);
+  (*lang_hooks.types.register_builtin_type) (arm_simd_polyQI_type_node,
+"__builtin_neon_poly8");
+  arm_simd_polyHI_type_node
+= build_distinct_type_copy (intHI_type_node);
+  (*lang_hooks.types.register_builtin_type) (arm_simd_polyHI_type_node,
+"__builtin_neon_poly16");
+  arm_simd_polyDI_type_node
+= build_distinct_type_copy (unsigned_intDI_type_node);
+  (*lang_hooks.types.register_builtin_type) (arm_simd_polyDI_type_node,
+"__builtin_neon_poly64");
+  arm_simd_polyTI_type_node
+= build_distinct_type

Re: [PATCH] fortran: add constant input support for trig functions with half-revolutions

2025-05-26 Thread Steve Kargl

On Mon, May 26, 2025 at 09:30:59AM +, Yuao Ma wrote:
> Hi Steve,
> 
> > I looked at the patch in a bit more detail, and
> > I am not thrilled with large-scale whitespace
> > changes mingled with functional changes. It makes
> > the patch harder to read and review.
> 
> I'm not sure which file you're referring to.
> 
> If it's mathbuiltins.def, I'll need to add extra spaces to maintain argument
> alignment when I add the seven new built-ins.
> 
> If it's intrinsic.cc, the issue is related to clang-format usage. You can find
> a more detailed explanation at
> https://gcc.gnu.org/pipermail/fortran/2025-May/062193.html.
> 

It's intrinsic.cc.

   /* Two-argument version of atand, equivalent to atan2d.  */
-  add_sym_2 ("atand", GFC_ISYM_ATAN2D, CLASS_ELEMENTAL, ACTUAL_YES,
-BT_REAL, dr, GFC_STD_F2023,
-gfc_check_atan2, gfc_simplify_atan2d, gfc_resolve_trigd2,
-y, BT_REAL, dr, REQUIRED,
-x, BT_REAL, dr, REQUIRED);
+  add_sym_2 ("atand", GFC_ISYM_ATAN2D, CLASS_ELEMENTAL, ACTUAL_YES, BT_REAL, 
dr,
+GFC_STD_F2023, gfc_check_atan2, gfc_simplify_atan2d,
+gfc_resolve_trig2, y, BT_REAL, dr, REQUIRED, x, BT_REAL, dr,
+REQUIRED);
 
What is the functional change in the above?  It is somewhat difficult
to see a single character change.  It's much easier to see (and detect 
possible typos) in the following:

   /* Two-argument version of atand, equivalent to atan2d.  */
   add_sym_2 ("atand", GFC_ISYM_ATAN2D, CLASS_ELEMENTAL, ACTUAL_YES,
 BT_REAL, dr, GFC_STD_F2023,
-gfc_check_atan2, gfc_simplify_atan2d, gfc_resolve_trigd2,
+gfc_check_atan2, gfc_simplify_atan2d, gfc_resolve_trig2,
 y, BT_REAL, dr, REQUIRED,
 x, BT_REAL, dr, REQUIRED);
 
To be clear, do not use clang-format if it inserts large-scale
whitespace changes.  I'll also contend that the final result
in the latter is easier for a programmer to read.  The 2nd line
is return type info and standard conformance.  The 3rd line
contains the checking, simplification, and iresolve functions.
The remaining lines are the dummy arguments with one per line.


-- 
Steve

Re: [PATCH] libstdc++: Make debug iterator pointer sequence const [PR116369]

2025-05-26 Thread François Dumont


Ok, I'll give it another try.

Trying to use the same approach for targets using gnu.ver and for the
others, though, seems more reasonable to me.


François


On 22/05/2025 09:28, Jonathan Wakely wrote:



On Thu, 22 May 2025, 08:26 Jonathan Wakely,  wrote:



On Thu, 15 May 2025, 06:26 François Dumont, 
wrote:

Got

On 14/05/2025 18:46, Jonathan Wakely wrote:
> On Wed, 14 May 2025 at 17:31, François Dumont
 wrote:
>> On 12/05/2025 23:03, Jonathan Wakely wrote:
>>> On 31/03/25 22:20 +0200, François Dumont wrote:
 Hi

 Following this previous patch

https://gcc.gnu.org/pipermail/libstdc++/2024-August/059418.html
I've
 completed it for the _Safe_unordered_container_base type and
 implemented the rest of the change to store the safe iterator
 sequence as a pointer-to-const.

      libstdc++: Make debug iterator pointer sequence
const [PR116369]

      In revision a35dd276cbf6236e08bcf6e56e62c2be41cf6e3c
the debug
 sequence
      have been made mutable to allow attach iterators to
const
 containers.
      This change completes this fix by also declaring
debug unordered
 container
      members mutable.

      Additionally the debug iterator sequence is now a
 pointer-to-const and so
      _Safe_sequence_base _M_attach and all other methods
are const
 qualified.
      Symbols export are maintained thanks to __asm
directives.

>>> I can't compile this, it seems to be missing changes to
>>> safe_local_iterator.tcc:
>>>
>>> In file included from
>>>

/home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.h:444,
>>>                   from
>>> /home/jwakely/src/gcc/gcc/libstdc++-v3/src/c++11/debug.cc:33:
>>>

/home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.tcc:
>>> In member function ‘typename
>>> __gnu_debug::_Distance_traits<_Iterator>::__type
>>> __gnu_debug::_Safe_local_iterator<_Iterator,
>>> _Sequence>::_M_get_distance_to(const
>>> __gnu_debug::_Safe_local_iterator<_Iterator, _Sequence>&)
const’:
>>>

/home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.tcc:47:17:
>>> error: there are no arguments to ‘_M_get_sequence’ that
depend on a
>>> template parameter, so a declaration of ‘_M_get_sequence’
must be
>>> available [-Wtemplate-body]
>>>     47 | _M_get_sequence()->bucket_size(bucket()),
>>>        |  ^~~
>>>

/home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.tcc:47:17:
>>> note: (if you use ‘-fpermissive’, G++ will accept your
code, but
>>> allowing the use of an undeclared name is deprecated)
>>>

/home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.tcc:59:18:
>>> error: there are no arguments to ‘_M_get_sequence’ that
depend on a
>>> template parameter, so a declaration of ‘_M_get_sequence’
must be
>>> available [-Wtemplate-body]
>>>     59 | -_M_get_sequence()->bucket_size(bucket()),
>>>        | ^~~
>>>
>> Yes, sorry, I had already spotted this problem, but only
updated the PR
>> and not re-sending patch here.
>>
>>
 Also available as a PR

 https://forge.sourceware.org/gcc/gcc-TEST/pulls/47

      /** Detach all singular iterators.
       *  @post for all iterators i attached to this sequence,
       *   i->_M_version == _M_version.
       */
      void
 -    _M_detach_singular();
 +    _M_detach_singular() const
 +
__asm("_ZN11__gnu_debug19_Safe_sequence_base18_M_detach_singularEv");
>>> Does this work on all targets?
>> No idea ! I thought the symbol name used here just had to
match the
>> entries in config/abi/pre/gnu.ver.
> That linker script is not used for all targets.

Ok, got it, I only need to use this when symbol versioning is
activated.


I don't think that's right. For targets that don't use gnu.ver we
still want to preserve the same symbols. They just aren't
versioned on those targets.
And e.g. Solaris uses versioning, but a different format, not
gnu.ver, and I don't remember it the s

[PATCH v1] rs6000: Restore opaque overload variant for correct diagnostics

2025-05-26 Thread Kishan Parmar

Hi All,

The following patch has been bootstrapped and regtested on powerpc64le-linux.

After r12-5752-gd08236359eb229, a new bif infrastructure was introduced
which stopped using opaque vector types (e.g. opaque_V4SI_type_node)
for overloaded built-in functions, which led to incorrect and
misleading diagnostics when argument types didn’t exactly match.

This patch reinstates the opaque overload variant for entries with
multiple arguments where at least one is a vector, inserting it
at the beginning of each stanza. This helps recover the intended
fallback behavior and ensures clearer, type-generic error reporting.
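
As a rough illustration of the matching rule described above (a
simplified model, not the rs6000 code), the sketch below walks the
instances of one stanza and never lets the leading opaque entry match;
it exists only as the generic prototype for diagnostics.  The types ty
and instance and the function resolve are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified type universe.
enum class ty { opaque, v4si, v4sf, v2df };

// One overload instance: just its parameter types.
struct instance
{
  std::vector<ty> params;
};

// Return the index of the first instance whose parameters match ARGS,
// or -1 if none does.  An opaque parameter in the first instance is
// always treated as a mismatch, so that entry is skipped.
static int
resolve (const std::vector<instance> &stanza, const std::vector<ty> &args)
{
  for (std::size_t i = 0; i < stanza.size (); ++i)
    {
      const instance &inst = stanza[i];
      if (inst.params.size () != args.size ())
        continue;

      bool mismatch = false;
      for (std::size_t a = 0; a < args.size (); ++a)
        if ((i == 0 && inst.params[a] == ty::opaque)
            || inst.params[a] != args[a])
          {
            mismatch = true;
            break;
          }

      if (!mismatch)
        return static_cast<int> (i);
    }
  return -1;
}
```

When resolve returns -1, a caller could report the error against the
first (opaque) prototype instead of an arbitrary concrete one, which is
the type-generic diagnostic the patch aims to recover.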

2025-05-23  Kishan Parmar  

gcc:
PR target/104930
* config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
Skip the first overload entry during iteration if it uses opaque type
parameters.
* config/rs6000/rs6000-gen-builtins.cc
(maybe_generate_opaque_variant): New function.
(parse_first_ovld_entry): New function.
(parse_ovld_stanza): Call parse_first_ovld_entry.
---
 gcc/config/rs6000/rs6000-c.cc|   9 +-
 gcc/config/rs6000/rs6000-gen-builtins.cc | 180 ++-
 2 files changed, 187 insertions(+), 2 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index d3b0a566821..6217d585b40 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -1972,7 +1972,14 @@ altivec_resolve_overloaded_builtin (location_t loc, tree 
fndecl,
   arg_i++)
{
  tree parmtype = TREE_VALUE (nextparm);
- if (!rs6000_builtin_type_compatible (types[arg_i], parmtype))
+ /* Since we only need opaque vector type for the default
+prototype which is the same as the first instance, we
+only expect to see it in the first instance.  */
+ gcc_assert (instance == 
rs6000_overload_info[adj_fcode].first_instance
+ || parmtype != opaque_V4SI_type_node);
+ if ((instance == rs6000_overload_info[adj_fcode].first_instance
+  && parmtype == opaque_V4SI_type_node)
+ || !rs6000_builtin_type_compatible (types[arg_i], parmtype))
{
  mismatch = true;
  break;
diff --git a/gcc/config/rs6000/rs6000-gen-builtins.cc 
b/gcc/config/rs6000/rs6000-gen-builtins.cc
index f77087e0452..d442b93138e 100644
--- a/gcc/config/rs6000/rs6000-gen-builtins.cc
+++ b/gcc/config/rs6000/rs6000-gen-builtins.cc
@@ -353,6 +353,7 @@ struct typeinfo
   char isunsigned;
   char isbool;
   char ispixel;
+  char isopaque;
   char ispointer;
   basetype base;
   restriction restr;
@@ -579,6 +580,7 @@ static typemap type_map[] =
 { "v4sf",  "V4SF" },
 { "v4si",  "V4SI" },
 { "v8hi",  "V8HI" },
+{ "vop4si","opaque_V4SI" },
 { "vp8hi", "pixel_V8HI" },
   };
 
@@ -1058,6 +1060,7 @@ match_type (typeinfo *typedata, int voidok)
vd  vector double
v256__vector_pair
v512__vector_quad
+   vop vector opaque
 
  For simplicity, We don't support "short int" and "long long int".
  We don't currently support a  of "_Float16".  "signed"
@@ -1496,6 +1499,12 @@ complete_vector_type (typeinfo *typeptr, char *buf, int 
*bufi)
   *bufi += 4;
   return;
 }
+  else if (typeptr->isopaque)
+{
+  memcpy (&buf[*bufi], "op4si", 5);
+  *bufi += 5;
+  return;
+}
   switch (typeptr->base)
 {
 case BT_CHAR:
@@ -1661,7 +1670,8 @@ construct_fntype_id (prototype *protoptr)
  buf[bufi++] = '_';
  if (argptr->info.isconst
  && argptr->info.base == BT_INT
- && !argptr->info.ispointer)
+ && !argptr->info.ispointer
+ && !argptr->info.isopaque)
{
  buf[bufi++] = 'c';
  buf[bufi++] = 'i';
@@ -1969,6 +1979,168 @@ create_bif_order (void)
   rbt_inorder_callback (&bifo_rbt, bifo_rbt.rbt_root, set_bif_order);
 }
 
+/* Attempt to generate an opaque variant if needed and valid.  */
+static void
+maybe_generate_opaque_variant (ovlddata* entry)
+{
+  /* If no vector arg, no need to create opaque variant.  */
+  bool has_vector_arg = false;
+  for (typelist* arg = entry->proto.args; arg; arg = arg->next)
+{
+  if (arg->info.isvector)
+   {
+ has_vector_arg = true;
+ break;
+   }
+}
+
+  if (!has_vector_arg || entry->proto.nargs <= 1)
+return;
+
+  /* Construct the opaque variant.  */
+  ovlddata* opaque_entry = &ovlds[curr_ovld];
+  memcpy (opaque_entry, entry, sizeof (*entry));
+
+  /* Deep-copy and override vector args.  */
+  typelist** dst = &opaque_entry->proto.args;
+  for (typelist* src = entry->proto.args; src; src = src->next)
+{
+  typelist* copy = (typelist*) malloc (sizeof (typelist));
+
+  if (src->info.isvector)
+   {
+ memset (&copy->in

[committed] c-c++-common/gomp/{attrs-,}metadirective-3.c: Fix expected result [PR118694]

2025-05-26 Thread Tobias Burnus


Committed as r16-883-g5d6ed6d604ff94.

Silence errors when the compiler supports nvptx offloading.
See https://gcc.gnu.org/PR118694 (esp. comment 9) for why we
cannot easily fix the nesting error.

(Short answer: 'target' call is different if teams is present
but that's only known when processing the offload code + deciding
at runtime whether host fallback, gcn or nvptx offloading is
used.)

I will backport this also to GCC 15 as it is affected as well.

Tobias
commit 5d6ed6d604ff949b650e48fa4eaed3ec8b6489c1
Author: Tobias Burnus 
Date:   Mon May 26 19:50:40 2025 +0200

c-c++-common/gomp/{attrs-,}metadirective-3.c: Fix expected result [PR118694]

With compilation for nvptx enabled, two issues showed up:
(a) "error: 'target' construct with nested 'teams' construct contains
 directives outside of the 'teams' construct"
See PR comment 9 why this is difficult to fix.
Solution: Add dg-bogus and accept/expect the error for 'target offload_nvptx'.

(b) The assumptions about the dump for 'target offload_nvptx' were wrong
as the metadirective was already expanded to a OMP_NEXT_VARIANT
construct such that no 'omp metadirective' was left in either case.
Solution: Check that no 'omp metadirective' is left; additionally, expect
either OMP_NEXT_VARIANT (when offload_nvptx is available) or no 'teams'
directive at all (if not).

gcc/testsuite/ChangeLog:

PR middle-end/118694
* c-c++-common/gomp/attrs-metadirective-3.c: Change to never
expect 'omp metadirective' in the dump. If !offload_nvptx, check
that no 'teams' shows up in the dump; for offload_nvptx, expect
OMP_NEXT_VARIANT and an error about directive between 'target'
and 'teams'.
* c-c++-common/gomp/metadirective-3.c: Likewise.
---
 gcc/testsuite/c-c++-common/gomp/attrs-metadirective-3.c | 7 ---
 gcc/testsuite/c-c++-common/gomp/metadirective-3.c   | 7 ---
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/gomp/attrs-metadirective-3.c b/gcc/testsuite/c-c++-common/gomp/attrs-metadirective-3.c
index 31dd054922f..803bf0ad1eb 100644
--- a/gcc/testsuite/c-c++-common/gomp/attrs-metadirective-3.c
+++ b/gcc/testsuite/c-c++-common/gomp/attrs-metadirective-3.c
@@ -9,7 +9,7 @@ f (int x[], int y[], int z[])
 {
   int i;
 
-  [[omp::sequence (directive (target map(to: x, y) map(from: z)),
+  [[omp::sequence (directive (target map(to: x, y) map(from: z)),  /* { dg-bogus "'target' construct with nested 'teams' construct contains directives outside of the 'teams' construct" "PR118694" { xfail offload_nvptx } }  */ 
 		   directive (metadirective
 			  when (device={arch("nvptx")}: teams loop)
 			  default (parallel loop)))]]
@@ -20,5 +20,6 @@ f (int x[], int y[], int z[])
 /* If offload device "nvptx" isn't supported, the front end can eliminate
that alternative and not produce a metadirective at all.  Otherwise this
won't be resolved until late.  */
-/* { dg-final { scan-tree-dump-not "#pragma omp metadirective" "gimple" { target { ! offload_nvptx } } } } */
-/* { dg-final { scan-tree-dump "#pragma omp metadirective" "gimple" { target { offload_nvptx } } } } */
+/* { dg-final { scan-tree-dump-not "#pragma omp metadirective" "gimple" } } */
+/* { dg-final { scan-tree-dump-not " teams" "gimple" { target { ! offload_nvptx } } } } */
+/* { dg-final { scan-tree-dump "variant.\[0-9\]+ = \\\[omp_next_variant\\\] OMP_NEXT_VARIANT <0,\[\r\n \]+construct context = 14\[\r\n \]+1: device = \\{arch \\(.nvptx.\\)\\}\[\r\n \]+2: >;" "gimple" { target { offload_nvptx } } } } */
diff --git a/gcc/testsuite/c-c++-common/gomp/metadirective-3.c b/gcc/testsuite/c-c++-common/gomp/metadirective-3.c
index 0ac0d1d329d..b6c1601f7b1 100644
--- a/gcc/testsuite/c-c++-common/gomp/metadirective-3.c
+++ b/gcc/testsuite/c-c++-common/gomp/metadirective-3.c
@@ -8,7 +8,7 @@ f (int x[], int y[], int z[])
 {
   int i;
 
-  #pragma omp target map(to: x, y) map(from: z)
+  #pragma omp target map(to: x, y) map(from: z)  /* { dg-bogus "'target' construct with nested 'teams' construct contains directives outside of the 'teams' construct" "PR118694" { xfail offload_nvptx } }  */ 
 #pragma omp metadirective \
 	when (device={arch("nvptx")}: teams loop) \
 	default (parallel loop)
@@ -19,5 +19,6 @@ f (int x[], int y[], int z[])
 /* If offload device "nvptx" isn't supported, the front end can eliminate
that alternative and not produce a metadirective at all.  Otherwise this
won't be resolved until late.  */
-/* { dg-final { scan-tree-dump-not "#pragma omp metadirective" "gimple" { target { ! offload_nvptx } } } } */
-/* { dg-final { scan-tree-dump "#pragma omp metadirective" "gimple" { target { offload_nvptx } } } } */
+/* { dg-final { scan-tree-dump-not "#pragma omp metadirective" "gimple" } } */
+/* { dg-final { scan-tree-dump-not " teams" "gimple" { target { ! offload_nvptx } } } } */

Re: [PATCH] fortran: add constant input support for trig functions with half-revolutions

2025-05-26 Thread Harald Anlauf


On 26.05.25 at 18:36, Steve Kargl wrote:

On Mon, May 26, 2025 at 09:30:59AM +, Yuao Ma wrote:

Hi Steve,


I looked at the patch in a bit more detail, and
I am not thrilled with large-scale whitespace
changes mingled with functional changes. It makes
the patch harder to read and review.


I'm not sure which file you're referring to.

If it's mathbuiltins.def, I'll need to add extra spaces to maintain argument
alignment when I add the seven new built-ins.

If it's intrinsic.cc, the issue is related to clang-format usage. You can find
a more detailed explanation at
https://gcc.gnu.org/pipermail/fortran/2025-May/062193.html.



It's intrinsic.cc.

/* Two-argument version of atand, equivalent to atan2d.  */
-  add_sym_2 ("atand", GFC_ISYM_ATAN2D, CLASS_ELEMENTAL, ACTUAL_YES,
-BT_REAL, dr, GFC_STD_F2023,
-gfc_check_atan2, gfc_simplify_atan2d, gfc_resolve_trigd2,
-y, BT_REAL, dr, REQUIRED,
-x, BT_REAL, dr, REQUIRED);
+  add_sym_2 ("atand", GFC_ISYM_ATAN2D, CLASS_ELEMENTAL, ACTUAL_YES, BT_REAL, dr,
+GFC_STD_F2023, gfc_check_atan2, gfc_simplify_atan2d,
+gfc_resolve_trig2, y, BT_REAL, dr, REQUIRED, x, BT_REAL, dr,
+REQUIRED);
  
What is the functional change in the above?  It is somewhat difficult
to see a single character change.  It's much easier to see (and detect
possible typos) in the following:

/* Two-argument version of atand, equivalent to atan2d.  */
add_sym_2 ("atand", GFC_ISYM_ATAN2D, CLASS_ELEMENTAL, ACTUAL_YES,
  BT_REAL, dr, GFC_STD_F2023,
-gfc_check_atan2, gfc_simplify_atan2d, gfc_resolve_trigd2,
+gfc_check_atan2, gfc_simplify_atan2d, gfc_resolve_trig2,
  y, BT_REAL, dr, REQUIRED,
  x, BT_REAL, dr, REQUIRED);
  
To be clear, do not use clang-format if it inserts large-scale
whitespace changes.  I'll also contend that the final result
in the latter is easier for a programmer to read.  The 2nd line
is return type info and standard conformance.  The 3rd line
contains the checking, simplification, and iresolve functions.
The remaining lines are the dummy arguments, one per line.



I fully agree with Steve on this.

Re: [PATCH v4 2/8] libstdc++: Implement layout_left from mdspan.

2025-05-26 Thread Luc Grosheintz





On 5/26/25 18:17, Tomasz Kaminski wrote:

On Mon, May 26, 2025 at 4:15 PM Luc Grosheintz 
wrote:


Implements the parts of layout_left that don't depend on any of the
other layouts.

libstdc++-v3/ChangeLog:

 * include/std/mdspan (layout_left): New class.
 * src/c++23/std.cc.in: Add layout_left.

Signed-off-by: Luc Grosheintz 
---
  libstdc++-v3/include/std/mdspan  | 304 ++-
  libstdc++-v3/src/c++23/std.cc.in |   1 +
  2 files changed, 304 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/mdspan
b/libstdc++-v3/include/std/mdspan
index 0f49b0e09a0..d81072596b4 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -144,6 +144,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return __exts[__i]; });
   }

+   static constexpr span
+   _S_static_extents(size_t __begin, size_t __end) noexcept
+   {
+ return {_Extents.data() + __begin, _Extents.data() + __end};
+   }


Oh, I think I was very unclear, regarding removing the dependency on
index_type.
What I was thinking of, is changing this function to:
+   static consteval array&
+   _S_static_extents() noexcept
+   {
+ return _Extents;
+   }


Sorry, this was clear. I implemented it once, then we found the
issues with overflow and while cleaning out the mess I'd made
to compute things with an __unsigned_prod, I reverted this back
to the original version.

If you look at __static_extents_prod, it already doesn't depend
on the index_type. Therefore, it a) felt like a change related
to something we'd agreed to postpone; and b) I preferred the
symmetry between __static_extents and __dynamic_extents; I'll
change it to the proposed version.




+
+   constexpr span
+   _M_dynamic_extents(size_t __begin, size_t __end) const noexcept
+   requires (_Extents.size() > 0)
+   {
+ return {_M_dyn_exts + _S_dynamic_index[__begin],
+ _M_dyn_exts + _S_dynamic_index[__end]};
+   }
+
private:
 using _S_storage = __array_traits<_IndexType,
_S_rank_dynamic>::_Type;
 [[no_unique_address]] _S_storage _M_dyn_exts{};
@@ -160,6 +174,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 || _Extent <= numeric_limits<_IndexType>::max();
}

+  namespace __mdspan
+  {
+template
+  constexpr span
+  __static_extents(size_t __begin = 0, size_t __end =
_Extents::rank())
+  { return _Extents::_S_storage::_S_static_extents(__begin, __end); }


Also adjusting this one to have:
template
  constexpr const std::array&
 __static_extents()
+  { return _Extents::_S_storage::_S_static_extents; }


+
+template
+  constexpr span
+  __dynamic_extents(const _Extents& __exts, size_t __begin = 0,
+   size_t __end = _Extents::rank())
+  {
+   return __exts._M_exts._M_dynamic_extents(__begin, __end);
+  }
+  }
+
template
  class extents
  {
@@ -251,7 +281,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : _M_exts(span(__exts))
 { }

-
template<__mdspan::__valid_index_type _OIndexType,
size_t _Nm>
 requires (_Nm == rank() || _Nm == rank_dynamic())
 constexpr explicit(_Nm != rank_dynamic())
@@ -276,6 +305,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }

  private:
+  friend span
+  __mdspan::__static_extents(size_t, size_t);
+
+  friend span
+  __mdspan::__dynamic_extents(const extents&, size_t,
size_t);
+
using _S_storage = __mdspan::_ExtentsStorage<
 _IndexType, array{_Extents...}>;
[[no_unique_address]] _S_storage _M_exts;
@@ -286,6 +321,58 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

namespace __mdspan
{
+template
+  constexpr bool
+  __contains_zero(span<_Tp, _Nm> __exts)
+  {
+   for (size_t __i = 0; __i < __exts.size(); ++__i)
+ if (__exts[__i] == 0)
+   return true;
+   return false;
+  }
+
+constexpr size_t
+__static_extents_prod(const auto& __sta_exts)


Then this could be implemented as:
template
__static_extents_prod(size_t __begin, size_t __end);
// We provide the array as template parameter.
{
   }


+{
+  size_t __ret = 1;
+  for (auto __factor : span(_Ext).subspan(__begin, __end))
+   if (__factor != dynamic_extent)
+ __ret *= __factor;
+  return __ret;
+}
+
+template
+  constexpr typename _Extents::index_type
+  __exts_prod(const _Extents& __exts, size_t __begin, size_t __end)
noexcept
+  {
+   using _IndexType = typename _Extents::index_type;
+
+   auto __sta_exts = __static_extents<_Extents>(__begin, __end);
+   size_t __sta_prod = __static_extents_prod(__sta_exts);


And then it will be called as follows:
constexpr auto& __sta_exts = __static_extents<_Extents>();
 size_t __sta_prod = __static_extents_prod<__sta_exts>(__begin, __end);
This way __static_extents_prod will not depend on
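
The suggestion in this review, passing the static extents array as a template argument instead of as a function argument, could be sketched roughly as follows. This is only an illustration: `static_extents_prod`, `kDyn` and `kExts` are hypothetical names rather than the libstdc++ internals, `kDyn` stands in for `std::dynamic_extent`, and the half-open `[begin, end)` range convention is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Stand-in for std::dynamic_extent (hypothetical name, same value).
inline constexpr std::size_t kDyn = static_cast<std::size_t>(-1);

// Hypothetical static-extents array of some extents type.
inline constexpr std::array<std::size_t, 4> kExts{2, kDyn, 3, 5};

// Product of the static extents in [begin, end), skipping dynamic ones.
// The array is bound as a reference template argument, so the function
// signature only mentions std::size_t.
template<const auto& Exts>
  constexpr std::size_t
  static_extents_prod(std::size_t begin, std::size_t end) noexcept
  {
    std::size_t ret = 1;
    for (std::size_t i = begin; i < end; ++i)
      if (Exts[i] != kDyn)
        ret *= Exts[i];
    return ret;
  }

static_assert(static_extents_prod<kExts>(0, 4) == 30); // 2 * 3 * 5
static_assert(static_extents_prod<kExts>(2, 2) == 1);  // empty range
```

Because the array is a template argument, each extents type gets its own instantiation, and the call site only supplies the two range bounds.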

Re: Fwd: [PATCH] testsuite: Fix up dg-do-if

2025-05-26 Thread Alexandre Oliva

On May 26, 2025, Xi Ruoyao  wrote:

> gcc/testsuite/ChangeLog:

>   * lib/target-supports-dg.exp (dg-do-if): Pass the line number
> to
>   dg-do.

Thanks!  I support that fix, FWIW.

Indeed, an identical fix was included (but remains unreviewed) in
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680779.html

Maybe I should have split it out of the ppc series?  ppc maintainers
haven't been very responsive (to testsuite-related changes?), and it's
not a ppc-specific change.  I expected testsuite maintainers would have
reviewed and approved it, but...  not yet :-(

-- 
Alexandre Oliva, happy hacker    https://blog.lx.oliva.nom.br/
Free Software Activist FSFLA co-founder GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity.
Excluding neuro-others for not behaving "normal" is *not* inclusive!

Re: [PATCH] gimple-fold: Implement simple copy propagation for aggregates [PR14295]

2025-05-26 Thread Andrew Pinski

On Mon, May 26, 2025 at 5:36 AM Richard Biener
 wrote:
>
> On Sun, May 18, 2025 at 10:58 PM Andrew Pinski  
> wrote:
> >
> > This implements a simple copy propagation for aggregates in the similar
> > fashion as we already do for copy prop of zeroing.
> >
> > Right now this only looks at the previous vdef statement but this allows us
> > to catch a lot of cases that show up in C++ code.
> >
> > Also deletes aggregate copies that are to the same location (PR57361), this 
> > was
> > already done in DSE but we should do it here also since it is simple to add 
> > and
> > when doing a copy to a temporary and back to itself should be deleted too.
> > So we need a variant that tests DSE and one for forwprop.
> >
> > Also adds a variant of pr22237.c which was found while working on this 
> > patch.
> >
> > PR tree-optimization/14295
> > PR tree-optimization/108358
> > PR tree-optimization/114169
> >
> > gcc/ChangeLog:
> >
> > * tree-ssa-forwprop.cc (optimize_agr_copyprop): New function.
> > (pass_forwprop::execute): Call optimize_agr_copyprop for load/store 
> > statements.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/tree-ssa/20031106-6.c: Un-xfail. Add scan for forwprop1.
> > * g++.dg/opt/pr66119.C: Disable forwprop since that does
> > the copy prop now.
> > * gcc.dg/tree-ssa/pr108358-a.c: New test.
> > * gcc.dg/tree-ssa/pr114169-1.c: New test.
> > * gcc.c-torture/execute/builtins/pr22237-1-lib.c: New test.
> > * gcc.c-torture/execute/builtins/pr22237-1.c: New test.
> > * gcc.dg/tree-ssa/pr57361.c: Disable forwprop1.
> > * gcc.dg/tree-ssa/pr57361-1.c: New test.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/testsuite/g++.dg/opt/pr66119.C|   2 +-
> >  .../execute/builtins/pr22237-1-lib.c  |  27 +
> >  .../execute/builtins/pr22237-1.c  |  57 ++
> >  gcc/testsuite/gcc.dg/tree-ssa/20031106-6.c|   8 +-
> >  gcc/testsuite/gcc.dg/tree-ssa/pr108358-a.c|  33 ++
> >  gcc/testsuite/gcc.dg/tree-ssa/pr114169-1.c|  39 +++
> >  gcc/testsuite/gcc.dg/tree-ssa/pr57361-1.c |   9 ++
> >  gcc/testsuite/gcc.dg/tree-ssa/pr57361.c   |   2 +-
> >  gcc/tree-ssa-forwprop.cc  | 103 ++
> >  9 files changed, 276 insertions(+), 4 deletions(-)
> >  create mode 100644 
> > gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1-lib.c
> >  create mode 100644 gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr108358-a.c
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr114169-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr57361-1.c
> >
> > diff --git a/gcc/testsuite/g++.dg/opt/pr66119.C 
> > b/gcc/testsuite/g++.dg/opt/pr66119.C
> > index d1b1845a258..52362e44434 100644
> > --- a/gcc/testsuite/g++.dg/opt/pr66119.C
> > +++ b/gcc/testsuite/g++.dg/opt/pr66119.C
> > @@ -3,7 +3,7 @@
> > the value of MOVE_RATIO now is.  */
> >
> >  /* { dg-do compile  { target { { i?86-*-* x86_64-*-* } && c++11 } }  }  */
> > -/* { dg-options "-O3 -mavx -fdump-tree-sra -march=slm -mtune=slm 
> > -fno-early-inlining" } */
> > +/* { dg-options "-O3 -mavx -fdump-tree-sra -fno-tree-forwprop -march=slm 
> > -mtune=slm -fno-early-inlining" } */
> >  // { dg-skip-if "requires hosted libstdc++ for cstdlib malloc" { ! 
> > hostedlib } }
> >
> >  #include 
> > diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1-lib.c 
> > b/gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1-lib.c
> > new file mode 100644
> > index 000..44032357405
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1-lib.c
> > @@ -0,0 +1,27 @@
> > +extern void abort (void);
> > +
> > +void *
> > +memcpy (void *dst, const void *src, __SIZE_TYPE__ n)
> > +{
> > +  const char *srcp;
> > +  char *dstp;
> > +
> > +  srcp = src;
> > +  dstp = dst;
> > +
> > +  if (dst < src)
> > +{
> > +  if (dst + n > src)
> > +   abort ();
> > +}
> > +  else
> > +{
> > +  if (src + n > dst)
> > +   abort ();
> > +}
> > +
> > +  while (n-- != 0)
> > +*dstp++ = *srcp++;
> > +
> > +  return dst;
> > +}
> > diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1.c 
> > b/gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1.c
> > new file mode 100644
> > index 000..0a12b0fc9a1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1.c
> > @@ -0,0 +1,57 @@
> > +extern void abort (void);
> > +extern void exit (int);
> > +struct s { unsigned char a[256]; };
> > +union u { struct { struct s b; int c; } d; struct { int c; struct s b; } 
> > e; };
> > +static union u v;
> > +static union u v0;
> > +static struct s *p = &v.d.b;
> > +static struct s *q = &v.e.b;
> > +
> > +struct outers
> > +{
> > +  struct s inner;
> > +};
> > +
> > +static inline struct s rp (void) { return *p; }
> > +static i

Re: [PATCH] gimple-fold: Implement simple copy propagation for aggregates [PR14295]

2025-05-26 Thread Andrew Pinski

On Mon, May 26, 2025 at 1:40 PM Andrew Pinski  wrote:
>
> On Mon, May 26, 2025 at 5:36 AM Richard Biener
>  wrote:
> >
> > On Sun, May 18, 2025 at 10:58 PM Andrew Pinski  
> > wrote:
> > >
> > > This implements a simple copy propagation for aggregates in the similar
> > > fashion as we already do for copy prop of zeroing.
> > >
> > > Right now this only looks at the previous vdef statement but this allows 
> > > us
> > > to catch a lot of cases that show up in C++ code.
> > >
> > > Also deletes aggregate copies that are to the same location (PR57361), 
> > > this was
> > > already done in DSE but we should do it here also since it is simple to 
> > > add and
> > > when doing a copy to a temporary and back to itself should be deleted too.
> > > So we need a variant that tests DSE and one for forwprop.
> > >
> > > Also adds a variant of pr22237.c which was found while working on this 
> > > patch.
> > >
> > > PR tree-optimization/14295
> > > PR tree-optimization/108358
> > > PR tree-optimization/114169
> > >
> > > gcc/ChangeLog:
> > >
> > > * tree-ssa-forwprop.cc (optimize_agr_copyprop): New function.
> > > (pass_forwprop::execute): Call optimize_agr_copyprop for 
> > > load/store statements.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.dg/tree-ssa/20031106-6.c: Un-xfail. Add scan for forwprop1.
> > > * g++.dg/opt/pr66119.C: Disable forwprop since that does
> > > the copy prop now.
> > > * gcc.dg/tree-ssa/pr108358-a.c: New test.
> > > * gcc.dg/tree-ssa/pr114169-1.c: New test.
> > > * gcc.c-torture/execute/builtins/pr22237-1-lib.c: New test.
> > > * gcc.c-torture/execute/builtins/pr22237-1.c: New test.
> > > * gcc.dg/tree-ssa/pr57361.c: Disable forwprop1.
> > > * gcc.dg/tree-ssa/pr57361-1.c: New test.
> > >
> > > Signed-off-by: Andrew Pinski 
> > > ---
> > >  gcc/testsuite/g++.dg/opt/pr66119.C|   2 +-
> > >  .../execute/builtins/pr22237-1-lib.c  |  27 +
> > >  .../execute/builtins/pr22237-1.c  |  57 ++
> > >  gcc/testsuite/gcc.dg/tree-ssa/20031106-6.c|   8 +-
> > >  gcc/testsuite/gcc.dg/tree-ssa/pr108358-a.c|  33 ++
> > >  gcc/testsuite/gcc.dg/tree-ssa/pr114169-1.c|  39 +++
> > >  gcc/testsuite/gcc.dg/tree-ssa/pr57361-1.c |   9 ++
> > >  gcc/testsuite/gcc.dg/tree-ssa/pr57361.c   |   2 +-
> > >  gcc/tree-ssa-forwprop.cc  | 103 ++
> > >  9 files changed, 276 insertions(+), 4 deletions(-)
> > >  create mode 100644 
> > > gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1-lib.c
> > >  create mode 100644 
> > > gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1.c
> > >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr108358-a.c
> > >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr114169-1.c
> > >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr57361-1.c
> > >
> > > diff --git a/gcc/testsuite/g++.dg/opt/pr66119.C 
> > > b/gcc/testsuite/g++.dg/opt/pr66119.C
> > > index d1b1845a258..52362e44434 100644
> > > --- a/gcc/testsuite/g++.dg/opt/pr66119.C
> > > +++ b/gcc/testsuite/g++.dg/opt/pr66119.C
> > > @@ -3,7 +3,7 @@
> > > the value of MOVE_RATIO now is.  */
> > >
> > >  /* { dg-do compile  { target { { i?86-*-* x86_64-*-* } && c++11 } }  }  
> > > */
> > > -/* { dg-options "-O3 -mavx -fdump-tree-sra -march=slm -mtune=slm 
> > > -fno-early-inlining" } */
> > > +/* { dg-options "-O3 -mavx -fdump-tree-sra -fno-tree-forwprop -march=slm 
> > > -mtune=slm -fno-early-inlining" } */
> > >  // { dg-skip-if "requires hosted libstdc++ for cstdlib malloc" { ! 
> > > hostedlib } }
> > >
> > >  #include 
> > > diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1-lib.c 
> > > b/gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1-lib.c
> > > new file mode 100644
> > > index 000..44032357405
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1-lib.c
> > > @@ -0,0 +1,27 @@
> > > +extern void abort (void);
> > > +
> > > +void *
> > > +memcpy (void *dst, const void *src, __SIZE_TYPE__ n)
> > > +{
> > > +  const char *srcp;
> > > +  char *dstp;
> > > +
> > > +  srcp = src;
> > > +  dstp = dst;
> > > +
> > > +  if (dst < src)
> > > +{
> > > +  if (dst + n > src)
> > > +   abort ();
> > > +}
> > > +  else
> > > +{
> > > +  if (src + n > dst)
> > > +   abort ();
> > > +}
> > > +
> > > +  while (n-- != 0)
> > > +*dstp++ = *srcp++;
> > > +
> > > +  return dst;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1.c 
> > > b/gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1.c
> > > new file mode 100644
> > > index 000..0a12b0fc9a1
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.c-torture/execute/builtins/pr22237-1.c
> > > @@ -0,0 +1,57 @@
> > > +extern void abort (void);
> > > +extern void exit (int);
> > > +struct s { unsigned char a[256]; };
> > > +union u

Re: [AUTOFDO] Merge profiles of clones before annotating

2025-05-26 Thread Kugan Vivekanandarajah

Hi Honza.


> On 26 May 2025, at 7:48 pm, Jan Hubicka  wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
>> 
>> 
>>> On 26 May 2025, at 5:34 pm, Jan Hubicka  wrote:
>>> 
>>> External email: Use caution opening links or attachments
>>> 
>>> 
>>> Hi,
>>> also, please, can you add a testcase?  We should have some coverage for
>>> auto-fdo specific issues
>> I was looking for this too. AFAIK we don't do any testing currently.
>> We could
>> 
>> 1. Add gcov files as part of the test. However, this would make updating 
>> gcov versions difficult.
>> 2. We could add an execution test that also uses the autofdo tools to generate .gcov. 
>> This would make them slow.
>> Also we may not be able to match exact profile values and only see if afdo 
>> annotations are there.
> 
> There is testsuite coverage, but it is currently enabled only for Intel
> based x86_64 CPUs and I think no-one runs it regularly.  To get AutoFDO
> into good shape we definitely need to enable it on more setups and also
> start testing/benchmarking regularly.
I will look into this. We also want to enable it for aarch64.
> 
> For a long time I had no easy access to a CPU with AutoFDO support, but
> now I have a zen3 based desktop and also use a zen5 based box for testing.
> I think the attached patch makes the testsuite do the right thing on AMD Zen 3, 4 
> and 5.
> 
> I get following failures on Zen5:
> FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining 
> add1/1 into main/4."
> FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining 
> sub1/2 into main/4."
> FAIL: gcc.dg/tree-prof/inliner-1.c scan-tree-dump optimized "cold_function 
> ..;"
> FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 
> times"
> FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump cunroll "Peeled loop 2, 1 
> times"
> 
> while on Intel CPU I get:
> FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining 
> add1/1 into main/4."
> FAIL: gcc.dg/tree-prof/indir-call-prof-2.c scan-ipa-dump afdo "Inlining 
> sub1/2 into main/4."
> FAIL: gcc.dg/tree-prof/indir-call-prof.c scan-tree-dump-not optimized 
> "Invalid sum"
> FAIL: gcc.dg/tree-prof/inliner-1.c scan-tree-dump optimized "cold_function 
> ..;"
> FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 
> times"
> FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump cunroll "Peeled loop 2, 1 
> times"
> FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump ch2 "Peeled likely exits: 
> likely decreased number of iterations of loop 1"
> FAIL: gcc.dg/tree-prof/peel-2.c scan-tree-dump ch2 "Peeled all exits: 
> decreased number of iterations of loop 2"
> FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-tree-dump-not optimized 
> "Invalid sum"
> 
> I did not dive yet into where the differences come from.
> 
> Andy, does the patch make sense to you?  I simply followed the kernel's
> auto-fdo instructions for clang and built the current git version of
> create_gcov.  In the past I always had trouble getting create_gcov
> working with the version of perf distributed by openSUSE, but this time it
> seems to work even though it complains:
> 
> [WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1322]
>  Skipping 4228 bytes of metadata: HEADER_CPU_TOPOLOGY
> [WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069]
>  Skipping unsupported event PERF_RECORD_ID_INDEX
> [WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069]
>  Skipping unsupported event PERF_RECORD_EVENT_UPDATE
> [WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069]
>  Skipping unsupported event PERF_RECORD_CPU_MAP
> [WARNING:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1069]
>  Skipping unsupported event UNKNOWN_EVENT_82
> [INFO:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1060]
>  Number of events stored: 2178
> [INFO:/home/jh/autofdo/third_party/perf_data_converter/src/quipper/perf_parser.cc:272]
>  Parser processed: 5 MMAP/MMAP2 events, 2 COMM events, 0 FORK events, 1 EXIT 
> events, 2108 SAMPLE events, 2099 of these were mapped, 0 SAMPLE events with a 
> data address, 0 of these were mapped
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I20250525 22:10:18.478610 1692721 sample_reader.cc:289] No buildid found in 
> binary
> W20250525 22:10:18.479000 1692721 sample_reader.cc:345] Bogus LBR data (range 
> is negative): 1050->0 index=4
> W20250525 22:10:18.479007 1692721 sample_reader.cc:345] Bogus LBR data (range 
> is negative): 1057->0 index=2
> W20250525 22:10:18.479010 1692721 sample_reader.cc:345] Bogus LBR data (range 
> is negative): 1050->0 index=6
> W20250525 22:10:18.479013 1692721 sample_reader.cc:345] Bogus LBR data (range 
> is negative): 1050->0 index=6
> W20250525 22:10:18.479017 1692721 sample_reader.cc:345] Bogus LBR data (range 
> is negative): 1057->0 index=8

[PATCH v4 1/8] libstdc++: Improve naming and whitespace for extents.

2025-05-26 Thread Luc Grosheintz

libstdc++-v3/ChangeLog:

* include/std/mdspan (__mdspan::_ExtentsStorage): Change name
of private member _M_dynamic_extents to _M_dyn_exts.
* include/std/mdspan (extents): Change name of private member
from _M_dynamic_extents to _M_exts.
* include/std/mdspan: Fix two instances of
whitespace errors: `for(` -> `for (`.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/include/std/mdspan | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index bcf2fa60fea..0f49b0e09a0 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -69,12 +69,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
//
// If __r is the index of a dynamic extent, then
// _S_dynamic_index[__r] is the index of that extent in
-   // _M_dynamic_extents.
+   // _M_dyn_exts.
static constexpr auto _S_dynamic_index = [] consteval
{
  array __ret;
  size_t __dyn = 0;
- for(size_t __i = 0; __i < _S_rank; ++__i)
+ for (size_t __i = 0; __i < _S_rank; ++__i)
{
  __ret[__i] = __dyn;
  __dyn += _S_is_dyn(_Extents[__i]);
@@ -105,7 +105,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{
  auto __se = _Extents[__r];
  if (__se == dynamic_extent)
-   return _M_dynamic_extents[_S_dynamic_index[__r]];
+   return _M_dyn_exts[_S_dynamic_index[__r]];
  else
return __se;
}
@@ -114,12 +114,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  constexpr void
  _M_init_dynamic_extents(_GetOtherExtent __get_extent) noexcept
  {
-   for(size_t __i = 0; __i < _S_rank_dynamic; ++__i)
+   for (size_t __i = 0; __i < _S_rank_dynamic; ++__i)
  {
size_t __di = __i;
if constexpr (_OtherRank != _S_rank_dynamic)
  __di = _S_dynamic_index_inv[__i];
-   _M_dynamic_extents[__i] = _S_int_cast(__get_extent(__di));
+   _M_dyn_exts[__i] = _S_int_cast(__get_extent(__di));
  }
  }
 
@@ -146,7 +146,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   private:
using _S_storage = __array_traits<_IndexType, _S_rank_dynamic>::_Type;
-   [[no_unique_address]] _S_storage _M_dynamic_extents{};
+   [[no_unique_address]] _S_storage _M_dyn_exts{};
   };
 
 template
@@ -197,7 +197,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
if constexpr (rank() == 0)
  __builtin_trap();
else
- return _M_dynamic_extents._M_extent(__r);
+ return _M_exts._M_extent(__r);
   }
 
   constexpr
@@ -233,14 +233,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
requires (_S_is_compatible_extents<_OExtents...>())
constexpr explicit(_S_ctor_explicit<_OIndexType, _OExtents...>())
extents(const extents<_OIndexType, _OExtents...>& __other) noexcept
-   : _M_dynamic_extents(__other._M_dynamic_extents)
+   : _M_exts(__other._M_exts)
{ }
 
   template<__mdspan::__valid_index_type... _OIndexTypes>
requires (sizeof...(_OIndexTypes) == rank()
  || sizeof...(_OIndexTypes) == rank_dynamic())
constexpr explicit extents(_OIndexTypes... __exts) noexcept
-   : _M_dynamic_extents(span(
+   : _M_exts(span(
initializer_list{_S_storage::_S_int_cast(__exts)...}))
{ }
 
@@ -248,7 +248,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
requires (_Nm == rank() || _Nm == rank_dynamic())
constexpr explicit(_Nm != rank_dynamic())
extents(span<_OIndexType, _Nm> __exts) noexcept
-   : _M_dynamic_extents(span(__exts))
+   : _M_exts(span(__exts))
{ }
 
 
@@ -256,7 +256,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
requires (_Nm == rank() || _Nm == rank_dynamic())
constexpr explicit(_Nm != rank_dynamic())
extents(const array<_OIndexType, _Nm>& __exts) noexcept
-   : _M_dynamic_extents(span(__exts))
+   : _M_exts(span(__exts))
{ }
 
   template
@@ -278,7 +278,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 private:
   using _S_storage = __mdspan::_ExtentsStorage<
_IndexType, array{_Extents...}>;
-  [[no_unique_address]] _S_storage _M_dynamic_extents;
+  [[no_unique_address]] _S_storage _M_exts;
 
   template
friend class extents;
-- 
2.49.0
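
For readers following the `_S_dynamic_index` comment touched by this patch, the table it describes can be sketched in isolation as below. This is a simplified stand-alone version under stated assumptions, not the libstdc++ code: `dynamic_index`, `kDyn` and `kExts` are hypothetical names, `kDyn` stands in for `std::dynamic_extent`, and the extra trailing entry holding the total number of dynamic extents is an assumption of the sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Stand-in for std::dynamic_extent (hypothetical name, same value).
inline constexpr std::size_t kDyn = static_cast<std::size_t>(-1);

// ret[r] is the number of dynamic extents among exts[0..r).  If extent r
// is itself dynamic, that count is its slot in the dynamic-extent storage.
// ret[N] (the total count) is an addition assumed by this sketch.
template<std::size_t N>
  constexpr std::array<std::size_t, N + 1>
  dynamic_index(const std::array<std::size_t, N>& exts) noexcept
  {
    std::array<std::size_t, N + 1> ret{};
    std::size_t dyn = 0;
    for (std::size_t i = 0; i < N; ++i)
      {
        ret[i] = dyn;
        dyn += (exts[i] == kDyn);
      }
    ret[N] = dyn;
    return ret;
  }

// Hypothetical extents: static 3, dynamic, static 5, dynamic.
inline constexpr std::array<std::size_t, 4> kExts{3, kDyn, 5, kDyn};
inline constexpr auto kIndex = dynamic_index(kExts);

static_assert(kIndex[1] == 0); // first dynamic extent -> slot 0
static_assert(kIndex[3] == 1); // second dynamic extent -> slot 1
static_assert(kIndex[4] == 2); // two dynamic extents in total
```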

RE: [PATCH 1/2]middle-end: Apply loop->unroll directly in vectorizer

2025-05-26 Thread Richard Biener

On Mon, 19 May 2025, Tamar Christina wrote:

> > >/* Complete the target-specific cost calculations.  */
> > >loop_vinfo->vector_costs->finish_cost (loop_vinfo->scalar_costs);
> > >vec_prologue_cost = loop_vinfo->vector_costs->prologue_cost ();
> > > @@ -12373,6 +12394,13 @@ vect_transform_loop (loop_vec_info loop_vinfo,
> > gimple *loop_vectorized_call)
> > >   dump_printf_loc (MSG_NOTE, vect_location, "Disabling unrolling due to"
> > >" variable-length vectorization factor\n");
> > >  }
> > > +
> > > +  /* When we have unrolled the loop due to a user requested value we 
> > > should
> > > + leave it up to the RTL unroll heuristics to determine if it's still 
> > > worth
> > > + while to unroll more.  */
> > > +  if (LOOP_VINFO_USER_UNROLL (loop_vinfo))
> > 
> > What I meant with copying of LOOP_VINFO_USER_UNROLL is that I think
> > you'll never get to this being true as you set the suggested unroll
> > factor for the costing attempt of the not extra unrolled loop but
> > the transform where you want to reset is is when the unrolling
> > was actually applied?
> 
> It was being set on every analysis of the main loop body.  Since it wasn't
> actually cleared until we've picked a mode and did codegen the condition would
> be true.
> 
> However..
> 
> > 
> > That said, it would be clearer if LOOP_VINFO_USER_UNROLL would be
> > set in vect_analyze_loop_1 where we have
> > 
> 
> I agree this is much nicer.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> -m32, -m64 and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * doc/extend.texi: Document pragma unroll interaction with vectorizer.
>   * tree-vectorizer.h (LOOP_VINFO_USER_UNROLL): New.
>   (class _loop_vec_info): Add user_unroll.
>   * tree-vect-loop.cc (vect_analyze_loop_1 ): Set
>   suggested_unroll_factor and retry.
>   (_loop_vec_info::_loop_vec_info): Initialize user_unroll.
>   (vect_transform_loop): Clear the loop->unroll value if the pragma was
>   used.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/unroll-vect.c: New test.
> 
> -- inline copy of patch --
> 
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 
> e87a3c271f8420d8fd175823b5bb655f76c89afe..f8261d13903afc90d3341c09ab3fdbd0ab96ea49
>  100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -10398,6 +10398,11 @@ unrolled @var{n} times regardless of any commandline 
> arguments.
>  When the option is @var{preferred} then the user is allowed to override the
>  unroll amount through commandline options.
>  
> +If the loop was vectorized the unroll factor specified will be used to seed 
> the
> +vectorizer unroll factor.  Whether the loop is unrolled or not will be
> +determined by target costing.  The resulting vectorized loop may still be
> +unrolled more in later passes depending on the target costing.
> +
>  @end table
>  
>  @node Thread-Local
> diff --git a/gcc/testsuite/gcc.target/aarch64/unroll-vect.c 
> b/gcc/testsuite/gcc.target/aarch64/unroll-vect.c
> new file mode 100644
> index 
> ..3cb774ba95787ebee488fbe7306299ef28e6bb35
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/unroll-vect.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O3 -march=armv8-a --param 
> aarch64-autovec-preference=asimd-only -std=gnu99" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +
> +/*
> +** f1:
> +**   ...
> +**   add v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**   add v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**   add v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**   add v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**   ...
> +*/
> +void f1 (int *restrict a, int n)
> +{
> +#pragma GCC unroll 16
> +  for (int i = 0; i < n; i++)
> +a[i] *= 2;
> +}
> +
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 
> fe6f3cf188e40396b299ff9e814cc402bc2d4e2d..f215b6bc7881e7e659272cefbe3d5c8892ef768c
>  100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -1073,6 +1073,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, 
> vec_info_shared *shared)
>  peeling_for_gaps (false),
>  peeling_for_niter (false),
>  early_breaks (false),
> +user_unroll (false),
>  no_data_dependencies (false),
>  has_mask_store (false),
>  scalar_loop_scaling (profile_probability::uninitialized ()),
> @@ -3428,27 +3429,51 @@ vect_analyze_loop_1 (class loop *loop, 
> vec_info_shared *shared,
>res ? "succeeded" : "failed",
>GET_MODE_NAME (loop_vinfo->vector_mode));
>  
> -  if (res && !LOOP_VINFO_EPILOGUE_P (loop_vinfo) && suggested_unroll_factor 
> > 1)
> +  auto user_unroll = LOOP_VINFO_LOOP (loop_vinfo)->unroll;
> +  if (res && !LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> +  /* Check to see if the user wants to unroll or if the target wants t

[PATCH v4 0/8] Implement layouts from mdspan.

2025-05-26 Thread Luc Grosheintz

This follows up on:
https://gcc.gnu.org/pipermail/libstdc++/2025-May/061572.html

Note that this patch series can only be applied after merging:
https://gcc.gnu.org/pipermail/libstdc++/2025-May/061653.html

The important changes since v3 are:
  * Fixed and tested several related overflow issues that occurred in
extents of size 0 by using `size_t` to compute products.
  * Fixed and tested default ctors.
  * Add missing code for module support.
  * Documented deviation from standard.

The smaller changes include:
  * Squashed the three small commits that make cosmetic changes to
std::extents.
  * Remove layout_left related changes from the layout_stride commit.
  * Remove superfluous `mapping(extents_type(__exts))`.
  * Fix indenting and improve comment in layout_stride.
  * Add an easy check for representable required_span_size to
layout_stride.
  * Inline __dynamic_extents_prod.

Thank you Tomasz for all the great reviews!

Luc Grosheintz (8):
  libstdc++: Improve naming and whitespace for extents.
  libstdc++: Implement layout_left from mdspan.
  libstdc++: Add tests for layout_left.
  libstdc++: Implement layout_right from mdspan.
  libstdc++: Add tests for layout_right.
  libstdc++: Implement layout_stride from mdspan.
  libstdc++: Add tests for layout_stride.
  libstdc++: Make layout_left(layout_stride) noexcept.

 libstdc++-v3/include/std/mdspan   | 711 +-
 libstdc++-v3/src/c++23/std.cc.in  |   5 +-
 .../mdspan/layouts/class_mandate_neg.cc   |  42 ++
 .../23_containers/mdspan/layouts/ctors.cc | 459 +++
 .../23_containers/mdspan/layouts/empty.cc |  78 ++
 .../23_containers/mdspan/layouts/mapping.cc   | 568 ++
 .../23_containers/mdspan/layouts/stride.cc| 500 
 7 files changed, 2349 insertions(+), 14 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/layouts/empty.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/layouts/stride.cc

-- 
2.49.0

[PATCH v4 7/8] libstdc++: Add tests for layout_stride.

2025-05-26 Thread Luc Grosheintz

Implements the tests for layout_stride and for the features of the other
two layouts that depend on layout_stride.
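
For readers unfamiliar with strided mappings, the arithmetic such tests
exercise can be sketched as a dot product of indices and strides. The
function `strided_offset` below is a hypothetical stand-in written for this
note, not part of the patch or of libstdc++.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a strided mapping:
// offset = sum over i of indices[i] * strides[i].
template<std::size_t Rank>
constexpr std::size_t
strided_offset(const std::array<std::size_t, Rank>& indices,
               const std::array<std::size_t, Rank>& strides)
{
  std::size_t offset = 0;
  for (std::size_t i = 0; i < Rank; ++i)
    offset += indices[i] * strides[i];
  return offset;
}
```

With strides {10, 1} the element (2, 3) lands at offset 23; a rank-0 mapping
always yields offset 0.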

libstdc++-v3/ChangeLog:

* testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc: Add
tests for layout_stride.
* testsuite/23_containers/mdspan/layouts/ctors.cc: Add test for
layout_stride and the interaction with other layouts.
* testsuite/23_containers/mdspan/layouts/mapping.cc: Ditto.
* testsuite/23_containers/mdspan/layouts/stride.cc: New test.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/include/std/mdspan   |   5 +-
 .../mdspan/layouts/class_mandate_neg.cc   |  19 +
 .../23_containers/mdspan/layouts/ctors.cc | 114 
 .../23_containers/mdspan/layouts/empty.cc |   9 +-
 .../23_containers/mdspan/layouts/mapping.cc   |  75 ++-
 .../23_containers/mdspan/layouts/stride.cc| 500 ++
 6 files changed, 718 insertions(+), 4 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/layouts/stride.cc

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index d5f613a19fd..33ad5070a37 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -792,7 +792,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
else
  {
auto __impl = [&__m](index_sequence<_Counts...>)
-   { return __m(((void) _Counts, _IndexType(0))...); };
+ { return __m(((void) _Counts, _IndexType(0))...); };
return __impl(make_index_sequence<__rank>());
  }
   }
@@ -890,7 +890,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   extents() const noexcept { return _M_extents; }
 
   constexpr array
-  strides() const noexcept {
+  strides() const noexcept
+  {
array __ret;
for (size_t __i = 0; __i < extents_type::rank(); ++__i)
  __ret[__i] = _M_strides[__i];
diff --git 
a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
index a41bad988d2..0e39bd3aab0 100644
--- a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
@@ -17,7 +17,26 @@ template
 typename Layout::mapping m3; // { dg-error "required from" }
   };
 
+template
+  struct B // { dg-error "expansion of" }
+  {
+using Extents = std::extents;
+using OExtents = std::extents;
+
+using Mapping = typename Layout::mapping;
+using OMapping = typename Layout::mapping;
+
+Mapping m{OMapping{}};
+  };
+
 A a_left; // { dg-error "required from" }
 A a_right;   // { dg-error "required from" }
+A a_stride; // { dg-error "required from" }
+
+B<1, std::layout_left, std::layout_right> blr; // { dg-error "required 
here" }
+B<2, std::layout_left, std::layout_stride> bls;// { dg-error "required 
here" }
+
+B<3, std::layout_right, std::layout_left> brl; // { dg-error "required 
here" }
+B<4, std::layout_right, std::layout_stride> brs;   // { dg-error "required 
here" }
 
 // { dg-prune-output "must be representable as index_type" }
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
index cc719dfee10..2507eeaf7a1 100644
--- a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
@@ -80,6 +80,20 @@ namespace default_ctor
VERIFY(m.extents().extent(i) == 0);
 }
 
+  template
+constexpr void
+verify_default_stride(Mapping m, size_t i)
+{
+  using Layout = typename Mapping::layout_type;
+  using Extents = typename Mapping::extents_type;
+
+  if constexpr (std::is_same_v)
+   {
+ std::layout_right::mapping mr;
+ VERIFY(m.stride(i) == mr.stride(i));
+   }
+}
+
   template
 constexpr void
 test_default_ctor()
@@ -91,6 +105,7 @@ namespace default_ctor
   for(size_t i = 0; i < Extents::rank(); ++i)
{
  verify_default_extent(m, i);
+ verify_default_stride(m, i);
}
 }
 
@@ -329,6 +344,104 @@ namespace from_left_or_right
 }
 }
 
+// ctor: mapping(layout_stride::mapping)
+namespace from_stride
+{
+  template
+constexpr auto
+strides(Mapping m)
+{
+  constexpr auto rank = Mapping::extents_type::rank();
+  std::array s;
+
+  if constexpr (rank > 0)
+   for(size_t i = 0; i < rank; ++i)
+ s[i] = m.stride(i);
+  return s;
+}
+
+  template
+constexpr void
+verify_convertible(OExtents oexts)
+{
+  using Mapping = typename Layout::mapping;
+  using OMapping = std::layout_stride::mapping;
+
+  constexpr auto other = OMapping(oexts, strides(Mapping(Extents(oexts;
+  if constexpr (std::is_same_v)
+   ::verify_n

RE: [PATCH 1/2]middle-end: Add new parameter to scale scalar loop costing in vectorizer

2025-05-26 Thread Richard Biener

On Mon, 19 May 2025, Tamar Christina wrote:

> > > +-param=vect-scalar-cost-multiplier=
> > > +Common Joined UInteger Var(param_vect_scalar_cost_multiplier) Init(1)
> > IntegerRange(0, 10) Param Optimization
> > > +The scaling multiplier to add to all scalar loop costing when performing
> > vectorization profitability analysis.  The default value is 1.
> > > +
> > 
> > Note this only allows whole number scaling.  May I suggest to instead
> > use percentage as unit, thus the multiplier is --param
> > param_vect_scalar_cost_multiplier / 100?
> > 
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> -m32, -m64 and no issues.
> 
> Ok for master?

OK.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * params.opt (vect-scalar-cost-multiplier): New.
>   * tree-vect-loop.cc (vect_estimate_min_profitable_iters): Use it.
>   * doc/invoke.texi (vect-scalar-cost-multiplier): Document it.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/sve/cost_model_16.c: New test.
> 
> -- inline copy of patch --
> 
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 
> 699ee1cc0b7580d4729bbefff8f897eed1c3e49b..95a25c0f63b77f26db05a7b48bfad8f9c58bcc5f
>  100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -17273,6 +17273,10 @@ this parameter.  The default value of this parameter 
> is 50.
>  @item vect-induction-float
>  Enable loop vectorization of floating point inductions.
>  
> +@item vect-scalar-cost-multiplier
> +Apply the given multiplier % to scalar loop costing during vectorization.
> +Increasing the cost multiplier will make vector loops more profitable.
> +
>  @item vrp-block-limit
>  Maximum number of basic blocks before VRP switches to a lower memory 
> algorithm.
>  
> diff --git a/gcc/params.opt b/gcc/params.opt
> index 
> 1f0abeccc4b9b439ad4a4add6257b4e50962863d..a67f900a63f7187b1daa593fe17cd88f2fc32367
>  100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -1253,6 +1253,10 @@ The maximum factor which the loop vectorizer applies 
> to the cost of statements i
>  Common Joined UInteger Var(param_vect_induction_float) Init(1) 
> IntegerRange(0, 1) Param Optimization
>  Enable loop vectorization of floating point inductions.
>  
> +-param=vect-scalar-cost-multiplier=
> +Common Joined UInteger Var(param_vect_scalar_cost_multiplier) Init(100) 
> IntegerRange(0, 1) Param Optimization
> +The scaling multiplier as a percentage to apply to all scalar loop costing 
> when performing vectorization profitability analysis.  The default value is 
> 100.
> +
>  -param=vrp-block-limit=
>  Common Joined UInteger Var(param_vrp_block_limit) Init(15) Optimization 
> Param
>  Maximum number of basic blocks before VRP switches to a fast model with less 
> memory requirements.
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_16.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_16.c
> new file mode 100644
> index 
> ..c405591a101d50b4734bc6d65a6d6c01888bea48
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_16.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -march=armv8-a+sve -mmax-vectorization 
> -fdump-tree-vect-details" } */
> +
> +void
> +foo (char *restrict a, int *restrict b, int *restrict c,
> + int *restrict d, int stride)
> +{
> +if (stride <= 1)
> +return;
> +
> +for (int i = 0; i < 3; i++)
> +{
> +int res = c[i];
> +int t = b[i * stride];
> +if (a[i] != 0)
> +res = t * d[i];
> +c[i] = res;
> +}
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 
> fe6f3cf188e40396b299ff9e814cc402bc2d4e2d..c18e75794046f506c473b36639e6ae6658a5516b
>  100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -4646,7 +4646,8 @@ vect_estimate_min_profitable_iters (loop_vec_info 
> loop_vinfo,
>   TODO: Consider assigning different costs to different scalar
>   statements.  */
>  
> -  scalar_single_iter_cost = loop_vinfo->scalar_costs->total_cost ();
> +  scalar_single_iter_cost = (loop_vinfo->scalar_costs->total_cost ()
> +  * param_vect_scalar_cost_multiplier) / 100;
>  
>/* Add additional cost for the peeled instructions in prologue and epilogue
>   loop.  (For fully-masked loops there will be no peeling.)
> 
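
For reference, the percentage-based scaling in the quoted hunk can be sketched
in isolation as follows. The function name `scale_scalar_cost` and its
parameter names are assumptions for illustration; only the
multiply-then-divide-by-100 shape comes from the patch.

```cpp
#include <cstdint>

// Hypothetical stand-in for the scaled scalar cost: `percent` plays the role
// of the vect-scalar-cost-multiplier parameter (100 means "unchanged").
constexpr std::uint64_t
scale_scalar_cost(std::uint64_t total_cost, std::uint64_t percent)
{
  // Multiply first so that integer division does not discard the
  // fractional part before the scaling is applied.
  return (total_cost * percent) / 100;
}

static_assert(scale_scalar_cost(40, 100) == 40, "default leaves cost as-is");
static_assert(scale_scalar_cost(40, 150) == 60, "150% raises scalar cost");
```

A value above 100 makes the scalar loop look more expensive, which in turn
makes the vector loop comparatively more profitable.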

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

[PATCH v4 6/8] libstdc++: Implement layout_stride from mdspan.

2025-05-26 Thread Luc Grosheintz

Implements the remaining parts of layout_left and layout_right; and all
of layout_stride.

The implementation of layout_stride::mapping::is_exhaustive applies
the following change to the standard:

  4266. layout_stride::mapping should treat empty mappings as exhaustive

https://cplusplus.github.io/LWG/issue4266
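
To make the effect of that issue concrete, here is a minimal sketch of an
exhaustiveness check that treats empty mappings as exhaustive. It works on
plain vectors, assumes `extents` and `strides` have the same length, and uses
the hypothetical name `is_exhaustive_sketch`; it is not the code in this
patch.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical check, not the libstdc++ implementation.
bool
is_exhaustive_sketch(const std::vector<std::size_t>& extents,
                     const std::vector<std::size_t>& strides)
{
  // Empty mappings (some extent is 0) count as exhaustive, following the
  // resolution direction of LWG 4266.
  if (std::find(extents.begin(), extents.end(), std::size_t(0))
      != extents.end())
    return true;

  // Visit dimensions by increasing stride; ties put smaller extents first
  // so that extent-1 dimensions do not break the chain.
  std::vector<std::size_t> perm(extents.size());
  std::iota(perm.begin(), perm.end(), std::size_t(0));
  std::sort(perm.begin(), perm.end(), [&](std::size_t a, std::size_t b) {
    if (strides[a] != strides[b])
      return strides[a] < strides[b];
    return extents[a] < extents[b];
  });

  // Exhaustive means each stride equals the product of the extents of all
  // faster-varying dimensions.
  std::size_t expected = 1;
  for (std::size_t p : perm)
    {
      if (strides[p] != expected)
        return false;
      expected *= extents[p];
    }
  return true;
}
```

For extents {3, 5}, strides {5, 1} are exhaustive and strides {10, 1} are
not; with extents {0, 5} the same strides {10, 1} are accepted because the
mapping is empty.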

libstdc++-v3/ChangeLog:

* include/std/mdspan (layout_stride): New class.
* src/c++23/std.cc.in: Add layout_stride.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/include/std/mdspan  | 229 ++-
 libstdc++-v3/src/c++23/std.cc.in |   3 +-
 2 files changed, 230 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index 7daa0713716..d5f613a19fd 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -403,6 +403,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   class mapping;
   };
 
+  struct layout_stride
+  {
+template
+  class mapping;
+  };
+
   namespace __mdspan
   {
 template
@@ -496,7 +502,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 template
   concept __standardized_mapping = __mapping_of
-  || __mapping_of;
+  || __mapping_of
+  || __mapping_of;
 
 template
   concept __mapping_like = requires
@@ -554,6 +561,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
: mapping(__other.extents(), __mdspan::__internal_ctor{})
{ }
 
+  template
+   requires (is_constructible_v)
+   constexpr explicit(extents_type::rank() > 0)
+   mapping(const layout_stride::mapping<_OExtents>& __other)
+   : mapping(__other.extents(), __mdspan::__internal_ctor{})
+   { __glibcxx_assert(*this == __other); }
+
   constexpr mapping&
   operator=(const mapping&) noexcept = default;
 
@@ -684,6 +698,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
: mapping(__other.extents(), __mdspan::__internal_ctor{})
{ }
 
+  template
+   requires (is_constructible_v)
+   constexpr explicit(extents_type::rank() > 0)
+   mapping(const layout_stride::mapping<_OExtents>& __other) noexcept
+   : mapping(__other.extents(), __mdspan::__internal_ctor{})
+   { __glibcxx_assert(*this == __other); }
+
   constexpr mapping&
   operator=(const mapping&) noexcept = default;
 
@@ -757,6 +778,212 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
[[no_unique_address]] _Extents _M_extents{};
 };
 
+  namespace __mdspan
+  {
+template
+  constexpr typename _Mapping::index_type
+  __offset(const _Mapping& __m) noexcept
+  {
+   using _IndexType = typename _Mapping::index_type;
+   constexpr auto __rank = _Mapping::extents_type::rank();
+
+   if constexpr (__standardized_mapping<_Mapping>)
+ return 0;
+   else
+ {
+   auto __impl = [&__m](index_sequence<_Counts...>)
+   { return __m(((void) _Counts, _IndexType(0))...); };
+   return __impl(make_index_sequence<__rank>());
+ }
+  }
+
+template
+  constexpr typename _Mapping::index_type
+  __linear_index_strides(const _Mapping& __m,
+_Indices... __indices)
+  {
+   using _IndexType = typename _Mapping::index_type;
+   _IndexType __res = 0;
+   if constexpr (sizeof...(__indices) > 0)
+ {
+   auto __update = [&, __pos = 0u](_IndexType __idx) mutable
+ {
+   __res += __idx * __m.stride(__pos++);
+ };
+   (__update(__indices), ...);
+ }
+   return __res;
+  }
+  }
+
+  template
+class layout_stride::mapping
+{
+public:
+  using extents_type = _Extents;
+  using index_type = typename extents_type::index_type;
+  using size_type = typename extents_type::size_type;
+  using rank_type = typename extents_type::rank_type;
+  using layout_type = layout_stride;
+
+  static_assert(__mdspan::__representable_size<_Extents, index_type>,
+   "The size of extents_type must be representable as index_type");
+
+  constexpr
+  mapping() noexcept
+  {
+   size_t __stride = 1;
+   for (size_t __i = extents_type::rank(); __i > 0; --__i)
+ {
+   _M_strides[__i - 1] = index_type(__stride);
+   __stride *= size_t(_M_extents.extent(__i - 1));
+ }
+  }
+
+  constexpr
+  mapping(const mapping&) noexcept = default;
+
+  template<__mdspan::__valid_index_type _OIndexType>
+   constexpr
+   mapping(const extents_type& __exts,
+   span<_OIndexType, extents_type::rank()> __strides) noexcept
+   : _M_extents(__exts)
+   {
+ for (size_t __i = 0; __i < extents_type::rank(); ++__i)
+   _M_strides[__i] = index_type(as_const(__strides[__i]));
+   }
+
+  template<__mdspan::__valid_index_type _OIndexType>
+   constexpr
+   mapping(const extents_type&

[PATCH v4 5/8] libstdc++: Add tests for layout_right.

2025-05-26 Thread Luc Grosheintz

Adds tests for layout_right and for the parts of layout_left that depend
on layout_right.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc: Add
tests for layout_right.
* testsuite/23_containers/mdspan/layouts/ctors.cc: Add tests for
layout_right and the interaction with layout_left.
* testsuite/23_containers/mdspan/layouts/mapping.cc: Ditto.

Signed-off-by: Luc Grosheintz 
---
 .../mdspan/layouts/class_mandate_neg.cc   |  1 +
 .../23_containers/mdspan/layouts/ctors.cc | 64 +++
 .../23_containers/mdspan/layouts/empty.cc |  1 +
 .../23_containers/mdspan/layouts/mapping.cc   | 78 ---
 4 files changed, 134 insertions(+), 10 deletions(-)

diff --git 
a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
index b276fbd333e..a41bad988d2 100644
--- a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
@@ -18,5 +18,6 @@ template
   };
 
 A a_left; // { dg-error "required from" }
+A a_right;   // { dg-error "required from" }
 
 // { dg-prune-output "must be representable as index_type" }
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
index 18d9743a57b..cc719dfee10 100644
--- a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
@@ -269,6 +269,66 @@ namespace from_same_layout
 }
 }
 
+// ctor: mapping(layout_{right,left}::mapping)
+namespace from_left_or_right
+{
+  template
+constexpr void
+verify_ctor(OExtents oexts)
+{
+  using SMapping = typename SLayout::mapping;
+  using OMapping = typename OLayout::mapping;
+
+  constexpr bool expected = std::is_convertible_v;
+  if constexpr (expected)
+   verify_nothrow_convertible(OMapping(oexts));
+  else
+   verify_nothrow_constructible(OMapping(oexts));
+}
+
+  template
+constexpr bool
+test_ctor()
+{
+  assert_not_constructible<
+   typename SLayout::mapping>,
+   typename OLayout::mapping>>();
+
+  verify_ctor>(
+   std::extents{});
+
+  verify_ctor>(
+   std::extents{});
+
+  assert_not_constructible<
+   typename SLayout::mapping>,
+   typename OLayout::mapping>>();
+
+  verify_ctor>(
+   std::extents{});
+
+  verify_ctor>(
+   std::extents{});
+
+  verify_ctor>(
+   std::extents{});
+
+  assert_not_constructible<
+   typename SLayout::mapping>,
+   typename OLayout::mapping>>();
+  return true;
+}
+
+  template
+constexpr void
+test_all()
+{
+  test_ctor();
+  static_assert(test_ctor());
+}
+}
+
 template
   constexpr void
   test_all()
@@ -282,5 +342,9 @@ int
 main()
 {
   test_all();
+  test_all();
+
+  from_left_or_right::test_all();
+  from_left_or_right::test_all();
   return 0;
 }
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/empty.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/empty.cc
index 8cca8171d12..e95eacd80b6 100644
--- a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/empty.cc
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/empty.cc
@@ -66,5 +66,6 @@ int
 main()
 {
   static_assert(test_all());
+  static_assert(test_all());
   return 0;
 }
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
index a5be1166617..40a0c828cc4 100644
--- a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
@@ -293,6 +293,15 @@ template<>
 VERIFY(m.stride(1) == 3);
   }
 
+template<>
+  constexpr void
+  test_stride_2d()
+  {
+std::layout_right::mapping> m;
+VERIFY(m.stride(0) == 5);
+VERIFY(m.stride(1) == 1);
+  }
+
 template
   constexpr void
   test_stride_3d();
@@ -307,6 +316,16 @@ template<>
 VERIFY(m.stride(2) == 3*5);
   }
 
+template<>
+  constexpr void
+  test_stride_3d()
+  {
+std::layout_right::mapping m(std::dextents(3, 5, 7));
+VERIFY(m.stride(0) == 35);
+VERIFY(m.stride(1) == 7);
+VERIFY(m.stride(2) == 1);
+  }
+
 template
   constexpr bool
   test_stride_all()
@@ -381,24 +400,59 @@ template
 { m2 != m1 } -> std::same_as;
   };
 
-template
-  constexpr bool
+template
+  constexpr void
   test_has_op_eq()
   {
+static_assert(has_op_eq<
+   typename SLayout::mapping>,
+   typename OLayout::mapping>> == Expected);
+
+static_assert(!has_op_eq<
+   typename SLayout::mapping>,
+   typename OLayout::mapping>>);
+
+static_assert(has_op_eq<
+   typename SLayout::mapping>,
+   typename OLayout::mapping>> == Expe

[pushed 2/2] c++: add -fdump-lang-tinst

2025-05-26 Thread Jason Merrill

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

This patch adds a dump with a trace of template instantiations, indented
based on the depth of recursive instantiation. -lineno adds the location
that triggered the instantiation, -details adds non-instantiation
substitutions.

The instantiate_pending_templates change is to avoid a bunch of entries for
reopening tinst scopes that we then don't instantiate anything with; it also
seems a bit cleaner this way.
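
A small input that exercises nested instantiation might look like the sketch
below. The `Factorial` template is an assumption chosen for illustration and
is not part of the patch or its testsuite.

```cpp
// Each Factorial<N> instantiation triggers Factorial<N - 1>, so the
// instantiation depth grows by one per step until the explicit
// specialization for 0 stops the recursion.
template<unsigned N>
struct Factorial
{
  static constexpr unsigned long value = N * Factorial<N - 1>::value;
};

template<>
struct Factorial<0>
{
  static constexpr unsigned long value = 1;
};

static_assert(Factorial<5>::value == 120, "recursive instantiation");
```

Compiling a file like this with -fdump-lang-tinst should, going by the
description above, produce a trace in which the nested instantiations are
indented by depth; the exact dump text is not reproduced here.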

gcc/cp/ChangeLog:

* cp-tree.h: Declare tinst_dump_id.
* cp-objcp-common.cc (cp_register_dumps): Set it.
* pt.cc (push_tinst_level_loc): Dump it.
(reopen_tinst_level): Here too.
(tinst_complete_p): New.
(instantiate_pending_templates): Don't reopen_tinst_level for
already-complete instantiations.

gcc/ChangeLog:

* doc/invoke.texi: Move C++ -fdump-lang to C++ section.
Add -fdump-lang-tinst.
---
 gcc/doc/invoke.texi   |  72 -
 gcc/cp/cp-tree.h  |   1 +
 gcc/cp/cp-objcp-common.cc |   2 +
 gcc/cp/pt.cc  | 111 --
 4 files changed, 145 insertions(+), 41 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index fe47ce56487..e3bc833c59b 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -3297,6 +3297,50 @@ Enable support for the C++ coroutines extension 
(experimental).
 Permit the C++ front end to note all candidates during overload resolution
 failure, including when a deleted function is selected.
 
+@item -fdump-lang-
+@itemx -fdump-lang-@var{switch}
+@itemx -fdump-lang-@var{switch}-@var{options}
+@itemx -fdump-lang-@var{switch}-@var{options}=@var{filename}
+Control the dumping of C++-specific information.  The @var{options}
+and @var{filename} portions behave as described in the
+@option{-fdump-tree} option.  The following @var{switch} values are
+accepted:
+
+@table @samp
+@item all
+Enable all of the below.
+
+@opindex fdump-lang-class
+@item class
+Dump class hierarchy information.  Virtual table information is emitted
+unless '@option{slim}' is specified.
+
+@opindex fdump-lang-module
+@item module
+Dump module information.  Options @option{lineno} (locations),
+@option{graph} (reachability), @option{blocks} (clusters),
+@option{uid} (serialization), @option{alias} (mergeable),
+@option{asmname} (Elrond), @option{eh} (mapper) & @option{vops}
+(macros) may provide additional information.
+
+@opindex fdump-lang-raw
+@item raw
+Dump the raw internal tree data.
+
+@opindex fdump-lang-tinst
+@item tinst
+Dump the sequence of template instantiations, indented to show the
+depth of recursion.  The @option{lineno} option adds the source
+location where the instantiation was triggered, and the
+@option{details} option also dumps pre-instantiation substitutions
+such as those performed during template argument deduction.
+
+Lines in the .tinst dump start with @samp{I} for an instantiation,
+@samp{S} for another substitution, and @samp{R[IS]} for the reopened
+context of a deferred instantiation.
+
+@end table
+
 @opindex fno-elide-constructors
 @opindex felide-constructors
 @item -fno-elide-constructors
@@ -20891,30 +20935,10 @@ Dump language-specific information.  The file name is 
made by appending
 @itemx -fdump-lang-@var{switch}-@var{options}=@var{filename}
 Control the dumping of language-specific information.  The @var{options}
 and @var{filename} portions behave as described in the
-@option{-fdump-tree} option.  The following @var{switch} values are
-accepted:
-
-@table @samp
-@item all
-
-Enable all language-specific dumps.
-
-@item class
-Dump class hierarchy information.  Virtual table information is emitted
-unless '@option{slim}' is specified.  This option is applicable to C++ only.
-
-@item module
-Dump module information.  Options @option{lineno} (locations),
-@option{graph} (reachability), @option{blocks} (clusters),
-@option{uid} (serialization), @option{alias} (mergeable),
-@option{asmname} (Elrond), @option{eh} (mapper) & @option{vops}
-(macros) may provide additional information.  This option is
-applicable to C++ only.
-
-@item raw
-Dump the raw internal tree data.  This option is applicable to C++ only.
-
-@end table
+@option{-fdump-tree} option.  @option{-fdump-tree-all} enables all
+language-specific dumps; other options vary with the language.  For
+instance, see @xref{C++ Dialect Options} for the @option{-fdump-lang}
+flags supported by the C++ front-end.
 
 @opindex fdump-passes
 @item -fdump-passes
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 7433b896219..19c0b452d86 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6822,6 +6822,7 @@ extern int class_dump_id;
 extern int module_dump_id;
 extern int raw_dump_id;
 extern int coro_dump_id;
+extern int tinst_dump_id;
 
 /* Whether the current context is manifestly constant-evaluated.
Used by the constexpr machinery to control folding of
diff --git a/gcc/cp/cp-objcp-common.cc b/gc

[pushed 1/2] c++: add cxx_dump_pretty_printer

2025-05-26 Thread Jason Merrill

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

A class to simplify implementation of -fdump-lang-foo with support for
pp_printf using %D and such.
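
The usage pattern (declare the printer in an `if` condition, write only when
the dump is enabled, flush on destruction) can be sketched outside of GCC as
follows. The class `scoped_dump` and the function `dump_note` are
hypothetical analogues built on standard streams; they do not use GCC's
pretty_printer or dump_begin/dump_end.

```cpp
#include <ostream>
#include <sstream>

// Hypothetical analogue of the pattern: the object is "true" only when a
// sink is available, and it flushes that sink when it goes out of scope.
class scoped_dump
{
  std::ostream *out;

public:
  explicit scoped_dump (std::ostream *sink) : out (sink) {}
  scoped_dump (const scoped_dump &) = delete;
  scoped_dump &operator= (const scoped_dump &) = delete;
  ~scoped_dump () { if (out) out->flush (); }

  explicit operator bool () const { return out != nullptr; }
  std::ostream &stream () { return *out; }
};

// Writes one line when dumping is enabled; returns whether anything was
// written.
inline bool
dump_note (std::ostream *sink, const char *msg)
{
  if (scoped_dump d {sink})
    {
      d.stream () << msg << '\n';
      return true;
    }
  return false;
}
```

Keeping the enabled check and the cleanup in one object means a caller cannot
forget to close or flush the dump after writing to it.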

gcc/cp/ChangeLog:

* cxx-pretty-print.h (class cxx_dump_pretty_printer): New.
* error.cc (cxx_dump_pretty_printer): Ctor/dtor definitions.
---
 gcc/cp/cp-tree.h | 23 +++
 gcc/cp/error.cc  | 27 +++
 2 files changed, 50 insertions(+)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 175ab287490..7433b896219 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7322,6 +7322,29 @@ extern void cp_check_const_attributes (tree);
 extern void maybe_propagate_warmth_attributes (tree, tree);
 
 /* in error.cc */
+/* A class for pretty-printing to -fdump-lang-XXX files.  Used like
+
+   if (cxx_dump_pretty_printer pp {foo_dump_id})
+ {
+   pp_printf (&pp, ...);
+ }
+
+   If the dump is enabled, the pretty printer will open the dump file and
+   attach to it, and flush and close the file on destruction.  */
+
+class cxx_dump_pretty_printer: public pretty_printer
+{
+  int phase;
+  FILE *outf;
+  dump_flags_t flags;
+
+public:
+  cxx_dump_pretty_printer (int phase);
+  operator bool() { return outf != nullptr; }
+  bool has_flag (dump_flags_t f) { return (flags & f); }
+  ~cxx_dump_pretty_printer ();
+};
+
 extern const char *type_as_string  (tree, int);
 extern const char *type_as_string_translate(tree, int);
 extern const char *decl_as_string  (tree, int);
diff --git a/gcc/cp/error.cc b/gcc/cp/error.cc
index 305064d476c..d52dad3db29 100644
--- a/gcc/cp/error.cc
+++ b/gcc/cp/error.cc
@@ -193,6 +193,33 @@ class cxx_format_postprocessor : public 
format_postprocessor
   deferred_printed_type m_type_b;
 };
 
+/* Constructor and destructor for cxx_dump_pretty_printer, defined here to
+   avoid needing to move cxx_format_postprocessor into the header as well.  */
+
+cxx_dump_pretty_printer::
+cxx_dump_pretty_printer (int phase)
+  : phase (phase)
+{
+  outf = dump_begin (phase, &flags);
+  if (outf)
+{
+  pp_format_decoder (this) = cp_printer;
+  /* This gets deleted in ~pretty_printer.  */
+  pp_format_postprocessor (this) = new cxx_format_postprocessor ();
+  set_output_stream (outf);
+}
+}
+
+cxx_dump_pretty_printer::
+~cxx_dump_pretty_printer ()
+{
+  if (outf)
+{
+  pp_flush (this);
+  dump_end (phase, outf);
+}
+}
+
 /* Return the in-scope template that's currently being parsed, or
NULL_TREE otherwise.  */
 

base-commit: 545433e9bd32e965726956cb238d53b39844b85c
-- 
2.49.0

[PATCH v4 2/8] libstdc++: Implement layout_left from mdspan.

2025-05-26 Thread Luc Grosheintz

Implements the parts of layout_left that don't depend on any of the
other layouts.
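
As background, the strides of a left (column-major) layout can be sketched
with plain arrays: the stride of dimension r is the product of the extents
before it. The helper `left_strides` is a hypothetical name used for this
note; the patch itself computes these products through its own internal
helpers.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: in a left (column-major) layout, the stride of
// dimension r is the product of the extents of all dimensions before r.
template<std::size_t Rank>
std::array<std::size_t, Rank>
left_strides(const std::array<std::size_t, Rank>& exts)
{
  std::array<std::size_t, Rank> s{};
  std::size_t prod = 1;
  for (std::size_t r = 0; r < Rank; ++r)
    {
      s[r] = prod;
      prod *= exts[r];
    }
  return s;
}
```

For extents (3, 5, 7) this gives strides 1, 3 and 3*5, so the leftmost index
varies fastest.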

libstdc++-v3/ChangeLog:

* include/std/mdspan (layout_left): New class.
* src/c++23/std.cc.in: Add layout_left.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/include/std/mdspan  | 304 ++-
 libstdc++-v3/src/c++23/std.cc.in |   1 +
 2 files changed, 304 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index 0f49b0e09a0..d81072596b4 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -144,6 +144,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  { return __exts[__i]; });
  }
 
+   static constexpr span
+   _S_static_extents(size_t __begin, size_t __end) noexcept
+   {
+ return {_Extents.data() + __begin, _Extents.data() + __end};
+   }
+
+   constexpr span
+   _M_dynamic_extents(size_t __begin, size_t __end) const noexcept
+   requires (_Extents.size() > 0)
+   {
+ return {_M_dyn_exts + _S_dynamic_index[__begin],
+ _M_dyn_exts + _S_dynamic_index[__end]};
+   }
+
   private:
using _S_storage = __array_traits<_IndexType, _S_rank_dynamic>::_Type;
[[no_unique_address]] _S_storage _M_dyn_exts{};
@@ -160,6 +174,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
|| _Extent <= numeric_limits<_IndexType>::max();
   }
 
+  namespace __mdspan
+  {
+template
+  constexpr span
+  __static_extents(size_t __begin = 0, size_t __end = _Extents::rank())
+  { return _Extents::_S_storage::_S_static_extents(__begin, __end); }
+
+template
+  constexpr span
+  __dynamic_extents(const _Extents& __exts, size_t __begin = 0,
+   size_t __end = _Extents::rank())
+  {
+   return __exts._M_exts._M_dynamic_extents(__begin, __end);
+  }
+  }
+
   template
 class extents
 {
@@ -251,7 +281,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
: _M_exts(span(__exts))
{ }
 
-
   template<__mdspan::__valid_index_type _OIndexType, size_t 
_Nm>
requires (_Nm == rank() || _Nm == rank_dynamic())
constexpr explicit(_Nm != rank_dynamic())
@@ -276,6 +305,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}
 
 private:
+  friend span
+  __mdspan::__static_extents(size_t, size_t);
+
+  friend span
+  __mdspan::__dynamic_extents(const extents&, size_t, size_t);
+
   using _S_storage = __mdspan::_ExtentsStorage<
_IndexType, array{_Extents...}>;
   [[no_unique_address]] _S_storage _M_exts;
@@ -286,6 +321,58 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   namespace __mdspan
   {
+template
+  constexpr bool
+  __contains_zero(span<_Tp, _Nm> __exts)
+  {
+   for (size_t __i = 0; __i < __exts.size(); ++__i)
+ if (__exts[__i] == 0)
+   return true;
+   return false;
+  }
+
+constexpr size_t
+__static_extents_prod(const auto& __sta_exts)
+{
+  size_t __ret = 1;
+  for (auto __factor : __sta_exts)
+   if (__factor != dynamic_extent)
+ __ret *= __factor;
+  return __ret;
+}
+
+template
+  constexpr typename _Extents::index_type
+  __exts_prod(const _Extents& __exts, size_t __begin, size_t __end) 
noexcept
+  {
+   using _IndexType = typename _Extents::index_type;
+
+   auto __sta_exts = __static_extents<_Extents>(__begin, __end);
+   size_t __sta_prod = __static_extents_prod(__sta_exts);
+
+   size_t __ret = 1;
+   if constexpr (_Extents::rank_dynamic() != _Extents::rank())
+ __ret = __static_extents_prod(__sta_exts);
+
+   if (__ret == 0)
+ return 0;
+
+   if constexpr (_Extents::rank_dynamic() > 0)
+ for (auto __factor : __dynamic_extents(__exts, __begin, __end))
+   __ret *= size_t(__factor);
+   return _IndexType(__ret);
+  }
+
+template
+  constexpr typename _Extents::index_type
+  __fwd_prod(const _Extents& __exts, size_t __r) noexcept
+  { return __exts_prod(__exts, 0, __r); }
+
+template
+  constexpr typename _Extents::index_type
+  __rev_prod(const _Extents& __exts, size_t __r) noexcept
+  { return __exts_prod(__exts, __r + 1, __exts.rank()); }
+
 template
   auto __build_dextents_type(integer_sequence)
-> extents<_IndexType, ((void) _Counts, dynamic_extent)...>;
@@ -304,6 +391,221 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 explicit extents(_Integrals...) ->
   extents()...>;
 
+  struct layout_left
+  {
+template
+  class mapping;
+  };
+
+  namespace __mdspan
+  {
+template
+  constexpr bool __is_extents = false;
+
+template
+  constexpr bool __is_extents> = true;
+
+template
+  constexpr typename _Extents::index_type
+  __linear_index_left(const _Extents& __exts, _Indices... __indices)
+  {
+   using _IndexType = typename _Extents::index_type;
+   _IndexType __res = 0

[PATCH v4 4/8] libstdc++: Implement layout_right from mdspan.

2025-05-26 Thread Luc Grosheintz

Implement the parts of layout_left that depend on layout_right; and the
parts of layout_right that don't depend on layout_stride.
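
For comparison, a right (row-major) layout walks the dimensions from last to
first while accumulating a multiplier. The sketch below shows that index
computation on plain arrays; `right_linear_index` is a hypothetical name and
the loop is a simplification of the fold-expression approach used in the
patch.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: linear index for a right (row-major) layout.
// The last dimension has stride 1; each earlier dimension's stride is the
// product of the extents after it.
template<std::size_t Rank>
std::size_t
right_linear_index(const std::array<std::size_t, Rank>& exts,
                   const std::array<std::size_t, Rank>& idx)
{
  std::size_t res = 0;
  std::size_t mult = 1;
  for (std::size_t pos = Rank; pos > 0; --pos)
    {
      res += idx[pos - 1] * mult;
      mult *= exts[pos - 1];
    }
  return res;
}
```

With extents (3, 5, 7) the index (1, 0, 0) maps to 35 and (0, 1, 0) maps to
7, which corresponds to strides of 35, 7 and 1.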

libstdc++-v3/ChangeLog:

* include/std/mdspan (layout_right): New class.
* src/c++23/std.cc.in: Add layout_right.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/include/std/mdspan  | 153 ++-
 libstdc++-v3/src/c++23/std.cc.in |   1 +
 2 files changed, 153 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index d81072596b4..7daa0713716 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -397,6 +397,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   class mapping;
   };
 
+  struct layout_right
+  {
+template
+  class mapping;
+  };
+
   namespace __mdspan
   {
 template
@@ -489,7 +495,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  _Mapping>;
 
 template
-  concept __standardized_mapping = __mapping_of;
+  concept __standardized_mapping = __mapping_of
+  || __mapping_of;
 
 template
   concept __mapping_like = requires
@@ -539,6 +546,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
: mapping(__other.extents(), __mdspan::__internal_ctor{})
{ }
 
+  template
+   requires (_Extents::rank() <= 1
+ && is_constructible_v<_Extents, _OExtents>)
+   constexpr explicit(!is_convertible_v<_OExtents, _Extents>)
+   mapping(const layout_right::mapping<_OExtents>& __other) noexcept
+   : mapping(__other.extents(), __mdspan::__internal_ctor{})
+   { }
+
   constexpr mapping&
   operator=(const mapping&) noexcept = default;
 
@@ -606,6 +621,142 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
[[no_unique_address]] _Extents _M_extents{};
 };
 
+  namespace __mdspan
+  {
+template
+  constexpr typename _Extents::index_type
+  __linear_index_right(const _Extents& __exts, _Indices... __indices)
+  {
+   using _IndexType = typename _Extents::index_type;
+   array<_IndexType, sizeof...(__indices)> __ind_arr{__indices...};
+   _IndexType __res = 0;
+   if constexpr (sizeof...(__indices) > 0)
+ {
+   _IndexType __mult = 1;
+   auto __update = [&, __pos = __exts.rank()](_IndexType) mutable
+ {
+   --__pos;
+   __res += __ind_arr[__pos] * __mult;
+   __mult *= __exts.extent(__pos);
+ };
+   (__update(__indices), ...);
+ }
+   return __res;
+  }
+  }
+
+  template
+class layout_right::mapping
+{
+public:
+  using extents_type = _Extents;
+  using index_type = typename extents_type::index_type;
+  using size_type = typename extents_type::size_type;
+  using rank_type = typename extents_type::rank_type;
+  using layout_type = layout_right;
+
+  static_assert(__mdspan::__representable_size<_Extents, index_type>,
+   "The size of extents_type must be representable as index_type");
+
+  constexpr
+  mapping() noexcept = default;
+
+  constexpr
+  mapping(const mapping&) noexcept = default;
+
+  constexpr
+  mapping(const _Extents& __extents) noexcept
+  : _M_extents(__extents)
+  { __glibcxx_assert(__mdspan::__is_representable_extents(_M_extents)); }
+
+  template
+   requires (is_constructible_v)
+   constexpr explicit(!is_convertible_v<_OExtents, extents_type>)
+   mapping(const mapping<_OExtents>& __other) noexcept
+   : mapping(__other.extents(), __mdspan::__internal_ctor{})
+   { }
+
+  template
+   requires (extents_type::rank() <= 1
+   && is_constructible_v)
+   constexpr explicit(!is_convertible_v<_OExtents, extents_type>)
+   mapping(const layout_left::mapping<_OExtents>& __other) noexcept
+   : mapping(__other.extents(), __mdspan::__internal_ctor{})
+   { }
+
+  constexpr mapping&
+  operator=(const mapping&) noexcept = default;
+
+  constexpr const _Extents&
+  extents() const noexcept { return _M_extents; }
+
+  constexpr index_type
+  required_span_size() const noexcept
+  { return __mdspan::__fwd_prod(_M_extents, extents_type::rank()); }
+
+  template<__mdspan::__valid_index_type... _Indices>
+   requires (sizeof...(_Indices) == extents_type::rank())
+   constexpr index_type
+   operator()(_Indices... __indices) const noexcept
+   {
+ return __mdspan::__linear_index_right(
+   _M_extents, static_cast(__indices)...);
+   }
+
+  static constexpr bool
+  is_always_unique() noexcept
+  { return true; }
+
+  static constexpr bool
+  is_always_exhaustive() noexcept
+  { return true; }
+
+  static constexpr bool
+  is_always_strided() noexcept
+  { return true; }
+
+  static constexpr bool
+  is_unique() noexcept
+  { return true; }
+
+  static constexpr bool
+

[PATCH v4 3/8] libstdc++: Add tests for layout_left.

2025-05-26 Thread Luc Grosheintz

Implements a suite of tests for the currently implemented parts of
layout_left. The individual tests are templated over the layout type, to
allow reuse as more layouts are added.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc: New test.
* testsuite/23_containers/mdspan/layouts/ctors.cc: New test.
* testsuite/23_containers/mdspan/layouts/mapping.cc: New test.

Signed-off-by: Luc Grosheintz 
---
 .../mdspan/layouts/class_mandate_neg.cc   |  22 +
 .../23_containers/mdspan/layouts/ctors.cc | 286 
 .../23_containers/mdspan/layouts/empty.cc |  70 +++
 .../23_containers/mdspan/layouts/mapping.cc   | 437 ++
 4 files changed, 815 insertions(+)
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/layouts/empty.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc

diff --git 
a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
new file mode 100644
index 000..b276fbd333e
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
@@ -0,0 +1,22 @@
+// { dg-do compile { target c++23 } }
+#include
+
+#include 
+
+constexpr size_t dyn = std::dynamic_extent;
+static constexpr size_t n = std::numeric_limits::max() / 2;
+
+template
+  struct A
+  {
+typename Layout::mapping> m0;
+typename Layout::mapping> m1;
+typename Layout::mapping> m2;
+
+using extents_type = std::extents;
+typename Layout::mapping m3; // { dg-error "required from" }
+  };
+
+A a_left; // { dg-error "required from" }
+
+// { dg-prune-output "must be representable as index_type" }
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
new file mode 100644
index 000..18d9743a57b
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
@@ -0,0 +1,286 @@
+// { dg-do run { target c++23 } }
+#include 
+
+#include 
+#include 
+
+constexpr size_t dyn = std::dynamic_extent;
+
+template
+  constexpr void
+  verify(std::extents oexts)
+  {
+auto m = Mapping(oexts);
+VERIFY(m.extents() == oexts);
+  }
+
+template
+  requires (requires { typename OMapping::layout_type; })
+  constexpr void
+  verify(OMapping other)
+  {
+constexpr auto rank = Mapping::extents_type::rank();
+auto m = Mapping(other);
+VERIFY(m.extents() == other.extents());
+if constexpr (rank > 0)
+  for(size_t i = 0; i < rank; ++i)
+   VERIFY(std::cmp_equal(m.stride(i), other.stride(i)));
+  }
+
+
+template
+  constexpr void
+  verify_convertible(From from)
+  {
+static_assert(std::is_convertible_v);
+verify(from);
+  }
+
+template
+  constexpr void
+  verify_nothrow_convertible(From from)
+  {
+static_assert(std::is_nothrow_constructible_v);
+verify_convertible(from);
+  }
+
+template
+  constexpr void
+  verify_constructible(From from)
+  {
+static_assert(!std::is_convertible_v);
+static_assert(std::is_constructible_v);
+verify(from);
+  }
+
+template
+  constexpr void
+  verify_nothrow_constructible(From from)
+  {
+static_assert(std::is_nothrow_constructible_v);
+verify_constructible(from);
+  }
+
+template
+  constexpr void
+  assert_not_constructible()
+  {
+static_assert(!std::is_constructible_v);
+  }
+
+// ctor: mapping()
+namespace default_ctor
+{
+  template
+constexpr void
+verify_default_extent(Mapping m, size_t i)
+{
+  using Extents = typename Mapping::extents_type;
+  if (Extents::static_extent(i) == std::dynamic_extent)
+   VERIFY(m.extents().extent(i) == 0);
+}
+
+  template
+constexpr void
+test_default_ctor()
+{
+  using Mapping = typename Layout::mapping;
+
+  Mapping m;
+  auto exts = m.extents();
+  for(size_t i = 0; i < Extents::rank(); ++i)
+   {
+ verify_default_extent(m, i);
+   }
+}
+
+  template
+constexpr bool
+test_default_ctor_all()
+{
+  test_default_ctor>();
+  test_default_ctor>();
+  test_default_ctor>();
+  test_default_ctor>();
+  test_default_ctor>();
+  return true;
+}
+
+  template
+  constexpr void
+  test_all()
+  {
+test_default_ctor_all();
+static_assert(test_default_ctor_all());
+  }
+}
+
+// ctor: mapping(const extents&)
+namespace from_extents
+{
+  template
+constexpr void
+verify_nothrow_convertible(OExtents oexts)
+{
+  using Mapping = typename Layout::mapping;
+  ::verify_nothrow_convertible(oexts);
+}
+
+  template
+constexpr void
+verify_nothrow_constructible(OExtents oexts)
+{
+  using Mapping = typename La

[PATCH v4 8/8] libstdc++: Make layout_left(layout_stride) noexcept.

2025-05-26 Thread Luc Grosheintz

[mdspan.layout.left.cons] of N4950 states that this ctor is not
noexcept. Since all other ctors of layout_left, layout_right and
layout_stride are noexcept, the choice was made, based on
[res.on.exception.handling], to make this ctor noexcept.

Two other major implementations of the STL make the same choice.
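
The property the simplified tests rely on can be sketched with stand-in
types. `stride_mapping` and `left_mapping` below are hypothetical and only
mimic the shape of the constructor; the real ctor is conditionally
explicit, which this sketch does not model.

```cpp
#include <type_traits>

// Hypothetical stand-in for layout_stride::mapping.
struct stride_mapping { };

// Hypothetical stand-in for layout_left::mapping with a noexcept,
// explicit converting constructor.
struct left_mapping
{
  explicit left_mapping(const stride_mapping&) noexcept { }
};

// Construction cannot throw ...
static_assert(std::is_nothrow_constructible_v<left_mapping,
                                              const stride_mapping&>);
// ... but, being explicit, the ctor does not allow implicit conversion.
static_assert(!std::is_convertible_v<const stride_mapping&, left_mapping>);
```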

libstdc++-v3/ChangeLog:

* include/std/mdspan: Strengthen the exception
guarantees of layout_left::mapping(layout_stride::mapping).
* testsuite/23_containers/mdspan/layouts/ctors.cc:
Simplify tests to reflect the change.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/include/std/mdspan   |  6 +++-
 .../23_containers/mdspan/layouts/ctors.cc | 33 ---
 2 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index 33ad5070a37..2a7f0452cd7 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -561,10 +561,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
: mapping(__other.extents(), __mdspan::__internal_ctor{})
{ }
 
+  // [mdspan.layout.left.cons] of N4950 states that this ctor is not
+  // noexcept. Since, all other ctors of layout_left, layout_right or
+  // layout_stride are noexcept, the choice was made, based on
+  // [res.on.exception.handling], to make this ctor noexcept.
   template
requires (is_constructible_v)
constexpr explicit(extents_type::rank() > 0)
-   mapping(const layout_stride::mapping<_OExtents>& __other)
+   mapping(const layout_stride::mapping<_OExtents>& __other) noexcept
: mapping(__other.extents(), __mdspan::__internal_ctor{})
{ __glibcxx_assert(*this == __other); }
 
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
index 2507eeaf7a1..6f813e9877b 100644
--- a/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
@@ -362,30 +362,24 @@ namespace from_stride
 
   template
 constexpr void
-verify_convertible(OExtents oexts)
+verify_nothrow_convertible(OExtents oexts)
 {
   using Mapping = typename Layout::mapping;
   using OMapping = std::layout_stride::mapping;
 
   constexpr auto other = OMapping(oexts, strides(Mapping(Extents(oexts;
-  if constexpr (std::is_same_v)
-   ::verify_nothrow_convertible(other);
-  else
-   ::verify_convertible(other);
+  ::verify_nothrow_convertible(other);
 }
 
   template
 constexpr void
-verify_constructible(OExtents oexts)
+verify_nothrow_constructible(OExtents oexts)
 {
   using Mapping = typename Layout::mapping;
   using OMapping = std::layout_stride::mapping;
 
   constexpr auto other = OMapping(oexts, strides(Mapping(Extents(oexts;
-  if constexpr (std::is_same_v)
-   ::verify_nothrow_constructible(other);
-  else
-   ::verify_constructible(other);
+  ::verify_nothrow_constructible(other);
 }
 
   template
@@ -404,31 +398,32 @@ namespace from_stride
typename Layout::mapping>,
std::layout_stride::mapping>>();
 
-  verify_convertible>(std::extents{});
+  verify_nothrow_convertible>(
+   std::extents{});
 
-  verify_convertible>(
+  verify_nothrow_convertible>(
std::extents{});
 
   // Rank ==  0 doesn't check IndexType for convertibility.
-  verify_convertible>(
+  verify_nothrow_convertible>(
std::extents{});
 
-  verify_constructible>(
+  verify_nothrow_constructible>(
std::extents{});
 
-  verify_constructible>(
+  verify_nothrow_constructible>(
std::extents{});
 
-  verify_constructible>(
+  verify_nothrow_constructible>(
std::extents{});
 
-  verify_constructible>(
+  verify_nothrow_constructible>(
std::extents{});
 
-  verify_constructible>(
+  verify_nothrow_constructible>(
std::extents{});
 
-  verify_constructible>(
+  verify_nothrow_constructible>(
std::extents{});
   return true;
 }
-- 
2.49.0

[PATCH RFA] fold: DECL_VALUE_EXPR isn't simple [PR120400]

2025-05-26 Thread Jason Merrill

Tested x86_64-pc-linux-gnu, OK for trunk?

Iain, will you verify that one of your coroutine testcases breaks without this
fix?  I don't think lambda or anonymous union uses of DECL_VALUE_EXPR can break
in the same way, though this change is also correct for them.

-- 8< --

This PR noted that fold_truth_andor was wrongly changing && to & where the
RHS is a VAR_DECL with DECL_VALUE_EXPR; we can't assume that such can be
evaluated unconditionally.

To be more precise we could recurse into DECL_VALUE_EXPR, but that doesn't
seem worth bothering with since typical uses involve a COMPONENT_REF, which
is not simple.
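
As a rough source-level analogy (an assumption for illustration, not the
testcase from the PR): a variable with DECL_VALUE_EXPR stands for some
other expression, typically a member access through a pointer, so reading
it is only safe when the guarding condition holds. The names `Frame`,
`ready` and `guarded_ready` are hypothetical.

```cpp
struct Frame { bool ready; };

// `frame->ready` plays the role of the expanded DECL_VALUE_EXPR.
// With &&, it is evaluated only when `have_frame` is true; rewriting the
// condition to use & would read through `frame` unconditionally.
bool
guarded_ready (bool have_frame, const Frame *frame)
{
  return have_frame && frame->ready;
}
```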

PR c++/120400

gcc/ChangeLog:

* fold-const.cc (simple_operand_p): False for vars with
DECL_VALUE_EXPR.
---
 gcc/fold-const.cc | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 5f48ced5063..014f4218793 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -5085,6 +5085,11 @@ simple_operand_p (const_tree exp)
 #pragma weak, etc).  */
  && ! TREE_PUBLIC (exp)
  && ! DECL_EXTERNAL (exp)
+ /* DECL_VALUE_EXPR will expand to something non-simple.  */
+ && ! ((VAR_P (exp)
+|| TREE_CODE (exp) == PARM_DECL
+|| TREE_CODE (exp) == RESULT_DECL)
+   && DECL_HAS_VALUE_EXPR_P (exp))
  /* Weakrefs are not safe to be read, since they can be NULL.
 They are !TREE_PUBLIC && !DECL_EXTERNAL but still
 have DECL_WEAK flag set.  */

base-commit: f59ff19bc3d37f4dd159db541ed4f07efb10fcc8
-- 
2.49.0

Re: [PATCH 1/2] allow contraction to synthetic single-element vector FMA

2025-05-26 Thread Alexander Monakov


On Mon, 26 May 2025, Richard Biener wrote:

> On Fri, May 23, 2025 at 2:31 PM Alexander Monakov  wrote:
> >
> > In PR 105965 we accepted a request to form FMA instructions when the
> > source code is using a narrow generic vector that contains just one
> > element, corresponding to V1SF or V1DF mode, while the backend does not
> > expand fma patterns for such modes.
> >
> > For this to work under -ffp-contract=on, we either need to modify
> > backends, or emulate such degenerate-vector FMA via scalar FMA in
> > tree-vect-generic.  Do the latter.
> 
> Can you instead apply the lowering during gimplification?  That is because
> having an unsupported internal-function in the IL the user could not have
> emitted directly is somewhat bad.  I thought the vector lowering could
> be generalized for more single-argument internal functions but then no
> such unsupported calls should exist in the first place.

Sure, like below?  Not fully tested yet.

-- 8< --

>From 4caee92434d9425912979b285725166b22f40a87 Mon Sep 17 00:00:00 2001
From: Alexander Monakov 
Date: Wed, 21 May 2025 18:35:45 +0300
Subject: [PATCH v2] allow contraction to synthetic single-element vector FMA

In PR 105965 we accepted a request to form FMA instructions when the
source code is using a narrow generic vector that contains just one
element, corresponding to V1SF or V1DF mode, while the backend does not
expand fma patterns for such modes.

For this to work under -ffp-contract=on, we either need to modify
backends, or emulate such degenerate-vector FMA via scalar FMA.
Do the latter, in gimplification hook together with contraction.
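
For reference, the kind of source this is about looks roughly like the
sketch below, written with the GNU generic vector extension. The names
`v1sf` and `mul_add` are assumptions for illustration and are not taken
from the PR.

```cpp
// Generic vector holding exactly one float (V1SF-like).
// `v1sf` is a hypothetical name.
typedef float v1sf __attribute__ ((vector_size (sizeof (float))));

// A multiply feeding an add within one expression: a candidate for
// contraction into an FMA under -ffp-contract=on.
v1sf
mul_add (v1sf a, v1sf b, v1sf c)
{
  return a * b + c;
}
```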

gcc/c-family/ChangeLog:

* c-gimplify.cc (fma_supported_p): Allow forming single-element
vector FMA when scalar FMA is available.
(c_gimplify_expr): Allow vector types.
---
 gcc/c-family/c-gimplify.cc | 50 ++
 1 file changed, 40 insertions(+), 10 deletions(-)

diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
index c6fb764656..6c313287e6 100644
--- a/gcc/c-family/c-gimplify.cc
+++ b/gcc/c-family/c-gimplify.cc
@@ -870,12 +870,28 @@ c_build_bind_expr (location_t loc, tree block, tree body)
   return bind;
 }
 
+enum fma_expansion
+{
+  FMA_NONE,
+  FMA_DIRECT,
+  FMA_VEC1_SYNTHETIC
+};
+
 /* Helper for c_gimplify_expr: test if target supports fma-like FN.  */
 
-static bool
+static fma_expansion
 fma_supported_p (enum internal_fn fn, tree type)
 {
-  return direct_internal_fn_supported_p (fn, type, OPTIMIZE_FOR_BOTH);
+  if (direct_internal_fn_supported_p (fn, type, OPTIMIZE_FOR_BOTH))
+return FMA_DIRECT;
+  /* Accept single-element vector FMA (see PR 105965) when the
+ backend handles the scalar but not the vector mode.  */
+  if (VECTOR_TYPE_P (type)
+  && known_eq (TYPE_VECTOR_SUBPARTS (type),  1U)
+  && direct_internal_fn_supported_p (fn, TREE_TYPE (type),
+OPTIMIZE_FOR_BOTH))
+return FMA_VEC1_SYNTHETIC;
+  return FMA_NONE;
 }
 
 /* Gimplification of expression trees.  */
@@ -936,13 +952,14 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p 
ATTRIBUTE_UNUSED,
 case MINUS_EXPR:
   {
tree type = TREE_TYPE (*expr_p);
+   enum fma_expansion how;
/* For -ffp-contract=on we need to attempt FMA contraction only
   during initial gimplification.  Late contraction across statement
   boundaries would violate language semantics.  */
-   if (SCALAR_FLOAT_TYPE_P (type)
+   if ((SCALAR_FLOAT_TYPE_P (type) || VECTOR_FLOAT_TYPE_P (type))
&& flag_fp_contract_mode == FP_CONTRACT_ON
&& cfun && !(cfun->curr_properties & PROP_gimple_any)
-   && fma_supported_p (IFN_FMA, type))
+   && (how = fma_supported_p (IFN_FMA, type)) != FMA_NONE)
  {
bool neg_mul = false, neg_add = code == MINUS_EXPR;
 
@@ -973,7 +990,7 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p 
ATTRIBUTE_UNUSED,
enum internal_fn ifn = IFN_FMA;
if (neg_mul)
  {
-   if (fma_supported_p (IFN_FNMA, type))
+   if ((how = fma_supported_p (IFN_FNMA, type)) != FMA_NONE)
  ifn = IFN_FNMA;
else
  ops[0] = build1 (NEGATE_EXPR, type, ops[0]);
@@ -981,21 +998,34 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p 
ATTRIBUTE_UNUSED,
if (neg_add)
  {
enum internal_fn ifn2 = ifn == IFN_FMA ? IFN_FMS : IFN_FNMS;
-   if (fma_supported_p (ifn2, type))
+   if ((how = fma_supported_p (ifn2, type)) != FMA_NONE)
  ifn = ifn2;
else
  ops[2] = build1 (NEGATE_EXPR, type, ops[2]);
  }
/* Avoid gimplify_arg: it emits all side effects into *PRE_P.  */
for (auto &&op : ops)
- if (gimplify_expr (&op, pre_p, post_p, is_gimple_val, fb_rvalue)
- == GS_ERROR)
-   r

[PATCH v2] s390: Floating point vector lane handling

2025-05-26 Thread Juergen Christ

Since floating point and vector registers overlap on s390, more
efficient code can be generated to extract FPRs from VRs.
Additionally, for double vectors, more efficient code can be generated
to load specific lanes.
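
To make the affected operations concrete, here is a minimal sketch of a
lane extract and a lane set written with the GNU generic vector
extension. The typedef and function names are illustrative assumptions
and are not taken from the new tests.

```cpp
// Two-element double vector; `v2df`, `extract_lane0` and `set_lane1`
// are hypothetical names.
typedef double v2df __attribute__ ((vector_size (16)));

// Read lane 0 of the vector into a scalar.
double
extract_lane0 (v2df v)
{
  return v[0];
}

// Return a copy of v with lane 1 replaced by x.
v2df
set_lane1 (v2df v, double x)
{
  v[1] = x;
  return v;
}
```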

Bootstrapped and regtested on s390x.

gcc/ChangeLog:

* config/s390/vector.md (VF): New mode iterator.
(VEC_SET_NONFLOAT): New mode iterator.
(VEC_SET_SINGLEFLOAT): New mode iterator.
(*vec_set): Split pattern in two.
(*vec_setv2df): Extract special handling for V2DF mode.
(*vec_extract): Split pattern in two.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/vec-extract-1.c: New test.
* gcc.target/s390/vector/vec-set-1.c: New test.

Signed-off-by: Juergen Christ 
---
 gcc/config/s390/vector.md | 137 +++--
 .../gcc.target/s390/vector/vec-extract-1.c| 190 ++
 .../gcc.target/s390/vector/vec-set-1.c| 133 
 3 files changed, 448 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-extract-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-set-1.c

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index e29255fe1116..340dafd729eb 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -75,6 +75,8 @@
   V1DF V2DF
   (V1TF "TARGET_VXE") (TF "TARGET_VXE")])
 
+(define_mode_iterator VF [(V2SF "TARGET_VXE") (V4SF "TARGET_VXE") V2DF])
+
 ; All modes present in V_HW1 and VFT.
 (define_mode_iterator V_HW1_FT [V16QI V8HI V4SI V2DI V1TI V1DF
   V2DF (V1SF "TARGET_VXE") (V2SF "TARGET_VXE")
@@ -506,26 +508,89 @@
   UNSPEC_VEC_SET))]
   "TARGET_VX")
 
+; Iterator for vec_set that does not use special float/vect overlay tricks
+(define_mode_iterator VEC_SET_NONFLOAT
+  [V1QI V2QI V4QI V8QI V16QI V1HI V2HI V4HI V8HI V1SI V2SI V4SI V1DI V2DI V2SF 
V4SF])
+; Iterator for single element float vectors
+(define_mode_iterator VEC_SET_SINGLEFLOAT [(V1SF "TARGET_VXE") V1DF (V1TF 
"TARGET_VXE")])
+
 ; FIXME: Support also vector mode operands for 1
 ; FIXME: A target memory operand seems to be useful otherwise we end
 ; up with vl vlvgg vst.  Shouldn't the middle-end be able to handle
 ; that itself?
 ; vlvgb, vlvgh, vlvgf, vlvgg, vleb, vleh, vlef, vleg, vleib, vleih, vleif, 
vleig
 (define_insn "*vec_set"
-  [(set (match_operand:V0 "register_operand"  "=v,v,v")
-   (unspec:V [(match_operand: 1 "general_operand""d,R,K")
-  (match_operand:SI2 "nonmemory_operand" "an,I,I")
-  (match_operand:V 3 "register_operand"   "0,0,0")]
- UNSPEC_VEC_SET))]
+  [(set (match_operand:VEC_SET_NONFLOAT  0 "register_operand"  "=v,v,v")
+   (unspec:VEC_SET_NONFLOAT
+ [(match_operand:  1 "general_operand""d,R,K")
+  (match_operand:SI 2 "nonmemory_operand" "an,I,I")
+  (match_operand:VEC_SET_NONFLOAT   3 "register_operand"   "0,0,0")]
+ UNSPEC_VEC_SET))]
   "TARGET_VX
&& (!CONST_INT_P (operands[2])
-   || UINTVAL (operands[2]) < GET_MODE_NUNITS (mode))"
+   || UINTVAL (operands[2]) < GET_MODE_NUNITS 
(mode))"
   "@
vlvg\t%v0,%1,%Y2
vle\t%v0,%1,%2
vlei\t%v0,%1,%2"
   [(set_attr "op_type" "VRS,VRX,VRI")])
 
+(define_insn "*vec_set"
+  [(set (match_operand:VEC_SET_SINGLEFLOAT 0 "register_operand"  "=v,v")
+   (unspec:VEC_SET_SINGLEFLOAT
+ [(match_operand:1 "general_operand""v,R")
+  (match_operand:SI   2 "nonmemory_operand" "an,I")
+  (match_operand:VEC_SET_SINGLEFLOAT  3 "register_operand"   "0,0")]
+ UNSPEC_VEC_SET))]
+  "TARGET_VX"
+  "@
+   vlr\t%v0,%v1
+   vle\t%v0,%1,0"
+ [(set_attr "op_type" "VRR,VRX")])
+
+(define_insn "*vec_setv2df"
+  [(set (match_operand:V2DF0 "register_operand"  
"=v,v,v,v")
+   (unspec:V2DF [(match_operand:DF1 "general_operand""d,R,K,v")
+ (match_operand:SI2 "nonmemory_operand" "an,I,I,n")
+ (match_operand:V2DF  3 "register_operand"   
"0,0,0,0")]
+UNSPEC_VEC_SET))]
+  "TARGET_VX
+   && (!CONST_INT_P (operands[2])
+   || UINTVAL (operands[2]) < GET_MODE_NUNITS (V2DFmode))"
+  "@
+   vlvgg\t%v0,%1,%Y2
+   vleg\t%v0,%1,%2
+   vleig\t%v0,%1,%2
+   #"
+  [(set_attr "op_type" "VRS,VRX,VRI,*")])
+
+(define_split
+  [(set (match_operand:V2DF0 "register_operand"  "")
+   (unspec:V2DF [(match_operand:DF1 "register_operand"  "")
+ (match_operand:SI2 "const_int_operand" "")
+ (match_operand:V2DF  3 "register_operand"  "")]
+UNSPEC_VEC_SET))]
+  "TARGET_VX
+   && (UINTVAL (operands[2]) < GET_MODE_NUNITS (V2DFmode))
+   && reload_completed
+   && VECTOR_REGNO_P (REGNO (oper

Re: [PATCH v3 0/9] Implement layouts from mdspan.

2025-05-26 Thread Luc Grosheintz





On 5/26/25 11:43, Tomasz Kaminski wrote:

On Mon, May 26, 2025 at 11:35 AM Luc Grosheintz 
wrote:




On 5/22/25 15:21, Tomasz Kaminski wrote:


For the stride and product computation, we should perform them in
Extent::size_type, not index_type.
The latter may be signed, and we may hit UB in multiplying non-zero
extents, before reaching the zero.



Then I observe the following issues:

1. When computing products, the integer promotion rules can interfere.
For simplicity let's assume that int is a 32 bit integer. Then the
relevant case is `uint16_t` (or unsigned short). Which is unsigned; and
therefore overflow shouldn't be UB. I observe that the expression

prod *= n;

will overflow as `int` (for large enough `n`). I believe that during the
computation of `prod * n` both sides are promoted to int (because the
range of uint16_t is contained in the range of `int`) and then
overflows, e.g. for n = 2**16-1.

Note that many other small, both signed and unsigned, integers
semantically also overflow, but it's neither UB that's detected by
-fsanitize=undefined, nor a compiler error. Likely because the
"overflow" happens during conversion, which (in C++23) is uniquely
defined in [conv.integral], i.e. not UB.

draft: https://eel.is/c++draft/conv.integral
N4950: 7.3.9 on p. 101

The solution I've come up with is to not use `size_type` but
make_unsigned_t

Please let me know if there's a better solution to forcing unsigned
math.


I think at this point we should perform stride computation in std::size_t.
Because accessors are defined to accept size_t, the required_span_size()
cannot be greater
than maximum of size_t, and that limits our product of extents.



I looked into this in the context of computing the product of
static extents. The stumbling block was that I couldn't find
a clear statement that sizeof(int) <= sizeof(size_t), or that
size_t is exempted from the integer conversion rules.

Therefore, the concern was that the overflow issue would come
back on systems with 16-bit size_t and 32-bit int.

I'm slightly unhappy that (on common systems) we need to use
64-bit integers for 32-bit (or less) operations; but as you
point out, this only affects code that shouldn't be performance
sensitive.



Godbolt: https://godbolt.org/z/PnvaYT7vd

2. Let's assume we compute `__extents_prod` safely, e.g. by doing all
math as unsigned integers. There are several places where we need to be careful:

2.1. layout_{right,left}::stride, these still compute products that
can overflow and might not be multiplied by `0` to make the answer
unambiguous. For an empty extent, any number is a valid stride. Hence,
this only requires that we don't run into UB.

2.2. The default ctor of layout_stride computes the layout_right
strides on the fly. We can use __unsigned_prod to keep computing the
extents in linear time. The only requirement I'm aware of is that the
strides are the same as those for layout_right (but the actual value
is not defined directly).

2.3 layout_stride::required_span_size, the current implementation
first scans for zeros; and only if there are none does it proceed with
computing the required span size in index_type. This is safe, because
all the terms in the sum are non-negative and the mandate states that
the total is a representable number. Hence, all the involved terms are
representable too.

3. For those interested in what the other two implementations do: both
fail in some subset of the corner cases.

Godbolt: https://godbolt.org/z/vEYxEvMWs
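
The "do all math as unsigned integers" idea from point 2 can be sketched
as follows. This is only an illustration under a stated assumption,
namely that size_t is not narrower than int; `extents_prod` is a
hypothetical helper, not the function from the patch.

```cpp
#include <cstddef>
#include <initializer_list>

// Assumption of this sketch: size_t is not narrower than int, so the
// operands below are not promoted to (signed) int.
static_assert(sizeof(std::size_t) >= sizeof(int));

// Hypothetical helper: product of extents computed in std::size_t.
// Unsigned arithmetic wraps modulo 2^N, so intermediate overflow is
// not UB.
constexpr std::size_t
extents_prod(std::initializer_list<std::size_t> exts)
{
  std::size_t ret = 1;
  for (std::size_t e : exts)
    ret *= e;
  return ret;
}

// A zero extent yields zero even if earlier factors wrapped around.
static_assert(extents_prod({static_cast<std::size_t>(-1), 2, 0}) == 0);
static_assert(extents_prod({2, 3, 4}) == 24);
```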

Re: [PATCH v3 0/9] Implement layouts from mdspan.

2025-05-26 Thread Luc Grosheintz





On 5/26/25 13:53, Tomasz Kaminski wrote:

On Mon, May 26, 2025 at 1:32 PM Luc Grosheintz 
wrote:




On 5/26/25 11:43, Tomasz Kaminski wrote:

On Mon, May 26, 2025 at 11:35 AM Luc Grosheintz <

luc.groshei...@gmail.com>

wrote:




On 5/22/25 15:21, Tomasz Kaminski wrote:


For the stride and product computation, we should perform them in
Extent::size_type, not index_type.
The latter may be signed, and we may hit UB in multiplying non-zero
extents, before reaching the zero.



Then I observe the following issues:

1. When computing products, the integer promotion rules can interfere.
For simplicity let's assume that int is a 32 bit integer. Then the
relevant case is `uint16_t` (or unsigned short). Which is unsigned; and
therefore overflow shouldn't be UB. I observe that the expression

 prod *= n;

will overflow as `int` (for large enough `n`). I believe that during the
computation of `prod * n` both sides are promoted to int (because the
range of uint16_t is contained in the range of `int`) and then
overflows, e.g. for n = 2**16-1.

Note that many other small, both signed and unsigned, integers
semantically also overflow, but it's neither UB that's detected by
-fsanitize=undefined, nor a compiler error. Likely because the
"overflow" happens during conversion, which (in C++23) is uniquely
defined in [conv.integral], i.e. not UB.

draft: https://eel.is/c++draft/conv.integral
N4950: 7.3.9 on p. 101

The solution I've come up with is to not use `size_type` but
 make_unsigned_t

Please let me know if there's a better solution to forcing unsigned
math.


I think at this point we should perform stride computation in

std::size_t.

Because accessors are defined to accept size_t, the required_span_size()
cannot be greater
than maximum of size_t, and that limits our product of extents.



I looked into this in the context of computing the product of
static extents. The stumbling block was that I couldn't find
a clear statement that sizeof(int) <= sizeof(size_t), or that
size_t is exempted from the integer conversion rules.

Therefore, the concern was that the overflow issue would come
back on systems with 16-bit size_t and 32-bit int.


We could cast elements of __dyn_exts to size_t before multiplying in
__ext_prod.
Even use size_t in for loop: for (size_t x ; __dyn_ext()).



This is clear. However, what I'm worried about is that due to
the integer conversion rules:
https://eel.is/c++draft/conv.integral#1

On a system in which size_t is 16 bits and int is 32 bits, the
conversion rank of size_t will be less than that of int:
https://eel.is/c++draft/conv.rank#1.2
https://eel.is/c++draft/conv.rank#1.4

Therefore, during binary operations, both operands undergo integer
promotion due to the arithmetic conversion:
https://eel.is/c++draft/expr.arith.conv#1.5
https://eel.is/c++draft/conv.prom#2

Hence, IIUC, both operands are converted to int and we're straight
back to the issue of `uint16_t` on a system with 32-bit int, i.e.

  uint16_t(n) * uint16_t(n)

is equivalent to

  int(n) * int(n)

and causes UB due to signed overflow, if `n` is sufficiently large,
e.g. 2**16-1. Note, it doesn't help to use `*=`.

The case for uint16_t is set up here:
Godbolt: https://godbolt.org/z/bcY1GnMPr
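
The promotion itself can be checked at compile time. The sketch below
assumes a platform where int is wider than 16 bits; `mul_u16` is a
hypothetical helper that forces the multiplication into unsigned int
before narrowing back.

```cpp
#include <cstdint>
#include <type_traits>

// Assumption of this sketch: int is wider than 16 bits.
static_assert(sizeof(int) > sizeof(std::uint16_t));

// Both uint16_t operands are promoted, so the product has type int.
static_assert(std::is_same_v<decltype(std::uint16_t{} * std::uint16_t{}),
                             int>);

// Hypothetical helper: multiply as unsigned int, then narrow back.
// The conversion to uint16_t is modular, hence well defined.
constexpr std::uint16_t
mul_u16(std::uint16_t a, std::uint16_t b)
{
  return static_cast<std::uint16_t>(static_cast<unsigned>(a)
                                    * static_cast<unsigned>(b));
}

// (2**16 - 1)**2 is congruent to 1 modulo 2**16.
static_assert(mul_u16(65535, 65535) == 1);
```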



I'm slightly unhappy that (on common systems) we need to use
64-bit integers for 32-bit (or less) operations; but as you
point out, this only affects code that shouldn't be performance
sensitive.



Godbolt: https://godbolt.org/z/PnvaYT7vd

2. Let's assume we compute `__extents_prod` safely, e.g. by doing all
math as unsigned integers. There are several places where we need to be careful:

 2.1. layout_{right,left}::stride, these still compute products that
 can overflow and might not be multiplied by `0` to make the answer
 unambiguous. For an empty extent, any number is a valid stride. Hence,
 this only requires that we don't run into UB.

 2.2. The default ctor of layout_stride computes the layout_right
 strides on the fly. We can use __unsigned_prod to keep computing the
 extents in linear time. The only requirement I'm aware of is that the
 strides are the same as those for layout_right (but the actual value
 is not defined directly).

 2.3 layout_stride::required_span_size, the current implementation
 first scans for zeros; and only if there are none does it proceed with
 computing the required span size in index_type. This is safe, because
 all the terms in the sum are non-negative and the mandate states that
 the total is a representable number. Hence, all the involved terms are
 representable too.

3. For those interested in what the other two implementations do: both
fail in some subset of the corner cases.

Godbolt: https://godbolt.org/z/vEYxEvMWs

Re: [PATCH v3 0/9] Implement layouts from mdspan.

2025-05-26 Thread Tomasz Kaminski

On Mon, May 26, 2025 at 2:20 PM Luc Grosheintz 
wrote:

>
>
> On 5/26/25 13:53, Tomasz Kaminski wrote:
> > On Mon, May 26, 2025 at 1:32 PM Luc Grosheintz  >
> > wrote:
> >
> >>
> >>
> >> On 5/26/25 11:43, Tomasz Kaminski wrote:
> >>> On Mon, May 26, 2025 at 11:35 AM Luc Grosheintz <
> >> luc.groshei...@gmail.com>
> >>> wrote:
> >>>
> 
> 
>  On 5/22/25 15:21, Tomasz Kaminski wrote:
> >
> > For the stride and product computation, we should perform them in
> > Extent::size_type, not index_type.
> > The latter may be signed, and we may hit UB in multiplying non-zero
> > extents, before reaching the zero.
> >
> 
>  Then I observe the following issues:
> 
>  1. When computing products, the integer promotion rules can interfere.
>  For simplicity let's assume that int is a 32 bit integer. Then the
>  relevant case is `uint16_t` (or unsigned short). Which is unsigned;
> and
>  therefore overflow shouldn't be UB. I observe that the expression
> 
>   prod *= n;
> 
>  will overflow as `int` (for large enough `n`). I believe that during
> the
>  computation of `prod * n` both sides are promoted to int (because the
>  range of uint16_t is contained in the range of `int`) and then
>  overflows, e.g. for n = 2**16-1.
> 
>  Note that many other small, both signed and unsigned, integers
>  semantically also overflow, but it's neither UB that's detected by
>  -fsanitize=undefined, nor a compiler error. Likely because the
>  "overflow" happens during conversion, which (in C++23) is uniquely
>  defined in [conv.integral], i.e. not UB.
> 
>  draft: https://eel.is/c++draft/conv.integral
>  N4950: 7.3.9 on p. 101
> 
>  The solution I've come up is to not use `size_type` but
>   make_unsigned_t
> 
>  Please let me know if there's a better solution to forcing unsigned
>  math.
> 
> >>> I think at this point we should perform stride computation in
> >> std::size_t.
> >>> Because accessors are defined to accept size_t, the
> required_span_size()
> >>> cannot be greater
> >>> than maximum of size_t, and that limits our product of extents.
> >>>
> >>
> >> I looked into this in the context of computing the product of
> >> static extents. The stumbling block was that I couldn't find
> >> a clear statement that sizeof(int) <= sizeof(size_t), or that
> >> size_t is exempted from the integer conversion rules.
> >>
> >> Therefore, the concern was that the overflow issue would come
> >> back on systems with 16-bit size_t and 32-bit int.
> >>
> > We could cast elements of __dyn_exts to size_t before multiplying in
> > __ext_prod.
> > Even use size_t in for loop: for (size_t x ; __dyn_ext()).
> >
>
> This is clear. However, what I'm worried about is that due to
> the integer conversion rules:
> https://eel.is/c++draft/conv.integral#1
>
> On a system in which size_t is 16 bits and int is 32 bits, the
> conversion rank of size_t will be less than that of int:
>
I highly doubt that GCC/libstdc++ targets any architecture where this would
be the case.
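
If one wanted to state that assumption explicitly, a compile-time check
along the following lines might do. This is only a sketch, not something
proposed in the patch: it verifies that multiplying two size_t values
yields size_t, i.e. that the operands are not promoted to int.

```cpp
#include <cstddef>
#include <type_traits>

// If size_t had a lower conversion rank than int, the product below
// would have type int (or unsigned int) rather than size_t, and this
// assertion would fail at compile time.
static_assert(std::is_same_v<decltype(std::size_t{1} * std::size_t{1}),
                             std::size_t>,
              "size_t operands are promoted before multiplication");
```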

> https://eel.is/c++draft/conv.rank#1.2
> https://eel.is/c++draft/conv.rank#1.4
>
> Therefore, during binary operations, both operands undergo integer
> promotion due to the arithmetic conversion:
> https://eel.is/c++draft/expr.arith.conv#1.5
> https://eel.is/c++draft/conv.prom#2
>
> Hence, IIUC, both operands are converted to int and we're straight
> back to the issue of `uint16_t` on a system with 32-bit int, i.e.
>
>uint16_t(n) * uint16_t(n)
>
> is equivalent to
>
>int(n) * int(n)
>
> and causes UB due to signed overflow, if `n` is sufficiently large,
> e.g. 2**16-1. Note, it doesn't help to use `*=`.
>
> The case for uint16_t is setup here:
> Godbolt: https://godbolt.org/z/bcY1GnMPr
>
> >>
> >> I'm slightly unhappy that (on common systems) we need to use
> >> 64-bit integers for 32-bit (or less) operations; but as you
> >> point out, this only affects code that shouldn't be performance
> >> sensitive.
> >>
> 
>  Godbolt: https://godbolt.org/z/PnvaYT7vd
> 
>  2. Let's assume we compute `__extents_prod` safely, e.g. by doing all
>  math as unsigned integers. There's several places we need to be
> careful:
> 
>   2.1. layout_{right,left}::stride, these still compute products,
> that
>   overflow and might not be multiplied by `0` to make the answer
>   unambiguous. For an empty extent, any number is a valid stride.
> >> Hence,
>   this only requires that we don't run into UB.
> 
>   2.2. The default ctor of layout_stride computes the layout_right
>   strides on the fly. We can use __unsigned_prod to keep computing
> the
>   extents in linear time. The only requirement I'm aware of is that
> >> the
>   strides are the same as those for layout_right (but the actual
> value
>   in not defined directly).
> 
>   2.3 layout_st
