Re: [PATCH] Optimize 128-bit vector permutation with pand, pandn and por.
On Wed, Nov 20, 2024 at 8:03 PM Cui, Lili wrote:
>
> Hi, all
>
> This patch aims to handle certain vector shuffle operations using pand, pandn
> and por more efficiently.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

Although it's stage 3, I think this one is low risk, so OK for trunk.

> Regards,
> Lili.
>
>
> This patch introduces a new subroutine in ix86_expand_vec_perm_const_1.
> On x86, use mixed constant permutation for V8HImode and V16QImode when
> SSE2 is supported.  This patch handles certain vector shuffle operations
> more efficiently using pand, pandn and por.  This change is intended to
> improve assembly code generation for configurations that support SSE2.
>
> gcc/ChangeLog:
>
>	PR target/116675
>	* config/i386/i386-expand.cc (expand_vec_perm_pand_pandn_por):
>	New subroutine.
>	(ix86_expand_vec_perm_const_1): Call expand_vec_perm_pand_pandn_por.
>
> gcc/testsuite/ChangeLog:
>
>	PR target/116675
>	* gcc.target/i386/pr116675.c: New test.
> ---
>  gcc/config/i386/i386-expand.cc           | 50
>  gcc/testsuite/gcc.target/i386/pr116675.c | 75
>  2 files changed, 125 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr116675.c
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index a6e6e738a52..f9fa0281298 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -23103,6 +23103,53 @@ expand_vec_perm_vpshufb2_vpermq_even_odd (struct expand_vec_perm_d *d)
>    return true;
>  }
>
> +/* A subroutine of ix86_expand_vec_perm_const_1.  Try to implement a
> +   permutation (which is a blend) with and, andnot and or when pshufb is
> +   not available.
> +
> +   It handles cases like:
> +     __builtin_shufflevector (v1, v2, 0, 9, 2, 11, 4, 13, 6, 15);
> +     __builtin_shufflevector (v1, v2, 8, 1, 2, 11, 4, 13, 6, 15);
> +
> +   An element[i] must be chosen between op0[i] and op1[i] to satisfy the
> +   requirement.
> + */
> +
> +static bool
> +expand_vec_perm_pand_pandn_por (struct expand_vec_perm_d *d)
> +{
> +  rtx rperm[16], vperm;
> +  unsigned int i, nelt = d->nelt;
> +
> +  if (!TARGET_SSE2
> +      || d->one_operand_p
> +      || (d->vmode != V16QImode && d->vmode != V8HImode))
> +    return false;
> +
> +  if (d->perm[0] != 0)
> +    return false;
> +
> +  /* The dest[i] must select an element between op0[i] and op1[i].  */
> +  for (i = 1; i < nelt; i++)
> +    if ((d->perm[i] % nelt) != i)
> +      return false;
> +
> +  if (d->testing_p)
> +    return true;
> +
> +  /* Generates a blend mask for the operators AND and ANDNOT.  */
> +  machine_mode inner_mode = GET_MODE_INNER (d->vmode);
> +  for (i = 0; i < nelt; i++)
> +    rperm[i] = (d->perm[i] < nelt) ? CONSTM1_RTX (inner_mode)
> +				   : CONST0_RTX (inner_mode);
> +
> +  vperm = gen_rtx_CONST_VECTOR (d->vmode, gen_rtvec_v (nelt, rperm));
> +  vperm = force_reg (d->vmode, vperm);
> +
> +  ix86_expand_sse_movcc (d->target, vperm, d->op0, d->op1);
> +
> +  return true;
> +}
> +
>  /* Implement permutation with pslldq + psrldq + por when pshufb is not
>     available.  */
>  static bool
> @@ -24162,6 +24209,9 @@ ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
>    if (expand_vec_perm_psrlw_psllw_por (d))
>      return true;
>
> +  if (expand_vec_perm_pand_pandn_por (d))
> +    return true;
> +
>    /* Try sequences of four instructions.  */
>
>    if (expand_vec_perm_even_odd_trunc (d))
> diff --git a/gcc/testsuite/gcc.target/i386/pr116675.c b/gcc/testsuite/gcc.target/i386/pr116675.c
> new file mode 100644
> index 000..e463dd8415f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr116675.c
> @@ -0,0 +1,75 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -msse2 -mno-ssse3" } */
> +/* { dg-final { scan-assembler-times "pand" 4 } } */
> +/* { dg-final { scan-assembler-times "pandn" 4 } } */
> +/* { dg-final { scan-assembler-times "por" 4 } } */
> +
> +#include
> +
> +__attribute__((noinline, noclone, target("sse2")))
> +static __v8hi foo1 (__v8hi a, __v8hi b)
> +{
> +  return __builtin_shufflevector (a, b, 0, 9, 2, 11, 4, 13, 6, 15);
> +}
> +
> +__attribute__((noinline, noclone, target("sse2")))
> +static __v8hi foo2 (__v8hi a, __v8hi b)
> +{
> +  return __builtin_shufflevector (a, b, 8, 9, 2, 3, 4, 13, 14, 15);
> +}
> +
> +__attribute__((noinline, noclone, target("sse2")))
> +static __v16qi foo3 (__v16qi a, __v16qi b)
> +{
> +  return __builtin_shufflevector (a, b, 0, 17, 2, 19, 4, 21, 6, 23,
> +				  8, 25, 10, 27, 12, 29, 14, 31);
> +}
> +
> +__attribute__((noinline, noclone, target("sse2")))
> +static __v16qi foo4 (__v16qi a, __v16qi b)
> +{
> +  return __builtin_shufflevector (a, b, 0, 1, 2, 3, 4, 21, 6, 23,
> +				  8, 25, 10, 27, 12, 29, 14, 31);
> +}
> +
> +__attribute__((noinline, noclone)) void
> +compare_v8hi (__v8hi a, __v8hi b)
> +{
> +  for (int i = 0; i < 8;
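The effect of the new `expand_vec_perm_pand_pandn_por` path can be modeled in plain scalar C: the generated constant vector acts as a blend mask, and the result is `(op0 & mask) | (op1 & ~mask)`. This is a minimal sketch of that idea, not GCC source:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the pand/pandn/por blend the subroutine emits for
   V8HImode: mask[i] is all-ones when perm[i] selects op0[i] and all-zeros
   when it selects op1[i].  The subroutine only triggers when
   (perm[i] % 8) == i, i.e. each lane keeps its position.  */
static void
blend_v8hi (uint16_t dest[8], const uint16_t op0[8], const uint16_t op1[8],
	    const unsigned perm[8])
{
  for (int i = 0; i < 8; i++)
    {
      /* perm[i] < 8 selects op0[i]; perm[i] >= 8 selects op1[i].  */
      uint16_t mask = (perm[i] < 8) ? 0xffff : 0x0000;
      dest[i] = (uint16_t) ((op0[i] & mask) | (op1[i] & (uint16_t) ~mask));
    }
}
```

With the first shuffle from the patch, `{0, 9, 2, 11, 4, 13, 6, 15}`, even lanes come from `op0` and odd lanes from `op1`.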
[PATCH v1 1/2] RISC-V: Fix incorrect optimization options passing to gather/scatter
From: Pan Li

Like the strided load/store, the testcases of vector gather/scatter are
designed to pick up different sorts of optimization options, but actually
these options are ignored according to the execution log in gcc.log.  This
patch makes it correct, almost the same as what we fixed for strided
load/store.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48H.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/rvv.exp: Fix the incorrect optimization
	options passing to testcases.

Signed-off-by: Pan Li
---
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index 71251737be2..448374d49db 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -119,7 +119,7 @@ set AUTOVEC_TEST_OPTS [list \
   {-ftree-vectorize -O2 -mrvv-vector-bits=scalable -mrvv-max-lmul=dynamic -ffast-math} ]
 foreach op $AUTOVEC_TEST_OPTS {
   dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/gather-scatter/*.\[cS\]]] \
-    "" "$op"
+    "$op" ""
   dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/strided/*.\[cS\]]] \
     "$op" ""
   dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/sat/*.\[cS\]]] \
--
2.43.0
[PATCH v1 2/2] RISC-V: Refactor the testcases for RVV gather/scatter
From: Pan Li

This patch refactors the testcases of gather/scatter after the fix to the
optimization options passing to testcases.  Includes:

* Remove unnecessary optimization options.
* Adjust dg-final by any-opts and/or no-opts if the tree dump changes on
  different optimization options (like O2, O3).

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48H.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c:
	Adjust the dump check times.
	* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c:
	Remove unnecessary option and add target no-opts/any-opts.
	* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c: Ditto.
	* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c: Ditto.
	* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c: Ditto.

Signed-off-by: Pan Li
---
 .../gather-scatter/gather_load_64-12-zvbb.c      |  2 +-
 .../rvv/autovec/gather-scatter/strided_load-1.c  |  9 +++--
 .../rvv/autovec/gather-scatter/strided_load-2.c  | 16 ++--
 .../rvv/autovec/gather-scatter/strided_store-1.c |  9 +++--
 .../rvv/autovec/gather-scatter/strided_store-2.c |  2 +-
 5 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c
index 11a4031f47b..1fd3644886a 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c
@@ -106,7 +106,7 @@ TEST_LOOP (_Float16, uint64_t)
 TEST_LOOP (float, uint64_t)
 TEST_LOOP (double, uint64_t)
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 88 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 80 "vect" } } */
/*
{ dg-final { scan-tree-dump " \.MASK_LEN_GATHER_LOAD" "vect" } } */ /* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */ /* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c index 79b39f102bf..b8c9669fa54 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 -mrvv-vector-bits=scalable -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */ +/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -mrvv-vector-bits=scalable -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */ #include @@ -40,6 +40,11 @@ TEST_ALL (TEST_LOOP) -/* { dg-final { scan-tree-dump-times " \.MASK_LEN_STRIDED_LOAD " 66 "optimized" } } */ +/* { dg-final { scan-tree-dump-times " \.MASK_LEN_STRIDED_LOAD " 66 "optimized" { target { any-opts + "-O3" + } } } } */ +/* { dg-final { scan-tree-dump-times " \.MASK_LEN_STRIDED_LOAD " 44 "optimized" { target { any-opts + "-O2" + } } } } */ /* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "optimized" } } */ /* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "optimized" } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c index 8a452e547a3..dab9658b12b 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 -mrvv-vector-bits=scalable -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */ +/* { 
dg-options "-march=rv64gcv_zvfh -mabi=lp64d -mrvv-vector-bits=scalable -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */ #include @@ -40,6 +40,18 @@ TEST_ALL (TEST_LOOP) -/* { dg-final { scan-tree-dump-times " \.MASK_LEN_STRIDED_LOAD " 33 "optimized" } } */ +/* { dg-final { scan-tree-dump-times " \.MASK_LEN_STRIDED_LOAD " 64 "optimized" { target { any-opts { + "-mrvv-max-lmul=m8" + "-mrvv-max-lmul=dynamic" + } } } } */ +/* { dg-final { scan-tree-dump-times " \.MASK_LEN_STRIDED_LOAD " 56 "optimized" { target { any-opts { + "-mrvv-max-lmul=m4" + } } } } */ +/* { dg-final { scan-tree-dump-times " \.MASK_LEN_STRIDED_LOAD " 50 "optimized" { target { any-opts { + "-mrvv-max-lmul=m2" + } } } } */ +/* { dg-final { scan-tree-dump-times " \.MASK_LEN_STRIDED_LOAD " 33 "optimized" { target
Re: [PATCH] [RFC] Add extra 64bit SSE vector epilogue in some cases
On Sun, Nov 24, 2024 at 8:05 PM Richard Biener wrote:
>
> > On 24.11.2024 at 09:17, Hongtao Liu wrote:
> >
> > On Fri, Nov 22, 2024 at 9:33 PM Richard Biener wrote:
> >>
> >> Similar to the X86_TUNE_AVX512_TWO_EPILOGUES tuning which enables
> >> an extra 128bit SSE vector epilogue when doing 512bit AVX512
> >> vectorization in the main loop the following allows a 64bit SSE
> >> vector epilogue to be generated when the previous vector epilogue
> >> still had a vectorization factor of 16 or larger (which usually
> >> means we are operating on char data).
> >>
> >> This effectively applies to 256bit and 512bit AVX2/AVX512 main loops,
> >> a 128bit SSE main loop would already get a 64bit SSE vector epilogue.
> >>
> >> Together with X86_TUNE_AVX512_TWO_EPILOGUES this means three
> >> vector epilogues for 512bit and two vector epilogues when enabling
> >> 256bit vectorization.  I have not added another tunable for this
> >> RFC - suggestions on how to avoid inflation there welcome.
> >>
> >> This speeds up 525.x264_r to within 5% of the -mprefer-vector-size=128
> >> speed with -mprefer-vector-size=256 or -mprefer-vector-size=512
> >> (the latter only when -mtune-ctrl=avx512_two_epilogues is in effect).
> >>
> >> I have not done any further benchmarking, this merely shows the
> >> possibility and looks for guidance on how to expose this to the
> >> uarch tunings or to the user (at all?) if not gating on any uarch
> >> specific tuning.
> >>
> >> Note 64bit SSE isn't a native vector size so we rely on emulation
> >> being "complete" (if not epilogue vectorization will only fail, so
> >> it's "safe" in this regard).  With AVX512 ISA available an alternative
> >> is a predicated epilogue, but due to possible STLF issues user control
> >> would be required here.
> >>
> >> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress
> >> (I expect some fallout in scans due to some extra epilogues, let's see)
> >
> > I'll do some benchmarking, guess it should be OK.
>
> Any suggestion as to how (or if at all?) we should expose this to users for
> tuning?

According to my benchmarking, it's generally better on both SRF and SPR, and
at most improves 14% on SRF and 9% on SPR for some specific benchmarks.  So I
suggest turning it on by default; no need to put it under uarch tuning.

> Richard
>
> >>	* config/i386/i386.cc (ix86_vector_costs::finish_cost): For an
> >>	128bit SSE epilogue request a 64bit SSE epilogue if the 128bit
> >>	SSE epilogue VF was 16 or higher.
> >> ---
> >>  gcc/config/i386/i386.cc | 7 +++
> >>  1 file changed, 7 insertions(+)
> >>
> >> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> >> index c7e70c21999..f2e8de3aafc 100644
> >> --- a/gcc/config/i386/i386.cc
> >> +++ b/gcc/config/i386/i386.cc
> >> @@ -25495,6 +25495,13 @@ ix86_vector_costs::finish_cost (const vector_costs *scalar_costs)
> >>	   && GET_MODE_SIZE (loop_vinfo->vector_mode) == 32)
> >>	 m_suggested_epilogue_mode = V16QImode;
> >>      }
> >> +  /* When a 128bit SSE vectorized epilogue still has a VF of 16 or larger
> >> +     enable a 64bit SSE epilogue.  */
> >> +  if (loop_vinfo
> >> +      && LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> >> +      && GET_MODE_SIZE (loop_vinfo->vector_mode) == 16
> >> +      && LOOP_VINFO_VECT_FACTOR (loop_vinfo).to_constant () >= 16)
> >> +    m_suggested_epilogue_mode = V8QImode;
> >>
> >>    vector_costs::finish_cost (scalar_costs);
> >>  }
> >> --
> >> 2.43.0
> >
> > --
> > BR,
> > Hongtao

--
BR,
Hongtao
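The loop shape being discussed can be sketched in plain C. This is a hand-written model of what the vectorizer produces for char data, not compiler output: a wide main loop, progressively narrower vector epilogues (modeled as fixed-size blocks of 32, 16 and 8 bytes), and a scalar tail of at most 7 iterations instead of 31:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of a 256-bit AVX2 main loop followed by 128-bit and (new)
   64-bit SSE vector epilogues; the inner loops stand in for one vector
   iteration each.  */
static uint64_t
sum_bytes (const uint8_t *p, size_t n)
{
  uint64_t sum = 0;
  size_t i = 0;
  for (; i + 32 <= n; i += 32)	/* 256-bit main loop.  */
    for (size_t j = 0; j < 32; j++)
      sum += p[i + j];
  for (; i + 16 <= n; i += 16)	/* 128-bit SSE epilogue.  */
    for (size_t j = 0; j < 16; j++)
      sum += p[i + j];
  for (; i + 8 <= n; i += 8)	/* New 64-bit SSE epilogue.  */
    for (size_t j = 0; j < 8; j++)
      sum += p[i + j];
  for (; i < n; i++)		/* Scalar tail: at most 7 iterations.  */
    sum += p[i];
  return sum;
}
```

For n = 61 this runs one 32-byte block, one 16-byte block, one 8-byte block and 5 scalar iterations, which is the latency win the extra epilogue buys on short trailing counts.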
[PATCH] aarch64: Use SVE ASRD instruction with Neon modes.
The ASRD instruction on SVE performs an arithmetic shift right by an immediate for divide. This patch enables the use of ASRD with Neon modes. For example: int in[N], out[N]; void foo (void) { for (int i = 0; i < N; i++) out[i] = in[i] / 4; } compiles to: ldr q31, [x1, x0] cmltv30.16b, v31.16b, #0 and z30.b, z30.b, 3 add v30.16b, v30.16b, v31.16b sshrv30.16b, v30.16b, 2 str q30, [x0, x2] add x0, x0, 16 cmp x0, 1024 but can just be: ldp q30, q31, [x0], 32 asrdz31.b, p7/m, z31.b, #2 asrdz30.b, p7/m, z30.b, #2 stp q30, q31, [x1], 32 cmp x0, x2 The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. OK for mainline? Signed-off-by: Soumya AR gcc/ChangeLog: * config/aarch64/aarch64-sve.md: Extended sdiv_pow23 and *sdiv_pow23 to support Neon modes. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/sve-asrd.c: New test. --- gcc/config/aarch64/aarch64-sve.md | 25 - .../gcc.target/aarch64/sve/sve-asrd.c | 54 +++ 2 files changed, 67 insertions(+), 12 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/sve-asrd.c diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md index affdb24a93d..96effe4abed 100644 --- a/gcc/config/aarch64/aarch64-sve.md +++ b/gcc/config/aarch64/aarch64-sve.md @@ -4972,34 +4972,35 @@ ;; Unpredicated ASRD. (define_expand "sdiv_pow23" - [(set (match_operand:SVE_I 0 "register_operand") - (unspec:SVE_I + [(set (match_operand:SVE_VDQ_I 0 "register_operand") + (unspec:SVE_VDQ_I [(match_dup 3) - (unspec:SVE_I -[(match_operand:SVE_I 1 "register_operand") + (unspec:SVE_VDQ_I +[(match_operand:SVE_VDQ_I 1 "register_operand") (match_operand 2 "aarch64_simd_rshift_imm")] UNSPEC_ASRD)] UNSPEC_PRED_X))] "TARGET_SVE" { -operands[3] = aarch64_ptrue_reg (mode); +operands[3] = aarch64_ptrue_reg (mode, + GET_MODE_UNIT_SIZE (mode)); } ) ;; Predicated ASRD. 
(define_insn "*sdiv_pow23" - [(set (match_operand:SVE_I 0 "register_operand") - (unspec:SVE_I + [(set (match_operand:SVE_VDQ_I 0 "register_operand") + (unspec:SVE_VDQ_I [(match_operand: 1 "register_operand") - (unspec:SVE_I -[(match_operand:SVE_I 2 "register_operand") - (match_operand:SVE_I 3 "aarch64_simd_rshift_imm")] + (unspec:SVE_VDQ_I +[(match_operand:SVE_VDQ_I 2 "register_operand") + (match_operand:SVE_VDQ_I 3 "aarch64_simd_rshift_imm")] UNSPEC_ASRD)] UNSPEC_PRED_X))] "TARGET_SVE" {@ [ cons: =0 , 1 , 2 ; attrs: movprfx ] - [ w, Upl , 0 ; * ] asrd\t%0., %1/m, %0., #%3 - [ ?&w , Upl , w ; yes] movprfx\t%0, %2\;asrd\t%0., %1/m, %0., #%3 + [ w, Upl , 0 ; * ] asrd\t%Z0., %1/m, %Z0., #%3 + [ ?&w , Upl , w ; yes] movprfx\t%Z0, %Z2\;asrd\t%Z0., %1/m, %Z0., #%3 } ) diff --git a/gcc/testsuite/gcc.target/aarch64/sve/sve-asrd.c b/gcc/testsuite/gcc.target/aarch64/sve/sve-asrd.c new file mode 100644 index 000..00aa8b2380d --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/sve-asrd.c @@ -0,0 +1,54 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast --param aarch64-autovec-preference=asimd-only" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#include +#define N 1024 + +#define FUNC(M) \ +M in_##M[N];\ +M out_##M[N]; \ +void asrd_##M() { \ + for (int i = 0; i < N; i++) \ +out_##M[i] = in_##M[i] / 4; \ +} + +/* +** asrd_int8_t: +** ... +** ptrue (p[0-7]).b, vl1 +** ... +** asrdz[0-9]+\.b, \1/m, z[0-9]+\.b, #2 +** ... +*/ +FUNC(int8_t) + +/* +** asrd_int16_t: +** ... +** ptrue (p[0-7]).b, vl2 +** ... +** asrdz[0-9]+\.h, \1/m, z[0-9]+\.h, #2 +** ... +*/ +FUNC(int16_t) + +/* +** asrd_int32_t: +** ... +** ptrue (p[0-7]).b, vl4 +** ... +** asrdz[0-9]+\.s, \1/m, z[0-9]+\.s, #2 +** ... +*/ +FUNC(int32_t) + +/* +** asrd_int64_t: +** ... +** ptrue (p[0-7]).b, vl8 +** ... +** asrdz[0-9]+\.d, \1/m, z[0-9]+\.d, #2 +** ... +*/ +FUNC(int64_t) -- 2.43.2
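The Advanced SIMD fallback sequence shown in the commit message (cmlt/and/add/sshr) can be modeled per element in C. Signed division by a power of two rounds toward zero, so a plain arithmetic shift must first add a bias of 2**k - 1 to negative inputs; SVE's ASRD folds the whole correction into one instruction. A minimal scalar sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of signed division by 2**k via shifts, matching the
   cmlt (negative mask) / and (bias) / add / sshr sequence above.  */
static int8_t
sdiv_pow2_int8 (int8_t x, int k)
{
  int8_t neg_mask = (int8_t) (x < 0 ? -1 : 0);	      /* cmlt  */
  int8_t bias = (int8_t) (neg_mask & ((1 << k) - 1)); /* and   */
  return (int8_t) ((x + bias) >> k);		      /* add + sshr */
}
```

For example, -7 / 4 must be -1 (round toward zero), whereas a bare arithmetic shift would give -7 >> 2 == -2.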
Re: [PATCH] i386/testsuite: Correct AVX10.2 FP8 test mask usage
On Fri, Nov 22, 2024 at 4:08 PM Haochen Jiang wrote: > > Hi all, > > Under FP8, we should not use AVX512F_LEN_HALF to get the mask size since > it will get 16 instead of 8 and drop into wrong if condition. Correct > the usage for vcvtneph2[b,h]f8[,s] runtime test. > > Tested under sde. Ok for trunk? Ok. > > Thx, > Haochen > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c: Correct 128bit > mask usage. > * gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c: Ditto. > * gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c: Ditto. > * gcc.target/i386/avx10_2-512-vcvtneph2hf8s-2.c: Ditto. > --- > .../i386/avx10_2-512-vcvtneph2bf8-2.c | 25 +++ > .../i386/avx10_2-512-vcvtneph2bf8s-2.c| 25 +++ > .../i386/avx10_2-512-vcvtneph2hf8-2.c | 23 ++--- > .../i386/avx10_2-512-vcvtneph2hf8s-2.c| 23 ++--- > 4 files changed, 58 insertions(+), 38 deletions(-) > > diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c > b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c > index d5ba911334c..96ca7e80c4d 100644 > --- a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c > +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c > @@ -11,8 +11,8 @@ > #include "avx10-helper.h" > #include "fp8-helper.h" > > -#define SIZE_SRC (AVX512F_LEN / 16) > -#define SIZE (AVX512F_LEN_HALF / 8) > +#define SIZE (AVX512F_LEN / 16) > +#define SIZE_DST (AVX512F_LEN_HALF / 8) > #include "avx512f-mask-type.h" > > void > @@ -23,14 +23,14 @@ CALC (unsigned char *r, _Float16 *s) >hf8_bf8 = 1; >saturate = 0; > > - for (i = 0; i < SIZE; i++) > + for (i = 0; i < SIZE_DST; i++) > { >r[i] = 0; > - if (i < SIZE_SRC) > + if (i < SIZE) > { > Float16Union usrc = {.f16 = s[i]}; > r[i] = convert_fp16_to_fp8(usrc.f16, 0, hf8_bf8, saturate); > } > } > } > > @@ -41,17 +41,22 @@ TEST (void) >UNION_TYPE (AVX512F_LEN_HALF, i_b) res1, res2, res3; >UNION_TYPE (AVX512F_LEN, h) src; >MASK_TYPE mask = MASK_VALUE; > - unsigned char res_ref[SIZE]; > + unsigned char 
res_ref[SIZE_DST]; > >sign = 1; > - for (i = 0; i < SIZE_SRC; i++) > + for (i = 0; i < SIZE; i++) > { >src.a[i] = (_Float16)(sign * (2.5 * (1 << (i % 3; >sign = -sign; > } > > +#if AVX512F_LEN > 128 > + for (i = 0; i < SIZE_DST; i++) > +res2.a[i] = DEFAULT_VALUE; > +#else >for (i = 0; i < SIZE; i++) > res2.a[i] = DEFAULT_VALUE; > +#endif > >CALC(res_ref, src.a); > > diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c > b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c > index 49e170aa428..c458f1ebb77 100644 > --- a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c > +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c > @@ -11,8 +11,8 @@ > #include "avx10-helper.h" > #include "fp8-helper.h" > > -#define SIZE_SRC (AVX512F_LEN / 16) > -#define SIZE (AVX512F_LEN_HALF / 8) > +#define SIZE (AVX512F_LEN / 16) > +#define SIZE_DST (AVX512F_LEN_HALF / 8) > #include "avx512f-mask-type.h" > > void > @@ -23,14 +23,14 @@ CALC (unsigned char *r, _Float16 *s) >hf8_bf8 = 1; >saturate = 1; > > - for (i = 0; i < SIZE; i++) > + for (i = 0; i < SIZE_DST; i++) > { >r[i] = 0; > - if (i < SIZE_SRC) > + if (i < SIZE) > { > Float16Union usrc = {.f16 = s[i]}; > r[i] = convert_fp16_to_fp8(usrc.f16, 0, hf8_bf8, saturate); > } > } > } > > @@ -41,17 +41,22 @@ TEST (void) >UNION_TYPE (AVX512F_LEN_HALF, i_b) res1, res2, res3; >UNION_TYPE (AVX512F_LEN, h) src; >MASK_TYPE mask = MASK_VALUE; > - unsigned char res_ref[SIZE]; > + unsigned char res_ref[SIZE_DST]; > >sign = 1; > - for (i = 0; i < SIZE_SRC; i++) > + for (i = 0; i < SIZE; i++) > { >src.a[i] = (_Float16)(sign * (2.5 * (1 << (i % 3; >sign = -sign; > } > > +#if AVX512F_LEN > 128 > + for (i = 0; i < SIZE_DST; i++) > +res2.a[i] = DEFAULT_VALUE; > +#else >for (i = 0; i < SIZE; i++) > res2.a[i] = DEFAULT_VALUE; > +#endif > >CALC(res_ref, src.a); > > diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c > b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c > index 
f481b72cc71..cb9cdbb89c1 100644 > --- a/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c > +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c > @@ -11,8 +11,8 @@ > #include "avx10-helper.h" > #include "fp8-helper.h" > > -#define SIZE_SRC (AVX512F_LEN / 16) > -#define SIZE (AVX512F_LEN_HALF / 8) > +#define SIZE (AVX512F_LEN / 16) > +#define SIZE_DST (AVX512F_LEN_HALF / 8) > #include "avx512f-mask-type.h" > > void > @@ -23,14 +23,14 @@ CALC (unsigned char *r, _Float16 *s) >hf8_bf8 = 0; >saturate = 0; > > - for (i = 0; i < SIZE; i++) > + for (i
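The arithmetic behind the SIZE/SIZE_DST fix can be sketched in a few lines. This assumes (as the commit message implies) that in the avx10 test helpers `AVX512F_LEN_HALF` is the half-width vector length clamped to the 128-bit minimum, so at 128 bits the source element count (LEN/16) and destination byte count (LEN_HALF/8) diverge — 8 vs 16 — which is why using the half-length macro for the mask size took the wrong branch:

```c
#include <assert.h>

/* Hypothetical model of the test-helper macros; the clamp of the
   half-length to 128 bits is an assumption, not quoted GCC source.  */
static int src_elems (int len) { return len / 16; }	   /* fp16 inputs  */
static int half_len (int len)  { return len > 128 ? len / 2 : 128; }
static int dst_bytes (int len) { return half_len (len) / 8; } /* fp8 outputs */
```

At 512 and 256 bits the two counts agree (32 and 16), so only the 128-bit case needs the separate `#if AVX512F_LEN > 128` handling added by the patch.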
Re: [RFC/RFA][PATCH v6 03/12] RISC-V: Add CRC expander to generate faster CRC.
On Sun, Nov 24, 2024, 08:59 Jeff Law wrote: > > > On 11/13/24 7:16 AM, Mariam Arutunian wrote: > > > > > > > To address this, I added code in |target-supports.exp| and modified the > > relevant tests. > > I've attached the patch. Could you please check whether it is correct? > Just a few more notes. > > I revamped the changes to emit_crc. I think the right way to fix that > riscv problem you were chasing was actually higher in the call chain. > > If you look at the setup code when you use the backend expanders, it > looks like this: > > > /* Use target specific expansion if it exists. > > Otherwise, generate table-based CRC. */ > > if (direct_internal_fn_supported_p (fn, tree_pair (data_type, > result_type), > > OPTIMIZE_FOR_SPEED)) > > { > > class expand_operand ops[4]; > > create_call_lhs_operand (&ops[0], dest, TYPE_MODE (result_type)); > > create_input_operand (&ops[1], crc, TYPE_MODE (result_type)); > > create_input_operand (&ops[2], data, TYPE_MODE (data_type)); > > create_input_operand (&ops[3], polynomial, TYPE_MODE > (result_type)); > > insn_code icode = convert_optab_handler (optab, TYPE_MODE > (data_type), > >TYPE_MODE (result_type)); > > expand_insn (icode, 4, ops); > > assign_call_lhs (lhs, dest, &ops[0]); > > } > > > We need to do basically the same thing (at least for the return value) > when we expand via a table. If you look in assign_call_lhs is has all > the necessary bits to handle the promoted subreg case correctly. > > I've changed the table based expansion clause to look like this: > > > else > > { > > /* We're bypassing all the operand conversions that are done in the > > case when we get an icode, operands and pass that off to > expand_insn. > > > > That path has special case handling for promoted return values > which > > we must emulate here (is the same kind of special treatment ever > > needed for input arguments here?). 
> >
> >        In particular we do not want to store directly into a promoted
> >        SUBREG destination, instead store into a suitably sized pseudo.  */
> >     rtx orig_dest = dest;
> >     if (SUBREG_P (dest) && SUBREG_PROMOTED_VAR_P (dest))
> >       dest = gen_reg_rtx (GET_MODE (dest));
> >
> >     /* If it's IFN_CRC generate bit-forward CRC.  */
> >     if (fn == IFN_CRC)
> >       expand_crc_table_based (dest, crc, data, polynomial,
> >                               TYPE_MODE (data_type));
> >     else
> >       /* If it's IFN_CRC_REV generate bit-reversed CRC.  */
> >       expand_reversed_crc_table_based (dest, crc, data, polynomial,
> >                                        TYPE_MODE (data_type),
> >                                        generate_reflecting_code_standard);
> >
> >     /* Now get the return value where it needs to be, taking care to
> >        ensure it's promoted appropriately if the ABI demands it.
> >
> >        Re-use assign_call_lhs to handle the details.  */
> >     class expand_operand ops[4];
> >     create_call_lhs_operand (&ops[0], dest, TYPE_MODE (result_type));
> >     ops[0].value = dest;
> >     assign_call_lhs (lhs, orig_dest, &ops[0]);
> >   }
>
> And I'm also starting to fix the word_size assumptions, particularly in
> the table expansion path.  I know we used word_size to simplify stuff in
> the target bits and that's probably still the right thing to do in those
> target specific paths.  But for the table lookup path we shouldn't
> really need to do that.  By removing the word_size assumptions we also
> can avoid needing to make the CRC builtins conditional on target
> specific properties -- I'd really like to avoid having the
> availability of the builtins be dependent on the target word size.
>
> In the testsuite, tests which use integers that don't fit in 16 bits
> need a magic marker so that they're not used on 16-bit integer targets.
>
> /* { dg-require-effective-target int32plus } */
>
> There were two tests using assert which we generally try to avoid.
> Instead we should just do a simple equality test and abort if the test
> is false.  Also execution tests need to exit with zero status when they
> pass.
I've fixed those problems as well. > > I've pushed the current state to users/jlaw/mariam-crc-branch. I > haven't yet incorporated your risc-v testsuite fix yet though. I have > done some rebasing/squashing of patches where it made sense. I'm hoping > to get the various final nits cleaned up this week. > Thank you very much! I'll have a look. Please let me know if there's anything specific you’d like me to address. Thanks, Mariam Jeff > > >
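The table-based expansion the thread keeps referring to follows the classic scheme: precompute 256 entries for the polynomial, then consume one input byte per lookup. A minimal sketch of a bit-forward (non-reflected) table CRC — the polynomial 0x07 (CRC-8/SMBUS) is just an example here, while the real expander builds the table for whatever polynomial the IFN_CRC call carries:

```c
#include <assert.h>
#include <stdint.h>

static uint8_t crc8_table[256];

/* Build the lookup table: each entry is the CRC of one byte value,
   computed bit by bit against POLY.  */
static void
crc8_init (uint8_t poly)
{
  for (int i = 0; i < 256; i++)
    {
      uint8_t crc = (uint8_t) i;
      for (int b = 0; b < 8; b++)
	crc = (uint8_t) ((crc & 0x80) ? (uint8_t) (crc << 1) ^ poly
				      : (uint8_t) (crc << 1));
      crc8_table[i] = crc;
    }
}

/* One byte of input becomes one table lookup instead of eight
   shift/xor steps.  */
static uint8_t
crc8_update (uint8_t crc, uint8_t data)
{
  return crc8_table[crc ^ data];
}
```

With init 0, no reflection and no final xor, the standard check input "123456789" yields 0xF4 for this polynomial.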
RE: [PATCH v1] I386: Add more testcases for unsigned SAT_ADD vector pattern
> -----Original Message-----
> From: Li, Pan2
> Sent: Monday, November 25, 2024 10:01 AM
> To: gcc-patches@gcc.gnu.org
> Cc: ubiz...@gmail.com; Liu, Hongtao ; Li, Pan2
> Subject: [PATCH v1] I386: Add more testcases for unsigned SAT_ADD vector
> pattern
>
> From: Pan Li
>
> There are some forms like below that failed to recog the SAT_ADD pattern for
> target i386.  It is related to some match pattern extraction but got fixed
> after the refactor of the SAT_ADD pattern.  Thus, add testcases to make sure
> we do not hit similar issues in the future.
>
> #define DEF_SAT_ADD(T)     \
> T sat_add_##T (T x, T y)   \
> {                          \
>   T res;                   \
>   res = x + y;             \
>   res |= -(T)(res < x);    \
>   return res;              \
> }
>
> #define VEC_DEF_SAT_ADD(T)                        \
> void vec_sat_add(T * restrict a, T * restrict b)  \
> {                                                 \
>   for (int i = 0; i < 8; i++)                     \
>     b[i] = sat_add_##T (a[i], b[i]);              \
> }
>
> DEF_SAT_ADD (uint32_t)
> VEC_DEF_SAT_ADD (uint32_t)
>
> The below test suites are passed for this patch.
> * The x86 fully regression test.
>
>	PR target/112600
>
> gcc/testsuite/ChangeLog:
>
>	* gcc.target/i386/pr112600-5a-u16.c: New test.
>	* gcc.target/i386/pr112600-5a-u32.c: New test.
>	* gcc.target/i386/pr112600-5a-u64.c: New test.
>	* gcc.target/i386/pr112600-5a-u8.c: New test.
>	* gcc.target/i386/pr112600-5a.h: New test.
> > Signed-off-by: Pan Li > --- > .../gcc.target/i386/pr112600-5a-u16.c | 10 + > .../gcc.target/i386/pr112600-5a-u32.c | 10 + > .../gcc.target/i386/pr112600-5a-u64.c | 10 + > .../gcc.target/i386/pr112600-5a-u8.c | 10 + > gcc/testsuite/gcc.target/i386/pr112600-5a.h | 22 +++ > 5 files changed, 62 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a.h > > diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c > b/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c > new file mode 100644 > index 000..a278703fbdc > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c > @@ -0,0 +1,10 @@ > +/* PR target/112600 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-rtl-expand-details" } */ > + > +#include "pr112600-5a.h" > + > +DEF_SAT_ADD (uint16_t) > +VEC_DEF_SAT_ADD (uint16_t) > + > +/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */ You're scanning ".SAT_ADD ", so maybe better with pass "optimized" instead of "expand"? Others LGTM. 
> diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c > b/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c > new file mode 100644 > index 000..52e31b7e1c0 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c > @@ -0,0 +1,10 @@ > +/* PR target/112600 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-rtl-expand-details" } */ > + > +#include "pr112600-5a.h" > + > +DEF_SAT_ADD (uint32_t) > +VEC_DEF_SAT_ADD (uint32_t) > + > +/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */ > diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c > b/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c > new file mode 100644 > index 000..4ee717471b5 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c > @@ -0,0 +1,10 @@ > +/* PR target/112600 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-rtl-expand-details" } */ > + > +#include "pr112600-5a.h" > + > +DEF_SAT_ADD (uint64_t) > +VEC_DEF_SAT_ADD (uint64_t) > + > +/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */ > diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c > b/gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c > new file mode 100644 > index 000..9f488ebf658 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c > @@ -0,0 +1,10 @@ > +/* PR target/112600 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-rtl-expand-details" } */ > + > +#include "pr112600-5a.h" > + > +DEF_SAT_ADD (uint8_t) > +VEC_DEF_SAT_ADD (uint8_t) > + > +/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */ > diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a.h > b/gcc/testsuite/gcc.target/i386/pr112600-5a.h > new file mode 100644 > index 000..1e753695e81 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a.h > @@ -0,0 +1,22 @@ > +#ifndef HAVE_DEFINED_PR112600_5A_H > +#define HAVE_DEFINED_PR112600_5A_H > + > +#include > + > +#define DEF_SAT_ADD(T) \ > +T sat_add_##T (T x, T y) \ > +{\ > + T res; \ >
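The branchless saturating-add form from the testcases works because when `x + y` wraps, `res < x` becomes 1 and `-(T)1` is an all-ones mask, which ORs the result up to the type's maximum. A standalone copy of the scalar form, instantiated for `uint32_t`:

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned saturating add: on wraparound the mask -(uint32_t)(res < x)
   is all-ones, forcing the result to UINT32_MAX.  */
static uint32_t
sat_add_u32 (uint32_t x, uint32_t y)
{
  uint32_t res = x + y;
  res |= -(uint32_t) (res < x);
  return res;
}
```

This is the exact idiom the middle end recognizes as `.SAT_ADD`, which the new dump scans count.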
RE: [PATCH v1] I386: Add more testcases for unsigned SAT_ADD vector pattern
> You're scanning ".SAT_ADD ", so maybe better with pass "optimized" instead of > "expand"? Sure, let me update in v2. Pan -Original Message- From: Liu, Hongtao Sent: Monday, November 25, 2024 10:09 AM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: ubiz...@gmail.com Subject: RE: [PATCH v1] I386: Add more testcases for unsigned SAT_ADD vector pattern > -Original Message- > From: Li, Pan2 > Sent: Monday, November 25, 2024 10:01 AM > To: gcc-patches@gcc.gnu.org > Cc: ubiz...@gmail.com; Liu, Hongtao ; Li, Pan2 > > Subject: [PATCH v1] I386: Add more testcases for unsigned SAT_ADD vector > pattern > > From: Pan Li > > There are some forms like below failed to recog the SAT_ADD pattern for target > i386. It is related to some match pattern extraction but get fixed after the > refactor of the SAT_ADD pattern. Thus, add testcases to ensure we may have > similar issue in futrue. > > #define DEF_SAT_ADD(T) \ > T sat_add_##T (T x, T y) \ > {\ > T res; \ > res = x + y; \ > res |= -(T)(res < x); \ > return res;\ > } > > #define VEC_DEF_SAT_ADD(T) \ > void vec_sat_add(T * restrict a, T * restrict b) \ > {\ > for (int i = 0; i < 8; i++)\ > b[i] = sat_add_##T (a[i], b[i]); \ > } > > DEF_SAT_ADD (uint32_t) > VEC_DEF_SAT_ADD (uint32_t) > > The below test suites are passed for this patch. > * The x86 fully regression test. > > PR target/112600 > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr112600-5a-u16.c: New test. > * gcc.target/i386/pr112600-5a-u32.c: New test. > * gcc.target/i386/pr112600-5a-u64.c: New test. > * gcc.target/i386/pr112600-5a-u8.c: New test. > * gcc.target/i386/pr112600-5a.h: New test. 
> > Signed-off-by: Pan Li > --- > .../gcc.target/i386/pr112600-5a-u16.c | 10 + > .../gcc.target/i386/pr112600-5a-u32.c | 10 + > .../gcc.target/i386/pr112600-5a-u64.c | 10 + > .../gcc.target/i386/pr112600-5a-u8.c | 10 + > gcc/testsuite/gcc.target/i386/pr112600-5a.h | 22 +++ > 5 files changed, 62 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a.h > > diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c > b/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c > new file mode 100644 > index 000..a278703fbdc > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c > @@ -0,0 +1,10 @@ > +/* PR target/112600 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-rtl-expand-details" } */ > + > +#include "pr112600-5a.h" > + > +DEF_SAT_ADD (uint16_t) > +VEC_DEF_SAT_ADD (uint16_t) > + > +/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */ You're scanning ".SAT_ADD ", so maybe better with pass "optimized" instead of "expand"? Others LGTM. 
> diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c > b/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c > new file mode 100644 > index 000..52e31b7e1c0 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c > @@ -0,0 +1,10 @@ > +/* PR target/112600 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-rtl-expand-details" } */ > + > +#include "pr112600-5a.h" > + > +DEF_SAT_ADD (uint32_t) > +VEC_DEF_SAT_ADD (uint32_t) > + > +/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */ > diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c > b/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c > new file mode 100644 > index 000..4ee717471b5 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c > @@ -0,0 +1,10 @@ > +/* PR target/112600 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-rtl-expand-details" } */ > + > +#include "pr112600-5a.h" > + > +DEF_SAT_ADD (uint64_t) > +VEC_DEF_SAT_ADD (uint64_t) > + > +/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */ > diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c > b/gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c > new file mode 100644 > index 000..9f488ebf658 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c > @@ -0,0 +1,10 @@ > +/* PR target/112600 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-rtl-expand-details" } */ > + > +#include "pr112600-5a.h" > + > +DEF_SAT_ADD (uint8_t) > +VEC_DEF_SAT_ADD (uint8_t) > + > +/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */ > diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a.h > b/gcc/testsuite/gcc.target/i386/pr112600-5a.
Re: [PATCH] [x86] Fix uninitialized operands[2] in vec_unpacks_hi_v4sf.
On Fri, Nov 22, 2024 at 9:16 PM Richard Biener wrote: > > On Fri, 22 Nov 2024, liuhongt wrote: > > > It could cause weird spill in RA when register pressure is high. > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > Ok for trunk? > > > > BTW, It's difficult to get a decent testcase for the issue since the spill > > is not exposed in a simple testcase. > > I think it's a good patch independent of the spill issue given it > avoids false dependences on the scratch reg contents. Yes, committed, and will backport to release branches. > Richard. > > > gcc/ChangeLog: > > > > PR target/117562 > > * config/i386/sse.md (vec_unpacks_hi_v4sf): Initialize > > operands[2] with CONST0_RTX. > > --- > > gcc/config/i386/sse.md | 5 ++++- > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > > index 72acd5bde5e..498a42d6e1e 100644 > > --- a/gcc/config/i386/sse.md > > +++ b/gcc/config/i386/sse.md > > @@ -10424,7 +10424,10 @@ (define_expand "vec_unpacks_hi_v4sf" > > (match_dup 2) > > (parallel [(const_int 0) (const_int 1)]] > >"TARGET_SSE2" > > - "operands[2] = gen_reg_rtx (V4SFmode);") > > +{ > > + operands[2] = gen_reg_rtx (V4SFmode); > > + emit_move_insn (operands[2], CONST0_RTX (V4SFmode)); > > +}) > > > > (define_expand "vec_unpacks_hi_v8sf" > >[(set (match_dup 2) > > > > -- > Richard Biener > SUSE Software Solutions Germany GmbH, > Frankenstrasse 146, 90461 Nuernberg, Germany; > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg) -- BR, Hongtao
Re: [PATCH v8] Target-independent store forwarding avoidance.
Pushed to master with the following fixups: - new timevar added - nits addressed - whitespace fixes Philipp. On Mon, 25 Nov 2024 at 03:30, Jeff Law wrote: > > > > On 11/9/24 2:48 AM, Konstantinos Eleftheriou wrote: > > From: kelefth > > > > This pass detects cases of expensive store forwarding and tries to avoid > > them > > by reordering the stores and using suitable bit insertion sequences. > > For example it can transform this: > > > > strb w2, [x1, 1] > > ldr x0, [x1] # Expensive store forwarding to larger load. > > > > To: > > > > ldr x0, [x1] > > strb w2, [x1] > > bfi x0, x2, 0, 8 > > > > Assembly like this can appear with bitfields or type punning / unions. > > On stress-ng when running the cpu-union microbenchmark the following > > speedups > > have been observed. > > > >Neoverse-N1: +29.4% > >Intel Coffeelake: +13.1% > >AMD 5950X: +17.5% > > > > The transformation is rejected on cases that would cause store_bit_field > > to generate subreg expressions on different register classes. > > Files avoid-store-forwarding-4.c and avoid-store-forwarding-5.c contain > > such cases and have been marked as XFAIL. > > > > There is special handling for machines with BITS_BIG_ENDIAN != > > BYTES_BIG_ENDIAN. The need for this came up from an issue in the H8 > > architecture, which uses big-endian ordering, but BITS_BIG_ENDIAN > > is false. In that case, the START parameter of store_bit_field > > needs to be calculated from the end of the destination register. > > > > gcc/ChangeLog: > > > > * Makefile.in (OBJS): Add avoid-store-forwarding.o. > > * common.opt (favoid-store-forwarding): New option. > > * common.opt.urls: Regenerate. > > * doc/invoke.texi: New param store-forwarding-max-distance. > > * doc/passes.texi: Document new pass. > > * doc/tm.texi: Regenerate. > > * doc/tm.texi.in: Document new pass. > > * params.opt (store-forwarding-max-distance): New param. > > * passes.def: Add pass_rtl_avoid_store_forwarding before > > pass_early_remat. 
> > * target.def (avoid_store_forwarding_p): New DEFHOOK. > > * target.h (struct store_fwd_info): Declare. > > * targhooks.cc (default_avoid_store_forwarding_p): New function. > > * targhooks.h (default_avoid_store_forwarding_p): Declare. > > * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare. > > * avoid-store-forwarding.cc: New file. > > * avoid-store-forwarding.h: New file. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/aarch64/avoid-store-forwarding-1.c: New test. > > * gcc.target/aarch64/avoid-store-forwarding-2.c: New test. > > * gcc.target/aarch64/avoid-store-forwarding-3.c: New test. > > * gcc.target/aarch64/avoid-store-forwarding-4.c: New test. > > * gcc.target/aarch64/avoid-store-forwarding-5.c: New test. > > * gcc.target/x86_64/abi/callabi/avoid-store-forwarding-1.c: New test. > > * gcc.target/x86_64/abi/callabi/avoid-store-forwarding-2.c: New > > test. > > > > Signed-off-by: Philipp Tomsich > > Signed-off-by: Konstantinos Eleftheriou > > > > Series-version: 8 > > > > Series-changes: 8 > > - Fix store_bit_field call for big-endian targets, where > > BITS_BIG_ENDIAN is false. > > - Handle store_forwarding_max_distance = 0 as a special case that > > disables cost checks for avoid-store-forwarding. > > - Update testcases for AArch64 and add testcases for x86-64. > > > > Series-changes: 7 > > - Fix bug when copying back the load register, in the case that the > > load is eliminated. > > > > Series-changes: 6 > > - Reject the transformation on cases that would cause store_bit_field > > to generate subreg expressions on different register classes. > > Files avoid-store-forwarding-4.c and avoid-store-forwarding-5.c > >contain such cases and have been marked as XFAIL. > > - Use optimize_bb_for_speed_p instead of optimize_insn_for_speed_p. > > - Inline and remove get_load_mem. > > - New implementation for is_store_forwarding. > > - Refactor the main loop in avoid_store_forwarding. > > - Avoid using the word 'forwardings'. 
> > - Use lowpart_subreg instead of validate_subreg + gen_rtx_subreg. > > - Don't use df_insn_rescan where not needed. > > - Change order of emitting stores and bit insert instructions. > > - Check and reject loads for which the dest register overlaps with > > src. > > - Remove unused variable. > > - Change some gen_mov_insn function calls to gen_rtx_SET. > > - Subtract the cost of eliminated load, instead of 1, for the total > > cost. > > - Use delete_insn instead of set_insn_deleted. > > - Regenerate common.opt.urls. > > - Add some more comments. > > > > Series-changes: 5 > > - Fix bug with BIG_ENDIAN targets. > > -
[PATCH v1] I386: Add more testcases for unsigned SAT_ADD vector pattern
From: Pan Li There are some forms like the below that failed to recog the SAT_ADD pattern for target i386. It is related to some match pattern extraction but got fixed after the refactor of the SAT_ADD pattern. Thus, add testcases to help catch similar issues in the future. #define DEF_SAT_ADD(T) \ T sat_add_##T (T x, T y) \ {\ T res; \ res = x + y; \ res |= -(T)(res < x); \ return res;\ } #define VEC_DEF_SAT_ADD(T) \ void vec_sat_add(T * restrict a, T * restrict b) \ {\ for (int i = 0; i < 8; i++)\ b[i] = sat_add_##T (a[i], b[i]); \ } DEF_SAT_ADD (uint32_t) VEC_DEF_SAT_ADD (uint32_t) The below test suites passed for this patch. * The x86 full regression test. PR target/112600 gcc/testsuite/ChangeLog: * gcc.target/i386/pr112600-5a-u16.c: New test. * gcc.target/i386/pr112600-5a-u32.c: New test. * gcc.target/i386/pr112600-5a-u64.c: New test. * gcc.target/i386/pr112600-5a-u8.c: New test. * gcc.target/i386/pr112600-5a.h: New test. Signed-off-by: Pan Li --- .../gcc.target/i386/pr112600-5a-u16.c | 10 + .../gcc.target/i386/pr112600-5a-u32.c | 10 + .../gcc.target/i386/pr112600-5a-u64.c | 10 + .../gcc.target/i386/pr112600-5a-u8.c | 10 + gcc/testsuite/gcc.target/i386/pr112600-5a.h | 22 +++ 5 files changed, 62 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a.h diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c b/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c new file mode 100644 index 000..a278703fbdc --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c @@ -0,0 +1,10 @@ +/* PR target/112600 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-rtl-expand-details" } */ + +#include "pr112600-5a.h" + +DEF_SAT_ADD (uint16_t) +VEC_DEF_SAT_ADD (uint16_t) + +/* { 
dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c b/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c new file mode 100644 index 000..52e31b7e1c0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c @@ -0,0 +1,10 @@ +/* PR target/112600 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-rtl-expand-details" } */ + +#include "pr112600-5a.h" + +DEF_SAT_ADD (uint32_t) +VEC_DEF_SAT_ADD (uint32_t) + +/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c b/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c new file mode 100644 index 000..4ee717471b5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c @@ -0,0 +1,10 @@ +/* PR target/112600 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-rtl-expand-details" } */ + +#include "pr112600-5a.h" + +DEF_SAT_ADD (uint64_t) +VEC_DEF_SAT_ADD (uint64_t) + +/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c b/gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c new file mode 100644 index 000..9f488ebf658 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c @@ -0,0 +1,10 @@ +/* PR target/112600 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-rtl-expand-details" } */ + +#include "pr112600-5a.h" + +DEF_SAT_ADD (uint8_t) +VEC_DEF_SAT_ADD (uint8_t) + +/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a.h b/gcc/testsuite/gcc.target/i386/pr112600-5a.h new file mode 100644 index 000..1e753695e81 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a.h @@ -0,0 +1,22 @@ +#ifndef HAVE_DEFINED_PR112600_5A_H +#define HAVE_DEFINED_PR112600_5A_H + +#include <stdint.h> + +#define DEF_SAT_ADD(T) \ +T sat_add_##T (T x, T y) \ +{\ + T res; \ + res = x + y; \ + res |= -(T)(res < x); \ + return res;\ +} + +#define 
VEC_DEF_SAT_ADD(T) \ +void vec_sat_add(T * restrict a, T * restrict b) \ +{\ + for (int i = 0; i < 8; i++)\ +b[i] = sat_add_##T (a[i], b[i]); \ +} + +#endif -- 2.43.0
[PATCH v2] I386: Add more testcases for unsigned SAT_ADD vector pattern
From: Pan Li Update in v2: * Skip lto build as there are no such dump files. * Scan dump check against the optimized pass. Original log: There are some forms like the below that failed to recog the SAT_ADD pattern for target i386. It is related to some match pattern extraction but got fixed after the refactor of the SAT_ADD pattern. Thus, add testcases to help catch similar issues in the future. #define DEF_SAT_ADD(T) \ T sat_add_##T (T x, T y) \ {\ T res; \ res = x + y; \ res |= -(T)(res < x); \ return res;\ } #define VEC_DEF_SAT_ADD(T) \ void vec_sat_add(T * restrict a, T * restrict b) \ {\ for (int i = 0; i < 8; i++)\ b[i] = sat_add_##T (a[i], b[i]); \ } DEF_SAT_ADD (uint32_t) VEC_DEF_SAT_ADD (uint32_t) The below test suites passed for this patch. * The x86 full regression test. PR target/112600 gcc/testsuite/ChangeLog: * gcc.target/i386/pr112600-5a-u16.c: New test. * gcc.target/i386/pr112600-5a-u32.c: New test. * gcc.target/i386/pr112600-5a-u64.c: New test. * gcc.target/i386/pr112600-5a-u8.c: New test. * gcc.target/i386/pr112600-5a.h: New test. 
Signed-off-by: Pan Li --- .../gcc.target/i386/pr112600-5a-u16.c | 10 + .../gcc.target/i386/pr112600-5a-u32.c | 10 + .../gcc.target/i386/pr112600-5a-u64.c | 10 + .../gcc.target/i386/pr112600-5a-u8.c | 10 + gcc/testsuite/gcc.target/i386/pr112600-5a.h | 22 +++ 5 files changed, 62 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a.h diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c b/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c new file mode 100644 index 000..5f314d6b46a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c @@ -0,0 +1,10 @@ +/* PR target/112600 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#include "pr112600-5a.h" + +DEF_SAT_ADD (uint16_t) +VEC_DEF_SAT_ADD (uint16_t) + +/* { dg-final { scan-tree-dump-times ".SAT_ADD " 4 "optimized" { target { no-opts { "-ffat-lto-objects" } } } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c b/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c new file mode 100644 index 000..229a27c4c20 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c @@ -0,0 +1,10 @@ +/* PR target/112600 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#include "pr112600-5a.h" + +DEF_SAT_ADD (uint32_t) +VEC_DEF_SAT_ADD (uint32_t) + +/* { dg-final { scan-tree-dump-times ".SAT_ADD " 4 "optimized" { target { no-opts { "-ffat-lto-objects" } } } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c b/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c new file mode 100644 index 000..2c9e4d09fe0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c @@ -0,0 +1,10 @@ +/* PR target/112600 */ +/* { dg-do compile } */ +/* { 
dg-options "-O2 -fdump-tree-optimized" } */ + +#include "pr112600-5a.h" + +DEF_SAT_ADD (uint64_t) +VEC_DEF_SAT_ADD (uint64_t) + +/* { dg-final { scan-tree-dump-times ".SAT_ADD " 4 "optimized" { target { no-opts { "-ffat-lto-objects" } } } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c b/gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c new file mode 100644 index 000..a3c593a0551 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c @@ -0,0 +1,10 @@ +/* PR target/112600 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#include "pr112600-5a.h" + +DEF_SAT_ADD (uint8_t) +VEC_DEF_SAT_ADD (uint8_t) + +/* { dg-final { scan-tree-dump-times ".SAT_ADD " 4 "optimized" { target { no-opts { "-ffat-lto-objects" } } } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a.h b/gcc/testsuite/gcc.target/i386/pr112600-5a.h new file mode 100644 index 000..1e753695e81 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a.h @@ -0,0 +1,22 @@ +#ifndef HAVE_DEFINED_PR112600_5A_H +#define HAVE_DEFINED_PR112600_5A_H + +#include <stdint.h> + +#define DEF_SAT_ADD(T) \ +T sat_add_##T (T x, T y) \ +{\ + T res; \ + res = x + y; \ + res |= -(T)(res < x); \ + return res;\ +} + +#define VEC_DEF_SAT_ADD(T) \ +void vec_sat_add(T * restrict a, T * restrict b) \ +{\ + for (int i = 0; i < 8; i++)\ +b[i] = sat_add_##T (a[i], b[i])
RE: [gcc-wwwdocs PATCH] gcc-15: Mention new ISA and Diamond Rapids support for x86_64 backend
> From: Gerald Pfeifer > Sent: Sunday, November 24, 2024 7:17 AM > > On Mon, 11 Nov 2024, Haochen Jiang wrote: > > This patch will add recent new ISA and arch support for x86_64 backend > > into gcc-wwwdocs. > > > + New ISA extension support for Intel AMX-AVX512 was added. > > In all these cases, can we just say "ISA extension support ... was added" and > drop the "New"? > > > + compiler switch. 128 and 256 bit MOVRS intrinsics are available > > + via > > "128- and 256-bit..." > > > + The EVEX version support for Intel SM4 was added. > > + New 512-bit SM4 intrinsics are available via the > > + -msm4 -mavx10.2-512 compiler switch. > > Just "EVEX version support..." > > > +AMX-FP8, AMX-MOVRS, AMX-TF32, AMX-TRANSPOSE, APX_F, AVX10.2 > with > > + 512 bit > > "512-bit" > > >Support for Xeon Phi CPUs (a.k.a. Knight Landing and Knight Mill) > > were > >removed in GCC 15. GCC will no longer accept -march=knl, > >-march=knm,-mavx5124fmaps, > > "...no longer accepts..." > > And make the last ", and -mavx5124fmaps" (adding a blank > and the word "and"). > I will make all of the changes with a little tweak here. The "and" should be added (actually changing the previous "or" to "and") between -mtune=knl and -mtune=knm. Thx, Haochen
[patch, fortran] PR117765 Impure function within a BLOCK construct within a DO CONCURRENT
I would like to commit the attached patch for Steve. Regression tested on x86-64-linux-gnu. OK for trunk? Author: Steve Kargl Date: Sun Nov 24 18:26:03 2024 -0800 Fortran: Check IMPURE in BLOCK inside DO CONCURRENT. PR fortran/117765 gcc/fortran/ChangeLog: * resolve.cc (check_pure_function): Check the stack to see if there is a nested BLOCK and, if that block is in a DO_CONCURRENT, issue an error. gcc/testsuite/ChangeLog: * gfortran.dg/impure_fcn_do_concurrent.f90: New test. diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc index b8c908b51e9..1b98be205b4 100644 --- a/gcc/fortran/resolve.cc +++ b/gcc/fortran/resolve.cc @@ -3228,6 +3228,24 @@ pure_stmt_function (gfc_expr *e, gfc_symbol *sym) static bool check_pure_function (gfc_expr *e) { const char *name = NULL; + code_stack *stack; + bool saw_block = false; + + /* A BLOCK construct within a DO CONCURRENT construct leads to + gfc_do_concurrent_flag = 0 when the check for an impure function + occurs. Check the stack to see if the source code has a nested + BLOCK construct. */ + for (stack = cs_base; stack; stack = stack->prev) +{ + if (stack->current->op == EXEC_BLOCK) saw_block = true; + if (saw_block && stack->current->op == EXEC_DO_CONCURRENT) + { + gfc_error ("Reference to impure function at %L inside a " + "DO CONCURRENT", &e->where); + return false; + } +} + if (!gfc_pure_function (e, &name) && name) { if (forall_flag) diff --git a/gcc/testsuite/gfortran.dg/impure_fcn_do_concurrent.f90 b/gcc/testsuite/gfortran.dg/impure_fcn_do_concurrent.f90 new file mode 100644 index 000..07b5a37f978 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/impure_fcn_do_concurrent.f90 @@ -0,0 +1,31 @@ +! +! { dg-do compile } +! +program foo + + implicit none + + integer i + integer :: j = 0 + real y(4) + + do concurrent(i=1:4) + y(i) = bar(i) ! { dg-error "Reference to impure function" } + end do + + do concurrent(i=1:4) + block + y(i) = bar(i) ! 
{ dg-error "Reference to impure function" } + end block + end do + + contains + + impure function bar(i) + real bar + integer, intent(in) :: i + j = j + i + bar = j + end function bar + +end program foo
RE: [committed] c: Default to -std=gnu23
> From: Joseph Myers > Sent: Saturday, November 16, 2024 7:47 AM > > Change the default language version for C compilation from -std=gnu17 > to -std=gnu23. A few tests are updated to remove local definitions of > bool, true and false (where making such an unconditional test change > seemed to make more sense than changing the test conditionally earlier > or building it with -std=gnu17); most test issues were already > addressed in previous patches. In the case of > ctf-function-pointers-2.c, it was agreed in bug 117289 that it would > be OK to put -std=gnu17 in the test and leave more optimal BTF / CTF > output for this test as a potential future improvement. > > Since the original test fixes, more such fixes have become necessary > and so are included in this patch. More noinline attributes are added > to simulate-thread tests where () meaning a prototype affected test > results, while gcc.dg/torture/pr117496-1.c (a test declaring a > function with () then calling it with arguments) gets -std=gnu17 > added. > > Bootstrapped with no regressions for x86_64-pc-linux-gnu. > > NOTE: it's likely there are target-specific tests for non-x86 targets > that need updating as a result of this change. See commit > 9fb5348e3021021e82d75e4ca4e6f8d51a34c24f ("testsuite: Prepare for > -std=gnu23 default") for examples of changes to prepare the testsuite > to work with a -std=gnu23 default. In most cases, adding > -Wno-old-style-definition (for warnings for old-style function > definitions) or -std=gnu17 (for other issues such as unprototyped > function declarations with ()) is appropriate, but watch out for cases > that indicate bugs with -std=gnu23 (in particular, any ICEs - there > was only the one nested function test where I had to fix an ICE on > x86_64). > A quick question: Should we add this to the gcc-wwwdocs porting doc or somewhere else? The upgrade does cause some old code to fail to compile, although arguably that code should fail. Thx, Haochen
Pushed: [PATCH] pa: Remove pa_section_type_flags
On Sun, 2024-11-24 at 14:12 -0500, John David Anglin wrote: > I don't see any regressions with this change. Patch is okay > if you remove declaration of pa_section_type_flags in pa.cc. Pushed https://gcc.gnu.org/r15-5641 with the declaration of pa_section_type_flags removed. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
[PUSHED] opt.url: Regenerate the .opt.urls files
Just regenerated them after the addition of msplit-bit-shift avr option. Pushed as obvious. gcc/ChangeLog: * config/avr/avr.opt.urls: Regenerate. * config/g.opt.urls: Regenerate. * config/i386/nto.opt.urls: Regenerate. * config/riscv/riscv.opt.urls: Regenerate. * config/rx/rx.opt.urls: Regenerate. * config/sol2.opt.urls: Regenerate. Signed-off-by: Andrew Pinski --- gcc/config/avr/avr.opt.urls | 3 +++ gcc/config/g.opt.urls | 2 +- gcc/config/i386/nto.opt.urls| 2 +- gcc/config/riscv/riscv.opt.urls | 2 +- gcc/config/rx/rx.opt.urls | 2 +- gcc/config/sol2.opt.urls| 2 +- 6 files changed, 8 insertions(+), 5 deletions(-) diff --git a/gcc/config/avr/avr.opt.urls b/gcc/config/avr/avr.opt.urls index 672821db79d..63d5a694937 100644 --- a/gcc/config/avr/avr.opt.urls +++ b/gcc/config/avr/avr.opt.urls @@ -44,6 +44,9 @@ UrlSuffix(gcc/AVR-Options.html#index-mrelax) maccumulate-args UrlSuffix(gcc/AVR-Options.html#index-maccumulate-args) +msplit-bit-shift +UrlSuffix(gcc/AVR-Options.html#index-msplit-bit-shift) + mstrict-X UrlSuffix(gcc/AVR-Options.html#index-mstrict-X) diff --git a/gcc/config/g.opt.urls b/gcc/config/g.opt.urls index 4ffd5cbd2cf..10ab02a6d63 100644 --- a/gcc/config/g.opt.urls +++ b/gcc/config/g.opt.urls @@ -1,5 +1,5 @@ ; Autogenerated by regenerate-opt-urls.py from gcc/config/g.opt and generated HTML G -UrlSuffix(gcc/System-V-Options.html#index-G-5) +UrlSuffix(gcc/System-V-Options.html#index-G-6) diff --git a/gcc/config/i386/nto.opt.urls b/gcc/config/i386/nto.opt.urls index 37c07a5b88b..055e669d54b 100644 --- a/gcc/config/i386/nto.opt.urls +++ b/gcc/config/i386/nto.opt.urls @@ -1,5 +1,5 @@ ; Autogenerated by regenerate-opt-urls.py from gcc/config/i386/nto.opt and generated HTML G -UrlSuffix(gcc/System-V-Options.html#index-G-5) +UrlSuffix(gcc/System-V-Options.html#index-G-6) diff --git a/gcc/config/riscv/riscv.opt.urls b/gcc/config/riscv/riscv.opt.urls index 622cb6e7b44..294d6628e86 100644 --- a/gcc/config/riscv/riscv.opt.urls +++ b/gcc/config/riscv/riscv.opt.urls @@ 
-33,7 +33,7 @@ mcpu= UrlSuffix(gcc/RISC-V-Options.html#index-mcpu-8) msmall-data-limit= -UrlSuffix(gcc/RISC-V-Options.html#index-msmall-data-limit-1) +UrlSuffix(gcc/RISC-V-Options.html#index-msmall-data-limit) msave-restore UrlSuffix(gcc/RISC-V-Options.html#index-msave-restore) diff --git a/gcc/config/rx/rx.opt.urls b/gcc/config/rx/rx.opt.urls index 7af4bd249d8..62e2a23cba6 100644 --- a/gcc/config/rx/rx.opt.urls +++ b/gcc/config/rx/rx.opt.urls @@ -22,7 +22,7 @@ mlittle-endian-data UrlSuffix(gcc/RX-Options.html#index-mlittle-endian-data) msmall-data-limit= -UrlSuffix(gcc/RX-Options.html#index-msmall-data-limit-2) +UrlSuffix(gcc/RX-Options.html#index-msmall-data-limit-1) mrelax UrlSuffix(gcc/RX-Options.html#index-mrelax-7) diff --git a/gcc/config/sol2.opt.urls b/gcc/config/sol2.opt.urls index ef64d47d65e..950bb860719 100644 --- a/gcc/config/sol2.opt.urls +++ b/gcc/config/sol2.opt.urls @@ -1,7 +1,7 @@ ; Autogenerated by regenerate-opt-urls.py from gcc/config/sol2.opt and generated HTML G -UrlSuffix(gcc/System-V-Options.html#index-G-5) +UrlSuffix(gcc/System-V-Options.html#index-G-6) mclear-hwcap UrlSuffix(gcc/Solaris-2-Options.html#index-mclear-hwcap) -- 2.43.0
[committed] i386: x86 can use x >> -y for x >> 32-y [PR36503]
x86 targets mask 32-bit shifts with a 5-bit mask (and 64-bit with 6-bit mask), so they can use x >> -y instead of x >> 32-y. This form is very common in bitstream readers, where it's used to read the top N bits from a word. The optimization converts: movl $32, %ecx subl %esi, %ecx sall %cl, %eax to: negl %ecx sall %cl, %eax PR target/36503 gcc/ChangeLog: * config/i386/i386.md (*ashl<mode>3_negcnt): New define_insn_and_split pattern. (*ashl<mode>3_negcnt_1): Ditto. (*<insn><mode>3_negcnt): Ditto. (*<insn><mode>3_negcnt_1): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/pr36503-1.c: New test. * gcc.target/i386/pr36503-2.c: New test. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Uros. diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 1c1bf659fc2..399a6a81f9c 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -15896,6 +15896,62 @@ (define_insn_and_split "*ashl<mode>3_mask_1" "" [(set_attr "isa" "*,bmi2")]) +(define_insn_and_split "*ashl<mode>3_negcnt" + [(set (match_operand:SWI48 0 "nonimmediate_operand") + (ashift:SWI48 + (match_operand:SWI48 1 "nonimmediate_operand") + (subreg:QI + (minus + (match_operand 3 "const_int_operand") + (match_operand 2 "int248_register_operand" "c,r")) 0))) + (clobber (reg:CC FLAGS_REG))] + "ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands) + && INTVAL (operands[3]) == <MODE_SIZE> * BITS_PER_UNIT + && ix86_pre_reload_split ()" + "#" + "&& 1" + [(parallel + [(set (match_dup 4) + (neg:QI (match_dup 2))) + (clobber (reg:CC FLAGS_REG))]) + (parallel + [(set (match_dup 0) + (ashift:SWI48 (match_dup 1) +(match_dup 4))) + (clobber (reg:CC FLAGS_REG))])] +{ + operands[2] = force_reg (GET_MODE (operands[2]), operands[2]); + operands[2] = gen_lowpart (QImode, operands[2]); + + operands[4] = gen_reg_rtx (QImode); +} + [(set_attr "isa" "*,bmi2")]) + +(define_insn_and_split "*ashl<mode>3_negcnt_1" + [(set (match_operand:SWI48 0 "nonimmediate_operand") + (ashift:SWI48 + (match_operand:SWI48 1 "nonimmediate_operand") + (minus:QI + (match_operand:QI 3 
"const_int_operand") + (match_operand:QI 2 "register_operand" "c,r" + (clobber (reg:CC FLAGS_REG))] + "ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands) + && INTVAL (operands[3]) == <MODE_SIZE> * BITS_PER_UNIT + && ix86_pre_reload_split ()" + "#" + "&& 1" + [(parallel + [(set (match_dup 4) + (neg:QI (match_dup 2))) + (clobber (reg:CC FLAGS_REG))]) + (parallel + [(set (match_dup 0) + (ashift:SWI48 (match_dup 1) +(match_dup 4))) + (clobber (reg:CC FLAGS_REG))])] + "operands[4] = gen_reg_rtx (QImode);" + [(set_attr "isa" "*,bmi2")]) + (define_insn "*bmi2_ashl<mode>3_1" [(set (match_operand:SWI48 0 "register_operand" "=r") (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "rm") @@ -16622,6 +16678,62 @@ (define_insn_and_split "*<insn><mode>3_mask_1" "" [(set_attr "isa" "*,bmi2")]) +(define_insn_and_split "*<insn><mode>3_negcnt" + [(set (match_operand:SWI48 0 "nonimmediate_operand") + (any_shiftrt:SWI48 + (match_operand:SWI48 1 "nonimmediate_operand") + (subreg:QI + (minus + (match_operand 3 "const_int_operand") + (match_operand 2 "int248_register_operand" "c,r")) 0))) + (clobber (reg:CC FLAGS_REG))] + "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands) + && INTVAL (operands[3]) == <MODE_SIZE> * BITS_PER_UNIT + && ix86_pre_reload_split ()" + "#" + "&& 1" + [(parallel + [(set (match_dup 4) + (neg:QI (match_dup 2))) + (clobber (reg:CC FLAGS_REG))]) + (parallel + [(set (match_dup 0) + (any_shiftrt:SWI48 (match_dup 1) + (match_dup 4))) + (clobber (reg:CC FLAGS_REG))])] +{ + operands[2] = force_reg (GET_MODE (operands[2]), operands[2]); + operands[2] = gen_lowpart (QImode, operands[2]); + + operands[4] = gen_reg_rtx (QImode); +} + [(set_attr "isa" "*,bmi2")]) + +(define_insn_and_split "*<insn><mode>3_negcnt_1" + [(set (match_operand:SWI48 0 "nonimmediate_operand") + (any_shiftrt:SWI48 + (match_operand:SWI48 1 "nonimmediate_operand") + (minus:QI + (match_operand:QI 3 "const_int_operand") + (match_operand:QI 2 "register_operand" "c,r" + (clobber (reg:CC FLAGS_REG))] + "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands) + && INTVAL (operands[3]) == <MODE_SIZE> * 
BITS_PER_UNIT + && ix86_pre_reload_split ()" + "#" + "&& 1" + [(parallel + [(set (match_dup 4) + (neg:QI (match_dup 2))) + (clobber (reg:CC FLAGS_REG))]) + (parallel + [(set (match_dup 0) + (any_shiftrt:SWI48 (match_dup 1) + (match_dup 4))) + (clobber (reg:CC FLAGS_REG))])] + "operands[4] = gen_reg_rtx (QImode);" + [(set_attr "isa" "*,bmi2")]) + (define_insn_an
[committed] testsuite/x86: Add -mfpmath=sse to add_options_for_float16
Add -mfpmath=sse to add_options_for_float16 to avoid

  error: '-fexcess-precision=16' is not compatible with '-mfpmath=387'

when compiling gcc.dg/tree-ssa/pow_fold_1.c.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp (add_options_for_float16): Add
	-mfpmath=sse.

Tested on x86_64-linux-gnu {,-m32}.

Uros.

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index d550f288a0f..187a7e2992c 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3985,7 +3985,7 @@ proc add_options_for_float16 { flags } {
	return "$flags -mfp16-format=ieee"
     }
     if { [istarget i?86-*-*] || [istarget x86_64-*-*] } {
-	return "$flags -msse2"
+	return "$flags -msse2 -mfpmath=sse"
     }
     return "$flags"
 }
Re: [Bug fortran/84869] [12/13/14/15 Regression] ICE in gfc_class_len_get, at fortran/trans-expr.c:233
On 24.11.24 17:40, Paul Richard Thomas wrote:

> Fixed as 'obvious' on 13-branch to mainline with commit
> r15-5629-g470ebd31843db58fc503ccef38b82d0da93c65e4.  An error with the
> PR number in the mainline ChangeLogs will be corrected tomorrow.
>
> Fortran: Fix segfault in allocation of unlimited poly array [PR84869]
>
> 2024-11-24  Paul Thomas
>
> gcc/fortran/ChangeLog
>
> 	PR fortran/84869
> 	* trans-expr.cc (trans_class_vptr_len_assignment): To access the
> 	'_len' field, 're' must be unlimited polymorphic.
>
> gcc/testsuite/
>
> 	PR fortran/84869
> 	* gfortran.dg/pr84869.f90: Comment out test of component refs.

This part of the patch is not obvious:

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 7013dd3a411..bc1d5a87307 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 
 /* trans-expr.cc -- generate GENERIC trees for gfc_expr.  */
 
+#define INCLUDE_MEMORY
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"

Forgotten to rebase?
Re: [PATCH v2 10/14] Support for 64-bit location_t: gimple parts
On Sat, 16 Nov 2024, Lewis Hyatt wrote:
> The size of struct gimple increases by 8 bytes with the change in size of
> location_t from 32- to 64-bit

Half-way scrolling through the patches, this seems a good time for a
possibly disruptive comment from the side-line: ;-)

For the size-critical types containing a location_t, and thus affected by
enlarging it to 64 bits, would it be feasible to instead express the
location as an index into a (new) array elsewhere that contains the
location_t?  If that idea was discarded early, or pursued and discarded,
I missed that.

brgds, H-P
[PATCH v1] Match: Refactor the unsigned SAT_ADD match ADD_OVERFLOW pattern [NFC]
From: Pan Li

This patch refactors the unsigned SAT_ADD match patterns that leverage
the IFN ADD_OVERFLOW, aka:

* Extract the type check outside.
* Re-arrange the related match pattern forms together.
* Remove unnecessary helper pattern matches.

The below test suites passed for this patch:
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.

gcc/ChangeLog:

	* match.pd: Refactor the various unsigned SAT_ADD match patterns
	for IFN ADD_OVERFLOW.

Signed-off-by: Pan Li
---
 gcc/match.pd | 83 +++-
 1 file changed, 30 insertions(+), 53 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 48317dc80b6..2ff0536d901 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3086,27 +3086,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
	   || POINTER_TYPE_P (itype))
	  && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))
 
-/* SAT_ADD = usadd_left_part_2 | usadd_right_part_2, aka:
-   SAT_ADD = REALPART_EXPR <.ADD_OVERFLOW> | (IMAGPART_EXPR <.ADD_OVERFLOW> != 0) */
-(match (usadd_left_part_2 @0 @1)
- (realpart (IFN_ADD_OVERFLOW:c @0 @1))
- (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
      && types_match (type, @0, @1))))
-
-/* SAT_ADD = usadd_left_part_2 | usadd_right_part_2, aka:
-   SAT_ADD = REALPART_EXPR <.ADD_OVERFLOW> | (IMAGPART_EXPR <.ADD_OVERFLOW> != 0) */
-(match (usadd_right_part_2 @0 @1)
- (negate (convert (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1)) integer_zerop)))
- (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
      && types_match (type, @0, @1))))
-
-/* SAT_ADD = usadd_left_part_2 | usadd_right_part_2, aka:
-   SAT_ADD = REALPART_EXPR <.ADD_OVERFLOW> | -IMAGPART_EXPR <.ADD_OVERFLOW> */
-(match (usadd_right_part_2 @0 @1)
- (negate (imagpart (IFN_ADD_OVERFLOW:c @0 @1)))
- (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
      && types_match (type, @0, @1))))
-
 (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type))
  (match (usadd_overflow_mask @0 @1)
   /* SAT_U_ADD = (X + Y) | -(X > (X + Y)).
@@ -3150,38 +3129,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
      wide_int max = wi::mask (precision, false, precision);
      wide_int sum = wi::add (cst_1, cst_2);
     }
-    (if (wi::eq_p (max, sum)))
-
-/* We cannot merge or overload usadd_left_part_1 and usadd_left_part_2
-   because the sub part of left_part_2 cannot work with right_part_1.
-   For example, left_part_2 pattern focus one .ADD_OVERFLOW but the
-   right_part_1 has nothing to do with .ADD_OVERFLOW.  */
-
-/* Unsigned saturation add, case 2 (branchless with .ADD_OVERFLOW):
-   SAT_ADD = REALPART_EXPR <.ADD_OVERFLOW> | -IMAGPART_EXPR <.ADD_OVERFLOW> or
-   SAT_ADD = REALPART_EXPR <.ADD_OVERFLOW> | (IMAGPART_EXPR <.ADD_OVERFLOW> != 0) */
-(match (unsigned_integer_sat_add @0 @1)
- (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
-
-/* Unsigned saturation add, case 5 (branch with eq .ADD_OVERFLOW):
-   SAT_U_ADD = REALPART_EXPR <.ADD_OVERFLOW> == 0 ? .ADD_OVERFLOW : -1.  */
-(match (unsigned_integer_sat_add @0 @1)
- (cond^ (eq (imagpart (IFN_ADD_OVERFLOW:c @0 @1)) integer_zerop)
-	(usadd_left_part_2 @0 @1) integer_minus_onep))
-
-/* Unsigned saturation add, case 6 (branch with ne .ADD_OVERFLOW):
-   SAT_U_ADD = REALPART_EXPR <.ADD_OVERFLOW> != 0 ? -1 : .ADD_OVERFLOW.  */
-(match (unsigned_integer_sat_add @0 @1)
- (cond^ (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1)) integer_zerop)
-	integer_minus_onep (usadd_left_part_2 @0 @1)))
-
-/* Unsigned saturation add, case 10 (one op is imm):
-   SAT_U_ADD = __builtin_add_overflow (X, 3, &ret) == 0 ? ret : -1.  */
-(match (unsigned_integer_sat_add @0 @1)
- (cond^ (ne (imagpart (IFN_ADD_OVERFLOW@2 @0 INTEGER_CST@1)) integer_zerop)
-	integer_minus_onep (realpart @2))
- (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
-     && types_match (type, @0) && int_fits_type_p (@1, type))))
+   (if (wi::eq_p (max, sum))
+    (match (unsigned_integer_sat_add @0 @1)
+     /* SUM = ADD_OVERFLOW (X, Y)
+	SAT_U_ADD = REALPART (SUM) | -IMAGPART (SUM)  */
+     (bit_ior:c (realpart (IFN_ADD_OVERFLOW:c@2 @0 @1)) (negate (imagpart @2)))
+     (if (types_match (type, @0, @1))))
+    (match (unsigned_integer_sat_add @0 @1)
+     /* SUM = ADD_OVERFLOW (X, Y)
+	SAT_U_ADD = REALPART (SUM) | -(IMAGPART (SUM) != 0)  */
+     (bit_ior:c (realpart (IFN_ADD_OVERFLOW:c@2 @0 @1))
+		(negate (convert (ne (imagpart @2) integer_zerop))))
+     (if (types_match (type, @0, @1))))
+    (match (unsigned_integer_sat_add @0 @1)
+     /* SUM = ADD_OVERFLOW (X, Y)
+	SAT_U_ADD = IMAGPART (SUM) == 0 ? REALPART (SUM) : -1  */
+     (cond^ (eq (imagpart (IFN_ADD_OVERFLOW:c@2 @0 @1)) integer_zerop)
+	    (realpart @2) integer_minus_onep)
+     (if (types_match (type, @0, @1))))
+    (match (unsigned_integer_sat_add @0 @1)
+     /* SUM = ADD_OVERFLOW (X, Y)
+	SAT_U_ADD = IMAGPART (SUM) != 0 ? -1 : REALPART (SUM)  */
+     (cond^ (ne (imagpart (IFN_ADD_OVERFLOW:c@2 @0 @1)) integer_zerop)
+	    in
[PATCH v2] ada: PR target/117538 Traceback includes load address if executable is PIE.
If s-trasym.adb (System.Traceback.Symbolic, used as a renaming by
GNAT.Traceback.Symbolic) is given a traceback from a position-independent
executable, it does not include the executable's load address in the
report.  This is necessary in order to decode the traceback report.

Note, this has already been done for s-trasym__dwarf.adb, which really
does produce a symbolic traceback; s-trasym.adb is the version used on
systems which don't actually support symbolication.

Bootstrapped and regtested (ada only) on x86_64-apple-darwin.

	* gcc/ada/libgnat/s-trasym.adb: Returns the traceback in the
	required form.  Note that leading zeros are trimmed from
	hexadecimal strings.
	(Symbolic_Traceback): Import Executable_Load_Address.
	(Load_Address): New, from call to Executable_Load_Address.
	(One_If_Executable_Is_PI): New, 0 if Load_Address is null, 1 if
	not.
	(Image_Length): New, found by calling System.Address_Image on the
	first address in the traceback.  NB, doesn't include "0x".
	(Load_Address_Prefix): New, String containing the required value.
	(Length_Needed): New, computed using the number of elements in
	the traceback, plus the load address if the executable is PIE.
	(Result): New String of the required length (which will be an
	overestimate).

2024-11-24  Simon Wright

gcc/ada/ChangeLog:

	PR target/117538
	* libgnat/s-trasym.adb: Returns the traceback, with the program
	load address if applicable.
---

diff --git a/gcc/ada/libgnat/s-trasym.adb b/gcc/ada/libgnat/s-trasym.adb
index 894fcf37ffd..5351f6fda9b 100644
--- a/gcc/ada/libgnat/s-trasym.adb
+++ b/gcc/ada/libgnat/s-trasym.adb
@@ -53,19 +53,63 @@ package body System.Traceback.Symbolic is
       else
          declare
-            Img : String := System.Address_Image (Traceback (Traceback'First));
-
-            Result : String (1 .. (Img'Length + 3) * Traceback'Length);
-            Last   : Natural := 0;
+            function Executable_Load_Address return System.Address;
+            pragma Import
+              (C, Executable_Load_Address,
+               "__gnat_get_executable_load_address");
+
+            Load_Address : constant System.Address :=
+              Executable_Load_Address;
+            One_If_Executable_Is_PI : constant Natural :=
+              Boolean'Pos (Load_Address /= Null_Address);
+
+            --  How long is an Address_Image? (hex digits only).
+            Image_Length : constant Natural :=
+              System.Address_Image (Traceback (Traceback'First))'Length;
+
+            Load_Address_Prefix : constant String :=
+              "Load address: ";
+
+            --  For each address to be output, we need the preceding "%x"
+            --  and a trailing space, making 3 additional characters.
+            --  There are 2 additional LFs.
+            Length_Needed : constant Positive :=
+              (Load_Address_Prefix'Length *
+               One_If_Executable_Is_PI) +
+              (Image_Length + 3) *
+              (Traceback'Length + One_If_Executable_Is_PI) +
+              2;
+
+            Result : String (1 .. Length_Needed);
+
+            Last : Natural := 0;
          begin
+
+            if One_If_Executable_Is_PI /= 0 then
+               declare
+                  Item : constant String :=
+                    Load_Address_Prefix & "0x" &
+                    System.Address_Image (Load_Address) &
+                    ASCII.LF;
+               begin
+                  Last := Item'Length;
+                  Result (1 .. Last) := Item;
+               end;
+            end if;
+
             for J in Traceback'Range loop
-               Img := System.Address_Image (Traceback (J));
-               Result (Last + 1 .. Last + 2) := "0x";
-               Last := Last + 2;
-               Result (Last + 1 .. Last + Img'Length) := Img;
-               Last := Last + Img'Length + 1;
-               Result (Last) := ' ';
+               declare
+                  Img : constant String :=
+                    System.Address_Image (Traceback (J));
+               begin
+                  Result (Last + 1 .. Last + 2) := "0x";
+                  Last := Last + 2;
+                  Result (Last + 1 .. Last + Img'Length) := Img;
+                  Last := Last + Img'Length + 1;
+                  Result (Last) := ' ';
+               end;
             end loop;
             Result (Last) := ASCII.LF;
Re: [PATCH] [RFC] Add extra 64bit SSE vector epilogue in some cases
> On 24.11.2024 at 09:17, Hongtao Liu wrote:
>
> On Fri, Nov 22, 2024 at 9:33 PM Richard Biener wrote:
>>
>> Similar to the X86_TUNE_AVX512_TWO_EPILOGUES tuning which enables
>> an extra 128bit SSE vector epilogue when doing 512bit AVX512
>> vectorization in the main loop, the following allows a 64bit SSE
>> vector epilogue to be generated when the previous vector epilogue
>> still had a vectorization factor of 16 or larger (which usually
>> means we are operating on char data).
>>
>> This effectively applies to 256bit and 512bit AVX2/AVX512 main loops;
>> a 128bit SSE main loop would already get a 64bit SSE vector epilogue.
>>
>> Together with X86_TUNE_AVX512_TWO_EPILOGUES this means three
>> vector epilogues for 512bit and two vector epilogues when enabling
>> 256bit vectorization.  I have not added another tunable for this
>> RFC - suggestions on how to avoid inflation there welcome.
>>
>> This speeds up 525.x264_r to within 5% of the -mprefer-vector-size=128
>> speed with -mprefer-vector-size=256 or -mprefer-vector-size=512
>> (the latter only when -mtune-ctrl=avx512_two_epilogues is in effect).
>>
>> I have not done any further benchmarking, this merely shows the
>> possibility and looks for guidance on how to expose this to the
>> uarch tunings or to the user (at all?) if not gating on any uarch
>> specific tuning.
>>
>> Note 64bit SSE isn't a native vector size so we rely on emulation
>> being "complete" (if not, epilogue vectorization will only fail, so
>> it's "safe" in this regard).  With AVX512 ISA available an alternative
>> is a predicated epilogue, but due to possible STLF issues user control
>> would be required here.
>>
>> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress
>> (I expect some fallout in scans due to some extra epilogues, let's see)
>
> I'll do some benchmarking.  Guess it should be ok.

Any suggestion as to how (or if at all?) we should expose this to users
for tuning?
Richard

>>
>>	* config/i386/i386.cc (ix86_vector_costs::finish_cost): For an
>>	128bit SSE epilogue request a 64bit SSE epilogue if the 128bit
>>	SSE epilogue VF was 16 or higher.
>> ---
>> gcc/config/i386/i386.cc | 7 +++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
>> index c7e70c21999..f2e8de3aafc 100644
>> --- a/gcc/config/i386/i386.cc
>> +++ b/gcc/config/i386/i386.cc
>> @@ -25495,6 +25495,13 @@ ix86_vector_costs::finish_cost (const vector_costs *scalar_costs)
>>	  && GET_MODE_SIZE (loop_vinfo->vector_mode) == 32)
>>	m_suggested_epilogue_mode = V16QImode;
>>     }
>> +  /* When a 128bit SSE vectorized epilogue still has a VF of 16 or larger
>> +     enable a 64bit SSE epilogue.  */
>> +  if (loop_vinfo
>> +      && LOOP_VINFO_EPILOGUE_P (loop_vinfo)
>> +      && GET_MODE_SIZE (loop_vinfo->vector_mode) == 16
>> +      && LOOP_VINFO_VECT_FACTOR (loop_vinfo).to_constant () >= 16)
>> +    m_suggested_epilogue_mode = V8QImode;
>>
>>   vector_costs::finish_cost (scalar_costs);
>> }
>> --
>> 2.43.0
>
> --
> BR,
> Hongtao
Re: [PATCH] gimplefe: Fix handling of ')'/'}' after a parse error [PR117741]
> On 24.11.2024 at 02:36, Andrew Pinski wrote:
>
> The problem here is c_parser_skip_until_found stops at a closing nesting
> delimiter without consuming it.  So if we don't consume it in
> c_parser_gimple_compound_statement, we would go into an infinite loop.
> The C parser has similar code in c_parser_statement_after_labels to
> handle this specific case too.

Ok

Richard

>	PR c/117741
>
> gcc/c/ChangeLog:
>
>	* gimple-parser.cc (c_parser_gimple_compound_statement): Handle
>	CPP_CLOSE_PAREN/CPP_CLOSE_SQUARE with an error and skipping the token.
>
> gcc/testsuite/ChangeLog:
>
>	* gcc.dg/gimplefe-54.c: New test.
>
> Signed-off-by: Andrew Pinski
> ---
> gcc/c/gimple-parser.cc | 10 ++
> gcc/testsuite/gcc.dg/gimplefe-54.c | 10 ++
> 2 files changed, 20 insertions(+)
> create mode 100644 gcc/testsuite/gcc.dg/gimplefe-54.c
>
> diff --git a/gcc/c/gimple-parser.cc b/gcc/c/gimple-parser.cc
> index 81f3921c876..4763cf23313 100644
> --- a/gcc/c/gimple-parser.cc
> +++ b/gcc/c/gimple-parser.cc
> @@ -664,6 +664,16 @@ c_parser_gimple_compound_statement (gimple_parser &parser, gimple_seq *seq)
>	  break;
>	}
>
> +	case CPP_CLOSE_PAREN:
> +	case CPP_CLOSE_SQUARE:
> +	  /* Avoid infinite loop in error recovery:
> +	     c_parser_skip_until_found stops at a closing nesting
> +	     delimiter without consuming it, but here we need to consume
> +	     it to proceed further.  */
> +	  c_parser_error (parser, "expected statement");
> +	  c_parser_consume_token (parser);
> +	  break;
> +
>	default:
>	expr_stmt:
>	  c_parser_gimple_statement (parser, seq);
> diff --git a/gcc/testsuite/gcc.dg/gimplefe-54.c b/gcc/testsuite/gcc.dg/gimplefe-54.c
> new file mode 100644
> index 000..71a49ac39c2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/gimplefe-54.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fgimple" } */
> +
> +/* PR c/117741 */
> +/* Make sure after a parsing error we
> +   don't go into an infinite loop.  */
> +
> +int i;
> +void __GIMPLE foo() {
> +  i = ) /* { dg-error "" } */
> --
> 2.43.0
>
Re: [PATCH] rs6000, fix test builtins-1-p10-runnable.c
Hi Carl,

On 2024/10/3 23:11, Carl Love wrote:
> GCC maintainers:
>
> The builtins-1-p10-runnable.c test has the debugging inadvertently
> enabled.  The test uses #ifdef to enable/disable the debugging.
> Unfortunately, the #define DEBUG was set to 0 to disable debugging and
> enable the call to abort in case of error.  The #define should have
> been removed to disable debugging.  Additionally, a change in the
> expected output which was made for testing purposes was not removed.
> Hence, the test is printing that there was an error, not calling abort.
> The result is the test does not get reported as failing.
>
> This patch removes the #define DEBUG to enable the call to abort and
> restores the expected output to the correct value.  The patch was
> tested on a Power 10 without the #define DEBUG to verify that the test
> does fail with the incorrect expected value.  The correct expected
> value was then restored.  The test reports 19 expected passes and no
> errors.
>
> Please let me know if this patch is acceptable for mainline.  Thanks.
>
> Carl
>
> ---
>
> rs6000, fix test builtins-1-p10-runnable.c
>
> The test has two issues:
>
> 1) The test should execute abort() if an error is found.  However, the
> test contains a #define DEBUG 0 which actually enables the error prints
> instead of executing abort() because the debug code is protected by an
> #ifdef not #if.  The #define DEBUG needs to be removed so the test will
> abort on an error.
>
> 2) The vec_i_expected output was tweaked to test that it would fail.
> The test value was not removed.
>
> By removing the #define DEBUG, the test fails and reports 1 failure.
> Removing the intentionally wrong expected value results in the test
> passing with no errors as expected.
>
> gcc/testsuite/ChangeLog:
> 	* gcc.target/powerpc/builtins-1-p10-runnable.c: Remove #define
> 	DEBUG.  Replace vec_i_expected value with correct value.

Nit: Three more spaces before "Replace", so s/ //.

OK for trunk with this tweaked, thanks!
BR,
Kewen

> ---
> gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c | 5 +
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
> index 222c8b3a409..3e8a1c736e3 100644
> --- a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
> @@ -25,8 +25,6 @@
> #include
> #include
>
> -#define DEBUG 0
> -
> #ifdef DEBUG
> #include
> #endif
> @@ -281,8 +279,7 @@
>    /* Signed word multiply high */
>    i_arg1 = (vector int){ 2147483648, 2147483648, 2147483648, 2147483648 };
>    i_arg2 = (vector int){ 2, 3, 4, 5};
> -  // vec_i_expected = (vector int){-1, -2, -2, -3};
> -  vec_i_expected = (vector int){1, -2, -2, -3};
> +  vec_i_expected = (vector int){-1, -2, -2, -3};
>
Re: [PATCH ver2 3/4] rs6000, Remove redundant built-in __builtin_vsx_xvcvuxwdp
Hi Carl,

On 2024/10/1 23:27, Carl Love wrote:
> GCC maintainers:
>
> Version 2: Fixed the wording in the changelog per the feedback.  With
> this change the patch was approved by Kewen.
>
> The patch removed the built-in __builtin_vsx_xvcvuxwdp as it is
> covered by the overloaded vec_doubleo built-in.
>
> The patch has been tested on Power 10 LE and BE with no regressions.
>
> Please let me know if it is acceptable for mainline.  Thanks.
>
> Carl
>
> rs6000, Remove redundant built-in __builtin_vsx_xvcvuxwdp
>
> The built-in __builtin_vsx_xvcvuxwdp can be covered with the PVIPR
> function vec_doubleo on LE and vec_doublee on BE.  There are no test
> cases or documentation for __builtin_vsx_xvcvuxwdp.  This patch
> removes the redundant built-in.

OK for trunk, thanks!

BR,
Kewen

> gcc/ChangeLog:
> 	* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvuxwdp):
> 	Remove built-in definition.
> ---
> gcc/config/rs6000/rs6000-builtins.def | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
> index adb4fe761f3..7350b913d03 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1616,9 +1616,6 @@
>    const vf __builtin_vsx_xvcvuxdsp (vull);
>      XVCVUXDSP vsx_xvcvuxdsp {}
>
> -  const vd __builtin_vsx_xvcvuxwdp (vsi);
> -    XVCVUXWDP vsx_xvcvuxwdp {}
> -
>    const vf __builtin_vsx_xvcvuxwsp (vsi);
>      XVCVUXWSP vsx_floatunsv4siv4sf2 {}
>
Re: [PATCH ver2 1/4] rs6000, add testcases to the overloaded vec_perm built-in
Hi Carl,

On 2024/10/1 23:27, Carl Love wrote:
> GCC maintainers:
>
> Version 2: fixed the changelog, updated the wording in the
> documentation and updated the argument types in the vsx-builtin-3.c
> test file.
>
> The following patch adds missing test cases for the overloaded
> vec_perm built-in.  It also fixes an issue with printing the 128-bit
> values in the DEBUG section that was noticed when adding the
> additional test cases.
>
> The patch has been tested on Power 10 LE and BE with no regressions.
>
> Please let me know if it is acceptable for mainline.  Thanks.
>
> Carl
>
> ---
> From 4c672e8895107bc1f62e09122e7af157436cb83d Mon Sep 17 00:00:00 2001
> From: Carl Love
> Date: Wed, 31 Jul 2024 16:31:34 -0400
> Subject: [PATCH 1/4] rs6000, add testcases to the overloaded vec_perm built-in
>
> The overloaded vec_perm built-in supports permuting signed and unsigned
> vectors of char, bool char, short int, short bool, int, bool, long long
> int, long long bool, int128, float and double.  However, not all of the
> supported arguments are included in the test cases.  This patch adds
> the missing test cases.
>
> Additionally, in the 128-bit debug print statements the expected result
> and the result need to be cast to unsigned long long to print
> correctly.  The patch makes this additional change to the print
> statements.
>
> gcc/ChangeLog:
> 	* doc/extend.texi: Fix spelling mistake in description of the
> 	vec_sel built-in.  Add documentation of the 128-bit vec_perm
> 	instance.
>
> gcc/testsuite/ChangeLog:
> 	* gcc.target/powerpc/vsx-builtin-3.c: Add vec_perm test cases for
> 	arguments of type vector signed long long int, long long bool,
> 	bool, bool short, bool char and pixel, vector unsigned long long

Nit: too many spaces.

> 	int, unsigned int, unsigned short int, unsigned char.  Cast
> 	arguments for debug prints to unsigned long long.
> 	* gcc.target/powerpc/builtins-4-int128-runnable.c: Add vec_perm
> 	test cases for signed and unsigned int128 arguments.
> --- > gcc/doc/extend.texi | 12 +- > .../powerpc/builtins-4-int128-runnable.c | 108 +++--- > .../gcc.target/powerpc/vsx-builtin-3.c | 14 ++- > 3 files changed, 116 insertions(+), 18 deletions(-) > > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi > index 2d795ba7e59..adc4a54c5fa 100644 > --- a/gcc/doc/extend.texi > +++ b/gcc/doc/extend.texi > @@ -21642,9 +21642,19 @@ vector bool __int128 vec_sel (vector bool __int128, > vector bool __int128, vector unsigned __int128); > @end smallexample > > -The instance is an extension of the exiting overloaded built-in > @code{vec_sel} > +The instance is an extension of the existing overloaded built-in > @code{vec_sel} > that is documented in the PVIPR. > > +@smallexample > +vector signed __int128 vec_perm (vector signed __int128, > + vector signed __int128); > +vector unsigned __int128 vec_perm (vector unsigned __int128, > + vector unsigned __int128); > +@end smallexample > + > +The instance is an extension of the existing overloaded built-in > +@code{vec_perm} that is documented in the PVIPR. 
> + > @node Basic PowerPC Built-in Functions Available on ISA 2.06 > @subsubsection Basic PowerPC Built-in Functions Available on ISA 2.06 > > diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-4-int128-runnable.c > b/gcc/testsuite/gcc.target/powerpc/builtins-4-int128-runnable.c > index 62c11132cf3..c61b0ecb854 100644 > --- a/gcc/testsuite/gcc.target/powerpc/builtins-4-int128-runnable.c > +++ b/gcc/testsuite/gcc.target/powerpc/builtins-4-int128-runnable.c > @@ -18,6 +18,16 @@ int main() { > __uint128_t data_u128[100]; > __int128_t data_128[100]; > > +#ifdef __BIG_ENDIAN__ > + vector unsigned char vuc = {0xC, 0xD, 0xE, 0xF, 0x8, 0x9, 0xA, 0xB, > + 0x1C, 0x1D, 0x1E, 0x1F, 0x18, 0x19, 0x1A, > 0x1B}; > +#else > + vector unsigned char vuc = {0x4, 0x5, 0x6, 0x7, 0x0, 0x1, 0x2, 0x3, > + 0x14, 0x15, 0x16, 0x17, 0x10, 0x11, 0x12, 0x13}; > +#endif > + > + vector __int128_t vec_128_arg1, vec_128_arg2; > + vector __uint128_t vec_u128_arg1, vec_u128_arg2; > vector __int128_t vec_128_expected1, vec_128_result1; > vector __uint128_t vec_u128_expected1, vec_u128_result1; > signed long long zero = (signed long long) 0; > @@ -37,11 +47,13 @@ int main() { > { > #ifdef DEBUG > printf("Error: vec_xl(), vec_128_result1[0] = %lld %llu; ", > - vec_128_result1[0] >> 64, > - vec_128_result1[0] & (__int128_t)0x); > + (unsigned long long)(vec_128_result1[0] >> 64), > + (unsigned long long)(vec_128_result1[0] > + & (__int128_t)0x)); > printf("vec_128_expected1[0] = %lld %llu\n", > - vec_128_expected1[0] >> 6
Re: [PATCH ver2 2/4] rs6000, remove built-ins __builtin_vsx_vperm_8hi and __builtin_vsx_vperm_8hi_uns
Hi Carl,

On 2024/10/1 23:27, Carl Love wrote:
> GCC maintainers:
>
> Version 2: added the reference to the patch where the removal of the
> built-ins was missed.  Note, the patch was approved by Kewen with this
> change.
>
> The following patch removes two redundant built-ins,
> __builtin_vsx_vperm_8hi and __builtin_vsx_vperm_8hi_uns.  The
> built-ins are covered by the overloaded vec_perm built-in.
>
> The patch has been tested on Power 10 LE and BE with no regressions.
>
> Please let me know if it is acceptable for mainline.  Thanks.
>
> Carl
>
> ---
> rs6000, remove built-ins __builtin_vsx_vperm_8hi and
> __builtin_vsx_vperm_8hi_uns
>
> The two built-ins __builtin_vsx_vperm_8hi and __builtin_vsx_vperm_8hi_uns
> are redundant.  They are covered by the overloaded vec_perm built-in.
> The built-ins are not documented and do not have test cases.
>
> The removal of these built-ins was missed in commit gcc r15-1923 on
> 7/9/2024.
>
> This patch removes the redundant built-ins.

OK for trunk, thanks.

BR,
Kewen

> gcc/ChangeLog:
> 	* config/rs6000/rs6000-builtins.def (__builtin_vsx_vperm_8hi,
> 	__builtin_vsx_vperm_8hi_uns): Remove built-in definitions.
> ---
> gcc/config/rs6000/rs6000-builtins.def | 6 --
> 1 file changed, 6 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
> index 0e9dc05dbcf..adb4fe761f3 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1472,12 +1472,6 @@
>    const vf __builtin_vsx_uns_floato_v2di (vsll);
>      UNS_FLOATO_V2DI unsfloatov2di {}
>
> -  const vss __builtin_vsx_vperm_8hi (vss, vss, vuc);
> -    VPERM_8HI_X altivec_vperm_v8hi {}
> -
> -  const vus __builtin_vsx_vperm_8hi_uns (vus, vus, vuc);
> -    VPERM_8HI_UNS_X altivec_vperm_v8hi_uns {}
> -
>    const vsll __builtin_vsx_vsigned_v2df (vd);
>      VEC_VSIGNED_V2DF vsx_xvcvdpsxds {}
>
Re: [PATCH ver2 4/4] rs6000, Add tests and documentation for vector conversions between integer and float
Hi Carl,

On 2024/10/1 23:28, Carl Love wrote:
> GCC maintainers:
>
> Version 2: added the argument changes for the
> __builtin_vsx_uns_double[e|o|h|l]_v4si built-ins.  Added support to the
> vector {un,}signed int to vector float built-ins so they are supported
> using Altivec instructions if VSX is not available, per the feedback
> comments.
>
> The following patch fixes errors in the definition of the
> __builtin_vsx_uns_floate_v2di, __builtin_vsx_uns_floato_v2di and
> __builtin_vsx_uns_float2_v2di built-ins.  The arguments should be
> unsigned but are listed as signed.
>
> Additionally, there are a number of test cases that are missing for
> the various instances of the built-ins, and the documentation for the
> various built-ins is missing.
>
> This patch adds the missing test cases and documentation.
>
> The patch has been tested on Power 10 LE and BE with no regressions.
>
> Please let me know if it is acceptable for mainline.  Thanks.
>
> Carl
>
> -
> rs6000, Add tests and documentation for vector conversions between
> integer and float
>
> The arguments for the __builtin_vsx_uns_floate_v2di,
> __builtin_vsx_uns_floato_v2di, __builtin_vsx_uns_float2_v2di,
> __builtin_vsx_xvcvuxwsp, __builtin_vsx_uns_doublee_v4si,
> __builtin_vsx_uns_doubleh_v4si, __builtin_vsx_uns_doublel_v4si and
> __builtin_vsx_uns_doubleo_v4si built-ins should be unsigned not signed.
>
> Add tests for the following existing vector integer and vector long
> long int to vector float built-ins:
>   __builtin_altivec_float_sisf (vsi);
>   __builtin_altivec_uns_float_sisf (vui);

These functions are to convert vector {un,}signed int to vector float;
PVIPR has defined "vec_float" for this kind of conversion.  I think
they are just bif instances and we don't need to document them.

> Add tests for the vector float to vector int built-ins:
>   __builtin_altivec_fix_sfsi
>   __builtin_altivec_fixuns_sfsi
>
> The four built-ins are not documented.
The patch adds the missing > documentation for the built-ins. Similarly PVIPR has defined "vec_{un,}signed". > > The vector signed/unsigned integer to vector floating point built-ins > __builtin_vsx_xvcvsxwsp, __builtin_vsx_xvcvuxwsp are extended to generate > Altivec instructions if VSX is not available. A new test case for these > built-ins with Altivec is added to test the new functionality. I think the previous suggestion (quoted below) beats the current proposal since XVCVSXWSP and XVCVUXWSP have vsx mnemonic in their names and they are more reasonable to be used for vsx instances. The existing FLOAT_V4SI_V4SF (defined in vector.md) is better to be expanded to cover both altivec and vsx. "These functions are to convert vector {un,}signed int to vector float, PVIPR has defined "vec_float" for this kind of conversion. For now, this function only considers VSX: [VEC_FLOAT, vec_float, __builtin_vec_float] vf __builtin_vec_float (vsi); XVCVSXWSP vf __builtin_vec_float (vui); XVCVUXWSP I think we should fix it to have it also supported without VSX but with ALTIVEC. We can change the associated instance XVCVSXWSP with FLOAT_V4SI_V4SF, and update its associated expander floatv4siv4sf2 to consider VSX support, such as: if (mode == V4SFmode) { if (VECTOR_UNIT_VSX_P (mode)) emit_insn (gen_vsx_floatv4siv4sf2 (operands[0], operands[1])); else emit_insn (gen_altivec_vcfsx (operands[0], operands[1], const0_rtx)); DONE; } (untested), then vec_float can route to altivec_vcfsx with only ALTIVEC but not VSX, and to XVCVSXWSP with VSX supported. Similar for XVCVUXWSP with UNSFLOAT_V4SI_V4SF and expander floatunsv4siv4sf2, and I think we can drop XVCVSXWSP and XVCVUXWSP definitions as well as they gets only used for overloading. " BR, Kewen > > This patch fixes the incorrect __builtin_vsx_uns_float[o|e|2]_v2di > and __builtin_vsx_xvcvuxwsp argument types and adds test cases for each > of the built-ins listed above. 
> gcc/ChangeLog:
> 	* config/rs6000/rs6000-builtins.def (__builtin_vsx_uns_floate_v2di,
> 	__builtin_vsx_uns_floato_v2di, __builtin_vsx_uns_float2_v2di,
> 	__builtin_vsx_xvcvuxwsp, __builtin_vsx_uns_doublee_v4si,
> 	__builtin_vsx_uns_doubleh_v4si, __builtin_vsx_uns_doublel_v4si,
> 	__builtin_vsx_uns_doubleo_v4si): Change argument from signed to
> 	unsigned.
> 	(__builtin_vsx_xvcvsxwsp, __builtin_vsx_xvcvuxwsp): Move to
> 	section Altivec.
> 	* config/rs6000/vsx.md (vsx_floatv4siv4sf2, vsx_floatunsv4siv4sf2):
> 	Add define_expand to generate VSX instructions if VSX is enabled
> 	and Altivec instructions otherwise.
> 	(vsx_float2, vsx_floatuns2): Change
> 	define_insns to define_insn for vsx_float2_internal and
> 	vsx_floatuns2_internal.
> 	(vsx_floatv2div2df2, vsx_floatunsv2div2df2): Add define_expands.
Re: [PATCH] testsuite: Fix up various powerpc tests after -std=gnu23 by default switch [PR117663]
Hi Jakub, On 2024/11/22 16:18, Jakub Jelinek wrote: > Hi! > > These tests use K&R-style function definitions or pass arguments > to () functions. > It seemed easiest to just use -std=gnu17 for all of those. Thanks for fixing! I slightly prefer passing -Wno-old-style-definition instead, as the test cases can still go with the default C dialect. But it's your call. :) OK for trunk with or without that change. Thanks! BR, Kewen > > Bootstrapped/regtested on powerpc64le-linux and powerpc64-linux (on the > latter tested with -m32/-m64), ok for trunk? > > 2024-11-22 Jakub Jelinek > > PR testsuite/117663 > * gcc.target/powerpc/pr58673-1.c: Add -std=gnu17 to dg-options. > * gcc.target/powerpc/pr64505.c: Likewise. > * gcc.target/powerpc/pr116170.c: Likewise. > * gcc.target/powerpc/pr58673-2.c: Likewise. > * gcc.target/powerpc/pr64019.c: Likewise. > * gcc.target/powerpc/pr96506-1.c: Likewise. > * gcc.target/powerpc/swaps-stack-protector.c: Likewise. > * gcc.target/powerpc/pr78543.c: Likewise. > * gcc.dg/vect/pr48765.c: Add -std=gnu17 to dg-additional-options. 
> > --- gcc/testsuite/gcc.target/powerpc/pr58673-1.c.jj 2024-06-04 > 13:19:04.531594020 +0200 > +++ gcc/testsuite/gcc.target/powerpc/pr58673-1.c 2024-11-21 > 18:57:26.724287696 +0100 > @@ -1,6 +1,6 @@ > /* { dg-do compile { target { powerpc*-*-* && lp64 } } } */ > /* { dg-skip-if "" { powerpc*-*-darwin* } } */ > -/* { dg-options "-mdejagnu-cpu=power8 -mvsx -O1" } */ > +/* { dg-options "-mdejagnu-cpu=power8 -mvsx -O1 -std=gnu17" } */ > /* { dg-require-effective-target powerpc_vsx } */ > > enum typecode > --- gcc/testsuite/gcc.target/powerpc/pr64505.c.jj 2020-11-09 > 15:25:52.0 +0100 > +++ gcc/testsuite/gcc.target/powerpc/pr64505.c2024-11-21 > 19:08:32.258800032 +0100 > @@ -1,6 +1,6 @@ > /* { dg-do compile } */ > /* { dg-skip-if "" { powerpc*-*-aix* } } */ > -/* { dg-options "-w -O2 -mpowerpc64" } */ > +/* { dg-options "-w -O2 -mpowerpc64 -std=gnu17" } */ > > /* > * (below is minimized test case) > --- gcc/testsuite/gcc.target/powerpc/pr116170.c.jj2024-11-21 > 18:56:42.283921230 +0100 > +++ gcc/testsuite/gcc.target/powerpc/pr116170.c 2024-11-21 > 18:55:57.301562478 +0100 > @@ -1,6 +1,6 @@ > /* { dg-do compile } */ > /* { dg-require-effective-target ppc_float128_sw } */ > -/* { dg-options "-mdejagnu-cpu=power8 -O2 -fstack-protector-strong > -ffloat-store" } */ > +/* { dg-options "-mdejagnu-cpu=power8 -O2 -fstack-protector-strong > -ffloat-store -std=gnu17" } */ > > /* Verify there is no ICE. 
*/ > > --- gcc/testsuite/gcc.target/powerpc/pr58673-2.c.jj 2024-06-04 > 13:19:04.531594020 +0200 > +++ gcc/testsuite/gcc.target/powerpc/pr58673-2.c 2024-11-21 > 18:59:33.549479716 +0100 > @@ -1,6 +1,6 @@ > /* { dg-do compile { target { powerpc*-*-* && lp64 } } } */ > /* { dg-skip-if "" { powerpc*-*-darwin* } } */ > -/* { dg-options "-mdejagnu-cpu=power8 -mvsx -O3 -funroll-loops" } */ > +/* { dg-options "-mdejagnu-cpu=power8 -mvsx -O3 -funroll-loops -std=gnu17" } > */ > /* { dg-require-effective-target powerpc_vsx } */ > > #include > --- gcc/testsuite/gcc.target/powerpc/pr64019.c.jj 2024-06-04 > 13:19:04.531594020 +0200 > +++ gcc/testsuite/gcc.target/powerpc/pr64019.c2024-11-21 > 19:00:08.110987010 +0100 > @@ -1,6 +1,6 @@ > /* { dg-do compile { target { powerpc*-*-* } } } */ > /* { dg-skip-if "" { powerpc*-*-darwin* } } */ > -/* { dg-options "-O2 -ffast-math -mdejagnu-cpu=power7" } */ > +/* { dg-options "-O2 -ffast-math -mdejagnu-cpu=power7 -std=gnu17" } */ > /* { dg-require-effective-target powerpc_vsx } */ > > #include > --- gcc/testsuite/gcc.target/powerpc/pr96506-1.c.jj 2020-11-22 > 19:11:44.0 +0100 > +++ gcc/testsuite/gcc.target/powerpc/pr96506-1.c 2024-11-21 > 19:09:58.042577378 +0100 > @@ -1,7 +1,7 @@ > /* PR target/96506 */ > /* { dg-do compile } */ > /* { dg-require-effective-target power10_ok } */ > -/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ > +/* { dg-options "-mdejagnu-cpu=power10 -O2 -std=gnu17" } */ > > extern void bar0(); > extern void bar1(); > --- gcc/testsuite/gcc.target/powerpc/swaps-stack-protector.c.jj > 2020-01-12 11:54:38.0 +0100 > +++ gcc/testsuite/gcc.target/powerpc/swaps-stack-protector.c 2024-11-21 > 19:12:01.487819286 +0100 > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-fstack-protector -O3" } */ > +/* { dg-options "-fstack-protector -O3 -std=gnu17" } */ > > /* PR78695: This code used to ICE in rs6000.c:find_alignment_op because > the stack protector address definition isn't associated with an insn. 
*/ > --- gcc/testsuite/gcc.target/powerpc/pr78543.c.jj 2024-06-04 > 13:19:04.0 +0200 > +++ gcc/testsuite/gcc.target/powerpc/pr78543.c2024-11-21 > 19:09:13.071218226 +0100 > @@ -1,5 +1,5 @@ > /* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */ > -/* { dg-options "-mdejagnu-c
Re: [PATCH] rs6000: Add PowerPC inline asm redzone clobber support
Hi Jakub, Thanks for doing this! On 2024/11/7 20:16, Jakub Jelinek wrote: > Hi! > > The following patch on top of the > https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667949.html > patch adds the rs6000 part of the support (the only other target I'm aware of > which clearly has a red zone as well). > > 2024-11-07 Jakub Jelinek > > * config/rs6000/rs6000.h (struct machine_function): Add > asm_redzone_clobber_seen member. > * config/rs6000/rs6000-logue.cc (rs6000_stack_info): Force > info->push_p if cfun->machine->asm_redzone_clobber_seen. > * config/rs6000/rs6000.cc (TARGET_REDZONE_CLOBBER): Redefine. > (rs6000_redzone_clobber): New function. > > * gcc.target/powerpc/asm-redzone-1.c: New test. > > --- gcc/config/rs6000/rs6000.h.jj 2024-08-30 09:09:45.407624634 +0200 > +++ gcc/config/rs6000/rs6000.h2024-11-07 12:25:44.979466003 +0100 > @@ -2424,6 +2424,7 @@ typedef struct GTY(()) machine_function > global entry. It helps to control the patchable area before and after > local entry. */ >bool global_entry_emitted; > + bool asm_redzone_clobber_seen; > } machine_function; > #endif > > --- gcc/config/rs6000/rs6000-logue.cc.jj 2024-10-25 10:00:29.389768987 > +0200 > +++ gcc/config/rs6000/rs6000-logue.cc 2024-11-07 12:36:05.985688899 +0100 > @@ -918,7 +918,7 @@ rs6000_stack_info (void) >else if (DEFAULT_ABI == ABI_V4) > info->push_p = non_fixed_size != 0; > > - else if (frame_pointer_needed) > + else if (frame_pointer_needed || cfun->machine->asm_redzone_clobber_seen) > info->push_p = 1; > >else > --- gcc/config/rs6000/rs6000.cc.jj2024-10-25 10:00:29.393768930 +0200 > +++ gcc/config/rs6000/rs6000.cc 2024-11-07 12:34:21.679163134 +0100 > @@ -1752,6 +1752,9 @@ static const scoped_attribute_specs *con > #undef TARGET_CAN_CHANGE_MODE_CLASS > #define TARGET_CAN_CHANGE_MODE_CLASS rs6000_can_change_mode_class > > +#undef TARGET_REDZONE_CLOBBER > +#define TARGET_REDZONE_CLOBBER rs6000_redzone_clobber > + > #undef TARGET_CONSTANT_ALIGNMENT > #define TARGET_CONSTANT_ALIGNMENT 
rs6000_constant_alignment > > @@ -13727,6 +13730,24 @@ rs6000_can_change_mode_class (machine_mo >return true; > } > > +/* Implement TARGET_REDZONE_CLOBBER. */ > + > +static rtx > +rs6000_redzone_clobber () > +{ > + cfun->machine->asm_redzone_clobber_seen = true; > + if (DEFAULT_ABI != ABI_V4) > +{ > + int red_zone_size = TARGET_32BIT ? 220 : 288; I think this is perfectly consistent with what we have in the function offset_below_red_zone_p. But I noticed that the ELFv2 ABI document mentions a red zone of 512 bytes: 2.2.2.4 Protected Zone The 288 bytes below the stack pointer are available as volatile program storage that is not preserved across function calls. Interrupt handlers and any other functions that might run without an explicit call must take care to preserve a protected zone, also referred to as the red zone, of 512 bytes that consists of: • The 288-byte volatile program storage region that is used to hold saved registers and local variables • An additional 224 bytes below the volatile program storage region that is set aside as a volatile system storage region for system functions ... It looks like an oversight rather than an intentional setting in the current offset_below_red_zone_p; looking forward to Segher's comments. :) BR, Kewen > + rtx base = plus_constant (Pmode, stack_pointer_rtx, > + GEN_INT (-red_zone_size)); > + rtx mem = gen_rtx_MEM (BLKmode, base); > + set_mem_size (mem, red_zone_size); > + return mem; > +} > + return NULL_RTX; > +} > + > /* Debug version of rs6000_can_change_mode_class. 
*/ > static bool > rs6000_debug_can_change_mode_class (machine_mode from, > --- gcc/testsuite/gcc.target/powerpc/asm-redzone-1.c.jj 2024-11-07 > 13:01:36.935064863 +0100 > +++ gcc/testsuite/gcc.target/powerpc/asm-redzone-1.c 2024-11-07 > 13:01:31.449142367 +0100 > @@ -0,0 +1,71 @@ > +/* { dg-do run { target lp64 } } */ > +/* { dg-options "-O2" } */ > + > +__attribute__((noipa)) int > +foo (void) > +{ > + int a = 1; > + int b = 2; > + int c = 3; > + int d = 4; > + int e = 5; > + int f = 6; > + int g = 7; > + int h = 8; > + int i = 9; > + int j = 10; > + int k = 11; > + int l = 12; > + int m = 13; > + int n = 14; > + int o = 15; > + int p = 16; > + int q = 17; > + int r = 18; > + int s = 19; > + int t = 20; > + int u = 21; > + int v = 22; > + int w = 23; > + int x = 24; > + int y = 25; > + int z = 26; > + asm volatile ("" : "+g" (a), "+g" (b), "+g" (c), "+g" (d), "+g" (e)); > + asm volatile ("" : "+g" (f), "+g" (g), "+g" (h), "+g" (i), "+g" (j)); > + asm volatile ("" : "+g" (k), "+g" (l), "+g" (m), "+g" (n), "+g" (o)); > + asm volatile ("" : "+g" (k), "+g" (l), "+g" (m), "+g" (n), "+g" (o)); > + asm volatile ("" : "+g" (p), "+g" (q), "+g" (s), "+g" (t), "+g" (u)); > + asm volatile ("" : "+g" (v),
Re: [PATCH] [RFC] Add extra 64bit SSE vector epilogue in some cases
On Fri, Nov 22, 2024 at 9:33 PM Richard Biener wrote: > > Similar to the X86_TUNE_AVX512_TWO_EPILOGUES tuning which enables > an extra 128bit SSE vector epilogue when doing 512bit AVX512 > vectorization in the main loop the following allows a 64bit SSE > vector epilogue to be generated when the previous vector epilogue > still had a vectorization factor of 16 or larger (which usually > means we are operating on char data). > > This effectively applies to 256bit and 512bit AVX2/AVX512 main loops, > a 128bit SSE main loop would already get a 64bit SSE vector epilogue. > > Together with X86_TUNE_AVX512_TWO_EPILOGUES this means three > vector epilogues for 512bit and two vector epilogues when enabling > 256bit vectorization. I have not added another tunable for this > RFC - suggestions on how to avoid inflation there welcome. > > This speeds up 525.x264_r to within 5% of the -mprefer-vector-size=128 > speed with -mprefer-vector-size=256 or -mprefer-vector-size=512 > (the latter only when -mtune-ctrl=avx512_two_epilogues is in effect). > > I have not done any further benchmarking, this merely shows the > possibility and looks for guidance on how to expose this to the > uarch tunings or to the user (at all?) if not gating on any uarch > specific tuning. > > Note 64bit SSE isn't a native vector size so we rely on emulation > being "complete" (if not, epilogue vectorization will only fail, so > it's "safe" in this regard). With the AVX512 ISA available an alternative > is a predicated epilogue, but due to possible STLF issues user control > would be required here. > > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress > (I expect some fallout in scans due to some extra epilogues, let's see) I'll do some benchmarks, guess it should be ok. > > * config/i386/i386.cc (ix86_vector_costs::finish_cost): For a > 128bit SSE epilogue request a 64bit SSE epilogue if the 128bit > SSE epilogue VF was 16 or higher. 
> --- > gcc/config/i386/i386.cc | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > index c7e70c21999..f2e8de3aafc 100644 > --- a/gcc/config/i386/i386.cc > +++ b/gcc/config/i386/i386.cc > @@ -25495,6 +25495,13 @@ ix86_vector_costs::finish_cost (const vector_costs > *scalar_costs) >&& GET_MODE_SIZE (loop_vinfo->vector_mode) == 32) > m_suggested_epilogue_mode = V16QImode; > } > + /* When a 128bit SSE vectorized epilogue still has a VF of 16 or larger > + enable a 64bit SSE epilogue. */ > + if (loop_vinfo > + && LOOP_VINFO_EPILOGUE_P (loop_vinfo) > + && GET_MODE_SIZE (loop_vinfo->vector_mode) == 16 > + && LOOP_VINFO_VECT_FACTOR (loop_vinfo).to_constant () >= 16) > +m_suggested_epilogue_mode = V8QImode; > >vector_costs::finish_cost (scalar_costs); > } > -- > 2.43.0 -- BR, Hongtao
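The kind of loop this targets can be sketched as follows (add_bytes is an illustrative example, not from the patch): with char elements, a 512-bit main loop leaves a 128-bit SSE epilogue whose VF is still 16, so a further 64-bit (V8QI) epilogue can pick up the remaining iterations.

```cpp
#include <cassert>

// Narrow-element loop: with 256/512-bit vectorization of char data the
// 128-bit epilogue still processes 16 elements per iteration, leaving up
// to 15 scalar remainder iterations that a 64-bit epilogue can cover.
void add_bytes(unsigned char *a, const unsigned char *b, int n)
{
  for (int i = 0; i < n; ++i)
    a[i] = static_cast<unsigned char>(a[i] + b[i]);
}
```

With n = 37, for example, a 512-bit main loop handles 0 or 64-element chunks, the 128-bit epilogue handles 32, and the last 5 iterations would otherwise run scalar.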
Re: [PATCH] wwwdocs: Align the DCO text for the GNU Toolchain to match community usage.
Hi Carlos, On Thu, Nov 21, 2024 at 02:26:39PM -0500, Carlos O'Donell wrote: > On 11/21/24 1:47 PM, Sam James wrote: > > Mark Wielaard writes: > >> On Thu, 2024-11-21 at 12:04 -0500, Carlos O'Donell wrote: > >> > >> I suggest including the actual clarification in the explanation, so > >> there is no confusion about what is meant by "known identity": > >> > >> It looks like the CNCF has an almost identical clarification. > > > > I will note that our DCO in Gentoo was based on the kernel's, and we > > changed ours in April last year accordingly to align with this update > > too. > > I'd like to keep the language as simple as possible, and so I do not plan to > make any further changes to my patch. OK, I can update my suggested wording to a patch that includes the elfutils/CNCF/Gentoo clarifications. > I'm concerned that terms like "false" or "misrepresent" are context dependent > and may lead to more confusion. > > I like that the linux kernel text is succinct. Succinct is good if it doesn't lead to confusion about the terms used. But that seems precisely why we got into this mess. Not defining what "real name" or "known identity" mean leaves our policy open to debate and interpretation by individual reviewers. Which is why community usage by other projects adds a clarification. Cheers, Mark
Re: [PATCH] libsanitizer: Remove -pedantic from AM_CXXFLAGS [PR117732]
On 11/22/24 5:44 PM, Jakub Jelinek wrote: Hi! We aren't the master repository for the sanitizers and clearly upstream introduces various extensions in the code. All we care about is whether it builds and works fine with GCC, so -pedantic flag is of no use to us, only maybe to upstream if they cared about it (which they clearly don't). The following patch removes those and fixes some whitespace nits at the same time. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2024-11-23 Jakub Jelinek PR sanitizer/117732 * asan/Makefile.am (AM_CXXFLAGS): Remove -pedantic. Formatting fix. (asan_files): Formatting fix. * hwasan/Makefile.am (AM_CXXFLAGS): Remove -pedantic. Formatting fix. * interception/Makefile.am (AM_CXXFLAGS): Likewise. (interception_files): Formatting fix. * libbacktrace/Makefile.am: Update copyright years. * lsan/Makefile.am (AM_CXXFLAGS): Remove -pedantic. Formatting fix. * sanitizer_common/Makefile.am (AM_CXXFLAGS): Likewise. (libsanitizer_common_la_DEPENDENCIES): Formatting fix. * tsan/Makefile.am (AM_CXXFLAGS): Remove -pedantic. Formatting fix. * ubsan/Makefile.am (AM_CXXFLAGS): Likewise. * asan/Makefile.in: Regenerate. * hwasan/Makefile.in: Regenerate. * interception/Makefile.in: Regenerate. * libbacktrace/Makefile.in: Regenerate. * lsan/Makefile.in: Regenerate. * sanitizer_common/Makefile.in: Regenerate. * tsan/Makefile.in: Regenerate. * ubsan/Makefile.in: Regenerate. OK jeff
Re: improve std::deque::_M_reallocate_map
> Hi, > looking into the reason why we still do throw_bad_alloc in the clang binary I noticed > that quite a few calls come from deque::_M_reallocate_map. This patch adds an > unreachable to limit the size of the reallocated map. _M_reallocate_map is called > only > if the new size is smaller than max_size. The map is an array holding pointers to > entries of fixed size. > > Since reallocation is done by doubling the map size, I think the maximal size > of > the map allocated is max_size / deque_buf_size rounded up, times two. This should > also be safe for overflows since we have an extra bit. > > The map size is always at least 8. Theoretically this computation may be wrong for > very large T, but in that case callers should never reallocate. > > On the testcase I get: > jh@shroud:~> ~/trunk-install-new4/bin/g++ -O2 dq.C -c ; size -A dq.o | grep > text > .text 284 0 > .text._ZNSt5dequeIiSaIiEE17_M_reallocate_mapEmb 485 0 > .text.unlikely 10 0 > jh@shroud:~> ~/trunk-install-new5/bin/g++ -O2 dq.C -c ; size -A dq.o | grep > text > .text 284 0 > .text._ZNSt5dequeIiSaIiEE17_M_reallocate_mapEmb 465 0 > .text.unlikely 10 0 > > so this saves about 20 bytes of _M_reallocate_map, which I think is worthwhile. > Curiously enough gcc14 does: > > jh@shroud:~> g++ -O2 dq.C -c ; size -A dq.o | grep text > .text 604 0 > .text.unlikely 10 0 > > which is 145 bytes smaller. The obvious difference is that _M_reallocate_map gets > inlined. > Compiling the gcc14 preprocessed file with trunk gives: > > jh@shroud:~> g++ -O2 dq.C -S ; size -A dq.o | grep text > .text 762 0 > .text.unlikely 10 0 > > So the inlining is due to changes on the libstdc++ side, but the code size growth is due > to > something else. > > For clang this reduced the number of throw_bad_array_new_length calls from 121 to 61. > > Regtested x86_64-linux, OK? > > libstdc++-v3/ChangeLog: > > * include/bits/deque.tcc: Compute maximal size of alloc_map. > > gcc/testsuite/ChangeLog: > > * g++.dg/tree-ssa/deque.C: New test. 
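The optimizer hint the patch relies on can be sketched in isolation (names here are illustrative, not the libstdc++ internals):

```cpp
#include <cassert>
#include <cstddef>

// Sketch of the __builtin_unreachable idiom: asserting an upper bound on
// a size lets GCC/Clang prove the subsequent multiplication cannot
// overflow, so downstream overflow/allocation-failure paths (such as the
// throw_bad_alloc call discussed above) become dead code.
std::size_t scaled_alloc_size(std::size_t n)
{
  const std::size_t bound = std::size_t(1) << 20;
  if (n > bound)
    __builtin_unreachable();   // caller-guaranteed invariant: n <= bound
  return n * sizeof(void *);   // now known not to overflow
}
```

Executing the function with n above the bound would be undefined behavior, which is exactly the contract: the hint is only valid because _M_reallocate_map is never called with a size beyond max_size.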
> > > diff --git a/gcc/testsuite/g++.dg/tree-ssa/deque.C > b/gcc/testsuite/g++.dg/tree-ssa/deque.C > new file mode 100644 > index 000..c79de9b2161 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/tree-ssa/deque.C > @@ -0,0 +1,9 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O1 -fdump-tree-optimized" } */ > +#include > +void > +test(std::deque &q, int v) > +{ > + q.push_back (v); > +} > +// { dg-final { scan-tree-dump-not "throw_bad_alloc" "optimized" } } > diff --git a/libstdc++-v3/include/bits/deque.tcc > b/libstdc++-v3/include/bits/deque.tcc > index deb010a0ebb..653354f90a7 100644 > --- a/libstdc++-v3/include/bits/deque.tcc > +++ b/libstdc++-v3/include/bits/deque.tcc > @@ -955,6 +955,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER > size_type __new_map_size = this->_M_impl._M_map_size >+ std::max(this->_M_impl._M_map_size, > __nodes_to_add) + 2; > + size_type __max = __deque_buf_size(sizeof(_Tp)); > + if (__new_map_size > ((max_size() + __deque_buf_size(sizeof(_Tp)) - 1) > + / __deque_buf_size(sizeof(_Tp))) * 2) > + __builtin_unreachable (); I forgot a dead variable here, which yields a -Wall warning. Also I noticed that the deque copy operation may be optimized if we know that size <= max_size. With this change we can now fully optimize away unused deque copies. I am not sure how common it is in practice, but I think all the containers should be optimizable this way. Next interesting challenge (seen in the llvm binary) is std::hash. Here is the updated patch, which was regtested and bootstrapped on x86_64-linux libstdc++-v3/ChangeLog: * include/bits/deque.tcc (std::deque::_M_reallocate_map): Add __builtin_unreachable check to declare that maps are not very large. * include/bits/stl_deque.h (std::deque::size): Add __builtin_unreachable to check for maximal size of map. gcc/testsuite/ChangeLog: * g++.dg/tree-ssa/deque-1.C: New test. * g++.dg/tree-ssa/deque-2.C: New test. 
diff --git a/gcc/testsuite/g++.dg/tree-ssa/deque-1.C b/gcc/testsuite/g++.dg/tree-ssa/deque-1.C new file mode 100644 index 000..c639ebb1a5f --- /dev/null +++ b/gcc/testsuite/g++.dg/tree-ssa/deque-1.C @@ -0,0 +1,9 @@ +// { dg-do compile } +// { dg-options "-O1 -fdump-tree-optimized" } +#include +void +test(std::deque &q, int v) +{ + q.push_back (v); +} +// { dg-final { scan-tree-dump-not "throw_bad_alloc" "optimized" } } diff --git a/gcc/testsuite/g++.dg/tree-ssa/deque-2.C b/gcc/testsuite/g++.dg/tree-ssa/deque-2.C new file mode 100644 index 000..7e268b3f018 --- /dev/null +++ b/gcc/testsuite/g++.dg/tree-ssa/deque-2.C @@ -0,0 +1,10 @@ +// { dg-do compile } +
Re: [PATCH] wwwdocs: Align the DCO text for the GNU Toolchain to match community usage.
Hi Jason, On Fri, Nov 22, 2024 at 05:13:30PM +0100, Jason Merrill wrote: > My take has been that this change is not necessary for us because > the FSF can accept copyright assignment for pseudonymous > contributions, so individual reviewers don't need to adjudicate > whether a particular pseudonym is sufficiently "known". I partially agree. But I still think we need to clarify what is meant by "real name", "known identity" or "(anonymous) pseudonym". The problem with the term "real name" is that it is sometimes associated with "legal name", which depends on tricky interpretation of local jurisdiction. People might have been called Jeffery by their parents and the state, but now go by Jeff. People might have adopted their spouse's last name, or use both now. Or they (want to get) divorced and revert to a previous name. People might be in transition. Or simply decide to go by just their initials BA. If that is how they identify themselves publicly and that is what they feel is their "real name" then I think we can call that a "known identity" and feel confident they can certify the conditions of the DCO in place for the project. Where it gets tricky is if they use an (anonymous) pseudonym/identity just for contributing to the project. They might not want anybody to know how/what they contribute to other projects. They might use a (throwaway) email address different from their normal email address because the organization or company they work for doesn't want it to be publicly known that they contribute to the project (and whether they have been given a corporate disclaimer). In that case they cannot really certify the conditions of the DCO and might have to go through the FSF process to clarify things. It would be good to be explicit about the different usages of "pseudonyms" and which processes have to be followed. Cheers, Mark
[Bug fortran/84869] [12/13/14/15 Regression] ICE in gfc_class_len_get, at fortran/trans-expr.c:233
Fixed as 'obvious' on 13-branch to mainline with commit r15-5629-g470ebd31843db58fc503ccef38b82d0da93c65e4 An error with PR number in the mainline ChangeLogs will be corrected tomorrow. Fortran: Fix segfault in allocation of unlimited poly array [PR84869] 2024-11-24 Paul Thomas gcc/fortran/ChangeLog PR fortran/84869 * trans-expr.cc (trans_class_vptr_len_assignment): To access the '_len' field, 're' must be unlimited polymorphic. gcc/testsuite/ PR fortran/84869 * gfortran.dg/pr84869.f90: Comment out test of component refs.
Re: [RFC/RFA][PATCH v6 03/12] RISC-V: Add CRC expander to generate faster CRC.
On 11/24/24 9:27 AM, Mariam Arutunian wrote: Thank you very much! I'll have a look. Please let me know if there's anything specific you’d like me to address. Not yet. Things are looking really good. Enough so that I've been diving into the small word targets (avr, pru, rl78). Jeff
Re: [PATCH] pa: Remove pa_section_type_flags
I don't see any regressions with this change. Patch is okay if you remove declaration of pa_section_type_flags in pa.cc. Dave On Thu, Nov 21, 2024 at 09:04:52PM +0800, Xi Ruoyao wrote: > It's no longer needed since r15-4842 (when the target-independent code > started to handle the case). > > gcc/ChangeLog: > > * config/pa/pa.cc (pa_section_type_flags): Remove. > (TARGET_SECTION_TYPE_FLAGS): Remove. > --- > > I don't have a hppa machine to test this, but conceptually this should > be correct. Ok for trunk? > > gcc/config/pa/pa.cc | 21 - > 1 file changed, 21 deletions(-) > > diff --git a/gcc/config/pa/pa.cc b/gcc/config/pa/pa.cc > index 94ee7dbfa8e..783b922a5fc 100644 > --- a/gcc/config/pa/pa.cc > +++ b/gcc/config/pa/pa.cc > @@ -407,8 +407,6 @@ static size_t n_deferred_plabels = 0; > > #undef TARGET_LEGITIMATE_CONSTANT_P > #define TARGET_LEGITIMATE_CONSTANT_P pa_legitimate_constant_p > -#undef TARGET_SECTION_TYPE_FLAGS > -#define TARGET_SECTION_TYPE_FLAGS pa_section_type_flags > #undef TARGET_LEGITIMATE_ADDRESS_P > #define TARGET_LEGITIMATE_ADDRESS_P pa_legitimate_address_p > > @@ -10900,25 +10898,6 @@ pa_legitimate_constant_p (machine_mode mode, rtx x) >return true; > } > > -/* Implement TARGET_SECTION_TYPE_FLAGS. */ > - > -static unsigned int > -pa_section_type_flags (tree decl, const char *name, int reloc) > -{ > - unsigned int flags; > - > - flags = default_section_type_flags (decl, name, reloc); > - > - /* Function labels are placed in the constant pool. This can > - cause a section conflict if decls are put in ".data.rel.ro" > - or ".data.rel.ro.local" using the __attribute__ construct. */ > - if (strcmp (name, ".data.rel.ro") == 0 > - || strcmp (name, ".data.rel.ro.local") == 0) > -flags |= SECTION_WRITE | SECTION_RELRO; > - > - return flags; > -} > - > /* pa_legitimate_address_p recognizes an RTL expression that is a > valid memory address for an instruction. The MODE argument is the > machine mode for the MEM expression that wants to use this address. 
> -- > 2.47.0 > signature.asc Description: PGP signature
[SPARC] Fix PR target/117715
This fixes the vectorization regressions present on the SPARC by switching from vcond[u] patterns to vec_cmp[u] + vcond_mask_ patterns. While I was at it, I merged the patterns for V4HI/V2SI and V8QI enabled with VIS 3/VIS 4 to follow the model of those enabled with VIS 4B, and standardized the mnemonics to the version documented in the Oracle SPARC architecture 2015. Bootstrapped/regtested on SPARC64/Solaris 10.4, applied on the mainline. 2024-11-24 Eric Botcazou PR target/117715 * config/sparc/sparc-protos.h (sparc_expand_vcond): Rename to... (sparc_expand_vcond_mask): ...this. * config/sparc/sparc.cc (TARGET_VECTORIZE_GET_MASK_MODE): Define. (sparc_vis_init_builtins): Adjust the CODE_FOR_* identifiers. (sparc_get_mask_mode): New function. (sparc_expand_vcond): Rename to... (sparc_expand_vcond_mask): ...this and adjust. * config/sparc/sparc.md (unspec): Remove UNSPEC_FCMP & UNSPEC_FUCMP and rename UNSPEC_FPUCMPSHL into UNSPEC_FPCMPUSHL. (fcmp_vis): Merge into... (fpcmp8_vis): Merge into... (fpcmp_vis): ...this. (fucmp8_vis): Merge into... (fpcmpu_vis): Merge into... (fpcmpu_vis): ...this. (vec_cmp): New expander. (vec_cmpu): Likewise. (vcond): Delete. (vcondv8qiv8qi): Likewise. (vcondu): Likewise. (vconduv8qiv8qi): Likewise. (vcond_mask_): New expander. (fpcmpshl): Adjust. (fpcmpushl): Likewise. (fpcmpdeshl): Likewise. (fpcmpurshl): Likewise. * doc/md.texi (vcond_mask_len_): Fix pasto. 2024-11-24 Eric Botcazou * gcc.target/sparc/20230328-1.c: Adjust to new mnemonics. * gcc.target/sparc/20230328-4.c: Likewise. * gcc.target/sparc/fcmp.c: Likewise. * gcc.target/sparc/fucmp.c: Likewise. 
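The affected source pattern can be sketched as follows (max_lanes is an illustrative name): with the vcond[u] patterns removed, the vectorizer lowers such loops via vec_cmp (build a lane-wise comparison mask) plus vcond_mask (select between the two operands under that mask).

```cpp
#include <cassert>

// Conditional-select loop: vectorized as a vec_cmp producing a mask of
// per-lane a[i] > b[i] results, then a vcond_mask selecting a[i] or b[i]
// lane by lane under that mask.
void max_lanes(int *r, const int *a, const int *b, int n)
{
  for (int i = 0; i < n; ++i)
    r[i] = a[i] > b[i] ? a[i] : b[i];
}
```

Splitting compare and select into two named patterns is what lets the middle end reuse the comparison mask across multiple selects, which the fused vcond pattern could not express.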
-- Eric Botcazou
diff --git a/gcc/config/sparc/sparc-protos.h b/gcc/config/sparc/sparc-protos.h index bc306083e5f..333f48e82da 100644 --- a/gcc/config/sparc/sparc-protos.h +++ b/gcc/config/sparc/sparc-protos.h @@ -106,7 +106,7 @@ extern void sparc_expand_compare_and_swap (rtx op[]); extern void sparc_expand_vector_init (rtx, rtx); extern void sparc_expand_vec_perm_bmask(machine_mode, rtx); extern bool sparc_expand_conditional_move (machine_mode, rtx *); -extern void sparc_expand_vcond (machine_mode, rtx *, int, int); +extern void sparc_expand_vcond_mask (machine_mode, rtx *, int); unsigned int sparc_regmode_natural_size (machine_mode); #endif /* RTX_CODE */ diff --git a/gcc/config/sparc/sparc.cc b/gcc/config/sparc/sparc.cc index 3935a97fac8..f1944e510e5 100644 --- a/gcc/config/sparc/sparc.cc +++ b/gcc/config/sparc/sparc.cc @@ -719,6 +719,7 @@ static HOST_WIDE_INT sparc_constant_alignment (const_tree, HOST_WIDE_INT); static bool sparc_vectorize_vec_perm_const (machine_mode, machine_mode, rtx, rtx, rtx, const vec_perm_indices &); +static opt_machine_mode sparc_get_mask_mode (machine_mode); static bool sparc_can_follow_jump (const rtx_insn *, const rtx_insn *); static HARD_REG_SET sparc_zero_call_used_regs (HARD_REG_SET); static machine_mode sparc_c_mode_for_floating_type (enum tree_index); @@ -972,6 +973,9 @@ char sparc_hard_reg_printed[8]; #undef TARGET_VECTORIZE_VEC_PERM_CONST #define TARGET_VECTORIZE_VEC_PERM_CONST sparc_vectorize_vec_perm_const +#undef TARGET_VECTORIZE_GET_MASK_MODE +#define TARGET_VECTORIZE_GET_MASK_MODE sparc_get_mask_mode + #undef TARGET_CAN_FOLLOW_JUMP #define TARGET_CAN_FOLLOW_JUMP sparc_can_follow_jump @@ -11271,40 +11275,40 @@ sparc_vis_init_builtins (void) /* Pixel compare. 
*/ if (TARGET_ARCH64) { - def_builtin_const ("__builtin_vis_fcmple16", CODE_FOR_fcmple16di_vis, + def_builtin_const ("__builtin_vis_fcmple16", CODE_FOR_fpcmple16di_vis, SPARC_BUILTIN_FCMPLE16, di_ftype_v4hi_v4hi); - def_builtin_const ("__builtin_vis_fcmple32", CODE_FOR_fcmple32di_vis, + def_builtin_const ("__builtin_vis_fcmple32", CODE_FOR_fpcmple32di_vis, SPARC_BUILTIN_FCMPLE32, di_ftype_v2si_v2si); - def_builtin_const ("__builtin_vis_fcmpne16", CODE_FOR_fcmpne16di_vis, + def_builtin_const ("__builtin_vis_fcmpne16", CODE_FOR_fpcmpne16di_vis, SPARC_BUILTIN_FCMPNE16, di_ftype_v4hi_v4hi); - def_builtin_const ("__builtin_vis_fcmpne32", CODE_FOR_fcmpne32di_vis, + def_builtin_const ("__builtin_vis_fcmpne32", CODE_FOR_fpcmpne32di_vis, SPARC_BUILTIN_FCMPNE32, di_ftype_v2si_v2si); - def_builtin_const ("__builtin_vis_fcmpgt16", CODE_FOR_fcmpgt16di_vis, + def_builtin_const ("__builtin_vis_fcmpgt16", CODE_FOR_fpcmpgt16di_vis, SPARC_BUILTIN_FCMPGT16, di_ftype_v4hi_v4hi); - def_builtin_const ("__builtin_vis_fcmpgt32", CODE_FOR_fcmpgt32di_vis, + def_builtin_const ("__builtin_vis_fcmpgt32", CODE_FOR_fpcmpgt32di_vis, SPARC_BUILTIN_FCMPGT32, di_ftype_v2si_v2si); - def_builtin_const ("__builtin_vis_fcmpeq16", CODE_FOR_fcmpeq16
Re: [PATCH v8] Target-independent store forwarding avoidance.
On 11/9/24 2:48 AM, Konstantinos Eleftheriou wrote: From: kelefth This pass detects cases of expensive store forwarding and tries to avoid them by reordering the stores and using suitable bit insertion sequences. For example it can transform this: strb w2, [x1, 1] ldr x0, [x1] # Expensive store forwarding to larger load. To: ldr x0, [x1] strb w2, [x1] bfi x0, x2, 0, 8 Assembly like this can appear with bitfields or type punning / unions. On stress-ng when running the cpu-union microbenchmark the following speedups have been observed. Neoverse-N1: +29.4% Intel Coffeelake: +13.1% AMD 5950X: +17.5% The transformation is rejected on cases that would cause store_bit_field to generate subreg expressions on different register classes. Files avoid-store-forwarding-4.c and avoid-store-forwarding-5.c contain such cases and have been marked as XFAIL. There is special handling for machines with BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN. The need for this came up from an issue in the H8 architecture, which uses big-endian ordering, but BITS_BIG_ENDIAN is false. In that case, the START parameter of store_bit_field needs to be calculated from the end of the destination register. gcc/ChangeLog: * Makefile.in (OBJS): Add avoid-store-forwarding.o. * common.opt (favoid-store-forwarding): New option. * common.opt.urls: Regenerate. * doc/invoke.texi: New param store-forwarding-max-distance. * doc/passes.texi: Document new pass. * doc/tm.texi: Regenerate. * doc/tm.texi.in: Document new pass. * params.opt (store-forwarding-max-distance): New param. * passes.def: Add pass_rtl_avoid_store_forwarding before pass_early_remat. * target.def (avoid_store_forwarding_p): New DEFHOOK. * target.h (struct store_fwd_info): Declare. * targhooks.cc (default_avoid_store_forwarding_p): New function. * targhooks.h (default_avoid_store_forwarding_p): Declare. * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare. * avoid-store-forwarding.cc: New file. * avoid-store-forwarding.h: New file. 
gcc/testsuite/ChangeLog: * gcc.target/aarch64/avoid-store-forwarding-1.c: New test. * gcc.target/aarch64/avoid-store-forwarding-2.c: New test. * gcc.target/aarch64/avoid-store-forwarding-3.c: New test. * gcc.target/aarch64/avoid-store-forwarding-4.c: New test. * gcc.target/aarch64/avoid-store-forwarding-5.c: New test. * gcc.target/x86_64/abi/callabi/avoid-store-forwarding-1.c: New test. * gcc.target/x86_64/abi/callabi/avoid-store-forwarding-2.c: New test. Signed-off-by: Philipp Tomsich Signed-off-by: Konstantinos Eleftheriou Series-version: 8 Series-changes: 8 - Fix store_bit_field call for big-endian targets, where BITS_BIG_ENDIAN is false. - Handle store_forwarding_max_distance = 0 as a special case that disables cost checks for avoid-store-forwarding. - Update testcases for AArch64 and add testcases for x86-64. Series-changes: 7 - Fix bug when copying back the load register, in the case that the load is eliminated. Series-changes: 6 - Reject the transformation on cases that would cause store_bit_field to generate subreg expressions on different register classes. Files avoid-store-forwarding-4.c and avoid-store-forwarding-5.c contain such cases and have been marked as XFAIL. - Use optimize_bb_for_speed_p instead of optimize_insn_for_speed_p. - Inline and remove get_load_mem. - New implementation for is_store_forwarding. - Refactor the main loop in avoid_store_forwarding. - Avoid using the word 'forwardings'. - Use lowpart_subreg instead of validate_subreg + gen_rtx_subreg. - Don't use df_insn_rescan where not needed. - Change order of emitting stores and bit insert instructions. - Check and reject loads for which the dest register overlaps with src. - Remove unused variable. - Change some gen_mov_insn function calls to gen_rtx_SET. - Subtract the cost of eliminated load, instead of 1, for the total cost. - Use delete_insn instead of set_insn_deleted. - Regenerate common.opt.urls. - Add some more comments. Series-changes: 5 - Fix bug with BIG_ENDIAN targets. 
- Fix bug with unrecognized instructions. - Fix / simplify pass init/fini. Series-changes: 4 - Change pass scheduling to run after sched1. - Add target hook to decide whether a store forwarding instance should be avoided or not. - Fix bugs. Series-changes: 3 - Only emit SUBREG after calling validate_subreg. - Fix memory corruption due to vec self-reference. - Fix bitmap_bit_in_range_p ICE due to BLKMode. - Reject
Re: [PATCH] RISC-V: Ensure vtype for full-register moves [PR117544].
On 11/22/24 10:48 AM, Robin Dapp wrote: Hi, as discussed in PR117544 the VTYPE register is not preserved across function calls. Even though vmv1r-like instructions operate independently of the actual vtype, they still require a valid vtype. As we cannot guarantee that the vtype is valid, we must make sure to emit a vsetvl between a function call and vmv1r.v. This patch makes the necessary changes by splitting the full-reg-move insns into patterns that use the vtype register and adding vmov to the types of instructions requiring a vset. Regtested on rv64gcv, but the CI knows best :) Regards Robin PR target/117544 gcc/ChangeLog: * config/riscv/vector.md (*mov_whole): Split. (*mov_fract): Ditto. (*mov): Ditto. (*mov_vls): Ditto. (*mov_reg_whole_vtype): New pattern with vtype use. (*mov_fract_vtype): Ditto. (*mov_vtype): Ditto. (*mov_vls_vtype): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/abi-call-args-4.c: Expect vsetvl. * gcc.target/riscv/rvv/base/pr117544.c: New test. OK. Jeff
Re: [PATCH 4/4] RISC-V: Add -fcf-protection=[full|branch|return] to enable zicfiss, zicfilp.
I guess this should also adjust the testcase as well? On Fri, Nov 15, 2024 at 6:55 PM Monk Chiang wrote: > > gcc/ChangeLog: > * gcc/config/riscv/riscv.cc > (is_zicfilp_p): New function. > (is_zicfiss_p): New function. > * gcc/config/riscv/riscv-zicfilp.cc: Update. > * gcc/config/riscv/riscv.h: Update. > * gcc/config/riscv/riscv.md: Update. > --- > gcc/config/riscv/riscv-zicfilp.cc | 2 +- > gcc/config/riscv/riscv.cc | 52 --- > gcc/config/riscv/riscv.h | 8 +++-- > gcc/config/riscv/riscv.md | 10 +++--- > 4 files changed, 52 insertions(+), 20 deletions(-) > > diff --git a/gcc/config/riscv/riscv-zicfilp.cc > b/gcc/config/riscv/riscv-zicfilp.cc > index f3015385aa9..1865a90bd04 100644 > --- a/gcc/config/riscv/riscv-zicfilp.cc > +++ b/gcc/config/riscv/riscv-zicfilp.cc > @@ -150,7 +150,7 @@ public: >/* opt_pass methods: */ >virtual bool gate (function *) > { > - return TARGET_ZICFILP; > + return is_zicfilp_p (); > } > >virtual unsigned int execute (function *) > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc > index bda982f085c..cb69eaa3c43 100644 > --- a/gcc/config/riscv/riscv.cc > +++ b/gcc/config/riscv/riscv.cc > @@ -6615,7 +6615,7 @@ riscv_legitimize_call_address (rtx addr) >rtx reg = RISCV_CALL_ADDRESS_TEMP (Pmode); >riscv_emit_move (reg, addr); > > - if (TARGET_ZICFILP) > + if (is_zicfilp_p ()) > { > rtx sw_guarded = RISCV_CALL_ADDRESS_LPAD (Pmode); > emit_insn (gen_set_guarded (Pmode, reg)); > @@ -6625,7 +6625,7 @@ riscv_legitimize_call_address (rtx addr) >return reg; > } > > - if (TARGET_ZICFILP && REG_P (addr)) > + if (is_zicfilp_p () && REG_P (addr)) > emit_insn (gen_set_lpl (Pmode, const1_rtx)); > >return addr; > @@ -7391,7 +7391,7 @@ riscv_save_reg_p (unsigned int regno) >if (regno == GP_REGNUM || regno == THREAD_POINTER_REGNUM) > return false; > > - if (regno == RETURN_ADDR_REGNUM && TARGET_ZICFISS) > + if (regno == RETURN_ADDR_REGNUM && is_zicfiss_p ()) > return true; > >/* We must save every register used in this function. 
If this is not a > @@ -10202,10 +10202,10 @@ riscv_file_end_indicate_exec_stack () >long GNU_PROPERTY_RISCV_FEATURE_1_AND = 0; >unsigned long feature_1_and = 0; > > - if (TARGET_ZICFISS) > + if (is_zicfilp_p ()) > feature_1_and |= 0x1 << 0; > > - if (TARGET_ZICFILP) > + if (is_zicfiss_p ()) > feature_1_and |= 0x1 << 1; > >if (feature_1_and) > @@ -10265,7 +10265,7 @@ riscv_output_mi_thunk (FILE *file, tree thunk_fndecl > ATTRIBUTE_UNUSED, >/* Mark the end of the (empty) prologue. */ >emit_note (NOTE_INSN_PROLOGUE_END); > > - if (TARGET_ZICFILP) > + if (is_zicfilp_p ()) > emit_insn(gen_lpad (const1_rtx)); > >/* Determine if we can use a sibcall to call FUNCTION directly. */ > @@ -10488,6 +10488,20 @@ riscv_override_options_internal (struct gcc_options > *opts) > >/* Convert -march and -mrvv-vector-bits to a chunks count. */ >riscv_vector_chunks = riscv_convert_vector_chunks (opts); > + > + if (opts->x_flag_cf_protection != CF_NONE) > +{ > + if ((opts->x_flag_cf_protection & CF_RETURN) == CF_RETURN > + && !TARGET_ZICFISS) > + error ("%<-fcf-protection%> is not compatible with this target"); > + > + if ((opts->x_flag_cf_protection & CF_BRANCH) == CF_BRANCH > + && !TARGET_ZICFILP) > + error ("%<-fcf-protection%> is not compatible with this target"); > + > + opts->x_flag_cf_protection > + = (cf_protection_level) (opts->x_flag_cf_protection | CF_SET); > +} > } > > /* Implement TARGET_OPTION_OVERRIDE. */ > @@ -10778,7 +10792,7 @@ riscv_trampoline_init (rtx m_tramp, tree fndecl, rtx > chain_value) > >/* Work out the offsets of the pointers from the start of the > trampoline code. */ > - if (!TARGET_ZICFILP) > + if (!is_zicfilp_p ()) > gcc_assert (ARRAY_SIZE (trampoline) * 4 == TRAMPOLINE_CODE_SIZE); >else > gcc_assert (ARRAY_SIZE (trampoline_cfi) * 4 == TRAMPOLINE_CODE_SIZE); > @@ -10806,7 +10820,7 @@ riscv_trampoline_init (rtx m_tramp, tree fndecl, rtx > chain_value) >unsigned insn_count = 0; > >/* Insert lpad, if zicfilp is enabled. 
*/ > - if (TARGET_ZICFILP) > + if (is_zicfilp_p ()) > { > unsigned HOST_WIDE_INT lpad_code; > lpad_code = OPCODE_AUIPC | (0 << SHIFT_RD) | (lp_value << IMM_BITS); > @@ -10868,7 +10882,7 @@ riscv_trampoline_init (rtx m_tramp, tree fndecl, rtx > chain_value) >insn_count++; > >/* For zicfilp only, insert lui t2, 1, because use jr t0. */ > - if (TARGET_ZICFILP) > + if (is_zicfilp_p ()) > { > unsigned HOST_WIDE_INT set_lpl_code; > set_lpl_code = OPCODE_LUI > @@ -10898,7 +10912,7 @@ riscv_trampoline_init (rtx m_tramp, tree fnde
Re: [PATCH 1/2] asan: Support dynamic shadow offset
Committed with changelog update and minor tweak (move RISC-V bits to second patch) On Wed, Nov 20, 2024 at 4:18 AM Jeff Law wrote: > > > > On 11/14/24 9:14 PM, Kito Cheng wrote: > > AddressSanitizer has supported dynamic shadow offsets since 2016[1], but > > GCC hasn't implemented this yet because targets using dynamic shadow > > offsets, such as Fuchsia and iOS, are mostly unsupported in GCC. > > > > However, RISC-V 64 switched to dynamic shadow offsets this year[2] because > > virtual memory space support varies across different RISC-V cores, such as > > Sv39, Sv48, and Sv57. We realized that the best way to handle this > > situation is by using a dynamic shadow offset to obtain the offset at > > runtime. > > > > We introduce a new target hook, TARGET_ASAN_DYNAMIC_SHADOW_OFFSET_P, to > > determine if the target is using a dynamic shadow offset, so this change > > won't affect the static offset path. Additionally, TARGET_ASAN_SHADOW_OFFSET > > continues to work even if TARGET_ASAN_DYNAMIC_SHADOW_OFFSET_P is non-zero, > > ensuring that KASAN functions as expected. > > > > This patch set has been verified on the Banana Pi F3, currently one of the > > most popular RISC-V development boards. All AddressSanitizer-related tests > > passed without introducing new regressions. > > > > It was also verified on AArch64 and x86_64 with no regressions in > > AddressSanitizer. > > > > [1] > > https://github.com/llvm/llvm-project/commit/130a190bf08a3d955d9db24dac936159dc049e12 > > [2] > > https://github.com/llvm/llvm-project/commit/da0c8b275564f814a53a5c19497669ae2d99538d > > --- > > gcc/asan.cc | 80 --- > > gcc/asan.h| 3 ++ > > gcc/config/riscv/riscv.cc | 3 ++ > > gcc/doc/tm.texi | 6 ++- > > gcc/doc/tm.texi.in| 2 + > > gcc/sanopt.cc | 4 ++ > > gcc/target.def| 8 +++- > > gcc/toplev.cc | 3 +- > > 8 files changed, 101 insertions(+), 8 deletions(-) > Needs a ChangeLog. New functions should have function comments. > > OK with those changes if nobody has objected in 48hrs. 
> > Thanks, > jeff
Re: [PATCH 2/2] RISC-V: Use dynamic shadow offset
committed :) On Wed, Nov 20, 2024 at 3:26 AM Jeff Law wrote: > > > > On 11/14/24 9:14 PM, Kito Cheng wrote: > > Switch to dynamic offset so that we can support Sv39, Sv48, and Sv57 at > > the same time without building multiple libasan versions! > > > > [1] > > https://github.com/llvm/llvm-project/commit/da0c8b275564f814a53a5c19497669ae2d99538d > > > > gcc/ChangeLog: > > > > * config/riscv/riscv.cc (riscv_asan_shadow_offset): Use dynamic > > offset for RV64. > > (riscv_asan_dynamic_shadow_offset_p): New. > OK once prereqs are committed. > > jeff >
Re: [PATCH v2] sched1: parameterize pressure scheduling spilling aggressiveness [PR/114729]
On 11/6/24 12:11, Vineet Gupta wrote: > changes since v1 > * Changed target hook to --param > * squash addon patch for RISC-V opting-in, testcase here > * updated changelog with latest perf numbers ping ! > --- > > sched1 computes ECC (Excess Change Cost) for each insn, which represents > the register pressure attributed to the insn. > Currently the pressure-sensitive scheduling algorithm deliberately ignores > negative values (pressure reduction), making them 0 (neutral), leading > to more spills. This happens due to the assumption that the compiler has > a reasonably accurate processor pipeline scheduling model and thus tries > to aggressively fill pipeline bubbles with spill slots. > > This, however, might not be true, as the model might not be available for > certain uarches, or might not be applicable, especially for modern out-of-order cores. > > The existing heuristic induces spill frenzy on RISC-V, noticeably so on > SPEC2017 507.Cactu. If insn scheduling is disabled completely, the > total dynamic icounts for this workload are cut in half, from > ~2.5 trillion insns to ~1.3 (w/ -fno-schedule-insns). > > This patch adds --param=cycle-accurate-model={0,1} to gate the spill > behavior. > > - The default (1) preserves existing spill behavior. > > - targets/uarches sensitive to spilling can override the param to (0) >to get the reverse effect. The RISC-V backend does so too. > > The actual perf numbers are very promising. 
> > (1) On RISC-V BPI-F3 in-order CPU, -Ofast -march=rv64gcv_zba_zbb_zbs: > > Before: > -- > Performance counter stats for './cactusBSSN_r_base.rivos spec_ref.par': > > 4,917,712.97 msec task-clock:u #1.000 CPUs > utilized > 5,314 context-switches:u #1.081 /sec > 3 cpu-migrations:u #0.001 /sec >204,784 page-faults:u# 41.642 /sec > 7,868,291,222,513 cycles:u #1.600 GHz > 2,615,069,866,153 instructions:u #0.33 insn per > cycle > 10,799,381,890 branches:u #2.196 M/sec > 15,714,572 branch-misses:u #0.15% of all > branches > > After: > - > Performance counter stats for './cactusBSSN_r_base.rivos spec_ref.par': > > 4,552,979.58 msec task-clock:u #0.998 CPUs > utilized >205,020 context-switches:u # 45.030 /sec > 2 cpu-migrations:u #0.000 /sec >204,221 page-faults:u# 44.854 /sec > 7,285,176,204,764 cycles:u(7.4% faster)#1.600 GHz > 2,145,284,345,397 instructions:u (17.96% fewer)#0.29 insn per > cycle > 10,799,382,011 branches:u #2.372 M/sec > 16,235,628 branch-misses:u #0.15% of all > branches > > (2) Wilco reported 20% perf gains on aarch64 Neoverse V2 runs. > > gcc/ChangeLog: > PR target/114729 > * params.opt (--param=cycle-accurate-model=): New opt. > * doc/invoke.texi (cycle-accurate-model): Document. > * haifa-sched.cc (model_excess_group_cost): Return negative > delta if param_cycle_accurate_model is 0. > (model_excess_cost): Ceil negative baseECC to 0 only if > param_cycle_accurate_model is 1. > Dump the actual ECC value. > * config/riscv/riscv.cc (riscv_option_override): Set param > to 0. > > gcc/testsuite/ChangeLog: > PR target/114729 > * gcc.target/riscv/riscv.exp: Enable new tests to build. > * gcc.target/riscv/sched1-spills/spill1.cpp: Add new test. 
> > Signed-off-by: Vineet Gupta > --- > gcc/config/riscv/riscv.cc | 4 +++ > gcc/doc/invoke.texi | 7 > gcc/haifa-sched.cc| 32 ++- > gcc/params.opt| 4 +++ > gcc/testsuite/gcc.target/riscv/riscv.exp | 2 ++ > .../gcc.target/riscv/sched1-spills/spill1.cpp | 32 +++ > 6 files changed, 73 insertions(+), 8 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/riscv/sched1-spills/spill1.cpp > > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc > index 2e9ac280c8f2..75fcadfc3b58 100644 > --- a/gcc/config/riscv/riscv.cc > +++ b/gcc/config/riscv/riscv.cc > @@ -10549,6 +10549,10 @@ riscv_option_override (void) > param_sched_pressure_algorithm, > SCHED_PRESSURE_MODEL); > > + SET_OPTION_IF_UNSET (&global_options, &global_options_set, > +param_cycle_accurate_model, > +0); > + >/* Function to allocate machine-dependent function status. */ >init_machine_status = &riscv_init_machine_status; > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index 714616
Re: [PATCH] sched1: debug/model: dump predecessor list and BB num [NFC]
On 11/6/24 14:20, Vineet Gupta wrote: > This is broken out of predecessor promotion patch so that debugging can > proceed during stage1 restrictions. > > Signed-off-by: Vineet Gupta ping ! > --- > gcc/haifa-sched.cc | 10 +- > gcc/sched-rgn.cc | 14 -- > 2 files changed, 17 insertions(+), 7 deletions(-) > > diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc > index cd4b6baddcd2..4d3977576eed 100644 > --- a/gcc/haifa-sched.cc > +++ b/gcc/haifa-sched.cc > @@ -3762,10 +3762,10 @@ model_choose_insn (void) >count = param_max_sched_ready_insns; >while (count > 0 && insn) > { > - fprintf (sched_dump, ";;\t+--- %d [%d, %d, %d, %d]\n", > + fprintf (sched_dump, ";;\t+--- %d [%d, %d, %d, %d][%d]\n", > INSN_UID (insn->insn), insn->model_priority, > insn->depth + insn->alap, insn->depth, > -INSN_PRIORITY (insn->insn)); > +INSN_PRIORITY (insn->insn), insn->unscheduled_preds); > count--; > insn = insn->next; > } > @@ -3859,11 +3859,11 @@ model_reset_queue_indices (void) > to sched_dump. */ > > static void > -model_dump_pressure_summary (void) > +model_dump_pressure_summary (basic_block bb) > { >int pci, cl; > > - fprintf (sched_dump, ";; Pressure summary:"); > + fprintf (sched_dump, ";; Pressure summary (bb %d):", bb->index); >for (pci = 0; pci < ira_pressure_classes_num; pci++) > { >cl = ira_pressure_classes[pci]; > @@ -3902,7 +3902,7 @@ model_start_schedule (basic_block bb) >model_curr_point = 0; >initiate_reg_pressure_info (df_get_live_in (bb)); >if (sched_verbose >= 1) > -model_dump_pressure_summary (); > +model_dump_pressure_summary (bb); > } > > /* Free the information associated with GROUP. 
*/ > diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc > index 4f511b3ca504..0e7891f99392 100644 > --- a/gcc/sched-rgn.cc > +++ b/gcc/sched-rgn.cc > @@ -2856,15 +2856,25 @@ void debug_dependencies (rtx_insn *head, rtx_insn > *tail) >else > print_reservation (sched_dump, insn); > > - fprintf (sched_dump, "\t: "); > + fprintf (sched_dump, "\t: FW:"); >{ > sd_iterator_def sd_it; > dep_t dep; > > FOR_EACH_DEP (insn, SD_LIST_FORW, sd_it, dep) > - fprintf (sched_dump, "%d%s%s ", INSN_UID (DEP_CON (dep)), > + fprintf (sched_dump, " %d%s%s%s", INSN_UID (DEP_CON (dep)), > +DEP_TYPE (dep) == REG_DEP_TRUE ? "t" : "", > DEP_NONREG (dep) ? "n" : "", > DEP_MULTIPLE (dep) ? "m" : ""); > + if (sched_verbose >= 5) > + { > + fprintf (sched_dump, "\n;;\t\t\t\t\t\t: BK:"); > + FOR_EACH_DEP (insn, SD_LIST_HARD_BACK, sd_it, dep) > + fprintf (sched_dump, " %d%s%s%s", INSN_UID (DEP_PRO (dep)), > +DEP_TYPE (dep) == REG_DEP_TRUE ? "t" : "", > +DEP_NONREG (dep) ? "n" : "", > +DEP_MULTIPLE (dep) ? "m" : ""); > + } >} >fprintf (sched_dump, "\n"); > }
RE: Patch ping - [PATCH] [APX EGPR] Fix indirect call prefix
Hi, LGTM. Now Hongyu and Hongtao are working on APX. Thanks, Lingling > -Original Message- > From: Gregory Kanter > Sent: Saturday, November 23, 2024 8:16 AM > To: gcc-patches@gcc.gnu.org > Cc: Kong, Lingling ; Gregory Kanter > > Subject: Patch ping - [PATCH] [APX EGPR] Fix indirect call prefix > > Hello, > I would like to ping the patch > https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668105.html > please. > > Also CC'ing someone who is working on APX, sorry if this is frowned upon. > > Thanks.
Re: Patch ping - [PATCH] [APX EGPR] Fix indirect call prefix
On Mon, Nov 25, 2024 at 2:32 PM Kong, Lingling wrote: > > Hi, > > LGTM. > Now Hongyu and Hongtao are working on APX. Ok. > > Thanks, > Lingling > > > -Original Message- > > From: Gregory Kanter > > Sent: Saturday, November 23, 2024 8:16 AM > > To: gcc-patches@gcc.gnu.org > > Cc: Kong, Lingling ; Gregory Kanter > > > > Subject: Patch ping - [PATCH] [APX EGPR] Fix indirect call prefix > > > > Hello, > > I would like to ping the patch > > https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668105.html > > please. > > > > Also CC'ing someone who is working on APX, sorry if this is frowned upon. > > > > Thanks. -- BR, Hongtao