[gcc r15-1644] rs6000: Fix wrong RTL patterns for vector merge high/low char on LE

2024-06-26 Thread Kewen Lin via Gcc-cvs
https://gcc.gnu.org/g:62520e4e9f7e2fe8a16ee57a4bd35da2e921ae22

commit r15-1644-g62520e4e9f7e2fe8a16ee57a4bd35da2e921ae22
Author: Kewen Lin 
Date:   Wed Jun 26 02:16:17 2024 -0500

rs6000: Fix wrong RTL patterns for vector merge high/low char on LE

Commit r12-4496 changed some define_expands and define_insns
for vector merge high/low char, namely altivec_vmrg[hl]b.
These patterns mainly serve the built-in functions vec_merge{h,l}
and some internal gen function needs.  These functions must
take endianness into account.  Taking vec_mergeh as an example,
PVIPR defines it as "Merges the first halves (in element order)
of two vectors"; note that this is in element order, so it is
mapped to vmrghb on BE and to vmrglb on LE.
Although the mapped insns differ, as discussed in PR106069 the
RTL pattern should still be the same.  That held before commit
r12-4496, but from that commit on the patterns differ between
BE and LE.  Similar to the 32-bit element case described in the
commit log of r15-1504, this 8-bit element pattern on LE does
not actually match what the underlying insn is intended to
represent, so once an optimization such as combine transforms
the code based on it, unexpected consequences follow.  The newly
constructed test case pr106069-1.c is a typical example of this
issue.

So this patch fixes the wrong RTL patterns and makes the
associated RTL patterns the same as before, so that they have
the same semantics as their mapped insns.  With this patch,
expanders like altivec_vmrghb expand into
altivec_vmrghb_direct_be or altivec_vmrglb_direct_le depending
on endianness; "direct" shows which insn would be generated,
while _be and _le distinguish the different RTL patterns per
endianness.
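
As an illustration only, the semantics of a vec_select over a
vec_concat can be modelled in plain C++.  This is a sketch, not
GCC code: the type alias V16QI and the helpers select_from_concat
and merge_first_halves_selector are hypothetical names, and the
selector 0 16 1 17 ... 7 23 is assumed from the visible tail of
the altivec_vmrghb_direct_be pattern in the diff below.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a 16 x 8-bit vector; not a GCC type.
using V16QI = std::array<std::uint8_t, 16>;

// Models (vec_select (vec_concat a b) (parallel [sel...])):
// element i of the result is element sel[i] of the 32-element
// concatenation, where indices 0..15 name a and 16..31 name b.
inline V16QI select_from_concat(const V16QI &a, const V16QI &b,
                                const std::array<int, 16> &sel)
{
  V16QI out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = sel[i] < 16 ? a[sel[i]] : b[sel[i] - 16];
  return out;
}

// Builds the selector 0 16 1 17 ... 7 23, i.e. the first halves of
// both inputs interleaved in element order (assumed from the diff).
inline std::array<int, 16> merge_first_halves_selector()
{
  std::array<int, 16> sel{};
  for (int i = 0; i < 8; ++i)
    {
      sel[2 * i] = i;
      sel[2 * i + 1] = 16 + i;
    }
  return sel;
}
```

With a holding 0..15 and b holding 100..115, the modelled result
starts 0, 100, 1, 101 and ends 7, 107.  The point of the commit
message is that this element-order meaning must not change with
endianness, even though the emitted instruction does.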

Co-authored-by: Xionghu Luo 

PR target/106069
PR target/115355

gcc/ChangeLog:

* config/rs6000/altivec.md (altivec_vmrghb_direct): Rename to ...
(altivec_vmrghb_direct_be): ... this.  Add condition 
BYTES_BIG_ENDIAN.
(altivec_vmrghb_direct_le): New define_insn.
(altivec_vmrglb_direct): Rename to ...
(altivec_vmrglb_direct_be): ... this.  Add condition 
BYTES_BIG_ENDIAN.
(altivec_vmrglb_direct_le): New define_insn.
(altivec_vmrghb): Adjust by calling gen_altivec_vmrghb_direct_be
for BE and gen_altivec_vmrglb_direct_le for LE.
(altivec_vmrglb): Adjust by calling gen_altivec_vmrglb_direct_be
for BE and gen_altivec_vmrghb_direct_le for LE.
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace
CODE_FOR_altivec_vmrghb_direct by
CODE_FOR_altivec_vmrghb_direct_be for BE and
CODE_FOR_altivec_vmrghb_direct_le for LE.  And replace
CODE_FOR_altivec_vmrglb_direct by
CODE_FOR_altivec_vmrglb_direct_be for BE and
CODE_FOR_altivec_vmrglb_direct_le for LE.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr106069-1.c: New test.

Diff:
---
 gcc/config/rs6000/altivec.md  | 66 +--
 gcc/config/rs6000/rs6000.cc   |  8 ++--
 gcc/testsuite/gcc.target/powerpc/pr106069-1.c | 39 
 3 files changed, 95 insertions(+), 18 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index dcc71cc0f52..a0e8a35b843 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1152,15 +1152,16 @@
(use (match_operand:V16QI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
-   : gen_altivec_vmrglb_direct;
-  if (!BYTES_BIG_ENDIAN)
-std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+emit_insn (
+  gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2]));
+  else
+emit_insn (
+  gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrghb_direct"
+(define_insn "altivec_vmrghb_direct_be"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
(vec_select:V16QI
  (vec_concat:V32QI
@@ -1174,7 +1175,25 @@
 (const_int 5) (const_int 21)
 (const_int 6) (const_int 22)
 (const_int 7) (const_int 23)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrghb %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+   (vec_select:V16QI
+ (vec_concat:V32QI
+   (match_operand:V16QI 2 "register_operand" "v")
+   (match

[gcc r15-1645] rs6000: Fix wrong RTL patterns for vector merge high/low short on LE

2024-06-26 Thread Kewen Lin via Gcc-cvs
https://gcc.gnu.org/g:812c70bf4981958488331d4ea5af8709b5321da1

commit r15-1645-g812c70bf4981958488331d4ea5af8709b5321da1
Author: Kewen Lin 
Date:   Wed Jun 26 02:16:17 2024 -0500

rs6000: Fix wrong RTL patterns for vector merge high/low short on LE

Commit r12-4496 changed some define_expands and define_insns
for vector merge high/low short, namely altivec_vmrg[hl]h.
These patterns mainly serve the built-in functions vec_merge{h,l}
and some internal gen function needs.  These functions must
take endianness into account.  Taking vec_mergeh as an example,
PVIPR defines it as "Merges the first halves (in element order)
of two vectors"; note that this is in element order, so it is
mapped to vmrghh on BE and to vmrglh on LE.
Although the mapped insns differ, as discussed in PR106069 the
RTL pattern should still be the same.  That held before commit
r12-4496, but from that commit on the patterns differ between
BE and LE.  Similar to the 32-bit element case described in the
commit log of r15-1504, this 16-bit element pattern on LE does
not actually match what the underlying insn is intended to
represent, so once an optimization such as combine transforms
the code based on it, unexpected consequences follow.  The newly
constructed test case pr106069-2.c is a typical example of this
issue for element type short.

So this patch fixes the wrong RTL patterns and makes the
associated RTL patterns the same as before, so that they have
the same semantics as their mapped insns.  With this patch,
expanders like altivec_vmrghh expand into
altivec_vmrghh_direct_be or altivec_vmrglh_direct_le depending
on endianness; "direct" shows which insn would be generated,
while _be and _le distinguish the different RTL patterns per
endianness.

Co-authored-by: Xionghu Luo 

PR target/106069
PR target/115355

gcc/ChangeLog:

* config/rs6000/altivec.md (altivec_vmrghh_direct): Rename to ...
(altivec_vmrghh_direct_be): ... this.  Add condition 
BYTES_BIG_ENDIAN.
(altivec_vmrghh_direct_le): New define_insn.
(altivec_vmrglh_direct): Rename to ...
(altivec_vmrglh_direct_be): ... this.  Add condition 
BYTES_BIG_ENDIAN.
(altivec_vmrglh_direct_le): New define_insn.
(altivec_vmrghh): Adjust by calling gen_altivec_vmrghh_direct_be
for BE and gen_altivec_vmrglh_direct_le for LE.
(altivec_vmrglh): Adjust by calling gen_altivec_vmrglh_direct_be
for BE and gen_altivec_vmrghh_direct_le for LE.
(vec_widen_umult_hi_v16qi): Adjust the call to
gen_altivec_vmrghh_direct by gen_altivec_vmrghh for BE
and by gen_altivec_vmrglh for LE.
(vec_widen_smult_hi_v16qi): Likewise.
(vec_widen_umult_lo_v16qi): Adjust the call to
gen_altivec_vmrglh_direct by gen_altivec_vmrglh for BE
and by gen_altivec_vmrghh for LE.
(vec_widen_smult_lo_v16qi): Likewise.
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace
CODE_FOR_altivec_vmrghh_direct by
CODE_FOR_altivec_vmrghh_direct_be for BE and
CODE_FOR_altivec_vmrghh_direct_le for LE.  And replace
CODE_FOR_altivec_vmrglh_direct by
CODE_FOR_altivec_vmrglh_direct_be for BE and
CODE_FOR_altivec_vmrglh_direct_le for LE.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr106069-2.c: New test.

Diff:
---
 gcc/config/rs6000/altivec.md  | 76 +++
 gcc/config/rs6000/rs6000.cc   |  8 +--
 gcc/testsuite/gcc.target/powerpc/pr106069-2.c | 37 +
 3 files changed, 94 insertions(+), 27 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index a0e8a35b843..5af9bf920a2 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1203,17 +1203,18 @@
(use (match_operand:V8HI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
-   : gen_altivec_vmrglh_direct;
-  if (!BYTES_BIG_ENDIAN)
-std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+emit_insn (
+  gen_altivec_vmrghh_direct_be (operands[0], operands[1], operands[2]));
+  else
+emit_insn (
+  gen_altivec_vmrglh_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })
 
-(define_insn "altivec_vmrghh_direct"
+(define_insn "altivec_vmrghh_direct_be"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
-(vec_select:V8HI
+   (vec_select:V8HI
  (vec_concat:V16HI
(match_operand:V8HI 

[gcc r15-1646] i386: Remove declaration of unused functions

2024-06-26 Thread Christophe Lyon via Gcc-cvs
https://gcc.gnu.org/g:f4e847ba69a36d433d68cc2b41068cd59ffa1cd3

commit r15-1646-gf4e847ba69a36d433d68cc2b41068cd59ffa1cd3
Author: Evgeny Karpov 
Date:   Tue Jun 25 21:59:35 2024 +

i386: Remove declaration of unused functions

The patch fixes the issue introduced in

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=63512c72df09b43d56ac7680cdfd57a66d40c636
and reported at
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655599.html .

The patch fixes the issue with compilation on x86_64-gnu-linux
when warnings for unused functions are treated as errors.

gcc/ChangeLog:

* config/i386/i386.cc (legitimize_dllimport_symbol): Remove unused
functions.
(legitimize_pe_coff_extern_decl): Likewise.

Diff:
---
 gcc/config/i386/i386.cc | 2 --
 1 file changed, 2 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index b0ef1bf08e0..1f71ed04be6 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -104,8 +104,6 @@ along with GCC; see the file COPYING3.  If not see
 /* This file should be included last.  */
 #include "target-def.h"
 
-static rtx legitimize_dllimport_symbol (rtx, bool);
-static rtx legitimize_pe_coff_extern_decl (rtx, bool);
 static void ix86_print_operand_address_as (FILE *, rtx, addr_space_t, bool);
 static void ix86_emit_restore_reg_using_pop (rtx, bool = false);


[gcc r15-1647] [aarch64] Add support for -mcpu=grace

2024-06-26 Thread Kyrylo Tkachov via Gcc-cvs
https://gcc.gnu.org/g:7fada36c77829a197f63dde0d48ca33139105202

commit r15-1647-g7fada36c77829a197f63dde0d48ca33139105202
Author: Kyrylo Tkachov 
Date:   Wed Jun 26 09:42:11 2024 +0200

[aarch64] Add support for -mcpu=grace

This adds support for the NVIDIA Grace CPU to aarch64.
We reuse the tuning decisions for the Neoverse V2 core, but include a
number of architecture features that are not enabled by default in
-mcpu=neoverse-v2.

This allows Grace users to more simply target the CPU with -mcpu=grace
rather than remembering what extensions to tag on top of
-mcpu=neoverse-v2.

Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/

* config/aarch64/aarch64-cores.def (grace): New entry.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi (AArch64 Options): Document the above.

Signed-off-by: Kyrylo Tkachov 

Diff:
---
 gcc/config/aarch64/aarch64-cores.def | 2 ++
 gcc/config/aarch64/aarch64-tune.md   | 2 +-
 gcc/doc/invoke.texi  | 4 ++--
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 0e05e81761c..e58bc0f27de 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -194,6 +194,8 @@ AARCH64_CORE("neoverse-n2", neoversen2, cortexa57, V9A, (I8MM, BF16, SVE2_BITPER
 AARCH64_CORE("cobalt-100",   cobalt100, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversen2, 0x6d, 0xd49, -1)
 
 AARCH64_CORE("neoverse-v2", neoversev2, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1)
+AARCH64_CORE("grace", grace, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, SVE2_AES, SVE2_SHA3, SVE2_SM4, PROFILE), neoversev2, 0x41, 0xd4f, -1)
+
 AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1)
 
 /* Generic Architecture Processors.  */
diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
index 9b1f32a0330..719fd3dc62a 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88,thunderxt88p1,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,ampere1b,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,oryon1,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,cobalt100,neoversev2,demeter,generic,generic_armv8_a,generic_armv9_a"
+   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88,thunderxt88p1,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,ampere1b,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,oryon1,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,cobalt100,neoversev2,grace,demeter,generic,generic_armv8_a,generic_armv9_a"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 729dbc1691e..30c4b002d1f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -21437,8 +21437,8 @@ performance of the code.  Permissible values for this option are:
 @samp{ares}, @samp{exynos-m1}, @samp{emag}, @samp{falkor},
 @samp{oryon-1},
 @samp{neoverse-512tvb}, @samp{neoverse-e1}, @samp{neoverse-n1},
-@samp{neoverse-n2}, @samp{neoverse-v1}, @samp{neoverse-v2}, @samp{qdf24xx},
-@samp{saphira}, @samp{phecda}, @samp{xgene1}, @samp{vulcan},
+@samp{neoverse-n2}, @samp{neoverse-v1}, @samp{neoverse-v2}, @samp{grace},
+@samp{qdf24xx}, @samp{saphira}, @samp{phecda}, @samp{xgene1}, @samp{vulcan},
 @samp{octeontx}, @samp{octeontx81},  @samp{octeontx83},
 @samp{octeontx2}, @samp{octeontx2t98}, @samp{octeontx2t96}
 @samp{octeontx2t93}

[gcc/aoliva/heads/testbase] (143 commits) optab: Add isnormal_optab for isnormal builtin

2024-06-26 Thread Alexandre Oliva via Gcc-cvs
The branch 'aoliva/heads/testbase' was updated to point to:

 5a10ac0e592... optab: Add isnormal_optab for isnormal builtin

It previously pointed to:

 301927d9335... ada: Reference to nonexistent operator in reduction express

Diff:

Summary of changes (added commits):
---

  5a10ac0... optab: Add isnormal_optab for isnormal builtin (*)
  680eda8... optab: Add isfinite_optab for isfinite builtin (*)
  eed2027... [libstdc++] [testsuite] no libatomic for vxworks (*)
  54d2339... [testsuite] [arm] [vect] adjust mve-vshr test [PR113281] (*)
  aac00d0... Optimize a < 0 ? -1 : 0 to (signed)a >> 31. (*)
  01f8b10... [PATCH 11/11] Handle subroutine types in CodeView (*)
  009b329... [PATCH 10/11] Handle bitfields for CodeView (*)
  3800a78... diagnostics: introduce diagnostic-global-context.cc (*)
  d681c52... diagnostics: eliminate various implicit uses of global_dc (*)
  1796790... testsuite: use check-jsonschema for validating .sarif files (*)
  9fe669c... Daily bump. (*)
  737449e... c++: decltype of capture proxy of ref [PR115504] (*)
  3e64a68... [PATCH 09/11] Handle arrays for CodeView (*)
  0a5f559... [PATCH 08/11] Handle unions for CodeView. (*)
  7d413a8... libstdc++: Simplify std::valarray initialization helpers (*)
  0381445... modula2: tidyup remove unused procedures and unused paramet (*)
  9d8021d... libstdc++: Replace viewcvs links in docs with cgit links (*)
  fc382a3... c++: ICE with __has_unique_object_representations [PR115476 (*)
  b1e828d... [PATCH v2 3/3] RISC-V: cmpmem for RISCV with V extension (*)
  d16355c... PR modula2/115540 gcc/m2/mc-boot-ch/Gtermios.cc error retur (*)
  1ea95cc... Add param for bb limit to invoke fast_vrp. (*)
  ed6ffc4... c++: ICE with generic lambda and pack expansion [PR115425] (*)
  71f484d... c++: ICE with __dynamic_cast redecl [PR115501] (*)
  3b9b8d6... ira: Scale save/restore costs of callee save registers with (*)
  9f168b4... PR modula2/115536 Expression is evaluated incorrectly when  (*)
  7c28228... [committed] Fix fr30-elf newlib build failure with late-com (*)
  b87e19a... late-combine: Honor targetm.cannot_copy_insn_p (*)
  06ebb7c... c++: alias CTAD and copy deduction guide [PR115198] (*)
  e3915c1... c++: using non-dep array var of unknown bound [PR115358] (*)
  21f1073... Fix PR c/115587, uninitialized variable in c_parser_omp_loo (*)
  3587bfa... GORI cleanups (*)
  d27049a... doc: gccint: Fix typos in jump_table_data description (*)
  b621506... Add a debug counter for late-combine (*)
  7107574... libatomic: Add rcpc3 128-bit atomic operations for AArch64 (*)
  d4db77c... SPARC: fix internal error with -mv8plus on 64-bit Linux (*)
  7048005... rs6000: Properly default-disable late-combine passes [PR106 (*)
  b694bf4... Revert one of the force_subreg changes (*)
  17b368b... MIPS: Implement vcond_mask optabs for MSA (*)
  0b45643... MIPS: Output $0 for conditional trap if !ISA_HAS_COND_TRAPI (*)
  30db579... aarch64: Add DLL import/export to AArch64 target (*)
  ed20fee... Adjust DLL import/export implementation for AArch64 (*)
  337632e... aarch64: Add selectany attribute handling (*)
  a86d7e1... Rename functions for reuse in AArch64 (*)
  63512c7... Extract ix86 dllimport implementation to mingw (*)
  104d06c... Move mingw_* declarations to the mingw folder (*)
  777cc6a... c: Fix ICE related to incomplete structures in C23 [PR11493 (*)
  4f86d2a... [PATCH 07/11] Handle structs and classes for CodeView (*)
  41ff74a... [committed][RISC-V] Fix some of the testsuite fallout from  (*)
  55947b3... Replace {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE with new hook mode (*)
  7eddf6e... vms: Replace use of LONG_DOUBLE_TYPE_SIZE (*)
  bcd1b7a... rust: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE (*)
  fafd878... go: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE (*)
  f774721... c-family: Add Warning property to Wnrvo option [PR115624] (*)
  4c8b085... Make transitive relations an oracle option (*)
  c3be325... Daily bump. (*)
  a424318... [PATCH v2 2/3] RISC-V: setmem for RISCV with V extension (*)
  580c37f... RISC-V: Add dg-remove-option for z* extensions (*)
  f02c70d... Fortran: fix passing of optional dummy as actual to optiona (*)
  d8b05ae... PR tree-optimization/113673: Avoid load merging when potent (*)
  c43c74f... tree-optimization/115602 - SLP CSE results in cycles (*)
  2f83ea8... tree-optimization/115528 - fix vect alignment analysis for  (*)
  0de0476... Fix MinGW option -mcrtdll= (*)
  a6f7e3c... Regenerate common.opt.urls (*)
  792f97b... Add a late-combine pass [PR106594] (*)
  5185274... rtl-ssa: Rework _ignoring interfaces (*)
  ae13af2... tree-optimization/115599 - reassoc qsort comparator issue (*)
  6274f10... rs6000: Eliminate unnecessary byte swaps for duplicated con (*)
  ea8061f... fwprop: invoke change_is_worthwhile to judge if a replaceme (*)
  d820db3... [PATCH 06/11] Handle enums for CodeView (*)
  29fec9e... [PATCH 05/11] Handle const and varible modifiers for CodeVi (*)
  35cca2c... [PATCH

[gcc/aoliva/heads/testme] (144 commits) [libstdc++] [testsuite] defer to check_vect_support* [PR115

2024-06-26 Thread Alexandre Oliva via Gcc-cvs
The branch 'aoliva/heads/testme' was updated to point to:

 c6581064248... [libstdc++] [testsuite] defer to check_vect_support* [PR115

It previously pointed to:

 441c8117368... [libstdc++] [testsuite] no libatomic for vxworks

Diff:

!!! WARNING: THE FOLLOWING COMMITS ARE NO LONGER ACCESSIBLE (LOST):
---

  441c811... [libstdc++] [testsuite] no libatomic for vxworks
  82ef090... [testsuite] [arm] [vect] adjust mve-vshr test [PR113281]


Summary of changes (added commits):
---

  c658106... [libstdc++] [testsuite] defer to check_vect_support* [PR115
  5a10ac0... optab: Add isnormal_optab for isnormal builtin (*)
  680eda8... optab: Add isfinite_optab for isfinite builtin (*)
  eed2027... [libstdc++] [testsuite] no libatomic for vxworks (*)
  54d2339... [testsuite] [arm] [vect] adjust mve-vshr test [PR113281] (*)
  aac00d0... Optimize a < 0 ? -1 : 0 to (signed)a >> 31. (*)
  01f8b10... [PATCH 11/11] Handle subroutine types in CodeView (*)
  009b329... [PATCH 10/11] Handle bitfields for CodeView (*)
  3800a78... diagnostics: introduce diagnostic-global-context.cc (*)
  d681c52... diagnostics: eliminate various implicit uses of global_dc (*)
  1796790... testsuite: use check-jsonschema for validating .sarif files (*)
  9fe669c... Daily bump. (*)
  737449e... c++: decltype of capture proxy of ref [PR115504] (*)
  3e64a68... [PATCH 09/11] Handle arrays for CodeView (*)
  0a5f559... [PATCH 08/11] Handle unions for CodeView. (*)
  7d413a8... libstdc++: Simplify std::valarray initialization helpers (*)
  0381445... modula2: tidyup remove unused procedures and unused paramet (*)
  9d8021d... libstdc++: Replace viewcvs links in docs with cgit links (*)
  fc382a3... c++: ICE with __has_unique_object_representations [PR115476 (*)
  b1e828d... [PATCH v2 3/3] RISC-V: cmpmem for RISCV with V extension (*)
  d16355c... PR modula2/115540 gcc/m2/mc-boot-ch/Gtermios.cc error retur (*)
  1ea95cc... Add param for bb limit to invoke fast_vrp. (*)
  ed6ffc4... c++: ICE with generic lambda and pack expansion [PR115425] (*)
  71f484d... c++: ICE with __dynamic_cast redecl [PR115501] (*)
  3b9b8d6... ira: Scale save/restore costs of callee save registers with (*)
  9f168b4... PR modula2/115536 Expression is evaluated incorrectly when  (*)
  7c28228... [committed] Fix fr30-elf newlib build failure with late-com (*)
  b87e19a... late-combine: Honor targetm.cannot_copy_insn_p (*)
  06ebb7c... c++: alias CTAD and copy deduction guide [PR115198] (*)
  e3915c1... c++: using non-dep array var of unknown bound [PR115358] (*)
  21f1073... Fix PR c/115587, uninitialized variable in c_parser_omp_loo (*)
  3587bfa... GORI cleanups (*)
  d27049a... doc: gccint: Fix typos in jump_table_data description (*)
  b621506... Add a debug counter for late-combine (*)
  7107574... libatomic: Add rcpc3 128-bit atomic operations for AArch64 (*)
  d4db77c... SPARC: fix internal error with -mv8plus on 64-bit Linux (*)
  7048005... rs6000: Properly default-disable late-combine passes [PR106 (*)
  b694bf4... Revert one of the force_subreg changes (*)
  17b368b... MIPS: Implement vcond_mask optabs for MSA (*)
  0b45643... MIPS: Output $0 for conditional trap if !ISA_HAS_COND_TRAPI (*)
  30db579... aarch64: Add DLL import/export to AArch64 target (*)
  ed20fee... Adjust DLL import/export implementation for AArch64 (*)
  337632e... aarch64: Add selectany attribute handling (*)
  a86d7e1... Rename functions for reuse in AArch64 (*)
  63512c7... Extract ix86 dllimport implementation to mingw (*)
  104d06c... Move mingw_* declarations to the mingw folder (*)
  777cc6a... c: Fix ICE related to incomplete structures in C23 [PR11493 (*)
  4f86d2a... [PATCH 07/11] Handle structs and classes for CodeView (*)
  41ff74a... [committed][RISC-V] Fix some of the testsuite fallout from  (*)
  55947b3... Replace {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE with new hook mode (*)
  7eddf6e... vms: Replace use of LONG_DOUBLE_TYPE_SIZE (*)
  bcd1b7a... rust: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE (*)
  fafd878... go: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE (*)
  f774721... c-family: Add Warning property to Wnrvo option [PR115624] (*)
  4c8b085... Make transitive relations an oracle option (*)
  c3be325... Daily bump. (*)
  a424318... [PATCH v2 2/3] RISC-V: setmem for RISCV with V extension (*)
  580c37f... RISC-V: Add dg-remove-option for z* extensions (*)
  f02c70d... Fortran: fix passing of optional dummy as actual to optiona (*)
  d8b05ae... PR tree-optimization/113673: Avoid load merging when potent (*)
  c43c74f... tree-optimization/115602 - SLP CSE results in cycles (*)
  2f83ea8... tree-optimization/115528 - fix vect alignment analysis for  (*)
  0de0476... Fix MinGW option -mcrtdll= (*)
  a6f7e3c... Regenerate common.opt.urls (*)
  792f97b... Add a late-combine pass [PR106594] (*)
  5185274... rtl-ssa: Rework _ignoring interfaces (*)
  ae13af2... tree-optimization/115599 

[gcc(refs/users/aoliva/heads/testme)] [libstdc++] [testsuite] defer to check_vect_support* [PR115454]

2024-06-26 Thread Alexandre Oliva via Libstdc++-cvs
https://gcc.gnu.org/g:c658106424868e6512d9042693e3296efc68916d

commit c658106424868e6512d9042693e3296efc68916d
Author: Alexandre Oliva 
Date:   Wed Jun 26 05:54:44 2024 -0300

[libstdc++] [testsuite] defer to check_vect_support* [PR115454]

The newly-added testcase overrides the default dg-do action set by
check_vect_support_and_set_flags (in libstdc++-dg/conformance.exp), so
it attempts to run the test even if runtime vector support is not
available.

Remove the explicit dg-do directive, so that the default is honored,
and the test is run if vector support is found, and only compiled
otherwise.


for  libstdc++-v3/ChangeLog

PR libstdc++/115454
* testsuite/experimental/simd/pr115454_find_last_set.cc: Defer
to check_vect_support_and_set_flags's default dg-do action.

Diff:
---
 libstdc++-v3/testsuite/experimental/simd/pr115454_find_last_set.cc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/experimental/simd/pr115454_find_last_set.cc b/libstdc++-v3/testsuite/experimental/simd/pr115454_find_last_set.cc
index 25a713b4e94..4ade8601f27 100644
--- a/libstdc++-v3/testsuite/experimental/simd/pr115454_find_last_set.cc
+++ b/libstdc++-v3/testsuite/experimental/simd/pr115454_find_last_set.cc
@@ -1,5 +1,4 @@
 // { dg-options "-std=gnu++17" }
-// { dg-do run { target *-*-* } }
 // { dg-require-effective-target c++17 }
 // { dg-additional-options "-march=x86-64-v4" { target avx512f_runtime } }
 // { dg-require-cmath "" }


[gcc r15-1648] arm: make arm_predict_doloop_p reject loops with calls

2024-06-26 Thread Andre Simoes Dias Vieira via Gcc-cvs
https://gcc.gnu.org/g:ad20ad7dddcb052429346ae5f94b4a603925084a

commit r15-1648-gad20ad7dddcb052429346ae5f94b4a603925084a
Author: Andre Vieira 
Date:   Wed Jun 26 11:07:01 2024 +0100

arm: make arm_predict_doloop_p reject loops with calls

With the introduction of low overhead loops we defined arm_predict_doloop_p.
It is meant to be a lightweight check that rules out loops we are not
considering for doloop optimization, and it is used by other passes to avoid
optimizations that may hurt the doloop optimization later on.  The check has
to be lightweight because it is used by pre-RTL optimizations, meaning we
can't do the same checks that doloop does.

After the definition of arm_predict_doloop_p, when testing for armv8.1-m.main,
tree-ssa/ivopts-3.c failed the scan-dump check as the dump now matched an extra
'!= 0' introduced by:
Doloop cmp iv use: if (ivtmp_1 != 0)
Predict loop 1 can perform doloop optimization later.

where previously we had:
Predict doloop failure due to target specific checks.

and after this patch:
Predict doloop failure due to call in loop.
Predict doloop failure due to target specific checks.

Added a copy of the original tree-ssa/ivopts-3.c as a target-specific test to
check for the new dump message.
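
The shape of the added check can be sketched outside of GCC as
follows.  This is a simplified model under stated assumptions:
Stmt and header_allows_doloop are hypothetical names, and a plain
std::vector stands in for the statements of the loop header that
the real code walks with a gimple_stmt_iterator.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a GIMPLE statement; not a GCC type.
struct Stmt
{
  bool is_call;     // models is_gimple_call
  bool is_builtin;  // models gimple_call_builtin_p
};

// Scans the statements of the loop header and rejects the loop at
// the first call that is not a builtin, as the patch describes.
inline bool header_allows_doloop(const std::vector<Stmt> &header)
{
  for (const Stmt &s : header)
    if (s.is_call && !s.is_builtin)
      return false;
  return true;
}
```

In this model a header containing only builtin calls is still
accepted, while the f2 () call in the new test would be rejected.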

gcc/ChangeLog:

* config/arm/arm.cc (arm_predict_doloop_p): Reject loops with function
calls that are not builtins.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/ivopts-3.c: New test.

Diff:
---
 gcc/config/arm/arm.cc   | 16 
 gcc/testsuite/gcc.target/arm/mve/ivopts-3.c | 13 +
 2 files changed, 29 insertions(+)

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 7d67d2cfee9..6dab65f493b 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -35587,6 +35587,22 @@ arm_predict_doloop_p (struct loop *loop)
" loop bb complexity.\n");
   return false;
 }
+  else
+{
+  gimple_stmt_iterator gsi = gsi_after_labels (loop->header);
+  while (!gsi_end_p (gsi))
+   {
+ if (is_gimple_call (gsi_stmt (gsi))
+ && !gimple_call_builtin_p (gsi_stmt (gsi)))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "Predict doloop failure due to"
+   " call in loop.\n");
+ return false;
+   }
+ gsi_next (&gsi);
+   }
+}
 
   return true;
 }
diff --git a/gcc/testsuite/gcc.target/arm/mve/ivopts-3.c b/gcc/testsuite/gcc.target/arm/mve/ivopts-3.c
new file mode 100644
index 000..19b2442ef12
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/ivopts-3.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
+
+void f2 (void);
+
+int main (void)
+{
+  int i;
+  for (i = 0; i < 10; i++)
+f2 ();
+}
+
+/* { dg-final { scan-tree-dump "Predict doloop failure due to call in loop." "ivopts" } } */


[gcc(refs/users/aoliva/heads/testme)] [i386] drop static decls moved to mingw/winnt-dll.cc

2024-06-26 Thread Alexandre Oliva via Gcc-cvs
https://gcc.gnu.org/g:6668cf365efd5eba5efc39313f0cdabb8f9eb658

commit 6668cf365efd5eba5efc39313f0cdabb8f9eb658
Author: Alexandre Oliva 
Date:   Wed Jun 26 07:04:58 2024 -0300

[i386] drop static decls moved to mingw/winnt-dll.cc

Diff:
---
 gcc/config/i386/i386.cc | 2 --
 1 file changed, 2 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index b0ef1bf08e0..1f71ed04be6 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -104,8 +104,6 @@ along with GCC; see the file COPYING3.  If not see
 /* This file should be included last.  */
 #include "target-def.h"
 
-static rtx legitimize_dllimport_symbol (rtx, bool);
-static rtx legitimize_pe_coff_extern_decl (rtx, bool);
 static void ix86_print_operand_address_as (FILE *, rtx, addr_space_t, bool);
 static void ix86_emit_restore_reg_using_pop (rtx, bool = false);


[gcc r15-1649] Use auto_vec for memory release on return

2024-06-26 Thread Jørgen Kvalsvik via Gcc-cvs
https://gcc.gnu.org/g:19f630e6ae8da7159a8c82f337b699245f66e6a6

commit r15-1649-g19f630e6ae8da7159a8c82f337b699245f66e6a6
Author: Jørgen Kvalsvik 
Date:   Mon Jun 24 21:55:46 2024 +0200

Use auto_vec for memory release on return

Using auto_vec ensures this memory is cleaned up on function exit.
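
The difference can be illustrated with a generic C++ sketch of
manual versus scope-bound release.  This is not GCC's vec or
auto_vec: PlainVec, OwningVec, live_buffers and
outstanding_after_scope are hypothetical names, and a counter
stands in for real heap storage so that the sketch itself does
not leak.

```cpp
#include <cassert>

// Number of buffers currently considered allocated (model only).
inline int &live_buffers()
{
  static int n = 0;
  return n;
}

// Hypothetical vector whose storage must be released by hand.
struct PlainVec
{
  bool allocated = false;
  void create() { if (!allocated) { allocated = true; ++live_buffers(); } }
  void release() { if (allocated) { allocated = false; --live_buffers(); } }
};

// Hypothetical owning variant: the destructor performs the release.
struct OwningVec : PlainVec
{
  OwningVec() = default;
  OwningVec(const OwningVec &) = delete;
  OwningVec &operator=(const OwningVec &) = delete;
  ~OwningVec() { release(); }
};

// Creates a vector in an inner scope and reports how many buffers
// are still outstanding once that scope has been left.
template <typename Vec>
int outstanding_after_scope()
{
  const int before = live_buffers();
  {
    Vec v;
    v.create();
  }  // v goes out of scope here
  return live_buffers() - before;
}
```

In this model outstanding_after_scope<PlainVec> () reports one
outstanding buffer, while outstanding_after_scope<OwningVec> ()
reports none.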

gcc/ChangeLog:

* tree-profile.cc (find_conditions): Use auto_vec.

Diff:
---
 gcc/tree-profile.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
index e4bb689cef5..8c9945847ca 100644
--- a/gcc/tree-profile.cc
+++ b/gcc/tree-profile.cc
@@ -876,7 +876,7 @@ find_conditions (struct function *fn)
 make_top_index (fnblocks, ctx.B1, ctx.top_index);
 
 /* Bin the Boolean expressions so that exprs[id] -> [x1, x2, ...].  */
-hash_map, vec> exprs;
+hash_map, auto_vec> exprs;
 for (basic_block b : fnblocks)
 {
const unsigned uid = condition_uid (fn, b);


[gcc r15-1651] Use the term MC/DC in help for gcov --conditions

2024-06-26 Thread Jørgen Kvalsvik via Gcc-cvs
https://gcc.gnu.org/g:0bf002100453a0f531855c093f095dc15274a78c

commit r15-1651-g0bf002100453a0f531855c093f095dc15274a78c
Author: Jørgen Kvalsvik 
Date:   Tue Jun 25 08:41:45 2024 +0200

Use the term MC/DC in help for gcov --conditions

Without key terms like "masking" and "MC/DC" it is not at all obvious
what --conditions actually reports on, and there is no easy path for the
user to figure it out. By including at least the two key terms MC/DC and
masking, users have something to search for.

gcc/ChangeLog:

* gcov.cc (print_usage): Reference masking MC/DC.

Diff:
---
 gcc/gcov.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gcov.cc b/gcc/gcov.cc
index 0d4ef14e6c9..6f3055718d2 100644
--- a/gcc/gcov.cc
+++ b/gcc/gcov.cc
@@ -973,7 +973,7 @@ print_usage (int error_p)
   fnotice (file, "  -c, --branch-counts Output counts of branches taken\n\
 rather than percentages\n");
   fnotice (file, "  -g, --conditions  Include modified condition/decision\n\
-coverage in output\n");
+coverage (masking MC/DC) in output\n");
   fnotice (file, "  -d, --display-progress  Display progress information\n");
   fnotice (file, "  -D, --debug  Display debugging dumps\n");
   fnotice (file, "  -f, --function-summaries  Output summaries for each function\n");


[gcc r15-1650] Add section on MC/DC in gcov manual

2024-06-26 Thread Jørgen Kvalsvik via Gcc-cvs
https://gcc.gnu.org/g:229bf66f8d5d6df2997cee37575241cae944e4a6

commit r15-1650-g229bf66f8d5d6df2997cee37575241cae944e4a6
Author: Jørgen Kvalsvik 
Date:   Fri Jun 21 20:28:01 2024 +0200

Add section on MC/DC in gcov manual

gcc/ChangeLog:

* doc/gcov.texi: Add MC/DC section.
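
The second example in the manual text added by this commit (a && b with a = 1
and b = 0) reports "condition outcomes covered 1/4".  The sketch below
reproduces that count with a brute-force check that flips one condition at a
time.  It is only an illustration for this two-condition case and is not the
algorithm gcov uses; decision, has_independent_effect and covered_outcomes are
hypothetical names.

```cpp
#include <array>
#include <cassert>

// The decision from the manual's second example.
inline bool decision (bool a, bool b)
{
  return a && b;
}

// True if flipping condition IDX alone changes the decision.
inline bool has_independent_effect (std::array<bool, 2> v, int idx)
{
  assert (idx == 0 || idx == 1);
  const bool before = decision (v[0], v[1]);
  v[idx] = !v[idx];
  return decision (v[0], v[1]) != before;
}

// Count the condition outcomes (out of 4) that a single run covers
// under this simplified check.
inline int covered_outcomes (std::array<bool, 2> v)
{
  int n = 0;
  for (int idx = 0; idx < 2; idx++)
    if (has_independent_effect (v, idx))
      n++;
  return n;
}
```

With a = 1 and b = 0, only flipping b changes the decision, so the single
covered outcome is "condition 1, false".  That agrees with the report in the
manual, which lists condition 0 (true false) and condition 1 (true) as not
covered.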

Diff:
---
 gcc/doc/gcov.texi | 72 +++
 1 file changed, 72 insertions(+)

diff --git a/gcc/doc/gcov.texi b/gcc/doc/gcov.texi
index c118061aed5..fa5bdd3d452 100644
--- a/gcc/doc/gcov.texi
+++ b/gcc/doc/gcov.texi
@@ -890,6 +890,78 @@ of times the call was executed will be printed.  This will usually be
 100%, but may be less for functions that call @code{exit} or @code{longjmp},
 and thus may not return every time they are called.
 
+When you use the @option{-g} option, your output looks like this:
+
+@smallexample
+$ gcov -t -m -g tmp
+-:0:Source:tmp.cpp
+-:0:Graph:tmp.gcno
+-:0:Data:tmp.gcda
+-:0:Runs:1
+-:1:#include 
+-:2:
+-:3:int
+1:4:main (void)
+-:5:@{
+-:6:  int i, total;
+1:7:  total = 0;
+-:8:
+   11:9:  for (i = 0; i < 10; i++)
+condition outcomes covered 2/2
+   10:   10:total += i;
+-:   11:
+   1*:   12:  int v = total > 100 ? 1 : 2;
+condition outcomes covered 1/2
+condition  0 not covered (true)
+-:   13:
+   1*:   14:  if (total != 45 && v == 1)
+condition outcomes covered 1/4
+condition  0 not covered (true)
+condition  1 not covered (true false)
+#:   15:printf ("Failure\n");
+-:   16:  else
+1:   17:printf ("Success\n");
+1:   18:  return 0;
+-:   19:@}
+@end smallexample
+
+For every condition the number of taken and total outcomes are
+printed, and if there are uncovered outcomes a line will be printed
+for each condition showing the uncovered outcome in parentheses.
+Conditions are identified by their index -- index 0 is the left-most
+condition.  In @code{a || (b && c)}, @var{a} is condition 0, @var{b}
+condition 1, and @var{c} condition 2.
+
+An outcome is considered covered if it has an independent effect on
+the decision, also known as masking MC/DC (Modified Condition/Decision
+Coverage).  In this example the decision evaluates to true and @var{a}
+is evaluated, but not covered.  This is because @var{a} cannot affect
+the decision independently -- both @var{a} and @var{b} must change
+value for the decision to change.
+
+@smallexample
+$ gcov -t -m -g tmp
+-:0:Source:tmp.c
+-:0:Graph:tmp.gcno
+-:0:Data:tmp.gcda
+-:0:Runs:1
+-:1:#include 
+-:2:
+1:3:int main()
+-:4:@{
+1:5:  int a = 1;
+1:6:  int b = 0;
+-:7:
+1:8:  if (a && b)
+condition outcomes covered 1/4
+condition  0 not covered (true false)
+condition  1 not covered (true)
+#:9:printf ("Success!\n");
+-:   10:  else
+1:   11:printf ("Failure!\n");
+-:   12:@}
+@end smallexample
+
 The execution counts are cumulative.  If the example program were
 executed again without removing the @file{.gcda} file, the count for the
 number of times each line in the source was executed would be added to


[gcc r15-1652] Record edge true/false value for gcov

2024-06-26 Thread Jørgen Kvalsvik via Gcc-cvs
https://gcc.gnu.org/g:7a9b535d8abe27abdaa68cdcb22172a666030d06

commit r15-1652-g7a9b535d8abe27abdaa68cdcb22172a666030d06
Author: Jørgen Kvalsvik 
Date:   Tue Jun 4 14:16:22 2024 +0200

Record edge true/false value for gcov

Make gcov aware of which edges are the true and false edges so that it can
reconstruct the CFG more accurately.  There are plenty of bits left in
arc_info, and this opens the door to richer reporting.

gcc/ChangeLog:

* gcov-io.h (GCOV_ARC_TRUE): New.
(GCOV_ARC_FALSE): New.
* gcov.cc (struct arc_info): Add true_value, false_value.
(read_graph_file): Read true_value, false_value.
* profile.cc (branch_prob): Write GCOV_ARC_TRUE, GCOV_ARC_FALSE.
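
The round trip these three changes set up can be sketched in isolation.  The
flag values below are the ones in the gcov-io.h hunk of this commit; arc_bits,
pack_arc_flags and unpack_arc_flags are hypothetical, reduced stand-ins for
arc_info, the writer in profile.cc and the reader in gcov.cc, not the actual
GCC code.

```cpp
#include <cassert>

// Flag values as they appear in gcov-io.h after this commit.
constexpr unsigned GCOV_ARC_ON_TREE = 1u << 0;
constexpr unsigned GCOV_ARC_FAKE = 1u << 1;
constexpr unsigned GCOV_ARC_FALLTHROUGH = 1u << 2;
constexpr unsigned GCOV_ARC_TRUE = 1u << 3;
constexpr unsigned GCOV_ARC_FALSE = 1u << 4;

// Hypothetical, reduced stand-in for the bit-fields of gcov.cc's arc_info.
struct arc_bits
{
  unsigned int on_tree : 1;
  unsigned int fake : 1;
  unsigned int fall_through : 1;
  unsigned int true_value : 1;
  unsigned int false_value : 1;
};

// Writer side: fold edge properties into one flag word.
inline unsigned pack_arc_flags (bool fallthru, bool is_true, bool is_false)
{
  unsigned flag_bits = 0;
  if (fallthru)
    flag_bits |= GCOV_ARC_FALLTHROUGH;
  if (is_true)
    flag_bits |= GCOV_ARC_TRUE;
  if (is_false)
    flag_bits |= GCOV_ARC_FALSE;
  return flag_bits;
}

// Reader side: split the flag word back into one-bit fields.
inline arc_bits unpack_arc_flags (unsigned flags)
{
  // This sketch only knows the five flags defined above.
  assert ((flags >> 5) == 0);
  arc_bits arc = {};
  arc.on_tree = !!(flags & GCOV_ARC_ON_TREE);
  arc.fake = !!(flags & GCOV_ARC_FAKE);
  arc.fall_through = !!(flags & GCOV_ARC_FALLTHROUGH);
  arc.true_value = !!(flags & GCOV_ARC_TRUE);
  arc.false_value = !!(flags & GCOV_ARC_FALSE);
  return arc;
}
```

Packing a true edge sets only bit 3, and unpacking that word sets true_value
while leaving the other fields zero.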

Diff:
---
 gcc/gcov-io.h  | 2 ++
 gcc/gcov.cc| 8 
 gcc/profile.cc | 4 
 3 files changed, 14 insertions(+)

diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h
index 20f805598f0..5dc467c92b1 100644
--- a/gcc/gcov-io.h
+++ b/gcc/gcov-io.h
@@ -337,6 +337,8 @@ GCOV_COUNTERS
 #define GCOV_ARC_ON_TREE   (1 << 0)
 #define GCOV_ARC_FAKE  (1 << 1)
 #define GCOV_ARC_FALLTHROUGH   (1 << 2)
+#define GCOV_ARC_TRUE  (1 << 3)
+#define GCOV_ARC_FALSE (1 << 4)
 
 /* Object & program summary record.  */
 
diff --git a/gcc/gcov.cc b/gcc/gcov.cc
index 6f3055718d2..2e4bd9d3c5d 100644
--- a/gcc/gcov.cc
+++ b/gcc/gcov.cc
@@ -117,6 +117,12 @@ struct arc_info
   /* Loop making arc.  */
   unsigned int cycle : 1;
 
+  /* Is a true arc.  */
+  unsigned int true_value : 1;
+
+  /* Is a false arc.  */
+  unsigned int false_value : 1;
+
   /* Links to next arc on src and dst lists.  */
   struct arc_info *succ_next;
   struct arc_info *pred_next;
@@ -2010,6 +2016,8 @@ read_graph_file (void)
  arc->on_tree = !!(flags & GCOV_ARC_ON_TREE);
  arc->fake = !!(flags & GCOV_ARC_FAKE);
  arc->fall_through = !!(flags & GCOV_ARC_FALLTHROUGH);
+ arc->true_value = !!(flags & GCOV_ARC_TRUE);
+ arc->false_value = !!(flags & GCOV_ARC_FALSE);
 
  arc->succ_next = src_blk->succ;
  src_blk->succ = arc;
diff --git a/gcc/profile.cc b/gcc/profile.cc
index 2b90e6cc510..25d4f4a4b86 100644
--- a/gcc/profile.cc
+++ b/gcc/profile.cc
@@ -1456,6 +1456,10 @@ branch_prob (bool thunk)
flag_bits |= GCOV_ARC_FAKE;
  if (e->flags & EDGE_FALLTHRU)
flag_bits |= GCOV_ARC_FALLTHROUGH;
+ if (e->flags & EDGE_TRUE_VALUE)
+   flag_bits |= GCOV_ARC_TRUE;
+ if (e->flags & EDGE_FALSE_VALUE)
+   flag_bits |= GCOV_ARC_FALSE;
  /* On trees we don't have fallthru flags, but we can
 recompute them from CFG shape.  */
  if (e->flags & (EDGE_TRUE_VALUE | EDGE_FALSE_VALUE)


[gcc r15-1653] tree-optimization/115652 - adjust insertion gsi for SLP

2024-06-26 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:f80db5495d5f8455b3003951727eb6c8dc67d81d

commit r15-1653-gf80db5495d5f8455b3003951727eb6c8dc67d81d
Author: Richard Biener 
Date:   Wed Jun 26 09:25:27 2024 +0200

tree-optimization/115652 - adjust insertion gsi for SLP

The following adjusts how SLP computes the insertion location.  In
particular it advanced the insert iterator past the found last_stmt,
since the vectorizer will later insert stmts _before_ it.  But we also
have the constraint that possibly masked ops may not be scheduled
outside of the loop, and as we do not model the loop mask in the
SLP graph we have to adjust for that.  The following moves this
adjustment to after the advance, since doing it before is not
compatible with the advance, as the current GIMPLE_COND exception
shows.  The PR is about in-order reduction vectorization, which also
isn't happy when that's the very first stmt.

PR tree-optimization/115652
* tree-vect-slp.cc (vect_schedule_slp_node): Advance the
iterator based on last_stmt only for vector defs.

Diff:
---
 gcc/tree-vect-slp.cc | 29 +
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index b47b7e8c979..1f5b3fccf41 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -9629,16 +9629,6 @@ vect_schedule_slp_node (vec_info *vinfo,
   /* Emit other stmts after the children vectorized defs which is
 earliest possible.  */
   gimple *last_stmt = NULL;
-  if (auto loop_vinfo = dyn_cast  (vinfo))
-   if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
-   || LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
- {
-   /* But avoid scheduling internal defs outside of the loop when
-  we might have only implicitly tracked loop mask/len defs.  */
-   gimple_stmt_iterator si
- = gsi_after_labels (LOOP_VINFO_LOOP (loop_vinfo)->header);
-   last_stmt = *si;
- }
   bool seen_vector_def = false;
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
if (SLP_TREE_DEF_TYPE (child) == vect_internal_def)
@@ -9747,12 +9737,19 @@ vect_schedule_slp_node (vec_info *vinfo,
   else
{
  si = gsi_for_stmt (last_stmt);
- /* When we're getting gsi_after_labels from the starting
-condition of a fully masked/len loop avoid insertion
-after a GIMPLE_COND that can appear as the only header
-stmt with early break vectorization.  */
- if (gimple_code (last_stmt) != GIMPLE_COND)
-   gsi_next (&si);
+ gsi_next (&si);
+
+ /* Avoid scheduling internal defs outside of the loop when
+we might have only implicitly tracked loop mask/len defs.  */
+ if (auto loop_vinfo = dyn_cast  (vinfo))
+   if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
+   || LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+ {
+   gimple_stmt_iterator si2
+ = gsi_after_labels (LOOP_VINFO_LOOP (loop_vinfo)->header);
+   if (vect_stmt_dominates_stmt_p (last_stmt, *si2))
+ si = si2;
+ }
}
 }


[gcc r15-1654] [committed][RISC-V] Fix expected output for thead store pair test

2024-06-26 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:03a3dffa43145f80548d32b266b9b87be07b52ee

commit r15-1654-g03a3dffa43145f80548d32b266b9b87be07b52ee
Author: Jeff Law 
Date:   Wed Jun 26 06:59:26 2024 -0600

[committed][RISC-V] Fix expected output for thead store pair test

Surya's patch to IRA has improved the code we generate for one of the thead
store pair tests for both rv32 and rv64.  This patch adjusts the expectations
of that test.

I've verified that the test now passes on rv32 and rv64 in my tester.  Pushing
to the trunk.

gcc/testsuite
* gcc.target/riscv/xtheadmempair-3.c: Update expected output.

Diff:
---
 gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c b/gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c
index 5dec702819a..99a6ae7f4d7 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c
@@ -17,13 +17,11 @@ void bar (xlen_t, xlen_t, xlen_t, xlen_t, xlen_t, xlen_t, xlen_t, xlen_t);
 void baz (xlen_t a, xlen_t b, xlen_t c, xlen_t d, xlen_t e, xlen_t f, xlen_t g, xlen_t h)
 {
   foo (a, b, c, d, e, f, g, h);
-  /* RV64: We don't use 0(sp), therefore we can only get 3 mempairs.  */
-  /* RV32: We don't use 0(sp)-8(sp), therefore we can only get 2 mempairs.  */
   bar (a, b, c, d, e, f, g, h);
 }
 
-/* { dg-final { scan-assembler-times "th.ldd\t" 3 { target { rv64 } } } } */
-/* { dg-final { scan-assembler-times "th.sdd\t" 3 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "th.ldd\t" 4 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "th.sdd\t" 4 { target { rv64 } } } } */
 
-/* { dg-final { scan-assembler-times "th.lwd\t" 2 { target { rv32 } } } } */
-/* { dg-final { scan-assembler-times "th.swd\t" 2 { target { rv32 } } } } */
+/* { dg-final { scan-assembler-times "th.lwd\t" 4 { target { rv32 } } } } */
+/* { dg-final { scan-assembler-times "th.swd\t" 4 { target { rv32 } } } } */


[gcc r15-1655] [committed] Remove compromised sh test

2024-06-26 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:47b68cda2c4afe32e84c5f18da0196c39e5e0edf

commit r15-1655-g47b68cda2c4afe32e84c5f18da0196c39e5e0edf
Author: Jeff Law 
Date:   Wed Jun 26 07:20:29 2024 -0600

[committed] Remove compromised sh test

Surya's recent patch to IRA improves the code for sh/pr54602-1.c slightly.
Specifically it's able to eliminate a save/restore in the prologue/epilogue and
a bit of register shuffling.

As a result there literally aren't any insns that can be used to fill the delay
slot of the return, so a nop gets emitted and the test fails.

Given there literally aren't any insns to move into the delay slot, the best
course of action is to just drop the test.

gcc/testsuite
* gcc.target/sh/pr54602-1.c: Delete test.

Diff:
---
 gcc/testsuite/gcc.target/sh/pr54602-1.c | 14 --
 1 file changed, 14 deletions(-)

diff --git a/gcc/testsuite/gcc.target/sh/pr54602-1.c b/gcc/testsuite/gcc.target/sh/pr54602-1.c
deleted file mode 100644
index e7fb2a9a642..000
--- a/gcc/testsuite/gcc.target/sh/pr54602-1.c
+++ /dev/null
@@ -1,14 +0,0 @@
-/* Verify that the delay slot is stuffed with register pop insns for normal
-   (i.e. not interrupt handler) function returns.  If everything goes as
-   expected we won't see any nop insns.  */
-/* { dg-do compile }  */
-/* { dg-options "-O1" } */
-/* { dg-final { scan-assembler-not "nop" } } */
-
-int test00 (int a, int b);
-
-int
-test01 (int a, int b, int c, int d)
-{
-  return test00 (a, b) + c;
-}


[gcc r15-1656] rs6000, altivec-1-runnable.c update the require-effective-target

2024-06-26 Thread Carl Love via Gcc-cvs
https://gcc.gnu.org/g:4bf719bc5858cf7e5c96a1b396b0b4b0480c7e5b

commit r15-1656-g4bf719bc5858cf7e5c96a1b396b0b4b0480c7e5b
Author: Carl Love 
Date:   Mon Jun 24 12:31:19 2024 -0400

rs6000, altivec-1-runnable.c update the require-effective-target

Update the dg test directives.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-1-runnable.c: Change the
require-effective-target for the test.

Diff:
---
 gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
index 4e32860a169..3f084c91798 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
@@ -1,5 +1,6 @@
-/* { dg-do compile { target powerpc*-*-* } } */
-/* { dg-options "-maltivec" } */
+/* { dg-do run { target vmx_hw } } */
+/* { dg-do compile { target { ! vmx_hw } } } */
+/* { dg-options "-O2 -maltivec" } */
 /* { dg-require-effective-target powerpc_altivec } */
 
 #include 


[gcc r15-1657] rs6000, altivec-2-runnable.c update the require-effective-target

2024-06-26 Thread Carl Love via Gcc-cvs
https://gcc.gnu.org/g:0699de2a590f147218323c942e91ccf87273af14

commit r15-1657-g0699de2a590f147218323c942e91ccf87273af14
Author: Carl Love 
Date:   Fri Jun 14 12:46:00 2024 -0400

rs6000, altivec-2-runnable.c update the require-effective-target

The test requires a minimum of Power8 vector HW and a compile level
of -O2.  Update the dg test directives.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-2-runnable.c: Change the
require-effective-target for the test.

Diff:
---
 gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
index 17b23eb9d50..660669f69fd 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
@@ -1,6 +1,6 @@
-/* { dg-do run } */
-/* { dg-options "-mvsx" } */
-/* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } } } */
+/* { dg-do run { target p8vector_hw } } */
+/* { dg-do compile { target { ! p8vector_hw } } } */
+/* { dg-options "-O2  -mdejagnu-cpu=power8" } */
 /* { dg-require-effective-target powerpc_vsx } */
 
 #include 


[gcc r15-1658] rs6000, change altivec*-runnable.c test file names

2024-06-26 Thread Carl Love via Gcc-cvs
https://gcc.gnu.org/g:e499aee4a35cb780ecd1d011661e4ee9bdcd2d48

commit r15-1658-ge499aee4a35cb780ecd1d011661e4ee9bdcd2d48
Author: Carl Love 
Date:   Fri Jun 21 11:56:36 2024 -0400

rs6000, change altivec*-runnable.c test file names

Changed the names of the test files.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-1-runnable.c: Change the name to
altivec-38.c.
* gcc.target/powerpc/altivec-2-runnable.c: Change the name to
p8vector-builtin-9.c.

Diff:
---
 gcc/testsuite/gcc.target/powerpc/{altivec-1-runnable.c => altivec-38.c}   | 0
 .../gcc.target/powerpc/{altivec-2-runnable.c => p8vector-builtin-9.c} | 0
 2 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-38.c
similarity index 100%
rename from gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
rename to gcc/testsuite/gcc.target/powerpc/altivec-38.c
diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c b/gcc/testsuite/gcc.target/powerpc/p8vector-builtin-9.c
similarity index 100%
rename from gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
rename to gcc/testsuite/gcc.target/powerpc/p8vector-builtin-9.c


[gcc r15-1659] RISC-V: Rename amo testcases

2024-06-26 Thread Patrick O'Neill via Gcc-cvs
https://gcc.gnu.org/g:08498f81f0595eb8a90ea33afd7dab44bb76b293

commit r15-1659-g08498f81f0595eb8a90ea33afd7dab44bb76b293
Author: Patrick O'Neill 
Date:   Tue Jun 25 14:14:16 2024 -0700

RISC-V: Rename amo testcases

Rename riscv/amo/ testcases to follow a
'{ext}-{model}-{name}-{memory order}.c' naming convention.
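
As a small illustration of the convention, the sketch below assembles a file
name from its parts.  amo_test_name is a hypothetical helper written for this
example and is not part of the testsuite.  Skipping empty components is an
assumption made here so that names without a memory order, such as
a-rvwmo-fence.c, can also be produced.

```cpp
#include <cassert>
#include <string>

// Build '{ext}-{model}-{name}-{memory order}.c', skipping empty parts.
inline std::string amo_test_name (const std::string &ext,
                                  const std::string &model,
                                  const std::string &name,
                                  const std::string &memory_order)
{
  assert (!ext.empty () && !name.empty ());
  std::string result = ext;
  if (!model.empty ())
    result += "-" + model;
  result += "-" + name;
  if (!memory_order.empty ())
    result += "-" + memory_order;
  return result + ".c";
}
```

For instance, amo_test_name ("a", "rvwmo", "load", "acquire") yields
"a-rvwmo-load-acquire.c", one of the new names in the ChangeLog of this commit.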

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo/amo-table-a-6-load-2.c: Move to...
* gcc.target/riscv/amo/a-rvwmo-load-acquire.c: ...here.
* gcc.target/riscv/amo/amo-table-a-6-load-1.c: Move to...
* gcc.target/riscv/amo/a-rvwmo-load-relaxed.c: ...here.
* gcc.target/riscv/amo/amo-table-a-6-load-3.c: Move to...
* gcc.target/riscv/amo/a-rvwmo-load-seq-cst.c: ...here.
* gcc.target/riscv/amo/amo-table-a-6-store-compat-3.c: Move to...
* gcc.target/riscv/amo/a-rvwmo-store-compat-seq-cst.c: ...here.
* gcc.target/riscv/amo/amo-table-a-6-store-1.c: Move to...
* gcc.target/riscv/amo/a-rvwmo-store-relaxed.c: ...here.
* gcc.target/riscv/amo/amo-table-a-6-store-2.c: Move to...
* gcc.target/riscv/amo/a-rvwmo-store-release.c: ...here.
* gcc.target/riscv/amo/amo-table-ztso-load-2.c: Move to...
* gcc.target/riscv/amo/a-ztso-load-acquire.c: ...here.
* gcc.target/riscv/amo/amo-table-ztso-load-1.c: Move to...
* gcc.target/riscv/amo/a-ztso-load-relaxed.c: ...here.
* gcc.target/riscv/amo/amo-table-ztso-load-3.c: Move to...
* gcc.target/riscv/amo/a-ztso-load-seq-cst.c: ...here.
* gcc.target/riscv/amo/amo-table-ztso-store-3.c: Move to...
* gcc.target/riscv/amo/a-ztso-store-compat-seq-cst.c: ...here.
* gcc.target/riscv/amo/amo-table-ztso-store-1.c: Move to...
* gcc.target/riscv/amo/a-ztso-store-relaxed.c: ...here.
* gcc.target/riscv/amo/amo-table-ztso-store-2.c: Move to...
* gcc.target/riscv/amo/a-ztso-store-release.c: ...here.
* gcc.target/riscv/amo/amo-zaamo-preferred-over-zalrsc.c: Move to...
* gcc.target/riscv/amo/zaamo-preferred-over-zalrsc.c: ...here.
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-6.c: Move to...
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-acquire-release.c: ...here.
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-3.c: Move to...
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-acquire.c: ...here.
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-2.c: Move to...
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-consume.c: ...here.
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-1.c: Move to...
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-relaxed.c: ...here.
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-4.c: Move to...
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-release.c: ...here.
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-7.c: Move to...
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-seq-cst-relaxed.c: ...here.
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-5.c: Move to...
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-seq-cst.c: ...here.
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-4.c: Move to...
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-acq-rel.c: ...here.
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-2.c: Move to...
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-acquire.c: ...here.
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-1.c: Move to...
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-relaxed.c: ...here.
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-3.c: Move to...
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-release.c: ...here.
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-5.c: Move to...
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-seq-cst.c: ...here.
* gcc.target/riscv/amo/amo-table-ztso-compare-exchange-6.c: Move to...
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-acquire-release.c: ...here.
* gcc.target/riscv/amo/amo-table-ztso-compare-exchange-3.c: Move to...
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-acquire.c: ...here.
* gcc.target/riscv/amo/amo-table-ztso-compare-exchange-2.c: Move to...
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-consume.c: ...here.
* gcc.target/riscv/amo/amo-table-ztso-compare-exchange-1.c: Move to...
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-relaxed.c: 
...her

[gcc r15-1660] RISC-V: Consolidate amo testcase variants

2024-06-26 Thread Patrick O'Neill via Gcc-cvs
https://gcc.gnu.org/g:aa89e86f70ac65e2d51f33ac45849d05a4f30524

commit r15-1660-gaa89e86f70ac65e2d51f33ac45849d05a4f30524
Author: Patrick O'Neill 
Date:   Tue Jun 25 14:14:17 2024 -0700

RISC-V: Consolidate amo testcase variants

Many riscv/amo/ testcases use check-function-bodies. These testcases can be
consolidated with related testcases (memory ordering variants) without
affecting the assertions.

Give functions descriptive names so testsuite failures are obvious from the
'FAIL:' line.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo/amo-table-a-6-amo-add-1.c: Removed.
* gcc.target/riscv/amo/amo-table-a-6-amo-add-2.c: Removed.
* gcc.target/riscv/amo/amo-table-a-6-amo-add-3.c: Removed.
* gcc.target/riscv/amo/amo-table-a-6-amo-add-4.c: Removed.
* gcc.target/riscv/amo/amo-table-a-6-amo-add-5.c: Removed.
* gcc.target/riscv/amo/amo-table-a-6-fence-1.c: Removed.
* gcc.target/riscv/amo/amo-table-a-6-fence-2.c: Removed.
* gcc.target/riscv/amo/amo-table-a-6-fence-3.c: Removed.
* gcc.target/riscv/amo/amo-table-a-6-fence-4.c: Removed.
* gcc.target/riscv/amo/amo-table-a-6-fence-5.c: Removed.
* gcc.target/riscv/amo/amo-table-ztso-amo-add-1.c: Removed.
* gcc.target/riscv/amo/amo-table-ztso-amo-add-2.c: Removed.
* gcc.target/riscv/amo/amo-table-ztso-amo-add-3.c: Removed.
* gcc.target/riscv/amo/amo-table-ztso-amo-add-4.c: Removed.
* gcc.target/riscv/amo/amo-table-ztso-amo-add-5.c: Removed.
* gcc.target/riscv/amo/amo-table-ztso-fence-1.c: Removed.
* gcc.target/riscv/amo/amo-table-ztso-fence-2.c: Removed.
* gcc.target/riscv/amo/amo-table-ztso-fence-3.c: Removed.
* gcc.target/riscv/amo/amo-table-ztso-fence-4.c: Removed.
* gcc.target/riscv/amo/amo-table-ztso-fence-5.c: Removed.
* gcc.target/riscv/amo/amo-zalrsc-amo-add-1.c: Removed.
* gcc.target/riscv/amo/amo-zalrsc-amo-add-2.c: Removed.
* gcc.target/riscv/amo/amo-zalrsc-amo-add-3.c: Removed.
* gcc.target/riscv/amo/amo-zalrsc-amo-add-4.c: Removed.
* gcc.target/riscv/amo/amo-zalrsc-amo-add-5.c: Removed.
* gcc.target/riscv/amo/a-rvwmo-fence.c: New test.
* gcc.target/riscv/amo/a-ztso-fence.c: New test.
* gcc.target/riscv/amo/zaamo-rvwmo-amo-add-int.c: New test.
* gcc.target/riscv/amo/zaamo-ztso-amo-add-int.c: New test.
* gcc.target/riscv/amo/zalrsc-rvwmo-amo-add-int.c: New test.
* gcc.target/riscv/amo/zalrsc-ztso-amo-add-int.c: New test.

Signed-off-by: Patrick O'Neill 

Diff:
---
 gcc/testsuite/gcc.target/riscv/amo/a-rvwmo-fence.c | 56 
 gcc/testsuite/gcc.target/riscv/amo/a-ztso-fence.c  | 52 +++
 .../gcc.target/riscv/amo/amo-table-a-6-amo-add-1.c | 17 -
 .../gcc.target/riscv/amo/amo-table-a-6-amo-add-2.c | 17 -
 .../gcc.target/riscv/amo/amo-table-a-6-amo-add-3.c | 17 -
 .../gcc.target/riscv/amo/amo-table-a-6-amo-add-4.c | 17 -
 .../gcc.target/riscv/amo/amo-table-a-6-amo-add-5.c | 17 -
 .../gcc.target/riscv/amo/amo-table-a-6-fence-1.c   | 15 -
 .../gcc.target/riscv/amo/amo-table-a-6-fence-2.c   | 16 -
 .../gcc.target/riscv/amo/amo-table-a-6-fence-3.c   | 16 -
 .../gcc.target/riscv/amo/amo-table-a-6-fence-4.c   | 16 -
 .../gcc.target/riscv/amo/amo-table-a-6-fence-5.c   | 16 -
 .../riscv/amo/amo-table-ztso-amo-add-1.c   | 17 -
 .../riscv/amo/amo-table-ztso-amo-add-2.c   | 17 -
 .../riscv/amo/amo-table-ztso-amo-add-3.c   | 17 -
 .../riscv/amo/amo-table-ztso-amo-add-4.c   | 17 -
 .../riscv/amo/amo-table-ztso-amo-add-5.c   | 17 -
 .../gcc.target/riscv/amo/amo-table-ztso-fence-1.c  | 15 -
 .../gcc.target/riscv/amo/amo-table-ztso-fence-2.c  | 15 -
 .../gcc.target/riscv/amo/amo-table-ztso-fence-3.c  | 15 -
 .../gcc.target/riscv/amo/amo-table-ztso-fence-4.c  | 15 -
 .../gcc.target/riscv/amo/amo-table-ztso-fence-5.c  | 16 -
 .../gcc.target/riscv/amo/amo-zalrsc-amo-add-1.c| 22 --
 .../gcc.target/riscv/amo/amo-zalrsc-amo-add-2.c| 22 --
 .../gcc.target/riscv/amo/amo-zalrsc-amo-add-3.c| 22 --
 .../gcc.target/riscv/amo/amo-zalrsc-amo-add-4.c| 22 --
 .../gcc.target/riscv/amo/amo-zalrsc-amo-add-5.c| 22 --
 .../gcc.target/riscv/amo/zaamo-rvwmo-amo-add-int.c | 57 
 .../gcc.target/riscv/amo/zaamo-ztso-amo-add-int.c  | 57 
 .../riscv/amo/zalrsc-rvwmo-amo-add-int.c   | 78 ++
 .../gcc.target/riscv/amo/zalrsc-ztso-amo-add-int.c | 78 ++
 31 files changed, 378 insertions(+), 435 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/amo/a-rvwmo-fence.c b/gcc/testsuite/gcc.target/riscv/amo/a-rvwmo-fence.c
new file mo

[gcc r15-1661] RISC-V: Update testcase comments to point to PSABI rather than Table A.6

2024-06-26 Thread Patrick O'Neill via Gcc-cvs
https://gcc.gnu.org/g:86a3dbeb6c6a36f8cf97c66cef83c9bc3ad82027

commit r15-1661-g86a3dbeb6c6a36f8cf97c66cef83c9bc3ad82027
Author: Patrick O'Neill 
Date:   Tue Jun 25 14:14:18 2024 -0700

RISC-V: Update testcase comments to point to PSABI rather than Table A.6

Table A.6 was originally the source of truth for the recommended mappings.
Point to the PSABI doc since the memory model mappings have been moved there.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo/a-rvwmo-fence.c: Replace A.6 reference with PSABI.
* gcc.target/riscv/amo/a-rvwmo-load-acquire.c: Ditto.
* gcc.target/riscv/amo/a-rvwmo-load-relaxed.c: Ditto.
* gcc.target/riscv/amo/a-rvwmo-load-seq-cst.c: Ditto.
* gcc.target/riscv/amo/a-rvwmo-store-compat-seq-cst.c: Ditto.
* gcc.target/riscv/amo/a-rvwmo-store-relaxed.c: Ditto.
* gcc.target/riscv/amo/a-rvwmo-store-release.c: Ditto.
* gcc.target/riscv/amo/a-ztso-fence.c: Ditto.
* gcc.target/riscv/amo/a-ztso-load-acquire.c: Ditto.
* gcc.target/riscv/amo/a-ztso-load-relaxed.c: Ditto.
* gcc.target/riscv/amo/a-ztso-load-seq-cst.c: Ditto.
* gcc.target/riscv/amo/a-ztso-store-compat-seq-cst.c: Ditto.
* gcc.target/riscv/amo/a-ztso-store-relaxed.c: Ditto.
* gcc.target/riscv/amo/a-ztso-store-release.c: Ditto.
* gcc.target/riscv/amo/zaamo-rvwmo-amo-add-int.c: Ditto.
* gcc.target/riscv/amo/zaamo-ztso-amo-add-int.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-acquire-release.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-acquire.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-consume.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-relaxed.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-release.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-seq-cst-relaxed.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-seq-cst.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-acq-rel.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-acquire.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-relaxed.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-release.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-seq-cst.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-acquire-release.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-acquire.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-consume.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-relaxed.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-release.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-seq-cst-relaxed.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-seq-cst.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-acq-rel.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-acquire.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-relaxed.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-release.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-seq-cst.c: Ditto.

Signed-off-by: Patrick O'Neill 

Diff:
---
 gcc/testsuite/gcc.target/riscv/amo/a-rvwmo-fence.c | 2 +-
 gcc/testsuite/gcc.target/riscv/amo/a-rvwmo-load-acquire.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/amo/a-rvwmo-load-relaxed.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/amo/a-rvwmo-load-seq-cst.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/amo/a-rvwmo-store-compat-seq-cst.c  | 3 ++-
 gcc/testsuite/gcc.target/riscv/amo/a-rvwmo-store-relaxed.c | 2 +-
 gcc/testsuite/gcc.target/riscv/amo/a-rvwmo-store-release.c | 2 +-
 gcc/testsuite/gcc.target/riscv/amo/a-ztso-fence.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/amo/a-ztso-load-acquire.c   | 2 +-
 gcc/testsuite/gcc.target/riscv/amo/a-ztso-load-relaxed.c   | 2 +-
 gcc/testsuite/gcc.target/riscv/amo/a-ztso-load-seq-cst.c   | 2 +-
 gcc/testsuite/gcc.target/riscv/amo/a-ztso-store-compat-seq-cst.c   | 3 ++-
 gcc/testsuite/gcc.target/riscv/amo/a-ztso-store-relaxed.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/amo/a-ztso-store-release.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/amo/zaamo-rvwmo-amo-add-int.c   | 2 +-
 gcc/testsuite/gcc.target/riscv/amo/zaamo-ztso-amo-add-int.c  

[gcc r15-1662] tree-optimization/115629 - missed tail merging

2024-06-26 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:629257bcb81434117f1e9c68479032563176dc0c

commit r15-1662-g629257bcb81434117f1e9c68479032563176dc0c
Author: Richard Biener 
Date:   Tue Jun 25 14:04:31 2024 +0200

tree-optimization/115629 - missed tail merging

The following fixes a missed tail-merging observed for the testcase
in PR115629.  The issue is that when deps_ok_for_redirect does not
find that both blocks would be valid prevailing blocks, it rejects the
merge.  The following instead makes sure to record the working block as
prevailing.  Also, stmt comparison fails for indirect references
and does not handle memory references thoroughly, failing to unify
array indices and indirected pointers.  The following attempts to
fix this.

PR tree-optimization/115629
* tree-ssa-tail-merge.cc (gimple_equal_p): Handle
memory references better.
(deps_ok_for_redirect): Handle the case where not both blocks
are considered valid prevailing blocks.

* gcc.dg/tree-ssa/tail-merge-1.c: New testcase.

Diff:
---
 gcc/testsuite/gcc.dg/tree-ssa/tail-merge-1.c | 14 ++
 gcc/tree-ssa-tail-merge.cc   | 69 
 2 files changed, 75 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/tail-merge-1.c b/gcc/testsuite/gcc.dg/tree-ssa/tail-merge-1.c
new file mode 100644
index 000..e5670c33ba3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/tail-merge-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dce4" } */
+
+void foo1 (int *restrict a, int *restrict b, int *restrict c,
+  int *restrict d, int *restrict res, int n)
+{
+  for (int i = 0; i < n; i++)
+res[i] = a[i] ? b[i] : (c[i] ? b[i] : d[i]);
+}
+
+/* After tail-merging (run during PRE) we should end up merging the two
+   blocks dereferencing 'b', ending up with two iftmp assigns and the
+   iftmp PHI def.  */
+/* { dg-final { scan-tree-dump-times "iftmp\[^\r\n\]* = " 3 "dce4" } } */
diff --git a/gcc/tree-ssa-tail-merge.cc b/gcc/tree-ssa-tail-merge.cc
index c8b4a79294d..27e7c6a37b2 100644
--- a/gcc/tree-ssa-tail-merge.cc
+++ b/gcc/tree-ssa-tail-merge.cc
@@ -1188,7 +1188,52 @@ gimple_equal_p (same_succ *same_succ, gimple *s1, gimple *s2)
{
  t1 = gimple_arg (s1, i);
  t2 = gimple_arg (s2, i);
- if (!gimple_operand_equal_value_p (t1, t2))
+ while (handled_component_p (t1) && handled_component_p (t2))
+   {
+ if (TREE_CODE (t1) != TREE_CODE (t2)
+ || TREE_THIS_VOLATILE (t1) != TREE_THIS_VOLATILE (t2))
+   return false;
+ switch (TREE_CODE (t1))
+   {
+   case COMPONENT_REF:
+ if (TREE_OPERAND (t1, 1) != TREE_OPERAND (t2, 1)
+ || !gimple_operand_equal_value_p (TREE_OPERAND (t1, 2),
+   TREE_OPERAND (t2, 2)))
+   return false;
+ break;
+   case ARRAY_REF:
+   case ARRAY_RANGE_REF:
+ if (!gimple_operand_equal_value_p (TREE_OPERAND (t1, 3),
+TREE_OPERAND (t2, 3)))
+   return false;
+ /* Fallthru.  */
+   case BIT_FIELD_REF:
+ if (!gimple_operand_equal_value_p (TREE_OPERAND (t1, 1),
+TREE_OPERAND (t2, 1))
+ || !gimple_operand_equal_value_p (TREE_OPERAND (t1, 2),
+   TREE_OPERAND (t2, 2)))
+   return false;
+ break;
+   case REALPART_EXPR:
+   case IMAGPART_EXPR:
+   case VIEW_CONVERT_EXPR:
+ break;
+   default:
+   gcc_unreachable ();
+   }
+ t1 = TREE_OPERAND (t1, 0);
+ t2 = TREE_OPERAND (t2, 0);
+   }
+ if (TREE_CODE (t1) == MEM_REF && TREE_CODE (t2) == MEM_REF)
+   {
+ if (TREE_THIS_VOLATILE (t1) != TREE_THIS_VOLATILE (t2)
+ || TYPE_ALIGN (TREE_TYPE (t1)) != TYPE_ALIGN (TREE_TYPE (t2))
+ || !gimple_operand_equal_value_p (TREE_OPERAND (t1, 0),
+   TREE_OPERAND (t2, 0))
+ || TREE_OPERAND (t1, 1) != TREE_OPERAND (t2, 1))
+   return false;
+   }
+ else if (!gimple_operand_equal_value_p (t1, t2))
return false;
}
   return true;
@@ -1462,16 +1507,24 @@ deps_ok_for_redirect_from_bb_to_bb (basic_block from, basic_block to)
replacement are dominates by their defs.  */
 
 static bool
-deps_ok_for_redirect (basic_block bb1, basic_block bb2)
+deps_ok_for_redirect (basic_block &bb1, basic_block &bb2)
 {
-  if (BB_CLUSTER (bb1) != NULL)
-bb1 = BB_CLUSTER (bb1)->re

[gcc r15-1663] libstdc++: Fix std::chrono::tzdb to work with vanguard format

2024-06-26 Thread Jonathan Wakely via Gcc-cvs
https://gcc.gnu.org/g:0ca8d56f2085715f27ee536c6c344bc47af49cdd

commit r15-1663-g0ca8d56f2085715f27ee536c6c344bc47af49cdd
Author: Jonathan Wakely 
Date:   Tue Apr 30 09:52:13 2024 +0100

libstdc++: Fix std::chrono::tzdb to work with vanguard format

I found some issues in the std::chrono::tzdb parser by testing the
tzdata "vanguard" format, which uses new features that aren't enabled in
the "main" and "rearguard" data formats.

Since 2024a the keyword "minimum" is no longer valid for the FROM and TO
fields in a Rule line, which means that "m" is now a valid abbreviation
for "maximum". Previously we expected either "mi" or "ma". For backwards
compatibility, a FROM field beginning with "mi" is still supported and
is treated as 1900. The "maximum" keyword is only allowed in TO now,
because it makes no sense in FROM. To support these changes the
minmax_year and minmax_year2 classes for parsing FROM and TO are
replaced with a single years_from_to class that reads both fields.
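
The keyword rules in the paragraph above can be sketched as follows.  This is a minimal illustration and not the years_from_to code from tzdb.cc: the helper names and the kMaxYear sentinel are hypothetical, and treating "only" as "same year as FROM" is taken from the tzdata Rule format rather than from this commit message.

```cpp
#include <cassert>
#include <charconv>
#include <climits>
#include <optional>
#include <string_view>

constexpr int kMaxYear = INT_MAX;  // hypothetical stand-in for "maximum"

// True if 'field' is a non-empty prefix of 'keyword'.
inline bool is_abbrev_of(std::string_view field, std::string_view keyword)
{
  return !field.empty() && field.size() <= keyword.size()
         && keyword.substr(0, field.size()) == field;
}

// Parse a plain decimal year; reject trailing characters.
inline std::optional<int> parse_year(std::string_view field)
{
  int y = 0;
  const char* first = field.data();
  const char* last = first + field.size();
  auto [ptr, ec] = std::from_chars(first, last, y);
  if (ec != std::errc{} || ptr != last)
    return std::nullopt;
  return y;
}

// FROM: a year, or (for backwards compatibility) "mi..." meaning 1900.
// "maximum" is not accepted here.
inline std::optional<int> parse_from(std::string_view field)
{
  if (field.size() >= 2 && is_abbrev_of(field, "minimum"))
    return 1900;
  return parse_year(field);
}

// TO: a year, "maximum" (now abbreviable down to "m"), or "only".
inline std::optional<int> parse_to(std::string_view field, int from)
{
  if (is_abbrev_of(field, "maximum"))
    return kMaxYear;
  if (is_abbrev_of(field, "only"))
    return from;
  return parse_year(field);
}
```

With these rules "m" in a TO field means "maximum", while a bare "m" in FROM is rejected because only the "mi" spelling of the obsolete keyword is still recognized there.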

The vanguard format makes use of %z in Zone FORMAT fields, which caused
an exception to be thrown from ZoneInfo::set_abbrev because no % or /
characters were expected when a Zone doesn't use a named Rule. The
ZoneInfo::to(sys_info&) function now uses format_abbrev_str to replace
any %z with the current offset. Although format_abbrev_str also checks
for %s and STD/DST formats, those only make sense when a named Rule is
in effect, so won't occur when ZoneInfo::to(sys_info&) is used.
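
A rough sketch of what expanding %z involves is shown below.  It is not the format_abbrev_str function from tzdb.cc; the function names are hypothetical, and the "shortest form that loses no information" rule (+hh, +hhmm or +hhmmss) is assumed from the zic documentation rather than stated in this commit.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Render a UTC offset in seconds as +hh, +hhmm or +hhmmss, using the
// shortest of those forms that does not lose information.
inline std::string format_utc_offset(long offset_seconds)
{
  const char sign = offset_seconds < 0 ? '-' : '+';
  const long mag = offset_seconds < 0 ? -offset_seconds : offset_seconds;
  assert(mag < 100L * 3600);  // keeps the hour field at two digits
  const long hh = mag / 3600, mm = mag / 60 % 60, ss = mag % 60;
  char buf[32];
  if (ss != 0)
    std::snprintf(buf, sizeof buf, "%c%02ld%02ld%02ld", sign, hh, mm, ss);
  else if (mm != 0)
    std::snprintf(buf, sizeof buf, "%c%02ld%02ld", sign, hh, mm);
  else
    std::snprintf(buf, sizeof buf, "%c%02ld", sign, hh);
  return std::string(buf);
}

// Replace the first "%z" in a Zone FORMAT string with the offset text.
// Formats without "%z" are returned unchanged.
inline std::string expand_percent_z(std::string format, long offset_seconds)
{
  const std::string::size_type pos = format.find("%z");
  if (pos != std::string::npos)
    format.replace(pos, 2, format_utc_offset(offset_seconds));
  return format;
}
```

For a zone whose FORMAT is just "%z" and whose offset is five and a half hours, this sketch yields "+0530" as the abbreviation.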

This change also implements a feature that has always been missing from
time_zone::_M_get_sys_info: finding the Rule that is active before the
specified time point, so that we can correctly handle %s in the FORMAT
for the first new sys_info that gets created. This requires implementing
a poorly documented feature of zic, to get the LETTERS field from a
later transition, as described at
https://mm.icann.org/pipermail/tz/2024-April/058891.html
In order for this to work we need to be able to distinguish an empty
letters field (as used by CE%sT where the variable part is either empty
or "S") from "the letters field is not known for this transition". The
tzdata file uses "-" for an empty letters field, which libstdc++ was
previously replacing with "" when the Rule was parsed. Instead, we now
preserve the "-" in the Rule object, so that "" can be used for the case
where we don't know the letters (and so need to decide it).

libstdc++-v3/ChangeLog:

* src/c++20/tzdb.cc (minmax_year, minmax_year2): Remove.
(years_from_to): New class replacing minmax_year and
minmax_year2.
(format_abbrev_str, select_std_or_dst_abbrev): Move earlier in
the file. Handle "-" for letters.
(ZoneInfo::to): Use format_abbrev_str to expand %z.
(ZoneInfo::set_abbrev): Remove exception. Change parameter from
reference to value.
(operator>>(istream&, Rule&)): Do not clear letters when it
contains "-".
(time_zone::_M_get_sys_info): Add missing logic to find the Rule
in effect before the time point.
* testsuite/std/time/tzdb/1.cc: Adjust for vanguard format using
"GMT" as the Zone name, not as a Link to "Etc/GMT".
* testsuite/std/time/time_zone/sys_info_abbrev.cc: New test.

Diff:
---
 libstdc++-v3/src/c++20/tzdb.cc | 265 +
 .../std/time/time_zone/sys_info_abbrev.cc  | 106 +
 libstdc++-v3/testsuite/std/time/tzdb/1.cc  |   6 +-
 3 files changed, 274 insertions(+), 103 deletions(-)

diff --git a/libstdc++-v3/src/c++20/tzdb.cc b/libstdc++-v3/src/c++20/tzdb.cc
index c7c7cc9deee..7e8cce7ce8c 100644
--- a/libstdc++-v3/src/c++20/tzdb.cc
+++ b/libstdc++-v3/src/c++20/tzdb.cc
@@ -342,51 +342,103 @@ namespace std::chrono
   friend istream& operator>>(istream&, on_day&);
 };
 
-// Wrapper for chrono::year that reads a year, or one of the keywords
-// "minimum" or "maximum", or an unambiguous prefix of a keyword.
-struct minmax_year
+// Wrapper for two chrono::year values, which reads the FROM and TO
+// fields of a Rule line. The FROM field is a year and TO is a year or
+// one of the keywords "maximum" or "only" (or an abbreviation of those).
+// For backwards compatibility, the keyword "minimum" is recognized
+// for FROM and interpreted as 1900.
+struct years_from_to
 {
-  year& y;
+  year& from;
+  year& to;
 
-  friend istream& operator>>(istream& in, minmax_year&& y)
+  friend istream& operator>>(istream& in, years_from_to&& yy)
   {
-   if (ws(in).peek() == 'm') // keywords "minimum" or "maximum"
+   string s;
+   auto c = ws(in).peek();
+   if (c == 'm') [[unlikely]] // keyword "minimum"
  {
-   string s;

[gcc r15-1664] libstdc++: Work around some PSTL test failures for debug mode [PR90276]

2024-06-26 Thread Jonathan Wakely via Libstdc++-cvs
https://gcc.gnu.org/g:003ce8a6c4c28f8d285134afa9a423d0e234cf2e

commit r15-1664-g003ce8a6c4c28f8d285134afa9a423d0e234cf2e
Author: Jonathan Wakely 
Date:   Thu Jun 6 11:50:06 2024 +0100

libstdc++: Work around some PSTL test failures for debug mode [PR90276]

This addresses one known failure due to a bug in the upstream tests, and
a number of timeouts due to the algorithms running much more slowly with
debug mode checks enabled.

libstdc++-v3/ChangeLog:

PR libstdc++/90276
* testsuite/25_algorithms/pstl/alg_sorting/partial_sort.cc
[_GLIBCXX_DEBUG]: Add xfail-run-if for debug mode.
* testsuite/25_algorithms/pstl/alg_nonmodifying/nth_element.cc
[_GLIBCXX_DEBUG]: Reduce size of test data.
* testsuite/25_algorithms/pstl/alg_sorting/includes.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/set_util.h:
Likewise.

Diff:
---
 .../testsuite/25_algorithms/pstl/alg_nonmodifying/nth_element.cc  | 4 
 libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/includes.cc | 4 
 libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/partial_sort.cc | 1 +
 libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/set_util.h  | 4 
 4 files changed, 13 insertions(+)

diff --git a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_nonmodifying/nth_element.cc b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_nonmodifying/nth_element.cc
index 61bbca758b4..63e6abe2ea4 100644
--- a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_nonmodifying/nth_element.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_nonmodifying/nth_element.cc
@@ -133,7 +133,11 @@ void
 test_by_type(Generator1 generator1, Generator2 generator2, Compare comp)
 {
 using namespace std;
+#ifdef _GLIBCXX_DEBUG
+size_t max_size = 1000;
+#else
 size_t max_size = 1;
+#endif
 Sequence in1(max_size, [](size_t v) { return T(v); });
 Sequence exp(max_size, [](size_t v) { return T(v); });
 size_t m;
diff --git a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/includes.cc b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/includes.cc
index ed07810618d..1567c369c4c 100644
--- a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/includes.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/includes.cc
@@ -77,7 +77,11 @@ void
 test_includes(Compare compare)
 {
 
+#ifdef _GLIBCXX_DEBUG
+const std::size_t n_max = 1;
+#else
 const std::size_t n_max = 100;
+#endif
 
 // The rand()%(2*n+1) encourages generation of some duplicates.
 std::srand(42);
diff --git a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/partial_sort.cc b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/partial_sort.cc
index 6d441cc3ae9..797d0ee9340 100644
--- a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/partial_sort.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/partial_sort.cc
@@ -3,6 +3,7 @@
 // { dg-do run { target c++17 } }
 // { dg-timeout-factor 3 }
 // { dg-require-effective-target tbb_backend }
+// { dg-xfail-run-if "see PR 90276" { debug_mode } }
 
 //===-- partial_sort.pass.cpp 
-===//
 //
diff --git a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/set_util.h b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/set_util.h
index ecf5cd1c89d..214e3452aa7 100644
--- a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/set_util.h
+++ b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/set_util.h
@@ -51,7 +51,11 @@ namespace TestUtils
 void
 test_set_op(Compare compare)
 {
+#ifdef _GLIBCXX_DEBUG
+const std::size_t n_max = 1000;
+#else
 const std::size_t n_max = 10;
+#endif
 
 // The rand()%(2*n+1) encourages generation of some duplicates.
 std::srand(4200);


[gcc r15-1665] libstdc++: Increase timeouts for PSTL tests in debug mode [PR90276]

2024-06-26 Thread Jonathan Wakely via Libstdc++-cvs
https://gcc.gnu.org/g:e65b6627a36869b01bbe128a5324e4b415b28880

commit r15-1665-ge65b6627a36869b01bbe128a5324e4b415b28880
Author: Jonathan Wakely 
Date:   Wed Jun 12 17:11:23 2024 +0100

libstdc++: Increase timeouts for PSTL tests in debug mode [PR90276]

These tests compile very slowly in debug mode.

libstdc++-v3/ChangeLog:

PR libstdc++/90276
* testsuite/25_algorithms/pstl/alg_modifying_operations/rotate_copy.cc:
Increase timeout for debug mode.
* testsuite/25_algorithms/pstl/alg_modifying_operations/transform_binary.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/mismatch.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/lexicographical_compare.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/minmax_element.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/set_symmetric_difference.cc:
Likewise.

Diff:
---
 .../testsuite/25_algorithms/pstl/alg_modifying_operations/rotate_copy.cc | 1 +
 .../25_algorithms/pstl/alg_modifying_operations/transform_binary.cc  | 1 +
 libstdc++-v3/testsuite/25_algorithms/pstl/alg_nonmodifying/mismatch.cc   | 1 +
 .../testsuite/25_algorithms/pstl/alg_sorting/lexicographical_compare.cc  | 1 +
 libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/minmax_element.cc  | 1 +
 .../testsuite/25_algorithms/pstl/alg_sorting/set_symmetric_difference.cc | 1 +
 6 files changed, 6 insertions(+)

diff --git a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_modifying_operations/rotate_copy.cc b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_modifying_operations/rotate_copy.cc
index ea647c6c23a..1b788e1b7ee 100644
--- a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_modifying_operations/rotate_copy.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_modifying_operations/rotate_copy.cc
@@ -2,6 +2,7 @@
 // { dg-options "-ltbb" }
 // { dg-do run { target c++17 } }
 // { dg-timeout-factor 3 }
+// { dg-timeout-factor 5 { target debug_mode } }
 // { dg-require-effective-target tbb_backend }
 
 //===-- rotate_copy.pass.cpp 
--===//
diff --git a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_modifying_operations/transform_binary.cc b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_modifying_operations/transform_binary.cc
index 1f5f239a94b..16b815c5d51 100644
--- a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_modifying_operations/transform_binary.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_modifying_operations/transform_binary.cc
@@ -2,6 +2,7 @@
 // { dg-options "-ltbb" }
 // { dg-do run { target c++17 } }
 // { dg-timeout-factor 3 }
+// { dg-timeout-factor 5 { target debug_mode } }
 // { dg-require-effective-target tbb_backend }
 
 //===-- transform_binary.pass.cpp 
-===//
diff --git a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_nonmodifying/mismatch.cc b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_nonmodifying/mismatch.cc
index 1173186f65c..441f5d1e378 100644
--- a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_nonmodifying/mismatch.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_nonmodifying/mismatch.cc
@@ -2,6 +2,7 @@
 // { dg-options "-ltbb" }
 // { dg-do run { target c++17 } }
 // { dg-timeout-factor 3 }
+// { dg-timeout-factor 5 { target debug_mode } }
 // { dg-require-effective-target tbb_backend }
 
 //===-- mismatch.pass.cpp 
-===//
diff --git a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/lexicographical_compare.cc b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/lexicographical_compare.cc
index 924aa78652e..78edeb025d7 100644
--- a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/lexicographical_compare.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/lexicographical_compare.cc
@@ -2,6 +2,7 @@
 // { dg-options "-ltbb" }
 // { dg-do run { target c++17 } }
 // { dg-timeout-factor 3 }
+// { dg-timeout-factor 5 { target debug_mode } }
 // { dg-require-effective-target tbb_backend }
 
 //===-- lexicographical_compare.pass.cpp 
--===//
diff --git a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/minmax_element.cc b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/minmax_element.cc
index 0a9f41ca179..e4bd435d192 100644
--- a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/minmax_element.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/minmax_element.cc
@@ -2,6 +2,7 @@
 // { dg-options "-ltbb" }
 // { dg-do run { target c++17 } }
 // { dg-timeout-factor 3 }
+// { dg-timeout-factor 5 { target debug_mode } }
 // { dg-require-effective-target tbb_backend }
 
 //===-- minmax_element.pass.cpp 
---===//
diff --git 
a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/set_symmetric_

[gcc r15-1666] libstdc++: Remove duplicate test

2024-06-26 Thread Jonathan Wakely via Libstdc++-cvs
https://gcc.gnu.org/g:6eff23314a7e51715f988acf3c19824fe87b5754

commit r15-1666-g6eff23314a7e51715f988acf3c19824fe87b5754
Author: Jonathan Wakely 
Date:   Thu Jun 20 22:17:08 2024 +0100

libstdc++: Remove duplicate test

We currently have 808590.cc which only runs for C++98 mode, and
808590-cxx11.cc which only runs for C++11 and later, but have almost
identical content (except for a defaulted special member in the C++11
one, to suppress a -Wdeprecated-copy warning).

This was done originally to ensure that the test ran for both C++98 mode
and C++11 mode, because the logic being tested was different enough to
need both to be tested. But it's trivial to run all tests in multiple
-std modes now, using GLIBCXX_TESTSUITE_STDS, so we don't need two
separate tests. We can remove one of the tests and allow the other one
to run in any -std mode.

libstdc++-v3/ChangeLog:

* testsuite/20_util/specialized_algorithms/uninitialized_copy/808590.cc:
Copy defaulted assignment operator from 808590-cxx11.cc to
suppress a warning.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/808590-cxx11.cc:
Removed.

Diff:
---
 .../uninitialized_copy/808590-cxx11.cc | 55 --
 .../uninitialized_copy/808590.cc   |  7 ++-
 2 files changed, 5 insertions(+), 57 deletions(-)

diff --git a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/808590-cxx11.cc b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/808590-cxx11.cc
deleted file mode 100644
index 2e93aa75738..000
--- a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/808590-cxx11.cc
+++ /dev/null
@@ -1,55 +0,0 @@
-// Copyright (C) 2012-2024 Free Software Foundation, Inc.
-//
-// This file is part of the GNU ISO C++ Library.  This library is free
-// software; you can redistribute it and/or modify it under the
-// terms of the GNU General Public License as published by the
-// Free Software Foundation; either version 3, or (at your option)
-// any later version.
-
-// This library is distributed in the hope that it will be useful,
-// but WITHOUT ANY WARRANTY; without even the implied warranty of
-// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-// GNU General Public License for more details.
-
-// You should have received a copy of the GNU General Public License along
-// with this library; see the file COPYING3.  If not see
-// .
-
-// { dg-do run { target c++11 } }
-
-// This is identical to ./808590.cc but for C++11 and later.
-// See https://gcc.gnu.org/ml/libstdc++/2014-05/msg00027.html
-
-#include 
-#include 
-
-// 4.4.x only
-struct c
-{
-  void *m;
-
-  c(void* o = 0) : m(o) {}
-  c(const c &r) : m(r.m) {}
-
-  c& operator=(const c &) = default;
-
-  template
-explicit c(T &o) : m((void*)0xdeadbeef) { }
-};
-
-int main()
-{
-  std::vector cbs;
-  const c cb((void*)0xcafebabe);
-
-  for (int fd = 62; fd < 67; ++fd)
-{
-  cbs.resize(fd + 1);
-  cbs[fd] = cb;
-}
-
-  for (int fd = 62; fd< 67; ++fd)
-if (cb.m != cbs[fd].m)
-  throw std::runtime_error("wrong");
-  return 0;
-}
diff --git a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/808590.cc b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/808590.cc
index d2e07127323..b53db0e154d 100644
--- a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/808590.cc
+++ b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/808590.cc
@@ -15,8 +15,6 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-options "-std=gnu++98" }
-
 #include 
 #include 
 
@@ -28,6 +26,11 @@ struct c
   c(void* o = 0) : m(o) {}
   c(const c &r) : m(r.m) {}
 
+#if __cplusplus >= 201103L
+  // Avoid -Wdeprecated-copy warning.
+  c& operator=(const c &) = default;
+#endif
+
   template
 explicit c(T &o) : m((void*)0xdeadbeef) { }
 };


[gcc r15-1667] libstdc++: Add script to update docs for a new release branch

2024-06-26 Thread Jonathan Wakely via Gcc-cvs
https://gcc.gnu.org/g:0731985920cdeeeb028f03ddb8a7f035565c1594

commit r15-1667-g0731985920cdeeeb028f03ddb8a7f035565c1594
Author: Jonathan Wakely 
Date:   Tue Jun 25 23:59:19 2024 +0100

libstdc++: Add script to update docs for a new release branch

This should be run on a release branch after branching from trunk.
Various links and references to trunk in the docs will be updated to
refer to the new release branch.

libstdc++-v3/ChangeLog:

* scripts/update_release_branch.sh: New file.

Diff:
---
 libstdc++-v3/scripts/update_release_branch.sh | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/libstdc++-v3/scripts/update_release_branch.sh b/libstdc++-v3/scripts/update_release_branch.sh
new file mode 100755
index 000..f8109ed0ba3
--- /dev/null
+++ b/libstdc++-v3/scripts/update_release_branch.sh
@@ -0,0 +1,14 @@
+#!/bin/bash
+
+# This should be run on a release branch after branching from trunk.
+# Various links and references to trunk in the docs will be updated to
+# refer to the new release branch.
+
+# The major version of the new release branch.
+major=$1
+(($major)) || { echo "$0: Integer argument expected" >& 2 ; exit 1; }
+
+# This assumes GNU sed
+sed -i "s@^mainline GCC, not in any particular major.\$@the GCC ${major} series.@" doc/xml/manual/status_cxx*.xml
+sed -i 's@https://gcc.gnu.org/cgit/gcc/tree/libstdc++-v3/testsuite/[^"]\+@&?h=releases%2Fgcc-'${major}@ doc/xml/manual/allocator.xml doc/xml/manual/mt_allocator.xml
+sed -i "s@https://gcc.gnu.org/onlinedocs/gcc/Invoking-GCC.html@https://gcc.gnu.org/onlinedocs/gcc-${major}.1.0/gcc/Invoking-GCC.html@" doc/xml/manual/using.xml


[gcc r14-10350] tree-ssa-pre.c/115214(ICE in find_or_generate_expression, at tree-ssa-pre.c:2780): Return NULL_TREE

2024-06-26 Thread Jiawei Chen via Gcc-cvs
https://gcc.gnu.org/g:6e6f10c3ad6f96752acd9c35b653b387d5c3fcf6

commit r14-10350-g6e6f10c3ad6f96752acd9c35b653b387d5c3fcf6
Author: Jiawei 
Date:   Mon May 27 15:40:51 2024 +0800

tree-ssa-pre.c/115214(ICE in find_or_generate_expression, at tree-ssa-pre.c:2780): Return NULL_TREE when dealing with special cases.

Return NULL_TREE when genop3 is an EXACT_DIV_EXPR.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652641.html

version log v3: remove additional POLY_INT_CST check.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652795.html

gcc/ChangeLog:

* tree-ssa-pre.cc (create_component_ref_by_pieces_1): New conditions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr115214.c: New test.

Diff:
---
 .../gcc.target/riscv/rvv/vsetvl/pr115214.c | 52 ++
 gcc/tree-ssa-pre.cc| 10 +++--
 2 files changed, 59 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr115214.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr115214.c
new file mode 100644
index 000..fce2e9da766
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr115214.c
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-options "-mrvv-vector-bits=scalable -march=rv64gcv -mabi=lp64d -O3 -w" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+
+#include 
+
+static inline __attribute__(()) int vaddq_f32();
+static inline __attribute__(()) int vload_tillz_f32(int nlane) {
+  vint32m1_t __trans_tmp_9;
+  {
+int __trans_tmp_0 = nlane;
+{
+  vint64m1_t __trans_tmp_1;
+  vint64m1_t __trans_tmp_2;
+  vint64m1_t __trans_tmp_3;
+  vint64m1_t __trans_tmp_4;
+  if (__trans_tmp_0 == 1) {
+{
+  __trans_tmp_3 =
+  __riscv_vslideup_vx_i64m1(__trans_tmp_1, __trans_tmp_2, 1, 2);
+}
+__trans_tmp_4 = __trans_tmp_2;
+  }
+  __trans_tmp_4 = __trans_tmp_3;
+  __trans_tmp_9 = __riscv_vreinterpret_v_i64m1_i32m1(__trans_tmp_3);
+}
+  }
+  return vaddq_f32(__trans_tmp_9); /* { dg-error {RVV type 'vint32m1_t' cannot be passed to an unprototyped function} } */
+}
+
+char CFLOAT_add_args[3];
+const int *CFLOAT_add_steps;
+const int CFLOAT_steps;
+
+__attribute__(()) void CFLOAT_add() {
+  char *b_src0 = &CFLOAT_add_args[0], *b_src1 = &CFLOAT_add_args[1],
+   *b_dst = &CFLOAT_add_args[2];
+  const float *src1 = (float *)b_src1;
+  float *dst = (float *)b_dst;
+  const int ssrc1 = CFLOAT_add_steps[1] / sizeof(float);
+  const int sdst = CFLOAT_add_steps[2] / sizeof(float);
+  const int hstep = 4 / 2;
+  vfloat32m1x2_t a;
+  int len = 255;
+  for (; len > 0; len -= hstep, src1 += 4, dst += 4) {
+int b = vload_tillz_f32(len);
+int r = vaddq_f32(a.__val[0], b); /* { dg-error {RVV type '__rvv_float32m1_t' cannot be passed to an unprototyped function} } */
+  }
+  for (; len > 0; --len, b_src0 += CFLOAT_steps,
+  b_src1 += CFLOAT_add_steps[1], b_dst += CFLOAT_add_steps[2])
+;
+}
diff --git a/gcc/tree-ssa-pre.cc b/gcc/tree-ssa-pre.cc
index 75217f5cde1..5cf1968bc26 100644
--- a/gcc/tree-ssa-pre.cc
+++ b/gcc/tree-ssa-pre.cc
@@ -2685,11 +2685,15 @@ create_component_ref_by_pieces_1 (basic_block block, vn_reference_t ref,
   here as the element alignment may be not visible.  See
   PR43783.  Simply drop the element size for constant
   sizes.  */
-   if (TREE_CODE (genop3) == INTEGER_CST
+   if ((TREE_CODE (genop3) == INTEGER_CST
&& TREE_CODE (TYPE_SIZE_UNIT (elmt_type)) == INTEGER_CST
&& wi::eq_p (wi::to_offset (TYPE_SIZE_UNIT (elmt_type)),
-(wi::to_offset (genop3)
- * vn_ref_op_align_unit (currop
+(wi::to_offset (genop3) * vn_ref_op_align_unit 
(currop
+ || (TREE_CODE (genop3) == EXACT_DIV_EXPR
+   && TREE_CODE (TREE_OPERAND (genop3, 1)) == INTEGER_CST
+   && operand_equal_p (TREE_OPERAND (genop3, 0), TYPE_SIZE_UNIT 
(elmt_type))
+   && wi::eq_p (wi::to_offset (TREE_OPERAND (genop3, 1)),
+vn_ref_op_align_unit (currop
  genop3 = NULL_TREE;
else
  {


[gcc r15-1669] tree-optimization/115493 - complete previous fix

2024-06-26 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:b7ba0670a768e76e87e04cfd6a72c28c35333b54

commit r15-1669-gb7ba0670a768e76e87e04cfd6a72c28c35333b54
Author: Richard Biener 
Date:   Wed Jun 26 19:11:04 2024 +0200

tree-optimization/115493 - complete previous fix

The following fixes the second occurrence of new_temp missed by the
previous fix.

PR tree-optimization/115493
* tree-vect-loop.cc (vect_create_epilog_for_reduction): Use
first scalar result.

Diff:
---
 gcc/tree-vect-loop.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 347dac97e49..6f32867f85a 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6849,7 +6849,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
  tree initial_def = reduc_info->reduc_initial_values[0];
  tree tmp = make_ssa_name (new_scalar_dest);
  epilog_stmt = gimple_build_assign (tmp, COND_EXPR, zcompare,
-initial_def, new_temp);
+initial_def, scalar_results[0]);
  gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
  scalar_results[0] = tmp;
}


[gcc r15-1670] tree-optimization/115652 - amend last fix

2024-06-26 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:c7cb0dd94589ab501bca27f93641b4074e5a2e99

commit r15-1670-gc7cb0dd94589ab501bca27f93641b4074e5a2e99
Author: Richard Biener 
Date:   Wed Jun 26 19:23:26 2024 +0200

tree-optimization/115652 - amend last fix

The previous fix breaks in the degenerate case when the discovered
last_stmt is equal to the first stmt in the block since then we
undo a required stmt advancement.

PR tree-optimization/115652
* tree-vect-slp.cc (vect_schedule_slp_node): Only insert
at the start of the block if that strictly dominates
the discovered dependent stmt.

Diff:
---
 gcc/tree-vect-slp.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 1f5b3fccf41..1252b613125 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -9747,7 +9747,8 @@ vect_schedule_slp_node (vec_info *vinfo,
  {
gimple_stmt_iterator si2
  = gsi_after_labels (LOOP_VINFO_LOOP (loop_vinfo)->header);
-   if (vect_stmt_dominates_stmt_p (last_stmt, *si2))
+   if (last_stmt != *si2
+   && vect_stmt_dominates_stmt_p (last_stmt, *si2))
  si = si2;
  }
}


[gcc r15-1671] Vect: Support truncate after .SAT_SUB pattern in zip

2024-06-26 Thread Pan Li via Gcc-cvs
https://gcc.gnu.org/g:f2476a2649e9975d454d179145574c21d8218aee

commit r15-1671-gf2476a2649e9975d454d179145574c21d8218aee
Author: Pan Li 
Date:   Thu Jun 27 09:28:04 2024 +0800

Vect: Support truncate after .SAT_SUB pattern in zip

The zip benchmark of coremark-pro has one SAT_SUB-like pattern whose
result is truncated, as below:

void test (uint16_t *x, unsigned b, unsigned n)
{
  unsigned a = 0;
  register uint16_t *p = x;

  do {
a = *--p;
*p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
  } while (--n);
}

This yields the gimple below before the vect pass.  It does not match
any SAT_SUB pattern and therefore cannot be vectorized to SAT_SUB.

_2 = a_11 - b_12(D);
iftmp.0_13 = (short unsigned int) _2;
_18 = a_11 >= b_12(D);
iftmp.0_5 = _18 ? iftmp.0_13 : 0;

This patch improves the pattern match to recognize the above as a
truncate after .SAT_SUB pattern.  We then get a pattern similar to
the one below, and the first 3 stmts become dead and are eliminated.

_2 = a_11 - b_12(D);
iftmp.0_13 = (short unsigned int) _2;
_18 = a_11 >= b_12(D);
iftmp.0_5 = (short unsigned int).SAT_SUB (a_11, b_12(D));
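
As a scalar reference for what the recognized pattern computes, the sketch below rewrites the loop from this commit message as an explicit saturating subtraction followed by truncation.  The names sat_sub and sat_sub_then_truncate are hypothetical; .SAT_SUB itself is a GCC internal function and cannot be called from source code.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned saturating subtraction: clamps at 0 instead of wrapping.
constexpr unsigned sat_sub(unsigned a, unsigned b)
{
  return a >= b ? a - b : 0u;
}

// Same shape as 'test' in the commit message: walk backwards from 'end'
// over 'n' elements (n must be at least 1) and store the truncated
// saturating difference.
inline void sat_sub_then_truncate(uint16_t* end, unsigned b, unsigned n)
{
  uint16_t* p = end;
  do {
    const unsigned a = *--p;
    *p = static_cast<uint16_t>(sat_sub(a, b));  // truncate after SAT_SUB
  } while (--n);
}

// Small self-check on a three-element buffer.
inline void check_small_buffer()
{
  uint16_t buf[3] = {10, 300, 65535};
  sat_sub_then_truncate(buf + 3, 300, 3);
  assert(buf[0] == 0 && buf[1] == 0 && buf[2] == 65235);
}
```

The subtraction is done in the wider unsigned type and only the result is narrowed, which is the case the extended match.pd pattern is meant to cover.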

The following tests pass with this patch.
1. The rv64gcv full regression tests.
2. The rv64gcv build with glibc.
3. The x86 bootstrap tests.
4. The x86 full regression tests.

gcc/ChangeLog:

* match.pd: Add convert description for minus and capture.
* tree-vect-patterns.cc (vect_recog_build_binary_gimple_call): Add
new logic to handle an in_type that is incompatible with out_type,
and rename to ...
(vect_recog_build_binary_gimple_stmt): ... this.
(vect_recog_sat_add_pattern): Leverage above renamed func.
(vect_recog_sat_sub_pattern): Ditto.

Signed-off-by: Pan Li 

Diff:
---
 gcc/match.pd  |  4 ++--
 gcc/tree-vect-patterns.cc | 51 ---
 2 files changed, 33 insertions(+), 22 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index cf8a399a744..820591a36b3 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3164,9 +3164,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* Unsigned saturation sub, case 2 (branch with ge):
SAT_U_SUB = X >= Y ? X - Y : 0.  */
 (match (unsigned_integer_sat_sub @0 @1)
- (cond^ (ge @0 @1) (minus @0 @1) integer_zerop)
+ (cond^ (ge @0 @1) (convert? (minus (convert1? @0) (convert1? @1))) integer_zerop)
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
-  && types_match (type, @0, @1
+  && TYPE_UNSIGNED (TREE_TYPE (@0)) && types_match (@0, @1
 
 /* Unsigned saturation sub, case 3 (branchless with gt):
SAT_U_SUB = (X - Y) * (X > Y).  */
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index cef901808eb..519d15f2a43 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -4490,26 +4490,37 @@ vect_recog_mult_pattern (vec_info *vinfo,
 extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
 extern bool gimple_unsigned_integer_sat_sub (tree, tree*, tree (*)(tree));
 
-static gcall *
-vect_recog_build_binary_gimple_call (vec_info *vinfo, gimple *stmt,
+static gimple *
+vect_recog_build_binary_gimple_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
 internal_fn fn, tree *type_out,
-tree op_0, tree op_1)
+tree lhs, tree op_0, tree op_1)
 {
   tree itype = TREE_TYPE (op_0);
-  tree vtype = get_vectype_for_scalar_type (vinfo, itype);
+  tree otype = TREE_TYPE (lhs);
+  tree v_itype = get_vectype_for_scalar_type (vinfo, itype);
+  tree v_otype = get_vectype_for_scalar_type (vinfo, otype);
 
-  if (vtype != NULL_TREE
-&& direct_internal_fn_supported_p (fn, vtype, OPTIMIZE_FOR_BOTH))
+  if (v_itype != NULL_TREE && v_otype != NULL_TREE
+&& direct_internal_fn_supported_p (fn, v_itype, OPTIMIZE_FOR_BOTH))
 {
   gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1);
+  tree in_ssa = vect_recog_temp_ssa_var (itype, NULL);
 
-  gimple_call_set_lhs (call, vect_recog_temp_ssa_var (itype, NULL));
+  gimple_call_set_lhs (call, in_ssa);
   gimple_call_set_nothrow (call, /* nothrow_p */ false);
-  gimple_set_location (call, gimple_location (stmt));
+  gimple_set_location (call, gimple_location (STMT_VINFO_STMT (stmt_info)));
+
+  *type_out = v_otype;
 
-  *type_out = vtype;
+  if (types_compatible_p (itype, otype))
+   return call;
+  else
+   {
+ append_pattern_def_seq (vinfo, stmt_info, call, v_itype);
+ tree out_ssa = vect_recog_temp_ssa_var (otype, NULL);
 
-  return call;
+ return gimple_build_assign (out_ssa, NOP_EXPR, in_ssa);
+   }
 }
 
   return NULL;
@@ -4541,13 +4552,13 @@ vect_recog_sat_add_pattern (vec_info *vinfo, 
stmt_v

[gcc r15-1672] Internal-fn: Support new IFN SAT_TRUNC for unsigned scalar int

2024-06-26 Thread Pan Li via Gcc-cvs
https://gcc.gnu.org/g:212441e19d8179645efbec6dd98a74eb673734dd

commit r15-1672-g212441e19d8179645efbec6dd98a74eb673734dd
Author: Pan Li 
Date:   Wed Jun 26 09:28:05 2024 +0800

Internal-fn: Support new IFN SAT_TRUNC for unsigned scalar int

This patch adds the middle-end representation for saturating
truncation, i.e. the truncated result is set to the max value of the
narrow type when the input overflows it.  It matches patterns similar
to the one below.

Form 1:
  #define DEF_SAT_U_TRUC_FMT_1(WT, NT) \
  NT __attribute__((noinline)) \
  sat_u_truc_##T##_fmt_1 (WT x)\
  {\
bool overflow = x > (WT)(NT)(-1);  \
return ((NT)x) | (NT)-overflow;\
  }

For example, when truncating uint16_t to uint8_t, we have

* SAT_TRUNC (254)   => 254
* SAT_TRUNC (255)   => 255
* SAT_TRUNC (256)   => 255
* SAT_TRUNC (65535) => 255
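
The values listed above can be reproduced with a small standalone sketch.  It is only an illustration and not part of the patch: the function names are hypothetical, and it hand-expands Form 1 for WT = uint16_t and NT = uint8_t, then compares it against an explicit clamp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hand expansion of Form 1 for WT = uint16_t, NT = uint8_t.  */
static uint8_t
sat_u_trunc_u16_u8 (uint16_t x)
{
  bool overflow = x > (uint16_t)(uint8_t)(-1);   /* x > 255 */
  /* -overflow is 0 or -1; as uint8_t that is 0x00 or 0xff.  */
  return (uint8_t)((uint8_t)x | (uint8_t)-overflow);
}

/* Reference form with an explicit clamp, for comparison only.  */
static uint8_t
sat_u_trunc_ref (uint16_t x)
{
  return x > UINT8_MAX ? UINT8_MAX : (uint8_t)x;
}

/* Checks that both forms agree for every uint16_t input.  */
static void
check_sat_u_trunc (void)
{
  for (uint32_t i = 0; i <= UINT16_MAX; i++)
    assert (sat_u_trunc_u16_u8 ((uint16_t)i) == sat_u_trunc_ref ((uint16_t)i));
}
```

Calling check_sat_u_trunc () walks the whole uint16_t range, so it covers each boundary value in the list.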

Consider the SAT_TRUNC below, from uint64_t to uint32_t.

DEF_SAT_U_TRUC_FMT_1 (uint64_t, uint32_t)

Before this patch:
__attribute__((noinline))
uint32_t sat_u_truc_T_fmt_1 (uint64_t x)
{
  _Bool overflow;
  unsigned int _1;
  unsigned int _2;
  unsigned int _3;
  uint32_t _6;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  overflow_5 = x_4(D) > 4294967295;
  _1 = (unsigned int) x_4(D);
  _2 = (unsigned int) overflow_5;
  _3 = -_2;
  _6 = _1 | _3;
  return _6;
;;succ:   EXIT

}

After this patch:
__attribute__((noinline))
uint32_t sat_u_truc_T_fmt_1 (uint64_t x)
{
  uint32_t _6;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _6 = .SAT_TRUNC (x_4(D)); [tail call]
  return _6;
;;succ:   EXIT

}

The following tests pass with this patch:
* The rv64gcv full regression tests.
* The rv64gcv build with glibc.
* The x86 bootstrap tests.
* The x86 full regression tests.

gcc/ChangeLog:

* internal-fn.def (SAT_TRUNC): Add new signed IFN sat_trunc as
unary_convert.
* match.pd: Add new matching pattern for unsigned int sat_trunc.
* optabs.def (OPTAB_CL): Add unsigned and signed optab.
* tree-ssa-math-opts.cc (gimple_unsigend_integer_sat_trunc): Add
new decl for the matching pattern generated func.
(match_unsigned_saturation_trunc): Add new func impl to match
the .SAT_TRUNC.
(math_opts_dom_walker::after_dom_children): Add .SAT_TRUNC match
function under BIT_IOR_EXPR case.

Signed-off-by: Pan Li 

Diff:
---
 gcc/internal-fn.def   |  2 ++
 gcc/match.pd  | 16 
 gcc/optabs.def|  3 +++
 gcc/tree-ssa-math-opts.cc | 32 
 4 files changed, 53 insertions(+)

diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index a8c83437ada..915d329c05a 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -278,6 +278,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first,
 DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd, binary)
 DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_SUB, ECF_CONST, first, sssub, ussub, binary)
 
+DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_TRUNC, ECF_CONST, first, sstrunc, ustrunc, unary_convert)
+
 DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
 DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
 DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
diff --git a/gcc/match.pd b/gcc/match.pd
index 820591a36b3..3fa3f2e8296 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3210,6 +3210,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
   && types_match (type, @0, @1
 
+/* Unsigned saturation truncate, case 1 (), sizeof (WT) > sizeof (NT).
+   SAT_U_TRUNC = (NT)x | (NT)(-(X > (WT)(NT)(-1))).  */
+(match (unsigned_integer_sat_trunc @0)
+ (bit_ior:c (negate (convert (gt @0 INTEGER_CST@1)))
+   (convert @0))
+ (with {
+   unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0));
+   unsigned otype_precision = TYPE_PRECISION (type);
+   wide_int trunc_max = wi::mask (itype_precision / 2, false, itype_precision);
+   wide_int int_cst = wi::to_wide (@1, itype_precision);
+  }
+  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+   && TYPE_UNSIGNED (TREE_TYPE (@0))
+   && otype_precision < itype_precision
+   && wi::eq_p (trunc_max, int_cst)
+
 /* x >  y  &&  x != XXX_MIN  -->  x > y
x >  y  &&  x == XXX_MIN  -->  false . */
 (for eqne (eq ne)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 2f36ed4cb42..a69af51d601 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -63,6 +63,9 @@ OPTAB_CX(fractuns_optab, "fractuns$Q$b$I$a2")
 OPTAB_CL(satfract_optab, "satfract$b$Q$a2", SAT_FRACT, "satfract", gen_satfract_conv_libfunc)
 OPTAB_CL(satfractuns_op

[gcc r15-1673] Fix wrong cost of MEM when addr is a lea.

2024-06-26 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:b8153b5417bed02f47354a14ad36100785dfdc47

commit r15-1673-gb8153b5417bed02f47354a14ad36100785dfdc47
Author: liuhongt 
Date:   Mon Jun 24 17:53:22 2024 +0800

Fix wrong cost of MEM when addr is a lea.

416.gamess regressed 4-6% on x86_64 since my r15-882-g1d6199e5f8c1c0.
That commit adjusted the rtx_cost of MEM to reduce the cost of
(add op0 disp).  But the cost of ADDR can be cheaper than that of
XEXP (addr, 0) when ADDR is a lea, which is the case in the PR.  This
patch adjusts rtx_cost to only handle reg + disp; the other forms are
basically all LEA, which has no additional cost for the ADD.

gcc/ChangeLog:

PR target/115462
* config/i386/i386.cc (ix86_rtx_costs): Make cost of MEM (reg +
disp) just a little bit more than MEM (reg).

gcc/testsuite/ChangeLog:
* gcc.target/i386/pr115462.c: New test.

Diff:
---
 gcc/config/i386/i386.cc  |  5 -
 gcc/testsuite/gcc.target/i386/pr115462.c | 22 ++
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 1f71ed04be6..92e3c67112e 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -22154,7 +22154,10 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
 address_cost should be used, but it reduce cost too much.
 So current solution is make constant disp as cheap as possible.  */
  if (GET_CODE (addr) == PLUS
- && x86_64_immediate_operand (XEXP (addr, 1), Pmode))
+ && x86_64_immediate_operand (XEXP (addr, 1), Pmode)
+ /* Only hanlde (reg + disp) since other forms of addr are mostly LEA,
+there's no additional cost for the plus of disp.  */
+ && register_operand (XEXP (addr, 0), Pmode))
{
  *total += 1;
  *total += rtx_cost (XEXP (addr, 0), Pmode, PLUS, 0, speed);
diff --git a/gcc/testsuite/gcc.target/i386/pr115462.c b/gcc/testsuite/gcc.target/i386/pr115462.c
new file mode 100644
index 000..ad50a6382bc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr115462.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx2 -fno-tree-vectorize -fno-pic" } */
+/* { dg-final { scan-assembler-times {(?n)movl[ \t]+.*, p1\.0\+[0-9]*\(,} 3 } } */
+
+int
+foo (long indx, long indx2, long indx3, long indx4, long indx5, long indx6, long n, int* q)
+{
+  static int p1[1];
+  int* p2 = p1 + 1000;
+  int* p3 = p1 + 4000;
+  int* p4 = p1 + 8000;
+
+  for (long i = 0; i != n; i++)
+{
+  /* scan for  movl%edi, p1.0+3996(,%rax,4),
+p1.0+3996 should be propagted into the loop.  */
+  p2[indx++] = q[indx++];
+  p3[indx2++] = q[indx2++];
+  p4[indx3++] = q[indx3++];
+}
+  return p1[indx6] + p1[indx5];
+}


[gcc r15-1674] LoongArch: Tweak IOR rtx_cost for bstrins

2024-06-26 Thread Xi Ruoyao via Gcc-cvs
https://gcc.gnu.org/g:94aade062a4ab689abc4c3422c1b901ab0733c19

commit r15-1674-g94aade062a4ab689abc4c3422c1b901ab0733c19
Author: Xi Ruoyao 
Date:   Sat Jun 15 18:29:43 2024 +0800

LoongArch: Tweak IOR rtx_cost for bstrins

Consider

c &= 0xfff;
a &= ~0xfff;
b &= ~0xfff;
a |= c;
b |= c;

This can be done with 2 bstrins instructions.  But we need to recognize
it in loongarch_rtx_costs or the compiler will not propagate "c & 0xfff"
forward.
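
As an illustration only, the sequence above can be modelled in plain C.  The helper bstrins_model below is a hypothetical software model of a bit-string insert (replace bits [msb:lsb] of the destination with the low bits of the source); it is not taken from the patch or from loongarch.cc.  The sketch shows that the and/and/ior sequence for one destination is the same as inserting the low 12 bits of c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a bit-string insert: replace bits [msb:lsb] of RD
   with bits [msb-lsb:0] of RJ.  Requires lsb <= msb <= 63.  */
static uint64_t
bstrins_model (uint64_t rd, uint64_t rj, unsigned msb, unsigned lsb)
{
  unsigned len = msb - lsb + 1;
  uint64_t field = len == 64 ? ~UINT64_C (0) : (UINT64_C (1) << len) - 1;
  return (rd & ~(field << lsb)) | ((rj & field) << lsb);
}

/* The and/ior sequence from the commit message, for one destination.  */
static uint64_t
mask_and_ior (uint64_t a, uint64_t c)
{
  c &= 0xfff;
  a &= ~UINT64_C (0xfff);
  a |= c;
  return a;
}

/* Checks that both forms agree on a few sample values.  */
static void
check_bstrins_model (void)
{
  const uint64_t samples[] = { 0, 1, 0xfff, 0x1000, UINT64_C (0x123456789abcdef0),
			       ~UINT64_C (0) };
  for (unsigned i = 0; i < 6; i++)
    for (unsigned j = 0; j < 6; j++)
      assert (mask_and_ior (samples[i], samples[j])
	      == bstrins_model (samples[i], samples[j], 11, 0));
}
```

With two destinations a and b sharing the same c, each destination needs one such insert, which matches the "2 bstrins instructions" mentioned above.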

gcc/ChangeLog:

* config/loongarch/loongarch.cc:
(loongarch_use_bstrins_for_ior_with_mask): Split the main logic
into ...
(loongarch_use_bstrins_for_ior_with_mask_1): ... here.
(loongarch_rtx_costs): Special-case IORs that can be
implemented with bstrins.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/bstrins-3.c: New test.

Diff:
---
 gcc/config/loongarch/loongarch.cc  | 73 --
 gcc/testsuite/gcc.target/loongarch/bstrins-3.c | 16 ++
 2 files changed, 72 insertions(+), 17 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index e2ff2af89e2..5119d878731 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -3681,6 +3681,27 @@ loongarch_set_reg_reg_piece_cost (machine_mode mode, unsigned int units)
   return COSTS_N_INSNS ((GET_MODE_SIZE (mode) + units - 1) / units);
 }
 
+static int
+loongarch_use_bstrins_for_ior_with_mask_1 (machine_mode mode,
+  unsigned HOST_WIDE_INT mask1,
+  unsigned HOST_WIDE_INT mask2)
+{
+  if (mask1 != ~mask2 || !mask1 || !mask2)
+return 0;
+
+  /* Try to avoid a right-shift.  */
+  if (low_bitmask_len (mode, mask1) != -1)
+return -1;
+
+  if (low_bitmask_len (mode, mask2 >> (ffs_hwi (mask2) - 1)) != -1)
+return 1;
+
+  if (low_bitmask_len (mode, mask1 >> (ffs_hwi (mask1) - 1)) != -1)
+return -1;
+
+  return 0;
+}
+
 /* Return the cost of moving between two registers of mode MODE.  */
 
 static int
@@ -3812,6 +3833,38 @@ loongarch_rtx_costs (rtx x, machine_mode mode, int outer_code,
   /* Fall through.  */
 
 case IOR:
+  {
+   rtx op[2] = {XEXP (x, 0), XEXP (x, 1)};
+   if (GET_CODE (op[0]) == AND && GET_CODE (op[1]) == AND
+   && (mode == SImode || (TARGET_64BIT && mode == DImode)))
+ {
+   rtx rtx_mask0 = XEXP (op[0], 1), rtx_mask1 = XEXP (op[1], 1);
+   if (CONST_INT_P (rtx_mask0) && CONST_INT_P (rtx_mask1))
+ {
+   unsigned HOST_WIDE_INT mask0 = UINTVAL (rtx_mask0);
+   unsigned HOST_WIDE_INT mask1 = UINTVAL (rtx_mask1);
+   if (loongarch_use_bstrins_for_ior_with_mask_1 (mode,
+  mask0,
+  mask1))
+ {
+   /* A bstrins instruction */
+   *total = COSTS_N_INSNS (1);
+
+   /* A srai instruction */
+   if (low_bitmask_len (mode, mask0) == -1
+   && low_bitmask_len (mode, mask1) == -1)
+ *total += COSTS_N_INSNS (1);
+
+   for (int i = 0; i < 2; i++)
+ *total += set_src_cost (XEXP (op[i], 0), mode, speed);
+
+   return true;
+ }
+ }
+ }
+  }
+
+  /* Fall through.  */
 case XOR:
   /* Double-word operations use two single-word operations.  */
   *total = loongarch_binary_cost (x, COSTS_N_INSNS (1), COSTS_N_INSNS (2),
@@ -5796,23 +5849,9 @@ bool loongarch_pre_reload_split (void)
 int
 loongarch_use_bstrins_for_ior_with_mask (machine_mode mode, rtx *op)
 {
-  unsigned HOST_WIDE_INT mask1 = UINTVAL (op[2]);
-  unsigned HOST_WIDE_INT mask2 = UINTVAL (op[4]);
-
-  if (mask1 != ~mask2 || !mask1 || !mask2)
-return 0;
-
-  /* Try to avoid a right-shift.  */
-  if (low_bitmask_len (mode, mask1) != -1)
-return -1;
-
-  if (low_bitmask_len (mode, mask2 >> (ffs_hwi (mask2) - 1)) != -1)
-return 1;
-
-  if (low_bitmask_len (mode, mask1 >> (ffs_hwi (mask1) - 1)) != -1)
-return -1;
-
-  return 0;
+  return loongarch_use_bstrins_for_ior_with_mask_1 (mode,
+   UINTVAL (op[2]),
+   UINTVAL (op[4]));
 }
 
 /* Rewrite a MEM for simple load/store under -mexplicit-relocs=auto
diff --git a/gcc/testsuite/gcc.target/loongarch/bstrins-3.c b/gcc/testsuite/gcc.target/loongarch/bstrins-3.c
new file mode 100644
index 000..13762bdef42
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/bstrins-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-final" } */
+/* { dg-final { scan-rtl-dump-times 

[gcc r15-1675] LoongArch: NFC: Dedup and sort the comment in loongarch_print_operand_reloc

2024-06-26 Thread Xi Ruoyao via Gcc-cvs
https://gcc.gnu.org/g:2280e88ab05ebab994b7db588d577b29f1b12b87

commit r15-1675-g2280e88ab05ebab994b7db588d577b29f1b12b87
Author: Xi Ruoyao 
Date:   Sun Jun 16 12:22:40 2024 +0800

LoongArch: NFC: Dedup and sort the comment in loongarch_print_operand_reloc

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_print_operand_reloc):
Dedup and sort the comment describing modifiers.

Diff:
---
 gcc/config/loongarch/loongarch.cc | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 5119d878731..0fb547e00f4 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -6132,21 +6132,13 @@ loongarch_print_operand_reloc (FILE *file, rtx op, bool hi64_part,
'T' Print 'f' for (eq:CC ...), 't' for (ne:CC ...),
  'z' for (eq:?I ...), 'n' for (ne:?I ...).
't' Like 'T', but with the EQ/NE cases reversed
-   'F' Print the FPU branch condition for comparison OP.
-   'W' Print the inverse of the FPU branch condition for comparison OP.
-   'w' Print a LSX register.
'u' Print a LASX register.
-   'T' Print 'f' for (eq:CC ...), 't' for (ne:CC ...),
- 'z' for (eq:?I ...), 'n' for (ne:?I ...).
-   't' Like 'T', but with the EQ/NE cases reversed
-   'Y' Print loongarch_fp_conditions[INTVAL (OP)]
-   'Z' Print OP and a comma for 8CC, otherwise print nothing.
-   'z' Print $0 if OP is zero, otherwise print OP normally.
'v' Print the insn size suffix b, h, w or d for vector modes V16QI, V8HI,
  V4SI, V2SI, and w, d for vector modes V4SF, V2DF respectively.
'V' Print exact log2 of CONST_INT OP element 0 of a replicated
  CONST_VECTOR in decimal.
'W' Print the inverse of the FPU branch condition for comparison OP.
+   'w' Print a LSX register.
'X' Print CONST_INT OP in hexadecimal format.
'x' Print the low 16 bits of CONST_INT OP in hexadecimal format.
'Y' Print loongarch_fp_conditions[INTVAL (OP)]


[gcc r15-1676] RISC-V: Add testcases for vector truncate after .SAT_SUB

2024-06-26 Thread Pan Li via Gcc-cvs
https://gcc.gnu.org/g:b55798c0fc5cb02512b58502961d8425fb60588f

commit r15-1676-gb55798c0fc5cb02512b58502961d8425fb60588f
Author: Pan Li 
Date:   Mon Jun 24 22:25:57 2024 +0800

RISC-V: Add testcases for vector truncate after .SAT_SUB

This patch adds test cases for vector truncation after .SAT_SUB,
i.e.:

  #define DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(OUT_T, IN_T)   \
  void __attribute__((noinline))   \
  vec_sat_u_sub_trunc_##OUT_T##_fmt_1 (OUT_T *out, IN_T *op_1, IN_T y, \
 unsigned limit) \
  {\
unsigned i;\
for (i = 0; i < limit; i++)\
  {\
IN_T x = op_1[i];  \
out[i] = (OUT_T)(x >= y ? x - y : 0);  \
  }\
  }

The following 3 cases are included.

DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint8_t, uint16_t)
DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint16_t, uint32_t)
DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint32_t, uint64_t)
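
For reference, a hand-expanded sketch of the first case (uint8_t from uint16_t) is given below.  The function and check names are hypothetical and are not part of the test suite.  It only illustrates the scalar semantics: a saturating subtract in the wide type followed by a plain (non-saturating) truncation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hand expansion of the macro for OUT_T = uint8_t,
   IN_T = uint16_t.  */
static void
sat_u_sub_trunc_u8_u16 (uint8_t *out, const uint16_t *op_1, uint16_t y,
			size_t limit)
{
  for (size_t i = 0; i < limit; i++)
    {
      uint16_t x = op_1[i];
      /* Saturating subtract in the wide type: clamps to 0 on underflow.  */
      uint16_t diff = x >= y ? (uint16_t)(x - y) : 0;
      /* Plain truncation: keeps only the low 8 bits.  */
      out[i] = (uint8_t)diff;
    }
}

/* Checks a few hand-computed values.  */
static void
check_sat_u_sub_trunc (void)
{
  const uint16_t in[4] = { 5, 10, 300, 65535 };
  uint8_t out[4];

  sat_u_sub_trunc_u8_u16 (out, in, 10, 4);
  assert (out[0] == 0);    /* 5 < 10, clamps to 0.  */
  assert (out[1] == 0);    /* 10 - 10.  */
  assert (out[2] == 34);   /* 290 = 0x122, low 8 bits are 0x22.  */
  assert (out[3] == 245);  /* 65525 = 0xfff5, low 8 bits are 0xf5.  */
}
```

Note that the truncation itself does not saturate: 300 - 10 = 290 becomes 34, not 255.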

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper
test macros.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_scalar.h: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-run-3.c: New test.

Signed-off-by: Pan Li 

Diff:
---
 .../riscv/rvv/autovec/binop/vec_sat_arith.h| 19 ++
 .../rvv/autovec/binop/vec_sat_binary_scalar.h  | 27 
 .../rvv/autovec/binop/vec_sat_u_sub_trunc-1.c  | 21 ++
 .../rvv/autovec/binop/vec_sat_u_sub_trunc-2.c  | 21 ++
 .../rvv/autovec/binop/vec_sat_u_sub_trunc-3.c  | 21 ++
 .../rvv/autovec/binop/vec_sat_u_sub_trunc-run-1.c  | 74 ++
 .../rvv/autovec/binop/vec_sat_u_sub_trunc-run-2.c  | 74 ++
 .../rvv/autovec/binop/vec_sat_u_sub_trunc-run-3.c  | 74 ++
 8 files changed, 331 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
index d5c81fbe5a9..a3116033fb3 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
@@ -310,4 +310,23 @@ vec_sat_u_sub_##T##_fmt_10 (T *out, T *op_1, T *op_2, unsigned limit) \
 #define RUN_VEC_SAT_U_SUB_FMT_10(T, out, op_1, op_2, N) \
   vec_sat_u_sub_##T##_fmt_10(out, op_1, op_2, N)
 
+/**/
+/* Saturation Sub Truncated (Unsigned and Signed) */
+/**/
+#define DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(OUT_T, IN_T)   \
+void __attribute__((noinline))   \
+vec_sat_u_sub_trunc_##OUT_T##_fmt_1 (OUT_T *out, IN_T *op_1, IN_T y, \
+unsigned limit) \
+{\
+  unsigned i;\
+  for (i = 0; i < limit; i++)\
+{\
+  IN_T x = op_1[i];  \
+  out[i] = (OUT_T)(x >= y ? x - y : 0);  \
+}\
+}
+
+#define RUN_VEC_SAT_U_SUB_TRUNC_FMT_1(OUT_T, IN_T, out, op_1, y, N) \
+  vec_sat_u_sub_trunc_##OUT_T##_fmt_1(out, op_1, y, N)
+
 #endif
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_scalar.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_scalar.h
new file mode 100644
index 000..c79b180054e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_scalar.h
@@ -0,0 +1,27 @@
+#ifndef HAVE_DEFINED_VEC_SAT_BINARY_SCALAR
+#define HAVE_DEFINED_VEC_SAT_BINARY_SCALAR
+
+int
+main ()
+{
+  unsigned i, k;
+  OUT_T out[N];
+
+  for (i = 0; i < size