[gcc(refs/users/meissner/heads/work182)] Revert changes

2024-11-06 Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:848be2528cd3731de39b8f7416112443ebfc1a6e

commit 848be2528cd3731de39b8f7416112443ebfc1a6e
Author: Michael Meissner 
Date:   Wed Nov 6 13:20:30 2024 -0500

Revert changes

Diff:
---
 gcc/ChangeLog.meissner  | 425 +---
 gcc/config.gcc  |   4 +-
 gcc/config/rs6000/aix71.h   |   1 -
 gcc/config/rs6000/aix72.h   |   1 -
 gcc/config/rs6000/aix73.h   |   1 -
 gcc/config/rs6000/dfp.md|   2 +-
 gcc/config/rs6000/driver-rs6000.cc  |   2 -
 gcc/config/rs6000/power10.md| 144 
 gcc/config/rs6000/rs6000-arch.def   |  49 ---
 gcc/config/rs6000/rs6000-builtin.cc |  14 +-
 gcc/config/rs6000/rs6000-c.cc   |  29 +-
 gcc/config/rs6000/rs6000-cpus.def   |  11 +-
 gcc/config/rs6000/rs6000-opts.h |   1 -
 gcc/config/rs6000/rs6000-protos.h   |   5 +-
 gcc/config/rs6000/rs6000-string.cc  |   4 +-
 gcc/config/rs6000/rs6000-tables.opt |  11 +-
 gcc/config/rs6000/rs6000.cc | 403 ++
 gcc/config/rs6000/rs6000.h  |  82 ++---
 gcc/config/rs6000/rs6000.md |  66 ++--
 gcc/config/rs6000/rs6000.opt|  19 +-
 gcc/testsuite/gcc.target/powerpc/ppc-target-4.c |  38 +--
 gcc/testsuite/gcc.target/powerpc/pr115688.c |   3 +-
 22 files changed, 297 insertions(+), 1018 deletions(-)

diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index 77e940a625fb..a04bd0a46f88 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,417 +1,14 @@
- Branch work182, patch #21 
-
-Add -mcpu=future tuning support.
-
-This patch makes -mtune=future use the same tuning decision as -mtune=power11.
-
-2024-10-22  Michael Meissner  
-
-gcc/
-
-   * config/rs6000/power10.md (all reservations): Add future as an
-   alterntive to power10 and power11.
-
- Branch work182, patch #20 
-
-Add support for -mcpu=future
-
-This patch adds the support that can be used in developing GCC support for
-future PowerPC processors.
-
-2024-10-22  Michael Meissner  
-
-   * config.gcc (powerpc*-*-*): Add support for --with-cpu=future.
-   * config/rs6000/aix71.h (ASM_CPU_SPEC): Add support for -mcpu=future.
-   * config/rs6000/aix72.h (ASM_CPU_SPEC): Likewise.
-   * config/rs6000/aix73.h (ASM_CPU_SPEC): Likewise.
-   * config/rs6000/driver-rs6000.cc (asm_names): Likewise.
-   * config/rs6000/rs6000-arch.def: Add future cpu.
-   * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): If
-   -mcpu=future, define _ARCH_FUTURE.
-   * config/rs6000/rs6000-cpus.def (FUTURE_MASKS_SERVER): New macro.
-   (future cpu): Define.
-   * config/rs6000/rs6000-opts.h (enum processor_type): Add
-   PROCESSOR_FUTURE.
-   * config/rs6000/rs6000-tables.opt: Regenerate.
-   * config/rs6000/rs6000.cc (power10_cost): Update comment.
-   (get_arch_flags): Add support for future processor.
-   (rs6000_option_override_internal): Likewise.
-   (rs6000_machine_from_flags): Likewise.
-   (rs6000_reassociation_width): Likewise.
-   (rs6000_adjust_cost): Likewise.
-   (rs6000_issue_rate): Likewise.
-   (rs6000_sched_reorder): Likewise.
-   (rs6000_sched_reorder2): Likewise.
-   (rs6000_register_move_cost): Likewise.
-   * config/rs6000/rs6000.h (ASM_CPU_SPEC): Likewise.
-   (TARGET_POWER11): New macro.
-   * config/rs6000/rs6000.md (cpu attribute): Likewise.
-
- Branch work182, patch #9 
-
-Update tests to work with architecture flags changes.
-
-Two tests used -mvsx to raise the processor level to at least power7.  These
-tests were rewritten to add cpu=power7 support.
-
-I have built both big endian and little endian bootstrap compilers and there
-were no regressions.
-
-In addition, I constructed a test case that used every archiecture define (like
-_ARCH_PWR4, etc.) and I also looked at the .machine directive generated.  I ran
-this test for all supported combinations of -mcpu, big/little endian, and 32/64
-bit support.  Every single instance generated exactly the same code with the
-patches installed compared to the compiler before installing the patches.
-
-Can I install this patch on the GCC 15 trunk?
-
-2024-10-22  Michael Meissner  
-
-gcc/testsuite/
-
-   * gcc.target/powerpc/ppc-target-4.c: Rewrite the test to add cpu=power7
-   when we need to add VSX support.  Add test for adding cpu=power7 no-vsx
-   to generate only Altivec instructions.
-   * gcc.target/powerpc/pr115688.c: Add cpu=power7 when requesting VSX
-   instructions.
-
- Branch work182, patch #8 
-
-Change TARG

[gcc r15-4989] openmp: Fix signed/unsigned warning

2024-11-06 Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:345eb9b795d9728733bd0e472529e259ce796ff6

commit r15-4989-g345eb9b795d9728733bd0e472529e259ce796ff6
Author: Andrew Stubbs 
Date:   Wed Nov 6 17:50:00 2024 +

openmp: Fix signed/unsigned warning

My previous patch broke the build when -Werror is enabled.

gcc/ChangeLog:

* omp-general.cc (omp_max_vf): Cast the constant to poly_uint64.
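
The warning can be reproduced outside of GCC. The following is a minimal sketch under stated assumptions: `my_ordered_max` and `host_max_vf` are hypothetical stand-ins, not the real `ordered_max`, `poly_uint64`, or `omp_max_vf`. It shows why a plain `int` literal next to an unsigned value mixes signedness in a templated comparison, and how converting the constant first avoids that.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a two-type max helper; not GCC's ordered_max.
template <typename A, typename B>
constexpr auto my_ordered_max (A a, B b)
{
  // With A = int and B = uint64_t, `a < b` compares signed with unsigned,
  // which -Wsign-compare reports and -Werror turns into a build failure.
  using R = std::common_type_t<A, B>;
  return a < b ? static_cast<R> (b) : static_cast<R> (a);
}

// Hypothetical stand-in for the non-offload query; returns an unsigned value.
constexpr std::uint64_t host_max_vf () { return 16; }

constexpr std::uint64_t offload_max_vf ()
{
  // Writing my_ordered_max (64, host_max_vf ()) would deduce A = int.
  // Converting the literal first keeps both operands unsigned.
  return my_ordered_max (std::uint64_t (64), host_max_vf ());
}
```

The design choice is the same as in the one-line fix: the conversion happens at the call site, so the helper itself does not need to change.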

Diff:
---
 gcc/omp-general.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index 1ae575ee181f..72fb7f92ff70 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -1005,7 +1005,7 @@ omp_max_vf (bool offload)
   for (const char *c = getenv ("OFFLOAD_TARGET_NAMES"); c;)
{
  if (startswith (c, "amdgcn"))
-   return ordered_max (64, omp_max_vf (false));
+   return ordered_max (poly_uint64 (64), omp_max_vf (false));
  else if ((c = strchr (c, ':')))
c++;
}


[gcc r15-4990] Darwin: Fix a narrowing warning.

2024-11-06 Iain D Sandoe via Gcc-cvs
https://gcc.gnu.org/g:a91d5c27cd2173a40cc170ee09330dd1e13403a5

commit r15-4990-ga91d5c27cd2173a40cc170ee09330dd1e13403a5
Author: Iain Sandoe 
Date:   Wed Nov 6 20:46:47 2024 +

Darwin: Fix a narrowing warning.

cdtor_record needs to have an unsigned entry for the position in order to
match with vec_safe_length.

gcc/ChangeLog:

* config/darwin.cc (cdtor_record): Make position unsigned.
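
A minimal sketch of the narrowing problem, assuming a brace-initialized record whose last field receives an unsigned length. `record` and `add_record` are hypothetical names; `std::vector` stands in for GCC's `vec`, and `size ()` for `vec_safe_length`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical miniature of cdtor_record; the rtx symbol is replaced by a
// string so the sketch needs no GCC headers.
struct record
{
  const char *symbol;
  int priority;
  unsigned position;  // unsigned, so it matches an unsigned element count
};

// Append a record whose position is the current (unsigned) length.
inline void add_record (std::vector<record> &v, const char *sym, int prio)
{
  unsigned len = static_cast<unsigned> (v.size ());
  // If `position` were int, the braces below would narrow unsigned to int,
  // which list-initialization diagnoses.
  record r = { sym, prio, len };
  v.push_back (r);
}
```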

Signed-off-by: Iain Sandoe 

Diff:
---
 gcc/config/darwin.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/darwin.cc b/gcc/config/darwin.cc
index ae821e320121..4e495fce82bb 100644
--- a/gcc/config/darwin.cc
+++ b/gcc/config/darwin.cc
@@ -90,7 +90,7 @@ along with GCC; see the file COPYING3.  If not see
 typedef struct GTY(()) cdtor_record {
   rtx symbol;
   int priority;/* [con/de]structor priority */
-  int position;/* original position */
+  unsigned position;   /* original position */
 } cdtor_record;
 
 static GTY(()) vec<cdtor_record, va_gc> *ctors = NULL;


[gcc(refs/users/meissner/heads/work182)] Add rs6000 architecture masks.

2024-11-06 Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:e2d17854ac19a47ab8de08907d3debb1dc585699

commit e2d17854ac19a47ab8de08907d3debb1dc585699
Author: Michael Meissner 
Date:   Wed Nov 6 15:23:31 2024 -0500

Add rs6000 architecture masks.

This patch begins the journey to move architecture bits that are not user ISA
options from rs6000_isa_flags to a new target variable rs6000_arch_flags.  The
intention is to remove switches that are currently ISA options but that the
user should not be using directly.  For example, we want users to use
-mcpu=power10 and not just -mpower10.
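
One way to generate such architecture masks from a single list is the X-macro pattern. The sketch below is illustrative only: the list contents and every `DEMO_*` name are hypothetical, and the real rs6000-arch.def entries are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical architecture list, standing in for a .def file.
#define DEMO_ARCH_LIST(X) \
  X (POWER4, "power4")    \
  X (POWER5, "power5")    \
  X (POWER6, "power6")    \
  X (POWER7, "power7")

// First expansion: one bit number per architecture.
enum demo_arch_bit
{
#define X(NAME, STR) DEMO_BIT_##NAME,
  DEMO_ARCH_LIST (X)
#undef X
  DEMO_BIT_MAX
};

// Second expansion: one mask constant per architecture.
#define X(NAME, STR) \
  constexpr std::uint64_t DEMO_ARCH_##NAME = std::uint64_t (1) << DEMO_BIT_##NAME;
DEMO_ARCH_LIST (X)
#undef X

// Third expansion: a table pairing each mask with its name.
struct demo_arch_entry
{
  std::uint64_t mask;
  const char *name;
};

constexpr demo_arch_entry demo_arch_table[] = {
#define X(NAME, STR) { DEMO_ARCH_##NAME, STR },
  DEMO_ARCH_LIST (X)
#undef X
};
```

Keeping the list in one place means the bit numbers, the masks, and the name table cannot drift apart when a new processor is added.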

This patch also changes the target_clones support to use an architecture mask
instead of isa bits.

This patch also switches the handling of .machine to use architecture masks if
they exist (power4 through power11).  All of the other PowerPCs will continue
to use the existing code for setting the .machine option.

I have built both big endian and little endian bootstrap compilers and there
were no regressions.

In addition, I constructed a test case that used every architecture define
(like _ARCH_PWR4, etc.) and I also looked at the .machine directive generated.
I ran this test for all supported combinations of -mcpu, big/little endian,
and 32/64 bit support.  Every single instance generated exactly the same code
with the patches installed compared to the compiler before installing the
patches.

Can I install this patch on the GCC 15 trunk?

2024-11-06  Michael Meissner  

gcc/

* config/rs6000/default64.h (TARGET_CPU_DEFAULT): Set default cpu name.
* config/rs6000/rs6000-arch.def: New file.
* config/rs6000/rs6000.cc (struct clone_map): Switch to using
architecture masks instead of ISA masks.
(rs6000_clone_map): Likewise.
(rs6000_print_isa_options): Add an architecture flags argument, change
all callers.
(get_arch_flag): New function.
(rs6000_debug_reg_global): Update rs6000_print_isa_options calls.
(rs6000_option_override_internal): Likewise.
(rs6000_machine_from_flags): Switch to using architecture masks instead
of ISA masks.
(struct rs6000_arch_mask): New structure.
(rs6000_arch_masks): New table of architecture masks and names.
(rs6000_function_specific_save): Save architecture flags.
(rs6000_function_specific_restore): Restore architecture flags.
(rs6000_function_specific_print): Update rs6000_print_isa_options calls.
(rs6000_print_options_internal): Add architecture flags options.
(rs6000_clone_priority): Switch to using architecture masks instead of
ISA masks.
(rs6000_can_inline_p): Don't allow inlining if the callee requires a
newer architecture than the caller.
* config/rs6000/rs6000.h: Use rs6000-arch.def to create the architecture
masks.
* config/rs6000/rs6000.opt (rs6000_arch_flags): New target variable.
(x_rs6000_arch_flags): New save/restore field for rs6000_arch_flags.
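
The inlining rule for rs6000_can_inline_p can be pictured as a subset test on architecture bits. This is a sketch under an assumption, not the code from the patch: it treats the flags as plain bit sets, and `demo_can_inline` and the `DEMO_*` constants are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical architecture bits; the real values live in the rs6000 port.
constexpr std::uint64_t DEMO_P7 = std::uint64_t (1) << 0;
constexpr std::uint64_t DEMO_P8 = std::uint64_t (1) << 1;
constexpr std::uint64_t DEMO_P9 = std::uint64_t (1) << 2;

// Inlining is allowed only if every architecture bit the callee needs is
// also present in the caller, i.e. the callee is not "newer".
constexpr bool demo_can_inline (std::uint64_t caller_arch,
                                std::uint64_t callee_arch)
{
  return (callee_arch & ~caller_arch) == 0;
}
```

For example, a caller built for the hypothetical `DEMO_P7 | DEMO_P8` set may inline a callee that needs only `DEMO_P7`, but not one that also needs `DEMO_P9`.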

Diff:
---
 gcc/config/rs6000/default64.h |  11 ++
 gcc/config/rs6000/rs6000-arch.def |  48 +
 gcc/config/rs6000/rs6000.cc   | 215 +++---
 gcc/config/rs6000/rs6000.h|  24 +
 gcc/config/rs6000/rs6000.opt  |   8 ++
 5 files changed, 270 insertions(+), 36 deletions(-)

diff --git a/gcc/config/rs6000/default64.h b/gcc/config/rs6000/default64.h
index 10e3dec78aca..afa6542e040c 100644
--- a/gcc/config/rs6000/default64.h
+++ b/gcc/config/rs6000/default64.h
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 #define RS6000_CPU(NAME, CPU, FLAGS)
 #include "rs6000-cpus.def"
 #undef RS6000_CPU
+#undef TARGET_CPU_DEFAULT
 
 #if (TARGET_DEFAULT & MASK_LITTLE_ENDIAN)
 #undef TARGET_DEFAULT
@@ -28,10 +29,20 @@ along with GCC; see the file COPYING3.  If not see
| MASK_LITTLE_ENDIAN)
 #undef ASM_DEFAULT_SPEC
 #define ASM_DEFAULT_SPEC "-mpower8"
+#define TARGET_CPU_DEFAULT "power8"
+
 #else
 #undef TARGET_DEFAULT
 #define TARGET_DEFAULT (OPTION_MASK_PPC_GFXOPT | OPTION_MASK_PPC_GPOPT \
| OPTION_MASK_MFCRF | MASK_POWERPC64 | MASK_64BIT)
 #undef ASM_DEFAULT_SPEC
 #define ASM_DEFAULT_SPEC "-mpower4"
+
+#if (TARGET_DEFAULT & MASK_POWERPC64)
+#define TARGET_CPU_DEFAULT "powerpc64"
+
+#else
+#define TARGET_CPU_DEFAULT "powerpc"
+#endif
+
 #endif
diff --git a/gcc/config/rs6000/rs6000-arch.def b/gcc/config/rs6000/rs6000-arch.def
new file mode 100644
index ..e5b6e9581331
--- /dev/null
+++ b/gcc/config/rs6000/rs6000-arch.def
@@ -0,0 +1,48 @@
+/* IBM RS/6000 CPU architecture features by processor type.
+   Copyright (C) 1991-2024 Free Software Foundation, Inc.
+   Contributed by Richard Kenner (ken...@vlsi1.ultra.nyu.edu)
+
+   T

[gcc(refs/users/meissner/heads/work182)] Update tests to work with architecture flags changes.

2024-11-06 Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:3457a6edb97bdb63ab4af860f82fb7853c54ceba

commit 3457a6edb97bdb63ab4af860f82fb7853c54ceba
Author: Michael Meissner 
Date:   Wed Nov 6 15:45:57 2024 -0500

Update tests to work with architecture flags changes.

Two tests used -mvsx to raise the processor level to at least power7.  These
tests were rewritten to add cpu=power7 support.

I have built both big endian and little endian bootstrap compilers and there
were no regressions.

In addition, I constructed a test case that used every architecture define
(like _ARCH_PWR4, etc.) and I also looked at the .machine directive generated.
I ran this test for all supported combinations of -mcpu, big/little endian,
and 32/64 bit support.  Every single instance generated exactly the same code
with the patches installed compared to the compiler before installing the
patches.

Can I install this patch on the GCC 15 trunk?

2024-11-06  Michael Meissner  

gcc/testsuite/

* gcc.target/powerpc/ppc-target-4.c: Rewrite the test to add cpu=power7
when we need to add VSX support.  Add test for adding cpu=power7 no-vsx
to generate only Altivec instructions.
* gcc.target/powerpc/pr115688.c: Add cpu=power7 when requesting VSX
instructions.
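
A minimal sketch of the pattern the tests now use, with the target string taken from the diff that follows. The `DEMO_P7_VSX` macro and `demo_add` are hypothetical; the guard is an assumption added so that the sketch still compiles on non-PowerPC hosts, where the attribute is simply dropped.

```cpp
#include <cassert>

// Apply the attribute only when building with GCC for PowerPC; elsewhere
// the macro expands to nothing so the sketch still compiles.
#if defined(__GNUC__) && !defined(__clang__) \
    && (defined(__powerpc__) || defined(__powerpc64__))
#define DEMO_P7_VSX __attribute__ ((target ("cpu=power7,vsx")))
#else
#define DEMO_P7_VSX
#endif

// Element-wise add.  With the attribute active, the cpu is named
// explicitly rather than relying on vsx alone to raise the processor level.
DEMO_P7_VSX
void demo_add (float *a, const float *b, const float *c, unsigned long n)
{
  for (unsigned long i = 0; i < n; i++)
    a[i] = b[i] + c[i];
}
```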

Diff:
---
 gcc/testsuite/gcc.target/powerpc/ppc-target-4.c | 38 +++--
 gcc/testsuite/gcc.target/powerpc/pr115688.c |  3 +-
 2 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-target-4.c b/gcc/testsuite/gcc.target/powerpc/ppc-target-4.c
index feef76db4618..5e2ecf34f249 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc-target-4.c
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-target-4.c
@@ -2,7 +2,7 @@
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_fprs } */
 /* { dg-options "-O2 -ffast-math -mdejagnu-cpu=power5 -mno-altivec -mabi=altivec -fno-unroll-loops" } */
-/* { dg-final { scan-assembler-times "vaddfp" 1 } } */
+/* { dg-final { scan-assembler-times "vaddfp" 2 } } */
 /* { dg-final { scan-assembler-times "xvaddsp" 1 } } */
 /* { dg-final { scan-assembler-times "fadds" 1 } } */
 
@@ -18,10 +18,6 @@
 #error "__VSX__ should not be defined."
 #endif
 
-#pragma GCC target("altivec,vsx")
-#include <altivec.h>
-#pragma GCC reset_options
-
 #pragma GCC push_options
 #pragma GCC target("altivec,no-vsx")
 
@@ -33,6 +29,7 @@
 #error "__VSX__ should not be defined."
 #endif
 
+/* Altivec build, generate vaddfp.  */
 void
 av_add (vector float *a, vector float *b, vector float *c)
 {
@@ -40,10 +37,11 @@ av_add (vector float *a, vector float *b, vector float *c)
   unsigned long n = SIZE / 4;
 
   for (i = 0; i < n; i++)
-a[i] = vec_add (b[i], c[i]);
+a[i] = b[i] + c[i];
 }
 
-#pragma GCC target("vsx")
+/* cpu=power7 must be used to enable VSX.  */
+#pragma GCC target("cpu=power7,vsx")
 
 #ifndef __ALTIVEC__
 #error "__ALTIVEC__ should be defined."
@@ -53,6 +51,7 @@ av_add (vector float *a, vector float *b, vector float *c)
 #error "__VSX__ should be defined."
 #endif
 
+/* VSX build on power7, generate xsaddsp.  */
 void
 vsx_add (vector float *a, vector float *b, vector float *c)
 {
@@ -60,11 +59,31 @@ vsx_add (vector float *a, vector float *b, vector float *c)
   unsigned long n = SIZE / 4;
 
   for (i = 0; i < n; i++)
-a[i] = vec_add (b[i], c[i]);
+a[i] = b[i] + c[i];
+}
+
+#pragma GCC target("cpu=power7,no-vsx")
+
+#ifndef __ALTIVEC__
+#error "__ALTIVEC__ should be defined."
+#endif
+
+#ifdef __VSX__
+#error "__VSX__ should not be defined."
+#endif
+
+/* Altivec build on power7 with no VSX, generate vaddfp.  */
+void
+av2_add (vector float *a, vector float *b, vector float *c)
+{
+  unsigned long i;
+  unsigned long n = SIZE / 4;
+
+  for (i = 0; i < n; i++)
+a[i] = b[i] + c[i];
 }
 
 #pragma GCC pop_options
-#pragma GCC target("no-vsx,no-altivec")
 
 #ifdef __ALTIVEC__
 #error "__ALTIVEC__ should not be defined."
@@ -74,6 +93,7 @@ vsx_add (vector float *a, vector float *b, vector float *c)
 #error "__VSX__ should not be defined."
 #endif
 
+/* Default power5 build, generate scalar fadds.  */
 void
 norm_add (float *a, float *b, float *c)
 {
diff --git a/gcc/testsuite/gcc.target/powerpc/pr115688.c b/gcc/testsuite/gcc.target/powerpc/pr115688.c
index 5222e66ef170..00c7c301436a 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr115688.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr115688.c
@@ -7,7 +7,8 @@
 
 /* Verify there is no ICE under 32 bit env.  */
 
-__attribute__((target("vsx")))
+/* cpu=power7 must be used to enable VSX.  */
+__attribute__((target("cpu=power7,vsx")))
 int test (void)
 {
   return 0;


[gcc(refs/users/meissner/heads/work182)] Use architecture flags for defining _ARCH_PWR macros.

2024-11-06 Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:2c3a02cfe93b9c6e412221762a6954c9be5a9ed3

commit 2c3a02cfe93b9c6e412221762a6954c9be5a9ed3
Author: Michael Meissner 
Date:   Wed Nov 6 15:26:32 2024 -0500

Use architecture flags for defining _ARCH_PWR macros.

For the newer architectures, this patch changes GCC to define the _ARCH_PWR
macros using the new architecture flags instead of relying on isa options like
-mpower10.

The -mpower8-internal, -mpower10, and -mpower11 options were removed.  The
-mpower11 option was removed completely, since it was just added in GCC 15.
The other two options were marked as WarnRemoved, and the various ISA bits
were removed.

TARGET_POWER8 and TARGET_POWER10 were re-defined to use the architecture bits
instead of the ISA bits.

There are other internal isa bits that aren't removed with this patch because
the built-in function support uses those bits.

I have built both big endian and little endian bootstrap compilers and there
were no regressions.

In addition, I constructed a test case that used every architecture define
(like _ARCH_PWR4, etc.) and I also looked at the .machine directive generated.
I ran this test for all supported combinations of -mcpu, big/little endian,
and 32/64 bit support.  Every single instance generated exactly the same code
with the patches installed compared to the compiler before installing the
patches.

Can I install this patch on the GCC 15 trunk?

2024-11-06  Michael Meissner  

gcc/

* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Add support to
use architecture flags instead of ISA flags for setting most of the
_ARCH_PWR* macros.
(rs6000_cpu_cpp_builtins): Update rs6000_target_modify_macros call.
* config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER): Remove
OPTION_MASK_POWER8.
(ISA_3_1_MASKS_SERVER): Remove OPTION_MASK_POWER10.
(POWER11_MASKS_SERVER): Remove OPTION_MASK_POWER11.
(POWERPC_MASKS): Remove OPTION_MASK_POWER8, OPTION_MASK_POWER10, and
OPTION_MASK_POWER11.
* config/rs6000/rs6000-protos.h (rs6000_target_modify_macros): Update
declaration.
(rs6000_target_modify_macros_ptr): Likewise.
* config/rs6000/rs6000.cc (rs6000_target_modify_macros_ptr): Likewise.
(rs6000_option_override_internal): Use architecture flags instead of ISA
flags.
(rs6000_opt_masks): Remove -mpower10 and -mpower11, which are no longer
in the ISA flags.
(rs6000_pragma_target_parse): Use architecture flags as well as ISA
flags.
* config/rs6000/rs6000.h (TARGET_POWER4): New macro.
(TARGET_POWER5): Likewise.
(TARGET_POWER5X): Likewise.
(TARGET_POWER6): Likewise.
(TARGET_POWER7): Likewise.
(TARGET_POWER8): Likewise.
(TARGET_POWER9): Likewise.
(TARGET_POWER10): Likewise.
(TARGET_POWER11): Likewise.
* config/rs6000/rs6000.opt (-mpower8-internal): Remove ISA flag bits.
(-mpower10): Likewise.
(-mpower11): Likewise.
Diff:
---
 gcc/config/rs6000/rs6000-c.cc | 27 +++
 gcc/config/rs6000/rs6000-cpus.def |  8 +---
 gcc/config/rs6000/rs6000-protos.h |  5 +++--
 gcc/config/rs6000/rs6000.cc   | 19 +++
 gcc/config/rs6000/rs6000.h| 20 
 gcc/config/rs6000/rs6000.opt  | 11 ++-
 6 files changed, 52 insertions(+), 38 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 04882c396bfe..c8f33289fa38 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -338,7 +338,8 @@ rs6000_define_or_undefine_macro (bool define_p, const char *name)
#pragma GCC target, we need to adjust the macros dynamically.  */
 
 void
-rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
+rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
+HOST_WIDE_INT arch_flags)
 {
   if (TARGET_DEBUG_BUILTIN || TARGET_DEBUG_TARGET)
 fprintf (stderr,
@@ -411,7 +412,7 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
summary of the flags associated with particular cpu
definitions.  */
 
-  /* rs6000_isa_flags based options.  */
+  /* rs6000_isa_flags and rs6000_arch_flags based options.  */
   rs6000_define_or_undefine_macro (define_p, "_ARCH_PPC");
   if ((flags & OPTION_MASK_PPC_GPOPT) != 0)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PPCSQ");
@@ -419,23 +420,25 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PPCGR");
   if ((flags & OPTION_MASK_POWERPC64) != 0)
 rs6000_define_or_undefine

[gcc(refs/users/meissner/heads/work182)] Change TARGET_CMPB to TARGET_POWER6

2024-11-06 Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:b024a7e3d1c320509f2e9aaac5ac33a3589c17c0

commit b024a7e3d1c320509f2e9aaac5ac33a3589c17c0
Author: Michael Meissner 
Date:   Wed Nov 6 15:38:07 2024 -0500

Change TARGET_CMPB to TARGET_POWER6

As part of the architecture flags patches, this patch changes the use of
TARGET_CMPB to TARGET_POWER6.  The CMPB instruction was added in power6 (ISA
2.05).

I have built both big endian and little endian bootstrap compilers and there
were no regressions.

In addition, I constructed a test case that used every architecture define
(like _ARCH_PWR4, etc.) and I also looked at the .machine directive generated.
I ran this test for all supported combinations of -mcpu, big/little endian,
and 32/64 bit support.  Every single instance generated exactly the same code
with the patches installed compared to the compiler before installing the
patches.

Can I install this patch on the GCC 15 trunk?

2024-11-06  Michael Meissner  

* config/rs6000/rs6000-builtin.cc (rs6000_builtin_is_supported): Use
TARGET_POWER6 instead of TARGET_CMPB.
* config/rs6000/rs6000.h (TARGET_FCFID): Merge tests for popcntb, cmpb,
and popcntd into a single test for TARGET_POWER5.
(TARGET_LFIWAX): Use TARGET_POWER6 instead of TARGET_CMPB.
* config/rs6000/rs6000.md (enabled attribute): Likewise.
(parity2_cmp): Likewise.
(cmpb): Likewise.
(copysign3): Likewise.
(copysign3_fcpsgn): Likewise.
(cmpstrnsi): Likewise.
(cmpstrsi): Likewise.
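
Merging the three tests in TARGET_FCFID into one relies on the architecture levels being cumulative. The sketch below models that assumption with hypothetical `DEMO_*` bits (not the real masks) and checks that the old three-way test and the merged test agree on every modelled level.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cumulative architecture bits: each newer level includes all
// older ones, which is the property the merged test relies on.
constexpr std::uint32_t DEMO_P5 = 1u << 0;
constexpr std::uint32_t DEMO_P6 = 1u << 1;
constexpr std::uint32_t DEMO_P7 = 1u << 2;

constexpr std::uint32_t demo_levels[] = {
  0,                            // older than power5
  DEMO_P5,                      // power5
  DEMO_P5 | DEMO_P6,            // power6
  DEMO_P5 | DEMO_P6 | DEMO_P7,  // power7
};

// Old-style test: any of the three per-level conditions.
constexpr bool demo_old_test (std::uint32_t f)
{
  return (f & DEMO_P5) != 0 || (f & DEMO_P6) != 0 || (f & DEMO_P7) != 0;
}

// Merged test: power5 or newer.
constexpr bool demo_new_test (std::uint32_t f)
{
  return (f & DEMO_P5) != 0;
}

// True when both tests agree on every modelled level.
inline bool demo_tests_agree ()
{
  for (std::uint32_t f : demo_levels)
    if (demo_old_test (f) != demo_new_test (f))
      return false;
  return true;
}
```

The equivalence would not hold if a newer bit could be set without the older ones, so the sketch only covers cumulative settings.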

Diff:
---
 gcc/config/rs6000/rs6000-builtin.cc |  4 ++--
 gcc/config/rs6000/rs6000.h  |  6 ++
 gcc/config/rs6000/rs6000.md | 16 
 3 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index 98a0545030cd..76421bd1de0b 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -157,9 +157,9 @@ rs6000_builtin_is_supported (enum rs6000_gen_builtins fncode)
 case ENB_P5:
   return TARGET_POWER5;
 case ENB_P6:
-  return TARGET_CMPB;
+  return TARGET_POWER6;
 case ENB_P6_64:
-  return TARGET_CMPB && TARGET_POWERPC64;
+  return TARGET_POWER6 && TARGET_POWERPC64;
 case ENB_P7:
   return TARGET_POPCNTD;
 case ENB_P7_64:
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 4500724d895c..d22693eb2bfb 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -448,13 +448,11 @@ extern int rs6000_vector_align[];
Enable 32-bit fcfid's on any of the switches for newer ISA machines.  */
 #define TARGET_FCFID   (TARGET_POWERPC64   \
 || TARGET_PPC_GPOPT/* 970/power4 */\
-|| TARGET_POPCNTB  /* ISA 2.02 */  \
-|| TARGET_CMPB /* ISA 2.05 */  \
-|| TARGET_POPCNTD) /* ISA 2.06 */
+|| TARGET_POWER5)  /* ISA 2.02 and above */ \
 
 #define TARGET_FCTIDZ  TARGET_FCFID
 #define TARGET_STFIWX  TARGET_PPC_GFXOPT
-#define TARGET_LFIWAX  TARGET_CMPB
+#define TARGET_LFIWAX  TARGET_POWER6
 #define TARGET_LFIWZX  TARGET_POPCNTD
 #define TARGET_FCFIDS  TARGET_POPCNTD
 #define TARGET_FCFIDU  TARGET_POPCNTD
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 7f9fe609a031..0c303087e944 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -383,7 +383,7 @@
  (const_int 1)
 
  (and (eq_attr "isa" "p6")
- (match_test "TARGET_CMPB"))
+ (match_test "TARGET_POWER6"))
  (const_int 1)
 
  (and (eq_attr "isa" "p7")
@@ -2544,7 +2544,7 @@
 (define_insn "parity2_cmpb"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
(unspec:GPR [(match_operand:GPR 1 "gpc_reg_operand" "r")] UNSPEC_PARITY))]
-  "TARGET_CMPB"
+  "TARGET_POWER6"
   "prty %0,%1"
   [(set_attr "type" "popcnt")])
 
@@ -2597,7 +2597,7 @@
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
(unspec:GPR [(match_operand:GPR 1 "gpc_reg_operand" "r")
 (match_operand:GPR 2 "gpc_reg_operand" "r")] UNSPEC_CMPB))]
-  "TARGET_CMPB"
+  "TARGET_POWER6"
   "cmpb %0,%1,%2"
   [(set_attr "type" "cmp")])
 
@@ -5401,7 +5401,7 @@
&& ((TARGET_PPC_GFXOPT
 && !HONOR_NANS (mode)
 && !HONOR_SIGNED_ZEROS (mode))
-   || TARGET_CMPB
+   || TARGET_POWER6
|| VECTOR_UNIT_VSX_P (mode))"
 {
   /* Middle-end canonicalizes -fabs (x) to copysign (x, -1),
@@ -5422,7 +5422,7 @@
   if (!gpc_reg_operand (operands[2], mode))
 operands[2] = copy_to_mode_reg (mode, operands[2]);
 
-  if (TARGET_CMPB || VECTOR_UNIT_VSX_P (mode))
+  if (TARGET_POWER6 || VECTOR_UNIT_VSX_P (mode))
 {
   emit_insn (gen_copysign3_fcpsgn (

[gcc(refs/users/meissner/heads/work182)] Change TARGET_POPCNTB to TARGET_POWER5

2024-11-06 Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:c8758525cb5eac88b2607a5fe4cc329d854014b5

commit c8758525cb5eac88b2607a5fe4cc329d854014b5
Author: Michael Meissner 
Date:   Wed Nov 6 15:31:47 2024 -0500

Change TARGET_POPCNTB to TARGET_POWER5

As part of the architecture flags patches, this patch changes the use of
TARGET_POPCNTB to TARGET_POWER5.  The POPCNTB instruction was added in ISA
2.02 (power5).

I have built both big endian and little endian bootstrap compilers and there
were no regressions.

In addition, I constructed a test case that used every architecture define
(like _ARCH_PWR4, etc.) and I also looked at the .machine directive generated.
I ran this test for all supported combinations of -mcpu, big/little endian,
and 32/64 bit support.  Every single instance generated exactly the same code
with the patches installed compared to the compiler before installing the
patches.

Can I install this patch on the GCC 15 trunk?

2024-11-06  Michael Meissner  

* config/rs6000/rs6000-builtin.cc (rs6000_builtin_is_supported): Use
TARGET_POWER5 instead of TARGET_POPCNTB.
* config/rs6000/rs6000.h (TARGET_EXTRA_BUILTINS): Use TARGET_POWER5
instead of TARGET_POPCNTB.  Eliminate TARGET_CMPB and TARGET_POPCNTD
tests since TARGET_POWER5 will always be true for those tests.
(TARGET_FRE): Use TARGET_POWER5 instead of TARGET_POPCNTB.
(TARGET_FRSQRTES): Likewise.
* config/rs6000/rs6000.md (enabled attribute): Likewise.
(popcount): Use TARGET_POWER5 instead of TARGET_POPCNTB.  Drop
test for TARGET_POPCNTD (i.e. power7), since TARGET_POPCNTB will always
be set if TARGET_POPCNTD is set.
(popcntb2): Use TARGET_POWER5 instead of TARGET_POPCNTB.
(parity2): Likewise.
(parity2_cmpb): Remove TARGET_POPCNTB test, since it will always
be true when TARGET_CMPB (i.e. power6) is set.
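
For background on why a per-byte count is enough to build a full population count, here is a generic bit-trick sketch. It is an illustration only: `demo_popcntb` is a software model of a per-byte count, and whether rs6000_emit_popcount uses exactly this sequence is not shown in this excerpt.

```cpp
#include <cassert>
#include <cstdint>

// Software model of a per-byte population count: each result byte holds
// the number of set bits in the matching source byte.
inline std::uint64_t demo_popcntb (std::uint64_t x)
{
  std::uint64_t r = 0;
  for (int byte = 0; byte < 8; byte++)
    {
      std::uint64_t b = (x >> (8 * byte)) & 0xff;
      std::uint64_t n = 0;
      for (; b != 0; b &= b - 1)  // clear the lowest set bit each round
        n++;
      r |= n << (8 * byte);
    }
  return r;
}

// Sum the eight byte counts: multiplying by 0x0101...01 accumulates them
// in the top byte (the total is at most 64, so no byte overflows), and the
// shift moves that byte down.
inline unsigned demo_popcount64 (std::uint64_t x)
{
  std::uint64_t sums = demo_popcntb (x) * UINT64_C (0x0101010101010101);
  return static_cast<unsigned> (sums >> 56);
}
```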

Diff:
---
 gcc/config/rs6000/rs6000-builtin.cc |  2 +-
 gcc/config/rs6000/rs6000.h  |  8 +++-
 gcc/config/rs6000/rs6000.md | 10 +-
 3 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index 9bdbae1ecf94..98a0545030cd 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -155,7 +155,7 @@ rs6000_builtin_is_supported (enum rs6000_gen_builtins fncode)
 case ENB_ALWAYS:
   return true;
 case ENB_P5:
-  return TARGET_POPCNTB;
+  return TARGET_POWER5;
 case ENB_P6:
   return TARGET_CMPB;
 case ENB_P6_64:
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 7ad8baca177a..4500724d895c 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -547,9 +547,7 @@ extern int rs6000_vector_align[];
 
 #define TARGET_EXTRA_BUILTINS  (TARGET_POWERPC64\
 || TARGET_PPC_GPOPT /* 970/power4 */\
-|| TARGET_POPCNTB   /* ISA 2.02 */  \
-|| TARGET_CMPB  /* ISA 2.05 */  \
-|| TARGET_POPCNTD   /* ISA 2.06 */  \
+|| TARGET_POWER5/* ISA 2.02 & above */ \
 || TARGET_ALTIVEC   \
 || TARGET_VSX   \
 || TARGET_HARD_FLOAT)
@@ -563,9 +561,9 @@ extern int rs6000_vector_align[];
 #define TARGET_FRES(TARGET_HARD_FLOAT && TARGET_PPC_GFXOPT)
 
 #define TARGET_FRE (TARGET_HARD_FLOAT \
-&& (TARGET_POPCNTB || VECTOR_UNIT_VSX_P (DFmode)))
+&& (TARGET_POWER5 || VECTOR_UNIT_VSX_P (DFmode)))
 
-#define TARGET_FRSQRTES(TARGET_HARD_FLOAT && TARGET_POPCNTB \
+#define TARGET_FRSQRTES(TARGET_HARD_FLOAT && TARGET_POWER5 \
 && TARGET_PPC_GFXOPT)
 
 #define TARGET_FRSQRTE (TARGET_HARD_FLOAT \
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 8eda2f7bb0d7..10d13bf812d2 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -379,7 +379,7 @@
  (const_int 1)
 
  (and (eq_attr "isa" "p5")
- (match_test "TARGET_POPCNTB"))
+ (match_test "TARGET_POWER5"))
  (const_int 1)
 
  (and (eq_attr "isa" "p6")
@@ -2510,7 +2510,7 @@
 (define_expand "popcount2"
   [(set (match_operand:GPR 0 "gpc_reg_operand")
(popcount:GPR (match_operand:GPR 1 "gpc_reg_operand")))]
-  "TARGET_POPCNTB || TARGET_POPCNTD"
+  "TARGET_POWER5"
 {
   rs6000_emit_popcount (operands[0], operands[1]);
   DONE;
@@ -2520,7 +2520,7 @@
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
(unspec:GPR [(match_operand:GPR 1 "gpc_reg_operand" "r")]
UNSP

[gcc(refs/users/meissner/heads/work182)] Change TARGET_POPCNTD to TARGET_POWER7

2024-11-06 Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:ef7b8de40f10280c16f59b184fa32016f744a82c

commit ef7b8de40f10280c16f59b184fa32016f744a82c
Author: Michael Meissner 
Date:   Wed Nov 6 15:41:54 2024 -0500

Change TARGET_POPCNTD to TARGET_POWER7

As part of the architecture flags patches, this patch changes the use of
TARGET_POPCNTD to TARGET_POWER7.  The POPCNTD instruction was added in power7
(ISA 2.06).

I have built both big endian and little endian bootstrap compilers and there
were no regressions.

In addition, I constructed a test case that used every architecture define
(like _ARCH_PWR4, etc.) and I also looked at the .machine directive generated.
I ran this test for all supported combinations of -mcpu, big/little endian,
and 32/64 bit support.  Every single instance generated exactly the same code
with the patches installed compared to the compiler before installing the
patches.

Can I install this patch on the GCC 15 trunk?

2024-11-06  Michael Meissner  

* config/rs6000/dfp.md (floatdidd2): Change TARGET_POPCNTD to
TARGET_POWER7.
* config/rs6000/rs6000-builtin.cc (rs6000_builtin_is_supported):
Likewise.
* config/rs6000/rs6000-string.cc (expand_block_compare_gpr): Likewise.
* config/rs6000/rs6000.cc (rs6000_hard_regno_mode_ok_uncached):
Likewise.
(rs6000_rtx_costs): Likewise.
(rs6000_emit_popcount): Likewise.
* config/rs6000/rs6000.h (TARGET_LDBRX): Likewise.
(TARGET_LFIWZX): Likewise.
(TARGET_FCFIDS): Likewise.
(TARGET_FCFIDU): Likewise.
(TARGET_FCFIDUS): Likewise.
(TARGET_FCTIDUZ): Likewise.
(TARGET_FCTIWUZ): Likewise.
(CTZ_DEFINED_VALUE_AT_ZERO): Likewise.
* config/rs6000/rs6000.md (enabled attribute): Likewise.
(ctz2): Likewise.
(popcntd2): Likewise.
(lrintsi2): Likewise.
(lrintsi): Likewise.
(lrintsi_di): Likewise.
(cmpmemsi): Likewise.
(bpermd_"): Likewise.
(addg6s): Likewise.
(cdtbcd): Likewise.
(cbcdtd): Likewise.
(div_): Likewise.
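
The list above includes a ctz pattern and CTZ_DEFINED_VALUE_AT_ZERO among the items keyed on power7. One general reason a population-count instruction helps with counting trailing zeros is the identity sketched below. This is a generic illustration with hypothetical helper names, not a claim about the exact expansion in rs6000.md.

```cpp
#include <cassert>
#include <cstdint>

// Portable population count (clears one set bit per iteration).
inline unsigned demo_popcount (std::uint64_t x)
{
  unsigned n = 0;
  for (; x != 0; x &= x - 1)
    n++;
  return n;
}

// Count trailing zeros: ~x & (x - 1) keeps exactly the bits below the
// lowest set bit, so its population count is the answer.  For x == 0 the
// mask is all ones and the result is 64, a well-defined value at zero.
inline unsigned demo_ctz64 (std::uint64_t x)
{
  return demo_popcount (~x & (x - 1));
}
```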

Diff:
---
 gcc/config/rs6000/dfp.md|  2 +-
 gcc/config/rs6000/rs6000-builtin.cc |  4 ++--
 gcc/config/rs6000/rs6000-string.cc  |  4 ++--
 gcc/config/rs6000/rs6000.cc |  6 +++---
 gcc/config/rs6000/rs6000.h  | 16 
 gcc/config/rs6000/rs6000.md | 24 
 6 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
index fa9d7dd45dd3..b8189390d410 100644
--- a/gcc/config/rs6000/dfp.md
+++ b/gcc/config/rs6000/dfp.md
@@ -214,7 +214,7 @@
 (define_insn "floatdidd2"
   [(set (match_operand:DD 0 "gpc_reg_operand" "=d")
(float:DD (match_operand:DI 1 "gpc_reg_operand" "d")))]
-  "TARGET_DFP && TARGET_POPCNTD"
+  "TARGET_DFP && TARGET_POWER7"
   "dcffix %0,%1"
   [(set_attr "type" "dfp")])
 
diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index 76421bd1de0b..dae43b672ea7 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -161,9 +161,9 @@ rs6000_builtin_is_supported (enum rs6000_gen_builtins fncode)
 case ENB_P6_64:
   return TARGET_POWER6 && TARGET_POWERPC64;
 case ENB_P7:
-  return TARGET_POPCNTD;
+  return TARGET_POWER7;
 case ENB_P7_64:
-  return TARGET_POPCNTD && TARGET_POWERPC64;
+  return TARGET_POWER7 && TARGET_POWERPC64;
 case ENB_P8:
   return TARGET_POWER8;
 case ENB_P8V:
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index 55b4133b1a34..3674c4bd9847 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -1948,8 +1948,8 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, unsigned int base_align,
 bool
 expand_block_compare (rtx operands[])
 {
-  /* TARGET_POPCNTD is already guarded at expand cmpmemsi.  */
-  gcc_assert (TARGET_POPCNTD);
+  /* TARGET_POWER7 is already guarded at expand cmpmemsi.  */
+  gcc_assert (TARGET_POWER7);
 
   /* For P8, this case is complicated to handle because the subtract
  with carry instructions do not generate the 64-bit carry and so
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index dd51d75c4957..7d20e757c7c4 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1999,7 +1999,7 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
  if(GET_MODE_SIZE (mode) == UNITS_PER_FP_WORD)
return 1;
 
- if (TARGET_POPCNTD && mode == SImode)
+ if (TARGET_POWER7 && mode == SImode)
return 1;
 
  if (TARGET_P9_VECTOR && (mode == QImode || mode == HImode))
@@ -22473,7 +22473,7 @@ rs6000_rtx_

[gcc(refs/users/meissner/heads/work182)] Add -mcpu=future tuning support.

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:ab781c69a65339bb2f5317a666fe386c532215b5

commit ab781c69a65339bb2f5317a666fe386c532215b5
Author: Michael Meissner 
Date:   Wed Nov 6 15:50:25 2024 -0500

Add -mcpu=future tuning support.

This patch makes -mtune=future use the same tuning decisions as -mtune=power11.

2024-11-06  Michael Meissner  

gcc/

* config/rs6000/power10.md (all reservations): Add future as an
alternative to power10 and power11.
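
As a rough illustration of what the change means, each reservation's `(eq_attr "cpu" "power10,power11")` test becomes `(eq_attr "cpu" "power10,power11,future")`, and such a test is true when the cpu attribute equals any one of the comma-separated values. The sketch below models that membership check in plain C. The function `cpu_in_list` is hypothetical and only mimics the matching rule; GCC generates its own attribute tests from the machine description.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if CPU equals one of the comma-separated names in LIST,
   mimicking how (eq_attr "cpu" "power10,power11,future") matches.
   Hypothetical helper, for illustration only.  */
bool
cpu_in_list (const char *cpu, const char *list)
{
  size_t len = strlen (cpu);

  while (*list != '\0')
    {
      const char *comma = strchr (list, ',');
      size_t item_len = comma ? (size_t) (comma - list) : strlen (list);

      /* Compare whole items only, so "power1" does not match "power10".  */
      if (item_len == len && strncmp (list, cpu, len) == 0)
        return true;

      if (comma == NULL)
        break;
      list = comma + 1;
    }

  return false;
}
```

With the old list, `cpu_in_list ("future", "power10,power11")` is false, so no power10 reservation would apply when tuning for future. With the new list the same reservations match.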

Diff:
---
 gcc/config/rs6000/power10.md | 144 +--
 1 file changed, 72 insertions(+), 72 deletions(-)

diff --git a/gcc/config/rs6000/power10.md b/gcc/config/rs6000/power10.md
index 2310c4603457..e42b057dc45b 100644
--- a/gcc/config/rs6000/power10.md
+++ b/gcc/config/rs6000/power10.md
@@ -1,4 +1,4 @@
-;; Scheduling description for the IBM Power10 and Power11 processors.
+;; Scheduling description for the IBM Power10, Power11, and Future processors.
 ;; Copyright (C) 2020-2024 Free Software Foundation, Inc.
 ;;
 ;; Contributed by Pat Haugen (pthau...@us.ibm.com).
@@ -97,12 +97,12 @@
(eq_attr "update" "no")
(eq_attr "size" "!128")
(eq_attr "prefixed" "no")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_any_power10,LU_power10")
 
 (define_insn_reservation "power10-fused-load" 4
   (and (eq_attr "type" "fused_load_cmpi,fused_addis_load,fused_load_load")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,LU_power10")
 
 (define_insn_reservation "power10-prefixed-load" 4
@@ -110,13 +110,13 @@
(eq_attr "update" "no")
(eq_attr "size" "!128")
(eq_attr "prefixed" "yes")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,LU_power10")
 
 (define_insn_reservation "power10-load-update" 4
   (and (eq_attr "type" "load")
(eq_attr "update" "yes")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,LU_power10+SXU_power10")
 
 (define_insn_reservation "power10-fpload-double" 4
@@ -124,7 +124,7 @@
(eq_attr "update" "no")
(eq_attr "size" "64")
(eq_attr "prefixed" "no")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_any_power10,LU_power10")
 
 (define_insn_reservation "power10-prefixed-fpload-double" 4
@@ -132,14 +132,14 @@
(eq_attr "update" "no")
(eq_attr "size" "64")
(eq_attr "prefixed" "yes")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,LU_power10")
 
 (define_insn_reservation "power10-fpload-update-double" 4
   (and (eq_attr "type" "fpload")
(eq_attr "update" "yes")
(eq_attr "size" "64")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,LU_power10+SXU_power10")
 
 ; SFmode loads are cracked and have additional 3 cycles over DFmode
@@ -148,27 +148,27 @@
   (and (eq_attr "type" "fpload")
(eq_attr "update" "no")
(eq_attr "size" "32")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,LU_power10")
 
 (define_insn_reservation "power10-fpload-update-single" 7
   (and (eq_attr "type" "fpload")
(eq_attr "update" "yes")
(eq_attr "size" "32")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,LU_power10+SXU_power10")
 
 (define_insn_reservation "power10-vecload" 4
   (and (eq_attr "type" "vecload")
(eq_attr "size" "!256")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_any_power10,LU_power10")
 
 ; lxvp
 (define_insn_reservation "power10-vecload-pair" 4
   (and (eq_attr "type" "vecload")
(eq_attr "size" "256")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,LU_power10+SXU_power10")
 
 ; Store Unit
@@ -178,12 +178,12 @@
(eq_attr "prefixed" "no")
(eq_attr "size" "!128")
(eq_attr "size" "!256")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_any_power10,STU_power10")
 
 (define_insn_reservation "power10-fused-store" 0
   (and (eq_attr "type" "fused_store_store")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,STU_power10")
 
 (define_insn_reservation "power10-prefixed-store" 0
@@ -191,52 +191,52 @@
(eq_attr "prefixed" "yes")
(eq_attr "size" "!128")
(eq_attr "size" "!256")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,STU_power10")
 
 ; Update forms have 2 cycle latency for update

[gcc(refs/users/meissner/heads/work182)] Change TARGET_MODULO to TARGET_POWER9

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:c49f8b7ec3a057c0b78bf60f237a27d6d2496ba8

commit c49f8b7ec3a057c0b78bf60f237a27d6d2496ba8
Author: Michael Meissner 
Date:   Wed Nov 6 15:44:41 2024 -0500

Change TARGET_MODULO to TARGET_POWER9

As part of the architecture flags patches, this patch changes the use of
TARGET_MODULO to TARGET_POWER9.  The modulo instructions were added in power9
(ISA 3.0).  Note that I did not change the uses of TARGET_MODULO where it
explicitly generates different code depending on whether the machine has a
modulo instruction.

I have built both big endian and little endian bootstrap compilers and there
were no regressions.

In addition, I constructed a test case that used every architecture define
(like _ARCH_PWR4, etc.) and I also looked at the .machine directive generated.
I ran this test for all supported combinations of -mcpu, big/little endian, and
32/64 bit support.  Every single instance generated exactly the same code with
the patches installed compared to the compiler before installing the patches.
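
The test case itself is not included in this message. The following is only a sketch of what such a probe could look like, assuming the usual `_ARCH_PWR*` predefined macros; the function name `report_arch_macros` is made up for illustration.

```c
#include <stdio.h>

/* Print every _ARCH_PWR* macro this compiler predefines and return how
   many were found.  On a non-PowerPC host none are defined and the
   result is 0.  Illustrative sketch only, not the author's test.  */
int
report_arch_macros (void)
{
  int count = 0;

#ifdef _ARCH_PWR4
  puts ("_ARCH_PWR4");
  count++;
#endif
#ifdef _ARCH_PWR5
  puts ("_ARCH_PWR5");
  count++;
#endif
#ifdef _ARCH_PWR5X
  puts ("_ARCH_PWR5X");
  count++;
#endif
#ifdef _ARCH_PWR6
  puts ("_ARCH_PWR6");
  count++;
#endif
#ifdef _ARCH_PWR7
  puts ("_ARCH_PWR7");
  count++;
#endif
#ifdef _ARCH_PWR8
  puts ("_ARCH_PWR8");
  count++;
#endif
#ifdef _ARCH_PWR9
  puts ("_ARCH_PWR9");
  count++;
#endif
#ifdef _ARCH_PWR10
  puts ("_ARCH_PWR10");
  count++;
#endif
#ifdef _ARCH_PWR11
  puts ("_ARCH_PWR11");
  count++;
#endif

  return count;
}
```

Building a probe like this with each -mcpu setting, before and after the patches, and comparing the output is one way to check that the set of defined macros did not change.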

Can I install this patch on the GCC 15 trunk?

2024-11-06  Michael Meissner  

* config/rs6000/rs6000-builtin.cc (rs6000_builtin_is_supported): Use
TARGET_POWER9 instead of TARGET_MODULO.
* config/rs6000/rs6000.h (TARGET_CTZ): Likewise.
(TARGET_EXTSWSLI): Likewise.
(TARGET_MADDLD): Likewise.
* config/rs6000/rs6000.md (enabled attribute): Likewise.

Diff:
---
 gcc/config/rs6000/rs6000-builtin.cc | 4 ++--
 gcc/config/rs6000/rs6000.h  | 6 +++---
 gcc/config/rs6000/rs6000.md | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index dae43b672ea7..b6093b3cb64c 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -169,9 +169,9 @@ rs6000_builtin_is_supported (enum rs6000_gen_builtins fncode)
 case ENB_P8V:
   return TARGET_P8_VECTOR;
 case ENB_P9:
-  return TARGET_MODULO;
+  return TARGET_POWER9;
 case ENB_P9_64:
-  return TARGET_MODULO && TARGET_POWERPC64;
+  return TARGET_POWER9 && TARGET_POWERPC64;
 case ENB_P9V:
   return TARGET_P9_VECTOR;
 case ENB_P10:
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3a03c32f..89ca1bad80f3 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -461,9 +461,9 @@ extern int rs6000_vector_align[];
 #define TARGET_FCTIWUZ TARGET_POWER7
 /* Only powerpc64 and powerpc476 support fctid.  */
 #define TARGET_FCTID   (TARGET_POWERPC64 || rs6000_cpu == PROCESSOR_PPC476)
-#define TARGET_CTZ TARGET_MODULO
-#define TARGET_EXTSWSLI(TARGET_MODULO && TARGET_POWERPC64)
-#define TARGET_MADDLD  TARGET_MODULO
+#define TARGET_CTZ TARGET_POWER9
+#define TARGET_EXTSWSLI(TARGET_POWER9 && TARGET_POWERPC64)
+#define TARGET_MADDLD  TARGET_POWER9
 
 /* TARGET_DIRECT_MOVE is redundant to TARGET_P8_VECTOR, so alias it to that.  */
 #define TARGET_DIRECT_MOVE TARGET_P8_VECTOR
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index bff898a4eff1..fc0d454e9a42 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -403,7 +403,7 @@
  (const_int 1)
 
  (and (eq_attr "isa" "p9")
- (match_test "TARGET_MODULO"))
+ (match_test "TARGET_POWER9"))
  (const_int 1)
 
  (and (eq_attr "isa" "p9v")


[gcc(refs/users/meissner/heads/work182)] Update ChangeLog.*

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:892e05e5f778ff7017741a0c2eed5133e7527a65

commit 892e05e5f778ff7017741a0c2eed5133e7527a65
Author: Michael Meissner 
Date:   Wed Nov 6 15:53:49 2024 -0500

Update ChangeLog.*

Diff:
---
 gcc/ChangeLog.meissner | 416 +
 1 file changed, 416 insertions(+)

diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index a04bd0a46f88..62db91641260 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,3 +1,419 @@
+ Branch work182, patch #40 was reverted
+
+Add -mcpu=future tuning support.
+
+This patch makes -mtune=future use the same tuning decision as -mtune=power11.
+
+2024-11-06  Michael Meissner  
+
+gcc/
+
+   * config/rs6000/power10.md (all reservations): Add future as an
+   alterntive to power10 and power11.
+
+ Branch work182, patch #39 was reverted
+
+Add support for -mcpu=future
+
+This patch adds the support that can be used in developing GCC support for
+future PowerPC processors.
+
+2024-11-06  Michael Meissner  
+
+   * config.gcc (powerpc*-*-*): Add support for --with-cpu=future.
+   * config/rs6000/aix71.h (ASM_CPU_SPEC): Add support for -mcpu=future.
+   * config/rs6000/aix72.h (ASM_CPU_SPEC): Likewise.
+   * config/rs6000/aix73.h (ASM_CPU_SPEC): Likewise.
+   * config/rs6000/driver-rs6000.cc (asm_names): Likewise.
+   * config/rs6000/rs6000-arch.def: Add future cpu.
+   * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): If
+   -mcpu=future, define _ARCH_FUTURE.
+   * config/rs6000/rs6000-cpus.def (FUTURE_MASKS_SERVER): New macro.
+   (future cpu): Define.
+   * config/rs6000/rs6000-opts.h (enum processor_type): Add
+   PROCESSOR_FUTURE.
+   * config/rs6000/rs6000-tables.opt: Regenerate.
+   * config/rs6000/rs6000.cc (power10_cost): Update comment.
+   (get_arch_flags): Add support for future processor.
+   (rs6000_option_override_internal): Likewise.
+   (rs6000_machine_from_flags): Likewise.
+   (rs6000_reassociation_width): Likewise.
+   (rs6000_adjust_cost): Likewise.
+   (rs6000_issue_rate): Likewise.
+   (rs6000_sched_reorder): Likewise.
+   (rs6000_sched_reorder2): Likewise.
+   (rs6000_register_move_cost): Likewise.
+   * config/rs6000/rs6000.h (ASM_CPU_SPEC): Likewise.
+   (TARGET_POWER11): New macro.
+   * config/rs6000/rs6000.md (cpu attribute): Likewise.
+
+ Branch work182, patch #38 was reverted
+
+Update tests to work with architecture flags changes.
+
+Two tests used -mvsx to raise the processor level to at least power7.  These
+tests were rewritten to add cpu=power7 support.
+
+I have built both big endian and little endian bootstrap compilers and there
+were no regressions.
+
+In addition, I constructed a test case that used every archiecture define (like
+_ARCH_PWR4, etc.) and I also looked at the .machine directive generated.  I ran
+this test for all supported combinations of -mcpu, big/little endian, and 32/64
+bit support.  Every single instance generated exactly the same code with the
+patches installed compared to the compiler before installing the patches.
+
+Can I install this patch on the GCC 15 trunk?
+
+2024-11-06  Michael Meissner  
+
+gcc/testsuite/
+
+   * gcc.target/powerpc/ppc-target-4.c: Rewrite the test to add cpu=power7
+   when we need to add VSX support.  Add test for adding cpu=power7 no-vsx
+   to generate only Altivec instructions.
+   * gcc.target/powerpc/pr115688.c: Add cpu=power7 when requesting VSX
+   instructions.
+
+ Branch work182, patch #37 was reverted
+
+Change TARGET_MODULO to TARGET_POWER9
+
+As part of the architecture flags patches, this patch changes the use of
+TARGET_MODULO to TARGET_POWER9.  The modulo instructions were added in power9 (ISA
+3.0).  Note, I did not change the uses of TARGET_MODULO where it was explicitly
+generating different code if the machine had a modulo instruction.
+
+I have built both big endian and little endian bootstrap compilers and there
+were no regressions.
+
+In addition, I constructed a test case that used every archiecture define (like
+_ARCH_PWR4, etc.) and I also looked at the .machine directive generated.  I ran
+this test for all supported combinations of -mcpu, big/little endian, and 32/64
+bit support.  Every single instance generated exactly the same code with the
+patches installed compared to the compiler before installing the patches.
+
+Can I install this patch on the GCC 15 trunk?
+
+2024-11-06  Michael Meissner  
+
+   * config/rs6000/rs6000-builtin.cc (rs6000_builtin_is_supported): Use
+   TARGET_POWER9 instead of TARGET_MODULO.
+   * config/rs6000/rs6000.h (TARGET_CTZ): Likewise.
+   (TARGET_EXTSWSLI): Likewise.
+   (TARGET_MADDLD): Likewise.
+   * config/rs6000/rs6000

[gcc(refs/users/meissner/heads/work182)] Add support for -mcpu=future

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:11869df9a8300867a8e46a7de199cb703bc2410c

commit 11869df9a8300867a8e46a7de199cb703bc2410c
Author: Michael Meissner 
Date:   Wed Nov 6 15:48:43 2024 -0500

Add support for -mcpu=future

This patch adds the support that can be used in developing GCC support for
future PowerPC processors.

2024-11-06  Michael Meissner  

* config.gcc (powerpc*-*-*): Add support for --with-cpu=future.
* config/rs6000/aix71.h (ASM_CPU_SPEC): Add support for -mcpu=future.
* config/rs6000/aix72.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/aix73.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/driver-rs6000.cc (asm_names): Likewise.
* config/rs6000/rs6000-arch.def: Add future cpu.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): If
-mcpu=future, define _ARCH_FUTURE.
* config/rs6000/rs6000-cpus.def (FUTURE_MASKS_SERVER): New macro.
(future cpu): Define.
* config/rs6000/rs6000-opts.h (enum processor_type): Add
PROCESSOR_FUTURE.
* config/rs6000/rs6000-tables.opt: Regenerate.
* config/rs6000/rs6000.cc (power10_cost): Update comment.
(get_arch_flags): Add support for future processor.
(rs6000_option_override_internal): Likewise.
(rs6000_machine_from_flags): Likewise.
(rs6000_reassociation_width): Likewise.
(rs6000_adjust_cost): Likewise.
(rs6000_issue_rate): Likewise.
(rs6000_sched_reorder): Likewise.
(rs6000_sched_reorder2): Likewise.
(rs6000_register_move_cost): Likewise.
* config/rs6000/rs6000.h (ASM_CPU_SPEC): Likewise.
(TARGET_POWER11): New macro.
* config/rs6000/rs6000.md (cpu attribute): Likewise.

Diff:
---
 gcc/config.gcc  |  4 ++--
 gcc/config/rs6000/aix71.h   |  1 +
 gcc/config/rs6000/aix72.h   |  1 +
 gcc/config/rs6000/aix73.h   |  1 +
 gcc/config/rs6000/driver-rs6000.cc  |  2 ++
 gcc/config/rs6000/rs6000-arch.def   |  1 +
 gcc/config/rs6000/rs6000-c.cc   |  2 ++
 gcc/config/rs6000/rs6000-cpus.def   |  3 +++
 gcc/config/rs6000/rs6000-opts.h |  1 +
 gcc/config/rs6000/rs6000-tables.opt | 11 +++
 gcc/config/rs6000/rs6000.cc | 34 ++
 gcc/config/rs6000/rs6000.h  |  2 ++
 gcc/config/rs6000/rs6000.md |  2 +-
 13 files changed, 50 insertions(+), 15 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index fd8482287228..d552d01b4390 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -539,7 +539,7 @@ powerpc*-*-*)
extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h si2vmx.h"
extra_headers="${extra_headers} amo.h"
case x$with_cpu in
-   xpowerpc64|xdefault64|x6[23]0|x970|xG5|xpower[3456789]|xpower1[01]|xpower6x|xrs64a|xcell|xa2|xe500mc64|xe5500|xe6500)
+   xpowerpc64|xdefault64|x6[23]0|x970|xG5|xpower[3456789]|xpower1[01]|xpower6x|xrs64a|xcell|xa2|xe500mc64|xe5500|xe6500|xfuture)
cpu_is_64bit=yes
;;
esac
@@ -5647,7 +5647,7 @@ case "${target}" in
tm_defines="${tm_defines} CONFIG_PPC405CR"
eval "with_$which=405"
;;
-   "" | common | native \
+   "" | common | native | future \
| power[3456789] | power1[01] | power5+ | power6x \
| powerpc | powerpc64 | powerpc64le \
| rs64 \
diff --git a/gcc/config/rs6000/aix71.h b/gcc/config/rs6000/aix71.h
index 41037b3852d7..570ddcc451db 100644
--- a/gcc/config/rs6000/aix71.h
+++ b/gcc/config/rs6000/aix71.h
@@ -79,6 +79,7 @@ do {  \
 #undef ASM_CPU_SPEC
 #define ASM_CPU_SPEC \
 "%{mcpu=native: %(asm_cpu_native); \
+  mcpu=future: -mfuture; \
   mcpu=power11: -mpwr11; \
   mcpu=power10: -mpwr10; \
   mcpu=power9: -mpwr9; \
diff --git a/gcc/config/rs6000/aix72.h b/gcc/config/rs6000/aix72.h
index fe59f8319b48..242ca94bd065 100644
--- a/gcc/config/rs6000/aix72.h
+++ b/gcc/config/rs6000/aix72.h
@@ -79,6 +79,7 @@ do {  \
 #undef ASM_CPU_SPEC
 #define ASM_CPU_SPEC \
 "%{mcpu=native: %(asm_cpu_native); \
+  mcpu=future: -mfuture; \
   mcpu=power11: -mpwr11; \
   mcpu=power10: -mpwr10; \
   mcpu=power9: -mpwr9; \
diff --git a/gcc/config/rs6000/aix73.h b/gcc/config/rs6000/aix73.h
index 1318b0b3662d..2bd6b4bb3c4f 100644
--- a/gcc/config/rs6000/aix73.h
+++ b/gcc/config/rs6000/aix73.h
@@ -79,6 +79,7 @@ do {  \
 #undef ASM_CPU_SPEC
 #define ASM_CPU_SPEC \
 "%{mcpu=native: %(asm_cpu_native); \
+  mcpu=future: -mfuture; \
   mcpu=power11: -m

[gcc(refs/users/meissner/heads/work182)] Change TARGET_FPRND to TARGET_POWER5X

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:bc1106eb6337f561461161bf65d8c66a80190285

commit bc1106eb6337f561461161bf65d8c66a80190285
Author: Michael Meissner 
Date:   Wed Nov 6 15:34:41 2024 -0500

Change TARGET_FPRND to TARGET_POWER5X

As part of the architecture flags patches, this patch changes the use of
TARGET_FPRND to TARGET_POWER5X.  The FPRND instruction was added in power5+.
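
The general idea of replacing a per-feature test with an architecture-level test can be sketched as follows. This is not code from the patch. The real architecture masks are defined by the architecture flags patches and are not shown in this message, so every name below (`ARCH_BIT_*`, `have_fprnd`, `arch_flags_for_power6`) is hypothetical, as is the assumption that each level carries the bits of the earlier levels.

```c
#include <stdbool.h>

/* Hypothetical architecture-level bits, for illustration only.  */
enum arch_bits
{
  ARCH_BIT_POWER5  = 1u << 0,
  ARCH_BIT_POWER5X = 1u << 1,
  ARCH_BIT_POWER6  = 1u << 2
};

/* Instead of testing a dedicated feature bit, ask whether the selected
   architecture level is at least power5+.  */
bool
have_fprnd (unsigned int arch_flags)
{
  return (arch_flags & ARCH_BIT_POWER5X) != 0;
}

/* Assumption: a later level includes the bits of all earlier levels.  */
unsigned int
arch_flags_for_power6 (void)
{
  return ARCH_BIT_POWER5 | ARCH_BIT_POWER5X | ARCH_BIT_POWER6;
}
```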

I have built both big endian and little endian bootstrap compilers and there
were no regressions.

In addition, I constructed a test case that used every architecture define
(like _ARCH_PWR4, etc.) and I also looked at the .machine directive generated.
I ran this test for all supported combinations of -mcpu, big/little endian, and
32/64 bit support.  Every single instance generated exactly the same code with
the patches installed compared to the compiler before installing the patches.

Can I install this patch on the GCC 15 trunk?

2024-11-06  Michael Meissner  

* config/rs6000/rs6000.cc (report_architecture_mismatch): Use
TARGET_POWER5X instead of TARGET_FPRND.
* config/rs6000/rs6000.md (fmod3): Use TARGET_POWER5X instead of
TARGET_FPRND.
(remainder3): Likewise.
(fctiwuz_): Likewise.
(btrunc2): Likewise.
(ceil2): Likewise.
(floor2): Likewise.
(round): Likewise.

Diff:
---
 gcc/config/rs6000/rs6000.cc |  2 +-
 gcc/config/rs6000/rs6000.md | 14 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index a944ffde28a6..dd51d75c4957 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -25428,7 +25428,7 @@ report_architecture_mismatch (void)
 rs6000_isa_flags |= (ISA_2_5_MASKS_SERVER & ~ignore_masks);
   else if (TARGET_CMPB)
 rs6000_isa_flags |= (ISA_2_5_MASKS_EMBEDDED & ~ignore_masks);
-  else if (TARGET_FPRND)
+  else if (TARGET_POWER5X)
 rs6000_isa_flags |= (ISA_2_4_MASKS & ~ignore_masks);
   else if (TARGET_POPCNTB)
 rs6000_isa_flags |= (ISA_2_2_MASKS & ~ignore_masks);
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 10d13bf812d2..7f9fe609a031 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -5171,7 +5171,7 @@
(use (match_operand:SFDF 1 "gpc_reg_operand"))
(use (match_operand:SFDF 2 "gpc_reg_operand"))]
   "TARGET_HARD_FLOAT
-   && TARGET_FPRND
+   && TARGET_POWER5X
&& flag_unsafe_math_optimizations"
 {
   rtx div = gen_reg_rtx (mode);
@@ -5189,7 +5189,7 @@
(use (match_operand:SFDF 1 "gpc_reg_operand"))
(use (match_operand:SFDF 2 "gpc_reg_operand"))]
   "TARGET_HARD_FLOAT
-   && TARGET_FPRND
+   && TARGET_POWER5X
&& flag_unsafe_math_optimizations"
 {
   rtx div = gen_reg_rtx (mode);
@@ -6687,7 +6687,7 @@
 (define_insn "*friz"
   [(set (match_operand:DF 0 "gpc_reg_operand" "=d,wa")
(float:DF (fix:DI (match_operand:DF 1 "gpc_reg_operand" "d,wa"]
-  "TARGET_HARD_FLOAT && TARGET_FPRND
+  "TARGET_HARD_FLOAT && TARGET_POWER5X
&& flag_unsafe_math_optimizations && !flag_trapping_math && TARGET_FRIZ"
   "@
friz %0,%1
@@ -6815,7 +6815,7 @@
   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa")
(unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" "d,wa")]
 UNSPEC_FRIZ))]
-  "TARGET_HARD_FLOAT && TARGET_FPRND"
+  "TARGET_HARD_FLOAT && TARGET_POWER5X"
   "@
friz %0,%1
xsrdpiz %x0,%x1"
@@ -6825,7 +6825,7 @@
   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa")
(unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" "d,wa")]
 UNSPEC_FRIP))]
-  "TARGET_HARD_FLOAT && TARGET_FPRND"
+  "TARGET_HARD_FLOAT && TARGET_POWER5X"
   "@
frip %0,%1
xsrdpip %x0,%x1"
@@ -6835,7 +6835,7 @@
   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa")
(unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" "d,wa")]
 UNSPEC_FRIM))]
-  "TARGET_HARD_FLOAT && TARGET_FPRND"
+  "TARGET_HARD_FLOAT && TARGET_POWER5X"
   "@
frim %0,%1
xsrdpim %x0,%x1"
@@ -6846,7 +6846,7 @@
   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=")
(unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" "")]
 UNSPEC_FRIN))]
-  "TARGET_HARD_FLOAT && TARGET_FPRND"
+  "TARGET_HARD_FLOAT && TARGET_POWER5X"
   "frin %0,%1"
   [(set_attr "type" "fp")])


[gcc/meissner/heads/work182-bugs] (20 commits) Merge commit 'refs/users/meissner/heads/work182-bugs' of gi

2024-11-06 Thread Michael Meissner via Gcc-cvs
The branch 'meissner/heads/work182-bugs' was updated to point to:

 28ac3fbb5f0d... Merge commit 'refs/users/meissner/heads/work182-bugs' of gi

It previously pointed to:

 2db27caa81c2... Update ChangeLog.*

Diff:

Summary of changes (added commits):
---

  28ac3fb... Merge commit 'refs/users/meissner/heads/work182-bugs' of gi
  d261945... Update ChangeLog.*
  8cb1bce... PR 99293: Optimize splat of a V2DF/V2DI extract with consta
  d350c11... Add ChangeLog.bugs and update REVISION.
  892e05e... Update ChangeLog.* (*)
  ab781c6... Add -mcpu=future tuning support. (*)
  11869df... Add support for -mcpu=future (*)
  3457a6e... Update tests to work with architecture flags changes. (*)
  c49f8b7... Change TARGET_MODULO to TARGET_POWER9 (*)
  ef7b8de... Change TARGET_POPCNTD to TARGET_POWER7 (*)
  b024a7e... Change TARGET_CMPB to TARGET_POWER6 (*)
  bc1106e... Change TARGET_FPRND to TARGET_POWER5X (*)
  c875852... Change TARGET_POPCNTB to TARGET_POWER5 (*)
  1b99261... Do not allow -mvsx to boost processor to power7. (*)
  2c3a02c... Use architecture flags for defining _ARCH_PWR macros. (*)
  e2d1785... Add rs6000 architecture masks. (*)
  27f73de... Revert changes (*)
  b5d3ebf... Add rs6000 architecture masks. (*)
  132c8a7... Add rs6000 architecture masks. (*)
  848be25... Revert changes (*)

(*) This commit already exists in another branch.
Because the reference `refs/users/meissner/heads/work182-bugs' matches
your hooks.email-new-commits-only configuration,
no separate email is sent for this commit.


[gcc(refs/users/meissner/heads/work182-bugs)] Add ChangeLog.bugs and update REVISION.

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:d350c1191a696b11a8eb30fc78c36f0bd8191adc

commit d350c1191a696b11a8eb30fc78c36f0bd8191adc
Author: Michael Meissner 
Date:   Tue Oct 22 15:34:50 2024 -0400

Add ChangeLog.bugs and update REVISION.

2024-10-22  Michael Meissner  

gcc/

* ChangeLog.bugs: New file for branch.
* REVISION: Update.

Diff:
---
 gcc/ChangeLog.bugs | 5 +
 gcc/REVISION   | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.bugs b/gcc/ChangeLog.bugs
new file mode 100644
index ..df519ccc2fc8
--- /dev/null
+++ b/gcc/ChangeLog.bugs
@@ -0,0 +1,5 @@
+ Branch work182-bugs, baseline 
+
+2024-10-22   Michael Meissner  
+
+   Clone branch
diff --git a/gcc/REVISION b/gcc/REVISION
index 5aaca2bd398a..3a97386db4d2 100644
--- a/gcc/REVISION
+++ b/gcc/REVISION
@@ -1 +1 @@
-work182 branch
+work182-bugs branch


[gcc/meissner/heads/work182-dmf] (30 commits) Merge commit 'refs/users/meissner/heads/work182-dmf' of git

2024-11-06 Thread Michael Meissner via Gcc-cvs
The branch 'meissner/heads/work182-dmf' was updated to point to:

 e2b2ecbb0d9b... Merge commit 'refs/users/meissner/heads/work182-dmf' of git

It previously pointed to:

 663944a2b488... Revert changes

Diff:

Summary of changes (added commits):
---

  e2b2ecb... Merge commit 'refs/users/meissner/heads/work182-dmf' of git
  03a0a55... Revert changes
  72dd4a7... Update ChangeLog.*
  cac1d9d... RFC2677-Add xvrlw support.
  360c047... RFC2686-Add paddis support.
  cbd2455... RFC2655-Add saturating subtract built-ins.
  d4ca240... RFC2656-Support load/store vector with right length.
  734d503... RFC2653-PowerPC: Add support for 1,024 bit DMR registers.
  24421eb... RFC2653-Add dense math test for new instruction names.
  d340b89... RFC2653-PowerPC: Switch to dense math names for all MMA ope
  f4a8690... RFC2653-Add support for dense math registers.
  d8e9a73... RFC2653-Add wD constraint.
  52b2990... Use vector pair load/store for memcpy with -mcpu=future
  787e71a... Add ChangeLog.dmf and update REVISION.
  892e05e... Update ChangeLog.* (*)
  ab781c6... Add -mcpu=future tuning support. (*)
  11869df... Add support for -mcpu=future (*)
  3457a6e... Update tests to work with architecture flags changes. (*)
  c49f8b7... Change TARGET_MODULO to TARGET_POWER9 (*)
  ef7b8de... Change TARGET_POPCNTD to TARGET_POWER7 (*)
  b024a7e... Change TARGET_CMPB to TARGET_POWER6 (*)
  bc1106e... Change TARGET_FPRND to TARGET_POWER5X (*)
  c875852... Change TARGET_POPCNTB to TARGET_POWER5 (*)
  1b99261... Do not allow -mvsx to boost processor to power7. (*)
  2c3a02c... Use architecture flags for defining _ARCH_PWR macros. (*)
  e2d1785... Add rs6000 architecture masks. (*)
  27f73de... Revert changes (*)
  b5d3ebf... Add rs6000 architecture masks. (*)
  132c8a7... Add rs6000 architecture masks. (*)
  848be25... Revert changes (*)

(*) This commit already exists in another branch.
Because the reference `refs/users/meissner/heads/work182-dmf' matches
your hooks.email-new-commits-only configuration,
no separate email is sent for this commit.


[gcc(refs/users/meissner/heads/work182-bugs)] Merge commit 'refs/users/meissner/heads/work182-bugs' of git+ssh://gcc.gnu.org/git/gcc into me/work1

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:28ac3fbb5f0d2e9a72bf3666d4523b2da53852ac

commit 28ac3fbb5f0d2e9a72bf3666d4523b2da53852ac
Merge: d261945e4782 2db27caa81c2
Author: Michael Meissner 
Date:   Wed Nov 6 16:01:53 2024 -0500

Merge commit 'refs/users/meissner/heads/work182-bugs' of git+ssh://gcc.gnu.org/git/gcc into me/work182-bugs

Diff:


[gcc(refs/users/meissner/heads/work182-bugs)] PR 99293: Optimize splat of a V2DF/V2DI extract with constant element

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:8cb1bce5427a0e3999f5bb03621a20dbadb23f69

commit 8cb1bce5427a0e3999f5bb03621a20dbadb23f69
Author: Michael Meissner 
Date:   Tue Oct 22 16:31:04 2024 -0400

PR 99293: Optimize splat of a V2DF/V2DI extract with constant element

We had optimizations for splat of a vector extract for the other vector
types, but we missed having one for V2DI and V2DF.  This patch adds a
combiner insn to do this optimization.

In looking at the source, we had similar optimizations for V4SI and V4SF
extract and splats, but we missed doing V2DI/V2DF.

Without the patch for the code:

vector long long splat_dup_l_0 (vector long long v)
{
  return __builtin_vec_splats (__builtin_vec_extract (v, 0));
}

the compiler generates (on a little endian power9):

splat_dup_l_0:
mfvsrld 9,34
mtvsrdd 34,9,9
blr

Now it generates:

splat_dup_l_0:
xxpermdi 34,34,34,3
blr
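
The immediate 3 in the xxpermdi above comes from the element number and the byte order. The sketch below is a portable model of that selection, mirroring the C fragment in the vsx.md hunk of this commit. The function name `xxpermdi_splat_selector` and the `bytes_big_endian` parameter (standing in for BYTES_BIG_ENDIAN) are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Return the xxpermdi immediate used to splat doubleword ELEMENT (0 or 1)
   of a V2DI/V2DF vector.  Illustrative model only; the real logic lives
   in the insn's output code.  */
int
xxpermdi_splat_selector (int element, bool bytes_big_endian)
{
  assert (element == 0 || element == 1);

  /* Element numbering is reversed relative to the register's doubleword
     numbering on little endian.  */
  int which_word = element;
  if (!bytes_big_endian)
    which_word = 1 - which_word;

  /* 0 selects doubleword 0 from both inputs, 3 selects doubleword 1.  */
  return which_word ? 3 : 0;
}
```

For element 0 on little endian this yields 3, which matches the `xxpermdi 34,34,34,3` shown above.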

2024-10-22  Michael Meissner  

gcc/

* config/rs6000/vsx.md (vsx_splat_extract_): New insn.

gcc/testsuite/

* gcc.target/powerpc/builtins-1.c: Adjust insn count.
* gcc.target/powerpc/pr99293.c: New test.

Diff:
---
 gcc/config/rs6000/vsx.md  | 18 ++
 gcc/testsuite/gcc.target/powerpc/builtins-1.c |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr99293.c| 22 ++
 3 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index b2fc39acf4e8..73f20a86e56a 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4796,6 +4796,24 @@
   "lxvdsx %x0,%y1"
   [(set_attr "type" "vecload")])
 
+;; Optimize SPLAT of an extract from a V2DF/V2DI vector with a constant element
+(define_insn "*vsx_splat_extract_"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+   (vec_duplicate:VSX_D
+(vec_select:
+ (match_operand:VSX_D 1 "vsx_register_operand" "wa")
+ (parallel [(match_operand 2 "const_0_to_1_operand" "n")]]
+  "VECTOR_MEM_VSX_P (mode)"
+{
+  int which_word = INTVAL (operands[2]);
+  if (!BYTES_BIG_ENDIAN)
+which_word = 1 - which_word;
+
+  operands[3] = GEN_INT (which_word ? 3 : 0);
+  return "xxpermdi %x0,%x1,%x1,%3";
+}
+  [(set_attr "type" "vecperm")])
+
 ;; V4SI splat support
 (define_insn "vsx_splat_v4si"
   [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa")
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1.c b/gcc/testsuite/gcc.target/powerpc/builtins-1.c
index 8410a5fd4319..4e7e5384675f 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-1.c
@@ -1035,4 +1035,4 @@ foo156 (vector unsigned short usa)
 /* { dg-final { scan-assembler-times {\mvmrglb\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mvmrgew\M} 4 } } */
 /* { dg-final { scan-assembler-times {\mvsplth|xxsplth\M} 4 } } */
-/* { dg-final { scan-assembler-times {\mxxpermdi\M} 44 } } */
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 42 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr99293.c b/gcc/testsuite/gcc.target/powerpc/pr99293.c
new file mode 100644
index ..20adc1f27f65
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr99293.c
@@ -0,0 +1,22 @@
+/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+/* Test for PR 99263, which wants to do:
+   __builtin_vec_splats (__builtin_vec_extract (v, n))
+
+   where v is a V2DF or V2DI vector and n is either 0 or 1.  Previously the
+   compiler would do a direct move to the GPR registers to select the item and a
+   direct move from the GPR registers to do the splat.  */
+
+vector long long splat_dup_l_0 (vector long long v)
+{
+  return __builtin_vec_splats (__builtin_vec_extract (v, 0));
+}
+
+vector long long splat_dup_l_1 (vector long long v)
+{
+  return __builtin_vec_splats (__builtin_vec_extract (v, 1));
+}
+
+/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */
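
The selector arithmetic in the new insn can be modelled in plain C. The sketch below is illustrative only: `xxpermdi_selector`, `xxpermdi_model`, and `struct vsx_reg` are hypothetical names, not GCC functions, and the model assumes the ISA's big-endian doubleword numbering, in which doubleword 0 is the high-order half of the register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit VSX register: dw[0] is the high-order
   doubleword and dw[1] the low-order one (ISA big-endian numbering).  */
struct vsx_reg { uint64_t dw[2]; };

/* Mirror of the insn's C fragment: turn the vec_extract element number
   into the xxpermdi immediate that splats that element.  */
static int
xxpermdi_selector (int element, bool bytes_big_endian)
{
  int which_word = element;
  if (!bytes_big_endian)
    which_word = 1 - which_word;
  return which_word ? 3 : 0;
}

/* Model of xxpermdi XT,XA,XB,DM: the high bit of DM picks a doubleword of
   XA for result doubleword 0, the low bit picks a doubleword of XB for
   result doubleword 1.  */
static struct vsx_reg
xxpermdi_model (struct vsx_reg a, struct vsx_reg b, int dm)
{
  struct vsx_reg t;
  t.dw[0] = a.dw[(dm >> 1) & 1];
  t.dw[1] = b.dw[dm & 1];
  return t;
}
```

With both source operands set to the same register, selector 0 copies doubleword 0 into both halves and selector 3 copies doubleword 1 into both halves. On little endian, element 0 lives in doubleword 1, so the selector for `__builtin_vec_extract (v, 0)` comes out as 3.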


[gcc(refs/users/meissner/heads/work182-bugs)] Update ChangeLog.*

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:d261945e478258a73d10f5ba2af999547ec7

commit d261945e478258a73d10f5ba2af999547ec7
Author: Michael Meissner 
Date:   Tue Oct 22 16:32:40 2024 -0400

Update ChangeLog.*

Diff:
---
 gcc/ChangeLog.bugs | 51 +++
 1 file changed, 51 insertions(+)

diff --git a/gcc/ChangeLog.bugs b/gcc/ChangeLog.bugs
index df519ccc2fc8..83afad3d0ad8 100644
--- a/gcc/ChangeLog.bugs
+++ b/gcc/ChangeLog.bugs
@@ -1,5 +1,56 @@
+ Branch work182-bugs, patch #200 
+
+PR 99293: Optimize splat of a V2DF/V2DI extract with constant element
+
+We had optimizations for splat of a vector extract for the other vector
+types, but we missed having one for V2DI and V2DF.  This patch adds a
+combiner insn to do this optimization.
+
+In looking at the source, we had similar optimizations for V4SI and V4SF
+extract and splats, but we missed doing V2DI/V2DF.
+
+Without the patch for the code:
+
+   vector long long splat_dup_l_0 (vector long long v)
+   {
+ return __builtin_vec_splats (__builtin_vec_extract (v, 0));
+   }
+
+the compiler generates (on a little endian power9):
+
+   splat_dup_l_0:
+   mfvsrld 9,34
+   mtvsrdd 34,9,9
+   blr
+
+Now it generates:
+
+   splat_dup_l_0:
+   xxpermdi 34,34,34,3
+   blr
+
+2024-10-14  Michael Meissner  
+
+gcc/
+
+   * config/rs6000/vsx.md (vsx_splat_extract_): New insn.
+
+gcc/testsuite/
+
+   * gcc.target/powerpc/builtins-1.c: Adjust insn count.
+   * gcc.target/powerpc/pr99293.c: New test.
+
  Branch work182-bugs, baseline 
 
+Add ChangeLog.bugs and update REVISION.
+
+2024-10-14  Michael Meissner  
+
+gcc/
+
+   * ChangeLog.bugs: New file for branch.
+   * REVISION: Update.
+
 2024-10-22   Michael Meissner  
 
Clone branch


[gcc(refs/users/meissner/heads/work182-dmf)] RFC2686-Add paddis support.

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:360c0471348a4f8b8a8b866c704823567d62bb58

commit 360c0471348a4f8b8a8b866c704823567d62bb58
Author: Michael Meissner 
Date:   Tue Oct 22 16:24:17 2024 -0400

RFC2686-Add paddis support.

2024-10-22  Michael Meissner  

gcc/

* config/rs6000/constraints.md (eU): New constraint.
(eV): Likewise.
* config/rs6000/predicates.md (paddis_operand): New predicate.
(paddis_paddi_operand): Likewise.
(add_operand): Add paddis support.
* config/rs6000/rs6000.cc (num_insns_constant_gpr): Add paddis support.
(num_insns_constant_multi): Likewise.
(print_operand): Add %B for paddis support.
* config/rs6000/rs6000.h (TARGET_PADDIS): New macro.
(SIGNED_INTEGER_32BIT_P): Likewise.
* config/rs6000/rs6000.md (isa attribute): Add paddis support.
(enabled attribute): Likewise.
(add3): Likewise.
(adddi3 splitter): New splitter for paddis.
(movdi_internal64): Add paddis support.
(movdi splitter): New splitter for paddis.

gcc/testsuite/

* gcc.target/powerpc/prefixed-addis.c: New test.

Diff:
---
 gcc/config/rs6000/constraints.md  | 10 +++
 gcc/config/rs6000/predicates.md   | 52 +++-
 gcc/config/rs6000/rs6000.cc   | 25 ++
 gcc/config/rs6000/rs6000.h|  4 +
 gcc/config/rs6000/rs6000.md   | 96 ---
 gcc/testsuite/gcc.target/powerpc/prefixed-addis.c | 24 ++
 6 files changed, 197 insertions(+), 14 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index 277a30a82458..4d8d21fd6bbb 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -222,6 +222,16 @@
   "An IEEE 128-bit constant that can be loaded into VSX registers."
   (match_operand 0 "easy_vector_constant_ieee128"))
 
+(define_constraint "eU"
+  "@internal integer constant that can be loaded with paddis"
+  (and (match_code "const_int")
+   (match_operand 0 "paddis_operand")))
+
+(define_constraint "eV"
+  "@internal integer constant that can be loaded with paddis + paddi"
+  (and (match_code "const_int")
+   (match_operand 0 "paddis_paddi_operand")))
+
 ;; Floating-point constraints.  These two are defined so that insn
 ;; length attributes can be calculated exactly.
 
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 2797c3cf619b..f8e7df5e7f5b 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -369,6 +369,53 @@
   return SIGNED_INTEGER_34BIT_P (INTVAL (op));
 })
 
+;; Return 1 if op is a 64-bit constant that uses the paddis instruction
+(define_predicate "paddis_operand"
+  (match_code "const_int")
+{
+  if (!TARGET_PADDIS && TARGET_POWERPC64)
+return 0;
+
+  /* If addi, addis, or paddi can handle the number, don't return true.  */
+  HOST_WIDE_INT value = INTVAL (op);
+  if (SIGNED_INTEGER_34BIT_P (value))
+return false;
+
+  /* If the number is too large for paddis, return false.  */
+  if (!SIGNED_INTEGER_32BIT_P (value >> 32))
+return false;
+
+  /* If the bottom 32-bits are non-zero, paddis can't handle it.  */
+  if ((value & HOST_WIDE_INT_C(0x)) != 0)
+return false;
+
+  return true;
+})
+
+;; Return 1 if op is a 64-bit constant that needs the paddis instruction and an
+;; addi/addis/paddi instruction combination.
+(define_predicate "paddis_paddi_operand"
+  (match_code "const_int")
+{
+  if (!TARGET_PADDIS && TARGET_POWERPC64)
+return 0;
+
+  /* If addi, addis, or paddi can handle the number, don't return true.  */
+  HOST_WIDE_INT value = INTVAL (op);
+  if (SIGNED_INTEGER_34BIT_P (value))
+return false;
+
+  /* If the number is too large for paddis, return false.  */
+  if (!SIGNED_INTEGER_32BIT_P (value >> 32))
+return false;
+
+  /* If the bottom 32-bits are zero, we can use paddis alone to handle it.  */
+  if ((value & HOST_WIDE_INT_C(0x)) == 0)
+return false;
+
+  return true;
+})
+
 ;; Return 1 if op is a register that is not special.
 ;; Disallow (SUBREG:SF (REG:SI)) and (SUBREG:SI (REG:SF)) on VSX systems where
 ;; you need to be careful in moving a SFmode to SImode and vice versa due to
@@ -1113,7 +1160,10 @@
   (if_then_else (match_code "const_int")
 (match_test "satisfies_constraint_I (op)
 || satisfies_constraint_L (op)
-|| satisfies_constraint_eI (op)")
+|| satisfies_constraint_eI (op)
+|| satisfies_constraint_eU (op)
+|| satisfies_constraint_eV (op)")
+
 (match_operand 0 "gpc_reg_operand")))
 
 ;; Return 1 if the operand is either a non-special register, or 0, or -1.
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index a660101f51a5..8ba87a3a2814 100644
--- a/gcc/con
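
The two predicates above split 64-bit constants into three groups: those the existing add instructions already handle, those a single paddis could handle, and those that would need paddis plus a second add. A minimal standalone sketch of that classification follows. It is not the GCC code: `fits_signed_bits`, `classify_add_constant`, and the enum names are hypothetical, the target checks are omitted, and the low-half mask `0xffffffff` is an assumption taken from the "bottom 32-bits" comments, since the literal is cut off in the diff as archived.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result categories for a 64-bit add constant.  */
enum add_constant_kind
{
  ADD_CONST_EXISTING,        /* fits in a signed 34-bit immediate */
  ADD_CONST_PADDIS,          /* upper half only: one paddis */
  ADD_CONST_PADDIS_PLUS_LOW, /* paddis plus a second add for the low half */
  ADD_CONST_TOO_LARGE        /* upper half does not fit in 32 bits */
};

/* True if VALUE is representable as a signed integer of BITS bits
   (BITS must be between 1 and 63).  */
static bool
fits_signed_bits (int64_t value, unsigned bits)
{
  int64_t limit = (int64_t) 1 << (bits - 1);
  return value >= -limit && value <= limit - 1;
}

static enum add_constant_kind
classify_add_constant (int64_t value)
{
  if (fits_signed_bits (value, 34))
    return ADD_CONST_EXISTING;

  /* Mirrors the 32-bit check on the upper half.  This assumes an
     arithmetic right shift of negative values; with a 64-bit VALUE the
     check cannot fail, so the last enum value is never returned here.  */
  if (!fits_signed_bits (value >> 32, 32))
    return ADD_CONST_TOO_LARGE;

  /* Assumed mask: the bottom 32 bits of the constant.  */
  if ((value & INT64_C (0xffffffff)) == 0)
    return ADD_CONST_PADDIS;

  return ADD_CONST_PADDIS_PLUS_LOW;
}
```

For example, `classify_add_constant (INT64_C (0x200000000))` falls just outside the signed 34-bit range with a zero low half, so it lands in the paddis-only group, while `INT64_C (0x200000001)` lands in the two-instruction group.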

[gcc(refs/users/meissner/heads/work182-dmf)] RFC2653-PowerPC: Add support for 1,024 bit DMR registers.

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:734d503889ad5c84f355d26513500025e420cc68

commit 734d503889ad5c84f355d26513500025e420cc68
Author: Michael Meissner 
Date:   Tue Oct 22 16:20:44 2024 -0400

RFC2653-PowerPC: Add support for 1,024 bit DMR registers.

This patch is a preliminary patch to add the full 1,024 bit dense math registers
(DMRs) for -mcpu=future.  The MMA 512-bit accumulators map onto the top of the
DMR register.

This patch only adds the new 1,024 bit register support.  It does not add
support for any instructions that need 1,024 bit registers instead of 512 bit
registers.

I used the new mode 'TDOmode' to be the opaque mode used for 1,024 bit
registers.  The 'wD' constraint added in previous patches is used for these
registers.  I added support to do load and store of DMRs via the VSX registers,
since there are no load/store dense math instructions.  I added the new keyword
'__dmr' to create 1,024 bit types that can be loaded into DMRs.  At present, I
don't have aliases for __dmr512 and __dmr1024 that we've discussed internally.

The patches have been tested on both little and big endian systems.  Can I check
it into the master branch?

2024-10-22   Michael Meissner  

gcc/

* config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec.
(UNSPEC_DM_INSERT512_LOWER): Likewise.
(UNSPEC_DM_EXTRACT512): Likewise.
(UNSPEC_DMR_RELOAD_FROM_MEMORY): Likewise.
(UNSPEC_DMR_RELOAD_TO_MEMORY): Likewise.
(movtdo): New define_expand and define_insn_and_split to implement 1,024
bit DMR registers.
(movtdo_insert512_upper): New insn.
(movtdo_insert512_lower): Likewise.
(movtdo_extract512): Likewise.
(reload_dmr_from_memory): Likewise.
(reload_dmr_to_memory): Likewise.
* config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMR
support.
(rs6000_init_builtins): Add support for __dmr keyword.
* config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support
for TDOmode.
(rs6000_function_arg): Likewise.
* config/rs6000/rs6000-modes.def (TDOmode): New mode.
* config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add
support for TDOmode.
(rs6000_hard_regno_mode_ok_uncached): Likewise.
(rs6000_hard_regno_mode_ok): Likewise.
(rs6000_modes_tieable_p): Likewise.
(rs6000_debug_reg_global): Likewise.
(rs6000_setup_reg_addr_masks): Likewise.
(rs6000_init_hard_regno_mode_ok): Add support for TDOmode.  Setup reload
hooks for DMR mode.
(reg_offset_addressing_ok_p): Add support for TDOmode.
(rs6000_emit_move): Likewise.
(rs6000_secondary_reload_simple_move): Likewise.
(rs6000_preferred_reload_class): Likewise.
(rs6000_secondary_reload_class): Likewise.
(rs6000_mangle_type): Add mangling for __dmr type.
(rs6000_dmr_register_move_cost): Add support for TDOmode.
(rs6000_split_multireg_move): Likewise.
(rs6000_invalid_conversion): Likewise.
* config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode.
(enum rs6000_builtin_type_index): Add DMR type nodes.
(dmr_type_node): Likewise.
(ptr_dmr_type_node): Likewise.

gcc/testsuite/

* gcc.target/powerpc/dm-1024bit.c: New test.

Diff:
---
 gcc/config/rs6000/mma.md  | 154 ++
 gcc/config/rs6000/rs6000-builtin.cc   |  17 +++
 gcc/config/rs6000/rs6000-call.cc  |  10 +-
 gcc/config/rs6000/rs6000-modes.def|   4 +
 gcc/config/rs6000/rs6000.cc   | 101 -
 gcc/config/rs6000/rs6000.h|   6 +-
 gcc/testsuite/gcc.target/powerpc/dm-1024bit.c |  63 +++
 7 files changed, 321 insertions(+), 34 deletions(-)

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 2e04eb653fa6..8461499e1c3d 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -92,6 +92,11 @@
UNSPEC_MMA_XXMFACC
UNSPEC_MMA_XXMTACC
UNSPEC_MMA_DMSETDMRZ
+   UNSPEC_DM_INSERT512_UPPER
+   UNSPEC_DM_INSERT512_LOWER
+   UNSPEC_DM_EXTRACT512
+   UNSPEC_DMR_RELOAD_FROM_MEMORY
+   UNSPEC_DMR_RELOAD_TO_MEMORY
   ])
 
 (define_c_enum "unspecv"
@@ -793,3 +798,152 @@
 }
   [(set_attr "type" "mma")
(set_attr "prefixed" "yes")])
+
+;; TDOmode (__dmr keyword for 1,024 bit registers).
+(define_expand "movtdo"
+  [(set (match_operand:TDO 0 "nonimmediate_operand")
+   (match_operand:TDO 1 "input_operand"))]
+  "TARGET_MMA_DENSE_MATH"
+{
+  rs6000_emit_move (operands[0], operands[1], TDOmode);
+  DONE;
+})
+
+(define_insn_and_split "*movtdo"
+  [(set (match_operand:TDO

[gcc(refs/users/meissner/heads/work182-dmf)] RFC2677-Add xvrlw support.

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:cac1d9d66ab682d06523c462e2cb67d50e6958c9

commit cac1d9d66ab682d06523c462e2cb67d50e6958c9
Author: Michael Meissner 
Date:   Tue Oct 22 16:25:54 2024 -0400

RFC2677-Add xvrlw support.

2024-10-22  Michael Meissner  

gcc/

* config/rs6000/altivec.md (xvrlw): New insn.
* config/rs6000/rs6000.h (TARGET_XVRLW): New macro.

gcc/testsuite/

* gcc.target/powerpc/vector-rotate-left.c: New test.

Diff:
---
 gcc/config/rs6000/altivec.md   | 14 +
 gcc/config/rs6000/rs6000.h |  3 ++
 .../gcc.target/powerpc/vector-rotate-left.c| 34 ++
 3 files changed, 51 insertions(+)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 00dad4b91f1c..a875dc9a4ec1 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1983,6 +1983,20 @@
 }
   [(set_attr "type" "vecperm")])
 
+;; -mcpu=future adds a vector rotate left word variant.  There is no vector
+;; byte/half-word/double-word/quad-word rotate left.  This insn occurs before
+;; altivec_vrl and will match for -mcpu=future, while other cpus will
+;; match the generic insn.
+(define_insn "*xvrlw"
+  [(set (match_operand:V4SI 0 "register_operand" "=v,wa")
+   (rotate:V4SI (match_operand:V4SI 1 "register_operand" "v,wa")
+(match_operand:V4SI 2 "register_operand" "v,wa")))]
+  "TARGET_XVRLW"
+  "@
+   vrlw %0,%1,%2
+   xvrlw %x0,%x1,%x2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "altivec_vrl"
   [(set (match_operand:VI2 0 "register_operand" "=v")
 (rotate:VI2 (match_operand:VI2 1 "register_operand" "v")
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 6a3fbc1e0fe5..c4d8e52a28a6 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -590,6 +590,9 @@ extern int rs6000_vector_align[];
 /* Whether we have PADDIS support.  */
 #define TARGET_PADDIS  TARGET_FUTURE
 
+/* Whether we have XVRLW support.  */
+#define TARGET_XVRLW   TARGET_FUTURE
+
 /* Whether the various reciprocal divide/square root estimate instructions
exist, and whether we should automatically generate code for the instruction
by default.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-rotate-left.c b/gcc/testsuite/gcc.target/powerpc/vector-rotate-left.c
new file mode 100644
index ..5a5f37755077
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-rotate-left.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-mdejagnu-cpu=future -O2" } */
+
+/* Test whether the xvrlw (vector word rotate left using VSX registers instead of
+   Altivec registers) is generated.  */
+
+#include 
+
+typedef vector unsigned int  v4si_t;
+
+v4si_t
+rotl_v4si_scalar (v4si_t x, unsigned long n)
+{
+  __asm__ (" # %x0" : "+f" (x));
+  return (x << n) | (x >> (32 - n));   /* xvrlw.  */
+}
+
+v4si_t
+rotr_v4si_scalar (v4si_t x, unsigned long n)
+{
+  __asm__ (" # %x0" : "+f" (x));
+  return (x >> n) | (x << (32 - n));   /* xvrlw.  */
+}
+
+v4si_t
+rotl_v4si_vector (v4si_t x, v4si_t y)
+{
+  __asm__ (" # %x0" : "+f" (x));   /* xvrlw.  */
+  return vec_rl (x, y);
+}
+
+/* { dg-final { scan-assembler-times {\mxvrlw\M} 3  } } */
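
The shift-and-or idioms in the test are the usual C spellings of a 32-bit rotate. The scalar model below sketches what each vector lane computes. The helper names `rotl32`, `rotr32`, and `rotl_v4si_model` are hypothetical, and masking the count to its low five bits is how this sketch chooses to handle out-of-range counts, not something the test itself exercises.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by the low five bits of N.  Masking both
   shift counts avoids an undefined shift by 32 when N is 0.  */
static uint32_t
rotl32 (uint32_t x, unsigned n)
{
  n &= 31;
  return (x << n) | (x >> ((32 - n) & 31));
}

/* A rotate right by N is a rotate left by 32 - N, which is why a rotate
   left instruction can also serve the rotate right idiom.  */
static uint32_t
rotr32 (uint32_t x, unsigned n)
{
  return rotl32 (x, 32 - (n & 31));
}

/* Per-lane model of a vector word rotate left: each of the four lanes of
   X is rotated by the matching lane of Y.  */
static void
rotl_v4si_model (uint32_t out[4], const uint32_t x[4], const uint32_t y[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = rotl32 (x[i], y[i]);
}
```

The functions `rotl_v4si_scalar` and `rotr_v4si_scalar` in the test apply the same count to every lane, while `rotl_v4si_vector` supplies a separate count per lane through `vec_rl`.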


[gcc(refs/users/meissner/heads/work182-dmf)] Update ChangeLog.*

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:72dd4a7afce5476959c49a9b41d5c3a9f167299d

commit 72dd4a7afce5476959c49a9b41d5c3a9f167299d
Author: Michael Meissner 
Date:   Tue Oct 22 16:29:33 2024 -0400

Update ChangeLog.*

Diff:
---
 gcc/ChangeLog.dmf | 450 ++
 1 file changed, 450 insertions(+)

diff --git a/gcc/ChangeLog.dmf b/gcc/ChangeLog.dmf
index 2ac899d39122..24e0fdcca388 100644
--- a/gcc/ChangeLog.dmf
+++ b/gcc/ChangeLog.dmf
@@ -1,5 +1,455 @@
+ Branch work182-dmf, patch #113 
+
+RFC2677-Add xvrlw support.
+
+2024-10-14  Michael Meissner  
+
+gcc/
+
+   * config/rs6000/altivec.md (xvrlw): New insn.
+   * config/rs6000/rs6000.h (TARGET_XVRLW): New macro.
+
+gcc/testsuite/
+
+   * gcc.target/powerpc/vector-rotate-left.c: New test.
+
+ Branch work182-dmf, patch #112 
+
+RFC2686-Add paddis support.
+
+2024-10-14  Michael Meissner  
+
+gcc/
+
+   * config/rs6000/constraints.md (eU): New constraint.
+   (eV): Likewise.
+   * config/rs6000/predicates.md (paddis_operand): New predicate.
+   (paddis_paddi_operand): Likewise.
+   (add_operand): Add paddis support.
+   * config/rs6000/rs6000.cc (num_insns_constant_gpr): Add paddis support.
+   (num_insns_constant_multi): Likewise.
+   (print_operand): Add %B for paddis support.
+   * config/rs6000/rs6000.h (TARGET_PADDIS): New macro.
+   (SIGNED_INTEGER_32BIT_P): Likewise.
+   * config/rs6000/rs6000.md (isa attribute): Add paddis support.
+   (enabled attribute): Likewise.
+   (add3): Likewise.
+   (adddi3 splitter): New splitter for paddis.
+   (movdi_internal64): Add paddis support.
+   (movdi splitter): New splitter for paddis.
+
+gcc/testsuite/
+
+   * gcc.target/powerpc/prefixed-addis.c: New test.
+
+ Branch work182-dmf, patch #111 
+
+RFC2655-Add saturating subtract built-ins.
+
+This patch adds support for a saturating subtract built-in function that may be
+added to a future PowerPC processor.  Note, if it is added, the name of the
+built-in function may change before GCC 13 is released.  If the name changes,
+we will submit a patch changing the name.
+
+I also added support for providing dense math built-in functions, even though
+at present, we have not added any new built-in functions for dense math.  It is
+likely we will want to add new dense math built-in functions as the dense math
+support is fleshed out.
+
+The patches have been tested on both little and big endian systems.  Can I check
+it into the master branch?
+
+2024-10-14   Michael Meissner  
+
+gcc/
+
+   * config/rs6000/rs6000-builtin.cc (rs6000_invalid_builtin): Add support
+   for flagging invalid use of future built-in functions.
+   (rs6000_builtin_is_supported): Add support for future built-in
+   functions.
+   * config/rs6000/rs6000-builtins.def (__builtin_saturate_subtract32): New
+   built-in function for -mcpu=future.
+   (__builtin_saturate_subtract64): Likewise.
+   * config/rs6000/rs6000-gen-builtins.cc (enum bif_stanza): Add stanzas
+   for -mcpu=future built-ins.
+   (stanza_map): Likewise.
+   (enable_string): Likewise.
+   (struct attrinfo): Likewise.
+   (parse_bif_attrs): Likewise.
+   (write_decls): Likewise.
+   * config/rs6000/rs6000.md (sat_sub3): Add saturating subtract
+   built-in insn declarations.
+   (sat_sub3_dot): Likewise.
+   (sat_sub3_dot2): Likewise.
+   * doc/extend.texi (Future PowerPC built-ins): New section.
+
+gcc/testsuite/
+
+   * gcc.target/powerpc/subfus-1.c: New test.
+   * gcc.target/powerpc/subfus-2.c: Likewise.
+
+ Branch work182-dmf, patch #110 
+
+RFC2656-Support load/store vector with right length.
+
+This patch adds support for new instructions that may be added to the PowerPC
+architecture in the future to enhance the load and store vector with length
+instructions.
+
+The current instructions (lxvl, lxvll, stxvl, and stxvll) are inconvenient to use
+since the count for the number of bytes must be in the top 8 bits of the GPR
+register, instead of the bottom 8 bits.  This meant that code generating these
+instructions typically had to do a shift left by 56 bits to get the count into
+the right position.  In a future version of the PowerPC architecture, new
+variants of these instructions might be added that expect the count to be in
+the bottom 8 bits of the GPR register.  These patches add this support to GCC
+if the user uses the -mcpu=future option.
+
+I discovered that the code in rs6000-string.cc to generate ISA 3.1 lxvl/stxvl
+future lxvll/stxvll instructions would generate these instructions on 32-bit.
+However, the patterns for these instructions are only provided on 64-bit systems.  So
+I added a check for 64-bit support before generating the instructions.
+
+The patches have been tes
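
The difference between the two operand layouts described in the patch #110 entry can be sketched in C. The function names below are hypothetical. The second function models only what the entry says about the possible future variants, namely that they would read the count from the bottom 8 bits of the register.

```c
#include <stdint.h>

/* Length operand for the current lxvl/stxvl style: the byte count has to
   sit in the top 8 bits of the 64-bit GPR, hence the shift left by 56.  */
static uint64_t
length_operand_top_bits (unsigned nbytes)
{
  return (uint64_t) (nbytes & 0xff) << 56;
}

/* Length operand for a variant that reads the count from the bottom
   8 bits: the count can be used as-is, with no extra shift.  */
static uint64_t
length_operand_bottom_bits (unsigned nbytes)
{
  return (uint64_t) (nbytes & 0xff);
}
```

For a 16-byte access the first form yields `0x1000000000000000`, while the second simply yields 16, which is the shift the entry says generated code would no longer need.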

[gcc(refs/users/meissner/heads/work182-dmf)] Revert changes

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:03a0a55f93bd9e6f3f1769a7c82ced60d6a31cca

commit 03a0a55f93bd9e6f3f1769a7c82ced60d6a31cca
Author: Michael Meissner 
Date:   Tue Oct 22 18:01:36 2024 -0400

Revert changes

Diff:
---
 gcc/ChangeLog.dmf  | 45 +-
 gcc/config/rs6000/altivec.md   | 14 
 gcc/config/rs6000/constraints.md   | 10 ---
 gcc/config/rs6000/predicates.md| 52 +---
 gcc/config/rs6000/rs6000.cc| 25 --
 gcc/config/rs6000/rs6000.h |  7 --
 gcc/config/rs6000/rs6000.md| 96 +++---
 gcc/testsuite/gcc.target/powerpc/prefixed-addis.c  | 24 --
 .../gcc.target/powerpc/vector-rotate-left.c| 34 
 9 files changed, 16 insertions(+), 291 deletions(-)

diff --git a/gcc/ChangeLog.dmf b/gcc/ChangeLog.dmf
index 24e0fdcca388..53e9dd48bcf9 100644
--- a/gcc/ChangeLog.dmf
+++ b/gcc/ChangeLog.dmf
@@ -1,46 +1,5 @@
- Branch work182-dmf, patch #113 
-
-RFC2677-Add xvrlw support.
-
-2024-10-14  Michael Meissner  
-
-gcc/
-
-   * config/rs6000/altivec.md (xvrlw): New insn.
-   * config/rs6000/rs6000.h (TARGET_XVRLW): New macro.
-
-gcc/testsuite/
-
-   * gcc.target/powerpc/vector-rotate-left.c: New test.
-
- Branch work182-dmf, patch #112 
-
-RFC2686-Add paddis support.
-
-2024-10-14  Michael Meissner  
-
-gcc/
-
-   * config/rs6000/constraints.md (eU): New constraint.
-   (eV): Likewise.
-   * config/rs6000/predicates.md (paddis_operand): New predicate.
-   (paddis_paddi_operand): Likewise.
-   (add_operand): Add paddis support.
-   * config/rs6000/rs6000.cc (num_insns_constant_gpr): Add paddis support.
-   (num_insns_constant_multi): Likewise.
-   (print_operand): Add %B for paddis support.
-   * config/rs6000/rs6000.h (TARGET_PADDIS): New macro.
-   (SIGNED_INTEGER_32BIT_P): Likewise.
-   * config/rs6000/rs6000.md (isa attribute): Add paddis support.
-   (enabled attribute): Likewise.
-   (add3): Likewise.
-   (adddi3 splitter): New splitter for paddis.
-   (movdi_internal64): Add paddis support.
-   (movdi splitter): New splitter for paddis.
-
-gcc/testsuite/
-
-   * gcc.target/powerpc/prefixed-addis.c: New test.
+ Branch work182-dmf, patch #113 was reverted 

+ Branch work182-dmf, patch #112 was reverted 

 
  Branch work182-dmf, patch #111 
 
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index a875dc9a4ec1..00dad4b91f1c 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1983,20 +1983,6 @@
 }
   [(set_attr "type" "vecperm")])
 
-;; -mcpu=future adds a vector rotate left word variant.  There is no vector
-;; byte/half-word/double-word/quad-word rotate left.  This insn occurs before
-;; altivec_vrl and will match for -mcpu=future, while other cpus will
-;; match the generic insn.
-(define_insn "*xvrlw"
-  [(set (match_operand:V4SI 0 "register_operand" "=v,wa")
-   (rotate:V4SI (match_operand:V4SI 1 "register_operand" "v,wa")
-(match_operand:V4SI 2 "register_operand" "v,wa")))]
-  "TARGET_XVRLW"
-  "@
-   vrlw %0,%1,%2
-   xvrlw %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")])
-
 (define_insn "altivec_vrl"
   [(set (match_operand:VI2 0 "register_operand" "=v")
 (rotate:VI2 (match_operand:VI2 1 "register_operand" "v")
diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index 4d8d21fd6bbb..277a30a82458 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -222,16 +222,6 @@
   "An IEEE 128-bit constant that can be loaded into VSX registers."
   (match_operand 0 "easy_vector_constant_ieee128"))
 
-(define_constraint "eU"
-  "@internal integer constant that can be loaded with paddis"
-  (and (match_code "const_int")
-   (match_operand 0 "paddis_operand")))
-
-(define_constraint "eV"
-  "@internal integer constant that can be loaded with paddis + paddi"
-  (and (match_code "const_int")
-   (match_operand 0 "paddis_paddi_operand")))
-
 ;; Floating-point constraints.  These two are defined so that insn
 ;; length attributes can be calculated exactly.
 
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index f8e7df5e7f5b..2797c3cf619b 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -369,53 +369,6 @@
   return SIGNED_INTEGER_34BIT_P (INTVAL (op));
 })
 
-;; Return 1 if op is a 64-bit constant that uses the paddis instruction
-(define_predicate "paddis_operand"
-  (match_code "const_int")
-{
-  if (!TARGET_PADDIS && TARGET_POWERPC64)
-return 0;
-
-  /* If addi, addis, or paddi can handle the number, don't return true.  */
-  HOST_WIDE_

[gcc(refs/users/meissner/heads/work182-dmf)] RFC2653-Add dense math test for new instruction names.

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:24421eb1af4e492cbe23ea1e77d52fe81d641e93

commit 24421eb1af4e492cbe23ea1e77d52fe81d641e93
Author: Michael Meissner 
Date:   Tue Oct 22 16:19:55 2024 -0400

RFC2653-Add dense math test for new instruction names.

2024-10-22   Michael Meissner  

gcc/testsuite/

* gcc.target/powerpc/dm-double-test.c: New test.
* lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New
target test.

Diff:
---
 gcc/testsuite/gcc.target/powerpc/dm-double-test.c | 194 ++
 gcc/testsuite/lib/target-supports.exp |  23 +++
 2 files changed, 217 insertions(+)

diff --git a/gcc/testsuite/gcc.target/powerpc/dm-double-test.c b/gcc/testsuite/gcc.target/powerpc/dm-double-test.c
new file mode 100644
index ..66c197795856
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/dm-double-test.c
@@ -0,0 +1,194 @@
+/* Test derived from mma-double-1.c, modified for dense math.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_dense_math_ok } */
+/* { dg-options "-mdejagnu-cpu=future -O2" } */
+
+#include 
+#include 
+#include 
+
+typedef unsigned char vec_t __attribute__ ((vector_size (16)));
+typedef double v4sf_t __attribute__ ((vector_size (16)));
+#define SAVE_ACC(ACC, ldc, J)  \
+ __builtin_mma_disassemble_acc (result, ACC); \
+ rowC = (v4sf_t *) &CO[0*ldc+J]; \
+  rowC[0] += result[0]; \
+  rowC = (v4sf_t *) &CO[1*ldc+J]; \
+  rowC[0] += result[1]; \
+  rowC = (v4sf_t *) &CO[2*ldc+J]; \
+  rowC[0] += result[2]; \
+  rowC = (v4sf_t *) &CO[3*ldc+J]; \
+ rowC[0] += result[3];
+
+void
+DM (int m, int n, int k, double *A, double *B, double *C)
+{
+  __vector_quad acc0, acc1, acc2, acc3, acc4, acc5, acc6, acc7;
+  v4sf_t result[4];
+  v4sf_t *rowC;
+  for (int l = 0; l < n; l += 4)
+{
+  double *CO;
+  double *AO;
+  AO = A;
+  CO = C;
+  C += m * 4;
+  for (int j = 0; j < m; j += 16)
+   {
+ double *BO = B;
+ __builtin_mma_xxsetaccz (&acc0);
+ __builtin_mma_xxsetaccz (&acc1);
+ __builtin_mma_xxsetaccz (&acc2);
+ __builtin_mma_xxsetaccz (&acc3);
+ __builtin_mma_xxsetaccz (&acc4);
+ __builtin_mma_xxsetaccz (&acc5);
+ __builtin_mma_xxsetaccz (&acc6);
+ __builtin_mma_xxsetaccz (&acc7);
+ unsigned long i;
+
+ for (i = 0; i < k; i++)
+   {
+ vec_t *rowA = (vec_t *) & AO[i * 16];
+ __vector_pair rowB;
+ vec_t *rb = (vec_t *) & BO[i * 4];
+ __builtin_mma_assemble_pair (&rowB, rb[1], rb[0]);
+ __builtin_mma_xvf64gerpp (&acc0, rowB, rowA[0]);
+ __builtin_mma_xvf64gerpp (&acc1, rowB, rowA[1]);
+ __builtin_mma_xvf64gerpp (&acc2, rowB, rowA[2]);
+ __builtin_mma_xvf64gerpp (&acc3, rowB, rowA[3]);
+ __builtin_mma_xvf64gerpp (&acc4, rowB, rowA[4]);
+ __builtin_mma_xvf64gerpp (&acc5, rowB, rowA[5]);
+ __builtin_mma_xvf64gerpp (&acc6, rowB, rowA[6]);
+ __builtin_mma_xvf64gerpp (&acc7, rowB, rowA[7]);
+   }
+ SAVE_ACC (&acc0, m, 0);
+ SAVE_ACC (&acc2, m, 4);
+ SAVE_ACC (&acc1, m, 2);
+ SAVE_ACC (&acc3, m, 6);
+ SAVE_ACC (&acc4, m, 8);
+ SAVE_ACC (&acc6, m, 12);
+ SAVE_ACC (&acc5, m, 10);
+ SAVE_ACC (&acc7, m, 14);
+ AO += k * 16;
+ BO += k * 4;
+ CO += 16;
+   }
+  B += k * 4;
+}
+}
+
+void
+init (double *matrix, int row, int column)
+{
+  for (int j = 0; j < column; j++)
+{
+  for (int i = 0; i < row; i++)
+   {
+ matrix[j * row + i] = (i * 16 + 2 + j) / 0.123;
+   }
+}
+}
+
+void
+init0 (double *matrix, double *matrix1, int row, int column)
+{
+  for (int j = 0; j < column; j++)
+for (int i = 0; i < row; i++)
+  matrix[j * row + i] = matrix1[j * row + i] = 0;
+}
+
+
+void
+print (const char *name, const double *matrix, int row, int column)
+{
+  printf ("Matrix %s has %d rows and %d columns:\n", name, row, column);
+  for (int i = 0; i < row; i++)
+{
+  for (int j = 0; j < column; j++)
+   {
+ printf ("%f ", matrix[j * row + i]);
+   }
+  printf ("\n");
+}
+  printf ("\n");
+}
+
+int
+main (int argc, char *argv[])
+{
+  int rowsA, colsB, common;
+  int i, j, k;
+  int ret = 0;
+
+  for (int t = 16; t <= 128; t += 16)
+{
+  for (int t1 = 4; t1 <= 16; t1 += 4)
+   {
+ rowsA = t;
+ colsB = t1;
+ common = 1;
+ /* printf ("Running test for rows = %d,cols = %d\n", t, t1); */
+ double A[rowsA * common];
+ double B[common * colsB];
+ double C[rowsA * colsB];
+ double D[rowsA * colsB];
+
+
+ init (A, rowsA, common);
+ init (B, common, colsB);
+ init0 (C, D, rowsA, colsB);
+ DM (rowsA, colsB, common, A, B

[gcc(refs/users/meissner/heads/work182-libs)] Add ChangeLog.libs and update REVISION.

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:49a3ad314ac91e9e7c7428b72351431816d9bc96

commit 49a3ad314ac91e9e7c7428b72351431816d9bc96
Author: Michael Meissner 
Date:   Tue Oct 22 15:35:38 2024 -0400

Add ChangeLog.libs and update REVISION.

2024-10-22  Michael Meissner  

gcc/

* ChangeLog.libs: New file for branch.
* REVISION: Update.

Diff:
---
 gcc/ChangeLog.libs | 5 +
 gcc/REVISION   | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.libs b/gcc/ChangeLog.libs
new file mode 100644
index ..b5698fabdd6a
--- /dev/null
+++ b/gcc/ChangeLog.libs
@@ -0,0 +1,5 @@
+ Branch work182-libs, baseline 
+
+2024-10-22   Michael Meissner  
+
+   Clone branch
diff --git a/gcc/REVISION b/gcc/REVISION
index 5aaca2bd398a..4693a544e886 100644
--- a/gcc/REVISION
+++ b/gcc/REVISION
@@ -1 +1 @@
-work182 branch
+work182-libs branch


[gcc/meissner/heads/work182-libs] (18 commits) Merge commit 'refs/users/meissner/heads/work182-libs' of gi

2024-11-06 Thread Michael Meissner via Gcc-cvs
The branch 'meissner/heads/work182-libs' was updated to point to:

 1c84b8027e7e... Merge commit 'refs/users/meissner/heads/work182-libs' of gi

It previously pointed to:

 1fa6af582a1a... Merge commit 'refs/users/meissner/heads/work182-libs' of gi

Diff:

Summary of changes (added commits):
---

  1c84b80... Merge commit 'refs/users/meissner/heads/work182-libs' of gi
  49a3ad3... Add ChangeLog.libs and update REVISION.
  892e05e... Update ChangeLog.* (*)
  ab781c6... Add -mcpu=future tuning support. (*)
  11869df... Add support for -mcpu=future (*)
  3457a6e... Update tests to work with architecture flags changes. (*)
  c49f8b7... Change TARGET_MODULO to TARGET_POWER9 (*)
  ef7b8de... Change TARGET_POPCNTD to TARGET_POWER7 (*)
  b024a7e... Change TARGET_CMPB to TARGET_POWER6 (*)
  bc1106e... Change TARGET_FPRND to TARGET_POWER5X (*)
  c875852... Change TARGET_POPCNTB to TARGET_POWER5 (*)
  1b99261... Do not allow -mvsx to boost processor to power7. (*)
  2c3a02c... Use architecture flags for defining _ARCH_PWR macros. (*)
  e2d1785... Add rs6000 architecture masks. (*)
  27f73de... Revert changes (*)
  b5d3ebf... Add rs6000 architecture masks. (*)
  132c8a7... Add rs6000 architecture masks. (*)
  848be25... Revert changes (*)

(*) This commit already exists in another branch.
Because the reference `refs/users/meissner/heads/work182-libs' matches
your hooks.email-new-commits-only configuration,
no separate email is sent for this commit.


[gcc(refs/users/meissner/heads/work182-dmf)] Merge commit 'refs/users/meissner/heads/work182-dmf' of git+ssh://gcc.gnu.org/git/gcc into me/work18

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:e2b2ecbb0d9b5e2e2af5b9be297db997a32ef35d

commit e2b2ecbb0d9b5e2e2af5b9be297db997a32ef35d
Merge: 03a0a55f93bd 663944a2b488
Author: Michael Meissner 
Date:   Wed Nov 6 16:04:23 2024 -0500

Merge commit 'refs/users/meissner/heads/work182-dmf' of git+ssh://gcc.gnu.org/git/gcc into me/work182-dmf

Diff:


[gcc(refs/users/meissner/heads/work182-dmf)] RFC2653-Add support for dense math registers.

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:f4a8690b7e069396bb98fbd689ddb098e6e1e0b8

commit f4a8690b7e069396bb98fbd689ddb098e6e1e0b8
Author: Michael Meissner 
Date:   Tue Oct 22 16:17:41 2024 -0400

RFC2653-Add support for dense math registers.

The MMA subsystem added the notion of accumulator registers as an optional
feature of ISA 3.1 (power10).  In ISA 3.1, these accumulators overlapped 
with
the VSX registers 0..31, but logically the accumulator registers were 
separate
from the FPR registers.  In ISA 3.1, it was anticipated that in future 
systems,
the accumulator registers may no overlap with the FPR registers.  This patch
adds the support for dense math registers as separate registers.

This particular patch does not change the MMA support to use the accumulators
within the dense math registers.  This patch just adds the basic support for
having separate DMRs.  The next patch will switch the MMA support to use the
accumulators if -mcpu=future is used.

For testing purposes, I added an undocumented option '-mdense-math' to enable
or disable the dense math support.

This patch adds a new constraint (wD).  If MMA is selected but dense math is
not selected (i.e. -mcpu=power10), the wD constraint will allow access to
accumulators that overlap with VSX registers 0..31.  If both MMA and dense math
are selected (i.e. -mcpu=future), the wD constraint will only allow dense math
registers.

This patch modifies the existing %A output modifier.  If MMA is selected but
dense math is not selected, then %A output modifier converts the VSX register
number to the accumulator number, by dividing it by 4.  If both MMA and dense
math are selected, then %A will map the separate DMR registers into 0..7.
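
As a rough illustration of the %A mapping described above, the following C
sketch models the two cases.  It is not the code in print_operand; the
function name and its parameters are hypothetical, and it assumes the DMR case
is handed an index that is already relative to the first dense math register.

```c
#include <stdbool.h>

/* Hypothetical model of the %A output modifier (not GCC's print_operand).
   Without dense math, REGNO is a VSX register number 0..31 and each
   accumulator covers 4 adjacent registers, so the accumulator number is
   REGNO / 4.  With dense math, REGNO is assumed to be the index of a
   separate DMR, which already lies in 0..7.  */
static int
accumulator_number (int regno, bool dense_math)
{
  if (dense_math)
    return regno;

  return regno / 4;
}
```

For example, VSX register 8 without dense math corresponds to accumulator 2,
while DMR index 5 with dense math stays 5.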

The intention is that user code using extended asm can be modified to run on
both MMA without dense math and MMA with dense math:

1)  If possible, don't use extended asm, but instead use the MMA built-in
functions;

2)  If you do need to write extended asm, change the d constraints
targeting accumulators to use wD;

3)  Only use the built-in zero, assemble and disassemble functions to create
and move data between vector quad types and dense math accumulators.
I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz instructions directly
in the extended asm code.  The reason is these instructions assume there is
a 1-to-1 correspondence between 4 adjacent FPR registers and an
accumulator that overlaps with those registers.  With accumulators
now being separate registers, there no longer is a 1-to-1
correspondence.

It is possible that the mangling for DMRs and the GDB register numbers may
produce other changes in the future.

gcc/

2024-10-22   Michael Meissner  

* config/rs6000/mma.md (UNSPEC_MMA_DMSETDMRZ): New unspec.
(movxo): Add comments about dense math registers.
(movxo_nodm): Rename from movxo and restrict the usage to machines
without dense math registers.
(movxo_dm): New insn for movxo support for machines with dense math
registers.
(mma_): Restrict usage to machines without dense math registers.
(mma_xxsetaccz): Add a define_expand wrapper, and add support for dense
math registers.
(mma_dmsetaccz): New insn.
* config/rs6000/predicates.md (dmr_operand): New predicate.
(accumulator_operand): Add support for dense math registers.
* config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_mma_builtin): Do
not issue a de-prime instruction when disassembling a vector quad on a
system with dense math registers.
* config/rs6000/rs6000-c.cc (rs6000_define_or_undefine_macro): Define
__DENSE_MATH__ if we have dense math registers.
* config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMR_REG_TYPE.
(enum rs6000_reload_reg_type): Add RELOAD_REG_DMR.
(LAST_RELOAD_REG_CLASS): Add support for DMR registers and the wD
constraint.
(reload_reg_map): Likewise.
(rs6000_reg_names): Likewise.
(alt_reg_names): Likewise.
(rs6000_hard_regno_nregs_internal): Likewise.
(rs6000_hard_regno_mode_ok_uncached): Likewise.
(rs6000_debug_reg_global): Likewise.
(rs6000_setup_reg_addr_masks): Likewise.
(rs6000_init_hard_regno_mode_ok): Likewise.
(rs6000_secondary_reload_memory): Add support for DMR registers.
(rs6000_secondary_reload_simple_move): Likewise.
(rs6000_preferred_reload_class): Likewise.
(rs6000_secondary_reload_class): Likewise.
(print_operand): Make %A handle both FP

[gcc(refs/users/meissner/heads/work182-dmf)] Add ChangeLog.dmf and update REVISION.

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:787e71a76ad677a015e81f8c1c4f4ac3c3675dc0

commit 787e71a76ad677a015e81f8c1c4f4ac3c3675dc0
Author: Michael Meissner 
Date:   Tue Oct 22 15:32:17 2024 -0400

Add ChangeLog.dmf and update REVISION.

2024-10-22  Michael Meissner  

gcc/

* ChangeLog.dmf: New file for branch.
* REVISION: Update.

Diff:
---
 gcc/ChangeLog.dmf | 5 +
 gcc/REVISION  | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.dmf b/gcc/ChangeLog.dmf
new file mode 100644
index ..2ac899d39122
--- /dev/null
+++ b/gcc/ChangeLog.dmf
@@ -0,0 +1,5 @@
+ Branch work182-dmf, baseline 
+
+2024-10-22   Michael Meissner  
+
+   Clone branch
diff --git a/gcc/REVISION b/gcc/REVISION
index 5aaca2bd398a..efb2d92aa380 100644
--- a/gcc/REVISION
+++ b/gcc/REVISION
@@ -1 +1 @@
-work182 branch
+work182-dmf branch


[gcc(refs/users/meissner/heads/work182-dmf)] RFC2656-Support load/store vector with right length.

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:d4ca240457ded511bf36e6a9a0cf1a6015c5f7aa

commit d4ca240457ded511bf36e6a9a0cf1a6015c5f7aa
Author: Michael Meissner 
Date:   Tue Oct 22 16:22:22 2024 -0400

RFC2656-Support load/store vector with right length.

This patch adds support for new instructions that may be added to the PowerPC
architecture in the future to enhance the load and store vector with length
instructions.

The current instructions (lxvl, lxvll, stxvl, and stxvll) are inconvenient to
use since the count for the number of bytes must be in the top 8 bits of the GPR
register, instead of the bottom 8 bits.  This meant that code generating these
instructions typically had to do a shift left by 56 bits to get the count into
the right position.  In a future version of the PowerPC architecture, new
variants of these instructions might be added that expect the count to be in
the bottom 8 bits of the GPR register.  These patches add this support to GCC
if the user uses the -mcpu=future option.
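
To make the difference in the length operand concrete, here is a small C
sketch of the two encodings described above.  The function names are
hypothetical and only model the value placed in the GPR; they are not GCC
code and do not emit the instructions themselves.

```c
#include <stdint.h>

/* Length operand as the existing lxvl/stxvl instructions expect it:
   the byte count sits in the top 8 bits of the 64-bit GPR, which is why
   generated code needs a shift left by 56.  */
static uint64_t
lxvl_length_operand (uint8_t nbytes)
{
  return (uint64_t) nbytes << 56;
}

/* Length operand as the possible future variants would expect it,
   according to the description above: the byte count stays in the
   bottom 8 bits, so no shift is needed.  */
static uint64_t
lxvrl_length_operand (uint8_t nbytes)
{
  return (uint64_t) nbytes;
}
```

For a 16-byte access the first form yields 0x1000000000000000 and the second
yields 16.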

I discovered that the code in rs6000-string.cc to generate ISA 3.1 lxvl/stxvl
and future lxvll/stxvll instructions would generate these instructions on
32-bit.  However, the patterns for these instructions are only defined on
64-bit systems.  So I added a check for 64-bit support before generating the
instructions.

The patches have been tested on both little and big endian systems.  Can I
check it into the master branch?

2024-10-22   Michael Meissner  

gcc/

* config/rs6000/rs6000-string.cc (expand_block_move): Do not generate
lxvl and stxvl on 32-bit.
* config/rs6000/vsx.md (lxvl): If -mcpu=future, generate the lxvl with
the shift count automatically used in the insn.
(lxvrl): New insn for -mcpu=future.
(lxvrll): Likewise.
(stxvl): If -mcpu=future, generate the stxvl with the shift count
automatically used in the insn.
(stxvrl): New insn for -mcpu=future.
(stxvrll): Likewise.

gcc/testsuite/

* gcc.target/powerpc/lxvrl.c: New test.
* lib/target-supports.exp (check_effective_target_powerpc_future_ok):
New effective target.

Diff:
---
 gcc/config/rs6000/rs6000-string.cc   |   1 +
 gcc/config/rs6000/vsx.md | 122 +--
 gcc/testsuite/gcc.target/powerpc/lxvrl.c |  32 
 gcc/testsuite/lib/target-supports.exp|  12 +++
 4 files changed, 146 insertions(+), 21 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index 3674c4bd9847..818ff10a8ac8 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -2786,6 +2786,7 @@ expand_block_move (rtx operands[], bool might_overlap)
 
   if (TARGET_MMA && TARGET_BLOCK_OPS_UNALIGNED_VSX
  && TARGET_BLOCK_OPS_VECTOR_PAIR
+ && TARGET_POWERPC64
  && bytes >= 32
  && (align >= 256 || !STRICT_ALIGNMENT))
{
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index b2fc39acf4e8..9a082ec21958 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5710,20 +5710,32 @@
   DONE;
 })
 
-;; Load VSX Vector with Length
+;; Load VSX Vector with Length.  If we have lxvrl, we don't have to do an
+;; explicit shift left into a pseudo.
 (define_expand "lxvl"
-  [(set (match_dup 3)
-(ashift:DI (match_operand:DI 2 "register_operand")
-   (const_int 56)))
-   (set (match_operand:V16QI 0 "vsx_register_operand")
-   (unspec:V16QI
-[(match_operand:DI 1 "gpc_reg_operand")
-  (mem:V16QI (match_dup 1))
- (match_dup 3)]
-UNSPEC_LXVL))]
+  [(use (match_operand:V16QI 0 "vsx_register_operand"))
+   (use (match_operand:DI 1 "gpc_reg_operand"))
+   (use (match_operand:DI 2 "gpc_reg_operand"))]
   "TARGET_P9_VECTOR && TARGET_64BIT"
 {
-  operands[3] = gen_reg_rtx (DImode);
+  rtx shift_len = gen_rtx_ASHIFT (DImode, operands[2], GEN_INT (56));
+  rtx len;
+
+  if (TARGET_FUTURE)
+len = shift_len;
+  else
+{
+  len = gen_reg_rtx (DImode);
+  emit_insn (gen_rtx_SET (len, shift_len));
+}
+
+  rtx dest = operands[0];
+  rtx addr = operands[1];
+  rtx mem = gen_rtx_MEM (V16QImode, addr);
+  rtvec rv = gen_rtvec (3, addr, mem, len);
+  rtx lxvl = gen_rtx_UNSPEC (V16QImode, rv, UNSPEC_LXVL);
+  emit_insn (gen_rtx_SET (dest, lxvl));
+  DONE;
 })
 
 (define_insn "*lxvl"
@@ -5747,6 +5759,34 @@
   "lxvll %x0,%1,%2"
   [(set_attr "type" "vecload")])
 
+;; For lxvrl and lxvrll, use the combiner to eliminate the shift.  The
+;; define_expand for lxvl will already incorporate the shift in generating the
+;; insn.  The lxvll built-in function required the user to have already done
+;; the shift.  Defining lxvrll this way, will optimize cases where the user has
+;; done the shift immediately before

[gcc(refs/users/meissner/heads/work182-dmf)] RFC2655-Add saturating subtract built-ins.

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:cbd2455ef7c92a64b242c6fa3d4a976e23b80e57

commit cbd2455ef7c92a64b242c6fa3d4a976e23b80e57
Author: Michael Meissner 
Date:   Tue Oct 22 16:23:20 2024 -0400

RFC2655-Add saturating subtract built-ins.

This patch adds support for a saturating subtract built-in function that may be
added to a future PowerPC processor.  Note, if it is added, the name of the
built-in function may change before GCC 13 is released.  If the name changes,
we will submit a patch changing the name.

I also added support for providing dense math built-in functions, even though
at present, we have not added any new built-in functions for dense math.  It is
likely we will want to add new dense math built-in functions as the dense math
support is fleshed out.

The patches have been tested on both little and big endian systems.  Can I
check it into the master branch?

2024-10-22   Michael Meissner  

gcc/

* config/rs6000/rs6000-builtin.cc (rs6000_invalid_builtin): Add support
for flagging invalid use of future built-in functions.
(rs6000_builtin_is_supported): Add support for future built-in
functions.
* config/rs6000/rs6000-builtins.def (__builtin_saturate_subtract32): New
built-in function for -mcpu=future.
(__builtin_saturate_subtract64): Likewise.
* config/rs6000/rs6000-gen-builtins.cc (enum bif_stanza): Add stanzas
for -mcpu=future built-ins.
(stanza_map): Likewise.
(enable_string): Likewise.
(struct attrinfo): Likewise.
(parse_bif_attrs): Likewise.
(write_decls): Likewise.
* config/rs6000/rs6000.md (sat_sub3): Add saturating subtract
built-in insn declarations.
(sat_sub3_dot): Likewise.
(sat_sub3_dot2): Likewise.
* doc/extend.texi (Future PowerPC built-ins): New section.

gcc/testsuite/

* gcc.target/powerpc/subfus-1.c: New test.
* gcc.target/powerpc/subfus-2.c: Likewise.

Diff:
---
 gcc/config/rs6000/rs6000-builtin.cc | 17 
 gcc/config/rs6000/rs6000-builtins.def   | 10 +
 gcc/config/rs6000/rs6000-gen-builtins.cc| 35 ++---
 gcc/config/rs6000/rs6000.md | 60 +
 gcc/doc/extend.texi | 24 
 gcc/testsuite/gcc.target/powerpc/subfus-1.c | 32 +++
 gcc/testsuite/gcc.target/powerpc/subfus-2.c | 32 +++
 7 files changed, 205 insertions(+), 5 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index 8e4335e9b44f..a5f33eb9da18 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -139,6 +139,17 @@ rs6000_invalid_builtin (enum rs6000_gen_builtins fncode)
 case ENB_MMA:
   error ("%qs requires the %qs option", name, "-mmma");
   break;
+case ENB_FUTURE:
+  error ("%qs requires the %qs option", name, "-mcpu=future");
+  break;
+case ENB_FUTURE_64:
+  error ("%qs requires the %qs option and either the %qs or %qs option",
+name, "-mcpu=future", "-m64", "-mpowerpc64");
+  break;
+case ENB_DM:
+  error ("%qs requires the %qs or %qs options", name, "-mcpu=future",
+"-mdense-math");
+  break;
 default:
 case ENB_ALWAYS:
   gcc_unreachable ();
@@ -194,6 +205,12 @@ rs6000_builtin_is_supported (enum rs6000_gen_builtins fncode)
   return TARGET_HTM;
 case ENB_MMA:
   return TARGET_MMA;
+case ENB_FUTURE:
+  return TARGET_FUTURE;
+case ENB_FUTURE_64:
+  return TARGET_FUTURE && TARGET_POWERPC64;
+case ENB_DM:
+  return TARGET_DENSE_MATH;
 default:
   gcc_unreachable ();
 }
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 0e9dc05dbcff..7d47dc4e402c 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -137,6 +137,8 @@
 ;   endian   Needs special handling for endianness
 ;   ibmldRestrict usage to the case when TFmode is IBM-128
 ;   ibm128   Restrict usage to the case where __ibm128 is supported or if ibmld
+;   future   Restrict usage to future instructions
+;   dm   Restrict usage to dense math
 ;
 ; Each attribute corresponds to extra processing required when
 ; the built-in is expanded.  All such special processing should
@@ -3933,3 +3935,11 @@
 
   void __builtin_vsx_stxvp (v256, unsigned long, const v256 *);
 STXVP nothing {mma,pair}
+
+[future]
+  const signed int __builtin_saturate_subtract32 (signed int, signed int);
+  SAT_SUBSI sat_subsi3 {}
+
+[future-64]
+  const signed long __builtin_saturate_subtract64 (signed long,  signed long);
+  SAT_SUBDI sat_subdi3 {}
diff --git a/gcc/config/rs6000/rs6000-gen-builtins.cc 
b/gcc/conf

[gcc(refs/users/meissner/heads/work182-dmf)] Use vector pair load/store for memcpy with -mcpu=future

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:52b2990b5a242158040a5f19add5611438449487

commit 52b2990b5a242158040a5f19add5611438449487
Author: Michael Meissner 
Date:   Tue Oct 22 16:15:45 2024 -0400

Use vector pair load/store for memcpy with -mcpu=future

In the development for the power10 processor, GCC did not enable using the load
vector pair and store vector pair instructions when optimizing things like
memory copy.  This patch enables using those instructions if -mcpu=future is
used.

2024-10-22  Michael Meissner  

gcc/

* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS_SERVER): Enable using
load vector pair and store vector pair instructions for memory copy
operations.
(POWERPC_MASKS): Make the bit for enabling using load vector pair and
store vector pair operations set and reset when the PowerPC processor is
changed.

Diff:
---
 gcc/config/rs6000/rs6000-cpus.def | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index e73d9ef51f8d..74151be40484 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -86,7 +86,8 @@
 
 #define POWER11_MASKS_SERVER   ISA_3_1_MASKS_SERVER
 
-#define FUTURE_MASKS_SERVERPOWER11_MASKS_SERVER
+#define FUTURE_MASKS_SERVER(POWER11_MASKS_SERVER   \
+| OPTION_MASK_BLOCK_OPS_VECTOR_PAIR)
 
 /* Flags that need to be turned off if -mno-vsx.  */
 #define OTHER_VSX_VECTOR_MASKS (OPTION_MASK_EFFICIENT_UNALIGNED_VSX\
@@ -116,6 +117,7 @@
 
 /* Mask of all options to set the default isa flags based on -mcpu=.  */
 #define POWERPC_MASKS  (OPTION_MASK_ALTIVEC\
+| OPTION_MASK_BLOCK_OPS_VECTOR_PAIR\
 | OPTION_MASK_CMPB \
 | OPTION_MASK_CRYPTO   \
 | OPTION_MASK_DFP  \


[gcc(refs/users/meissner/heads/work182-dmf)] RFC2653-Add wD constraint.

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:d8e9a7370ba52ccd5fb31e264e3a918bcdc9f121

commit d8e9a7370ba52ccd5fb31e264e3a918bcdc9f121
Author: Michael Meissner 
Date:   Tue Oct 22 16:16:43 2024 -0400

RFC2653-Add wD constraint.

This patch adds a new constraint ('wD') that matches the accumulator registers
that overlap with VSX registers 0..31 on power10.  Future patches will add the
support for a separate accumulator register class that will be used when the
support for dense math registers is added.

2024-10-22   Michael Meissner  

* config/rs6000/constraints.md (wD): New constraint.
* config/rs6000/mma.md (mma_): Prepare for alternate accumulator
registers.  Use wD constraint instead of 'd' constraint.  Use
accumulator_operand instead of fpr_reg_operand.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-   (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=&wD")
+   (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0")]
MMA_ACC))]
   "TARGET_MMA"
   " %A0"
@@ -523,7 +523,7 @@
   [(set_attr "type" "mma")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
+  [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD")
(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
(match_operand:V16QI 2 "vsx_register_operand" "v,?wa")]
MMA_VV))]
@@ -532,8 +532,8 @@
   [(set_attr "type" "mma")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-   (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
+  [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD")
+   (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0")
(match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
(match_operand:V16QI 3 "vsx_register_operand" "v,?wa")]
MMA_AVV))]
@@ -542,7 +542,7 @@
   [(set_attr "type" "mma")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
+  [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD")
(unspec:XO [(match_operand:OO 1 "vsx_register_operand" "v,?wa")
(match_operand:V16QI 2 "vsx_register_operand" "v,?wa")]
MMA_PV))]
@@ -551,8 +551,8 @@
   [(set_attr "type" "mma")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-   (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
+  [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD")
+   (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0")
(match_operand:OO 2 "vsx_register_operand" "v,?wa")
(match_operand:V16QI 3 "vsx_register_operand" "v,?wa")]
MMA_APV))]
@@ -561,7 +561,7 @@
   [(set_attr "type" "mma")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
+  [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD")
(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
(match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
(match_operand:SI 3 "const_0_to_15_operand" "n,n")
@@ -574,8 +574,8 @@
(set_attr "prefixed" "yes")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-   (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
+  [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD")
+   (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0")
(match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
(match_operand:V16QI 3 "vsx_register_operand" "v,?wa")
(match_operand:SI 4 "const_0_to_15_operand" "n,n")
@@ -588,7 +588,7 @@
(set_attr "prefixed" "yes")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
+  [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD")
(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
(match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
(match_operand:SI 3 "const_0_to_15_operand" "n,n")
@@ -601,8 +601,8 @@
(set_attr "prefixed" "yes")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-   (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
+  [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD")
+   (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0")
   

[gcc(refs/users/meissner/heads/work182-dmf)] RFC2653-PowerPC: Switch to dense math names for all MMA operations.

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:d340b8928626c8847c0e5f3a84cf5d0be3f995d1

commit d340b8928626c8847c0e5f3a84cf5d0be3f995d1
Author: Michael Meissner 
Date:   Tue Oct 22 16:18:43 2024 -0400

RFC2653-PowerPC: Switch to dense math names for all MMA operations.

This patch changes the assembler instruction names for MMA instructions from
the original name used in power10 to the new name when used with the dense math
system.  I.e. xvf64gerpp becomes dmxvf64gerpp.  The assembler will emit the
same bits for either spelling.

For the non-prefixed MMA instructions, we add a 'dm' prefix in front of the
instruction.  However, the prefixed instructions have a 'pm' prefix, and we add
the 'dm' prefix afterwards.  To prevent having two sets of parallel int
attributes, we remove the "pm" prefix from the instruction string in the
attributes, and add it later, both in the insn name and in the output template.
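
The naming rule above can be summarized with a short C sketch.  This is only
an illustration of how the final mnemonic is composed from the base name kept
in the int attributes; the function name and its parameters are hypothetical
and do not appear in mma.md.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: build the assembler mnemonic for an MMA operation.
   BASE is the name without any prefix (e.g. "xvi4ger8").  A prefixed
   instruction gets "pm" first, and on a dense math system "dm" is added
   after that, giving e.g. "pmdmxvi4ger8".  Returns the snprintf result.  */
static int
mma_mnemonic (char *buf, size_t size, const char *base,
              bool prefixed, bool dense_math)
{
  return snprintf (buf, size, "%s%s%s",
                   prefixed ? "pm" : "",
                   dense_math ? "dm" : "",
                   base);
}
```

With this rule, "xvf64gerpp" becomes "dmxvf64gerpp" on a dense math system,
and the prefixed form of "xvi4ger8" is "pmxvi4ger8" on power10 and
"pmdmxvi4ger8" with dense math.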

2024-10-22   Michael Meissner  

gcc/

* config/rs6000/mma.md (vvi4i4i8): Change the instruction to not have a
"pm" prefix.
(avvi4i4i8): Likewise.
(vvi4i4i2): Likewise.
(avvi4i4i2): Likewise.
(vvi4i4): Likewise.
(avvi4i4): Likewise.
(pvi4i2): Likewise.
(apvi4i2): Likewise.
(vvi4i4i4): Likewise.
(avvi4i4i4): Likewise.
(mma_): Add support for running on DMF systems, generating the dense
math instruction and using the dense math accumulators.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_pm): Add support for running on DMF systems, generating
the dense math instruction and using the dense math accumulators.
Rename the insn with a 'pm' prefix and add either 'pm' or 'pmdm'
prefixes based on whether we have the original MMA specification or if
we have dense math support.
(mma_pm): Likewise.
(mma_pm): Likewise.
(mma_pm): Likewise.
(mma_pm): Likewise.
(mma_pm): Likewise.
(mma_pm): Likewise.
(mma_pm): Likewise.

Diff:
---
 gcc/config/rs6000/mma.md | 157 +++
 1 file changed, 104 insertions(+), 53 deletions(-)

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index ae6e7e9695be..2e04eb653fa6 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -225,44 +225,47 @@
 (UNSPEC_MMA_XVF64GERNP "xvf64gernp")
 (UNSPEC_MMA_XVF64GERNN "xvf64gernn")])
 
-(define_int_attr vvi4i4i8  [(UNSPEC_MMA_PMXVI4GER8 "pmxvi4ger8")])
+;; The "pm" prefix is not in these expansions, so that we can generate
+;; pmdmxvi4ger8 on systems with dense math registers and xvi4ger8 on systems
+;; without dense math registers.
+(define_int_attr vvi4i4i8  [(UNSPEC_MMA_PMXVI4GER8 "xvi4ger8")])
 
-(define_int_attr avvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8PP   "pmxvi4ger8pp")])
+(define_int_attr avvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8PP   "xvi4ger8pp")])
 
-(define_int_attr vvi4i4i2  [(UNSPEC_MMA_PMXVI16GER2"pmxvi16ger2")
-(UNSPEC_MMA_PMXVI16GER2S   "pmxvi16ger2s")
-(UNSPEC_MMA_PMXVF16GER2"pmxvf16ger2")
-(UNSPEC_MMA_PMXVBF16GER2   "pmxvbf16ger2")])
+(define_int_attr vvi4i4i2  [(UNSPEC_MMA_PMXVI16GER2"xvi16ger2")
+(UNSPEC_MMA_PMXVI16GER2S   "xvi16ger2s")
+(UNSPEC_MMA_PMXVF16GER2"xvf16ger2")
+(UNSPEC_MMA_PMXVBF16GER2   "xvbf16ger2")])
 
-(define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP  "pmxvi16ger2pp")
-(UNSPEC_MMA_PMXVI16GER2SPP "pmxvi16ger2spp")
-(UNSPEC_MMA_PMXVF16GER2PP  "pmxvf16ger2pp")
-(UNSPEC_MMA_PMXVF16GER2PN  "pmxvf16ger2pn")
-(UNSPEC_MMA_PMXVF16GER2NP  "pmxvf16ger2np")
-(UNSPEC_MMA_PMXVF16GER2NN  "pmxvf16ger2nn")
-(UNSPEC_MMA_PMXVBF16GER2PP "pmxvbf16ger2pp")
-(UNSPEC_MMA_PMXVBF16GER2PN "pmxvbf16ger2pn")
-(UNSPEC_MMA_PMXVBF16GER2NP "pmxvbf16ger2np")
-(UNSPEC_MMA_PMXVBF16GER2NN "pmxvbf16ger2nn")])
+(define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP  "xvi16ger2pp")
+(UNSPEC_MMA_PMXVI16GER2SPP "xvi16ger2spp")
+(UNSPEC_MMA_PMXVF16GER2PP  "xvf16ger2pp")
+(UNSPEC_MMA_PMXVF16GER2PN  "xvf

[gcc(refs/users/meissner/heads/work182-libs)] Merge commit 'refs/users/meissner/heads/work182-libs' of git+ssh://gcc.gnu.org/git/gcc into me/work1

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:1c84b8027e7ef00d0e26ba0656ad6239b419c1d9

commit 1c84b8027e7ef00d0e26ba0656ad6239b419c1d9
Merge: 49a3ad314ac9 1fa6af582a1a
Author: Michael Meissner 
Date:   Wed Nov 6 16:05:58 2024 -0500

Merge commit 'refs/users/meissner/heads/work182-libs' of 
git+ssh://gcc.gnu.org/git/gcc into me/work182-libs

Diff:


[gcc(refs/users/meissner/heads/work182-sha)] Initial support for adding xxeval fusion support.

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:65a8dfdbe04ed3db254f267ffccf9353e4352a0a

commit 65a8dfdbe04ed3db254f267ffccf9353e4352a0a
Author: Michael Meissner 
Date:   Tue Oct 22 19:35:27 2024 -0400

Initial support for adding xxeval fusion support.

2024-10-16  Michael Meissner  

gcc/

PR target/117251
* config/rs6000/fusion.md: Regenerate.
* config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
generate vector/vector logical fusion if XXEVAL supports the fusion.
* config/rs6000/predicates.md (vector_fusion_operand): New predicate.
* config/rs6000/rs6000.cc (rs6000_opt_vars): Add -mxxeval.
* config/rs6000/rs6000.md (isa attribute): Add xxeval.
(enabled attribute): Add support for -mxxeval.
* config/rs6000/rs6000.opt (-mxxeval): New switch.

gcc/testsuite/

PR target/117251
* gcc.target/powerpc/p10-vector-fused-1.c: New test.
* gcc.target/powerpc/p10-vector-fused-2.c: Likewise.
* gcc.target/powerpc/xxeval-1.c: Likewise.
* gcc.target/powerpc/xxeval-2.c: Likewise.
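
The fused patterns in this patch pass small immediates to xxeval (1 for
vand/vand, 2 for vandc/vand).  The C sketch below shows one way such an
immediate can be derived.  It assumes the ISA 3.1 convention that the 8-bit
immediate is a truth table indexed by 4*A + 2*B + C, with index 0 stored in
the most significant bit, and that the operands appear as A=%x2, B=%x1, C=%x0
as in the patterns.  The helper names are hypothetical and this is not the
logic used by genfusion.pl.

```c
#include <stdbool.h>
#include <stdint.h>

/* A three-input boolean function applied to each bit position.  */
typedef bool (*ternary_fn) (bool a, bool b, bool c);

/* Hypothetical helper: compute the xxeval immediate for F, assuming the
   truth-table entry for inputs (a, b, c) lives at index 4*a + 2*b + c and
   that index 0 is the most significant bit of the immediate.  */
static uint8_t
xxeval_imm (ternary_fn f)
{
  uint8_t imm = 0;

  for (int idx = 0; idx < 8; idx++)
    {
      bool a = (idx >> 2) & 1;
      bool b = (idx >> 1) & 1;
      bool c = idx & 1;

      if (f (a, b, c))
        imm |= (uint8_t) (1u << (7 - idx));
    }

  return imm;
}

/* (op0 & op1) & op2, with A = op2, B = op1, C = op0.  */
static bool
and_and (bool a, bool b, bool c)
{
  return a && b && c;
}

/* (~op0 & op1) & op2, with A = op2, B = op1, C = op0.  */
static bool
andc_and (bool a, bool b, bool c)
{
  return a && b && !c;
}
```

Under these assumptions xxeval_imm (and_and) is 1 and xxeval_imm (andc_and)
is 2, which agrees with the immediates in the first two fused patterns.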

Diff:
---
 gcc/config/rs6000/fusion.md| 660 +--
 gcc/config/rs6000/genfusion.pl | 102 ++-
 gcc/config/rs6000/predicates.md|  14 +-
 gcc/config/rs6000/rs6000.cc|   3 +
 gcc/config/rs6000/rs6000.md|   7 +-
 gcc/config/rs6000/rs6000.opt   |   4 +
 .../gcc.target/powerpc/p10-vector-fused-1.c| 409 +
 .../gcc.target/powerpc/p10-vector-fused-2.c| 936 +
 8 files changed, 1865 insertions(+), 270 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index 4ed9ae1d69f4..215a3aae074f 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -1871,146 +1871,170 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vand -> vand
 (define_insn "*fuse_vand_vand"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-(and:VM (and:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")
-  (match_operand:VM 1 "altivec_register_operand" "%v,v,v,v"))
- (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+(and:VM (and:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v")
+  (match_operand:VM 1 "vector_fusion_operand" "%v,v,v,wa,v"))
+ (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v")))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
vand %3,%1,%0\;vand %3,%3,%2
vand %3,%1,%0\;vand %3,%3,%2
vand %3,%1,%0\;vand %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,1
vand %4,%1,%0\;vand %3,%4,%2"
   [(set_attr "type" "fused_vector")
(set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])
 
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vandc -> vand
 (define_insn "*fuse_vandc_vand"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-(and:VM (and:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v"))
-  (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))
- (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+(and:VM (and:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v"))
+  (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v"))
+ (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v")))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
vandc %3,%1,%0\;vand %3,%3,%2
vandc %3,%1,%0\;vand %3,%3,%2
vandc %3,%1,%0\;vand %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,2
vandc %4,%1,%0\;vand %3,%4,%2"
   [(set_attr "type" "fused_vector")
(set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])
 
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector veqv -> vand
 (define_insn "*fuse_veqv_vand"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-(and:VM (not:VM (xor:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")
-  (match_operand:VM 1 "altivec_register_operand" "v,v,v,v")))
- (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
-   (clobber (match_scratch:VM 4 "

[gcc(refs/users/meissner/heads/work182-sha)] Update ChangeLog.*

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:95e4e093c98f18dd4cdbb26a0a6d2f4929e5b2a9

commit 95e4e093c98f18dd4cdbb26a0a6d2f4929e5b2a9
Author: Michael Meissner 
Date:   Tue Oct 22 19:37:19 2024 -0400

Update ChangeLog.*

Diff:
---
 gcc/ChangeLog.sha | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/gcc/ChangeLog.sha b/gcc/ChangeLog.sha
index d33f88b871de..402ab7534d33 100644
--- a/gcc/ChangeLog.sha
+++ b/gcc/ChangeLog.sha
@@ -1,5 +1,40 @@
+ Branch work182-sha, patch #400 
+
+Initial support for adding xxeval fusion support.
+
+2024-10-16  Michael Meissner  
+
+gcc/
+
+   PR target/117251
+   * config/rs6000/fusion.md: Regenerate.
+   * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
+   generate vector/vector logical fusion if XXEVAL supports the fusion.
+   * config/rs6000/predicates.md (vector_fusion_operand): New predicate.
+   * config/rs6000/rs6000.cc (rs6000_opt_vars): Add -mxxeval.
+   * config/rs6000/rs6000.md (isa attribute): Add xxeval.
+   (enabled attribute): Add support for -mxxeval.
+   * config/rs6000/rs6000.opt (-mxxeval): New switch.
+
+gcc/testsuite/
+
+   PR target/117251
+   * gcc.target/powerpc/p10-vector-fused-1.c: New test.
+   * gcc.target/powerpc/p10-vector-fused-2.c: Likewise.
+   * gcc.target/powerpc/xxeval-1.c: Likewise.
+   * gcc.target/powerpc/xxeval-2.c: Likewise.
+
  Branch work182-sha, baseline 
 
+Add ChangeLog.sha and update REVISION.
+
+2024-10-22  Michael Meissner  
+
+gcc/
+
+   * ChangeLog.sha: New file for branch.
+   * REVISION: Update.
+
 2024-10-22   Michael Meissner  
 
Clone branch


[gcc(refs/users/meissner/heads/work182-sha)] Revert changes

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:89f86a5ddfa57a721f7ca5776e554b91332b5a0b

commit 89f86a5ddfa57a721f7ca5776e554b91332b5a0b
Author: Michael Meissner 
Date:   Thu Oct 24 12:11:15 2024 -0400

Revert changes

Diff:
---
 gcc/config/rs6000/altivec.md   |  35 +-
 gcc/config/rs6000/predicates.md|  26 -
 gcc/config/rs6000/rs6000.h |   3 -
 gcc/config/rs6000/rs6000.md|   6 +-
 .../gcc.target/powerpc/p10-vector-fused-1.c| 409 -
 .../gcc.target/powerpc/p10-vector-fused-2.c| 936 -
 .../gcc.target/powerpc/vector-rotate-left.c|  34 -
 7 files changed, 5 insertions(+), 1444 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index d4ee50322ca1..00dad4b91f1c 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1983,39 +1983,12 @@
 }
   [(set_attr "type" "vecperm")])
 
-;; -mcpu=future adds a vector rotate left word variant.  There is no vector
-;; byte/half-word/double-word/quad-word rotate left.  This insn occurs before
-;; altivec_vrl and will match for -mcpu=future, while other cpus will
-;; match the generic insn.
-;; However for testing, allow other xvrl variants.  In particular, XVRLD for
-;; the sha3 tests for multibuf/singlebuf.
 (define_insn "altivec_vrl"
-  [(set (match_operand:VI2 0 "register_operand" "=v,wa")
-(rotate:VI2 (match_operand:VI2 1 "register_operand" "v,wa")
-   (match_operand:VI2 2 "register_operand" "v,wa")))]
+  [(set (match_operand:VI2 0 "register_operand" "=v")
+(rotate:VI2 (match_operand:VI2 1 "register_operand" "v")
+   (match_operand:VI2 2 "register_operand" "v")))]
   ""
-  "@
-   vrl %0,%1,%2
-   xvrl %x0,%x1,%x2"
-  [(set_attr "type" "vecsimple")
-   (set_attr "isa" "*,xvrlw")])
-
-(define_insn "*altivec_vrl_immediate"
-  [(set (match_operand:VI2 0 "register_operand" "=wa,wa,wa,wa")
-   (rotate:VI2 (match_operand:VI2 1 "register_operand" "wa,wa,wa,wa")
-   (match_operand:VI2 2 "vector_shift_immediate" "j,wM,wE,wS")))]
-  "TARGET_XVRLW && "
-{
-  rtx op2 = operands[2];
-  int value = 256;
-  int num_insns = -1;
-
-  if (!xxspltib_constant_p (op2, mode, &num_insns, &value))
-gcc_unreachable ();
-
-  operands[3] = GEN_INT (value & 0xff);
-  return "xvrli %x0,%x1,%3";
-}
+  "vrl %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
 (define_insn "altivec_vrlq"
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index fccfbd7e4904..1d95e34557e5 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -728,32 +728,6 @@
   return num_insns == 1;
 })
 
-;; Return 1 if the operand is a CONST_VECTOR whose elements are all the
-;; same and the elements can be an immediate shift or rotate factor
-(define_predicate "vector_shift_immediate"
-  (match_code "const_vector,vec_duplicate,const_int")
-{
-  int value = 256;
-  int num_insns = -1;
-
-  if (zero_constant (op, mode) || all_ones_constant (op, mode))
-return true;
-
-  if (!xxspltib_constant_p (op, mode, &num_insns, &value))
-return false;
-
-  switch (mode)
-{
-case V16QImode: return IN_RANGE (value, 0, 7);
-case V8HImode:  return IN_RANGE (value, 0, 15);
-case V4SImode:  return IN_RANGE (value, 0, 31);
-case V2DImode:  return IN_RANGE (value, 0, 63);
-default:break;
-}
-
-  return false;
-})
-  
 ;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a
 ;; vector register without using memory.
 (define_predicate "easy_vector_constant"
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 1a168c2c9596..8cfd9faf77dc 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -581,9 +581,6 @@ extern int rs6000_vector_align[];
below.  */
 #define RS6000_FN_TARGET_INFO_HTM 1
 
-/* Whether we have XVRLW support.  */
-#define TARGET_XVRLW   TARGET_FUTURE
-
 /* Whether the various reciprocal divide/square root estimate instructions
exist, and whether we should automatically generate code for the instruction
by default.  */
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 420f20d4524b..68fbfec95546 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -369,7 +369,7 @@
   (const (symbol_ref "(enum attr_cpu) rs6000_tune")))
 
 ;; The ISA we implement.
-(define_attr "isa" "any,p5,p6,p7,p7v,p8,p8v,p9,p9v,p9kf,p9tf,p10,xxeval,xvrlw"
+(define_attr "isa" "any,p5,p6,p7,p7v,p8,p8v,p9,p9v,p9kf,p9tf,p10,xxeval"
   (const_string "any"))
 
 ;; Is this alternative enabled for the current CPU/ISA/etc.?
@@ -426,10 +426,6 @@
  (match_test "TARGET_PREFIXED && TARGET_XXEVAL"))
  (const_int 1)
 
- (and (eq_attr "isa" "xvrlw")
- (match_test "TARGET_XVRLW"))
- (const_int 1)
-
 ] (const_int 0)))
 
 ;; If this instruction is microcoded on the CELL processor
diff -

[gcc/meissner/heads/work182-sha] (34 commits) Merge commit 'refs/users/meissner/heads/work182-sha' of git

2024-11-06 Thread Michael Meissner via Gcc-cvs
The branch 'meissner/heads/work182-sha' was updated to point to:

 cc318ac99f83... Merge commit 'refs/users/meissner/heads/work182-sha' of git

It previously pointed to:

 c5a9703abe8d... Update ChangeLog.*

Diff:

Summary of changes (added commits):
---

  cc318ac... Merge commit 'refs/users/meissner/heads/work182-sha' of git
  8800ae7... Update ChangeLog.*
  168741b... Add p-future target-supports.exp
  3390127... Update ChangeLog.*
  344b41a... Update ChangeLog.*
  5572897... Update ChangeLog.*
  b1cf258... Update ChangeLog.*
  442f717... Add potential p-future XVRLD and XVRLDI instructions.
  b9352ef... PR target/117251: Add PowerPC XXEVAL support to speed up SH
  be9aeed... Revert changes
  5c03d03... Revert changes
  89f86a5... Revert changes
  f3cae27... Update ChangeLog.*
  ad14425... Add missing test.
  de89bc1... Add potential p-future XVRLD and XVRLDI instructions.
  95e4e09... Update ChangeLog.*
  65a8dfd... Initial support for adding xxeval fusion support.
  72ecf7c... Add ChangeLog.sha and update REVISION.
  892e05e... Update ChangeLog.* (*)
  ab781c6... Add -mcpu=future tuning support. (*)
  11869df... Add support for -mcpu=future (*)
  3457a6e... Update tests to work with architecture flags changes. (*)
  c49f8b7... Change TARGET_MODULO to TARGET_POWER9 (*)
  ef7b8de... Change TARGET_POPCNTD to TARGET_POWER7 (*)
  b024a7e... Change TARGET_CMPB to TARGET_POWER6 (*)
  bc1106e... Change TARGET_FPRND to TARGET_POWER5X (*)
  c875852... Change TARGET_POPCNTB to TARGET_POWER5 (*)
  1b99261... Do not allow -mvsx to boost processor to power7. (*)
  2c3a02c... Use architecture flags for defining _ARCH_PWR macros. (*)
  e2d1785... Add rs6000 architecture masks. (*)
  27f73de... Revert changes (*)
  b5d3ebf... Add rs6000 architecture masks. (*)
  132c8a7... Add rs6000 architecture masks. (*)
  848be25... Revert changes (*)

(*) This commit already exists in another branch.
Because the reference `refs/users/meissner/heads/work182-sha' matches
your hooks.email-new-commits-only configuration,
no separate email is sent for this commit.


[gcc(refs/users/meissner/heads/work182-sha)] Add potential p-future XVRLD and XVRLDI instructions.

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:de89bc11125fc705cda16d419cea11d45c1fb3ec

commit de89bc11125fc705cda16d419cea11d45c1fb3ec
Author: Michael Meissner 
Date:   Wed Oct 23 13:26:49 2024 -0400

Add potential p-future XVRLD and XVRLDI instructions.

2024-10-16  Michael Meissner  

gcc/

* config/rs6000/altivec.md (altivec_vrl): Add support for a
possible XVRLD instruction in the future.
(altivec_vrl_immediate): New insns.
* config/rs6000/predicates.md (vector_shift_immediate): New predicate.
* config/rs6000/rs6000.h (TARGET_XVRLW): New macro.
* config/rs6000/rs6000.md (isa attribute): Add xvrlw.
(enabled attribute): Add support for xvrlw.
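
The range check applied by the new vector_shift_immediate predicate can be sketched in plain C. This is an illustration only, not code from the patch: the name rotate_count_fits and its element-width parameter are hypothetical, and the splat check that the real predicate performs first through xxspltib_constant_p is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of GCC: return true if VALUE is usable as
   an immediate rotate count for elements that are ELEM_BITS wide.  This
   mirrors the per-mode IN_RANGE checks in the predicate, where the count
   must lie in 0 .. ELEM_BITS - 1.  */
static bool
rotate_count_fits (int elem_bits, int value)
{
  /* Only the four element widths handled by the predicate are modeled.  */
  assert (elem_bits == 8 || elem_bits == 16
          || elem_bits == 32 || elem_bits == 64);
  return value >= 0 && value <= elem_bits - 1;
}
```

For example, a count of 7 is accepted for byte elements while 8 is rejected, which corresponds to the V16QImode case in the predicate.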

Diff:
---
 gcc/config/rs6000/altivec.md| 35 +++
 gcc/config/rs6000/predicates.md | 26 ++
 gcc/config/rs6000/rs6000.h  |  3 +++
 gcc/config/rs6000/rs6000.md |  6 +-
 4 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 00dad4b91f1c..d4ee50322ca1 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1983,12 +1983,39 @@
 }
   [(set_attr "type" "vecperm")])
 
+;; -mcpu=future adds a vector rotate left word variant.  There is no vector
+;; byte/half-word/double-word/quad-word rotate left.  This insn occurs before
+;; altivec_vrl and will match for -mcpu=future, while other cpus will
+;; match the generic insn.
+;; However for testing, allow other xvrl variants.  In particular, XVRLD for
+;; the sha3 tests for multibuf/singlebuf.
 (define_insn "altivec_vrl"
-  [(set (match_operand:VI2 0 "register_operand" "=v")
-(rotate:VI2 (match_operand:VI2 1 "register_operand" "v")
-   (match_operand:VI2 2 "register_operand" "v")))]
+  [(set (match_operand:VI2 0 "register_operand" "=v,wa")
+(rotate:VI2 (match_operand:VI2 1 "register_operand" "v,wa")
+   (match_operand:VI2 2 "register_operand" "v,wa")))]
   ""
-  "vrl %0,%1,%2"
+  "@
+   vrl %0,%1,%2
+   xvrl %x0,%x1,%x2"
+  [(set_attr "type" "vecsimple")
+   (set_attr "isa" "*,xvrlw")])
+
+(define_insn "*altivec_vrl_immediate"
+  [(set (match_operand:VI2 0 "register_operand" "=wa,wa,wa,wa")
+   (rotate:VI2 (match_operand:VI2 1 "register_operand" "wa,wa,wa,wa")
+   (match_operand:VI2 2 "vector_shift_immediate" "j,wM,wE,wS")))]
+  "TARGET_XVRLW && "
+{
+  rtx op2 = operands[2];
+  int value = 256;
+  int num_insns = -1;
+
+  if (!xxspltib_constant_p (op2, mode, &num_insns, &value))
+gcc_unreachable ();
+
+  operands[3] = GEN_INT (value & 0xff);
+  return "xvrli %x0,%x1,%3";
+}
   [(set_attr "type" "vecsimple")])
 
 (define_insn "altivec_vrlq"
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 1d95e34557e5..fccfbd7e4904 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -728,6 +728,32 @@
   return num_insns == 1;
 })
 
+;; Return 1 if the operand is a CONST_VECTOR whose elements are all the
+;; same and the elements can be an immediate shift or rotate factor
+(define_predicate "vector_shift_immediate"
+  (match_code "const_vector,vec_duplicate,const_int")
+{
+  int value = 256;
+  int num_insns = -1;
+
+  if (zero_constant (op, mode) || all_ones_constant (op, mode))
+return true;
+
+  if (!xxspltib_constant_p (op, mode, &num_insns, &value))
+return false;
+
+  switch (mode)
+{
+case V16QImode: return IN_RANGE (value, 0, 7);
+case V8HImode:  return IN_RANGE (value, 0, 15);
+case V4SImode:  return IN_RANGE (value, 0, 31);
+case V2DImode:  return IN_RANGE (value, 0, 63);
+default:break;
+}
+
+  return false;
+})
+  
 ;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a
 ;; vector register without using memory.
 (define_predicate "easy_vector_constant"
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 8cfd9faf77dc..1a168c2c9596 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -581,6 +581,9 @@ extern int rs6000_vector_align[];
below.  */
 #define RS6000_FN_TARGET_INFO_HTM 1
 
+/* Whether we have XVRLW support.  */
+#define TARGET_XVRLW   TARGET_FUTURE
+
 /* Whether the various reciprocal divide/square root estimate instructions
exist, and whether we should automatically generate code for the instruction
by default.  */
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 68fbfec95546..420f20d4524b 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -369,7 +369,7 @@
   (const (symbol_ref "(enum attr_cpu) rs6000_tune")))
 
 ;; The ISA we implement.
-(define_attr "isa" "any,p5,p6,p7,p7v,p8,p8v,p9,p9v,p9kf,p9tf,p10,xxeval"
+(define_attr "isa" "any,p5,p6,p7,p7v,p8,p8v,p9,p9v,p9kf,p9tf,p10,xxeval,xvrlw"
   (const_string "any"))
 
 ;; Is this alternative enabled for

[gcc(refs/users/meissner/heads/work182-sha)] Add missing test.

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:ad14425132290755acf69e2f7673abb0518bb764

commit ad14425132290755acf69e2f7673abb0518bb764
Author: Michael Meissner 
Date:   Wed Oct 23 13:30:07 2024 -0400

Add missing test.

2024-10-16  Michael Meissner  

gcc/testsuite/

* gcc.target/powerpc/vector-rotate-left.c: New test.
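
The shift-and-or rotate idiom that the new test exercises on vectors can be shown on scalars. The sketch below is an assumption-labeled illustration rather than part of the test: rotl32, rotr32 and check_round_trip are hypothetical names, and the count masking is added here because a scalar shift by the full width is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by N bits.  Masking keeps both shift counts
   below 32, so N == 0 never shifts by the full width.  */
static uint32_t
rotl32 (uint32_t x, unsigned int n)
{
  n &= 31;
  return (x << n) | (x >> ((32 - n) & 31));
}

/* Rotate a 32-bit value right by N bits, using the same masking.  */
static uint32_t
rotr32 (uint32_t x, unsigned int n)
{
  n &= 31;
  return (x >> n) | (x << ((32 - n) & 31));
}

/* Rotating left and then right by the same count returns the input.  */
static void
check_round_trip (uint32_t x, unsigned int n)
{
  assert (rotr32 (rotl32 (x, n), n) == x);
}
```

A compiler that recognizes this idiom can emit a single rotate instruction per function, which is what the scan-assembler directive in the test counts.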

Diff:
---
 .../gcc.target/powerpc/vector-rotate-left.c| 34 ++
 1 file changed, 34 insertions(+)

diff --git a/gcc/testsuite/gcc.target/powerpc/vector-rotate-left.c b/gcc/testsuite/gcc.target/powerpc/vector-rotate-left.c
new file mode 100644
index ..5a5f37755077
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-rotate-left.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-mdejagnu-cpu=future -O2" } */
+
+/* Test whether the xvrl (vector word rotate left using VSX registers instead
+   of Altivec registers) instruction is generated.  */
+
+#include 
+
+typedef vector unsigned int  v4si_t;
+
+v4si_t
+rotl_v4si_scalar (v4si_t x, unsigned long n)
+{
+  __asm__ (" # %x0" : "+f" (x));
+  return (x << n) | (x >> (32 - n));   /* xvrlw.  */
+}
+
+v4si_t
+rotr_v4si_scalar (v4si_t x, unsigned long n)
+{
+  __asm__ (" # %x0" : "+f" (x));
+  return (x >> n) | (x << (32 - n));   /* xvrlw.  */
+}
+
+v4si_t
+rotl_v4si_vector (v4si_t x, v4si_t y)
+{
+  __asm__ (" # %x0" : "+f" (x));   /* xvrlw.  */
+  return vec_rl (x, y);
+}
+
+/* { dg-final { scan-assembler-times {\mxvrlw\M} 3  } } */


[gcc(refs/users/meissner/heads/work182-sha)] Update ChangeLog.*

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:f3cae2735d9e6e1a5470b93ad2e0d3e7

commit f3cae2735d9e6e1a5470b93ad2e0d3e7
Author: Michael Meissner 
Date:   Wed Oct 23 13:31:23 2024 -0400

Update ChangeLog.*

Diff:
---
 gcc/ChangeLog.sha | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/gcc/ChangeLog.sha b/gcc/ChangeLog.sha
index 402ab7534d33..fe43d0cb19a8 100644
--- a/gcc/ChangeLog.sha
+++ b/gcc/ChangeLog.sha
@@ -1,3 +1,29 @@
+ Branch work182-sha, patch #402 
+
+Add missing test.
+
+2024-10-16  Michael Meissner  
+
+gcc/testsuite/
+
+   * gcc.target/powerpc/vector-rotate-left.c: New test.
+
+ Branch work182-sha, patch #401 
+
+Add potential p-future XVRLD and XVRLDI instructions.
+
+2024-10-16  Michael Meissner  
+
+gcc/
+
+   * config/rs6000/altivec.md (altivec_vrl): Add support for a
+   possible XVRLD instruction in the future.
+   (altivec_vrl_immediate): New insns.
+   * config/rs6000/predicates.md (vector_shift_immediate): New predicate.
+   * config/rs6000/rs6000.h (TARGET_XVRLW): New macro.
+   * config/rs6000/rs6000.md (isa attribute): Add xvrlw.
+   (enabled attribute): Add support for xvrlw.
+
  Branch work182-sha, patch #400 
 
 Initial support for adding xxeval fusion support.


[gcc(refs/users/meissner/heads/work182-sha)] Add ChangeLog.sha and update REVISION.

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:72ecf7cbf38ecba7fb65ab4a51ebaabaaadf197b

commit 72ecf7cbf38ecba7fb65ab4a51ebaabaaadf197b
Author: Michael Meissner 
Date:   Tue Oct 22 15:36:24 2024 -0400

Add ChangeLog.sha and update REVISION.

2024-10-22  Michael Meissner  

gcc/

* ChangeLog.sha: New file for branch.
* REVISION: Update.

Diff:
---
 gcc/ChangeLog.sha | 5 +
 gcc/REVISION  | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.sha b/gcc/ChangeLog.sha
new file mode 100644
index ..d33f88b871de
--- /dev/null
+++ b/gcc/ChangeLog.sha
@@ -0,0 +1,5 @@
+ Branch work182-sha, baseline 
+
+2024-10-22   Michael Meissner  
+
+   Clone branch
diff --git a/gcc/REVISION b/gcc/REVISION
index 5aaca2bd398a..bfd9a1d35726 100644
--- a/gcc/REVISION
+++ b/gcc/REVISION
@@ -1 +1 @@
-work182 branch
+work182-sha branch


[gcc(refs/users/meissner/heads/work182-sha)] Revert changes

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:be9aeedf712fb457ca2603622a6e2ea4c5b6fbb2

commit be9aeedf712fb457ca2603622a6e2ea4c5b6fbb2
Author: Michael Meissner 
Date:   Thu Oct 24 12:15:54 2024 -0400

Revert changes

Diff:
---
 gcc/testsuite/gcc.target/powerpc/p10-vector-fused-1.c | 0
 gcc/testsuite/gcc.target/powerpc/p10-vector-fused-2.c | 0
 2 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/p10-vector-fused-1.c b/gcc/testsuite/gcc.target/powerpc/p10-vector-fused-1.c
deleted file mode 100644
index e69de29bb2d1..
diff --git a/gcc/testsuite/gcc.target/powerpc/p10-vector-fused-2.c b/gcc/testsuite/gcc.target/powerpc/p10-vector-fused-2.c
deleted file mode 100644
index e69de29bb2d1..


[gcc(refs/users/meissner/heads/work182-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:b9352efddd99af7512450ea57cbf3581cdabeb4a

commit b9352efddd99af7512450ea57cbf3581cdabeb4a
Author: Michael Meissner 
Date:   Thu Oct 24 12:21:09 2024 -0400

PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

The multibuff.c benchmark attached to PR target/117251, which implements SHA3,
has a slowdown when compiled for Power10 with the current trunk and GCC 14
compared to GCC 11 - GCC 13, due to excessive amounts of spilling.

The main function for the multibuf.c file has 3,747 lines, all of which are
using vector unsigned long long.  There are 696 vector rotates (all rotates are
constant), 1,824 vector xor's and 600 vector andc's.

Looking at it, the main thing that stands out is that the reason for either
spilling or moving variables is the support in fusion.md (generated by
genfusion.pl) that tries to fuse a vec_andc feeding into a vec_xor, and other
vec_xor's feeding into a vec_xor.

On Power10, there is a special fusion mode that happens if a VANDC or VXOR
instruction is adjacent to a VXOR instruction and the VANDC/VXOR feeds into
the second VXOR instruction.
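
The two dependency shapes that this fusion targets can be shown on scalars. The sketch below uses 64-bit integers in place of vector unsigned long long; the function names are made up for illustration, and nothing here is taken from the benchmark source, which is not reproduced in this message.

```c
#include <assert.h>
#include <stdint.h>

/* ANDC feeding XOR: the first operation computes b & ~c, and its result is
   consumed immediately by the xor.  */
static uint64_t
andc_then_xor (uint64_t a, uint64_t b, uint64_t c)
{
  uint64_t t = b & ~c;
  return a ^ t;
}

/* XOR feeding XOR: the first xor produces a temporary that is consumed
   immediately by the second xor.  */
static uint64_t
xor_then_xor (uint64_t a, uint64_t b, uint64_t c)
{
  uint64_t t = b ^ c;
  return a ^ t;
}

/* With c == 0 both shapes reduce to a ^ b, which gives a quick sanity
   check of the two helpers.  */
static void
check_shapes (uint64_t a, uint64_t b)
{
  assert (andc_then_xor (a, b, 0) == (a ^ b));
  assert (xor_then_xor (a, b, 0) == (a ^ b));
}
```

In both functions the temporary t has a single use in the very next operation, which is the producer/consumer adjacency that the fused pair relies on.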

While the Power10 has 64 vector registers (which use the XXL prefix for
logical operations), the fusion only works with the older Altivec instruction
set (which uses the V prefix).  The Altivec instruction set only has 32 vector
registers (which are overlaid over the VSX vector registers 32-63).

Because the combiner patterns fuse_vandc_vxor and fuse_vxor_vxor do this
fusion, the register allocator sees more register pressure on the traditional
Altivec registers instead of the VSX registers.

In addition, since there are vector rotates, these rotates only work on the
traditional Altivec registers, which adds to the Altivec register pressure.

Finally, in addition to doing the explicit xor, andc, and rotates using the
Altivec registers, we also have to load vector constants for the rotate amounts,
and these registers are also allocated as Altivec registers.

Current trunk and GCC 12-14 have more vector spills than GCC 11, but GCC 11 has
many more vector moves than the later compilers.  Thus even though it has far
fewer spills, the vector moves are why GCC 11 has the slowest results.

There is an instruction that was added in power10 (XXEVAL) that does provide
fusion between VSX vectors that includes ANDC->XOR and XOR->XOR fusion.
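
One way to picture how a single three-input instruction can cover these combinations is to treat its immediate as a truth table. The sketch below is an assumption-labeled illustration, not code from the patch: it assumes the immediate is an 8-entry table indexed by the three input bits with entry 0 in the most significant bit, and truth_table_imm and the helper names are hypothetical. Under that assumption it reproduces the immediates 1, 2 and 9 that the fuse_vand_vand, fuse_vandc_vand and fuse_veqv_vand patterns in this series pass to xxeval.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A three-input boolean function evaluated on single bits.  */
typedef bool (*bool3_fn) (bool a, bool b, bool c);

/* Build an 8-bit truth-table immediate for F.  Assumption: the entry for
   inputs (a, b, c) is selected by the index 4*a + 2*b + c, and index 0 is
   stored in the most significant bit of the immediate.  */
static uint8_t
truth_table_imm (bool3_fn f)
{
  uint8_t imm = 0;
  for (int idx = 0; idx < 8; idx++)
    {
      bool a = (idx >> 2) & 1;
      bool b = (idx >> 1) & 1;
      bool c = idx & 1;
      if (f (a, b, c))
        imm |= (uint8_t) (1u << (7 - idx));
    }
  return imm;
}

/* The three combinations, written with c as the innermost operand.  */
static bool and_and (bool a, bool b, bool c) { return (c & b) & a; }
static bool andc_and (bool a, bool b, bool c) { return (!c & b) & a; }
static bool eqv_and (bool a, bool b, bool c) { return !(c ^ b) & a; }

/* Check the table builder against the immediates used in fusion.md.  */
static void
check_immediates (void)
{
  assert (truth_table_imm (and_and) == 1);
  assert (truth_table_imm (andc_and) == 2);
  assert (truth_table_imm (eqv_and) == 9);
}
```

The mapping of a, b and c to instruction operands is also an assumption here; it was chosen so that the sketch lines up with the operand order printed by those patterns.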

The latency of XXEVAL is slightly more than the fused VANDC/VXOR or VXOR/VXOR,
so I have written the patch to prefer doing the Altivec instructions if they
don't need a temporary register.

Here are the results for adding support for XXEVAL for the multibuff.c
benchmark attached to the PR.  Note that this patch essentially recovers the
speed that was lost with GCC 14 and the current trunk:

                                XXEVAL   Trunk   GCC14   GCC13   GCC12   GCC11
                                ------   -----   -----   -----   -----   -----
Benchmark time in seconds         5.53    6.15    6.26    5.57    5.61    9.56

Fuse VANDC -> VXOR                 209     600     600     600     600     600
Fuse VXOR -> VXOR                    0     240     240     120     120     120
XXEVAL to fuse ANDC -> XOR         391       0       0       0       0       0
XXEVAL to fuse XOR -> XOR          240       0       0       0       0       0

Spill vector to stack               78     364     364     172     184     110
Load spilled vector from stack     431     962     962     713     723     166
Vector moves                        10     100     100      70      72   3,055

Vector rotate right                696     696     696     696     696     696
XXLANDC or VANDC                   209     600     600     600     600     600
XXLXOR or VXOR                     953   1,824   1,824   1,824   1,824   1,825
XXEVAL                             631       0       0       0       0       0

Load vector rotate constants        24      24      24      24      24      24

Here are the results for adding support for XXEVAL for the singlebuff.c
benchmark attached to the PR.  Note that adding XXEVAL greatly speeds up this
particular benchmark:

                                XXEVAL   Trunk   GCC14   GCC13   GCC12   GCC11
                                ------   -----   -----   -----   -----   -----
Benchmark time in seconds         4.46    5.40    5.40    5.35    5.36    7.54

Fuse VANDC -> VXOR                 210     600     600     600     600     600
Fuse VXOR -> VXOR                    0     240     240     120     120     120
XXEVAL to fuse ANDC -> XOR   3900 

[gcc(refs/users/meissner/heads/work182-sha)] Revert changes

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:5c03d03351a2026159c81e29f350a0f77567bd2f

commit 5c03d03351a2026159c81e29f350a0f77567bd2f
Author: Michael Meissner 
Date:   Thu Oct 24 12:15:20 2024 -0400

Revert changes

Diff:
---
 gcc/config/rs6000/fusion.md| 660 +
 gcc/config/rs6000/genfusion.pl | 102 +---
 gcc/config/rs6000/predicates.md|  14 +-
 gcc/config/rs6000/rs6000.cc|   3 -
 gcc/config/rs6000/rs6000.md|   7 +-
 gcc/config/rs6000/rs6000.opt   |   4 -
 .../gcc.target/powerpc/p10-vector-fused-1.c|   0
 .../gcc.target/powerpc/p10-vector-fused-2.c|   0
 8 files changed, 270 insertions(+), 520 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index 215a3aae074f..4ed9ae1d69f4 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -1871,170 +1871,146 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vand -> vand
 (define_insn "*fuse_vand_vand"
-  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
-(and:VM (and:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v")
-  (match_operand:VM 1 "vector_fusion_operand" "%v,v,v,wa,v"))
- (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v")))
-   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
+  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
+(and:VM (and:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")
+  (match_operand:VM 1 "altivec_register_operand" "%v,v,v,v"))
+ (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
+   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
vand %3,%1,%0\;vand %3,%3,%2
vand %3,%1,%0\;vand %3,%3,%2
vand %3,%1,%0\;vand %3,%3,%2
-   xxeval %x3,%x2,%x1,%x0,1
vand %4,%1,%0\;vand %3,%4,%2"
   [(set_attr "type" "fused_vector")
(set_attr "cost" "6")
-   (set_attr "length" "8")
-   (set_attr "prefixed" "*,*,*,yes,*")
-   (set_attr "isa" "*,*,*,xxeval,*")])
+   (set_attr "length" "8")])
 
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vandc -> vand
 (define_insn "*fuse_vandc_vand"
-  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
-(and:VM (and:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v"))
-  (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v"))
- (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v")))
-   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
+  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
+(and:VM (and:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v"))
+  (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))
+ (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
+   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
vandc %3,%1,%0\;vand %3,%3,%2
vandc %3,%1,%0\;vand %3,%3,%2
vandc %3,%1,%0\;vand %3,%3,%2
-   xxeval %x3,%x2,%x1,%x0,2
vandc %4,%1,%0\;vand %3,%4,%2"
   [(set_attr "type" "fused_vector")
(set_attr "cost" "6")
-   (set_attr "length" "8")
-   (set_attr "prefixed" "*,*,*,yes,*")
-   (set_attr "isa" "*,*,*,xxeval,*")])
+   (set_attr "length" "8")])
 
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector veqv -> vand
 (define_insn "*fuse_veqv_vand"
-  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
-(and:VM (not:VM (xor:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v")
-  (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v")))
- (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v")))
-   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
+  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
+(and:VM (not:VM (xor:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")
+  (match_operand:VM 1 "altivec_register_operand" "v,v,v,v")))
+ (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
+   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
veqv %3,%1,%0\;vand %3,%3,%2
veqv %3,%1,%0\;vand %3,%3,%2
veqv %3,%1,%0\;vand %3,%3,%2
-   xxeval %x3,%x2,%x1,%x0,9
veqv %4,%1,%0\;vand %3,%4,%2"
   [(set_attr "type" "fused_vector")
(set_attr "cost" "6")
-   (set_attr "length" "8")
-   (set_attr "prefixed" "*,*,*,yes,*")
-   (set_attr "isa" "*,*,*,xxeval,*")])
+   (set_attr "length" "8")])
 
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vnand -> vand
 (define_insn "*fuse_vnand_vand"
-  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
-(and:V

[gcc(refs/users/meissner/heads/work182-sha)] Update ChangeLog.*

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:33901274eea3a07aeac48ec0bc4e87f66f19d50e

commit 33901274eea3a07aeac48ec0bc4e87f66f19d50e
Author: Michael Meissner 
Date:   Thu Oct 24 12:30:57 2024 -0400

Update ChangeLog.*

Diff:
---
 gcc/ChangeLog.sha | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.sha b/gcc/ChangeLog.sha
index c6151fce09a4..80f083698d2a 100644
--- a/gcc/ChangeLog.sha
+++ b/gcc/ChangeLog.sha
@@ -116,7 +116,7 @@ XXEVAL   630   0   0   0   0   0
 Load vector rotate constants  96   96   96   96   96   96
 
 
-These patches to add XXEVAL support add the following fusion patterns:
+These add the following fusion patterns:
 
xxland  => xxland   xxlandc => xxland   xxlxor  => xxland
xxlor   => xxland   xxlnor  => xxland   xxleqv  => xxland


[gcc(refs/users/meissner/heads/work182-sha)] Update ChangeLog.*

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:b1cf258d66b448da4378616ecf57df83c948bbb7

commit b1cf258d66b448da4378616ecf57df83c948bbb7
Author: Michael Meissner 
Date:   Thu Oct 24 12:26:28 2024 -0400

Update ChangeLog.*

Diff:
---
 gcc/ChangeLog.sha | 143 +++---
 1 file changed, 126 insertions(+), 17 deletions(-)

diff --git a/gcc/ChangeLog.sha b/gcc/ChangeLog.sha
index fe43d0cb19a8..de75ac6f0e81 100644
--- a/gcc/ChangeLog.sha
+++ b/gcc/ChangeLog.sha
@@ -1,18 +1,8 @@
- Branch work182-sha, patch #402 
-
-Add missing test.
-
-2024-10-16  Michael Meissner  
-
-gcc/testsuite/
-
-   * gcc.target/powerpc/vector-rotate-left.c: New test.
-
- Branch work182-sha, patch #401 
+ Branch work182-sha, patch #411 was reverted 
 
 Add potential p-future XVRLD and XVRLDI instructions.
 
-2024-10-16  Michael Meissner  
+2024-10-24  Michael Meissner  
 
 gcc/
 
@@ -24,11 +14,128 @@ gcc/
* config/rs6000/rs6000.md (isa attribute): Add xvrlw.
(enabled attribute): Add support for xvrlw.
 
- Branch work182-sha, patch #400 
+gcc/testsuite/
+
+   * gcc.target/powerpc/vector-rotate-left.c: New test.
+
+ Branch work182-sha, patch #410 was reverted 
+
+PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
+
+The multibuff.c benchmark attached to PR target/117251, which implements SHA3,
+has a slowdown when compiled for Power10 with the current trunk and GCC 14
+compared to GCC 11 - GCC 13, due to excessive amounts of spilling.
+
+The main function for the multibuf.c file has 3,747 lines, all of which are
+using vector unsigned long long.  There are 696 vector rotates (all rotates are
+constant), 1,824 vector xor's and 600 vector andc's.
+
+Looking at it, the main thing that stands out is that the reason for either
+spilling or moving variables is the support in fusion.md (generated by
+genfusion.pl) that tries to fuse a vec_andc feeding into a vec_xor, and other
+vec_xor's feeding into a vec_xor.
+
+On Power10, there is a special fusion mode that happens if a VANDC or VXOR
+instruction is adjacent to a VXOR instruction and the VANDC/VXOR feeds into
+the second VXOR instruction.
+
+While the Power10 has 64 vector registers (which use the XXL prefix for
+logical operations), the fusion only works with the older Altivec instruction
+set (which uses the V prefix).  The Altivec instruction set only has 32 vector
+registers (which are overlaid over the VSX vector registers 32-63).
+
+Because the combiner patterns fuse_vandc_vxor and fuse_vxor_vxor do this
+fusion, the register allocator sees more register pressure on the traditional
+Altivec registers instead of the VSX registers.
+
+In addition, since there are vector rotates, these rotates only work on the
+traditional Altivec registers, which adds to the Altivec register pressure.
+
+Finally, in addition to doing the explicit xor, andc, and rotates using the
+Altivec registers, we also have to load vector constants for the rotate amounts,
+and these registers are also allocated as Altivec registers.
 
-Initial support for adding xxeval fusion support.
+Current trunk and GCC 12-14 have more vector spills than GCC 11, but GCC 11 has
+many more vector moves than the later compilers.  Thus even though it has far
+fewer spills, the vector moves are why GCC 11 has the slowest results.
 
-2024-10-16  Michael Meissner  
+There is an instruction that was added in power10 (XXEVAL) that does provide
+fusion between VSX vectors that includes ANDC->XOR and XOR->XOR fusion.
+
+The latency of XXEVAL is slightly more than the fused VANDC/VXOR or VXOR/VXOR,
+so I have written the patch to prefer doing the Altivec instructions if they
+don't need a temporary register.
+
+Here are the results for adding support for XXEVAL for the multibuff.c
+benchmark attached to the PR.  Note that this patch essentially recovers the
+speed that was lost with GCC 14 and the current trunk:
+
+                                XXEVAL   Trunk   GCC14   GCC13   GCC12   GCC11
+                                ------   -----   -----   -----   -----   -----
+Benchmark time in seconds         5.53    6.15    6.26    5.57    5.61    9.56
+
+Fuse VANDC -> VXOR                 209     600     600     600     600     600
+Fuse VXOR -> VXOR                    0     240     240     120     120     120
+XXEVAL to fuse ANDC -> XOR         391       0       0       0       0       0
+XXEVAL to fuse XOR -> XOR          240       0       0       0       0       0
+
+Spill vector to stack               78     364     364     172     184     110
+Load spilled vector from stack     431     962     962     713     723     166
+Vector moves                        10     100     100      70      72   3,055
+
+Vector rota

[gcc(refs/users/meissner/heads/work182-sha)] Add potential p-future XVRLD and XVRLDI instructions.

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:442f717c7cf517cbc4475ac98064c9be0fe4b222

commit 442f717c7cf517cbc4475ac98064c9be0fe4b222
Author: Michael Meissner 
Date:   Thu Oct 24 12:23:17 2024 -0400

Add potential p-future XVRLD and XVRLDI instructions.

2024-10-24  Michael Meissner  

gcc/

* config/rs6000/altivec.md (altivec_vrl): Add support for a
possible XVRLD instruction in the future.
(altivec_vrl_immediate): New insns.
* config/rs6000/predicates.md (vector_shift_immediate): New predicate.
* config/rs6000/rs6000.h (TARGET_XVRLW): New macro.
* config/rs6000/rs6000.md (isa attribute): Add xvrlw.
(enabled attribute): Add support for xvrlw.

gcc/testsuite/

* gcc.target/powerpc/vector-rotate-left.c: New test.

Diff:
---
 gcc/config/rs6000/altivec.md   | 35 +++---
 gcc/config/rs6000/predicates.md| 26 
 gcc/config/rs6000/rs6000.h |  3 ++
 gcc/config/rs6000/rs6000.md|  6 +++-
 .../gcc.target/powerpc/vector-rotate-left.c| 34 +
 5 files changed, 99 insertions(+), 5 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 00dad4b91f1c..d4ee50322ca1 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1983,12 +1983,39 @@
 }
   [(set_attr "type" "vecperm")])
 
+;; -mcpu=future adds a vector rotate left word variant.  There is no vector
+;; byte/half-word/double-word/quad-word rotate left.  This insn occurs before
+;; altivec_vrl and will match for -mcpu=future, while other cpus will
+;; match the generic insn.
+;; However for testing, allow other xvrl variants.  In particular, XVRLD for
+;; the sha3 tests for multibuf/singlebuf.
 (define_insn "altivec_vrl"
-  [(set (match_operand:VI2 0 "register_operand" "=v")
-(rotate:VI2 (match_operand:VI2 1 "register_operand" "v")
-   (match_operand:VI2 2 "register_operand" "v")))]
+  [(set (match_operand:VI2 0 "register_operand" "=v,wa")
+(rotate:VI2 (match_operand:VI2 1 "register_operand" "v,wa")
+   (match_operand:VI2 2 "register_operand" "v,wa")))]
   ""
-  "vrl %0,%1,%2"
+  "@
+   vrl %0,%1,%2
+   xvrl %x0,%x1,%x2"
+  [(set_attr "type" "vecsimple")
+   (set_attr "isa" "*,xvrlw")])
+
+(define_insn "*altivec_vrl_immediate"
+  [(set (match_operand:VI2 0 "register_operand" "=wa,wa,wa,wa")
+   (rotate:VI2 (match_operand:VI2 1 "register_operand" "wa,wa,wa,wa")
+   (match_operand:VI2 2 "vector_shift_immediate" "j,wM,wE,wS")))]
+  "TARGET_XVRLW && "
+{
+  rtx op2 = operands[2];
+  int value = 256;
+  int num_insns = -1;
+
+  if (!xxspltib_constant_p (op2, mode, &num_insns, &value))
+gcc_unreachable ();
+
+  operands[3] = GEN_INT (value & 0xff);
+  return "xvrli %x0,%x1,%3";
+}
   [(set_attr "type" "vecsimple")])
 
 (define_insn "altivec_vrlq"
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 1d95e34557e5..fccfbd7e4904 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -728,6 +728,32 @@
   return num_insns == 1;
 })
 
+;; Return 1 if the operand is a CONST_VECTOR whose elements are all the
+;; same and the elements can be an immediate shift or rotate factor
+(define_predicate "vector_shift_immediate"
+  (match_code "const_vector,vec_duplicate,const_int")
+{
+  int value = 256;
+  int num_insns = -1;
+
+  if (zero_constant (op, mode) || all_ones_constant (op, mode))
+return true;
+
+  if (!xxspltib_constant_p (op, mode, &num_insns, &value))
+return false;
+
+  switch (mode)
+{
+case V16QImode: return IN_RANGE (value, 0, 7);
+case V8HImode:  return IN_RANGE (value, 0, 15);
+case V4SImode:  return IN_RANGE (value, 0, 31);
+case V2DImode:  return IN_RANGE (value, 0, 63);
+default:break;
+}
+
+  return false;
+})
+  
 ;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a
 ;; vector register without using memory.
 (define_predicate "easy_vector_constant"
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 8cfd9faf77dc..1a168c2c9596 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -581,6 +581,9 @@ extern int rs6000_vector_align[];
below.  */
 #define RS6000_FN_TARGET_INFO_HTM 1
 
+/* Whether we have XVRLW support.  */
+#define TARGET_XVRLW   TARGET_FUTURE
+
 /* Whether the various reciprocal divide/square root estimate instructions
exist, and whether we should automatically generate code for the instruction
by default.  */
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 68fbfec95546..420f20d4524b 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -369,7 +369,7 @@
   (const (symbol_ref "(enum attr_cpu) rs6000_tune")))
 
 ;; The ISA we impl

[gcc(refs/users/meissner/heads/work182-sha)] Add p-future target-supports.exp

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:168741b8e7a230560c4cf5702e42e17ccd7043ba

commit 168741b8e7a230560c4cf5702e42e17ccd7043ba
Author: Michael Meissner 
Date:   Thu Oct 24 14:07:22 2024 -0400

Add p-future target-supports.exp

2024-10-24  Michael Meissner  

gcc/testsuite/

* lib/target-supports.exp (check_effective_target_powerpc_future_ok):
New target.
(check_effective_target_powerpc_dense_math_ok): Likewise.

Diff:
---
 gcc/testsuite/lib/target-supports.exp | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index d113a08dff7b..f104f4295d9f 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -7366,6 +7366,41 @@ proc check_effective_target_power10_ok { } {
 }
 }
 
+# Return 1 if this is a PowerPC target supporting -mcpu=future which enables
+# some potential new instructions.
+proc check_effective_target_powerpc_future_ok { } {
+   return [check_no_compiler_messages powerpc_future_ok object {
+   #ifndef _ARCH_PWR_FUTURE
+   #error "-mcpu=future is not supported"
+   #else
+   int dummy;
+   #endif
+   } "-mcpu=future"]
+}
+
+# Return 1 if this is a PowerPC target supporting -mcpu=future which enables
+# the dense math operations.
+proc check_effective_target_powerpc_dense_math_ok { } {
+if { ([istarget powerpc*-*-*]) } {
+   return [check_no_compiler_messages powerpc_dense_math_ok object {
+   __vector_quad vq;
+   int main (void) {
+   #ifndef __DENSE_MATH__
+   #error "target does not have dense math support."
+   #else
+   /* Make sure we have dense math support.  */
+ __vector_quad dmr;
+ __asm__ ("dmsetaccz %A0" : "=wD" (dmr));
+ vq = dmr;
+   #endif
+   return 0;
+   }
+   } "-mcpu=future"]
+} else {
+   return 0;
+}
+}
+
 # Return 1 if this is a PowerPC target supporting -mfloat128 via either
 # software emulation on power7/power8 systems or hardware support on power9.


[gcc(refs/users/meissner/heads/work182-sha)] Update ChangeLog.*

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:344b41a356d2fb212c24b4621793ce87628c9f5a

commit 344b41a356d2fb212c24b4621793ce87628c9f5a
Author: Michael Meissner 
Date:   Thu Oct 24 12:27:43 2024 -0400

Update ChangeLog.*

Diff:
---
 gcc/ChangeLog.sha | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/ChangeLog.sha b/gcc/ChangeLog.sha
index dc35b2de5a28..c6151fce09a4 100644
--- a/gcc/ChangeLog.sha
+++ b/gcc/ChangeLog.sha
@@ -74,21 +74,21 @@ this patch that were lost with GCC 14 and the current trunk:
   ---   -   -   --
 Benchmark time in seconds   5.53 6.156.265.575.61 9.56
 
-Fuse VANDC -> VXOR   209 600  600 600 600   600
-Fuse VXOR -> VXOR  0 240  240 120 120   120
-XXEVAL to fuse ANDC -> XOR   391   00   0   0 0
-XXEVAL to fuse XOR -> XOR240   00   0   0 0
+Fuse VANDC -> VXOR   209 600  600 600 600  600
+Fuse VXOR -> VXOR  0 240  240 120 120  120
+XXEVAL to fuse ANDC -> XOR   391   00   0   00
+XXEVAL to fuse XOR -> XOR240   00   0   00
 
-Spill vector to stack 78 364  364 172 184   110
-Load spilled vector from stack   431 962  962 713 723   166
-Vector moves  10 100  100  70  72 3,055
+Spill vector to stack 78 364  364 172 184  110
+Load spilled vector from stack   431 962  962 713 723  166
+Vector moves  10 100  100  70  723,055
 
-Vector rotate right  696 696  696 696 696   696
-XXLANDC or VANDC 209 600  600 600 600   600
-XXLXOR or VXOR   953   1,8241,824   1,824   1,824 1,825
-XXEVAL   631   00   0   0 0
+Vector rotate right  696 696  696 696 696  696
+XXLANDC or VANDC 209 600  600 600 600  600
+XXLXOR or VXOR   953   1,8241,824   1,824   1,8241,825
+XXEVAL   631   00   0   00
 
-Load vector rotate constants  24  24   24  24  2424
+Load vector rotate constants  24  24   24  24  24   24
 
 
 Here are the results for adding support for XXEVAL for the singlebuff.c


[gcc(refs/users/meissner/heads/work182-sha)] Update ChangeLog.*

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:8800ae704d6e6482fd92b7c9d40c3d6a1fa85544

commit 8800ae704d6e6482fd92b7c9d40c3d6a1fa85544
Author: Michael Meissner 
Date:   Thu Oct 24 14:08:35 2024 -0400

Update ChangeLog.*

Diff:
---
 gcc/ChangeLog.sha | 12 
 1 file changed, 12 insertions(+)

diff --git a/gcc/ChangeLog.sha b/gcc/ChangeLog.sha
index 80f083698d2a..a48e8bcf5071 100644
--- a/gcc/ChangeLog.sha
+++ b/gcc/ChangeLog.sha
@@ -1,3 +1,15 @@
+ Branch work182-sha, patch #412 
+
+Add p-future target-supports.exp
+
+2024-10-24  Michael Meissner  
+
+gcc/testsuite/
+
+   * lib/target-supports.exp (check_effective_target_powerpc_future_ok):
+   New target.
+   (check_effective_target_powerpc_dense_math_ok): Likewise.
+
  Branch work182-sha, patch #411 
 
 Add potential p-future XVRLD and XVRLDI instructions.


[gcc(refs/users/meissner/heads/work182)] Do not allow -mvsx to boost processor to power7.

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:1b9926150dd7e38196cda043c0fc4afc374007b8

commit 1b9926150dd7e38196cda043c0fc4afc374007b8
Author: Michael Meissner 
Date:   Wed Nov 6 15:28:34 2024 -0500

Do not allow -mvsx to boost processor to power7.

This patch restructures the code so that -mvsx for example will not silently
convert the processor to power7.  The user must now use -mcpu=power7 or higher.
This means if the user does -mvsx and the default processor does not have VSX
support, it will be an error.

I have built both big endian and little endian bootstrap compilers and there
were no regressions.

In addition, I constructed a test case that used every architecture define (like
_ARCH_PWR4, etc.) and I also looked at the .machine directive generated.  I ran
this test for all supported combinations of -mcpu, big/little endian, and 32/64
bit support.  Every single instance generated exactly the same code with the
patches installed compared to the compiler before installing the patches.
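
A minimal sketch of the kind of architecture-define check described above.  This
is not the test case that was actually run; the helper name
defined_arch_macros is hypothetical, and only the _ARCH_PWR4 through
_ARCH_PWR10 macros are probed.  Compiling it with different -mcpu options and
comparing the result before and after the patch would show whether the set of
predefined macros changed.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: collect the names of the PowerPC architecture
// macros that the compiler predefined for the selected -mcpu option.
// On a non-PowerPC target none of them is defined and the list is empty.
std::vector<std::string> defined_arch_macros()
{
  std::vector<std::string> names;
#ifdef _ARCH_PWR4
  names.push_back("_ARCH_PWR4");
#endif
#ifdef _ARCH_PWR5
  names.push_back("_ARCH_PWR5");
#endif
#ifdef _ARCH_PWR6
  names.push_back("_ARCH_PWR6");
#endif
#ifdef _ARCH_PWR7
  names.push_back("_ARCH_PWR7");
#endif
#ifdef _ARCH_PWR8
  names.push_back("_ARCH_PWR8");
#endif
#ifdef _ARCH_PWR9
  names.push_back("_ARCH_PWR9");
#endif
#ifdef _ARCH_PWR10
  names.push_back("_ARCH_PWR10");
#endif
  return names;
}
```

Printing the returned names from main for each -mcpu setting gives a list that
can be diffed between two compiler builds.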

Can I install this patch on the GCC 15 trunk?

2024-11-06  Michael Meissner  

gcc/

* config/rs6000/rs6000.cc (report_architecture_mismatch): New function.
Report an error if the user used an option such as -mvsx when the
default processor would not allow the option.
(rs6000_option_override_internal): Move some ISA checking code into
report_architecture_mismatch.

Diff:
---
 gcc/config/rs6000/rs6000.cc | 129 +++-
 1 file changed, 79 insertions(+), 50 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 8388542b7210..a944ffde28a6 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1173,6 +1173,7 @@ const int INSN_NOT_AVAILABLE = -1;
 static void rs6000_print_isa_options (FILE *, int, const char *,
  HOST_WIDE_INT, HOST_WIDE_INT);
 static HOST_WIDE_INT rs6000_disable_incompatible_switches (void);
+static void report_architecture_mismatch (void);
 
 static enum rs6000_reg_type register_to_reg_type (rtx, bool *);
 static bool rs6000_secondary_reload_move (enum rs6000_reg_type,
@@ -3695,7 +3696,6 @@ rs6000_option_override_internal (bool global_init_p)
   bool ret = true;
 
   HOST_WIDE_INT set_masks;
-  HOST_WIDE_INT ignore_masks;
   int cpu_index = -1;
   int tune_index;
   struct cl_target_option *main_target_opt
@@ -3964,59 +3964,13 @@ rs6000_option_override_internal (bool global_init_p)
 dwarf_offset_size = POINTER_SIZE_UNITS;
 #endif
 
-  /* Handle explicit -mno-{altivec,vsx} and turn off all of
- the options that depend on those flags.  */
-  ignore_masks = rs6000_disable_incompatible_switches ();
-
-  /* For the newer switches (vsx, dfp, etc.) set some of the older options,
- unless the user explicitly used the -mno- to disable the code.  */
-  if (TARGET_P9_VECTOR || TARGET_MODULO || TARGET_P9_MISC)
-rs6000_isa_flags |= (ISA_3_0_MASKS_SERVER & ~ignore_masks);
-  else if (TARGET_P9_MINMAX)
-{
-  if (cpu_index >= 0)
-   {
- if (cpu_index == PROCESSOR_POWER9)
-   {
- /* legacy behavior: allow -mcpu=power9 with certain
-capabilities explicitly disabled.  */
- rs6000_isa_flags |= (ISA_3_0_MASKS_SERVER & ~ignore_masks);
-   }
- else
-   error ("power9 target option is incompatible with %<%s=%> "
-  "for  less than power9", "-mcpu");
-   }
-  else if ((ISA_3_0_MASKS_SERVER & rs6000_isa_flags_explicit)
-  != (ISA_3_0_MASKS_SERVER & rs6000_isa_flags
-  & rs6000_isa_flags_explicit))
-   /* Enforce that none of the ISA_3_0_MASKS_SERVER flags
-  were explicitly cleared.  */
-   error ("%qs incompatible with explicitly disabled options",
-  "-mpower9-minmax");
-  else
-   rs6000_isa_flags |= ISA_3_0_MASKS_SERVER;
-}
-  else if (TARGET_P8_VECTOR || TARGET_POWER8 || TARGET_CRYPTO)
-rs6000_isa_flags |= (ISA_2_7_MASKS_SERVER & ~ignore_masks);
-  else if (TARGET_VSX)
-rs6000_isa_flags |= (ISA_2_6_MASKS_SERVER & ~ignore_masks);
-  else if (TARGET_POPCNTD)
-rs6000_isa_flags |= (ISA_2_6_MASKS_EMBEDDED & ~ignore_masks);
-  else if (TARGET_DFP)
-rs6000_isa_flags |= (ISA_2_5_MASKS_SERVER & ~ignore_masks);
-  else if (TARGET_CMPB)
-rs6000_isa_flags |= (ISA_2_5_MASKS_EMBEDDED & ~ignore_masks);
-  else if (TARGET_FPRND)
-rs6000_isa_flags |= (ISA_2_4_MASKS & ~ignore_masks);
-  else if (TARGET_POPCNTB)
-rs6000_isa_flags |= (ISA_2_2_MASKS & ~ignore_masks);
-  else if (TARGET_ALTIVEC)
-rs6000_isa_flags |= (OPTION_MASK_PPC_GFXOPT & ~ignore_masks);
+  /* Report trying to use things like -mmodulo to imply -mcpu=power9.  */
+  report_architecture_mismatch ();
 
   /* Disable VSX and Altivec silently if the user switched cpus to power7 in a
  target at

[gcc(refs/users/meissner/heads/work182-tar)] Add ChangeLog.tar and update REVISION.

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:bf1ef9c1bec80bf4f665d12e3616f98c3d7d2177

commit bf1ef9c1bec80bf4f665d12e3616f98c3d7d2177
Author: Michael Meissner 
Date:   Tue Oct 22 15:34:01 2024 -0400

Add ChangeLog.tar and update REVISION.

2024-10-22  Michael Meissner  

gcc/

* ChangeLog.tar: New file for branch.
* REVISION: Update.

Diff:
---
 gcc/ChangeLog.tar | 5 +
 gcc/REVISION  | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.tar b/gcc/ChangeLog.tar
new file mode 100644
index ..b5f7236162fa
--- /dev/null
+++ b/gcc/ChangeLog.tar
@@ -0,0 +1,5 @@
+ Branch work182-tar, baseline 
+
+2024-10-22   Michael Meissner  
+
+   Clone branch
diff --git a/gcc/REVISION b/gcc/REVISION
index 5aaca2bd398a..d3e47d04fcc2 100644
--- a/gcc/REVISION
+++ b/gcc/REVISION
@@ -1 +1 @@
-work182 branch
+work182-tar branch


[gcc(refs/users/meissner/heads/work182-tar)] Merge commit 'refs/users/meissner/heads/work182-tar' of git+ssh://gcc.gnu.org/git/gcc into me/work18

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:a8d2d9edcfbb70bd8c104e4650112549a6d0ca09

commit a8d2d9edcfbb70bd8c104e4650112549a6d0ca09
Merge: bf1ef9c1bec8 153f08510e13
Author: Michael Meissner 
Date:   Wed Nov 6 16:09:19 2024 -0500

Merge commit 'refs/users/meissner/heads/work182-tar' of git+ssh://gcc.gnu.org/git/gcc into me/work182-tar

Diff:


[gcc(refs/users/meissner/heads/work182-sha)] Merge commit 'refs/users/meissner/heads/work182-sha' of git+ssh://gcc.gnu.org/git/gcc into me/work18

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:cc318ac99f83e04507baeda7e057f54374d08abc

commit cc318ac99f83e04507baeda7e057f54374d08abc
Merge: 8800ae704d6e c5a9703abe8d
Author: Michael Meissner 
Date:   Wed Nov 6 16:07:21 2024 -0500

Merge commit 'refs/users/meissner/heads/work182-sha' of git+ssh://gcc.gnu.org/git/gcc into me/work182-sha

Diff:


[gcc(refs/users/meissner/heads/work182-test)] Add debugging for PR 71977-1.c regression.

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:cd59f64c68541b050ef6ab95637cbb1855d3df94

commit cd59f64c68541b050ef6ab95637cbb1855d3df94
Author: Michael Meissner 
Date:   Wed Nov 6 16:15:36 2024 -0500

Add debugging for PR 71977-1.c regression.

2024-11-06  Michael Meissner  

gcc/

* config/rs6000/rs6000.cc (sf_logical_op_p): New function.
* config/rs6000/rs6000.h (sf_logical_op_p): Add declaration.
* config/rs6000/vsx.md (define_peephole2 for SF + logical): Move test to
sf_logical_op_p.
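
A rough sketch of the general pattern of moving a compound condition into a
function that reports which part failed.  The names below (Conditions,
combined_test, split_test) are hypothetical and are not taken from rs6000.cc.
When a conjunction is split into early-return checks, each check is the
negation of the original conjunct, so by De Morgan's laws (a && b) fails when
(!a || !b) holds.

```cpp
#include <cstdio>

// Hypothetical stand-ins for the individual parts of a larger condition.
struct Conditions
{
  bool a;
  bool b;
  bool c;
};

// Original form: one conjunction, no indication of which part failed.
bool combined_test(const Conditions& x)
{
  return x.a && x.b && x.c;
}

// Split form: each conjunct is checked separately and reported on failure.
// The guard for (a && b) is its negation, (!a || !b).
bool split_test(const Conditions& x)
{
  if (!x.a || !x.b)
    {
      std::fprintf(stderr, "a && b failed\n");
      return false;
    }

  if (!x.c)
    {
      std::fprintf(stderr, "c failed\n");
      return false;
    }

  return true;
}
```

The two functions return the same value for every input; only the split form
says which check rejected it.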

Diff:
---
 gcc/config/rs6000/rs6000.cc | 62 +
 gcc/config/rs6000/rs6000.h  |  2 ++
 gcc/config/rs6000/vsx.md| 19 +-
 3 files changed, 65 insertions(+), 18 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index aa67e7256bb9..e1ec9591a0eb 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -29564,6 +29564,68 @@ rs6000_opaque_type_invalid_use_p (gimple *stmt)
   return false;
 }
 
+bool
+sf_logical_op_p (rtx operands[])
+{
+  if (!TARGET_POWERPC64 || !TARGET_DIRECT_MOVE)
+{
+  fprintf (stderr, "!TARGET_POWERPC64 || !TARGET_DIRECT_MOVE\n");
+  return false;
+}
+
+   /* The REG_P (xxx) tests prevents SUBREG's, which allows us to use REGNO
+  to compare registers, when the mode is different.  */
+  if (!REG_P (operands[SFBOOL_MFVSR_D]) && REG_P (operands[SFBOOL_BOOL_D]))
+{
+  fprintf (stderr, "REG_P (operands[SFBOOL_MFVSR_D]) && REG_P (operands[SFBOOL_BOOL_D]))\n");
+  return false;
+}
+
+  if (!REG_P (operands[SFBOOL_BOOL_A1]) && REG_P (operands[SFBOOL_SHL_D]))
+{
+  fprintf (stderr, "!REG_P (operands[SFBOOL_BOOL_A1]) && REG_P (operands[SFBOOL_SHL_D])\n");
+  return false;
+}
+
+  if (!REG_P (operands[SFBOOL_SHL_A])   && REG_P (operands[SFBOOL_MTVSR_D]))
+{
+  fprintf (stderr, "!REG_P (operands[SFBOOL_SHL_A])   && REG_P (operands[SFBOOL_MTVSR_D])\n");
+  return false;
+}
+
+  if (!REG_P (operands[SFBOOL_BOOL_A2])
+   && !CONST_INT_P (operands[SFBOOL_BOOL_A2]))
+{
+  fprintf (stderr, "!REG_P (operands[SFBOOL_BOOL_A2]) && !CONST_INT_P (operands[SFBOOL_BOOL_A2])\n");
+  return false;
+}
+
+  if (!REGNO (operands[SFBOOL_BOOL_D]) == REGNO (operands[SFBOOL_MFVSR_D])
+   && !peep2_reg_dead_p (2, operands[SFBOOL_MFVSR_D]))
+{
+  fprintf (stderr, "!REGNO (operands[SFBOOL_BOOL_D]) == REGNO (operands[SFBOOL_MFVSR_D]) && !peep2_reg_dead_p (2, operands[SFBOOL_MFVSR_D])\n");
+  return false;
+}
+
+  if (((REGNO (operands[SFBOOL_MFVSR_D]) == REGNO (operands[SFBOOL_BOOL_A1])
+   || (REG_P (operands[SFBOOL_BOOL_A2])
+   && REGNO (operands[SFBOOL_MFVSR_D]) == REGNO (operands[SFBOOL_BOOL_A2])))
+   && REGNO (operands[SFBOOL_BOOL_D]) == REGNO (operands[SFBOOL_SHL_A])
+   && (REGNO (operands[SFBOOL_SHL_D]) == REGNO (operands[SFBOOL_BOOL_D])
+  || peep2_reg_dead_p (3, operands[SFBOOL_BOOL_D]))
+   && peep2_reg_dead_p (4, operands[SFBOOL_SHL_D])))
+{
+  fprintf (stderr, "last test passed\n");
+  return true;
+}
+  else
+{
+  fprintf (stderr, "last test failed\n");
+  return false;
+}
+}
+
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 8cfd9faf77dc..499e80fda08d 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2525,3 +2525,5 @@ enum {
 
 #undef ARCH_EXPAND
 #endif /* GCC_HWINT_H.  */
+
+extern bool sf_logical_op_p (rtx operands[]);
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index b2fc39acf4e8..bcf8e2a60462 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -6338,24 +6338,7 @@
(set (match_operand:SF SFBOOL_MTVSR_D "vsx_register_operand")
(unspec:SF [(match_dup SFBOOL_SHL_D)] UNSPEC_P8V_MTVSRD))]
 
-  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE
-   /* The REG_P (xxx) tests prevents SUBREG's, which allows us to use REGNO
-  to compare registers, when the mode is different.  */
-   && REG_P (operands[SFBOOL_MFVSR_D]) && REG_P (operands[SFBOOL_BOOL_D])
-   && REG_P (operands[SFBOOL_BOOL_A1]) && REG_P (operands[SFBOOL_SHL_D])
-   && REG_P (operands[SFBOOL_SHL_A])   && REG_P (operands[SFBOOL_MTVSR_D])
-   && (REG_P (operands[SFBOOL_BOOL_A2])
-   || CONST_INT_P (operands[SFBOOL_BOOL_A2]))
-   && (REGNO (operands[SFBOOL_BOOL_D]) == REGNO (operands[SFBOOL_MFVSR_D])
-   || peep2_reg_dead_p (2, operands[SFBOOL_MFVSR_D]))
-   && (REGNO (operands[SFBOOL_MFVSR_D]) == REGNO (operands[SFBOOL_BOOL_A1])
-   || (REG_P (operands[SFBOOL_BOOL_A2])
-  && REGNO (operands[SFBOOL_MFVSR_D])
-   == REGNO (operands[SFBOOL_BOOL_A2])))
-   && REGNO (operands[SFBOOL_BOOL_D]) == REGNO (operands[SFBOOL_SHL_A])
-   && (REGNO (operands[SFBOOL_SHL_D]) == REGNO (operands[SFBOOL_BOOL_D])
-   || peep2_reg_dead_p (3, operan

[gcc(refs/users/meissner/heads/work182-test)] Update ChangeLog.*

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:f35900b938191cb5dafe9eeaed1ed2baa79350ec

commit f35900b938191cb5dafe9eeaed1ed2baa79350ec
Author: Michael Meissner 
Date:   Wed Nov 6 16:17:19 2024 -0500

Update ChangeLog.*

Diff:
---
 gcc/ChangeLog.test | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/gcc/ChangeLog.test b/gcc/ChangeLog.test
index cda3bd90508c..6f28010a85b3 100644
--- a/gcc/ChangeLog.test
+++ b/gcc/ChangeLog.test
@@ -1,5 +1,27 @@
+ Branch work182-test, patch #500 
+
+Add debugging for PR 71977-1.c regression.
+
+2024-11-06  Michael Meissner  
+
+gcc/
+
+   * config/rs6000/rs6000.cc (sf_logical_op_p): New function.
+   * config/rs6000/rs6000.h (sf_logical_op_p): Add declaration.
+   * config/rs6000/vsx.md (define_peephole2 for SF + logical): Move test to
+   sf_logical_op_p.
+
  Branch work182-test, baseline 
 
+Add ChangeLog.test and update REVISION.
+
+2024-10-22  Michael Meissner  
+
+gcc/
+
+   * ChangeLog.test: New file for branch.
+   * REVISION: Update.
+
 2024-10-22   Michael Meissner  
 
Clone branch


[gcc r14-10893] AArch64: rename the SVE2 psel intrinsics to psel_lane [PR116371]

2024-11-06 Thread Tamar Christina via Gcc-cvs
https://gcc.gnu.org/g:97640e9632697b9f0ab31e4022d24d360d1ea2c9

commit r14-10893-g97640e9632697b9f0ab31e4022d24d360d1ea2c9
Author: Tamar Christina 
Date:   Mon Oct 14 13:58:09 2024 +0100

AArch64: rename the SVE2 psel intrinsics to psel_lane [PR116371]

The psel intrinsics, similar to the pext ones, should be named psel_lane.  This
corrects the naming.

gcc/ChangeLog:

PR target/116371
* config/aarch64/aarch64-sve-builtins-sve2.cc (class svpsel_impl):
Renamed to ...
(class svpsel_lane_impl): ... This and adjust initialization.
* config/aarch64/aarch64-sve-builtins-sve2.def (svpsel): Renamed to ...
(svpsel_lane): ... This.
* config/aarch64/aarch64-sve-builtins-sve2.h (svpsel): Renamed to
svpsel_lane.

gcc/testsuite/ChangeLog:

PR target/116371
* gcc.target/aarch64/sme2/acle-asm/psel_b16.c,
gcc.target/aarch64/sme2/acle-asm/psel_b32.c,
gcc.target/aarch64/sme2/acle-asm/psel_b64.c,
gcc.target/aarch64/sme2/acle-asm/psel_b8.c,
gcc.target/aarch64/sme2/acle-asm/psel_c16.c,
gcc.target/aarch64/sme2/acle-asm/psel_c32.c,
gcc.target/aarch64/sme2/acle-asm/psel_c64.c,
gcc.target/aarch64/sme2/acle-asm/psel_c8.c: Renamed to
* gcc.target/aarch64/sme2/acle-asm/psel_lane_b16.c,
gcc.target/aarch64/sme2/acle-asm/psel_lane_b32.c,
gcc.target/aarch64/sme2/acle-asm/psel_lane_b64.c,
gcc.target/aarch64/sme2/acle-asm/psel_lane_b8.c,
gcc.target/aarch64/sme2/acle-asm/psel_lane_c16.c,
gcc.target/aarch64/sme2/acle-asm/psel_lane_c32.c,
gcc.target/aarch64/sme2/acle-asm/psel_lane_c64.c,
gcc.target/aarch64/sme2/acle-asm/psel_lane_c8.c: ... These.

(cherry picked from commit 306834b7f74ab61160f205e04f5bf35b71f9ec52)

Diff:
---
 gcc/config/aarch64/aarch64-sve-builtins-sve2.cc|  4 +-
 gcc/config/aarch64/aarch64-sve-builtins-sve2.def   |  2 +-
 gcc/config/aarch64/aarch64-sve-builtins-sve2.h |  2 +-
 .../gcc.target/aarch64/sme2/acle-asm/psel_b16.c| 89 --
 .../gcc.target/aarch64/sme2/acle-asm/psel_b32.c| 89 --
 .../gcc.target/aarch64/sme2/acle-asm/psel_b64.c| 80 ---
 .../gcc.target/aarch64/sme2/acle-asm/psel_b8.c | 89 --
 .../gcc.target/aarch64/sme2/acle-asm/psel_c16.c| 89 --
 .../gcc.target/aarch64/sme2/acle-asm/psel_c32.c| 89 --
 .../gcc.target/aarch64/sme2/acle-asm/psel_c64.c| 80 ---
 .../gcc.target/aarch64/sme2/acle-asm/psel_c8.c | 89 --
 .../aarch64/sme2/acle-asm/psel_lane_b16.c  | 89 ++
 .../aarch64/sme2/acle-asm/psel_lane_b32.c  | 89 ++
 .../aarch64/sme2/acle-asm/psel_lane_b64.c  | 80 +++
 .../aarch64/sme2/acle-asm/psel_lane_b8.c   | 89 ++
 .../aarch64/sme2/acle-asm/psel_lane_c16.c  | 89 ++
 .../aarch64/sme2/acle-asm/psel_lane_c32.c  | 89 ++
 .../aarch64/sme2/acle-asm/psel_lane_c64.c  | 80 +++
 .../aarch64/sme2/acle-asm/psel_lane_c8.c   | 89 ++
 19 files changed, 698 insertions(+), 698 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc b/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc
index 4f25cc680282..06d4d22fc0b2 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc
@@ -234,7 +234,7 @@ public:
   }
 };
 
-class svpsel_impl : public function_base
+class svpsel_lane_impl : public function_base
 {
 public:
   rtx
@@ -625,7 +625,7 @@ FUNCTION (svpmullb, unspec_based_function, (-1, UNSPEC_PMULLB, -1))
 FUNCTION (svpmullb_pair, unspec_based_function, (-1, UNSPEC_PMULLB_PAIR, -1))
 FUNCTION (svpmullt, unspec_based_function, (-1, UNSPEC_PMULLT, -1))
 FUNCTION (svpmullt_pair, unspec_based_function, (-1, UNSPEC_PMULLT_PAIR, -1))
-FUNCTION (svpsel, svpsel_impl,)
+FUNCTION (svpsel_lane, svpsel_lane_impl,)
 FUNCTION (svqabs, rtx_code_function, (SS_ABS, UNKNOWN, UNKNOWN))
 FUNCTION (svqcadd, svqcadd_impl,)
 FUNCTION (svqcvt, integer_conversion, (UNSPEC_SQCVT, UNSPEC_SQCVTU,
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sve2.def b/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
index 4366925a9711..ef677a74020b 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
+++ b/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
@@ -235,7 +235,7 @@ DEF_SVE_FUNCTION (svsm4ekey, binary, s_unsigned, none)
 | AARCH64_FL_SME \
 | AARCH64_FL_SM_ON)
 DEF_SVE_FUNCTION (svclamp, clamp, all_integer, none)
-DEF_SVE_FUNCTION (svpsel, select_pred, all_pred_count, none)
+DEF_SVE_FU

[gcc r15-4981] libstdc++: Deprecate useless compatibility headers for C++17

2024-11-06 Thread Jonathan Wakely via Libstdc++-cvs
https://gcc.gnu.org/g:5c34f02ba7ebe0dfec5595ebc8298c47d5f65e6e

commit r15-4981-g5c34f02ba7ebe0dfec5595ebc8298c47d5f65e6e
Author: Jonathan Wakely 
Date:   Wed Oct 23 16:01:50 2024 +0100

libstdc++: Deprecate useless compatibility headers for C++17

These headers make no sense for C++ programs, because they either define
different content to the corresponding C header, or define
nothing at all in namespace std. They were all deprecated in C++17, so
add deprecation warnings to them, which can be disabled with
-Wno-deprecated. For C++20 and later these headers are no longer in the
standard at all, so compiling with _GLIBCXX_USE_DEPRECATED defined to 0
will give an error when they are included.

Because #warning is non-standard before C++23 we need to use pragmas to
ignore -Wc++23-extensions for the -Wsystem-headers -pedantic case.

One g++ test needs adjustment because it includes <ciso646>, but that
can be made conditional on the __cplusplus value without any reduction
in test coverage.

For the library tests, consolidate the std_c++0x_neg.cc XFAIL tests into
the macros.cc test, using dg-error with a { target c++98_only }
selector. This avoids having two separate test files, one for C++98 and
one for everything later. Also add tests for the .h headers to
ensure that they behave as expected and don't give deprecated warnings.

libstdc++-v3/ChangeLog:

* doc/xml/manual/evolution.xml: Document deprecations.
* doc/html/*: Regenerate.
* include/c_compatibility/complex.h (_GLIBCXX_COMPLEX_H): Move
include guard to start of file. Include  directly
instead of .
* include/c_compatibility/tgmath.h: Include  and
 directly, instead of .
* include/c_global/ccomplex: Add deprecated #warning for C++17
and #error for C++20 if _GLIBCXX_USE_DEPRECATED == 0.
* include/c_global/ciso646: Likewise.
* include/c_global/cstdalign: Likewise.
* include/c_global/cstdbool: Likewise.
* include/c_global/ctgmath: Likewise.
* include/c_std/ciso646: Likewise.
* include/precompiled/stdc++.h: Do not include ccomplex,
ciso646, cstdalign, cstdbool, or ctgmath in C++17 and later.
* testsuite/18_support/headers/cstdalign/macros.cc: Check for
warnings and errors for unsupported dialects.
* testsuite/18_support/headers/cstdbool/macros.cc: Likewise.
* testsuite/26_numerics/headers/ctgmath/complex.cc: Likewise.
* testsuite/27_io/objects/char/1.cc: Do not include .
* testsuite/27_io/objects/wchar_t/1.cc: Likewise.
* testsuite/18_support/headers/cstdbool/std_c++0x_neg.cc: Removed.
* testsuite/18_support/headers/cstdalign/std_c++0x_neg.cc: Removed.
* testsuite/26_numerics/headers/ccomplex/std_c++0x_neg.cc: Removed.
* testsuite/26_numerics/headers/ctgmath/std_c++0x_neg.cc: Removed.
* testsuite/18_support/headers/ciso646/macros.cc: New test.
* testsuite/18_support/headers/ciso646/macros.h.cc: New test.
* testsuite/18_support/headers/cstdbool/macros.h.cc: New test.
* testsuite/26_numerics/headers/ccomplex/complex.cc: New test.
* testsuite/26_numerics/headers/ccomplex/complex.h.cc: New test.
* testsuite/26_numerics/headers/ctgmath/complex.h.cc: New test.

gcc/testsuite/ChangeLog:

* g++.old-deja/g++.other/headers1.C: Do not include ciso646 for
C++17 and later.

Diff:
---
 gcc/testsuite/g++.old-deja/g++.other/headers1.C|  2 +
 libstdc++-v3/doc/html/manual/api.html  |  8 
 libstdc++-v3/doc/xml/manual/evolution.xml  | 10 +
 libstdc++-v3/include/c_compatibility/complex.h |  4 +-
 libstdc++-v3/include/c_compatibility/tgmath.h  | 11 ++---
 libstdc++-v3/include/c_global/ccomplex |  9 
 libstdc++-v3/include/c_global/ciso646  |  9 
 libstdc++-v3/include/c_global/cstdalign|  8 
 libstdc++-v3/include/c_global/cstdbool |  8 
 libstdc++-v3/include/c_global/ctgmath  |  8 
 libstdc++-v3/include/c_std/ciso646 | 10 +
 libstdc++-v3/include/precompiled/stdc++.h  | 13 +++---
 .../testsuite/18_support/headers/ciso646/macros.cc | 51 ++
 .../18_support/headers/ciso646/macros.h.cc | 49 +
 .../18_support/headers/cstdalign/macros.cc | 10 -
 .../18_support/headers/cstdalign/macros.h.cc   | 25 +++
 .../18_support/headers/cstdalign/std_c++0x_neg.cc  | 24 --
 .../18_support/headers/cstdbool/macros.cc  |  9 +++-
 .../18_support/headers/cstdbool/macros.h.cc| 21 +
 .../18_support/headers/cstdbool/std_c++0x_neg.cc   | 26 ---
 .../26_nu

[gcc r15-4978] libstdc++: Enable debug assertions for filesystem directory iterators

2024-11-06 Thread Jonathan Wakely via Gcc-cvs
https://gcc.gnu.org/g:f7979b8bfa6542e6861f44c78d18cc1cf8dae4d6

commit r15-4978-gf7979b8bfa6542e6861f44c78d18cc1cf8dae4d6
Author: Jonathan Wakely 
Date:   Mon Oct 28 17:55:02 2024 +

libstdc++: Enable debug assertions for filesystem directory iterators

Several member functions of filesystem::directory_iterator and
filesystem::recursive_directory_iterator currently dereference their
shared_ptr data member without checking for non-null. Because they use
operator-> and that function only uses _GLIBCXX_DEBUG_PEDASSERT rather
than __glibcxx_assert there is no assertion even when the library is
built with _GLIBCXX_ASSERTIONS defined. This means that dereferencing
invalid directory iterators gives an unhelpful segfault.

By using (*p). instead of p-> we get an assertion when the library is
built with _GLIBCXX_ASSERTIONS, with a "_M_get() != nullptr" message.
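
A minimal sketch of the difference, using a hypothetical checked_ptr wrapper
rather than libstdc++'s shared_ptr.  It is an illustration only: here
operator* throws on a null pointer so the behaviour can be observed, whereas
libstdc++ uses an assertion, and the names checked_ptr, dir_state,
depth_checked and depth_unchecked are all made up for this example.

```cpp
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical wrapper, NOT libstdc++'s shared_ptr.  operator* validates
// the pointer before use, operator-> performs no check at all.
template <typename T>
class checked_ptr
{
  std::shared_ptr<T> ptr_;

public:
  explicit checked_ptr(std::shared_ptr<T> p = nullptr) : ptr_(std::move(p)) {}

  // Checked access: reports a null pointer instead of dereferencing it.
  T& operator*() const
  {
    if (!ptr_)
      throw std::logic_error("dereferenced a null pointer");
    return *ptr_;
  }

  // Unchecked access: returns the raw pointer as-is, even when null.
  T* operator->() const noexcept { return ptr_.get(); }
};

// Hypothetical stand-in for the iterator's shared state.
struct dir_state
{
  int depth = 0;
};

// Uses p->, so a null pointer is undefined behaviour (typically a segfault).
int depth_unchecked(const checked_ptr<dir_state>& p) { return p->depth; }

// Uses (*p)., so a null pointer is caught by the check in operator*.
int depth_checked(const checked_ptr<dir_state>& p) { return (*p).depth; }
```

For a valid pointer both accessors return the same value; only the (*p). form
gives a diagnosable failure for an empty one.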

libstdc++-v3/ChangeLog:

* src/c++17/fs_dir.cc (fs::directory_iterator::operator*): Use
shared_ptr::operator* instead of shared_ptr::operator->.
(fs::recursive_directory_iterator::options): Likewise.
(fs::recursive_directory_iterator::depth): Likewise.
(fs::recursive_directory_iterator::recursion_pending): Likewise.
(fs::recursive_directory_iterator::operator*): Likewise.
(fs::recursive_directory_iterator::disable_recursion_pending):
Likewise.

Diff:
---
 libstdc++-v3/src/c++17/fs_dir.cc | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/src/c++17/fs_dir.cc b/libstdc++-v3/src/c++17/fs_dir.cc
index 28d27f6a9fa1..8fe9e5e4cc81 100644
--- a/libstdc++-v3/src/c++17/fs_dir.cc
+++ b/libstdc++-v3/src/c++17/fs_dir.cc
@@ -230,7 +230,7 @@ directory_iterator(const path& p, directory_options options, error_code* ecptr)
 const fs::directory_entry&
 fs::directory_iterator::operator*() const noexcept
 {
-  return _M_dir->entry;
+  return (*_M_dir).entry;
 }
 
 fs::directory_iterator&
@@ -327,25 +327,25 @@ fs::recursive_directory_iterator::~recursive_directory_iterator() = default;
 fs::directory_options
 fs::recursive_directory_iterator::options() const noexcept
 {
-  return _M_dirs->options;
+  return (*_M_dirs).options;
 }
 
 int
 fs::recursive_directory_iterator::depth() const noexcept
 {
-  return int(_M_dirs->size()) - 1;
+  return int((*_M_dirs).size()) - 1;
 }
 
 bool
 fs::recursive_directory_iterator::recursion_pending() const noexcept
 {
-  return _M_dirs->pending;
+  return (*_M_dirs).pending;
 }
 
 const fs::directory_entry&
 fs::recursive_directory_iterator::operator*() const noexcept
 {
-  return _M_dirs->top().entry;
+  return (*_M_dirs).top().entry;
 }
 
 fs::recursive_directory_iterator&
@@ -453,7 +453,7 @@ fs::recursive_directory_iterator::pop()
 void
 fs::recursive_directory_iterator::disable_recursion_pending() noexcept
 {
-  _M_dirs->pending = false;
+  (*_M_dirs).pending = false;
 }
 
 // Used to implement filesystem::remove_all.


[gcc r15-4977] ipcp don't propagate where not needed

2024-11-06 Thread Michal Jires via Gcc-cvs
https://gcc.gnu.org/g:05e70ff9213159dc969a7edefc671d4ad65375f4

commit r15-4977-g05e70ff9213159dc969a7edefc671d4ad65375f4
Author: Michal Jires 
Date:   Thu Oct 24 00:52:28 2024 +0200

ipcp don't propagate where not needed

This patch disables propagation of ipcp information into partitions
where all instances of the node are marked to be inlined.

Motivation:
Incremental LTO needs stable values between compilations to be
effective. This requirement fails with the following example:

void heavily_used_function(int);
...
heavily_used_function(__LINE__);

Ipcp creates a long list of all __LINE__ arguments, and then
propagates it with every function clone, even though for inlined
functions this information is not useful.
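
A small sketch of why such arguments are unstable, assuming a hypothetical
recorder in place of the real callee.  Each call site passes a different
constant, and inserting a single line above the calls changes every one of
them.  The helper names seen_arguments and distinct_arguments_after_calls are
made up for this illustration.

```cpp
#include <cstddef>
#include <set>

// Hypothetical recorder: the set of distinct argument values seen so far.
std::set<int>& seen_arguments()
{
  static std::set<int> values;
  return values;
}

// Stand-in for the heavily used function; it only records its argument.
void heavily_used_function(int line)
{
  seen_arguments().insert(line);
}

// Three call sites on three different source lines, so three distinct
// constants reach the function.
std::size_t distinct_arguments_after_calls()
{
  seen_arguments().clear();
  heavily_used_function(__LINE__);
  heavily_used_function(__LINE__);
  heavily_used_function(__LINE__);
  return seen_arguments().size();
}
```

With N call sites the list of known constant arguments has N entries, and all
of them shift whenever the surrounding source is edited.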

gcc/ChangeLog:

* ipa-prop.cc (write_ipcp_transformation_info): Disable
unneeded value propagation.
* lto-cgraph.cc (lto_symtab_encoder_encode): Default values.
(lto_symtab_encoder_always_inlined_p): New.
(lto_set_symtab_encoder_not_always_inlined): New.
(add_node_to): Set always inlined.
* lto-streamer.h (struct lto_encoder_entry): New field.
(lto_symtab_encoder_always_inlined_p): New.

Diff:
---
 gcc/ipa-prop.cc| 12 +---
 gcc/lto-cgraph.cc  | 47 ++-
 gcc/lto-streamer.h | 11 +++
 3 files changed, 50 insertions(+), 20 deletions(-)

diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index add3a12b5848..599181d0a943 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -5405,9 +5405,15 @@ write_ipcp_transformation_info (output_block *ob, 
cgraph_node *node,
   streamer_write_bitpack (&bp);
 }
 
-  streamer_write_uhwi (ob, vec_safe_length (ts->m_vr));
-  for (const ipa_vr &parm_vr : ts->m_vr)
-parm_vr.streamer_write (ob);
+  /* If all instances of this node are inlined, ipcp info is not useful.  */
+  if (!lto_symtab_encoder_only_for_inlining_p (encoder, node))
+{
+  streamer_write_uhwi (ob, vec_safe_length (ts->m_vr));
+  for (const ipa_vr &parm_vr : ts->m_vr)
+   parm_vr.streamer_write (ob);
+}
+  else
+streamer_write_uhwi (ob, 0);
 }
 
 /* Stream in the aggregate value replacement chain for NODE from IB.  */
diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
index b1fc694e5624..b18d2b34e468 100644
--- a/gcc/lto-cgraph.cc
+++ b/gcc/lto-cgraph.cc
@@ -113,7 +113,7 @@ lto_symtab_encoder_encode (lto_symtab_encoder_t encoder,
 
   if (!encoder->map)
 {
-  lto_encoder_entry entry = {node, false, false, false};
+  lto_encoder_entry entry (node);
 
   ref = encoder->nodes.length ();
   encoder->nodes.safe_push (entry);
@@ -123,7 +123,7 @@ lto_symtab_encoder_encode (lto_symtab_encoder_t encoder,
   size_t *slot = encoder->map->get (node);
   if (!slot || !*slot)
 {
-  lto_encoder_entry entry = {node, false, false, false};
+  lto_encoder_entry entry (node);
   ref = encoder->nodes.length ();
   if (!slot)
 encoder->map->put (node, ref + 1);
@@ -168,6 +168,15 @@ lto_symtab_encoder_delete_node (lto_symtab_encoder_t 
encoder,
   return true;
 }
 
+/* Return TRUE if the NODE and its clones are always inlined.  */
+
+bool
+lto_symtab_encoder_only_for_inlining_p (lto_symtab_encoder_t encoder,
+   struct cgraph_node *node)
+{
+  int index = lto_symtab_encoder_lookup (encoder, node);
+  return encoder->nodes[index].only_for_inlining;
+}
 
 /* Return TRUE if we should encode the body of NODE (if any).  */
 
@@ -179,17 +188,6 @@ lto_symtab_encoder_encode_body_p (lto_symtab_encoder_t 
encoder,
   return encoder->nodes[index].body;
 }
 
-/* Specify that we encode the body of NODE in this partition.  */
-
-static void
-lto_set_symtab_encoder_encode_body (lto_symtab_encoder_t encoder,
-   struct cgraph_node *node)
-{
-  int index = lto_symtab_encoder_encode (encoder, node);
-  gcc_checking_assert (encoder->nodes[index].node == node);
-  encoder->nodes[index].body = true;
-}
-
 /* Return TRUE if we should encode initializer of NODE (if any).  */
 
 bool
@@ -797,13 +795,28 @@ output_refs (lto_symtab_encoder_t encoder)
 
 static void
 add_node_to (lto_symtab_encoder_t encoder, struct cgraph_node *node,
-bool include_body)
+bool include_body, bool not_inlined)
 {
   if (node->clone_of)
-add_node_to (encoder, node->clone_of, include_body);
+add_node_to (encoder, node->clone_of, include_body, not_inlined);
+
+  int index = lto_symtab_encoder_encode (encoder, node);
+  gcc_checking_assert (encoder->nodes[index].node == node);
+
   if (include_body)
-lto_set_symtab_encoder_encode_body (encoder, node);
-  lto_symtab_encoder_encode (encoder, node);
+encoder->nodes[index].body = true;
+  if (not_inlined)
+encoder->nodes[index].only_for_inlining = false;
+}
+
+/* Add NODE into encoder as well as nodes it is cloned f

[gcc r15-4979] libstdc++: More user-friendly failed assertions from shared_ptr dereference

2024-11-06 Thread Jonathan Wakely via Gcc-cvs
https://gcc.gnu.org/g:1b169ee7e25129fede3aadbeb72037017d1d5a47

commit r15-4979-g1b169ee7e25129fede3aadbeb72037017d1d5a47
Author: Jonathan Wakely 
Date:   Wed Oct 30 11:41:47 2024 +

libstdc++: More user-friendly failed assertions from shared_ptr dereference

Currently dereferencing an empty shared_ptr prints a complicated
internal type in the assertion message:

include/bits/shared_ptr_base.h:1377: std::__shared_ptr_access<_Tp, _Lp, 
<anonymous>, <anonymous> >::element_type& std::__shared_ptr_access<_Tp, _Lp, 
<anonymous>, <anonymous> >::operator*() const [with _Tp = 
std::filesystem::__cxx11::recursive_directory_iterator::_Dir_stack; 
__gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic; bool <anonymous> = false; 
bool <anonymous> = false; element_type = 
std::filesystem::__cxx11::recursive_directory_iterator::_Dir_stack]: Assertion 
'_M_get() != nullptr' failed.

Users don't care about any of the _Lp and <anonymous> template
parameters, so this is unnecessarily verbose.

We can simplify it to something that only mentions "shared_ptr_deref"
and the element type:

include/bits/shared_ptr_base.h:1371: _Tp* std::__shared_ptr_deref(_Tp*) 
[with _Tp = filesystem::__cxx11::recursive_directory_iterator::_Dir_stack]: 
Assertion '__p != nullptr' failed.
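
As a rough illustration of the pattern described above (not the libstdc++
code itself), the null check can live in a small free function template so
that a failed assertion names only that function and the element type. The
names checked_deref and toy_ptr are hypothetical, and plain assert stands in
for __glibcxx_assert:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the helper described above: the assertion lives
// in a free function template, so its message mentions only this function
// and the element type T, not the wrapper's other template parameters.
template<typename T>
inline T*
checked_deref(T* p)
{
  assert(p != nullptr);  // the commit uses __glibcxx_assert here
  return p;
}

// Toy non-owning wrapper, for illustration only (not std::shared_ptr).
template<typename T>
class toy_ptr
{
public:
  explicit toy_ptr(T* p = nullptr) noexcept : ptr_(p) { }

  // Both accessors route through the same checked helper.
  T& operator*() const noexcept
  { return *checked_deref(ptr_); }

  T& operator[](std::ptrdiff_t i) const noexcept
  { return checked_deref(ptr_)[i]; }

private:
  T* ptr_;
};
```

With this shape, dereferencing a default-constructed toy_ptr<int> would trip
the assertion inside checked_deref<int> rather than inside a member function
of the wrapper.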

libstdc++-v3/ChangeLog:

* include/bits/shared_ptr_base.h (__shared_ptr_deref): New
function template.
(__shared_ptr_access, __shared_ptr_access<>): Use it.

Diff:
---
 libstdc++-v3/include/bits/shared_ptr_base.h | 28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h 
b/libstdc++-v3/include/bits/shared_ptr_base.h
index 9a7617e7014f..ee01594ce0c5 100644
--- a/libstdc++-v3/include/bits/shared_ptr_base.h
+++ b/libstdc++-v3/include/bits/shared_ptr_base.h
@@ -1337,6 +1337,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { };
 
 
+  template<typename _Tp>
+[[__gnu__::__always_inline__]]
+inline _Tp*
+__shared_ptr_deref(_Tp* __p)
+{
+  __glibcxx_assert(__p != nullptr);
+  return __p;
+}
+
   // Define operator* and operator-> for shared_ptr.
   template<typename _Tp, _Lock_policy _Lp,
           bool = is_array<_Tp>::value, bool = is_void<_Tp>::value>
@@ -1347,10 +1356,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   element_type&
   operator*() const noexcept
-  {
-   __glibcxx_assert(_M_get() != nullptr);
-   return *_M_get();
-  }
+  { return *std::__shared_ptr_deref(_M_get()); }
 
   element_type*
   operator->() const noexcept
@@ -1392,10 +1398,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   [[__deprecated__("shared_ptr::operator* is absent from C++17")]]
   element_type&
   operator*() const noexcept
-  {
-   __glibcxx_assert(_M_get() != nullptr);
-   return *_M_get();
-  }
+  { return *std::__shared_ptr_deref(_M_get()); }
 
   [[__deprecated__("shared_ptr::operator-> is absent from C++17")]]
   element_type*
@@ -1406,13 +1409,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 #endif
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wc++17-extensions"
   element_type&
   operator[](ptrdiff_t __i) const noexcept
   {
-   __glibcxx_assert(_M_get() != nullptr);
-   __glibcxx_assert(!extent<_Tp>::value || __i < extent<_Tp>::value);
-   return _M_get()[__i];
+   if constexpr (extent<_Tp>::value)
+ __glibcxx_assert(__i < extent<_Tp>::value);
+   return std::__shared_ptr_deref(_M_get())[__i];
   }
+#pragma GCC diagnostic pop
 
 private:
   element_type*


[gcc r15-4983] testsuite: add infinite recursion test case [PR63388]

2024-11-06 Thread David Malcolm via Gcc-cvs
https://gcc.gnu.org/g:85736ba8e1fc4a5003f958dd268a155e379e059f

commit r15-4983-g85736ba8e1fc4a5003f958dd268a155e379e059f
Author: David Malcolm 
Date:   Wed Nov 6 08:45:29 2024 -0500

testsuite: add infinite recursion test case [PR63388]

gcc/testsuite/ChangeLog:
PR c++/63388
* g++.dg/analyzer/infinite-recursion-pr63388.C: New test.

Signed-off-by: David Malcolm 

Diff:
---
 .../g++.dg/analyzer/infinite-recursion-pr63388.C| 21 +
 1 file changed, 21 insertions(+)

diff --git a/gcc/testsuite/g++.dg/analyzer/infinite-recursion-pr63388.C 
b/gcc/testsuite/g++.dg/analyzer/infinite-recursion-pr63388.C
new file mode 100644
index ..74af8cad3fff
--- /dev/null
+++ b/gcc/testsuite/g++.dg/analyzer/infinite-recursion-pr63388.C
@@ -0,0 +1,21 @@
+// { dg-do compile { target c++11 } }
+
+namespace std
+{
+  class ostream;
+  extern ostream cout;
+}
+
+enum class Month {jan=1, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, 
dec};
+
+std::ostream& operator<<(std::ostream& os, Month m)
+{
+  return os << m; // { dg-warning "infinite recursion" }
+}
+
+int main()
+{
+  Month m = Month::may;
+  std::cout << m;
+  return 0;
+}


[gcc r15-4976] store-merging: Apply --param=store-merging-max-size= in more spots [PR117439]

2024-11-06 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:6d8764cc1f938b3edee4ac26dc5d4d8dca74dc54

commit r15-4976-g6d8764cc1f938b3edee4ac26dc5d4d8dca74dc54
Author: Jakub Jelinek 
Date:   Wed Nov 6 10:22:13 2024 +0100

store-merging: Apply --param=store-merging-max-size= in more spots 
[PR117439]

Store merging assumes a merged region won't be too large.  The assumption is
e.g. in using inappropriate types in various spots (e.g. int for bit sizes
and bit positions in a few spots, or unsigned for the total size in bytes of
the merged region), in doing XNEWVEC for the whole total size of the merged
region and preparing everything in there and even that XALLOCAVEC in two
spots.  The last case is what was breaking the test below in the patch,
64MB XALLOCAVEC is just too large, but even with that fixed I think we just
shouldn't be merging gigabyte large merge groups.

We already have --param=store-merging-max-size= parameter, right now with
65536 bytes maximum (if needed, we could raise that limit a little bit).
That parameter is currently used when merging two adjacent stores: if the
size of the already merged bitregion together with the new store's bitregion
is above that limit, we don't merge those.
I guess initially that was sufficient, at that time a store was always
limited to MAX_BITSIZE_MODE_ANY_INT bits.
But later on we've added support for empty ctors ({} and even later
{CLOBBER}) and also added another spot where we merge further stores into
the merge group, if there is some overlap, we can merge various other stores
in one coalesce_immediate_stores iteration.
And, we weren't applying the --param=store-merging-max-size= parameter
in either of those cases.  So a single store can be gigabytes long, and
if there is some overlap, we can extend the region again to gigabytes in
size.

The following patch attempts to apply that parameter even in those cases.
When testing whether to merge the merged group with info (we've already
punted if those together are above the parameter) and some other stores,
the first two hunks punt if that would make the merge group too large.
The third hunk doesn't even add stores which are over the limit.
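
The size check described above can be sketched roughly as follows. This is a
simplified model, not the GCC implementation: bit_region, try_extend and
bits_per_unit are hypothetical names, and the byte-size arithmetic simply
mirrors the expression used in the first hunk of the diff.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned bits_per_unit = 8;  // assumption: 8-bit bytes

// Hypothetical simplification of a store's bit region (offsets in bits).
struct bit_region
{
  std::uint64_t start;
  std::uint64_t end;
};

// Grow GROUP so that it also covers OTHER, unless the result would exceed
// MAX_SIZE bytes.  Returns false and leaves GROUP untouched when the merge
// should be abandoned.
inline bool
try_extend(bit_region& group, const bit_region& other, unsigned max_size)
{
  assert(group.start <= group.end && other.start <= other.end);

  bit_region merged = group;
  if (other.start < merged.start)
    merged.start = other.start;
  if (other.end > merged.end)
    merged.end = other.end;

  // Same shape as the patch: size in bytes compared against the limit.
  if ((merged.end - merged.start + 1) / bits_per_unit > max_size)
    return false;

  group = merged;
  return true;
}
```

Computing the candidate region on a copy keeps the original group intact
when the limit is exceeded, which matches the "punt" behaviour the message
describes.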

2024-11-06  Jakub Jelinek  

PR tree-optimization/117439
* gimple-ssa-store-merging.cc
(imm_store_chain_info::coalesce_immediate_stores): Punt if merging 
of
any of the additional overlapping stores would result in growing the
bitregion size over param_store_merging_max_size.
(pass_store_merging::process_store): Terminate all aliasing chains
for stores with bitregion larger than param_store_merging_max_size.

* g++.dg/opt/pr117439.C: New test.

Diff:
---
 gcc/gimple-ssa-store-merging.cc | 21 -
 gcc/testsuite/g++.dg/opt/pr117439.C | 16 
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/gcc/gimple-ssa-store-merging.cc b/gcc/gimple-ssa-store-merging.cc
index 5962a281d801..0929defdf6cb 100644
--- a/gcc/gimple-ssa-store-merging.cc
+++ b/gcc/gimple-ssa-store-merging.cc
@@ -3246,6 +3246,10 @@ imm_store_chain_info::coalesce_immediate_stores ()
  unsigned int min_order = first_order;
  unsigned first_nonmergeable_int_order = ~0U;
  unsigned HOST_WIDE_INT this_end = end;
+ unsigned HOST_WIDE_INT this_bitregion_start
+   = new_bitregion_start;
+ unsigned HOST_WIDE_INT this_bitregion_end
+   = new_bitregion_end;
  k = i;
  first_nonmergeable_order = ~0U;
  for (unsigned int j = i + 1; j < len; ++j)
@@ -3269,6 +3273,19 @@ imm_store_chain_info::coalesce_immediate_stores ()
  k = 0;
  break;
}
+ if (info2->bitregion_start
+ < this_bitregion_start)
+   this_bitregion_start = info2->bitregion_start;
+ if (info2->bitregion_end
+ > this_bitregion_end)
+   this_bitregion_end = info2->bitregion_end;
+ if (((this_bitregion_end - this_bitregion_start
+   + 1) / BITS_PER_UNIT)
+ > (unsigned) param_store_merging_max_size)
+   {
+ k = 0;
+ break;
+   }
  k = j;
  min_order = MIN (min_order, info2->order);
  this_end = MAX (this_end,
@@ -5336,7 +5353,9 @@ pass_store_merging::process_

[gcc r15-4975] store-merging: Don't use sub_byte_op_p mode for empty_ctor_p unless necessary [PR117439]

2024-11-06 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:aab572240a0752da74029ed9f8918e0b1628e8ba

commit r15-4975-gaab572240a0752da74029ed9f8918e0b1628e8ba
Author: Jakub Jelinek 
Date:   Wed Nov 6 10:21:09 2024 +0100

store-merging: Don't use sub_byte_op_p mode for empty_ctor_p unless 
necessary [PR117439]

encode_tree_to_bitpos uses the more expensive sub_byte_op_p mode in which
it has to allocate a buffer and do various extra work like shifting the bits
etc. if bitlen or bitpos aren't multiples of BITS_PER_UNIT, or if bitlen
doesn't have a corresponding integer mode.
The last case is explained later in the comments:
  /* The native_encode_expr machinery uses TYPE_MODE to determine how many
 bytes to write.  This means it can write more than
 ROUND_UP (bitlen, BITS_PER_UNIT) / BITS_PER_UNIT bytes (for example
 write 8 bytes for a bitlen of 40).  Skip the bytes that are not within
 bitlen and zero out the bits that are not relevant as well (that may
 contain a sign bit due to sign-extension).  */
Now, we've later added empty_ctor_p support, either {} CONSTRUCTOR
or {CLOBBER}, which doesn't use native_encode_expr at all, just memset,
so that case doesn't need those fancy games unless bitlen or bitpos
aren't multiples of BITS_PER_UNIT (unlikely, but let's pretend it is
possible).

The following patch makes us use the fast path even for empty_ctor_p
which occupy full bytes, we can just memset that in the provided buffer and
don't need to XALLOCAVEC another buffer.
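
A minimal sketch of the revised predicate, under stated assumptions:
needs_sub_byte_path and has_integer_mode are hypothetical names, and
has_integer_mode only approximates int_mode_for_size (bitlen, 0).exists ()
by assuming integer modes of 8, 16, 32, 64 and 128 bits.

```cpp
#include <cassert>

constexpr int bits_per_unit = 8;  // assumption: 8-bit bytes

// Hypothetical approximation of int_mode_for_size (bitlen, 0).exists ():
// assume the target has integer modes of exactly these widths.
inline bool
has_integer_mode(int bitlen)
{
  return bitlen == 8 || bitlen == 16 || bitlen == 32
	 || bitlen == 64 || bitlen == 128;
}

// True when the slower sub-byte path is needed.  After the change, a
// missing integer mode no longer forces that path for empty constructors.
inline bool
needs_sub_byte_path(int bitlen, int bitpos, bool empty_ctor)
{
  assert(bitlen >= 0 && bitpos >= 0);
  return (bitlen % bits_per_unit) != 0
	 || (bitpos % bits_per_unit) != 0
	 || (!has_integer_mode(bitlen) && !empty_ctor);
}
```

For example, a byte-aligned 40-bit value still takes the slow path for an
ordinary expression but not for an empty constructor, which can simply be
memset in the provided buffer.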

This patch in itself fixes the testcase from the PR (which was about using
huge XALLOCAVEC), but I want to do some other changes, to be posted in a
next patch.

2024-11-06  Jakub Jelinek  

PR tree-optimization/117439
* gimple-ssa-store-merging.cc (encode_tree_to_bitpos): For
empty_ctor_p use !sub_byte_op_p even if bitlen doesn't have an
integral mode.

Diff:
---
 gcc/gimple-ssa-store-merging.cc | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/gimple-ssa-store-merging.cc b/gcc/gimple-ssa-store-merging.cc
index d4d078576259..5962a281d801 100644
--- a/gcc/gimple-ssa-store-merging.cc
+++ b/gcc/gimple-ssa-store-merging.cc
@@ -1934,14 +1934,15 @@ encode_tree_to_bitpos (tree expr, unsigned char *ptr, 
int bitlen, int bitpos,
   unsigned int total_bytes)
 {
   unsigned int first_byte = bitpos / BITS_PER_UNIT;
-  bool sub_byte_op_p = ((bitlen % BITS_PER_UNIT)
-   || (bitpos % BITS_PER_UNIT)
-   || !int_mode_for_size (bitlen, 0).exists ());
   bool empty_ctor_p
 = (TREE_CODE (expr) == CONSTRUCTOR
&& CONSTRUCTOR_NELTS (expr) == 0
&& TYPE_SIZE_UNIT (TREE_TYPE (expr))
-  && tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (expr))));
+   && tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (expr))));
+  bool sub_byte_op_p = ((bitlen % BITS_PER_UNIT)
+   || (bitpos % BITS_PER_UNIT)
+   || (!int_mode_for_size (bitlen, 0).exists ()
+   && !empty_ctor_p));
 
   if (!sub_byte_op_p)
 {


[gcc r15-4984] Add details output for assume processing.

2024-11-06 Thread Andrew Macleod via Gcc-cvs
https://gcc.gnu.org/g:137b26412f681bb1f8b3eb52b8f9efd79e6bda2a

commit r15-4984-g137b26412f681bb1f8b3eb52b8f9efd79e6bda2a
Author: Andrew MacLeod 
Date:   Tue Nov 5 12:52:51 2024 -0500

Add details output for assume processing.

The Assume pass simply produces results, with no indication of how it
arrived at the results it gets.  Add some output to the details listing.

The only functional change is when gori is used to calculate a range
more than once (ie, multiple uses), we now load the merged range rather
than just using the last calculated one.

* tree-assume.cc (assume_query::assume_query): Add debug output.
(assume_query::update_parms): Likewise.
(assume_query::calculate_phi): Likewise.
(assume_query::calculate_op): Likewise.  Also pick up any
merged path values.
(assume_query::calculate_stmt): Likewise.

Diff:
---
 gcc/tree-assume.cc | 134 +
 1 file changed, 115 insertions(+), 19 deletions(-)

diff --git a/gcc/tree-assume.cc b/gcc/tree-assume.cc
index dd279f581795..5c6e0832028e 100644
--- a/gcc/tree-assume.cc
+++ b/gcc/tree-assume.cc
@@ -18,6 +18,7 @@ You should have received a copy of the GNU General Public 
License
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
+#define INCLUDE_MEMORY
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -33,6 +34,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-iterator.h"
 #include "gimple-range.h"
 #include "tree-dfa.h"
+#include "tree-cfg.h"
+#include "gimple-pretty-print.h"
 
 // An assume query utilizes the current range query to implelemtn the assume
 // keyword.
@@ -145,7 +148,7 @@ assume_query::assume_query (function *f, bitmap p) : 
m_parm_list (p),
 process_stmts (def, lhs_range);
 
   if (dump_file)
-fprintf (dump_file, "Assumptions :\n--\n");
+fprintf (dump_file, "\n\nAssumptions :\n--\n");
 
   // Now export any interesting values that were found.
   bitmap_iterator bi;
@@ -159,6 +162,12 @@ assume_query::assume_query (function *f, bitmap p) : 
m_parm_list (p),
   if (m_parms.get_range (assume_range, name) && !assume_range.varying_p ())
set_range_info (name, assume_range);
 }
+
+  if (dump_file)
+   {
+ fputc ('\n', dump_file);
+ gimple_dump_cfg (dump_file, dump_flags & ~TDF_DETAILS);
+   }
 }
 
 // This function Will update all the current value of interesting parameters.
@@ -172,6 +181,9 @@ assume_query::assume_query (function *f, bitmap p) : 
m_parm_list (p),
 void
 assume_query::update_parms (fur_source &src)
 {
+  if (dump_file && (dump_flags & TDF_DETAILS))
+fprintf (dump_file, "\nupdate parameters\n");
+
   // Merge any parameter values.
   bitmap_iterator bi;
   unsigned x;
@@ -180,40 +192,85 @@ assume_query::update_parms (fur_source &src)
   tree name = ssa_name (x);
   tree type = TREE_TYPE (name);
 
-  // Find a valu efrom calculations.
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "PARAMETER ");
+ print_generic_expr (dump_file, name, TDF_SLIM);
+   }
   value_range glob_range (type);
-  if (!m_path.get_range (glob_range, name)
- && !src.get_operand (glob_range, name))
+  // Find a value from calculations.
+  // There will be a value in m_path if GORI calculated an operand value.
+  if (m_path.get_range (glob_range, name))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "\n  Calculated path range:");
+ glob_range.dump (dump_file);
+   }
+   }
+  // Otherwise, let ranger determine the range at the SRC location.
+  else if (src.get_operand (glob_range, name))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "\n  Ranger Computes path range:");
+ glob_range.dump (dump_file);
+   }
+   }
+  else
glob_range.set_varying (type);
 
-  // Find any current value of parm, and combine them.
+  // Find any current saved value of parm, and combine them.
   value_range parm_range (type);
   if (m_parms.get_range (parm_range, name))
glob_range.union_ (parm_range);
 
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "\n  Combine with previous range:");
+ parm_range.dump (dump_file);
+ fputc ('\n', dump_file);
+ print_generic_expr (dump_file, name, TDF_SLIM);
+ fprintf (dump_file, " = ");
+ glob_range.dump (dump_file);
+ fputc ('\n', dump_file);
+   }
   // Set this new value.
   m_parms.set_range (name, glob_range);
 }
   // Now reset the path values for the next path.
+  if (dump_file && (dump_flags & TDF_DETAILS))
+fprintf (dump_file,

[gcc r15-4980] libstdc++: Move include guards to start of headers

2024-11-06 Thread Jonathan Wakely via Libstdc++-cvs
https://gcc.gnu.org/g:6a050a3e650c7718c0a8cc948cd57706795f9805

commit r15-4980-g6a050a3e650c7718c0a8cc948cd57706795f9805
Author: Jonathan Wakely 
Date:   Tue Nov 5 12:54:32 2024 +

libstdc++: Move include guards to start of headers

libstdc++-v3/ChangeLog:

* include/c_compatibility/complex.h (_GLIBCXX_COMPLEX_H): Move
include guard to start of the header.
* include/c_global/ctgmath (_GLIBCXX_CTGMATH): Likewise.

Diff:
---
 libstdc++-v3/include/c_compatibility/complex.h | 6 +++---
 libstdc++-v3/include/c_global/ctgmath  | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/include/c_compatibility/complex.h 
b/libstdc++-v3/include/c_compatibility/complex.h
index 605d1f30e06e..a3102d9fce30 100644
--- a/libstdc++-v3/include/c_compatibility/complex.h
+++ b/libstdc++-v3/include/c_compatibility/complex.h
@@ -26,6 +26,9 @@
  *  This is a Standard C++ Library header.
  */
 
+#ifndef _GLIBCXX_COMPLEX_H
+#define _GLIBCXX_COMPLEX_H 1
+
 #include 
 
 #if __cplusplus >= 201103L
@@ -42,7 +45,4 @@
 # endif
 #endif
 
-#ifndef _GLIBCXX_COMPLEX_H
-#define _GLIBCXX_COMPLEX_H 1
-
 #endif
diff --git a/libstdc++-v3/include/c_global/ctgmath 
b/libstdc++-v3/include/c_global/ctgmath
index 79c1a029f41d..39c17668f16a 100644
--- a/libstdc++-v3/include/c_global/ctgmath
+++ b/libstdc++-v3/include/c_global/ctgmath
@@ -26,13 +26,13 @@
  *  This is a Standard C++ Library header.
  */
 
+#ifndef _GLIBCXX_CTGMATH
+#define _GLIBCXX_CTGMATH 1
+
 #ifdef _GLIBCXX_SYSHDR
 #pragma GCC system_header
 #endif
 
-#ifndef _GLIBCXX_CTGMATH
-#define _GLIBCXX_CTGMATH 1
-
 #if __cplusplus < 201103L
 #  include 
 #else


[gcc r15-4982] diagnostics: fix typo in comment

2024-11-06 Thread David Malcolm via Gcc-cvs
https://gcc.gnu.org/g:6f4977ee545ab81906dcdcc6e44b7d6ca1404652

commit r15-4982-g6f4977ee545ab81906dcdcc6e44b7d6ca1404652
Author: David Malcolm 
Date:   Wed Nov 6 08:45:29 2024 -0500

diagnostics: fix typo in comment

gcc/ChangeLog:
* diagnostic.h (class diagnostic_context): Fix typo in leading
comment.

Signed-off-by: David Malcolm 

Diff:
---
 gcc/diagnostic.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/diagnostic.h b/gcc/diagnostic.h
index a0056b41b2df..5b71523cf89b 100644
--- a/gcc/diagnostic.h
+++ b/gcc/diagnostic.h
@@ -483,7 +483,7 @@ struct diagnostic_counters
- being a central place for clients to report diagnostics
- reporting those diagnostics to zero or more output sinks
  (e.g. text vs SARIF)
-   - proving a "dump" member function for a debug dump of the state of
+   - providing a "dump" member function for a debug dump of the state of
  the diagnostics subsytem
- direct vs buffered diagnostics (see class diagnostic_buffer)
- tracking the original argv of the program (for SARIF output)


[gcc r15-4987] openmp: Add IFN_GOMP_MAX_VF

2024-11-06 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:2a2e6e9894f42fef9315aaad80c36843718ca0cb

commit r15-4987-g2a2e6e9894f42fef9315aaad80c36843718ca0cb
Author: Andrew Stubbs 
Date:   Fri Nov 1 15:00:25 2024 +

openmp: Add IFN_GOMP_MAX_VF

Delay omp_max_vf call until after the host and device compilers have
diverged so that the max_vf value can be tuned exactly right on both
variants.

This change means that the ompdevlow pass must be enabled for functions that
use OpenMP directives with both "simd" and "schedule" enabled.
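
For context, the adjustment that omp_adjust_chunk_size builds from the max_vf
value rounds the chunk size up to a multiple of the vectorization factor. A
rough standalone sketch of that arithmetic follows; round_chunk_to_vf is a
hypothetical name, plain integers stand in for GCC trees, and vf is assumed
to be a power of two.

```cpp
#include <cassert>
#include <cstdint>

// Round CHUNK_SIZE up to the next multiple of VF using the same
// "(chunk_size + (vf - 1)) & -vf" shape as the expression built in the
// patch.  Assumption: VF is a nonzero power of two.
inline std::uint64_t
round_chunk_to_vf(std::uint64_t chunk_size, std::uint64_t vf)
{
  assert(vf != 0 && (vf & (vf - 1)) == 0);

  // A zero chunk size or a vf of 1 needs no adjustment.
  if (chunk_size == 0 || vf == 1)
    return chunk_size;

  // ~(vf - 1) equals -vf in two's complement arithmetic.
  return (chunk_size + (vf - 1)) & ~(vf - 1);
}
```

Because the host and the offload device may have different vectorization
factors, the same source-level chunk size can round to different values on
each side, which is why the value is resolved only after the two compilers
have diverged.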

gcc/ChangeLog:

* internal-fn.cc (expand_GOMP_MAX_VF): New function.
* internal-fn.def (GOMP_MAX_VF): New internal function.
* omp-expand.cc (omp_adjust_chunk_size): Emit IFN_GOMP_MAX_VF when
called in offload context, otherwise assume host context.
* omp-offload.cc (execute_omp_device_lower): Expand IFN_GOMP_MAX_VF.

Diff:
---
 gcc/internal-fn.cc  |  8 
 gcc/internal-fn.def |  1 +
 gcc/omp-expand.cc   | 30 ++
 gcc/omp-offload.cc  |  3 +++
 4 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 1b3fe7be0479..0ee5f5bc7c55 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -510,6 +510,14 @@ expand_GOMP_SIMT_VF (internal_fn, gcall *)
 
 /* This should get expanded in omp_device_lower pass.  */
 
+static void
+expand_GOMP_MAX_VF (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
+/* This should get expanded in omp_device_lower pass.  */
+
 static void
 expand_GOMP_TARGET_REV (internal_fn, gcall *)
 {
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 2d4559382711..c3d0efc0f2c3 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -465,6 +465,7 @@ DEF_INTERNAL_FN (GOMP_SIMT_ENTER_ALLOC, ECF_LEAF | 
ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_EXIT, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_VF, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (GOMP_MAX_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_LAST_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, 
NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_ORDERED_PRED, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_VOTE_ANY, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc
index b0f9d375b6c7..80fb1843445d 100644
--- a/gcc/omp-expand.cc
+++ b/gcc/omp-expand.cc
@@ -229,15 +229,29 @@ omp_adjust_chunk_size (tree chunk_size, bool 
simd_schedule, bool offload)
   if (!simd_schedule || integer_zerop (chunk_size))
 return chunk_size;
 
-  poly_uint64 vf = omp_max_vf (offload);
-  if (known_eq (vf, 1U))
-return chunk_size;
-
+  tree vf;
   tree type = TREE_TYPE (chunk_size);
-  chunk_size = fold_build2 (PLUS_EXPR, type, chunk_size,
-   build_int_cst (type, vf - 1));
-  return fold_build2 (BIT_AND_EXPR, type, chunk_size,
- build_int_cst (type, -vf));
+
+  if (offload)
+{
+  cfun->curr_properties &= ~PROP_gimple_lomp_dev;
+  vf = build_call_expr_internal_loc (UNKNOWN_LOCATION, IFN_GOMP_MAX_VF,
+unsigned_type_node, 0);
+  vf = fold_convert (type, vf);
+}
+  else
+{
+  poly_uint64 vf_num = omp_max_vf (false);
+  if (known_eq (vf_num, 1U))
+   return chunk_size;
+  vf = build_int_cst (type, vf_num);
+}
+
+  tree vf_minus_one = fold_build2 (MINUS_EXPR, type, vf,
+  build_int_cst (type, 1));
+  tree negative_vf = fold_build1 (NEGATE_EXPR, type, vf);
+  chunk_size = fold_build2 (PLUS_EXPR, type, chunk_size, vf_minus_one);
+  return fold_build2 (BIT_AND_EXPR, type, chunk_size, negative_vf);
 }
 
 /* Collect additional arguments needed to emit a combined
diff --git a/gcc/omp-offload.cc b/gcc/omp-offload.cc
index 25ce8133fe5e..372b019f9d60 100644
--- a/gcc/omp-offload.cc
+++ b/gcc/omp-offload.cc
@@ -2754,6 +2754,9 @@ execute_omp_device_lower ()
  case IFN_GOMP_SIMT_VF:
rhs = build_int_cst (type, vf);
break;
+ case IFN_GOMP_MAX_VF:
+   rhs = build_int_cst (type, omp_max_vf (false));
+   break;
  case IFN_GOMP_SIMT_ORDERED_PRED:
rhs = vf == 1 ? integer_zero_node : NULL_TREE;
if (rhs || !lhs)


[gcc r15-4988] openmp: Add testcases for omp_max_vf

2024-11-06 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:d334f729e53867b838e867375b3f475ba793d96e

commit r15-4988-gd334f729e53867b838e867375b3f475ba793d96e
Author: Andrew Stubbs 
Date:   Wed Nov 6 12:26:08 2024 +

openmp: Add testcases for omp_max_vf

Ensure that GOMP_MAX_VF does the right thing for explicit schedules when
offloading is enabled ("target" directives are present), and is inactive
otherwise.

libgomp/ChangeLog:

* testsuite/libgomp.c/max_vf-1.c: New test.
* testsuite/libgomp.c/max_vf-2.c: New test.

gcc/testsuite/ChangeLog:

* gcc.dg/gomp/max_vf-1.c: New test.

Diff:
---
 gcc/testsuite/gcc.dg/gomp/max_vf-1.c   | 37 ++
 libgomp/testsuite/libgomp.c/max_vf-1.c | 47 ++
 libgomp/testsuite/libgomp.c/max_vf-2.c | 21 +++
 3 files changed, 105 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/gomp/max_vf-1.c 
b/gcc/testsuite/gcc.dg/gomp/max_vf-1.c
new file mode 100644
index ..0513aae226ce
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/max_vf-1.c
@@ -0,0 +1,37 @@
+/* Test that omp parallel simd schedule uses the correct max_vf for the
+   host system, when no target directives are present.  */
+
+/* { dg-do compile } */
+/* { dg-options "-fopenmp -O2 -fdump-tree-ompexp" } */
+
+/* Fix a max_vf size so we can scan for it.
+{ dg-additional-options "-msse2" { target { x86_64-*-* i?86-*-* } } } */
+
+#define N 1024
+int a[N], b[N], c[N];
+
+void
+f2 (void)
+{
+  int i;
+  #pragma omp parallel for simd schedule (simd: static, 7)
+  for (i = 0; i < N; i++)
+a[i] = b[i] + c[i];
+}
+
+/* Make sure the max_vf is inlined as a number.
+   Hopefully there are no unrelated uses of these numbers ...
+{ dg-final { scan-tree-dump-times {\* 16} 2 "ompexp" { target { x86_64-*-* } } 
} }
+{ dg-final { scan-tree-dump-times {\+ 16} 1 "ompexp" { target { x86_64-*-* } } 
} } */
+
+void
+f3 (int *a, int *b, int *c)
+{
+  int i;
+  #pragma omp parallel for simd schedule (simd : dynamic, 7)
+  for (i = 0; i < N; i++)
+a[i] = b[i] + c[i];
+}
+
+/* Make sure the max_vf is inlined as a number.
+{ dg-final { scan-tree-dump-times 
{__builtin_GOMP_parallel_loop_nonmonotonic_dynamic \(.*, 16, 0\);} 1 "ompexp" { 
target { x86_64-*-* } } } } */
diff --git a/libgomp/testsuite/libgomp.c/max_vf-1.c 
b/libgomp/testsuite/libgomp.c/max_vf-1.c
new file mode 100644
index ..be900c565a37
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/max_vf-1.c
@@ -0,0 +1,47 @@
+/* Test that omp parallel simd schedule uses the correct max_vf for the
+   host system, when target directives are present.  */
+
+/* { dg-require-effective-target offloading_enabled } */
+
+/* { dg-do link } */
+/* { dg-options "-fopenmp -O2 -fdump-tree-ompexp 
-foffload=-fdump-tree-optimized" } */
+
+/* Fix a max_vf size so we can scan for it.
+{ dg-additional-options "-msse2" { target { x86_64-*-* i?86-*-* } } } */
+
+#define N 1024
+int a[N], b[N], c[N];
+
+/* Test both static schedules and inline target directives.  */
+void
+f2 (void)
+{
+  int i;
+  #pragma omp target parallel for simd schedule (simd: static, 7)
+  for (i = 0; i < N; i++)
+a[i] = b[i] + c[i];
+}
+
+/* Test both dynamic schedules and declare target functions.  */
+#pragma omp declare target
+void
+f3 (int *a, int *b, int *c)
+{
+  int i;
+  #pragma omp parallel for simd schedule (simd : dynamic, 7)
+  for (i = 0; i < N; i++)
+a[i] = b[i] + c[i];
+}
+#pragma omp end declare target
+
+/* Make sure that the max_vf is used as an IFN.
+{ dg-final { scan-tree-dump-times {GOMP_MAX_VF} 2 "ompexp" { target { 
x86_64-*-* i?86-*-* } } } } */
+
+/* Make sure the max_vf is passed as a temporary variable.
+{ dg-final { scan-tree-dump-times 
{__builtin_GOMP_parallel_loop_nonmonotonic_dynamic \(.*, D\.[0-9]*, 0\);} 1 
"ompexp" { target { x86_64-*-* i?86-*-* } } } } */
+
+/* Test SIMD offload devices
+{ dg-final { scan-offload-tree-dump-times 
{__builtin_GOMP_parallel_loop_nonmonotonic_dynamic \(.*, 64, 0\);} 1 
"optimized" { target { offload_gcn } } } } 
+{ dg-final { scan-offload-tree-dump-times 
{__builtin_GOMP_parallel_loop_nonmonotonic_dynamic \(.*, 7, 0\);} 1 "optimized" 
{ target { offload_nvptx } } } } */
+
+int main() {}
diff --git a/libgomp/testsuite/libgomp.c/max_vf-2.c 
b/libgomp/testsuite/libgomp.c/max_vf-2.c
new file mode 100644
index ..91744c309df8
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/max_vf-2.c
@@ -0,0 +1,21 @@
+/* Ensure that the default safelen is set correctly for the larger of the host
+   and offload device, to prevent defeating the vectorizer.  */
+ 
+/* { dg-require-effective-target offloading_enabled } */
+
+/* { dg-do link } */
+/* { dg-options "-fopenmp -O2 -fdump-tree-omplower" } */
+
+int f(float *a, float *b, int n)
+{
+  float sum = 0;
+  #pragma omp target teams distribute parallel for simd map(tofrom: sum) 
reduction(+:sum)
+  for (int i = 0; i < n; i++)
+sum += a[i] * b[i];
+  return sum;
+}
+
+/* Make sure that 

[gcc r15-4985] openmp: Tune omp_max_vf for offload targets

2024-11-06 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:5c9de3df8547682bfb3d484d7d28a27776bf979c

commit r15-4985-g5c9de3df8547682bfb3d484d7d28a27776bf979c
Author: Andrew Stubbs 
Date:   Mon Oct 21 12:29:54 2024 +

openmp: Tune omp_max_vf for offload targets

If requested, return the vectorization factor appropriate for the offload
device, if any.

This change gives a significant speedup in the BabelStream "dot" benchmark on
amdgcn.

The omp_adjust_chunk_size use case is set to "false", for now, but I intend to
change that in a follow-up patch.

Note that NVPTX SIMT offload does not use this code-path.
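
The device lookup described above can be sketched outside the compiler as a
scan of a colon-separated list of target names.  This is only a minimal
standalone sketch, not the GCC code: list_has_prefix and pick_max_vf are
hypothetical names, the list is passed in as a plain string instead of being
read from the environment, and the width 64 is simply the value used in the
patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return true if any entry of the colon-separated
   list NAMES starts with PREFIX.  NAMES may be NULL.  */
static bool
list_has_prefix (const char *names, const char *prefix)
{
  size_t n = strlen (prefix);
  for (const char *c = names; c;)
    {
      if (strncmp (c, prefix, n) == 0)
        return true;
      /* Advance to the entry after the next ':' (or stop).  */
      c = strchr (c, ':');
      if (c)
        c++;
    }
  return false;
}

/* Hypothetical helper: pick the larger of the host factor and 64 when an
   amdgcn entry is present, otherwise fall through to the host factor.  */
static unsigned
pick_max_vf (const char *names, unsigned host_vf)
{
  if (list_has_prefix (names, "amdgcn"))
    return host_vf > 64 ? host_vf : 64;
  return host_vf;
}
```

With a list such as "nvptx-none:amdgcn-amdhsa" and a host factor of 16 the
sketch yields 64; with no amdgcn entry it returns the host factor unchanged.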

gcc/ChangeLog:

* gimple-loop-versioning.cc (loop_versioning::loop_versioning): Set
omp_max_vf to offload == false.
* omp-expand.cc (omp_adjust_chunk_size): Likewise.
* omp-general.cc (omp_max_vf): Add "offload" parameter, and detect
amdgcn offload devices.
* omp-general.h (omp_max_vf): Likewise.
* omp-low.cc (lower_rec_simd_input_clauses): Pass offload state to
omp_max_vf.

Diff:
---
 gcc/gimple-loop-versioning.cc |  2 +-
 gcc/omp-expand.cc |  2 +-
 gcc/omp-general.cc| 17 +++--
 gcc/omp-general.h |  2 +-
 gcc/omp-low.cc|  3 ++-
 5 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/gcc/gimple-loop-versioning.cc b/gcc/gimple-loop-versioning.cc
index 107b00200247..2968c929d04a 100644
--- a/gcc/gimple-loop-versioning.cc
+++ b/gcc/gimple-loop-versioning.cc
@@ -554,7 +554,7 @@ loop_versioning::loop_versioning (function *fn)
  handled efficiently by scalar code.  omp_max_vf calculates the
  maximum number of bytes in a vector, when such a value is relevant
  to loop optimization.  */
-  m_maximum_scale = estimated_poly_value (omp_max_vf ());
+  m_maximum_scale = estimated_poly_value (omp_max_vf (false));
   m_maximum_scale = MAX (m_maximum_scale, MAX_FIXED_MODE_SIZE);
 }
 
diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc
index b0b4ddf5dbc8..907fd46a5b26 100644
--- a/gcc/omp-expand.cc
+++ b/gcc/omp-expand.cc
@@ -212,7 +212,7 @@ omp_adjust_chunk_size (tree chunk_size, bool simd_schedule)
   if (!simd_schedule || integer_zerop (chunk_size))
 return chunk_size;
 
-  poly_uint64 vf = omp_max_vf ();
+  poly_uint64 vf = omp_max_vf (false);
   if (known_eq (vf, 1U))
 return chunk_size;
 
diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index f74b9bf5e96c..1ae575ee181f 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -987,10 +987,11 @@ find_combined_omp_for (tree *tp, int *walk_subtrees, void *data)
   return NULL_TREE;
 }
 
-/* Return maximum possible vectorization factor for the target.  */
+/* Return maximum possible vectorization factor for the target, or for
+   the OpenMP offload target if one exists.  */
 
 poly_uint64
-omp_max_vf (void)
+omp_max_vf (bool offload)
 {
   if (!optimize
   || optimize_debug
@@ -999,6 +1000,18 @@ omp_max_vf (void)
  && OPTION_SET_P (flag_tree_loop_vectorize)))
 return 1;
 
+  if (ENABLE_OFFLOADING && offload)
+{
+  for (const char *c = getenv ("OFFLOAD_TARGET_NAMES"); c;)
+   {
+ if (startswith (c, "amdgcn"))
+   return ordered_max (64, omp_max_vf (false));
+ else if ((c = strchr (c, ':')))
+   c++;
+   }
+  /* Otherwise, fall through to host VF.  */
+}
+
   auto_vector_modes modes;
   targetm.vectorize.autovectorize_vector_modes (&modes, true);
   if (!modes.is_empty ())
diff --git a/gcc/omp-general.h b/gcc/omp-general.h
index f37781316269..70f78d2055b7 100644
--- a/gcc/omp-general.h
+++ b/gcc/omp-general.h
@@ -162,7 +162,7 @@ extern void omp_extract_for_data (gomp_for *for_stmt, struct omp_for_data *fd,
  struct omp_for_data_loop *loops);
 extern gimple *omp_build_barrier (tree lhs);
 extern tree find_combined_omp_for (tree *, int *, void *);
-extern poly_uint64 omp_max_vf (void);
+extern poly_uint64 omp_max_vf (bool);
 extern int omp_max_simt_vf (void);
 extern const char *omp_context_name_list_prop (tree);
 extern void omp_construct_traits_to_codes (tree, int, enum tree_code *);
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 44c4310075bf..70a2c108fbca 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -4589,7 +4589,8 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,
 {
   if (known_eq (sctx->max_vf, 0U))
 {
-  sctx->max_vf = sctx->is_simt ? omp_max_simt_vf () : omp_max_vf ();
+  sctx->max_vf = (sctx->is_simt ? omp_max_simt_vf ()
+ : omp_max_vf (omp_maybe_offloaded_ctx (ctx)));
   if (maybe_gt (sctx->max_vf, 1U))
{
  tree c = omp_find_clause (gimple_omp_for_clauses (ctx->stmt),


[gcc r15-4986] openmp: use offload max_vf for chunk_size

2024-11-06 Thread Andrew Stubbs via Gcc-cvs
https://gcc.gnu.org/g:896c6c28939f0b1eb6582231d24ea07ce01d071e

commit r15-4986-g896c6c28939f0b1eb6582231d24ea07ce01d071e
Author: Andrew Stubbs 
Date:   Fri Nov 1 13:53:34 2024 +

openmp: use offload max_vf for chunk_size

The chunk size for SIMD loops should be right for the current device; too big
allocates too much memory, too small is inefficient.  Getting it wrong doesn't
actually break anything though.

This patch attempts to choose the optimal setting based on the context.  Both
host-fallback and device will get the same chunk size, but device performance
is the most important in this case.
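
As a rough illustration of why the vectorization factor matters here, the
sketch below rounds a requested chunk size up to a multiple of the factor.
The early exits follow the conditions visible in the diff; the rounding
formula itself is not shown in this diff and is an assumption, as is the
restriction to power-of-two factors.  adjust_chunk_size is a hypothetical
standalone name, not the GCC function.

```c
/* Hypothetical standalone version of the chunk adjustment.  VF is assumed
   to be a power of two.  */
static long
adjust_chunk_size (long chunk_size, int simd_schedule, long vf)
{
  /* Without the simd schedule modifier, or with a zero chunk, nothing
     changes.  */
  if (!simd_schedule || chunk_size == 0)
    return chunk_size;
  /* A factor of 1 means no vectorization, so keep the request as-is.  */
  if (vf == 1)
    return chunk_size;
  /* Assumed rounding: next multiple of VF at or above CHUNK_SIZE.  */
  return (chunk_size + vf - 1) & -vf;
}
```

Under these assumptions a request of 7 becomes 64 when the factor is 64 and
stays 7 when the factor is 1, which is in line with the values the max_vf
testcase scans for on amdgcn and nvptx.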

gcc/ChangeLog:

* omp-expand.cc (is_in_offload_region): New function.
(omp_adjust_chunk_size): Add pass-through "offload" parameter.
(get_ws_args_for): Likewise.
(determine_parallel_type): Use is_in_offload_region to adjust call to
get_ws_args_for.
(expand_omp_for_generic): Likewise.
(expand_omp_for_static_chunk): Likewise.

Diff:
---
 gcc/omp-expand.cc | 36 
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc
index 907fd46a5b26..b0f9d375b6c7 100644
--- a/gcc/omp-expand.cc
+++ b/gcc/omp-expand.cc
@@ -127,6 +127,23 @@ is_combined_parallel (struct omp_region *region)
   return region->is_combined_parallel;
 }
 
+/* Return true if REGION is or is contained within an offload region.  */
+
+static bool
+is_in_offload_region (struct omp_region *region)
+{
+  gimple *entry_stmt = last_nondebug_stmt (region->entry);
+  if (is_gimple_omp (entry_stmt)
+  && is_gimple_omp_offloaded (entry_stmt))
+return true;
+  else if (region->outer)
+return is_in_offload_region (region->outer);
+  else
+return (lookup_attribute ("omp declare target",
+ DECL_ATTRIBUTES (current_function_decl))
+   != NULL);
+}
+
 /* Given two blocks PAR_ENTRY_BB and WS_ENTRY_BB such that WS_ENTRY_BB
is the immediate dominator of PAR_ENTRY_BB, return true if there
are no data dependencies that would prevent expanding the parallel
@@ -207,12 +224,12 @@ workshare_safe_to_combine_p (basic_block ws_entry_bb)
presence (SIMD_SCHEDULE).  */
 
 static tree
-omp_adjust_chunk_size (tree chunk_size, bool simd_schedule)
+omp_adjust_chunk_size (tree chunk_size, bool simd_schedule, bool offload)
 {
   if (!simd_schedule || integer_zerop (chunk_size))
 return chunk_size;
 
-  poly_uint64 vf = omp_max_vf (false);
+  poly_uint64 vf = omp_max_vf (offload);
   if (known_eq (vf, 1U))
 return chunk_size;
 
@@ -228,7 +245,7 @@ omp_adjust_chunk_size (tree chunk_size, bool simd_schedule)
expanded.  */
 
 static vec<tree, va_gc> *
-get_ws_args_for (gimple *par_stmt, gimple *ws_stmt)
+get_ws_args_for (gimple *par_stmt, gimple *ws_stmt, bool offload)
 {
   tree t;
   location_t loc = gimple_location (ws_stmt);
@@ -270,7 +287,7 @@ get_ws_args_for (gimple *par_stmt, gimple *ws_stmt)
   if (fd.chunk_size)
{
  t = fold_convert_loc (loc, long_integer_type_node, fd.chunk_size);
- t = omp_adjust_chunk_size (t, fd.simd_schedule);
+ t = omp_adjust_chunk_size (t, fd.simd_schedule, offload);
  ws_args->quick_push (t);
}
 
@@ -366,7 +383,8 @@ determine_parallel_type (struct omp_region *region)
 
   region->is_combined_parallel = true;
   region->inner->is_combined_parallel = true;
-  region->ws_args = get_ws_args_for (par_stmt, ws_stmt);
+  region->ws_args = get_ws_args_for (par_stmt, ws_stmt,
+is_in_offload_region (region));
 }
 }
 
@@ -3929,6 +3947,7 @@ expand_omp_for_generic (struct omp_region *region,
   tree *counts = NULL;
   int i;
   bool ordered_lastprivate = false;
+  bool offload = is_in_offload_region (region);
 
   gcc_assert (!broken_loop || !in_combined_parallel);
   gcc_assert (fd->iter_type == long_integer_type_node
@@ -4196,7 +4215,7 @@ expand_omp_for_generic (struct omp_region *region,
  if (fd->chunk_size)
{
  t = fold_convert (fd->iter_type, fd->chunk_size);
- t = omp_adjust_chunk_size (t, fd->simd_schedule);
+ t = omp_adjust_chunk_size (t, fd->simd_schedule, offload);
  if (sched_arg)
{
  if (fd->ordered)
@@ -4240,7 +4259,7 @@ expand_omp_for_generic (struct omp_region *region,
{
  tree bfn_decl = builtin_decl_explicit (start_fn);
  t = fold_convert (fd->iter_type, fd->chunk_size);
- t = omp_adjust_chunk_size (t, fd->simd_schedule);
+ t = omp_adjust_chunk_size (t, fd->simd_schedule, offload);
  if (sched_arg)
t = build_call_expr (bfn_decl, 10, t5, t0, t1, t2, sched_arg,
 t, t3, t4, reductions, mem);
@@ -5937,7 +5956,8 @@ expand_omp_for_static_chunk (struct om

[gcc r15-4995] RISC-V: Add testcases for signed imm SAT_ADD form1

2024-11-06 Thread Li Xu via Gcc-cvs
https://gcc.gnu.org/g:1e2ae65a7f01fa3dcdbfd1bb5bc87b860172336d

commit r15-4995-g1e2ae65a7f01fa3dcdbfd1bb5bc87b860172336d
Author: xuli 
Date:   Mon Nov 4 10:00:45 2024 +

RISC-V: Add testcases for signed imm SAT_ADD form1

This patch adds testcases for form1, as shown below:

T __attribute__((noinline))  \
sat_s_add_imm_##T##_fmt_1##_##INDEX (T x) \
{\
  T sum = (UT)x + (UT)IMM; \
  return (x ^ IMM) < 0 \
? sum\
: (sum ^ x) >= 0 \
  ? sum  \
  : x < 0 ? MIN : MAX;   \
}

Passed the rv64gcv regression test.
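
To make the macro concrete, here is a hand-expanded instance for T = int8_t,
UT = uint8_t and IMM = -10, checked against a clamp-based reference.  This is
a sketch for illustration only: sat_add_ref and count_mismatches are
hypothetical helpers that are not part of the testsuite, and the narrowing
conversion to int8_t is assumed to wrap modulo 256, as GCC does.

```c
#include <stdint.h>

/* Form 1 expanded by hand with IMM = -10.  */
static int8_t
sat_s_add_imm_int8_t_fmt_1_0 (int8_t x)
{
  /* Wrapping sum; out-of-range narrowing is implementation-defined and
     assumed here to wrap.  */
  int8_t sum = (int8_t) ((uint8_t) x + (uint8_t) -10);
  return (x ^ -10) < 0
    ? sum                       /* Signs differ: cannot overflow.  */
    : (sum ^ x) >= 0
      ? sum                     /* Sign preserved: no overflow.  */
      : x < 0 ? INT8_MIN : INT8_MAX;
}

/* Hypothetical reference: add in int, then clamp to the int8_t range.  */
static int
sat_add_ref (int x, int imm)
{
  int r = x + imm;
  if (r < INT8_MIN)
    return INT8_MIN;
  if (r > INT8_MAX)
    return INT8_MAX;
  return r;
}

/* Count inputs where form 1 and the reference disagree.  */
static int
count_mismatches (void)
{
  int bad = 0;
  for (int x = INT8_MIN; x <= INT8_MAX; x++)
    if (sat_s_add_imm_int8_t_fmt_1_0 ((int8_t) x) != sat_add_ref (x, -10))
      bad++;
  return bad;
}
```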

Signed-off-by: Li Xu 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Support signed
imm SAT_ADD form1.
* gcc.target/riscv/sat_s_add_imm-1-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-2-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-2.c: New test.
* gcc.target/riscv/sat_s_add_imm-3-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-3.c: New test.
* gcc.target/riscv/sat_s_add_imm-4.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-2.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-3.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-4.c: New test.

Diff:
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h | 15 
 gcc/testsuite/gcc.target/riscv/sat_s_add_imm-1-1.c | 10 ++
 gcc/testsuite/gcc.target/riscv/sat_s_add_imm-1.c   | 30 
 gcc/testsuite/gcc.target/riscv/sat_s_add_imm-2-1.c | 10 ++
 gcc/testsuite/gcc.target/riscv/sat_s_add_imm-2.c   | 33 +
 gcc/testsuite/gcc.target/riscv/sat_s_add_imm-3-1.c | 10 ++
 gcc/testsuite/gcc.target/riscv/sat_s_add_imm-3.c   | 31 
 gcc/testsuite/gcc.target/riscv/sat_s_add_imm-4.c   | 29 +++
 .../gcc.target/riscv/sat_s_add_imm-run-1.c | 42 ++
 .../gcc.target/riscv/sat_s_add_imm-run-2.c | 42 ++
 .../gcc.target/riscv/sat_s_add_imm-run-3.c | 42 ++
 .../gcc.target/riscv/sat_s_add_imm-run-4.c | 42 ++
 12 files changed, 336 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 2cbd1f18c8d2..b334e7f630c5 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -176,6 +176,21 @@ sat_s_add_##T##_fmt_4 (T x, T y)   \
 #define RUN_SAT_S_ADD_FMT_4(T, x, y) sat_s_add_##T##_fmt_4(x, y)
 #define RUN_SAT_S_ADD_FMT_4_WRAP(T, x, y) RUN_SAT_S_ADD_FMT_4(T, x, y)
 
+#define DEF_SAT_S_ADD_IMM_FMT_1(INDEX, T, UT, IMM, MIN, MAX) \
+T __attribute__((noinline))  \
+sat_s_add_imm_##T##_fmt_1##_##INDEX (T x) \
+{\
+  T sum = (UT)x + (UT)IMM; \
+  return (x ^ IMM) < 0 \
+? sum\
+: (sum ^ x) >= 0 \
+  ? sum  \
+  : x < 0 ? MIN : MAX;   \
+}
+
+#define RUN_SAT_S_ADD_IMM_FMT_1(INDEX, T, x, expect) \
+  if (sat_s_add_imm##_##T##_fmt_1##_##INDEX(x) != expect) __builtin_abort ()
+
 
/**/
 /* Saturation Sub (Unsigned and Signed)   
*/
 
/**/
diff --git a/gcc/testsuite/gcc.target/riscv/sat_s_add_imm-1-1.c b/gcc/testsuite/gcc.target/riscv/sat_s_add_imm-1-1.c
new file mode 100644
index ..f20f9b0c477c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_s_add_imm-1-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-skip-if  "" { *-*-* } { "-flto" } } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details" } */
+
+#include "sat_arith.h"
+
+DEF_SAT_S_ADD_IMM_FMT_1(0, int8_t, uint8_t, -129, INT8_MIN, INT8_MAX)
+DEF_SAT_S_ADD_IMM_FMT_1(1, int8_t, uint8_t, 128, INT8_MIN, INT8_MAX)
+
+/* { dg-final { scan-rtl-dump-not ".SAT_ADD " "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_s_add_imm-1.c b/gcc/testsuite/gcc.target/riscv/sat_s_add_imm-1.c
new file mode 100644
index ..6746caf02f6b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_s_add_imm-1.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-skip-if  "" { *-*-* } { "-flto" } } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-sche

[gcc r15-4994] Match:Support signed imm SAT_ADD form1

2024-11-06 Thread Li Xu via Gcc-cvs
https://gcc.gnu.org/g:da31786910f253bba062d8f7126b269c432083ff

commit r15-4994-gda31786910f253bba062d8f7126b269c432083ff
Author: xuli 
Date:   Wed Nov 6 01:56:09 2024 +

Match:Support signed imm SAT_ADD form1

This patch adds support for .SAT_ADD when one of the operands
is a signed IMM.

Form1:
T __attribute__((noinline))  \
sat_s_add_imm_##T##_fmt_1##_##INDEX (T x) \
{\
  T sum = (UT)x + (UT)IMM; \
  return (x ^ IMM) < 0 \
? sum\
: (sum ^ x) >= 0 \
  ? sum  \
  : x < 0 ? MIN : MAX;   \
}

Take the form1 instance below as an example:
DEF_SAT_S_ADD_IMM_FMT_1(0, int8_t, uint8_t, -10, INT8_MIN, INT8_MAX)

Before this patch:
__attribute__((noinline))
int8_t sat_s_add_imm_int8_t_fmt_1_0 (int8_t x)
{
  int8_t sum;
  unsigned char x.0_1;
  unsigned char _2;
  signed char _4;
  int8_t _5;
  _Bool _9;
  signed char _10;
  signed char _11;
  signed char _12;
  signed char _14;
  signed char _16;

   [local count: 1073741824]:
  x.0_1 = (unsigned char) x_6(D);
  _2 = x.0_1 + 246;
  sum_7 = (int8_t) _2;
  _4 = x_6(D) ^ sum_7;
  _16 = x_6(D) ^ 9;
  _14 = _4 & _16;
  if (_14 < 0)
goto ; [41.00%]
  else
goto ; [59.00%]

   [local count: 259738147]:
  _9 = x_6(D) < 0;
  _10 = (signed char) _9;
  _11 = -_10;
  _12 = _11 ^ 127;

   [local count: 1073741824]:
  # _5 = PHI 
  return _5;

}

After this patch:
__attribute__((noinline))
int8_t sat_s_add_imm_int8_t_fmt_1_0 (int8_t x)
{
  int8_t _5;

   [local count: 1073741824]:
  _5 = .SAT_ADD (x_6(D), -10); [tail call]
  return _5;

}

The following test suites pass with this patch:
1. The rv64gcv full regression tests.
2. The x86 bootstrap tests.
3. The x86 full regression tests.
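
The "before" dump above tests ((x ^ sum) & (x ^ 9)) < 0 and builds the
saturated value as (-(x < 0)) ^ 127.  The sketch below writes that shape in
plain C for IMM = -10 (so the unsigned addend is 246 and 9 is ~IMM) and
compares it with a clamp-based reference.  It is an illustration only:
sat_add_m10_branchless, clamp_ref and branchless_mismatches are hypothetical
names, and the uint8_t to int8_t conversion is assumed to wrap.

```c
#include <stdint.h>

/* Shape of the pre-patch GIMPLE for IMM = -10, written by hand.  */
static int8_t
sat_add_m10_branchless (int8_t x)
{
  /* Wrapping sum, as in "_2 = x.0_1 + 246".  */
  int8_t sum = (int8_t) (uint8_t) ((uint8_t) x + 246u);
  /* Negative only if x and IMM share a sign and the sum changed sign.  */
  int8_t overflow = (int8_t) ((x ^ sum) & (x ^ 9));
  /* INT8_MIN when x is negative, INT8_MAX otherwise.  */
  int8_t sat = (int8_t) (-(int8_t) (x < 0) ^ 127);
  return overflow < 0 ? sat : sum;
}

/* Hypothetical reference: add in int, then clamp to the int8_t range.  */
static int
clamp_ref (int x, int imm)
{
  int r = x + imm;
  return r < INT8_MIN ? INT8_MIN : r > INT8_MAX ? INT8_MAX : r;
}

/* Count inputs where the two versions disagree.  */
static int
branchless_mismatches (void)
{
  int bad = 0;
  for (int x = INT8_MIN; x <= INT8_MAX; x++)
    if (sat_add_m10_branchless ((int8_t) x) != clamp_ref (x, -10))
      bad++;
  return bad;
}
```

Note that 246 & 9 is zero, which is the relationship between the two
constants that the new match.pd pattern checks.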

Signed-off-by: Li Xu 

gcc/ChangeLog:

* match.pd: Add the form1 of signed imm .SAT_ADD matching.
* tree-ssa-math-opts.cc (match_saturation_add): Add fold
convert for const_int to the type of operand 0.

Diff:
---
 gcc/match.pd  | 13 +
 gcc/tree-ssa-math-opts.cc |  3 +++
 2 files changed, 16 insertions(+)

diff --git a/gcc/match.pd b/gcc/match.pd
index c10bf9a7b804..00988241348a 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3277,6 +3277,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
@2)
  (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type
 
+/* Signed saturation add, case 6 (one op is imm):
+   T sum = (T)((UT)X + (UT)IMM);
+   SAT_S_ADD = (X ^ IMM) < 0 ? sum : (X ^ sum) >= 0 ? sum : (x < 0) ? MIN : MAX;
+   The T and UT are type pair like T=int8_t, UT=uint8_t.  */
+
+(match (signed_integer_sat_add @0 @1)
+(cond^ (lt (bit_and:c (bit_xor:c @0 (nop_convert@2 (plus (nop_convert @0)
+  INTEGER_CST@1)))
+  (bit_xor:c @0 INTEGER_CST@3)) integer_zerop)
+   (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value) @2)
+(if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)
+ && wi::bit_and (wi::to_wide (@1), wi::to_wide (@3)) == 0)))
+
 /* Unsigned saturation sub, case 1 (branch with gt):
SAT_U_SUB = X > Y ? X - Y : 0  */
 (match (unsigned_integer_sat_sub @0 @1)
diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index 83933df6928b..5f521aa6fef0 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -4130,6 +4130,9 @@ match_saturation_add (gimple_stmt_iterator *gsi, gphi *phi)
   && !gimple_signed_integer_sat_add (phi_result, ops, NULL))
 return false;
 
+  if (!TYPE_UNSIGNED (TREE_TYPE (ops[0])) && TREE_CODE (ops[1]) == INTEGER_CST)
+ops[1] = fold_convert (TREE_TYPE (ops[0]), ops[1]);
+
   return build_saturation_binary_arith_call_and_insert (gsi, IFN_SAT_ADD,
phi_result, ops[0],
ops[1]);


[gcc r14-10895] i386: Add OPTION_MASK_ISA2_EVEX512 for some AVX512 instructions.

2024-11-06 Thread Hu via Gcc-cvs
https://gcc.gnu.org/g:05fd99e3d5e9f00e4e23596ed15a3cec2aaba128

commit r14-10895-g05fd99e3d5e9f00e4e23596ed15a3cec2aaba128
Author: Hu, Lin1 
Date:   Tue Nov 5 15:49:57 2024 +0800

i386: Add OPTION_MASK_ISA2_EVEX512 for some AVX512 instructions.

gcc/ChangeLog:

PR target/117304
* config/i386/i386-builtin.def: Add OPTION_MASK_ISA2_EVEX512 for some
AVX512 512-bit instructions.

gcc/testsuite/ChangeLog:

PR target/117304
* gcc.target/i386/pr117304-1.c: New test.

(cherry picked from commit 8ac694ae67e24a798dce368587bed4c40b90fbc0)

Diff:
---
 gcc/config/i386/i386-builtin.def   | 10 +-
 gcc/testsuite/gcc.target/i386/pr117304-1.c | 28 
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index fdd9dba6e542..ee34e0a14979 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -3065,11 +3065,11 @@ BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_cvtsi2ss_round, "__builtin_ia32_
 BDESC (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_sse_cvtsi2ssq_round, "__builtin_ia32_cvtsi2ss64", IX86_BUILTIN_CVTSI2SS64, UNKNOWN, (int) V4SF_FTYPE_V4SF_INT64_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_cvtss2sd_round, "__builtin_ia32_cvtss2sd_round", IX86_BUILTIN_CVTSS2SD_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V4SF_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse2_cvtss2sd_mask_round, "__builtin_ia32_cvtss2sd_mask_round", IX86_BUILTIN_CVTSS2SD_MASK_ROUND, UNKNOWN, (int) V2DF_FTYPE_V2DF_V4SF_V2DF_UQI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_fix_truncv8dfv8si2_mask_round, "__builtin_ia32_cvttpd2dq512_mask", IX86_BUILTIN_CVTTPD2DQ512, UNKNOWN, (int) V8SI_FTYPE_V8DF_V8SI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_fixuns_truncv8dfv8si2_mask_round, "__builtin_ia32_cvttpd2udq512_mask", IX86_BUILTIN_CVTTPD2UDQ512, UNKNOWN, (int) V8SI_FTYPE_V8DF_V8SI_QI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_fix_truncv16sfv16si2_mask_round, "__builtin_ia32_cvttps2dq512_mask", IX86_BUILTIN_CVTTPS2DQ512, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_fixuns_truncv16sfv16si2_mask_round, "__builtin_ia32_cvttps2udq512_mask", IX86_BUILTIN_CVTTPS2UDQ512, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_HI_INT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_floatunsv16siv16sf2_mask_round, "__builtin_ia32_cvtudq2ps512_mask", IX86_BUILTIN_CVTUDQ2PS512, UNKNOWN, (int) V16SF_FTYPE_V16SI_V16SF_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_fix_truncv8dfv8si2_mask_round, "__builtin_ia32_cvttpd2dq512_mask", IX86_BUILTIN_CVTTPD2DQ512, UNKNOWN, (int) V8SI_FTYPE_V8DF_V8SI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_fixuns_truncv8dfv8si2_mask_round, "__builtin_ia32_cvttpd2udq512_mask", IX86_BUILTIN_CVTTPD2UDQ512, UNKNOWN, (int) V8SI_FTYPE_V8DF_V8SI_QI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_fix_truncv16sfv16si2_mask_round, "__builtin_ia32_cvttps2dq512_mask", IX86_BUILTIN_CVTTPS2DQ512, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_fixuns_truncv16sfv16si2_mask_round, "__builtin_ia32_cvttps2udq512_mask", IX86_BUILTIN_CVTTPS2UDQ512, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_HI_INT)
+BDESC (OPTION_MASK_ISA_AVX512F, OPTION_MASK_ISA2_EVEX512, CODE_FOR_floatunsv16siv16sf2_mask_round, "__builtin_ia32_cvtudq2ps512_mask", IX86_BUILTIN_CVTUDQ2PS512, UNKNOWN, (int) V16SF_FTYPE_V16SI_V16SF_HI_INT)
 BDESC (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_cvtusi2sd64_round, "__builtin_ia32_cvtusi2sd64", IX86_BUILTIN_CVTUSI2SD64, UNKNOWN, (int) V2DF_FTYPE_V2DF_UINT64_INT)
 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_cvtusi2ss32_round, "__builtin_ia32_cvtusi2ss32", IX86_BUILTIN_CVTUSI2SS32, UNKNOWN, (int) V4SF_FTYPE_V4SF_UINT_INT)
 BDESC (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_cvtusi2ss64_round, "__builtin_ia32_cvtusi2ss64", IX86_BUILTIN_CVTUSI2SS64, UNKNOWN, (int) V4SF_FTYPE_V4SF_UINT64_INT)
diff --git a/gcc/testsuite/gcc.target/i386/pr117304-1.c b/gcc/testsuite/gcc.target/i386/pr117304-1.c
new file mode 100644
index ..da26f4bd1b78
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr117304-1.c
@@ -0,0 +1,28 @@
+/* PR target/117304 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f -mno-evex512 -mavx512vl" } */
+
+typedef __attribute__((__vector_size__(32))) int __v8si;
+typedef __attribute__((__vector_size__(32))) unsigned int __v8su;
+typedef __attribute__((__vector_size__(64))) double __v8df;
+typedef __attribute__((__vector_size__(64))) int __v16si;
+typedef __attribute__((__vector_size__(64))) unsigned int __v16su;
+typedef __attribute__((__vector_size__(64))) float __v16sf;
+typedef float __m512 __attr

[gcc r15-5005] limit ifcombine stmt moving and adjust flow info

2024-11-06 Thread Alexandre Oliva via Gcc-cvs
https://gcc.gnu.org/g:c2d58f88c1a9f190f475ae8b91f6a1859f164410

commit r15-5005-gc2d58f88c1a9f190f475ae8b91f6a1859f164410
Author: Alexandre Oliva 
Date:   Thu Nov 7 02:47:50 2024 -0300

limit ifcombine stmt moving and adjust flow info

It became apparent that conditions could be combined that had deep SSA
dependency trees, that might thus require moving lots of statements.
Set a hard upper bound for now, hopefully to be replaced by a
dynamically computed bound, based on probabilities and costs.

Also reset flow sensitive info and avoid introducing undefined
behavior when moving stmts from under guarding conditions.

Finally, rework the preexisting reset of flow sensitive info and
avoidance of undefined behavior to be done when needed on all affected
inner blocks: reset flow info whenever enclosing conditions change,
and avoid undefined behavior whenever enclosing conditions become
laxer.
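
The hard upper bound amounts to a collect-then-commit pattern: gather the
statements that would have to move, give up before touching anything if there
are too many, and only then move them.  The sketch below shows that pattern
in isolation.  It is not the ifcombine code: struct stmt, its two flags and
move_needed_stmts are hypothetical stand-ins, and only the limit of 6 is
taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

enum { MAX_STMTS = 6 };

/* Hypothetical stand-in for a statement.  */
struct stmt
{
  bool needed;  /* Used by the combined condition, so it would move.  */
  bool moved;   /* Set once the statement has been moved.  */
};

/* Move every needed statement, unless more than MAX_STMTS are needed, in
   which case nothing is modified and false is returned.  */
static bool
move_needed_stmts (struct stmt *stmts, size_t n)
{
  struct stmt *pending[MAX_STMTS];
  size_t count = 0;

  /* First pass: collect candidates and bail out if over the bound.  */
  for (size_t i = 0; i < n; i++)
    {
      if (!stmts[i].needed)
        continue;
      if (count == MAX_STMTS)
        return false;
      pending[count++] = &stmts[i];
    }

  /* Second pass: commit, now that the whole set is known to fit.  */
  for (size_t i = 0; i < count; i++)
    pending[i]->moved = true;
  return true;
}
```

Deferring the moves until the count is known means a rejected combination
leaves the statements where they were.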


for  gcc/ChangeLog

* tree-ssa-ifcombine.cc
(ifcombine_rewrite_to_defined_overflow): New.
(ifcombine_replace_cond): Reject conds that would require
moving too many stmts.  Reset flow sensitive info and avoid
undefined behavior in moved stmts.  Reset flow sensitive info
in all inner blocks when the outer condition changes, and
avoid undefined behavior whenever the outer condition becomes
laxer, adapted and moved from...
(pass_tree_ifcombine::execute): ... here.

Diff:
---
 gcc/tree-ssa-ifcombine.cc | 114 --
 1 file changed, 89 insertions(+), 25 deletions(-)

diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
index d52510e3c3fb..b87ed1189df1 100644
--- a/gcc/tree-ssa-ifcombine.cc
+++ b/gcc/tree-ssa-ifcombine.cc
@@ -509,6 +509,25 @@ ifcombine_mark_ssa_name_walk (tree *t, int *, void *data_)
   return NULL;
 }
 
+/* Rewrite a stmt, that presumably used to be guarded by conditions that could
+   avoid undefined overflow, into one that has well-defined overflow, so that
+   it won't invoke undefined behavior once the guarding conditions change.  */
+
+static inline void
+ifcombine_rewrite_to_defined_overflow (gimple_stmt_iterator gsi)
+{
+  gassign *ass = dyn_cast <gassign *> (gsi_stmt (gsi));
+  if (!ass)
+return;
+  tree lhs = gimple_assign_lhs (ass);
+  if ((INTEGRAL_TYPE_P (TREE_TYPE (lhs))
+   || POINTER_TYPE_P (TREE_TYPE (lhs)))
+  && arith_code_with_undefined_signed_overflow
+  (gimple_assign_rhs_code (ass)))
+rewrite_to_defined_overflow (&gsi);
+}
+
+
 /* Replace the conditions in INNER_COND and OUTER_COND with COND and COND2.
COND and COND2 are computed for insertion at INNER_COND, with OUTER_COND
replaced with a constant, but if there are intervening blocks, it's best to
@@ -519,6 +538,7 @@ ifcombine_replace_cond (gcond *inner_cond, bool inner_inv,
gcond *outer_cond, bool outer_inv,
tree cond, bool must_canon, tree cond2)
 {
+  bool split_single_cond = false;
   /* Split cond into cond2 if they're contiguous.  ??? We might be able to
  handle ORIF as well, inverting both conditions, but it's not clear that
  this would be enough, and it never comes up.  */
@@ -528,11 +548,13 @@ ifcombine_replace_cond (gcond *inner_cond, bool inner_inv,
 {
   cond2 = TREE_OPERAND (cond, 1);
   cond = TREE_OPERAND (cond, 0);
+  split_single_cond = true;
 }
 
   bool outer_p = cond2 || (single_pred (gimple_bb (inner_cond))
   != gimple_bb (outer_cond));
   bool result_inv = outer_p ? outer_inv : inner_inv;
+  bool strictening_outer_cond = !split_single_cond && outer_p;
 
   if (result_inv)
 cond = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (cond), cond);
@@ -559,9 +581,11 @@ ifcombine_replace_cond (gcond *inner_cond, bool inner_inv,
 
if (!bitmap_empty_p (used))
  {
+   const int max_stmts = 6;
+   auto_vec stmts;
+
/* Iterate up from inner_cond, moving DEFs identified as used by
   cond, and marking USEs in the DEFs for moving as well.  */
-   gimple_stmt_iterator gsins = gsi_for_stmt (outer_cond);
for (basic_block bb = gimple_bb (inner_cond);
 bb != outer_bb; bb = single_pred (bb))
  {
@@ -583,11 +607,14 @@ ifcombine_replace_cond (gcond *inner_cond, bool inner_inv,
if (!move)
  continue;
 
+   if (stmts.length () < max_stmts)
+ stmts.quick_push (stmt);
+   else
+ return false;
+
/* Mark uses in STMT before moving it.  */
FOR_EACH_SSA_TREE_OPERAND (t, stmt, it, SSA_OP_USE)
  ifcombine_mark_ssa_name (used, t, outer_bb);
-
-   gsi_move_before (&gsitr, &gsins, GSI_NEW_STMT);
  }
 
 

[gcc(refs/users/meissner/heads/work182)] Revert changes

2024-11-06 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:27f73dee7dd0ad27cb5f74e2ef3b2de3f00b0bcd

commit 27f73dee7dd0ad27cb5f74e2ef3b2de3f00b0bcd
Author: Michael Meissner 
Date:   Wed Nov 6 13:45:23 2024 -0500

Revert changes

Diff:
---
 gcc/config/rs6000/rs6000-arch.def |  48 
 gcc/config/rs6000/rs6000-c.cc |  27 ++---
 gcc/config/rs6000/rs6000-cpus.def |   8 +-
 gcc/config/rs6000/rs6000-protos.h |   5 +-
 gcc/config/rs6000/rs6000.cc   | 234 +++---
 gcc/config/rs6000/rs6000.h|  44 ---
 gcc/config/rs6000/rs6000.opt  |  19 ++--
 7 files changed, 74 insertions(+), 311 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-arch.def b/gcc/config/rs6000/rs6000-arch.def
deleted file mode 100644
index e5b6e9581331..
--- a/gcc/config/rs6000/rs6000-arch.def
+++ /dev/null
@@ -1,48 +0,0 @@
-/* IBM RS/6000 CPU architecture features by processor type.
-   Copyright (C) 1991-2024 Free Software Foundation, Inc.
-   Contributed by Richard Kenner (ken...@vlsi1.ultra.nyu.edu)
-
-   This file is part of GCC.
-
-   GCC is free software; you can redistribute it and/or modify it
-   under the terms of the GNU General Public License as published
-   by the Free Software Foundation; either version 3, or (at your
-   option) any later version.
-
-   GCC is distributed in the hope that it will be useful, but WITHOUT
-   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
-   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
-   License for more details.
-
-   You should have received a copy of the GNU General Public License
-   along with GCC; see the file COPYING3.  If not see
-   <http://www.gnu.org/licenses/>.  */
-
-/* This file defines architecture features that are based on the -mcpu=
-   option, and not on user options that can be turned on or off.  The intention
-   is for newer processors (power7 and above) to not add new ISA bits for the
-   particular processor, but add these bits.  Otherwise we have to add a bunch
-   of hidden options, just so we have the proper ISA bits.
-
-   For example, in the past we added -mpower8-internal, so that on power8,
-   power9, and power10 would inherit the option, but we had to mark the option
-   generate a warning if the user actually used it.  These options have been
-   moved from the ISA flags to the arch flags.
-
-   To use this, define the macro ARCH_EXPAND which takes 2 arguments.  The
-   first argument is the processor name in upper case, and the second argument
-   is a text name for the processor.
-
-   The function get_arch_flags when passed a processor index number will set up
-   the appropriate architecture flags based on the actual processor
-   enumeration.  */
-
-ARCH_EXPAND(POWER4,  "power4")
-ARCH_EXPAND(POWER5,  "power5")
-ARCH_EXPAND(POWER5X, "power5+")
-ARCH_EXPAND(POWER6,  "power6")
-ARCH_EXPAND(POWER7,  "power7")
-ARCH_EXPAND(POWER8,  "power8")
-ARCH_EXPAND(POWER9,  "power9")
-ARCH_EXPAND(POWER10, "power10")
-ARCH_EXPAND(POWER11, "power11")
diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index c8f33289fa38..04882c396bfe 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -338,8 +338,7 @@ rs6000_define_or_undefine_macro (bool define_p, const char *name)
#pragma GCC target, we need to adjust the macros dynamically.  */
 
 void
-rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
-HOST_WIDE_INT arch_flags)
+rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
 {
   if (TARGET_DEBUG_BUILTIN || TARGET_DEBUG_TARGET)
 fprintf (stderr,
@@ -412,7 +411,7 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT 
flags,
summary of the flags associated with particular cpu
definitions.  */
 
-  /* rs6000_isa_flags and rs6000_arch_flags based options.  */
+  /* rs6000_isa_flags based options.  */
   rs6000_define_or_undefine_macro (define_p, "_ARCH_PPC");
   if ((flags & OPTION_MASK_PPC_GPOPT) != 0)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PPCSQ");
@@ -420,25 +419,23 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PPCGR");
   if ((flags & OPTION_MASK_POWERPC64) != 0)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PPC64");
-  if ((flags & OPTION_MASK_POWERPC64) != 0)
-rs6000_define_or_undefine_macro (define_p, "_ARCH_PPC64");
-  if ((arch_flags & ARCH_MASK_POWER4) != 0)
+  if ((flags & OPTION_MASK_MFCRF) != 0)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR4");
-  if ((arch_flags & ARCH_MASK_POWER5) != 0)
+  if ((flags & OPTION_MASK_POPCNTB) != 0)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR5");
-  if ((arch_flags & ARCH_MASK_POWER5X) != 0)
+  if ((flags & OPTION_MASK_FPRND) != 0)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR5X");
-  if ((arch_flags & ARCH_MASK_POWER6) != 0)
+  if ((flags & OPTION_MASK_CMPB) != 0)
 rs6000_def

[gcc/aoliva/heads/testme] (414 commits) limit ifcombine stmt moving and adjust flow info

2024-11-06 Thread Alexandre Oliva via Gcc-cvs
The branch 'aoliva/heads/testme' was updated to point to:

 c2d58f88c1a9... limit ifcombine stmt moving and adjust flow info

It previously pointed to:

 948a9475337a... fold fold_truth_andor field merging into ifcombine

Diff:

!!! WARNING: THE FOLLOWING COMMITS ARE NO LONGER ACCESSIBLE (LOST):
---

  948a947... fold fold_truth_andor field merging into ifcombine
  199a586... limit ifcombine stmt moving and adjust flow info
  0902a71... handle TRUTH_ANDIF cond exprs in ifcombine_replace_cond
  d1c6e0b... ifcombine across noncontiguous blocks
  48872a2... extend ifcombine_replace_cond to handle noncontiguous ifcom
  8aef26f... adjust update_profile_after_ifcombine for noncontiguous ifc
  668d14d... introduce ifcombine_replace_cond
  74aeb80... drop redundant ifcombine_ifandif parm
  5a40cc2... allow vuses in ifcombine blocks


Summary of changes (added commits):
---

  c2d58f8... limit ifcombine stmt moving and adjust flow info
  13cf22e... handle TRUTH_ANDIF cond exprs in ifcombine_replace_cond
  ae074c6... ifcombine across noncontiguous blocks
  6eac478... extend ifcombine_replace_cond to handle noncontiguous ifcom
  02dc503... adjust update_profile_after_ifcombine for noncontiguous ifc
  f9fb8f9... introduce ifcombine_replace_cond
  77c9254... drop redundant ifcombine_ifandif parm
  8e6a25b... allow vuses in ifcombine blocks
  2ec80c6... [testsuite] disable PIE on ia32 on more tests
  d17a2e8... [testsuite] fix pr70321.c PIC expectations
  1e2ae65... RISC-V: Add testcases for signed imm SAT_ADD form1 (*)
  da31786... Match:Support signed imm SAT_ADD form1 (*)
  693b770... Daily bump. (*)
  859ce74... avx10_2-comibf-2.c: Require AVX10.2 support (*)
  69bd93c... [PATCH v2] RISC-V: zero_extend(not) -> xor optimization [PR (*)
  a91d5c2... Darwin: Fix a narrowing warning. (*)
  345eb9b... openmp: Fix signed/unsigned warning (*)
  d334f72... openmp: Add testcases for omp_max_vf (*)
  2a2e6e9... openmp: Add IFN_GOMP_MAX_VF (*)
  896c6c2... openmp: use offload max_vf for chunk_size (*)
  5c9de3d... openmp: Tune omp_max_vf for offload targets (*)
  137b264... Add details output for assume processing. (*)
  85736ba... testsuite: add infinite recursion test case [PR63388] (*)
  6f4977e... diagnostics: fix typo in comment (*)
  5c34f02... libstdc++: Deprecate useless <cxxx> compatibility headers f (*)
  6a050a3... libstdc++: Move include guards to start of headers (*)
  1b169ee... libstdc++: More user-friendly failed assertions from shared (*)
  f7979b8... libstdc++: Enable debug assertions for filesystem directory (*)
  05e70ff... ipcp don't propagate where not needed (*)
  6d8764c... store-merging: Apply --param=store-merging-max-size= in mor (*)
  aab5722... store-merging: Don't use sub_byte_op_p mode for empty_ctor_ (*)
  4dbf4c0... Fortran: F2008 passing of internal procs to a proc pointer  (*)
  8ac694a... i386: Add OPTION_MASK_ISA2_EVEX512 for some AVX512 instruct (*)
  d228a07... Intel MOVRS tests: Also scan (%e.x) (*)
  c415565... gcc.target/i386/apx-ndd.c: Also scan (%edi) (*)
  f0d34e8... Daily bump. (*)
  8c41846... fortran: dynamically allocate error_buffer [PR117442] (*)
  2e35fbd... match: Fix comment for `X != 0 ? X + ~0 : 0` transformation (*)
  c751889... testsuite: arm: Use effective-target for pr68620 and pr7804 (*)
  e152a73... testsuite: arm: Relax register selection [PR116623] (*)
  4602f62... testsuite: arm: Use effective-target for pr98636.c test (*)
  3621d2a... c: gimplefe: Only allow an identifier before ? [PR117445] (*)
  161e246... PR target/117449: Restrict vector rotate match and split to (*)
  f185a89... testsuite: Fix up gcc.target/powerpc/safe-indirect-jump-3.c (*)
  3545aab... c++: allow array mem-init with -fpermissive [PR116634] (*)
  6543a21... Deprecate the ARM simulator (*)
  f31b72b... c++: Fix crash during NRV optimization with invalid input [ (*)
  5821f5c... c++: Don't crash upon invalid placement new operator [PR117 (*)
  b1d92ae... c++: Defer -fstrong-eval-order processing to template insta (*)
  5c19ba5... testsuite: fix testcase pr110279-1.c (*)
  648bd1f... Support vector float_extend from __bf16 to float. (*)
  a17acf4... Support vector float_truncate for SF to BF. (*)
  c1bbad0... c++: Mark replaceable global operator new/delete with const (*)
  ea46a21... i386: Handling exception input of __builtin_ia32_prefetch.  (*)
  2fc25a2... middle-end/117433 - ICE with gimple BLKmode reg copy (*)
  1cc2c45... aarch64: remove falkor-tag-collision-avoidance pass (*)
  acf18b5... aarch64: Remove scheduling models for falkor and saphira (*)
  61622cf... i386: Utilize VCOMSBF16 for BF16 Comparisons with AVX10.2 (*)
  6177b45... Handle T_HRESULT types in CodeView records (*)
  b0f4f55... Write LF_POINTER CodeView types for pointers to member func (*)
  7ac2407... Write LF_BCLASS records in CodeView struct definitions (*)
  a96c774... c++/modules: Merge default arguments [PR99274] (*)
  48ef

[gcc(refs/users/aoliva/heads/testme)] [testsuite] disable PIE on ia32 on more tests

2024-11-06 Thread Alexandre Oliva via Gcc-cvs
https://gcc.gnu.org/g:2ec80c60d4f10dcdbc9fad5d35297bfa432d14aa

commit 2ec80c60d4f10dcdbc9fad5d35297bfa432d14aa
Author: Alexandre Oliva 
Date:   Thu Nov 7 02:47:06 2024 -0300

[testsuite] disable PIE on ia32 on more tests

Multiple tests fail on ia32 with -fPIE enabled by default because of
different call sequences required by the call-saved PIC register
(no-callee-saved-*.c), uses of the constant pool instead of computing
constants (pr100865-*.c), and unexpected matches of esp in get_pc_thunk
(sse2-stv-1.c).  Disable PIE on them, to match the expectations.


for  gcc/testsuite/ChangeLog

* gcc.target/i386/no-callee-saved-13.c: Disable PIE on ia32.
* gcc.target/i386/no-callee-saved-14.c: Likewise.
* gcc.target/i386/no-callee-saved-15.c: Likewise.
* gcc.target/i386/no-callee-saved-17.c: Likewise.
* gcc.target/i386/pr100865-1.c: Likewise.
* gcc.target/i386/pr100865-7a.c: Likewise.
* gcc.target/i386/pr100865-7c.c: Likewise.
* gcc.target/i386/sse2-stv-1.c: Likewise.

Diff:
---
 gcc/testsuite/gcc.target/i386/no-callee-saved-13.c | 1 +
 gcc/testsuite/gcc.target/i386/no-callee-saved-14.c | 1 +
 gcc/testsuite/gcc.target/i386/no-callee-saved-15.c | 1 +
 gcc/testsuite/gcc.target/i386/no-callee-saved-17.c | 1 +
 gcc/testsuite/gcc.target/i386/pr100865-1.c | 1 +
 gcc/testsuite/gcc.target/i386/pr100865-7a.c| 1 +
 gcc/testsuite/gcc.target/i386/pr100865-7c.c| 1 +
 gcc/testsuite/gcc.target/i386/sse2-stv-1.c | 1 +
 8 files changed, 8 insertions(+)

diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-13.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-13.c
index 6757e72d8487..0b59da36786a 100644
--- a/gcc/testsuite/gcc.target/i386/no-callee-saved-13.c
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-13.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+/* { dg-additional-options "-fno-PIE" { target ia32 } } */
 
 extern void foo (void);
 
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-14.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-14.c
index 2239e286e6a6..2127b12f120b 100644
--- a/gcc/testsuite/gcc.target/i386/no-callee-saved-14.c
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-14.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+/* { dg-additional-options "-fno-PIE" { target ia32 } } */
 
 extern void bar (void) __attribute__ ((no_callee_saved_registers));
 
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-15.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-15.c
index 10135fec9c14..65f2a9532ffd 100644
--- a/gcc/testsuite/gcc.target/i386/no-callee-saved-15.c
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-15.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+/* { dg-additional-options "-fno-PIE" { target ia32 } } */
 
 typedef void (*fn_t) (void) __attribute__ ((no_callee_saved_registers));
 extern fn_t bar;
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-17.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-17.c
index 1fd5daadf080..1ecf4552f3d0 100644
--- a/gcc/testsuite/gcc.target/i386/no-callee-saved-17.c
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-17.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+/* { dg-additional-options "-fno-PIE" { target ia32 } } */
 
 extern void foo (void) __attribute__ ((no_caller_saved_registers));
 
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-1.c b/gcc/testsuite/gcc.target/i386/pr100865-1.c
index 75cd463cbfc2..fc0a5b33950f 100644
--- a/gcc/testsuite/gcc.target/i386/pr100865-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr100865-1.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -march=x86-64" } */
+/* { dg-additional-options "-fno-PIE" { target ia32 } } */
 
 extern char *dst;
 
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-7a.c b/gcc/testsuite/gcc.target/i386/pr100865-7a.c
index 7de7d4a3ce3a..9fb5dc525652 100644
--- a/gcc/testsuite/gcc.target/i386/pr100865-7a.c
+++ b/gcc/testsuite/gcc.target/i386/pr100865-7a.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -march=skylake" } */
+/* { dg-additional-options "-fno-PIE" { target ia32 } } */
 
 extern long long int array[64];
 
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-7c.c b/gcc/testsuite/gcc.target/i386/pr100865-7c.c
index edbfd5b09ed6..695831e59af5 100644
--- a/gcc/testsuite/gcc.target/i386/pr100865-7c.c
+++ b/gcc/testsuite/gcc.target/i386/pr100865-7c.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -march=skylake -mno-avx2" } */
+/* { dg-additional-options "-fno-PIE" { target ia32 } } */
 
 extern long long int array[64];
 
diff --git a/

[gcc(refs/users/aoliva/heads/testme)] allow vuses in ifcombine blocks

2024-11-06 Thread Alexandre Oliva via Gcc-cvs
https://gcc.gnu.org/g:8e6a25b01becf449d54154b7e83de5f4955cba37

commit 8e6a25b01becf449d54154b7e83de5f4955cba37
Author: Alexandre Oliva 
Date:   Thu Nov 7 02:47:15 2024 -0300

allow vuses in ifcombine blocks

Disallowing vuses in blocks for ifcombine is too strict, and it
prevents usefully moving fold_truth_andor into ifcombine.  That
tree-level folder has long ifcombined loads, absent other relevant
side effects.


for  gcc/ChangeLog

* tree-ssa-ifcombine.cc (bb_no_side_effects_p): Allow vuses,
but not vdefs.
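
As an illustration of the distinction (a sketch, not taken from the patch or the testsuite): a statement that only reads memory carries a virtual use (vuse), while one that writes memory carries a virtual definition (vdef).  In the hypothetical C functions below, the inner block only loads from a global, so it would no longer be rejected merely for having a vuse.  The names `pair`, `g`, `flags_both_nested` and `flags_both_by_hand` are made up, and whether GCC actually combines the nested form still depends on the other checks in the pass.

```c
/* Hypothetical global; reading it is a load (vuse), and nothing in the
   functions below stores to memory (no vdef).  */
struct pair { int a; int b; };
struct pair g = { 1, 2 };

/* Nested form: the inner block has to load g.b before testing it.  */
int
flags_both_nested (void)
{
  if (g.a == 1)
    if (g.b == 2)
      return 1;
  return 0;
}

/* Hand-combined form: both loads are done unconditionally, which is only
   valid because the loads cannot trap and have no side effects.  */
int
flags_both_by_hand (void)
{
  return (g.a == 1) & (g.b == 2);
}
```

Both functions return the same value for any contents of `g`; the second one shows the shape a combined condition would take.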

Diff:
---
 gcc/tree-ssa-ifcombine.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
index 39702929fc01..57b7e4b62f29 100644
--- a/gcc/tree-ssa-ifcombine.cc
+++ b/gcc/tree-ssa-ifcombine.cc
@@ -130,7 +130,7 @@ bb_no_side_effects_p (basic_block bb)
   enum tree_code rhs_code;
   if (gimple_has_side_effects (stmt)
  || gimple_could_trap_p (stmt)
- || gimple_vuse (stmt)
+ || gimple_vdef (stmt)
  /* We need to rewrite stmts with undefined overflow to use
 unsigned arithmetic but cannot do so for signed division.  */
  || ((ass = dyn_cast <gassign *> (stmt))


[gcc(refs/users/aoliva/heads/testme)] introduce ifcombine_replace_cond

2024-11-06 Thread Alexandre Oliva via Gcc-cvs
https://gcc.gnu.org/g:f9fb8f96cd7d849ca68da8839b2e8fe8eeb70411

commit f9fb8f96cd7d849ca68da8839b2e8fe8eeb70411
Author: Alexandre Oliva 
Date:   Thu Nov 7 02:47:31 2024 -0300

introduce ifcombine_replace_cond

Refactor ifcombine_ifandif, moving the common code from the various
paths that apply the combined condition to a new function.


for  gcc/ChangeLog

* tree-ssa-ifcombine.cc (ifcombine_replace_cond): Factor out
of...
(ifcombine_ifandif): ... this.  Leave it for the above to
gimplify and invert the condition.
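
For readers unfamiliar with what the refactored paths compute, here is a minimal sketch in plain C of the first pattern in ifcombine_ifandif (two single-bit tests on the same name folded into one mask comparison).  The function names are hypothetical, and the sketch assumes both bit positions are smaller than the width of `unsigned`.

```c
/* Nested single-bit tests, as the source might be written.  */
int
both_bits_nested (unsigned x, unsigned bit1, unsigned bit2)
{
  if (x & (1u << bit1))
    if (x & (1u << bit2))
      return 1;
  return 0;
}

/* Combined form: build the mask once and compare against it, mirroring
   the (name & mask) == mask condition built in the diff below.  */
int
both_bits_combined (unsigned x, unsigned bit1, unsigned bit2)
{
  unsigned mask = (1u << bit1) | (1u << bit2);
  return (x & mask) == mask;
}
```

The combined condition is what gets installed in the inner conditional, while the outer one is replaced by a constant and left for cfg_cleanup.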

Diff:
---
 gcc/tree-ssa-ifcombine.cc | 137 ++
 1 file changed, 65 insertions(+), 72 deletions(-)

diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
index 7fc46a913768..1f28cde719a7 100644
--- a/gcc/tree-ssa-ifcombine.cc
+++ b/gcc/tree-ssa-ifcombine.cc
@@ -400,6 +400,51 @@ update_profile_after_ifcombine (basic_block inner_cond_bb,
   outer2->probability = profile_probability::never ();
 }
 
+/* Replace the conditions in INNER_COND with COND.
+   Replace OUTER_COND with a constant.  */
+
+static bool
+ifcombine_replace_cond (gcond *inner_cond, bool inner_inv,
+   gcond *outer_cond, bool outer_inv,
+   tree cond, bool must_canon, tree cond2)
+{
+  bool result_inv = inner_inv;
+
+  gcc_checking_assert (!cond2);
+
+  if (result_inv)
+cond = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (cond), cond);
+
+  if (tree tcanon = canonicalize_cond_expr_cond (cond))
+cond = tcanon;
+  else if (must_canon)
+return false;
+
+{
+  if (!is_gimple_condexpr_for_cond (cond))
+   {
+ gimple_stmt_iterator gsi = gsi_for_stmt (inner_cond);
+ cond = force_gimple_operand_gsi_1 (&gsi, cond,
+is_gimple_condexpr_for_cond,
+NULL, true, GSI_SAME_STMT);
+   }
+  gimple_cond_set_condition_from_tree (inner_cond, cond);
+  update_stmt (inner_cond);
+
+  /* Leave CFG optimization to cfg_cleanup.  */
+  gimple_cond_set_condition_from_tree (outer_cond,
+  outer_inv
+  ? boolean_false_node
+  : boolean_true_node);
+  update_stmt (outer_cond);
+}
+
+  update_profile_after_ifcombine (gimple_bb (inner_cond),
+ gimple_bb (outer_cond));
+
+  return true;
+}
+
 /* If-convert on a and pattern with a common else block.  The inner
if is specified by its INNER_COND_BB, the outer by OUTER_COND_BB.
inner_inv, outer_inv indicate whether the conditions are inverted.
@@ -409,7 +454,6 @@ static bool
 ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
   basic_block outer_cond_bb, bool outer_inv)
 {
-  bool result_inv = inner_inv;
   gimple_stmt_iterator gsi;
   tree name1, name2, bit1, bit2, bits1, bits2;
 
@@ -447,26 +491,13 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
   t2 = fold_build2 (BIT_AND_EXPR, TREE_TYPE (name1), name1, t);
   t2 = force_gimple_operand_gsi (&gsi, t2, true, NULL_TREE,
 true, GSI_SAME_STMT);
-  t = fold_build2 (result_inv ? NE_EXPR : EQ_EXPR,
-  boolean_type_node, t2, t);
-  t = canonicalize_cond_expr_cond (t);
-  if (!t)
-   return false;
-  if (!is_gimple_condexpr_for_cond (t))
-   {
- gsi = gsi_for_stmt (inner_cond);
- t = force_gimple_operand_gsi_1 (&gsi, t, is_gimple_condexpr_for_cond,
- NULL, true, GSI_SAME_STMT);
-   }
-  gimple_cond_set_condition_from_tree (inner_cond, t);
-  update_stmt (inner_cond);
 
-  /* Leave CFG optimization to cfg_cleanup.  */
-  gimple_cond_set_condition_from_tree (outer_cond,
-   outer_inv ? boolean_false_node : boolean_true_node);
-  update_stmt (outer_cond);
+  t = fold_build2 (EQ_EXPR, boolean_type_node, t2, t);
 
-  update_profile_after_ifcombine (inner_cond_bb, outer_cond_bb);
+  if (!ifcombine_replace_cond (inner_cond, inner_inv,
+  outer_cond, outer_inv,
+  t, true, NULL_TREE))
+   return false;
 
   if (dump_file)
{
@@ -486,9 +517,8 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
  In that case remove the outer test and change the inner one to
  test for name & (bits1 | bits2) != 0.  */
   else if (recognize_bits_test (inner_cond, &name1, &bits1, !inner_inv)
-  && recognize_bits_test (outer_cond, &name2, &bits2, !outer_inv))
+  && recognize_bits_test (outer_cond, &name2, &bits2, !outer_inv))
 {
-  gimple_stmt_iterator gsi;
   tree t;
 
   if ((TREE_CODE (name1) == SSA_NAME
@@ -531,33 +561,14 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool

[gcc(refs/users/aoliva/heads/testme)] [testsuite] fix pr70321.c PIC expectations

2024-11-06 Thread Alexandre Oliva via Gcc-cvs
https://gcc.gnu.org/g:d17a2e8bfc91a8e401a2d8c61e23fba36e28a43d

commit d17a2e8bfc91a8e401a2d8c61e23fba36e28a43d
Author: Alexandre Oliva 
Date:   Thu Nov 7 02:46:57 2024 -0300

[testsuite] fix pr70321.c PIC expectations

When we select a non-bx get_pc_thunk, we get an extra mov to set up
the PIC register before the abort call.  Expect that mov or a
get_pc_thunk.bx call.


for  gcc/testsuite/ChangeLog

* gcc.target/i386/pr70321.c: Cope with non-bx get_pc_thunk.
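
To see why the new pattern expects 3 matches in both situations, the counting can be sketched with POSIX regular expressions.  This is only an approximation: DejaGnu evaluates the pattern with Tcl's regexp engine, and the two assembly listings below are made up for illustration, not actual compiler output.  The helper `count_matches` and the string names are hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Count non-overlapping matches of the extended regex PATTERN in TEXT.
   Returns -1 if the pattern does not compile.  */
int
count_matches (const char *pattern, const char *text)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;

  int count = 0;
  regmatch_t m;
  while (*text != '\0' && regexec (&re, text, 1, &m, 0) == 0)
    {
      count++;
      /* Step past the match; always advance at least one character.  */
      text += m.rm_eo > 0 ? m.rm_eo : 1;
    }
  regfree (&re);
  return count;
}

/* The pattern from the updated directive, written as a C string.  */
const char *const pic_pattern = "mov|call[^\n\r]*get_pc_thunk\\.bx";

/* Made-up listing where the thunk loads %ebx directly.  */
const char *const asm_bx =
  "foo:\n"
  "\tcall\t__x86.get_pc_thunk.bx\n"
  "\tmovl\t4(%esp), %eax\n"
  "\tcall\tabort@PLT\n"
  "__x86.get_pc_thunk.bx:\n"
  "\tmovl\t(%esp), %ebx\n"
  "\tret\n";

/* Made-up listing with a non-bx thunk and the extra mov into %ebx.  */
const char *const asm_ax =
  "foo:\n"
  "\tcall\t__x86.get_pc_thunk.ax\n"
  "\tmovl\t4(%esp), %edx\n"
  "\tmovl\t%eax, %ebx\n"
  "\tcall\tabort@PLT\n"
  "__x86.get_pc_thunk.ax:\n"
  "\tmovl\t(%esp), %eax\n"
  "\tret\n";
```

With these two listings, counting plain "mov" gives different totals for the bx and non-bx variants, whereas `pic_pattern` gives the same total for both, which is the property the updated directive relies on.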

Diff:
---
 gcc/testsuite/gcc.target/i386/pr70321.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr70321.c b/gcc/testsuite/gcc.target/i386/pr70321.c
index 58f5f5661c7a..287b7da1b950 100644
--- a/gcc/testsuite/gcc.target/i386/pr70321.c
+++ b/gcc/testsuite/gcc.target/i386/pr70321.c
@@ -9,4 +9,8 @@ void foo (long long ixi)
 
 /* { dg-final { scan-assembler-times "mov" 1 { target nonpic } } } */
 /* get_pc_thunk adds an extra mov insn.  */
-/* { dg-final { scan-assembler-times "mov" 2 { target { ! nonpic } } } } */
+/* Choosing a non-bx get_pc_thunk requires another mov before the abort call.
+   So we require a match of either that mov or the get_pc_thunk.bx call, in
+   addition to the other 2 movs.  (Hopefully there won't be more calls for a
+   false positive.)  */
+/* { dg-final { scan-assembler-times "mov|call\[^\n\r]*get_pc_thunk\.bx" 3 { target { ! nonpic } } } } */


[gcc(refs/users/aoliva/heads/testme)] extend ifcombine_replace_cond to handle noncontiguous ifcombine

2024-11-06 Thread Alexandre Oliva via Gcc-cvs
https://gcc.gnu.org/g:6eac478619193eeb2fd714eb0988ce3197dd63b1

commit 6eac478619193eeb2fd714eb0988ce3197dd63b1
Author: Alexandre Oliva 
Date:   Thu Nov 7 02:47:38 2024 -0300

extend ifcombine_replace_cond to handle noncontiguous ifcombine

Prepare to handle noncontiguous ifcombine, introducing logic to modify
the outer condition when needed.  There are two cases worth
mentioning:

- when blocks are noncontiguous, we have to place the combined
  condition in the outer block to avoid pessimizing carefully crafted
  short-circuited tests;

- even when blocks are contiguous, we prepare for situations in which
  the combined condition has two tests, one to be placed in outer and
  the other in inner.  This circumstance will not come up when
  noncontiguous ifcombine is first enabled, but it will when
  an improved fold_truth_andor is integrated with ifcombine.

Combining the condition from inner into outer may require moving SSA
DEFs used in the inner condition, and the changes implement this as
well.


for  gcc/ChangeLog

* tree-ssa-ifcombine.cc: Include bitmap.h.
(ifcombine_mark_ssa_name): New.
(struct ifcombine_mark_ssa_name_t): New.
(ifcombine_mark_ssa_name_walk): New.
(ifcombine_replace_cond): Prepare to handle noncontiguous and
split-condition ifcombine.
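
The DEF-moving step described above can be pictured with a toy model.  The sketch below is not GCC code: `toy_stmt`, `mark_stmts_to_move` and the integer "SSA names" are hypothetical stand-ins, and the dominance check done by ifcombine_mark_ssa_name is reduced to the assumption that names defined outside the walked range simply have no statement in the array.

```c
#include <stdbool.h>

/* Toy statement: defines name DEF from up to two used names
   (-1 means the operand slot is unused).  */
struct toy_stmt
{
  int def;
  int use[2];
};

/* Walk STMTS[0..N-1] backwards, i.e. from the inner condition up toward
   the outer one.  USED[] is indexed by name and initially set for the
   names the combined condition references.  A statement is marked in
   MOVE[] when its DEF is used; its own operands are then marked as used
   so that earlier definitions they depend on are picked up as well.
   Returns the number of statements marked.  */
int
mark_stmts_to_move (const struct toy_stmt *stmts, int n,
                    bool *used, bool *move)
{
  int count = 0;
  for (int i = n - 1; i >= 0; i--)
    {
      move[i] = used[stmts[i].def];
      if (!move[i])
        continue;
      count++;
      for (int k = 0; k < 2; k++)
        if (stmts[i].use[k] >= 0)
          used[stmts[i].use[k]] = true;
    }
  return count;
}
```

A single backward pass is enough in this model because a definition always precedes its uses, so every dependency is marked before the walk reaches the statement that defines it.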

Diff:
---
 gcc/tree-ssa-ifcombine.cc | 175 --
 1 file changed, 170 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
index 64319874888e..49bd7f2915c2 100644
--- a/gcc/tree-ssa-ifcombine.cc
+++ b/gcc/tree-ssa-ifcombine.cc
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa.h"
 #include "attribs.h"
 #include "asan.h"
+#include "bitmap.h"
 
 #ifndef LOGICAL_OP_NON_SHORT_CIRCUIT
 #define LOGICAL_OP_NON_SHORT_CIRCUIT \
@@ -461,17 +462,57 @@ update_profile_after_ifcombine (basic_block inner_cond_bb,
 }
 }
 
-/* Replace the conditions in INNER_COND with COND.
-   Replace OUTER_COND with a constant.  */
+/* Set NAME's bit in USED if OUTER dominates it.  */
+
+static void
+ifcombine_mark_ssa_name (bitmap used, tree name, basic_block outer)
+{
+  if (SSA_NAME_IS_DEFAULT_DEF (name))
+return;
+
+  gimple *def = SSA_NAME_DEF_STMT (name);
+  basic_block bb = gimple_bb (def);
+  if (!dominated_by_p (CDI_DOMINATORS, bb, outer))
+return;
+
+  bitmap_set_bit (used, SSA_NAME_VERSION (name));
+}
+
+/* Data structure passed to ifcombine_mark_ssa_name.  */
+struct ifcombine_mark_ssa_name_t
+{
+  /* SSA_NAMEs that have been referenced.  */
+  bitmap used;
+  /* Dominating block of DEFs that might need moving.  */
+  basic_block outer;
+};
+
+/* Mark in DATA->used any SSA_NAMEs used in *t.  */
+
+static tree
+ifcombine_mark_ssa_name_walk (tree *t, int *, void *data_)
+{
+  ifcombine_mark_ssa_name_t *data = (ifcombine_mark_ssa_name_t *)data_;
+
+  if (*t && TREE_CODE (*t) == SSA_NAME)
+ifcombine_mark_ssa_name (data->used, *t, data->outer);
+
+  return NULL;
+}
+
+/* Replace the conditions in INNER_COND and OUTER_COND with COND and COND2.
+   COND and COND2 are computed for insertion at INNER_COND, with OUTER_COND
+   replaced with a constant, but if there are intervening blocks, it's best to
+   adjust COND for insertion at OUTER_COND, placing COND2 at INNER_COND.  */
 
 static bool
 ifcombine_replace_cond (gcond *inner_cond, bool inner_inv,
gcond *outer_cond, bool outer_inv,
tree cond, bool must_canon, tree cond2)
 {
-  bool result_inv = inner_inv;
-
-  gcc_checking_assert (!cond2);
+  bool outer_p = cond2 || (single_pred (gimple_bb (inner_cond))
+  != gimple_bb (outer_cond));
+  bool result_inv = outer_p ? outer_inv : inner_inv;
 
   if (result_inv)
 cond = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (cond), cond);
@@ -481,6 +522,130 @@ ifcombine_replace_cond (gcond *inner_cond, bool inner_inv,
   else if (must_canon)
 return false;
 
+  if (outer_p)
+{
+  {
+   auto_bitmap used;
+   basic_block outer_bb = gimple_bb (outer_cond);
+
+   bitmap_tree_view (used);
+
+   /* Mark SSA DEFs that are referenced by cond and may thus need to be
+  moved to outer.  */
+   {
+ ifcombine_mark_ssa_name_t data = { used, outer_bb };
+ walk_tree (&cond, ifcombine_mark_ssa_name_walk, &data, NULL);
+   }
+
+   if (!bitmap_empty_p (used))
+ {
+   /* Iterate up from inner_cond, moving DEFs identified as used by
+  cond, and marking USEs in the DEFs for moving as well.  */
+   gimple_stmt_iterator gsins = gsi_for_stmt (outer_cond);
+   for (basic_block bb = gimple_bb (inner_cond);
+bb != outer_bb; bb = single_pred (bb))
+ {
+   for (gimple_stmt_iterator gsitr = gsi_last_bb (bb);
+

[gcc(refs/users/aoliva/heads/testme)] adjust update_profile_after_ifcombine for noncontiguous ifcombine

2024-11-06 Thread Alexandre Oliva via Gcc-cvs
https://gcc.gnu.org/g:02dc5036ba8d816048b942e51f74a9e4b3fde173

commit 02dc5036ba8d816048b942e51f74a9e4b3fde173
Author: Alexandre Oliva 
Date:   Thu Nov 7 02:47:34 2024 -0300

adjust update_profile_after_ifcombine for noncontiguous ifcombine

Prepare for ifcombining noncontiguous blocks, adding (still unused)
logic to the ifcombine profile updater to handle such cases.


for  gcc/ChangeLog

* tree-ssa-ifcombine.cc (known_succ_p): New.
(update_profile_after_ifcombine): Handle noncontiguous blocks.
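
As a worked example of the contiguous case that this function already handled (and that the patch keeps under the first branch), the probability arithmetic can be sketched with plain doubles.  This is an illustration only: GCC uses profile_probability rather than floating point, and `toy_probs` and `merge_outer_into_inner` are hypothetical names.

```c
/* Edge probabilities as plain doubles in [0, 1].  */
struct toy_probs
{
  double outer_to_inner;   /* outer_cond_bb -> inner_cond_bb */
  double outer2;           /* the other edge out of outer_cond_bb */
  double inner_taken;      /* inner edge sharing outer2's destination */
  double inner_not_taken;  /* the other edge out of inner_cond_bb */
};

/* The outer condition becomes constant, so everything that used to leave
   through outer2 now leaves through inner_taken instead.  */
void
merge_outer_into_inner (struct toy_probs *p)
{
  /* If inner_taken was already certain, leave it alone.  */
  if (p->inner_taken != 1.0)
    p->inner_taken = p->outer2 + p->outer_to_inner * p->inner_taken;
  p->inner_not_taken = 1.0 - p->inner_taken;

  p->outer_to_inner = 1.0;
  p->outer2 = 0.0;
}
```

For instance, starting from outer_to_inner = 0.75, outer2 = 0.25 and inner_taken = 0.5, the merged inner_taken is 0.25 + 0.75 * 0.5 = 0.625, and inner_not_taken becomes 0.375.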

Diff:
---
 gcc/tree-ssa-ifcombine.cc | 109 --
 1 file changed, 85 insertions(+), 24 deletions(-)

diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
index 1f28cde719a7..64319874888e 100644
--- a/gcc/tree-ssa-ifcombine.cc
+++ b/gcc/tree-ssa-ifcombine.cc
@@ -50,6 +50,21 @@ along with GCC; see the file COPYING3.  If not see
 false) >= 2)
 #endif
 
+/* Return FALSE iff the COND_BB ends with a conditional whose result is not a
+   known constant.  */
+
+static bool
+known_succ_p (basic_block cond_bb)
+{
+  gcond *cond = safe_dyn_cast <gcond *> (*gsi_last_bb (cond_bb));
+
+  if (!cond)
+return true;
+
+  return (CONSTANT_CLASS_P (gimple_cond_lhs (cond))
+ && CONSTANT_CLASS_P (gimple_cond_rhs (cond)));
+}
+
 /* This pass combines COND_EXPRs to simplify control flow.  It
currently recognizes bit tests and comparisons in chains that
represent logical and or logical or of two COND_EXPRs.
@@ -357,14 +372,28 @@ recognize_bits_test (gcond *cond, tree *name, tree *bits, bool inv)
 }
 
 
-/* Update profile after code in outer_cond_bb was adjusted so
-   outer_cond_bb has no condition.  */
+/* Update profile after code in either outer_cond_bb or inner_cond_bb was
+   adjusted so that it has no condition.  */
 
 static void
 update_profile_after_ifcombine (basic_block inner_cond_bb,
basic_block outer_cond_bb)
 {
-  edge outer_to_inner = find_edge (outer_cond_bb, inner_cond_bb);
+  /* In the following we assume that inner_cond_bb has single predecessor.  */
+  gcc_assert (single_pred_p (inner_cond_bb));
+
+  basic_block outer_to_inner_bb = inner_cond_bb;
+  profile_probability prob = profile_probability::always ();
+  for (;;)
+{
+  basic_block parent = single_pred (outer_to_inner_bb);
+  prob *= find_edge (parent, outer_to_inner_bb)->probability;
+  if (parent == outer_cond_bb)
+   break;
+  outer_to_inner_bb = parent;
+}
+
+  edge outer_to_inner = find_edge (outer_cond_bb, outer_to_inner_bb);
   edge outer2 = (EDGE_SUCC (outer_cond_bb, 0) == outer_to_inner
 ? EDGE_SUCC (outer_cond_bb, 1)
 : EDGE_SUCC (outer_cond_bb, 0));
@@ -375,29 +404,61 @@ update_profile_after_ifcombine (basic_block inner_cond_bb,
 std::swap (inner_taken, inner_not_taken);
   gcc_assert (inner_taken->dest == outer2->dest);
 
-  /* In the following we assume that inner_cond_bb has single predecessor.  */
-  gcc_assert (single_pred_p (inner_cond_bb));
-
-  /* Path outer_cond_bb->(outer2) needs to be merged into path
- outer_cond_bb->(outer_to_inner)->inner_cond_bb->(inner_taken)
- and probability of inner_not_taken updated.  */
-
-  inner_cond_bb->count = outer_cond_bb->count;
+  if (outer_to_inner_bb == inner_cond_bb
+  && known_succ_p (outer_cond_bb))
+{
+  /* Path outer_cond_bb->(outer2) needs to be merged into path
+outer_cond_bb->(outer_to_inner)->inner_cond_bb->(inner_taken)
+and probability of inner_not_taken updated.  */
+
+  inner_cond_bb->count = outer_cond_bb->count;
+
+  /* Handle special case where inner_taken probability is always. In this
+case we know that the overall outcome will be always as well, but
+combining probabilities will be conservative because it does not know
+that outer2->probability is inverse of
+outer_to_inner->probability.  */
+  if (inner_taken->probability == profile_probability::always ())
+   ;
+  else
+   inner_taken->probability = outer2->probability
+ + outer_to_inner->probability * inner_taken->probability;
+  inner_not_taken->probability = profile_probability::always ()
+   - inner_taken->probability;
 
-  /* Handle special case where inner_taken probability is always. In this case
- we know that the overall outcome will be always as well, but combining
- probabilities will be conservative because it does not know that
- outer2->probability is inverse of outer_to_inner->probability.  */
-  if (inner_taken->probability == profile_probability::always ())
-;
+  outer_to_inner->probability = profile_probability::always ();
+  outer2->probability = profile_probability::never ();
+}
+  else if (known_succ_p (inner_cond_bb))
+{
+  /* Path inner_cond_bb->(inner_taken) needs to be merged into path
+outer_cond_bb->(outer2).  We've accumulated 

[gcc(refs/users/aoliva/heads/testme)] limit ifcombine stmt moving and adjust flow info

2024-11-06 Thread Alexandre Oliva via Gcc-cvs
https://gcc.gnu.org/g:c2d58f88c1a9f190f475ae8b91f6a1859f164410

commit c2d58f88c1a9f190f475ae8b91f6a1859f164410
Author: Alexandre Oliva 
Date:   Thu Nov 7 02:47:50 2024 -0300

limit ifcombine stmt moving and adjust flow info

It became apparent that conditions could be combined that had deep SSA
dependency trees, that might thus require moving lots of statements.
Set a hard upper bound for now, hopefully to be replaced by a
dynamically computed bound, based on probabilities and costs.

Also reset flow sensitive info and avoid introducing undefined
behavior when moving stmts from under guarding conditions.

Finally, rework the preexisting reset of flow sensitive info and
avoidance of undefined behavior to be done when needed on all affected
inner blocks: reset flow info whenever enclosing conditions change,
and avoid undefined behavior whenever enclosing conditions become
laxer.


for  gcc/ChangeLog

* tree-ssa-ifcombine.cc
(ifcombine_rewrite_to_defined_overflow): New.
(ifcombine_replace_cond): Reject conds that would require
moving too many stmts.  Reset flow sensitive info and avoid
undefined behavior in moved stmts.  Reset flow sensitive info
in all inner blocks when the outer condition changes, and
avoid undefined behavior whenever the outer condition becomes
laxer, adapted and moved from...
(pass_tree_ifcombine::execute): ... here.
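
The undefined-behavior concern can be illustrated at the source level.  The sketch below is an analogy in C, not what the pass emits: the function names and the guard are hypothetical, and it only shows why an addition that is moved above its guard is safer when carried out in an unsigned type.

```c
/* Original shape: the signed addition only runs when the guard holds,
   so it can never overflow.  */
int
sum_if_small (int a, int b)
{
  if (a >= 0 && a < 1000 && b >= 0 && b < 1000)
    return a + b;
  return -1;
}

/* After hoisting: the addition runs before the guard is known to hold,
   so it is done in unsigned arithmetic, where wraparound is well
   defined.  The result is only used when the guard turns out to hold.  */
int
sum_if_small_hoisted (int a, int b)
{
  unsigned wrapped = (unsigned) a + (unsigned) b;
  if (a >= 0 && a < 1000 && b >= 0 && b < 1000)
    return (int) wrapped;   /* in range here, so the conversion is exact */
  return -1;
}
```

Both functions agree on every input, but only the second one is safe to evaluate with arbitrary operands before the guard has been checked.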

Diff:
---
 gcc/tree-ssa-ifcombine.cc | 114 --
 1 file changed, 89 insertions(+), 25 deletions(-)

diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
index d52510e3c3fb..b87ed1189df1 100644
--- a/gcc/tree-ssa-ifcombine.cc
+++ b/gcc/tree-ssa-ifcombine.cc
@@ -509,6 +509,25 @@ ifcombine_mark_ssa_name_walk (tree *t, int *, void *data_)
   return NULL;
 }
 
+/* Rewrite a stmt, that presumably used to be guarded by conditions that could
+   avoid undefined overflow, into one that has well-defined overflow, so that
+   it won't invoke undefined behavior once the guarding conditions change.  */
+
+static inline void
+ifcombine_rewrite_to_defined_overflow (gimple_stmt_iterator gsi)
+{
+  gassign *ass = dyn_cast <gassign *> (gsi_stmt (gsi));
+  if (!ass)
+return;
+  tree lhs = gimple_assign_lhs (ass);
+  if ((INTEGRAL_TYPE_P (TREE_TYPE (lhs))
+   || POINTER_TYPE_P (TREE_TYPE (lhs)))
+  && arith_code_with_undefined_signed_overflow
+  (gimple_assign_rhs_code (ass)))
+rewrite_to_defined_overflow (&gsi);
+}
+
+
 /* Replace the conditions in INNER_COND and OUTER_COND with COND and COND2.
COND and COND2 are computed for insertion at INNER_COND, with OUTER_COND
replaced with a constant, but if there are intervening blocks, it's best to
@@ -519,6 +538,7 @@ ifcombine_replace_cond (gcond *inner_cond, bool inner_inv,
gcond *outer_cond, bool outer_inv,
tree cond, bool must_canon, tree cond2)
 {
+  bool split_single_cond = false;
   /* Split cond into cond2 if they're contiguous.  ??? We might be able to
  handle ORIF as well, inverting both conditions, but it's not clear that
  this would be enough, and it never comes up.  */
@@ -528,11 +548,13 @@ ifcombine_replace_cond (gcond *inner_cond, bool inner_inv,
 {
   cond2 = TREE_OPERAND (cond, 1);
   cond = TREE_OPERAND (cond, 0);
+  split_single_cond = true;
 }
 
   bool outer_p = cond2 || (single_pred (gimple_bb (inner_cond))
   != gimple_bb (outer_cond));
   bool result_inv = outer_p ? outer_inv : inner_inv;
+  bool strictening_outer_cond = !split_single_cond && outer_p;
 
   if (result_inv)
 cond = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (cond), cond);
@@ -559,9 +581,11 @@ ifcombine_replace_cond (gcond *inner_cond, bool inner_inv,
 
if (!bitmap_empty_p (used))
  {
+   const int max_stmts = 6;
+   auto_vec<gimple *, max_stmts> stmts;
+
/* Iterate up from inner_cond, moving DEFs identified as used by
   cond, and marking USEs in the DEFs for moving as well.  */
-   gimple_stmt_iterator gsins = gsi_for_stmt (outer_cond);
for (basic_block bb = gimple_bb (inner_cond);
 bb != outer_bb; bb = single_pred (bb))
  {
@@ -583,11 +607,14 @@ ifcombine_replace_cond (gcond *inner_cond, bool inner_inv,
if (!move)
  continue;
 
+   if (stmts.length () < max_stmts)
+ stmts.quick_push (stmt);
+   else
+ return false;
+
/* Mark uses in STMT before moving it.  */
FOR_EACH_SSA_TREE_OPERAND (t, stmt, it, SSA_OP_USE)
  ifcombine_mark_ssa_name (used, t, outer_bb);
-
-   gsi_move_before (&gsitr, &gsins, GSI_NEW_STMT);
  }
 
   

[gcc(refs/users/aoliva/heads/testme)] handle TRUTH_ANDIF cond exprs in ifcombine_replace_cond

2024-11-06 Thread Alexandre Oliva via Gcc-cvs
https://gcc.gnu.org/g:13cf22eb557eb5e3d796822247d8d4957bdb25da

commit 13cf22eb557eb5e3d796822247d8d4957bdb25da
Author: Alexandre Oliva 
Date:   Thu Nov 7 02:47:46 2024 -0300

handle TRUTH_ANDIF cond exprs in ifcombine_replace_cond

The upcoming move of fold_truth_andor to ifcombine brings with it the
possibility of TRUTH_ANDIF cond exprs.  Handle them by splitting the
cond so as to best use both BB insertion points, but only if they're
contiguous.


for  gcc/ChangeLog

* tree-ssa-ifcombine.cc (ifcombine_replace_cond): Support
TRUTH_ANDIF cond exprs.
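
A source-level analogy of the split, as a sketch with hypothetical names: a short-circuit AND placed in a single test versus its two operands placed at two nested tests.  The counter only exists to show that the second operand is still evaluated just when the first one holds.

```c
/* Counts how often the second operand is evaluated.  */
int rhs_evals;

int
rhs (int v)
{
  rhs_evals++;
  return v != 0;
}

/* Single condition using a short-circuit AND (the TRUTH_ANDIF shape).  */
int
andif_single (int a, int b)
{
  return (a != 0 && rhs (b)) ? 1 : 0;
}

/* Split form: the first operand plays the role of COND at the outer
   test, the second the role of COND2 at the inner test.  */
int
andif_split (int a, int b)
{
  if (a != 0)
    if (rhs (b))
      return 1;
  return 0;
}
```

Both forms give the same result and evaluate the second operand the same number of times, which is why splitting at the two existing insertion points preserves the short-circuit behavior.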

Diff:
---
 gcc/tree-ssa-ifcombine.cc | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
index 158f2a645020..d52510e3c3fb 100644
--- a/gcc/tree-ssa-ifcombine.cc
+++ b/gcc/tree-ssa-ifcombine.cc
@@ -519,6 +519,17 @@ ifcombine_replace_cond (gcond *inner_cond, bool inner_inv,
gcond *outer_cond, bool outer_inv,
tree cond, bool must_canon, tree cond2)
 {
+  /* Split cond into cond2 if they're contiguous.  ??? We might be able to
+ handle ORIF as well, inverting both conditions, but it's not clear that
+ this would be enough, and it never comes up.  */
+  if (!cond2
+  && TREE_CODE (cond) == TRUTH_ANDIF_EXPR
+  && single_pred (gimple_bb (inner_cond)) == gimple_bb (outer_cond))
+{
+  cond2 = TREE_OPERAND (cond, 1);
+  cond = TREE_OPERAND (cond, 0);
+}
+
   bool outer_p = cond2 || (single_pred (gimple_bb (inner_cond))
   != gimple_bb (outer_cond));
   bool result_inv = outer_p ? outer_inv : inner_inv;


[gcc/aoliva/heads/testbase] (404 commits) RISC-V: Add testcases for signed imm SAT_ADD form1

2024-11-06 Thread Alexandre Oliva via Gcc-cvs
The branch 'aoliva/heads/testbase' was updated to point to:

 1e2ae65a7f01... RISC-V: Add testcases for signed imm SAT_ADD form1

It previously pointed to:

 fc40202c1ac5... SVE intrinsics: Fold division and multiplication by -1 to n

Diff:

Summary of changes (added commits):
---

  1e2ae65... RISC-V: Add testcases for signed imm SAT_ADD form1 (*)
  da31786... Match:Support signed imm SAT_ADD form1 (*)
  693b770... Daily bump. (*)
  859ce74... avx10_2-comibf-2.c: Require AVX10.2 support (*)
  69bd93c... [PATCH v2] RISC-V: zero_extend(not) -> xor optimization [PR (*)
  a91d5c2... Darwin: Fix a narrowing warning. (*)
  345eb9b... openmp: Fix signed/unsigned warning (*)
  d334f72... openmp: Add testcases for omp_max_vf (*)
  2a2e6e9... openmp: Add IFN_GOMP_MAX_VF (*)
  896c6c2... openmp: use offload max_vf for chunk_size (*)
  5c9de3d... openmp: Tune omp_max_vf for offload targets (*)
  137b264... Add details output for assume processing. (*)
  85736ba... testsuite: add infinite recursion test case [PR63388] (*)
  6f4977e... diagnostics: fix typo in comment (*)
  5c34f02... libstdc++: Deprecate useless  compatibility headers f (*)
  6a050a3... libstdc++: Move include guards to start of headers (*)
  1b169ee... libstdc++: More user-friendly failed assertions from shared (*)
  f7979b8... libstdc++: Enable debug assertions for filesystem directory (*)
  05e70ff... ipcp don't propagate where not needed (*)
  6d8764c... store-merging: Apply --param=store-merging-max-size= in mor (*)
  aab5722... store-merging: Don't use sub_byte_op_p mode for empty_ctor_ (*)
  4dbf4c0... Fortran: F2008 passing of internal procs to a proc pointer  (*)
  8ac694a... i386: Add OPTION_MASK_ISA2_EVEX512 for some AVX512 instruct (*)
  d228a07... Intel MOVRS tests: Also scan (%e.x) (*)
  c415565... gcc.target/i386/apx-ndd.c: Also scan (%edi) (*)
  f0d34e8... Daily bump. (*)
  8c41846... fortran: dynamically allocate error_buffer [PR117442] (*)
  2e35fbd... match: Fix comment for `X != 0 ? X + ~0 : 0` transformation (*)
  c751889... testsuite: arm: Use effective-target for pr68620 and pr7804 (*)
  e152a73... testsuite: arm: Relax register selection [PR116623] (*)
  4602f62... testsuite: arm: Use effective-target for pr98636.c test (*)
  3621d2a... c: gimplefe: Only allow an identifier before ? [PR117445] (*)
  161e246... PR target/117449: Restrict vector rotate match and split to (*)
  f185a89... testsuite: Fix up gcc.target/powerpc/safe-indirect-jump-3.c (*)
  3545aab... c++: allow array mem-init with -fpermissive [PR116634] (*)
  6543a21... Deprecate the ARM simulator (*)
  f31b72b... c++: Fix crash during NRV optimization with invalid input [ (*)
  5821f5c... c++: Don't crash upon invalid placement new operator [PR117 (*)
  b1d92ae... c++: Defer -fstrong-eval-order processing to template insta (*)
  5c19ba5... testsuite: fix testcase pr110279-1.c (*)
  648bd1f... Support vector float_extend from __bf16 to float. (*)
  a17acf4... Support vector float_truncate for SF to BF. (*)
  c1bbad0... c++: Mark replaceable global operator new/delete with const (*)
  ea46a21... i386: Handling exception input of __builtin_ia32_prefetch.  (*)
  2fc25a2... middle-end/117433 - ICE with gimple BLKmode reg copy (*)
  1cc2c45... aarch64: remove falkor-tag-collision-avoidance pass (*)
  acf18b5... aarch64: Remove scheduling models for falkor and saphira (*)
  61622cf... i386: Utilize VCOMSBF16 for BF16 Comparisons with AVX10.2 (*)
  6177b45... Handle T_HRESULT types in CodeView records (*)
  b0f4f55... Write LF_POINTER CodeView types for pointers to member func (*)
  7ac2407... Write LF_BCLASS records in CodeView struct definitions (*)
  a96c774... c++/modules: Merge default arguments [PR99274] (*)
  48ef485... c++/modules: Handle location exhaustion in write_location [ (*)
  ad1f112... Daily bump. (*)
  8ae4a83... simulate-thread tests: Silence gdb debuginfod warning (*)
  35425d0... libstdc++: Remove workaround for modules issue [PR113814] (*)
  c1d91ad... guality tests: Silence gdb debuginfod warning (*)
  6b31590... [PATCH v2 2/2] RISC-V: Disable by pieces for vector setmem  (*)
  b30c6a5... [PATCH v2 1/2] RISC-V: Make vectorized memset handle more c (*)
  fe97ac4... libgccjit: Add convert vector (*)
  b8ac365... diagnostics: update leading comment for class diagnostic_co (*)
  196b13f... diagnostics: cleanups to opts-diagnostic.cc (*)
  7bb75a5... libgccjit: Add gcc_jit_global_set_readonly (*)
  e995866... testsuite: arm: Force hard ABI for pr51534.c test (*)
  7b2e6e6... testsuite: arm: Use effective-target for data-intrinsics-as (*)
  d56d2f3... testsuite: arm: Relax cbranch tests to accept inverted bran (*)
  e3f2db9... testsuite: arm: Update expected asm in armv8_2-fp16-neon-2. (*)
  56acc94... libgccjit: Add count zeroes builtins to ensure_optimization (*)
  899b5be... ada: Move special case for null string literal from fronten (*)
  023a5dd... ada: Remove special case for the size of a string literal s (*)
  bffba3

[gcc(refs/users/aoliva/heads/testme)] ifcombine across noncontiguous blocks

2024-11-06 Thread Alexandre Oliva via Gcc-cvs
https://gcc.gnu.org/g:ae074c69fd5aff10953264dbd9740cebfeb0902e

commit ae074c69fd5aff10953264dbd9740cebfeb0902e
Author: Alexandre Oliva 
Date:   Thu Nov 7 02:47:42 2024 -0300

ifcombine across noncontiguous blocks

Rework ifcombine to support merging conditions from noncontiguous
blocks.  This depends on earlier preparation changes.

The function that attempted to ifcombine a block with its immediate
predecessor, tree_ssa_ifcombine_bb, now loops over dominating blocks
eligible for ifcombine, attempting to combine with them.

The function that actually drives the combination of a pair of blocks,
tree_ssa_ifcombine_bb_1, now takes an additional parameter: the
successor of outer that leads to inner.

The function that recognizes if_then_else patterns is modified to
enable testing without distinguishing between then and else, or to
require nondegenerate conditions, since degenerate ones aren't worth
combining with.
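
As a rough illustration of the "loop over dominating blocks" idea above, here is a minimal sketch on a toy CFG. It is not the GCC implementation: the Block type and candidate_outer_blocks are hypothetical names, and the eligibility test (a two-way branch) is a simplifying assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG node (hypothetical): predecessor and successor block indices.
struct Block {
  std::vector<int> preds;
  std::vector<int> succs;
};

// Walk up the chain of single predecessors starting at INNER and collect
// every block that ends in a two-way branch.  These are the candidate
// "outer" blocks in this sketch.  The walk stops at the first block that
// has zero or several predecessors; the step bound guards against cycles.
std::vector<int> candidate_outer_blocks(const std::vector<Block> &cfg,
                                        int inner) {
  assert(inner >= 0 && static_cast<std::size_t>(inner) < cfg.size());
  std::vector<int> outers;
  int bb = inner;
  for (std::size_t steps = 0;
       steps < cfg.size() && cfg[bb].preds.size() == 1; ++steps) {
    int pred = cfg[bb].preds[0];
    if (cfg[pred].succs.size() == 2)
      outers.push_back(pred);
    bb = pred;
  }
  return outers;
}
```

With blocks 0 -> {1, 4}, 1 -> {2}, 2 -> {3, 4}, calling candidate_outer_blocks (cfg, 3) visits block 2 first and then block 0, skipping block 1 because it has a single successor.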


for  gcc/ChangeLog

* tree-ssa-ifcombine.cc (recognize_if_then_else): Support
relaxed then/else testing; require nondegenerate condition
otherwise.
(tree_ssa_ifcombine_bb_1): Add outer_succ_bb parm, use it
instead of inner_cond_bb.  Adjust callers.
(tree_ssa_ifcombine_bb): Loop over dominating outer blocks
eligible for ifcombine.
(pass_tree_ifcombine::execute): Noted potential need for
changes to the post-combine logic.

Diff:
---
 gcc/tree-ssa-ifcombine.cc | 152 +-
 1 file changed, 123 insertions(+), 29 deletions(-)

diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
index 49bd7f2915c2..158f2a645020 100644
--- a/gcc/tree-ssa-ifcombine.cc
+++ b/gcc/tree-ssa-ifcombine.cc
@@ -86,25 +86,34 @@ known_succ_p (basic_block cond_bb)
is left to CFG cleanup and DCE.  */
 
 
-/* Recognize a if-then-else CFG pattern starting to match with the
-   COND_BB basic-block containing the COND_EXPR.  The recognized
-   then end else blocks are stored to *THEN_BB and *ELSE_BB.  If
-   *THEN_BB and/or *ELSE_BB are already set, they are required to
-   match the then and else basic-blocks to make the pattern match.
-   Returns true if the pattern matched, false otherwise.  */
+/* Recognize a if-then-else CFG pattern starting to match with the COND_BB
+   basic-block containing the COND_EXPR.  If !SUCCS_ANY, the condition must not
+   resolve to a constant for a match.  Returns true if the pattern matched,
+   false otherwise.  In case of a !SUCCS_ANY match, the recognized then end
+   else blocks are stored to *THEN_BB and *ELSE_BB.  If *THEN_BB and/or
+   *ELSE_BB are already set, they are required to match the then and else
+   basic-blocks to make the pattern match.  If SUCCS_ANY, *THEN_BB and *ELSE_BB
+   will not be filled in, and they will be found to match even if reversed.  */
 
 static bool
 recognize_if_then_else (basic_block cond_bb,
-   basic_block *then_bb, basic_block *else_bb)
+   basic_block *then_bb, basic_block *else_bb,
+   bool succs_any = false)
 {
   edge t, e;
 
-  if (EDGE_COUNT (cond_bb->succs) != 2)
+  if (EDGE_COUNT (cond_bb->succs) != 2
+  || (!succs_any && known_succ_p (cond_bb)))
 return false;
 
   /* Find the then/else edges.  */
   t = EDGE_SUCC (cond_bb, 0);
   e = EDGE_SUCC (cond_bb, 1);
+
+  if (succs_any)
+return ((t->dest == *then_bb && e->dest == *else_bb)
+   || (t->dest == *else_bb && e->dest == *then_bb));
+
   if (!(t->flags & EDGE_TRUE_VALUE))
 std::swap (t, e);
   if (!(t->flags & EDGE_TRUE_VALUE)
@@ -889,19 +898,21 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
 /* Helper function for tree_ssa_ifcombine_bb.  Recognize a CFG pattern and
dispatch to the appropriate if-conversion helper for a particular
set of INNER_COND_BB, OUTER_COND_BB, THEN_BB and ELSE_BB.
-   PHI_PRED_BB should be one of INNER_COND_BB, THEN_BB or ELSE_BB.  */
+   PHI_PRED_BB should be one of INNER_COND_BB, THEN_BB or ELSE_BB.
+   OUTER_SUCC_BB is the successor of OUTER_COND_BB on the path towards
+   INNER_COND_BB.  */
 
 static bool
 tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, basic_block outer_cond_bb,
 basic_block then_bb, basic_block else_bb,
-basic_block phi_pred_bb)
+basic_block phi_pred_bb, basic_block outer_succ_bb)
 {
   /* The && form is characterized by a common else_bb with
  the two edges leading to it mergable.  The latter is
  guaranteed by matching PHI arguments in the else_bb and
  the inner cond_bb having no side-effects.  */
   if (phi_pred_bb != else_bb
-  && recognize_if_then_else (outer_cond_bb, &inner_cond_bb, &else_bb)
+  && recognize_if_then_else (outer_cond_bb, &outer_succ_bb, &else_bb)
   && same_phi_args_p (outer_cond_bb, phi_pred_bb, els

[gcc r15-5003] ifcombine across noncontiguous blocks

2024-11-06 Thread Alexandre Oliva via Gcc-cvs
https://gcc.gnu.org/g:ae074c69fd5aff10953264dbd9740cebfeb0902e

commit r15-5003-gae074c69fd5aff10953264dbd9740cebfeb0902e
Author: Alexandre Oliva 
Date:   Thu Nov 7 02:47:42 2024 -0300

ifcombine across noncontiguous blocks

Rework ifcombine to support merging conditions from noncontiguous
blocks.  This depends on earlier preparation changes.

The function that attempted to ifcombine a block with its immediate
predecessor, tree_ssa_ifcombine_bb, now loops over dominating blocks
eligible for ifcombine, attempting to combine with them.

The function that actually drives the combination of a pair of blocks,
tree_ssa_ifcombine_bb_1, now takes an additional parameter: the
successor of outer that leads to inner.

The function that recognizes if_then_else patterns is modified to
enable testing without distinguishing between then and else, or to
require nondegenerate conditions, since degenerate ones aren't worth
combining with.
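
The two matching modes described above can be sketched as follows. This is a simplified model, not the actual recognize_if_then_else: the Branch type and matches_then_else are hypothetical, and the real function also fills in unset then/else blocks, which is omitted here.

```cpp
#include <cassert>

// Toy two-way branch (hypothetical): destinations of the true and false
// edges, plus whether the condition is already a known constant.
struct Branch {
  int true_dest;
  int false_dest;
  bool constant_cond;
};

// Strict mode (succs_any == false): the condition must not be a known
// constant, and the true/false edges must lead to THEN_BB/ELSE_BB in
// that order.  Relaxed mode (succs_any == true): either order matches
// and the constant check is skipped.
bool matches_then_else(const Branch &br, int then_bb, int else_bb,
                       bool succs_any = false) {
  assert(br.true_dest != br.false_dest);
  if (!succs_any && br.constant_cond)
    return false;
  if (succs_any)
    return (br.true_dest == then_bb && br.false_dest == else_bb)
           || (br.true_dest == else_bb && br.false_dest == then_bb);
  return br.true_dest == then_bb && br.false_dest == else_bb;
}
```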


for  gcc/ChangeLog

* tree-ssa-ifcombine.cc (recognize_if_then_else): Support
relaxed then/else testing; require nondegenerate condition
otherwise.
(tree_ssa_ifcombine_bb_1): Add outer_succ_bb parm, use it
instead of inner_cond_bb.  Adjust callers.
(tree_ssa_ifcombine_bb): Loop over dominating outer blocks
eligible for ifcombine.
(pass_tree_ifcombine::execute): Noted potential need for
changes to the post-combine logic.

Diff:
---
 gcc/tree-ssa-ifcombine.cc | 152 +-
 1 file changed, 123 insertions(+), 29 deletions(-)

diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
index 49bd7f2915c2..158f2a645020 100644
--- a/gcc/tree-ssa-ifcombine.cc
+++ b/gcc/tree-ssa-ifcombine.cc
@@ -86,25 +86,34 @@ known_succ_p (basic_block cond_bb)
is left to CFG cleanup and DCE.  */
 
 
-/* Recognize a if-then-else CFG pattern starting to match with the
-   COND_BB basic-block containing the COND_EXPR.  The recognized
-   then end else blocks are stored to *THEN_BB and *ELSE_BB.  If
-   *THEN_BB and/or *ELSE_BB are already set, they are required to
-   match the then and else basic-blocks to make the pattern match.
-   Returns true if the pattern matched, false otherwise.  */
+/* Recognize a if-then-else CFG pattern starting to match with the COND_BB
+   basic-block containing the COND_EXPR.  If !SUCCS_ANY, the condition must not
+   resolve to a constant for a match.  Returns true if the pattern matched,
+   false otherwise.  In case of a !SUCCS_ANY match, the recognized then end
+   else blocks are stored to *THEN_BB and *ELSE_BB.  If *THEN_BB and/or
+   *ELSE_BB are already set, they are required to match the then and else
+   basic-blocks to make the pattern match.  If SUCCS_ANY, *THEN_BB and *ELSE_BB
+   will not be filled in, and they will be found to match even if reversed.  */
 
 static bool
 recognize_if_then_else (basic_block cond_bb,
-   basic_block *then_bb, basic_block *else_bb)
+   basic_block *then_bb, basic_block *else_bb,
+   bool succs_any = false)
 {
   edge t, e;
 
-  if (EDGE_COUNT (cond_bb->succs) != 2)
+  if (EDGE_COUNT (cond_bb->succs) != 2
+  || (!succs_any && known_succ_p (cond_bb)))
 return false;
 
   /* Find the then/else edges.  */
   t = EDGE_SUCC (cond_bb, 0);
   e = EDGE_SUCC (cond_bb, 1);
+
+  if (succs_any)
+return ((t->dest == *then_bb && e->dest == *else_bb)
+   || (t->dest == *else_bb && e->dest == *then_bb));
+
   if (!(t->flags & EDGE_TRUE_VALUE))
 std::swap (t, e);
   if (!(t->flags & EDGE_TRUE_VALUE)
@@ -889,19 +898,21 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
 /* Helper function for tree_ssa_ifcombine_bb.  Recognize a CFG pattern and
dispatch to the appropriate if-conversion helper for a particular
set of INNER_COND_BB, OUTER_COND_BB, THEN_BB and ELSE_BB.
-   PHI_PRED_BB should be one of INNER_COND_BB, THEN_BB or ELSE_BB.  */
+   PHI_PRED_BB should be one of INNER_COND_BB, THEN_BB or ELSE_BB.
+   OUTER_SUCC_BB is the successor of OUTER_COND_BB on the path towards
+   INNER_COND_BB.  */
 
 static bool
 tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, basic_block outer_cond_bb,
 basic_block then_bb, basic_block else_bb,
-basic_block phi_pred_bb)
+basic_block phi_pred_bb, basic_block outer_succ_bb)
 {
   /* The && form is characterized by a common else_bb with
  the two edges leading to it mergable.  The latter is
  guaranteed by matching PHI arguments in the else_bb and
  the inner cond_bb having no side-effects.  */
   if (phi_pred_bb != else_bb
-  && recognize_if_then_else (outer_cond_bb, &inner_cond_bb, &else_bb)
+  && recognize_if_then_else (outer_cond_bb, &outer_succ_bb, &else_bb)
   && same_phi_args_p (outer_cond_bb, phi_pr

[gcc r15-5001] adjust update_profile_after_ifcombine for noncontiguous ifcombine

2024-11-06 Thread Alexandre Oliva via Gcc-cvs
https://gcc.gnu.org/g:02dc5036ba8d816048b942e51f74a9e4b3fde173

commit r15-5001-g02dc5036ba8d816048b942e51f74a9e4b3fde173
Author: Alexandre Oliva 
Date:   Thu Nov 7 02:47:34 2024 -0300

adjust update_profile_after_ifcombine for noncontiguous ifcombine

Prepare for ifcombining noncontiguous blocks, adding (still unused)
logic to the ifcombine profile updater to handle such cases.
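
For intuition, the profile updater has to know how likely it is to reach the inner block from the outer one when other blocks sit in between. A minimal sketch of that computation, assuming a toy CFG with plain double probabilities (ToyCfg and path_probability are hypothetical names, not GCC API):

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Toy CFG (hypothetical): single_pred[b] is the only predecessor of
// block b, or -1 when b has none or several.  edge_prob maps an edge
// (from, to) to its branch probability.
struct ToyCfg {
  std::vector<int> single_pred;
  std::map<std::pair<int, int>, double> edge_prob;
};

// Multiply the edge probabilities along the single-predecessor chain
// from INNER up to OUTER.  Every block on the way must have exactly one
// predecessor, and OUTER must be on the chain.
double path_probability(const ToyCfg &cfg, int outer, int inner) {
  double prob = 1.0;
  int bb = inner;
  for (;;) {
    int parent = cfg.single_pred[bb];
    assert(parent >= 0);
    prob *= cfg.edge_prob.at({parent, bb});
    if (parent == outer)
      break;
    bb = parent;
  }
  return prob;
}
```

When outer is the immediate predecessor of inner, the loop runs once and the result is just the probability of the connecting edge, which corresponds to the contiguous case.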


for  gcc/ChangeLog

* tree-ssa-ifcombine.cc (known_succ_p): New.
(update_profile_after_ifcombine): Handle noncontiguous blocks.

Diff:
---
 gcc/tree-ssa-ifcombine.cc | 109 --
 1 file changed, 85 insertions(+), 24 deletions(-)

diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
index 1f28cde719a7..64319874888e 100644
--- a/gcc/tree-ssa-ifcombine.cc
+++ b/gcc/tree-ssa-ifcombine.cc
@@ -50,6 +50,21 @@ along with GCC; see the file COPYING3.  If not see
 false) >= 2)
 #endif
 
+/* Return FALSE iff the COND_BB ends with a conditional whose result is not a
+   known constant.  */
+
+static bool
+known_succ_p (basic_block cond_bb)
+{
+  gcond *cond = safe_dyn_cast <gcond *> (*gsi_last_bb (cond_bb));
+
+  if (!cond)
+return true;
+
+  return (CONSTANT_CLASS_P (gimple_cond_lhs (cond))
+ && CONSTANT_CLASS_P (gimple_cond_rhs (cond)));
+}
+
 /* This pass combines COND_EXPRs to simplify control flow.  It
currently recognizes bit tests and comparisons in chains that
represent logical and or logical or of two COND_EXPRs.
@@ -357,14 +372,28 @@ recognize_bits_test (gcond *cond, tree *name, tree *bits, bool inv)
 }
 
 
-/* Update profile after code in outer_cond_bb was adjusted so
-   outer_cond_bb has no condition.  */
+/* Update profile after code in either outer_cond_bb or inner_cond_bb was
+   adjusted so that it has no condition.  */
 
 static void
 update_profile_after_ifcombine (basic_block inner_cond_bb,
basic_block outer_cond_bb)
 {
-  edge outer_to_inner = find_edge (outer_cond_bb, inner_cond_bb);
+  /* In the following we assume that inner_cond_bb has single predecessor.  */
+  gcc_assert (single_pred_p (inner_cond_bb));
+
+  basic_block outer_to_inner_bb = inner_cond_bb;
+  profile_probability prob = profile_probability::always ();
+  for (;;)
+{
+  basic_block parent = single_pred (outer_to_inner_bb);
+  prob *= find_edge (parent, outer_to_inner_bb)->probability;
+  if (parent == outer_cond_bb)
+   break;
+  outer_to_inner_bb = parent;
+}
+
+  edge outer_to_inner = find_edge (outer_cond_bb, outer_to_inner_bb);
   edge outer2 = (EDGE_SUCC (outer_cond_bb, 0) == outer_to_inner
 ? EDGE_SUCC (outer_cond_bb, 1)
 : EDGE_SUCC (outer_cond_bb, 0));
@@ -375,29 +404,61 @@ update_profile_after_ifcombine (basic_block inner_cond_bb,
 std::swap (inner_taken, inner_not_taken);
   gcc_assert (inner_taken->dest == outer2->dest);
 
-  /* In the following we assume that inner_cond_bb has single predecessor.  */
-  gcc_assert (single_pred_p (inner_cond_bb));
-
-  /* Path outer_cond_bb->(outer2) needs to be merged into path
- outer_cond_bb->(outer_to_inner)->inner_cond_bb->(inner_taken)
- and probability of inner_not_taken updated.  */
-
-  inner_cond_bb->count = outer_cond_bb->count;
+  if (outer_to_inner_bb == inner_cond_bb
+  && known_succ_p (outer_cond_bb))
+{
+  /* Path outer_cond_bb->(outer2) needs to be merged into path
+outer_cond_bb->(outer_to_inner)->inner_cond_bb->(inner_taken)
+and probability of inner_not_taken updated.  */
+
+  inner_cond_bb->count = outer_cond_bb->count;
+
+  /* Handle special case where inner_taken probability is always. In this
+case we know that the overall outcome will be always as well, but
+combining probabilities will be conservative because it does not know
+that outer2->probability is inverse of
+outer_to_inner->probability.  */
+  if (inner_taken->probability == profile_probability::always ())
+   ;
+  else
+   inner_taken->probability = outer2->probability
+ + outer_to_inner->probability * inner_taken->probability;
+  inner_not_taken->probability = profile_probability::always ()
+   - inner_taken->probability;
 
-  /* Handle special case where inner_taken probability is always. In this case
- we know that the overall outcome will be always as well, but combining
- probabilities will be conservative because it does not know that
- outer2->probability is inverse of outer_to_inner->probability.  */
-  if (inner_taken->probability == profile_probability::always ())
-;
+  outer_to_inner->probability = profile_probability::always ();
+  outer2->probability = profile_probability::never ();
+}
+  else if (known_succ_p (inner_cond_bb))
+{
+  /* Path inner_cond_bb->(inner_taken) needs to be merged into path
+outer_cond_bb->(outer2).  We've ac

[gcc r15-4997] [testsuite] disable PIE on ia32 on more tests

2024-11-06 Thread Alexandre Oliva via Gcc-cvs
https://gcc.gnu.org/g:2ec80c60d4f10dcdbc9fad5d35297bfa432d14aa

commit r15-4997-g2ec80c60d4f10dcdbc9fad5d35297bfa432d14aa
Author: Alexandre Oliva 
Date:   Thu Nov 7 02:47:06 2024 -0300

[testsuite] disable PIE on ia32 on more tests

Multiple tests fail on ia32 with -fPIE enabled by default because of
different call sequences required by the call-saved PIC register
(no-callee-saved-*.c), uses of the constant pool instead of computing
constants (pr100865-*.c), and unexpected matches of esp in get_pc_thunk
(sse2-stv-1.c).  Disable PIE on them, to match the expectations.


for  gcc/testsuite/ChangeLog

* gcc.target/i386/no-callee-saved-13.c: Disable PIE on ia32.
* gcc.target/i386/no-callee-saved-14.c: Likewise.
* gcc.target/i386/no-callee-saved-15.c: Likewise.
* gcc.target/i386/no-callee-saved-17.c: Likewise.
* gcc.target/i386/pr100865-1.c: Likewise.
* gcc.target/i386/pr100865-7a.c: Likewise.
* gcc.target/i386/pr100865-7c.c: Likewise.
* gcc.target/i386/sse2-stv-1.c: Likewise.

Diff:
---
 gcc/testsuite/gcc.target/i386/no-callee-saved-13.c | 1 +
 gcc/testsuite/gcc.target/i386/no-callee-saved-14.c | 1 +
 gcc/testsuite/gcc.target/i386/no-callee-saved-15.c | 1 +
 gcc/testsuite/gcc.target/i386/no-callee-saved-17.c | 1 +
 gcc/testsuite/gcc.target/i386/pr100865-1.c | 1 +
 gcc/testsuite/gcc.target/i386/pr100865-7a.c| 1 +
 gcc/testsuite/gcc.target/i386/pr100865-7c.c| 1 +
 gcc/testsuite/gcc.target/i386/sse2-stv-1.c | 1 +
 8 files changed, 8 insertions(+)

diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-13.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-13.c
index 6757e72d8487..0b59da36786a 100644
--- a/gcc/testsuite/gcc.target/i386/no-callee-saved-13.c
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-13.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+/* { dg-additional-options "-fno-PIE" { target ia32 } } */
 
 extern void foo (void);
 
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-14.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-14.c
index 2239e286e6a6..2127b12f120b 100644
--- a/gcc/testsuite/gcc.target/i386/no-callee-saved-14.c
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-14.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+/* { dg-additional-options "-fno-PIE" { target ia32 } } */
 
 extern void bar (void) __attribute__ ((no_callee_saved_registers));
 
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-15.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-15.c
index 10135fec9c14..65f2a9532ffd 100644
--- a/gcc/testsuite/gcc.target/i386/no-callee-saved-15.c
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-15.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+/* { dg-additional-options "-fno-PIE" { target ia32 } } */
 
 typedef void (*fn_t) (void) __attribute__ ((no_callee_saved_registers));
 extern fn_t bar;
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-17.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-17.c
index 1fd5daadf080..1ecf4552f3d0 100644
--- a/gcc/testsuite/gcc.target/i386/no-callee-saved-17.c
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-17.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+/* { dg-additional-options "-fno-PIE" { target ia32 } } */
 
 extern void foo (void) __attribute__ ((no_caller_saved_registers));
 
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-1.c b/gcc/testsuite/gcc.target/i386/pr100865-1.c
index 75cd463cbfc2..fc0a5b33950f 100644
--- a/gcc/testsuite/gcc.target/i386/pr100865-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr100865-1.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -march=x86-64" } */
+/* { dg-additional-options "-fno-PIE" { target ia32 } } */
 
 extern char *dst;
 
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-7a.c b/gcc/testsuite/gcc.target/i386/pr100865-7a.c
index 7de7d4a3ce3a..9fb5dc525652 100644
--- a/gcc/testsuite/gcc.target/i386/pr100865-7a.c
+++ b/gcc/testsuite/gcc.target/i386/pr100865-7a.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -march=skylake" } */
+/* { dg-additional-options "-fno-PIE" { target ia32 } } */
 
 extern long long int array[64];
 
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-7c.c b/gcc/testsuite/gcc.target/i386/pr100865-7c.c
index edbfd5b09ed6..695831e59af5 100644
--- a/gcc/testsuite/gcc.target/i386/pr100865-7c.c
+++ b/gcc/testsuite/gcc.target/i386/pr100865-7c.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -march=skylake -mno-avx2" } */
+/* { dg-additional-options "-fno-PIE" { target ia32 } } */
 
 extern long long int array[64];
 
dif

[gcc r15-5004] handle TRUTH_ANDIF cond exprs in ifcombine_replace_cond

2024-11-06 Thread Alexandre Oliva via Gcc-cvs
https://gcc.gnu.org/g:13cf22eb557eb5e3d796822247d8d4957bdb25da

commit r15-5004-g13cf22eb557eb5e3d796822247d8d4957bdb25da
Author: Alexandre Oliva 
Date:   Thu Nov 7 02:47:46 2024 -0300

handle TRUTH_ANDIF cond exprs in ifcombine_replace_cond

The upcoming move of fold_truth_andor to ifcombine brings with it the
possibility of TRUTH_ANDIF cond exprs.  Handle them by splitting the
cond so as to best use both BB insertion points, but only if they're
contiguous.
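
The splitting rule can be sketched on a toy expression tree. This is only an illustration of the idea, assuming a hypothetical Expr type in place of GCC trees; split_if_contiguous, leaf and and_if are made-up names.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy expression node (hypothetical): either a leaf test or a
// short-circuit "lhs && rhs".
struct Expr {
  enum Kind { Leaf, AndIf } kind;
  std::string name;                // used by Leaf
  std::shared_ptr<Expr> lhs, rhs;  // used by AndIf
};

std::shared_ptr<Expr> leaf(const std::string &name) {
  return std::make_shared<Expr>(Expr{Expr::Leaf, name, nullptr, nullptr});
}

std::shared_ptr<Expr> and_if(std::shared_ptr<Expr> a,
                             std::shared_ptr<Expr> b) {
  return std::make_shared<Expr>(Expr{Expr::AndIf, "", a, b});
}

struct SplitResult {
  std::shared_ptr<Expr> cond, cond2;
};

// If no second condition was supplied, COND is "a && b", and the two
// blocks are contiguous, split it so that one half can go to each
// insertion point.  Otherwise return the inputs unchanged.
SplitResult split_if_contiguous(std::shared_ptr<Expr> cond,
                                std::shared_ptr<Expr> cond2,
                                bool contiguous) {
  assert(cond);
  if (!cond2 && cond->kind == Expr::AndIf && contiguous)
    return {cond->lhs, cond->rhs};
  return {cond, cond2};
}
```

In the noncontiguous case the combined condition is left whole, which matches the "only if they're contiguous" restriction in the message.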


for  gcc/ChangeLog

* tree-ssa-ifcombine.cc (ifcombine_replace_cond): Support
TRUTH_ANDIF cond exprs.

Diff:
---
 gcc/tree-ssa-ifcombine.cc | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
index 158f2a645020..d52510e3c3fb 100644
--- a/gcc/tree-ssa-ifcombine.cc
+++ b/gcc/tree-ssa-ifcombine.cc
@@ -519,6 +519,17 @@ ifcombine_replace_cond (gcond *inner_cond, bool inner_inv,
gcond *outer_cond, bool outer_inv,
tree cond, bool must_canon, tree cond2)
 {
+  /* Split cond into cond2 if they're contiguous.  ??? We might be able to
+ handle ORIF as well, inverting both conditions, but it's not clear that
+ this would be enough, and it never comes up.  */
+  if (!cond2
+  && TREE_CODE (cond) == TRUTH_ANDIF_EXPR
+  && single_pred (gimple_bb (inner_cond)) == gimple_bb (outer_cond))
+{
+  cond2 = TREE_OPERAND (cond, 1);
+  cond = TREE_OPERAND (cond, 0);
+}
+
   bool outer_p = cond2 || (single_pred (gimple_bb (inner_cond))
   != gimple_bb (outer_cond));
   bool result_inv = outer_p ? outer_inv : inner_inv;


[gcc r15-4999] drop redundant ifcombine_ifandif parm

2024-11-06 Thread Alexandre Oliva via Gcc-cvs
https://gcc.gnu.org/g:77c925464e50dfdf224be3c27e5b72de21a92e86

commit r15-4999-g77c925464e50dfdf224be3c27e5b72de21a92e86
Author: Alexandre Oliva 
Date:   Thu Nov 7 02:47:19 2024 -0300

drop redundant ifcombine_ifandif parm

In preparation for changes that may modify both inner and outer
conditions in ifcombine, drop the redundant parameter result_inv, which
is always identical to inner_inv.


for  gcc/ChangeLog

* tree-ssa-ifcombine.cc (ifcombine_ifandif): Drop redundant
result_inv parm.  Adjust all callers.

Diff:
---
 gcc/tree-ssa-ifcombine.cc | 18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
index 57b7e4b62f29..7fc46a913768 100644
--- a/gcc/tree-ssa-ifcombine.cc
+++ b/gcc/tree-ssa-ifcombine.cc
@@ -402,14 +402,14 @@ update_profile_after_ifcombine (basic_block inner_cond_bb,
 
 /* If-convert on a and pattern with a common else block.  The inner
if is specified by its INNER_COND_BB, the outer by OUTER_COND_BB.
-   inner_inv, outer_inv and result_inv indicate whether the conditions
-   are inverted.
+   inner_inv, outer_inv indicate whether the conditions are inverted.
Returns true if the edges to the common else basic-block were merged.  */
 
 static bool
 ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
-  basic_block outer_cond_bb, bool outer_inv, bool result_inv)
+  basic_block outer_cond_bb, bool outer_inv)
 {
+  bool result_inv = inner_inv;
   gimple_stmt_iterator gsi;
   tree name1, name2, bit1, bit2, bits1, bits2;
 
@@ -694,8 +694,7 @@ tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, basic_block outer_cond_bb,
   
 ...
*/
-  return ifcombine_ifandif (inner_cond_bb, false, outer_cond_bb, false,
-   false);
+  return ifcombine_ifandif (inner_cond_bb, false, outer_cond_bb, false);
 }
 
   /* And a version where the outer condition is negated.  */
@@ -712,8 +711,7 @@ tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, basic_block outer_cond_bb,
   
 ...
*/
-  return ifcombine_ifandif (inner_cond_bb, false, outer_cond_bb, true,
-   false);
+  return ifcombine_ifandif (inner_cond_bb, false, outer_cond_bb, true);
 }
 
   /* The || form is characterized by a common then_bb with the
@@ -732,8 +730,7 @@ tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, basic_block outer_cond_bb,
   
 ...
*/
-  return ifcombine_ifandif (inner_cond_bb, true, outer_cond_bb, true,
-   true);
+  return ifcombine_ifandif (inner_cond_bb, true, outer_cond_bb, true);
 }
 
   /* And a version where the outer condition is negated.  */
@@ -749,8 +746,7 @@ tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, basic_block outer_cond_bb,
   
 ...
*/
-  return ifcombine_ifandif (inner_cond_bb, true, outer_cond_bb, false,
-   true);
+  return ifcombine_ifandif (inner_cond_bb, true, outer_cond_bb, false);
 }
 
   return false;


[gcc r15-5000] introduce ifcombine_replace_cond

2024-11-06 Thread Alexandre Oliva via Gcc-cvs
https://gcc.gnu.org/g:f9fb8f96cd7d849ca68da8839b2e8fe8eeb70411

commit r15-5000-gf9fb8f96cd7d849ca68da8839b2e8fe8eeb70411
Author: Alexandre Oliva 
Date:   Thu Nov 7 02:47:31 2024 -0300

introduce ifcombine_replace_cond

Refactor ifcombine_ifandif, moving the common code from the various
paths that apply the combined condition to a new function.


for  gcc/ChangeLog

* tree-ssa-ifcombine.cc (ifcombine_replace_cond): Factor out
of...
(ifcombine_ifandif): ... this.  Leave it for the above to
gimplify and invert the condition.

Diff:
---
 gcc/tree-ssa-ifcombine.cc | 137 ++
 1 file changed, 65 insertions(+), 72 deletions(-)

diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
index 7fc46a913768..1f28cde719a7 100644
--- a/gcc/tree-ssa-ifcombine.cc
+++ b/gcc/tree-ssa-ifcombine.cc
@@ -400,6 +400,51 @@ update_profile_after_ifcombine (basic_block inner_cond_bb,
   outer2->probability = profile_probability::never ();
 }
 
+/* Replace the conditions in INNER_COND with COND.
+   Replace OUTER_COND with a constant.  */
+
+static bool
+ifcombine_replace_cond (gcond *inner_cond, bool inner_inv,
+   gcond *outer_cond, bool outer_inv,
+   tree cond, bool must_canon, tree cond2)
+{
+  bool result_inv = inner_inv;
+
+  gcc_checking_assert (!cond2);
+
+  if (result_inv)
+cond = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (cond), cond);
+
+  if (tree tcanon = canonicalize_cond_expr_cond (cond))
+cond = tcanon;
+  else if (must_canon)
+return false;
+
+{
+  if (!is_gimple_condexpr_for_cond (cond))
+   {
+ gimple_stmt_iterator gsi = gsi_for_stmt (inner_cond);
+ cond = force_gimple_operand_gsi_1 (&gsi, cond,
+is_gimple_condexpr_for_cond,
+NULL, true, GSI_SAME_STMT);
+   }
+  gimple_cond_set_condition_from_tree (inner_cond, cond);
+  update_stmt (inner_cond);
+
+  /* Leave CFG optimization to cfg_cleanup.  */
+  gimple_cond_set_condition_from_tree (outer_cond,
+  outer_inv
+  ? boolean_false_node
+  : boolean_true_node);
+  update_stmt (outer_cond);
+}
+
+  update_profile_after_ifcombine (gimple_bb (inner_cond),
+ gimple_bb (outer_cond));
+
+  return true;
+}
+
 /* If-convert on a and pattern with a common else block.  The inner
if is specified by its INNER_COND_BB, the outer by OUTER_COND_BB.
inner_inv, outer_inv indicate whether the conditions are inverted.
@@ -409,7 +454,6 @@ static bool
 ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
   basic_block outer_cond_bb, bool outer_inv)
 {
-  bool result_inv = inner_inv;
   gimple_stmt_iterator gsi;
   tree name1, name2, bit1, bit2, bits1, bits2;
 
@@ -447,26 +491,13 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
   t2 = fold_build2 (BIT_AND_EXPR, TREE_TYPE (name1), name1, t);
   t2 = force_gimple_operand_gsi (&gsi, t2, true, NULL_TREE,
 true, GSI_SAME_STMT);
-  t = fold_build2 (result_inv ? NE_EXPR : EQ_EXPR,
-  boolean_type_node, t2, t);
-  t = canonicalize_cond_expr_cond (t);
-  if (!t)
-   return false;
-  if (!is_gimple_condexpr_for_cond (t))
-   {
- gsi = gsi_for_stmt (inner_cond);
- t = force_gimple_operand_gsi_1 (&gsi, t, is_gimple_condexpr_for_cond,
- NULL, true, GSI_SAME_STMT);
-   }
-  gimple_cond_set_condition_from_tree (inner_cond, t);
-  update_stmt (inner_cond);
 
-  /* Leave CFG optimization to cfg_cleanup.  */
-  gimple_cond_set_condition_from_tree (outer_cond,
-   outer_inv ? boolean_false_node : boolean_true_node);
-  update_stmt (outer_cond);
+  t = fold_build2 (EQ_EXPR, boolean_type_node, t2, t);
 
-  update_profile_after_ifcombine (inner_cond_bb, outer_cond_bb);
+  if (!ifcombine_replace_cond (inner_cond, inner_inv,
+  outer_cond, outer_inv,
+  t, true, NULL_TREE))
+   return false;
 
   if (dump_file)
{
@@ -486,9 +517,8 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
  In that case remove the outer test and change the inner one to
  test for name & (bits1 | bits2) != 0.  */
   else if (recognize_bits_test (inner_cond, &name1, &bits1, !inner_inv)
-  && recognize_bits_test (outer_cond, &name2, &bits2, !outer_inv))
+  && recognize_bits_test (outer_cond, &name2, &bits2, !outer_inv))
 {
-  gimple_stmt_iterator gsi;
   tree t;
 
   if ((TREE_CODE (name1) == SSA_NAME
@@ -531,33 +561,14 @@ ifcombine_ifandif (basic_block inner_con

[gcc r15-5002] extend ifcombine_replace_cond to handle noncontiguous ifcombine

2024-11-06 Thread Alexandre Oliva via Gcc-cvs
https://gcc.gnu.org/g:6eac478619193eeb2fd714eb0988ce3197dd63b1

commit r15-5002-g6eac478619193eeb2fd714eb0988ce3197dd63b1
Author: Alexandre Oliva 
Date:   Thu Nov 7 02:47:38 2024 -0300

extend ifcombine_replace_cond to handle noncontiguous ifcombine

Prepare to handle noncontiguous ifcombine, introducing logic to modify
the outer condition when needed.  There are two cases worth
mentioning:

- when blocks are noncontiguous, we have to place the combined
  condition in the outer block to avoid pessimizing carefully crafted
  short-circuited tests;

- even when blocks are contiguous, we prepare for situations in which
  the combined condition has two tests, one to be placed in outer and
  the other in inner.  This circumstance will not come up when
  noncontiguous ifcombine is first enabled, but it will when
  an improved fold_truth_andor is integrated with ifcombine.

Combining the condition from inner into outer may require moving SSA
DEFs used in the inner condition, and the changes implement this as
well.
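
A source-level sketch of the noncontiguous case described above, written
in plain C rather than GIMPLE.  The function names and the specific tests
are hypothetical and chosen only for illustration; this does not claim
that the pass performs exactly this rewrite on this input.  The outer and
inner tests are separated by an intervening test on b, and the inner test
depends on a value t defined in the inner block, so combining into the
outer position means computing t earlier.

```c
#include <assert.h>
#include <stdbool.h>

/* Original shape: outer test on a, an intervening test on b, then an
   inner test that uses t, which is defined in the inner block.  */
static bool
nested_form (unsigned a, unsigned b)
{
  if (a & 1u)
    if (b != 0u)
      {
        unsigned t = a >> 4;	/* DEF used only by the inner condition.  */
        if (t & 1u)
          return true;
      }
  return false;
}

/* Combined shape: the definition of t is moved up so that both tests on
   a can be evaluated in the outer position, and the intervening test on
   b is still skipped whenever either test on a fails.  */
static bool
combined_form (unsigned a, unsigned b)
{
  unsigned t = a >> 4;
  if (((a & 1u) != 0u) & ((t & 1u) != 0u))
    if (b != 0u)
      return true;
  return false;
}

/* Exhaustively compare both shapes over a small input range that
   exercises bit 0 and bit 4 of a.  */
static bool
forms_agree (void)
{
  for (unsigned a = 0; a < 64; a++)
    for (unsigned b = 0; b < 4; b++)
      if (nested_form (a, b) != combined_form (a, b))
        return false;
  return true;
}
```

Had the combined test been left in the inner position with the outer test
replaced by a constant, the intervening test on b would run even when
bit 0 of a is clear, which is the pessimization the commit message warns
about.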


for  gcc/ChangeLog

* tree-ssa-ifcombine.cc: Include bitmap.h.
(ifcombine_mark_ssa_name): New.
(struct ifcombine_mark_ssa_name_t): New.
(ifcombine_mark_ssa_name_walk): New.
(ifcombine_replace_cond): Prepare to handle noncontiguous and
split-condition ifcombine.

Diff:
---
 gcc/tree-ssa-ifcombine.cc | 175 --
 1 file changed, 170 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
index 64319874888e..49bd7f2915c2 100644
--- a/gcc/tree-ssa-ifcombine.cc
+++ b/gcc/tree-ssa-ifcombine.cc
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa.h"
 #include "attribs.h"
 #include "asan.h"
+#include "bitmap.h"
 
 #ifndef LOGICAL_OP_NON_SHORT_CIRCUIT
 #define LOGICAL_OP_NON_SHORT_CIRCUIT \
@@ -461,17 +462,57 @@ update_profile_after_ifcombine (basic_block inner_cond_bb,
 }
 }
 
-/* Replace the conditions in INNER_COND with COND.
-   Replace OUTER_COND with a constant.  */
+/* Set NAME's bit in USED if OUTER dominates it.  */
+
+static void
+ifcombine_mark_ssa_name (bitmap used, tree name, basic_block outer)
+{
+  if (SSA_NAME_IS_DEFAULT_DEF (name))
+return;
+
+  gimple *def = SSA_NAME_DEF_STMT (name);
+  basic_block bb = gimple_bb (def);
+  if (!dominated_by_p (CDI_DOMINATORS, bb, outer))
+return;
+
+  bitmap_set_bit (used, SSA_NAME_VERSION (name));
+}
+
+/* Data structure passed to ifcombine_mark_ssa_name.  */
+struct ifcombine_mark_ssa_name_t
+{
+  /* SSA_NAMEs that have been referenced.  */
+  bitmap used;
+  /* Dominating block of DEFs that might need moving.  */
+  basic_block outer;
+};
+
+/* Mark in DATA->used any SSA_NAMEs used in *t.  */
+
+static tree
+ifcombine_mark_ssa_name_walk (tree *t, int *, void *data_)
+{
+  ifcombine_mark_ssa_name_t *data = (ifcombine_mark_ssa_name_t *)data_;
+
+  if (*t && TREE_CODE (*t) == SSA_NAME)
+ifcombine_mark_ssa_name (data->used, *t, data->outer);
+
+  return NULL;
+}
+
+/* Replace the conditions in INNER_COND and OUTER_COND with COND and COND2.
+   COND and COND2 are computed for insertion at INNER_COND, with OUTER_COND
+   replaced with a constant, but if there are intervening blocks, it's best to
+   adjust COND for insertion at OUTER_COND, placing COND2 at INNER_COND.  */
 
 static bool
 ifcombine_replace_cond (gcond *inner_cond, bool inner_inv,
gcond *outer_cond, bool outer_inv,
tree cond, bool must_canon, tree cond2)
 {
-  bool result_inv = inner_inv;
-
-  gcc_checking_assert (!cond2);
+  bool outer_p = cond2 || (single_pred (gimple_bb (inner_cond))
+  != gimple_bb (outer_cond));
+  bool result_inv = outer_p ? outer_inv : inner_inv;
 
   if (result_inv)
 cond = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (cond), cond);
@@ -481,6 +522,130 @@ ifcombine_replace_cond (gcond *inner_cond, bool inner_inv,
   else if (must_canon)
 return false;
 
+  if (outer_p)
+{
+  {
+   auto_bitmap used;
+   basic_block outer_bb = gimple_bb (outer_cond);
+
+   bitmap_tree_view (used);
+
+   /* Mark SSA DEFs that are referenced by cond and may thus need to be
+  moved to outer.  */
+   {
+ ifcombine_mark_ssa_name_t data = { used, outer_bb };
+ walk_tree (&cond, ifcombine_mark_ssa_name_walk, &data, NULL);
+   }
+
+   if (!bitmap_empty_p (used))
+ {
+   /* Iterate up from inner_cond, moving DEFs identified as used by
+  cond, and marking USEs in the DEFs for moving as well.  */
+   gimple_stmt_iterator gsins = gsi_for_stmt (outer_cond);
+   for (basic_block bb = gimple_bb (inner_cond);
+bb != outer_bb; bb = single_pred (bb))
+ {
+   for (gimple_stmt_iterator gsitr = gsi_last_

[gcc r15-4996] [testsuite] fix pr70321.c PIC expectations

2024-11-06 Thread Alexandre Oliva via Gcc-cvs
https://gcc.gnu.org/g:d17a2e8bfc91a8e401a2d8c61e23fba36e28a43d

commit r15-4996-gd17a2e8bfc91a8e401a2d8c61e23fba36e28a43d
Author: Alexandre Oliva 
Date:   Thu Nov 7 02:46:57 2024 -0300

[testsuite] fix pr70321.c PIC expectations

When we select a non-bx get_pc_thunk, we get an extra mov to set up
the PIC register before the abort call.  Expect that mov or a
get_pc_thunk.bx call.
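
As a rough illustration of why a count of 3 covers both cases, the sketch
below counts matches of an equivalent POSIX extended regular expression in
two assembly fragments.  Both fragments are hypothetical, hand-written
stand-ins, not real compiler output, and count_matches is an illustrative
helper, not part of the DejaGnu harness.

```c
#include <regex.h>
#include <stddef.h>

/* POSIX ERE rendering of the pattern used by the test (assumed
   equivalent to the Tcl regexp for these inputs).  */
static const char k_pattern[] = "mov|call[^\n\r]*get_pc_thunk\\.bx";

/* Hypothetical fragment using the bx thunk: one thunk call, two movs.  */
static const char sample_bx[] =
  "\tcall\t__x86.get_pc_thunk.bx\n"
  "\tmovl\t4(%esp), %eax\n"
  "\tmovl\t8(%esp), %edx\n"
  "\tcall\tabort@PLT\n";

/* Hypothetical fragment using a non-bx thunk: the thunk call does not
   match, but an extra mov sets up the PIC register.  */
static const char sample_other[] =
  "\tcall\t__x86.get_pc_thunk.ax\n"
  "\tmovl\t4(%esp), %edx\n"
  "\tmovl\t8(%esp), %ecx\n"
  "\tmovl\t%eax, %ebx\n"
  "\tcall\tabort@PLT\n";

/* Count non-overlapping matches of PATTERN in TEXT, or return -1 if the
   pattern does not compile.  */
static int
count_matches (const char *pattern, const char *text)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NEWLINE) != 0)
    return -1;

  int count = 0;
  regmatch_t m;
  const char *p = text;
  while (*p && regexec (&re, p, 1, &m, 0) == 0)
    {
      count++;
      /* Advance past the match; guard against an empty match.  */
      p += m.rm_eo > 0 ? (size_t) m.rm_eo : 1;
    }

  regfree (&re);
  return count;
}
```

Calling count_matches (k_pattern, sample_bx) and
count_matches (k_pattern, sample_other) gives the same count for both
fragments, which is the property the updated directive relies on.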


for  gcc/testsuite/ChangeLog

* gcc.target/i386/pr70321.c: Cope with non-bx get_pc_thunk.

Diff:
---
 gcc/testsuite/gcc.target/i386/pr70321.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr70321.c b/gcc/testsuite/gcc.target/i386/pr70321.c
index 58f5f5661c7a..287b7da1b950 100644
--- a/gcc/testsuite/gcc.target/i386/pr70321.c
+++ b/gcc/testsuite/gcc.target/i386/pr70321.c
@@ -9,4 +9,8 @@ void foo (long long ixi)
 
 /* { dg-final { scan-assembler-times "mov" 1 { target nonpic } } } */
 /* get_pc_thunk adds an extra mov insn.  */
-/* { dg-final { scan-assembler-times "mov" 2 { target { ! nonpic } } } } */
+/* Choosing a non-bx get_pc_thunk requires another mov before the abort call.
+   So we require a match of either that mov or the get_pc_thunk.bx call, in
+   addition to the other 2 movs.  (Hopefully there won't be more calls for a
+   false positive.)  */
+/* { dg-final { scan-assembler-times "mov|call\[^\n\r]*get_pc_thunk\.bx" 3 { target { ! nonpic } } } } */


[gcc r15-4998] allow vuses in ifcombine blocks

2024-11-06 Thread Alexandre Oliva via Gcc-cvs
https://gcc.gnu.org/g:8e6a25b01becf449d54154b7e83de5f4955cba37

commit r15-4998-g8e6a25b01becf449d54154b7e83de5f4955cba37
Author: Alexandre Oliva 
Date:   Thu Nov 7 02:47:15 2024 -0300

allow vuses in ifcombine blocks

Disallowing vuses in blocks for ifcombine is too strict, and it
prevents usefully moving fold_truth_andor into ifcombine.  That
tree-level folder has long ifcombined loads, absent other relevant
side effects.
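
A source-level sketch of the kind of code this affects, assuming a
hypothetical struct and function names.  Each condition reads memory (a
virtual use in GIMPLE terms) but neither block writes memory (no virtual
definition), so evaluating the second load unconditionally does not change
observable behavior.  Whether GCC actually combines a given pair of tests
depends on other checks not shown here.

```c
#include <assert.h>
#include <stdbool.h>

struct pair { int x; int y; };

/* A static object, so that the loads below cannot fault.  */
static struct pair g_pair;

/* Nested form: the inner block contains only a load and a compare.  */
static bool
nested_loads (void)
{
  if (g_pair.x == 1)
    if (g_pair.y == 2)
      return true;
  return false;
}

/* Combined form: both loads are performed, then tested together.  */
static bool
combined_loads (void)
{
  return (g_pair.x == 1) & (g_pair.y == 2);
}

/* Compare both forms over a small grid of field values.  */
static bool
loads_agree (void)
{
  for (int x = 0; x < 3; x++)
    for (int y = 0; y < 4; y++)
      {
        g_pair.x = x;
        g_pair.y = y;
        if (nested_loads () != combined_loads ())
          return false;
      }
  return true;
}
```

If the inner block stored to memory instead, the two forms would no longer
be interchangeable, which is why the check still rejects statements with a
vdef.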


for  gcc/ChangeLog

* tree-ssa-ifcombine.cc (bb_no_side_effects_p): Allow vuses,
but not vdefs.

Diff:
---
 gcc/tree-ssa-ifcombine.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
index 39702929fc01..57b7e4b62f29 100644
--- a/gcc/tree-ssa-ifcombine.cc
+++ b/gcc/tree-ssa-ifcombine.cc
@@ -130,7 +130,7 @@ bb_no_side_effects_p (basic_block bb)
   enum tree_code rhs_code;
   if (gimple_has_side_effects (stmt)
  || gimple_could_trap_p (stmt)
- || gimple_vuse (stmt)
+ || gimple_vdef (stmt)
  /* We need to rewrite stmts with undefined overflow to use
 unsigned arithmetic but cannot do so for signed division.  */
  || ((ass = dyn_cast <gassign *> (stmt))

