date:20241016

[gcc r15-4382] tree-optimization/117050 - fix ICE with non-grouped .MASK_LOAD SLP

2024-10-16 Thread Richard Biener via Gcc-cvs

https://gcc.gnu.org/g:ae224de0631a7fcac37ac1384f457f1dc1a487b2

commit r15-4382-gae224de0631a7fcac37ac1384f457f1dc1a487b2
Author: Richard Biener 
Date:   Thu Oct 10 11:02:47 2024 +0200

tree-optimization/117050 - fix ICE with non-grouped .MASK_LOAD SLP

The following is a more complete fix for PR117050, restoring the
ability to permute non-grouped .MASK_LOAD.

PR tree-optimization/117050
* tree-vect-slp.cc (vect_build_slp_tree_2): Properly handle
non-grouped masked loads when handling permutations.

Diff:
---
 gcc/tree-vect-slp.cc | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index f053198b86b9..629c4b433ab5 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1991,7 +1991,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
  stmt_vec_info load_info;
  load_permutation.create (group_size);
  stmt_vec_info first_stmt_info
-   = DR_GROUP_FIRST_ELEMENT (SLP_TREE_SCALAR_STMTS (node)[0]);
+   = STMT_VINFO_GROUPED_ACCESS (stmt_info)
+ ? DR_GROUP_FIRST_ELEMENT (stmt_info) : stmt_info;
  bool any_permute = false;
  bool any_null = false;
  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), j, load_info)
@@ -2035,8 +2036,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
 loads with gaps.  */
  if ((STMT_VINFO_GROUPED_ACCESS (stmt_info)
   && (DR_GROUP_GAP (first_stmt_info) != 0 || has_gaps))
- || STMT_VINFO_STRIDED_P (stmt_info)
- || (!STMT_VINFO_GROUPED_ACCESS (stmt_info) && any_permute))
+ || STMT_VINFO_STRIDED_P (stmt_info))
{
  load_permutation.release ();
  matches[0] = false;
@@ -2051,17 +2051,17 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
{
  /* Discover the whole unpermuted load.  */
  vec<stmt_vec_info> stmts2;
- stmts2.create (DR_GROUP_SIZE (first_stmt_info));
- stmts2.quick_grow_cleared (DR_GROUP_SIZE (first_stmt_info));
+ unsigned dr_group_size = STMT_VINFO_GROUPED_ACCESS (stmt_info)
+ ? DR_GROUP_SIZE (first_stmt_info) : 1;
+ stmts2.create (dr_group_size);
+ stmts2.quick_grow_cleared (dr_group_size);
  unsigned i = 0;
  for (stmt_vec_info si = first_stmt_info;
   si; si = DR_GROUP_NEXT_ELEMENT (si))
stmts2[i++] = si;
- bool *matches2
-   = XALLOCAVEC (bool, DR_GROUP_SIZE (first_stmt_info));
+ bool *matches2 = XALLOCAVEC (bool, dr_group_size);
  slp_tree unperm_load
-   = vect_build_slp_tree (vinfo, stmts2,
-  DR_GROUP_SIZE (first_stmt_info),
+   = vect_build_slp_tree (vinfo, stmts2, dr_group_size,
   &this_max_nunits, matches2, limit,
   &this_tree_size, bst_map);
  /* When we are able to do the full masked load emit that

[gcc r15-4381] Remove SLP_INSTANCE_UNROLLING_FACTOR, compute VF in vect_make_slp_decision

2024-10-16 Thread Richard Biener via Gcc-cvs

https://gcc.gnu.org/g:962a994d57f5b39c92d26b0446dbe6de9aa8910a

commit r15-4381-g962a994d57f5b39c92d26b0446dbe6de9aa8910a
Author: Richard Biener 
Date:   Wed Oct 9 14:38:48 2024 +0200

Remove SLP_INSTANCE_UNROLLING_FACTOR, compute VF in vect_make_slp_decision

The following prepares us for SLP instances with a non-uniform number
of lanes.  We already have this with load permutation lowering, but
we managed to keep that within the constraints of the per SLP instance
computed VF based on its max_nunits (with a vector type fixed for
each node) and the instance group size which is the number of lanes
in the SLP instance root.  But in the case where arbitrary splitting
and merging SLP nodes at non-power-of-two lane boundaries is allowed
this simple calculation based on the outgoing group size falls apart.

The following, instead of computing a VF during SLP instance
discovery, computes it at vect_make_slp_decision time by walking
the SLP graph and looking at each SLP node in isolation.  We do
track max_nunits per node which could be a VF per node instead or
forgo both completely (though for BB vectorization we need
to communicate a VF > 1 requirement upward, or compute that after
the fact).  In the end we'd like to delay vector type assignment
and only compute a minimum VF here, allowing vector types to
grow when the actual VF is bigger.

There's a slight complication with permutes of externs / constants
as those get their vector type (and thus max_nunits) assigned late.
While we force them to have the same vector type as the result at
the moment their number of lanes can differ.  So those get handled
explicitly there right now to up the VF as needed - the alternative
is to fail vectorization; I have an addition to
vect_maybe_update_slp_op_vectype that would FAIL if the set
vector type isn't within the constraints of the VF.
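
As a rough illustration of computing the VF by looking at each node in
isolation, here is a toy C++ model.  It is not the code added to
tree-vect-slp.cc: ToyNode, node_vf, update_vf_for_node and compute_vf are
hypothetical names, plain integers stand in for poly_uint64, and the
per-node rule (smallest unroll factor that makes lanes * VF a multiple of
max_nunits) is an assumption made for the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <unordered_set>
#include <vector>

// Toy stand-in for an SLP node; this is not GCC's slp_tree.
struct ToyNode
{
  std::uint64_t lanes;       // scalar lanes covered by this node
  std::uint64_t max_nunits;  // elements per vector for this node's type
  std::vector<const ToyNode *> children;
};

// Smallest unroll factor such that lanes * vf fills whole vectors.
inline std::uint64_t
node_vf (const ToyNode &n)
{
  assert (n.lanes > 0 && n.max_nunits > 0);
  return std::lcm (n.lanes, n.max_nunits) / n.lanes;
}

// Visit each node once and fold its requirement into the running VF.
inline void
update_vf_for_node (const ToyNode *n, std::uint64_t &vf,
                    std::unordered_set<const ToyNode *> &visited)
{
  if (!n || !visited.insert (n).second)
    return;
  vf = std::lcm (vf, node_vf (*n));
  for (const ToyNode *child : n->children)
    update_vf_for_node (child, vf, visited);
}

// Walk the whole graph below ROOT and return the combined VF.
inline std::uint64_t
compute_vf (const ToyNode *root)
{
  std::uint64_t vf = 1;
  std::unordered_set<const ToyNode *> visited;
  update_vf_for_node (root, vf, visited);
  return vf;
}
```

In this model a two-lane root over a three-lane child, both with four
elements per vector, needs a VF of 4 because of the child, which a
calculation based only on the root's group size would not see.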

* tree-vectorizer.h (SLP_INSTANCE_UNROLLING_FACTOR): Remove.
(slp_instance::unrolling_factor): Likewise.
* tree-vect-slp.cc (vect_build_slp_instance): Do not set
SLP_INSTANCE_UNROLLING_FACTOR.  Remove then dead code.
Compute and set max_nunits from the RHS nodes merged.
(vect_update_slp_vf_for_node): New function.
(vect_make_slp_decision): Use vect_update_slp_vf_for_node
to compute VF recursively.
(vect_build_slp_store_interleaving): Get max_nunits and
properly set that on the permute nodes built.
(vect_analyze_slp): Do not set SLP_INSTANCE_UNROLLING_FACTOR.

Diff:
---
 gcc/tree-vect-slp.cc  | 72 +++
 gcc/tree-vectorizer.h |  4 ---
 2 files changed, 55 insertions(+), 21 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 8727246c27a6..f053198b86b9 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3557,13 +3557,15 @@ vect_analyze_slp_instance (vec_info *vinfo,
 
 static slp_tree
 vect_build_slp_store_interleaving (vec<slp_tree> &rhs_nodes,
-  vec<stmt_vec_info> &scalar_stmts)
+  vec<stmt_vec_info> &scalar_stmts,
+  poly_uint64 max_nunits)
 {
   unsigned int group_size = scalar_stmts.length ();
   slp_tree node = vect_create_new_slp_node (scalar_stmts,
SLP_TREE_CHILDREN
  (rhs_nodes[0]).length ());
   SLP_TREE_VECTYPE (node) = SLP_TREE_VECTYPE (rhs_nodes[0]);
+  node->max_nunits = max_nunits;
   for (unsigned l = 0;
l < SLP_TREE_CHILDREN (rhs_nodes[0]).length (); ++l)
 {
@@ -3573,6 +3575,7 @@ vect_build_slp_store_interleaving (vec<slp_tree> &rhs_nodes,
   SLP_TREE_CHILDREN (node).quick_push (perm);
   SLP_TREE_LANE_PERMUTATION (perm).create (group_size);
   SLP_TREE_VECTYPE (perm) = SLP_TREE_VECTYPE (node);
+  perm->max_nunits = max_nunits;
   SLP_TREE_LANES (perm) = group_size;
   /* ???  We should set this NULL but that's not expected.  */
   SLP_TREE_REPRESENTATIVE (perm)
@@ -3628,6 +3631,7 @@ vect_build_slp_store_interleaving (vec<slp_tree> &rhs_nodes,
  SLP_TREE_LANES (permab) = n;
  SLP_TREE_LANE_PERMUTATION (permab).create (n);
  SLP_TREE_VECTYPE (permab) = SLP_TREE_VECTYPE (perm);
+ permab->max_nunits = max_nunits;
  /* ???  Should be NULL but that's not expected.  */
  SLP_TREE_REPRESENTATIVE (permab) = SLP_TREE_REPRESENTATIVE (perm);
  SLP_TREE_CHILDREN (permab).quick_push (a);
@@ -3698,6 +3702,7 @@ vect_build_slp_store_interleaving (vec<slp_tree> &rhs_nodes,
  SLP_TREE_LANES (permab) = n;
  SLP_TREE_LANE_PERMUTATION (permab).create (n);
  SLP_TREE_VECTYPE (permab) = SLP_TREE_VECTYPE (perm);
+ permab->max_nunits = max_nunits;
  /* ???  Should be NULL but that's not expe

[gcc r15-4383] Enhance gather fallback for PR65518 with SLP

2024-10-16 Thread Richard Biener via Gcc-cvs

https://gcc.gnu.org/g:62c4e621a8182c58161188009f1e9b256e1b

commit r15-4383-g62c4e621a8182c58161188009f1e9b256e1b
Author: Richard Biener 
Date:   Wed Oct 16 10:09:36 2024 +0200

Enhance gather fallback for PR65518 with SLP

With SLP forced we fail to use the expected gather for PR65518 on RISC-V
because peeling for gaps is not effective.  The following moves the
memory_access_type adjustment before all the overrun checking, since
using VMAT_ELEMENTWISE means there's no overrun.

* tree-vect-stmts.cc (get_group_load_store_type): Move
VMAT_ELEMENTWISE fallback for single-element interleaving
of too large groups before overrun checking.

* gcc.dg/vect/pr65518.c: Adjust.

Diff:
---
 gcc/testsuite/gcc.dg/vect/pr65518.c | 109 ++--
 gcc/tree-vect-stmts.cc  |  58 ++-
 2 files changed, 85 insertions(+), 82 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr65518.c b/gcc/testsuite/gcc.dg/vect/pr65518.c
index 189a65534f61..6d8515061694 100644
--- a/gcc/testsuite/gcc.dg/vect/pr65518.c
+++ b/gcc/testsuite/gcc.dg/vect/pr65518.c
@@ -1,54 +1,55 @@
-#include "tree-vect.h"
-
-#if VECTOR_BITS > 256
-#define NINTS (VECTOR_BITS / 32)
-#else
-#define NINTS 8
-#endif
-
-#define N (NINTS * 2)
-#define RESULT (NINTS * (NINTS - 1) / 2 * N + NINTS)
-
-extern void abort (void);
-
-typedef struct giga
-{
-  unsigned int g[N];
-} giga;
-
-unsigned long __attribute__((noinline,noclone))
-addfst(giga const *gptr, int num)
-{
-  unsigned int retval = 0;
-  int i;
-  for (i = 0; i < num; i++)
-retval += gptr[i].g[0];
-  return retval;
-}
-
-int main ()
-{
-  struct giga g[NINTS];
-  unsigned int n = 1;
-  int i, j;
-  check_vect ();
-  for (i = 0; i < NINTS; ++i)
-for (j = 0; j < N; ++j)
-  {
-   g[i].g[j] = n++;
-   __asm__ volatile ("");
-  }
-  if (addfst (g, NINTS) != RESULT)
-abort ();
-  return 0;
-}
-
-/* We don't want to vectorize the single-element interleaving in the way
-   we currently do that (without ignoring not needed vectors in the
-   gap between gptr[0].g[0] and gptr[1].g[0]), because that's very
-   sub-optimal and causes memory explosion (even though the cost model
-   should reject that in the end).  */
-
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops in function" 2 "vect" { target {! riscv*-*-* } } } } */
-/* We end up using gathers for the strided load on RISC-V which would be OK.  */
-/* { dg-final { scan-tree-dump "using gather/scatter for strided/grouped access" "vect" { target { riscv*-*-* } } } } */
+#include "tree-vect.h"
+
+#if VECTOR_BITS > 256
+#define NINTS (VECTOR_BITS / 32)
+#else
+#define NINTS 8
+#endif
+
+#define N (NINTS * 2)
+#define RESULT (NINTS * (NINTS - 1) / 2 * N + NINTS)
+
+extern void abort (void);
+
+typedef struct giga
+{
+  unsigned int g[N];
+} giga;
+
+unsigned long __attribute__((noinline,noclone))
+addfst(giga const *gptr, int num)
+{
+  unsigned int retval = 0;
+  int i;
+  for (i = 0; i < num; i++)
+retval += gptr[i].g[0];
+  return retval;
+}
+
+int main ()
+{
+  struct giga g[NINTS];
+  unsigned int n = 1;
+  int i, j;
+  check_vect ();
+  for (i = 0; i < NINTS; ++i)
+for (j = 0; j < N; ++j)
+  {
+   g[i].g[j] = n++;
+   __asm__ volatile ("");
+  }
+  if (addfst (g, NINTS) != RESULT)
+abort ();
+  return 0;
+}
+
+/* We don't want to vectorize the single-element interleaving in the way
+   we currently do that (without ignoring not needed vectors in the
+   gap between gptr[0].g[0] and gptr[1].g[0]), because that's very
+   sub-optimal and causes memory explosion (even though the cost model
+   should reject that in the end).  */
+
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops in function" 2 "vect" { target {! riscv*-*-* } } } } */
+/* We should end up using gathers for the strided load on RISC-V.  */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target { riscv*-*-* } } } } */
+/* { dg-final { scan-tree-dump "using gather/scatter for strided/grouped access" "vect" { target { riscv*-*-* } } } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 9b14b96cb5a6..6967d50288e9 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2081,6 +2081,35 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
  else
*memory_access_type = VMAT_CONTIGUOUS;
 
+ /* If this is single-element interleaving with an element
+distance that leaves unused vector loads around punt - we
+at least create very sub-optimal code in that case (and
+blow up memory, see PR65518).  */
+ if (loop_vinfo
+ && *memory_access_type == VMAT_CONTIGUOUS
+ && single_element_p
+ && maybe_gt (group_size, TYPE_VECTOR_SUBPARTS (vectype)))
+   {
+ if

[gcc r14-10795] libstdc++: Fix Python deprecation warning in printers.py

2024-10-16 Thread Jonathan Wakely via Gcc-cvs

https://gcc.gnu.org/g:23480efbab9212adb3a0147a3e5dd93d53e1843b

commit r14-10795-g23480efbab9212adb3a0147a3e5dd93d53e1843b
Author: Jonathan Wakely 
Date:   Wed Oct 16 09:22:37 2024 +0100

libstdc++: Fix Python deprecation warning in printers.py

python/libstdcxx/v6/printers.py:1355: DeprecationWarning: 'count' is passed as positional argument

The Python docs say:

  Deprecated since version 3.13: Passing count and flags as positional
  arguments is deprecated. In future Python versions they will be
  keyword-only parameters.

Using a keyword argument for count only became possible with Python 3.1
so introduce a new function to do the substitution.

libstdc++-v3/ChangeLog:

* python/libstdcxx/v6/printers.py (strip_fundts_namespace): New.
(StdExpAnyPrinter, StdExpOptionalPrinter): Use it.

(cherry picked from commit b9e98bb9919fa9f07782f23f79b3d35abb9ff542)

Diff:
---
 libstdc++-v3/python/libstdcxx/v6/printers.py | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index a6c2ed4599fa..3026de35bbd2 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -220,6 +220,16 @@ def strip_versioned_namespace(typename):
     return typename.replace(_versioned_namespace, '')
 
 
+def strip_fundts_namespace(typ):
+    """Remove "fundamentals_vN" inline namespace from qualified type name."""
+    pattern = r'^std::experimental::fundamentals_v\d::'
+    repl = 'std::experimental::'
+    if sys.version_info[0] == 2:
+        return re.sub(pattern, repl, typ, 1)
+    else: # Technically this needs Python 3.1 but nobody should be using 3.0
+        return re.sub(pattern, repl, typ, count=1)
+
+
 def strip_inline_namespaces(type_str):
     """Remove known inline namespaces from the canonical name of a type."""
     type_str = strip_versioned_namespace(type_str)
@@ -1352,8 +1362,7 @@ class StdExpAnyPrinter(SingleObjContainerPrinter):
 
     def __init__(self, typename, val):
         self._typename = strip_versioned_namespace(typename)
-        self._typename = re.sub(r'^std::experimental::fundamentals_v\d::',
-                                'std::experimental::', self._typename, 1)
+        self._typename = strip_fundts_namespace(self._typename)
         self._val = val
         self._contained_type = None
         contained_value = None
@@ -1446,10 +1455,8 @@ class StdExpOptionalPrinter(SingleObjContainerPrinter):
     """Print a std::optional or std::experimental::optional."""
 
     def __init__(self, typename, val):
-        typename = strip_versioned_namespace(typename)
-        self._typename = re.sub(
-            r'^std::(experimental::|)(fundamentals_v\d::|)(.*)',
-            r'std::\1\3', typename, 1)
+        self._typename = strip_versioned_namespace(typename)
+        self._typename = strip_fundts_namespace(self._typename)
         payload = val['_M_payload']
         if self._typename.startswith('std::experimental'):
             engaged = val['_M_engaged']

[gcc r14-10792] libstdc++: Fix localized %c formatting for <chrono> [PR117085]

2024-10-16 Thread Jonathan Wakely via Libstdc++-cvs

https://gcc.gnu.org/g:f1436fde43215659554418220aa45830a5e7ae61

commit r14-10792-gf1436fde43215659554418220aa45830a5e7ae61
Author: Jonathan Wakely 
Date:   Fri Oct 11 09:40:38 2024 +0100

libstdc++: Fix localized %c formatting for <chrono> [PR117085]

When formatting a time point with %c we call std::vformat_to using the
formatting locale's D_T_FMT string, but we weren't adding the L option
to the format string. This meant we always interpreted D_T_FMT in the C
locale, instead of using the formatting locale as obviously intended
when %c is used.

libstdc++-v3/ChangeLog:

PR libstdc++/117085
* include/bits/chrono_io.h (__formatter_chrono::_M_c): Add L
option to format string.
* testsuite/std/time/format.cc: Move to...
* testsuite/std/time/format/format.cc: ...here.
* testsuite/std/time/format/pr117085.cc: New test.

(cherry picked from commit 4ad697bb7f1aad252e1398c6f13eed3fa6d0ca5b)

Diff:
---
 libstdc++-v3/include/bits/chrono_io.h | 14 --
 .../testsuite/std/time/{ => format}/format.cc |  0
 libstdc++-v3/testsuite/std/time/format/pr117085.cc| 19 +++
 3 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/include/bits/chrono_io.h b/libstdc++-v3/include/bits/chrono_io.h
index eaa864b0bdcd..6c813bf439d5 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -793,17 +793,19 @@ namespace __format
  // %c  Locale's date and time representation.
  // %Ec Locale's alternate date and time representation.
 
+ basic_string<_CharT> __fmt;
  auto __t = _S_floor_seconds(__tt);
  locale __loc = _M_locale(__ctx);
  const auto& __tp = use_facet<__timepunct<_CharT>>(__loc);
  const _CharT* __formats[2];
  __tp._M_date_time_formats(__formats);
- const _CharT* __rep = __formats[__mod];
- if (!*__rep) [[unlikely]]
-   __rep = _GLIBCXX_WIDEN("%a %b %e %T %Y");
- basic_string<_CharT> __fmt(_S_empty_spec);
- __fmt.insert(1u, 1u, _S_colon);
- __fmt.insert(2u, __rep);
+ if (*__formats[__mod]) [[likely]]
+   {
+ __fmt = _GLIBCXX_WIDEN("{:L}");
+ __fmt.insert(3u, __formats[__mod]);
+   }
+ else
+   __fmt = _GLIBCXX_WIDEN("{:L%a %b %e %T %Y}");
  return std::vformat_to(std::move(__out), __loc, __fmt,
 std::make_format_args<_FormatContext>(__t));
}
diff --git a/libstdc++-v3/testsuite/std/time/format.cc b/libstdc++-v3/testsuite/std/time/format/format.cc
similarity index 100%
rename from libstdc++-v3/testsuite/std/time/format.cc
rename to libstdc++-v3/testsuite/std/time/format/format.cc
diff --git a/libstdc++-v3/testsuite/std/time/format/pr117085.cc b/libstdc++-v3/testsuite/std/time/format/pr117085.cc
new file mode 100644
index ..99ef8389995d
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/time/format/pr117085.cc
@@ -0,0 +1,19 @@
+// { dg-do run { target c++20 } }
+// { dg-require-namedlocale "fr_FR.ISO8859-1" }
+
+#include <chrono>
+#include <locale>
+#include <testsuite_hooks.h>
+
+void
+test_c()
+{
+  std::locale::global(std::locale(ISO_8859(1,fr_FR)));
+  auto s = std::format("{:L%c}", std::chrono::sys_seconds());
+  VERIFY( ! s.starts_with("Thu") );
+}
+
+int main()
+{
+  test_c();
+}

[gcc r15-4380] testsuite: Add tests for C23 __STDC_VERSION__

2024-10-16 Thread Joseph Myers via Gcc-cvs

https://gcc.gnu.org/g:65abc81c3982631255799e4a666a5dd5b03dc817

commit r15-4380-g65abc81c3982631255799e4a666a5dd5b03dc817
Author: Joseph Myers 
Date:   Wed Oct 16 10:37:10 2024 +

testsuite: Add tests for C23 __STDC_VERSION__

Add some tests for the value of __STDC_VERSION__ in C23 mode.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

* gcc.dg/c23-version-1.c, gcc.dg/c23-version-2.c,
gcc.dg/gnu23-version-1.c: New tests.

Diff:
---
 gcc/testsuite/gcc.dg/c23-version-1.c   | 9 +
 gcc/testsuite/gcc.dg/c23-version-2.c   | 9 +
 gcc/testsuite/gcc.dg/gnu23-version-1.c | 9 +
 3 files changed, 27 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/c23-version-1.c b/gcc/testsuite/gcc.dg/c23-version-1.c
new file mode 100644
index ..2145f9742973
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c23-version-1.c
@@ -0,0 +1,9 @@
+/* Test __STDC_VERSION__ for C23.  Test -std=c23.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c23 -pedantic-errors" } */
+
+#if __STDC_VERSION__ == 202311L
+int i;
+#else
+#error "Bad __STDC_VERSION__."
+#endif
diff --git a/gcc/testsuite/gcc.dg/c23-version-2.c b/gcc/testsuite/gcc.dg/c23-version-2.c
new file mode 100644
index ..3d44b7324c9b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c23-version-2.c
@@ -0,0 +1,9 @@
+/* Test __STDC_VERSION__ for C23.  Test -std=iso9899:2024.  */
+/* { dg-do compile } */
+/* { dg-options "-std=iso9899:2024 -pedantic-errors" } */
+
+#if __STDC_VERSION__ == 202311L
+int i;
+#else
+#error "Bad __STDC_VERSION__."
+#endif
diff --git a/gcc/testsuite/gcc.dg/gnu23-version-1.c b/gcc/testsuite/gcc.dg/gnu23-version-1.c
new file mode 100644
index ..649f13b54ec5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gnu23-version-1.c
@@ -0,0 +1,9 @@
+/* Test __STDC_VERSION__ for C23 with GNU extensions.  Test -std=gnu23.  */
+/* { dg-do compile } */
+/* { dg-options "-std=gnu23 -pedantic-errors" } */
+
+#if __STDC_VERSION__ == 202311L
+int i;
+#else
+#error "Bad __STDC_VERSION__."
+#endif

[gcc r14-10789] libstdc++: Use std::move for iterator in ranges::fill [PR117094]

2024-10-16 Thread Jonathan Wakely via Libstdc++-cvs

https://gcc.gnu.org/g:cbb1814ffa29acc390bb0de46be49a24d09948d1

commit r14-10789-gcbb1814ffa29acc390bb0de46be49a24d09948d1
Author: Jonathan Wakely 
Date:   Sun Oct 13 22:48:43 2024 +0100

libstdc++: Use std::move for iterator in ranges::fill [PR117094]

Input iterators aren't required to be copyable.
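
The pattern behind the fix can be sketched with a toy wrapper that
delegates to a by-value "fill_n"-style function.  Everything here
(MoveOnlyCursor, toy_fill_n, toy_fill) is a hypothetical stand-in and not
the libstdc++ code; the point is only that the delegating call has to
move the iterator, since a copy would not compile for a move-only type.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical move-only output cursor: copying is deleted.
struct MoveOnlyCursor
{
  int *p;
  explicit MoveOnlyCursor (int *q) : p (q) {}
  MoveOnlyCursor (MoveOnlyCursor &&) = default;
  MoveOnlyCursor &operator= (MoveOnlyCursor &&) = default;
  MoveOnlyCursor (const MoveOnlyCursor &) = delete;
  MoveOnlyCursor &operator= (const MoveOnlyCursor &) = delete;
  int &operator* () const { return *p; }
  MoveOnlyCursor &operator++ () { ++p; return *this; }
};

// Takes the iterator by value, in the style of ranges::fill_n.
template<typename Out, typename T>
Out
toy_fill_n (Out first, std::ptrdiff_t n, const T &value)
{
  for (; n > 0; --n, ++first)
    *first = value;
  return first;
}

// Delegating wrapper: FIRST must be moved into the callee, not copied.
inline MoveOnlyCursor
toy_fill (MoveOnlyCursor first, const int *last, int value)
{
  const std::ptrdiff_t len = last - first.p;
  assert (len >= 0);
  return toy_fill_n (std::move (first), len, value);
}
```

Replacing std::move (first) with plain first in toy_fill makes the call
ill-formed for MoveOnlyCursor, which mirrors the failure reported in the
PR.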

libstdc++-v3/ChangeLog:

PR libstdc++/117094
* include/bits/ranges_algobase.h (__fill_fn): Use std::move for
iterator that might not be copyable.
* testsuite/25_algorithms/fill/constrained.cc: Check
non-copyable iterator with sized sentinel.

(cherry picked from commit 03623fa91ff36ecb9faa3b55f7842a39b759594e)

Diff:
---
 libstdc++-v3/include/bits/ranges_algobase.h|  2 +-
 .../testsuite/25_algorithms/fill/constrained.cc| 34 ++
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/ranges_algobase.h b/libstdc++-v3/include/bits/ranges_algobase.h
index 7387f616d361..caced9f26ba4 100644
--- a/libstdc++-v3/include/bits/ranges_algobase.h
+++ b/libstdc++-v3/include/bits/ranges_algobase.h
@@ -567,7 +567,7 @@ namespace ranges
if constexpr (sized_sentinel_for<_Sent, _Out>)
  {
const auto __len = __last - __first;
-   return ranges::fill_n(__first, __len, __value);
+   return ranges::fill_n(std::move(__first), __len, __value);
  }
else if constexpr (is_scalar_v<_Tp>)
  {
diff --git a/libstdc++-v3/testsuite/25_algorithms/fill/constrained.cc b/libstdc++-v3/testsuite/25_algorithms/fill/constrained.cc
index 126515eddcaa..7cae99f2d5ce 100644
--- a/libstdc++-v3/testsuite/25_algorithms/fill/constrained.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/fill/constrained.cc
@@ -83,9 +83,43 @@ test02()
   return ok;
 }
 
+void
+test03()
+{
+  // Bug libstdc++/117094 - ranges::fill misses std::move for output_iterator
+
+  // Move-only output iterator
+  struct Iterator
+  {
+using difference_type = long;
+Iterator(int* p) : p(p) { }
+Iterator(Iterator&&) = default;
+Iterator& operator=(Iterator&&) = default;
+int& operator*() const { return *p; }
+Iterator& operator++() { ++p; return *this; }
+Iterator operator++(int) { return Iterator(p++ ); }
+int* p;
+
+struct Sentinel
+{
+  const int* p;
+  bool operator==(const Iterator& i) const { return p == i.p; }
+  long operator-(const Iterator& i) const { return p - i.p; }
+};
+
+long operator-(Sentinel s) const { return p - s.p; }
+  };
+  static_assert(std::sized_sentinel_for<Iterator::Sentinel, Iterator>);
+  int a[2];
+  std::ranges::fill(Iterator(a), Iterator::Sentinel{a+2}, 999);
+  VERIFY( a[0] == 999 );
+  VERIFY( a[1] == 999 );
+}
+
 int
 main()
 {
   test01();
   static_assert(test02());
+  test03();
 }

[gcc r14-10790] libstdc++: Populate generic std::time_get's wide %c format [PR117135]

2024-10-16 Thread Jonathan Wakely via Libstdc++-cvs

https://gcc.gnu.org/g:8f181a2f878e8b97a91d68214161cb96a2b7

commit r14-10790-g8f181a2f878e8b97a91d68214161cb96a2b7
Author: Jonathan Wakely 
Date:   Tue Sep 24 23:20:56 2024 +0100

libstdc++: Populate generic std::time_get's wide %c format [PR117135]

I missed out the __timepunct<wchar_t> specialization for the "generic"
implementation when defining the %c format in r15-4016-gc534e37faccf48.

libstdc++-v3/ChangeLog:

PR libstdc++/117135
* config/locale/generic/time_members.cc
(__timepunct<wchar_t>::_M_initialize_timepunc): Set
_M_date_time_format for C locale. Set %Ex formats to the same
values as the %x formats.

(cherry picked from commit 707d84efee7f7eb5a336935f386e094402f267a6)

Diff:
---
 libstdc++-v3/config/locale/generic/time_members.cc | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/config/locale/generic/time_members.cc b/libstdc++-v3/config/locale/generic/time_members.cc
index 6619f0ca881a..5012a270dd1a 100644
--- a/libstdc++-v3/config/locale/generic/time_members.cc
+++ b/libstdc++-v3/config/locale/generic/time_members.cc
@@ -150,11 +150,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_M_data = new __timepunct_cache;
 
   _M_data->_M_date_format = L"%m/%d/%y";
-  _M_data->_M_date_era_format = L"%m/%d/%y";
+  _M_data->_M_date_era_format = _M_data->_M_date_format;
   _M_data->_M_time_format = L"%H:%M:%S";
-  _M_data->_M_time_era_format = L"%H:%M:%S";
-  _M_data->_M_date_time_format = L"";
-  _M_data->_M_date_time_era_format = L"";
+  _M_data->_M_time_era_format = _M_data->_M_time_format;
+  _M_data->_M_date_time_format = L"%a %b %e %T %Y";
+  _M_data->_M_date_time_era_format = _M_data->_M_date_time_format;
   _M_data->_M_am = L"AM";
   _M_data->_M_pm = L"PM";
   _M_data->_M_am_pm_format = L"%I:%M:%S %p";

[gcc r14-10794] libstdc++: Increase timeouts for PSTL tests in debug mode [PR90276]

2024-10-16 Thread Jonathan Wakely via Gcc-cvs

https://gcc.gnu.org/g:f1cee9d1a049a3bc7cae24245fcc3c415fd12764

commit r14-10794-gf1cee9d1a049a3bc7cae24245fcc3c415fd12764
Author: Jonathan Wakely 
Date:   Wed Jun 12 17:11:23 2024 +0100

libstdc++: Increase timeouts for PSTL tests in debug mode [PR90276]

These tests compile very slowly in debug mode.

libstdc++-v3/ChangeLog:

PR libstdc++/90276
* testsuite/25_algorithms/pstl/alg_modifying_operations/rotate_copy.cc:
Increase timeout for debug mode.
* testsuite/25_algorithms/pstl/alg_modifying_operations/transform_binary.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/mismatch.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/lexicographical_compare.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/minmax_element.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/set_symmetric_difference.cc:
Likewise.

(cherry picked from commit e65b6627a36869b01bbe128a5324e4b415b28880)

Diff:
---
 .../testsuite/25_algorithms/pstl/alg_modifying_operations/rotate_copy.cc | 1 +
 .../25_algorithms/pstl/alg_modifying_operations/transform_binary.cc  | 1 +
 libstdc++-v3/testsuite/25_algorithms/pstl/alg_nonmodifying/mismatch.cc   | 1 +
 .../testsuite/25_algorithms/pstl/alg_sorting/lexicographical_compare.cc  | 1 +
 libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/minmax_element.cc  | 1 +
 .../testsuite/25_algorithms/pstl/alg_sorting/set_symmetric_difference.cc | 1 +
 6 files changed, 6 insertions(+)

diff --git a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_modifying_operations/rotate_copy.cc b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_modifying_operations/rotate_copy.cc
index ea647c6c23a0..1b788e1b7ee5 100644
--- a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_modifying_operations/rotate_copy.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_modifying_operations/rotate_copy.cc
@@ -2,6 +2,7 @@
 // { dg-options "-ltbb" }
 // { dg-do run { target c++17 } }
 // { dg-timeout-factor 3 }
+// { dg-timeout-factor 5 { target debug_mode } }
 // { dg-require-effective-target tbb_backend }
 
 //===-- rotate_copy.pass.cpp 
--===//
diff --git a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_modifying_operations/transform_binary.cc b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_modifying_operations/transform_binary.cc
index 1f5f239a94be..16b815c5d514 100644
--- a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_modifying_operations/transform_binary.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_modifying_operations/transform_binary.cc
@@ -2,6 +2,7 @@
 // { dg-options "-ltbb" }
 // { dg-do run { target c++17 } }
 // { dg-timeout-factor 3 }
+// { dg-timeout-factor 5 { target debug_mode } }
 // { dg-require-effective-target tbb_backend }
 
 //===-- transform_binary.pass.cpp 
-===//
diff --git a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_nonmodifying/mismatch.cc b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_nonmodifying/mismatch.cc
index 1173186f65c0..441f5d1e3782 100644
--- a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_nonmodifying/mismatch.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_nonmodifying/mismatch.cc
@@ -2,6 +2,7 @@
 // { dg-options "-ltbb" }
 // { dg-do run { target c++17 } }
 // { dg-timeout-factor 3 }
+// { dg-timeout-factor 5 { target debug_mode } }
 // { dg-require-effective-target tbb_backend }
 
 //===-- mismatch.pass.cpp 
-===//
diff --git a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/lexicographical_compare.cc b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/lexicographical_compare.cc
index 924aa78652e8..78edeb025d78 100644
--- a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/lexicographical_compare.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/lexicographical_compare.cc
@@ -2,6 +2,7 @@
 // { dg-options "-ltbb" }
 // { dg-do run { target c++17 } }
 // { dg-timeout-factor 3 }
+// { dg-timeout-factor 5 { target debug_mode } }
 // { dg-require-effective-target tbb_backend }
 
 //===-- lexicographical_compare.pass.cpp 
--===//
diff --git a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/minmax_element.cc b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/minmax_element.cc
index 0a9f41ca1797..e4bd435d1926 100644
--- a/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/minmax_element.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/pstl/alg_sorting/minmax_element.cc
@@ -2,6 +2,7 @@
 // { dg-options "-ltbb" }
 // { dg-do run { target c++17 } }
 // { dg-timeout-factor 3 }
+// { dg-timeout-factor 5 { target debug_mode } }
 // { dg-require-effective-target tbb_backend }
 
 //===-- minmax_element.pass.cpp 
--

[gcc r14-10793] libstdc++: Implement LWG 3564 for ranges::transform_view

2024-10-16 Thread Jonathan Wakely via Gcc-cvs

https://gcc.gnu.org/g:4d8a55ac552627ebf9bf50d28a35459cba58d8c6

commit r14-10793-g4d8a55ac552627ebf9bf50d28a35459cba58d8c6
Author: Jonathan Wakely 
Date:   Sun Oct 13 21:47:14 2024 +0100

libstdc++: Implement LWG 3564 for ranges::transform_view

The _Iterator<true> type returned by begin() const uses const F& to
transform the elements, so it should use const F& to determine the
iterator's value_type and iterator_category as well.

This was accepted into the WP in July 2022.
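
The difference the issue is about can be checked in isolation with
std::invoke_result_t, using the functor from the new test.  This is only
a sketch of the type computation (written for C++17), not the code in
<ranges>; remove_cvref, via_mutable and via_const are names introduced
here for illustration.

```cpp
#include <type_traits>

// Functor from the new test: the overload chosen depends on constness.
struct F
{
  short operator() (int) { return 0; }
  const int &operator() (const int &i) const { return i; }
};

// Local equivalent of C++20 std::remove_cvref_t.
template<typename T>
using remove_cvref = std::remove_cv_t<std::remove_reference_t<T>>;

// Result type when the functor is invoked through F&.
using via_mutable = remove_cvref<std::invoke_result_t<F &, int &>>;
// Result type when it is invoked through const F&, as the const
// iterator does.
using via_const = remove_cvref<std::invoke_result_t<const F &, int &>>;

static_assert (std::is_same_v<via_mutable, short>);
static_assert (std::is_same_v<via_const, int>);
```

Computing value_type from F& would therefore report short for an
iterator that actually yields const int&, which is the mismatch LWG 3564
resolves.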

libstdc++-v3/ChangeLog:

* include/std/ranges (transform_view::_Iterator): Use const F&
to determine value_type and iterator_category of
_Iterator, as per LWG 3564.
* testsuite/std/ranges/adaptors/transform.cc: Check value_type
and iterator_category.

Reviewed-by: Patrick Palka 
(cherry picked from commit dde19c600c3c8a1d765c9b4961d2556e89edad14)

Diff:
---
 libstdc++-v3/include/std/ranges   |  9 +++--
 .../testsuite/std/ranges/adaptors/transform.cc| 19 +++
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 59a251536208..2c8a8535d396 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -1870,8 +1870,12 @@ namespace views::__adaptor
  static auto
  _S_iter_cat()
  {
+   // _GLIBCXX_RESOLVE_LIB_DEFECTS
+   // 3564. transform_view::iterator<true>::value_type and
+   // iterator_category should use const F&
using _Base = transform_view::_Base<_Const>;
-   using _Res = invoke_result_t<_Fp&, range_reference_t<_Base>>;
+   using _Res = invoke_result_t<__maybe_const_t<_Const, _Fp>&,
+range_reference_t<_Base>>;
if constexpr (is_lvalue_reference_v<_Res>)
  {
using _Cat
@@ -1920,7 +1924,8 @@ namespace views::__adaptor
  using iterator_concept = decltype(_S_iter_concept());
  // iterator_category defined in __transform_view_iter_cat
  using value_type
-   = remove_cvref_t<invoke_result_t<_Fp&, range_reference_t<_Base>>>;
+   = remove_cvref_t<invoke_result_t<__maybe_const_t<_Const, _Fp>&,
+range_reference_t<_Base>>>;
  using difference_type = range_difference_t<_Base>;
 
  _Iterator() requires default_initializable<_Base_iter> = default;
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/transform.cc b/libstdc++-v3/testsuite/std/ranges/adaptors/transform.cc
index bcb18a3fc6c8..ca695349650a 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/transform.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/transform.cc
@@ -196,6 +196,24 @@ test09()
 #endif
 }
 
+void
+test10()
+{
+  struct F {
+short operator()(int) { return 0; }
+const int& operator()(const int& i) const { return i; }
+  };
+
+  int x[] {2, 4};
+  const auto xform = x | views::transform(F{});
+  using const_iterator = decltype(xform.begin());
+  // LWG 3564. transform_view::iterator::value_type and iterator_category
+  // should use const F&
+  static_assert(std::same_as<std::iter_value_t<const_iterator>, int>);
+  using cat = std::iterator_traits<const_iterator>::iterator_category;
+  static_assert(std::same_as<cat, std::random_access_iterator_tag>);
+}
+
 int
 main()
 {
@@ -208,4 +226,5 @@ main()
   test07();
   test08();
   test09();
+  test10();
 }

[gcc r14-10791] libstdc++: Tweak %c formatting for chrono types

2024-10-16 Thread Jonathan Wakely via Libstdc++-cvs

https://gcc.gnu.org/g:7836113ff4726e96c2ecf34e6954e657c0e84602

commit r14-10791-g7836113ff4726e96c2ecf34e6954e657c0e84602
Author: Jonathan Wakely 
Date:   Fri Sep 27 16:54:31 2024 +0100

libstdc++: Tweak %c formatting for chrono types

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__formatter_chrono::_M_c): Add
[[unlikely]] attribute to condition for missing %c format in
locale. Use %T instead of %H:%M:%S in fallback.

(cherry picked from commit ce89d2f3170e0d6474cee2c5cb9d478426a5b2f6)

Diff:
---
 libstdc++-v3/include/bits/chrono_io.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/chrono_io.h b/libstdc++-v3/include/bits/chrono_io.h
index f8b78daedc65..eaa864b0bdcd 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -799,8 +799,8 @@ namespace __format
  const _CharT* __formats[2];
  __tp._M_date_time_formats(__formats);
  const _CharT* __rep = __formats[__mod];
- if (!*__rep)
-   __rep = _GLIBCXX_WIDEN("%a %b %e %H:%M:%S %Y");
+ if (!*__rep) [[unlikely]]
+   __rep = _GLIBCXX_WIDEN("%a %b %e %T %Y");
  basic_string<_CharT> __fmt(_S_empty_spec);
  __fmt.insert(1u, 1u, _S_colon);
  __fmt.insert(2u, __rep);

[gcc r15-4373] sparc: drop -mlra

2024-10-16 Thread Sam James via Gcc-cvs

https://gcc.gnu.org/g:b388f65abc71c951167175aa502476f1bfaa2a83

commit r15-4373-gb388f65abc71c951167175aa502476f1bfaa2a83
Author: Sam James 
Date:   Mon Oct 14 11:53:52 2024 -0700

sparc: drop -mlra

The sparc port gained LRA support in r7-5076-gf99bd883fb0d05 and has
defaulted to LRA since r7-5642-g70a6dbe7e37e69.

Let's finish the transition by dropping -mlra entirely.

Tested on sparc64-unknown-linux-gnu with no regressions.

gcc/ChangeLog:
PR target/113952
* config/sparc/sparc.cc (sparc_lra_p): Delete.
(TARGET_LRA_P): Ditto.
(sparc_option_override): Don't use MASK_LRA.
* config/sparc/sparc.md (disabled,enabled): Drop lra attribute.
* config/sparc/sparc.opt: Delete -mlra.
* config/sparc/sparc.opt.urls: Ditto.
* doc/invoke.texi (SPARC options): Drop -mlra and -mno-lra.

Diff:
---
 gcc/config/sparc/sparc.cc   | 16 
 gcc/config/sparc/sparc.md   | 15 ---
 gcc/config/sparc/sparc.opt  |  4 
 gcc/config/sparc/sparc.opt.urls |  3 ---
 gcc/doc/invoke.texi | 10 +-
 5 files changed, 5 insertions(+), 43 deletions(-)

diff --git a/gcc/config/sparc/sparc.cc b/gcc/config/sparc/sparc.cc
index 4bc249da825e..353837d73e55 100644
--- a/gcc/config/sparc/sparc.cc
+++ b/gcc/config/sparc/sparc.cc
@@ -697,7 +697,6 @@ static const char *sparc_mangle_type (const_tree);
 static void sparc_trampoline_init (rtx, tree, rtx);
 static machine_mode sparc_preferred_simd_mode (scalar_mode);
 static reg_class_t sparc_preferred_reload_class (rtx x, reg_class_t rclass);
-static bool sparc_lra_p (void);
 static bool sparc_print_operand_punct_valid_p (unsigned char);
 static void sparc_print_operand (FILE *, rtx, int);
 static void sparc_print_operand_address (FILE *, machine_mode, rtx);
@@ -921,9 +920,6 @@ char sparc_hard_reg_printed[8];
 #define TARGET_MANGLE_TYPE sparc_mangle_type
 #endif
 
-#undef TARGET_LRA_P
-#define TARGET_LRA_P sparc_lra_p
-
 #undef TARGET_LEGITIMATE_ADDRESS_P
 #define TARGET_LEGITIMATE_ADDRESS_P sparc_legitimate_address_p
 
@@ -1957,10 +1953,6 @@ sparc_option_override (void)
   if (TARGET_ARCH32)
 target_flags &= ~MASK_STACK_BIAS;
 
-  /* Use LRA instead of reload, unless otherwise instructed.  */
-  if (!(target_flags_explicit & MASK_LRA))
-target_flags |= MASK_LRA;
-
   /* Enable applicable errata workarounds for LEON3FT.  */
   if (sparc_fix_ut699 || sparc_fix_ut700 || sparc_fix_gr712rc)
 {
@@ -13286,14 +13278,6 @@ sparc_preferred_reload_class (rtx x, reg_class_t rclass)
   return rclass;
 }
 
-/* Return true if we use LRA instead of reload pass.  */
-
-static bool
-sparc_lra_p (void)
-{
-  return TARGET_LRA;
-}
-
 /* Output a wide multiply instruction in V8+ mode.  INSN is the instruction,
OPERANDS are its operands and OPCODE is the mnemonic to be used.  */
 
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index 736307926f9a..96c542c6ab6e 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -265,12 +265,8 @@
 (define_attr "cpu_feature" "none,fpu,fpunotv9,v9,vis,vis3,vis4,vis4b"
   (const_string "none"))
 
-(define_attr "lra" "disabled,enabled"
-  (const_string "enabled"))
-
 (define_attr "enabled" ""
-  (cond [(eq_attr "cpu_feature" "none")
-   (cond [(eq_attr "lra" "disabled") (symbol_ref "!TARGET_LRA")] (const_int 1))
+  (cond [(eq_attr "cpu_feature" "none") (const_int 1)
  (eq_attr "cpu_feature" "fpu") (symbol_ref "TARGET_FPU")
  (eq_attr "cpu_feature" "fpunotv9") (symbol_ref "TARGET_FPU && !TARGET_V9")
  (eq_attr "cpu_feature" "v9") (symbol_ref "TARGET_V9")
@@ -1867,8 +1863,7 @@ visl")
(set_attr "subtype" "*,*,regular,*,regular,*,*,*,*,*,*,*,*,*,*,*,*,*,double,double")
(set_attr "length" "*,2,*,*,*,*,2,2,*,*,2,2,*,2,2,2,*,*,*,*")
(set_attr "fptype" "*,*,*,*,*,*,*,*,*,*,*,*,double,*,*,*,*,*,double,double")
-   (set_attr "cpu_feature" "v9,*,*,*,*,*,*,*,fpu,fpu,fpu,fpu,v9,fpunotv9,vis3,vis3,fpu,fpu,vis,vis")
-   (set_attr "lra" "*,*,disabled,disabled,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*")])
+   (set_attr "cpu_feature" "v9,*,*,*,*,*,*,*,fpu,fpu,fpu,fpu,v9,fpunotv9,vis3,vis3,fpu,fpu,vis,vis")])
 
 (define_insn "*movdi_insn_sp64"
   [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r, m, r,*e,?*e,?*e,  m,b,b")
@@ -2496,8 +2491,7 @@ visl")
(set_attr "subtype" "*,*,double,double,*,*,*,*,*,*,regular,*,*,*,*,regular,*")
(set_attr "length" "*,2,*,*,*,2,2,2,*,*,*,*,2,2,2,*,*")
(set_attr "fptype" "*,*,double,double,double,*,*,*,*,*,*,*,*,*,*,*,*")
-   (set_attr "cpu_feature" "v9,*,vis,vis,v9,fpunotv9,vis3,vis3,fpu,fpu,*,*,fpu,fpu,*,*,*")
-   (set_attr "lra" "*,*,*,*,*,*,*,*,*,*,disabled,disabled,*,*,*,*,*")])
+   (set_attr "cpu_feature" "v9,*,vis,vis,v9,fpunotv9,vis3,vis3,fpu,fpu,*,*,fpu,fpu,*,*,*")])
 
 (define_insn "*movdf_insn_sp64"
   [(set (match_operand:DF 0 "nonimmediate_operand" "=b,b,e,*r, e,  e,m,

[gcc r15-4378] RISC-V: Use biggest_mode as mode for constants.

2024-10-16 Thread Robin Dapp via Gcc-cvs

https://gcc.gnu.org/g:cc217a1ecb04c9234b2cce7ba3c27701a050e402

commit r15-4378-gcc217a1ecb04c9234b2cce7ba3c27701a050e402
Author: Robin Dapp 
Date:   Tue Oct 15 12:10:48 2024 +0200

RISC-V: Use biggest_mode as mode for constants.

In compute_nregs_for_mode we expect that the current variable's mode is
at most as large as the biggest mode to be used for vectorization.

This might not be true for constants as they don't actually have a mode.
In that case, just use the biggest mode so max_number_of_live_regs
returns 1.

This fixes several test cases in the test suite.

gcc/ChangeLog:

PR target/116655

* config/riscv/riscv-vector-costs.cc (max_number_of_live_regs):
Use biggest mode instead of constant's saved mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr116655.c: New test.

Diff:
---
 gcc/config/riscv/riscv-vector-costs.cc| 14 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116655.c | 11 +++
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-vector-costs.cc
index 25570bd40040..67b9e3e8f413 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -194,7 +194,7 @@ compute_local_program_points (
   /* Collect the stmts that is vectorized and mark their program point.  */
   for (i = 0; i < nbbs; i++)
{
- int point = 1;
+ unsigned int point = 1;
  basic_block bb = bbs[i];
  vec program_points = vNULL;
  if (dump_enabled_p ())
@@ -489,9 +489,15 @@ max_number_of_live_regs (loop_vec_info loop_vinfo, const basic_block bb,
   pair live_range = (*iter).second;
   for (i = live_range.first + 1; i <= live_range.second; i++)
{
- machine_mode mode = TREE_CODE (TREE_TYPE (var)) == BOOLEAN_TYPE
-   ? BImode
-   : TYPE_MODE (TREE_TYPE (var));
+ machine_mode mode;
+ if (TREE_CODE (TREE_TYPE (var)) == BOOLEAN_TYPE)
+   mode = BImode;
+ /* Constants do not have a mode, just use the biggest so
+compute_nregs will return 1.  */
+ else if (TREE_CODE (var) == INTEGER_CST)
+   mode = biggest_mode;
+ else
+   mode = TYPE_MODE (TREE_TYPE (var));
  unsigned int nregs
= compute_nregs_for_mode (loop_vinfo, mode, biggest_mode, lmul);
  live_vars_vec[i] += nregs;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116655.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116655.c
new file mode 100644
index ..36768e37d005
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116655.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=rv64imv -mabi=lp64d -mrvv-max-lmul=dynamic" } */
+
+short a[5];
+int b() {
+  int c = 0;
+  for (; c <= 4; c++)
+if (a[c])
+  break;
+  return c;
+}

[gcc r15-4375] libcpp, c, middle-end: Optimize initializers using #embed in C

2024-10-16 Thread Jakub Jelinek via Gcc-cvs

https://gcc.gnu.org/g:1844a4aa6615c2252303e70d41bdb18e7c5664c6

commit r15-4375-g1844a4aa6615c2252303e70d41bdb18e7c5664c6
Author: Jakub Jelinek 
Date:   Wed Oct 16 10:09:49 2024 +0200

libcpp, c, middle-end: Optimize initializers using #embed in C

This patch actually optimizes #embed, so far in C.

For a simple testcase (for 494447200 bytes long cc1plus):
cat embed-11.c
unsigned char a[] = {
  #embed "cc1plus"
};
time ./xgcc -B ./ -S -std=c23 -O2 embed-11.c

real    0m13.647s
user    0m7.157s
sys     0m2.597s
time ./xgcc -B ./ -c -std=c23 -O2 embed-11.c

real    0m28.649s
user    0m26.653s
sys     0m1.958s

and when configured against binutils with .base64 support
time ./xgcc -B ./ -S -std=c23 -O2 embed-11.c

real    0m4.283s
user    0m2.288s
sys     0m0.859s
time ./xgcc -B ./ -c -std=c23 -O2 embed-11.c

real    0m6.888s
user    0m5.876s
sys     0m1.002s

(all times with --enable-checking=yes,rtl,extra compiler).

Even just
./cc1plus -E -o embed-11.i embed-11.c
(which doesn't have this optimization yet and so preprocesses it as
1.3GB preprocessed file) needed almost 25GB of compile time RAM (but
preprocessed fine).
Compiling that embed-11.i with -std=c23 -O0 using an unpatched gcc, I gave up
after 400 seconds, by which point it had already eaten 45GB of RAM without
producing a single byte of embed-11.s.

The patch introduces a new CPP_EMBED token which contains raw memory image
virtually representing a sequence of int literals.
To simplify the parsing complexities, the preprocessor guarantees CPP_EMBED
is only emitted if there are 4+ (it actually does that for 64+ right now)
literals in the sequence and emits CPP_NUMBER CPP_COMMA CPP_EMBED CPP_COMMA
CPP_NUMBER tokens (with more CPP_EMBED separated by CPP_COMMA if it is
longer than 2GB, as STRING_CSTs in GCC and also the new RAW_DATA_CST etc.
are limited to INT_MAX elements).  The main reason is that the preprocessor
doesn't really know in which context #embed directive appears, there could
be e.g.
{ 25 *
  #embed "whatever"
* 2 - 15 }
or similar and dealing with this special case deep in the expression parsing
is undesirable.
With the CPP_NUMBERs around it, I believe in the C FE the only places which
need handling of the CPP_EMBED token are initializer parsing (that is the
only one which adds actual optimizations for it), comma expressions (I
believe nothing really cares whether it is 25,13,95 or
25,13,0,1,2,3,4,5,6,7,8,9,10,13,95 etc., so besides the 2 outer CPP_NUMBER
the parsing just adds one INTEGER_CST to the comma expression, I doubt users
want to be spammed with millions of -Wunused warnings per #embed),
whatever uses c_parser_expr_list (function calls, attribute arguments,
OpenMP sizes clause argument, OpenACC tile clause argument) and whatever uses
c_parser_get_builtin_args (mainly for __builtin_shufflevector).  Please
correct me if I'm wrong.
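
As a rough illustration of the token shape described above, the following sketch only counts the tokens the preprocessor would hand to the front end for an #embed of n bytes. It is a simplified model, not libcpp code: TokenCounts and token_counts are hypothetical names, and it assumes the threshold applies to the total number of literals and that each CPP_EMBED holds at most max_chunk bytes.

```cpp
#include <cstdint>

// Counts of each token kind produced for one #embed directive (model only).
struct TokenCounts {
  std::uint64_t numbers;  // stand-alone CPP_NUMBER tokens
  std::uint64_t embeds;   // CPP_EMBED tokens
  std::uint64_t commas;   // CPP_COMMA separators between the tokens above
};

// Assumes threshold >= 4 and max_chunk > 0.  Below the threshold every byte
// is an ordinary number; otherwise the first and last byte stay CPP_NUMBERs
// and the bytes in between are covered by one or more CPP_EMBED chunks.
constexpr TokenCounts token_counts(std::uint64_t n, std::uint64_t threshold,
                                   std::uint64_t max_chunk) {
  if (n == 0)
    return {0, 0, 0};
  if (n < threshold)
    return {n, 0, n - 1};
  const std::uint64_t interior = n - 2;
  const std::uint64_t embeds = (interior + max_chunk - 1) / max_chunk;
  return {2, embeds, embeds + 1};
}

// A short sequence stays as plain numbers.
static_assert(token_counts(3, 64, 2147483647).embeds == 0);
// A longer one becomes NUMBER , EMBED , NUMBER.
static_assert(token_counts(100, 64, 2147483647).embeds == 1);
static_assert(token_counts(100, 64, 2147483647).commas == 2);
```

Keeping the outer two values as ordinary numbers is what lets an expression such as 25 * (first byte) or (last byte) * 2 - 15 parse without the expression parser ever seeing the raw-data token.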

The patch introduces a RAW_DATA_CST tree code, which can then be used inside
of array CONSTRUCTOR elt values.  In some sense RAW_DATA_CST is similar to
STRING_CST, but right now STRING_CST is used only if the whole array
initializer is that constant, while RAW_DATA_CST at index idx (should be
always INTEGER_CST index, another advantage of the CPP_NUMBER around is that
[30 ... 250] =
  #embed "whatever"
really does what it would do with an integer sequence there) stands for
[idx] = RAW_DATA_POINTER (val)[0],
[idx+1] = RAW_DATA_POINTER (val)[1],
...
[idx+RAW_DATA_LENGTH (val)-1] = RAW_DATA_POINTER (val)[RAW_DATA_LENGTH (val)-1].
Another important thing is that unlike STRING_CST which has the data
embedded in it RAW_DATA_CST doesn't own the data, it has RAW_DATA_OWNER
which owns the data (that can be a STRING_CST, e.g. used for PCH or LTO
after reading LTO in) or another RAW_DATA_CST (with NULL RAW_DATA_OWNER,
standing for data owned by libcpp buffers).  The advantage is that it can be
cheaply peeled off, or split into multiple smaller pieces, e.g. if one uses
designated initializer to store something into the middle of a 10GB #embed
array, in no case we need to actually copy data around for that.
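
The index mapping and the cheap splitting can be sketched with a plain non-owning view. This is only a model under stated assumptions: RawView, element_at and split_around are hypothetical names standing in for RAW_DATA_CST, its RAW_DATA_POINTER/RAW_DATA_LENGTH accessors and the splitting done by the compiler, and the buffer kBytes plays the role of the data owner.

```cpp
#include <cstddef>

// Non-owning view of bytes: a pointer into someone else's buffer plus a length.
struct RawView {
  const unsigned char* ptr;
  std::size_t len;
};

// Value of array element i when the view is placed at constructor index idx,
// i.e. [idx + k] = ptr[k] for 0 <= k < len.
constexpr unsigned char element_at(RawView v, std::size_t idx, std::size_t i) {
  return v.ptr[i - idx];
}

struct SplitViews {
  RawView before;
  RawView after;
};

// Carve out the byte at offset off (for example because a designated
// initializer overrides it).  Both halves still point into the same buffer,
// so no data is copied.
constexpr SplitViews split_around(RawView v, std::size_t off) {
  return SplitViews{RawView{v.ptr, off},
                    RawView{v.ptr + off + 1, v.len - off - 1}};
}

// Small demonstration buffer acting as the owner of the data.
constexpr unsigned char kBytes[] = {10, 20, 30, 40, 50};
constexpr RawView kView{kBytes, 5};

static_assert(element_at(kView, 30, 32) == 30);
static_assert(split_around(kView, 2).before.len == 2);
static_assert(split_around(kView, 2).after.ptr == kBytes + 3);
```

Because a split only adjusts a pointer and a length, its cost does not depend on the size of the embedded data.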
Right now RAW_DATA_CST is only used in initializers of integral arrays where
the integer type has (host) CHAR_BIT precision, so usually char/signed
char/unsigned char (for C++ later maybe std::byte); in theory we could, say,
allocate a buffer 4 times as big for conversions to an int array, depending
on endianness, storage order reversal etc., but I'm not sure if that is
something that will be actually needed in the wild.
And an optimization inside of c-common.cc attempts to undo that CPP_NUMBER
CPP_EMBED CPP_NUMBER division in case one uses #embe

[gcc r15-4376] gimplify: Small RAW_DATA_CST gimplification fix

2024-10-16 Thread Jakub Jelinek via Gcc-cvs

https://gcc.gnu.org/g:60ad1e40649244aa219b411c1a1ef5e00ec6a87b

commit r15-4376-g60ad1e40649244aa219b411c1a1ef5e00ec6a87b
Author: Jakub Jelinek 
Date:   Wed Oct 16 10:20:00 2024 +0200

gimplify: Small RAW_DATA_CST gimplification fix

I've noticed the following testcase hangs during gimplification.

While it is gimplifying an assignment from a VAR_DECL .LCNNN to MEM_REF,
because the VAR_DECL is TREE_READONLY, it will happily pick its initializer
and try to gimplify that, which means recursing to the exact same code.

The following patch fixes that by just gimplifying the lhs and building
assignment, because the code decided that it should use copying from
a static var.

2024-10-16  Jakub Jelinek  

* gimplify.cc (gimplify_init_ctor_eval): For larger RAW_DATA_CST,
just gimplify cref as lvalue and add gimple assignment of rctor
to cref instead of going through gimplification of INIT_EXPR, as
the latter can suffer from infinite recursion.

* c-c++-common/cpp/embed-24.c: New test.

Diff:
---
 gcc/gimplify.cc   |  7 +++--
 gcc/testsuite/c-c++-common/cpp/embed-24.c | 52 +++
 2 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 9284fffe137f..769b880cce58 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -5419,9 +5419,10 @@ gimplify_init_ctor_eval (tree object, vec<constructor_elt, va_gc> *elts,
  cref = build2 (MEM_REF, rtype, addr,
 build_int_cst (ptr_type_node, 0));
  rctor = tree_output_constant_def (rctor);
- tree init = build2 (INIT_EXPR, rtype, cref, rctor);
- gimplify_and_add (init, pre_p);
- ggc_free (init);
+ if (gimplify_expr (&cref, pre_p, NULL, is_gimple_lvalue,
+fb_lvalue) != GS_ERROR)
+   gimplify_seq_add_stmt (pre_p,
+  gimple_build_assign (cref, rctor));
}
}
   else
diff --git a/gcc/testsuite/c-c++-common/cpp/embed-24.c b/gcc/testsuite/c-c++-common/cpp/embed-24.c
new file mode 100644
index ..d59a7e55cec5
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cpp/embed-24.c
@@ -0,0 +1,52 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+/* { dg-additional-options "-std=c23" { target c } } */
+
+static unsigned char a[] = {
+#embed __FILE__ limit (125)
+};
+
+void
+foo (unsigned char *p)
+{
+  for (int i = 0; i < 128; ++i)
+if (p[i] != ((i < 64 || i == 127) ? (unsigned char) -1 : i - 64 + 33))
+  __builtin_abort ();
+  if (__builtin_memcmp (p + 128, a, 125))
+__builtin_abort ();
+  for (int i = 253; i < 256; ++i)
+if (p[i] != (unsigned char) -1)
+  __builtin_abort ();
+}
+
+#ifdef __cplusplus
+#define M1 (unsigned char) -1
+#else
+#define M1 -1
+#endif
+
+int
+main ()
+{
+  unsigned char res[256] = {
+M1, M1, M1, M1, M1, M1, M1, M1, 
+M1, M1, M1, M1, M1, M1, M1, M1, 
+M1, M1, M1, M1, M1, M1, M1, M1, 
+M1, M1, M1, M1, M1, M1, M1, M1, 
+M1, M1, M1, M1, M1, M1, M1, M1, 
+M1, M1, M1, M1, M1, M1, M1, M1, 
+M1, M1, M1, M1, M1, M1, M1, M1, 
+M1, M1, M1, M1, M1, M1, M1, M1, 
+33, 34, 35, 36, 37, 38, 39, 40, 
+41, 42, 43, 44, 45, 46, 47, 48, 
+49, 50, 51, 52, 53, 54, 55, 56, 
+57, 58, 59, 60, 61, 62, 63, 64, 
+65, 66, 67, 68, 69, 70, 71, 72, 
+73, 74, 75, 76, 77, 78, 79, 80, 
+81, 82, 83, 84, 85, 86, 87, 88, 
+89, 90, 91, 92, 93, 94, 95, M1,
+  #embed __FILE__ limit (125) suffix (,)
+M1, M1, M1
+  };
+  foo (res);
+}

[gcc r15-4377] c: Speed up compilation of large char array initializers when not using #embed

2024-10-16 Thread Jakub Jelinek via Gcc-cvs

https://gcc.gnu.org/g:f9bac238840155e1539aa68daf1507ea63c9ed80

commit r15-4377-gf9bac238840155e1539aa68daf1507ea63c9ed80
Author: Jakub Jelinek 
Date:   Wed Oct 16 10:22:44 2024 +0200

c: Speed up compilation of large char array initializers when not using #embed

The following patch attempts to speed up compilation of large char array
initializers when one doesn't use #embed in the source.

My testcase has been
unsigned char a[] = {
 #embed "cc1gm2" limit (1)
};
and corresponding variant which has the middle line replaced with
dd if=cc1gm bs=1 count=1 | xxd -i
With embed 95.3MiB is really fast:
time ./cc1 -quiet -O2 -o test4a.s test4a.c

real    0m0.700s
user    0m0.576s
sys     0m0.123s
Without embed and without this patch it needs around 11GB of RAM and
time ./cc1 -quiet -O2 -o test4b.s test4b.c

real    2m47.230s
user    2m41.548s
sys     0m4.328s
Without embed and with this patch it needs around 3.5GB of RAM and
time ./cc1 -quiet -O2 -o test4b.s2 test4b.c

real    0m25.004s
user    0m23.655s
sys     0m1.308s
Not perfect (one still needs to parse all the numbers, and libcpp also creates
strings pointed to by CPP_NUMBER tokens, which can take up to 4 bytes per
byte), but still an almost 7x speed improvement and about 3x less compile time
memory.
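
The core idea can be sketched as follows: measure the leading run of initializer values that fit in an unsigned char and pack them only when the run is long enough. The threshold of 64 mirrors the checks in the c_parser_initval hunk of this patch; byte_run_length, worth_packing and kMinRun are hypothetical names for illustration, and the real code inspects peeked tokens rather than an int array.

```cpp
#include <cstddef>

// Length of the leading run of values representable as unsigned char.
constexpr std::size_t byte_run_length(const int* vals, std::size_t n) {
  std::size_t i = 0;
  while (i < n && vals[i] >= 0 && vals[i] <= 255)
    ++i;
  return i;
}

// Assumed threshold: shorter runs are left as individual constants.
constexpr std::size_t kMinRun = 64;

constexpr bool worth_packing(std::size_t run) { return run >= kMinRun; }

// Tiny demonstration input: the third value does not fit in a byte.
constexpr int kSample[] = {1, 2, 300, 4};
static_assert(byte_run_length(kSample, 4) == 2);
static_assert(!worth_packing(byte_run_length(kSample, 4)));
```

A run that is too short is simply skipped over, which corresponds to the vals_to_ignore counter in the patch: the parser does not re-check those values one by one.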

One drawback of the patch is that for the larger initializers the precise
locations for -Wconversion warnings are gone when initializing signed char
(or char when it is signed) arrays.

If that is important, perhaps c_maybe_optimize_large_byte_initializer could
tell the caller this is the case and c_parser_initval could emit the
warnings directly when it still knows the location_t and suppress warnings
on the RAW_DATA_CST.

2024-10-16  Jakub Jelinek  

* c-tree.h (c_maybe_optimize_large_byte_initializer): Declare.
* c-parser.cc (c_parser_initval): Attempt to optimize large char array
initializers into RAW_DATA_CST.
* c-typeck.cc (c_maybe_optimize_large_byte_initializer): New function.

* c-c++-common/init-1.c: New test.
* c-c++-common/init-2.c: New test.
* c-c++-common/init-3.c: New test.

Diff:
---
 gcc/c/c-parser.cc   | 118 +++
 gcc/c/c-tree.h  |   1 +
 gcc/c/c-typeck.cc   |  36 ++
 gcc/testsuite/c-c++-common/init-1.c | 218 
 gcc/testsuite/c-c++-common/init-2.c | 218 
 gcc/testsuite/c-c++-common/init-3.c | 218 
 6 files changed, 809 insertions(+)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 9eaa91413b6c..120f2b289c0b 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -6507,7 +6507,125 @@ c_parser_initval (c_parser *parser, struct c_expr *after,
(init.value
init = convert_lvalue_to_rvalue (loc, init, true, true, true);
 }
+  tree val = init.value;
   process_init_element (loc, init, false, braced_init_obstack);
+
+  /* Attempt to optimize large char array initializers into RAW_DATA_CST
+ to save compile time and memory even when not using #embed.  */
+  static unsigned vals_to_ignore;
+  if (vals_to_ignore)
+/* If earlier call determined there is certain number of CPP_COMMA
+   CPP_NUMBER tokens with 0-255 int values, but not enough for
+   RAW_DATA_CST to be beneficial, don't try to check it again until
+   they are all parsed.  */
+--vals_to_ignore;
+  else if (val
+  && TREE_CODE (val) == INTEGER_CST
+  && TREE_TYPE (val) == integer_type_node
+  && c_parser_next_token_is (parser, CPP_COMMA))
+if (unsigned int len = c_maybe_optimize_large_byte_initializer ())
+  {
+   char buf1[64];
+   unsigned int i;
+   gcc_checking_assert (len >= 64);
+   location_t last_loc = UNKNOWN_LOCATION;
+   for (i = 0; i < 64; ++i)
+ {
+   c_token *tok = c_parser_peek_nth_token_raw (parser, 1 + 2 * i);
+   if (tok->type != CPP_COMMA)
+ break;
+   tok = c_parser_peek_nth_token_raw (parser, 2 + 2 * i);
+   if (tok->type != CPP_NUMBER
+   || TREE_CODE (tok->value) != INTEGER_CST
+   || TREE_TYPE (tok->value) != integer_type_node
+   || wi::neg_p (wi::to_wide (tok->value))
+   || wi::to_widest (tok->value) > UCHAR_MAX)
+ break;
+   buf1[i] = (char) tree_to_uhwi (tok->value);
+   if (i == 0)
+ loc = tok->location;
+   last_loc = tok->location;
+ }
+   if (i < 64)
+ {
+   vals_to_ignore = i;
+   return;
+ }
+   c_token *tok = c_parser_peek_nth_token_raw (parser, 1 + 2 * i);
+   /* If 64 CPP_C

[gcc(refs/users/meissner/heads/work181-sha)] Initial support for adding xxeval fusion support.

2024-10-16 Thread Michael Meissner via Gcc-cvs

https://gcc.gnu.org/g:40636facc4f89efc136859f9e72338cb7dba6ed4

commit 40636facc4f89efc136859f9e72338cb7dba6ed4
Author: Michael Meissner 
Date:   Wed Oct 16 03:27:19 2024 -0400

Initial support for adding xxeval fusion support.

2024-10-16  Michael Meissner  

gcc/

* config/rs6000/fusion.md (fuse_vandc_xor_noxxeval): Rename from
fuse_vandc_xor, and restrict the case to non-xxeval support.
(fuse_vandc_vxor_xxeval): New insn.
(fuse_vxor_vxor_noxxeval): Rename from fuse_vxor_xor, and restrict the
case to non-xxeval support.
(fuse_vxor_vxor_xxeval): New insn.
* config/rs6000/rs6000.cc (rs6000_opt_vars): Add -mxxeval.
* config/rs6000/rs6000.opt (-mxxeval): New switch.

Diff:
---
 gcc/config/rs6000/fusion.md  | 44 
 gcc/config/rs6000/rs6000.cc  |  3 +++
 gcc/config/rs6000/rs6000.opt |  4 
 3 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index 4ed9ae1d69f4..5332c84681fa 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -2896,13 +2896,13 @@
 
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vandc -> vxor
-(define_insn "*fuse_vandc_vxor"
+(define_insn "*fuse_vandc_vxor_noxxeval"
   [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
 (xor:VM (and:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v"))
   (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))
  (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
(clobber (match_scratch:VM 4 "=X,X,X,&v"))]
-  "(TARGET_P10_FUSION)"
+  "(TARGET_P10_FUSION && (!TARGET_XXEVAL || !TARGET_PREFIXED))"
   "@
vandc %3,%1,%0\;vxor %3,%3,%2
vandc %3,%1,%0\;vxor %3,%3,%2
@@ -2912,6 +2912,24 @@
(set_attr "cost" "6")
(set_attr "length" "8")])
 
+(define_insn "*fuse_vandc_vxor_xxeval"
+  [(set (match_operand:VM 3 "vsx_register_operand" "=&0,&1,&v,v,wa")
+(xor:VM (and:VM (not:VM (match_operand:VM 0 "vsx_register_operand" "v,v,v,v,wa"))
+  (match_operand:VM 1 "vsx_register_operand" "v,v,v,v,wa"))
+ (match_operand:VM 2 "vsx_register_operand" "v,v,v,v,wa")))
+   (clobber (match_scratch:VM 4 "=X,X,X,&v,X"))]
+  "(TARGET_P10_FUSION && TARGET_XXEVAL && TARGET_PREFIXED)"
+  "@
+   vandc %3,%1,%0\;vxor %3,%3,%2
+   vandc %3,%1,%0\;vxor %3,%3,%2
+   vandc %3,%1,%0\;vxor %3,%3,%2
+   vandc %4,%1,%0\;vxor %3,%4,%2
+   xxeval %w3,%w0,%w1,%w2,45\t\t# fuse xxlxor (%w0, xxlandc (%x1, %x2))"
+  [(set_attr "type" "fused_vector")
+   (set_attr "cost" "6")
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,*,yes")])
+
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector veqv -> vxor
 (define_insn "*fuse_veqv_vxor"
@@ -3004,13 +3022,13 @@
 
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vxor -> vxor
-(define_insn "*fuse_vxor_vxor"
+(define_insn "*fuse_vxor_vxor_noxxeval"
   [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
 (xor:VM (xor:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")
   (match_operand:VM 1 "altivec_register_operand" "%v,v,v,v"))
  (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
(clobber (match_scratch:VM 4 "=X,X,X,&v"))]
-  "(TARGET_P10_FUSION)"
+  "(TARGET_P10_FUSION && (!TARGET_XXEVAL || !TARGET_PREFIXED))"
   "@
vxor %3,%1,%0\;vxor %3,%3,%2
vxor %3,%1,%0\;vxor %3,%3,%2
@@ -3020,6 +3038,24 @@
(set_attr "cost" "6")
(set_attr "length" "8")])
 
+(define_insn "*fuse_vxor_vxor_xxeval"
+  [(set (match_operand:VM 3 "vsx_register_operand" "=&0,&1,&v,v,wa")
+(xor:VM (xor:VM (match_operand:VM 0 "vsx_register_operand" "v,v,v,v,wa")
+  (match_operand:VM 1 "vsx_register_operand" "%v,v,v,v,wa"))
+ (match_operand:VM 2 "vsx_register_operand" "v,v,v,v,wa")))
+   (clobber (match_scratch:VM 4 "=X,X,X,&v,X"))]
+  "(TARGET_P10_FUSION && TARGET_XXEVAL && TARGET_PREFIXED)"
+  "@
+   vxor %3,%1,%0\;vxor %3,%3,%2
+   vxor %3,%1,%0\;vxor %3,%3,%2
+   vxor %3,%1,%0\;vxor %3,%3,%2
+   vxor %4,%1,%0\;vxor %3,%4,%2
+   xxeval %x3,%x0,%x1,%x2,105\t\t# fuse xxlxor (%x0, xxlxor (%x1,%x2))"
+  [(set_attr "type" "fused_vector")
+   (set_attr "cost" "6")
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,*,yes")])
+
 ;; add-add fusion pattern generated by gen_addadd
 (define_insn "*fuse_add_add"
   [(set (match_operand:GPR 3 "gpc_reg_operand" "=&0,&1,&r,r")
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index aa67e7256bb9..072556b7fd7a 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -24668,6 +24668,9 @@ static struct rs6000_opt_var const rs6000_opt_vars[] =
   { "speculate-ind

[gcc(refs/users/meissner/heads/work181-sha)] Update ChangeLog.*

2024-10-16 Thread Michael Meissner via Gcc-cvs

https://gcc.gnu.org/g:b2acd43c277050150361693fc2d40bf7aa206c21

commit b2acd43c277050150361693fc2d40bf7aa206c21
Author: Michael Meissner 
Date:   Wed Oct 16 03:28:49 2024 -0400

Update ChangeLog.*

Diff:
---
 gcc/ChangeLog.sha | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/gcc/ChangeLog.sha b/gcc/ChangeLog.sha
index 6afb87fdaaae..0727f25e522f 100644
--- a/gcc/ChangeLog.sha
+++ b/gcc/ChangeLog.sha
@@ -1,5 +1,31 @@
+ Branch work181-sha, patch #400 
+
+Initial support for adding xxeval fusion support.
+
+2024-10-16  Michael Meissner  
+
+gcc/
+
+   * config/rs6000/fusion.md (fuse_vandc_xor_noxxeval): Rename from
+   fuse_vandc_xor, and restrict the case to non-xxeval support.
+   (fuse_vandc_vxor_xxeval): New insn.
+   (fuse_vxor_vxor_noxxeval): Rename from fuse_vxor_xor, and restrict the
+   case to non-xxeval support.
+   (fuse_vxor_vxor_xxeval): New insn.
+   * config/rs6000/rs6000.cc (rs6000_opt_vars): Add -mxxeval.
+   * config/rs6000/rs6000.opt (-mxxeval): New switch.
+
  Branch work181-sha, baseline 
 
+Add ChangeLog.sha and update REVISION.
+
+2024-10-14  Michael Meissner  
+
+gcc/
+
+   * ChangeLog.sha: New file for branch.
+   * REVISION: Update.
+
 2024-10-14   Michael Meissner  
 
Clone branch

[gcc r15-4374] vax: fixup vax.opt.urls

2024-10-16 Thread Sam James via Gcc-cvs

https://gcc.gnu.org/g:a9e14d2ddf200f612b5991dbf3a3218c08e31d08

commit r15-4374-ga9e14d2ddf200f612b5991dbf3a3218c08e31d08
Author: Sam James 
Date:   Wed Oct 16 09:16:55 2024 +0100

vax: fixup vax.opt.urls

Needed after r15-4373-gb388f65abc71c9.

gcc/ChangeLog:

* config/vax/vax.opt.urls: Adjust index for -mlra.

Diff:
---
 gcc/config/vax/vax.opt.urls | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/vax/vax.opt.urls b/gcc/config/vax/vax.opt.urls
index 10bee25d8336..7813b886baa2 100644
--- a/gcc/config/vax/vax.opt.urls
+++ b/gcc/config/vax/vax.opt.urls
@@ -19,5 +19,5 @@ munix
 UrlSuffix(gcc/VAX-Options.html#index-munix)
 
 mlra
-UrlSuffix(gcc/VAX-Options.html#index-mlra-4)
+UrlSuffix(gcc/VAX-Options.html#index-mlra-3)

[gcc r15-4379] libstdc++: Fix Python deprecation warning in printers.py

2024-10-16 Thread Jonathan Wakely via Libstdc++-cvs

https://gcc.gnu.org/g:b9e98bb9919fa9f07782f23f79b3d35abb9ff542

commit r15-4379-gb9e98bb9919fa9f07782f23f79b3d35abb9ff542
Author: Jonathan Wakely 
Date:   Wed Oct 16 09:22:37 2024 +0100

libstdc++: Fix Python deprecation warning in printers.py

python/libstdcxx/v6/printers.py:1355: DeprecationWarning: 'count' is passed as positional argument

The Python docs say:

  Deprecated since version 3.13: Passing count and flags as positional
  arguments is deprecated. In future Python versions they will be
  keyword-only parameters.

Using a keyword argument for count only became possible with Python 3.1
so introduce a new function to do the substitution.

libstdc++-v3/ChangeLog:

* python/libstdcxx/v6/printers.py (strip_fundts_namespace): New.
(StdExpAnyPrinter, StdExpOptionalPrinter): Use it.

Diff:
---
 libstdc++-v3/python/libstdcxx/v6/printers.py | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 921049378627..d05b79762fdd 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -220,6 +220,16 @@ def strip_versioned_namespace(typename):
 return typename.replace(_versioned_namespace, '')
 
 
+def strip_fundts_namespace(typ):
+"""Remove "fundamentals_vN" inline namespace from qualified type name."""
+pattern = r'^std::experimental::fundamentals_v\d::'
+repl = 'std::experimental::'
+if sys.version_info[0] == 2:
+return re.sub(pattern, repl, typ, 1)
+else: # Technically this needs Python 3.1 but nobody should be using 3.0
+return re.sub(pattern, repl, typ, count=1)
+
+
 def strip_inline_namespaces(type_str):
 """Remove known inline namespaces from the canonical name of a type."""
 type_str = strip_versioned_namespace(type_str)
@@ -1355,8 +1365,7 @@ class StdExpAnyPrinter(SingleObjContainerPrinter):
 
 def __init__(self, typename, val):
 self._typename = strip_versioned_namespace(typename)
-self._typename = re.sub(r'^std::experimental::fundamentals_v\d::',
-'std::experimental::', self._typename, 1)
+self._typename = strip_fundts_namespace(self._typename)
 self._val = val
 self._contained_type = None
 contained_value = None
@@ -1449,10 +1458,8 @@ class StdExpOptionalPrinter(SingleObjContainerPrinter):
 """Print a std::optional or std::experimental::optional."""
 
 def __init__(self, typename, val):
-typename = strip_versioned_namespace(typename)
-self._typename = re.sub(
-r'^std::(experimental::|)(fundamentals_v\d::|)(.*)',
-r'std::\1\3', typename, 1)
+self._typename = strip_versioned_namespace(typename)
+self._typename = strip_fundts_namespace(self._typename)
 payload = val['_M_payload']
 if self._typename.startswith('std::experimental'):
 engaged = val['_M_engaged']

[gcc r15-4389] c: Fix up uninitialized next.original_type use in #embed optimization

2024-10-16 Thread Jakub Jelinek via Gcc-cvs

https://gcc.gnu.org/g:f5224caf53a4f17b190497c00c505977d358bef9

commit r15-4389-gf5224caf53a4f17b190497c00c505977d358bef9
Author: Jakub Jelinek 
Date:   Wed Oct 16 17:45:19 2024 +0200

c: Fix up uninitialized next.original_type use in #embed optimization

Jonathan pointed me at a diagnostic from an unnamed static analyzer
which found that next.original_type isn't initialized for the CPP_EMBED
case when it is parsed in a comma expression, yet
  expr.original_type = next.original_type;
is done a few lines later and the expr is returned.
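
A minimal sketch of the pattern described above, not the parser code itself: every name here (expr_sketch, type_tag, make_embed_byte_expr, finish_expr) is hypothetical. It only illustrates that a field copied later has to be set on every path that builds the record.

```c
#include <assert.h>

/* Hypothetical stand-in for the type slot of a parsed expression.  */
enum type_tag { TYPE_UNSET = 0, TYPE_INT };

/* Hypothetical stand-in for the parser's expression record.  */
struct expr_sketch
{
  int value;
  enum type_tag original_type;
};

/* Build the record for one byte of embedded data, setting every
   field that the caller copies afterwards.  */
static struct expr_sketch
make_embed_byte_expr (unsigned char byte)
{
  struct expr_sketch next;
  next.value = byte;
  next.original_type = TYPE_INT; /* The kind of assignment the patch adds.  */
  return next;
}

/* Mirrors the later copy of original_type into the returned expr.  */
static struct expr_sketch
finish_expr (struct expr_sketch next)
{
  struct expr_sketch expr;
  expr.value = next.value;
  expr.original_type = next.original_type;
  assert (expr.original_type != TYPE_UNSET);
  return expr;
}
```

Without the assignment in make_embed_byte_expr, the copy in finish_expr would read an indeterminate value, which is the kind of use the analyzer flagged.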

2024-10-16  Jakub Jelinek  

* c-parser.cc (c_parser_expression): Initialize next.original_type
to integer_type_node for the CPP_EMBED case.

Diff:
---
 gcc/c/c-parser.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 120f2b289c0b..e4381044e5cb 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -13299,6 +13299,7 @@ c_parser_expression (c_parser *parser)
  next.value = build_int_cst (TREE_TYPE (val),
  ((const unsigned char *)
   RAW_DATA_POINTER (val))[last]);
+ next.original_type = integer_type_node;
  c_parser_consume_token (parser);
}
   else

[gcc r15-4390] c: Add some checking asserts to named loops handling code

2024-10-16 Thread Jakub Jelinek via Gcc-cvs

https://gcc.gnu.org/g:6756250fcbed4a214c30de94e4ec68ea130528d5

commit r15-4390-g6756250fcbed4a214c30de94e4ec68ea130528d5
Author: Jakub Jelinek 
Date:   Wed Oct 16 17:46:06 2024 +0200

c: Add some checking asserts to named loops handling code

Jonathan mentioned an unnamed static analyzer reported issue in
c_finish_bc_name.
It is actually a false positive, because the construction of the
loop_names vector guarantees that the last element of the vector
(if the vector is non-empty) always has either
C_DECL_LOOP_NAME (l) or C_DECL_SWITCH_NAME (l) (or both) flags
set, so c will be always non-NULL after the if at the start of the
loops.
The following patch is an attempt to help those static analyzers
(though dunno if it actually helps), by adding a checking assert.
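
A rough sketch of the invariant, assuming the loops walk the vector from its last element towards the first; label_sketch and owner_of are hypothetical names and the flag field stands in for the C_DECL_LOOP_NAME / C_DECL_SWITCH_NAME tests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of the loop_names vector.  */
struct label_sketch
{
  const char *name;
  int is_loop_or_switch_name;
};

/* Walk from the last entry towards the first and return the most
   recently seen flagged entry once TARGET is reached.  */
static const struct label_sketch *
owner_of (const struct label_sketch *labels, size_t n, size_t target)
{
  const struct label_sketch *c = NULL;
  for (size_t i = n; i-- > 0;)
    {
      if (labels[i].is_loop_or_switch_name)
	c = &labels[i];
      /* Holds because the last entry is always flagged.  */
      assert (c != NULL);
      if (i == target)
	return c;
    }
  return NULL;
}
```

The assert documents what the construction of the vector already guarantees, so it should never fire; its only job is to make the invariant visible to tools and readers.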

2024-10-16  Jakub Jelinek  

* c-decl.cc (c_get_loop_names): Add checking assert that
c is non-NULL in the loop.
(c_finish_bc_name): Likewise.

Diff:
---
 gcc/c/c-decl.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 888966cb7109..1827bbf06465 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -13881,6 +13881,7 @@ c_get_loop_names (tree before_labels, bool switch_p, tree *last_p)
{
  if (C_DECL_LOOP_NAME (l) || C_DECL_SWITCH_NAME (l))
c = l;
+ gcc_checking_assert (c);
  loop_names_hash->put (l, c);
  if (i == first)
break;
@@ -13952,6 +13953,7 @@ c_finish_bc_name (location_t loc, tree name, bool is_break)
  {
if (C_DECL_LOOP_NAME (l) || C_DECL_SWITCH_NAME (l))
  c = l;
+   gcc_checking_assert (c);
if (l == lab)
  {
label = c;
@@ -13970,6 +13972,7 @@ c_finish_bc_name (location_t loc, tree name, bool is_break)
{
  if (C_DECL_LOOP_NAME (l) || C_DECL_SWITCH_NAME (l))
c = l;
+ gcc_checking_assert (c);
  if (is_break || C_DECL_LOOP_NAME (c))
candidates.safe_push (IDENTIFIER_POINTER (DECL_NAME (l)));
}

[gcc r15-4387] PR116510: Add missing fold_converts into tree switch if conversion

2024-10-16 Thread Andi Kleen via Gcc-cvs

https://gcc.gnu.org/g:d5a05db80fa95dcae1ebc177f7790e1d34fa73ed

commit r15-4387-gd5a05db80fa95dcae1ebc177f7790e1d34fa73ed
Author: Andi Kleen 
Date:   Tue Oct 15 13:16:02 2024 -0700

PR116510: Add missing fold_converts into tree switch if conversion

Passes test suite. Ok to commit?

gcc/ChangeLog:

PR middle-end/116510
* tree-if-conv.cc (predicate_bbs): Add missing fold_converts.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-switch-ifcvt-3.c: New test.

Diff:
---
 gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-3.c | 12 
 gcc/tree-if-conv.cc |  9 ++---
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-3.c b/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-3.c
new file mode 100644
index ..41bc8a1cf129
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-3.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+// PR116510
+
+char excmap_def_0;
+int gg_strescape_i;
+void gg_strescape() {
+  for (; gg_strescape_i; gg_strescape_i++)
+switch ((unsigned char)gg_strescape_i)
+case '\\':
+case '"':
+  excmap_def_0 = 0;
+}
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 90c754a48147..376a4642954d 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1477,10 +1477,12 @@ predicate_bbs (loop_p loop)
{
  tree low = build2_loc (loc, GE_EXPR,
 boolean_type_node,
-index, CASE_LOW (label));
+index, fold_convert_loc (loc, TREE_TYPE (index),
+CASE_LOW (label)));
  tree high = build2_loc (loc, LE_EXPR,
  boolean_type_node,
- index, CASE_HIGH (label));
+ index, fold_convert_loc (loc, TREE_TYPE (index),
+ CASE_HIGH (label)));
  case_cond = build2_loc (loc, TRUTH_AND_EXPR,
  boolean_type_node,
  low, high);
@@ -1489,7 +1491,8 @@ predicate_bbs (loop_p loop)
case_cond = build2_loc (loc, EQ_EXPR,
boolean_type_node,
index,
-   CASE_LOW (gimple_switch_label (sw, i)));
+   fold_convert_loc (loc, TREE_TYPE (index),
+ CASE_LOW (label)));
  if (i > 1)
switch_cond = build2_loc (loc, TRUTH_OR_EXPR,
  boolean_type_node,

[gcc r15-4388] Add libgomp.oacc-fortran/acc_on_device-1-4.f

2024-10-16 Thread Tobias Burnus via Gcc-cvs

https://gcc.gnu.org/g:ee4fdda70f1080bba5e49cadebc44333e19edeb4

commit r15-4388-gee4fdda70f1080bba5e49cadebc44333e19edeb4
Author: Tobias Burnus 
Date:   Wed Oct 16 16:15:40 2024 +0200

Add libgomp.oacc-fortran/acc_on_device-1-4.f

Kind of undoes r15-4315-g9f549d216c9716 by adding the original testcase
back: acc_on_device-1-3.f is added as acc_on_device-1-4.f with
-fno-builtin-acc_on_device removed.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-fortran/acc_on_device-1-4.f: New test;
same as acc_on_device-1-3.f but using the builtin function.

Diff:
---
 .../libgomp.oacc-fortran/acc_on_device-1-4.f   | 60 ++
 1 file changed, 60 insertions(+)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-4.f b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-4.f
new file mode 100644
index ..401d3a372b3b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-4.f
@@ -0,0 +1,60 @@
+! { dg-do run }
+! { dg-additional-options "-cpp" }
+
+! As acc_on_device-1-3.f, but using the acc_on_device builtin.
+
+! { dg-additional-options "-fopt-info-all-omp" }
+! { dg-additional-options "--param=openacc-privatization=noisy" }
+! { dg-additional-options "-foffload=-fopt-info-all-omp" }
+! { dg-additional-options "-foffload=--param=openacc-privatization=noisy" }
+! for testing/documenting aspects of that functionality.
+
+  IMPLICIT NONE
+  INCLUDE "openacc_lib.h"
+
+!Host.
+
+  IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_NONE)) STOP 1
+  IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_HOST)) STOP 2
+  IF (ACC_ON_DEVICE (ACC_DEVICE_NOT_HOST)) STOP 3
+  IF (ACC_ON_DEVICE (ACC_DEVICE_NVIDIA)) STOP 4
+  IF (ACC_ON_DEVICE (ACC_DEVICE_RADEON)) STOP 4
+
+
+!Host via offloading fallback mode.
+
+!$ACC PARALLEL IF(.FALSE.)
+! { dg-note {variable 'C\.[0-9]+' declared in block potentially has improper OpenACC privatization level: 'const_decl'} "TODO" { target *-*-* } .-1 }
+!TODO Unhandled 'CONST_DECL' instances for constant arguments in 'acc_on_device' calls.
+  IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_NONE)) STOP 5
+  IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_HOST)) STOP 6
+  IF (ACC_ON_DEVICE (ACC_DEVICE_NOT_HOST)) STOP 7
+  IF (ACC_ON_DEVICE (ACC_DEVICE_NVIDIA)) STOP 8
+  IF (ACC_ON_DEVICE (ACC_DEVICE_RADEON)) STOP 8
+!$ACC END PARALLEL
+
+
+#if !ACC_DEVICE_TYPE_host
+
+! Offloaded.
+
+!$ACC PARALLEL
+! { dg-note {variable 'C\.[0-9]+' declared in block potentially has improper OpenACC privatization level: 'const_decl'} "TODO" { target { ! openacc_host_selected } } .-1 }
+  IF (ACC_ON_DEVICE (ACC_DEVICE_NONE)) STOP 9
+  IF (ACC_ON_DEVICE (ACC_DEVICE_HOST)) STOP 10
+  IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_NOT_HOST)) STOP 11
+#if ACC_DEVICE_TYPE_nvidia
+  IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_NVIDIA)) STOP 12
+#else
+  IF (ACC_ON_DEVICE (ACC_DEVICE_NVIDIA)) STOP 13
+#endif
+#if ACC_DEVICE_TYPE_radeon
+  IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_RADEON)) STOP 14
+#else
+  IF (ACC_ON_DEVICE (ACC_DEVICE_RADEON)) STOP 15
+#endif
+!$ACC END PARALLEL
+
+#endif
+
+  END

[gcc r15-4385] Fix bootstrap on 32-bit SPARC/Solaris

2024-10-16 Thread Eric Botcazou via Gcc-cvs

https://gcc.gnu.org/g:935b7fbd03373c91bae065c6fe862a9fc7d1a901

commit r15-4385-g935b7fbd03373c91bae065c6fe862a9fc7d1a901
Author: Eric Botcazou 
Date:   Wed Oct 16 13:59:50 2024 +0200

Fix bootstrap on 32-bit SPARC/Solaris

The 'U' constraint cannot be used with LRA.

gcc/
PR target/113952
PR target/117168
* config/sparc/constraints.md ('U'): Delete.
* config/sparc/sparc.md (*movdi_insn_sp32): Remove U alternatives.
(*movdf_insn_sp32): Likewise.
(*mov_insn_sp32): Likewise.
* doc/md.texi (SPARC constraints): Remove entry for 'U'.

Diff:
---
 gcc/config/sparc/constraints.md | 45 ---
 gcc/config/sparc/sparc.md   | 47 ++---
 gcc/doc/md.texi |  3 ---
 3 files changed, 20 insertions(+), 75 deletions(-)

diff --git a/gcc/config/sparc/constraints.md b/gcc/config/sparc/constraints.md
index 350ad8e9fbaf..6cb7a30d4553 100644
--- a/gcc/config/sparc/constraints.md
+++ b/gcc/config/sparc/constraints.md
@@ -145,51 +145,6 @@
   (match_test "TARGET_ARCH32")
   (match_test "memory_ok_for_ldd (op)")))
 
-;; This awkward register constraint is necessary because it is not
-;; possible to express the "must be even numbered register" condition
-;; using register classes.  The problem is that membership in a
-;; register class requires that all registers of a multi-regno
-;; register be included in the set.  It is add_to_hard_reg_set
-;; and in_hard_reg_set_p which populate and test regsets with these
-;; semantics.
-;;
-;; So this means that we would have to put both the even and odd
-;; register into the register class, which would not restrict things
-;; at all.
-;;
-;; Using a combination of GENERAL_REGS and TARGET_HARD_REGNO_MODE_OK is
-;; not a full solution either.  In fact, even though IRA uses the macro
-;; TARGET_HARD_REGNO_MODE_OK to calculate which registers are prohibited
-;; from use in certain modes, it still can allocate an odd hard register
-;; for DImode values.  This is due to how IRA populates the table
-;; ira_useful_class_mode_regs[][].  It suffers from the same problem
-;; as using a register class to describe this restriction.  Namely, it
-;; sets both the odd and even part of an even register pair in the
-;; regset.  Therefore IRA can and will allocate odd registers for
-;; DImode values on 32-bit.
-;;
-;; There are legitimate cases where DImode values can end up in odd
-;; hard registers, the most notable example is argument passing.
-;;
-;; What saves us is reload and the DImode splitters.  Both are
-;; necessary.  The odd register splitters cannot match if, for
-;; example, we have a non-offsetable MEM.  Reload will notice this
-;; case and reload the address into a single hard register.
-;;
-;; The real downfall of this awkward register constraint is that it
-;; does not evaluate to a true register class like a bonafide use of
-;; define_register_constraint would.  This means that we cannot use
-;; it with LRA, since the constraint processing of LRA really depends
-;; upon whether an extra constraint is for registers or not.  It uses
-;; reg_class_for_constraint, and checks it against NO_REGS.
-(define_constraint "U"
- "Pseudo-register or hard even-numbered integer register"
- (and (match_code "reg")
-  (ior (match_test "REGNO (op) < FIRST_PSEUDO_REGISTER")
-  (not (match_test "reload_in_progress && reg_renumber [REGNO (op)] < 0")))
-  (match_test "TARGET_ARCH32")
-  (match_test "register_ok_for_ldd (op)")))
-
 (define_memory_constraint "W"
   "A memory with only a base register"
   (match_operand 0 "mem_noofs_operand"))
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index 96c542c6ab6e..9703a2097e63 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -1831,9 +1831,9 @@
 
 (define_insn "*movdi_insn_sp32"
   [(set (match_operand:DI 0 "nonimmediate_operand"
-   "=T,o,U,T,r,o,r,r,?*f,  T,?*f,  o,?*e,?*e,  r,?*f,?*e,  T,*b,*b")
+   "=T,o,r,o,r,r,?*f,  T,?*f,  o,?*e,?*e,  r,?*f,?*e,  T,*b,*b")
 (match_operand:DI 1 "input_operand"
-   " J,J,T,U,o,r,i,r,  T,?*f,  o,?*f, *e, *e,?*f,  r,  T,?*e, J, P"))]
+   " J,J,o,r,i,r,  T,?*f,  o,?*f, *e, *e,?*f,  r,  T,?*e, J, P"))]
   "TARGET_ARCH32
&& (register_operand (operands[0], DImode)
|| register_or_zero_operand (operands[1], DImode))"
@@ -1842,8 +1842,6 @@
#
ldd\t%1, %0
std\t%1, %0
-   ldd\t%1, %0
-   std\t%1, %0
#
#
ldd\t%1, %0
@@ -1858,12 +1856,11 @@
std\t%1, %0
fzero\t%0
fone\t%0"
-  [(set_attr "type" 
"store,*,load,store,load,store,*,*,fpload,fpstore,*,*,fpmove,*,*,*,fpload,fpstore,visl,
-visl")
-   (set_attr "subtype" 
"*,*,regular,*,regular,*,*,*,*,*,*,*,*,*,*,*,*,*,double,double")
-   (set_attr "length" "*,2,*,*,*,*,2,2,*,*,2,2,*,2,2,2,*,*,*,

[gcc r13-9117] Add new microarchitecture tune for SRF/GRR/CWF.

2024-10-16 Thread hongtao Liu via Gcc-cvs

https://gcc.gnu.org/g:e9eadc29c1c57cd7be9ec8de231d8fb9e8ac0c7c

commit r13-9117-ge9eadc29c1c57cd7be9ec8de231d8fb9e8ac0c7c
Author: liuhongt 
Date:   Tue Sep 24 15:53:14 2024 +0800

Add new microarchitecture tune for SRF/GRR/CWF.

For Crestmont, 4-operand VEX blendv instructions come from the MSROM and
are slower than the 3-instruction sequence (op1 & mask) | (op2 & ~mask).
The legacy blendv instructions can still be handled by the decoder.

The patch adds a new tune which is enabled for all processors except
SRF/CWF. For SRF/CWF it uses vpand + vpandn + vpor instead of
vpblendvb (and similarly for vblendvps/vblendvpd).
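
The replacement sequence can be sketched in scalar C as below. This is only an illustration, with hypothetical function names; it assumes each mask lane is all ones or all zeros, as a vector comparison result would be. The and/andn/or form needs that, since it selects per bit rather than on the top bit of each element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-bit select: take OP1 where MASK is 1, OP2 where MASK is 0.
   Corresponds to (op1 & mask) | (op2 & ~mask).  */
static uint8_t
blend_byte (uint8_t op1, uint8_t op2, uint8_t mask)
{
  return (uint8_t) ((op1 & mask) | (op2 & (uint8_t) ~mask));
}

/* Lane-wise blend over N bytes, the scalar analogue of the
   and + andn + or sequence.  */
static void
blend_bytes (uint8_t *dst, const uint8_t *op1, const uint8_t *op2,
	     const uint8_t *mask, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      /* Assumption of this sketch: lanes are all ones or all zeros.  */
      assert (mask[i] == 0x00 || mask[i] == 0xFF);
      dst[i] = blend_byte (op1[i], op2[i], mask[i]);
    }
}
```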

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_sse_movcc): Guard
instruction blendv generation under new tune.
* config/i386/i386.h (TARGET_SSE_MOVCC_USE_BLENDV): New Macro.
* config/i386/x86-tune.def (X86_TUNE_SSE_MOVCC_USE_BLENDV):
New tune.

(cherry picked from commit 9c8cea8feb6cd54ef73113a0b74f1df7b60d09dc)

Diff:
---
 gcc/config/i386/i386-expand.cc | 24 +++---
 gcc/config/i386/i386.h |  2 ++
 gcc/config/i386/x86-tune.def   |  8 
 .../gcc.target/i386/sse_movcc_use_blendv.c | 12 +++
 4 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 3112c0b78dcc..1130b6a51853 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -4116,29 +4116,29 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
   switch (mode)
 {
 case E_V2SFmode:
-  if (TARGET_SSE4_1)
+  if (TARGET_SSE_MOVCC_USE_BLENDV && TARGET_SSE4_1)
gen = gen_mmx_blendvps;
   break;
 case E_V4SFmode:
-  if (TARGET_SSE4_1)
+  if (TARGET_SSE_MOVCC_USE_BLENDV && TARGET_SSE4_1)
gen = gen_sse4_1_blendvps;
   break;
 case E_V2DFmode:
-  if (TARGET_SSE4_1)
+  if (TARGET_SSE_MOVCC_USE_BLENDV && TARGET_SSE4_1)
gen = gen_sse4_1_blendvpd;
   break;
 case E_SFmode:
-  if (TARGET_SSE4_1)
+  if (TARGET_SSE_MOVCC_USE_BLENDV && TARGET_SSE4_1)
gen = gen_sse4_1_blendvss;
   break;
 case E_DFmode:
-  if (TARGET_SSE4_1)
+  if (TARGET_SSE_MOVCC_USE_BLENDV && TARGET_SSE4_1)
gen = gen_sse4_1_blendvsd;
   break;
 case E_V8QImode:
 case E_V4HImode:
 case E_V2SImode:
-  if (TARGET_SSE4_1)
+  if (TARGET_SSE_MOVCC_USE_BLENDV && TARGET_SSE4_1)
{
  gen = gen_mmx_pblendvb_v8qi;
  blend_mode = V8QImode;
@@ -4146,14 +4146,14 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
   break;
 case E_V4QImode:
 case E_V2HImode:
-  if (TARGET_SSE4_1)
+  if (TARGET_SSE_MOVCC_USE_BLENDV && TARGET_SSE4_1)
{
  gen = gen_mmx_pblendvb_v4qi;
  blend_mode = V4QImode;
}
   break;
 case E_V2QImode:
-  if (TARGET_SSE4_1)
+  if (TARGET_SSE_MOVCC_USE_BLENDV && TARGET_SSE4_1)
gen = gen_mmx_pblendvb_v2qi;
   break;
 case E_V16QImode:
@@ -4163,18 +4163,18 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
 case E_V4SImode:
 case E_V2DImode:
 case E_V1TImode:
-  if (TARGET_SSE4_1)
+  if (TARGET_SSE_MOVCC_USE_BLENDV && TARGET_SSE4_1)
{
  gen = gen_sse4_1_pblendvb;
  blend_mode = V16QImode;
}
   break;
 case E_V8SFmode:
-  if (TARGET_AVX)
+  if (TARGET_AVX && TARGET_SSE_MOVCC_USE_BLENDV)
gen = gen_avx_blendvps256;
   break;
 case E_V4DFmode:
-  if (TARGET_AVX)
+  if (TARGET_AVX && TARGET_SSE_MOVCC_USE_BLENDV)
gen = gen_avx_blendvpd256;
   break;
 case E_V32QImode:
@@ -4183,7 +4183,7 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
 case E_V16BFmode:
 case E_V8SImode:
 case E_V4DImode:
-  if (TARGET_AVX2)
+  if (TARGET_AVX2 && TARGET_SSE_MOVCC_USE_BLENDV)
{
  gen = gen_avx2_pblendvb;
  blend_mode = V32QImode;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 79f7dc31b779..cda755b374d8 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -448,6 +448,8 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
ix86_tune_features[X86_TUNE_V2DF_REDUCTION_PREFER_HADDPD]
 #define TARGET_DEST_FALSE_DEP_FOR_GLC \
ix86_tune_features[X86_TUNE_DEST_FALSE_DEP_FOR_GLC]
+#define TARGET_SSE_MOVCC_USE_BLENDV \
+   ix86_tune_features[X86_TUNE_SSE_MOVCC_USE_BLENDV]
 
 /* Feature tests against the various architecture variations.  */
 enum ix86_arch_indices {
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 4231ca90b0ed..ce903cf29a75 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -529,6 +529,14 @@

[gcc r13-9118] Add a new tune avx256_avoid_vec_perm for SRF.

2024-10-16 Thread hongtao Liu via Gcc-cvs

https://gcc.gnu.org/g:eecd5f8ce1729a214bf0a1edfdd3ee1cf79be881

commit r13-9118-geecd5f8ce1729a214bf0a1edfdd3ee1cf79be881
Author: liuhongt 
Date:   Wed Sep 25 13:11:11 2024 +0800

Add a new tune avx256_avoid_vec_perm for SRF.

According to the Intel SOM [1], for Crestmont most 256-bit Intel AVX2
instructions can be decomposed into two independent 128-bit
micro-operations. The exception is a subset of Intel AVX2 instructions,
known as cross-lane operations, which can only compute the result for an
element by utilizing one or more sources belonging to other elements.

The 256-bit instructions listed below use more operand sources than
can be natively supported by a single reservation station within these
microarchitectures. They are decomposed into two μops, where the first
μop resolves a subset of operand dependencies across two cycles. The
dependent second μop executes the 256-bit operation by using a single
128-bit execution port for two consecutive cycles with a five-cycle
latency for a total latency of seven cycles.

VPERM2I128 ymm1, ymm2, ymm3/m256, imm8
VPERM2F128 ymm1, ymm2, ymm3/m256, imm8
VPERMPD ymm1, ymm2/m256, imm8
VPERMPS ymm1, ymm2, ymm3/m256
VPERMD ymm1, ymm2, ymm3/m256
VPERMQ ymm1, ymm2/m256, imm8

Instead of setting the avx128_optimal tune for SRF, the patch adds a new
tune, avx256_avoid_vec_perm, for it. By default the vectorizer still
uses a 256-bit VF if the cost is profitable, but lowers to 128-bit
whenever a 256-bit vec_perm is needed for auto-vectorization. Without
vec_perm, the performance of 256-bit vectorization should be similar to
that of 128-bit vectorization (some benchmark results show it is even
better, since it enables more parallelism for convert cases).
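
As a rough sketch of the cost adjustment in the add_stmt_cost hunk of this message: a 32-byte vec_perm gets a penalty of 1000 when the tune is set, which in practice steers the vectorizer away from it. The enum, function and parameter names below are hypothetical; only the condition and the constant are taken from the diff.

```c
#include <assert.h>

/* Hypothetical stand-in for the vectorizer's statement kinds.  */
enum cost_kind_sketch { KIND_VECTOR_STMT, KIND_VEC_PERM };

/* Return STMT_COST, penalized when the statement is a permute on a
   32-byte (256-bit) vector and the avoid-vec-perm tune is enabled.  */
static int
adjust_stmt_cost (int stmt_cost, enum cost_kind_sketch kind,
		  unsigned vector_bytes, int avoid_256bit_vec_perm)
{
  assert (stmt_cost >= 0);
  if (kind == KIND_VEC_PERM
      && vector_bytes == 32
      && avoid_256bit_vec_perm)
    stmt_cost += 1000;
  return stmt_cost;
}
```

Permutes on 128-bit vectors and non-permute statements keep their normal cost, so 256-bit vectorization stays available for loops that do not need a cross-lane shuffle.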

[1] https://www.intel.com/content/www/us/en/content-details/814198/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html

gcc/ChangeLog:

* config/i386/i386.cc (ix86_vector_costs::ix86_vector_costs):
Add new member m_num_avx256_vec_perm.
(ix86_vector_costs::add_stmt_cost): Record 256-bit vec_perm.
(ix86_vector_costs::finish_cost): Prevent vectorization for
TARGET_AVX256_AVOID_VEC_PERM when there's 256-bit vec_perm
instruction.
* config/i386/i386.h (TARGET_AVX256_AVOID_VEC_PERM): New
Macro.
* config/i386/x86-tune.def (X86_TUNE_AVX256_SPLIT_REGS): Add
m_CORE_ATOM.
(X86_TUNE_AVX256_AVOID_VEC_PERM): New tune.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx256_avoid_vec_perm.c: New test.

(cherry picked from commit 9eaecce3d8c1d9349adbf8c2cdaf8d87672ed29c)

Diff:
---
 gcc/config/i386/i386.cc|  5 +
 gcc/config/i386/i386.h |  2 ++
 gcc/config/i386/x86-tune.def   |  7 ++-
 .../gcc.target/i386/avx256_avoid_vec_perm.c| 22 ++
 4 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 1e43ae15d7bd..8323b2e7cd39 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -23746,6 +23746,11 @@ ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
   if (stmt_cost == -1)
 stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
 
+  if (kind == vec_perm && vectype
+  && GET_MODE_SIZE (TYPE_MODE (vectype)) == 32
+  && TARGET_AVX256_AVOID_VEC_PERM)
+stmt_cost += 1000;
+
   /* Penalize DFmode vector operations for Bonnell.  */
   if (TARGET_CPU_P (BONNELL) && kind == vector_stmt
   && vectype && GET_MODE_INNER (TYPE_MODE (vectype)) == DFmode)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index cda755b374d8..08309367c18b 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -425,6 +425,8 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
ix86_tune_features[X86_TUNE_SOFTWARE_PREFETCHING_BENEFICIAL]
 #define TARGET_AVX256_SPLIT_REGS \
ix86_tune_features[X86_TUNE_AVX256_SPLIT_REGS]
+#define TARGET_AVX256_AVOID_VEC_PERM \
+   ix86_tune_features[X86_TUNE_AVX256_AVOID_VEC_PERM]
 #define TARGET_AVX512_SPLIT_REGS \
ix86_tune_features[X86_TUNE_AVX512_SPLIT_REGS]
 #define TARGET_GENERAL_REGS_SSE_SPILL \
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index ce903cf29a75..773a4ea4ccf6 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -553,7 +553,7 @@ DEF_TUNE (X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL, "256_unaligned_store_optimal"
 
 /* X86_TUNE_AVX256_SPLIT_REGS: if true, AVX256 ops are split into two AVX128 ops.  */
 DEF_TUNE (X86_TUNE_AVX256_SPLIT_REGS, "avx256_split_regs",m_BDVER | m_BTVER2
- | m_ZNVER1)
+ | m_ZNVER1 | m_CORE_ATOM)
 
 /* X86_TUNE_AVX128_OPTIMAL: Enable 128-bit AVX instruction

[gcc r13-9119] i386: Fix expand_vector_set for VEC_MERGE/VEC_DUPLICATE RTX [PR117116]

2024-10-16 Thread Uros Bizjak via Gcc-cvs

https://gcc.gnu.org/g:dc295054c4ba28e44d4856bb68d148e9ac272d05

commit r13-9119-gdc295054c4ba28e44d4856bb68d148e9ac272d05
Author: Uros Bizjak 
Date:   Tue Oct 15 16:51:33 2024 +0200

i386: Fix expand_vector_set for VEC_MERGE/VEC_DUPLICATE RTX [PR117116]

Middle end can generate SYMBOL_REF RTX as a value "val" in the call
to expand_vector_set, but SYMBOL_REF RTX is not accepted in
_pinsr insn pattern, generated via
VEC_MERGE/VEC_DUPLICATE RTX path.

Force the value into a register before VEC_MERGE/VEC_DUPLICATE RTX
is generated if it doesn't satisfy nonimmediate_operand predicate.

PR target/117116

gcc/ChangeLog:

* config/i386/i386-expand.cc (expand_vector_set): Force "val"
into a register before VEC_MERGE/VEC_DUPLICATE RTX is generated
if it doesn't satisfy nonimmediate_operand predicate.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr117116.c: New test.

(cherry picked from commit 80d7032067a3a5b76aecd657d9b35b0a8f5a941d)

Diff:
---
 gcc/config/i386/i386-expand.cc   |  2 ++
 gcc/testsuite/gcc.target/i386/pr117116.c | 18 ++
 2 files changed, 20 insertions(+)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 1130b6a51853..dc85103f3a81 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -17096,6 +17096,8 @@ quarter:
   else if (use_vec_merge)
 {
 do_vec_merge:
+  if (!nonimmediate_operand (val, inner_mode))
+   val = force_reg (inner_mode, val);
   tmp = gen_rtx_VEC_DUPLICATE (mode, val);
   tmp = gen_rtx_VEC_MERGE (mode, tmp, target,
   GEN_INT (HOST_WIDE_INT_1U << elt));
diff --git a/gcc/testsuite/gcc.target/i386/pr117116.c b/gcc/testsuite/gcc.target/i386/pr117116.c
new file mode 100644
index ..d6e28848a4b3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr117116.c
@@ -0,0 +1,18 @@
+/* PR target/117116 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx2" } */
+
+typedef void (*StmFct)();
+typedef struct {
+  StmFct fct_getc;
+  StmFct fct_putc;
+  StmFct fct_flush;
+  StmFct fct_close;
+} StmInf;
+
+StmInf TTY_Getc_pstm;
+
+void TTY_Getc() {
+  TTY_Getc_pstm.fct_getc = TTY_Getc;
+  TTY_Getc_pstm.fct_putc = TTY_Getc_pstm.fct_flush = TTY_Getc_pstm.fct_close = (StmFct)1;
+}

[gcc r14-10797] i386: Fix expand_vector_set for VEC_MERGE/VEC_DUPLICATE RTX [PR117116]

2024-10-16 Thread Uros Bizjak via Gcc-cvs

https://gcc.gnu.org/g:8be94d5643176ecd2dcdceaf4448c3b89318037c

commit r14-10797-g8be94d5643176ecd2dcdceaf4448c3b89318037c
Author: Uros Bizjak 
Date:   Tue Oct 15 16:51:33 2024 +0200

i386: Fix expand_vector_set for VEC_MERGE/VEC_DUPLICATE RTX [PR117116]

Middle end can generate SYMBOL_REF RTX as a value "val" in the call
to expand_vector_set, but SYMBOL_REF RTX is not accepted in
_pinsr insn pattern, generated via
VEC_MERGE/VEC_DUPLICATE RTX path.

Force the value into a register before VEC_MERGE/VEC_DUPLICATE RTX
is generated if it doesn't satisfy nonimmediate_operand predicate.

PR target/117116

gcc/ChangeLog:

* config/i386/i386-expand.cc (expand_vector_set): Force "val"
into a register before VEC_MERGE/VEC_DUPLICATE RTX is generated
if it doesn't satisfy nonimmediate_operand predicate.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr117116.c: New test.

(cherry picked from commit 80d7032067a3a5b76aecd657d9b35b0a8f5a941d)

Diff:
---
 gcc/config/i386/i386-expand.cc   |  2 ++
 gcc/testsuite/gcc.target/i386/pr117116.c | 18 ++
 2 files changed, 20 insertions(+)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index cad8b6d58842..7019116fcac1 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -17627,6 +17627,8 @@ quarter:
   else if (use_vec_merge)
 {
 do_vec_merge:
+  if (!nonimmediate_operand (val, inner_mode))
+   val = force_reg (inner_mode, val);
   tmp = gen_rtx_VEC_DUPLICATE (mode, val);
   tmp = gen_rtx_VEC_MERGE (mode, tmp, target,
   GEN_INT (HOST_WIDE_INT_1U << elt));
diff --git a/gcc/testsuite/gcc.target/i386/pr117116.c b/gcc/testsuite/gcc.target/i386/pr117116.c
new file mode 100644
index ..d6e28848a4b3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr117116.c
@@ -0,0 +1,18 @@
+/* PR target/117116 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx2" } */
+
+typedef void (*StmFct)();
+typedef struct {
+  StmFct fct_getc;
+  StmFct fct_putc;
+  StmFct fct_flush;
+  StmFct fct_close;
+} StmInf;
+
+StmInf TTY_Getc_pstm;
+
+void TTY_Getc() {
+  TTY_Getc_pstm.fct_getc = TTY_Getc;
+  TTY_Getc_pstm.fct_putc = TTY_Getc_pstm.fct_flush = TTY_Getc_pstm.fct_close = (StmFct)1;
+}

[gcc r12-10774] i386: Fix expand_vector_set for VEC_MERGE/VEC_DUPLICATE RTX [PR117116]

2024-10-16 Thread Uros Bizjak via Gcc-cvs

https://gcc.gnu.org/g:a8bd38de88715fdbf0d064ff0d50e2b8734de939

commit r12-10774-ga8bd38de88715fdbf0d064ff0d50e2b8734de939
Author: Uros Bizjak 
Date:   Tue Oct 15 16:51:33 2024 +0200

i386: Fix expand_vector_set for VEC_MERGE/VEC_DUPLICATE RTX [PR117116]

Middle end can generate SYMBOL_REF RTX as a value "val" in the call
to expand_vector_set, but SYMBOL_REF RTX is not accepted in
_pinsr insn pattern, generated via
VEC_MERGE/VEC_DUPLICATE RTX path.

Force the value into a register before VEC_MERGE/VEC_DUPLICATE RTX
is generated if it doesn't satisfy nonimmediate_operand predicate.

PR target/117116

gcc/ChangeLog:

* config/i386/i386-expand.cc (expand_vector_set): Force "val"
into a register before VEC_MERGE/VEC_DUPLICATE RTX is generated
if it doesn't satisfy nonimmediate_operand predicate.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr117116.c: New test.

(cherry picked from commit 80d7032067a3a5b76aecd657d9b35b0a8f5a941d)

Diff:
---
 gcc/config/i386/i386-expand.cc   |  2 ++
 gcc/testsuite/gcc.target/i386/pr117116.c | 18 ++
 2 files changed, 20 insertions(+)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index c57a8f56dac3..909c11e4195b 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -16541,6 +16541,8 @@ quarter:
   else if (use_vec_merge)
 {
 do_vec_merge:
+  if (!nonimmediate_operand (val, inner_mode))
+   val = force_reg (inner_mode, val);
   tmp = gen_rtx_VEC_DUPLICATE (mode, val);
   tmp = gen_rtx_VEC_MERGE (mode, tmp, target,
   GEN_INT (HOST_WIDE_INT_1U << elt));
diff --git a/gcc/testsuite/gcc.target/i386/pr117116.c b/gcc/testsuite/gcc.target/i386/pr117116.c
new file mode 100644
index ..d6e28848a4b3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr117116.c
@@ -0,0 +1,18 @@
+/* PR target/117116 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx2" } */
+
+typedef void (*StmFct)();
+typedef struct {
+  StmFct fct_getc;
+  StmFct fct_putc;
+  StmFct fct_flush;
+  StmFct fct_close;
+} StmInf;
+
+StmInf TTY_Getc_pstm;
+
+void TTY_Getc() {
+  TTY_Getc_pstm.fct_getc = TTY_Getc;
+  TTY_Getc_pstm.fct_putc = TTY_Getc_pstm.fct_flush = TTY_Getc_pstm.fct_close = (StmFct)1;
+}

[gcc r15-4386] Ternary operator formatting fixes

2024-10-16 Thread Jakub Jelinek via Gcc-cvs

https://gcc.gnu.org/g:e48a65d3b3fcbcf6059df247d9c87a9a19b35861

commit r15-4386-ge48a65d3b3fcbcf6059df247d9c87a9a19b35861
Author: Jakub Jelinek 
Date:   Wed Oct 16 14:44:32 2024 +0200

Ternary operator formatting fixes

While working on PR117028 C2Y changes, I've noticed weird ternary
operator formatting (operand1 ? operand2: operand3).
The usual formatting is operand1 ? operand2 : operand3
where we have around 18000+ cases of that (counting only what fits
on one line) and
indent -nbad -bap -nbc -bbo -bl -bli2 -bls -ncdb -nce -cp1 -cs -di2 -ndj \
   -nfc1 -nfca -hnl -i2 -ip5 -lp -pcs -psl -nsc -nsob
documented in
https://www.gnu.org/prep/standards/html_node/Formatting.html#Formatting
does the same.
Some code was even trying to save space as much as possible and used
operand1?operand2:operand3 or
operand1 ? operand2:operand3

Today I've grepped for such cases (the grep was '?.*[^ ]:' and I had to
skim through various false positives with that where the : matched e.g.
stuff inside of strings, or *.md pattern macros or :: scope) and the
following patch is a fix for what I found.
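
For illustration only, the formatting being standardized on looks like the sketch below. The functions are hypothetical examples, not code touched by the patch; the multi-line layout follows the usual GNU habit of starting continuation lines with the operator.

```c
#include <assert.h>

/* One-line form: a space on each side of both '?' and ':'.  */
static int
clamp (int v, int lo, int hi)
{
  assert (lo <= hi);
  return v < lo ? lo : v > hi ? hi : v;
}

/* Multi-line form: each operator starts its continuation line.  */
static const char *
describe (int n)
{
  return (n < 0
	  ? "negative"
	  : n == 0 ? "zero" : "positive");
}
```

The forms being fixed are the compressed variants such as `v < lo ? lo: v` or `v<lo?lo:v`.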

2024-10-16  Jakub Jelinek  

gcc/
* attribs.cc (lookup_scoped_attribute_spec): ?: operator formatting
fixes.
* basic-block.h (FOR_BB_INSNS_SAFE): Likewise.
* cfgcleanup.cc (outgoing_edges_match): Likewise.
* cgraph.cc (cgraph_node::dump): Likewise.
* config/arc/arc.cc (gen_acc1, gen_acc2): Likewise.
* config/arc/arc.h (CLASS_MAX_NREGS, CONSTANT_ADDRESS_P): Likewise.
* config/arm/arm.cc (arm_print_operand): Likewise.
* config/cris/cris.md (*b): Likewise.
* config/darwin.cc (darwin_asm_declare_object_name,
darwin_emit_common): Likewise.
* config/darwin-driver.cc (darwin_driver_init): Likewise.
* config/epiphany/epiphany.md (call, sibcall, call_value,
sibcall_value): Likewise.
* config/i386/i386.cc (gen_push2): Likewise.
* config/i386/i386.h (ix86_cur_cost): Likewise.
* config/i386/openbsdelf.h (FUNCTION_PROFILER): Likewise.
* config/loongarch/loongarch-c.cc (loongarch_cpu_cpp_builtins):
Likewise.
* config/loongarch/loongarch-cpu.cc (fill_native_cpu_config):
Likewise.
* config/riscv/riscv.cc (riscv_union_memmodels): Likewise.
* config/riscv/zc.md (*mva01s, *mvsa01): Likewise.
* config/rs6000/mmintrin.h (_mm_cmpeq_pi8, _mm_cmpgt_pi8,
_mm_cmpeq_pi16, _mm_cmpgt_pi16, _mm_cmpeq_pi32, _mm_cmpgt_pi32):
Likewise.
* config/v850/predicates.md (pattern_is_ok_for_prologue): Likewise.
* config/xtensa/constraints.md (d, C, W): Likewise.
* coverage.cc (coverage_begin_function, build_init_ctor,
build_gcov_exit_decl): Likewise.
* df-problems.cc (df_create_unused_note): Likewise.
* diagnostic.cc (diagnostic_set_caret_max_width): Likewise.
* diagnostic-path.cc (path_summary::path_summary): Likewise.
* expr.cc (expand_expr_divmod): Likewise.
* gcov.cc (format_gcov): Likewise.
* gcov-dump.cc (dump_gcov_file): Likewise.
* genmatch.cc (main): Likewise.
* incpath.cc (remove_duplicates, register_include_chains): Likewise.
* ipa-devirt.cc (dump_odr_type): Likewise.
* ipa-icf.cc (sem_item_optimizer::merge_classes): Likewise.
* ipa-inline.cc (inline_small_functions): Likewise.
* ipa-polymorphic-call.cc (ipa_polymorphic_call_context::dump):
Likewise.
* ipa-sra.cc (create_parameter_descriptors): Likewise.
* ipa-utils.cc (find_always_executed_bbs): Likewise.
* predict.cc (predict_loops): Likewise.
* selftest.cc (read_file): Likewise.
* sreal.h (SREAL_SIGN, SREAL_ABS): Likewise.
* tree-dump.cc (dequeue_and_dump): Likewise.
* tree-ssa-ccp.cc (bit_value_binop): Likewise.
gcc/c-family/
* c-opts.cc (c_common_init_options, c_common_handle_option,
c_common_finish, set_std_c89, set_std_c99, set_std_c11,
set_std_c17, set_std_c23, set_std_cxx98, set_std_cxx11,
set_std_cxx14, set_std_cxx17, set_std_cxx20, set_std_cxx23,
set_std_cxx26): ?: operator formatting fixes.
gcc/cp/
* search.cc (lookup_member): ?: operator formatting fixes.
* typeck.cc (cp_build_modify_expr): Likewise.
libcpp/
* expr.cc (interpret_float_suffix): ?: operator formatting fixes.

Diff:
---
 gcc/attribs.cc|  2 +-
 gcc/basic-block.h |  2 +-
 gcc/c-family/c-opts.cc| 32 ++--
 gcc/cfgcleanup.cc |  4 +--
 gcc/cgraph.cc

[gcc r15-4405] Fix ICE with coarrays and submodules [PR80235]

2024-10-16 Thread Andre Vehreschild via Gcc-cvs

https://gcc.gnu.org/g:e32fff675c3bb040fa79854f6b0654c16bc38997

commit r15-4405-ge32fff675c3bb040fa79854f6b0654c16bc38997
Author: Andre Vehreschild 
Date:   Tue Sep 24 14:30:52 2024 +0200

Fix ICE with coarrays and submodules [PR80235]

Exposing a variable in a module and referencing it in a submodule made
the compiler ICE, because the external variable was not sorted into the
correct module.  In fact the module name was not set where the variable
got built.

gcc/fortran/ChangeLog:

PR fortran/80235

* trans-decl.cc (gfc_build_qualified_array): Make sure the array
is associated with the correct module and is marked as extern.

gcc/testsuite/ChangeLog:

* gfortran.dg/coarray/add_sources/submodule_1_sub.f90: New test.
* gfortran.dg/coarray/submodule_1.f90: New test.

Diff:
---
 gcc/fortran/trans-decl.cc  |  7 --
 .../coarray/add_sources/submodule_1_sub.f90| 22 
 gcc/testsuite/gfortran.dg/coarray/submodule_1.f90  | 29 ++
 3 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index 56b6202510e8..9cced7c02e40 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -1066,7 +1066,8 @@ gfc_build_qualified_array (tree decl, gfc_symbol * sym)
IDENTIFIER_POINTER (gfc_sym_mangled_identifier (sym;
  token = build_decl (DECL_SOURCE_LOCATION (decl), VAR_DECL, token_name,
  token_type);
- if (sym->attr.use_assoc)
+ if (sym->attr.use_assoc
+ || (sym->attr.host_assoc && sym->attr.used_in_submodule))
DECL_EXTERNAL (token) = 1;
  else
TREE_STATIC (token) = 1;
@@ -1091,9 +1092,11 @@ gfc_build_qualified_array (tree decl, gfc_symbol * sym)
 
   if (sym->module && !sym->attr.use_assoc)
{
+ module_htab_entry *mod
+   = cur_module ? cur_module : gfc_find_module (sym->module);
  pushdecl (token);
  DECL_CONTEXT (token) = sym->ns->proc_name->backend_decl;
- gfc_module_add_decl (cur_module, token);
+ gfc_module_add_decl (mod, token);
}
   else if (sym->attr.host_assoc
   && TREE_CODE (DECL_CONTEXT (current_function_decl))
diff --git a/gcc/testsuite/gfortran.dg/coarray/add_sources/submodule_1_sub.f90 b/gcc/testsuite/gfortran.dg/coarray/add_sources/submodule_1_sub.f90
new file mode 100644
index ..fd177fcda298
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/coarray/add_sources/submodule_1_sub.f90
@@ -0,0 +1,22 @@
+! This test belongs to submodule_1.f90
+! It is references as additional source in that test.
+! The two code fragments need to be in separate files to show
+! the error of pr80235.
+
+submodule (pr80235) pr80235_sub
+
+contains
+  module subroutine test()
+implicit none
+if (var%v /= 42) stop 1
+  end subroutine
+end submodule pr80235_sub
+
+program pr80235_prg
+  use pr80235
+  
+  implicit none
+
+  var%v = 42
+  call test()
+end program
diff --git a/gcc/testsuite/gfortran.dg/coarray/submodule_1.f90 b/gcc/testsuite/gfortran.dg/coarray/submodule_1.f90
new file mode 100644
index ..d0faef93ba76
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/coarray/submodule_1.f90
@@ -0,0 +1,29 @@
+!{ dg-do run }
+!{ dg-additional-sources add_sources/submodule_1_sub.f90 }
+
+! Separating the module and the submodule is needed to show the error.
+! Having all code pieces in one file does not show the error.
+
+module pr80235
+  implicit none
+
+  private
+  public :: test, var
+
+  type T
+integer :: v
+  end type T
+
+interface
+
+  module subroutine test()
+  end subroutine
+
+end interface
+
+  type(T) :: var[*]
+
+end module pr80235
+
+
+

[gcc r15-4404] Fix gcc.dg/vect/vect-early-break_39.c FAIL with forced SLP

2024-10-16 Thread Richard Biener via Gcc-cvs

https://gcc.gnu.org/g:6293272e9a47e6e7debe4acd8195a2ae2d9ef0df

commit r15-4404-g6293272e9a47e6e7debe4acd8195a2ae2d9ef0df
Author: Richard Biener 
Date:   Wed Oct 16 11:37:31 2024 +0200

Fix gcc.dg/vect/vect-early-break_39.c FAIL with forced SLP

The testcase shows single-element interleaving of size three
being exempted from permutation lowering via heuristics
(see also PR116973).  But it wasn't supposed to apply to
non-power-of-two sizes so this amends the check to ensure
the sub-group is aligned even when the number of lanes is one.
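
The size-related part of the amended condition can be illustrated with a
minimal sketch.  Here is_pow2 is a hypothetical stand-in for GCC's
pow2p_hwi, and exempt_from_lowering models only the lane and group size
checks visible in the hunk below (it ignores the contiguity and load-count
conditions), so it is not the real vect_lower_load_permutations logic:

```c
#include <stdbool.h>

/* Hypothetical stand-in for pow2p_hwi: true iff X is a power of two.  */
bool
is_pow2 (unsigned long x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

/* Simplified model of the size checks: a sub-group of LANES lanes whose
   first permutation index is FIRST, inside a group of GROUP_LANES lanes.  */
bool
exempt_from_lowering (unsigned lanes, unsigned first, unsigned group_lanes)
{
  return (is_pow2 (lanes)
          && is_pow2 (group_lanes)   /* the check added by this patch */
          && first % lanes == 0
          && group_lanes % lanes == 0);
}
```

With the added check, a single lane taken from a group of three
(exempt_from_lowering (1, 0, 3)) is no longer exempted, while a single lane
from a group of four still is.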

* tree-vect-slp.cc (vect_lower_load_permutations): Avoid
exempting non-power-of-two group sizes from lowering.

Diff:
---
 gcc/tree-vect-slp.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 629c4b433ab5..d35c2ea02dce 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -4427,6 +4427,7 @@ vect_lower_load_permutations (loop_vec_info loop_vinfo,
  && contiguous
  && (SLP_TREE_LANES (load) > 1 || loads.size () == 1)
  && pow2p_hwi (SLP_TREE_LANES (load))
+ && pow2p_hwi (group_lanes)
  && SLP_TREE_LOAD_PERMUTATION (load)[0] % SLP_TREE_LANES (load) == 0
  && group_lanes % SLP_TREE_LANES (load) == 0)
{

[gcc r15-4402] c: Fix up speed up compilation of large char array initializers when not using #embed [PR117177]

2024-10-16 Thread Jakub Jelinek via Gcc-cvs

https://gcc.gnu.org/g:96ba5e5663d4390a7e69735ce3c9de657fc543fc

commit r15-4402-g96ba5e5663d4390a7e69735ce3c9de657fc543fc
Author: Jakub Jelinek 
Date:   Thu Oct 17 06:59:31 2024 +0200

c: Fix up speed up compilation of large char array initializers when not 
using #embed [PR117177]

Apparently my
c: Speed up compilation of large char array initializers when not using 
#embed
patch broke building glibc.

The issue is that when using CPP_EMBED, we are guaranteed by the
preprocessor that there is CPP_NUMBER CPP_COMMA before it and
CPP_COMMA CPP_NUMBER after it (or CPP_COMMA CPP_EMBED), so RAW_DATA_CST
never ends up at the end of arrays of unknown length.
Now, the c_parser_initval optimization attempted to preserve that property
rather than changing everything that e.g. infers the number of array
elements from the initializer to deal with RAW_DATA_CST at the end, but
it didn't take into account the possibility that there could be
CPP_COMMA followed by CPP_CLOSE_BRACE (where the CPP_COMMA is redundant).

As we are already peeking at 4 tokens in that code, peeking at more would
require using raw tokens, and doing that for every pair of tokens seems
expensive due to the vec_free done when we run out of raw tokens.

So, the following patch instead determines, after consuming the tokens,
whether another INTEGER_CST element is wanted after the RAW_DATA_CST, and
if so just arranges for another process_init_element.
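
As an illustration of the input shape described above, here is a minimal
sketch of an array of unknown length with a redundant trailing comma.  It
is not the committed init-4.c test, and the names table and table_length
are made up for this example; whether this exact initializer reaches the
optimized path is an assumption based on the 64-element threshold visible
in the diff below:

```c
#include <stddef.h>

/* The array length is inferred from the initializer.  There are 64
   elements, and the comma after the last one is redundant.  */
const unsigned char table[] = {
   0,  1,  2,  3,  4,  5,  6,  7,
   8,  9, 10, 11, 12, 13, 14, 15,
  16, 17, 18, 19, 20, 21, 22, 23,
  24, 25, 26, 27, 28, 29, 30, 31,
  32, 33, 34, 35, 36, 37, 38, 39,
  40, 41, 42, 43, 44, 45, 46, 47,
  48, 49, 50, 51, 52, 53, 54, 55,
  56, 57, 58, 59, 60, 61, 62, 63,
};

/* Number of elements the compiler inferred for the array.  */
size_t
table_length (void)
{
  return sizeof table / sizeof table[0];
}
```

A conforming compiler has to infer 64 elements here regardless of the
trailing comma.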

2024-10-17  Jakub Jelinek  

PR c/117177
gcc/c/
* c-parser.cc (c_parser_initval): Instead of doing
orig_len == INT_MAX checks before consuming tokens to set
last = 1, check it after consuming it and if not followed
by CPP_COMMA CPP_NUMBER, call process_init_element once
more with the last CPP_NUMBER.
gcc/testsuite/
* c-c++-common/init-4.c: New test.

Diff:
---
 gcc/c/c-parser.cc   | 35 +
 gcc/testsuite/c-c++-common/init-4.c | 97 +
 2 files changed, 122 insertions(+), 10 deletions(-)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index e4381044e5cb..090ab1cbc088 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -6529,6 +6529,7 @@ c_parser_initval (c_parser *parser, struct c_expr *after,
unsigned int i;
gcc_checking_assert (len >= 64);
location_t last_loc = UNKNOWN_LOCATION;
+   location_t prev_loc = UNKNOWN_LOCATION;
for (i = 0; i < 64; ++i)
  {
c_token *tok = c_parser_peek_nth_token_raw (parser, 1 + 2 * i);
@@ -6544,6 +6545,7 @@ c_parser_initval (c_parser *parser, struct c_expr *after,
buf1[i] = (char) tree_to_uhwi (tok->value);
if (i == 0)
  loc = tok->location;
+   prev_loc = last_loc;
last_loc = tok->location;
  }
if (i < 64)
@@ -6567,6 +6569,7 @@ c_parser_initval (c_parser *parser, struct c_expr *after,
unsigned int max_len = 131072 - offsetof (struct tree_string, str) - 1;
unsigned int orig_len = len;
unsigned int off = 0, last = 0;
+   unsigned char lastc = 0;
if (!wi::neg_p (wi::to_wide (val)) && wi::to_widest (val) <= UCHAR_MAX)
  off = 1;
len = MIN (len, max_len - off);
@@ -6596,20 +6599,25 @@ c_parser_initval (c_parser *parser, struct c_expr *after,
if (tok2->type != CPP_COMMA && tok2->type != CPP_CLOSE_BRACE)
  break;
buf2[i + off] = (char) tree_to_uhwi (tok->value);
-   /* If orig_len is INT_MAX, this can be flexible array member and
-  in that case we need to ensure another element which
-  for CPP_EMBED is normally guaranteed after it.  Include
-  that byte in the RAW_DATA_OWNER though, so it can be optimized
-  later.  */
-   if (tok2->type == CPP_CLOSE_BRACE && orig_len == INT_MAX)
- {
-   last = 1;
-   break;
- }
+   prev_loc = last_loc;
last_loc = tok->location;
c_parser_consume_token (parser);
c_parser_consume_token (parser);
  }
+   /* If orig_len is INT_MAX, this can be flexible array member and
+  in that case we need to ensure another element which
+  for CPP_EMBED is normally guaranteed after it.  Include
+  that byte in the RAW_DATA_OWNER though, so it can be optimized
+  later.  */
+   if (orig_len == INT_MAX
+   && (!c_parser_next_token_is (parser, CPP_COMMA)
+   || c_parser_peek_2nd_token (parser)->type != CPP_NUMBER))
+ {
+   --i;
+   last = 1;
+   std::swap (prev_loc, last_loc);
+   lastc = (unsigned char) buf2[i + off];
+ }
val = make_node (RAW_DATA_CST);
TREE_TYPE (val) = integer_type_node;
RAW_DATA_LENGTH (val) = i;

[gcc r15-4403] c, libcpp: Partially implement C2Y N3353 paper [PR117028]

2024-10-16 Thread Jakub Jelinek via Gcc-cvs

https://gcc.gnu.org/g:e020116db056352d9a7495e85d37e66c36f6ea32

commit r15-4403-ge020116db056352d9a7495e85d37e66c36f6ea32
Author: Jakub Jelinek 
Date:   Thu Oct 17 07:01:44 2024 +0200

c, libcpp: Partially implement C2Y N3353 paper [PR117028]

The following patch partially implements the N3353 paper.
In particular, it adds support for the delimited escape sequences
(\u{123}, \x{123}, \o{123}), which were already added for C++23.
All I had to do was split the delimited escape sequence guarding from
the named universal character escape sequence guards
(\N{LATIN CAPITAL LETTER C WITH CARON}), which C++23 has but C2Y doesn't,
and emit different diagnostics for C than for C++ for the delimited escape
sequences.
And it adds support for the new style of octal literals, 0o137 or 0O1777.
I have so far added that just for C and not C++, because I have no idea
whether C++ will want to handle it similarly.

What the patch doesn't do is add any kind of diagnostics for the obsoletion
of \137 or 0137; as discussed in the PR, I think it is way too early for that.
Perhaps some non-default warning later on.
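
To make the two octal spellings concrete, here is a minimal sketch of a
parser that accepts both the traditional form (0137) and the new 0o/0O
prefixed form (0o137).  The function parse_octal is hypothetical and is not
the libcpp implementation (cpp_classify_number and cpp_interpret_integer do
considerably more, including suffixes and diagnostics):

```c
#include <stdbool.h>

/* Hypothetical helper: parse an octal constant written either as 0137 or
   with the C2Y 0o/0O prefix.  Returns false on malformed input.  Overflow
   is not checked in this sketch.  */
bool
parse_octal (const char *s, unsigned long *out)
{
  if (s[0] != '0')
    return false;

  const char *p = s + 1;
  bool prefixed = (*p == 'o' || *p == 'O');
  if (prefixed)
    p++;

  /* The new-style prefix must be followed by at least one digit.  */
  if (prefixed && *p == '\0')
    return false;

  unsigned long value = 0;
  for (; *p != '\0'; p++)
    {
      if (*p < '0' || *p > '7')
        return false;
      value = value * 8 + (unsigned long) (*p - '0');
    }

  *out = value;
  return true;
}
```

Both parse_octal ("0137", &v) and parse_octal ("0o137", &v) store 95 in v,
since 1*64 + 3*8 + 7 = 95.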

2024-10-17  Jakub Jelinek  

PR c/117028
libcpp/
* include/cpplib.h (struct cpp_options): Add named_uc_escape_seqs,
octal_constants and cpp_warn_c23_c2y_compat members.
(enum cpp_warning_reason): Add CPP_W_C23_C2Y_COMPAT enumerator.
* init.cc (struct lang_flags): Add named_uc_escape_seqs and
octal_constants bit-fields.
(lang_defaults): Add initializers for them into the table.
(cpp_set_lang): Initialize named_uc_escape_seqs and octal_constants.
(cpp_create_reader): Initialize cpp_warn_c23_c2y_compat to -1.
* charset.cc (_cpp_valid_ucn): Test
CPP_OPTION (pfile, named_uc_escape_seqs) rather than
CPP_OPTION (pfile, delimited_escape_seqs) in \N{} related tests.
Change wording of C cpp_pedwarning for \u{} and emit
-Wc23-c2y-compat warning for it too if needed.  Formatting fixes.
(convert_hex): Change wording of C cpp_pedwarning for \u{} and emit
-Wc23-c2y-compat warning for it too if needed.
(convert_oct): Likewise.
* expr.cc (cpp_classify_number): Handle C2Y 0o or 0O prefixed
octal constants.
(cpp_interpret_integer): Likewise.
gcc/c-family/
* c.opt (Wc23-c2y-compat): Add CPP and CppReason parameters.
* c-opts.cc (set_std_c2y): Use CLK_STDC2Y or CLK_GNUC2Y rather
than CLK_STDC23 and CLK_GNUC23.  Formatting fix.
* c-lex.cc (interpret_integer): Handle C2Y 0o or 0O prefixed
and wb/WB/uwb/UWB suffixed octal constants.
gcc/testsuite/
* gcc.dg/bitint-112.c: New test.
* gcc.dg/c23-digit-separators-1.c: Add _Static_assert for
valid binary constant with digit separator.
* gcc.dg/c23-octal-constants-1.c: New test.
* gcc.dg/c23-octal-constants-2.c: New test.
* gcc.dg/c2y-digit-separators-1.c: New test.
* gcc.dg/c2y-digit-separators-2.c: New test.
* gcc.dg/c2y-octal-constants-1.c: New test.
* gcc.dg/c2y-octal-constants-2.c: New test.
* gcc.dg/c2y-octal-constants-3.c: New test.
* gcc.dg/cpp/c23-delimited-escape-seq-1.c: New test.
* gcc.dg/cpp/c23-delimited-escape-seq-2.c: New test.
* gcc.dg/cpp/c2y-delimited-escape-seq-1.c: New test.
* gcc.dg/cpp/c2y-delimited-escape-seq-2.c: New test.
* gcc.dg/cpp/c2y-delimited-escape-seq-3.c: New test.
* gcc.dg/cpp/c2y-delimited-escape-seq-4.c: New test.
* gcc.dg/octal-constants-1.c: New test.
* gcc.dg/octal-constants-2.c: New test.
* gcc.dg/octal-constants-3.c: New test.
* gcc.dg/octal-constants-4.c: New test.
* gcc.dg/system-octal-constants-1.c: New test.
* gcc.dg/system-octal-constants-1.h: New file.

Diff:
---
 gcc/c-family/c-lex.cc  |   4 +
 gcc/c-family/c-opts.cc |   2 +-
 gcc/c-family/c.opt |   2 +-
 gcc/testsuite/gcc.dg/bitint-112.c  |   5 +
 gcc/testsuite/gcc.dg/c23-digit-separators-1.c  |   1 +
 gcc/testsuite/gcc.dg/c23-octal-constants-1.c   |  11 +
 gcc/testsuite/gcc.dg/c23-octal-constants-2.c   |  11 +
 gcc/testsuite/gcc.dg/c2y-digit-separators-1.c  |   6 +
 gcc/testsuite/gcc.dg/c2y-digit-separators-2.c  |  11 +
 gcc/testsuite/gcc.dg/c2y-octal-constants-1.c   |   5 +
 gcc/testsuite/gcc.dg/c2y-octal-constants-2.c   |  11 +
 gcc/testsuite/gcc.dg/c2y-octal-constants-3.c   |   9 +
 .../gcc.dg/cpp/c23-delimited-escape-seq-1.c|  87 ++
 .../gcc.dg/cpp/c23-delimited-escape-seq-2.c|  87 ++
 .../gcc.dg/cpp/c2y-delimite

[gcc r15-4397] Support andn_optab for x86

2024-10-16 Thread Lili Cui via Gcc-cvs

https://gcc.gnu.org/g:70f59d2a1c51bde085d8fc7df002918851e76c9c

commit r15-4397-g70f59d2a1c51bde085d8fc7df002918851e76c9c
Author: Cui, Lili 
Date:   Thu Oct 17 08:50:38 2024 +0800

Support andn_optab for x86

Add new andn pattern to match the new optab added by
r15-1890-gf379596e0ba99d. Only enable 64-bit, 128-bit and
256-bit vector ANDN; x86-64 has a mask mov instruction when
AVX512 is enabled.
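
The new test checks that "(a > b) ? 0 : c" on vectors is turned into
.BIT_ANDN.  A per-lane scalar sketch of that equivalence follows; the
helper names andn32 and select_zero_or_c are made up for this example, and
the code only models the arithmetic, not the expander or the optab operand
order:

```c
#include <stdint.h>

/* Scalar model of ANDN: complement the first operand, then AND.  */
uint32_t
andn32 (uint32_t x, uint32_t y)
{
  return ~x & y;
}

/* One lane of "(a > b) ? 0 : c".  The comparison yields an all-ones or
   all-zeros mask, and ANDN with that mask clears c where a > b.  */
int32_t
select_zero_or_c (float a, float b, int32_t c)
{
  uint32_t mask = (a > b) ? UINT32_MAX : 0;
  return (int32_t) andn32 (mask, (uint32_t) c);
}
```

Because the mask is either all ones or all zeros, a single ANDN replaces
the compare-and-select pair in each lane.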

gcc/ChangeLog:

* config/i386/sse.md (andn3): New.
* config/i386/mmx.md (andn3): New.

gcc/testsuite/ChangeLog:

* g++.target/i386/vect-cmp.C: New test.

Diff:
---
 gcc/config/i386/mmx.md   |  7 +++
 gcc/config/i386/sse.md   |  7 +++
 gcc/testsuite/g++.target/i386/vect-cmp.C | 23 +++
 3 files changed, 37 insertions(+)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 9d2a82c598e5..ef4ed8b501a1 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -4467,6 +4467,13 @@
   operands[0] = lowpart_subreg (V16QImode, operands[0], mode);
 })
 
+(define_expand "andn3"
+  [(set (match_operand:MMXMODEI 0 "register_operand")
+(and:MMXMODEI
+  (not:MMXMODEI (match_operand:MMXMODEI 1 "register_operand"))
+  (match_operand:MMXMODEI 2 "register_operand")))]
+  "TARGET_SSE2")
+
 (define_insn "mmx_andnot3"
   [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,x,v")
(and:MMXMODEI
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index a45b50ad7324..7be313346677 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -18438,6 +18438,13 @@
  (match_operand:VI_AVX2 2 "vector_operand")))]
   "TARGET_SSE2")
 
+(define_expand "andn3"
+  [(set (match_operand:VI 0 "register_operand")
+   (and:VI
+ (not:VI (match_operand:VI 2 "register_operand"))
+ (match_operand:VI 1 "register_operand")))]
+  "TARGET_SSE2")
+
 (define_expand "_andnot3_mask"
   [(set (match_operand:VI48_AVX512VL 0 "register_operand")
(vec_merge:VI48_AVX512VL
diff --git a/gcc/testsuite/g++.target/i386/vect-cmp.C b/gcc/testsuite/g++.target/i386/vect-cmp.C
new file mode 100644
index ..c154474fa51c
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/vect-cmp.C
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64-v3 -fdump-tree-optimized" } */
+
+#define vect8 __attribute__((vector_size(8) ))
+#define vect16 __attribute__((vector_size(16) ))
+#define vect32 __attribute__((vector_size(32) ))
+
+vect8 int bar0 (vect8 float a, vect8 float b, vect8 int c)
+{
+  return (a > b) ? 0 : c;
+}
+
+vect16 int bar1 (vect16 float a, vect16 float b, vect16 int c)
+{
+  return (a > b) ? 0 : c;
+}
+
+vect32 int bar2 (vect32 float a, vect32 float b, vect32 int c)
+{
+  return (a > b) ? 0 : c;
+}
+
+/* { dg-final { scan-tree-dump-times ".BIT_ANDN " 3 "optimized" { target { ! ia32 } } } } */

[gcc(refs/users/meissner/heads/work181-sha)] Initial support for adding xxeval fusion support.

2024-10-16 Thread Michael Meissner via Gcc-cvs

https://gcc.gnu.org/g:6b1b02ef622788dd9beefa6f9bc4f845e971ffb6

commit 6b1b02ef622788dd9beefa6f9bc4f845e971ffb6
Author: Michael Meissner 
Date:   Wed Oct 16 19:21:19 2024 -0400

Initial support for adding xxeval fusion support.

2024-10-16  Michael Meissner  

gcc/

* config/rs6000/fusion.md: Regenerate.
* config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
generate vector/vector logical fusion if XXEVAL supports the fusion.
* config/rs6000/predicates.md (vector_fusion_operand): New predicate.
* config/rs6000/rs6000.cc (rs6000_opt_vars): Add -mxxeval.
* config/rs6000/rs6000.md (isa attribute): Add xxeval.
(enabled attribute): Add support for -mxxeval.
* config/rs6000/rs6000.opt (-mxxeval): New switch.
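
The xxeval alternatives in the diff below use the immediates 1 (vand ->
vand) and 2 (vandc -> vand).  A minimal sketch of how such immediates can
be derived, assuming the immediate is an 8-bit truth table in which the
result for inputs (a, b, c) sits in bit number a*4 + b*2 + c counted from
the most significant bit; that assumption matches both values in the diff,
and xxeval_imm and logic3_fn are hypothetical names:

```c
#include <stdbool.h>

/* Hypothetical 3-input boolean function type for this sketch.  */
typedef bool (*logic3_fn) (bool a, bool b, bool c);

/* Build the 8-bit immediate under the assumption stated above: the
   result for inputs (a, b, c) is stored in bit a*4 + b*2 + c, counted
   from the most significant bit.  */
unsigned
xxeval_imm (logic3_fn f)
{
  unsigned imm = 0;
  for (unsigned idx = 0; idx < 8; idx++)
    {
      bool a = (idx >> 2) & 1;
      bool b = (idx >> 1) & 1;
      bool c = idx & 1;
      if (f (a, b, c))
        imm |= 0x80u >> idx;
    }
  return imm;
}

/* vand -> vand: all three inputs ANDed together.  */
bool
and_and (bool a, bool b, bool c)
{
  return a && b && c;
}

/* vandc -> vand: the last input is complemented before the ANDs.  */
bool
andc_and (bool a, bool b, bool c)
{
  return a && b && !c;
}
```

Under that assumption xxeval_imm (and_and) is 1 and xxeval_imm (andc_and)
is 2, with the complemented input in the last operand position as in
"xxeval %x3,%x2,%x1,%x0,2".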

Diff:
---
 gcc/config/rs6000/fusion.md | 782 +++-
 gcc/config/rs6000/genfusion.pl  | 104 +-
 gcc/config/rs6000/predicates.md |  14 +-
 gcc/config/rs6000/rs6000.cc |   3 +
 gcc/config/rs6000/rs6000.md |   7 +-
 gcc/config/rs6000/rs6000.opt|   4 +
 6 files changed, 586 insertions(+), 328 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index 4ed9ae1d69f4..724e4692d101 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -1871,146 +1871,170 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vand -> vand
 (define_insn "*fuse_vand_vand"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-    (and:VM (and:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")
-  (match_operand:VM 1 "altivec_register_operand" "%v,v,v,v"))
- (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,v,wa")
+    (and:VM (and:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,v,wa")
+  (match_operand:VM 1 "vector_fusion_operand" "%v,v,v,v,wa"))
+ (match_operand:VM 2 "vector_fusion_operand" "v,v,v,v,wa")))
+   (clobber (match_scratch:VM 4 "=X,X,X,&v,X"))]
   "(TARGET_P10_FUSION)"
   "@
vand %3,%1,%0\;vand %3,%3,%2
vand %3,%1,%0\;vand %3,%3,%2
vand %3,%1,%0\;vand %3,%3,%2
-   vand %4,%1,%0\;vand %3,%4,%2"
+   vand %4,%1,%0\;vand %3,%4,%2
+   xxeval %x3,%x2,%x1,%x0,1"
   [(set_attr "type" "fused_vector")
(set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,*,yes")
+   (set_attr "isa" "*,*,*,*,xxeval")])
 
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vandc -> vand
 (define_insn "*fuse_vandc_vand"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-    (and:VM (and:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v"))
-  (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))
- (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,v,wa")
+    (and:VM (and:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,v,wa"))
+  (match_operand:VM 1 "vector_fusion_operand" "v,v,v,v,wa"))
+ (match_operand:VM 2 "vector_fusion_operand" "v,v,v,v,wa")))
+   (clobber (match_scratch:VM 4 "=X,X,X,&v,X"))]
   "(TARGET_P10_FUSION)"
   "@
vandc %3,%1,%0\;vand %3,%3,%2
vandc %3,%1,%0\;vand %3,%3,%2
vandc %3,%1,%0\;vand %3,%3,%2
-   vandc %4,%1,%0\;vand %3,%4,%2"
+   vandc %4,%1,%0\;vand %3,%4,%2
+   xxeval %x3,%x2,%x1,%x0,2"
   [(set_attr "type" "fused_vector")
(set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,*,yes")
+   (set_attr "isa" "*,*,*,*,xxeval")])
 
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector veqv -> vand
 (define_insn "*fuse_veqv_vand"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-    (and:VM (not:VM (xor:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")
-  (match_operand:VM 1 "altivec_register_operand" "v,v,v,v")))
- (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,v,wa")
+    (and:VM (not:VM (xor:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,v,wa")
+  (match_operand:VM 1 "vector_fusion_operand" "v,v,v,v,wa")))
+ (match_operand:VM 2 "vector_fusion_operand" "v,v,v,v,wa")))
+   (clobber (match_scratch:VM 4 "=X,X,X,&v,X"))]
   "(TARGET_P10_FUSION)"
   "@
veqv %3,%1,%0\;vand %3,%3,%2
veqv %3,%1,%0\;vand %3,%3,%2

[gcc(refs/users/meissner/heads/work181-sha)] Update ChangeLog.*

2024-10-16 Thread Michael Meissner via Gcc-cvs

https://gcc.gnu.org/g:dfd150eac2b5d1f07d2b3a1368248cc39360508f

commit dfd150eac2b5d1f07d2b3a1368248cc39360508f
Author: Michael Meissner 
Date:   Wed Oct 16 19:22:36 2024 -0400

Update ChangeLog.*

Diff:
---
 gcc/ChangeLog.sha | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/gcc/ChangeLog.sha b/gcc/ChangeLog.sha
index 0727f25e522f..4baa72680749 100644
--- a/gcc/ChangeLog.sha
+++ b/gcc/ChangeLog.sha
@@ -6,15 +6,17 @@ Initial support for adding xxeval fusion support.
 
 gcc/
 
-   * config/rs6000/fusion.md (fuse_vandc_xor_noxxeval): Rename from
-   fuse_vandc_xor, and restrict the case to non-xxeval support.
-   (fuse_vandc_vxor_xxeval): New insn.
-   (fuse_vxor_vxor_noxxeval): Rename from fuse_vxor_xor, and restrict the
-   case to non-xxeval support.
-   (fuse_vxor_vxor_xxeval): New insn.
+   * config/rs6000/fusion.md: Regenerate.
+   * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
+   generate vector/vector logical fusion if XXEVAL supports the fusion.
+   * config/rs6000/predicates.md (vector_fusion_operand): New predicate.
* config/rs6000/rs6000.cc (rs6000_opt_vars): Add -mxxeval.
+   * config/rs6000/rs6000.md (isa attribute): Add xxeval.
+   (enabled attribute): Add support for -mxxeval.
* config/rs6000/rs6000.opt (-mxxeval): New switch.
 
+ Branch work181-sha, patch #400 was reverted 

+
  Branch work181-sha, baseline 
 
 Add ChangeLog.sha and update REVISION.

[gcc r15-4401] i386: Fix scalar VCOMSBF16 which only compares low word

2024-10-16 Thread Kong Lingling via Gcc-cvs

https://gcc.gnu.org/g:2d8c3a26dca8912147e34e3a496297138c9261d8

commit r15-4401-g2d8c3a26dca8912147e34e3a496297138c9261d8
Author: Lingling Kong 
Date:   Thu Oct 17 10:42:44 2024 +0800

i386: Fix scalar VCOMSBF16 which only compares low word

gcc/ChangeLog:

* config/i386/sse.md (avx10_2_comsbf16_v8bf): Fixed scalar
operands.

Diff:
---
 gcc/config/i386/sse.md | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 59b826cba015..685bce3094ab 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -32366,8 +32366,12 @@
 (define_insn "avx10_2_comsbf16_v8bf"
   [(set (reg:CCFP FLAGS_REG)
(unspec:CCFP
- [(match_operand:V8BF 0 "register_operand" "v")
-  (match_operand:V8BF 1 "nonimmediate_operand" "vm")]
+ [(vec_select:BF
+(match_operand:V8BF 0 "register_operand" "v")
+(parallel [(const_int 0)]))
+  (vec_select:BF
+(match_operand:V8BF 1 "nonimmediate_operand" "vm")
+(parallel [(const_int 0)]))]
 UNSPEC_VCOMSBF16))]
   "TARGET_AVX10_2_256"
   "vcomsbf16\t{%1, %0|%0, %1}"

[gcc r15-4394] arm: [MVE intrinsics] fix vdup iterator

2024-10-16 Thread Christophe Lyon via Gcc-cvs

https://gcc.gnu.org/g:79dae32843854dacfff22f059a71b5a657d7c96f

commit r15-4394-g79dae32843854dacfff22f059a71b5a657d7c96f
Author: Christophe Lyon 
Date:   Mon Jul 8 14:56:16 2024 +0200

arm: [MVE intrinsics] fix vdup iterator

This patch fixes a bug where the mode iterator for mve_vdup
should be MVE_VLD_ST instead of MVE_vecs: V2DI and V2DF (thus vdup.64)
are not supported by MVE.

2024-07-02  Jolen Li  
Christophe Lyon  

gcc/
* config/arm/mve.md (mve_vdup): Fix mode iterator.

Diff:
---
 gcc/config/arm/mve.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 706a45c7d665..3f01bc1f4fc7 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -95,8 +95,8 @@
(set_attr "neg_pool_range" "*,*,*,*,996,*,*,*")])
 
 (define_insn "mve_vdup"
-  [(set (match_operand:MVE_vecs 0 "s_register_operand" "=w")
-   (vec_duplicate:MVE_vecs
+  [(set (match_operand:MVE_VLD_ST 0 "s_register_operand" "=w")
+   (vec_duplicate:MVE_VLD_ST
  (match_operand: 1 "s_register_operand" "r")))]
   "TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT"
   "vdup.\t%q0, %1"

[gcc r15-4395] arm: [MVE intrinsics] Improve vdupq_n implementation

2024-10-16 Thread Christophe Lyon via Gcc-cvs

https://gcc.gnu.org/g:74caf97572d84c7c4503d10773e0f8e8544c50d9

commit r15-4395-g74caf97572d84c7c4503d10773e0f8e8544c50d9
Author: Christophe Lyon 
Date:   Tue Jun 25 15:47:23 2024 +0200

arm: [MVE intrinsics] Improve vdupq_n implementation

This patch makes the non-predicated vdupq_n MVE intrinsics use
vec_duplicate rather than an unspec.  This enables the compiler to
generate better code sequences (for instance using vmov when
possible).

The patch renames the existing mve_vdup pattern into
@mve_vdupq_n, and removes the now useless
@mve_q_n_f and @mve_q_n_ ones.

As a side-effect, it needs to update the mve_unpredicated_insn
predicates in @mve_q_m_n_ and
@mve_q_m_n_f.

Using vec_duplicate means the compiler is now able to use vmov in the
tests with an immediate argument in vdupq_n_[su]{8,16,32}.c:
vmov.i8 q0,#0x1

However, this is only possible when the immediate has a suitable value
(MVE encoding constraints, see imm_for_neon_mov_operand predicate).

Provided we adjust the cost computations in arm_rtx_costs_internal(),
when the immediate does not meet the vmov constraints, we now generate:
mov r0, #imm
vdup.xx q0,r0

or
ldr r0, .L4
vdup.32 q0,r0
in the f32 case (with 1.1 as immediate).

Without the cost adjustment, we would generate:
vldr.64 d0, .L4
vldr.64 d1, .L4+8
and an associated literal pool entry.

Regarding the testsuite updates:

* The signed versions of vdupq_* tests lack a version with an
immediate argument.  This patch adds them, similar to what we already
have for vdupq_n_u*.c tests.

* Code generation for different immediate values is checked with the
new tests this patch introduces.  Note there's no need for s8/u8 tests
because 8-bit immediates always comply with imm_for_neon_mov_operand.

* We can remove xfail from vcmp*f tests since we now generate:
movw r3, #15462
vcmp.f16 eq, q0, r3
instead of the previous:
vldr.64 d6, .L5
vldr.64 d7, .L5+8
vcmp.f16 eq, q0, q3

Tested on arm-linux-gnueabihf and arm-none-eabi with no regression.

2024-07-02  Jolen Li  
Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (vdupq_impl): New class.
(vdupq): Use new implementation.
* config/arm/arm.cc (arm_rtx_costs_internal): Handle HFmode
for COST_DOUBLE. Update costing for CONST_VECTOR.
* config/arm/arm_mve_builtins.def: Merge vdupq_n_f, vdupq_n_s
and vdupq_n_u into vdupq_n.
* config/arm/mve.md (mve_vdup): Rename into ...
(@mve_vdup_n): ... this.
(@mve_q_n_f): Delete.
(@mve_q_n_): Delete.
(@mve_q_m_n_): Update mve_unpredicated_insn
attribute.
(@mve_q_m_n_f): Likewise.

gcc/testsuite/
* gcc.target/arm/mve/intrinsics/vdupq_n_u8.c (foo1): Update
expected code.
* gcc.target/arm/mve/intrinsics/vdupq_n_u16.c (foo1): Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_n_u32.c (foo1): Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_n_s8.c: Add test with
immediate argument.
* gcc.target/arm/mve/intrinsics/vdupq_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_n_f16.c (foo1): Update
expected code.
* gcc.target/arm/mve/intrinsics/vdupq_n_f32.c (foo1): Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_m_n_s16.c: Add test with
immediate argument.
* gcc.target/arm/mve/intrinsics/vdupq_m_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_m_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_x_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_x_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_x_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_n_f32-2.c: New test.
* gcc.target/arm/mve/intrinsics/vdupq_n_s16-2.c: New test.
* gcc.target/arm/mve/intrinsics/vdupq_n_s32-2.c: New test.
* gcc.target/arm/mve/intrinsics/vdupq_n_u16-2.c: New test.
* gcc.target/arm/mve/intrinsics/vdupq_n_u32-2.c: New test.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c: Remove xfail.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgtq

[gcc r15-4396] tree-object-size: use size_for_offset in more cases

2024-10-16 Thread Siddhesh Poyarekar via Gcc-cvs

https://gcc.gnu.org/g:72ae35bbc90fea3bb0084187896b783c1451fd22

commit r15-4396-g72ae35bbc90fea3bb0084187896b783c1451fd22
Author: Siddhesh Poyarekar 
Date:   Tue Sep 17 18:32:52 2024 -0400

tree-object-size: use size_for_offset in more cases

When wholesize != size, there is a reasonable opportunity for static
object sizes also to be computed using size_for_offset, so use that.

gcc/ChangeLog:

* tree-object-size.cc (plus_stmt_object_size): Call
SIZE_FOR_OFFSET for some negative offset cases.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-object-size-3.c (test9): Adjust test.
* gcc.dg/builtin-object-size-4.c (test8): Likewise.

Signed-off-by: Siddhesh Poyarekar 

Diff:
---
 gcc/testsuite/gcc.dg/builtin-object-size-3.c | 6 +++---
 gcc/testsuite/gcc.dg/builtin-object-size-4.c | 6 +++---
 gcc/tree-object-size.cc  | 1 +
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-3.c b/gcc/testsuite/gcc.dg/builtin-object-size-3.c
index 3f58da3d500c..ec2c62c96401 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-3.c
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-3.c
@@ -574,7 +574,7 @@ test9 (unsigned cond)
   if (__builtin_object_size (&p[-4], 2) != (cond ? 6 : 10))
 FAIL ();
 #else
-  if (__builtin_object_size (&p[-4], 2) != 0)
+  if (__builtin_object_size (&p[-4], 2) != 6)
 FAIL ();
 #endif
 
@@ -585,7 +585,7 @@ test9 (unsigned cond)
   if (__builtin_object_size (p, 2) != ((cond ? 2 : 6) + cond))
 FAIL ();
 #else
-  if (__builtin_object_size (p, 2) != 0)
+  if (__builtin_object_size (p, 2) != 2)
 FAIL ();
 #endif
 
@@ -598,7 +598,7 @@ test9 (unsigned cond)
   != sizeof (y) - __builtin_offsetof (struct A, c) - 8 + cond)
 FAIL ();
 #else
-  if (__builtin_object_size (p, 2) != 0)
+  if (__builtin_object_size (p, 2) != sizeof (y) - __builtin_offsetof (struct A, c) - 8)
 FAIL ();
 #endif
 }
diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-4.c b/gcc/testsuite/gcc.dg/builtin-object-size-4.c
index b3eb36efb744..7bcd24c41507 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-4.c
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-4.c
@@ -482,7 +482,7 @@ test8 (unsigned cond)
   if (__builtin_object_size (&p[-4], 3) != (cond ? 6 : 10))
 FAIL ();
 #else
-  if (__builtin_object_size (&p[-4], 3) != 0)
+  if (__builtin_object_size (&p[-4], 3) != 6)
 FAIL ();
 #endif
 
@@ -493,7 +493,7 @@ test8 (unsigned cond)
   if (__builtin_object_size (p, 3) != ((cond ? 2 : 6) + cond))
 FAIL ();
 #else
-  if (__builtin_object_size (p, 3) != 0)
+  if (__builtin_object_size (p, 3) != 2)
 FAIL ();
 #endif
 
@@ -505,7 +505,7 @@ test8 (unsigned cond)
   if (__builtin_object_size (p, 3) != sizeof (y.c) - 8 + cond)
 FAIL ();
 #else
-  if (__builtin_object_size (p, 3) != 0)
+  if (__builtin_object_size (p, 3) != sizeof (y.c) - 8)
 FAIL ();
 #endif
 }
diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
index 6544730e1539..78faae7ad0d3 100644
--- a/gcc/tree-object-size.cc
+++ b/gcc/tree-object-size.cc
@@ -1527,6 +1527,7 @@ plus_stmt_object_size (struct object_size_info *osi, tree var, gimple *stmt)
   if (size_unknown_p (bytes, 0))
;
   else if ((object_size_type & OST_DYNAMIC)
+  || bytes != wholesize
   || compare_tree_int (op1, offset_limit) <= 0)
bytes = size_for_offset (bytes, op1, wholesize);
   /* In the static case, with a negative offset, the best estimate for

[gcc(refs/users/meissner/heads/work181-sha)] Revert changes

2024-10-16 Thread Michael Meissner via Gcc-cvs

https://gcc.gnu.org/g:96ba66508eef23597c4e016758fc1fcc55b4c3f9

commit 96ba66508eef23597c4e016758fc1fcc55b4c3f9
Author: Michael Meissner 
Date:   Wed Oct 16 17:51:38 2024 -0400

Revert changes

Diff:
---
 gcc/config/rs6000/fusion.md  | 44 
 gcc/config/rs6000/rs6000.cc  |  3 ---
 gcc/config/rs6000/rs6000.opt |  4 
 3 files changed, 4 insertions(+), 47 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index 5332c84681fa..4ed9ae1d69f4 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -2896,13 +2896,13 @@
 
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vandc -> vxor
-(define_insn "*fuse_vandc_vxor_noxxeval"
+(define_insn "*fuse_vandc_vxor"
   [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
 (xor:VM (and:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v"))
   (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))
  (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
(clobber (match_scratch:VM 4 "=X,X,X,&v"))]
-  "(TARGET_P10_FUSION && (!TARGET_XXEVAL || !TARGET_PREFIXED))"
+  "(TARGET_P10_FUSION)"
   "@
vandc %3,%1,%0\;vxor %3,%3,%2
vandc %3,%1,%0\;vxor %3,%3,%2
@@ -2912,24 +2912,6 @@
(set_attr "cost" "6")
(set_attr "length" "8")])
 
-(define_insn "*fuse_vandc_vxor_xxeval"
-  [(set (match_operand:VM 3 "vsx_register_operand" "=&0,&1,&v,v,wa")
-(xor:VM (and:VM (not:VM (match_operand:VM 0 "vsx_register_operand" "v,v,v,v,wa"))
-  (match_operand:VM 1 "vsx_register_operand" "v,v,v,v,wa"))
- (match_operand:VM 2 "vsx_register_operand" "v,v,v,v,wa")))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v,X"))]
-  "(TARGET_P10_FUSION && TARGET_XXEVAL && TARGET_PREFIXED)"
-  "@
-   vandc %3,%1,%0\;vxor %3,%3,%2
-   vandc %3,%1,%0\;vxor %3,%3,%2
-   vandc %3,%1,%0\;vxor %3,%3,%2
-   vandc %4,%1,%0\;vxor %3,%4,%2
-   xxeval %w3,%w0,%w1,%w2,45\t\t# fuse xxlxor (%w0, xxlandc (%x1, %x2))"
-  [(set_attr "type" "fused_vector")
-   (set_attr "cost" "6")
-   (set_attr "length" "8")
-   (set_attr "prefixed" "*,*,*,*,yes")])
-
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector veqv -> vxor
 (define_insn "*fuse_veqv_vxor"
@@ -3022,13 +3004,13 @@
 
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vxor -> vxor
-(define_insn "*fuse_vxor_vxor_noxxeval"
+(define_insn "*fuse_vxor_vxor"
   [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
 (xor:VM (xor:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")
   (match_operand:VM 1 "altivec_register_operand" "%v,v,v,v"))
  (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
(clobber (match_scratch:VM 4 "=X,X,X,&v"))]
-  "(TARGET_P10_FUSION && (!TARGET_XXEVAL || !TARGET_PREFIXED))"
+  "(TARGET_P10_FUSION)"
   "@
vxor %3,%1,%0\;vxor %3,%3,%2
vxor %3,%1,%0\;vxor %3,%3,%2
@@ -3038,24 +3020,6 @@
(set_attr "cost" "6")
(set_attr "length" "8")])
 
-(define_insn "*fuse_vxor_vxor_xxeval"
-  [(set (match_operand:VM 3 "vsx_register_operand" "=&0,&1,&v,v,wa")
-(xor:VM (xor:VM (match_operand:VM 0 "vsx_register_operand" "v,v,v,v,wa")
-  (match_operand:VM 1 "vsx_register_operand" "%v,v,v,v,wa"))
- (match_operand:VM 2 "vsx_register_operand" "v,v,v,v,wa")))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v,X"))]
-  "(TARGET_P10_FUSION && TARGET_XXEVAL && TARGET_PREFIXED)"
-  "@
-   vxor %3,%1,%0\;vxor %3,%3,%2
-   vxor %3,%1,%0\;vxor %3,%3,%2
-   vxor %3,%1,%0\;vxor %3,%3,%2
-   vxor %4,%1,%0\;vxor %3,%4,%2
-   xxeval %x3,%x0,%x1,%x2,105\t\t# fuse xxlxor (%x0, xxlxor (%x1,%x2))"
-  [(set_attr "type" "fused_vector")
-   (set_attr "cost" "6")
-   (set_attr "length" "8")
-   (set_attr "prefixed" "*,*,*,*,yes")])
-
 ;; add-add fusion pattern generated by gen_addadd
 (define_insn "*fuse_add_add"
   [(set (match_operand:GPR 3 "gpc_reg_operand" "=&0,&1,&r,r")
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 072556b7fd7a..aa67e7256bb9 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -24668,9 +24668,6 @@ static struct rs6000_opt_var const rs6000_opt_vars[] =
   { "speculate-indirect-jumps",
 offsetof (struct gcc_options, x_rs6000_speculate_indirect_jumps),
 offsetof (struct cl_target_option, x_rs6000_speculate_indirect_jumps), },
-  { "xxeval",
-offsetof (struct gcc_options, x_TARGET_XXEVAL),
-offsetof (struct cl_target_option, x_TARGET_XXEVAL), },
 };
 
 /* Inner function to handle attribute((target("..."))) and #pragma GCC target
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 127befdcd56b..0d71dbaf2fc1 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -631,10 +631,6 @@ mieee128-consta

[gcc r15-4392] diagnostics: eliminate m_ice_handler_cb [PR116613]

2024-10-16 Thread David Malcolm via Gcc-cvs

https://gcc.gnu.org/g:d826b6389d9605944ce2261c07d2c9515992bccf

commit r15-4392-gd826b6389d9605944ce2261c07d2c9515992bccf
Author: David Malcolm 
Date:   Wed Oct 16 13:10:07 2024 -0400

diagnostics: eliminate m_ice_handler_cb [PR116613]

No functional change intended.

gcc/ChangeLog:
PR other/116613
* diagnostic-format-sarif.cc
(sarif_builder::on_report_diagnostic): Move the fnotice here from
sarif_ice_handler.
(sarif_ice_handler): Delete.
(diagnostic_output_format_init_sarif): Drop setting of ice handler
callback.
* diagnostic.cc (diagnostic_context::initialize): Likewise.
(diagnostic_context::action_after_output): Rather than call
m_ice_handler_cb, instead call finish on this context.
* diagnostic.h (ice_handler_callback_t): Delete typedef.
(diagnostic_context::set_ice_handler_callback): Delete.
(diagnostic_context::m_ice_handler_cb): Delete.

gcc/testsuite/ChangeLog:
PR other/116613
* gcc.dg/plugin/diagnostic_plugin_xhtml_format.c: Update for
removal of ICE callback.

Signed-off-by: David Malcolm 

Diff:
---
 gcc/diagnostic-format-sarif.cc | 28 ++--
 gcc/diagnostic.cc  | 16 +---
 gcc/diagnostic.h   |  8 --
 .../gcc.dg/plugin/diagnostic_plugin_xhtml_format.c | 30 +++---
 4 files changed, 24 insertions(+), 58 deletions(-)

diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc
index 70832513b6d9..0ab2b83bff9a 100644
--- a/gcc/diagnostic-format-sarif.cc
+++ b/gcc/diagnostic-format-sarif.cc
@@ -1539,6 +1539,14 @@ sarif_builder::on_report_diagnostic (const diagnostic_info &diagnostic,
   if (diagnostic.kind == DK_ICE || diagnostic.kind == DK_ICE_NOBT)
 {
   m_invocation_obj->add_notification_for_ice (diagnostic, *this);
+
+  /* Print a header for the remaining output to stderr, and
+return, attempting to print the usual ICE messages to
+stderr.  Hopefully this will be helpful to the user in
+indicating what's gone wrong (also for DejaGnu, for pruning
+those messages).   */
+  fnotice (stderr, "Internal compiler error:\n");
+
   return;
 }
 
@@ -3138,23 +3146,6 @@ sarif_builder::make_artifact_content_object (const char *text) const
   return content_obj;
 }
 
-/* Callback for diagnostic_context::ice_handler_cb for when an ICE
-   occurs.  */
-
-static void
-sarif_ice_handler (diagnostic_context *context)
-{
-  /* Attempt to ensure that a .sarif file is written out.  */
-  diagnostic_finish (context);
-
-  /* Print a header for the remaining output to stderr, and
- return, attempting to print the usual ICE messages to
- stderr.  Hopefully this will be helpful to the user in
- indicating what's gone wrong (also for DejaGnu, for pruning
- those messages).   */
-  fnotice (stderr, "Internal compiler error:\n");
-}
-
 class sarif_output_format : public diagnostic_output_format
 {
 public:
@@ -3387,9 +3378,6 @@ diagnostic_output_format_init_sarif (diagnostic_context &context,
   /* Suppress normal textual path output.  */
   context.set_path_format (DPF_NONE);
 
-  /* Override callbacks.  */
-  context.set_ice_handler_callback (sarif_ice_handler);
-
   /* Don't colorize the text.  */
   pp_show_color (fmt->get_printer ()) = false;
   context.set_show_highlight_colors (false);
diff --git a/gcc/diagnostic.cc b/gcc/diagnostic.cc
index 5e092cc2d475..9793df6467a7 100644
--- a/gcc/diagnostic.cc
+++ b/gcc/diagnostic.cc
@@ -284,7 +284,6 @@ diagnostic_context::initialize (int n_opts)
   m_diagnostic_groups.m_emission_count = 0;
   m_output_format = new diagnostic_text_output_format (*this);
   m_set_locations_cb = nullptr;
-  m_ice_handler_cb = nullptr;
   m_client_data_hooks = nullptr;
   m_diagrams.m_theme = nullptr;
   m_original_argv = nullptr;
@@ -782,16 +781,15 @@ diagnostic_context::action_after_output (diagnostic_t diag_kind)
 case DK_ICE:
 case DK_ICE_NOBT:
   {
-   /* Optional callback for attempting to handle ICEs gracefully.  */
-   if (void (*ice_handler_cb) (diagnostic_context *) = m_ice_handler_cb)
+   /* Attempt to ensure that any outputs are flushed e.g. that .sarif
+  files are written out.
+  Only do it once.  */
+   static bool finishing_due_to_ice = false;
+   if (!finishing_due_to_ice)
  {
-   /* Clear the callback, to avoid potentially re-entering
-  the routine if there's a crash within the handler.  */
-   m_ice_handler_cb = NULL;
-   ice_handler_cb (this);
+   finishing_due_to_ice = true;
+   finish ();
  }
-   /* The context might have had diagnostic_finish called on
-  it at this point.  */
 
struct backtrace_state *st
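
The once-only guard in the hunk above can be looked at in isolation. The
following is a minimal sketch in plain C, assuming a hypothetical finish
step that itself fails and re-enters the ICE path; the names on_ice,
hypothetical_finish and finish_calls are invented for illustration and are
not part of diagnostic.cc.

```c
#include <assert.h>
#include <stdbool.h>

/* Counts how often the hypothetical finish step actually ran.  */
int finish_calls;

void on_ice (void);

/* Stand-in for flushing outputs.  It re-enters on_ice to simulate a
   crash that happens while finishing.  */
void
hypothetical_finish (void)
{
  finish_calls++;
  on_ice ();
}

/* Run the finish step at most once, even if it is re-entered.  */
void
on_ice (void)
{
  static bool finishing_due_to_ice = false;
  if (!finishing_due_to_ice)
    {
      /* Set the flag before calling, so a nested call skips the body.  */
      finishing_due_to_ice = true;
      hypothetical_finish ();
    }
}
```

Calling on_ice any number of times leaves finish_calls at 1. Setting the
flag before the call, rather than after it, is what stops the nested call
from recursing.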

[gcc r15-4393] diagnostics: capture backtraces in SARIF notifications [PR116602]

2024-10-16 Thread David Malcolm via Gcc-cvs

https://gcc.gnu.org/g:69b2d523b1069651053cd39dc9b4810a2c7f964a

commit r15-4393-g69b2d523b1069651053cd39dc9b4810a2c7f964a
Author: David Malcolm 
Date:   Wed Oct 16 13:10:11 2024 -0400

diagnostics: capture backtraces in SARIF notifications [PR116602]

This patch makes the SARIF output's crash handler attempt to capture
a backtrace in JSON form within the notification's property bag.  The
precise format of the property is subject to change, but, for example,
in one of the test cases I got output like this:

"properties": {"gcc/backtrace": {"frames": [{"pc": "0x7f39c610a32d",
 "function": "pass_crash_test::execute(function*)",
 "filename": "/home/david/gcc-newgit/src/gcc/testsuite/gcc.dg/plugin/crash_test_plugin.c",
 "lineno": 98}]}}}],

The backtrace code is based on that in diagnostic.cc.
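
As a rough illustration of the frame shape shown above, here is a
self-contained sketch in plain C that formats one frame as JSON text. It
is not how diagnostic-format-sarif.cc builds the object: format_frame is a
hypothetical helper, it does no JSON string escaping, and the property
format itself is described above as subject to change.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one backtrace frame into BUF (of LEN bytes) using the same keys
   as the sample output.  Returns what snprintf returns.  The strings are
   assumed not to need JSON escaping.  */
int
format_frame (char *buf, size_t len, unsigned long long pc,
              const char *function, const char *filename, int lineno)
{
  return snprintf (buf, len,
                   "{\"pc\": \"0x%llx\", \"function\": \"%s\", "
                   "\"filename\": \"%s\", \"lineno\": %d}",
                   pc, function, filename, lineno);
}
```

A real implementation would go through a JSON library so that quotes and
backslashes in function or file names are escaped properly.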

gcc/ChangeLog:
PR other/116602
* diagnostic-format-sarif.cc: Include "demangle.h" and
"backtrace.h".
(sarif_invocation::add_notification_for_ice): Add "backtrace"
param and pass it to ctor.
(sarif_ice_notification::sarif_ice_notification): Add "backtrace"
param and add it to property bag.
(bt_stop): New, taken from diagnostic.cc.
(struct bt_closure): New.
(bt_callback): New, adapted from diagnostic.cc.
(sarif_builder::make_stack_from_backtrace): New.
(sarif_builder::on_report_diagnostic): Attempt to get backtrace
and pass it to add_notification_for_ice.

gcc/testsuite/ChangeLog:
PR other/116602
* gcc.dg/plugin/crash-test-ice-in-header-sarif-2_1.py: Add check
for backtrace.
* gcc.dg/plugin/crash-test-ice-in-header-sarif-2_2.py: Likewise.

Signed-off-by: David Malcolm 

Diff:
---
 gcc/diagnostic-format-sarif.cc | 155 -
 .../plugin/crash-test-ice-in-header-sarif-2_1.py   |   7 +
 .../plugin/crash-test-ice-in-header-sarif-2_2.py   |   7 +
 3 files changed, 163 insertions(+), 6 deletions(-)

diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc
index 0ab2b83bff9a..89ac9a5424c9 100644
--- a/gcc/diagnostic-format-sarif.cc
+++ b/gcc/diagnostic-format-sarif.cc
@@ -47,6 +47,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "text-range-label.h"
 #include "pretty-print-format-impl.h"
 #include "pretty-print-urlifier.h"
+#include "demangle.h"
+#include "backtrace.h"
 
 /* Forward decls.  */
 class sarif_builder;
@@ -167,7 +169,8 @@ public:
const char * const *original_argv);
 
   void add_notification_for_ice (const diagnostic_info &diagnostic,
-sarif_builder &builder);
+sarif_builder &builder,
+std::unique_ptr backtrace);
   void prepare_to_flush (sarif_builder &builder);
 
 private:
@@ -578,7 +581,8 @@ class sarif_ice_notification : public sarif_location_manager
 {
 public:
   sarif_ice_notification (const diagnostic_info &diagnostic,
- sarif_builder &builder);
+ sarif_builder &builder,
+ std::unique_ptr backtrace);
 
   void
   add_related_location (std::unique_ptr location_obj,
@@ -801,6 +805,9 @@ private:
   make_artifact_content_object (const char *text) const;
   int get_sarif_column (expanded_location exploc) const;
 
+  std::unique_ptr
+  make_stack_from_backtrace ();
+
   diagnostic_context &m_context;
   pretty_printer *m_printer;
   const line_maps *m_line_maps;
@@ -885,12 +892,15 @@ sarif_invocation::sarif_invocation (sarif_builder &builder,
 
 void
 sarif_invocation::add_notification_for_ice (const diagnostic_info &diagnostic,
-   sarif_builder &builder)
+   sarif_builder &builder,
+   std::unique_ptr backtrace)
 {
   m_success = false;
 
   auto notification
-= ::make_unique (diagnostic, builder);
+= ::make_unique (diagnostic,
+builder,
+std::move (backtrace));
 
   /* Support for related locations within a notification was added
  in SARIF 2.2; see https://github.com/oasis-tcs/sarif-spec/issues/540  */
@@ -1310,7 +1320,8 @@ sarif_location::lazily_add_relationships_array ()
 
 sarif_ice_notification::
 sarif_ice_notification (const diagnostic_info &diagnostic,
-   sarif_builder &builder)
+   sarif_builder &builder,
+   std::unique_ptr backtrace)
 {
   /* "locations" property (SARIF v2.1.0 section 3.58.4).  */
   auto locations_arr
@@ -1327,6 +1338,13 @@ sarif_ice_notificatio

[gcc r15-4400] Don't lower vpcmpu to pcmpgt since the latter is for signed comparison.

2024-10-16 Thread hongtao Liu via Gcc-cvs

https://gcc.gnu.org/g:21e2cd65add9070292313f8e12e8731d0aa2c869

commit r15-4400-g21e2cd65add9070292313f8e12e8731d0aa2c869
Author: liuhongt 
Date:   Tue Oct 8 16:18:31 2024 +0800

Don't lower vpcmpu to pcmpgt since the latter is for signed comparison.

r15-1737-gb06a108f0fbffe lowered AVX512 kmask comparisons to AVX2 ones,
but wrongly lowered unsigned comparisons to signed ones.  For unsigned
comparisons, only EQ/NEQ can be lowered.

This commit fixes that.
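
A minimal sketch in plain C of why only EQ/NEQ are safe: on a single
32-bit lane, equality does not depend on signedness, while greater-than
does. The lane_* helper names are hypothetical, and the signed helpers
rely on GCC's wrapping conversion from uint32_t to int32_t.

```c
#include <assert.h>
#include <stdint.h>

/* Signed greater-than on one lane, the kind of compare pcmpgt does.  */
int
lane_gt_signed (uint32_t a, uint32_t b)
{
  return (int32_t) a > (int32_t) b;
}

/* Unsigned greater-than on one lane, the kind of compare vpcmpu does.  */
int
lane_gt_unsigned (uint32_t a, uint32_t b)
{
  return a > b;
}

/* Equality after reinterpreting both lanes as signed.  */
int
lane_eq_signed (uint32_t a, uint32_t b)
{
  return (int32_t) a == (int32_t) b;
}

/* Equality on the raw unsigned lanes.  */
int
lane_eq_unsigned (uint32_t a, uint32_t b)
{
  return a == b;
}
```

For a = 0xffffffff and b = 1 the unsigned compare says a > b, but the
signed one sees -1 > 1 and says no. The two equality helpers agree for
every input, which is why EQ and NEQ can still be lowered.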

gcc/ChangeLog:

PR target/116940
* config/i386/sse.md (*avx2_pcmp3_7): Change
UNSPEC_PCMP_ITER to UNSPEC_PCMP.
(*avx2_pcmp3_8): New pre_reload
define_insn_and_splitter.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr116940.c: New test.

Diff:
---
 gcc/config/i386/sse.md   | 27 ++-
 gcc/testsuite/gcc.target/i386/pr116940.c | 28 
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d8a05e223b30..59b826cba015 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -18144,7 +18144,7 @@
[(match_operand:VI_128_256 3 "nonimmediate_operand")
 (match_operand:VI_128_256 4 "nonimmediate_operand")
 (match_operand:SI 5 "const_0_to_7_operand")]
-UNSPEC_PCMP_ITER)))]
+UNSPEC_PCMP)))]
   "TARGET_AVX512VL && ix86_pre_reload_split ()
  /* NE is commutative.  */
&& (INTVAL (operands[5]) == 4
@@ -18167,6 +18167,31 @@
   DONE;
 })
 
+(define_insn_and_split "*avx2_pcmp3_8"
+ [(set (match_operand:VI_128_256  0 "register_operand")
+   (vec_merge:VI_128_256
+ (match_operand:VI_128_256 1 "const0_operand")
+ (match_operand:VI_128_256 2 "vector_all_ones_operand")
+ (unspec:
+   [(match_operand:VI_128_256 3 "nonimmediate_operand")
+(match_operand:VI_128_256 4 "nonimmediate_operand")
+(match_operand:SI 5 "const_0_to_7_operand")]
+UNSPEC_UNSIGNED_PCMP)))]
+  "TARGET_AVX512VL && ix86_pre_reload_split ()
+ /* NE is commutative.  */
+   && INTVAL (operands[5]) == 4"
+
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  if (MEM_P (operands[3]))
+operands[3] = force_reg (mode, operands[3]);
+  emit_move_insn (operands[0], gen_rtx_fmt_ee (EQ, mode,
+  operands[3], operands[4]));
+  DONE;
+})
+
 (define_expand "_eq3"
   [(set (match_operand: 0 "register_operand")
(unspec:
diff --git a/gcc/testsuite/gcc.target/i386/pr116940.c b/gcc/testsuite/gcc.target/i386/pr116940.c
new file mode 100644
index ..721596bb8bf3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr116940.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512vl" } */
+/* { dg-require-effective-target avx512vl } */
+
+#define AVX512VL
+#include "avx512f-helper.h"
+
+typedef __attribute__((__vector_size__ (16))) unsigned V;
+
+short s;
+
+V
+foo ()
+{
+  return ~(-(V){ 0, 0, 0, 1 } <= s);
+}
+
+void
+test_128 ()
+{
+  V x = foo ();
+  if (x[0] != 0 || x[1] != 0 || x[2] != 0 || x[3] != 0x)
+__builtin_abort();
+}
+
+void
+test_256 ()
+{}

[gcc r15-4399] Canonicalize (vec_merge (fma: op2 op1 op3) (match_dup 1)) mask) to (vec_merge (fma: op1 op2 op3) (ma

2024-10-16 Thread hongtao Liu via Gcc-cvs

https://gcc.gnu.org/g:edf4db8355dead3413bad64f6a89bae82dabd0ad

commit r15-4399-gedf4db8355dead3413bad64f6a89bae82dabd0ad
Author: liuhongt 
Date:   Mon Oct 14 13:09:59 2024 +0800

Canonicalize (vec_merge (fma: op2 op1 op3) (match_dup 1)) mask) to (vec_merge (fma: op1 op2 op3) (match_dup 1)) mask)

For masked FMA, there are 2 forms of RTL representation:
1) (vec_merge (fma: op2 op1 op3) op1 mask)
2) (vec_merge (fma: op1 op2 op3) op1 mask)
This is because op1 and op2 are commutative in RTL (the second op1 is
written as (match_dup 1)).

We once tried to replace (match_dup 1)
with (match_operand:VFH_AVX512VL 5 "nonimmediate_operand" "0,0"), but
that triggered an ICE in reload (reload can handle at most one operand
with a "0" constraint).

So this patch does the canonicalization for the backend part.
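
To make the equivalence of the two forms concrete, here is a small sketch
in plain C that models a 4-lane masked multiply-add with the merge source
being op1. It is only an illustration: integer lanes stand in for the
floating-point vector modes, and the names masked_fma_form1,
masked_fma_form2 and forms_agree are hypothetical.

```c
#include <assert.h>

#define LANES 4

/* Form 1: (vec_merge (fma op2 op1 op3) op1 mask).  */
void
masked_fma_form1 (const int *op1, const int *op2, const int *op3,
                  unsigned mask, int *out)
{
  for (int i = 0; i < LANES; i++)
    out[i] = ((mask >> i) & 1) ? op2[i] * op1[i] + op3[i] : op1[i];
}

/* Form 2: (vec_merge (fma op1 op2 op3) op1 mask).  */
void
masked_fma_form2 (const int *op1, const int *op2, const int *op3,
                  unsigned mask, int *out)
{
  for (int i = 0; i < LANES; i++)
    out[i] = ((mask >> i) & 1) ? op1[i] * op2[i] + op3[i] : op1[i];
}

/* Return 1 if both forms produce the same lanes for these inputs.  */
int
forms_agree (const int *op1, const int *op2, const int *op3, unsigned mask)
{
  int r1[LANES], r2[LANES];
  masked_fma_form1 (op1, op2, op3, mask, r1);
  masked_fma_form2 (op1, op2, op3, mask, r2);
  for (int i = 0; i < LANES; i++)
    if (r1[i] != r2[i])
      return 0;
  return 1;
}
```

Since the multiplicands commute, forms_agree returns 1 for any inputs, so
one pattern per instruction is enough once the operand order is made
canonical.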

gcc/ChangeLog:

PR target/117072
* config/i386/sse.md (_fmadd__mask):
Relax predicates of fma operands from register_operand to
nonimmediate_operand.
(_fmadd__mask3): Ditto.
(_fmsub__mask): Ditto.
(_fmsub__mask3): Ditto.
(_fnmadd__mask): Ditto.
(_fnmadd__mask3): Ditto.
(_fnmsub__mask): Ditto.
(_fnmsub__mask3): Ditto.
(_fmaddsub__mask3): Ditto.
(_fmsubadd__mask): Ditto.
(_fmsubadd__mask3): Ditto.
(avx512f_vmfmadd__mask): Ditto.
(avx512f_vmfmadd__mask3): Ditto.
(avx512f_vmfmadd__maskz_1): Ditto.
(*avx512f_vmfmsub__mask): Ditto.
(avx512f_vmfmsub__mask3): Ditto.
(*avx512f_vmfmsub__maskz_1): Ditto.
(avx512f_vmfnmadd__mask): Ditto.
(avx512f_vmfnmadd__mask3): Ditto.
(avx512f_vmfnmadd__maskz_1): Ditto.
(*avx512f_vmfnmsub__mask): Ditto.
(*avx512f_vmfnmsub__mask3): Ditto.
(*avx512f_vmfnmsub__maskz_1): Ditto.
(avx10_2_fmaddnepbf16__mask3): Ditto.
(avx10_2_fnmaddnepbf16__mask3): Ditto.
(avx10_2_fmsubnepbf16__mask3): Ditto.
(avx10_2_fnmsubnepbf16__mask3): Ditto.
(fmai_vmfmadd_): Swap operands[1] and operands[2].
(fmai_vmfmsub_): Ditto.
(fmai_vmfnmadd_): Ditto.
(fmai_vmfnmsub_): Ditto.
(*fmai_fmadd_): Swap operands[1] and operands[2] adjust
operands[1] predicates from register_operand to
nonimmediate_operand.
(*fmai_fmsub_): Ditto.
(*fmai_fnmadd_): Ditto.
(*fmai_fnmsub_): Ditto.

Diff:
---
 gcc/config/i386/sse.md | 86 +-
 1 file changed, 43 insertions(+), 43 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 7be313346677..d8a05e223b30 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -5895,7 +5895,7 @@
   [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v")
(vec_merge:VFH_AVX512VL
  (fma:VFH_AVX512VL
-   (match_operand:VFH_AVX512VL 1 "register_operand" "0,0")
+   (match_operand:VFH_AVX512VL 1 "nonimmediate_operand" "0,0")
(match_operand:VFH_AVX512VL 2 "" ",v")
(match_operand:VFH_AVX512VL 3 "" "v,"))
  (match_dup 1)
@@ -5914,7 +5914,7 @@
  (fma:VFH_AVX512VL
(match_operand:VFH_AVX512VL 1 "" "%v")
(match_operand:VFH_AVX512VL 2 "" "")
-   (match_operand:VFH_AVX512VL 3 "register_operand" "0"))
+   (match_operand:VFH_AVX512VL 3 "nonimmediate_operand" "0"))
  (match_dup 3)
  (match_operand: 4 "register_operand" "Yk")))]
   "TARGET_AVX512F && "
@@ -5999,7 +5999,7 @@
   [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v")
(vec_merge:VFH_AVX512VL
  (fma:VFH_AVX512VL
-   (match_operand:VFH_AVX512VL 1 "register_operand" "0,0")
+   (match_operand:VFH_AVX512VL 1 "nonimmediate_operand" "0,0")
(match_operand:VFH_AVX512VL 2 "" ",v")
(neg:VFH_AVX512VL
  (match_operand:VFH_AVX512VL 3 "" "v,")))
@@ -6020,7 +6020,7 @@
(match_operand:VFH_AVX512VL 1 "" "%v")
(match_operand:VFH_AVX512VL 2 "" "")
(neg:VFH_AVX512VL
- (match_operand:VFH_AVX512VL 3 "register_operand" "0")))
+ (match_operand:VFH_AVX512VL 3 "nonimmediate_operand" "0")))
  (match_dup 3)
  (match_operand: 4 "register_operand" "Yk")))]
   "TARGET_AVX512F && "
@@ -6106,7 +6106,7 @@
(vec_merge:VFH_AVX512VL
  (fma:VFH_AVX512VL
(neg:VFH_AVX512VL
- (match_operand:VFH_AVX512VL 1 "register_operand" "0,0"))
+ (match_operand:VFH_AVX512VL 1 "nonimmediate_operand" "0,0"))
(match_operand:VFH_AVX512VL 2 "" ",v")
(match_operand:VFH_AVX512VL 3 "" "v,"))
  (match_dup 1)
@@ -6126,7 +6126,7 @@
(neg:VFH_AVX512V

[gcc r15-4398] Canonicalize (vec_merge (fma op2 op1 op3) op1 mask) to (vec_merge (fma op1 op2 op3) op1 mask).

2024-10-16 Thread hongtao Liu via Gcc-cvs

https://gcc.gnu.org/g:330782a1b6cfe881ad884617ffab441aeb1c2b5c

commit r15-4398-g330782a1b6cfe881ad884617ffab441aeb1c2b5c
Author: liuhongt 
Date:   Mon Oct 14 17:16:13 2024 +0800

Canonicalize (vec_merge (fma op2 op1 op3) op1 mask) to (vec_merge (fma op1 op2 op3) op1 mask).

For x86 masked fma, there're 2 rtl representations
1) (vec_merge (fma op2 op1 op3) op1 mask)
2) (vec_merge (fma op1 op2 op3) op1 mask).

    (define_insn "_fmadd__mask"
      [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v")
            (vec_merge:VFH_AVX512VL
              (fma:VFH_AVX512VL
                (match_operand:VFH_AVX512VL 1 "nonimmediate_operand" "0,0")
                (match_operand:VFH_AVX512VL 2 "" ",v")
                (match_operand:VFH_AVX512VL 3 "" "v,"))
              (match_dup 1)
              (match_operand: 4 "register_operand" "Yk,Yk")))]
      "TARGET_AVX512F && "
      "@
       vfmadd132\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2}
       vfmadd213\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}"
      [(set_attr "type" "ssemuladd")
       (set_attr "prefix" "evex")
       (set_attr "mode" "")])

Here op1 has constraint "0", and the second op1 is (match_dup 1).
We once tried to replace it with (match_operand:M 5
"nonimmediate_operand" "0") to enable more flexibility for pattern
match and recog, but it triggered an ICE in reload (reload can handle
at most one operand with a "0" constraint).

So we need to either add 2 patterns in the backend or just do the
canonicalization in the middle-end.
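
The canonicalization itself can be sketched on a toy data structure. This
is a simplified model in plain C, not the combine.cc change: operands are
small integer ids instead of rtx values, the struct and function names are
hypothetical, and the NEG case that the patch also handles is left out.

```c
#include <assert.h>

/* Toy stand-in for (vec_merge (fma MUL1 MUL2 ADDEND) MERGE_SRC mask).
   Each field holds an operand id rather than an rtx.  */
struct masked_fma
{
  int mul1;      /* first multiplicand of the fma */
  int mul2;      /* second multiplicand of the fma */
  int addend;    /* third fma operand */
  int merge_src; /* operand kept where the mask bit is clear */
};

/* If the merge source is the second multiplicand, swap the multiplicands
   so that it becomes the first.  Returns 1 if a swap happened.  */
int
canonicalize_masked_fma (struct masked_fma *x)
{
  if (x->merge_src == x->mul2 && x->merge_src != x->mul1)
    {
      int tmp = x->mul1;
      x->mul1 = x->mul2;
      x->mul2 = tmp;
      return 1;
    }
  return 0;
}
```

Starting from { mul1 = 2, mul2 = 1, addend = 3, merge_src = 1 }, one call
swaps the multiplicands so the merge source comes first, and a second call
is a no-op, which is what one would expect of a canonical form.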

gcc/ChangeLog:

PR middle-end/117072
* combine.cc (maybe_swap_commutative_operands):
Canonicalize (vec_merge (fma op2 op1 op3) op1 mask)
to (vec_merge (fma op1 op2 op3) op1 mask).

Diff:
---
 gcc/combine.cc | 25 +
 1 file changed, 25 insertions(+)

diff --git a/gcc/combine.cc b/gcc/combine.cc
index fef06a6cdc08..3400dfebd848 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -5656,6 +5656,31 @@ maybe_swap_commutative_operands (rtx x)
   SUBST (XEXP (x, 1), temp);
 }
 
+  /* Canonicalize (vec_merge (fma op2 op1 op3) op1 mask) to
+ (vec_merge (fma op1 op2 op3) op1 mask).  */
+  if (GET_CODE (x) == VEC_MERGE
+  && GET_CODE (XEXP (x, 0)) == FMA)
+{
+  rtx fma_op1 = XEXP (XEXP (x, 0), 0);
+  rtx fma_op2 = XEXP (XEXP (x, 0), 1);
+  rtx masked_op = XEXP (x, 1);
+  if (rtx_equal_p (masked_op, fma_op2))
+   {
+ if (GET_CODE (fma_op1) == NEG)
+   {
+ /* Keep the negate canonicalized to the first operand.  */
+ fma_op1 = XEXP (fma_op1, 0);
+ SUBST (XEXP (XEXP (XEXP (x, 0), 0), 0), fma_op2);
+ SUBST (XEXP (XEXP (x, 0), 1), fma_op1);
+   }
+ else
+   {
+ SUBST (XEXP (XEXP (x, 0), 0), fma_op2);
+ SUBST (XEXP (XEXP (x, 0), 1), fma_op1);
+   }
+   }
+}
+
   unsigned n_elts = 0;
   if (GET_CODE (x) == VEC_MERGE
   && CONST_INT_P (XEXP (x, 2))
