[gcc r15-3395] SVE intrinsics: Refactor const_binop to allow constant folding of intrinsics.

2024-09-03 Thread Jennifer Schmitz via Gcc-cvs
https://gcc.gnu.org/g:87217bea3aa556779a111cec0ef45dcefd1736f6

commit r15-3395-g87217bea3aa556779a111cec0ef45dcefd1736f6
Author: Jennifer Schmitz 
Date:   Fri Aug 30 06:56:52 2024 -0700

SVE intrinsics: Refactor const_binop to allow constant folding of intrinsics.

This patch sets the stage for constant folding of binary operations for SVE
intrinsics:
In fold-const.cc, the code for folding vector constants was moved from
const_binop to a new function vector_const_binop. This function takes a
function pointer as argument specifying how to fold the vector elements.
The intention is to call vector_const_binop from the backend with an
aarch64-specific callback function.
The code in const_binop for folding operations where the first operand is a
vector constant and the second operand is an integer constant was also moved
into vector_const_binop to allow folding of binary SVE intrinsics where
the second operand is an integer (_n).
To allow calling poly_int_binop from the backend, the latter was made public.
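
As a rough illustration of the callback design described above (not the
GCC code itself), the following standalone C++ sketch folds two constant
vectors element by element through a caller-supplied function pointer.
All names here (ElementFolder, fold_vectors, fold_vector_scalar,
simple_folder) are hypothetical, and plain long values stand in for GCC
trees.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for the per-element callback: it returns no value
// when it does not know how to fold the given operation.
using ElementFolder = std::optional<long> (*) (char op, long a, long b);

// Fold two constant vectors element by element.  Give up if the lengths
// differ or if any single element cannot be folded.
std::optional<std::vector<long>>
fold_vectors (char op, const std::vector<long> &a,
              const std::vector<long> &b, ElementFolder fold_elt)
{
  assert (fold_elt != nullptr);
  if (a.size () != b.size ())
    return std::nullopt;

  std::vector<long> result;
  result.reserve (a.size ());
  for (std::size_t i = 0; i < a.size (); ++i)
    {
      std::optional<long> elt = fold_elt (op, a[i], b[i]);
      if (!elt)
        return std::nullopt;
      result.push_back (*elt);
    }
  return result;
}

// Vector-by-scalar form (comparable to the "_n" case): the scalar is
// applied to every element of the vector.
std::optional<std::vector<long>>
fold_vector_scalar (char op, const std::vector<long> &a, long b,
                    ElementFolder fold_elt)
{
  return fold_vectors (op, a, std::vector<long> (a.size (), b), fold_elt);
}

// Example callback that only knows '+' and '*'.
std::optional<long>
simple_folder (char op, long a, long b)
{
  switch (op)
    {
    case '+': return a + b;
    case '*': return a * b;
    default:  return std::nullopt;
    }
}
```

With this shape, a backend can reuse the generic element loop and only
supply its own per-element semantics, which is the role the commit
message assigns to the aarch64-specific callback.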

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz 

gcc/
* fold-const.h: Declare vector_const_binop.
* fold-const.cc (const_binop): Remove cases for vector constants.
(vector_const_binop): New function that folds vector constants
element-wise.
(int_const_binop): Remove call to wide_int_binop.
(poly_int_binop): Add call to wide_int_binop.

Diff:
---
 gcc/fold-const.cc | 189 +-
 gcc/fold-const.h  |   5 ++
 2 files changed, 105 insertions(+), 89 deletions(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 81dcc13925a7..2ada59f712bb 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -1236,13 +1236,24 @@ can_min_p (const_tree arg1, const_tree arg2, poly_wide_int &res)
produce a new constant in RES.  Return FALSE if we don't know how
to evaluate CODE at compile-time.  */
 
-static bool
+bool
 poly_int_binop (poly_wide_int &res, enum tree_code code,
const_tree arg1, const_tree arg2,
signop sign, wi::overflow_type *overflow)
 {
   gcc_assert (NUM_POLY_INT_COEFFS != 1);
   gcc_assert (poly_int_tree_p (arg1) && poly_int_tree_p (arg2));
+
+  if (TREE_CODE (arg1) == INTEGER_CST && TREE_CODE (arg2) == INTEGER_CST)
+{
+  wide_int warg1 = wi::to_wide (arg1), wi_res;
+  wide_int warg2 = wi::to_wide (arg2, TYPE_PRECISION (TREE_TYPE (arg1)));
+  if (!wide_int_binop (wi_res, code, warg1, warg2, sign, overflow))
+   return NULL_TREE;
+  res = wi_res;
+  return true;
+}
+
   switch (code)
 {
 case PLUS_EXPR:
@@ -1304,17 +1315,9 @@ int_const_binop (enum tree_code code, const_tree arg1, const_tree arg2,
   signop sign = TYPE_SIGN (type);
   wi::overflow_type overflow = wi::OVF_NONE;
 
-  if (TREE_CODE (arg1) == INTEGER_CST && TREE_CODE (arg2) == INTEGER_CST)
-{
-  wide_int warg1 = wi::to_wide (arg1), res;
-  wide_int warg2 = wi::to_wide (arg2, TYPE_PRECISION (type));
-  if (!wide_int_binop (res, code, warg1, warg2, sign, &overflow))
-   return NULL_TREE;
-  poly_res = res;
-}
-  else if (!poly_int_tree_p (arg1)
-  || !poly_int_tree_p (arg2)
-  || !poly_int_binop (poly_res, code, arg1, arg2, sign, &overflow))
+  if (!poly_int_tree_p (arg1)
+  || !poly_int_tree_p (arg2)
+  || !poly_int_binop (poly_res, code, arg1, arg2, sign, &overflow))
 return NULL_TREE;
   return force_fit_type (type, poly_res, overflowable,
 (((sign == SIGNED || overflowable == -1)
@@ -1365,6 +1368,90 @@ simplify_const_binop (tree_code code, tree op, tree other_op,
   return NULL_TREE;
 }
 
+/* If ARG1 and ARG2 are constants, and if performing CODE on them would
+   be an elementwise vector operation, try to fold the operation to a
+   constant vector, using ELT_CONST_BINOP to fold each element.  Return
+   the folded value on success, otherwise return null.  */
+tree
+vector_const_binop (tree_code code, tree arg1, tree arg2,
+   tree (*elt_const_binop) (enum tree_code, tree, tree))
+{
+  if (TREE_CODE (arg1) == VECTOR_CST && TREE_CODE (arg2) == VECTOR_CST
+  && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)),
+  TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg2
+{
+  tree type = TREE_TYPE (arg1);
+  bool step_ok_p;
+  if (VECTOR_CST_STEPPED_P (arg1)
+ && VECTOR_CST_STEPPED_P (arg2))
+  /* We can operate directly on the encoding if:
+
+  a3 - a2 == a2 - a1 && b3 - b2 == b2 - b1
+  implies
+  (a3 op b3) - (a2 op b2) == (a2 op b2) - (a1 op b1)
+
+  Addition and subtraction are the supported operators
+  for which this is true.  */
+   step_ok_p = (code == PLUS_EXPR || code == MINUS_EXPR);
+  else if (VECTOR_CST_ST

[gcc r15-3396] SVE intrinsics: Fold constant operands for svdiv.

2024-09-03 Thread Jennifer Schmitz via Gcc-cvs
https://gcc.gnu.org/g:ee8b7231b03a36dfc09d94f2b663636ca2a36daf

commit r15-3396-gee8b7231b03a36dfc09d94f2b663636ca2a36daf
Author: Jennifer Schmitz 
Date:   Fri Aug 30 07:03:49 2024 -0700

SVE intrinsics: Fold constant operands for svdiv.

This patch implements constant folding for svdiv:
The new function aarch64_const_binop was created, which - in contrast to
int_const_binop - does not treat operations as overflowing. This function is
passed as callback to vector_const_binop from the new gimple_folder
method fold_const_binary, if the predicate is ptrue or predication is _x.
From svdiv_impl::fold, fold_const_binary is called with TRUNC_DIV_EXPR as
tree_code.
In aarch64_const_binop, a case was added for TRUNC_DIV_EXPR to return 0
for division by 0, as defined in the semantics for svdiv.
Tests were added to check the produced assembly for different
predicates, signed and unsigned integers, and the svdiv_n_* case.
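
A minimal sketch of the per-lane division semantics described above,
written as standalone C++ rather than in terms of GCC trees. The function
names are hypothetical. The division-by-zero result follows the commit
message; the handling of the most negative value divided by -1 is an
assumption based on the statement that operations are not treated as
overflowing.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Model of one signed 64-bit lane.  Division by zero yields 0.
// Assumption: INT64_MIN / -1 wraps to INT64_MIN instead of being
// reported as an overflow.
std::int64_t
fold_sdiv_lane (std::int64_t dividend, std::int64_t divisor)
{
  if (divisor == 0)
    return 0;
  if (dividend == std::numeric_limits<std::int64_t>::min () && divisor == -1)
    return dividend;  // Avoids undefined behaviour in C++.
  return dividend / divisor;  // C++ '/' truncates toward zero.
}

// Model of one unsigned 64-bit lane; only division by zero is special.
std::uint64_t
fold_udiv_lane (std::uint64_t dividend, std::uint64_t divisor)
{
  return divisor == 0 ? 0 : dividend / divisor;
}

// Small usage example.
void
fold_div_examples ()
{
  assert (fold_sdiv_lane (7, 0) == 0);
  assert (fold_sdiv_lane (-7, 2) == -3);
  assert (fold_udiv_lane (15, 3) == 5);
}
```

The early return for a zero divisor mirrors the special case that the
commit adds for TRUNC_DIV_EXPR before delegating to the generic folder.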

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz 

gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
Try constant folding.
* config/aarch64/aarch64-sve-builtins.h: Declare
gimple_folder::fold_const_binary.
* config/aarch64/aarch64-sve-builtins.cc (aarch64_const_binop):
New function to fold binary SVE intrinsics without overflow.
(gimple_folder::fold_const_binary): New helper function for
constant folding of SVE intrinsics.

gcc/testsuite/
* gcc.target/aarch64/sve/const_fold_div_1.c: New test.

Diff:
---
 gcc/config/aarch64/aarch64-sve-builtins-base.cc|  11 +-
 gcc/config/aarch64/aarch64-sve-builtins.cc |  43 +++
 gcc/config/aarch64/aarch64-sve-builtins.h  |   1 +
 .../gcc.target/aarch64/sve/const_fold_div_1.c  | 358 +
 4 files changed, 410 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index d55bee0b72fa..6c94d144dc9c 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -755,8 +755,13 @@ public:
   gimple *
   fold (gimple_folder &f) const override
   {
-tree divisor = gimple_call_arg (f.call, 2);
-tree divisor_cst = uniform_integer_cst_p (divisor);
+if (auto *res = f.fold_const_binary (TRUNC_DIV_EXPR))
+  return res;
+
+/* If the divisor is a uniform power of 2, fold to a shift
+   instruction.  */
+tree op2 = gimple_call_arg (f.call, 2);
+tree divisor_cst = uniform_integer_cst_p (op2);
 
 if (!divisor_cst || !integer_pow2p (divisor_cst))
   return NULL;
@@ -770,7 +775,7 @@ public:
shapes::binary_uint_opt_n, MODE_n,
f.type_suffix_ids, GROUP_none, f.pred);
call = f.redirect_call (instance);
-   tree d = INTEGRAL_TYPE_P (TREE_TYPE (divisor)) ? divisor : divisor_cst;
+   tree d = INTEGRAL_TYPE_P (TREE_TYPE (op2)) ? op2 : divisor_cst;
new_divisor = wide_int_to_tree (TREE_TYPE (d), tree_log2 (d));
   }
 else
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 5ca9ec32b691..8f9aa3cf1207 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -1132,6 +1132,30 @@ report_not_enum (location_t location, tree fndecl, unsigned int argno,
" a valid %qT value", actual, argno + 1, fndecl, enumtype);
 }
 
+/* Try to fold constant arguments ARG1 and ARG2 using the given tree_code.
+   Operations are not treated as overflowing.  */
+static tree
+aarch64_const_binop (enum tree_code code, tree arg1, tree arg2)
+{
+  if (poly_int_tree_p (arg1) && poly_int_tree_p (arg2))
+{
+  poly_wide_int poly_res;
+  tree type = TREE_TYPE (arg1);
+  signop sign = TYPE_SIGN (type);
+  wi::overflow_type overflow = wi::OVF_NONE;
+
+  /* Return 0 for division by 0, like SDIV and UDIV do.  */
+  if (code == TRUNC_DIV_EXPR && integer_zerop (arg2))
+   return arg2;
+
+  if (!poly_int_binop (poly_res, code, arg1, arg2, sign, &overflow))
+   return NULL_TREE;
+  return force_fit_type (type, poly_res, false,
+TREE_OVERFLOW (arg1) | TREE_OVERFLOW (arg2));
+}
+  return NULL_TREE;
+}
+
 /* Return a hash code for a function_instance.  */
 hashval_t
 function_instance::hash () const
@@ -3593,6 +3617,25 @@ gimple_folder::fold_to_vl_pred (unsigned int vl)
   return gimple_build_assign (lhs, builder.build ());
 }
 
+/* Try to fold the call to a constant, given that, for integers, the call
+   is roughly equivalent to binary operation CODE.  aarch64_const_binop
+   handles any differences between CODE and the intrinsic.  */
+g

[gcc r15-3397] SVE intrinsics: Fold constant operands for svmul.

2024-09-03 Thread Jennifer Schmitz via Gcc-cvs
https://gcc.gnu.org/g:6b1cf59e90d3d6391d61b2a8f77856b5aa044014

commit r15-3397-g6b1cf59e90d3d6391d61b2a8f77856b5aa044014
Author: Jennifer Schmitz 
Date:   Fri Aug 30 07:16:43 2024 -0700

SVE intrinsics: Fold constant operands for svmul.

This patch implements constant folding for svmul by calling
gimple_folder::fold_const_binary with tree_code MULT_EXPR.
Tests were added to check the produced assembly for different
predicates, signed and unsigned integers, and the svmul_n_* case.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz 

gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svmul_impl::fold):
Try constant folding.

gcc/testsuite/
* gcc.target/aarch64/sve/const_fold_mul_1.c: New test.

Diff:
---
 gcc/config/aarch64/aarch64-sve-builtins-base.cc|  15 +-
 .../gcc.target/aarch64/sve/const_fold_mul_1.c  | 302 +
 2 files changed, 316 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 6c94d144dc9c..8f781e26cc84 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -2000,6 +2000,19 @@ public:
   }
 };
 
+class svmul_impl : public rtx_code_function
+{
+public:
+  CONSTEXPR svmul_impl ()
+: rtx_code_function (MULT, MULT, UNSPEC_COND_FMUL) {}
+
+  gimple *
+  fold (gimple_folder &f) const override
+  {
+return f.fold_const_binary (MULT_EXPR);
+  }
+};
+
 class svnand_impl : public function_base
 {
 public:
@@ -3184,7 +3197,7 @@ FUNCTION (svmls_lane, svmls_lane_impl,)
 FUNCTION (svmmla, svmmla_impl,)
 FUNCTION (svmov, svmov_impl,)
 FUNCTION (svmsb, svmsb_impl,)
-FUNCTION (svmul, rtx_code_function, (MULT, MULT, UNSPEC_COND_FMUL))
+FUNCTION (svmul, svmul_impl,)
 FUNCTION (svmul_lane, CODE_FOR_MODE0 (aarch64_mul_lane),)
 FUNCTION (svmulh, unspec_based_function, (UNSPEC_SMUL_HIGHPART,
  UNSPEC_UMUL_HIGHPART, -1))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/const_fold_mul_1.c b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_mul_1.c
new file mode 100644
index ..6d68607b5492
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_mul_1.c
@@ -0,0 +1,302 @@
+/* { dg-final { check-function-bodies "**" "" } } */
+/* { dg-options "-O2" } */
+
+#include "arm_sve.h"
+
+/*
+** s64_x_pg:
+** mov z[0-9]+\.d, #15
+** ret
+*/
+svint64_t s64_x_pg (svbool_t pg)
+{
+  return svmul_x (pg, svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_x_pg_0:
+** mov z[0-9]+\.b, #0
+** ret
+*/
+svint64_t s64_x_pg_0 (svbool_t pg)
+{
+  return svmul_x (pg, svdup_s64 (0), svdup_s64 (3));
+}
+
+/*
+** s64_z_pg:
+** mov z[0-9]+\.d, p[0-7]/z, #15
+** ret
+*/
+svint64_t s64_z_pg (svbool_t pg)
+{
+  return svmul_z (pg, svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_z_pg_0:
+** mov z[0-9]+\.d, p[0-7]/z, #0
+** ret
+*/
+svint64_t s64_z_pg_0 (svbool_t pg)
+{
+  return svmul_z (pg, svdup_s64 (0), svdup_s64 (3));
+}
+
+/*
+** s64_m_pg:
+** mov (z[0-9]+\.d), #3
+** mov (z[0-9]+\.d), #5
+** mul \2, p[0-7]/m, \2, \1
+** ret
+*/
+svint64_t s64_m_pg (svbool_t pg)
+{
+  return svmul_m (pg, svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_x_ptrue:
+** mov z[0-9]+\.d, #15
+** ret
+*/
+svint64_t s64_x_ptrue ()
+{
+  return svmul_x (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_z_ptrue:
+** mov z[0-9]+\.d, #15
+** ret
+*/
+svint64_t s64_z_ptrue ()
+{
+  return svmul_z (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_m_ptrue:
+** mov z[0-9]+\.d, #15
+** ret
+*/
+svint64_t s64_m_ptrue ()
+{
+  return svmul_m (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_x_pg_n:
+** mov z[0-9]+\.d, #15
+** ret
+*/
+svint64_t s64_x_pg_n (svbool_t pg)
+{
+  return svmul_n_s64_x (pg, svdup_s64 (5), 3);
+}
+
+/*
+** s64_x_pg_n_s64_0:
+** mov z[0-9]+\.b, #0
+** ret
+*/
+svint64_t s64_x_pg_n_s64_0 (svbool_t pg)
+{
+  return svmul_n_s64_x (pg, svdup_s64 (5), 0);
+}
+
+/*
+** s64_z_pg_n:
+** mov z[0-9]+\.d, p[0-7]/z, #15
+** ret
+*/
+svint64_t s64_z_pg_n (svbool_t pg)
+{
+  return svmul_n_s64_z (pg, svdup_s64 (5), 3);
+}
+
+/*
+** s64_z_pg_n_s64_0:
+** mov z[0-9]+\.d, p[0-7]/z, #0
+** ret
+*/
+svint64_t s64_z_pg_n_s64_0 (svbool_t pg)
+{
+  return svmul_n_s64_z (pg, svdup_s64 (5), 0);
+}
+
+/*
+** s64_m_pg_n:
+** mov (z[0-9]+\.d), #3
+** mov (z[0-9]+\.d), #5
+** mul \2, p[0-7]/m, \2, \1
+** ret
+*/
+svint64_t s64_m_pg_n (svbool_t pg)
+{
+  return svmul_n_s64_m (pg, svdup_s64 (5), 3);
+}
+
+/*
+** s64_x_ptrue_n:
+** mov z[0-9]+\.d, #15
+** ret
+*/
+svint64_t s64_x_ptrue_n ()
+{
+  return svmul_n_s64_x (svptrue_b64 (

[gcc r15-3398] ada: Fix Finalize_Storage_Only bug in b-i-p calls

2024-09-03 Thread Marc Poulhiès via Gcc-cvs
https://gcc.gnu.org/g:b776b08b718feb059fed80b1de6bcf280fd6f03c

commit r15-3398-gb776b08b718feb059fed80b1de6bcf280fd6f03c
Author: Bob Duff 
Date:   Thu Aug 22 12:32:00 2024 -0400

ada: Fix Finalize_Storage_Only bug in b-i-p calls

Do not pass null for the Collection parameter when
Finalize_Storage_Only is in effect. If the collection
is null in that case, we will blow up later when we
deallocate the object.

gcc/ada/

* exp_ch6.adb (Add_Collection_Actual_To_Build_In_Place_Call):
Remove Finalize_Storage_Only from the code that checks whether to
pass null to the Collection parameter. Having done that, we don't
need to check for Is_Library_Level_Entity, because
No_Heap_Finalization requires that. And if we ever change
No_Heap_Finalization to allow nested access types, we will still
want to pass null. Note that the comment "Such a type lacks a
collection." is incorrect in the case of Finalize_Storage_Only;
such types have a collection.

Diff:
---
 gcc/ada/exp_ch6.adb | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/gcc/ada/exp_ch6.adb b/gcc/ada/exp_ch6.adb
index 3c87c0e8220c..c868234655ea 100644
--- a/gcc/ada/exp_ch6.adb
+++ b/gcc/ada/exp_ch6.adb
@@ -517,15 +517,11 @@ package body Exp_Ch6 is
   else
  Desig_Typ := Directly_Designated_Type (Ptr_Typ);
 
- --  Check for a library-level access type whose designated type has
- --  suppressed finalization or the access type is subject to pragma
- --  No_Heap_Finalization. Such an access type lacks a collection. Pass
- --  a null actual to callee in order to signal a missing collection.
-
- if Is_Library_Level_Entity (Ptr_Typ)
-   and then (Finalize_Storage_Only (Desig_Typ)
-  or else No_Heap_Finalization (Ptr_Typ))
- then
+ --  Check for a type that is subject to pragma No_Heap_Finalization.
+ --  Such a type lacks a collection. Pass a null actual to callee to
+ --  signal a missing collection.
+
+ if No_Heap_Finalization (Ptr_Typ) then
 Actual := Make_Null (Loc);
 
  --  Types in need of finalization actions


[gcc r15-3399] ada: Reject illegal array aggregates as per AI22-0106.

2024-09-03 Thread Marc Poulhiès via Gcc-cvs
https://gcc.gnu.org/g:e083e728668c7aba698fd846767ffbd99506

commit r15-3399-ge083e728668c7aba698fd846767ffbd99506
Author: Steve Baird 
Date:   Mon Aug 19 14:58:38 2024 -0700

ada: Reject illegal array aggregates as per AI22-0106.

Implement the new legality rules of AI22-0106 which (as discussed in the AI)
are needed to disallow constructs whose semantics would otherwise be poorly
defined.

gcc/ada/

* sem_aggr.adb (Resolve_Array_Aggregate): Implement the two new
legality rules of AI22-0106. Add code to avoid cascading error
messages.

Diff:
---
 gcc/ada/sem_aggr.adb | 114 +++
 1 file changed, 97 insertions(+), 17 deletions(-)

diff --git a/gcc/ada/sem_aggr.adb b/gcc/ada/sem_aggr.adb
index 8319ff5af622..63bdeca96584 100644
--- a/gcc/ada/sem_aggr.adb
+++ b/gcc/ada/sem_aggr.adb
@@ -301,7 +301,7 @@ package body Sem_Aggr is
--In addition this step analyzes and resolves each discrete_choice,
--making sure that its type is the type of the corresponding Index.
--If we are not at the lowest array aggregate level (in the case of
-   --multi-dimensional aggregates) then invoke Resolve_Array_Aggregate
+   --multidimensional aggregates) then invoke Resolve_Array_Aggregate
--recursively on each component expression. Otherwise, resolve the
--bottom level component expressions against the expected component
--type ONLY IF the component corresponds to a single discrete choice
@@ -314,7 +314,7 @@ package body Sem_Aggr is
--  3. For positional aggregates:
--
-- (A) Loop over the component expressions either recursively invoking
-   -- Resolve_Array_Aggregate on each of these for multi-dimensional
+   -- Resolve_Array_Aggregate on each of these for multidimensional
-- array aggregates or resolving the bottom level component
-- expressions against the expected component type.
--
@@ -1596,6 +1596,8 @@ package body Sem_Aggr is
   Nb_Choices : Nat := 0;
   --  Contains the overall number of named choices in this sub-aggregate
 
+  Saved_SED  : constant Nat := Serious_Errors_Detected;
+
   function Add (Val : Uint; To : Node_Id) return Node_Id;
   --  Creates a new expression node where Val is added to expression To.
   --  Tries to constant fold whenever possible. To must be an already
@@ -1968,7 +1970,7 @@ package body Sem_Aggr is
  Nxt_Ind_Constr : constant Node_Id := Next_Index (Index_Constr);
  --  Index is the current index corresponding to the expression
 
- Resolution_OK : Boolean := True;
+ Resolution_OK  : Boolean := True;
  --  Set to False if resolution of the expression failed
 
   begin
@@ -2038,6 +2040,9 @@ package body Sem_Aggr is
 Resolution_OK := Resolve_Array_Aggregate
   (Expr, Nxt_Ind, Nxt_Ind_Constr, Component_Typ, Others_Allowed);
 
+if Resolution_OK = Failure then
+   return Failure;
+end if;
  else
 --  If it's "... => <>", nothing to resolve
 
@@ -2135,10 +2140,10 @@ package body Sem_Aggr is
 
  --  Local variables
 
- Choice : Node_Id;
- Dummy  : Boolean;
- Scop   : Entity_Id;
- Expr   : constant Node_Id := Expression (N);
+ Choice : Node_Id;
+ Resolution_OK  : Boolean;
+ Scop   : Entity_Id;
+ Expr   : constant Node_Id := Expression (N);
 
   --  Start of processing for Resolve_Iterated_Component_Association
 
@@ -2208,7 +2213,11 @@ package body Sem_Aggr is
  --  rewritting as a loop with a new index variable; when not
  --  generating code we leave the analyzed expression as it is.
 
- Dummy := Resolve_Aggr_Expr (Expr, Single_Elmt => False);
+ Resolution_OK := Resolve_Aggr_Expr (Expr, Single_Elmt => False);
+
+ if not Resolution_OK then
+return;
+ end if;
 
  if Operating_Mode /= Check_Semantics then
 Remove_References (Expr);
@@ -2610,6 +2619,14 @@ package body Sem_Aggr is
  if Nkind (Assoc) = N_Iterated_Component_Association
and then Present (Iterator_Specification (Assoc))
  then
+if Number_Dimensions (Etype (N)) /= 1 then
+   Error_Msg_N ("iterated_component_association with an" &
+" iterator_specification not allowed for" &
+" multidimensional array aggregate",
+Assoc);
+   return Failure;
+end if;
+
 --  All other component associations must have an iterator spec.
 
 Next (Assoc);
@@ -2931,16 +2948,75 @@ package body Sem_Aggr is
  Get_Index_Bounds (Choice, Low, High);
   end if;
 
-  

[gcc r15-3400] ada: Do not warn for partial access to Atomic Volatile_Full_Access objects

2024-09-03 Thread Marc Poulhiès via Gcc-cvs
https://gcc.gnu.org/g:d7e110d8fa18f734058e73424c398d8c69fcb6b3

commit r15-3400-gd7e110d8fa18f734058e73424c398d8c69fcb6b3
Author: Eric Botcazou 
Date:   Thu Aug 22 22:54:02 2024 +0200

ada: Do not warn for partial access to Atomic Volatile_Full_Access objects

The initial implementation of the GNAT aspect/pragma Volatile_Full_Access
made it incompatible with Atomic, because it was not decided whether the
read-modify-write sequences generated by Volatile_Full_Access would need
to be implemented atomically when Atomic was also specified, which would
have required a compare-and-swap primitive from the target architecture.

But Ada 2022 introduced Full_Access_Only and retrofitted it into Atomic
in the process, answering the above question in the negative, so the
incompatibility between Volatile_Full_Access and Atomic was lifted in
Ada 2012 as well, but the implementation was not entirely adjusted.

In Ada 2012, it does not make sense to warn for the partial access to an
Atomic object if the object is also declared Volatile_Full_Access, since
the object will be accessed as a whole in this case (like in Ada 2022).

gcc/ada/

* sem_res.adb (Is_Atomic_Ref_With_Address): Rename into...
(Is_Atomic_Non_VFA_Ref_With_Address): ...this and adjust the
implementation to exclude Volatile_Full_Access objects.
(Resolve_Indexed_Component): Adjust to above renaming.
(Resolve_Selected_Component): Likewise.

Diff:
---
 gcc/ada/sem_res.adb | 46 ++
 1 file changed, 30 insertions(+), 16 deletions(-)

diff --git a/gcc/ada/sem_res.adb b/gcc/ada/sem_res.adb
index b23ca48f0498..e7fd7d62fec6 100644
--- a/gcc/ada/sem_res.adb
+++ b/gcc/ada/sem_res.adb
@@ -144,10 +144,10 @@ package body Sem_Res is
--  for restriction No_Direct_Boolean_Operators. This procedure also handles
--  the style check for Style_Check_Boolean_And_Or.
 
-   function Is_Atomic_Ref_With_Address (N : Node_Id) return Boolean;
-   --  N is either an indexed component or a selected component. This function
-   --  returns true if the prefix denotes an atomic object that has an address
-   --  clause (the case in which we may want to issue a warning).
+   function Is_Atomic_Non_VFA_Ref_With_Address (N : Node_Id) return Boolean;
+   --  N is either an indexed component or a selected component. Return true
+   --  if the prefix denotes an Atomic but not Volatile_Full_Access object that
+   --  has an address clause (the case in which we may want to give a warning).
 
function Is_Definite_Access_Type (E : N_Entity_Id) return Boolean;
--  Determine whether E is an access type declared by an access declaration,
@@ -1486,28 +1486,42 @@ package body Sem_Res is
   end if;
end Check_Parameterless_Call;
 
-   
-   -- Is_Atomic_Ref_With_Address --
-   
+   
+   -- Is_Atomic_Non_VFA_Ref_With_Address --
+   
 
-   function Is_Atomic_Ref_With_Address (N : Node_Id) return Boolean is
+   function Is_Atomic_Non_VFA_Ref_With_Address (N : Node_Id) return Boolean is
   Pref : constant Node_Id := Prefix (N);
 
-   begin
-  if not Is_Entity_Name (Pref) then
- return False;
+  function Is_Atomic_Non_VFA (E : Entity_Id) return Boolean;
+  --  Return true if E is Atomic but not Volatile_Full_Access
 
-  else
+  ---
+  -- Is_Atomic_Non_VFA --
+  ---
+
+  function Is_Atomic_Non_VFA (E : Entity_Id) return Boolean is
+  begin
+ return Is_Atomic (E) and then not Is_Volatile_Full_Access (E);
+  end Is_Atomic_Non_VFA;
+
+   begin
+  if Is_Entity_Name (Pref) then
  declare
 Pent : constant Entity_Id := Entity (Pref);
 Ptyp : constant Entity_Id := Etype (Pent);
+
  begin
 return not Is_Access_Type (Ptyp)
-  and then (Is_Atomic (Ptyp) or else Is_Atomic (Pent))
+  and then (Is_Atomic_Non_VFA (Ptyp)
+ or else Is_Atomic_Non_VFA (Pent))
   and then Present (Address_Clause (Pent));
  end;
+
+  else
+ return False;
   end if;
-   end Is_Atomic_Ref_With_Address;
+   end Is_Atomic_Non_VFA_Ref_With_Address;
 
-
-- Is_Definite_Access_Type --
@@ -9658,7 +9672,7 @@ package body Sem_Res is
   --  object, or partial word accesses, both of which may be unexpected.
 
   if Nkind (N) = N_Indexed_Component
-and then Is_Atomic_Ref_With_Address (N)
+and then Is_Atomic_Non_VFA_Ref_With_Address (N)
 and then not (Has_Atomic_Components (Array_Type)
or else (Is_Entity_Name (Pref)
  and then Has_Atomic_Components
@@ -11434,7 +11448,7 

[gcc r15-3401] ada: Transform Length attribute references for non-Strict overflow mode.

2024-09-03 Thread Marc Poulhiès via Gcc-cvs
https://gcc.gnu.org/g:1ef11f4bed8eb230f04e5fb09741ae6444ca3e7b

commit r15-3401-g1ef11f4bed8eb230f04e5fb09741ae6444ca3e7b
Author: Steve Baird 
Date:   Tue Aug 20 17:35:24 2024 -0700

ada: Transform Length attribute references for non-Strict overflow mode.

The non-strict overflow checking code does a better job of eliminating
overflow checks if given an expression consisting only of predefined
operators (including relationals), literals, identifiers, and conditional
expressions. If it is both feasible and useful, rewrite a
Length attribute reference as such an expression. "Feasible" means
"index type is same type as attribute reference type, so we can rewrite 
without
using type conversions". "Useful" means "Overflow_Mode is something other 
than
Strict, so there is value in making overflow check elimination easier".
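
For illustration only, the shape of the rewritten expression can be
mirrored in a few lines of standalone C++. The function name range_length
is hypothetical, and long long stands in for the common base type of the
bounds; the sketch assumes the result fits in that type.

```cpp
#include <cassert>

// Length of the index range first .. last, expressed with only a
// comparison, a subtraction, an addition and a conditional, mirroring
//   (if First > Last then 0 else (Last - First) + 1)
// Assumption: no overflow handling is shown here.
long long
range_length (long long first, long long last)
{
  return first > last ? 0 : (last - first) + 1;
}

// Small usage example.
void
range_length_examples ()
{
  assert (range_length (1, 10) == 10);  // ordinary range
  assert (range_length (5, 4) == 0);    // empty (null) range
  assert (range_length (-2, 2) == 5);   // bounds may be negative
}
```

Because the expression uses only predefined operators, literals and a
conditional, it is the kind of input that the commit message says the
non-strict overflow machinery handles best.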

gcc/ada/

* exp_attr.adb (Expand_N_Attribute_Reference): If it makes sense
to do so, then rewrite a Length attribute reference as an
equivalent conditional expression.

Diff:
---
 gcc/ada/exp_attr.adb | 69 +++-
 1 file changed, 68 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/exp_attr.adb b/gcc/ada/exp_attr.adb
index 84c7a4bbdeeb..702c4bb120a3 100644
--- a/gcc/ada/exp_attr.adb
+++ b/gcc/ada/exp_attr.adb
@@ -4797,7 +4797,7 @@ package body Exp_Attr is
 --  then replace this attribute with a reference to 'Range_Length
 --  of the appropriate index subtype (since otherwise the
 --  back end will try to give us the value of 'Length for
---  this implementation type).s
+--  this implementation type).
 
 elsif Is_Constrained (Ptyp) then
Rewrite (N,
@@ -4868,6 +4868,73 @@ package body Exp_Attr is
end if;
 end;
 
+ --  Overflow-related transformations need Length attribute rewritten
+ --  using non-attribute expressions. So generate
+ --   (if Pref'First > Pref'Last
+ --then 0
+ --else ((Pref'Last - Pref'First) + 1)) .
+
+ elsif Overflow_Check_Mode in Minimized_Or_Eliminated
+
+--  This Comes_From_Source test fixes a regression test failure
+--  involving a Length attribute reference generated as part of
+--  the expansion of a concatenation operator; it is unclear
+--  whether this is the right solution to that problem.
+
+and then Comes_From_Source (N)
+
+--  This Base_Type equality test is so that we only perform this
+--  transformation if we can do it without introducing
+--  a type conversion anywhere in the resulting expansion;
+--  a type conversion is just as bad as a Length attribute
+--  reference for those overflow-related transformations.
+
+and then Btyp = Base_Type (Get_Index_Subtype (N))
+
+ then
+declare
+   function Prefix_Bound
+ (Bound_Attr_Name : Name_Id; Is_First_Copy : Boolean := False)
+ return Node_Id;
+   --  constructs a Pref'First or Pref'Last attribute reference
+
+   --
+   -- Prefix_Bound --
+   --
+
+   function Prefix_Bound
+ (Bound_Attr_Name : Name_Id; Is_First_Copy : Boolean := False)
+ return Node_Id
+   is
+  Prefix : constant Node_Id :=
+(if Is_First_Copy
+ then Duplicate_Subexpr (Pref)
+ else Duplicate_Subexpr_No_Checks (Pref));
+   begin
+  return Make_Attribute_Reference (Loc,
+   Prefix => Prefix,
+   Attribute_Name => Bound_Attr_Name,
+   Expressions=> New_Copy_List (Exprs));
+   end Prefix_Bound;
+begin
+   Rewrite (N,
+ Make_If_Expression (Loc,
+   Expressions =>
+ New_List (
+   Node1 => Make_Op_Gt (Loc,
+  Prefix_Bound (Name_First,
+Is_First_Copy => True),
+  Prefix_Bound (Name_Last)),
+   Node2 => Make_Integer_Literal (Loc, 0),
+   Node3 => Make_Op_Add (Loc,
+  Make_Op_Subtract (Loc,
+Prefix_Bound (Name_Last),
+Prefix_Bound (Name_First)),
+  Make_Integer_Literal (Loc, 1);
+
+   Analyze_And_Resolve (N, Typ);
+end;
+
  --  Otherwise leave it to the back end
 
  else


[gcc r15-3402] ada: Simplify Note_Uplevel_Bound procedure

2024-09-03 Thread Marc Poulhiès via Gcc-cvs
https://gcc.gnu.org/g:b3f6a7909149a5eff2b9e2a5d28439cccd7902df

commit r15-3402-gb3f6a7909149a5eff2b9e2a5d28439cccd7902df
Author: Marc Poulhiès 
Date:   Fri Aug 9 18:08:01 2024 +0200

ada: Simplify Note_Uplevel_Bound procedure

The procedure Note_Uplevel_Bound was implemented as a custom expression
tree walk. This change replaces this custom tree traversal by a more
idiomatic use of Traverse_Proc.
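
The general idea of replacing a hand-written recursive walk with a
generic traversal plus a visitor can be sketched in standalone C++. This
is not GNAT code: Node, TraverseResult and traverse are hypothetical
names, loosely modelled on a visitor that reports whether to continue,
skip a subtree, or stop.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical result codes for the visitor.
enum class TraverseResult { OK, Skip, Abandon };

// Hypothetical tree node; children are stored by value for brevity.
struct Node
{
  std::string name;
  std::vector<Node> children;
};

// Generic pre-order walk.  The visitor decides, per node, whether to
// descend (OK), ignore the children (Skip), or stop the walk (Abandon).
// Returns Abandon if the walk was stopped early, OK otherwise.
TraverseResult
traverse (const Node &node,
          const std::function<TraverseResult (const Node &)> &visit)
{
  switch (visit (node))
    {
    case TraverseResult::Abandon:
      return TraverseResult::Abandon;
    case TraverseResult::Skip:
      return TraverseResult::OK;
    case TraverseResult::OK:
      break;
    }

  for (const Node &child : node.children)
    if (traverse (child, visit) == TraverseResult::Abandon)
      return TraverseResult::Abandon;
  return TraverseResult::OK;
}

// Small usage example: count every node in a three-node tree.
void
traverse_example ()
{
  Node root{"a", {Node{"b", {}}, Node{"c", {}}}};
  int count = 0;
  traverse (root, [&] (const Node &) {
    ++count;
    return TraverseResult::OK;
  });
  assert (count == 3);
}
```

The caller then only writes the per-node logic, and the recursion over
children lives in one shared routine.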

gcc/ada/

* exp_unst.adb (Check_Static_Type::Note_Uplevel_Bound): Refactor
to use the generic Traverse_Proc.
(Check_Static_Type): Adjust calls to Note_Uplevel_Bound as the
previous second parameter was unused, so removed.

Diff:
---
 gcc/ada/exp_unst.adb | 169 ---
 1 file changed, 66 insertions(+), 103 deletions(-)

diff --git a/gcc/ada/exp_unst.adb b/gcc/ada/exp_unst.adb
index 7ff1ea621bbe..fb48a64ac867 100644
--- a/gcc/ada/exp_unst.adb
+++ b/gcc/ada/exp_unst.adb
@@ -507,78 +507,90 @@ package body Exp_Unst is
 is
T : constant Entity_Id := Get_Fullest_View (In_T);
 
-   procedure Note_Uplevel_Bound (N : Node_Id; Ref : Node_Id);
+   procedure Note_Uplevel_Bound (N : Node_Id);
--  N is the bound of a dynamic type. This procedure notes that
--  this bound is uplevel referenced, it can handle references
--  to entities (typically _FIRST and _LAST entities), and also
--  attribute references of the form T'name (name is typically
--  FIRST or LAST) where T is the uplevel referenced bound.
-   --  Ref, if Present, is the location of the reference to
-   --  replace.
 

-- Note_Uplevel_Bound --

 
-   procedure Note_Uplevel_Bound (N : Node_Id; Ref : Node_Id) is
-   begin
-  --  Entity name case. Make sure that the entity is declared
-  --  in a subprogram. This may not be the case for a type in a
-  --  loop appearing in a precondition.
-  --  Exclude explicitly discriminants (that can appear
-  --  in bounds of discriminated components) and enumeration
-  --  literals.
-
-  if Is_Entity_Name (N) then
- if Present (Entity (N))
-   and then not Is_Type (Entity (N))
-   and then Present (Enclosing_Subprogram (Entity (N)))
-   and then
- Ekind (Entity (N))
-   not in E_Discriminant | E_Enumeration_Literal
- then
-Note_Uplevel_Ref
-  (E  => Entity (N),
-   N  => Empty,
-   Caller => Current_Subprogram,
-   Callee => Enclosing_Subprogram (Entity (N)));
- end if;
+   procedure Note_Uplevel_Bound (N : Node_Id) is
 
-  --  Attribute or indexed component case
+  function Note_Uplevel_Bound_Trav
+(N : Node_Id) return Traverse_Result;
+  --  Tree visitor that marks entities that are uplevel
+  --  referenced.
 
-  elsif Nkind (N) in
-  N_Attribute_Reference | N_Indexed_Component
-  then
- Note_Uplevel_Bound (Prefix (N), Ref);
+  procedure Do_Note_Uplevel_Bound
+is new Traverse_Proc (Note_Uplevel_Bound_Trav);
+  --  Subtree visitor instantiation
 
- --  The indices of the indexed components, or the
- --  associated expressions of an attribute reference,
- --  may also involve uplevel references.
+  -
+  -- Note_Uplevel_Bound_Trav --
+  -
 
- declare
-Expr : Node_Id;
+  function Note_Uplevel_Bound_Trav
+(N : Node_Id) return Traverse_Result
+  is
+  begin
+ --  Entity name case. Make sure that the entity is
+ --  declared in a subprogram. This may not be the case for
+ --  a type in a loop appearing in a precondition. Exclude
+ --  explicitly discriminants (that can appear in bounds of
+ --  discriminated components), enumeration literals and
+ --  block.
+
+ if Is_Entity_Name (N) then
+if Present (Entity (N))
+  and then not Is_Type (Entity (N))
+   

[gcc r15-3403] ada: Fix internal error on pragma pack with discriminated record component

2024-09-03 Thread Marc Poulhiès via Gcc-cvs
https://gcc.gnu.org/g:9ba7262c8de0a96e85cc1ad05e2c3666228c74e8

commit r15-3403-g9ba7262c8de0a96e85cc1ad05e2c3666228c74e8
Author: Eric Botcazou 
Date:   Tue Aug 20 10:34:45 2024 +0200

ada: Fix internal error on pragma pack with discriminated record component

When updating the size after making a packable type in gnat_to_gnu_field,
we fail to clear it again when it is not constant.

gcc/ada/

* gcc-interface/decl.cc (gnat_to_gnu_field): Clear again gnu_size
after updating it if it is not constant.

Diff:
---
 gcc/ada/gcc-interface/decl.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/ada/gcc-interface/decl.cc b/gcc/ada/gcc-interface/decl.cc
index 398e01521a33..655ba0b8a105 100644
--- a/gcc/ada/gcc-interface/decl.cc
+++ b/gcc/ada/gcc-interface/decl.cc
@@ -7686,6 +7686,8 @@ gnat_to_gnu_field (Entity_Id gnat_field, tree gnu_record_type, int packed,
  gnu_field_type = gnu_packable_type;
  if (!gnu_size)
gnu_size = rm_size (gnu_field_type);
+ if (TREE_CODE (gnu_size) != INTEGER_CST)
+   gnu_size = NULL_TREE;
}
 }


[gcc r15-3404] ada: Pass unaligned record components by copy in calls on all platforms

2024-09-03 Thread Marc Poulhiès via Gcc-cvs
https://gcc.gnu.org/g:d8d191469e1e08e7b8530874cbb0f2781dc2e14d

commit r15-3404-gd8d191469e1e08e7b8530874cbb0f2781dc2e14d
Author: Eric Botcazou 
Date:   Tue Aug 20 22:59:58 2024 +0200

ada: Pass unaligned record components by copy in calls on all platforms

This has historically been done only on platforms requiring the strict
alignment of memory references, but this can arguably be considered as
being mandated by the language on all of them.

gcc/ada/

* gcc-interface/trans.cc (addressable_p) : Take into
account the alignment of the field on all platforms.

Diff:
---
 gcc/ada/gcc-interface/trans.cc | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index 3f2eadd7b2bc..7cced04361d0 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -10289,9 +10289,8 @@ addressable_p (tree gnu_expr, tree gnu_type)
   check the alignment of the containing record, as it is
   guaranteed to be not smaller than that of its most
   aligned field that is not a bit-field.  */
-   && (!STRICT_ALIGNMENT
-   || DECL_ALIGN (TREE_OPERAND (gnu_expr, 1))
-  >= TYPE_ALIGN (TREE_TYPE (gnu_expr))))
+   && DECL_ALIGN (TREE_OPERAND (gnu_expr, 1))
+  >= TYPE_ALIGN (TREE_TYPE (gnu_expr)))
   /* The field of a padding record is always addressable.  */
   || TYPE_IS_PADDING_P (TREE_TYPE (TREE_OPERAND (gnu_expr, 0))))
  && addressable_p (TREE_OPERAND (gnu_expr, 0), NULL_TREE));


[gcc r15-3405] ada: Fix internal error with Atomic Volatile_Full_Access object

2024-09-03 Thread Marc Poulhiès via Gcc-cvs
https://gcc.gnu.org/g:0a862c5af5c603baab8715bbcca6890f77cc59e2

commit r15-3405-g0a862c5af5c603baab8715bbcca6890f77cc59e2
Author: Eric Botcazou 
Date:   Thu Aug 22 21:18:15 2024 +0200

ada: Fix internal error with Atomic Volatile_Full_Access object

The initial implementation of the GNAT aspect/pragma Volatile_Full_Access
made it incompatible with Atomic, because it was not decided whether the
read-modify-write sequences generated by Volatile_Full_Access would need
to be implemented atomically when Atomic was also specified, which would
have required a compare-and-swap primitive from the target architecture.

But Ada 2022 introduced Full_Access_Only and retrofitted it into Atomic
in the process, answering the above question in the negative, so the
incompatibility between Volatile_Full_Access and Atomic was lifted in
Ada 2012 as well, unfortunately without adjusting the implementation.

gcc/ada/

* gcc-interface/trans.cc (get_atomic_access): Deal specifically with
nodes that are both Atomic and Volatile_Full_Access in Ada 2012.

Diff:
---
 gcc/ada/gcc-interface/trans.cc | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index 7cced04361d0..caa0f56a34d9 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -4387,9 +4387,9 @@ get_atomic_access (Node_Id gnat_node, atomic_acces_t *type, bool *sync)
 gnat_node = Expression (gnat_node);
 
   /* Up to Ada 2012, for Atomic itself, only reads and updates of the object as
- a whole require atomic access (RM C.6(15)).  But, starting with Ada 2022,
- reads of or writes to a nonatomic subcomponent of the object also require
- atomic access (RM C.6(19)).  */
+ a whole require atomic access (RM C.6(15)), unless the object is also VFA.
+ But, starting with Ada 2022, reads of or writes to nonatomic subcomponents
+ of the object also require atomic access (RM C.6(19)).  */
   if (node_is_atomic (gnat_node))
 {
   bool as_a_whole = true;
@@ -4398,7 +4398,9 @@ get_atomic_access (Node_Id gnat_node, atomic_acces_t *type, bool *sync)
   for (gnat_temp = gnat_node, gnat_parent = Parent (gnat_temp);
   node_is_component (gnat_parent) && Prefix (gnat_parent) == gnat_temp;
   gnat_temp = gnat_parent, gnat_parent = Parent (gnat_temp))
-   if (Ada_Version < Ada_2022 || node_is_atomic (gnat_parent))
+   if (Ada_Version < Ada_2022
+   ? !node_is_volatile_full_access (gnat_node)
+   : node_is_atomic (gnat_parent))
  goto not_atomic;
else
  as_a_whole = false;


[gcc r15-3406] ada: Plug loophole exposed by previous change

2024-09-03 Thread Marc Poulhiès via Gcc-cvs
https://gcc.gnu.org/g:9362abf5e81eb2e6e35f55f36ff8e7a31aef4e9d

commit r15-3406-g9362abf5e81eb2e6e35f55f36ff8e7a31aef4e9d
Author: Eric Botcazou 
Date:   Fri Aug 23 09:44:06 2024 +0200

ada: Plug loophole exposed by previous change

The change causes more temporaries to be created at call sites for unaligned
actual parameters, thus revealing that the machinery does not properly deal
with unconstrained nominal subtypes for them.

gcc/ada/

* gcc-interface/trans.cc (create_temporary): Deal with types whose
size is self-referential by allocating the maximum size.

Diff:
---
 gcc/ada/gcc-interface/trans.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index caa0f56a34d9..fadd6b483d5a 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -4527,6 +4527,9 @@ storage_model_access_required_p (Node_Id gnat_node, Entity_Id *gnat_smo)
 static tree
 create_temporary (const char *prefix, tree type)
 {
+  if (CONTAINS_PLACEHOLDER_P (TYPE_SIZE (type)))
+type = maybe_pad_type (type, max_size (TYPE_SIZE (type), true), 0,
+  Empty, false, false, true);
   tree gnu_temp
 = create_var_decl (create_tmp_var_name (prefix), NULL_TREE,
  type, NULL_TREE,


[gcc r15-3407] ada: Add kludge for quirk of ancient 32-bit ABIs to previous change

2024-09-03 Thread Marc Poulhiès via Gcc-cvs
https://gcc.gnu.org/g:a19cf635ea29658d5f9fc19199473d6d823ef2d1

commit r15-3407-ga19cf635ea29658d5f9fc19199473d6d823ef2d1
Author: Eric Botcazou 
Date:   Fri Aug 23 17:06:00 2024 +0200

ada: Add kludge for quirk of ancient 32-bit ABIs to previous change

Some ancient 32-bit ABIs, most notably that of x86/Linux, misalign double
scalars in record types, so comparing DECL_ALIGN with TYPE_ALIGN directly
may give the wrong answer for them.
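
To make the quirk concrete, here is a minimal C sketch (not part of the
patch) that reports where a double lands inside a record.  The struct `rec`
and the helper names are hypothetical.  On x86/Linux in 32-bit mode without
-malign-double the offset is expected to be 4, while on x86-64 it is 8; the
4-byte placement is the case in which the field's DECL_ALIGN (32 bits) is
smaller than the TYPE_ALIGN of double (64 bits).

```c
#include <stddef.h>

/* Hypothetical record with a double placed after a one-byte field.  */
struct rec
{
  char tag;
  double value;
};

/* Byte offset chosen by the ABI for the double field.  */
static size_t
double_field_offset (void)
{
  return offsetof (struct rec, value);
}

/* Nonzero if the field sits on an 8-byte boundary within the record.  */
static int
double_field_is_8_byte_aligned (void)
{
  return double_field_offset () % 8 == 0;
}
```

Building this with and without -m32 on an x86 host (assuming a multilib
toolchain is installed) is one way to observe the difference the kludge
has to account for.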

gcc/ada/

* gcc-interface/trans.cc (addressable_p) : Add kludge
to cope with ancient 32-bit ABIs.

Diff:
---
 gcc/ada/gcc-interface/trans.cc | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index fadd6b483d5a..c99b06670d58 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -10294,8 +10294,20 @@ addressable_p (tree gnu_expr, tree gnu_type)
   check the alignment of the containing record, as it is
   guaranteed to be not smaller than that of its most
   aligned field that is not a bit-field.  */
-   && DECL_ALIGN (TREE_OPERAND (gnu_expr, 1))
-  >= TYPE_ALIGN (TREE_TYPE (gnu_expr)))
+   && (DECL_ALIGN (TREE_OPERAND (gnu_expr, 1))
+   >= TYPE_ALIGN (TREE_TYPE (gnu_expr))
+#ifdef TARGET_ALIGN_DOUBLE
+  /* Cope with the misalignment of doubles in records for
+ ancient 32-bit ABIs like that of x86/Linux.  */
+  || (DECL_ALIGN (TREE_OPERAND (gnu_expr, 1)) == 32
+  && TYPE_ALIGN (TREE_TYPE (gnu_expr)) == 64
+  && !TARGET_ALIGN_DOUBLE
+#ifdef TARGET_64BIT
+  && !TARGET_64BIT
+#endif
+ )
+#endif
+  ))
   /* The field of a padding record is always addressable.  */
   || TYPE_IS_PADDING_P (TREE_TYPE (TREE_OPERAND (gnu_expr, 0))))
  && addressable_p (TREE_OPERAND (gnu_expr, 0), NULL_TREE));


[gcc r15-3408] lower-bitint: Fix up __builtin_{add, sub}_overflow{, _p} bitint lowering [PR116501]

2024-09-03 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:d4d75a83007e884bfcd632ea3b3269704496f048

commit r15-3408-gd4d75a83007e884bfcd632ea3b3269704496f048
Author: Jakub Jelinek 
Date:   Tue Sep 3 10:20:44 2024 +0200

lower-bitint: Fix up __builtin_{add,sub}_overflow{,_p} bitint lowering [PR116501]

The following testcase is miscompiled.  The problem is in the last_ovf step.
The second operand has signed _BitInt(513) type but has the MSB clear,
so range_to_prec returns 512 for it (i.e. it fits into unsigned
_BitInt(512)).  Because of that the last step actually doesn't need to get
the most significant bit from the second operand, but the code was deciding
what to use purely from TYPE_UNSIGNED (type1) - if unsigned, use 0,
otherwise sign-extend the last processed bit; but that in this case was set.
We don't want to treat the positive operand as if it was negative regardless
of the bit below that precision, and precN >= 0 indicates that the operand
is in the [0, inf) range.
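
The effect can be illustrated with a scaled-down model in plain C.  This is
only a sketch under stated assumptions, not the code of the lowering pass: an
8-bit "limb" stands in for the processed part of the operand, a 7-bit signed
type stands in for the result type, and all helper names are hypothetical.
The operand is known to be non-negative, so the missing upper part must be
taken as zero; sign-extending the last processed bit instead turns 255 into
-1 and hides the overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if VALUE does not fit into a signed type of PREC bits.  */
static bool
overflows_signed (int64_t value, int prec)
{
  int64_t max = ((int64_t) 1 << (prec - 1)) - 1;
  int64_t min = -max - 1;
  return value < min || value > max;
}

/* Wrong choice for a non-negative operand: replicate bit 7 upwards.  */
static int64_t
extend_by_sign (uint8_t limb)
{
  return limb >= 128 ? (int64_t) limb - 256 : (int64_t) limb;
}

/* Right choice when the operand is known to be >= 0: upper bits are 0.  */
static int64_t
extend_by_zero (uint8_t limb)
{
  return (int64_t) limb;
}

/* Model of a - b overflowing the narrow result type, for a == 0.  */
static bool
sub_overflow_model (int64_t extended_b)
{
  return overflows_signed (0 - extended_b, 7);
}
```

With the limb 0xFF, `sub_overflow_model (extend_by_zero (0xFF))` reports an
overflow (0 - 255 does not fit in 7 signed bits), whereas
`sub_overflow_model (extend_by_sign (0xFF))` computes 0 - (-1) = 1 and
wrongly reports none.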

2024-09-03  Jakub Jelinek  

PR tree-optimization/116501
* gimple-lower-bitint.cc (bitint_large_huge::lower_addsub_overflow):
In the last_ovf case, use build_zero_cst operand not just when
TYPE_UNSIGNED (typeN), but also when precN >= 0.

* gcc.dg/torture/bitint-73.c: New test.

Diff:
---
 gcc/gimple-lower-bitint.cc   |  4 ++--
 gcc/testsuite/gcc.dg/torture/bitint-73.c | 20 ++++++++++++++++++++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/gcc/gimple-lower-bitint.cc b/gcc/gimple-lower-bitint.cc
index b10593035c36..58deaf253e93 100644
--- a/gcc/gimple-lower-bitint.cc
+++ b/gcc/gimple-lower-bitint.cc
@@ -4192,7 +4192,7 @@ bitint_large_huge::lower_addsub_overflow (tree obj, gimple *stmt)
   else
{
  m_data_cnt = data_cnt;
- if (TYPE_UNSIGNED (type0))
+ if (TYPE_UNSIGNED (type0) || prec0 >= 0)
rhs1 = build_zero_cst (m_limb_type);
  else
{
@@ -4210,7 +4210,7 @@ bitint_large_huge::lower_addsub_overflow (tree obj, gimple *stmt)
  rhs1 = add_cast (m_limb_type, gimple_assign_lhs (g));
}
}
- if (TYPE_UNSIGNED (type1))
+ if (TYPE_UNSIGNED (type1) || prec1 >= 0)
rhs2 = build_zero_cst (m_limb_type);
  else
{
diff --git a/gcc/testsuite/gcc.dg/torture/bitint-73.c b/gcc/testsuite/gcc.dg/torture/bitint-73.c
new file mode 100644
index 000000000000..1e15f3912574
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/bitint-73.c
@@ -0,0 +1,20 @@
+/* PR tree-optimization/116501 */
+/* { dg-do run { target bitint575 } } */
+/* { dg-options "-std=c23" } */
+/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
+/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
+
+_BitInt (4) a;
+
+int
+foo (_BitInt(513) b)
+{
+  return __builtin_sub_overflow_p (a, b, (_BitInt (511)) 0);
+}
+
+int
+main ()
+{
+  if (!foo 
(0xwb))
+__builtin_abort ();
+}


[gcc r15-3409] Do not assert NUM_POLY_INT_COEFFS != 1 early

2024-09-03 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:14b65af6b400284a937e1d3be45579ee8cf8c32b

commit r15-3409-g14b65af6b400284a937e1d3be45579ee8cf8c32b
Author: Richard Biener 
Date:   Tue Sep 3 10:40:41 2024 +0200

Do not assert NUM_POLY_INT_COEFFS != 1 early

The following moves the assert on NUM_POLY_INT_COEFFS != 1 after
INTEGER_CST processing.

* fold-const.cc (poly_int_binop): Move assert on
NUM_POLY_INT_COEFFS after INTEGER_CST processing.

Diff:
---
 gcc/fold-const.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 2ada59f712bb..70db16759d04 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -1241,7 +1241,6 @@ poly_int_binop (poly_wide_int &res, enum tree_code code,
const_tree arg1, const_tree arg2,
signop sign, wi::overflow_type *overflow)
 {
-  gcc_assert (NUM_POLY_INT_COEFFS != 1);
   gcc_assert (poly_int_tree_p (arg1) && poly_int_tree_p (arg2));
 
   if (TREE_CODE (arg1) == INTEGER_CST && TREE_CODE (arg2) == INTEGER_CST)
@@ -1254,6 +1253,8 @@ poly_int_binop (poly_wide_int &res, enum tree_code code,
   return true;
 }
 
+  gcc_assert (NUM_POLY_INT_COEFFS != 1);
+
   switch (code)
 {
 case PLUS_EXPR:


[gcc r15-3410] i386: Fix vfpclassph non-optimizied intrin

2024-09-03 Thread Haochen Jiang via Gcc-cvs
https://gcc.gnu.org/g:9b312595f9ac073f55d858b6f833097608b40bba

commit r15-3410-g9b312595f9ac073f55d858b6f833097608b40bba
Author: Haochen Jiang 
Date:   Mon Sep 2 15:00:22 2024 +0800

i386: Fix vfpclassph non-optimizied intrin

The non-optimized (macro) versions of the intrinsics have a typo in the mask
type: the mask is cast to __mmask8, which unexpectedly zeroes the high bits
of the __mmask32.
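
The truncation can be reproduced without any AVX512 hardware.  The sketch
below is an illustration only: `mask8_t` and `mask32_t` are hypothetical
stand-ins for __mmask8 and __mmask32, and the two helpers model just the cast
applied to the mask argument in the macros, not the builtin itself.

```c
#include <stdint.h>

/* Hypothetical stand-ins for __mmask8 and __mmask32.  */
typedef uint8_t mask8_t;
typedef uint32_t mask32_t;

/* Models the buggy macro: the mask passes through an 8-bit cast.  */
static mask32_t
pass_mask_buggy (mask32_t u)
{
  return (mask32_t) (mask8_t) u;
}

/* Models the fixed macro: the mask keeps all 32 bits.  */
static mask32_t
pass_mask_fixed (mask32_t u)
{
  return (mask32_t) u;
}
```

For an all-ones mask, the buggy path keeps only the low 8 bits (0xFF), so
elements 8 to 31 of the 32-element vector are always masked out; the fixed
path preserves 0xFFFFFFFF.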

The test does not fail under -O0 with the current 1b testcase because that
testcase is itself wrong: avx512f-mask-type.h must be included after SIZE is
defined, or the mask type will always be __mmask8. The same problem exists in
the AVX10.2 testcases. I will write a separate patch to fix that.
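
The include-order problem follows from how the preprocessor treats an
undefined macro in #if: it evaluates to 0.  The sketch below assumes the
mask-type header selects the type with tests of the form `#if SIZE <= 8`
(an assumption about its layout) and inlines two copies of such a selection,
one before and one after SIZE is defined.  The typedef names are
hypothetical.

```c
#include <stdint.h>

/* Selection done too early: SIZE is undefined, so it counts as 0 here.  */
#if SIZE <= 8
typedef uint8_t early_mask_t;
#elif SIZE <= 16
typedef uint16_t early_mask_t;
#else
typedef uint32_t early_mask_t;
#endif

#define SIZE 32

/* Same selection after SIZE is defined: the 32-bit type is chosen.  */
#if SIZE <= 8
typedef uint8_t late_mask_t;
#elif SIZE <= 16
typedef uint16_t late_mask_t;
#else
typedef uint32_t late_mask_t;
#endif
```

Here `early_mask_t` ends up one byte wide and `late_mask_t` four bytes wide,
which mirrors a test silently comparing only the low 8 mask bits.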

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h
(_mm512_mask_fpclass_ph_mask): Correct mask type to __mmask32.
(_mm512_fpclass_ph_mask): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vfpclassph-1c.c: New test.

Diff:
---
 gcc/config/i386/avx512fp16intrin.h |  4 +-
 .../gcc.target/i386/avx512fp16-vfpclassph-1c.c | 77 ++
 2 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index 1869a920dd32..c3096b74ad2b 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -3961,11 +3961,11 @@ _mm512_fpclass_ph_mask (__m512h __A, const int __imm)
 #else
 #define _mm512_mask_fpclass_ph_mask(u, x, c)   \
   ((__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) (__m512h) (x), \
-(int) (c),(__mmask8)(u)))
+(int) (c),(__mmask32)(u)))
 
 #define _mm512_fpclass_ph_mask(x, c)\
   ((__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) (__m512h) (x), \
-(int) (c),(__mmask8)-1))
+(int) (c),(__mmask32)-1))
 #endif /* __OPIMTIZE__ */
 
 /* Intrinsics vgetexpph.  */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1c.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1c.c
new file mode 100644
index 000000000000..4739f1228e32
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1c.c
@@ -0,0 +1,77 @@
+/* { dg-do run } */
+/* { dg-options "-O0 -mavx512fp16" } */
+/* { dg-require-effective-target avx512fp16 } */
+
+#define AVX512FP16
+#include "avx512f-helper.h"
+
+#include 
+#include 
+#include 
+#define SIZE (AVX512F_LEN / 16)
+#include "avx512f-mask-type.h"
+
+#ifndef __FPCLASSPH__
+#define __FPCLASSPH__
+int check_fp_class_hp (_Float16 src, int imm)
+{
+  int qNaN_res = isnan (src);
+  int sNaN_res = isnan (src);
+  int Pzero_res = (src == 0.0);
+  int Nzero_res = (src == -0.0);
+  int PInf_res = (isinf (src) == 1);
+  int NInf_res = (isinf (src) == -1);
+  int Denorm_res = (fpclassify (src) == FP_SUBNORMAL);
+  int FinNeg_res = __builtin_finite (src) && (src < 0);
+
+  int result = (((imm & 1) && qNaN_res)
+   || (((imm >> 1) & 1) && Pzero_res)
+   || (((imm >> 2) & 1) && Nzero_res)
+   || (((imm >> 3) & 1) && PInf_res)
+   || (((imm >> 4) & 1) && NInf_res)
+   || (((imm >> 5) & 1) && Denorm_res)
+   || (((imm >> 6) & 1) && FinNeg_res)
+   || (((imm >> 7) & 1) && sNaN_res));
+  return result;
+}
+#endif
+
+MASK_TYPE
+CALC (_Float16 *s1, int imm)
+{
+  int i;
+  MASK_TYPE res = 0;
+
+  for (i = 0; i < SIZE; i++)
+if (check_fp_class_hp(s1[i], imm))
+  res = res | (1 << i);
+
+  return res;
+}
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, h) src;
+  MASK_TYPE res1, res2, res_ref = 0;
+  MASK_TYPE mask = MASK_VALUE;
+
+  src.a[SIZE - 1] = NAN;
+  src.a[SIZE - 2] = 1.0 / 0.0;
+  for (i = 0; i < SIZE - 2; i++)
+{
+  src.a[i] = -24.43 + 0.6 * i;
+}
+
+  res1 = INTRINSIC (_fpclass_ph_mask) (src.x, 0xFF);
+  res2 = INTRINSIC (_mask_fpclass_ph_mask) (mask, src.x, 0xFF);
+
+  res_ref = CALC (src.a, 0xFF);
+
+  if (res_ref != res1)
+abort ();
+
+  if ((mask & res_ref) != res2)
+abort ();
+}


[gcc r12-10696] i386: Fix vfpclassph non-optimizied intrin

2024-09-03 Thread Haochen Jiang via Gcc-cvs
https://gcc.gnu.org/g:6e59b188c4a051d4f2de5220d30681e6963d96c0

commit r12-10696-g6e59b188c4a051d4f2de5220d30681e6963d96c0
Author: Haochen Jiang 
Date:   Mon Sep 2 15:00:22 2024 +0800

i386: Fix vfpclassph non-optimizied intrin

The non-optimized (macro) versions of the intrinsics have a typo in the mask
type: the mask is cast to __mmask8, which unexpectedly zeroes the high bits
of the __mmask32.

The test does not fail under -O0 with the current 1b testcase because that
testcase is itself wrong: avx512f-mask-type.h must be included after SIZE is
defined, or the mask type will always be __mmask8. The same problem exists in
the AVX10.2 testcases. I will write a separate patch to fix that.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h
(_mm512_mask_fpclass_ph_mask): Correct mask type to __mmask32.
(_mm512_fpclass_ph_mask): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vfpclassph-1c.c: New test.

Diff:
---
 gcc/config/i386/avx512fp16intrin.h |  4 +-
 .../gcc.target/i386/avx512fp16-vfpclassph-1c.c | 77 ++
 2 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index b16ccfcb7f17..6330e57ebb85 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -2321,11 +2321,11 @@ _mm512_fpclass_ph_mask (__m512h __A, const int __imm)
 #else
 #define _mm512_mask_fpclass_ph_mask(u, x, c)   \
   ((__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) (__m512h) (x), \
-(int) (c),(__mmask8)(u)))
+(int) (c),(__mmask32)(u)))
 
 #define _mm512_fpclass_ph_mask(x, c)\
   ((__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) (__m512h) (x), \
-(int) (c),(__mmask8)-1))
+(int) (c),(__mmask32)-1))
 #endif /* __OPIMTIZE__ */
 
 /* Intrinsics vgetexpph, vgetexpsh.  */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1c.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1c.c
new file mode 100644
index 000000000000..4739f1228e32
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1c.c
@@ -0,0 +1,77 @@
+/* { dg-do run } */
+/* { dg-options "-O0 -mavx512fp16" } */
+/* { dg-require-effective-target avx512fp16 } */
+
+#define AVX512FP16
+#include "avx512f-helper.h"
+
+#include 
+#include 
+#include 
+#define SIZE (AVX512F_LEN / 16)
+#include "avx512f-mask-type.h"
+
+#ifndef __FPCLASSPH__
+#define __FPCLASSPH__
+int check_fp_class_hp (_Float16 src, int imm)
+{
+  int qNaN_res = isnan (src);
+  int sNaN_res = isnan (src);
+  int Pzero_res = (src == 0.0);
+  int Nzero_res = (src == -0.0);
+  int PInf_res = (isinf (src) == 1);
+  int NInf_res = (isinf (src) == -1);
+  int Denorm_res = (fpclassify (src) == FP_SUBNORMAL);
+  int FinNeg_res = __builtin_finite (src) && (src < 0);
+
+  int result = (((imm & 1) && qNaN_res)
+   || (((imm >> 1) & 1) && Pzero_res)
+   || (((imm >> 2) & 1) && Nzero_res)
+   || (((imm >> 3) & 1) && PInf_res)
+   || (((imm >> 4) & 1) && NInf_res)
+   || (((imm >> 5) & 1) && Denorm_res)
+   || (((imm >> 6) & 1) && FinNeg_res)
+   || (((imm >> 7) & 1) && sNaN_res));
+  return result;
+}
+#endif
+
+MASK_TYPE
+CALC (_Float16 *s1, int imm)
+{
+  int i;
+  MASK_TYPE res = 0;
+
+  for (i = 0; i < SIZE; i++)
+if (check_fp_class_hp(s1[i], imm))
+  res = res | (1 << i);
+
+  return res;
+}
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, h) src;
+  MASK_TYPE res1, res2, res_ref = 0;
+  MASK_TYPE mask = MASK_VALUE;
+
+  src.a[SIZE - 1] = NAN;
+  src.a[SIZE - 2] = 1.0 / 0.0;
+  for (i = 0; i < SIZE - 2; i++)
+{
+  src.a[i] = -24.43 + 0.6 * i;
+}
+
+  res1 = INTRINSIC (_fpclass_ph_mask) (src.x, 0xFF);
+  res2 = INTRINSIC (_mask_fpclass_ph_mask) (mask, src.x, 0xFF);
+
+  res_ref = CALC (src.a, 0xFF);
+
+  if (res_ref != res1)
+abort ();
+
+  if ((mask & res_ref) != res2)
+abort ();
+}


[gcc r13-9002] i386: Fix vfpclassph non-optimizied intrin

2024-09-03 Thread Haochen Jiang via Gcc-cvs
https://gcc.gnu.org/g:e152aee5709dd3e341ef965450500f754f8b0a46

commit r13-9002-ge152aee5709dd3e341ef965450500f754f8b0a46
Author: Haochen Jiang 
Date:   Mon Sep 2 15:00:22 2024 +0800

i386: Fix vfpclassph non-optimizied intrin

The non-optimized (macro) versions of the intrinsics have a typo in the mask
type: the mask is cast to __mmask8, which unexpectedly zeroes the high bits
of the __mmask32.

The test does not fail under -O0 with the current 1b testcase because that
testcase is itself wrong: avx512f-mask-type.h must be included after SIZE is
defined, or the mask type will always be __mmask8. I will write a separate
patch to fix that on trunk ONLY.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h
(_mm512_mask_fpclass_ph_mask): Correct mask type to __mmask32.
(_mm512_fpclass_ph_mask): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vfpclassph-1c.c: New test.

Diff:
---
 gcc/config/i386/avx512fp16intrin.h |  4 +-
 .../gcc.target/i386/avx512fp16-vfpclassph-1c.c | 77 ++
 2 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index dd083e5ed67b..4702c56c0dc7 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -2322,11 +2322,11 @@ _mm512_fpclass_ph_mask (__m512h __A, const int __imm)
 #else
 #define _mm512_mask_fpclass_ph_mask(u, x, c)   \
   ((__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) (__m512h) (x), \
-(int) (c),(__mmask8)(u)))
+(int) (c),(__mmask32)(u)))
 
 #define _mm512_fpclass_ph_mask(x, c)\
   ((__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) (__m512h) (x), \
-(int) (c),(__mmask8)-1))
+(int) (c),(__mmask32)-1))
 #endif /* __OPIMTIZE__ */
 
 /* Intrinsics vgetexpph, vgetexpsh.  */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1c.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1c.c
new file mode 100644
index 000000000000..4739f1228e32
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1c.c
@@ -0,0 +1,77 @@
+/* { dg-do run } */
+/* { dg-options "-O0 -mavx512fp16" } */
+/* { dg-require-effective-target avx512fp16 } */
+
+#define AVX512FP16
+#include "avx512f-helper.h"
+
+#include 
+#include 
+#include 
+#define SIZE (AVX512F_LEN / 16)
+#include "avx512f-mask-type.h"
+
+#ifndef __FPCLASSPH__
+#define __FPCLASSPH__
+int check_fp_class_hp (_Float16 src, int imm)
+{
+  int qNaN_res = isnan (src);
+  int sNaN_res = isnan (src);
+  int Pzero_res = (src == 0.0);
+  int Nzero_res = (src == -0.0);
+  int PInf_res = (isinf (src) == 1);
+  int NInf_res = (isinf (src) == -1);
+  int Denorm_res = (fpclassify (src) == FP_SUBNORMAL);
+  int FinNeg_res = __builtin_finite (src) && (src < 0);
+
+  int result = (((imm & 1) && qNaN_res)
+   || (((imm >> 1) & 1) && Pzero_res)
+   || (((imm >> 2) & 1) && Nzero_res)
+   || (((imm >> 3) & 1) && PInf_res)
+   || (((imm >> 4) & 1) && NInf_res)
+   || (((imm >> 5) & 1) && Denorm_res)
+   || (((imm >> 6) & 1) && FinNeg_res)
+   || (((imm >> 7) & 1) && sNaN_res));
+  return result;
+}
+#endif
+
+MASK_TYPE
+CALC (_Float16 *s1, int imm)
+{
+  int i;
+  MASK_TYPE res = 0;
+
+  for (i = 0; i < SIZE; i++)
+if (check_fp_class_hp(s1[i], imm))
+  res = res | (1 << i);
+
+  return res;
+}
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, h) src;
+  MASK_TYPE res1, res2, res_ref = 0;
+  MASK_TYPE mask = MASK_VALUE;
+
+  src.a[SIZE - 1] = NAN;
+  src.a[SIZE - 2] = 1.0 / 0.0;
+  for (i = 0; i < SIZE - 2; i++)
+{
+  src.a[i] = -24.43 + 0.6 * i;
+}
+
+  res1 = INTRINSIC (_fpclass_ph_mask) (src.x, 0xFF);
+  res2 = INTRINSIC (_mask_fpclass_ph_mask) (mask, src.x, 0xFF);
+
+  res_ref = CALC (src.a, 0xFF);
+
+  if (res_ref != res1)
+abort ();
+
+  if ((mask & res_ref) != res2)
+abort ();
+}


[gcc r14-10627] i386: Fix vfpclassph non-optimizied intrin

2024-09-03 Thread Haochen Jiang via Gcc-cvs
https://gcc.gnu.org/g:59157c038d683e91c419a1fadd5f91f15218f57b

commit r14-10627-g59157c038d683e91c419a1fadd5f91f15218f57b
Author: Haochen Jiang 
Date:   Mon Sep 2 15:00:22 2024 +0800

i386: Fix vfpclassph non-optimizied intrin

The non-optimized (macro) versions of the intrinsics have a typo in the mask
type: the mask is cast to __mmask8, which unexpectedly zeroes the high bits
of the __mmask32.

The test does not fail under -O0 with the current 1b testcase because that
testcase is itself wrong: avx512f-mask-type.h must be included after SIZE is
defined, or the mask type will always be __mmask8. I will write a separate
patch to fix that on trunk ONLY.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h
(_mm512_mask_fpclass_ph_mask): Correct mask type to __mmask32.
(_mm512_fpclass_ph_mask): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vfpclassph-1c.c: New test.

Diff:
---
 gcc/config/i386/avx512fp16intrin.h |  4 +-
 .../gcc.target/i386/avx512fp16-vfpclassph-1c.c | 77 ++
 2 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index f86050b20873..e8baebd41d3c 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -3961,11 +3961,11 @@ _mm512_fpclass_ph_mask (__m512h __A, const int __imm)
 #else
 #define _mm512_mask_fpclass_ph_mask(u, x, c)   \
   ((__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) (__m512h) (x), \
-(int) (c),(__mmask8)(u)))
+(int) (c),(__mmask32)(u)))
 
 #define _mm512_fpclass_ph_mask(x, c)\
   ((__mmask32) __builtin_ia32_fpclassph512_mask ((__v32hf) (__m512h) (x), \
-(int) (c),(__mmask8)-1))
+(int) (c),(__mmask32)-1))
 #endif /* __OPIMTIZE__ */
 
 /* Intrinsics vgetexpph.  */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1c.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1c.c
new file mode 100644
index 000000000000..4739f1228e32
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfpclassph-1c.c
@@ -0,0 +1,77 @@
+/* { dg-do run } */
+/* { dg-options "-O0 -mavx512fp16" } */
+/* { dg-require-effective-target avx512fp16 } */
+
+#define AVX512FP16
+#include "avx512f-helper.h"
+
+#include 
+#include 
+#include 
+#define SIZE (AVX512F_LEN / 16)
+#include "avx512f-mask-type.h"
+
+#ifndef __FPCLASSPH__
+#define __FPCLASSPH__
+int check_fp_class_hp (_Float16 src, int imm)
+{
+  int qNaN_res = isnan (src);
+  int sNaN_res = isnan (src);
+  int Pzero_res = (src == 0.0);
+  int Nzero_res = (src == -0.0);
+  int PInf_res = (isinf (src) == 1);
+  int NInf_res = (isinf (src) == -1);
+  int Denorm_res = (fpclassify (src) == FP_SUBNORMAL);
+  int FinNeg_res = __builtin_finite (src) && (src < 0);
+
+  int result = (((imm & 1) && qNaN_res)
+   || (((imm >> 1) & 1) && Pzero_res)
+   || (((imm >> 2) & 1) && Nzero_res)
+   || (((imm >> 3) & 1) && PInf_res)
+   || (((imm >> 4) & 1) && NInf_res)
+   || (((imm >> 5) & 1) && Denorm_res)
+   || (((imm >> 6) & 1) && FinNeg_res)
+   || (((imm >> 7) & 1) && sNaN_res));
+  return result;
+}
+#endif
+
+MASK_TYPE
+CALC (_Float16 *s1, int imm)
+{
+  int i;
+  MASK_TYPE res = 0;
+
+  for (i = 0; i < SIZE; i++)
+if (check_fp_class_hp(s1[i], imm))
+  res = res | (1 << i);
+
+  return res;
+}
+
+void
+TEST (void)
+{
+  int i;
+  UNION_TYPE (AVX512F_LEN, h) src;
+  MASK_TYPE res1, res2, res_ref = 0;
+  MASK_TYPE mask = MASK_VALUE;
+
+  src.a[SIZE - 1] = NAN;
+  src.a[SIZE - 2] = 1.0 / 0.0;
+  for (i = 0; i < SIZE - 2; i++)
+{
+  src.a[i] = -24.43 + 0.6 * i;
+}
+
+  res1 = INTRINSIC (_fpclass_ph_mask) (src.x, 0xFF);
+  res2 = INTRINSIC (_mask_fpclass_ph_mask) (mask, src.x, 0xFF);
+
+  res_ref = CALC (src.a, 0xFF);
+
+  if (res_ref != res1)
+abort ();
+
+  if ((mask & res_ref) != res2)
+abort ();
+}


[gcc/devel/omp/gcc-14] (100 commits) Merge branch 'releases/gcc-14' into devel/omp/gcc-14

2024-09-03 Thread Tobias Burnus via Gcc-cvs
The branch 'devel/omp/gcc-14' was updated to point to:

 682fd948f835... Merge branch 'releases/gcc-14' into devel/omp/gcc-14

It previously pointed to:

 6d3c68ff05cf... amdgcn: Fix VGPR max count

Diff:

Summary of changes (added commits):
---

  682fd94... Merge branch 'releases/gcc-14' into devel/omp/gcc-14
  59157c0... i386: Fix vfpclassph non-optimizied intrin (*)
  2ac3806... Daily bump. (*)
  ba9a3f1... Check avx upper register for parallel. (*)
  db4d810... Daily bump. (*)
  a2e32a8... Daily bump. (*)
  657bf4a... Daily bump. (*)
  5999dd8... Fortran: fix ICE with use with rename of namelist member [P (*)
  552c7c1... Daily bump. (*)
  d3c14d4... Daily bump. (*)
  4d6c0c0... Add gcc ka.po (*)
  f5b3dae... i386: testsuite: Adapt fentryname3.c for r14-811 change [PR (*)
  377c3e9... i386: testsuite: Add -no-pie for pr113689-1.c [PR70150] (*)
  87aea23... Daily bump. (*)
  90b1232... Update gcc zh_CN.po (*)
  75892d9... MIPS: Include missing mips16.S in libgcc/lib1funcs.S (*)
  ef9c53b... Daily bump. (*)
  b414466... Daily bump. (*)
  5b75e1c... Daily bump. (*)
  8de3153... Daily bump. (*)
  27dc153... Align ix86_{move_max,store_max} with vectorizer. (*)
  ffd458d... Daily bump. (*)
  5146af5... Daily bump. (*)
  25812d8... [testsuite] [arm] [vect] adjust mve-vshr test [PR113281] (*)
  76ac167... Daily bump. (*)
  52da858... c++: fix ICE in convert_nontype_argument [PR116384] (*)
  af97b5e... testsuite: Prune warning about size of enums (*)
  1fad6ad... Daily bump. (*)
  c725748... AVR: ad target/116407 - Fix linker error "relocation trunca (*)
  919c42b... AVR: target/116407 - Fix linker error "relocation truncated (*)
  f4ce098... Daily bump. (*)
  f3d9c12... AVR: target/116390 - Fix an avrtiny asm out template. (*)
  0296001... Daily bump. (*)
  507b4e1... AVR: target/85624 - Use HImode for clrmemqi alignment. (*)
  edf95a4... testsuite: Verify -fshort-enums and -fno-short-enums in pr3 (*)
  5c1f687... testsuite: Add -fno-short-enums to pr97315-1.C (*)
  345d145... testsuite: Add -fwrapv to signbit-5.c (*)
  45a771d... i386: Fix some vex insns that prohibit egpr (*)
  86dacfb... aarch64: Add another use of force_subreg [PR115464] (*)
  32b2129... aarch64: Fix invalid nested subregs [PR115464] (*)
  4e7735a... Move ix86_align_loops into a separate pass and insert the p (*)
  ccca8df... Daily bump. (*)
  63c51e0... c++/coroutines: fix passing *this to promise type, again [P (*)
  d9bd361... [PATCH] RISC-V: Fix unresolved mcpu-[67].c tests (*)
  8c98f06... RISC-V: Make full-vec-move1.c test robust for optimization (*)
  7268985... Daily bump. (*)
  e903ada... s390: Fix high-level builtins vec_gfmsum{,_accum}_128 (*)
  5a63e19... Daily bump. (*)
  7d9bb37... Add -mcpu=power11 support. (*)
  f688431... Daily bump. (*)
  6bfd78c... Daily bump. (*)
  534ffe7... Daily bump. (*)
  6f1e687... Daily bump. (*)
  b0dd13e... i386: Fix up __builtin_ia32_b{extr{,i}_u{32,64},zhi_{s,d}i} (*)
  897cd79... Daily bump. (*)
  9ca1d7a... AVR: target/116295 - Fix unrecognizable insn with __flash r (*)
  a9255df... Daily bump. (*)
  49e8eee... Daily bump. (*)
  b1102f7... c++: alias and non-type template parm [PR116223] (*)
  987fc81... c++: parse error with -std=c++14 -fconcepts [PR116071] (*)
  ba26c47... hppa: Fix (plus (plus (mult (a) (mem_shadd_constant)) (b))  (*)
  f2b5ca6... wide-int: Fix up mul_internal overflow checking [PR116224] (*)
  3fe5720... libquadmath: Fix up libquadmath/math/sqrtq.c compilation in (*)
  cad2693... fortran: Fix up pasto in gfc_get_array_descr_info (*)
  ba45573... sh: Don't call make_insn_raw in sh_recog_treg_set_expr [PR1 (*)
  c5ef3b9... Daily bump. (*)
  de73898... compiler: panic arguments are empty interface type (*)
  2405d29... libgomp: Remove bogus warnings from privatized-ref-2.f90. (*)
  9906a98... Fortran: Suppress bogus used uninitialized warnings [PR1088 (*)
  daced76... Update gcc fr.po (*)
  eccf707... RISC-V: xtheadmemidx: Fix mode test for pre/post-modify add (*)
  5103ee7... Daily bump. (*)
  80a64e6... Daily bump. (*)
  c386665... libstdc++: Fix __cpp_lib_chrono for old std::string ABI (*)
  99eb84f... Daily bump. (*)
  21e2d27... Update gcc .po files (*)
  14fa2b2... forwprop: Don't add uses to dce list if debug statement [PR (*)
  a295076... Refine constraint "Bk" to define_special_memory_constraint. (*)
  30f4fa3... i386: Add non-optimize prefetchi intrins (*)
  79d32ba... LoongArch: Remove gawk extension from a generator script. (*)
  81db685... c++: generic lambda in default template argument [PR88313] (*)
  37e54ff... c++: alias of alias tmpl with dependent attrs [PR115897] (*)
  59e3934... libstdc++: fix uses of explicit object parameter [PR116038] (*)
  241f710... c++: normalizing ttp constraints [PR115656] (*)
  e548a88... c++: missing SFINAE during alias CTAD [PR115296] (*)
  1287b4a... c++: prev declared hidden tmpl friend inst [PR112288] (*)
  fb8da40... Daily bump. (*)
  c637241... libstdc++: Add [[nodiscard]] to some std::l

[gcc/devel/omp/gcc-14] Merge branch 'releases/gcc-14' into devel/omp/gcc-14

2024-09-03 Thread Tobias Burnus via Gcc-cvs
https://gcc.gnu.org/g:682fd948f835fd5ada2de45988448c91e10f5016

commit 682fd948f835fd5ada2de45988448c91e10f5016
Merge: 6d3c68ff05cf 59157c038d68
Author: Tobias Burnus 
Date:   Tue Sep 3 10:54:46 2024 +0200

Merge branch 'releases/gcc-14' into devel/omp/gcc-14

Merge up to r14-10627-g59157c038d683e (3rd Sep 2024)

Diff:

 gcc/ChangeLog  |   252 +
 gcc/DATESTAMP  | 2 +-
 gcc/config.gcc | 4 +-
 gcc/config/aarch64/aarch64-sve-builtins-base.cc| 6 +-
 gcc/config/avr/avr-protos.h| 2 +-
 gcc/config/avr/avr.cc  |45 +-
 gcc/config/avr/avr.md  |12 +-
 gcc/config/i386/avx512fp16intrin.h | 4 +-
 gcc/config/i386/constraints.md | 2 +-
 gcc/config/i386/i386-features.cc   |   191 +
 gcc/config/i386/i386-options.cc| 6 +
 gcc/config/i386/i386-passes.def| 3 +
 gcc/config/i386/i386-protos.h  | 1 +
 gcc/config/i386/i386.cc|   194 +-
 gcc/config/i386/prfchiintrin.h | 9 +
 gcc/config/i386/sse.md |49 +-
 gcc/config/loongarch/genopts/gen-evolution.awk | 7 +-
 gcc/config/pa/pa.cc| 1 +
 gcc/config/riscv/thead.cc  | 6 +-
 gcc/config/rs6000/aix71.h  | 1 +
 gcc/config/rs6000/aix72.h  | 1 +
 gcc/config/rs6000/aix73.h  | 1 +
 gcc/config/rs6000/driver-rs6000.cc | 2 +
 gcc/config/rs6000/power10.md   |   144 +-
 gcc/config/rs6000/ppc-auxv.h   | 3 +-
 gcc/config/rs6000/rs6000-builtin.cc| 1 +
 gcc/config/rs6000/rs6000-c.cc  | 2 +
 gcc/config/rs6000/rs6000-cpus.def  | 5 +
 gcc/config/rs6000/rs6000-opts.h| 1 +
 gcc/config/rs6000/rs6000-string.cc | 1 +
 gcc/config/rs6000/rs6000-tables.opt|11 +-
 gcc/config/rs6000/rs6000.cc|32 +-
 gcc/config/rs6000/rs6000.h | 1 +
 gcc/config/rs6000/rs6000.md| 2 +-
 gcc/config/rs6000/rs6000.opt   | 6 +
 gcc/config/s390/s390-builtin-types.def | 2 +
 gcc/config/s390/s390-builtins.def  | 2 +
 gcc/config/s390/vecintrin.h| 4 +-
 gcc/config/sh/sh.cc|12 +-
 gcc/cp/ChangeLog   |   100 +
 gcc/cp/constraint.cc   | 9 +-
 gcc/cp/coroutines.cc   | 8 +-
 gcc/cp/cp-tree.h   | 2 +-
 gcc/cp/parser.cc   |44 +-
 gcc/cp/pt.cc   |25 +-
 gcc/cp/tree.cc |51 +-
 gcc/doc/invoke.texi| 2 +-
 gcc/explow.cc  |15 +
 gcc/explow.h   | 2 +
 gcc/fortran/ChangeLog  |32 +
 gcc/fortran/gfortran.h | 4 +
 gcc/fortran/trans-array.cc |43 +
 gcc/fortran/trans-io.cc| 3 +-
 gcc/fortran/trans-types.cc | 7 +-
 gcc/go/gofrontend/expressions.cc   | 6 +
 gcc/po/ChangeLog   |18 +
 gcc/po/be.po   |  7712 +-
 gcc/po/da.po   |  7730 +-
 gcc/po/de.po   |  7727 +-
 gcc/po/el.po   |  7709 +-
 gcc/po/es.po   |  7737 +-
 gcc/po/fi.po   |  7718 +-
 gcc/po/fr.po   |  7729 +-
 gcc/po/hr.po   |  7715 +-
 gcc/po/id.po   |  7727 +-
 gcc/po/ja.po   |  7713 +-
 gcc/po/ka.po   | 83090 +++
 gcc/po/nl.po   |  7719 +-
 gcc/po/ru.po   |  7732 +-
 gcc/po/sr.po   |  7725 +-
 gcc/po/sv.po   |  7717 +-
 gcc/po/tr.po   |  7741 +-
 gcc/po/uk.po   |  7734 +-
 gcc/po/vi.po   |  7725 +-
 gcc/po/zh_CN.

[gcc r15-3411] tree-optimization/116575 - avoid ICE with SLP mask_load_lane

2024-09-03 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:ac6cd62a351a8f1f3637a2552c74eb5eb51cfdda

commit r15-3411-gac6cd62a351a8f1f3637a2552c74eb5eb51cfdda
Author: Richard Biener 
Date:   Tue Sep 3 09:23:20 2024 +0200

tree-optimization/116575 - avoid ICE with SLP mask_load_lane

The following avoids performing re-discovery with single lanes in
an attempt to force the use of mask_load_lane, as rediscovery will
fail since a single lane of a mask load will appear permuted, which
isn't supported.

PR tree-optimization/116575
* tree-vect-slp.cc (vect_analyze_slp): Properly compute
the mask argument for vect_load/store_lanes_supported.
When the load is masked for now avoid rediscovery.

* gcc.dg/vect/pr116575.c: New testcase.

Diff:
---
 gcc/testsuite/gcc.dg/vect/pr116575.c | 15 +++
 gcc/tree-vect-slp.cc | 19 +--
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr116575.c 
b/gcc/testsuite/gcc.dg/vect/pr116575.c
new file mode 100644
index ..2047041ca64b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr116575.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+int a;
+float *b, *c;
+void d(char * __restrict e)
+{
+  for (; a; a++, b += 4, c += 4)
+if (*e++) {
+   float *f = c;
+   f[0] = b[0];
+   f[1] = b[1];
+   f[2] = b[2];
+   f[3] = b[3];
+}
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 2302d91fd23f..1342913affa1 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -4720,11 +4720,16 @@ vect_analyze_slp (vec_info *vinfo, unsigned 
max_tree_size)
  }
}
 
+ gimple *rep = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (slp_root));
+ bool masked = (is_gimple_call (rep)
+&& gimple_call_internal_p (rep)
+&& internal_fn_mask_index
+ (gimple_call_internal_fn (rep)) != -1);
  /* If the loads and stores can use load/store-lanes force re-discovery
 with single lanes.  */
  if (loads_permuted
  && !slp_root->ldst_lanes
- && vect_store_lanes_supported (vectype, group_size, false)
+ && vect_store_lanes_supported (vectype, group_size, masked)
  != IFN_LAST)
{
  bool can_use_lanes = true;
@@ -4734,13 +4739,23 @@ vect_analyze_slp (vec_info *vinfo, unsigned 
max_tree_size)
  {
stmt_vec_info stmt_vinfo = DR_GROUP_FIRST_ELEMENT
(SLP_TREE_REPRESENTATIVE (load_node));
+   rep = STMT_VINFO_STMT (stmt_vinfo);
+   masked = (is_gimple_call (rep)
+ && gimple_call_internal_p (rep)
+ && internal_fn_mask_index
+  (gimple_call_internal_fn (rep)));
/* Use SLP for strided accesses (or if we can't
   load-lanes).  */
if (STMT_VINFO_STRIDED_P (stmt_vinfo)
|| compare_step_with_zero (vinfo, stmt_vinfo) <= 0
|| vect_load_lanes_supported
 (STMT_VINFO_VECTYPE (stmt_vinfo),
- DR_GROUP_SIZE (stmt_vinfo), false) == IFN_LAST)
+ DR_GROUP_SIZE (stmt_vinfo), masked) == IFN_LAST
+   /* ???  During SLP re-discovery with a single lane
+  a masked grouped load will appear permuted and
+  discovery will fail.  We have to rework this
+  on the discovery side - for now avoid ICEing.  */
+   || masked)
  {
can_use_lanes = false;
break;


[gcc r15-3412] MAINTAINERS: Update my email address

2024-09-03 Thread Szabolcs Nagy via Gcc-cvs
https://gcc.gnu.org/g:ce5f2dc45038c9806088132cc923b13719f48732

commit r15-3412-gce5f2dc45038c9806088132cc923b13719f48732
Author: Szabolcs Nagy 
Date:   Mon Sep 2 13:53:52 2024 +0100

MAINTAINERS: Update my email address

* MAINTAINERS: Update my email address and add myself to DCO.

Diff:
---
 MAINTAINERS | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 07ea5f5b6e12..cfd96c9f33ec 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -676,7 +676,7 @@ Christoph Müllner   cmuellner   

 Steven Munroe   munroesj
 Philippe De Muyter  -   
 Joseph Myersjsm28   
-Szabolcs Nagy   nsz 
+Szabolcs Nagy   nsz 
 Victor Do Nascimentovictorldn   
 Quentin Neill   qneill  
 Adam Nemet  nemet   
@@ -927,6 +927,7 @@ H.J. Lu 

 Matthew Malcomson   
 Immad Mir   
 Gaius Mulley
+Szabolcs Nagy   
 Andrew Pinski   
 Siddhesh Poyarekar  
 Ramana Radhakrishnan


[gcc r15-3413] LTO/WPA: Ensure that output_offload_tables only writes table once [PR116535]

2024-09-03 Thread Tobias Burnus via Gcc-cvs
https://gcc.gnu.org/g:2fcccf21a34f92ea060b492c9b2aecb56cd5d167

commit r15-3413-g2fcccf21a34f92ea060b492c9b2aecb56cd5d167
Author: Tobias Burnus 
Date:   Tue Sep 3 12:02:23 2024 +0200

LTO/WPA: Ensure that output_offload_tables only writes table once [PR116535]

When ltrans was written concurrently, e.g. via -flto=N (N > 1, assuming
sufficient partitions, e.g., via -flto-partition=max),
output_offload_tables wrote the output tables once per fork.
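
The idea behind the fix can be pictured with a small sketch. It assumes
hypothetical stand-ins (OutState, stream_partition, stream_all) rather than
the GCC functions touched by the patch; only the field name
output_offload_tables_p and the `part == 0` test come from the diff below.

```cpp
// Illustrative sketch only; these names are hypothetical stand-ins,
// not the GCC functions touched by the patch.
struct OutState {
  bool output_offload_tables_p;  // mirrors the new lto_out_decl_state field
};

// Returns how many offload tables this partition wrote (0 or 1).
int stream_partition(const OutState& state) {
  return state.output_offload_tables_p ? 1 : 0;
}

// Stream every partition; only partition 0 is asked to write the tables,
// analogous to passing `part == 0` in lto.cc.
int stream_all(int partitions) {
  int tables_written = 0;
  for (int part = 0; part < partitions; ++part) {
    OutState state{part == 0};
    tables_written += stream_partition(state);
  }
  return tables_written;
}
```

Because the decision depends only on the partition index, it does not rely
on state being shared between concurrently running writers.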

PR lto/116535

gcc/ChangeLog:

* lto-cgraph.cc (output_offload_tables): Remove offload_ frees.
* lto-streamer-out.cc (lto_output): Make call to it depend on
lto_get_out_decl_state ()->output_offload_tables_p.
* lto-streamer.h (struct lto_out_decl_state): Add
output_offload_tables_p field.
* tree-pass.h (ipa_write_optimization_summaries): Add bool argument.
* passes.cc (ipa_write_summaries_1): Add bool
output_offload_tables_p arg.
(ipa_write_summaries): Update call.
(ipa_write_optimization_summaries): Accept output_offload_tables_p.

gcc/lto/ChangeLog:

* lto.cc (stream_out): Update call to
ipa_write_optimization_summaries to pass true for first partition.

Diff:
---
 gcc/lto-cgraph.cc   | 10 --
 gcc/lto-streamer-out.cc |  3 ++-
 gcc/lto-streamer.h  |  3 +++
 gcc/lto/lto.cc  |  2 +-
 gcc/passes.cc   | 11 ---
 gcc/tree-pass.h |  3 ++-
 6 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
index 6395033ab9df..1492409427c9 100644
--- a/gcc/lto-cgraph.cc
+++ b/gcc/lto-cgraph.cc
@@ -1139,16 +1139,6 @@ output_offload_tables (void)
 
   streamer_write_uhwi_stream (ob->main_stream, 0);
   lto_destroy_simple_output_block (ob);
-
-  /* In WHOPR mode during the WPA stage the joint offload tables need to be
- streamed to one partition only.  That's why we free offload_funcs and
- offload_vars after the first call of output_offload_tables.  */
-  if (flag_wpa)
-{
-  vec_free (offload_funcs);
-  vec_free (offload_vars);
-  vec_free (offload_ind_funcs);
-}
 }
 
 /* Verify the partitioning of NODE.  */
diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index 523d6dad221e..a4b171358d41 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -2829,7 +2829,8 @@ lto_output (void)
  statements using the statement UIDs.  */
   output_symtab ();
 
-  output_offload_tables ();
+  if (lto_get_out_decl_state ()->output_offload_tables_p)
+output_offload_tables ();
 
   if (flag_checking)
 {
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 79c44d2cae71..4da1a3efe033 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -531,6 +531,9 @@ struct lto_out_decl_state
 
   /* True if decl state is compressed.  */
   bool compressed;
+
+  /* True if offload tables should be output. */
+  bool output_offload_tables_p;
 };
 
 typedef struct lto_out_decl_state *lto_out_decl_state_ptr;
diff --git a/gcc/lto/lto.cc b/gcc/lto/lto.cc
index 52dd436fd9a1..1ee215d8f1d3 100644
--- a/gcc/lto/lto.cc
+++ b/gcc/lto/lto.cc
@@ -178,7 +178,7 @@ stream_out (char *temp_filename, lto_symtab_encoder_t 
encoder, int part)
 
   gcc_assert (!dump_file);
   streamer_dump_file = dump_begin (TDI_lto_stream_out, NULL, part);
-  ipa_write_optimization_summaries (encoder);
+  ipa_write_optimization_summaries (encoder, part == 0);
 
   free (CONST_CAST (char *, file->filename));
 
diff --git a/gcc/passes.cc b/gcc/passes.cc
index d73f8ba97b64..057850f4decb 100644
--- a/gcc/passes.cc
+++ b/gcc/passes.cc
@@ -2829,11 +2829,13 @@ ipa_write_summaries_2 (opt_pass *pass, struct 
lto_out_decl_state *state)
summaries.  SET is the set of nodes to be written.  */
 
 static void
-ipa_write_summaries_1 (lto_symtab_encoder_t encoder)
+ipa_write_summaries_1 (lto_symtab_encoder_t encoder,
+  bool output_offload_tables_p)
 {
   pass_manager *passes = g->get_passes ();
   struct lto_out_decl_state *state = lto_new_out_decl_state ();
   state->symtab_node_encoder = encoder;
+  state->output_offload_tables_p = output_offload_tables_p;
 
   lto_output_init_mode_table ();
   lto_push_out_decl_state (state);
@@ -2897,7 +2899,8 @@ ipa_write_summaries (void)
 if (vnode->need_lto_streaming)
   lto_set_symtab_encoder_in_partition (encoder, vnode);
 
-  ipa_write_summaries_1 (compute_ltrans_boundary (encoder));
+  ipa_write_summaries_1 (compute_ltrans_boundary (encoder),
+flag_generate_offload);
 
   free (order);
   if (streamer_dump_file)
@@ -2952,10 +2955,12 @@ ipa_write_optimization_summaries_1 (opt_pass *pass,
NULL, write out all summaries of all nodes. */
 
 void
-ipa_write_optimization_summaries (lto_symtab_encoder_t encoder)
+ipa_write_optimization_summaries (lto_symtab_encoder_t encoder,
+   

[gcc r15-3414] Zen5 tuning part 1: avoid FMA chains

2024-09-03 Thread Jan Hubicka via Gcc-cvs
https://gcc.gnu.org/g:d6360b4083695970789fd65b9c515c11a5ce25b4

commit r15-3414-gd6360b4083695970789fd65b9c515c11a5ce25b4
Author: Jan Hubicka 
Date:   Tue Sep 3 13:38:33 2024 +0200

Zen5 tuning part 1: avoid FMA chains

Testing matrix multiplication benchmarks shows that FMA on a critical chain
is a performance loss over separate multiply and add. While the latency of 4
is lower than multiply + add (3+2), the problem is that all values need to
be ready before computation starts.

While on znver4 AVX512 code fared well with FMA, it was because of the split
registers. Znver5 benefits from avoiding FMA on all widths.  This may be
different with the mobile version though.

On a naive matrix multiplication benchmark the difference is 8% with -O3
only, since with -Ofast loop interchange solves the problem differently.
It is a 30% win, for example, on s323 from TSVC:

real_t s323(struct args_t * func_args)
{

//recurrences
//coupled recurrence

initialise_arrays(__func__);
gettimeofday(&func_args->t1, NULL);

for (int nl = 0; nl < iterations/2; nl++) {
for (int i = 1; i < LEN_1D; i++) {
a[i] = b[i-1] + c[i] * d[i];
b[i] = a[i] + c[i] * e[i];
}
dummy(a, b, c, d, e, aa, bb, cc, 0.);
}

gettimeofday(&func_args->t2, NULL);
return calc_checksum(__func__);
}
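
The latency argument can be made concrete with a deliberately simplified
critical-path model of one iteration of the s323 recurrence. The three
latencies are the ones quoted above; the model itself and all names in it
are illustrative assumptions, not measurements.

```cpp
// Simplified critical-path model for one iteration of the s323 recurrence:
//   a[i] = b[i-1] + c[i] * d[i];
//   b[i] = a[i]   + c[i] * e[i];
// Latencies are those quoted in the commit message; the model is a sketch.
constexpr int fma_latency = 4;
constexpr int mul_latency = 3;
constexpr int add_latency = 2;

// With FMA, each operation must wait for the loop-carried value before it
// can start, so both full FMA latencies sit on the carried chain.
constexpr int chain_with_fma() { return 2 * fma_latency; }

// With separate multiply and add, the products c[i]*d[i] and c[i]*e[i] do
// not depend on the carried value and can be computed ahead of time; only
// the two adds remain on the carried chain.
constexpr int chain_with_mul_add() { return 2 * add_latency; }

// Latency of a single isolated multiply followed by an add, for comparison.
constexpr int isolated_mul_add() { return mul_latency + add_latency; }
```

Under these assumptions the carried chain costs 8 cycles per iteration with
FMA and 4 with separate multiply and add, even though an isolated FMA (4)
is faster than an isolated multiply plus add (5).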

gcc/ChangeLog:

* config/i386/x86-tune.def (X86_TUNE_AVOID_128FMA_CHAINS): Enable
for znver5.
(X86_TUNE_AVOID_256FMA_CHAINS): Likewise.
(X86_TUNE_AVOID_512FMA_CHAINS): Likewise.

Diff:
---
 gcc/config/i386/x86-tune.def | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 3d29bffc49c3..da1a3d6a3c6c 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -508,17 +508,18 @@ DEF_TUNE (X86_TUNE_USE_SCATTER_8PARTS, 
"use_scatter_8parts",
 
 /* X86_TUNE_AVOID_128FMA_CHAINS: Avoid creating loops with tight 128bit or
smaller FMA chain.  */
-DEF_TUNE (X86_TUNE_AVOID_128FMA_CHAINS, "avoid_fma_chains", m_ZNVER1 | 
m_ZNVER2 | m_ZNVER3 | m_ZNVER4
+DEF_TUNE (X86_TUNE_AVOID_128FMA_CHAINS, "avoid_fma_chains", m_ZNVER
   | m_YONGFENG | m_SHIJIDADAO | m_GENERIC)
 
 /* X86_TUNE_AVOID_256FMA_CHAINS: Avoid creating loops with tight 256bit or
smaller FMA chain.  */
-DEF_TUNE (X86_TUNE_AVOID_256FMA_CHAINS, "avoid_fma256_chains", m_ZNVER2 | 
m_ZNVER3 | m_ZNVER4
- | m_CORE_HYBRID | m_SAPPHIRERAPIDS | m_CORE_ATOM | m_GENERIC)
+DEF_TUNE (X86_TUNE_AVOID_256FMA_CHAINS, "avoid_fma256_chains",
+ m_ZNVER2 | m_ZNVER3 | m_ZNVER4 | m_ZNVER5 | m_CORE_HYBRID
+ | m_SAPPHIRERAPIDS | m_CORE_ATOM | m_GENERIC)
 
 /* X86_TUNE_AVOID_512FMA_CHAINS: Avoid creating loops with tight 512bit or
smaller FMA chain.  */
-DEF_TUNE (X86_TUNE_AVOID_512FMA_CHAINS, "avoid_fma512_chains", m_NONE)
+DEF_TUNE (X86_TUNE_AVOID_512FMA_CHAINS, "avoid_fma512_chains", m_ZNVER5)
 
 /* X86_TUNE_V2DF_REDUCTION_PREFER_PHADDPD: Prefer haddpd
for v2df vector reduction.  */


[gcc r15-3415] [PR target/115921] Improve reassociation for rv64

2024-09-03 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:4371f656288f461335c47e98b8c038937a89764a

commit r15-3415-g4371f656288f461335c47e98b8c038937a89764a
Author: Jeff Law 
Date:   Tue Sep 3 06:45:30 2024 -0600

[PR target/115921] Improve reassociation for rv64

As Jovan pointed out in pr115921, we're not reassociating expressions like
this on rv64:

(x & 0x3f) << 12

It generates something like this:

li      a5,258048
slli    a0,a0,12
and     a0,a0,a5

We have a pattern that's designed to clean this up, essentially
reassociating the operations so that we don't need to load the constant,
resulting in something like this:

andi    a0,a0,63
slli    a0,a0,12

That pattern wasn't working for certain constants due to its condition. The
condition is trying to avoid cases where this kind of reassociation would
hinder shadd generation on rv64.  That condition was just written poorly.

This patch tightens up that condition in a few ways.  First, there's no
need to worry about shadd cases if ZBA is not enabled.  Second, we can't use
shadd if the shift value isn't 1, 2 or 3.  Finally, rather than open-coding
one of the tests, we can use an existing operand predicate.

The net is we'll start performing this transformation in more cases on rv64
while still avoiding reassociation if it would spoil shadd generation.
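
The equivalence the pattern relies on can be checked with a short sketch.
It uses the mask 63 (0x3f) that appears in the assembly above; the function
names are hypothetical and exist only for illustration.

```cpp
#include <cstdint>

// Shift-then-mask form: needs the wide constant 0x3f << 12, which does not
// fit in a 12-bit immediate and has to be loaded into a register first.
constexpr std::uint64_t shift_then_mask(std::uint64_t x) {
  return (x << 12) & (std::uint64_t{0x3f} << 12);
}

// Mask-then-shift form: the mask 63 fits in an immediate, so no separate
// constant load is required.
constexpr std::uint64_t mask_then_shift(std::uint64_t x) {
  return (x & 0x3f) << 12;
}

// The constant materialised by the `li` in the first sequence.
static_assert((std::uint64_t{0x3f} << 12) == 258048, "0x3f << 12");
```

Both functions compute the same value for every input, which is why the
operations can be reordered whenever doing so does not get in the way of
shadd generation.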

PR target/115921
gcc/
* config/riscv/riscv.md (reassociate bitwise ops): Tighten test for
cases we do not want to reassociate.

gcc/testsuite/
* gcc.target/riscv/pr115921.c: New test.

Diff:
---
 gcc/J | 1064 +
 gcc/config/riscv/riscv.md |   10 +-
 gcc/testsuite/gcc.target/riscv/pr115921.c |   13 +
 3 files changed, 1083 insertions(+), 4 deletions(-)

diff --git a/gcc/J b/gcc/J
new file mode 100644
index ..6b6332dd2c13
--- /dev/null
+++ b/gcc/J
@@ -0,0 +1,1064 @@
+a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +
1) /* RTL-based forward propagation pass for GNU compiler.
+a945c346f57b gcc/fwprop.cc (Jakub Jelinek   2024-01-03 12:19:35 +0100
2)Copyright (C) 2005-2024 Free Software Foundation, Inc.
+a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +
3)Contributed by Paolo Bonzini and Steven Bosscher.
+a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +
4) 
+a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +
5) This file is part of GCC.
+a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +
6) 
+a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +
7) GCC is free software; you can redistribute it and/or modify it under
+a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +
8) the terms of the GNU General Public License as published by the Free
+9dcd6f09a3dd gcc/fwprop.c  (Nick Clifton2007-07-26 08:37:01 +
9) Software Foundation; either version 3, or (at your option) any later
+a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +   
10) version.
+a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +   
11) 
+a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +   
12) GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +   
13) WARRANTY; without even the implied warranty of MERCHANTABILITY or
+a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +   
14) FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +   
15) for more details.
+a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +   
16) 
+a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +   
17) You should have received a copy of the GNU General Public License
+9dcd6f09a3dd gcc/fwprop.c  (Nick Clifton2007-07-26 08:37:01 +   
18) along with GCC; see the file COPYING3.  If not see
+9dcd6f09a3dd gcc/fwprop.c  (Nick Clifton2007-07-26 08:37:01 +   
19) .  */
+a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +   
20) 
+0b76990a9d75 gcc/fwprop.c  (Richard Sandiford   2020-12-17 00:15:12 +   
21) #define INCLUDE_ALGORITHM
+0b76990a9d75 gcc/fwprop.c  (Richard Sandiford   2020-12-17 00:15:12 +   
22) #define INCLUDE_FUNCTIONAL
+d6849aa92666 gcc/fwprop.cc (Richard Sandiford   2024-07-25 13:25:32 +0100   
23) #define INCLUDE_ARRAY
+a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +   
24) #include "config.h"
+a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:

[gcc r15-3416] ipa: Don't disable function parameter analysis for fat LTO

2024-09-03 Thread H.J. Lu via Gcc-cvs
https://gcc.gnu.org/g:2f1689ea8e631ebb4ff3720d56ef0362f5898ff6

commit r15-3416-g2f1689ea8e631ebb4ff3720d56ef0362f5898ff6
Author: H.J. Lu 
Date:   Tue Aug 27 13:11:39 2024 -0700

ipa: Don't disable function parameter analysis for fat LTO

Update analyze_parms not to disable function parameter analysis for
-ffat-lto-objects.  Tested on x86-64, there are no differences in zstd
with "-O2 -flto=auto -g" vs "-O2 -flto=auto -g -ffat-lto-objects".
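
The effect of turning `else if` into `if` can be sketched as follows. The
Summary type and both functions are hypothetical stand-ins for the modref
summaries, and the sketch assumes the branch before the `else` tests the
non-LTO summary, as the diff context suggests.

```cpp
// Hypothetical stand-in for modref_summary / modref_summary_lto.
struct Summary {
  int arg_flags = 0;
};

// Before the patch: when both summaries exist, the LTO one is skipped.
void record_before(Summary* summary, Summary* summary_lto, int flags) {
  if (summary)
    summary->arg_flags = flags;
  else if (summary_lto)
    summary_lto->arg_flags = flags;
}

// After the patch: each summary that exists is updated independently.
void record_after(Summary* summary, Summary* summary_lto, int flags) {
  if (summary)
    summary->arg_flags = flags;
  if (summary_lto)
    summary_lto->arg_flags = flags;
}
```

With only one summary present the two versions behave identically; they
differ only when both are present.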

PR ipa/116410
* ipa-modref.cc (analyze_parms): Always analyze function parameter
for LTO.

Signed-off-by: H.J. Lu 

Diff:
---
 gcc/ipa-modref.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/ipa-modref.cc b/gcc/ipa-modref.cc
index 59cfe91f987a..9275030c2546 100644
--- a/gcc/ipa-modref.cc
+++ b/gcc/ipa-modref.cc
@@ -2975,7 +2975,7 @@ analyze_parms (modref_summary *summary, 
modref_summary_lto *summary_lto,
summary->arg_flags.safe_grow_cleared (count, true);
  summary->arg_flags[parm_index] = EAF_UNUSED;
}
- else if (summary_lto)
+ if (summary_lto)
{
  if (parm_index >= summary_lto->arg_flags.length ())
summary_lto->arg_flags.safe_grow_cleared (count, true);
@@ -3034,7 +3034,7 @@ analyze_parms (modref_summary *summary, 
modref_summary_lto *summary_lto,
summary->arg_flags.safe_grow_cleared (count, true);
  summary->arg_flags[parm_index] = flags;
}
- else if (summary_lto)
+ if (summary_lto)
{
  if (parm_index >= summary_lto->arg_flags.length ())
summary_lto->arg_flags.safe_grow_cleared (count, true);


[gcc r15-3417] Zen5 tuning part 2: disable gather and scatter

2024-09-03 Thread Jan Hubicka via Gcc-cvs
https://gcc.gnu.org/g:d82edbe92eed53a479736fcbbe6d54d0fb42daa4

commit r15-3417-gd82edbe92eed53a479736fcbbe6d54d0fb42daa4
Author: Jan Hubicka 
Date:   Tue Sep 3 15:07:41 2024 +0200

Zen5 tuning part 2: disable gather and scatter

We disable gathers for zen4.  It seems that gather has improved a bit
compared to zen4, and the Zen5 optimization manual suggests "Avoid GATHER
instructions when the indices are known ahead of time. Vector loads followed
by shuffles result in a higher load bandwidth."  However, the situation seems
to be more complicated.

Gather is a 5-10% loss on the parest benchmark as well as a 30% loss on
sparse dot products in TSVC. Curiously enough, breaking these out into
microbenchmarks reversed the situation, and it turns out that the performance
depends on how indices are distributed.  Gather is a loss if indices are
sequential, neutral if they are random, and a win for some strides (4, 8).
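
For readers unfamiliar with the access pattern, here is a minimal sketch of
the kind of loop in question. It is not the TSVC or parest code; the
function names and the index generator are assumptions made for
illustration.

```cpp
#include <cstddef>
#include <vector>

// Indirect (gather-style) dot product: the load x[idx[i]] is the access a
// vectorizer may implement with a gather instruction.
double sparse_dot(const std::vector<double>& val,
                  const std::vector<std::size_t>& idx,
                  const std::vector<double>& x) {
  double sum = 0.0;
  for (std::size_t i = 0; i < val.size(); ++i)
    sum += val[i] * x[idx[i]];
  return sum;
}

// Hypothetical helper: indices 0, stride, 2*stride, ...
// stride == 1 gives the sequential case, 4 or 8 the strided cases.
std::vector<std::size_t> strided_indices(std::size_t n, std::size_t stride) {
  std::vector<std::size_t> idx(n);
  for (std::size_t i = 0; i < n; ++i)
    idx[i] = i * stride;
  return idx;
}
```

The compiler sees only idx[i], not how the indices were produced, which is
why a compile-time heuristic based on the index distribution is hard to
come by.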

This seems to be similar to earlier zens, so I think (especially for
backporting znver5 support) that it makes sense to be consistent and disable
gather unless we work out a good heuristic on when to use it. Since we
typically do not know the indices in advance, I don't see how that can be
done.

I opened PR116582 with some examples of wins and losses.

gcc/ChangeLog:

* config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Disable for
ZNVER5.
(X86_TUNE_USE_SCATTER_2PARTS): Disable for ZNVER5.
(X86_TUNE_USE_GATHER_4PARTS): Disable for ZNVER5.
(X86_TUNE_USE_SCATTER_4PARTS): Disable for ZNVER5.
(X86_TUNE_USE_GATHER_8PARTS): Disable for ZNVER5.
(X86_TUNE_USE_SCATTER_8PARTS): Disable for ZNVER5.

Diff:
---
 gcc/config/i386/x86-tune.def | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index da1a3d6a3c6c..ed26136faee5 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -476,35 +476,35 @@ DEF_TUNE (X86_TUNE_AVOID_4BYTE_PREFIXES, 
"avoid_4byte_prefixes",
 /* X86_TUNE_USE_GATHER_2PARTS: Use gather instructions for vectors with 2
elements.  */
 DEF_TUNE (X86_TUNE_USE_GATHER_2PARTS, "use_gather_2parts",
- ~(m_ZNVER1 | m_ZNVER2 | m_ZNVER3 | m_ZNVER4 | m_CORE_HYBRID
+ ~(m_ZNVER | m_CORE_HYBRID
| m_YONGFENG | m_SHIJIDADAO | m_CORE_ATOM | m_GENERIC | m_GDS))
 
 /* X86_TUNE_USE_SCATTER_2PARTS: Use scater instructions for vectors with 2
elements.  */
 DEF_TUNE (X86_TUNE_USE_SCATTER_2PARTS, "use_scatter_2parts",
- ~(m_ZNVER4))
+ ~(m_ZNVER4 | m_ZNVER5))
 
 /* X86_TUNE_USE_GATHER_4PARTS: Use gather instructions for vectors with 4
elements.  */
 DEF_TUNE (X86_TUNE_USE_GATHER_4PARTS, "use_gather_4parts",
- ~(m_ZNVER1 | m_ZNVER2 | m_ZNVER3 | m_ZNVER4 | m_CORE_HYBRID
+ ~(m_ZNVER | m_CORE_HYBRID
| m_YONGFENG | m_SHIJIDADAO | m_CORE_ATOM | m_GENERIC | m_GDS))
 
 /* X86_TUNE_USE_SCATTER_4PARTS: Use scater instructions for vectors with 4
elements.  */
 DEF_TUNE (X86_TUNE_USE_SCATTER_4PARTS, "use_scatter_4parts",
- ~(m_ZNVER4))
+ ~(m_ZNVER4 | m_ZNVER5))
 
 /* X86_TUNE_USE_GATHER: Use gather instructions for vectors with 8 or more
elements.  */
 DEF_TUNE (X86_TUNE_USE_GATHER_8PARTS, "use_gather_8parts",
- ~(m_ZNVER1 | m_ZNVER2 | m_ZNVER4 | m_CORE_HYBRID | m_CORE_ATOM
+ ~(m_ZNVER | m_CORE_HYBRID | m_CORE_ATOM
| m_YONGFENG | m_SHIJIDADAO | m_GENERIC | m_GDS))
 
 /* X86_TUNE_USE_SCATTER: Use scater instructions for vectors with 8 or more
elements.  */
 DEF_TUNE (X86_TUNE_USE_SCATTER_8PARTS, "use_scatter_8parts",
- ~(m_ZNVER4))
+ ~(m_ZNVER4 | m_ZNVER5))
 
 /* X86_TUNE_AVOID_128FMA_CHAINS: Avoid creating loops with tight 128bit or
smaller FMA chain.  */


[gcc r15-3418] libstdc++: Add missing feature-test macro in various headers

2024-09-03 Thread Jonathan Wakely via Gcc-cvs
https://gcc.gnu.org/g:efe6efb6f315c7f97be8a850e0a84ff7f6651d85

commit r15-3418-gefe6efb6f315c7f97be8a850e0a84ff7f6651d85
Author: Dhruv Chawla 
Date:   Mon Aug 26 11:09:19 2024 +0530

libstdc++: Add missing feature-test macro in various headers

version.syn#2 requires various headers to define
__cpp_lib_allocator_traits_is_always_equal. Currently, only  was
defining this macro. Implement fixes for the other headers as well.

Signed-off-by: Dhruv Chawla 

libstdc++-v3/ChangeLog:

* include/std/deque: Define macro
__glibcxx_want_allocator_traits_is_always_equal.
* include/std/forward_list: Likewise.
* include/std/list: Likewise.
* include/std/map: Likewise.
* include/std/scoped_allocator: Likewise.
* include/std/set: Likewise.
* include/std/string: Likewise.
* include/std/unordered_map: Likewise.
* include/std/unordered_set: Likewise.
* include/std/vector: Likewise.
* testsuite/20_util/headers/memory/version.cc: New test.
* testsuite/20_util/scoped_allocator/version.cc: Likewise.
* testsuite/21_strings/headers/string/version.cc: Likewise.
* testsuite/23_containers/deque/version.cc: Likewise.
* testsuite/23_containers/forward_list/version.cc: Likewise.
* testsuite/23_containers/list/version.cc: Likewise.
* testsuite/23_containers/map/version.cc: Likewise.
* testsuite/23_containers/set/version.cc: Likewise.
* testsuite/23_containers/unordered_map/version.cc: Likewise.
* testsuite/23_containers/unordered_set/version.cc: Likewise.
* testsuite/23_containers/vector/version.cc: Likewise.

Diff:
---
 libstdc++-v3/include/std/deque| 1 +
 libstdc++-v3/include/std/forward_list | 1 +
 libstdc++-v3/include/std/list | 1 +
 libstdc++-v3/include/std/map  | 1 +
 libstdc++-v3/include/std/scoped_allocator | 3 +++
 libstdc++-v3/include/std/set  | 1 +
 libstdc++-v3/include/std/string   | 1 +
 libstdc++-v3/include/std/unordered_map| 1 +
 libstdc++-v3/include/std/unordered_set| 1 +
 libstdc++-v3/include/std/vector   | 1 +
 libstdc++-v3/testsuite/20_util/headers/memory/version.cc  | 8 
 libstdc++-v3/testsuite/20_util/scoped_allocator/version.cc| 8 
 libstdc++-v3/testsuite/21_strings/headers/string/version.cc   | 8 
 libstdc++-v3/testsuite/23_containers/deque/version.cc | 8 
 libstdc++-v3/testsuite/23_containers/forward_list/version.cc  | 8 
 libstdc++-v3/testsuite/23_containers/list/version.cc  | 8 
 libstdc++-v3/testsuite/23_containers/map/version.cc   | 8 
 libstdc++-v3/testsuite/23_containers/set/version.cc   | 8 
 libstdc++-v3/testsuite/23_containers/unordered_map/version.cc | 8 
 libstdc++-v3/testsuite/23_containers/unordered_set/version.cc | 8 
 libstdc++-v3/testsuite/23_containers/vector/version.cc| 8 
 21 files changed, 100 insertions(+)

diff --git a/libstdc++-v3/include/std/deque b/libstdc++-v3/include/std/deque
index 0bf8309c19a7..69f8c0dcdccf 100644
--- a/libstdc++-v3/include/std/deque
+++ b/libstdc++-v3/include/std/deque
@@ -68,6 +68,7 @@
 #include 
 #include 
 
+#define __glibcxx_want_allocator_traits_is_always_equal
 #define __glibcxx_want_erase_if
 #define __glibcxx_want_nonmember_container_access
 #include 
diff --git a/libstdc++-v3/include/std/forward_list 
b/libstdc++-v3/include/std/forward_list
index 5ac74360808d..dfd7d48d1219 100644
--- a/libstdc++-v3/include/std/forward_list
+++ b/libstdc++-v3/include/std/forward_list
@@ -45,6 +45,7 @@
 # include 
 #endif
 
+#define __glibcxx_want_allocator_traits_is_always_equal
 #define __glibcxx_want_erase_if
 #define __glibcxx_want_incomplete_container_elements
 #define __glibcxx_want_list_remove_return_type
diff --git a/libstdc++-v3/include/std/list b/libstdc++-v3/include/std/list
index fce4e3d925b1..ff632fc1ab2f 100644
--- a/libstdc++-v3/include/std/list
+++ b/libstdc++-v3/include/std/list
@@ -69,6 +69,7 @@
 # include 
 #endif
 
+#define __glibcxx_want_allocator_traits_is_always_equal
 #define __glibcxx_want_erase_if
 #define __glibcxx_want_incomplete_container_elements
 #define __glibcxx_want_list_remove_return_type
diff --git a/libstdc++-v3/include/std/map b/libstdc++-v3/include/std/map
index 4a96e59a5bc1..6520d9f744fe 100644
--- a/libstdc++-v3/include/std/map
+++ b/libstdc++-v3/include/std/map
@@ -69,6 +69,7 @@
 # include 
 #endif
 
+#define __glibcxx_want_allocator_traits_is_always_equal
 #define __glibcxx_want_erase_if
 #define __glibcxx_want_generic_associative_l

[gcc r15-3419] libstdc++: Simplify std::any to fix -Wdeprecated-declarations warning

2024-09-03 Thread Jonathan Wakely via Libstdc++-cvs
https://gcc.gnu.org/g:dee3c5c6ff9952204af3014383593e8d316250e4

commit r15-3419-gdee3c5c6ff9952204af3014383593e8d316250e4
Author: Jonathan Wakely 
Date:   Wed Aug 28 13:07:47 2024 +0100

libstdc++: Simplify std::any to fix -Wdeprecated-declarations warning

We don't need to use std::aligned_storage in std::any. We just need a
POD type of the right size. The void* union member already ensures the
alignment will be correct. Avoiding std::aligned_storage means we don't
need to suppress a -Wdeprecated-declarations warning.
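
The layout claim can be illustrated with a stand-alone sketch. The Storage
union below is a hypothetical mirror of the idea, not the libstdc++
definition.

```cpp
#include <cstddef>

// Hypothetical mirror of the storage union: a pointer member plus a raw
// byte buffer of the same size.
union Storage {
  void* ptr;
  unsigned char buffer[sizeof(void*)];
};

// The pointer member alone fixes the union's alignment, and the byte
// buffer does not make the union any larger than the pointer.
static_assert(sizeof(Storage) == sizeof(void*), "same size as a pointer");
static_assert(alignof(Storage) == alignof(void*), "pointer alignment");
```

A union is at least as strictly aligned as its most-aligned member, so the
unsigned char array needs no alignment specifier of its own.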

libstdc++-v3/ChangeLog:

* include/experimental/any (experimental::any::_Storage): Use
array of unsigned char instead of deprecated
std::aligned_storage.
* include/std/any (any::_Storage): Likewise.
* testsuite/20_util/any/layout.cc: New test.

Diff:
---
 libstdc++-v3/include/experimental/any|  2 +-
 libstdc++-v3/include/std/any |  2 +-
 libstdc++-v3/testsuite/20_util/any/layout.cc | 22 ++
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/experimental/any 
b/libstdc++-v3/include/experimental/any
index 27a7a146e53c..3db30df5c75e 100644
--- a/libstdc++-v3/include/experimental/any
+++ b/libstdc++-v3/include/experimental/any
@@ -102,7 +102,7 @@ inline namespace fundamentals_v1
   _Storage& operator=(const _Storage&) = delete;
 
   void* _M_ptr;
-  aligned_storage::type _M_buffer;
+  unsigned char _M_buffer[sizeof(_M_ptr)];
 };
 
 template,
diff --git a/libstdc++-v3/include/std/any b/libstdc++-v3/include/std/any
index e4709b1ce046..9ae29aab99fa 100644
--- a/libstdc++-v3/include/std/any
+++ b/libstdc++-v3/include/std/any
@@ -90,7 +90,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _Storage& operator=(const _Storage&) = delete;
 
   void* _M_ptr;
-  aligned_storage::type _M_buffer;
+  unsigned char _M_buffer[sizeof(_M_ptr)];
 };
 
 template,
diff --git a/libstdc++-v3/testsuite/20_util/any/layout.cc b/libstdc++-v3/testsuite/20_util/any/layout.cc
new file mode 100644
index ..5a7f4a8a280f
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/any/layout.cc
@@ -0,0 +1,22 @@
+// { dg-options "-Wno-deprecated-declarations" }
+// { dg-do compile { target c++17 } }
+
+// Verify that r15-3419 did not change the layout of std::any
+
+#include 
+
+namespace test {
+  class any {
+union Storage {
+  constexpr Storage() : ptr(nullptr) { }
+  void* ptr;
+  std::aligned_storage::type buffer;
+};
+
+void (*manager)(int, const any*, void*);
+Storage storage;
+  };
+}
+
+static_assert( sizeof(std::any) == sizeof(test::any) );
+static_assert( alignof(std::any) == alignof(test::any) );


[gcc r15-3420] Zen5 tuning part 3: scheduler tweaks

2024-09-03 Thread Jan Hubicka via Gcc-cvs
https://gcc.gnu.org/g:e2125a600552bc6e0329e3f1224eea14804db8d3

commit r15-3420-ge2125a600552bc6e0329e3f1224eea14804db8d3
Author: Jan Hubicka 
Date:   Tue Sep 3 16:26:16 2024 +0200

Zen5 tuning part 3: scheduler tweaks

This patch adds support for the new fusion in znver5 documented in the
optimization manual:

   The Zen5 microarchitecture adds support to fuse reg-reg MOV Instructions
   with certain ALU instructions. The following conditions need to be met
   for fusion to happen:
 - The MOV should be reg-reg mov with Opcode 0x89 or 0x8B
 - The MOV is followed by an ALU instruction where the MOV and ALU
   destination register match.
 - The ALU instruction may source only registers or immediate data.
   There cannot be any memory source.
 - The ALU instruction sources either the source or dest of MOV
   instruction.
 - If ALU instruction has 2 reg sources, they should be different.
 - The following ALU instructions can fuse with an older qualified MOV
   instruction:
   ADD ADC AND XOR OP SUB SBB INC DEC NOT SAL / SHL SHR SAR
   (I assume OP is OR)
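
The register conditions listed above can be sketched as a predicate over a
simplified instruction model. Everything here is an assumption made for
illustration: the Operand, Mov and Alu types and can_fuse_mov_alu are
hypothetical, the opcode checks (MOV encoding, the list of ALU mnemonics)
are left out, and GCC's real check operates on RTL rather than on structs
like these.

```cpp
#include <cassert>

// Simplified, hypothetical operand model (not GCC's RTL).
enum class Kind { Reg, Imm, Mem };

struct Operand
{
  Kind kind;
  int regno;   // meaningful only when kind == Kind::Reg
};

inline Operand reg(int r) { return Operand{Kind::Reg, r}; }
inline Operand imm()      { return Operand{Kind::Imm, -1}; }
inline Operand mem()      { return Operand{Kind::Mem, -1}; }

struct Mov { int dest; int src; };   // reg-reg MOV: dest <- src

struct Alu
{
  int dest;        // destination register
  Operand src0;    // first source
  Operand src1;    // second source, valid only if has_src1
  bool has_src1;
};

// Checks only the register/operand conditions from the list above.
bool can_fuse_mov_alu(const Mov& mov, const Alu& alu)
{
  // MOV and ALU destination registers must match.
  if (alu.dest != mov.dest)
    return false;

  const Operand* srcs[2] = { &alu.src0, alu.has_src1 ? &alu.src1 : nullptr };
  bool uses_mov_reg = false;
  for (const Operand* s : srcs)
    {
      if (!s)
        continue;
      // No memory sources are allowed.
      if (s->kind == Kind::Mem)
        return false;
      // The ALU must read the MOV's source or destination.
      if (s->kind == Kind::Reg
          && (s->regno == mov.dest || s->regno == mov.src))
        uses_mov_reg = true;
    }
  if (!uses_mov_reg)
    return false;

  // Two register sources must be different registers.
  if (alu.has_src1
      && alu.src0.kind == Kind::Reg && alu.src1.kind == Kind::Reg
      && alu.src0.regno == alu.src1.regno)
    return false;

  return true;
}
```

With register 3 standing for %rbx and 6 for %rbp, a MOV of 3 into 6
followed by a shift of register 6 by an immediate satisfies the predicate,
while the same shift with a memory source does not.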

I also increased issue rate from 4 to 6.  Theoretically znver5 can do more,
but with our model we can't really use it.
Increasing issue rate to 8 leads to an infinite loop in the scheduler.

Finally, I also enabled fuse_alu_and_branch since it is supported by
znver5 (I think by earlier zens too).

The new fusion pattern moves quite a few instructions around in common code:
@@ -2210,13 +2210,13 @@
        .cfi_offset 3, -32
        leaq    63(%rsi), %rbx
        movq    %rbx, %rbp
+       shrq    $6, %rbp
+       salq    $3, %rbp
        subq    $16, %rsp
        .cfi_def_cfa_offset 48
        movq    %rdi, %r12
-       shrq    $6, %rbp
-       movq    %rsi, 8(%rsp)
-       salq    $3, %rbp
        movq    %rbp, %rdi
+       movq    %rsi, 8(%rsp)
        call    _Znwm
        movq    8(%rsp), %rsi
        movl    $0, 8(%r12)
@@ -2224,8 +2224,8 @@
        movq    %rax, (%r12)
        movq    %rbp, 32(%r12)
        testq   %rsi, %rsi
-       movq    %rsi, %rdx
        cmovns  %rsi, %rbx
+       movq    %rsi, %rdx
        sarq    $63, %rdx
        shrq    $58, %rdx
        sarq    $6, %rbx
which should help decoder bandwidth and perhaps also cache, though I was not
able to measure an effect above noise on SPEC.

gcc/ChangeLog:

* config/i386/i386.h (TARGET_FUSE_MOV_AND_ALU): New tune.
* config/i386/x86-tune-sched.cc (ix86_issue_rate): Update for znver5.
(ix86_adjust_cost): Add TODO about znver5 memory latency.
(ix86_fuse_mov_alu_p): New.
(ix86_macro_fusion_pair_p): Use it.
* config/i386/x86-tune.def (X86_TUNE_FUSE_ALU_AND_BRANCH): Add ZNVER5.
(X86_TUNE_FUSE_MOV_AND_ALU): New tune.

Diff:
---
 gcc/config/i386/i386.h|  2 ++
 gcc/config/i386/x86-tune-sched.cc | 67 ++-
 gcc/config/i386/x86-tune.def  | 11 +--
 3 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index eabb3248ea00..c1ec92ffb150 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -430,6 +430,8 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS]
 #define TARGET_FUSE_ALU_AND_BRANCH \
ix86_tune_features[X86_TUNE_FUSE_ALU_AND_BRANCH]
+#define TARGET_FUSE_MOV_AND_ALU \
+   ix86_tune_features[X86_TUNE_FUSE_MOV_AND_ALU]
 #define TARGET_OPT_AGU ix86_tune_features[X86_TUNE_OPT_AGU]
 #define TARGET_AVOID_LEA_FOR_ADDR \
ix86_tune_features[X86_TUNE_AVOID_LEA_FOR_ADDR]
diff --git a/gcc/config/i386/x86-tune-sched.cc b/gcc/config/i386/x86-tune-sched.cc
index d77298b0e34d..c6d5426ae8d3 100644
--- a/gcc/config/i386/x86-tune-sched.cc
+++ b/gcc/config/i386/x86-tune-sched.cc
@@ -67,7 +67,6 @@ ix86_issue_rate (void)
 case PROCESSOR_ZNVER2:
 case PROCESSOR_ZNVER3:
 case PROCESSOR_ZNVER4:
-case PROCESSOR_ZNVER5:
 case PROCESSOR_CORE2:
 case PROCESSOR_NEHALEM:
 case PROCESSOR_SANDYBRIDGE:
@@ -91,6 +90,13 @@ ix86_issue_rate (void)
   return 5;
 
 case PROCESSOR_SAPPHIRERAPIDS:
+/* For znver5 decoder can handle 4 or 8 instructions per cycle,
+   op cache 12 instruction/cycle, dispatch 8 instructions
+   integer rename 8 instructions and Fp 6 instructions.
+
+   The scheduler, without understanding out of order nature of the CPU
+   is unlikely going to be able to fill all of these.  */
+case PROCESSOR_ZNVER5:
   return 6;
 
 default:
@@ -434,6 +440,8 @@ ix86_adjust_cost (rtx_insn *insn, int dep_type, rtx_insn *dep_insn, int cost,
  enum attr_unit unit = get_attr_unit (insn);
  int loadcost;
 
+ /* TODO: On

[gcc r15-3421] Fix missed peeling for gaps with SLP load-lanes

2024-09-03 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:bd120de19c600d064b3b3b5abf8c36ffc0037c40

commit r15-3421-gbd120de19c600d064b3b3b5abf8c36ffc0037c40
Author: Richard Biener 
Date:   Tue Sep 3 15:04:42 2024 +0200

Fix missed peeling for gaps with SLP load-lanes

The following disables the avoidance of peeling for gaps via smaller
vector accesses when load-lanes are used.

* tree-vect-stmts.cc (get_group_load_store_type): Only disable
peeling for gaps by using smaller vectors when not using
load-lanes.

Diff:
---
 gcc/tree-vect-stmts.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index ace1c8eaa0de..16f6889d853a 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2127,6 +2127,7 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
  unsigned HOST_WIDE_INT tem, num;
  if (overrun_p
  && !masked_p
+ && *memory_access_type != VMAT_LOAD_STORE_LANES
  && (((alss = vect_supportable_dr_alignment (vinfo, first_dr_info,
  vectype, misalign)))
   == dr_aligned


[gcc r15-3422] Dump whether a SLP node represents load/store-lanes

2024-09-03 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:ef0c4482ca8069fa56e8d359dbdc6168be499f69

commit r15-3422-gef0c4482ca8069fa56e8d359dbdc6168be499f69
Author: Richard Biener 
Date:   Tue Sep 3 15:05:43 2024 +0200

Dump whether a SLP node represents load/store-lanes

This makes it easier to discover whether SLP load or store nodes
participate in load/store-lanes accesses.

* tree-vect-slp.cc (vect_print_slp_tree): Annotate load
and store-lanes nodes.

Diff:
---
 gcc/tree-vect-slp.cc | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 1342913affa1..2b05032790e5 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -2958,14 +2958,17 @@ vect_print_slp_tree (dump_flags_t dump_kind, dump_location_t loc,
dump_printf (dump_kind, " %u[%u]",
 SLP_TREE_LANE_PERMUTATION (node)[i].first,
 SLP_TREE_LANE_PERMUTATION (node)[i].second);
-  dump_printf (dump_kind, " }\n");
+  dump_printf (dump_kind, " }%s\n",
+  node->ldst_lanes ? " (load-lanes)" : "");
 }
   if (SLP_TREE_CHILDREN (node).is_empty ())
 return;
   dump_printf_loc (metadata, user_loc, "\tchildren");
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
 dump_printf (dump_kind, " %p", (void *)child);
-  dump_printf (dump_kind, "\n");
+  dump_printf (dump_kind, "%s\n",
+  node->ldst_lanes && !SLP_TREE_LANE_PERMUTATION (node).exists ()
+  ? " (store-lanes)" : "");
 }
 
 DEBUG_FUNCTION void


[gcc r15-3423] libstdc++: Specialize std::disable_sized_sentinel_for for std::move_iterator [PR116549]

2024-09-03 Thread Jonathan Wakely via Gcc-cvs
https://gcc.gnu.org/g:819deae0a5bee079a7d5582fafaa098c26144ae8

commit r15-3423-g819deae0a5bee079a7d5582fafaa098c26144ae8
Author: Jonathan Wakely 
Date:   Mon Sep 2 11:29:13 2024 +0100

libstdc++: Specialize std::disable_sized_sentinel_for for std::move_iterator [PR116549]

LWG 3736 added a partial specialization of this variable template for
two std::move_iterator types. This is needed for the case where the
types satisfy std::sentinel_for and are subtractable, but do not model
the semantic requirements of std::sized_sentinel_for.

libstdc++-v3/ChangeLog:

PR libstdc++/116549
* include/bits/stl_iterator.h (disable_sized_sentinel_for):
Define specialization for two move_iterator types, as per LWG
3736.
* testsuite/24_iterators/move_iterator/lwg3736.cc: New test.

Diff:
---
 libstdc++-v3/include/bits/stl_iterator.h   |  8 
 .../24_iterators/move_iterator/lwg3736.cc  | 52 ++
 2 files changed, 60 insertions(+)

diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h
index d38230572709..20c0319f3a7a 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -1822,6 +1822,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { return _ReturnType(__i); }
 
 #if __cplusplus > 201703L && __glibcxx_concepts
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 3736.  move_iterator missing disable_sized_sentinel_for specialization
+  template
+requires (!sized_sentinel_for<_Iterator1, _Iterator2>)
+inline constexpr bool
+disable_sized_sentinel_for,
+  move_iterator<_Iterator2>> = true;
+
   // [iterators.common] Common iterators
 
   namespace __detail
diff --git a/libstdc++-v3/testsuite/24_iterators/move_iterator/lwg3736.cc b/libstdc++-v3/testsuite/24_iterators/move_iterator/lwg3736.cc
new file mode 100644
index ..eaf791b30893
--- /dev/null
+++ b/libstdc++-v3/testsuite/24_iterators/move_iterator/lwg3736.cc
@@ -0,0 +1,52 @@
+// { dg-do compile { target c++20 } }
+
+// 3736.  move_iterator missing disable_sized_sentinel_for specialization
+
+#include 
+
+template using MoveIter = std::move_iterator;
+
+using std::sized_sentinel_for;
+using std::disable_sized_sentinel_for;
+
+// These assertions always passed, even without LWG 3736:
+static_assert(sized_sentinel_for, MoveIter>);
+static_assert(sized_sentinel_for, MoveIter>);
+static_assert(not sized_sentinel_for, MoveIter>);
+static_assert(not sized_sentinel_for, std::default_sentinel_t>);
+static_assert(not disable_sized_sentinel_for, MoveIter>);
+
+// These types don't satisfy sized_sentinel_for anyway (because the subtraction
+// is ill-formed) but LWG 3736 makes the variable template explicitly false:
+static_assert(disable_sized_sentinel_for, MoveIter>);
+
+struct Iter
+{
+  using iterator_category = std::random_access_iterator_tag;
+  using value_type = int;
+  using pointer = int*;
+  using reference = int&;
+  using difference_type = long;
+
+  Iter() = default;
+  Iter& operator++();
+  Iter operator++(int);
+  Iter& operator--();
+  Iter operator--(int);
+  reference operator*() const;
+  pointer operator->() const;
+  Iter& operator+=(difference_type);
+  Iter& operator-=(difference_type);
+  friend Iter operator+(Iter, difference_type);
+  friend Iter operator+(difference_type, Iter);
+  friend Iter operator-(Iter, difference_type);
+  friend difference_type operator-(Iter, Iter);
+  bool operator==(Iter) const;
+};
+
+// Specialize the variable template so that Iter is not its own sized sentinel:
+template<> constexpr bool std::disable_sized_sentinel_for = true;
+static_assert( not sized_sentinel_for );
+
+// LWG 3736 means that affects std::move_iterator as well:
+static_assert( not sized_sentinel_for, MoveIter> );


[gcc r15-3424] libstdc++: Fix error handling in fs::hard_link_count for Windows

2024-09-03 Thread Jonathan Wakely via Gcc-cvs
https://gcc.gnu.org/g:71b1639c67b91554420cc38eb4c82323e535c816

commit r15-3424-g71b1639c67b91554420cc38eb4c82323e535c816
Author: Jonathan Wakely 
Date:   Mon Sep 2 12:16:49 2024 +0100

libstdc++: Fix error handling in fs::hard_link_count for Windows

The recent change to use auto_win_file_handle for
std::filesystem::hard_link_count caused a regression. The
std::error_code argument should be cleared if no error occurs, but this
no longer happens. Add a call to ec.clear() in fs::hard_link_count to
fix this.

Also change the auto_win_file_handle class to take a reference to the
std::error_code and set it if an error occurs, to slightly simplify the
control flow in the fs::equiv_files function.
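
The division of labour described above can be sketched portably. This is
only an illustration under assumptions: Handle and link_count are
hypothetical names, the open_ok flag stands in for the result of the real
Windows calls, and the error value used is arbitrary. The wrapper only
ever sets the error_code, and the caller clears it on the success path.

```cpp
#include <cassert>
#include <system_error>

// Hypothetical resource wrapper: like errno, it writes to ec only when
// something fails and never clears it.
struct Handle
{
  Handle(bool open_ok, std::error_code& ec) noexcept
  : ok(open_ok), ec(ec)
  {
    if (!ok)
      this->ec = std::make_error_code(std::errc::no_such_file_or_directory);
  }

  explicit operator bool() const noexcept { return ok; }

  bool ok;
  std::error_code& ec;   // refers to the caller's error_code
};

// Hypothetical caller: it is responsible for clearing ec on success,
// because the wrapper leaves any stale value untouched.
long link_count(bool open_ok, std::error_code& ec) noexcept
{
  Handle h(open_ok, ec);
  if (h)
    {
      ec.clear();
      return 1;
    }
  return -1;
}
```

Without the ec.clear() call, a caller that passed in an error_code holding
a stale error would still see that error after a successful call, which is
the regression the commit message describes.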

libstdc++-v3/ChangeLog:

* src/c++17/fs_ops.cc (auto_win_file_handle): Add error_code&
member and set it if CreateFileW or GetFileInformationByHandle
fails.
(fs::equiv_files) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Simplify
control flow.
(fs::hard_link_count) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Clear ec
on success.
* testsuite/27_io/filesystem/operations/hard_link_count.cc:
Check error handling.

Diff:
---
 libstdc++-v3/src/c++17/fs_ops.cc   | 59 --
 .../27_io/filesystem/operations/hard_link_count.cc | 24 +
 2 files changed, 57 insertions(+), 26 deletions(-)

diff --git a/libstdc++-v3/src/c++17/fs_ops.cc b/libstdc++-v3/src/c++17/fs_ops.cc
index 9606afa9f1f7..946fefd9e449 100644
--- a/libstdc++-v3/src/c++17/fs_ops.cc
+++ b/libstdc++-v3/src/c++17/fs_ops.cc
@@ -829,23 +829,37 @@ namespace
   struct auto_win_file_handle
   {
 explicit
-auto_win_file_handle(const wchar_t* p)
+auto_win_file_handle(const wchar_t* p, std::error_code& ec) noexcept
 : handle(CreateFileW(p, 0,
 FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE,
-0, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, 0))
-{ }
+0, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, 0)),
+  ec(ec)
+{
+  if (handle == INVALID_HANDLE_VALUE)
+   ec = std::__last_system_error();
+}
 
 ~auto_win_file_handle()
 { if (*this) CloseHandle(handle); }
 
-explicit operator bool() const
+explicit operator bool() const noexcept
 { return handle != INVALID_HANDLE_VALUE; }
 
-bool get_info()
-{ return GetFileInformationByHandle(handle, &info); }
+bool get_info() noexcept
+{
+  if (GetFileInformationByHandle(handle, &info))
+   return true;
+  ec = std::__last_system_error();
+  return false;
+}
 
 HANDLE handle;
 BY_HANDLE_FILE_INFORMATION info;
+// Like errno, we only set this on error and never clear it.
+// This propagates an error_code to the caller when something goes wrong,
+// but the caller should not assume a non-zero ec means an error happened
+// unless they explicitly cleared it before passing it to our constructor.
+std::error_code& ec;
   };
 }
 #endif
@@ -866,23 +880,14 @@ fs::equiv_files([[maybe_unused]] const char_type* p1, const stat_type& st1,
   if (st1.st_mode != st2.st_mode || st1.st_dev != st2.st_dev)
 return false;
 
-  // Need to use GetFileInformationByHandle to get more info about the files.
-  auto_win_file_handle h1(p1);
-  auto_win_file_handle h2(p2);
-  if (!h1 || !h2)
-{
-  if (!h1 && !h2)
-   ec = __last_system_error();
-  return false;
-}
-  if (!h1.get_info() || !h2.get_info())
-{
-  ec = __last_system_error();
-  return false;
-}
-  return h1.info.dwVolumeSerialNumber == h2.info.dwVolumeSerialNumber
-  && h1.info.nFileIndexHigh == h2.info.nFileIndexHigh
-  && h1.info.nFileIndexLow == h2.info.nFileIndexLow;
+  // Use GetFileInformationByHandle to get more info about the files.
+  if (auto_win_file_handle h1{p1, ec})
+if (auto_win_file_handle h2{p2, ec})
+  if (h1.get_info() && h2.get_info())
+   return h1.info.dwVolumeSerialNumber == h2.info.dwVolumeSerialNumber
+&& h1.info.nFileIndexHigh == h2.info.nFileIndexHigh
+&& h1.info.nFileIndexLow == h2.info.nFileIndexLow;
+  return false;
 #endif // _GLIBCXX_FILESYSTEM_IS_WINDOWS
 }
 #endif // NEED_DO_COPY_FILE
@@ -1007,10 +1012,12 @@ std::uintmax_t
 fs::hard_link_count(const path& p, error_code& ec) noexcept
 {
 #if _GLIBCXX_FILESYSTEM_IS_WINDOWS
-  auto_win_file_handle h(p.c_str());
+  auto_win_file_handle h(p.c_str(), ec);
   if (h && h.get_info())
-return static_cast(h.info.nNumberOfLinks);
-  ec = __last_system_error();
+{
+  ec.clear();
+  return static_cast(h.info.nNumberOfLinks);
+}
   return static_cast(-1);
 #elif defined _GLIBCXX_HAVE_SYS_STAT_H
   return do_stat(p, ec, std::mem_fn(&stat_type::st_nlink),
diff --git 
a/libstdc++-v3/testsuite/27_io/filesystem/operations/hard_link_count.cc 
b/libstdc+

[gcc r15-3425] Zen5 tuning part 3: fix typo in previous patch

2024-09-03 Thread Jan Hubicka via Gcc-cvs
https://gcc.gnu.org/g:910e1769a0653ac32bd8c1d6aabb39c797d5d773

commit r15-3425-g910e1769a0653ac32bd8c1d6aabb39c797d5d773
Author: Jan Hubicka 
Date:   Tue Sep 3 17:25:05 2024 +0200

Zen5 tuning part 3: fix typo in previous patch

gcc/ChangeLog:

* config/i386/x86-tune-sched.cc (ix86_fuse_mov_alu_p): Fix
typo.

Diff:
---
 gcc/config/i386/x86-tune-sched.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/i386/x86-tune-sched.cc b/gcc/config/i386/x86-tune-sched.cc
index c6d5426ae8d3..4ebdf111269b 100644
--- a/gcc/config/i386/x86-tune-sched.cc
+++ b/gcc/config/i386/x86-tune-sched.cc
@@ -613,7 +613,7 @@ ix86_fuse_mov_alu_p (rtx_insn *mov, rtx_insn *alu)
   /* One of operands should be register.  */
   if (op1 && (!REG_P (op0) || REGNO (op0) != REGNO (reg)))
 std::swap (op0, op1);
-  if (!REG_P (op0) || REGNO (op1) != REGNO (reg))
+  if (!REG_P (op0) || REGNO (op0) != REGNO (reg))
 return false;
   if (op1
   && !REG_P (op1)


[gcc r15-3426] Drop file that should not have been committed.

2024-09-03 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:36f63000c6f869f4f5550780d77b381b1a8b1700

commit r15-3426-g36f63000c6f869f4f5550780d77b381b1a8b1700
Author: Jeff Law 
Date:   Tue Sep 3 09:30:35 2024 -0600

Drop file that should not have been committed.

* J: Drop file that should not have been committed

Diff:
---
 gcc/J | 1064 -
 1 file changed, 1064 deletions(-)

diff --git a/gcc/J b/gcc/J
deleted file mode 100644
index 6b6332dd2c13..
--- a/gcc/J
+++ /dev/null
@@ -1,1064 +0,0 @@
-a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +
1) /* RTL-based forward propagation pass for GNU compiler.
-a945c346f57b gcc/fwprop.cc (Jakub Jelinek   2024-01-03 12:19:35 +0100
2)Copyright (C) 2005-2024 Free Software Foundation, Inc.
-a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +
3)Contributed by Paolo Bonzini and Steven Bosscher.
-a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +
4) 
-a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +
5) This file is part of GCC.
-a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +
6) 
-a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +
7) GCC is free software; you can redistribute it and/or modify it under
-a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +
8) the terms of the GNU General Public License as published by the Free
-9dcd6f09a3dd gcc/fwprop.c  (Nick Clifton2007-07-26 08:37:01 +
9) Software Foundation; either version 3, or (at your option) any later
-a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +   
10) version.
-a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +   
11) 
-a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +   
12) GCC is distributed in the hope that it will be useful, but WITHOUT ANY
-a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +   
13) WARRANTY; without even the implied warranty of MERCHANTABILITY or
-a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +   
14) FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
-a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +   
15) for more details.
-a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +   
16) 
-a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +   
17) You should have received a copy of the GNU General Public License
-9dcd6f09a3dd gcc/fwprop.c  (Nick Clifton2007-07-26 08:37:01 +   
18) along with GCC; see the file COPYING3.  If not see
-9dcd6f09a3dd gcc/fwprop.c  (Nick Clifton2007-07-26 08:37:01 +   
19) .  */
-a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +   
20) 
-0b76990a9d75 gcc/fwprop.c  (Richard Sandiford   2020-12-17 00:15:12 +   
21) #define INCLUDE_ALGORITHM
-0b76990a9d75 gcc/fwprop.c  (Richard Sandiford   2020-12-17 00:15:12 +   
22) #define INCLUDE_FUNCTIONAL
-d6849aa92666 gcc/fwprop.cc (Richard Sandiford   2024-07-25 13:25:32 +0100   
23) #define INCLUDE_ARRAY
-a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +   
24) #include "config.h"
-a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +   
25) #include "system.h"
-a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +   
26) #include "coretypes.h"
-c7131fb2b58a gcc/fwprop.c  (Andrew MacLeod  2015-07-08 00:53:03 +   
27) #include "backend.h"
-c7131fb2b58a gcc/fwprop.c  (Andrew MacLeod  2015-07-08 00:53:03 +   
28) #include "rtl.h"
-1815e313a8fb gcc/fwprop.cc (Uros Bizjak 2023-07-14 11:46:22 +0200   
29) #include "rtlanal.h"
-c7131fb2b58a gcc/fwprop.c  (Andrew MacLeod  2015-07-08 00:53:03 +   
30) #include "df.h"
-0b76990a9d75 gcc/fwprop.c  (Richard Sandiford   2020-12-17 00:15:12 +   
31) #include "rtl-ssa.h"
-957060b5c5d2 gcc/fwprop.c  (Andrew MacLeod  2015-10-29 13:57:32 +   
32) 
-0b76990a9d75 gcc/fwprop.c  (Richard Sandiford   2020-12-17 00:15:12 +   
33) #include "predict.h"
-60393bbc613a gcc/fwprop.c  (Andrew MacLeod  2014-10-27 12:41:01 +   
34) #include "cfgrtl.h"
-60393bbc613a gcc/fwprop.c  (Andrew MacLeod  2014-10-27 12:41:01 +   
35) #include "cfgcleanup.h"
-a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +   
36) #include "cfgloop.h"
-a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45 +   
37) #include "tree-pass.h"
-aa4e2d7ef068 gcc/fwprop.c  (Richard Sandiford   2014-08-28 06:23:26 +   
38) #include "rtl-iter.h"
-0b76990a9d75 gcc/fwprop.c  (Richard Sandiford   2020-12-17 00:15:12 +   
39) #include "target.h"
-a52b023a5f03 gcc/fwprop.c  (Paolo Bonzini   2006-11-04 08:36:45

[gcc r15-3427] Zen5 tuning part 4: update reassociation width

2024-09-03 Thread Jan Hubicka via Gcc-cvs
https://gcc.gnu.org/g:f0ab3de6ec0e3540f2e57f3f5628005f0a4e3fa5

commit r15-3427-gf0ab3de6ec0e3540f2e57f3f5628005f0a4e3fa5
Author: Jan Hubicka 
Date:   Tue Sep 3 18:20:34 2024 +0200

Zen5 tuning part 4: update reassociation width

Zen5 has 6 instead of 4 ALUs and the integer multiplication can now execute in
3 of them.  FP units can do 2 additions and 2 multiplications with latency 2
and 3.  This patch updates reassociation width accordingly.  This has the
potential to increase register pressure, but unlike when benchmarking the
znver1 tuning, I did not notice this actually causing problems on SPEC, so
this patch bumps up reassociation width to 6 for everything except for integer
vectors, where there are 4 units with typical latency of 1.
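
What a reassociation width means for the generated code can be sketched by
hand. This is a rough analogue under assumptions, not what the reassoc
pass literally emits: product_serial and product_reassoc are hypothetical
names, and the width is a template parameter chosen by the caller. A
single chain makes each multiply wait for the previous one, while W
independent chains can keep several execution units busy at the cost of W
live accumulators.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One dependency chain: every multiply depends on the previous result.
std::uint64_t product_serial(const std::uint64_t* v, std::size_t n)
{
  std::uint64_t p = 1;
  for (std::size_t i = 0; i < n; ++i)
    p *= v[i];
  return p;
}

// W independent chains, combined at the end.  Unsigned multiplication
// wraps modulo 2^64 and is associative, so the result is unchanged.
template<std::size_t W>
std::uint64_t product_reassoc(const std::uint64_t* v, std::size_t n)
{
  std::uint64_t acc[W];
  for (auto& a : acc)
    a = 1;

  std::size_t i = 0;
  for (; i + W <= n; i += W)
    for (std::size_t k = 0; k < W; ++k)
      acc[k] *= v[i + k];   // the W multiplies here are independent

  for (; i < n; ++i)        // leftover elements
    acc[0] *= v[i];

  std::uint64_t p = 1;
  for (auto a : acc)
    p *= a;
  return p;
}
```

The extra accumulators are where the register pressure mentioned above
comes from: a width of 6 needs six values live across the loop instead of
one.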

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_reassociation_width): Update for Znver5.
* config/i386/x86-tune-costs.h (znver5_costs): Update reassociation
widths.

Diff:
---
 gcc/config/i386/i386.cc  | 10 +++---
 gcc/config/i386/x86-tune-costs.h | 23 +--
 2 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 7af9ceca429f..e8744fa77ead 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -24483,13 +24483,17 @@ ix86_reassociation_width (unsigned int op, machine_mode mode)
   if (width == 1)
return 1;
 
-  /* Integer vector instructions execute in FP unit
+  /* Znver1-4 Integer vector instructions execute in FP unit
 and can execute 3 additions and one multiplication per cycle.  */
   if ((ix86_tune == PROCESSOR_ZNVER1 || ix86_tune == PROCESSOR_ZNVER2
-  || ix86_tune == PROCESSOR_ZNVER3 || ix86_tune == PROCESSOR_ZNVER4
-  || ix86_tune == PROCESSOR_ZNVER5)
+  || ix86_tune == PROCESSOR_ZNVER3 || ix86_tune == PROCESSOR_ZNVER4)
  && INTEGRAL_MODE_P (mode) && op != PLUS && op != MINUS)
return 1;
+  /* Znver5 can do 2 integer multiplications per cycle with latency
+of 3.  */
+  if (ix86_tune == PROCESSOR_ZNVER5
+ && INTEGRAL_MODE_P (mode) && op != PLUS && op != MINUS)
+   width = 6;
 
   /* Account for targets that splits wide vectors into multiple parts.  */
   if (TARGET_AVX512_SPLIT_REGS && GET_MODE_BITSIZE (mode) > 256)
diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index 2bfaee554d53..b90567fbbf23 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -2100,16 +2100,19 @@ struct processor_costs znver5_cost = {
   COSTS_N_INSNS (13),  /* cost of DIVSD instruction.  */
   COSTS_N_INSNS (14),  /* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (20),  /* cost of SQRTSD instruction.  */
-  /* Zen can execute 4 integer operations per cycle.  FP operations
- take 3 cycles and it can execute 2 integer additions and 2
- multiplications thus reassociation may make sense up to with of 6.
- SPEC2k6 bencharks suggests
- that 4 works better than 6 probably due to register pressure.
-
- Integer vector operations are taken by FP unit and execute 3 vector
- plus/minus operations per cycle but only one multiply.  This is adjusted
- in ix86_reassociation_width.  */
-  4, 4, 3, 6,  /* reassoc int, fp, vec_int, vec_fp.  */
+  /* Zen5 can execute:
+  - integer ops: 6 per cycle, at most 3 multiplications.
+   latency 1 for additions, 3 for multiplications (pipelined)
+
+   Setting width of 9 for multiplication is probably excessive
+   for register pressure.
+  - fp ops: 2 additions per cycle, latency 2-3
+   2 multiplicaitons per cycle, latency 3
+  - vector intger ops: 4 additions, latency 1
+  2 multiplications, latency 4
+   We increase width to 6 for multiplications
+   in ix86_reassociation_width.  */
+  6, 6, 4, 6,  /* reassoc int, fp, vec_int, vec_fp.  */
   znver2_memcpy,
   znver2_memset,
   COSTS_N_INSNS (4),   /* cond_taken_branch_cost.  */


[gcc r15-3428] c++: add fixed test [PR109095]

2024-09-03 Thread Marek Polacek via Gcc-cvs
https://gcc.gnu.org/g:5f3a6e26aab16a792176b33fbee1456a91aaebf2

commit r15-3428-g5f3a6e26aab16a792176b33fbee1456a91aaebf2
Author: Marek Polacek 
Date:   Tue Sep 3 13:32:35 2024 -0400

c++: add fixed test [PR109095]

Fixed by r13-6693.

PR c++/109095

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/nontype-class66.C: New test.

Diff:
---
 gcc/testsuite/g++.dg/cpp2a/nontype-class66.C | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class66.C b/gcc/testsuite/g++.dg/cpp2a/nontype-class66.C
new file mode 100644
index ..385b290521fe
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class66.C
@@ -0,0 +1,19 @@
+// PR c++/109095
+// { dg-do compile { target c++20 } }
+
+template< typename T >
+struct bar
+{};
+
+template< int X >
+struct baz
+{};
+
+template< auto N, template< auto N2 > typename TT >
+struct foo;
+
+template< typename T, bar< T > B, template< T N2 > typename TT >
+struct foo< B, TT >
+{};
+
+foo< bar< int >{}, baz > x;


[gcc r15-3430] pretty-print: add selftest of pp_format's stack

2024-09-03 Thread David Malcolm via Gcc-cvs
https://gcc.gnu.org/g:d0891f3aa75d31744de728905f2f454e9d07ce54

commit r15-3430-gd0891f3aa75d31744de728905f2f454e9d07ce54
Author: David Malcolm 
Date:   Tue Sep 3 15:11:01 2024 -0400

pretty-print: add selftest of pp_format's stack

gcc/ChangeLog:
* pretty-print-format-impl.h (pp_formatted_chunks::get_prev): New
accessor.
* pretty-print.cc (selftest::push_pp_format): New.
(ASSERT_TEXT_TOKEN): New macro.
(selftest::test_pp_format_stack): New test.
(selftest::pretty_print_cc_tests): New.

Signed-off-by: David Malcolm 

Diff:
---
 gcc/pretty-print-format-impl.h |  3 ++
 gcc/pretty-print.cc| 78 ++
 2 files changed, 81 insertions(+)

diff --git a/gcc/pretty-print-format-impl.h b/gcc/pretty-print-format-impl.h
index c70f61ce1bab..ec4425c9dafb 100644
--- a/gcc/pretty-print-format-impl.h
+++ b/gcc/pretty-print-format-impl.h
@@ -376,6 +376,9 @@ public:
   void dump (FILE *out) const;
   void DEBUG_FUNCTION dump () const { dump (stderr); }
 
+  // For use in selftests
+  pp_formatted_chunks *get_prev () const { return m_prev; }
+
 private:
   /* Pointer to previous level on the stack.  */
   pp_formatted_chunks *m_prev;
diff --git a/gcc/pretty-print.cc b/gcc/pretty-print.cc
index 50aea69edd62..115f376c4512 100644
--- a/gcc/pretty-print.cc
+++ b/gcc/pretty-print.cc
@@ -3547,6 +3547,83 @@ test_custom_tokens_2 ()
"print_tokens was called");
 }
 
+/* Helper subroutine for test_pp_format_stack.
+   Call pp_format (phases 1 and 2), without calling phase 3.  */
+
+static void
+push_pp_format (pretty_printer *pp, const char *msg, ...)
+{
+  va_list ap;
+
+  va_start (ap, msg);
+  rich_location rich_loc (line_table, UNKNOWN_LOCATION);
+  text_info ti (msg, &ap, 0, nullptr, &rich_loc);
+  pp_format (pp, &ti);
+  va_end (ap);
+}
+
+#define ASSERT_TEXT_TOKEN(TOKEN, EXPECTED_TEXT)\
+  SELFTEST_BEGIN_STMT  \
+ASSERT_NE ((TOKEN), nullptr);  \
+ASSERT_EQ ((TOKEN)->m_kind, pp_token::kind::text); \
+ASSERT_STREQ   \
+  (as_a  (TOKEN)->m_value.get (),   \
+   (EXPECTED_TEXT));   \
+  SELFTEST_END_STMT
+
+
+/* Verify that the stack of pp_formatted_chunks works as expected.  */
+
+static void
+test_pp_format_stack ()
+{
+  auto_fix_quotes fix_quotes;
+
+  pretty_printer pp;
+  push_pp_format (&pp, "unexpected foo: %i bar: %qs", 42, "test");
+  push_pp_format (&pp, "In function: %qs", "test_fn");
+
+  /* Expect the top of the stack to have:
+ (gdb) call top->dump()
+ 0: [TEXT("In function: ")]
+ 1: [BEGIN_QUOTE, TEXT("test_fn"), END_QUOTE].  */
+
+  pp_formatted_chunks *top = pp_buffer (&pp)->m_cur_formatted_chunks;
+  ASSERT_NE (top, nullptr);
+  ASSERT_TEXT_TOKEN (top->get_token_lists ()[0]->m_first, "In function: ");
+  ASSERT_EQ (top->get_token_lists ()[1]->m_first->m_kind,
+pp_token::kind::begin_quote);
+  ASSERT_EQ (top->get_token_lists ()[2], nullptr);
+
+  /* Expect an entry in the stack below it with:
+ 0: [TEXT("unexpected foo: ")]
+ 1: [TEXT("42")]
+ 2: [TEXT(" bar: ")]
+ 3: [BEGIN_QUOTE, TEXT("test"), END_QUOTE].  */
+  pp_formatted_chunks *prev = top->get_prev ();
+  ASSERT_NE (prev, nullptr);
+  ASSERT_TEXT_TOKEN (prev->get_token_lists ()[0]->m_first, "unexpected foo: ");
+  ASSERT_TEXT_TOKEN (prev->get_token_lists ()[1]->m_first, "42");
+  ASSERT_TEXT_TOKEN (prev->get_token_lists ()[2]->m_first, " bar: ");
+  ASSERT_EQ (prev->get_token_lists ()[3]->m_first->m_kind,
+pp_token::kind::begin_quote);
+  ASSERT_EQ (prev->get_token_lists ()[4], nullptr);
+
+  ASSERT_EQ (prev->get_prev (), nullptr);
+
+  /* Pop the top of the stack.  */
+  pp_output_formatted_text (&pp);
+  ASSERT_EQ (pp_buffer (&pp)->m_cur_formatted_chunks, prev);
+  pp_newline (&pp);
+
+  /* Pop the remaining entry from the stack.  */
+  pp_output_formatted_text (&pp);
+  ASSERT_EQ (pp_buffer (&pp)->m_cur_formatted_chunks, nullptr);
+
+  ASSERT_STREQ (pp_formatted_text (&pp),
+   "In function: `test_fn'\nunexpected foo: 42 bar: `test'");
+}
+
 /* A subclass of pretty_printer for use by test_prefixes_and_wrapping.  */
 
 class test_pretty_printer : public pretty_printer
@@ -3976,6 +4053,7 @@ pretty_print_cc_tests ()
   test_merge_consecutive_text_tokens ();
   test_custom_tokens_1 ();
   test_custom_tokens_2 ();
+  test_pp_format_stack ();
   test_prefixes_and_wrapping ();
   test_urls ();
   test_urls_from_braces ();


[gcc r15-3429] pretty-print: naming cleanups

2024-09-03 Thread David Malcolm via Gcc-cvs
https://gcc.gnu.org/g:34f01475611b422668a70744c79273c7019625f2

commit r15-3429-g34f01475611b422668a70744c79273c7019625f2
Author: David Malcolm 
Date:   Tue Sep 3 15:10:56 2024 -0400

pretty-print: naming cleanups

This patch is a followup to r15-3311-ge31b6176996567 making some
cleanups to pretty-printing to reflect those changes:
- renaming "chunk_info" to "pp_formatted_chunks"
- renaming "cur_chunk_array" to "m_cur_formatted_chunks"
- rewording/clarifying comments
and taking the opportunity to add a "m_" prefix to all fields of
output_buffer.

No functional change intended.

gcc/analyzer/ChangeLog:
* analyzer-logging.cc (logger::logger): Prefix all output_buffer
fields with "m_".

gcc/c-family/ChangeLog:
* c-ada-spec.cc (dump_ada_node): Prefix all output_buffer fields
with "m_".
* c-pretty-print.cc (pp_c_integer_constant): Likewise.
(pp_c_integer_constant): Likewise.
(pp_c_floating_constant): Likewise.
(pp_c_fixed_constant): Likewise.

gcc/c/ChangeLog:
* c-objc-common.cc (print_type): Prefix all output_buffer fields
with "m_".

gcc/cp/ChangeLog:
* error.cc (type_to_string): Prefix all output_buffer fields with
"m_".
(append_formatted_chunk): Likewise.  Rename "chunk_info" to
"pp_formatted_chunks" and field cur_chunk_array with
m_cur_formatted_chunks.

gcc/fortran/ChangeLog:
* error.cc (gfc_move_error_buffer_from_to): Prefix all
output_buffer fields with "m_".
(gfc_diagnostics_init): Likewise.

gcc/ChangeLog:
* diagnostic.cc (diagnostic_set_caret_max_width): Prefix all
output_buffer fields with "m_".
* dumpfile.cc (emit_any_pending_textual_chunks): Likewise.
(emit_any_pending_textual_chunks): Likewise.
* gimple-pretty-print.cc (gimple_dump_bb_buff): Likewise.
* json.cc (value::dump): Likewise.
* pretty-print-format-impl.h (class chunk_info): Rename to...
(class pp_formatted_chunks): ...this.  Add friend
class output_buffer.  Update comment near end of decl to show
the pp_formatted_chunks instance on the chunk_obstack.
(pp_formatted_chunks::pop_from_output_buffer): Delete decl.
(pp_formatted_chunks::on_begin_quote): Delete decl that should
have been removed in r15-3311-ge31b6176996567.
(pp_formatted_chunks::on_end_quote): Likewise.
(pp_formatted_chunks::m_prev): Update for renaming.
* pretty-print.cc (output_buffer::output_buffer): Prefix all
fields with "m_".  Rename "cur_chunk_array" to
"m_cur_formatted_chunks".
(output_buffer::~output_buffer): Prefix all fields with "m_".
(output_buffer::push_formatted_chunks): New.
(output_buffer::pop_formatted_chunks): New.
(pp_write_text_to_stream): Prefix all output_buffer fields with
"m_".
(pp_write_text_as_dot_label_to_stream): Likewise.
(pp_write_text_as_html_like_dot_to_stream): Likewise.
(chunk_info::append_formatted_chunk): Rename to...
(pp_formatted_chunks::append_formatted_chunk): ...this.
(chunk_info::pop_from_output_buffer): Delete.
(pretty_printer::format): Update leading comment to mention
pushing pp_formatted_chunks, and to reflect changes in
r15-3311-ge31b6176996567.  Prefix all output_buffer fields with
"m_".
(pp_output_formatted_text): Update leading comment to mention
popping a pp_formatted_chunks, and to reflect the changes in
r15-3311-ge31b6176996567.  Prefix all output_buffer fields with
"m_" and rename "cur_chunk_array" to "m_cur_formatted_chunks".
Replace call to chunk_info::pop_from_output_buffer with a call to
output_buffer::pop_formatted_chunks.
(pp_flush): Prefix all output_buffer fields with "m_".
(pp_really_flush): Likewise.
(pp_clear_output_area): Likewise.
(pp_append_text): Likewise.
(pretty_printer::remaining_character_count_for_line): Likewise.
(pp_newline): Likewise.
(pp_character): Likewise.
(pp_markup::context::push_back_any_text): Likewise.
* pretty-print.h (class chunk_info): Rename to...
(class pp_formatted_chunks): ...this.
(class output_buffer): Delete unimplemented rule-of-5 members.
(output_buffer::push_formatted_chunks): New decl.
(output_buffer::pop_formatted_chunks): New decl.
(output_buffer::formatted_obstack): Rename to...
(output_buffer::m_formatted_obstack): ...this.
(output_buffe

[gcc r15-3431] pretty-print: split up pretty_printer::format into subroutines

2024-09-03 Thread David Malcolm via Gcc-cvs
https://gcc.gnu.org/g:07e74798b93c256bea3a91895d3517223a58da61

commit r15-3431-g07e74798b93c256bea3a91895d3517223a58da61
Author: David Malcolm 
Date:   Tue Sep 3 15:11:06 2024 -0400

pretty-print: split up pretty_printer::format into subroutines

The body of pretty_printer::format is almost 500 lines long,
mostly comprising two distinct phases.

This patch splits it up so that there are explicit subroutines
for the two different phases, reducing the scope of various
locals, and making it easier to e.g. put a breakpoint on phase 2.

No functional change intended.
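
As a rough illustration of the two phases named above, here is a minimal
standalone sketch.  It is not GCC's code: the toy_* names are hypothetical,
only "%s" and "%%" are handled, and plain std::string chunks stand in for
pp_token_list.  It follows the chunk layout described in the comments of
pretty_printer::format (even-numbered chunks are verbatim text,
odd-numbered chunks are format specifiers).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Phase 1 (toy version): split FMT into chunks.  Even-numbered chunks are
// literal text, odd-numbered chunks hold the format specifier character.
// "%%" is folded into the literal text right away.
static std::vector<std::string>
toy_format_phase_1 (const std::string &fmt)
{
  std::vector<std::string> chunks;
  std::string text;
  for (std::size_t i = 0; i < fmt.size (); ++i)
    {
      if (fmt[i] != '%' || i + 1 == fmt.size ())
        {
          text += fmt[i];
          continue;
        }
      const char spec = fmt[++i];
      if (spec == '%')
        {
          text += '%';
          continue;
        }
      chunks.push_back (text);                   // even index: literal text
      chunks.push_back (std::string (1, spec));  // odd index: specifier
      text.clear ();
    }
  // Chunks were pushed in pairs, so any trailing text lands on an even index.
  assert (chunks.size () % 2 == 0);
  if (!text.empty ())
    chunks.push_back (text);
  return chunks;
}

// Phase 2 (toy version): replace each odd-numbered chunk with the next
// argument.  Only "%s" is understood; other specifiers are left untouched.
static void
toy_format_phase_2 (std::vector<std::string> &chunks,
                    const std::vector<std::string> &args)
{
  std::size_t argno = 0;
  for (std::size_t i = 1; i < chunks.size (); i += 2)
    if (chunks[i] == "s" && argno < args.size ())
      chunks[i] = args[argno++];
}

// The driver only sequences the phases, which is the shape the patch gives
// to pretty_printer::format.
static std::string
toy_format (const std::string &fmt, const std::vector<std::string> &args)
{
  std::vector<std::string> chunks = toy_format_phase_1 (fmt);
  toy_format_phase_2 (chunks, args);
  std::string result;
  for (const std::string &chunk : chunks)
    result += chunk;
  return result;
}
```

With this split, a breakpoint on toy_format_phase_2 stops after the chunks
have been built but before any argument has been substituted, which is the
debugging benefit the commit message mentions for the real code.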

gcc/ChangeLog:
* pretty-print-markup.h (pp_markup::context::context): Drop
params "buf" and "chunk_idx", initializing m_buf from pp.
(pp_markup::context::m_chunk_idx): Drop field.
* pretty-print.cc (pretty_printer::format): Convert param
from a text_info * to a text_info &.  Split out phase 1
and phase 2 into subroutines...
(format_phase_1): New, from pretty_printer::format.
(format_phase_2): Likewise.
* pretty-print.h (pretty_printer::format): Convert param
from a text_info * to a text_info &.
(pp_format): Update for above change.  Assert that text_info is
non-null.

Signed-off-by: David Malcolm 

Diff:
---
 gcc/pretty-print-markup.h |   6 +-
 gcc/pretty-print.cc   | 232 +-
 gcc/pretty-print.h|   5 +-
 3 files changed, 131 insertions(+), 112 deletions(-)

diff --git a/gcc/pretty-print-markup.h b/gcc/pretty-print-markup.h
index ce2c5e9dbbe9..de9e4bda6ade 100644
--- a/gcc/pretty-print-markup.h
+++ b/gcc/pretty-print-markup.h
@@ -30,13 +30,10 @@ class context
 {
 public:
   context (pretty_printer &pp,
-  output_buffer &buf,
-  unsigned chunk_idx,
   bool &quoted,
   pp_token_list *formatted_token_list)
   : m_pp (pp),
-m_buf (buf),
-m_chunk_idx (chunk_idx),
+m_buf (*pp_buffer (&pp)),
 m_quoted (quoted),
 m_formatted_token_list (formatted_token_list)
   {
@@ -52,7 +49,6 @@ public:
 
   pretty_printer &m_pp;
   output_buffer &m_buf;
-  unsigned m_chunk_idx;
   bool &m_quoted;
   pp_token_list *m_formatted_token_list;
 };
diff --git a/gcc/pretty-print.cc b/gcc/pretty-print.cc
index 115f376c4512..998e06e155f7 100644
--- a/gcc/pretty-print.cc
+++ b/gcc/pretty-print.cc
@@ -1589,35 +1589,79 @@ push_back_any_text (pp_token_list *tok_list,
Phase 3 is in pp_output_formatted_text, which pops the pp_formatted_chunks
instance.  */
 
+static void
+format_phase_1 (const text_info &text,
+   obstack &chunk_obstack,
+   pp_token_list **args,
+   pp_token_list ***formatters);
+
+static void
+format_phase_2 (pretty_printer *pp,
+   text_info &text,
+   obstack &chunk_obstack,
+   pp_token_list ***formatters);
+
 void
-pretty_printer::format (text_info *text)
+pretty_printer::format (text_info &text)
 {
-  output_buffer * const buffer = m_buffer;
+  pp_formatted_chunks *new_chunk_array = m_buffer->push_formatted_chunks ();
+  pp_token_list **args = new_chunk_array->m_args;
 
-  unsigned int chunk = 0, argno;
   pp_token_list **formatters[PP_NL_ARGMAX];
-
-  pp_formatted_chunks *new_chunk_array = buffer->push_formatted_chunks ();
-  pp_token_list **args = new_chunk_array->m_args;
+  memset (formatters, 0, sizeof formatters);
 
   /* Formatting phase 1: split up TEXT->format_spec into chunks in
  pp_buffer (PP)->args[].  Even-numbered chunks are to be output
  verbatim, odd-numbered chunks are format specifiers.
  %m, %%, %<, %>, %} and %' are replaced with the appropriate text at
  this point.  */
+  format_phase_1 (text, m_buffer->m_chunk_obstack, args, formatters);
 
-  memset (formatters, 0, sizeof formatters);
+  /* Note that you can debug the state of the chunk arrays here using
+   (gdb) call m_buffer->cur_chunk_array->dump()
+ which, given e.g. "foo: %s bar: %s" might print:
+   0: [TEXT("foo: ")]
+   1: [TEXT("s")]
+   2: [TEXT(" bar: ")]
+   3: [TEXT("s")]
+  */
+
+  /* Set output to the argument obstack, and switch line-wrapping and
+ prefixing off.  */
+  m_buffer->m_obstack = &m_buffer->m_chunk_obstack;
+  const int old_line_length = m_buffer->m_line_length;
+  const pp_wrapping_mode_t old_wrapping_mode = pp_set_verbatim_wrapping (this);
+
+  format_phase_2 (this, text, m_buffer->m_chunk_obstack, formatters);
+
+  /* If the client supplied a postprocessing object, call its "handle"
+ hook here.  */
+  if (m_format_postprocessor)
+m_format_postprocessor->handle (this);
+
+  /* Revert to normal obstack and wrapping mode.  */
+  m_buffer->m_obstack = &m_buffer->m_formatted_obstack;
+  m_buffer->m_line_length = old_line_length;
+  pp_wrapping_mode (this) = old_wrapping_mode;
+  clear_state ();
+}
 
+static void
+format_phase_1 (cons

[gcc r15-3432] PR116080: Fix test suite checks for musttail

2024-09-03 Thread Andi Kleen via Gcc-cvs
https://gcc.gnu.org/g:1fad396dd467326251572811b703e788e62a2588

commit r15-3432-g1fad396dd467326251572811b703e788e62a2588
Author: Andi Kleen 
Date:   Mon Jul 29 10:58:29 2024 -0700

PR116080: Fix test suite checks for musttail

This is a new attempt to fix PR116080. The previous try was reverted
because it just broke a bunch of tests, hiding the problem.

- musttail behaves differently than tailcall at -O0. Some of the tests
run at -O0, so add separate effective target tests for musttail.
- New effective target tests need to use unique file names
to make dejagnu caching work
- Change the tests to use new targets
- Add an external_musttail test to check for the target's ability
to do tail calls between translation units. This covers some powerpc
ABIs.

gcc/testsuite/ChangeLog:

PR testsuite/116080
* c-c++-common/musttail1.c: Use musttail target.
* c-c++-common/musttail12.c: Use struct_musttail target.
* c-c++-common/musttail2.c: Use musttail target.
* c-c++-common/musttail3.c: Likewise.
* c-c++-common/musttail4.c: Likewise.
* c-c++-common/musttail7.c: Likewise.
* c-c++-common/musttail8.c: Likewise.
* g++.dg/musttail10.C: Likewise. Replace powerpc checks with
external_musttail.
* g++.dg/musttail11.C: Use musttail target.
* g++.dg/musttail6.C: Use musttail target. Replace powerpc
checks with external_musttail.
* g++.dg/musttail9.C: Use musttail target.
* lib/target-supports.exp: Add musttail, struct_musttail,
external_musttail targets. Remove optimization for musttail.
Use unique file names for musttail.

Diff:
---
 gcc/testsuite/c-c++-common/musttail1.c  |  2 +-
 gcc/testsuite/c-c++-common/musttail12.c |  2 +-
 gcc/testsuite/c-c++-common/musttail2.c  |  2 +-
 gcc/testsuite/c-c++-common/musttail3.c  |  2 +-
 gcc/testsuite/c-c++-common/musttail4.c  |  2 +-
 gcc/testsuite/c-c++-common/musttail7.c  |  2 +-
 gcc/testsuite/c-c++-common/musttail8.c  |  2 +-
 gcc/testsuite/g++.dg/musttail10.C   |  6 +++---
 gcc/testsuite/g++.dg/musttail11.C   |  2 +-
 gcc/testsuite/g++.dg/musttail6.C|  4 ++--
 gcc/testsuite/g++.dg/musttail9.C|  2 +-
 gcc/testsuite/lib/target-supports.exp   | 30 --
 12 files changed, 38 insertions(+), 20 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/musttail1.c b/gcc/testsuite/c-c++-common/musttail1.c
index 74efcc2a0bc6..51549672e02a 100644
--- a/gcc/testsuite/c-c++-common/musttail1.c
+++ b/gcc/testsuite/c-c++-common/musttail1.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+/* { dg-do compile { target { musttail && { c || c++11 } } } } */
 /* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
 
 int __attribute__((noinline,noclone,noipa))
diff --git a/gcc/testsuite/c-c++-common/musttail12.c b/gcc/testsuite/c-c++-common/musttail12.c
index 4140bcd00950..475afc5af3f3 100644
--- a/gcc/testsuite/c-c++-common/musttail12.c
+++ b/gcc/testsuite/c-c++-common/musttail12.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { struct_tail_call && { c || c++11 } } } } */
+/* { dg-do compile { target { struct_musttail && { c || c++11 } } } } */
 /* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
 
 struct str
diff --git a/gcc/testsuite/c-c++-common/musttail2.c b/gcc/testsuite/c-c++-common/musttail2.c
index 86f2c3d77404..1970c4edd670 100644
--- a/gcc/testsuite/c-c++-common/musttail2.c
+++ b/gcc/testsuite/c-c++-common/musttail2.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+/* { dg-do compile { target { musttail && { c || c++11 } } } } */
 
 struct box { char field[256]; int i; };
 
diff --git a/gcc/testsuite/c-c++-common/musttail3.c b/gcc/testsuite/c-c++-common/musttail3.c
index ea9589c59ef2..7499fd6460b4 100644
--- a/gcc/testsuite/c-c++-common/musttail3.c
+++ b/gcc/testsuite/c-c++-common/musttail3.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+/* { dg-do compile { target { struct_musttail && { c || c++11 } } } } */
 
 extern int foo2 (int x, ...);
 
diff --git a/gcc/testsuite/c-c++-common/musttail4.c b/gcc/testsuite/c-c++-common/musttail4.c
index 23f4b5e1cd68..bd6effa4b931 100644
--- a/gcc/testsuite/c-c++-common/musttail4.c
+++ b/gcc/testsuite/c-c++-common/musttail4.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+/* { dg-do compile { target { musttail && { c || c++11 } } } } */
 
 struct box { char field[64]; int i; };
 
diff --git a/gcc/testsuite/c-c++-common/musttail7.c b/gcc/testsuite/c-c++-common/musttail7.c
index c753a3fe9b2a..d17cb71256d7 100644
--- a/gcc/testsuite/c-c++-common/musttail7.c
+++ b/gcc/testsuite/c-c++-common/musttail7.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { tail_call && { c ||

[gcc r15-3433] c++: support C++11 attributes in C++98

2024-09-03 Thread Jason Merrill via Gcc-cvs
https://gcc.gnu.org/g:3775f71c8909b3531fe002138814fa2504ec2e8b

commit r15-3433-g3775f71c8909b3531fe002138814fa2504ec2e8b
Author: Jason Merrill 
Date:   Fri Aug 30 16:02:10 2024 -0400

c++: support C++11 attributes in C++98

I don't see any reason why we can't allow the [[]] attribute syntax in C++98
mode with a pedwarn just like many other C++11 features.  In fact, we
already do support it in some places in the grammar, but not in places that
check cp_nth_tokens_can_be_std_attribute_p.

Let's also follow the C front-end's lead in only warning about them when
-pedantic.
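
For illustration, a minimal sketch of the kind of code this affects.  The
function names are made up for the example.  Going by the description
above, with this patch g++ -std=c++98 should accept the [[...]] syntax
below and only emit the "C++11 attributes" pedwarn when -pedantic is given;
that behaviour is taken from the commit message and has not been checked
against a build here.

```cpp
#include <cassert>

// An attribute-specifier written with the C++11 [[...]] syntax.
// gnu::noinline is a GNU attribute; the standard syntax is what matters here.
[[gnu::noinline]] int
twice (int x)
{
  return 2 * x;
}

// Small caller so the example does something observable.
int
attribute_demo ()
{
  assert (twice (21) == 42);
  return twice (21);
}
```

The testsuite changes in this commit take the same approach: the
gen-attrs tests drop the c++11 target restriction and pass
-Wno-c++11-extensions instead.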

It still isn't necessary for this function to guard against Objective-C
message passing syntax; we handle that with tentative parsing in
cp_parser_statement, and we don't call this function in that context anyway.

gcc/cp/ChangeLog:

* parser.cc (cp_nth_tokens_can_be_std_attribute_p): Don't check
cxx_dialect.
* error.cc (maybe_warn_cpp0x): Only complain about C++11 attributes
if pedantic.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/gen-attrs-1.C: Also run in C++98 mode.
* g++.dg/cpp0x/gen-attrs-11.C: Likewise.
* g++.dg/cpp0x/gen-attrs-13.C: Likewise.
* g++.dg/cpp0x/gen-attrs-15.C: Likewise.
* g++.dg/cpp0x/gen-attrs-75.C: Don't expect C++98 warning after
__extension__.

Diff:
---
 gcc/cp/error.cc   |  7 ---
 gcc/cp/parser.cc  |  9 -
 gcc/testsuite/g++.dg/cpp0x/gen-attrs-1.C  |  2 +-
 gcc/testsuite/g++.dg/cpp0x/gen-attrs-11.C |  2 +-
 gcc/testsuite/g++.dg/cpp0x/gen-attrs-13.C |  2 +-
 gcc/testsuite/g++.dg/cpp0x/gen-attrs-15.C |  2 +-
 gcc/testsuite/g++.dg/cpp0x/gen-attrs-75.C | 10 +-
 7 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/gcc/cp/error.cc b/gcc/cp/error.cc
index 57cd76caf490..4a9e9aa3cdcb 100644
--- a/gcc/cp/error.cc
+++ b/gcc/cp/error.cc
@@ -4735,9 +4735,10 @@ maybe_warn_cpp0x (cpp0x_warn_str str, location_t loc/*=input_location*/)
 "only available with %<-std=c++11%> or %<-std=gnu++11%>");
 break;
   case CPP0X_ATTRIBUTES:
-   pedwarn (loc, OPT_Wc__11_extensions,
-"C++11 attributes "
-"only available with %<-std=c++11%> or %<-std=gnu++11%>");
+   if (pedantic)
+ pedwarn (loc, OPT_Wc__11_extensions,
+  "C++11 attributes "
+  "only available with %<-std=c++11%> or %<-std=gnu++11%>");
break;
   case CPP0X_REF_QUALIFIER:
pedwarn (loc, OPT_Wc__11_extensions,
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index edfa5a494405..64122d937fa5 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -29924,11 +29924,10 @@ cp_nth_tokens_can_be_std_attribute_p (cp_parser *parser, size_t n)
 {
   cp_token *token = cp_lexer_peek_nth_token (parser->lexer, n);
 
-  return (cxx_dialect >= cxx11
- && ((token->type == CPP_KEYWORD && token->keyword == RID_ALIGNAS)
- || (token->type == CPP_OPEN_SQUARE
- && (token = cp_lexer_peek_nth_token (parser->lexer, n + 1))
- && token->type == CPP_OPEN_SQUARE)));
+  return ((token->type == CPP_KEYWORD && token->keyword == RID_ALIGNAS)
+ || (token->type == CPP_OPEN_SQUARE
+ && (token = cp_lexer_peek_nth_token (parser->lexer, n + 1))
+ && token->type == CPP_OPEN_SQUARE));
 }
 
 /* Return TRUE iff the next Nth tokens in the stream are possibly the
diff --git a/gcc/testsuite/g++.dg/cpp0x/gen-attrs-1.C b/gcc/testsuite/g++.dg/cpp0x/gen-attrs-1.C
index c2cf912047e9..b1625d969167 100644
--- a/gcc/testsuite/g++.dg/cpp0x/gen-attrs-1.C
+++ b/gcc/testsuite/g++.dg/cpp0x/gen-attrs-1.C
@@ -1,3 +1,3 @@
-// { dg-do compile { target c++11 } }
+// { dg-additional-options "-Wno-c++11-extensions" }
 
 int  [[gnu::format(printf, 1, 2)]] foo(const char *, ...); // { dg-warning "only applies to function types" }
diff --git a/gcc/testsuite/g++.dg/cpp0x/gen-attrs-11.C b/gcc/testsuite/g++.dg/cpp0x/gen-attrs-11.C
index 504b4565679c..040f15c9dbb9 100644
--- a/gcc/testsuite/g++.dg/cpp0x/gen-attrs-11.C
+++ b/gcc/testsuite/g++.dg/cpp0x/gen-attrs-11.C
@@ -1,4 +1,4 @@
-// { dg-do compile { target c++11 } }
+// { dg-additional-options "-Wno-c++11-extensions" }
 // PR c++/13791
 
 template  struct O {
diff --git a/gcc/testsuite/g++.dg/cpp0x/gen-attrs-13.C b/gcc/testsuite/g++.dg/cpp0x/gen-attrs-13.C
index a1b4a84b7e54..8997b845dfd9 100644
--- a/gcc/testsuite/g++.dg/cpp0x/gen-attrs-13.C
+++ b/gcc/testsuite/g++.dg/cpp0x/gen-attrs-13.C
@@ -1,4 +1,4 @@
-// { dg-do compile { target c++11 } }
+// { dg-additional-options "-Wno-c++11-extensions" }
 // PR c++/13854
 
 extern char *rindex [[gnu::__pure__]] (__const char *__s, int __c) throw ();
diff --git a/gcc/testsuite/g++.dg/cpp0x/gen-attrs-15.C b/gcc/testsuite/g++.dg/cpp0x/gen-attrs-15.C
index bf05dbeb31b9..8b552c

[gcc r15-3434] Explicitly document that the "counted_by" attribute is only supported in C.

2024-09-03 Thread Qing Zhao via Gcc-cvs
https://gcc.gnu.org/g:f9642ffe7814396f31203f4366f78a43a01a215c

commit r15-3434-gf9642ffe7814396f31203f4366f78a43a01a215c
Author: Qing Zhao 
Date:   Tue Sep 3 19:28:23 2024 +

Explicitly document that the "counted_by" attribute is only supported in C.

The "counted_by" attribute is currently supported only in C; mention this
explicitly in the documentation and also issue a warning under -Wattributes
when the "counted_by" attribute is seen in C++.
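
One way a header shared between C and C++ might cope with this is sketched
below.  The COUNTED_BY macro and make_trailing helper are hypothetical names
for this example, not something GCC provides.  The macro expands to the
attribute only when compiling as C with a compiler that reports support for
it, and to nothing otherwise, so C++ translation units never see the
attribute.  Note that a flexible array member is itself an extension in C++.

```cpp
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper macro: use counted_by only in C, and only when the
   compiler says it knows the attribute.  */
#if !defined (__cplusplus) && defined (__has_attribute)
#  if __has_attribute (counted_by)
#    define COUNTED_BY(member) __attribute__ ((counted_by (member)))
#  endif
#endif
#ifndef COUNTED_BY
#  define COUNTED_BY(member) /* expands to nothing, e.g. in C++ */
#endif

struct trailing
{
  int count;
  int field[] COUNTED_BY (count); /* flexible array member */
};

/* Allocate a struct trailing with room for N elements and record N in the
   count field, which is the field counted_by would refer to in C.  */
static struct trailing *
make_trailing (int n)
{
  assert (n >= 0);
  struct trailing *t
    = (struct trailing *) malloc (sizeof (struct trailing)
                                  + (size_t) n * sizeof (int));
  if (t)
    t->count = n;
  return t;
}
```

The caller owns the returned pointer and releases it with free.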

gcc/c-family/ChangeLog:

* c-attribs.cc (handle_counted_by_attribute): Is ignored and issues
warning with -Wattributes in C++ for now.

gcc/ChangeLog:

* doc/extend.texi: Explicitly mentions counted_by is available
only in C for now.

gcc/testsuite/ChangeLog:

* g++.dg/ext/flex-array-counted-by.C: New test.
* g++.dg/ext/flex-array-counted-by-2.C: New test.

Diff:
---
 gcc/c-family/c-attribs.cc  | 10 +-
 gcc/doc/extend.texi|  4 
 gcc/testsuite/g++.dg/ext/flex-array-counted-by-2.C | 13 +
 gcc/testsuite/g++.dg/ext/flex-array-counted-by.C   | 11 +++
 4 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index cf27cd6d5212..79303518dcb7 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -2867,8 +2867,16 @@ handle_counted_by_attribute (tree *node, tree name,
   tree argval = TREE_VALUE (args);
   tree old_counted_by = lookup_attribute ("counted_by", DECL_ATTRIBUTES (decl));
 
+  /* This attribute is not supported in C++.  */
+  if (c_dialect_cxx ())
+{
+  warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wattributes,
+ "%qE attribute is not supported for C++ for now, ignored",
+ name);
+  *no_add_attrs = true;
+}
   /* This attribute only applies to field decls of a structure.  */
-  if (TREE_CODE (decl) != FIELD_DECL)
+  else if (TREE_CODE (decl) != FIELD_DECL)
 {
   error_at (DECL_SOURCE_LOCATION (decl),
"%qE attribute is not allowed for a non-field"
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 5845bcedf6e5..ebfa6779becb 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -7926,6 +7926,10 @@ The @code{counted_by} attribute may be attached to the C99 flexible array
 member of a structure.  It indicates that the number of the elements of the
 array is given by the field "@var{count}" in the same structure as the
 flexible array member.
+
+This attribute is available only in C for now.
+In C++ this attribute is ignored.
+
 GCC may use this information to improve detection of object size information
 for such structures and provide better results in compile-time diagnostics
 and runtime features like the array bound sanitizer and
diff --git a/gcc/testsuite/g++.dg/ext/flex-array-counted-by-2.C b/gcc/testsuite/g++.dg/ext/flex-array-counted-by-2.C
new file mode 100644
index ..6ac2b509b687
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/flex-array-counted-by-2.C
@@ -0,0 +1,13 @@
+/* Testing the fact that the attribute counted_by is not supported in C++.  */
+/* { dg-do compile { target c++11 } } */
+/* { dg-options "-Wattributes" } */
+
+struct trailing {
+  int count;
+  int field [[gnu::counted_by (count)]] []; /* { dg-warning "attribute is not supported for C\\+\\+ for now, ignored" } */
+};
+
+struct trailing1 {
+  int count1;
+  [[gnu::counted_by (count)]] int field []; /* { dg-warning "attribute is not supported for C\\+\\+ for now, ignored" } */
+};
diff --git a/gcc/testsuite/g++.dg/ext/flex-array-counted-by.C b/gcc/testsuite/g++.dg/ext/flex-array-counted-by.C
new file mode 100644
index ..8bc79d459dfc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/flex-array-counted-by.C
@@ -0,0 +1,11 @@
+/* Testing the fact that the attribute counted_by is not supported in C++.  */
+/* { dg-do compile } */
+/* { dg-options "-Wattributes" } */
+
+int size;
+int x __attribute ((counted_by (size))); /* { dg-warning "attribute is not supported for C\\+\\+ for now, ignored" } */
+
+struct trailing {
+  int count;
+  int field[] __attribute ((counted_by (count))); /* { dg-warning "attribute is not supported for C\\+\\+ for now, ignored" } */
+};


[gcc] Created branch 'meissner/heads/work177' in namespace 'refs/users'

2024-09-03 Thread Michael Meissner via Gcc-cvs
The branch 'meissner/heads/work177' was created in namespace 'refs/users' pointing to:

 f9642ffe7814... Explicitly document that the "counted_by" attribute is only


[gcc r15-3435] split-paths: Move check for # of statements in join earlier

2024-09-03 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:77e17558fcda8992fbe731ccd12bde445e48d6f4

commit r15-3435-g77e17558fcda8992fbe731ccd12bde445e48d6f4
Author: Andrew Pinski 
Date:   Mon Sep 2 20:38:11 2024 -0700

split-paths: Move check for # of statements in join earlier

This moves the check for # of statements to copy in join to
be the first check. This check is the cheapest check so it
should be first. Plus add a print to the dump file since there
was none beforehand.

gcc/ChangeLog:

* gimple-ssa-split-paths.cc (is_feasible_trace): Move
check for # of statements in join earlier and add a
debug print.

Signed-off-by: Andrew Pinski 

Diff:
---
 gcc/gimple-ssa-split-paths.cc | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/gcc/gimple-ssa-split-paths.cc b/gcc/gimple-ssa-split-paths.cc
index 8b4304fe59e0..81a5d1dee5b2 100644
--- a/gcc/gimple-ssa-split-paths.cc
+++ b/gcc/gimple-ssa-split-paths.cc
@@ -167,6 +167,19 @@ is_feasible_trace (basic_block bb)
   int num_stmts_in_pred2
 = EDGE_COUNT (pred2->succs) == 1 ? count_stmts_in_block (pred2) : 0;
 
+  /* Upper Hard limit on the number statements to copy.  */
+  if (num_stmts_in_join
+  >= param_max_jump_thread_duplication_stmts)
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file,
+"Duplicating block %d would be too duplicate "
+"too many statments: %d >= %d\n",
+bb->index, num_stmts_in_join,
+param_max_jump_thread_duplication_stmts);
+  return false;
+}
+
   /* This is meant to catch cases that are likely opportunities for
  if-conversion.  Essentially we look for the case where
  BB's predecessors are both single statement blocks where
@@ -406,12 +419,6 @@ is_feasible_trace (basic_block bb)
   /* We may want something here which looks at dataflow and tries
  to guess if duplication of BB is likely to result in simplification
  of instructions in BB in either the original or the duplicate.  */
-
-  /* Upper Hard limit on the number statements to copy.  */
-  if (num_stmts_in_join
-  >= param_max_jump_thread_duplication_stmts)
-return false;
-
   return true;
 }


[gcc r15-3436] split-path: Improve ifcvt heurstic for split path [PR112402]

2024-09-03 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:b2b20b277988ab9ddb6ea82141075147b7b98f74

commit r15-3436-gb2b20b277988ab9ddb6ea82141075147b7b98f74
Author: Andrew Pinski 
Date:   Mon Sep 2 21:34:53 2024 -0700

split-path: Improve ifcvt heurstic for split path [PR112402]

This simplifies the heuristic that split path uses to decide whether the
join bb is an ifcvt candidate.
Each predecessor bb needs either to be empty or to have only one
statement in it, which could be a decent ifcvt candidate.
The previous heuristics would miss that:
```
if (a) goto B else goto C;
B:  goto C;
C:
c = PHI
```

Would be a decent ifcvt candidate. And would also miss:
```
if (a) goto B else goto C;
B: d = f + 1;  goto C;
C:
c = PHI
```

Also, since the maximum number of cmovs that can currently be produced is 3,
we should only assume that `<= 3` phis can be ifcvt candidates.
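
The per-predecessor check can be modelled outside GCC roughly as follows.
This is a simplified sketch with hypothetical toy_* types standing in for
basic_block, gimple statements and PHI nodes; it mirrors the order of tests
in the new poor_ifcvt_pred function from this patch but is not that
function.  One difference is that it scans every PHI argument rather than
only arguments 0 and 1.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a gimple statement.
struct toy_stmt
{
  bool is_assign;   // models gimple_code (stmt) == GIMPLE_ASSIGN
  bool poor_code;   // models poor_ifcvt_candidate_code (rhs code)
  std::string lhs;  // name defined by the assignment
};

// Hypothetical stand-in for a basic block.
struct toy_block
{
  unsigned num_succs;                          // models EDGE_COUNT (bb->succs)
  std::vector<toy_stmt> stmts;                 // statements in the block
  std::vector<std::vector<std::string> > phis; // arguments of each PHI
};

// Return true if PRED makes the join block BB a poor ifcvt candidate.
static bool
toy_poor_ifcvt_pred (const toy_block &pred, const toy_block &bb)
{
  // More than one successor: this is the block holding the "if",
  // not the middle block.
  if (pred.num_succs != 1)
    return false;
  // An empty middle block is never a poor candidate.
  if (pred.stmts.empty ())
    return false;
  // More than one statement, a non-assignment, or a poor rhs code.
  if (pred.stmts.size () != 1)
    return true;
  const toy_stmt &stmt = pred.stmts[0];
  if (!stmt.is_assign || stmt.poor_code)
    return true;
  // The single assignment must feed a PHI in the join block.
  for (std::size_t i = 0; i < bb.phis.size (); ++i)
    for (std::size_t j = 0; j < bb.phis[i].size (); ++j)
      if (bb.phis[i][j] == stmt.lhs)
        return false;
  return true;
}
```

In terms of the two shapes quoted above, the empty block B falls out at the
second test, and the block containing only `d = f + 1` is accepted when `d`
is one of the PHI arguments in C.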

The testcase change for split-path-6.c is that the lookharder function
is a true ifcvt case where we would get cmov as expected; it looks like it
was not a candidate when the heuristic was added but became one later on.
pr88797.C is now rejected because it is an ifcvt candidate rather than
for reasons related to DCE/const prop.

The rest of the testsuite changes are just slight changes in the dump,
removing the "*diamond" part as it was removed from the print.

Bootstrapped and tested on x86_64.

PR tree-optimization/112402

gcc/ChangeLog:

* gimple-ssa-split-paths.cc (poor_ifcvt_pred): New function.
(is_feasible_trace): Remove old heuristics for ifcvt cases.
For num_stmts <=1 for both pred check poor_ifcvt_pred on both
pred.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/split-path-11.c: Update scan.
* gcc.dg/tree-ssa/split-path-2.c: Update scan.
* gcc.dg/tree-ssa/split-path-5.c: Update scan.
* gcc.dg/tree-ssa/split-path-6.c: Update scan.
* g++.dg/tree-ssa/pr88797.C: Update scan.
* gcc.dg/tree-ssa/split-path-13.c: New test.

Signed-off-by: Andrew Pinski 

Diff:
---
 gcc/gimple-ssa-split-paths.cc | 172 --
 gcc/testsuite/g++.dg/tree-ssa/pr88797.C   |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/split-path-11.c |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/split-path-13.c |  26 
 gcc/testsuite/gcc.dg/tree-ssa/split-path-2.c  |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/split-path-5.c  |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/split-path-6.c  |   4 +-
 7 files changed, 88 insertions(+), 122 deletions(-)

diff --git a/gcc/gimple-ssa-split-paths.cc b/gcc/gimple-ssa-split-paths.cc
index 81a5d1dee5b2..32b5c445760e 100644
--- a/gcc/gimple-ssa-split-paths.cc
+++ b/gcc/gimple-ssa-split-paths.cc
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-phinodes.h"
 #include "ssa-iterators.h"
 #include "fold-const.h"
+#include "cfghooks.h"
 
 /* Given LATCH, the latch block in a loop, see if the shape of the
path reaching LATCH is suitable for being split by duplication.
@@ -141,6 +142,40 @@ poor_ifcvt_candidate_code (enum tree_code code)
  || code == CALL_EXPR);
 }
 
+/* Return TRUE if PRED of BB is an poor ifcvt candidate. */
+static bool
+poor_ifcvt_pred (basic_block pred, basic_block bb)
+{
+  /* If the edge count of the pred is not 1, then
+ this is the predecessor from the if rather
+ than middle one. */
+  if (EDGE_COUNT (pred->succs) != 1)
+return false;
+
+  /* Empty middle bb are never a poor ifcvt candidate. */
+  if (empty_block_p (pred))
+return false;
+  /* If BB's predecessors are single statement blocks where
+ the output of that statement feed the same PHI in BB,
+ it an ifcvt candidate. */
+  gimple *stmt = last_and_only_stmt (pred);
+  if (!stmt || gimple_code (stmt) != GIMPLE_ASSIGN)
+return true;
+  tree_code code = gimple_assign_rhs_code (stmt);
+  if (poor_ifcvt_candidate_code (code))
+return true;
+  tree lhs = gimple_assign_lhs (stmt);
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+{
+  gimple *phi = gsi_stmt (gsi);
+  if (gimple_phi_arg_def (phi, 0) == lhs
+ || gimple_phi_arg_def (phi, 1) == lhs)
+   return false;
+}
+  return true;
+}
+
 /* Return TRUE if BB is a reasonable block to duplicate by examining
its size, false otherwise.  BB will always be a loop latch block.
 
@@ -181,127 +216,30 @@ is_feasible_trace (basic_block bb)
 }
 
   /* This is meant to catch cases that are likely opportunities for
- if-conversion.  Essentially we look for the case where
- BB's predecessors are both single statement blocks where
- the output of that statement feed the same PHI in BB.  */
-  if (num_stmts_in_pred1 == 1 && num_stmts_in_pred2 == 1)
-{
-  gimple *stmt1 = last_and_only_stmt (pred1);
-  gimple *stmt2

[gcc/meissner/heads/work177] (2 commits) split-path: Improve ifcvt heurstic for split path [PR112402

2024-09-03 Thread Michael Meissner via Gcc-cvs
The branch 'meissner/heads/work177' was updated to point to:

 b2b20b277988... split-path: Improve ifcvt heurstic for split path [PR112402

It previously pointed to:

 f9642ffe7814... Explicitly document that the "counted_by" attribute is only

Diff:

Summary of changes (added commits):
---

  b2b20b2... split-path: Improve ifcvt heurstic for split path [PR112402 (*)
  77e1755... split-paths: Move check for # of statements in join earlier (*)

(*) This commit already exists in another branch.
Because the reference `refs/users/meissner/heads/work177' matches
your hooks.email-new-commits-only configuration,
no separate email is sent for this commit.


[gcc(refs/users/meissner/heads/work177)] Add ChangeLog.meissner and REVISION.

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:53168176652d415fb8c77ff7f0fc3c4b8ef4c066

commit 53168176652d415fb8c77ff7f0fc3c4b8ef4c066
Author: Michael Meissner 
Date:   Tue Sep 3 19:39:29 2024 -0400

Add ChangeLog.meissner and REVISION.

2024-09-03  Michael Meissner  

gcc/

* REVISION: New file for branch.
* ChangeLog.meissner: New file.

gcc/c-family/

* ChangeLog.meissner: New file.

gcc/c/

* ChangeLog.meissner: New file.

gcc/cp/

* ChangeLog.meissner: New file.

gcc/fortran/

* ChangeLog.meissner: New file.

gcc/testsuite/

* ChangeLog.meissner: New file.

libgcc/

* ChangeLog.meissner: New file.

Diff:
---
 gcc/ChangeLog.meissner   | 6 ++
 gcc/REVISION | 1 +
 gcc/c-family/ChangeLog.meissner  | 6 ++
 gcc/c/ChangeLog.meissner | 6 ++
 gcc/cp/ChangeLog.meissner| 6 ++
 gcc/fortran/ChangeLog.meissner   | 6 ++
 gcc/testsuite/ChangeLog.meissner | 6 ++
 libgcc/ChangeLog.meissner| 6 ++
 libstdc++-v3/ChangeLog.meissner  | 6 ++
 9 files changed, 49 insertions(+)

diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
new file mode 100644
index ..581879743f29
--- /dev/null
+++ b/gcc/ChangeLog.meissner
@@ -0,0 +1,6 @@
+ Branch work177, baseline 
+
+2024-09-03   Michael Meissner  
+
+   Clone branch
+
diff --git a/gcc/REVISION b/gcc/REVISION
new file mode 100644
index ..d331a5b4da39
--- /dev/null
+++ b/gcc/REVISION
@@ -0,0 +1 @@
+work177 branch
diff --git a/gcc/c-family/ChangeLog.meissner b/gcc/c-family/ChangeLog.meissner
new file mode 100644
index ..581879743f29
--- /dev/null
+++ b/gcc/c-family/ChangeLog.meissner
@@ -0,0 +1,6 @@
+ Branch work177, baseline 
+
+2024-09-03   Michael Meissner  
+
+   Clone branch
+
diff --git a/gcc/c/ChangeLog.meissner b/gcc/c/ChangeLog.meissner
new file mode 100644
index ..581879743f29
--- /dev/null
+++ b/gcc/c/ChangeLog.meissner
@@ -0,0 +1,6 @@
+ Branch work177, baseline 
+
+2024-09-03   Michael Meissner  
+
+   Clone branch
+
diff --git a/gcc/cp/ChangeLog.meissner b/gcc/cp/ChangeLog.meissner
new file mode 100644
index ..581879743f29
--- /dev/null
+++ b/gcc/cp/ChangeLog.meissner
@@ -0,0 +1,6 @@
+ Branch work177, baseline 
+
+2024-09-03   Michael Meissner  
+
+   Clone branch
+
diff --git a/gcc/fortran/ChangeLog.meissner b/gcc/fortran/ChangeLog.meissner
new file mode 100644
index ..581879743f29
--- /dev/null
+++ b/gcc/fortran/ChangeLog.meissner
@@ -0,0 +1,6 @@
+ Branch work177, baseline 
+
+2024-09-03   Michael Meissner  
+
+   Clone branch
+
diff --git a/gcc/testsuite/ChangeLog.meissner b/gcc/testsuite/ChangeLog.meissner
new file mode 100644
index ..581879743f29
--- /dev/null
+++ b/gcc/testsuite/ChangeLog.meissner
@@ -0,0 +1,6 @@
+ Branch work177, baseline 
+
+2024-09-03   Michael Meissner  
+
+   Clone branch
+
diff --git a/libgcc/ChangeLog.meissner b/libgcc/ChangeLog.meissner
new file mode 100644
index ..581879743f29
--- /dev/null
+++ b/libgcc/ChangeLog.meissner
@@ -0,0 +1,6 @@
+ Branch work177, baseline 
+
+2024-09-03   Michael Meissner  
+
+   Clone branch
+
diff --git a/libstdc++-v3/ChangeLog.meissner b/libstdc++-v3/ChangeLog.meissner
new file mode 100644
index ..581879743f29
--- /dev/null
+++ b/libstdc++-v3/ChangeLog.meissner
@@ -0,0 +1,6 @@
+ Branch work177, baseline 
+
+2024-09-03   Michael Meissner  
+
+   Clone branch
+


[gcc] Created branch 'meissner/heads/work177-dmf' in namespace 'refs/users'

2024-09-03 Thread Michael Meissner via Gcc-cvs
The branch 'meissner/heads/work177-dmf' was created in namespace 'refs/users' 
pointing to:

 53168176652d... Add ChangeLog.meissner and REVISION.


[gcc(refs/users/meissner/heads/work177-dmf)] Add ChangeLog.dmf and update REVISION.

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:b0d36040031e957f032d34c560c9050d0362c9a3

commit b0d36040031e957f032d34c560c9050d0362c9a3
Author: Michael Meissner 
Date:   Tue Sep 3 19:40:35 2024 -0400

Add ChangeLog.dmf and update REVISION.

2024-09-03  Michael Meissner  

gcc/

* ChangeLog.dmf: New file for branch.
* REVISION: Update.

Diff:
---
 gcc/ChangeLog.dmf | 6 ++
 gcc/REVISION  | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.dmf b/gcc/ChangeLog.dmf
new file mode 100644
index ..5686facb6869
--- /dev/null
+++ b/gcc/ChangeLog.dmf
@@ -0,0 +1,6 @@
+ Branch work177-dmf, baseline 
+
+2024-09-03   Michael Meissner  
+
+   Clone branch
+
diff --git a/gcc/REVISION b/gcc/REVISION
index d331a5b4da39..7eeb812a9a1b 100644
--- a/gcc/REVISION
+++ b/gcc/REVISION
@@ -1 +1 @@
-work177 branch
+work177-dmf branch


[gcc] Created branch 'meissner/heads/work177-vpair' in namespace 'refs/users'

2024-09-03 Thread Michael Meissner via Gcc-cvs
The branch 'meissner/heads/work177-vpair' was created in namespace 'refs/users' 
pointing to:

 53168176652d... Add ChangeLog.meissner and REVISION.


[gcc(refs/users/meissner/heads/work177-vpair)] Add ChangeLog.vpair and update REVISION.

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:a1297ab8acc18f2c4a5da6eb0ea59616e4a31532

commit a1297ab8acc18f2c4a5da6eb0ea59616e4a31532
Author: Michael Meissner 
Date:   Tue Sep 3 19:41:27 2024 -0400

Add ChangeLog.vpair and update REVISION.

2024-09-03  Michael Meissner  

gcc/

* ChangeLog.vpair: New file for branch.
* REVISION: Update.

Diff:
---
 gcc/ChangeLog.vpair | 6 ++
 gcc/REVISION| 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.vpair b/gcc/ChangeLog.vpair
new file mode 100644
index ..958a2af2e811
--- /dev/null
+++ b/gcc/ChangeLog.vpair
@@ -0,0 +1,6 @@
+ Branch work177-vpair, baseline 
+
+2024-09-03   Michael Meissner  
+
+   Clone branch
+
diff --git a/gcc/REVISION b/gcc/REVISION
index d331a5b4da39..c0f58f45e4fb 100644
--- a/gcc/REVISION
+++ b/gcc/REVISION
@@ -1 +1 @@
-work177 branch
+work177-vpair branch


[gcc] Created branch 'meissner/heads/work177-tar' in namespace 'refs/users'

2024-09-03 Thread Michael Meissner via Gcc-cvs
The branch 'meissner/heads/work177-tar' was created in namespace 'refs/users' 
pointing to:

 53168176652d... Add ChangeLog.meissner and REVISION.


[gcc(refs/users/meissner/heads/work177-tar)] Add ChangeLog.tar and update REVISION.

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:e3af2c7a011e462a0934e9f022c96227629d3cf1

commit e3af2c7a011e462a0934e9f022c96227629d3cf1
Author: Michael Meissner 
Date:   Tue Sep 3 19:42:20 2024 -0400

Add ChangeLog.tar and update REVISION.

2024-09-03  Michael Meissner  

gcc/

* ChangeLog.tar: New file for branch.
* REVISION: Update.

Diff:
---
 gcc/ChangeLog.tar | 6 ++
 gcc/REVISION  | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.tar b/gcc/ChangeLog.tar
new file mode 100644
index ..6fe8a38bffcd
--- /dev/null
+++ b/gcc/ChangeLog.tar
@@ -0,0 +1,6 @@
+ Branch work177-tar, baseline 
+
+2024-09-03   Michael Meissner  
+
+   Clone branch
+
diff --git a/gcc/REVISION b/gcc/REVISION
index d331a5b4da39..2207f56818d6 100644
--- a/gcc/REVISION
+++ b/gcc/REVISION
@@ -1 +1 @@
-work177 branch
+work177-tar branch


[gcc] Created branch 'meissner/heads/work177-bugs' in namespace 'refs/users'

2024-09-03 Thread Michael Meissner via Gcc-cvs
The branch 'meissner/heads/work177-bugs' was created in namespace 'refs/users' 
pointing to:

 53168176652d... Add ChangeLog.meissner and REVISION.


[gcc(refs/users/meissner/heads/work177-bugs)] Add ChangeLog.bugs and update REVISION.

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:e9eff3979fbb53d0959cac044f9918f4840a3a04

commit e9eff3979fbb53d0959cac044f9918f4840a3a04
Author: Michael Meissner 
Date:   Tue Sep 3 19:43:16 2024 -0400

Add ChangeLog.bugs and update REVISION.

2024-09-03  Michael Meissner  

gcc/

* ChangeLog.bugs: New file for branch.
* REVISION: Update.

Diff:
---
 gcc/ChangeLog.bugs | 6 ++
 gcc/REVISION   | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.bugs b/gcc/ChangeLog.bugs
new file mode 100644
index ..d650b6ef609d
--- /dev/null
+++ b/gcc/ChangeLog.bugs
@@ -0,0 +1,6 @@
+ Branch work177-bugs, baseline 
+
+2024-09-03   Michael Meissner  
+
+   Clone branch
+
diff --git a/gcc/REVISION b/gcc/REVISION
index d331a5b4da39..52a7be4c2653 100644
--- a/gcc/REVISION
+++ b/gcc/REVISION
@@ -1 +1 @@
-work177 branch
+work177-bugs branch


[gcc] Created branch 'meissner/heads/work177-libs' in namespace 'refs/users'

2024-09-03 Thread Michael Meissner via Gcc-cvs
The branch 'meissner/heads/work177-libs' was created in namespace 'refs/users' 
pointing to:

 53168176652d... Add ChangeLog.meissner and REVISION.


[gcc(refs/users/meissner/heads/work177-libs)] Add ChangeLog.libs and update REVISION.

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:752eeaaf40595de40bdd75cf16ba325c0403

commit 752eeaaf40595de40bdd75cf16ba325c0403
Author: Michael Meissner 
Date:   Tue Sep 3 19:44:40 2024 -0400

Add ChangeLog.libs and update REVISION.

2024-09-03  Michael Meissner  

gcc/

* ChangeLog.libs: New file for branch.
* REVISION: Update.

Diff:
---
 gcc/ChangeLog.libs | 6 ++
 gcc/REVISION   | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.libs b/gcc/ChangeLog.libs
new file mode 100644
index ..03ebc5600b23
--- /dev/null
+++ b/gcc/ChangeLog.libs
@@ -0,0 +1,6 @@
+ Branch work177-libs, baseline 
+
+2024-09-03   Michael Meissner  
+
+   Clone branch
+
diff --git a/gcc/REVISION b/gcc/REVISION
index d331a5b4da39..0ca4dbac01fa 100644
--- a/gcc/REVISION
+++ b/gcc/REVISION
@@ -1 +1 @@
-work177 branch
+work177-libs branch


[gcc] Created branch 'meissner/heads/work177-test' in namespace 'refs/users'

2024-09-03 Thread Michael Meissner via Gcc-cvs
The branch 'meissner/heads/work177-test' was created in namespace 'refs/users' 
pointing to:

 53168176652d... Add ChangeLog.meissner and REVISION.


[gcc(refs/users/meissner/heads/work177-test)] Add ChangeLog.test and update REVISION.

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:9977091c89facf09ce829bfd41a3437c28e328cb

commit 9977091c89facf09ce829bfd41a3437c28e328cb
Author: Michael Meissner 
Date:   Tue Sep 3 19:45:32 2024 -0400

Add ChangeLog.test and update REVISION.

2024-09-03  Michael Meissner  

gcc/

* ChangeLog.test: New file for branch.
* REVISION: Update.

Diff:
---
 gcc/ChangeLog.test | 6 ++
 gcc/REVISION   | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.test b/gcc/ChangeLog.test
new file mode 100644
index ..7d4190b2ed42
--- /dev/null
+++ b/gcc/ChangeLog.test
@@ -0,0 +1,6 @@
+ Branch work177-test, baseline 
+
+2024-09-03   Michael Meissner  
+
+   Clone branch
+
diff --git a/gcc/REVISION b/gcc/REVISION
index d331a5b4da39..0ccbb3643953 100644
--- a/gcc/REVISION
+++ b/gcc/REVISION
@@ -1 +1 @@
-work177 branch
+work177-test branch


[gcc] Created branch 'meissner/heads/work177-orig' in namespace 'refs/users'

2024-09-03 Thread Michael Meissner via Gcc-cvs
The branch 'meissner/heads/work177-orig' was created in namespace 'refs/users' 
pointing to:

 b2b20b277988... split-path: Improve ifcvt heurstic for split path [PR112402


[gcc(refs/users/meissner/heads/work177-orig)] Add REVISION.

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:f43015cda20994222e0660dace5f4196c0abd3ab

commit f43015cda20994222e0660dace5f4196c0abd3ab
Author: Michael Meissner 
Date:   Tue Sep 3 19:46:34 2024 -0400

Add REVISION.

2024-09-03  Michael Meissner  

gcc/

* REVISION: New file for branch.

Diff:
---
 gcc/REVISION | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/REVISION b/gcc/REVISION
new file mode 100644
index ..f49b5de8f059
--- /dev/null
+++ b/gcc/REVISION
@@ -0,0 +1 @@
+work177-orig branch


[gcc r15-3437] aarch64: Fix testcase vec-init-22-speed.c [PR116589]

2024-09-03 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:d8bc31d973d2ab3fabb5e85e7c4354ffb2283512

commit r15-3437-gd8bc31d973d2ab3fabb5e85e7c4354ffb2283512
Author: Andrew Pinski 
Date:   Tue Sep 3 17:10:37 2024 -0700

aarch64: Fix testcase vec-init-22-speed.c [PR116589]

For this testcase, the trunk produces:
```
f_s16:
        fmov    s31, w0
        fmov    s0, w1
```

While the testcase was expecting what was produced in GCC 14:
```
f_s16:
        sxth    w0, w0
        sxth    w1, w1
        fmov    d31, x0
        fmov    d0, x1
```

After r15-1575-gea8061f46a30 the code was:
```
dup v31.4h, w0
dup v0.4h, w1
```
But since ext-dce was added in r15-1901-g98914f9eba5f19, we now get better
code generation that uses only fmov's.

Pushed as obvious after running the testcase.

PR target/116589

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vec-init-22-speed.c: Update scan for better code gen.

Signed-off-by: Andrew Pinski 

Diff:
---
 gcc/testsuite/gcc.target/aarch64/vec-init-22-speed.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-22-speed.c b/gcc/testsuite/gcc.target/aarch64/vec-init-22-speed.c
index 993ef8c41613..6edc82831a00 100644
--- a/gcc/testsuite/gcc.target/aarch64/vec-init-22-speed.c
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-22-speed.c
@@ -7,6 +7,6 @@
 
 #include "vec-init-22.h"
 
-/* { dg-final { scan-assembler-times {\tfmov\td[0-9]+, x[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmov\ts[0-9]+, w[0-9]+} 2 } } */
 /* { dg-final { scan-assembler-times {\tins\tv[0-9]+\.h\[[1-3]\], w[0-9]+} 6 } } */
 /* { dg-final { scan-assembler {\tzip1\tv[0-9]+\.8h, v[0-9]+\.8h, v[0-9]+\.8h} } } */


[gcc r15-3438] RISC-V: Allow IMM operand for unsigned scalar .SAT_ADD

2024-09-03 Thread Pan Li via Gcc-cvs
https://gcc.gnu.org/g:9ea9d05908432fc5f3632f3e397e3709f95ef636

commit r15-3438-g9ea9d05908432fc5f3632f3e397e3709f95ef636
Author: Pan Li 
Date:   Mon Sep 2 15:54:43 2024 +0800

RISC-V: Allow IMM operand for unsigned scalar .SAT_ADD

This patch allows an immediate (IMM) operand for the unsigned
scalar .SAT_ADD.  Like operand 0, operand 1 of .SAT_ADD
is zero extended to Xmode before the underlying code generation.

The following test suite passed for this patch:
* The rv64gcv full regression test.
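
For illustration, the branchless form of an unsigned saturating add can be sketched in C++ as below. This is a minimal sketch, not the code emitted by the patch: the name sat_u_add, the use of uint32_t for the narrow mode, and uint64_t as a stand-in for Xmode are all assumptions made for the example. It shows why both operands need clean (zero extended) upper bits before the wide add, since the overflow test compares wide values.

```cpp
#include <cstdint>

// Hypothetical helper (not from the patch): unsigned saturating add on a
// 32-bit mode, computed in a 64-bit "Xmode" stand-in.
constexpr std::uint32_t sat_u_add(std::uint32_t x, std::uint32_t y) {
  // Zero extend both operands to the wide mode.
  std::uint64_t wide_x = x;
  std::uint64_t wide_y = y;
  // Add in the wide mode, then truncate the sum back to 32 bits.
  std::uint64_t sum = (wide_x + wide_y) & 0xffffffffu;
  // The truncated sum is smaller than x exactly when the add wrapped.
  std::uint64_t lt = sum < wide_x ? 1 : 0;
  // 0 - lt is all ones on overflow and zero otherwise, so OR-ing it in
  // saturates the result without a branch.
  return static_cast<std::uint32_t>(sum | (std::uint64_t{0} - lt));
}
```

If operand y carried stale upper bits instead of zeros, the wide add and the comparison would both see a different value than the 32-bit operand the source program used.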

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_usadd): Zero extend
the second operand of usadd as the first operand does.
* config/riscv/riscv.md (usadd3): Allow imm operand for
scalar usadd pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_u_add-11.c: Make asm check robust.
* gcc.target/riscv/sat_u_add-15.c: Ditto.
* gcc.target/riscv/sat_u_add-19.c: Ditto.
* gcc.target/riscv/sat_u_add-23.c: Ditto.
* gcc.target/riscv/sat_u_add-3.c: Ditto.
* gcc.target/riscv/sat_u_add-7.c: Ditto.

Signed-off-by: Pan Li 

Diff:
---
 gcc/config/riscv/riscv.cc | 2 +-
 gcc/config/riscv/riscv.md | 4 ++--
 gcc/testsuite/gcc.target/riscv/sat_u_add-11.c | 2 +-
 gcc/testsuite/gcc.target/riscv/sat_u_add-15.c | 2 +-
 gcc/testsuite/gcc.target/riscv/sat_u_add-19.c | 2 +-
 gcc/testsuite/gcc.target/riscv/sat_u_add-23.c | 2 +-
 gcc/testsuite/gcc.target/riscv/sat_u_add-3.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/sat_u_add-7.c  | 2 +-
 8 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 98720611e246..f82e64a6fec8 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -11970,7 +11970,7 @@ riscv_expand_usadd (rtx dest, rtx x, rtx y)
   rtx xmode_sum = gen_reg_rtx (Xmode);
   rtx xmode_lt = gen_reg_rtx (Xmode);
   rtx xmode_x = riscv_gen_zero_extend_rtx (x, mode);
-  rtx xmode_y = gen_lowpart (Xmode, y);
+  rtx xmode_y = riscv_gen_zero_extend_rtx (y, mode);
   rtx xmode_dest = gen_reg_rtx (Xmode);
 
   /* Step-1: sum = x + y  */
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 6f7efafb8abe..9f94b5aa0232 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -4360,8 +4360,8 @@
 
 (define_expand "usadd3"
   [(match_operand:ANYI 0 "register_operand")
-   (match_operand:ANYI 1 "register_operand")
-   (match_operand:ANYI 2 "register_operand")]
+   (match_operand:ANYI 1 "reg_or_int_operand")
+   (match_operand:ANYI 2 "reg_or_int_operand")]
   ""
   {
 riscv_expand_usadd (operands[0], operands[1], operands[2]);
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-11.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-11.c
index e248aeafa8ef..bd830ececad4 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_u_add-11.c
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-11.c
@@ -8,7 +8,7 @@
 ** sat_u_add_uint32_t_fmt_3:
 ** slli\s+[atx][0-9]+,\s*a0,\s*32
 ** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*32
-** add\s+[atx][0-9]+,\s*a0,\s*a1
+** add\s+[atx][0-9]+,\s*a[01],\s*a[01]
 ** slli\s+[atx][0-9]+,\s*[atx][0-9],\s*32
 ** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*32
 ** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-15.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-15.c
index bb8b991a84ee..de615a6225e9 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_u_add-15.c
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-15.c
@@ -8,7 +8,7 @@
 ** sat_u_add_uint32_t_fmt_4:
 ** slli\s+[atx][0-9]+,\s*a0,\s*32
 ** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*32
-** add\s+[atx][0-9]+,\s*a0,\s*a1
+** add\s+[atx][0-9]+,\s*a[01],\s*a[01]
 ** slli\s+[atx][0-9]+,\s*[atx][0-9],\s*32
 ** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*32
 ** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-19.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-19.c
index 7e4ae12f2f51..2b793e2f8fdb 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_u_add-19.c
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-19.c
@@ -8,7 +8,7 @@
 ** sat_u_add_uint32_t_fmt_5:
 ** slli\s+[atx][0-9]+,\s*a0,\s*32
 ** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*32
-** add\s+[atx][0-9]+,\s*a0,\s*a1
+** add\s+[atx][0-9]+,\s*a[01],\s*a[01]
 ** slli\s+[atx][0-9]+,\s*[atx][0-9],\s*32
 ** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*32
 ** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-23.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-23.c
index 49bbb74a401e..5de086e11384 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_u_add-23.c
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-23.c
@@ -8,7 +8,7 @@
 ** sat_u_add_uint32_t_fmt_6:
 ** slli\s+[atx][0-9]+,\s*a0,\s*32
 ** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*32
-** add\s+[atx][0-9]+,\s*a0,\s*a1
+** add\s+[atx][0-9]+,\s*

[gcc(refs/users/meissner/heads/work177)] Add rs6000 architecture masks.

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:505f1a5cb7932ff0d22d0316481be278a986f257

commit 505f1a5cb7932ff0d22d0316481be278a986f257
Author: Michael Meissner 
Date:   Tue Sep 3 19:53:25 2024 -0400

Add rs6000 architecture masks.

This patch begins the journey to move architecture bits that are not user ISA
options from rs6000_isa_flags to a new target variable rs6000_arch_flags.  The
intention is to remove switches that are currently ISA options but that the
user should not be using directly.  For example, we want users to use
-mcpu=power10 and not just -mpower10.

This patch also changes the target_clones support to use an architecture mask
instead of isa bits.

This patch also switches the handling of .machine to use architecture masks if
they exist (power4 through power11).  All of the other PowerPCs will continue to
use the existing code for setting the .machine option.
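
As a rough illustration of how a .def-style list can be expanded into architecture masks, here is a minimal X-macro sketch in C++. It is not the contents of rs6000-arch.def or rs6000.h: the list macro ARCH_LIST, the four entries, and the names ARCH_ENUM_*, ARCH_MASK_* and POWER6_ARCH_FLAGS are assumptions chosen for the example, and an in-file macro stands in for including a separate .def file.

```cpp
#include <cstdint>

// Hypothetical stand-in for a .def file: one entry per architecture level.
#define ARCH_LIST(X) \
  X(POWER4)          \
  X(POWER5)          \
  X(POWER6)          \
  X(POWER7)

// First expansion: give each architecture a consecutive bit number.
enum arch_bit {
#define ARCH_ENUM(NAME) ARCH_ENUM_##NAME,
  ARCH_LIST(ARCH_ENUM)
#undef ARCH_ENUM
  ARCH_ENUM_LAST
};

// Second expansion: turn each bit number into a single-bit mask.
#define ARCH_MASK(NAME) \
  constexpr std::uint64_t ARCH_MASK_##NAME = std::uint64_t{1} << ARCH_ENUM_##NAME;
ARCH_LIST(ARCH_MASK)
#undef ARCH_MASK

// A CPU's architecture flags are the union of every level it implements
// (illustrative value only).
constexpr std::uint64_t POWER6_ARCH_FLAGS =
    ARCH_MASK_POWER4 | ARCH_MASK_POWER5 | ARCH_MASK_POWER6;
```

Keeping the list in one place means adding a new architecture level only touches the list, and every table derived from it picks the entry up automatically.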

I have built both big endian and little endian bootstrap compilers and there
were no regressions.

In addition, I constructed a test case that used every architecture define (like
_ARCH_PWR4, etc.) and I also looked at the .machine directive generated.  I ran
this test for all supported combinations of -mcpu, big/little endian, and 32/64
bit support.  Every single instance generated exactly the same code with the
patches installed compared to the compiler before installing the patches.

Can I install this patch on the GCC 15 trunk?

2024-09-03  Michael Meissner  

gcc/

* config/rs6000/rs6000-arch.def: New file.
* config/rs6000/rs6000.cc (struct clone_map): Switch to using
architecture masks instead of ISA masks.
(rs6000_clone_map): Likewise.
(rs6000_print_isa_options): Add an architecture flags argument, change
all callers.
(get_arch_flag): New function.
(rs6000_debug_reg_global): Update rs6000_print_isa_options calls.
(rs6000_option_override_internal): Likewise.
(rs6000_machine_from_flags): Switch to using architecture masks instead
of ISA masks.
(struct rs6000_arch_mask): New structure.
(rs6000_arch_masks): New table of architecture masks and names.
(rs6000_function_specific_save): Save architecture flags.
(rs6000_function_specific_restore): Restore architecture flags.
(rs6000_function_specific_print): Update rs6000_print_isa_options calls.
(rs6000_print_options_internal): Add architecture flags options.
(rs6000_clone_priority): Switch to using architecture masks instead of
ISA masks.
(rs6000_can_inline_p): Don't allow inlining if the callee requires a newer
architecture than the caller.
* config/rs6000/rs6000.h: Use rs6000-arch.def to create the architecture
masks.
* config/rs6000/rs6000.opt (rs6000_arch_flags): New target variable.
(x_rs6000_arch_flags): New save/restore field for rs6000_arch_flags.
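
The inlining rule in the rs6000_can_inline_p entry above can be read as a subset test on architecture bits. The following C++ fragment is only a sketch of that idea under assumed names (arch_allows_inline and the two example masks are hypothetical); the real function also considers other state that is not shown in this message.

```cpp
#include <cstdint>

// Hypothetical architecture bits for the example.
constexpr std::uint64_t ARCH_P8 = std::uint64_t{1} << 0;
constexpr std::uint64_t ARCH_P9 = std::uint64_t{1} << 1;

// True when every architecture bit the callee requires is also available
// in the caller, i.e. the callee does not need a newer architecture.
constexpr bool arch_allows_inline(std::uint64_t caller_arch,
                                  std::uint64_t callee_arch) {
  return (caller_arch & callee_arch) == callee_arch;
}
```

With these masks, a power9 caller may inline a power8 callee, but a power8 caller may not inline a callee that requires power9.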

Diff:
---
 gcc/config/rs6000/rs6000-arch.def |  48 +
 gcc/config/rs6000/rs6000.cc   | 215 +++---
 gcc/config/rs6000/rs6000.h|  24 +
 gcc/config/rs6000/rs6000.opt  |   8 ++
 4 files changed, 259 insertions(+), 36 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-arch.def b/gcc/config/rs6000/rs6000-arch.def
new file mode 100644
index ..e5b6e9581331
--- /dev/null
+++ b/gcc/config/rs6000/rs6000-arch.def
@@ -0,0 +1,48 @@
+/* IBM RS/6000 CPU architecture features by processor type.
+   Copyright (C) 1991-2024 Free Software Foundation, Inc.
+   Contributed by Richard Kenner (ken...@vlsi1.ultra.nyu.edu)
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file defines architecture features that are based on the -mcpu=
+   option, and not on user options that can be turned on or off.  The intention
+   is for newer processors (power7 and above) to not add new ISA bits for the
+   particular processor, but add these bits.  Otherwise we have to add a bunch
+   of hidden options, just so we have the proper ISA bits.
+
+   For example, in the past we added -mpower8-internal, so that on power8,
+   power9, and power10 would inherit the option

[gcc(refs/users/meissner/heads/work177)] Use architecture flags for defining _ARCH_PWR macros.

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:559e865070211ec4a193a5565f1b23edca37f438

commit 559e865070211ec4a193a5565f1b23edca37f438
Author: Michael Meissner 
Date:   Tue Sep 3 19:53:55 2024 -0400

Use architecture flags for defining _ARCH_PWR macros.

For the newer architectures, this patch changes GCC to define the _ARCH_PWR
macros using the new architecture flags instead of relying on isa options like
-mpower10.

The -mpower8-internal, -mpower10, and -mpower11 options were removed.  The
-mpower11 option was removed completely, since it was just added in GCC 15.  The
other two options were marked as WarnRemoved, and the various ISA bits were
removed.

TARGET_POWER8 and TARGET_POWER10 were re-defined to use the architecture bits
instead of the ISA bits.

There are other internal isa bits that aren't removed with this patch because
the built-in function support uses those bits.
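
A table-driven way to derive predefined macro names from architecture bits might look like the C++ sketch below. The mask values, the arch_macros table, and the helper nth_defined_macro are assumptions for illustration and are not taken from rs6000-c.cc; only the macro name pattern _ARCH_PWR<n> comes from the description above.

```cpp
#include <cstdint>

// Hypothetical architecture bits.
constexpr std::uint64_t ARCH_MASK_POWER8 = std::uint64_t{1} << 0;
constexpr std::uint64_t ARCH_MASK_POWER9 = std::uint64_t{1} << 1;
constexpr std::uint64_t ARCH_MASK_POWER10 = std::uint64_t{1} << 2;

// One row per macro: the bit that enables it and the macro's name.
struct arch_macro {
  std::uint64_t mask;
  const char *name;
};

constexpr arch_macro arch_macros[] = {
    {ARCH_MASK_POWER8, "_ARCH_PWR8"},
    {ARCH_MASK_POWER9, "_ARCH_PWR9"},
    {ARCH_MASK_POWER10, "_ARCH_PWR10"},
};

// Return the name of the N-th macro (counting from 0) that ARCH_FLAGS would
// define, or nullptr if fewer than N+1 macros are enabled.
constexpr const char *nth_defined_macro(std::uint64_t arch_flags, int n) {
  for (const arch_macro &m : arch_macros)
    if ((arch_flags & m.mask) != 0 && n-- == 0)
      return m.name;
  return nullptr;
}
```

The design point is that the macro set depends only on the architecture bits selected through -mcpu=, not on individual ISA switches.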

I have built both big endian and little endian bootstrap compilers and there
were no regressions.

In addition, I constructed a test case that used every architecture define (like
_ARCH_PWR4, etc.) and I also looked at the .machine directive generated.  I ran
this test for all supported combinations of -mcpu, big/little endian, and 32/64
bit support.  Every single instance generated exactly the same code with the
patches installed compared to the compiler before installing the patches.

Can I install this patch on the GCC 15 trunk?

2024-09-03  Michael Meissner  

gcc/

* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Add support to
use architecture flags instead of ISA flags for setting most of the
_ARCH_PWR* macros.
(rs6000_cpu_cpp_builtins): Update rs6000_target_modify_macros call.
* config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER): Remove
OPTION_MASK_POWER8.
(ISA_3_1_MASKS_SERVER): Remove OPTION_MASK_POWER10.
(POWER11_MASKS_SERVER): Remove OPTION_MASK_POWER11.
(POWERPC_MASKS): Remove OPTION_MASK_POWER8, OPTION_MASK_POWER10, and
OPTION_MASK_POWER11.
* config/rs6000/rs6000-protos.h (rs6000_target_modify_macros): Update
declaration.
(rs6000_target_modify_macros_ptr): Likewise.
* config/rs6000/rs6000.cc (rs6000_target_modify_macros_ptr): Likewise.
(rs6000_option_override_internal): Use architecture flags instead of ISA
flags.
(rs6000_opt_masks): Remove -mpower10 and -mpower11, which are no longer
in the ISA flags.
(rs6000_pragma_target_parse): Use architecture flags as well as ISA
flags.
* config/rs6000/rs6000.h (TARGET_POWER4): New macro.
(TARGET_POWER5): Likewise.
(TARGET_POWER5X): Likewise.
(TARGET_POWER6): Likewise.
(TARGET_POWER7): Likewise.
(TARGET_POWER8): Likewise.
(TARGET_POWER9): Likewise.
(TARGET_POWER10): Likewise.
(TARGET_POWER11): Likewise.
* config/rs6000/rs6000.opt (-mpower8-internal): Remove ISA flag bits.
(-mpower10): Likewise.
(-mpower11): Likewise.

Diff:
---
 gcc/config/rs6000/rs6000-c.cc | 27 +++
 gcc/config/rs6000/rs6000-cpus.def |  8 +---
 gcc/config/rs6000/rs6000-protos.h |  5 +++--
 gcc/config/rs6000/rs6000.cc   | 19 +++
 gcc/config/rs6000/rs6000.h| 20 
 gcc/config/rs6000/rs6000.opt  | 11 ++-
 6 files changed, 52 insertions(+), 38 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 04882c396bfe..c8f33289fa38 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -338,7 +338,8 @@ rs6000_define_or_undefine_macro (bool define_p, const char *name)
#pragma GCC target, we need to adjust the macros dynamically.  */
 
 void
-rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
+rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
+HOST_WIDE_INT arch_flags)
 {
   if (TARGET_DEBUG_BUILTIN || TARGET_DEBUG_TARGET)
 fprintf (stderr,
@@ -411,7 +412,7 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
summary of the flags associated with particular cpu
definitions.  */
 
-  /* rs6000_isa_flags based options.  */
+  /* rs6000_isa_flags and rs6000_arch_flags based options.  */
   rs6000_define_or_undefine_macro (define_p, "_ARCH_PPC");
   if ((flags & OPTION_MASK_PPC_GPOPT) != 0)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PPCSQ");
@@ -419,23 +420,25 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PPCGR");
   if ((flags & OPTION_MASK_POWERPC64) != 0)
 rs6000_define_or_undefine

[gcc(refs/users/meissner/heads/work177)] Do not allow -mvsx to boost processor to power7.

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:75c6f74df51de2751c3da377b15da0d1c821d244

commit 75c6f74df51de2751c3da377b15da0d1c821d244
Author: Michael Meissner 
Date:   Tue Sep 3 19:55:32 2024 -0400

Do not allow -mvsx to boost processor to power7.

This patch restructures the code so that -mvsx, for example, will not silently
convert the processor to power7.  The user must now use -mcpu=power7 or higher.
This means that if the user passes -mvsx and the default processor does not
have VSX support, it will be an error.
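
To make the behavior change concrete, the C++ sketch below contrasts the two policies on a toy set of option bits. Everything here is hypothetical: the OPT_* bits, POWER7_OPTIONS, the override_result type, and both function names are invented for the example, and the real logic in rs6000.cc handles many more cases.

```cpp
#include <cstdint>

// Hypothetical option bits.
constexpr std::uint64_t OPT_ALTIVEC = std::uint64_t{1} << 0;
constexpr std::uint64_t OPT_POPCNTD = std::uint64_t{1} << 1;
constexpr std::uint64_t OPT_VSX = std::uint64_t{1} << 2;

// Stand-in for the set of options a power7-class CPU implies.
constexpr std::uint64_t POWER7_OPTIONS = OPT_ALTIVEC | OPT_POPCNTD | OPT_VSX;

struct override_result {
  std::uint64_t flags;  // resulting option bits
  bool error;           // true if a mismatch was reported
};

// Earlier policy: an explicit -mvsx silently pulls in the whole power7 set.
constexpr override_result boost_on_vsx(std::uint64_t cpu_flags,
                                       std::uint64_t user_flags) {
  std::uint64_t flags = cpu_flags | user_flags;
  if ((flags & OPT_VSX) != 0)
    flags |= POWER7_OPTIONS;
  return {flags, false};
}

// Policy described above: flag any explicit option the CPU cannot provide.
constexpr override_result report_mismatch(std::uint64_t cpu_flags,
                                          std::uint64_t user_flags) {
  bool error = (user_flags & ~cpu_flags) != 0;
  return {cpu_flags | user_flags, error};
}
```

Under the second policy the selected CPU stays the single source of truth, and an option that exceeds it is diagnosed instead of quietly changing the target.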

I have built both big endian and little endian bootstrap compilers and there
were no regressions.

In addition, I constructed a test case that used every architecture define (like
_ARCH_PWR4, etc.) and I also looked at the .machine directive generated.  I ran
this test for all supported combinations of -mcpu, big/little endian, and 32/64
bit support.  Every single instance generated exactly the same code with the
patches installed compared to the compiler before installing the patches.

Can I install this patch on the GCC 15 trunk?

2024-09-03  Michael Meissner  

gcc/

* config/rs6000/rs6000.cc (report_architecture_mismatch): New function.
Report an error if the user used an option such as -mvsx when the
default processor would not allow the option.
(rs6000_option_override_internal): Move some ISA checking code into
report_architecture_mismatch.

Diff:
---
 gcc/config/rs6000/rs6000.cc | 129 +++-
 1 file changed, 79 insertions(+), 50 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 0d3e1c731db2..c6176c59a1fe 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1173,6 +1173,7 @@ const int INSN_NOT_AVAILABLE = -1;
 static void rs6000_print_isa_options (FILE *, int, const char *,
  HOST_WIDE_INT, HOST_WIDE_INT);
 static HOST_WIDE_INT rs6000_disable_incompatible_switches (void);
+static void report_architecture_mismatch (void);
 
 static enum rs6000_reg_type register_to_reg_type (rtx, bool *);
 static bool rs6000_secondary_reload_move (enum rs6000_reg_type,
@@ -3695,7 +3696,6 @@ rs6000_option_override_internal (bool global_init_p)
   bool ret = true;
 
   HOST_WIDE_INT set_masks;
-  HOST_WIDE_INT ignore_masks;
   int cpu_index = -1;
   int tune_index;
   struct cl_target_option *main_target_opt
@@ -3964,59 +3964,13 @@ rs6000_option_override_internal (bool global_init_p)
 dwarf_offset_size = POINTER_SIZE_UNITS;
 #endif
 
-  /* Handle explicit -mno-{altivec,vsx} and turn off all of
- the options that depend on those flags.  */
-  ignore_masks = rs6000_disable_incompatible_switches ();
-
-  /* For the newer switches (vsx, dfp, etc.) set some of the older options,
- unless the user explicitly used the -mno- to disable the code.  */
-  if (TARGET_P9_VECTOR || TARGET_MODULO || TARGET_P9_MISC)
-rs6000_isa_flags |= (ISA_3_0_MASKS_SERVER & ~ignore_masks);
-  else if (TARGET_P9_MINMAX)
-{
-  if (cpu_index >= 0)
-   {
- if (cpu_index == PROCESSOR_POWER9)
-   {
- /* legacy behavior: allow -mcpu=power9 with certain
-capabilities explicitly disabled.  */
- rs6000_isa_flags |= (ISA_3_0_MASKS_SERVER & ~ignore_masks);
-   }
- else
-   error ("power9 target option is incompatible with %<%s=%> "
-  "for  less than power9", "-mcpu");
-   }
-  else if ((ISA_3_0_MASKS_SERVER & rs6000_isa_flags_explicit)
-  != (ISA_3_0_MASKS_SERVER & rs6000_isa_flags
-  & rs6000_isa_flags_explicit))
-   /* Enforce that none of the ISA_3_0_MASKS_SERVER flags
-  were explicitly cleared.  */
-   error ("%qs incompatible with explicitly disabled options",
-  "-mpower9-minmax");
-  else
-   rs6000_isa_flags |= ISA_3_0_MASKS_SERVER;
-}
-  else if (TARGET_P8_VECTOR || TARGET_POWER8 || TARGET_CRYPTO)
-rs6000_isa_flags |= (ISA_2_7_MASKS_SERVER & ~ignore_masks);
-  else if (TARGET_VSX)
-rs6000_isa_flags |= (ISA_2_6_MASKS_SERVER & ~ignore_masks);
-  else if (TARGET_POPCNTD)
-rs6000_isa_flags |= (ISA_2_6_MASKS_EMBEDDED & ~ignore_masks);
-  else if (TARGET_DFP)
-rs6000_isa_flags |= (ISA_2_5_MASKS_SERVER & ~ignore_masks);
-  else if (TARGET_CMPB)
-rs6000_isa_flags |= (ISA_2_5_MASKS_EMBEDDED & ~ignore_masks);
-  else if (TARGET_FPRND)
-rs6000_isa_flags |= (ISA_2_4_MASKS & ~ignore_masks);
-  else if (TARGET_POPCNTB)
-rs6000_isa_flags |= (ISA_2_2_MASKS & ~ignore_masks);
-  else if (TARGET_ALTIVEC)
-rs6000_isa_flags |= (OPTION_MASK_PPC_GFXOPT & ~ignore_masks);
+  /* Report trying to use things like -mmodulo to imply -mcpu=power9.  */
+  report_architecture_mismatch ();
 
   /* Disable VSX and Altivec silently if the user switched cpus to power7 in a
  target at

[gcc(refs/users/meissner/heads/work177)] Change TARGET_POPCNTB to TARGET_POWER5

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:9e092b36f68d63b236f8693e0d9f445a81ab370b

commit 9e092b36f68d63b236f8693e0d9f445a81ab370b
Author: Michael Meissner 
Date:   Tue Sep 3 19:57:43 2024 -0400

Change TARGET_POPCNTB to TARGET_POWER5

As part of the architecture flags patches, this patch changes the use of
TARGET_POPCNTB to TARGET_POWER5.  The POPCNTB instruction was added in ISA 2.02
(power5).
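
For readers unfamiliar with the instruction, popcntb counts the set bits in each byte of a register and stores each count in the corresponding byte of the result. The C++ sketch below models that per-byte behavior in software and shows one common way to turn it into a full population count. The function names are chosen for the example, and the multiply-and-shift step is an illustration rather than a statement of what rs6000_emit_popcount emits.

```cpp
#include <cstdint>

// Software model of a per-byte population count on a 64-bit value: byte i
// of the result holds the number of set bits in byte i of X.
constexpr std::uint64_t popcntb_model(std::uint64_t x) {
  std::uint64_t result = 0;
  for (int byte = 0; byte < 8; ++byte) {
    std::uint64_t bits = (x >> (8 * byte)) & 0xff;
    std::uint64_t count = 0;
    while (bits != 0) {
      count += bits & 1;
      bits >>= 1;
    }
    result |= count << (8 * byte);
  }
  return result;
}

// Sum the eight per-byte counts: multiplying by 0x0101010101010101
// accumulates all bytes into the top byte, which the shift then extracts.
// Each byte count is at most 8, so the total (at most 64) fits in one byte.
constexpr int popcount_via_popcntb(std::uint64_t x) {
  return static_cast<int>((popcntb_model(x) * 0x0101010101010101ull) >> 56);
}
```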

I have built both big endian and little endian bootstrap compilers and there
were no regressions.

In addition, I constructed a test case that used every architecture define (like
_ARCH_PWR4, etc.) and I also looked at the .machine directive generated.  I ran
this test for all supported combinations of -mcpu, big/little endian, and 32/64
bit support.  Every single instance generated exactly the same code with the
patches installed compared to the compiler before installing the patches.

Can I install this patch on the GCC 15 trunk?

2024-09-03  Michael Meissner  

* config/rs6000/rs6000-builtin.cc (rs6000_builtin_is_supported): Use
TARGET_POWER5 instead of TARGET_POPCNTB.
* config/rs6000/rs6000.h (TARGET_EXTRA_BUILTINS): Use TARGET_POWER5
instead of TARGET_POPCNTB.  Eliminate TARGET_CMPB and TARGET_POPCNTD
tests since TARGET_POWER5 will always be true for those tests.
(TARGET_FRE): Use TARGET_POWER5 instead of TARGET_POPCNTB.
(TARGET_FRSQRTES): Likewise.
* config/rs6000/rs6000.md (enabled attribute): Likewise.
(popcount): Use TARGET_POWER5 instead of TARGET_POPCNTB.  Drop
test for TARGET_POPCNTD (i.e. power7), since TARGET_POPCNTB will always
be set if TARGET_POPCNTD is set.
(popcntb2): Use TARGET_POWER5 instead of TARGET_POPCNTB.
(parity2): Likewise.
(parity2_cmpb): Remove TARGET_POPCNTB test, since it will always
be true when TARGET_CMPB (i.e. power6) is set.

Diff:
---
 gcc/config/rs6000/rs6000-builtin.cc |  2 +-
 gcc/config/rs6000/rs6000.h  |  8 +++-
 gcc/config/rs6000/rs6000.md | 10 +-
 3 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index 9bdbae1ecf94..98a0545030cd 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -155,7 +155,7 @@ rs6000_builtin_is_supported (enum rs6000_gen_builtins 
fncode)
 case ENB_ALWAYS:
   return true;
 case ENB_P5:
-  return TARGET_POPCNTB;
+  return TARGET_POWER5;
 case ENB_P6:
   return TARGET_CMPB;
 case ENB_P6_64:
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 7ad8baca177a..4500724d895c 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -547,9 +547,7 @@ extern int rs6000_vector_align[];
 
 #define TARGET_EXTRA_BUILTINS  (TARGET_POWERPC64\
 || TARGET_PPC_GPOPT /* 970/power4 */\
-|| TARGET_POPCNTB   /* ISA 2.02 */  \
-|| TARGET_CMPB  /* ISA 2.05 */  \
-|| TARGET_POPCNTD   /* ISA 2.06 */  \
+|| TARGET_POWER5/* ISA 2.02 & above */ \
 || TARGET_ALTIVEC   \
 || TARGET_VSX   \
 || TARGET_HARD_FLOAT)
@@ -563,9 +561,9 @@ extern int rs6000_vector_align[];
 #define TARGET_FRES(TARGET_HARD_FLOAT && TARGET_PPC_GFXOPT)
 
 #define TARGET_FRE (TARGET_HARD_FLOAT \
-&& (TARGET_POPCNTB || VECTOR_UNIT_VSX_P (DFmode)))
+&& (TARGET_POWER5 || VECTOR_UNIT_VSX_P (DFmode)))
 
-#define TARGET_FRSQRTES(TARGET_HARD_FLOAT && TARGET_POPCNTB \
+#define TARGET_FRSQRTES(TARGET_HARD_FLOAT && TARGET_POWER5 \
 && TARGET_PPC_GFXOPT)
 
 #define TARGET_FRSQRTE (TARGET_HARD_FLOAT \
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 8eda2f7bb0d7..10d13bf812d2 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -379,7 +379,7 @@
  (const_int 1)
 
  (and (eq_attr "isa" "p5")
- (match_test "TARGET_POPCNTB"))
+ (match_test "TARGET_POWER5"))
  (const_int 1)
 
  (and (eq_attr "isa" "p6")
@@ -2510,7 +2510,7 @@
 (define_expand "popcount2"
   [(set (match_operand:GPR 0 "gpc_reg_operand")
(popcount:GPR (match_operand:GPR 1 "gpc_reg_operand")))]
-  "TARGET_POPCNTB || TARGET_POPCNTD"
+  "TARGET_POWER5"
 {
   rs6000_emit_popcount (operands[0], operands[1]);
   DONE;
@@ -2520,7 +2520,7 @@
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
(unspec:GPR [(match_operand:GPR 1 "gpc_reg_operand" "r")]
UNSP

[gcc(refs/users/meissner/heads/work177)] Change TARGET_FPRND to TARGET_POWER5X

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:c3fd033dca28a0617f0ce7e64f4a9b90c15f133e

commit c3fd033dca28a0617f0ce7e64f4a9b90c15f133e
Author: Michael Meissner 
Date:   Tue Sep 3 19:58:59 2024 -0400

Change TARGET_FPRND to TARGET_POWER5X

As part of the architecture flags patches, this patch changes the use of
TARGET_FPRND to TARGET_POWER5X.  The FPRND instruction was added in power5+.

I have built both big endian and little endian bootstrap compilers and there
were no regressions.

In addition, I constructed a test case that used every architecture define
(like _ARCH_PWR4, etc.) and I also looked at the .machine directive generated.
I ran this test for all supported combinations of -mcpu, big/little endian, and
32/64 bit support.  Every single instance generated exactly the same code with
the patches installed compared to the compiler before installing the patches.

Can I install this patch on the GCC 15 trunk?

2024-09-03  Michael Meissner  

* config/rs6000/rs6000.cc (report_architecture_mismatch): Use
TARGET_POWER5X instead of TARGET_FPRND.
* config/rs6000/rs6000.md (fmod3): Use TARGET_POWER5X instead of
TARGET_FPRND.
(remainder3): Likewise.
(fctiwuz_): Likewise.
(btrunc2): Likewise.
(ceil2): Likewise.
(floor2): Likewise.
(round): Likewise.

Diff:
---
 gcc/config/rs6000/rs6000.cc |  2 +-
 gcc/config/rs6000/rs6000.md | 14 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index c6176c59a1fe..7805f8b3d491 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -25428,7 +25428,7 @@ report_architecture_mismatch (void)
 rs6000_isa_flags |= (ISA_2_5_MASKS_SERVER & ~ignore_masks);
   else if (TARGET_CMPB)
 rs6000_isa_flags |= (ISA_2_5_MASKS_EMBEDDED & ~ignore_masks);
-  else if (TARGET_FPRND)
+  else if (TARGET_POWER5X)
 rs6000_isa_flags |= (ISA_2_4_MASKS & ~ignore_masks);
   else if (TARGET_POPCNTB)
 rs6000_isa_flags |= (ISA_2_2_MASKS & ~ignore_masks);
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 10d13bf812d2..7f9fe609a031 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -5171,7 +5171,7 @@
(use (match_operand:SFDF 1 "gpc_reg_operand"))
(use (match_operand:SFDF 2 "gpc_reg_operand"))]
   "TARGET_HARD_FLOAT
-   && TARGET_FPRND
+   && TARGET_POWER5X
&& flag_unsafe_math_optimizations"
 {
   rtx div = gen_reg_rtx (mode);
@@ -5189,7 +5189,7 @@
(use (match_operand:SFDF 1 "gpc_reg_operand"))
(use (match_operand:SFDF 2 "gpc_reg_operand"))]
   "TARGET_HARD_FLOAT
-   && TARGET_FPRND
+   && TARGET_POWER5X
&& flag_unsafe_math_optimizations"
 {
   rtx div = gen_reg_rtx (mode);
@@ -6687,7 +6687,7 @@
 (define_insn "*friz"
   [(set (match_operand:DF 0 "gpc_reg_operand" "=d,wa")
(float:DF (fix:DI (match_operand:DF 1 "gpc_reg_operand" "d,wa"]
-  "TARGET_HARD_FLOAT && TARGET_FPRND
+  "TARGET_HARD_FLOAT && TARGET_POWER5X
&& flag_unsafe_math_optimizations && !flag_trapping_math && TARGET_FRIZ"
   "@
friz %0,%1
@@ -6815,7 +6815,7 @@
   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa")
(unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" "d,wa")]
 UNSPEC_FRIZ))]
-  "TARGET_HARD_FLOAT && TARGET_FPRND"
+  "TARGET_HARD_FLOAT && TARGET_POWER5X"
   "@
friz %0,%1
xsrdpiz %x0,%x1"
@@ -6825,7 +6825,7 @@
   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa")
(unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" "d,wa")]
 UNSPEC_FRIP))]
-  "TARGET_HARD_FLOAT && TARGET_FPRND"
+  "TARGET_HARD_FLOAT && TARGET_POWER5X"
   "@
frip %0,%1
xsrdpip %x0,%x1"
@@ -6835,7 +6835,7 @@
   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa")
(unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" "d,wa")]
 UNSPEC_FRIM))]
-  "TARGET_HARD_FLOAT && TARGET_FPRND"
+  "TARGET_HARD_FLOAT && TARGET_POWER5X"
   "@
frim %0,%1
xsrdpim %x0,%x1"
@@ -6846,7 +6846,7 @@
   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=")
(unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" "")]
 UNSPEC_FRIN))]
-  "TARGET_HARD_FLOAT && TARGET_FPRND"
+  "TARGET_HARD_FLOAT && TARGET_POWER5X"
   "frin %0,%1"
   [(set_attr "type" "fp")])


[gcc(refs/users/meissner/heads/work177)] Change TARGET_CMPB to TARGET_POWER6

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:924e0f641636c09990f2d5094f9585ae5818bf16

commit 924e0f641636c09990f2d5094f9585ae5818bf16
Author: Michael Meissner 
Date:   Tue Sep 3 21:53:25 2024 -0400

Change TARGET_CMPB to TARGET_POWER6

As part of the architecture flags patches, this patch changes the use of
TARGET_CMPB to TARGET_POWER6.  The CMPB instruction was added in power6 (ISA
2.05).

I have built both big endian and little endian bootstrap compilers and there
were no regressions.

In addition, I constructed a test case that used every architecture define
(like _ARCH_PWR4, etc.) and I also looked at the .machine directive generated.
I ran this test for all supported combinations of -mcpu, big/little endian, and
32/64 bit support.  Every single instance generated exactly the same code with
the patches installed compared to the compiler before installing the patches.

Can I install this patch on the GCC 15 trunk?

2024-09-03  Michael Meissner  

* config/rs6000/rs6000-builtin.cc (rs6000_builtin_is_supported): Use
TARGET_POWER6 instead of TARGET_CMPB.
* config/rs6000/rs6000.h (TARGET_FCFID): Merge tests for popcntb, cmpb,
and popcntd into a single test for TARGET_POWER5.
(TARGET_LFIWAX): Use TARGET_POWER6 instead of TARGET_CMPB.
* config/rs6000/rs6000.md (enabled attribute): Likewise.
(parity2_cmpb): Likewise.
(cmpb): Likewise.
(copysign3): Likewise.
(copysign3_fcpsgn): Likewise.
(cmpstrnsi): Likewise.
(cmpstrsi): Likewise.
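
For readers unfamiliar with the instruction behind the old flag name, the
following is a small software model of a byte-wise compare as performed by
cmpb.  It is an illustration only and is not taken from the rs6000 sources;
the function name cmpb_model is hypothetical.

```c
#include <stdint.h>

/* Software model of a byte-wise compare: each result byte is 0xFF when
   the corresponding bytes of A and B are equal, and 0x00 otherwise.  */
uint64_t
cmpb_model (uint64_t a, uint64_t b)
{
  uint64_t result = 0;
  int i;

  for (i = 0; i < 8; i++)
    {
      /* Select byte I of the 64-bit value.  */
      uint64_t mask = (uint64_t) 0xFF << (8 * i);

      if ((a & mask) == (b & mask))
        result |= mask;
    }
  return result;
}
```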

Diff:
---
 gcc/config/rs6000/rs6000-builtin.cc |  4 ++--
 gcc/config/rs6000/rs6000.h  |  6 ++
 gcc/config/rs6000/rs6000.md | 16 
 3 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc
index 98a0545030cd..76421bd1de0b 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -157,9 +157,9 @@ rs6000_builtin_is_supported (enum rs6000_gen_builtins 
fncode)
 case ENB_P5:
   return TARGET_POWER5;
 case ENB_P6:
-  return TARGET_CMPB;
+  return TARGET_POWER6;
 case ENB_P6_64:
-  return TARGET_CMPB && TARGET_POWERPC64;
+  return TARGET_POWER6 && TARGET_POWERPC64;
 case ENB_P7:
   return TARGET_POPCNTD;
 case ENB_P7_64:
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 4500724d895c..d22693eb2bfb 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -448,13 +448,11 @@ extern int rs6000_vector_align[];
Enable 32-bit fcfid's on any of the switches for newer ISA machines.  */
 #define TARGET_FCFID   (TARGET_POWERPC64   \
 || TARGET_PPC_GPOPT/* 970/power4 */\
-|| TARGET_POPCNTB  /* ISA 2.02 */  \
-|| TARGET_CMPB /* ISA 2.05 */  \
-|| TARGET_POPCNTD) /* ISA 2.06 */
+|| TARGET_POWER5)  /* ISA 2.02 and above */ \
 
 #define TARGET_FCTIDZ  TARGET_FCFID
 #define TARGET_STFIWX  TARGET_PPC_GFXOPT
-#define TARGET_LFIWAX  TARGET_CMPB
+#define TARGET_LFIWAX  TARGET_POWER6
 #define TARGET_LFIWZX  TARGET_POPCNTD
 #define TARGET_FCFIDS  TARGET_POPCNTD
 #define TARGET_FCFIDU  TARGET_POPCNTD
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 7f9fe609a031..0c303087e944 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -383,7 +383,7 @@
  (const_int 1)
 
  (and (eq_attr "isa" "p6")
- (match_test "TARGET_CMPB"))
+ (match_test "TARGET_POWER6"))
  (const_int 1)
 
  (and (eq_attr "isa" "p7")
@@ -2544,7 +2544,7 @@
 (define_insn "parity2_cmpb"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
(unspec:GPR [(match_operand:GPR 1 "gpc_reg_operand" "r")] 
UNSPEC_PARITY))]
-  "TARGET_CMPB"
+  "TARGET_POWER6"
   "prty %0,%1"
   [(set_attr "type" "popcnt")])
 
@@ -2597,7 +2597,7 @@
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
(unspec:GPR [(match_operand:GPR 1 "gpc_reg_operand" "r")
 (match_operand:GPR 2 "gpc_reg_operand" "r")] UNSPEC_CMPB))]
-  "TARGET_CMPB"
+  "TARGET_POWER6"
   "cmpb %0,%1,%2"
   [(set_attr "type" "cmp")])
 
@@ -5401,7 +5401,7 @@
&& ((TARGET_PPC_GFXOPT
 && !HONOR_NANS (mode)
 && !HONOR_SIGNED_ZEROS (mode))
-   || TARGET_CMPB
+   || TARGET_POWER6
|| VECTOR_UNIT_VSX_P (mode))"
 {
   /* Middle-end canonicalizes -fabs (x) to copysign (x, -1),
@@ -5422,7 +5422,7 @@
   if (!gpc_reg_operand (operands[2], mode))
 operands[2] = copy_to_mode_reg (mode, operands[2]);
 
-  if (TARGET_CMPB || VECTOR_UNIT_VSX_P (mode))
+  if (TARGET_POWER6 || VECTOR_UNIT_VSX_P (mode))
 {
   emit_insn (gen_copysign3_fcpsgn

[gcc(refs/users/meissner/heads/work177)] Change TARGET_POPCNTD to TARGET_POWER7

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:59231f09ecfa595b602bcb96ce8d9ea73b57e934

commit 59231f09ecfa595b602bcb96ce8d9ea73b57e934
Author: Michael Meissner 
Date:   Tue Sep 3 21:54:04 2024 -0400

Change TARGET_POPCNTD to TARGET_POWER7

As part of the architecture flags patches, this patch changes the use of
TARGET_POPCNTD to TARGET_POWER7.  The POPCNTD instruction was added in power7
(ISA 2.06).

I have built both big endian and little endian bootstrap compilers and there
were no regressions.

In addition, I constructed a test case that used every architecture define
(like _ARCH_PWR4, etc.) and I also looked at the .machine directive generated.
I ran this test for all supported combinations of -mcpu, big/little endian, and
32/64 bit support.  Every single instance generated exactly the same code with
the patches installed compared to the compiler before installing the patches.

Can I install this patch on the GCC 15 trunk?

2024-09-03  Michael Meissner  

* config/rs6000/dfp.md (floatdidd2): Change TARGET_POPCNTD to
TARGET_POWER7.
* config/rs6000/rs6000-builtin.cc (rs6000_builtin_is_supported):
Likewise.
* config/rs6000/rs6000-string.cc (expand_block_compare_gpr): Likewise.
* config/rs6000/rs6000.cc (rs6000_hard_regno_mode_ok_uncached):
Likewise.
(rs6000_rtx_costs): Likewise.
(rs6000_emit_popcount): Likewise.
* config/rs6000/rs6000.h (TARGET_LDBRX): Likewise.
(TARGET_LFIWZX): Likewise.
(TARGET_FCFIDS): Likewise.
(TARGET_FCFIDU): Likewise.
(TARGET_FCFIDUS): Likewise.
(TARGET_FCTIDUZ): Likewise.
(TARGET_FCTIWUZ): Likewise.
(CTZ_DEFINED_VALUE_AT_ZERO): Likewise.
* config/rs6000/rs6000.md (enabled attribute): Likewise.
(ctz2): Likewise.
(popcntd2): Likewise.
(lrintsi2): Likewise.
(lrintsi): Likewise.
(lrintsi_di): Likewise.
(cmpmemsi): Likewise.
(bpermd_): Likewise.
(addg6s): Likewise.
(cdtbcd): Likewise.
(cbcdtd): Likewise.
(div_): Likewise.

Diff:
---
 gcc/config/rs6000/dfp.md|  2 +-
 gcc/config/rs6000/rs6000-builtin.cc |  4 ++--
 gcc/config/rs6000/rs6000-string.cc  |  4 ++--
 gcc/config/rs6000/rs6000.cc |  6 +++---
 gcc/config/rs6000/rs6000.h  | 16 
 gcc/config/rs6000/rs6000.md | 24 
 6 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
index fa9d7dd45dd3..b8189390d410 100644
--- a/gcc/config/rs6000/dfp.md
+++ b/gcc/config/rs6000/dfp.md
@@ -214,7 +214,7 @@
 (define_insn "floatdidd2"
   [(set (match_operand:DD 0 "gpc_reg_operand" "=d")
(float:DD (match_operand:DI 1 "gpc_reg_operand" "d")))]
-  "TARGET_DFP && TARGET_POPCNTD"
+  "TARGET_DFP && TARGET_POWER7"
   "dcffix %0,%1"
   [(set_attr "type" "dfp")])
 
diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc
index 76421bd1de0b..dae43b672ea7 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -161,9 +161,9 @@ rs6000_builtin_is_supported (enum rs6000_gen_builtins 
fncode)
 case ENB_P6_64:
   return TARGET_POWER6 && TARGET_POWERPC64;
 case ENB_P7:
-  return TARGET_POPCNTD;
+  return TARGET_POWER7;
 case ENB_P7_64:
-  return TARGET_POPCNTD && TARGET_POWERPC64;
+  return TARGET_POWER7 && TARGET_POWERPC64;
 case ENB_P8:
   return TARGET_POWER8;
 case ENB_P8V:
diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index 55b4133b1a34..3674c4bd9847 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -1948,8 +1948,8 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, 
unsigned int base_align,
 bool
 expand_block_compare (rtx operands[])
 {
-  /* TARGET_POPCNTD is already guarded at expand cmpmemsi.  */
-  gcc_assert (TARGET_POPCNTD);
+  /* TARGET_POWER7 is already guarded at expand cmpmemsi.  */
+  gcc_assert (TARGET_POWER7);
 
   /* For P8, this case is complicated to handle because the subtract
  with carry instructions do not generate the 64-bit carry and so
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 7805f8b3d491..a5125f882c78 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1999,7 +1999,7 @@ rs6000_hard_regno_mode_ok_uncached (int regno, 
machine_mode mode)
  if(GET_MODE_SIZE (mode) == UNITS_PER_FP_WORD)
return 1;
 
- if (TARGET_POPCNTD && mode == SImode)
+ if (TARGET_POWER7 && mode == SImode)
return 1;
 
  if (TARGET_P9_VECTOR && (mode == QImode || mode == HImode))
@@ -22473,7 +22473,7 @@ rs6000_rtx_

[gcc(refs/users/meissner/heads/work177)] Change TARGET_MODULO to TARGET_POWER9

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:33a2cf2f215c286088c33bc49f4426e1c354be5d

commit 33a2cf2f215c286088c33bc49f4426e1c354be5d
Author: Michael Meissner 
Date:   Tue Sep 3 21:55:50 2024 -0400

Change TARGET_MODULO to TARGET_POWER9

As part of the architecture flags patches, this patch changes the use of
TARGET_MODULO to TARGET_POWER9.  The modulo instructions were added in power9
(ISA 3.0).  Note, I did not change the uses of TARGET_MODULO where it was
explicitly generating different code if the machine had a modulo instruction.
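
To illustrate why code generation can differ depending on whether a modulo
instruction exists: without one, a remainder is typically formed from a
divide, a multiply, and a subtract.  The sketch below shows that arithmetic in
C.  It is a general illustration, not code from the rs6000 backend, and the
function name is hypothetical.

```c
/* Illustrative only: compute A % B without a modulo instruction by
   dividing, multiplying back, and subtracting.  B must be nonzero and
   the quotient must be representable (so not LONG_MIN / -1).  */
long
mod_without_modulo_insn (long a, long b)
{
  long quotient = a / b;  /* C division truncates toward zero.  */

  return a - quotient * b;
}
```

The result has the same sign as the dividend, matching the C % operator.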

I have built both big endian and little endian bootstrap compilers and there
were no regressions.

In addition, I constructed a test case that used every architecture define
(like _ARCH_PWR4, etc.) and I also looked at the .machine directive generated.
I ran this test for all supported combinations of -mcpu, big/little endian, and
32/64 bit support.  Every single instance generated exactly the same code with
the patches installed compared to the compiler before installing the patches.

Can I install this patch on the GCC 15 trunk?

2024-09-03  Michael Meissner  

* config/rs6000/rs6000-builtin.cc (rs6000_builtin_is_supported): Use
TARGET_POWER9 instead of TARGET_MODULO.
* config/rs6000/rs6000.h (TARGET_CTZ): Likewise.
(TARGET_EXTSWSLI): Likewise.
(TARGET_MADDLD): Likewise.
* config/rs6000/rs6000.md (enabled attribute): Likewise.

Diff:
---
 gcc/config/rs6000/rs6000-builtin.cc | 4 ++--
 gcc/config/rs6000/rs6000.h  | 6 +++---
 gcc/config/rs6000/rs6000.md | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc
index dae43b672ea7..b6093b3cb64c 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -169,9 +169,9 @@ rs6000_builtin_is_supported (enum rs6000_gen_builtins 
fncode)
 case ENB_P8V:
   return TARGET_P8_VECTOR;
 case ENB_P9:
-  return TARGET_MODULO;
+  return TARGET_POWER9;
 case ENB_P9_64:
-  return TARGET_MODULO && TARGET_POWERPC64;
+  return TARGET_POWER9 && TARGET_POWERPC64;
 case ENB_P9V:
   return TARGET_P9_VECTOR;
 case ENB_P10:
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3a03c32f..89ca1bad80f3 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -461,9 +461,9 @@ extern int rs6000_vector_align[];
 #define TARGET_FCTIWUZ TARGET_POWER7
 /* Only powerpc64 and powerpc476 support fctid.  */
 #define TARGET_FCTID   (TARGET_POWERPC64 || rs6000_cpu == PROCESSOR_PPC476)
-#define TARGET_CTZ TARGET_MODULO
-#define TARGET_EXTSWSLI(TARGET_MODULO && TARGET_POWERPC64)
-#define TARGET_MADDLD  TARGET_MODULO
+#define TARGET_CTZ TARGET_POWER9
+#define TARGET_EXTSWSLI(TARGET_POWER9 && TARGET_POWERPC64)
+#define TARGET_MADDLD  TARGET_POWER9
 
 /* TARGET_DIRECT_MOVE is redundant to TARGET_P8_VECTOR, so alias it to that.  
*/
 #define TARGET_DIRECT_MOVE TARGET_P8_VECTOR
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index bff898a4eff1..fc0d454e9a42 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -403,7 +403,7 @@
  (const_int 1)
 
  (and (eq_attr "isa" "p9")
- (match_test "TARGET_MODULO"))
+ (match_test "TARGET_POWER9"))
  (const_int 1)
 
  (and (eq_attr "isa" "p9v")


[gcc(refs/users/meissner/heads/work177)] Update tests to work with architecture flags changes.

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:62ff64e4d799235646100d85b4428267fa3844c3

commit 62ff64e4d799235646100d85b4428267fa3844c3
Author: Michael Meissner 
Date:   Tue Sep 3 21:57:11 2024 -0400

Update tests to work with architecture flags changes.

Two tests used -mvsx to raise the processor level to at least power7.  These
tests were rewritten to add cpu=power7 support.

I have built both big endian and little endian bootstrap compilers and there
were no regressions.

In addition, I constructed a test case that used every architecture define
(like _ARCH_PWR4, etc.) and I also looked at the .machine directive generated.
I ran this test for all supported combinations of -mcpu, big/little endian, and
32/64 bit support.  Every single instance generated exactly the same code with
the patches installed compared to the compiler before installing the patches.

Can I install this patch on the GCC 15 trunk?

2024-09-03  Michael Meissner  

gcc/testsuite/

* gcc.target/powerpc/ppc-target-4.c: Rewrite the test to add cpu=power7
when we need to add VSX support.  Add test for adding cpu=power7 no-vsx
to generate only Altivec instructions.
* gcc.target/powerpc/pr115688.c: Add cpu=power7 when requesting VSX
instructions.

Diff:
---
 gcc/testsuite/gcc.target/powerpc/ppc-target-4.c | 38 +++--
 gcc/testsuite/gcc.target/powerpc/pr115688.c |  3 +-
 2 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-target-4.c 
b/gcc/testsuite/gcc.target/powerpc/ppc-target-4.c
index feef76db4618..5e2ecf34f249 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc-target-4.c
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-target-4.c
@@ -2,7 +2,7 @@
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_fprs } */
 /* { dg-options "-O2 -ffast-math -mdejagnu-cpu=power5 -mno-altivec 
-mabi=altivec -fno-unroll-loops" } */
-/* { dg-final { scan-assembler-times "vaddfp" 1 } } */
+/* { dg-final { scan-assembler-times "vaddfp" 2 } } */
 /* { dg-final { scan-assembler-times "xvaddsp" 1 } } */
 /* { dg-final { scan-assembler-times "fadds" 1 } } */
 
@@ -18,10 +18,6 @@
 #error "__VSX__ should not be defined."
 #endif
 
-#pragma GCC target("altivec,vsx")
-#include 
-#pragma GCC reset_options
-
 #pragma GCC push_options
 #pragma GCC target("altivec,no-vsx")
 
@@ -33,6 +29,7 @@
 #error "__VSX__ should not be defined."
 #endif
 
+/* Altivec build, generate vaddfp.  */
 void
 av_add (vector float *a, vector float *b, vector float *c)
 {
@@ -40,10 +37,11 @@ av_add (vector float *a, vector float *b, vector float *c)
   unsigned long n = SIZE / 4;
 
   for (i = 0; i < n; i++)
-a[i] = vec_add (b[i], c[i]);
+a[i] = b[i] + c[i];
 }
 
-#pragma GCC target("vsx")
+/* cpu=power7 must be used to enable VSX.  */
+#pragma GCC target("cpu=power7,vsx")
 
 #ifndef __ALTIVEC__
 #error "__ALTIVEC__ should be defined."
@@ -53,6 +51,7 @@ av_add (vector float *a, vector float *b, vector float *c)
 #error "__VSX__ should be defined."
 #endif
 
+/* VSX build on power7, generate xsaddsp.  */
 void
 vsx_add (vector float *a, vector float *b, vector float *c)
 {
@@ -60,11 +59,31 @@ vsx_add (vector float *a, vector float *b, vector float *c)
   unsigned long n = SIZE / 4;
 
   for (i = 0; i < n; i++)
-a[i] = vec_add (b[i], c[i]);
+a[i] = b[i] + c[i];
+}
+
+#pragma GCC target("cpu=power7,no-vsx")
+
+#ifndef __ALTIVEC__
+#error "__ALTIVEC__ should be defined."
+#endif
+
+#ifdef __VSX__
+#error "__VSX__ should not be defined."
+#endif
+
+/* Altivec build on power7 with no VSX, generate vaddfp.  */
+void
+av2_add (vector float *a, vector float *b, vector float *c)
+{
+  unsigned long i;
+  unsigned long n = SIZE / 4;
+
+  for (i = 0; i < n; i++)
+a[i] = b[i] + c[i];
 }
 
 #pragma GCC pop_options
-#pragma GCC target("no-vsx,no-altivec")
 
 #ifdef __ALTIVEC__
 #error "__ALTIVEC__ should not be defined."
@@ -74,6 +93,7 @@ vsx_add (vector float *a, vector float *b, vector float *c)
 #error "__VSX__ should not be defined."
 #endif
 
+/* Default power5 build, generate scalar fadds.  */
 void
 norm_add (float *a, float *b, float *c)
 {
diff --git a/gcc/testsuite/gcc.target/powerpc/pr115688.c 
b/gcc/testsuite/gcc.target/powerpc/pr115688.c
index 5222e66ef170..00c7c301436a 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr115688.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr115688.c
@@ -7,7 +7,8 @@
 
 /* Verify there is no ICE under 32 bit env.  */
 
-__attribute__((target("vsx")))
+/* cpu=power7 must be used to enable VSX.  */
+__attribute__((target("cpu=power7,vsx")))
 int test (void)
 {
   return 0;


[gcc(refs/users/meissner/heads/work177)] Add support for -mcpu=future

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:bb83c503c32a453704994cb802985119d3199ff0

commit bb83c503c32a453704994cb802985119d3199ff0
Author: Michael Meissner 
Date:   Tue Sep 3 21:59:12 2024 -0400

Add support for -mcpu=future

This patch adds the support that can be used in developing GCC support for
future PowerPC processors.

2024-09-03  Michael Meissner  

* config.gcc (powerpc*-*-*): Add support for --with-cpu=future.
* config/rs6000/aix71.h (ASM_CPU_SPEC): Add support for -mcpu=future.
* config/rs6000/aix72.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/aix73.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/driver-rs6000.cc (asm_names): Likewise.
* config/rs6000/rs6000-arch.def: Add future cpu.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): If
-mcpu=future, define _ARCH_FUTURE.
* config/rs6000/rs6000-cpus.def (FUTURE_MASKS_SERVER): New macro.
(future cpu): Define.
* config/rs6000/rs6000-opts.h (enum processor_type): Add
PROCESSOR_FUTURE.
* config/rs6000/rs6000-tables.opt: Regenerate.
* config/rs6000/rs6000.cc (power10_cost): Update comment.
(get_arch_flags): Add support for future processor.
(rs6000_option_override_internal): Likewise.
(rs6000_machine_from_flags): Likewise.
(rs6000_reassociation_width): Likewise.
(rs6000_adjust_cost): Likewise.
(rs6000_issue_rate): Likewise.
(rs6000_sched_reorder): Likewise.
(rs6000_sched_reorder2): Likewise.
(rs6000_register_move_cost): Likewise.
* config/rs6000/rs6000.h (ASM_CPU_SPEC): Likewise.
(TARGET_POWER11): New macro.
* config/rs6000/rs6000.md (cpu attribute): Likewise.
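
User code can key off the new _ARCH_FUTURE define named in the ChangeLog
above.  A minimal sketch, assuming only that the macro is predefined under
-mcpu=future as the entry states; the function name is hypothetical.

```c
/* Return 1 when this translation unit was compiled with the
   _ARCH_FUTURE macro predefined, and 0 otherwise.  The function name
   is hypothetical and not part of the patch.  */
int
compiled_for_future_cpu (void)
{
#ifdef _ARCH_FUTURE
  return 1;
#else
  return 0;
#endif
}
```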

Diff:
---
 gcc/config.gcc  |  4 ++--
 gcc/config/rs6000/aix71.h   |  1 +
 gcc/config/rs6000/aix72.h   |  1 +
 gcc/config/rs6000/aix73.h   |  1 +
 gcc/config/rs6000/driver-rs6000.cc  |  2 ++
 gcc/config/rs6000/rs6000-arch.def   |  1 +
 gcc/config/rs6000/rs6000-c.cc   |  2 ++
 gcc/config/rs6000/rs6000-cpus.def   |  3 +++
 gcc/config/rs6000/rs6000-opts.h |  1 +
 gcc/config/rs6000/rs6000-tables.opt | 11 +++
 gcc/config/rs6000/rs6000.cc | 34 ++
 gcc/config/rs6000/rs6000.h  |  2 ++
 gcc/config/rs6000/rs6000.md |  2 +-
 13 files changed, 50 insertions(+), 15 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index f09ce9f63a01..0b794e977f6a 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -539,7 +539,7 @@ powerpc*-*-*)
extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h 
si2vmx.h"
extra_headers="${extra_headers} amo.h"
case x$with_cpu in
-   
xpowerpc64|xdefault64|x6[23]0|x970|xG5|xpower[3456789]|xpower1[01]|xpower6x|xrs64a|xcell|xa2|xe500mc64|xe5500|xe6500)
+   
xpowerpc64|xdefault64|x6[23]0|x970|xG5|xpower[3456789]|xpower1[01]|xpower6x|xrs64a|xcell|xa2|xe500mc64|xe5500|xe6500|xfuture)
cpu_is_64bit=yes
;;
esac
@@ -5646,7 +5646,7 @@ case "${target}" in
tm_defines="${tm_defines} CONFIG_PPC405CR"
eval "with_$which=405"
;;
-   "" | common | native \
+   "" | common | native | future \
| power[3456789] | power1[01] | power5+ | power6x \
| powerpc | powerpc64 | powerpc64le \
| rs64 \
diff --git a/gcc/config/rs6000/aix71.h b/gcc/config/rs6000/aix71.h
index 41037b3852d7..570ddcc451db 100644
--- a/gcc/config/rs6000/aix71.h
+++ b/gcc/config/rs6000/aix71.h
@@ -79,6 +79,7 @@ do {  
\
 #undef ASM_CPU_SPEC
 #define ASM_CPU_SPEC \
 "%{mcpu=native: %(asm_cpu_native); \
+  mcpu=future: -mfuture; \
   mcpu=power11: -mpwr11; \
   mcpu=power10: -mpwr10; \
   mcpu=power9: -mpwr9; \
diff --git a/gcc/config/rs6000/aix72.h b/gcc/config/rs6000/aix72.h
index fe59f8319b48..242ca94bd065 100644
--- a/gcc/config/rs6000/aix72.h
+++ b/gcc/config/rs6000/aix72.h
@@ -79,6 +79,7 @@ do {  
\
 #undef ASM_CPU_SPEC
 #define ASM_CPU_SPEC \
 "%{mcpu=native: %(asm_cpu_native); \
+  mcpu=future: -mfuture; \
   mcpu=power11: -mpwr11; \
   mcpu=power10: -mpwr10; \
   mcpu=power9: -mpwr9; \
diff --git a/gcc/config/rs6000/aix73.h b/gcc/config/rs6000/aix73.h
index 1318b0b3662d..2bd6b4bb3c4f 100644
--- a/gcc/config/rs6000/aix73.h
+++ b/gcc/config/rs6000/aix73.h
@@ -79,6 +79,7 @@ do {  
\
 #undef ASM_CPU_SPEC
 #define ASM_CPU_SPEC \
 "%{mcpu=native: %(asm_cpu_native); \
+  mcpu=future: -mfuture; \
   mcpu=power11: -m

[gcc(refs/users/meissner/heads/work177)] Add -mcpu=future tuning support.

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:6671caf766c5022dc409d9be4a5dc81526cbf1ed

commit 6671caf766c5022dc409d9be4a5dc81526cbf1ed
Author: Michael Meissner 
Date:   Tue Sep 3 22:00:24 2024 -0400

Add -mcpu=future tuning support.

This patch makes -mtune=future use the same tuning decisions as
-mtune=power11.

2024-09-03  Michael Meissner  

gcc/

* config/rs6000/power10.md (all reservations): Add future as an
alternative to power10 and power11.

Diff:
---
 gcc/config/rs6000/power10.md | 144 +--
 1 file changed, 72 insertions(+), 72 deletions(-)

diff --git a/gcc/config/rs6000/power10.md b/gcc/config/rs6000/power10.md
index 2310c4603457..e42b057dc45b 100644
--- a/gcc/config/rs6000/power10.md
+++ b/gcc/config/rs6000/power10.md
@@ -1,4 +1,4 @@
-;; Scheduling description for the IBM Power10 and Power11 processors.
+;; Scheduling description for the IBM Power10, Power11, and Future processors.
 ;; Copyright (C) 2020-2024 Free Software Foundation, Inc.
 ;;
 ;; Contributed by Pat Haugen (pthau...@us.ibm.com).
@@ -97,12 +97,12 @@
(eq_attr "update" "no")
(eq_attr "size" "!128")
(eq_attr "prefixed" "no")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_any_power10,LU_power10")
 
 (define_insn_reservation "power10-fused-load" 4
   (and (eq_attr "type" "fused_load_cmpi,fused_addis_load,fused_load_load")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,LU_power10")
 
 (define_insn_reservation "power10-prefixed-load" 4
@@ -110,13 +110,13 @@
(eq_attr "update" "no")
(eq_attr "size" "!128")
(eq_attr "prefixed" "yes")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,LU_power10")
 
 (define_insn_reservation "power10-load-update" 4
   (and (eq_attr "type" "load")
(eq_attr "update" "yes")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,LU_power10+SXU_power10")
 
 (define_insn_reservation "power10-fpload-double" 4
@@ -124,7 +124,7 @@
(eq_attr "update" "no")
(eq_attr "size" "64")
(eq_attr "prefixed" "no")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_any_power10,LU_power10")
 
 (define_insn_reservation "power10-prefixed-fpload-double" 4
@@ -132,14 +132,14 @@
(eq_attr "update" "no")
(eq_attr "size" "64")
(eq_attr "prefixed" "yes")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,LU_power10")
 
 (define_insn_reservation "power10-fpload-update-double" 4
   (and (eq_attr "type" "fpload")
(eq_attr "update" "yes")
(eq_attr "size" "64")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,LU_power10+SXU_power10")
 
 ; SFmode loads are cracked and have additional 3 cycles over DFmode
@@ -148,27 +148,27 @@
   (and (eq_attr "type" "fpload")
(eq_attr "update" "no")
(eq_attr "size" "32")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,LU_power10")
 
 (define_insn_reservation "power10-fpload-update-single" 7
   (and (eq_attr "type" "fpload")
(eq_attr "update" "yes")
(eq_attr "size" "32")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,LU_power10+SXU_power10")
 
 (define_insn_reservation "power10-vecload" 4
   (and (eq_attr "type" "vecload")
(eq_attr "size" "!256")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_any_power10,LU_power10")
 
 ; lxvp
 (define_insn_reservation "power10-vecload-pair" 4
   (and (eq_attr "type" "vecload")
(eq_attr "size" "256")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,LU_power10+SXU_power10")
 
 ; Store Unit
@@ -178,12 +178,12 @@
(eq_attr "prefixed" "no")
(eq_attr "size" "!128")
(eq_attr "size" "!256")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_any_power10,STU_power10")
 
 (define_insn_reservation "power10-fused-store" 0
   (and (eq_attr "type" "fused_store_store")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,STU_power10")
 
 (define_insn_reservation "power10-prefixed-store" 0
@@ -191,52 +191,52 @@
(eq_attr "prefixed" "yes")
(eq_attr "size" "!128")
(eq_attr "size" "!256")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,STU_power10")
 
 ; Update forms have 2 cycle latency for update

[gcc(refs/users/meissner/heads/work177)] Update ChangeLog.*

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:c6a092f33bc8540eb5fdf8fd60f7ea84b5e3a934

commit c6a092f33bc8540eb5fdf8fd60f7ea84b5e3a934
Author: Michael Meissner 
Date:   Tue Sep 3 22:04:07 2024 -0400

Update ChangeLog.*

Diff:
---
 gcc/ChangeLog.meissner | 449 -
 1 file changed, 448 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index 581879743f29..4f1a776eebd5 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,6 +1,453 @@
+ Branch work177, patch #21 
+
+Add -mcpu=future tuning support.
+
+This patch makes -mtune=future use the same tuning decision as -mtune=power11.
+
+2024-08-17  Michael Meissner  
+
+gcc/
+
+   * config/rs6000/power10.md (all reservations): Add future as an
+   alternative to power10 and power11.
+
+ Branch work177, patch #20 
+
+Add support for -mcpu=future
+
+This patch adds the support that can be used in developing GCC support for
+future PowerPC processors.
+
+2024-08-17  Michael Meissner  
+
+   * config.gcc (powerpc*-*-*): Add support for --with-cpu=future.
+   * config/rs6000/aix71.h (ASM_CPU_SPEC): Add support for -mcpu=future.
+   * config/rs6000/aix72.h (ASM_CPU_SPEC): Likewise.
+   * config/rs6000/aix73.h (ASM_CPU_SPEC): Likewise.
+   * config/rs6000/driver-rs6000.cc (asm_names): Likewise.
+   * config/rs6000/rs6000-arch.def: Add future cpu.
+   * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): If
+   -mcpu=future, define _ARCH_FUTURE.
+   * config/rs6000/rs6000-cpus.def (FUTURE_MASKS_SERVER): New macro.
+   (future cpu): Define.
+   * config/rs6000/rs6000-opts.h (enum processor_type): Add
+   PROCESSOR_FUTURE.
+   * config/rs6000/rs6000-tables.opt: Regenerate.
+   * config/rs6000/rs6000.cc (power10_cost): Update comment.
+   (get_arch_flags): Add support for future processor.
+   (rs6000_option_override_internal): Likewise.
+   (rs6000_machine_from_flags): Likewise.
+   (rs6000_reassociation_width): Likewise.
+   (rs6000_adjust_cost): Likewise.
+   (rs6000_issue_rate): Likewise.
+   (rs6000_sched_reorder): Likewise.
+   (rs6000_sched_reorder2): Likewise.
+   (rs6000_register_move_cost): Likewise.
+   * config/rs6000/rs6000.h (ASM_CPU_SPEC): Likewise.
+   (TARGET_POWER11): New macro.
+   * config/rs6000/rs6000.md (cpu attribute): Likewise.
+
+ Branch work177, patch #9 
+
+Update tests to work with architecture flags changes.
+
+Two tests used -mvsx to raise the processor level to at least power7.  These
+tests were rewritten to add cpu=power7 support.
+
+I have built both big endian and little endian bootstrap compilers and there
+were no regressions.
+
+In addition, I constructed a test case that used every architecture define (like
+_ARCH_PWR4, etc.) and I also looked at the .machine directive generated.  I ran
+this test for all supported combinations of -mcpu, big/little endian, and 32/64
+bit support.  Every single instance generated exactly the same code with the
+patches installed compared to the compiler before installing the patches.
+
+Can I install this patch on the GCC 15 trunk?
+
+2024-08-17  Michael Meissner  
+
+gcc/testsuite/
+
+   * gcc.target/powerpc/ppc-target-4.c: Rewrite the test to add cpu=power7
+   when we need to add VSX support.  Add test for adding cpu=power7 no-vsx
+   to generate only Altivec instructions.
+   * gcc.target/powerpc/pr115688.c: Add cpu=power7 when requesting VSX
+   instructions.
+
+ Branch work177, patch #8 
+
+Change TARGET_MODULO to TARGET_POWER9
+
+As part of the architecture flags patches, this patch changes the use of
+TARGET_MODULO to TARGET_POWER9.  The modulo instructions were added in power9 (ISA
+3.0).  Note, I did not change the uses of TARGET_MODULO where it was explicitly
+generating different code if the machine had a modulo instruction.
+
+I have built both big endian and little endian bootstrap compilers and there
+were no regressions.
+
+In addition, I constructed a test case that used every architecture define (like
+_ARCH_PWR4, etc.) and I also looked at the .machine directive generated.  I ran
+this test for all supported combinations of -mcpu, big/little endian, and 32/64
+bit support.  Every single instance generated exactly the same code with the
+patches installed compared to the compiler before installing the patches.
+
+Can I install this patch on the GCC 15 trunk?
+
+2024-08-16  Michael Meissner  
+
+   * config/rs6000/rs6000-builtin.cc (rs6000_builtin_is_supported): Use
+   TARGET_POWER9 instead of TARGET_MODULO.
+   * config/rs6000/rs6000.h (TARGET_CTZ): Likewise.
+   (TARGET_EXTSWSLI): Likewise.
+   (TARGET_MADDLD): Likewise.
+   * config/rs6000/rs6000.md (enabled attribute): Likewise.
+
+=

[gcc/meissner/heads/work177-bugs] (14 commits) Merge commit 'refs/users/meissner/heads/work177-bugs' of gi

2024-09-03 Thread Michael Meissner via Gcc-cvs
The branch 'meissner/heads/work177-bugs' was updated to point to:

 3b714bd343c2... Merge commit 'refs/users/meissner/heads/work177-bugs' of gi

It previously pointed to:

 e9eff3979fbb... Add ChangeLog.bugs and update REVISION.

Diff:

Summary of changes (added commits):
---

  3b714bd... Merge commit 'refs/users/meissner/heads/work177-bugs' of gi
  47ea6ba... Add ChangeLog.bugs and update REVISION.
  c6a092f... Update ChangeLog.* (*)
  6671caf... Add -mcpu=future tuning support. (*)
  bb83c50... Add support for -mcpu=future (*)
  62ff64e... Update tests to work with architecture flags changes. (*)
  33a2cf2... Change TARGET_MODULO to TARGET_POWER9 (*)
  59231f0... Change TARGET_POPCNTD to TARGET_POWER7 (*)
  924e0f6... Change TARGET_CMPB to TARGET_POWER6 (*)
  c3fd033... Change TARGET_FPRND to TARGET_POWER5X (*)
  9e092b3... Change TARGET_POPCNTB to TARGET_POWER5 (*)
  75c6f74... Do not allow -mvsx to boost processor to power7. (*)
  559e865... Use architecture flags for defining _ARCH_PWR macros. (*)
  505f1a5... Add rs6000 architecture masks. (*)

(*) This commit already exists in another branch.
Because the reference `refs/users/meissner/heads/work177-bugs' matches
your hooks.email-new-commits-only configuration,
no separate email is sent for this commit.


[gcc(refs/users/meissner/heads/work177-bugs)] Add ChangeLog.bugs and update REVISION.

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:47ea6bac4fcd2fd42929756717369c8838096435

commit 47ea6bac4fcd2fd42929756717369c8838096435
Author: Michael Meissner 
Date:   Tue Sep 3 19:43:16 2024 -0400

Add ChangeLog.bugs and update REVISION.

2024-09-03  Michael Meissner  

gcc/

* ChangeLog.bugs: New file for branch.
* REVISION: Update.

Diff:
---
 gcc/ChangeLog.bugs | 6 ++
 gcc/REVISION   | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.bugs b/gcc/ChangeLog.bugs
new file mode 100644
index ..d650b6ef609d
--- /dev/null
+++ b/gcc/ChangeLog.bugs
@@ -0,0 +1,6 @@
+ Branch work177-bugs, baseline 
+
+2024-09-03   Michael Meissner  
+
+   Clone branch
+
diff --git a/gcc/REVISION b/gcc/REVISION
index d331a5b4da39..52a7be4c2653 100644
--- a/gcc/REVISION
+++ b/gcc/REVISION
@@ -1 +1 @@
-work177 branch
+work177-bugs branch


[gcc(refs/users/meissner/heads/work177-bugs)] Merge commit 'refs/users/meissner/heads/work177-bugs' of git+ssh://gcc.gnu.org/git/gcc into me/work1

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:3b714bd343c212924f0371e2eedc3530c85ab5c0

commit 3b714bd343c212924f0371e2eedc3530c85ab5c0
Merge: 47ea6bac4fcd e9eff3979fbb
Author: Michael Meissner 
Date:   Tue Sep 3 22:04:52 2024 -0400

Merge commit 'refs/users/meissner/heads/work177-bugs' of git+ssh://gcc.gnu.org/git/gcc into me/work177-bugs

Diff:


[gcc/meissner/heads/work177-dmf] (14 commits) Merge commit 'refs/users/meissner/heads/work177-dmf' of git

2024-09-03 Thread Michael Meissner via Gcc-cvs
The branch 'meissner/heads/work177-dmf' was updated to point to:

 e26fe0258375... Merge commit 'refs/users/meissner/heads/work177-dmf' of git

It previously pointed to:

 b0d36040031e... Add ChangeLog.dmf and update REVISION.

Diff:

Summary of changes (added commits):
---

  e26fe02... Merge commit 'refs/users/meissner/heads/work177-dmf' of git
  ac3382f... Add ChangeLog.dmf and update REVISION.
  c6a092f... Update ChangeLog.* (*)
  6671caf... Add -mcpu=future tuning support. (*)
  bb83c50... Add support for -mcpu=future (*)
  62ff64e... Update tests to work with architecture flags changes. (*)
  33a2cf2... Change TARGET_MODULO to TARGET_POWER9 (*)
  59231f0... Change TARGET_POPCNTD to TARGET_POWER7 (*)
  924e0f6... Change TARGET_CMPB to TARGET_POWER6 (*)
  c3fd033... Change TARGET_FPRND to TARGET_POWER5X (*)
  9e092b3... Change TARGET_POPCNTB to TARGET_POWER5 (*)
  75c6f74... Do not allow -mvsx to boost processor to power7. (*)
  559e865... Use architecture flags for defining _ARCH_PWR macros. (*)
  505f1a5... Add rs6000 architecture masks. (*)

(*) This commit already exists in another branch.
Because the reference `refs/users/meissner/heads/work177-dmf' matches
your hooks.email-new-commits-only configuration,
no separate email is sent for this commit.


[gcc(refs/users/meissner/heads/work177-dmf)] Add ChangeLog.dmf and update REVISION.

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:ac3382f8526f4d0ffeeca8edc11149832c52d815

commit ac3382f8526f4d0ffeeca8edc11149832c52d815
Author: Michael Meissner 
Date:   Tue Sep 3 19:40:35 2024 -0400

Add ChangeLog.dmf and update REVISION.

2024-09-03  Michael Meissner  

gcc/

* ChangeLog.dmf: New file for branch.
* REVISION: Update.

Diff:
---
 gcc/ChangeLog.dmf | 6 ++
 gcc/REVISION  | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.dmf b/gcc/ChangeLog.dmf
new file mode 100644
index ..5686facb6869
--- /dev/null
+++ b/gcc/ChangeLog.dmf
@@ -0,0 +1,6 @@
+ Branch work177-dmf, baseline 
+
+2024-09-03   Michael Meissner  
+
+   Clone branch
+
diff --git a/gcc/REVISION b/gcc/REVISION
index d331a5b4da39..7eeb812a9a1b 100644
--- a/gcc/REVISION
+++ b/gcc/REVISION
@@ -1 +1 @@
-work177 branch
+work177-dmf branch


[gcc(refs/users/meissner/heads/work177-dmf)] Merge commit 'refs/users/meissner/heads/work177-dmf' of git+ssh://gcc.gnu.org/git/gcc into me/work17

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:e26fe0258375de420238bf66f9ddf2f2cfea57ce

commit e26fe0258375de420238bf66f9ddf2f2cfea57ce
Merge: ac3382f8526f b0d36040031e
Author: Michael Meissner 
Date:   Tue Sep 3 22:06:34 2024 -0400

Merge commit 'refs/users/meissner/heads/work177-dmf' of git+ssh://gcc.gnu.org/git/gcc into me/work177-dmf

Diff:


[gcc/meissner/heads/work177-libs] (14 commits) Merge commit 'refs/users/meissner/heads/work177-libs' of gi

2024-09-03 Thread Michael Meissner via Gcc-cvs
The branch 'meissner/heads/work177-libs' was updated to point to:

 fd9c9e45d5d5... Merge commit 'refs/users/meissner/heads/work177-libs' of gi

It previously pointed to:

 752eeaaf... Add ChangeLog.libs and update REVISION.

Diff:

Summary of changes (added commits):
---

  fd9c9e4... Merge commit 'refs/users/meissner/heads/work177-libs' of gi
  cd87d13... Add ChangeLog.libs and update REVISION.
  c6a092f... Update ChangeLog.* (*)
  6671caf... Add -mcpu=future tuning support. (*)
  bb83c50... Add support for -mcpu=future (*)
  62ff64e... Update tests to work with architecture flags changes. (*)
  33a2cf2... Change TARGET_MODULO to TARGET_POWER9 (*)
  59231f0... Change TARGET_POPCNTD to TARGET_POWER7 (*)
  924e0f6... Change TARGET_CMPB to TARGET_POWER6 (*)
  c3fd033... Change TARGET_FPRND to TARGET_POWER5X (*)
  9e092b3... Change TARGET_POPCNTB to TARGET_POWER5 (*)
  75c6f74... Do not allow -mvsx to boost processor to power7. (*)
  559e865... Use architecture flags for defining _ARCH_PWR macros. (*)
  505f1a5... Add rs6000 architecture masks. (*)

(*) This commit already exists in another branch.
Because the reference `refs/users/meissner/heads/work177-libs' matches
your hooks.email-new-commits-only configuration,
no separate email is sent for this commit.


[gcc(refs/users/meissner/heads/work177-libs)] Add ChangeLog.libs and update REVISION.

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:cd87d132d47469a00a6ffa5863ce57226541c74b

commit cd87d132d47469a00a6ffa5863ce57226541c74b
Author: Michael Meissner 
Date:   Tue Sep 3 19:44:40 2024 -0400

Add ChangeLog.libs and update REVISION.

2024-09-03  Michael Meissner  

gcc/

* ChangeLog.libs: New file for branch.
* REVISION: Update.

Diff:
---
 gcc/ChangeLog.libs | 6 ++
 gcc/REVISION   | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.libs b/gcc/ChangeLog.libs
new file mode 100644
index ..03ebc5600b23
--- /dev/null
+++ b/gcc/ChangeLog.libs
@@ -0,0 +1,6 @@
+ Branch work177-libs, baseline 
+
+2024-09-03   Michael Meissner  
+
+   Clone branch
+
diff --git a/gcc/REVISION b/gcc/REVISION
index d331a5b4da39..0ca4dbac01fa 100644
--- a/gcc/REVISION
+++ b/gcc/REVISION
@@ -1 +1 @@
-work177 branch
+work177-libs branch


[gcc(refs/users/meissner/heads/work177-libs)] Merge commit 'refs/users/meissner/heads/work177-libs' of git+ssh://gcc.gnu.org/git/gcc into me/work1

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:fd9c9e45d5d5cef02e3f3ddc7669ae8028d0acd4

commit fd9c9e45d5d5cef02e3f3ddc7669ae8028d0acd4
Merge: cd87d132d474 752eeaaf
Author: Michael Meissner 
Date:   Tue Sep 3 22:08:20 2024 -0400

Merge commit 'refs/users/meissner/heads/work177-libs' of git+ssh://gcc.gnu.org/git/gcc into me/work177-libs

Diff:


[gcc/meissner/heads/work177-tar] (14 commits) Merge commit 'refs/users/meissner/heads/work177-tar' of git

2024-09-03 Thread Michael Meissner via Gcc-cvs
The branch 'meissner/heads/work177-tar' was updated to point to:

 e75a605eca9d... Merge commit 'refs/users/meissner/heads/work177-tar' of git

It previously pointed to:

 e3af2c7a011e... Add ChangeLog.tar and update REVISION.

Diff:

Summary of changes (added commits):
---

  e75a605... Merge commit 'refs/users/meissner/heads/work177-tar' of git
  6a6c43f... Add ChangeLog.tar and update REVISION.
  c6a092f... Update ChangeLog.* (*)
  6671caf... Add -mcpu=future tuning support. (*)
  bb83c50... Add support for -mcpu=future (*)
  62ff64e... Update tests to work with architecture flags changes. (*)
  33a2cf2... Change TARGET_MODULO to TARGET_POWER9 (*)
  59231f0... Change TARGET_POPCNTD to TARGET_POWER7 (*)
  924e0f6... Change TARGET_CMPB to TARGET_POWER6 (*)
  c3fd033... Change TARGET_FPRND to TARGET_POWER5X (*)
  9e092b3... Change TARGET_POPCNTB to TARGET_POWER5 (*)
  75c6f74... Do not allow -mvsx to boost processor to power7. (*)
  559e865... Use architecture flags for defining _ARCH_PWR macros. (*)
  505f1a5... Add rs6000 architecture masks. (*)

(*) This commit already exists in another branch.
Because the reference `refs/users/meissner/heads/work177-tar' matches
your hooks.email-new-commits-only configuration,
no separate email is sent for this commit.


[gcc(refs/users/meissner/heads/work177-tar)] Add ChangeLog.tar and update REVISION.

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:6a6c43f8cfe5e3a926be48f9739592522de9457b

commit 6a6c43f8cfe5e3a926be48f9739592522de9457b
Author: Michael Meissner 
Date:   Tue Sep 3 19:42:20 2024 -0400

Add ChangeLog.tar and update REVISION.

2024-09-03  Michael Meissner  

gcc/

* ChangeLog.tar: New file for branch.
* REVISION: Update.

Diff:
---
 gcc/ChangeLog.tar | 6 ++
 gcc/REVISION  | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.tar b/gcc/ChangeLog.tar
new file mode 100644
index ..6fe8a38bffcd
--- /dev/null
+++ b/gcc/ChangeLog.tar
@@ -0,0 +1,6 @@
+ Branch work177-tar, baseline 
+
+2024-09-03   Michael Meissner  
+
+   Clone branch
+
diff --git a/gcc/REVISION b/gcc/REVISION
index d331a5b4da39..2207f56818d6 100644
--- a/gcc/REVISION
+++ b/gcc/REVISION
@@ -1 +1 @@
-work177 branch
+work177-tar branch


[gcc(refs/users/meissner/heads/work177-tar)] Merge commit 'refs/users/meissner/heads/work177-tar' of git+ssh://gcc.gnu.org/git/gcc into me/work17

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:e75a605eca9d86b15f0dc7f58e392012cf86b79a

commit e75a605eca9d86b15f0dc7f58e392012cf86b79a
Merge: 6a6c43f8cfe5 e3af2c7a011e
Author: Michael Meissner 
Date:   Tue Sep 3 22:10:04 2024 -0400

Merge commit 'refs/users/meissner/heads/work177-tar' of git+ssh://gcc.gnu.org/git/gcc into me/work177-tar

Diff:


[gcc/meissner/heads/work177-test] (14 commits) Merge commit 'refs/users/meissner/heads/work177-test' of gi

2024-09-03 Thread Michael Meissner via Gcc-cvs
The branch 'meissner/heads/work177-test' was updated to point to:

 35b6e0965ca6... Merge commit 'refs/users/meissner/heads/work177-test' of gi

It previously pointed to:

 9977091c89fa... Add ChangeLog.test and update REVISION.

Diff:

Summary of changes (added commits):
---

  35b6e09... Merge commit 'refs/users/meissner/heads/work177-test' of gi
  d8fca01... Add ChangeLog.test and update REVISION.
  c6a092f... Update ChangeLog.* (*)
  6671caf... Add -mcpu=future tuning support. (*)
  bb83c50... Add support for -mcpu=future (*)
  62ff64e... Update tests to work with architecture flags changes. (*)
  33a2cf2... Change TARGET_MODULO to TARGET_POWER9 (*)
  59231f0... Change TARGET_POPCNTD to TARGET_POWER7 (*)
  924e0f6... Change TARGET_CMPB to TARGET_POWER6 (*)
  c3fd033... Change TARGET_FPRND to TARGET_POWER5X (*)
  9e092b3... Change TARGET_POPCNTB to TARGET_POWER5 (*)
  75c6f74... Do not allow -mvsx to boost processor to power7. (*)
  559e865... Use architecture flags for defining _ARCH_PWR macros. (*)
  505f1a5... Add rs6000 architecture masks. (*)

(*) This commit already exists in another branch.
Because the reference `refs/users/meissner/heads/work177-test' matches
your hooks.email-new-commits-only configuration,
no separate email is sent for this commit.


[gcc(refs/users/meissner/heads/work177-test)] Add ChangeLog.test and update REVISION.

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:d8fca012866fa40cc793677326e4f3ca64a56f04

commit d8fca012866fa40cc793677326e4f3ca64a56f04
Author: Michael Meissner 
Date:   Tue Sep 3 19:45:32 2024 -0400

Add ChangeLog.test and update REVISION.

2024-09-03  Michael Meissner  

gcc/

* ChangeLog.test: New file for branch.
* REVISION: Update.

Diff:
---
 gcc/ChangeLog.test | 6 ++
 gcc/REVISION   | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.test b/gcc/ChangeLog.test
new file mode 100644
index ..7d4190b2ed42
--- /dev/null
+++ b/gcc/ChangeLog.test
@@ -0,0 +1,6 @@
+ Branch work177-test, baseline 
+
+2024-09-03   Michael Meissner  
+
+   Clone branch
+
diff --git a/gcc/REVISION b/gcc/REVISION
index d331a5b4da39..0ccbb3643953 100644
--- a/gcc/REVISION
+++ b/gcc/REVISION
@@ -1 +1 @@
-work177 branch
+work177-test branch


[gcc(refs/users/meissner/heads/work177-test)] Merge commit 'refs/users/meissner/heads/work177-test' of git+ssh://gcc.gnu.org/git/gcc into me/work1

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:35b6e0965ca6db155b4281a59b70c7fabde99ead

commit 35b6e0965ca6db155b4281a59b70c7fabde99ead
Merge: d8fca012866f 9977091c89fa
Author: Michael Meissner 
Date:   Tue Sep 3 22:11:56 2024 -0400

Merge commit 'refs/users/meissner/heads/work177-test' of git+ssh://gcc.gnu.org/git/gcc into me/work177-test

Diff:


[gcc/meissner/heads/work177-vpair] (14 commits) Merge commit 'refs/users/meissner/heads/work177-vpair' of g

2024-09-03 Thread Michael Meissner via Gcc-cvs
The branch 'meissner/heads/work177-vpair' was updated to point to:

 4e3f839435e3... Merge commit 'refs/users/meissner/heads/work177-vpair' of g

It previously pointed to:

 a1297ab8acc1... Add ChangeLog.vpair and update REVISION.

Diff:

Summary of changes (added commits):
---

  4e3f839... Merge commit 'refs/users/meissner/heads/work177-vpair' of g
  2ded4c8... Add ChangeLog.vpair and update REVISION.
  c6a092f... Update ChangeLog.* (*)
  6671caf... Add -mcpu=future tuning support. (*)
  bb83c50... Add support for -mcpu=future (*)
  62ff64e... Update tests to work with architecture flags changes. (*)
  33a2cf2... Change TARGET_MODULO to TARGET_POWER9 (*)
  59231f0... Change TARGET_POPCNTD to TARGET_POWER7 (*)
  924e0f6... Change TARGET_CMPB to TARGET_POWER6 (*)
  c3fd033... Change TARGET_FPRND to TARGET_POWER5X (*)
  9e092b3... Change TARGET_POPCNTB to TARGET_POWER5 (*)
  75c6f74... Do not allow -mvsx to boost processor to power7. (*)
  559e865... Use architecture flags for defining _ARCH_PWR macros. (*)
  505f1a5... Add rs6000 architecture masks. (*)

(*) This commit already exists in another branch.
Because the reference `refs/users/meissner/heads/work177-vpair' matches
your hooks.email-new-commits-only configuration,
no separate email is sent for this commit.


[gcc(refs/users/meissner/heads/work177-vpair)] Add ChangeLog.vpair and update REVISION.

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:2ded4c8845ab4597540d3b4740fb9a8bad156c9e

commit 2ded4c8845ab4597540d3b4740fb9a8bad156c9e
Author: Michael Meissner 
Date:   Tue Sep 3 19:41:27 2024 -0400

Add ChangeLog.vpair and update REVISION.

2024-09-03  Michael Meissner  

gcc/

* ChangeLog.vpair: New file for branch.
* REVISION: Update.

Diff:
---
 gcc/ChangeLog.vpair | 6 ++
 gcc/REVISION| 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.vpair b/gcc/ChangeLog.vpair
new file mode 100644
index ..958a2af2e811
--- /dev/null
+++ b/gcc/ChangeLog.vpair
@@ -0,0 +1,6 @@
+ Branch work177-vpair, baseline 
+
+2024-09-03   Michael Meissner  
+
+   Clone branch
+
diff --git a/gcc/REVISION b/gcc/REVISION
index d331a5b4da39..c0f58f45e4fb 100644
--- a/gcc/REVISION
+++ b/gcc/REVISION
@@ -1 +1 @@
-work177 branch
+work177-vpair branch


[gcc(refs/users/meissner/heads/work177-vpair)] Merge commit 'refs/users/meissner/heads/work177-vpair' of git+ssh://gcc.gnu.org/git/gcc into me/work

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:4e3f839435e31f04e7c1c7334b81352645da4ec8

commit 4e3f839435e31f04e7c1c7334b81352645da4ec8
Merge: 2ded4c8845ab a1297ab8acc1
Author: Michael Meissner 
Date:   Tue Sep 3 22:14:19 2024 -0400

Merge commit 'refs/users/meissner/heads/work177-vpair' of git+ssh://gcc.gnu.org/git/gcc into me/work177-vpair

Diff:


[gcc(refs/users/meissner/heads/work177-bugs)] Add better support for shifting vectors with 64-bit elements

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:2e236aa017bc9c3d1a4fc065c7051c3a45e2f71c

commit 2e236aa017bc9c3d1a4fc065c7051c3a45e2f71c
Author: Michael Meissner 
Date:   Tue Sep 3 22:17:54 2024 -0400

Add better support for shifting vectors with 64-bit elements

This patch fixes PR target/89213 to allow better code to be generated to do
constant shifts of V2DI/V2DF vectors.  Previously GCC would do constant shifts
of vectors with 64-bit elements by using:

XXSPLTIB 32,4
VEXTSB2D 0,0
VSRAD 2,2,0

I.e., the PowerPC does not have a VSPLTISD instruction to load -15..14 for the
64-bit shift count in one instruction.  Instead, it would need to load a byte
and then convert it to 64-bit.

With this patch, GCC now realizes that the vector shift instructions will look
at the bottom 6 bits for the shift count, and it can use either a VSPLTISW or
XXSPLTIB instruction to load the shift count.
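
The reasoning above can be illustrated with a small portable C model.  This is
only a sketch, not code from the patch: the helper names (shift_right_low6,
splat_byte, splat_word, count_shift_mismatches) are hypothetical, and the model
assumes, as the message states, that a 64-bit element shift uses only the
bottom 6 bits of its shift-count element.

```c
#include <stdint.h>

/* Model of a 64-bit element shift that, as described above, looks only at
   the bottom 6 bits of the shift-count element.  */
static uint64_t
shift_right_low6 (uint64_t value, uint64_t count_element)
{
  return value >> (count_element & 63);
}

/* Hypothetical model of what a byte splat leaves in one 64-bit element:
   the same byte repeated eight times.  */
static uint64_t
splat_byte (uint8_t b)
{
  return (uint64_t) b * UINT64_C (0x0101010101010101);
}

/* Hypothetical model of what a word splat leaves in one 64-bit element:
   the same 32-bit word repeated twice.  */
static uint64_t
splat_word (int32_t w)
{
  uint64_t u = (uint32_t) w;
  return (u << 32) | u;
}

/* Count the shift counts for which the splatted element gives a different
   result than a plain shift by the same count.  */
static int
count_shift_mismatches (void)
{
  const uint64_t value = UINT64_C (0x8000000000000100);
  int mismatches = 0;

  /* A byte splat covers every 64-bit shift count, 0..63.  */
  for (unsigned n = 0; n < 64; n++)
    if (shift_right_low6 (value, splat_byte ((uint8_t) n)) != value >> n)
      mismatches++;

  /* A word splat of 0..15 also leaves the count in the bottom 6 bits.  */
  for (int n = 0; n <= 15; n++)
    if (shift_right_low6 (value, splat_word (n)) != value >> n)
      mismatches++;

  return mismatches;
}
```

In this model the upper bits of the splatted element are nonzero, but the shift
ignores them, which is why no separate sign-extension step is needed.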

2024-09-03  Michael Meissner  

gcc/

PR target/89213
* config/rs6000/altivec.md (UNSPEC_VECTOR_SHIFT): New unspec.
(VSHIFT_MODE): New mode iterator.
(vshift_code): New code iterator.
(vshift_attr): New code attribute.
(altivec___const): New pattern to optimize
vector long long/int shifts by a constant.
(altivec__shift_const): New helper insn to load up a
constant used by the shift operation.
* config/rs6000/predicates.md (vector_shift_constant): New
predicate.

gcc/testsuite/

PR target/89213
* gcc.target/powerpc/pr89213.c: New test.
* gcc.target/powerpc/vec-rlmi-rlnm.c: Update instruction count.

Diff:
---
 gcc/config/rs6000/altivec.md |  51 +++
 gcc/config/rs6000/predicates.md  |  63 ++
 gcc/testsuite/gcc.target/powerpc/pr89213.c   | 106 +++
 gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c |   4 +-
 4 files changed, 222 insertions(+), 2 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 1f5489b974f6..8faece984e9f 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -170,6 +170,7 @@
UNSPEC_VSTRIL
UNSPEC_SLDB
UNSPEC_SRDB
+   UNSPEC_VECTOR_SHIFT
 ])
 
 (define_c_enum "unspecv"
@@ -2176,6 +2177,56 @@
   "vsro %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
+;; Optimize V2DI shifts by constants.  This relies on the shift instructions
+;; only looking at the bits needed to do the shift.  This means we can use
+;; VSPLTISW or XXSPLTIB to load up the constant, and not worry about the bits
+;; that the vector shift instructions will not use.
+(define_mode_iterator VSHIFT_MODE  [(V4SI "TARGET_P9_VECTOR")
+(V2DI "TARGET_P8_VECTOR")])
+
+(define_code_iterator vshift_code  [ashift ashiftrt lshiftrt])
+(define_code_attr vshift_attr  [(ashift   "ashift")
+(ashiftrt "ashiftrt")
+(lshiftrt "lshiftrt")])
+
+(define_insn_and_split "*altivec___const"
+  [(set (match_operand:VSHIFT_MODE 0 "register_operand" "=v")
+   (vshift_code:VSHIFT_MODE
+(match_operand:VSHIFT_MODE 1 "register_operand" "v")
+(match_operand:VSHIFT_MODE 2 "vector_shift_constant" "")))
+   (clobber (match_scratch:VSHIFT_MODE 3 "=&v"))]
+  "((<MODE>mode == V2DImode && TARGET_P8_VECTOR)
+|| (<MODE>mode == V4SImode && TARGET_P9_VECTOR))"
+  "#"
+  "&& 1"
+  [(set (match_dup 3)
+   (unspec:VSHIFT_MODE [(match_dup 4)] UNSPEC_VECTOR_SHIFT))
+   (set (match_dup 0)
+   (vshift_code:VSHIFT_MODE (match_dup 1)
+(match_dup 3)))]
+{
+  if (GET_CODE (operands[3]) == SCRATCH)
+operands[3] = gen_reg_rtx (<MODE>mode);
+
+  operands[4] = ((GET_CODE (operands[2]) == CONST_VECTOR)
+? CONST_VECTOR_ELT (operands[2], 0)
+: XEXP (operands[2], 0));
+})
+
+(define_insn "*altivec__shift_const"
+  [(set (match_operand:VSHIFT_MODE 0 "register_operand" "=v")
+   (unspec:VSHIFT_MODE [(match_operand 1 "const_int_operand" "n")]
+   UNSPEC_VECTOR_SHIFT))]
+  "TARGET_P8_VECTOR"
+{
+  if (UINTVAL (operands[1]) <= 15)
+return "vspltisw %0,%1";
+  else if (TARGET_P9_VECTOR)
+return "xxspltib %x0,%1";
+  else
+gcc_unreachable ();
+})
+
 (define_insn "altivec_vsum4ubs"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
 (unspec:V4SI [(match_operand:V16QI 1 "register_operand" "v")
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 7f0b4ab61e65..0b78901e94be 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -861,6 +861,69 @@
 return op == CONST0_RTX (mode) || op == CONSTM1_RTX (mode);
 })
 
+;; Return 1 if the operand is a V2DI or V4SI const_vector, where ea

[gcc(refs/users/meissner/heads/work177-bugs)] Optimize splat of a V2DF/V2DI extract with constant element

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:8dc69fbc17ba58eac4d758f412ec2014c41b9bbc

commit 8dc69fbc17ba58eac4d758f412ec2014c41b9bbc
Author: Michael Meissner 
Date:   Tue Sep 3 22:18:49 2024 -0400

Optimize splat of a V2DF/V2DI extract with constant element

We had optimizations for splat of a vector extract for the other vector
types, but we missed having one for V2DI and V2DF.  This patch adds a
combiner insn to do this optimization.

In looking at the source, we had similar optimizations for V4SI and V4SF
extract and splats, but we missed doing V2DI/V2DF.

Without the patch for the code:

vector long long splat_dup_l_0 (vector long long v)
{
  return __builtin_vec_splats (__builtin_vec_extract (v, 0));
}

the compiler generates (on a little endian power9):

splat_dup_l_0:
mfvsrld 9,34
mtvsrdd 34,9,9
blr

Now it generates:

splat_dup_l_0:
xxpermdi 34,34,34,3
blr
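
The choice of the final xxpermdi operand can be sketched in portable C.  This
is an illustrative model, not code from the patch: the type v2di_model and the
functions xxpermdi_model and splat_extract_dm are hypothetical names, and the
selector computation mirrors the one in the new vsx.md insn shown in the diff
below.  The model assumes the 2-bit selector picks doubleword 0 or 1 of the
first source with its high bit and of the second source with its low bit.

```c
#include <stdint.h>

/* Two 64-bit doublewords; dw[0] is doubleword 0 in the register's own
   (big-endian) numbering.  */
typedef struct
{
  uint64_t dw[2];
} v2di_model;

/* Model of a doubleword permute: the high bit of DM picks the doubleword
   taken from A, the low bit picks the doubleword taken from B.  */
static v2di_model
xxpermdi_model (v2di_model a, v2di_model b, int dm)
{
  v2di_model r;
  r.dw[0] = a.dw[(dm >> 1) & 1];
  r.dw[1] = b.dw[dm & 1];
  return r;
}

/* Map a vector element number (0 or 1) to the selector, flipping the
   element number on little endian as the insn does.  */
static int
splat_extract_dm (int element, int bytes_big_endian)
{
  int which_word = element;
  if (!bytes_big_endian)
    which_word = 1 - which_word;
  return which_word ? 3 : 0;
}
```

With both sources set to the same register, a selector of 0 duplicates
doubleword 0 and a selector of 3 duplicates doubleword 1.  On little endian,
element 0 maps to selector 3, which matches the "xxpermdi 34,34,34,3" shown
above.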

2024-09-03  Michael Meissner  

gcc/

* config/rs6000/vsx.md (vsx_splat_extract_): New insn.

gcc/testsuite/

* gcc.target/powerpc/builtins-1.c: Adjust insn count.
* gcc.target/powerpc/pr99293.c: New test.

Diff:
---
 gcc/config/rs6000/vsx.md  | 18 ++
 gcc/testsuite/gcc.target/powerpc/builtins-1.c |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr99293.c| 22 ++
 3 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index b2fc39acf4e8..73f20a86e56a 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4796,6 +4796,24 @@
   "lxvdsx %x0,%y1"
   [(set_attr "type" "vecload")])
 
+;; Optimize SPLAT of an extract from a V2DF/V2DI vector with a constant element
+(define_insn "*vsx_splat_extract_"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+   (vec_duplicate:VSX_D
+(vec_select:
+ (match_operand:VSX_D 1 "vsx_register_operand" "wa")
+ (parallel [(match_operand 2 "const_0_to_1_operand" "n")]]
+  "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+  int which_word = INTVAL (operands[2]);
+  if (!BYTES_BIG_ENDIAN)
+which_word = 1 - which_word;
+
+  operands[3] = GEN_INT (which_word ? 3 : 0);
+  return "xxpermdi %x0,%x1,%x1,%3";
+}
+  [(set_attr "type" "vecperm")])
+
 ;; V4SI splat support
 (define_insn "vsx_splat_v4si"
   [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa")
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1.c b/gcc/testsuite/gcc.target/powerpc/builtins-1.c
index 8410a5fd4319..4e7e5384675f 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-1.c
@@ -1035,4 +1035,4 @@ foo156 (vector unsigned short usa)
 /* { dg-final { scan-assembler-times {\mvmrglb\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mvmrgew\M} 4 } } */
 /* { dg-final { scan-assembler-times {\mvsplth|xxsplth\M} 4 } } */
-/* { dg-final { scan-assembler-times {\mxxpermdi\M} 44 } } */
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 42 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr99293.c b/gcc/testsuite/gcc.target/powerpc/pr99293.c
new file mode 100644
index ..20adc1f27f65
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr99293.c
@@ -0,0 +1,22 @@
+/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+/* Test for PR 99263, which wants to do:
+   __builtin_vec_splats (__builtin_vec_extract (v, n))
+
+   where v is a V2DF or V2DI vector and n is either 0 or 1.  Previously the
+   compiler would do a direct move to the GPR registers to select the item and a
+   direct move from the GPR registers to do the splat.  */
+
+vector long long splat_dup_l_0 (vector long long v)
+{
+  return __builtin_vec_splats (__builtin_vec_extract (v, 0));
+}
+
+vector long long splat_dup_l_1 (vector long long v)
+{
+  return __builtin_vec_splats (__builtin_vec_extract (v, 1));
+}
+
+/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */


[gcc(refs/users/meissner/heads/work177-bugs)] Update ChangeLog.*

2024-09-03 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:4257b13aa62b5307dceaa686ac15088f367fa608

commit 4257b13aa62b5307dceaa686ac15088f367fa608
Author: Michael Meissner 
Date:   Tue Sep 3 22:20:30 2024 -0400

Update ChangeLog.*

Diff:
---
 gcc/ChangeLog.bugs | 94 +-
 1 file changed, 93 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.bugs b/gcc/ChangeLog.bugs
index d650b6ef609d..0826011aaae0 100644
--- a/gcc/ChangeLog.bugs
+++ b/gcc/ChangeLog.bugs
@@ -1,6 +1,98 @@
+ Branch work177-bugs, patch #201 
+
+Optimize splat of a V2DF/V2DI extract with constant element
+
+We had optimizations for splat of a vector extract for the other vector
+types, but we missed having one for V2DI and V2DF.  This patch adds a
+combiner insn to do this optimization.
+
+In looking at the source, we had similar optimizations for V4SI and V4SF
+extract and splats, but we missed doing V2DI/V2DF.
+
+Without the patch for the code:
+
+   vector long long splat_dup_l_0 (vector long long v)
+   {
+ return __builtin_vec_splats (__builtin_vec_extract (v, 0));
+   }
+
+the compiler generates (on a little endian power9):
+
+   splat_dup_l_0:
+   mfvsrld 9,34
+   mtvsrdd 34,9,9
+   blr
+
+Now it generates:
+
+   splat_dup_l_0:
+   xxpermdi 34,34,34,3
+   blr
+
+2024-08-19  Michael Meissner  
+
+gcc/
+
+   * config/rs6000/vsx.md (vsx_splat_extract_): New insn.
+
+gcc/testsuite/
+
+   * gcc.target/powerpc/builtins-1.c: Adjust insn count.
+   * gcc.target/powerpc/pr99293.c: New test.
+
+ Branch work177-bugs, patch #200 
+
+Add better support for shifting vectors with 64-bit elements
+
+This patch fixes PR target/89213 to allow better code to be generated to do
+constant shifts of V2DI/V2DF vectors.  Previously GCC would do constant shifts
+of vectors with 64-bit elements by using:
+
+   XXSPLTIB 32,4
+   VEXTSB2D 0,0
+   VSRAD 2,2,0
+
+I.e., the PowerPC does not have a VSPLTISD instruction to load -16..15 for the
+64-bit shift count in one instruction.  Instead, it would need to load a byte
+and then convert it to 64-bit.
+
+With this patch, GCC now realizes that the vector shift instructions will look
+at the bottom 6 bits for the shift count, and it can use either a VSPLTISW or
+XXSPLTIB instruction to load the shift count.
+
+2024-08-19  Michael Meissner  
+
+gcc/
+
+   PR target/89213
+   * config/rs6000/altivec.md (UNSPEC_VECTOR_SHIFT): New unspec.
+   (VSHIFT_MODE): New mode iterator.
+   (vshift_code): New code iterator.
+   (vshift_attr): New code attribute.
+   (altivec___const): New pattern to optimize
+   vector long long/int shifts by a constant.
+   (altivec__shift_const): New helper insn to load up a
+   constant used by the shift operation.
+   * config/rs6000/predicates.md (vector_shift_constant): New
+   predicate.
+
+gcc/testsuite/
+
+   PR target/89213
+   * gcc.target/powerpc/pr89213.c: New test.
+   * gcc.target/powerpc/vec-rlmi-rlnm.c: Update instruction count.
+
  Branch work177-bugs, baseline 
 
+Add ChangeLog.bugs and update REVISION.
+
+2024-08-16  Michael Meissner  
+
+gcc/
+
+   * ChangeLog.bugs: New file for branch.
+   * REVISION: Update.
+
 2024-09-03   Michael Meissner  
 
Clone branch
-

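Note on the patch #200 entry: it relies on the doubleword shift
instructions reading only the bottom 6 bits of each shift count.  The
sketch below, in plain C, illustrates that reasoning under stated
assumptions; it is not the predicate added by the patch.  The helpers
effective_count and fits_5bit_splat are hypothetical names, and the
-16..15 range is the 5-bit signed immediate of VSPLTISW.

```c
#include <stdbool.h>
#include <stdint.h>

/* A doubleword shift only consumes the low 6 bits of each count.  */
static unsigned
effective_count (int64_t count)
{
  return (unsigned) (count & 63);
}

/* Hypothetical helper: does some 5-bit signed immediate (-16..15), as
   accepted by VSPLTISW, have the same low 6 bits as SHIFT?  If so,
   store it in *IMM and return true.  */
static bool
fits_5bit_splat (unsigned shift, int *imm)
{
  for (int i = -16; i <= 15; i++)
    if (effective_count (i) == (shift & 63))
      {
        *imm = i;
        return true;
      }
  return false;
}
```

Under this model, counts 0..15 and 48..63 can be loaded with a 5-bit
splat (for example, 63 is reached with the immediate -1), while counts
16..47 need a wider immediate such as the byte loaded by XXSPLTIB.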

[gcc r15-3439] CRIS: Add new peephole2 "lra_szext_decomposed_indir_plus"

2024-09-03 Thread Hans-Peter Nilsson via Gcc-cvs
https://gcc.gnu.org/g:62dd893ff8a12a1d28f595b4e5bc43cf9f7d1c07

commit r15-3439-g62dd893ff8a12a1d28f595b4e5bc43cf9f7d1c07
Author: Hans-Peter Nilsson 
Date:   Mon Jul 8 03:59:55 2024 +0200

CRIS: Add new peephole2 "lra_szext_decomposed_indir_plus"

Exposed when running the test-suite with -flate-combine-instructions.

* config/cris/cris.md (lra_szext_decomposed_indir_plus): New
peephole2 pattern.

Diff:
---
 gcc/config/cris/cris.md | 45 +
 1 file changed, 45 insertions(+)

diff --git a/gcc/config/cris/cris.md b/gcc/config/cris/cris.md
index c15395bd84c4..e066d5c920a9 100644
--- a/gcc/config/cris/cris.md
+++ b/gcc/config/cris/cris.md
@@ -3024,6 +3024,7 @@
 ;; Re-compose a decomposed "indirect offset" address for a szext
 ;; operation.  The non-clobbering "addi" is generated by LRA.
 ;; This and lra_szext_decomposed is covered by cris/rld-legit1.c.
+;; (Unfortunately not true when enabling late-combine.)
 (define_peephole2 ; lra_szext_decomposed_indirect_with_offset
   [(parallel
 [(set (match_operand:SI 0 "register_operand")
@@ -3046,6 +3047,50 @@
(mem:BW2 (plus:SI (szext:SI (mem:BW (match_dup 1))) (match_dup 2)))))
  (clobber (reg:CC CRIS_CC0_REGNUM))])])
 
+;; When enabling late-combine, we get a slightly changed register
+;; allocation.  The two allocations for the pseudo-registers involved
+;; in the matching pattern get "swapped" and the (plus ...) in the
+;; pattern above is now a load from a stack-slot.  If peephole2 is
+;; disabled, we see that the original sequence is actually improved;
+;; one less incoming instruction, a load.  We need to "undo" that
+;; improvement a bit and move that load "back" to before the sequence
+;; we combine in lra_szext_decomposed_indirect_with_offset.  But that
+;; changed again, so there's no define_peephole2 for that sequence
+;; here, because it'd be hard or impossible to write a matching
+;; test-case.  A few commits later, the incoming pattern sequence has
+;; changed again: back to the original but with the (plus...) part of
+;; the address inside the second memory reference.
+;; Coverage: cris/rld-legit1.c@r15-1880-gce34fcc572a0dc or
+;; r15-3386-gaf1500dd8c00 when adding -flate-combine-instructions.
+
+(define_peephole2 ; lra_szext_decomposed_indir_plus
+  [(parallel
+[(set (match_operand:SI 0 "register_operand")
+ (sign_extend:SI (mem:BW (match_operand:SI 1 "register_operand"))))
+ (clobber (reg:CC CRIS_CC0_REGNUM))])
+   (parallel
+[(set (match_operand:SI 3 "register_operand")
+ (szext:SI (mem:BW2 (plus:SI
+ (match_operand:SI 4 "register_operand")
+ (match_operand:SI 2 "register_operand")))))
+ (clobber (reg:CC CRIS_CC0_REGNUM))])]
+  "(REGNO (operands[0]) == REGNO (operands[3])
+|| peep2_reg_dead_p (3, operands[0]))
+   && (REGNO (operands[0]) == REGNO (operands[1])
+   || peep2_reg_dead_p (3, operands[0]))
+   && (rtx_equal_p (operands[2], operands[0])
+   || rtx_equal_p (operands[4], operands[0]))"
+  [(parallel
+[(set
+  (match_dup 3)
+  (szext:SI
+   (mem:BW2 (plus:SI (szext:SI (mem:BW (match_dup 1))) (match_dup 2)))))
+ (clobber (reg:CC CRIS_CC0_REGNUM))])]
+{
+  if (! rtx_equal_p (operands[4], operands[0]))
+operands[2] = operands[4];
+})
+
 ;; Add operations with similar or same decomposed addresses here, when
 ;; encountered - but only when covered by mentioned test-cases for at
 ;; least one of the cases generalized in the pattern.

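Note on the preparation statement of lra_szext_decomposed_indir_plus:
the condition requires that one of the two addends (operands 2 and 4)
is the register loaded by the first insn (operand 0), and the C fragment
then makes operand 2 name the other addend.  A minimal sketch of that
selection in plain C, using bare register numbers in place of rtx
operands; other_addend is a hypothetical name, not a GCC function.

```c
/* Hypothetical stand-in for the preparation statement: REG0 is the
   register loaded by the first insn, REG2 and REG4 are the two addends
   of the second address.  Mirrors
     if (! rtx_equal_p (operands[4], operands[0]))
       operands[2] = operands[4];
   and returns the value operand 2 holds afterwards.  */
static int
other_addend (int reg0, int reg2, int reg4)
{
  return reg4 != reg0 ? reg4 : reg2;
}
```

Whichever position the loaded register occupies in the (plus ...), the
replacement pattern therefore adds the remaining register to the
sign-extended load.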
