date:20250211

Re: [PATCH] Synchronize include/dwarf2.def with binutils

2025-02-11 Thread Jakub Jelinek

On Mon, Feb 10, 2025 at 04:21:28PM -, Roger Sayle wrote:
> 2025-02-10  Roger Sayle  
> 
> include/ChangeLog
> * dwarf2.def(DW_CFA_AARCH64_negate_ra_state_with_pc): Define.

Space after def

Ok for trunk with that nit fixed.

> diff --git a/include/dwarf2.def b/include/dwarf2.def
> index e9acb79df9c..989f078041d 100644
> --- a/include/dwarf2.def
> +++ b/include/dwarf2.def
> @@ -788,6 +788,8 @@ DW_CFA (DW_CFA_hi_user, 0x3f)
>  
>  /* SGI/MIPS specific.  */
>  DW_CFA (DW_CFA_MIPS_advance_loc8, 0x1d)
> +/* AArch64 extensions.  */
> +DW_CFA (DW_CFA_AARCH64_negate_ra_state_with_pc, 0x2c)
>  /* GNU extensions.
> NOTE: DW_CFA_GNU_window_save is multiplexed on Sparc and AArch64.  */
>  DW_CFA (DW_CFA_GNU_window_save, 0x2d)


Jakub
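
For readers unfamiliar with how entries such as the new `DW_CFA` line are consumed, here is a minimal sketch of the X-macro pattern that `.def` files rely on. The list macro `DEMO_CFA_LIST` and the names `demo_cfa` and `demo_cfa_name` are hypothetical stand-ins (a real consumer would `#include` the `.def` file instead of using a list macro); only the three opcode names and values are taken from the hunk above.

```cpp
#include <string_view>

// Hypothetical stand-in for the .def file: one DW_CFA (name, value) per entry.
#define DEMO_CFA_LIST \
  DW_CFA (DW_CFA_MIPS_advance_loc8, 0x1d) \
  DW_CFA (DW_CFA_AARCH64_negate_ra_state_with_pc, 0x2c) \
  DW_CFA (DW_CFA_GNU_window_save, 0x2d)

// First expansion: turn every entry into an enumerator.
#define DW_CFA(name, value) name = value,
enum demo_cfa { DEMO_CFA_LIST };
#undef DW_CFA

// Second expansion: map an opcode back to its spelling.
// Returns an empty view for opcodes that are not in the list.
inline std::string_view
demo_cfa_name (unsigned op)
{
  switch (op)
    {
#define DW_CFA(name, value) case value: return #name;
      DEMO_CFA_LIST
#undef DW_CFA
    default:
      return std::string_view ();
    }
}
```

Because both expansions come from the same list, adding one line to the list is enough to keep the enum and the name table in sync.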

[PATCH v2] RISC-V: unrecognizable insn ICE in xtheadvector/pr114194.c on 32bit targets

2025-02-11 Thread Jin Ma

This is a follow-up to the patch below to avoid generating unrecognized
vsetivl instructions for XTheadVector.

https://gcc.gnu.org/pipermail/gcc-patches/2025-January/674185.html

PR target/118601

gcc/ChangeLog:

* config/riscv/riscv-string.cc (expand_block_move): Check with new
constraint 'vl' instead of 'K'.
(expand_vec_setmem): Likewise.
(expand_vec_cmpmem): Likewise.
* config/riscv/riscv-v.cc (force_vector_length_operand): Likewise.
(expand_load_store): Likewise.
(expand_strided_load): Likewise.
(expand_strided_store): Likewise.
(expand_lanes_load_store): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/pr114194.c: Move to...
* gcc.target/riscv/rvv/xtheadvector/pr114194-rv64.c: ...here.
* gcc.target/riscv/rvv/xtheadvector/pr114194-rv32.c: New test.
* gcc.target/riscv/rvv/xtheadvector/pr118601.c: New test.

Reported-by: Edwin Lu 
---
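
The hunks below all change the same check-then-force pattern: keep a constant length as an immediate if the constraint accepts it, otherwise move it into a register. A minimal standalone model of that pattern follows. Everything named `demo_*` is hypothetical; `demo_is_uimm5` models `K` as a 5-bit unsigned immediate, and `demo_no_immediate` is only a conservative placeholder for a stricter rule, since the exact definition of `vl` is not shown in this message.

```cpp
#include <functional>

// Hypothetical model of a vector-length operand: either an immediate
// or a value that has been forced into a register.
struct demo_length
{
  bool in_register;
  long value;
};

// Models a 5-bit unsigned immediate check (0..31).
inline bool
demo_is_uimm5 (long v)
{
  return v >= 0 && v <= 31;
}

// Placeholder for a stricter constraint: accept no immediate at all.
// This is an assumption for illustration, not the real 'vl' rule.
inline bool
demo_no_immediate (long)
{
  return false;
}

// Keep VALUE as an immediate only if FITS_IMMEDIATE accepts it;
// otherwise model force_reg by marking it as living in a register.
inline demo_length
demo_legitimize_length (long value,
			const std::function<bool (long)> &fits_immediate)
{
  if (fits_immediate (value))
    return {false, value};
  return {true, value};
}
```

With the uimm5 predicate a length of 16 stays an immediate, while the stricter placeholder pushes the same length into a register, which is the effect the patch is after for targets that cannot encode the immediate form.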
 gcc/config/riscv/riscv-string.cc  |  6 +--
 gcc/config/riscv/riscv-v.cc   | 10 ++--
 .../riscv/rvv/xtheadvector/pr114194-rv32.c| 51 +++
 .../{pr114194.c => pr114194-rv64.c}   |  5 +-
 .../riscv/rvv/xtheadvector/pr118601.c | 18 +++
 5 files changed, 79 insertions(+), 11 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194-rv32.c
 rename gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/{pr114194.c => 
pr114194-rv64.c} (80%)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr118601.c

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 97e20bdb002..408eb07e87f 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -1275,7 +1275,7 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in, 
bool movmem_p)
   machine_mode mask_mode = riscv_vector::get_vector_mode
(BImode, GET_MODE_NUNITS (info.vmode)).require ();
   rtx mask =  CONSTM1_RTX (mask_mode);
-  if (!satisfies_constraint_K (cnt))
+  if (!satisfies_constraint_vl (cnt))
cnt= force_reg (Pmode, cnt);
   rtx m_ops[] = {vec, mask, src};
   emit_nonvlmax_insn (code_for_pred_mov (info.vmode),
@@ -1626,7 +1626,7 @@ expand_vec_setmem (rtx dst_in, rtx length_in, rtx 
fill_value_in)
 }
   else
 {
-  if (!satisfies_constraint_K (info.avl))
+  if (!satisfies_constraint_vl (info.avl))
info.avl = force_reg (Pmode, info.avl);
   emit_nonvlmax_insn (code_for_pred_broadcast (info.vmode),
  riscv_vector::UNARY_OP, broadcast_ops, info.avl);
@@ -1694,7 +1694,7 @@ expand_vec_cmpmem (rtx result_out, rtx blk_a_in, rtx 
blk_b_in, rtx length_in)
 }
   else
 {
-  if (!satisfies_constraint_K (length_in))
+  if (!satisfies_constraint_vl (length_in))
  length_in = force_reg (Pmode, length_in);
 
   rtx memmask = CONSTM1_RTX (mask_mode);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 9847439ca77..62456c7ef79 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -2103,7 +2103,7 @@ get_unknown_min_value (machine_mode mode)
 static rtx
 force_vector_length_operand (rtx vl)
 {
-  if (CONST_INT_P (vl) && !satisfies_constraint_K (vl))
+  if (CONST_INT_P (vl) && !satisfies_constraint_vl (vl))
 return force_reg (Pmode, vl);
   return vl;
 }
@@ -4130,7 +4130,7 @@ expand_load_store (rtx *ops, bool is_load)
 }
   else
 {
-  if (!satisfies_constraint_K (len))
+  if (!satisfies_constraint_vl (len))
len = force_reg (Pmode, len);
   if (is_load)
{
@@ -4165,7 +4165,7 @@ expand_strided_load (machine_mode mode, rtx *ops)
 emit_vlmax_insn (icode, BINARY_OP_TAMA, emit_ops);
   else
 {
-  len = satisfies_constraint_K (len) ? len : force_reg (Pmode, len);
+  len = satisfies_constraint_vl (len) ? len : force_reg (Pmode, len);
   emit_nonvlmax_insn (icode, BINARY_OP_TAMA, emit_ops, len);
 }
 }
@@ -4191,7 +4191,7 @@ expand_strided_store (machine_mode mode, rtx *ops)
 }
   else
 {
-  len = satisfies_constraint_K (len) ? len : force_reg (Pmode, len);
+  len = satisfies_constraint_vl (len) ? len : force_reg (Pmode, len);
   vl_type = get_avl_type_rtx (NONVLMAX);
 }
 
@@ -4642,7 +4642,7 @@ expand_lanes_load_store (rtx *ops, bool is_load)
 }
   else
 {
-  if (!satisfies_constraint_K (len))
+  if (!satisfies_constraint_vl (len))
len = force_reg (Pmode, len);
   if (is_load)
{
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194-rv32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194-rv32.c
new file mode 100644
index 000..f95e713ea24
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194-rv32.c
@@ -0,0 +1,51 @@
+/* { dg-do compile { target { { ! riscv_abi_e } && rv32 } } } */
+/* { dg-options "-march=rv3

Re: [PATCH v3 2/5] c++/modules: Ignore TU-local entities where necessary

2025-02-11 Thread Nathaniel Shead

On Mon, Jan 27, 2025 at 10:20:05AM -0500, Patrick Palka wrote:
> [snip]
>
> > @@ -18486,6 +18562,12 @@ dependent_operand_p (tree t)
> >  {
> >while (TREE_CODE (t) == IMPLICIT_CONV_EXPR)
> >  t = TREE_OPERAND (t, 0);
> > +
> > +  /* If we contain a TU_LOCAL_ENTITY assume we're non-dependent; we'll 
> > error
> > + later when instantiating.  */
> > +  if (expr_contains_tu_local_entity (t))
> > +return false;
> 
> I think it'd be more robust and cheaper (avoiding a separate tree walk)
> to teach the general constexpr/dependence predicates about
> TU_LOCAL_ENTITY instead of handling it only here.
> 
> > +
> >++processing_template_decl;
> >bool r = (potential_constant_expression (t)
> > ? value_dependent_expression_p (t)
> > @@ -20255,6 +20337,9 @@ tsubst_expr (tree t, tree args, tsubst_flags_t 
> > complain, tree in_decl)
> > else
> >   object = NULL_TREE;
> >  
> > +   if (function_contains_tu_local_entity (templ))
> > + RETURN (error_mark_node);
> > +
> > tree tid = lookup_template_function (templ, targs);
> > protected_set_expr_location (tid, EXPR_LOCATION (t));
> >  
> > @@ -20947,6 +21032,9 @@ tsubst_expr (tree t, tree args, tsubst_flags_t 
> > complain, tree in_decl)
> >   qualified_p = true;
> >   }
> >  
> > +   if (function_contains_tu_local_entity (function))
> > + RETURN (error_mark_node);
> 
> Similarly, maybe it'd suffice to check this more generally in the
> OVERLOAD case of tsubst_expr?
> 

So I'd completely missed the idea of handling it in the OVERLOAD case;
doing this also fixes the issues I'd been having trying to handle it in
potential_constant_expression.  I think this should be a lot cleaner
now.
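
As a rough illustration of the idea (not the actual hunk, which is cut off below), handling this at the overload-set level amounts to one scan of the set for a TU-local member. The enum `demo_tree_code` and the function `demo_overload_has_tu_local` are hypothetical names for a toy model; the real code works on `tree` nodes.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, simplified stand-in for tree codes in an overload set.
enum class demo_tree_code
{
  function_decl,
  template_decl,
  tu_local_entity
};

// Return true if any member of OVERLOADS is a TU-local entity, in which
// case a caller would diagnose and give up on the substitution.
inline bool
demo_overload_has_tu_local (const std::vector<demo_tree_code> &overloads)
{
  return std::any_of (overloads.begin (), overloads.end (),
		      [] (demo_tree_code c)
		      { return c == demo_tree_code::tu_local_entity; });
}
```

Doing the check where the overload set is substituted means every caller that reaches that case is covered, without a separate tree walk at each call site.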

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

Subject: [PATCH] c++: Handle TU_LOCAL_ENTITY in tsubst_expr and
 potential_constant_expression

This cleans up the TU_LOCAL_ENTITY handling to avoid unnecessary
tree walks and make the logic more robust.

gcc/cp/ChangeLog:

* constexpr.cc (potential_constant_expression_1): Handle
TU_LOCAL_ENTITY.
* pt.cc (expr_contains_tu_local_entity): Remove.
(function_contains_tu_local_entity): Remove.
(dependent_operand_p): Remove special handling for
TU_LOCAL_ENTITY.
(tsubst_expr): Handle TU_LOCAL_ENTITY when tsubsting OVERLOADs;
remove now-unnecessary extra handling.
(type_dependent_expression_p): Handle TU_LOCAL_ENTITY.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/constexpr.cc |  5 +++
 gcc/cp/pt.cc| 80 -
 2 files changed, 19 insertions(+), 66 deletions(-)

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index f142dd32bc8..b36705fd4ce 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -10825,6 +10825,11 @@ potential_constant_expression_1 (tree t, bool 
want_rval, bool strict, bool now,
 case CO_RETURN_EXPR:
   return false;
 
+/* Assume a TU-local entity is not constant, we'll error later when
+   instantiating.  */
+case TU_LOCAL_ENTITY:
+  return false;
+
 case NONTYPE_ARGUMENT_PACK:
   {
tree args = ARGUMENT_PACK_ARGS (t);
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index f857b3f1180..966050a6608 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -9935,61 +9935,6 @@ complain_about_tu_local_entity (tree e)
   inform (TU_LOCAL_ENTITY_LOCATION (e), "declared here");
 }
 
-/* Checks if T contains a TU-local entity.  */
-
-static bool
-expr_contains_tu_local_entity (tree t)
-{
-  if (!modules_p ())
-return false;
-
-  auto walker = [](tree *tp, int *walk_subtrees, void *) -> tree
-{
-  if (TREE_CODE (*tp) == TU_LOCAL_ENTITY)
-   return *tp;
-  if (!EXPR_P (*tp))
-   *walk_subtrees = false;
-  return NULL_TREE;
-};
-  return cp_walk_tree (&t, walker, nullptr, nullptr);
-}
-
-/* Errors and returns TRUE if X is a function that contains a TU-local
-   entity in its overload set.  */
-
-static bool
-function_contains_tu_local_entity (tree x)
-{
-  if (!modules_p ())
-return false;
-
-  if (!x || x == error_mark_node)
-return false;
-
-  if (TREE_CODE (x) == OFFSET_REF
-  || TREE_CODE (x) == COMPONENT_REF)
-x = TREE_OPERAND (x, 1);
-  x = MAYBE_BASELINK_FUNCTIONS (x);
-  if (TREE_CODE (x) == TEMPLATE_ID_EXPR)
-x = TREE_OPERAND (x, 0);
-
-  if (OVL_P (x))
-for (tree ovl : lkp_range (x))
-  if (TREE_CODE (ovl) == TU_LOCAL_ENTITY)
-   {
- x = ovl;
- break;
-   }
-
-  if (TREE_CODE (x) == TU_LOCAL_ENTITY)
-{
-  complain_about_tu_local_entity (x);
-  return true;
-}
-
-  return false;
-}
-
 /* Return a TEMPLATE_ID_EXPR corresponding to the indicated FNS and
ARGLIST.  Valid choices for FNS are given in the cp-tree.def
documentation for TEMPLATE_ID_EXPR.  */
@@ -18797,11 +18742,6 @@ dependent_operand_p (tree t)
   while (TREE_CODE (t) == IMPLICIT_CONV_EXPR)
 t = TREE_OPERAND (t, 0);
 
-  /* If we

Re: [PATCH v2] RISC-V: unrecognizable insn ICE in xtheadvector/pr114194.c on 32bit targets

2025-02-11 Thread Kito Cheng

LGTM, that seems like the right way to fix it :)

On Tue, Feb 11, 2025 at 21:45, Jin Ma wrote:

> This is a follow-up to the patch below to avoid generating unrecognized
> vsetivl instructions for XTheadVector.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2025-January/674185.html
>
> PR target/118601
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-string.cc (expand_block_move): Check with new
> constraint 'vl' instead of 'K'.
> (expand_vec_setmem): Likewise.
> (expand_vec_cmpmem): Likewise.
> * config/riscv/riscv-v.cc (force_vector_length_operand): Likewise.
> (expand_load_store): Likewise.
> (expand_strided_load): Likewise.
> (expand_strided_store): Likewise.
> (expand_lanes_load_store): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/xtheadvector/pr114194.c: Move to...
> * gcc.target/riscv/rvv/xtheadvector/pr114194-rv64.c: ...here.
> * gcc.target/riscv/rvv/xtheadvector/pr114194-rv32.c: New test.
> * gcc.target/riscv/rvv/xtheadvector/pr118601.c: New test.
>
> Reported-by: Edwin Lu 
> ---
>  gcc/config/riscv/riscv-string.cc  |  6 +--
>  gcc/config/riscv/riscv-v.cc   | 10 ++--
>  .../riscv/rvv/xtheadvector/pr114194-rv32.c| 51 +++
>  .../{pr114194.c => pr114194-rv64.c}   |  5 +-
>  .../riscv/rvv/xtheadvector/pr118601.c | 18 +++
>  5 files changed, 79 insertions(+), 11 deletions(-)
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194-rv32.c
>  rename gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/{pr114194.c =>
> pr114194-rv64.c} (80%)
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr118601.c
>
> diff --git a/gcc/config/riscv/riscv-string.cc
> b/gcc/config/riscv/riscv-string.cc
> index 97e20bdb002..408eb07e87f 100644
> --- a/gcc/config/riscv/riscv-string.cc
> +++ b/gcc/config/riscv/riscv-string.cc
> @@ -1275,7 +1275,7 @@ expand_block_move (rtx dst_in, rtx src_in, rtx
> length_in, bool movmem_p)
>machine_mode mask_mode = riscv_vector::get_vector_mode
> (BImode, GET_MODE_NUNITS (info.vmode)).require ();
>rtx mask =  CONSTM1_RTX (mask_mode);
> -  if (!satisfies_constraint_K (cnt))
> +  if (!satisfies_constraint_vl (cnt))
> cnt= force_reg (Pmode, cnt);
>rtx m_ops[] = {vec, mask, src};
>emit_nonvlmax_insn (code_for_pred_mov (info.vmode),
> @@ -1626,7 +1626,7 @@ expand_vec_setmem (rtx dst_in, rtx length_in, rtx
> fill_value_in)
>  }
>else
>  {
> -  if (!satisfies_constraint_K (info.avl))
> +  if (!satisfies_constraint_vl (info.avl))
> info.avl = force_reg (Pmode, info.avl);
>emit_nonvlmax_insn (code_for_pred_broadcast (info.vmode),
>   riscv_vector::UNARY_OP, broadcast_ops, info.avl);
> @@ -1694,7 +1694,7 @@ expand_vec_cmpmem (rtx result_out, rtx blk_a_in, rtx
> blk_b_in, rtx length_in)
>  }
>else
>  {
> -  if (!satisfies_constraint_K (length_in))
> +  if (!satisfies_constraint_vl (length_in))
>   length_in = force_reg (Pmode, length_in);
>
>rtx memmask = CONSTM1_RTX (mask_mode);
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index 9847439ca77..62456c7ef79 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -2103,7 +2103,7 @@ get_unknown_min_value (machine_mode mode)
>  static rtx
>  force_vector_length_operand (rtx vl)
>  {
> -  if (CONST_INT_P (vl) && !satisfies_constraint_K (vl))
> +  if (CONST_INT_P (vl) && !satisfies_constraint_vl (vl))
>  return force_reg (Pmode, vl);
>return vl;
>  }
> @@ -4130,7 +4130,7 @@ expand_load_store (rtx *ops, bool is_load)
>  }
>else
>  {
> -  if (!satisfies_constraint_K (len))
> +  if (!satisfies_constraint_vl (len))
> len = force_reg (Pmode, len);
>if (is_load)
> {
> @@ -4165,7 +4165,7 @@ expand_strided_load (machine_mode mode, rtx *ops)
>  emit_vlmax_insn (icode, BINARY_OP_TAMA, emit_ops);
>else
>  {
> -  len = satisfies_constraint_K (len) ? len : force_reg (Pmode, len);
> +  len = satisfies_constraint_vl (len) ? len : force_reg (Pmode, len);
>emit_nonvlmax_insn (icode, BINARY_OP_TAMA, emit_ops, len);
>  }
>  }
> @@ -4191,7 +4191,7 @@ expand_strided_store (machine_mode mode, rtx *ops)
>  }
>else
>  {
> -  len = satisfies_constraint_K (len) ? len : force_reg (Pmode, len);
> +  len = satisfies_constraint_vl (len) ? len : force_reg (Pmode, len);
>vl_type = get_avl_type_rtx (NONVLMAX);
>  }
>
> @@ -4642,7 +4642,7 @@ expand_lanes_load_store (rtx *ops, bool is_load)
>  }
>else
>  {
> -  if (!satisfies_constraint_K (len))
> +  if (!satisfies_constraint_vl (len))
> len = force_reg (Pmode, len);
>if (is_load)
> {
> diff --git
> a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194-rv32.c
> b/

Re: [PATCH] RISC-V: unrecognizable insn ICE in xtheadvector/pr114194.c on 32bit targets

2025-02-11 Thread Craig Blackmore


On 10/02/2025 08:37, Jin Ma wrote:

> On Sun, 09 Feb 2025 14:04:00 +0800, Jin Ma wrote:
>> PR target/118601
>
> Ok for trunk?
>
> Best regards,
> Jin Ma
>
>> gcc/ChangeLog:
>>
>> * config/riscv/riscv.cc (riscv_use_by_pieces_infrastructure_p):
>> Exclude XTheadVector.
>>
>> Reported-by: Edwin Lu 
>> ---
>>   gcc/config/riscv/riscv.cc | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> index 819e1538741..e5776aa0fbe 100644
>> --- a/gcc/config/riscv/riscv.cc
>> +++ b/gcc/config/riscv/riscv.cc
>> @@ -13826,7 +13826,7 @@ riscv_use_by_pieces_infrastructure_p (unsigned 
>> HOST_WIDE_INT size,
>> /* For set/clear with size > UNITS_PER_WORD, by pieces uses vector 
>> broadcasts
>>    with UNITS_PER_WORD size pieces.  Use setmem instead which can use
>>    bigger chunks.  */
>> -  if (TARGET_VECTOR && stringop_strategy & STRATEGY_VECTOR
>> +  if (TARGET_VECTOR && !TARGET_XTHEADVECTOR && stringop_strategy & 
>> STRATEGY_VECTOR
>> && (op == CLEAR_BY_PIECES || op == SET_BY_PIECES)
>> && speed_p && size > UNITS_PER_WORD)
>>   return false;

`riscv_vector::expand_vec_setmem` generates the unrecognizable
instruction and your patch avoids calling it in some, but not all,
cases. Here is a case that still ICEs with `-march=rv32gc_xtheadvector
-mabi=ilp32d` and `-march=rv64gc_xtheadvector -mabi=lp64d` after
applying your patch:
```
void foo1_16 (void *p)
{
  __builtin_memset (p, 1, 16);
}
```
I suggest returning `false` in `riscv_vector::expand_vec_setmem` for
`TARGET_XTHEADVECTOR` or trying to generate something that is valid for
`TARGET_XTHEADVECTOR`. If you do bail out of
`riscv_vector::expand_vec_setmem` then you probably want to keep your
existing change too so that by pieces is still used for smaller lengths
rather than a libcall.

>> --
>> 2.25.1

[PATCH] tree-optimization/118817 - missed folding of PRE inserted code

2025-02-11 Thread Richard Biener

When PRE inserts code, that code is not fully folded by following SSA
edges.  This can cause missed optimizations because the next fully
folding pass runs much later, after strlen, which in the PR's case leads
to diagnostics being emitted on dead code.

The following mitigates the missed expression canonicalization that
happens during PHI translation, where the to-be-inserted expressions are
calculated.  It is largely a refactoring that eliminates the single
use of fully_constant_expression and otherwise leverages the
work already done by vn_nary_simplify by updating the NARY with
the simplified expression.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/118817
* tree-ssa-pre.cc (fully_constant_expression): Fold into
the single caller.
(phi_translate_1): Refactor folded in fully_constant_expression.
* tree-ssa-sccvn.cc (vn_nary_simplify): Update the NARY with
the simplified expression.

* g++.dg/lto/pr118817_0.C: New testcase.
---
 gcc/testsuite/g++.dg/lto/pr118817_0.C |  17 
 gcc/tree-ssa-pre.cc   | 111 +-
 gcc/tree-ssa-sccvn.cc |  13 ++-
 3 files changed, 65 insertions(+), 76 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/lto/pr118817_0.C

diff --git a/gcc/testsuite/g++.dg/lto/pr118817_0.C 
b/gcc/testsuite/g++.dg/lto/pr118817_0.C
new file mode 100644
index 000..ae65f34504e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lto/pr118817_0.C
@@ -0,0 +1,17 @@
+// { dg-lto-do link }
+// { dg-lto-options { { -O3 -fPIC -flto -shared -std=c++20 -Wall } } }
+// { dg-require-effective-target fpic }
+// { dg-require-effective-target shared }
+
+#include <memory>
+#include <vector>
+#include <string>
+
+int func()
+{
+  auto strVec = std::make_unique<std::vector<std::string>>();
+  strVec->emplace_back("One");
+  strVec->emplace_back("Two");
+  strVec->emplace_back("Three");
+  return 0;
+}
diff --git a/gcc/tree-ssa-pre.cc b/gcc/tree-ssa-pre.cc
index 735893bb191..ecf45d29e76 100644
--- a/gcc/tree-ssa-pre.cc
+++ b/gcc/tree-ssa-pre.cc
@@ -1185,41 +1185,6 @@ get_or_alloc_expr_for_constant (tree constant)
   return newexpr;
 }
 
-/* Return the folded version of T if T, when folded, is a gimple
-   min_invariant or an SSA name.  Otherwise, return T.  */
-
-static pre_expr
-fully_constant_expression (pre_expr e)
-{
-  switch (e->kind)
-{
-case CONSTANT:
-  return e;
-case NARY:
-  {
-   vn_nary_op_t nary = PRE_EXPR_NARY (e);
-   tree res = vn_nary_simplify (nary);
-   if (!res)
- return e;
-   if (is_gimple_min_invariant (res))
- return get_or_alloc_expr_for_constant (res);
-   if (TREE_CODE (res) == SSA_NAME)
- return get_or_alloc_expr_for_name (res);
-   return e;
-  }
-case REFERENCE:
-  {
-   vn_reference_t ref = PRE_EXPR_REFERENCE (e);
-   tree folded;
-   if ((folded = fully_constant_vn_reference_p (ref)))
- return get_or_alloc_expr_for_constant (folded);
-   return e;
-  }
-default:
-  return e;
-}
-}
-
 /* Translate the VUSE backwards through phi nodes in E->dest, so that
it has the value it would have in E->src.  Set *SAME_VALID to true
in case the new vuse doesn't change the value id of the OPERANDS.  */
@@ -1443,57 +1408,55 @@ phi_translate_1 (bitmap_set_t dest,
  }
if (changed)
  {
-   pre_expr constant;
unsigned int new_val_id;
 
-   PRE_EXPR_NARY (expr) = newnary;
-   constant = fully_constant_expression (expr);
-   PRE_EXPR_NARY (expr) = nary;
-   if (constant != expr)
+   /* Try to simplify the new NARY.  */
+   tree res = vn_nary_simplify (newnary);
+   if (res)
  {
+   if (is_gimple_min_invariant (res))
+ return get_or_alloc_expr_for_constant (res);
+
/* For non-CONSTANTs we have to make sure we can eventually
   insert the expression.  Which means we need to have a
   leader for it.  */
-   if (constant->kind != CONSTANT)
+   gcc_assert (TREE_CODE (res) == SSA_NAME);
+
+   /* Do not allow simplifications to non-constants over
+  backedges as this will likely result in a loop PHI node
+  to be inserted and increased register pressure.
+  See PR77498 - this avoids doing predcoms work in
+  a less efficient way.  */
+   if (e->flags & EDGE_DFS_BACK)
+ ;
+   else
  {
-   /* Do not allow simplifications to non-constants over
-  backedges as this will likely result in a loop PHI node
-  to be inserted and increased register pressure.
-  See PR77498 - this avoids doing predcoms work in
-  a less efficient way.  */
-   if (e->flags & EDGE_DFS_BACK)
-

Re: [PATCH 2/3] LoongArch: Split the function loongarch_cpu_cpp_builtins into two functions.

2025-02-11 Thread Xi Ruoyao

On Tue, 2025-02-11 at 20:49 +0800, Lulu Cheng wrote:
> Split the implementation of the function loongarch_cpu_cpp_builtins
> into two parts:
>   1. Macro definitions that do not change (only considering 64-bit
> architecture)
>   2. Macro definitions that change with different compilation options.
> 
> gcc/ChangeLog:
> 
>   * config/loongarch/loongarch-c.cc (builtin_undef): New macro.
>   (loongarch_cpu_cpp_builtins): Split to
> loongarch_update_cpp_builtins
>   and loongarch_define_unconditional_macros.
>   (loongarch_def_or_undef): New functions.
>   (loongarch_define_unconditional_macros): Likewise.
>   (loongarch_update_cpp_builtins): Likewise.
> 
> Change-Id: Ifae73ffa2a07a595ed2a7f6ab7b82d8f51328a2a
> ---
>  gcc/config/loongarch/loongarch-c.cc | 109 +--
> -
>  1 file changed, 66 insertions(+), 43 deletions(-)
> 
> diff --git a/gcc/config/loongarch/loongarch-c.cc
> b/gcc/config/loongarch/loongarch-c.cc
> index 5d8c02e094b..9fe911325ab 100644
> --- a/gcc/config/loongarch/loongarch-c.cc
> +++ b/gcc/config/loongarch/loongarch-c.cc
> @@ -31,13 +31,21 @@ along with GCC; see the file COPYING3.  If not see
>  
>  #define preprocessing_asm_p() (cpp_get_options (pfile)->lang ==
> CLK_ASM)
>  #define builtin_define(TXT) cpp_define (pfile, TXT)
> +#define builtin_undef(TXT) cpp_undef (pfile, TXT)
>  #define builtin_assert(TXT) cpp_assert (pfile, TXT)
>  
> -void
> -loongarch_cpu_cpp_builtins (cpp_reader *pfile)
> +static void
> +loongarch_def_or_undef (bool def_p, const char *macro, cpp_reader
> *pfile)
> +{
> +  if (def_p)
> +    cpp_define (pfile, macro);
> +  else
> +    cpp_undef (pfile, macro);
> +}
> +
> +static void
> +loongarch_define_unconditional_macros (cpp_reader *pfile)
>  {
> -  builtin_assert ("machine=loongarch");
> -  builtin_assert ("cpu=loongarch");
>    builtin_define ("__loongarch__");
>  
>    builtin_define_with_value ("__loongarch_arch",
> @@ -66,45 +74,6 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile)
>    builtin_define ("__loongarch_lp64");
>  }
>  
> -  /* These defines reflect the ABI in use, not whether the
> - FPU is directly accessible.  */
> -  if (TARGET_DOUBLE_FLOAT_ABI)
> -    builtin_define ("__loongarch_double_float=1");
> -  else if (TARGET_SINGLE_FLOAT_ABI)
> -    builtin_define ("__loongarch_single_float=1");
> -
> -  if (TARGET_DOUBLE_FLOAT_ABI || TARGET_SINGLE_FLOAT_ABI)
> -    builtin_define ("__loongarch_hard_float=1");
> -  else
> -    builtin_define ("__loongarch_soft_float=1");
> -
> -
> -  /* ISA Extensions.  */
> -  if (TARGET_DOUBLE_FLOAT)
> -    builtin_define ("__loongarch_frlen=64");
> -  else if (TARGET_SINGLE_FLOAT)
> -    builtin_define ("__loongarch_frlen=32");
> -  else
> -    builtin_define ("__loongarch_frlen=0");
> -
> -  if (TARGET_HARD_FLOAT && ISA_HAS_FRECIPE)
> -    builtin_define ("__loongarch_frecipe");
> -
> -  if (ISA_HAS_LSX)
> -    {
> -  builtin_define ("__loongarch_simd");
> -  builtin_define ("__loongarch_sx");
> -
> -  if (!ISA_HAS_LASX)
> - builtin_define ("__loongarch_simd_width=128");
> -    }
> -
> -  if (ISA_HAS_LASX)
> -    {
> -  builtin_define ("__loongarch_asx");
> -  builtin_define ("__loongarch_simd_width=256");
> -    }
> -
>    /* ISA evolution features */
>    int max_v_major = 1, max_v_minor = 0;

I guess the handling for la_evo_macro_name macros (like
__loongarch_div32) and
__loongarch_version_major/__loongarch_version_minor should be moved as
well?  Things like #pragma GCC target("arch=la664") may affect them.
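
To illustrate why option-dependent macros need the define-or-undefine treatment, here is a small standalone model. The types `demo_macro_set` and `demo_target` and the functions `demo_def_or_undef` and `demo_update_builtins` are hypothetical; a `std::set` stands in for the preprocessor state, and only the two macro names come from the quoted patch.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for the preprocessor's set of defined macros.
using demo_macro_set = std::set<std::string>;

// Mirrors the shape of the quoted helper: define MACRO when DEF_P is
// true, otherwise make sure it is not defined.
inline void
demo_def_or_undef (bool def_p, const std::string &macro,
		   demo_macro_set &macros)
{
  if (def_p)
    macros.insert (macro);
  else
    macros.erase (macro);
}

// Hypothetical target description; the flags can change between calls,
// for example after a target pragma.
struct demo_target
{
  bool has_lsx;
  bool has_lasx;
};

// Re-evaluate every option-dependent macro against the current target.
inline void
demo_update_builtins (const demo_target &target, demo_macro_set &macros)
{
  demo_def_or_undef (target.has_lsx, "__loongarch_sx", macros);
  demo_def_or_undef (target.has_lasx, "__loongarch_asx", macros);
}
```

Calling the update function again after the target changes removes macros that no longer apply, which a define-only implementation would leave behind. Any macro that a pragma can influence would have to go through the same path.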

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] RISC-V: unrecognizable insn ICE in xtheadvector/pr114194.c on 32bit targets

2025-02-11 Thread Jin Ma

On Tue, 11 Feb 2025 20:29:03 +0800, Craig Blackmore wrote:
> On 10/02/2025 08:37, Jin Ma wrote:
> > On Sun, 09 Feb 2025 14:04:00 +0800, Jin Ma wrote:
> >>PR target/118601
> > 
> > Ok for trunk?
> > 
> > Best regards,
> > Jin Ma
> > 
> >> gcc/ChangeLog:
> >>
> >>* config/riscv/riscv.cc (riscv_use_by_pieces_infrastructure_p):
> >>Exclude XTheadVector.
> >>
> >> Reported-by: Edwin Lu 
> >> ---
> >>   gcc/config/riscv/riscv.cc | 2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> >> index 819e1538741..e5776aa0fbe 100644
> >> --- a/gcc/config/riscv/riscv.cc
> >> +++ b/gcc/config/riscv/riscv.cc
> >> @@ -13826,7 +13826,7 @@ riscv_use_by_pieces_infrastructure_p (unsigned 
> >> HOST_WIDE_INT size,
> >> /* For set/clear with size > UNITS_PER_WORD, by pieces uses vector 
> >> broadcasts
> >>with UNITS_PER_WORD size pieces.  Use setmem instead which 
> >> can use
> >>bigger chunks.  */
> >> -  if (TARGET_VECTOR && stringop_strategy & STRATEGY_VECTOR
> >> +  if (TARGET_VECTOR && !TARGET_XTHEADVECTOR && stringop_strategy & 
> >> STRATEGY_VECTOR
> >> && (op == CLEAR_BY_PIECES || op == SET_BY_PIECES)
> >> && speed_p && size > UNITS_PER_WORD)
> >>   return false;
> 
> `riscv_vector::expand_vec_setmem` generates the unrecognizable
> instruction and your patch avoids calling it in some, but not all,
> cases. Here is a case that still ICEs with `-march=rv32gc_xtheadvector
> -mabi=ilp32d` and `-march=rv64gc_xtheadvector -mabi=lp64d` after
> applying your patch:
> ```
> void foo1_16 (void *p)
> {
>__builtin_memset (p, 1, 16);
> }
> ```
> I suggest returning `false` in `riscv_vector::expand_vec_setmem` for
> `TARGET_XTHEADVECTOR` or trying to generate something that is valid for
> `TARGET_XTHEADVECTOR`. If you do bail out of
> `riscv_vector::expand_vec_setmem` then you probably want to keep your
> existing change too so that by pieces is still used for smaller lengths
> rather than a libcall.

Thank you very much for your professional reply. I thought this problem was
very simple, and I misjudged it. I will rethink this.

Best regards,
Jin Ma

> >> -- 
> >> 2.25.1
> >

Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103]

2025-02-11 Thread Richard Sandiford

Jeff Law  writes:
> On 2/7/25 5:59 AM, Andrew Waterman wrote:
>> This patch runs counter to the ABI spec, which states that vxrm is not
>> preserved across calls and is volatile upon function entry [1].  vxrm
>> does not play the same role as frm plays in the calling convention.
>> (I won't get into the rationale in this email, but the rationale isn't
>> especially important: we should follow the ABI.)
>> 
>> [1] 
>> https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/3a79e936eec5491078b1133ac943f91ef5fd75fd/riscv-cc.adoc?plain=1#L119-L120
> Pan's patch doesn't change the basic property that VXRM has no known 
> state at function entry or upon return from a function call.

I think it will.  global_regs[X] means that X is defined on entry,
defined on exit, and can be changed by calls.  If the register is
call-clobbered/volatile/caller-saved, then I agree with Andrew that
this doesn't look like the right fix.

Thanks,
Richard

Re: [PATCH] aarch64: Update fp8 dependencies

2025-02-11 Thread Richard Sandiford

Andrew Carlotti  writes:
> We agreed with the LLVM developers not to enforce the architectural
> dependencies between fp8 multiplication features, and they have already
> been removed from LLVM and Binutils.  Remove them from GCC as well.
>
>
>
> I have bootstrapped and regression tested this.  There are no test
> result changes between GCC+Binutils with old feature dependencies and
> GCC+Binutils with new feature dependencies, and some improvements
> compared to old GCC with new Binutils.
>
> Ok for master?
>
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-option-extensions.def
>   (SSVE_FP8FMA): Adjust formatting.
>   (FP8DOT4): Replace FP8FMA dependency with FP8.
>   (SSVE_FP8DOT4): Replace SSVE_FP8FMA dependency with SME2+FP8.
>   (FP8DOT2): Replace FP8DOT4 dependency with FP8.
>   (SSVE_FP8DOT2): Replace SSVE_FP8DOT4 dependency with SME2+FP8.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/pragma_cpp_predefs_4.c: Adjust expected
>   defines.
>   * gcc.target/aarch64/simd/vmla_lane_indices_1.c: Modify target
>   pragmas.
>   * gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c:
>   Ditto.
>   * 
> gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c:
>   Ditto.
>   * gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c: Ditto.
>   * gcc.target/aarch64/sve2/acle/asm/dot_mf8.c: Ditto.

OK, thanks.

Richard

> diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
> b/gcc/config/aarch64/aarch64-option-extensions.def
> index 
> cc42bd518dca5e4b947c81f06e543133b4f25440..aa8d315c240fbd25b49008b131cc09f04001eb80
>  100644
> --- a/gcc/config/aarch64/aarch64-option-extensions.def
> +++ b/gcc/config/aarch64/aarch64-option-extensions.def
> @@ -261,17 +261,17 @@ AARCH64_OPT_EXTENSION("fp8", FP8, (SIMD), (), (), 
> "f8cvt")
>  
>  AARCH64_OPT_EXTENSION("fp8fma", FP8FMA, (FP8), (), (), "f8fma")
>  
> -AARCH64_OPT_EXTENSION("ssve-fp8fma", SSVE_FP8FMA, (SME2,FP8), (), (), 
> "smesf8fma")
> +AARCH64_OPT_EXTENSION("ssve-fp8fma", SSVE_FP8FMA, (SME2, FP8), (), (), 
> "smesf8fma")
>  
>  AARCH64_OPT_EXTENSION("faminmax", FAMINMAX, (SIMD), (), (), "faminmax")
>  
> -AARCH64_OPT_EXTENSION("fp8dot4", FP8DOT4, (FP8FMA), (), (), "f8dp4")
> +AARCH64_OPT_EXTENSION("fp8dot4", FP8DOT4, (FP8), (), (), "f8dp4")
>  
> -AARCH64_OPT_EXTENSION("ssve-fp8dot4", SSVE_FP8DOT4, (SSVE_FP8FMA), (), (), 
> "smesf8dp4")
> +AARCH64_OPT_EXTENSION("ssve-fp8dot4", SSVE_FP8DOT4, (SME2, FP8), (), (), 
> "smesf8dp4")
>  
> -AARCH64_OPT_EXTENSION("fp8dot2", FP8DOT2, (FP8DOT4), (), (), "f8dp2")
> +AARCH64_OPT_EXTENSION("fp8dot2", FP8DOT2, (FP8), (), (), "f8dp2")
>  
> -AARCH64_OPT_EXTENSION("ssve-fp8dot2", SSVE_FP8DOT2, (SSVE_FP8DOT4), (), (), 
> "smesf8dp2")
> +AARCH64_OPT_EXTENSION("ssve-fp8dot2", SSVE_FP8DOT2, (SME2, FP8), (), (), 
> "smesf8dp2")
>  
>  AARCH64_OPT_EXTENSION("lut", LUT, (SIMD), (), (), "lut")
>  
> diff --git a/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_4.c 
> b/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_4.c
> index 
> 0dcfbec05bad5f446c9f169051c9b86b9844946d..97d68b94512e1ffdd5ceb484a6378b3a1ec9d115
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_4.c
> +++ b/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_4.c
> @@ -292,7 +292,7 @@
>  #ifndef __ARM_FEATURE_FP8
>  #error Foo
>  #endif
> -#ifndef __ARM_FEATURE_FP8FMA
> +#ifdef __ARM_FEATURE_FP8FMA
>  #error Foo
>  #endif
>  #ifndef __ARM_FEATURE_FP8DOT4
> @@ -306,10 +306,10 @@
>  #ifndef __ARM_FEATURE_FP8
>  #error Foo
>  #endif
> -#ifndef __ARM_FEATURE_FP8FMA
> +#ifdef __ARM_FEATURE_FP8FMA
>  #error Foo
>  #endif
> -#ifndef __ARM_FEATURE_FP8DOT4
> +#ifdef __ARM_FEATURE_FP8DOT4
>  #error Foo
>  #endif
>  #ifndef __ARM_FEATURE_FP8DOT2
> diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vmla_lane_indices_1.c 
> b/gcc/testsuite/gcc.target/aarch64/simd/vmla_lane_indices_1.c
> index 
> d1a69f4ba54133a5d6d19b5fb73c2768ec29e60b..739ff4c6a75a8014637b2b48d8121127ad6a8539
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/simd/vmla_lane_indices_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/simd/vmla_lane_indices_1.c
> @@ -2,7 +2,7 @@
>  
>  #include "arm_neon.h"
>  
> -#pragma GCC target "+fp8dot4+fp8dot2"
> +#pragma GCC target "+fp8fma"
>  
>  void
>  test(float16x4_t f16, float16x8_t f16q, float32x2_t f32,
> diff --git 
> a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c
> index 
> 9ad789a8ad2c5df109d6471a7ca22355ba26edea..fa0df46db2262a5a3e17bec974fb4807886708e9
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c
> @@ -2,7 +2,7 @@
>  
>  #include 
>  
> -#pragma GCC target ("arch=armv8.2-a+sve2+fp8dot2")
> +#pragma GCC target ("arch=armv8.2-a+sve2+fp8fma+fp8dot4+fp8dot2")
>  
>  void
>  test (svfloat16_t f16, svmfloat8_t f8, fpm_t fp

Re: PING^2 [RFC] Prevent the scheduler from moving prefetch instructions when expanding __builtin_prefetch [PR 116713]

2025-02-11 Thread Richard Sandiford

Jeff Law  writes:
> On 2/7/25 5:51 AM, Oleg Endo wrote:
>>> Hi,
>>>
>>> Can the issue be resolved in a target independent manner as suggested below?
>>> Or is it better to deal with this in the target code?
> That seems like a pretty heavy hammer though.  For that reason alone I 
> think this is going to need some discussion and I believe the folks most 
> needed for that discussion are focused on release related issues.

Yeah, agreed.  Prefetches ought to be restricted to performance-critical
code, which is also the kind of code that would suffer from having extra
blockage instructions.

We do have a prefetch rtl code, so it should be possible for the scheduler
to recognise prefetches and handle them in a more sensible way.  That would
be more complex though...

Thanks,
Richard

Re: [PATCH] lto: Add an entry for cold attribute to lto_gnu_attributes

2025-02-11 Thread Richard Biener

On Mon, Feb 10, 2025 at 11:01 PM Martin Jambor  wrote:
>
> Hi,
>
> PR 118125 is a performance regression stemming from the fact that we
> lose the cold attribute of our __builtin_unreachable.  The attribute
> is simply and silently dropped on the floor by decl_attributes (in
> attribs.cc) in the process of building decls for builtins because it
> cannot look it up in the gnu attribute name space by
> lookup_scoped_attribute_spec.  For that not to happen it must be in
> lto_gnu_attributes and this patch adds it there.
>
> In comment 13 of the bug Andrew identified other attributes which are
> in builtin-attrs.def but missing in lto_gnu_attributes but apart from
> cold it seems that they are either not used in builtins.def or are
> used in DEF_LIB_BUILTIN which I guess might be less critical?
> Eventually I decided to go for the most simple of patches and only add
> things if they are requested.  For the same reason I also did not add
> any checking to the attribute "handle" callback or any exclusion check.
> They seem to be mostly relevant before LTO FE kicks in to me, but
> again, I'm happy to add any if they seem to be useful.
>
> Since Ian fixed PR 118746, the same issue has also been fixed in the
> Go front-end and so I have added a simple checking assert to the
> redirect_to_unreachable function to make sure it has the intended
> effect.
>
> LTO-bootstrapped and tested on x86_64-linux.  OK for master?

OK.

Thanks,
Richard.

> Thanks,
>
> Martin
>
>
> gcc/ChangeLog:
>
> 2025-02-03  Martin Jambor  
>
> PR lto/118125
> * ipa-fnsummary.cc (redirect_to_unreachable): Add checking assert
> that the builtin_unreachable decl has attribute cold.
>
> gcc/lto/ChangeLog:
>
> 2025-02-03  Martin Jambor  
>
> PR lto/118125
> * lto-lang.cc (lto_gnu_attributes): Add an entry for cold attribute.
> (handle_cold_attribute): New function.
> ---
>  gcc/ipa-fnsummary.cc |  3 +++
>  gcc/lto/lto-lang.cc  | 13 +
>  2 files changed, 16 insertions(+)
>
> diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
> index 33f19365ec3..4c062fe8a0e 100644
> --- a/gcc/ipa-fnsummary.cc
> +++ b/gcc/ipa-fnsummary.cc
> @@ -255,6 +255,9 @@ redirect_to_unreachable (struct cgraph_edge *e)
>struct cgraph_node *target
>  = cgraph_node::get_create (builtin_decl_unreachable ());
>
> +  gcc_checking_assert (lookup_attribute ("cold",
> +DECL_ATTRIBUTES (target->decl)));
> +
>if (e->speculative)
>  e = cgraph_edge::resolve_speculation (e, target->decl);
>else if (!e->callee)
> diff --git a/gcc/lto/lto-lang.cc b/gcc/lto/lto-lang.cc
> index 652d7fc5e30..e41b548b398 100644
> --- a/gcc/lto/lto-lang.cc
> +++ b/gcc/lto/lto-lang.cc
> @@ -60,6 +60,7 @@ static tree ignore_attribute (tree *, tree, tree, int, bool 
> *);
>  static tree handle_format_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_format_arg_attribute (tree *, tree, tree, int, bool *);
> +static tree handle_cold_attribute (tree *, tree, tree, int, bool *);
>
>  /* Helper to define attribute exclusions.  */
>  #define ATTR_EXCL(name, function, type, variable)  \
> @@ -128,6 +129,8 @@ static const attribute_spec lto_gnu_attributes[] =
>   handle_sentinel_attribute, NULL },
>{ "type generic",   0, 0, false, true, true, false,
>   handle_type_generic_attribute, NULL },
> +  { "cold",  0, 0, false,  false, false, false,
> + handle_cold_attribute, NULL },
>{ "fn spec",   1, 1, false, true, true, false,
>   handle_fnspec_attribute, NULL },
>{ "transaction_pure",  0, 0, false, true, true, false,
> @@ -598,6 +601,16 @@ handle_fnspec_attribute (tree *node ATTRIBUTE_UNUSED, 
> tree ARG_UNUSED (name),
>return NULL_TREE;
>  }
>
> +/* Handle a "cold" attribute; arguments as in
> +   struct attribute_spec.handler.  */
> +
> +static tree
> +handle_cold_attribute (tree *, tree, tree, int, bool *)
> +{
> +  /* Nothing to be done here.  */
> +  return NULL_TREE;
> +}
> +
>  /* Cribbed from c-common.cc.  */
>
>  static void
> --
> 2.47.1
>
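
For readers following the thread, the failure mode Martin describes can be sketched outside GCC: a front end that only applies attributes it finds in its own table will drop anything missing from that table without a diagnostic. This is a minimal illustration, not GCC code; `AttrSpec` and `apply_known_attributes` are hypothetical names and do not correspond to the real `attribute_spec` or `decl_attributes` interfaces.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of a front end's attribute table.
struct AttrSpec {
  std::string name;
};

// Keep only the attributes the table knows about.  Unknown ones are dropped
// without any diagnostic, mirroring the silent loss described in the thread.
std::vector<std::string>
apply_known_attributes (const std::vector<AttrSpec> &table,
                        const std::vector<std::string> &requested)
{
  std::vector<std::string> applied;
  for (const std::string &attr : requested)
    for (const AttrSpec &spec : table)
      if (spec.name == attr)
        {
          applied.push_back (attr);
          break;
        }
  return applied;
}
```

With a table lacking a "cold" entry, a request for {"noreturn", "cold"} yields only "noreturn"; adding the entry, as the patch does for lto_gnu_attributes, lets both through.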

Re: [PATCH] middle-end/118801 - excessive redundant DEBUG BEGIN_STMT

2025-02-11 Thread Richard Biener

On Mon, 10 Feb 2025, Richard Biener wrote:

> On Mon, 10 Feb 2025, Richard Biener wrote:
> 
> > The following addresses the fact that we keep an excessive amount of
> > redundant DEBUG BEGIN_STMTs - in the testcase it sums up to 99.999%
> > of all stmts, sucking up compile-time in IL walks.  The patch amends
> > the GIMPLE DCE code that elides redundant DEBUG BIND stmts, also
> > pruning uninterrupted sequences of DEBUG BEGIN_STMTs, keeping only
> > the last one.
> > 
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> > 
> > For the testcase this brings down compile-time to 150% of -g0 levels
> > (and only 215 out of originally 1981380 DEBUG BEGIN_STMTs kept).
> > 
> > OK for trunk and possibly backports?
> 
> It regresses a few guality checks (and progresses one), I've looked
> only into one, g++.dg/guality/pr67192.C, where we now see
> FAIL: g++.dg/guality/pr67192.C   -O[123sg]  line 54 cnt == 15
> because the breakpoint happens in the wrong place.  But this shows
> it "works" only by accident.  The testcase is
> 
> __attribute__((noinline, noclone)) static void
> f4 (void)
> {
>   while (1) /* { dg-final { gdb-test 54 "cnt" "15" } } */
> if (last ())
>   break;
> else
>   do_it ();
>   do_it (); /* { dg-final { gdb-test 59 "cnt" "20" } } */
> }
> 
> and we have two BEGIN_STMTs for line 54(!) originally:
> 
>   [/space/rguenther/src/gcc/gcc/testsuite/g++.dg/guality/pr67192.C:54:3] # 
> DEBUG BEGIN_STMT
>   :
>   [/space/rguenther/src/gcc/gcc/testsuite/g++.dg/guality/pr67192.C:55:5] # 
> DEBUG BEGIN_STMT
> ...
>   [/space/rguenther/src/gcc/gcc/testsuite/g++.dg/guality/pr67192.C:54:3] # 
> DEBUG BEGIN_STMT
>   [/space/rguenther/src/gcc/gcc/testsuite/g++.dg/guality/pr67192.C:55:5] 
> goto ;
> 
> and special code in make_blocks() moves the first BEGIN_STMT after
> the label, altering when we reach a breakpoint on the line.
> 
> You can see that with the first BEGIN_STMT moved the patch will elide it,
> and gdb will find the second location.
> 
> With removing only repeating BEGIN_STMT with exactly
> the same location (unfortunately with uint64_t a bitmap no longer
> works), we're "only" down to 996 BEGIN_STMTs for the testcase.
> 
> So I'm retesting the following.

Bootstrapped and tested on x86_64-unknown-linux-gnu without
regressions this time.

Alex, is this OK for trunk?

Thanks,
Richard.


> Richard.
> 
> From 38d49d3e2c0bf98e9e2a46e251ae0454b084cc8d Mon Sep 17 00:00:00 2001
> From: Richard Biener 
> Date: Mon, 10 Feb 2025 10:23:45 +0100
> Subject: [PATCH] middle-end/118801 - excessive redundant DEBUG BEGIN_STMT
> To: gcc-patches@gcc.gnu.org
> 
> The following addresses the fact that we keep an excessive amount of
> redundant DEBUG BEGIN_STMTs - in the testcase it sums up to 99.999%
> of all stmts, sucking up compile-time in IL walks.  The patch amends
> the GIMPLE DCE code that elides redundant DEBUG BIND stmts, also
> pruning uninterrupted sequences of DEBUG BEGIN_STMTs, keeping only
> the last of each set of DEBUG BEGIN_STMT with unique location.
> 
>   PR middle-end/118801
>   * tree-ssa-dce.cc (eliminate_unnecessary_stmts): Prune
>   sequences of uninterrupted DEBUG BEGIN_STMTs, keeping only
>   the last of a set with unique location.
> ---
>  gcc/tree-ssa-dce.cc | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/gcc/tree-ssa-dce.cc b/gcc/tree-ssa-dce.cc
> index be21a2d0b50..461283ba858 100644
> --- a/gcc/tree-ssa-dce.cc
> +++ b/gcc/tree-ssa-dce.cc
> @@ -1508,6 +1508,7 @@ eliminate_unnecessary_stmts (bool aggressive)
>  
>/* Remove dead statements.  */
>auto_bitmap debug_seen;
> +  hash_set> locs_seen;
>for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi = psi)
>   {
> stmt = gsi_stmt (gsi);
> @@ -1670,6 +1671,15 @@ eliminate_unnecessary_stmts (bool aggressive)
>   remove_dead_stmt (&gsi, bb, to_remove_edges);
> continue;
>   }
> +   else if (gimple_debug_begin_stmt_p (stmt))
> + {
> +   /* We are only keeping the last debug-begin in a series of
> +  debug-begin stmts.  */
> +   if (locs_seen.add (gimple_location (stmt)))
> + remove_dead_stmt (&gsi, bb, to_remove_edges);
> +   continue;
> + }
> +   locs_seen.empty ();
> bitmap_clear (debug_seen);
>   }
>  
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
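
The pruning rule in the revised patch (walk the block backwards and, within an uninterrupted run of DEBUG BEGIN_STMTs, keep only the last one for each distinct location) can be sketched on a simplified statement list. This is an illustration under stated assumptions, not the tree-ssa-dce.cc code: `Stmt` and `prune_debug_begin` are hypothetical, and only two statement kinds are modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified statement: either a debug-begin marker carrying a
// source location, or some other statement.
struct Stmt {
  bool is_debug_begin;
  std::uint64_t location;
};

// Walk the block backwards.  Within an uninterrupted run of debug-begin
// markers, keep only the last marker for each distinct location; any other
// statement ends the run and resets the set of seen locations.
std::vector<Stmt> prune_debug_begin (const std::vector<Stmt> &block)
{
  std::vector<bool> keep (block.size (), true);
  std::unordered_set<std::uint64_t> locs_seen;
  for (std::size_t i = block.size (); i-- > 0;)
    {
      const Stmt &s = block[i];
      if (s.is_debug_begin)
        {
          // insert().second is false when the location was already seen
          // later in the same run, so this earlier marker is redundant.
          if (!locs_seen.insert (s.location).second)
            keep[i] = false;
          continue;
        }
      locs_seen.clear ();
    }

  std::vector<Stmt> result;
  for (std::size_t i = 0; i < block.size (); ++i)
    if (keep[i])
      result.push_back (block[i]);
  return result;
}
```

Markers at different locations in the same run (such as the line 54 and line 55 markers in the guality testcase) are both kept, which is the difference from the first version of the patch.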

Re: [PATCH 6/8] LoongArch: Simplify {lsx,lasx_x}vpick description

2025-02-11 Thread Xi Ruoyao

On Tue, 2025-02-11 at 16:52 +0800, Lulu Cheng wrote:
> 
> On 2025/2/7 at 8:09 PM, Xi Ruoyao wrote:
> /* snip */
> > -
> > -(define_insn "lasx_xvpickev_w"
> > -  [(set (match_operand:V8SI 0 "register_operand" "=f")
> > -   (vec_select:V8SI
> > -     (vec_concat:V16SI
> > -       (match_operand:V8SI 1 "register_operand" "f")
> > -       (match_operand:V8SI 2 "register_operand" "f"))
> > -     (parallel [(const_int 0) (const_int 2)
> > -    (const_int 8) (const_int 10)
> > -    (const_int 4) (const_int 6)
> > -    (const_int 12) (const_int 14)])))]
> > -  "ISA_HAS_LASX"
> > -  "xvpickev.w\t%u0,%u2,%u1"
> > -  [(set_attr "type" "simd_permute")
> > -   (set_attr "mode" "V8SI")])
> > -
> /* snip */
> > +;; Picking even/odd elements.
> > +(define_insn "simd_pick_evod_"
> > +  [(set (match_operand:ALLVEC 0 "register_operand" "=f")
> > +   (vec_select:ALLVEC
> > +     (vec_concat:
> > +       (match_operand:ALLVEC 1 "register_operand" "f")
> > +       (match_operand:ALLVEC 2 "register_operand" "f"))
> > +     (match_operand: 3 "vect_par_cnst_even_or_odd_half")))]
> 
> For LASX, the generated select array is problematic, taking xvpickev.w
> as an example:
> 
> xvpickev.w  vd,vj,vk
> 
> The behavior of the instruction is as follows:
> 
> vd.w[0] = vk.w[0]
> 
> vd.w[1] = vk.w[2]
> 
> vd.w[2] = vj.w[0]
> 
> vd.w[3] = vj.w[2]
> 
> vd.w[4] = vk.w[4]
> 
> vd.w[5] = vk.w[6]
> 
> vd.w[6] = vj.w[4]
> 
> vd.w[7] = vj.w[6]

Oops, silly me.  Strangely, bootstrapping (even with BOOT_CFLAGS="-O2
-g -march=la664") and regtesting did not catch it.

I'll limit this to LSX in v2.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v3 2/5] c++/modules: Ignore TU-local entities where necessary

2025-02-11 Thread Patrick Palka

On Wed, 12 Feb 2025, Nathaniel Shead wrote:

> On Mon, Jan 27, 2025 at 10:20:05AM -0500, Patrick Palka wrote:
> > [snip]
> >
> > > @@ -18486,6 +18562,12 @@ dependent_operand_p (tree t)
> > >  {
> > >while (TREE_CODE (t) == IMPLICIT_CONV_EXPR)
> > >  t = TREE_OPERAND (t, 0);
> > > +
> > > +  /* If we contain a TU_LOCAL_ENTITY assume we're non-dependent; we'll 
> > > error
> > > + later when instantiating.  */
> > > +  if (expr_contains_tu_local_entity (t))
> > > +return false;
> > 
> > I think it'd be more robust and cheaper (avoiding a separate tree walk)
> > to teach the general constexpr/dependence predicates about
> > TU_LOCAL_ENTITY instead of handling it only here.
> > 
> > > +
> > >++processing_template_decl;
> > >bool r = (potential_constant_expression (t)
> > >   ? value_dependent_expression_p (t)
> > > @@ -20255,6 +20337,9 @@ tsubst_expr (tree t, tree args, tsubst_flags_t 
> > > complain, tree in_decl)
> > >   else
> > > object = NULL_TREE;
> > >  
> > > + if (function_contains_tu_local_entity (templ))
> > > +   RETURN (error_mark_node);
> > > +
> > >   tree tid = lookup_template_function (templ, targs);
> > >   protected_set_expr_location (tid, EXPR_LOCATION (t));
> > >  
> > > @@ -20947,6 +21032,9 @@ tsubst_expr (tree t, tree args, tsubst_flags_t 
> > > complain, tree in_decl)
> > > qualified_p = true;
> > > }
> > >  
> > > + if (function_contains_tu_local_entity (function))
> > > +   RETURN (error_mark_node);
> > 
> > Similarly, maybe it'd suffice to check this more generally in the
> > OVERLOAD case of tsubst_expr?
> > 
> 
> So I'd completely missed the idea of handling it in the OVERLOAD case;
> doing this also fixes the issues I'd been having trying to handle it in
> potential_constant_expression.  I think this should be a lot cleaner
> now.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

Nice, LGTM!

> 
> -- >8 --
> 
> Subject: [PATCH] c++: Handle TU_LOCAL_ENTITY in tsubst_expr and
>  potential_constant_expression
> 
> This cleans up the TU_LOCAL_ENTITY handling to avoid unnecessary
> tree walks and make the logic more robust.
> 
> gcc/cp/ChangeLog:
> 
>   * constexpr.cc (potential_constant_expression_1): Handle
>   TU_LOCAL_ENTITY.
>   * pt.cc (expr_contains_tu_local_entity): Remove.
>   (function_contains_tu_local_entity): Remove.
>   (dependent_operand_p): Remove special handling for
>   TU_LOCAL_ENTITY.
>   (tsubst_expr): Handle TU_LOCAL_ENTITY when tsubsting OVERLOADs;
>   remove now-unnecessary extra handling.
>   (type_dependent_expression_p): Handle TU_LOCAL_ENTITY.
> 
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/constexpr.cc |  5 +++
>  gcc/cp/pt.cc| 80 -
>  2 files changed, 19 insertions(+), 66 deletions(-)
> 
> diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
> index f142dd32bc8..b36705fd4ce 100644
> --- a/gcc/cp/constexpr.cc
> +++ b/gcc/cp/constexpr.cc
> @@ -10825,6 +10825,11 @@ potential_constant_expression_1 (tree t, bool 
> want_rval, bool strict, bool now,
>  case CO_RETURN_EXPR:
>return false;
>  
> +/* Assume a TU-local entity is not constant, we'll error later when
> +   instantiating.  */
> +case TU_LOCAL_ENTITY:
> +  return false;
> +
>  case NONTYPE_ARGUMENT_PACK:
>{
>   tree args = ARGUMENT_PACK_ARGS (t);
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index f857b3f1180..966050a6608 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -9935,61 +9935,6 @@ complain_about_tu_local_entity (tree e)
>inform (TU_LOCAL_ENTITY_LOCATION (e), "declared here");
>  }
>  
> -/* Checks if T contains a TU-local entity.  */
> -
> -static bool
> -expr_contains_tu_local_entity (tree t)
> -{
> -  if (!modules_p ())
> -return false;
> -
> -  auto walker = [](tree *tp, int *walk_subtrees, void *) -> tree
> -{
> -  if (TREE_CODE (*tp) == TU_LOCAL_ENTITY)
> - return *tp;
> -  if (!EXPR_P (*tp))
> - *walk_subtrees = false;
> -  return NULL_TREE;
> -};
> -  return cp_walk_tree (&t, walker, nullptr, nullptr);
> -}
> -
> -/* Errors and returns TRUE if X is a function that contains a TU-local
> -   entity in its overload set.  */
> -
> -static bool
> -function_contains_tu_local_entity (tree x)
> -{
> -  if (!modules_p ())
> -return false;
> -
> -  if (!x || x == error_mark_node)
> -return false;
> -
> -  if (TREE_CODE (x) == OFFSET_REF
> -  || TREE_CODE (x) == COMPONENT_REF)
> -x = TREE_OPERAND (x, 1);
> -  x = MAYBE_BASELINK_FUNCTIONS (x);
> -  if (TREE_CODE (x) == TEMPLATE_ID_EXPR)
> -x = TREE_OPERAND (x, 0);
> -
> -  if (OVL_P (x))
> -for (tree ovl : lkp_range (x))
> -  if (TREE_CODE (ovl) == TU_LOCAL_ENTITY)
> - {
> -   x = ovl;
> -   break;
> - }
> -
> -  if (TREE_CODE (x) == TU_LOCAL_ENTITY)
> -{
> -  complain_about_tu_local_entity (x);
> -  return true;
> -}
> -
> -  return false;
> -}
> -
>

Re: [RFA][PR target/115478] Accept ADD, IOR or XOR when combining objects with no bits in common

2025-02-11 Thread Richard Sandiford

Jeff Law  writes:
> So the change to prefer ADD over IOR for combining two objects with no 
> bits in common is (IMHO) generally good.  It has some minor fallout.
>
> In particular the aarch64 port (and I suspect others) have patterns that 
> recognize IOR, but not PLUS or XOR for these cases and thus tests which 
> expected to optimize with IOR are no longer optimizing.
>
> Roger suggested using a code iterator for this purpose.  Richard S. 
> suggested a new match operator to cover those cases.
>
> I really like the match operator idea, but as Richard S. notes in the PR 
> it would require either not validating the "no bits in common", which 
> dramatically reduces the utility IMHO or we'd need some work to allow 
> consistent results without polluting the nonzero bits cache.
>
> So this patch goes back to Roger's idea of just using a code iterator 
> in the aarch64 backend (and presumably anywhere else we see this popping 
> up).
>
> Bootstrapped and regression tested on aarch64-linux-gnu where it fixes 
> bitint-args.c (as expected).
>
> OK for the trunk?
>
> Jeff
>
>   PR target/115478
> gcc/
>   * config/aarch64/iterators.md (any_or_plus): New code iterator.
>   * config/aarch64/aarch64.md (extr5_insn): Use any_or_plus.
>   (extr5_insn_alt, extrsi5_insn_uxtw): Likewise.
>   (extrsi5_insn_uxtw_alt, extrsi5_insn_di): Likewise.
>
> gcc/testsuite/
>   * gcc.target/aarch64/bitint-args.c: Update expected output.

OK, thanks!

(For the record, I agree that the match_operator thing requires too many
changes for stage 4.)

Richard
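
The reason ADD, IOR and XOR are interchangeable in these patterns can be checked with a small standalone example: when the two shifted operands occupy disjoint bit ranges, no carry can occur and no bit is set in both. This is a sketch in plain C++ rather than RTL, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Combine HI shifted left by N with LO shifted right by 64 - N, for
// 0 < N < 64.  The first term has its low N bits clear and the second term
// has only its low N bits possibly set, so the operands share no bits.
std::uint64_t combine_ior (std::uint64_t hi, std::uint64_t lo, unsigned n)
{
  return (hi << n) | (lo >> (64 - n));
}

// Same combination using addition; no carries occur with disjoint bits.
std::uint64_t combine_plus (std::uint64_t hi, std::uint64_t lo, unsigned n)
{
  return (hi << n) + (lo >> (64 - n));
}

// Same combination using exclusive or.
std::uint64_t combine_xor (std::uint64_t hi, std::uint64_t lo, unsigned n)
{
  return (hi << n) ^ (lo >> (64 - n));
}
```

All three functions return the same value for any valid N, which is why a pattern that matches only the IOR form stops matching once the middle end prefers PLUS.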

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 071058dbeb3..cfe730f3732 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -6194,10 +6194,11 @@
>  
>  (define_insn "*extr5_insn"
>[(set (match_operand:GPI 0 "register_operand" "=r")
> - (ior:GPI (ashift:GPI (match_operand:GPI 1 "register_operand" "r")
> -  (match_operand 3 "const_int_operand" "n"))
> -  (lshiftrt:GPI (match_operand:GPI 2 "register_operand" "r")
> -(match_operand 4 "const_int_operand" "n"]
> + (any_or_plus:GPI
> +   (ashift:GPI (match_operand:GPI 1 "register_operand" "r")
> +   (match_operand 3 "const_int_operand" "n"))
> +   (lshiftrt:GPI (match_operand:GPI 2 "register_operand" "r")
> +   (match_operand 4 "const_int_operand" "n"]
>"UINTVAL (operands[3]) < GET_MODE_BITSIZE (mode) &&
> (UINTVAL (operands[3]) + UINTVAL (operands[4]) == GET_MODE_BITSIZE 
> (mode))"
>"extr\\t%0, %1, %2, %4"
> @@ -6208,10 +6209,11 @@
>  ;; so we have to match both orderings.
>  (define_insn "*extr5_insn_alt"
>[(set (match_operand:GPI 0 "register_operand" "=r")
> - (ior:GPI  (lshiftrt:GPI (match_operand:GPI 2 "register_operand" "r")
> - (match_operand 4 "const_int_operand" "n"))
> -   (ashift:GPI (match_operand:GPI 1 "register_operand" "r")
> -   (match_operand 3 "const_int_operand" "n"]
> + (any_or_plus:GPI
> +   (lshiftrt:GPI (match_operand:GPI 2 "register_operand" "r")
> + (match_operand 4 "const_int_operand" "n"))
> +   (ashift:GPI (match_operand:GPI 1 "register_operand" "r")
> +   (match_operand 3 "const_int_operand" "n"]
>"UINTVAL (operands[3]) < GET_MODE_BITSIZE (mode)
> && (UINTVAL (operands[3]) + UINTVAL (operands[4])
> == GET_MODE_BITSIZE (mode))"
> @@ -6223,10 +6225,11 @@
>  (define_insn "*extrsi5_insn_uxtw"
>[(set (match_operand:DI 0 "register_operand" "=r")
>   (zero_extend:DI
> -  (ior:SI (ashift:SI (match_operand:SI 1 "register_operand" "r")
> - (match_operand 3 "const_int_operand" "n"))
> -  (lshiftrt:SI (match_operand:SI 2 "register_operand" "r")
> -   (match_operand 4 "const_int_operand" "n")]
> +  (any_or_plus:SI
> +(ashift:SI (match_operand:SI 1 "register_operand" "r")
> +   (match_operand 3 "const_int_operand" "n"))
> +(lshiftrt:SI (match_operand:SI 2 "register_operand" "r")
> +   (match_operand 4 "const_int_operand" "n")]
>"UINTVAL (operands[3]) < 32 &&
> (UINTVAL (operands[3]) + UINTVAL (operands[4]) == 32)"
>"extr\\t%w0, %w1, %w2, %4"
> @@ -6236,10 +6239,11 @@
>  (define_insn "*extrsi5_insn_uxtw_alt"
>[(set (match_operand:DI 0 "register_operand" "=r")
>   (zero_extend:DI
> -  (ior:SI (lshiftrt:SI (match_operand:SI 2 "register_operand" "r")
> -(match_operand 4 "const_int_operand" "n"))
> -  (ashift:SI (match_operand:SI 1 "register_operand" "r")
> - (match_operand 3 "const_int_operand" "n")]
> +  (any_or_plus:SI
> +(lshiftrt:SI (match_operand:SI 2 "register_operand" "r")
> + (match_operand 4 "co

Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103]

2025-02-11 Thread Richard Sandiford

Jeff Law  writes:
> On 2/11/25 9:08 AM, Richard Sandiford wrote:
>> Jeff Law  writes:
>>> On 2/7/25 5:59 AM, Andrew Waterman wrote:
>>>> This patch runs counter to the ABI spec, which states that vxrm is not
>>>> preserved across calls and is volatile upon function entry [1].  vxrm
>>>> does not play the same role as frm plays in the calling convention.
>>>> (I won't get into the rationale in this email, but the rationale isn't
>>>> especially important: we should follow the ABI.)
>>>>
>>>> [1]
>>>> https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/3a79e936eec5491078b1133ac943f91ef5fd75fd/riscv-cc.adoc?plain=1#L119-L120
>>> Pan's patch doesn't change the basic property that VXRM has no known
>>> state at function entry or upon return from a function call.
>> 
>> I think it will.  global_regs[X] means that X is defined on entry,
>> defined on exit, and can be changed by calls.  If the register is
>> call-clobbered/volatile/caller-saved, then I agree with Andrew that
>> this doesn't look like the right fix.
> But the LCM code we use to manage vxrm assignments makes no assumption 
> about incoming state and assumes no state is preserved across calls.

In that case, I wonder what the patch is fixing.  Like you say,
the initial mode seems to be VXRM_MODE_NONE, and it looks like
riscv_vxrm_mode_after correctly models calls as clobbering the mode.

In the FRM case, the problem was that we had:

  entry:
call initialize
X := FRM
...
FRM := X

Since FRM was not previously defined on entry, and since the call in any
case was assumed to clobber FRM, the X := FRM seemed to be reading an
uninitialised value, and so the FRM := X could be folded away.
But from your description, and from an admittedly cursory look at
the code, it sounds like that couldn't happen for VXRM.

Richard

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-11 Thread Hongtao Liu

> PR117081 is about regression in povray. The reduced testcase:
Just for clarification: PR117081 is not about a regression in povray.
It's related to FAIL: gcc.target/i386/pr91384.c scan-assembler-not
testl
pr91384.c was added by r12-7417, a peephole optimization that expects
a specific instruction sequence; the regression can be fixed by adding
a new peephole pattern.

H.J.'s patch actually regressed povray by introducing extra push/pops
(it adds a preference for callee-saved registers, while in the benchmark
using caller-saved registers is much better).
Sorry, I may not have been clear in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117081#c9


-- 
BR,
Hongtao

Re: [PATCH 6/8] LoongArch: Simplify {lsx,lasx_x}vpick description

2025-02-11 Thread Lulu Cheng




On 2025/2/7 at 8:09 PM, Xi Ruoyao wrote:
/* snip */

-
-(define_insn "lasx_xvpickev_w"
-  [(set (match_operand:V8SI 0 "register_operand" "=f")
-   (vec_select:V8SI
- (vec_concat:V16SI
-   (match_operand:V8SI 1 "register_operand" "f")
-   (match_operand:V8SI 2 "register_operand" "f"))
- (parallel [(const_int 0) (const_int 2)
-(const_int 8) (const_int 10)
-(const_int 4) (const_int 6)
-(const_int 12) (const_int 14)])))]
-  "ISA_HAS_LASX"
-  "xvpickev.w\t%u0,%u2,%u1"
-  [(set_attr "type" "simd_permute")
-   (set_attr "mode" "V8SI")])
-

/* snip */

+;; Picking even/odd elements.
+(define_insn "simd_pick_evod_"
+  [(set (match_operand:ALLVEC 0 "register_operand" "=f")
+   (vec_select:ALLVEC
+ (vec_concat:
+   (match_operand:ALLVEC 1 "register_operand" "f")
+   (match_operand:ALLVEC 2 "register_operand" "f"))
+ (match_operand: 3 "vect_par_cnst_even_or_odd_half")))]


For LASX, the generated select array is problematic, taking xvpickev.w 
as an example:


xvpickev.w  vd,vj,vk

The behavior of the instruction is as follows:

vd.w[0] = vk.w[0]

vd.w[1] = vk.w[2]

vd.w[2] = vj.w[0]

vd.w[3] = vj.w[2]

vd.w[4] = vk.w[4]

vd.w[5] = vk.w[6]

vd.w[6] = vj.w[4]

vd.w[7] = vj.w[6]

At this point, the select array should be {0, 2, 8, 10, 4, 6, 12, 14} 
instead of {0, 2, 4, 6, 8, 10, 12, 14}.
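
The expected select array can be derived mechanically from the per-lane behaviour listed above. The sketch below assumes, as the removed lasx_xvpickev_w pattern does, that operand 1 (vk) forms the low half of the vec_concat and operand 2 (vj) the high half; the function names are hypothetical and this is not LoongArch backend code.

```cpp
#include <array>
#include <cassert>

// Select indices into vec_concat (vk, vj) for a pick-even of 32-bit elements
// on a 256-bit vector.  The instruction works on each 128-bit lane
// separately: the low half of every output lane takes the even elements of
// vk's lane, the high half takes the even elements of vj's lane.
std::array<int, 8> lasx_pickev_w_selector ()
{
  const int nelts = 8;      // 32-bit elements per 256-bit vector
  const int lane_elts = 4;  // 32-bit elements per 128-bit lane
  std::array<int, 8> sel{};
  int out = 0;
  for (int lane = 0; lane < 2; ++lane)
    for (int src = 0; src < 2; ++src)        // 0 = vk, 1 = vj
      for (int i = 0; i < lane_elts; i += 2) // even elements of the lane
        sel[out++] = src * nelts + lane * lane_elts + i;
  return sel;
}

// The plain stepped selector 0, 2, 4, ..., which ignores the lane split.
std::array<int, 8> stepped_selector ()
{
  std::array<int, 8> sel{};
  for (int i = 0; i < 8; ++i)
    sel[i] = 2 * i;
  return sel;
}
```

The first function yields {0, 2, 8, 10, 4, 6, 12, 14}, matching the array in the removed pattern, while the stepped form yields {0, 2, 4, 6, 8, 10, 12, 14}; the two agree only when the vector has a single 128-bit lane, as in LSX.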



+  "GET_MODE_SIZE (mode) != 8" ;; Use vilvl.d instead
+  "vpick%O3.\t%0,%2,%1"
+  [(set_attr "type" "simd_permute")
+   (set_attr "mode" "")])
+
+(define_expand "_vpick_<_f>"
+  [(match_operand:ALLVEC 0 "register_operand" "=f")
+   (match_operand:ALLVEC 1 "register_operand" " f")
+   (match_operand:ALLVEC 2 "register_operand" " f")
+   (const_int zero_one)]
+  "GET_MODE_SIZE (mode) != 8" ;; Use vilvl.d instead
+{
+  int nelts = GET_MODE_NUNITS (mode);
+  rtx op3 = loongarch_gen_stepped_int_parallel (nelts, , 2);
+  rtx insn = gen_simd_pick_evod_ (operands[0], operands[1],
+   operands[2], op3);
+  emit_insn (insn);
+  DONE;
+})
+
  ;; Integer widening add/sub/mult.
  (define_insn "simd_w_evod__"
[(set (match_operand: 0 "register_operand" "=f")

Re: [PATCH] c++: Fix use-after-free of replaced friend instantiation [PR118807]

2025-02-11 Thread Jason Merrill


On 2/10/25 11:58 PM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu (and additionally
passed modules.exp with a checking=all build), OK for trunk?

-- >8 --

When instantiating a friend function, we call register_specialization
which adds it to the DECL_TEMPLATE_INSTANTIATIONS of the template.
However, in some circumstances we might immediately call pushdecl and
find an existing specialisation.  In this case, when reregistering the
specialisation we also need to update the DECL_TEMPLATE_INSTANTIATIONS
list so that we don't try to access the freed spec again later.

PR c++/118807

gcc/cp/ChangeLog:

* pt.cc (reregister_specialization): Remove spec from
DECL_TEMPLATE_INSTANTIATIONS.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr118807.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/pt.cc| 11 +++
  gcc/testsuite/g++.dg/modules/pr118807.C | 11 +++
  2 files changed, 22 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/modules/pr118807.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 39232b5e67f..e1764743597 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -1985,6 +1985,17 @@ reregister_specialization (tree spec, tree tinfo, tree 
new_spec)
gcc_assert (entry->spec == spec || entry->spec == new_spec);
gcc_assert (new_spec != NULL_TREE);
entry->spec = new_spec;
+
+  /* We need to also remove the old specialisation from


Let's say "spec" instead of "old specialisation", old and new are kind 
of confusing here since in duplicate_decls, new_spec is olddecl and spec 
is newdecl.


OK with that tweak.


+DECL_TEMPLATE_INSTANTIATIONS if it was placed there.  */
+  for (tree *inst = &DECL_TEMPLATE_INSTANTIATIONS (elt.tmpl);
+  *inst; inst = &TREE_CHAIN (*inst))
+   if (TREE_VALUE (*inst) == spec)
+ {
+   *inst = TREE_CHAIN (*inst);
+   break;
+ }
+
return 1;
  }
  
diff --git a/gcc/testsuite/g++.dg/modules/pr118807.C b/gcc/testsuite/g++.dg/modules/pr118807.C

new file mode 100644
index 000..a97afb92699
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr118807.C
@@ -0,0 +1,11 @@
+// PR c++/118807
+// { dg-additional-options "-fmodules --param=ggc-min-expand=0 
--param=ggc-min-heapsize=0 -Wno-global-module" }
+
+module;
+template  class basic_streambuf;
+template  struct basic_streambuf {
+  friend void __istream_extract();
+};
+template class basic_streambuf;
+template class basic_streambuf;
+export module M;
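
The loop added to reregister_specialization uses the pointer-to-pointer idiom for unlinking a node from a singly linked chain. A minimal standalone sketch of that idiom follows; `Node` and `unlink_first` are hypothetical and stand in for the TREE_LIST chain and TREE_CHAIN accessor used in the patch.

```cpp
#include <cassert>

// Hypothetical singly linked list node.
struct Node {
  int value;
  Node *next;
};

// Unlink the first node whose value equals TARGET.  LINK always points at
// the pointer that refers to the current node (the head pointer or some
// node's next field), so removing the head needs no special case.
// Returns true if a node was unlinked.
bool unlink_first (Node **head, int target)
{
  for (Node **link = head; *link; link = &(*link)->next)
    if ((*link)->value == target)
      {
        *link = (*link)->next;
        return true;
      }
  return false;
}
```

Once the stale entry is unlinked, later walks over the chain can no longer reach the freed node, which is the use-after-free the patch avoids.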

Re: [PATCH 4/8] LoongArch: Simplify {lsx_,lasx_x}hv{add,sub}w description

2025-02-11 Thread Xi Ruoyao

On Tue, 2025-02-11 at 15:48 +0800, Lulu Cheng wrote:
> Hi,
> 
> I think the "{lsx_,lasx_x}hv{add,sub}w" in the title should be
> "{lsx_,lasx_x}vh{add,sub}w".

Indeed.

> 
> On 2025/2/7 at 8:09 PM, Xi Ruoyao wrote:
> > Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
> > special predicates and TImode RTL instead of hard-coded const
> > vectors
> > and UNSPECs.
> /* snip */
> > +(define_insn "simd_hw__"
> > +  [(set (match_operand: 0 "register_operand" "=f")
> > +   (addsub:
> > +     (vec_select:
> > +       (any_extend:
> 
> Does the order of any_extend affect the code generation?

I'm not sure, but I think it makes sense to keep the select/extend order
consistent for LoongArch, so I'll move any_extend outside of vec_select in
the next version of the series.  I just didn't notice the order
difference when I wrote this.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH 5/8] LoongArch: Simplify {lsx_,lasx_x}maddw description

2025-02-11 Thread Xi Ruoyao

On Tue, 2025-02-11 at 15:49 +0800, Lulu Cheng wrote:
> It seems that the title here should be "{lsx_,lasx_x}vmaddw".

Will fix in v2.


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-11 Thread Hongtao Liu

On Tue, Feb 11, 2025 at 4:27 PM H.J. Lu  wrote:
>
> On Tue, Feb 11, 2025 at 4:13 PM Hongtao Liu  wrote:
> >
> > > PR117081 is about regression in povray. The reduced testcase:
> > Just for clarification: PR117081 is not about a regression in povray.
> > It's related to FAIL: gcc.target/i386/pr91384.c scan-assembler-not
> > testl
> > pr91384.c was added by r12-7417, a peephole optimization that expects
> > a specific instruction sequence; the regression can be fixed by adding
> > a new peephole pattern.
> >
> > H.J.'s patch actually regressed povray by introducing extra push/pops
> > (it adds a preference for callee-saved registers, while in the benchmark
> > using caller-saved registers is much better).
> > Sorry, I may not have been clear in
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117081#c9
> >
>
> My patch doesn't change the codegen for that code as shown by
Real benchmark scenarios are a little more complex; the testcase in
the PR covers just one of them, not all.
We are currently investigating this case and hope to find a better solution.
>
> commit 846837c2406ae7a52d9123b29c13e4b8b9d14224
> Author: H.J. Lu 
> Date:   Fri Feb 7 13:49:30 2025 +0800
>
> x86: Verify that PUSH/POP can be skipped
>
>
> --
> H.J.



-- 
BR,
Hongtao

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-11 Thread H.J. Lu

On Tue, Feb 11, 2025 at 4:38 PM Hongtao Liu  wrote:
>
> On Tue, Feb 11, 2025 at 4:27 PM H.J. Lu  wrote:
> >
> > On Tue, Feb 11, 2025 at 4:13 PM Hongtao Liu  wrote:
> > >
> > > > PR117081 is about regression in povray. The reducted testcase:
> > > Just for clarification. PR117081 is not about regression in povray.
> > > it's related to FAIL: gcc.target/i386/pr91384.c scan-assembler-not
> > > testl
> > > The pr91384.c is added by r12-7417 which is peephole optimization
> > > expecting some specific instruction sequence, the regression can be
> > > fixed by adding new peephole pattern.
> > >
> > > H.J patch actually regressed povray by introducing extra push/pops
> > > (since it adds preference for callee save registers, in the benchmark
> > > using caller saved registers is much better).
> > > Sorry, I may not have been clear in
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117081#c9
> > >
> >
> > My patch doesn't change the codegen for that code as shown by
> Real benchmark scenarios are a little more complex, the testcase in
> the PR is just one of the scenes, but not all.
> We are currently investigating this case and hope to find a better solution.

We need testcases to make sure that there are no regressions.

> >
> > commit 846837c2406ae7a52d9123b29c13e4b8b9d14224
> > Author: H.J. Lu 
> > Date:   Fri Feb 7 13:49:30 2025 +0800
> >
> > x86: Verify that PUSH/POP can be skipped
> >
> >
> > --
> > H.J.
>
>
>
> --
> BR,
> Hongtao



-- 
H.J.

[PATCH 2/2] libcpp: Fix incorrect line information for traditional cpp and #include [PR100904]

2025-02-11 Thread Andrew Pinski

After r7-1651-gac81cf0b2bf5efdd7, the location for the error for #include is
the location of the token.  In traditional cpp, however, the location
information for directives is all messed up, because libcpp first processes
the directive line in traditional mode, copies it to a new buffer, and then
does the lexing using the ISO lexer.  This means the location information for
the tokens is wrong, and we should just grab the location of the directive
line instead.  This patch does exactly that: it uses the directive line
location for traditional cpp when parsing the include.

Bootstrapped and tested on x86_64-linux-gnu.

PR preprocessor/100904

libcpp/ChangeLog:

* directives.cc (parse_include): Use the directive line location
for the location in traditional cpp mode instead of the location
of the token.

gcc/testsuite/ChangeLog:

* gcc.dg/cpp/missing-header-trad-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/cpp/missing-header-trad-1.c | 10 ++
 libcpp/directives.cc |  9 -
 2 files changed, 18 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/cpp/missing-header-trad-1.c

diff --git a/gcc/testsuite/gcc.dg/cpp/missing-header-trad-1.c b/gcc/testsuite/gcc.dg/cpp/missing-header-trad-1.c
new file mode 100644
index 000..d77cc5fe228
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/cpp/missing-header-trad-1.c
@@ -0,0 +1,10 @@
+/* { dg-do preprocess } */
+/* { dg-options "-traditional-cpp" } */
+
+/* PR preprocessor/100904 */
+/* Make sure we error out on the correct line
+   for traditional cpp. */
+
+#include "nonexistent.h" /* { dg-error "-: nonexistent.h"  } */
+
+/* { dg-message "terminated" "terminated" { target *-*-* } 0 } */
diff --git a/libcpp/directives.cc b/libcpp/directives.cc
index 9c0f77ab017..d4a5ab1cbec 100644
--- a/libcpp/directives.cc
+++ b/libcpp/directives.cc
@@ -841,7 +841,14 @@ parse_include (cpp_reader *pfile, int *pangle_brackets,
 
   /* Allow macro expansion.  */
   header = get_token_no_padding (pfile);
-  *location = header->src_loc;
+
+  /* The location for traditional is the directive line as the
+ token line information for the temporary buffer.  */
+  if (CPP_OPTION (pfile, traditional))
+*location = pfile->directive_line;
+  else
+*location = header->src_loc;
+
   if ((header->type == CPP_STRING && header->val.str.text[0] != 'R')
   || header->type == CPP_HEADER_NAME)
 {
-- 
2.43.0

Re: [PATCH v4] c++: Reject cdtors and conversion operators with a single * as return type [PR118306]

2025-02-11 Thread Jason Merrill


On 2/10/25 12:09 PM, Simon Martin wrote:

Hi Jason,

On 7 Feb 2025, at 23:10, Jason Merrill wrote:


On 2/7/25 4:04 PM, Simon Martin wrote:

Hi Jason,

On 7 Feb 2025, at 14:21, Jason Merrill wrote:


On 2/6/25 3:05 PM, Simon Martin wrote:

Hi Jason,

On 6 Feb 2025, at 16:48, Jason Merrill wrote:


On 2/5/25 2:21 PM, Simon Martin wrote:

Hi Jason,

On 4 Feb 2025, at 21:23, Jason Merrill wrote:


On 2/4/25 3:03 PM, Jason Merrill wrote:

On 2/4/25 11:45 AM, Simon Martin wrote:

On 4 Feb 2025, at 17:17, Jason Merrill wrote:


On 2/4/25 10:56 AM, Simon Martin wrote:

Hi Jason,

On 4 Feb 2025, at 16:39, Jason Merrill wrote:


On 1/15/25 9:56 AM, Jason Merrill wrote:

On 1/15/25 7:24 AM, Simon Martin wrote:

Hi,

On 14 Jan 2025, at 23:31, Jason Merrill wrote:


On 1/14/25 2:13 PM, Simon Martin wrote:

On 10 Jan 2025, at 19:10, Andrew Pinski wrote:

On Fri, Jan 10, 2025 at 3:18 AM Simon Martin wrote:

We currently accept the following invalid code (EDG and MSVC do as well)

clang does too: https://github.com/llvm/llvm-project/issues/121706 .

Note it might be useful if a testcase with multiple `*` is included too:
```
struct A {
      A ();
};
```

Thanks, makes sense to add those. Done in the attached updated revision,
successfully tested on x86_64-pc-linux-gnu.



+/* Check that it's OK to declare a function at ID_LOC with the indicated TYPE,
+   TYPE_QUALS and DECLARATOR.  SFK indicates the kind of special function (if
+   any) that this function is.  OPTYPE is the type given in a conversion
    operator declaration, or the class type for a constructor/destructor.
    Returns the actual return type of the function; that may be different
    than TYPE if an error occurs, or for certain special functions.  */
@@ -12361,8 +12362,19 @@ check_special_function_return_type (special_function_kind sfk,
     tree type,
     tree optype,
     int type_quals,
+    const cp_declarator *declarator,
+    location_t id_loc,

id_loc should be the same as declarator->id_loc?

You’re right.

     const location_t* locations)
     {
+  /* If TYPE is unspecified, DECLARATOR, if set, should not represent a pointer
+     or a reference type.  */
+  if (type == NULL_TREE
+      && declarator
+      && (declarator->kind == cdk_pointer
+          || declarator->kind == cdk_reference))
+    error_at (id_loc, "expected unqualified-id before %qs token",
+              declarator->kind == cdk_pointer ? "*" : "&");

...and id_loc isn't the location of the ptr-operator, it's the location of
the identifier, so this indicates the wrong column.  I think using
declarator->id_loc makes sense, just not pretending it's the location of
the *.

Good catch, thanks.

Let's give diagnostics more like the others later in the function instead
of trying to emulate cp_parser_error.

Makes sense. This is what the updated patch does, successfully tested on
x86_64-pc-linux-gnu. OK for GCC 16?

OK.

Does this also fix 118304?  If so, let's go ahead and apply it to GCC 15.

I have checked just now, and we still ICE for 118304’s testcase with that
fix.

Why doesn't the preexisting

type = void_type_node;

in check_special_function_return_type fix the return type and avoid the ICE?

We hit the gcc_assert at method.cc:3593, that Marek’s fix bypasses.

Yes, but why doesn't check_special_function_return_type prevent that?

Ah, because we call it before walking the declarator.  We need to check
again later, perhaps in grokfndecl, that the type is correct.  Perhaps
instead of your patch.

One “issue” with adding another check in or close to grokfndecl is that
DECLARATOR will have “been moved to the ID”, and the fact that we had a
CDK_POINTER kind is “lost”. We could obviously somehow propagate this
information, but there might be something easier.

The information isn't lost: it's now reflected in the (wrong) return type.
One place it would make sense to check would be

   if (ctype && (sfk == sfk_constructor
                 || sfk == sfk_destructor))
     {
       /* We are within a class's scope. If our declarator name
          is the same as the class name, and we are defining
          a function, then it is a constructor/destructor, and
          therefore returns a void type.  */

Here 'type' is still the return type, we haven't gotten to
build_function_type yet.

That’s true. However, doesn’t it make sense to cram all the checks about
the return type of special functions in check_special_function_return_type,
and return an error if that return type is invalid?

This error seems easily recoverable since we know what the type needs to
be, there's no need for error return from

Re: [PATCH 4/8] LoongArch: Simplify {lsx_,lasx_x}hv{add,sub}w description

2025-02-11 Thread Lulu Cheng




On 2025/2/11 at 4:37 PM, Xi Ruoyao wrote:

On Tue, 2025-02-11 at 15:48 +0800, Lulu Cheng wrote:

Hi,

   I think the "{lsx_,lasx_x}hv{add,sub}w" in the title should be
"{lsx_,lasx_x}vh{add,sub}w".

Indeed.


On 2025/2/7 at 8:09 PM, Xi Ruoyao wrote:

Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates and TImode RTL instead of hard-coded const
vectors
and UNSPECs.

/* snip */

+(define_insn "simd_hw__"
+  [(set (match_operand: 0 "register_operand" "=f")
+   (addsub:
+     (vec_select:
+       (any_extend:

Does the order of any_extend affect the code generation?

I'm not sure but I think it makes sense to keep the select/extend order
consistent for LoongArch, thus I'll make any_extend out of vec_select in
the next version of the series.  I just didn't really notice the order
difference when I wrote this.

In the later stages, we will continue to test whether the order of
select/extend affects performance.

Re: [PATCH] x86: Correct ASM_OUTPUT_SYMBOL_REF

2025-02-11 Thread H.J. Lu

On Tue, Feb 11, 2025 at 3:12 PM Uros Bizjak  wrote:
>
> On Tue, Feb 11, 2025 at 7:13 AM H.J. Lu  wrote:
> >
> > x is not a macro argument.  It just happens to work as final.cc passes
> > x for 2nd argument:
> >
> > final.cc:  ASM_OUTPUT_SYMBOL_REF (file, x);
> >
> > PR target/118825
> > * config/i386/i386.h (ASM_OUTPUT_SYMBOL_REF): Replace x with
> > SYM.
>
> > -  = assemble_name_resolve (XSTR (x, 0)); \
> > +  = assemble_name_resolve (XSTR ((SYM), 0)); \
>
> No need for parenthesis when macro argument is used in a function call.
>
> OK with the above change.

Fixed.  Pushed.  Will backport it to release branches later.
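
The bug class fixed here (a macro body naming a caller's variable instead of its own parameter) can be shown with a small self-contained sketch. Everything below is hypothetical and for illustration only: `struct node`, `NAME_LEN_BUGGY` and `NAME_LEN_FIXED` are not GCC code, they just mirror the x-versus-SYM shape described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an rtx-like node; not GCC's real type.  */
struct node { const char *name; };

/* Buggy shape: the body names `x`, which is not a macro parameter, so it
   only expands as intended when the caller's variable happens to be `x`.  */
#define NAME_LEN_BUGGY(SYM) strlen (x->name)

/* Fixed shape: the body uses the parameter SYM.  */
#define NAME_LEN_FIXED(SYM) strlen ((SYM)->name)

size_t
len_via_buggy (const struct node *x, const struct node *other)
{
  (void) other;
  /* Silently measures x, although `other` was passed.  */
  return NAME_LEN_BUGGY (other);
}

size_t
len_via_fixed (const struct node *x, const struct node *other)
{
  (void) x;
  /* Measures the argument that was actually passed.  */
  return NAME_LEN_FIXED (other);
}
```

With two nodes named "ab" and "abcde", `len_via_buggy (&a, &b)` measures the first node while `len_via_fixed (&a, &b)` measures the second, which is why the original macro only worked as long as every caller passed a variable called x.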

Thanks.

-- 
H.J.

Re: GCN, nvptx: 'sorry, unimplemented: exception handling not supported'

2025-02-11 Thread Thomas Schwinge

Hi!

On 2025-02-08T13:17:55+0100, I wrote:
> pushed to trunk branch commit 6312165650091a4df34668d8e2aaa0bbc4008a66
> "GCN, nvptx: 'sorry, unimplemented: exception handling not supported'"

> For GCN, this avoids ICEs further down the compilation pipeline.

For the record, in case that's helpful later on, here's a note from
~2023-04:

| Before [...], we got a lot of ICEs in 'g++' testing for '-fexceptions' etc.
| For example, 'g++.dg/pr49847.C':
| 
| $ build-gcc/gcc/xg++ -Bbuild-gcc/gcc/ source-gcc/gcc/testsuite/g++.dg/pr49847.C -std=gnu++98 -O -fnon-call-exceptions -Wno-return-type -S -o pr49847.s
| during RTL pass: jump
| source-gcc/gcc/testsuite/g++.dg/pr49847.C: In function ‘int f(float)’:
| source-gcc/gcc/testsuite/g++.dg/pr49847.C:7:1: internal compiler error: Segmentation fault
| 7 | }
|   | ^
| 0x1216aaf crash_signal
| [...]/gcc/toplev.cc:314
| 0x1ef5d23 count_reg_usage
| [...]/gcc/cse.cc:6757
| 0x1ef5f0a count_reg_usage
| [...]/gcc/cse.cc:6797
| 0x1efba4c delete_trivially_dead_insns(rtx_insn*, int)
| [...]/gcc/cse.cc:7028
| 0x1ea4e36 execute
| [...]/gcc/cfgcleanup.cc:3237
| 
| ..., and likewise for a lot of other 'g++' test cases, but also
| 'gcc.dg/pr104464.c', 'gcc.dg/uninit-pr106881.c', 'gcc.dg/torture/pr105484.c'.
| 
| The SIGSEGV is due to 'REGNO ("reg:DI -1") == INVALID_REGNUM', and
| 'INVALID_REGNUM == (~(unsigned int) 0)', which doesn't work in
| 'count_reg_usage':
| 
| counts[REGNO (x)] += incr;
| 
| Trying to work around this locally is not sufficient; further ICEs down the
| line.
| 
| The 'INVALID_REGNUM' is due to 'gcc/defaults.h:EH_RETURN_DATA_REGNO'.


Grüße
 Thomas

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-11 Thread H.J. Lu

On Tue, Feb 11, 2025 at 4:13 PM Hongtao Liu  wrote:
>
> > PR117081 is about regression in povray. The reducted testcase:
> Just for clarification. PR117081 is not about regression in povray.
> it's related to FAIL: gcc.target/i386/pr91384.c scan-assembler-not
> testl
> The pr91384.c is added by r12-7417 which is peephole optimization
> expecting some specific instruction sequence, the regression can be
> fixed by adding new peephole pattern.
>
> H.J patch actually regressed povray by introducing extra push/pops
> (since it adds preference for callee save registers, in the benchmark
> using caller saved registers is much better).
> Sorry, I may not have been clear in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117081#c9
>

My patch doesn't change the codegen for that code as shown by

commit 846837c2406ae7a52d9123b29c13e4b8b9d14224
Author: H.J. Lu 
Date:   Fri Feb 7 13:49:30 2025 +0800

x86: Verify that PUSH/POP can be skipped


-- 
H.J.

[PATCH 1/2] libcpp: Fix handling of `deferred` pragmas with -traditional [PR79516]

2025-02-11 Thread Andrew Pinski

The problem here is that, with deferred pragmas, libcpp injects a
PRAGMA_EOL before the end of the line in the token stream, but the
traditional cpp path does not use that path except when dealing with
directives.  In this case we call out to handle the `#if` directive and
that token got added due to the change of line number.  So at the end of
a directive, we need to set in_deferred_pragma to false, as the
traditional cpp path handles the new line itself.

Bootstrapped and tested on x86_64-linux.

PR preprocessor/79516

libcpp/ChangeLog:

* directives.cc (end_directive): Also
set in_deferred_pragma to false with traditional cpp.

gcc/testsuite/ChangeLog:

* c-c++-common/cpp/pragma-message-trad.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/c-c++-common/cpp/pragma-message-trad.c | 9 +
 libcpp/directives.cc | 2 ++
 2 files changed, 11 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/cpp/pragma-message-trad.c

diff --git a/gcc/testsuite/c-c++-common/cpp/pragma-message-trad.c b/gcc/testsuite/c-c++-common/cpp/pragma-message-trad.c
new file mode 100644
index 000..0478e6fc7c7
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cpp/pragma-message-trad.c
@@ -0,0 +1,9 @@
+/* { dg-do preprocess } */
+/* { dg-options "-traditional-cpp" } */
+/* PR preprocessor/79516 */
+
+#pragma message "OK"
+
+#if 0
+#pragma message ("Not printed")
+#endif
diff --git a/libcpp/directives.cc b/libcpp/directives.cc
index 6b0d691f491..9c0f77ab017 100644
--- a/libcpp/directives.cc
+++ b/libcpp/directives.cc
@@ -323,6 +323,8 @@ end_directive (cpp_reader *pfile, int skip_line)
   /* Revert change of prepare_directive_trad.  */
   if (!pfile->state.in_deferred_pragma)
pfile->state.prevent_expansion--;
+  /* No longer inside a deferred pragma. */
+  pfile->state.in_deferred_pragma = false;
 
   if (pfile->directive != &dtable[T_DEFINE])
_cpp_remove_overlay (pfile);
-- 
2.43.0

[PATCH] s390: Fix s390_valid_shift_count() for TI mode [PR118835]

2025-02-11 Thread Stefan Schulze Frielinghaus

During combine we may end up with

(set (reg:DI 66 [ _6 ])
 (ashift:DI (reg:DI 72 [ x ])
(subreg:QI (and:TI (reg:TI 67 [ _1 ])
   (const_wide_int 0x0aabf))
   15)))

where the shift count operand does not trivially fit the scheme of
address operands.  Reject those operands, especially since
strip_address_mutations() expects expressions of the form
(and ... (const_int ...)) and fails for (and ... (const_wide_int ...)).
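
The order of checks matters here: the wide constant has to be rejected before its value is read as a single host word. A minimal sketch of that ordering, using a hypothetical tagged constant rather than GCC's rtx (`struct constant`, `CONST_NARROW`, `CONST_WIDE` and `mask_is_acceptable` are all invented for illustration):

```c
#include <stdbool.h>

/* Hypothetical tagged constant: not GCC's rtx.  A wide constant does not
   fit in one host word, so reading `value` from it would be meaningless.  */
enum const_kind { CONST_NARROW, CONST_WIDE };

struct constant
{
  enum const_kind kind;
  long long value;	/* Only valid when kind == CONST_NARROW.  */
};

/* Accept IMM as an AND mask only if it is a narrow constant that keeps
   every bit of IMPLICIT_MASK (when an implicit mask is given).  */
bool
mask_is_acceptable (const struct constant *imm, long long implicit_mask)
{
  /* Reject wide constants first, before touching the narrow value.  */
  if (imm->kind == CONST_WIDE)
    return false;

  long long val = imm->value;
  if (implicit_mask > 0 && (val & implicit_mask) != implicit_mask)
    return false;

  return true;
}
```

In this model a wide constant is always refused, a narrow 0xff passes with an implicit mask of 63, and a narrow 0x0f is refused with the same mask because it drops bits of it.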

While on it, fix indentation of the if block.

gcc/ChangeLog:

PR target/118835
* config/s390/s390.cc (s390_valid_shift_count): Reject shift
count operands which do not trivially fit the scheme of
address operands.

gcc/testsuite/ChangeLog:

* gcc.target/s390/pr118835.c: New test.
---
 Bootstrap and regtest are still running.  Assuming they finish without
 regressions and there are no further comments, I will push this.

 gcc/config/s390/s390.cc  | 37 ++--
 gcc/testsuite/gcc.target/s390/pr118835.c | 21 ++
 2 files changed, 43 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/pr118835.c

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 1d96df49fea..c2636c54613 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -3510,26 +3510,33 @@ s390_valid_shift_count (rtx op, HOST_WIDE_INT implicit_mask)
 
   /* Check for an and with proper constant.  */
   if (GET_CODE (op) == AND)
-  {
-rtx op1 = XEXP (op, 0);
-rtx imm = XEXP (op, 1);
+{
+  rtx op1 = XEXP (op, 0);
+  rtx imm = XEXP (op, 1);
 
-if (GET_CODE (op1) == SUBREG && subreg_lowpart_p (op1))
-  op1 = XEXP (op1, 0);
+  if (GET_CODE (op1) == SUBREG && subreg_lowpart_p (op1))
+   op1 = XEXP (op1, 0);
 
-if (!(register_operand (op1, GET_MODE (op1)) || GET_CODE (op1) == PLUS))
-  return false;
+  if (!(register_operand (op1, GET_MODE (op1)) || GET_CODE (op1) == PLUS))
+   return false;
 
-if (!immediate_operand (imm, GET_MODE (imm)))
-  return false;
+  if (!immediate_operand (imm, GET_MODE (imm)))
+   return false;
 
-HOST_WIDE_INT val = INTVAL (imm);
-if (implicit_mask > 0
-   && (val & implicit_mask) != implicit_mask)
-  return false;
+  /* Reject shift count operands which do not trivially fit the scheme of
+address operands.  Especially since strip_address_mutations() expects
+expressions of the form (and ... (const_int ...)) and fails for
+(and ... (const_wide_int ...)).  */
+  if (CONST_WIDE_INT_P (imm))
+   return false;
 
-op = op1;
-  }
+  HOST_WIDE_INT val = INTVAL (imm);
+  if (implicit_mask > 0
+ && (val & implicit_mask) != implicit_mask)
+   return false;
+
+  op = op1;
+}
 
   /* Check the rest.  */
   return s390_decompose_addrstyle_without_index (op, NULL, NULL);
diff --git a/gcc/testsuite/gcc.target/s390/pr118835.c b/gcc/testsuite/gcc.target/s390/pr118835.c
new file mode 100644
index 000..1ca6cd95543
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/pr118835.c
@@ -0,0 +1,21 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+
+/* During combine we may end up with patterns of the form
+
+   (set (reg:DI 66 [ _6 ])
+(ashift:DI (reg:DI 72 [ x ])
+   (subreg:QI (and:TI (reg:TI 67 [ _1 ])
+  (const_wide_int 0x0aabf))
+  15)))
+
+   which should be rejected since the shift count does not trivially fit the
+   scheme of address operands.  */
+
+long
+test (long x, int y)
+{
+  __int128 z = 0xAABF;
+  z &= y;
+  return x << z;
+}
-- 
2.47.0

Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103]

2025-02-11 Thread Jeff Law





On 2/11/25 9:08 AM, Richard Sandiford wrote:

Jeff Law  writes:

On 2/7/25 5:59 AM, Andrew Waterman wrote:

This patch runs counter to the ABI spec, which states that vxrm is not
preserved across calls and is volatile upon function entry [1].  vxrm
does not play the same role as frm plays in the calling convention.
(I won't get into the rationale in this email, but the rationale isn't
especially important: we should follow the ABI.)

[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/3a79e936eec5491078b1133ac943f91ef5fd75fd/riscv-cc.adoc?plain=1#L119-L120

Pan's patch doesn't change the basic property that VXRM has no known
state at function entry or upon return from a function call.


I think it will.  global_regs[X] means that X is defined on entry,
defined on exit, and can be changed by calls.  If the register is
call-clobbered/volatile/caller-saved, then I agree with Andrew that
this doesn't look like the right fix.

But the LCM code we use to manage vxrm assignments makes no assumption
about incoming state and assumes no state is preserved across calls.


Essentially
jeff

[PATCH] c++: Fix up regressions caused by for/while loops with declarations [PR118822]

2025-02-11 Thread Jakub Jelinek

Hi!

The recent PR86769 r15-7426 changes regressed the following two testcases,
the first one is more important as it is derived from real-world code.

The first problem is that the chosen
  prep = do_pushlevel (sk_block);
  // emit something
  body = push_stmt_list ();
  // emit further stuff
  body = pop_stmt_list (body);
  prep = do_poplevel (prep);
way of constructing the {FOR,WHILE}_COND_PREP and {FOR,WHILE}_BODY
isn't reliable.  If during parsing a label is seen in the body and then
some decl with destructors, sk_cleanup transparent scope is added, but
the corresponding result from push_stmt_list is saved in
*current_binding_level and pop_stmt_list then pops even that statement list
but only do_poplevel actually attempts to pop the sk_cleanup scope and so we
ICE.
The reason for not doing do_pushlevel (sk_block); do_pushlevel (sk_block);
is that variables should be in the same scope (otherwise various e.g.
redeclaration*.C tests FAIL) and doing do_pushlevel (sk_block); do_pushlevel
(sk_cleanup); wouldn't work either as do_poplevel would silently unwind even
the cleanup one.

The second problem is that my assumption that the declaration in the
condition will have zero or one cleanup is just wrong, at least for
structured bindings used as condition, there can be as many cleanups as
there are names in the binding + 1.

So, the following patch changes the earlier approach.  Nothing is removed
from the {FOR,WHILE}_COND_PREP subtrees while doing adjust_loop_decl_cond,
push_stmt_list isn't called either; all it does is remember as an integer
the number of cleanups (CLEANUP_STMT at the end of the STATEMENT_LISTs)
from querying stmt_list_stack and finding the initial *body_p in there
(that integer is stored into {FOR,WHILE}_COND_CLEANUP), and temporarily
{FOR,WHILE}_BODY is set to the last statement (if any) in the innermost
STATEMENT_LIST at the adjust_loop_decl_cond time; then at
finish_{for,while}_stmt a new finish_loop_cond_prep routine takes care of
do_poplevel for the scope (which is in {FOR,WHILE}_COND_PREP) and finds
given {FOR,WHILE}_COND_CLEANUP number and {FOR,WHILE}_BODY tree the right
spot where body statements start and moves that into {FOR,WHILE}_BODY.
Finally genericize_c_loop then inserts the cond, body, continue label, expr
into the right subtree of {FOR,WHILE}_COND_PREP.
The constexpr evaluation unfortunately had to be changed as well, because
we don't want to evaluate everything in BIND_EXPR_BODY (*_COND_PREP ())
right away, we want to evaluate it with the exception of the CLEANUP_STMT
cleanups at the end (given {FOR,WHILE}_COND_CLEANUP levels), and defer
the evaluation of the cleanups until after cond, body, expr are evaluated.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-02-11  Jakub Jelinek  

PR c++/118822
PR c++/118833
gcc/c-family/
* c-common.h (WHILE_COND_CLEANUP): Change description in comment.
(FOR_COND_CLEANUP): Likewise.
* c-gimplify.cc (genericize_c_loop): Adjust for COND_CLEANUP
being CLEANUP_STMT/TRY_FINALLY_EXPR trailing nesting depth
instead of actual cleanup.
gcc/cp/
* semantics.cc (adjust_loop_decl_cond): Allow multiple trailing
CLEANUP_STMT levels in *BODY_P.  Set *CLEANUP_P to the number
of levels rather than one particular cleanup, keep the cleanups
in *PREP_P.  Set *BODY_P to the last stmt in the cur_stmt_list
or NULL if *CLEANUP_P and the innermost cur_stmt_list is empty.
(finish_loop_cond_prep): New function.
(finish_while_stmt, finish_for_stmt): Use it.  Don't call
set_one_cleanup_loc.
* constexpr.cc (cxx_eval_loop_expr): Adjust handling of
{FOR,WHILE}_COND_{PREP,CLEANUP}.
gcc/testsuite/
* g++.dg/expr/for9.C: New test.
* g++.dg/cpp26/decomp12.C: New test.

--- gcc/c-family/c-common.h.jj  2025-02-07 17:06:50.777235245 +0100
+++ gcc/c-family/c-common.h 2025-02-11 12:12:13.034861256 +0100
@@ -1518,7 +1518,8 @@ extern tree build_userdef_literal (tree
 
 /* WHILE_STMT accessors.  These give access to the condition of the
while statement, the body, and name of the while statement, and
-   condition preparation statements and its cleanup, respectively.  */
+   condition preparation statements and number of its nested cleanups,
+   respectively.  */
 #define WHILE_COND(NODE)   TREE_OPERAND (WHILE_STMT_CHECK (NODE), 0)
 #define WHILE_BODY(NODE)   TREE_OPERAND (WHILE_STMT_CHECK (NODE), 1)
 #define WHILE_NAME(NODE)   TREE_OPERAND (WHILE_STMT_CHECK (NODE), 2)
@@ -1533,7 +1534,8 @@ extern tree build_userdef_literal (tree
 
 /* FOR_STMT accessors.  These give access to the init statement,
condition, update expression, body and name of the for statement,
-   and condition preparation statements and its cleanup, respectively.  */
+   and condition preparation statements and number of its nested cleanups,
+   respectively.  */
 #define FOR_INIT_STMT(NODE)TREE_OPERAND (FOR_STMT_CHECK (NO

[PATCH] c++: Apply/diagnose attributes when instantiating ARRAY/POINTER/REFERENCE_TYPE [PR118787]

2025-02-11 Thread Jakub Jelinek

Hi!

The following testcase IMO in violation of the P2552R3 paper doesn't
pedwarn on alignas applying to dependent types or alignas with dependent
argument.

tsubst was just ignoring TYPE_ATTRIBUTES.

The following patch fixes it for the POINTER/REFERENCE_TYPE and
ARRAY_TYPE cases, but perhaps we need to do the same also for other
types (INTEGER_TYPE/REAL_TYPE and the like).  I guess I'll need to
construct more testcases.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-02-11  Jakub Jelinek  

PR c++/118787
* pt.cc (tsubst) : Use return t; only if it doesn't
have any TYPE_ATTRIBUTES.  Call apply_late_template_attributes.
: Likewise.  Formatting fix.

* g++.dg/cpp0x/alignas22.C: New test.

--- gcc/cp/pt.cc.jj 2025-02-07 17:03:13.560227281 +0100
+++ gcc/cp/pt.cc2025-02-10 17:17:47.65131 +0100
@@ -16854,7 +16854,9 @@ tsubst (tree t, tree args, tsubst_flags_
 case POINTER_TYPE:
 case REFERENCE_TYPE:
   {
-   if (type == TREE_TYPE (t) && TREE_CODE (type) != METHOD_TYPE)
+   if (type == TREE_TYPE (t)
+   && TREE_CODE (type) != METHOD_TYPE
+   && TYPE_ATTRIBUTES (t) == NULL_TREE)
  return t;
 
/* [temp.deduct]
@@ -16924,9 +16926,9 @@ tsubst (tree t, tree args, tsubst_flags_
 A,' while an attempt to create the type type rvalue reference to
 cv T' creates the type T"
  */
- r = cp_build_reference_type
- (TREE_TYPE (type),
-  TYPE_REF_IS_RVALUE (t) && TYPE_REF_IS_RVALUE (type));
+ r = cp_build_reference_type (TREE_TYPE (type),
+  TYPE_REF_IS_RVALUE (t)
+  && TYPE_REF_IS_RVALUE (type));
else
  r = cp_build_reference_type (type, TYPE_REF_IS_RVALUE (t));
r = cp_build_qualified_type (r, cp_type_quals (t), complain);
@@ -16935,6 +16937,11 @@ tsubst (tree t, tree args, tsubst_flags_
  /* Will this ever be needed for TYPE_..._TO values?  */
  layout_type (r);
 
+   if (!apply_late_template_attributes (&r, TYPE_ATTRIBUTES (t),
+/*flags=*/0,
+args, complain, in_decl))
+ return error_mark_node;
+
return r;
   }
 case OFFSET_TYPE:
@@ -17009,7 +17016,9 @@ tsubst (tree t, tree args, tsubst_flags_
 
/* As an optimization, we avoid regenerating the array type if
   it will obviously be the same as T.  */
-   if (type == TREE_TYPE (t) && domain == TYPE_DOMAIN (t))
+   if (type == TREE_TYPE (t)
+   && domain == TYPE_DOMAIN (t)
+   && TYPE_ATTRIBUTES (t) == NULL_TREE)
  return t;
 
/* These checks should match the ones in create_array_type_for_decl.
@@ -17048,6 +17057,11 @@ tsubst (tree t, tree args, tsubst_flags_
TYPE_USER_ALIGN (r) = 1;
  }
 
+   if (!apply_late_template_attributes (&r, TYPE_ATTRIBUTES (t),
+/*flags=*/0,
+args, complain, in_decl))
+ return error_mark_node;
+
return r;
   }
 
--- gcc/testsuite/g++.dg/cpp0x/alignas22.C.jj   2025-02-10 17:33:16.242452750 +0100
+++ gcc/testsuite/g++.dg/cpp0x/alignas22.C  2025-02-10 17:36:28.739046629 +0100
@@ -0,0 +1,23 @@
+// PR c++/118787
+// { dg-do compile { target c++11 } }
+// { dg-options "-pedantic" }
+
+template 
+void foo (T & alignas (N));        // { dg-warning "'alignas' on a type other than class" }
+template 
+void bar (T (&)[N] alignas (N));   // { dg-warning "'alignas' on a type other than class" }
+template 
+using U = T * alignas (N);         // { dg-warning "'alignas' on a type other than class" }
+template 
+using V = T[N] alignas (N);        // { dg-warning "'alignas' on a type other than class" }
+
+void
+baz ()
+{
+  int x alignas (4) = 0;
+  foo  (x);
+  int y alignas (4) [4];
+  bar  (y);
+  U  u;
+  V  v;
+}

Jakub

[PATCH 3/3] LoongArch: After setting the compilation options, update the predefined macros.

2025-02-11 Thread Lulu Cheng

PR target/118828

gcc/ChangeLog:

* config/loongarch/loongarch-c.cc (loongarch_pragma_target_parse):
Update the predefined macros.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/pr118828.c: New test.

Change-Id: I13f7b44b11bba2080db797157a0389cc1bd65ac6
---
 gcc/config/loongarch/loongarch-c.cc   | 14 
 gcc/testsuite/gcc.target/loongarch/pr118828.c | 34 +++
 2 files changed, 48 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828.c

diff --git a/gcc/config/loongarch/loongarch-c.cc b/gcc/config/loongarch/loongarch-c.cc
index 9fe911325ab..83df82c1361 100644
--- a/gcc/config/loongarch/loongarch-c.cc
+++ b/gcc/config/loongarch/loongarch-c.cc
@@ -27,6 +27,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tm.h"
 #include "c-family/c-common.h"
 #include "cpplib.h"
+#include "c-family/c-pragma.h"
 #include "tm_p.h"
 
 #define preprocessing_asm_p() (cpp_get_options (pfile)->lang == CLK_ASM)
@@ -203,6 +204,19 @@ loongarch_pragma_target_parse (tree args, tree pop_target)
 
   loongarch_reset_previous_fndecl ();
 
+  /* For the definitions, ensure all newly defined macros are considered
+ as used for -Wunused-macros.  There is no point warning about the
+ compiler predefined macros.  */
+  cpp_options *cpp_opts = cpp_get_options (parse_in);
+  unsigned char saved_warn_unused_macros = cpp_opts->warn_unused_macros;
+  cpp_opts->warn_unused_macros = 0;
+
+  cpp_force_token_locations (parse_in, BUILTINS_LOCATION);
+  loongarch_update_cpp_builtins (parse_in);
+  cpp_stop_forcing_token_locations (parse_in);
+
+  cpp_opts->warn_unused_macros = saved_warn_unused_macros;
+
   /* If we're popping or reseting make sure to update the globals so that
  the optab availability predicates get recomputed.  */
   if (pop_target)
diff --git a/gcc/testsuite/gcc.target/loongarch/pr118828.c b/gcc/testsuite/gcc.target/loongarch/pr118828.c
new file mode 100644
index 000..abdda24c758
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/pr118828.c
@@ -0,0 +1,34 @@
+/* { dg-do preprocess } */
+/* { dg-options "-mno-lasx" } */
+
+#ifdef __loongarch_asx
+#error LASX should not be available here
+#endif
+
+#ifdef __loongarch_simd_width
+#if __loongarch_simd_width == 256
+#error simd width should not be 256
+#endif
+#endif
+
+#pragma GCC push_options
+#pragma GCC target("lasx")
+#ifndef __loongarch_asx
+#error LASX should be available here
+#endif
+#ifndef __loongarch_simd_width
+#error simd width should be available here
+#elif __loongarch_simd_width != 256
+#error simd width should be 256
+#endif
+#pragma GCC pop_options
+
+#ifdef __loongarch_asx
+#error LASX should become unavailable again
+#endif
+
+#ifdef __loongarch_simd_width
+#if __loongarch_simd_width == 256
+#error simd width should not be 256 again
+#endif
+#endif
-- 
2.34.1

[PATCH 2/3] LoongArch: Split the function loongarch_cpu_cpp_builtins into two functions.

2025-02-11 Thread Lulu Cheng

Split the implementation of the function loongarch_cpu_cpp_builtins into two parts:
  1. Macro definitions that do not change (only considering 64-bit architecture)
  2. Macro definitions that change with different compilation options.
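
The split described above can be modelled with a toy macro table, assuming that the option-dependent half is re-run whenever the active options change. This is a sketch only: `struct macro_table` stands in for the preprocessor state (the real code calls cpp_define and cpp_undef on a cpp_reader), and which macros the real update function toggles is an assumption made for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Toy macro table standing in for the preprocessor state.  */
#define MAX_MACROS 8

struct macro_table
{
  const char *names[MAX_MACROS];
  int count;
};

/* Return the index of NAME in T, or -1 if it is not defined.  */
static int
find_macro (const struct macro_table *t, const char *name)
{
  for (int i = 0; i < t->count; i++)
    if (strcmp (t->names[i], name) == 0)
      return i;
  return -1;
}

bool
macro_defined (const struct macro_table *t, const char *name)
{
  return find_macro (t, name) >= 0;
}

/* Define MACRO when DEF_P is true, otherwise undefine it.  Both
   directions are idempotent, so this can be re-run at any time.  */
void
def_or_undef (bool def_p, const char *macro, struct macro_table *t)
{
  int i = find_macro (t, macro);
  if (def_p)
    {
      if (i < 0 && t->count < MAX_MACROS)
	t->names[t->count++] = macro;
    }
  else if (i >= 0)
    t->names[i] = t->names[--t->count];
}

/* Part 1: macros that never change; meant to run once.  */
void
define_unconditional_macros (struct macro_table *t)
{
  def_or_undef (true, "__loongarch__", t);
}

/* Part 2: macros that follow the current options; meant to be re-run
   after every option change.  */
void
update_cpp_builtins (struct macro_table *t, bool has_lsx, bool has_lasx)
{
  def_or_undef (has_lsx, "__loongarch_sx", t);
  def_or_undef (has_lasx, "__loongarch_asx", t);
}
```

Because the second function undefines as well as defines, calling it again with different flags leaves the table matching the new options, which is the property a later target pragma would rely on.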

gcc/ChangeLog:

* config/loongarch/loongarch-c.cc (builtin_undef): New macro.
(loongarch_cpu_cpp_builtins): Split to loongarch_update_cpp_builtins
and loongarch_define_unconditional_macros.
(loongarch_def_or_undef): New function.
(loongarch_define_unconditional_macros): Likewise.
(loongarch_update_cpp_builtins): Likewise.

Change-Id: Ifae73ffa2a07a595ed2a7f6ab7b82d8f51328a2a
---
 gcc/config/loongarch/loongarch-c.cc | 109 +---
 1 file changed, 66 insertions(+), 43 deletions(-)

diff --git a/gcc/config/loongarch/loongarch-c.cc 
b/gcc/config/loongarch/loongarch-c.cc
index 5d8c02e094b..9fe911325ab 100644
--- a/gcc/config/loongarch/loongarch-c.cc
+++ b/gcc/config/loongarch/loongarch-c.cc
@@ -31,13 +31,21 @@ along with GCC; see the file COPYING3.  If not see
 
 #define preprocessing_asm_p() (cpp_get_options (pfile)->lang == CLK_ASM)
 #define builtin_define(TXT) cpp_define (pfile, TXT)
+#define builtin_undef(TXT) cpp_undef (pfile, TXT)
 #define builtin_assert(TXT) cpp_assert (pfile, TXT)
 
-void
-loongarch_cpu_cpp_builtins (cpp_reader *pfile)
+static void
+loongarch_def_or_undef (bool def_p, const char *macro, cpp_reader *pfile)
+{
+  if (def_p)
+cpp_define (pfile, macro);
+  else
+cpp_undef (pfile, macro);
+}
+
+static void
+loongarch_define_unconditional_macros (cpp_reader *pfile)
 {
-  builtin_assert ("machine=loongarch");
-  builtin_assert ("cpu=loongarch");
   builtin_define ("__loongarch__");
 
   builtin_define_with_value ("__loongarch_arch",
@@ -66,45 +74,6 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile)
   builtin_define ("__loongarch_lp64");
 }
 
-  /* These defines reflect the ABI in use, not whether the
- FPU is directly accessible.  */
-  if (TARGET_DOUBLE_FLOAT_ABI)
-builtin_define ("__loongarch_double_float=1");
-  else if (TARGET_SINGLE_FLOAT_ABI)
-builtin_define ("__loongarch_single_float=1");
-
-  if (TARGET_DOUBLE_FLOAT_ABI || TARGET_SINGLE_FLOAT_ABI)
-builtin_define ("__loongarch_hard_float=1");
-  else
-builtin_define ("__loongarch_soft_float=1");
-
-
-  /* ISA Extensions.  */
-  if (TARGET_DOUBLE_FLOAT)
-builtin_define ("__loongarch_frlen=64");
-  else if (TARGET_SINGLE_FLOAT)
-builtin_define ("__loongarch_frlen=32");
-  else
-builtin_define ("__loongarch_frlen=0");
-
-  if (TARGET_HARD_FLOAT && ISA_HAS_FRECIPE)
-builtin_define ("__loongarch_frecipe");
-
-  if (ISA_HAS_LSX)
-{
-  builtin_define ("__loongarch_simd");
-  builtin_define ("__loongarch_sx");
-
-  if (!ISA_HAS_LASX)
-   builtin_define ("__loongarch_simd_width=128");
-}
-
-  if (ISA_HAS_LASX)
-{
-  builtin_define ("__loongarch_asx");
-  builtin_define ("__loongarch_simd_width=256");
-}
-
   /* ISA evolution features */
   int max_v_major = 1, max_v_minor = 0;
 
@@ -145,7 +114,61 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile)
   builtin_define_with_int_value ("_LOONGARCH_SZPTR", POINTER_SIZE);
   builtin_define_with_int_value ("_LOONGARCH_FPSET", 32);
   builtin_define_with_int_value ("_LOONGARCH_SPFPSET", 32);
+}
+
+static void
+loongarch_update_cpp_builtins (cpp_reader *pfile)
+{
+  builtin_undef ("__loongarch_double_float");
+  builtin_undef ("__loongarch_single_float");
+  /* These defines reflect the ABI in use, not whether the
+ FPU is directly accessible.  */
+  if (TARGET_DOUBLE_FLOAT_ABI)
+builtin_define ("__loongarch_double_float=1");
+  else if (TARGET_SINGLE_FLOAT_ABI)
+builtin_define ("__loongarch_single_float=1");
+
+  builtin_undef ("__loongarch_soft_float");
+  builtin_undef ("__loongarch_hard_float");
+  if (TARGET_DOUBLE_FLOAT_ABI || TARGET_SINGLE_FLOAT_ABI)
+builtin_define ("__loongarch_hard_float=1");
+  else
+builtin_define ("__loongarch_soft_float=1");
+
+
+  /* ISA Extensions.  */
+  if (TARGET_DOUBLE_FLOAT)
+builtin_define ("__loongarch_frlen=64");
+  else if (TARGET_SINGLE_FLOAT)
+builtin_define ("__loongarch_frlen=32");
+  else
+builtin_define ("__loongarch_frlen=0");
+
+  loongarch_def_or_undef (TARGET_HARD_FLOAT && ISA_HAS_FRECIPE,
+ "__loongarch_frecipe", pfile);
+
+  loongarch_def_or_undef (ISA_HAS_LSX, "__loongarch_simd", pfile);
+  loongarch_def_or_undef (ISA_HAS_LSX, "__loongarch_sx", pfile);
+  loongarch_def_or_undef (ISA_HAS_LASX, "__loongarch_asx", pfile);
+
+  builtin_undef ("__loongarch_simd_width");
+  if (ISA_HAS_LSX)
+{
+  if (ISA_HAS_LASX)
+   builtin_define ("__loongarch_simd_width=256");
+  else
+   builtin_define ("__loongarch_simd_width=128");
+}
+}
+
+void
+loongarch_cpu_cpp_builtins (cpp_reader *pfile)
+{
+  builtin_assert ("machine=loongarch");
+  builtin_assert ("cpu=loongarch");
 
+  loongarch_define_unc

[PATCH 0/3] Organize the code and fix PR118828.

2025-02-11 Thread Lulu Cheng

Refer to the implementation of aarch64 to fix PR118828.

Lulu Cheng (3):
  LoongArch: Move the function loongarch_register_pragmas to
loongarch-c.cc.
  LoongArch: Split the function loongarch_cpu_cpp_builtins into two
functions.
  LoongArch: After setting the compilation options, update the
predefined macros.

 gcc/config/loongarch/loongarch-c.cc   | 174 +-
 gcc/config/loongarch/loongarch-protos.h   |   1 +
 gcc/config/loongarch/loongarch-target-attr.cc |  48 -
 gcc/testsuite/gcc.target/loongarch/pr118828.c |  34 
 4 files changed, 166 insertions(+), 91 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828.c

-- 
2.34.1

[PATCH 1/3] LoongArch: Move the function loongarch_register_pragmas to loongarch-c.cc.

2025-02-11 Thread Lulu Cheng

gcc/ChangeLog:

* config/loongarch/loongarch-target-attr.cc
(loongarch_pragma_target_parse): Move to ...
(loongarch_register_pragmas): Move to ...
* config/loongarch/loongarch-c.cc
(loongarch_pragma_target_parse): ... here.
(loongarch_register_pragmas): ... here.
* config/loongarch/loongarch-protos.h
(loongarch_process_target_attr): Function Declaration.

Change-Id: I12751a6ce2f1b2f587699db3c80188066f193d2d
---
 gcc/config/loongarch/loongarch-c.cc   | 51 +++
 gcc/config/loongarch/loongarch-protos.h   |  1 +
 gcc/config/loongarch/loongarch-target-attr.cc | 48 -
 3 files changed, 52 insertions(+), 48 deletions(-)

diff --git a/gcc/config/loongarch/loongarch-c.cc 
b/gcc/config/loongarch/loongarch-c.cc
index c95c0f373be..5d8c02e094b 100644
--- a/gcc/config/loongarch/loongarch-c.cc
+++ b/gcc/config/loongarch/loongarch-c.cc
@@ -23,9 +23,11 @@ along with GCC; see the file COPYING3.  If not see
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
+#include "target.h"
 #include "tm.h"
 #include "c-family/c-common.h"
 #include "cpplib.h"
+#include "tm_p.h"
 
 #define preprocessing_asm_p() (cpp_get_options (pfile)->lang == CLK_ASM)
 #define builtin_define(TXT) cpp_define (pfile, TXT)
@@ -145,3 +147,52 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile)
   builtin_define_with_int_value ("_LOONGARCH_SPFPSET", 32);
 
 }
+
+/* Hook to validate the current #pragma GCC target and set the state, and
+   update the macros based on what was changed.  If ARGS is NULL, then
+   POP_TARGET is used to reset the options.  */
+
+static bool
+loongarch_pragma_target_parse (tree args, tree pop_target)
+{
+  /* If args is not NULL then process it and setup the target-specific
+ information that it specifies.  */
+  if (args)
+{
+  if (!loongarch_process_target_attr (args, NULL))
+   return false;
+
+  loongarch_option_override_internal (&la_target,
+ &global_options,
+ &global_options_set);
+}
+
+  /* args is NULL, restore to the state described in pop_target.  */
+  else
+{
+  pop_target = pop_target ? pop_target : target_option_default_node;
+  cl_target_option_restore (&global_options, &global_options_set,
+   TREE_TARGET_OPTION (pop_target));
+}
+
+  target_option_current_node
+= build_target_option_node (&global_options, &global_options_set);
+
+  loongarch_reset_previous_fndecl ();
+
+  /* If we're popping or reseting make sure to update the globals so that
+ the optab availability predicates get recomputed.  */
+  if (pop_target)
+loongarch_save_restore_target_globals (pop_target);
+
+  return true;
+}
+
+/* Implement REGISTER_TARGET_PRAGMAS.  */
+
+void
+loongarch_register_pragmas (void)
+{
+  /* Update pragma hook to allow parsing #pragma GCC target.  */
+  targetm.target_option.pragma_parse = loongarch_pragma_target_parse;
+}
diff --git a/gcc/config/loongarch/loongarch-protos.h 
b/gcc/config/loongarch/loongarch-protos.h
index 94d3e33cb9a..9659d5ae26e 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -222,4 +222,5 @@ extern void loongarch_save_restore_target_globals (tree 
new_tree);
 extern void loongarch_register_pragmas (void);
 extern rtx loongarch_gen_stepped_int_parallel (unsigned int nelts, int base,
   int step);
+extern bool loongarch_process_target_attr (tree args, tree fndecl);
 #endif /* ! GCC_LOONGARCH_PROTOS_H */
diff --git a/gcc/config/loongarch/loongarch-target-attr.cc 
b/gcc/config/loongarch/loongarch-target-attr.cc
index cee7031ca1e..cb537446dff 100644
--- a/gcc/config/loongarch/loongarch-target-attr.cc
+++ b/gcc/config/loongarch/loongarch-target-attr.cc
@@ -422,51 +422,3 @@ loongarch_option_valid_attribute_p (tree fndecl, tree, 
tree args, int)
   return ret;
 }
 
-/* Hook to validate the current #pragma GCC target and set the state, and
-   update the macros based on what was changed.  If ARGS is NULL, then
-   POP_TARGET is used to reset the options.  */
-
-static bool
-loongarch_pragma_target_parse (tree args, tree pop_target)
-{
-  /* If args is not NULL then process it and setup the target-specific
- information that it specifies.  */
-  if (args)
-{
-  if (!loongarch_process_target_attr (args, NULL))
-   return false;
-
-  loongarch_option_override_internal (&la_target,
- &global_options,
- &global_options_set);
-}
-
-  /* args is NULL, restore to the state described in pop_target.  */
-  else
-{
-  pop_target = pop_target ? pop_target : target_option_default_node;
-  cl_target_option_restore (&global_options, &global_options_set,
-   TREE_TARGET_OPTION (pop_target));
-}
-
-  target_optio

[pushed: r15-7474] sarif-replay: fix off-by-one in handling of "endColumn" (§3.30.8) [PR118792]

2025-02-11 Thread David Malcolm

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-7474-ge8c5013b6b7820.

gcc/ChangeLog:
PR sarif-replay/118792
* libsarifreplay.cc (sarif_replayer::handle_region_object): Fix
off-by-one in handling of endColumn property so that the code
matches the comment and the SARIF spec (§3.30.8).

gcc/testsuite/ChangeLog:
PR sarif-replay/118792
* sarif-replay.dg/2.1.0-valid/error-with-note.sarif: Update
expected output to reflect fix to off-by-one error in handling of
"endColumn" property.
* sarif-replay.dg/2.1.0-valid/malloc-vs-local-4.c.sarif: Likewise.
* sarif-replay.dg/2.1.0-valid/signal-1.c.moved.sarif: Likewise.
* sarif-replay.dg/2.1.0-valid/signal-1.c.sarif: Likewise.
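
For readers skimming the fix: the conversion involved can be sketched as below.  This is a standalone illustration, not the code in libsarifreplay.cc; the helper names and the column numbers in the checks are made up for the example.

```c
#include <assert.h>

/* Hypothetical helper: SARIF's endColumn is one past the final column of
   the region (exclusive), whereas GCC's end column is inclusive.  */
static int
sarif_end_column_to_gcc (int sarif_end_column)
{
  return sarif_end_column - 1;
}

/* Number of columns underlined for a single-line region that starts at
   START_COLUMN and has the given SARIF endColumn.  */
static int
region_width (int start_column, int sarif_end_column)
{
  return sarif_end_column_to_gcc (sarif_end_column) - start_column + 1;
}

/* Illustrative values only: a 4-column token starting at column 8.  */
static void
check_end_column (void)
{
  assert (sarif_end_column_to_gcc (12) == 11);
  assert (region_width (8, 12) == 4);
}
```

Without the "- 1" the underline would extend one column too far, which is what the updated expected outputs reflect.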

Signed-off-by: David Malcolm 
---
 gcc/libsarifreplay.cc |  2 +-
 .../2.1.0-valid/error-with-note.sarif |  4 ++--
 .../2.1.0-valid/malloc-vs-local-4.c.sarif | 24 +--
 .../2.1.0-valid/signal-1.c.moved.sarif| 14 +--
 .../2.1.0-valid/signal-1.c.sarif  | 14 +--
 5 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/gcc/libsarifreplay.cc b/gcc/libsarifreplay.cc
index 61d9565588e..71f80797926 100644
--- a/gcc/libsarifreplay.cc
+++ b/gcc/libsarifreplay.cc
@@ -1739,7 +1739,7 @@ handle_region_object (const json::object &region_obj,
  /* SARIF's endColumn is 1 beyond the final column in the region,
 whereas GCC's end columns are inclusive.  */
  end = m_output_mgr.new_location_from_file_line_column
-   (file, end_line, end_column_jnum->get ());
+   (file, end_line, end_column_jnum->get () - 1);
}
   else
{
diff --git a/gcc/testsuite/sarif-replay.dg/2.1.0-valid/error-with-note.sarif 
b/gcc/testsuite/sarif-replay.dg/2.1.0-valid/error-with-note.sarif
index 0d75a693cdf..77d5a4ee181 100644
--- a/gcc/testsuite/sarif-replay.dg/2.1.0-valid/error-with-note.sarif
+++ b/gcc/testsuite/sarif-replay.dg/2.1.0-valid/error-with-note.sarif
@@ -26,12 +26,12 @@
 /* { dg-begin-multiline-output "" }
 /this/does/not/exist/test.bas:2:8: error: 'GOTO' is considered harmful
 2 |GOTO label
-  |^~  
+  |^~ 
{ dg-end-multiline-output "" } */
 /* { dg-begin-multiline-output "" }
 /this/does/not/exist/test.bas:1:1: note: this is the target of the 'GOTO'
 1 | label: PRINT "hello world!"
-  | ^~
+  | ^
{ dg-end-multiline-output "" } */
 
 // TODO: trailing [error]
diff --git 
a/gcc/testsuite/sarif-replay.dg/2.1.0-valid/malloc-vs-local-4.c.sarif 
b/gcc/testsuite/sarif-replay.dg/2.1.0-valid/malloc-vs-local-4.c.sarif
index 55c646bb5ad..947d65c6a7e 100644
--- a/gcc/testsuite/sarif-replay.dg/2.1.0-valid/malloc-vs-local-4.c.sarif
+++ b/gcc/testsuite/sarif-replay.dg/2.1.0-valid/malloc-vs-local-4.c.sarif
@@ -339,37 +339,37 @@
 In function 'callee_1':
 /not/a/real/path/malloc-vs-local-4.c:5:3: warning: dereference of 
possibly-NULL ‘ptr’ [-Wanalyzer-possible-null-dereference]
 5 |   *ptr = 42;
-  |   ^~
+  |   ^
   'test_1': events 1-5
 |
 |8 | int test_1 (int i, int flag)
-|  | ^~~
+|  | ^~
 |  | |
 |  | (1) entry to ‘test_1’
 |..
 |   12 |   if (flag)
-|  |  ~~
+|  |  ~
 |  |  |
 |  |  (2) following ‘true’ branch (when ‘flag != 0’)...
 |   13 | ptr = (int *)malloc (sizeof (int));
-|  |  ~~
+|  |  ~
 |  |  |
 |  |  (3) ...to here
 |  |  (4) this call could return NULL
 |   14 |   callee_1 (ptr);
-|  |   ~~~
+|  |   ~~
 |  |   |
 |  |   (5) calling ‘callee_1’ from ‘test_1’
 |
 +--> 'callee_1': events 6-7
|
|3 | void __attribute__((noinline)) callee_1 (int *ptr)
-   |  |^
+   |  |^~~~
|  ||
|  |(6) entry to ‘callee_1’
|4 | {
|5 |   *ptr = 42;
-   |  |   ~~
+   |  |   ~ 
|  |   |
|  |   (7) ‘ptr’ could be NULL: unchecked value from (4)
|
@@ -378,24 +378,24 @@ In function 'callee_1':
 In function 'test_2':
 /not/a/real/path/malloc-vs-local-4.c:38:7: warning: double-‘free’ of ‘ptr’ 
[-Wanalyzer-double-free]
38 |   free (ptr);
-  |   ^~~
+  |   ^~
   'test_2': events 1-5
34 |   if (!flag)
-  |  ^~
+  |  ^
   |  |

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-11 Thread Vladimir Makarov




On 2/7/25 12:18 PM, Richard Sandiford wrote:

FWIW, here's a very rough initial version of the kind of thing
I was thinking about.  Hopefully the hook documentation describes
the approach.  It's deliberately (overly?) flexible.

I've included an aarch64 version that (a) models the fact that the
first caller-save can also allocate the frame more-or-less for free,
and (b) once we've saved an odd number of GPRs, saving one more is
essentially free.  I also hacked up an x86 version locally to model
the allocation benefits of using caller-saved registers.  It seemed
to fix the povray example above.

This still needs a lot of clean-up and testing, but I thought I might
as well send what I have before leaving for the weekend.  Does it look
reasonable in principle?

Richard, thank you for continuing to work on this problem.  These hooks and 
their implementation make much more sense to me, although it is 
difficult to predict whether they will solve all of the existing related PRs.  You 
definitely have my approval for the hooks if you manage to avoid 
new GCC testsuite failures with them on x86-64, aarch64, and ppc64.
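
Point (b) in the quoted text can be pictured with a toy cost model.  This is only a sketch under the assumption that GPRs are saved two at a time by a single store; it is not the proposed hook or the aarch64 implementation, and the function names are hypothetical.

```c
#include <assert.h>

/* Toy model (an assumption, not the proposed hook): callee-saved GPRs are
   stored two at a time, so NREGS registers need ceil (NREGS / 2) stores.  */
static int
pair_store_count (int nregs)
{
  return (nregs + 1) / 2;
}

/* Extra stores needed to save one more register on top of NREGS.  */
static int
marginal_save_cost (int nregs)
{
  return pair_store_count (nregs + 1) - pair_store_count (nregs);
}

static void
check_marginal_save_cost (void)
{
  assert (marginal_save_cost (0) == 1);  /* First save needs a store.  */
  assert (marginal_save_cost (1) == 0);  /* Odd count: next one is free.  */
  assert (marginal_save_cost (2) == 1);
  assert (marginal_save_cost (3) == 0);
}
```

Under this model the marginal cost alternates between one store and zero, which is the effect a flat per-register cost cannot express.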

[RFA][PR tree-optimization/98028] Use relationship between operands to simplify SUB_OVERFLOW

2025-02-11 Thread Jeff Law

So this is a fairly old regression, but with all the ranger work that's 
been done, it's become easy to resolve.


The basic idea here is to use known relationships between two operands 
of a SUB_OVERFLOW IFN to statically compute the overflow state and 
ultimately allow turning the IFN into simple arithmetic (or for the 
tests in this BZ elide the arithmetic entirely).


The regression example is when the two inputs are known equal.  In that 
case the subtraction will never overflow.  But there are a few other 
cases we can handle as well.


a == b -> never overflows
a > b  -> never overflows when A and B are unsigned
a >= b -> never overflows when A and B are unsigned
a < b  -> always overflows when A and B are unsigned
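
The unsigned rows of that table can be spot-checked at run time with a standalone sketch like the one below.  It is not part of the patch; the helper name usub_overflows is made up, and it relies only on GCC's __builtin_sub_overflow.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: report whether A - B wraps for unsigned operands.  */
static bool
usub_overflows (unsigned a, unsigned b)
{
  unsigned res;
  return __builtin_sub_overflow (a, b, &res);
}

/* Spot-check the relations listed above for unsigned operands.  */
static void
check_relations (void)
{
  assert (!usub_overflows (7u, 7u));  /* a == b: never overflows.  */
  assert (!usub_overflows (9u, 7u));  /* a > b: never overflows.  */
  assert (!usub_overflows (0u, 0u));  /* a >= b, equal case.  */
  assert (usub_overflows (7u, 9u));   /* a < b: always overflows.  */
}
```

The patch makes the same decisions at compile time, from the relation the ranger already knows, rather than from concrete values.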

Bootstrapped and regression tested on x86, and regression tested on the 
usual cross platforms.


OK for the trunk?

Jeff

PR tree-optimization/98028
gcc/
* vr-values.cc (check_for_binary_op_overflow): Try to use a known
relationship between op0/op1 to statically determine overflow state.

gcc/testsuite
* gcc.dg/tree-ssa/pr98028.c: New test.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr98028.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr98028.c
new file mode 100644
index 000..4e371b69235
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr98028.c
@@ -0,0 +1,26 @@
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+unsigned f1(unsigned i, unsigned j) {
+  if (j != i) __builtin_unreachable();
+  return __builtin_sub_overflow_p(i, j, (unsigned)0);
+}
+
+unsigned f2(unsigned i, unsigned j) {
+  if (j > i) __builtin_unreachable();
+  return __builtin_sub_overflow_p(i, j, (unsigned)0);
+}
+
+unsigned f3(unsigned i, unsigned j) {
+  if (j >= i) __builtin_unreachable();
+  return __builtin_sub_overflow_p(i, j, (unsigned)0);
+}
+
+unsigned f4(unsigned i, unsigned j) {
+  if (j <= i) __builtin_unreachable();
+  return __builtin_sub_overflow_p(i, j, (unsigned)0);
+}
+
+/* { dg-final { scan-tree-dump-times "return 0" 3 optimized } } */
+/* { dg-final { scan-tree-dump-times "return 1" 1 optimized } } */
+/* { dg-final { scan-tree-dump-not "SUB_OVERFLOW" optimized } } */
+/* { dg-final { scan-tree-dump-not "IMAGPART_EXPR" optimized } } */
diff --git a/gcc/vr-values.cc b/gcc/vr-values.cc
index ed590138fe8..29568e27c38 100644
--- a/gcc/vr-values.cc
+++ b/gcc/vr-values.cc
@@ -85,6 +85,33 @@ check_for_binary_op_overflow (range_query *query,
  enum tree_code subcode, tree type,
  tree op0, tree op1, bool *ovf, gimple *s = NULL)
 {
+  /* For MINUS_EXPR, we may know based the relationship
+ (if any) between op0 and op1.  */
+  if (subcode == MINUS_EXPR)
+{
+  relation_kind rel = query->relation().query (s, op0, op1);
+
+  /* If the operands are equal, then the result will be zero
+and there is never an overflow.  */
+  if (rel == VREL_EQ)
+   return true;
+
+  /* If op0 and op1 are unsigned types, we still have a chance.  */
+  if (TYPE_UNSIGNED (TREE_TYPE (op0)) && TYPE_UNSIGNED (TREE_TYPE (op1)))
+   {
+ /* op0 > op1 or op0 >= op1 never overflows.  */
+ if (rel == VREL_GT || rel == VREL_GE)
+   return true;
+
+ /* And op0 < op1 always overflows.  */
+ if (rel == VREL_LT)
+   {
+ *ovf = true;
+ return true;
+   }
+   }
+}
+   
   int_range_max vr0, vr1;
   if (!query->range_of_expr (vr0, op0, s) || vr0.undefined_p ())
 vr0.set_varying (TREE_TYPE (op0));

Re: [PATCH 2/3] LoongArch: Split the function loongarch_cpu_cpp_builtins into two functions.

2025-02-11 Thread Lulu Cheng




On 2025/2/11 at 9:26 PM, Xi Ruoyao wrote:

On Tue, 2025-02-11 at 20:49 +0800, Lulu Cheng wrote:

Split the implementation of the function loongarch_cpu_cpp_builtins
into two parts:
   1. Macro definitions that do not change (only considering 64-bit
architecture)
   2. Macro definitions that change with different compilation options.

gcc/ChangeLog:

* config/loongarch/loongarch-c.cc (builtin_undef): New macro.
(loongarch_cpu_cpp_builtins): Split to
loongarch_update_cpp_builtins
and loongarch_define_unconditional_macros.
(loongarch_def_or_undef): New functions.
(loongarch_define_unconditional_macros): Likewise.
(loongarch_update_cpp_builtins): Likewise.

Change-Id: Ifae73ffa2a07a595ed2a7f6ab7b82d8f51328a2a
---

/* snip */

I guess the handling for la_evo_macro_name macros (like
__loongarch_div32) and
__loongarch_version_major/__loongarch_version_minor should be moved as
well?  Things like #pragma GCC target("arch=la664") may affect them.


It seems that the following four also need to be updated.  I will make 
corrections in v2 and add the corresponding test cases.


  builtin_define_with_value ("__loongarch_arch",
                             loongarch_arch_strings[la_target.cpu_arch], 1);

  builtin_define_with_value ("__loongarch_tune",
                             loongarch_tune_strings[la_target.cpu_tune], 1);

  builtin_define_with_value ("_LOONGARCH_ARCH",
                             loongarch_arch_strings[la_target.cpu_arch], 1);

  builtin_define_with_value ("_LOONGARCH_TUNE",
                             loongarch_tune_strings[la_target.cpu_tune], 1);

Re: [PATCH 3/8] LoongArch: Simplify {lsx_,lasx_x}v{add,sub,mul}l{ev,od} description

2025-02-11 Thread Lulu Cheng




On 2025/2/7 at 8:09 PM, Xi Ruoyao wrote:

These pattern definitions are tediously long, invoking 32 UNSPECs and
many hard-coded long const vectors.  To simplify them, at first we use
the TImode vector operations instead of the UNSPECs, then we adopt an
approach in AArch64: using a special predicate to match the const
vectors for odd/even indices for define_insn's, and generate those
vectors in define_expand's.

For "backward compatibility" we need to provide a "punned" version for
the operations invoking TImode vectors as the intrinsics still expect
DImode vectors.

The stat is "201 insertions, 905 deletions."

/* snip */

diff --git a/gcc/config/loongarch/loongarch-modes.def 
b/gcc/config/loongarch/loongarch-modes.def
index e632f03636b..07cc29fceee 100644
--- a/gcc/config/loongarch/loongarch-modes.def
+++ b/gcc/config/loongarch/loongarch-modes.def
@@ -32,6 +32,7 @@ VECTOR_MODES (FLOAT, 8);  /*   V4HF V2SF */
  /* For LARCH LSX 128 bits.  */
  VECTOR_MODES (INT, 16); /* V16QI V8HI V4SI V2DI */
  VECTOR_MODES (FLOAT, 16); /*  V4SF V2DF */
+VECTOR_MODE (INT, TI, 1); /*V1TI */
  
  /* For LARCH LASX 256 bits.  */

  VECTOR_MODES (INT, 32); /* V32QI V16HI V8SI V4DI */


/* For LARCH LASX 256 bits.  */
 - VECTOR_MODES (INT, 32);/* V32QI V16HI V8SI V4DI */

 + VECTOR_MODES (INT, 32);/* V32QI V16HI V8SI V4DI V2TI */

Could you mark V2TI in v2? :-)


@@ -49,6 +50,7 @@ VECTOR_MODE (INT, QI, 64);/* V64QI*/
  VECTOR_MODE (INT, HI, 32);/* V32HI*/
  VECTOR_MODE (INT, SI, 16);/* V16SI*/
  VECTOR_MODE (INT, DI, 8); /* V8DI */
+VECTOR_MODE (INT, TI, 4); /* V4TI */
  VECTOR_MODE (FLOAT, SF, 16);  /* V16SF*/
  VECTOR_MODE (FLOAT, DF, 8);   /* V8DF */

[COMMITTED] RISC-V: Vector pesudoinsns with x0 operand to use imm 0

2025-02-11 Thread Vineet Gupta

A couple of Vector pseudoinstructions use x0 scalar which could be
inefficient on wider uarches due to regfile crossing.

Instead use the imm 0 form, which should be functionally equivalent.

 pseudoinsn             orig insn with x0       this patch
 ---------------------  ----------------------  ----------------------
 vneg.v vd,vs           vrsub.vx vd,vs,x0       vrsub.vi vd,vs,0
 vncvt.x.x.w vd,vs,vm   vnsrl.wx vd,vs,x0,vm    vnsrl.wi vd,vs,0,vm
 vwcvt.x.x.v vd,vs,vm   vwadd.vx vd,vs,x0,vm    (imm not supported)

gcc/ChangeLog:
* config/riscv/vector.md: vncvt substitute vnsrl.
vnsrl with x0 replace with immediate 0.
vneg substitute vrsub.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-1.c: 
Change
expected pattern.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-2.c: 
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-1.c: 
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-2.c: 
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c: Ditto
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/abs-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_convert-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_convert-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_neg-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_trunc-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_trunc-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_trunc-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/convert-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/convert-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/neg-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/trunc-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/trunc-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/trunc-3.c: Ditto.
* gcc.target/riscv/rvv/base/simplify-vdiv.c: Ditto.
* gcc.target/riscv/rvv/base/unop_v_constraint-1.c: Ditto.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/vector.md| 16 ++---
 .../cond/cond_convert_int2int-rv32-1.c|  4 ++--
 .../cond/cond_convert_int2int-rv32-2.c|  4 ++--
 .../cond/cond_convert_int2int-rv64-1.c|  4 ++--
 .../cond/cond_convert_int2int-rv64-2.c|  4 ++--
 .../riscv/rvv/autovec/cond/cond_unary-1.c |  6 ++---
 .../riscv/rvv/autovec/cond/cond_unary-2.c |  6 ++---
 .../riscv/rvv/autovec/cond/cond_unary-3.c |  6 ++---
 .../riscv/rvv/autovec/cond/cond_unary-4.c |  6 ++---
 .../riscv/rvv/autovec/cond/cond_unary-5.c |  6 ++---
 .../riscv/rvv/autovec/cond/cond_unary-6.c |  6 ++---
 .../riscv/rvv/autovec/cond/cond_unary-7.c |  6 ++---
 .../riscv/rvv/autovec/cond/cond_unary-8.c |  6 ++---
 .../rvv/autovec/conversions/vncvt-rv32gcv.c   |  2 +-
 .../rvv/autovec/conversions/vncvt-rv64gcv.c   |  2 +-
 .../autovec/sat/vec_sat_u_sub_trunc-1-u16.c   |  2 +-
 .../autovec/sat/vec_sat_u_sub_trunc-1-u32.c   |  2 +-
 .../autovec/sat/vec_sat_u_sub_trunc-1-u8.c|  2 +-
 .../riscv/rvv/autovec/unop/abs-rv32gcv.c  |  2 +-
 .../riscv/rvv/autovec/unop/abs-rv64gcv.c  |  2 +-
 .../riscv/rvv/autovec/unop/vneg-rv32gcv.c |  2 +-
 .../riscv/rvv/autovec/unop/vneg-rv64gcv.c |  2 +-
 .../gcc.target/riscv/rvv/autovec/vls/abs-2.c  |  2 +-
 .../riscv/rvv/autovec/vls/cond_convert-11.c   |  2 +-
 .../riscv/rvv/autovec/vls/cond_convert-12.c   |  2 +-
 .../riscv/rvv/autovec/vls/cond_neg-1.c|  2 +-
 .../riscv/rvv/autovec/vls/cond_trunc-1.c  |  2 +-
 .../riscv/rvv/autovec/vls/cond_trunc-2.c  |  2 +-
 .../riscv/rvv/autovec/vls/cond_trunc-3.c  |  2 +-
 .../riscv/rvv/autovec/vls/convert-11.c|  2 +-
 .../riscv/rvv/autovec/vls/convert-12.c|  2 +-
 .../gcc.target/riscv/rvv/autove

[PATCH] ifcvt: Don't speculation move inline-asm [PR102150]

2025-02-11 Thread Andrew Pinski

So unlike loop invariant motion, moving an inline-asm out of an
if is not always profitable and the cost estimate for the instruction
inside inline-asm is unknown.
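
A minimal sketch of the shape of code involved (not the testcase from the PR; the function names are made up, and the empty asm template simply stands in for an instruction whose cost the compiler cannot see):

```c
#include <assert.h>

/* A conditional inline-asm.  The empty template stands in for an
   instruction of unknown cost; "+r" tells the compiler X may change.  */
static int
maybe_asm (int cond, int x)
{
  if (cond)
    __asm__ ("" : "+r" (x));
  return x;
}

/* The empty asm leaves X untouched, so both paths return the input.  */
static void
check_maybe_asm (void)
{
  assert (maybe_asm (0, 5) == 5);
  assert (maybe_asm (1, 5) == 5);
}
```

If the asm were moved above the branch it would execute on every call, which is only a win when the asm is cheap, and that is exactly what cannot be estimated.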

This is a regression from GCC 4.6 which didn't speculatively move inline-asm
as far as I can tell.
Bootstrapped and tested on x86_64-linux-gnu.

PR rtl-optimization/102150
gcc/ChangeLog:

* ifcvt.cc (cheap_bb_rtx_cost_p): Return false if the insn
has an inline-asm in it.

Signed-off-by: Andrew Pinski 
---
 gcc/ifcvt.cc | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index cb5597bc171..707937ba2f0 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -166,6 +166,12 @@ cheap_bb_rtx_cost_p (const_basic_block bb,
 {
   if (NONJUMP_INSN_P (insn))
{
+ /* Inline-asm's cost is not very estimatable.
+It could be a costly instruction but the
+estimate would be the same as a non costly
+instruction.  */
+ if (asm_noperands (PATTERN (insn)) >= 0)
+   return false;
  int cost = insn_cost (insn, speed) * REG_BR_PROB_BASE;
  if (cost == 0)
return false;
-- 
2.43.0

[PATCH] x86: Properly find the maximum stack slot alignment

2025-02-11 Thread H.J. Lu

Don't assume that stack slots can only be accessed by stack or frame
registers.  We first find all registers defined by stack or frame
registers.  Then check memory accesses by such registers, including
stack and frame registers.
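
The first step described above can be pictured with a toy worklist model.  This is only a sketch under simplifying assumptions (eight registers, each defined from at most one other register); it does not use GCC's df or HARD_REG_SET APIs, the names are hypothetical, and the follow-up step of checking memory accesses is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define NREGS 8

/* Toy model: DEFINED_FROM[d] is the register that register d is computed
   from, or -1 if d is not derived from another register.  Mark in DERIVED
   every register reachable from ROOTS (the stack and frame registers).  */
static void
find_stack_derived (const int defined_from[NREGS], const int *roots,
                    int nroots, bool derived[NREGS])
{
  int worklist[NREGS];
  int n = 0;

  for (int r = 0; r < NREGS; r++)
    derived[r] = false;

  for (int i = 0; i < nroots; i++)
    if (!derived[roots[i]])
      {
        derived[roots[i]] = true;
        worklist[n++] = roots[i];
      }

  /* Each register is pushed at most once, so NREGS slots suffice.  */
  while (n > 0)
    {
      int reg = worklist[--n];
      for (int d = 0; d < NREGS; d++)
        if (!derived[d] && defined_from[d] == reg)
          {
            derived[d] = true;
            worklist[n++] = d;
          }
    }
}

static void
check_find_stack_derived (void)
{
  /* Register 0 plays the stack pointer, register 1 the frame pointer.  */
  const int defined_from[NREGS] = { -1, -1, 0, 2, -1, 1, -1, 3 };
  const int roots[] = { 0, 1 };
  bool derived[NREGS];

  find_stack_derived (defined_from, roots, 2, derived);
  assert (derived[2] && derived[3] && derived[5] && derived[7]);
  assert (!derived[4] && !derived[6]);
}
```

In the toy data, register 7 is three steps removed from the stack pointer and is still found, which is the case a check limited to the stack and frame registers themselves would miss.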

gcc/

PR target/109780
PR target/109093
* config/i386/i386.cc (ix86_update_stack_alignment): New.
(ix86_find_all_reg_use): Likewise.
(ix86_find_max_used_stack_alignment): Also check memory accesses
from registers defined by stack or frame registers.

gcc/testsuite/

PR target/109780
PR target/109093
* g++.target/i386/pr109780-1.C: New test.
* gcc.target/i386/pr109093-1.c: Likewise.
* gcc.target/i386/pr109780-1.c: Likewise.
* gcc.target/i386/pr109780-2.c: Likewise.

-- 
H.J.
From 13da9e9be612333b7df7f66cf4b4c1396a64d89d Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Tue, 14 Mar 2023 11:41:51 -0700
Subject: [PATCH] x86: Properly find the maximum stack slot alignment

Don't assume that stack slots can only be accessed by stack or frame
registers.  We first find all registers defined by stack or frame
registers.  Then check memory accesses by such registers, including
stack and frame registers.

gcc/

	PR target/109780
	PR target/109093
	* config/i386/i386.cc (ix86_update_stack_alignment): New.
	(ix86_find_all_reg_use): Likewise.
	(ix86_find_max_used_stack_alignment): Also check memory accesses
	from registers defined by stack or frame registers.

gcc/testsuite/

	PR target/109780
	PR target/109093
	* g++.target/i386/pr109780-1.C: New test.
	* gcc.target/i386/pr109093-1.c: Likewise.
	* gcc.target/i386/pr109780-1.c: Likewise.
	* gcc.target/i386/pr109780-2.c: Likewise.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/i386.cc| 128 +
 gcc/testsuite/g++.target/i386/pr109780-1.C |  72 
 gcc/testsuite/gcc.target/i386/pr109093-1.c |  38 ++
 gcc/testsuite/gcc.target/i386/pr109780-1.c |  14 +++
 gcc/testsuite/gcc.target/i386/pr109780-2.c |  21 
 5 files changed, 252 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/pr109780-1.C
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109093-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109780-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109780-2.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 3128973ba79..495b97116a4 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -8466,6 +8466,65 @@ output_probe_stack_range (rtx reg, rtx end)
   return "";
 }
 
+/* Update the maximum stack slot alignment from memory alignment in
+   PAT.  */
+
+static void
+ix86_update_stack_alignment (rtx, const_rtx pat, void *data)
+{
+  /* This insn may reference stack slot.  Update the maximum stack slot
+ alignment.  */
+  subrtx_iterator::array_type array;
+  FOR_EACH_SUBRTX (iter, array, pat, ALL)
+if (MEM_P (*iter))
+  {
+	unsigned int alignment = MEM_ALIGN (*iter);
+	unsigned int *stack_alignment
+	  = (unsigned int *) data;
+	if (alignment > *stack_alignment)
+	  *stack_alignment = alignment;
+	break;
+  }
+}
+
+/* Find all registers defined with REG.  */
+
+static void
+ix86_find_all_reg_use (HARD_REG_SET &stack_slot_access,
+		   unsigned int reg, auto_bitmap &worklist)
+{
+  for (df_ref ref = DF_REG_USE_CHAIN (reg);
+   ref != NULL;
+   ref = DF_REF_NEXT_REG (ref))
+{
+  if (DF_REF_IS_ARTIFICIAL (ref))
+	continue;
+
+  rtx_insn *insn = DF_REF_INSN (ref);
+  if (!NONDEBUG_INSN_P (insn))
+	continue;
+
+  rtx set = single_set (insn);
+  if (!set)
+	continue;
+
+  rtx src = SET_SRC (set);
+  if (MEM_P (src))
+	continue;
+
+  rtx dest = SET_DEST (set);
+  if (!REG_P (dest))
+	continue;
+
+  if (TEST_HARD_REG_BIT (stack_slot_access, REGNO (dest)))
+	continue;
+
+  /* Add this register to stack_slot_access.  */
+  add_to_hard_reg_set (&stack_slot_access, Pmode, REGNO (dest));
+  bitmap_set_bit (worklist, REGNO (dest));
+}
+}
+
 /* Set stack_frame_required to false if stack frame isn't required.
Update STACK_ALIGNMENT to the largest alignment, in bits, of stack
slot used if stack frame is required and CHECK_STACK_SLOT is true.  */
@@ -8484,10 +8543,6 @@ ix86_find_max_used_stack_alignment (unsigned int &stack_alignment,
   add_to_hard_reg_set (&set_up_by_prologue, Pmode,
 		   HARD_FRAME_POINTER_REGNUM);
 
-  /* The preferred stack alignment is the minimum stack alignment.  */
-  if (stack_alignment > crtl->preferred_stack_boundary)
-stack_alignment = crtl->preferred_stack_boundary;
-
   bool require_stack_frame = false;
 
   FOR_EACH_BB_FN (bb, cfun)
@@ -8499,27 +8554,58 @@ ix86_find_max_used_stack_alignment (unsigned int &stack_alignment,
    set_up_by_prologue))
 	  {
 	require_stack_frame = true;
-
-	if (check_stack_slot)
-	  {
-		/* Find the maximum stack alignment.  */
-		subrtx_iterator::array_type array;
-		FOR_EACH_SUBRTX (iter, array, PATTERN (insn), ALL)
-		  if (MEM_P (*iter)
-		  && (r

Re: [PATCH]AArch64: Fix GCC 13 backport of big.Little CPU detection [PR118800]

2025-02-11 Thread Richard Sandiford

Tamar Christina  writes:
> Hi All,
>
> It seems I ran regressions but forgot to check them last time.
>
> On the GCC-13 branch the backport caused a failure due to the branch not 
> having
> generic-armv8-a and also it still treating the generic cpu special.  This made
> it return NULL when trying to find the default CPU.
>
> In GCC 13 we still had multiple structures with the same information and in 
> this
> case aarch64_cpu_data was missing the generic CPU which is in all_cores.
>
> This corrects it by using "generic" instead and also adding it to
> aarch64_cpu_data.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu on GCC-13 branch and no 
> issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   PR target/118800
>   * config/aarch64/driver-aarch64.cc (DEFAULT_CPU): Use generic instead of
>   generic-armv8-a.
>   (aarch64_cpu_data): Add generic.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/118800
>   * gcc.target/aarch64/cpunative/native_cpu_34.c: Update order.

OK, thanks.  Reading this made me think that INVALID_IMP and INVALID_CORE
might be better for the generic entries, rather than 0x0 and 0x0.
But that applies to trunk and gcc-14 too, so isn't something to change here.

Richard

>
> ---
>
> diff --git a/gcc/config/aarch64/driver-aarch64.cc 
> b/gcc/config/aarch64/driver-aarch64.cc
> index 
> ff4660f469cd5c899c981ee8181d1794fade..acc44536629e814a2aea0e4b21e327da3fa5d6ea
>  100644
> --- a/gcc/config/aarch64/driver-aarch64.cc
> +++ b/gcc/config/aarch64/driver-aarch64.cc
> @@ -60,7 +60,7 @@ struct aarch64_core_data
>  #define ALL_VARIANTS ((unsigned)-1)
>  /* Default architecture to use if -mcpu=native did not detect a known CPU.  
> */
>  #define DEFAULT_ARCH "8A"
> -#define DEFAULT_CPU "generic-armv8-a"
> +#define DEFAULT_CPU "generic"
>  
>  #define AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, 
> PART, VARIANT) \
>{ CORE_NAME, #ARCH, IMP, PART, VARIANT, feature_deps::cpu_##CORE_IDENT },
> @@ -68,6 +68,7 @@ struct aarch64_core_data
>  static CONSTEXPR const aarch64_core_data aarch64_cpu_data[] =
>  {
>  #include "aarch64-cores.def"
> +  { "generic", "armv8-a", 0, 0, ALL_VARIANTS, 0},
>{ NULL, NULL, INVALID_IMP, INVALID_CORE, ALL_VARIANTS, 0 }
>  };
>  
> diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_34.c 
> b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_34.c
> index 
> 168140002a0f0205c0f552de0cce9b2d356e09e2..d2ff8156d8fc14fcc14ddd91f43f0b0fea15cc7b
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_34.c
> +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_34.c
> @@ -7,6 +7,6 @@ int main()
>return 0;
>  }
>  
> -/* { dg-final { scan-assembler {\.arch 
> armv8-a\+dotprod\+crc\+crypto\+sve2\n} } } */
> +/* { dg-final { scan-assembler {\.arch 
> armv8-a\+crc\+dotprod\+crypto\+sve2\n} } } */
>  
>  /* Test a normal looking procinfo.  */

[pushed] c++: change implementation of -frange-for-ext-temps [PR118574]

2025-02-11 Thread Jason Merrill

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

The implementation in r15-3840 used a novel technique of wrapping the entire
range-for loop in a CLEANUP_POINT_EXPR, which confused the coroutines
transformation.  Instead let's use the existing extend_ref_init_temps
mechanism.

This does not revert all of r15-3840, only the parts that change how
CLEANUP_POINT_EXPRs are applied to range-for declarations.

PR c++/118574
PR c++/107637

gcc/cp/ChangeLog:

* call.cc (struct extend_temps_data): New.
(extend_temps_r, extend_all_temps): New.
(set_up_extended_ref_temp): Handle tree walk case.
(extend_ref_init_temps): Call extend_all_temps.
* decl.cc (initialize_local_var): Revert ext-temps change.
* parser.cc (cp_convert_range_for): Likewise.
(cp_parser_omp_loop_nest): Likewise.
* pt.cc (tsubst_stmt): Likewise.
* semantics.cc (finish_for_stmt): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/range-for1.C: New test.
---
 gcc/cp/call.cc   | 117 +--
 gcc/cp/decl.cc   |   5 -
 gcc/cp/parser.cc |  23 +---
 gcc/cp/pt.cc |  22 
 gcc/cp/semantics.cc  |  13 ---
 gcc/testsuite/g++.dg/coroutines/range-for1.C |  69 +++
 6 files changed, 180 insertions(+), 69 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/range-for1.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index e440d58141b..2c77b4a4b68 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -14154,6 +14154,20 @@ make_temporary_var_for_ref_to_temp (tree decl, tree 
type)
   return pushdecl (var);
 }
 
+/* Data for extend_temps_r, mostly matching the parameters of
+   extend_ref_init_temps.  */
+
+struct extend_temps_data
+{
+  tree decl;
+  tree init;
+  vec<tree, va_gc> **cleanups;
+  tree* cond_guard;
+  hash_set<tree> *pset;
+};
+
+static tree extend_temps_r (tree *, int *, void *);
+
 /* EXPR is the initializer for a variable DECL of reference or
std::initializer_list type.  Create, push and return a new VAR_DECL
for the initializer so that it will live as long as DECL.  Any
@@ -14162,7 +14176,8 @@ make_temporary_var_for_ref_to_temp (tree decl, tree 
type)
 
 static tree
 set_up_extended_ref_temp (tree decl, tree expr, vec<tree, va_gc> **cleanups,
- tree *initp, tree *cond_guard)
+ tree *initp, tree *cond_guard,
+ extend_temps_data *walk_data)
 {
   tree init;
   tree type;
@@ -14198,10 +14213,16 @@ set_up_extended_ref_temp (tree decl, tree expr, 
vec **cleanups,
   suppress_warning (decl);
 }
 
-  /* Recursively extend temps in this initializer.  */
-  TARGET_EXPR_INITIAL (expr)
-= extend_ref_init_temps (decl, TARGET_EXPR_INITIAL (expr), cleanups,
-cond_guard);
+  /* Recursively extend temps in this initializer.  The recursion needs to come
+ after creating the variable to conform to the mangling ABI, and before
+ maybe_constant_init because the extension might change its result.  */
+  if (walk_data)
+cp_walk_tree (&TARGET_EXPR_INITIAL (expr), extend_temps_r,
+ walk_data, walk_data->pset);
+  else
+TARGET_EXPR_INITIAL (expr)
+  = extend_ref_init_temps (decl, TARGET_EXPR_INITIAL (expr), cleanups,
+  cond_guard);
 
   /* Any reference temp has a non-trivial initializer.  */
   DECL_NONTRIVIALLY_INITIALIZED_P (var) = true;
@@ -14801,7 +14822,8 @@ extend_ref_init_temps_1 (tree decl, tree init, 
vec **cleanups,
   if (TREE_CODE (*p) == TARGET_EXPR)
 {
   tree subinit = NULL_TREE;
-  *p = set_up_extended_ref_temp (decl, *p, cleanups, &subinit, cond_guard);
+  *p = set_up_extended_ref_temp (decl, *p, cleanups, &subinit,
+cond_guard, nullptr);
   recompute_tree_invariant_for_addr_expr (sub);
   if (init != sub)
init = fold_convert (TREE_TYPE (init), sub);
@@ -14811,6 +14833,81 @@ extend_ref_init_temps_1 (tree decl, tree init, 
vec **cleanups,
   return init;
 }
 
+/* Tree walk function for extend_all_temps.  Generally parallel to
+   extend_ref_init_temps_1, but adapted for walk_tree.  */
+
+tree
+extend_temps_r (tree *tp, int *walk_subtrees, void *data)
+{
+  extend_temps_data *d = (extend_temps_data *)data;
+
+  if (TYPE_P (*tp) || TREE_CODE (*tp) == CLEANUP_POINT_EXPR)
+{
+  *walk_subtrees = 0;
+  return NULL_TREE;
+}
+
+  if (TREE_CODE (*tp) == COND_EXPR)
+{
+  cp_walk_tree (&TREE_OPERAND (*tp, 0), extend_temps_r, d, d->pset);
+
+  auto walk_arm = [d](tree &op)
+  {
+   tree cur_cond_guard = NULL_TREE;
+   auto ov = make_temp_override (d->cond_guard, &cur_cond_guard);
+   cp_walk_tree (&op, extend_temps_r, d, d->pset);
+   if (cur_cond_guard)
+ {
+   tree set = build2 (MODIFY_EXPR, boolean_type_node,
+

Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103]

2025-02-11 Thread Jeff Law





On 2/11/25 3:17 PM, Richard Sandiford wrote:

Jeff Law  writes:

On 2/11/25 9:08 AM, Richard Sandiford wrote:

Jeff Law  writes:

On 2/7/25 5:59 AM, Andrew Waterman wrote:

This patch runs counter to the ABI spec, which states that vxrm is not
preserved across calls and is volatile upon function entry [1].  vxrm
does not play the same role as frm plays in the calling convention.
(I won't get into the rationale in this email, but the rationale isn't
especially important: we should follow the ABI.)

[1] 
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/3a79e936eec5491078b1133ac943f91ef5fd75fd/riscv-cc.adoc?plain=1#L119-L120

Pan's patch doesn't change the basic property that VXRM has no known
state at function entry or upon return from a function call.


I think it will.  global_regs[X] means that X is defined on entry,
defined on exit, and can be changed by calls.  If the register is
call-clobbered/volatile/caller-saved, then I agree with Andrew that
this doesn't look like the right fix.

But the LCM code we use to manage vxrm assignments makes no assumption
about incoming state and assumes no state is preserved across calls.


In that case, I wonder what the patch is fixing.  Like you say,
the initial mode seems to be VXRM_MODE_NONE, and it looks like
riscv_vxrm_mode_after correctly models calls as clobbering the mode.
Just realized I didn't answer this part of your message.  It's not 
really fixing any known issue.  Just felt like the right thing to do as 
VXRM is roughly similar to (but clearly not 100% the same) FRM.


jeff

[pushed] c++: don't default -frange-for-ext-temps in -std=gnu++20 [PR118574]

2025-02-11 Thread Jason Merrill

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

Since -frange-for-ext-temps has been causing trouble, let's not enable it
by default in pre-C++23 GNU modes for GCC 15, and also allow disabling it in
C++23 and up.

PR c++/118574

gcc/c-family/ChangeLog:

* c-opts.cc (c_common_post_options): Only enable
-frange-for-ext-temps by default in C++23.

gcc/ChangeLog:

* doc/invoke.texi: Adjust -frange-for-ext-temps documentation.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/range-for3.C: Use -frange-for-ext-temps.
* g++.dg/cpp23/range-for4.C: Adjust expected result.

libgomp/ChangeLog:

* testsuite/libgomp.c++/range-for-4.C: Adjust expected result.
---
 gcc/doc/invoke.texi |  5 ++---
 gcc/c-family/c-opts.cc  | 17 +++--
 gcc/testsuite/g++.dg/cpp23/range-for3.C |  4 ++--
 gcc/testsuite/g++.dg/cpp23/range-for4.C |  4 ++--
 libgomp/testsuite/libgomp.c++/range-for-4.C |  2 +-
 5 files changed, 10 insertions(+), 22 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 0aef2abf05b..56d43cb6779 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -3548,9 +3548,8 @@ easier, you can use @option{-fno-pretty-templates} to 
disable them.
 Enable lifetime extension of C++ range based for temporaries.
 With @option{-std=c++23} and above this is part of the language standard,
 so lifetime of the temporaries is extended until the end of the loop
-regardless of this option.  This option allows enabling that behavior also
-in earlier versions of the standard and is enabled by default in the
-GNU dialects, from @option{-std=gnu++11} until @option{-std=gnu++20}.
+by default.  This option allows enabling that behavior also
+in earlier versions of the standard.
 
 @opindex fno-rtti
 @opindex frtti
diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index 87b231861a6..d43b3aef102 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -1213,20 +1213,9 @@ c_common_post_options (const char **pfilename)
   if (cxx_dialect >= cxx20)
 flag_concepts = 1;
 
-  /* Enable lifetime extension of range based for temporaries for C++23.
- Diagnose -std=c++23 -fno-range-for-ext-temps.  */
-  if (cxx_dialect >= cxx23)
-{
-  if (OPTION_SET_P (flag_range_for_ext_temps)
- && !flag_range_for_ext_temps)
-   error ("%<-fno-range-for-ext-temps%> is incompatible with C++23");
-  flag_range_for_ext_temps = 1;
-}
-  /* Otherwise default to enabled in GNU modes but allow user to override.  */
-  else if (cxx_dialect >= cxx11
-  && !flag_iso
-  && !OPTION_SET_P (flag_range_for_ext_temps))
-flag_range_for_ext_temps = 1;
+  /* Enable lifetime extension of range based for temporaries for C++23.  */
+  SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+  flag_range_for_ext_temps, cxx_dialect >= cxx23);
 
   /* -fimmediate-escalation has no effect when immediate functions are not
  supported.  */
diff --git a/gcc/testsuite/g++.dg/cpp23/range-for3.C 
b/gcc/testsuite/g++.dg/cpp23/range-for3.C
index 301e25886ec..f95b21b3cee 100644
--- a/gcc/testsuite/g++.dg/cpp23/range-for3.C
+++ b/gcc/testsuite/g++.dg/cpp23/range-for3.C
@@ -1,7 +1,7 @@
 // P2718R0 - Wording for P2644R1 Fix for Range-based for Loop
 // { dg-do run { target c++11 } }
-// Verify -frange-for-ext-temps is set by default in -std=gnu++* modes.
-// { dg-options "" }
+// Verify -frange-for-ext-temps works in earlier standards.
+// { dg-additional-options "-frange-for-ext-temps" }
 
 #define RANGE_FOR_EXT_TEMPS 1
 #include "range-for1.C"
diff --git a/gcc/testsuite/g++.dg/cpp23/range-for4.C 
b/gcc/testsuite/g++.dg/cpp23/range-for4.C
index f8c380d32c7..16204974bac 100644
--- a/gcc/testsuite/g++.dg/cpp23/range-for4.C
+++ b/gcc/testsuite/g++.dg/cpp23/range-for4.C
@@ -1,7 +1,7 @@
 // P2718R0 - Wording for P2644R1 Fix for Range-based for Loop
 // { dg-do run { target c++11 } }
-// Verify -frange-for-ext-temps is set by default in -std=gnu++* modes.
+// Verify -frange-for-ext-temps is not set by default in -std=gnu++* modes.
 // { dg-options "" }
 
-#define RANGE_FOR_EXT_TEMPS 1
+#define RANGE_FOR_EXT_TEMPS 0
 #include "range-for2.C"
diff --git a/libgomp/testsuite/libgomp.c++/range-for-4.C 
b/libgomp/testsuite/libgomp.c++/range-for-4.C
index 3c10e7349af..aa6e4da523c 100644
--- a/libgomp/testsuite/libgomp.c++/range-for-4.C
+++ b/libgomp/testsuite/libgomp.c++/range-for-4.C
@@ -3,5 +3,5 @@
 // { dg-additional-options "-std=gnu++17" }
 // { dg-require-effective-target tls_runtime }
 
-#define RANGE_FOR_EXT_TEMPS 1
+#define RANGE_FOR_EXT_TEMPS 0
 #include "range-for-1.C"

base-commit: 299a8e2dc667e795991bc439d2cad5ea5bd379e2
prerequisite-patch-id: aeecd9138d83da91723a418776494445063247f2
-- 
2.48.1
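
What the option changes for user code can be sketched with a small program that records when a temporary in the range expression is destroyed. This is an illustration, not part of the patch: `Tracer`, `values` and `run_loop` are hypothetical names, and the loop iterates over a static vector so that it is well-defined under either lifetime rule.

```cpp
#include <string>
#include <vector>

// Event log shared by the helpers below (illustrative only).
static std::vector<std::string> events;

// A temporary whose destruction we want to observe.
struct Tracer
{
  ~Tracer () { events.push_back ("destroyed"); }
};

// Returns a reference to a static vector, not to the Tracer, so the loop
// never touches a dead object whichever lifetime rule is in effect.
const std::vector<int> &
values (const Tracer &)
{
  static const std::vector<int> v = { 1, 2 };
  return v;
}

// Run a range-for whose range expression creates a Tracer temporary and
// return the order in which the events were recorded.
std::vector<std::string>
run_loop ()
{
  events.clear ();
  for (int x : values (Tracer ()))
    {
      (void) x;
      events.push_back ("body");
    }
  return events;
}
```

With lifetime extension in effect (C++23, or an earlier standard with -frange-for-ext-temps), the expected order is body, body, destroyed. Without it, the temporary dies at the end of the range initialization, so the order is destroyed, body, body.
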

[PATCH] c++: ICE with operator new[] in constexpr [PR118775]

2025-02-11 Thread Marek Polacek

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Here we ICE since r11-7740 because we no longer say that (long)&a
(where a is a global var) is non_constant_p.  So VERIFY_CONSTANT
does not return and we crash on tree_to_uhwi.  We should check
tree_fits_uhwi_p before calling tree_to_uhwi.

PR c++/118775

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_call_expression): Check tree_fits_uhwi_p.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constexpr-new24.C: New test.
* g++.dg/cpp2a/constexpr-new25.C: New test.
---
 gcc/cp/constexpr.cc  |  7 +
 gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C | 25 ++
 gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C | 27 
 3 files changed, 59 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index f142dd32bc8..f8f9a9df1a2 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -2909,6 +2909,13 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
  gcc_assert (arg0);
  if (new_op_p)
{
+ if (!tree_fits_uhwi_p (arg0))
+   {
+ if (!ctx->quiet)
+   error_at (loc, "cannot allocate array: size too large");
+ *non_constant_p = true;
+ return t;
+   }
  tree type = build_array_type_nelts (char_type_node,
  tree_to_uhwi (arg0));
  tree var = build_decl (loc, VAR_DECL,
diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C 
b/gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C
new file mode 100644
index 000..debb7f0f5c4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C
@@ -0,0 +1,25 @@
+// PR c++/118775
+// { dg-do compile { target c++20 } }
+
+int a;
+
+constexpr char *
+f1 ()
+{
+  constexpr auto p = new char[(long int) &a]; // { dg-error "size too large" }
+  return p;
+}
+
+constexpr char *
+f2 ()
+{
+  auto p = new char[(long int) &a];  // { dg-error "size too large" }
+  return p;
+}
+
+void
+g ()
+{
+  auto r1 = f2 ();
+  constexpr auto r2 = f2 (); // { dg-message "in .constexpr. expansion" }
+}
diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C 
b/gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C
new file mode 100644
index 000..91c0318abd8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C
@@ -0,0 +1,27 @@
+// PR c++/118775
+// { dg-do compile { target c++20 } }
+
+namespace std {
+struct __uniq_ptr_impl {
+  constexpr __uniq_ptr_impl(char *) {}
+};
+template  struct unique_ptr {
+  __uniq_ptr_impl _M_t;
+  constexpr ~unique_ptr() {}
+};
+template  struct _MakeUniq;
+template  struct _MakeUniq<_Tp[]> {
+  typedef unique_ptr<_Tp[]> __array;
+};
+template  using __unique_ptr_array_t = _MakeUniq<_Tp>::__array;
+constexpr __unique_ptr_array_t make_unique(long __num) {
+  return unique_ptr(new char[__num]);
+}
+} // namespace std
+int a;
+int
+main ()
+{
+  std::unique_ptr p = std::make_unique((long)&a);
+  constexpr std::unique_ptr p2 = std::make_unique((long)&a); // { dg-error 
"conversion" }
+}

base-commit: 299a8e2dc667e795991bc439d2cad5ea5bd379e2
-- 
2.48.1

Re: [PATCH v2] RISC-V: Vector pseudoinsns with x0 operand to use imm 0

2025-02-11 Thread Jeff Law





On 2/9/25 5:20 AM, Vineet Gupta wrote:

On 2/8/25 23:02, Jeff Law wrote:

On 2/7/25 9:34 PM, Vineet Gupta wrote:

A couple of Vector pseudoinstructions use x0 scalar which being regfile
crosser could be inefficient on certain wider uarches.

Use the imm 0 form, which should be functionally equivalent.

   pseudoinsn             orig insn with x0      this patch
   ---------------------  ---------------------  -------------------
   vneg.v vd,vs           vrsub.vx vd,vs,x0      vrsub.vi vd,vs,0
   vncvt.x.x.w vd,vs,vm   vnsrl.wx vd,vs,x0,vm   vnsrl.wi vd,vs,0,vm
   vwcvt.x.x.v vd,vs,vm   vwadd.vx vd,vs,x0,vm   (imm not supported)

This passes my testsuite A/B run but obviously wait for the CI tester to
give a green light.

gcc/ChangeLog:
* config/riscv/vector.md: vncvt substitute vnsrl.
vnsrl with x0 replace with immediate 0.
vneg substitute vrsub.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-1.c: 
Change
expected pattern.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-2.c: 
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-1.c: 
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-2.c: 
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/abs-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_convert-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_convert-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_neg-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_trunc-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_trunc-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_trunc-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/convert-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/convert-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/neg-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/trunc-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/trunc-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/trunc-3.c: Ditto.
* gcc.target/riscv/rvv/base/simplify-vdiv.c: Ditto.
* gcc.target/riscv/rvv/base/unop_v_constraint-1.c: Ditto.

LGTM.  I think the only question is whether or not to make an exception
for this or not.  We are in stage4 after all ;-)  Figure we can make a
decision on the Tues call if you're available.


I don't have a strong opinion either way, just wanted to get it out of my tree :-)
Yeah sure, 9 PM IST is manageable.

FTR, this patch was discussed during the RISC-V patchwork meeting this 
morning.  The consensus was it was safe to go in now even though we're 
in stage4.


The only technical concern raised was the introduction of C code 
fragments to generate final asm, which we've largely avoided in the 
port.  But it was considered a fairly minor concern.


So officially OK for the trunk now.

jeff

Re: [PATCH] RISC-V: Drop __riscv_vendor_feature_bits

2025-02-11 Thread Jeff Law





On 2/11/25 12:35 AM, Yangyu Chen wrote:

As discussed in RISC-V C-API PR #101 [1] and in #96, the current
interface is insufficient to support some cases, like a vendor buying a
CPU IP from the upstream vendor but using their own mvendorid and custom
features from the upstream vendor. In this case, we might need to add
these extensions for each downstream vendor many times. Thus, making
__riscv_vendor_feature_bits guarded by mvendorid is not a good idea. So,
drop __riscv_vendor_feature_bits for now, and we should have time to
discuss a better solution.

[1] https://github.com/riscv-non-isa/riscv-c-api-doc/pull/101

Signed-off-by: Yangyu Chen 

gcc/ChangeLog:

* config/riscv/riscv-feature-bits.h (RISCV_VENDOR_FEATURE_BITS_LENGTH): 
Drop.
(struct riscv_vendor_feature_bits): Drop.

libgcc/ChangeLog:

* config/riscv/feature_bits.c (RISCV_VENDOR_FEATURE_BITS_LENGTH): Drop.
(__init_riscv_features_bits_linux): Drop.

Thanks.  I've pushed this to the trunk.
jeff

[COMMITTED] Doc: Fix Texinfo warning in install.texi

2025-02-11 Thread Sandra Loosemore

For some time I've been seeing this Texinfo warning in my builds:

.../gcc/doc/install.texi:2295: warning: `.' or `,' must follow @xref, not f

Fixed thusly.

gcc/ChangeLog
* doc/install.texi: Add missing comma after @xref to fix warning.
---
 gcc/doc/install.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index d6cf318b3af..bd7a38048eb 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -2292,7 +2292,7 @@ canadian cross build.  The @option{--disable-nls} option 
disables NLS@.
 Note that this functionality requires either libintl (provided by GNU
 gettext) or C standard library that contains support for gettext (such
 as the GNU C Library).
-@xref{with-included-gettext,,--with-included-gettext} for more
+@xref{with-included-gettext,,--with-included-gettext}, for more
 information on the conditions required to get gettext support.
 
 @item --with-libintl-prefix=@var{dir}
-- 
2.34.1

[COMMITTED] Doc: Fix some typos and other nearby sloppy-writing issues

2025-02-11 Thread Sandra Loosemore

I spotted some typos in the GCC manual.  Since often these are a sign
that the text was inserted without being proofread, I looked at the
context and fixed some grammar/punctuation/wording issues as well.

gcc/ChangeLog
* doc/extend.texi: Fix a bunch of typos and other writing bugs.
* doc/invoke.texi: Likewise.
---
 gcc/doc/extend.texi | 85 ++---
 gcc/doc/invoke.texi | 62 -
 2 files changed, 73 insertions(+), 74 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index d79e97d9a03..065bd8b84e1 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -1004,17 +1004,16 @@ The ISO C++14 library also defines the @samp{i} suffix, 
so C++14 code
 that includes the @samp{<complex>} header cannot use @samp{i} for the
 GNU extension.  The @samp{j} suffix still has the GNU meaning.
 
-GCC can handle both implicit and explicit casts between the @code{_Complex}
-types and other @code{_Complex} types as casting both the real and imaginary
-parts to the scalar type.
-GCC can handle implicit and explicit casts from a scalar type to a 
@code{_Complex}
-type and where the imaginary part will be considered zero.
-The C front-end can handle implicit and explicit casts from a @code{_Complex} 
type
-to a scalar type where the imaginary part will be ignored. In C++ code, this 
cast
-is considered illformed and G++ will error out.
+GCC handles both implicit and explicit casts between the
+@code{_Complex} types with different scalar base types by casting both
+the real and imaginary parts to the base type of the result.
+GCC also handles implicit and explicit casts from a scalar type to a
+@code{_Complex} type, by giving the imaginary part a zero value.
 
-GCC provides a built-in function @code{__builtin_complex} will can be used to
-construct a complex value.
+The C front end can handle implicit and explicit casts from a
+@code{_Complex} type to a scalar type, which uses the value of the
+real part and ignores the imaginary part.  In C++ code, this cast is
+considered ill-formed and G++ diagnoses it as an error.
 
 @cindex @code{__real__} keyword
 @cindex @code{__imag__} keyword
@@ -1023,7 +1022,7 @@ GCC has a few extensions which can be used to extract the 
real
 and the imaginary part of the complex-valued expression. Note
 these expressions are lvalues if the @var{exp} is an lvalue.
 These expressions operands have the type of a complex type
-which might get prompoted to a complex type from a scalar type.
+which might get promoted to a complex type from a scalar type.
 E.g. @code{__real__ (int)@var{x}} is the same as casting to
 @code{_Complex int} before @code{__real__} is done.
 
@@ -1035,7 +1034,7 @@ E.g. @code{__real__ (int)@var{x}} is the same as casting 
to
 @tab Extract the imaginary part of @var{exp}.
 @end multitable
 
-For values of floating point, you should use the ISO C99
+For values of floating-point type, you should use the ISO C99
 functions, declared in @code{<complex.h>} and also provided as
 built-in functions by GCC@.
 
@@ -1053,7 +1052,7 @@ with a complex type.  This is a GNU extension; for values 
of
 floating type, you should use the ISO C99 functions @code{conjf},
 @code{conj} and @code{conjl}, declared in @code{<complex.h>} and also
 provided as built-in functions by GCC@. Note unlike the @code{__real__}
-and @code{__imag__} operators, this operator will not do an implicit cast
+and @code{__imag__} operators, this operator does not do an implicit cast
 to the complex type because the @samp{~} is already a normal operator.
 
 GCC can allocate complex automatic variables in a noncontiguous
@@ -3526,7 +3525,7 @@ mismatched allocation and deallocation functions and 
diagnose them under
 the control of options such as @option{-Wmismatched-dealloc}.  It also
 makes it possible to diagnose attempts to deallocate objects that were not
 allocated dynamically, by @option{-Wfree-nonheap-object}.  To indicate
-that an allocation function both satisifies the nonaliasing property and
+that an allocation function both satisfies the nonaliasing property and
 has a deallocator associated with it, both the plain form of the attribute
 and the one with the @var{deallocator} argument must be used.  The same
 function can be both an allocator and a deallocator.  Since inlining one
@@ -3949,7 +3948,7 @@ caveats.
 If the pointer argument is also referred to by an @code{access} attribute on 
the
 function with @var{access-mode} either @code{read_only} or @code{read_write}
 and the latter attribute has the optional @var{size-index} argument
-referring to a size argument, this expressses the maximum size of the access.
+referring to a size argument, this expresses the maximum size of the access.
 For example, given:
 
 @smallexample
@@ -4378,7 +4377,7 @@ is a usage of a function with @code{target_clones} 
attribute.
 Note that any subsequent call of a function without @code{target_clone}
 from a @code{target_clone} caller will not lead to copying
 (

[COMMITTED] Doc: Delete obsolete interface.texi chapter from GCC internals manual

2025-02-11 Thread Sandra Loosemore

The "Interfacing to GCC Output" chapter used to be part of the
user-facing GCC documentation but ended up in the GCC internals manual
when the two documents were separated in 2001.  It hasn't been updated
in any substantive way since then, and is now very bit-rotten.  (PCC is
no longer the "standard compiler" on any target, and the target-specific
issues mentioned are for very old architectures.)

Meanwhile, the GCC user documentation now has a chapter called "Binary
Compatibility" that covers ABI issues in a generic way and also covers
C++ compatibility.  Let's keep that one and throw out the obsolete
text that seems to predate the whole notion of an ABI.

gcc/ChangeLog
* Makefile.in (TEXI_GCCINT_FILES): Remove interface.texi.
* doc/gccint.texi (Top): Remove menu entry for the "interface" node,
and include of interface.texi.
* doc/interface.texi: Delete.
---
 gcc/Makefile.in|  2 +-
 gcc/doc/gccint.texi|  5 +--
 gcc/doc/interface.texi | 70 --
 3 files changed, 2 insertions(+), 75 deletions(-)
 delete mode 100644 gcc/doc/interface.texi

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index a8e32e25cf5..c159825e62c 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3697,7 +3697,7 @@ TEXI_GCC_FILES = gcc.texi gcc-common.texi gcc-vers.texi 
frontends.texi\
 # the *.texi files have changed.
 TEXI_GCCINT_FILES = gccint.texi gcc-common.texi gcc-vers.texi  \
 contribute.texi makefile.texi configterms.texi options.texi\
-portability.texi interface.texi passes.texi rtl.texi md.texi   \
+portability.texi passes.texi rtl.texi md.texi  \
 $(srcdir)/doc/tm.texi hostconfig.texi fragments.texi   \
 configfiles.texi collect2.texi headerdirs.texi funding.texi\
 gnu.texi gpl_v3.texi fdl.texi contrib.texi languages.texi  \
diff --git a/gcc/doc/gccint.texi b/gcc/doc/gccint.texi
index eea2d48f87a..d88fc1a1c68 100644
--- a/gcc/doc/gccint.texi
+++ b/gcc/doc/gccint.texi
@@ -87,8 +87,7 @@ Compiler Collection (GCC)}.
 This manual is mainly a reference manual rather than a tutorial.  It
 discusses how to contribute to GCC (@pxref{Contributing}), the
 characteristics of the machines supported by GCC as hosts and targets
-(@pxref{Portability}), how GCC relates to the ABIs on such systems
-(@pxref{Interface}), and the characteristics of the languages for
+(@pxref{Portability}), and the characteristics of the languages for
 which GCC front ends are written (@pxref{Languages}).  It then
 describes the GCC source tree structure and build system, some of the
 interfaces to GCC front ends, and how support for a target system is
@@ -100,7 +99,6 @@ Additional tutorial information is linked to from
 @menu
 * Contributing::How to contribute to testing and developing GCC.
 * Portability:: Goals of GCC's portability features.
-* Interface::   Function-call interface of GCC output.
 * Libgcc::  Low-level runtime library used by GCC.
 * Languages::   Languages for which GCC front ends are written.
 * Source Tree:: GCC source tree structure and build system.
@@ -141,7 +139,6 @@ Additional tutorial information is linked to from
 
 @include contribute.texi
 @include portability.texi
-@include interface.texi
 @include libgcc.texi
 @include languages.texi
 @include sourcebuild.texi
diff --git a/gcc/doc/interface.texi b/gcc/doc/interface.texi
deleted file mode 100644
index 1688d6f66ec..000
--- a/gcc/doc/interface.texi
+++ /dev/null
@@ -1,70 +0,0 @@
-@c Copyright (C) 1988-2025 Free Software Foundation, Inc.
-@c This is part of the GCC manual.
-@c For copying conditions, see the file gcc.texi.
-
-@node Interface
-@chapter Interfacing to GCC Output
-@cindex interfacing to GCC output
-@cindex run-time conventions
-@cindex function call conventions
-@cindex conventions, run-time
-
-GCC is normally configured to use the same function calling convention
-normally in use on the target system.  This is done with the
-machine-description macros described (@pxref{Target Macros}).
-
-@cindex unions, returning
-@cindex structures, returning
-@cindex returning structures and unions
-However, returning of structure and union values is done differently on
-some target machines.  As a result, functions compiled with PCC
-returning such types cannot be called from code compiled with GCC,
-and vice versa.  This does not cause trouble often because few Unix
-library routines return structures or unions.
-
-GCC code returns structures and unions that are 1, 2, 4 or 8 bytes
-long in the same registers used for @code{int} or @code{double} return
-values.  (GCC typically allocates variables of such types in
-registers also.)  Structures and unions of other sizes are returned by
-storing them into an address passed by the caller (usually in a
-register).  The target hook @code{TARGET_STRUCT_VALUE_RTX}
-tells GCC where to pass this address.
-
-By contrast, PCC on most target machines retur

Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103]

2025-02-11 Thread Jeff Law





On 2/11/25 3:17 PM, Richard Sandiford wrote:

Jeff Law  writes:

On 2/11/25 9:08 AM, Richard Sandiford wrote:

Jeff Law  writes:

On 2/7/25 5:59 AM, Andrew Waterman wrote:

This patch runs counter to the ABI spec, which states that vxrm is not
preserved across calls and is volatile upon function entry [1].  vxrm
does not play the same role as frm plays in the calling convention.
(I won't get into the rationale in this email, but the rationale isn't
especially important: we should follow the ABI.)

[1] 
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/3a79e936eec5491078b1133ac943f91ef5fd75fd/riscv-cc.adoc?plain=1#L119-L120

Pan's patch doesn't change the basic property that VXRM has no known
state at function entry or upon return from a function call.


I think it will.  global_regs[X] means that X is defined on entry,
defined on exit, and can be changed by calls.  If the register is
call-clobbered/volatile/caller-saved, then I agree with Andrew that
this doesn't look like the right fix.

But the LCM code we use to manage vxrm assignments makes no assumption
about incoming state and assumes no state is preserved across calls.


In that case, I wonder what the patch is fixing.  Like you say,
the initial mode seems to be VXRM_MODE_NONE, and it looks like
riscv_vxrm_mode_after correctly models calls as clobbering the mode.

In the FRM case, the problem was that we had:

   entry:
 call initialize
 X := FRM
 ...
 FRM := X

Since FRM was not previously defined on entry, and since the call in any
case was assumed to clobber FRM, the X := FRM seemed to be reading an
uninitialised value, and so the FRM := X could be folded away.
But from your description, and from an admittedly cursory look at
the code, it sounds like that couldn't happen for VXRM.
The biggest difference with FRM is you can't call into glibc with FRM in
a non-default state.  VXRM is simpler in that regard as it's entirely
managed by the compiler with no expectations of state within glibc.
VXRM also has far fewer uses than FRM as VXRM just twiddles rounding
modes for one small group of instructions.


Jeff


Richard

Re: [PATCH 6/8] LoongArch: Simplify {lsx,lasx_x}vpick description

2025-02-11 Thread Lulu Cheng




On 2025/2/12 at 3:30 AM, Xi Ruoyao wrote:

On Tue, 2025-02-11 at 16:52 +0800, Lulu Cheng wrote:

On 2025/2/7 at 8:09 PM, Xi Ruoyao wrote:
/* snip */

-
-(define_insn "lasx_xvpickev_w"
-  [(set (match_operand:V8SI 0 "register_operand" "=f")
-   (vec_select:V8SI
-     (vec_concat:V16SI
-       (match_operand:V8SI 1 "register_operand" "f")
-       (match_operand:V8SI 2 "register_operand" "f"))
-     (parallel [(const_int 0) (const_int 2)
-    (const_int 8) (const_int 10)
-    (const_int 4) (const_int 6)
-    (const_int 12) (const_int 14)])))]
-  "ISA_HAS_LASX"
-  "xvpickev.w\t%u0,%u2,%u1"
-  [(set_attr "type" "simd_permute")
-   (set_attr "mode" "V8SI")])
-

/* snip */

+;; Picking even/odd elements.
+(define_insn "simd_pick_evod_"
+  [(set (match_operand:ALLVEC 0 "register_operand" "=f")
+   (vec_select:ALLVEC
+     (vec_concat:
+       (match_operand:ALLVEC 1 "register_operand" "f")
+       (match_operand:ALLVEC 2 "register_operand" "f"))
+     (match_operand: 3 "vect_par_cnst_even_or_odd_half")))]

For LASX, the generated select array is problematic, taking xvpickev.w
as an example:

xvpickev.w  vd,vj,vk

The behavior of the instruction is as follows:

vd.w[0] = vk.w[0]

vd.w[1] = vk.w[2]

vd.w[2] = vj.w[0]

vd.w[3] = vj.w[2]

vd.w[4] = vk.w[4]

vd.w[5] = vk.w[6]

vd.w[6] = vj.w[4]

vd.w[7] = vj.w[6]

Oops stupid I.  Strangely the bootstrapping (even with BOOT_CFLAGS="-O2
-g -march=la664") and regtesting cannot catch it.
In r15-6488, the issue also exists in the xvexth fixed by Guo Jie, and 
neither bootstrap nor spec tests have detected it.


I'll limit this to LSX in v2.
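
To make the lane problem concrete, here is a minimal C sketch (not part of the patch) that emulates the xvpickev.w behavior listed above and compares it with two vec_select index lists applied to vec_concat (operand 1, operand 2). The operand mapping (vk is operand 1, vj is operand 2) follows the removed pattern's "%u0,%u2,%u1" template. The helper names, and the assumption that a lane-agnostic predicate would accept the flat list {0, 2, 4, ..., 14}, are illustrative only.

```c
#include <stdint.h>
#include <string.h>

#define NELT 8  /* eight 32-bit words in one 256-bit LASX register */

/* Emulate xvpickev.w vd, vj, vk using the lane assignments listed in
   the thread: each 128-bit half takes the even words of vk, then the
   even words of vj.  */
void
xvpickev_w (int32_t vd[NELT], const int32_t vj[NELT],
            const int32_t vk[NELT])
{
  int32_t out[NELT];
  for (int half = 0; half < 2; half++)
    {
      int base = half * 4;
      out[base + 0] = vk[base + 0];
      out[base + 1] = vk[base + 2];
      out[base + 2] = vj[base + 0];
      out[base + 3] = vj[base + 2];
    }
  memcpy (vd, out, sizeof out);
}

/* Apply a vec_select-style index list to vec_concat (op1, op2):
   indices 0..7 pick from op1, indices 8..15 pick from op2.  */
void
select_concat (int32_t vd[NELT], const int32_t op1[NELT],
               const int32_t op2[NELT], const int sel[NELT])
{
  int32_t out[NELT];
  for (int i = 0; i < NELT; i++)
    out[i] = sel[i] < NELT ? op1[sel[i]] : op2[sel[i] - NELT];
  memcpy (vd, out, sizeof out);
}

/* Index list from the removed lasx_xvpickev_w pattern.  */
const int lasx_sel[NELT] = { 0, 2, 8, 10, 4, 6, 12, 14 };

/* Hypothetical flat "even elements" list (an assumption about what a
   lane-agnostic predicate would match).  */
const int flat_sel[NELT] = { 0, 2, 4, 6, 8, 10, 12, 14 };
```

With vk = {0..7} and vj = {100..107}, select_concat with lasx_sel reproduces xvpickev_w exactly, while flat_sel differs in elements 2 through 5, because the instruction works per 128-bit lane rather than across the whole 256-bit vector.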

Re: [PATCH] x86: Properly find the maximum stack slot alignment

2025-02-11 Thread Uros Bizjak

On Wed, Feb 12, 2025 at 6:25 AM H.J. Lu  wrote:
>
> Don't assume that stack slots can only be accessed by stack or frame
> registers.  We first find all registers defined by stack or frame
> registers.  Then check memory accesses by such registers, including
> stack and frame registers.

I wonder if this approach will also handle cases like e.g.:

lea     64(%rsp), %rbx
...
movaps  16(%rbx, %rcx), %xmm0

and:

movq    %rsp, %rax
...
lea     64(%rax), %rbx
...
movaps  16(%rbx), %xmm0

?

Thanks,
uros.


>
> gcc/
>
> PR target/109780
> PR target/109093
> * config/i386/i386.cc (ix86_update_stack_alignment): New.
> (ix86_find_all_reg_use): Likewise.
> (ix86_find_max_used_stack_alignment): Also check memory accesses
> from registers defined by stack or frame registers.
>
> gcc/testsuite/
>
> PR target/109780
> PR target/109093
> * g++.target/i386/pr109780-1.C: New test.
> * gcc.target/i386/pr109093-1.c: Likewise.
> * gcc.target/i386/pr109780-1.c: Likewise.
> * gcc.target/i386/pr109780-2.c: Likewise.
>
> --
> H.J.

Re: [PATCH] x86: Properly find the maximum stack slot alignment

2025-02-11 Thread H.J. Lu

On Wed, Feb 12, 2025 at 3:16 PM Uros Bizjak  wrote:
>
> On Wed, Feb 12, 2025 at 6:25 AM H.J. Lu  wrote:
> >
> > Don't assume that stack slots can only be accessed by stack or frame
> > registers.  We first find all registers defined by stack or frame
> > registers.  Then check memory accesses by such registers, including
> > stack and frame registers.
>
> I wonder if this approach will also handle cases like e.g.:
>
> lea     64(%rsp), %rbx
> ...
> movaps  16(%rbx, %rcx), %xmm0
>
> and:
>
> movq    %rsp, %rax
> ...
> lea     64(%rax), %rbx
> ...
> movaps  16(%rbx), %xmm0
>
> ?

They should be handled by ix86_find_all_reg_use

  do
    {
      reg = bitmap_clear_first_set_bit (worklist);
      ix86_find_all_reg_use (stack_slot_access, reg, worklist);
    }
  while (!bitmap_empty_p (worklist));
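
The worklist idea in the loop above can be sketched in isolation as follows. This is a toy model, not the code from the patch: the register enum, struct toy_insn, and stack_derived_regs are all hypothetical, and a plain bitmask stands in for GCC's bitmap type. It only shows how a chain such as %rsp -> %rax -> %rbx is followed until no new registers are found.

```c
#include <stddef.h>

/* Hypothetical register numbering for the toy model.  */
enum { RSP, RBP, RAX, RBX, RCX, NREGS };

/* Toy instruction: DEST is defined from the value held in SRC
   (for example "lea 64(%rax), %rbx" has dest = RBX, src = RAX).  */
struct toy_insn
{
  int dest;
  int src;
};

/* Return a bitmask of registers that may hold a stack-derived
   address, starting from the stack and frame pointers.  */
unsigned
stack_derived_regs (const struct toy_insn *insns, size_t n)
{
  unsigned seen = (1u << RSP) | (1u << RBP);
  unsigned worklist = seen;

  do
    {
      /* Pop the lowest-numbered register from the worklist.  */
      int reg = 0;
      while (!(worklist & (1u << reg)))
        reg++;
      worklist &= ~(1u << reg);

      /* Queue every register defined from REG that is not yet known.  */
      for (size_t i = 0; i < n; i++)
        if (insns[i].src == reg && !(seen & (1u << insns[i].dest)))
          {
            seen |= 1u << insns[i].dest;
            worklist |= 1u << insns[i].dest;
          }
    }
  while (worklist != 0);

  return seen;
}
```

For the second example in the question (movq %rsp, %rax followed by lea 64(%rax), %rbx), the toy model marks both %rax and %rbx as stack-derived, so a later access through %rbx would be examined for alignment.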


> Thanks,
> uros.
>
>
> >
> > gcc/
> >
> > PR target/109780
> > PR target/109093
> > * config/i386/i386.cc (ix86_update_stack_alignment): New.
> > (ix86_find_all_reg_use): Likewise.
> > (ix86_find_max_used_stack_alignment): Also check memory accesses
> > from registers defined by stack or frame registers.
> >
> > gcc/testsuite/
> >
> > PR target/109780
> > PR target/109093
> > * g++.target/i386/pr109780-1.C: New test.
> > * gcc.target/i386/pr109093-1.c: Likewise.
> > * gcc.target/i386/pr109780-1.c: Likewise.
> > * gcc.target/i386/pr109780-2.c: Likewise.
> >
> > --
> > H.J.



--
H.J.
