Re: [PATCH] Synchronize include/dwarf2.def with binutils
On Mon, Feb 10, 2025 at 04:21:28PM -, Roger Sayle wrote:
> 2025-02-10  Roger Sayle
>
> include/ChangeLog
>         * dwarf2.def(DW_CFA_AARCH64_negate_ra_state_with_pc): Define.

Space after def

Ok for trunk with that nit fixed.

> diff --git a/include/dwarf2.def b/include/dwarf2.def
> index e9acb79df9c..989f078041d 100644
> --- a/include/dwarf2.def
> +++ b/include/dwarf2.def
> @@ -788,6 +788,8 @@ DW_CFA (DW_CFA_hi_user, 0x3f)
>
>  /* SGI/MIPS specific.  */
>  DW_CFA (DW_CFA_MIPS_advance_loc8, 0x1d)
> +/* AArch64 extensions.  */
> +DW_CFA (DW_CFA_AARCH64_negate_ra_state_with_pc, 0x2c)
>  /* GNU extensions.
>     NOTE: DW_CFA_GNU_window_save is multiplexed on Sparc and AArch64.  */
>  DW_CFA (DW_CFA_GNU_window_save, 0x2d)

        Jakub
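For readers less familiar with how the entries in dwarf2.def are consumed, here is a minimal sketch of the X-macro pattern, assuming the consumer defines DW_CFA (name, value) before expanding the list. SKETCH_DW_CFA_LIST and sketch_cfa_name are hypothetical stand-ins for including the real file; the opcode values are only the ones visible in the hunk above.

```c
#include <stddef.h>

/* Hypothetical stand-in for #include "dwarf2.def": only the entries
   visible in the quoted hunk are listed here.  */
#define SKETCH_DW_CFA_LIST \
  DW_CFA (DW_CFA_hi_user, 0x3f) \
  DW_CFA (DW_CFA_MIPS_advance_loc8, 0x1d) \
  DW_CFA (DW_CFA_AARCH64_negate_ra_state_with_pc, 0x2c) \
  DW_CFA (DW_CFA_GNU_window_save, 0x2d)

/* First expansion: turn each entry into an enumerator.  */
#define DW_CFA(name, value) name = value,
enum sketch_dwarf_cfa { SKETCH_DW_CFA_LIST };
#undef DW_CFA

/* Second expansion: map an opcode back to its name, or NULL if the
   opcode has no entry in the list.  */
static const char *
sketch_cfa_name (unsigned int opcode)
{
  switch (opcode)
    {
#define DW_CFA(name, value) case value: return #name;
      SKETCH_DW_CFA_LIST
#undef DW_CFA
    default:
      return NULL;
    }
}
```

With this shape, adding one DW_CFA line to the list is enough for both the enumerator and the name lookup to pick up the new opcode, which is presumably why the patch only needs to touch the .def file.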
[PATCH v2] RISC-V: unrecognizable insn ICE in xtheadvector/pr114194.c on 32bit targets
This is a follow-up to the patch below to avoid generating unrecognized vsetivl instructions for XTheadVector. https://gcc.gnu.org/pipermail/gcc-patches/2025-January/674185.html PR target/118601 gcc/ChangeLog: * config/riscv/riscv-string.cc (expand_block_move): Check with new constraint 'vl' instead of 'K'. (expand_vec_setmem): Likewise. (expand_vec_cmpmem): Likewise. * config/riscv/riscv-v.cc (force_vector_length_operand): Likewise. (expand_load_store): Likewise. (expand_strided_load): Likewise. (expand_strided_store): Likewise. (expand_lanes_load_store): Likewise. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/xtheadvector/pr114194.c: Move to... * gcc.target/riscv/rvv/xtheadvector/pr114194-rv64.c: ...here. * gcc.target/riscv/rvv/xtheadvector/pr114194-rv32.c: New test. * gcc.target/riscv/rvv/xtheadvector/pr118601.c: New test. Reported-by: Edwin Lu --- gcc/config/riscv/riscv-string.cc | 6 +-- gcc/config/riscv/riscv-v.cc | 10 ++-- .../riscv/rvv/xtheadvector/pr114194-rv32.c| 51 +++ .../{pr114194.c => pr114194-rv64.c} | 5 +- .../riscv/rvv/xtheadvector/pr118601.c | 18 +++ 5 files changed, 79 insertions(+), 11 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194-rv32.c rename gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/{pr114194.c => pr114194-rv64.c} (80%) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr118601.c diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc index 97e20bdb002..408eb07e87f 100644 --- a/gcc/config/riscv/riscv-string.cc +++ b/gcc/config/riscv/riscv-string.cc @@ -1275,7 +1275,7 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in, bool movmem_p) machine_mode mask_mode = riscv_vector::get_vector_mode (BImode, GET_MODE_NUNITS (info.vmode)).require (); rtx mask = CONSTM1_RTX (mask_mode); - if (!satisfies_constraint_K (cnt)) + if (!satisfies_constraint_vl (cnt)) cnt= force_reg (Pmode, cnt); rtx m_ops[] = {vec, mask, src}; emit_nonvlmax_insn (code_for_pred_mov 
(info.vmode), @@ -1626,7 +1626,7 @@ expand_vec_setmem (rtx dst_in, rtx length_in, rtx fill_value_in) } else { - if (!satisfies_constraint_K (info.avl)) + if (!satisfies_constraint_vl (info.avl)) info.avl = force_reg (Pmode, info.avl); emit_nonvlmax_insn (code_for_pred_broadcast (info.vmode), riscv_vector::UNARY_OP, broadcast_ops, info.avl); @@ -1694,7 +1694,7 @@ expand_vec_cmpmem (rtx result_out, rtx blk_a_in, rtx blk_b_in, rtx length_in) } else { - if (!satisfies_constraint_K (length_in)) + if (!satisfies_constraint_vl (length_in)) length_in = force_reg (Pmode, length_in); rtx memmask = CONSTM1_RTX (mask_mode); diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 9847439ca77..62456c7ef79 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -2103,7 +2103,7 @@ get_unknown_min_value (machine_mode mode) static rtx force_vector_length_operand (rtx vl) { - if (CONST_INT_P (vl) && !satisfies_constraint_K (vl)) + if (CONST_INT_P (vl) && !satisfies_constraint_vl (vl)) return force_reg (Pmode, vl); return vl; } @@ -4130,7 +4130,7 @@ expand_load_store (rtx *ops, bool is_load) } else { - if (!satisfies_constraint_K (len)) + if (!satisfies_constraint_vl (len)) len = force_reg (Pmode, len); if (is_load) { @@ -4165,7 +4165,7 @@ expand_strided_load (machine_mode mode, rtx *ops) emit_vlmax_insn (icode, BINARY_OP_TAMA, emit_ops); else { - len = satisfies_constraint_K (len) ? len : force_reg (Pmode, len); + len = satisfies_constraint_vl (len) ? len : force_reg (Pmode, len); emit_nonvlmax_insn (icode, BINARY_OP_TAMA, emit_ops, len); } } @@ -4191,7 +4191,7 @@ expand_strided_store (machine_mode mode, rtx *ops) } else { - len = satisfies_constraint_K (len) ? len : force_reg (Pmode, len); + len = satisfies_constraint_vl (len) ? 
len : force_reg (Pmode, len); vl_type = get_avl_type_rtx (NONVLMAX); } @@ -4642,7 +4642,7 @@ expand_lanes_load_store (rtx *ops, bool is_load) } else { - if (!satisfies_constraint_K (len)) + if (!satisfies_constraint_vl (len)) len = force_reg (Pmode, len); if (is_load) { diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194-rv32.c b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194-rv32.c new file mode 100644 index 000..f95e713ea24 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194-rv32.c @@ -0,0 +1,51 @@ +/* { dg-do compile { target { { ! riscv_abi_e } && rv32 } } } */ +/* { dg-options "-march=rv3
Re: [PATCH v3 2/5] c++/modules: Ignore TU-local entities where necessary
On Mon, Jan 27, 2025 at 10:20:05AM -0500, Patrick Palka wrote: > [snip] > > > @@ -18486,6 +18562,12 @@ dependent_operand_p (tree t) > > { > >while (TREE_CODE (t) == IMPLICIT_CONV_EXPR) > > t = TREE_OPERAND (t, 0); > > + > > + /* If we contain a TU_LOCAL_ENTITY assume we're non-dependent; we'll > > error > > + later when instantiating. */ > > + if (expr_contains_tu_local_entity (t)) > > +return false; > > I think it'd be more robust and cheaper (avoiding a separate tree walk) > to teach the general constexpr/dependence predicates about > TU_LOCAL_ENTITY instead of handling it only here. > > > + > >++processing_template_decl; > >bool r = (potential_constant_expression (t) > > ? value_dependent_expression_p (t) > > @@ -20255,6 +20337,9 @@ tsubst_expr (tree t, tree args, tsubst_flags_t > > complain, tree in_decl) > > else > > object = NULL_TREE; > > > > + if (function_contains_tu_local_entity (templ)) > > + RETURN (error_mark_node); > > + > > tree tid = lookup_template_function (templ, targs); > > protected_set_expr_location (tid, EXPR_LOCATION (t)); > > > > @@ -20947,6 +21032,9 @@ tsubst_expr (tree t, tree args, tsubst_flags_t > > complain, tree in_decl) > > qualified_p = true; > > } > > > > + if (function_contains_tu_local_entity (function)) > > + RETURN (error_mark_node); > > Similarly, maybe it'd suffice to check this more generally in the > OVERLOAD case of tsubst_expr? > So I'd completely missed the idea of handling it in the OVERLOAD case; doing this also fixes the issues I'd been having trying to handle it in potential_constant_expression. I think this should be a lot cleaner now. Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? -- >8 -- Subject: [PATCH] c++: Handle TU_LOCAL_ENTITY in tsubst_expr and potential_constant_expression This cleans up the TU_LOCAL_ENTITY handling to avoid unnecessary tree walks and make the logic more robust. gcc/cp/ChangeLog: * constexpr.cc (potential_constant_expression_1): Handle TU_LOCAL_ENTITY. 
* pt.cc (expr_contains_tu_local_entity): Remove. (function_contains_tu_local_entity): Remove. (dependent_operand_p): Remove special handling for TU_LOCAL_ENTITY. (tsubst_expr): Handle TU_LOCAL_ENTITY when tsubsting OVERLOADs; remove now-unnecessary extra handling. (type_dependent_expression_p): Handle TU_LOCAL_ENTITY. Signed-off-by: Nathaniel Shead --- gcc/cp/constexpr.cc | 5 +++ gcc/cp/pt.cc| 80 - 2 files changed, 19 insertions(+), 66 deletions(-) diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc index f142dd32bc8..b36705fd4ce 100644 --- a/gcc/cp/constexpr.cc +++ b/gcc/cp/constexpr.cc @@ -10825,6 +10825,11 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now, case CO_RETURN_EXPR: return false; +/* Assume a TU-local entity is not constant, we'll error later when + instantiating. */ +case TU_LOCAL_ENTITY: + return false; + case NONTYPE_ARGUMENT_PACK: { tree args = ARGUMENT_PACK_ARGS (t); diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc index f857b3f1180..966050a6608 100644 --- a/gcc/cp/pt.cc +++ b/gcc/cp/pt.cc @@ -9935,61 +9935,6 @@ complain_about_tu_local_entity (tree e) inform (TU_LOCAL_ENTITY_LOCATION (e), "declared here"); } -/* Checks if T contains a TU-local entity. */ - -static bool -expr_contains_tu_local_entity (tree t) -{ - if (!modules_p ()) -return false; - - auto walker = [](tree *tp, int *walk_subtrees, void *) -> tree -{ - if (TREE_CODE (*tp) == TU_LOCAL_ENTITY) - return *tp; - if (!EXPR_P (*tp)) - *walk_subtrees = false; - return NULL_TREE; -}; - return cp_walk_tree (&t, walker, nullptr, nullptr); -} - -/* Errors and returns TRUE if X is a function that contains a TU-local - entity in its overload set. 
*/ - -static bool -function_contains_tu_local_entity (tree x) -{ - if (!modules_p ()) -return false; - - if (!x || x == error_mark_node) -return false; - - if (TREE_CODE (x) == OFFSET_REF - || TREE_CODE (x) == COMPONENT_REF) -x = TREE_OPERAND (x, 1); - x = MAYBE_BASELINK_FUNCTIONS (x); - if (TREE_CODE (x) == TEMPLATE_ID_EXPR) -x = TREE_OPERAND (x, 0); - - if (OVL_P (x)) -for (tree ovl : lkp_range (x)) - if (TREE_CODE (ovl) == TU_LOCAL_ENTITY) - { - x = ovl; - break; - } - - if (TREE_CODE (x) == TU_LOCAL_ENTITY) -{ - complain_about_tu_local_entity (x); - return true; -} - - return false; -} - /* Return a TEMPLATE_ID_EXPR corresponding to the indicated FNS and ARGLIST. Valid choices for FNS are given in the cp-tree.def documentation for TEMPLATE_ID_EXPR. */ @@ -18797,11 +18742,6 @@ dependent_operand_p (tree t) while (TREE_CODE (t) == IMPLICIT_CONV_EXPR) t = TREE_OPERAND (t, 0); - /* If we
Re: [PATCH v2] RISC-V: unrecognizable insn ICE in xtheadvector/pr114194.c on 32bit targets
LGTM, that seems like the right way to fix it :)

On Tue, Feb 11, 2025 at 21:45, Jin Ma wrote:
> This is a follow-up to the patch below to avoid generating unrecognized
> vsetivl instructions for XTheadVector.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2025-January/674185.html
>
>         PR target/118601
>
> gcc/ChangeLog:
>
>         * config/riscv/riscv-string.cc (expand_block_move): Check with new
>         constraint 'vl' instead of 'K'.
>         (expand_vec_setmem): Likewise.
>         (expand_vec_cmpmem): Likewise.
>         * config/riscv/riscv-v.cc (force_vector_length_operand): Likewise.
>         (expand_load_store): Likewise.
>         (expand_strided_load): Likewise.
>         (expand_strided_store): Likewise.
>         (expand_lanes_load_store): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/rvv/xtheadvector/pr114194.c: Move to...
>         * gcc.target/riscv/rvv/xtheadvector/pr114194-rv64.c: ...here.
>         * gcc.target/riscv/rvv/xtheadvector/pr114194-rv32.c: New test.
>         * gcc.target/riscv/rvv/xtheadvector/pr118601.c: New test.
>
> Reported-by: Edwin Lu
> ---
>  gcc/config/riscv/riscv-string.cc              |  6 +--
>  gcc/config/riscv/riscv-v.cc                   | 10 ++--
>  .../riscv/rvv/xtheadvector/pr114194-rv32.c    | 51 +++
>  .../{pr114194.c => pr114194-rv64.c}           |  5 +-
>  .../riscv/rvv/xtheadvector/pr118601.c         | 18 +++
>  5 files changed, 79 insertions(+), 11 deletions(-)
>
> [snip]
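A rough model of the check being swapped, written as plain C rather than GCC internals. It assumes that 'K' accepts a 5-bit unsigned immediate (0..31) and that 'vl' behaves like 'K' for standard RVV but accepts only zero for XTheadVector, which lacks the immediate form of the vsetvl instruction. Neither constraint definition appears in this thread, so both are assumptions, and every name below is hypothetical.

```c
#include <stdbool.h>

/* Assumed model of constraint 'K': a 5-bit unsigned immediate.  */
static bool
sketch_satisfies_K (long avl)
{
  return avl >= 0 && avl <= 31;
}

/* Assumed model of constraint 'vl': same as 'K' for standard RVV,
   but only zero is accepted when targeting XTheadVector.  */
static bool
sketch_satisfies_vl (long avl, bool xtheadvector_p)
{
  if (xtheadvector_p)
    return avl == 0;
  return sketch_satisfies_K (avl);
}

/* Mirrors the shape of
     if (!satisfies_constraint_vl (len)) len = force_reg (Pmode, len);
   and returns true when the length has to be moved into a register.  */
static bool
sketch_needs_force_reg (long avl, bool xtheadvector_p)
{
  return !sketch_satisfies_vl (avl, xtheadvector_p);
}
```

Under these assumptions a constant length of 16 stays an immediate for standard RVV but is forced into a register for XTheadVector, which is the behaviour the patch description is after.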
Re: [PATCH] RISC-V: unrecognizable insn ICE in xtheadvector/pr114194.c on 32bit targets
On 10/02/2025 08:37, Jin Ma wrote:
> On Sun, 09 Feb 2025 14:04:00 +0800, Jin Ma wrote:
>>         PR target/118601
>
> Ok for trunk?
>
> Best regards,
> Jin Ma
>
>> gcc/ChangeLog:
>>
>>         * config/riscv/riscv.cc (riscv_use_by_pieces_infrastructure_p):
>>         Exclude XTheadVector.
>>
>> Reported-by: Edwin Lu
>> ---
>>  gcc/config/riscv/riscv.cc | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> index 819e1538741..e5776aa0fbe 100644
>> --- a/gcc/config/riscv/riscv.cc
>> +++ b/gcc/config/riscv/riscv.cc
>> @@ -13826,7 +13826,7 @@ riscv_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT size,
>>    /* For set/clear with size > UNITS_PER_WORD, by pieces uses vector broadcasts
>>       with UNITS_PER_WORD size pieces.  Use setmem instead which can use
>>       bigger chunks.  */
>> -  if (TARGET_VECTOR && stringop_strategy & STRATEGY_VECTOR
>> +  if (TARGET_VECTOR && !TARGET_XTHEADVECTOR && stringop_strategy & STRATEGY_VECTOR
>>        && (op == CLEAR_BY_PIECES || op == SET_BY_PIECES)
>>        && speed_p && size > UNITS_PER_WORD)
>>      return false;

`riscv_vector::expand_vec_setmem` generates the unrecognizable
instruction and your patch avoids calling it in some, but not all,
cases. Here is a case that still ICEs with `-march=rv32gc_xtheadvector
-mabi=ilp32d` and `-march=rv64gc_xtheadvector -mabi=lp64d` after
applying your patch:

```
void foo1_16 (void *p)
{
  __builtin_memset (p, 1, 16);
}
```

I suggest returning `false` in `riscv_vector::expand_vec_setmem` for
`TARGET_XTHEADVECTOR` or trying to generate something that is valid for
`TARGET_XTHEADVECTOR`. If you do bail out of
`riscv_vector::expand_vec_setmem` then you probably want to keep your
existing change too so that by pieces is still used for smaller lengths
rather than a libcall.

>> --
>> 2.25.1
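To exercise both sides of the `size > UNITS_PER_WORD` check, the reproducer could be extended along these lines. Only foo1_16 comes from the report; the other function names and sizes are assumptions, picked to sit at or below a word and well above it. The helper at the end is likewise hypothetical and only there so the functions can be sanity-checked on a host compiler.

```c
#include <stddef.h>

/* From the report: 16 bytes, still ICEs with the v1 patch applied.  */
void
foo1_16 (void *p)
{
  __builtin_memset (p, 1, 16);
}

/* Hypothetical extra case: no larger than a word on rv32 and rv64.  */
void
foo1_4 (void *p)
{
  __builtin_memset (p, 1, 4);
}

/* Hypothetical extra case: well above a word.  */
void
foo1_64 (void *p)
{
  __builtin_memset (p, 1, 64);
}

/* Host-side helper: return 1 if the first N bytes of BUF equal VALUE.  */
int
sketch_all_bytes_equal (const unsigned char *buf, size_t n,
                        unsigned char value)
{
  for (size_t i = 0; i < n; i++)
    if (buf[i] != value)
      return 0;
  return 1;
}
```

Compiling the three functions with the XTheadVector options quoted above would show whether any size still reaches the failing expander; the helper plays no part in that and can be dropped from a dg test.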
[PATCH] tree-optimization/118817 - missed folding of PRE inserted code
When PRE inserts code it is not fully folded with following SSA edges which can cause missed optimizations since the next fully folding pass is way ahead, after strlen which in the PRs case leads to diagnostics emitted on dead code. The following mitigates the missed expression canonicalization that happens during PHI translation where to be inserted expressions are calculated. It is largely refactoring and eliminating the single use of fully_constant_expression and otherwise leverages the work already done by vn_nary_simplify by updating the NARY with the simplified expression. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. PR tree-optimization/118817 * tree-ssa-pre.cc (fully_constant_expression): Fold into the single caller. (phi_translate_1): Refactor folded in fully_constant_expression. * tree-ssa-sccvn.cc (vn_nary_simplify): Update the NARY with the simplified expression. * g++.dg/lto/pr118817_0.C: New testcase. --- gcc/testsuite/g++.dg/lto/pr118817_0.C | 17 gcc/tree-ssa-pre.cc | 111 +- gcc/tree-ssa-sccvn.cc | 13 ++- 3 files changed, 65 insertions(+), 76 deletions(-) create mode 100644 gcc/testsuite/g++.dg/lto/pr118817_0.C diff --git a/gcc/testsuite/g++.dg/lto/pr118817_0.C b/gcc/testsuite/g++.dg/lto/pr118817_0.C new file mode 100644 index 000..ae65f34504e --- /dev/null +++ b/gcc/testsuite/g++.dg/lto/pr118817_0.C @@ -0,0 +1,17 @@ +// { dg-lto-do link } +// { dg-lto-options { { -O3 -fPIC -flto -shared -std=c++20 -Wall } } } +// { dg-require-effective-target fpic } +// { dg-require-effective-target shared } + +#include +#include +#include + +int func() +{ + auto strVec = std::make_unique>(); + strVec->emplace_back("One"); + strVec->emplace_back("Two"); + strVec->emplace_back("Three"); + return 0; +} diff --git a/gcc/tree-ssa-pre.cc b/gcc/tree-ssa-pre.cc index 735893bb191..ecf45d29e76 100644 --- a/gcc/tree-ssa-pre.cc +++ b/gcc/tree-ssa-pre.cc @@ -1185,41 +1185,6 @@ get_or_alloc_expr_for_constant (tree constant) return newexpr; } -/* Return the folded 
version of T if T, when folded, is a gimple - min_invariant or an SSA name. Otherwise, return T. */ - -static pre_expr -fully_constant_expression (pre_expr e) -{ - switch (e->kind) -{ -case CONSTANT: - return e; -case NARY: - { - vn_nary_op_t nary = PRE_EXPR_NARY (e); - tree res = vn_nary_simplify (nary); - if (!res) - return e; - if (is_gimple_min_invariant (res)) - return get_or_alloc_expr_for_constant (res); - if (TREE_CODE (res) == SSA_NAME) - return get_or_alloc_expr_for_name (res); - return e; - } -case REFERENCE: - { - vn_reference_t ref = PRE_EXPR_REFERENCE (e); - tree folded; - if ((folded = fully_constant_vn_reference_p (ref))) - return get_or_alloc_expr_for_constant (folded); - return e; - } -default: - return e; -} -} - /* Translate the VUSE backwards through phi nodes in E->dest, so that it has the value it would have in E->src. Set *SAME_VALID to true in case the new vuse doesn't change the value id of the OPERANDS. */ @@ -1443,57 +1408,55 @@ phi_translate_1 (bitmap_set_t dest, } if (changed) { - pre_expr constant; unsigned int new_val_id; - PRE_EXPR_NARY (expr) = newnary; - constant = fully_constant_expression (expr); - PRE_EXPR_NARY (expr) = nary; - if (constant != expr) + /* Try to simplify the new NARY. */ + tree res = vn_nary_simplify (newnary); + if (res) { + if (is_gimple_min_invariant (res)) + return get_or_alloc_expr_for_constant (res); + /* For non-CONSTANTs we have to make sure we can eventually insert the expression. Which means we need to have a leader for it. */ - if (constant->kind != CONSTANT) + gcc_assert (TREE_CODE (res) == SSA_NAME); + + /* Do not allow simplifications to non-constants over + backedges as this will likely result in a loop PHI node + to be inserted and increased register pressure. + See PR77498 - this avoids doing predcoms work in + a less efficient way. 
*/ + if (e->flags & EDGE_DFS_BACK) + ; + else { - /* Do not allow simplifications to non-constants over - backedges as this will likely result in a loop PHI node - to be inserted and increased register pressure. - See PR77498 - this avoids doing predcoms work in - a less efficient way. */ - if (e->flags & EDGE_DFS_BACK) -
Re: [PATCH 2/3] LoongArch: Split the function loongarch_cpu_cpp_builtins into two functions.
On Tue, 2025-02-11 at 20:49 +0800, Lulu Cheng wrote: > Split the implementation of the function loongarch_cpu_cpp_builtins > into two parts: > 1. Macro definitions that do not change (only considering 64-bit > architecture) > 2. Macro definitions that change with different compilation options. > > gcc/ChangeLog: > > * config/loongarch/loongarch-c.cc (builtin_undef): New macro. > (loongarch_cpu_cpp_builtins): Split to > loongarch_update_cpp_builtins > and loongarch_define_unconditional_macros. > (loongarch_def_or_undef): New functions. > (loongarch_define_unconditional_macros): Likewise. > (loongarch_update_cpp_builtins): Likewise. > > Change-Id: Ifae73ffa2a07a595ed2a7f6ab7b82d8f51328a2a > --- > gcc/config/loongarch/loongarch-c.cc | 109 +-- > - > 1 file changed, 66 insertions(+), 43 deletions(-) > > diff --git a/gcc/config/loongarch/loongarch-c.cc > b/gcc/config/loongarch/loongarch-c.cc > index 5d8c02e094b..9fe911325ab 100644 > --- a/gcc/config/loongarch/loongarch-c.cc > +++ b/gcc/config/loongarch/loongarch-c.cc > @@ -31,13 +31,21 @@ along with GCC; see the file COPYING3. 
If not see > > #define preprocessing_asm_p() (cpp_get_options (pfile)->lang == > CLK_ASM) > #define builtin_define(TXT) cpp_define (pfile, TXT) > +#define builtin_undef(TXT) cpp_undef (pfile, TXT) > #define builtin_assert(TXT) cpp_assert (pfile, TXT) > > -void > -loongarch_cpu_cpp_builtins (cpp_reader *pfile) > +static void > +loongarch_def_or_undef (bool def_p, const char *macro, cpp_reader > *pfile) > +{ > + if (def_p) > + cpp_define (pfile, macro); > + else > + cpp_undef (pfile, macro); > +} > + > +static void > +loongarch_define_unconditional_macros (cpp_reader *pfile) > { > - builtin_assert ("machine=loongarch"); > - builtin_assert ("cpu=loongarch"); > builtin_define ("__loongarch__"); > > builtin_define_with_value ("__loongarch_arch", > @@ -66,45 +74,6 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile) > builtin_define ("__loongarch_lp64"); > } > > - /* These defines reflect the ABI in use, not whether the > - FPU is directly accessible. */ > - if (TARGET_DOUBLE_FLOAT_ABI) > - builtin_define ("__loongarch_double_float=1"); > - else if (TARGET_SINGLE_FLOAT_ABI) > - builtin_define ("__loongarch_single_float=1"); > - > - if (TARGET_DOUBLE_FLOAT_ABI || TARGET_SINGLE_FLOAT_ABI) > - builtin_define ("__loongarch_hard_float=1"); > - else > - builtin_define ("__loongarch_soft_float=1"); > - > - > - /* ISA Extensions. 
*/
> -  if (TARGET_DOUBLE_FLOAT)
> -    builtin_define ("__loongarch_frlen=64");
> -  else if (TARGET_SINGLE_FLOAT)
> -    builtin_define ("__loongarch_frlen=32");
> -  else
> -    builtin_define ("__loongarch_frlen=0");
> -
> -  if (TARGET_HARD_FLOAT && ISA_HAS_FRECIPE)
> -    builtin_define ("__loongarch_frecipe");
> -
> -  if (ISA_HAS_LSX)
> -    {
> -      builtin_define ("__loongarch_simd");
> -      builtin_define ("__loongarch_sx");
> -
> -      if (!ISA_HAS_LASX)
> -        builtin_define ("__loongarch_simd_width=128");
> -    }
> -
> -  if (ISA_HAS_LASX)
> -    {
> -      builtin_define ("__loongarch_asx");
> -      builtin_define ("__loongarch_simd_width=256");
> -    }
> -
>    /* ISA evolution features */
>    int max_v_major = 1, max_v_minor = 0;

I guess the handling for la_evo_macro_name macros (like
__loongarch_div32) and
__loongarch_version_major/__loongarch_version_minor should be moved as
well?  Things like #pragma GCC target("arch=la664") may affect them.

--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
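To illustrate the concern, here is a sketch of user code whose view of an ISA-evolution macro could change in the middle of a file. The pragma is guarded so the file also compiles on non-LoongArch hosts. The SKETCH_* macros and sketch_* functions are hypothetical, and whether __loongarch_div32 really flips after the pragma depends on exactly the handling being discussed here.

```c
/* Value of the macro as seen with the command-line options.  */
#ifdef __loongarch_div32
# define SKETCH_DIV32_BEFORE 1
#else
# define SKETCH_DIV32_BEFORE 0
#endif

#ifdef __loongarch__
# pragma GCC target ("arch=la664")
#endif

/* Value of the macro as seen after the target pragma.  */
#ifdef __loongarch_div32
# define SKETCH_DIV32_AFTER 1
#else
# define SKETCH_DIV32_AFTER 0
#endif

int
sketch_div32_before (void)
{
  return SKETCH_DIV32_BEFORE;
}

int
sketch_div32_after (void)
{
  return SKETCH_DIV32_AFTER;
}
```

If the evolution macros stay in the "unconditional" half of the split, both functions would return the same value even when the pragma selects an architecture with more features.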
Re: [PATCH] RISC-V: unrecognizable insn ICE in xtheadvector/pr114194.c on 32bit targets
On Tue, 11 Feb 2025 20:29:03 +0800, Craig Blackmore wrote: > On 10/02/2025 08:37, Jin Ma wrote: > > On Sun, 09 Feb 2025 14:04:00 +0800, Jin Ma wrote: > >>PR target/118601 > > > > Ok for trunk? > > > > Best regards, > > Jin Ma > > > >> gcc/ChangeLog: > >> > >>* config/riscv/riscv.cc (riscv_use_by_pieces_infrastructure_p): > >>Exclude XTheadVector. > >> > >> Reported-by: Edwin Lu > >> --- > >> gcc/config/riscv/riscv.cc | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc > >> index 819e1538741..e5776aa0fbe 100644 > >> --- a/gcc/config/riscv/riscv.cc > >> +++ b/gcc/config/riscv/riscv.cc > >> @@ -13826,7 +13826,7 @@ riscv_use_by_pieces_infrastructure_p (unsigned > >> HOST_WIDE_INT size, > >> /* For set/clear with size > UNITS_PER_WORD, by pieces uses vector > >> broadcasts > >>with UNITS_PER_WORD size pieces. Use setmem instead which > >> can use > >>bigger chunks. */ > >> - if (TARGET_VECTOR && stringop_strategy & STRATEGY_VECTOR > >> + if (TARGET_VECTOR && !TARGET_XTHEADVECTOR && stringop_strategy & > >> STRATEGY_VECTOR > >> && (op == CLEAR_BY_PIECES || op == SET_BY_PIECES) > >> && speed_p && size > UNITS_PER_WORD) > >> return false; > > `riscv_vector::expand_vec_setmem` generates the unrecognizable > instruction and your patch avoids calling it in some, but not all, > cases. Here is a case that still ICEs with `-march=rv32gc_xtheadvector > -mabi=ilp32d` and `-march=rv64gc_xtheadvector -mabi=lp64d` after > applying your patch: > ``` > void foo1_16 (void *p) > { >__builtin_memset (p, 1, 16); > } > ``` > I suggest returning `false` in `riscv_vector::expand_vec_setmem` for > `TARGET_XTHEADVECTOR` or trying to generate something that is valid for > `TARGET_XTHEADVECTOR`. If you do bail out of > `riscv_vector::expand_vec_setmem` then you probably want to keep your > existing change too so that by pieces is still used for smaller lengths > rather than a libcall. 
Thank you very much for your professional reply. I assumed this problem
was very simple, and that judgment was wrong. I will rethink the fix.

Best regards,
Jin Ma

> >> --
> >> 2.25.1
Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103]
Jeff Law writes:
> On 2/7/25 5:59 AM, Andrew Waterman wrote:
>> This patch runs counter to the ABI spec, which states that vxrm is not
>> preserved across calls and is volatile upon function entry [1].  vxrm
>> does not play the same role as frm plays in the calling convention.
>> (I won't get into the rationale in this email, but the rationale isn't
>> especially important: we should follow the ABI.)
>>
>> [1]
>> https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/3a79e936eec5491078b1133ac943f91ef5fd75fd/riscv-cc.adoc?plain=1#L119-L120
> Pan's patch doesn't change the basic property that VXRM has no known
> state at function entry or upon return from a function call.

I think it will.  global_regs[X] means that X is defined on entry,
defined on exit, and can be changed by calls.

If the register is call-clobbered/volatile/caller-saved, then I agree
with Andrew that this doesn't look like the right fix.

Thanks,
Richard
Re: [PATCH] aarch64: Update fp8 dependencies
Andrew Carlotti writes: > We agreed with LLVM developer to not enforce the architectural > dependencies between fp8 multiplication features, and they have already > been removed from LLVM and Binutils. Remove them from GCC as well. > > > > I have bootstrapped and regression tested this. There are no test > result changes between GCC+Binutils with old feature dependencies and > GCC+Binutils with new feature dependencies, and some improvements > compared to old GCC with new Binutils. > > Ok for master? > > > gcc/ChangeLog: > > * config/aarch64/aarch64-option-extensions.def > (SSVE_FP8FMA): Adjust formatting. > (FP8DOT4): Replace FP8FMA dependency with FP8. > (SSVE_FP8DOT4): Replace SSVE_FP8FMA dependency with SME2+FP8. > (FP8DOT2): Replace FP8DOT4 dependency with FP8. > (SSVE_FP8DOT2): Replace SSVE_FP8DOT4 dependency with SME2+FP8. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/pragma_cpp_predefs_4.c: Adjust expected > defines. > * gcc.target/aarch64/simd/vmla_lane_indices_1.c: Modify target > pragmas. > * gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c: > Ditto. > * > gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c: > Ditto. > * gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c: Ditto. > * gcc.target/aarch64/sve2/acle/asm/dot_mf8.c: Ditto. OK, thanks. 
Richard > diff --git a/gcc/config/aarch64/aarch64-option-extensions.def > b/gcc/config/aarch64/aarch64-option-extensions.def > index > cc42bd518dca5e4b947c81f06e543133b4f25440..aa8d315c240fbd25b49008b131cc09f04001eb80 > 100644 > --- a/gcc/config/aarch64/aarch64-option-extensions.def > +++ b/gcc/config/aarch64/aarch64-option-extensions.def > @@ -261,17 +261,17 @@ AARCH64_OPT_EXTENSION("fp8", FP8, (SIMD), (), (), > "f8cvt") > > AARCH64_OPT_EXTENSION("fp8fma", FP8FMA, (FP8), (), (), "f8fma") > > -AARCH64_OPT_EXTENSION("ssve-fp8fma", SSVE_FP8FMA, (SME2,FP8), (), (), > "smesf8fma") > +AARCH64_OPT_EXTENSION("ssve-fp8fma", SSVE_FP8FMA, (SME2, FP8), (), (), > "smesf8fma") > > AARCH64_OPT_EXTENSION("faminmax", FAMINMAX, (SIMD), (), (), "faminmax") > > -AARCH64_OPT_EXTENSION("fp8dot4", FP8DOT4, (FP8FMA), (), (), "f8dp4") > +AARCH64_OPT_EXTENSION("fp8dot4", FP8DOT4, (FP8), (), (), "f8dp4") > > -AARCH64_OPT_EXTENSION("ssve-fp8dot4", SSVE_FP8DOT4, (SSVE_FP8FMA), (), (), > "smesf8dp4") > +AARCH64_OPT_EXTENSION("ssve-fp8dot4", SSVE_FP8DOT4, (SME2, FP8), (), (), > "smesf8dp4") > > -AARCH64_OPT_EXTENSION("fp8dot2", FP8DOT2, (FP8DOT4), (), (), "f8dp2") > +AARCH64_OPT_EXTENSION("fp8dot2", FP8DOT2, (FP8), (), (), "f8dp2") > > -AARCH64_OPT_EXTENSION("ssve-fp8dot2", SSVE_FP8DOT2, (SSVE_FP8DOT4), (), (), > "smesf8dp2") > +AARCH64_OPT_EXTENSION("ssve-fp8dot2", SSVE_FP8DOT2, (SME2, FP8), (), (), > "smesf8dp2") > > AARCH64_OPT_EXTENSION("lut", LUT, (SIMD), (), (), "lut") > > diff --git a/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_4.c > b/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_4.c > index > 0dcfbec05bad5f446c9f169051c9b86b9844946d..97d68b94512e1ffdd5ceb484a6378b3a1ec9d115 > 100644 > --- a/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_4.c > +++ b/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_4.c > @@ -292,7 +292,7 @@ > #ifndef __ARM_FEATURE_FP8 > #error Foo > #endif > -#ifndef __ARM_FEATURE_FP8FMA > +#ifdef __ARM_FEATURE_FP8FMA > #error Foo > #endif > #ifndef 
__ARM_FEATURE_FP8DOT4 > @@ -306,10 +306,10 @@ > #ifndef __ARM_FEATURE_FP8 > #error Foo > #endif > -#ifndef __ARM_FEATURE_FP8FMA > +#ifdef __ARM_FEATURE_FP8FMA > #error Foo > #endif > -#ifndef __ARM_FEATURE_FP8DOT4 > +#ifdef __ARM_FEATURE_FP8DOT4 > #error Foo > #endif > #ifndef __ARM_FEATURE_FP8DOT2 > diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vmla_lane_indices_1.c > b/gcc/testsuite/gcc.target/aarch64/simd/vmla_lane_indices_1.c > index > d1a69f4ba54133a5d6d19b5fb73c2768ec29e60b..739ff4c6a75a8014637b2b48d8121127ad6a8539 > 100644 > --- a/gcc/testsuite/gcc.target/aarch64/simd/vmla_lane_indices_1.c > +++ b/gcc/testsuite/gcc.target/aarch64/simd/vmla_lane_indices_1.c > @@ -2,7 +2,7 @@ > > #include "arm_neon.h" > > -#pragma GCC target "+fp8dot4+fp8dot2" > +#pragma GCC target "+fp8fma" > > void > test(float16x4_t f16, float16x8_t f16q, float32x2_t f32, > diff --git > a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c > b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c > index > 9ad789a8ad2c5df109d6471a7ca22355ba26edea..fa0df46db2262a5a3e17bec974fb4807886708e9 > 100644 > --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c > +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c > @@ -2,7 +2,7 @@ > > #include > > -#pragma GCC target ("arch=armv8.2-a+sve2+fp8dot2") > +#pragma GCC target ("arch=armv8.2-a+sve2+fp8fma+fp8dot4+fp8dot2") > > void > test (svfloat16_t f16, svmfloat8_t f8, fpm_t fp
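The effect of the dependency change can be modelled with a small closure computation. This is only a sketch: the enum, the tables and sketch_enable are hypothetical and are not how aarch64-option-extensions.def is processed. The dependency lists themselves are copied from the hunk above, restricted to the non-streaming features.

```c
/* Hypothetical feature indices for the non-streaming fp8 extensions.  */
enum sketch_feature
{
  SK_SIMD,
  SK_FP8,
  SK_FP8FMA,
  SK_FP8DOT4,
  SK_FP8DOT2,
  SK_NUM_FEATURES
};

#define SK_BIT(f) (1u << (f))

/* Direct dependencies before the patch.  */
static const unsigned int sketch_old_deps[SK_NUM_FEATURES] = {
  [SK_SIMD] = 0,
  [SK_FP8] = SK_BIT (SK_SIMD),
  [SK_FP8FMA] = SK_BIT (SK_FP8),
  [SK_FP8DOT4] = SK_BIT (SK_FP8FMA),
  [SK_FP8DOT2] = SK_BIT (SK_FP8DOT4),
};

/* Direct dependencies after the patch.  */
static const unsigned int sketch_new_deps[SK_NUM_FEATURES] = {
  [SK_SIMD] = 0,
  [SK_FP8] = SK_BIT (SK_SIMD),
  [SK_FP8FMA] = SK_BIT (SK_FP8),
  [SK_FP8DOT4] = SK_BIT (SK_FP8),
  [SK_FP8DOT2] = SK_BIT (SK_FP8),
};

/* Return FEATURE plus everything it transitively depends on.  */
static unsigned int
sketch_enable (const unsigned int *deps, enum sketch_feature feature)
{
  unsigned int enabled = SK_BIT (feature);
  unsigned int previous;
  do
    {
      previous = enabled;
      for (int f = 0; f < SK_NUM_FEATURES; f++)
        if (enabled & SK_BIT (f))
          enabled |= deps[f];
    }
  while (enabled != previous);
  return enabled;
}
```

With the old table, enabling fp8dot2 pulls in fp8dot4 and fp8fma as well; with the new table it pulls in only fp8 and simd, which matches the #ifndef to #ifdef changes in pragma_cpp_predefs_4.c.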
Re: PING^2 [RFC] Prevent the scheduler from moving prefetch instructions when expanding __builtin_prefetch [PR 116713]
Jeff Law writes: > On 2/7/25 5:51 AM, Oleg Endo wrote: >>> Hi, >>> >>> Can the issue be resolved in a target independent manner as suggested below? >>> Or is it better to deal with this in the target code? > That seems like a pretty heavy hammer though. For that reason alone I > think this is going to need some discussion and I believe the folks most > needed for that discussion are focused on release related issues. Yeah, agreed. Prefetches ought to be restricted to performance-critical code, which is also the kind of code that would suffer from having extra blockage instructions. We do have a prefetch rtl code, so it should be possible for the scheduler to recognise prefetches and handle them in a more sensible way. That would be more complex though... Thanks, Richard
Re: [PATCH] lto: Add an entry for cold attribute to lto_gnu_attributes
On Mon, Feb 10, 2025 at 11:01 PM Martin Jambor wrote: > > Hi, > > PR 118125 is a performance regression stemming from the fact that we > lose the cold attribute of our __builtin_unreachable. The attribute > is simply and silently dropped on the floor by decl_attributes (in > attribs.cc) in the process of building decls for builtins because it > cannot look it up in the gnu attribute name space by > lookup_scoped_attribute_spec. For that not to happen it must be in > lto_gnu_attributes and this patch adds it there. > > In comment 13 of the bug Andrew identified other attributes which are > in builtin-attrs.def but missing in lto_gnu_attributes but apart from > cold it seems that they are either not used in builtins.def or are > used in DEF_LIB_BUILTIN which I guess might be less critical? > Eventually I decided to go for the most simple of patches and only add > things if they are requested. For the same reason I also did not add > any checking to the attribute "handle" callback or any exclusion check. > They seem to be mostly relevant before LTO FE kicks in to me, but > again, I'm happy to add any if they seem to be useful. > > Since Ian fixed PR 118746, the same issue has also been fixed in the > Go front-end and so I have added a simple checking assert to the > redirect_to_unreachable function to make sure it has the intended > effect. > > LTO-bootstrapped and tested on x86_64-linux. OK for master? OK. Thanks, Richard. > Thanks, > > Martin > > > gcc/ChangeLog: > > 2025-02-03 Martin Jambor > > PR lto/118125 > * ipa-fnsummary.cc (redirect_to_unreachable): Add checking assert > that the builtin_unreachable decl has attribute cold. > > gcc/lto/ChangeLog: > > 2025-02-03 Martin Jambor > > PR lto/118125 > * lto-lang.cc (lto_gnu_attributes): Add an entry for cold attribute. > (handle_cold_attribute): New function. 
> --- > gcc/ipa-fnsummary.cc | 3 +++ > gcc/lto/lto-lang.cc | 13 + > 2 files changed, 16 insertions(+) > > diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc > index 33f19365ec3..4c062fe8a0e 100644 > --- a/gcc/ipa-fnsummary.cc > +++ b/gcc/ipa-fnsummary.cc > @@ -255,6 +255,9 @@ redirect_to_unreachable (struct cgraph_edge *e) >struct cgraph_node *target > = cgraph_node::get_create (builtin_decl_unreachable ()); > > + gcc_checking_assert (lookup_attribute ("cold", > +DECL_ATTRIBUTES (target->decl))); > + >if (e->speculative) > e = cgraph_edge::resolve_speculation (e, target->decl); >else if (!e->callee) > diff --git a/gcc/lto/lto-lang.cc b/gcc/lto/lto-lang.cc > index 652d7fc5e30..e41b548b398 100644 > --- a/gcc/lto/lto-lang.cc > +++ b/gcc/lto/lto-lang.cc > @@ -60,6 +60,7 @@ static tree ignore_attribute (tree *, tree, tree, int, bool > *); > static tree handle_format_attribute (tree *, tree, tree, int, bool *); > static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *); > static tree handle_format_arg_attribute (tree *, tree, tree, int, bool *); > +static tree handle_cold_attribute (tree *, tree, tree, int, bool *); > > /* Helper to define attribute exclusions. */ > #define ATTR_EXCL(name, function, type, variable) \ > @@ -128,6 +129,8 @@ static const attribute_spec lto_gnu_attributes[] = > handle_sentinel_attribute, NULL }, >{ "type generic", 0, 0, false, true, true, false, > handle_type_generic_attribute, NULL }, > + { "cold", 0, 0, false, false, false, false, > + handle_cold_attribute, NULL }, >{ "fn spec", 1, 1, false, true, true, false, > handle_fnspec_attribute, NULL }, >{ "transaction_pure", 0, 0, false, true, true, false, > @@ -598,6 +601,16 @@ handle_fnspec_attribute (tree *node ATTRIBUTE_UNUSED, > tree ARG_UNUSED (name), >return NULL_TREE; > } > > +/* Handle a "cold" attribute; arguments as in > + struct attribute_spec.handler. 
*/ > + > +static tree > +handle_cold_attribute (tree *, tree, tree, int, bool *) > +{ > + /* Nothing to be done here. */ > + return NULL_TREE; > +} > + > /* Cribbed from c-common.cc. */ > > static void > -- > 2.47.1 >
Re: [PATCH] middle-end/118801 - excessive redundant DEBUG BEGIN_STMT
On Mon, 10 Feb 2025, Richard Biener wrote: > On Mon, 10 Feb 2025, Richard Biener wrote: > > > The following addresses the fact that we keep an excessive amount of > > redundant DEBUG BEGIN_STMTs - in the testcase it sums up to 99.999% > > of all stmts, sucking up compile-time in IL walks. The patch amends > > the GIMPLE DCE code that elides redundant DEBUG BIND stmts, also > > pruning uninterrupted sequences of DEBUG BEGIN_STMTs, keeping only > > the last one. > > > > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. > > > > For the testcase this brings down compile-time to 150% of -g0 levels > > (and only 215 out of originally 1981380 DEBUG BEGIN_STMTs kept). > > > > OK for trunk and possibly backports? > > It regresses a few guality checks (and progresses one), I've looked > only into one, g++.dg/guality/pr67192.C, where we now see > FAIL: g++.dg/guality/pr67192.C -O[123sg] line 54 cnt == 15 > because the breakpoint happens in the wrong place. But this shows > it "works" only by accident. The testcase is > > __attribute__((noinline, noclone)) static void > f4 (void) > { > while (1) /* { dg-final { gdb-test 54 "cnt" "15" } } */ > if (last ()) > break; > else > do_it (); > do_it (); /* { dg-final { gdb-test 59 "cnt" "20" } } */ > } > > and we have two BEGIN_STMTs for line 54(!) originally: > > [/space/rguenther/src/gcc/gcc/testsuite/g++.dg/guality/pr67192.C:54:3] # > DEBUG BEGIN_STMT > : > [/space/rguenther/src/gcc/gcc/testsuite/g++.dg/guality/pr67192.C:55:5] # > DEBUG BEGIN_STMT > ... > [/space/rguenther/src/gcc/gcc/testsuite/g++.dg/guality/pr67192.C:54:3] # > DEBUG BEGIN_STMT > [/space/rguenther/src/gcc/gcc/testsuite/g++.dg/guality/pr67192.C:55:5] > goto ; > > and special code in make_blocks() moves the first BEGIN_STMT after > the label, altering when we reach a breakpoint on the line. > > You can see that with the first BEGIN_STMT moved the patch will elide it, > and gdb will find the second location. 
> > With removing only repeating BEGIN_STMT with exactly > the same location (unfortunately with uint64_t a bitmap no longer > works), we're "only" down to 996 BEGIN_STMTs for the testcase. > > So I'm retesting the following. Bootstrapped and tested on x86_64-unknown-linux-gnu without regressions this time. Alex, is this OK for trunk? Thanks, Richard. > Richard. > > From 38d49d3e2c0bf98e9e2a46e251ae0454b084cc8d Mon Sep 17 00:00:00 2001 > From: Richard Biener > Date: Mon, 10 Feb 2025 10:23:45 +0100 > Subject: [PATCH] middle-end/118801 - excessive redundant DEBUG BEGIN_STMT > To: gcc-patches@gcc.gnu.org > > The following addresses the fact that we keep an excessive amount of > redundant DEBUG BEGIN_STMTs - in the testcase it sums up to 99.999% > of all stmts, sucking up compile-time in IL walks. The patch amends > the GIMPLE DCE code that elides redundant DEBUG BIND stmts, also > pruning uninterrupted sequences of DEBUG BEGIN_STMTs, keeping only > the last of each set of DEBUG BEGIN_STMT with unique location. > > PR middle-end/118801 > * tree-ssa-dce.cc (eliminate_unnecessary_stmts): Prune > sequences of uninterrupted DEBUG BEGIN_STMTs, keeping only > the last of a set with unique location. > --- > gcc/tree-ssa-dce.cc | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/gcc/tree-ssa-dce.cc b/gcc/tree-ssa-dce.cc > index be21a2d0b50..461283ba858 100644 > --- a/gcc/tree-ssa-dce.cc > +++ b/gcc/tree-ssa-dce.cc > @@ -1508,6 +1508,7 @@ eliminate_unnecessary_stmts (bool aggressive) > >/* Remove dead statements. */ >auto_bitmap debug_seen; > + hash_set> locs_seen; >for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi = psi) > { > stmt = gsi_stmt (gsi); > @@ -1670,6 +1671,15 @@ eliminate_unnecessary_stmts (bool aggressive) > remove_dead_stmt (&gsi, bb, to_remove_edges); > continue; > } > + else if (gimple_debug_begin_stmt_p (stmt)) > + { > + /* We are only keeping the last debug-begin in a series of > + debug-begin stmts. 
*/ > + if (locs_seen.add (gimple_location (stmt))) > + remove_dead_stmt (&gsi, bb, to_remove_edges); > + continue; > + } > + locs_seen.empty (); > bitmap_clear (debug_seen); > } > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
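For readers following the thread, the pruning rule can be modelled outside GCC. This is only a sketch under stated assumptions: `Stmt`, its fields and `prune_begin_stmts` are hypothetical stand-ins, not GCC's gimple API. Like the DCE loop in the patch, it walks the block backwards, keeps the last DEBUG BEGIN_STMT for each location within an uninterrupted run, and resets the set of seen locations at any other statement.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Toy stand-in for a GIMPLE statement (hypothetical, not GCC's type).
struct Stmt {
  bool is_debug_begin;  // true for a DEBUG BEGIN_STMT marker
  std::uint64_t loc;    // source location id
};

// Walk the block backwards. Within an uninterrupted run of markers, keep
// only the last marker seen for each location. Any other statement ends
// the run and resets the set of seen locations.
std::vector<Stmt> prune_begin_stmts(const std::vector<Stmt>& block) {
  std::vector<Stmt> kept_reversed;
  std::unordered_set<std::uint64_t> locs_seen;
  for (auto it = block.rbegin(); it != block.rend(); ++it) {
    if (it->is_debug_begin) {
      // insert().second is false when this location was already seen,
      // i.e. a later marker in the same run already covers it.
      if (!locs_seen.insert(it->loc).second)
        continue;
    } else {
      locs_seen.clear();
    }
    kept_reversed.push_back(*it);
  }
  std::vector<Stmt> result(kept_reversed.rbegin(), kept_reversed.rend());
  assert(result.size() <= block.size());  // pruning never adds statements
  return result;
}
```

With this rule the two line-54 markers from pr67192.C would both survive when a marker for line 55 or a real statement sits between them and they are in different runs, whereas back-to-back repeats of the same location collapse to one.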
Re: [PATCH 6/8] LoongArch: Simplify {lsx,lasx_x}vpick description
On Tue, 2025-02-11 at 16:52 +0800, Lulu Cheng wrote:
>
> On 2025/2/7 at 8:09 PM, Xi Ruoyao wrote:
> /* snip */
> > -
> > -(define_insn "lasx_xvpickev_w"
> > -  [(set (match_operand:V8SI 0 "register_operand" "=f")
> > -        (vec_select:V8SI
> > -          (vec_concat:V16SI
> > -            (match_operand:V8SI 1 "register_operand" "f")
> > -            (match_operand:V8SI 2 "register_operand" "f"))
> > -          (parallel [(const_int 0) (const_int 2)
> > -                     (const_int 8) (const_int 10)
> > -                     (const_int 4) (const_int 6)
> > -                     (const_int 12) (const_int 14)])))]
> > -  "ISA_HAS_LASX"
> > -  "xvpickev.w\t%u0,%u2,%u1"
> > -  [(set_attr "type" "simd_permute")
> > -   (set_attr "mode" "V8SI")])
> > -
> /* snip */
> > +;; Picking even/odd elements.
> > +(define_insn "simd_pick_evod_"
> > +  [(set (match_operand:ALLVEC 0 "register_operand" "=f")
> > +        (vec_select:ALLVEC
> > +          (vec_concat:
> > +            (match_operand:ALLVEC 1 "register_operand" "f")
> > +            (match_operand:ALLVEC 2 "register_operand" "f"))
> > +          (match_operand: 3 "vect_par_cnst_even_or_odd_half")))]
>
> For LASX, the generated select array is problematic, taking xvpickev.w
> as an example:
>
> xvpickev.w vd,vj,vk
>
> The behavior of the instruction is as follows:
>
> vd.w[0] = vk.w[0]
> vd.w[1] = vk.w[2]
> vd.w[2] = vj.w[0]
> vd.w[3] = vj.w[2]
> vd.w[4] = vk.w[4]
> vd.w[5] = vk.w[6]
> vd.w[6] = vj.w[4]
> vd.w[7] = vj.w[6]

Oops, silly me. Strangely, bootstrapping (even with BOOT_CFLAGS="-O2 -g -march=la664") and regtesting did not catch it. I'll limit this to LSX in v2.

--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Re: [PATCH v3 2/5] c++/modules: Ignore TU-local entities where necessary
On Wed, 12 Feb 2025, Nathaniel Shead wrote: > On Mon, Jan 27, 2025 at 10:20:05AM -0500, Patrick Palka wrote: > > [snip] > > > > > @@ -18486,6 +18562,12 @@ dependent_operand_p (tree t) > > > { > > >while (TREE_CODE (t) == IMPLICIT_CONV_EXPR) > > > t = TREE_OPERAND (t, 0); > > > + > > > + /* If we contain a TU_LOCAL_ENTITY assume we're non-dependent; we'll > > > error > > > + later when instantiating. */ > > > + if (expr_contains_tu_local_entity (t)) > > > +return false; > > > > I think it'd be more robust and cheaper (avoiding a separate tree walk) > > to teach the general constexpr/dependence predicates about > > TU_LOCAL_ENTITY instead of handling it only here. > > > > > + > > >++processing_template_decl; > > >bool r = (potential_constant_expression (t) > > > ? value_dependent_expression_p (t) > > > @@ -20255,6 +20337,9 @@ tsubst_expr (tree t, tree args, tsubst_flags_t > > > complain, tree in_decl) > > > else > > > object = NULL_TREE; > > > > > > + if (function_contains_tu_local_entity (templ)) > > > + RETURN (error_mark_node); > > > + > > > tree tid = lookup_template_function (templ, targs); > > > protected_set_expr_location (tid, EXPR_LOCATION (t)); > > > > > > @@ -20947,6 +21032,9 @@ tsubst_expr (tree t, tree args, tsubst_flags_t > > > complain, tree in_decl) > > > qualified_p = true; > > > } > > > > > > + if (function_contains_tu_local_entity (function)) > > > + RETURN (error_mark_node); > > > > Similarly, maybe it'd suffice to check this more generally in the > > OVERLOAD case of tsubst_expr? > > > > So I'd completely missed the idea of handling it in the OVERLOAD case; > doing this also fixes the issues I'd been having trying to handle it in > potential_constant_expression. I think this should be a lot cleaner > now. > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? Nice, LGTM! 
> > -- >8 -- > > Subject: [PATCH] c++: Handle TU_LOCAL_ENTITY in tsubst_expr and > potential_constant_expression > > This cleans up the TU_LOCAL_ENTITY handling to avoid unnecessary > tree walks and make the logic more robust. > > gcc/cp/ChangeLog: > > * constexpr.cc (potential_constant_expression_1): Handle > TU_LOCAL_ENTITY. > * pt.cc (expr_contains_tu_local_entity): Remove. > (function_contains_tu_local_entity): Remove. > (dependent_operand_p): Remove special handling for > TU_LOCAL_ENTITY. > (tsubst_expr): Handle TU_LOCAL_ENTITY when tsubsting OVERLOADs; > remove now-unnecessary extra handling. > (type_dependent_expression_p): Handle TU_LOCAL_ENTITY. > > Signed-off-by: Nathaniel Shead > --- > gcc/cp/constexpr.cc | 5 +++ > gcc/cp/pt.cc| 80 - > 2 files changed, 19 insertions(+), 66 deletions(-) > > diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc > index f142dd32bc8..b36705fd4ce 100644 > --- a/gcc/cp/constexpr.cc > +++ b/gcc/cp/constexpr.cc > @@ -10825,6 +10825,11 @@ potential_constant_expression_1 (tree t, bool > want_rval, bool strict, bool now, > case CO_RETURN_EXPR: >return false; > > +/* Assume a TU-local entity is not constant, we'll error later when > + instantiating. */ > +case TU_LOCAL_ENTITY: > + return false; > + > case NONTYPE_ARGUMENT_PACK: >{ > tree args = ARGUMENT_PACK_ARGS (t); > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc > index f857b3f1180..966050a6608 100644 > --- a/gcc/cp/pt.cc > +++ b/gcc/cp/pt.cc > @@ -9935,61 +9935,6 @@ complain_about_tu_local_entity (tree e) >inform (TU_LOCAL_ENTITY_LOCATION (e), "declared here"); > } > > -/* Checks if T contains a TU-local entity. 
*/ > - > -static bool > -expr_contains_tu_local_entity (tree t) > -{ > - if (!modules_p ()) > -return false; > - > - auto walker = [](tree *tp, int *walk_subtrees, void *) -> tree > -{ > - if (TREE_CODE (*tp) == TU_LOCAL_ENTITY) > - return *tp; > - if (!EXPR_P (*tp)) > - *walk_subtrees = false; > - return NULL_TREE; > -}; > - return cp_walk_tree (&t, walker, nullptr, nullptr); > -} > - > -/* Errors and returns TRUE if X is a function that contains a TU-local > - entity in its overload set. */ > - > -static bool > -function_contains_tu_local_entity (tree x) > -{ > - if (!modules_p ()) > -return false; > - > - if (!x || x == error_mark_node) > -return false; > - > - if (TREE_CODE (x) == OFFSET_REF > - || TREE_CODE (x) == COMPONENT_REF) > -x = TREE_OPERAND (x, 1); > - x = MAYBE_BASELINK_FUNCTIONS (x); > - if (TREE_CODE (x) == TEMPLATE_ID_EXPR) > -x = TREE_OPERAND (x, 0); > - > - if (OVL_P (x)) > -for (tree ovl : lkp_range (x)) > - if (TREE_CODE (ovl) == TU_LOCAL_ENTITY) > - { > - x = ovl; > - break; > - } > - > - if (TREE_CODE (x) == TU_LOCAL_ENTITY) > -{ > - complain_about_tu_local_entity (x); > - return true; > -} > - > - return false; > -} > - >
Re: [RFA][PR target/115478] Accept ADD, IOR or XOR when combining objects with no bits in common
Jeff Law writes: > So the change to prefer ADD over IOR for combining two objects with no > bits in common is (IMHO) generally good. It has some minor fallout. > > In particular the aarch64 port (and I suspect others) have patterns that > recognize IOR, but not PLUS or XOR for these cases and thus tests which > expected to optimize with IOR are no longer optimizing. > > Roger suggested using a code iterator for this purpose. Richard S. > suggested a new match operator to cover those cases. > > I really like the match operator idea, but as Richard S. notes in the PR > it would require either not validating the "no bits in common", which > dramatically reduces the utility IMHO or we'd need some work to allow > consistent results without polluting the nonzero bits cache. > > So this patch goes back to Roger's idea of just using a match iterator > in the aarch64 backend (and presumably anywhere else we see this popping > up). > > Bootstrapped and regression tested on aarch64-linux-gnu where it fixes > bitint-args.c (as expected). > > OK for the trunk? > > Jeff > > PR target/115478 > gcc/ > * config/aarch64/iterators.md (any_or_plus): New code iterator. > * config/aarch64/aarch64.md (extr5_insn): Use any_or_plus. > (extr5_insn_alt, extrsi5_insn_uxtw): Likewise. > (extrsi5_insn_uxtw_alt, extrsi5_insn_di): Likewise. > > gcc/testsuite/ > * gcc.target/aarch64/bitint-args.c: Update expected output. OK, thanks! (For the record, I agree that the match_operator thing requires too many changes for stage 4.) 
Richard > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index 071058dbeb3..cfe730f3732 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -6194,10 +6194,11 @@ > > (define_insn "*extr5_insn" >[(set (match_operand:GPI 0 "register_operand" "=r") > - (ior:GPI (ashift:GPI (match_operand:GPI 1 "register_operand" "r") > - (match_operand 3 "const_int_operand" "n")) > - (lshiftrt:GPI (match_operand:GPI 2 "register_operand" "r") > -(match_operand 4 "const_int_operand" "n"] > + (any_or_plus:GPI > + (ashift:GPI (match_operand:GPI 1 "register_operand" "r") > + (match_operand 3 "const_int_operand" "n")) > + (lshiftrt:GPI (match_operand:GPI 2 "register_operand" "r") > + (match_operand 4 "const_int_operand" "n"] >"UINTVAL (operands[3]) < GET_MODE_BITSIZE (mode) && > (UINTVAL (operands[3]) + UINTVAL (operands[4]) == GET_MODE_BITSIZE > (mode))" >"extr\\t%0, %1, %2, %4" > @@ -6208,10 +6209,11 @@ > ;; so we have to match both orderings. > (define_insn "*extr5_insn_alt" >[(set (match_operand:GPI 0 "register_operand" "=r") > - (ior:GPI (lshiftrt:GPI (match_operand:GPI 2 "register_operand" "r") > - (match_operand 4 "const_int_operand" "n")) > - (ashift:GPI (match_operand:GPI 1 "register_operand" "r") > - (match_operand 3 "const_int_operand" "n"] > + (any_or_plus:GPI > + (lshiftrt:GPI (match_operand:GPI 2 "register_operand" "r") > + (match_operand 4 "const_int_operand" "n")) > + (ashift:GPI (match_operand:GPI 1 "register_operand" "r") > + (match_operand 3 "const_int_operand" "n"] >"UINTVAL (operands[3]) < GET_MODE_BITSIZE (mode) > && (UINTVAL (operands[3]) + UINTVAL (operands[4]) > == GET_MODE_BITSIZE (mode))" > @@ -6223,10 +6225,11 @@ > (define_insn "*extrsi5_insn_uxtw" >[(set (match_operand:DI 0 "register_operand" "=r") > (zero_extend:DI > - (ior:SI (ashift:SI (match_operand:SI 1 "register_operand" "r") > - (match_operand 3 "const_int_operand" "n")) > - (lshiftrt:SI (match_operand:SI 2 "register_operand" "r") > - 
(match_operand 4 "const_int_operand" "n")] > + (any_or_plus:SI > +(ashift:SI (match_operand:SI 1 "register_operand" "r") > + (match_operand 3 "const_int_operand" "n")) > +(lshiftrt:SI (match_operand:SI 2 "register_operand" "r") > + (match_operand 4 "const_int_operand" "n")] >"UINTVAL (operands[3]) < 32 && > (UINTVAL (operands[3]) + UINTVAL (operands[4]) == 32)" >"extr\\t%w0, %w1, %w2, %4" > @@ -6236,10 +6239,11 @@ > (define_insn "*extrsi5_insn_uxtw_alt" >[(set (match_operand:DI 0 "register_operand" "=r") > (zero_extend:DI > - (ior:SI (lshiftrt:SI (match_operand:SI 2 "register_operand" "r") > -(match_operand 4 "const_int_operand" "n")) > - (ashift:SI (match_operand:SI 1 "register_operand" "r") > - (match_operand 3 "const_int_operand" "n")] > + (any_or_plus:SI > +(lshiftrt:SI (match_operand:SI 2 "register_operand" "r") > + (match_operand 4 "co
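The reason ADD, IOR and XOR are interchangeable in these extr patterns can be checked with plain integer arithmetic. The following is a sketch only: `Parts`, `extr_parts` and `ops_agree` are hypothetical helper names, the width is fixed at 64 bits, and the left-shift amount is restricted to 1..63 so that both C++ shifts are well defined. Because the two shift amounts sum to the register width, the shifted halves have no set bits in common, so no carries occur and all three operators produce the same value.

```cpp
#include <cassert>
#include <cstdint>

// The two shifted halves that an extr-style pattern combines.
struct Parts {
  std::uint64_t high;  // op1 << lshift
  std::uint64_t low;   // op2 >> (64 - lshift)
};

// Mirrors the pattern condition: the shift amounts sum to the mode size.
Parts extr_parts(std::uint64_t op1, std::uint64_t op2, unsigned lshift) {
  assert(lshift > 0 && lshift < 64);  // keep both shifts defined in C++
  return {op1 << lshift, op2 >> (64 - lshift)};
}

// True when the halves are disjoint and IOR, PLUS and XOR all agree.
bool ops_agree(std::uint64_t op1, std::uint64_t op2, unsigned lshift) {
  Parts p = extr_parts(op1, op2, lshift);
  std::uint64_t via_ior = p.high | p.low;
  std::uint64_t via_plus = p.high + p.low;
  std::uint64_t via_xor = p.high ^ p.low;
  return (p.high & p.low) == 0 && via_ior == via_plus && via_ior == via_xor;
}
```

This is why a code iterator over the three RTL codes is enough in the backend: whichever form the middle end picks, the value being matched is the same.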
Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103]
Jeff Law writes:
> On 2/11/25 9:08 AM, Richard Sandiford wrote:
>> Jeff Law writes:
>>> On 2/7/25 5:59 AM, Andrew Waterman wrote:
>>>> This patch runs counter to the ABI spec, which states that vxrm is
>>>> not preserved across calls and is volatile upon function entry [1].
>>>>
>>>> vxrm does not play the same role as frm plays in the calling
>>>> convention. (I won't get into the rationale in this email, but the
>>>> rationale isn't especially important: we should follow the ABI.)
>>>>
>>>> [1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/3a79e936eec5491078b1133ac943f91ef5fd75fd/riscv-cc.adoc?plain=1#L119-L120
>>> Pan's patch doesn't change the basic property that VXRM has no known
>>> state at function entry or upon return from a function call.
>>
>> I think it will. global_regs[X] means that X is defined on entry,
>> defined on exit, and can be changed by calls. If the register is
>> call-clobbered/volatile/caller-saved, then I agree with Andrew that
>> this doesn't look like the right fix.
> But the LCM code we use to manage vxrm assignments makes no assumption
> about incoming state and assumes no state is preserved across calls.

In that case, I wonder what the patch is fixing. Like you say, the initial mode seems to be VXRM_MODE_NONE, and it looks like riscv_vxrm_mode_after correctly models calls as clobbering the mode.

In the FRM case, the problem was that we had:

  entry:
    call initialize
    X := FRM
    ...
    FRM := X

Since FRM was not previously defined on entry, and since the call in any case was assumed to clobber FRM, the X := FRM seemed to be reading an uninitialised value, and so the FRM := X could be folded away.

But from your description, and from an admittedly cursory look at the code, it sounds like that couldn't happen for VXRM.

Richard
Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale
> PR117081 is about regression in povray. The reducted testcase:

Just for clarification: PR117081 is not about a regression in povray. It's related to

FAIL: gcc.target/i386/pr91384.c scan-assembler-not testl

pr91384.c was added by r12-7417, a peephole optimization that expects a specific instruction sequence; the regression can be fixed by adding a new peephole pattern.

H.J.'s patch actually regressed povray by introducing extra push/pops (it adds a preference for callee-saved registers, while in the benchmark using caller-saved registers is much better).

Sorry, I may not have been clear in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117081#c9

--
BR,
Hongtao
Re: [PATCH 6/8] LoongArch: Simplify {lsx,lasx_x}vpick description
On 2025/2/7 at 8:09 PM, Xi Ruoyao wrote:
/* snip */
> -
> -(define_insn "lasx_xvpickev_w"
> -  [(set (match_operand:V8SI 0 "register_operand" "=f")
> -        (vec_select:V8SI
> -          (vec_concat:V16SI
> -            (match_operand:V8SI 1 "register_operand" "f")
> -            (match_operand:V8SI 2 "register_operand" "f"))
> -          (parallel [(const_int 0) (const_int 2)
> -                     (const_int 8) (const_int 10)
> -                     (const_int 4) (const_int 6)
> -                     (const_int 12) (const_int 14)])))]
> -  "ISA_HAS_LASX"
> -  "xvpickev.w\t%u0,%u2,%u1"
> -  [(set_attr "type" "simd_permute")
> -   (set_attr "mode" "V8SI")])
> -
/* snip */
> +;; Picking even/odd elements.
> +(define_insn "simd_pick_evod_"
> +  [(set (match_operand:ALLVEC 0 "register_operand" "=f")
> +        (vec_select:ALLVEC
> +          (vec_concat:
> +            (match_operand:ALLVEC 1 "register_operand" "f")
> +            (match_operand:ALLVEC 2 "register_operand" "f"))
> +          (match_operand: 3 "vect_par_cnst_even_or_odd_half")))]

For LASX, the generated select array is problematic. Taking xvpickev.w as an example:

xvpickev.w vd,vj,vk

The behavior of the instruction is as follows:

vd.w[0] = vk.w[0]
vd.w[1] = vk.w[2]
vd.w[2] = vj.w[0]
vd.w[3] = vj.w[2]
vd.w[4] = vk.w[4]
vd.w[5] = vk.w[6]
vd.w[6] = vj.w[4]
vd.w[7] = vj.w[6]

So the select array should be {0, 2, 8, 10, 4, 6, 12, 14} instead of {0, 2, 4, 6, 8, 10, 12, 14}.

> +  "GET_MODE_SIZE (mode) != 8" ;; Use vilvl.d instead
> +  "vpick%O3.\t%0,%2,%1"
> +  [(set_attr "type" "simd_permute")
> +   (set_attr "mode" "")])
> +
> +(define_expand "_vpick_<_f>"
> +  [(match_operand:ALLVEC 0 "register_operand" "=f")
> +   (match_operand:ALLVEC 1 "register_operand" " f")
> +   (match_operand:ALLVEC 2 "register_operand" " f")
> +   (const_int zero_one)]
> +  "GET_MODE_SIZE (mode) != 8" ;; Use vilvl.d instead
> +{
> +  int nelts = GET_MODE_NUNITS (mode);
> +  rtx op3 = loongarch_gen_stepped_int_parallel (nelts, , 2);
> +  rtx insn = gen_simd_pick_evod_ (operands[0], operands[1],
> +                                  operands[2], op3);
> +  emit_insn (insn);
> +  DONE;
> +})
> +
>  ;; Integer widening add/sub/mult.
>  (define_insn "simd_w_evod__"
>    [(set (match_operand: 0 "register_operand" "=f")
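The mismatch described above can be reproduced with a small model. This is a sketch under assumptions, not LoongArch or GCC code: `V8`, `xvpickev_w` and `select_from_concat` are hypothetical names, `xvpickev_w` follows the element-by-element behaviour listed in the mail, and operand 1 is taken to be vk and operand 2 to be vj, as in the removed pattern's "%u0,%u2,%u1" template.

```cpp
#include <array>
#include <cassert>

using V8 = std::array<int, 8>;

// Model of "xvpickev.w vd, vj, vk" as listed in the mail: each 128-bit
// half (4 words) is processed separately.
V8 xvpickev_w(const V8& vj, const V8& vk) {
  V8 vd{};
  for (int half = 0; half < 2; ++half) {
    int base = half * 4;
    vd[base + 0] = vk[base + 0];
    vd[base + 1] = vk[base + 2];
    vd[base + 2] = vj[base + 0];
    vd[base + 3] = vj[base + 2];
  }
  return vd;
}

// Model of (vec_select (vec_concat op1 op2) (parallel sel)):
// indices 0..7 pick from op1, indices 8..15 pick from op2.
V8 select_from_concat(const V8& op1, const V8& op2, const V8& sel) {
  V8 out{};
  for (int i = 0; i < 8; ++i) {
    int idx = sel[i];
    assert(idx >= 0 && idx < 16);
    out[i] = idx < 8 ? op1[idx] : op2[idx - 8];
  }
  return out;
}
```

Feeding distinct values through both functions shows that only the per-half array {0, 2, 8, 10, 4, 6, 12, 14} matches the instruction; the plain stepped array {0, 2, 4, 6, 8, 10, 12, 14} describes a different permutation.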
Re: [PATCH] c++: Fix use-after-free of replaced friend instantiation [PR118807]
On 2/10/25 11:58 PM, Nathaniel Shead wrote:
> Bootstrapped and regtested on x86_64-pc-linux-gnu (and additionally
> passed modules.exp with a checking=all build), OK for trunk?
>
> -- >8 --
>
> When instantiating a friend function, we call register_specialization
> which adds it to the DECL_TEMPLATE_INSTANTIATIONS of the template.
> However, in some circumstances we might immediately call pushdecl and
> find an existing specialisation. In this case, when reregistering the
> specialisation we also need to update the DECL_TEMPLATE_INSTANTIATIONS
> list so that we don't try to access the freed spec again later.
>
> 	PR c++/118807
>
> gcc/cp/ChangeLog:
>
> 	* pt.cc (reregister_specialization): Remove spec from
> 	DECL_TEMPLATE_INSTANTIATIONS.
>
> gcc/testsuite/ChangeLog:
>
> 	* g++.dg/modules/pr118807.C: New test.
>
> Signed-off-by: Nathaniel Shead
> ---
>  gcc/cp/pt.cc| 11 +++
>  gcc/testsuite/g++.dg/modules/pr118807.C | 11 +++
>  2 files changed, 22 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/modules/pr118807.C
>
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 39232b5e67f..e1764743597 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -1985,6 +1985,17 @@ reregister_specialization (tree spec, tree tinfo, tree new_spec)
>        gcc_assert (entry->spec == spec || entry->spec == new_spec);
>        gcc_assert (new_spec != NULL_TREE);
>        entry->spec = new_spec;
> +
> +      /* We need to also remove the old specialisation from

Let's say "spec" instead of "old specialisation"; old and new are kind of confusing here, since in duplicate_decls new_spec is olddecl and spec is newdecl.

OK with that tweak.

> +         DECL_TEMPLATE_INSTANTIATIONS if it was placed there.  */
> +      for (tree *inst = &DECL_TEMPLATE_INSTANTIATIONS (elt.tmpl);
> +           *inst; inst = &TREE_CHAIN (*inst))
> +        if (TREE_VALUE (*inst) == spec)
> +          {
> +            *inst = TREE_CHAIN (*inst);
> +            break;
> +          }
> +
>        return 1;
>      }
> diff --git a/gcc/testsuite/g++.dg/modules/pr118807.C b/gcc/testsuite/g++.dg/modules/pr118807.C
> new file mode 100644
> index 000..a97afb92699
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/pr118807.C
> @@ -0,0 +1,11 @@
> +// PR c++/118807
> +// { dg-additional-options "-fmodules --param=ggc-min-expand=0 --param=ggc-min-heapsize=0 -Wno-global-module" }
> +
> +module;
> +template class basic_streambuf;
> +template struct basic_streambuf {
> +  friend void __istream_extract();
> +};
> +template class basic_streambuf;
> +template class basic_streambuf;
> +export module M;
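The loop in the patch uses the pointer-to-link idiom for unlinking from a singly linked list. A minimal standalone sketch of that idiom follows, assuming a toy `Node` type and a hypothetical `unlink_first` helper in place of TREE_LIST, TREE_CHAIN and TREE_VALUE. Iterating over the address of each link means the head pointer and interior links are handled by the same code path, with no special case for removing the first element.

```cpp
#include <cassert>

// Toy singly linked list cell standing in for a TREE_LIST node (hypothetical).
struct Node {
  int value;
  Node* next;
};

// Unlink the first node whose value equals `target`.
// Returns the removed node, or nullptr if no node matched.
Node* unlink_first(Node** head, int target) {
  assert(head != nullptr);
  // `link` points at whichever pointer currently leads to the node under
  // inspection: the head pointer first, then each node's `next` field.
  for (Node** link = head; *link; link = &(*link)->next)
    if ((*link)->value == target) {
      Node* removed = *link;
      *link = removed->next;  // bypass the node
      removed->next = nullptr;
      return removed;
    }
  return nullptr;
}
```

As in the patch, the walk stops at the first match, which is sufficient when each entry appears in the list at most once.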
Re: [PATCH 4/8] LoongArch: Simplify {lsx_,lasx_x}hv{add,sub}w description
On Tue, 2025-02-11 at 15:48 +0800, Lulu Cheng wrote:
> Hi,
>
> I think the "{lsx_,lasx_x}hv{add,sub}w" in the title should be
> "{lsx_,lasx_x}vh{add,sub}w".

Indeed.

> On 2025/2/7 at 8:09 PM, Xi Ruoyao wrote:
> > Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
> > special predicates and TImode RTL instead of hard-coded const
> > vectors and UNSPECs.
> /* snip */
> > +(define_insn "simd_hw__"
> > +  [(set (match_operand: 0 "register_operand" "=f")
> > +        (addsub:
> > +          (vec_select:
> > +            (any_extend:
>
> Does the order of any_extend affect the code generation?

I'm not sure, but I think it makes sense to keep the select/extend order consistent for LoongArch, so I'll move any_extend outside of vec_select in the next version of the series. I just didn't notice the order difference when I wrote this.

--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Re: [PATCH 5/8] LoongArch: Simplify {lsx_,lasx_x}maddw description
On Tue, 2025-02-11 at 15:49 +0800, Lulu Cheng wrote: > It seems that the title here is "{lsx_,lasx_x}vmaddw". Will fix in v2. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale
On Tue, Feb 11, 2025 at 4:27 PM H.J. Lu wrote:
>
> On Tue, Feb 11, 2025 at 4:13 PM Hongtao Liu wrote:
> >
> > > PR117081 is about regression in povray. The reducted testcase:
> > Just for clarification. PR117081 is not about regression in povray.
> > it's related to FAIL: gcc.target/i386/pr91384.c scan-assembler-not
> > testl
> > The pr91384.c is added by r12-7417 which is peephole optimization
> > expecting some specific instruction sequence, the regression can be
> > fixed by adding new peephole pattern.
> >
> > H.J patch actually regressed povray by introducing extra push/pops
> > (since it adds preference for callee save registers, in the benchmark
> > using caller saved registers is much better).
> > Sorry, I may not have been clear in
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117081#c9
> >
>
> My patch doesn't change the codegen for that code as shown by

Real benchmark scenarios are a little more complex; the testcase in the PR is just one of the scenarios, not all of them.
We are currently investigating this case and hope to find a better solution.

>
> commit 846837c2406ae7a52d9123b29c13e4b8b9d14224
> Author: H.J. Lu
> Date: Fri Feb 7 13:49:30 2025 +0800
>
> x86: Verify that PUSH/POP can be skipped
>
>
> --
> H.J.

--
BR,
Hongtao
Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale
On Tue, Feb 11, 2025 at 4:38 PM Hongtao Liu wrote: > > On Tue, Feb 11, 2025 at 4:27 PM H.J. Lu wrote: > > > > On Tue, Feb 11, 2025 at 4:13 PM Hongtao Liu wrote: > > > > > > > PR117081 is about regression in povray. The reducted testcase: > > > Just for clarification. PR117081 is not about regression in povray. > > > it's related to FAIL: gcc.target/i386/pr91384.c scan-assembler-not > > > testl > > > The pr91384.c is added by r12-7417 which is peephole optimization > > > expecting some specific instruction sequence, the regression can be > > > fixed by adding new peephole pattern. > > > > > > H.J patch actually regressed povray by introducing extra push/pops > > > (since it adds preference for callee save registers, in the benchmark > > > using caller saved registers is much better). > > > Sorry, I may not have been clear in > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117081#c9 > > > > > > > My patch doesn't change the codegen for that code as shown by > Real benchmark scenarios are a little more complex, the testcase in > the PR is just one of the scenes, but not all. > We are currently investigating this case and hope to find a better solution. We need testcases to make sure that there are no regressions. > > > > commit 846837c2406ae7a52d9123b29c13e4b8b9d14224 > > Author: H.J. Lu > > Date: Fri Feb 7 13:49:30 2025 +0800 > > > > x86: Verify that PUSH/POP can be skipped > > > > > > -- > > H.J. > > > > -- > BR, > Hongtao -- H.J.
[PATCH 2/2] libcpp: Fix incorrect line information for traditional cpp and #include [PR100904]
After r7-1651-gac81cf0b2bf5efdd7, the error for #include is reported at
the location of the token.  In traditional cpp, however, that location is
wrong: libcpp first processes the directive line in traditional mode,
copies it to a new buffer, and only then lexes it with the ISO lexer, so
the token locations refer to the temporary buffer rather than to the
source.  This patch uses the location of the directive line instead when
parsing the include in traditional cpp mode.

Bootstrapped and tested on x86_64-linux-gnu.

    PR preprocessor/100904

libcpp/ChangeLog:

    * directives.cc (parse_include): Use the directive line location
    for the location in traditional cpp mode instead of the location
    of the token.

gcc/testsuite/ChangeLog:

    * gcc.dg/cpp/missing-header-trad-1.c: New test.

Signed-off-by: Andrew Pinski
---
 gcc/testsuite/gcc.dg/cpp/missing-header-trad-1.c | 10 ++
 libcpp/directives.cc                             |  9 -
 2 files changed, 18 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/cpp/missing-header-trad-1.c

diff --git a/gcc/testsuite/gcc.dg/cpp/missing-header-trad-1.c b/gcc/testsuite/gcc.dg/cpp/missing-header-trad-1.c
new file mode 100644
index 000..d77cc5fe228
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/cpp/missing-header-trad-1.c
@@ -0,0 +1,10 @@
+/* { dg-do preprocess } */
+/* { dg-options "-traditional-cpp" } */
+
+/* PR preprocessor/100904 */
+/* Make sure we error out on the correct line
+   for traditional cpp. */
+
+#include "nonexistent.h" /* { dg-error "-: nonexistent.h" } */
+
+/* { dg-message "terminated" "terminated" { target *-*-* } 0 } */
diff --git a/libcpp/directives.cc b/libcpp/directives.cc
index 9c0f77ab017..d4a5ab1cbec 100644
--- a/libcpp/directives.cc
+++ b/libcpp/directives.cc
@@ -841,7 +841,14 @@ parse_include (cpp_reader *pfile, int *pangle_brackets,
   /* Allow macro expansion. */
   header = get_token_no_padding (pfile);
-  *location = header->src_loc;
+
+  /* The location for traditional is the directive line as the
+     token line information for the temporary buffer. */
+  if (CPP_OPTION (pfile, traditional))
+    *location = pfile->directive_line;
+  else
+    *location = header->src_loc;
+
   if ((header->type == CPP_STRING && header->val.str.text[0] != 'R')
       || header->type == CPP_HEADER_NAME)
     {
--
2.43.0
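The location choice made by the patch above can be illustrated with a small standalone model. This is only a sketch: `toy_reader`, `toy_token`, and `include_error_location` are hypothetical names invented for illustration and are not libcpp types or functions; only the idea (prefer the directive line in traditional mode) comes from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for libcpp state; not the real data structures.  */
typedef unsigned int toy_loc;

struct toy_token
{
  toy_loc src_loc;        /* location recorded by the lexer for this token */
};

struct toy_reader
{
  bool traditional;       /* models CPP_OPTION (pfile, traditional) */
  toy_loc directive_line; /* location of the directive line itself */
};

/* Pick the location to report for an #include diagnostic.  In traditional
   mode the token was lexed from a temporary copy of the line, so its own
   location is unreliable and the directive line is used instead.  */
toy_loc
include_error_location (const struct toy_reader *r,
                        const struct toy_token *header)
{
  assert (r != NULL && header != NULL);
  return r->traditional ? r->directive_line : header->src_loc;
}
```

With a token whose recorded location points into the temporary buffer, the traditional reader reports the directive line, while the ISO reader keeps reporting the token location.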
Re: [PATCH v4] c++: Reject cdtors and conversion operators with a single * as return type [PR118306]
On 2/10/25 12:09 PM, Simon Martin wrote: Hi Jason, On 7 Feb 2025, at 23:10, Jason Merrill wrote: On 2/7/25 4:04 PM, Simon Martin wrote: Hi Jason, On 7 Feb 2025, at 14:21, Jason Merrill wrote: On 2/6/25 3:05 PM, Simon Martin wrote: Hi Jason, On 6 Feb 2025, at 16:48, Jason Merrill wrote: On 2/5/25 2:21 PM, Simon Martin wrote: Hi Jason, On 4 Feb 2025, at 21:23, Jason Merrill wrote: On 2/4/25 3:03 PM, Jason Merrill wrote: On 2/4/25 11:45 AM, Simon Martin wrote: On 4 Feb 2025, at 17:17, Jason Merrill wrote: On 2/4/25 10:56 AM, Simon Martin wrote: Hi Jason, On 4 Feb 2025, at 16:39, Jason Merrill wrote: On 1/15/25 9:56 AM, Jason Merrill wrote: On 1/15/25 7:24 AM, Simon Martin wrote: Hi, On 14 Jan 2025, at 23:31, Jason Merrill wrote: On 1/14/25 2:13 PM, Simon Martin wrote: On 10 Jan 2025, at 19:10, Andrew Pinski wrote: On Fri, Jan 10, 2025 at 3:18 AM Simon Martin wrote: We currently accept the following invalid code (EDG and MSVC do as well) clang does too: https://github.com/llvm/llvm-project/issues/121706 . Note it might be useful if a testcase with multiply `*` is included too: ``` struct A { A (); }; ``` Thanks, makes sense to add those. Done in the attached updated revision, successfully tested on x86_64-pc-linux-gnu. +/* Check that it's OK to declare a function at ID_LOC with the indicated TYPE, + TYPE_QUALS and DECLARATOR. SFK indicates the kind of special function (if + any) that this function is. OPTYPE is the type given in a conversion operator declaration, or the class type for a constructor/destructor. Returns the actual return type of the function; that may be different than TYPE if an error occurs, or for certain special functions. */ @@ -12361,8 +12362,19 @@ check_special_function_return_type (special_function_kind sfk, tree type, tree optype, int type_quals, + const cp_declarator *declarator, + location_t id_loc, id_loc should be the same as declarator->id_loc? You’re right. 
const location_t* locations) { + /* If TYPE is unspecified, DECLARATOR, if set, should not represent a pointer + or a reference type. */ + if (type == NULL_TREE + && declarator + && (declarator->kind == cdk_pointer + || declarator->kind == cdk_reference)) + error_at (id_loc, "expected unqualified-id before %qs token", + declarator->kind == cdk_pointer ? "*" : "&"); ...and id_loc isn't the location of the ptr-operator, it's the location of the identifier, so this indicates the wrong column. I think using declarator->id_loc makes sense, just not pretending it's the location of the *. Good catch, thanks. Let's give diagnostics more like the others later in the function instead of trying to emulate cp_parser_error. Makes sense. This is what the updated patch does, successfully tested on x86_64-pc-linux-gnu. OK for GCC 16? OK. Does this also fix 118304? If so, let's go ahead and apply it to GCC 15. I have checked just now, and we still ICE for 118304’s testcase with that fix. Why doesn't the preeexisting type = void_type_node; in check_special_function_return_type fix the return type and avoid the ICE? We hit the gcc_assert at method.cc:3593, that Marek’s fix bypasses. Yes, but why doesn't check_special_function_return_type prevent that? Ah, because we call it before walking the declarator. We need to check again later, perhaps in grokfndecl, that the type is correct. Perhaps instead of your patch. One “issue” with adding another check in or close to grokfndecl is that DECLARATOR will have “been moved to the ID”, and the fact that we had a CDK_POINTER kind is “lost”. We could obviously somehow propagate this information, but there might be something easier. The information isn't lost: it's now reflected in the (wrong) return type. One place it would make sense to check would be if (ctype && (sfk == sfk_constructor || sfk == sfk_destructor)) { /* We are within a class's scope. 
If our declarator name is the same as the class name, and we are defining a function, then it is a constructor/destructor, and therefore returns a void type. */ Here 'type' is still the return type, we haven't gotten to build_function_type yet. That’s true. However, doesn’t it make sense to cram all the checks about the return type of special functions in check_special_function_return_type, and return an error if that return type is invalid? This error seems easily recoverable since we know what the type needs to be, there's no need for error return from
Re: [PATCH 4/8] LoongArch: Simplify {lsx_,lasx_x}hv{add,sub}w description
On 2025/2/11 4:37 PM, Xi Ruoyao wrote:
On Tue, 2025-02-11 at 15:48 +0800, Lulu Cheng wrote:
Hi, I think the "{lsx_,lasx_x}hv{add,sub}w" in the title should be "{lsx_,lasx_x}vh{add,sub}w".

Indeed.

On 2025/2/7 8:09 PM, Xi Ruoyao wrote:
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use special predicates and TImode RTL instead of hard-coded const vectors and UNSPECs.

/* snip */

+(define_insn "simd_hw__"
+ [(set (match_operand: 0 "register_operand" "=f")
+ (addsub:
+ (vec_select:
+ (any_extend:

Does the order of any_extend affect the code generation?

I'm not sure but I think it makes sense to keep the select/extend order consistent for LoongArch, thus I'll make any_extend out of vec_select in the next version of the series. I just didn't really notice the order difference when I wrote this.

In the later stages, we will continue to test whether the order of select/extend affects performance.
Re: [PATCH] x86: Correct ASM_OUTPUT_SYMBOL_REF
On Tue, Feb 11, 2025 at 3:12 PM Uros Bizjak wrote:
>
> On Tue, Feb 11, 2025 at 7:13 AM H.J. Lu wrote:
> >
> > x is not a macro argument. It just happens to work as final.cc passes
> > x for 2nd argument:
> >
> > final.cc: ASM_OUTPUT_SYMBOL_REF (file, x);
> >
> > PR target/118825
> > * config/i386/i386.h (ASM_OUTPUT_SYMBOL_REF): Replace x with
> > SYM.
>
> > - = assemble_name_resolve (XSTR (x, 0)); \
> > + = assemble_name_resolve (XSTR ((SYM), 0)); \
>
> No need for parenthesis when macro argument is used in a function call.
>
> OK with the above change.

Fixed. Pushed. Will backport it to release branches later.

Thanks.

--
H.J.
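The class of bug discussed in this thread (a macro body that names a variable from the caller's scope instead of its own parameter) can be reproduced in a few lines. The sketch below is not the i386.h macro; `FIRST_CHAR_BUGGY`, `FIRST_CHAR_FIXED`, and the wrapper functions are made-up names used only to show why such a macro "just happens to work" when the caller's variable is called `x`.

```c
#include <assert.h>

/* Buggy shape: the body refers to `x`, which is not a parameter.  It only
   compiles where the caller happens to have a variable named `x`.  */
#define FIRST_CHAR_BUGGY(SYM) (x[0])

/* Fixed shape: the body uses the parameter itself.  */
#define FIRST_CHAR_FIXED(SYM) ((SYM)[0])

char
first_char_buggy (const char *x, const char *other)
{
  (void) other;  /* the macro drops its argument, so `other` is unused */
  return FIRST_CHAR_BUGGY (other);  /* silently reads `x` instead */
}

char
first_char_fixed (const char *x, const char *other)
{
  (void) x;
  return FIRST_CHAR_FIXED (other);
}

/* Self-check: the buggy macro ignores its argument, the fixed one does not.  */
int
macro_demo (void)
{
  assert (first_char_buggy ("abc", "xyz") == 'a');
  assert (first_char_fixed ("abc", "xyz") == 'x');
  return 1;
}
```

The fixed macro keeps the parentheses around `SYM` because the parameter is subscripted here; as noted in the review, they are unnecessary when the parameter is passed straight through as a function-call argument.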
Re: GCN, nvptx: 'sorry, unimplemented: exception handling not supported'
Hi!

On 2025-02-08T13:17:55+0100, I wrote:
> pushed to trunk branch commit 6312165650091a4df34668d8e2aaa0bbc4008a66
> "GCN, nvptx: 'sorry, unimplemented: exception handling not supported'"

> For GCN, this avoids ICEs further down the compilation pipeline.

For the record, in case that's helpful later on, here's a note from
~2023-04:

| Before [...], we got a lot of ICEs in 'g++' testing for '-fexceptions' etc.
| For example, 'g++.dg/pr49847.C':
|
|     $ build-gcc/gcc/xg++ -Bbuild-gcc/gcc/ source-gcc/gcc/testsuite/g++.dg/pr49847.C -std=gnu++98 -O -fnon-call-exceptions -Wno-return-type -S -o pr49847.s
|     during RTL pass: jump
|     source-gcc/gcc/testsuite/g++.dg/pr49847.C: In function ‘int f(float)’:
|     source-gcc/gcc/testsuite/g++.dg/pr49847.C:7:1: internal compiler error: Segmentation fault
|         7 | }
|           | ^
|     0x1216aaf crash_signal
|             [...]/gcc/toplev.cc:314
|     0x1ef5d23 count_reg_usage
|             [...]/gcc/cse.cc:6757
|     0x1ef5f0a count_reg_usage
|             [...]/gcc/cse.cc:6797
|     0x1efba4c delete_trivially_dead_insns(rtx_insn*, int)
|             [...]/gcc/cse.cc:7028
|     0x1ea4e36 execute
|             [...]/gcc/cfgcleanup.cc:3237
|
| ..., and likewise for a lot of other 'g++' test cases, but also
| 'gcc.dg/pr104464.c', 'gcc.dg/uninit-pr106881.c', 'gcc.dg/torture/pr105484.c'.
|
| The SIGSEGV is due to 'REGNO ("reg:DI -1") == INVALID_REGNUM', and
| 'INVALID_REGNUM == (~(unsigned int) 0)', which doesn't work in
| 'count_reg_usage':
|
|     counts[REGNO (x)] += incr;
|
| Trying to work around this locally is not sufficient; further ICEs down the
| line.
|
| The 'INVALID_REGNUM' is due to 'gcc/defaults.h:EH_RETURN_DATA_REGNO'.

Regards
 Thomas
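To make the failure mode in the note concrete: an all-ones unsigned register number used as an array index lands far outside any realistic table. The sketch below is a toy model, not GCC code; `TOY_NUM_REGS`, `TOY_INVALID_REGNUM`, and `toy_count_reg_usage` are hypothetical names, and the bounds check is shown only to illustrate the out-of-range index. As the note says, guarding this one spot was not a sufficient fix in the compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NUM_REGS 8
/* Same shape as the value quoted in the note: all bits set.  */
#define TOY_INVALID_REGNUM (~(unsigned int) 0)

/* Bump the use count for REGNO, refusing indices outside the table.
   An unguarded `counts[regno] += incr` with TOY_INVALID_REGNUM would
   index far past the end of the array.  */
bool
toy_count_reg_usage (int *counts, unsigned int regno, int incr)
{
  assert (counts != NULL);
  if (regno >= TOY_NUM_REGS)
    return false;
  counts[regno] += incr;
  return true;
}
```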
Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale
On Tue, Feb 11, 2025 at 4:13 PM Hongtao Liu wrote:
>
> > PR117081 is about regression in povray. The reduced testcase:
> Just for clarification. PR117081 is not about regression in povray.
> It's related to FAIL: gcc.target/i386/pr91384.c scan-assembler-not
> testl
> The pr91384.c is added by r12-7417 which is peephole optimization
> expecting some specific instruction sequence, the regression can be
> fixed by adding new peephole pattern.
>
> H.J patch actually regressed povray by introducing extra push/pops
> (since it adds preference for callee save registers, in the benchmark
> using caller saved registers is much better).
> Sorry, I may not have been clear in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117081#c9
>

My patch doesn't change the codegen for that code as shown by

commit 846837c2406ae7a52d9123b29c13e4b8b9d14224
Author: H.J. Lu
Date:   Fri Feb 7 13:49:30 2025 +0800

    x86: Verify that PUSH/POP can be skipped

--
H.J.
[PATCH 1/2] libcpp: Fix handling of `deferred` pragmas with -traditional [PR79516]
With deferred pragmas, libcpp injects a PRAGMA_EOL token into the token
stream before the end of the line, but the traditional cpp path does not
go through that code except when handling directives.  Here we call out to
handle a `#if` directive, and the PRAGMA_EOL token gets added because of
the change of line number.  So at the end of a directive we need to set
in_deferred_pragma to false, since the traditional cpp path handles the
newline itself.

Bootstrapped and tested on x86_64-linux.

    PR preprocessor/79516

libcpp/ChangeLog:

    * directives.cc (end_directive): Also set in_deferred_pragma
    to false with traditional cpp.

gcc/testsuite/ChangeLog:

    * c-c++-common/cpp/pragma-message-trad.c: New test.

Signed-off-by: Andrew Pinski
---
 gcc/testsuite/c-c++-common/cpp/pragma-message-trad.c | 9 +
 libcpp/directives.cc                                 | 2 ++
 2 files changed, 11 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/cpp/pragma-message-trad.c

diff --git a/gcc/testsuite/c-c++-common/cpp/pragma-message-trad.c b/gcc/testsuite/c-c++-common/cpp/pragma-message-trad.c
new file mode 100644
index 000..0478e6fc7c7
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cpp/pragma-message-trad.c
@@ -0,0 +1,9 @@
+/* { dg-do preprocess } */
+/* { dg-options "-traditional-cpp" } */
+/* PR preprocessor/79516 */
+
+#pragma message "OK"
+
+#if 0
+#pragma message ("Not printed")
+#endif
diff --git a/libcpp/directives.cc b/libcpp/directives.cc
index 6b0d691f491..9c0f77ab017 100644
--- a/libcpp/directives.cc
+++ b/libcpp/directives.cc
@@ -323,6 +323,8 @@ end_directive (cpp_reader *pfile, int skip_line)
       /* Revert change of prepare_directive_trad. */
       if (!pfile->state.in_deferred_pragma)
         pfile->state.prevent_expansion--;
+      /* No longer inside a deferred pragma. */
+      pfile->state.in_deferred_pragma = false;

       if (pfile->directive != &dtable[T_DEFINE])
         _cpp_remove_overlay (pfile);
--
2.43.0
[PATCH] s390: Fix s390_valid_shift_count() for TI mode [PR118835]
During combine we may end up with (set (reg:DI 66 [ _6 ]) (ashift:DI (reg:DI 72 [ x ]) (subreg:QI (and:TI (reg:TI 67 [ _1 ]) (const_wide_int 0x0aabf)) 15))) where the shift count operand does not trivially fit the scheme of address operands. Reject those operands, especially since strip_address_mutations() expects expressions of the form (and ... (const_int ...)) and fails for (and ... (const_wide_int ...)). While on it, fix indentation of the if block. gcc/ChangeLog: PR target/118835 * config/s390/s390.cc (s390_valid_shift_count): Reject shift count operands which do not trivially fit the scheme of address operands. gcc/testsuite/ChangeLog: * gcc.target/s390/pr118835.c: New test. --- Bootstrap and regtest are still running. Assuming they finish without regressions and there are no further comments, I will push this. gcc/config/s390/s390.cc | 37 ++-- gcc/testsuite/gcc.target/s390/pr118835.c | 21 ++ 2 files changed, 43 insertions(+), 15 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/pr118835.c diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc index 1d96df49fea..c2636c54613 100644 --- a/gcc/config/s390/s390.cc +++ b/gcc/config/s390/s390.cc @@ -3510,26 +3510,33 @@ s390_valid_shift_count (rtx op, HOST_WIDE_INT implicit_mask) /* Check for an and with proper constant. 
*/ if (GET_CODE (op) == AND) - { -rtx op1 = XEXP (op, 0); -rtx imm = XEXP (op, 1); +{ + rtx op1 = XEXP (op, 0); + rtx imm = XEXP (op, 1); -if (GET_CODE (op1) == SUBREG && subreg_lowpart_p (op1)) - op1 = XEXP (op1, 0); + if (GET_CODE (op1) == SUBREG && subreg_lowpart_p (op1)) + op1 = XEXP (op1, 0); -if (!(register_operand (op1, GET_MODE (op1)) || GET_CODE (op1) == PLUS)) - return false; + if (!(register_operand (op1, GET_MODE (op1)) || GET_CODE (op1) == PLUS)) + return false; -if (!immediate_operand (imm, GET_MODE (imm))) - return false; + if (!immediate_operand (imm, GET_MODE (imm))) + return false; -HOST_WIDE_INT val = INTVAL (imm); -if (implicit_mask > 0 - && (val & implicit_mask) != implicit_mask) - return false; + /* Reject shift count operands which do not trivially fit the scheme of +address operands. Especially since strip_address_mutations() expects +expressions of the form (and ... (const_int ...)) and fails for +(and ... (const_wide_int ...)). */ + if (CONST_WIDE_INT_P (imm)) + return false; -op = op1; - } + HOST_WIDE_INT val = INTVAL (imm); + if (implicit_mask > 0 + && (val & implicit_mask) != implicit_mask) + return false; + + op = op1; +} /* Check the rest. */ return s390_decompose_addrstyle_without_index (op, NULL, NULL); diff --git a/gcc/testsuite/gcc.target/s390/pr118835.c b/gcc/testsuite/gcc.target/s390/pr118835.c new file mode 100644 index 000..1ca6cd95543 --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/pr118835.c @@ -0,0 +1,21 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2" } */ + +/* During combine we may end up with patterns of the form + + (set (reg:DI 66 [ _6 ]) +(ashift:DI (reg:DI 72 [ x ]) + (subreg:QI (and:TI (reg:TI 67 [ _1 ]) + (const_wide_int 0x0aabf)) + 15))) + + which should be rejected since the shift count does not trivially fit the + scheme of address operands. */ + +long +test (long x, int y) +{ + __int128 z = 0xAABF; + z &= y; + return x << z; +} -- 2.47.0
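The ordering that the s390 patch above relies on (check what kind of constant the mask is before reading it as a host-wide integer) can be shown with a toy tagged value. This is a sketch under stated assumptions: `toy_const`, `toy_const_kind`, and `toy_valid_shift_mask` are invented for illustration and do not model RTL; only the early rejection of wide constants and the implicit-mask test mirror the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant kinds, loosely modelled on const_int versus
   const_wide_int.  */
enum toy_const_kind { TOY_CONST_INT, TOY_CONST_WIDE_INT };

struct toy_const
{
  enum toy_const_kind kind;
  int64_t value;   /* meaningful only when kind == TOY_CONST_INT */
};

/* Return true if IMM is a narrow constant whose bits cover IMPLICIT_MASK
   (when IMPLICIT_MASK is positive).  Wide constants are rejected before
   their value is read, mirroring the early return added by the patch.  */
bool
toy_valid_shift_mask (const struct toy_const *imm, int64_t implicit_mask)
{
  assert (imm != NULL);
  if (imm->kind == TOY_CONST_WIDE_INT)
    return false;
  if (implicit_mask > 0 && (imm->value & implicit_mask) != implicit_mask)
    return false;
  return true;
}
```

A narrow mask of 63 passes for an implicit mask of 63, a narrow mask of 15 does not, and any wide constant is refused regardless of its contents.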
Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103]
On 2/11/25 9:08 AM, Richard Sandiford wrote: Jeff Law writes: On 2/7/25 5:59 AM, Andrew Waterman wrote: This patch runs counter to the ABI spec, which states that vxrm is not preserved across calls and is volatile upon function entry [1]. vxrm does not play the same role as frm plays in the calling convention. (I won't get into the rationale in this email, but the rationale isn't especially important: we should follow the ABI.) [1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/3a79e936eec5491078b1133ac943f91ef5fd75fd/riscv-cc.adoc?plain=1#L119-L120 Pan's patch doesn't change the basic property that VXRM has no known state at function entry or upon return from a function call. I think it will. global_regs[X] means that X is defined on entry, defined on exit, and can be changed by calls. If the register is call-clobbered/volatile/caller-saved, then I agree with Andrew that this doesn't look like the right fix. But the LCM code we use to manage vxrm assignments makes no assumption about incoming state and assumes no state is preserved across calls. Essentially jeff
[PATCH] c++: Fix up regressions caused by for/while loops with declarations [PR118822]
Hi!

The recent PR86769 r15-7426 changes regressed the following two testcases;
the first one is more important as it is derived from real-world code.

The first problem is that the chosen
  prep = do_pushlevel (sk_block);
  // emit something
  body = push_stmt_list ();
  // emit further stuff
  body = pop_stmt_list (body);
  prep = do_poplevel (prep);
way of constructing the {FOR,WHILE}_COND_PREP and {FOR,WHILE}_BODY
isn't reliable.  If during parsing a label is seen in the body and then
some decl with destructors, a transparent sk_cleanup scope is added, but
the corresponding result from push_stmt_list is saved in
*current_binding_level, and pop_stmt_list then pops even that statement
list, but only do_poplevel actually attempts to pop the sk_cleanup scope,
and so we ICE.
The reason for not doing do_pushlevel (sk_block); do_pushlevel (sk_block);
is that variables should be in the same scope (otherwise various e.g.
redeclaration*.C tests FAIL), and doing do_pushlevel (sk_block);
do_pushlevel (sk_cleanup); wouldn't work either, as do_poplevel would
silently unwind even the cleanup one.

The second problem is that my assumption that the declaration in the
condition will have zero or one cleanup is just wrong; at least for
structured bindings used as condition, there can be as many cleanups as
there are names in the binding + 1.

So, the following patch changes the earlier approach.
Nothing is removed from the {FOR,WHILE}_COND_PREP subtrees while doing adjust_loop_decl_cond, push_stmt_list isn't called either; all it does is remember as an integer the number of cleanups (CLEANUP_STMT at the end of the STATEMENT_LISTs) from querying stmt_list_stack and finding the initial *body_p in there (that integer is stored into {FOR,WHILE}_COND_CLEANUP), and temporarily {FOR,WHILE}_BODY is set to the last statement (if any) in the innermost STATEMENT_LIST at the adjust_loop_decl_cond time; then at finish_{for,while}_stmt a new finish_loop_cond_prep routine takes care of do_poplevel for the scope (which is in {FOR,WHILE}_COND_PREP) and finds given {FOR,WHILE}_COND_CLEANUP number and {FOR,WHILE}_BODY tree the right spot where body statements start and moves that into {FOR,WHILE}_BODY. Finally genericize_c_loop then inserts the cond, body, continue label, expr into the right subtree of {FOR,WHILE}_COND_PREP. The constexpr evaluation unfortunately had to be changed as well, because we don't want to evaluate everything in BIND_EXPR_BODY (*_COND_PREP ()) right away, we want to evaluate it with the exception of the CLEANUP_STMT cleanups at the end (given {FOR,WHILE}_COND_CLEANUP levels), and defer the evaluation of the cleanups until after cond, body, expr are evaluated. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2025-02-11 Jakub Jelinek PR c++/118822 PR c++/118833 gcc/c-family/ * c-common.h (WHILE_COND_CLEANUP): Change description in comment. (FOR_COND_CLEANUP): Likewise. * c-gimplify.cc (genericize_c_loop): Adjust for COND_CLEANUP being CLEANUP_STMT/TRY_FINALLY_EXPR trailing nesting depth instead of actual cleanup. gcc/cp/ * semantics.cc (adjust_loop_decl_cond): Allow multiple trailing CLEANUP_STMT levels in *BODY_P. Set *CLEANUP_P to the number of levels rather than one particular cleanup, keep the cleanups in *PREP_P. Set *BODY_P to the last stmt in the cur_stmt_list or NULL if *CLEANUP_P and the innermost cur_stmt_list is empty. 
(finish_loop_cond_prep): New function. (finish_while_stmt, finish_for_stmt): Use it. Don't call set_one_cleanup_loc. * constexpr.cc (cxx_eval_loop_expr): Adjust handling of {FOR,WHILE}_COND_{PREP,CLEANUP}. gcc/testsuite/ * g++.dg/expr/for9.C: New test. * g++.dg/cpp26/decomp12.C: New test. --- gcc/c-family/c-common.h.jj 2025-02-07 17:06:50.777235245 +0100 +++ gcc/c-family/c-common.h 2025-02-11 12:12:13.034861256 +0100 @@ -1518,7 +1518,8 @@ extern tree build_userdef_literal (tree /* WHILE_STMT accessors. These give access to the condition of the while statement, the body, and name of the while statement, and - condition preparation statements and its cleanup, respectively. */ + condition preparation statements and number of its nested cleanups, + respectively. */ #define WHILE_COND(NODE) TREE_OPERAND (WHILE_STMT_CHECK (NODE), 0) #define WHILE_BODY(NODE) TREE_OPERAND (WHILE_STMT_CHECK (NODE), 1) #define WHILE_NAME(NODE) TREE_OPERAND (WHILE_STMT_CHECK (NODE), 2) @@ -1533,7 +1534,8 @@ extern tree build_userdef_literal (tree /* FOR_STMT accessors. These give access to the init statement, condition, update expression, body and name of the for statement, - and condition preparation statements and its cleanup, respectively. */ + and condition preparation statements and number of its nested cleanups, + respectively. */ #define FOR_INIT_STMT(NODE)TREE_OPERAND (FOR_STMT_CHECK (NO
[PATCH] c++: Apply/diagnose attributes when instantiating ARRAY/POINTER/REFERENCE_TYPE [PR118787]
Hi! The following testcase IMO in violation of the P2552R3 paper doesn't pedwarn on alignas applying to dependent types or alignas with dependent argument. tsubst was just ignoring TYPE_ATTRIBUTES. The following patch fixes it for the POINTER/REFERENCE_TYPE and ARRAY_TYPE cases, but perhaps we need to do the same also for other types (INTEGER_TYPE/REAL_TYPE and the like). I guess I'll need to construct more testcases. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2025-02-11 Jakub Jelinek PR c++/118787 * pt.cc (tsubst) : Use return t; only if it doesn't have any TYPE_ATTRIBUTES. Call apply_late_template_attributes. : Likewise. Formatting fix. * g++.dg/cpp0x/alignas22.C: New test. --- gcc/cp/pt.cc.jj 2025-02-07 17:03:13.560227281 +0100 +++ gcc/cp/pt.cc2025-02-10 17:17:47.65131 +0100 @@ -16854,7 +16854,9 @@ tsubst (tree t, tree args, tsubst_flags_ case POINTER_TYPE: case REFERENCE_TYPE: { - if (type == TREE_TYPE (t) && TREE_CODE (type) != METHOD_TYPE) + if (type == TREE_TYPE (t) + && TREE_CODE (type) != METHOD_TYPE + && TYPE_ATTRIBUTES (t) == NULL_TREE) return t; /* [temp.deduct] @@ -16924,9 +16926,9 @@ tsubst (tree t, tree args, tsubst_flags_ A,' while an attempt to create the type type rvalue reference to cv T' creates the type T" */ - r = cp_build_reference_type - (TREE_TYPE (type), - TYPE_REF_IS_RVALUE (t) && TYPE_REF_IS_RVALUE (type)); + r = cp_build_reference_type (TREE_TYPE (type), + TYPE_REF_IS_RVALUE (t) + && TYPE_REF_IS_RVALUE (type)); else r = cp_build_reference_type (type, TYPE_REF_IS_RVALUE (t)); r = cp_build_qualified_type (r, cp_type_quals (t), complain); @@ -16935,6 +16937,11 @@ tsubst (tree t, tree args, tsubst_flags_ /* Will this ever be needed for TYPE_..._TO values? 
*/ layout_type (r); + if (!apply_late_template_attributes (&r, TYPE_ATTRIBUTES (t), +/*flags=*/0, +args, complain, in_decl)) + return error_mark_node; + return r; } case OFFSET_TYPE: @@ -17009,7 +17016,9 @@ tsubst (tree t, tree args, tsubst_flags_ /* As an optimization, we avoid regenerating the array type if it will obviously be the same as T. */ - if (type == TREE_TYPE (t) && domain == TYPE_DOMAIN (t)) + if (type == TREE_TYPE (t) + && domain == TYPE_DOMAIN (t) + && TYPE_ATTRIBUTES (t) == NULL_TREE) return t; /* These checks should match the ones in create_array_type_for_decl. @@ -17048,6 +17057,11 @@ tsubst (tree t, tree args, tsubst_flags_ TYPE_USER_ALIGN (r) = 1; } + if (!apply_late_template_attributes (&r, TYPE_ATTRIBUTES (t), +/*flags=*/0, +args, complain, in_decl)) + return error_mark_node; + return r; } --- gcc/testsuite/g++.dg/cpp0x/alignas22.C.jj 2025-02-10 17:33:16.242452750 +0100 +++ gcc/testsuite/g++.dg/cpp0x/alignas22.C 2025-02-10 17:36:28.739046629 +0100 @@ -0,0 +1,23 @@ +// PR c++/118787 +// { dg-do compile { target c++11 } } +// { dg-options "-pedantic" } + +template +void foo (T & alignas (N));// { dg-warning "'alignas' on a type other than class" } +template +void bar (T (&)[N] alignas (N)); // { dg-warning "'alignas' on a type other than class" } +template +using U = T * alignas (N); // { dg-warning "'alignas' on a type other than class" } +template +using V = T[N] alignas (N);// { dg-warning "'alignas' on a type other than class" } + +void +baz () +{ + int x alignas (4) = 0; + foo (x); + int y alignas (4) [4]; + bar (y); + U u; + V v; +} Jakub
[PATCH 3/3] LoongArch: After setting the compilation options, update the predefined macros.
target/PR118828 gcc/ChangeLog: * config/loongarch/loongarch-c.cc (loongarch_pragma_target_parse): Update the predefined macros. gcc/testsuite/ChangeLog: * gcc.target/loongarch/pr118828.c: New test. Change-Id: I13f7b44b11bba2080db797157a0389cc1bd65ac6 --- gcc/config/loongarch/loongarch-c.cc | 14 gcc/testsuite/gcc.target/loongarch/pr118828.c | 34 +++ 2 files changed, 48 insertions(+) create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828.c diff --git a/gcc/config/loongarch/loongarch-c.cc b/gcc/config/loongarch/loongarch-c.cc index 9fe911325ab..83df82c1361 100644 --- a/gcc/config/loongarch/loongarch-c.cc +++ b/gcc/config/loongarch/loongarch-c.cc @@ -27,6 +27,7 @@ along with GCC; see the file COPYING3. If not see #include "tm.h" #include "c-family/c-common.h" #include "cpplib.h" +#include "c-family/c-pragma.h" #include "tm_p.h" #define preprocessing_asm_p() (cpp_get_options (pfile)->lang == CLK_ASM) @@ -203,6 +204,19 @@ loongarch_pragma_target_parse (tree args, tree pop_target) loongarch_reset_previous_fndecl (); + /* For the definitions, ensure all newly defined macros are considered + as used for -Wunused-macros. There is no point warning about the + compiler predefined macros. */ + cpp_options *cpp_opts = cpp_get_options (parse_in); + unsigned char saved_warn_unused_macros = cpp_opts->warn_unused_macros; + cpp_opts->warn_unused_macros = 0; + + cpp_force_token_locations (parse_in, BUILTINS_LOCATION); + loongarch_update_cpp_builtins (parse_in); + cpp_stop_forcing_token_locations (parse_in); + + cpp_opts->warn_unused_macros = saved_warn_unused_macros; + /* If we're popping or reseting make sure to update the globals so that the optab availability predicates get recomputed. 
*/ if (pop_target) diff --git a/gcc/testsuite/gcc.target/loongarch/pr118828.c b/gcc/testsuite/gcc.target/loongarch/pr118828.c new file mode 100644 index 000..abdda24c758 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/pr118828.c @@ -0,0 +1,34 @@ +/* { dg-do preprocess } */ +/* { dg-options "-mno-lasx" } */ + +#ifdef __loongarch_asx +#error LASX should not be available here +#endif + +#ifdef __loongarch_simd_width +#if __loongarch_simd_width == 256 +#error simd width shuold not be 256 +#endif +#endif + +#pragma GCC push_options +#pragma GCC target("lasx") +#ifndef __loongarch_asx +#error LASX should be available here +#endif +#ifndef __loongarch_simd_width +#error simd width should be available here +#elif __loongarch_simd_width != 256 +#error simd width should be 256 +#endif +#pragma GCC pop_options + +#ifdef __loongarch_asx +#error LASX should become unavailable again +#endif + +#ifdef __loongarch_simd_width +#if __loongarch_simd_width == 256 +#error simd width shuold not be 256 again +#endif +#endif -- 2.34.1
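The behaviour exercised by pr118828.c (macros that must follow the active target options, recomputed while a warning flag is temporarily cleared) can be sketched without any compiler internals. Everything below is a toy model: `toy_cpp` and `toy_update_builtins` are hypothetical names, the boolean fields merely stand in for whether `__loongarch_sx` and `__loongarch_asx` are defined, and a `simd_width` of 0 stands in for `__loongarch_simd_width` being undefined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy preprocessor state; not the cpp_reader or cpp_options of libcpp.  */
struct toy_cpp
{
  bool has_sx;              /* stands in for __loongarch_sx being defined */
  bool has_asx;             /* stands in for __loongarch_asx being defined */
  int simd_width;           /* 0 means the width macro is undefined */
  bool warn_unused_macros;  /* stands in for the -Wunused-macros setting */
};

/* Recompute the option-dependent macros from the currently active ISA
   flags.  The warning flag is saved, cleared for the duration of the
   update, and restored afterwards, following the shape of the patch.  */
void
toy_update_builtins (struct toy_cpp *pp, bool isa_lsx, bool isa_lasx)
{
  assert (pp != NULL);
  bool saved_warn = pp->warn_unused_macros;
  pp->warn_unused_macros = false;

  /* Define-or-undefine: each macro simply tracks its feature flag.  */
  pp->has_sx = isa_lsx;
  pp->has_asx = isa_lasx;
  pp->simd_width = isa_lsx ? (isa_lasx ? 256 : 128) : 0;

  pp->warn_unused_macros = saved_warn;
}
```

Calling the update once after enabling LASX and again after restoring the previous options mirrors the push_options / target("lasx") / pop_options sequence in the test: the width goes to 256 and then back, and the warning setting is unchanged afterwards.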
[PATCH 2/3] LoongArch: Split the function loongarch_cpu_cpp_builtins into two functions.
Split the implementation of the function loongarch_cpu_cpp_builtins into two parts: 1. Macro definitions that do not change (only considering 64-bit architecture) 2. Macro definitions that change with different compilation options. gcc/ChangeLog: * config/loongarch/loongarch-c.cc (builtin_undef): New macro. (loongarch_cpu_cpp_builtins): Split to loongarch_update_cpp_builtins and loongarch_define_unconditional_macros. (loongarch_def_or_undef): New functions. (loongarch_define_unconditional_macros): Likewise. (loongarch_update_cpp_builtins): Likewise. Change-Id: Ifae73ffa2a07a595ed2a7f6ab7b82d8f51328a2a --- gcc/config/loongarch/loongarch-c.cc | 109 +--- 1 file changed, 66 insertions(+), 43 deletions(-) diff --git a/gcc/config/loongarch/loongarch-c.cc b/gcc/config/loongarch/loongarch-c.cc index 5d8c02e094b..9fe911325ab 100644 --- a/gcc/config/loongarch/loongarch-c.cc +++ b/gcc/config/loongarch/loongarch-c.cc @@ -31,13 +31,21 @@ along with GCC; see the file COPYING3. If not see #define preprocessing_asm_p() (cpp_get_options (pfile)->lang == CLK_ASM) #define builtin_define(TXT) cpp_define (pfile, TXT) +#define builtin_undef(TXT) cpp_undef (pfile, TXT) #define builtin_assert(TXT) cpp_assert (pfile, TXT) -void -loongarch_cpu_cpp_builtins (cpp_reader *pfile) +static void +loongarch_def_or_undef (bool def_p, const char *macro, cpp_reader *pfile) +{ + if (def_p) +cpp_define (pfile, macro); + else +cpp_undef (pfile, macro); +} + +static void +loongarch_define_unconditional_macros (cpp_reader *pfile) { - builtin_assert ("machine=loongarch"); - builtin_assert ("cpu=loongarch"); builtin_define ("__loongarch__"); builtin_define_with_value ("__loongarch_arch", @@ -66,45 +74,6 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile) builtin_define ("__loongarch_lp64"); } - /* These defines reflect the ABI in use, not whether the - FPU is directly accessible. 
*/ - if (TARGET_DOUBLE_FLOAT_ABI) -builtin_define ("__loongarch_double_float=1"); - else if (TARGET_SINGLE_FLOAT_ABI) -builtin_define ("__loongarch_single_float=1"); - - if (TARGET_DOUBLE_FLOAT_ABI || TARGET_SINGLE_FLOAT_ABI) -builtin_define ("__loongarch_hard_float=1"); - else -builtin_define ("__loongarch_soft_float=1"); - - - /* ISA Extensions. */ - if (TARGET_DOUBLE_FLOAT) -builtin_define ("__loongarch_frlen=64"); - else if (TARGET_SINGLE_FLOAT) -builtin_define ("__loongarch_frlen=32"); - else -builtin_define ("__loongarch_frlen=0"); - - if (TARGET_HARD_FLOAT && ISA_HAS_FRECIPE) -builtin_define ("__loongarch_frecipe"); - - if (ISA_HAS_LSX) -{ - builtin_define ("__loongarch_simd"); - builtin_define ("__loongarch_sx"); - - if (!ISA_HAS_LASX) - builtin_define ("__loongarch_simd_width=128"); -} - - if (ISA_HAS_LASX) -{ - builtin_define ("__loongarch_asx"); - builtin_define ("__loongarch_simd_width=256"); -} - /* ISA evolution features */ int max_v_major = 1, max_v_minor = 0; @@ -145,7 +114,61 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile) builtin_define_with_int_value ("_LOONGARCH_SZPTR", POINTER_SIZE); builtin_define_with_int_value ("_LOONGARCH_FPSET", 32); builtin_define_with_int_value ("_LOONGARCH_SPFPSET", 32); +} + +static void +loongarch_update_cpp_builtins (cpp_reader *pfile) +{ + builtin_undef ("__loongarch_double_float"); + builtin_undef ("__loongarch_single_float"); + /* These defines reflect the ABI in use, not whether the + FPU is directly accessible. */ + if (TARGET_DOUBLE_FLOAT_ABI) +builtin_define ("__loongarch_double_float=1"); + else if (TARGET_SINGLE_FLOAT_ABI) +builtin_define ("__loongarch_single_float=1"); + + builtin_undef ("__loongarch_soft_float"); + builtin_undef ("__loongarch_hard_float"); + if (TARGET_DOUBLE_FLOAT_ABI || TARGET_SINGLE_FLOAT_ABI) +builtin_define ("__loongarch_hard_float=1"); + else +builtin_define ("__loongarch_soft_float=1"); + + + /* ISA Extensions. 
*/ + if (TARGET_DOUBLE_FLOAT) +builtin_define ("__loongarch_frlen=64"); + else if (TARGET_SINGLE_FLOAT) +builtin_define ("__loongarch_frlen=32"); + else +builtin_define ("__loongarch_frlen=0"); + + loongarch_def_or_undef (TARGET_HARD_FLOAT && ISA_HAS_FRECIPE, + "__loongarch_frecipe", pfile); + + loongarch_def_or_undef (ISA_HAS_LSX, "__loongarch_simd", pfile); + loongarch_def_or_undef (ISA_HAS_LSX, "__loongarch_sx", pfile); + loongarch_def_or_undef (ISA_HAS_LASX, "__loongarch_asx", pfile); + + builtin_undef ("__loongarch_simd_width"); + if (ISA_HAS_LSX) +{ + if (ISA_HAS_LASX) + builtin_define ("__loongarch_simd_width=256"); + else + builtin_define ("__loongarch_simd_width=128"); +} +} + +void +loongarch_cpu_cpp_builtins (cpp_reader *pfile) +{ + builtin_assert ("machine=loongarch"); + builtin_assert ("cpu=loongarch"); + loongarch_define_unc
[PATCH 0/3] Organize the code and fix PR118828.
Refer to the implementation of aarch64 to fix PR118828.

Lulu Cheng (3):
  LoongArch: Move the function loongarch_register_pragmas to
    loongarch-c.cc.
  LoongArch: Split the function loongarch_cpu_cpp_builtins into two
    functions.
  LoongArch: After setting the compilation options, update the
    predefined macros.

 gcc/config/loongarch/loongarch-c.cc           | 174 +-
 gcc/config/loongarch/loongarch-protos.h       |   1 +
 gcc/config/loongarch/loongarch-target-attr.cc |  48 -
 gcc/testsuite/gcc.target/loongarch/pr118828.c |  34
 4 files changed, 166 insertions(+), 91 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828.c

--
2.34.1
[PATCH 1/3] LoongArch: Move the function loongarch_register_pragmas to loongarch-c.cc.
gcc/ChangeLog: * config/loongarch/loongarch-target-attr.cc (loongarch_pragma_target_parse): Move to ... (loongarch_register_pragmas): Move to ... * config/loongarch/loongarch-c.cc (loongarch_pragma_target_parse): ... here. (loongarch_register_pragmas): ... here. * config/loongarch/loongarch-protos.h (loongarch_process_target_attr): Function Declaration. Change-Id: I12751a6ce2f1b2f587699db3c80188066f193d2d --- gcc/config/loongarch/loongarch-c.cc | 51 +++ gcc/config/loongarch/loongarch-protos.h | 1 + gcc/config/loongarch/loongarch-target-attr.cc | 48 - 3 files changed, 52 insertions(+), 48 deletions(-) diff --git a/gcc/config/loongarch/loongarch-c.cc b/gcc/config/loongarch/loongarch-c.cc index c95c0f373be..5d8c02e094b 100644 --- a/gcc/config/loongarch/loongarch-c.cc +++ b/gcc/config/loongarch/loongarch-c.cc @@ -23,9 +23,11 @@ along with GCC; see the file COPYING3. If not see #include "config.h" #include "system.h" #include "coretypes.h" +#include "target.h" #include "tm.h" #include "c-family/c-common.h" #include "cpplib.h" +#include "tm_p.h" #define preprocessing_asm_p() (cpp_get_options (pfile)->lang == CLK_ASM) #define builtin_define(TXT) cpp_define (pfile, TXT) @@ -145,3 +147,52 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile) builtin_define_with_int_value ("_LOONGARCH_SPFPSET", 32); } + +/* Hook to validate the current #pragma GCC target and set the state, and + update the macros based on what was changed. If ARGS is NULL, then + POP_TARGET is used to reset the options. */ + +static bool +loongarch_pragma_target_parse (tree args, tree pop_target) +{ + /* If args is not NULL then process it and setup the target-specific + information that it specifies. */ + if (args) +{ + if (!loongarch_process_target_attr (args, NULL)) + return false; + + loongarch_option_override_internal (&la_target, + &global_options, + &global_options_set); +} + + /* args is NULL, restore to the state described in pop_target. */ + else +{ + pop_target = pop_target ? 
pop_target : target_option_default_node; + cl_target_option_restore (&global_options, &global_options_set, + TREE_TARGET_OPTION (pop_target)); +} + + target_option_current_node += build_target_option_node (&global_options, &global_options_set); + + loongarch_reset_previous_fndecl (); + + /* If we're popping or reseting make sure to update the globals so that + the optab availability predicates get recomputed. */ + if (pop_target) +loongarch_save_restore_target_globals (pop_target); + + return true; +} + +/* Implement REGISTER_TARGET_PRAGMAS. */ + +void +loongarch_register_pragmas (void) +{ + /* Update pragma hook to allow parsing #pragma GCC target. */ + targetm.target_option.pragma_parse = loongarch_pragma_target_parse; +} diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h index 94d3e33cb9a..9659d5ae26e 100644 --- a/gcc/config/loongarch/loongarch-protos.h +++ b/gcc/config/loongarch/loongarch-protos.h @@ -222,4 +222,5 @@ extern void loongarch_save_restore_target_globals (tree new_tree); extern void loongarch_register_pragmas (void); extern rtx loongarch_gen_stepped_int_parallel (unsigned int nelts, int base, int step); +extern bool loongarch_process_target_attr (tree args, tree fndecl); #endif /* ! GCC_LOONGARCH_PROTOS_H */ diff --git a/gcc/config/loongarch/loongarch-target-attr.cc b/gcc/config/loongarch/loongarch-target-attr.cc index cee7031ca1e..cb537446dff 100644 --- a/gcc/config/loongarch/loongarch-target-attr.cc +++ b/gcc/config/loongarch/loongarch-target-attr.cc @@ -422,51 +422,3 @@ loongarch_option_valid_attribute_p (tree fndecl, tree, tree args, int) return ret; } -/* Hook to validate the current #pragma GCC target and set the state, and - update the macros based on what was changed. If ARGS is NULL, then - POP_TARGET is used to reset the options. 
*/ - -static bool -loongarch_pragma_target_parse (tree args, tree pop_target) -{ - /* If args is not NULL then process it and setup the target-specific - information that it specifies. */ - if (args) -{ - if (!loongarch_process_target_attr (args, NULL)) - return false; - - loongarch_option_override_internal (&la_target, - &global_options, - &global_options_set); -} - - /* args is NULL, restore to the state described in pop_target. */ - else -{ - pop_target = pop_target ? pop_target : target_option_default_node; - cl_target_option_restore (&global_options, &global_options_set, - TREE_TARGET_OPTION (pop_target)); -} - - target_optio
[pushed: r15-7474] sarif-replay: fix off-by-one in handling of "endColumn" (§3.30.8) [PR118792]
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-7474-ge8c5013b6b7820.

gcc/ChangeLog:
        PR sarif-replay/118792
        * libsarifreplay.cc (sarif_replayer::handle_region_object): Fix
        off-by-one in handling of endColumn property so that the code
        matches the comment and the SARIF spec (§3.30.8).

gcc/testsuite/ChangeLog:
        PR sarif-replay/118792
        * sarif-replay.dg/2.1.0-valid/error-with-note.sarif: Update expected
        output to reflect fix to off-by-one error in handling of
        "endColumn" property.
        * sarif-replay.dg/2.1.0-valid/malloc-vs-local-4.c.sarif: Likewise.
        * sarif-replay.dg/2.1.0-valid/signal-1.c.moved.sarif: Likewise.
        * sarif-replay.dg/2.1.0-valid/signal-1.c.sarif: Likewise.

Signed-off-by: David Malcolm
---
 gcc/libsarifreplay.cc                     |  2 +-
 .../2.1.0-valid/error-with-note.sarif     |  4 ++--
 .../2.1.0-valid/malloc-vs-local-4.c.sarif | 24 +--
 .../2.1.0-valid/signal-1.c.moved.sarif    | 14 +--
 .../2.1.0-valid/signal-1.c.sarif          | 14 +--
 5 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/gcc/libsarifreplay.cc b/gcc/libsarifreplay.cc
index 61d9565588e..71f80797926 100644
--- a/gcc/libsarifreplay.cc
+++ b/gcc/libsarifreplay.cc
@@ -1739,7 +1739,7 @@ handle_region_object (const json::object &region_obj,
   /* SARIF's endColumn is 1 beyond the final column in the region,
      whereas GCC's end columns are inclusive.
*/ end = m_output_mgr.new_location_from_file_line_column - (file, end_line, end_column_jnum->get ()); + (file, end_line, end_column_jnum->get () - 1); } else { diff --git a/gcc/testsuite/sarif-replay.dg/2.1.0-valid/error-with-note.sarif b/gcc/testsuite/sarif-replay.dg/2.1.0-valid/error-with-note.sarif index 0d75a693cdf..77d5a4ee181 100644 --- a/gcc/testsuite/sarif-replay.dg/2.1.0-valid/error-with-note.sarif +++ b/gcc/testsuite/sarif-replay.dg/2.1.0-valid/error-with-note.sarif @@ -26,12 +26,12 @@ /* { dg-begin-multiline-output "" } /this/does/not/exist/test.bas:2:8: error: 'GOTO' is considered harmful 2 |GOTO label - |^~ + |^~ { dg-end-multiline-output "" } */ /* { dg-begin-multiline-output "" } /this/does/not/exist/test.bas:1:1: note: this is the target of the 'GOTO' 1 | label: PRINT "hello world!" - | ^~ + | ^ { dg-end-multiline-output "" } */ // TODO: trailing [error] diff --git a/gcc/testsuite/sarif-replay.dg/2.1.0-valid/malloc-vs-local-4.c.sarif b/gcc/testsuite/sarif-replay.dg/2.1.0-valid/malloc-vs-local-4.c.sarif index 55c646bb5ad..947d65c6a7e 100644 --- a/gcc/testsuite/sarif-replay.dg/2.1.0-valid/malloc-vs-local-4.c.sarif +++ b/gcc/testsuite/sarif-replay.dg/2.1.0-valid/malloc-vs-local-4.c.sarif @@ -339,37 +339,37 @@ In function 'callee_1': /not/a/real/path/malloc-vs-local-4.c:5:3: warning: dereference of possibly-NULL ‘ptr’ [-Wanalyzer-possible-null-dereference] 5 | *ptr = 42; - | ^~ + | ^ 'test_1': events 1-5 | |8 | int test_1 (int i, int flag) -| | ^~~ +| | ^~ | | | | | (1) entry to ‘test_1’ |.. | 12 | if (flag) -| | ~~ +| | ~ | | | | | (2) following ‘true’ branch (when ‘flag != 0’)... 
| 13 | ptr = (int *)malloc (sizeof (int)); -| | ~~ +| | ~ | | | | | (3) ...to here | | (4) this call could return NULL | 14 | callee_1 (ptr); -| | ~~~ +| | ~~ | | | | | (5) calling ‘callee_1’ from ‘test_1’ | +--> 'callee_1': events 6-7 | |3 | void __attribute__((noinline)) callee_1 (int *ptr) - | |^ + | |^~~~ | || | |(6) entry to ‘callee_1’ |4 | { |5 | *ptr = 42; - | | ~~ + | | ~ | | | | | (7) ‘ptr’ could be NULL: unchecked value from (4) | @@ -378,24 +378,24 @@ In function 'callee_1': In function 'test_2': /not/a/real/path/malloc-vs-local-4.c:38:7: warning: double-‘free’ of ‘ptr’ [-Wanalyzer-double-free] 38 | free (ptr); - | ^~~ + | ^~ 'test_2': events 1-5 34 | if (!flag) - | ^~ + | ^ | |
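The off-by-one fixed in this patch comes down to converting SARIF's exclusive endColumn into GCC's inclusive end column. A minimal sketch of that conversion in C follows; the helper names (sarif_end_column_to_inclusive, region_width) are hypothetical and are not part of libsarifreplay.cc.

```c
#include <assert.h>

/* Hypothetical helper, not part of libsarifreplay.cc: SARIF's endColumn
   names the column one past the region's last character, whereas GCC's
   end columns name the last character itself.  */
static int
sarif_end_column_to_inclusive (int sarif_end_column)
{
  return sarif_end_column - 1;
}

/* Number of columns covered by a single-line region that starts at
   START_COLUMN (1-based) and has the given SARIF endColumn.  */
static int
region_width (int start_column, int sarif_end_column)
{
  return sarif_end_column_to_inclusive (sarif_end_column) - start_column + 1;
}
```

For example, a region with startColumn 5 and endColumn 9 covers columns 5 through 8, so four characters get underlined rather than five.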
Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale
On 2/7/25 12:18 PM, Richard Sandiford wrote:
> FWIW, here's a very rough initial version of the kind of thing
> I was thinking about.  Hopefully the hook documentation describes
> the approach.  It's deliberately (overly?) flexible.
>
> I've included an aarch64 version that (a) models the fact that the
> first caller-save can also allocate the frame more-or-less for free,
> and (b) once we've saved an odd number of GPRs, saving one more is
> essentially free.  I also hacked up an x86 version locally to model
> the allocation benefits of using caller-saved registers.  It seemed
> to fix the povray example above.
>
> This still needs a lot of clean-up and testing, but I thought I might
> as well send what I have before leaving for the weekend.  Does it look
> reasonable in principle?

Richard, thank you for continuing to work on this problem.

These hooks and their implementation make much more sense to me,
although it is difficult to predict whether they will solve all the
existing related PRs.

You definitely have my approval for the hooks if you manage to avoid
new GCC testsuite failures with them on x86-64, aarch64, and ppc64.
[RFA][PR tree-optimization/98028] Use relationship between operands to simplify SUB_OVERFLOW
So this is a fairly old regression, but with all the ranger work that's
been done, it's become easy to resolve.

The basic idea here is to use known relationships between two operands
of a SUB_OVERFLOW IFN to statically compute the overflow state and
ultimately allow turning the IFN into simple arithmetic (or for the
tests in this BZ elide the arithmetic entirely).

The regression example is when the two inputs are known equal.  In that
case the subtraction will never overflow.  But there are a few other
cases we can handle as well.

  a == b -> never overflows
  a > b  -> never overflows when A and B are unsigned
  a >= b -> never overflows when A and B are unsigned
  a < b  -> always overflows when A and B are unsigned

Bootstrapped and regression tested on x86, and regression tested on the
usual cross platforms.

OK for the trunk?

Jeff

        PR tree-optimization/98028
gcc/
        * vr-values.cc (check_for_binary_op_overflow): Try to use a known
        relationship between op0/op1 to statically determine overflow state.

gcc/testsuite
        * gcc.dg/tree-ssa/pr98028.c: New test.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr98028.c b/gcc/testsuite/gcc.dg/tree-ssa/pr98028.c
new file mode 100644
index 000..4e371b69235
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr98028.c
@@ -0,0 +1,26 @@
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+unsigned f1(unsigned i, unsigned j) {
+  if (j != i) __builtin_unreachable();
+  return __builtin_sub_overflow_p(i, j, (unsigned)0);
+}
+
+unsigned f2(unsigned i, unsigned j) {
+  if (j > i) __builtin_unreachable();
+  return __builtin_sub_overflow_p(i, j, (unsigned)0);
+}
+
+unsigned f3(unsigned i, unsigned j) {
+  if (j >= i) __builtin_unreachable();
+  return __builtin_sub_overflow_p(i, j, (unsigned)0);
+}
+
+unsigned f4(unsigned i, unsigned j) {
+  if (j <= i) __builtin_unreachable();
+  return __builtin_sub_overflow_p(i, j, (unsigned)0);
+}
+
+/* { dg-final { scan-tree-dump-times "return 0" 3 optimized } } */
+/* { dg-final { scan-tree-dump-times "return 1" 1 optimized } } */
+/* { dg-final { scan-tree-dump-not "SUB_OVERFLOW" optimized } } */
+/* { dg-final { scan-tree-dump-not "IMAGPART_EXPR" optimized } } */
diff --git a/gcc/vr-values.cc b/gcc/vr-values.cc
index ed590138fe8..29568e27c38 100644
--- a/gcc/vr-values.cc
+++ b/gcc/vr-values.cc
@@ -85,6 +85,33 @@ check_for_binary_op_overflow (range_query *query,
                               enum tree_code subcode, tree type,
                               tree op0, tree op1, bool *ovf, gimple *s = NULL)
 {
+  /* For MINUS_EXPR, we may know based the relationship
+     (if any) between op0 and op1.  */
+  if (subcode == MINUS_EXPR)
+    {
+      relation_kind rel = query->relation().query (s, op0, op1);
+
+      /* If the operands are equal, then the result will be zero
+         and there is never an overflow.  */
+      if (rel == VREL_EQ)
+        return true;
+
+      /* If op0 and op1 are unsigned types, we still have a chance.  */
+      if (TYPE_UNSIGNED (TREE_TYPE (op0)) && TYPE_UNSIGNED (TREE_TYPE (op1)))
+        {
+          /* op0 > op1 or op0 >= op1 never overflows.  */
+          if (rel == VREL_GT || rel == VREL_GE)
+            return true;
+
+          /* And op0 < op1 always overflows.  */
+          if (rel == VREL_LT)
+            {
+              *ovf = true;
+              return true;
+            }
+        }
+    }
+
   int_range_max vr0, vr1;
   if (!query->range_of_expr (vr0, op0, s) || vr0.undefined_p ())
     vr0.set_varying (TREE_TYPE (op0));
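The relation rules used by this patch can be modelled outside the compiler. The sketch below is only an illustration: the enum and the function sub_overflow_known are hypothetical stand-ins for the relation_kind query in vr-values.cc, and unsigned_sub_overflows is a run-time cross-check built on the GCC/Clang __builtin_sub_overflow builtin.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the relations used by the patch; the real code
   gets relation_kind values (VREL_EQ, VREL_GT, ...) from ranger.  */
enum rel { REL_UNKNOWN, REL_EQ, REL_GT, REL_GE, REL_LT };

/* Return true if the overflow state of A - B is known statically from
   R (the relation of A to B), storing that state in *OVF.  */
static bool
sub_overflow_known (enum rel r, bool is_unsigned, bool *ovf)
{
  *ovf = false;

  /* Equal operands: the result is zero and never overflows.  */
  if (r == REL_EQ)
    return true;

  if (is_unsigned)
    {
      /* A > B or A >= B never overflows.  */
      if (r == REL_GT || r == REL_GE)
        return true;

      /* A < B always overflows.  */
      if (r == REL_LT)
        {
          *ovf = true;
          return true;
        }
    }

  return false;
}

/* Run-time check of unsigned subtraction via the GCC/Clang builtin.  */
static bool
unsigned_sub_overflows (unsigned a, unsigned b)
{
  unsigned res;
  return __builtin_sub_overflow (a, b, &res);
}
```

Comparing the two on concrete values shows the static answer agreeing with the run-time one: 5 - 5 and 7 - 3 do not wrap, while 3 - 7 does.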
Re: [PATCH 2/3] LoongArch: Split the function loongarch_cpu_cpp_builtins into two functions.
On 2025/2/11 at 9:26 PM, Xi Ruoyao wrote:
> On Tue, 2025-02-11 at 20:49 +0800, Lulu Cheng wrote:
>> Split the implementation of the function loongarch_cpu_cpp_builtins
>> into two parts:
>> 1. Macro definitions that do not change (only considering 64-bit
>>    architecture)
>> 2. Macro definitions that change with different compilation options.
>>
>> gcc/ChangeLog:
>>
>>         * config/loongarch/loongarch-c.cc (builtin_undef): New macro.
>>         (loongarch_cpu_cpp_builtins): Split to
>>         loongarch_update_cpp_builtins and
>>         loongarch_define_unconditional_macros.
>>         (loongarch_def_or_undef): New functions.
>>         (loongarch_define_unconditional_macros): Likewise.
>>         (loongarch_update_cpp_builtins): Likewise.
>>
>> Change-Id: Ifae73ffa2a07a595ed2a7f6ab7b82d8f51328a2a
>> ---
>
> /* snip */
>
> I guess the handling for la_evo_macro_name macros (like
> __loongarch_div32) and
> __loongarch_version_major/__loongarch_version_minor should be moved as
> well?  Things like #pragma GCC target("arch=la664") may affect them.

It seems that the following four also need to be updated.  I will make
corrections in v2 and add the corresponding test cases.

  builtin_define_with_value ("__loongarch_arch",
                             loongarch_arch_strings[la_target.cpu_arch], 1);
  builtin_define_with_value ("__loongarch_tune",
                             loongarch_tune_strings[la_target.cpu_tune], 1);
  builtin_define_with_value ("_LOONGARCH_ARCH",
                             loongarch_arch_strings[la_target.cpu_arch], 1);
  builtin_define_with_value ("_LOONGARCH_TUNE",
                             loongarch_tune_strings[la_target.cpu_tune], 1);
Re: [PATCH 3/8] LoongArch: Simplify {lsx_,lasx_x}v{add,sub,mul}l{ev,od} description
On 2025/2/7 at 8:09 PM, Xi Ruoyao wrote:
> These pattern definitions are tediously long, invoking 32 UNSPECs and
> many hard-coded long const vectors.  To simplify them, at first we use
> the TImode vector operations instead of the UNSPECs, then we adopt an
> approach in AArch64: using a special predicate to match the const
> vectors for odd/even indices for define_insn's, and generate those
> vectors in define_expand's.
>
> For "backward compatibility" we need to provide a "punned" version for
> the operations invoking TImode vectors as the intrinsics still expect
> DImode vectors.
>
> The stat is "201 insertions, 905 deletions."

/* snip */

> diff --git a/gcc/config/loongarch/loongarch-modes.def b/gcc/config/loongarch/loongarch-modes.def
> index e632f03636b..07cc29fceee 100644
> --- a/gcc/config/loongarch/loongarch-modes.def
> +++ b/gcc/config/loongarch/loongarch-modes.def
> @@ -32,6 +32,7 @@ VECTOR_MODES (FLOAT, 8);       /* V4HF V2SF */
>  /* For LARCH LSX 128 bits.  */
>  VECTOR_MODES (INT, 16);        /* V16QI V8HI V4SI V2DI */
>  VECTOR_MODES (FLOAT, 16);      /* V4SF V2DF */
> +VECTOR_MODE (INT, TI, 1);      /* V1TI */
>
>  /* For LARCH LASX 256 bits.  */
>  VECTOR_MODES (INT, 32);        /* V32QI V16HI V8SI V4DI */

 /* For LARCH LASX 256 bits.  */
- VECTOR_MODES (INT, 32);        /* V32QI V16HI V8SI V4DI */
+ VECTOR_MODES (INT, 32);        /* V32QI V16HI V8SI V4DI V2TI */

Could you mark V2TI in v2? :-)

> @@ -49,6 +50,7 @@ VECTOR_MODE (INT, QI, 64);     /* V64QI */
>  VECTOR_MODE (INT, HI, 32);     /* V32HI */
>  VECTOR_MODE (INT, SI, 16);     /* V16SI */
>  VECTOR_MODE (INT, DI, 8);      /* V8DI */
> +VECTOR_MODE (INT, TI, 4);      /* V4TI */
>  VECTOR_MODE (FLOAT, SF, 16);   /* V16SF */
>  VECTOR_MODE (FLOAT, DF, 8);    /* V8DF */
[COMMITTED] RISC-V: Vector pseudoinsns with x0 operand to use imm 0
A couple of Vector pseudoinstructions use x0 scalar which could be
inefficient on wider uarches due to regfile crossing.  Instead use the
imm 0 form, which should be functionally equivalent.

  pseudoinsn             orig insn with x0       this patch
  ----------             -----------------       ----------
  vneg.v vd,vs           vrsub.vx vd,vs,x0       vrsub.vi vd,vs,0
  vncvt.x.x.w vd,vs,vm   vnsrl.wx vd,vs,x0,vm    vnsrl.wi vd,vs,0,vm
  vwcvt.x.x.v vd,vs,vm   vwadd.vx vd,vs,x0,vm    (imm not supported)

gcc/ChangeLog:

        * config/riscv/vector.md: vncvt substitute vnsrl.
        vnsrl with x0 replace with immediate 0.
        vneg substitute vrsub.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-1.c: Change
        expected pattern.
        * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-2.c: Ditto.
        * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-2.c: Ditto.
        * gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: Ditto.
        * gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: Ditto.
        * gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: Ditto.
        * gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: Ditto.
        * gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: Ditto.
        * gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: Ditto.
        * gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: Ditto.
        * gcc.target/riscv/rvv/autovec/conversions/vncvt-rv32gcv.c: Ditto.
        * gcc.target/riscv/rvv/autovec/conversions/vncvt-rv64gcv.c: Ditto.
        * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c: Ditto.
        * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c: Ditto.
        * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c: Ditto.
        * gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Ditto.
        * gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Ditto.
        * gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Ditto.
        * gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vls/abs-2.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vls/cond_convert-11.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vls/cond_convert-12.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vls/cond_neg-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vls/cond_trunc-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vls/cond_trunc-2.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vls/cond_trunc-3.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vls/convert-11.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vls/convert-12.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vls/neg-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vls/trunc-1.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vls/trunc-2.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vls/trunc-3.c: Ditto.
        * gcc.target/riscv/rvv/base/simplify-vdiv.c: Ditto.
        * gcc.target/riscv/rvv/base/unop_v_constraint-1.c: Ditto.

Signed-off-by: Vineet Gupta
---
 gcc/config/riscv/vector.md                    | 16 ++---
 .../cond/cond_convert_int2int-rv32-1.c        |  4 ++--
 .../cond/cond_convert_int2int-rv32-2.c        |  4 ++--
 .../cond/cond_convert_int2int-rv64-1.c        |  4 ++--
 .../cond/cond_convert_int2int-rv64-2.c        |  4 ++--
 .../riscv/rvv/autovec/cond/cond_unary-1.c     |  6 ++---
 .../riscv/rvv/autovec/cond/cond_unary-2.c     |  6 ++---
 .../riscv/rvv/autovec/cond/cond_unary-3.c     |  6 ++---
 .../riscv/rvv/autovec/cond/cond_unary-4.c     |  6 ++---
 .../riscv/rvv/autovec/cond/cond_unary-5.c     |  6 ++---
 .../riscv/rvv/autovec/cond/cond_unary-6.c     |  6 ++---
 .../riscv/rvv/autovec/cond/cond_unary-7.c     |  6 ++---
 .../riscv/rvv/autovec/cond/cond_unary-8.c     |  6 ++---
 .../rvv/autovec/conversions/vncvt-rv32gcv.c   |  2 +-
 .../rvv/autovec/conversions/vncvt-rv64gcv.c   |  2 +-
 .../autovec/sat/vec_sat_u_sub_trunc-1-u16.c   |  2 +-
 .../autovec/sat/vec_sat_u_sub_trunc-1-u32.c   |  2 +-
 .../autovec/sat/vec_sat_u_sub_trunc-1-u8.c    |  2 +-
 .../riscv/rvv/autovec/unop/abs-rv32gcv.c      |  2 +-
 .../riscv/rvv/autovec/unop/abs-rv64gcv.c      |  2 +-
 .../riscv/rvv/autovec/unop/vneg-rv32gcv.c     |  2 +-
 .../riscv/rvv/autovec/unop/vneg-rv64gcv.c     |  2 +-
 .../gcc.target/riscv/rvv/autovec/vls/abs-2.c  |  2 +-
 .../riscv/rvv/autovec/vls/cond_convert-11.c   |  2 +-
 .../riscv/rvv/autovec/vls/cond_convert-12.c   |  2 +-
 .../riscv/rvv/autovec/vls/cond_neg-1.c        |  2 +-
 .../riscv/rvv/autovec/vls/cond_trunc-1.c      |  2 +-
 .../riscv/rvv/autovec/vls/cond_trunc-2.c      |  2 +-
 .../riscv/rvv/autovec/vls/cond_trunc-3.c      |  2 +-
 .../riscv/rvv/autovec/vls/convert-11.c        |  2 +-
 .../riscv/rvv/autovec/vls/convert-12.c        |  2 +-
 .../gcc.target/riscv/rvv/autove
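To see why the immediate forms are interchangeable with the x0 forms, here is a per-element scalar model in C. It is only a sketch: model_vrsub and model_vnsrl are hypothetical names, the element widths (32-bit, and 32-to-16-bit narrowing) are picked for illustration, and masking and vl handling are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Per-element model of vrsub (reverse subtract): result = RHS - ELEM.
   With RHS == 0, whether it comes from x0 or from an immediate, this
   is plain negation, i.e. vneg.v.  */
static int32_t
model_vrsub (int32_t elem, int32_t rhs)
{
  return (int32_t) ((uint32_t) rhs - (uint32_t) elem);
}

/* Per-element model of vnsrl: logical right shift of a double-width
   element, truncated to the narrow width.  With SHIFT == 0 this is a
   plain truncation, i.e. vncvt.x.x.w.  */
static uint16_t
model_vnsrl (uint32_t wide_elem, unsigned shift)
{
  return (uint16_t) (wide_elem >> (shift & 31));
}
```

Since x0 always reads as zero, passing 0 for rhs or shift gives the same result whichever encoding supplies it; only the source of the zero operand changes.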
[PATCH] ifcvt: Don't speculatively move inline-asm [PR102150]
Unlike loop invariant motion, moving an inline-asm out of an if is not
always profitable, and the cost estimate for the instruction inside the
inline-asm is unknown.

This is a regression from GCC 4.6, which didn't speculatively move
inline-asm as far as I can tell.

Bootstrapped and tested on x86_64-linux-gnu.

        PR rtl-optimization/102150
gcc/ChangeLog:

        * ifcvt.cc (cheap_bb_rtx_cost_p): Return false if the insn
        has an inline-asm in it.

Signed-off-by: Andrew Pinski
---
 gcc/ifcvt.cc | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index cb5597bc171..707937ba2f0 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -166,6 +166,12 @@ cheap_bb_rtx_cost_p (const_basic_block bb,
     {
       if (NONJUMP_INSN_P (insn))
         {
+          /* Inline-asm's cost is not very estimatable.
+             It could be a costly instruction but the
+             estimate would be the same as a non costly
+             instruction.  */
+          if (asm_noperands (PATTERN (insn)) >= 0)
+            return false;
           int cost = insn_cost (insn, speed) * REG_BR_PROB_BASE;
           if (cost == 0)
             return false;
-- 
2.43.0
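For readers who want a picture of the kind of code affected, here is a minimal sketch in C of an inline-asm guarded by a condition. It is not the testcase from the PR; maybe_touch is a hypothetical name, and the asm body is deliberately empty so the example builds on any GCC or Clang target.

```c
#include <assert.h>

/* An inline-asm statement that sits in the body of an if.  The asm
   template here is empty and only ties X to a register; in real code it
   could hold an arbitrarily expensive instruction whose cost the
   compiler cannot estimate, which is why executing it speculatively
   (on the path where COND is false) is not obviously a win.  */
static int
maybe_touch (int x, int cond)
{
  if (cond)
    __asm__ ("" : "+r" (x));
  return x + 1;
}
```

With the change above, cheap_bb_rtx_cost_p would report a block like the body of this if as not cheap, so if-conversion leaves the branch in place.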
[PATCH] x86: Properly find the maximum stack slot alignment
Don't assume that stack slots can only be accessed by stack or frame
registers.  We first find all registers defined by stack or frame
registers.  Then check memory accesses by such registers, including
stack and frame registers.

gcc/

        PR target/109780
        PR target/109093
        * config/i386/i386.cc (ix86_update_stack_alignment): New.
        (ix86_find_all_reg_use): Likewise.
        (ix86_find_max_used_stack_alignment): Also check memory accesses
        from registers defined by stack or frame registers.

gcc/testsuite/

        PR target/109780
        PR target/109093
        * g++.target/i386/pr109780-1.C: New test.
        * gcc.target/i386/pr109093-1.c: Likewise.
        * gcc.target/i386/pr109780-1.c: Likewise.
        * gcc.target/i386/pr109780-2.c: Likewise.

-- 
H.J.

From 13da9e9be612333b7df7f66cf4b4c1396a64d89d Mon Sep 17 00:00:00 2001
From: "H.J. Lu"
Date: Tue, 14 Mar 2023 11:41:51 -0700
Subject: [PATCH] x86: Properly find the maximum stack slot alignment

Signed-off-by: H.J. Lu
---
 gcc/config/i386/i386.cc                    | 128 +
 gcc/testsuite/g++.target/i386/pr109780-1.C |  72
 gcc/testsuite/gcc.target/i386/pr109093-1.c |  38 ++
 gcc/testsuite/gcc.target/i386/pr109780-1.c |  14 +++
 gcc/testsuite/gcc.target/i386/pr109780-2.c |  21
 5 files changed, 252 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/pr109780-1.C
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109093-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109780-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109780-2.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 3128973ba79..495b97116a4 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -8466,6 +8466,65 @@ output_probe_stack_range (rtx reg, rtx end)
   return "";
 }
 
+/* Update the maximum stack slot alignment from memory alignment in
+   PAT.  */
+
+static void
+ix86_update_stack_alignment (rtx, const_rtx pat, void *data)
+{
+  /* This insn may reference stack slot.  Update the maximum stack slot
+     alignment.  */
+  subrtx_iterator::array_type array;
+  FOR_EACH_SUBRTX (iter, array, pat, ALL)
+    if (MEM_P (*iter))
+      {
+        unsigned int alignment = MEM_ALIGN (*iter);
+        unsigned int *stack_alignment
+          = (unsigned int *) data;
+        if (alignment > *stack_alignment)
+          *stack_alignment = alignment;
+        break;
+      }
+}
+
+/* Find all registers defined with REG.
*/ + +static void +ix86_find_all_reg_use (HARD_REG_SET &stack_slot_access, + unsigned int reg, auto_bitmap &worklist) +{ + for (df_ref ref = DF_REG_USE_CHAIN (reg); + ref != NULL; + ref = DF_REF_NEXT_REG (ref)) +{ + if (DF_REF_IS_ARTIFICIAL (ref)) + continue; + + rtx_insn *insn = DF_REF_INSN (ref); + if (!NONDEBUG_INSN_P (insn)) + continue; + + rtx set = single_set (insn); + if (!set) + continue; + + rtx src = SET_SRC (set); + if (MEM_P (src)) + continue; + + rtx dest = SET_DEST (set); + if (!REG_P (dest)) + continue; + + if (TEST_HARD_REG_BIT (stack_slot_access, REGNO (dest))) + continue; + + /* Add this register to stack_slot_access. */ + add_to_hard_reg_set (&stack_slot_access, Pmode, REGNO (dest)); + bitmap_set_bit (worklist, REGNO (dest)); +} +} + /* Set stack_frame_required to false if stack frame isn't required. Update STACK_ALIGNMENT to the largest alignment, in bits, of stack slot used if stack frame is required and CHECK_STACK_SLOT is true. */ @@ -8484,10 +8543,6 @@ ix86_find_max_used_stack_alignment (unsigned int &stack_alignment, add_to_hard_reg_set (&set_up_by_prologue, Pmode, HARD_FRAME_POINTER_REGNUM); - /* The preferred stack alignment is the minimum stack alignment. */ - if (stack_alignment > crtl->preferred_stack_boundary) -stack_alignment = crtl->preferred_stack_boundary; - bool require_stack_frame = false; FOR_EACH_BB_FN (bb, cfun) @@ -8499,27 +8554,58 @@ ix86_find_max_used_stack_alignment (unsigned int &stack_alignment, set_up_by_prologue)) { require_stack_frame = true; - - if (check_stack_slot) - { - /* Find the maximum stack alignment. */ - subrtx_iterator::array_type array; - FOR_EACH_SUBRTX (iter, array, PATTERN (insn), ALL) - if (MEM_P (*iter) - && (r
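The idea of first collecting every register that is defined from the stack or frame pointer can be pictured as a small worklist propagation. The sketch below is a toy model in C, not the i386.cc code: struct def_from, NUM_REGS and find_derived_regs are hypothetical, and real insns are reduced to (source register, destination register) pairs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_REGS 8

/* Toy stand-in for an insn "DEST = f (SRC)" whose source is not a
   memory load.  Both fields must be in the range [0, NUM_REGS).  */
struct def_from
{
  int src;
  int dest;
};

/* On entry DERIVED marks the seed registers (e.g. the stack and frame
   pointers).  On exit it also marks every register transitively
   defined from a marked register.  */
static void
find_derived_regs (const struct def_from *insns, size_t n_insns,
                   bool derived[NUM_REGS])
{
  /* Each register is pushed at most once, so NUM_REGS slots suffice.  */
  int worklist[NUM_REGS];
  int top = 0;

  for (int r = 0; r < NUM_REGS; r++)
    if (derived[r])
      worklist[top++] = r;

  while (top > 0)
    {
      int reg = worklist[--top];
      for (size_t i = 0; i < n_insns; i++)
        if (insns[i].src == reg && !derived[insns[i].dest])
          {
            derived[insns[i].dest] = true;
            worklist[top++] = insns[i].dest;
          }
    }
}
```

Seeding register 0 with the chain 0 -> 3 -> 5 marks registers 3 and 5, while an unrelated definition 6 -> 7 leaves both untouched. Memory accesses through any marked register would then be candidates for the stack alignment check.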
Re: [PATCH]AArch64: Fix GCC 13 backport of big.Little CPU detection [PR118800]
Tamar Christina writes:
> Hi All,
>
> It seems I ran regressions but forgot to check them last time.
>
> On the GCC-13 branch the backport caused a failure due to the branch not
> having generic-armv8-a and also it still treating the generic cpu
> special.  This made it return NULL when trying to find the default CPU.
>
> In GCC 13 we still had multiple structures with the same information and
> in this case aarch64_cpu_data was missing the generic CPU which is in
> all_cores.
>
> This corrects it by using "generic" instead and also adding it to
> aarch64_cpu_data.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu on GCC-13 branch and no
> issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>         PR target/118800
>         * config/aarch64/driver-aarch64.cc (DEFAULT_CPU): Use generic
>         instead of generic-armv8-a.
>         (aarch64_cpu_data): Add generic.
>
> gcc/testsuite/ChangeLog:
>
>         PR target/118800
>         * gcc.target/aarch64/cpunative/native_cpu_34.c: Update order.

OK, thanks.

Reading this made me think that INVALID_IMP and INVALID_CORE might be
better for the generic entries, rather than 0x0 and 0x0.  But that
applies to trunk and gcc-14 too, so isn't something to change here.

Richard

>
> ---
>
> diff --git a/gcc/config/aarch64/driver-aarch64.cc b/gcc/config/aarch64/driver-aarch64.cc
> index ff4660f469cd5c899c981ee8181d1794fade..acc44536629e814a2aea0e4b21e327da3fa5d6ea 100644
> --- a/gcc/config/aarch64/driver-aarch64.cc
> +++ b/gcc/config/aarch64/driver-aarch64.cc
> @@ -60,7 +60,7 @@ struct aarch64_core_data
>  #define ALL_VARIANTS ((unsigned)-1)
>  /* Default architecture to use if -mcpu=native did not detect a known CPU.  */
>  #define DEFAULT_ARCH "8A"
> -#define DEFAULT_CPU "generic-armv8-a"
> +#define DEFAULT_CPU "generic"
>
>  #define AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, PART, VARIANT) \
>    { CORE_NAME, #ARCH, IMP, PART, VARIANT, feature_deps::cpu_##CORE_IDENT },
> @@ -68,6 +68,7 @@ struct aarch64_core_data
>  static CONSTEXPR const aarch64_core_data aarch64_cpu_data[] =
>  {
>  #include "aarch64-cores.def"
> +  { "generic", "armv8-a", 0, 0, ALL_VARIANTS, 0},
>    { NULL, NULL, INVALID_IMP, INVALID_CORE, ALL_VARIANTS, 0 }
>  };
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_34.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_34.c
> index 168140002a0f0205c0f552de0cce9b2d356e09e2..d2ff8156d8fc14fcc14ddd91f43f0b0fea15cc7b 100644
> --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_34.c
> +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_34.c
> @@ -7,6 +7,6 @@ int main()
>    return 0;
>  }
>
> -/* { dg-final { scan-assembler {\.arch armv8-a\+dotprod\+crc\+crypto\+sve2\n} } } */
> +/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+dotprod\+crypto\+sve2\n} } } */
>
>  /* Test a normal looking procinfo.  */
[pushed] c++: change implementation of -frange-for-ext-temps [PR118574]
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

The implementation in r15-3840 used a novel technique of wrapping the
entire range-for loop in a CLEANUP_POINT_EXPR, which confused the
coroutines transformation.  Instead let's use the existing
extend_ref_init_temps mechanism.

This does not revert all of r15-3840, only the parts that change how
CLEANUP_POINT_EXPRs are applied to range-for declarations.

        PR c++/118574
        PR c++/107637

gcc/cp/ChangeLog:

        * call.cc (struct extend_temps_data): New.
        (extend_temps_r, extend_all_temps): New.
        (set_up_extended_ref_temp): Handle tree walk case.
        (extend_ref_init_temps): Call extend_all_temps.
        * decl.cc (initialize_local_var): Revert ext-temps change.
        * parser.cc (cp_convert_range_for): Likewise.
        (cp_parser_omp_loop_nest): Likewise.
        * pt.cc (tsubst_stmt): Likewise.
        * semantics.cc (finish_for_stmt): Likewise.

gcc/testsuite/ChangeLog:

        * g++.dg/coroutines/range-for1.C: New test.
---
 gcc/cp/call.cc                               | 117 +--
 gcc/cp/decl.cc                               |   5 -
 gcc/cp/parser.cc                             |  23 +---
 gcc/cp/pt.cc                                 |  22
 gcc/cp/semantics.cc                          |  13 ---
 gcc/testsuite/g++.dg/coroutines/range-for1.C |  69 +++
 6 files changed, 180 insertions(+), 69 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/range-for1.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index e440d58141b..2c77b4a4b68 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -14154,6 +14154,20 @@ make_temporary_var_for_ref_to_temp (tree decl, tree type)
   return pushdecl (var);
 }
 
+/* Data for extend_temps_r, mostly matching the parameters of
+   extend_ref_init_temps.  */
+
+struct extend_temps_data
+{
+  tree decl;
+  tree init;
+  vec **cleanups;
+  tree* cond_guard;
+  hash_set *pset;
+};
+
+static tree extend_temps_r (tree *, int *, void *);
+
 /* EXPR is the initializer for a variable DECL of reference or
    std::initializer_list type.  Create, push and return a new VAR_DECL
    for the initializer so that it will live as long as DECL.
Any @@ -14162,7 +14176,8 @@ make_temporary_var_for_ref_to_temp (tree decl, tree type) static tree set_up_extended_ref_temp (tree decl, tree expr, vec **cleanups, - tree *initp, tree *cond_guard) + tree *initp, tree *cond_guard, + extend_temps_data *walk_data) { tree init; tree type; @@ -14198,10 +14213,16 @@ set_up_extended_ref_temp (tree decl, tree expr, vec **cleanups, suppress_warning (decl); } - /* Recursively extend temps in this initializer. */ - TARGET_EXPR_INITIAL (expr) -= extend_ref_init_temps (decl, TARGET_EXPR_INITIAL (expr), cleanups, -cond_guard); + /* Recursively extend temps in this initializer. The recursion needs to come + after creating the variable to conform to the mangling ABI, and before + maybe_constant_init because the extension might change its result. */ + if (walk_data) +cp_walk_tree (&TARGET_EXPR_INITIAL (expr), extend_temps_r, + walk_data, walk_data->pset); + else +TARGET_EXPR_INITIAL (expr) + = extend_ref_init_temps (decl, TARGET_EXPR_INITIAL (expr), cleanups, + cond_guard); /* Any reference temp has a non-trivial initializer. */ DECL_NONTRIVIALLY_INITIALIZED_P (var) = true; @@ -14801,7 +14822,8 @@ extend_ref_init_temps_1 (tree decl, tree init, vec **cleanups, if (TREE_CODE (*p) == TARGET_EXPR) { tree subinit = NULL_TREE; - *p = set_up_extended_ref_temp (decl, *p, cleanups, &subinit, cond_guard); + *p = set_up_extended_ref_temp (decl, *p, cleanups, &subinit, +cond_guard, nullptr); recompute_tree_invariant_for_addr_expr (sub); if (init != sub) init = fold_convert (TREE_TYPE (init), sub); @@ -14811,6 +14833,81 @@ extend_ref_init_temps_1 (tree decl, tree init, vec **cleanups, return init; } +/* Tree walk function for extend_all_temps. Generally parallel to + extend_ref_init_temps_1, but adapted for walk_tree. 
*/ + +tree +extend_temps_r (tree *tp, int *walk_subtrees, void *data) +{ + extend_temps_data *d = (extend_temps_data *)data; + + if (TYPE_P (*tp) || TREE_CODE (*tp) == CLEANUP_POINT_EXPR) +{ + *walk_subtrees = 0; + return NULL_TREE; +} + + if (TREE_CODE (*tp) == COND_EXPR) +{ + cp_walk_tree (&TREE_OPERAND (*tp, 0), extend_temps_r, d, d->pset); + + auto walk_arm = [d](tree &op) + { + tree cur_cond_guard = NULL_TREE; + auto ov = make_temp_override (d->cond_guard, &cur_cond_guard); + cp_walk_tree (&op, extend_temps_r, d, d->pset); + if (cur_cond_guard) + { + tree set = build2 (MODIFY_EXPR, boolean_type_node, +
Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103]
On 2/11/25 3:17 PM, Richard Sandiford wrote:
Jeff Law writes:
On 2/11/25 9:08 AM, Richard Sandiford wrote:
Jeff Law writes:
On 2/7/25 5:59 AM, Andrew Waterman wrote:

This patch runs counter to the ABI spec, which states that vxrm is not preserved across calls and is volatile upon function entry [1]. vxrm does not play the same role as frm plays in the calling convention. (I won't get into the rationale in this email, but the rationale isn't especially important: we should follow the ABI.)

[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/3a79e936eec5491078b1133ac943f91ef5fd75fd/riscv-cc.adoc?plain=1#L119-L120

Pan's patch doesn't change the basic property that VXRM has no known state at function entry or upon return from a function call.

I think it will. global_regs[X] means that X is defined on entry, defined on exit, and can be changed by calls. If the register is call-clobbered/volatile/caller-saved, then I agree with Andrew that this doesn't look like the right fix.

But the LCM code we use to manage vxrm assignments makes no assumption about incoming state and assumes no state is preserved across calls.

In that case, I wonder what the patch is fixing. Like you say, the initial mode seems to be VXRM_MODE_NONE, and it looks like riscv_vxrm_mode_after correctly models calls as clobbering the mode.

Just realized I didn't answer this part of your message. It's not really fixing any known issue. Just felt like the right thing to do as VXRM is roughly similar to (but clearly not 100% the same) FRM.

jeff
[pushed] c++: don't default -frange-for-ext-temps in -std=gnu++20 [PR118574]
Tested x86_64-pc-linux-gnu, applying to trunk. -- 8< -- Since -frange-for-ext-temps has been causing trouble, let's not enable it by default in pre-C++23 GNU modes for GCC 15, and also allow disabling it in C++23 and up. PR c++/118574 gcc/c-family/ChangeLog: * c-opts.cc (c_common_post_options): Only enable -frange-for-ext-temps by default in C++23. gcc/ChangeLog: * doc/invoke.texi: Adjust -frange-for-ext-temps documentation. gcc/testsuite/ChangeLog: * g++.dg/cpp23/range-for3.C: Use -frange-for-ext-temps. * g++.dg/cpp23/range-for4.C: Adjust expected result. libgomp/ChangeLog: * testsuite/libgomp.c++/range-for-4.C: Adjust expected result. --- gcc/doc/invoke.texi | 5 ++--- gcc/c-family/c-opts.cc | 17 +++-- gcc/testsuite/g++.dg/cpp23/range-for3.C | 4 ++-- gcc/testsuite/g++.dg/cpp23/range-for4.C | 4 ++-- libgomp/testsuite/libgomp.c++/range-for-4.C | 2 +- 5 files changed, 10 insertions(+), 22 deletions(-) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 0aef2abf05b..56d43cb6779 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -3548,9 +3548,8 @@ easier, you can use @option{-fno-pretty-templates} to disable them. Enable lifetime extension of C++ range based for temporaries. With @option{-std=c++23} and above this is part of the language standard, so lifetime of the temporaries is extended until the end of the loop -regardless of this option. This option allows enabling that behavior also -in earlier versions of the standard and is enabled by default in the -GNU dialects, from @option{-std=gnu++11} until @option{-std=gnu++20}. +by default. This option allows enabling that behavior also +in earlier versions of the standard. 
@opindex fno-rtti @opindex frtti diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc index 87b231861a6..d43b3aef102 100644 --- a/gcc/c-family/c-opts.cc +++ b/gcc/c-family/c-opts.cc @@ -1213,20 +1213,9 @@ c_common_post_options (const char **pfilename) if (cxx_dialect >= cxx20) flag_concepts = 1; - /* Enable lifetime extension of range based for temporaries for C++23. - Diagnose -std=c++23 -fno-range-for-ext-temps. */ - if (cxx_dialect >= cxx23) -{ - if (OPTION_SET_P (flag_range_for_ext_temps) - && !flag_range_for_ext_temps) - error ("%<-fno-range-for-ext-temps%> is incompatible with C++23"); - flag_range_for_ext_temps = 1; -} - /* Otherwise default to enabled in GNU modes but allow user to override. */ - else if (cxx_dialect >= cxx11 - && !flag_iso - && !OPTION_SET_P (flag_range_for_ext_temps)) -flag_range_for_ext_temps = 1; + /* Enable lifetime extension of range based for temporaries for C++23. */ + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + flag_range_for_ext_temps, cxx_dialect >= cxx23); /* -fimmediate-escalation has no effect when immediate functions are not supported. */ diff --git a/gcc/testsuite/g++.dg/cpp23/range-for3.C b/gcc/testsuite/g++.dg/cpp23/range-for3.C index 301e25886ec..f95b21b3cee 100644 --- a/gcc/testsuite/g++.dg/cpp23/range-for3.C +++ b/gcc/testsuite/g++.dg/cpp23/range-for3.C @@ -1,7 +1,7 @@ // P2718R0 - Wording for P2644R1 Fix for Range-based for Loop // { dg-do run { target c++11 } } -// Verify -frange-for-ext-temps is set by default in -std=gnu++* modes. -// { dg-options "" } +// Verify -frange-for-ext-temps works in earlier standards. 
+// { dg-additional-options "-frange-for-ext-temps" } #define RANGE_FOR_EXT_TEMPS 1 #include "range-for1.C" diff --git a/gcc/testsuite/g++.dg/cpp23/range-for4.C b/gcc/testsuite/g++.dg/cpp23/range-for4.C index f8c380d32c7..16204974bac 100644 --- a/gcc/testsuite/g++.dg/cpp23/range-for4.C +++ b/gcc/testsuite/g++.dg/cpp23/range-for4.C @@ -1,7 +1,7 @@ // P2718R0 - Wording for P2644R1 Fix for Range-based for Loop // { dg-do run { target c++11 } } -// Verify -frange-for-ext-temps is set by default in -std=gnu++* modes. +// Verify -frange-for-ext-temps is not set by default in -std=gnu++* modes. // { dg-options "" } -#define RANGE_FOR_EXT_TEMPS 1 +#define RANGE_FOR_EXT_TEMPS 0 #include "range-for2.C" diff --git a/libgomp/testsuite/libgomp.c++/range-for-4.C b/libgomp/testsuite/libgomp.c++/range-for-4.C index 3c10e7349af..aa6e4da523c 100644 --- a/libgomp/testsuite/libgomp.c++/range-for-4.C +++ b/libgomp/testsuite/libgomp.c++/range-for-4.C @@ -3,5 +3,5 @@ // { dg-additional-options "-std=gnu++17" } // { dg-require-effective-target tls_runtime } -#define RANGE_FOR_EXT_TEMPS 1 +#define RANGE_FOR_EXT_TEMPS 0 #include "range-for-1.C" base-commit: 299a8e2dc667e795991bc439d2cad5ea5bd379e2 prerequisite-patch-id: aeecd9138d83da91723a418776494445063247f2 -- 2.48.1
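For readers unfamiliar with what -frange-for-ext-temps changes, here is a minimal sketch of the observable difference. It is not taken from the GCC testsuite; Temp, pick, values and temps_destroyed_during_loop are hypothetical names chosen for illustration. The range itself refers to a global, so the loop is well defined whether or not the temporary's lifetime is extended.

```cpp
#include <cassert>
#include <vector>

// Counts how many Temp objects have been destroyed so far.
static int destroyed = 0;

// Hypothetical temporary whose destruction we want to observe.
struct Temp {
  ~Temp() { ++destroyed; }
};

// The range lives in a global, so iterating it never dangles.
static std::vector<int> values{1, 2, 3};

// Takes a temporary by const reference but returns the unrelated global.
const std::vector<int>& pick(const Temp&) { return values; }

// Sums the value of `destroyed` seen on each iteration.
// With lifetime extension of range-for temporaries (C++23 semantics, or
// -frange-for-ext-temps) the Temp outlives the loop and the sum is 0.
// Without it, the Temp dies before the first iteration and the sum is 3.
int temps_destroyed_during_loop() {
  destroyed = 0;
  int seen = 0;
  for (int v : pick(Temp{})) {
    (void)v;
    seen += destroyed;
  }
  return seen;
}
```

Under the defaults described in this patch, one would expect the function to return 0 with -std=c++23 and 3 with -std=gnu++20 unless -frange-for-ext-temps is passed explicitly; in either case the temporary has been destroyed exactly once by the time the loop has finished.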
[PATCH] c++: ICE with operator new[] in constexpr [PR118775]
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? -- >8 -- Here we ICE since r11-7740 because we no longer say that (long)&a (where a is a global var) is non_constant_p. So VERIFY_CONSTANT does not return and we crash on tree_to_uhwi. We should check tree_fits_uhwi_p before calling tree_to_uhwi. PR c++/118775 gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_call_expression): Check tree_fits_uhwi_p. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/constexpr-new24.C: New test. * g++.dg/cpp2a/constexpr-new25.C: New test. --- gcc/cp/constexpr.cc | 7 + gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C | 25 ++ gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C | 27 3 files changed, 59 insertions(+) create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc index f142dd32bc8..f8f9a9df1a2 100644 --- a/gcc/cp/constexpr.cc +++ b/gcc/cp/constexpr.cc @@ -2909,6 +2909,13 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t, gcc_assert (arg0); if (new_op_p) { + if (!tree_fits_uhwi_p (arg0)) + { + if (!ctx->quiet) + error_at (loc, "cannot allocate array: size too large"); + *non_constant_p = true; + return t; + } tree type = build_array_type_nelts (char_type_node, tree_to_uhwi (arg0)); tree var = build_decl (loc, VAR_DECL, diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C b/gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C new file mode 100644 index 000..debb7f0f5c4 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C @@ -0,0 +1,25 @@ +// PR c++/118775 +// { dg-do compile { target c++20 } } + +int a; + +constexpr char * +f1 () +{ + constexpr auto p = new char[(long int) &a]; // { dg-error "size too large" } + return p; +} + +constexpr char * +f2 () +{ + auto p = new char[(long int) &a]; // { dg-error "size too large" } + return p; +} + +void +g () +{ + auto r1 = f2 (); + constexpr auto r2 = f2 (); // { dg-message "in .constexpr. 
expansion" } +} diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C b/gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C new file mode 100644 index 000..91c0318abd8 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C @@ -0,0 +1,27 @@ +// PR c++/118775 +// { dg-do compile { target c++20 } } + +namespace std { +struct __uniq_ptr_impl { + constexpr __uniq_ptr_impl(char *) {} +}; +template struct unique_ptr { + __uniq_ptr_impl _M_t; + constexpr ~unique_ptr() {} +}; +template struct _MakeUniq; +template struct _MakeUniq<_Tp[]> { + typedef unique_ptr<_Tp[]> __array; +}; +template using __unique_ptr_array_t = _MakeUniq<_Tp>::__array; +constexpr __unique_ptr_array_t make_unique(long __num) { + return unique_ptr(new char[__num]); +} +} // namespace std +int a; +int +main () +{ + std::unique_ptr p = std::make_unique((long)&a); + constexpr std::unique_ptr p2 = std::make_unique((long)&a); // { dg-error "conversion" } +} base-commit: 299a8e2dc667e795991bc439d2cad5ea5bd379e2 -- 2.48.1
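The fix above follows a common guard pattern: check that a value can be converted before converting it, and report a non-constant result instead of crashing. The sketch below is a toy analogue only. ToySize, toy_fits_uhwi_p, toy_to_uhwi and toy_eval_array_new are hypothetical stand-ins and do not reproduce GCC's tree API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a tree node holding an array size.
struct ToySize {
  bool is_constant;     // false models something like (long)&a
  std::uint64_t value;  // meaningful only when is_constant is true
};

// Toy analogue of tree_fits_uhwi_p: can the size be read as an integer?
bool toy_fits_uhwi_p(const ToySize& s) { return s.is_constant; }

// Toy analogue of tree_to_uhwi: asserts that the value is usable,
// which is where the unguarded code would have crashed.
std::uint64_t toy_to_uhwi(const ToySize& s) {
  assert(toy_fits_uhwi_p(s));
  return s.value;
}

// Returns true and stores the element count on success. On failure it
// sets *non_constant_p and returns false without touching *nelts.
bool toy_eval_array_new(const ToySize& n, bool* non_constant_p,
                        std::uint64_t* nelts) {
  if (!toy_fits_uhwi_p(n)) {
    *non_constant_p = true;
    return false;
  }
  *nelts = toy_to_uhwi(n);
  return true;
}
```

The point of the ordering is that the conversion helper is only reached once the predicate has succeeded, which mirrors the placement of the new check ahead of build_array_type_nelts in the patch.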
Re: [PATCH v2] RISC-V: Vector pseudoinsns with x0 operand to use imm 0
On 2/9/25 5:20 AM, Vineet Gupta wrote:
On 2/8/25 23:02, Jeff Law wrote:
On 2/7/25 9:34 PM, Vineet Gupta wrote:

A couple of vector pseudoinstructions use the x0 scalar register, which, being a regfile crosser, could be inefficient on certain wider uarches. Use the imm 0 form, which should be functionally equivalent.

pseudoinsn              orig insn with x0        this patch
----------------------------------------------------------------
vneg.v vd,vs            vrsub.vx vd,vs,x0        vrsub.vi vd,vs,0
vncvt.x.x.w vd,vs,vm    vnsrl.wx vd,vs,x0,vm     vnsrl.wi vd,vs,0,vm
vwcvt.x.x.v vd,vs,vm    vwadd.vx vd,vs,x0,vm     (imm not supported)

This passes my testsuite A/B run but obviously wait for the CI tester to give a green light.

gcc/ChangeLog:
    * config/riscv/vector.md: vncvt substitute vnsrl. vnsrl with x0 replace with immediate 0. vneg substitute vrsub.

gcc/testsuite/ChangeLog:
    * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-1.c: Change expected pattern.
    * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-2.c: Ditto.
    * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-1.c: Ditto.
    * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-2.c: Ditto.
    * gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: Ditto.
    * gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: Ditto.
    * gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: Ditto.
    * gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: Ditto.
    * gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: Ditto.
    * gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: Ditto.
    * gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: Ditto.
    * gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: Ditto.
    * gcc.target/riscv/rvv/autovec/conversions/vncvt-rv32gcv.c: Ditto.
    * gcc.target/riscv/rvv/autovec/conversions/vncvt-rv64gcv.c: Ditto.
    * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c: Ditto.
    * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c: Ditto.
    * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c: Ditto.
    * gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Ditto.
    * gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Ditto.
    * gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Ditto.
    * gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls/abs-2.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls/cond_convert-11.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls/cond_convert-12.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls/cond_neg-1.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls/cond_trunc-1.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls/cond_trunc-2.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls/cond_trunc-3.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls/convert-11.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls/convert-12.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls/neg-1.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls/trunc-1.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls/trunc-2.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls/trunc-3.c: Ditto.
    * gcc.target/riscv/rvv/base/simplify-vdiv.c: Ditto.
    * gcc.target/riscv/rvv/base/unop_v_constraint-1.c: Ditto.

LGTM. I think the only question is whether to make an exception for this or not. We are in stage4 after all ;-) Figure we can make a decision on the Tues call if you're available.

I don't have a strong opinion either way, just wanted to get it out of my tree :-)

Yeah sure, 9 PM IST is manageable.

FTR, this patch was discussed during the RISC-V patchwork meeting this morning. The consensus was it was safe to go in now even though we're in stage4. The only technical concern raised was the introduction of C code fragments to generate final asm, which we've largely avoided in the port. But it was considered a fairly minor concern.

So officially OK for the trunk now.

jeff
Re: [PATCH] RISC-V: Drop __riscv_vendor_feature_bits
On 2/11/25 12:35 AM, Yangyu Chen wrote:

As discussed in RISC-V C-API PR #101 [1] and in #96, the current interface is insufficient to support some cases, such as a vendor buying CPU IP from an upstream vendor but using its own mvendorid together with custom features from the upstream vendor. In this case, we might need to add these extensions many times, once for each downstream vendor. Thus, making __riscv_vendor_feature_bits guarded by mvendorid is not a good idea.

So, drop __riscv_vendor_feature_bits for now, and we should have time to discuss a better solution.

[1] https://github.com/riscv-non-isa/riscv-c-api-doc/pull/101

Signed-off-by: Yangyu Chen

gcc/ChangeLog:
    * config/riscv/riscv-feature-bits.h (RISCV_VENDOR_FEATURE_BITS_LENGTH): Drop.
    (struct riscv_vendor_feature_bits): Drop.

libgcc/ChangeLog:
    * config/riscv/feature_bits.c (RISCV_VENDOR_FEATURE_BITS_LENGTH): Drop.
    (__init_riscv_features_bits_linux): Drop.

Thanks. I've pushed this to the trunk.

jeff
[COMMITTED] Doc: Fix Texinfo warning in install.texi
For some time I've been seeing this Texinfo warning in my builds: .../gcc/doc/install.texi:2295: warning: `.' or `,' must follow @xref, not f Fixed thusly. gcc/ChangeLog * doc/install.texi: Add missing comma after @xref to fix warning. --- gcc/doc/install.texi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi index d6cf318b3af..bd7a38048eb 100644 --- a/gcc/doc/install.texi +++ b/gcc/doc/install.texi @@ -2292,7 +2292,7 @@ canadian cross build. The @option{--disable-nls} option disables NLS@. Note that this functionality requires either libintl (provided by GNU gettext) or C standard library that contains support for gettext (such as the GNU C Library). -@xref{with-included-gettext,,--with-included-gettext} for more +@xref{with-included-gettext,,--with-included-gettext}, for more information on the conditions required to get gettext support. @item --with-libintl-prefix=@var{dir} -- 2.34.1
[COMMITTED] Doc: Fix some typos and other nearby sloppy-writing issues
I spotted some typos in the GCC manual. Since often these are a sign that the text was inserted without being proofread, I looked at the context and fixed some grammar/punctuation/wording issues as well. gcc/ChangeLog * doc/extend.texi: Fix a bunch of typos and other writing bugs. * doc/invoke.texi: Likewise. --- gcc/doc/extend.texi | 85 ++--- gcc/doc/invoke.texi | 62 - 2 files changed, 73 insertions(+), 74 deletions(-) diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index d79e97d9a03..065bd8b84e1 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -1004,17 +1004,16 @@ The ISO C++14 library also defines the @samp{i} suffix, so C++14 code that includes the @samp{} header cannot use @samp{i} for the GNU extension. The @samp{j} suffix still has the GNU meaning. -GCC can handle both implicit and explicit casts between the @code{_Complex} -types and other @code{_Complex} types as casting both the real and imaginary -parts to the scalar type. -GCC can handle implicit and explicit casts from a scalar type to a @code{_Complex} -type and where the imaginary part will be considered zero. -The C front-end can handle implicit and explicit casts from a @code{_Complex} type -to a scalar type where the imaginary part will be ignored. In C++ code, this cast -is considered illformed and G++ will error out. +GCC handles both implicit and explicit casts between the +@code{_Complex} types with different scalar base types by casting both +the real and imaginary parts to the base type of the result. +GCC also handles implicit and explicit casts from a scalar type to a +@code{_Complex} type, by giving the imaginary part a zero value. -GCC provides a built-in function @code{__builtin_complex} will can be used to -construct a complex value. +The C front end can handle implicit and explicit casts from a +@code{_Complex} type to a scalar type, which uses the value of the +real part and ignores the imaginary part. 
In C++ code, this cast is +considered ill-formed and G++ diagnoses it as an error. @cindex @code{__real__} keyword @cindex @code{__imag__} keyword @@ -1023,7 +1022,7 @@ GCC has a few extensions which can be used to extract the real and the imaginary part of the complex-valued expression. Note these expressions are lvalues if the @var{exp} is an lvalue. These expressions operands have the type of a complex type -which might get prompoted to a complex type from a scalar type. +which might get promoted to a complex type from a scalar type. E.g. @code{__real__ (int)@var{x}} is the same as casting to @code{_Complex int} before @code{__real__} is done. @@ -1035,7 +1034,7 @@ E.g. @code{__real__ (int)@var{x}} is the same as casting to @tab Extract the imaginary part of @var{exp}. @end multitable -For values of floating point, you should use the ISO C99 +For values of floating-point type, you should use the ISO C99 functions, declared in @code{} and also provided as built-in functions by GCC@. @@ -1053,7 +1052,7 @@ with a complex type. This is a GNU extension; for values of floating type, you should use the ISO C99 functions @code{conjf}, @code{conj} and @code{conjl}, declared in @code{} and also provided as built-in functions by GCC@. Note unlike the @code{__real__} -and @code{__imag__} operators, this operator will not do an implicit cast +and @code{__imag__} operators, this operator does not do an implicit cast to the complex type because the @samp{~} is already a normal operator. GCC can allocate complex automatic variables in a noncontiguous @@ -3526,7 +3525,7 @@ mismatched allocation and deallocation functions and diagnose them under the control of options such as @option{-Wmismatched-dealloc}. It also makes it possible to diagnose attempts to deallocate objects that were not allocated dynamically, by @option{-Wfree-nonheap-object}. 
To indicate -that an allocation function both satisifies the nonaliasing property and +that an allocation function both satisfies the nonaliasing property and has a deallocator associated with it, both the plain form of the attribute and the one with the @var{deallocator} argument must be used. The same function can be both an allocator and a deallocator. Since inlining one @@ -3949,7 +3948,7 @@ caveats. If the pointer argument is also referred to by an @code{access} attribute on the function with @var{access-mode} either @code{read_only} or @code{read_write} and the latter attribute has the optional @var{size-index} argument -referring to a size argument, this expressses the maximum size of the access. +referring to a size argument, this expresses the maximum size of the access. For example, given: @smallexample @@ -4378,7 +4377,7 @@ is a usage of a function with @code{target_clones} attribute. Note that any subsequent call of a function without @code{target_clone} from a @code{target_clone} caller will not lead to copying (
[COMMITTED] Doc: Delete obsolete interface.texi chapter from GCC internals manual
The "Interfacing to GCC Output" chapter used to be part of the user-facing GCC documentation but ended up in the GCC internals manual when the two documents were separated in 2001. It hasn't been updated in any substantive way since then, and is now very bit-rotten. (PCC is no longer the "standard compiler" on any target, and the target-specific issues mentioned are for very old architectures.) Meanwhile, the GCC user documentation now has a chapter called "Binary Compatibility" that covers ABI issues in a generic way and also covers C++ compatibility. Let's keep that one and throw out the obsolete text that seems to predate the whole notion of an ABI. gcc/ChangeLog * Makefile.in (TEXI_GCCINT_FILES): Remove interface.texi. * doc/gccint.texi (Top): Remove menu entry for the "interface" node, and include of interface.texi. * doc/interface.texi: Delete. --- gcc/Makefile.in| 2 +- gcc/doc/gccint.texi| 5 +-- gcc/doc/interface.texi | 70 -- 3 files changed, 2 insertions(+), 75 deletions(-) delete mode 100644 gcc/doc/interface.texi diff --git a/gcc/Makefile.in b/gcc/Makefile.in index a8e32e25cf5..c159825e62c 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -3697,7 +3697,7 @@ TEXI_GCC_FILES = gcc.texi gcc-common.texi gcc-vers.texi frontends.texi\ # the *.texi files have changed. TEXI_GCCINT_FILES = gccint.texi gcc-common.texi gcc-vers.texi \ contribute.texi makefile.texi configterms.texi options.texi\ -portability.texi interface.texi passes.texi rtl.texi md.texi \ +portability.texi passes.texi rtl.texi md.texi \ $(srcdir)/doc/tm.texi hostconfig.texi fragments.texi \ configfiles.texi collect2.texi headerdirs.texi funding.texi\ gnu.texi gpl_v3.texi fdl.texi contrib.texi languages.texi \ diff --git a/gcc/doc/gccint.texi b/gcc/doc/gccint.texi index eea2d48f87a..d88fc1a1c68 100644 --- a/gcc/doc/gccint.texi +++ b/gcc/doc/gccint.texi @@ -87,8 +87,7 @@ Compiler Collection (GCC)}. This manual is mainly a reference manual rather than a tutorial. 
It discusses how to contribute to GCC (@pxref{Contributing}), the characteristics of the machines supported by GCC as hosts and targets -(@pxref{Portability}), how GCC relates to the ABIs on such systems -(@pxref{Interface}), and the characteristics of the languages for +(@pxref{Portability}), and the characteristics of the languages for which GCC front ends are written (@pxref{Languages}). It then describes the GCC source tree structure and build system, some of the interfaces to GCC front ends, and how support for a target system is @@ -100,7 +99,6 @@ Additional tutorial information is linked to from @menu * Contributing::How to contribute to testing and developing GCC. * Portability:: Goals of GCC's portability features. -* Interface:: Function-call interface of GCC output. * Libgcc:: Low-level runtime library used by GCC. * Languages:: Languages for which GCC front ends are written. * Source Tree:: GCC source tree structure and build system. @@ -141,7 +139,6 @@ Additional tutorial information is linked to from @include contribute.texi @include portability.texi -@include interface.texi @include libgcc.texi @include languages.texi @include sourcebuild.texi diff --git a/gcc/doc/interface.texi b/gcc/doc/interface.texi deleted file mode 100644 index 1688d6f66ec..000 --- a/gcc/doc/interface.texi +++ /dev/null @@ -1,70 +0,0 @@ -@c Copyright (C) 1988-2025 Free Software Foundation, Inc. -@c This is part of the GCC manual. -@c For copying conditions, see the file gcc.texi. - -@node Interface -@chapter Interfacing to GCC Output -@cindex interfacing to GCC output -@cindex run-time conventions -@cindex function call conventions -@cindex conventions, run-time - -GCC is normally configured to use the same function calling convention -normally in use on the target system. This is done with the -machine-description macros described (@pxref{Target Macros}). 
- -@cindex unions, returning -@cindex structures, returning -@cindex returning structures and unions -However, returning of structure and union values is done differently on -some target machines. As a result, functions compiled with PCC -returning such types cannot be called from code compiled with GCC, -and vice versa. This does not cause trouble often because few Unix -library routines return structures or unions. - -GCC code returns structures and unions that are 1, 2, 4 or 8 bytes -long in the same registers used for @code{int} or @code{double} return -values. (GCC typically allocates variables of such types in -registers also.) Structures and unions of other sizes are returned by -storing them into an address passed by the caller (usually in a -register). The target hook @code{TARGET_STRUCT_VALUE_RTX} -tells GCC where to pass this address. - -By contrast, PCC on most target machines retur
Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103]
On 2/11/25 3:17 PM, Richard Sandiford wrote:
Jeff Law writes:
On 2/11/25 9:08 AM, Richard Sandiford wrote:
Jeff Law writes:
On 2/7/25 5:59 AM, Andrew Waterman wrote:

This patch runs counter to the ABI spec, which states that vxrm is not preserved across calls and is volatile upon function entry [1]. vxrm does not play the same role as frm plays in the calling convention. (I won't get into the rationale in this email, but the rationale isn't especially important: we should follow the ABI.)

[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/3a79e936eec5491078b1133ac943f91ef5fd75fd/riscv-cc.adoc?plain=1#L119-L120

Pan's patch doesn't change the basic property that VXRM has no known state at function entry or upon return from a function call.

I think it will. global_regs[X] means that X is defined on entry, defined on exit, and can be changed by calls. If the register is call-clobbered/volatile/caller-saved, then I agree with Andrew that this doesn't look like the right fix.

But the LCM code we use to manage vxrm assignments makes no assumption about incoming state and assumes no state is preserved across calls.

In that case, I wonder what the patch is fixing. Like you say, the initial mode seems to be VXRM_MODE_NONE, and it looks like riscv_vxrm_mode_after correctly models calls as clobbering the mode.

In the FRM case, the problem was that we had:

  entry:
    call initialize
    X := FRM
    ...
    FRM := X

Since FRM was not previously defined on entry, and since the call in any case was assumed to clobber FRM, the X := FRM seemed to be reading an uninitialised value, and so the FRM := X could be folded away.

But from your description, and from an admittedly cursory look at the code, it sounds like that couldn't happen for VXRM.

The biggest difference with FRM is you can't call into glibc with FRM in a non-default state. VXRM is simpler in that regard as it's entirely managed by the compiler with no expectations of state within glibc. VXRM also has far fewer uses than FRM as VXRM just twiddles rounding modes for one small group of instructions.

Jeff

Richard
Re: [PATCH 6/8] LoongArch: Simplify {lsx,lasx_x}vpick description
On 2025/2/12 at 3:30 AM, Xi Ruoyao wrote:
On Tue, 2025-02-11 at 16:52 +0800, Lulu Cheng wrote:
On 2025/2/7 at 8:09 PM, Xi Ruoyao wrote:

/* snip */

-
-(define_insn "lasx_xvpickev_w"
- [(set (match_operand:V8SI 0 "register_operand" "=f")
- (vec_select:V8SI
- (vec_concat:V16SI
- (match_operand:V8SI 1 "register_operand" "f")
- (match_operand:V8SI 2 "register_operand" "f"))
- (parallel [(const_int 0) (const_int 2)
- (const_int 8) (const_int 10)
- (const_int 4) (const_int 6)
- (const_int 12) (const_int 14)])))]
- "ISA_HAS_LASX"
- "xvpickev.w\t%u0,%u2,%u1"
- [(set_attr "type" "simd_permute")
- (set_attr "mode" "V8SI")])
-

/* snip */

+;; Picking even/odd elements.
+(define_insn "simd_pick_evod_"
+ [(set (match_operand:ALLVEC 0 "register_operand" "=f")
+ (vec_select:ALLVEC
+ (vec_concat:
+ (match_operand:ALLVEC 1 "register_operand" "f")
+ (match_operand:ALLVEC 2 "register_operand" "f"))
+ (match_operand: 3 "vect_par_cnst_even_or_odd_half")))]

For LASX, the generated select array is problematic, taking xvpickev.w as an example:

xvpickev.w vd,vj,vk

The behavior of the instruction is as follows:

vd.w[0] = vk.w[0]
vd.w[1] = vk.w[2]
vd.w[2] = vj.w[0]
vd.w[3] = vj.w[2]
vd.w[4] = vk.w[4]
vd.w[5] = vk.w[6]
vd.w[6] = vj.w[4]
vd.w[7] = vj.w[6]

Oops, stupid me. Strangely the bootstrapping (even with BOOT_CFLAGS="-O2 -g -march=la664") and regtesting cannot catch it.

In r15-6488, the issue also exists in the xvexth fixed by Guo Jie, and neither bootstrap nor spec tests have detected it.

I'll limit this to LSX in v2.
Re: [PATCH] x86: Properly find the maximum stack slot alignment
On Wed, Feb 12, 2025 at 6:25 AM H.J. Lu wrote:
>
> Don't assume that stack slots can only be accessed by stack or frame
> registers. We first find all registers defined by stack or frame
> registers. Then check memory accesses by such registers, including
> stack and frame registers.

I wonder if this approach will also handle cases like e.g.:

lea 64(%rsp), %rbx
...
movaps 16(%rbx, %rcx), %xmm0

and:

movq %rsp, %rax
...
lea 64(%rax), %rbx
...
movaps 16(%rbx), %xmm0

?

Thanks,
uros.

>
> gcc/
>
> PR target/109780
> PR target/109093
> * config/i386/i386.cc (ix86_update_stack_alignment): New.
> (ix86_find_all_reg_use): Likewise.
> (ix86_find_max_used_stack_alignment): Also check memory accesses
> from registers defined by stack or frame registers.
>
> gcc/testsuite/
>
> PR target/109780
> PR target/109093
> * g++.target/i386/pr109780-1.C: New test.
> * gcc.target/i386/pr109093-1.c: Likewise.
> * gcc.target/i386/pr109780-1.c: Likewise.
> * gcc.target/i386/pr109780-2.c: Likewise.
>
> --
> H.J.
Re: [PATCH] x86: Properly find the maximum stack slot alignment
On Wed, Feb 12, 2025 at 3:16 PM Uros Bizjak wrote:
>
> On Wed, Feb 12, 2025 at 6:25 AM H.J. Lu wrote:
> >
> > Don't assume that stack slots can only be accessed by stack or frame
> > registers. We first find all registers defined by stack or frame
> > registers. Then check memory accesses by such registers, including
> > stack and frame registers.
>
> I wonder if this approach will also handle cases like e.g.:
>
> lea 64(%rsp), %rbx
> ...
> movaps 16(%rbx, %rcx), %xmm0
>
> and:
>
> movq %rsp, %rax
> ...
> lea 64(%rax), %rbx
> ...
> movaps 16(%rbx), %xmm0
>
> ?

They should be handled by ix86_find_all_reg_use:

  do
    {
      reg = bitmap_clear_first_set_bit (worklist);
      ix86_find_all_reg_use (stack_slot_access, reg, worklist);
    }
  while (!bitmap_empty_p (worklist));

> Thanks,
> uros.
> >
> >
> > gcc/
> >
> > PR target/109780
> > PR target/109093
> > * config/i386/i386.cc (ix86_update_stack_alignment): New.
> > (ix86_find_all_reg_use): Likewise.
> > (ix86_find_max_used_stack_alignment): Also check memory accesses
> > from registers defined by stack or frame registers.
> >
> > gcc/testsuite/
> >
> > PR target/109780
> > PR target/109093
> > * g++.target/i386/pr109780-1.C: New test.
> > * gcc.target/i386/pr109093-1.c: Likewise.
> > * gcc.target/i386/pr109780-1.c: Likewise.
> > * gcc.target/i386/pr109780-2.c: Likewise.
> >
> > --
> > H.J.

--
H.J.
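The worklist idea discussed in this thread can be sketched as a small transitive-closure computation. The following is a toy model under stated assumptions, not the i386 back end's code: ToyDef, derived_regs and the integer register numbers are hypothetical, and real instructions are reduced to "dest is computed from src".

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy instruction: DEST is computed from SRC, for example
// "lea 64(%src), %dest" or "movq %src, %dest".
struct ToyDef {
  int dest;
  int src;
};

// Collects every register whose value is derived, directly or
// transitively, from ROOT (for example the stack pointer).
std::set<int> derived_regs(const std::vector<ToyDef>& defs, int root) {
  std::set<int> seen{root};
  std::vector<int> worklist{root};
  while (!worklist.empty()) {
    int reg = worklist.back();
    worklist.pop_back();
    // Any register defined from REG is itself stack-derived; queue it
    // the first time it is discovered.
    for (const ToyDef& d : defs)
      if (d.src == reg && seen.insert(d.dest).second)
        worklist.push_back(d.dest);
  }
  return seen;
}
```

With the numbering 0 = %rsp, 1 = %rax, 2 = %rbx, the second example in the thread (movq %rsp, %rax followed by lea 64(%rax), %rbx) corresponds to the definitions {1, 0} and {2, 1}, and the sketch reports %rbx as stack-derived. Memory accesses through any register in the resulting set would then be candidates for the alignment check.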