Re: [patch, fortran] Add modular exponentiation for unsigned
Hello world,

with the following patch to the failing test case

diff --git a/gcc/testsuite/gfortran.dg/unsigned_15.f90 b/gcc/testsuite/gfortran.dg/unsigned_15.f90
index da4ccd2dc17..80a7a54e380 100644
--- a/gcc/testsuite/gfortran.dg/unsigned_15.f90
+++ b/gcc/testsuite/gfortran.dg/unsigned_15.f90
@@ -6,8 +6,8 @@ program main
   unsigned :: u
   print *,1 + 2u ! { dg-error "Operands of binary numeric operator" }
   print *,2u + 1 ! { dg-error "Operands of binary numeric operator" }
-  print *,2u ** 1 ! { dg-error "Exponentiation not valid" }
-  print *,2u ** 1u ! { dg-error "Exponentiation not valid" }
+  print *,2u ** 1 ! { dg-error "Operands of binary numeric operator" }
+  print *,2u ** 1u
   print *,1u < 2 ! { dg-error "Inconsistent types" }
   print *,int(1u) < 2
 end program main

the patch posted to https://gcc.gnu.org/pipermail/fortran/2025-February/061670.html and https://gcc.gnu.org/pipermail/gcc-patches/2025-February/674931.html passes (I don't want to re-submit the whole thing).

OK for trunk?

Best regards

Thomas
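For unsigned operands, "modular exponentiation" means computing base ** exp modulo 2^N, the same wrap-around rule the other unsigned operators follow. A minimal sketch of the usual square-and-multiply method in C++, assuming a 32-bit unsigned type; `upow32` is a hypothetical name and is not the gfortran library routine the patch adds:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: base ** exp modulo 2^32 via square-and-multiply.
// Multiplication of std::uint32_t values already wraps modulo 2^32,
// so no explicit reduction step is needed.
std::uint32_t upow32(std::uint32_t base, std::uint32_t exp) {
  std::uint32_t result = 1;
  while (exp != 0) {
    if (exp & 1u)   // low bit set: fold the current power into the result
      result *= base;
    base *= base;   // square for the next bit of exp
    exp >>= 1;
  }
  return result;
}
```

The loop runs once per bit of `exp`, so it needs O(log exp) multiplications; for instance `upow32(2, 32)` wraps to 0 and `upow32(0xFFFFFFFFu, 2)` yields 1.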
[PATCH] IBM zSystems: Do not use @PLT with larl
Bootstrapped and regtested on s390x-redhat-linux. Ok for master?

Commit 0990d93dd8a4 ("IBM Z: Use @PLT symbols for local functions in 64-bit mode") made GCC call both static and non-static functions and load both static and non-static function addresses with the @PLT suffix. This made it difficult for linkers to distinguish calling and address-taking instructions [1]. It is currently assumed that the R_390_PLT32DBL relocation, corresponding to the @PLT suffix, is used only for calling, and the R_390_PC32DBL relocation, corresponding to the empty suffix, is used only for address taking.

Linkers need to make this distinction in order to decide whether to ask ld.so to use canonical PLT entries. Normally GOT entries in shared objects contain addresses of the respective functions, with one notable exception: when a no-pie executable calls the respective function and also takes its address. Such executables assume that all addresses are known in advance, so they use addresses of the respective PLT entries. For consistency reasons, all respective GOT entries in the process must also use them. When a linker sees that a no-pie executable both calls a function and also takes its address, it creates a PLT entry and asks ld.so to consider it canonical by setting the respective undefined symbol's address, which is normally 0, to the address of this PLT entry.

Improve the situation by not using @PLT with larl.

Now that @PLT is not used with larl, also drop the 31-bit handling, which was required because 31-bit PLT entries require %r12 to point to the respective object's GOT, and this requirement is not satisfied when calling them by pointer from another object. Also drop the weak symbol handling, which was required because it is not possible to load an undefined weak symbol address (0) using larl.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=29655

gcc/ChangeLog:

	* config/s390/s390.cc (print_operand): Remove the no longer
	necessary 31-bit and weak symbol handling.
	* config/s390/s390.md (*movdi_64): Do not use @PLT with larl.
	(*movsi_larl): Likewise.
	(main_base_64): Likewise.
	(reload_base_64): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/call-z10-pic-nodatarel.c: Adjust expectations.
	* gcc.target/s390/call-z10-pic.c: Likewise.
	* gcc.target/s390/call-z10.c: Likewise.
	* gcc.target/s390/call-z9-pic-nodatarel.c: Likewise.
	* gcc.target/s390/call-z9-pic.c: Likewise.
	* gcc.target/s390/call-z9.c: Likewise.
---
 gcc/config/s390/s390.cc                            | 16 +++-
 gcc/config/s390/s390.md                            |  8 
 .../gcc.target/s390/call-z10-pic-nodatarel.c       |  6 ++
 gcc/testsuite/gcc.target/s390/call-z10-pic.c       |  6 ++
 gcc/testsuite/gcc.target/s390/call-z10.c           | 14 +-
 .../gcc.target/s390/call-z9-pic-nodatarel.c        |  6 ++
 gcc/testsuite/gcc.target/s390/call-z9-pic.c        |  6 ++
 gcc/testsuite/gcc.target/s390/call-z9.c            | 14 +-
 8 files changed, 25 insertions(+), 51 deletions(-)

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 86a5f059b85..1d96df49fea 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -8585,7 +8585,7 @@ print_operand_address (FILE *file, rtx addr)
     'E': print opcode suffix for branch on index instruction.
     'G': print the size of the operand in bytes.
     'J': print tls_load/tls_gdcall/tls_ldcall suffix
-    'K': print @PLT suffix for call targets and load address values.
+    'K': print @PLT suffix for branch targets; do not use with larl.
     'M': print the second word of a TImode operand.
     'N': print the second word of a DImode operand.
     'O': print only the displacement of a memory reference or address.
@@ -8854,19 +8854,9 @@ print_operand (FILE *file, rtx x, int code)
          call even static functions via PLT.  ld will optimize @PLT away for
          normal code, and keep it for patches.
 
-         Do not indiscriminately add @PLT in 31-bit mode due to the %r12
-         restriction, use UNSPEC_PLT31 instead.
-
          @PLT only makes sense for functions, data is taken care of by
-         -mno-pic-data-is-text-relative.
-
-         Adding @PLT interferes with handling of weak symbols in non-PIC code,
-         since their addresses are loaded with larl, which then always produces
-         a non-NULL result, so skip them here as well.  */
-      if (TARGET_64BIT
-          && GET_CODE (x) == SYMBOL_REF
-          && SYMBOL_REF_FUNCTION_P (x)
-          && !(SYMBOL_REF_WEAK (x) && !flag_pic))
+         -mno-pic-data-is-text-relative.  */
+      if (GET_CODE (x) == SYMBOL_REF && SYMBOL_REF_FUNCTION_P (x))
         fprintf (file, "@PLT");
       return;
     }
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index c164ea72c78..9d495803387 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -2001,7 +2
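The behavioural change in the 'K' operand code can be modelled outside GCC. A minimal sketch in C++: `SymbolInfo`, `k_suffix_before` and `k_suffix_after` are hypothetical stand-ins for the `SYMBOL_REF_FUNCTION_P` / `SYMBOL_REF_WEAK` predicates and the `TARGET_64BIT` / `flag_pic` conditions shown in the diff, and the operand is assumed to already be a SYMBOL_REF:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the SYMBOL_REF flags print_operand consults.
struct SymbolInfo {
  bool is_function;  // models SYMBOL_REF_FUNCTION_P (x)
  bool is_weak;      // models SYMBOL_REF_WEAK (x)
};

// Condition for printing "@PLT" under 'K' before the patch.
std::string k_suffix_before(const SymbolInfo &s, bool target_64bit,
                            bool flag_pic) {
  if (target_64bit && s.is_function && !(s.is_weak && !flag_pic))
    return "@PLT";
  return "";
}

// Condition after the patch: every function symbol gets "@PLT". This is
// only safe because 'K' is no longer used for larl operands.
std::string k_suffix_after(const SymbolInfo &s) {
  return s.is_function ? "@PLT" : "";
}
```

In this model, a weak function in non-PIC code and any function in 31-bit mode change from no suffix to "@PLT"; both exclusions existed only because of larl, which is exactly what the patch stops pairing with 'K'.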
Re: [PATCH v2] c++: Properly detect calls to digest_init in build_vec_init [PR114619]
On 2/3/25 8:29 AM, Simon Martin wrote:
> Hi Jason,
>
> On 16 Jan 2025, at 23:28, Jason Merrill wrote:
>
>> On 10/19/24 5:09 AM, Simon Martin wrote:
>>> We currently ICE in checking mode with cxx_dialect < 17 on the following valid code
>>>
>>> === cut here ===
>>> struct X {
>>>   X(const X&) {}
>>> };
>>> extern X x;
>>> void foo () {
>>>   new X[1]{x};
>>> }
>>> === cut here ===
>>>
>>> The problem is that cp_gimplify_expr gcc_checking_asserts that a TARGET_EXPR is not TARGET_EXPR_ELIDING_P (or cannot be elided), while in this case with cxx_dialect < 17, it is TARGET_EXPR_ELIDING_P but we have not even tried to elide.
>>>
>>> This patch relaxes that gcc_checking_assert to not fail when using cxx_dialect < 17 and -fno-elide-constructors (I considered being more clever at setting TARGET_EXPR_ELIDING_P appropriately but it looks more risky and not worth the extra complexity for a checking assert).
>>
>> The problem is that in that case we end up with two copy constructor calls instead of one: one built in massage_init_elt, and the other in expand_default_init. The result of the first copy is marked TARGET_EXPR_ELIDING_P, so when we try to pass it to the second copy we hit the assert. I think the assert is catching a real bug: even with -fno-elide-constructors we should only copy once, not twice.
>
> That’s right, thanks for pointing me in the right direction.
>
>> This seems to be because 'digested' has the wrong value in build_vec_init; we did just call digest_init in build_new_1, but build_vec_init doesn't understand that.
>
> The test to determine whether digest_init has been called is indeed incorrect, in that it will work if BASE is a reference to the array but not if it’s a pointer to its first element. The attached updated patch fixes this.
>
> Successfully tested on x86_64-pc-linux-gnu. OK for trunk?

OK.

Jason
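The invariant stated in the review (one copy per element, even with -fno-elide-constructors) can be checked at run time. A minimal sketch that reuses the shape of the PR's `struct X`, with an added default constructor and a copy counter; both additions are for illustration only. On a conforming compiler `copies_for_new_array()` is expected to return 1:

```cpp
#include <cassert>
#include <memory>

// Same shape as the PR's struct X, plus a default constructor (so a
// global X can exist) and a counter of copy constructions.
struct X {
  static int copies;
  X() = default;
  X(const X &) { ++copies; }
};
int X::copies = 0;

X x;

// Returns how many copy constructions `new X[1]{x}` performs.
int copies_for_new_array() {
  X::copies = 0;
  std::unique_ptr<X[]> p(new X[1]{x});  // p[0] is copy-initialized from x
  return X::copies;
}
```

Under C++17 rules the single copy is guaranteed; according to the thread, the pre-fix problem showed up with cxx_dialect < 17 and -fno-elide-constructors, where GCC built a second copy that the checking assert caught.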
Re: [PATCH] c++/79786 - bogus invocation of DATA_ABI_ALIGNMENT macro
On 2/3/25 7:38 AM, Jakub Jelinek wrote:
> On Mon, Feb 03, 2025 at 11:33:38AM +0100, Richard Biener wrote:
>> The first argument is supposed to be a type, not a decl.
>>
>> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>>
>> OK?
>>
>> 	PR c++/79786
>> gcc/cp/
>> 	* rtti.cc (emit_tinfo_decl): Fix DATA_ABI_ALIGNMENT invocation.
>
> LGTM.

OK.

>> --- a/gcc/cp/rtti.cc
>> +++ b/gcc/cp/rtti.cc
>> @@ -1741,7 +1741,8 @@ emit_tinfo_decl (tree decl)
>>        /* Avoid targets optionally bumping up the alignment to improve
>>           vector instruction accesses, tinfo are never accessed this way.  */
>>  #ifdef DATA_ABI_ALIGNMENT
>> -      SET_DECL_ALIGN (decl, DATA_ABI_ALIGNMENT (decl, TYPE_ALIGN (TREE_TYPE (decl))));
>> +      SET_DECL_ALIGN (decl, DATA_ABI_ALIGNMENT (TREE_TYPE (decl),
>> +                                                TYPE_ALIGN (TREE_TYPE (decl))));
>>        DECL_USER_ALIGN (decl) = true;
>>  #endif
>>        return true;
>> --
>> 2.43.0
>
> Jakub
RE: [PATCH]middle-end: delay checking for alignment to load [PR118464]
Looks like a last minute change I made accidentally blocked SVE. Fixed and re-sending:

Hi All,

This fixes two PRs on early break vectorization by delaying the safety checks to vectorizable_load when the VF, VMAT and vectype are all known.

This patch does add two new restrictions:

1. On LOAD_LANES targets, where the buffer size is known, we reject uneven group sizes, as they are unaligned every n % 2 iterations and so may cross a page unwittingly.

2. On LOAD_LANES targets when the buffer is unknown, we reject vectorization if we cannot peel for alignment, as the alignment requirement is quite large at GROUP_SIZE * vectype_size. This is unlikely to ever be beneficial so we don't support it for now.

There are other steps documented inside the code itself so that the reasoning is next to the code.

Bootstrapped and regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu -m32, -m64 and no issues.

On arm-none-linux-gnueabihf some tests are failing to vectorize because it looks like LOAD_LANES is often misaligned. I need to debug those a bit more to see if it's the patch or backend.

For now I think the patch itself is fine.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR tree-optimization/118464
	PR tree-optimization/116855
	* doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
	* tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay checks.
	(vect_compute_data_ref_alignment): Remove alignment checks and move to
	vectorizable_load.
	(vect_enhance_data_refs_alignment): Add note to comment needing
	investigating.
	(vect_analyze_data_refs_alignment): Likewise.
	(vect_supportable_dr_alignment): For group loads look at first DR.
	* tree-vect-stmts.cc (get_load_store_type, vectorizable_load):
	Perform safety checks for early break pfa.
	* tree-vectorizer.h (dr_peeling_alignment): New.

gcc/testsuite/ChangeLog:

	PR tree-optimization/118464
	PR tree-optimization/116855
	* gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
	load type is relaxed later.
	* gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
	* gcc.dg/vect/vect-early-break_22.c: Reject for load_lanes targets
	* g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
	* gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
	* gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
	* gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
	* gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
	* gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
	* gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
	* gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
	* gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
	* gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
	* gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
	* gcc.dg/vect/vect-early-break_133_pfa9.c: New test.

-- inline copy of patch --

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e54a287dbdf504f540bc499e024d077746a8..85f9c49eff437221f2cea77c114064a6a603b732 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17246,7 +17246,7 @@ Maximum number of relations the oracle will register in a basic block.
 Work bound when discovering transitive relations from existing relations.
 
 @item min-pagesize
-Minimum page size for warning purposes.
+Minimum page size for warning and early break vectorization purposes.
 
 @item openacc-kernels
 Specify mode of OpenACC `kernels' constructs handling.
diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
new file mode 100644
index ..5e50e56ad17515e278c05c92263af120c3ab2c21
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+
+#include 
+
+struct ts1 {
+  int spans[6][2];
+};
+struct gg {
+  int t[6];
+};
+ts1 f(size_t t, struct ts1 *s1, struct gg *s2) {
+  ts1 ret;
+  for (size_t i = 0; i != t; i++) {
+    if (!(i < t)) __builtin_abort();
+    ret.spans[i][0] = s1->spans[i][0] + s2->t[i];
+    ret.spans[i][1] = s1->spans[i][1] + s2->t[i];
+  }
+  return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
index 9ef1330b47c817e16baaafa44c2b15108b9dd3a9..4c8255895b976653228233d93c950629f3231554 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
@@ -55,7 +55,9 @@ int main()
     }
   }
   rephase ();
+#pragma GCC novector
   for (i = 0; i < 32; ++i)
+#pragma GCC novector
     for
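The page-crossing concern behind both new restrictions can be illustrated numerically. A minimal sketch, not vectorizer code: `crosses_page` and `count_crossings` are hypothetical helpers, and the 4096-byte page and 16-byte vectors are assumed values. Starting from an aligned address, a 32-byte group access (2 x 16 bytes) never straddles a page, whereas a 48-byte one (an uneven group of 3) eventually does:

```cpp
#include <cassert>
#include <cstdint>

// True if the byte range [addr, addr + size) touches more than one page.
bool crosses_page(std::uint64_t addr, std::uint64_t size, std::uint64_t page) {
  assert(size > 0 && page > 0);
  return addr / page != (addr + size - 1) / page;
}

// Counts how many of `iters` back-to-back accesses of `size` bytes,
// starting at byte offset `start`, cross a page boundary.
int count_crossings(std::uint64_t start, std::uint64_t size,
                    std::uint64_t page, int iters) {
  int n = 0;
  for (int i = 0; i < iters; ++i)
    if (crosses_page(start + static_cast<std::uint64_t>(i) * size, size, page))
      ++n;
  return n;
}
```

With a 48-byte stride the 86th access spans bytes 4080 to 4127 and so touches two pages; when the size is a power of two that divides the page size and the start is aligned to it, no access can cross, which is why peeling for an alignment of GROUP_SIZE * vectype_size is what makes the speculative load safe.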
Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale
On 2/3/25 7:14 AM, Jeff Law wrote:
> On 2/3/25 2:31 AM, H.J. Lu wrote:
>> I believe the original patch should be reverted. Then my patch isn't needed.
>
> That patch had significant improvements across the board for RISC-V.
> I wouldn't want to see it reverted without a strong explanation of why it was wrong.

In my opinion, the patch is not wrong, but rather has exposed latent issues that need to be worked on and fixed. I've asked Surya to continue working on the fallout (see her other patches), but help from others is always appreciated.

Peter
Re: [patch, fortran] Add modular exponentiation for unsigned
On 2/3/25 11:55 AM, Thomas Koenig wrote:
> Hello world,
>
> with the following patch to the failing test case
>
> diff --git a/gcc/testsuite/gfortran.dg/unsigned_15.f90 b/gcc/testsuite/gfortran.dg/unsigned_15.f90
> index da4ccd2dc17..80a7a54e380 100644
> --- a/gcc/testsuite/gfortran.dg/unsigned_15.f90
> +++ b/gcc/testsuite/gfortran.dg/unsigned_15.f90
> @@ -6,8 +6,8 @@ program main
>    unsigned :: u
>    print *,1 + 2u ! { dg-error "Operands of binary numeric operator" }
>    print *,2u + 1 ! { dg-error "Operands of binary numeric operator" }
> -  print *,2u ** 1 ! { dg-error "Exponentiation not valid" }
> -  print *,2u ** 1u ! { dg-error "Exponentiation not valid" }
> +  print *,2u ** 1 ! { dg-error "Operands of binary numeric operator" }
> +  print *,2u ** 1u
>    print *,1u < 2 ! { dg-error "Inconsistent types" }
>    print *,int(1u) < 2
>  end program main
>
> the patch posted to https://gcc.gnu.org/pipermail/fortran/2025-February/061670.html and https://gcc.gnu.org/pipermail/gcc-patches/2025-February/674931.html passes (I don't want to re-submit the whole thing).
>
> OK for trunk?
>
> Best regards
>
> Thomas

Yes, please proceed.

Jerry
Re: [PATCH v3] c++, coroutines: Fix awaiter var creation [PR116506].
On 12/9/24 7:53 PM, Jason Merrill wrote: On 12/9/24 1:52 PM, Iain Sandoe wrote: On 9 Dec 2024, at 17:41, Jason Merrill wrote: On 10/31/24 4:40 AM, Iain Sandoe wrote: This version tested on x86_64-darwin,linux, powerpc64-linux, on folly and by Sam on wider codebases, Why don't you need a variable to preserve o across suspensions if it's a call returning lvalue reference? We always need a space for the awaiter, unless it is already a variable/parameter (or part of one). I suspect that the simple case is not lvalue_p, but ! TREE_SIDE_EFFECTS. That is likely where I’m going wrong - we must not generate a variable for any case that already has one (or a parm), but we must for any case that is a temporary. So, I should adjust the logic to use !TREE_SIDE_EFFECTS. Or perhaps DECL_P. The difference would be for compound lvalues like *p or a[n]; if the value of p or a or n could change across suspension, the same side-effect-free lvalue expression could refer to a different object. Right, part of the code that was elided catered for the compound values by making a reference to the original entity and placing that in the frame. We restore that behaviour here. Note that there is no point in making a reference to an xvalue (we'd only have to save the expiring value in the frame anyway), so we just go ahead and build that var directly. Hmm, is there a defect report about this? I don’t believe there’s any defect here. My reading of https://eel.is/c++draft/expr#await-3 is that our current behavior in these testcases conforms to the WP: we evaluate o, do temporary materialization, then treat the result as an lvalue. Nothing that I can see specifies making a copy of an xvalue; it reads to me more like initializing a && variable, i.e. Awaiter&& e = p.await_transform(Awaiter{}); // dangling reference I'm only finding 2472, which doesn't cover this case. This comment specifically relates to the final sentence of https:// eel.is/c++draft/expr#await-3.3. 
IFF, as per that, we materialise a temporary for the awaiter, we know (in advance) that its lifetime must persist across the suspension, therefore it will be “promoted” to a frame entry. We could make a reference to it (but that would become a frame entry reference to another frame entry which is a waste). Therefore, we make it into a frame candidate right away. Perhaps I’m still missing something...

3.3 materialization applies if o is a prvalue, but it's an xvalue, so it doesn't apply. Temporary materialization for Awaiter{} happened earlier, for passing it to await_transform. And I'd think flatten_await_stmt should handle preserving the temporary.

But reading more closely I see that you aren't actually making a copy of the object in this case, because of what you do with INDIRECT_REF_P; here o_type is Awaiter&&, so you do create a frame variable like my declaration above. But I think messing with INDIRECT_REF_P is unnecessary; we should deal with glvalues the same regardless of whether they are directly REFERENCE_REF_P. We certainly want to exclude the case in this testcase from the "use the existing entity" handling, but the lvalue_p handling in the "we need a var" case should also be fine for xvalues.

It seemed like this was stalled, so I went ahead and made the changes myself. Applying this:

From 4c743798b1d4530b327dad7c606c610f3811fdbf Mon Sep 17 00:00:00 2001
From: Iain Sandoe
Date: Thu, 31 Oct 2024 08:40:08 +
Subject: [PATCH] c++/coroutines: Fix awaiter var creation [PR116506]
To: gcc-patches@gcc.gnu.org

Awaiters always need to have a coroutine state frame copy since they persist across potential suspensions. It simplifies the later analysis considerably to assign these early, which we do when building co_await expressions.

The cleanups in r15-3146-g47dbd69b1 unfortunately elided some of the processing used to cater for cases where the var is created from an xvalue, or is a pointer/reference type. Corrected thus.
	PR c++/116506
	PR c++/116880

gcc/cp/ChangeLog:

	* coroutines.cc (build_co_await): Ensure that xvalues are
	materialised.  Handle references/pointer values in awaiter
	access expressions.
	(is_stable_lvalue): New.
	* decl.cc (cxx_maybe_build_cleanup): Handle null arg.

gcc/testsuite/ChangeLog:

	* g++.dg/coroutines/pr116506.C: New test.
	* g++.dg/coroutines/pr116880.C: New test.

Signed-off-by: Iain Sandoe
Co-authored-by: Jason Merrill
---
 gcc/cp/coroutines.cc                       | 59 ++
 gcc/cp/decl.cc                             |  2 +-
 gcc/testsuite/g++.dg/coroutines/pr116506.C | 53 +++
 gcc/testsuite/g++.dg/coroutines/pr116880.C | 36 +
 4 files changed, 139 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr116506.C
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr116880.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 1dee3d25b9b..d3c7ff3bd72 100644
--- a/gcc/cp/corou
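The discussion above turns on two language points: the value category of the awaiter expression, and why a side-effect-free compound lvalue such as a[n] cannot simply be re-evaluated after a suspension. A small standalone sketch of both, independent of GCC's coroutine lowering; `Awaiter`, the `make_*` functions, `arr` and `n` are illustrative names only:

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

struct Awaiter {};
// Declarations only: used solely in unevaluated decltype operands.
Awaiter make_prvalue();
Awaiter &&make_xvalue();  // like an await_transform returning Awaiter&&
Awaiter &make_lvalue();

// decltype((e)) is T for a prvalue, T&& for an xvalue, T& for an lvalue.
static_assert(!std::is_reference_v<decltype((make_prvalue()))>);
static_assert(std::is_rvalue_reference_v<decltype((make_xvalue()))>);
static_assert(std::is_lvalue_reference_v<decltype((make_lvalue()))>);

int arr[2] = {10, 20};
int n = 0;

// arr[n] names a different object once n changes (for instance while a
// coroutine is suspended); a reference formed up front keeps designating
// the element that was originally named.
std::pair<int, int> saved_vs_reevaluated() {
  n = 0;
  int &saved = arr[n];     // binds to arr[0]
  n = 1;                   // simulated change across a suspension
  return {saved, arr[n]};  // {arr[0], arr[1]}
}
```

The first half shows why [expr.await]/3.3 temporary materialization does not apply to an `Awaiter&&` result (it is an xvalue, not a prvalue); the second shows why the patch stores a reference to the original entity for compound lvalues rather than re-evaluating the expression.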
[pushed] c++: coroutines and range for [PR118491]
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

The implementation of extended range-for temporaries in r15-3840 confused coroutines, because await_statement_walker and the like get confused by the EXPR_STMT into thinking that the whole for-loop is a single expression statement and try to process it accordingly. Fixing this seems to be a simple matter of dropping the EXPR_STMT.

	PR c++/116914
	PR c++/117231
	PR c++/118470
	PR c++/118491

gcc/cp/ChangeLog:

	* semantics.cc (finish_for_stmt): Don't wrap the result of
	pop_stmt_list in EXPR_STMT.

gcc/testsuite/ChangeLog:

	* g++.dg/coroutines/coro-range-for1.C: New test.
---
 gcc/cp/semantics.cc                           |  1 -
 .../g++.dg/coroutines/coro-range-for1.C       | 38 +++
 2 files changed, 38 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/coro-range-for1.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index ad9864c3a91..73b49174de4 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -1709,7 +1709,6 @@ finish_for_stmt (tree for_stmt)
     {
       tree stmt = pop_stmt_list (FOR_INIT_STMT (for_stmt));
       FOR_INIT_STMT (for_stmt) = NULL_TREE;
-      stmt = build_stmt (EXPR_LOCATION (for_stmt), EXPR_STMT, stmt);
       stmt = maybe_cleanup_point_expr_void (stmt);
       add_stmt (stmt);
     }
diff --git a/gcc/testsuite/g++.dg/coroutines/coro-range-for1.C b/gcc/testsuite/g++.dg/coroutines/coro-range-for1.C
new file mode 100644
index 000..eaf4d19e62c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/coro-range-for1.C
@@ -0,0 +1,38 @@
+// PR c++/118491
+// { dg-do compile { target c++20 } }
+
+#include 
+
+struct task {
+  struct promise_type {
+    task get_return_object() { return {}; }
+    std::suspend_always initial_suspend() { return {}; }
+    std::suspend_always final_suspend() noexcept { return {}; }
+    std::suspend_always yield_value(double value) { return {}; }
+    void unhandled_exception() { throw; }
+  };
+};
+
+task do_task() {
+  const int arr[]{1, 2, 3};
+
+  // No ICE if classic loop and not range-based one.
+  // for (auto i = 0; i < 10; ++i) {
+
+  // No ICE if these are moved out of the loop.
+  // auto x = std::suspend_always{};
+  // co_await x;
+
+  for (auto _ : arr) {
+    auto bar = std::suspend_always{};
+    co_await bar;
+
+    // Alternatively:
+    // auto bar = 42.;
+    // co_yield bar;
+
+    // No ICE if r-values:
+    // co_await std::suspend_always{};
+    // co_yield 42.;
+  }
+}

base-commit: f3a41e6cb5d70f0c94cc8273a118b8542fb5c2fa
-- 
2.48.0
[PATCH] c++: ICE on invalid 'tor with =default [PR118304]
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --

In this PR we crash in maybe_delete_defaulted_fn because the switch doesn't expect a cfk_constructor/_destructor. But we can get there:

  struct A {
    *A() = default;
  };

is invalid due to the void/void* mismatch, so we get to m_d_d_fn:

  if (!same_type_p (TREE_TYPE (TREE_TYPE (fn)),
                    TREE_TYPE (TREE_TYPE (implicit_fn))))
    maybe_delete_defaulted_fn (fn, implicit_fn);

Currently, we give no error (subject to c++/118306), but even if we did, we should probably return early in maybe_delete_defaulted_fn.

	PR c++/118304

gcc/cp/ChangeLog:

	* method.cc (maybe_delete_defaulted_fn): Return early for cdtors.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/defaulted70.C: New test.
---
 gcc/cp/method.cc                         | 10 +-
 gcc/testsuite/g++.dg/cpp0x/defaulted70.C |  9 +
 2 files changed, 18 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/defaulted70.C

diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc
index 3914bbb1ef2..99e247125c3 100644
--- a/gcc/cp/method.cc
+++ b/gcc/cp/method.cc
@@ -3531,10 +3531,18 @@ maybe_delete_defaulted_fn (tree fn, tree implicit_fn)
   if (DECL_ARTIFICIAL (fn) || !DECL_DEFAULTED_IN_CLASS_P (fn))
     return;
 
+  const special_function_kind kind = special_function_p (fn);
+  if (kind == sfk_constructor || kind == sfk_destructor)
+    {
+      /* FIXME: This is ill-formed, and we should have given an error.
+         But this is only going to be fixed in GCC 16 via c++/118306.  */
+      gcc_assert (true || seen_error ());
+      return;
+    }
+
   DECL_DELETED_FN (fn) = true;
 
   auto_diagnostic_group d;
-  const special_function_kind kind = special_function_p (fn);
   tree parmtype
     = TREE_VALUE (DECL_XOBJ_MEMBER_FUNCTION_P (fn)
                   ? TREE_CHAIN (TYPE_ARG_TYPES (TREE_TYPE (fn)))
diff --git a/gcc/testsuite/g++.dg/cpp0x/defaulted70.C b/gcc/testsuite/g++.dg/cpp0x/defaulted70.C
new file mode 100644
index 000..e269d9bc6a5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/defaulted70.C
@@ -0,0 +1,9 @@
+// PR c++/118304
+// { dg-do compile { target c++11 } }
+
+struct A {
+  *A() = default; // { dg-error "invalid" "PR118306" { xfail *-*-* } }
+  *~A() = default; // { dg-error "invalid" "PR118306" { xfail *-*-* } }
+};
+
+A a;

base-commit: 214224c4973bfb76f73a7efff29c5823eef31194
-- 
2.48.1
Re: [PING, PATCH] fortran: fix -MT/-MQ adding additional target [PR47485]
Hi all,

Gentle ping for the patch below: https://gcc.gnu.org/pipermail/fortran/2024-December/061467.html

Best wishes,
Vincent

On 30/12/2024 00:19, Vincent Vanlaer wrote:

The -MT and -MQ options should replace the default target in the generated dependency file. deps_add_target needs to be called before cpp_read_main_file, otherwise the original object name is added.

gcc/fortran/
	PR fortran/47485
	* cpp.cc: Fix -MT/-MQ adding an additional target instead of
	replacing the default.

gcc/testsuite/
	PR fortran/47485
	* gfortran.dg/dependency_generation_1.f90: New test.

Signed-off-by: Vincent Vanlaer
---
 gcc/fortran/cpp.cc                                 | 18 --
 .../gfortran.dg/dependency_generation_1.f90        | 15 +++
 2 files changed, 27 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/dependency_generation_1.f90

diff --git a/gcc/fortran/cpp.cc b/gcc/fortran/cpp.cc
index 7c5f00cfd69..3b93d17b90f 100644
--- a/gcc/fortran/cpp.cc
+++ b/gcc/fortran/cpp.cc
@@ -96,6 +96,8 @@ struct gfc_cpp_option_data
   int deps_skip_system;                 /* -MM */
   const char *deps_filename;            /* -M[M]D */
   const char *deps_filename_user;       /* -MF */
+  const char *deps_target_filename;     /* -MT / -MQ */
+  bool quote_deps_target_filename;      /* -MQ */
   int deps_missing_are_generated;       /* -MG */
   int deps_phony;                       /* -MP */
   int warn_date_time;                   /* -Wdate-time */
@@ -287,6 +289,8 @@ gfc_cpp_init_options (unsigned int decoded_options_count,
   gfc_cpp_option.deps_missing_are_generated = 0;
   gfc_cpp_option.deps_filename = NULL;
   gfc_cpp_option.deps_filename_user = NULL;
+  gfc_cpp_option.deps_target_filename = NULL;
+  gfc_cpp_option.quote_deps_target_filename = false;
 
   gfc_cpp_option.multilib = NULL;
   gfc_cpp_option.prefix = NULL;
@@ -439,9 +443,8 @@ gfc_cpp_handle_option (size_t scode, const char *arg, int value ATTRIBUTE_UNUSED
 
     case OPT_MQ:
     case OPT_MT:
-      gfc_cpp_option.deferred_opt[gfc_cpp_option.deferred_opt_count].code = code;
-      gfc_cpp_option.deferred_opt[gfc_cpp_option.deferred_opt_count].arg = arg;
-      gfc_cpp_option.deferred_opt_count++;
+      gfc_cpp_option.quote_deps_target_filename = (code == OPT_MQ);
+      gfc_cpp_option.deps_target_filename = arg;
       break;
 
     case OPT_P:
@@ -593,6 +596,12 @@ gfc_cpp_init_0 (void)
     }
 
   gcc_assert(cpp_in);
+
+  if (gfc_cpp_option.deps_target_filename)
+    if (mkdeps *deps = cpp_get_deps (cpp_in))
+      deps_add_target (deps, gfc_cpp_option.deps_target_filename,
+                       gfc_cpp_option.quote_deps_target_filename);
+
   if (!cpp_read_main_file (cpp_in, gfc_source_file))
     errorcount++;
 }
@@ -635,9 +644,6 @@ gfc_cpp_init (void)
           else
             cpp_assert (cpp_in, opt->arg);
         }
-      else if (opt->code == OPT_MT || opt->code == OPT_MQ)
-        if (mkdeps *deps = cpp_get_deps (cpp_in))
-          deps_add_target (deps, opt->arg, opt->code == OPT_MQ);
     }
 
   /* Pre-defined macros for non-required INTEGER kind types.  */
diff --git a/gcc/testsuite/gfortran.dg/dependency_generation_1.f90 b/gcc/testsuite/gfortran.dg/dependency_generation_1.f90
new file mode 100644
index 000..d42a257f83a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/dependency_generation_1.f90
@@ -0,0 +1,15 @@
+! This test case ensures that the -MT flag is correctly replacing the object name in the dependency file.
+! See PR 47485
+!
+! Contributed by Vincent Vanlaer
+!
+! { dg-do preprocess }
+! { dg-additional-options "-cpp" }
+! { dg-additional-options "-M" }
+! { dg-additional-options "-MF deps" }
+! { dg-additional-options "-MT obj.o" }
+
+module test
+end module
+
+! { dg-final { scan-file "deps" "obj.o:.*" } }
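The intended effect of the patch is that an explicit -MT/-MQ target replaces the default object name rather than being added next to it. A hypothetical model of that rule in C++, not libcpp's mkdeps code: `TargetOpt`, `quote_for_make` and `rule_head` are illustrative names, and, mirroring the patched option handling, only the last -MT/-MQ argument is kept. The quoting shown only doubles `$` (the GCC manual's example turns `$(objpfx)foo.o` into `$$(objpfx)foo.o`); real -MQ quoting also escapes other characters special to make:

```cpp
#include <cassert>
#include <optional>
#include <string>

// One explicit dependency target: -MT (quote == false) or -MQ (quote == true).
struct TargetOpt {
  std::string name;
  bool quote;
};

// Simplified -MQ quoting: only '$' is doubled in this sketch.
std::string quote_for_make(const std::string &target) {
  std::string out;
  for (char c : target) {
    if (c == '$')
      out += '$';
    out += c;
  }
  return out;
}

// Head of the dependency rule: an explicit target, when given, replaces
// the default object name instead of being emitted alongside it.
std::string rule_head(const std::string &default_target,
                      const std::optional<TargetOpt> &explicit_target) {
  if (!explicit_target)
    return default_target + ":";
  const std::string &name = explicit_target->name;
  return (explicit_target->quote ? quote_for_make(name) : name) + ":";
}
```

For the new test, `rule_head("dependency_generation_1.o", TargetOpt{"obj.o", false})` yields `obj.o:`, which is what its `scan-file` pattern looks for.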
Re: [PATCH v2] c++: Add tree walk case to reach A pack from B in ...B> [PR118265]
On 2/2/25 5:26 PM, A J Ryan Solutions Ltd wrote:
> This version has all the updates as per feedback from version 1. It makes a minor correction to the code styling, reformats the commit message and moves the test into the cpp1z directory.
>
> In addition, I've updated the test to conform with C++17 for better coverage. Andrew Pinski had posted one on the ticket, but it would require C++20; I can switch to it if there was another reason to use it that I've overlooked.

Pushed, thanks!

Jason
Re: [PATCH v2] c++: auto in trailing-return-type in parameter [PR117778]
On 1/31/25 4:23 PM, Marek Polacek wrote: On Fri, Jan 31, 2025 at 09:34:52AM -0500, Jason Merrill wrote: On 1/30/25 5:24 PM, Marek Polacek wrote: Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14? -- >8 -- This PR describes a few issues, both ICE and rejects-valid, but ultimately the problem is that we don't properly synthesize the second auto in: int g (auto fp() -> auto) { return fp (); } since r12-5860, which disabled auto_is_implicit_function_template_parm_p in cp_parser_parameter_declaration after parsing the decl-specifier-seq. If there is no trailing auto, there is no problem. So we have to make sure auto_is_implicit_function_template_parm_p is properly set when parsing the trailing auto. A complication is that one can write: auto f (auto fp(auto fp2() -> auto) -> auto) -> auto; ~~~ where only the underlined auto should be synthesized. So when we parse a parameter-declaration-clause inside another parameter-declaration-clause, we should not enable the flag. We have no flags to keep track of such nesting, but I think I can walk current_binding_level to see if we find ourselves in such an unlikely scenario. PR c++/117778 gcc/cp/ChangeLog: * parser.cc (cp_parser_late_return_type_opt): Maybe override auto_is_implicit_function_template_parm_p. (cp_parser_parameter_declaration): Update commentary. gcc/testsuite/ChangeLog: * g++.dg/cpp1y/lambda-generic-117778.C: New test. * g++.dg/cpp2a/abbrev-fn2.C: New test. * g++.dg/cpp2a/abbrev-fn3.C: New test. 
---
 gcc/cp/parser.cc                              | 24 -
 .../g++.dg/cpp1y/lambda-generic-117778.C      | 12 +
 gcc/testsuite/g++.dg/cpp2a/abbrev-fn2.C       | 49 +++
 gcc/testsuite/g++.dg/cpp2a/abbrev-fn3.C       |  7 +++
 4 files changed, 90 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/lambda-generic-117778.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/abbrev-fn2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/abbrev-fn3.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 44515bb9074..89c5c2721a7 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -25514,6 +25514,25 @@ cp_parser_late_return_type_opt (cp_parser *parser, cp_declarator *declarator,
       /* Consume the ->.  */
       cp_lexer_consume_token (parser->lexer);
+      /* We may be in the context of parsing a parameter declaration,
+         namely, its declarator.  auto_is_implicit_function_template_parm_p
+         will be disabled in that case.  But for code like
+
+           int g (auto fp() -> auto);
+
+         we have to re-enable the flag for the trailing auto.  However, that
+         only applies for the outermost trailing auto in a parameter clause; in
+
+           int f2 (auto fp(auto fp2() -> auto) -> auto);
+
+         the inner -> auto should not be synthesized.  */
+      int i = 0;
+      for (cp_binding_level *b = current_binding_level;
+           b->kind == sk_function_parms; b = b->level_chain)
+        ++i;
+      auto cleanup = make_temp_override
+        (parser->auto_is_implicit_function_template_parm_p, i == 2);

This looks like it will wrongly allow declaring an implicit template within a function; you need a testcase with local extern declarations.

Ah right, I didn't check that so it was broken. We should check !current_function_decl.

Incidentally, it seems odd that the override in cp_parser_parameter_declaration is before an error early exit a few lines below; moving it after that would avoid needing to clean it up on that path.

Good point, adjusted.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

OK.
-- >8 -- This PR describes a few issues, both ICE and rejects-valid, but ultimately the problem is that we don't properly synthesize the second auto in: int g (auto fp() -> auto) { return fp (); } since r12-5860, which disabled auto_is_implicit_function_template_parm_p in cp_parser_parameter_declaration after parsing the decl-specifier-seq. If there is no trailing auto, there is no problem. So we have to make sure auto_is_implicit_function_template_parm_p is properly set when parsing the trailing auto. A complication is that one can write: auto f (auto fp(auto fp2() -> auto) -> auto) -> auto; ~~~ where only the underlined auto should be synthesized. So when we parse a parameter-declaration-clause inside another parameter-declaration-clause, we should not enable the flag. We have no flags to keep track of such nesting, but I think I can walk current_binding_level to see if we find ourselves in such an unlikely scenario. PR c++/117778 gcc/cp/ChangeLog: * parser.cc (cp_parser_late_return_type_opt): Maybe override auto_is_implicit_function_template_parm_p. (cp_parser_parameter_declaration): Move a make_temp
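As described above, `int g (auto fp() -> auto)` is meant to be an abbreviated function template in which the trailing `auto` invents a template parameter, and the function-typed parameter is adjusted to a pointer to function. A sketch of the equivalent explicit-template spelling, which compiles without depending on the parser fix; this reading of the rule is an assumption drawn from the patch description, and `seven` and `forty_two` are hypothetical helpers used as arguments:

```cpp
#include <cassert>

// Explicit-template spelling of `int g (auto fp() -> auto)`: the trailing
// `auto` becomes the invented template parameter R, and `R fp()` is
// adjusted to the pointer type `R (*fp)()`.
template <typename R>
int g(R fp()) {
  return fp();
}

int seven() { return 7; }
long forty_two() { return 42L; }
```

Passing `seven` deduces `R = int` after function-to-pointer conversion, and `forty_two` deduces `R = long`, whose result is then converted to `int` on return.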
Re: [PATCH] c++: Modularise start_cleanup_fn [PR98893]
On 2/1/25 5:29 AM, Nathaniel Shead wrote: Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? OK. -- >8 -- 'start_cleanup_fn' is not currently viable in modules, due to generating functions relying on the 'start_cleanup_cnt' counter which is reset to 0 with each new TU. This means that cleanup functions declared in a TU will conflict with any imported cleanup functions. This patch mitigates the problem by using the mangled name of the decl we're destroying as part of the name of the function. This should avoid clashes unless the decls would have clashed anyway. PR c++/98893 gcc/cp/ChangeLog: * decl.cc (start_cleanup_fn): Make name from the mangled name of the passed-in decl. (register_dtor_fn): Pass decl to start_cleanup_fn. gcc/testsuite/ChangeLog: * g++.dg/modules/pr98893_a.H: New test. * g++.dg/modules/pr98893_b.C: New test. Signed-off-by: Nathaniel Shead --- gcc/cp/decl.cc | 18 +- gcc/testsuite/g++.dg/modules/pr98893_a.H | 9 + gcc/testsuite/g++.dg/modules/pr98893_b.C | 10 ++ 3 files changed, 28 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/g++.dg/modules/pr98893_a.H create mode 100644 gcc/testsuite/g++.dg/modules/pr98893_b.C diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc index cf5e055e146..7219543823b 100644 --- a/gcc/cp/decl.cc +++ b/gcc/cp/decl.cc @@ -96,7 +96,7 @@ static void record_key_method_defined (tree); static tree create_array_type_for_decl (tree, tree, tree, location_t); static tree get_atexit_node (void); static tree get_dso_handle_node (void); -static tree start_cleanup_fn (bool); +static tree start_cleanup_fn (tree, bool); static void end_cleanup_fn (void); static tree cp_make_fname_decl (location_t, tree, int); static void initialize_predefined_identifiers (void); @@ -10373,23 +10373,23 @@ get_dso_handle_node (void) } /* Begin a new function with internal linkage whose job will be simply - to destroy some particular variable. OB_PARM is true if object pointer + to destroy some particular DECL. 
OB_PARM is true if object pointer is passed to the cleanup function, otherwise no argument is passed. */ -static GTY(()) int start_cleanup_cnt; - static tree -start_cleanup_fn (bool ob_parm) +start_cleanup_fn (tree decl, bool ob_parm) { - char name[32]; - push_to_top_level (); /* No need to mangle this. */ push_lang_context (lang_name_c); /* Build the name of the function. */ - sprintf (name, "__tcf_%d", start_cleanup_cnt++); + gcc_checking_assert (HAS_DECL_ASSEMBLER_NAME_P (decl)); + const char *dname = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)); + dname = targetm.strip_name_encoding (dname); + char *name = ACONCAT (("__tcf", dname, NULL)); + tree fntype = TREE_TYPE (ob_parm ? get_cxa_atexit_fn_ptr_type () : get_atexit_fn_ptr_type ()); /* Build the function declaration. */ @@ -10482,7 +10482,7 @@ register_dtor_fn (tree decl) build_cleanup (decl); /* Now start the function. */ - cleanup = start_cleanup_fn (ob_parm); + cleanup = start_cleanup_fn (decl, ob_parm); /* Now, recompute the cleanup. It may contain SAVE_EXPRs that refer to the original function, rather than the anonymous one. That diff --git a/gcc/testsuite/g++.dg/modules/pr98893_a.H b/gcc/testsuite/g++.dg/modules/pr98893_a.H new file mode 100644 index 000..062ab6d9ccc --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/pr98893_a.H @@ -0,0 +1,9 @@ +// { dg-additional-options "-fmodule-header" } +// { dg-module-cmi {} } + +struct S { + ~S() {} +}; +inline void foo() { + static S a[1]; +} diff --git a/gcc/testsuite/g++.dg/modules/pr98893_b.C b/gcc/testsuite/g++.dg/modules/pr98893_b.C new file mode 100644 index 000..9065589bdfb --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/pr98893_b.C @@ -0,0 +1,10 @@ +// { dg-additional-options "-fmodules" } + +import "pr98893_a.H"; +static S b[1]; +int main() { + foo(); +} + +// { dg-final { scan-assembler {__tcf_ZZ3foovE1a:} } } +// { dg-final { scan-assembler {__tcf_ZL1b:} } }
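The naming change can be illustrated in plain C++. The helpers below are hypothetical and only mirror the string handling. They assume the target's `strip_name_encoding` hook leaves the name unchanged. With that assumption, the resulting names match the `scan-assembler` patterns in `pr98893_b.C`.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper mirroring the patch: name the cleanup function
// after the mangled (assembler) name of the variable it destroys.
// Simplification: assumes strip_name_encoding is the identity.
std::string cleanup_fn_name(const std::string &mangled_decl_name) {
  return "__tcf" + mangled_decl_name;
}

// The old scheme: a counter that restarts at 0 in every translation
// unit, so two TUs can both emit "__tcf_0" and clash once one of them
// imports the other's cleanup functions.
std::string counter_cleanup_fn_name(int &start_cleanup_cnt) {
  return "__tcf_" + std::to_string(start_cleanup_cnt++);
}
```

Because mangled names are already unique per entity, the derived names clash only when the decls themselves would have clashed.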
Re: [PATCH] c++: Improve contracts support in modules [PR108205]
On 2/1/25 7:03 AM, Nathaniel Shead wrote: Regtested on x86_64-pc-linux-gnu (so far just "dg.exp=contract* modules.exp=contract*"), OK for trunk if full bootstrap+regtest passes? -- >8 -- Modules makes some assumptions about types that currently aren't fulfilled by the types created in contracts logic. This patch ensures that exporting inline functions using contracts works again with modules. PR c++/108205 gcc/cp/ChangeLog: * contracts.cc (get_pseudo_contract_violation_type): Give names to generated FIELD_DECLs. (declare_handle_contract_violation): Mark contract_violation type as external linkage. (build_contract_handler_call): Ensure any builtin declarations created here aren't treated as attached to the current module. OK, but now I'm curious why we don't need this sort of thing in rtti.cc? gcc/testsuite/ChangeLog: * g++.dg/modules/contracts-5_a.C: New test. * g++.dg/modules/contracts-5_b.C: New test. Signed-off-by: Nathaniel Shead --- gcc/cp/contracts.cc | 27 +--- gcc/testsuite/g++.dg/modules/contracts-5_a.C | 8 ++ gcc/testsuite/g++.dg/modules/contracts-5_b.C | 20 +++ 3 files changed, 46 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/g++.dg/modules/contracts-5_a.C create mode 100644 gcc/testsuite/g++.dg/modules/contracts-5_b.C diff --git a/gcc/cp/contracts.cc b/gcc/cp/contracts.cc index 5782ec8bf29..f2b126c8d6b 100644 --- a/gcc/cp/contracts.cc +++ b/gcc/cp/contracts.cc @@ -1633,19 +1633,22 @@ get_pseudo_contract_violation_type () signed char _M_continue; If this changes, also update the initializer in build_contract_violation. 
*/ - const tree types[] = { const_string_type_node, -const_string_type_node, -const_string_type_node, -const_string_type_node, -const_string_type_node, -uint_least32_type_node, -signed_char_type_node }; + struct field_info { tree type; const char* name; }; + const field_info info[] = { + { const_string_type_node, "_M_file" }, + { const_string_type_node, "_M_function" }, + { const_string_type_node, "_M_comment" }, + { const_string_type_node, "_M_level" }, + { const_string_type_node, "_M_role" }, + { uint_least32_type_node, "_M_line" }, + { signed_char_type_node, "_M_continue" } + }; tree fields = NULL_TREE; - for (tree type : types) + for (const field_info& i : info) { /* finish_builtin_struct wants fieldss chained in reverse. */ tree next = build_decl (BUILTINS_LOCATION, FIELD_DECL, - NULL_TREE, type); + get_identifier (i.name), i.type); DECL_CHAIN (next) = fields; fields = next; } @@ -1737,6 +1740,7 @@ declare_handle_contract_violation () create_implicit_typedef (viol_name, violation); DECL_SOURCE_LOCATION (TYPE_NAME (violation)) = BUILTINS_LOCATION; DECL_CONTEXT (TYPE_NAME (violation)) = current_namespace; + TREE_PUBLIC (TYPE_NAME (violation)) = true; pushdecl_namespace_level (TYPE_NAME (violation), /*hidden*/true); pop_namespace (); pop_nested_namespace (std_node); @@ -1761,6 +1765,11 @@ static void build_contract_handler_call (tree contract, contract_continuation cmode) { + /* We may need to declare new types, ensure they are not considered + attached to a named module. 
*/ + auto module_kind_override = make_temp_override +(module_kind, module_kind & ~(MK_PURVIEW | MK_ATTACH | MK_EXPORTING)); + tree violation = build_contract_violation (contract, cmode); tree violation_fn = declare_handle_contract_violation (); tree call = build_call_n (violation_fn, 1, build_address (violation)); diff --git a/gcc/testsuite/g++.dg/modules/contracts-5_a.C b/gcc/testsuite/g++.dg/modules/contracts-5_a.C new file mode 100644 index 000..2ff6701ff3f --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/contracts-5_a.C @@ -0,0 +1,8 @@ +// PR c++/108205 +// Test that the implicitly declared handle_contract_violation function is +// properly matched with a later declaration in an importing TU. +// { dg-additional-options "-fmodules -fcontracts -fcontract-continuation-mode=on" } +// { dg-module-cmi test } + +export module test; +export inline void foo(int x) noexcept [[ pre: x != 0 ]] {} diff --git a/gcc/testsuite/g++.dg/modules/contracts-5_b.C b/gcc/testsuite/g++.dg/modules/contracts-5_b.C new file mode 100644 index 000..0e794b8ae45 --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/contracts-5_b.C @@ -0,0 +1,20 @@ +// PR c++/108205 +// { dg-module-do run } +// { dg-additional-options "-fmodules -fcontracts -fcontract-con
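The `build_contract_handler_call` change combines two ideas: a scoped temporary override and clearing bits from a flag mask. Below is a minimal sketch of both. `temp_override` is a simplified stand-in for GCC's `make_temp_override` helper, not its actual implementation, and the `MK_*` values and the global `module_kind` here are made up for illustration.

```cpp
#include <cassert>
#include <utility>

// Simplified stand-in for a temporary-override helper: set a variable
// for the lifetime of a scope and restore the old value on exit.
template <typename T>
class temp_override {
public:
  temp_override(T &var, T new_value) : var_(var), saved_(var) {
    var_ = std::move(new_value);
  }
  ~temp_override() { var_ = saved_; }
  temp_override(const temp_override &) = delete;
  temp_override &operator=(const temp_override &) = delete;

private:
  T &var_;
  T saved_;
};

// Hypothetical flag values; GCC's real MK_* constants differ.
enum : unsigned {
  MK_PURVIEW = 1u << 0,
  MK_ATTACH = 1u << 1,
  MK_EXPORTING = 1u << 2,
  MK_OTHER = 1u << 3,
};

unsigned module_kind = MK_PURVIEW | MK_ATTACH | MK_EXPORTING | MK_OTHER;

// Returns the module_kind visible while the builtin types would be
// created: the purview/attach/export bits are cleared, others kept.
unsigned kind_while_building() {
  temp_override<unsigned> o(module_kind,
                            module_kind & ~(MK_PURVIEW | MK_ATTACH | MK_EXPORTING));
  return module_kind;
}
```

Restoring in the destructor means every exit path from the function puts `module_kind` back, which is the reason for using an RAII helper rather than a manual save and restore.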
Re: [PATCH] c++: bogus -Wvexing-parse with trailing-return-type [PR118718]
On 1/31/25 4:21 PM, Marek Polacek wrote: Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? OK. -- >8 -- This warning should not warn for auto f1 () -> auto; because that cannot be confused with initializing a variable. PR c++/118718 gcc/cp/ChangeLog: * parser.cc (warn_about_ambiguous_parse): Don't warn when a trailing return type is present. gcc/testsuite/ChangeLog: * g++.dg/warn/Wvexing-parse10.C: New test. --- gcc/cp/parser.cc| 4 gcc/testsuite/g++.dg/warn/Wvexing-parse10.C | 9 + 2 files changed, 13 insertions(+) create mode 100644 gcc/testsuite/g++.dg/warn/Wvexing-parse10.C diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc index 44515bb9074..1da881e295b 100644 --- a/gcc/cp/parser.cc +++ b/gcc/cp/parser.cc @@ -23617,6 +23617,10 @@ warn_about_ambiguous_parse (const cp_decl_specifier_seq *decl_specifiers, (const_cast(declarator return; + /* Don't warn for auto f () -> auto. */ + if (declarator->u.function.late_return_type) +return; + /* Don't warn when the whole declarator (not just the declarator-id!) was parenthesized. That is, don't warn for int(n()) but do warn for int(f)(). */ diff --git a/gcc/testsuite/g++.dg/warn/Wvexing-parse10.C b/gcc/testsuite/g++.dg/warn/Wvexing-parse10.C new file mode 100644 index 000..3fbe88b7d00 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Wvexing-parse10.C @@ -0,0 +1,9 @@ +// PR c++/118718 +// { dg-do compile { target c++14 } } + +void +fn () +{ + auto f1 () -> auto; + auto f2 (); // { dg-warning "empty parentheses" } +} base-commit: d6418fe22684f9335474d1fd405ade45954c069d
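The ambiguity that -Wvexing-parse targets, and why a trailing return type removes it, can be shown in a few lines. This example uses `-> int` instead of the testcase's `-> auto` so that `decltype(g)` is well-formed. GCC may still warn about `f`, which is the intended diagnostic.

```cpp
#include <cassert>
#include <type_traits>

int vexing_demo() {
  int f();          // empty parentheses: declares a function, not an int
  int v{};          // braces: a value-initialized variable (0)
  auto g() -> int;  // trailing return type: unambiguously a function
  static_assert(std::is_function<decltype(f)>::value, "f is a function");
  static_assert(std::is_function<decltype(g)>::value, "g is a function");
  static_assert(std::is_same<decltype(v), int>::value, "v is an int");
  return v;
}
```

A declarator with `->` cannot be read as a variable initialized with `()`, so there is nothing for the warning to disambiguate.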
Re: [PATCH v11] c++: Fix overeager Woverloaded-virtual with conversion operators [PR109918]
On 1/31/25 12:11 PM, Simon Martin wrote: Hi Jason, On 27 Jan 2025, at 16:49, Jason Merrill wrote: On 1/27/25 10:41 AM, Simon Martin wrote: Hi Jason, On 17 Jan 2025, at 23:33, Jason Merrill wrote: On 1/17/25 9:52 AM, Simon Martin wrote: Hi Jason, On 16 Jan 2025, at 22:49, Jason Merrill wrote: On 10/16/24 11:43 AM, Simon Martin wrote: As you know the patch had to be reverted due to PR117114, that highlighted a bunch of issues with comparing DECL_VINDEXes: it might give false positives in case of multiple inheritance (the case in PR117114), but also if there’s single inheritance but the hierarchy has more than two levels (another issue I found while bootstrapping with Rust enabled). Yes, relying on DECL_VINDEX equality was wrong, sorry to mislead you. The attached updated patch introduces an overrides_p function, based on the existing check_final_overrider, and uses it when the signatures match. That seems unnecessary. It seems like removing that only breaks Woverloaded-virt11.C, and making that work again only requires bringing back the check that DECL_VINDEX (fndecl) is set (to any value). Or remembering that fndecl was a template, so it can't really have the same signature as a non-template, whatever same_signature_p says. That’s right, only Woverloaded-virt11.C fails without the check_final_overrider call. Thanks for the suggestion to check whether fndecl is a template. This is what the updated attached patch does, successfully tested on x86_64-pc-linux-gnu. OK for GCC 15? And if so, thoughts on backporting to release branches (technically it’s a regression but it’s “just” an incorrect warning fix, so probably not worth the risk)? Right, I wouldn't backport. + if (warn_overloaded_virtual == 1 + && overrider_fndecls.elements () == num_fns) + /* All the fns override a base virtual. */ + continue; This looks like the only use of the overrider_fndecls hash_set.
A hash_set seems a bit overkill for checking whether everything in fns is an overrider; keeping track of how many times the old any_override was set should work just as well? Yeah you’re right :-/ I’ve changed my latest patch to simply count overriders. + /* fndecls hides base_fndecls[k]. */ + auto_vec &hiders = + hidden_base_fndecls.get_or_insert (base_fndecls[k]); + if (!hiders.contains (fndecl)) + hiders.safe_push (fndecl); Hmm, do you think users want a full list of the overloads that don't override? I'd think the problem is more the overload that doesn't exist rather than the ones that do. The current code ends up in the OVERLOAD handling of dump_decl that just prints scope::name. Indeed, the full list is probably not super useful... One problem with the current code is that for conversion operators, it will give a note such as “note: by 'operator’”, so I propose to keep track of at least one of the hiders, and use it to show the note (and get a proper “by 'virtual B::operator char()'” note for conversion operators). Hence the updated patch, successfully tested on x86_64-pc-linux-gnu. Ok for trunk? + else if (!template_p /* Template methods don't override. */ +&& same_signature_p (fndecl, base_fndecls[k])) + { + overriden_base_fndecls.add (base_fndecls[k]); + ++num_overriders; + } I'm concerned that this will increment num_overriders multiple times for a single fndecl if it overrides functions in multiple bases. Such a case is covered by the new Woverloaded-virt11.C and does not warn, but it’s true that we don’t take the “if (warn_overloaded_virtual == 1 && num_overriders == num_fns)” continue, and we should - thanks. I have updated the patch to only increment num_overriders at the end of the loop iterating on base functions if we’ve seen at least one overridden base function. Successfully tested on x86_64-pc-linux-gnu. OK for trunk? 
@@ -3402,7 +3402,8 @@ location_of (tree t) return input_location; } else if (TREE_CODE (t) == OVERLOAD) -t = OVL_FIRST (t); +t = OVL_FIRST (t) != conv_op_marker ? OVL_FIRST (t) + : OVL_FIRST (OVL_CHAIN (t)); Please add parentheses around the ?: expression to preserve the indentation. OK with that tweak. Jason
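The review point about multiple bases comes down to counting each derived function at most once. Here is a minimal sketch. The boolean matrix is a hypothetical stand-in for the `same_signature_p` checks against `base_fndecls`, and the template check is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: overrides[i][k] is true when the i-th function
// in the derived class overrides the k-th base virtual.
std::size_t count_overriders(const std::vector<std::vector<bool>> &overrides) {
  std::size_t num_overriders = 0;
  for (const auto &row : overrides) {
    bool overrides_any = false;
    for (bool o : row)
      if (o)
        overrides_any = true;  // remember, but do not count yet
    if (overrides_any)
      ++num_overriders;        // at most once per derived function
  }
  return num_overriders;
}

// With -Woverloaded-virtual=1 the check can be skipped when every
// function overrides some base virtual.
bool all_override(const std::vector<std::vector<bool>> &overrides) {
  return count_overriders(overrides) == overrides.size();
}
```

Incrementing inside the inner loop would count a function that overrides virtuals from two bases twice. That could make the count equal to the number of functions even though some function overrides nothing.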
Re: [PATCH v2] c++: Don't merge friend declarations that specify default arguments [PR118319]
On 1/31/25 11:12 AM, Simon Martin wrote: Hi Jason, On 31 Jan 2025, at 16:29, Jason Merrill wrote: On 1/31/25 9:52 AM, Simon Martin wrote: Hi Jason, On 9 Jan 2025, at 22:55, Jason Merrill wrote: On 1/9/25 8:25 AM, Simon Martin wrote: We segfault upon the following invalid code === cut here === template struct S { friend void foo (int a = []{}()); }; void foo (int a) {} int main () { S<0> t; foo (); } === cut here === The problem is that we end up with a LAMBDA_EXPR callee in set_flags_from_callee, and dereference its NULL_TREE TREE_TYPE (TREE_TYPE ( )). This patch simply sets the default argument to error_mark_node for friend functions that do not meet the requirement in C++17 11.3.6/4. Successfully tested on x86_64-pc-linux-gnu. PR c++/118319 gcc/cp/ChangeLog: * decl.cc (grokfndecl): Inspect all friend function parameters, and set them to error_mark_node if invalid. gcc/testsuite/ChangeLog: * g++.dg/parse/defarg18.C: New test. --- gcc/cp/decl.cc| 13 +--- gcc/testsuite/g++.dg/parse/defarg18.C | 48 +++ 2 files changed, 57 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/g++.dg/parse/defarg18.C diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc index 503ecd9387e..b2761c23d3e 100644 --- a/gcc/cp/decl.cc +++ b/gcc/cp/decl.cc @@ -11134,14 +11134,19 @@ grokfndecl (tree ctype, expression, that declaration shall be a definition..." 
*/ if (friendp && !funcdef_flag) { + bool has_permerrored = false; for (tree t = FUNCTION_FIRST_USER_PARMTYPE (decl); t && t != void_list_node; t = TREE_CHAIN (t)) if (TREE_PURPOSE (t)) { - permerror (DECL_SOURCE_LOCATION (decl), - "friend declaration of %qD specifies default " - "arguments and isn%'t a definition", decl); - break; + if (!has_permerrored) + { + has_permerrored = true; + permerror (DECL_SOURCE_LOCATION (decl), + "friend declaration of %qD specifies default " + "arguments and isn%'t a definition", decl); + } + TREE_PURPOSE (t) = error_mark_node; If we're going to unconditionally change TREE_PURPOSE, then permerror needs to strengthen to error. But I'd think we could leave the current state in a non-template class, only changing the template case. Thanks. It’s true that setting the argument to error_mark_node is contradictory with the fact that we accept the code with -fpermissive, even if only under processing_template_decl, so I checked if there’s not a better way of approaching this PR. After a bit of investigation, I think that the real problem is that duplicate_decls tries to merge the two declarations, even though they don’t meet the constraint about friend functions and default arguments. I disagree; in this testcase the friend is the (lexically) first declaration, the problem is that it's a non-defining friend (in a template) that specifies default args, as addressed by your first patch. Fair. I still think my earlier comments are the way forward here: leave the non-template case alone (permerror, don't change TREE_PURPOSE), in a template give a hard error and change to error_mark_node. Thanks, understood. The reason I looked for another “solution” is that it felt strange to be permissive in non-templates and stricter in templates. For example, if we do so, we’ll regress the case I added in defarg19.C in -fpermissive (also available at https://godbolt.org/z/YT3dexGjM). 
I’m probably splitting hairs, and I’m happy to go ahead with your suggestion if you think it’s fine. Otherwise I’ll see if I can find a better fix. That's fine, it's common to be stricter in templates. Jason
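For reference, the rule cited above (C++17 11.3.6/4) still allows a friend that is a definition to carry a default argument. The rejected form is the non-defining `friend void foo (int a = []{}());` inside the class template from the testcase. A small well-formed counterpart:

```cpp
#include <cassert>

// A friend declaration that specifies a default argument must be a
// definition (and the only declaration), so this form is well-formed.
struct S {
  friend int foo(S, int a = 1) { return a; }
};

// foo is not visible to ordinary lookup; it is found by
// argument-dependent lookup through its S parameter.
int call_foo() { return foo(S{}); }
```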
[PATCH] RTEMS: Add Cortex-M33 multilib
Enable use of Armv8-M instruction set.

Account for CVE-2021-35465 mitigation [PR102035]. The
-mfix-cmse-cve-2021-35465 option is enabled by default if
-mcpu=cortex-m33 is used.

gcc/

	* config/arm/t-rtems: Add Cortex-M33 multilib.
---
 gcc/config/arm/t-rtems | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arm/t-rtems b/gcc/config/arm/t-rtems
index b2fcf572bca..797640bd4f4 100644
--- a/gcc/config/arm/t-rtems
+++ b/gcc/config/arm/t-rtems
@@ -17,8 +17,8 @@ MULTILIB_DIRNAMES += eb
 MULTILIB_OPTIONS += mthumb
 MULTILIB_DIRNAMES += thumb
 
-MULTILIB_OPTIONS += march=armv5te+fp/march=armv6-m/march=armv7-a/march=armv7-a+simd/march=armv7-r/march=armv7-r+fp/mcpu=cortex-r52/mcpu=cortex-m3/mcpu=cortex-m4/mcpu=cortex-m4+nofp/mcpu=cortex-m7
-MULTILIB_DIRNAMES += armv5te+fp armv6-m armv7-a armv7-a+simd armv7-r armv7-r+fp cortex-r52 cortex-m3 cortex-m4 cortex-m4+nofp cortex-m7
+MULTILIB_OPTIONS += march=armv5te+fp/march=armv6-m/march=armv7-a/march=armv7-a+simd/march=armv7-r/march=armv7-r+fp/mcpu=cortex-r52/mcpu=cortex-m3/mcpu=cortex-m33/mcpu=cortex-m4/mcpu=cortex-m4+nofp/mcpu=cortex-m7
+MULTILIB_DIRNAMES += armv5te+fp armv6-m armv7-a armv7-a+simd armv7-r armv7-r+fp cortex-r52 cortex-m3 cortex-m33 cortex-m4 cortex-m4+nofp cortex-m7
 
 MULTILIB_OPTIONS += mfloat-abi=hard
 MULTILIB_DIRNAMES += hard
@@ -33,6 +33,7 @@ MULTILIB_REQUIRED += mthumb/march=armv7-r+fp/mfloat-abi=hard
 MULTILIB_REQUIRED += mthumb/march=armv7-r
 MULTILIB_REQUIRED += mthumb/mcpu=cortex-r52/mfloat-abi=hard
 MULTILIB_REQUIRED += mthumb/mcpu=cortex-m3
+MULTILIB_REQUIRED += mthumb/mcpu=cortex-m33
 MULTILIB_REQUIRED += mthumb/mcpu=cortex-m4/mfloat-abi=hard
 MULTILIB_REQUIRED += mthumb/mcpu=cortex-m4+nofp
 MULTILIB_REQUIRED += mthumb/mcpu=cortex-m7/mfloat-abi=hard
-- 
2.43.0
Re: [PATCH] RTEMS: Add Cortex-M33 multilib
On 4 Feb 2025, at 4:15, Sebastian Huber sebastian.hu...@embedded-brains.de wrote:
> Enable use of Armv8-M instruction set.
>
> Account for CVE-2021-35465 mitigation [PR102035]. The
> -mfix-cmse-cve-2021-35465 enabled by default, if -mcpu=cortex-m33 is
> used.
>
> gcc/
>
> * config/arm/t-rtems: Add Cortex-M33 multilib.
> ---
> gcc/config/arm/t-rtems | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)

I would like to backport this change to the GCC 13 and 14 branches.

-- 
embedded brains GmbH & Co. KG
Mr. Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax: +49-89-18 94 741 - 08

Register court: Amtsgericht München
Registration number: HRB 157899
Managing directors authorized to represent the company: Peter Rasmussen, Thomas Dörfler
Our privacy policy can be found here: https://embedded-brains.de/datenschutzerklaerung/
Re: [PATCH] Fortran: different character lengths in array constructor [PR93289]
On 03.02.25 at 19:31, Jerry D wrote: On 2/3/25 2:49 AM, Richard Sandiford wrote: Steve Kargl writes: On Sat, Feb 01, 2025 at 09:49:17PM +0100, Harald Anlauf wrote: On 01.02.25 at 21:03, Steve Kargl wrote: On Sat, Feb 01, 2025 at 07:25:51PM +0100, Harald Anlauf wrote: the attached patch downgrades different constant character lengths in an array constructor from a GNU to a legacy extension, so that users get a warning with -std=gnu. We continue to generate an error when standard conformance is requested. Regtested on x86_64-pc-linux-gnu (found one testcase where this triggered... :) OK for mainline? My vote is 'no'. This is either a GNU extension or an error. It is certainly not a legacy issue as array constructors simply cannot appear in old moldy *legacy* codes. legacy /= moldy. My intention is to downgrade existing, potentially dangerous GNU extensions (like this one) carefully to "legacy", but not with an axe. I would be in favor of making it a hard error. If you believe gfortran must be able to compile invalid source, then add an option such as -fallow-invalid-scalar-character-entities-in-array-constructor. I don't see why we should scare users by suddenly turning code that is currently accepted silently, because it is a GNU extension, into a hard error. So why must we be so tough? Because -std=legacy allows a whole bunch of garbage. Instead of fixing broken code, a user will slap -std=legacy in a Makefile and move on. Then years from now, you'll see -std=legacy in a whole bunch of Makefiles whether it is needed or not. See -maligned-double and -fallow-argument-mismatch as poster children. I agree that this is what will happen. But for people running benchmarks, it's kind-of (kind-of) a feature. Benchmarks tend to include relatively old code by the time that they're released, and benchmarks continue to be relevant (or at least widely tested) after they're out of maintenance.
So it has been really useful to have -std=legacy accept old, dangerous code, since it means that we can continue to test old benchmarks with newer compilers. Improving the benchmark source to avoid the dangerous constructs would invalidate the test and make it harder to compare with historical results. Again, just my $0.02. Same here, just wanted to raise the benchmark use case. Thanks, Richard I think we have had a good discussion, and for the good of the order I recommend we push this for now. The work has been done. Regards, Jerry Thanks, Jerry! Pushed: r15-7336-gf3a41e6cb5d70f
Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale
On 2/3/25 3:44 AM, Richard Biener wrote:
> On Mon, Feb 3, 2025 at 10:32 AM H.J. Lu wrote:
>> I believe the original patch should be reverted. Then my patch isn't needed.
>
> I'm OK with that, but it's not my call. I do wonder why the contributor did not
> address any of the fallout. Maybe he's gone? Peter?

Surya (she) is working on the fallout. In fact, one patch earlier this year was committed and reverted due to some aarch64 fallout. That said, Andrew mentioned on IRC that he was interested in getting that patch back in for aarch64 because it helps shrink-wrapping, and he believes the patch itself wasn't bad but exposed a latent issue that was causing the bootstrap issue on aarch64.

Surya also recently submitted another patch to help with the original fallout:

  [PATCH] lra: initialize allocated_hard_reg_p[] for hard regs referenced in RTL [PR118533]

...which you commented on. She is working on them.

I disagree with H.J.'s comment. I have said before, at the Cauldron and in some Bugzilla reports, that Surya's fix is a correct fix. The issues encountered here seem to be latent issues exposed by Surya's fix (see also Matz's reply), and as such, this patch should stay. The correct path here is to track down those latent issues and fix them. I've asked Surya to continue working on the fallout, but any help from others is greatly appreciated! Reverting now would also cause performance regressions on Power, RISC-V and ARM.

Peter
Re: [PING, PATCH] fortran: fix -MT/-MQ adding additional target [PR47485]
On 2/3/25 2:14 PM, Vincent Vanlaer wrote: Hi all, Gentle ping for the patch below: https://gcc.gnu.org/pipermail/ fortran/2024-December/061467.html Best wishes, Vincent On 30/12/2024 00:19, Vincent Vanlaer wrote: The -MT and -MQ options should replace the default target in the generated dependency file. deps_add_target needs to be called before cpp_read_main_file, otherwise the original object name is added. gcc/fortran/ PR fortran/47485 * cpp.cc: fix -MT/-MQ adding additional target instead of replacing the default gcc/testsuite/ PR fortran/47485 * gfortran.dg/dependency_generation_1.f90: New test Signed-off-by: Vincent Vanlaer --- gcc/fortran/cpp.cc | 18 -- .../gfortran.dg/dependency_generation_1.f90 | 15 +++ 2 files changed, 27 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/ dependency_generation_1.f90 diff --git a/gcc/fortran/cpp.cc b/gcc/fortran/cpp.cc index 7c5f00cfd69..3b93d17b90f 100644 --- a/gcc/fortran/cpp.cc +++ b/gcc/fortran/cpp.cc @@ -96,6 +96,8 @@ struct gfc_cpp_option_data int deps_skip_system; /* -MM */ const char *deps_filename; /* -M[M]D */ const char *deps_filename_user; /* -MF */ + const char *deps_target_filename; /* -MT / -MQ */ + bool quote_deps_target_filename; /* -MQ */ int deps_missing_are_generated; /* -MG */ int deps_phony; /* -MP */ int warn_date_time; /* -Wdate-time */ @@ -287,6 +289,8 @@ gfc_cpp_init_options (unsigned int decoded_options_count, gfc_cpp_option.deps_missing_are_generated = 0; gfc_cpp_option.deps_filename = NULL; gfc_cpp_option.deps_filename_user = NULL; + gfc_cpp_option.deps_target_filename = NULL; + gfc_cpp_option.quote_deps_target_filename = false; gfc_cpp_option.multilib = NULL; gfc_cpp_option.prefix = NULL; @@ -439,9 +443,8 @@ gfc_cpp_handle_option (size_t scode, const char *arg, int value ATTRIBUTE_UNUSED case OPT_MQ: case OPT_MT: - gfc_cpp_option.deferred_opt[gfc_cpp_option.deferred_opt_count].code = code; - gfc_cpp_option.deferred_opt[gfc_cpp_option.deferred_opt_count].arg = arg; - 
gfc_cpp_option.deferred_opt_count++; + gfc_cpp_option.quote_deps_target_filename = (code == OPT_MQ); + gfc_cpp_option.deps_target_filename = arg; break; case OPT_P: @@ -593,6 +596,12 @@ gfc_cpp_init_0 (void) } gcc_assert(cpp_in); + + if (gfc_cpp_option.deps_target_filename) + if (mkdeps *deps = cpp_get_deps (cpp_in)) + deps_add_target (deps, gfc_cpp_option.deps_target_filename, + gfc_cpp_option.quote_deps_target_filename); + if (!cpp_read_main_file (cpp_in, gfc_source_file)) errorcount++; } @@ -635,9 +644,6 @@ gfc_cpp_init (void) else cpp_assert (cpp_in, opt->arg); } - else if (opt->code == OPT_MT || opt->code == OPT_MQ) - if (mkdeps *deps = cpp_get_deps (cpp_in)) - deps_add_target (deps, opt->arg, opt->code == OPT_MQ); } /* Pre-defined macros for non-required INTEGER kind types. */ diff --git a/gcc/testsuite/gfortran.dg/dependency_generation_1.f90 b/ gcc/testsuite/gfortran.dg/dependency_generation_1.f90 new file mode 100644 index 000..d42a257f83a --- /dev/null +++ b/gcc/testsuite/gfortran.dg/dependency_generation_1.f90 @@ -0,0 +1,15 @@ +! This test case ensures that the -MT flag is correctly replacing the object name in the dependency file. +! See PR 47485 +! +! Contributed by Vincent Vanlaer +! +! { dg-do preprocess } +! { dg-additional-options "-cpp" } +! { dg-additional-options "-M" } +! { dg-additional-options "-MF deps" } +! { dg-additional-options "-MT obj.o" } + +module test +end module + +! { dg-final { scan-file "deps" "obj.o:.*" } } Do you have commit rights to gcc? I did not catch your original post. Jerry
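The ordering issue can be illustrated with a simplified, hypothetical model. It assumes, as the patch description states, that reading the main file adds the default object-file target only when no target has been registered yet. `deps_model`, `targets_for` and the file names are illustrative and are not libcpp's API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the dependency-target list: the default target
// is added only if no target has been registered yet.
struct deps_model {
  std::vector<std::string> targets;

  void add_target(const std::string &t) { targets.push_back(t); }

  void add_default_target(const std::string &default_obj) {
    if (!targets.empty())
      return;  // a user-supplied -MT/-MQ target takes precedence
    targets.push_back(default_obj);
  }
};

// Targets produced for the two possible orderings of the steps.
std::vector<std::string> targets_for(bool user_target_first) {
  deps_model d;
  if (user_target_first) {
    d.add_target("obj.o");           // -MT obj.o applied early (the fix)
    d.add_default_target("test.o");  // then the main file is read
  } else {
    d.add_default_target("test.o");  // main file read first...
    d.add_target("obj.o");           // ...deferred -MT appends (the bug)
  }
  return d.targets;
}
```

Under this model, deferring -MT/-MQ until after the main file is read leaves both the default target and the user's target in the dependency rule. This is the behaviour reported in PR47485.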
Re: [PATCH] c++: Improve contracts support in modules [PR108205]
On Mon, Feb 03, 2025 at 06:57:14PM -0500, Jason Merrill wrote: > On 2/1/25 7:03 AM, Nathaniel Shead wrote: > > Regtested on x86_64-pc-linux-gnu (so far just "dg.exp=contract* > > modules.exp=contract*"), OK for trunk if full bootstrap+regtest passes? > > > > -- >8 -- > > > > Modules makes some assumptions about types that currently aren't > > fulfilled by the types created in contracts logic. This patch ensures > > that exporting inline functions using contracts works again with > > modules. > > > > PR c++/108205 > > > > gcc/cp/ChangeLog: > > > > * contracts.cc (get_pseudo_contract_violation_type): Give names > > to generated FIELD_DECLs. > > (declare_handle_contract_violation): Mark contract_violation > > type as external linkage. > > (build_contract_handler_call): Ensure any builtin declarations > > created here aren't treated as attached to the current module. > > OK, but now I'm curious why we don't need this sort of thing in rtti.cc? > Modules streaming ignores the types built for RTTI because DECL_TINFO_P is handled specially in trees_out::decl_node (it just writes enough information for the importer to rebuild the type itself). But it might be worth at least forcing global attachment just in case the types having module attachment causes something else to go wrong; thoughts? That said, we will definitely need something like this for the types built for ubsan (PR98735), which I have some ideas on how to fix but probably won't get to for GCC15 since there's some other complications there. Nathaniel > > gcc/testsuite/ChangeLog: > > > > * g++.dg/modules/contracts-5_a.C: New test. > > * g++.dg/modules/contracts-5_b.C: New test. 
> > > > Signed-off-by: Nathaniel Shead > > --- > > gcc/cp/contracts.cc | 27 +--- > > gcc/testsuite/g++.dg/modules/contracts-5_a.C | 8 ++ > > gcc/testsuite/g++.dg/modules/contracts-5_b.C | 20 +++ > > 3 files changed, 46 insertions(+), 9 deletions(-) > > create mode 100644 gcc/testsuite/g++.dg/modules/contracts-5_a.C > > create mode 100644 gcc/testsuite/g++.dg/modules/contracts-5_b.C > > > > diff --git a/gcc/cp/contracts.cc b/gcc/cp/contracts.cc > > index 5782ec8bf29..f2b126c8d6b 100644 > > --- a/gcc/cp/contracts.cc > > +++ b/gcc/cp/contracts.cc > > @@ -1633,19 +1633,22 @@ get_pseudo_contract_violation_type () > >signed char _M_continue; > > If this changes, also update the initializer in > > build_contract_violation. */ > > - const tree types[] = { const_string_type_node, > > -const_string_type_node, > > -const_string_type_node, > > -const_string_type_node, > > -const_string_type_node, > > -uint_least32_type_node, > > -signed_char_type_node }; > > + struct field_info { tree type; const char* name; }; > > + const field_info info[] = { > > + { const_string_type_node, "_M_file" }, > > + { const_string_type_node, "_M_function" }, > > + { const_string_type_node, "_M_comment" }, > > + { const_string_type_node, "_M_level" }, > > + { const_string_type_node, "_M_role" }, > > + { uint_least32_type_node, "_M_line" }, > > + { signed_char_type_node, "_M_continue" } > > + }; > > tree fields = NULL_TREE; > > - for (tree type : types) > > + for (const field_info& i : info) > > { > > /* finish_builtin_struct wants fieldss chained in reverse. 
*/ > > tree next = build_decl (BUILTINS_LOCATION, FIELD_DECL, > > - NULL_TREE, type); > > + get_identifier (i.name), i.type); > > DECL_CHAIN (next) = fields; > > fields = next; > > } > > @@ -1737,6 +1740,7 @@ declare_handle_contract_violation () > > create_implicit_typedef (viol_name, violation); > > DECL_SOURCE_LOCATION (TYPE_NAME (violation)) = BUILTINS_LOCATION; > > DECL_CONTEXT (TYPE_NAME (violation)) = current_namespace; > > + TREE_PUBLIC (TYPE_NAME (violation)) = true; > > pushdecl_namespace_level (TYPE_NAME (violation), /*hidden*/true); > > pop_namespace (); > > pop_nested_namespace (std_node); > > @@ -1761,6 +1765,11 @@ static void > > build_contract_handler_call (tree contract, > > contract_continuation cmode) > > { > > + /* We may need to declare new types, ensure they are not considered > > + attached to a named module. */ > > + auto module_kind_override = make_temp_override > > +(module_kind, module_kind & ~(MK_PURVIEW | MK_ATTACH | MK_EXPORTING)); > > + > > tree violation = build_contract_violation (contract, cmode); > > tree violation_fn = declare_handle_contract_violation (); > > tree call = build_call_n (violation_fn, 1, build_address (violation)); > > diff --git a/gcc/testsuite/g++.dg/modules/contracts-5_a.C > > b/gcc/tes
Re: [PING, PATCH] fortran: fix -MT/-MQ adding additional target [PR47485]
On 4/02/2025 01:42, Jerry D wrote: On 2/3/25 2:14 PM, Vincent Vanlaer wrote: Hi all, Gentle ping for the patch below: https://gcc.gnu.org/pipermail/fortran/2024-December/061467.html Best wishes, Vincent On 30/12/2024 00:19, Vincent Vanlaer wrote: The -MT and -MQ options should replace the default target in the generated dependency file. deps_add_target needs to be called before cpp_read_main_file, otherwise the original object name is added. gcc/fortran/ PR fortran/47485 * cpp.cc: fix -MT/-MQ adding additional target instead of replacing the default gcc/testsuite/ PR fortran/47485 * gfortran.dg/dependency_generation_1.f90: New test Signed-off-by: Vincent Vanlaer --- gcc/fortran/cpp.cc | 18 -- .../gfortran.dg/dependency_generation_1.f90 | 15 +++ 2 files changed, 27 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/dependency_generation_1.f90 diff --git a/gcc/fortran/cpp.cc b/gcc/fortran/cpp.cc index 7c5f00cfd69..3b93d17b90f 100644 --- a/gcc/fortran/cpp.cc +++ b/gcc/fortran/cpp.cc @@ -96,6 +96,8 @@ struct gfc_cpp_option_data int deps_skip_system; /* -MM */ const char *deps_filename; /* -M[M]D */ const char *deps_filename_user; /* -MF */ + const char *deps_target_filename; /* -MT / -MQ */ + bool quote_deps_target_filename; /* -MQ */ int deps_missing_are_generated; /* -MG */ int deps_phony; /* -MP */ int warn_date_time; /* -Wdate-time */ @@ -287,6 +289,8 @@ gfc_cpp_init_options (unsigned int decoded_options_count, gfc_cpp_option.deps_missing_are_generated = 0; gfc_cpp_option.deps_filename = NULL; gfc_cpp_option.deps_filename_user = NULL; + gfc_cpp_option.deps_target_filename = NULL; + gfc_cpp_option.quote_deps_target_filename = false; gfc_cpp_option.multilib = NULL; gfc_cpp_option.prefix = NULL; @@ -439,9 +443,8 @@ gfc_cpp_handle_option (size_t scode, const char *arg, int value ATTRIBUTE_UNUSED case OPT_MQ: case OPT_MT: - gfc_cpp_option.deferred_opt[gfc_cpp_option.deferred_opt_count].code = code; -
gfc_cpp_option.deferred_opt[gfc_cpp_option.deferred_opt_count].arg = arg; - gfc_cpp_option.deferred_opt_count++; + gfc_cpp_option.quote_deps_target_filename = (code == OPT_MQ); + gfc_cpp_option.deps_target_filename = arg; break; case OPT_P: @@ -593,6 +596,12 @@ gfc_cpp_init_0 (void) } gcc_assert(cpp_in); + + if (gfc_cpp_option.deps_target_filename) + if (mkdeps *deps = cpp_get_deps (cpp_in)) + deps_add_target (deps, gfc_cpp_option.deps_target_filename, + gfc_cpp_option.quote_deps_target_filename); + if (!cpp_read_main_file (cpp_in, gfc_source_file)) errorcount++; } @@ -635,9 +644,6 @@ gfc_cpp_init (void) else cpp_assert (cpp_in, opt->arg); } - else if (opt->code == OPT_MT || opt->code == OPT_MQ) - if (mkdeps *deps = cpp_get_deps (cpp_in)) - deps_add_target (deps, opt->arg, opt->code == OPT_MQ); } /* Pre-defined macros for non-required INTEGER kind types. */ diff --git a/gcc/testsuite/gfortran.dg/dependency_generation_1.f90 b/gcc/testsuite/gfortran.dg/dependency_generation_1.f90 new file mode 100644 index 000..d42a257f83a --- /dev/null +++ b/gcc/testsuite/gfortran.dg/dependency_generation_1.f90 @@ -0,0 +1,15 @@ +! This test case ensures that the -MT flag is correctly replacing the object name in the dependency file. +! See PR 47485 +! +! Contributed by Vincent Vanlaer +! +! { dg-do preprocess } +! { dg-additional-options "-cpp" } +! { dg-additional-options "-M" } +! { dg-additional-options "-MF deps" } +! { dg-additional-options "-MT obj.o" } + +module test +end module + +! { dg-final { scan-file "deps" "obj.o:.*" } } Do you have commit rights to gcc? I did not catch your original post. Jerry I do not, this is my first time contributing to GCC. Vincent
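Why the call order matters can be sketched with a toy model. This is not libcpp code: `ToyDeps`, `old_order` and `new_order` are hypothetical names, and the only assumption carried over from the patch description is that reading the main file records the default object name only when no target has been recorded yet, so a -MT/-MQ target added afterwards ends up alongside the default instead of replacing it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model of a dependency-target list (not libcpp's mkdeps).
struct ToyDeps {
  std::vector<std::string> targets;

  // Stand-in for deps_add_target: always records the target.
  void add_target (const std::string &t) { targets.push_back (t); }

  // Stand-in for what reading the main file does (assumption): the
  // default object name is added only if no target exists yet.
  void add_default_target (const std::string &t)
  {
    if (targets.empty ())
      targets.push_back (t);
  }
};

// Old order: the -MT target is applied after the main file was read.
std::vector<std::string> old_order (const std::string &mt)
{
  ToyDeps d;
  d.add_default_target ("test.o");
  d.add_target (mt);
  return d.targets;  // both "test.o" and mt: the extra target
}

// New order: the -MT target is applied before the main file is read.
std::vector<std::string> new_order (const std::string &mt)
{
  ToyDeps d;
  d.add_target (mt);
  d.add_default_target ("test.o");
  return d.targets;  // only mt: the default is replaced
}
```

With `-MT obj.o`, the old order yields two targets and the new order yields only `obj.o`, which is what the `scan-file "deps" "obj.o:.*"` check in the new test relies on.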
Re: [PATCH] c++: Fix up pedwarn for capturing structured bindings in lambdas [PR118719]
On 2/2/25 5:14 AM, Jakub Jelinek wrote: Hi! As mentioned in the PR, this pedwarn is desirable for the implicit or explicit capturing of structured bindings in C++17, but in the case of init-captures the initializer is just some expression and that can include structured bindings. So, the following patch limits the warning to non-explicit_init_p. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. 2025-02-02 Jakub Jelinek PR c++/118719 * lambda.cc (add_capture): Only pedwarn about capturing structured binding if !explicit_init_p. * g++.dg/cpp1z/decomp63.C: New test. --- gcc/cp/lambda.cc.jj 2025-01-24 17:37:49.004457905 +0100 +++ gcc/cp/lambda.cc 2025-01-31 23:47:08.907034696 +0100 @@ -613,7 +613,7 @@ add_capture (tree lambda, tree id, tree return error_mark_node; } - if (cxx_dialect < cxx20) + if (cxx_dialect < cxx20 && !explicit_init_p) { auto_diagnostic_group d; tree stripped_init = tree_strip_any_location_wrapper (initializer); --- gcc/testsuite/g++.dg/cpp1z/decomp63.C.jj 2025-01-31 23:54:15.480699418 +0100 +++ gcc/testsuite/g++.dg/cpp1z/decomp63.C 2025-01-31 23:53:02.998578507 +0100 @@ -0,0 +1,18 @@ +// PR c++/118719 +// { dg-do compile { target c++11 } } +// { dg-options "" } + +int +main () +{ + int a[] = { 42 }; + auto [x] = a; // { dg-warning "structured bindings only available with" "" { target c++14_down } } + // { dg-message "declared here" "" { target c++17_down } .-1 } + [=] () { int b = x; (void) b; }; // { dg-warning "captured structured bindings are a C\\\+\\\+20 extension" "" { target c++17_down } } + [&] () { int b = x; (void) b; }; // { dg-warning "captured structured bindings are a C\\\+\\\+20 extension" "" { target c++17_down } } + [x] () { int b = x; (void) b; }; // { dg-warning "captured structured bindings are a C\\\+\\\+20 extension" "" { target c++17_down } } + [&x] () { int b = x; (void) b; }; // { dg-warning "captured structured bindings are a C\\\+\\\+20 extension" "" { target c++17_down } } + [x = x] () { int b = x;
(void) b; }; // { dg-warning "lambda capture initializers only available with" "" { target c++11_only } } + [y = x] () { int b = y; (void) b; }; // { dg-warning "lambda capture initializers only available with" "" { target c++11_only } } + [y = x * 2] () { int b = y; (void) b; }; // { dg-warning "lambda capture initializers only available with" "" { target c++11_only } } +} Jakub
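A small standalone illustration of the distinction the patch draws, assuming -std=c++17 or later (GCC's default since GCC 11). `captured_via_init` is a hypothetical name; the point is that an init-capture's initializer is an ordinary expression that merely names the structured binding, so nothing is captured as a structured binding. A plain `[=] () { return x; }` would still get the "captured structured bindings are a C++20 extension" pedwarn before C++20.

```cpp
#include <cassert>

// Hypothetical example; needs C++17 for the structured binding itself.
int captured_via_init ()
{
  int a[] = { 42 };
  auto [x] = a;

  // Init-captures: 'y' is a new variable initialized from the expression
  // 'x' (or 'x * 2'), so after the patch no C++20-extension pedwarn fires.
  auto by_copy = [y = x] () { return y; };
  auto doubled = [y = x * 2] () { return y; };

  return by_copy () + doubled ();  // 42 + 84
}
```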
[committed] i386: Fix and improve TARGET_INDIRECT_BRANCH_REGISTER handling some more
gcc/ChangeLog: * config/i386/i386.md (*sibcall_pop_memory): Disable for TARGET_INDIRECT_BRANCH_REGISTER * config/i386/predicates.md (call_insn_operand): Enable when "satisfies_constraint_Bw (op)" is true, instead of open-coding constraint here. (sibcall_insn_operand): Ditto with "satisfies_constraint_Bs (op)". Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Uros. diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index d6ae3ee378a..cb37b2af50b 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -20244,7 +20244,7 @@ (define_insn "*sibcall_pop_memory" (plus:SI (reg:SI SP_REG) (match_operand:SI 2 "immediate_operand" "i"))) (unspec [(const_int 0)] UNSPEC_PEEPSIB)] - "!TARGET_64BIT" + "!TARGET_64BIT && !TARGET_INDIRECT_BRANCH_REGISTER" "* return ix86_output_call_insn (insn, operands[0]);" [(set_attr "type" "call")]) diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index 9a9101ed374..8631588f78e 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -781,22 +781,14 @@ (define_special_predicate "call_insn_operand" (ior (match_test "constant_call_address_operand (op, mode == VOIDmode ? mode : Pmode)") (match_operand 0 "call_register_operand") - (and (not (match_test "TARGET_INDIRECT_BRANCH_REGISTER")) - (ior (and (not (match_test "TARGET_X32")) - (match_operand 0 "memory_operand")) -(and (match_test "TARGET_X32 && Pmode == DImode") - (match_operand 0 "GOT_memory_operand")) + (match_test "satisfies_constraint_Bw (op)"))) ;; Similarly, but for tail calls, in which we cannot allow memory references. (define_special_predicate "sibcall_insn_operand" (ior (match_test "constant_call_address_operand (op, mode == VOIDmode ? 
mode : Pmode)") (match_operand 0 "register_no_elim_operand") - (and (not (match_test "TARGET_INDIRECT_BRANCH_REGISTER")) - (ior (and (not (match_test "TARGET_X32")) - (match_operand 0 "sibcall_memory_operand")) -(and (match_test "TARGET_X32 && Pmode == DImode") - (match_operand 0 "GOT_memory_operand")) + (match_test "satisfies_constraint_Bs (op)"))) ;; Return true if OP is a 32-bit GOT symbol operand. (define_predicate "GOT32_symbol_operand"
Re: [PATCH] IBM zSystems: Do not use @PLT with larl
gcc/ChangeLog: * config/s390/s390.cc (print_operand): Remove the no longer necessary 31-bit and weak symbol handling. * config/s390/s390.md (*movdi_64): Do not use @PLT with larl. (*movsi_larl): Likewise. (main_base_64): Likewise. (reload_base_64): Likewise. gcc/testsuite/ChangeLog: * gcc.target/s390/call-z10-pic-nodatarel.c: Adjust expectations. * gcc.target/s390/call-z10-pic.c: Likewise. * gcc.target/s390/call-z10.c: Likewise. * gcc.target/s390/call-z9-pic-nodatarel.c: Likewise. * gcc.target/s390/call-z9-pic.c: Likewise. * gcc.target/s390/call-z9.c: Likewise. Ok. Thanks! Andreas --- gcc/config/s390/s390.cc | 16 +++- gcc/config/s390/s390.md | 8 .../gcc.target/s390/call-z10-pic-nodatarel.c | 6 ++ gcc/testsuite/gcc.target/s390/call-z10-pic.c | 6 ++ gcc/testsuite/gcc.target/s390/call-z10.c | 14 +- .../gcc.target/s390/call-z9-pic-nodatarel.c | 6 ++ gcc/testsuite/gcc.target/s390/call-z9-pic.c | 6 ++ gcc/testsuite/gcc.target/s390/call-z9.c | 14 +- 8 files changed, 25 insertions(+), 51 deletions(-) diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc index 86a5f059b85..1d96df49fea 100644 --- a/gcc/config/s390/s390.cc +++ b/gcc/config/s390/s390.cc @@ -8585,7 +8585,7 @@ print_operand_address (FILE *file, rtx addr) 'E': print opcode suffix for branch on index instruction. 'G': print the size of the operand in bytes. 'J': print tls_load/tls_gdcall/tls_ldcall suffix -'K': print @PLT suffix for call targets and load address values. +'K': print @PLT suffix for branch targets; do not use with larl. 'M': print the second word of a TImode operand. 'N': print the second word of a DImode operand. 'O': print only the displacement of a memory reference or address. @@ -8854,19 +8854,9 @@ print_operand (FILE *file, rtx x, int code) call even static functions via PLT. ld will optimize @PLT away for normal code, and keep it for patches. - Do not indiscriminately add @PLT in 31-bit mode due to the %r12 -restriction, use UNSPEC_PLT31 instead. 
- @PLT only makes sense for functions, data is taken care of by --mno-pic-data-is-text-relative. - -Adding @PLT interferes with handling of weak symbols in non-PIC code, -since their addresses are loaded with larl, which then always produces -a non-NULL result, so skip them here as well. */ - if (TARGET_64BIT - && GET_CODE (x) == SYMBOL_REF - && SYMBOL_REF_FUNCTION_P (x) - && !(SYMBOL_REF_WEAK (x) && !flag_pic)) +-mno-pic-data-is-text-relative. */ + if (GET_CODE (x) == SYMBOL_REF && SYMBOL_REF_FUNCTION_P (x)) fprintf (file, "@PLT"); return; } diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index c164ea72c78..9d495803387 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -2001,7 +2001,7 @@ vlgvg\t%0,%v1,0 vleg\t%v0,%1,0 vsteg\t%v1,%0,0 - larl\t%0,%1%K1" + larl\t%0,%1" [(set_attr "op_type" "RI,RI,RI,RI,RI,RIL,RIL,RIL,RRE,RRE,RRE,RXY,RIL,RRE,RXY, RXY,RR,RX,RXY,RX,RXY,RIL,SIL,*,*,RS,RS,VRI,VRR,VRS,VRS, VRX,VRX,RIL") @@ -2390,7 +2390,7 @@ (match_operand:SI 1 "larl_operand" "X"))] "!TARGET_64BIT && !FP_REG_P (operands[0])" - "larl\t%0,%1%K1" + "larl\t%0,%1" [(set_attr "op_type" "RIL") (set_attr "type""larl") (set_attr "z10prop" "z10_fwd_A1") @@ -11735,7 +11735,7 @@ [(set (match_operand 0 "register_operand" "=a") (unspec [(label_ref (match_operand 1 "" ""))] UNSPEC_MAIN_BASE))] "GET_MODE (operands[0]) == Pmode" - "larl\t%0,%1%K1" + "larl\t%0,%1" [(set_attr "op_type" "RIL") (set_attr "type""larl") (set_attr "z10prop" "z10_fwd_A1") @@ -11755,7 +11755,7 @@ [(set (match_operand 0 "register_operand" "=a") (unspec [(label_ref (match_operand 1 "" ""))] UNSPEC_RELOAD_BASE))] "GET_MODE (operands[0]) == Pmode" - "larl\t%0,%1%K1" + "larl\t%0,%1" [(set_attr "op_type" "RIL") (set_attr "type""larl") (set_attr "z10prop" "z10_fwd_A1")]) diff --git a/gcc/testsuite/gcc.target/s390/call-z10-pic-nodatarel.c b/gcc/testsuite/gcc.target/s390/call-z10-pic-nodatarel.c index 49984614bc6..6df0c75584f 100644 --- 
a/gcc/testsuite/gcc.target/s390/call-z10-pic-nodatarel.c +++ b/gcc/testsuite/gcc.target/s390/call-z10-pic-nodatarel.c @@ -7,10 +7,8 @@ /* { dg-final { scan-assembler {lgrl\t%r2,foo@GOTENT\n} { target lp64 } } } */ /* { dg-final { scan-assembler {lrl\t%r2,foo@GOTENT\n} { target { ! lp64 } } } } */ -/* { dg-final { scan-assembler {brasl\t%r\d+,foostatic@PLT\n} { target lp64 } } } */ -/* { dg-final { scan-assembler {bra
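The effect of the print_operand hunk on the 'K' modifier can be modelled as a pair of predicates. This is a sketch, not GCC code: `KOperand` and its fields are hypothetical stand-ins for `TARGET_64BIT`, `GET_CODE (x) == SYMBOL_REF`, `SYMBOL_REF_FUNCTION_P (x)`, `SYMBOL_REF_WEAK (x)` and `flag_pic`, and the two functions simply transcribe the old and new conditions from the diff.

```cpp
#include <cassert>

// Hypothetical flattened view of the properties checked for 'K'.
struct KOperand {
  bool target_64bit;   // TARGET_64BIT
  bool is_symbol_ref;  // GET_CODE (x) == SYMBOL_REF
  bool is_function;    // SYMBOL_REF_FUNCTION_P (x)
  bool is_weak;        // SYMBOL_REF_WEAK (x)
  bool pic;            // flag_pic
};

// Condition before the patch: 64-bit only, and weak symbols skipped in
// non-PIC code because larl could not yield a null address for them.
bool plt_suffix_before (const KOperand &x)
{
  return x.target_64bit && x.is_symbol_ref && x.is_function
         && !(x.is_weak && !x.pic);
}

// Condition after the patch: 'K' is no longer used with larl, so only
// "is this a function symbol" remains.
bool plt_suffix_after (const KOperand &x)
{
  return x.is_symbol_ref && x.is_function;
}
```

In other words, branch targets in 31-bit code and weak functions in non-PIC code now also get @PLT, while data symbols are unaffected in both versions.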
Re: [PATCH 61/61] Fix pr54240
On Fri, Jan 31, 2025 at 7:18 PM Aleksandar Rakic wrote: > > From: Chao-ying Fu OK > gcc/testsuite/ > * gcc.target/mips/pr54240.c: Scan phiopt2. > > Cherry-picked 02dd052d4822ca187af075f1fb5301c954844144 > from https://github.com/MIPS/gcc > > Signed-off-by: Chao-ying Fu > Signed-off-by: Aleksandar Rakic > --- > gcc/testsuite/gcc.target/mips/pr54240.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/testsuite/gcc.target/mips/pr54240.c > b/gcc/testsuite/gcc.target/mips/pr54240.c > index d3976f6cfef..31b793bb8c6 100644 > --- a/gcc/testsuite/gcc.target/mips/pr54240.c > +++ b/gcc/testsuite/gcc.target/mips/pr54240.c > @@ -27,4 +27,4 @@ NOMIPS16 int foo(S *s) >return next->v; > } > > -/* { dg-final { scan-tree-dump "Hoisting adjacent loads" "phiopt1" } } */ > +/* { dg-final { scan-tree-dump "Hoisting adjacent loads" "phiopt2" } } */ > -- > 2.34.1
Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale
On Mon, Feb 3, 2025 at 10:32 AM H.J. Lu wrote: > > On Mon, Feb 3, 2025 at 5:27 PM Richard Biener > wrote: > > > > On Mon, Feb 3, 2025 at 7:23 AM H.J. Lu wrote: > > > > > > commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b > > > Author: Surya Kumari Jangala > > > Date: Tue Jun 25 08:37:49 2024 -0500 > > > > > > ira: Scale save/restore costs of callee save registers with block > > > frequency > > > > > > scales the cost of saving/restoring a callee-save hard register in > > > epilogue > > > and prologue with the entry block frequency, which, if not optimizing for > > > size, is 1, for all targets. As the result, callee-saved registers > > > may not be used to preserve local variable values across calls on some > > > targets, like x86. Add a target hook for the callee-saved register cost > > > scale in epilogue and prologue used by IRA. The default version of this > > > target hook returns 1 if optimizing for size, otherwise returns the entry > > > block frequency. Add an x86 version of this target hook to restore the > > > old behavior prior to the above commit. > > > > > > PR rtl-optimization/111673 > > > PR rtl-optimization/115932 > > > PR rtl-optimization/116028 > > > PR rtl-optimization/117081 > > > PR rtl-optimization/117082 > > > PR rtl-optimization/118497 > > > * ira-color.cc (assign_hard_reg): Call the target hook for the > > > callee-saved register cost scale in epilogue and prologue. > > > * target.def (ira_callee_saved_register_cost_scale): New target > > > hook. > > > * targhooks.cc (default_ira_callee_saved_register_cost_scale): > > > New. > > > * targhooks.h (default_ira_callee_saved_register_cost_scale): > > > Likewise. > > > * config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale): > > > New. > > > (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Likewise. > > > * doc/tm.texi: Regenerated. > > > * doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): > > > New. > > > > > > Signed-off-by: H.J. 
Lu > > > --- > > > gcc/config/i386/i386.cc | 11 +++ > > > gcc/doc/tm.texi | 8 > > > gcc/doc/tm.texi.in | 2 ++ > > > gcc/ira-color.cc| 3 +-- > > > gcc/target.def | 12 > > > gcc/targhooks.cc| 8 > > > gcc/targhooks.h | 1 + > > > 7 files changed, 43 insertions(+), 2 deletions(-) > > > > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > > > index f89201684a8..3128973ba79 100644 > > > --- a/gcc/config/i386/i386.cc > > > +++ b/gcc/config/i386/i386.cc > > > @@ -20600,6 +20600,14 @@ ix86_class_likely_spilled_p (reg_class_t rclass) > > >return false; > > > } > > > > > > +/* Implement TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE. */ > > > + > > > +static int > > > +ix86_ira_callee_saved_register_cost_scale (int) > > > +{ > > > + return 1; > > > +} > > > + > > > /* Return true if a set of DST by the expression SRC should be allowed. > > > This prevents complex sets of likely_spilled hard regs before split1. > > > */ > > > > > > @@ -27078,6 +27086,9 @@ ix86_libgcc_floating_mode_supported_p > > > #define TARGET_PREFERRED_OUTPUT_RELOAD_CLASS > > > ix86_preferred_output_reload_class > > > #undef TARGET_CLASS_LIKELY_SPILLED_P > > > #define TARGET_CLASS_LIKELY_SPILLED_P ix86_class_likely_spilled_p > > > +#undef TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE > > > +#define TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE \ > > > + ix86_ira_callee_saved_register_cost_scale > > > > > > #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST > > > #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \ > > > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi > > > index 0de24eda6f0..9f42913a4ef 100644 > > > --- a/gcc/doc/tm.texi > > > +++ b/gcc/doc/tm.texi > > > @@ -3047,6 +3047,14 @@ A target hook which can change allocno class for > > > given pseudo from > > >The default version of this target hook always returns given class. 
> > > @end deftypefn > > > > > > +@deftypefn {Target Hook} int TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE > > > (int @var{hard_regno}) > > > +A target hook which returns the callee-saved register @var{hard_regno} > > > +cost scale in epilogue and prologue used by IRA. > > > + > > > +The default version of this target hook returns 1 if optimizing for > > > +size, otherwise returns the entry block frequency. > > > +@end deftypefn > > > + > > > @deftypefn {Target Hook} bool TARGET_LRA_P (void) > > > A target hook which returns true if we use LRA instead of reload pass. > > > > > > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in > > > index 631d04131e3..6dbe22581ca 100644 > > > --- a/gcc/doc/tm.texi.in > > > +++ b/gcc/doc/tm.texi.in > > > @@ -2388,6 +2388,8 @@ in the reload pass. > > > > > > @hook TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS > > > > > > +@hook TARGET_IRA_CALLEE_SAVED_REGIST
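The contract documented in the tm.texi hunk above can be restated as a tiny model. The parameters here are assumptions for illustration only: the real default hook takes just `hard_regno` and derives the size setting and the entry block frequency from the current function, and `default_cost_scale` / `x86_cost_scale` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical model of the documented default: 1 when optimizing for
// size, otherwise the entry block frequency (passed in explicitly here).
int default_cost_scale (bool optimize_for_size, int entry_block_freq)
{
  return optimize_for_size ? 1 : entry_block_freq;
}

// Mirrors the x86 override in the patch: ignores hard_regno and always
// returns 1, restoring the pre-3b9b8d6cfdf behaviour on that target.
int x86_cost_scale (int /* hard_regno */)
{
  return 1;
}
```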
Re: Patch held up in gcc-patches due to size
On Feb 02 2025, Thomas Koenig wrote: > I sent https://gcc.gnu.org/pipermail/fortran/2025-February/061670.html > to gcc-patches also, as normal, but got back an e-mail that it > was too large. and that a moderator would look at it. The mail has been accepted anyway: https://gcc.gnu.org/pipermail/gcc-patches/2025-February/674931.html -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different."
Re: [PATCH 31/61] Improve aligned straight line memcpy
On Mon, Feb 03, 2025 at 10:36:15AM +0100, Richard Biener wrote: > > --- a/gcc/config/mips/mips.cc > > +++ b/gcc/config/mips/mips.cc > > @@ -9631,7 +9631,13 @@ mips_expand_block_move (rtx dest, rtx src, rtx > > length, rtx alignment) > > { > >if (ISA_HAS_COPY) > > return mips16_expand_copy (dest, src, length, alignment); > > - else if (INTVAL (length) <= MIPS_MAX_MOVE_BYTES_PER_LOOP_ITER) > > + else if (INTVAL (length) <= MIPS_MAX_MOVE_BYTES_PER_LOOP_ITER > > + /* We increase slightly the maximum number of bytes in > > + a straight-line block if the source and destination > > + are aligned to the register width. */ > > + || (!optimize_size > > + && INTVAL (alignment) == UNITS_PER_WORD > > + && INTVAL (length) <= MIPS_MAX_MOVE_MEM_STRAIGHT)) The formatting here doesn't follow the coding conventions. Dunno if this is the MUA, say, replacing a tab with spaces, but unlikely; e.g. the || line should be indented by one tab and 7 spaces to go under INTVAL in the else if line. See https://gcc.gnu.org/contribute.html#standards for details. Jakub
Re: Patch held up in gcc-patches due to size
Hi Thomas, On Sun, Feb 02, 2025 at 07:09:14PM +0100, Thomas Koenig via Gcc wrote: > I sent https://gcc.gnu.org/pipermail/fortran/2025-February/061670.html > to gcc-patches also, as normal, but got back an e-mail that it > was too large, and that a moderator would look at it. I think that was done, since the message is here: https://gcc.gnu.org/pipermail/gcc-patches/2025-February/674931.html I also have it in my local gcc-patches inbox. Didn't you receive it yourself through the list? > Maybe the limits can be increased a bit, sometimes patches can > be quite large, especially if they contain large test cases > or a large number of generated files. The problem is, as always, spam... Do you often find that the current limit (400K) keeps you from posting quickly to the gcc-patches list? > (Does anybody actually look at the messages, as promised in the e-mail?) I think it is done multiple times each day. The current moderators are Jeff and Marc, with help from the Sourceware volunteers monitoring postmaster. I know some of these people, including Marc and myself were at Fosdem this weekend. How long did you have to wait for your message to get to the list? Cheers, Mark
[PATCH] tree-optimization/118717 - store commoning vs. abnormals
When we sink common stores in cselim or the sink pass we have to make sure to not introduce overlapping lifetimes for abnormals used in the ref. The easiest is to avoid sinking stmts which reference abnormals at all which is what the following does. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. PR tree-optimization/118717 * tree-ssa-phiopt.cc (cond_if_else_store_replacement_1): Do not common stores referencing abnormal SSA names. * tree-ssa-sink.cc (sink_common_stores_to_bb): Likewise. * gcc.dg/torture/pr118717.c: New testcase. --- gcc/testsuite/gcc.dg/torture/pr118717.c | 41 + gcc/tree-ssa-phiopt.cc | 4 ++- gcc/tree-ssa-sink.cc| 4 ++- 3 files changed, 47 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/torture/pr118717.c diff --git a/gcc/testsuite/gcc.dg/torture/pr118717.c b/gcc/testsuite/gcc.dg/torture/pr118717.c new file mode 100644 index 000..42dc5ec84f2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/pr118717.c @@ -0,0 +1,41 @@ +/* { dg-do compile } */ + +void jj(void); +int ff1(void) __attribute__((__returns_twice__)); +struct s2 { + int prev; +}; +typedef struct s1 { + unsigned interrupt_flag; + unsigned interrupt_mask; + int tag; + int state; +}s1; +int ff(void); +static inline +int mm(s1 *ec) { + if (ff()) +if (ec->interrupt_flag & ~(ec)->interrupt_mask) + return 0; +} +void ll(s1 *ec) { + int t = 1; + int state; + if (t) + { +{ + s1 *const _ec = ec; + struct s2 _tag = {0}; + if (ff1()) + state = ec->state; + else + state = 0; + if (!state) + mm (ec); + _ec->tag = _tag.prev; +} +if (state) + __builtin_exit(0); + } + jj(); +} diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc index 64d3ba9e160..f67f52d2d69 100644 --- a/gcc/tree-ssa-phiopt.cc +++ b/gcc/tree-ssa-phiopt.cc @@ -3646,7 +3646,9 @@ cond_if_else_store_replacement_1 (basic_block then_bb, basic_block else_bb, || else_assign == NULL || !gimple_assign_single_p (else_assign) || gimple_clobber_p (else_assign) - || gimple_has_volatile_ops (else_assign)) + || 
gimple_has_volatile_ops (else_assign) + || stmt_references_abnormal_ssa_name (then_assign) + || stmt_references_abnormal_ssa_name (else_assign)) return false; lhs = gimple_assign_lhs (then_assign); diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc index e79762b9848..959e0d5c6be 100644 --- a/gcc/tree-ssa-sink.cc +++ b/gcc/tree-ssa-sink.cc @@ -36,6 +36,7 @@ along with GCC; see the file COPYING3. If not see #include "cfgloop.h" #include "tree-eh.h" #include "tree-ssa-live.h" +#include "tree-dfa.h" /* TODO: 1. Sinking store only using scalar promotion (IE without moving the RHS): @@ -516,7 +517,8 @@ sink_common_stores_to_bb (basic_block bb) gimple *def = SSA_NAME_DEF_STMT (arg); if (! is_gimple_assign (def) || stmt_can_throw_internal (cfun, def) - || (gimple_phi_arg_edge (phi, i)->flags & EDGE_ABNORMAL)) + || (gimple_phi_arg_edge (phi, i)->flags & EDGE_ABNORMAL) + || stmt_references_abnormal_ssa_name (def)) { /* ??? We could handle some cascading with the def being another PHI. We'd have to insert multiple PHIs for -- 2.43.0
Re: Patch held up in gcc-patches due to size
February 3, 2025 at 11:02 AM, "Mark Wielaard" wrote: > > (Does anybody actually look at the messages, as promised in the e-mail?) > > > I think it is done multiple times each day. The current moderators are > Jeff and Marc, with help from the Sourceware volunteers monitoring > postmaster. I know some of these people, including Marc and myself > were at Fosdem this weekend. How long did you have to wait for your > message to get to the list? Hello, I usually look at the queue a few times a day (working day)... So at least in my case, I may not be very active during the weekends (even less so this weekend)... As for unlocking too-big patches, I happen to accept the ones that are "close" to the limit. I think I asked last year about the big translation patches and someone (Joseph IIRC) told me that it was ok to accept them. Should I be more strict and reject anything above the limit? Marc PS: would be nice if git-send-email could take care of this...
Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale
Richard Biener writes: > On Mon, Feb 3, 2025 at 7:23 AM H.J. Lu wrote: >> >> commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b >> Author: Surya Kumari Jangala >> Date: Tue Jun 25 08:37:49 2024 -0500 >> >> ira: Scale save/restore costs of callee save registers with block >> frequency >> >> scales the cost of saving/restoring a callee-save hard register in epilogue >> and prologue with the entry block frequency, which, if not optimizing for >> size, is 1, for all targets. As the result, callee-saved registers >> may not be used to preserve local variable values across calls on some >> targets, like x86. Add a target hook for the callee-saved register cost >> scale in epilogue and prologue used by IRA. The default version of this >> target hook returns 1 if optimizing for size, otherwise returns the entry >> block frequency. Add an x86 version of this target hook to restore the >> old behavior prior to the above commit. >> >> PR rtl-optimization/111673 >> PR rtl-optimization/115932 >> PR rtl-optimization/116028 >> PR rtl-optimization/117081 >> PR rtl-optimization/117082 >> PR rtl-optimization/118497 >> * ira-color.cc (assign_hard_reg): Call the target hook for the >> callee-saved register cost scale in epilogue and prologue. >> * target.def (ira_callee_saved_register_cost_scale): New target >> hook. >> * targhooks.cc (default_ira_callee_saved_register_cost_scale): >> New. >> * targhooks.h (default_ira_callee_saved_register_cost_scale): >> Likewise. >> * config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale): >> New. >> (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Likewise. >> * doc/tm.texi: Regenerated. >> * doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): >> New. >> >> Signed-off-by: H.J. 
Lu >> --- >> gcc/config/i386/i386.cc | 11 +++ >> gcc/doc/tm.texi | 8 >> gcc/doc/tm.texi.in | 2 ++ >> gcc/ira-color.cc| 3 +-- >> gcc/target.def | 12 >> gcc/targhooks.cc| 8 >> gcc/targhooks.h | 1 + >> 7 files changed, 43 insertions(+), 2 deletions(-) >> >> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc >> index f89201684a8..3128973ba79 100644 >> --- a/gcc/config/i386/i386.cc >> +++ b/gcc/config/i386/i386.cc >> @@ -20600,6 +20600,14 @@ ix86_class_likely_spilled_p (reg_class_t rclass) >>return false; >> } >> >> +/* Implement TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE. */ >> + >> +static int >> +ix86_ira_callee_saved_register_cost_scale (int) >> +{ >> + return 1; >> +} >> + >> /* Return true if a set of DST by the expression SRC should be allowed. >> This prevents complex sets of likely_spilled hard regs before split1. */ >> >> @@ -27078,6 +27086,9 @@ ix86_libgcc_floating_mode_supported_p >> #define TARGET_PREFERRED_OUTPUT_RELOAD_CLASS >> ix86_preferred_output_reload_class >> #undef TARGET_CLASS_LIKELY_SPILLED_P >> #define TARGET_CLASS_LIKELY_SPILLED_P ix86_class_likely_spilled_p >> +#undef TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE >> +#define TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE \ >> + ix86_ira_callee_saved_register_cost_scale >> >> #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST >> #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \ >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi >> index 0de24eda6f0..9f42913a4ef 100644 >> --- a/gcc/doc/tm.texi >> +++ b/gcc/doc/tm.texi >> @@ -3047,6 +3047,14 @@ A target hook which can change allocno class for >> given pseudo from >>The default version of this target hook always returns given class. >> @end deftypefn >> >> +@deftypefn {Target Hook} int TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE >> (int @var{hard_regno}) >> +A target hook which returns the callee-saved register @var{hard_regno} >> +cost scale in epilogue and prologue used by IRA. 
>> + >> +The default version of this target hook returns 1 if optimizing for >> +size, otherwise returns the entry block frequency. >> +@end deftypefn >> + >> @deftypefn {Target Hook} bool TARGET_LRA_P (void) >> A target hook which returns true if we use LRA instead of reload pass. >> >> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in >> index 631d04131e3..6dbe22581ca 100644 >> --- a/gcc/doc/tm.texi.in >> +++ b/gcc/doc/tm.texi.in >> @@ -2388,6 +2388,8 @@ in the reload pass. >> >> @hook TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS >> >> +@hook TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE >> + >> @hook TARGET_LRA_P >> >> @hook TARGET_REGISTER_PRIORITY >> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc >> index 0699b349a1a..233060e1587 100644 >> --- a/gcc/ira-color.cc >> +++ b/gcc/ira-color.cc >> @@ -2180,8 +2180,7 @@ assign_hard_reg (ira_allocno_t a, bool retry_p) >> + ira_memory_move_cost[mode][rclass][1]) >> * saved_nregs / hard_regno_nregs (hard_r
[PATCH] c++/79786 - bogus invocation of DATA_ABI_ALIGNMENT macro
The first argument is supposed to be a type, not a decl. Bootstrap & regtest running on x86_64-unknown-linux-gnu. OK? PR c++/79786 gcc/cp/ * rtti.cc (emit_tinfo_decl): Fix DATA_ABI_ALIGNMENT invocation. --- gcc/cp/rtti.cc | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/gcc/cp/rtti.cc b/gcc/cp/rtti.cc index 2dfc2e3d7c5..dcf84f17163 100644 --- a/gcc/cp/rtti.cc +++ b/gcc/cp/rtti.cc @@ -1741,7 +1741,8 @@ emit_tinfo_decl (tree decl) /* Avoid targets optionally bumping up the alignment to improve vector instruction accesses, tinfo are never accessed this way. */ #ifdef DATA_ABI_ALIGNMENT - SET_DECL_ALIGN (decl, DATA_ABI_ALIGNMENT (decl, TYPE_ALIGN (TREE_TYPE (decl)))); + SET_DECL_ALIGN (decl, DATA_ABI_ALIGNMENT (TREE_TYPE (decl), + TYPE_ALIGN (TREE_TYPE (decl)))); DECL_USER_ALIGN (decl) = true; #endif return true; -- 2.43.0
Re: [PATCH 0/61] Improve Mips target
Richard Biener writes: > On Fri, Jan 31, 2025 at 6:18 PM Aleksandar Rakic > wrote: >> >> This patch series improves the support for the mips64r6 target in GCC, >> includes the enhancements to the general bug fixes and contains other >> MIPS ISA and processor enablement. >> >> These patches are cherry-picked from the mips_rel/11_2_0/master >> and mips_rel/9_3_0/master branches from the MIPS' repository: >> https://github.com/MIPS/gcc . >> Further details on the individual changes are included in the >> respective patches. > > Please split up this series at least into patches that solely affect mips/ > and send patches that touch middle-end parts separately. A 61 patches > series is unlikely to be looked at this way. Sorry to ask, but what about the copyright assignment/DCO side of things? Is it ok to assume that all these patches are covered by MTI's copyright assignment with the FSF, even though MTI didn't submit the patches themselves? (Genuine question, not trying to imply a particular answer.) Thanks, Richard
Re: [PATCH] Fortran: different character lengths in array constructor [PR93289]
Steve Kargl writes: > On Sat, Feb 01, 2025 at 09:49:17PM +0100, Harald Anlauf wrote: >> On 01.02.25 at 21:03, Steve Kargl wrote: >> > On Sat, Feb 01, 2025 at 07:25:51PM +0100, Harald Anlauf wrote: >> > > >> > > the attached patch downgrades different constant character lengths in an >> > > array constructor from a GNU to a legacy extension, so that users get a >> > > warning with -std=gnu. We continue to generate an error when standard >> > > conformance is requested. >> > > >> > > Regtested on x86_64-pc-linux-gnu (found one testcase where this >> > > triggered... :) >> > > >> > > OK for mainline? >> > > >> > >> > My vote is 'no'. >> > >> > This is either a GNU extension or an error. It is certainly >> > not a legacy issue as array constructors simply cannot appear in >> > old moldy *legacy* codes. >> >> legacy /= moldy. >> >> My intention is to downgrade existing, potentially dangerous >> GNU extensions (like this one) carefully to "legacy", but not >> with an axe. >> >> > I would be in favor of making it a hard error. If you believe >> > gfortran must be able to compile invalid source, then add an option >> > such as -fallow-invalid-scalar-character-entities-in-array-constructor. >> >> I don't see why we shall scare users by making code that is currently >> accepted silently, because it is a GNU extension, suddenly to a hard >> error. >> >> So why must we be so tough? >> > > Because -std=legacy allows a whole bunch of garbage. > > Instead of fixing broken code, a user will slap -std=legacy > in a Makefile and move on. Then years from now, you'll see > -std=legacy in a whole bunch of Makefiles whether it is needed > or not. See -maligned-double and -fallow-argument-mismatch as > poster children. I agree that this is what will happen. But for people running benchmarks, it's kind-of (kind-of) a feature. 
Benchmarks tend to include relatively old code by the time that they're released, and benchmarks continue to be relevant (or at least widely tested) after they're out of maintenance. So it has been really useful to have -std=legacy accept old, dangerous code, since it means that we can continue to test old benchmarks with newer compilers. Improving the benchmark source to avoid the dangerous constructs would invalidate the test and make it harder to compare with historical results. > Again, just my $0.02. Same here, just wanted to raise the benchmark use case. Thanks, Richard
Re: Patch held up in gcc-patches due to size
On Mon, 3 Feb 2025 at 10:27, Marc Poulhiès wrote: > > I usually look at the queue a few times a day (working day)... So at least in > my case, I may not be very active during the weekends (even less so this > weekend)... > As for unlocking too-big patches, I happen to accept the ones that are > "close" to the limit. I think I asked last year about the big translation > patches and someone (Joseph IIRC) told me that it was ok to accept them. > Should I be more strict and reject anything above the limit? I think if it's a real patch, not spam, then it's OK to accept it. The limit is there partly to stop spam with large PDF/docx/exe attachments that we never want on the lists. The fact that the limits might also make people think twice before sending half a megabyte of text to hundreds of people's inboxes is a useful secondary effect IMHO :-) Very, very few people who receive 500kB of generated code or testcases are actually going to review all of that. On the other hand, if patchwork and the automated CI can't handle compressed attachments (can they?) then gzipping things causes other problems.
Re: Patch held up in gcc-patches due to size
On Mon, Feb 03, 2025 at 10:55:10AM +, Jonathan Wakely via Gcc wrote: > On Mon, 3 Feb 2025 at 10:27, Marc Poulhiès wrote: > > > > I usually look at the queue a few times a day (working day)... So at least > > in my case, I may not be very active during the weekends (even less so this > > weekend)... > > As for unlocking too-big patches, I happen to accept the ones that are > > "close" to the limit. I think I asked last year about the big translation > > patches and someone (Joseph IIRC) told me that it was ok to accept them. > > Should I be more strict and reject anything above the limit? > > I think if it's a real patch, not spam, then it's OK to accept it. The And if the sender has not tried to send it (almost) immediately split up as a patch series or gzipped etc. In that case letting the large patch through would be just a waste of bandwidth. Jakub
Re: [PATCH] arm: testsuite: Adapt mve-vabs.c to improved codegen
On Sun, 2 Feb 2025 at 21:18, Thiago Jung Bauermann wrote: > > Since commit r15-491-gc290e6a0b7a9de this failure happens on > armv8l-linux-gnueabihf and arm-eabi: > > Running gcc:gcc.target/arm/simd/simd.exp ... > gcc.target/arm/simd/mve-vabs.c: memmove found 0 times > FAIL: gcc.target/arm/simd/mve-vabs.c scan-assembler-times memmove 3 > > In PR target/116010, Andrew Pinski noted that > "gcc.target/arm/simd/mve-vabs.c now calls memcpy because of the restrict > instead of memmove. That should be a simple fix there." > > Therefore change the test to expect memcpy rather than memmove. > > Another change is that memcpy is inlined rather than called, so also change > the test to check the optimized tree dump rather than the generated > assembly. > > Tested on armv8l-linux-gnueabihf and arm-eabi. > LGTM, thanks. Christophe > gcc/testsuite/ChangeLog: > PR target/116010 > * gcc.target/arm/simd/mve-vabs.c: Test tree dump and adjust to new > code. > > Suggested-by: Andrew Pinski > --- > gcc/testsuite/gcc.target/arm/simd/mve-vabs.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vabs.c > b/gcc/testsuite/gcc.target/arm/simd/mve-vabs.c > index f2f9ee349906..e85d0b18ee71 100644 > --- a/gcc/testsuite/gcc.target/arm/simd/mve-vabs.c > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vabs.c > @@ -1,7 +1,7 @@ > /* { dg-do assemble } */ > /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */ > /* { dg-add-options arm_v8_1m_mve_fp } */ > -/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */ > +/* { dg-additional-options "-O3 -funsafe-math-optimizations > -fdump-tree-optimized" } */ > > #include > #include > @@ -35,10 +35,10 @@ FUNC_FLOAT(f, float, 32, 4, vabs) > FUNC(f, float, 16, 8, vabs) > > /* Taking the absolute value of an unsigned value is a no-op, so half of the > - integer optimizations actually generate a call to memmove, the other ones > a > + integer optimizations actually generate a call to 
memcpy, the other ones a > 'vabs'. */ > /* { dg-final { scan-assembler-times {vabs.s[0-9]+\tq[0-9]+, q[0-9]+} 3 } } > */ > /* { dg-final { scan-assembler-times {vabs.f[0-9]+\tq[0-9]+, q[0-9]+} 2 } } > */ > /* { dg-final { scan-assembler-times {vldr[bhw].[0-9]+\tq[0-9]+} 5 } } */ > /* { dg-final { scan-assembler-times {vstr[bhw].[0-9]+\tq[0-9]+} 5 } } */ > -/* { dg-final { scan-assembler-times {memmove} 3 } } */ > +/* { dg-final { scan-tree-dump-times "memcpy" 3 "optimized" } } */
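A rough C++ model of why the unsigned cases collapse to a copy: for an unsigned element, |x| is x itself, so the loop only moves data, and because both pointers are restrict-qualified the compiler may assume the buffers do not overlap and use memcpy instead of memmove. The function names here are hypothetical and only mimic the shape of the test's loops; the actual FUNC macro in mve-vabs.c is not reproduced in the thread, and `__restrict` is a GCC/Clang extension in C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Generic "absolute value". For an unsigned T the comparison is always
// false, so this is the identity function.
template <typename T>
T toy_abs(T x) { return x < T(0) ? T(-x) : x; }

// Hypothetical model of one unsigned loop from the test. Since toy_abs is
// the identity here, the loop is a plain element copy. __restrict promises
// that dest and a do not overlap, which is what permits memcpy over memmove.
void vabs_u32(std::uint32_t *__restrict dest,
              const std::uint32_t *__restrict a, std::size_t n) {
  for (std::size_t i = 0; i < n; i++)
    dest[i] = toy_abs(a[i]);
}

// What the optimizer is allowed to turn the loop above into.
void vabs_u32_as_memcpy(std::uint32_t *__restrict dest,
                        const std::uint32_t *__restrict a, std::size_t n) {
  std::memcpy(dest, a, n * sizeof *a);
}
```

Without the restrict qualifiers the buffers could alias, and only memmove would preserve the loop's semantics in general, which is consistent with the test previously expecting memmove.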
Re: [committed][rtl-optimization/116244] Don't create bogus regs in alter_subreg
Jeff Law writes: >>> Focusing on this insn: >>> (insn 77 75 80 6 (parallel [ (set (reg:DI 75 [ _32 ]) (plus:DI (reg:DI 73 [ _31 ]) (subreg:DI (reg/v:SI 41 [ __n ]) 0))) (clobber (scratch:SI)) ]) "j.C":50:38 discrim 1 155 {adddi3} (expr_list:REG_DEAD (reg:DI 73 [ _31 ]) (expr_list:REG_DEAD (reg/v:SI 41 [ __n ]) (nil)))) >>> >>> Not surprisingly we're focused on the subreg expression in there. >>> >>> The first checkpoint in my mind is IRA's allocation where we assign it >>> to reg 0. >>> >>> Popping a0(r41,l0) -- assign reg 0 >>> >>> >>> So given the use inside a paradoxical subreg, do we consider this valid? >>> >>> After the discussion from last week, I'm leaning a bit more towards no >>> than before. >> >> I thought it wasn't valid. AIUI, there are two mechanisms that try >> to prevent it: >> >> - valid_mode_changes_for_regno, which says which hard registers can >>form all subregs required by a pseudo. This is only used to restrict >>class choices though, rather than forbid individual registers. >> >> - This code in ira_build_conflicts: >> >>/* Now we deal with paradoxical subreg cases where certain registers >> cannot be accessed in the widest mode. */ >>machine_mode outer_mode = ALLOCNO_WMODE (a); >>machine_mode inner_mode = ALLOCNO_MODE (a); >>if (paradoxical_subreg_p (outer_mode, inner_mode)) >> { >>enum reg_class aclass = ALLOCNO_CLASS (a); >>for (int j = ira_class_hard_regs_num[aclass] - 1; j >= 0; --j) >> { >> int inner_regno = ira_class_hard_regs[aclass][j]; >> int outer_regno = simplify_subreg_regno (inner_regno, >> inner_mode, 0, >> outer_mode); >> if (outer_regno < 0 >> || !in_hard_reg_set_p (reg_class_contents[aclass], >>outer_mode, outer_regno)) >> { >> SET_HARD_REG_BIT (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj), >> inner_regno); >> SET_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj), >> inner_regno); >> } >> } >> } >> >>which operates at the level of individual registers.
>> >> So yeah, I think the first question is why ira_build_conflicts isn't >> kicking in for this register or (if it is) why we still get register 0. > So pulling on this thread leads me into the code that sets up > ALLOCNO_WMODE in create_insn_allocnos: > >> if ((a = ira_curr_regno_allocno_map[regno]) == NULL) >> { >> a = ira_create_allocno (regno, false, ira_curr_loop_tree_node); >> if (outer != NULL && GET_CODE (outer) == SUBREG) >> { >> machine_mode wmode = GET_MODE (outer); >> if (partial_subreg_p (ALLOCNO_WMODE (a), wmode)) >> ALLOCNO_WMODE (a) = wmode; >> } >> } > Note how we only set ALLOCNO_WMODE at allocno creation, so it'll > work as intended if and only if the first reference is via a SUBREG. Huh, yeah, I agree that that looks wrong. > ISTM the fix here is to always do the check and set ALLOCNO_WMODE. > > The other bug I see is that we may potentially have paradoxicals in > different modes. i.e., on a 32-bit target, we could in theory have a > paradoxical in DI and another in TI. So in addition to pulling that > code out of the conditional so that it executes every time, the > assignment would look like > > if (partial_subreg_p (ALLOCNO_WMODE (a), wmode) > && wmode > ALLOCNO_WMODE (a)) >ALLOCNO_WMODE (a) = wmode; > > Or something along those lines. Not sure about this part though. The construct: if (partial_subreg_p (ALLOCNO_WMODE (a), wmode)) ALLOCNO_WMODE (a) = wmode; is effectively: ALLOCNO_WMODE (a) = MAX_SIZE (ALLOCNO_WMODE (a), wmode); and so already picks the single widest mode, if there is one. For things like DI vs DF, it will use the existing mode as a tie-breaker. So ISTM that moving the code out of the "if (... == NULL)" should be enough on its own. > And it all makes sense that you caught this. You and another colleague > at ARM were trying to address this exact problem ~11 years ago ;-) Heh, thought it sounded familiar :) Richard
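To make the bookkeeping concrete, here is a toy C++ model (not GCC code) in which a "mode" is just its size in bytes. It contrasts updating the widest-mode field only when the allocno is first created with checking it on every reference, as the thread proposes. The strict "greater than" test stands in for `partial_subreg_p`, which is also why an equal-sized mode (the DI vs DF case) keeps the existing value.

```cpp
#include <cassert>
#include <vector>

// Toy model: a "mode" is only its size in bytes. Not GCC's machine_mode.
struct ToyAllocno {
  int mode;   // size of the pseudo's own mode (plays ALLOCNO_MODE)
  int wmode;  // widest size seen in any reference (plays ALLOCNO_WMODE)
};

// One reference to the pseudo: a plain REG use (outer_size == 0) or a
// SUBREG use whose outer mode has outer_size bytes.
struct ToyRef { int outer_size; };

// Buggy shape: only the first reference (the one that "creates" the
// allocno) is allowed to widen wmode.
ToyAllocno scan_update_on_create(int mode, const std::vector<ToyRef> &refs) {
  ToyAllocno a{mode, mode};
  bool created = false;
  for (const ToyRef &r : refs) {
    if (!created) {
      created = true;
      if (r.outer_size > a.wmode)  // stands in for partial_subreg_p
        a.wmode = r.outer_size;
    }
  }
  return a;
}

// Fixed shape: every reference is checked, so wmode ends up as the maximum
// size over all references; ties keep the existing value.
ToyAllocno scan_update_always(int mode, const std::vector<ToyRef> &refs) {
  ToyAllocno a{mode, mode};
  for (const ToyRef &r : refs)
    if (r.outer_size > a.wmode)
      a.wmode = r.outer_size;
  return a;
}
```

With a 4-byte pseudo first used as a plain register and later as an 8-byte paradoxical subreg, the first variant leaves the widest mode at 4 while the second records 8; with 8- and 16-byte subregs the running maximum already picks 16, matching Richard's point that no extra comparison is needed.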
Re: [PATCH] ira: Cap callee-saved register cost scale to 300
On Sun, Feb 2, 2025 at 9:29 AM H.J. Lu wrote: > > On Sun, Feb 2, 2025 at 4:20 PM Richard Biener > wrote: > > > > > > > > > On 02.02.2025 at 08:59, H.J. Lu wrote: > > > > > > On Sun, Feb 2, 2025 at 3:33 PM Richard Biener > > > wrote: > > >> > > >> > > >> > > On 02.02.2025 at 08:00, H.J. Lu wrote: > > >>> > > >>> Don't increase callee-saved register cost by 1000x, which leads to that > > >>> callee-saved registers aren't used to preserve local variable values > > >>> across calls, by capping the scale to 300. > > >> > > >>> PR rtl-optimization/111673 > > >>> PR rtl-optimization/115932 > > >>> PR rtl-optimization/116028 > > >>> PR rtl-optimization/117081 > > >>> PR rtl-optimization/118497 > > >>> * ira-color.cc (assign_hard_reg): Cap callee-saved register cost > > >>> scale to 300. > > >>> > > >>> Signed-off-by: H.J. Lu > > >>> --- > > >>> gcc/ira-color.cc | 16 ++-- > > >>> 1 file changed, 14 insertions(+), 2 deletions(-) > > >>> > > >>> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc > > >>> index 0699b349a1a..707ff188250 100644 > > >>> --- a/gcc/ira-color.cc > > >>> +++ b/gcc/ira-color.cc > > >>> @@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool retry_p) > > >>> /* We need to save/restore the hard register in > > >>>epilogue/prologue. Therefore we increase the cost. */ > > >>> { > > >>> +int scale; > > >>> +if (optimize_size) > > >>> + scale = 1; > > >>> +else > > >>> + { > > >>> +scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)); > > >>> +/* Don't increase callee-saved register cost by 1000x, > > >>> + which leads to that callee-saved registers aren't > > >>> + used to preserve local variable values across calls, > > >>> + by capping the scale to 300. */ > > >>> +if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX) > > >>> + scale = 300; > > >> > > >> That leads to 300 for 1000 but 999 for 999 which is odd. I’d have > > >> expected to scale this down to [0, 300] or is MAX a magic value?
> > > > > > There are > > > > > > /* The weights for each insn varies from 0 to REG_FREQ_BASE. This constant does not need to be high, as in infrequently executed regions we want to count instructions equivalently to optimize for size instead of speed. */ #define REG_FREQ_MAX 1000 > > > > > > /* Compute register frequency from the BB frequency. When optimizing for size, or profile driven feedback is available and the function is never executed, frequency is always equivalent. Otherwise rescale the basic block frequency. */ #define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun) || !cfun->cfg->count_max.initialized_p ()) ? REG_FREQ_MAX : ((bb)->count.to_frequency (cfun) * REG_FREQ_MAX / BB_FREQ_MAX) ? ((bb)->count.to_frequency (cfun) * REG_FREQ_MAX / BB_FREQ_MAX) : 1) > > > > > > 1000 is the default. If it isn't 1000, it isn't the default. I only want > > > to get a more reasonable default scale, instead of 1000. Lower > > > scale will fail the PR rtl-optimization/111673 test on powerpc64. > > > > I see. Why not adjust the above macro then? That would be a bit more > > obvious. Like use MAX/2 or so? > > commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b > Author: Surya Kumari Jangala > Date: Tue Jun 25 08:37:49 2024 -0500 > > ira: Scale save/restore costs of callee save registers with block > frequency > > uses REG_FREQ_FROM_BB as the cost scale. I don't know if it is a misuse. > I don't want to change REG_FREQ_FROM_BB since it is used in other places, > not as a cost scale. Maybe the above commit should be reverted and we add > a target hook for callee-saved register cost scale. Each target can choose > a proper cost scale, instead of increasing the cost by 1000x for everyone. I believe testing cfun->cfg->count_max.initialized_p () is a bit odd at least, as it doesn't seem to be used.
The comment talks about profile feedback, but for example with -fprofile-correction or -fpartial-profile this test looks odd. In fact optimize_function_for_size_p should already handle this correctly. Also REG_FREQ_FROM_BB simply documents that in this case the frequency will be equivalent for all BBs and not any particular value. The new use might indeed not have the same constraints as others, instead of a target hook making the "same value" another macro argument might be a good first step. That said - does removing the || !cfun->cfg->count
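The discontinuity Richard Biener points out is easy to see numerically. The sketch below compares the v1 rule (cap only the exact value REG_FREQ_MAX == 1000 down to 300) with a proportional mapping of [1, 1000] onto [1, 300]. The value 300 comes from the patch; the proportional function is only a hypothetical illustration of the "scale this down to [0, 300]" suggestion, not code posted in the thread, and its lower bound of 1 mirrors the `: 1` arm of REG_FREQ_FROM_BB.

```cpp
#include <cassert>

constexpr int kRegFreqMax = 1000;  // default REG_FREQ_MAX, per the thread

// v1 patch rule: only the exact maximum is capped, everything else passes
// through unchanged, so 999 ends up larger than 1000.
int scale_cap_exact(int freq) {
  return freq == kRegFreqMax ? 300 : freq;
}

// Hypothetical order-preserving alternative: rescale [1, REG_FREQ_MAX]
// onto [1, cap], never going below 1.
int scale_proportional(int freq, int cap = 300) {
  int s = freq * cap / kRegFreqMax;
  return s > 0 ? s : 1;
}
```

The exact-value cap makes the scale non-monotonic (a slightly colder entry block gets a much larger callee-saved cost), whereas the proportional form keeps the relative ordering of frequencies intact.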
[PATCH] rtl-optimization/117611 - ICE in simplify_shift_const_1
The following checks that we have a scalar int shift mode before enforcing it. As AVR shows, the mode can be a signed _Accum mode as well. Bootstrap and regtest pending on x86_64-unknown-linux-gnu. OK if that succeeds? Thanks, Richard. PR rtl-optimization/117611 * combine.cc (simplify_shift_const_1): Bail if not scalar int mode. * gcc.target/avr/pr117611.c: New testcase. --- gcc/combine.cc | 6 -- gcc/testsuite/gcc.target/avr/pr117611.c | 7 +++ 2 files changed, 11 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/avr/pr117611.c diff --git a/gcc/combine.cc b/gcc/combine.cc index 90828108ba4..3beeb514b81 100644 --- a/gcc/combine.cc +++ b/gcc/combine.cc @@ -10635,8 +10635,10 @@ simplify_shift_const_1 (enum rtx_code code, machine_mode result_mode, outer_op, outer_const); } - scalar_int_mode shift_unit_mode - = as_a <scalar_int_mode> (GET_MODE_INNER (shift_mode)); + scalar_int_mode shift_unit_mode; + if (!is_a <scalar_int_mode> (GET_MODE_INNER (shift_mode), + &shift_unit_mode)) + return NULL_RTX; /* Handle cases where the count is greater than the size of the mode minus 1. For ASHIFT, use the size minus one as the count (this can diff --git a/gcc/testsuite/gcc.target/avr/pr117611.c b/gcc/testsuite/gcc.target/avr/pr117611.c new file mode 100644 index 000..c76093f12d1 --- /dev/null +++ b/gcc/testsuite/gcc.target/avr/pr117611.c @@ -0,0 +1,7 @@ +/* { dg-do compile } */ +/* { dg-options "-Os" } */ + +_Accum acc1 (_Accum x) +{ +return x << 16; +} -- 2.43.0
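The fix swaps an asserting conversion for a checked one. The toy C++ model below (hypothetical types, not GCC's machine-mode classes) shows the difference: an `as_a`-style conversion treats a class mismatch as a fatal error (in GCC, an ICE), while an `is_a`-style conversion reports the mismatch and lets the caller bail out, which is what the patch does by returning NULL_RTX when the shift mode is not a scalar integer mode, such as an _Accum mode on AVR.

```cpp
#include <cassert>
#include <cstdlib>

// Toy stand-ins for mode classes; these are NOT GCC's types.
enum class ModeClass { Int, Accum };
struct Mode { ModeClass cls; int bits; };
struct ScalarIntMode { int bits; };

// Checked conversion in the style of is_a <T> (x, &out): returns false
// instead of failing when the class does not match.
bool is_scalar_int(Mode m, ScalarIntMode *out) {
  if (m.cls != ModeClass::Int)
    return false;
  *out = ScalarIntMode{m.bits};
  return true;
}

// Asserting conversion in the style of as_a <T> (x): the caller promises
// the class matches, and a mismatch is a hard failure.
ScalarIntMode as_scalar_int(Mode m) {
  if (m.cls != ModeClass::Int)
    std::abort();
  return ScalarIntMode{m.bits};
}

// Shape of the fix: give up on non-integer modes. -1 here is only a
// stand-in for GCC's "return NULL_RTX" (no simplification).
int shift_unit_bits(Mode shift_mode) {
  ScalarIntMode unit;
  if (!is_scalar_int(shift_mode, &unit))
    return -1;
  return unit.bits;
}
```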
Re: [PATCH] Fortran: different character lengths in array constructor [PR93289]
On 2/3/25 2:49 AM, Richard Sandiford wrote: Steve Kargl writes: On Sat, Feb 01, 2025 at 09:49:17PM +0100, Harald Anlauf wrote: On 01.02.25 at 21:03, Steve Kargl wrote: On Sat, Feb 01, 2025 at 07:25:51PM +0100, Harald Anlauf wrote: the attached patch downgrades different constant character lengths in an array constructor from a GNU to a legacy extension, so that users get a warning with -std=gnu. We continue to generate an error when standard conformance is requested. Regtested on x86_64-pc-linux-gnu (found one testcase where this triggered... :) OK for mainline? My vote is 'no'. This is either a GNU extension or an error. It is certainly not a legacy issue as array constructors simply cannot appear in old moldy *legacy* codes. legacy /= moldy. My intention is to downgrade existing, potentially dangerous GNU extensions (like this one) carefully to "legacy", but not with an axe. I would be in favor of making it a hard error. If you believe gfortran must be able to compile invalid source, then add an option such as -fallow-invalid-scalar-character-entities-in-array-constructor. I don't see why we shall scare users by making code that is currently accepted silently, because it is a GNU extension, suddenly to a hard error. So why must we be so tough? Because -std=legacy allows a whole bunch of garbage. Instead of fixing broken code, a user will slap -std=legacy in a Makefile and move on. Then years from now, you'll see -std=legacy in a whole bunch of Makefiles whether it is needed or not. See -maligned-double and -fallow-argument-mismatch as poster children. I agree that this is what will happen. But for people running benchmarks, it's kind-of (kind-of) a feature. Benchmarks tend to include relatively old code by the time that they're released, and benchmarks continue to be relevant (or at least widely tested) after they're out of maintenance.
So it has been really useful to have -std=legacy accept old, dangerous code, since it means that we can continue to test old benchmarks with newer compilers. Improving the benchmark source to avoid the dangerous constructs would invalidate the test and make it harder to compare with historical results. Again, just my $0.02. Same here, just wanted to raise the benchmark use case. Thanks, Richard I think we have had a good discussion and, for the good of the order, I recommend we push this for now. The work has been done. Regards, Jerry
Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale
On Mon, Feb 3, 2025 at 6:29 PM Richard Sandiford wrote: > > Richard Biener writes: > > On Mon, Feb 3, 2025 at 7:23 AM H.J. Lu wrote: > >> > >> commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b > >> Author: Surya Kumari Jangala > >> Date: Tue Jun 25 08:37:49 2024 -0500 > >> > >> ira: Scale save/restore costs of callee save registers with block > >> frequency > >> > >> scales the cost of saving/restoring a callee-save hard register in epilogue > >> and prologue with the entry block frequency, which, if not optimizing for > >> size, is 1, for all targets. As the result, callee-saved registers > >> may not be used to preserve local variable values across calls on some > >> targets, like x86. Add a target hook for the callee-saved register cost > >> scale in epilogue and prologue used by IRA. The default version of this > >> target hook returns 1 if optimizing for size, otherwise returns the entry > >> block frequency. Add an x86 version of this target hook to restore the > >> old behavior prior to the above commit. > >> > >> PR rtl-optimization/111673 > >> PR rtl-optimization/115932 > >> PR rtl-optimization/116028 > >> PR rtl-optimization/117081 > >> PR rtl-optimization/117082 > >> PR rtl-optimization/118497 > >> * ira-color.cc (assign_hard_reg): Call the target hook for the > >> callee-saved register cost scale in epilogue and prologue. > >> * target.def (ira_callee_saved_register_cost_scale): New target > >> hook. > >> * targhooks.cc (default_ira_callee_saved_register_cost_scale): > >> New. > >> * targhooks.h (default_ira_callee_saved_register_cost_scale): > >> Likewise. > >> * config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale): > >> New. > >> (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Likewise. > >> * doc/tm.texi: Regenerated. > >> * doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): > >> New. > >> > >> Signed-off-by: H.J. 
Lu > >> --- > >> gcc/config/i386/i386.cc | 11 +++ > >> gcc/doc/tm.texi | 8 > >> gcc/doc/tm.texi.in | 2 ++ > >> gcc/ira-color.cc| 3 +-- > >> gcc/target.def | 12 > >> gcc/targhooks.cc| 8 > >> gcc/targhooks.h | 1 + > >> 7 files changed, 43 insertions(+), 2 deletions(-) > >> > >> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > >> index f89201684a8..3128973ba79 100644 > >> --- a/gcc/config/i386/i386.cc > >> +++ b/gcc/config/i386/i386.cc > >> @@ -20600,6 +20600,14 @@ ix86_class_likely_spilled_p (reg_class_t rclass) > >>return false; > >> } > >> > >> +/* Implement TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE. */ > >> + > >> +static int > >> +ix86_ira_callee_saved_register_cost_scale (int) > >> +{ > >> + return 1; > >> +} > >> + > >> /* Return true if a set of DST by the expression SRC should be allowed. > >> This prevents complex sets of likely_spilled hard regs before split1. > >> */ > >> > >> @@ -27078,6 +27086,9 @@ ix86_libgcc_floating_mode_supported_p > >> #define TARGET_PREFERRED_OUTPUT_RELOAD_CLASS > >> ix86_preferred_output_reload_class > >> #undef TARGET_CLASS_LIKELY_SPILLED_P > >> #define TARGET_CLASS_LIKELY_SPILLED_P ix86_class_likely_spilled_p > >> +#undef TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE > >> +#define TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE \ > >> + ix86_ira_callee_saved_register_cost_scale > >> > >> #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST > >> #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \ > >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi > >> index 0de24eda6f0..9f42913a4ef 100644 > >> --- a/gcc/doc/tm.texi > >> +++ b/gcc/doc/tm.texi > >> @@ -3047,6 +3047,14 @@ A target hook which can change allocno class for > >> given pseudo from > >>The default version of this target hook always returns given class. 
> >> @end deftypefn > >> > >> +@deftypefn {Target Hook} int TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE > >> (int @var{hard_regno}) > >> +A target hook which returns the callee-saved register @var{hard_regno} > >> +cost scale in epilogue and prologue used by IRA. > >> + > >> +The default version of this target hook returns 1 if optimizing for > >> +size, otherwise returns the entry block frequency. > >> +@end deftypefn > >> + > >> @deftypefn {Target Hook} bool TARGET_LRA_P (void) > >> A target hook which returns true if we use LRA instead of reload pass. > >> > >> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in > >> index 631d04131e3..6dbe22581ca 100644 > >> --- a/gcc/doc/tm.texi.in > >> +++ b/gcc/doc/tm.texi.in > >> @@ -2388,6 +2388,8 @@ in the reload pass. > >> > >> @hook TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS > >> > >> +@hook TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE > >> + > >> @hook TARGET_LRA_P > >> > >> @hook TARGET_REGISTER_PRIORITY > >> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc > >
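A minimal C++ model of what the proposed hook changes, assuming a simplified cost computation: targets fill in a table of function pointers, the generic default follows the tm.texi text (1 when optimizing for size, otherwise the entry block frequency), and the x86 override always returns 1. The struct, globals, and `scaled_save_restore_cost` are hypothetical stand-ins; GCC's real `targetm` layout and the exact formula in `assign_hard_reg` are not reproduced here, and 1000 is used only because the thread describes the default scale as 1000x.

```cpp
#include <cassert>

// Hypothetical target hook table; not GCC's actual gcc_target layout.
struct ToyTarget {
  int (*callee_saved_register_cost_scale)(int hard_regno);
};

// Inputs of the default hook, modelled as globals for brevity.
bool toy_optimize_size = false;
int toy_entry_block_freq = 1000;  // illustrative entry-block frequency

// Default behaviour as documented in the tm.texi entry: 1 when optimizing
// for size, otherwise the entry block frequency.
int default_cost_scale(int /*hard_regno*/) {
  return toy_optimize_size ? 1 : toy_entry_block_freq;
}

// x86 override from the patch: always 1.
int x86_cost_scale(int /*hard_regno*/) { return 1; }

// Simplified stand-in for the allocator side: the prologue/epilogue
// save/restore cost is multiplied by whatever the hook returns.
int scaled_save_restore_cost(const ToyTarget &t, int hard_regno,
                             int base_cost) {
  return base_cost * t.callee_saved_register_cost_scale(hard_regno);
}
```

The design question in the thread is essentially which entry of such a table is the default: keeping the frequency-based scale for every target except x86, or restoring the old behaviour everywhere until the next stage 1.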
Re: [PATCH] RX: Restrict displacement ranges in "Q" constraint
On Thu, 30 Jan 2025 00:11:01 +0900, Jeff Law wrote: > > > > On 1/29/25 3:47 AM, Yoshinori Sato wrote: > > When using the "Q" constraint in the inline assembler, the displacement value > > could exceed the range specified by the instruction. > > To avoid this issue, a displacement range check is added to the "Q" > > constraint. > > > Thanks. I've pushed this to the trunk, even though it's not a > regression as it's limited to the rx port and fixes a clear bug. > > In the future, if you could include a testcase it'd be useful. > > Thanks again, > Jeff > Thanks, The source code that caused this problem is large, so if I can make it smaller I'll add it as a test. -- Yosinori Sato
[PATCH v1 14/16] Change target_version semantics to follow ACLE specification.
This changes the behavior of the target_clones and target_version attributes to be in line with what is specified in the Arm C Language Extensions. Notably, this changes the scope and signature of multiversioned functions to that of the default version, and changes the resolver to be created at the implementation of the default version. This is achieved by changing the C++ front end to no longer resolve any non-default version decls in lookup, and by moving dispatching for default_target sets to reuse the dispatching logic for target_clones in multiple_target.cc. This also fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118313 for aarch64 and riscv. This changes the behavior of both the aarch64 and riscv targets. gcc/ChangeLog: * cgraphunit.cc (analyze_functions): Add dependency from default node to non-default versions. * ipa.cc (symbol_table::remove_unreachable_nodes): Ditto. * multiple_target.cc (ipa_target_clone): Change logic to conditionally dispatch target_clones and to dispatch some target_version sets. gcc/cp/ChangeLog: * call.cc (add_candidates): For target_version semantics don't resolve non-default versions. * class.cc (resolve_address_of_overloaded_function): Ditto. * cp-gimplify.cc (cp_genericize_r): For target_version semantics don't redirect calls to versioned functions (done later at multiple_target.cc.) * decl.cc (start_decl): Mangle and mark all non-default function decls. (start_preparsed_function): Ditto. * typeck.cc (cp_build_function_call_vec): Add error if target has no default implementation. gcc/testsuite/ChangeLog: * g++.target/aarch64/mv-1.C: Change for new semantics. * g++.target/aarch64/mv-symbols2.C: Ditto. * g++.target/aarch64/mv-symbols3.C: Ditto. * g++.target/aarch64/mv-symbols4.C: Ditto. * g++.target/aarch64/mv-symbols5.C: Ditto. * g++.target/aarch64/mvc-symbols3.C: Ditto. * g++.target/riscv/mv-symbols2.C: Ditto. * g++.target/riscv/mv-symbols3.C: Ditto. * g++.target/riscv/mv-symbols4.C: Ditto. * g++.target/riscv/mv-symbols5.C: Ditto.
* g++.target/riscv/mvc-symbols3.C: Ditto. * g++.target/aarch64/mv-symbols10.C: New test. * g++.target/aarch64/mv-symbols11.C: New test. * g++.target/aarch64/mv-symbols12.C: New test. * g++.target/aarch64/mv-symbols14.C: New test. * g++.target/aarch64/mv-symbols15.C: New test. * g++.target/aarch64/mv-symbols6.C: New test. * g++.target/aarch64/mv-symbols8.C: New test. * g++.target/aarch64/mv-symbols9.C: New test. --- gcc/cgraphunit.cc | 9 gcc/cp/call.cc| 8 gcc/cp/class.cc | 11 - gcc/cp/cp-gimplify.cc | 6 ++- gcc/cp/decl.cc| 24 ++ gcc/cp/typeck.cc | 8 gcc/ipa.cc| 11 + gcc/multiple_target.cc| 13 - gcc/testsuite/g++.target/aarch64/mv-1.C | 4 ++ .../g++.target/aarch64/mv-symbols10.C | 43 + .../g++.target/aarch64/mv-symbols11.C | 27 +++ .../g++.target/aarch64/mv-symbols12.C | 18 +++ .../g++.target/aarch64/mv-symbols14.C | 16 +++ .../g++.target/aarch64/mv-symbols15.C | 16 +++ .../g++.target/aarch64/mv-symbols2.C | 12 ++--- .../g++.target/aarch64/mv-symbols3.C | 6 +-- .../g++.target/aarch64/mv-symbols4.C | 6 +-- .../g++.target/aarch64/mv-symbols5.C | 6 +-- .../g++.target/aarch64/mv-symbols6.C | 23 + .../g++.target/aarch64/mv-symbols8.C | 48 +++ .../g++.target/aarch64/mv-symbols9.C | 46 ++ .../g++.target/aarch64/mvc-symbols3.C | 12 ++--- gcc/testsuite/g++.target/riscv/mv-symbols2.C | 12 ++--- gcc/testsuite/g++.target/riscv/mv-symbols3.C | 6 +-- gcc/testsuite/g++.target/riscv/mv-symbols4.C | 6 +-- gcc/testsuite/g++.target/riscv/mv-symbols5.C | 6 +-- gcc/testsuite/g++.target/riscv/mvc-symbols3.C | 12 ++--- 27 files changed, 368 insertions(+), 47 deletions(-) create mode 100644 gcc/testsuite/g++.target/aarch64/mv-symbols10.C create mode 100644 gcc/testsuite/g++.target/aarch64/mv-symbols11.C create mode 100644 gcc/testsuite/g++.target/aarch64/mv-symbols12.C create mode 100644 gcc/testsuite/g++.target/aarch64/mv-symbols14.C create mode 100644 gcc/testsuite/g++.target/aarch64/mv-symbols15.C create mode 100644 gcc/testsuite/g++.target/aarch64/mv-symbols6.C create mode 100644 
gcc/testsuite/g++.target/aarch64/mv-symbols8.C create mode 100644 gcc/testsuite/g++.target/aarch64/mv-symbols9.C diff --git a/gcc/cgraphunit.cc b/gcc/cgraphunit.cc index 82f205488e9..f7f8957e618 100644 ---
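As a mental model of the dispatch that the default version now anchors, the hypothetical C++ sketch below picks, among the versions whose required CPU features are all present, the one with the highest priority, with "default" requiring nothing and so always being usable. This is not GCC's generated resolver and the feature names and priorities are illustrative only; in the actual patch a call to a versioned function with no default implementation is diagnosed at compile time (cp_build_function_call_vec), whereas here that case just yields an empty result.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one function version.
struct Version {
  std::string name;
  std::vector<std::string> required;  // CPU features this version needs
  int priority;                       // higher wins among usable versions
};

// True if every feature in `need` is present in `cpu`.
bool has_all(const std::vector<std::string> &cpu,
             const std::vector<std::string> &need) {
  for (const std::string &f : need) {
    bool found = false;
    for (const std::string &c : cpu)
      if (c == f) { found = true; break; }
    if (!found)
      return false;
  }
  return true;
}

// Returns the chosen version's name, or "" if no version is usable
// (e.g. the set has no default).
std::string resolve(const std::vector<Version> &versions,
                    const std::vector<std::string> &cpu) {
  const Version *best = nullptr;
  for (const Version &v : versions)
    if (has_all(cpu, v.required) && (!best || v.priority > best->priority))
      best = &v;
  return best ? best->name : "";
}
```

Keeping the default version in the set guarantees that the resolver always has a fallback, which is one way to see why the new semantics tie the resolver to the default version's definition.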
Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale
On 2/3/25 2:31 AM, H.J. Lu wrote: IMO at this point a new target hook should preserve existing behavior by default or alternatively the original patch should be reverted as causing regressions and a new patch introducing the target hook should be installed in next stage1. I believe the original patch should be reverted. Then my patch isn't needed. That patch had significant improvements across the board for RISC-V. I wouldn't want to see it reverted without a strong explanation of why it was wrong. jeff
[PATCH v1 08/16] Add get_clone_versions function.
This is a reimplementation of get_target_clone_attr_len, get_attr_str, and separate_attrs using string_slice and auto_vec to make memory management and use simpler. gcc/c-family/ChangeLog: * c-attribs.cc (handle_target_clones_attribute): Change to use get_clone_versions. gcc/ChangeLog: * tree.cc (get_clone_versions): New function. (get_clone_attr_versions): New function. * tree.h (get_clone_versions): New function. (get_clone_attr_versions): New function. --- gcc/c-family/c-attribs.cc | 2 +- gcc/tree.cc | 40 +++ gcc/tree.h| 5 + 3 files changed, 46 insertions(+), 1 deletion(-) diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc index f3181e7b57c..642d724f6c6 100644 --- a/gcc/c-family/c-attribs.cc +++ b/gcc/c-family/c-attribs.cc @@ -6129,7 +6129,7 @@ handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args), } } - if (get_target_clone_attr_len (args) == -1) + if (get_clone_attr_versions (args).length () == 1) { warning (OPT_Wattributes, "single % attribute is ignored"); diff --git a/gcc/tree.cc b/gcc/tree.cc index 05f679edc09..346522d01c0 100644 --- a/gcc/tree.cc +++ b/gcc/tree.cc @@ -15299,6 +15299,46 @@ get_target_clone_attr_len (tree arglist) return str_len_sum; } +/* Returns an auto_vec of string_slices containing the version strings from + ARGLIST. DEFAULT_COUNT is incremented for each default version found. 
*/ + +auto_vec<string_slice> +get_clone_attr_versions (const tree arglist, int *default_count) +{ + gcc_assert (TREE_CODE (arglist) == TREE_LIST); + auto_vec<string_slice> versions; + + static const char separator_str[] = {TARGET_CLONES_ATTR_SEPARATOR, 0}; + string_slice separators = string_slice (separator_str); + + for (tree arg = arglist; arg; arg = TREE_CHAIN (arg)) +{ + string_slice str = string_slice (TREE_STRING_POINTER (TREE_VALUE (arg))); + for (string_slice attr = string_slice::strtok (&str, separators); + attr.is_valid (); attr = string_slice::strtok (&str, separators)) + { + attr = attr.strip (); + if (attr == string_slice ("default") && default_count) + (*default_count)++; + versions.safe_push (attr); + } +} + return versions; +} + +/* Returns an auto_vec of string_slices containing the version strings from + the target_clone attribute from DECL. DEFAULT_COUNT is incremented for each + default version found. */ +auto_vec<string_slice> +get_clone_versions (const tree decl, int *default_count) +{ + tree attr = lookup_attribute ("target_clones", DECL_ATTRIBUTES (decl)); + if (!attr) +return auto_vec<string_slice> (); + tree arglist = TREE_VALUE (attr); + return get_clone_attr_versions (arglist, default_count); +} + void tree_cc_finalize (void) { diff --git a/gcc/tree.h b/gcc/tree.h index 21f3cd5525c..aea1cf078a0 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -22,6 +22,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-core.h" #include "options.h" +#include "vec.h" /* Convert a target-independent built-in function code to a combined_fn. */ @@ -7035,5 +7036,9 @@ extern unsigned fndecl_dealloc_argno (tree); extern tree get_attr_nonstring_decl (tree, tree * = NULL); extern int get_target_clone_attr_len (tree); +auto_vec<string_slice> +get_clone_versions (const tree, int * = NULL); +auto_vec<string_slice> +get_clone_attr_versions (const tree, int * = NULL); #endif /* GCC_TREE_H */
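For readers unfamiliar with GCC's `string_slice`, the same splitting logic can be sketched with `std::string_view`: split each attribute argument on a separator, strip surrounding whitespace, optionally count "default" entries, and collect non-owning views into the original strings. This is a hypothetical analogue, not the patch's code; the separator is passed in rather than taken from TARGET_CLONES_ATTR_SEPARATOR, and keeping empty pieces (for example after a trailing separator) is a choice of this sketch that may differ from `string_slice::strtok`.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Trim leading and trailing whitespace without copying.
std::string_view strip(std::string_view s) {
  const char *ws = " \t\n";
  std::size_t b = s.find_first_not_of(ws);
  if (b == std::string_view::npos)
    return {};
  std::size_t e = s.find_last_not_of(ws);
  return s.substr(b, e - b + 1);
}

// Hypothetical analogue of get_clone_attr_versions: every returned view
// points into the caller's strings, so no memory is allocated per version.
std::vector<std::string_view>
clone_versions(const std::vector<std::string_view> &args, char sep,
               int *default_count = nullptr) {
  std::vector<std::string_view> versions;
  for (std::string_view arg : args) {
    while (true) {
      std::size_t pos = arg.find(sep);
      std::string_view piece = strip(arg.substr(0, pos));
      if (piece == "default" && default_count)
        ++*default_count;
      versions.push_back(piece);
      if (pos == std::string_view::npos)
        break;
      arg.remove_prefix(pos + 1);
    }
  }
  return versions;
}
```

As with the patch, the views are only valid while the underlying strings live; in GCC those are the STRING_CST nodes of the attribute arguments.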
Re: Patch held up in gcc-patches due to size
On 2/2/25 11:09 AM, Thomas Koenig wrote:
> Hi,
>
> I sent https://gcc.gnu.org/pipermail/fortran/2025-February/061670.html
> to gcc-patches also, as normal, but got back an e-mail that it was too
> large and that a moderator would look at it.
>
> Maybe the limits can be increased a bit; sometimes patches can be quite
> large, especially if they contain large test cases or a large number of
> generated files.
I do think an increase in size is probably warranted.

> (Does anybody actually look at the messages, as promised in the e-mail?)
I'd been doing this for a while, but at some point over the last few years
I lost the password that allowed me to review this stuff.  After that it
never bubbled up to get attention on my list.

jeff
[PATCH v1 16/16] Remove FMV beta warning.
This patch removes the warning for target_version and target_clones in
aarch64 as it is now spec compliant.

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_process_target_version_attr):
	Remove warning.

gcc/testsuite/ChangeLog:

	* g++.target/aarch64/mv-1.C: Remove option.
	* g++.target/aarch64/mv-and-mvc1.C: Remove option.
	* g++.target/aarch64/mv-and-mvc2.C: Remove option.
	* g++.target/aarch64/mv-and-mvc3.C: Remove option.
	* g++.target/aarch64/mv-and-mvc4.C: Remove option.
	* g++.target/aarch64/mv-error1.C: Remove option.
	* g++.target/aarch64/mv-error13.C: Remove option.
	* g++.target/aarch64/mv-error2.C: Remove option.
	* g++.target/aarch64/mv-error3.C: Remove option.
	* g++.target/aarch64/mv-error7.C: Remove option.
	* g++.target/aarch64/mv-error8.C: Remove option.
	* g++.target/aarch64/mv-error9.C: Remove option.
	* g++.target/aarch64/mv-pragma.C: Remove option.
	* g++.target/aarch64/mv-symbols1.C: Remove option.
	* g++.target/aarch64/mv-symbols10.C: Remove option.
	* g++.target/aarch64/mv-symbols11.C: Remove option.
	* g++.target/aarch64/mv-symbols12.C: Remove option.
	* g++.target/aarch64/mv-symbols14.C: Remove option.
	* g++.target/aarch64/mv-symbols15.C: Remove option.
	* g++.target/aarch64/mv-symbols2.C: Remove option.
	* g++.target/aarch64/mv-symbols3.C: Remove option.
	* g++.target/aarch64/mv-symbols4.C: Remove option.
	* g++.target/aarch64/mv-symbols5.C: Remove option.
	* g++.target/aarch64/mv-symbols6.C: Remove option.
	* g++.target/aarch64/mv-symbols8.C: Remove option.
	* g++.target/aarch64/mv-symbols9.C: Remove option.
	* g++.target/aarch64/mvc-symbols1.C: Remove option.
	* g++.target/aarch64/mvc-symbols2.C: Remove option.
	* g++.target/aarch64/mvc-symbols3.C: Remove option.
	* g++.target/aarch64/mvc-symbols4.C: Remove option.
	* g++.target/aarch64/mv-warning1.C: Removed.
	* g++.target/aarch64/mvc-warning1.C: Removed.
---
 gcc/config/aarch64/aarch64.cc                   | 9 -
 gcc/testsuite/g++.target/aarch64/mv-1.C         | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-and-mvc1.C  | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-and-mvc2.C  | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-and-mvc3.C  | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-and-mvc4.C  | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-error1.C    | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-error13.C   | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-error2.C    | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-error3.C    | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-error7.C    | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-error8.C    | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-error9.C    | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-pragma.C    | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-symbols1.C  | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-symbols10.C | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-symbols11.C | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-symbols12.C | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-symbols14.C | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-symbols15.C | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-symbols2.C  | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-symbols3.C  | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-symbols4.C  | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-symbols5.C  | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-symbols6.C  | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-symbols8.C  | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-symbols9.C  | 2 +-
 gcc/testsuite/g++.target/aarch64/mv-warning1.C  | 9 -
 gcc/testsuite/g++.target/aarch64/mvc-symbols1.C | 2 +-
 gcc/testsuite/g++.target/aarch64/mvc-symbols2.C | 2 +-
 gcc/testsuite/g++.target/aarch64/mvc-symbols3.C | 2 +-
 gcc/testsuite/g++.target/aarch64/mvc-symbols4.C | 2 +-
 gcc/testsuite/g++.target/aarch64/mvc-warning1.C | 6 --
 33 files changed, 30 insertions(+), 54 deletions(-)
 delete mode 100644 gcc/testsuite/g++.target/aarch64/mv-warning1.C
 delete mode 100644 gcc/testsuite/g++.target/aarch64/mvc-warning1.C

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index f6cb7903d88..a2c3ba8e12e 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -19939,15 +19939,6 @@ aarch64_parse_fmv_features (string_slice str, aarch64_feature_flags *isa_flags,
 static bool
 aarch64_process_target_version_attr (tree args)
 {
-  static bool issued_warning = false;
-  if (!issued_warning)
-    {
-      warning (OPT_Wexperimental_fmv_target,
-               "Function Multi Versioning support is experimental, and the "
-               "behavior is likely to change");
-      issued_warning = true;
-    }
-
   if (TREE_CODE (args) == TREE_LIST)
     {
       if (TREE_CHAIN (args))
diff --git a/gcc/testsuite/g++.target/aarch64/mv-1.C b/gcc/testsuite/g++.target/aarch64/mv-1.C
index 93b8a136587..4f815e18683
Re: [PATCH v1 08/16] Add get_clone_versions function.
Alfie Richards writes:
> This is a reimplementation of get_target_clone_attr_len,
> get_attr_str, and separate_attrs using string_slice and auto_vec to make
> memory management and use simpler.
>
> gcc/c-family/ChangeLog:
>
> 	* c-attribs.cc (handle_target_clones_attribute): Change to use
> 	get_clone_versions.
>
> gcc/ChangeLog:
>
> 	* tree.cc (get_clone_versions): New function.
> 	(get_clone_attr_versions): New function.
> 	* tree.h (get_clone_versions): New function.
> 	(get_clone_attr_versions): New function.
> ---
>  gcc/c-family/c-attribs.cc |  2 +-
>  gcc/tree.cc               | 40 +++
>  gcc/tree.h                |  5 +
>  3 files changed, 46 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
> index f3181e7b57c..642d724f6c6 100644
> --- a/gcc/c-family/c-attribs.cc
> +++ b/gcc/c-family/c-attribs.cc
> @@ -6129,7 +6129,7 @@ handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
>              }
>          }
>  
> -      if (get_target_clone_attr_len (args) == -1)
> +      if (get_clone_attr_versions (args).length () == 1)
>          {
>            warning (OPT_Wattributes,
>                     "single %<target_clones%> attribute is ignored");
> diff --git a/gcc/tree.cc b/gcc/tree.cc
> index 05f679edc09..346522d01c0 100644
> --- a/gcc/tree.cc
> +++ b/gcc/tree.cc
> @@ -15299,6 +15299,46 @@ get_target_clone_attr_len (tree arglist)
>    return str_len_sum;
>  }
>  
> +/* Returns an auto_vec of string_slices containing the version strings from
> +   ARGLIST.  DEFAULT_COUNT is incremented for each default version found.  */
> +
> +auto_vec<string_slice>
> +get_clone_attr_versions (const tree arglist, int *default_count)
> +{
> +  gcc_assert (TREE_CODE (arglist) == TREE_LIST);
> +  auto_vec<string_slice> versions;
> +
> +  static const char separator_str[] = {TARGET_CLONES_ATTR_SEPARATOR, 0};
> +  string_slice separators = string_slice (separator_str);
> +
> +  for (tree arg = arglist; arg; arg = TREE_CHAIN (arg))
> +    {
> +      string_slice str = string_slice (TREE_STRING_POINTER (TREE_VALUE (arg)));
> +      for (string_slice attr = string_slice::strtok (&str, separators);
> +           attr.is_valid (); attr = string_slice::strtok (&str, separators))
> +        {
> +          attr = attr.strip ();
> +          if (attr == string_slice ("default") && default_count)

Do we need the explicit constructor here?  It would be nice if
attr == "default" worked.

> +            (*default_count)++;
> +          versions.safe_push (attr);
> +        }
> +    }
> +  return versions;
> +}
> +
> +/* Returns an auto_vec of string_slices containing the version strings from
> +   the target_clone attribute from DECL.  DEFAULT_COUNT is incremented for each
> +   default version found.  */
> +auto_vec<string_slice>
> +get_clone_versions (const tree decl, int *default_count)
> +{
> +  tree attr = lookup_attribute ("target_clones", DECL_ATTRIBUTES (decl));
> +  if (!attr)
> +    return auto_vec<string_slice> ();
> +  tree arglist = TREE_VALUE (attr);
> +  return get_clone_attr_versions (arglist, default_count);
> +}
> +
>  void
>  tree_cc_finalize (void)
>  {
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 21f3cd5525c..aea1cf078a0 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -22,6 +22,7 @@ along with GCC; see the file COPYING3.  If not see
>  
>  #include "tree-core.h"
>  #include "options.h"
> +#include "vec.h"
>  
>  /* Convert a target-independent built-in function code to a combined_fn.  */
>  
> @@ -7035,5 +7036,9 @@ extern unsigned fndecl_dealloc_argno (tree);
>  extern tree get_attr_nonstring_decl (tree, tree * = NULL);
>  
>  extern int get_target_clone_attr_len (tree);
> +auto_vec<string_slice>
> +get_clone_versions (const tree, int * = NULL);
> +auto_vec<string_slice>
> +get_clone_attr_versions (const tree, int * = NULL);

Formatting nit, but: it's more usual to put declarations on a single line,
if they'd fit.

Otherwise it looks good, given that patch 13 removes the old functions.

Thanks,
Richard

>  
>  #endif /* GCC_TREE_H */
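Richard's question is whether the comparison can be spelled `attr == "default"`. One common way to allow that is a non-explicit constructor from `const char *` combined with a symmetric non-member `operator==`. The `slice` class below is a hypothetical stand-in written only to illustrate the idea; it is not GCC's string_slice, whose actual constructors are not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical slice type illustrating the review suggestion.
class slice {
public:
  // Deliberately non-explicit so a string literal converts implicitly.
  slice(const char *s) : ptr_(s), len_(std::strlen(s)) {}
  slice(const char *s, std::size_t n) : ptr_(s), len_(n) {}

  // Hidden friend found by argument-dependent lookup; either operand may be
  // converted, so both attr == "default" and "default" == attr compile.
  friend bool operator==(const slice &a, const slice &b) {
    return a.len_ == b.len_ && std::memcmp(a.ptr_, b.ptr_, a.len_) == 0;
  }

private:
  const char *ptr_;
  std::size_t len_;
};
```

An alternative that avoids implicit conversions elsewhere is an extra `operator==` overload taking `const char *` directly; which of the two fits GCC's string_slice better is a judgment call for the patch author.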
Re: [PATCH] ira: Cap callee-saved register cost scale to 300
> On Mon, Feb 3, 2025 at 5:21 PM Richard Biener > wrote: > > > > On Sun, Feb 2, 2025 at 9:29 AM H.J. Lu wrote: > > > > > > On Sun, Feb 2, 2025 at 4:20 PM Richard Biener > > > wrote: > > > > > > > > > > > > > > > > > Am 02.02.2025 um 08:59 schrieb H.J. Lu : > > > > > > > > > > On Sun, Feb 2, 2025 at 3:33 PM Richard Biener > > > > > wrote: > > > > >> > > > > >> > > > > >> > > > > Am 02.02.2025 um 08:00 schrieb H.J. Lu : > > > > >>> > > > > >>> Don't increase callee-saved register cost by 1000x, which leads to > > > > >>> that > > > > >>> callee-saved registers aren't used to preserve local variable values > > > > >>> across calls, by capping the scale to 300. > > > > >> > > > > >>> PR rtl-optimization/111673 > > > > >>> PR rtl-optimization/115932 > > > > >>> PR rtl-optimization/116028 > > > > >>> PR rtl-optimization/117081 > > > > >>> PR rtl-optimization/118497 > > > > >>> * ira-color.cc (assign_hard_reg): Cap callee-saved register cost > > > > >>> scale to 300. > > > > >>> > > > > >>> Signed-off-by: H.J. Lu > > > > >>> --- > > > > >>> gcc/ira-color.cc | 16 ++-- > > > > >>> 1 file changed, 14 insertions(+), 2 deletions(-) > > > > >>> > > > > >>> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc > > > > >>> index 0699b349a1a..707ff188250 100644 > > > > >>> --- a/gcc/ira-color.cc > > > > >>> +++ b/gcc/ira-color.cc > > > > >>> @@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool > > > > >>> retry_p) > > > > >>> /* We need to save/restore the hard register in > > > > >>>epilogue/prologue. Therefore we increase the cost. */ > > > > >>> { > > > > >>> +int scale; > > > > >>> +if (optimize_size) > > > > >>> + scale = 1; > > > > >>> +else > > > > >>> + { > > > > >>> +scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)); > > > > >>> +/* Don't increase callee-saved register cost by 1000x, > > > > >>> + which leads to that callee-saved registers aren't > > > > >>> + used to preserve local variable values across calls, > > > > >>> + by capping the scale to 300. 
*/ > > > > >>> +if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX) > > > > >>> + scale = 300; > > > > >> > > > > >> That leads to 300 for 1000 but 999 for 999 which is odd. I’d have > > > > >> expected to scale this down to [0, 300] or is MAX a magic value? > > > > > > > > > > There are > > > > > > > > > > * The weights for each insn varies from 0 to REG_FREQ_BASE. > > > > > This constant does not need to be high, as in infrequently executed > > > > > regions we want to count instructions equivalently to optimize for > > > > > size instead of speed. */ > > > > > #define REG_FREQ_MAX 1000 > > > > > > > > > > /* Compute register frequency from the BB frequency. When optimizing > > > > > for size, > > > > > or profile driven feedback is available and the function is never > > > > > executed, > > > > > frequency is always equivalent. Otherwise rescale the basic block > > > > > frequency. */ > > > > > #define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun) > > > > > \ > > > > > || !cfun->cfg->count_max.initialized_p > > > > > ()) \ > > > > > ? REG_FREQ_MAX > > > > >\ > > > > > : ((bb)->count.to_frequency (cfun) > > > > >\ > > > > >* REG_FREQ_MAX / BB_FREQ_MAX) > > > > >\ > > > > > ? ((bb)->count.to_frequency (cfun) > > > > >\ > > > > > * REG_FREQ_MAX / BB_FREQ_MAX) > > > > >\ > > > > > : 1) > > > > > > > > > > 1000 is the default. If it isn't 1000, it isn't the default. I only > > > > > want > > > > > to get a more reasonable default scale, instead of 1000. Lower > > > > > scale will fail the PR rtl-optimization/111673 test on powerpc64. > > > > > > > > I see. Why not adjust the above macro then? That would be a bit more > > > > obvious. Like use MAX/2 or so? > > > > > > commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b > > > Author: Surya Kumari Jangala > > > Date: Tue Jun 25 08:37:49 2024 -0500 > > > > > > ira: Scale save/restore costs of callee save registers with block > > > frequency > > > > > > uses REG_FREQ_FROM_BB as the cost scale. 
I don't know if it is a misuse. > > > I don't want to change REG_FREQ_FROM_BB since it is used in other places, > > > not as a cost scale. Maybe the above commit should be reverted and we add > > > a target hook for callee-saved register cost scale. Each target can > > > choose > > > a proper cost scale, install of increasing the cost by 1000x for everyone. > > > > I believe testing cfun->cfg->count_max.initialized_p () is a bit odd at > > least, > > as it doesn't seem to be used. The co
Re: [PATCH] ira: Cap callee-saved register cost scale to 300
> > > > #define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun) > > > > \ > > > > || !cfun->cfg->count_max.initialized_p > > > > ()) \ > > > > ? REG_FREQ_MAX > > > > \ > > > > : ((bb)->count.to_frequency (cfun) > > > > \ > > > >* REG_FREQ_MAX / BB_FREQ_MAX) > > > > \ > > > > ? ((bb)->count.to_frequency (cfun) > > > > \ > > > > * REG_FREQ_MAX / BB_FREQ_MAX) > > > > \ > > > > : 1) > > > > > > > > 1000 is the default. If it isn't 1000, it isn't the default. I only > > > > want > > > > to get a more reasonable default scale, instead of 1000. Lower > > > > scale will fail the PR rtl-optimization/111673 test on powerpc64. > > > > > > I see. Why not adjust the above macro then? That would be a bit more > > > obvious. Like use MAX/2 or so? > > > > commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b > > Author: Surya Kumari Jangala > > Date: Tue Jun 25 08:37:49 2024 -0500 > > > > ira: Scale save/restore costs of callee save registers with block > > frequency > > > > uses REG_FREQ_FROM_BB as the cost scale. I don't know if it is a misuse. > > I don't want to change REG_FREQ_FROM_BB since it is used in other places, > > not as a cost scale. Maybe the above commit should be reverted and we add > > a target hook for callee-saved register cost scale. Each target can choose > > a proper cost scale, install of increasing the cost by 1000x for everyone. > > I believe testing cfun->cfg->count_max.initialized_p () is a bit odd at least, > as it doesn't seem to be used. The comment talks about profile feedback, It is used by count.to_frequency, which basically computes count/max_count * REG_FREQ_MAX. It aborts if max_count is uninitialized rather than returning arbitrary value... Honza
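A small model of the REG_FREQ_FROM_BB macro quoted in this thread, together with the proposed cap, makes the discontinuity Richard pointed out concrete: 1000 becomes 300 while 999 stays 999. REG_FREQ_MAX = 1000 is taken from the thread; BB_FREQ_MAX = 10000 is an assumed value, the block frequency is passed in already converted by to_frequency, and proportional_scale is a hypothetical alternative shown for comparison, not something proposed in the patch.

```cpp
#include <cassert>

// REG_FREQ_MAX as quoted in the thread; BB_FREQ_MAX is an assumed value here.
constexpr int REG_FREQ_MAX = 1000;
constexpr int BB_FREQ_MAX = 10000;

// Model of the quoted REG_FREQ_FROM_BB macro.  bb_frequency stands for
// (bb)->count.to_frequency (cfun), i.e. a value in [0, BB_FREQ_MAX].
int reg_freq_from_bb(int bb_frequency, bool optimize_for_size,
                     bool count_max_initialized) {
  // Honza notes that to_frequency aborts when count_max is uninitialized;
  // this early return keeps the model from reaching that path.
  if (optimize_for_size || !count_max_initialized)
    return REG_FREQ_MAX;
  assert(bb_frequency >= 0 && bb_frequency <= BB_FREQ_MAX);
  int freq = bb_frequency * REG_FREQ_MAX / BB_FREQ_MAX;
  return freq ? freq : 1;
}

// The cap from the patch: only the exact value REG_FREQ_MAX is lowered.
int capped_scale(int scale) {
  if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX)
    return 300;
  return scale;
}

// Hypothetical proportional alternative: map [1, REG_FREQ_MAX] onto [1, 300].
int proportional_scale(int scale) {
  int scaled = scale * 300 / REG_FREQ_MAX;
  return scaled ? scaled : 1;
}
```

With the cap, a scale of 999 ends up more than three times larger than a scale of 1000; a proportional mapping keeps the ordering of block frequencies intact.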
[PATCH]middle-end: delay checking for alignment to load [PR118464]
Hi All,

This fixes two PRs on Early break vectorization by delaying the safety
checks to vectorizable_load when the VF, VMAT and vectype are all known.

This patch does add two new restrictions:

1. On LOAD_LANES targets, where the buffer size is known, we reject uneven
   group sizes, as they are unaligned every n % 2 iterations and so may
   cross a page unwittingly.

2. On LOAD_LANES targets when the buffer is unknown, we reject
   vectorization if we cannot peel for alignment, as the alignment
   requirement is quite large at GROUP_SIZE * vectype_size.  This is
   unlikely to ever be beneficial so we don't support it for now.

There are other steps documented inside the code itself so that the
reasoning is next to the code.

Bootstrapped and regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, and x86_64-pc-linux-gnu (-m32 and -m64) with
no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR tree-optimization/118464
	PR tree-optimization/116855
	* doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
	* tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
	checks.
	(vect_compute_data_ref_alignment): Remove alignment checks and move to
	vectorizable_load.
	(vect_enhance_data_refs_alignment): Add note to comment needing
	investigating.
	(vect_analyze_data_refs_alignment): Likewise.
	(vect_supportable_dr_alignment): For group loads look at first DR.
	* tree-vect-stmts.cc (get_load_store_type, vectorizable_load):
	Perform safety checks for early break pfa.
	* tree-vectorizer.h (dr_peeling_alignment): New.

gcc/testsuite/ChangeLog:

	PR tree-optimization/118464
	PR tree-optimization/116855
	* gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
	load type is relaxed later.
	* gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
	* gcc.dg/vect/vect-early-break_22.c: Reject for load_lanes targets.
	* g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
	* gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
	* gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
	* gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
	* gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
	* gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
	* gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
	* gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
	* gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
	* gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
	* gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
	* gcc.dg/vect/vect-early-break_133_pfa9.c: New test.

---
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e54a287dbdf504f540bc499e024d077746a8..85f9c49eff437221f2cea77c114064a6a603b732 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17246,7 +17246,7 @@ Maximum number of relations the oracle will register in a basic block.
 Work bound when discovering transitive relations from existing relations.
 
 @item min-pagesize
-Minimum page size for warning purposes.
+Minimum page size for warning and early break vectorization purposes.
 
 @item openacc-kernels
 Specify mode of OpenACC `kernels' constructs handling.
diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
new file mode 100644
index ..4b859488d533bf3ba5d0e0bcf8779d9b024b2596
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+
+typedef decltype(sizeof(0)) size_t;
+struct ts1 {
+  int spans[6][2];
+};
+struct gg {
+  int t[6];
+};
+ts1 f(size_t t, struct ts1 *s1, struct gg *s2) {
+  ts1 ret;
+  for (size_t i = 0; i != t; i++) {
+    if (!(i < t)) __builtin_abort();
+    ret.spans[i][0] = s1->spans[i][0] + s2->t[i];
+    ret.spans[i][1] = s1->spans[i][1] + s2->t[i];
+  }
+  return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
index 9ef1330b47c817e16baaafa44c2b15108b9dd3a9..4c8255895b976653228233d93c950629f3231554 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
@@ -55,7 +55,9 @@ int main()
       }
   }
   rephase ();
+#pragma GCC novector
   for (i = 0; i < 32; ++i)
+#pragma GCC novector
     for (j = 0; j < 3; ++j)
 #pragma GCC novector
       for (k = 0; k < 3; ++k)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
index 423ff0b566b18bf04ce4f67a45b94dc1a021a4a0..8bd85f3893f08157e640414b5b252b716a8ba93a
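The page-crossing concern behind the first restriction can be illustrated with a simplified model; this is not the vectorizer's actual check. A speculative load that straddles into a page the scalar loop would never have touched may fault. An access that starts at a multiple of its own size, where that size divides the page size, never straddles a page. A repeating 48-byte access (for example, a hypothetical group of three 4-byte elements with a VF of 4) eventually does. The 4096-byte page size and the helper names crosses_page and any_access_crosses are assumptions for illustration; the patch itself documents min-pagesize as also being used for early break vectorization.

```cpp
#include <cassert>
#include <cstdint>

// True if an access of `bytes` bytes starting at `addr` touches two pages.
bool crosses_page(std::uint64_t addr, std::uint64_t bytes, std::uint64_t page) {
  assert(bytes > 0 && bytes <= page);
  return addr % page + bytes > page;
}

// Check `iters` consecutive accesses of `step` bytes, each starting where
// the previous one ended, beginning at `base`.
bool any_access_crosses(std::uint64_t base, std::uint64_t step, int iters,
                        std::uint64_t page) {
  for (int i = 0; i < iters; ++i)
    if (crosses_page(base + static_cast<std::uint64_t>(i) * step, step, page))
      return true;
  return false;
}
```

With a 4096-byte page, 64-byte steps from offset 0 never straddle a boundary, while 48-byte steps reach offset 4080 after 85 iterations and then span two pages.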
RE: [PATCH 3/4] vect: Ensure profile consistency when adding epilog guard [PR117790]
Ping

> -----Original Message-----
> From: Tamar Christina
> Sent: Friday, January 24, 2025 9:18 AM
> To: Alex Coplan ; gcc-patches@gcc.gnu.org
> Cc: Richard Biener ; Jan Hubicka
> Subject: RE: [PATCH 3/4] vect: Ensure profile consistency when adding epilog
> guard [PR117790]
>
> ping
>
> > -----Original Message-----
> > From: Tamar Christina
> > Sent: Wednesday, January 15, 2025 2:08 PM
> > To: Alex Coplan ; gcc-patches@gcc.gnu.org
> > Cc: Richard Biener ; Jan Hubicka
> > Subject: RE: [PATCH 3/4] vect: Ensure profile consistency when adding epilog
> > guard [PR117790]
> >
> > Ping
> >
> > > -----Original Message-----
> > > From: Alex Coplan
> > > Sent: Monday, January 6, 2025 11:35 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Richard Biener ; Jan Hubicka ; Tamar Christina
> > > Subject: [PATCH 3/4] vect: Ensure profile consistency when adding epilog
> > > guard [PR117790]
> > >
> > > This patch tries to make the CFG profile consistent when adding a guard
> > > edge to skip the epilog during peeling.
> > >
> > > The changes can be summarized as follows:
> > >  - We avoid adding the guard edge entirely if the guard condition folds
> > >    to false, otherwise the profile will become inconsistent since
> > >    the cfgcleanup code doesn't attempt to update it on removing the dead
> > >    edge.
> > >  - If the guard condition instead folds to true, we account for this by
> > >    giving the skip edge 100% probability (otherwise the profile will
> > >    again become inconsistent when removing the other now-dead edge).
> > >  - Finally, we use the new helper scale_loop_freqs_with_new_exit_count
> > >    instead of scale_loop_profile to update the epilog frequencies /
> > >    probabilities.  We make the assumption here that if the IV exit is
> > >    taken in the vector loop, then it will also be taken in the epilog
> > >    (and not an early exit).  Since we add the guard to the vector iv
> > >    exit, we know any reduction in count associated with the epilog skip
> > >    should be accounted for by a reduction in the epilog's iv exit edge
> > >    count.
> > >
> > > Bootstrapped/regtested as a series on aarch64-linux-gnu,
> > > arm-linux-gnueabihf, and x86_64-linux-gnu.  OK for trunk?
> > >
> > > Thanks,
> > > Alex
> > >
> > > gcc/ChangeLog:
> > >
> > > 	PR tree-optimization/117790
> > > 	* tree-vect-loop-manip.cc (vect_do_peeling): Attempt to maintain
> > > 	consistency of the CFG profile when adding an epilog skip edge.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > 	PR tree-optimization/117790
> > > 	* gcc.dg/vect/vect-early-break-profile-1.c: New test.
> > > ---
> > >  .../gcc.dg/vect/vect-early-break-profile-1.c | 10
> > >  gcc/tree-vect-loop-manip.cc                  | 48 ++-
> > >  2 files changed, 47 insertions(+), 11 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-early-break-profile-1.c
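The three bullet points in patch 3/4 amount to a small piece of count bookkeeping. The sketch below is a hypothetical model, not the code in tree-vect-loop-manip.cc. It assumes that, without the guard, the whole count reaching it would enter the epilog. Following Alex's stated assumption, it takes any reduction entirely out of the epilog's IV-exit count. GuardFold, EpilogCounts and adjust_for_guard are illustrative names only.

```cpp
#include <cassert>

// Hypothetical outcome of folding the epilog-skip guard condition.
enum class GuardFold { Unknown, AlwaysFalse, AlwaysTrue };

struct EpilogCounts {
  bool add_guard;         // whether a skip edge is created at all
  double skip_prob;       // probability given to the skip edge
  double epilog_entry;    // count that still enters the epilog
  double epilog_iv_exit;  // epilog IV-exit count after the adjustment
};

EpilogCounts adjust_for_guard(GuardFold fold, double guard_count,
                              double skip_prob, double epilog_iv_exit) {
  // Folds to false: no guard edge, so the profile is left untouched.
  if (fold == GuardFold::AlwaysFalse)
    return {false, 0.0, guard_count, epilog_iv_exit};
  // Folds to true: the skip edge gets 100% probability.
  double p = fold == GuardFold::AlwaysTrue ? 1.0 : skip_prob;
  assert(p >= 0.0 && p <= 1.0);
  double skipped = guard_count * p;
  // The count diverted around the epilog is removed from its IV exit.
  assert(skipped <= epilog_iv_exit);
  return {true, p, guard_count - skipped, epilog_iv_exit - skipped};
}
```

Taking the reduction from the IV exit alone keeps the epilog's early-exit counts unchanged, which matches the assumption that a run reaching the vector IV exit would also leave the epilog through its IV exit.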
RE: [PATCH 1/4] vect: Set counts of early break exit blocks correctly [PR117790]
Ping

> -----Original Message-----
> From: Tamar Christina
> Sent: Friday, January 24, 2025 9:17 AM
> To: Alex Coplan ; 'gcc-patches@gcc.gnu.org'
> Cc: 'Richard Biener' ; 'Jan Hubicka'
> Subject: RE: [PATCH 1/4] vect: Set counts of early break exit blocks correctly
> [PR117790]
>
> ping
>
> > -----Original Message-----
> > From: Tamar Christina
> > Sent: Wednesday, January 15, 2025 2:07 PM
> > To: Alex Coplan ; gcc-patches@gcc.gnu.org
> > Cc: Richard Biener ; Jan Hubicka
> > Subject: RE: [PATCH 1/4] vect: Set counts of early break exit blocks correctly
> > [PR117790]
> >
> > Ping
> >
> > > -----Original Message-----
> > > From: Alex Coplan
> > > Sent: Monday, January 6, 2025 11:34 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Richard Biener ; Jan Hubicka ; Tamar Christina
> > > Subject: [PATCH 1/4] vect: Set counts of early break exit blocks correctly
> > > [PR117790]
> > >
> > > This adds missing code to correctly set the counts of the exit blocks we
> > > create when building the CFG for a vectorized early break loop.
> > >
> > > Tested as a series on aarch64-linux-gnu, arm-linux-gnueabihf, and
> > > x86_64-linux-gnu.  OK for trunk?
> > >
> > > Thanks,
> > > Alex
> > >
> > > gcc/ChangeLog:
> > >
> > > 	PR tree-optimization/117790
> > > 	* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
> > > 	Set profile counts for {main,alt}_loop_exit_block.
> > > ---
> > >  gcc/tree-vect-loop-manip.cc | 10 ++
> > >  1 file changed, 10 insertions(+)
RE: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit loops [PR117790]
Ping

> -----Original Message-----
> From: Tamar Christina
> Sent: Friday, January 24, 2025 9:18 AM
> To: Alex Coplan ; gcc-patches@gcc.gnu.org
> Cc: Richard Biener ; Jan Hubicka
> Subject: RE: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit
> loops [PR117790]
>
> ping
>
> > -----Original Message-----
> > From: Tamar Christina
> > Sent: Wednesday, January 15, 2025 2:08 PM
> > To: Alex Coplan ; gcc-patches@gcc.gnu.org
> > Cc: Richard Biener ; Jan Hubicka
> > Subject: RE: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit
> > loops [PR117790]
> >
> > Ping
> >
> > > -----Original Message-----
> > > From: Alex Coplan
> > > Sent: Monday, January 6, 2025 11:35 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Richard Biener ; Jan Hubicka ; Tamar Christina
> > > Subject: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit
> > > loops [PR117790]
> > >
> > > As it stands, scale_loop_profile doesn't correctly handle loops with
> > > multiple exits.  In particular, in the case where the expected niters
> > > exceeds iteration_bound, scale_loop_profile attempts to reduce the
> > > number of iterations with a call to scale_loop_frequencies, which
> > > multiplies the count of each BB by a given probability.  This
> > > transformation preserves the relationships between the counts of the BBs
> > > within the loop (and thus the edge probabilities stay the same) but this
> > > cannot possibly work for loops with multiple exits, since in order for
> > > the expected niters to reduce (and counts along exit edges to remain the
> > > same), the exit edge probabilities must increase, thus decreasing the
> > > probabilities of the internal edges, meaning that the ratios of the
> > > counts of the BBs inside the loop must change.  So we need a different
> > > approach (not a straightforward multiplicative scaling) to adjust the
> > > expected niters of a loop with multiple exits.
> > >
> > > This patch introduces a new helper, flow_scale_loop_freqs, which can be
> > > used to correctly scale the profile of a loop with multiple exits.  It
> > > is parameterized by a probability (with which to scale the header and
> > > therefore the expected niters) and a lambda which gives the desired
> > > counts for the exit edges.  In this patch, to make things simpler,
> > > flow_scale_loop_freqs only handles loop shapes without internal control
> > > flow, and we introduce a predicate can_flow_scale_loop_freqs_p to test
> > > whether a given loop meets these criteria.  This restriction is
> > > reasonable since this patch is motivated by fixing the profile
> > > consistency for early break vectorization, and we don't currently
> > > vectorize loops with internal control flow.  We also fall back to a
> > > multiplicative scaling (the status quo) for loops that
> > > flow_scale_loop_freqs can't handle, so the patch should be a net
> > > improvement.
> > >
> > > We wrap the call to flow_scale_loop_freqs in a helper
> > > scale_loop_freqs_with_exit_counts which handles the above-mentioned
> > > fallback.  This wrapper is still generic in that it accepts a lambda to
> > > allow overriding the desired exit edge counts.  We specialize this with
> > > another wrapper, scale_loop_freqs_hold_exit_counts (keeping the
> > > counts along exit edges fixed), which is then used to implement the
> > > niters-scaling case of scale_loop_profile, thus fixing this path through
> > > the function for loops with multiple exits.
> > >
> > > Finally, we expose two new wrapper functions in cfgloopmanip.h for use
> > > in subsequent vectorizer patches.  scale_loop_profile_hold_exit_counts
> > > is a variant of scale_loop_profile which assumes we want to keep the
> > > counts along exit edges of the loop fixed through both parts of the
> > > transformation (including the initial probability scale).
> > > scale_loop_freqs_with_new_exit_count is intended to be used in a
> > > subsequent patch when adding a skip edge around the epilog, where the
> > > reduction of count entering the loop is mirrored by a reduced count
> > > along a given exit edge.
> > >
> > > Bootstrapped/regtested as a series on aarch64-linux-gnu,
> > > x86_64-linux-gnu, and arm-linux-gnueabihf.  OK for trunk?
> > >
> > > Thanks,
> > > Alex
> > >
> > > gcc/ChangeLog:
> > >
> > > 	PR tree-optimization/117790
> > > 	* cfgloopmanip.cc (can_flow_scale_loop_freqs_p): New.
> > > 	(flow_scale_loop_freqs): New.
> > > 	(scale_loop_freqs_with_exit_counts): New.
> > > 	(scale_loop_freqs_hold_exit_counts): New.
> > > 	(scale_loop_profile): Refactor to use the newly-added
> > > 	scale_loop_profile_1, and use scale_loop_freqs_hold_exit_counts to
> > > 	correctly handle reducing the expected niters for loops with multiple
> > > 	exits.
> > > 	(scale_loop_freqs_with_new_exit_count): New.
> > > 	(scale_loop_profile_1): New.
> > > 	(scale_loop_profile_hold_exit_counts): New.
> > > 	* cfgloopmanip.h (scale_loop_profile
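Alex's argument can be checked numerically. A purely multiplicative scale cannot lower the expected niters of a multi-exit loop while holding its exit counts fixed; the exit probabilities themselves have to rise. The sketch below is a toy model, not the cfgloopmanip.cc code. LoopProfile, profile_from_probs and probs_holding_exit_counts are hypothetical names, counts are plain doubles rather than profile_count values, and the loop shape matches the no-internal-control-flow restriction described in patch 2/4.

```cpp
#include <cassert>
#include <cmath>

// Toy profile of a loop without internal control flow:
//   header -> early exit (probability p1) -> latch with IV exit (probability p2) -> header
struct LoopProfile {
  double entry;   // count entering the loop from outside
  double header;  // count of the loop header
  double exit1;   // count along the early exit
  double exit2;   // count along the IV exit
};

// Build a consistent profile (header = entry + back-edge count) from the two
// exit probabilities.
LoopProfile profile_from_probs(double entry, double p1, double p2) {
  double stay = (1 - p1) * (1 - p2);  // probability of taking the back edge
  assert(stay < 1);
  double header = entry / (1 - stay);
  return {entry, header, header * p1, header * (1 - p1) * p2};
}

// Lower the header count (and so the expected niters) to new_header while
// keeping both exit counts fixed; returns the exit probabilities that requires.
void probs_holding_exit_counts(const LoopProfile &lp, double new_header,
                               double *p1, double *p2) {
  assert(new_header >= lp.entry && new_header > lp.exit1);
  *p1 = lp.exit1 / new_header;
  *p2 = lp.exit2 / (new_header - lp.exit1);
}
```

Halving the header count of a loop with p1 = 0.01 and p2 = 0.09 doubles the early-exit probability to 0.02 and raises the IV-exit probability to roughly 0.18. The ratios between block counts inside the loop therefore change, which is why a single multiplicative factor is not enough.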
Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale
Hello,

On Mon, 3 Feb 2025, H.J. Lu wrote:

> Author: Surya Kumari Jangala
> Date:   Tue Jun 25 08:37:49 2024 -0500
>
>     ira: Scale save/restore costs of callee save registers with block
>     frequency
>
> scales the cost of saving/restoring a callee-save hard register in epilogue
> and prologue with the entry block frequency, which, if not optimizing for
> size, is 1, for all targets.

This merely represents the fact that the entry block is indeed entered
exactly once per function invocation, i.e. 1.0 in fixed point with a
scale of 1000.  All costs in ira are (supposed to be) scaled by the
bb-frequency of the allocno/register occurrence, and hence this add_cost
to cater for xlogue-save/restore needs to be scaled by that as well,
which is what Surya's patch was adding.

Any fallout from that needs to be addressed on top of that, not by
reverting it, or by introducing a hook to avoid that.

Think of this scale as an arbitrary value to implement pseudo-fixed-point
arithmetic for costs.  All values need to be scaled by it.  That its
value is a seemingly large number of 1000 is not the worry; it represents
1.0.

If the issue is for instance that callee-saved registers aren't used
because the prologue save/restore is now deemed too expensive relative to
the around-call-save-restore when a call-clobbered register is used, then
either the around-call-save-restore instructions aren't correctly costed
(perhaps also missing the scale factor?), or ties aren't broken nicely,
in which case adding a 1 at one or the other place might be needed.

Ciao,
Michael.
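Michael's argument is that every cost term is expressed in the same fixed-point unit, with REG_FREQ_MAX (1000) standing for 1.0. The toy cost model below uses hypothetical function names and is not IRA's actual code. It shows why the prologue/epilogue term must carry the entry-block frequency like every other term. With consistent scaling, a straight-line function with one call yields an exact tie, which is where his remark about tie-breaking applies. A call inside a hot loop clearly favours the callee-saved register.

```cpp
#include <cassert>
#include <vector>

// Fixed-point frequency unit: REG_FREQ_MAX plays the role of 1.0 in this model.
constexpr long REG_FREQ_MAX = 1000;

// Hypothetical: a callee-saved register pays one save/restore pair in the
// prologue/epilogue, weighted by the entry block's frequency.
long callee_saved_cost(long save_restore_cost, long entry_freq) {
  assert(entry_freq >= 1 && entry_freq <= REG_FREQ_MAX);
  return save_restore_cost * entry_freq;
}

// Hypothetical: a call-clobbered register pays a save/restore pair around
// every call it is live across, each weighted by that call block's frequency.
long call_clobbered_cost(long save_restore_cost,
                         const std::vector<long> &call_freqs) {
  long total = 0;
  for (long freq : call_freqs)
    total += save_restore_cost * freq;
  return total;
}
```

If one side were left unscaled while the other carried frequencies, the comparison would be skewed by a factor of REG_FREQ_MAX, regardless of how hot the calls actually are.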
Re: [PATCH 0/61] Improve Mips target
On Mon, Feb 3, 2025 at 11:34 AM Richard Sandiford wrote:
>
> Richard Biener writes:
> > On Fri, Jan 31, 2025 at 6:18 PM Aleksandar Rakic
> > wrote:
> >>
> >> This patch series improves the support for the mips64r6 target in GCC,
> >> includes the enhancements to the general bug fixes and contains other
> >> MIPS ISA and processor enablement.
> >>
> >> These patches are cherry-picked from the mips_rel/11_2_0/master
> >> and mips_rel/9_3_0/master branches from the MIPS' repository:
> >> https://github.com/MIPS/gcc .
> >> Further details on the individual changes are included in the
> >> respective patches.
> >
> > Please split up this series at least into patches that solely affect mips/
> > and send patches that touch middle-end parts separately.  A 61 patches
> > series is unlikely to be looked at this way.
>
> Sorry to ask, but what about the copyright assignment/DCO side of things?
> Is it ok to assume that all these patches are covered by MTI's copyright
> assignment with the FSF, even though MTI didn't submit the patches
> themselves?  (Genuine question, not trying to imply a particular answer.)

It's a good question since one of the Signed-off e-mails bounces...

Richard.

> Thanks,
> Richard
RE: [PATCH 4/4] vect: Fix scale_profile_for_vect_loop for multiple exits [PR117790]
Ping > -Original Message- > From: Tamar Christina > Sent: Friday, January 24, 2025 9:18 AM > To: Alex Coplan ; gcc-patches@gcc.gnu.org > Cc: Richard Biener ; Jan Hubicka > Subject: RE: [PATCH 4/4] vect: Fix scale_profile_for_vect_loop for multiple > exits > [PR117790] > > ping > > > -Original Message- > > From: Tamar Christina > > Sent: Wednesday, January 15, 2025 2:08 PM > > To: Alex Coplan ; gcc-patches@gcc.gnu.org > > Cc: Richard Biener ; Jan Hubicka > > Subject: RE: [PATCH 4/4] vect: Fix scale_profile_for_vect_loop for multiple > > exits > > [PR117790] > > > > Ping > > > > > -Original Message- > > > From: Alex Coplan > > > Sent: Monday, January 6, 2025 11:36 AM > > > To: gcc-patches@gcc.gnu.org > > > Cc: Richard Biener ; Jan Hubicka ; > Tamar > > > Christina > > > Subject: [PATCH 4/4] vect: Fix scale_profile_for_vect_loop for multiple > > > exits > > > [PR117790] > > > > > > This adjusts scale_profile_for_vect_loop to DTRT for loops with multiple > > > exits, > > > namely using scale_loop_profile_hold_exit_counts instead and scaling the > > > expected niters by 1 / VF. > > > > > > Tested as a series on aarch64-linux-gnu, arm-linux-gnueabihf, and > > > x86_64-linux-gnu. OK for trunk? > > > > > > Thanks, > > > Alex > > > > > > gcc/ChangeLog: > > > > > > PR tree-optimization/117790 > > > * tree-vect-loop.cc (scale_profile_for_vect_loop): Use > > > scale_loop_profile_hold_exit_counts instead of scale_loop_profile. Drop > > > the exit edge parameter, since the code now handles multiple exits. > > > Adjust the caller ... > > > (vect_transform_loop): ... here. > > > > > > gcc/testsuite/ChangeLog: > > > > > > PR tree-optimization/117790 > > > * gcc.dg/vect/vect-early-break-profile-2.c: New test. > > > --- > > > .../gcc.dg/vect/vect-early-break-profile-2.c | 21 +++ > > > gcc/tree-vect-loop.cc | 21 ++- > > > 2 files changed, 27 insertions(+), 15 deletions(-) > > > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-early-break-profile-2.c
Re: [PATCH] c++: Fix up pedwarn for capturing structured bindings in lambdas [PR118719]
On Sun, Feb 02, 2025 at 11:14:55AM +0100, Jakub Jelinek wrote: > Hi! > > As mentioned in the PR, this pedwarn is desirable for the implicit or > explicit capturing of structured bindings in C++17, but in the case of > init-captures the initializer is just some expression and that can include > structured bindings. > > So, the following patch limits the warning to non-explicit_init_p. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? LGTM, sorry for missing the !explicit_init_p check. > 2025-02-02 Jakub Jelinek > > PR c++/118719 > * lambda.cc (add_capture): Only pedwarn about capturing structured > binding if !explicit_init_p. > > * g++.dg/cpp1z/decomp63.C: New test. > > --- gcc/cp/lambda.cc.jj 2025-01-24 17:37:49.004457905 +0100 > +++ gcc/cp/lambda.cc 2025-01-31 23:47:08.907034696 +0100 > @@ -613,7 +613,7 @@ add_capture (tree lambda, tree id, tree > return error_mark_node; > } > > - if (cxx_dialect < cxx20) > + if (cxx_dialect < cxx20 && !explicit_init_p) > { > auto_diagnostic_group d; > tree stripped_init = tree_strip_any_location_wrapper (initializer); > --- gcc/testsuite/g++.dg/cpp1z/decomp63.C.jj 2025-01-31 23:54:15.480699418 > +0100 > +++ gcc/testsuite/g++.dg/cpp1z/decomp63.C 2025-01-31 23:53:02.998578507 > +0100 > @@ -0,0 +1,18 @@ > +// PR c++/118719 > +// { dg-do compile { target c++11 } } > +// { dg-options "" } > + > +int > +main () > +{ > + int a[] = { 42 }; > + auto [x] = a; // { dg-warning > "structured bindings only available with" "" { target c++14_down } } > + // { dg-message "declared here" > "" { target c++17_down } .-1 } > + [=] () { int b = x; (void) b; }; // { dg-warning "captured > structured bindings are a C\\\+\\\+20 extension" "" { target c++17_down } } > + [&] () { int b = x; (void) b; }; // { dg-warning "captured > structured bindings are a C\\\+\\\+20 extension" "" { target c++17_down } } > + [x] () { int b = x; (void) b; }; // { dg-warning "captured > structured bindings are a C\\\+\\\+20 extension" "" { target 
c++17_down } } > + [&x] () { int b = x; (void) b; }; // { dg-warning "captured > structured bindings are a C\\\+\\\+20 extension" "" { target c++17_down } } > + [x = x] () { int b = x; (void) b; }; // { dg-warning "lambda > capture initializers only available with" "" { target c++11_only } } > + [y = x] () { int b = y; (void) b; }; // { dg-warning "lambda > capture initializers only available with" "" { target c++11_only } } > + [y = x * 2] () { int b = y; (void) b; }; // { dg-warning "lambda capture > initializers only available with" "" { target c++11_only } } > +} > > Jakub > Marek
[RFC][PATCH v1 00/16] FMV refactor and ACLE compliance.
Hello, This patch series intends to change the behavior of targets with TARGET_HAS_FMV_TARGET_ATTRIBUTE set to false (i.e. those that use target_version attributes for FMV as opposed to target attributes) to follow the behavior specified in the Arm C Language Extensions (ACLE). There is significant refactoring of FMV in the process. Notable changes include: * Introduction of the string_slice class. * Refactoring FMV mangling to always use the existing hook. * Changing the x86 mangling of dispatched symbols. * Adding new members to cgraph_function_version_info and cgraph_node. * Specifically, adding cgraph logic earlier in the C and C++ front ends than it was previously. * Changing the cgraph_function_version_info to be implicitly ordered. * Changing resolver creation for target_version to reuse the target_clones logic. * Only creating the resolver (in target_version semantics) when the default version is implemented. * Changing C++ symbol resolution for target_version semantics to only resolve default versions. * i.e. changing the scope and signature of the FMV function set to be determined by default versions. I would appreciate overall feedback on these changes, and specific thoughts from the relevant maintainers on the behavioral changes this makes to riscv FMV (as it also uses target_version semantics) and on the mangling change for x86 (see the test changes for both). These changes are targeting GCC 16 stage 1. Regression tested and bootstrapped for aarch64-none-linux-gnu and x86_64-unknown-linux-gnu. Cross compiled and ran the FMV tests for riscv and powerpc. Kind regards, Alfie Richards Alfie Richards (16): Add PowerPC FMV symbol tests. Add x86 FMV symbol tests Add string_slice class. Remove unnecessary `record` argument from maybe_version_functions. Update is_function_default_version to work with target_version. Change function versions to be implicitly ordered. Add version of make_attribute supporting string_slice. Add get_clone_versions function. 
Add assembler_name to cgraph_function_version_info. Add dispatcher_resolver_function and is_target_clone to cgraph_node. Add clone_identifier function. Refactor FMV name mangling. Remove unused target_clone parsing code. Change target_version semantics to follow ACLE specification. Support mixing of target_clones and target_version for aarch64. Remove FMV beta warning. gcc/attribs.cc| 79 -- gcc/attribs.h | 1 + gcc/c-family/c-attribs.cc | 4 +- gcc/c/c-decl.cc | 20 ++ gcc/cgraph.cc | 49 +++- gcc/cgraph.h | 31 ++- gcc/cgraphclones.cc | 16 +- gcc/cgraphunit.cc | 9 + gcc/config/aarch64/aarch64.cc | 233 -- gcc/config/i386/i386-features.cc | 123 + gcc/config/riscv/riscv.cc | 136 -- gcc/config/rs6000/rs6000.cc | 139 --- gcc/cp/call.cc| 8 + gcc/cp/class.cc | 13 +- gcc/cp/cp-gimplify.cc | 6 +- gcc/cp/cp-tree.h | 2 +- gcc/cp/decl.cc| 80 +- gcc/cp/typeck.cc | 8 + gcc/ipa.cc| 11 + gcc/multiple_target.cc| 220 ++--- gcc/testsuite/g++.target/aarch64/mv-1.C | 6 +- .../g++.target/aarch64/mv-and-mvc1.C | 38 +++ .../g++.target/aarch64/mv-and-mvc2.C | 29 +++ .../g++.target/aarch64/mv-and-mvc3.C | 41 +++ .../g++.target/aarch64/mv-and-mvc4.C | 38 +++ gcc/testsuite/g++.target/aarch64/mv-error1.C | 13 + gcc/testsuite/g++.target/aarch64/mv-error13.C | 13 + gcc/testsuite/g++.target/aarch64/mv-error2.C | 10 + gcc/testsuite/g++.target/aarch64/mv-error3.C | 13 + gcc/testsuite/g++.target/aarch64/mv-error7.C | 9 + gcc/testsuite/g++.target/aarch64/mv-error8.C | 21 ++ gcc/testsuite/g++.target/aarch64/mv-error9.C | 12 + gcc/testsuite/g++.target/aarch64/mv-pragma.C | 2 +- .../g++.target/aarch64/mv-symbols1.C | 2 +- .../g++.target/aarch64/mv-symbols10.C | 43 .../g++.target/aarch64/mv-symbols11.C | 27 ++ .../g++.target/aarch64/mv-symbols12.C | 18 ++ .../g++.target/aarch64/mv-symbols14.C | 16 ++ .../g++.target/aarch64/mv-symbols15.C | 16 ++ .../g++.target/aarch64/mv-symbols2.C | 14 +- .../g++.target/aarch64/mv-symbols3.C | 8 +- .../g++.target/aarch64/mv-symbols4.C | 8 +- 
.../g++.target/aarch64/mv-symbols5.C | 8 +- .../g++.target/aarch64/mv-symbols6.C | 23 ++ .../g++.target/aarch64/mv-symbols8.C | 48 .../g++.targe
[PATCH v1 04/16] Remove unnecessary `record` argument from maybe_version_functions.
The `record` argument in maybe_version_functions was intended to allow controlling whether the relationship between versions is recorded. However, it only exercised this if both input functions were already marked as versioned, and this same logic is repeated in maybe_version_functions itself, so the argument is unnecessary. gcc/cp/ChangeLog: * class.cc (add_method): Remove argument. * cp-tree.h (maybe_version_functions): Ditto. * decl.cc (decls_match): Ditto. (maybe_version_functions): Ditto. --- gcc/cp/class.cc | 2 +- gcc/cp/cp-tree.h | 2 +- gcc/cp/decl.cc | 9 +++-- 3 files changed, 5 insertions(+), 8 deletions(-) diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc index f2f81a44718..a9a80d1b4be 100644 --- a/gcc/cp/class.cc +++ b/gcc/cp/class.cc @@ -1402,7 +1402,7 @@ add_method (tree type, tree method, bool via_using) /* If these are versions of the same function, process and move on. */ if (TREE_CODE (fn) == FUNCTION_DECL - && maybe_version_functions (method, fn, true)) + && maybe_version_functions (method, fn)) continue; if (DECL_INHERITED_CTOR (method)) diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index ec976928f5f..8eba8d455be 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -7114,7 +7114,7 @@ extern void determine_local_discriminator (tree, tree = NULL_TREE); extern bool member_like_constrained_friend_p (tree); extern bool fns_correspond (tree, tree); extern int decls_match(tree, tree, bool = true); -extern bool maybe_version_functions (tree, tree, bool); +extern bool maybe_version_functions (tree, tree); extern bool validate_constexpr_redeclaration (tree, tree); extern bool merge_default_template_args (tree, tree, bool); extern tree duplicate_decls (tree, tree, diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc index cf5e055e146..3b3b4481964 100644 --- a/gcc/cp/decl.cc +++ b/gcc/cp/decl.cc @@ -1215,9 +1215,7 @@ decls_match (tree newdecl, tree olddecl, bool record_versions /* = true */) && targetm.target_option.function_versions (newdecl, olddecl)) { if (record_versions) - 
maybe_version_functions (newdecl, olddecl, - (!DECL_FUNCTION_VERSIONED (newdecl) - || !DECL_FUNCTION_VERSIONED (olddecl))); + maybe_version_functions (newdecl, olddecl); return 0; } } @@ -1288,7 +1286,7 @@ maybe_mark_function_versioned (tree decl) If RECORD is set to true, record function versions. */ bool -maybe_version_functions (tree newdecl, tree olddecl, bool record) +maybe_version_functions (tree newdecl, tree olddecl) { if (!targetm.target_option.function_versions (newdecl, olddecl)) return false; @@ -1311,8 +1309,7 @@ maybe_version_functions (tree newdecl, tree olddecl, bool record) maybe_mark_function_versioned (newdecl); } - if (record) -cgraph_node::record_function_versions (olddecl, newdecl); + cgraph_node::record_function_versions (olddecl, newdecl); return true; }
[PATCH v1 07/16] Add version of make_attribute supporting string_slice.
gcc/ChangeLog: * attribs.cc (make_attribute): New function overload. * attribs.h (make_attribute): New function overload. --- gcc/attribs.cc | 19 ++- gcc/attribs.h | 1 + 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/gcc/attribs.cc b/gcc/attribs.cc index 5cf45491ada..cb25845715d 100644 --- a/gcc/attribs.cc +++ b/gcc/attribs.cc @@ -1090,7 +1090,24 @@ make_attribute (const char *name, const char *arg_name, tree chain) return attr; } - +/* Makes a function attribute of the form NAME (ARG_NAME) and chains + it to CHAIN. */ + +tree +make_attribute (string_slice name, string_slice arg_name, tree chain) +{ + tree attr_name; + tree attr_arg_name; + tree attr_args; + tree attr; + + attr_name = get_identifier_with_length (name.begin (), name.size ()); + attr_arg_name = build_string (arg_name.size (), arg_name.begin ()); + attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE); + attr = tree_cons (attr_name, attr_args, chain); + return attr; +} + /* Common functions used for target clone support. */ /* Comparator function to be used in qsort routine to sort attribute diff --git a/gcc/attribs.h b/gcc/attribs.h index 4b946390f76..e7d592c5b41 100644 --- a/gcc/attribs.h +++ b/gcc/attribs.h @@ -46,6 +46,7 @@ extern tree get_attribute_name (const_tree); extern tree get_attribute_namespace (const_tree); extern void apply_tm_attr (tree, tree); extern tree make_attribute (const char *, const char *, tree); +extern tree make_attribute (string_slice, string_slice, tree); extern bool attribute_ignored_p (tree); extern bool attribute_ignored_p (const attribute_spec *const); extern bool any_nonignored_attribute_p (tree);
[PATCH v1 02/16] Add x86 FMV symbol tests
This is for testing the x86 mangling of FMV versioned function assembly names. gcc/testsuite/ChangeLog: * g++.target/i386/mv-symbols1.C: New test. * g++.target/i386/mv-symbols2.C: New test. * g++.target/i386/mv-symbols3.C: New test. * g++.target/i386/mv-symbols4.C: New test. * g++.target/i386/mv-symbols5.C: New test. * g++.target/i386/mvc-symbols1.C: New test. * g++.target/i386/mvc-symbols2.C: New test. * g++.target/i386/mvc-symbols3.C: New test. * g++.target/i386/mvc-symbols4.C: New test. --- gcc/testsuite/g++.target/i386/mv-symbols1.C | 68 gcc/testsuite/g++.target/i386/mv-symbols2.C | 56 gcc/testsuite/g++.target/i386/mv-symbols3.C | 44 + gcc/testsuite/g++.target/i386/mv-symbols4.C | 50 ++ gcc/testsuite/g++.target/i386/mv-symbols5.C | 56 gcc/testsuite/g++.target/i386/mvc-symbols1.C | 44 + gcc/testsuite/g++.target/i386/mvc-symbols2.C | 29 + gcc/testsuite/g++.target/i386/mvc-symbols3.C | 35 ++ gcc/testsuite/g++.target/i386/mvc-symbols4.C | 23 +++ 9 files changed, 405 insertions(+) create mode 100644 gcc/testsuite/g++.target/i386/mv-symbols1.C create mode 100644 gcc/testsuite/g++.target/i386/mv-symbols2.C create mode 100644 gcc/testsuite/g++.target/i386/mv-symbols3.C create mode 100644 gcc/testsuite/g++.target/i386/mv-symbols4.C create mode 100644 gcc/testsuite/g++.target/i386/mv-symbols5.C create mode 100644 gcc/testsuite/g++.target/i386/mvc-symbols1.C create mode 100644 gcc/testsuite/g++.target/i386/mvc-symbols2.C create mode 100644 gcc/testsuite/g++.target/i386/mvc-symbols3.C create mode 100644 gcc/testsuite/g++.target/i386/mvc-symbols4.C diff --git a/gcc/testsuite/g++.target/i386/mv-symbols1.C b/gcc/testsuite/g++.target/i386/mv-symbols1.C new file mode 100644 index 000..1290299aea5 --- /dev/null +++ b/gcc/testsuite/g++.target/i386/mv-symbols1.C @@ -0,0 +1,68 @@ +/* { dg-do compile } */ +/* { dg-require-ifunc "" } */ +/* { dg-options "-O0" } */ + +__attribute__((target("default"))) +int foo () +{ + return 1; +} + +__attribute__((target("arch=slm"))) +int foo () +{ 
+ return 3; +} + +__attribute__((target("sse4.2"))) +int foo () +{ + return 5; +} + +__attribute__((target("sse4.2"))) +int foo (int) +{ + return 6; +} + +__attribute__((target("arch=slm"))) +int foo (int) +{ + return 4; +} + +__attribute__((target("default"))) +int foo (int) +{ + return 2; +} + +int bar() +{ + return foo (); +} + +int bar(int x) +{ + return foo (x); +} + +/* When updating any of the symbol names in these tests, make sure to also + update any tests for their absence in mvc-symbolsN.C */ + +/* { dg-final { scan-assembler-times "\n_Z3foov:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n_Z3foov\.arch_slm:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n_Z3foov\.sse4.2:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n_Z3foov\.resolver:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n\tcall\t_Z7_Z3foovv\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n\t\.type\t_Z7_Z3foovv, @gnu_indirect_function\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n\t\.set\t_Z7_Z3foovv,_Z3foov\.resolver\n" 1 } } */ + +/* { dg-final { scan-assembler-times "\n_Z3fooi:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n_Z3fooi\.arch_slm:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n_Z3fooi\.sse4.2:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n_Z3fooi\.resolver:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n\tcall\t_Z7_Z3fooii\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n\t\.type\t_Z7_Z3fooii, @gnu_indirect_function\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n\t\.set\t_Z7_Z3fooii,_Z3fooi\.resolver\n" 1 } } */ diff --git a/gcc/testsuite/g++.target/i386/mv-symbols2.C b/gcc/testsuite/g++.target/i386/mv-symbols2.C new file mode 100644 index 000..8b75565d78d --- /dev/null +++ b/gcc/testsuite/g++.target/i386/mv-symbols2.C @@ -0,0 +1,56 @@ +/* { dg-do compile } */ +/* { dg-require-ifunc "" } */ +/* { dg-options "-O0" } */ + +__attribute__((target("default"))) +int foo () +{ + return 1; +} + 
+__attribute__((target("arch=slm"))) +int foo () +{ + return 3; +} + +__attribute__((target("sse4.2"))) +int foo () +{ + return 5; +} + +__attribute__((target("sse4.2"))) +int foo (int) +{ + return 6; +} + +__attribute__((target("arch=slm"))) +int foo (int) +{ + return 4; +} + +__attribute__((target("default"))) +int foo (int) +{ + return 2; +} + +/* When updating any of the symbol names in these tests, make sure to also + update any tests for their absence in mvc-symbolsN.C */ + +/* { dg-final { scan-assembler-times "\n_Z3foov:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n_Z3foov\.arch_slm:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n_Z3foov\.sse4.2:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n_Z3foov\.resolver:\n" 0 } } */ +/* { dg-final { s
[PATCH v1 11/16] Add clone_identifier function.
This is similar to clone_function_name and its siblings but takes an identifier tree node rather than a function declaration. This is to be used in conjunction with the identifier node stored in cgraph_function_version_info::assembler_name to mangle FMV functions in later patches. gcc/ChangeLog: * cgraph.h (clone_identifier): New function. * cgraphclones.cc (clone_identifier): New function. (clone_function_name): Refactor to use clone_identifier. --- gcc/cgraph.h| 1 + gcc/cgraphclones.cc | 16 ++-- 2 files changed, 11 insertions(+), 6 deletions(-) diff --git a/gcc/cgraph.h b/gcc/cgraph.h index 9561bce2c33..a4eff14ddf6 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -2627,6 +2627,7 @@ tree clone_function_name (const char *name, const char *suffix, tree clone_function_name (tree decl, const char *suffix, unsigned long number); tree clone_function_name (tree decl, const char *suffix); +tree clone_identifier (tree decl, const char *suffix); void tree_function_versioning (tree, tree, vec *, ipa_param_adjustments *, diff --git a/gcc/cgraphclones.cc b/gcc/cgraphclones.cc index 5332a433317..6b650849a63 100644 --- a/gcc/cgraphclones.cc +++ b/gcc/cgraphclones.cc @@ -557,6 +557,14 @@ clone_function_name (tree decl, const char *suffix) /* For consistency this needs to behave the same way as ASM_FORMAT_PRIVATE_NAME does, but without the final number suffix. */ + return clone_identifier (identifier, suffix); +} + +/* Return a new clone of ID ending with the string SUFFIX.
*/ + +tree +clone_identifier (tree id, const char *suffix) +{ char *separator = XALLOCAVEC (char, 2); separator[0] = symbol_table::symbol_suffix_separator (); separator[1] = 0; @@ -565,15 +573,11 @@ clone_function_name (tree decl, const char *suffix) #else const char *prefix = ""; #endif - char *result = ACONCAT ((prefix, - IDENTIFIER_POINTER (identifier), - separator, - suffix, - (char*)0)); + char *result = ACONCAT ( +(prefix, IDENTIFIER_POINTER (id), separator, suffix, (char *) 0)); return get_identifier (result); } - /* Create callgraph node clone with new declaration. The actual body will be copied later at compilation stage. The name of the new clone will be constructed from the name of the original node, SUFFIX and NUM_SUFFIX.
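As a rough illustration of what clone_identifier computes, here is a standalone sketch that uses std::string in place of GCC identifier nodes. It assumes the suffix separator is '.', which is what the x86 symbol tests expect (`_Z3foov.resolver`); the prefix parameter stands in for the target-dependent prefix, which is empty in the #else branch shown in the diff. The function names are hypothetical and this is not the GCC implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for symbol_table::symbol_suffix_separator ();
// assumed to return '.' here.
char suffix_separator_sketch ()
{
  return '.';
}

// Build PREFIX + ID + SEPARATOR + SUFFIX, mirroring the order of the
// ACONCAT arguments in clone_identifier.
std::string clone_identifier_sketch (const std::string &id,
                                     const std::string &suffix,
                                     const std::string &prefix = "")
{
  return prefix + id + suffix_separator_sketch () + suffix;
}
```

Taking the identifier rather than a decl means the same helper can build names from the assembler name stored in cgraph_function_version_info, without needing a FUNCTION_DECL whose DECL_ASSEMBLER_NAME has already been set.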
[PATCH v1 13/16] Remove unused target_clone parsing code.
This removes the target_clone parsing code that was replaced with get_clone_versions. gcc/ChangeLog: * multiple_target.cc (get_attr_str): Removed. (separate_attrs): Removed. * tree.cc (get_target_clone_attr_len): Removed. * tree.h (get_target_clone_attr_len): Removed. --- gcc/multiple_target.cc | 61 -- gcc/tree.cc| 26 -- gcc/tree.h | 1 - 3 files changed, 88 deletions(-) diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc index 6aeceadbfd1..4f748a81f9b 100644 --- a/gcc/multiple_target.cc +++ b/gcc/multiple_target.cc @@ -177,67 +177,6 @@ create_dispatcher_calls (struct cgraph_node *node) } } -/* Create string with attributes separated by TARGET_CLONES_ATTR_SEPARATOR. - Return number of attributes. */ - -static int -get_attr_str (tree arglist, char *attr_str) -{ - tree arg; - size_t str_len_sum = 0; - int argnum = 0; - - for (arg = arglist; arg; arg = TREE_CHAIN (arg)) -{ - const char *str = TREE_STRING_POINTER (TREE_VALUE (arg)); - size_t len = strlen (str); - for (const char *p = strchr (str, TARGET_CLONES_ATTR_SEPARATOR); - p; - p = strchr (p + 1, TARGET_CLONES_ATTR_SEPARATOR)) - argnum++; - memcpy (attr_str + str_len_sum, str, len); - attr_str[str_len_sum + len] - = TREE_CHAIN (arg) ? TARGET_CLONES_ATTR_SEPARATOR : '\0'; - str_len_sum += len + 1; - argnum++; -} - return argnum; -} - -/* Return number of attributes separated by TARGET_CLONES_ATTR_SEPARATOR - and put them into ARGS. - If there is no DEFAULT attribute return -1. - If there is an empty string in attribute return -2. - If there are multiple DEFAULT attributes return -3. 
- */ - -static int -separate_attrs (char *attr_str, char **attrs, int attrnum) -{ - int i = 0; - int default_count = 0; - static const char separator_str[] = { TARGET_CLONES_ATTR_SEPARATOR, 0 }; - - for (char *attr = strtok (attr_str, separator_str); - attr != NULL; attr = strtok (NULL, separator_str)) -{ - if (strcmp (attr, "default") == 0) - { - default_count++; - continue; - } - attrs[i++] = attr; -} - if (default_count == 0) -return -1; - else if (default_count > 1) -return -3; - else if (i + default_count < attrnum) -return -2; - - return i; -} - /* Creates target clone of NODE. */ static cgraph_node * diff --git a/gcc/tree.cc b/gcc/tree.cc index 346522d01c0..9856f190367 100644 --- a/gcc/tree.cc +++ b/gcc/tree.cc @@ -15273,32 +15273,6 @@ get_attr_nonstring_decl (tree expr, tree *ref) return NULL_TREE; } -/* Return length of attribute names string, - if arglist chain > 1, -1 otherwise. */ - -int -get_target_clone_attr_len (tree arglist) -{ - tree arg; - int str_len_sum = 0; - int argnum = 0; - - for (arg = arglist; arg; arg = TREE_CHAIN (arg)) -{ - const char *str = TREE_STRING_POINTER (TREE_VALUE (arg)); - size_t len = strlen (str); - str_len_sum += len + 1; - for (const char *p = strchr (str, TARGET_CLONES_ATTR_SEPARATOR); - p; - p = strchr (p + 1, TARGET_CLONES_ATTR_SEPARATOR)) - argnum++; - argnum++; -} - if (argnum <= 1) -return -1; - return str_len_sum; -} - /* Returns an auto_vec of string_slices containing the version strings from ARGLIST. DEFAULT_COUNT is incremented for each default version found. */ diff --git a/gcc/tree.h b/gcc/tree.h index aea1cf078a0..df64d9cc847 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -7035,7 +7035,6 @@ extern unsigned fndecl_dealloc_argno (tree); object or pointer. Otherwise return null. */ extern tree get_attr_nonstring_decl (tree, tree * = NULL); -extern int get_target_clone_attr_len (tree); auto_vec get_clone_versions (const tree, int * = NULL); auto_vec
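The removed separate_attrs tokenised with strtok, which writes NUL bytes into its input and silently merges consecutive separators, so empty fields had to be detected indirectly by comparing counts. A non-owning slice avoids both problems. The sketch below uses std::string_view as a stand-in for the series' string_slice class; the function name split_versions and its error handling are hypothetical and only mirror the return codes documented on separate_attrs, not the actual signature of get_clone_versions. The separator is a parameter because TARGET_CLONES_ATTR_SEPARATOR is target-defined (',' is assumed in the usage below).

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Split ARGS on SEP into non-owning views.  ARGS is never modified and
// empty fields are observed rather than skipped.  Non-default versions
// are appended to VERSIONS.  Returns how many were appended, or the codes
// documented on the old separate_attrs:
//   -1  no "default" version
//   -3  more than one "default" version
//   -2  an empty version string
int split_versions (std::string_view args, char sep,
                    std::vector<std::string_view> &versions)
{
  int default_count = 0;
  int count = 0;
  bool saw_empty = false;
  std::size_t start = 0;

  while (true)
    {
      std::size_t end = args.find (sep, start);
      std::size_t len = (end == std::string_view::npos
                         ? std::string_view::npos : end - start);
      std::string_view piece = args.substr (start, len);

      if (piece.empty ())
        saw_empty = true;
      else if (piece == "default")
        default_count++;
      else
        {
          versions.push_back (piece);
          count++;
        }

      if (end == std::string_view::npos)
        break;
      start = end + 1;
    }

  if (default_count == 0)
    return -1;
  if (default_count > 1)
    return -3;
  if (saw_empty)
    return -2;
  return count;
}
```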
[PATCH] lto/113207 - fix free_lang_data_in_type
When we process function types we strip volatile and const qualifiers after building a simplified type variant (which preserves those). The qualified type handling of both isn't really compatible, so avoid bad interaction by swapping this, first dropping const/volatile qualifiers and then building the simplified type thereof. LTO bootstrapped on x86_64-unknown-linux-gnu (with extra checking as indicated in the PR), testing in progress. I'll push this unless you have any further comments and queue the extra checking for stage1. PR lto/113207 * ipa-free-lang-data.cc (free_lang_data_in_type): First drop const/volatile qualifiers from function argument types, then build a simplified type. * gcc.dg/pr113207.c: New testcase. --- gcc/ipa-free-lang-data.cc | 3 +-- gcc/testsuite/gcc.dg/pr113207.c | 10 ++ 2 files changed, 11 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/pr113207.c diff --git a/gcc/ipa-free-lang-data.cc b/gcc/ipa-free-lang-data.cc index be96d2928d7..a865332ddf1 100644 --- a/gcc/ipa-free-lang-data.cc +++ b/gcc/ipa-free-lang-data.cc @@ -441,9 +441,7 @@ free_lang_data_in_type (tree type, class free_lang_data_d *fld) different front ends. */ for (tree p = TYPE_ARG_TYPES (type); p; p = TREE_CHAIN (p)) { - TREE_VALUE (p) = fld_simplified_type (TREE_VALUE (p), fld); tree arg_type = TREE_VALUE (p); - if (TYPE_READONLY (arg_type) || TYPE_VOLATILE (arg_type)) { int quals = TYPE_QUALS (arg_type) @@ -453,6 +451,7 @@ free_lang_data_in_type (tree type, class free_lang_data_d *fld) if (!fld->pset.add (TREE_VALUE (p))) free_lang_data_in_type (TREE_VALUE (p), fld); } + TREE_VALUE (p) = fld_simplified_type (TREE_VALUE (p), fld); /* C++ FE uses TREE_PURPOSE to store initial values. 
*/ TREE_PURPOSE (p) = NULL; } diff --git a/gcc/testsuite/gcc.dg/pr113207.c b/gcc/testsuite/gcc.dg/pr113207.c new file mode 100644 index 000..81f53d8fcc2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr113207.c @@ -0,0 +1,10 @@ +/* { dg-compile } */ +/* { dg-require-effective-target lto } */ +/* { dg-options "-flto -fchecking" } */ + +typedef struct cl_lispunion *cl_object; +struct cl_lispunion {}; +cl_object cl_error() __attribute__((noreturn)); +volatile cl_object cl_coerce_value0; +void cl_coerce() { cl_error(); } +void L66safe_canonical_type(cl_object volatile); -- 2.43.0
Re: [PATCH v1 06/16] Change function versions to be implicitly ordered.
Alfie Richards writes: > This changes function version structures to maintain the default version > as the first declaration in the linked data structures by giving priority > to the set containing the default when constructing the structure. > > This allows for removing logic for moving the default to the first > position which was duplicated across target specific code and enables > easier reasoning about function sets when checking for a default. > > gcc/ChangeLog: > > * cgraph.cc (cgraph_node::record_function_versions): Update to > implicitly keep default first. > * config/aarch64/aarch64.cc (aarch64_get_function_versions_dispatcher): > Remove reordering. > * config/i386/i386-features.cc (ix86_get_function_versions_dispatcher): > Remove reordering. > * config/riscv/riscv.cc (riscv_get_function_versions_dispatcher): > Remove reordering. > * config/rs6000/rs6000.cc (rs6000_get_function_versions_dispatcher): > Remove reordering. Thanks, this is a really nice clean-up. I see that it's already the documented expectation: /* Chains all the semantically identical function versions. The first function in this chain is the version_info node of the default function. */ cgraph_function_version_info *prev; So in a sense the patch isn't changing the structures. It's simply making the current expectation always hold, rather than hold after a certain point. Some comments below. 
> --- > gcc/cgraph.cc| 39 +++--- > gcc/config/aarch64/aarch64.cc| 37 +++- > gcc/config/i386/i386-features.cc | 33 - > gcc/config/riscv/riscv.cc| 41 +++- > gcc/config/rs6000/rs6000.cc | 35 +-- > 5 files changed, 58 insertions(+), 127 deletions(-) > > diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc > index d0b19ad850e..1ea38d16e56 100644 > --- a/gcc/cgraph.cc > +++ b/gcc/cgraph.cc > @@ -236,37 +236,58 @@ cgraph_node::delete_function_version_by_decl (tree decl) > void > cgraph_node::record_function_versions (tree decl1, tree decl2) > { > - cgraph_node *decl1_node = cgraph_node::get_create (decl1); > - cgraph_node *decl2_node = cgraph_node::get_create (decl2); > + cgraph_node *decl1_node; > + cgraph_node *decl2_node; >cgraph_function_version_info *decl1_v = NULL; >cgraph_function_version_info *decl2_v = NULL; >cgraph_function_version_info *before; >cgraph_function_version_info *after; > + cgraph_function_version_info *temp_node; > + > + decl1_node = cgraph_node::get_create (decl1); > + decl2_node = cgraph_node::get_create (decl2); > >gcc_assert (decl1_node != NULL && decl2_node != NULL); >decl1_v = decl1_node->function_version (); >decl2_v = decl2_node->function_version (); > > - if (decl1_v != NULL && decl2_v != NULL) > -return; > - Could you go into more detail about why this return needs to be removed? It seems like the assumption was that, if the two decls were already versioned, they were already versions of the same thing. For example, we wouldn't create a set of 4 versions and a set of 2 versions and only then merge them into a single set of 6 versions. Is that not the case with the new scheme? 
If we could keep the return, then we could add: if (is_function_default_version (decl2) || (!decl1_v && !is_function_default_version (decl1))) { std::swap (decl1, decl2); std::swap (decl1_v, decl2_v); } after it and then proceed as before, on the basis that (a) decl1_v and decl2_v are individually canonical and (b) after the swap, any default must be decl1 or earlier in decl1_v. That would avoid a bit of extra pointer chasing. >if (decl1_v == NULL) > decl1_v = decl1_node->insert_new_function_version (); > >if (decl2_v == NULL) > decl2_v = decl2_node->insert_new_function_version (); > > - /* Chain decl2_v and decl1_v. All semantically identical versions > - will be chained together. */ > + gcc_assert (decl1_v); > + gcc_assert (decl2_v); > >before = decl1_v; >after = decl2_v; > > + /* Go to first after node. */ > + while (after->prev != NULL) > +after = after->prev; > + > + /* Go to first before node. */ > + while (before->prev != NULL) > +before = before->prev; > + > + /* These are already recorded as versions. */ > + if (before == after) > +return; > + > + /* Possibly swap to make sure the default node stays at the front. */ > + if (is_function_default_version (after->this_node->decl)) > +{ > + temp_node = after; > + after = before; > + before = temp_node; > +} > + > + /* Go to last node of before. */ >while (before->next != NULL) > before = before->next; > > - while (after->prev != NULL) > -after= after->prev; > + /* Chain decl2_v and decl1_v. */ > >before->next = after; >after->prev = before; > [...] > diff
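Stripped of the cgraph types, the merge logic in the new hunk can be sketched as follows. The struct and function names are hypothetical stand-ins for cgraph_function_version_info and cgraph_node::record_function_versions. The sketch only shows how walking both nodes back to their chain heads lets the `before == after` check detect decls already in the same set, and how the swap keeps a default-headed chain at the front.

```cpp
#include <cassert>
#include <utility>

// Hypothetical, simplified stand-in for cgraph_function_version_info:
// one node in a doubly linked chain of versions of the same function.
struct version_info
{
  bool is_default = false;
  version_info *prev = nullptr;
  version_info *next = nullptr;
};

// Merge the chains containing A and B.  If either chain is headed by the
// default version, the merged chain is headed by it too.
void record_versions (version_info *a, version_info *b)
{
  version_info *before = a;
  version_info *after = b;

  // Walk both nodes back to the heads of their chains.
  while (before->prev != nullptr)
    before = before->prev;
  while (after->prev != nullptr)
    after = after->prev;

  // Same head: A and B are already recorded as versions of each other.
  if (before == after)
    return;

  // Keep a default-headed chain at the front.
  if (after->is_default)
    std::swap (before, after);

  // Append AFTER's chain to the tail of BEFORE's chain.
  while (before->next != nullptr)
    before = before->next;
  before->next = after;
  after->prev = before;
}
```

Recording (x, y) and then (y, d), where d is the default, yields the chain d, x, y; recording (x, d) afterwards is a no-op because both nodes already share the head d.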
Re: [PATCH v1 07/16] Add version of make_attribute supporting string_slice.
Alfie Richards writes: > gcc/ChangeLog: > > * attribs.cc (make_attribute): New function overload. > * attribs.h (make_attribute): New function overload. > --- > gcc/attribs.cc | 19 ++- > gcc/attribs.h | 1 + > 2 files changed, 19 insertions(+), 1 deletion(-) > > diff --git a/gcc/attribs.cc b/gcc/attribs.cc > index 5cf45491ada..cb25845715d 100644 > --- a/gcc/attribs.cc > +++ b/gcc/attribs.cc > @@ -1090,7 +1090,24 @@ make_attribute (const char *name, const char > *arg_name, tree chain) >return attr; > } > > - > +/* Makes a function attribute of the form NAME (ARG_NAME) and chains > + it to CHAIN. */ > + > +tree > +make_attribute (string_slice name, string_slice arg_name, tree chain) > +{ > + tree attr_name; > + tree attr_arg_name; > + tree attr_args; > + tree attr; > + > + attr_name = get_identifier_with_length (name.begin (), name.size ()); > + attr_arg_name = build_string (arg_name.size (), arg_name.begin ()); > + attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE); > + attr = tree_cons (attr_name, attr_args, chain); > + return attr; > +} > + It seems to be more usual in new code to prefer initialisation over assignment where possible, so: tree attr_name = get_identifier_with_length (name.begin (), name.size ()); tree attr_arg_name = build_string (arg_name.size (), arg_name.begin ()); tree attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE); tree attr = tree_cons (attr_name, attr_args, chain); return attr; OK for GCC 16 with that change, thanks. Richard > /* Common functions used for target clone support. 
*/ > > /* Comparator function to be used in qsort routine to sort attribute > diff --git a/gcc/attribs.h b/gcc/attribs.h > index 4b946390f76..e7d592c5b41 100644 > --- a/gcc/attribs.h > +++ b/gcc/attribs.h > @@ -46,6 +46,7 @@ extern tree get_attribute_name (const_tree); > extern tree get_attribute_namespace (const_tree); > extern void apply_tm_attr (tree, tree); > extern tree make_attribute (const char *, const char *, tree); > +extern tree make_attribute (string_slice, string_slice, tree); > extern bool attribute_ignored_p (tree); > extern bool attribute_ignored_p (const attribute_spec *const); > extern bool any_nonignored_attribute_p (tree);
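The `string_slice` arguments in the overload above are the non-owning (pointer, length) views added in PATCH v1 03/16 of this series, which also provides `strtok`, `strip` and `strcmp` helpers. Below is a minimal, self-contained sketch of those semantics. It uses a hypothetical `str_view` type rather than GCC's `string_slice`, which derives from `array_slice` and is not reproduced here. A null pointer stands in for an invalid slice, and `compare` is an illustrative name that avoids clashing with `std::strcmp`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for string_slice: a non-owning view of LEN bytes
// starting at PTR. A null PTR marks an invalid slice.
struct str_view
{
  const char *ptr;
  std::size_t len;

  str_view () : ptr (""), len (0) {}
  str_view (const char *s) : ptr (s), len (std::strlen (s)) {}
  str_view (const char *s, std::size_t n) : ptr (s), len (n) {}

  static str_view invalid ()
  {
    str_view v;
    v.ptr = nullptr;
    return v;
  }

  bool is_valid () const { return ptr != nullptr; }
  bool empty () const { return len == 0; }
  const char *begin () const { return ptr; }
  const char *end () const { return ptr + len; }

  // Return the text before the first byte of *STR found in DELIMS and leave
  // the rest (after that byte) in *STR. With no delimiter left, return all
  // of *STR and mark *STR invalid. An empty or invalid *STR yields invalid.
  static str_view strtok (str_view *str, str_view delims)
  {
    if (!str->is_valid () || str->empty ())
      {
        *str = invalid ();
        return invalid ();
      }
    for (const char *p = str->begin (); p < str->end (); ++p)
      for (const char *d = delims.begin (); d < delims.end (); ++d)
        if (*p == *d)
          {
            str_view token (str->begin (), (std::size_t) (p - str->begin ()));
            *str = str_view (p + 1, (std::size_t) (str->end () - p - 1));
            return token;
          }
    str_view rest = *str;
    *str = invalid ();
    return rest;
  }

  // Trim leading and trailing whitespace without copying any bytes.
  str_view strip () const
  {
    const char *s = begin ();
    const char *e = end ();
    while (s < e && std::isspace ((unsigned char) *s))
      ++s;
    while (e > s && std::isspace ((unsigned char) e[-1]))
      --e;
    return str_view (s, (std::size_t) (e - s));
  }
};

// Three-way comparison in the style of strcmp: bytes first, then length.
int compare (str_view a, str_view b)
{
  for (std::size_t i = 0; i < a.len && i < b.len; ++i)
    {
      if (a.ptr[i] < b.ptr[i])
        return -1;
      if (a.ptr[i] > b.ptr[i])
        return 1;
    }
  if (a.len < b.len)
    return -1;
  if (a.len > b.len)
    return 1;
  return 0;
}
```

Splitting "a, b,,c" on "," with this sketch yields "a", " b", "" and "c", and the source view is then invalid. This matches the token-by-token behaviour exercised by test_string_slice_strtok in that patch.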
Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale
> > I don't think we should add a new target hook unless it's providing > > genuinely new information about the target. Hooking into the RA to > > brute-force a particular heuristic makes it harder to improve the RA > > in future. > > > > There are already hooks that provide the costs of the relevant operations, > > so I think we should concentrate on using those to get good results for > > both Power and x86. > > It isn't just about Power and x86. > > commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b > Author: Surya Kumari Jangala > Date: Tue Jun 25 08:37:49 2024 -0500 > > ira: Scale save/restore costs of callee save registers with block > frequency > > caused regressions on many targets, including aarch64: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116028 > > I don't understand why frequency can be used to scale the > cost. Adding such an absolute hook does not seem to make sense to me, especially if the problem happens on multiple targets. Let me see if I understand what is going on. Checking one of the PRs, this is a testcase the patch broke:
void foo (void);
void bar (void);
int test (int a)
{
  int r;
  if (r = -a)
    foo ();
  else
    bar ();
  return r;
}
Here the cost model should decide whether an extra spill in the prologue/epilogue is cheaper than extra spills around the calls to foo () and bar (). REG_FREQ_FROM_BB should return values in the range 1...REG_FREQ_MAX, which express the relative frequencies of BBs compressed to 0...REG_FREQ_MAX; 0 is avoided to prevent the cost model from thinking that spilling is completely free. In the case above we should have entry_block_freq = REG_FREQ_MAX (so the prologue/epilogue cost is scaled by 1000), and the blocks calling foo/bar should have frequencies of roughly REG_FREQ_MAX * p and REG_FREQ_MAX * (1-p), where p is the probability of the conditional. So I guess IRA ends up comparing
- spill_cost * REG_FREQ_MAX for using a callee-saved register, and
- spill_cost * (p + 1-p) * REG_FREQ_MAX for using a caller-saved register.
This correctly represents that in both cases we will end up executing the same amount of spill code. If we want it to choose the variant with the smaller static spill count, we could add 1 to the final costs for every spill instruction, which will bias the cost model that way... What would make sense would be to express to IRA that moves used in the prologue/epilogue are slightly cheaper on x86 than regular moves, since we can use push/pop pairs, which encode shorter than a stack frame adjustment plus the regular store/load used to spill around a call. Honza > > -- > H.J.
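The cost comparison above can be sketched numerically. This is a hypothetical model, not IRA code. The names `reg_freq_from_permille`, `callee_saved_cost`, `caller_saved_cost` and `with_static_bias` are illustrative. Block frequencies are given in per-mille of the entry block, and `kRegFreqMax` mirrors GCC's `REG_FREQ_MAX` of 1000.

```cpp
#include <cassert>

// Mirrors REG_FREQ_MAX from GCC's regs.h; everything else is illustrative.
constexpr long kRegFreqMax = 1000;

// Map a block frequency, given in per-mille of the entry block, into
// [1, kRegFreqMax]. Zero is avoided so that a spill is never free.
long reg_freq_from_permille (long permille)
{
  long f = permille * kRegFreqMax / 1000;
  if (f < 1)
    return 1;
  if (f > kRegFreqMax)
    return kRegFreqMax;
  return f;
}

// Callee-saved register: one save/restore in the prologue/epilogue,
// weighted by the entry block frequency.
long callee_saved_cost (long spill_cost, long entry_freq)
{
  return spill_cost * entry_freq;
}

// Caller-saved register: a save/restore around each call, weighted by the
// frequency of the block containing that call.
long caller_saved_cost (long spill_cost, long freq_foo, long freq_bar)
{
  return spill_cost * (freq_foo + freq_bar);
}

// Tie-breaker suggested in the mail: add 1 per static spill instruction so
// that, at equal dynamic cost, the variant with less spill code wins.
long with_static_bias (long cost, long static_spill_insns)
{
  assert (static_spill_insns >= 0);
  return cost + static_spill_insns;
}
```

With p = 0.3 (300 per mille) and a spill cost of 4, both variants cost 4000, which is the tie described in the mail. Adding one unit per static spill instruction then favours the callee-saved register: 2 for the prologue/epilogue pair, against 4 for a save/restore around each of the two calls.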
[PATCH v2] c++: Properly detect calls to digest_init in build_vec_init [PR114619]
Hi Jason, On 16 Jan 2025, at 23:28, Jason Merrill wrote: > On 10/19/24 5:09 AM, Simon Martin wrote: >> We currently ICE in checking mode with cxx_dialect < 17 on the >> following >> valid code >> >> === cut here === >> struct X { >>X(const X&) {} >> }; >> extern X x; >> void foo () { >>new X[1]{x}; >> } >> === cut here === >> >> The problem is that cp_gimplify_expr gcc_checking_asserts that a >> TARGET_EXPR is not TARGET_EXPR_ELIDING_P (or cannot be elided), while >> in >> this case with cxx_dialect < 17, it is TARGET_EXPR_ELIDING_P but we >> have >> not even tried to elide. >> >> This patch relaxes that gcc_checking_assert to not fail when using >> cxx_dialect < 17 and -fno-elide-constructors (I considered being more >> clever at setting TARGET_EXPR_ELIDING_P appropriately but it looks >> more >> risky and not worth the extra complexity for a checking assert). > > The problem is that in that case we end up with two copy constructor > calls instead of one: one built in massage_init_elt, and the other in > expand_default_init. The result of the first copy is marked > TARGET_EXPR_ELIDING_P, so when we try to pass it to the second copy we > hit the assert. I think the assert is catching a real bug: even with > -fno-elide-constructors we should only copy once, not twice. That’s right, thanks for pointing me in the right direction. > This seems to be because 'digested' has the wrong value in > build_vec_init; we did just call digest_init in build_new_1, but > build_vec_init doesn't understand that. The test to determine whether digest_init has been called is indeed incorrect, in that it will work if BASE is a reference to the array but not if it’s a pointer to its first element. The attached updated patch fixes this. Successfully tested on x86_64-pc-linux-gnu. OK for trunk? 
Simon From 578ac1a022ff039cdca45cdfca31bdfe8b571b79 Mon Sep 17 00:00:00 2001 From: Simon Martin Date: Mon, 3 Feb 2025 11:43:14 +0100 Subject: [PATCH] c++: Properly detect calls to digest_init in build_vec_init [PR114619] We currently ICE in checking mode with cxx_dialect < 17 on the following valid code === cut here === struct X { X(const X&) {} }; extern X x; void foo () { new X[1]{x}; } === cut here === We trip on a gcc_checking_assert in cp_gimplify_expr due to a TARGET_EXPR that is marked TARGET_EXPR_ELIDING_P but was never elided. As pointed out by Jason, the problem is that build_vec_init does not recognize that digest_init has been called, and we end up calling the copy constructor twice. This happens because the detection in build_vec_init assumes that BASE is a reference to the array, while it's a pointer to its first element here. This patch makes sure that the detection works in both cases. Successfully tested on x86_64-pc-linux-gnu. PR c++/114619 gcc/cp/ChangeLog: * init.cc (build_vec_init): Properly determine whether digest_init has been called. gcc/testsuite/ChangeLog: * g++.dg/init/no-elide4.C: New test. --- gcc/cp/init.cc| 3 ++- gcc/testsuite/g++.dg/init/no-elide4.C | 11 +++ 2 files changed, 13 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/init/no-elide4.C diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc index 3ab7f96335c..613775c5a7c 100644 --- a/gcc/cp/init.cc +++ b/gcc/cp/init.cc @@ -4786,7 +4786,8 @@ build_vec_init (tree base, tree maxindex, tree init, tree field, elt; /* If the constructor already has the array type, it's been through digest_init, so we shouldn't try to do anything more.
*/ - bool digested = same_type_p (atype, TREE_TYPE (init)); + bool digested = (TREE_CODE (TREE_TYPE (init)) == ARRAY_TYPE + && same_type_p (type, TREE_TYPE (TREE_TYPE (init)))); from_array = 0; if (length_check) diff --git a/gcc/testsuite/g++.dg/init/no-elide4.C b/gcc/testsuite/g++.dg/init/no-elide4.C new file mode 100644 index 000..9377d9f0161 --- /dev/null +++ b/gcc/testsuite/g++.dg/init/no-elide4.C @@ -0,0 +1,11 @@ +// PR c++/114619 +// { dg-do "compile" { target c++11 } } +// { dg-options "-fno-elide-constructors" } + +struct X { + X(const X&) {} +}; +extern X x; +void foo () { + new X[1]{x}; +} -- 2.44.0
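The new test only checks that the code compiles. The underlying expectation is that the single array element is copy-initialized from `x`, so exactly one copy-constructor call should happen. A runtime check of that expectation is sketched below. The default constructor and the definition of `x` are additions not present in the testcase, needed so the example links and runs. The `copies` counter and `copies_for_new_array` are hypothetical names. Per the discussion above, an affected compiler built with `-std=c++14 -fno-elide-constructors` would be expected to copy twice (or hit the checking assert).

```cpp
#include <cassert>

// Counts copy-constructor calls made while building the array.
static int copies = 0;

struct X
{
  X () {}                        // added so that `x` can be defined
  X (const X &) { ++copies; }
};

X x;                             // the testcase only declares `extern X x;`

// Returns how many copies `new X[1]{x}` performs.
int copies_for_new_array ()
{
  copies = 0;
  X *p = new X[1]{x};
  delete[] p;
  return copies;
}
```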
[PATCH v1 05/16] Update is_function_default_version to work with target_version.
Notably this respects target_version semantics where an unannotated function can be the default version. gcc/ChangeLog: * attribs.cc (is_function_default_version): Add target_version logic. --- gcc/attribs.cc | 28 1 file changed, 20 insertions(+), 8 deletions(-) diff --git a/gcc/attribs.cc b/gcc/attribs.cc index 56dd18c2fa8..5cf45491ada 100644 --- a/gcc/attribs.cc +++ b/gcc/attribs.cc @@ -1279,18 +1279,30 @@ make_dispatcher_decl (const tree decl) return func_decl; } -/* Returns true if DECL is multi-versioned using the target attribute, and this - is the default version. This function can only be used for targets that do - not support the "target_version" attribute. */ +/* Returns true if DECL is a multiversioned default. + With the target attribute semantics, returns true if the function is marked + as default with the target version. + With the target_version attribute semantics, returns true if the function + is either not annotated, or annotated as default. */ bool is_function_default_version (const tree decl) { - if (TREE_CODE (decl) != FUNCTION_DECL - || !DECL_FUNCTION_VERSIONED (decl)) -return false; - tree attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl)); - gcc_assert (attr); + tree attr; + if (TARGET_HAS_FMV_TARGET_ATTRIBUTE) +{ + if (!DECL_FUNCTION_VERSIONED (decl)) + return false; + attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl)); + if (!attr) + return false; +} + else +{ + attr = lookup_attribute ("target_version", DECL_ATTRIBUTES (decl)); + if (!attr) + return true; +} attr = TREE_VALUE (TREE_VALUE (attr)); return (TREE_CODE (attr) == STRING_CST && strcmp (TREE_STRING_POINTER (attr), "default") == 0);
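The decision logic of the updated `is_function_default_version` can be modelled outside GCC. In this sketch, `fake_decl` and `make_decl` are hypothetical stand-ins for a FUNCTION_DECL and its attribute list. The parameter `has_fmv_target_attribute` plays the role of `TARGET_HAS_FMV_TARGET_ATTRIBUTE`, and the FUNCTION_DECL check is kept as an assertion. None of these names are GCC APIs.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a FUNCTION_DECL: the map holds the string
// argument of attributes such as "target" or "target_version".
struct fake_decl
{
  bool is_function = true;
  bool versioned = false;  // models DECL_FUNCTION_VERSIONED
  std::map<std::string, std::string> attrs;
};

// Build a declaration; ATTR may be null for an unannotated function.
fake_decl make_decl (bool versioned, const char *attr, const char *arg)
{
  fake_decl d;
  d.versioned = versioned;
  if (attr)
    d.attrs[attr] = arg;
  return d;
}

// Mirrors the two branches of the patched is_function_default_version.
bool is_default_version (const fake_decl &decl, bool has_fmv_target_attribute)
{
  assert (decl.is_function);
  std::map<std::string, std::string>::const_iterator it;
  if (has_fmv_target_attribute)
    {
      // "target" semantics: only a versioned decl with target ("default").
      if (!decl.versioned)
        return false;
      it = decl.attrs.find ("target");
      if (it == decl.attrs.end ())
        return false;
    }
  else
    {
      // "target_version" semantics: no annotation means default.
      it = decl.attrs.find ("target_version");
      if (it == decl.attrs.end ())
        return true;
    }
  return it->second == "default";
}
```

The asymmetry between the two branches is the point of the patch: under target_version semantics an unannotated function is the default, while under target semantics it is not part of a version set at all.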
[PATCH v1 06/16] Change function versions to be implicitly ordered.
This changes function version structures to maintain the default version as the first declaration in the linked data structures by giving priority to the set containing the default when constructing the structure. This allows for removing logic for moving the default to the first position which was duplicated across target specific code and enables easier reasoning about function sets when checking for a default. gcc/ChangeLog: * cgraph.cc (cgraph_node::record_function_versions): Update to implicitly keep default first. * config/aarch64/aarch64.cc (aarch64_get_function_versions_dispatcher): Remove reordering. * config/i386/i386-features.cc (ix86_get_function_versions_dispatcher): Remove reordering. * config/riscv/riscv.cc (riscv_get_function_versions_dispatcher): Remove reordering. * config/rs6000/rs6000.cc (rs6000_get_function_versions_dispatcher): Remove reordering. --- gcc/cgraph.cc| 39 +++--- gcc/config/aarch64/aarch64.cc| 37 +++- gcc/config/i386/i386-features.cc | 33 - gcc/config/riscv/riscv.cc| 41 +++- gcc/config/rs6000/rs6000.cc | 35 +-- 5 files changed, 58 insertions(+), 127 deletions(-) diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc index d0b19ad850e..1ea38d16e56 100644 --- a/gcc/cgraph.cc +++ b/gcc/cgraph.cc @@ -236,37 +236,58 @@ cgraph_node::delete_function_version_by_decl (tree decl) void cgraph_node::record_function_versions (tree decl1, tree decl2) { - cgraph_node *decl1_node = cgraph_node::get_create (decl1); - cgraph_node *decl2_node = cgraph_node::get_create (decl2); + cgraph_node *decl1_node; + cgraph_node *decl2_node; cgraph_function_version_info *decl1_v = NULL; cgraph_function_version_info *decl2_v = NULL; cgraph_function_version_info *before; cgraph_function_version_info *after; + cgraph_function_version_info *temp_node; + + decl1_node = cgraph_node::get_create (decl1); + decl2_node = cgraph_node::get_create (decl2); gcc_assert (decl1_node != NULL && decl2_node != NULL); decl1_v = decl1_node->function_version (); decl2_v = 
decl2_node->function_version (); - if (decl1_v != NULL && decl2_v != NULL) -return; - if (decl1_v == NULL) decl1_v = decl1_node->insert_new_function_version (); if (decl2_v == NULL) decl2_v = decl2_node->insert_new_function_version (); - /* Chain decl2_v and decl1_v. All semantically identical versions - will be chained together. */ + gcc_assert (decl1_v); + gcc_assert (decl2_v); before = decl1_v; after = decl2_v; + /* Go to first after node. */ + while (after->prev != NULL) +after = after->prev; + + /* Go to first before node. */ + while (before->prev != NULL) +before = before->prev; + + /* These are already recorded as versions. */ + if (before == after) +return; + + /* Possibly swap to make sure the default node stays at the front. */ + if (is_function_default_version (after->this_node->decl)) +{ + temp_node = after; + after = before; + before = temp_node; +} + + /* Go to last node of before. */ while (before->next != NULL) before = before->next; - while (after->prev != NULL) -after= after->prev; + /* Chain decl2_v and decl1_v. */ before->next = after; after->prev = before; diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index be99137b052..15dd7dda48a 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -20630,7 +20630,6 @@ aarch64_get_function_versions_dispatcher (void *decl) struct cgraph_node *node = NULL; struct cgraph_node *default_node = NULL; struct cgraph_function_version_info *node_v = NULL; - struct cgraph_function_version_info *first_v = NULL; tree dispatch_decl = NULL; @@ -20647,37 +20646,17 @@ aarch64_get_function_versions_dispatcher (void *decl) if (node_v->dispatcher_resolver != NULL) return node_v->dispatcher_resolver; - /* Find the default version and make it the first node. */ - first_v = node_v; - /* Go to the beginning of the chain. 
*/ - while (first_v->prev != NULL) -first_v = first_v->prev; - default_version_info = first_v; - while (default_version_info != NULL) -{ - if (get_feature_mask_for_version - (default_version_info->this_node->decl) == 0ULL) - break; - default_version_info = default_version_info->next; -} - - /* If there is no default node, just return NULL. */ - if (default_version_info == NULL) -return NULL; - - /* Make default info the first node. */ - if (first_v != default_version_info) -{ - default_version_info->prev->next = default_version_info->next; - if (default_version_info->next) - default_version_info->next->prev = default_version_info->prev; - first_v->prev = default_version_info; -
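The invariant this patch relies on can be shown with a hypothetical stand-in for `cgraph_function_version_info`. Assume each existing chain already has its default version (if any) at its head. Then merging two chains only requires putting the chain whose head is a default in front, and the targets no longer need to search for the default and move it. `version_node`, `first_of`, `last_of` and `record_versions` are illustrative names, not GCC APIs.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for cgraph_function_version_info.
struct version_node
{
  bool is_default = false;
  version_node *prev = nullptr;
  version_node *next = nullptr;
};

version_node *first_of (version_node *v)
{
  while (v->prev)
    v = v->prev;
  return v;
}

version_node *last_of (version_node *v)
{
  while (v->next)
    v = v->next;
  return v;
}

// Merge the chains containing A and B. Assumes each chain already has its
// default (if any) at the head, and preserves that property.
void record_versions (version_node *a, version_node *b)
{
  version_node *before = first_of (a);
  version_node *after = first_of (b);
  if (before == after)
    return;  // Already recorded as versions of each other.
  if (after->is_default)
    std::swap (before, after);  // Keep the chain with the default in front.
  before = last_of (before);
  before->next = after;
  after->prev = before;
}
```

For example, recording (a, c) and then (c, b), where only b is a default, yields the chain b, a, c. Recording (a, c) again is a no-op because both nodes already share a head.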
[PATCH v1 09/16] Add assembler_name to cgraph_function_version_info.
This adds the assembler_name member to cgraph_function_version_info to store the base assembler name for the function to be mangled. This is used in later patches for refactoring FMV mangling. gcc/c/ChangeLog: * c-decl.cc (start_decl): Record assembler_name. (start_function): Record assembler_name. gcc/ChangeLog: * cgraph.cc (cgraph_node::record_function_versions): Record assembler_name. * cgraph.h (struct cgraph_function_version_info): Add assembler_name. gcc/cp/ChangeLog: * decl.cc (maybe_mark_function_versioned): Record assembler_name. (start_decl): Record assembler_name. (start_preparsed_function): Record assembler_name. --- gcc/c/c-decl.cc | 20 gcc/cgraph.cc | 10 -- gcc/cgraph.h| 3 +++ gcc/cp/decl.cc | 34 ++ 4 files changed, 65 insertions(+), 2 deletions(-) diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc index 0dcbae9b26f..daa19f360e6 100644 --- a/gcc/c/c-decl.cc +++ b/gcc/c/c-decl.cc @@ -5762,6 +5762,16 @@ start_decl (struct c_declarator *declarator, struct c_declspecs *declspecs, && VAR_OR_FUNCTION_DECL_P (decl)) objc_check_global_decl (decl); + /* Store the base assembler name for mangling later. */ + if (TREE_CODE (decl) == FUNCTION_DECL + && lookup_attribute ("target_clones", DECL_ATTRIBUTES (decl))) +{ + cgraph_node *node = cgraph_node::get_create (decl); + if (!node->function_version ()) + node->insert_new_function_version (); + node->function_version ()->assembler_name = DECL_ASSEMBLER_NAME (decl); +} + /* Add this decl to the current scope. TEM may equal DECL or it may be a previous decl of the same name. */ if (do_push) @@ -10863,6 +10873,16 @@ start_function (struct c_declspecs *declspecs, struct c_declarator *declarator, current_function_decl = pushdecl (decl1); + /* Store the base assembler name for mangling later.
*/ + if (TREE_CODE (decl1) == FUNCTION_DECL + && lookup_attribute ("target_clones", DECL_ATTRIBUTES (decl1))) +{ + cgraph_node *node = cgraph_node::get_create (decl1); + if (!node->function_version ()) + node->insert_new_function_version (); + node->function_version ()->assembler_name = DECL_ASSEMBLER_NAME (decl1); +} + if (tree access = build_attr_access_from_parms (parms, false)) decl_attributes (&current_function_decl, access, ATTR_FLAG_INTERNAL, old_decl); diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc index 1ea38d16e56..c2038be4671 100644 --- a/gcc/cgraph.cc +++ b/gcc/cgraph.cc @@ -252,10 +252,16 @@ cgraph_node::record_function_versions (tree decl1, tree decl2) decl2_v = decl2_node->function_version (); if (decl1_v == NULL) -decl1_v = decl1_node->insert_new_function_version (); +{ + decl1_v = decl1_node->insert_new_function_version (); + decl1_v->assembler_name = DECL_ASSEMBLER_NAME (decl1); +} if (decl2_v == NULL) -decl2_v = decl2_node->insert_new_function_version (); +{ + decl2_v = decl2_node->insert_new_function_version (); + decl2_v->assembler_name = DECL_ASSEMBLER_NAME (decl2); +} gcc_assert (decl1_v); gcc_assert (decl2_v); diff --git a/gcc/cgraph.h b/gcc/cgraph.h index 065fcc742e8..d9177364b7a 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -856,6 +856,9 @@ struct GTY((for_user)) cgraph_function_version_info { dispatcher. The dispatcher decl is an alias to the resolver function decl. */ tree dispatcher_resolver; + + /* The assembly name of the function set before version mangling.
*/ + tree assembler_name; }; #define DEFCIFCODE(code, type, string) CIF_ ## code, diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc index 3b3b4481964..fdef98f8062 100644 --- a/gcc/cp/decl.cc +++ b/gcc/cp/decl.cc @@ -1273,6 +1273,12 @@ maybe_mark_function_versioned (tree decl) { if (!DECL_FUNCTION_VERSIONED (decl)) { + cgraph_node *node = cgraph_node::get_create (decl); + if (!node->function_version ()) + node->insert_new_function_version (); + if (!node->function_version ()->assembler_name) + node->function_version ()->assembler_name = DECL_ASSEMBLER_NAME (decl); + DECL_FUNCTION_VERSIONED (decl) = 1; /* If DECL_ASSEMBLER_NAME has already been set, re-mangle to include the version marker. */ @@ -6155,6 +6161,20 @@ start_decl (const cp_declarator *declarator, was_public = TREE_PUBLIC (decl); + /* Set the assembler string for any versioned function. */ + if (TREE_CODE (decl) == FUNCTION_DECL + && (lookup_attribute (TARGET_HAS_FMV_TARGET_ATTRIBUTE ? "target" + : "target_version", + DECL_ATTRIBUTES (decl)) + || lookup_attribute ("target_clones", DECL_ATTRIBUTES (decl)))) +{ + cgraph_node *node = cgraph_node::get_create (decl); + if (!node->function_version ()) + node->insert_new_function_version (); + if (!node->function_version ()->assembler_name) + node->function_version ()->as
Re: [PATCH v1 05/16] Update is_function_default_version to work with target_version.
Alfie Richards writes: > Notably this respects target_version semantics where an unannotated > function can be the default version. > > gcc/ChangeLog: > > * attribs.cc (is_function_default_version): Add target_version logic. Generally looks good to me, but: > --- > gcc/attribs.cc | 28 > 1 file changed, 20 insertions(+), 8 deletions(-) > > diff --git a/gcc/attribs.cc b/gcc/attribs.cc > index 56dd18c2fa8..5cf45491ada 100644 > --- a/gcc/attribs.cc > +++ b/gcc/attribs.cc > @@ -1279,18 +1279,30 @@ make_dispatcher_decl (const tree decl) >return func_decl; > } > > -/* Returns true if DECL is multi-versioned using the target attribute, and > this > - is the default version. This function can only be used for targets that > do > - not support the "target_version" attribute. */ > +/* Returns true if DECL a multiversioned default. > + With the target attribute semantics, returns true if the function is > marked > + as default with the target version. > + With the target_version attribute semantics, returns true if the function > + is either not annotated, or annotated as default. */ > > bool > is_function_default_version (const tree decl) > { > - if (TREE_CODE (decl) != FUNCTION_DECL > - || !DECL_FUNCTION_VERSIONED (decl)) > -return false; > - tree attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl)); > - gcc_assert (attr); It might be worth either preserving the FUNCTION_DECL test or turning it into an assert. With that change... > + tree attr; > + if (TARGET_HAS_FMV_TARGET_ATTRIBUTE) > +{ > + if (!DECL_FUNCTION_VERSIONED (decl)) > + return false; > + attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl)); > + if (!attr) > + return false; ...I suppose we should also preserve the original assert here, unless there's a specific reason not to. 
Thanks, Richard > +} > + else > +{ > + attr = lookup_attribute ("target_version", DECL_ATTRIBUTES (decl)); > + if (!attr) > + return true; > +} >attr = TREE_VALUE (TREE_VALUE (attr)); >return (TREE_CODE (attr) == STRING_CST > && strcmp (TREE_STRING_POINTER (attr), "default") == 0);
[committed] hppa: Revise various millicode insn patterns to use match_operand
Tested on hppa-unknown-linux-gnu and hppa64-hp-hpux11.11. Committed to trunk. Dave --- hppa: Revise various millicode insn patterns to use match_operand LRA does not correctly support hard-register input operands that are clobbered. This is needed to support millicode calls on hppa. The operand setup is sometimes deleted. This problem can be avoided by hiding hard-register input operands using match_operand. This also potentially allows for constraints that specify the operand is both read and written. 2025-02-03 John David Anglin gcc/ChangeLog: PR rtl-optimization/117248 * config/pa/predicates.md (r25_operand): New predicate. (r26_operand): Likewise. * config/pa/pa.md: Use match_operand for r25 and r26 hard register operands in mult, div, udiv, mod and umod millicode patterns. diff --git a/gcc/config/pa/pa.md b/gcc/config/pa/pa.md index df1b61e871f..23129940e64 100644 --- a/gcc/config/pa/pa.md +++ b/gcc/config/pa/pa.md @@ -5632,8 +5632,10 @@ (set_attr "length" "4")]) (define_insn "" - [(set (reg:SI 29) (mult:SI (reg:SI 26) (reg:SI 25))) - (clobber (match_operand:SI 0 "register_operand" "=a")) + [(set (reg:SI 29) + (mult:SI (match_operand:SI 1 "r26_operand" "") +(match_operand:SI 0 "r25_operand" ""))) + (clobber (match_operand:SI 2 "register_operand" "=a")) (clobber (reg:SI 26)) (clobber (reg:SI 25)) (clobber (reg:SI 31))] @@ -5645,8 +5647,10 @@ (symbol_ref "pa_attr_length_millicode_call (insn)")))]) (define_insn "" - [(set (reg:SI 29) (mult:SI (reg:SI 26) (reg:SI 25))) - (clobber (match_operand:SI 0 "register_operand" "=a")) + [(set (reg:SI 29) + (mult:SI (match_operand:SI 1 "r26_operand" "") +(match_operand:SI 0 "r25_operand" ""))) + (clobber (match_operand:SI 2 "register_operand" "=a")) (clobber (reg:SI 26)) (clobber (reg:SI 25)) (clobber (reg:SI 2))] @@ -5753,8 +5757,9 @@ (define_insn "" [(set (reg:SI 29) - (div:SI (reg:SI 26) (match_operand:SI 0 "div_operand" ""))) - (clobber (match_operand:SI 1 "register_operand" "=a")) + (div:SI (match_operand:SI 1 
"r26_operand" "") + (match_operand:SI 0 "div_operand" ""))) + (clobber (match_operand:SI 2 "register_operand" "=a")) (clobber (reg:SI 26)) (clobber (reg:SI 25)) (clobber (reg:SI 31))] @@ -5768,8 +5773,9 @@ (define_insn "" [(set (reg:SI 29) - (div:SI (reg:SI 26) (match_operand:SI 0 "div_operand" ""))) - (clobber (match_operand:SI 1 "register_operand" "=a")) + (div:SI (match_operand:SI 1 "r26_operand" "") + (match_operand:SI 0 "div_operand" ""))) + (clobber (match_operand:SI 2 "register_operand" "=a")) (clobber (reg:SI 26)) (clobber (reg:SI 25)) (clobber (reg:SI 2))] @@ -5800,8 +5806,9 @@ (define_insn "" [(set (reg:SI 29) - (udiv:SI (reg:SI 26) (match_operand:SI 0 "div_operand" ""))) - (clobber (match_operand:SI 1 "register_operand" "=a")) + (udiv:SI (match_operand:SI 1 "r26_operand" "") +(match_operand:SI 0 "div_operand" ""))) + (clobber (match_operand:SI 2 "register_operand" "=a")) (clobber (reg:SI 26)) (clobber (reg:SI 25)) (clobber (reg:SI 31))] @@ -5815,8 +5822,9 @@ (define_insn "" [(set (reg:SI 29) - (udiv:SI (reg:SI 26) (match_operand:SI 0 "div_operand" ""))) - (clobber (match_operand:SI 1 "register_operand" "=a")) + (udiv:SI (match_operand:SI 1 "r26_operand" "") +(match_operand:SI 0 "div_operand" ""))) + (clobber (match_operand:SI 2 "register_operand" "=a")) (clobber (reg:SI 26)) (clobber (reg:SI 25)) (clobber (reg:SI 2))] @@ -5844,8 +5852,10 @@ }") (define_insn "" - [(set (reg:SI 29) (mod:SI (reg:SI 26) (reg:SI 25))) - (clobber (match_operand:SI 0 "register_operand" "=a")) + [(set (reg:SI 29) + (mod:SI (match_operand:SI 1 "r26_operand" "") + (match_operand:SI 0 "r25_operand" ""))) + (clobber (match_operand:SI 2 "register_operand" "=a")) (clobber (reg:SI 26)) (clobber (reg:SI 25)) (clobber (reg:SI 31))] @@ -5858,8 +5868,10 @@ (symbol_ref "pa_attr_length_millicode_call (insn)")))]) (define_insn "" - [(set (reg:SI 29) (mod:SI (reg:SI 26) (reg:SI 25))) - (clobber (match_operand:SI 0 "register_operand" "=a")) + [(set (reg:SI 29) + (mod:SI (match_operand:SI 1 
"r26_operand" "") + (match_operand:SI 0 "r25_operand" ""))) + (clobber (match_operand:SI 2 "register_operand" "=a")) (clobber (reg:SI 26)) (clobber (reg:SI 25)) (clobber (reg:SI 2))] @@ -5887,8 +5899,10 @@ }") (define_insn "" - [(set (reg:SI 29) (umod:SI (reg:SI 26) (reg:SI 25))) - (clobber (match_operand:SI 0 "register_operand" "=a")) + [(set (reg:SI 29) + (umod:SI (match_operand:SI 1 "r26_operand" "") +(match_operand:SI 0 "r25_operand" ""))) + (clobber (match_operand:SI 2 "register_operand" "=a")) (clobber (reg:S
Re: [PATCH] c++/79786 - bogus invocation of DATA_ABI_ALIGNMENT macro
On Mon, Feb 03, 2025 at 11:33:38AM +0100, Richard Biener wrote: > The first argument is supposed to be a type, not a decl. > > Bootstrap & regtest running on x86_64-unknown-linux-gnu. > > OK? > > PR c++/79786 > gcc/cp/ > * rtti.cc (emit_tinfo_decl): Fix DATA_ABI_ALIGNMENT invocation. LGTM. > --- a/gcc/cp/rtti.cc > +++ b/gcc/cp/rtti.cc > @@ -1741,7 +1741,8 @@ emit_tinfo_decl (tree decl) >/* Avoid targets optionally bumping up the alignment to improve >vector instruction accesses, tinfo are never accessed this way. */ > #ifdef DATA_ABI_ALIGNMENT > - SET_DECL_ALIGN (decl, DATA_ABI_ALIGNMENT (decl, TYPE_ALIGN (TREE_TYPE > (decl)))); > + SET_DECL_ALIGN (decl, DATA_ABI_ALIGNMENT (TREE_TYPE (decl), > + TYPE_ALIGN (TREE_TYPE (decl)))); >DECL_USER_ALIGN (decl) = true; > #endif >return true; > -- > 2.43.0 Jakub
[PATCH v1 10/16] Add dispatcher_resolver_function and is_target_clone to cgraph_node.
These flags are used to make sure mangling is done correctly. gcc/ChangeLog: * cgraph.h (struct cgraph_node): Add dispatcher_resolver_function and is_target_clone. --- gcc/cgraph.h | 27 --- 1 file changed, 16 insertions(+), 11 deletions(-) diff --git a/gcc/cgraph.h b/gcc/cgraph.h index d9177364b7a..9561bce2c33 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -896,19 +896,19 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node /* Constructor. */ explicit cgraph_node () : symtab_node (SYMTAB_FUNCTION), callees (NULL), callers (NULL), - indirect_calls (NULL), - next_sibling_clone (NULL), prev_sibling_clone (NULL), clones (NULL), - clone_of (NULL), call_site_hash (NULL), former_clone_of (NULL), - simdclone (NULL), simd_clones (NULL), ipa_transforms_to_apply (vNULL), - inlined_to (NULL), rtl (NULL), - count (profile_count::uninitialized ()), + indirect_calls (NULL), next_sibling_clone (NULL), + prev_sibling_clone (NULL), clones (NULL), clone_of (NULL), + call_site_hash (NULL), former_clone_of (NULL), simdclone (NULL), + simd_clones (NULL), ipa_transforms_to_apply (vNULL), inlined_to (NULL), + rtl (NULL), count (profile_count::uninitialized ()), count_materialization_scale (REG_BR_PROB_BASE), profile_id (0), unit_id (0), tp_first_run (0), thunk (false), - used_as_abstract_origin (false), - lowered (false), process (false), frequency (NODE_FREQUENCY_NORMAL), - only_called_at_startup (false), only_called_at_exit (false), - tm_clone (false), dispatcher_function (false), calls_comdat_local (false), - icf_merged (false), nonfreeing_fn (false), merged_comdat (false), + used_as_abstract_origin (false), lowered (false), process (false), + frequency (NODE_FREQUENCY_NORMAL), only_called_at_startup (false), + only_called_at_exit (false), tm_clone (false), + dispatcher_function (false), dispatcher_resolver_function (false), + is_target_clone (false), calls_comdat_local (false), icf_merged (false), + nonfreeing_fn (false), merged_comdat (false), 
merged_extern_inline (false), parallelized_function (false), split_part (false), indirect_call_target (false), local (false), versionable (false), can_change_signature (false), @@ -1465,6 +1465,11 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node unsigned tm_clone : 1; /* True if this decl is a dispatcher for function versions. */ unsigned dispatcher_function : 1; + /* True if this decl is a resolver for function versions. */ + unsigned dispatcher_resolver_function : 1; + /* True if this is part of a multiversioned set and the default version + comes from a target_clones attribute. */ + unsigned is_target_clone : 1; /* True if this decl calls a COMDAT-local function. This is set up in compute_fn_summary and inline_call. */ unsigned calls_comdat_local : 1;
[PATCH v1 04/16] Remove unnecessary `record` argument from maybe_version_functions.
The `record` argument in maybe_version_functions was intended to allow controlling whether the relationship between versions is recorded. However, it was only used to skip recording when both input functions were already marked as versioned, and this same logic is repeated in maybe_version_functions itself, so the argument is unnecessary. gcc/cp/ChangeLog: * class.cc (add_method): Remove argument. * cp-tree.h (maybe_version_functions): Ditto. * decl.cc (decls_match): Ditto. (maybe_version_functions): Ditto. --- gcc/cp/class.cc | 2 +- gcc/cp/cp-tree.h | 2 +- gcc/cp/decl.cc | 9 +++-- 3 files changed, 5 insertions(+), 8 deletions(-) diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc index f2f81a44718..a9a80d1b4be 100644 --- a/gcc/cp/class.cc +++ b/gcc/cp/class.cc @@ -1402,7 +1402,7 @@ add_method (tree type, tree method, bool via_using) /* If these are versions of the same function, process and move on. */ if (TREE_CODE (fn) == FUNCTION_DECL - && maybe_version_functions (method, fn, true)) + && maybe_version_functions (method, fn)) continue; if (DECL_INHERITED_CTOR (method)) diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index ec976928f5f..8eba8d455be 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -7114,7 +7114,7 @@ extern void determine_local_discriminator (tree, tree = NULL_TREE); extern bool member_like_constrained_friend_p (tree); extern bool fns_correspond (tree, tree); extern int decls_match(tree, tree, bool = true); -extern bool maybe_version_functions (tree, tree, bool); +extern bool maybe_version_functions (tree, tree); extern bool validate_constexpr_redeclaration (tree, tree); extern bool merge_default_template_args (tree, tree, bool); extern tree duplicate_decls (tree, tree, diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc index cf5e055e146..3b3b4481964 100644 --- a/gcc/cp/decl.cc +++ b/gcc/cp/decl.cc @@ -1215,9 +1215,7 @@ decls_match (tree newdecl, tree olddecl, bool record_versions /* = true */) && targetm.target_option.function_versions (newdecl, olddecl)) { if (record_versions) -
maybe_version_functions (newdecl, olddecl, - (!DECL_FUNCTION_VERSIONED (newdecl) - || !DECL_FUNCTION_VERSIONED (olddecl))); + maybe_version_functions (newdecl, olddecl); return 0; } } @@ -1288,7 +1286,7 @@ maybe_mark_function_versioned (tree decl) If RECORD is set to true, record function versions. */ bool -maybe_version_functions (tree newdecl, tree olddecl, bool record) +maybe_version_functions (tree newdecl, tree olddecl) { if (!targetm.target_option.function_versions (newdecl, olddecl)) return false; @@ -1311,8 +1309,7 @@ maybe_version_functions (tree newdecl, tree olddecl, bool record) maybe_mark_function_versioned (newdecl); } - if (record) -cgraph_node::record_function_versions (olddecl, newdecl); + cgraph_node::record_function_versions (olddecl, newdecl); return true; }
[PATCH v1 03/16] Add string_slice class.
The string_slice class inherits from array_slice and is used to refer to a substring of an array whose memory is managed elsewhere, without modifying the underlying array. For example, this is useful when referring to a substring of an attribute in the syntax tree. This commit also adds some minimal helper functions for string_slice, such as strtok, strcmp, and a function to strip whitespace from the beginning and end of a slice. gcc/ChangeLog: * vec.cc (string_slice::strtok): New method. (strcmp): Add implementation for string_slice. (string_slice::strip): New method. (test_string_slice_initializers): New test. (test_string_slice_strtok): Ditto. (test_string_slice_strcmp): Ditto. (test_string_slice_equality): Ditto. (test_string_slice_invalid): Ditto. (test_string_slice_strip): Ditto. (vec_cc_tests): Add new tests. * vec.h (class string_slice): New class. (strcmp): Add implementation for string_slice. --- gcc/vec.cc | 157 + gcc/vec.h | 38 + 2 files changed, 195 insertions(+) diff --git a/gcc/vec.cc b/gcc/vec.cc index 55f5f3dd447..569dbf2a53c 100644 --- a/gcc/vec.cc +++ b/gcc/vec.cc @@ -176,6 +176,67 @@ dump_vec_loc_statistics (void) vec_mem_desc.dump (VEC_ORIGIN); } +string_slice +string_slice::strtok (string_slice *str, string_slice delims) +{ + const char *ptr = str->begin (); + + /* If the input string is empty or invalid, return an invalid slice + as there are no more tokens to return. */ + if (str->empty () || !str->is_valid ()) +{ + *str = string_slice::invalid (); + return string_slice::invalid (); +} + + for (; ptr < str->end (); ptr++) +for (const char *c = delims.begin (); c < delims.end(); c++) + if (*ptr == *c) + { + const char *start = str->begin (); + /* Update the input string to be the remaining string. */ + *str = string_slice ((ptr + 1), str->end () - ptr - 1); + return string_slice (start, (size_t) (ptr - start)); + } + + /* If there are no delimiters between the start and end, return the whole string.
*/ + string_slice res = *str; + *str = string_slice::invalid (); + return res; +} + +int +strcmp (string_slice str1, string_slice str2) +{ + for (unsigned int i = 0; i < str1.size () && i < str2.size (); i++) +{ + if (str1[i] < str2[i]) + return -1; + if (str1[i] > str2[i]) + return 1; +} + + if (str1.size () < str2.size ()) +return -1; + if (str1.size () > str2.size ()) +return 1; + return 0; +} + +string_slice +string_slice::strip () +{ + const char *start = this->begin (); + const char *end = this->end (); + + while (start < end && ISSPACE (*start)) +start++; + while (end > start && ISSPACE (*(end-1))) +end--; + + return string_slice (start, end-start); +} + #if CHECKING_P /* Report qsort comparator CMP consistency check failure with P1, P2, P3 as witness elements. */ @@ -584,6 +645,96 @@ test_auto_alias () ASSERT_EQ (val, 0); } +static void +test_string_slice_initializers () +{ + string_slice str1 = string_slice (); + ASSERT_TRUE (str1.is_valid ()); + ASSERT_EQ (str1.size (), 0); + + string_slice str2 = string_slice ("Test string"); + ASSERT_TRUE (str2.is_valid ()); + ASSERT_EQ (str2.size (), 11); + + string_slice str3 = string_slice ("Test string", 4); + ASSERT_TRUE (str3.is_valid ()); + ASSERT_EQ (str3.size (), 4); +} + +static void +test_string_slice_strtok () +{ + const char *test_string += "This is the test string, it \0 is for testing, 123 ,,"; + + string_slice test_string_slice = string_slice (test_string, 53); + string_slice test_delims = string_slice (",\0", 2); + + ASSERT_EQ (string_slice::strtok (&test_string_slice, test_delims), + string_slice ("This is the test string")); + ASSERT_EQ (string_slice::strtok (&test_string_slice, test_delims), + string_slice (" it ")); + ASSERT_EQ (string_slice::strtok (&test_string_slice, test_delims), + string_slice (" is for testing")); + ASSERT_EQ (string_slice::strtok (&test_string_slice, test_delims), + string_slice (" 123 ")); + ASSERT_EQ (string_slice::strtok (&test_string_slice, test_delims), + string_slice 
("")); + ASSERT_EQ (string_slice::strtok (&test_string_slice, test_delims), + string_slice ("")); + ASSERT_TRUE (test_string_slice.empty ()); + ASSERT_FALSE (string_slice::strtok (&test_string_slice, test_delims) +.is_valid ()); + ASSERT_FALSE (test_string_slice.is_valid ()); +} + +static void +test_string_slice_strcmp () +{ + ASSERT_EQ (strcmp (string_slice (), string_slice ()), 0); + ASSERT_EQ (strcmp (string_slice ("test"), string_slice ()), 1); + ASSERT_EQ (strcmp (string_slice (), string_slice ("test")), -1); + ASSERT_EQ (strcmp (string_slice ("test"), string_slice ("test")), 0); + ASSERT_EQ (strcmp (string_slice ("a"), string_slice ("b")), -1); + ASSERT_EQ
[PATCH v1 15/16] Support mixing of target_clones and target_version for aarch64.
This patch adds support for the combination of target_clones and target_version in the definition of a versioned function. This patch changes is_function_default_version to consider a function declaration annotated with target_clones containing default to be a default version. It also changes the common_function_version hook to consider two functions annotated with target_clones and/or target_versions to be common if their specified versions don't overlap. This takes advantage of refactoring done in previous patches changing how target_clones are expanded. gcc/ChangeLog: * attribs.cc (is_function_default_version): Add logic for target_clones defining the default version. * config/aarch64/aarch64.cc (aarch64_common_function_versions): Add logic for a target_clones and target_version, or two target_clones coexisting in a version set. gcc/c-family/ChangeLog: * c-attribs.cc: Add support for target_version and target_clones coexisting. gcc/testsuite/ChangeLog: * g++.target/aarch64/mv-and-mvc1.C: New test. * g++.target/aarch64/mv-and-mvc2.C: New test. * g++.target/aarch64/mv-and-mvc3.C: New test. * g++.target/aarch64/mv-and-mvc4.C: New test. 
--- gcc/attribs.cc| 7 +++ gcc/c-family/c-attribs.cc | 2 - gcc/config/aarch64/aarch64.cc | 46 ++- .../g++.target/aarch64/mv-and-mvc1.C | 38 +++ .../g++.target/aarch64/mv-and-mvc2.C | 29 .../g++.target/aarch64/mv-and-mvc3.C | 41 + .../g++.target/aarch64/mv-and-mvc4.C | 38 +++ gcc/testsuite/g++.target/aarch64/mv-error1.C | 13 ++ gcc/testsuite/g++.target/aarch64/mv-error13.C | 13 ++ gcc/testsuite/g++.target/aarch64/mv-error2.C | 10 gcc/testsuite/g++.target/aarch64/mv-error3.C | 13 ++ gcc/testsuite/g++.target/aarch64/mv-error7.C | 9 gcc/testsuite/g++.target/aarch64/mv-error8.C | 21 + gcc/testsuite/g++.target/aarch64/mv-error9.C | 12 + 14 files changed, 289 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/g++.target/aarch64/mv-and-mvc1.C create mode 100644 gcc/testsuite/g++.target/aarch64/mv-and-mvc2.C create mode 100644 gcc/testsuite/g++.target/aarch64/mv-and-mvc3.C create mode 100644 gcc/testsuite/g++.target/aarch64/mv-and-mvc4.C create mode 100644 gcc/testsuite/g++.target/aarch64/mv-error1.C create mode 100644 gcc/testsuite/g++.target/aarch64/mv-error13.C create mode 100644 gcc/testsuite/g++.target/aarch64/mv-error2.C create mode 100644 gcc/testsuite/g++.target/aarch64/mv-error3.C create mode 100644 gcc/testsuite/g++.target/aarch64/mv-error7.C create mode 100644 gcc/testsuite/g++.target/aarch64/mv-error8.C create mode 100644 gcc/testsuite/g++.target/aarch64/mv-error9.C diff --git a/gcc/attribs.cc b/gcc/attribs.cc index 687e6d4143a..f877dc4f6e3 100644 --- a/gcc/attribs.cc +++ b/gcc/attribs.cc @@ -1327,6 +1327,13 @@ is_function_default_version (const tree decl) } else { + if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (decl))) + { + int num_def = 0; + auto_vec versions = get_clone_versions (decl, &num_def); + return num_def > 0; + } + attr = lookup_attribute ("target_version", DECL_ATTRIBUTES (decl)); if (!attr) return true; diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc index 642d724f6c6..f2cc43ad641 100644 --- 
a/gcc/c-family/c-attribs.cc +++ b/gcc/c-family/c-attribs.cc @@ -249,13 +249,11 @@ static const struct attribute_spec::exclusions attr_target_clones_exclusions[] = ATTR_EXCL ("always_inline", true, true, true), ATTR_EXCL ("target", TARGET_HAS_FMV_TARGET_ATTRIBUTE, TARGET_HAS_FMV_TARGET_ATTRIBUTE, TARGET_HAS_FMV_TARGET_ATTRIBUTE), - ATTR_EXCL ("target_version", true, true, true), ATTR_EXCL (NULL, false, false, false), }; static const struct attribute_spec::exclusions attr_target_version_exclusions[] = { - ATTR_EXCL ("target_clones", true, true, true), ATTR_EXCL (NULL, false, false, false), }; diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 420bbba9be2..f6cb7903d88 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -20671,7 +20671,51 @@ aarch64_common_function_versions (tree fn1, tree fn2) || TREE_CODE (fn2) != FUNCTION_DECL) return false; - return (aarch64_compare_version_priority (fn1, fn2) != 0); + if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (fn2))) +{ + tree temp = fn1; + fn1 = fn2; + fn2 = temp; +} + + if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (fn1))) +{ + auto_vec fn1_versions = get_clone_versions (fn1); + // fn1 is target_clone + if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (fn2))) + { + auto_vec fn2_versions = get_clone_versions (fn2); + for (string_slice v1 : fn1_ve
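The commit message states the rule for aarch64_common_function_versions: two declarations can share one version set only if the versions they specify do not overlap. The sketch below illustrates that rule under simplifying assumptions. parse_version and versions_overlap are hypothetical helpers. A set of '+'-separated feature names stands in for the feature masks the real hook compares (the removed line used aarch64_compare_version_priority). Because the sketch compares sets, "sve+sve2" and "sve2+sve" count as the same version.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

using feature_set = std::set<std::string>;

// Hypothetical helper: split a version string such as "sve+sve2"
// into its feature names.  "default" becomes {"default"}.
feature_set
parse_version (const std::string &version)
{
  feature_set features;
  std::string::size_type start = 0;
  while (true)
    {
      std::string::size_type plus = version.find ('+', start);
      features.insert (version.substr (start, plus - start));
      if (plus == std::string::npos)
        break;
      start = plus + 1;
    }
  return features;
}

// Hypothetical helper: true if any version of FN1 matches any version
// of FN2.  In this model, two declarations are "common" (may coexist in
// one version set) exactly when this returns false.
bool
versions_overlap (const std::vector<std::string> &fn1_versions,
                  const std::vector<std::string> &fn2_versions)
{
  for (const std::string &v1 : fn1_versions)
    for (const std::string &v2 : fn2_versions)
      if (parse_version (v1) == parse_version (v2))
        return true;
  return false;
}
```

With this model, a target_clones ("default", "sve") declaration and a target_version ("sve2") declaration are disjoint, whereas two declarations that both specify "default" overlap.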
[PATCH v1 01/16] Add PowerPC FMV symbol tests.
This tests the mangling of function assembly names when annotated with target_clones attributes. gcc/testsuite/ChangeLog: * g++.target/powerpc/mvc-symbols1.C: New test. * g++.target/powerpc/mvc-symbols2.C: New test. * g++.target/powerpc/mvc-symbols3.C: New test. * g++.target/powerpc/mvc-symbols4.C: New test. --- .../g++.target/powerpc/mvc-symbols1.C | 47 +++ .../g++.target/powerpc/mvc-symbols2.C | 35 ++ .../g++.target/powerpc/mvc-symbols3.C | 41 .../g++.target/powerpc/mvc-symbols4.C | 29 4 files changed, 152 insertions(+) create mode 100644 gcc/testsuite/g++.target/powerpc/mvc-symbols1.C create mode 100644 gcc/testsuite/g++.target/powerpc/mvc-symbols2.C create mode 100644 gcc/testsuite/g++.target/powerpc/mvc-symbols3.C create mode 100644 gcc/testsuite/g++.target/powerpc/mvc-symbols4.C diff --git a/gcc/testsuite/g++.target/powerpc/mvc-symbols1.C b/gcc/testsuite/g++.target/powerpc/mvc-symbols1.C new file mode 100644 index 000..9424382bf14 --- /dev/null +++ b/gcc/testsuite/g++.target/powerpc/mvc-symbols1.C @@ -0,0 +1,47 @@ +/* { dg-do compile } */ +/* { dg-require-ifunc "" } */ +/* { dg-options "-O0" } */ + +__attribute__((target_clones("default", "cpu=power6", "cpu=power6x"))) +int foo () +{ + return 1; +} + +__attribute__((target_clones("cpu=power6x", "cpu=power6", "default"))) +int foo (int) +{ + return 2; +} + +int bar() +{ + return foo (); +} + +int bar(int x) +{ + return foo (x); +} + +/* { dg-final { scan-assembler-times "\n_Z3foov\.default:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n_Z3foov\.cpu_power6:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n_Z3foov\.cpu_power6x:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n_Z3foov\.resolver:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n\tbl _Z3foov\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n\t\.type\t_Z3foov, @gnu_indirect_function\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n\t\.set\t_Z3foov,_Z3foov\.resolver\n" 1 } } */ +/* { dg-final { scan-assembler-times 
"\n\t\.quad\t_Z3foov\.default\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3foov\.cpu_power6\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3foov\.cpu_power6x\n" 0 } } */ + +/* { dg-final { scan-assembler-times "\n_Z3fooi\.default:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n_Z3fooi\.cpu_power6:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n_Z3fooi\.cpu_power6x:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n_Z3fooi\.resolver:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n\tbl _Z3fooi\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n\t\.type\t_Z3fooi, @gnu_indirect_function\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n\t\.set\t_Z3fooi,_Z3fooi\.resolver\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3fooi\.default\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3fooi\.cpu_power6\n" 0 } } */ +/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3fooi\.cpu_power6x\n" 1 } } */ diff --git a/gcc/testsuite/g++.target/powerpc/mvc-symbols2.C b/gcc/testsuite/g++.target/powerpc/mvc-symbols2.C new file mode 100644 index 000..edf54480efd --- /dev/null +++ b/gcc/testsuite/g++.target/powerpc/mvc-symbols2.C @@ -0,0 +1,35 @@ +/* { dg-do compile } */ +/* { dg-require-ifunc "" } */ +/* { dg-options "-O0" } */ + +__attribute__((target_clones("default", "cpu=power6", "cpu=power6x"))) +int foo () +{ + return 1; +} + +__attribute__((target_clones("cpu=power6x", "cpu=power6", "default"))) +int foo (int) +{ + return 2; +} + +/* { dg-final { scan-assembler-times "\n_Z3foov\.default:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n_Z3foov\.cpu_power6:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n_Z3foov\.cpu_power6x:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n_Z3foov\.resolver:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n\t\.type\t_Z3foov, @gnu_indirect_function\n" 1 } } */ +/* { dg-final { scan-assembler-times 
"\n\t\.set\t_Z3foov,_Z3foov\.resolver\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3foov\.default\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3foov\.cpu_power6\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3foov\.cpu_power6x\n" 0 } } */ + +/* { dg-final { scan-assembler-times "\n_Z3fooi\.default:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n_Z3fooi\.cpu_power6:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n_Z3fooi\.cpu_power6x:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n_Z3fooi\.resolver:\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n\t\.type\t_Z3fooi, @gnu_indirect_function\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n\t\.set\t_Z3fooi,_Z3fooi\.resolver\n" 1 } } */ +/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3fooi\.default\n" 1 } } */ +/* { dg-final { scan-assembler-times
[PATCH v1 12/16] Refactor FMV name mangling.
This patch overhauls how FMV name mangling works. Previously, mangling logic was duplicated in several places across both target-specific and target-independent code. With this patch, all mangling is done in targetm.mangle_decl_assembler_name (including for the dispatched symbol and the dispatcher resolver). This allows previous hacks to be removed, such as unmangling the default decl's assembler name in order to then remangle all versions, the resolver and the dispatched symbol. It does introduce one change (visible in the test changes): previously, for FMV sets annotated with the target attribute, x86 set the function name to the assembler name and remangled it. This was hard to reproduce without resorting to hacks I wasn't comfortable with, so the mangling now appends ".ifunc" instead, which matches clang. This change also refactors expand_target_clones to use targetm.mangle_decl_assembler_name for mangling and get_clone_versions. gcc/ChangeLog: * attribs.cc (make_dispatcher_decl): Refactor to use targetm.mangle_decl_assembler_name for mangling. * config/aarch64/aarch64.cc (aarch64_parse_fmv_features): Change to support string_slice. (aarch64_process_target_version_attr): Ditto. (get_feature_mask_for_version): Ditto. (aarch64_mangle_decl_assembler_name): Refactor to handle FMV dispatched symbol and resolver. (get_suffixed_assembler_name): Removed. (make_resolver_func): Refactor to use aarch64_mangle_decl_assembler_name for mangling. (aarch64_generate_version_dispatcher_body): Ditto. * config/i386/i386-features.cc (is_valid_asm_symbol): Moved from multiple_target.cc. (create_new_asm_name): Moved from gcc/multiple_target.cc. (ix86_mangle_function_version_assembler_name): Refactor to handle FMV dispatched symbol and resolver. (ix86_mangle_decl_assembler_name): Ditto. (ix86_get_function_versions_dispatcher): Refactor to use ix86_mangle_decl_assembler_name for mangling. (make_resolver_func): Ditto.
* config/riscv/riscv.cc (riscv_mangle_decl_assembler_name): Refactor to handle FMV dispatched symbol and resolver. (get_suffixed_assembler_name): Removed. (make_resolver_func): Refactor to use riscv_mangle_decl_assembler_name for mangling. (riscv_generate_version_dispatcher_body): Ditto. * config/rs6000/rs6000.cc (rs6000_mangle_decl_assembler_name): Refactor to handle FMV dispatched symbol and resolver. (make_resolver_func): Refactor to use rs6000_mangle_function_version_assembler_name for mangling. (is_valid_asm_symbol): Moved from gcc/multiple_target.cc. (create_new_asm_name): Ditto. (rs6000_mangle_function_version_assembler_name): Refactor to handle FMV dispatched symbol and resolver. * multiple_target.cc (create_dispatcher_calls): Refactored to use targetm.mangle_decl_assembler_name for mangling. (is_valid_asm_symbol): Moved to target specific code. (create_new_asm_name): Ditto. (expand_target_clones): Refactored to use targetm.mangle_decl_assembler_name for mangling. gcc/cp/ChangeLog: * decl.cc (duplicate_decls): Added logic to remangle FMV decls when merging. gcc/testsuite/ChangeLog: * g++.target/i386/mv-symbols1.C: Change FMV mangling. * g++.target/i386/mv-symbols3.C: Ditto. * g++.target/i386/mv-symbols4.C: Ditto. * g++.target/i386/mv-symbols5.C: Ditto. --- gcc/attribs.cc | 25 +++- gcc/config/aarch64/aarch64.cc | 141 --- gcc/config/i386/i386-features.cc| 90 +--- gcc/config/riscv/riscv.cc | 95 ++--- gcc/config/rs6000/rs6000.cc | 104 +- gcc/cp/decl.cc | 13 ++ gcc/multiple_target.cc | 146 +--- gcc/testsuite/g++.target/i386/mv-symbols1.C | 12 +- gcc/testsuite/g++.target/i386/mv-symbols3.C | 10 +- gcc/testsuite/g++.target/i386/mv-symbols4.C | 10 +- gcc/testsuite/g++.target/i386/mv-symbols5.C | 10 +- 11 files changed, 375 insertions(+), 281 deletions(-) diff --git a/gcc/attribs.cc b/gcc/attribs.cc index cb25845715d..687e6d4143a 100644 --- a/gcc/attribs.cc +++ b/gcc/attribs.cc @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3. 
If not see #include "tree-pretty-print.h" #include "intl.h" #include "gcc-urlifier.h" +#include "cgraph.h" /* Table of the tables of attributes (common, language, format, machine) searched. */ @@ -1271,18 +1272,13 @@ common_function_versions (tree fn1, tree fn2) tree make_dispatcher_decl (const tree decl) { - tree func_decl; - char *func_name; - tree fn_type, func_type; - - func_name = xstrdup (IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl))); + tree func_decl,
Re: [PATCH] rtl-optimization/117611 - ICE in simplify_shift_const_1
On Mon, Feb 03, 2025 at 03:30:41PM +0100, Richard Biener wrote: > The following checks we have a scalar int shift mode before > enforcing it. As AVR shows the mode can be a signed _Accum mode > as well. > > Bootstrap and regtest pending on x86_64-unknown-linux-gnu. > > OK if that succeeds? > > Thanks, > Richard. > > PR rtl-optimization/117611 > * combine.cc (simplify_shift_const_1): Bail if not > scalar int mode. LGTM. > * gcc.target/avr/pr117611.c: New testcase. I don't see anything AVR specific here. Move to gcc.dg/fixed-point/pr117611.c ? > new file mode 100644 > index 000..c76093f12d1 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/avr/pr117611.c > @@ -0,0 +1,7 @@ > +/* { dg-do compile } */ > +/* { dg-options "-Os" } */ > + > +_Accum acc1 (_Accum x) > +{ > +return x << 16; > +} > -- > 2.43.0 Jakub
[PATCH] RISC-V: Fix wrong LMUL when only implict zve32f.
According to Section 3.4.2, Vector Register Grouping, in the RISC-V Vector Specification, the rule for LMUL is LMUL >= SEW/ELEN --- gcc/config/riscv/riscv-v.cc | 8 +- gcc/config/riscv/riscv-vector-switch.def | 84 ++--- .../gcc.target/riscv/rvv/autovec/pr111391-2.c | 2 +- .../gcc.target/riscv/rvv/base/abi-14.c| 84 ++--- .../gcc.target/riscv/rvv/base/abi-16.c| 98 +++ .../gcc.target/riscv/rvv/base/abi-18.c| 112 +- .../gcc.target/riscv/rvv/base/vsetvl_zve32f.c | 73 7 files changed, 268 insertions(+), 193 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vsetvl_zve32f.c diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 9847439ca77..24f3127e71d 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -1730,13 +1730,15 @@ get_vlmul (machine_mode mode) int inner_size = GET_MODE_BITSIZE (GET_MODE_INNER (mode)); if (size < TARGET_MIN_VLEN) { + /* Follow rule LMUL >= SEW / ELEN. */ + int elen = TARGET_VECTOR_ELEN_64 ? 1 : 2; int factor = TARGET_MIN_VLEN / size; if (inner_size == 8) - factor = MIN (factor, 8); + factor = MIN (factor, 8 / elen); else if (inner_size == 16) - factor = MIN (factor, 4); + factor = MIN (factor, 4 / elen); else if (inner_size == 32) - factor = MIN (factor, 2); + factor = MIN (factor, 2 / elen); else if (inner_size == 64) factor = MIN (factor, 1); else diff --git a/gcc/config/riscv/riscv-vector-switch.def b/gcc/config/riscv/riscv-vector-switch.def index 23744d076f9..1b0d61940a6 100644 --- a/gcc/config/riscv/riscv-vector-switch.def +++ b/gcc/config/riscv/riscv-vector-switch.def @@ -64,13 +64,13 @@ Encode the ratio of SEW/LMUL into the mask types. |BI |RVVM1BI|RVVMF2BI|RVVMF4BI|RVVMF8BI|RVVMF16BI|RVVMF32BI|RVVMF64BI| */ /* Return 'REQUIREMENT' for machine_mode 'MODE'. - For example: 'MODE' = RVVMF64BImode needs TARGET_MIN_VLEN > 32. */ + For example: 'MODE' = RVVMF64BImode needs TARGET_VECTOR_ELEN_64. 
*/ #ifndef ENTRY #define ENTRY(MODE, REQUIREMENT, VLMUL, RATIO) #endif /* Disable modes if TARGET_MIN_VLEN == 32. */ -ENTRY (RVVMF64BI, TARGET_MIN_VLEN > 32, TARGET_XTHEADVECTOR ? LMUL_1 :LMUL_F8, 64) +ENTRY (RVVMF64BI, TARGET_VECTOR_ELEN_64, TARGET_XTHEADVECTOR ? LMUL_1 :LMUL_F8, 64) ENTRY (RVVMF32BI, true, TARGET_XTHEADVECTOR ? LMUL_1 :LMUL_F4, 32) ENTRY (RVVMF16BI, true, TARGET_XTHEADVECTOR ? LMUL_1 : LMUL_F2 , 16) ENTRY (RVVMF8BI, true, LMUL_1, 8) @@ -85,7 +85,7 @@ ENTRY (RVVM2QI, true, LMUL_2, 4) ENTRY (RVVM1QI, true, LMUL_1, 8) ENTRY (RVVMF2QI, !TARGET_XTHEADVECTOR, LMUL_F2, 16) ENTRY (RVVMF4QI, !TARGET_XTHEADVECTOR, LMUL_F4, 32) -ENTRY (RVVMF8QI, TARGET_MIN_VLEN > 32 && !TARGET_XTHEADVECTOR, LMUL_F8, 64) +ENTRY (RVVMF8QI, TARGET_VECTOR_ELEN_64 && !TARGET_XTHEADVECTOR, LMUL_F8, 64) /* Disable modes if TARGET_MIN_VLEN == 32. */ ENTRY (RVVM8HI, true, LMUL_8, 2) @@ -93,7 +93,7 @@ ENTRY (RVVM4HI, true, LMUL_4, 4) ENTRY (RVVM2HI, true, LMUL_2, 8) ENTRY (RVVM1HI, true, LMUL_1, 16) ENTRY (RVVMF2HI, !TARGET_XTHEADVECTOR, LMUL_F2, 32) -ENTRY (RVVMF4HI, TARGET_MIN_VLEN > 32 && !TARGET_XTHEADVECTOR, LMUL_F4, 64) +ENTRY (RVVMF4HI, TARGET_VECTOR_ELEN_64 && !TARGET_XTHEADVECTOR, LMUL_F4, 64) /* Disable modes if TARGET_MIN_VLEN == 32 or !TARGET_VECTOR_ELEN_BF_16. */ ENTRY (RVVM8BF, TARGET_VECTOR_ELEN_BF_16, LMUL_8, 2) @@ -109,21 +109,21 @@ ENTRY (RVVM4HF, TARGET_VECTOR_ELEN_FP_16, LMUL_4, 4) ENTRY (RVVM2HF, TARGET_VECTOR_ELEN_FP_16, LMUL_2, 8) ENTRY (RVVM1HF, TARGET_VECTOR_ELEN_FP_16, LMUL_1, 16) ENTRY (RVVMF2HF, TARGET_VECTOR_ELEN_FP_16 && !TARGET_XTHEADVECTOR, LMUL_F2, 32) -ENTRY (RVVMF4HF, TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32 && !TARGET_XTHEADVECTOR, LMUL_F4, 64) +ENTRY (RVVMF4HF, TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_64 && !TARGET_XTHEADVECTOR, LMUL_F4, 64) /* Disable modes if TARGET_MIN_VLEN == 32. 
*/ ENTRY (RVVM8SI, true, LMUL_8, 4) ENTRY (RVVM4SI, true, LMUL_4, 8) ENTRY (RVVM2SI, true, LMUL_2, 16) ENTRY (RVVM1SI, true, LMUL_1, 32) -ENTRY (RVVMF2SI, TARGET_MIN_VLEN > 32 && !TARGET_XTHEADVECTOR, LMUL_F2, 64) +ENTRY (RVVMF2SI, TARGET_VECTOR_ELEN_64 && !TARGET_XTHEADVECTOR, LMUL_F2, 64) /* Disable modes if TARGET_MIN_VLEN == 32 or !TARGET_VECTOR_ELEN_FP_32. */ ENTRY (RVVM8SF, TARGET_VECTOR_ELEN_FP_32, LMUL_8, 4) ENTRY (RVVM4SF, TARGET_VECTOR_ELEN_FP_32, LMUL_4, 8) ENTRY (RVVM2SF, TARGET_VECTOR_ELEN_FP_32, LMUL_2, 16) ENTRY (RVVM1SF, TARGET_VECTOR_ELEN_FP_32, LMUL_1, 32) -ENTRY (RVVMF2SF, TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32 && !TARGET_XTHEADVECTOR, LMUL_F2, 64) +ENTRY (RVVMF2SF, TARGET_VECTOR_ELEN_FP_32 && TARGET_VECTOR_ELEN_64 && !TARGET_XTHEADVECTOR, LMUL_F2, 64) /* Disable modes if !TARGET_VECTOR_ELEN_64. */ ENTRY (RVVM8DI, TARGET_VECTOR_ELE
Re: [PATCH] rtl-optimization/117611 - ICE in simplify_shift_const_1
On Mon, 3 Feb 2025, Jakub Jelinek wrote: > On Mon, Feb 03, 2025 at 03:30:41PM +0100, Richard Biener wrote: > > The following checks we have a scalar int shift mode before > > enforcing it. As AVR shows the mode can be a signed _Accum mode > > as well. > > > > Bootstrap and regtest pending on x86_64-unknown-linux-gnu. > > > > OK if that succeeds? > > > > Thanks, > > Richard. > > > > PR rtl-optimization/117611 > > * combine.cc (simplify_shift_const_1): Bail if not > > scalar int mode. > > LGTM. > > > * gcc.target/avr/pr117611.c: New testcase. > > I don't see anything AVR specific here. > Move to gcc.dg/fixed-point/pr117611.c ? Done and pushed. Richard. > > new file mode 100644 > > index 000..c76093f12d1 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/avr/pr117611.c > > @@ -0,0 +1,7 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-Os" } */ > > + > > +_Accum acc1 (_Accum x) > > +{ > > +return x << 16; > > +} > > -- > > 2.43.0 > > Jakub > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
[PATCH] optabs: Fix widening optabs for vec-mode -> scalar-mode [PR116926]
r15-4317-ga6f4404689f12 tried to add support for widening optabs for vec-mode -> scalar-mode, but it misunderstood how FOR_EACH_MODE works: the limit is not inclusive. This means that setting the limit to from_mode causes the loop not to be executed at all. This patch fixes that by setting the limit to the next mode after from_mode. Note that the original version of the patch that added the widening optabs for vec-mode -> scalar-mode (https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665021.html) didn't have this bug; only the second version, with the suggested change (https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665068.html), did. The suggested change missed this issue with FOR_EACH_MODE. Bootstrapped and tested on x86_64-linux-gnu. PR middle-end/116926 gcc/ChangeLog: * optabs-query.cc (find_widening_optab_handler_and_mode): Fix limit for `vec-mode -> scalar-mode` case. Signed-off-by: Andrew Pinski --- gcc/optabs-query.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/optabs-query.cc b/gcc/optabs-query.cc index 65eeb5d8e51..f5ca98da818 100644 --- a/gcc/optabs-query.cc +++ b/gcc/optabs-query.cc @@ -492,7 +492,7 @@ find_widening_optab_handler_and_mode (optab op, machine_mode to_mode, { gcc_checking_assert (VECTOR_MODE_P (from_mode) && GET_MODE_INNER (from_mode) < to_mode); - limit_mode = from_mode; + limit_mode = GET_MODE_NEXT_MODE (from_mode).require (); } else gcc_checking_assert (GET_MODE_CLASS (from_mode) == GET_MODE_CLASS (to_mode) -- 2.43.0
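The off-by-one described above is easy to see in a toy model. The sketch is not GCC code: toy_mode, visit_modes and visit_only are hypothetical names, and the walk only imitates the exclusive upper bound of FOR_EACH_MODE as the patch describes it. Passing the same mode as start and limit visits nothing. Using the following mode as the limit visits exactly the starting mode, which is the role GET_MODE_NEXT_MODE (from_mode).require () plays in the fix.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for machine modes, ordered from narrowest to widest.
enum toy_mode { V4QI, V4HI, V4SI, V4DI, NUM_TOY_MODES };

// Record the modes a FOR_EACH_MODE-style walk visits: start at FROM
// and stop before reaching LIMIT (LIMIT itself is never visited).
std::vector<toy_mode>
visit_modes (toy_mode from, toy_mode limit)
{
  std::vector<toy_mode> visited;
  for (int m = from; m < limit; ++m)
    visited.push_back (static_cast<toy_mode> (m));
  return visited;
}

// The fix in miniature: to visit exactly FROM, pass the mode after it
// as the limit.
std::vector<toy_mode>
visit_only (toy_mode from)
{
  return visit_modes (from, static_cast<toy_mode> (from + 1));
}
```

In this model, visit_modes (V4HI, V4HI) corresponds to the buggy limit_mode = from_mode and returns an empty list, while visit_only (V4HI) returns just V4HI.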
[PATCH v2] RISC-V: Fix wrong LMUL when only implict zve32f.
According to Section 3.4.2, Vector Register Grouping, in the RISC-V Vector Specification, the rule for LMUL is LMUL >= SEW/ELEN. gcc/ChangeLog: * config/riscv/riscv-v.cc (get_vlmul): Restrict LMUL so that LMUL >= SEW/ELEN. * config/riscv/riscv-vector-builtins-types.def: Use RVV_REQUIRE_ELEN_64 instead of RVV_REQUIRE_MIN_VLEN_64 to check the LMUL. * config/riscv/riscv-vector-switch.def: Use TARGET_VECTOR_ELEN_64 instead of TARGET_MIN_VLEN > 32. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr111391-2.c: Update test. * gcc.target/riscv/rvv/base/abi-14.c: Update test. * gcc.target/riscv/rvv/base/abi-16.c: Update test. * gcc.target/riscv/rvv/base/abi-18.c: Update test. * gcc.target/riscv/rvv/base/vsetvl_zve32f.c: New test. --- gcc/config/riscv/riscv-v.cc | 8 +- .../riscv/riscv-vector-builtins-types.def | 322 +- gcc/config/riscv/riscv-vector-switch.def | 84 ++--- .../gcc.target/riscv/rvv/autovec/pr111391-2.c | 2 +- .../gcc.target/riscv/rvv/base/abi-14.c| 84 ++--- .../gcc.target/riscv/rvv/base/abi-16.c| 98 +++--- .../gcc.target/riscv/rvv/base/abi-18.c| 112 +++--- .../gcc.target/riscv/rvv/base/vsetvl_zve32f.c | 73 8 files changed, 429 insertions(+), 354 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vsetvl_zve32f.c diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 9847439ca77..24f3127e71d 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -1730,13 +1730,15 @@ get_vlmul (machine_mode mode) int inner_size = GET_MODE_BITSIZE (GET_MODE_INNER (mode)); if (size < TARGET_MIN_VLEN) { + /* Follow rule LMUL >= SEW / ELEN. */ + int elen = TARGET_VECTOR_ELEN_64 ?
1 : 2; int factor = TARGET_MIN_VLEN / size; if (inner_size == 8) - factor = MIN (factor, 8); + factor = MIN (factor, 8 / elen); else if (inner_size == 16) - factor = MIN (factor, 4); + factor = MIN (factor, 4 / elen); else if (inner_size == 32) - factor = MIN (factor, 2); + factor = MIN (factor, 2 / elen); else if (inner_size == 64) factor = MIN (factor, 1); else diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def b/gcc/config/riscv/riscv-vector-builtins-types.def index 6b98b93dfb6..857b63758a0 100644 --- a/gcc/config/riscv/riscv-vector-builtins-types.def +++ b/gcc/config/riscv/riscv-vector-builtins-types.def @@ -369,20 +369,20 @@ along with GCC; see the file COPYING3. If not see #define DEF_RVV_XFQF_OPS(TYPE, REQUIRE) #endif -DEF_RVV_I_OPS (vint8mf8_t, RVV_REQUIRE_MIN_VLEN_64) +DEF_RVV_I_OPS (vint8mf8_t, RVV_REQUIRE_ELEN_64) DEF_RVV_I_OPS (vint8mf4_t, 0) DEF_RVV_I_OPS (vint8mf2_t, 0) DEF_RVV_I_OPS (vint8m1_t, 0) DEF_RVV_I_OPS (vint8m2_t, 0) DEF_RVV_I_OPS (vint8m4_t, 0) DEF_RVV_I_OPS (vint8m8_t, 0) -DEF_RVV_I_OPS (vint16mf4_t, RVV_REQUIRE_MIN_VLEN_64) +DEF_RVV_I_OPS (vint16mf4_t, RVV_REQUIRE_ELEN_64) DEF_RVV_I_OPS (vint16mf2_t, 0) DEF_RVV_I_OPS (vint16m1_t, 0) DEF_RVV_I_OPS (vint16m2_t, 0) DEF_RVV_I_OPS (vint16m4_t, 0) DEF_RVV_I_OPS (vint16m8_t, 0) -DEF_RVV_I_OPS (vint32mf2_t, RVV_REQUIRE_MIN_VLEN_64) +DEF_RVV_I_OPS (vint32mf2_t, RVV_REQUIRE_ELEN_64) DEF_RVV_I_OPS (vint32m1_t, 0) DEF_RVV_I_OPS (vint32m2_t, 0) DEF_RVV_I_OPS (vint32m4_t, 0) @@ -392,20 +392,20 @@ DEF_RVV_I_OPS (vint64m2_t, RVV_REQUIRE_ELEN_64) DEF_RVV_I_OPS (vint64m4_t, RVV_REQUIRE_ELEN_64) DEF_RVV_I_OPS (vint64m8_t, RVV_REQUIRE_ELEN_64) -DEF_RVV_U_OPS (vuint8mf8_t, RVV_REQUIRE_MIN_VLEN_64) +DEF_RVV_U_OPS (vuint8mf8_t, RVV_REQUIRE_ELEN_64) DEF_RVV_U_OPS (vuint8mf4_t, 0) DEF_RVV_U_OPS (vuint8mf2_t, 0) DEF_RVV_U_OPS (vuint8m1_t, 0) DEF_RVV_U_OPS (vuint8m2_t, 0) DEF_RVV_U_OPS (vuint8m4_t, 0) DEF_RVV_U_OPS (vuint8m8_t, 0) -DEF_RVV_U_OPS (vuint16mf4_t, RVV_REQUIRE_MIN_VLEN_64) +DEF_RVV_U_OPS 
(vuint16mf4_t, RVV_REQUIRE_ELEN_64) DEF_RVV_U_OPS (vuint16mf2_t, 0) DEF_RVV_U_OPS (vuint16m1_t, 0) DEF_RVV_U_OPS (vuint16m2_t, 0) DEF_RVV_U_OPS (vuint16m4_t, 0) DEF_RVV_U_OPS (vuint16m8_t, 0) -DEF_RVV_U_OPS (vuint32mf2_t, RVV_REQUIRE_MIN_VLEN_64) +DEF_RVV_U_OPS (vuint32mf2_t, RVV_REQUIRE_ELEN_64) DEF_RVV_U_OPS (vuint32m1_t, 0) DEF_RVV_U_OPS (vuint32m2_t, 0) DEF_RVV_U_OPS (vuint32m4_t, 0) @@ -415,21 +415,21 @@ DEF_RVV_U_OPS (vuint64m2_t, RVV_REQUIRE_ELEN_64) DEF_RVV_U_OPS (vuint64m4_t, RVV_REQUIRE_ELEN_64) DEF_RVV_U_OPS (vuint64m8_t, RVV_REQUIRE_ELEN_64) -DEF_RVV_F_OPS (vbfloat16mf4_t, RVV_REQUIRE_ELEN_BF_16 | RVV_REQUIRE_MIN_VLEN_64) +DEF_RVV_F_OPS (vbfloat16mf4_t, RVV_REQUIRE_ELEN_BF_16 | RVV_REQUIRE_ELEN_64) DEF_RVV_F_OPS (vbfloat16mf2_t, RVV_REQUIRE_ELEN_BF_16) DEF_RVV_F_OPS (vbfloat16m1_t, RVV_REQUIRE_ELEN_BF_16) DEF_RVV_F_OPS (vbfloat16m2_t, RVV_REQUIRE_ELEN_BF_16) DEF_RVV_F_OPS (vbfloat16m4_t, RVV_REQUIRE_ELEN_BF_16) DEF_RVV_F_OPS (vbfloat16m8_t, RVV_REQUIRE_ELEN_BF_16) -DEF_RVV_F_OPS (vfloat16mf4_t, RVV_REQUIRE_ELEN_FP_16 | RVV_REQUIRE_MIN_VLEN_64)
Re: [PATCH] RISC-V: Fix wrong LMUL when only implict zve32f.
cc Robin an Ju-Zhe On Tue, Feb 4, 2025 at 3:16 PM Monk Chiang wrote: > > According to Section 3.4.2, Vector Register Grouping, in the RISC-V > Vector Specification, the rule for LMUL is LMUL >= SEW/ELEN > --- > gcc/config/riscv/riscv-v.cc | 8 +- > gcc/config/riscv/riscv-vector-switch.def | 84 ++--- > .../gcc.target/riscv/rvv/autovec/pr111391-2.c | 2 +- > .../gcc.target/riscv/rvv/base/abi-14.c| 84 ++--- > .../gcc.target/riscv/rvv/base/abi-16.c| 98 +++ > .../gcc.target/riscv/rvv/base/abi-18.c| 112 +- > .../gcc.target/riscv/rvv/base/vsetvl_zve32f.c | 73 > 7 files changed, 268 insertions(+), 193 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vsetvl_zve32f.c > > diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc > index 9847439ca77..24f3127e71d 100644 > --- a/gcc/config/riscv/riscv-v.cc > +++ b/gcc/config/riscv/riscv-v.cc > @@ -1730,13 +1730,15 @@ get_vlmul (machine_mode mode) >int inner_size = GET_MODE_BITSIZE (GET_MODE_INNER (mode)); >if (size < TARGET_MIN_VLEN) > { > + /* Follow rule LMUL >= SEW / ELEN. */ > + int elen = TARGET_VECTOR_ELEN_64 ? 1 : 2; > int factor = TARGET_MIN_VLEN / size; > if (inner_size == 8) > - factor = MIN (factor, 8); > + factor = MIN (factor, 8 / elen); > else if (inner_size == 16) > - factor = MIN (factor, 4); > + factor = MIN (factor, 4 / elen); > else if (inner_size == 32) > - factor = MIN (factor, 2); > + factor = MIN (factor, 2 / elen); > else if (inner_size == 64) > factor = MIN (factor, 1); > else > diff --git a/gcc/config/riscv/riscv-vector-switch.def > b/gcc/config/riscv/riscv-vector-switch.def > index 23744d076f9..1b0d61940a6 100644 > --- a/gcc/config/riscv/riscv-vector-switch.def > +++ b/gcc/config/riscv/riscv-vector-switch.def > @@ -64,13 +64,13 @@ Encode the ratio of SEW/LMUL into the mask types. >|BI |RVVM1BI|RVVMF2BI|RVVMF4BI|RVVMF8BI|RVVMF16BI|RVVMF32BI|RVVMF64BI| > */ > > /* Return 'REQUIREMENT' for machine_mode 'MODE'. 
> - For example: 'MODE' = RVVMF64BImode needs TARGET_MIN_VLEN > 32. */ > + For example: 'MODE' = RVVMF64BImode needs TARGET_VECTOR_ELEN_64. */ > #ifndef ENTRY > #define ENTRY(MODE, REQUIREMENT, VLMUL, RATIO) > #endif > > /* Disable modes if TARGET_MIN_VLEN == 32. */ > -ENTRY (RVVMF64BI, TARGET_MIN_VLEN > 32, TARGET_XTHEADVECTOR ? LMUL_1 > :LMUL_F8, 64) > +ENTRY (RVVMF64BI, TARGET_VECTOR_ELEN_64, TARGET_XTHEADVECTOR ? LMUL_1 > :LMUL_F8, 64) > ENTRY (RVVMF32BI, true, TARGET_XTHEADVECTOR ? LMUL_1 :LMUL_F4, 32) > ENTRY (RVVMF16BI, true, TARGET_XTHEADVECTOR ? LMUL_1 : LMUL_F2 , 16) > ENTRY (RVVMF8BI, true, LMUL_1, 8) > @@ -85,7 +85,7 @@ ENTRY (RVVM2QI, true, LMUL_2, 4) > ENTRY (RVVM1QI, true, LMUL_1, 8) > ENTRY (RVVMF2QI, !TARGET_XTHEADVECTOR, LMUL_F2, 16) > ENTRY (RVVMF4QI, !TARGET_XTHEADVECTOR, LMUL_F4, 32) > -ENTRY (RVVMF8QI, TARGET_MIN_VLEN > 32 && !TARGET_XTHEADVECTOR, LMUL_F8, 64) > +ENTRY (RVVMF8QI, TARGET_VECTOR_ELEN_64 && !TARGET_XTHEADVECTOR, LMUL_F8, 64) > > /* Disable modes if TARGET_MIN_VLEN == 32. */ > ENTRY (RVVM8HI, true, LMUL_8, 2) > @@ -93,7 +93,7 @@ ENTRY (RVVM4HI, true, LMUL_4, 4) > ENTRY (RVVM2HI, true, LMUL_2, 8) > ENTRY (RVVM1HI, true, LMUL_1, 16) > ENTRY (RVVMF2HI, !TARGET_XTHEADVECTOR, LMUL_F2, 32) > -ENTRY (RVVMF4HI, TARGET_MIN_VLEN > 32 && !TARGET_XTHEADVECTOR, LMUL_F4, 64) > +ENTRY (RVVMF4HI, TARGET_VECTOR_ELEN_64 && !TARGET_XTHEADVECTOR, LMUL_F4, 64) > > /* Disable modes if TARGET_MIN_VLEN == 32 or !TARGET_VECTOR_ELEN_BF_16. 
*/ > ENTRY (RVVM8BF, TARGET_VECTOR_ELEN_BF_16, LMUL_8, 2) > @@ -109,21 +109,21 @@ ENTRY (RVVM4HF, TARGET_VECTOR_ELEN_FP_16, LMUL_4, 4) > ENTRY (RVVM2HF, TARGET_VECTOR_ELEN_FP_16, LMUL_2, 8) > ENTRY (RVVM1HF, TARGET_VECTOR_ELEN_FP_16, LMUL_1, 16) > ENTRY (RVVMF2HF, TARGET_VECTOR_ELEN_FP_16 && !TARGET_XTHEADVECTOR, LMUL_F2, > 32) > -ENTRY (RVVMF4HF, TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32 && > !TARGET_XTHEADVECTOR, LMUL_F4, 64) > +ENTRY (RVVMF4HF, TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_64 && > !TARGET_XTHEADVECTOR, LMUL_F4, 64) > > /* Disable modes if TARGET_MIN_VLEN == 32. */ > ENTRY (RVVM8SI, true, LMUL_8, 4) > ENTRY (RVVM4SI, true, LMUL_4, 8) > ENTRY (RVVM2SI, true, LMUL_2, 16) > ENTRY (RVVM1SI, true, LMUL_1, 32) > -ENTRY (RVVMF2SI, TARGET_MIN_VLEN > 32 && !TARGET_XTHEADVECTOR, LMUL_F2, 64) > +ENTRY (RVVMF2SI, TARGET_VECTOR_ELEN_64 && !TARGET_XTHEADVECTOR, LMUL_F2, 64) > > /* Disable modes if TARGET_MIN_VLEN == 32 or !TARGET_VECTOR_ELEN_FP_32. */ > ENTRY (RVVM8SF, TARGET_VECTOR_ELEN_FP_32, LMUL_8, 4) > ENTRY (RVVM4SF, TARGET_VECTOR_ELEN_FP_32, LMUL_4, 8) > ENTRY (RVVM2SF, TARGET_VECTOR_ELEN_FP_32, LMUL_2, 16) > ENTRY (RVVM1SF, TARGET_VECTOR_ELEN_FP_32, LMUL_1, 32) > -ENTRY (RVVMF2SF, TARGET_VECTOR_
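For readers checking the arithmetic: LMUL >= SEW/ELEN is equivalent to 1/LMUL <= ELEN/SEW, so the per-SEW caps in the patched get_vlmul (8, 4, 2, 1 for SEW 8/16/32/64 when ELEN is 64; 4, 2, 1 for SEW 8/16/32 when ELEN is 32) are simply ELEN/SEW. The same rule also gives SEW/LMUL <= ELEN, which is why every ratio-64 entry visible in the riscv-vector-switch.def hunk (RVVMF64BI, RVVMF8QI, RVVMF4HI, RVVMF4HF, RVVMF2SI) is now gated on TARGET_VECTOR_ELEN_64. A minimal standalone C sketch of that arithmetic follows; the function names are hypothetical and it does not model GCC's machine modes.

```c
#include <assert.h>

/* Hypothetical standalone model (not GCC code) of the RISC-V V rule
   LMUL >= SEW / ELEN.  A fractional LMUL is represented by its
   denominator: 8 means LMUL = 1/8.  All widths are in bits.  */

/* Largest denominator allowed for element width SEW, i.e. ELEN / SEW.
   Integer division yields 0 when SEW > ELEN (element type unsupported).  */
static int
max_lmul_denominator (int sew, int elen)
{
  assert (sew > 0 && elen > 0);
  /* LMUL >= SEW / ELEN  <=>  1 / LMUL <= ELEN / SEW.  */
  return elen / sew;
}

/* Same shape as the clamp in the patched get_vlmul: start from
   min_vlen / mode_bits and cap it by the spec limit.  */
static int
clamp_lmul_denominator (int min_vlen, int mode_bits, int sew, int elen)
{
  int factor = min_vlen / mode_bits;
  int limit = max_lmul_denominator (sew, elen);
  return factor < limit ? factor : limit;
}

/* SEW/LMUL ratio of a mode whose fractional LMUL is 1/denom.  */
static int
sew_lmul_ratio (int sew, int denom)
{
  return sew * denom;
}
```

With ELEN = 32, SEW = 8 can go no lower than LMUL = 1/4, so the largest reachable SEW/LMUL ratio is 32 and the ratio-64 modes have no legal encoding.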
Re: [patch, fortran] Add modular exponentiation for unsigned
Regression-tested on x86_64. It seems I didn't look closely enough; I will check and resubmit. Best regards Thomas
Re: Patch held up in gcc-patches due to size
On Sun, 2 Feb 2025, 18:10 Thomas Koenig via Gcc, wrote: > Hi, > > I sent https://gcc.gnu.org/pipermail/fortran/2025-February/061670.html > to gcc-patches also, as normal, but got back an e-mail that it > was too large and that a moderator would look at it. > > Maybe the limits can be increased a bit; sometimes patches can > be quite large, especially if they contain large test cases > or a large number of generated files. > The limits for gcc-patches are already larger than for other lists, but 560kB is pretty big. You can gzip the patch if it's too large. > (Does anybody actually look at the messages, as promised in the e-mail? > I don't know about that list. There are moderators and mod queues for other gcc lists. > >
Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale
On Mon, Feb 3, 2025 at 7:23 AM H.J. Lu wrote: > > commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b > Author: Surya Kumari Jangala > Date: Tue Jun 25 08:37:49 2024 -0500 > > ira: Scale save/restore costs of callee save registers with block > frequency > > scales the cost of saving/restoring a callee-save hard register in epilogue > and prologue with the entry block frequency, which, if not optimizing for > size, is 1, for all targets. As a result, callee-saved registers > may not be used to preserve local variable values across calls on some > targets, like x86. Add a target hook for the callee-saved register cost > scale in epilogue and prologue used by IRA. The default version of this > target hook returns 1 if optimizing for size, otherwise returns the entry > block frequency. Add an x86 version of this target hook to restore the > old behavior prior to the above commit. > > PR rtl-optimization/111673 > PR rtl-optimization/115932 > PR rtl-optimization/116028 > PR rtl-optimization/117081 > PR rtl-optimization/117082 > PR rtl-optimization/118497 > * ira-color.cc (assign_hard_reg): Call the target hook for the > callee-saved register cost scale in epilogue and prologue. > * target.def (ira_callee_saved_register_cost_scale): New target > hook. > * targhooks.cc (default_ira_callee_saved_register_cost_scale): > New. > * targhooks.h (default_ira_callee_saved_register_cost_scale): > Likewise. > * config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale): > New. > (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Likewise. > * doc/tm.texi: Regenerated. > * doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): > New. > > Signed-off-by: H.J.
Lu > --- > gcc/config/i386/i386.cc | 11 +++ > gcc/doc/tm.texi | 8 > gcc/doc/tm.texi.in | 2 ++ > gcc/ira-color.cc| 3 +-- > gcc/target.def | 12 > gcc/targhooks.cc| 8 > gcc/targhooks.h | 1 + > 7 files changed, 43 insertions(+), 2 deletions(-) > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > index f89201684a8..3128973ba79 100644 > --- a/gcc/config/i386/i386.cc > +++ b/gcc/config/i386/i386.cc > @@ -20600,6 +20600,14 @@ ix86_class_likely_spilled_p (reg_class_t rclass) >return false; > } > > +/* Implement TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE. */ > + > +static int > +ix86_ira_callee_saved_register_cost_scale (int) > +{ > + return 1; > +} > + > /* Return true if a set of DST by the expression SRC should be allowed. > This prevents complex sets of likely_spilled hard regs before split1. */ > > @@ -27078,6 +27086,9 @@ ix86_libgcc_floating_mode_supported_p > #define TARGET_PREFERRED_OUTPUT_RELOAD_CLASS > ix86_preferred_output_reload_class > #undef TARGET_CLASS_LIKELY_SPILLED_P > #define TARGET_CLASS_LIKELY_SPILLED_P ix86_class_likely_spilled_p > +#undef TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE > +#define TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE \ > + ix86_ira_callee_saved_register_cost_scale > > #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST > #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \ > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi > index 0de24eda6f0..9f42913a4ef 100644 > --- a/gcc/doc/tm.texi > +++ b/gcc/doc/tm.texi > @@ -3047,6 +3047,14 @@ A target hook which can change allocno class for given > pseudo from >The default version of this target hook always returns given class. > @end deftypefn > > +@deftypefn {Target Hook} int TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE > (int @var{hard_regno}) > +A target hook which returns the callee-saved register @var{hard_regno} > +cost scale in epilogue and prologue used by IRA. 
> + > +The default version of this target hook returns 1 if optimizing for > +size, otherwise returns the entry block frequency. > +@end deftypefn > + > @deftypefn {Target Hook} bool TARGET_LRA_P (void) > A target hook which returns true if we use LRA instead of reload pass. > > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in > index 631d04131e3..6dbe22581ca 100644 > --- a/gcc/doc/tm.texi.in > +++ b/gcc/doc/tm.texi.in > @@ -2388,6 +2388,8 @@ in the reload pass. > > @hook TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS > > +@hook TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE > + > @hook TARGET_LRA_P > > @hook TARGET_REGISTER_PRIORITY > diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc > index 0699b349a1a..233060e1587 100644 > --- a/gcc/ira-color.cc > +++ b/gcc/ira-color.cc > @@ -2180,8 +2180,7 @@ assign_hard_reg (ira_allocno_t a, bool retry_p) > + ira_memory_move_cost[mode][rclass][1]) > * saved_nregs / hard_regno_nregs (hard_regno, > mode) - 1) > - * (optimize_size ? 1 : > -
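To make the effect of the hook concrete: the extra cost IRA charges for using a callee-saved register is multiplied by whatever the hook returns, so with an entry block frequency of 1000 (REG_FREQ_MAX, the figure discussed in this thread) the described default charges 1000x the base cost while the proposed x86 hook charges 1x. The C sketch below is a hypothetical model, not GCC code: the real hook takes only `int hard_regno` and reads the function state globally, and the base cost here is a plain number rather than the exact assign_hard_reg expression.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the global state the real hook reads
   (optimize_size and the entry block's register frequency).  */
struct fn_state
{
  bool optimize_size;
  int entry_reg_freq;   /* models REG_FREQ_FROM_BB (entry block) */
};

typedef int (*cost_scale_hook) (const struct fn_state *, int hard_regno);

/* Default as described in the patch: 1 when optimizing for size,
   otherwise the entry block frequency.  */
static int
default_cost_scale (const struct fn_state *fn, int hard_regno)
{
  (void) hard_regno;
  return fn->optimize_size ? 1 : fn->entry_reg_freq;
}

/* x86 override as posted: always 1, i.e. no frequency scaling.  */
static int
x86_cost_scale (const struct fn_state *fn, int hard_regno)
{
  (void) fn;
  (void) hard_regno;
  return 1;
}

/* Simplified extra cost for using a callee-saved register: a base
   save/restore cost multiplied by the target's scale.  */
static int
callee_saved_cost (cost_scale_hook hook, const struct fn_state *fn,
                   int hard_regno, int save_restore_cost)
{
  return save_restore_cost * hook (fn, hard_regno);
}
```

Dispatching through a function pointer mirrors how targetm hooks let each backend override the default without touching ira-color.cc.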
Re: Patch held up in gcc-patches due to size
On Mon, Feb 3, 2025 at 9:55 AM Jonathan Wakely wrote: > > > > On Sun, 2 Feb 2025, 18:10 Thomas Koenig via Gcc, wrote: >> >> Hi, >> >> I sent https://gcc.gnu.org/pipermail/fortran/2025-February/061670.html >> to gcc-patches also, as normal, but got back an e-mail that it >> was too large and that a moderator would look at it. >> >> Maybe the limits can be increased a bit; sometimes patches can >> be quite large, especially if they contain large test cases >> or a large number of generated files. > > > The limits for gcc-patches are already larger than for other lists, but 560kB is > pretty big. You can gzip the patch if it's too large. > >> >> (Does anybody actually look at the messages, as promised in the e-mail? > > > > I don't know about that list. There are moderators and mod queues for other > gcc lists. I don't think we ever unlock too-large mails. But I'm not sure the message you get can be altered individually based on the reason for the moderation. Richard. > >> >>
Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale
On Mon, Feb 3, 2025 at 5:27 PM Richard Biener wrote: > > On Mon, Feb 3, 2025 at 7:23 AM H.J. Lu wrote: > > > > commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b > > Author: Surya Kumari Jangala > > Date: Tue Jun 25 08:37:49 2024 -0500 > > > > ira: Scale save/restore costs of callee save registers with block > > frequency > > > > scales the cost of saving/restoring a callee-save hard register in epilogue > > and prologue with the entry block frequency, which, if not optimizing for > > size, is 1, for all targets. As a result, callee-saved registers > > may not be used to preserve local variable values across calls on some > > targets, like x86. Add a target hook for the callee-saved register cost > > scale in epilogue and prologue used by IRA. The default version of this > > target hook returns 1 if optimizing for size, otherwise returns the entry > > block frequency. Add an x86 version of this target hook to restore the > > old behavior prior to the above commit. > > > > PR rtl-optimization/111673 > > PR rtl-optimization/115932 > > PR rtl-optimization/116028 > > PR rtl-optimization/117081 > > PR rtl-optimization/117082 > > PR rtl-optimization/118497 > > * ira-color.cc (assign_hard_reg): Call the target hook for the > > callee-saved register cost scale in epilogue and prologue. > > * target.def (ira_callee_saved_register_cost_scale): New target > > hook. > > * targhooks.cc (default_ira_callee_saved_register_cost_scale): > > New. > > * targhooks.h (default_ira_callee_saved_register_cost_scale): > > Likewise. > > * config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale): > > New. > > (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Likewise. > > * doc/tm.texi: Regenerated. > > * doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): > > New. > > > > Signed-off-by: H.J.
Lu > > --- > > gcc/config/i386/i386.cc | 11 +++ > > gcc/doc/tm.texi | 8 > > gcc/doc/tm.texi.in | 2 ++ > > gcc/ira-color.cc| 3 +-- > > gcc/target.def | 12 > > gcc/targhooks.cc| 8 > > gcc/targhooks.h | 1 + > > 7 files changed, 43 insertions(+), 2 deletions(-) > > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > > index f89201684a8..3128973ba79 100644 > > --- a/gcc/config/i386/i386.cc > > +++ b/gcc/config/i386/i386.cc > > @@ -20600,6 +20600,14 @@ ix86_class_likely_spilled_p (reg_class_t rclass) > >return false; > > } > > > > +/* Implement TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE. */ > > + > > +static int > > +ix86_ira_callee_saved_register_cost_scale (int) > > +{ > > + return 1; > > +} > > + > > /* Return true if a set of DST by the expression SRC should be allowed. > > This prevents complex sets of likely_spilled hard regs before split1. > > */ > > > > @@ -27078,6 +27086,9 @@ ix86_libgcc_floating_mode_supported_p > > #define TARGET_PREFERRED_OUTPUT_RELOAD_CLASS > > ix86_preferred_output_reload_class > > #undef TARGET_CLASS_LIKELY_SPILLED_P > > #define TARGET_CLASS_LIKELY_SPILLED_P ix86_class_likely_spilled_p > > +#undef TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE > > +#define TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE \ > > + ix86_ira_callee_saved_register_cost_scale > > > > #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST > > #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \ > > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi > > index 0de24eda6f0..9f42913a4ef 100644 > > --- a/gcc/doc/tm.texi > > +++ b/gcc/doc/tm.texi > > @@ -3047,6 +3047,14 @@ A target hook which can change allocno class for > > given pseudo from > >The default version of this target hook always returns given class. > > @end deftypefn > > > > +@deftypefn {Target Hook} int TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE > > (int @var{hard_regno}) > > +A target hook which returns the callee-saved register @var{hard_regno} > > +cost scale in epilogue and prologue used by IRA. 
> > + > > +The default version of this target hook returns 1 if optimizing for > > +size, otherwise returns the entry block frequency. > > +@end deftypefn > > + > > @deftypefn {Target Hook} bool TARGET_LRA_P (void) > > A target hook which returns true if we use LRA instead of reload pass. > > > > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in > > index 631d04131e3..6dbe22581ca 100644 > > --- a/gcc/doc/tm.texi.in > > +++ b/gcc/doc/tm.texi.in > > @@ -2388,6 +2388,8 @@ in the reload pass. > > > > @hook TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS > > > > +@hook TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE > > + > > @hook TARGET_LRA_P > > > > @hook TARGET_REGISTER_PRIORITY > > diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc > > index 0699b349a1a..233060e1587 100644 > > --- a/gcc/ira-color.cc > > +++ b/gcc/ira-color.cc > > @@ -2180,8 +2180,7 @@ assign_hard_reg (ira_allocno
Re: [PATCH 0/61] Improve Mips target
On Fri, Jan 31, 2025 at 6:18 PM Aleksandar Rakic wrote: > > This patch series improves the support for the mips64r6 target in GCC, > includes the enhancements to the general bug fixes and contains other > MIPS ISA and processor enablement. > > These patches are cherry-picked from the mips_rel/11_2_0/master > and mips_rel/9_3_0/master branches from the MIPS repository: > https://github.com/MIPS/gcc . > Further details on the individual changes are included in the > respective patches. Please split up this series at least into patches that solely affect mips/ and send patches that touch middle-end parts separately. A 61-patch series is unlikely to be looked at this way. Richard.
Re: [PATCH] ira: Cap callee-saved register cost scale to 300
On Mon, Feb 3, 2025 at 5:21 PM Richard Biener wrote: > > On Sun, Feb 2, 2025 at 9:29 AM H.J. Lu wrote: > > > > On Sun, Feb 2, 2025 at 4:20 PM Richard Biener > > wrote: > > > > > > > > > > > > > On 02.02.2025 at 08:59, H.J. Lu wrote: > > > > > > > > On Sun, Feb 2, 2025 at 3:33 PM Richard Biener > > > > wrote: > > > >> > > > >> > > > >> > > > On 02.02.2025 at 08:00, H.J. Lu wrote: > > > >>> > > > >>> Cap the callee-saved register cost scale to 300 instead of increasing > > > >>> the cost by 1000x, which leads to callee-saved registers not being used > > > >>> to preserve local variable values across calls. > > > >> > > > >>> PR rtl-optimization/111673 > > > >>> PR rtl-optimization/115932 > > > >>> PR rtl-optimization/116028 > > > >>> PR rtl-optimization/117081 > > > >>> PR rtl-optimization/118497 > > > >>> * ira-color.cc (assign_hard_reg): Cap callee-saved register cost > > > >>> scale to 300. > > > >>> > > > >>> Signed-off-by: H.J. Lu > > > >>> --- > > > >>> gcc/ira-color.cc | 16 ++-- > > > >>> 1 file changed, 14 insertions(+), 2 deletions(-) > > > >>> > > > >>> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc > > > >>> index 0699b349a1a..707ff188250 100644 > > > >>> --- a/gcc/ira-color.cc > > > >>> +++ b/gcc/ira-color.cc > > > >>> @@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool > > > >>> retry_p) > > > >>> /* We need to save/restore the hard register in > > > >>>epilogue/prologue. Therefore we increase the cost. */ > > > >>> { > > > >>> +int scale; > > > >>> +if (optimize_size) > > > >>> + scale = 1; > > > >>> +else > > > >>> + { > > > >>> +scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)); > > > >>> +/* Don't increase callee-saved register cost by 1000x, > > > >>> + which leads to that callee-saved registers aren't > > > >>> + used to preserve local variable values across calls, > > > >>> + by capping the scale to 300.
*/ > > > >>> +if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX) > > > >>> + scale = 300; > > > >> > > > >> That leads to 300 for 1000 but 999 for 999 which is odd. I’d have > > > >> expected to scale this down to [0, 300] or is MAX a magic value? > > > > > > > > There are > > > > > > > > * The weights for each insn varies from 0 to REG_FREQ_BASE. > > > > This constant does not need to be high, as in infrequently executed > > > > regions we want to count instructions equivalently to optimize for > > > > size instead of speed. */ > > > > #define REG_FREQ_MAX 1000 > > > > > > > > /* Compute register frequency from the BB frequency. When optimizing > > > > for size, > > > > or profile driven feedback is available and the function is never > > > > executed, > > > > frequency is always equivalent. Otherwise rescale the basic block > > > > frequency. */ > > > > #define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun) > > > > \ > > > > || !cfun->cfg->count_max.initialized_p > > > > ()) \ > > > > ? REG_FREQ_MAX > > > > \ > > > > : ((bb)->count.to_frequency (cfun) > > > > \ > > > >* REG_FREQ_MAX / BB_FREQ_MAX) > > > > \ > > > > ? ((bb)->count.to_frequency (cfun) > > > > \ > > > > * REG_FREQ_MAX / BB_FREQ_MAX) > > > > \ > > > > : 1) > > > > > > > > 1000 is the default. If it isn't 1000, it isn't the default. I only > > > > want > > > > to get a more reasonable default scale, instead of 1000. Lower > > > > scale will fail the PR rtl-optimization/111673 test on powerpc64. > > > > > > I see. Why not adjust the above macro then? That would be a bit more > > > obvious. Like use MAX/2 or so? > > > > commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b > > Author: Surya Kumari Jangala > > Date: Tue Jun 25 08:37:49 2024 -0500 > > > > ira: Scale save/restore costs of callee save registers with block > > frequency > > > > uses REG_FREQ_FROM_BB as the cost scale. I don't know if it is a misuse. 
> > I don't want to change REG_FREQ_FROM_BB since it is used in other places, > > not as a cost scale. Maybe the above commit should be reverted and we add > > a target hook for callee-saved register cost scale. Each target can choose > > a proper cost scale, instead of increasing the cost by 1000x for everyone. > > I believe testing cfun->cfg->count_max.initialized_p () is a bit odd at least, > as it doesn't seem to be used. The comment talks about profile feedback, > but for example with -fprofile-correction or -fpartial-profile this > test looks odd. > In fact optimize_function_for_size_p should already handle this correctly. > > Also REG_FREQ_FROM_BB simply document
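Richard's objection to the cap can be checked numerically. The posted patch special-cases only the exact value REG_FREQ_MAX, so as the entry frequency rises from 999 to 1000 the scale drops from 999 to 300; a proportional rescale stays monotonic. The C sketch below is a hypothetical model, not the patch code. It uses [1, 300] rather than the [0, 300] mentioned in the thread, an assumption made because REG_FREQ_FROM_BB, as quoted above, never yields 0.

```c
#include <assert.h>

#define REG_FREQ_MAX 1000   /* value quoted in this thread */
#define SCALE_CAP 300       /* cap proposed in the patch */

/* Model of the posted patch: only the exact value REG_FREQ_MAX is
   replaced by the cap; every other frequency passes through.  */
static int
capped_scale (int freq)
{
  return freq == REG_FREQ_MAX ? SCALE_CAP : freq;
}

/* Hypothetical alternative: map [1, REG_FREQ_MAX] linearly onto
   [1, SCALE_CAP].  The floor of 1 is an assumption mirroring
   REG_FREQ_FROM_BB, which never returns 0.  */
static int
rescaled_scale (int freq)
{
  int scaled = freq * SCALE_CAP / REG_FREQ_MAX;
  return scaled > 0 ? scaled : 1;
}
```

Because the rescaled version is a floored linear map, a slightly hotter entry block can never make callee-saved registers cheaper, which is the property the special-case cap loses.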