[gcc r12-10847] c++: Don't reject pointer to virtual method during constant evaluation [PR117615]
https://gcc.gnu.org/g:ae8d9d2b40aa7fd6a455beda38ff1b3c21728c31 commit r12-10847-gae8d9d2b40aa7fd6a455beda38ff1b3c21728c31 Author: Simon Martin Date: Tue Dec 3 14:30:43 2024 +0100 c++: Don't reject pointer to virtual method during constant evaluation [PR117615] We currently reject the following valid code: === cut here === struct Base { virtual void doit (int v) const {} }; struct Derived : Base { void doit (int v) const {} }; using fn_t = void (Base::*)(int) const; struct Helper { fn_t mFn; constexpr Helper (auto && fn) : mFn(static_cast<fn_t>(fn)) {} }; void foo () { constexpr Helper h (&Derived::doit); } === cut here === The problem is that since r6-4014-gdcdbc004d531b4, &Derived::doit is represented by an expression of pointer-to-method type wrapping an INTEGER_CST (here 1), and cxx_eval_constant_expression rejects any such expression with a non-null INTEGER_CST. This patch uses the same strategy as r12-4491-gf45610a45236e9 (fix for PR c++/102786), and simply lets such expressions go through. PR c++/117615 gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_constant_expression): Don't reject INTEGER_CSTs with type POINTER_TYPE to METHOD_TYPE. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/constexpr-virtual22.C: New test. (cherry picked from commit 72a2380a306a1c3883cb7e4f99253522bc265af0) Diff: --- gcc/cp/constexpr.cc | 6 ++ gcc/testsuite/g++.dg/cpp2a/constexpr-virtual22.C | 22 ++ 2 files changed, 28 insertions(+) diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc index 20abbee3600e..6c8d8ab17f29 100644 --- a/gcc/cp/constexpr.cc +++ b/gcc/cp/constexpr.cc @@ -7353,6 +7353,12 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t, return t; } } + else if (TYPE_PTR_P (type) + && TREE_CODE (TREE_TYPE (type)) == METHOD_TYPE) + /* INTEGER_CST with pointer-to-method type is only used +for a virtual method in a pointer to member function.
Don't reject those. 
*/ + ; else { /* This detects for example: diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual22.C b/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual22.C new file mode 100644 index ..89330bf86200 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual22.C @@ -0,0 +1,22 @@ +// PR c++/117615 +// { dg-do "compile" { target c++20 } } + +struct Base { +virtual void doit (int v) const {} +}; + +struct Derived : Base { +void doit (int v) const {} +}; + +using fn_t = void (Base::*)(int) const; + +struct Helper { +fn_t mFn; +constexpr Helper (auto && fn) : mFn(static_cast<fn_t>(fn)) {} +}; + +void foo () { +constexpr Helper h (&Derived::doit); +constexpr Helper h2 (&Base::doit); +}
[gcc r15-5932] Allow limited extended asm at toplevel [PR41045]
https://gcc.gnu.org/g:ca4d6285974817080d3488b293c4970a8231372b commit r15-5932-gca4d6285974817080d3488b293c4970a8231372b Author: Jakub Jelinek Date: Thu Dec 5 09:25:06 2024 +0100 Allow limited extended asm at toplevel [PR41045] In the Cauldron IPA/LTO BoF we've discussed toplevel asms and it was discussed it would be nice to tell the compiler something about what the toplevel asm does. Sure, I'm aware the kernel people said they aren't willing to use something like that, but perhaps other projects do. And for kernel perhaps we should add some new option which allows some dumb parsing of the toplevel asms and gather something from that parsing. The following patch is just a small step towards that, namely, allow some subset of extended inline asm outside of functions. The patch is unfinished, LTO streaming (out/in) of the ASM_EXPRs isn't implemented (it emits a sorry diagnostics), nor any cgraph/varpool changes to find out references etc. The patch allows something like: int a[2], b; enum { E1, E2, E3, E4, E5 }; struct S { int a; char b; long long c; }; asm (".section blah; .quad %P0, %P1, %P2, %P3, %P4; .previous" : : "m" (a), "m" (b), "i" (42), "i" (E4), "i" (sizeof (struct S))); Even for non-LTO, that could be useful e.g. for getting enumerators from C/C++ as integers into the toplevel asm, or sizeof/offsetof etc. 
The restrictions I've implemented are: 1) asm qualifiers are still not allowed, so asm goto or asm inline can't be specified at toplevel; asm volatile has the volatile ignored for C++ with a warning and remains an error in C as before 2) I see a good use mainly for input operands; output operands perhaps to make it clear that the inline asm may write some memory; I don't see a good use for clobbers, so the patch doesn't allow those (and of course no labels, because asm goto can't be specified) 3) the patch allows only constraints which don't allow registers, so typically "m" or "i" or other memory or immediate constraints; for memory, it requires that the operand is addressable and its address could be used in a static var initializer (so that no code actually needs to be emitted for it), and for others that they are constants usable in static var initializers 4) the patch disallows + (there is no reload of the operands, so I see no benefit in tying operands together), % (who cares whether something is commutative in this case), & (again, no code is emitted around the asm), and the 0-9 matching constraints Right now there is no way to tell the compiler that the inline asm defines some symbol; that is implemented in a later patch, as : constraint. Similarly, the c modifier doesn't work in all cases and the cc modifier is implemented separately. 2024-12-05 Jakub Jelinek PR c/41045 gcc/ * output.h (insn_noperands): Declare. * final.cc (insn_noperands): No longer static. * varasm.cc (assemble_asm): Handle ASM_EXPR. * lto-streamer-out.cc (lto_output_toplevel_asms): Add sorry_at for non-STRING_CST toplevel asm for now. * doc/extend.texi (Basic @code{asm}, Extended @code{asm}): Document that extended asm is now allowed outside of functions with certain restrictions. gcc/c/ * c-parser.cc (c_parser_asm_string_literal): Add forward declaration. (c_parser_asm_definition): Parse also extended asm without clobbers/labels. 
* c-typeck.cc (build_asm_expr): Allow extended asm outside of functions and check extra restrictions. gcc/cp/ * cp-tree.h (finish_asm_stmt): Add TOPLEV_P argument. * parser.cc (cp_parser_asm_definition): Parse also extended asm without clobbers/labels outside of functions. * semantics.cc (finish_asm_stmt): Add TOPLEV_P argument, if set, check extra restrictions for extended asm outside of functions. * pt.cc (tsubst_stmt): Adjust finish_asm_stmt caller. gcc/testsuite/ * c-c++-common/toplevel-asm-1.c: New test. * c-c++-common/toplevel-asm-2.c: New test. * c-c++-common/toplevel-asm-3.c: New test. Diff: --- gcc/c/c-parser.cc | 67 ++- gcc/c/c-typeck.cc | 56 ++ gcc/cp/cp-tree.h| 2 +- gcc/cp/parser.cc| 15 ++- gcc/cp/pt.cc| 2 +- gcc/cp/semantics.cc | 92 +++- gcc/doc/extend.texi | 32 +++--- gcc/final.cc| 2 +- gcc/lto-streamer-out.cc | 7 ++ gcc/output.h
[gcc r15-5944] Match: Refactor the unsigned SAT_TRUNC match patterns [NFC]
https://gcc.gnu.org/g:9163d16e4f56ced25839ff246c56e166ae62e962 commit r15-5944-g9163d16e4f56ced25839ff246c56e166ae62e962 Author: Pan Li Date: Thu Dec 5 09:19:39 2024 +0800 Match: Refactor the unsigned SAT_TRUNC match patterns [NFC] This patch would like to refactor the all unsigned SAT_TRUNC patterns, aka: * Extract type check outside. * Re-arrange the related match pattern forms together. The below test suites are passed for this patch. * The rv64gcv fully regression test. * The x86 bootstrap test. * The x86 fully regression test. gcc/ChangeLog: * match.pd: Refactor sorts of unsigned SAT_TRUNC match patterns. Signed-off-by: Pan Li Diff: --- gcc/match.pd | 112 +++ 1 file changed, 52 insertions(+), 60 deletions(-) diff --git a/gcc/match.pd b/gcc/match.pd index fd1d8bcc7763..650c3f4cc1df 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3262,6 +3262,58 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) } (if (wi::eq_p (sum, wi::uhwi (0, precision +(if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)) + (match (unsigned_integer_sat_trunc @0) + /* SAT_U_TRUNC = (NT)x | (NT)(-(X > (WT)(NT)(-1))) */ + (bit_ior:c (negate (convert (gt @0 INTEGER_CST@1))) (convert @0)) + (if (TYPE_UNSIGNED (TREE_TYPE (@0))) + (with +{ + unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0)); + unsigned otype_precision = TYPE_PRECISION (type); + wide_int trunc_max = wi::mask (otype_precision, false, itype_precision); + wide_int int_cst = wi::to_wide (@1, itype_precision); +} +(if (otype_precision < itype_precision && wi::eq_p (trunc_max, int_cst)) + (match (unsigned_integer_sat_trunc @0) + /* SAT_U_TRUNC = (NT)(MIN_EXPR (X, IMM)) + If Op_0 def is MIN_EXPR and not single_use. Aka below pattern: + + _18 = MIN_EXPR ; // op_0 def + iftmp.0_11 = (unsigned int) _18; // op_0 + stream.avail_out = iftmp.0_11; + left_37 = left_8 - _18; // op_0 use + + Transfer to .SAT_TRUNC will have MIN_EXPR still live. 
Then the backend + (for example x86/riscv) will generate 2-3 more insns for .SAT_TRUNC + besides the MIN_EXPR. Thus, keeping the normal truncation as-is should be + the better choice. */ + (convert (min@2 @0 INTEGER_CST@1)) + (if (TYPE_UNSIGNED (TREE_TYPE (@0)) && single_use (@2)) + (with +{ + unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0)); + unsigned otype_precision = TYPE_PRECISION (type); + wide_int trunc_max = wi::mask (otype_precision, false, itype_precision); + wide_int int_cst = wi::to_wide (@1, itype_precision); +} +(if (otype_precision < itype_precision && wi::eq_p (trunc_max, int_cst)) + (match (unsigned_integer_sat_trunc @0) + /* SAT_U_TRUNC = (NT)X | ((NT)(X <= (WT)-1) + (NT)-1) */ + (bit_ior:c (plus:c (convert (le @0 INTEGER_CST@1)) INTEGER_CST@2) +(convert @0)) + (if (TYPE_UNSIGNED (TREE_TYPE (@0))) + (with +{ + unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0)); + unsigned otype_precision = TYPE_PRECISION (type); + wide_int trunc_max = wi::mask (otype_precision, false, itype_precision); + wide_int max = wi::mask (otype_precision, false, otype_precision); + wide_int int_cst_1 = wi::to_wide (@1); + wide_int int_cst_2 = wi::to_wide (@2); +} +(if (wi::eq_p (trunc_max, int_cst_1) && wi::eq_p (max, int_cst_2))) + /* Signed saturation add, case 1: T sum = (T)((UT)X + (UT)Y) SAT_S_ADD = (X ^ sum) & !(X ^ Y) < 0 ? (-(T)(X < 0) ^ MAX) : sum; @@ -3416,66 +3468,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) @2) (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type -/* Unsigned saturation truncate, case 1, sizeof (WT) > sizeof (NT). - SAT_U_TRUNC = (NT)x | (NT)(-(X > (WT)(NT)(-1))). 
*/ -(match (unsigned_integer_sat_trunc @0) - (bit_ior:c (negate (convert (gt @0 INTEGER_CST@1))) - (convert @0)) - (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) - && TYPE_UNSIGNED (TREE_TYPE (@0))) - (with - { - unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0)); - unsigned otype_precision = TYPE_PRECISION (type); - wide_int trunc_max = wi::mask (otype_precision, false, itype_precision); - wide_int int_cst = wi::to_wide (@1, itype_precision); - } - (if (otype_precision < itype_precision && wi::eq_p (trunc_max, int_cst)) - -/* Unsigned saturation truncate, case 2, sizeof (WT) > sizeof (NT). - SAT_U_TRUNC = (NT)(MIN_EXPR (X, 255)). */ -/* If Op_0 def is MIN_EXPR and not single_use. Aka below pattern: - - _18 = MIN_EXPR ; // op_0 def - iftmp.0_11 = (unsigned int) _18; // op_0 - stream.avail_out = iftmp.0_11; - left_37 = left_8 - _18; // op_0 use - - Transfer to .SAT_TRUNC will have MIN_EXPR still live. Then the backend - (for example x86/riscv) will have 2-3 more insns generation for .SAT_TRUNC - b
[gcc r15-5949] arm: remove support for iWMMX/iWMMX2 intrinsics
https://gcc.gnu.org/g:a92b2be97f369ae4c6e1cdcbb7a45525994afaad commit r15-5949-ga92b2be97f369ae4c6e1cdcbb7a45525994afaad Author: Richard Earnshaw Date: Thu Dec 5 15:14:09 2024 + arm: remove support for iWMMX/iWMMX2 intrinsics The mmintrin.h header was adjusted for GCC-14 to generate a (suppressible) warning if it was used, saying that support would be removed in GCC-15. Make that come true by removing the contents of this header and emitting an error. At this point in time I've not removed the internal support for the intrinsics, just the wrappers that enable access to them. That can be done at leisure from now on. gcc/ChangeLog: * config/arm/mmintrin.h: Raise an error if this header is used. Remove other content. Diff: --- gcc/config/arm/mmintrin.h | 1812 + 1 file changed, 1 insertion(+), 1811 deletions(-) diff --git a/gcc/config/arm/mmintrin.h b/gcc/config/arm/mmintrin.h index e9cc3ddd7ab7..65b6f943cf3d 100644 --- a/gcc/config/arm/mmintrin.h +++ b/gcc/config/arm/mmintrin.h @@ -24,1816 +24,6 @@ #ifndef _MMINTRIN_H_INCLUDED #define _MMINTRIN_H_INCLUDED -#ifndef __IWMMXT__ -#error mmintrin.h included without enabling WMMX/WMMX2 instructions (e.g. -march=iwmmxt or -march=iwmmxt2) -#endif - -#ifndef __ENABLE_DEPRECATED_IWMMXT -#warning support for WMMX/WMMX2 is deprecated and will be removed in GCC 15. Define __ENABLE_DEPRECATED_IWMMXT to suppress this warning -#endif - -#if defined __cplusplus -extern "C" { -/* Intrinsics use C name-mangling. */ -#endif /* __cplusplus */ - -/* The data type intended for user use. */ -typedef unsigned long long __m64, __int64; - -/* Internal data types for implementing the intrinsics. */ -typedef int __v2si __attribute__ ((vector_size (8))); -typedef short __v4hi __attribute__ ((vector_size (8))); -typedef signed char __v8qi __attribute__ ((vector_size (8))); - -/* Provided for source compatibility with MMX. 
*/ -extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm_empty (void) -{ -} - -/* "Convert" __m64 and __int64 into each other. */ -static __inline __m64 -_mm_cvtsi64_m64 (__int64 __i) -{ - return __i; -} - -static __inline __int64 -_mm_cvtm64_si64 (__m64 __i) -{ - return __i; -} - -static __inline int -_mm_cvtsi64_si32 (__int64 __i) -{ - return __i; -} - -static __inline __int64 -_mm_cvtsi32_si64 (int __i) -{ - return (__i & 0x); -} - -/* Pack the four 16-bit values from M1 into the lower four 8-bit values of - the result, and the four 16-bit values from M2 into the upper four 8-bit - values of the result, all with signed saturation. */ -static __inline __m64 -_mm_packs_pi16 (__m64 __m1, __m64 __m2) -{ - return (__m64) __builtin_arm_wpackhss ((__v4hi)__m1, (__v4hi)__m2); -} - -/* Pack the two 32-bit values from M1 in to the lower two 16-bit values of - the result, and the two 32-bit values from M2 into the upper two 16-bit - values of the result, all with signed saturation. */ -static __inline __m64 -_mm_packs_pi32 (__m64 __m1, __m64 __m2) -{ - return (__m64) __builtin_arm_wpackwss ((__v2si)__m1, (__v2si)__m2); -} - -/* Copy the 64-bit value from M1 into the lower 32-bits of the result, and - the 64-bit value from M2 into the upper 32-bits of the result, all with - signed saturation for values that do not fit exactly into 32-bits. */ -static __inline __m64 -_mm_packs_pi64 (__m64 __m1, __m64 __m2) -{ - return (__m64) __builtin_arm_wpackdss ((long long)__m1, (long long)__m2); -} - -/* Pack the four 16-bit values from M1 into the lower four 8-bit values of - the result, and the four 16-bit values from M2 into the upper four 8-bit - values of the result, all with unsigned saturation. 
*/ -static __inline __m64 -_mm_packs_pu16 (__m64 __m1, __m64 __m2) -{ - return (__m64) __builtin_arm_wpackhus ((__v4hi)__m1, (__v4hi)__m2); -} - -/* Pack the two 32-bit values from M1 into the lower two 16-bit values of - the result, and the two 32-bit values from M2 into the upper two 16-bit - values of the result, all with unsigned saturation. */ -static __inline __m64 -_mm_packs_pu32 (__m64 __m1, __m64 __m2) -{ - return (__m64) __builtin_arm_wpackwus ((__v2si)__m1, (__v2si)__m2); -} - -/* Copy the 64-bit value from M1 into the lower 32-bits of the result, and - the 64-bit value from M2 into the upper 32-bits of the result, all with - unsigned saturation for values that do not fit exactly into 32-bits. */ -static __inline __m64 -_mm_packs_pu64 (__m64 __m1, __m64 __m2) -{ - return (__m64) __builtin_arm_wpackdus ((long long)__m1, (long long)__m2); -} - -/* Interleave the four 8-bit values from the high half of M1 with the four - 8-bit values from the high half of M2. */ -static __inline __m64 -_mm_unpackhi_pi8 (__m64 __m1, __m64 __m2) -{ - return (__m64) __builtin_arm_wunpckihb ((__v8qi)__m1, (__v8qi)__m2); -} - -/* Interleave the two 16-bit values
[gcc r15-5948] aarch64: Mark vluti* intrinsics as QUIET
https://gcc.gnu.org/g:cd9499a78dd57c311a9cfd1e0ba132833eaea490 commit r15-5948-gcd9499a78dd57c311a9cfd1e0ba132833eaea490 Author: Richard Sandiford Date: Thu Dec 5 15:33:11 2024 + aarch64: Mark vluti* intrinsics as QUIET This patch fixes the vluti* definitions to say that they don't raise FP exceptions even for floating-point modes. gcc/ * config/aarch64/aarch64-simd-pragma-builtins.def (ENTRY_TERNARY_VLUT8): Use FLAG_QUIET rather than FLAG_DEFAULT. (ENTRY_TERNARY_VLUT16): Likewise. Diff: --- .../aarch64/aarch64-simd-pragma-builtins.def | 24 +++--- 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/gcc/config/aarch64/aarch64-simd-pragma-builtins.def b/gcc/config/aarch64/aarch64-simd-pragma-builtins.def index dfcfa8a0ac02..bc9a63b968af 100644 --- a/gcc/config/aarch64/aarch64-simd-pragma-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-pragma-builtins.def @@ -37,32 +37,32 @@ #undef ENTRY_TERNARY_VLUT8 #define ENTRY_TERNARY_VLUT8(T) \ ENTRY_BINARY_LANE (vluti2_lane_##T##8, T##8q, T##8, u8, \ -UNSPEC_LUTI2, DEFAULT) \ +UNSPEC_LUTI2, QUIET) \ ENTRY_BINARY_LANE (vluti2_laneq_##T##8, T##8q, T##8, u8q,\ -UNSPEC_LUTI2, DEFAULT) \ +UNSPEC_LUTI2, QUIET) \ ENTRY_BINARY_LANE (vluti2q_lane_##T##8, T##8q, T##8q, u8,\ -UNSPEC_LUTI2, DEFAULT) \ +UNSPEC_LUTI2, QUIET) \ ENTRY_BINARY_LANE (vluti2q_laneq_##T##8, T##8q, T##8q, u8q, \ -UNSPEC_LUTI2, DEFAULT) \ +UNSPEC_LUTI2, QUIET) \ ENTRY_BINARY_LANE (vluti4q_lane_##T##8, T##8q, T##8q, u8,\ -UNSPEC_LUTI4, DEFAULT) \ +UNSPEC_LUTI4, QUIET) \ ENTRY_BINARY_LANE (vluti4q_laneq_##T##8, T##8q, T##8q, u8q, \ -UNSPEC_LUTI4, DEFAULT) +UNSPEC_LUTI4, QUIET) #undef ENTRY_TERNARY_VLUT16 #define ENTRY_TERNARY_VLUT16(T) \ ENTRY_BINARY_LANE (vluti2_lane_##T##16, T##16q, T##16, u8, \ -UNSPEC_LUTI2, DEFAULT) \ +UNSPEC_LUTI2, QUIET) \ ENTRY_BINARY_LANE (vluti2_laneq_##T##16, T##16q, T##16, u8q, \ -UNSPEC_LUTI2, DEFAULT) \ +UNSPEC_LUTI2, QUIET) \ ENTRY_BINARY_LANE (vluti2q_lane_##T##16, T##16q, T##16q, u8, \ -UNSPEC_LUTI2, DEFAULT) \ 
+UNSPEC_LUTI2, QUIET) \ ENTRY_BINARY_LANE (vluti2q_laneq_##T##16, T##16q, T##16q, u8q, \ -UNSPEC_LUTI2, DEFAULT) \ +UNSPEC_LUTI2, QUIET) \ ENTRY_BINARY_LANE (vluti4q_lane_##T##16_x2, T##16q, T##16qx2, u8,\ -UNSPEC_LUTI4, DEFAULT) \ +UNSPEC_LUTI4, QUIET) \ ENTRY_BINARY_LANE (vluti4q_laneq_##T##16_x2, T##16q, T##16qx2, u8q, \ -UNSPEC_LUTI4, DEFAULT) +UNSPEC_LUTI4, QUIET) // faminmax #define REQUIRED_EXTENSIONS nonstreaming_only (AARCH64_FL_FAMINMAX)
[gcc r15-5950] i386: Fix addcarry/subborrow issues [PR117860]
https://gcc.gnu.org/g:b3cb0c3302a7c16e661a08c15c897c8f7bbb5d23 commit r15-5950-gb3cb0c3302a7c16e661a08c15c897c8f7bbb5d23 Author: Uros Bizjak Date: Thu Dec 5 17:02:46 2024 +0100 i386: Fix addcarry/subborrow issues [PR117860] Fix several things to enable combine to handle addcarry/subborrow patterns: - Fix wrong canonical form of addcarry insn and friends. For a commutative operand (PLUS RTX), the binary operand (LTU) takes precedence over the unary operand (ZERO_EXTEND). - Swap operands of GTU comparison to canonicalize addcarry/subborrow comparison. Again, the canonical form of the compare is PLUS RTX before ZERO_EXTEND RTX. GTU comparison is not a carry flag comparison, so we have to swap operands in x86_canonicalize_comparison to a non-canonical form to use LTU comparison. - Return correct compare mode (CCCmode) for addcarry/subborrow pattern from ix86_cc_mode, so combine is able to emit required compare mode for combined insn. - Add *subborrow_1 pattern having const_scalar_int_operand predicate. Here, canonicalization of SUB (op1, const) RTX to PLUS (op1, -const) requires negation of constant operand when checking operands. 
With the above changes, combine is able to create *addcarry_1/*subborrow_1 pattern with immediate operand for the testcase in the PR: SomeAddFunc: addq%rcx, %rsi # 10[c=4 l=3] adddi3_cc_overflow_1/0 movq%rdi, %rax # 33[c=4 l=3] *movdi_internal/3 adcq$5, %rdx# 19[c=4 l=4] *addcarrydi_1/0 movq%rsi, (%rdi)# 23[c=4 l=3] *movdi_internal/5 movq%rdx, 8(%rdi) # 24[c=4 l=4] *movdi_internal/5 setc%dl # 39[c=4 l=3] *setcc_qi movzbl %dl, %edx # 40[c=4 l=3] zero_extendqidi2/0 movq%rdx, 16(%rdi) # 26[c=4 l=4] *movdi_internal/5 ret # 43[c=0 l=1] simple_return_internal SomeSubFunc: subq%rcx, %rsi # 10[c=4 l=3] *subdi_3/0 movq%rdi, %rax # 42[c=4 l=3] *movdi_internal/3 sbbq$17, %rdx # 19[c=4 l=4] *subborrowdi_1/0 movq%rsi, (%rdi)# 33[c=4 l=3] *movdi_internal/5 sbbq%rcx, %rcx # 29[c=8 l=3] *x86_movdicc_0_m1_neg movq%rdx, 8(%rdi) # 34[c=4 l=4] *movdi_internal/5 movq%rcx, 16(%rdi) # 35[c=4 l=4] *movdi_internal/5 ret # 51[c=0 l=1] simple_return_internal PR target/117860 gcc/ChangeLog: * config/i386/i386.cc (ix86_canonicalize_comparison): Swap operands of GTU comparison to canonicalize addcarry/subborrow comparison. (ix86_cc_mode): Return CCCmode for the comparison of addcarry/subborrow pattern. * config/i386/i386.md (addcarry): Swap operands of PLUS RTX to make it canonical. (*addcarry_1): Ditto. (addcarry peephole2s): Update RTXes for addcarry_1 change. (*add3_doubleword_cc_overflow_1): Ditto. (*subborrow_1): New insn pattern. gcc/testsuite/ChangeLog: * gcc.target/i386/pr117860.c: New test. 
Diff: --- gcc/config/i386/i386.cc | 23 - gcc/config/i386/i386.md | 85 +--- gcc/testsuite/gcc.target/i386/pr117860.c | 52 +++ 3 files changed, 140 insertions(+), 20 deletions(-) diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index 0beeb514cf95..23ff16b40812 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -578,11 +578,25 @@ ix86_canonicalize_comparison (int *code, rtx *op0, rtx *op1, { std::swap (*op0, *op1); *code = (int) scode; + return; } } + + /* Swap operands of GTU comparison to canonicalize + addcarry/subborrow comparison. */ + if (!op0_preserve_value + && *code == GTU + && GET_CODE (*op0) == PLUS + && ix86_carry_flag_operator (XEXP (*op0, 0), VOIDmode) + && GET_CODE (XEXP (*op0, 1)) == ZERO_EXTEND + && GET_CODE (*op1) == ZERO_EXTEND) +{ + std::swap (*op0, *op1); + *code = (int) swap_condition ((enum rtx_code) *code); + return; +} } - /* Hook to determine if one function can safely inline another. */ static bool @@ -16479,6 +16493,13 @@ ix86_cc_mode (enum rtx_code code, rtx op0, rtx op1) && GET_CODE (op1) == GEU && GET_MODE (XEXP (op1, 0)) == CCCmode) return CCCmode; + /* Similarly for the comparison of addcarry/subborrow pattern. */ + else if (code == LTU + && GET_CODE (op0) == ZERO_EXTEND + && GET_CODE (op1) == PLUS + && ix86_carry_flag_operator (XEXP (op1, 0)
[gcc r15-5943] middle-end/117801 - failed register coalescing due to GIMPLE schedule
https://gcc.gnu.org/g:dc0dea98c96e02c6b24060170bc88da8d4931bc2 commit r15-5943-gdc0dea98c96e02c6b24060170bc88da8d4931bc2 Author: Richard Biener Date: Wed Nov 27 13:36:19 2024 +0100 middle-end/117801 - failed register coalescing due to GIMPLE schedule For a TSVC testcase we see failed register coalescing due to a different schedule of GIMPLE .FMA and stores fed by it. This can be mitigated by making direct internal functions participate in TER - given we're using more and more such functions to expose target capabilities it seems to be a natural thing to not exempt those. Unfortunately the internal function expanding API doesn't match what we usually have - passing in a target and returning an RTX - but instead the LHS of the call is expanded and written to. This makes the TER expansion of a call SSA def a bit unwieldy. Bootstrapped and tested on x86_64-unknown-linux-gnu. The ccmp changes have likely not seen any coverage, the debug stmt changes might not be optimal, and we might end up losing on replaceable calls. PR middle-end/117801 * tree-outof-ssa.cc (ssa_is_replaceable_p): Make direct internal function calls replaceable. * expr.cc (get_def_for_expr): Handle replacements with calls. (get_def_for_expr_class): Likewise. (optimize_bitfield_assignment_op): Likewise. (expand_expr_real_1): Likewise. Properly expand direct internal function defs. * cfgexpand.cc (expand_call_stmt): Handle replacements with calls. (avoid_deep_ter_for_debug): Likewise, always create a debug temp for calls. (expand_debug_expr): Likewise, give up for calls. (expand_gimple_basic_block): Likewise. * ccmp.cc (ccmp_candidate_p): Likewise. (get_compare_parts): Likewise. 
Diff: --- gcc/ccmp.cc | 4 ++-- gcc/cfgexpand.cc | 14 +++--- gcc/expr.cc | 19 ++- gcc/tree-outof-ssa.cc | 15 --- 4 files changed, 39 insertions(+), 13 deletions(-) diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc index 45629abadbe0..4f739dfda504 100644 --- a/gcc/ccmp.cc +++ b/gcc/ccmp.cc @@ -100,7 +100,7 @@ ccmp_candidate_p (gimple *g, bool outer = false) tree_code tcode; basic_block bb; - if (!g) + if (!g || !is_gimple_assign (g)) return false; tcode = gimple_assign_rhs_code (g); @@ -138,7 +138,7 @@ get_compare_parts (tree t, int *up, rtx_code *rcode, { tree_code code; gimple *g = get_gimple_for_ssa_name (t); - if (g) + if (g && is_gimple_assign (g)) { *up = TYPE_UNSIGNED (TREE_TYPE (gimple_assign_rhs1 (g))); code = gimple_assign_rhs_code (g); diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc index 58d68ec1caa5..ea08810df045 100644 --- a/gcc/cfgexpand.cc +++ b/gcc/cfgexpand.cc @@ -2848,6 +2848,7 @@ expand_call_stmt (gcall *stmt) if (builtin_p && TREE_CODE (arg) == SSA_NAME && (def = get_gimple_for_ssa_name (arg)) + && is_gimple_assign (def) && gimple_assign_rhs_code (def) == ADDR_EXPR) arg = gimple_assign_rhs1 (def); CALL_EXPR_ARG (exp, i) = arg; @@ -4414,7 +4415,7 @@ avoid_deep_ter_for_debug (gimple *stmt, int depth) gimple *g = get_gimple_for_ssa_name (use); if (g == NULL) continue; - if (depth > 6 && !stmt_ends_bb_p (g)) + if ((depth > 6 || !is_gimple_assign (g)) && !stmt_ends_bb_p (g)) { if (deep_ter_debug_map == NULL) deep_ter_debug_map = new hash_map; @@ -5388,7 +5389,13 @@ expand_debug_expr (tree exp) t = *slot; } if (t == NULL_TREE) - t = gimple_assign_rhs_to_tree (g); + { + if (is_gimple_assign (g)) + t = gimple_assign_rhs_to_tree (g); + else + /* expand_debug_expr doesn't handle CALL_EXPR right now. */ + return NULL; + } op0 = expand_debug_expr (t); if (!op0) return NULL; @@ -5964,7 +5971,8 @@ expand_gimple_basic_block (basic_block bb, bool disable_tail_calls) /* Look for SSA names that have their last use here (TERed names always have only one real use). 
*/ FOR_EACH_SSA_TREE_OPERAND (op, stmt, iter, SSA_OP_USE) - if ((def = get_gimple_for_ssa_name (op))) + if ((def = get_gimple_for_ssa_name (op)) + && is_gimple_assign (def)) { imm_use_iterator imm_iter; use_operand_p use_p; diff --git a/gcc/expr.cc b/gcc/expr.cc index 70f2ecec9983..5578e3d9e993 100644 --- a/gcc/expr.cc +++ b/gcc/expr.cc @@ -65,6 +65,7 @@ along with GCC; see the file COPYING3. If not see #include "rtx-vector-builder.h" #include "tree-pretty-print.h" #include "flags.h" +#include "internal-fn.h"
[gcc r15-5942] libstdc++: Use ADL swap for containers' function objects [PR117921]
https://gcc.gnu.org/g:0368c42507328774cadbea589509b95aaf3cb826 commit r15-5942-g0368c42507328774cadbea589509b95aaf3cb826 Author: Jonathan Wakely Date: Thu Dec 5 12:46:26 2024 + libstdc++: Use ADL swap for containers' function objects [PR117921] The standard says that Compare, Pred and Hash objects should be swapped as described in [swappable.requirements] which means calling swap unqualified with std::swap visible to name lookup. libstdc++-v3/ChangeLog: PR libstdc++/117921 * include/bits/hashtable_policy.h (_Hash_code_base::_M_swap): Use ADL swap for Hash members. (_Hashtable_base::_M_swap): Use ADL swap for _Equal members. * include/bits/stl_tree.h (_Rb_tree::swap): Use ADL swap for _Compare members. * testsuite/23_containers/set/modifiers/swap/adl.cc: New test. * testsuite/23_containers/unordered_set/modifiers/swap-2.cc: New test. Diff: --- libstdc++-v3/include/bits/hashtable_policy.h | 8 ++- libstdc++-v3/include/bits/stl_tree.h | 4 +- .../23_containers/set/modifiers/swap/adl.cc| 54 +++ .../unordered_set/modifiers/swap-2.cc | 62 ++ 4 files changed, 125 insertions(+), 3 deletions(-) diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h index ad0dfd55c3f1..f2260f3926dc 100644 --- a/libstdc++-v3/include/bits/hashtable_policy.h +++ b/libstdc++-v3/include/bits/hashtable_policy.h @@ -1177,7 +1177,10 @@ namespace __detail void _M_swap(_Hash_code_base& __x) - { std::swap(__ebo_hash::_M_get(), __x.__ebo_hash::_M_get()); } + { + using std::swap; + swap(__ebo_hash::_M_get(), __x.__ebo_hash::_M_get()); + } const _Hash& _M_hash() const { return __ebo_hash::_M_cget(); } @@ -1561,7 +1564,8 @@ namespace __detail _M_swap(_Hashtable_base& __x) { __hash_code_base::_M_swap(__x); - std::swap(_EqualEBO::_M_get(), __x._EqualEBO::_M_get()); + using std::swap; + swap(_EqualEBO::_M_get(), __x._EqualEBO::_M_get()); } const _Equal& diff --git a/libstdc++-v3/include/bits/stl_tree.h b/libstdc++-v3/include/bits/stl_tree.h index 
bc27e191e8b8..0f536517d6b7 100644 --- a/libstdc++-v3/include/bits/stl_tree.h +++ b/libstdc++-v3/include/bits/stl_tree.h @@ -2091,7 +2091,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION std::swap(this->_M_impl._M_node_count, __t._M_impl._M_node_count); } // No need to swap header's color as it does not change. - std::swap(this->_M_impl._M_key_compare, __t._M_impl._M_key_compare); + + using std::swap; + swap(this->_M_impl._M_key_compare, __t._M_impl._M_key_compare); _Alloc_traits::_S_on_swap(_M_get_Node_allocator(), __t._M_get_Node_allocator()); diff --git a/libstdc++-v3/testsuite/23_containers/set/modifiers/swap/adl.cc b/libstdc++-v3/testsuite/23_containers/set/modifiers/swap/adl.cc new file mode 100644 index ..2b7975a366fc --- /dev/null +++ b/libstdc++-v3/testsuite/23_containers/set/modifiers/swap/adl.cc @@ -0,0 +1,54 @@ +// { dg-do run { target c++11 } } + +// Bug 117921 - containers do not use ADL swap for Compare, Pred or Hash types + +#include +#include + +namespace adl +{ + struct Less : std::less + { +static bool swapped; +friend void swap(Less&, Less&) { swapped = true; } + }; + bool Less::swapped = false; + + struct Allocator_base + { +static bool swapped; + }; + bool Allocator_base::swapped = false; + + using std::size_t; + + template +struct Allocator : Allocator_base +{ + using value_type = T; + + Allocator() { } + template Allocator(const Allocator&) { } + + T* allocate(size_t n) { return std::allocator().allocate(n); } + void deallocate(T* p, size_t n) { std::allocator().deallocate(p, n); } + + using propagate_on_container_swap = std::true_type; + + friend void swap(Allocator&, Allocator&) { swapped = true; } +}; +} + +void +test_swap() +{ + std::set> s1, s2; + s1.swap(s2); + VERIFY( adl::Less::swapped ); + VERIFY( adl::Allocator_base::swapped ); +} + +int main() +{ + test_swap(); +} diff --git a/libstdc++-v3/testsuite/23_containers/unordered_set/modifiers/swap-2.cc b/libstdc++-v3/testsuite/23_containers/unordered_set/modifiers/swap-2.cc new file mode 100644 
index ..a0fb1a6f662f --- /dev/null +++ b/libstdc++-v3/testsuite/23_containers/unordered_set/modifiers/swap-2.cc @@ -0,0 +1,62 @@ +// { dg-do run { target c++11 } } + +// Bug 117921 - containers do not use ADL swap for Compare, Pred or Hash types + +#include +#include + +namespace adl +{ + struct Hash : std::hash + { +static bool swapped; +friend void swap(Hash&, Hash&) { swapped = true; } + }; + bool Hash::swapped = false; + + struct Eq : std::equal_to + { +
[gcc r15-5946] aarch64: Rename FLAG_NONE to FLAG_DEFAULT
https://gcc.gnu.org/g:1e181536ba5c39d987bf394d346f49982e6df83a commit r15-5946-g1e181536ba5c39d987bf394d346f49982e6df83a Author: Richard Sandiford Date: Thu Dec 5 15:33:10 2024 + aarch64: Rename FLAG_NONE to FLAG_DEFAULT This patch renames FLAG_NONE to FLAG_DEFAULT. "NONE" suggests that the function has no side-effects, whereas it actually means that floating-point operations are assumed to read FPCR and to raise FP exceptions. gcc/ * config/aarch64/aarch64-builtins.cc (FLAG_NONE): Rename to... (FLAG_DEFAULT): ...this and update all references. * config/aarch64/aarch64-simd-builtins.def: Update all references here too. * config/aarch64/aarch64-simd-pragma-builtins.def: Likewise. Diff: --- gcc/config/aarch64/aarch64-builtins.cc | 32 +- gcc/config/aarch64/aarch64-simd-builtins.def | 726 ++--- .../aarch64/aarch64-simd-pragma-builtins.def | 24 +- 3 files changed, 391 insertions(+), 391 deletions(-) diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc index 4f735e8e58b8..eb44580bd9cb 100644 --- a/gcc/config/aarch64/aarch64-builtins.cc +++ b/gcc/config/aarch64/aarch64-builtins.cc @@ -193,7 +193,7 @@ using namespace aarch64; #define SIMD_MAX_BUILTIN_ARGS 5 /* Flags that describe what a function might do. 
*/ -const unsigned int FLAG_NONE = 0U; +const unsigned int FLAG_DEFAULT = 0U; const unsigned int FLAG_READ_FPCR = 1U << 0; const unsigned int FLAG_RAISE_FP_EXCEPTIONS = 1U << 1; const unsigned int FLAG_READ_MEMORY = 1U << 2; @@ -913,7 +913,7 @@ static aarch64_fcmla_laneq_builtin_datum aarch64_fcmla_lane_builtin_data[] = { 2, \ { SIMD_INTR_MODE(A, L), SIMD_INTR_MODE(B, L) }, \ { SIMD_INTR_QUAL(A), SIMD_INTR_QUAL(B) }, \ - FLAG_NONE, \ + FLAG_DEFAULT, \ SIMD_INTR_MODE(A, L) == SIMD_INTR_MODE(B, L) \ && SIMD_INTR_QUAL(A) == SIMD_INTR_QUAL(B) \ }, @@ -925,7 +925,7 @@ static aarch64_fcmla_laneq_builtin_datum aarch64_fcmla_lane_builtin_data[] = { 2, \ { SIMD_INTR_MODE(A, d), SIMD_INTR_MODE(A, q) }, \ { SIMD_INTR_QUAL(A), SIMD_INTR_QUAL(A) }, \ - FLAG_NONE, \ + FLAG_DEFAULT, \ false \ }, @@ -936,7 +936,7 @@ static aarch64_fcmla_laneq_builtin_datum aarch64_fcmla_lane_builtin_data[] = { 2, \ { SIMD_INTR_MODE(A, d), SIMD_INTR_MODE(A, q) }, \ { SIMD_INTR_QUAL(A), SIMD_INTR_QUAL(A) }, \ - FLAG_NONE, \ + FLAG_DEFAULT, \ false \ }, @@ -1857,7 +1857,7 @@ aarch64_init_crc32_builtins () aarch64_crc_builtin_datum* d = &aarch64_crc_builtin_data[i]; tree argtype = aarch64_simd_builtin_type (d->mode, qualifier_unsigned); tree ftype = build_function_type_list (usi_type, usi_type, argtype, NULL_TREE); - tree attrs = aarch64_get_attributes (FLAG_NONE, d->mode); + tree attrs = aarch64_get_attributes (FLAG_DEFAULT, d->mode); tree fndecl = aarch64_general_add_builtin (d->name, ftype, d->fcode, attrs); @@ -2232,7 +2232,7 @@ static void aarch64_init_data_intrinsics (void) { /* These intrinsics are not fp nor they read/write memory. 
*/ - tree attrs = aarch64_get_attributes (FLAG_NONE, SImode); + tree attrs = aarch64_get_attributes (FLAG_DEFAULT, SImode); tree uint32_fntype = build_function_type_list (uint32_type_node, uint32_type_node, NULL_TREE); tree ulong_fntype = build_function_type_list (long_unsigned_type_node, @@ -4048,7 +4048,7 @@ aarch64_general_gimple_fold_builtin (unsigned int fcode, gcall *stmt, switch (fcode) { BUILTIN_VALL (UNOP, reduc_plus_scal_, 10, ALL) - BUILTIN_VDQ_I (UNOPU, reduc_plus_scal_, 10, NONE) + BUILTIN_VDQ_I (UNOPU, reduc_plus_scal_, 10, DEFAULT) new_stmt = gimple_build_call_internal (IFN_REDUC_PLUS, 1, args[0]); gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt)); @@ -4062,8 +4062,8 @@ aarch64_general_gimple_fold_builtin (unsigned int fcode, gcall *stmt, break; BUILTIN_VDC (BINOP, combine, 0, QUIET) - BUILTIN_VD_I (BINOPU, combine, 0, NONE) - BUILTIN_VDC_P (BINOPP, combine, 0, NONE) + BUILTIN_VD_I (BINOPU, combine, 0, DEFAULT) + BUILTIN_VDC_P (BINOPP, combine, 0, DEFAULT) { tree first_part, second_part; if (BYTES_BIG_ENDIAN) @@ -4152,14 +4152,14 @@ aarch64_general_gimple_fold_builtin (unsigned int fcode, gcall *stmt, 1, args[0]); gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt)); break; - BUILTIN_VSDQ_I_DI (BINOP, ashl, 3, NONE) + BUILTIN_VSDQ_I_DI (BINOP, ashl, 3, DEFAULT) if (TREE_CODE (args[1]) == INTEGER_CST && wi::ltu_p (wi::to_wide (args[1]), element_precision (args[0]))) new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
[gcc r15-5945] aarch64: Rename FLAG_AUTO_FP to FLAG_QUIET
https://gcc.gnu.org/g:bd7363ed699cae78bd87d23922fdbf3dd51fa03b commit r15-5945-gbd7363ed699cae78bd87d23922fdbf3dd51fa03b Author: Richard Sandiford Date: Thu Dec 5 15:33:09 2024 + aarch64: Rename FLAG_AUTO_FP to FLAG_QUIET I'd suggested the name "FLAG_AUTO_FP" to mean "automatically derive FLAG_FP from the mode", i.e. automatically decide whether the function might read the FPCR or might raise FP exceptions. However, the flag currently suppresses that behaviour instead. This patch renames FLAG_AUTO_FP to FLAG_QUIET. That's probably not a great name, but it's also what the SVE code means by "quiet", and is borrowed from "quiet NaNs". gcc/ * config/aarch64/aarch64-builtins.cc (FLAG_AUTO_FP): Rename to... (FLAG_QUIET): ...this and update all references. * config/aarch64/aarch64-simd-builtins.def: Update all references here too. Diff: --- gcc/config/aarch64/aarch64-builtins.cc | 10 gcc/config/aarch64/aarch64-simd-builtins.def | 36 ++-- 2 files changed, 23 insertions(+), 23 deletions(-) diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc index 22f8216a45b3..4f735e8e58b8 100644 --- a/gcc/config/aarch64/aarch64-builtins.cc +++ b/gcc/config/aarch64/aarch64-builtins.cc @@ -202,13 +202,13 @@ const unsigned int FLAG_WRITE_MEMORY = 1U << 4; /* Not all FP intrinsics raise FP exceptions or read FPCR register, use this flag to suppress it. 
*/ -const unsigned int FLAG_AUTO_FP = 1U << 5; +const unsigned int FLAG_QUIET = 1U << 5; const unsigned int FLAG_FP = FLAG_READ_FPCR | FLAG_RAISE_FP_EXCEPTIONS; const unsigned int FLAG_ALL = FLAG_READ_FPCR | FLAG_RAISE_FP_EXCEPTIONS | FLAG_READ_MEMORY | FLAG_PREFETCH_MEMORY | FLAG_WRITE_MEMORY; -const unsigned int FLAG_STORE = FLAG_WRITE_MEMORY | FLAG_AUTO_FP; -const unsigned int FLAG_LOAD = FLAG_READ_MEMORY | FLAG_AUTO_FP; +const unsigned int FLAG_STORE = FLAG_WRITE_MEMORY | FLAG_QUIET; +const unsigned int FLAG_LOAD = FLAG_READ_MEMORY | FLAG_QUIET; typedef struct { @@ -1322,7 +1322,7 @@ aarch64_init_simd_builtin_scalar_types (void) static unsigned int aarch64_call_properties (unsigned int flags, machine_mode mode) { - if (!(flags & FLAG_AUTO_FP) && FLOAT_MODE_P (mode)) + if (!(flags & FLAG_QUIET) && FLOAT_MODE_P (mode)) flags |= FLAG_FP; /* -fno-trapping-math means that we can assume any FP exceptions @@ -4061,7 +4061,7 @@ aarch64_general_gimple_fold_builtin (unsigned int fcode, gcall *stmt, gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt)); break; - BUILTIN_VDC (BINOP, combine, 0, AUTO_FP) + BUILTIN_VDC (BINOP, combine, 0, QUIET) BUILTIN_VD_I (BINOPU, combine, 0, NONE) BUILTIN_VDC_P (BINOPP, combine, 0, NONE) { diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index 0814f8ba14f5..3df2773380ed 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -50,7 +50,7 @@ BUILTIN_V12DI (STORESTRUCT_LANE_U, vec_stl1_lane, 0, ALL) BUILTIN_V12DI (STORESTRUCT_LANE_P, vec_stl1_lane, 0, ALL) - BUILTIN_VDC (BINOP, combine, 0, AUTO_FP) + BUILTIN_VDC (BINOP, combine, 0, QUIET) BUILTIN_VD_I (BINOPU, combine, 0, NONE) BUILTIN_VDC_P (BINOPP, combine, 0, NONE) BUILTIN_VB (BINOPP, pmul, 0, NONE) @@ -657,12 +657,12 @@ /* Implemented by aarch64_. 
*/ - BUILTIN_VALL (BINOP, zip1, 0, AUTO_FP) - BUILTIN_VALL (BINOP, zip2, 0, AUTO_FP) - BUILTIN_VALL (BINOP, uzp1, 0, AUTO_FP) - BUILTIN_VALL (BINOP, uzp2, 0, AUTO_FP) - BUILTIN_VALL (BINOP, trn1, 0, AUTO_FP) - BUILTIN_VALL (BINOP, trn2, 0, AUTO_FP) + BUILTIN_VALL (BINOP, zip1, 0, QUIET) + BUILTIN_VALL (BINOP, zip2, 0, QUIET) + BUILTIN_VALL (BINOP, uzp1, 0, QUIET) + BUILTIN_VALL (BINOP, uzp2, 0, QUIET) + BUILTIN_VALL (BINOP, trn1, 0, QUIET) + BUILTIN_VALL (BINOP, trn2, 0, QUIET) BUILTIN_GPF_F16 (UNOP, frecpe, 0, FP) BUILTIN_GPF_F16 (UNOP, frecpx, 0, FP) @@ -674,9 +674,9 @@ /* Implemented by a mixture of abs2 patterns. Note the DImode builtin is only ever used for the int64x1_t intrinsic, there is no scalar version. */ - BUILTIN_VSDQ_I_DI (UNOP, abs, 0, AUTO_FP) - BUILTIN_VHSDF (UNOP, abs, 2, AUTO_FP) - VAR1 (UNOP, abs, 2, AUTO_FP, hf) + BUILTIN_VSDQ_I_DI (UNOP, abs, 0, QUIET) + BUILTIN_VHSDF (UNOP, abs, 2, QUIET) + VAR1 (UNOP, abs, 2, QUIET, hf) BUILTIN_VQ_HSF (UNOP, vec_unpacks_hi_, 10, FP) VAR1 (BINOP, float_truncate_hi_, 0, FP, v4sf) @@ -720,7 +720,7 @@ BUILTIN_VDQQH (BSL_P, simd_bsl, 0, NONE) VAR2 (BSL_P, simd_bsl,0, NONE, di, v2di) BUILTIN_VSDQ_I_DI (BSL_U, simd_bsl, 0, NONE) - BUILTIN_VALLDIF (BSL_S, simd_bsl, 0, AUTO_FP) + BUILTIN_VALLDIF (BSL_S, simd_bsl, 0, QUIET) /* Implemented by aarch64_crypto_aes. */ VAR1 (BINOPU, crypto_aese, 0, NONE, v16qi) @@ -940,12 +940,12 @@ BUILT
[gcc r15-5947] aarch64: Reintroduce FLAG_AUTO_FP
https://gcc.gnu.org/g:0a4490a1ad3f73d546f53d0940dbc9f217d12922 commit r15-5947-g0a4490a1ad3f73d546f53d0940dbc9f217d12922 Author: Richard Sandiford Date: Thu Dec 5 15:33:10 2024 + aarch64: Reintroduce FLAG_AUTO_FP The flag now known as FLAG_QUIET is an odd-one-out in that it removes side-effects rather than adding them. This patch inverts it and gives it the old name FLAG_AUTO_FP. FLAG_QUIET now means "no flags" instead. gcc/ * config/aarch64/aarch64-builtins.cc (FLAG_QUIET): Redefine to 0, replacing the old flag with... (FLAG_AUTO_FP): ...this. (FLAG_DEFAULT): Redefine to FLAG_AUTO_FP. (aarch64_call_properties): Update accordingly. Diff: --- gcc/config/aarch64/aarch64-builtins.cc | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc index eb44580bd9cb..f528592a17d8 100644 --- a/gcc/config/aarch64/aarch64-builtins.cc +++ b/gcc/config/aarch64/aarch64-builtins.cc @@ -193,22 +193,23 @@ using namespace aarch64; #define SIMD_MAX_BUILTIN_ARGS 5 /* Flags that describe what a function might do. */ -const unsigned int FLAG_DEFAULT = 0U; const unsigned int FLAG_READ_FPCR = 1U << 0; const unsigned int FLAG_RAISE_FP_EXCEPTIONS = 1U << 1; const unsigned int FLAG_READ_MEMORY = 1U << 2; const unsigned int FLAG_PREFETCH_MEMORY = 1U << 3; const unsigned int FLAG_WRITE_MEMORY = 1U << 4; -/* Not all FP intrinsics raise FP exceptions or read FPCR register, - use this flag to suppress it. */ -const unsigned int FLAG_QUIET = 1U << 5; +/* Indicates that READ_FPCR and RAISE_FP_EXCEPTIONS should be set for + floating-point modes but not for integer modes. 
*/ +const unsigned int FLAG_AUTO_FP = 1U << 5; +const unsigned int FLAG_QUIET = 0; +const unsigned int FLAG_DEFAULT = FLAG_AUTO_FP; const unsigned int FLAG_FP = FLAG_READ_FPCR | FLAG_RAISE_FP_EXCEPTIONS; const unsigned int FLAG_ALL = FLAG_READ_FPCR | FLAG_RAISE_FP_EXCEPTIONS | FLAG_READ_MEMORY | FLAG_PREFETCH_MEMORY | FLAG_WRITE_MEMORY; -const unsigned int FLAG_STORE = FLAG_WRITE_MEMORY | FLAG_QUIET; -const unsigned int FLAG_LOAD = FLAG_READ_MEMORY | FLAG_QUIET; +const unsigned int FLAG_STORE = FLAG_WRITE_MEMORY; +const unsigned int FLAG_LOAD = FLAG_READ_MEMORY; typedef struct { @@ -1322,7 +1323,7 @@ aarch64_init_simd_builtin_scalar_types (void) static unsigned int aarch64_call_properties (unsigned int flags, machine_mode mode) { - if (!(flags & FLAG_QUIET) && FLOAT_MODE_P (mode)) + if ((flags & FLAG_AUTO_FP) && FLOAT_MODE_P (mode)) flags |= FLAG_FP; /* -fno-trapping-math means that we can assume any FP exceptions
[gcc r13-9230] c++: Don't reject pointer to virtual method during constant evaluation [PR117615]
https://gcc.gnu.org/g:322faea202947561ee8c03edf5ab0ccf649587e1 commit r13-9230-g322faea202947561ee8c03edf5ab0ccf649587e1 Author: Simon Martin Date: Tue Dec 3 14:30:43 2024 +0100 c++: Don't reject pointer to virtual method during constant evaluation [PR117615] We currently reject the following valid code: === cut here === struct Base { virtual void doit (int v) const {} }; struct Derived : Base { void doit (int v) const {} }; using fn_t = void (Base::*)(int) const; struct Helper { fn_t mFn; constexpr Helper (auto && fn) : mFn(static_cast<fn_t>(fn)) {} }; void foo () { constexpr Helper h (&Derived::doit); } === cut here === The problem is that since r6-4014-gdcdbc004d531b4, &Derived::doit is represented with an expression with type pointer to method and using an INTEGER_CST (here 1), and that cxx_eval_constant_expression rejects any such expression with a non-null INTEGER_CST. This patch uses the same strategy as r12-4491-gf45610a45236e9 (fix for PR c++/102786), and simply lets such expressions go through. PR c++/117615 gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_constant_expression): Don't reject INTEGER_CSTs with type POINTER_TYPE to METHOD_TYPE. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/constexpr-virtual22.C: New test. (cherry picked from commit 72a2380a306a1c3883cb7e4f99253522bc265af0) Diff: --- gcc/cp/constexpr.cc | 6 ++ gcc/testsuite/g++.dg/cpp2a/constexpr-virtual22.C | 22 ++ 2 files changed, 28 insertions(+) diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc index fb8a1023b222..f885a806c0a2 100644 --- a/gcc/cp/constexpr.cc +++ b/gcc/cp/constexpr.cc @@ -7778,6 +7778,12 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t, return t; } } + else if (TYPE_PTR_P (type) + && TREE_CODE (TREE_TYPE (type)) == METHOD_TYPE) + /* INTEGER_CST with pointer-to-method type is only used +for a virtual method in a pointer to member function. +Don't reject those. 
*/ + ; else { /* This detects for example: diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual22.C b/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual22.C new file mode 100644 index ..89330bf86200 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual22.C @@ -0,0 +1,22 @@ +// PR c++/117615 +// { dg-do "compile" { target c++20 } } + +struct Base { +virtual void doit (int v) const {} +}; + +struct Derived : Base { +void doit (int v) const {} +}; + +using fn_t = void (Base::*)(int) const; + +struct Helper { +fn_t mFn; +constexpr Helper (auto && fn) : mFn(static_cast<fn_t>(fn)) {} +}; + +void foo () { +constexpr Helper h (&Derived::doit); +constexpr Helper h2 (&Base::doit); +}
[gcc r15-5933] params.opt: Fix typo
https://gcc.gnu.org/g:2a2f285ecd2cd681cadae305990ffb9e23e157cb commit r15-5933-g2a2f285ecd2cd681cadae305990ffb9e23e157cb Author: Filip Kastl Date: Thu Dec 5 11:23:13 2024 +0100 params.opt: Fix typo Add missing '=' after -param=cycle-accurate-model. gcc/ChangeLog: * params.opt: Add missing '=' after -param=cycle-accurate-model. Signed-off-by: Filip Kastl Diff: --- gcc/params.opt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/params.opt b/gcc/params.opt index f5cc71d0f493..5853bf02f9ee 100644 --- a/gcc/params.opt +++ b/gcc/params.opt @@ -66,7 +66,7 @@ Enable asan stack protection. Common Joined UInteger Var(param_asan_use_after_return) Init(1) IntegerRange(0, 1) Param Optimization Enable asan detection of use-after-return bugs. --param=cycle-accurate-model +-param=cycle-accurate-model= Common Joined UInteger Var(param_cycle_accurate_model) Init(1) IntegerRange(0, 1) Param Optimization Whether the scheduling description is mostly a cycle-accurate model of the target processor and is likely to be spill aggressively to fill any pipeline bubbles.
[gcc r15-5937] AVR: target/107957 - Split multi-byte loads and stores.
https://gcc.gnu.org/g:b78c0dcb1b6b523880ee193698defca3ebd0b3f7 commit r15-5937-gb78c0dcb1b6b523880ee193698defca3ebd0b3f7 Author: Georg-Johann Lay Date: Sun Dec 1 17:12:34 2024 +0100 AVR: target/107957 - Split multi-byte loads and stores. This patch splits multi-byte loads and stores into single-byte ones provided: - New option -msplit-ldst is on (e.g. -O2 and higher), and - The memory is non-volatile, and - The address space is generic, and - The split addresses are natively supported by the hardware. gcc/ PR target/107957 * config/avr/avr.opt (-msplit-ldst, avropt_split_ldst): New option and associated var. * common/config/avr/avr-common.cc (avr_option_optimization_table) [OPT_LEVELS_2_PLUS]: Turn on -msplit_ldst. * config/avr/avr-passes.cc (splittable_address_p) (avr_byte_maybe_mem, avr_split_ldst): New functions. * config/avr/avr-protos.h (avr_split_ldst): New proto. * config/avr/avr.md (define_split) [avropt_split_ldst]: Run avr_split_ldst(). Diff: --- gcc/common/config/avr/avr-common.cc | 1 + gcc/config/avr/avr-passes.cc| 106 gcc/config/avr/avr-protos.h | 1 + gcc/config/avr/avr.md | 19 +-- gcc/config/avr/avr.opt | 4 ++ 5 files changed, 126 insertions(+), 5 deletions(-) diff --git a/gcc/common/config/avr/avr-common.cc b/gcc/common/config/avr/avr-common.cc index 7473429fa360..9059e7d2b485 100644 --- a/gcc/common/config/avr/avr-common.cc +++ b/gcc/common/config/avr/avr-common.cc @@ -39,6 +39,7 @@ static const struct default_options avr_option_optimization_table[] = { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_mfuse_move_, NULL, 3 }, { OPT_LEVELS_2_PLUS, OPT_mfuse_move_, NULL, 23 }, { OPT_LEVELS_2_PLUS, OPT_msplit_bit_shift, NULL, 1 }, +{ OPT_LEVELS_2_PLUS, OPT_msplit_ldst, NULL, 1 }, // Stick to the "old" placement of the subreg lowering pass. { OPT_LEVELS_1_PLUS, OPT_fsplit_wide_types_early, NULL, 1 }, /* Allow optimizer to introduce store data races. 
This used to be the diff --git a/gcc/config/avr/avr-passes.cc b/gcc/config/avr/avr-passes.cc index f89a534bcbd9..de8de1cd2e8a 100644 --- a/gcc/config/avr/avr-passes.cc +++ b/gcc/config/avr/avr-passes.cc @@ -5466,6 +5466,112 @@ avr_split_fake_addressing_move (rtx_insn * /*insn*/, rtx *xop) } +/* Given memory reference mem(ADDR), return true when it can be split into + single-byte moves, and all resulting addresses are natively supported. + ADDR is in addr-space generic. */ + +static bool +splittable_address_p (rtx addr, int n_bytes) +{ + if (CONSTANT_ADDRESS_P (addr) + || GET_CODE (addr) == PRE_DEC + || GET_CODE (addr) == POST_INC) +return true; + + if (! AVR_TINY) +{ + rtx base = select<rtx>() + : REG_P (addr) ? addr + : GET_CODE (addr) == PLUS ? XEXP (addr, 0) + : NULL_RTX; + + int off = select<int>() + : REG_P (addr) ? 0 + : GET_CODE (addr) == PLUS ? (int) INTVAL (XEXP (addr, 1)) + : -1; + + return (base && REG_P (base) + && (REGNO (base) == REG_Y || REGNO (base) == REG_Z) + && IN_RANGE (off, 0, 64 - n_bytes)); +} + + return false; +} + + +/* Like avr_byte(), but also knows how to split POST_INC and PRE_DEC + memory references. */ + +static rtx +avr_byte_maybe_mem (rtx x, int n) +{ + rtx addr, b; + if (MEM_P (x) + && (GET_CODE (addr = XEXP (x, 0)) == POST_INC + || GET_CODE (addr) == PRE_DEC)) +b = gen_rtx_MEM (QImode, copy_rtx (addr)); + else +b = avr_byte (x, n); + + if (MEM_P (x)) +gcc_assert (MEM_P (b)); + + return b; +} + + +/* Split multi-byte load / stores into 1-byte such insns + provided non-volatile, addr-space = generic, no reg-overlap + and the resulting addressings are all natively supported. + Returns true when the XOP[0] = XOP[1] insn has been split and + false, otherwise. 
*/ + +bool +avr_split_ldst (rtx *xop) +{ + rtx dest = xop[0]; + rtx src = xop[1]; + machine_mode mode = GET_MODE (dest); + int n_bytes = GET_MODE_SIZE (mode); + rtx mem, reg_or_0; + + if (MEM_P (dest) && reg_or_0_operand (src, mode)) +{ + mem = dest; + reg_or_0 = src; +} + else if (register_operand (dest, mode) && MEM_P (src)) +{ + reg_or_0 = dest; + mem = src; +} + else +return false; + + rtx addr = XEXP (mem, 0); + + if (MEM_VOLATILE_P (mem) + || ! ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (mem)) + || ! IN_RANGE (n_bytes, 2, 4) + || ! splittable_address_p (addr, n_bytes) + || reg_overlap_mentioned_p (reg_or_0, addr)) +return false; + + const int step = GET_CODE (addr) == PRE_DEC ? -1 : 1; + const int istart = step > 0 ? 0 : n_bytes - 1; + const int iend = istart + step * n_bytes; + + for (int i = istart; i != iend; i += step) +{ + rtx di = avr_byte_may
[gcc r14-11062] AVR: target/64242 - Copy FP to a local reg in nonlocal_goto.
https://gcc.gnu.org/g:0eb7f0a860add7b1c79ae4248e1960120bc77d60 commit r14-11062-g0eb7f0a860add7b1c79ae4248e1960120bc77d60 Author: Georg-Johann Lay Date: Wed Dec 4 20:56:50 2024 +0100 AVR: target/64242 - Copy FP to a local reg in nonlocal_goto. In nonlocal_goto sets, change hard_frame_pointer_rtx only after emit_stack_restore() restored SP. This is needed because SP may be stored in some frame location. gcc/ PR target/64242 * config/avr/avr.md (nonlocal_goto): Don't restore hard_frame_pointer_rtx directly, but copy it to local register, and only set hard_frame_pointer_rtx from it after emit_stack_restore(). (cherry picked from commit f7b5527d1b48b33d8ab633c1e9dcb9883667492a) Diff: --- gcc/config/avr/avr.md | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md index b7273fa19f6e..823fc716f2c7 100644 --- a/gcc/config/avr/avr.md +++ b/gcc/config/avr/avr.md @@ -404,9 +404,14 @@ emit_clobber (gen_rtx_MEM (BLKmode, hard_frame_pointer_rtx)); -emit_move_insn (hard_frame_pointer_rtx, r_fp); +// PR64242: When r_sp is located in the frame, we must not +// change FP prior to reading r_sp. Hence copy r_fp to a +// local register (and hope that reload won't spill it). +rtx r_fp_reg = copy_to_reg (r_fp); emit_stack_restore (SAVE_NONLOCAL, r_sp); +emit_move_insn (hard_frame_pointer_rtx, r_fp_reg); + emit_use (hard_frame_pointer_rtx); emit_use (stack_pointer_rtx);
[gcc r15-5934] doc: Add store-forwarding-max-distance to invoke.texi
https://gcc.gnu.org/g:9755f5973473aa547063d1a97d47a409d237eb5b commit r15-5934-g9755f5973473aa547063d1a97d47a409d237eb5b Author: Filip Kastl Date: Thu Dec 5 11:27:26 2024 +0100 doc: Add store-forwarding-max-distance to invoke.texi gcc/ChangeLog: * doc/invoke.texi: Add store-forwarding-max-distance. Signed-off-by: Filip Kastl Diff: --- gcc/doc/invoke.texi | 5 + 1 file changed, 5 insertions(+) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index d2409a41d50a..4b1acf9b79c1 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -17122,6 +17122,11 @@ diagnostics. @item store-merging-max-size Maximum size of a single store merging region in bytes. +@item store-forwarding-max-distance +Maximum number of instruction distance that a small store forwarded to a larger +load may stall. Value '0' disables the cost checks for the +avoid-store-forwarding pass. + @item hash-table-verification-limit The number of elements for which hash table verification is done for each searched element.
[gcc r15-5936] AVR: target/64242 - Copy FP to a local reg in nonlocal_goto.
https://gcc.gnu.org/g:f7b5527d1b48b33d8ab633c1e9dcb9883667492a commit r15-5936-gf7b5527d1b48b33d8ab633c1e9dcb9883667492a Author: Georg-Johann Lay Date: Wed Dec 4 20:56:50 2024 +0100 AVR: target/64242 - Copy FP to a local reg in nonlocal_goto. In nonlocal_goto sets, change hard_frame_pointer_rtx only after emit_stack_restore() restored SP. This is needed because SP may be stored in some frame location. gcc/ PR target/64242 * config/avr/avr.md (nonlocal_goto): Don't restore hard_frame_pointer_rtx directly, but copy it to local register, and only set hard_frame_pointer_rtx from it after emit_stack_restore(). Diff: --- gcc/config/avr/avr.md | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md index a68b38c272de..f45677a4533d 100644 --- a/gcc/config/avr/avr.md +++ b/gcc/config/avr/avr.md @@ -421,9 +421,14 @@ emit_clobber (gen_rtx_MEM (BLKmode, hard_frame_pointer_rtx)); -emit_move_insn (hard_frame_pointer_rtx, r_fp); +// PR64242: When r_sp is located in the frame, we must not +// change FP prior to reading r_sp. Hence copy r_fp to a +// local register (and hope that reload won't spill it). +rtx r_fp_reg = copy_to_reg (r_fp); emit_stack_restore (SAVE_NONLOCAL, r_sp); +emit_move_insn (hard_frame_pointer_rtx, r_fp_reg); + emit_use (hard_frame_pointer_rtx); emit_use (stack_pointer_rtx);
[gcc r14-11063] c++: Don't reject pointer to virtual method during constant evaluation [PR117615]
https://gcc.gnu.org/g:4a73efcbdc5fb9c3f6ab0cba718dd25b5062fc22 commit r14-11063-g4a73efcbdc5fb9c3f6ab0cba718dd25b5062fc22 Author: Simon Martin Date: Tue Dec 3 14:30:43 2024 +0100 c++: Don't reject pointer to virtual method during constant evaluation [PR117615] We currently reject the following valid code: === cut here === struct Base { virtual void doit (int v) const {} }; struct Derived : Base { void doit (int v) const {} }; using fn_t = void (Base::*)(int) const; struct Helper { fn_t mFn; constexpr Helper (auto && fn) : mFn(static_cast<fn_t>(fn)) {} }; void foo () { constexpr Helper h (&Derived::doit); } === cut here === The problem is that since r6-4014-gdcdbc004d531b4, &Derived::doit is represented with an expression with type pointer to method and using an INTEGER_CST (here 1), and that cxx_eval_constant_expression rejects any such expression with a non-null INTEGER_CST. This patch uses the same strategy as r12-4491-gf45610a45236e9 (fix for PR c++/102786), and simply lets such expressions go through. PR c++/117615 gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_constant_expression): Don't reject INTEGER_CSTs with type POINTER_TYPE to METHOD_TYPE. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/constexpr-virtual22.C: New test. (cherry picked from commit 72a2380a306a1c3883cb7e4f99253522bc265af0) Diff: --- gcc/cp/constexpr.cc | 6 ++ gcc/testsuite/g++.dg/cpp2a/constexpr-virtual22.C | 22 ++ 2 files changed, 28 insertions(+) diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc index 853694d78a56..40cc755258ff 100644 --- a/gcc/cp/constexpr.cc +++ b/gcc/cp/constexpr.cc @@ -8254,6 +8254,12 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t, return t; } } + else if (TYPE_PTR_P (type) + && TREE_CODE (TREE_TYPE (type)) == METHOD_TYPE) + /* INTEGER_CST with pointer-to-method type is only used +for a virtual method in a pointer to member function. +Don't reject those. 
*/ + ; else { /* This detects for example: diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual22.C b/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual22.C new file mode 100644 index ..89330bf86200 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual22.C @@ -0,0 +1,22 @@ +// PR c++/117615 +// { dg-do "compile" { target c++20 } } + +struct Base { +virtual void doit (int v) const {} +}; + +struct Derived : Base { +void doit (int v) const {} +}; + +using fn_t = void (Base::*)(int) const; + +struct Helper { +fn_t mFn; +constexpr Helper (auto && fn) : mFn(static_cast<fn_t>(fn)) {} +}; + +void foo () { +constexpr Helper h (&Derived::doit); +constexpr Helper h2 (&Base::doit); +}
[gcc r15-5939] c: Diagnose unexpected va_start arguments in C23 [PR107980]
https://gcc.gnu.org/g:fca04028d7075a6eaae350774a3916f14d4004ae commit r15-5939-gfca04028d7075a6eaae350774a3916f14d4004ae Author: Jakub Jelinek Date: Thu Dec 5 12:57:44 2024 +0100 c: Diagnose unexpected va_start arguments in C23 [PR107980] va_start macro was changed in C23 from the C17 va_start (va_list ap, parmN) where parmN is the identifier of the last parameter into va_start (va_list ap, ...) where arguments after ap aren't evaluated. Late in the C23 development "If any additional arguments expand to include unbalanced parentheses, or a preprocessing token that does not convert to a token, the behavior is undefined." has been added, plus there is "NOTE The macro allows additional arguments to be passed for va_start for compatibility with older versions of the library only." and "Additional arguments beyond the first given to the va_start macro may be expanded and used in unspecified contexts where they are unevaluated. For example, an implementation diagnoses potentially erroneous input for an invocation of va_start such as:" ... va_start(vl, 1, 3.0, "12", xd); // diagnostic encouraged ... "Simultaneously, va_start usage consistent with older revisions of this document should not produce a diagnostic:" ... void neigh (int last_arg, ...) { va_list vl; va_start(vl, last_arg); // no diagnostic The following patch implements the recommended diagnostics. Until now in C23 mode va_start(v, ...) was defined to __builtin_va_start(v, 0) and the extra arguments were silently ignored. 
The following patch adds a new builtin in a form of a keyword which parses the first argument, is silent about the __builtin_c23_va_start (ap) form, for __builtin_c23_va_start (ap, identifier) looks the identifier up and is silent if it is the last named parameter (except that it diagnoses if it has register keyword), otherwise diagnoses it isn't the last one but something else, and if there is just __builtin_c23_va_start (ap, ) or if __builtin_c23_va_start (ap, is followed by tokens other than identifier followed by ), it skips over the tokens (with handling of balanced ()s) until ) and diagnoses the extra tokens. In all cases in a form of warnings. 2024-12-05 Jakub Jelinek PR c/107980 gcc/ * ginclude/stdarg.h (va_start): For C23+ change parameters from v, ... to just ... and define to __builtin_c23_va_start(__VA_ARGS__) rather than __builtin_va_start(v, 0). gcc/c-family/ * c-common.h (enum rid): Add RID_C23_VA_START. * c-common.cc (c_common_reswords): Add __builtin_c23_va_start. gcc/c/ * c-parser.cc (c_parser_postfix_expression): Handle RID_C23_VA_START. gcc/testsuite/ * gcc.dg/c23-stdarg-4.c: Expect extra warning. * gcc.dg/c23-stdarg-6.c: Likewise. * gcc.dg/c23-stdarg-7.c: Likewise. * gcc.dg/c23-stdarg-8.c: Likewise. * gcc.dg/c23-stdarg-10.c: New test. * gcc.dg/c23-stdarg-11.c: New test. * gcc.dg/torture/c23-stdarg-split-1a.c: Expect extra warning. * gcc.dg/torture/c23-stdarg-split-1b.c: Likewise. 
Diff: --- gcc/c-family/c-common.cc | 1 + gcc/c-family/c-common.h| 2 +- gcc/c/c-parser.cc | 95 + gcc/ginclude/stdarg.h | 2 +- gcc/testsuite/gcc.dg/c23-stdarg-10.c | 112 + gcc/testsuite/gcc.dg/c23-stdarg-11.c | 11 ++ gcc/testsuite/gcc.dg/c23-stdarg-4.c| 2 +- gcc/testsuite/gcc.dg/c23-stdarg-6.c| 2 +- gcc/testsuite/gcc.dg/c23-stdarg-7.c| 2 + gcc/testsuite/gcc.dg/c23-stdarg-8.c| 2 + gcc/testsuite/gcc.dg/torture/c23-stdarg-split-1a.c | 2 + gcc/testsuite/gcc.dg/torture/c23-stdarg-split-1b.c | 2 +- 12 files changed, 230 insertions(+), 5 deletions(-) diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc index d21f2f9909c4..048952311f2f 100644 --- a/gcc/c-family/c-common.cc +++ b/gcc/c-family/c-common.cc @@ -459,6 +459,7 @@ const struct c_common_resword c_common_reswords[] = { "__builtin_tgmath", RID_BUILTIN_TGMATH, D_CONLY }, { "__builtin_offsetof", RID_OFFSETOF, 0 }, { "__builtin_types_compatible_p", RID_TYPES_COMPATIBLE_P, D_CONLY }, + { "__builtin_c23_va_start", RID_C23_VA_START,D_C23 }, { "__builtin_va_arg",RID_VA_ARG, 0 }, { "__complex", RID_COMPLEX,0 }, { "__complex__", RID_COMPLEX,0 }, diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h index 7834e0d19590..e2195aa54b8b 100644 --- a/gcc/c-family/c-common.h +++ b/gcc/c-family/c-common.h @@ -105,7 +105,7 @@ enum rid /* C extensions */ RID_ASM, RID_TYP
[gcc r15-5940] doloop: Fix up doloop df use [PR116799]
https://gcc.gnu.org/g:0eed81612ad6eac2bec60286348a103d4dc02a5a commit r15-5940-g0eed81612ad6eac2bec60286348a103d4dc02a5a Author: Jakub Jelinek Date: Thu Dec 5 13:01:21 2024 +0100 doloop: Fix up doloop df use [PR116799] The following testcases are miscompiled on s390x-linux, because the doloop_optimize /* Ensure that the new sequence doesn't clobber a register that is live at the end of the block. */ { bitmap modified = BITMAP_ALLOC (NULL); for (rtx_insn *i = doloop_seq; i != NULL; i = NEXT_INSN (i)) note_stores (i, record_reg_sets, modified); basic_block loop_end = desc->out_edge->src; bool fail = bitmap_intersect_p (df_get_live_out (loop_end), modified); check doesn't work as intended. The problem is that it uses df, but the df analysis was only done using iv_analysis_loop_init (loop); -> df_analyze_loop (loop); which computes df inside on the bbs of the loop. While loop_end bb is inside of the loop, df_get_live_out computed that way includes registers set in the loop and used at the start of the next iteration, but doesn't include registers set in the loop (or before the loop) and used after the loop. The following patch fixes that by doing whole function df_analyze first, changes the loop iteration mode from 0 to LI_ONLY_INNERMOST (on many targets which use can_use_doloop_if_innermost target hook a so are known to only handle innermost loops) or LI_FROM_INNERMOST (I think only bfin actually allows non-innermost loops) and checking not just df_get_live_out (loop_end) (that is needed for something used by the next iteration), but also df_get_live_in (desc->out_edge->dest), i.e. what will be used after the loop. df of such a bb shouldn't be affected by the df_analyze_loop and so should be from df_analyze of the whole function. 2024-12-05 Jakub Jelinek PR rtl-optimization/113994 PR rtl-optimization/116799 * loop-doloop.cc: Include targhooks.h. (doloop_optimize): Also punt on intersection of modified with df_get_live_in (desc->out_edge->dest). 
(doloop_optimize_loops): Call df_analyze. Use LI_ONLY_INNERMOST or LI_FROM_INNERMOST instead of 0 as second loops_list argument. * gcc.c-torture/execute/pr116799.c: New test. * g++.dg/torture/pr113994.C: New test. Diff: --- gcc/loop-doloop.cc | 20 - gcc/testsuite/g++.dg/torture/pr113994.C| 31 +++ gcc/testsuite/gcc.c-torture/execute/pr116799.c | 41 ++ 3 files changed, 91 insertions(+), 1 deletion(-) diff --git a/gcc/loop-doloop.cc b/gcc/loop-doloop.cc index 2f0c56b0efd2..60d5f2c10c66 100644 --- a/gcc/loop-doloop.cc +++ b/gcc/loop-doloop.cc @@ -36,6 +36,7 @@ along with GCC; see the file COPYING3. If not see #include "loop-unroll.h" #include "regs.h" #include "df.h" +#include "targhooks.h" /* This module is used to modify loops with a determinable number of iterations to use special low-overhead looping instructions. @@ -800,6 +801,18 @@ doloop_optimize (class loop *loop) basic_block loop_end = desc->out_edge->src; bool fail = bitmap_intersect_p (df_get_live_out (loop_end), modified); +/* iv_analysis_loop_init calls df_analyze_loop, which computes just + partial df for blocks of the loop only. The above will catch if + any of the modified registers are use inside of the loop body, but + it will most likely not have accurate info on registers used + at the destination of the out_edge. We call df_analyze on the + whole function at the start of the pass though and iterate only + on innermost loops or from innermost loops, so + live in on desc->out_edge->dest should be still unmodified from + the initial df_analyze. */ +if (!fail) + fail = bitmap_intersect_p (df_get_live_in (desc->out_edge->dest), +modified); BITMAP_FREE (modified); if (fail) @@ -825,7 +838,12 @@ doloop_optimize_loops (void) df_live_set_all_dirty (); } - for (auto loop : loops_list (cfun, 0)) + df_analyze (); + + for (auto loop : loops_list (cfun, + targetm.can_use_doloop_p + == can_use_doloop_if_innermost + ? 
LI_ONLY_INNERMOST : LI_FROM_INNERMOST)) doloop_optimize (loop); if (optimize == 1) diff --git a/gcc/testsuite/g++.dg/torture/pr113994.C b/gcc/testsuite/g++.dg/torture/pr113994.C new file mode 100644 index ..c9c186d45ee7 --- /dev/null +++ b/gcc/testsuite/g++.dg/torture/pr113994.C @@ -0,0 +1,31 @@ +// PR rtl-optimization/113994 +// { dg-do run } + +#include + +void +foo (const std::string &x, size_t &y, std::string &z) +{ + size_t w =
[gcc r15-5956] rtl-optimization/117922 - add timevar for fold-mem-offsets
https://gcc.gnu.org/g:8772f37e45e9401c9a361548e00c9691424e75e0 commit r15-5956-g8772f37e45e9401c9a361548e00c9691424e75e0 Author: Richard Biener Date: Fri Dec 6 08:08:55 2024 +0100 rtl-optimization/117922 - add timevar for fold-mem-offsets The new fold-mem-offsets RTL pass takes significant amount of time and memory. Add a timevar for it. PR rtl-optimization/117922 * timevar.def (TV_FOLD_MEM_OFFSETS): New. * fold-mem-offsets.cc (pass_data_fold_mem): Use TV_FOLD_MEM_OFFSETS. Diff: --- gcc/fold-mem-offsets.cc | 2 +- gcc/timevar.def | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/gcc/fold-mem-offsets.cc b/gcc/fold-mem-offsets.cc index 84b9623058bd..284aea9f06fa 100644 --- a/gcc/fold-mem-offsets.cc +++ b/gcc/fold-mem-offsets.cc @@ -100,7 +100,7 @@ const pass_data pass_data_fold_mem = RTL_PASS, /* type */ "fold_mem_offsets", /* name */ OPTGROUP_NONE, /* optinfo_flags */ - TV_NONE, /* tv_id */ + TV_FOLD_MEM_OFFSETS, /* tv_id */ 0, /* properties_required */ 0, /* properties_provided */ 0, /* properties_destroyed */ diff --git a/gcc/timevar.def b/gcc/timevar.def index 574e62584ffc..4bd26e0b6b79 100644 --- a/gcc/timevar.def +++ b/gcc/timevar.def @@ -317,6 +317,7 @@ DEFTIMEVAR (TV_TREE_LOOP_IFCVT , "tree loop if-conversion") DEFTIMEVAR (TV_WARN_ACCESS , "access analysis") DEFTIMEVAR (TV_GIMPLE_CRC_OPTIMIZATION, "crc optimization") DEFTIMEVAR (TV_EXT_DCE , "ext dce") +DEFTIMEVAR (TV_FOLD_MEM_OFFSETS , "fold mem offsets") /* Everything else in rest_of_compilation not included above. */ DEFTIMEVAR (TV_EARLY_LOCAL , "early local passes")
[gcc r15-5957] SVE intrinsics: Fold calls with pfalse predicate.
https://gcc.gnu.org/g:5289540ed58e42ae66255e31f22afe4ca0a6e15e commit r15-5957-g5289540ed58e42ae66255e31f22afe4ca0a6e15e Author: Jennifer Schmitz Date: Fri Nov 15 07:45:59 2024 -0800 SVE intrinsics: Fold calls with pfalse predicate. If an SVE intrinsic has predicate pfalse, we can fold the call to a simplified assignment statement: For _m predication, the LHS can be assigned the operand for inactive values and for _z, we can assign a zero vector. For _x, the returned values can be arbitrary and as suggested by Richard Sandiford, we fold to a zero vector. For example, svint32_t foo (svint32_t op1, svint32_t op2) { return svadd_s32_m (svpfalse_b (), op1, op2); } can be folded to lhs = op1, such that foo is compiled to just a RET. For implicit predication, a case distinction is necessary: Intrinsics that read from memory can be folded to a zero vector. Intrinsics that write to memory or prefetch can be folded to a no-op. Other intrinsics need case-by-case implementation, which we added in the corresponding svxxx_impl::fold. We implemented this optimization during gimple folding by calling a new method gimple_folder::fold_pfalse from gimple_folder::fold, which covers the generic cases described above. We tested the new behavior for each intrinsic with all supported predications and data types and checked the produced assembly. There is a test file for each shape subclass with scan-assembler-times tests that look for the simplified instruction sequences, such as individual RET instructions or zeroing moves. There is an additional directive counting the total number of functions in the test, which must be the sum of counts of all other directives. This is to check that all tested intrinsics were optimized. A few intrinsics were not covered by this patch: - svlasta and svlastb already have an implementation to cover a pfalse predicate. No changes were made to them. 
- svld1/2/3/4 return aggregate types and were excluded from the case that folds calls with implicit predication to lhs = {0, ...}. - svst1/2/3/4 already have an implementation in svstx_impl that precedes our optimization, such that it is not triggered. The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. OK for mainline? Signed-off-by: Jennifer Schmitz gcc/ChangeLog: PR target/106329 * config/aarch64/aarch64-sve-builtins-base.cc (svac_impl::fold): Add folding if pfalse predicate. (svadda_impl::fold): Likewise. (class svaddv_impl): Likewise. (class svandv_impl): Likewise. (svclast_impl::fold): Likewise. (svcmp_impl::fold): Likewise. (svcmp_wide_impl::fold): Likewise. (svcmpuo_impl::fold): Likewise. (svcntp_impl::fold): Likewise. (class svcompact_impl): Likewise. (class svcvtnt_impl): Likewise. (class sveorv_impl): Likewise. (class svminv_impl): Likewise. (class svmaxnmv_impl): Likewise. (class svmaxv_impl): Likewise. (class svminnmv_impl): Likewise. (class svorv_impl): Likewise. (svpfirst_svpnext_impl::fold): Likewise. (svptest_impl::fold): Likewise. (class svsplice_impl): Likewise. * config/aarch64/aarch64-sve-builtins-sve2.cc (class svcvtxnt_impl): Likewise. (svmatch_svnmatch_impl::fold): Likewise. * config/aarch64/aarch64-sve-builtins.cc (is_pfalse): Return true if tree is pfalse. (gimple_folder::fold_pfalse): Fold calls with pfalse predicate. (gimple_folder::fold_call_to): Fold call to lhs = t for given tree t. (gimple_folder::fold_to_stmt_vops): Helper function that folds the call to given stmt and adjusts virtual operands. (gimple_folder::fold): Call fold_pfalse. * config/aarch64/aarch64-sve-builtins.h (is_pfalse): Declare is_pfalse. gcc/testsuite/ChangeLog: PR target/106329 * gcc.target/aarch64/pfalse-binary_0.h: New test. * gcc.target/aarch64/pfalse-unary_0.h: New test. * gcc.target/aarch64/sve/pfalse-binary.c: New test. * gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c: New test. 
* gcc.target/aarch64/sve/pfalse-binary_opt_n.c: New test. * gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c: New test. * gcc.target/aarch64/sve/pfalse-binary_rotate.c: New test. * gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c: New test. * gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c: New test. * gcc.target/aarch64/sve/pfalse-binaryxn.c: New test. * gcc.target/aarch64/sv
[gcc r15-5935] AVR: Rework patterns that add / subtract an (inverted) MSB.
https://gcc.gnu.org/g:9ae9db54631f38d6a2080a2a26c5c5d98fa9 commit r15-5935-g9ae9db54631f38d6a2080a2a26c5c5d98fa9 Author: Georg-Johann Lay Date: Tue Dec 3 21:49:32 2024 +0100 AVR: Rework patterns that add / subtract an (inverted) MSB. gcc/ * config/avr/avr-protos.h (avr_out_add_msb): New proto. * config/avr/avr.cc (avr_out_add_msb): New function. (avr_adjust_insn_length) [ADJUST_LEN_ADD_GE0, ADJUST_LEN_ADD_LT0]: Handle cases. * config/avr/avr.md (adjust_len) : New attr values. (QISI2): New mode iterator. (C_MSB): New mode_attr. (*add3...msb_split, *add3.ge0, *add3.lt0) (*sub3...msb_split, *sub3.ge0, *sub3.lt0): New patterns replacing old ones, but with iterators and using avr_out_add_msb() for asm out. Diff: --- gcc/config/avr/avr-protos.h | 1 + gcc/config/avr/avr.cc | 91 gcc/config/avr/avr.md | 249 3 files changed, 227 insertions(+), 114 deletions(-) diff --git a/gcc/config/avr/avr-protos.h b/gcc/config/avr/avr-protos.h index 4aa8554000b8..5b42f04fb313 100644 --- a/gcc/config/avr/avr-protos.h +++ b/gcc/config/avr/avr-protos.h @@ -109,6 +109,7 @@ extern const char *avr_out_sbxx_branch (rtx_insn *insn, rtx operands[]); extern const char* avr_out_bitop (rtx, rtx*, int*); extern const char* avr_out_plus (rtx, rtx*, int* =NULL, bool =true); extern const char* avr_out_plus_ext (rtx_insn*, rtx*, int*); +extern const char* avr_out_add_msb (rtx_insn*, rtx*, rtx_code, int*); extern const char* avr_out_round (rtx_insn *, rtx*, int* =NULL); extern const char* avr_out_addto_sp (rtx*, int*); extern const char* avr_out_xload (rtx_insn *, rtx*, int*); diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc index 9bebd67cd9c4..3544571d3dfa 100644 --- a/gcc/config/avr/avr.cc +++ b/gcc/config/avr/avr.cc @@ -8274,6 +8274,94 @@ avr_out_plus_ext (rtx_insn *insn, rtx *yop, int *plen) } +/* Output code for addition of a sign-bit + + YOP[0] += YOP[1] 0 + + or such a subtraction: + + YOP[0] -= YOP[2] 0 + + where CMP is in { GE, LT }. + If PLEN == NULL output the instructions. 
+ If PLEN != NULL set *PLEN to the length of the sequence in words. */ + +const char * +avr_out_add_msb (rtx_insn *insn, rtx *yop, rtx_code cmp, int *plen) +{ + const rtx_code add = GET_CODE (SET_SRC (single_set (insn))); + const machine_mode mode = GET_MODE (yop[0]); + const int n_bytes = GET_MODE_SIZE (mode); + rtx sigop = yop[add == PLUS ? 1 : 2]; + rtx msb = avr_byte (sigop, GET_MODE_SIZE (GET_MODE (sigop)) - 1); + rtx op[3] = { yop[0], msb, nullptr }; + + if (plen) +*plen = 0; + + if (n_bytes == 1 + || (n_bytes == 2 && avr_adiw_reg_p (op[0]))) +{ + avr_asm_len (cmp == LT + ? "sbrc %1,7" + : "sbrs %1,7", op, plen, 1); + const char *s_add = add == PLUS + ? n_bytes == 1 ? "inc %0" : "adiw %0,1" + : n_bytes == 1 ? "dec %0" : "sbiw %0,1"; + return avr_asm_len (s_add, op, plen, 1); +} + + bool labl_p = false; + const char *s_code0 = nullptr; + + // Default code provided SREG.C = MSBit. + const char *s_code = add == PLUS +? "adc %2,__zero_reg__" +: "sbc %2,__zero_reg__"; + + if (cmp == LT) +{ + if (reg_unused_after (insn, sigop) + && ! reg_overlap_mentioned_p (msb, op[0])) + avr_asm_len ("lsl %1", op, plen, 1); + else + avr_asm_len ("mov __tmp_reg__,%1" CR_TAB +"lsl __tmp_reg__", op, plen, 2); +} + else if (test_hard_reg_class (LD_REGS, msb)) +{ + avr_asm_len ("cpi %1,0x80", op, plen, 1); +} + else if (test_hard_reg_class (LD_REGS, op[0])) +{ + labl_p = true; + avr_asm_len ("tst %1" CR_TAB + "brmi 0f", op, plen, 2); + s_code0 = add == PLUS ? "subi %2,-1" : "subi %2,1"; + s_code = add == PLUS ? "sbci %2,-1" : "sbci %2,0"; +} + else +{ + labl_p = true; + avr_asm_len ("tst %1" CR_TAB + "brmi 0f" CR_TAB + "sec", op, plen, 3); +} + + for (int i = 0; i < n_bytes; ++i) +{ + op[2] = avr_byte (op[0], i); + avr_asm_len (i == 0 && s_code0 + ? s_code0 + : s_code, op, plen, 1); +} + + return labl_p +? avr_asm_len ("0:", op, plen, 0) +: ""; +} + + /* Output addition of register XOP[0] and compile time constant XOP[2]. INSN is a single_set insn or an insn pattern. 
CODE == PLUS: perform addition by using ADD instructions or @@ -10669,6 +10757,9 @@ avr_adjust_insn_length (rtx_insn *insn, int len) case ADJUST_LEN_ADD_SET_ZN: avr_out_plus_set_ZN (op, &len); break; case ADJUST_LEN_ADD_SET_N: avr_out_plus_set_N (op, &len); break; +case ADJUST_LEN_ADD_GE0: avr_out_add_msb (insn, op, GE, &len); break; +case ADJUST_LEN_ADD_LT0: avr_out_add_msb (insn, op,
[gcc r15-5938] AVR: target/107957 - Propagate zero_reg to store sources.
https://gcc.gnu.org/g:bf6f77edd625cfe2f2f164e90437df318b96527f commit r15-5938-gbf6f77edd625cfe2f2f164e90437df318b96527f Author: Georg-Johann Lay Date: Thu Dec 5 11:24:30 2024 +0100 AVR: target/107957 - Propagate zero_reg to store sources. When -msplit-ldst is on, it may be possible to propagate __zero_reg__ to the sources of the new stores. For example, without this patch, unsigned long lx; void store_lsr17 (void) { lx >>= 17; } compiles to: store_lsr17: lds r26,lx+2 ; movqi_insn lds r27,lx+3 ; movqi_insn movw r24,r26 ; *movhi lsr r25; *lshrhi3_const ror r24 ldi r26,0 ; movqi_insn ldi r27,0 ; movqi_insn sts lx,r24 ; movqi_insn sts lx+1,r25 ; movqi_insn sts lx+2,r26 ; movqi_insn sts lx+3,r27 ; movqi_insn ret but with this patch it becomes: store_lsr17: lds r26,lx+2 ; movqi_insn lds r27,lx+3 ; movqi_insn movw r24,r26 ; *movhi lsr r25; *lshrhi3_const ror r24 sts lx,r24 ; movqi_insn sts lx+1,r25 ; movqi_insn sts lx+2,__zero_reg__ ; movqi_insn sts lx+3,__zero_reg__ ; movqi_insn ret gcc/ PR target/107957 * config/avr/avr-passes-fuse-move.h (bbinfo_t) : Add static property. * config/avr/avr-passes.cc (bbinfo_t::try_mem0_p): Define it. (optimize_data_t::try_mem0): New method. (bbinfo_t::optimize_one_block) [bbinfo_t::try_mem0_p]: Run try_mem0. (bbinfo_t::optimize_one_function): Set bbinfo_t::try_mem0_p. * config/avr/avr.md (pushhi1_insn): Also allow zero as source. (define_split) [avropt_split_ldst]: Only run avr_split_ldst() when avr-fuse-move has been run at least once. * doc/invoke.texi (AVR Options) <-msplit-ldst>: Document it. 
Diff: --- gcc/config/avr/avr-passes-fuse-move.h | 1 + gcc/config/avr/avr-passes.cc | 49 ++- gcc/config/avr/avr.md | 9 +-- gcc/doc/invoke.texi | 9 +-- 4 files changed, 63 insertions(+), 5 deletions(-) diff --git a/gcc/config/avr/avr-passes-fuse-move.h b/gcc/config/avr/avr-passes-fuse-move.h index dbed1a636f3d..432f9ca4670f 100644 --- a/gcc/config/avr/avr-passes-fuse-move.h +++ b/gcc/config/avr/avr-passes-fuse-move.h @@ -1172,6 +1172,7 @@ struct bbinfo_t static find_plies_data_t *fpd; static bool try_fuse_p; + static bool try_mem0_p; static bool try_bin_arg1_p; static bool try_simplify_p; static bool try_split_ldi_p; diff --git a/gcc/config/avr/avr-passes.cc b/gcc/config/avr/avr-passes.cc index de8de1cd2e8a..fad64b1b3454 100644 --- a/gcc/config/avr/avr-passes.cc +++ b/gcc/config/avr/avr-passes.cc @@ -434,6 +434,11 @@ static machine_mode size_to_mode (int size) Split all insns where the operation can be performed on individual bytes, like andsi3. In example (4) the andhi3 can be optimized to an andqi3. + + bbinfo_t::try_mem0_p + Try to fuse a mem = reg insn to mem = __zero_reg__. + This should only occur when -msplit-ldst is on, but may + also occur with pushes since push1 splits them. */ @@ -514,6 +519,7 @@ bool bbinfo_t::try_split_any_p; bool bbinfo_t::try_simplify_p; bool bbinfo_t::use_arith_p; bool bbinfo_t::use_set_some_p; +bool bbinfo_t::try_mem0_p; // Abstract Interpretation of expressions. @@ -1087,6 +1093,7 @@ struct optimize_data_t {} bool try_fuse (bbinfo_t *); + bool try_mem0 (bbinfo_t *); bool try_bin_arg1 (bbinfo_t *); bool try_simplify (bbinfo_t *); bool try_split_ldi (bbinfo_t *); @@ -2509,6 +2516,44 @@ bbinfo_t::run_find_plies (const insninfo_t &ii, const memento_t &memo) const } +// Try to propagate __zero_reg__ to a mem = reg insn's source. +// Returns true on success and sets .n_new_insns. 
+bool +optimize_data_t::try_mem0 (bbinfo_t *) +{ + rtx_insn *insn = curr.ii.m_insn; + rtx set, mem, reg; + machine_mode mode; + + if (insn + && (set = single_set (insn)) + && MEM_P (mem = SET_DEST (set)) + && REG_P (reg = SET_SRC (set)) + && GET_MODE_SIZE (mode = GET_MODE (mem)) <= 4 + && END_REGNO (reg) <= REG_32 + && ! (regmask (reg) & memento_t::fixed_regs_mask) + && curr.regs.have_value (REGNO (reg), GET_MODE_SIZE (mode), 0x0)) +{ + avr_dump (";; Found insn %d: mem:%m = 0 = r%d\n", INSN_UID (insn), + mode, REGNO (reg)); + + // Some insns like PUSHes don't clobber REG_CC. + bool clobbers_cc = GET_CODE (PATTERN (insn)) == PARALLEL; + + if (clobbers_cc) + emit_valid_move_clobbercc (mem, CONST0_RTX (mode)); + else + emit_valid_in
[gcc r13-9231] AVR: target/64242 - Copy FP to a local reg in nonlocal_goto.
https://gcc.gnu.org/g:45bc6c452ef182dd08c0f0836fef88ad5b67b3aa commit r13-9231-g45bc6c452ef182dd08c0f0836fef88ad5b67b3aa Author: Georg-Johann Lay Date: Wed Dec 4 20:56:50 2024 +0100 AVR: target/64242 - Copy FP to a local reg in nonlocal_goto. In nonlocal_goto sets, change hard_frame_pointer_rtx only after emit_stack_restore() restored SP. This is needed because SP may be stored in some frame location. gcc/ PR target/64242 * config/avr/avr.md (nonlocal_goto): Don't restore hard_frame_pointer_rtx directly, but copy it to a local register, and only set hard_frame_pointer_rtx from it after emit_stack_restore(). (cherry picked from commit f7b5527d1b48b33d8ab633c1e9dcb9883667492a) Diff: --- gcc/config/avr/avr.md | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md index 9bd6b9119ec4..5d134afbf2c3 100644 --- a/gcc/config/avr/avr.md +++ b/gcc/config/avr/avr.md @@ -384,9 +384,14 @@ emit_clobber (gen_rtx_MEM (BLKmode, hard_frame_pointer_rtx)); -emit_move_insn (hard_frame_pointer_rtx, r_fp); +// PR64242: When r_sp is located in the frame, we must not +// change FP prior to reading r_sp. Hence copy r_fp to a +// local register (and hope that reload won't spill it). +rtx r_fp_reg = copy_to_reg (r_fp); emit_stack_restore (SAVE_NONLOCAL, r_sp); +emit_move_insn (hard_frame_pointer_rtx, r_fp_reg); + emit_use (hard_frame_pointer_rtx); emit_use (stack_pointer_rtx);
[gcc r12-10848] AVR: target/64242 - Copy FP to a local reg in nonlocal_goto.
https://gcc.gnu.org/g:499d3dc84e40849f607154bd76ed07d37d744cc1 commit r12-10848-g499d3dc84e40849f607154bd76ed07d37d744cc1 Author: Georg-Johann Lay Date: Wed Dec 4 20:56:50 2024 +0100 AVR: target/64242 - Copy FP to a local reg in nonlocal_goto. In nonlocal_goto sets, change hard_frame_pointer_rtx only after emit_stack_restore() restored SP. This is needed because SP may be stored in some frame location. gcc/ PR target/64242 * config/avr/avr.md (nonlocal_goto): Don't restore hard_frame_pointer_rtx directly, but copy it to a local register, and only set hard_frame_pointer_rtx from it after emit_stack_restore(). (cherry picked from commit f7b5527d1b48b33d8ab633c1e9dcb9883667492a) Diff: --- gcc/config/avr/avr.md | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md index f76249340b8f..90ba2d0400e4 100644 --- a/gcc/config/avr/avr.md +++ b/gcc/config/avr/avr.md @@ -381,9 +381,14 @@ emit_clobber (gen_rtx_MEM (BLKmode, hard_frame_pointer_rtx)); -emit_move_insn (hard_frame_pointer_rtx, r_fp); +// PR64242: When r_sp is located in the frame, we must not +// change FP prior to reading r_sp. Hence copy r_fp to a +// local register (and hope that reload won't spill it). +rtx r_fp_reg = copy_to_reg (r_fp); emit_stack_restore (SAVE_NONLOCAL, r_sp); +emit_move_insn (hard_frame_pointer_rtx, r_fp_reg); + emit_use (hard_frame_pointer_rtx); emit_use (stack_pointer_rtx);
[gcc r15-5941] arm: Add CDE options for star-mc1 cpu
https://gcc.gnu.org/g:237fdf51fbfcfa4829471c18fe67535ae9c3efdb commit r15-5941-g237fdf51fbfcfa4829471c18fe67535ae9c3efdb Author: Arvin Zhong Date: Thu Dec 5 13:43:14 2024 + arm: Add CDE options for star-mc1 cpu This patch adds the CDE options support for the -mcpu=star-mc1. The star-mc1 is an Armv8-m Mainline CPU supporting CDE feature. gcc/ChangeLog: * config/arm/arm-cpus.in (star-mc1): Add CDE options. * doc/invoke.texi (cdecp options): Document for star-mc1. Signed-off-by: Qingxin Zhong Diff: --- gcc/config/arm/arm-cpus.in | 8 gcc/doc/invoke.texi| 6 -- 2 files changed, 12 insertions(+), 2 deletions(-) diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in index 451b15fe9f93..5c12ffb807ba 100644 --- a/gcc/config/arm/arm-cpus.in +++ b/gcc/config/arm/arm-cpus.in @@ -1689,6 +1689,14 @@ begin cpu star-mc1 architecture armv8-m.main+dsp+fp option nofp remove ALL_FP option nodsp remove armv7em + option cdecp0 add cdecp0 + option cdecp1 add cdecp1 + option cdecp2 add cdecp2 + option cdecp3 add cdecp3 + option cdecp4 add cdecp4 + option cdecp5 add cdecp5 + option cdecp6 add cdecp6 + option cdecp7 add cdecp7 isa quirk_no_asmcpu quirk_vlldm costs v7m end cpu star-mc1 diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 78ead0e494e1..e85a1495b70f 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -23760,7 +23760,8 @@ on @samp{cortex-m52} and @samp{cortex-m85}. @item +nomve Disable the M-Profile Vector Extension (MVE) integer and single precision -floating-point instructions on @samp{cortex-m52}, @samp{cortex-m55} and @samp{cortex-m85}. +floating-point instructions on @samp{cortex-m52}, @samp{cortex-m55} and +@samp{cortex-m85}. @item +nomve.fp Disable the M-Profile Vector Extension (MVE) single precision floating-point @@ -23768,7 +23769,8 @@ instructions on @samp{cortex-m52}, @samp{cortex-m55} and @samp{cortex-m85}. @item +cdecp0, +cdecp1, ... 
, +cdecp7 Enable the Custom Datapath Extension (CDE) on selected coprocessors according -to the numbers given in the options in the range 0 to 7 on @samp{cortex-m52} and @samp{cortex-m55}. +to the numbers given in the options in the range 0 to 7 on @samp{cortex-m52}, +@samp{cortex-m55} and @samp{star-mc1}. @item +nofp Disables the floating-point instructions on @samp{arm9e},
[gcc r15-5955] c++: ICE with pack indexing empty pack [PR117898]
https://gcc.gnu.org/g:afeef7f0d3537cd978931a5afcbd3d91c144bfeb commit r15-5955-gafeef7f0d3537cd978931a5afcbd3d91c144bfeb Author: Marek Polacek Date: Wed Dec 4 16:58:59 2024 -0500 c++: ICE with pack indexing empty pack [PR117898] Here we ICE with a partially-substituted pack indexing. The pack expanded to an empty pack, which we can't index. It seems reasonable to detect this case in tsubst_pack_index, even before we substitute the index. Other erroneous cases can wait until pack_index_element where we have the index. PR c++/117898 gcc/cp/ChangeLog: * pt.cc (tsubst_pack_index): Detect indexing an empty pack. gcc/testsuite/ChangeLog: * g++.dg/cpp26/pack-indexing2.C: Adjust. * g++.dg/cpp26/pack-indexing12.C: New test. Diff: --- gcc/cp/pt.cc | 6 ++ gcc/testsuite/g++.dg/cpp26/pack-indexing12.C | 16 gcc/testsuite/g++.dg/cpp26/pack-indexing2.C | 26 -- 3 files changed, 42 insertions(+), 6 deletions(-) diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc index 1f0f02603288..b094d141f3b0 100644 --- a/gcc/cp/pt.cc +++ b/gcc/cp/pt.cc @@ -13984,6 +13984,12 @@ tsubst_pack_index (tree t, tree args, tsubst_flags_t complain, tree in_decl) tree pack = PACK_INDEX_PACK (t); if (PACK_EXPANSION_P (pack)) pack = tsubst_pack_expansion (pack, args, complain, in_decl); + if (TREE_CODE (pack) == TREE_VEC && TREE_VEC_LENGTH (pack) == 0) +{ + if (complain & tf_error) + error ("cannot index an empty pack"); + return error_mark_node; +} tree index = tsubst_expr (PACK_INDEX_INDEX (t), args, complain, in_decl); const bool parenthesized_p = (TREE_CODE (t) == PACK_INDEX_EXPR && PACK_INDEX_PARENTHESIZED_P (t)); diff --git a/gcc/testsuite/g++.dg/cpp26/pack-indexing12.C b/gcc/testsuite/g++.dg/cpp26/pack-indexing12.C new file mode 100644 index ..d958af3620d0 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp26/pack-indexing12.C @@ -0,0 +1,16 @@ +// PR c++/117898 +// { dg-do compile { target c++26 } } + +void +ICE (auto... 
args) +{ + [&]() { +using R = decltype(args...[idx]); // { dg-error "cannot index an empty pack" } + }.template operator()<0>(); +} + +void +g () +{ + ICE(); // empty pack +} diff --git a/gcc/testsuite/g++.dg/cpp26/pack-indexing2.C b/gcc/testsuite/g++.dg/cpp26/pack-indexing2.C index ec32527ed80f..fdc8320e2555 100644 --- a/gcc/testsuite/g++.dg/cpp26/pack-indexing2.C +++ b/gcc/testsuite/g++.dg/cpp26/pack-indexing2.C @@ -42,7 +42,7 @@ template int getT (auto... Ts) { - return Ts...[N]; // { dg-error "pack index is out of range" } + return Ts...[N]; // { dg-error "cannot index an empty pack" } } template @@ -56,12 +56,26 @@ template void badtype () { - Ts...[N] t; // { dg-error "pack index is out of range" } + Ts...[N] t; // { dg-error "cannot index an empty pack" } } template void badtype2 () +{ + Ts...[N] t; // { dg-error "pack index is out of range" } +} + +template +void +badtype3 () +{ + Ts...[N] t; // { dg-error "cannot index an empty pack" } +} + +template +void +badtype4 () { Ts...[N] t; // { dg-error "pack index is negative" } } @@ -97,12 +111,12 @@ int main() getT<0>(); // { dg-message "required from here" } getT<1>(); // { dg-message "required from here" } - getT2<-1>(); // { dg-message "required from here" } + getT2<-1>(1); // { dg-message "required from here" } badtype<0>(); // { dg-message "required from here" } - badtype<1, int>(); // { dg-message "required from here" } - badtype2<-1>(); // { dg-message "required from here" } - badtype2<-1, int>(); // { dg-message "required from here" } + badtype2<1, int>(); // { dg-message "required from here" } + badtype3<-1>(); // { dg-message "required from here" } + badtype4<-1, int>(); // { dg-message "required from here" } badindex();
[gcc r15-5953] RISC-V: Fix incorrect optimization options passing to convert and unop
https://gcc.gnu.org/g:b7baa22e47421d0a81202a333f43d88b5bbb39f5 commit r15-5953-gb7baa22e47421d0a81202a333f43d88b5bbb39f5 Author: Pan Li Date: Wed Dec 4 10:08:11 2024 +0800 RISC-V: Fix incorrect optimization options passing to convert and unop Like the strided load/store, the testcases of vector convert and unop are designed to pick up different sorts of optimization options, but these options are actually ignored according to the execution log in gcc.log. This patch corrects that in almost the same way as the fix for strided load/store. The below test suites are passed for this patch. * The rv64gcv full regression test. This is a test-only patch and obvious up to a point; it will be committed directly if there are no comments in the next 48H. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/rvv.exp: Fix the incorrect optimization options passing to testcases. Signed-off-by: Pan Li Diff: --- gcc/testsuite/gcc.target/riscv/rvv/rvv.exp | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp index 65a57aa79138..aee297752f67 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp +++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp @@ -67,9 +67,9 @@ foreach op $AUTOVEC_TEST_OPTS { dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/cmp/*.\[cS\]]] \ "" "$op" dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/conversions/*.\[cS\]]] \ -"" "$op" +"$op" "" dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/unop/*.\[cS\]]] \ -"" "$op" +"$op" "" dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/ternop/*.\[cS\]]] \ "$op" "" dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/reduc/*.\[cS\]]] \
[gcc r15-5954] RISC-V: Refactor the testcases for bswap16-0
https://gcc.gnu.org/g:3ac3093756cd00f50e63e8dcde4d278606722105 commit r15-5954-g3ac3093756cd00f50e63e8dcde4d278606722105 Author: Pan Li Date: Wed Dec 4 10:08:12 2024 +0800 RISC-V: Refactor the testcases for bswap16-0 This patch refactors the bswap16-0 testcase after the different sorts of optimization options are passed to the testcases, so that the asm dump check also fits big LMULs like m8. The below test suites are passed for this patch. * The rv64gcv full regression test. This is a test-only patch and obvious up to a point; it will be committed directly if there are no comments in the next 48H. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/bswap16-0.c: Update the vector register RE to cover v10 - v31. Signed-off-by: Pan Li Diff: --- gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-0.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-0.c index 605b3565b6bd..4b55c001a31d 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-0.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-0.c @@ -10,7 +10,7 @@ ** ... ** vsrl\.vi\s+v[0-9]+,\s*v[0-9],\s*8+ ** vsll\.vi\s+v[0-9]+,\s*v[0-9],\s*8+ -** vor\.vv\s+v[0-9]+,\s*v[0-9],\s*v[0-9]+ +** vor\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+ ** ... */ TEST_UNARY_CALL (uint16_t, __builtin_bswap16)
[gcc r15-5951] PR modula2/117904: cc1gm2 ICE when compiling a const built from VAL and SIZE
https://gcc.gnu.org/g:363382ac7c2b8f6a09415e905b349bb7eaeca38a commit r15-5951-g363382ac7c2b8f6a09415e905b349bb7eaeca38a Author: Gaius Mulley Date: Thu Dec 5 20:31:34 2024 + PR modula2/117904: cc1gm2 ICE when compiling a const built from VAL and SIZE This patch fixes an ICE which occurs when a positive ZType constant increment is used during a FOR loop. gcc/m2/ChangeLog: PR modula2/117904 * gm2-compiler/M2GenGCC.mod (PerformLastForIterator): Add call to BuildConvert when increment is > 0. gcc/testsuite/ChangeLog: PR modula2/117904 * gm2/iso/pass/forloopbyconst.mod: New test. Signed-off-by: Gaius Mulley Diff: --- gcc/m2/gm2-compiler/M2GenGCC.mod | 16 +--- gcc/testsuite/gm2/iso/pass/forloopbyconst.mod | 25 + 2 files changed, 38 insertions(+), 3 deletions(-) diff --git a/gcc/m2/gm2-compiler/M2GenGCC.mod b/gcc/m2/gm2-compiler/M2GenGCC.mod index b6e34e019b04..c5f5a7825956 100644 --- a/gcc/m2/gm2-compiler/M2GenGCC.mod +++ b/gcc/m2/gm2-compiler/M2GenGCC.mod @@ -541,9 +541,19 @@ BEGIN THEN (* If incr > 0 then LastIterator := ((e2-e1) DIV incr) * incr + e1. *) expr := BuildSub (location, e2tree, e1tree, FALSE) ; - expr := BuildDivFloor (location, expr, incrtree, FALSE) ; - expr := BuildMult (location, expr, incrtree, FALSE) ; - expr := BuildAdd (location, expr, e1tree, FALSE) + incrtree := BuildConvert (location, GetTreeType (expr), incrtree, FALSE) ; + IF TreeOverflow (incrtree) + THEN +MetaErrorT0 (lastpos, + 'the intemediate calculation for the last iterator value in the {%kFOR} loop has caused an overflow') ; +NoChange := FALSE ; +SubQuad (quad) ; +success := FALSE + ELSE +expr := BuildDivFloor (location, expr, incrtree, FALSE) ; +expr := BuildMult (location, expr, incrtree, FALSE) ; +expr := BuildAdd (location, expr, e1tree, FALSE) + END ELSE (* Else use LastIterator := e1 - ((e1-e2) DIV PositiveBy) * PositiveBy to avoid unsigned div signed arithmetic. 
*) diff --git a/gcc/testsuite/gm2/iso/pass/forloopbyconst.mod b/gcc/testsuite/gm2/iso/pass/forloopbyconst.mod new file mode 100644 index ..c0a1a06e0191 --- /dev/null +++ b/gcc/testsuite/gm2/iso/pass/forloopbyconst.mod @@ -0,0 +1,25 @@ +MODULE forloopbyconst ; + + +CONST + block = 4 ; + + +(* + init - +*) + +PROCEDURE init ; +VAR + i, n: CARDINAL ; +BEGIN + n := 10 ; + FOR i := 1 TO n BY block DO + + END +END init ; + + +BEGIN + init +END forloopbyconst.