Re: [RFC PATCH] c++: Implement C++26 P3533R2 - constexpr virtual inheritance [PR120777]
On Tue, Jun 24, 2025 at 05:19:59PM -0400, Jason Merrill wrote: > I think we could move the initialization of the fixed_type_p and > virtual_access variables up, they don't need to be after cp_build_addr_expr. I don't understand why it doesn't depend on cp_build_addr_expr. I've tried the following patch and while it didn't regress anything on make GXX_TESTSUITE_STDS=98,11,14,17,^C,23,26 check-g++ it regressed FAIL: 23_containers/vector/bool/cmp_c++20.cc -std=gnu++20 (test for excess errors) FAIL: 23_containers/vector/bool/cmp_c++20.cc -std=gnu++26 (test for excess errors) In there code is PLUS_EXPR, !want_pointer, !has_empty, but uneval is true and expr is std::vector::begin (&c) before cp_build_addr_expr and &TARGET_EXPR ::begin (&c)> after it. resolves_to_fixed_type_p (expr) is 0 before cp_build_addr_expr and 1 after it. v_binfo is false though, so in that particular case I think we don't actually care about fixed_type_p value, but it doesn't raise confidence that testing resolves_to_fixed_type_p early is ok. --- gcc/cp/class.cc.jj 2025-06-18 17:24:03.973867379 +0200 +++ gcc/cp/class.cc 2025-06-25 08:01:06.824278658 +0200 @@ -347,9 +347,19 @@ build_base_path (enum tree_code code, || processing_template_decl || in_template_context); + int nonnull_copy = nonnull; + fixed_type_p = resolves_to_fixed_type_p (expr, &nonnull); + + /* Do we need to look in the vtable for the real offset? */ + virtual_access = (v_binfo && fixed_type_p <= 0); + /* For a non-pointer simple base reference, express it as a COMPONENT_REF without taking its address (and so causing lambda capture, 91933). */ - if (code == PLUS_EXPR && !v_binfo && !want_pointer && !has_empty && !uneval) + if (code == PLUS_EXPR + && !want_pointer + && !has_empty + && !uneval + && (!v_binfo || !virtual_access)) return build_simple_base_path (expr, binfo); if (!want_pointer) @@ -361,8 +371,10 @@ build_base_path (enum tree_code code, else expr = mark_rvalue_use (expr); + gcc_assert (resolves_to_fixed_type_p (expr, &nonnull_copy) + == fixed_type_p && nonnull_copy == nonnull); + offset = BINFO_OFFSET (binfo); - fixed_type_p = resolves_to_fixed_type_p (expr, &nonnull); target_type = code == PLUS_EXPR ? BINFO_TYPE (binfo) : BINFO_TYPE (d_binfo); /* TARGET_TYPE has been extracted from BINFO, and, is therefore always cv-unqualified. Extract the cv-qualifiers from EXPR so that the @@ -371,9 +383,6 @@ build_base_path (enum tree_code code, (target_type, cp_type_quals (TREE_TYPE (TREE_TYPE (expr; ptr_target_type = build_pointer_type (target_type); - /* Do we need to look in the vtable for the real offset? */ - virtual_access = (v_binfo && fixed_type_p <= 0); - /* Don't bother with the calculations inside sizeof; they'll ICE if the source type is incomplete and the pointer value doesn't matter. In a template (even in instantiate_non_dependent_expr), we don't have vtables > I think -1 doesn't distinguish between single or multiple virtual > derivation, so handling -1 in that block might mean succeeding for a > multiple derivation case where it ought to fail. Ok, will keep it as is then. > > So, shall I e.g. for the if (TREE_PRIVATE case if the outer type has > > CLASSTYPE_VBASECLASSES walk the > > for (vbase = TYPE_BINFO (t); vbase; vbase = TREE_CHAIN (vbase)) > > if (BINFO_VIRTUAL_P (vbase) && !BINFO_PRIMARY_P (vbase)) > > and in that case try to compare byte_position (TREE_OPERAND (path, 1)) > > against BINFO_OFFSET (vbase) and if it matches (plus perhaps some type > > check?) 
then decide based on BINFO_BASE_ACCESS or something like that > > whether it was a private/protected vs. public virtual base? > > It seems simpler to pass an accurate access to the build_base_field above. > At least whether the whole BINFO_INHERITANCE_CHAIN is public or not, I > suppose the distinction between private and protected doesn't matter. I'm afraid I'm quite lost on what actually is public base class that [expr.dynamic.cast] talks about in the case of virtual bases because a virtual base can appear many times among the bases and if it is virtual in all cases, there is just one copy of it and it can be public in some paths and private/protected in others. And where to find that information. I've tried the following testcase and it seems that it succeeds unless -DP1 -DP2 -DP1 -DP3 -DP1 -DP6 -DP2 -DP3 -DP6 -DP4 -DP5 -DP6 -DP2 -DP3 -DP4 -DP5 is a subset of the -DPN options or in case of clang++ also -DP2 -DP4 -DP5 (for that g++ passes, clang++ fails). E.g. what is the difference between -DP1 which works and S is private in one case and public in 2 others, while -DP1 -DP2 doesn't work and is private in two cases and public in one. #ifdef P1 #undef P1 #define P1 private #else #define P1 #endif #ifdef P2 #undef P2 #define P2 private #else #define P2 #endif #ifdef P3 #undef P3 #define P3 private #else #define P3 #endif
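For readers following the accessibility question, here is a minimal standalone C++ sketch (hypothetical; it is not the partially quoted P1/P2/P3 testcase above) of the situation under discussion: a single virtual base S that is private along one inheritance path and public along another, where which accessibility [expr.dynamic.cast] consults is exactly the open question:

#include <cstdio>

struct S { virtual ~S () {} };
struct A : private virtual S {};   // S is a private base along this path
struct B : public virtual S {};    // ... and a public base along this one
struct D : A, B {};                // only one S subobject exists in D

int main ()
{
  D d;
  // Reach the shared S subobject through the public path via B.
  S *s = static_cast<S *> (static_cast<B *> (&d));
  // Whether the cast back to the most derived type succeeds is the
  // question raised above; the result is printed rather than asserted.
  std::printf ("dynamic_cast<D *>: %s\n",
               dynamic_cast<D *> (s) ? "non-null" : "null");
}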
Re: [PATCH v2] x86: Add preserve_none and update no_caller_saved_registers attributes
On Wed, Jun 25, 2025 at 2:14 PM Hongtao Liu wrote: > > On Fri, May 23, 2025 at 1:56 PM H.J. Lu wrote: > > > > Add preserve_none attribute which is similar to no_callee_saved_registers > > attribute, except on x86-64, r12, r13, r14, r15, rdi and rsi registers are > > used for integer parameter passing. This can be used in an interpreter > > to avoid saving/restoring the registers in functions which processing > > byte codes. It improved the pystones benchmark by 6-7%: > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628#c15 > > > > Remove -mgeneral-regs-only restriction on no_caller_saved_registers > > attribute. Only SSE is allowed since SSE XMM register load preserves > > the upper bits in YMM/ZMM register while YMM register load zeros the > > upper 256 bits of ZMM register, and preserving 32 ZMM registers can > > be quite expensive. > > > > gcc/ > > > > PR target/119628 > > * config/i386/i386-expand.cc (ix86_expand_call): Call > > ix86_type_no_callee_saved_registers_p instead of looking up > > no_callee_saved_registers attribute. > > * config/i386/i386-options.cc (ix86_set_func_type): Look up > > preserve_none attribute. Check preserve_none attribute for > > interrupt attribute. Don't check no_caller_saved_registers nor > > no_callee_saved_registers conflicts here. > > (ix86_set_func_type): Check no_callee_saved_registers before > > checking no_caller_saved_registers attribute. > > (ix86_set_current_function): Allow SSE with > > no_caller_saved_registers attribute. > > (ix86_handle_call_saved_registers_attribute): Check preserve_none, > > no_callee_saved_registers and no_caller_saved_registers conflicts. > > (ix86_gnu_attributes): Add preserve_none attribute. > > * config/i386/i386-protos.h (ix86_type_no_callee_saved_registers_p): > > New. > > * config/i386/i386.cc > > (x86_64_preserve_none_int_parameter_registers): New. > > (ix86_using_red_zone): Don't use red-zone when there are no > > caller-saved registers with SSE. > > (ix86_type_no_callee_saved_registers_p): New. > > (ix86_function_ok_for_sibcall): Also check TYPE_PRESERVE_NONE > > and call ix86_type_no_callee_saved_registers_p instead of looking > > up no_callee_saved_registers attribute. > > (ix86_comp_type_attributes): Call > > ix86_type_no_callee_saved_registers_p instead of looking up > > no_callee_saved_registers attribute. Return 0 if preserve_none > > attribute doesn't match in 64-bit mode. > > (ix86_function_arg_regno_p): For cfun with TYPE_PRESERVE_NONE, > > use x86_64_preserve_none_int_parameter_registers. > > (init_cumulative_args): Set preserve_none_abi. > > (function_arg_64): Use x86_64_preserve_none_int_parameter_registers > > with preserve_none attribute. > > (setup_incoming_varargs_64): Use > > x86_64_preserve_none_int_parameter_registers with preserve_none > > attribute. > > (ix86_save_reg): Treat TYPE_PRESERVE_NONE like > > TYPE_NO_CALLEE_SAVED_REGISTERS. > > (ix86_nsaved_sseregs): Allow saving XMM registers for > > no_caller_saved_registers attribute. > > (ix86_compute_frame_layout): Likewise. > > (x86_this_parameter): Use > > x86_64_preserve_none_int_parameter_registers with preserve_none > > attribute. > > * config/i386/i386.h (ix86_args): Add preserve_none_abi. > > (call_saved_registers_type): Add TYPE_PRESERVE_NONE. > > (machine_function): Change call_saved_registers to 3 bits. > > * doc/extend.texi: Add preserve_none attribute. Update > > no_caller_saved_registers attribute to remove -mgeneral-regs-only > > restriction. 
> > > > gcc/testsuite/ > > > > PR target/119628 > > * gcc.target/i386/no-callee-saved-3.c: Adjust error location. > > * gcc.target/i386/no-callee-saved-19a.c: New test. > > * gcc.target/i386/no-callee-saved-19b.c: Likewise. > > * gcc.target/i386/no-callee-saved-19c.c: Likewise. > > * gcc.target/i386/no-callee-saved-19d.c: Likewise. > > * gcc.target/i386/no-callee-saved-19e.c: Likewise. > > * gcc.target/i386/preserve-none-1.c: Likewise. > > * gcc.target/i386/preserve-none-2.c: Likewise. > > * gcc.target/i386/preserve-none-3.c: Likewise. > > * gcc.target/i386/preserve-none-4.c: Likewise. > > * gcc.target/i386/preserve-none-5.c: Likewise. > > * gcc.target/i386/preserve-none-6.c: Likewise. > > * gcc.target/i386/preserve-none-7.c: Likewise. > > * gcc.target/i386/preserve-none-8.c: Likewise. > > * gcc.target/i386/preserve-none-9.c: Likewise. > > * gcc.target/i386/preserve-none-10.c: Likewise. > > * gcc.target/i386/preserve-none-11.c: Likewise. > > *
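To make the intended use of the new attribute concrete, here is a hypothetical sketch (not one of the new tests; only the attribute name is taken from the ChangeLog above, the handler shape is invented) of how a byte-code interpreter might annotate a handler on x86-64:

/* Hypothetical byte-code handler: with preserve_none it keeps no
   callee-saved registers and receives its integer arguments in the
   alternative register set described above.  A real interpreter would
   tail-call the handler for the next opcode at the end.  */
__attribute__ ((preserve_none))
void op_add (long *stack, const unsigned char *pc)
{
  stack[-2] += stack[-1];
}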
Re: [Patch, Fortran, Coarray, PR88076, v1] 6/6 Add a shared memory multi process coarray library.
Giving something new time to mature before making it the default is always a great policy. My suggestion is aspirational. I’m describing a dream that I hope can be the ultimate goal. There’s no need to rush into implementing my proposed vision.

D

On Tue, Jun 24, 2025 at 23:25 Andre Vehreschild wrote:
> Hi Damian, hi Steve,
>
> enabling coarray support by default has implications we need to consider. The
> memory footprint of a coarray-enabled program is larger than that of a
> non-coarray one. This is simply because the coarray token needs to be stored
> somewhere.
>
> Furthermore, I just figured out yesterday that with -fcoarray=single the
> space for the token was allocated. I.e. every data structure that could
> possibly be stored in a coarray and had allocatable components in it wasted
> another 8 bytes for an unused pointer.
>
> So when we default to having coarray support enabled, some work needs to be
> done to remove such inefficiencies. Given there are only a few developers
> that work on coarrays, this may take some time.
>
> What we can of course do is to switch on the coarray mode when we detect the
> first coarray construct and no longer need the user to do it. I hope this
> does not have too many implications and causes only a handful of bugs.
>
> For the time being, I propose to first give the new coarray implementation
> some time to mature and test. There will be bugs, because nobody is perfect.
>
> @Steve: caf_shmem does not use MPI. It is a shared memory, single node,
> multi process approach. Just to prevent any misunderstanding.
>
> Thanks for all the testing.
>
> Regards,
> Andre
>
> On Tue, 24 Jun 2025 11:13:52 -0700
> Steve Kargl wrote:
>
> > Damian,
> >
> > I submitted a patch a long time ago to make -fcoarray=single the
> > default behavior. The patch made -fcoarray=none a NOP. With
> > inclusion of a shmem implementation of the runtime parts, this
> > might be the way to go. I'll leave that decision to Andre, Thomas,
> > and Nicolas.
> >
> > I believe that the gfortran contributors have not considered
> > coarray as an optional add-on. The problem for gfortran is
> > that it runs on dozens of CPUs and dozens upon dozens of
> > operating systems. The few gfortran contributors simply cannot
> > ensure that opencoarray+mpich or opencoarray+openmpi runs on
> > all of the possible combinations of hardware and OS's. Andre
> > has hinted that he expects some rough edges on non-Linux systems.
> > I'll find out this weekend when I give his patch a spin on
> > FreeBSD. Hopefully, a windows10/11 user can test the patch.
>
> --
> Andre Vehreschild * Email: vehre ad gmx dot de

Re: [PATCH] s390: Add some missing vector patterns.
> On Tue, Jun 24, 2025 at 09:49:01AM +0200, Juergen Christ wrote: > > Some patterns that are detected by the autovectorizer can be supported by > > s390. Add expanders such that autovectorization of these patterns works. > > > > Bootstrapped and regtested on s390. Ok for trunk? > > > > gcc/ChangeLog: > > > > * config/s390/vector.md (avg3_ceil): New pattern. > > (uavg3_ceil): New pattern. > > (smul3_highpart): New pattern. > > (umul3_highpart): New pattern. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/s390/vector/pattern-avg-1.c: New test. > > * gcc.target/s390/vector/pattern-mulh-1.c: New test. > > > > Signed-off-by: Juergen Christ > > --- > > gcc/config/s390/vector.md | 28 ++ > > .../gcc.target/s390/vector/pattern-avg-1.c| 26 + > > .../gcc.target/s390/vector/pattern-mulh-1.c | 29 +++ > > 3 files changed, 83 insertions(+) > > create mode 100644 gcc/testsuite/gcc.target/s390/vector/pattern-avg-1.c > > create mode 100644 gcc/testsuite/gcc.target/s390/vector/pattern-mulh-1.c > > > > diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md > > index 6f4e1929eb80..16f4b8116432 100644 > > --- a/gcc/config/s390/vector.md > > +++ b/gcc/config/s390/vector.md > > @@ -3576,3 +3576,31 @@ > > ; vec_unpacks_float_lo > > ; vec_unpacku_float_hi > > ; vec_unpacku_float_lo > > + > > +(define_expand "avg3_ceil" > > + [(set (match_operand:VIT_HW_VXE3_T0 > > "register_operand" "=v") > > + (unspec:VIT_HW_VXE3_T [(match_operand:VIT_HW_VXE3_T 1 > > "register_operand" "v") > > + (match_operand:VIT_HW_VXE3_T 2 > > "register_operand" "v")] > > + UNSPEC_VEC_AVG))] > > + "TARGET_VX") > > + > > +(define_expand "uavg3_ceil" > > + [(set (match_operand:VIT_HW_VXE3_T0 > > "register_operand" "=v") > > + (unspec:VIT_HW_VXE3_T [(match_operand:VIT_HW_VXE3_T 1 > > "register_operand" "v") > > + (match_operand:VIT_HW_VXE3_T 2 > > "register_operand" "v")] > > + UNSPEC_VEC_AVGU))] > > + "TARGET_VX") > > + > > +(define_expand "smul3_highpart" > > + [(set (match_operand:VIT_HW_VXE3_DT 0 "register_operand" > > "=v") > > + (unspec:VIT_HW_VXE3_DT [(match_operand:VIT_HW_VXE3_DT 1 > > "register_operand" "v") > > + (match_operand:VIT_HW_VXE3_DT 2 > > "register_operand" "v")] > > + UNSPEC_VEC_SMULT_HI))] > > + "TARGET_VX") > > + > > +(define_expand "umul3_highpart" > > + [(set (match_operand:VIT_HW_VXE3_DT 0 "register_operand" > > "=v") > > + (unspec:VIT_HW_VXE3_DT [(match_operand:VIT_HW_VXE3_DT 1 > > "register_operand" "v") > > + (match_operand:VIT_HW_VXE3_DT 2 > > "register_operand" "v")] > > + UNSPEC_VEC_UMULT_HI))] > > + "TARGET_VX") > > In commit r12-4231-g555fa3545efe23 RTX smul_highpart and umul_highpart > were introduced which we could use instead of the unspec, now. So one > solution would be to move vec_smulh/vec_umulh from > vx-builtins.md to vector.md and rename those to > smul3_highpart/umul3_highpart and then making sure that > those are used in s390-builtins.def. Of course, replacing the unspec by > the corresponding RTXs', too. > > Sorry for bothering with this. But I think it is worthwhile to replace > those unspecs. > > Thanks, > Stefan Will send v2 with these fixes.
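For reference, the source-level idiom behind the highpart expanders being discussed looks roughly like this (my sketch, not one of the new tests); whether the vectorizer picks the new pattern for exactly this shape of course depends on the mode iterators in the patch:

#include <cstddef>
#include <cstdint>

// High 32 bits of an unsigned 32x32->64-bit product, element-wise.
void umulh32 (uint32_t *r, const uint32_t *a, const uint32_t *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    r[i] = (uint32_t) (((uint64_t) a[i] * b[i]) >> 32);
}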
Re: [PATCH] s390: Optimize fmin/fmax.
> On Mon, Jun 23, 2025 at 09:51:13AM +0200, Juergen Christ wrote:
> > On VXE targets, we can directly use the fp min/max instruction instead of
> > calling into libm for fmin/fmax etc.
> >
> > Provide fmin/fmax versions also for vectors even though they cannot be
> > called directly. This will be exploited with a follow-up patch when
> > reductions are introduced.
>
> This looks very similar to vfmin / vfmax. Couldn't we merge
> those by using appropriate mode iterators? The expander for fmin
> / fmax could set the mask operand.

Will send v2.
[PATCH v2] s390: Optimize fmin/fmax.
On VXE targets, we can directly use the fp min/max instruction instead of calling into libm for fmin/fmax etc. Provide fmin/fmax versions also for vectors even though it cannot be called directly. This will be exploited with a follow-up patch when reductions are introduced. Bootstrapped and regtested on s390. Ok for trunk? gcc/ChangeLog: * config/s390/s390.md: Update UNSPECs * config/s390/vector.md (fmax3): New expander. (fmin3): New expander. * config/s390/vx-builtins.md (*fmin): New insn. (vfmin): Redefined to use new insn. (*fmax): New insn. (vfmax): Redefined to use new insn. gcc/testsuite/ChangeLog: * gcc.target/s390/fminmax-1.c: New test. * gcc.target/s390/fminmax-2.c: New test. Signed-off-by: Juergen Christ --- gcc/config/s390/s390.md | 6 +- gcc/config/s390/vector.md | 25 gcc/config/s390/vx-builtins.md| 29 ++--- gcc/testsuite/gcc.target/s390/fminmax-1.c | 77 +++ gcc/testsuite/gcc.target/s390/fminmax-2.c | 29 + 5 files changed, 156 insertions(+), 10 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/fminmax-1.c create mode 100644 gcc/testsuite/gcc.target/s390/fminmax-2.c diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index 97a4bdf96b2d..1c88c9624b60 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -241,9 +241,6 @@ UNSPEC_VEC_MSUM - UNSPEC_VEC_VFMIN - UNSPEC_VEC_VFMAX - UNSPEC_VEC_VBLEND UNSPEC_VEC_VEVAL UNSPEC_VEC_VGEM @@ -256,6 +253,9 @@ UNSPEC_NNPA_VCFN_V8HI UNSPEC_NNPA_VCNF_V8HI + + UNSPEC_FMAX + UNSPEC_FMIN ]) ;; diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md index 6f4e1929eb80..8bda30624c22 100644 --- a/gcc/config/s390/vector.md +++ b/gcc/config/s390/vector.md @@ -89,6 +89,13 @@ (define_mode_iterator VF_HW [(V4SF "TARGET_VXE") V2DF (V1TF "TARGET_VXE") (TF "TARGET_VXE")]) +; FP scalar and vector modes +(define_mode_iterator VFT_BFP [SF DF + (V1SF "TARGET_VXE") (V2SF "TARGET_VXE") (V4SF "TARGET_VXE") + V1DF V2DF + (V1TF "TARGET_VXE") (TF "TARGET_VXE")]) + + (define_mode_iterator V_8 [V1QI]) (define_mode_iterator V_16 [V2QI V1HI]) (define_mode_iterator V_32 [V4QI V2HI V1SI V1SF]) @@ -3576,3 +3583,21 @@ ; vec_unpacks_float_lo ; vec_unpacku_float_hi ; vec_unpacku_float_lo + +; fmax +(define_expand "fmax3" + [(set (match_operand:VFT_BFP 0 "register_operand" "=v") + (unspec:VFT_BFP [(match_operand:VFT_BFP 1 "register_operand" "v") + (match_operand:VFT_BFP 2 "register_operand" "v") + (const_int 4)] + UNSPEC_FMAX))] + "TARGET_VXE") + +; fmin +(define_expand "fmin3" + [(set (match_operand:VFT_BFP 0 "register_operand" "=v") + (unspec:VFT_BFP [(match_operand:VFT_BFP 1 "register_operand" "v") + (match_operand:VFT_BFP 2 "register_operand" "v") + (const_int 4)] + UNSPEC_FMIN))] + "TARGET_VXE") diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md index a7bb7ff92f5e..0508df43b866 100644 --- a/gcc/config/s390/vx-builtins.md +++ b/gcc/config/s390/vx-builtins.md @@ -2136,15 +2136,32 @@ "fchebs\t%v2,%v0,%v1" [(set_attr "op_type" "VRR")]) +(define_insn "*fmin" + [(set (match_operand:VFT_BFP0 "register_operand" "=v") + (unspec:VFT_BFP [(match_operand:VFT_BFP 1 "register_operand" "v") +(match_operand:VFT_BFP 2 "register_operand" "v") +(match_operand:QI 3 "const_mask_operand" "C")] + UNSPEC_FMIN))] + "TARGET_VXE" + "fminb\t%v0,%v1,%v2,%b3" + [(set_attr "op_type" "VRR")]) -(define_insn "vfmin" +(define_expand "vfmin" [(set (match_operand:VF_HW0 "register_operand" "=v") (unspec:VF_HW [(match_operand:VF_HW 1 "register_operand" "v") (match_operand:VF_HW 2 "register_operand" "v") (match_operand:QI3 
"const_mask_operand" "C")] - UNSPEC_VEC_VFMIN))] + UNSPEC_FMIN))] + "TARGET_VXE") + +(define_insn "*fmax" + [(set (match_operand:VFT_BFP0 "register_operand" "=v") + (unspec:VFT_BFP [(match_operand:VFT_BFP 1 "register_operand" "v") +(match_operand:VFT_BFP 2 "register_operand" "v") +(match_operand:QI 3 "const_mask_operand" "C")] + UNSPEC_FMAX))] "TARGET_VXE" - "fminb\t%v0,%v1,%v2,%b3" + "fmaxb\t%v0,%v1,%v2,%b3" [(set_attr "op_type" "VRR")]) (define_insn "vfmax" @@ -2152,10 +2169,8 @@ (unspec:VF_HW [(match_operand:VF_HW 1 "register_operand" "v") (match_operand:VF_HW 2 "register_operand" "v") (match_operand:
Re: [Patch, Fortran, Coarray, PR88076, v1] 0/6 Add a shared memory multi process coarray library.
Hi Jerry,

thank you very much. Just try it. I can only imagine that Paul had a somehow
corrupted build directory or leftovers from some previous build. I am still
wondering why I got no automated mail from the build hosts, but I can imagine
that they have issues with a series of patches that build upon each other.

Just try it. The more feedback, the better.

Regards,
Andre

On Tue, 24 Jun 2025 11:07:23 -0700
Jerry D wrote:

> On 6/24/25 6:09 AM, Andre Vehreschild wrote:
> > Hi all,
> >
> > this series of patches (six in total) adds a new coarray backend library to
> > libgfortran. The library uses shared memory and processes to implement
> > running multiple images on the same node. The work is based on work
> > started by Thomas and Nicolas Koenig. No changes to the gfortran compile
> > part are required for this.
> --- snip ---
>
> Hi Andre,
>
> Thank you for this work. I have been wanting this functionality for
> several years!
>
> I will begin reviewing as best I can. I did see Paul's initial comment
> so your feedback on that would be appreciated.
>
> Best regards,
>
> Jerry

--
Andre Vehreschild * Email: vehre ad gmx dot de
[PATCH v2] s390: Add some missing vector patterns.
Some patterns that are detected by the autovectorizer can be supported by s390. Add expanders such that autovectorization of these patterns works. RTL for the builtins used unspec to represent highpart multiplication. Replace this by the correct RTL to allow further simplification. Bootstrapped and regtested on s390. Ok for trunk? gcc/ChangeLog: * config/s390/s390.md: Removed unused unspecs. * config/s390/vector.md (avg3_ceil): New expander. (uavg3_ceil): New expander. (smul3_highpart): New expander. (umul3_highpart): New expander. * config/s390/vx-builtins.md (vec_umulh): Remove unspec. (vec_smulh): Remove unspec. gcc/testsuite/ChangeLog: * gcc.target/s390/vector/pattern-avg-1.c: New test. * gcc.target/s390/vector/pattern-mulh-1.c: New test. Signed-off-by: Juergen Christ --- gcc/config/s390/s390.md | 3 -- gcc/config/s390/vector.md | 26 + gcc/config/s390/vx-builtins.md| 10 +++ .../gcc.target/s390/vector/pattern-avg-1.c| 26 + .../gcc.target/s390/vector/pattern-mulh-1.c | 29 +++ 5 files changed, 85 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/vector/pattern-avg-1.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/pattern-mulh-1.c diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index 97a4bdf96b2d..440ce93574f4 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -139,9 +139,6 @@ UNSPEC_LCBB ; Vector - UNSPEC_VEC_SMULT_HI - UNSPEC_VEC_UMULT_HI - UNSPEC_VEC_SMULT_LO UNSPEC_VEC_SMULT_EVEN UNSPEC_VEC_UMULT_EVEN UNSPEC_VEC_SMULT_ODD diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md index 6f4e1929eb80..8d7ca1a520f3 100644 --- a/gcc/config/s390/vector.md +++ b/gcc/config/s390/vector.md @@ -3576,3 +3576,29 @@ ; vec_unpacks_float_lo ; vec_unpacku_float_hi ; vec_unpacku_float_lo + +(define_expand "avg3_ceil" + [(set (match_operand:VIT_HW_VXE3_T0 "register_operand" "=v") + (unspec:VIT_HW_VXE3_T [(match_operand:VIT_HW_VXE3_T 1 "register_operand" "v") + (match_operand:VIT_HW_VXE3_T 2 "register_operand" "v")] + UNSPEC_VEC_AVG))] + "TARGET_VX") + +(define_expand "uavg3_ceil" + [(set (match_operand:VIT_HW_VXE3_T0 "register_operand" "=v") + (unspec:VIT_HW_VXE3_T [(match_operand:VIT_HW_VXE3_T 1 "register_operand" "v") + (match_operand:VIT_HW_VXE3_T 2 "register_operand" "v")] + UNSPEC_VEC_AVGU))] + "TARGET_VX") + +(define_expand "smul3_highpart" + [(set (match_operand:VIT_HW_VXE3_DT 0 "register_operand" "=v") + (smul_highpart:VIT_HW_VXE3_DT (match_operand:VIT_HW_VXE3_DT 1 "register_operand" "v") + (match_operand:VIT_HW_VXE3_DT 2 "register_operand" "v")))] + "TARGET_VX") + +(define_expand "umul3_highpart" + [(set (match_operand:VIT_HW_VXE3_DT 0 "register_operand" "=v") + (umul_highpart:VIT_HW_VXE3_DT (match_operand:VIT_HW_VXE3_DT 1 "register_operand" "v") + (match_operand:VIT_HW_VXE3_DT 2 "register_operand" "v")))] + "TARGET_VX") diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md index a7bb7ff92f5e..2478f74e161a 100644 --- a/gcc/config/s390/vx-builtins.md +++ b/gcc/config/s390/vx-builtins.md @@ -983,9 +983,8 @@ ; vmhb, vmhh, vmhf, vmhg, vmhq (define_insn "vec_smulh" [(set (match_operand:VIT_HW_VXE3_DT 0 "register_operand" "=v") - (unspec:VIT_HW_VXE3_DT [(match_operand:VIT_HW_VXE3_DT 1 "register_operand" "v") - (match_operand:VIT_HW_VXE3_DT 2 "register_operand" "v")] - UNSPEC_VEC_SMULT_HI))] + (smul_highpart:VIT_HW_VXE3_DT (match_operand:VIT_HW_VXE3_DT 1 "register_operand" "v") + (match_operand:VIT_HW_VXE3_DT 2 "register_operand" "v")))] "TARGET_VX" "vmh\t%v0,%v1,%v2" [(set_attr "op_type" "VRR")]) 
@@ -993,9 +992,8 @@ ; vmlhb, vmlhh, vmlhf, vmlhg, vmlhq (define_insn "vec_umulh" [(set (match_operand:VIT_HW_VXE3_DT 0 "register_operand" "=v") - (unspec:VIT_HW_VXE3_DT [(match_operand:VIT_HW_VXE3_DT 1 "register_operand" "v") - (match_operand:VIT_HW_VXE3_DT 2 "register_operand" "v")] - UNSPEC_VEC_UMULT_HI))] + (umul_highpart:VIT_HW_VXE3_DT (match_operand:VIT_HW_VXE3_DT 1 "register_operand" "v") + (match_operand:VIT_HW_VXE3_DT 2 "register_operand" "v")))] "TARGET_VX" "vmlh\t%v0,%v1,%v2" [(set_attr "op_type" "VRR")]) diff --git a/gcc/testsuite/gcc.target/s390/vector/pattern-avg-1.
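For reference, the rounding-up average idiom these expanders let the autovectorizer use looks roughly like this at the source level (my sketch; the body of the new pattern-avg-1.c test is cut off above):

#include <cstddef>
#include <cstdint>

// Average of two unsigned bytes, rounding up: the "average ceil"
// pattern recognised by the autovectorizer.
void avg_ceil (uint8_t *r, const uint8_t *a, const uint8_t *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    r[i] = (uint8_t) (((unsigned) a[i] + b[i] + 1) >> 1);
}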
Re: [PATCH v6 2/9] AArch64: reformat branch instruction rules
Karl Meakin writes: > Make the formatting of the RTL templates in the rules for branch > instructions more consistent with each other. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (cbranch4): Reformat. > (cbranchcc4): Likewise. > (condjump): Likewise. > (*compare_condjump): Likewise. > (aarch64_cb1): Likewise. > (*cb1): Likewise. > (tbranch_3): Likewise. > (@aarch64_tb): Likewise. > --- > gcc/config/aarch64/aarch64.md | 77 +-- > 1 file changed, 38 insertions(+), 39 deletions(-) > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index fcc24e300e6..d059a6362d5 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > [...] > @@ -725,34 +725,34 @@ (define_expand "cbranch4" > ) > > (define_expand "cbranch4" > - [(set (pc) (if_then_else > - (match_operator 0 "aarch64_comparison_operator" > - [(match_operand:GPF_F16 1 "register_operand") > - (match_operand:GPF_F16 2 "aarch64_fp_compare_operand")]) > - (label_ref (match_operand 3 "" "")) > - (pc)))] > + [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" > + [(match_operand:GPF_F16 1 "register_operand") > + (match_operand:GPF_F16 2 > "aarch64_fp_compare_operand")]) > +(label_ref (match_operand 3)) > +(pc)))] I think we should drop this part, since it makes the lines go over the 80-character limit. OK with that change, thanks. Richard
Re: [PATCH v6 3/9] AArch64: rename branch instruction rules
Karl Meakin writes: > Give the `define_insn` rules used in lowering `cbranch4` to RTL > more descriptive and consistent names: from now on, each rule is named > after the AArch64 instruction that it generates. Also add comments to > document each rule. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (condjump): Rename to ... > (aarch64_bcond): ...here. > (*compare_condjump): Rename to ... > (*aarch64_bcond_wide_imm): ...here. > (aarch64_cb): Rename to ... > (aarch64_cbz1): ...here. > (*cb1): Rename to ... > (*aarch64_tbz1): ...here. > (@aarch64_tb): Rename to ... > (@aarch64_tbz): ...here. > (restore_stack_nonlocal): Handle rename. > (stack_protect_combined_test): Likewise. > * config/aarch64/aarch64-simd.md (cbranch4): Likewise. > * config/aarch64/aarch64-sme.md (aarch64_restore_za): Likewise. > * config/aarch64/aarch64.cc (aarch64_gen_test_and_branch): Likewise. > --- > gcc/config/aarch64/aarch64-simd.md | 2 +- > gcc/config/aarch64/aarch64-sme.md | 2 +- > gcc/config/aarch64/aarch64.cc | 4 ++-- > gcc/config/aarch64/aarch64.md | 21 - > 4 files changed, 16 insertions(+), 13 deletions(-) > > diff --git a/gcc/config/aarch64/aarch64-simd.md > b/gcc/config/aarch64/aarch64-simd.md > index e771defc73f..33839f2fec7 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -2884,9 +2884,9 @@ aarch64_gen_test_and_branch (rtx_code code, rtx x, int > bitnum, >emit_insn (gen_aarch64_and3nr_compare0 (mode, x, mask)); >rtx cc_reg = gen_rtx_REG (CC_NZVmode, CC_REGNUM); >rtx x = gen_rtx_fmt_ee (code, CC_NZVmode, cc_reg, const0_rtx); > - return gen_condjump (x, cc_reg, label); > + return gen_aarch64_bcond (x, cc_reg, label); > } > - return gen_aarch64_tb (code, mode, mode, > + return gen_aarch64_tbz (code, mode, mode, >x, gen_int_mode (bitnum, mode), label); Sorry for the formatting nit, but: please indent this line by an extra column too, so that the arguments still line up. > [...] > @@ -8104,7 +8107,7 @@ (define_expand "stack_protect_combined_test" >: gen_stack_protect_test_si) (operands[0], operands[1])); > >rtx cc_reg = gen_rtx_REG (CCmode, CC_REGNUM); > - emit_jump_insn (gen_condjump (gen_rtx_EQ (VOIDmode, cc_reg, const0_rtx), > + emit_jump_insn (gen_aarch64_bcond (gen_rtx_EQ (VOIDmode, cc_reg, > const0_rtx), > cc_reg, operands[2])); Similarly, please reindent this to match the new name. OK with those changes, thanks. Richard
[PATCH v2] libstdc++: Test for %S precision for durations with integral representation.
Existing test are extented to cover cases where not precision is specified, or it is specified to zero. The precision value is ignored in all cases. libstdc++-v3/ChangeLog: * testsuite/std/time/format/precision.cc: New tests. --- v2 extents test to cover .0 as precision. Testing on x86_64-linux. std/format/time* test passed, also with -D_GLIBCXX_USE_CXX11_ABI=0 and -D_GLIBCXX_DEBUG. .../testsuite/std/time/format/precision.cc| 104 +- 1 file changed, 99 insertions(+), 5 deletions(-) diff --git a/libstdc++-v3/testsuite/std/time/format/precision.cc b/libstdc++-v3/testsuite/std/time/format/precision.cc index ccb2c77ce05..aa266156c1f 100644 --- a/libstdc++-v3/testsuite/std/time/format/precision.cc +++ b/libstdc++-v3/testsuite/std/time/format/precision.cc @@ -16,6 +16,10 @@ test_empty() std::basic_string res; const duration d(33.111222); + res = std::format(WIDEN("{:}"), d); + VERIFY( res == WIDEN("33.1112s") ); + res = std::format(WIDEN("{:.0}"), d); + VERIFY( res == WIDEN("33.1112s") ); res = std::format(WIDEN("{:.3}"), d); VERIFY( res == WIDEN("33.1112s") ); res = std::format(WIDEN("{:.6}"), d); @@ -25,6 +29,10 @@ test_empty() // Uses ostream operator<< const duration nd = d; + res = std::format(WIDEN("{:}"), nd); + VERIFY( res == WIDEN("3.31112e+10ns") ); + res = std::format(WIDEN("{:.0}"), nd); + VERIFY( res == WIDEN("3.31112e+10ns") ); res = std::format(WIDEN("{:.3}"), nd); VERIFY( res == WIDEN("3.31112e+10ns") ); res = std::format(WIDEN("{:.6}"), nd); @@ -40,6 +48,10 @@ test_Q() std::basic_string res; const duration d(7.111222); + res = std::format(WIDEN("{:%Q}"), d); + VERIFY( res == WIDEN("7.111222") ); + res = std::format(WIDEN("{:.0%Q}"), d); + VERIFY( res == WIDEN("7.111222") ); res = std::format(WIDEN("{:.3%Q}"), d); VERIFY( res == WIDEN("7.111222") ); res = std::format(WIDEN("{:.6%Q}"), d); @@ -47,7 +59,23 @@ test_Q() res = std::format(WIDEN("{:.9%Q}"), d); VERIFY( res == WIDEN("7.111222") ); + duration md = d; + res = std::format(WIDEN("{:%Q}"), md); + VERIFY( res == WIDEN("7111.222") ); + res = std::format(WIDEN("{:.0%Q}"), md); + VERIFY( res == WIDEN("7111.222") ); + res = std::format(WIDEN("{:.3%Q}"), md); + VERIFY( res == WIDEN("7111.222") ); + res = std::format(WIDEN("{:.6%Q}"), md); + VERIFY( res == WIDEN("7111.222") ); + res = std::format(WIDEN("{:.9%Q}"), md); + VERIFY( res == WIDEN("7111.222") ); + const duration nd = d; + res = std::format(WIDEN("{:%Q}"), nd); + VERIFY( res == WIDEN("7111222000") ); + res = std::format(WIDEN("{:.0%Q}"), nd); + VERIFY( res == WIDEN("7111222000") ); res = std::format(WIDEN("{:.3%Q}"), nd); VERIFY( res == WIDEN("7111222000") ); res = std::format(WIDEN("{:.6%Q}"), nd); @@ -58,12 +86,16 @@ test_Q() template void -test_S() +test_S_fp() { std::basic_string res; // Precision is ignored, but period affects output - const duration d(5.111222); + duration d(5.111222); + res = std::format(WIDEN("{:%S}"), d); + VERIFY( res == WIDEN("05") ); + res = std::format(WIDEN("{:.0%S}"), d); + VERIFY( res == WIDEN("05") ); res = std::format(WIDEN("{:.3%S}"), d); VERIFY( res == WIDEN("05") ); res = std::format(WIDEN("{:.6%S}"), d); @@ -71,7 +103,11 @@ test_S() res = std::format(WIDEN("{:.9%S}"), d); VERIFY( res == WIDEN("05") ); - const duration md = d; + duration md = d; + res = std::format(WIDEN("{:%S}"), md); + VERIFY( res == WIDEN("05.111") ); + res = std::format(WIDEN("{:.0%S}"), md); + VERIFY( res == WIDEN("05.111") ); res = std::format(WIDEN("{:.3%S}"), md); VERIFY( res == WIDEN("05.111") ); res = std::format(WIDEN("{:.6%S}"), md); @@ -79,13 +115,70 @@ 
test_S() res = std::format(WIDEN("{:.9%S}"), md); VERIFY( res == WIDEN("05.111") ); - const duration nd = d; + duration ud = d; + res = std::format(WIDEN("{:%S}"), ud); + VERIFY( res == WIDEN("05.111222") ); + res = std::format(WIDEN("{:.0%S}"), ud); + VERIFY( res == WIDEN("05.111222") ); + res = std::format(WIDEN("{:.3%S}"), ud); + VERIFY( res == WIDEN("05.111222") ); + res = std::format(WIDEN("{:.6%S}"), ud); + VERIFY( res == WIDEN("05.111222") ); + res = std::format(WIDEN("{:.9%S}"), ud); + VERIFY( res == WIDEN("05.111222") ); + + duration nd = d; + res = std::format(WIDEN("{:%S}"), nd); + VERIFY( res == WIDEN("05.111222000") ); + res = std::format(WIDEN("{:.0%S}"), nd); + VERIFY( res == WIDEN("05.111222000") ); res = std::format(WIDEN("{:.3%S}"), nd); VERIFY( res == WIDEN("05.111222000") ); res = std::format(WIDEN("{:.6%S}"), nd); VERIFY( res == WIDEN("05.111222000") ); res = std::format(WIDEN("{:.9%S}"), nd); VERIFY( res == WIDEN("05.111222000") ); + + duration pd = d; + res = std::format(WIDEN("{:%S}"), pd); + VERIFY( res == WIDEN("05.11122200") ); + res = std::format(WIDEN("{:.0%S}"), pd); + VERIFY( res == WIDEN("05.11122200") ); + res = std::format(WIDEN("{:.3%S}"), pd); + VER
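A short user-level illustration of the behaviour these tests pin down (my sketch, mirroring the expected values in the tests above): for a duration with a floating-point representation and millisecond period, %S prints three fractional digits whether or not a precision is given:

#include <chrono>
#include <format>
#include <iostream>

int main ()
{
  std::chrono::duration<double, std::milli> d (5111.222);
  std::cout << std::format ("{:%S}", d) << '\n';    // 05.111 per the tests
  std::cout << std::format ("{:.3%S}", d) << '\n';  // 05.111 as well
}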
Re: [PATCH 1/2] allow contraction to synthetic single-element vector FMA
On Tue, Jun 24, 2025 at 5:25 PM Alexander Monakov wrote:
>
> > I'd say we want to fix these kind of things before switching the default.
> > Can you file bugreports for the distinct issues you noticed when adjusting
> > the testcases?
>
> Sure, filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120808 for the most
> frequently hit issue on x86 for now.

Thanks. So almost all issues arise because the FMAs are then introduced early
(and possible folding with negates is done late). At some point we've arranged
FMAs to be produced after vectorization only (there might be targets with
scalar FMA but no vector FMA for example). It shouldn't be too hard to handle
FMAs during vectorization but having a mix will certainly complicate things.
Likewise undoing FMA creation when there's no vector FMA would rely on
detecting whether the FMA was introduced by the compiler or the middle-end
(I suppose builtin vs. IFN might do the job here).

> > I suppose they are reproducible as well when using the C fma() function
> > directly?
>
> No, unfortunately there are multiple issues with fma builtin:
>
> 1) __builtin_fma does not accept generic vector types

indeed, you'd have to declare an OMP SIMD fma variant but that will not be
recognized as fma or .FMA then I think.

> 2) we have FMS FNMA FNMS FMADDSUB FMSUBADD internal functions, but
>    no corresponding builtins

These are direct optab internal functions. I'm not sure we want builtins for
all of those, fma () with negated arguments should do fine.

> 3) __builtin_fma and .FMA internal function are not the same in the
>    middle-end, I reported one instance arising from that in
>    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109892

The builtin and the internal function should behave the same, in this case
it's again late vs. early exposal of FMA. I am testing partial fixes for
these issues.

Richard.

>
> Alexander
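To spell out the last point, fma() with explicitly negated arguments covers the other fused shapes without dedicated builtins; a minimal sketch, using the usual naming where FNMA is -a*b + c:

#include <cmath>

double fms  (double a, double b, double c) { return std::fma ( a, b, -c); }
double fnma (double a, double b, double c) { return std::fma (-a, b,  c); }
double fnms (double a, double b, double c) { return std::fma (-a, b, -c); }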
[PATCH] tree-optimization/109892 - SLP reduction of fma
The following adds the ability to vectorize a fma reduction pair as SLP reduction (we cannot yet handle ternary association in reduction vectorization yet). Bootstrapped and tested on x86_64-unknown-linux-gnu. PR tree-optimization/109892 * tree-vect-loop.cc (reduction_fn_for_scalar_code): Handle fma. * gcc.dg/vect/vect-reduc-fma-1.c: New testcase. * gcc.dg/vect/vect-reduc-fma-2.c: Likewise. --- gcc/testsuite/gcc.dg/vect/vect-reduc-fma-1.c | 15 +++ gcc/testsuite/gcc.dg/vect/vect-reduc-fma-2.c | 20 gcc/tree-vect-loop.cc| 4 3 files changed, 39 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-fma-1.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-fma-2.c diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-1.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-1.c new file mode 100644 index 000..e958b43e23b --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-1.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-mfma" { target { x86_64-*-* i?86-*-* } } } */ + +double f(double x[], long n) +{ +double r0 = 0, r1 = 0; +for (; n; x += 2, n--) { +r0 = __builtin_fma(x[0], x[0], r0); +r1 = __builtin_fma(x[1], x[1], r1); +} +return r0 + r1; +} + +/* We should vectorize this as SLP reduction. */ +/* { dg-final { scan-tree-dump "loop vectorized using 16 byte vectors and unroll factor 1" "vect" { target { x86_64-*-* i?86-*-* } } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-2.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-2.c new file mode 100644 index 000..ea1ca9720e5 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-2.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-ffp-contract=on" } */ +/* { dg-additional-options "-mfma" { target { x86_64-*-* i?86-*-* } } } */ + +static double muladd(double x, double y, double z) +{ +return x * y + z; +} +double g(double x[], long n) +{ +double r0 = 0, r1 = 0; +for (; n; x += 2, n--) { +r0 = muladd(x[0], x[0], r0); +r1 = muladd(x[1], x[1], r1); +} +return r0 + r1; +} + +/* We should vectorize this as SLP reduction. */ +/* { dg-final { scan-tree-dump "loop vectorized using 16 byte vectors and unroll factor 1" "vect" { target { x86_64-*-* i?86-*-* } } } } */ diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index a3f95433a5b..1e6e9cede18 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -3906,6 +3906,10 @@ reduction_fn_for_scalar_code (code_helper code, internal_fn *reduc_fn) *reduc_fn = IFN_REDUC_FMIN; return true; + CASE_CFN_FMA: + *reduc_fn = IFN_LAST; + return true; + default: return false; } -- 2.43.0
Re: [PATCH v2] libstdc++: Test for %S precision for durations with integral representation.
On Wed, 25 Jun 2025 at 10:42, Tomasz Kamiński wrote: > > Existing test are extented to cover cases where not precision is specified, > or it is specified to zero. The precision value is ignored in all cases. > > libstdc++-v3/ChangeLog: > > * testsuite/std/time/format/precision.cc: New tests. > --- > v2 extents test to cover .0 as precision. > Testing on x86_64-linux. std/format/time* test passed, also with > -D_GLIBCXX_USE_CXX11_ABI=0 > and -D_GLIBCXX_DEBUG. OK for trunk > > > .../testsuite/std/time/format/precision.cc| 104 +- > 1 file changed, 99 insertions(+), 5 deletions(-) > > diff --git a/libstdc++-v3/testsuite/std/time/format/precision.cc > b/libstdc++-v3/testsuite/std/time/format/precision.cc > index ccb2c77ce05..aa266156c1f 100644 > --- a/libstdc++-v3/testsuite/std/time/format/precision.cc > +++ b/libstdc++-v3/testsuite/std/time/format/precision.cc > @@ -16,6 +16,10 @@ test_empty() >std::basic_string res; > >const duration d(33.111222); > + res = std::format(WIDEN("{:}"), d); > + VERIFY( res == WIDEN("33.1112s") ); > + res = std::format(WIDEN("{:.0}"), d); > + VERIFY( res == WIDEN("33.1112s") ); >res = std::format(WIDEN("{:.3}"), d); >VERIFY( res == WIDEN("33.1112s") ); >res = std::format(WIDEN("{:.6}"), d); > @@ -25,6 +29,10 @@ test_empty() > >// Uses ostream operator<< >const duration nd = d; > + res = std::format(WIDEN("{:}"), nd); > + VERIFY( res == WIDEN("3.31112e+10ns") ); > + res = std::format(WIDEN("{:.0}"), nd); > + VERIFY( res == WIDEN("3.31112e+10ns") ); >res = std::format(WIDEN("{:.3}"), nd); >VERIFY( res == WIDEN("3.31112e+10ns") ); >res = std::format(WIDEN("{:.6}"), nd); > @@ -40,6 +48,10 @@ test_Q() >std::basic_string res; > >const duration d(7.111222); > + res = std::format(WIDEN("{:%Q}"), d); > + VERIFY( res == WIDEN("7.111222") ); > + res = std::format(WIDEN("{:.0%Q}"), d); > + VERIFY( res == WIDEN("7.111222") ); >res = std::format(WIDEN("{:.3%Q}"), d); >VERIFY( res == WIDEN("7.111222") ); >res = std::format(WIDEN("{:.6%Q}"), d); > @@ -47,7 +59,23 @@ test_Q() >res = std::format(WIDEN("{:.9%Q}"), d); >VERIFY( res == WIDEN("7.111222") ); > > + duration md = d; > + res = std::format(WIDEN("{:%Q}"), md); > + VERIFY( res == WIDEN("7111.222") ); > + res = std::format(WIDEN("{:.0%Q}"), md); > + VERIFY( res == WIDEN("7111.222") ); > + res = std::format(WIDEN("{:.3%Q}"), md); > + VERIFY( res == WIDEN("7111.222") ); > + res = std::format(WIDEN("{:.6%Q}"), md); > + VERIFY( res == WIDEN("7111.222") ); > + res = std::format(WIDEN("{:.9%Q}"), md); > + VERIFY( res == WIDEN("7111.222") ); > + >const duration nd = d; > + res = std::format(WIDEN("{:%Q}"), nd); > + VERIFY( res == WIDEN("7111222000") ); > + res = std::format(WIDEN("{:.0%Q}"), nd); > + VERIFY( res == WIDEN("7111222000") ); >res = std::format(WIDEN("{:.3%Q}"), nd); >VERIFY( res == WIDEN("7111222000") ); >res = std::format(WIDEN("{:.6%Q}"), nd); > @@ -58,12 +86,16 @@ test_Q() > > template > void > -test_S() > +test_S_fp() > { >std::basic_string res; > >// Precision is ignored, but period affects output > - const duration d(5.111222); > + duration d(5.111222); > + res = std::format(WIDEN("{:%S}"), d); > + VERIFY( res == WIDEN("05") ); > + res = std::format(WIDEN("{:.0%S}"), d); > + VERIFY( res == WIDEN("05") ); >res = std::format(WIDEN("{:.3%S}"), d); >VERIFY( res == WIDEN("05") ); >res = std::format(WIDEN("{:.6%S}"), d); > @@ -71,7 +103,11 @@ test_S() >res = std::format(WIDEN("{:.9%S}"), d); >VERIFY( res == WIDEN("05") ); > > - const duration md = d; > + duration md = d; > + res = std::format(WIDEN("{:%S}"), md); > + 
VERIFY( res == WIDEN("05.111") ); > + res = std::format(WIDEN("{:.0%S}"), md); > + VERIFY( res == WIDEN("05.111") ); >res = std::format(WIDEN("{:.3%S}"), md); >VERIFY( res == WIDEN("05.111") ); >res = std::format(WIDEN("{:.6%S}"), md); > @@ -79,13 +115,70 @@ test_S() >res = std::format(WIDEN("{:.9%S}"), md); >VERIFY( res == WIDEN("05.111") ); > > - const duration nd = d; > + duration ud = d; > + res = std::format(WIDEN("{:%S}"), ud); > + VERIFY( res == WIDEN("05.111222") ); > + res = std::format(WIDEN("{:.0%S}"), ud); > + VERIFY( res == WIDEN("05.111222") ); > + res = std::format(WIDEN("{:.3%S}"), ud); > + VERIFY( res == WIDEN("05.111222") ); > + res = std::format(WIDEN("{:.6%S}"), ud); > + VERIFY( res == WIDEN("05.111222") ); > + res = std::format(WIDEN("{:.9%S}"), ud); > + VERIFY( res == WIDEN("05.111222") ); > + > + duration nd = d; > + res = std::format(WIDEN("{:%S}"), nd); > + VERIFY( res == WIDEN("05.111222000") ); > + res = std::format(WIDEN("{:.0%S}"), nd); > + VERIFY( res == WIDEN("05.111222000") ); >res = std::format(WIDEN("{:.3%S}"), nd); >VERIFY( res == WIDEN("05.111222000") ); >res = std::format(WIDEN("{:.6%S}"), nd); >VERIFY( res == WIDEN("05.111222000") ); >res = std:
Re: [PATCH v3] reassoc: Optimize CMP/XOR expressions [PR116860]
Hi Jakub, thanks for the feedback. We have sent a new version (https://gcc.gnu.org/pipermail/gcc-patches/2025-June/687530.html), addressing those issues. Regarding the hash_sets, we have replaced them with vectors in some cases and in the cases that we're still using them we're copying them to sorted vectors before traversals. Konstantinos On Thu, Jun 12, 2025 at 10:36 AM Jakub Jelinek wrote: > > On Mon, Mar 17, 2025 at 11:40:32AM +0100, Konstantinos Eleftheriou wrote: > > * gcc.dg/tree-ssa/fold-xor-and-or.c: > > Remove logical-op-non-short-circuit=1. > > The remove certainly fits on the same line as : > and --param= is missing before the option name. > > > +/* { dg-final { scan-tree-dump-not "d1_\[0-9\]+.D. \\^ d2_\[0-9\]+.D." > > "optimized" } } */ > > \ No newline at end of file > > Please avoid files not ending with newline unless intentional. > > > +/* { dg-final { scan-tree-dump-not "d1_\[0-9\]+.D. \\^ d2_\[0-9\]+.D." > > "optimized" } } */ > > \ No newline at end of file > > Ditto. > > > --- a/gcc/tree-ssa-reassoc.cc > > +++ b/gcc/tree-ssa-reassoc.cc > > @@ -4077,6 +4077,359 @@ optimize_range_tests_var_bound (enum tree_code > > opcode, int first, int length, > >return any_changes; > > } > > > > +/* Helper function for optimize_cmp_xor_exprs. Visit EXPR operands > > + recursively and try to find comparison or XOR expressions that can be > > + solved using the expressions in CALC_STMTS. Expressions that can be > > folded > > + to 0 are stored in STMTS_TO_FOLD. IS_OR_EXPR is true for OR expressions > > + and false for AND expressions. */ > > + > > +tree > > Missing static before tree > > > +solve_expr (tree expr, hash_set *calc_stmts, > > + hash_set *stmts_to_fold, hash_set *visited, > > + bool is_or_expr) > > +{ > > + /* Return, if have already visited this expression or the expression is > > not > > + an SSA name. */ > > + if (visited->contains (expr) || TREE_CODE (expr) != SSA_NAME) > > The TREE_CODE (expr) != SSA_NAME test is certainly much cheaper than > visited->contains (expr), so please swap the || operands. > > +void > > Again missing static before return type (and in more spots) > > +find_terminal_nodes (tree expr, hash_set *terminal_nodes, > + hash_set *visited) > +{ > + if (visited->contains (expr)) > +return; > + > + visited->add (expr); > > The above together is > if (visited->add (expr)) > return; > (and more efficient in that). > > > +return NULL_TREE; > > + > > + visited->add (expr); > > + > > + gimple *def_stmt = SSA_NAME_DEF_STMT (expr); > > + > > + if (!def_stmt || !is_gimple_assign (def_stmt)) > > +return expr; > > + > > + unsigned int op_num = gimple_num_ops (def_stmt); > > + unsigned int terminal_node_num = 0; > > + /* Visit the expression operands recursively until finding a statement > > that > > until it finds ? 
> > +
> > +  do {

The formatting is wrong, it shouldn't be

do {
  statements;
} while (cond);

but

do
  {
    statements;
  }
while (cond);

Last but not least, there are tons of hash_sets involved, wonder if one could
do away without that for the common simple cases and use those only if it is
larger, but more importantly, I believe the code generation depends on the
hash_set traversals, which is a big no-no for reproducibility, because the
hash_set<tree> or hash_set<gimple *> I believe just use hashes derived from
the pointer values and so with address space randomization, even subsequent
runs of the same compiler on the same machine could result in different code
generation, not even talking about cross compilers with different hosts etc.
It is fine to use hash set traversals to find out what will need to be done,
but in that case it should be e.g. pushed into some vector worklist and the
worklist sorted by something stable (e.g. SSA_NAME_VERSIONs or positions in
the original term sequence etc., i.e. something that reflects the IL and
not the pointer values of particular trees or gimple *).

Jakub
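The reproducibility point generalises beyond GCC's own containers; here is a self-contained C++ sketch of the suggested worklist pattern (standard containers and an explicit id stand in for GCC's hash_set and SSA_NAME_VERSION):

#include <algorithm>
#include <cstdio>
#include <unordered_set>
#include <vector>

struct node { int id; };

// Hash-set iteration order depends on pointer values, so copy the set
// into a vector and sort by a stable key before any traversal whose
// order can affect code generation.
void visit_in_stable_order (const std::unordered_set<const node *> &set)
{
  std::vector<const node *> worklist (set.begin (), set.end ());
  std::sort (worklist.begin (), worklist.end (),
             [] (const node *a, const node *b) { return a->id < b->id; });
  for (const node *n : worklist)
    std::printf ("%d\n", n->id);
}

int main ()
{
  node a{1}, b{2}, c{3};
  visit_in_stable_order ({&c, &a, &b});   // always prints 1 2 3
}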
[PATCH v4] reassoc: Optimize CMP/XOR expressions [PR116860]
Testcases for match.pd patterns `((a ^ b) & c) cmp d | a != b -> (0 cmp d | a != b)` and `(a ^ b) cmp c | a != b -> (0 cmp c | a != b)` were failing on some targets, like PowerPC. This patch adds an implemenetation for the optimization in reassoc. Doing so, we can now handle cases where the related conditions appear in an AND expression too. Also, we can optimize cases where we have intermediate expressions between the related ones in the AND/OR expression on some targets. This is not handled on targets like PowerPC, where each condition of the AND/OR expression is placed into a different basic block. Bootstrapped/regtested on x86 and AArch64. PR tree-optimization/116860 gcc/ChangeLog: * tree-ssa-reassoc.cc (solve_expr): New function. (find_terminal_nodes): New function. (get_terminal_nodes): New function. (sort_elements): New function. (copy_hashset_to_vec_and_sort): New function. (optimize_cmp_xor_exprs): New function. (optimize_range_tests): Call optimize_cmp_xor_exprs. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/fold-xor-and-or.c: Remove logical-op-non-short-circuit=1. * gcc.dg/tree-ssa/fold-xor-or.c: Likewise. * gcc.dg/tree-ssa/fold-xor-and-or-2.c: New test. * gcc.dg/tree-ssa/fold-xor-and.c: New test. --- .../gcc.dg/tree-ssa/fold-xor-and-or-2.c | 59 +++ .../gcc.dg/tree-ssa/fold-xor-and-or.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and.c | 55 +++ gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or.c | 2 +- gcc/tree-ssa-reassoc.cc | 415 ++ 5 files changed, 531 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-2.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-2.c b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-2.c new file mode 100644 index ..a11fcb3732a8 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-2.c @@ -0,0 +1,59 @@ +/* This test is not working across all targets (e.g. it fails on PowerPC, + because each condition of the AND/OR expression is placed into + a different basic block). Therefore, it is gated for x86-64 and AArch64, + where we know that it has to pass. 
*/ +/* { dg-do compile { target { aarch64-*-* x86_64-*-* } } } */ +/* { dg-options "-O3 -fdump-tree-optimized" } */ + +typedef unsigned long int uint64_t; + +int cmp1_or_inter(int d1, int d2, int d3) { + if (((d1 ^ d2) & 0xabcd) == 0 || d3 != 10 || d1 != d2) +return 0; + return 1; +} + +int cmp2_or_inter(int d1, int d2, int d3, int d4) { + if (((d1 ^ d2) & 0xabcd) == 0 || d3 != 10 || d1 != d2 || d4 == 11) +return 0; + return 1; +} + +int cmp1_and_inter(int d1, int d2, int d3) { + if (!(((d1 ^ d2) & 0xabcd) == 0) && d3 == 10 && d1 == d2) +return 0; + return 1; +} + +int cmp2_and_inter(int d1, int d2, int d3, int d4) { + if (!(((d1 ^ d2) & 0xabcd) == 0) && d3 == 10 && d1 == d2 && d4 != 11) +return 0; + return 1; +} + +int cmp1_or_inter_64(uint64_t d1, uint64_t d2, uint64_t d3) { + if (((d1 ^ d2) & 0xabcd) == 0 || d3 != 10 || d1 != d2) +return 0; + return 1; +} + +int cmp2_or_inter_64(uint64_t d1, uint64_t d2, uint64_t d3, uint64_t d4) { + if (((d1 ^ d2) & 0xabcd) == 0 || d3 != 10 || d1 != d2 || d4 == 11) +return 0; + return 1; +} + +int cmp1_and_inter_64(uint64_t d1, uint64_t d2, uint64_t d3) { + if (!(((d1 ^ d2) & 0xabcd) == 0) && d3 == 10 && d1 == d2) +return 0; + return 1; +} + +int cmp2_and_inter_64(uint64_t d1, uint64_t d2, uint64_t d3, uint64_t d4) { + if (!(((d1 ^ d2) & 0xabcd) == 0) && d3 == 10 && d1 == d2 && d4 != 11) +return 0; + return 1; +} + +/* The if should be removed, so the condition should not exist */ +/* { dg-final { scan-tree-dump-not "d1_\[0-9\]+.D. \\^ d2_\[0-9\]+.D." "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or.c b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or.c index 99e83d8e5aae..e5dc98e7541d 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -fdump-tree-optimized --param logical-op-non-short-circuit=1" } */ +/* { dg-options "-O3 -fdump-tree-optimized" } */ typedef unsigned long int uint64_t; diff --git a/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and.c b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and.c new file mode 100644 index ..9957ef27dc70 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and.c @@ -0,0 +1,55 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-optimized" } */ + +typedef unsigned long int uint64_t; + +int cmp1(int d1, int d2) { + if (!((d1 ^ d2) == 0xabcd) && d1 == d2) +return 0; + return 1; +} + +int cmp2(int d1, int d2) { + if (d1 == d2 && !((d1 ^ d2) == 0xabcd)) +return 0; + return 1; +} + +int cmp3(int d1, int d2)
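The identity behind the transformation can be checked in isolation (my sketch, separate from the tests above): on any path where the a != b disjunct is false we have a == b, hence a ^ b is zero, so the first comparison can be evaluated with 0 in place of the masked XOR:

#include <cassert>

bool orig (int a, int b, int c)   { return ((a ^ b) & 0xabcd) == c || a != b; }
bool folded (int a, int b, int c) { return (0 & 0xabcd) == c || a != b; }

int main ()
{
  for (int a = -4; a <= 4; ++a)
    for (int b = -4; b <= 4; ++b)
      for (int c = -4; c <= 4; ++c)
        assert (orig (a, b, c) == folded (a, b, c));
}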
[PATCH] RISC-V: Generate -mcpu and -mtune options from riscv-cores.def.
Automatically generate -mcpu and -mtune options in invoke.texi from the unified riscv-cores.def metadata, ensuring documentation stays in sync with definitions and reducing manual maintenance. gcc/ChangeLog: * Makefile.in: Add riscv-mcpu.texi and riscv-mtune.texi to the list of files to be processed by the Texinfo generator. * config/riscv/t-riscv: Add rule for generating riscv-mcpu.texi and riscv-mtune.texi. * doc/invoke.texi: Replace hand‑written extension table with `@include riscv-mcpu.texi` and `@include riscv-mtune.texi` to pull in auto‑generated entries. * config/riscv/gen-riscv-mcpu-texi.cc: New file. * config/riscv/gen-riscv-mtune-texi.cc: New file. * doc/riscv-mcpu.texi: New file. * doc/riscv-mtune.texi: New file. --- gcc/Makefile.in | 2 +- gcc/config/riscv/gen-riscv-mcpu-texi.cc | 43 +++ gcc/config/riscv/gen-riscv-mtune-texi.cc | 41 ++ gcc/config/riscv/t-riscv | 37 - gcc/doc/invoke.texi | 23 ++-- gcc/doc/riscv-mcpu.texi | 69 gcc/doc/riscv-mtune.texi | 59 7 files changed, 251 insertions(+), 23 deletions(-) create mode 100644 gcc/config/riscv/gen-riscv-mcpu-texi.cc create mode 100644 gcc/config/riscv/gen-riscv-mtune-texi.cc create mode 100644 gcc/doc/riscv-mcpu.texi create mode 100644 gcc/doc/riscv-mtune.texi diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 9535804f7fb5..2d5e3427550d 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -3710,7 +3710,7 @@ TEXI_GCC_FILES = gcc.texi gcc-common.texi gcc-vers.texi frontends.texi\ contribute.texi compat.texi funding.texi gnu.texi gpl_v3.texi \ fdl.texi contrib.texi cppenv.texi cppopts.texi avr-mmcu.texi \ implement-c.texi implement-cxx.texi gcov-tool.texi gcov-dump.texi \ -lto-dump.texi riscv-ext.texi +lto-dump.texi riscv-ext.texi riscv-mcpu.texi riscv-mtune.texi # we explicitly use $(srcdir)/doc/tm.texi here to avoid confusion with # the generated tm.texi; the latter might have a more recent timestamp, diff --git a/gcc/config/riscv/gen-riscv-mcpu-texi.cc b/gcc/config/riscv/gen-riscv-mcpu-texi.cc new file mode 100644 index ..980a1103e0f9 --- /dev/null +++ b/gcc/config/riscv/gen-riscv-mcpu-texi.cc @@ -0,0 +1,43 @@ +#include +#include +#include + +int +main () +{ + puts ("@c Copyright (C) 2025 Free Software Foundation, Inc."); + puts ("@c This is part of the GCC manual."); + puts ("@c For copying conditions, see the file gcc/doc/include/fdl.texi."); + puts (""); + puts ("@c This file is generated automatically using"); + puts ("@c gcc/config/riscv/gen-riscv-mcpu-texi.cc from:"); + puts ("@c gcc/config/riscv/riscv-cores.def"); + puts (""); + puts ("@c Please *DO NOT* edit manually."); + puts (""); + puts ("@samp{Core Name}"); + puts (""); + puts ("@opindex mcpu"); + puts ("@item -mcpu=@var{processor-string}"); + puts ("Use architecture of and optimize the output for the given processor, specified"); + puts ("by particular CPU name. 
Permissible values for this option are:"); + puts (""); + puts (""); + + std::vector coreNames; + +#define RISCV_CORE(CORE_NAME, ARCH, MICRO_ARCH) \ + coreNames.push_back (#CORE_NAME); +#include "riscv-cores.def" +#undef RISCV_CORE + + for (size_t i = 0; i < coreNames.size(); ++i) { +if (i == coreNames.size() - 1) { + printf("@samp{%s}.\n", coreNames[i].c_str()); +} else { + printf("@samp{%s},\n\n", coreNames[i].c_str()); +} + } + + return 0; +} diff --git a/gcc/config/riscv/gen-riscv-mtune-texi.cc b/gcc/config/riscv/gen-riscv-mtune-texi.cc new file mode 100644 index ..0c30b524895e --- /dev/null +++ b/gcc/config/riscv/gen-riscv-mtune-texi.cc @@ -0,0 +1,41 @@ +#include +#include +#include + +int +main () +{ + puts ("@c Copyright (C) 2025 Free Software Foundation, Inc."); + puts ("@c This is part of the GCC manual."); + puts ("@c For copying conditions, see the file gcc/doc/include/fdl.texi."); + puts (""); + puts ("@c This file is generated automatically using"); + puts ("@c gcc/config/riscv/gen-riscv-mtune-texi.cc from:"); + puts ("@c gcc/config/riscv/riscv-cores.def"); + puts (""); + puts ("@c Please *DO NOT* edit manually."); + puts (""); + puts ("@samp{Tune Name}"); + puts (""); + puts ("@opindex mtune"); + puts ("@item -mtune=@var{processor-string}"); + puts ("Optimize the output for the given processor, specified by microarchitecture or"); + puts ("particular CPU name. Permissible values for this option are:"); + puts (""); + puts (""); + + std::vector tuneNames; + +#define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO) \ + tuneNames.push_back (#TUNE_NAME); +#include "riscv-cores.def" +#undef RISCV_TUNE + + for (size_t i = 0; i < tuneNames.size(); ++i) { +printf("@samp{%s},\n\n
[PATCH] c++/modules: Ensure type of partial spec VAR_DECL is consistent with its template [PR120644]
On Tue, Jun 24, 2025 at 12:10:09PM -0400, Patrick Palka wrote: > On Tue, 24 Jun 2025, Jason Merrill wrote: > > > On 6/23/25 5:41 PM, Nathaniel Shead wrote: > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk/15? > > > > > > -- >8 -- > > > > > > We were erroring because the TEMPLATE_DECL of the existing partial > > > specialisation has an undeduced return type, but the imported > > > declaration did not. > > > > > > The root cause is similar to what was fixed in r13-2744-g4fac53d6522189, > > > where modules streaming code assumes that a TEMPLATE_DECL and its > > > DECL_TEMPLATE_RESULT will always have the same TREE_TYPE. That commit > > > fixed the issue by ensuring that when the type of a variable is deduced > > > the TEMPLATE_DECL is updated as well, but this missed handling partial > > > specialisations. > > > > > > However, I don't think we actually care about that, since it seems that > > > only the type of the inner decl actually matters in practice. Instead, > > > this patch handles the issue on the modules side when deduping a > > > streamed decl, by only comparing the inner type. > > > > > > PR c++/120644 > > > > > > gcc/cp/ChangeLog: > > > > > > * decl.cc (cp_finish_decl): Remove workaround. > > > > Hmm, if we aren't going to try to keep the type of the TEMPLATE_DECL > > correct, > > maybe we should always set it to NULL_TREE to make sure we only look at the > > inner type. > > FWIW cp_finish_decl can get at the TEMPLATE_DECL of a VAR_DECL > corresponding to a partial specialization via > > TI_TEMPLATE (TI_PARTIAL_INFO (DECL_TEMPLATE_INFO (decl))) > > if we do want to end up keeping the two TREE_TYPEs in sync. > Thanks. On further reflection, maybe the safest approach is to just ensure that the types are always consistent (including for partial specs); this is what the following patch does. Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? -- >8 -- Subject: [PATCH] c++/modules: Ensure type of partial spec VAR_DECL is consistent with its template [PR120644] We were erroring because the TEMPLATE_DECL of the existing partial specialisation has an undeduced return type, but the imported declaration did not. The root cause is similar to what was fixed in r13-2744-g4fac53d6522189, where modules streaming code assumes that a TEMPLATE_DECL and its DECL_TEMPLATE_RESULT will always have the same TREE_TYPE. That commit fixed the issue by ensuring that when the type of a variable is deduced the TEMPLATE_DECL is updated as well, but missed handling partial specialisations. This patch ensures that the same adjustment is made there as well. PR c++/120644 gcc/cp/ChangeLog: * decl.cc (cp_finish_decl): Also propagate type to partial templates. * module.cc (trees_out::decl_value): Add assertion that the TREE_TYPE of a streamed template decl matches its inner. (trees_in::is_matching_decl): Clarify function return type deduction should only occur for non-TEMPLATE_DECL. gcc/testsuite/ChangeLog: * g++.dg/modules/auto-7.h: New test. * g++.dg/modules/auto-7_a.H: New test. * g++.dg/modules/auto-7_b.C: New test. 
Signed-off-by: Nathaniel Shead Reviewed-by: Jason Merrill Reviewed-by: Patrick Palka --- gcc/cp/decl.cc | 13 + gcc/cp/module.cc| 7 ++- gcc/testsuite/g++.dg/modules/auto-7.h | 12 gcc/testsuite/g++.dg/modules/auto-7_a.H | 5 + gcc/testsuite/g++.dg/modules/auto-7_b.C | 5 + 5 files changed, 37 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/g++.dg/modules/auto-7.h create mode 100644 gcc/testsuite/g++.dg/modules/auto-7_a.H create mode 100644 gcc/testsuite/g++.dg/modules/auto-7_b.C diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc index 4fe97ffbf8f..59701197e16 100644 --- a/gcc/cp/decl.cc +++ b/gcc/cp/decl.cc @@ -8923,10 +8923,15 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p, cp_apply_type_quals_to_decl (cp_type_quals (type), decl); /* Update the type of the corresponding TEMPLATE_DECL to match. */ - if (DECL_LANG_SPECIFIC (decl) - && DECL_TEMPLATE_INFO (decl) - && DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (decl)) == decl) - TREE_TYPE (DECL_TI_TEMPLATE (decl)) = type; + if (DECL_LANG_SPECIFIC (decl) && DECL_TEMPLATE_INFO (decl)) + { + tree info = DECL_TEMPLATE_INFO (decl); + tree tmpl = TI_TEMPLATE (info); + if (DECL_TEMPLATE_RESULT (tmpl) == decl) + TREE_TYPE (tmpl) = type; + else if (PRIMARY_TEMPLATE_P (tmpl) && TI_PARTIAL_INFO (info)) + TREE_TYPE (TI_TEMPLATE (TI_PARTIAL_INFO (info))) = type; + } } if (ensure_literal_type_for_constexpr_object (decl) == error_mark_node) diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index c99988da05b..53edb2ff203 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -8212,6 +8212,10 @@ tr
[PATCH 02/17] Mark pass_sccopy gate and execute functions as final override
Hi, It is customary to mark the gate and execute functions of the classes representing passes as final override but this is missing in pass_sccopy. This patch adds it which also silences clang warnings about it. Bootstrapped and tested on x86_64-linux. Because of the precedent elsewhere I consider this obvious and will commit it shortly. Thanks, Martin gcc/ChangeLog: 2025-06-24 Martin Jambor * gimple-ssa-sccopy.cc (class pass_sccopy): Mark member functions gate and execute as final override. --- gcc/gimple-ssa-sccopy.cc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc index c93374572a9..341bae46080 100644 --- a/gcc/gimple-ssa-sccopy.cc +++ b/gcc/gimple-ssa-sccopy.cc @@ -699,8 +699,8 @@ public: {} /* opt_pass methods: */ - virtual bool gate (function *) { return true; } - virtual unsigned int execute (function *); + virtual bool gate (function *) final override { return true; } + virtual unsigned int execute (function *) final override; opt_pass * clone () final override { return new pass_sccopy (m_ctxt); } }; // class pass_sccopy -- 2.49.0
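As a generic illustration of why the markers are worth having (plain C++, hypothetical names, not the real opt_pass interface): 'override' makes the compiler reject accidental signature mismatches and 'final' forbids further overriding, which is what keeps clang's suggestion warnings quiet here.

#include <cstdio>

struct pass_base                        /* hypothetical stand-in for opt_pass */
{
  virtual ~pass_base () = default;
  virtual bool gate (void *) { return true; }
  virtual unsigned int execute (void *) = 0;
};

struct my_pass : pass_base
{
  /* 'final override' documents the intent and turns a typo in the signature
     into a hard error instead of silently introducing a new virtual.  */
  bool gate (void *) final override { return true; }
  unsigned int execute (void *) final override { return 0; }
};

int
main ()
{
  my_pass p;
  pass_base &b = p;
  std::printf ("%u\n", b.execute (nullptr));
}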
[PATCH 11/17] tree-vect-stmts.cc: Remove an unused shadowed variable
Hi, when compiling tree-vect-stmts.cc with clang, it emits a warning: gcc/tree-vect-stmts.cc:14930:19: warning: unused variable 'mode_iter' [-Wunused-variable] And indeed, there are two mode_iter local variables in function supportable_indirect_convert_operation and the first one is not used at all. This patch removes it. Bootstrapped and tested on x86_64-linux. OK for master? Thanks, Martin gcc/ChangeLog: 2025-06-24 Martin Jambor * tree-vect-stmts.cc (supportable_indirect_convert_operation): Remove an unused shadowed variable. --- gcc/tree-vect-stmts.cc | 1 - 1 file changed, 1 deletion(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index f699d808e68..652c590e553 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -14927,7 +14927,6 @@ supportable_indirect_convert_operation (code_helper code, bool found_mode = false; scalar_mode lhs_mode = GET_MODE_INNER (TYPE_MODE (vectype_out)); scalar_mode rhs_mode = GET_MODE_INNER (TYPE_MODE (vectype_in)); - opt_scalar_mode mode_iter; tree_code tc1, tc2, code1, code2; tree cvt_type = NULL_TREE; -- 2.49.0
[PATCH 09/17] jit: Silence clang warning in jit-builtins.cc
Hi, When compiling GCC (with JIT enabled) by clang, it produces a series of warnings like this for all uses of DEF_GOACC_BUILTIN_COMPILER and DEF_GOMP_BUILTIN_COMPILER in omp-builtins.def: -- In file included from /home/worker/buildworker/tiber-gcc-clang/build/gcc/jit/jit-builtins.cc:61: In file included from /home/worker/buildworker/tiber-gcc-clang/build/gcc/builtins.def:1276: /home/worker/buildworker/tiber-gcc-clang/build/gcc/omp-builtins.def:55:1: warning: non-constant-expression cannot be narrowed from type 'int' to 'bool' in initializer list [-Wc++11-narrowing] 55 | DEF_GOACC_BUILTIN_COMPILER (BUILT_IN_ACC_ON_DEVICE, "acc_on_device", | ^~~~ 56 | BT_FN_INT_INT, ATTR_CONST_NOTHROW_LEAF_LIST) | /home/worker/buildworker/tiber-gcc-clang/build/gcc/builtins.def:225:9: note: expanded from macro 'DEF_GOACC_BUILTIN_COMPILER' 225 |flag_openacc, true, true, ATTRS, false, true) |^~~~ ./options.h:7049:22: note: expanded from macro 'flag_openacc' 7049 | #define flag_openacc global_options.x_flag_openacc | ^ /home/worker/buildworker/tiber-gcc-clang/build/gcc/jit/jit-builtins.cc:58:23: note: expanded from macro 'DEF_BUILTIN' 58 | {NAME, CLASS, TYPE, BOTH_P, FALLBACK_P, ATTRS, IMPLICIT}, | ^~ /home/worker/buildworker/tiber-gcc-clang/build/gcc/omp-builtins.def:55:1: note: insert an explicit cast to silence this issue -- I'm not sure to what extent this is an actual problem or not, but flag_openacc is an int and we do store it in a bool, so this patch adds the explicit cast clang asks for. Bootstrapped and tested on x86_64-linux. OK for master? Alternatively, as with all of these clang warning issues, I'm perfectly happy to add an entry to contrib/filter-clang-warnings.py to ignore the warnings instead. Thanks, Martin gcc/jit/ChangeLog: 2025-06-23 Martin Jambor * jit-builtins.cc (DEF_BUILTIN): Add explicit cast to bool of BOTH_P. --- gcc/jit/jit-builtins.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/jit/jit-builtins.cc b/gcc/jit/jit-builtins.cc index 84e0bd5347f..ddbba55d3f3 100644 --- a/gcc/jit/jit-builtins.cc +++ b/gcc/jit/jit-builtins.cc @@ -55,7 +55,7 @@ struct builtin_data #define DEF_BUILTIN(X, NAME, CLASS, TYPE, LT, BOTH_P, FALLBACK_P, \ NONANSI_P, ATTRS, IMPLICIT, COND) \ - {NAME, CLASS, TYPE, BOTH_P, FALLBACK_P, ATTRS, IMPLICIT}, + {NAME, CLASS, TYPE, (bool) BOTH_P, FALLBACK_P, ATTRS, IMPLICIT}, static const struct builtin_data builtin_data[] = { #include "builtins.def" -- 2.49.0
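A reduced stand-alone reproduction of the diagnostic and of the fix (the names below are made up; only the int-to-bool aggregate initialization mirrors the jit-builtins.cc situation, and depending on compiler and flags the narrowing may be a warning or an error):

struct builtin_data_like
{
  const char *name;
  bool both_p;                   /* initialised from an 'int' flag below */
};

int flag_like;                   /* stands in for global_options.x_flag_openacc */

builtin_data_like bad  = { "acc_on_device", flag_like };         /* clang: -Wc++11-narrowing */
builtin_data_like good = { "acc_on_device", (bool) flag_like };  /* explicit cast, no warning */

int main () { return bad.both_p || good.both_p; }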
[PATCH 12/17] Silence a clang warning in tree-vect-slp.cc about an unused variable
Hi, since r15-4695-gd17e672ce82e69 (Richard Biener: Assert finished vectorizer pattern COND_EXPR transition), the static const array cond_expr_maps is unused and when GCC is compiled with clang, it warns about that. This patch simply removes the variable. Bootstrapped and tested on x86_64-linux. OK for master? Alternatively, as with all of these clang warning issues, I'm perfectly happy to add an entry to contrib/filter-clang-warnings.py to ignore the warning instead. Thanks, Martin gcc/ChangeLog: 2025-06-24 Martin Jambor * tree-vect-slp.cc (cond_expr_maps): Remove. --- gcc/tree-vect-slp.cc | 5 - 1 file changed, 5 deletions(-) diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc index dc89da3bf17..39692ea9465 100644 --- a/gcc/tree-vect-slp.cc +++ b/gcc/tree-vect-slp.cc @@ -507,11 +507,6 @@ vect_def_types_match (enum vect_def_type dta, enum vect_def_type dtb) && (dtb == vect_external_def || dtb == vect_constant_def))); } -static const int cond_expr_maps[3][5] = { - { 4, -1, -2, 1, 2 }, - { 4, -2, -1, 1, 2 }, - { 4, -1, -2, 2, 1 } -}; static const int no_arg_map[] = { 0 }; static const int arg0_map[] = { 1, 0 }; static const int arg1_map[] = { 1, 1 }; -- 2.49.0
[PATCH 06/17] value-relation.h: Mark dom_oracle::next_relation as override
Hi, When GCC is compiled with clang, it emits a warning that dom_oracle::next_relation is not marked as override even though it does override a virtual function of its ancestor. This patch marks it as such to silence the warning and for the sake of consistency. There are other member functions in the class which are marked as final override but this particular function is in the protected section so I decided to just mark it as override. Bootstrapped and tested on x86_64-linux. OK for master? Alternatively, as with all of these clang warning issues, I'm perfectly happy to add an entry to contrib/filter-clang-warnings.py to ignore the warning instead. Thanks, Martin gcc/ChangeLog: 2025-06-24 Martin Jambor * value-relation.h (class dom_oracle): Mark member function next_relation as override. --- gcc/value-relation.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/value-relation.h b/gcc/value-relation.h index 1081877ccca..87f0d856fab 100644 --- a/gcc/value-relation.h +++ b/gcc/value-relation.h @@ -235,7 +235,7 @@ public: void dump (FILE *f) const final override; protected: virtual relation_chain *next_relation (basic_block, relation_chain *, -tree) const; +tree) const override; bool m_do_trans_p; bitmap m_tmp, m_tmp2; bitmap m_relation_set; // Index by ssa-name. True if a relation exists -- 2.49.0
[PATCH 14/17] c-format: Removed unused private member
Hi, when building GCC with clang, it warns that the private member m_wanted_type in class element_expected_type_with_indirection (defined in gcc/c-family/c-format.cc) is not used, which indeed looks to be the case. This patch therefore removes it. Bootstrapped and tested on x86_64-linux. OK for master? Alternatively, as with all of these clang warning issues, I'm perfectly happy to add an entry to contrib/filter-clang-warnings.py to ignore the warning instead. Thanks, Martin gcc/c-family/ChangeLog: 2025-06-24 Martin Jambor * c-format.cc (class element_expected_type_with_indirection): Remove member m_wanted_type. --- gcc/c-family/c-format.cc | 1 - 1 file changed, 1 deletion(-) diff --git a/gcc/c-family/c-format.cc b/gcc/c-family/c-format.cc index a44249a0222..1fdda3faaf5 100644 --- a/gcc/c-family/c-format.cc +++ b/gcc/c-family/c-format.cc @@ -4817,7 +4817,6 @@ public: private: const char *m_wanted_type_name; - tree m_wanted_type; int m_pointer_count; }; -- 2.49.0
[COMMITTED] - get_bitmask is sometimes less refined.
While looking at something else, I decided to write some self-tests for the bound-snapping changes. Along the way, I discovered a couple of things. This patch has the self tests, and they tripped over an issue with get_bitmask (). get_bitmask () takes the current mask, and intersect it with a mask derived from the lower and upper bounds, giving us useful results. However, when the 2 masks are incompatible, it was returning bitmask_unknown, which is akin to a VARYING result. It has no way of communicating an UNDEFINED result, which would be more appropriate. Instead, it should just return the original mask. Any undefined results will show up eventually when set_range_from_bitmask () is called. This patch provides the updated get_bitmask as well as all the self tests. Bootstraps on x86_64-pc-linux-gnu with no regressions. Pushed. Andrew From 5ae33c8f44f0112644b561dfc549c1dc2c679b6f Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Tue, 24 Jun 2025 13:10:56 -0400 Subject: [PATCH 1/3] get_bitmask is sometimes less refined. get_bitmask intersects the current mask with a mask generated from the range. If the 2 masks are incompatible, it currently returns UNKNOWN. Instead, ti should return the original mask or information is lost. * value-range.cc (irange::get_bitmask): Return original mask if result is unknown. (assert_snap_result): New. (test_irange_snap_bounds): New. (range_tests_misc): Call test_irange_snap_bounds. --- gcc/value-range.cc | 117 - 1 file changed, 116 insertions(+), 1 deletion(-) diff --git a/gcc/value-range.cc b/gcc/value-range.cc index 23a5c66ed5e..85c1e26287e 100644 --- a/gcc/value-range.cc +++ b/gcc/value-range.cc @@ -2513,7 +2513,13 @@ irange::get_bitmask () const // See also the note in irange_bitmask::intersect. irange_bitmask bm (type (), lower_bound (), upper_bound ()); if (!m_bitmask.unknown_p ()) -bm.intersect (m_bitmask); +{ + bm.intersect (m_bitmask); + // If the new intersection is unknown, it means there are inconstent + // bits, so simply return the original bitmask. + if (bm.unknown_p ()) + return m_bitmask; +} return bm; } @@ -2879,6 +2885,112 @@ range_tests_strict_enum () ASSERT_FALSE (ir1.varying_p ()); } +// Test that range bounds are "snapped" to where they are expected to be. + +static void +assert_snap_result (int lb_val, int ub_val, + int expected_lb, int expected_ub, + unsigned mask_val, unsigned value_val, + tree type) +{ + wide_int lb = wi::shwi (lb_val, TYPE_PRECISION (type)); + wide_int ub = wi::shwi (ub_val, TYPE_PRECISION (type)); + wide_int new_lb, new_ub; + + irange_bitmask bm (wi::uhwi (value_val, TYPE_PRECISION (type)), + wi::uhwi (mask_val, TYPE_PRECISION (type))); + + int_range_max r (type); + r.set (type, lb, ub); + r.update_bitmask (bm); + + if (TYPE_SIGN (type) == SIGNED && expected_ub < expected_lb) +gcc_checking_assert (r.undefined_p ()); + else if (TYPE_SIGN (type) == UNSIGNED + && ((unsigned)expected_ub < (unsigned)expected_lb)) +gcc_checking_assert (r.undefined_p ()); + else +{ + gcc_checking_assert (wi::eq_p (r.lower_bound (), + wi::shwi (expected_lb, + TYPE_PRECISION (type; + gcc_checking_assert (wi::eq_p (r.upper_bound (), + wi::shwi (expected_ub, + TYPE_PRECISION (type; +} +} + + +// Run a selection of tests that confirm, bounds are snapped as expected. +// We only test individual pairs, multiple pairs use the same snapping +// routine as single pairs. 
+ +static void +test_irange_snap_bounds () +{ + tree u32 = unsigned_type_node; + tree s32 = integer_type_node; + tree s8 = build_nonstandard_integer_type (8, /*unsigned=*/ 0); + tree s1 = build_nonstandard_integer_type (1, /*unsigned=*/ 0); + tree u1 = build_nonstandard_integer_type (1, /*unsigned=*/ 1); + + // Basic aligned range: even-only + assert_snap_result (5, 15, 6, 14, 0xFFFE, 0x0, u32); + // Singleton that doesn't match mask: undefined. + assert_snap_result (7, 7, 1, 0, 0xFFFE, 0x0, u32); + // 8-bit signed char, mask 0xF0 (i.e. step of 16). + assert_snap_result (-100, 100, -96, 96, 0xF0, 0x00, s8); + // Already aligned range: no change. + assert_snap_result (0, 240, 0, 240, 0xF0, 0x00, u32); + // Negative range, step 16 alignment (s32). + assert_snap_result (-123, -17, -112, -32, 0xFFF0, 0x00, s32); + // Negative range, step 16 alignment (trailing-zero aligned mask). + assert_snap_result (-123, -17, -112, -32, 0xFFF0, 0x00, s32); + // s8, 16-alignment mask, value = 0 (valid). + assert_snap_result (-50, 10, -48, 0, 0xF0, 0x00, s8); + // No values in range [-3,2] match alignment except 0. + assert_snap_result (-3, 2, 0, 0, 0xF8, 0x00, s8); + // No values in range [-3,2] match alignment — undefined. + assert_snap_result (-3, 2, 1, 0, 0xF8, 0x04, s8); + // Already aligned range: no change. + assert_snap_result (0,
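The mask/value pairs used above follow the usual known-bits convention: a set mask bit means "unknown", a clear one means "known and equal to the corresponding value bit". A toy version of the intersection step that get_bitmask relies on, including the incompatible case the patch is about (this is only a sketch of the idea, not GCC's irange_bitmask implementation):

#include <cstdint>
#include <cstdio>
#include <optional>

struct bits { uint32_t value, mask; };   /* mask bit 1 = unknown */

std::optional<bits> intersect (bits a, bits b)
{
  uint32_t known_a = ~a.mask, known_b = ~b.mask;
  /* A bit known in both inputs but with different values has no consistent
     result; per the patch the caller should then keep its original mask.  */
  if (known_a & known_b & (a.value ^ b.value))
    return std::nullopt;
  return bits { (a.value & known_a) | (b.value & known_b), a.mask & b.mask };
}

int
main ()
{
  bits even = { 0x0, 0xFFFFFFFEu };   /* low bit known to be zero */
  bits seven = { 0x7, 0x0 };          /* all bits known: the constant 7 */
  std::printf ("%d\n", (int) intersect (even, seven).has_value ());  /* 0: incompatible */
}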
[COMMITTED] Promote verify_range to vrange.
Another thing I noticed is that verifying a range outside of private constraints was actually quite difficult. Most range classes had a verify_range () routine, but they were private, not constant, and impossible to invoke if we were in a situation where all we had was a vrange. This patch promotes verify_range () to a public call in vrange, makes it virtual, provides a hook in value-range, and ensures it just works everywhere. There is no current need for the call, but it sure would have been handy earlier in the week. Let's just make it consistent across all range classes. Bootstraps on x86_64-pc-linux-gnu with no regressions. Pushed. Andrew From 8213212eba1cad976823716c0c4ba835c842d0b2 Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Thu, 19 Jun 2025 21:19:27 -0400 Subject: [PATCH 2/3] Promote verify_range to vrange. Most range classes had a verify_range, but it was all private. Make it a supported routine from vrange. * value-range.cc (frange::verify_range): Constify. (irange::verify_range): Constify. * value-range.h (vrange::verify_range): New. (irange::verify_range): Make public. (prange::verify_range): Make public. (frange::verify_range): Make public. (value_range::verify_range): New. --- gcc/value-range.cc | 4 ++-- gcc/value-range.h | 9 + 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/gcc/value-range.cc b/gcc/value-range.cc index 85c1e26287e..dc6909e77c5 100644 --- a/gcc/value-range.cc +++ b/gcc/value-range.cc @@ -1205,7 +1205,7 @@ frange::supports_type_p (const_tree type) const } void -frange::verify_range () +frange::verify_range () const { if (!undefined_p ()) gcc_checking_assert (HONOR_NANS (m_type) || !maybe_isnan ()); @@ -1515,7 +1515,7 @@ irange::set (tree min, tree max, value_range_kind kind) // Check the validity of the range. void -irange::verify_range () +irange::verify_range () const { gcc_checking_assert (m_discriminator == VR_IRANGE); if (m_kind == VR_UNDEFINED) diff --git a/gcc/value-range.h b/gcc/value-range.h index c32c5076b63..5c358f3c70c 100644 --- a/gcc/value-range.h +++ b/gcc/value-range.h @@ -111,6 +111,7 @@ public: bool operator== (const vrange &) const; bool operator!= (const vrange &r) const { return !(*this == r); } void dump (FILE *) const; + virtual void verify_range () const { } protected: vrange (enum value_range_discriminator d) : m_discriminator (d) { } ENUM_BITFIELD(value_range_kind) m_kind : 8; @@ -323,6 +324,7 @@ public: virtual void update_bitmask (const class irange_bitmask &) override; virtual irange_bitmask get_bitmask () const override; + virtual void verify_range () const; protected: void maybe_resize (int needed); virtual void set (tree, tree, value_range_kind = VR_RANGE) override; @@ -335,7 +337,6 @@ protected: void normalize_kind (); - void verify_range (); // Hard limit on max ranges allowed. 
static const int HARD_MAX_RANGES = 255; @@ -421,7 +422,7 @@ public: bool contains_p (const wide_int &) const; wide_int lower_bound () const; wide_int upper_bound () const; - void verify_range () const; + virtual void verify_range () const; irange_bitmask get_bitmask () const final override; void update_bitmask (const irange_bitmask &) final override; protected: @@ -593,14 +594,13 @@ public: bool nan_signbit_p (bool &signbit) const; bool known_isnormal () const; bool known_isdenormal_or_zero () const; - + virtual void verify_range () const; protected: virtual bool contains_p (tree cst) const override; virtual void set (tree, tree, value_range_kind = VR_RANGE) override; private: bool internal_singleton_p (REAL_VALUE_TYPE * = NULL) const; - void verify_range (); bool normalize_kind (); bool union_nans (const frange &); bool intersect_nans (const frange &); @@ -798,6 +798,7 @@ public: void update_bitmask (const class irange_bitmask &bm) { return m_vrange->update_bitmask (bm); } void accept (const vrange_visitor &v) const { m_vrange->accept (v); } + void verify_range () const { m_vrange->verify_range (); } private: void init (tree type); void init (const vrange &); -- 2.45.0
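The shape of the change, reduced to a stand-alone sketch (hypothetical names, not the real class hierarchy): the base class gains a public virtual verifier with an empty default body, so generic code can ask any range to check itself even when all it has is a base reference.

#include <cassert>

class vrange_like
{
public:
  virtual ~vrange_like () = default;
  virtual void verify_range () const { }          /* default: nothing to check */
};

class irange_like : public vrange_like
{
public:
  int lb = 0, ub = 10;
  virtual void verify_range () const override { assert (lb <= ub); }
};

void
check (const vrange_like &r)
{
  r.verify_range ();    /* dispatches to the most derived verifier */
}

int main () { irange_like r; check (r); return 0; }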
Re: [PATCH] vect: Misalign checks for gather/scatter.
On Wed, 25 Jun 2025, Robin Dapp wrote: > Hi, > > this patch adds simple misalignment checks for gather/scatter > operations. Previously, we assumed that those perform element accesses > internally so alignment does not matter. The riscv vector spec however > explicitly states that vector operations are allowed to fault on > element-misaligned accesses. Reasonable uarchs won't, but... > > For gather/scatter we have two paths in the vectorizer: > > (1) Regular analysis based on datarefs. Here we can also create > strided loads. > (2) Non-affine access where each gather index is relative to the > initial address. > > The assumption this patch works off is that once the alignment for the > first scalar is correct, all others will fall in line, as the index is > always a multiple of the first element's size. > > For (1) we have a dataref and can check it for alignment as in other > cases. For (2) this patch checks the object alignment of BASE and > compares it against the natural alignment of the current vectype's unit. > > The patch also adds a pointer argument to the gather/scatter IFNs that > contains the necessary alignment. Most of the patch is thus mechanical > in that it merely adjusts indices. > > I tested the riscv version with a custom qemu version that faults on > element-misaligned vector accesses. With this patch applied, there is > just a single fault left, which is due to PR120782 and which will be > addressed separately. > > Is the general approach reasonable or do we need to do something else > entirely? Bootstrap and regtest on aarch64 went fine. > > I couldn't bootstrap/regtest on x86 as my regular cfarm machines > (420-422) are currently down. Issues are expected, though, as the patch > doesn't touch x86's old-style gathers/scatters at all yet. I still > wanted to get this initial version out there to get feedback. > > The two riscv-specific changes I can still split off, obviously. > Also, I couldn't help but do tiny refactoring in some spots :) This > could also go if requested. > > I noticed one early-break failure with the changes where we would give > up on a load_permutation of {0}. It looks latent and probably > unintended but I didn't investigate for now and just allowed this > specific permutation. This change reminds me that we lack documentation about arguments of most of the "complicated" internal functions ... We miss internal_fn_gatherscatter_{offset,scale}_index and possibly a internal_fn_ldst_ptr_index (always zero?) and internal_fn_ldst_alias_align_index (always one, if supported?). if (elsvals && icode != CODE_FOR_nothing) get_supported_else_vals - (icode, internal_fn_else_index (IFN_MASK_GATHER_LOAD) + 1, *elsvals); + (icode, internal_fn_else_index (IFN_MASK_GATHER_LOAD), *elsvals); these "fixes" seem to be independent? + /* TODO: Is IS_PACKED necessary/useful here or does get_obj_alignment + suffice? */ + bool is_packed = not_size_aligned (DR_REF (dr)); + info->align_ptr = build_int_cst +(reference_alias_ptr_type (DR_REF (dr)), + is_packed ? 1 : get_object_alignment (DR_REF (dr))); I think get_object_alignment should be sufficient. + gs_info->align_ptr = build_int_cst + (reference_alias_ptr_type (DR_REF (dr)), DR_BASE_ALIGNMENT (dr)); why's this? If DR_BASE_ALIGNMENT is bigger than element alignment it could be possibly not apply to all loads forming the gather? 
@@ -2411,8 +2413,7 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, || *memory_access_type == VMAT_CONTIGUOUS_REVERSE) *poffset = neg_ldst_offset; - if (*memory_access_type == VMAT_GATHER_SCATTER - || *memory_access_type == VMAT_ELEMENTWISE + if (*memory_access_type == VMAT_ELEMENTWISE this probably needs some refactoring with the adjustments you do in get_load_store_type given a few lines above we can end up classifying a load/store as VMAT_GATHER_SCATTER if vect_use_strided_gather_scatters_p. But then you'd use the wrong alignment analysis going forward. + bool is_misaligned = scalar_align < inner_vectype_sz; + bool is_packed = scalar_align > 1 && is_misaligned; + + *misalignment = !is_misaligned ? 0 : inner_vectype_sz - scalar_align; + + if (targetm.vectorize.support_vector_misalignment + (TYPE_MODE (vectype), inner_vectype, *misalignment, is_packed)) the misalignment argument is meaningless, I think you want to pass DR_MISALIGNMENT_UNKNOWN for this and just pass is_packed if the scalars acesses are not at least size aligned. Note the hook really doesn't know whether you ask it for gather/scatter or a contiguous vector load so I wonder whether the above fits constraints on other platforms where scalar accesses might be allowed to be packed but all unaligned vector accesses would need to be element aligned? + /* The alignment_ptr of the base. */ The TBAA alias pointer type where the value determines the alignment of the scalar accesses. + tree
Re: [PATCH 1/2] Match: Support for signed scalar SAT_ADD IMM form 2
On Tue, Jun 24, 2025 at 5:12 AM Ciyan Pan wrote: > > From: panciyan > > This patch would like to support signed scalar SAT_ADD IMM form 2 > > Form2: > T __attribute__((noinline)) \ > sat_s_add_imm_##T##_fmt_2##_##INDEX (T x)\ > {\ > T sum = (T)((UT)x + (UT)IMM); \ > return ((x ^ sum) < 0 && (x ^ IMM) >= 0) ? \ > (-(T)(x < 0) ^ MAX) : sum; \ > } > > Take below form1 as example: > DEF_SAT_S_ADD_IMM_FMT_2(0, int8_t, uint8_t, 9, INT8_MIN, INT8_MAX) > > Before this patch: > __attribute__((noinline)) > int8_t sat_s_add_imm_int8_t_fmt_2_0 (int8_t x) > { > int8_t sum; > unsigned char x.0_1; > unsigned char _2; > signed char _3; > signed char _4; > _Bool _5; > signed char _6; > int8_t _7; > int8_t _10; > signed char _11; > signed char _13; > signed char _14; > >[local count: 1073741822]: > x.0_1 = (unsigned char) x_8(D); > _2 = x.0_1 + 9; > sum_9 = (int8_t) _2; > _3 = x_8(D) ^ sum_9; > _4 = x_8(D) ^ 9; > _13 = ~_3; > _14 = _4 | _13; > if (_14 >= 0) > goto ; [59.00%] > else > goto ; [41.00%] > >[local count: 259738146]: > _5 = x_8(D) < 0; > _11 = (signed char) _5; > _6 = -_11; > _10 = _6 ^ 127; > >[local count: 1073741824]: > # _7 = PHI > return _7; > > } > > After this patch: > __attribute__((noinline)) > int8_t sat_s_add_imm_int8_t_fmt_2_0 (int8_t x) > { > int8_t _7; > >[local count: 1073741824]: > _7 = .SAT_ADD (x_8(D), 9); [tail call] > return _7; > > } > > The below test suites are passed for this patch: > 1. The rv64gcv fully regression tests. > 2. The x86 bootstrap tests. > 3. The x86 fully regression tests. > > Signed-off-by: Ciyan Pan > gcc/ChangeLog: > > * match.pd: OK with sth filled in here. Richard. > > --- > gcc/match.pd | 13 - > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/gcc/match.pd b/gcc/match.pd > index f4416d9172c..10c2b97f494 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -3500,7 +3500,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > wide_int c2 = wi::to_wide (@2); > wide_int sum = wi::add (c1, c2); > } > -(if (wi::eq_p (sum, wi::max_value (precision, SIGNED))) > +(if (wi::eq_p (sum, wi::max_value (precision, SIGNED)) > + > +(match (signed_integer_sat_add @0 @1) > + /* T SUM = (T)((UT)X + (UT)IMM) > + SAT_S_ADD = (X ^ SUM) < 0 && (X ^ IMM) >= 0 ? (-(T)(X < 0) ^ MAX) : SUM > */ > + (cond^ (ge (bit_ior:c (bit_xor:c @0 INTEGER_CST@1) > + (bit_not (bit_xor:c @0 (nop_convert@2 (plus > (nop_convert @0) > + INTEGER_CST@3) > + integer_zerop) > + (signed_integer_sat_val @0) > + @2) > + (if (wi::eq_p (wi::to_wide (@1), wi::to_wide (@3)) > > /* Saturation sub for signed integer. */ > (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)) > -- > 2.43.0 >
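For reference, this is roughly what one instantiation of the Form2 macro above expands to; after the match.pd addition the whole body should be recognised as .SAT_ADD (x, 9) on targets that provide the optab (written out by hand from the template in the mail, so a sketch rather than the actual testsuite code):

#include <cstdint>

int8_t __attribute__ ((noinline))
sat_s_add_imm_int8_t_fmt_2_0 (int8_t x)
{
  /* T sum = (T)((UT)x + (UT)IMM);  */
  int8_t sum = (int8_t) ((uint8_t) x + (uint8_t) 9);
  /* Overflow happened iff the sign flipped while x and IMM had the same
     sign; saturate towards INT8_MIN/INT8_MAX accordingly.  */
  return ((x ^ sum) < 0 && (x ^ 9) >= 0) ? (-(int8_t) (x < 0) ^ INT8_MAX) : sum;
}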
[PATCH 01/17] Mark rtl_avoid_store_forwarding functions final override
Hi, It is customary to mark the gate and execute functions of the classes representing passes as final override but this is missing in pass_rtl_avoid_store_forwarding. This patch adds it which also silences a clang warning about it. Bootstrapped and tested on x86_64-linux. Because of the precedent elsewhere I consider this obvious and will commit it shortly. Thanks, Martin gcc/ChangeLog: 2025-06-24 Martin Jambor * avoid-store-forwarding.cc (class pass_rtl_avoid_store_forwarding): Mark member function gate as final override. --- gcc/avoid-store-forwarding.cc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc index 6825d0426ec..37e095316c9 100644 --- a/gcc/avoid-store-forwarding.cc +++ b/gcc/avoid-store-forwarding.cc @@ -80,12 +80,12 @@ public: {} /* opt_pass methods: */ - virtual bool gate (function *) + virtual bool gate (function *) final override { return flag_avoid_store_forwarding && optimize >= 1; } - virtual unsigned int execute (function *) override; + virtual unsigned int execute (function *) final override; }; // class pass_rtl_avoid_store_forwarding /* Handler for finding and avoiding store forwardings. */ -- 2.49.0
[PATCH 03/17] Diagnostics: Mark path_label::get_effects as final override
Hi, When compiling diagnostic-path-output.cc with clang, it warns that path_label::get_effects should be marked as override. That looks like a good idea, and from a brief look I also believe it should be marked as final (the other override in the class is marked as both), so this patch does that. Likewise for html_output_format::after_diagnostic in diagnostic-format-html.cc which also already has quite a few member functions marked as final override. Bootstrapped and tested on x86_64-linux. OK for master? Alternatively, as with all of these clang warning issues, I'm perfectly happy to add an entry to contrib/filter-clang-warnings.py to ignore the warning(s) instead. Thanks, Martin gcc/ChangeLog: 2025-06-24 Martin Jambor * diagnostic-path-output.cc (path_label::get_effects): Mark as final override. * diagnostic-format-html.cc (html_output_format::after_diagnostic): Likewise. --- gcc/diagnostic-format-html.cc | 2 +- gcc/diagnostic-path-output.cc | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/diagnostic-format-html.cc b/gcc/diagnostic-format-html.cc index 45d088150dd..b2c7214d7f1 100644 --- a/gcc/diagnostic-format-html.cc +++ b/gcc/diagnostic-format-html.cc @@ -1201,7 +1201,7 @@ public: { m_builder.emit_diagram (diagram); } - void after_diagnostic (const diagnostic_info &) + void after_diagnostic (const diagnostic_info &) final override { /* No-op, but perhaps could show paths here. */ } diff --git a/gcc/diagnostic-path-output.cc b/gcc/diagnostic-path-output.cc index bae24bf01a7..4bec3a66267 100644 --- a/gcc/diagnostic-path-output.cc +++ b/gcc/diagnostic-path-output.cc @@ -135,7 +135,7 @@ class path_label : public range_label return result; } - const label_effects *get_effects (unsigned /*range_idx*/) const + const label_effects *get_effects (unsigned /*range_idx*/) const final override { return &m_effects; } -- 2.49.0
[PATCH 05/17] tree-ssa-propagate.h: Mark two functions as override
When tree-ssa-propagate.h is compiled with clang, it complains that member functions value_of_expr and range_of_expr of class substitute_and_fold_engine are not marked as override even though they do override virtual functions of the ancestor class. This patch merely adds the keyword to silence the warning and for consistency's sake. I did not make this part of the previous patch because I wanted to point out that the first case is quite unusual: a virtual function with a functional body (range_query::value_of_expr) is being overridden with a pure virtual function. I assume it was a conscious decision but adding the override keyword seems even more important, then. Bootstrapped and tested on x86_64-linux. OK for master? Alternatively, as with all of these clang warning issues, I'm perfectly happy to add an entry to contrib/filter-clang-warnings.py to ignore the warning(s) instead. Thanks, Martin gcc/ChangeLog: 2025-06-24 Martin Jambor * tree-ssa-propagate.h (class substitute_and_fold_engine): Mark member functions value_of_expr and range_of_expr as override. --- gcc/tree-ssa-propagate.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/tree-ssa-propagate.h b/gcc/tree-ssa-propagate.h index 8429e38f40e..200fc732079 100644 --- a/gcc/tree-ssa-propagate.h +++ b/gcc/tree-ssa-propagate.h @@ -102,10 +102,10 @@ class substitute_and_fold_engine : public range_query substitute_and_fold_engine (bool fold_all_stmts = false) : fold_all_stmts (fold_all_stmts) { } - virtual tree value_of_expr (tree expr, gimple * = NULL) = 0; + virtual tree value_of_expr (tree expr, gimple * = NULL) override = 0; virtual tree value_on_edge (edge, tree expr) override; virtual tree value_of_stmt (gimple *, tree name = NULL) override; - virtual bool range_of_expr (vrange &r, tree expr, gimple * = NULL); + virtual bool range_of_expr (vrange &r, tree expr, gimple * = NULL) override; virtual ~substitute_and_fold_engine (void) { } virtual bool fold_stmt (gimple_stmt_iterator *) { return false; } -- 2.49.0
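The "unusual" case mentioned above is well-formed C++: a pure virtual may override a virtual that has a body, which makes the derived class abstract again while still allowing overriders further down to call the base definition. A minimal sketch with generic names, unrelated to range_query:

struct Base
{
  virtual ~Base () = default;
  virtual int value () { return 0; }            /* has a body */
};

struct Engine : Base
{
  virtual int value () override = 0;            /* pure, but still an override */
};

struct Impl : Engine
{
  int value () override { return Base::value () + 1; }
};

int main () { Impl i; return i.value () == 1 ? 0 : 1; }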
[PATCH v3] Evaluate the object size by the size of the pointee type when the type is a structure with flexible array member which is annotated with counted_by.
Hi, This is the 3rd version of the patch for: Evaluate the object size by the size of the pointee type when the type is a structure with flexible array member which is annotated with counted_by. Compared to the 2nd version of the patch at: https://gcc.gnu.org/pipermail/gcc-patches/2025-May/682923.html The major changes include: A. Add a new --param objsz-allow-dereference-input=0|1 to control this feature; B. Some code reorg to the routine "insert_cond_and_size" to make it more readable. The patch has been bootstrapped and regression tested on both x86 and aarch64. Okay for trunk? thanks. Qing === In tree-object-size.cc, if the size is UNKNOWN after evaluating use-def chain, We can evaluate the SIZE of the pointee TYPE ONLY when this TYPE is a structure type with flexible array member which is attached a counted_by attribute, since a structure with FAM can not be an element of an array, so, the pointer must point to a single object with this structure with FAM. Control this behavior with a new --param objsz-allow-dereference-input=0|1 Default is 0. This is only available for C now. gcc/c/ChangeLog: * c-lang.cc (LANG_HOOKS_BUILD_COUNTED_BY_REF): Define to below function. * c-tree.h (c_build_counted_by_ref): New extern function. * c-typeck.cc (build_counted_by_ref): Rename to ... (c_build_counted_by_ref): ...this. (handle_counted_by_for_component_ref): Call the renamed function. gcc/ChangeLog: * doc/invoke.texi: Add documentation for the new option --param objsz-allow-dereference-input. * langhooks-def.h (LANG_HOOKS_BUILD_COUNTED_BY_REF): New language hook. * langhooks.h (struct lang_hooks_for_types): Add build_counted_by_ref. * params.opt: New param objsz-allow-dereference-input. * tree-object-size.cc (struct object_size_info): Add a new field insert_cf. (insert_cond_and_size): New function. (gimplify_size_expressions): Handle new field insert_cf. (compute_builtin_object_size): Init the new field to false; (is_pointee_fam_struct_with_counted_by): New function. (record_with_fam_object_size): New function. (collect_object_sizes_for): Call record_with_fam_object_size. (dynamic_object_sizes_execute_one): Special handling for insert_cf. gcc/testsuite/ChangeLog: * gcc.dg/flex-array-counted-by-3.c: Update test for whole object size; * gcc.dg/flex-array-counted-by-4.c: Likewise. * gcc.dg/flex-array-counted-by-5.c: Likewise. * gcc.dg/flex-array-counted-by-10.c: New test. --- gcc/c/c-lang.cc | 3 + gcc/c/c-tree.h| 1 + gcc/c/c-typeck.cc | 6 +- gcc/doc/invoke.texi | 13 + gcc/langhooks-def.h | 4 +- gcc/langhooks.h | 5 + gcc/params.opt| 4 + .../gcc.dg/flex-array-counted-by-10.c | 41 +++ .../gcc.dg/flex-array-counted-by-3.c | 7 +- .../gcc.dg/flex-array-counted-by-4.c | 36 ++- .../gcc.dg/flex-array-counted-by-5.c | 6 +- gcc/tree-object-size.cc | 305 +- 12 files changed, 406 insertions(+), 25 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-10.c diff --git a/gcc/c/c-lang.cc b/gcc/c/c-lang.cc index c69077b2a93..e9ec9e6e64a 100644 --- a/gcc/c/c-lang.cc +++ b/gcc/c/c-lang.cc @@ -51,6 +51,9 @@ enum c_language_kind c_language = clk_c; #undef LANG_HOOKS_GET_SARIF_SOURCE_LANGUAGE #define LANG_HOOKS_GET_SARIF_SOURCE_LANGUAGE c_get_sarif_source_language +#undef LANG_HOOKS_BUILD_COUNTED_BY_REF +#define LANG_HOOKS_BUILD_COUNTED_BY_REF c_build_counted_by_ref + /* Each front end provides its own lang hook initializer. 
*/ struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER; diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h index 364f51df58c..627791551b4 100644 --- a/gcc/c/c-tree.h +++ b/gcc/c/c-tree.h @@ -777,6 +777,7 @@ extern struct c_switch *c_switch_stack; extern bool null_pointer_constant_p (const_tree); +extern tree c_build_counted_by_ref (tree, tree, tree *); inline bool c_type_variably_modified_p (tree t) diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc index e24629be918..44031ca1ae3 100644 --- a/gcc/c/c-typeck.cc +++ b/gcc/c/c-typeck.cc @@ -2940,8 +2940,8 @@ should_suggest_deref_p (tree datum_type) &(p->k) */ -static tree -build_counted_by_ref (tree datum, tree subdatum, tree *counted_by_type) +tree +c_build_counted_by_ref (tree datum, tree subdatum, tree *counted_by_type) { tree type = TREE_TYPE (datum); if (!c_flexible_array_member_type_p (TREE_TYPE (subdatum))) @@ -3039,7 +3039,7 @@ handle_counted_by_for_component_ref (location_t loc, tree ref) tree datum = TR
[PATCH 10/17] rust: Silence a clang warning in borrow-checker-diagnostics
Hi, when compiling gcc/rust/checks/errors/borrowck/rust-borrow-checker-diagnostics.cc with clang, it emits the following warning: gcc/rust/checks/errors/borrowck/rust-borrow-checker-diagnostics.cc:145:46: warning: non-constant-expression cannot be narrowed from type 'Polonius::Loan' (aka 'unsigned long') to 'uint32_t' (aka 'unsigned int') in initializer list [-Wc++11-narrowing] I'd hope that for indexing that is never really a problem, nevertheless if narrowing is taking place, I guess it can be argued it should be made explicit. I have so far only tested this with the clang compile, I will try to do a bootstrap with rust-enabled too. Philip, Pierre, would you be willing to incorporate this into your tree and commit it to master at gcc.gnu.org from there? Or should I commit it to master at gcc.gnu.org and you'll merge it from there? Thanks, Martin gcc/rust/ChangeLog: 2025-06-23 Martin Jambor * checks/errors/borrowck/rust-borrow-checker-diagnostics.cc (BorrowCheckerDiagnostics::get_loan): Type cast loan to uint32_t. --- .../checks/errors/borrowck/rust-borrow-checker-diagnostics.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/rust/checks/errors/borrowck/rust-borrow-checker-diagnostics.cc b/gcc/rust/checks/errors/borrowck/rust-borrow-checker-diagnostics.cc index 6c67706780b..adf1448791e 100644 --- a/gcc/rust/checks/errors/borrowck/rust-borrow-checker-diagnostics.cc +++ b/gcc/rust/checks/errors/borrowck/rust-borrow-checker-diagnostics.cc @@ -142,7 +142,7 @@ BorrowCheckerDiagnostics::get_statement (Polonius::Point point) const BIR::Loan & BorrowCheckerDiagnostics::get_loan (Polonius::Loan loan) { - return bir_function.place_db.get_loans ()[{loan}]; + return bir_function.place_db.get_loans ()[{(uint32_t) loan}]; } const HIR::LifetimeParam * -- 2.49.0
[PATCH 08/17] ranger-op: Use CFN_ constant instead of plain BUILTIN_ one
Hi, when compiling gimple-range-op.cc, clang issues a warning: gimple-range-op.cc:1419:18: warning: comparison of different enumeration types in switch statement ('combined_fn' and 'built_in_function') [-Wenum-compare-switch] which I hope is harmless, but all other switch cases use CFN_ prefixed constants, so I guess the ISINF case should too. Bootstrapped and tested on x86_64-linux. OK for master? Alternatively, as with all of these clang warning issues, I'm perfectly happy to add an entry to contrib/filter-clang-warnings.py to ignore the warning instead. Thanks, Martin gcc/ChangeLog: 2025-06-23 Martin Jambor * gimple-range-op.cc (gimple_range_op_handler::maybe_builtin_call): Use CFN_BUILT_IN_ISINF instead of BUILT_IN_ISINF. --- gcc/gimple-range-op.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc index 90a61971489..c9bc5c0c6b9 100644 --- a/gcc/gimple-range-op.cc +++ b/gcc/gimple-range-op.cc @@ -1416,7 +1416,7 @@ gimple_range_op_handler::maybe_builtin_call () m_operator = &op_cfn_signbit; break; -CASE_FLT_FN (BUILT_IN_ISINF): +CASE_FLT_FN (CFN_BUILT_IN_ISINF): m_op1 = gimple_call_arg (call, 0); m_operator = &op_cfn_isinf; break; -- 2.49.0
[PATCH 07/17] gfortran: Avoid freeing uninitialized value
Hi, When compiling fortran/match.cc, clang emits a warning: fortran/match.cc:5301:7: warning: variable 'p' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized] which looks accurate, so this patch adds an initialization of p to avoid the use. Bootstrapped and tested on x86_64-linux. OK for master? Thanks, Martin gcc/fortran/ChangeLog: 2025-06-23 Martin Jambor * match.cc (gfc_match_nullify): Initialize p to NULL. --- gcc/fortran/match.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/fortran/match.cc b/gcc/fortran/match.cc index a99a757bede..2e5ba29d9a4 100644 --- a/gcc/fortran/match.cc +++ b/gcc/fortran/match.cc @@ -5293,7 +5293,7 @@ match gfc_match_nullify (void) { gfc_code *tail; - gfc_expr *e, *p; + gfc_expr *e, *p = NULL; match m; tail = NULL; -- 2.49.0
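The generic shape of what clang flags here, boiled down to a stand-alone example (made-up code, not the gfortran matcher): the pointer is only assigned on the path that does not bail out early, yet the cleanup path uses it unconditionally, so initialising it to a null value is the easy fix.

void
demo (bool early_error)
{
  int *p;                      /* the patch's fix: initialise to NULL/nullptr */
  if (early_error)
    goto cleanup;              /* p is never assigned on this path */
  p = new int (42);
cleanup:
  delete p;                    /* -Wsometimes-uninitialized: p may be garbage */
}

int main () { demo (false); return 0; }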
[PATCH 13/17] lto-ltrans-cache: Remove unused private member
Hi, when building GCC with clang, it warns that the private member prefix in class ltrans_file_cache (defined in lto-ltrans-cache.h) is not used, which indeed looks to be the case. This patch therefore removes it along with its initialization in the constructor. Bootstrapped and tested on x86_64-linux. OK for master? Alternatively, as with all of these clang warning issues, I'm perfectly happy to add an entry to contrib/filter-clang-warnings.py to ignore the warning instead. Thanks, Martin gcc/ChangeLog: 2025-06-24 Martin Jambor * lto-ltrans-cache.h (class ltrans_file_cache): Remove member prefix. * lto-ltrans-cache.cc (ltrans_file_cache::ltrans_file_cache): Do not initialize member prefix. --- gcc/lto-ltrans-cache.cc | 3 +-- gcc/lto-ltrans-cache.h | 3 +-- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/gcc/lto-ltrans-cache.cc b/gcc/lto-ltrans-cache.cc index c57775fae85..91af6ed6f82 100644 --- a/gcc/lto-ltrans-cache.cc +++ b/gcc/lto-ltrans-cache.cc @@ -210,8 +210,7 @@ write_cache_item (FILE* f, ltrans_file_cache::item *item, const char* dir) ltrans_file_cache::ltrans_file_cache (const char* dir, const char* prefix, const char* suffix, size_t soft_cache_size): - dir (dir), prefix (prefix), suffix (suffix), - soft_cache_size (soft_cache_size) + dir (dir), suffix (suffix), soft_cache_size (soft_cache_size) { if (!dir) return; diff --git a/gcc/lto-ltrans-cache.h b/gcc/lto-ltrans-cache.h index 5fef44bae53..fdb7a389435 100644 --- a/gcc/lto-ltrans-cache.h +++ b/gcc/lto-ltrans-cache.h @@ -122,8 +122,7 @@ private: std::map map_checksum; std::map map_input; - /* Cached filenames are in format "prefix%d[.ltrans]suffix". */ - const char* prefix; + /* Cached filenames are in format "cache_prefix%d[.ltrans]suffix". */ const char* suffix; /* If cache items count is larger, prune deletes old items. */ -- 2.49.0
[PATCH 17/17] Ignore more clang warnings in contrib/filter-clang-warnings.py
Hi, in contrib we have a script filter-clang-warnings.py which supposedly filters out uninteresting warnings emitted by clang when it compiles GCC. I'm not sure if anyone else uses it but our internal SUSE testing infrastructure does. Since Martin Liška left, I have mostly ignored the warnings and so they have multiplied. In an effort to improve the situation, I have tried to fix those warnings which I think are worth it and would like to adjust the filtering script so that we get to zero "interesting" warnings again. The changes are the following: 1. Ignore -Woverloaded-shift-op-parentheses warnings. IIUC, those make some sense when << and >> are used for I/O but since that is not the case in GCC they are not really interesting. 2. Ignore -Wunused-function and -Wunneeded-internal-declaration. I think it is OK to occasionally prepare APIs before they are used (and with our LTO we should be able to get rid of them). 3. Ignore -Wvla-cxx-extension and -Wunused-command-line-argument which just don't seem to be useful. 4. Ignore -Wunused-private-field warning in diagnostic-path-output.cc which can only be correct if quite a few functions are removed and looks like it is just not an oversight: gcc/diagnostic-path-output.cc:271:35: warning: private field 'm_logical_loc_mgr' is not used [-Wunused-private-field] 5. Ignore a case in -Wunused-but-set-variable about named_args which is used in a piece of code behind an ifdef. 6. Adjust the gimple-match and generic-match filters to the fact that we now have multiple such files. 7. Ignore warnings about using memcpy to copy around wide_ints, like the one below. I seem to remember wide-int has undergone fairly rigorous review and TBH I just hope I know what we are doing. gcc/wide-int.h:1198:11: warning: first argument in call to 'memcpy' is a pointer to non-trivially copyable type 'wide_int_storage' [-Wnontrivial-memcall] 8. I have decided to ignore warnings in m2/gm2-compiler-boot about unused stuff (all reported unused stuff are variables). These sources are in the build directory so I assume they are somehow generated and so warnings about unused things are a bit expected and probably not too bad. 9. On the Zulip chat, I have informed Rust folks they have a bunch of -Wunused-private-field cases in the FE. Until they sort it out I'm ignoring these. I might add the missing explicit type-cast case here too if it takes time for the patch I'm posting in this series to reach master. 10. I ignore warning about use of offsetof in libiberty/sha1.c which is apparently only a "C23 extension:" libiberty/sha1.c:239:11: warning: defining a type within 'offsetof' is a C23 extension [-Wc23-extensions] libiberty/sha1.c:460:11: warning: defining a type within 'offsetof' is a C23 extension [-Wc23-extensions] 11. I have enlarged the list of .texi files where warnings somehow got reported. Not sure why that happens. 12. I'm ignoring the -Wunused-const-variable case in value-relation.cc until Andrew commits the patch he has to remove it. With these changes and my other patches, we reach zero interesting warnings. Since I don't think anyone else uses the script, I'm would like to declare these changes "obvious" in the sense that they are obviously useful for me and obviously nobody else will mind or even be affected. I'm going to hold off for a week though, please let me know if I'm stretching the obvious rule too much here. 
Thanks, Martin contrib/ChangeLog: 2025-06-25 Martin Jambor * filter-clang-warnings.py (skip_warning): Also ignore -Woverloaded-shift-op-parentheses, -Wunused-function, -Wunneeded-internal-declaration, -Wvla-cxx-extension', and -Wunused-command-line-argument everywhere and a warning about m_logical_loc_mgr in diagnostic-path-output.cc. Adjust gimple-match and generic-match "filenames." Ignore -Wunused-const-variable warnings in value-relation.cc, -Wnontrivial-memcall warnings in wide-int.h, all warnings about unused stuff in files under m2/gm2-compiler-boot, all -Wunused-private-field in rust FE, all Warnings in avr-mmcu.texi, install.texi and libgccjit.texi and all -Wc23-extensions warnings in libiberty/sha1.c. --- contrib/filter-clang-warnings.py | 24 +--- 1 file changed, 21 insertions(+), 3 deletions(-) diff --git a/contrib/filter-clang-warnings.py b/contrib/filter-clang-warnings.py index 2ea7c710163..f0f7885d26d 100755 --- a/contrib/filter-clang-warnings.py +++ b/contrib/filter-clang-warnings.py @@ -41,12 +41,22 @@ def skip_warning(filename, message): '-Wignored-attributes', '-Wgnu-zero-variadic-macro-arguments', '-Wformat-security', '-Wundefined-internal', '-Wunknown-warning-option', '-Wc++20-extensions', - '-Wbitwise-instead-of-logical', 'egrep is obsole
Re: [PATCH 3/4] c++/modules: Support streaming new size cookie for constexpr [PR120040]
On 5/21/25 10:15 PM, Nathaniel Shead wrote: This type currently has a DECL_NAME of an IDENTIFIER_DECL. Although the documentation indicates this is legal, this confuses modules streaming which expects all RECORD_TYPEs to have a TYPE_DECL, which is used to determine the context and merge key, etc. PR c++/120040 gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_constant_expression): Handle TYPE_NAME now being a TYPE_DECL rather than just an IDENTIFIER_NODE. * init.cc (build_new_constexpr_heap_type): Build a TYPE_DECL for the returned type; mark the type as artificial. * module.cc (trees_out::type_node): Add some assertions. gcc/testsuite/ChangeLog: * g++.dg/modules/pr120040_a.C: New test. * g++.dg/modules/pr120040_b.C: New test. Signed-off-by: Nathaniel Shead --- gcc/cp/constexpr.cc | 2 +- gcc/cp/init.cc| 10 +- gcc/cp/module.cc | 3 +++ gcc/testsuite/g++.dg/modules/pr120040_a.C | 19 +++ gcc/testsuite/g++.dg/modules/pr120040_b.C | 15 +++ 5 files changed, 47 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/g++.dg/modules/pr120040_a.C create mode 100644 gcc/testsuite/g++.dg/modules/pr120040_b.C diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc index fa754b9a176..ceb8f04fab4 100644 --- a/gcc/cp/constexpr.cc +++ b/gcc/cp/constexpr.cc @@ -8613,7 +8613,7 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t, tree cookie_size = NULL_TREE; tree arg_size = NULL_TREE; if (TREE_CODE (elt_type) == RECORD_TYPE - && TYPE_NAME (elt_type) == heap_identifier) + && DECL_NAME (TYPE_NAME (elt_type)) == heap_identifier) This could be TYPE_IDENTIFIER. OK either way. { tree fld1 = TYPE_FIELDS (elt_type); tree fld2 = DECL_CHAIN (fld1); diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc index 80a37a14a80..0a389fb6ecd 100644 --- a/gcc/cp/init.cc +++ b/gcc/cp/init.cc @@ -3010,7 +3010,6 @@ build_new_constexpr_heap_type (tree elt_type, tree cookie_size, tree itype2) tree atype1 = build_cplus_array_type (sizetype, itype1); tree atype2 = build_cplus_array_type (elt_type, itype2); tree rtype = cxx_make_type (RECORD_TYPE); - TYPE_NAME (rtype) = heap_identifier; tree fld1 = build_decl (UNKNOWN_LOCATION, FIELD_DECL, NULL_TREE, atype1); tree fld2 = build_decl (UNKNOWN_LOCATION, FIELD_DECL, NULL_TREE, atype2); DECL_FIELD_CONTEXT (fld1) = rtype; @@ -3019,7 +3018,16 @@ build_new_constexpr_heap_type (tree elt_type, tree cookie_size, tree itype2) DECL_ARTIFICIAL (fld2) = true; TYPE_FIELDS (rtype) = fld1; DECL_CHAIN (fld1) = fld2; + TYPE_ARTIFICIAL (rtype) = true; layout_type (rtype); + + tree decl = build_decl (UNKNOWN_LOCATION, TYPE_DECL, heap_identifier, rtype); + TYPE_NAME (rtype) = decl; + TYPE_STUB_DECL (rtype) = decl; + DECL_CONTEXT (decl) = NULL_TREE; + DECL_ARTIFICIAL (decl) = true; + layout_decl (decl, 0); + return rtype; } diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index ddb5299b244..765d17935c5 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -9362,6 +9362,7 @@ trees_out::type_node (tree type) tree root = (TYPE_NAME (type) ? 
TREE_TYPE (TYPE_NAME (type)) : TYPE_MAIN_VARIANT (type)); + gcc_checking_assert (root); if (type != root) { @@ -9440,6 +9441,8 @@ trees_out::type_node (tree type) || TREE_CODE (type) == UNION_TYPE || TREE_CODE (type) == ENUMERAL_TYPE) { + gcc_checking_assert (DECL_P (name)); + /* We can meet template parms that we didn't meet in the tpl_parms walk, because we're referring to a derived type that was previously constructed from equivalent template diff --git a/gcc/testsuite/g++.dg/modules/pr120040_a.C b/gcc/testsuite/g++.dg/modules/pr120040_a.C new file mode 100644 index 000..77e16892f4e --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/pr120040_a.C @@ -0,0 +1,19 @@ +// PR c++/120040 +// { dg-additional-options "-fmodules -std=c++20" } +// { dg-module-cmi M } + +export module M; + +struct S { + constexpr ~S() {} +}; + +export constexpr bool foo() { + S* a = new S[3]; + delete[] a; + return true; +} + +export constexpr S* bar() { + return new S[3]; +} diff --git a/gcc/testsuite/g++.dg/modules/pr120040_b.C b/gcc/testsuite/g++.dg/modules/pr120040_b.C new file mode 100644 index 000..e4610b07eaf --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/pr120040_b.C @@ -0,0 +1,15 @@ +// PR c++/120040 +// { dg-additional-options "-fmodules -std=c++20" } + +import M; + +constexpr bool qux() { + auto* s = bar(); + delete[] s; + return true; +} + +int main() { + static_assert(foo()); + static_assert(qux()); +}
[committed v2] libstdc++: Report compilation error on formatting "%d" from month_last [PR120650]
For month_day we incorrectly reported day information to be available, which lead to format_error being thrown from the call to formatter::format at runtime, instead of making call to format ill-formed. The included test cover most of the combinations of _ChronoParts and format specifiers. PR libstdc++/120650 libstdc++-v3/ChangeLog: * include/bits/chrono_io.h (formatter::parse): Call _M_parse with only Month being available. * testsuite/std/time/format/data_not_present_neg.cc: New test. --- v2 adds "{ target cxx11_abi }" to dg-errors for types supported only in cxx11_abi. Test on x86_64-linux, and std/time/format* tested with -D_GLIBCXX_USE_CXX11_ABI=0. Pushed to trunk. libstdc++-v3/include/bits/chrono_io.h | 3 +- .../std/time/format/data_not_present_neg.cc | 164 ++ 2 files changed, 165 insertions(+), 2 deletions(-) create mode 100644 libstdc++-v3/testsuite/std/time/format/data_not_present_neg.cc diff --git a/libstdc++-v3/include/bits/chrono_io.h b/libstdc++-v3/include/bits/chrono_io.h index abbf4efcc3b..4eb00f4932d 100644 --- a/libstdc++-v3/include/bits/chrono_io.h +++ b/libstdc++-v3/include/bits/chrono_io.h @@ -2199,8 +2199,7 @@ namespace __format constexpr typename basic_format_parse_context<_CharT>::iterator parse(basic_format_parse_context<_CharT>& __pc) { - return _M_f._M_parse(__pc, __format::_Month|__format::_Day, -__defSpec); + return _M_f._M_parse(__pc, __format::_Month, __defSpec); } template diff --git a/libstdc++-v3/testsuite/std/time/format/data_not_present_neg.cc b/libstdc++-v3/testsuite/std/time/format/data_not_present_neg.cc new file mode 100644 index 000..bb09451dc29 --- /dev/null +++ b/libstdc++-v3/testsuite/std/time/format/data_not_present_neg.cc @@ -0,0 +1,164 @@ +// { dg-do compile { target c++20 } } + +#include +#include + +using namespace std::chrono; + +auto d1 = std::format("{:%w}", 10d); // { dg-error "call to consteval function" } +auto d2 = std::format("{:%m}", 10d); // { dg-error "call to consteval function" } +auto d3 = std::format("{:%y}", 10d); // { dg-error "call to consteval function" } +auto d4 = std::format("{:%F}", 10d); // { dg-error "call to consteval function" } +auto d5 = std::format("{:%T}", 10d); // { dg-error "call to consteval function" } +auto d6 = std::format("{:%Q}", 10d); // { dg-error "call to consteval function" } +auto d7 = std::format("{:%Z}", 10d); // { dg-error "call to consteval function" } + +auto w1 = std::format("{:%d}", Thursday); // { dg-error "call to consteval function" } +auto w2 = std::format("{:%m}", Thursday); // { dg-error "call to consteval function" } +auto w3 = std::format("{:%y}", Thursday); // { dg-error "call to consteval function" } +auto w4 = std::format("{:%F}", Thursday); // { dg-error "call to consteval function" } +auto w5 = std::format("{:%T}", Thursday); // { dg-error "call to consteval function" } +auto w6 = std::format("{:%Q}", Thursday); // { dg-error "call to consteval function" } +auto w7 = std::format("{:%Z}", Thursday); // { dg-error "call to consteval function" } + +auto wi1 = std::format("{:%d}", Thursday[2]); // { dg-error "call to consteval function" } +auto wi2 = std::format("{:%m}", Thursday[2]); // { dg-error "call to consteval function" } +auto wi3 = std::format("{:%y}", Thursday[2]); // { dg-error "call to consteval function" } +auto wi4 = std::format("{:%F}", Thursday[2]); // { dg-error "call to consteval function" } +auto wi5 = std::format("{:%T}", Thursday[2]); // { dg-error "call to consteval function" } +auto wi6 = std::format("{:%Q}", Thursday[2]); // { dg-error "call to consteval 
function" } +auto wi7 = std::format("{:%Z}", Thursday[2]); // { dg-error "call to consteval function" } + +auto wl1 = std::format("{:%d}", Thursday[last]); // { dg-error "call to consteval function" } +auto wl2 = std::format("{:%m}", Thursday[last]); // { dg-error "call to consteval function" } +auto wl3 = std::format("{:%y}", Thursday[last]); // { dg-error "call to consteval function" } +auto wl4 = std::format("{:%F}", Thursday[last]); // { dg-error "call to consteval function" } +auto wl5 = std::format("{:%T}", Thursday[last]); // { dg-error "call to consteval function" } +auto wl6 = std::format("{:%Q}", Thursday[last]); // { dg-error "call to consteval function" } +auto wl7 = std::format("{:%Z}", Thursday[last]); // { dg-error "call to consteval function" } + +auto m1 = std::format("{:%d}", January); // { dg-error "call to consteval function" } +auto m2 = std::format("{:%w}", January); // { dg-error "call to consteval function" } +auto m3 = std::format("{:%y}", January); // { dg-error "call to consteval function" } +auto m4 = std::format("{:%F}", January); // { dg-error "call to consteval function" } +auto m5 = std::format("{:%T}", January); // { dg-error "call to consteval function" } +auto m6 = std::format("{:%Q}", Janua
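For contrast, a couple of uses that remain well-formed, because the requested conversion specifiers only need data the formatted type actually provides (illustrative lines, not part of the committed test):

```cpp
#include <chrono>
#include <format>

// Month data is present in chrono::month and weekday data in
// chrono::weekday_indexed, so these compile and format at runtime.
auto ok1 = std::format("{:%m}", std::chrono::January);
auto ok2 = std::format("{:%a}", std::chrono::Thursday[2]);
```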
Re: [PATCH v6 8/9] AArch64: rules for CMPBR instructions
Richard Sandiford writes: > Karl Meakin writes: >> + "r")) >> + (label_ref (match_operand 2)) >> + (pc)))] >> + "TARGET_CMPBR" >> + "cb\\t%0, %1, %l2"; Sorry, for following up on myself, but: the pattern needs to handle far branches, in the same way as existing patterns do. That is: if (get_attr_far_branch (insn) == FAR_BRANCH_YES) return aarch64_gen_far_branch (...); else return "cb\\t%0, %1, %l2"; It would be good to have a test for this, e.g. by having an if-then-else in which the then and else blocks contain a series of 256(+) volatile stores, with the then and else storing to different volatile locations. Richard >> + [(set_attr "type" "branch") >> + (set (attr "length") >> +(if_then_else (and (ge (minus (match_dup 2) (pc)) >> + (const_int BRANCH_LEN_N_1Kib)) >> + (lt (minus (match_dup 2) (pc)) >> + (const_int BRANCH_LEN_P_1Kib))) >> + (const_int 4) >> + (const_int 8))) >> + (set (attr "far_branch") >> +(if_then_else (and (ge (minus (match_dup 2) (pc)) >> + (const_int BRANCH_LEN_N_1Kib)) >> + (lt (minus (match_dup 2) (pc)) >> + (const_int BRANCH_LEN_P_1Kib))) >> + (const_string "no") >> + (const_string "yes")))] >> +)
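For reference, a rough sketch of the kind of test suggested above; function and variable names are invented, and the dg directives/scan patterns one would add are omitted:

```cpp
// Each arm expands to 256 volatile stores, comfortably more than 1KiB of
// code, so the conditional branch over it falls outside the +/-1KiB CB
// range and the far-branch form (inverted CB over an unconditional B)
// must be emitted.
volatile int sink_then, sink_else;

#define STORE8(v) v = 0; v = 1; v = 2; v = 3; v = 4; v = 5; v = 6; v = 7;
#define STORE64(v) STORE8 (v) STORE8 (v) STORE8 (v) STORE8 (v) \
  STORE8 (v) STORE8 (v) STORE8 (v) STORE8 (v)
#define STORE256(v) STORE64 (v) STORE64 (v) STORE64 (v) STORE64 (v)

void
cmpbr_far_branch (int x, int y)
{
  if (x == y)
    {
      STORE256 (sink_then)
    }
  else
    {
      STORE256 (sink_else)
    }
}
```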
[PATCH] libstdc++: Type-erase chrono-data for formatting [PR110739]
This patch reworks the formatting of the chrono types so that they are all formatted in terms of the _ChronoData class, which includes all required fields. Populating each required field is performed in the formatter for the specific type, based on the chrono-spec used. To facilitate the above, _ChronoSpec now includes an additional _M_needed field that represents the chrono data referenced by the format spec (this value is also configured for __defSpec). This value differs from the value of __parts passed to _M_parse, which includes all fields that can be computed from the input (e.g. weekday_indexed can be computed for year_month_day). It is later used when filling _ChronoData, in particular by the _M_fill_* family of functions, to determine whether a given field needs to be set, and thus whether its value needs to be computed. As a consequence, the _ChronoParts enum was extended with additional values that allow more fine-grained identification:
* _TimeOfDay is separated into _HoursMinutesSeconds and _Subseconds,
* _TimeZone is separated into _ZoneAbbrev and _ZoneOffset,
* _LocalDays and _WeekdayIndex are defined and included in _Date,
* _Duration is removed, and instead _EpochUnits and _UnitSuffix are introduced.
Furthermore, to avoid name conflicts, _ChronoParts is now defined as an enum class, with additional operators that simplify its use. In addition to the fields that can be printed using the chrono-spec, _ChronoData stores:
* Total days in wall time (_M_ldays) and day of year (_M_day_of_year) - used for struct tm construction and for ISO calendar computation.
* Total seconds in wall time (_M_lseconds) - this value may differ from the sum of days, hours, minutes and seconds (e.g. see utc_time below). Included to allow future extensions, like printing total minutes.
* Total seconds since epoch (_M_eseconds) - differs from the above due to the offset. Again to be used for future extensions (e.g. %s as proposed in P2945R1).
* Subseconds - a count of attoseconds (10^(-18)); in addition to printing, it can be used to compute fractional hours and minutes.
For both total-seconds fields we use a single _TotalSeconds enumerator in _ChronoParts, which, when present in combination with _EpochUnits or _LocalDays, indicates that _M_eseconds (_EpochSeconds) or _M_lseconds (_LocalSeconds) is provided/required. To handle formatting of the time since epoch ('%Q'|_EpochUnits), we use the format_args mechanism: the result of +d.count() (see LWG4118) is type-erased via make_format_args into a local __arg_store that is later referenced by _M_ereps (_M_ereps.get(0)). To handle precision values, and in preparation for allowing users to configure them, we store the precision as the third element of _M_ereps (_M_ereps.get(2)); this allows a duration with precision to be printed using "{0:{2}}". For subseconds the precision is handled differently depending on the representation:
* for integral reps, the _M_subseconds value is used to determine the fractional value, and the precision is trimmed to 18 digits;
* for floating-point reps, _M_ereps stores a duration initialized with only the fractional seconds, which is later formatted with the precision.
Always using the _M_subseconds field for integral durations means that we do not use the formatter for user-defined durations that are considered to be integral (see the empty_spec.cc change). To avoid a potentially expensive computation of _M_subseconds, we make sure that _ChronoParts::_Subseconds is set only if subseconds are needed. In particular, we remove this flag for localized output in _M_parse.
Constructing the _M_ereps as described above is handled by __formatter_duration, which is then used to format the duration, hh_mm_ss and time_point specializations. This class also handles _UnitSuffix: the _M_units_suffix field is populated either with a predefined suffix (chrono::__detail::__units_suffix) or with one produced locally. Finally, the formatters for the types listed below contain type-specific logic:
* hh_mm_ss - we do not compute the total duration and seconds unless explicitly requested, as such a computation may overflow;
* utc_time - for times during a leap-second insertion, the _M_seconds field is increased to 60;
* __local_time_fmt - an exception is thrown if the zone offset (_ZoneOffset) or abbreviation (_ZoneAbbrev) is requested but the corresponding pointer is null; furthermore, a conversion from `char` to `wchar_t` is performed for the abbreviation if needed.
PR libstdc++/110739 libstdc++-v3/ChangeLog: * include/bits/chrono_io.h (__format::__no_timezone_available): Removed, replaced with separate throws in formatter for __local_time_fmt. (__format::_ChronoParts): Defined additional enumerators and declared as enum class. (__format::operator&(_ChronoParts, _ChronoParts)) (__format::operator&=(_ChronoParts&, _ChronoParts)) (__format::operator-(_ChronoParts, _ChronoParts)) (__format::operator-=(_ChronoParts&, _ChronoParts)) (__format::operator==(_ChronoParts, decltype(nullptr))) (_ChronoSpec::
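Since the description leans on _ChronoParts being an enum class with helper operators, here is a minimal sketch of what such bitmask-style operators typically look like; the enumerator values and operator bodies are assumptions for illustration, not the actual <bits/chrono_io.h> code:

```cpp
#include <type_traits>

enum class _ChronoParts : unsigned
{
  _None = 0,
  _Month = 1u << 0,
  _Day = 1u << 1,
  _Weekday = 1u << 2,
  // ... further parts elided ...
};

constexpr _ChronoParts
operator| (_ChronoParts lhs, _ChronoParts rhs)
{
  using U = std::underlying_type_t<_ChronoParts>;
  return _ChronoParts (U (lhs) | U (rhs));
}

constexpr _ChronoParts
operator& (_ChronoParts lhs, _ChronoParts rhs)
{
  using U = std::underlying_type_t<_ChronoParts>;
  return _ChronoParts (U (lhs) & U (rhs));
}

// "parts - flag", as used in the patch, clears the given bits.
constexpr _ChronoParts
operator- (_ChronoParts lhs, _ChronoParts rhs)
{
  using U = std::underlying_type_t<_ChronoParts>;
  return _ChronoParts (U (lhs) & ~U (rhs));
}

// Comparing against nullptr tests for "no parts requested".
constexpr bool
operator== (_ChronoParts parts, decltype(nullptr))
{ return parts == _ChronoParts::_None; }
```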
Re: [PATCH] rtl-ssa: Fix test condition for insn_info::has_been_deleted
Christoph Müllner writes: > On Tue, Jun 24, 2025 at 9:29 PM Richard Sandiford > wrote: >> >> Christoph Müllner writes: >> > insn_info::has_been_deleted () is documented to return true if an >> > instruction is deleted. Such instructions have their `volatile` bit set, >> > which can be tested via rtx_insn::deleted (). >> > >> > The current condition for insn_info::has_been_deleted () is: >> > * m_rtl is not NULL: this can't happen as no member of insn_info >> > changes this pointer. >> >> Yeah, it's invariant after creation, but it starts off null for some >> artificial instructions: >> >> // Return the underlying RTL insn. This instruction is null if is_phi () >> // or is_bb_end () are true. The instruction is a basic block note if >> // is_bb_head () is true. >> rtx_insn *rtl () const { return m_rtl; } >> >> So I think we should keep the null check. (But then is_call and is_jump >> should check m_rtl is nonnull too -- that's preapproved if you want to >> do it while you're here.) > > I have a tested patch for this, but I don't think that it would be sufficient, > as there are also other places to check for a NULL dereference: > * member-fns.inl: insn_info::uid -> what to return here? That one's ok, because m_rtl is nonnull whenever m_cost_or_uid >= 0. (m_cost_or_uid >= 0 is the test for whether something is a "real" instruction, in which case it always has an associated RTL insn.) > * internals.inl: insn_info::set_properties > * insns.cc: insn_info::calculate_cost Those two are ok because they're internal routines that are only reached when we already know that we're dealing with real instructions. > Ok, if I add NULL-checks there as well? I think just is_call and is_jump for now, since they're publicly-facing routines that don't assume any preconditions. Others might crop up later though... >> > * !INSN_P (m_rtl): this will likely fail for rtx_insn objects and >> > does not test the `volatile` bit. >> >> Because of the need to stage multiple simultaneous changes, rtl-ssa first >> uses set_insn_deleted to convert an insn to a NOTE_INSN_DELETED note, >> then uses remove_insn to remove the underlying instruction. It doesn't >> use delete_insn directly. The call to remove_insn is fairly recent; >> the original code just used set_insn_deleted, but not removing the notes >> caused trouble for later passes. >> >> The test was therefore supposed to be checking whether set_insn_deleted >> had been called. It should also have checked the note kind though. > > Thanks for the explanation. I missed the fact that set_insn_delete () is used. > Assuming that code using RTL-SSA will use the insn_change class, it makes > sense now. Ah, yeah, that's pretty much required, since otherwise things will get out of sync. > I'm converting the fold-mem-offsets pass to RTL-SSA (see PR117922). > And I ran into this issue because I've already converted the analysis > part to RTL-SSA, > but the code changes are still performed directly on the rtx_insn objects > (in do_commit_insn ()). I'll try to use RTL-SSA in do_commit_insn () as well, > which should also allow RTL-SSA to see the changes. Sounds good! Thanks for doing this. Richard
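For concreteness, the pre-approved is_call/is_jump tweak presumably amounts to something of this shape (a sketch against rtl-ssa/insns.h, not the actual change):

```cpp
// Sketch only: mirror the null check kept in has_been_deleted, so these
// publicly-facing predicates stay safe on artificial instructions
// (phis, bb ends) whose m_rtl is null.
bool is_call () const { return m_rtl && CALL_P (m_rtl); }
bool is_jump () const { return m_rtl && JUMP_P (m_rtl); }
```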
[Fortran, Patch, PR120711, v1] 1/(3) Fix out of bounds access in cleanup of array constructor
Hi all, attached patch fixes an out of bounds access in the clean up code of a concatenating array constructor. A fragment like list = [ list, something() ] lead to clean up using an offset (of the list array) that was manipulated in the loop copying the existing array elements and at the end pointing to one element past the list (after the concatenation). This fixes a 15-regression. Releases prior to 15 do not have the out of bounds access in the (non existing) clean up code. The have a memory leak instead. Regtested ok on x86_64-pc-linux-gnu / F41. Ok for mainline? The subject says, that there will be 3 patches. Only this one fixes the bug. The other fixes I found while hunting this issue and because they play in the general same area, I don't want to loose them. I therefore publish them in this context. Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de From 548bcaeff9b8c8d6bb670574883f7b02878e3221 Mon Sep 17 00:00:00 2001 From: Andre Vehreschild Date: Wed, 25 Jun 2025 09:12:35 +0200 Subject: [PATCH 1/3] Fortran: Fix out of bounds access in structure constructor's clean up [PR120711] A structure constructor's generated clean up code was using an offset variable, which was manipulated before the clean up was run leading to an out of bounds access. PR fortran/120711 gcc/fortran/ChangeLog: * trans-array.cc (gfc_trans_array_ctor_element): Store the value of the offset for reuse. gcc/testsuite/ChangeLog: * gfortran.dg/asan/array_constructor_1.f90: New test. --- gcc/fortran/trans-array.cc| 10 .../gfortran.dg/asan/array_constructor_1.f90 | 23 +++ 2 files changed, 29 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/asan/array_constructor_1.f90 diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc index 3d274439895..7be2d7b11a6 100644 --- a/gcc/fortran/trans-array.cc +++ b/gcc/fortran/trans-array.cc @@ -1991,14 +1991,17 @@ static void gfc_trans_array_ctor_element (stmtblock_t * pblock, tree desc, tree offset, gfc_se * se, gfc_expr * expr) { - tree tmp; + tree tmp, offset_eval; gfc_conv_expr (se, expr); /* Store the value. */ tmp = build_fold_indirect_ref_loc (input_location, gfc_conv_descriptor_data_get (desc)); - tmp = gfc_build_array_ref (tmp, offset, NULL); + /* The offset may change, so get its value now and use that to free memory. + */ + offset_eval = gfc_evaluate_now (offset, &se->pre); + tmp = gfc_build_array_ref (tmp, offset_eval, NULL); if (expr->expr_type == EXPR_FUNCTION && expr->ts.type == BT_DERIVED && expr->ts.u.derived->attr.alloc_comp) @@ -3150,8 +3153,7 @@ finish: the reference. */ if ((expr->ts.type == BT_DERIVED || expr->ts.type == BT_CLASS) && finalblock.head != NULL_TREE) -gfc_add_block_to_block (&loop->post, &finalblock); - +gfc_prepend_expr_to_block (&loop->post, finalblock.head); } diff --git a/gcc/testsuite/gfortran.dg/asan/array_constructor_1.f90 b/gcc/testsuite/gfortran.dg/asan/array_constructor_1.f90 new file mode 100644 index 000..45eafacd5a6 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/asan/array_constructor_1.f90 @@ -0,0 +1,23 @@ +!{ dg-do run } + +! Contributed by Christopher Albert + +program grow_type_array +type :: container +integer, allocatable :: arr(:) +end type container + +type(container), allocatable :: list(:) + +list = [list, new_elem(5)] + +deallocate(list) + +contains + +type(container) function new_elem(s) result(out) +integer :: s +allocate(out%arr(s)) +end function new_elem + +end program grow_type_array -- 2.49.0
[Fortran, Patch, v1] 3/(3) Prevent creating tree that is never used.
Hi, while hunting for pr120711 I found a construct where a call-tree was created and never used. The patch now just suppresses the tree creation and instead uses directly the tree that is desired. Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline? Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de From 52a7898f0b460dfcd64117b399826592e8f0978b Mon Sep 17 00:00:00 2001 From: Andre Vehreschild Date: Wed, 25 Jun 2025 12:27:35 +0200 Subject: [PATCH 3/3] Fortran: Prevent creation of unused tree. gcc/fortran/ChangeLog: * trans.cc (gfc_allocate_using_malloc): Prevent possible memory leak when allocation was already done. --- gcc/fortran/trans.cc | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/gcc/fortran/trans.cc b/gcc/fortran/trans.cc index fdeb1e89a76..13fd5ad498d 100644 --- a/gcc/fortran/trans.cc +++ b/gcc/fortran/trans.cc @@ -822,6 +822,7 @@ gfc_allocate_using_malloc (stmtblock_t * block, tree pointer, tree tmp, error_cond; stmtblock_t on_error; tree status_type = status ? TREE_TYPE (status) : NULL_TREE; + bool cond_is_true = cond == boolean_true_node; /* If successful and stat= is given, set status to 0. */ if (status != NULL_TREE) @@ -834,11 +835,13 @@ gfc_allocate_using_malloc (stmtblock_t * block, tree pointer, tmp = fold_build2_loc (input_location, MAX_EXPR, size_type_node, size, build_int_cst (size_type_node, 1)); - tmp = build_call_expr_loc (input_location, - builtin_decl_explicit (BUILT_IN_MALLOC), 1, tmp); - if (cond == boolean_true_node) + if (!cond_is_true) +tmp = build_call_expr_loc (input_location, + builtin_decl_explicit (BUILT_IN_MALLOC), 1, tmp); + else tmp = alt_alloc; - else if (cond) + + if (!cond_is_true && cond) tmp = build3_loc (input_location, COND_EXPR, TREE_TYPE (tmp), cond, alt_alloc, tmp); -- 2.49.0
[Fortran, Patch, v1] 2/(3) Stop spending memory in coarray single mode executables.
Hi, attached patch prevents generation of a token component in derived types, when -fcoarray=single is used. Generating the token only wastes memory. It is never even initialized nor accessed. Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline? Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de From a888d8952e8fa6f516fde22519fab33d60d3f0c4 Mon Sep 17 00:00:00 2001 From: Andre Vehreschild Date: Wed, 25 Jun 2025 12:27:04 +0200 Subject: [PATCH 2/3] Fortran: Fix wasting memory in coarray single mode. gcc/fortran/ChangeLog: * resolve.cc (resolve_fl_derived0): Do not create the token component when not in coarray lib mode. * trans-types.cc: Do not access the token when not in coarray lib mode. --- gcc/fortran/resolve.cc | 4 ++-- gcc/fortran/trans-types.cc | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc index 7089e4f171d..58f7aee29c3 100644 --- a/gcc/fortran/resolve.cc +++ b/gcc/fortran/resolve.cc @@ -16841,8 +16841,8 @@ resolve_fl_derived0 (gfc_symbol *sym) return false; /* Now add the caf token field, where needed. */ - if (flag_coarray != GFC_FCOARRAY_NONE - && !sym->attr.is_class && !sym->attr.vtype) + if (flag_coarray == GFC_FCOARRAY_LIB && !sym->attr.is_class + && !sym->attr.vtype) { for (c = sym->components; c; c = c->next) if (!c->attr.dimension && !c->attr.codimension diff --git a/gcc/fortran/trans-types.cc b/gcc/fortran/trans-types.cc index e15b1bb89f0..1754d982153 100644 --- a/gcc/fortran/trans-types.cc +++ b/gcc/fortran/trans-types.cc @@ -3187,7 +3187,7 @@ copy_derived_types: for (c = derived->components; c; c = c->next) { /* Do not add a caf_token field for class container components. */ - if ((codimen || coarray_flag) && !c->attr.dimension + if (codimen && coarray_flag && !c->attr.dimension && !c->attr.codimension && (c->attr.allocatable || c->attr.pointer) && !derived->attr.is_class) { -- 2.49.0
Re: [PATCH v6 8/9] AArch64: rules for CMPBR instructions
Karl Meakin writes: > Add rules for lowering `cbranch4` to CBB/CBH/CB when > CMPBR extension is enabled. > > gcc/ChangeLog: > > * config/aarch64/aarch64-protos.h (aarch64_cb_rhs): New function. > * config/aarch64/aarch64.cc (aarch64_cb_rhs): Likewise. > * config/aarch64/aarch64.md (cbranch4): Rename to ... > (cbranch4): ...here, and emit CMPBR if possible. > (cbranch4): New expand rule. > (aarch64_cb): New insn rule. > (aarch64_cb): Likewise. > * config/aarch64/constraints.md (Uc0): New constraint. > (Uc1): Likewise. > (Uc2): Likewise. > * config/aarch64/iterators.md (cmpbr_suffix): New mode attr. > (INT_CMP): New code iterator. > (cmpbr_imm_constraint): New code attr. > * config/aarch64/predicates.md (const_0_to_63_operand): New predicate. > (aarch64_cb_immediate): Likewise. > (aarch64_cb_operand): Likewise. > (aarch64_cb_short_operand): Likewise. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/cmpbr.c: > --- > gcc/config/aarch64/aarch64-protos.h | 2 + > gcc/config/aarch64/aarch64.cc| 33 ++ > gcc/config/aarch64/aarch64.md| 89 +++- > gcc/config/aarch64/constraints.md| 18 + > gcc/config/aarch64/iterators.md | 19 + > gcc/config/aarch64/predicates.md | 15 + > gcc/testsuite/gcc.target/aarch64/cmpbr.c | 586 --- > 7 files changed, 376 insertions(+), 386 deletions(-) > > diff --git a/gcc/config/aarch64/aarch64-protos.h > b/gcc/config/aarch64/aarch64-protos.h > index 31f2f5b8bd2..0f104d0641b 100644 > --- a/gcc/config/aarch64/aarch64-protos.h > +++ b/gcc/config/aarch64/aarch64-protos.h > @@ -1135,6 +1135,8 @@ bool aarch64_general_check_builtin_call (location_t, > vec, >unsigned int, tree, unsigned int, >tree *); > > +bool aarch64_cb_rhs (rtx op, rtx rhs); > + > namespace aarch64 { >void report_non_ice (location_t, tree, unsigned int); >void report_out_of_range (location_t, tree, unsigned int, HOST_WIDE_INT, > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index 667e42ba401..3dc139e9a72 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -959,6 +959,39 @@ svpattern_token (enum aarch64_svpattern pattern) >gcc_unreachable (); > } > > +/* Return true if rhs is an operand suitable for a CB (immediate) > + instruction. */ This should also mention what "op" is. The convention is also to use caps to refer to parameter names. Maybe: /* Return true if RHS is an immediate operand suitable for a CB (immediate) instruction. OP determines the type of the comparison. */ > +bool > +aarch64_cb_rhs (rtx op, rtx rhs) > +{ > + if (!CONST_INT_P (rhs)) > +return REG_P (rhs); > + > + HOST_WIDE_INT rhs_val = INTVAL (rhs); > + > + switch (GET_CODE (op)) > +{ > +case EQ: > +case NE: > +case GT: > +case GTU: > +case LT: > +case LTU: > + return IN_RANGE (rhs_val, 0, 63); > + > +case GE: /* CBGE: signed greater than or equal */ > +case GEU: /* CBHS: unsigned greater than or equal */ > + return IN_RANGE (rhs_val, 1, 64); > + > +case LE: /* CBLE: signed less than or equal */ > +case LEU: /* CBLS: unsigned less than or equal */ > + return IN_RANGE (rhs_val, -1, 62); > + > +default: > + return false; > +} > +} > + > /* Return the location of a piece that is known to be passed or returned > in registers. FIRST_ZR is the first unused vector argument register > and FIRST_PR is the first unused predicate argument register. 
*/ > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index 0a378ab377d..23bce55f620 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -717,6 +717,10 @@ (define_constants > ;; +/- 32KiB. Used by TBZ, TBNZ. > (BRANCH_LEN_P_32KiB 32764) > (BRANCH_LEN_N_32KiB -32768) > + > +;; +/- 1KiB. Used by CBB, CBH, CB. > +(BRANCH_LEN_P_1Kib 1020) > +(BRANCH_LEN_N_1Kib -1024) >] > ) > > @@ -724,18 +728,35 @@ (define_constants > ;; Conditional jumps > ;; --- > > -(define_expand "cbranch4" > +(define_expand "cbranch4" >[(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" > [(match_operand:GPI 1 "register_operand") >(match_operand:GPI 2 "aarch64_plus_operand")]) > (label_ref (match_operand 3)) > (pc)))] >"" > - " > - operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1], > - operands[2]); > - operands[2] = const0_rtx; > - " > + { > + if (TARGET_CMPBR && aarch64_cb_rhs(operands[0], operands[2])) > +{ > +// Fal
[Fortran, Patch, PR120637, v1] Ensure expression in finalizer creation is freed only when unused.
Hi, Antony Lewis reported this issue and also proposed a patch, that removes the was_finalized tracking. While this may lead to the desired effect for the issue at hand, I don't believe that the was_finalized tracking code has been there for no reason. This patch fixes the issue that also Antony found, but by ensuring the expression stays allocated when used instead of being freeed. The test has been put into the asan directory of gfortran.dg and reliably reports the issue without the fix. (With the fix, the asan is quite). Regtests ok on x86_64-pc-linxu-gnu / F41. Ok for mainline? Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de From 2c7c6a6db78c448a158ee4f952cf2236665001ca Mon Sep 17 00:00:00 2001 From: Andre Vehreschild Date: Wed, 25 Jun 2025 14:46:16 +0200 Subject: [PATCH] Fortran: Ensure finalizers are created correctly [PR120637] Finalize_component freeed an expression that it used to remember which components in which context it had finalized already. While it makes sense to free the copy of the expression, if it is unused, it causes issues, when comparing to a non existent expression. This is now detected by returning true, when the expression has been used. PR fortran/120637 gcc/fortran/ChangeLog: * class.cc (finalize_component): Return true, when a finalizable component was detect and do not free it. gcc/testsuite/ChangeLog: * gfortran.dg/asan/finalizer_1.f90: New test. --- gcc/fortran/class.cc | 24 --- .../gfortran.dg/asan/finalizer_1.f90 | 67 +++ 2 files changed, 81 insertions(+), 10 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/asan/finalizer_1.f90 diff --git a/gcc/fortran/class.cc b/gcc/fortran/class.cc index df18601e45b..a1c6fafa75e 100644 --- a/gcc/fortran/class.cc +++ b/gcc/fortran/class.cc @@ -1034,7 +1034,7 @@ comp_is_finalizable (gfc_component *comp) of calling the appropriate finalizers, coarray deregistering, and deallocation of allocatable subcomponents. */ -static void +static bool finalize_component (gfc_expr *expr, gfc_symbol *derived, gfc_component *comp, gfc_symbol *stat, gfc_symbol *fini_coarray, gfc_code **code, gfc_namespace *sub_ns) @@ -1044,14 +1044,14 @@ finalize_component (gfc_expr *expr, gfc_symbol *derived, gfc_component *comp, gfc_was_finalized *f; if (!comp_is_finalizable (comp)) -return; +return false; /* If this expression with this component has been finalized already in this namespace, there is nothing to do. */ for (f = sub_ns->was_finalized; f; f = f->next) { if (f->e == expr && f->c == comp) - return; + return false; } e = gfc_copy_expr (expr); @@ -1208,8 +1208,6 @@ finalize_component (gfc_expr *expr, gfc_symbol *derived, gfc_component *comp, final_wrap->ext.actual->next->next = gfc_get_actual_arglist (); final_wrap->ext.actual->next->next->expr = fini_coarray_expr; - - if (*code) { (*code)->next = final_wrap; @@ -1221,11 +1219,14 @@ finalize_component (gfc_expr *expr, gfc_symbol *derived, gfc_component *comp, else { gfc_component *c; + bool ret = false; for (c = comp->ts.u.derived->components; c; c = c->next) - finalize_component (e, comp->ts.u.derived, c, stat, fini_coarray, code, - sub_ns); - gfc_free_expr (e); + ret |= finalize_component (e, comp->ts.u.derived, c, stat, fini_coarray, + code, sub_ns); + /* Only free the expression, if it has never been used. */ + if (!ret) + gfc_free_expr (e); } /* Record that this was finalized already in this namespace. 
*/ @@ -1234,6 +1235,7 @@ finalize_component (gfc_expr *expr, gfc_symbol *derived, gfc_component *comp, sub_ns->was_finalized->e = expr; sub_ns->was_finalized->c = comp; sub_ns->was_finalized->next = f; + return true; } @@ -2314,6 +2316,7 @@ finish_assumed_rank: { gfc_symbol *stat; gfc_code *block = NULL; + gfc_expr *ptr_expr; if (!ptr) { @@ -2359,14 +2362,15 @@ finish_assumed_rank: sub_ns); block = block->next; + ptr_expr = gfc_lval_expr_from_sym (ptr); for (comp = derived->components; comp; comp = comp->next) { if (comp == derived->components && derived->attr.extension && ancestor_wrapper && ancestor_wrapper->expr_type != EXPR_NULL) continue; - finalize_component (gfc_lval_expr_from_sym (ptr), derived, comp, - stat, fini_coarray, &block, sub_ns); + finalize_component (ptr_expr, derived, comp, stat, fini_coarray, + &block, sub_ns); if (!last_code->block->next) last_code->block->next = block; } diff --git a/gcc/testsuite/gfortran.dg/asan/finalizer_1.f90 b/gcc/testsuite/gfortran.dg/asan/finalizer_1.f90 new file mode 100644 index 000..dfc20de7f3b --- /dev/null +++ b/gcc/testsuite/gfortran.dg/asan/finalizer_1.f90 @@ -0,0 +1,67 @@ +!{ dg-do run } + +! PR fortran/120637 + +! Contributed
Re: [PATCH] rtl-ssa: Fix test condition for insn_info::has_been_deleted
On Tue, Jun 24, 2025 at 9:29 PM Richard Sandiford wrote: > > Christoph Müllner writes: > > insn_info::has_been_deleted () is documented to return true if an > > instruction is deleted. Such instructions have their `volatile` bit set, > > which can be tested via rtx_insn::deleted (). > > > > The current condition for insn_info::has_been_deleted () is: > > * m_rtl is not NULL: this can't happen as no member of insn_info > > changes this pointer. > > Yeah, it's invariant after creation, but it starts off null for some > artificial instructions: > > // Return the underlying RTL insn. This instruction is null if is_phi () > // or is_bb_end () are true. The instruction is a basic block note if > // is_bb_head () is true. > rtx_insn *rtl () const { return m_rtl; } > > So I think we should keep the null check. (But then is_call and is_jump > should check m_rtl is nonnull too -- that's preapproved if you want to > do it while you're here.) I have a tested patch for this, but I don't think that it would be sufficient, as there are also other places to check for a NULL dereference: * member-fns.inl: insn_info::uid -> what to return here? * internals.inl: insn_info::set_properties * insns.cc: insn_info::calculate_cost Ok, if I add NULL-checks there as well? > > * !INSN_P (m_rtl): this will likely fail for rtx_insn objects and > > does not test the `volatile` bit. > > Because of the need to stage multiple simultaneous changes, rtl-ssa first > uses set_insn_deleted to convert an insn to a NOTE_INSN_DELETED note, > then uses remove_insn to remove the underlying instruction. It doesn't > use delete_insn directly. The call to remove_insn is fairly recent; > the original code just used set_insn_deleted, but not removing the notes > caused trouble for later passes. > > The test was therefore supposed to be checking whether set_insn_deleted > had been called. It should also have checked the note kind though. Thanks for the explanation. I missed the fact that set_insn_delete () is used. Assuming that code using RTL-SSA will use the insn_change class, it makes sense now. I'm converting the fold-mem-offsets pass to RTL-SSA (see PR117922). And I ran into this issue because I've already converted the analysis part to RTL-SSA, but the code changes are still performed directly on the rtx_insn objects (in do_commit_insn ()). I'll try to use RTL-SSA in do_commit_insn () as well, which should also allow RTL-SSA to see the changes. Thanks, Christoph > However, I agree that testing the deleted flag would be better. > For that to work, we'd need to set the deleted flag here: > > if (rtx_insn *rtl = insn->rtl ()) > ::remove_insn (rtl); // Remove the underlying RTL insn. > > as well as calling remove_insn. Alternatively (and better), we could > try converting ::remove_insn to ::delete_insn. > > Thanks, > Richard > > > > This patch drops these conditions and calls m_rtl->deleted () instead. > > > > The impact of this change is minimal as insn_info::has_been_deleted > > is only called in insn_info::print_full. > > > > Bootstrapped and regtested x86_64-linux. > > > > gcc/ChangeLog: > > > > * rtl-ssa/insns.h: Fix implementation of has_been_deleted (). 
> > > > Signed-off-by: Christoph Müllner > > --- > > gcc/rtl-ssa/insns.h | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/gcc/rtl-ssa/insns.h b/gcc/rtl-ssa/insns.h > > index d89dfc5c3f66..bb3f52efa83a 100644 > > --- a/gcc/rtl-ssa/insns.h > > +++ b/gcc/rtl-ssa/insns.h > > @@ -186,7 +186,7 @@ public: > >// Return true if the instruction was a real instruction but has now > >// been deleted. In this case the instruction is no longer part of > >// the SSA information. > > - bool has_been_deleted () const { return m_rtl && !INSN_P (m_rtl); } > > + bool has_been_deleted () const { return m_rtl->deleted (); } > > > >// Return true if the instruction is a debug instruction (and thus > >// also a real instruction).
[PATCH] vect: Misalign checks for gather/scatter.
Hi, this patch adds simple misalignment checks for gather/scatter operations. Previously, we assumed that those perform element accesses internally so alignment does not matter. The riscv vector spec however explicitly states that vector operations are allowed to fault on element-misaligned accesses. Reasonable uarchs won't, but... For gather/scatter we have two paths in the vectorizer: (1) Regular analysis based on datarefs. Here we can also create strided loads. (2) Non-affine access where each gather index is relative to the initial address. The assumption this patch works off is that once the alignment for the first scalar is correct, all others will fall in line, as the index is always a multiple of the first element's size. For (1) we have a dataref and can check it for alignment as in other cases. For (2) this patch checks the object alignment of BASE and compares it against the natural alignment of the current vectype's unit. The patch also adds a pointer argument to the gather/scatter IFNs that contains the necessary alignment. Most of the patch is thus mechanical in that it merely adjusts indices. I tested the riscv version with a custom qemu version that faults on element-misaligned vector accesses. With this patch applied, there is just a single fault left, which is due to PR120782 and which will be addressed separately. Is the general approach reasonable or do we need to do something else entirely? Bootstrap and regtest on aarch64 went fine. I couldn't bootstrap/regtest on x86 as my regular cfarm machines (420-422) are currently down. Issues are expected, though, as the patch doesn't touch x86's old-style gathers/scatters at all yet. I still wanted to get this initial version out there to get feedback. The two riscv-specific changes I can still split off, obviously. Also, I couldn't help but do tiny refactoring in some spots :) This could also go if requested. I noticed one early-break failure with the changes where we would give up on a load_permutation of {0}. It looks latent and probably unintended but I didn't investigate for now and just allowed this specific permutation. Regards Robin gcc/ChangeLog: * config/riscv/riscv.cc (riscv_support_vector_misalignment): Always support known aligned types. * internal-fn.cc (expand_scatter_store_optab_fn): Change argument numbers. (expand_gather_load_optab_fn): Ditto. (internal_fn_len_index): Ditto. (internal_fn_else_index): Ditto. (internal_fn_mask_index): Ditto. (internal_fn_stored_value_index): Ditto. (internal_gather_scatter_fn_supported_p): Ditto. * optabs-query.cc (supports_vec_gather_load_p): Ditto. * tree-vect-data-refs.cc (vect_describe_gather_scatter_call): Handle align_ptr. (vect_check_gather_scatter): Compute and set align_ptr. * tree-vect-patterns.cc (vect_recog_gather_scatter_pattern): Ditto. * tree-vect-slp.cc (GATHER_SCATTER_OFFSET): Define. (vect_get_and_check_slp_defs): Use define. * tree-vect-stmts.cc (vect_truncate_gather_scatter_offset): Set align_ptr. (get_group_load_store_type): Do not special-case gather/scatter. (get_load_store_type): Compute misalignment. (vectorizable_store): Remove alignment assert for scatter/gather. (vectorizable_load): Ditto. * tree-vectorizer.h (struct gather_scatter_info): Add align_ptr. gcc/testsuite/ChangeLog: * lib/target-supports.exp: Fix riscv misalign supported check. 
--- gcc/config/riscv/riscv.cc | 24 ++-- gcc/internal-fn.cc| 21 --- gcc/optabs-query.cc | 2 +- gcc/testsuite/lib/target-supports.exp | 2 +- gcc/tree-vect-data-refs.cc| 13 - gcc/tree-vect-patterns.cc | 17 +++--- gcc/tree-vect-slp.cc | 20 --- gcc/tree-vect-stmts.cc| 83 --- gcc/tree-vectorizer.h | 3 + 9 files changed, 130 insertions(+), 55 deletions(-) diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 8fdc5b21484..02637ee5a5b 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -12069,11 +12069,27 @@ riscv_estimated_poly_value (poly_int64 val, target. */ bool riscv_support_vector_misalignment (machine_mode mode, - const_tree type ATTRIBUTE_UNUSED, + const_tree type, int misalignment, - bool is_packed ATTRIBUTE_UNUSED) -{ - /* Depend on movmisalign pattern. */ + bool is_packed) +{ + /* IS_PACKED is true if the corresponding scalar element is not naturally + aligned. In that case defer to the default hook which will check + if movmisalign is present. Movmisalign, in turn, depends on + TARGET_VECTOR
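As a concrete illustration of case (2) above, the non-affine gather where only the base object's alignment is known, consider a loop like the following (example code, not taken from the patch or its testsuite):

```cpp
// Every access is base + idx[i] * sizeof (float).  If base itself is
// element-aligned, each gathered element is too, because the offset is
// always a multiple of the element size; that is the property the BASE
// object-alignment check relies on.
float
sum_indexed (const float *base, const int *idx, int n)
{
  float s = 0.0f;
  for (int i = 0; i < n; i++)
    s += base[idx[i]];
  return s;
}
```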
Re: [PATCH] c++: Implement C++26 P3618R0 - Allow attaching main to the global module [PR120773]
On Tue, Jun 24, 2025 at 11:14:51AM -0400, Jason Merrill wrote: > On 6/24/25 10:16 AM, Nathaniel Shead wrote: > > On Tue, Jun 24, 2025 at 01:03:53PM +0200, Jakub Jelinek wrote: > > > Hi! > > > > > > The following patch implements the P3618R0 paper by tweaking pedwarn > > > condition, adjusting pedwarn wording, adjusting one testcase and adding 4 > > > new ones. The paper was voted in as DR, so it isn't guarded on C++ > > > version. > > > > > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > > > > > 2025-06-24 Jakub Jelinek > > > > > > PR c++/120773 > > > * decl.cc (grokfndecl): Implement C++26 P3618R0 - Allow attaching > > > main to the global module. Only pedwarn for current_lang_name > > > other than lang_name_cplusplus and adjust pedwarn wording. > > > > > > * g++.dg/parse/linkage5.C: Don't expect error on > > > extern "C++" int main ();. > > > * g++.dg/parse/linkage7.C: New test. > > > * g++.dg/parse/linkage8.C: New test. > > > * g++.dg/modules/main-2.C: New test. > > > * g++.dg/modules/main-3.C: New test. > > > > > > --- gcc/cp/decl.cc.jj 2025-06-19 08:55:04.408676724 +0200 > > > +++ gcc/cp/decl.cc2025-06-23 17:47:13.942011687 +0200 > > > @@ -11326,9 +11326,9 @@ grokfndecl (tree ctype, > > > "cannot declare %<::main%> to be %qs", "consteval"); > > > if (!publicp) > > > error_at (location, "cannot declare %<::main%> to be static"); > > > - if (current_lang_depth () != 0) > > > + if (current_lang_name != lang_name_cplusplus) > > > pedwarn (location, OPT_Wpedantic, "cannot declare %<::main%> > > > with a" > > > - " linkage specification"); > > > + " linkage specification other than %<\"C++\"%>"); > > > if (module_attach_p ()) > > > error_at (location, "cannot attach %<::main%> to a named > > > module"); > > > > Maybe it would be nice to add a note/fixit that users can now work > > around this error by marking main as 'extern "C++"'? But overall LGTM. > > I suppose we could say "other than %" to make that a little > clearer. OK with that tweak. > > I wouldn't object to a fixup but it sounds more complicated than it's worth > to have different fixups for the extern "C" { int main(); } and extern "C" > int main(); cases. > > Jason > I think I wasn't totally clear sorry; here's a patch with what I meant. Tested on x86_64-pc-linux-gnu, OK for trunk? -- >8 -- Subject: [PATCH] c++: Add fix note for how to declare main in a module This patch adds a note to help users unfamiliar with modules terminology understand how to declare main in a named module since P3618. There doesn't appear to be an easy robust location available for "the start of this declaration" that I could find to attach a fixit to, but the explanation should suffice. gcc/cp/ChangeLog: * decl.cc (grokfndecl): Add explanation of how to attach to global module. Signed-off-by: Nathaniel Shead --- gcc/cp/decl.cc | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc index 95bccfbb585..4fe97ffbf8f 100644 --- a/gcc/cp/decl.cc +++ b/gcc/cp/decl.cc @@ -11330,7 +11330,12 @@ grokfndecl (tree ctype, pedwarn (location, OPT_Wpedantic, "cannot declare %<::main%> with a" " linkage specification other than %"); if (module_attach_p ()) - error_at (location, "cannot attach %<::main%> to a named module"); + { + auto_diagnostic_group adg; + error_at (location, "cannot attach %<::main%> to a named module"); + inform (location, "use % to attach it to the " + "global module instead"); + } inlinep = 0; publicp = 1; } -- 2.47.0
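To make the new note concrete, a small illustration of the situation it targets (module name hypothetical):

```cpp
export module app;          // module purview starts here

int main () {}              // error: cannot attach '::main' to a named module
                            // note: main can instead be attached to the
                            // global module, e.g.:
                            //   extern "C++" int main () {}
```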
RE: [PATCH 2/2] RISC-V: Add testcases for signed scalar SAT_ADD IMM form 2
> Pan -- can you cover reviewing the testsuite bits since this is an area > where you've done a ton of work over the last year or so. Sure thing and thanks Jeff, I will take a look after I return from vacation, ETA before the end of this week. Pan -Original Message- From: Jeff Law Sent: Wednesday, June 25, 2025 5:30 AM To: Ciyan Pan ; gcc-patches@gcc.gnu.org Cc: kito.ch...@gmail.com; richard.guent...@gmail.com; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; Li, Pan2 ; rdapp@gmail.com Subject: Re: [PATCH 2/2] RISC-V: Add testcases for signed scalar SAT_ADD IMM form 2 On 6/23/25 9:12 PM, Ciyan Pan wrote: > From: panciyan > > This patch adds testcases for form2, as shown below: > > T __attribute__((noinline)) \ > sat_s_add_imm_##T##_fmt_2##_##INDEX (T x)\ > {\ >T sum = (T)((UT)x + (UT)IMM); \ >return ((x ^ sum) < 0 && (x ^ IMM) >= 0) ? \ > (-(T)(x < 0) ^ MAX) : sum; \ > } > > Passed the rv64gcv regression test. > > Signed-off-by: Ciyan Pan > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/sat/sat_arith.h: > * gcc.target/riscv/sat/sat_s_add_imm-2-i16.c: New test. > * gcc.target/riscv/sat/sat_s_add_imm-2-i32.c: New test. > * gcc.target/riscv/sat/sat_s_add_imm-2-i64.c: New test. > * gcc.target/riscv/sat/sat_s_add_imm-2-i8.c: New test. > * gcc.target/riscv/sat/sat_s_add_imm-run-2-i16.c: New test. > * gcc.target/riscv/sat/sat_s_add_imm-run-2-i32.c: New test. > * gcc.target/riscv/sat/sat_s_add_imm-run-2-i64.c: New test. > * gcc.target/riscv/sat/sat_s_add_imm-run-2-i8.c: New test. > * gcc.target/riscv/sat/sat_s_add_imm_type_check-2-i16.c: New test. > * gcc.target/riscv/sat/sat_s_add_imm_type_check-2-i32.c: New test. > * gcc.target/riscv/sat/sat_s_add_imm_type_check-2-i8.c: New test. Pan -- can you cover reviewing the testsuite bits since this is an area where you've done a ton of work over the last year or so. Thanks! jeff
Re: [PATCH] ivopts: Change constant_multiple_of to expand aff nodes.
On Tue, 24 Jun 2025, Alfie Richards wrote: > Hi all, > > This is a small change to ivopts to expand SSA variables enabling ivopts to > correctly work out when an address IV step is set to be a multiple on index > step in the loop header (ie, not constant, not calculated each loop.) > > Seems like this might have compile speed costs that need to be considered, but > I believe should be worth it. > > This is also required for some upcoming work for vectorization of VLA loops > with > iteration data dependencies. > > Bootstrapped and reg tested on aarch64-linux-gnu and x86_64-unknown-linux-gnu. OK. Thanks, Richard. > Thanks, > Alfie > > -- >8 -- > > This changes the calls to tree_to_aff_combination in constant_multiple_of to > tree_to_aff_combination_expand along with associated plumbing of ivopts_data > and required cache. > > This improves cases such as: > > ```c > void f(int *p1, int *p2, unsigned long step, unsigned long end, svbool_t pg) { > for (unsigned long i = 0; i < end; i += step) { > svst1(pg, p1, svld1_s32(pg, p2)); > p1 += step; > p2 += step; > } > } > ``` > > Where ivopts previously didn't expand the SSA variables for the step > increements > and so lacked the ability to group all the IV's and ended up with: > > ``` > f: > cbz x3, .L1 > mov x4, 0 > .L3: > ld1wz31.s, p0/z, [x1] > add x4, x4, x2 > st1wz31.s, p0, [x0] > add x1, x1, x2, lsl 2 > add x0, x0, x2, lsl 2 > cmp x3, x4 > bhi .L3 > .L1: > ret > ``` > > After this change we end up with: > > ``` > f: > cbz x3, .L1 > mov x4, 0 > .L3: > ld1wz31.s, p0/z, [x1, x4, lsl 2] > st1wz31.s, p0, [x0, x4, lsl 2] > add x4, x4, x2 > cmp x3, x4 > bhi .L3 > .L1: > ret > ``` > > gcc/ChangeLog: > > * tree-ssa-loop-ivopts.cc (constant_multiple_of): Change > tree_to_aff_combination to tree_to_aff_combination_expand and add > parameter to take ivopts_data. > (get_computation_aff_1): Change parameters and calls to include > ivopts_data. > (get_computation_aff): Ditto. > (get_computation_at) Ditto.: > (get_debug_computation_at) Ditto.: > (get_computation_cost) Ditto.: > (rewrite_use_nonlinear_expr) Ditto.: > (rewrite_use_address) Ditto.: > (rewrite_use_compare) Ditto.: > (remove_unused_ivs) Ditto.: > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/sve/adr_7.c: New test. 
> --- > gcc/testsuite/gcc.target/aarch64/sve/adr_7.c | 19 ++ > gcc/tree-ssa-loop-ivopts.cc | 65 +++- > 2 files changed, 54 insertions(+), 30 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/adr_7.c > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/adr_7.c > b/gcc/testsuite/gcc.target/aarch64/sve/adr_7.c > new file mode 100644 > index 000..61e23bbf182 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/adr_7.c > @@ -0,0 +1,19 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -ftree-vectorize" } */ > + > +#include > + > +void f(int *p1, int *p2, unsigned long step, unsigned long end, svbool_t pg) > { > +for (unsigned long i = 0; i < end; i += step) { > +svst1(pg, p1, svld1_s32(pg, p2)); > +p1 += step; > +p2 += step; > +} > +} > + > +/* { dg-final { scan-assembler-not {\tld1w\tz[0-9]+\.d, > p[0-9]+/z\[x[0-9]+\.d\]} } } */ > +/* { dg-final { scan-assembler-not {\tst1w\tz[0-9]+\.d, > p[0-9]+/z\[x[0-9]+\.d\]} } } */ > + > +/* { dg-final { scan-assembler-times {\tadd\tx[0-9]+, x[0-9]+, x[0-9]+} 1 } > } */ > +/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-9]+/z, > \[x[0-9]+, x[0-9]+, lsl 2\]} 1 } } */ > +/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-9]+, > \[x[0-9]+, x[0-9]+, lsl 2\]} 1 } } */ > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc > index 8a6726f1988..544a946ff89 100644 > --- a/gcc/tree-ssa-loop-ivopts.cc > +++ b/gcc/tree-ssa-loop-ivopts.cc > @@ -2117,11 +2117,15 @@ idx_record_use (tree base, tree *idx, > signedness of TOP and BOT. */ > > static bool > -constant_multiple_of (tree top, tree bot, widest_int *mul) > +constant_multiple_of (tree top, tree bot, widest_int *mul, > + struct ivopts_data *data) > { >aff_tree aff_top, aff_bot; > - tree_to_aff_combination (top, TREE_TYPE (top), &aff_top); > - tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot); > + tree_to_aff_combination_expand (top, TREE_TYPE (top), &aff_top, > + &data->name_expansion_cache); > + tree_to_aff_combination_expand (bot, TREE_TYPE (bot), &aff_bot, > + &data->name_expansion_cache); > + >poly_widest_int poly_mul; >if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul) >&& poly_mul.is_constant (mul)) > @@ -3945,
[PATCH][v2] tree-optimization/109892 - SLP reduction of fma
The following adds the ability to vectorize a fma reduction pair as SLP reduction (we cannot yet handle ternary association in reduction vectorization yet). Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. I'll file a bug about the missed handling for fold-left reductions. PR tree-optimization/109892 * tree-vect-loop.cc (check_reduction_path): Handle fma. (vectorizable_reduction): Apply FOLD_LEFT_REDUCTION code generation constraints. * gcc.dg/vect/vect-reduc-fma-1.c: New testcase. * gcc.dg/vect/vect-reduc-fma-2.c: Likewise. * gcc.dg/vect/vect-reduc-fma-3.c: Likewise. --- gcc/testsuite/gcc.dg/vect/vect-reduc-fma-1.c | 15 +++ gcc/testsuite/gcc.dg/vect/vect-reduc-fma-2.c | 20 gcc/testsuite/gcc.dg/vect/vect-reduc-fma-3.c | 16 gcc/tree-vect-loop.cc| 17 + 4 files changed, 68 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-fma-1.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-fma-2.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-fma-3.c diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-1.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-1.c new file mode 100644 index 000..e958b43e23b --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-1.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-mfma" { target { x86_64-*-* i?86-*-* } } } */ + +double f(double x[], long n) +{ +double r0 = 0, r1 = 0; +for (; n; x += 2, n--) { +r0 = __builtin_fma(x[0], x[0], r0); +r1 = __builtin_fma(x[1], x[1], r1); +} +return r0 + r1; +} + +/* We should vectorize this as SLP reduction. */ +/* { dg-final { scan-tree-dump "loop vectorized using 16 byte vectors and unroll factor 1" "vect" { target { x86_64-*-* i?86-*-* } } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-2.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-2.c new file mode 100644 index 000..ea1ca9720e5 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-2.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-ffp-contract=on" } */ +/* { dg-additional-options "-mfma" { target { x86_64-*-* i?86-*-* } } } */ + +static double muladd(double x, double y, double z) +{ +return x * y + z; +} +double g(double x[], long n) +{ +double r0 = 0, r1 = 0; +for (; n; x += 2, n--) { +r0 = muladd(x[0], x[0], r0); +r1 = muladd(x[1], x[1], r1); +} +return r0 + r1; +} + +/* We should vectorize this as SLP reduction. */ +/* { dg-final { scan-tree-dump "loop vectorized using 16 byte vectors and unroll factor 1" "vect" { target { x86_64-*-* i?86-*-* } } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-3.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-3.c new file mode 100644 index 000..10cecedd8e5 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-fma-3.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-ffast-math" } */ +/* { dg-additional-options "-mfma" { target { x86_64-*-* i?86-*-* } } } */ + +double f(double x[], long n) +{ +double r0 = 0, r1 = 0; +for (; n; x += 2, n--) { +r0 = __builtin_fma(x[0], x[0], r0); +r1 = __builtin_fma(x[1], x[1], r1); +} +return r0 + r1; +} + +/* We should vectorize this as SLP reduction, higher VF possible. */ +/* { dg-final { scan-tree-dump "optimized: loop vectorized" "vect" { target { x86_64-*-* i?86-*-* } } } } */ diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index a3f95433a5b..9a4b89e9113 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -4139,6 +4139,10 @@ pop: if (op.ops[2] == op.ops[opi]) neg = ! 
neg; } + /* For an FMA the reduction code is the PLUS if the addition chain +is the reduction. */ + else if (op.code == IFN_FMA && opi == 2) + op.code = PLUS_EXPR; if (CONVERT_EXPR_CODE_P (op.code) && tree_nop_conversion_p (op.type, TREE_TYPE (op.ops[0]))) ; @@ -8084,6 +8088,19 @@ vectorizable_reduction (loop_vec_info loop_vinfo, "in-order reduction chain without SLP.\n"); return false; } + /* Code generation doesn't support function calls other +than .COND_*. */ + if (!op.code.is_tree_code () + && !(op.code.is_internal_fn () + && conditional_internal_fn_code (internal_fn (op.code)) + != ERROR_MARK)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, +"in-order reduction chain operation not " +"supported.\n"); + return false; + } STMT_VINFO_REDUC_TYPE (reduc_info) = reduction_type = FOLD_LEFT_REDUCTION;
Re: [RFC] [lra] catch all to-sp eliminations [PR120424]
On 6/23/25 12:06 AM, Alexandre Oliva wrote: Alex, thanks for investigation of corner cases of register elimination. An x86_64-linux-gnu native with ix86_frame_pointer_required modified to return true for nonzero frames, to exercize lra_update_fp2sp_elimination, reveals in stage1 testing that wrong code is generated for gcc.c-torture/execute/ieee/fp-cmp-8l.c: argp-to-sp eliminations are used for one_test to pass its arguments on to *pos, and the sp offsets survive the disabling of that elimination. We didn't really have to disable that elimination, but the backend disables eliminations to sp if frame_pointer_needed. The workaround for this scenario is to compile with -maccumulate-outgoing-args. This change extends the catching of fp2sp eliminations to all (?) eliminations to sp, since none of them can be properly reversed and would silently lead to wrong code. This is probably too strict. I guess it is too strict. Regstrapped on x86_64-linux-gnu, bootstrapped on arm-linux-gnueabihf (arm and thumb modes), also tested with gcc-14 on arm-vx7r2 and arm-linux-gnueabihf. Unlike the combination of earlier patches, this one does NOT bootstrap on x86_64-linux-gnu with ix86_frame_pointer_required modified to return true for any positive frame sizes. It also triggers one failure in acats-4 on arm-vx7r2, where I didn't expect it to make any difference. I'm yet to investigate it. I wonder if it makes sense to put this in to (1.i) avoid silent wrong code, and (1.ii) shake out some more lra_update_fp2sp_elimination issues, or (2) keep it out and just file a PR about this one known remaining issue, AFAICT only fixable by making sp offset adjustments reversible. WDYT? I am not ready to answer the question about committing the patch right now. It needs more time for investigation which I currently don't have but will have when the next release work starts. In general I think we should have functionality generating the right code whenever any elimination goes prohibited or enabled back and forth during LRA work. That is what I would like to aim at. It might require considerable review of all existing elimination code. So I think the best way right now is to fill a PR. for gcc/ChangeLog PR rtl-optimization/120424 * lra-eliminations.cc (elimination_2sp_occurred_p): Rename from... (elimination_fp2sp_occured_p): ... this. Adjust all uses. (lra_eliminate_regs_1): Don't require a from-frame-pointer elimination to set it. (update_reg_eliminate): Likewise to test it. --- gcc/lra-eliminations.cc | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc index 9cdd0c5ff53a2..f6ee33aa70a5d 100644 --- a/gcc/lra-eliminations.cc +++ b/gcc/lra-eliminations.cc @@ -309,8 +309,9 @@ move_plus_up (rtx x) return x; } -/* Flag that we already did frame pointer to stack pointer elimination. */ -static bool elimination_fp2sp_occured_p = false; +/* Flag that we already applied stack pointer elimination offset; sp + updates cannot be undone. */ +static bool elimination_2sp_occurred_p = false; /* Scan X and replace any eliminable registers (such as fp) with a replacement (such as sp) if SUBST_P, plus an offset. The offset is @@ -369,8 +370,8 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode mem_mode, { rtx to = subst_p ? 
ep->to_rtx : ep->from_rtx; - if (ep->to_rtx == stack_pointer_rtx && ep->from == FRAME_POINTER_REGNUM) - elimination_fp2sp_occured_p = true; + if (ep->to_rtx == stack_pointer_rtx) + elimination_2sp_occurred_p = true; if (maybe_ne (update_sp_offset, 0)) { @@ -402,8 +403,8 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode mem_mode, poly_int64 offset, curr_offset; rtx to = subst_p ? ep->to_rtx : ep->from_rtx; - if (ep->to_rtx == stack_pointer_rtx && ep->from == FRAME_POINTER_REGNUM) - elimination_fp2sp_occured_p = true; + if (ep->to_rtx == stack_pointer_rtx) + elimination_2sp_occurred_p = true; if (! update_p && ! full_p) return simplify_gen_binary (PLUS, Pmode, to, XEXP (x, 1)); @@ -465,8 +466,8 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode mem_mode, { rtx to = subst_p ? ep->to_rtx : ep->from_rtx; - if (ep->to_rtx == stack_pointer_rtx && ep->from == FRAME_POINTER_REGNUM) - elimination_fp2sp_occured_p = true; + if (ep->to_rtx == stack_pointer_rtx) + elimination_2sp_occurred_p = true; if (maybe_ne (update_sp_offset, 0)) { @@ -1213,8 +1214,7 @@ update_reg_eliminate (bitmap insns_with_changed_offsets) pointer elimination the condition is a bit relaxed and we just require that actual elimination has not been done yet. *
Re: [PATCH v1 2/2] middle-end: Enable masked load with non-constant offset
On Tue, Jun 24, 2025 at 4:26 PM Karl Meakin wrote: > > The function `vect_check_gather_scatter` requires the `base` of the load > to be loop-invariant and the `off`set to be not loop-invariant. When faced > with a scenario where `base` is not loop-invariant, instead of giving up > immediately we can try swapping the `base` and `off`, if `off` is > actually loop-invariant. > > Previously, it would only swap if `off` was the constant zero (and so > trivially loop-invariant). This is too conservative: we can still > perform the swap if `off` is a more complex but still loop-invariant > expression, such as a variable defined outside of the loop. > > This allows loops like the function below to be vectorised, if the > target has masked loads and sufficiently large vector registers (eg > `-march=armv8-a+sve -msve-vector-bits=128`): > > ```c > typedef struct Array { > int elems[3]; > } Array; > > int loop(Array **pp, int len, int idx) { > int nRet = 0; > > for (int i = 0; i < len; i++) { > Array *p = pp[i]; > if (p) { > nRet += p->elems[idx]; > } > } > > return nRet; > } > ``` > > gcc/ChangeLog: > * tree-vect-data-refs.cc (vect_check_gather_scatter): Swap > `base` and `off` in more scenarios. Also assert at the end of > the function that `base` and `off` are loop-invariant and not > loop-invariant respectively. > > gcc/testsuite/ChangeLog: > * gcc.target/aarch64/sve/mask_load_2.c: Update tests. > --- > .../gcc.target/aarch64/sve/mask_load_2.c | 4 +-- > gcc/tree-vect-data-refs.cc| 26 --- > 2 files changed, 13 insertions(+), 17 deletions(-) > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/mask_load_2.c > b/gcc/testsuite/gcc.target/aarch64/sve/mask_load_2.c > index 38fcf4f7206..66d95101a14 100644 > --- a/gcc/testsuite/gcc.target/aarch64/sve/mask_load_2.c > +++ b/gcc/testsuite/gcc.target/aarch64/sve/mask_load_2.c > @@ -19,5 +19,5 @@ int loop(Array **pp, int len, int idx) { > return nRet; > } > > -// { dg-final { scan-assembler-times {ld1w\tz[0-9]+\.d, p[0-7]/z} 0 } } > -// { dg-final { scan-assembler-times {add\tz[0-9]+\.s, p[0-7]/m} 0 } } > +// { dg-final { scan-assembler-times {ld1w\tz[0-9]+\.d, p[0-7]/z} 1 } } > +// { dg-final { scan-assembler-times {add\tz[0-9]+\.s, p[0-7]/m} 1 } } > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc > index ee040eb9888..d352ca8bcc3 100644 > --- a/gcc/tree-vect-data-refs.cc > +++ b/gcc/tree-vect-data-refs.cc > @@ -4659,26 +4659,19 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, > loop_vec_info loop_vinfo, >if (off == NULL_TREE) > off = size_zero_node; > > - /* If base is not loop invariant, either off is 0, then we start with just > - the constant offset in the loop invariant BASE and continue with base > - as OFF, otherwise give up. > - We could handle that case by gimplifying the addition of base + off > - into some SSA_NAME and use that as off, but for now punt. */ > + /* BASE must be loop invariant. If it is not invariant, but OFF is, then > we > + * can fix that by swapping BASE and OFF. */ >if (!expr_invariant_in_loop_p (loop, base)) > { > - if (!integer_zerop (off)) > + if (!expr_invariant_in_loop_p (loop, off)) > return false; > - off = base; > - base = size_int (pbytepos); > -} > - /* Otherwise put base + constant offset into the loop invariant BASE > - and continue with OFF. 
*/ > - else > -{ > - base = fold_convert (sizetype, base); > - base = size_binop (PLUS_EXPR, base, size_int (pbytepos)); > + > + std::swap (base, off); > } > > + base = fold_convert (sizetype, base); > + base = size_binop (PLUS_EXPR, base, size_int (pbytepos)); > + >/* OFF at this point may be either a SSA_NAME or some tree expression > from get_inner_reference. Try to peel off loop invariants from it > into BASE as long as possible. */ > @@ -4856,6 +4849,9 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, > loop_vec_info loop_vinfo, >offset_vectype = NULL_TREE; > } > > + gcc_assert (expr_invariant_in_loop_p (loop, base)); > + gcc_assert (!expr_invariant_in_loop_p (loop, off)); Those are expensive enough that we want to avoid them; please make them gcc_checking_assert at least. OK with that change. Richard. > + >info->ifn = ifn; >info->decl = decl; >info->base = base; > -- > 2.45.2 >
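For reference, another loop of the same shape that the relaxed check is meant to accept: the per-iteration base comes from a pointer load while the offset is a plain loop-invariant variable. This is an illustrative example only, not a testcase from the patch.

```cpp
// Hypothetical example: `idx` is loop-invariant, but the base pointer
// rows[i] is reloaded every iteration, so previously the base/offset swap
// in vect_check_gather_scatter only happened when the offset was literally
// zero.  With the relaxed check, any loop-invariant offset allows the swap,
// so this can become a masked gather on targets with masked loads.
int sum_column (int **rows, int n, int idx)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    if (rows[i])
      sum += rows[i][idx];   // base varies, offset `idx` is invariant
  return sum;
}
```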
[PATCH 16/17] Fortran: Silence a clang warning (suggesting a brace) in io.cc
Hi, when GCC is built with clang, it suggests that we add a brace to the initialization of format_asterisk: gcc/fortran/io.cc:32:16: warning: suggest braces around initialization of subobject [-Wmissing-braces] So this patch does that to silence it. Bootstrapped and tested on x86_64-linux. OK for master? Alternatively, as with all of these clang warning issues, I'm perfectly happy to add an entry to contrib/filter-clang-warnings.py to ignore the warning instead. Thanks, Martin gcc/fortran/ChangeLog: 2025-06-24 Martin Jambor * io.cc (format_asterisk): Add a brace around static initialization location part of the field locus. --- gcc/fortran/io.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/fortran/io.cc b/gcc/fortran/io.cc index 7466d8fe094..4d28c2c90ba 100644 --- a/gcc/fortran/io.cc +++ b/gcc/fortran/io.cc @@ -29,7 +29,7 @@ along with GCC; see the file COPYING3. If not see gfc_st_label format_asterisk = {0, NULL, NULL, -1, ST_LABEL_FORMAT, ST_LABEL_FORMAT, NULL, - 0, {NULL, NULL}, NULL, 0}; + 0, {NULL, {NULL}}, NULL, 0}; typedef struct { -- 2.49.0
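For readers who have not seen this clang diagnostic, a reduced illustration of what -Wmissing-braces objects to; the struct names below are invented and only mimic the shape of gfc_st_label's locus member.

```cpp
// Reduced illustration of -Wmissing-braces.  `where` stands in for the
// location part of the locus field that needed its own braces.
struct location { const char *nextc; };
struct locus { const char *lb; location where; };
struct label { int value; locus loc; };

// Relies on brace elision; clang suggests braces around the subobject:
static label l1 = {0, {nullptr, nullptr}};

// Equivalent initialization with the inner braces spelled out, which is
// what the patch does for format_asterisk:
static label l2 = {0, {nullptr, {nullptr}}};

int main () { return l1.value + l2.value; }
```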
Re: [PATCH 06/17] value-relation.h: Mark dom_oracle::next_relation as override
BTW, consider all such future changes in ranger code pre-approved! Thanks Andrew On 6/25/25 10:27, Andrew MacLeod wrote: OK for all the ranger related patches. Thanks Andrew On 6/25/25 10:08, Martin Jambor wrote: Hi, When GCC is compiled with clang, it emits a warning that dom_oracle::next_relation is not marked as override even though it does override a virtual function of its ancestor. This patch marks it as such to silence the warning and for the sake of consistency. There are other member functions in the class which are marked as final override but this particular function is in the protected section so I decided to just mark it as override. Bootstrapped and tested on x86_64-linux. OK for master? Alternatively, as with all of these clang warning issues, I'm perfectly happy to add an entry to contrib/filter-clang-warnings.py to ignore the warning instead. Thanks, Martin gcc/ChangeLog: 2025-06-24 Martin Jambor * value-relation.h (class dom_oracle): Mark member function next_relation as override. --- gcc/value-relation.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/value-relation.h b/gcc/value-relation.h index 1081877ccca..87f0d856fab 100644 --- a/gcc/value-relation.h +++ b/gcc/value-relation.h @@ -235,7 +235,7 @@ public: void dump (FILE *f) const final override; protected: virtual relation_chain *next_relation (basic_block, relation_chain *, - tree) const; + tree) const override; bool m_do_trans_p; bitmap m_tmp, m_tmp2; bitmap m_relation_set; // Index by ssa-name. True if a relation exists
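For context, a minimal reproduction of the inconsistency clang complains about: one virtual member is spelled final override while a sibling override is not. Class and member names below are invented; only the pattern matches value-relation.h, and the exact warning flag (-Winconsistent-missing-override) is my assumption.

```cpp
// Minimal illustration: `dump` is marked final override, but the sibling
// override `next_relation` is not, so clang suggests adding `override`.
struct relation_chain;

struct relation_oracle
{
  virtual ~relation_oracle () = default;
  virtual void dump () const;
  virtual relation_chain *next_relation (int bb, relation_chain *ptr) const;
};

struct dom_oracle_like : public relation_oracle
{
  void dump () const final override;
  // Before the patch: overrides the base virtual, but is not marked.
  // relation_chain *next_relation (int bb, relation_chain *ptr) const;
  // After the patch:
  relation_chain *next_relation (int bb, relation_chain *ptr) const override;
};
```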
Re: [PATCH] vect: Misalign checks for gather/scatter.
This change reminds me that we lack documentation about arguments of most of the "complicated" internal functions ... I didn't mention it but I got implicitly reminded several times while writing the patch... ;) An overhaul has been on my todo list for a while but of course it never was top priority. Ideally an adjusted API would also be useable by SLP's argument map. We miss internal_fn_gatherscatter_{offset,scale}_index and possibly a internal_fn_ldst_ptr_index (always zero?) and internal_fn_ldst_alias_align_index (always one, if supported?). if (elsvals && icode != CODE_FOR_nothing) get_supported_else_vals - (icode, internal_fn_else_index (IFN_MASK_GATHER_LOAD) + 1, *elsvals); + (icode, internal_fn_else_index (IFN_MASK_GATHER_LOAD), *elsvals); these "fixes" seem to be independent? Just realized I forgot to remove the comments. Due to the additional argument, both optab and IFN happen to have the same arguments now. That's why the + 1 is not necessary any more. Thanks for the comments. Will adjust, test on x86 and re-spin. -- Regards Robin
[COMMITTED] - Remove unused vector in value-relation.cc.
On 6/23/25 18:21, Martin Jambor wrote: @@ -208,66 +208,6 @@ static const tree_code relation_to_code [VREL_LAST] = { ERROR_MARK, ERROR_MARK, LT_EXPR, LE_EXPR, GT_EXPR, GE_EXPR, EQ_EXPR, NE_EXPR }; -// This routine validates that a relation can be applied to a specific set of -// ranges. In particular, floating point x == x may not be true if the NaN bit -// is set in the range. Symbolically the oracle will determine x == x, -// but specific range instances may override this. -// To verify, attempt to fold the relation using the supplied ranges. -// One would expect [1,1] to be returned, anything else means there is something -// in the range preventing the relation from applying. -// If there is no mechanism to verify, assume the relation is acceptable. - -relation_kind -relation_oracle::validate_relation (relation_kind rel, vrange &op1, vrange &op2) -{ - // If there is no mapping to a tree code, leave the relation as is. - tree_code code = relation_to_code [rel]; This seems to have been the only use of the array relation_to_code which we however still have around. Should it be removed too? Thanks, Martin Indeed. Removed thusly. Bootstraps on x86_64-pc-linux-gnu with no regressions. Pushed. Andrew From 25a15a4c0318d928d534a0db9592cb6f0e454707 Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Tue, 24 Jun 2025 16:51:56 -0400 Subject: [PATCH 3/3] Remove unused vector in value-relation.cc. The relation_to_code vector in value-relation is now unused, so we can remove it. * value-relation.cc (relation_to_code): Remove. --- gcc/value-relation.cc | 6 -- 1 file changed, 6 deletions(-) diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc index c7ced445ad7..2ac7650fe5b 100644 --- a/gcc/value-relation.cc +++ b/gcc/value-relation.cc @@ -202,12 +202,6 @@ adjust_equivalence_range (vrange &range) } } -// This vector maps a relation to the equivalent tree code. - -static const tree_code relation_to_code [VREL_LAST] = { - ERROR_MARK, ERROR_MARK, LT_EXPR, LE_EXPR, GT_EXPR, GE_EXPR, EQ_EXPR, - NE_EXPR }; - // Given an equivalence set EQUIV, set all the bits in B that are still valid // members of EQUIV in basic block BB. -- 2.45.0
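The deleted comment's point about floating-point x == x is easy to demonstrate outside GCC; a tiny standalone reminder of why a symbolic equality relation cannot be trusted once the operand range may contain a NaN:

```cpp
#include <cmath>
#include <cstdio>

int main ()
{
  double x = std::nan ("");
  // Symbolically an oracle would record x == x, but once the range of x
  // admits a NaN the relation does not hold for the runtime value.
  std::printf ("x == x -> %s\n", x == x ? "true" : "false");  // prints "false"
  return 0;
}
```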
Re: [PATCH 15/17] coroutines: Removed unused private member in cp_coroutine_transform
> On 25 Jun 2025, at 15:17, Martin Jambor wrote: > > Hi, > > when building GCC with clang, it warns that the private member suffix > in class cp_coroutine_transform (defined in gcc/cp/coroutines.h) is > not used which indeed looks like it is the case. This patch therefore > removes it. > > Bootstrapped and tested on x86_64-linx. OK for master? LGTM and presumably in the “trivial / obvious” category. If we need to preserve the original fn body in upcoming changes, we can always add it back. thanks Iain > > Alternatively, as with all of these clang warning issues, I'm > perfectly happy to add an entry to contrib/filter-clang-warnings.py to > ignore the warning instead. > > Thanks, > > Martin > > > gcc/cp/ChangeLog: > > 2025-06-24 Martin Jambor > > * coroutines.h (class cp_coroutine_transform): Remove member > orig_fn_body. > --- > gcc/cp/coroutines.h | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/gcc/cp/coroutines.h b/gcc/cp/coroutines.h > index 919dc9ab06b..fcc46457915 100644 > --- a/gcc/cp/coroutines.h > +++ b/gcc/cp/coroutines.h > @@ -100,7 +100,6 @@ public: > > private: > tree orig_fn_decl; /* The original function decl. */ > - tree orig_fn_body = NULL_TREE; /* The original function body. */ > location_t fn_start = UNKNOWN_LOCATION; > location_t fn_end = UNKNOWN_LOCATION; > tree resumer = error_mark_node; > -- > 2.49.0 >
Re: [PATCH 06/17] value-relation.h: Mark dom_oracle::next_relation as override
OK for all the ranger related patches. Thanks Andrew On 6/25/25 10:08, Martin Jambor wrote: Hi, When GCC is compiled with clang, it emits a warning that dom_oracle::next_relation is not marked as override even though it does override a virtual function of its ancestor. This patch marks it as such to silence the warning and for the sake of consistency. There are other member functions in the class which are marked as final override but this particular function is in the protected section so I decided to just mark it as override. Bootstrapped and tested on x86_64-linx. OK for master? Alternatively, as with all of these clang warning issues, I'm perfectly happy to add an entry to contrib/filter-clang-warnings.py to ignore the warning instead. Thanks, Martin gcc/ChangeLog: 2025-06-24 Martin Jambor * value-relation.h (class dom_oracle): Mark member function next_relation as override. --- gcc/value-relation.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/value-relation.h b/gcc/value-relation.h index 1081877ccca..87f0d856fab 100644 --- a/gcc/value-relation.h +++ b/gcc/value-relation.h @@ -235,7 +235,7 @@ public: void dump (FILE *f) const final override; protected: virtual relation_chain *next_relation (basic_block, relation_chain *, -tree) const; +tree) const override; bool m_do_trans_p; bitmap m_tmp, m_tmp2; bitmap m_relation_set; // Index by ssa-name. True if a relation exists
[PATCH v7 4/9] AArch64: add constants for branch displacements
Extract the hardcoded values for the minimum PC-relative displacements into named constants and document them. gcc/ChangeLog: * config/aarch64/aarch64.md (BRANCH_LEN_P_128MiB): New constant. (BRANCH_LEN_N_128MiB): Likewise. (BRANCH_LEN_P_1MiB): Likewise. (BRANCH_LEN_N_1MiB): Likewise. (BRANCH_LEN_P_32KiB): Likewise. (BRANCH_LEN_N_32KiB): Likewise. --- gcc/config/aarch64/aarch64.md | 64 ++- 1 file changed, 48 insertions(+), 16 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index d79b74924d4..c4c23dc3669 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -704,7 +704,23 @@ (define_insn "jump" [(set_attr "type" "branch")] ) +;; Maximum PC-relative positive/negative displacements for various branching +;; instructions. +(define_constants + [ +;; +/- 128MiB. Used by B, BL. +(BRANCH_LEN_P_128MiB 134217724) +(BRANCH_LEN_N_128MiB -134217728) + +;; +/- 1MiB. Used by B., CBZ, CBNZ. +(BRANCH_LEN_P_1MiB 1048572) +(BRANCH_LEN_N_1MiB -1048576) +;; +/- 32KiB. Used by TBZ, TBNZ. +(BRANCH_LEN_P_32KiB 32764) +(BRANCH_LEN_N_32KiB -32768) + ] +) ;; --- ;; Conditional jumps @@ -769,13 +785,17 @@ (define_insn "aarch64_bcond" } [(set_attr "type" "branch") (set (attr "length") - (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576)) - (lt (minus (match_dup 2) (pc)) (const_int 1048572))) + (if_then_else (and (ge (minus (match_dup 2) (pc)) + (const_int BRANCH_LEN_N_1MiB)) + (lt (minus (match_dup 2) (pc)) + (const_int BRANCH_LEN_P_1MiB))) (const_int 4) (const_int 8))) (set (attr "far_branch") - (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576)) - (lt (minus (match_dup 2) (pc)) (const_int 1048572))) + (if_then_else (and (ge (minus (match_dup 2) (pc)) + (const_int BRANCH_LEN_N_1MiB)) + (lt (minus (match_dup 2) (pc)) + (const_int BRANCH_LEN_P_1MiB))) (const_int 0) (const_int 1)))] ) @@ -830,13 +850,17 @@ (define_insn "aarch64_cbz1" } [(set_attr "type" "branch") (set (attr "length") - (if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -1048576)) - (lt (minus (match_dup 1) (pc)) (const_int 1048572))) + (if_then_else (and (ge (minus (match_dup 1) (pc)) + (const_int BRANCH_LEN_N_1MiB)) + (lt (minus (match_dup 1) (pc)) + (const_int BRANCH_LEN_P_1MiB))) (const_int 4) (const_int 8))) (set (attr "far_branch") - (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576)) - (lt (minus (match_dup 2) (pc)) (const_int 1048572))) + (if_then_else (and (ge (minus (match_dup 2) (pc)) + (const_int BRANCH_LEN_N_1MiB)) + (lt (minus (match_dup 2) (pc)) + (const_int BRANCH_LEN_P_1MiB))) (const_int 0) (const_int 1)))] ) @@ -870,13 +894,17 @@ (define_insn "*aarch64_tbz1" } [(set_attr "type" "branch") (set (attr "length") - (if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -32768)) - (lt (minus (match_dup 1) (pc)) (const_int 32764))) + (if_then_else (and (ge (minus (match_dup 1) (pc)) + (const_int BRANCH_LEN_N_32KiB)) + (lt (minus (match_dup 1) (pc)) + (const_int BRANCH_LEN_P_32KiB))) (const_int 4) (const_int 8))) (set (attr "far_branch") - (if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -1048576)) - (lt (minus (match_dup 1) (pc)) (const_int 1048572))) + (if_then_else (and (ge (minus (match_dup 1) (pc)) + (const_int BRANCH_LEN_N_1MiB)) + (lt (minus (match_dup 1) (pc)) + (const_int BRANCH_LEN_P_1MiB))) (const_int 0) (const_int 1)))] ) @@ -931,13 +959,17 @@ (define_insn "@aarch64_tbz" } [(set_attr "type" "branch") (set (attr "length") - (if_then_else (and (ge (minus (match_dup 2) (pc)) 
(const_int -32768)) - (lt (minus (match_dup 2) (pc)) (const_int 32764))) + (if_then_else (and (ge (minus (match_dup 2) (pc)) + (const_int BRAN
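The new constants follow from the signed immediate widths of the branch encodings (26 bits for B/BL, 19 for B.cond/CBZ/CBNZ, 14 for TBZ/TBNZ, each counting 4-byte words). The bit widths are my own annotation rather than anything stated in the patch, but the arithmetic below reproduces exactly the values being named:

```cpp
#include <cstdio>

// Range of a branch whose encoding has an N-bit signed immediate counting
// 4-byte words: lowest = -2^(N-1) * 4, highest = (2^(N-1) - 1) * 4.
constexpr long long neg_limit (int bits) { return -(1LL << (bits - 1)) * 4; }
constexpr long long pos_limit (int bits) { return ((1LL << (bits - 1)) - 1) * 4; }

static_assert (neg_limit (26) == -134217728 && pos_limit (26) == 134217724,
               "B/BL: +/- 128MiB");
static_assert (neg_limit (19) == -1048576 && pos_limit (19) == 1048572,
               "B.cond/CBZ/CBNZ: +/- 1MiB");
static_assert (neg_limit (14) == -32768 && pos_limit (14) == 32764,
               "TBZ/TBNZ: +/- 32KiB");

int main () { std::printf ("limits check out\n"); }
```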
[PATCH v7 5/9] AArch64: make `far_branch` attribute a boolean
The `far_branch` attribute only ever takes the values 0 or 1, so make it a `no/yes` valued string attribute instead. gcc/ChangeLog: * config/aarch64/aarch64.md (far_branch): Replace 0/1 with no/yes. (aarch64_bcond): Handle rename. (aarch64_cbz1): Likewise. (*aarch64_tbz1): Likewise. (@aarch64_tbz): Likewise. --- gcc/config/aarch64/aarch64.md | 22 ++ 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index c4c23dc3669..1ff887a977e 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -569,9 +569,7 @@ (define_attr "enabled" "no,yes" ;; Attribute that specifies whether we are dealing with a branch to a ;; label that is far away, i.e. further away than the maximum/minimum ;; representable in a signed 21-bits number. -;; 0 :=: no -;; 1 :=: yes -(define_attr "far_branch" "" (const_int 0)) +(define_attr "far_branch" "no,yes" (const_string "no")) ;; Attribute that specifies whether the alternative uses MOVPRFX. (define_attr "movprfx" "no,yes" (const_string "no")) @@ -796,8 +794,8 @@ (define_insn "aarch64_bcond" (const_int BRANCH_LEN_N_1MiB)) (lt (minus (match_dup 2) (pc)) (const_int BRANCH_LEN_P_1MiB))) - (const_int 0) - (const_int 1)))] + (const_string "no") + (const_string "yes")))] ) ;; For a 24-bit immediate CST we can optimize the compare for equality @@ -861,8 +859,8 @@ (define_insn "aarch64_cbz1" (const_int BRANCH_LEN_N_1MiB)) (lt (minus (match_dup 2) (pc)) (const_int BRANCH_LEN_P_1MiB))) - (const_int 0) - (const_int 1)))] + (const_string "no") + (const_string "yes")))] ) ;; For an LT/GE comparison against zero, emit `TBZ`/`TBNZ` @@ -876,7 +874,7 @@ (define_insn "*aarch64_tbz1" { if (get_attr_length (insn) == 8) { - if (get_attr_far_branch (insn) == 1) + if (get_attr_far_branch (insn) == FAR_BRANCH_YES) return aarch64_gen_far_branch (operands, 1, "Ltb", "\\t%0, , "); else @@ -905,8 +903,8 @@ (define_insn "*aarch64_tbz1" (const_int BRANCH_LEN_N_1MiB)) (lt (minus (match_dup 1) (pc)) (const_int BRANCH_LEN_P_1MiB))) - (const_int 0) - (const_int 1)))] + (const_string "no") + (const_string "yes")))] ) ;; --- @@ -970,8 +968,8 @@ (define_insn "@aarch64_tbz" (const_int BRANCH_LEN_N_1MiB)) (lt (minus (match_dup 2) (pc)) (const_int BRANCH_LEN_P_1MiB))) - (const_int 0) - (const_int 1)))] + (const_string "no") + (const_string "yes")))] ) -- 2.45.2
Re: [PATCH] c++/modules: Ensure type of partial spec VAR_DECL is consistent with its template [PR120644]
On 6/25/25 9:02 AM, Nathaniel Shead wrote: On Tue, Jun 24, 2025 at 12:10:09PM -0400, Patrick Palka wrote: On Tue, 24 Jun 2025, Jason Merrill wrote: On 6/23/25 5:41 PM, Nathaniel Shead wrote: Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk/15? -- >8 -- We were erroring because the TEMPLATE_DECL of the existing partial specialisation has an undeduced return type, but the imported declaration did not. The root cause is similar to what was fixed in r13-2744-g4fac53d6522189, where modules streaming code assumes that a TEMPLATE_DECL and its DECL_TEMPLATE_RESULT will always have the same TREE_TYPE. That commit fixed the issue by ensuring that when the type of a variable is deduced the TEMPLATE_DECL is updated as well, but this missed handling partial specialisations. However, I don't think we actually care about that, since it seems that only the type of the inner decl actually matters in practice. Instead, this patch handles the issue on the modules side when deduping a streamed decl, by only comparing the inner type. PR c++/120644 gcc/cp/ChangeLog: * decl.cc (cp_finish_decl): Remove workaround. Hmm, if we aren't going to try to keep the type of the TEMPLATE_DECL correct, maybe we should always set it to NULL_TREE to make sure we only look at the inner type. FWIW cp_finish_decl can get at the TEMPLATE_DECL of a VAR_DECL corresponding to a partial specialization via TI_TEMPLATE (TI_PARTIAL_INFO (DECL_TEMPLATE_INFO (decl))) if we do want to end up keeping the two TREE_TYPEs in sync. Thanks. On further reflection, maybe the safest approach is to just ensure that the types are always consistent (including for partial specs); this is what the following patch does. Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? -- >8 -- Subject: [PATCH] c++/modules: Ensure type of partial spec VAR_DECL is consistent with its template [PR120644] We were erroring because the TEMPLATE_DECL of the existing partial specialisation has an undeduced return type, but the imported declaration did not. The root cause is similar to what was fixed in r13-2744-g4fac53d6522189, where modules streaming code assumes that a TEMPLATE_DECL and its DECL_TEMPLATE_RESULT will always have the same TREE_TYPE. That commit fixed the issue by ensuring that when the type of a variable is deduced the TEMPLATE_DECL is updated as well, but missed handling partial specialisations. This patch ensures that the same adjustment is made there as well. PR c++/120644 gcc/cp/ChangeLog: * decl.cc (cp_finish_decl): Also propagate type to partial templates. * module.cc (trees_out::decl_value): Add assertion that the TREE_TYPE of a streamed template decl matches its inner. (trees_in::is_matching_decl): Clarify function return type deduction should only occur for non-TEMPLATE_DECL. gcc/testsuite/ChangeLog: * g++.dg/modules/auto-7.h: New test. * g++.dg/modules/auto-7_a.H: New test. * g++.dg/modules/auto-7_b.C: New test. 
Signed-off-by: Nathaniel Shead Reviewed-by: Jason Merrill Reviewed-by: Patrick Palka --- gcc/cp/decl.cc | 13 + gcc/cp/module.cc| 7 ++- gcc/testsuite/g++.dg/modules/auto-7.h | 12 gcc/testsuite/g++.dg/modules/auto-7_a.H | 5 + gcc/testsuite/g++.dg/modules/auto-7_b.C | 5 + 5 files changed, 37 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/g++.dg/modules/auto-7.h create mode 100644 gcc/testsuite/g++.dg/modules/auto-7_a.H create mode 100644 gcc/testsuite/g++.dg/modules/auto-7_b.C diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc index 4fe97ffbf8f..59701197e16 100644 --- a/gcc/cp/decl.cc +++ b/gcc/cp/decl.cc @@ -8923,10 +8923,15 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p, cp_apply_type_quals_to_decl (cp_type_quals (type), decl); /* Update the type of the corresponding TEMPLATE_DECL to match. */ - if (DECL_LANG_SPECIFIC (decl) - && DECL_TEMPLATE_INFO (decl) - && DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (decl)) == decl) - TREE_TYPE (DECL_TI_TEMPLATE (decl)) = type; + if (DECL_LANG_SPECIFIC (decl) && DECL_TEMPLATE_INFO (decl)) + { + tree info = DECL_TEMPLATE_INFO (decl); + tree tmpl = TI_TEMPLATE (info); + if (DECL_TEMPLATE_RESULT (tmpl) == decl) + TREE_TYPE (tmpl) = type; + else if (PRIMARY_TEMPLATE_P (tmpl) && TI_PARTIAL_INFO (info)) + TREE_TYPE (TI_TEMPLATE (TI_PARTIAL_INFO (info))) = type; + } Perhaps we should update template_for_substitution to handle partial specs and then use it here? } if (ensure_literal_type_for_constexpr_object (decl) == error_mark_node) diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index c99988da05b..53edb2ff203 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -8212,6 +8212,10 @@ trees_out::d
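For reference, the kind of source that exercises this path is a variable template partial specialization declared with a placeholder type in a header unit, so that the TEMPLATE_DECL and its DECL_TEMPLATE_RESULT must agree once deduction has happened. The snippet below is only my guess at the shape of the new auto-7 tests, not their actual contents:

```cpp
// auto-7.h (hypothetical shape): both the primary template and the partial
// specialization have deduced types.
template <typename T>
constexpr auto is_pointer_v = false;

template <typename T>
constexpr auto is_pointer_v<T *> = true;   // partial specialization with auto

// auto-7_b.C would then import the header unit and use both:
static_assert (!is_pointer_v<int>);
static_assert (is_pointer_v<int *>);
```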
[PATCH v7 0/9] AArch64: CMPBR support
This patch series adds support for the CMPBR extension. It includes the new `+cmpbr` option and rules to generate the new instructions when lowering conditional branches. Changelog: * v7: - Support far branches and add a test for them. - Replace `aarch64_cb_short_operand` with `aarch64_reg_or_zero_operand`. - Delete the new predicates that aren't needed anymore. - Minor formatting and comment fixes. * v6: - Correct the constraint string for immediate operands. - Drop the commit for adding `%j` format specifiers. The suffix for the `cb` instruction is now calculated by the `cmp_op` code attribute. * v5: - Moved patch 10/10 (adding %j ...) before patch 8/10 (rules for CMPBR...). Every commit in the series should now produce a correct compiler. - Reduce excessive diff context by not passing `--function-context` to `git format-patch`. * v4: - Added a commit to use HS/LO instead of CS/CC mnemonics. - Rewrite the range checks for immediate RHSes in aarch64.cc: CBGE, CBHS, CBLE and CBLS have different ranges of allowed immediates than the other comparisons. Karl Meakin (9): AArch64: place branch instruction rules together AArch64: reformat branch instruction rules AArch64: rename branch instruction rules AArch64: add constants for branch displacements AArch64: make `far_branch` attribute a boolean AArch64: recognize `+cmpbr` option AArch64: precommit test for CMPBR instructions AArch64: rules for CMPBR instructions AArch64: make rules for CBZ/TBZ higher priority .../aarch64/aarch64-option-extensions.def |2 + gcc/config/aarch64/aarch64-protos.h |2 + gcc/config/aarch64/aarch64-simd.md|2 +- gcc/config/aarch64/aarch64-sme.md |2 +- gcc/config/aarch64/aarch64.cc | 39 +- gcc/config/aarch64/aarch64.h |3 + gcc/config/aarch64/aarch64.md | 564 -- gcc/config/aarch64/constraints.md | 18 + gcc/config/aarch64/iterators.md | 30 + gcc/doc/invoke.texi |3 + gcc/testsuite/gcc.target/aarch64/cmpbr.c | 1704 + gcc/testsuite/lib/target-supports.exp | 14 +- 12 files changed, 2162 insertions(+), 221 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/cmpbr.c -- 2.45.2
[PATCH v7 8/9] AArch64: rules for CMPBR instructions
Add rules for lowering `cbranch4` to CBB/CBH/CB when CMPBR extension is enabled. gcc/ChangeLog: * config/aarch64/aarch64-protos.h (aarch64_cb_rhs): New function. * config/aarch64/aarch64.cc (aarch64_cb_rhs): Likewise. * config/aarch64/aarch64.md (cbranch4): Rename to ... (cbranch4): ...here, and emit CMPBR if possible. (cbranch4): New expand rule. (aarch64_cb): New insn rule. (aarch64_cb): Likewise. * config/aarch64/constraints.md (Uc0): New constraint. (Uc1): Likewise. (Uc2): Likewise. * config/aarch64/iterators.md (cmpbr_suffix): New mode attr. (INT_CMP): New code iterator. (cmpbr_imm_constraint): New code attr. gcc/testsuite/ChangeLog: * gcc.target/aarch64/cmpbr.c: --- gcc/config/aarch64/aarch64-protos.h | 2 + gcc/config/aarch64/aarch64.cc| 33 ++ gcc/config/aarch64/aarch64.md| 95 +++- gcc/config/aarch64/constraints.md| 18 + gcc/config/aarch64/iterators.md | 30 ++ gcc/testsuite/gcc.target/aarch64/cmpbr.c | 587 --- 6 files changed, 379 insertions(+), 386 deletions(-) diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 31f2f5b8bd2..e946e8da11d 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -1135,6 +1135,8 @@ bool aarch64_general_check_builtin_call (location_t, vec, unsigned int, tree, unsigned int, tree *); +bool aarch64_cb_rhs (rtx_code op_code, rtx rhs); + namespace aarch64 { void report_non_ice (location_t, tree, unsigned int); void report_out_of_range (location_t, tree, unsigned int, HOST_WIDE_INT, diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 2cd03b941bd..f3ce3a15b09 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -959,6 +959,39 @@ svpattern_token (enum aarch64_svpattern pattern) gcc_unreachable (); } +/* Return true if RHS is an operand suitable for a CB (immediate) + instruction. OP_CODE determines the type of the comparison. */ +bool +aarch64_cb_rhs (rtx_code op_code, rtx rhs) +{ + if (!CONST_INT_P (rhs)) +return REG_P (rhs); + + HOST_WIDE_INT rhs_val = INTVAL (rhs); + + switch (op_code) +{ +case EQ: +case NE: +case GT: +case GTU: +case LT: +case LTU: + return IN_RANGE (rhs_val, 0, 63); + +case GE: /* CBGE: signed greater than or equal */ +case GEU: /* CBHS: unsigned greater than or equal */ + return IN_RANGE (rhs_val, 1, 64); + +case LE: /* CBLE: signed less than or equal */ +case LEU: /* CBLS: unsigned less than or equal */ + return IN_RANGE (rhs_val, -1, 62); + +default: + return false; +} +} + /* Return the location of a piece that is known to be passed or returned in registers. FIRST_ZR is the first unused vector argument register and FIRST_PR is the first unused predicate argument register. */ diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 1ff887a977e..32e0f739ae5 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -717,6 +717,10 @@ (define_constants ;; +/- 32KiB. Used by TBZ, TBNZ. (BRANCH_LEN_P_32KiB 32764) (BRANCH_LEN_N_32KiB -32768) + +;; +/- 1KiB. Used by CBB, CBH, CB. 
+(BRANCH_LEN_P_1Kib 1020) +(BRANCH_LEN_N_1Kib -1024) ] ) @@ -724,18 +728,35 @@ (define_constants ;; Conditional jumps ;; --- -(define_expand "cbranch4" +(define_expand "cbranch4" [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" [(match_operand:GPI 1 "register_operand") (match_operand:GPI 2 "aarch64_plus_operand")]) (label_ref (match_operand 3)) (pc)))] "" - " - operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1], -operands[2]); - operands[2] = const0_rtx; - " + { + if (TARGET_CMPBR && aarch64_cb_rhs (GET_CODE (operands[0]), operands[2])) +{ + /* Fall-through to `aarch64_cb`. */ +} + else +{ + operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), +operands[1], operands[2]); + operands[2] = const0_rtx; +} + } +) + +(define_expand "cbranch4" + [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" + [(match_operand:SHORT 1 "register_operand") +(match_operand:SHORT 2 "aarch64_reg_or_zero")]) + (label_ref (match_operand 3)) + (pc)))] + "TARGET_CMPBR" + "" ) (define_expand "cbranch4" @@ -763,6 +784,68 @@
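Restating the immediate ranges from aarch64_cb_rhs above at the source level may help when reading the tests: register operands and immediates 0-63 are accepted for most comparisons, while GE/GEU accept 1-64 and LE/LEU accept -1-62, presumably because >= N can be re-expressed as > N-1 within the 6-bit immediate field. The functions below are illustrative inputs in the style of the testsuite file, not part of it, and the comments describe what aarch64_cb_rhs accepts rather than guaranteed final code generation.

```cpp
int taken ();
int not_taken ();

// 63 is within the 0..63 immediate range for GTU, so aarch64_cb_rhs accepts
// it and a single compare-and-branch form is available.
int gt63 (unsigned x) { return x > 63 ? taken () : not_taken (); }

// 64 is outside 0..63, but GEU is given the shifted range 1..64, so this
// comparison is still accepted by aarch64_cb_rhs.
int ge64 (unsigned x) { return x >= 64 ? taken () : not_taken (); }

// 100 fits none of the CB immediate ranges, so the expander falls back to
// aarch64_gen_compare_reg, i.e. a CMP followed by a conditional branch.
int gt100 (unsigned x) { return x > 100 ? taken () : not_taken (); }
```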
[PATCH v7 7/9] AArch64: precommit test for CMPBR instructions
Commit the test file `cmpbr.c` before rules for generating the new instructions are added, so that the changes in codegen are more obvious in the next commit. gcc/testsuite/ChangeLog: * lib/target-supports.exp: Add `cmpbr` to the list of extensions. * gcc.target/aarch64/cmpbr.c: New test. --- gcc/testsuite/gcc.target/aarch64/cmpbr.c | 1877 ++ gcc/testsuite/lib/target-supports.exp| 14 +- 2 files changed, 1885 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/cmpbr.c diff --git a/gcc/testsuite/gcc.target/aarch64/cmpbr.c b/gcc/testsuite/gcc.target/aarch64/cmpbr.c new file mode 100644 index 000..9ca376a8f33 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/cmpbr.c @@ -0,0 +1,1877 @@ +// Test that the instructions added by FEAT_CMPBR are emitted +// { dg-do compile } +// { dg-do-if assemble { target aarch64_asm_cmpbr_ok } } +// { dg-options "-march=armv9.5-a+cmpbr -O2" } +// { dg-final { check-function-bodies "**" "*/" "" { target *-*-* } {\.L[0-9]+} } } + +#include + +typedef uint8_t u8; +typedef int8_t i8; + +typedef uint16_t u16; +typedef int16_t i16; + +typedef uint32_t u32; +typedef int32_t i32; + +typedef uint64_t u64; +typedef int64_t i64; + +int taken(); +int not_taken(); + +#define COMPARE(ty, name, op, rhs) \ + int ty##_x0_##name##_##rhs(ty x0, ty x1) { \ +return (x0 op rhs) ? taken() : not_taken(); \ + } + +#define COMPARE_ALL(unsigned_ty, signed_ty, rhs) \ + COMPARE(unsigned_ty, eq, ==, rhs); \ + COMPARE(unsigned_ty, ne, !=, rhs); \ + \ + COMPARE(unsigned_ty, ult, <, rhs); \ + COMPARE(unsigned_ty, ule, <=, rhs); \ + COMPARE(unsigned_ty, ugt, >, rhs); \ + COMPARE(unsigned_ty, uge, >=, rhs); \ + \ + COMPARE(signed_ty, slt, <, rhs); \ + COMPARE(signed_ty, sle, <=, rhs); \ + COMPARE(signed_ty, sgt, >, rhs); \ + COMPARE(signed_ty, sge, >=, rhs); + +// CBB (register) +COMPARE_ALL(u8, i8, x1); + +// CBH (register) +COMPARE_ALL(u16, i16, x1); + +// CB (register) +COMPARE_ALL(u32, i32, x1); +COMPARE_ALL(u64, i64, x1); + +// CB (immediate) +COMPARE_ALL(u32, i32, 42); +COMPARE_ALL(u64, i64, 42); + +// Special cases +// Comparisons against the immediate 0 can be done for all types, +// because we can use the wzr/xzr register as one of the operands. +// However, we should prefer to use CBZ/CBNZ or TBZ/TBNZ when possible, +// because they have larger range. +COMPARE_ALL(u8, i8, 0); +COMPARE_ALL(u16, i16, 0); +COMPARE_ALL(u32, i32, 0); +COMPARE_ALL(u64, i64, 0); + +// CBB and CBH cannot have immediate operands. +// Instead we have to do a MOV+CB. +COMPARE_ALL(u8, i8, 42); +COMPARE_ALL(u16, i16, 42); + +// 64 is out of the range for immediate operands (0 to 63). +// * For 8/16-bit types, use a MOV+CB as above. +// * For 32/64-bit types, use a CMP+B instead, +// because B has a longer range than CB. +COMPARE_ALL(u8, i8, 64); +COMPARE_ALL(u16, i16, 64); +COMPARE_ALL(u32, i32, 64); +COMPARE_ALL(u64, i64, 64); + +// 4098 is out of the range for CMP (0 to 4095, optionally shifted by left by 12 +// bits), but it can be materialized in a single MOV. 
+COMPARE_ALL(u16, i16, 4098); +COMPARE_ALL(u32, i32, 4098); +COMPARE_ALL(u64, i64, 4098); + +// If the branch destination is out of range (1KiB), we have to generate an +// extra B instruction (which can handle larger displacements) and branch around +// it +int far_branch(i32 x, i32 y) { + volatile int z = 0; + if (x == y) { +// clang-format off +#define STORE_2() z = 0; z = 0; +#define STORE_4() STORE_2(); STORE_2(); +#define STORE_8() STORE_4(); STORE_4(); +#define STORE_16() STORE_8(); STORE_8(); +#define STORE_32() STORE_16(); STORE_16(); +#define STORE_64() STORE_32(); STORE_32(); +#define STORE_128() STORE_64(); STORE_64(); +#define STORE_256() STORE_128(); STORE_128(); +// clang-format on + +STORE_256(); + } + return taken(); +} + +/* +** u8_x0_eq_x1: +** and w1, w1, 255 +** cmp w1, w0, uxtb +** beq .L4 +** b not_taken +** .L4: +** b taken +*/ + +/* +** u8_x0_ne_x1: +** and w1, w1, 255 +** cmp w1, w0, uxtb +** beq .L6 +** b taken +** .L6: +** b not_taken +*/ + +/* +** u8_x0_ult_
[PATCH v7 1/9] AArch64: place branch instruction rules together
The rules for conditional branches were spread throughout `aarch64.md`. Group them together so it is easier to understand how `cbranch4` is lowered to RTL. gcc/ChangeLog: * config/aarch64/aarch64.md (condjump): Move. (*compare_condjump): Likewise. (aarch64_cb1): Likewise. (*cb1): Likewise. (tbranch_3): Likewise. (@aarch64_tb): Likewise. --- gcc/config/aarch64/aarch64.md | 387 ++ 1 file changed, 201 insertions(+), 186 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index e11e13033d2..fcc24e300e6 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -682,6 +682,10 @@ (define_insn "aarch64_write_sysregti" "msrr\t%x0, %x1, %H1" ) +;; --- +;; Unconditional jumps +;; --- + (define_insn "indirect_jump" [(set (pc) (match_operand:DI 0 "register_operand" "r"))] "" @@ -700,6 +704,12 @@ (define_insn "jump" [(set_attr "type" "branch")] ) + + +;; --- +;; Conditional jumps +;; --- + (define_expand "cbranch4" [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" [(match_operand:GPI 1 "register_operand") @@ -739,6 +749,197 @@ (define_expand "cbranchcc4" "" "") +(define_insn "condjump" + [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" + [(match_operand 1 "cc_register" "") (const_int 0)]) + (label_ref (match_operand 2 "" "")) + (pc)))] + "" + { +/* GCC's traditional style has been to use "beq" instead of "b.eq", etc., + but the "." is required for SVE conditions. */ +bool use_dot_p = GET_MODE (operands[1]) == CC_NZCmode; +if (get_attr_length (insn) == 8) + return aarch64_gen_far_branch (operands, 2, "Lbcond", +use_dot_p ? "b.%M0\\t" : "b%M0\\t"); +else + return use_dot_p ? "b.%m0\\t%l2" : "b%m0\\t%l2"; + } + [(set_attr "type" "branch") + (set (attr "length") + (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576)) + (lt (minus (match_dup 2) (pc)) (const_int 1048572))) + (const_int 4) + (const_int 8))) + (set (attr "far_branch") + (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576)) + (lt (minus (match_dup 2) (pc)) (const_int 1048572))) + (const_int 0) + (const_int 1)))] +) + +;; For a 24-bit immediate CST we can optimize the compare for equality +;; and branch sequence from: +;; mov x0, #imm1 +;; movkx0, #imm2, lsl 16 /* x0 contains CST. 
*/ +;; cmp x1, x0 +;; b .Label +;; into the shorter: +;; sub x0, x1, #(CST & 0xfff000) +;; subsx0, x0, #(CST & 0x000fff) +;; b .Label +(define_insn_and_split "*compare_condjump" + [(set (pc) (if_then_else (EQL + (match_operand:GPI 0 "register_operand" "r") + (match_operand:GPI 1 "aarch64_imm24" "n")) + (label_ref:P (match_operand 2 "" "")) + (pc)))] + "!aarch64_move_imm (INTVAL (operands[1]), mode) + && !aarch64_plus_operand (operands[1], mode) + && !reload_completed" + "#" + "&& true" + [(const_int 0)] + { +HOST_WIDE_INT lo_imm = UINTVAL (operands[1]) & 0xfff; +HOST_WIDE_INT hi_imm = UINTVAL (operands[1]) & 0xfff000; +rtx tmp = gen_reg_rtx (mode); +emit_insn (gen_add3 (tmp, operands[0], GEN_INT (-hi_imm))); +emit_insn (gen_add3_compare0 (tmp, tmp, GEN_INT (-lo_imm))); +rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM); +rtx cmp_rtx = gen_rtx_fmt_ee (, mode, + cc_reg, const0_rtx); +emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2])); +DONE; + } +) + +(define_insn "aarch64_cb1" + [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r") + (const_int 0)) + (label_ref (match_operand 1 "" "")) + (pc)))] + "!aarch64_track_speculation" + { +if (get_attr_length (insn) == 8) + return aarch64_gen_far_branch (operands, 1, "Lcb", "\\t%0, "); +else + return "\\t%0, %l1"; + } + [(set_attr "type" "branch") + (set (attr "length") + (if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -1048576)) + (lt (minus (match_dup 1) (pc)) (const_int 1048572))) + (const_int 4) + (const_int 8))) + (set (attr "far_branch") + (if_then_else (and (ge (minu
Re: [PATCH] libstdc++: Type-erase chrono-data for formatting [PR110739]
Tomasz Kamiński writes: > This patch reworks the formatting for the chrono types, such that they are all > formatted in terms of the _ChronoData class, which includes all required fields. > Populating each required field is performed in the formatter for the specific type, > based on the chrono-spec used. > > To facilitate the above, the _ChronoSpec now includes an additional _M_needed field, > that represents the chrono data that is referenced by the format spec (this value > is also configured for __defSpec). This value differs from the value of > __parts passed to _M_parse, which does include all fields that can be computed > from input (e.g. weekday_indexed can be computed for year_month_day). Later > it is used to fill _ChronoData, in particular the _M_fill_* family of functions, > to determine if a given field needs to be set, and thus its value needs to be > computed. > > In consequence the _ChronoParts enum was extended with additional values, > which allows more fine-grained identification: > * _TimeOfDay is separated into _HoursMinutesSeconds and _Subseconds, > * _TimeZone is separated into _ZoneAbbrev and _ZoneOffset, > * _LocalDays, _WeekdayIndex are defined and included in _Date, > * _Duration is removed, and instead _EpochUnits and _UnitSuffix are >introduced. > Furthermore, to avoid name conflicts _ChronoParts is now defined as enum class, > with additional operators that simplify uses. > > In addition to fields that can be printed using the chrono-spec, _ChronoData > stores: > * Total days in wall time (_M_ldays), day of year (_M_day_of_year) - used by >struct tm construction, and for ISO calendar computation. > * Total seconds in wall time (_M_lseconds) - this value may be different from >the sum of days, hours, minutes, seconds (e.g. see utc_time below). Included >to allow future extension, like printing total minutes. > * Total seconds since epoch - due to the offset, different from the above. Again to be >used with future extension (e.g. %s as proposed in P2945R1). > * Subseconds - count of attoseconds (10^(-18)), which, in addition to printing, can >be used to compute fractional hours, minutes. > For both total seconds fields we use a single _TotalSeconds enumerator in > _ChronoParts, that when present in combination with _EpochUnits or _LocalDays > indicates that _M_eseconds (_EpochSeconds) or _M_lseconds (_LocalSeconds) are > provided/required. > > To handle the formatting of time since epoch ('%Q'|_EpochUnits), we use the > format_args mechanism, where the result of +d.count() (see LWG4118) is erased > into make_format_args to a local __arg_store, that is later referenced by > _M_ereps (_M_ereps.get(0)). > > To handle precision values, and in preparation to allow the user to configure ones, > we store the precision as the third element of _M_ereps (_M_ereps.get(2)); this > allows a duration with precision to be printed using "{0:{2}}". For subseconds > the precision is handled differently depending on the representation: > * for integral reps, the _M_subseconds value is used to determine the fractional > value, >the precision is trimmed to 18 digits; > * for floating-point reps, _M_ereps stores a duration initialized with only >the fractional seconds, that is later formatted with the precision. > Always using the _M_subseconds field for integral durations means that we do not > use the formatter for user-defined durations that are considered to be integral > (see the empty_spec.cc file change). To avoid potentially expensive computation > of _M_subseconds, we make sure that _ChronoParts::_Subseconds is set only if > _Subseconds are needed.
In particular we remove this flag for localized output > in _M_parse. > > Constructing the _M_ereps as described above is handled by > __formatter_duration, > which is then used to format the duration, hh_mm_ss and time_point specializations. > This class also handles _UnitSuffix; the _M_units_suffix field is populated > either with a predefined suffix (chrono::__detail::__units_suffix) or one > produced > locally. > > Finally, formatters for the types listed below contain type-specific logic: > * hh_mm_ss - we do not compute the total duration and seconds, unless explicitly >requested, as such computation may overflow; > * utc_time - for time during leap second insertion, the _M_seconds field is >increased to 60; > * __local_time_fmt - an exception is thrown if the zone offset (_ZoneOffset) or >abbreviation (_ZoneAbbrev) is requested, but the corresponding pointer is null; >furthermore conversion from `char` to `wchar_t` for the abbreviation is > performed >if needed. > > PR libstdc++/110739 > > libstdc++-v3/ChangeLog: > > * include/bits/chrono_io.h (__format::__no_timezone_available): > Removed, replaced with separate throws in formatter for > __local_time_fmt > (__format::_ChronoParts): Defined additional enumerators and > declared as enum class. > (__format::operator&(_ChronoParts, _ChronoParts)) > (__format::operator&=(_ChronoParts&, _ChronoParts)) > (__format
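As a user-level reminder of what those parts correspond to, here is the kind of formatting the reworked path has to serve. This is plain standard chrono/format usage (requiring a C++20/23 library with chrono formatting), not new library API from the patch; the mapping to _ChronoData fields in the comments is my reading of the description above.

```cpp
#include <chrono>
#include <format>
#include <iostream>

int main ()
{
  using namespace std::chrono;
  auto tp = sys_days{2025y/June/25} + 10h + 30min + 15s + 250ms;

  // Date fields, hours/minutes/seconds and subseconds correspond to what the
  // patch models as _Date, _HoursMinutesSeconds and _Subseconds.
  std::cout << std::format ("{:%Y-%m-%d %H:%M:%S}\n", tp);

  // The count of units since the epoch and the unit suffix ("ms" here)
  // correspond to _EpochUnits and _UnitSuffix; %Q prints the count, %q the
  // suffix.
  std::cout << std::format ("{:%Q%q}\n", tp.time_since_epoch ());
}
```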
Re: [PATCH 03/17] Diagnostics: Mark path_label::get_effects as final override
On Wed, 2025-06-25 at 16:04 +0200, Martin Jambor wrote: > Hi, > > When compiling diagnostic-path-output.cc with clang, it warns that > path_label::get_effects should be marked as override. That looks > like > a good idea and from a brief look I also believe it should be marked > as final (the other override in the class is marked as both), so this > patch does that. > > Likewise for html_output_format::after_diagnostic in > diagnostic-format-html.cc which also already has quite a few member > functions marked as final override. > > Bootstrapped and tested on x86_64-linx. OK for master? Yes please Thanks for doing this Dave
Re: [RFC PATCH] c++: Implement C++26 P3533R2 - constexpr virtual inheritance [PR120777]
On 6/25/25 3:08 AM, Jakub Jelinek wrote: On Tue, Jun 24, 2025 at 05:19:59PM -0400, Jason Merrill wrote: I think we could move the initialization of the fixed_type_p and virtual_access variables up, they don't need to be after cp_build_addr_expr. I don't understand why it doesn't depend on cp_build_addr_expr. I've tried the following patch and while it didn't regress anything on make GXX_TESTSUITE_STDS=98,11,14,17,^C,23,26 check-g++ it regressed FAIL: 23_containers/vector/bool/cmp_c++20.cc -std=gnu++20 (test for excess errors) FAIL: 23_containers/vector/bool/cmp_c++20.cc -std=gnu++26 (test for excess errors) In there code is PLUS_EXPR, !want_pointer, !has_empty, but uneval is true and expr is std::vector::begin (&c) before cp_build_addr_expr and &TARGET_EXPR ::begin (&c)> after it. resolves_to_fixed_type_p (expr) is 0 before cp_build_addr_expr and 1 after it. Ah, looks like fixed_type_or_null needs to handle a CALL_EXPR of class type like a TARGET_EXPR. I also wonder why the call isn't already wrapped in a TARGET_EXPR by build_cxx_call=>build_cplus_new at this point. v_binfo is false though, so in that particular case I think we don't actually care about fixed_type_p value, but it doesn't raise confidence that testing resolves_to_fixed_type_p early is ok. So, shall I e.g. for the if (TREE_PRIVATE case if the outer type has CLASSTYPE_VBASECLASSES walk the for (vbase = TYPE_BINFO (t); vbase; vbase = TREE_CHAIN (vbase)) if (BINFO_VIRTUAL_P (vbase) && !BINFO_PRIMARY_P (vbase)) and in that case try to compare byte_position (TREE_OPERAND (path, 1)) against BINFO_OFFSET (vbase) and if it matches (plus perhaps some type check?) then decide based on BINFO_BASE_ACCESS or something like that whether it was a private/protected vs. public virtual base? It seems simpler to pass an accurate access to the build_base_field above. At least whether the whole BINFO_INHERITANCE_CHAIN is public or not, I suppose the distinction between private and protected doesn't matter. I'm afraid I'm quite lost on what actually is public base class that [expr.dynamic.cast] talks about in the case of virtual bases because a virtual base can appear many times among the bases and if it is virtual in all cases, there is just one copy of it and it can be public in some paths and private/protected in others. And where to find that information. I think it would make sense to add a publicly_virtually_derived_p function after publicly_uniquely_derived_p, which adds ba_require_virtual to the flags passed by the latter function. And then you can use that here. I've tried the following testcase and it seems that it succeeds unless -DP1 -DP2 -DP1 -DP3 -DP1 -DP6 -DP2 -DP3 -DP6 -DP4 -DP5 -DP6 -DP2 -DP3 -DP4 -DP5 is a subset of the -DPN options or in case of clang++ also -DP2 -DP4 -DP5 (for that g++ passes, clang++ fails). E.g. what is the difference between -DP1 which works and S is private in one case and public in 2 others, while -DP1 -DP2 doesn't work and is private in two cases and public in one. Hmm, that does seem wrong. For -DP1 -DP2 dynamic_cast, following the logic in https://eel.is/c++draft/expr#dynamic.cast-9 we get 9.1) in t, (S&)t does not refer to a public base of a T, because -DP1 makes it a private base. So move on. 9.2) (S&)t does refer to a public base of t because we didn't specify -DP4. V does have an unambiguous (because virtual) public (because no -DP3 -DP5) T base, so this ought to succeed. 
This looks like https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81078 Jason #ifdef P1 #undef P1 #define P1 private #else #define P1 #endif #ifdef P2 #undef P2 #define P2 private #else #define P2 #endif #ifdef P3 #undef P3 #define P3 private #else #define P3 #endif #ifdef P4 #undef P4 #define P4 private #else #define P4 #endif #ifdef P5 #undef P5 #define P5 private #else #define P5 #endif #ifdef P6 #undef P6 #define P6 private #else #define P6 #endif struct S { int a, b; virtual int bar (int) { return 0; } }; struct T : virtual P1 S { int c, d; }; struct U : virtual P2 S, virtual P3 T { int e; }; struct V : virtual P4 S, virtual P5 T, virtual P6 U { int f; S &foo () { return (S &)*this; } }; int main () { V t; t.f = 1; // t.c = 2; dynamic_cast (t.foo ()); dynamic_cast (t.foo ()); dynamic_cast (t.foo ()); }
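A further reduced variant of the reasoning above, with a single virtual base, may make the 9.1/9.2 split easier to see. This is my own reduction, so how current GCC actually behaves on it is untested; per the discussion, a compiler affected by PR81078 may still fail the cast at runtime.

```cpp
// S is a *private* base within T (so bullet 9.1 of [expr.dynamic.cast]
// fails), but a *public* virtual base of the most derived object V, and V
// has an unambiguous public T base, so bullet 9.2 says the cast succeeds.
struct S { virtual ~S () {} int a; };
struct T : private virtual S { int c; };
struct V : public virtual S, public T
{
  S &as_s () { return (S &) *this; }
};

int main ()
{
  V v;
  // Should bind to the T subobject of v per 9.2; throws std::bad_cast on a
  // compiler that only considers the private path.
  T &t = dynamic_cast<T &> (v.as_s ());
  (void) t;
  return 0;
}
```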
Re: [RFC PATCH] c++: Implement C++26 P3533R2 - constexpr virtual inheritance [PR120777]
On Wed, Jun 25, 2025 at 12:37:33PM -0400, Jason Merrill wrote: > Ah, looks like fixed_type_or_null needs to handle a CALL_EXPR of class type > like a TARGET_EXPR. I also wonder why the call isn't already wrapped in a > TARGET_EXPR by build_cxx_call=>build_cplus_new at this point. Wonder if it has anything to do with being in unevaluated context (and whether perhaps cp_build_addr_expr isn't undesirable for that case, because that can make vars odr-used etc.; are are odr uses in unevaluated context also supposed to make vars odr-used?). Jakub
Re: [PATCH 16/17] Fortran: Silence a clang warning (suggesting a brace) in io.cc
Thanks for cleaning up gfortran code. I was curious about what the GNU Coding Standard said about this case, but it does not consider initialization of subobjects. I did find 5.3 Clean Use of C Constructs ... Don't make the program ugly just to placate static analysis tools such as lint, clang, and GCC with extra warnings options such as -Wconversion and -Wundef. These tools can help find bugs and unclear code, but they can also generate so many false alarms that it hurts readability to silence them with unnecessary casts, wrappers, and other complications. I do not see the extra '{...}' as hurting readability. I have no objection to the change. Does anyone else have a comment? -- steve On Wed, Jun 25, 2025 at 04:18:16PM +0200, Martin Jambor wrote: > Hi, > > when GCC is built with clang, it suggests that we add a brace to the > initialization of format_asterisk: > > gcc/fortran/io.cc:32:16: warning: suggest braces around initialization of > subobject [-Wmissing-braces] > > So this patch does that to silence it. > > Bootstrapped and tested on x86_64-linx. OK for master? > > Alternatively, as with all of these clang warning issues, I'm > perfectly happy to add an entry to contrib/filter-clang-warnings.py to > ignore the warning instead. > > Thanks, > > Martin > > > > gcc/fortran/ChangeLog: > > 2025-06-24 Martin Jambor > > * io.cc (format_asterisk): Add a brace around static initialization > location part of the field locus. > --- > gcc/fortran/io.cc | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/fortran/io.cc b/gcc/fortran/io.cc > index 7466d8fe094..4d28c2c90ba 100644 > --- a/gcc/fortran/io.cc > +++ b/gcc/fortran/io.cc > @@ -29,7 +29,7 @@ along with GCC; see the file COPYING3. If not see > > gfc_st_label > format_asterisk = {0, NULL, NULL, -1, ST_LABEL_FORMAT, ST_LABEL_FORMAT, NULL, > -0, {NULL, NULL}, NULL, 0}; > +0, {NULL, {NULL}}, NULL, 0}; > > typedef struct > { > -- > 2.49.0 -- Steve
Re: [PATCH v6 9/9] AArch64: make rules for CBZ/TBZ higher priority
Karl Meakin writes: > Move the rules for CBZ/TBZ to be above the rules for > CBB/CBH/CB. We want them to have higher priority > because they can express larger displacements. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (aarch64_cbz1): Move > above rules for CBB/CBH/CB. > (*aarch64_tbz1): Likewise. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/cmpbr.c: Update tests. > --- > gcc/config/aarch64/aarch64.md| 159 --- > gcc/testsuite/gcc.target/aarch64/cmpbr.c | 32 ++--- > 2 files changed, 101 insertions(+), 90 deletions(-) > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index 23bce55f620..dd58e88fa2f 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -726,6 +726,17 @@ (define_constants > > ;; --- > ;; Conditional jumps Very, very minor, but: if we're following the aarch64-sve.md convention, there'd be another: ;; --- here, to separate the heading from the description. > +;; The order of the rules below is important. > +;; Higher priority rules are preferred because they can express larger > +;; displacements. > +;; 1) EQ/NE comparisons against zero are handled by CBZ/CBNZ. > +;; 2) LT/GE comparisons against zero are handled by TBZ/TBNZ. > +;; 3) When the CMPBR extension is enabled: > +;; a) Comparisons between two registers are handled by > +;; CBB/CBH/CB. > +;; b) Comparisons between a GP register and an immediate in the range 0-63 > are Maybe just "in-range immediate", given the multiple ranges in play. OK with those changes, thanks. However, I suppose this patch means that: /* Fall through to `aarch64_cb`. */ from patch 8 is not really accurate, since sometimes we might snag a higher-priority comparison. So maybe just. /* The branch is supported natively. */ Thanks, Richard > +;; handled by CB (immediate). > +;; 4) Otherwise, emit a CMP+B sequence. 
> ;; --- > > (define_expand "cbranch4" > @@ -783,6 +794,80 @@ (define_expand "cbranchcc4" >"" > ) > > +;; For an EQ/NE comparison against zero, emit `CBZ`/`CBNZ` > +(define_insn "aarch64_cbz1" > + [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r") > + (const_int 0)) > +(label_ref (match_operand 1)) > +(pc)))] > + "!aarch64_track_speculation" > + { > +if (get_attr_length (insn) == 8) > + return aarch64_gen_far_branch (operands, 1, "Lcb", "\\t%0, > "); > +else > + return "\\t%0, %l1"; > + } > + [(set_attr "type" "branch") > + (set (attr "length") > + (if_then_else (and (ge (minus (match_dup 1) (pc)) > +(const_int BRANCH_LEN_N_1MiB)) > +(lt (minus (match_dup 1) (pc)) > +(const_int BRANCH_LEN_P_1MiB))) > + (const_int 4) > + (const_int 8))) > + (set (attr "far_branch") > + (if_then_else (and (ge (minus (match_dup 2) (pc)) > +(const_int BRANCH_LEN_N_1MiB)) > +(lt (minus (match_dup 2) (pc)) > +(const_int BRANCH_LEN_P_1MiB))) > + (const_string "no") > + (const_string "yes")))] > +) > + > +;; For an LT/GE comparison against zero, emit `TBZ`/`TBNZ` > +(define_insn "*aarch64_tbz1" > + [(set (pc) (if_then_else (LTGE (match_operand:ALLI 0 "register_operand" > "r") > + (const_int 0)) > +(label_ref (match_operand 1)) > +(pc))) > + (clobber (reg:CC CC_REGNUM))] > + "!aarch64_track_speculation" > + { > +if (get_attr_length (insn) == 8) > + { > + if (get_attr_far_branch (insn) == FAR_BRANCH_YES) > + return aarch64_gen_far_branch (operands, 1, "Ltb", > + "\\t%0, , "); > + else > + { > + char buf[64]; > + uint64_t val = ((uint64_t) 1) > + << (GET_MODE_SIZE (mode) * BITS_PER_UNIT - 1); > + sprintf (buf, "tst\t%%0, %" PRId64, val); > + output_asm_insn (buf, operands); > + return "\t%l1"; > + } > + } > +else > + return "\t%0, , %l1"; > + } > + [(set_attr "type" "branch") > + (set (attr "length") > + (if_then_else (and (ge (minus (match_dup 1) (pc)) > +(const_int BRANCH_LEN_N_32KiB)) > +(lt (minus (match_dup 1) (pc)) > +(const_int BRANCH_LEN_P_32KiB))) > + (const_int 4) > + (const_int 8))) > + (set (attr "far_branch") > + (if_then_else (and (ge (minus (match_dup 1) (pc)) > +
[PATCH v3] x86: Add preserve_none and update no_caller_saved_registers attributes
Add preserve_none attribute which is similar to no_callee_saved_registers attribute, except on x86-64, r12, r13, r14, r15, rdi and rsi registers are used for integer parameter passing. This can be used in an interpreter to avoid saving/restoring the registers in functions which process byte codes. It improved the pystones benchmark by 6-7%: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628#c15 Remove -mgeneral-regs-only restriction on no_caller_saved_registers attribute. Only SSE is allowed since SSE XMM register load preserves the upper bits in YMM/ZMM register while YMM register load zeros the upper 256 bits of ZMM register, and preserving 32 ZMM registers can be quite expensive. gcc/ PR target/119628 * config/i386/i386-expand.cc (ix86_expand_call): Call ix86_type_no_callee_saved_registers_p instead of looking up no_callee_saved_registers attribute. * config/i386/i386-options.cc (ix86_set_func_type): Look up preserve_none attribute. Check preserve_none attribute for interrupt attribute. Don't check no_caller_saved_registers nor no_callee_saved_registers conflicts here. (ix86_set_func_type): Check no_callee_saved_registers before checking no_caller_saved_registers attribute. (ix86_set_current_function): Allow SSE with no_caller_saved_registers attribute. (ix86_handle_call_saved_registers_attribute): Check preserve_none, no_callee_saved_registers and no_caller_saved_registers conflicts. (ix86_gnu_attributes): Add preserve_none attribute. * config/i386/i386-protos.h (ix86_type_no_callee_saved_registers_p): New. * config/i386/i386.cc (x86_64_preserve_none_int_parameter_registers): New. (ix86_using_red_zone): Don't use red-zone when there are no caller-saved registers with SSE. (ix86_type_no_callee_saved_registers_p): New. (ix86_function_ok_for_sibcall): Also check TYPE_PRESERVE_NONE and call ix86_type_no_callee_saved_registers_p instead of looking up no_callee_saved_registers attribute. (ix86_comp_type_attributes): Call ix86_type_no_callee_saved_registers_p instead of looking up no_callee_saved_registers attribute. Return 0 if preserve_none attribute doesn't match in 64-bit mode. (ix86_function_arg_regno_p): For cfun with TYPE_PRESERVE_NONE, use x86_64_preserve_none_int_parameter_registers. (init_cumulative_args): Set preserve_none_abi. (function_arg_64): Use x86_64_preserve_none_int_parameter_registers with preserve_none attribute. (setup_incoming_varargs_64): Use x86_64_preserve_none_int_parameter_registers with preserve_none attribute. (ix86_save_reg): Treat TYPE_PRESERVE_NONE like TYPE_NO_CALLEE_SAVED_REGISTERS. (ix86_nsaved_sseregs): Allow saving XMM registers for no_caller_saved_registers attribute. (ix86_compute_frame_layout): Likewise. (x86_this_parameter): Use x86_64_preserve_none_int_parameter_registers with preserve_none attribute. * config/i386/i386.h (ix86_args): Add preserve_none_abi. (call_saved_registers_type): Add TYPE_PRESERVE_NONE. (machine_function): Change call_saved_registers to 3 bits. * doc/extend.texi: Add preserve_none attribute. Update no_caller_saved_registers attribute to remove -mgeneral-regs-only restriction. gcc/testsuite/ PR target/119628 * gcc.target/i386/no-callee-saved-3.c: Adjust error location. * gcc.target/i386/no-callee-saved-19a.c: New test. * gcc.target/i386/no-callee-saved-19b.c: Likewise. * gcc.target/i386/no-callee-saved-19c.c: Likewise. * gcc.target/i386/no-callee-saved-19d.c: Likewise. * gcc.target/i386/no-callee-saved-19e.c: Likewise. * gcc.target/i386/preserve-none-1.c: Likewise. * gcc.target/i386/preserve-none-2.c: Likewise. 
* gcc.target/i386/preserve-none-3.c: Likewise. * gcc.target/i386/preserve-none-4.c: Likewise. * gcc.target/i386/preserve-none-5.c: Likewise. * gcc.target/i386/preserve-none-6.c: Likewise. * gcc.target/i386/preserve-none-7.c: Likewise. * gcc.target/i386/preserve-none-8.c: Likewise. * gcc.target/i386/preserve-none-9.c: Likewise. * gcc.target/i386/preserve-none-10.c: Likewise. * gcc.target/i386/preserve-none-11.c: Likewise. * gcc.target/i386/preserve-none-12.c: Likewise. * gcc.target/i386/preserve-none-13.c: Likewise. * gcc.target/i386/preserve-none-14.c: Likewise. * gcc.target/i386/preserve-none-15.c: Likewise. * gcc.target/i386/preserve-none-16.c: Likewise. * gcc.target/i386/preserve-none-17.c: Likewise. * gcc.target/i386/preserve-none-19.c: Likewise. * gcc.target/i386/preserve-none-19.c: Likewise. * gcc.target/i386/preserve-none-20.c: Likewise. * gcc.target/i386/preserve-none-21.c: Likewise. * gcc.target/i386/preserve-none-22.c: Likewise. * gcc.target/i386/preserve-none-23.c: Likewise. * gcc.target/i386/preserve-none-24.c: Likewise. * gcc.target/i386/preserve-none-25.c: Likewise. * gcc.target/i386/preserve-none-26.c: Likewise. * gcc.target/i386/preserve-none-27.c: Likewise. * gcc.target/i386/preserve-none-28.c: Likewise. * gcc.target/i386/preserve-none-29.c: Likewise. * gcc.target/i386/preserve-none-30a.c: Likewise. * gcc.target/i386/preserve-none-30b.c: Likewise. -- H.J. From e8929476ee4e1499a631d569914e4f0c54881fd9 Mon
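To make the intended use concrete, here is roughly how an interpreter might annotate its byte-code handlers. The attribute name and its effects (no callee-saved register save/restore; integer parameters in r12, r13, r14, r15, rdi and rsi on x86-64) are taken from the patch description; everything else (function names, dispatch scheme) is invented, and of course the attribute only exists on a compiler with this patch applied.

```cpp
// Hypothetical interpreter core; names are invented for illustration.
struct vm_state { const unsigned char *pc; long acc; };

// Handlers that only shuffle byte codes around: no callee-saved register
// saves in their prologues, and parameters arrive in the preserve_none
// register set, so chained tail calls stay cheap.
__attribute__ ((preserve_none)) static void dispatch (vm_state *vm);

__attribute__ ((preserve_none)) static void
op_add (vm_state *vm)
{
  vm->acc += 1;
  vm->pc++;
  dispatch (vm);          // tail call, same convention on both sides
}

__attribute__ ((preserve_none)) static void
dispatch (vm_state *vm)
{
  if (*vm->pc == 0)
    return;               // halt opcode
  op_add (vm);
}

int run (vm_state *vm)
{
  dispatch (vm);          // an ordinary caller must treat all of the
  return (int) vm->acc;   // preserve_none registers as clobbered
}
```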
Re: [PATCH 11/17] tree-vect-stmts.cc: Remove an unused shadowed variable
> Am 25.06.2025 um 16:26 schrieb Martin Jambor : > > Hi, > > when compiling tree-vect-stmts.cc with clang, it emits a warning: > > gcc/tree-vect-stmts.cc:14930:19: warning: unused variable 'mode_iter' > [-Wunused-variable] > > And indeed, there are two mode_iter local variables in function > supportable_indirect_convert_operation and the first one is not used > at all. This patch removes it. > > Bootstrapped and tested on x86_64-linx. OK for master? Ok Richard > Thanks, > > Martin > > > gcc/ChangeLog: > > 2025-06-24 Martin Jambor > >* tree-vect-stmts.cc (supportable_indirect_convert_operation): >Remove an unused shadowed variable. > --- > gcc/tree-vect-stmts.cc | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > index f699d808e68..652c590e553 100644 > --- a/gcc/tree-vect-stmts.cc > +++ b/gcc/tree-vect-stmts.cc > @@ -14927,7 +14927,6 @@ supportable_indirect_convert_operation (code_helper > code, > bool found_mode = false; > scalar_mode lhs_mode = GET_MODE_INNER (TYPE_MODE (vectype_out)); > scalar_mode rhs_mode = GET_MODE_INNER (TYPE_MODE (vectype_in)); > - opt_scalar_mode mode_iter; > tree_code tc1, tc2, code1, code2; > > tree cvt_type = NULL_TREE; > -- > 2.49.0 >
[PATCH] tree-optimization/120808 - SLP build with mixed .FMA/.FMS
The following allows SLP build to succeed when mixing .FMA/.FMS in different lanes like we handle mixed plus/minus. This does not yet address SLP pattern matching to not being able to form a FMADDSUB from this. Bootstrapped and tested on x86_64-unknown-linux-gnu. While the testcases are x86 specific I've kept them in vect/ with the hope that we'd get better general dejagnu target_fma handling... PR tree-optimization/120808 * tree-vectorizer.h (compatible_calls_p): Add flag to indicate a FMA/FMS pair is allowed. * tree-vect-slp.cc (compatible_calls_p): Likewise. (vect_build_slp_tree_1): Allow mixed .FMA/.FMS as two-operator. (vect_build_slp_tree_2): Handle calls in two-operator SLP build. * tree-vect-slp-patterns.cc (compatible_complex_nodes_p): Adjust. * gcc.dg/vect/bb-slp-pr120808.c: New testcase. --- gcc/testsuite/gcc.dg/vect/bb-slp-pr120808.c | 12 + gcc/tree-vect-slp-patterns.cc | 2 +- gcc/tree-vect-slp.cc| 52 ++--- gcc/tree-vectorizer.h | 2 +- 4 files changed, 50 insertions(+), 18 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr120808.c diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr120808.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr120808.c new file mode 100644 index 000..c334d6ad8d3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr120808.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-ffp-contract=on" } */ +/* { dg-additional-options "-mfma" { target { x86_64-*-* i?86-*-* } } } */ + +void f(double x[restrict], double *y, double *z) +{ +x[0] = x[0] * y[0] + z[0]; +x[1] = x[1] * y[1] - z[1]; +} + +/* The following should check for SLP build covering the loads. */ +/* { dg-final { scan-tree-dump "transform load" "slp2" { target { x86_64-*-* i?86-*-* } } } } */ diff --git a/gcc/tree-vect-slp-patterns.cc b/gcc/tree-vect-slp-patterns.cc index c0dff90d9ba..24ae203e6ff 100644 --- a/gcc/tree-vect-slp-patterns.cc +++ b/gcc/tree-vect-slp-patterns.cc @@ -786,7 +786,7 @@ compatible_complex_nodes_p (slp_compat_nodes_map_t *compat_cache, if (is_gimple_call (a_stmt)) { if (!compatible_calls_p (dyn_cast (a_stmt), -dyn_cast (b_stmt))) +dyn_cast (b_stmt), false)) return false; } else if (!is_gimple_assign (a_stmt)) diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc index 9f0cb978a5a..155da099d95 100644 --- a/gcc/tree-vect-slp.cc +++ b/gcc/tree-vect-slp.cc @@ -990,13 +990,18 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap, to be combined into the same SLP group. */ bool -compatible_calls_p (gcall *call1, gcall *call2) +compatible_calls_p (gcall *call1, gcall *call2, bool allow_two_operators) { unsigned int nargs = gimple_call_num_args (call1); if (nargs != gimple_call_num_args (call2)) return false; - if (gimple_call_combined_fn (call1) != gimple_call_combined_fn (call2)) + auto cfn1 = gimple_call_combined_fn (call1); + auto cfn2 = gimple_call_combined_fn (call2); + if (cfn1 != cfn2 + && (!allow_two_operators + || !((cfn1 == CFN_FMA || cfn1 == CFN_FMS) + && (cfn2 == CFN_FMA || cfn2 == CFN_FMS return false; if (gimple_call_internal_p (call1)) @@ -1358,10 +1363,14 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap, || rhs_code != IMAGPART_EXPR) /* Handle mismatches in plus/minus by computing both and merging the results. 
*/ - && !((first_stmt_code == PLUS_EXPR -|| first_stmt_code == MINUS_EXPR) - && (alt_stmt_code == PLUS_EXPR - || alt_stmt_code == MINUS_EXPR) + && !((((first_stmt_code == PLUS_EXPR + || first_stmt_code == MINUS_EXPR) + && (alt_stmt_code == PLUS_EXPR + || alt_stmt_code == MINUS_EXPR)) +|| ((first_stmt_code == CFN_FMA + || first_stmt_code == CFN_FMS) +&& (alt_stmt_code == CFN_FMA +|| alt_stmt_code == CFN_FMS))) && rhs_code == alt_stmt_code) && !(first_stmt_code.is_tree_code () && rhs_code.is_tree_code () @@ -1410,7 +1419,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap, { if (!is_a <gcall *> (stmts[0]->stmt) || !compatible_calls_p (as_a <gcall *> (stmts[0]->stmt), - call_stmt)) + call_stmt, true)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -3059,24 +3068,35 @@ fail: SLP_TREE_CODE (node) = VEC_PERM_EXPR;
[Patch, Fortran, Coarray, PR88076, v1] 4/6 Add a shared memory multi process coarray library.
Hi all, fix incorrect declarations in the libcaf.h header and use the correct printf function when printing a va_list. (The latter is stripped into a separate file by the next patch of this series.) Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline? Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de From b4bdfd44ee3d1658eb67ef1a4cdf0de91b50386a Mon Sep 17 00:00:00 2001 From: Andre Vehreschild Date: Wed, 18 Jun 2025 09:23:32 +0200 Subject: [PATCH 4/6] Fortran: Fix signatures of coarray API and caf_single. The teams argument to some functions was marked as unused in the header. With upcoming caf_shmem this is incorrect, given the mark is repeated in caf_single. libgfortran/ChangeLog: * caf/libcaf.h (_gfortran_caf_failed_images): Team attribute is used now in some libs. (_gfortran_caf_image_status): Same. (_gfortran_caf_stopped_images): Same. * caf/single.c (caf_internal_error): Use correct printf function to handle va_list. --- libgfortran/caf/libcaf.h | 9 +++-- libgfortran/caf/single.c | 2 +- 2 files changed, 4 insertions(+), 7 deletions(-) diff --git a/libgfortran/caf/libcaf.h b/libgfortran/caf/libcaf.h index 7267bc76905..81549f9b980 100644 --- a/libgfortran/caf/libcaf.h +++ b/libgfortran/caf/libcaf.h @@ -175,12 +175,9 @@ void _gfortran_caf_event_post (caf_token_t, size_t, int, int *, char *, size_t); void _gfortran_caf_event_wait (caf_token_t, size_t, int, int *, char *, size_t); void _gfortran_caf_event_query (caf_token_t, size_t, int, int *, int *); -void _gfortran_caf_failed_images (gfc_descriptor_t *, - caf_team_t * __attribute__ ((unused)), int *); -int _gfortran_caf_image_status (int, caf_team_t * __attribute__ ((unused))); -void _gfortran_caf_stopped_images (gfc_descriptor_t *, - caf_team_t * __attribute__ ((unused)), - int *); +void _gfortran_caf_failed_images (gfc_descriptor_t *, caf_team_t *, int *); +int _gfortran_caf_image_status (int, caf_team_t *); +void _gfortran_caf_stopped_images (gfc_descriptor_t *, caf_team_t *, int *); void _gfortran_caf_random_init (bool, bool); diff --git a/libgfortran/caf/single.c b/libgfortran/caf/single.c index 97876fa9d8c..a6576f28260 100644 --- a/libgfortran/caf/single.c +++ b/libgfortran/caf/single.c @@ -129,7 +129,7 @@ caf_internal_error (const char *msg, int *stat, char *errmsg, *stat = 1; if (errmsg_len > 0) { - int len = snprintf (errmsg, errmsg_len, msg, args); + int len = vsnprintf (errmsg, errmsg_len, msg, args); if (len >= 0 && errmsg_len > (size_t) len) memset (&errmsg[len], ' ', errmsg_len - len); } -- 2.49.0
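The single.c hunk is the important one: handing a va_list to plain snprintf formats the va_list object itself rather than the arguments it stands for, so the v-variant has to be used. A stand-alone sketch of the corrected pattern (not the library code itself; the helper name is illustrative):

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Illustrative helper mirroring the caf_internal_error fix: a function
   that already holds a va_list must call vsnprintf, not snprintf.  */
static void
format_errmsg (char *errmsg, size_t errmsg_len, const char *msg, va_list args)
{
  int len = vsnprintf (errmsg, errmsg_len, msg, args);
  /* Blank-pad the remainder, as the quoted code does for the
     fixed-length Fortran character argument.  */
  if (len >= 0 && errmsg_len > (size_t) len)
    memset (&errmsg[len], ' ', errmsg_len - len);
}
```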
Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language
I posted this on the LLVM Discourse forum[1] and got some traction, so I want to get the GCC community's input. (My initial proposal is replicated here.) I had already mentioned this in previous emails in this thread, so it's nothing super new, and there have been some suggested improvements already. Parts of this reference a meeting that took place between the LLVM developers and some non-LLVM developers. The meeting mostly explained the issues regarding the "compromise" from this thread and how it interacts (poorly) with C++, and vice versa. There was a lengthy discussion after this proposal. Please take a look and let me know what you think. -bw [1] https://discourse.llvm.org/t/rfc-bounds-safety-in-c-syntax-compatibility-with-gcc/85885/32?u=void -- I’ve been putting off pushing this proposal, because it is a departure from what Apple has done and added a lot of extra syntax for this feature, but I think it’s appropriate right now. The main issue at play is that C and C++ are two very different languages. The scoping rules are completely different making name resolution not work in one language without jumping through non-obvious hoops. This was made clear in @rapidsna’s presentation last week. Making matters worse is that GCC (and other) compilers perform one pass parsing for C, making forward declarations necessary. The forward declarations, while solving many issues, have their own issues. Other solutions at play require changes to the base languages, which require approval by the standards committee. Even if the full struct was declared before the expression in the attribute was defined, there would still be issues, due to one example from @rapidsna’s presentation [as pointed out by Joseph Jelinek]: typedef int T; struct foo { int T; int U; int * __counted_by_expr(int T; (T)+U) buf; // Addition or cast? }; Given this, I want to propose using functions / static methods for expressions. The function takes one and only one argument: a "this" pointer to the least enclosing non-anonymous struct. The call to the function is generated by the compiler, so no argument the attribute only needs to indicate the function’s name. This avoids the need to add a new __builtin_* or __self element to C. * The function needs to be declared before use in C. (It can be fully defined if no fields within the struct are used.) * The function should be static and marked as pure (and maybe always_inline). * The function in C++ should be private or protected. C example: static size_t calc_counted_by(void *); struct foo { /* ... */ struct bar { int * __counted_by_expr(calc_counted_by) buf; int count; int scale; }; }; enum { OFFSET = 42 }; // The function could be marked with the 'pure' attribute. static size_t __pure calc_counted_by(void *p) { struct bar *ptr = (struct foo *)p; return ptr->count * ptr->scale - OFFSET; } C++ example: struct foo { enum { OFFSET = 42 }; struct bar { int * __counted_by_expr(calc_counted_by) buf; private: static size_t __pure calc_counted_by(struct bar *ptr) { return ptr->count * ptr->scale - OFFSET; } public: int count; int scale; }; }; Pros 1. This uses the current language without any modifications to scoping or requiring feature additions that need to be approved by the standards committee. All compilers should be able to implement them without major modifications. 2. Name lookup is no longer a problem, so there isn’t a need for forward declarations or trying to determine which scope to use in various circumstances. 3. 
In the general case where the full struct is passed into the calculating function, both C and C++ parse the code in the same way. In the C example above, it would need to be modified to this: static size_t __pure calc_counted_by(void *p) { #ifdef __cplusplus foo::bar *ptr = static_cast<foo::bar *>(p); #else struct bar *ptr = (struct bar *)p; #endif return ptr->count * ptr->scale - OFFSET; } This format can be extended to other languages if need be. Cons 1. It's wordy, which may make it unappealing to users. 2. The #ifdef __cplusplus ... #endif usage above is wordy and a bit awkward. 3. Importantly, it's harder for Apple's bounds safety work to analyze the fields used within the expression. 4. Apple and their users already use the current syntax. For (1), that's an unfortunate outcome of this feature. There may be ways to reduce the amount of code that needs to be written, but the above is a good starting place. [Note: Kees came up with a way to avoid the forward declaration of the function---have the compiler generate the forward declaration with a set declaration syntax: e.g. static __pure size_t size_calculation(struct foo *);] For (2), the rule about using the least enclosing non-anonymous struct could be loosened and the whole struct passed in. The user has full control over which fields to use. For (3), it's harder to get the expression because it
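Because the inline examples above are hard to read in this flattened form, here is the proposal's C variant restated as a compilable sketch; __counted_by_expr is the proposed attribute (left in a comment since no compiler implements it yet), the outer struct foo is dropped for brevity, and the cast is written as struct bar:

```c
#include <stddef.h>

/* Forward declaration, required in C so the attribute can name it.  */
static size_t calc_counted_by (void *);

struct bar {
  int *buf;            /* proposed: __counted_by_expr(calc_counted_by) */
  int count;
  int scale;
};

enum { OFFSET = 42 };

/* The compiler would generate the call, passing a pointer to the
   enclosing struct bar.  */
static size_t __attribute__ ((pure))
calc_counted_by (void *p)
{
  struct bar *ptr = (struct bar *) p;
  return ptr->count * ptr->scale - OFFSET;
}
```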
Re: [PATCH 1/4] c++: Add flag to detect underlying representative of bitfield decls
On 5/21/25 10:14 PM, Nathaniel Shead wrote: This patch isn't currently necessary with how I've currently done the follow-up patches, but is needed for avoiding any potential issues in the future with DECL_CONTEXT'ful types getting created in the compiler with no names on the fields. (For instance, this change would make much of r15-7342-gd3627c78be116e unnecessary.) It does take up another flag though in the frontend though. Another possible approach would be to instead do a walk through all the fields first to see if this is the target of a DECL_BIT_FIELD_REPRESENTATIVE; thoughts? Or would you prefer to skip this patch entirely? It seems like the only way to reach such a FIELD_DECL is through DECL_BIT_FIELD_REPRESENTATIVE, so we ought to be able to use that without adding another walk? -- >8 -- Modules streaming needs to handle these differently from other unnamed FIELD_DECLs that are streamed for internal RECORD_DECLs, and there doesn't seem to be a good way to detect this case otherwise. gcc/cp/ChangeLog: * module.cc (trees_out::get_merge_kind): Use new flag. gcc/ChangeLog: * stor-layout.cc (start_bitfield_representative): Mark with DECL_BIT_FIELD_UNDERLYING_REPR_P. * tree-core.h (struct tree_decl_common): Add comment. * tree.h (DECL_BIT_FIELD_UNDERLYING_REPR_P): New accessor. Signed-off-by: Nathaniel Shead --- gcc/cp/module.cc | 4 +--- gcc/stor-layout.cc | 1 + gcc/tree-core.h| 1 + gcc/tree.h | 5 + 4 files changed, 8 insertions(+), 3 deletions(-) diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index 13f8770b7bd..99cbfdbf01d 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -11131,9 +11131,7 @@ trees_out::get_merge_kind (tree decl, depset *dep) return MK_named; } - if (!DECL_NAME (decl) - && !RECORD_OR_UNION_TYPE_P (TREE_TYPE (decl)) - && !DECL_BIT_FIELD_REPRESENTATIVE (decl)) + if (!DECL_NAME (decl) && DECL_BIT_FIELD_UNDERLYING_REPR_P (decl)) { /* The underlying storage unit for a bitfield. We do not need to dedup it, because it's only reachable through diff --git a/gcc/stor-layout.cc b/gcc/stor-layout.cc index 12071c96ca7..1f37a130e24 100644 --- a/gcc/stor-layout.cc +++ b/gcc/stor-layout.cc @@ -2067,6 +2067,7 @@ static tree start_bitfield_representative (tree field) { tree repr = make_node (FIELD_DECL); + DECL_BIT_FIELD_UNDERLYING_REPR_P (repr) = 1; DECL_FIELD_OFFSET (repr) = DECL_FIELD_OFFSET (field); /* Force the representative to begin at a BITS_PER_UNIT aligned boundary - C++ may use tail-padding of a base object to diff --git a/gcc/tree-core.h b/gcc/tree-core.h index bd19c99d326..2e773d7bf83 100644 --- a/gcc/tree-core.h +++ b/gcc/tree-core.h @@ -1911,6 +1911,7 @@ struct GTY(()) tree_decl_common { unsigned decl_read_flag : 1; /* In a VAR_DECL or RESULT_DECL, this is DECL_NONSHAREABLE. */ /* In a PARM_DECL, this is DECL_HIDDEN_STRING_LENGTH. */ + /* In a FIELD_DECL, this is DECL_BIT_FIELD_UNDERLYING_REPR_P. */ unsigned decl_nonshareable_flag : 1; /* DECL_OFFSET_ALIGN, used only for FIELD_DECLs. */ diff --git a/gcc/tree.h b/gcc/tree.h index 99f26177628..0d876234824 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -3085,6 +3085,11 @@ extern void decl_value_expr_insert (tree, tree); #define DECL_BIT_FIELD_REPRESENTATIVE(NODE) \ (FIELD_DECL_CHECK (NODE)->field_decl.qualifier) +/* In a FIELD_DECL of a RECORD_TYPE, this indicates whether the field + is used as the underlying storage unit for a bitfield. 
*/ +#define DECL_BIT_FIELD_UNDERLYING_REPR_P(NODE) \ + (FIELD_DECL_CHECK (NODE)->decl_common.decl_nonshareable_flag) + /* For a FIELD_DECL in a QUAL_UNION_TYPE, records the expression, which if nonzero, indicates that the field occupies the type. */ #define DECL_QUALIFIER(NODE) (FIELD_DECL_CHECK (NODE)->field_decl.qualifier)
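For readers outside the modules code: the FIELD_DECL being flagged here is the unnamed "representative" that stor-layout creates for a run of adjacent bitfields, roughly as in this illustration (the flag itself is compiler-internal; nothing changes in user code):

```c
/* For a struct like this ...  */
struct S {
  unsigned a : 3;
  unsigned b : 7;
  unsigned c : 6;
};
/* ... stor-layout adds one extra unnamed FIELD_DECL describing the
   16-bit storage unit that a, b and c share.  Each of the three named
   bitfield FIELD_DECLs reaches it via DECL_BIT_FIELD_REPRESENTATIVE,
   and the patch marks that representative with the new
   DECL_BIT_FIELD_UNDERLYING_REPR_P flag so modules streaming can
   identify it without relying on the missing DECL_NAME.  */
```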
Re: [PATCH] c++: fix ICE with [[deprecated]] [PR120756]
On 6/25/25 1:28 PM, Marek Polacek wrote: Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/branches? OK. -- >8 -- Here we end up with "error reporting routines re-entered" because resolve_nondeduced_context isn't passing complain to mark_used. PR c++/120756 gcc/cp/ChangeLog: * pt.cc (resolve_nondeduced_context): Pass complain to mark_used. gcc/testsuite/ChangeLog: * g++.dg/warn/deprecated-22.C: New test. --- gcc/cp/pt.cc | 2 +- gcc/testsuite/g++.dg/warn/deprecated-22.C | 13 + 2 files changed, 14 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/warn/deprecated-22.C diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc index deb0106b158..18ad2d07c4f 100644 --- a/gcc/cp/pt.cc +++ b/gcc/cp/pt.cc @@ -24604,7 +24604,7 @@ resolve_nondeduced_context (tree orig_expr, tsubst_flags_t complain) } if (good == 1) { - mark_used (goodfn); + mark_used (goodfn, complain); expr = goodfn; if (baselink) expr = build_baselink (BASELINK_BINFO (baselink), diff --git a/gcc/testsuite/g++.dg/warn/deprecated-22.C b/gcc/testsuite/g++.dg/warn/deprecated-22.C new file mode 100644 index 000..60ee607f717 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/deprecated-22.C @@ -0,0 +1,13 @@ +// PR c++/120756 +// { dg-do compile { target c++11 } } + +struct A { +template [[deprecated]] void foo (); +}; + +template [[deprecated]] auto bar () -> decltype (&A::foo); + +void foo () +{ + bar<0> (); // { dg-warning "deprecated" } +} base-commit: 5aca8510abea6c3fac3336a7445863db07fd4a06
Re: [PATCH] rtl-ssa: Fix test condition for insn_info::has_been_deleted
Christoph Müllner writes: > insn_info::has_been_deleted () is documented to return true if an > instruction is deleted. Such instructions have their `volatile` bit set, > which can be tested via rtx_insn::deleted (). > > The current condition for insn_info::has_been_deleted () is: > * m_rtl is not NULL: this can't happen as no member of insn_info > changes this pointer. Yeah, it's invariant after creation, but it starts off null for some artificial instructions: // Return the underlying RTL insn. This instruction is null if is_phi () // or is_bb_end () are true. The instruction is a basic block note if // is_bb_head () is true. rtx_insn *rtl () const { return m_rtl; } So I think we should keep the null check. (But then is_call and is_jump should check m_rtl is nonnull too -- that's preapproved if you want to do it while you're here.) > * !INSN_P (m_rtl): this will likely fail for rtx_insn objects and > does not test the `volatile` bit. Because of the need to stage multiple simultaneous changes, rtl-ssa first uses set_insn_deleted to convert an insn to a NOTE_INSN_DELETED note, then uses remove_insn to remove the underlying instruction. It doesn't use delete_insn directly. The call to remove_insn is fairly recent; the original code just used set_insn_deleted, but not removing the notes caused trouble for later passes. The test was therefore supposed to be checking whether set_insn_deleted had been called. It should also have checked the note kind though. However, I agree that testing the deleted flag would be better. For that to work, we'd need to set the deleted flag here: if (rtx_insn *rtl = insn->rtl ()) ::remove_insn (rtl); // Remove the underlying RTL insn. as well as calling remove_insn. Alternatively (and better), we could try converting ::remove_insn to ::delete_insn. Thanks, Richard > This patch drops these conditions and calls m_rtl->deleted () instead. > > The impact of this change is minimal as insn_info::has_been_deleted > is only called in insn_info::print_full. > > Bootstrapped and regtested x86_64-linux. > > gcc/ChangeLog: > > * rtl-ssa/insns.h: Fix implementation of has_been_deleted (). > > Signed-off-by: Christoph Müllner > --- > gcc/rtl-ssa/insns.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/rtl-ssa/insns.h b/gcc/rtl-ssa/insns.h > index d89dfc5c3f66..bb3f52efa83a 100644 > --- a/gcc/rtl-ssa/insns.h > +++ b/gcc/rtl-ssa/insns.h > @@ -186,7 +186,7 @@ public: >// Return true if the instruction was a real instruction but has now >// been deleted. In this case the instruction is no longer part of >// the SSA information. > - bool has_been_deleted () const { return m_rtl && !INSN_P (m_rtl); } > + bool has_been_deleted () const { return m_rtl->deleted (); } > >// Return true if the instruction is a debug instruction (and thus >// also a real instruction).
[PATCH v7 0/3] extend "counted_by" attribute to pointer fields of structures
This is the 7th version of the patch set to extend "counted_by" attribute to pointer fields of structures. The C FE parts (patch #1 and #3) of the 5th version have been approved by Joseph already (with a minor typo fix, which is included in this new version); The middle end part (patch #2) of the 6th version was reviewed by Sid and Richard, Sid raised several format issues in testing cases, and Richard raised one issue in tree-object-size.cc. In this 7th version, I fixed all the format issues in testing cases and also the one issue in tree-object-size.cc raised by Richard. The whole patch set has been bootstrapped and regression tested on both aarch64 and x86. Okay for trunk? Thanks a lot. Qing
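For readers who have not followed the earlier revisions, the user-visible change is that counted_by, previously limited to flexible array members, can now also annotate pointer members, so that the tree-object-size.cc machinery mentioned above (e.g. __builtin_dynamic_object_size) can see how many elements the pointer covers. A minimal sketch; the second struct requires a compiler with this series applied:

```c
#include <stddef.h>

/* Existing form: counted_by on a flexible array member.  */
struct fam_obj {
  size_t count;
  int data[] __attribute__ ((counted_by (count)));
};

/* What this series adds: the same annotation on a pointer field,
   telling the compiler that data points to count elements.  */
struct ptr_obj {
  size_t count;
  int *data __attribute__ ((counted_by (count)));
};
```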
Re: [PATCH v2 1/4] RISC-V: Add support for xtheadvector unit-stride segment load/store intrinsics
> From:Kito Cheng > Send Time:2025 Jun. 19 (Thu.) 15:08 > To:yunzezhu; Jeff Law > CC:"gcc-patches" > Subject:Re: [PATCH v2 1/4] RISC-V: Add support for xtheadvector unit-stride > segment load/store intrinsics > > Hi YunZe: > > Generally I am open minded to accept vendor extensions, however this > patch set really introduces too much pattern... > > - NUM_INSN_CODES (defined in insn-codes.h) become 83625 from 48573. (+72%) > - Total line of insn-emit-*.cc becomes 1749362 from 1055750. (+65%) > - Total line of insn-recog-*.cc becomes 1018407 from 670185 (+51%) > > Also I believe that may also increase a lot of build time on native > RISC-V environment, (I didn't measure that yet, but most generated > insn-*.cc files grow a lot). > > So sorry, I have to say no this time. Hi Kito: Thanks for reviewing and apologies for disturbing your work. I'm so sorry I made some mistakes that generated a large number of unnecessary patterns. I removed these unnecessary patterns and modified the insn patterns so that they require fewer patterns in the v3 patches. This should reduce the patterns generated in insn-codes.h and the insn-*.cc files. I tested the v3 patches locally; here is the data compared to the original gcc: - NUM_INSN_CODES becomes 51547 from 48573. (+5.83%) - Total line of insn-emit-*.cc becomes 1113703 from 1055750. (+4.98%) - Total line of insn-recog-*.cc becomes 700017 from 670185 (+3.82%) I hope these patches satisfy the requirements now. Thanks! Best regards, Yunze Zhu
Re: [PATCH v3] x86: Update memcpy/memset inline strategies for -mtune=generic
> Here is the v3 patch. It no longer uses "rep mov/stos". Lili, can you > measure > its performance impact on Intel and AMD cpus? > > The updated generic has > > Update memcpy and memset inline strategies for -mtune=generic: > > 1. Don't align memory. This looks OK to me (recent microarchs seems to be good on handling misaligned accesses in most cases, though we always risk partial memory stalls then). > 2. For known sizes, unroll loop with 4 moves or stores per iteration >without aligning the loop, up to 256 bytes. Main reason why limit was bigger is situation where we know the expected size of the block copied from profile feedback or we have small upper bound. Calling mempcy means spilling all SSE registers to stack and increasing integer regsiter pressure, too, which may be counter productive and I do not think it is caught by the benchmarking done I hacked the following micro-benchmark #include #include #include int width = 1024, height = 1024; char *buffer1; char *buffer2; __attribute__ ((noipa)) void copy_triangle (int width, int from, int to, int start, float slope1, float slope2) { for (int i = from; i < to; i++) { memcpy (buffer1 + start + (int)((i-from) * slope1) + i * width, buffer2 + start + (int)((i-from) * slope1) + i * width, (int)((i-from)*(slope2-slope1))); } } int main() { buffer1 = malloc (width *height); buffer2 = malloc (width *height); for (int i = 0; i < 10; i++) copy_triangle (width, 0, 255, 0, 0, 1); } which copies triangle of given size from buffer1 to buffer2. With profile feedback we know the expected size of block and use the table to inline memcpy. It has two read-only values in xmm registers which needs to be reloaded from memory if libgcall is used. For two values it seems that for triangles of size 255 it is already win to use memcpy, for smaller ones it is better to use inline sequence (can be tested by copiling -O2 wrt -O2 -minline-all-stringops). Of course one can modify the benchmark to use more XMM registers and the tradeoffs will change, but it is hard to guess regiser pessure at the expansion time Situation is also likely different for kernel due to mitigations making memcpy call expensive. I wonder what kind of benefits you see for going from 8k to 256 bytes here? I also wonder if inline sequences can be iproved though. It seems that the offline memcpy for blocks >128 already benefits from doing vector moves... > 3. For unknown sizes, use memcpy/memset. > 4. Since each loop iteration has 4 stores and 8 stores for zeroing with >unroll loop may be needed, change CLEAR_RATIO to 10 so that zeroing >up to 72 bytes are fully unrolled with 9 stores without SSE. I guess it is OK. I sitll think we ought to sovle the code bloat due to repreated 4-byte $0 immediate, but hope we can do that incrementally. > > Use move_by_pieces and store_by_pieces for memcpy and memset epilogues > with the fixed epilogue size to enable overlapping moves and stores. > > gcc/ > > PR target/102294 > PR target/119596 > PR target/119703 > PR target/119704 > * builtins.cc (builtin_memset_gen_str): Make it global. > * builtins.h (builtin_memset_gen_str): New. > * config/i386/i386-expand.cc (expand_cpymem_epilogue): Use > move_by_pieces. > (expand_setmem_epilogue): Use store_by_pieces. > (ix86_expand_set_or_cpymem): Pass val_exp, instead of > vec_promoted_val, to expand_setmem_epilogue. > * config/i386/x86-tune-costs.h (generic_memcpy): Updated. > (generic_memset): Likewise. > (generic_cost): Change CLEAR_RATIO to 10. 
> > gcc/testsuite/ > > PR target/102294 > PR target/119596 > PR target/119703 > PR target/119704 > * gcc.target/i386/auto-init-padding-3.c: Expect XMM stores. > * gcc.target/i386/auto-init-padding-9.c: Expect loop. > * gcc.target/i386/memcpy-strategy-12.c: New test. > * gcc.target/i386/memcpy-strategy-13.c: Likewise. > * gcc.target/i386/memset-strategy-25.c: Likewise. > * gcc.target/i386/memset-strategy-26.c: Likewise. > * gcc.target/i386/memset-strategy-27.c: Likewise. > * gcc.target/i386/memset-strategy-28.c: Likewise. > * gcc.target/i386/memset-strategy-29.c: Likewise. > * gcc.target/i386/memset-strategy-30.c: Likewise. > * gcc.target/i386/memset-strategy-31.c: Likewise. > * gcc.target/i386/mvc17.c: Fail with "rep mov" > * gcc.target/i386/pr111657-1.c: Scan for unrolled loop. Fail > with "rep mov". > * gcc.target/i386/shrink_wrap_1.c: Also pass > -mmemset-strategy=rep_8byte:-1:align. > * gcc.target/i386/sw-1.c: Also pass -mstringop-strategy=rep_byte. > > > -- > H.J. > From bcd7245314d3ba4eb55e9ea2bc0b7d165834f5b6 Mon Sep 17 00:00:00 2001 > From: "H.J. Lu" > Date: Thu, 18 Mar 2021 18:43:10 -0700 > Subject: [PATCH v3] x86: Update memcpy/memset inline strategies for > -mtune=generic > > Update memcpy and memset inline strategies for -mtune=generic: > > 1. Don't align memory. > 2. For known sizes, unroll loop with 4 moves
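The micro-benchmark quoted in this exchange lost its #include lines to the mail formatting; a self-contained reconstruction (the three header names are guesses, the rest follows the quoted code) is:

```c
/* Reconstruction of the quoted micro-benchmark: copies a triangle of
   each scan line from buffer2 to buffer1 so that memcpy sees a varying
   but profile-predictable size.  */
#include <string.h>
#include <stdlib.h>
#include <stdio.h>

int width = 1024, height = 1024;
char *buffer1;
char *buffer2;

__attribute__ ((noipa)) void
copy_triangle (int width, int from, int to, int start, float slope1, float slope2)
{
  for (int i = from; i < to; i++)
    memcpy (buffer1 + start + (int)((i - from) * slope1) + i * width,
            buffer2 + start + (int)((i - from) * slope1) + i * width,
            (int)((i - from) * (slope2 - slope1)));
}

int
main ()
{
  buffer1 = malloc (width * height);
  buffer2 = malloc (width * height);
  for (int i = 0; i < 10; i++)
    copy_triangle (width, 0, 255, 0, 0, 1);
}
```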
[Patch, Fortran, Coarray, PR88076, v1] 2/6 Add a shared memory multi process coarray library.
Hi all, this patch fixes handling of optional arguments to coarray routines. Again I stumbled over this while implementing caf_shmem. I did not find a ticket either. Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline? Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de From 0b2f1d072d2131e341628648df20ebedefb5c5d1 Mon Sep 17 00:00:00 2001 From: Andre Vehreschild Date: Wed, 18 Jun 2025 09:21:16 +0200 Subject: [PATCH 2/6] Fortran: Small fixes of coarray routines handling and code gen. gcc/fortran/ChangeLog: * check.cc (gfc_check_image_status): Fix argument index of team= argument for correct error message. * trans-intrinsic.cc (conv_intrinsic_image_status): Team= argument is optional and is a pointer to the team handle. * trans-stmt.cc (gfc_trans_sync): Make images argument also a dereferencable pointer. But treat errmsg as a pointer to a char array like in all other functions. gcc/testsuite/ChangeLog: * gfortran.dg/coarray_sync_memory.f90: Adapt grep pattern for msg being only &msg. --- gcc/fortran/check.cc | 2 +- gcc/fortran/trans-intrinsic.cc| 6 +- gcc/fortran/trans-stmt.cc | 7 +-- gcc/testsuite/gfortran.dg/coarray_sync_memory.f90 | 4 ++-- 4 files changed, 13 insertions(+), 6 deletions(-) diff --git a/gcc/fortran/check.cc b/gcc/fortran/check.cc index a4040cae53a..3446c88b501 100644 --- a/gcc/fortran/check.cc +++ b/gcc/fortran/check.cc @@ -1835,7 +1835,7 @@ gfc_check_image_status (gfc_expr *image, gfc_expr *team) || !positive_check (0, image)) return false; - return !team || (scalar_check (team, 0) && team_type_check (team, 0)); + return !team || (scalar_check (team, 1) && team_type_check (team, 1)); } diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc index fce5ee28de8..03007f1d244 100644 --- a/gcc/fortran/trans-intrinsic.cc +++ b/gcc/fortran/trans-intrinsic.cc @@ -2073,9 +2073,13 @@ conv_intrinsic_image_status (gfc_se *se, gfc_expr *expr) GFC_STAT_STOPPED_IMAGE)); } else if (flag_coarray == GFC_FCOARRAY_LIB) +/* The team is optional and therefore needs to be a pointer to the opaque + pointer. */ tmp = build_call_expr_loc (input_location, gfor_fndecl_caf_image_status, 2, args[0], - num_args < 2 ? null_pointer_node : args[1]); + num_args < 2 + ? 
null_pointer_node + : gfc_build_addr_expr (NULL_TREE, args[1])); else gcc_unreachable (); diff --git a/gcc/fortran/trans-stmt.cc b/gcc/fortran/trans-stmt.cc index 487b7687ef1..be6f69c0d1f 100644 --- a/gcc/fortran/trans-stmt.cc +++ b/gcc/fortran/trans-stmt.cc @@ -1292,7 +1292,8 @@ gfc_trans_sync (gfc_code *code, gfc_exec_op type) { gfc_init_se (&argse, NULL); gfc_conv_expr_val (&argse, code->expr1); - images = argse.expr; + images = gfc_trans_force_lval (&argse.pre, argse.expr); + gfc_add_block_to_block (&se.pre, &argse.pre); } if (code->expr2) @@ -1302,6 +1303,7 @@ gfc_trans_sync (gfc_code *code, gfc_exec_op type) gfc_init_se (&argse, NULL); gfc_conv_expr_val (&argse, code->expr2); stat = argse.expr; + gfc_add_block_to_block (&se.pre, &argse.pre); } else stat = null_pointer_node; @@ -1314,8 +1316,9 @@ gfc_trans_sync (gfc_code *code, gfc_exec_op type) argse.want_pointer = 1; gfc_conv_expr (&argse, code->expr3); gfc_conv_string_parameter (&argse); - errmsg = gfc_build_addr_expr (NULL, argse.expr); + errmsg = argse.expr; errmsglen = fold_convert (size_type_node, argse.string_length); + gfc_add_block_to_block (&se.pre, &argse.pre); } else if (flag_coarray == GFC_FCOARRAY_LIB) { diff --git a/gcc/testsuite/gfortran.dg/coarray_sync_memory.f90 b/gcc/testsuite/gfortran.dg/coarray_sync_memory.f90 index c4e660b8cf7..0030d91257d 100644 --- a/gcc/testsuite/gfortran.dg/coarray_sync_memory.f90 +++ b/gcc/testsuite/gfortran.dg/coarray_sync_memory.f90 @@ -14,5 +14,5 @@ end ! { dg-final { scan-tree-dump-times "_gfortran_caf_sync_memory \\(0B, 0B, 0\\);" 1 "original" } } ! { dg-final { scan-tree-dump-times "_gfortran_caf_sync_memory \\(&stat, 0B, 0\\);" 1 "original" } } -! { dg-final { scan-tree-dump-times "_gfortran_caf_sync_memory \\(0B, &&msg, 42\\);" 1 "original" } } -! { dg-final { scan-tree-dump-times "_gfortran_caf_sync_memory \\(&stat, &&msg, 42\\);" 1 "original" } } +! { dg-final { scan-tree-dump-times "_gfortran_caf_sync_memory \\(0B, &msg, 42\\);" 1 "original" } } +! { dg-final { scan-tree-dump-times "_gfortran_caf_sync_memory \\(&stat, &msg, 42\\);" 1 "original" } } -- 2.49.0
Re: Do not drop discriminator when inlining
> > What seems to be common now is profile breakage around loops that have > > been fully unrolled or vectorized, which is a bit understandable, though I > > wonder if we can improve here. I think we can fix the problem where the profile > > of loop header stmts is partly or fully lost (which seems to be the main > > issue now that prevents loop optimization, since then loop headers look > > cold). I suppose this can be fixed by making sure the debug statement > > is duplicated into the loop variants. > > There's Alex's series as well waiting on review which fixes profile > information with early-exit (PR117790): > https://inbox.sourceware.org/gcc-patches/adctfxjzqewre...@arm.com/ I know of it and I was replying to the question about the inconsistent profile handling this week too. I do apologize for taking so long - I thought this was already approved, but it got stuck on that special case. Alex, is there something else I should look into? I over-planned last semester but should be on a more regular schedule again. Profile updating patches are really welcome. It is a bit of an independent issue. Alex's profile updating solves the "forward" problem: you know the profile before vectorization and you need to turn it into a profile after vectorization. Auto-profile works in the reverse direction. We have sampled execution counts of individual (real or debug) statements after the optimizations done to the train run. Now we need to produce a CFG profile for the feedback build, whose CFG is not optimized yet. This is kind of a fun problem by itself and can be useful for detecting situations where we forget to update debug statements correctly. Honza > > sam
[PATCH] ivopts: Change constant_multiple_of to expand aff nodes.
Hi all, This is a small change to ivopts to expand SSA variables enabling ivopts to correctly work out when an address IV step is set to be a multiple on index step in the loop header (ie, not constant, not calculated each loop.) Seems like this might have compile speed costs that need to be considered, but I believe should be worth it. This is also required for some upcoming work for vectorization of VLA loops with iteration data dependencies. Bootstrapped and reg tested on aarch64-linux-gnu and x86_64-unknown-linux-gnu. Thanks, Alfie -- >8 -- This changes the calls to tree_to_aff_combination in constant_multiple_of to tree_to_aff_combination_expand along with associated plumbing of ivopts_data and required cache. This improves cases such as: ```c void f(int *p1, int *p2, unsigned long step, unsigned long end, svbool_t pg) { for (unsigned long i = 0; i < end; i += step) { svst1(pg, p1, svld1_s32(pg, p2)); p1 += step; p2 += step; } } ``` Where ivopts previously didn't expand the SSA variables for the step increements and so lacked the ability to group all the IV's and ended up with: ``` f: cbz x3, .L1 mov x4, 0 .L3: ld1wz31.s, p0/z, [x1] add x4, x4, x2 st1wz31.s, p0, [x0] add x1, x1, x2, lsl 2 add x0, x0, x2, lsl 2 cmp x3, x4 bhi .L3 .L1: ret ``` After this change we end up with: ``` f: cbz x3, .L1 mov x4, 0 .L3: ld1wz31.s, p0/z, [x1, x4, lsl 2] st1wz31.s, p0, [x0, x4, lsl 2] add x4, x4, x2 cmp x3, x4 bhi .L3 .L1: ret ``` gcc/ChangeLog: * tree-ssa-loop-ivopts.cc (constant_multiple_of): Change tree_to_aff_combination to tree_to_aff_combination_expand and add parameter to take ivopts_data. (get_computation_aff_1): Change parameters and calls to include ivopts_data. (get_computation_aff): Ditto. (get_computation_at) Ditto.: (get_debug_computation_at) Ditto.: (get_computation_cost) Ditto.: (rewrite_use_nonlinear_expr) Ditto.: (rewrite_use_address) Ditto.: (rewrite_use_compare) Ditto.: (remove_unused_ivs) Ditto.: gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/adr_7.c: New test. --- gcc/testsuite/gcc.target/aarch64/sve/adr_7.c | 19 ++ gcc/tree-ssa-loop-ivopts.cc | 65 +++- 2 files changed, 54 insertions(+), 30 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/adr_7.c diff --git a/gcc/testsuite/gcc.target/aarch64/sve/adr_7.c b/gcc/testsuite/gcc.target/aarch64/sve/adr_7.c new file mode 100644 index 000..61e23bbf182 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/adr_7.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -ftree-vectorize" } */ + +#include + +void f(int *p1, int *p2, unsigned long step, unsigned long end, svbool_t pg) { +for (unsigned long i = 0; i < end; i += step) { +svst1(pg, p1, svld1_s32(pg, p2)); +p1 += step; +p2 += step; +} +} + +/* { dg-final { scan-assembler-not {\tld1w\tz[0-9]+\.d, p[0-9]+/z\[x[0-9]+\.d\]} } } */ +/* { dg-final { scan-assembler-not {\tst1w\tz[0-9]+\.d, p[0-9]+/z\[x[0-9]+\.d\]} } } */ + +/* { dg-final { scan-assembler-times {\tadd\tx[0-9]+, x[0-9]+, x[0-9]+} 1 } } */ +/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-9]+/z, \[x[0-9]+, x[0-9]+, lsl 2\]} 1 } } */ +/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-9]+, \[x[0-9]+, x[0-9]+, lsl 2\]} 1 } } */ diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc index 8a6726f1988..544a946ff89 100644 --- a/gcc/tree-ssa-loop-ivopts.cc +++ b/gcc/tree-ssa-loop-ivopts.cc @@ -2117,11 +2117,15 @@ idx_record_use (tree base, tree *idx, signedness of TOP and BOT. 
*/ static bool -constant_multiple_of (tree top, tree bot, widest_int *mul) +constant_multiple_of (tree top, tree bot, widest_int *mul, + struct ivopts_data *data) { aff_tree aff_top, aff_bot; - tree_to_aff_combination (top, TREE_TYPE (top), &aff_top); - tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot); + tree_to_aff_combination_expand (top, TREE_TYPE (top), &aff_top, + &data->name_expansion_cache); + tree_to_aff_combination_expand (bot, TREE_TYPE (bot), &aff_bot, + &data->name_expansion_cache); + poly_widest_int poly_mul; if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul) && poly_mul.is_constant (mul)) @@ -3945,13 +3949,14 @@ determine_common_wider_type (tree *a, tree *b) } /* Determines the expression by that USE is expressed from induction variable - CAND at statement AT in LOOP. The expression is stored in two parts in a - decomposed form. The invariant part is stored in AFF_INV; while variant -
[PATCH 04/17] ranger: Mark several member functions as final override
Hi, When GCC is built with clang, it emits warnings that several member functions of various ranger classes override a virtual function of an ancestor but are not marked with the override keyword. After inspecting the cases, I found that all these classes had other member functions marked as final override, so I added the final keyword everywhere too. In some cases other such overrides were not explicitly marked as virtual, which made formatting easier. For that reason and also for consistency, in such cases I removed the virtual keyword from the functions I marked as final override too. Bootstrapped and tested on x86_64-linx. OK for master? Alternatively, as with all of these clang warning issues, I'm perfectly happy to add an entry to contrib/filter-clang-warnings.py to ignore the warnings instead. Thanks, Martin gcc/ChangeLog: 2025-06-24 Martin Jambor * range-op-mixed.h (class operator_plus): Mark member function overflow_free_p as final override. (class operator_minus): Likewise. (class operator_mult): Likewise. * range-op-ptr.cc (class pointer_plus_operator): Mark member function lhs_op1_relation as final override. * range-op.cc (class operator_div::): Mark member functions op2_range and update_bitmask as final override. (class operator_logical_and): Mark member functions fold_range, op1_range and op2_range as final override. Remove unnecessary virtual. (class operator_logical_or): Likewise. (class operator_logical_not): Mark member functions fold_range and op1_range as final override. Remove unnecessary virtual. formatting easier. (class operator_absu): Mark member functions wi_fold as final override. --- gcc/range-op-mixed.h | 12 gcc/range-op-ptr.cc | 2 +- gcc/range-op.cc | 72 +++- 3 files changed, 44 insertions(+), 42 deletions(-) diff --git a/gcc/range-op-mixed.h b/gcc/range-op-mixed.h index f8f18306904..567b0cdd31b 100644 --- a/gcc/range-op-mixed.h +++ b/gcc/range-op-mixed.h @@ -558,8 +558,8 @@ public: void update_bitmask (irange &r, const irange &lh, const irange &rh) const final override; - virtual bool overflow_free_p (const irange &lh, const irange &rh, - relation_trio = TRIO_VARYING) const; + bool overflow_free_p (const irange &lh, const irange &rh, + relation_trio = TRIO_VARYING) const final override; // Check compatibility of all operands. bool operand_check_p (tree t1, tree t2, tree t3) const final override { return range_compatible_p (t1, t2) && range_compatible_p (t1, t3); } @@ -634,8 +634,8 @@ public: void update_bitmask (irange &r, const irange &lh, const irange &rh) const final override; - virtual bool overflow_free_p (const irange &lh, const irange &rh, - relation_trio = TRIO_VARYING) const; + bool overflow_free_p (const irange &lh, const irange &rh, + relation_trio = TRIO_VARYING) const final override; // Check compatibility of all operands. bool operand_check_p (tree t1, tree t2, tree t3) const final override { return range_compatible_p (t1, t2) && range_compatible_p (t1, t3); } @@ -720,8 +720,8 @@ public: const REAL_VALUE_TYPE &lh_lb, const REAL_VALUE_TYPE &lh_ub, const REAL_VALUE_TYPE &rh_lb, const REAL_VALUE_TYPE &rh_ub, relation_kind kind) const final override; - virtual bool overflow_free_p (const irange &lh, const irange &rh, - relation_trio = TRIO_VARYING) const; + bool overflow_free_p (const irange &lh, const irange &rh, + relation_trio = TRIO_VARYING) const final override; // Check compatibility of all operands. 
bool operand_check_p (tree t1, tree t2, tree t3) const final override { return range_compatible_p (t1, t2) && range_compatible_p (t1, t3); } diff --git a/gcc/range-op-ptr.cc b/gcc/range-op-ptr.cc index 6aadc9cf2c9..e0e21ad1b2a 100644 --- a/gcc/range-op-ptr.cc +++ b/gcc/range-op-ptr.cc @@ -315,7 +315,7 @@ public: virtual relation_kind lhs_op1_relation (const prange &lhs, const prange &op1, const irange &op2, - relation_kind) const; + relation_kind) const final override; void update_bitmask (prange &r, const prange &lh, const irange &rh) const { update_known_bitmask (r, POINTER_PLUS_EXPR, lh, rh); } } op_pointer_plus; diff --git a/gcc/range-op.cc b/gcc/range-op.cc index 0a3f0b6b56c..1f91066a44e 100644 --- a/gcc/range-op.cc +++ b/gcc/range-op.cc @@ -2455,7 +2455,7 @@ class operator_div : public cross_product_operator public: operator_div (tree_code div_kind) { m_code = div_kind; } bo
Re: [Patch, Fortran, Coarray, PR88076, v1] 0/6 Add a shared memory multi process coarray library.
On 6/24/25 11:49 PM, Andre Vehreschild wrote: Hi Jerry, thank you very much. Just try it. I can only imagine that Paul had a somehow corrupted build directory or left overs from some previous build. I am still wondering, that I got no automated mail from the build hosts, but I can imagine, that they get issues with a series of patches, that build upon each other. Just try it. The more feedback, the better. Regards, Andre On Tue, 24 Jun 2025 11:07:23 -0700 Jerry D wrote: On 6/24/25 6:09 AM, Andre Vehreschild wrote: Hi all, this series of patches (six in total) adds a new coarray backend library to libgfortran. The library uses shared memory and processes to implement running multiple images on the same node. The work is based on work started by Thomas and Nicolas Koenig. No changes to the gfortran compile part are required for this. --- snip --- Hi Andre, Thank you for this work. I have been wanting this functionality for several years! I will begin reviewing as best I can. I did see Paul's initial comment so your feedback on that would be appreciated. Best regards, Jerry I was able to apply the patches without any issues. I did see some trailing white space in a few places. In running the testsuite the test lock_1.f90 test fails, unable to link to the new library. After some brief investigation, it appears the the 64-bit version of the new library is not created or installed. I did find the 32-bit version. So something not right in the make mechanisms. Looking ahead a bit I was wondering if one could enable co-array if co-array syntax is seen at the parsing phase of the compiler, if no --fcoarray= has been seen, default it to 'single' and issue a NOTE to the user "-fcoarray=single enabled, use -fcoarray=[none, shmem, lib] to override" Regards, Jerry
[PATCH] c++: Implement C++26 P3618R0 - Allow attaching main to the global module [PR120773]
Hi! The following patch implements the P3618R0 paper by tweaking pedwarn condition, adjusting pedwarn wording, adjusting one testcase and adding 4 new ones. The paper was voted in as DR, so it isn't guarded on C++ version. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2025-06-24 Jakub Jelinek PR c++/120773 * decl.cc (grokfndecl): Implement C++26 P3618R0 - Allow attaching main to the global module. Only pedwarn for current_lang_name other than lang_name_cplusplus and adjust pedwarn wording. * g++.dg/parse/linkage5.C: Don't expect error on extern "C++" int main ();. * g++.dg/parse/linkage7.C: New test. * g++.dg/parse/linkage8.C: New test. * g++.dg/modules/main-2.C: New test. * g++.dg/modules/main-3.C: New test. --- gcc/cp/decl.cc.jj 2025-06-19 08:55:04.408676724 +0200 +++ gcc/cp/decl.cc 2025-06-23 17:47:13.942011687 +0200 @@ -11326,9 +11326,9 @@ grokfndecl (tree ctype, "cannot declare %<::main%> to be %qs", "consteval"); if (!publicp) error_at (location, "cannot declare %<::main%> to be static"); - if (current_lang_depth () != 0) + if (current_lang_name != lang_name_cplusplus) pedwarn (location, OPT_Wpedantic, "cannot declare %<::main%> with a" -" linkage specification"); +" linkage specification other than %<\"C++\"%>"); if (module_attach_p ()) error_at (location, "cannot attach %<::main%> to a named module"); inlinep = 0; --- gcc/testsuite/g++.dg/parse/linkage5.C.jj2024-05-22 09:11:46.979234663 +0200 +++ gcc/testsuite/g++.dg/parse/linkage5.C 2025-06-23 18:00:38.067742494 +0200 @@ -1,5 +1,6 @@ // { dg-do compile } -// The main function shall not be declared with a linkage-specification. +// The main function shall not be declared with a linkage-specification +// other than "C++". extern "C" { int main(); // { dg-error "linkage" } @@ -9,6 +10,6 @@ namespace foo { extern "C" int main(); // { dg-error "linkage" } } -extern "C++" int main(); // { dg-error "linkage" } +extern "C++" int main(); extern "C" struct S { int main(); }; // OK --- gcc/testsuite/g++.dg/parse/linkage7.C.jj2025-06-23 18:01:17.622237056 +0200 +++ gcc/testsuite/g++.dg/parse/linkage7.C 2025-06-23 18:01:32.385048426 +0200 @@ -0,0 +1,7 @@ +// { dg-do compile } +// The main function shall not be declared with a linkage-specification +// other than "C++". + +extern "C++" { + int main(); +} --- gcc/testsuite/g++.dg/parse/linkage8.C.jj2025-06-23 18:01:39.830953283 +0200 +++ gcc/testsuite/g++.dg/parse/linkage8.C 2025-06-23 18:01:57.657725492 +0200 @@ -0,0 +1,5 @@ +// { dg-do compile } +// The main function shall not be declared with a linkage-specification +// other than "C++". + +extern "C" int main(); // { dg-error "linkage" } --- gcc/testsuite/g++.dg/modules/main-2.C.jj2025-06-23 18:25:17.058941644 +0200 +++ gcc/testsuite/g++.dg/modules/main-2.C 2025-06-23 18:26:11.416253264 +0200 @@ -0,0 +1,4 @@ +// { dg-additional-options "-fmodules" } + +export module M; +extern "C++" int main() {} --- gcc/testsuite/g++.dg/modules/main-3.C.jj2025-06-23 18:26:20.393139580 +0200 +++ gcc/testsuite/g++.dg/modules/main-3.C 2025-06-23 18:26:33.190977509 +0200 @@ -0,0 +1,7 @@ +// { dg-additional-options "-fmodules" } + +export module M; +extern "C++" { + int main() {} +} + Jakub
Re: [Fortran, Patch, v1] 2/(3) Stop spending memory in coarray single mode executables.
Am 25.06.25 um 13:42 schrieb Andre Vehreschild: Hi, attached patch prevents generation of a token component in derived types, when -fcoarray=single is used. Generating the token only wastes memory. It is never even initialized nor accessed. Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline? This is OK. Thanks for the patch! Harald Regards, Andre
Re: [Fortran, Patch, PR120711, v1] 1/(3) Fix out of bounds access in cleanup of array constructor
Am 25.06.25 um 13:39 schrieb Andre Vehreschild: Hi all, attached patch fixes an out of bounds access in the clean up code of a concatenating array constructor. A fragment like list = [ list, something() ] lead to clean up using an offset (of the list array) that was manipulated in the loop copying the existing array elements and at the end pointing to one element past the list (after the concatenation). This fixes a 15-regression. Releases prior to 15 do not have the out of bounds access in the (non existing) clean up code. The have a memory leak instead. Regtested ok on x86_64-pc-linux-gnu / F41. Ok for mainline? This looks good to me. Given the severity of the bug, do you plan to backport to 15-branch? Thanks for the patch! Harald The subject says, that there will be 3 patches. Only this one fixes the bug. The other fixes I found while hunting this issue and because they play in the general same area, I don't want to loose them. I therefore publish them in this context. Regards, Andre
[PATCH] RISC-V: update prepare_ternary_operands to handle the vector-scalar case [PR120828]
This is a followup to 92e1893e0 "RISC-V: Add patterns for vector-scalar multiply-(subtract-)accumulate" that caused an ICE in some cases where the mult operands were wrongly swapped. This patch ensures that operands are not swapped in the vector-scalar case. PR target/120828 gcc/ChangeLog: * config/riscv/riscv-v.cc (prepare_ternary_operands): Handle the vector-scalar case. --- gcc/config/riscv/riscv-v.cc | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git gcc/config/riscv/riscv-v.cc gcc/config/riscv/riscv-v.cc index 45dd9256d..a3d704e81 100644 --- gcc/config/riscv/riscv-v.cc +++ gcc/config/riscv/riscv-v.cc @@ -4723,7 +4723,7 @@ prepare_ternary_operands (rtx *ops) ops[4], ops[1], ops[6], ops[7], ops[9])); ops[5] = ops[4] = ops[0]; } - else + else if (VECTOR_MODE_P (GET_MODE (ops[2]))) { /* Swap the multiplication ops if the fallback value is the second of the two. */ @@ -4733,8 +4733,10 @@ prepare_ternary_operands (rtx *ops) /* TODO: ??? Maybe we could support splitting FMA (a, 4, b) into PLUS (ASHIFT (a, 2), b) according to uarchs. */ } - gcc_assert (rtx_equal_p (ops[5], RVV_VUNDEF (mode)) - || rtx_equal_p (ops[5], ops[2]) || rtx_equal_p (ops[5], ops[4])); + gcc_assert ( +rtx_equal_p (ops[5], RVV_VUNDEF (mode)) || rtx_equal_p (ops[5], ops[2]) +|| (!VECTOR_MODE_P (GET_MODE (ops[2])) && rtx_equal_p (ops[5], ops[3])) +|| rtx_equal_p (ops[5], ops[4])); } /* Expand VEC_MASK_LEN_{LOAD_LANES,STORE_LANES}. */ -- 2.39.5
[PATCH v7 9/9] AArch64: make rules for CBZ/TBZ higher priority
Move the rules for CBZ/TBZ to be above the rules for CBB/CBH/CB. We want them to have higher priority because they can express larger displacements. gcc/ChangeLog: * config/aarch64/aarch64.md (aarch64_cbz1): Move above rules for CBB/CBH/CB. (*aarch64_tbz1): Likewise. gcc/testsuite/ChangeLog: * gcc.target/aarch64/cmpbr.c: Update tests. --- gcc/config/aarch64/aarch64.md| 163 --- gcc/testsuite/gcc.target/aarch64/cmpbr.c | 28 ++-- 2 files changed, 102 insertions(+), 89 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 32e0f739ae5..fc1cbbeaa4e 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -728,6 +728,19 @@ (define_constants ;; Conditional jumps ;; --- +;; The order of the rules below is important. +;; Higher priority rules are preferred because they can express larger +;; displacements. +;; 1) EQ/NE comparisons against zero are handled by CBZ/CBNZ. +;; 2) LT/GE comparisons against zero are handled by TBZ/TBNZ. +;; 3) When the CMPBR extension is enabled: +;; a) Comparisons between two registers are handled by +;; CBB/CBH/CB. +;; b) Comparisons between a GP register and an in range immediate are +;; handled by CB (immediate). +;; 4) Otherwise, emit a CMP+B sequence. +;; --- + (define_expand "cbranch4" [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" [(match_operand:GPI 1 "register_operand") @@ -738,7 +751,7 @@ (define_expand "cbranch4" { if (TARGET_CMPBR && aarch64_cb_rhs (GET_CODE (operands[0]), operands[2])) { - /* Fall-through to `aarch64_cb`. */ + /* The branch is supported natively. */ } else { @@ -784,6 +797,80 @@ (define_expand "cbranchcc4" "" ) +;; For an EQ/NE comparison against zero, emit `CBZ`/`CBNZ` +(define_insn "aarch64_cbz1" + [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r") + (const_int 0)) + (label_ref (match_operand 1)) + (pc)))] + "!aarch64_track_speculation" + { +if (get_attr_length (insn) == 8) + return aarch64_gen_far_branch (operands, 1, "Lcb", "\\t%0, "); +else + return "\\t%0, %l1"; + } + [(set_attr "type" "branch") + (set (attr "length") + (if_then_else (and (ge (minus (match_dup 1) (pc)) + (const_int BRANCH_LEN_N_1MiB)) + (lt (minus (match_dup 1) (pc)) + (const_int BRANCH_LEN_P_1MiB))) + (const_int 4) + (const_int 8))) + (set (attr "far_branch") + (if_then_else (and (ge (minus (match_dup 2) (pc)) + (const_int BRANCH_LEN_N_1MiB)) + (lt (minus (match_dup 2) (pc)) + (const_int BRANCH_LEN_P_1MiB))) + (const_string "no") + (const_string "yes")))] +) + +;; For an LT/GE comparison against zero, emit `TBZ`/`TBNZ` +(define_insn "*aarch64_tbz1" + [(set (pc) (if_then_else (LTGE (match_operand:ALLI 0 "register_operand" "r") +(const_int 0)) + (label_ref (match_operand 1)) + (pc))) + (clobber (reg:CC CC_REGNUM))] + "!aarch64_track_speculation" + { +if (get_attr_length (insn) == 8) + { + if (get_attr_far_branch (insn) == FAR_BRANCH_YES) + return aarch64_gen_far_branch (operands, 1, "Ltb", +"\\t%0, , "); + else + { + char buf[64]; + uint64_t val = ((uint64_t) 1) + << (GET_MODE_SIZE (mode) * BITS_PER_UNIT - 1); + sprintf (buf, "tst\t%%0, %" PRId64, val); + output_asm_insn (buf, operands); + return "\t%l1"; + } + } +else + return "\t%0, , %l1"; + } + [(set_attr "type" "branch") + (set (attr "length") + (if_then_else (and (ge (minus (match_dup 1) (pc)) + (const_int BRANCH_LEN_N_32KiB)) + (lt (minus (match_dup 1) (pc)) + (const_int BRANCH_LEN_P_32KiB))) + (const_int 4) + (const_int 8))) + (set (attr "far_branch") + (if_then_else (and (ge (minus (match_dup 1) 
(pc)) + (const_int BRANCH_LEN_N_1MiB)) + (lt (minus (match_dup 1) (pc)) + (const_int BRANCH_LEN_P_1MiB))) + (const_string "no") + (const_string "yes")))] +) + ;; Emit a `CB (register)` or `CB (immediate)` instruction. ;; The immediate range depends on the comparison code.
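In source terms, the rule ordering corresponds roughly to comparisons like these (a hand-written illustration, not from the patch or its tests; the form actually chosen also depends on branch range and on whether the CMPBR extension is available):

```c
void g (void);

void f (long x, long y)
{
  if (x == 0)    /* rule 1: CBZ/CBNZ against zero */
    g ();
  if (x < 0)     /* rule 2: TBZ/TBNZ on the sign bit */
    g ();
  if (x == y)    /* rule 3a: CB register form with CMPBR, else CMP + B.EQ */
    g ();
  if (x > 42)    /* rule 3b: CB immediate form with CMPBR, else CMP + B.GT */
    g ();
}
```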
[PATCH v2 2/2] middle-end: Enable masked load with non-constant offset
The function `vect_check_gather_scatter` requires the `base` of the load to be loop-invariant and the `off` (offset) not to be loop-invariant. When faced with a scenario where `base` is not loop-invariant, instead of giving up immediately we can try swapping `base` and `off`, if `off` is actually loop-invariant. Previously, the swap was only done if `off` was the constant zero (and so trivially loop-invariant). This is too conservative: we can still perform the swap if `off` is a more complex but still loop-invariant expression, such as a variable defined outside of the loop.

This allows loops like the function below to be vectorised, if the target has masked loads and sufficiently large vector registers (e.g. `-march=armv8-a+sve -msve-vector-bits=128`):

```c
typedef struct Array {
  int elems[3];
} Array;

int loop(Array **pp, int len, int idx) {
  int nRet = 0;
  for (int i = 0; i < len; i++) {
    Array *p = pp[i];
    if (p) {
      nRet += p->elems[idx];
    }
  }
  return nRet;
}
```

gcc/ChangeLog:

	* tree-vect-data-refs.cc (vect_check_gather_scatter): Swap `base`
	and `off` in more scenarios.  Also assert at the end of the function
	that `base` is loop-invariant and `off` is not.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/mask_load_2.c: Update tests.
---
 .../gcc.target/aarch64/sve/mask_load_2.c |  4 ++--
 gcc/tree-vect-data-refs.cc               | 26 +++++++++----------------
 2 files changed, 13 insertions(+), 17 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/mask_load_2.c b/gcc/testsuite/gcc.target/aarch64/sve/mask_load_2.c
index 38fcf4f7206..66d95101a14 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/mask_load_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/mask_load_2.c
@@ -19,5 +19,5 @@ int loop(Array **pp, int len, int idx) {
   return nRet;
 }
 
-// { dg-final { scan-assembler-times {ld1w\tz[0-9]+\.d, p[0-7]/z} 0 } }
-// { dg-final { scan-assembler-times {add\tz[0-9]+\.s, p[0-7]/m} 0 } }
+// { dg-final { scan-assembler-times {ld1w\tz[0-9]+\.d, p[0-7]/z} 1 } }
+// { dg-final { scan-assembler-times {add\tz[0-9]+\.s, p[0-7]/m} 1 } }
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index ee040eb9888..ea8536ec262 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4659,26 +4659,19 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
   if (off == NULL_TREE)
     off = size_zero_node;
 
-  /* If base is not loop invariant, either off is 0, then we start with just
-     the constant offset in the loop invariant BASE and continue with base
-     as OFF, otherwise give up.
-     We could handle that case by gimplifying the addition of base + off
-     into some SSA_NAME and use that as off, but for now punt.  */
+  /* BASE must be loop invariant.  If it is not invariant, but OFF is, then we
+   * can fix that by swapping BASE and OFF.  */
   if (!expr_invariant_in_loop_p (loop, base))
     {
-      if (!integer_zerop (off))
+      if (!expr_invariant_in_loop_p (loop, off))
 	return false;
-      off = base;
-      base = size_int (pbytepos);
-    }
-  /* Otherwise put base + constant offset into the loop invariant BASE
-     and continue with OFF.  */
-  else
-    {
-      base = fold_convert (sizetype, base);
-      base = size_binop (PLUS_EXPR, base, size_int (pbytepos));
+
+      std::swap (base, off);
     }
 
+  base = fold_convert (sizetype, base);
+  base = size_binop (PLUS_EXPR, base, size_int (pbytepos));
+
   /* OFF at this point may be either a SSA_NAME or some tree expression
      from get_inner_reference.  Try to peel off loop invariants from it
      into BASE as long as possible.  */
@@ -4856,6 +4849,9 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
       offset_vectype = NULL_TREE;
     }
 
+  gcc_checking_assert (expr_invariant_in_loop_p (loop, base));
+  gcc_checking_assert (!expr_invariant_in_loop_p (loop, off));
+
   info->ifn = ifn;
   info->decl = decl;
   info->base = base;
-- 
2.45.2
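For the example loop above, the masked load is `p->elems[idx]`: its address splits into `p` (which changes every iteration) plus a loop-invariant offset derived from `idx`, so the roles of base and offset come out reversed from what the gather code expects. The following is a minimal toy sketch of the new decision, with a `loop_invariant` flag standing in for `expr_invariant_in_loop_p` and plain structs standing in for GCC trees; it mirrors the patch but is not the actual GCC code:

```cpp
#include <utility>

// Toy stand-in for a GCC tree; the flag models expr_invariant_in_loop_p.
struct expr
{
  bool loop_invariant;
};

// Put BASE/OFF into the form vect_check_gather_scatter wants:
// BASE loop-invariant, OFF varying per iteration.
static bool
normalise_gather_operands (expr *&base, expr *&off)
{
  if (!base->loop_invariant)
    {
      // Old behaviour: only a literal zero OFF allowed the swap.
      // New behaviour: any loop-invariant OFF does.
      if (!off->loop_invariant)
	return false;
      std::swap (base, off);
    }
  return true;
}

int
main ()
{
  expr p = { false };       // base of p->elems[idx]: varies per iteration
  expr idx_off = { true };  // offset derived from idx: loop-invariant
  expr *base = &p, *off = &idx_off;
  return normalise_gather_operands (base, off) ? 0 : 1;  // swap succeeds
}
```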
Re: [Fortran, Patch, v1] 3/(3) Prevent creating tree that is never used.
On 25.06.25 at 13:45, Andre Vehreschild wrote:
> Hi,
>
> while hunting for PR120711 I found a construct where a call tree was
> created and never used.  The patch now just suppresses the tree creation
> and instead directly uses the tree that is desired.
>
> Regtests ok on x86_64-pc-linux-gnu / F41.  Ok for mainline?

This is OK.  Thanks for the patch!

Harald

> Regards,
> 	Andre
Re: [RFC PATCH] c++: Implement C++26 P3533R2 - constexpr virtual inheritance [PR120777]
On 6/25/25 12:49 PM, Jakub Jelinek wrote:
> On Wed, Jun 25, 2025 at 12:37:33PM -0400, Jason Merrill wrote:
>> Ah, looks like fixed_type_or_null needs to handle a CALL_EXPR of class
>> type like a TARGET_EXPR.  I also wonder why the call isn't already
>> wrapped in a TARGET_EXPR by build_cxx_call=>build_cplus_new at this point.
>
> Wonder if it has anything to do with being in unevaluated context

It seems to be bugginess in the handling of decltype_p, which is supposed to only apply to the immediate operand of decltype; the attached fixes the testcase.  I think we also still want the change to fixed_type_or_null.

> (and whether perhaps cp_build_addr_expr isn't undesirable for that case,
> because that can make vars odr-used etc.; are odr uses in unevaluated
> context also supposed to make vars odr-used?).

That's fine, mark_used handles not actually odr-using things in unevaluated context.

From 2cf9705f22ce2edcf749ef6721b1ee6c1200 Mon Sep 17 00:00:00 2001
From: Jason Merrill
Date: Wed, 25 Jun 2025 16:26:56 -0400
Subject: [PATCH] c++: fix decltype_p
To: gcc-patches@gcc.gnu.org

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_binary_expression): Don't pass decltype_p
	to the operands.
---
 gcc/cp/parser.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 80fd7990bbb..ba12c50fa7b 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -10791,7 +10791,7 @@ cp_parser_binary_expression (cp_parser* parser, bool cast_p,
   current.lhs_type = (cp_lexer_next_token_is (parser->lexer, CPP_NOT)
		      ? TRUTH_NOT_EXPR : ERROR_MARK);
   current.lhs = cp_parser_cast_expression (parser, /*address_p=*/false,
-					   cast_p, decltype_p, pidk);
+					   cast_p, /*decltype_p*/false, pidk);
   current.prec = prec;
 
   if (cp_parser_error_occurred (parser))
-- 
2.49.0
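To illustrate the decltype_p point (this is only an illustration, not the libstdc++ test that regressed, which was 23_containers/vector/bool/cmp_c++20.cc): only the full expression inside decltype is its immediate operand, so the operands of a binary expression written there are ordinary subexpressions, and class-type prvalue calls among them still need the usual TARGET_EXPR wrapping before build_base_path sees them.

```cpp
#include <utility>
#include <vector>

// Only `a.begin () < b.begin ()` as a whole is the immediate operand of
// decltype; the begin() calls are ordinary subexpressions, so the parser
// must not hand them the decltype-specific treatment.
template <class T>
auto less_begin (T &a, T &b) -> decltype (a.begin () < b.begin ());

// Purely illustrative use in an unevaluated context.
using result = decltype (less_begin (std::declval<std::vector<bool> &> (),
				     std::declval<std::vector<bool> &> ()));
```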
[PATCH v7 2/9] AArch64: reformat branch instruction rules
Make the formatting of the RTL templates in the rules for branch instructions more consistent with each other.

gcc/ChangeLog:

	* config/aarch64/aarch64.md (cbranch4): Reformat.
	(cbranchcc4): Likewise.
	(condjump): Likewise.
	(*compare_condjump): Likewise.
	(aarch64_cb1): Likewise.
	(*cb1): Likewise.
	(tbranch_3): Likewise.
	(@aarch64_tb): Likewise.
---
 gcc/config/aarch64/aarch64.md | 68 +++++++++++++++++++++----------------------
 1 file changed, 34 insertions(+), 34 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index fcc24e300e6..ee4c609ae0f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -714,7 +714,7 @@ (define_expand "cbranch4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
			     [(match_operand:GPI 1 "register_operand")
			      (match_operand:GPI 2 "aarch64_plus_operand")])
-			   (label_ref (match_operand 3 "" ""))
+			   (label_ref (match_operand 3))
			   (pc)))]
   ""
   "
@@ -729,30 +729,31 @@ (define_expand "cbranch4"
	     (match_operator 0 "aarch64_comparison_operator"
	      [(match_operand:GPF_F16 1 "register_operand")
	       (match_operand:GPF_F16 2 "aarch64_fp_compare_operand")])
-	     (label_ref (match_operand 3 "" ""))
+	     (label_ref (match_operand 3))
	     (pc)))]
   ""
-  "
+  {
   operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
					  operands[2]);
   operands[2] = const0_rtx;
-  "
+  }
 )
 
 (define_expand "cbranchcc4"
-  [(set (pc) (if_then_else
-	      (match_operator 0 "aarch64_comparison_operator"
-	       [(match_operand 1 "cc_register")
-		(match_operand 2 "const0_operand")])
-	      (label_ref (match_operand 3 "" ""))
-	      (pc)))]
+  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
+			     [(match_operand 1 "cc_register")
+			      (match_operand 2 "const0_operand")])
+			   (label_ref (match_operand 3))
+			   (pc)))]
   ""
-  "")
+  ""
+)
 
 (define_insn "condjump"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
-			    [(match_operand 1 "cc_register" "") (const_int 0)])
-			   (label_ref (match_operand 2 "" ""))
+			     [(match_operand 1 "cc_register")
+			      (const_int 0)])
+			   (label_ref (match_operand 2))
			   (pc)))]
   ""
   {
@@ -789,10 +790,9 @@ (define_insn "condjump"
 ;; subs	x0, x0, #(CST & 0x000fff)
 ;; b	.Label
 (define_insn_and_split "*compare_condjump"
-  [(set (pc) (if_then_else (EQL
-			     (match_operand:GPI 0 "register_operand" "r")
-			     (match_operand:GPI 1 "aarch64_imm24" "n"))
-			   (label_ref:P (match_operand 2 "" ""))
+  [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
+				(match_operand:GPI 1 "aarch64_imm24" "n"))
+			   (label_ref:P (match_operand 2))
			   (pc)))]
   "!aarch64_move_imm (INTVAL (operands[1]), mode)
    && !aarch64_plus_operand (operands[1], mode)
@@ -816,8 +816,8 @@ (define_insn_and_split "*compare_condjump"
 
 (define_insn "aarch64_cb1"
   [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
-				 (const_int 0))
-			   (label_ref (match_operand 1 "" ""))
+				(const_int 0))
+			   (label_ref (match_operand 1))
			   (pc)))]
   "!aarch64_track_speculation"
   {
@@ -841,8 +841,8 @@ (define_insn "aarch64_cb1"
 
 (define_insn "*cb1"
   [(set (pc) (if_then_else (LTGE (match_operand:ALLI 0 "register_operand" "r")
-				 (const_int 0))
-			   (label_ref (match_operand 1 "" ""))
+				 (const_int 0))
+			   (label_ref (match_operand 1))
			   (pc)))
    (clobber (reg:CC CC_REGNUM))]
   "!aarch64_track_speculation"
@@ -883,11 +883,11 @@ (define_insn "*cb1"
 ;; ---
 
 (define_expand "tbranch_3"
-  [(set (pc) (if_then_else
-	      (EQL (match_operand:SHORT 0 "register_operand")
-		   (match_operand 1 "const0_operand"))
-	      (label_ref (match_operand 2 ""))
-	      (pc)))]
+  [(set (pc) (if_then_else (EQL
+			     (match_operand:SHORT 0 "register_operand")
+			     (match_operand 1 "const0_operand"))
+
[PATCH v6 7/9] AArch64: precommit test for CMPBR instructions
Commit the test file `cmpbr.c` before rules for generating the new instructions are added, so that the changes in codegen are more obvious in the next commit.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp: Add `cmpbr` to the list of extensions.
	* gcc.target/aarch64/cmpbr.c: New test.
---
 gcc/testsuite/gcc.target/aarch64/cmpbr.c | 1841 ++++++++++++++++++++++++++++++
 gcc/testsuite/lib/target-supports.exp    |   14 +-
 2 files changed, 1849 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cmpbr.c

diff --git a/gcc/testsuite/gcc.target/aarch64/cmpbr.c b/gcc/testsuite/gcc.target/aarch64/cmpbr.c
new file mode 100644
index 000..b8925f14433
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cmpbr.c
@@ -0,0 +1,1841 @@
+// Test that the instructions added by FEAT_CMPBR are emitted
+// { dg-do compile }
+// { dg-do-if assemble { target aarch64_asm_cmpbr_ok } }
+// { dg-options "-march=armv9.5-a+cmpbr -O2" }
+// { dg-final { check-function-bodies "**" "*/" "" { target *-*-* } {\.L[0-9]+} } }
+
+#include <stdint.h>
+
+typedef uint8_t u8;
+typedef int8_t i8;
+
+typedef uint16_t u16;
+typedef int16_t i16;
+
+typedef uint32_t u32;
+typedef int32_t i32;
+
+typedef uint64_t u64;
+typedef int64_t i64;
+
+int taken();
+int not_taken();
+
+#define COMPARE(ty, name, op, rhs)                                            \
+  int ty##_x0_##name##_##rhs(ty x0, ty x1) {                                  \
+    return (x0 op rhs) ? taken() : not_taken();                               \
+  }
+
+#define COMPARE_ALL(unsigned_ty, signed_ty, rhs)                              \
+  COMPARE(unsigned_ty, eq, ==, rhs);                                          \
+  COMPARE(unsigned_ty, ne, !=, rhs);                                          \
+                                                                              \
+  COMPARE(unsigned_ty, ult, <, rhs);                                          \
+  COMPARE(unsigned_ty, ule, <=, rhs);                                         \
+  COMPARE(unsigned_ty, ugt, >, rhs);                                          \
+  COMPARE(unsigned_ty, uge, >=, rhs);                                         \
+                                                                              \
+  COMPARE(signed_ty, slt, <, rhs);                                            \
+  COMPARE(signed_ty, sle, <=, rhs);                                           \
+  COMPARE(signed_ty, sgt, >, rhs);                                            \
+  COMPARE(signed_ty, sge, >=, rhs);
+
+// CBB (register)
+COMPARE_ALL(u8, i8, x1);
+
+// CBH (register)
+COMPARE_ALL(u16, i16, x1);
+
+// CB (register)
+COMPARE_ALL(u32, i32, x1);
+COMPARE_ALL(u64, i64, x1);
+
+// CB (immediate)
+COMPARE_ALL(u32, i32, 42);
+COMPARE_ALL(u64, i64, 42);
+
+// Special cases
+// Comparisons against the immediate 0 can be done for all types,
+// because we can use the wzr/xzr register as one of the operands.
+// However, we should prefer to use CBZ/CBNZ or TBZ/TBNZ when possible,
+// because they have larger range.
+COMPARE_ALL(u8, i8, 0);
+COMPARE_ALL(u16, i16, 0);
+COMPARE_ALL(u32, i32, 0);
+COMPARE_ALL(u64, i64, 0);
+
+// CBB and CBH cannot have immediate operands.
+// Instead we have to do a MOV+CB.
+COMPARE_ALL(u8, i8, 42);
+COMPARE_ALL(u16, i16, 42);
+
+// 64 is out of the range for immediate operands (0 to 63).
+// * For 8/16-bit types, use a MOV+CB as above.
+// * For 32/64-bit types, use a CMP+B instead,
+//   because B has a longer range than CB.
+COMPARE_ALL(u8, i8, 64);
+COMPARE_ALL(u16, i16, 64);
+COMPARE_ALL(u32, i32, 64);
+COMPARE_ALL(u64, i64, 64);
+
+// 4098 is out of the range for CMP (0 to 4095, optionally shifted left by 12
+// bits), but it can be materialized in a single MOV.
+COMPARE_ALL(u16, i16, 4098);
+COMPARE_ALL(u32, i32, 4098);
+COMPARE_ALL(u64, i64, 4098);
+
+/*
+** u8_x0_eq_x1:
+**	and	w1, w1, 255
+**	cmp	w1, w0, uxtb
+**	beq	.L4
+**	b	not_taken
+**	.L4:
+**	b	taken
+*/
+
+/*
+** u8_x0_ne_x1:
+**	and	w1, w1, 255
+**	cmp	w1, w0, uxtb
+**	beq	.L6
+**	b	taken
+**	.L6:
+**	b	not_taken
+*/
+
+/*
+** u8_x0_ult_x1:
+**	and	w1, w1, 255
+**	cmp	w1, w0, uxtb
+**	bls	.L8
+**	b	taken
+**	.L8:
+**	b	not_taken
+*/
+
+/*
+** u8_x0_ule_x1:
+**	and	w1, w1, 255
+**	cmp	w1, w0, uxtb
+**	bcc	.L10
+**	b	taken
+**	.L10:
+**	b	not_taken
+*/
+
+/*
+** u8_x0_ugt_x1:
+**	and	w1, w1, 255
+**	cmp	w1, w0, uxtb
+**	bcs	.L12
+**	b	taken
+**	.L12:
+**	b	not_taken
+*/
+
+/*
+** u8_x0_uge_x1:
+**	and	w1, w1, 255
+**	cmp	w1, w0, uxtb
+**	bhi	.L14
+**	b	taken
+**	.L14:
+**	b	not_taken
+*/
+
+/*
+** i8_x0_slt_x1:
+**	sxtb	w1, w1
+**	cmp	w1, w0, sxtb
+**	ble
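As a quick aid to reading the `check-function-bodies` blocks above: each COMPARE instantiation produces one small function whose name encodes the type, the condition and the right-hand side, which is where names like `u8_x0_eq_x1` come from. For example, a hand expansion of `COMPARE(u32, eq, ==, 42)` (shown here only for illustration, with the test's own typedef and declarations repeated to keep it self-contained):

```c
#include <stdint.h>

typedef uint32_t u32;
int taken();
int not_taken();

/* COMPARE(u32, eq, ==, 42) pastes the type, condition name and right-hand
   side into the function name; the body branches to taken()/not_taken()
   depending on the comparison.  */
int u32_x0_eq_42(u32 x0, u32 x1) {
  return (x0 == 42) ? taken() : not_taken();
}
```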