[Bug target/106481] [13 Regression] ICE: in native_encode_rtx, at simplify-rtx.cc:6884 with -O2 -fno-dce -fno-forward-propagate -fno-rerun-cse-after-loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106481
Roger Sayle changed:
What|Removed|Added
Resolution|---|FIXED
Status|ASSIGNED|RESOLVED
--- Comment #4 from Roger Sayle ---
This should now be fixed on mainline.
[Bug rtl-optimization/71775] Redundant move instruction for sign extension
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71775 --- Comment #3 from CVS Commits --- The master branch has been updated by Roger Sayle : https://gcc.gnu.org/g:c23a9c87cc62bd177fd0d4db6ad34b34e1b9a31f commit r13-1942-gc23a9c87cc62bd177fd0d4db6ad34b34e1b9a31f Author: Roger Sayle Date: Wed Aug 3 08:55:35 2022 +0100 Some additional zero-extension related optimizations in simplify-rtx. This patch implements some additional zero-extension and sign-extension related optimizations in simplify-rtx.cc. The original motivation comes from PR rtl-optimization/71775, where in comment #2 Andrew Pinski sees: Failed to match this instruction: (set (reg:DI 88 [ _1 ]) (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0))) On many platforms the result of DImode CTZ is constrained to be a small unsigned integer (between 0 and 64), hence the truncation to 32 bits (using a SUBREG) and the following sign extension back to 64 bits are effectively a no-op, so the above should ideally (often) be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))". To implement this, and some closely related transformations, we build upon the existing val_signbit_known_clear_p predicate. In the first chunk, nonzero_bits knows that FFS and ABS can't leave the sign bit set, so the simplification of ABS (ABS (x)) and ABS (FFS (x)) can itself be simplified. The second transformation is that we can canonicalize SIGN_EXTEND to ZERO_EXTEND (as in the PR 71775 case above) when the operand's sign bit is known to be clear. The final two chunks are for SIGN_EXTEND of a truncating SUBREG, and ZERO_EXTEND of a truncating SUBREG respectively. The nonzero_bits of a truncating SUBREG pessimistically thinks that the upper bits may have an arbitrary value (by taking the SUBREG), so we need to look deeper at the SUBREG's operand to confirm that the high bits are known to be zero. 
Unfortunately, for PR rtl-optimization/71775, ctz:DI on x86_64 with default architecture options is undefined at zero, so we can't be sure the upper bits of reg:DI 88 will be sign extended (all zeros or all ones). nonzero_bits knows this, so the above transformations don't trigger, but the transformations themselves are perfectly valid for other operations such as FFS, POPCOUNT and PARITY, and on other targets/-march settings where CTZ is defined at zero. 2022-08-03 Roger Sayle Segher Boessenkool Richard Sandiford gcc/ChangeLog * simplify-rtx.cc (simplify_unary_operation_1) : Add optimizations for CLRSB, PARITY, POPCOUNT, SS_ABS and LSHIFTRT that are all positive to complement the existing FFS and idempotent ABS simplifications. : Canonicalize SIGN_EXTEND to ZERO_EXTEND when val_signbit_known_clear_p is true of the operand. Simplify sign extensions of SUBREG truncations of operands that are already suitably (zero) extended. : Simplify zero extensions of SUBREG truncations of operands that are already suitably zero extended.
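The reasoning in this commit message can be checked at the source level. The following is a minimal sketch, not code from the patch: the helper names are hypothetical, and `__builtin_popcountll` is assumed to be available (it is a GCC/Clang builtin). It shows that truncating to 32 bits and sign-extending back is a no-op whenever the value is small and non-negative, and that it is not a no-op once bit 31 may be set.

```cpp
#include <cstdint>

// Hypothetical helper: truncate a 64-bit value to 32 bits, then sign-extend
// back to 64 bits. This mirrors (sign_extend:DI (subreg:SI x 0)) in C++ terms.
// The uint32_t -> int32_t conversion is implementation-defined before C++20;
// GCC wraps modulo 2^32.
inline std::int64_t trunc_then_sign_extend(std::uint64_t x) {
    return static_cast<std::int64_t>(
        static_cast<std::int32_t>(static_cast<std::uint32_t>(x)));
}

// Hypothetical helper: truncate to 32 bits, then zero-extend back to 64 bits.
inline std::uint64_t trunc_then_zero_extend(std::uint64_t x) {
    return static_cast<std::uint64_t>(static_cast<std::uint32_t>(x));
}

// True when the truncate/sign-extend round trip leaves the value unchanged,
// i.e. when the pair of operations could be dropped.
inline bool sign_round_trip_is_noop(std::uint64_t x) {
    return static_cast<std::uint64_t>(trunc_then_sign_extend(x)) == x;
}

// A population count of a 64-bit value is always in [0, 64], so its sign bit
// (in either 32 or 64 bits) is known to be clear.
inline std::uint64_t popcount64(std::uint64_t x) {
    return static_cast<std::uint64_t>(__builtin_popcountll(x));
}
```

For a value whose sign bit is known clear, the sign-extending and zero-extending round trips agree, which is the source-level counterpart of canonicalizing SIGN_EXTEND to ZERO_EXTEND.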
[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069 --- Comment #21 from Kewen Lin --- I didn't look into this in details, but something in the culprit commit caught my eyes, take altivec_vmrghh as example: Before the patch, the pattern [(set (match_operand:V8HI 0 "register_operand" "=v") (vec_select:V8HI (vec_concat:V16HI (match_operand:V8HI 1 "register_operand" "v") (match_operand:V8HI 2 "register_operand" "v")) (parallel [(const_int 0) (const_int 8) (const_int 1) (const_int 9) (const_int 2) (const_int 10) (const_int 3) (const_int 11)])))] can match vmrghh on BE while vmrglh on LE. It indicates this pattern has different semantic from underlying instruction perspectives. After the patch, this pattern only matches vmrghh. IMHO, this part has semantic change before and after the patch. The code before the patch looks more reasonable to me, since the pattern can have different meanings on BE and LE (underlying behavior).
[Bug target/47949] Missed optimization for -Os using xchg instead of mov.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47949 --- Comment #5 from CVS Commits --- The master branch has been updated by Roger Sayle : https://gcc.gnu.org/g:fc6ef90173478521982e9df3831a06ea85b4f41e commit r13-1945-gfc6ef90173478521982e9df3831a06ea85b4f41e Author: Roger Sayle Date: Wed Aug 3 09:07:36 2022 +0100 PR target/47949: Use xchg to move from/to AX_REG with -Oz on x86. This patch adds a peephole2 to i386.md to implement the suggestion in PR target/47949, of using xchg instead of mov for moving values to/from the %rax/%eax register, controlled by -Oz, as the xchg instruction is one byte shorter than the move it is replacing. The new test case is taken from the PR: int foo(int x) { return x; } where previously we'd generate: foo:mov %edi,%eax // 2 bytes ret but with this patch, using -Oz, we generate: foo:xchg %eax,%edi // 1 byte ret On the CSiBE benchmark, this saves a total of 10238 bytes (reducing the -Oz total from 3661796 bytes to 3651558 bytes, a 0.28% saving). Interestingly, some modern architectures (such as Zen 3) implement xchg using zero latency register renaming (just like mov), so in theory this transformation could be enabled when optimizing for speed, if benchmarking shows the improved code density produces consistently better performance. However, this is architecture dependent, and there may be interactions using xchg (instead a single_set) in the late RTL passes (such as cprop_hardreg), so for now I've restricted this to -Oz. 2022-08-03 Roger Sayle Uroš Bizjak gcc/ChangeLog PR target/47949 * config/i386/i386.md (peephole2): New peephole2 to convert SWI48 moves to/from %rax/%eax where the src is dead to xchg, when optimizing for minimal size with -Oz. gcc/testsuite/ChangeLog PR target/47949 * gcc.target/i386/pr47949.c: New test case.
[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069 --- Comment #22 from rguenther at suse dot de --- On Wed, 3 Aug 2022, linkw at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069 > > --- Comment #21 from Kewen Lin --- > I didn't look into this in details, but something in the culprit commit caught > my eyes, take altivec_vmrghh as example: > > Before the patch, the pattern > >[(set (match_operand:V8HI 0 "register_operand" "=v") > (vec_select:V8HI >(vec_concat:V16HI > (match_operand:V8HI 1 "register_operand" "v") > (match_operand:V8HI 2 "register_operand" "v")) >(parallel [(const_int 0) (const_int 8) > (const_int 1) (const_int 9) > (const_int 2) (const_int 10) > (const_int 3) (const_int 11)])))] > > can match vmrghh on BE while vmrglh on LE. It indicates this pattern has > different semantic from underlying instruction perspectives. > > After the patch, this pattern only matches vmrghh. > > IMHO, this part has semantic change before and after the patch. The code > before > the patch looks more reasonable to me, since the pattern can have different > meanings on BE and LE (underlying behavior). Ideally we would avoid semantic difference of RTL depending on the target. If that's not avoidable there should be target macros/hooks that specify the desired semantics. I assume the semantic difference is in vec_concat behavior but that's just documented as @findex vec_concat @item (vec_concat:@var{m} @var{x1} @var{x2}) Describes a vector concat operation. The result is a concatenation of the vectors or scalars @var{x1} and @var{x2}; its length is the sum of the lengths of the two inputs. which is a bit unspecific. To me it implies that vec_select of a single lane N of the concat result can be distributed to the operands of the vec_concat in the obvious way (if N >= GET_MODE_NUNITS (x1) subtract GET_MODE_NUNITS and use x2)
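The "obvious way" of distributing a lane selection over a concatenation can be written down as a small model. This is a sketch only, using plain `std::array` in place of RTL vectors; the names `concat_lane` and `select_of_concat` are hypothetical and nothing here says how lanes map to register elements on a given endianness.

```cpp
#include <array>
#include <cstddef>

// Hypothetical model of lane n of (vec_concat x1 x2): lanes below N come
// from x1, the rest come from x2 after subtracting N.
template <typename T, std::size_t N>
T concat_lane(const std::array<T, N>& x1, const std::array<T, N>& x2,
              std::size_t n) {
    return n < N ? x1[n] : x2[n - N];
}

// Hypothetical model of (vec_select (vec_concat x1 x2) (parallel sel)):
// each output lane i is the concat lane named by sel[i].
template <typename T, std::size_t N>
std::array<T, N> select_of_concat(const std::array<T, N>& x1,
                                  const std::array<T, N>& x2,
                                  const std::array<std::size_t, N>& sel) {
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = concat_lane(x1, x2, sel[i]);
    }
    return out;
}
```

With the selector 0, 8, 1, 9, 2, 10, 3, 11 from the altivec_vmrghh pattern, this model interleaves the first four lanes of each input in array-index order.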
[Bug target/106322] i386: Wrong code at O2 level (O0 / O1 are working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322 --- Comment #11 from Mathieu Malaterre ---
(In reply to Uroš Bizjak from comment #10)
> The reason the test fails with gcc-12 is that gcc-12 enabled
> auto-vectorisation for -O2.
I can make the symptoms go away by using `-O2 -fno-tree-vectorize`. Since this also affects arm5 and powerpc, it seems the bug is somewhere in the shared 32-bit code (for some reason the bug does not affect 64-bit architectures).
[Bug fortran/105924] false floating point exception when evaluating exponential function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105924
Lin-Hui Ye changed:
What|Removed|Added
Status|RESOLVED|CLOSED
--- Comment #2 from Lin-Hui Ye ---
Sorry, I didn't know this is an underflow. I was expecting exp(-16) to give a value close to zero. Thanks for the explanation. Linhui
(In reply to kargl from comment #1)
> Why do you think that you should not get an exception?
>
> e = -400
> e*e = 16
> -e*e = -16
> exp(-e*e) = exp(-16) <-- This is going to underflow to zero.
>
> You specifically asked gfortran to signal an exception if
> underflow occurs with the -ffpe-trap=underflow option. The
> underflow threshold occurs at x = -745 for exp(x).
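The threshold kargl mentions can be observed without trapping. This is a small sketch in C++ rather than Fortran, assuming IEEE 754 double precision with gradual underflow enabled (the default on common targets); `exp_result_class` is a hypothetical helper name.

```cpp
#include <cmath>

// Hypothetical helper: classify exp(x) in double precision.
// Returns one of FP_NORMAL, FP_SUBNORMAL, FP_ZERO, FP_INFINITE, FP_NAN.
// The volatile copy keeps the compiler from folding the call at compile time.
inline int exp_result_class(double x) {
    volatile double arg = x;
    return std::fpclassify(std::exp(arg));
}

// With IEEE doubles:
//   x above roughly -708          -> normal result
//   x between roughly -708 and -745 -> subnormal result (precision is lost)
//   x below roughly -745          -> the result rounds to zero
```

A trap enabled with -ffpe-trap=underflow concerns exactly the last two ranges, where the mathematically nonzero result can no longer be represented as a normal number.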
[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069 --- Comment #23 from Kewen Lin --- > Ideally we would avoid semantic difference of RTL depending on the target. > If that's not avoidable there should be target macros/hooks that specify > the desired semantics. Not sure, IMHO it seems it doesn't depend on the target but on endianness (BYTES_BIG_ENDIAN)? Segher and Mike may have more insights on this. > I assume the semantic difference is in > vec_concat behavior but that's just documented as > > @findex vec_concat > @item (vec_concat:@var{m} @var{x1} @var{x2}) > Describes a vector concat operation. The result is a concatenation of the > vectors or scalars @var{x1} and @var{x2}; its length is the sum of the > lengths of the two inputs. > > which is a bit unspecific. To me it implies that > vec_select of a single lane N of the concat result can be distributed > to the operands of the vec_concat in the obvious way (if N >= > GET_MODE_NUNITS (x1) subtract GET_MODE_NUNITS and use x2) Yeah, the documentation isn't clear, neither for vec_select. I guess vec_select also matters here, the indexes for vec_select would have the LE ordering like subreg byte offset on LE?
[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069
Richard Biener changed:
What|Removed|Added
CC||rearnsha at gcc dot gnu.org, rsandifo at gcc dot gnu.org
--- Comment #24 from Richard Biener ---
Richards, how is this handled on arm BE vs LE? We don't have a specific VECTOR_LANES_BIG_ENDIAN, but we are already using BYTES_BIG_ENDIAN for some of the VEC_*_{LO,HI}_EXPR tree codes (but IIRC not for anything related to VEC_PERM_EXPR, for example, which looks most closely related to select/concat on RTL).
[Bug lto/106499] LTO runs forever in libfabric 1.15.1 linking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106499 --- Comment #23 from Martin Liška ---
> If I may ask yet another question 😋
Sure, don't hesitate ;)
> Martin, can you tell how you managed to diagnose that it was exactly that
> cause in this case?
I noticed we spent time in the inliner (perf top), and then I suspected a flatten attribute. 'git grep flatten' confirmed that.
[Bug rtl-optimization/106187] armhf: Miscompilation at O2 level (O0 / O1 are working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106187 --- Comment #49 from CVS Commits --- The master branch has been updated by Richard Earnshaw : https://gcc.gnu.org/g:64ce76d940501cb04d14a0d36752b4f93473531c commit r13-1948-g64ce76d940501cb04d14a0d36752b4f93473531c Author: Richard Earnshaw Date: Wed Aug 3 10:01:51 2022 +0100 cselib: add function to check if SET is redundant [PR106187] A SET operation that writes memory may have the same value as an earlier store but if the alias sets of the new and earlier store do not conflict then the set is not truly redundant. This can happen, for example, if objects of different types share a stack slot. To fix this we define a new function in cselib that first checks for equality and if that is successful then finds the earlier store in the value history and checks the alias sets. The routine is used in two places elsewhere in the compiler: cfgcleanup and postreload. gcc/ChangeLog: PR rtl-optimization/106187 * alias.h (mems_same_for_tbaa_p): Declare. * alias.cc (mems_same_for_tbaa_p): New function. * dse.cc (record_store): Use it instead of open-coding alias check. * cselib.h (cselib_redundant_set_p): Declare. * cselib.cc: Include alias.h (cselib_redundant_set_p): New function. * cfgcleanup.cc: (mark_effect): Use cselib_redundant_set_p instead of rtx_equal_for_cselib_p. * postreload.cc (reload_cse_simplify): Use cselib_redundant_set_p. (reload_cse_noop_set_p): Delete.
[Bug tree-optimization/106511] New: [13 Regression] New -Werror=maybe-uninitialized since r13-1268-g8c99e307b20c502e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106511
Bug ID: 106511
Summary: [13 Regression] New -Werror=maybe-uninitialized since r13-1268-g8c99e307b20c502e
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: marxin at gcc dot gnu.org
CC: aldyh at gcc dot gnu.org
Target Milestone: ---
Created attachment 53404 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53404&action=edit test-case
I know the warning has a significant false-positive rate, but this may still be an interesting test case. It's reduced from the xen package:
$ gcc bunzip.i -Werror=maybe-uninitialized -O1
bunzip2.c: In function ‘get_next_block’:
bunzip2.c:261:27: error: ‘length’ may be used uninitialized [-Werror=maybe-uninitialized]
bunzip2.c:224:17: note: ‘length’ declared here
cc1: some warnings being treated as errors
[Bug tree-optimization/106511] [13 Regression] New -Werror=maybe-uninitialized since r13-1268-g8c99e307b20c502e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106511
Martin Liška changed:
What|Removed|Added
Last reconfirmed||2022-08-03
Status|UNCONFIRMED|NEW
Ever confirmed|0|1
[Bug rtl-optimization/106187] armhf: Miscompilation at O2 level (O0 / O1 are working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106187
Richard Earnshaw changed:
What|Removed|Added
Assignee|unassigned at gcc dot gnu.org|rearnsha at gcc dot gnu.org
Status|NEW|ASSIGNED
--- Comment #50 from Richard Earnshaw ---
Fixed on master so far.
[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069 --- Comment #25 from rsandifo at gcc dot gnu.org ---
AIUI the rules are:
- GCC vector lane numbers always correspond to memory array indices. For example, lane 0 always comes first in memory.
- On big-endian targets, vector loads and stores are assumed to put the first memory element at the most significant end of the vector register.
So lane 0 refers to the most-significant register element on big-endian targets and to the least-significant register element on little-endian targets. So:
  (vec_select:V4SI (reg:V4SI R) [(const_int 2) (const_int 6) (const_int 3) (const_int 7)])
describes a different option on big-endian and little-endian but:
  (vec_select:V4SI (mem:V4SI M) [(const_int 2) (const_int 6) (const_int 3) (const_int 7)])
is endian-independent.
[Bug libfortran/106079] [12/13 regression] gfortran.dg/boz_15.f90 fails after r12-6498-g07c60b8e33
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106079 --- Comment #9 from CVS Commits --- The releases/gcc-12 branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:4e5ca7ff8c9afd3c38245aa6b939cd3ae49bf1fe commit r12-8653-g4e5ca7ff8c9afd3c38245aa6b939cd3ae49bf1fe Author: Jakub Jelinek Date: Mon Aug 1 08:26:03 2022 +0200 libfortran: Fix up boz_15.f90 on powerpc64le with -mabi=ieeelongdouble [PR106079] The boz_15.f90 test FAILs on powerpc64le-linux when -mabi=ieeelongdouble is used (either default through --with-long-double-format=ieee or when used explicitly). The problem is that the read/write transfer routines are called with BT_REAL (or BT_COMPLEX) type and kind 17 which is magic we use to say it is the IEEE quad real(kind=16) rather than the IBM double double real(kind=16). For the floating point input/output we then handle kind 17 specially, but for B/O/Z we just treat the bytes of the floating point value as binary blob and using 17 in that case results in unexpected behavior, for write it means we don't estimate right how many chars we'll need and print etc. rather than what we should, and even with explicit size we'd print one further byte than intended. For read it would even mean overwriting some unrelated byte after the floating point object. Fixed by using 16 instead of 17 in the read_radix and write_{b,o,z} calls. 2022-08-01 Jakub Jelinek PR libfortran/106079 * io/transfer.c (formatted_transfer_scalar_read, formatted_transfer_scalar_write): For type BT_REAL with kind 17 change kind to 16 before calling read_radix or write_{b,o,z}. (cherry picked from commit 82ac4cd213867be939aedee15347e8fd3f200b6a)
[Bug libfortran/106079] [12/13 regression] gfortran.dg/boz_15.f90 fails after r12-6498-g07c60b8e33
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106079
Jakub Jelinek changed:
What|Removed|Added
Status|ASSIGNED|RESOLVED
Resolution|---|FIXED
--- Comment #10 from Jakub Jelinek ---
Fixed for 12.2 and later.
[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069 --- Comment #26 from rsandifo at gcc dot gnu.org ---
> describes a different option on big-endian and little-endian
should have said: describes a different instruction. In other words, the mapping of gimple to RTL operations is fixed, but the mapping of those RTL operations to machine instructions varies by endianness (if registers are involved).
[Bug c++/106512] New: String optimization underflows in std::string::operator+ inlining
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106512 Bug ID: 106512 Summary: String optimization underflows in std::string::operator+ inlining Product: gcc Version: 12.1.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: hi at jdoubleu dot de Target Milestone: --- Live example: https://godbolt.org/z/zMqG8W7WE Given the following code: ```cpp #include std::string GetHello() { return std::string{"ello"}; } int main() { ("H" + GetHello()); } ``` Fails to compile with 1. gcc version 12.1 and newer, 2. linking against gnu++20 and higher 3. all warnings enabled, 4. warnings set to produce an error, 5. -O3 is turned on I get the following error: ``` In file included from /opt/compiler-explorer/gcc-12.1.0/include/c++/12.1.0/string:40, from :1: In static member function 'static constexpr std::char_traits::char_type* std::char_traits::copy(char_type*, const char_type*, std::size_t)', inlined from 'static constexpr void std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_S_copy(_CharT*, const _CharT*, size_type) [with _CharT = char; _Traits = std::char_traits; _Alloc = std::allocator]' at /opt/compiler-explorer/gcc-12.1.0/include/c++/12.1.0/bits/basic_string.h:423:21, inlined from 'constexpr std::__cxx11::basic_string<_CharT, _Traits, _Allocator>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_M_replace(size_type, size_type, const _CharT*, size_type) [with _CharT = char; _Traits = std::char_traits; _Alloc = std::allocator]' at /opt/compiler-explorer/gcc-12.1.0/include/c++/12.1.0/bits/basic_string.tcc:532:22, inlined from 'constexpr std::__cxx11::basic_string<_CharT, _Traits, _Alloc>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::replace(size_type, size_type, const _CharT*, size_type) [with _CharT = char; _Traits = std::char_traits; _Alloc = std::allocator]' at /opt/compiler-explorer/gcc-12.1.0/include/c++/12.1.0/bits/basic_string.h:2171:19, inlined from 'constexpr 
std::__cxx11::basic_string<_CharT, _Traits, _Alloc>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::insert(size_type, const _CharT*) [with _CharT = char; _Traits = std::char_traits; _Alloc = std::allocator]' at /opt/compiler-explorer/gcc-12.1.0/include/c++/12.1.0/bits/basic_string.h:1928:22, inlined from 'constexpr std::__cxx11::basic_string<_CharT, _Traits, _Allocator> std::operator+(const _CharT*, __cxx11::basic_string<_CharT, _Traits, _Allocator>&&) [with _CharT = char; _Traits = char_traits; _Alloc = allocator]' at /opt/compiler-explorer/gcc-12.1.0/include/c++/12.1.0/bits/basic_string.h:3541:36, inlined from 'int main()' at :10:10: /opt/compiler-explorer/gcc-12.1.0/include/c++/12.1.0/bits/char_traits.h:431:56: error: 'void* __builtin_memcpy(void*, const void*, long unsigned int)' accessing 9223372036854775810 or more bytes at offsets [18, 9223372036854775807] and 17 may overlap up to 9223372036854775813 bytes at offset -3 [-Werror=restrict] 431 | return static_cast(__builtin_memcpy(__s1, __s2, __n)); | ^ ``` I'm not sure what the issue is here exactly. From the error message, it looks some underflow (of `long long`) when trying to inline the std::string::operator+? It doesn't seem like a bug in libstdc++, since it compiles with gcc11. Furthermore, if you just change the `"H" + ...` in the example to `"He" + ...` it suddenly works. The symptoms of this one look similar: https://gcc.gnu.org/bugzilla//show_bug.cgi?id=85651
[Bug target/99888] Add powerpc ELFv2 support for -fpatchable-function-entry*
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99888 --- Comment #2 from Kewen Lin ---
Created attachment 53405 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53405&action=edit untested patch
With the attached patch, for -fpatchable-function-entry=5,2 it gets:
foo:
.LFB0:
        .cfi_startproc
.LCF0:
0:      addis 2,12,.TOC.-.LCF0@ha
        addi 2,2,.TOC.-.LCF0@l
        .section __patchable_function_entries,"awo",@progbits,foo
        .align 3
        .8byte .LPFE1
        .section ".text"
.LPFE1:
        nop
        nop
        .localentry foo,.-foo
        .section __patchable_function_entries,"awo",@progbits,foo
        .align 3
        .8byte .LPFE2
        .section ".text"
.LPFE2:
        nop
        nop
        nop
        std 31,-8(1)
for -fpatchable-function-entry=5,1, it emits error msg:
test.c:4:1: error: ‘-fpatchable-function-entry=M,N’ N NOPs can cause assembler error due to invalid .localentry expression.
[Bug c++/106512] String optimization underflows in std::string::operator+ inlining
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106512
Jonathan Wakely changed:
What|Removed|Added
Status|UNCONFIRMED|RESOLVED
Resolution|---|DUPLICATE
--- Comment #1 from Jonathan Wakely ---
Another dup of PR 105651 or PR 105329
*** This bug has been marked as a duplicate of bug 105651 ***
[Bug tree-optimization/105651] [12/13 Regression] bogus "may overlap" memcpy warning with std::string and operator+ at -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105651
Jonathan Wakely changed:
What|Removed|Added
CC||hi at jdoubleu dot de
--- Comment #16 from Jonathan Wakely ---
*** Bug 106512 has been marked as a duplicate of this bug. ***
[Bug lto/106499] LTO runs forever in libfabric 1.15.1 linking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106499 --- Comment #24 from Tomasz Kłoczko --- Thank you :)
[Bug tree-optimization/106513] New: bswap is incorrectly generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106513
Bug ID: 106513
Summary: bswap is incorrectly generated
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kristerw at gcc dot gnu.org
Target Milestone: ---
GCC may incorrectly generate bswap instructions for code not doing a correct swap. This can be seen by running the function from testsuite/gcc.dg/pr40501.c as
  typedef long int int64_t;

  __attribute__((noinline)) int64_t
  swap64 (int64_t n)
  {
    return (((n & (((int64_t) 0xff) )) << 56) |
            ((n & (((int64_t) 0xff) << 8)) << 40) |
            ((n & (((int64_t) 0xff) << 16)) << 24) |
            ((n & (((int64_t) 0xff) << 24)) << 8) |
            ((n & (((int64_t) 0xff) << 32)) >> 8) |
            ((n & (((int64_t) 0xff) << 40)) >> 24) |
            ((n & (((int64_t) 0xff) << 48)) >> 40) |
            ((n & (((int64_t) 0xff) << 56)) >> 56));
  }

  int main (void)
  {
    volatile int64_t n = 0x8000l;
    if (swap64(n) != 0xff80l)
      __builtin_abort ();
    return 0;
  }
This fails at -Os and higher optimization levels.
[Bug tree-optimization/106513] bswap is incorrectly generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106513 --- Comment #1 from Andreas Schwab --- This subexpression has undefined behaviour: (((int64_t) 0xff) << 56).
[Bug libstdc++/103133] Binary built with -static using std::thread crashes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103133 --- Comment #11 from CVS Commits --- The releases/gcc-10 branch has been updated by Jonathan Wakely : https://gcc.gnu.org/g:18eecb8c4a97716d4bc4890b05c91f172fadc7b3 commit r10-10928-g18eecb8c4a97716d4bc4890b05c91f172fadc7b3 Author: Jonathan Wakely Date: Tue Nov 9 23:45:36 2021 + libstdc++: Disable gthreads weak symbols for glibc 2.34 [PR103133] Since Glibc 2.34 all pthreads symbols are defined directly in libc not libpthread, and since Glibc 2.32 we have used __libc_single_threaded to avoid unnecessary locking in single-threaded programs. This means there is no reason to avoid linking to libpthread now, and so no reason to use weak symbols defined in gthr-posix.h for all the pthread_xxx functions. libstdc++-v3/ChangeLog: PR libstdc++/100748 PR libstdc++/103133 * config/os/gnu-linux/os_defines.h (_GLIBCXX_GTHREAD_USE_WEAK): Define for glibc 2.34 and later. (cherry picked from commit 80fe172ba9820199c2bbce5d0611ffca27823049)
[Bug testsuite/100748] [12 regression] 30_threads/jthread/95989.cc fails after r12-843
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100748 --- Comment #12 from CVS Commits --- The releases/gcc-10 branch has been updated by Jonathan Wakely : https://gcc.gnu.org/g:18eecb8c4a97716d4bc4890b05c91f172fadc7b3 commit r10-10928-g18eecb8c4a97716d4bc4890b05c91f172fadc7b3 Author: Jonathan Wakely Date: Tue Nov 9 23:45:36 2021 + libstdc++: Disable gthreads weak symbols for glibc 2.34 [PR103133] Since Glibc 2.34 all pthreads symbols are defined directly in libc not libpthread, and since Glibc 2.32 we have used __libc_single_threaded to avoid unnecessary locking in single-threaded programs. This means there is no reason to avoid linking to libpthread now, and so no reason to use weak symbols defined in gthr-posix.h for all the pthread_xxx functions. libstdc++-v3/ChangeLog: PR libstdc++/100748 PR libstdc++/103133 * config/os/gnu-linux/os_defines.h (_GLIBCXX_GTHREAD_USE_WEAK): Define for glibc 2.34 and later. (cherry picked from commit 80fe172ba9820199c2bbce5d0611ffca27823049)
[Bug libstdc++/98421] std::span does not detect invalid range in Debug Mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98421 --- Comment #4 from CVS Commits --- The releases/gcc-10 branch has been updated by Jonathan Wakely : https://gcc.gnu.org/g:de802e4736613a585dcfd508acf73033f18aa4da commit r10-10932-gde802e4736613a585dcfd508acf73033f18aa4da Author: Jonathan Wakely Date: Tue Aug 31 17:34:51 2021 +0100 libstdc++: Add valid range checks to std::span constructors [PR98421] Signed-off-by: Jonathan Wakely libstdc++-v3/ChangeLog: PR libstdc++/98421 * include/std/span (span(Iter, size_type), span(Iter, Iter)): Add valid range checks. * testsuite/23_containers/span/cons_1_assert_neg.cc: New test. * testsuite/23_containers/span/cons_2_assert_neg.cc: New test. (cherry picked from commit ef7becc9c8a48804d3fd9dac032f7b33e561a612)
[Bug tree-optimization/106322] 32bits / tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322 --- Comment #12 from Mathieu Malaterre --- Created attachment 53406 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53406&action=edit main function with no-tree-optimize attribute
[Bug tree-optimization/106322] 32bits / tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322 --- Comment #13 from Mathieu Malaterre --- Created attachment 53407 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53407&action=edit main function with no-tree-optimize attribute
[Bug tree-optimization/106322] 32bits / tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322 --- Comment #14 from Mathieu Malaterre --- I can make the symptom go away with a single function attribute: ``` % diff -u * --- /tmp/ii/mul_test.cc.ii.bad 2022-08-03 12:29:41.192263306 + +++ /tmp/ii/mul_test.cc.ii.good 2022-08-03 12:29:41.196263281 + @@ -124932,7 +124932,7 @@ } template __attribute__((noinline)) void - + __attribute__((optimize("no-tree-vectorize"))) operator()(T , D d) { const size_t N = Lanes(d); ```
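For readers unfamiliar with the attribute used in the diff above, here is a minimal standalone sketch of the same technique. The function `sum_scalar` is hypothetical and unrelated to the code under investigation; `optimize("no-tree-vectorize")` is a GCC-specific function attribute that other compilers may ignore with a warning. The GCC manual describes the `optimize` attribute as a debugging aid, which matches how it is used in this bug report: to narrow a miscompilation down to one function.

```cpp
#include <cstddef>

// GCC-specific: turn off the tree vectorizer for this one function while the
// rest of the translation unit is still compiled with the command-line flags.
// noinline keeps the loop from being inlined into a caller, where the
// per-function setting would no longer apply to it.
__attribute__((noinline, optimize("no-tree-vectorize")))
int sum_scalar(const int* data, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

Applying the attribute to one function at a time is a way to bisect which function the vectorizer miscompiles without changing flags for the whole file.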
[Bug tree-optimization/106513] bswap is incorrectly generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106513 --- Comment #2 from Krister Walfridsson --- (In reply to Andreas Schwab from comment #1) > This subexpression has undefined behaviour: (((int64_t) 0xff) << 56). I thought that was allowed in GCC as the manual says (https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Integers-implementation.html#Integers-implementation) "As an extension to the C language, GCC does not use the latitude given in C99 and C11 only to treat certain aspects of signed ‘<<’ as undefined." If not, what behavior does the manual refer to?
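Whatever the answer to the question about signed `<<`, the issue can be sidestepped by doing the masking and shifting on an unsigned type. The sketch below is an assumption-labelled rewrite, not the testsuite code: `swap64_unsigned` is a hypothetical name, and `__builtin_bswap64` (a GCC/Clang builtin) is only mentioned as a reference to compare against. Note that in the signed version, the final `>> 56` of a negative value is an arithmetic shift in GCC, so that function is not a pure byte swap for inputs with the top bit set.

```cpp
#include <cstdint>

// Hypothetical rewrite of the byte swap using only unsigned arithmetic.
// Left shifts of unsigned values are well defined (they wrap modulo 2^64),
// and right shifts of unsigned values always shift in zeros.
inline std::uint64_t swap64_unsigned(std::uint64_t n) {
    const std::uint64_t m = 0xff;
    return ((n & m) << 56) |
           ((n & (m << 8)) << 40) |
           ((n & (m << 16)) << 24) |
           ((n & (m << 24)) << 8) |
           ((n & (m << 32)) >> 8) |
           ((n & (m << 40)) >> 24) |
           ((n & (m << 48)) >> 40) |
           ((n & (m << 56)) >> 56);
}
```

For every input, this function should agree with `__builtin_bswap64`, so recognizing it as a bswap is a valid transformation.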
[Bug libstdc++/103133] Binary built with -static using std::thread crashes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103133
Jonathan Wakely changed:
What|Removed|Added
Target Milestone|11.3|10.5
--- Comment #12 from Jonathan Wakely ---
Also fixed for 10.5
[Bug testsuite/100748] [12 regression] 30_threads/jthread/95989.cc fails after r12-843
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100748 --- Comment #13 from Jonathan Wakely --- Fixed for 11.3 and 10.5 too.
[Bug libstdc++/98421] std::span does not detect invalid range in Debug Mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98421
Jonathan Wakely changed:
What|Removed|Added
Target Milestone|11.3|10.5
--- Comment #5 from Jonathan Wakely ---
Fixed for 11.3 and 10.5 too.
[Bug libstdc++/105844] [10/11/12 Regression] std::lcm(50000, 49999) is UB but accepted in a constexpr due to cast to unsigned
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105844 --- Comment #9 from CVS Commits --- The releases/gcc-12 branch has been updated by Jonathan Wakely : https://gcc.gnu.org/g:8a57deb926cd660c2eae7ed621d61a301ae0d523 commit r12-8654-g8a57deb926cd660c2eae7ed621d61a301ae0d523 Author: Jonathan Wakely Date: Fri Jun 10 14:39:13 2022 +0100 libstdc++: Make std::lcm and std::gcd detect overflow [PR105844] When I fixed PR libstdc++/92978 I introduced a regression whereby std::lcm(INT_MIN, 1) and std::lcm(5, 4) would no longer produce errors during constant evaluation. Those calls are undefined, because they violate the preconditions that |m| and the result can be represented in the return type (which is int in both those cases). The regression occurred because __absu(INT_MIN) is well-formed, due to the explicit casts to unsigned in that new helper function, and the out-of-range multiplication is well-formed, because unsigned arithmetic wraps instead of overflowing. To fix 92978 I made std::gcd and std::lcm calculate |m| and |n| immediately, yielding a common unsigned type that was used to calculate the result. That was partly correct, but there's no need to use an unsigned type. Doing so only suppresses the overflow errors so the compiler can't detect them. This change replaces __absu with __abs_r that returns the common type (not its corresponding unsigned type). This way we can detect overflow in __abs_r when required, while still supporting the most-negative value when it can be represented in the result type. To detect LCM results that are out of range of the result type we still need explicit checks, because neither constant evaluation nor UBsan will complain about unsigned wrapping for cases such as std::lcm(50u, 49u). We can detect those overflows efficiently by using __builtin_mul_overflow and asserting. libstdc++-v3/ChangeLog: PR libstdc++/105844 * include/experimental/numeric (experimental::gcd): Simplify assertions. Use __abs_r instead of __absu. 
(experimental::lcm): Likewise. Remove use of __detail::__lcm so overflow can be detected. * include/std/numeric (__detail::__absu): Rename to __abs_r and change to allow signed result type, so overflow can be detected. (__detail::__lcm): Remove. (gcd): Simplify assertions. Use __abs_r instead of __absu. (lcm): Likewise. Remove use of __detail::__lcm so overflow can be detected. * testsuite/26_numerics/gcd/gcd_neg.cc: Adjust dg-error lines. * testsuite/26_numerics/lcm/lcm_neg.cc: Likewise. * testsuite/26_numerics/gcd/105844.cc: New test. * testsuite/26_numerics/lcm/105844.cc: New test. (cherry picked from commit 671970a5621e18e7079b4ca113e56434c858db66)
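The explicit range check described in the commit message can be sketched as follows. This is only an illustration under stated assumptions, not the libstdc++ code: `checked_lcm` and its `bool` return convention are hypothetical, and the helper handles non-negative `int` arguments only. `std::gcd` and the GCC/Clang builtin `__builtin_mul_overflow` are the only real library facilities used.

```cpp
#include <numeric>

// Hypothetical helper, not the libstdc++ implementation: computes
// lcm(m, n) for non-negative ints and reports whether it fits in int.
bool checked_lcm(int m, int n, int& result)
{
    if (m == 0 || n == 0) {
        result = 0;
        return true;
    }
    // Divide first, so the multiplication is the only step that can overflow.
    int reduced = m / std::gcd(m, n);
    // __builtin_mul_overflow returns true when the product does not fit.
    return !__builtin_mul_overflow(reduced, n, &result);
}
```

With this shape, a caller can assert on the returned flag instead of relying on wrapping arithmetic to hide the out-of-range result.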
[Bug libstdc++/92978] std::gcd mishandles mixed-signedness
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92978 --- Comment #9 from CVS Commits --- The releases/gcc-12 branch has been updated by Jonathan Wakely : https://gcc.gnu.org/g:8a57deb926cd660c2eae7ed621d61a301ae0d523 commit r12-8654-g8a57deb926cd660c2eae7ed621d61a301ae0d523 Author: Jonathan Wakely Date: Fri Jun 10 14:39:13 2022 +0100 libstdc++: Make std::lcm and std::gcd detect overflow [PR105844] When I fixed PR libstdc++/92978 I introduced a regression whereby std::lcm(INT_MIN, 1) and std::lcm(50000, 49999) would no longer produce errors during constant evaluation. Those calls are undefined, because they violate the preconditions that |m| and the result can be represented in the return type (which is int in both those cases). The regression occurred because __absu(INT_MIN) is well-formed, due to the explicit casts to unsigned in that new helper function, and the out-of-range multiplication is well-formed, because unsigned arithmetic wraps instead of overflowing. To fix 92978 I made std::gcd and std::lcm calculate |m| and |n| immediately, yielding a common unsigned type that was used to calculate the result. That was partly correct, but there's no need to use an unsigned type. Doing so only suppresses the overflow errors so the compiler can't detect them. This change replaces __absu with __abs_r that returns the common type (not its corresponding unsigned type). This way we can detect overflow in __abs_r when required, while still supporting the most-negative value when it can be represented in the result type. To detect LCM results that are out of range of the result type we still need explicit checks, because neither constant evaluation nor UBsan will complain about unsigned wrapping for cases such as std::lcm(500000u, 499999u). We can detect those overflows efficiently by using __builtin_mul_overflow and asserting. libstdc++-v3/ChangeLog: PR libstdc++/105844 * include/experimental/numeric (experimental::gcd): Simplify assertions. Use __abs_r instead of __absu. 
(experimental::lcm): Likewise. Remove use of __detail::__lcm so overflow can be detected. * include/std/numeric (__detail::__absu): Rename to __abs_r and change to allow signed result type, so overflow can be detected. (__detail::__lcm): Remove. (gcd): Simplify assertions. Use __abs_r instead of __absu. (lcm): Likewise. Remove use of __detail::__lcm so overflow can be detected. * testsuite/26_numerics/gcd/gcd_neg.cc: Adjust dg-error lines. * testsuite/26_numerics/lcm/lcm_neg.cc: Likewise. * testsuite/26_numerics/gcd/105844.cc: New test. * testsuite/26_numerics/lcm/105844.cc: New test. (cherry picked from commit 671970a5621e18e7079b4ca113e56434c858db66)
[Bug libstdc++/105957] __n * sizeof(_Tp) might overflow under consteval context for std::allocator
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105957 --- Comment #4 from CVS Commits --- The releases/gcc-12 branch has been updated by Jonathan Wakely : https://gcc.gnu.org/g:2ef2de76dae5cac14e0de77ca7205e43be03ab22 commit r12-8655-g2ef2de76dae5cac14e0de77ca7205e43be03ab22 Author: Jonathan Wakely Date: Tue Jun 14 14:37:25 2022 +0100 libstdc++: Check for size overflow in constexpr allocation [PR105957] libstdc++-v3/ChangeLog: PR libstdc++/105957 * include/bits/allocator.h (allocator::allocate): Check for overflow in constexpr allocation. * testsuite/20_util/allocator/105975.cc: New test. (cherry picked from commit 0a9af7b4ef1b8aa85cc8820acf54d41d1569fc10)
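A size-overflow check of the kind named in this ChangeLog entry might look like the sketch below. It is an assumption-laden illustration rather than the code in include/bits/allocator.h: the function name `checked_alloc_bytes` is hypothetical, and the choice to throw `std::bad_array_new_length` is made here for illustration. `__builtin_mul_overflow` is a real GCC/Clang builtin.

```cpp
#include <cstddef>
#include <new>

// Hypothetical helper: returns the byte count for n objects of type T,
// throwing if n * sizeof(T) would wrap around in std::size_t.
template <typename T>
constexpr std::size_t checked_alloc_bytes(std::size_t n)
{
    std::size_t bytes = 0;
    if (__builtin_mul_overflow(n, sizeof(T), &bytes))
        throw std::bad_array_new_length();
    return bytes;
}
```

During constant evaluation, reaching the `throw` makes the expression non-constant, so an overflowing request is diagnosed at compile time instead of silently wrapping.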
[Bug libstdc++/105995] QoI: constexpr basic_string variable must use all of its SSO buffer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105995 --- Comment #7 from CVS Commits --- The releases/gcc-12 branch has been updated by Jonathan Wakely : https://gcc.gnu.org/g:e562236851e06091256593aa0d3fbda60a28e45b commit r12-8657-ge562236851e06091256593aa0d3fbda60a28e45b Author: Jonathan Wakely Date: Thu Jun 16 14:57:32 2022 +0100 libstdc++: Support constexpr global std::string for size < 15 [PR105995] I don't think this is required by the standard, but it's easy to support. libstdc++-v3/ChangeLog: PR libstdc++/105995 * include/bits/basic_string.h (_M_use_local_data): Initialize the entire SSO buffer. * testsuite/21_strings/basic_string/cons/char/105995.cc: New test. (cherry picked from commit 98a0d72a610a87e8e383d366e50253ddcc9a51dd)
[Bug libstdc++/104443] common_iterator::operator-> is not correctly implemented
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104443 --- Comment #6 from CVS Commits --- The releases/gcc-12 branch has been updated by Jonathan Wakely : https://gcc.gnu.org/g:1a9681e60964c7f7ce0892e14745e6dcf6100157 commit r12-8660-g1a9681e60964c7f7ce0892e14745e6dcf6100157 Author: Jonathan Wakely Date: Thu Jul 28 20:55:51 2022 +0100 libstdc++: Tweak common_iterator::operator-> return type [PR104443] This adjusts the return type to match the resolution of LWG 3672. There is no functional difference, because decltype(auto) always deduced a value anyway, but this makes it simpler and consistent with the working draft. libstdc++-v3/ChangeLog: PR libstdc++/104443 * include/bits/stl_iterator.h (common_iterator::operator->): Change return type to just auto. (cherry picked from commit b5f5d1b36edbcd7d923f2e2653e54e52637c715b)
[Bug libstdc++/106248] [11/12 Regression] operator>>std::basic_istream at boundary condition behave differently in different opt levels
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106248 --- Comment #10 from CVS Commits --- The releases/gcc-12 branch has been updated by Jonathan Wakely : https://gcc.gnu.org/g:7a0ed28d4feb450f1ede5b52b57793a5df5b19fe commit r12-8659-g7a0ed28d4feb450f1ede5b52b57793a5df5b19fe Author: Jonathan Wakely Date: Tue Jul 12 11:18:47 2022 +0100 libstdc++: Check for EOF if extraction avoids buffer overflow [PR106248] In r11-2581-g17abcc77341584 (for LWG 2499) I added overflow checks to the pre-C++20 operator>>(istream&, char*) overload. Those checks can cause extraction to stop after filling the buffer, where previously it would have tried to extract another character and stopped at EOF. When that happens we no longer set eofbit in the stream state, which is consistent with the behaviour of the new C++20 overload, but is an observable and unexpected change in the C++17 behaviour. What makes it worse is that the behaviour change is dependent on optimization, because __builtin_object_size is used to detect the buffer size and that only works when optimizing. To avoid the unexpected and optimization-dependent change in behaviour, set eofbit manually if we stopped extracting because of the buffer size check, but had reached EOF anyway. If the stream's rdstate() != goodbit or width() is non-zero and smaller than the buffer, there's nothing to do. Otherwise, we filled the buffer and need to check for EOF, and maybe set eofbit. The new check is guarded by #ifdef __OPTIMIZE__ because otherwise __builtin_object_size is useless. There's no point compiling and emitting dead code that can't be eliminated because we're not optimizing. We could add extra checks that the next character in the buffer is not whitespace, to detect the case where we stopped early and prevented a buffer overflow that would have happened otherwise. That would allow us to assert or set badbit in the stream state when undefined behaviour was prevented. 
However, those extra checks would increase the size of the function, potentially reducing the likelihood of it being inlined, and so making the buffer size detection less reliable. It seems preferable to prevent UB and silently truncate, rather than miss the UB and allow the overflow to happen. libstdc++-v3/ChangeLog: PR libstdc++/106248 * include/std/istream [C++17] (operator>>(istream&, char*)): Set eofbit if we stopped extracting at EOF. * testsuite/27_io/basic_istream/extractors_character/char/pr106248.cc: New test. * testsuite/27_io/basic_istream/extractors_character/wchar_t/pr106248.cc: New test. (cherry picked from commit 5ae74944af1de032d4a27fad4a2287bd3a2163fd)
[Bug libstdc++/106248] [11 Regression] operator>>std::basic_istream at boundary condition behave differently in different opt levels
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106248 --- Comment #11 from Jonathan Wakely --- Backported for 12.2
[Bug libstdc++/105844] [10/11 Regression] std::lcm(50000, 49999) is UB but accepted in a constexpr due to cast to unsigned
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105844 --- Comment #10 from Jonathan Wakely --- Backported for 12.2
[Bug libstdc++/105995] QoI: constexpr basic_string variable must use all of its SSO buffer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105995 Jonathan Wakely changed: What|Removed |Added Target Milestone|13.0|12.2 --- Comment #8 from Jonathan Wakely --- Backported for 12.2
[Bug libstdc++/104443] common_iterator::operator-> is not correctly implemented
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104443 Jonathan Wakely changed: What|Removed |Added Target Milestone|13.0|12.2 --- Comment #7 from Jonathan Wakely --- Backported for 12.2
[Bug libstdc++/105957] __n * sizeof(_Tp) might overflow under consteval context for std::allocator
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105957 Jonathan Wakely changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #5 from Jonathan Wakely --- Backported for 12.2
[Bug tree-optimization/106514] New: [12/13 Regression] ranger slowness in path query
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514 Bug ID: 106514 Summary: [12/13 Regression] ranger slowness in path query Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: rguenth at gcc dot gnu.org Target Milestone: --- When you bump --param max-jump-thread-duplication-stmts ever so slightly, like by a factor of two to 30, making the effective backwards threading limit 15, the gcc.dg/pr69592.c -O2 compile-time explodes and you'll see backwards jump threading : 9.17 ( 92%) 0.01 ( 20%) 9.18 ( 92%) 24 ( 0%) note the testcase is somewhat degenerate with a series of diamonds also "nicely" showing the backwards threader exponential behavior in exploring the threading path space (plus on top the quadraticness with starting on every condition). The current effective limit of 7 copied stmts limits the effective thread length to a single diamond, avoiding the issue. perf shows you Samples: 143K of event 'cycles:u', Event count (approx.): 127516678963 Overhead Samples Command Shared Object Symbol 24.36% 34962 cc1 cc1 [.] bitmap_bit_p 18.78% 26962 cc1 cc1 [.] bitmap_list_find_element 14.98% 21505 cc1 cc1 [.] path_oracle::killing_def 1.94% 2791 cc1 cc1 [.] path_range_query::compute_ranges_in_block so it's not the exponential search space per se but the high overhead of ranger, specifically the relation oracle which seems to be unbounded. path_range_query::range_defined_in_block calling path_oracle::killing_def 87 000 times gets you 600 000 000 bitmap_bit_p queries (resulting in 10 billion(!) bitmap list walk steps). Part of the sub-optimality is probably the equiv chain becoming very long (can we simply limit that?) and clearing bits in all the very many bitmaps linked. 
Not to mention that linked lists (for relations and equivalences) are hardly a good data structure for anything but inserts/removals :/ The bitmap (on the list) + linked list combos should probably all be replaced with splay trees. There's (unused) splay-tree-utils.h that seems to be splay tree "adaptors" on top of something with links, but libiberty splay-trees of course work as well. I think it's worth optimizing for a small number of elements, thus favoring a balanced tree over hashing. Note for the testcase at hand it's walking of the m_equiv list, not the m_relations one, so it might be a bit more difficult to fix this than if the issue were the m_relations chain. I'm also seeing missed micro-optimizations like // Walk the equivalency list and remove SSA from any equivalencies. if (bitmap_bit_p (m_equiv.m_names, v)) ... else bitmap_set_bit (m_equiv.m_names, v); which can be written as if (!bitmap_set_bit (m_equiv.m_names, v)) likewise for bitmap_clear_bit. Both return whether the bit changed with the operation. Of course that's just a constant factor; the issue here is complexity involving the linear list walks.
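The test-then-set micro-optimization mentioned at the end of the report can be illustrated with a small stand-in. The `Bitmap` class below is hypothetical and is not GCC's bitmap; it models only the contract stated in the report, namely that setting a bit returns whether the bit changed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for GCC's bitmap; only the "returns whether the
// bit changed" contract is modelled.
class Bitmap {
public:
    explicit Bitmap(std::size_t size) : bits_(size, false) {}

    bool test(std::size_t i) const { return bits_[i]; }

    // Sets bit i and returns true if it was previously clear.
    bool set_bit(std::size_t i)
    {
        bool changed = !bits_[i];
        bits_[i] = true;
        return changed;
    }

private:
    std::vector<bool> bits_;
};

// Two lookups: test first, then set.
bool was_already_set_two_step(Bitmap& b, std::size_t v)
{
    if (b.test(v))
        return true;
    b.set_bit(v);
    return false;
}

// One lookup: set and use the return value.
bool was_already_set_one_step(Bitmap& b, std::size_t v)
{
    return !b.set_bit(v);
}
```

Both functions leave the bitmap in the same state and return the same answer; the second simply avoids the extra lookup, which matters when each lookup walks a linked list.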
[Bug tree-optimization/106514] [12/13 Regression] ranger slowness in path query
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514 Richard Biener changed: What|Removed |Added Target Milestone|--- |12.2 CC||amacleod at redhat dot com Keywords||compile-time-hog
[Bug testsuite/106515] New: [13 regression] gcc.dg/debug/btf/btf-int-1.c fails after r13-1937-g5df04a7aa837a1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106515 Bug ID: 106515 Summary: [13 regression] gcc.dg/debug/btf/btf-int-1.c fails after r13-1937-g5df04a7aa837a1 Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: testsuite Assignee: unassigned at gcc dot gnu.org Reporter: seurer at gcc dot gnu.org Target Milestone: --- g:5df04a7aa837a13b0e14d269c37bd3871d86bf08, r13-1937-g5df04a7aa837a1 make -k check-gcc RUNTESTFLAGS="btf.exp=gcc.dg/debug/btf/btf-int-1.c" FAIL: gcc.dg/debug/btf/btf-int-1.c scan-assembler-times [\t ]0x..[\t ]+[^\n]*bti_encoding 3 # of expected passes 13 # of unexpected failures 1 commit 5df04a7aa837a13b0e14d269c37bd3871d86bf08 (HEAD, refs/bisect/bad) Author: Jose E. Marchesi Date: Fri Jul 22 12:40:50 2022 +0200 btf: do not use the CHAR `encoding' bit for BTF
[Bug testsuite/106515] [13 regression] gcc.dg/debug/btf/btf-int-1.c fails after r13-1937-g5df04a7aa837a1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106515 Jose E. Marchesi changed: What|Removed |Added CC||jose.marchesi at oracle dot com --- Comment #1 from Jose E. Marchesi --- Hello. Thanks for reporting this. Could you please attach the $top_builddir/gcc/testsuite/gcc/gcc.log file you get after running the testsuite? Thanks.
[Bug testsuite/106515] [13 regression] gcc.dg/debug/btf/btf-int-1.c fails after r13-1937-g5df04a7aa837a1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106515 --- Comment #2 from Jose E. Marchesi --- Don't bother, I just reproduced the issue on powerpc64le-linux-gnu.
[Bug testsuite/106515] [13 regression] gcc.dg/debug/btf/btf-int-1.c fails after r13-1937-g5df04a7aa837a1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106515 --- Comment #3 from Jose E. Marchesi --- This is due to not-so-good regular expressions in the test btf-int-1.c and to the slightly different way the powerpc backend comments lines in assembly. Working on a fix.
[Bug testsuite/106516] New: New test case gcc.dg/pr104992.c fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106516 Bug ID: 106516 Summary: New test case gcc.dg/pr104992.c fails Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: testsuite Assignee: unassigned at gcc dot gnu.org Reporter: seurer at gcc dot gnu.org Target Milestone: --- g:388fbbd895e72669909173c3003ae65c6483a3c2, r13-1916-g388fbbd895e726 I only saw this on a power 10 machine. make -k check-gcc RUNTESTFLAGS="dg.exp=gcc.dg/pr104992.c" FAIL: gcc.dg/pr104992.c scan-tree-dump-times optimized " % " 9 # of expected passes 1 # of unexpected failures 1 commit 388fbbd895e72669909173c3003ae65c6483a3c2 (HEAD, refs/bisect/bad) Author: Sam Feifer Date: Fri Jul 29 09:44:48 2022 -0400 match.pd: Add new division pattern [PR104992] Executing on host: /home/seurer/gcc/git/build/gcc-test/gcc/xgcc -B/home/seurer/gcc/git/build/gcc-test/gcc/ /home/seurer/gcc/git/gcc-test/gcc/testsuite/gcc.dg/pr104992.c -fdiagnostics-plain-output -O2 -fdump-tree-optimized -S -o pr104992.s (timeout = 300) spawn -ignore SIGHUP /home/seurer/gcc/git/build/gcc-test/gcc/xgcc -B/home/seurer/gcc/git/build/gcc-test/gcc/ /home/seurer/gcc/git/gcc-test/gcc/testsuite/gcc.dg/pr104992.c -fdiagnostics-plain-output -O2 -fdump-tree-optimized -S -o pr104992.s PASS: gcc.dg/pr104992.c (test for excess errors) gcc.dg/pr104992.c: pattern found 6 times FAIL: gcc.dg/pr104992.c scan-tree-dump-times optimized " % " 9
[Bug testsuite/106516] New test case gcc.dg/pr104992.c fails on power 10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106516 seurer at gcc dot gnu.org changed: What|Removed |Added CC||bergner at gcc dot gnu.org Summary|New test case gcc.dg/pr104992.c fails |New test case gcc.dg/pr104992.c fails on power 10 --- Comment #1 from seurer at gcc dot gnu.org --- I should have said it ONLY fails on power 10. Works fine on power 9 and earlier. I can't find a valid email address for Sam Feifer to use for bugzilla.
[Bug tree-optimization/104992] [missed optimization] x / y * y == x not optimized to x % y == 0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104992 seurer at gcc dot gnu.org changed: What|Removed |Added CC||seurer at gcc dot gnu.org --- Comment #3 from seurer at gcc dot gnu.org --- The fix for this causes an issue on power 10. See PR106516
[Bug tree-optimization/106514] [12/13 Regression] ranger slowness in path query
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514 --- Comment #1 from Andrew Macleod --- (In reply to Richard Biener from comment #0) > > Part of the sub-optimality is probably the equiv chain becoming very long > (can we simply limit that?) and clearing bits in all the very many > bitmaps linked. We certainly could. Especially in the path version which has that killing-def issue which doesn't exist in the normal oracle. The path oracle basically takes a normal oracle, then bolts the path following code onto it, and has to deal with new definitions invalidating any existing equivalences. I'd first look for inefficiencies elsewhere as we didn't spend a lot of time tweaking it once it was working. Given the way equivalences have to be matched, I'm not sure that we even need to walk the list. The new equivalence set for the killed def will only contain itself, or any new equivalences encountered since the kill. In order to be equivalent, 2 names must be in each other's set, which they won't be. I'm not convinced we need to remove them at all. I'm also not sure why the path oracle changes the root oracle requirement that they be the same equivalence set, not just in each other's. I think it has something to do with the transitory nature of the path equivalence/relations vs the root oracle's "permanent" sets. I think we can do better here too. And finally, Aldy has a list of all the ssa-names in the path that are relevant to the calculations in the path. I suspect we can reduce any equivalence sets immediately to just those names, as any on-entry ranges should reflect existing equivalences. in theory :-) We'll see if any or all of those have any effect and get back to you.
[Bug testsuite/106515] [13 regression] gcc.dg/debug/btf/btf-int-1.c fails after r13-1937-g5df04a7aa837a1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106515 --- Comment #4 from CVS Commits --- The master branch has been updated by Jose E. Marchesi : https://gcc.gnu.org/g:f0688c82ba8206a3d8960eb1d4821dc6a5f2a9f4 commit r13-1951-gf0688c82ba8206a3d8960eb1d4821dc6a5f2a9f4 Author: Jose E. Marchesi Date: Wed Aug 3 18:50:05 2022 +0200 testsuite: btf: fix regexps in btf-int-1.c The regexps in the test btf-int-1.c were not working properly with the commenting style of at least one target: powerpc64le-linux-gnu. This patch changes the test to use better regexps. Tested in bpf-unknown-none, x86_64-linux-gnu and powerpc64le-linux-gnu. Pushed to master as obvious. gcc/testsuite/ChangeLog: PR testsuite/106515 * gcc.dg/debug/btf/btf-int-1.c: Fix regexps in scan-assembler-times.
[Bug target/91674] [ARM/thumb] redundant memcpy does not get optimized away on thumb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91674 Richard Earnshaw changed: What|Removed |Added Resolution|--- |DUPLICATE Status|UNCONFIRMED |RESOLVED --- Comment #1 from Richard Earnshaw --- This is essentially a dup of PR105090, which is now fixed on master. The code generation in both Arm and Thumb2 state is essentially the same early on, but in Thumb we were unable to optimize away all the byte manipulations. The unused stack slot was needed at the time of early expansion to RTL and once created there's no mechanism for getting rid of it if it is no longer needed. *** This bug has been marked as a duplicate of bug 105090 ***
[Bug target/105090] BFI instructions are not generated on arm-none-eabi-g++
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105090 Richard Earnshaw changed: What|Removed |Added CC||andij.cr at gmail dot com --- Comment #7 from Richard Earnshaw --- *** Bug 91674 has been marked as a duplicate of this bug. ***
[Bug target/105090] BFI instructions are not generated on arm-none-eabi-g++
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105090 Richard Earnshaw changed: What|Removed |Added Target Milestone|--- |13.0
[Bug target/106517] New: RISC-V: Inefficient Generated Code for Floating Point to Integer Rounds
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106517 Bug ID: 106517 Summary: RISC-V: Inefficient Generated Code for Floating Point to Integer Rounds Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: palmer at gcc dot gnu.org Target Milestone: --- RISC-V has a handful of floating-point conversion instructions that we don't appear to be taking advantage of. For example long f(double in) { return __builtin_floor(in); } generates a call to the floor() library routine, while I believe we can implement it via just a "fcvt.l.d a0, fa0, rdn" (RISC-V clang and arm64 GCC). There are a bunch of similar patterns; the aarch64 test suite seems to have pretty good coverage of them. We should port those tests over to RISC-V, figure out which conversions we can implement directly, and then fix whatever's broken. I started poking around a bit and found that even some of the conversions where we have MD file patterns aren't behaving as expected, so there might be some deeper issue going on. This has come up in a handful of forums lately and while we're still hoping to find some time to look into it, I figured it'd be best to open at least a basic bug so at least we can have one place to track the issues.
[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069 --- Comment #27 from Segher Boessenkool --- IMO what vec_select calls element 0 is always in the first argument of the vec_concat it works on, in BE as well as LE. But yes this is quite underdefined in our documentation, and who knows what is actually implemented, in targets as well as in generic code :-(
[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069 --- Comment #28 from Segher Boessenkool --- (In reply to rsand...@gcc.gnu.org from comment #25) > - On big-endian targets, vector loads and stores are assumed to put the > first memory element at the most significant end of the vector register. I agree with everything here, except calling this "most significant". That just makes no sense for vectors. It is element 0, but that is not more significant than any other element :-) Vectors aren't integers.
[Bug middle-end/25521] change semantics of const volatile variables
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=25521 --- Comment #9 from Jose E. Marchesi --- So I got feedback from the clang/llvm folks on this. As you can see in [1] they asked the WG14 reflectors about the footnote 135 in the C18 spec and their conclusion is that there is no normative objection to place `const volatile' variables in read-only sections, much like non-volatile consts. This matches my earlier impression (before I got pointed to that footnote) and since there is at least one target being impacted by this GCC/LLVM discrepancy (bpf-unknown-none) I intend to prepare a patch to change the place where GCC places the `const volatiles'. [1] https://github.com/llvm/llvm-project/issues/56468
[Bug target/99888] Add powerpc ELFv2 support for -fpatchable-function-entry*
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99888 --- Comment #3 from Segher Boessenkool --- Your second option isn't correct: all these nops should be consecutive. Your option 1 is fine :-)
[Bug tree-optimization/106514] [12/13 Regression] ranger slowness in path query
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514 --- Comment #2 from CVS Commits --- The master branch has been updated by Andrew Macleod : https://gcc.gnu.org/g:19ffb35d17474bb4dd3eb78963c28d10b5362321 commit r13-1952-g19ffb35d17474bb4dd3eb78963c28d10b5362321 Author: Andrew MacLeod Date: Wed Aug 3 13:55:42 2022 -0400 Do not walk equivalence set in path_oracle::killing_def. When killing a def in the path ranger, there is no need to walk the set of existing equivalences clearing bits. An equivalence match requires that both ssa-names have to be in each other's set. As killing_def creates a new empty set containing only the current def, it already ensures false equivalences won't happen. PR tree-optimization/106514 * value-relation.cc (path_oracle::killing_def): Do not walk the equivalence set clearing bits.
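The reasoning in this commit message can be modelled with a toy data structure. Everything below is a hypothetical sketch, not the representation used in value-relation.cc: names are plain integers, and each name maps to the set of names it is believed equivalent to.

```cpp
#include <map>
#include <set>

// Hypothetical model, not the value-relation.cc data structures.
using Name = int;
using EquivSets = std::map<Name, std::set<Name>>;

// Two names are equivalent only if each appears in the other's set.
bool equivalent(const EquivSets& sets, Name a, Name b)
{
    auto ia = sets.find(a);
    auto ib = sets.find(b);
    if (ia == sets.end() || ib == sets.end())
        return false;
    return ia->second.count(b) != 0 && ib->second.count(a) != 0;
}

// Killing a definition replaces its set with one containing only itself.
// Other sets that still mention the name are deliberately left untouched.
void kill_def(EquivSets& sets, Name def)
{
    sets[def] = {def};
}
```

Because the check is mutual, a stale mention of the killed name in another set cannot produce a false equivalence, which is why no walk over the existing sets is needed in this model.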
[Bug rtl-optimization/106518] New: Exchange/swap aware register allocation (generate xchg in reload)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106518 Bug ID: 106518 Summary: Exchange/swap aware register allocation (generate xchg in reload) Product: gcc Version: unknown Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: roger at nextmovesoftware dot com Target Milestone: --- This enhancement request is a proposal for improving/tweaking GCC's register allocation, by assuming/making use of a register exchange/swap operation as a useful abstraction. Currently reload/lra is (solely) "move"-based, so when the contents of regA need to be placed in regB and the original contents of regB need to be placed in regA, they make use of a temporary register (or a spill) and generate the classic sequence: tmp=regA; regA=regB; regB=tmp. A small improvement is to tweak register allocation to assume, as a higher level abstraction, the existence of an exchange/swap instruction, like x86's xchg, much like is assumed/used during the reg-stack pass (with i387's fxch). [https://gcc.gnu.org/legacy-ml/gcc-patches/2004-12/msg00815.html] During early register allocation, we introduce virtual exchange operations that can be lowered in a later pass, either to real exchange operations on targets that support them, or to the standard three-move shuffle sequence above, if there's a spare suitable temporary register, or alternatively to the sequence regA^=regB; regB^=regA; regA^=regB, which implements an exchange using three fast instructions without requiring an additional register. These three alternatives guarantee that register allocation is no worse than current, but has the flexibility to use fewer registers and perhaps fewer instructions. On modern hardware, xchg is sometimes zero latency (using register renaming), and on older architectures, a three xor sequence has the same latency as three moves, but requires one less register, helpfully reducing register pressure. 
An example application/benefit of this is PR rtl-optimization/97756, which demonstrates that the x86_64 ABI frequently places (TImode double word) registers in locations that then need the high and low parts to be swapped (or moved) to place them in the (reg X) and (reg X+1) locations required by GCC's multi-word register allocation requirements. Interestingly, GCC's middle-end doesn't have a standard named pattern for an exchange/swap instruction, i.e. an optab, so currently it has no (easy) way of deciding whether a target has an xchg-like instruction, which helps explain why it doesn't currently use/generate them.
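Two of the lowerings described in this report, the three-move shuffle and the three-XOR sequence, can be written out in C++ for comparison. The function names are hypothetical, and ordinary 64-bit variables stand in for hardware registers.

```cpp
#include <cstdint>

// Classic three-move shuffle: needs a temporary.
inline void swap_with_temp(std::uint64_t& reg_a, std::uint64_t& reg_b)
{
    std::uint64_t tmp = reg_a;
    reg_a = reg_b;
    reg_b = tmp;
}

// Three-XOR sequence from the report: no temporary, but the operands
// must be distinct locations (true for two different registers).
inline void swap_with_xor(std::uint64_t& reg_a, std::uint64_t& reg_b)
{
    reg_a ^= reg_b;  // reg_a = a ^ b
    reg_b ^= reg_a;  // reg_b = b ^ (a ^ b) = a
    reg_a ^= reg_b;  // reg_a = (a ^ b) ^ a = b
}
```

If both references named the same location, the first XOR would zero it, so the XOR form is only a valid lowering when the two operands are known to be different registers.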
[Bug middle-end/106519] New: [13 Regression] internal compiler error: in gimple_phi_arg, at gimple.h:4594 by r13-1950
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106519 Bug ID: 106519 Summary: [13 Regression] internal compiler error: in gimple_phi_arg, at gimple.h:4594 by r13-1950 Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: hjl.tools at gmail dot com CC: tamar.christina at arm dot com Target Milestone: --- On x86-64, r13-1950 caused FAIL: gcc.dg/analyzer/pr96653.c (internal compiler error: in gimple_phi_arg, at gimple.h:4594) FAIL: gcc.dg/analyzer/pr96653.c (test for excess errors) FAIL: g++.dg/warn/uninit-pr105562.C -std=gnu++14 (internal compiler error: in gimple_phi_arg, at gimple.h:4594) FAIL: g++.dg/warn/uninit-pr105562.C -std=gnu++14 (test for excess errors) FAIL: g++.dg/warn/uninit-pr105562.C -std=gnu++17 (internal compiler error: in gimple_phi_arg, at gimple.h:4594) FAIL: g++.dg/warn/uninit-pr105562.C -std=gnu++17 (test for excess errors) FAIL: g++.dg/warn/uninit-pr105562.C -std=gnu++20 (internal compiler error: in gimple_phi_arg, at gimple.h:4594) FAIL: g++.dg/warn/uninit-pr105562.C -std=gnu++20 (test for excess errors) FAIL: gfortran.dg/make_unit.f90 -O1 (internal compiler error: in gimple_phi_arg, at gimple.h:4594) FAIL: gfortran.dg/make_unit.f90 -O1 (internal compiler error: in gimple_phi_arg, at gimple.h:4594) FAIL: gfortran.dg/make_unit.f90 -O1 (internal compiler error: in gimple_phi_arg, at gimple.h:4594) FAIL: gfortran.dg/make_unit.f90 -O1 (test for excess errors) FAIL: gfortran.dg/make_unit.f90 -O1 (test for excess errors) FAIL: gfortran.dg/make_unit.f90 -O1 (test for excess errors) with -m32.
[Bug middle-end/106519] [13 Regression] internal compiler error: in gimple_phi_arg, at gimple.h:4594 by r13-1950
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106519 seurer at gcc dot gnu.org changed: What|Removed |Added CC||seurer at gcc dot gnu.org --- Comment #1 from seurer at gcc dot gnu.org --- Seeing this on powerpc64 as well for FAIL: gcc.dg/torture/pr61346.c -O1 (internal compiler error: in gimple_phi_arg, at gimple.h:4594) FAIL: gcc.dg/torture/pr61346.c -O1 (test for excess errors) FAIL: gfortran.dg/make_unit.f90 -O1 (internal compiler error: in gimple_phi_arg, at gimple.h:4594) FAIL: gfortran.dg/make_unit.f90 -O1 (test for excess errors)
[Bug c++/106520] New: 2+ index expressions in build_op_subscript are incorrectly interpreted as comma expression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106520 Bug ID: 106520 Summary: 2+ index expressions in build_op_subscript are incorrectly interpreted as comma expression Product: gcc Version: 12.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: mkretz at gcc dot gnu.org Target Milestone: --- Commit b38c9cf6d570f6c4c1109e00c8b81d82d0f24df3 implemented Multidimensional subscript operator [PR102611]. However, the backwards compatibility leads to surprising results. E.g.: struct A { void operator[](unsigned); void operator[](unsigned, unsigned); }; struct B { explicit operator unsigned() const; }; void f(A a, B b) { a[1]; a[b, 2]; } Compiles to two calls to A::operator[](unsigned) with the following diagnostics: : In function 'void f(A, B)': :15:4: warning: top-level comma expression in array subscript changed meaning in C++23 [-Wcomma-subscript] 15 | a[b, 2]; |^ [https://godbolt.org/z/f6vf3x5Gv] The user probably intended to call the two-index subscript overload. But there's no indication why the call failed. The warning is probably puzzling to most users. It's probably not obvious to most users that the "wrong" function gets called. I'm not sure the compatibility issue is worth it. I think it would be better to call build_op_subscript with unmodified complain and let code that turns on -std=c++23 break if it relies on comma expressions in subscripts.
[Bug tree-optimization/106521] New: ICE at -O1 with "-floop-unroll-and-jam --param unroll-jam-min-percent=0": verify_ssa failed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106521

            Bug ID: 106521
           Summary: ICE at -O1 with "-floop-unroll-and-jam --param unroll-jam-min-percent=0": verify_ssa failed
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zhendong.su at inf dot ethz.ch
  Target Milestone: ---

It appears to be a recent regression (and possibly related to PR106249).

Compiler Explorer: https://godbolt.org/z/Tanf9axav

[545] % gcctk -v
Using built-in specs.
COLLECT_GCC=gcctk
COLLECT_LTO_WRAPPER=/local/suz-local/software/local/gcc-trunk/libexec/gcc/x86_64-pc-linux-gnu/13.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-trunk/configure --disable-bootstrap --enable-checking=yes --prefix=/local/suz-local/software/local/gcc-trunk --enable-sanitizers --enable-languages=c,c++ --disable-werror --enable-multilib --with-system-zlib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 13.0.0 20220803 (experimental) [master r13-1950-g9bb19e143cf] (GCC)
[546] %
[546] % gcctk -O1 -floop-unroll-and-jam --param unroll-jam-min-percent=0 small.c
small.c: In function ‘main’:
small.c:4:5: error: definition in block 30 does not dominate use in block 33
    4 | int main() {
      |     ^~~~
for SSA_NAME: b_lsm.15_82 in statement:
b_lsm.15_23 = PHI
PHI argument
b_lsm.15_82
for PHI node
b_lsm.15_23 = PHI
during GIMPLE pass: unrolljam
small.c:4:5: internal compiler error: verify_ssa failed
0x11356ef verify_ssa(bool, bool)
        ../../gcc-trunk/gcc/tree-ssa.cc:1211
0x105fb5b rewrite_into_loop_closed_ssa_1
        ../../gcc-trunk/gcc/tree-ssa-loop-manip.cc:576
0x105fb5b rewrite_into_loop_closed_ssa(bitmap_head*, unsigned int)
        ../../gcc-trunk/gcc/tree-ssa-loop-manip.cc:626
0x1c94c2d tree_loop_unroll_and_jam
        ../../gcc-trunk/gcc/gimple-loop-jam.cc:612
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
[547] %
[547] % cat small.c
short a, b, e;
volatile long c;
long d;
int main() {
  for (; d; d++) {
    long g = a = 1;
    for (; a; a++) {
      g++;
      c;
    }
    g && (b = e);
  }
  return 0;
}
[Bug middle-end/106519] [13 Regression] internal compiler error: in gimple_phi_arg, at gimple.h:4594 by r13-1950
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106519

Tamar Christina changed:

           What    |Removed                       |Added
                 CC|                              |tnfchris at gcc dot gnu.org
             Status|UNCONFIRMED                   |NEW
   Last reconfirmed|                              |2022-08-03
           Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org
     Ever confirmed|0                             |1

--- Comment #2 from Tamar Christina ---
The condition checks that the two BBs share the same successor but forgot to check that both BBs have only one successor. It looks like with -m32 (and powerpc) the order of the edges just happens to match and the assert triggers.

Testing a patch overnight and will post tomorrow.
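The missing check described in comment #2 can be illustrated with a small model. This is only a sketch under stated assumptions, not the GCC code: `Block`, `share_first_successor`, `share_single_successor` and `edge_order_demo` are hypothetical names, and the "loose" form is a guess at what a check that ignores the successor count might look like.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified basic block: only its successor edges are modeled.
struct Block {
  std::vector<const Block*> succs;
};

// Loose form (assumed shape of the incomplete check): compares only the
// first successor of each block and ignores how many successors exist.
bool share_first_successor(const Block& a, const Block& b) {
  return !a.succs.empty() && !b.succs.empty() && a.succs[0] == b.succs[0];
}

// Stricter form: both blocks must have exactly one successor, and it must
// be the same block.
bool share_single_successor(const Block& a, const Block& b) {
  return a.succs.size() == 1 && b.succs.size() == 1 &&
         a.succs[0] == b.succs[0];
}

struct Demo {
  bool loose;
  bool strict;
};

// Builds a case where the edge order happens to line up: b has two
// successors, but its first one is the same as a's only successor.
Demo edge_order_demo() {
  Block join, other;
  Block a, b;
  a.succs.push_back(&join);
  b.succs.push_back(&join);
  b.succs.push_back(&other);
  return {share_first_successor(a, b), share_single_successor(a, b)};
}
```

In this model the loose check accepts the pair only because of the order in which `b`'s edges were added, while the stricter check rejects it regardless of edge order.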
[Bug bootstrap/43301] top-level configure script ignores ---with-build-time-tools
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43301

Eric Gallager changed:

           What    |Removed                     |Added
                 CC|                            |aoliva at gcc dot gnu.org

--- Comment #7 from Eric Gallager ---
(In reply to Iain Sandoe from comment #6)
> JFTR, I had cause to use this today on powerpc-darwin9, and it seemed to
> DTRT - so it would be useful to establish what it was that did not work
> before, that was fixed by the patch.
>
> /src-local/gcc-git-11/configure
> --prefix=/opt/iains/powerpc-apple-darwin9/gcc-11-3Dr2d
> --build=powerpc-apple-darwin9 --enable-languages=all --with-tune-cpu=G5
> --enable-libphobos --with-libphobos-druntime-only
> CC=powerpc-apple-darwin-gcc CXX=powerpc-apple-darwin-g++
> --with-build-time-tools=/opt/iains/powerpc-apple-darwin9/gcc-11-3Dr2d/bin
>
> Without the
> "--with-build-time-tools=/opt/iains/powerpc-apple-darwin9/gcc-11-3Dr2d/bin"
> the system linker and assembler are found and used (which fails to work with
> D, causing a bootstrap fail) with the option, the relevant tools are found
> and bootstrap succeeded
>
> (so I am not sure what the original problem was
> since $build is not specified in the summary, I guess we must assume it was
> i686-pc-cygwin so perhaps the problem is specific to that setup?)

Alexandre Oliva's assessment is that the issue was just one of having an old build left over, and that all the patch did was to force a rebuild:
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599169.html
(so we might be going with his patch instead)
[Bug c++/106502] Three calls to __attribute__((const)) function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106502

Eric Gallager changed:

           What    |Removed                     |Added
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18487
                 CC|                            |egallager at gcc dot gnu.org

--- Comment #6 from Eric Gallager ---
(In reply to Jonathan Wakely from comment #5)
> I noticed this by adding a printf statement to the const function for
> temporary debugging purposes, which is obviously incorrect

Seems related to bug 18487 IMO.
[Bug testsuite/106516] New test case gcc.dg/pr104992.c fails on power 10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106516

Kewen Lin changed:

           What    |Removed                     |Added
     Ever confirmed|0                           |1
                 CC|                            |linkw at gcc dot gnu.org
   Last reconfirmed|                            |2022-08-04
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Kewen Lin ---
Confirmed, this is a test issue, specific to power10 and up.

The difference comes from the function thud, which aims to test that the pattern works for vector types. Power10 starts to support the insn vmodsw for vector integer mod. So it gets:

vector(4) int thud (vector(4) int x, vector(4) int y)
  _1 = x_3(D) % y_4(D);
  _2 = _1 == { 0, 0, 0, 0 };

instead of

  [local count: 1073741824]:
  _7 = BIT_FIELD_REF ;
  _8 = BIT_FIELD_REF ;
  _9 = _7 % _8;
  _10 = BIT_FIELD_REF ;
  _11 = BIT_FIELD_REF ;
  _12 = _10 % _11;
  _13 = BIT_FIELD_REF ;
  _14 = BIT_FIELD_REF ;
  _15 = _13 % _14;
  _16 = BIT_FIELD_REF ;
  _17 = BIT_FIELD_REF ;
  _18 = _16 % _17;
  _1 = {_9, _12, _15, _18};
  _2 = _1 == { 0, 0, 0, 0 };

We can adjust the test case to expect 6 times "%" on target power10_ok specifically, but I wonder whether this failure also shows up on other targets that support vector mod; if so, one overall complete guard would be better.
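The two dumps above differ only in whether the modulo stays a single vector operation or is split per lane. A rough source-level sketch of both shapes follows, assuming the GCC vector extension (`vector_size`). The function names and input values are hypothetical, and this is not the contents of gcc.dg/pr104992.c.

```cpp
#include <cassert>

// GCC vector extension: four 32-bit ints in one 16-byte vector.
typedef int v4si __attribute__((vector_size(16)));

// Whole-vector form, mirroring "_1 = x % y; _2 = _1 == { 0, 0, 0, 0 }".
v4si mod_is_zero_vector(v4si x, v4si y) {
  v4si zero = {0, 0, 0, 0};
  return (x % y) == zero;  // each lane is -1 (true) or 0 (false)
}

// Lane-by-lane form, mirroring the lowered dump with four scalar "%".
v4si mod_is_zero_lanes(v4si x, v4si y) {
  v4si r;
  for (int i = 0; i < 4; ++i)
    r[i] = (x[i] % y[i] == 0) ? -1 : 0;
  return r;
}

// Checks that both forms produce the same mask for one sample input.
bool forms_agree() {
  v4si x = {8, 9, 10, -12};
  v4si y = {4, 4, 5, 5};
  v4si a = mod_is_zero_vector(x, y);
  v4si b = mod_is_zero_lanes(x, y);
  for (int i = 0; i < 4; ++i)
    if (a[i] != b[i]) return false;
  return true;
}
```

Both functions compute the same mask; which shape shows up in the optimized dump depends on whether the target provides a vector modulo instruction, which is why a count of "%" in the dump is target-sensitive.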
[Bug c++/106502] Three calls to __attribute__((const)) function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106502

--- Comment #7 from Jonathan Wakely ---
The way I found the bug might be, but the bug itself is nothing to do with that.