[Bug target/120941] [16 Regression] 10-40% slowdown of 519.lbm_r on Zen{2,3} since r16-1644-gaba3b9d3a48a07
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941 --- Comment #5 from Haochen Jiang --- I have tried on Zen3 Client machine with -Ofast -flto -march=native/znver2 and see no regression on both options before and after the commit.
[Bug tree-optimization/120358] [15/16 regression] qtbase-6.9.0 miscompiled since r15-580-gf3e5f4c58591f5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120358 --- Comment #26 from Sam James --- I'm going to try to clean it up with my poor C++, as I can't follow it at all right now.
[Bug target/120943] [16 Regression] 5% slowdown of 527.cam4_r on Zen{4,5} since r16-1643-gd073bb6cfc219d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120943 H.J. Lu changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |DUPLICATE --- Comment #2 from H.J. Lu --- *** This bug has been marked as a duplicate of bug 120683 ***
[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163 Bug 26163 depends on bug 120943, which changed state. Bug 120943 Summary: [16 Regression] 5% slowdown of 527.cam4_r on Zen{4,5} since r16-1643-gd073bb6cfc219d https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120943 What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |DUPLICATE
[Bug target/120683] vector_loop/unrolled_loop generates poor codes on memset/memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120683 H.J. Lu changed: What|Removed |Added CC||pheeck at gcc dot gnu.org --- Comment #4 from H.J. Lu --- *** Bug 120943 has been marked as a duplicate of this bug. ***
[Bug ipa/104457] ipa-cp with autofdo: internal compiler error in update_specialized_profile, at ipa-cp.c:4422
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104457 Jan Hubicka changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |FIXED CC||hubicka at gcc dot gnu.org --- Comment #3 from Jan Hubicka --- I believe update_specialized_profile should now be safe WRT ICE on contradicting profiles. I can build SPEC on x86 reliably (and we now run daily testing at LNT https://lnt.opensuse.org/db_default/v4/SPEC/recent_activity as autofdo). I have no AARCH64 setup but hope at IPA level they are similar enough. Last fix was for PR119924
[Bug tree-optimization/120867] [metabug] AutoFDO issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120867 Bug 120867 depends on bug 104457, which changed state. Bug 104457 Summary: ipa-cp with autofdo: internal compiler error in update_specialized_profile, at ipa-cp.c:4422 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104457 What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |FIXED
[Bug sanitizer/120950] New: Incorrect codegen for ASan when -mpreferred-stack-boundary < 3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120950

Bug ID: 120950
Summary: Incorrect codegen for ASan when -mpreferred-stack-boundary < 3
Product: gcc
Version: 15.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: sanitizer
Assignee: unassigned at gcc dot gnu.org
Reporter: yshuiv7 at gmail dot com
CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org, jakub at gcc dot gnu.org, kcc at gcc dot gnu.org
Target Milestone: ---

The generated ASan instrumentation always assumes the stack pointer is aligned to at least 8 bytes, and the shadow address calculation will be all wrong if the stack is aligned to 4 bytes. This is normally masked by the use of fake stacks, which, because of how they are allocated, have at least 64-byte alignment. But this can be disabled with ASAN_OPTIONS=detect_stack_use_after_return=0.

Here is a repro:

#include <stdint.h>

struct T {
  uint16_t a;
  uint16_t b;
  uint32_t c;
};

int __attribute__((noinline)) g(struct T *__attribute__((unused)) value) {
  return 0;
}

int __attribute__((noinline)) f(int a) {
  struct T value = {a, a * 2, sizeof(a)};
  return g(&value);
}

int main() {
  volatile char pad[4]; // for me this pushes sp off 8 bytes alignment,
                        // but this can be tricky. maybe i can switch
                        // this to a little bit of inline assembly if
                        // this has trouble repro-ing
  f(10);
}

-- Build with:
i686-unknown-linux-gnu-gcc -mpreferred-stack-boundary=2 -fsanitize=address a.c

-- Run with:
ASAN_OPTIONS=detect_stack_use_after_return=0 ./a.out

-- Expected behavior:
Program runs fine.

-- Actual behavior:
Initialization of `struct T value` trips ASan, and it calls __asan_report_store4, which should not be called at all. Additionally, the stack frame magic is stored at an address misaligned with the shadow, so the ASan runtime can't find it and aborts.

-- Additional information:
Tried a similar setup on clang. It looks like clang always realigns the stack pointer to 32-byte alignment in the prologue.
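The alignment dependence described in this report can be illustrated with a small sketch. It assumes the commonly documented ASan layout in which one shadow byte covers 8 application bytes, and it ignores the platform-specific shadow offset; the names kShadowScale, granule_index, offset_in_granule and granules_spanned are hypothetical and not taken from GCC or the ASan runtime.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 8 application bytes map to one shadow byte (shadow scale 3).
constexpr unsigned kShadowScale = 3;

// Index of the 8-byte granule (and thus of the shadow byte) covering addr.
constexpr std::uint64_t granule_index(std::uint64_t addr)
{
  return addr >> kShadowScale;
}

// Position of addr inside its 8-byte granule.
constexpr std::uint64_t offset_in_granule(std::uint64_t addr)
{
  return addr & ((std::uint64_t{1} << kShadowScale) - 1);
}

// Number of shadow bytes touched by an object of `size` bytes at `addr`.
constexpr std::uint64_t granules_spanned(std::uint64_t addr, std::uint64_t size)
{
  return granule_index(addr + size - 1) - granule_index(addr) + 1;
}
```

Under these assumptions an 8-byte object such as `struct T` occupies exactly one granule when it starts on an 8-byte boundary, but straddles two granules when it starts at an address that is only 4-byte aligned, which is the situation that instrumentation assuming 8-byte alignment would not expect.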
[Bug c++/120878] [12/13/14/15/16 Regression] ICE: in adjust_temp_type, at cp/constexpr.cc:1791
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120878 Richard Biener changed: What|Removed |Added Priority|P3 |P4 Target Milestone|--- |12.5
[Bug libstdc++/120949] New: [16 regression] rejected with clang-20.1.7
Supported LTO compression algorithms: zlib zstd gcc version 16.0.0 20250704 (experimental) d6ed12ed5ebac8e50da9defea6af832039782cbf (Gentoo Hardened 16.0. p, commit 9fdf5a30ded9c691d9fcdb787e72f8dd0f111f8a)
[Bug libstdc++/120949] [16 regression] rejected with clang-20.1.7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120949 Sam James changed: What|Removed |Added CC||redi at gcc dot gnu.org Target Milestone|--- |16.0 --- Comment #1 from Sam James --- r16-1911-g6596f5ab746533
[Bug libstdc++/120949] [16 regression] rejected with clang-20.1.7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120949 --- Comment #2 from Sam James --- Created attachment 61798 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61798&action=edit readable.ii.xz
[Bug libstdc++/120949] [16 regression] rejected with clang-20.1.7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120949 --- Comment #3 from Sam James --- Not sure if missing something but I see a bunch of the changes in that commit do it right (?) (as mentioned in the commit message), just some don't follow that pattern.
[Bug libstdc++/120949] [16 regression] rejected with clang-20.1.7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120949 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #4 from Jakub Jelinek --- The old code was using _GLIBCXX_NODISCARD __attribute__((__always_inline__)) ordering while the new one __attribute__((__always_inline__)) _GLIBCXX_NODISCARD Perhaps only the former works with both compilers? Though __attribute__((always_inline)) [[nodiscard]] constexpr int foo () { return 42; } [[nodiscard]] __attribute__((always_inline)) constexpr int bar () { return 43; } struct S { template __attribute__((always_inline)) [[nodiscard]] friend inline int baz () { return 42; } template [[nodiscard]] __attribute__((always_inline)) friend inline int qux () { return 43; } }; compiles by both gcc 15.1 and clang trunk.
[Bug testsuite/120859] FAIL: gcc.dg/tree-prof/afdo-crossmodule-1b.c compilation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120859 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment #4 from Jan Hubicka --- Sorry, I meant to look into this - I originally used RUNTESTFLAGS=tree-prof.exp=afdo-crossmodule-1.c since that is intended to be the testcase and missed the part that the other file will be handled as an independent test. I wanted to find a way to build it just once and still get the dump files matched, but do not know how to do that. Your fix is fine. Thanks! Honza
[Bug target/120900] C++ passes user aligned struct differently from C
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120900 --- Comment #11 from H.J. Lu ---

(In reply to H.J. Lu from comment #10)
> This makes C similar to C++:
>
> diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
> index 8bbd6ebc66a..0da6c65fc6a 100644
> --- a/gcc/c/c-decl.cc
> +++ b/gcc/c/c-decl.cc
> @@ -5706,8 +5706,17 @@ start_decl (struct c_declarator *declarator, struct c_declspecs *declspecs,
>        && !flag_no_common)
>      DECL_COMMON (decl) = 1;
>
> +  /* Similar to C++, apply any attributes directly to the record or
> +     union type.  */
> +  int flags;
> +  if (TREE_CODE (decl) == TYPE_DECL
> +      && RECORD_OR_UNION_TYPE_P (TREE_TYPE (decl)))
> +    flags = ATTR_FLAG_TYPE_IN_PLACE;
> +  else
> +    flags = 0;
> +
>    /* Set attributes here so if duplicate decl, will have proper attributes.  */
> -  c_decl_attributes (&decl, attributes, 0);
> +  c_decl_attributes (&decl, attributes, flags);
>
>    /* Handle gnu_inline attribute.  */
>    if (declspecs->inline_p

I got the following regressions with the change above:

FAIL: c-c++-common/builtin-has-attribute-7.c -Wc++-compat (test for excess errors)
FAIL: gcc.dg/sso-4.c (test for errors, line 18)
FAIL: gcc.dg/sso-4.c (test for errors, line 19)
[Bug tree-optimization/120358] [15/16 regression] qtbase-6.9.0 miscompiled since r15-580-gf3e5f4c58591f5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120358 --- Comment #25 from Sam James --- (In reply to Sam James from comment #24) > @@ -1882,7 +1881,6 @@ IPA function summary for void ar::bc::operator++() > [with aq = QStringView]/7 >calls: Adding __attribute__((noipa)) on that template works.
[Bug tree-optimization/120948] Cannot detect potential division-by-zero when numerator is 1 and denominator is variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120948 Richard Biener changed: What|Removed |Added Component|c |tree-optimization Keywords||missed-optimization --- Comment #1 from Richard Biener --- So you say that we fail to optimize int foo (unsigned x) { unsigned tem = 1/x; if (x == 0) return 5; return tem; } because we turn 1/x into x == 1? A phase ordering issue, obviously. Relevant in practice? I'm not sure.
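The fold mentioned in this comment can be sketched as follows. For unsigned `x`, `1 / x` is 1 when `x == 1` and 0 for any larger value, so the two forms agree for every nonzero `x`; they differ only in that the folded form is also defined for `x == 0`, which is presumably why the later `x == 0` check is no longer recognised as dead once the fold has happened. The function names below are hypothetical and only serve the illustration.

```cpp
#include <cassert>

// Original form: undefined behaviour when x == 0.
unsigned one_over(unsigned x)
{
  return 1u / x;
}

// Folded form: defined for every x, including 0.
unsigned folded(unsigned x)
{
  return x == 1u;
}

// Checks that both forms agree on 1..limit (x == 0 is deliberately skipped).
bool agree_for_nonzero(unsigned limit)
{
  for (unsigned x = 1; x <= limit; ++x)
    if (one_over(x) != folded(x))
      return false;
  return true;
}
```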
[Bug tree-optimization/120358] [15/16 regression] qtbase-6.9.0 miscompiled since r15-580-gf3e5f4c58591f5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120358 --- Comment #24 from Sam James --- (In reply to Sam James from comment #11) > Created attachment 61749 [details] > small.cxx OK, on this, with a small adjustment to change the two ""s as args to char* str1, str2: --- a/small.cxx.057t.local-fnsummary2 +++ b/small.cxx.057t.local-fnsummary2 @@ -1795,7 +1795,6 @@ IPA function summary for ar::as ar::aw(ao) [with aq = QStringView]/104 i calls: long long int QtPrivate::ck(QStringView, long long int)/27 function not considered for inlining freq:0.41 loop depth: 0 size: 4 time: 13 callee size:12 stack: 0 predicate: (op1[offset: 32] >= 0) - op1 points to local or readonly memory struct as ar::aw (struct ar * const this, struct ao o) { @@ -1882,7 +1881,6 @@ IPA function summary for void ar::bc::operator++() [with aq = QStringView]/7 calls: long long int QtPrivate::ck(QStringView, long long int)/27 function not considered for inlining freq:0.41 loop depth: 0 size: 4 time: 13 callee size:12 stack: 0 predicate: (op0[ref offset: 256] >= 0) - op1 points to local or readonly memory void ar::bc::operator++ (struct bc * const this) { diff --git a/small.cxx.088i.fnsummary b/small.cxx.088i.fnsummary index 3b673e5..09d0885 100644 --- a/small.cxx.088i.fnsummary +++ b/small.cxx.088i.fnsummary @@ -233,7 +233,6 @@ IPA function summary for void ar::bc::operator++() [with aq = QStringView]/7 calls: long long int QtPrivate::ck(QStringView, long long int)/27 function not considered for inlining freq:0.41 loop depth: 0 size: 4 time: 13 predicate: (op0[ref offset: 256] >= 0) - op1 points to local or readonly memory Analyzing function: decltype (QtPrivate::dc{0, l, df::t ...}) df(aq, dd, de ...) 
[with aq = QString; dd = int; de = {am, }]/72 --- a/small.cxx.088i.fnsummary +++ b/small.cxx.088i.fnsummary @@ -233,7 +233,6 @@ IPA function summary for void ar::bc::operator++() [with aq = QStringView]/7 calls: long long int QtPrivate::ck(QStringView, long long int)/27 function not considered for inlining freq:0.41 loop depth: 0 size: 4 time: 13 predicate: (op0[ref offset: 256] >= 0) - op1 points to local or readonly memory Analyzing function: decltype (QtPrivate::dc{0, l, df::t ...}) df(aq, dd, de ...) [with aq = QString; dd = int; de = {am, }]/72 diff --git a/small.cxx.089i.inline b/small.cxx.089i.inline index c31c4f8..abe0280 100644 --- a/small.cxx.089i.inline +++ b/small.cxx.089i.inline @@ -182,7 +182,6 @@ IPA function summary for void ar::bc::operator++() [with aq = QStringView]/7 calls: long long int QtPrivate::ck(QStringView, long long int)/27 function not considered for inlining freq:0.41 loop depth: 0 size: 4 time: 13 callee size:12 stack: 0 predicate: (op0[ref offset: 256] >= 0) - op1 points to local or readonly memory IPA summary for bn< >::~bn() [with ag = QString]/68 is missing. IPA function summary for bn< >::~bn() [with ag = QString]/67 inlinable @@ -820,7 +819,7 @@ Updated mod-ref summary for int main()/52 Considering long long int QtPrivate::ck(QStringView, long long int)/27 with 24 size to be inlined into void ar::bc::operator++() [with aq = QStringView]/79 in small.cxx:177 Estimated badness is -0.15, frequency 3.36. - Parm map: -1 -5 + Parm map: -1 -1 Updated mod-ref summary for int main()/52 loads: Base 0: alias set 8
[Bug target/120941] [16 Regression] 10-40% slowdown of 519.lbm_r on Zen{2,3} since r16-1644-gaba3b9d3a48a07
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941 H.J. Lu changed: What|Removed |Added CC||haochen.jiang at intel dot com --- Comment #4 from H.J. Lu --- (In reply to Richard Biener from comment #3) > placing sth at the nearest common dominator can increase register pressure > and cause extra spilling? Haochen couldn't reproduce this regression on Zen3 client. Can someone help him reproduce it?
[Bug tree-optimization/120952] New: GCC fails to optimize division to shift when divisor is non-constant power of two
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120952

Bug ID: 120952
Summary: GCC fails to optimize division to shift when divisor is non-constant power of two
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: redi at gcc dot gnu.org
Target Milestone: ---

Given std::bit_ceil (which returns the smallest power of two not less than its argument), I hoped the following would be optimized to use a shift:

#include <bit>
using std::size_t;

size_t get_block_size(size_t bytes, size_t alignment)
{
  alignment = std::__bit_ceil(alignment);
  if (!std::__has_single_bit(alignment)) // definitely a power of two now!
    __builtin_unreachable();
  return ((bytes + alignment - 1) / alignment) * alignment;
}

On x86_64 GCC optimizes this to:

"get_block_size(unsigned long, unsigned long)":
        mov     rax, rdi
        mov     edi, 1
        cmp     rsi, 1
        jbe     .L2
        sub     rsi, 1
        bsr     rsi, rsi
        lea     ecx, [rsi+1]
        sal     rdi, cl
.L2:
        lea     rcx, [rdi-1+rax]
        xor     edx, edx
        mov     rax, rcx
        div     rdi
        mov     rax, rcx
        sub     rax, rdx
        ret

Whereas clang uses a shift:

get_block_size(unsigned long, unsigned long):
        mov     eax, 1
        cmp     rsi, 2
        jb      .LBB0_2
        dec     rsi
        mov     ecx, 127
        bsr     rcx, rsi
        xor     ecx, 63
        neg     cl
        shl     rax, cl
.LBB0_2:
        lea     rcx, [rdi + rax]
        dec     rcx
        neg     rax
        and     rax, rcx
        ret

If I remove the bit_ceil and just tell the compiler it's already a power of two, we avoid the branch but GCC still does not use a shift:

using size_t = decltype(sizeof(0));

size_t get_block_size(size_t bytes, size_t alignment)
{
  if (alignment & (alignment - 1))
    __builtin_unreachable();
  return ((bytes + alignment - 1) / alignment) * alignment;
}

GCC still uses DIV:

"get_block_size(unsigned long, unsigned long)":
        lea     rax, [rsi-1+rdi]
        xor     edx, edx
        div     rsi
        lea     rax, [rsi-1+rdi]
        sub     rax, rdx
        ret

And Clang still shifts:

get_block_size(unsigned long, unsigned long):
        lea     rax, [rdi + rsi]
        dec     rax
        rep bsf rcx, rsi
        shr     rax, cl
        imul    rax, rsi
        ret

On IRC Andrew P said that we don't track pop count on ssa names.
[Bug libstdc++/118681] [C++17] unsynchronized_pool_resource may fail to respect alignment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118681 Jonathan Wakely changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=120952 --- Comment #9 from Jonathan Wakely --- Thanks. I was already thinking we should do this first: alignment = std::bit_ceil(alignment); to be safe against callers that violate the precondition to pass a valid alignment, and then we know it's a power of two. And with your version, we're not affected by GCC's missed-optimization (filed as Bug 120952).
[Bug tree-optimization/120952] GCC fails to optimize division to shift when divisor is non-constant power of two
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120952 --- Comment #1 from Jonathan Wakely --- As noted in Bug 118681 comment 8, when we know it's a power of two we can do: return (bytes + alignment - 1) & ~(alignment - 1); to get even better codegen.
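The equivalence behind this suggestion can be checked with a minimal sketch, assuming `alignment` is a nonzero power of two and `bytes + alignment - 1` does not overflow. The helper names round_up_div, round_up_mask and forms_agree are hypothetical; none of them comes from libstdc++.

```cpp
#include <cassert>
#include <cstddef>

// Rounds bytes up to a multiple of alignment using the division from the report.
constexpr std::size_t round_up_div(std::size_t bytes, std::size_t alignment)
{
  return ((bytes + alignment - 1) / alignment) * alignment;
}

// Same rounding using the mask form quoted in the comment above.
// Only valid when alignment is a power of two.
constexpr std::size_t round_up_mask(std::size_t bytes, std::size_t alignment)
{
  return (bytes + alignment - 1) & ~(alignment - 1);
}

// Compares both forms for power-of-two alignments 1..4096 and small sizes.
inline bool forms_agree()
{
  for (std::size_t alignment = 1; alignment <= 4096; alignment *= 2)
    for (std::size_t bytes = 0; bytes < 10000; ++bytes)
      if (round_up_div(bytes, alignment) != round_up_mask(bytes, alignment))
        return false;
  return true;
}
```

The mask form needs neither a division nor a count of trailing zeros, which is why it can give tighter code than a divide or a shift-and-multiply sequence when the power-of-two precondition is known to hold.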
[Bug jit/120960] jit: The static initialization of builtin_data uses flag_openacc and flag_openmp (which have not been set yet)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120960 --- Comment #2 from Andrew Pinski --- Actually, both_p being either false or true is OK here.

#undef DEF_GOACC_BUILTIN_COMPILER
#define DEF_GOACC_BUILTIN_COMPILER(ENUM, NAME, TYPE, ATTRS) \
  DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE, \
               flag_openacc, true, true, ATTRS, false, true)

The only place which uses both_p and fallback_p is get_asm_name:

```
  const char *get_asm_name () const
  {
    if (both_p && fallback_p)
      return name + prefix_len;
    else
      return name;
  }
```

Which means that if a call to it is added and it does not get simplified, then it will call __builtin_. So the question becomes: will anyone use JIT with openmp/openacc offloading? Maybe not so much.
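The effect of both_p and fallback_p on the emitted name can be seen in a small stand-alone sketch. The struct below is a simplified, hypothetical stand-in and not the real jit builtin_data type: only the body of get_asm_name mirrors the quoted code, and prefix_len is assumed here to be the length of the "__builtin_" prefix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, simplified stand-in for the structure discussed above.
struct builtin_data_sketch
{
  const char *name;   // full name, including the "__builtin_" prefix
  bool both_p;
  bool fallback_p;

  // Assumption: prefix_len is the length of "__builtin_".
  static constexpr std::size_t prefix_len = sizeof("__builtin_") - 1;

  // Mirrors the quoted get_asm_name: strip the prefix only when both
  // flags are set, otherwise keep the full builtin name.
  const char *get_asm_name () const
  {
    if (both_p && fallback_p)
      return name + prefix_len;
    else
      return name;
  }
};
```

With both flags set, "__builtin_acc_on_device" yields "acc_on_device"; if either flag is false, the prefixed name is returned unchanged, which matches the observation that a call that is not simplified would end up referring to the prefixed symbol.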
[Bug c++/120961] ICE on C++20 module when compiling Eigen.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120961 --- Comment #1 from shyeyian --- The attachment with size 8.7MB (zipped into 1.1MB) exceeds the size limit of gcc-bugzilla requirements. **attachment** has been uploaded to [attachment-online](https://raw.githubusercontent.com/AnonymousPC/gcc-bug-report/refs/heads/main/gcc-ice-preprocessed.out). **Thanks :)**
[Bug fortran/120958] tree-sra "miscompiles" asynchronous MPI (mpi_irecv) in Fortran 77 because of wrong fnspec
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120958 --- Comment #5 from kargls at comcast dot net --- (In reply to Martin Jambor from comment #1) > And indeed the following hack in Fortran FE "fixes" the benchmark (of > course, this is not meant as a proposed fix, just as a demonstration > where the problem is): > So, if I understand, you want an fnspec of ". . w w w w w w w". Can you show f->sym and f->sym-attr from gdb? Prior to F2008, MPI says the interface is USE MPI ! or the older form: INCLUDE 'mpif.h' MPI_IRECV(BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, REQUEST, IERROR) BUF(*) INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM, REQUEST, IERROR with F2008, one has USE mpi_f08 MPI_Irecv(buf, count, datatype, source, tag, comm, request, ierror) TYPE(*), DIMENSION(..), ASYNCHRONOUS :: buf INTEGER, INTENT(IN) :: count, source, tag TYPE(MPI_Datatype), INTENT(IN) :: datatype TYPE(MPI_Comm), INTENT(IN) :: comm TYPE(MPI_Request), INTENT(OUT) :: request INTEGER, OPTIONAL, INTENT(OUT) :: ierror in either case an explicit interface is required, and gfortran should deal with the ' buf(*)' and 'type(*), dimension(..)' without your hack. Of particular interest, is the f->sym->ts.type.
[Bug demangler/67268] demangler does not elide empty argument pack in operator()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67268 Henning Meyer changed: What|Removed |Added CC||hmeyer.eu at gmail dot com --- Comment #2 from Henning Meyer --- This bug still exists in GCC 15. I noticed when creating a std::thread _ZNSt6threadC1IRFvvEJEvEEOT_DpOT0_ demangles to "std::thread::thread(void (&)())" The simplest example is template void funk(Args&&... ) {} which when called as funk(); mangles to _Z4funkIJEvEvDpOT_ which demangles to "void funk<, void>()" For a templated function or method, a template parameter pack may be followed by further template arguments, including more packs, possibly empty. When I build the standalone demangler in libiberty, I get this parse tree ./a.out -v _Z4funkIJEvEvDpOT_ typed name template name 'funk' template argument list template argument list template argument list builtin type void function type builtin type void argument list pack expansion rvalue reference template parameter 0 void funk<, void>() which is not what I would expect. I would expect typed name template name 'funk' template argument list template argument list builtin type void for a template argument list consisting of an empty pack and the type void. Which indicates to me that the bug is in the parsing logic and not in the formatting logic.
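For anyone who wants to check the demangled form without building the standalone libiberty demangler, a minimal sketch using `abi::__cxa_demangle` from `<cxxabi.h>` (available with GCC and Clang on Itanium-ABI targets) is shown below. The wrapper name `demangle` is hypothetical. The exact string returned for the symbol from this report depends on the runtime's demangler, so no particular output is assumed for it here.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangles an Itanium C++ ABI symbol name.
// Returns an empty string if the name cannot be demangled.
std::string demangle(const char *mangled)
{
  int status = 0;
  // With null buffer arguments the result is malloc()ed and must be freed.
  char *out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  std::string result = (status == 0 && out != nullptr) ? out : "";
  std::free(out);
  return result;
}
```

Calling `demangle("_Z4funkIJEvEvDpOT_")` and printing the result is enough to see whether the empty argument pack is elided by the demangler in use.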
[Bug jit/120960] jit: The static initialization of builtin_data uses flag_openacc and flag_openmp (which have not been set yet)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120960 Andrew Pinski changed: What|Removed |Added Last reconfirmed||2025-07-04 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 --- Comment #1 from Andrew Pinski --- The builtins that matter: DEF_GOACC_BUILTIN_COMPILER (BUILT_IN_ACC_ON_DEVICE, "acc_on_device", BT_FN_INT_INT, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_GOMP_BUILTIN_COMPILER (BUILT_IN_OMP_IS_INITIAL_DEVICE, "omp_is_initial_device", BT_FN_INT, ATTR_CONST_NOTHROW_LIST) DEF_GOMP_BUILTIN_COMPILER (BUILT_IN_OMP_GET_INITIAL_DEVICE, "omp_get_initial_device", BT_FN_INT, ATTR_PURE_NOTHROW_LIST) DEF_GOMP_BUILTIN_COMPILER (BUILT_IN_OMP_GET_NUM_DEVICES, "omp_get_num_devices", BT_FN_INT, ATTR_PURE_NOTHROW_LIST) Which are used for offloading and figuring out where the code is going to be run on. I am 99% sure if we always enable them in jit it would just work. So we could do: #define flag_openacc true #define flag_openmp true around the static array.
[Bug c++/120962] " XXX references internal linkage entity YYY" when compiling a module interface
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120962 Andrew Pinski changed: What|Removed |Added Blocks||103524 --- Comment #1 from Andrew Pinski --- Hmm, can you try gcc 15.1.0 which has a large number of modules fixes? I am not sure if this is valid or not either. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103524 [Bug 103524] [meta-bug] modules issue
[Bug target/120941] [16 Regression] 24-40% slowdown of 519.lbm_r on Zen2 and 470.lbm on Zen5 since r16-1644-gaba3b9d3a48a07
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941 --- Comment #9 from H.J. Lu --- Created attachment 61803 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61803&action=edit A patch Please try this.
[Bug c++/120962] " XXX references internal linkage entity YYY" when compiling a module interface
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120962 --- Comment #2 from Markus --- I would have tried a newer version if I had had easy access to it. The Alpine distro does not provide newer packages, yet. What I could try was clang, which compiled the file without any errors.
[Bug c++/37590] g++ should emit different debug info for variable's type
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37590

Henning Meyer changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hmeyer.eu at gmail dot com

--- Comment #9 from Henning Meyer ---
This is not a GCC bug; it is a display issue in readelf --debug-dump=info, specific to GNU binutils.

If I compile the example with GCC 15 and run it through readelf --debug-dump=info, I get

 <1><49f2>: Abbrev Number: 102 (DW_TAG_variable)
    <49f3>   DW_AT_name        : (string) s
    <49f5>   DW_AT_decl_file   : (data1) 1
    <49f6>   DW_AT_decl_line   : (data1) 2
    <49f7>   DW_AT_decl_column : (data1) 13
    <49f8>   DW_AT_linkage_name: (strp) (offset: 0x5830): _Z1sB5cxx11
    <49fc>   DW_AT_type        : (ref4) <0x315a>, string, basic_string, std::allocator >

DW_AT_type points to a typedef which has the name string but is a child of the namespace DIE with name std, so the debug information is correct.

If you use elfutils readelf instead of binutils readelf, the output is

 [  49f2]    variable             abbrev: 102
             name                 (string) "s"
             decl_file            (data1) string.cpp (1)
             decl_line            (data1) 2
             decl_column          (data1) 13
             linkage_name         (strp) "_Z1sB5cxx11"
             type                 (ref4) [  315a]
             external             (flag_present) yes
             location             (exprloc)

and it won't show an incomplete name for the typedef.
[Bug tree-optimization/120952] GCC fails to optimize division to shift when divisor is non-constant power of two
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120952 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |NEW Severity|normal |enhancement Last reconfirmed||2025-07-04 Ever confirmed|0 |1 --- Comment #2 from Andrew Pinski --- .
[Bug target/120782] RISC-V: vector-strict-align not working for spec17 521 ref size
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120782 --- Comment #5 from Robin Dapp --- I tried reproducing this with a recent trunk (r16-1965-gc512c9090f52e7) but didn't see the exact code sequence. wrf also ran to completion on the Banana Pi. Did you use a stock GCC 15.1 or a specific commit?
[Bug other/120963] New: Missed optimization of (a > b ? a - b : 0) pattern
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120963

Bug ID: 120963
Summary: Missed optimization of (a > b ? a - b : 0) pattern
Product: gcc
Version: 15.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: other
Assignee: unassigned at gcc dot gnu.org
Reporter: Explorer09 at gmail dot com
Target Milestone: ---

This could be multiple missed optimization issues in the same test code. I just don't know how to reduce the issues to even simpler test cases.

```c
#include <stdint.h>

int32_t func1a(int32_t a, int32_t b) {
  if (b - a < 0)
    a = b;
  return b - a;
}

int32_t func1b(int32_t a, int32_t b) {
  if (b < a)
    a = b;
  return b - a;
}

int32_t func1c(int32_t a, int32_t b) {
  if (b < a)
    return 0;
  return b - a;
}
```

x86-64 gcc 15.1 with `-Os -fno-wrapv` options produces the following assembly:

```x86asm
func1a:
        subl    %edi, %esi
        movl    $0, %eax
        cmovns  %esi, %eax
        ret
func1b:
        cmpl    %edi, %esi
        movl    %esi, %eax
        cmovle  %esi, %edi
        subl    %edi, %eax
        ret
func1c:
        movl    %esi, %eax
        xorl    %edx, %edx
        subl    %edi, %eax
        cmpl    %edi, %esi
        cmovl   %edx, %eax
        ret
```

Note that the three functions are equivalent, and yet the generated assembly differs for each, and none of them is optimal. The optimal code, AFAIK, is this:

```x86asm
func1:
        xorl    %eax, %eax
        subl    %edi, %esi
        cmovg   %esi, %eax
        ret
```

When compared with the assembly of the three functions above:

* func1a missed that the `movl $0, %eax` can be simplified to `xorl %eax, %eax` if the instruction is performed _before_ the `sub` instruction, the condition code of which is checked later.
* As signed integer overflow is undefined behavior in C, the `cmovns` in func1a could as well be transformed to `cmovg`. (This optimization should be disabled for `-fwrapv`.)
* func1c missed that the `sub` and `cmp` could be merged into one operation. This can also save the `mov` instruction at the beginning.

(I have reported Bug 113680 as similar to this issue.)
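The claim that the three functions are equivalent can be spot-checked with a small sketch. It copies the three functions from the report and compares them against a reference computing max(b - a, 0); the names reference and all_agree are hypothetical, and the check is restricted to a small range so that `b - a` never overflows (overflow being the only case where the variants could legitimately differ).

```cpp
#include <cassert>
#include <cstdint>

int32_t func1a(int32_t a, int32_t b) { if (b - a < 0) a = b; return b - a; }
int32_t func1b(int32_t a, int32_t b) { if (b < a) a = b; return b - a; }
int32_t func1c(int32_t a, int32_t b) { if (b < a) return 0; return b - a; }

// Reference: the difference b - a, clamped at zero from below.
int32_t reference(int32_t a, int32_t b) { return b > a ? b - a : 0; }

// Compares all three variants with the reference for a, b in [lo, hi].
bool all_agree(int32_t lo, int32_t hi)
{
  for (int32_t a = lo; a <= hi; ++a)
    for (int32_t b = lo; b <= hi; ++b) {
      const int32_t want = reference(a, b);
      if (func1a(a, b) != want || func1b(a, b) != want || func1c(a, b) != want)
        return false;
    }
  return true;
}
```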
[Bug testsuite/120859] FAIL: gcc.dg/tree-prof/afdo-crossmodule-1b.c compilation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120859 Andrew Pinski changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |pinskia at gcc dot gnu.org --- Comment #5 from Andrew Pinski --- Will handle this tomorrow or later tonight ...
[Bug tree-optimization/120951] [16 regression] error: gimple cond condition cannot throw
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120951 --- Comment #9 from Andrew Pinski --- Created attachment 61804 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61804&action=edit V2 of the patch ok, so as I originally was going to do before I thought changing EQ_EXPR over to ORDERED_EXPR would fix the issue. That is create the comparison in its own statement.
[Bug c++/120953] New: Accepts invalid with range for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120953

Bug ID: 120953
Summary: Accepts invalid with range for
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: jakub at gcc dot gnu.org
Target Milestone: ---

struct S { int a; long b; short c; };
constexpr S d = { 1, 2, 3 }, e = { 4, 5, 6 }, f = { 7, 8, 9 };

void
foo ()
{
  long long r = 0;
  constexpr S h[] = { d, e, f };
  for (static auto &g : h)
    r += g.a + g.b + g.c;
}

is accepted by g++ and rejected by clang++:

loop variable 'g' may not be declared 'static'

https://eel.is/c++draft/stmt.ranged#2 says that only type-specifier and constexpr can be present, so I wonder what all we are missing to complain about and whether, because we've been accepting it, it should be just a pedwarn or an error.
[Bug c++/120954] New: [15 Regression] False positive -Warray-bounds=2 warning
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120954

Bug ID: 120954
Summary: [15 Regression] False positive -Warray-bounds=2 warning
Product: gcc
Version: 15.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: sirl at gcc dot gnu.org
Target Milestone: ---

This small code snippet warns with r15-9921 (r15-9866 was still OK):

static const int map01[32] = { 11, 12, 13, 14, 15 };
static const int map02[32] = { 21, 22, 23, 24, 25 };
static const int map03[32] = { 31, 32, 33, 34, 35 };
static const int map11[32] = { 111, 112, 113, 114, 115 };
static const int map12[32] = { 121, 122, 123, 124, 125 };
static const int map13[32] = { 131, 132, 133, 134, 135 };
int test(int n, int ver)
{
  int r = 0;
  if (n >= 0 && n < 32) {
    r = ((ver >= 4) ? ((ver >= 0x65) ? map01 : map02 ) : map03)[n];
  } else if (n >= 0x100 && n < 0x120) {
    r = ((ver >= 4) ? ((ver >= 0x65) ? map11 : map12 ) : map13)[n - 0x100];
  };
  return r;
}

# g++-15 -c -O2 -Warray-bounds=2 test-Warray-bounds.cpp
test-Warray-bounds.cpp: In function 'int test(int, int)':
test-Warray-bounds.cpp:16:86: warning: intermediate array offset -256 is outside array bounds of 'const int [32]' [-Warray-bounds=]
   16 | r = ((ver >= 4) ? ((ver >= 0x65) ? map11 : map12 ) : map13)[n - 0x100];
      | ~^

This one also might have been caused by

commit r15-9896-g7fdf47538a659f6af8dadbecbb63c8a226b63754
Author: Jakub Jelinek
Date:   Tue Jul 1 15:28:10 2025 +0200

    c++: Fix up cp_build_array_ref COND_EXPR handling [PR120471]

But this time I'm not very confident about that. Compiling this as C code doesn't warn.
[Bug c++/120955] New: 50 % increase in data segment size on avr-gcc for -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120955

Bug ID: 120955
Summary: 50 % increase in data segment size on avr-gcc for -Os
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: fiesh at zefix dot tv
Target Milestone: ---

For our software that controls various machinery in our factory, we have the following sizes:

Commit 0c83096f19b:

   text    data     bss     dec     hex filename
 212830    1398    3760  217988   35384 /p_o2/phacility
   text    data     bss     dec     hex filename
 152420    1834    3767  158021   26945 /p_os/phacility

Commit 12de1942a0a:

   text    data     bss     dec     hex filename
 250090    1140    3760  254990   3e40e /p_o2/phacility
   text    data     bss     dec     hex filename
 142112    2742    3767  148621   2448d /p_os/phacility

(o2 refers to an -O2 build with LTO, os refers to an -Os build with LTO)

So 12de1942a0a caused:

For O2:
* The text section to become 17.5 % larger
* The data section to become 18.5 % smaller

For Os:
* The text section to become 6.8 % smaller
* The data section to become 49.5 % larger

Please note that since this is a Harvard architecture, data and bss occupy RAM and thus reduce the available stack size. (In our case, we can now use neither O2 nor Os. One has too big a text segment; with the other, data + bss leave too little stack for our software to work.)

Alas, I do not have a single translation unit to reproduce this easily. But I'm happy to share the project code that reproduces this with anyone who can do something with it or help in any other way. Please let me know, and sorry for the lack of a testcase.
[Bug c++/120954] [15 Regression] False positive -Warray-bounds=2 warning
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120954 --- Comment #1 from Sam James --- (In reply to Franz Sirl from comment #0) > But this time I'm not very confident about that. Compiling this as C code > doesn't warn. It does for me, and it warns before Jakub's change.
[Bug c++/120954] [15 Regression] False positive -Warray-bounds=2 warning
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120954 --- Comment #2 from Sam James ---
(In reply to Franz Sirl from comment #0)
> This small code snippet warns with r15-9921 (r15-9866 was still OK):
r15-9873-g06a26f4d643a5d warns for me.
```
$ git shortlog r15-9866-g8d600e98004b63..r15-9873-g06a26f4d643a5d
Eric Botcazou (3):
      Fix misoptimization of CONSTRUCTOR with reverse SSO
      Ada: Fix assertion failure on problematic container aggregate
      Fix compilation of concatenation with illegal character constant
GCC Administrator (2):
      Daily bump.
      Daily bump.
Harald Anlauf (2):
      Fortran: fix checking of renamed-on-use interface name [PR120784]
      Fortran: follow-up fix to checking of renamed-on-use interface name [PR120784]
```
It could be Eric's gimple-fold change but that looks unlikely to me. Are you sure your references are right, given it warns for me with C too?
> But this time I'm not very confident about that. Compiling this as C code
> doesn't warn.
[Bug c++/120955] [15/16 Regression] 50 % increase in data segment size on avr-gcc for -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120955 --- Comment #1 from fiesh at zefix dot tv --- 12de1942a0a is r15-6052-g12de1942a0a673
[Bug c++/120953] Accepts invalid with range for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120953 Jason Merrill changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Last reconfirmed||2025-07-04 --- Comment #1 from Jason Merrill --- In cp_parser_init_declarator we've decided whether it's a range-for and still have the declspecs, so we could scan them at that point. pedwarn seems fine for this.
[Bug other/120963] Missed optimization of (a > b ? a - b : 0) pattern
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120963 Andrew Pinski changed: What|Removed |Added Depends on||3507 --- Comment #1 from Andrew Pinski --- I have not fully looked but I suspect PR 3507 is most of the issue. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3507 [Bug 3507] appalling optimisation with sub/cmp on multiple targets
[Bug libstdc++/119742] [C++26] Implement P2697R1, Interfacing bitset with string_view
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119742 --- Comment #4 from Jonathan Wakely --- I've just noticed that the debug mode bitset in needs the new constructor added. make check RUNTESTFLAGS="conformance.exp=*bitset* --target_board=unix/-D_GLIBCXX_DEBUG" FAIL: 20_util/bitset/cons/string_view.cc -std=gnu++26 (test for excess errors)
[Bug c++/84009] No diagnostic issued if the decl-specifier in the decl-specifier-seq of a for-range-declaration is register, static,or thread_local
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84009 Jakub Jelinek changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #2 from Jakub Jelinek --- Created attachment 61802 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61802&action=edit gcc16-pr84009.patch Untested fix.
[Bug target/118241] RISC-V ICE: internal compiler error: in int_mode_for_mode, at stor-layout.cc:407 caused by prefetch instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118241 --- Comment #12 from GCC Commits --- The master branch has been updated by Vineet Gupta : https://gcc.gnu.org/g:f2a3ab7ebf3c40da77f54e8329272fe048ec48a6 commit r16-2032-gf2a3ab7ebf3c40da77f54e8329272fe048ec48a6 Author: Vineet Gupta Date: Fri Jul 4 12:33:51 2025 -0700 RISC-V: prefetch: fix LRA failing to allocate reg [PR118241] prefetch was recently fixed/tightened (with Q reg constraint) to only support right address patterns (REG or REG+D with lower 5 bits clear). However in some cases that's too restrictive for LRA and it fails to allocate a reg resulting in following ICE... | gcc/testsuite/gcc.target/riscv/pr118241-b.cc:31:19: error: unable to generate reloads for: | 31 | void m() { a.l(); } | | ^ |(insn 26 25 27 7 (prefetch (mem/f:DI (plus:DI (reg/f:DI 143 [ _5 ]) |(const_int 56 [0x38])) [5 _5->batch[6]+0 S8 A64]) |(const_int 0 [0]) |(const_int 3 [0x3])) "gcc/testsuite/gcc.target/riscv/pr118241-b.cc":18:29 498 {prefetch} | (expr_list:REG_DEAD (reg/f:DI 142 [ _5->batch[6] ]) |(nil))) |during RTL pass: reload Fix that by providing a fallback alternative register constraint to reload the address. PR target/118241 gcc/ChangeLog: * config/riscv/riscv.md (prefetch): Add alternative "r". gcc/testsuite/ChangeLog: * gcc.target/riscv/pr118241-b.cc: New test. Signed-off-by: Vineet Gupta
[Bug c++/120961] ICE on C++20 module when compiling Eigen.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120961 --- Comment #2 from Andrew Pinski --- Created attachment 61801 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61801&action=edit testcase compressed using xz xz is able to compress this below the limit.
[Bug target/120941] [16 Regression] 24-40% slowdown of 519.lbm_r on Zen2 and 470.lbm on Zen5 since r16-1644-gaba3b9d3a48a07
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941 Filip Kastl changed: What|Removed |Added Summary|[16 Regression] 24-40% |[16 Regression] 24-40% |slowdown of 519.lbm_r on|slowdown of 519.lbm_r on |Zen2 since |Zen2 and 470.lbm on Zen5 |r16-1644-gaba3b9d3a48a07|since ||r16-1644-gaba3b9d3a48a07 --- Comment #8 from Filip Kastl --- The same commit (r16-1644-gaba3b9d3a48a07) causes ~20% slowdown of 470lbm from 2006 SPEC on Zen5 with -Ofast -march=native -flto -fprofile-use. https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=1283.240.0
[Bug target/118891] [14/15/16 regression] gcc 14 fails to build from source on aarch64_be: "error: ‘dynamic_cast’ not permitted with ‘-fno-rtti’"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118891 Richard Sandiford changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org --- Comment #23 from Richard Sandiford --- I've posted a couple of patches that should help with this: * https://gcc.gnu.org/pipermail/gcc-patches/2025-July/688599.html * https://gcc.gnu.org/pipermail/gcc-patches/2025-July/688605.html
[Bug target/120782] RISC-V: vector-strict-align not working for spec17 521 ref size
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120782 --- Comment #6 from Li Pan ---
(In reply to Robin Dapp from comment #5)
> I tried reproducing this with a recent trunk (r16-1965-gc512c9090f52e7) but
> didn't see the exact code sequence. wrf also ran to completion on the Banana
> Pi.
>
> Did you use a stock GCC 15.1 or a specific commit?
Yes, I use 15.1. Building with rv64gcvb hits that code sequence, but it disappears when building with rv64gcvb_zvl256b.
riscv64-linux-gnu-gcc (GCC) 15.1.0
Copyright (C) 2025 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
riscv64-linux-gnu-g++ (GCC) 15.1.0
Copyright (C) 2025 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[Bug tree-optimization/119965] [16 Regression] 531.deepsjeng_r binary is 50% bigger since r16-116-gcfb04e0de6aa43
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119965 --- Comment #2 from Jan Hubicka --- This is likely an ipa-cp heuristics issue: it now decides to clone, but in the end the benefits of the cloning are not really visible.
[Bug tree-optimization/119965] [16 Regression] 531.deepsjeng_r binary is 50% bigger since r16-116-gcfb04e0de6aa43
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119965 --- Comment #3 from Jan Hubicka --- There is also a 3% performance regression that got lost in the transition to the new PR: https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=958.387.0
[Bug tree-optimization/120948] Cannot detect potential division-by-zero when numerator is 1 and denominator is variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120948 --- Comment #4 from Jakub Jelinek --- (In reply to Richard Biener from comment #1) > So you say that we fail to optimize > > int foo (unsigned x) > { > unsigned tem = 1/x; > if (x == 0) > return 5; > return tem; > } > > because we turn 1/x into x == 1? > > A phase ordering issue, obviously. Relevant in practice? I'm not sure. We could surely defer the optimization until GIMPLE and optimize 1/x into ({ if (!x) __builtin_unreachable (); 1}). The question is if it is worth it, plus having conditionals in the IL created by match.pd is difficult. Perhaps we want some variant way of representing very simple if (!cond) __builtin_unreachable (); as IFN_RANGE (x_123, ...); where ... would somehow represent what range x_123 can have. And a question where to drop those as well.
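To make the folding discussed in this comment concrete, here is a minimal C sketch of the three shapes involved. The function names are hypothetical, and the third function is only a source-level rendering of the `if (!x) __builtin_unreachable ();` idea; it is an assumption about what such a deferred form could look like, not what GCC emits.

```c
/* Division form: undefined behavior when x == 0. */
unsigned one_over_div(unsigned x)
{
    return 1 / x;
}

/* Folded form: equal to 1 / x for every x != 0, but also defined
   for x == 0, so the fact "x cannot be 0 here" is no longer visible
   to later passes. */
unsigned one_over_cmp(unsigned x)
{
    return x == 1;
}

/* Deferred form (assumption, for illustration): keep the folded
   value but also record that x == 0 cannot happen on this path. */
unsigned one_over_hint(unsigned x)
{
    if (!x)
        __builtin_unreachable();
    return x == 1;
}
```

With the third shape, a later `if (x == 0)` check after the division could in principle be recognized as dead, which is the optimization the report asks about. Whether that is worth the extra conditionals in the IL is the open question raised in the comment.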
[Bug tree-optimization/120929] [16 Regression] file-5.45 triggers _FORTIFY_SOURCE false positives since r16-1905-g7165ca43caf470
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120929 --- Comment #21 from Siddhesh Poyarekar --- (In reply to Richard Biener from comment #20) > so for > > _1 = _2; > > we merge from _2. For > > _1 = *_2; > > we _also_ merge from _2. But those are semantically not the same! Yes, it only "makes sense" in the context of .ACCESS_WITH_SIZE as Qing originally conjectured, because .ACCESS_WITH_SIZE for _1 is stored in the context of &_1. > IMO this change was bogus and should be reverted. I'm testing a simple fix that constrains this to just .ACCESS_WITH_SIZE. Hopefully that should avoid the need to revert.
[Bug c++/120955] [15/16 Regression] 50 % increase in data segment size on avr-gcc for -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120955 Andrew Pinski changed: What|Removed |Added Target Milestone|--- |15.2 Status|UNCONFIRMED |WAITING Last reconfirmed||2025-07-04 See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=120702 Ever confirmed|0 |1 Component|middle-end |c++ --- Comment #2 from Andrew Pinski --- Without a testcase it is hard to see what is going wrong. It could be similar to PR 120702 or not; especially when it comes to constexpr.
[Bug c++/120953] Accepts invalid with range for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120953 Andrew Pinski changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |DUPLICATE --- Comment #2 from Andrew Pinski --- Dup. *** This bug has been marked as a duplicate of bug 84009 ***
[Bug c++/84009] No diagnostic issued if the decl-specifier in the decl-specifier-seq of a for-range-declaration is register, static,or thread_local
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84009 Andrew Pinski changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #1 from Andrew Pinski --- *** Bug 120953 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/120951] [16 regression] error: gimple cond condition cannot throw
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120951 --- Comment #5 from Andrew Pinski --- (In reply to Andrew Pinski from comment #4) > Better reduced testcase: > ``` > double f(double r, double i) { > return__builtin_fmod(r, i); > } > ``` ``` double f(double r, double i) { return __builtin_fmod(r, i); } ``` `-O1 -fnon-call-exceptions -fsignaling-nans `
[Bug tree-optimization/120951] [16 regression] error: gimple cond condition cannot throw
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120951 --- Comment #4 from Andrew Pinski --- Better reduced testcase: ``` double f(double r, double i) { return__builtin_fmod(r, i); } ```
[Bug tree-optimization/120929] [16 Regression] file-5.45 triggers _FORTIFY_SOURCE false positives since r16-1905-g7165ca43caf470
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120929 --- Comment #20 from Richard Biener --- (In reply to qinzhao from comment #16) > (In reply to Siddhesh Poyarekar from comment #12) > > This is interesting here's the IR dump right after objsz: > > > > The key bit is: > > > > map2_4 = __builtin_malloc (8); > > pin_pointer (&buf); > > _1 = &map2_4->magic; > > _9 = __builtin_malloc (9); > > *_1 = _9; > > goto ; [100.00%] > > > >[local count: 1073741824]: > > b = ""; > > ptr_10 = *_1; > > _11 = 8; > > __builtin___memcpy_chk (ptr_10, &b, 9, _11); > > > > where *_1 gets updated to _9, but when one follows the *_1 through ptr_10, > > it doesn't end up with _9, the def statement is: > > > > _1 = &map2_4->magic; > > > > which leads to the incorrect value for the object size. This is because the > > pass doesn't know that a MEM_REF could be the LHS (for the zero byte offset > > case like it is here) for an assignment, i.e. this: > > > > *_1 = _9; > > There are two possible solutions to the above issue: > > A. Add handling for *_1 = _9 to enable the object size propagate through the > correct data flow path. > Or > B. in my latest change that triggered this issue: > > /* Handle the following stmt #2 to propagate the size from the >stmt #1 to #3: > 1 _1 = .ACCESS_WITH_SIZE (_3, _4, 1, 0, -1, 0B); > 2 _5 = *_1; > 3 _6 = __builtin_dynamic_object_size (_5, 1); > */ > else if (TREE_CODE (rhs) == MEM_REF > && POINTER_TYPE_P (TREE_TYPE (rhs)) > && TREE_CODE (TREE_OPERAND (rhs, 0)) == SSA_NAME > && integer_zerop (TREE_OPERAND (rhs, 1))) > reexamine = merge_object_sizes (osi, var, TREE_OPERAND (rhs, > 0)); > > > I feel that propagating the size through _5 = *_1 might not be correct in > general, we should only limit it to the case when the RHS is a pointer > defined by .ACCESS_WITH_SIZE? > > what do you think? I'm confused as to what this does for the case in question. 
The IL is _1 = &map2_4->magic; _9 = __builtin_malloc (9); *_1 = _9; ptr_10 = *_1; _11 = __builtin_dynamic_object_size (ptr_10, 0); ptr_10 is equal to _9, but the objsize dump suggests we are instead looking at the size of what _1 points to, instead of the size of what *_1 points to?! That would be completely incorrect of course! Also if that's not the case but we still look at *_1 but with _1 from the earlier def that cannot be done either because you skipped an intermediate definition of *_1. I think the code is obviously bogus. Quoting more in full: else if (gimple_assign_single_p (stmt) || gimple_assign_unary_nop_p (stmt)) { if (TREE_CODE (rhs) == SSA_NAME && POINTER_TYPE_P (TREE_TYPE (rhs))) reexamine = merge_object_sizes (osi, var, rhs); /* Handle the following stmt #2 to propagate the size from the stmt #1 to #3: 1 _1 = .ACCESS_WITH_SIZE (_3, _4, 1, 0, -1, 0B); 2 _5 = *_1; 3 _6 = __builtin_dynamic_object_size (_5, 1); */ else if (TREE_CODE (rhs) == MEM_REF && POINTER_TYPE_P (TREE_TYPE (rhs)) && TREE_CODE (TREE_OPERAND (rhs, 0)) == SSA_NAME && integer_zerop (TREE_OPERAND (rhs, 1))) reexamine = merge_object_sizes (osi, var, TREE_OPERAND (rhs, 0)); so for _1 = _2; we merge from _2. For _1 = *_2; we _also_ merge from _2. But those are semantically not the same! IMO this change was bogus and should be reverted.
[Bug c/120951] New: error: gimple cond condition cannot throw
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120951 Bug ID: 120951 Summary: error: gimple cond condition cannot throw Product: gcc Version: 16.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: dcb314 at hotmail dot com Target Milestone: --- Created attachment 61799 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61799&action=edit C source code
The attached code does this with recent gcc:
foundBugs $ ~/gcc/results/bin/gcc -c -O1 -fnon-call-exceptions -fsignaling-nans bug1107.c
floatobj.c: In function ‘calc2_double’:
floatobj.c:315:53: error: gimple cond condition cannot throw
if (r_48 == r_48)
during GIMPLE pass: cdce
floatobj.c:315:53: internal compiler error: verify_gimple failed
foundBugs $ ~/gcc/results/bin/gcc -c -O1 -fnon-call-exceptions bug1107.c
foundBugs $ ~/gcc/results/bin/gcc -c -O1 -fsignaling-nans bug1107.c
foundBugs $ ~/gcc/results/bin/gcc -c -fsignaling-nans -fnon-call-exceptions bug1107.c
foundBugs $
[Bug c/120951] error: gimple cond condition cannot throw
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120951 David Binderman changed: What|Removed |Added Keywords||needs-bisection --- Comment #1 from David Binderman --- Reduced code seems to be: typedef struct { double real } Float; double float_from_double_inplace_r; double fmod(double, double); void float_from_double_inplace() { float_from_double_inplace_r = fmod(((Float *)0)->real, ((Float *)0)->real); }
[Bug tree-optimization/120916] debug line info for IV increment is lost
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120916 --- Comment #7 from Jan Hubicka ---
LLVM also gets execution counts wrong, just in a different (and less harmful) way:
test:270773509:9780 1: 9116 2: 51984 for ( 4: 51984 iThis Inner Loop Header: Depth=1
        .loc    0 10 15 is_stmt 1 discriminator 33 # ll.c:10:15
        movdqa  (%rsi,%rdi), %xmm1
        movdqa  16(%rsi,%rdi), %xmm2
        psubd   %xmm0, %xmm1
        psubd   %xmm0, %xmm2
        movdqa  %xmm1, (%rsi,%rdi)
        movdqa  %xmm2, 16(%rsi,%rdi)
        .loc    0 9 15 discriminator 33 # ll.c:9:15
        addq    $32, %rsi
        cmpq    %rsi, %rdx
        jne     .LBB0_4
So it has only lines 9 and 10. The large discriminator numbers seem to be the FS discriminator encoding. LLVM assigns discriminators twice. The first assignment is done similarly to ours, but scaled up. I think it is supposed to handle the case where a statement gets duplicated into multiple basic blocks, like a[i]++ does. So it has:
        .loc    0 10 15 is_stmt 1 discriminator 33 # ll.c:10:15
        movdqa  (%rsi,%rdi), %xmm1
        movdqa  16(%rsi,%rdi), %xmm2
        psubd   %xmm0, %xmm1
        psubd   %xmm0, %xmm2
        movdqa  %xmm1, (%rsi,%rdi)
        movdqa  %xmm2, 16(%rsi,%rdi)
for the vectorized body and
        .loc    0 10 15 is_stmt 1 # ll.c:10:15
        leaq    (%rcx,%rdx,4), %rdi
        incl    (%rsi,%rdi)
for the epilogue. The tool has a -fuse_discriminator_encoding option which then merges the values back. I will look into what this really does.
[Bug tree-optimization/120948] Cannot detect potential division-by-zero when numerator is 1 and denominator is variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120948 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #3 from Jakub Jelinek --- (In reply to huyubiao from comment #2) > I believe such non-compliant code makes maintenance troubleshooting > significantly harder. Could we implement a compiler warning when x=0 > potentially occurs? No. That is the most common case that the compiler doesn't know if divisor can or can't be 0. Such warning would trigger all the time.
[Bug tree-optimization/120944] [12/13/14/15/16 Regression] Incorrect optimization with accessing a volatile structure member
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120944 --- Comment #3 from GCC Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:6ed1e2ae1a742d859c2dd74c9e7cebdd3618e8b1 commit r16-2019-g6ed1e2ae1a742d859c2dd74c9e7cebdd3618e8b1 Author: Richard Biener Date: Fri Jul 4 09:08:19 2025 +0200 tree-optimization/120944 - bogus VN with volatile copies The following avoids translating expressions through volatile copies. PR tree-optimization/120944 * tree-ssa-sccvn.cc (vn_reference_lookup_3): Gate optimizations invalid when volatile is involved. * gcc.dg/torture/pr120944.c: New testcase.
[Bug tree-optimization/120944] [12/13/14/15 Regression] Incorrect optimization with accessing a volatile structure member
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120944 Richard Biener changed: What|Removed |Added Priority|P3 |P2 Known to work||16.0 Known to fail||15.1.0 Summary|[12/13/14/15/16 Regression] |[12/13/14/15 Regression] |Incorrect optimization with |Incorrect optimization with |accessing a volatile|accessing a volatile |structure member|structure member
[Bug target/120956] New: [16 Regression] 6% slowdown of 503.bwaves_r since
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120956 Bug ID: 120956 Summary: [16 Regression] 6% slowdown of 503.bwaves_r since Product: gcc Version: 16.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: pheeck at gcc dot gnu.org Target Milestone: ---
[Bug target/120956] [16 Regression] 6% slowdown of 503.bwaves_r since
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120956 Filip Kastl changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID --- Comment #1 from Filip Kastl --- sorry, please ignore this
[Bug c++/120575] [15/16 Regression] ICE: in cp_parser_abort_tentative_parse, at cp/parser.cc:36574
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120575 --- Comment #2 from GCC Commits --- The trunk branch has been updated by Jason Merrill : https://gcc.gnu.org/g:35d6f55f7d6655a8683b45286283d44674fa997e commit r16-2024-g35d6f55f7d6655a8683b45286283d44674fa997e Author: Jason Merrill Date: Fri Jul 4 05:15:00 2025 -0400 c++: -Wtemplate-body and tentative parsing [PR120575] Here we were asserting non-zero errorcount, which is not the case if the parse error was reduced to a warning (or silenced) in a template body. So check seen_error instead. PR c++/120575 PR c++/116064 gcc/cp/ChangeLog: * parser.cc (cp_parser_abort_tentative_parse): Check seen_error instead of errorcount. gcc/testsuite/ChangeLog: * g++.dg/template/permissive-error3.C: New test.
[Bug c++/116064] [15 Regression] SPEC 2017 523.xalancbmk_r failed to build
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116064 --- Comment #16 from GCC Commits --- The trunk branch has been updated by Jason Merrill : https://gcc.gnu.org/g:35d6f55f7d6655a8683b45286283d44674fa997e commit r16-2024-g35d6f55f7d6655a8683b45286283d44674fa997e Author: Jason Merrill Date: Fri Jul 4 05:15:00 2025 -0400 c++: -Wtemplate-body and tentative parsing [PR120575] Here we were asserting non-zero errorcount, which is not the case if the parse error was reduced to a warning (or silenced) in a template body. So check seen_error instead. PR c++/120575 PR c++/116064 gcc/cp/ChangeLog: * parser.cc (cp_parser_abort_tentative_parse): Check seen_error instead of errorcount. gcc/testsuite/ChangeLog: * g++.dg/template/permissive-error3.C: New test.
[Bug c++/116064] [15 Regression] SPEC 2017 523.xalancbmk_r failed to build
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116064 --- Comment #17 from GCC Commits --- The releases/gcc-15 branch has been updated by Jason Merrill : https://gcc.gnu.org/g:799dfe7c5f638a645d33b47750f797b3fb87329b commit r15-9927-g799dfe7c5f638a645d33b47750f797b3fb87329b Author: Jason Merrill Date: Fri Jul 4 05:15:00 2025 -0400 c++: -Wtemplate-body and tentative parsing [PR120575] Here we were asserting non-zero errorcount, which is not the case if the parse error was reduced to a warning (or silenced) in a template body. So check seen_error instead. PR c++/120575 PR c++/116064 gcc/cp/ChangeLog: * parser.cc (cp_parser_abort_tentative_parse): Check seen_error instead of errorcount. gcc/testsuite/ChangeLog: * g++.dg/template/permissive-error3.C: New test. (cherry picked from commit 35d6f55f7d6655a8683b45286283d44674fa997e)
[Bug c++/120575] [15/16 Regression] ICE: in cp_parser_abort_tentative_parse, at cp/parser.cc:36574
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120575 Jason Merrill changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #4 from Jason Merrill --- Fixed for 15.2/16.
[Bug tree-optimization/120948] Cannot detect potential division-by-zero when numerator is 1 and denominator is variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120948 --- Comment #2 from huyubiao --- (In reply to Richard Biener from comment #1) > So you say that we fail to optimize > > int foo (unsigned x) > { > unsigned tem = 1/x; > if (x == 0) > return 5; > return tem; > } > > because we turn 1/x into x == 1? > > A phase ordering issue, obviously. Relevant in practice? I'm not sure. I believe such non-compliant code makes maintenance troubleshooting significantly harder. Could we implement a compiler warning when x=0 potentially occurs?
[Bug middle-end/120951] [16 regression] error: gimple cond condition cannot throw
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120951 Andrew Pinski changed: What|Removed |Added Last reconfirmed||2025-07-04 Keywords|needs-bisection |ice-on-valid-code Ever confirmed|0 |1 Summary|error: gimple cond |[16 regression] error: |condition cannot throw |gimple cond condition ||cannot throw CC||pinskia at gcc dot gnu.org Target Milestone|--- |16.0 Status|UNCONFIRMED |ASSIGNED Component|c |middle-end --- Comment #2 from Andrew Pinski --- I suspect this started ICEing when the verification was added. I will take a look soon.
[Bug tree-optimization/120951] [16 regression] error: gimple cond condition cannot throw
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120951 Andrew Pinski changed: What|Removed |Added Component|middle-end |tree-optimization --- Comment #3 from Andrew Pinski --- Note the bug is in the cdce pass where it creates GIMPLE_COND. When the condition can throw, a temporary needs to be used.
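As a rough source-level analogy for the two shapes mentioned here (this is not GIMPLE, and the function names are invented for illustration), the comparison can either sit directly in the branch condition or be evaluated into a temporary that the branch then tests:

```c
/* Shape 1: the floating-point comparison is the branch condition
   itself. In GIMPLE terms this corresponds to a GIMPLE_COND that
   performs the comparison. */
int ordered_direct(double r)
{
    if (r == r)
        return 1;
    return 0;
}

/* Shape 2: the comparison is first evaluated into a temporary,
   and the branch only tests that temporary. */
int ordered_via_temp(double r)
{
    _Bool tmp = (r == r);
    if (tmp)
        return 1;
    return 0;
}
```

In C both functions behave identically. The distinction only matters inside the compiler: with -fnon-call-exceptions and -fsignaling-nans the comparison itself may throw, and per the comment above such a comparison has to be a separate statement feeding the condition rather than the condition itself.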
[Bug target/120959] New: [16 Regression] 9% slowdown of 549.fotonik3d_r on Zen5 since r16-1645-g309dbcea2cabb3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120959 Bug ID: 120959 Summary: [16 Regression] 9% slowdown of 549.fotonik3d_r on Zen5 since r16-1645-g309dbcea2cabb3 Product: gcc Version: 16.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: pheeck at gcc dot gnu.org CC: tnfchris at gcc dot gnu.org Blocks: 26163 Target Milestone: --- Host: x86_64-pc-linux-gnu Target: x86_64-pc-linux-gnu As seen here https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=1244.527.0 there was a 9% exec time slowdown of the 549.fotonik3d_r SPEC 2017 benchmark when run with -O2 -flto -fprofile-use (generic march) on an AMD Zen 5 machine. I bisected it to r16-1645-g309dbcea2cabb3. commit 309dbcea2cabb31bde1a65cdfd30bb7f87b170a2 Author: Tamar Christina AuthorDate: Tue Jun 24 07:13:22 2025 +0100 Commit: Tamar Christina CommitDate: Tue Jun 24 07:13:22 2025 +0100 middle-end: replace log_vf usages with vf to allow support for non-power of two vf This is a regression against GCC 15. If I measure manually the most recent commit of releases/gcc-15, I get the same exec time as before Tamar's commit on trunk. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163 [Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
[Bug target/120959] [16 Regression] 9% slowdown of 549.fotonik3d_r on Zen5 since r16-1645-g309dbcea2cabb3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120959 Tamar Christina changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org --- Comment #1 from Tamar Christina --- OK, will take a look next Monday.
[Bug target/120957] [16 Regression] 6-9% slowdown of 503.bwaves_r on Zen{2,3} since r16-1647-gc06979ff957485
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120957 Filip Kastl changed: What|Removed |Added Target Milestone|--- |16.0
[Bug target/120957] New: [16 Regression] 6-9% slowdown of 503.bwaves_r on Zen{2,3} since r16-1647-gc06979ff957485
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120957 Bug ID: 120957 Summary: [16 Regression] 6-9% slowdown of 503.bwaves_r on Zen{2,3} since r16-1647-gc06979ff957485 Product: gcc Version: 16.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: pheeck at gcc dot gnu.org CC: liuhongt at gcc dot gnu.org Blocks: 26163 Target Milestone: --- Host: x86_64-pc-linux-gnu Target: x86_64-pc-linux-gnu As seen here https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=295.427.0 https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=471.427.0 there was a 6% exec time slowdown of the 503.bwaves SPEC 2017 benchmark when run with -Ofast -march=native on an AMD Zen2 machine, 9% slowdown on an AMD Zen3 machine. I bisected this to r16-1647-gc06979ff957485 (2025-06-24). c06979ff95748559da0c2d3aa4eda9d5999eaaf6 is the first bad commit commit c06979ff95748559da0c2d3aa4eda9d5999eaaf6 Author: hongtao.liu Date: Wed Mar 5 12:25:32 2025 +0100 Don't duplicate setup code cost when do group-candidate cost calucalution. - /* Uses in a group can share setup code, so only add setup cost once. */ - cost -= cost.scratch; It looks like the original code took into account avoiding double counting, but unfortunately cost is reset inside the follow loop which invalidates the upper code, and makes same setup code cost duplicated in each use of the group. The patch fix the issue. It can also improve 548.exchange_r by 6% with -march=x86-64-v3 -O2 due to better ivopt on EMR. No big performance impact for SPEC2017 on graviton4/SPR with -mcpu=native -Ofast -fomit-framepointer -flto=auto. gcc/ChangeLog: PR target/115842 * tree-ssa-loop-ivopts.cc (determine_group_iv_cost_address): Don't recalculate inv_expr when group-candidate cost calucalution. gcc/tree-ssa-loop-ivopts.cc | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) bisect found first bad commit Connection to tiber.arch.suse.cz closed. 
This is a regression against GCC 15. See the comparison (Zen2) here: https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=1070.427.0&plot.1=1219.427.0&plot.2=295.427.0 Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163 [Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
[Bug fortran/120958] New: tree-sra "miscompiles" asynchronous MPI (mpi_irecv) in Fortran 77 because of wrong fnspec
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120958 Bug ID: 120958 Summary: tree-sra "miscompiles" asynchronous MPI (mpi_irecv) in Fortran 77 because of wrong fnspec Product: gcc Version: 15.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: jamborm at gcc dot gnu.org CC: rguenth at gcc dot gnu.org Target Milestone: --- Since my commit r14-5831-gaae723d360ca26 (Martin Jambor: sra: SRA of non-escaped aggregates passed by reference to calls), gcc produces a non-working MPI version of the CG benchmark from NAS Parallel Benchmarks version 3.3.1, which is written in Fortran 77. (The benchmark has been re-written in a newer version of Fortran in version 3.4 of the suite and I suspect that one no longer has this problem). The problem is that tree-sra is told by escape analysis that the address of the first parameter of mpi_irecv does not escape. And so the aggregate passed in that parameter is an SRA candidate and is broken down into scalar components, and these are reloaded immediately after the function returns and not after a call of mpi_wait. The reason why escape analysis says that is that the fnspec of mpi_irecv is ". w w w w w w w w " which indeed says (the first w) that the first parameter does not escape. This fnspec is created by a function in gcc/fortran/trans-types.cc which, AFAICT, simply deduces it from the call statement in the benchmark source (but I may easily be wrong here). My first impression was that this is simply a limitation of Fortran 77 and asynchronous MPI simply cannot work in this language standard. However, Richi pointed out that there must be a lot of Fortran 77 code using asynchronous MPI that we do not want to break, which is a reasonable point of view. The benchmark can be downloaded from https://www.nas.nasa.gov/software/npb.html. 
I have used mpich 4.1.2 MPI implementation from openSUSE Leap 15.6 and my configuration file config/make.def is:
--
## Compiler
MPIF77 = mpif77
MPICC = mpicc
# libhugetlbfs relinking
LHBDT = -B /usr/share/libhugetlbfs -Wl,--hugetlbfs-link=BDT
LHB = -B /usr/share/libhugetlbfs -Wl,--hugetlbfs-link=B
LHALIGN = -B /usr/share/libhugetlbfs -Wl,--hugetlbfs-align
# Fortran Optimisation
FLINK = mpif77
F_LIB = $(LHRELINK) $(LHLIB)
F_INC =
FFLAGS = -O3 -mcmodel=large -g -fallow-argument-mismatch -fallow-invalid-boz -m64
FLINKFLAGS = -O3 -lmpi -g -fallow-argument-mismatch -fallow-invalid-boz -mcmodel=large -m64 $(LHRELINK) $(LHLIB)
# C Optimisation
CLINK = mpicc
C_LIB = $(LHRELINK) $(LHLIB)
C_INC =
CFLAGS = -O3 -mcmodel=large -m64
CLINKFLAGS = -O3 -lmpi -mcmodel=large -m64 $(LHRELINK) $(LHLIB)
# Other
UCC= mpicc
BINDIR = ../bin
RAND = randi8
WTIME = wtime.c
--
The problematic variable which is SRAed is norm_temp2 defined on line:
      double precision norm_temp1(2), norm_temp2(2)
and then used in code snippet:
      do i = 1, l2npcols
         if (timeron) call timer_start(t_ncomm)
         call mpi_irecv( norm_temp2,
     >                   2,
     >                   dp_type,
     >                   reduce_exch_proc(i),
     >                   i,
     >                   mpi_comm_world,
     >                   request,
     >                   ierr )
         call mpi_send(  norm_temp1,
     >                   2,
     >                   dp_type,
     >                   reduce_exch_proc(i),
     >                   i,
     >                   mpi_comm_world,
     >                   ierr )
         call mpi_wait( request, status, ierr )
         if (timeron) call timer_stop(t_ncomm)
         norm_temp1(1) = norm_temp1(1) + norm_temp2(1)
         norm_temp1(2) = norm_temp1(2) + norm_temp2(2)
      enddo
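The hazard described in this report can be sketched in C without MPI. In the sketch below, `fake_irecv` and `fake_wait` are hypothetical stand-ins for mpi_irecv and mpi_wait (they are not real MPI calls): the first one only remembers the buffer address, and the data is written later, when the second one runs.

```c
#include <stddef.h>

/* State kept by the hypothetical asynchronous "library". */
static double *pending_buf;
static size_t pending_len;

/* Stand-in for mpi_irecv: nothing is written yet, but the buffer
   address is stored, i.e. it escapes the call. */
void fake_irecv(double *buf, size_t len)
{
    pending_buf = buf;
    pending_len = len;
}

/* Stand-in for mpi_wait: the receive completes here and the
   buffer registered earlier is finally filled in. */
void fake_wait(void)
{
    for (size_t i = 0; i < pending_len; i++)
        pending_buf[i] = (double)(i + 1);
    pending_buf = NULL;
    pending_len = 0;
}

/* Mirrors the benchmark loop body: post the receive, wait, then
   read the buffer. */
double sum_after_wait(void)
{
    double norm_temp2[2] = { 0.0, 0.0 };
    fake_irecv(norm_temp2, 2);
    fake_wait();
    return norm_temp2[0] + norm_temp2[1];
}
```

Here the compiler can see that the address escapes in `fake_irecv`, so `norm_temp2` is re-read from memory after `fake_wait`. If the compiler were instead told that the first argument does not escape, as the fnspec in the report claims for mpi_irecv, it would be entitled to treat the array elements as unchanged across the later call, which matches the reload-too-early behavior described above.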
[Bug fortran/120958] tree-sra "miscompiles" asynchronous MPI (mpi_irecv) in Fortran 77 because of wrong fnspec
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120958 --- Comment #1 from Martin Jambor --- And indeed the following hack in Fortran FE "fixes" the benchmark (of course, this is not meant as a proposed fix, just as a demonstration where the problem is):

diff --git a/gcc/fortran/trans-types.cc b/gcc/fortran/trans-types.cc
index 1754d982153..7ee65a983a7 100644
--- a/gcc/fortran/trans-types.cc
+++ b/gcc/fortran/trans-types.cc
@@ -3327,6 +3327,7 @@ create_fn_spec (gfc_symbol *sym, tree fntype)
        }
     }
+  bool is_mpi_irecv = (strcmp (sym->name, "mpi_irecv") == 0);
   for (f = gfc_sym_get_dummy_args (sym); f; f = f->next)
     if (spec_len < sizeof (spec))
       {
@@ -3342,6 +3343,7 @@ create_fn_spec (gfc_symbol *sym, tree fntype)
          }
        if (f->sym == NULL || is_pointer || f->sym->attr.target
+           || is_mpi_irecv
            || f->sym->attr.external || f->sym->attr.cray_pointer
            || (f->sym->ts.type == BT_DERIVED
                && (f->sym->ts.u.derived->attr.proc_pointer_comp
[Bug c/118948] [15/16 Regression] ICE: tree check: expected class 'type', have 'exceptional' (error_mark) in tree_single_nonnegative_warnv_p, at fold-const.cc:14878 since r15-328
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118948 --- Comment #7 from GCC Commits --- The trunk branch has been updated by Andrew Pinski : https://gcc.gnu.org/g:f24015a4c2ca6d6fbbf7090004b3a83081f18f03 commit r16-2026-gf24015a4c2ca6d6fbbf7090004b3a83081f18f03 Author: Andrew Pinski Date: Thu Jul 3 11:58:50 2025 -0700 fold: Change comparison of error_mark_node to use error_operand_p in tree_expr_nonnegative_warnv_p [PR118948] This is an obvious fix for this small regression. Basically after r15-328-g5726de79e2154a, there is a call to tree_expr_nonnegative_warnv_p where the type of the expression is now error_mark_node. Though there was only a check if the expression was error_mark_node. Bootstrapped and tested on x86_64-linux-gnu. PR c/118948 gcc/ChangeLog: * fold-const.cc (tree_expr_nonnegative_warnv_p): Use error_operand_p instead of checking for error_mark_node directly. gcc/testsuite/ChangeLog: * gcc.dg/pr118948-1.c: New test. Signed-off-by: Andrew Pinski
[Bug c/118948] [15/16 Regression] ICE: tree check: expected class 'type', have 'exceptional' (error_mark) in tree_single_nonnegative_warnv_p, at fold-const.cc:14878 since r15-328
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118948 Andrew Pinski changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED Target Milestone|15.2|16.0 --- Comment #8 from Andrew Pinski --- Fixed on the trunk
[Bug target/117850] GCC emits DUP, UMULL instead of UMULL2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117850 Andrew Pinski changed: What|Removed |Added Status|NEW |ASSIGNED --- Comment #7 from Andrew Pinski --- .
[Bug fortran/120958] tree-sra "miscompiles" asynchronous MPI (mpi_irecv) in Fortran 77 because of wrong fnspec
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120958 kargls at comcast dot net changed: What|Removed |Added CC||kargls at comcast dot net --- Comment #2 from kargls at comcast dot net --- (In reply to Martin Jambor from comment #0) > > The problem is that tree-sra is told by escape analysis that the > address of the first parameter of mpi_irecv does not escape. And so > the aggregate passed in that parameter is an SRA candidate and is > broken down into scalar components and these are reloaded immediately > after the function returns and not after a call of mpi_wait. Don't know what SRA is. What does it mean for a parameter to escape or not escape? In particular, this is likely not restricted to just Fortran 77. It may affect all versions of Fortran.
[Bug tree-optimization/120358] [15/16 regression] qtbase-6.9.0 miscompiled since r15-580-gf3e5f4c58591f5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120358 --- Comment #27 from Holger Hoffstätte --- If I add lifetime-extending printfs to QStringTokenizerBase::next() like this: --- qstringtokenizer.h 2025-07-04 17:56:28.523676630 +0200 +++ qstringtokenizer-printf.h 2025-07-04 17:56:03.998840901 +0200 @@ -389,11 +389,14 @@ auto QStringTokenizerBase return final element: result = m_haystack.sliced(state.start); +__builtin_printf("\nfinal: '%s'\n", result.toString().toStdString().c_str()); } if ((m_sb & Qt::SkipEmptyParts) && result.isEmpty()) { +__builtin_printf("\nskipping empty result: '%s'\n", result.toString().toStdString().c_str()); continue; } return {result, true, state}; ..it starts to work. Removing any one of these printfs breaks the lifetime of the intermediate "Haystack result" and leads to garbage when trying to find the last element.
[Bug libstdc++/120949] [16 regression] rejected with clang-20.1.7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120949 Jonathan Wakely changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |redi at gcc dot gnu.org Status|UNCONFIRMED |ASSIGNED Ever confirmed|0 |1 Last reconfirmed||2025-07-04
[Bug target/120957] [16 Regression] 6-9% slowdown of 503.bwaves_r on Zen{2,3} since r16-1647-gc06979ff957485
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120957 --- Comment #1 from Filip Kastl --- The slowdown is also present on 410.bwaves from 2006 SPEC https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=467.40.0 https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=301.40.0 again, both on Zen2 and Zen3
[Bug target/120943] [16 Regression] 5% slowdown of 527.cam4_r on Zen{4,5} since r16-1643-gd073bb6cfc219d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120943 --- Comment #3 from Filip Kastl --- (In reply to H.J. Lu from comment #1) > Please try: > > https://patchwork.sourceware.org/project/gcc/list/?series=48886 Yes, if I apply this patch, the slowdown goes away
[Bug tree-optimization/120951] [16 regression] error: gimple cond condition cannot throw
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120951 --- Comment #6 from Andrew Pinski --- use_internal_fn has:
```
  /* Skip the call if LHS == LHS.  If we reach here, EDOM is the only
     valid errno value and it is used iff the result is NaN.  */
  conds.quick_push (gimple_build_cond (EQ_EXPR, lhs, lhs,
                                       NULL_TREE, NULL_TREE));
```
This is only a problem because `a == a` can trap when signaling NaNs are involved, and with non-call exceptions it can cause an exception. The easiest and correct way to fix this is s/EQ_EXPR/ORDERED_EXPR/: the two are the same when there are no signaling NaNs, but with signaling NaNs ORDERED_EXPR is non-trapping, so it will work here.
[Bug libstdc++/118681] [C++17] unsynchronized_pool_resource may fail to respect alignment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118681 Jonathan Wakely changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |redi at gcc dot gnu.org Target Milestone|--- |13.5 Status|NEW |ASSIGNED
[Bug fortran/120958] tree-sra "miscompiles" asynchronous MPI (mpi_irecv) in Fortran 77 because of wrong fnspec
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120958 --- Comment #3 from Martin Jambor --- SRA is the "Scalar Replacement of Aggregates" pass. It is one of our optimization passes which can split up aggregates into scalar components and thus allow further optimizations. To "escape" means that the address of the parameter is stored somewhere in the global state of the program during the execution of the called function and so it can be modified through an alias or in a subsequent call to some function. My knowledge of Fortran is limited but my understanding is that later versions of Fortran introduced the ASYNCHRONOUS attribute to deal with these situations.
[Bug tree-optimization/120951] [16 regression] error: gimple cond condition cannot throw
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120951 --- Comment #7 from Andrew Pinski --- Created attachment 61800 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61800&action=edit Patch which I am testing
[Bug jit/120960] New: jit: The static initialization of builtin_data uses flag_openacc and flag_openmp (which have not been set yet)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120960 Bug ID: 120960 Summary: jit: The static initialization of builtin_data uses flag_openacc and flag_openmp (which have not been set yet) Product: gcc Version: 15.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: jit Assignee: dmalcolm at gcc dot gnu.org Reporter: jamborm at gcc dot gnu.org CC: antoyo at gcc dot gnu.org Target Milestone: --- In jit "FE," builtin_data array holding information about builtin functions is a static array and is statically initialized using macros DEF_GOACC_BUILTIN_COMPILER, DEF_GOMP_BUILTIN_COMPILER and others which use flag_openacc and flag_openmp to initialize one of the fields of elements of the array. Unless I am missing something obvious, because this is done in a static initializer, this happens before the flags have a chance to hold any meaningful value - although I am not sure whether there is such a thing as a meaningful value for these in the case of JIT. Of course, feel free to close this as WONTFIX if this is not deemed to be a problem for this use case.
[Bug fortran/120958] tree-sra "miscompiles" asynchronous MPI (mpi_irecv) in Fortran 77 because of wrong fnspec
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120958 --- Comment #4 from rguenther at suse dot de --- > On 04.07.2025 at 18:18, jamborm at gcc dot gnu.org wrote: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120958 > > --- Comment #3 from Martin Jambor --- > SRA is "Scalar Replacement of Aggregates" pass. It is one of our > optimization passes which can split up aggregates into scalar > components and thus allow further optimizations. > > To "escape" means that the address of the parameter is stored > somewhere in the global state of the program during the execution of > the called function and so it can be modified through an alias or in a > subsequent call to some function. > > My knowledge of Fortran is limited but my understanding is that later > versions of Fortran introduced the ASYNCHRONOUS attribute to deal with > these situations. In particular, Fortran 77 doesn't have POINTER, but Fortran 77 can call C functions (though not in a standard-defined way). It is probably uncommon for this case to happen on automatic variables. I'm unsure whether we can do better than either ignoring the problem or not setting any fnspec for any function in legacy/f77 standard programs.
[Bug target/120941] [16 Regression] 24-40% slowdown of 519.lbm_r on Zen2 since r16-1644-gaba3b9d3a48a07
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941 --- Comment #7 from Filip Kastl --- >(In reply to Filip Kastl from comment #0) > there was a 40% exec time slowdown (on another machine I measured only 24%) > of 527.cam4_r SPEC 2017 benchmark when run with -Ofast -march=native -flto and this should have said 519.lbm_r instead of 527.cam4_r. Apparently I was confused when writing this.
[Bug target/120941] [16 Regression] 10-40% slowdown of 519.lbm_r on Zen2 since r16-1644-gaba3b9d3a48a07
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941 Filip Kastl changed: What|Removed |Added Summary|[16 Regression] 10-40% slowdown of 519.lbm_r on Zen{2,3} since r16-1644-gaba3b9d3a48a07 |[16 Regression] 10-40% slowdown of 519.lbm_r on Zen2 since r16-1644-gaba3b9d3a48a07 --- Comment #6 from Filip Kastl --- (In reply to Haochen Jiang from comment #5) > I have tried on Zen3 Client machine with -Ofast -flto -march=native/znver2 > and see no regression on both options before and after the commit. Ah, my bad. The Zen3 slowdown must be something else. I forgot to check whether the dates match, and they don't (the Zen3 slowdown happened earlier than the Zen2 slowdown). So this seems to be a Zen2-only issue. Sorry for that.
[Bug c++/120954] [15 Regression] False positive -Warray-bounds=2 warning
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120954 --- Comment #3 from Sam James --- 15.1: https://godbolt.org/z/3bM5bEY7b 14.3: https://godbolt.org/z/fahjWYEfr 13.4: https://godbolt.org/z/KK3bh5Gz4 12.4: https://godbolt.org/z/fM3c913qh However, when compiling it as C++, I only see it on trunk (but with a few days old builds for the branches).