[Bug cobol/120422] Reducing strcmp() and strlen() gcc/cobol/genapi.cc at f3a62dcfc96cb24127385a7e668133e037b6085d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120422 --- Comment #1 from Sam James --- Please send patches to gcc-patches@ (https://gcc.gnu.org/contribute.html) once they're ready. It's okay to include WIP stuff on Bugzilla, but patches on BZ won't get reviewed seriously or applied.
[Bug target/120423] New: ICE in avr-gcc extract_constrain_insn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120423 Bug ID: 120423 Summary: ICE in avr-gcc extract_constrain_insn Product: gcc Version: 15.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: thierer at web dot de Target Milestone: --- Created attachment 61513 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61513&action=edit avr-gcc -freport-bug output Output of "avr-gcc -freport-bug" is attached. Might be a duplicate of #116389, because it likewise doesn't crash with -mlra. Let me know if you need more details. Thanks! Martin
[Bug tree-optimization/120426] XMM store isn't used
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120426 --- Comment #4 from Andrew Pinski --- (In reply to H.J. Lu from comment #3) > (In reply to Andrew Pinski from comment #2) > > With -mtune=sapphirerapids we get: > > > >[local count: 1073741824]: > > MEM [(union *)lock_2(D)] = 0; > > MEM [(union *)lock_2(D) + 8B] = 0; > > MEM [(union *)lock_2(D) + 16B] = 1; > > MEM [(union *)lock_2(D) + 24B] = { 0, 0 }; > > > > > > Which is because we don't combine "memset"s into large ones yet. > > Which is PR 49872. > > Do you have a patch I can try? Or I can extend STV pass to cover this > particular > case. I don't have a patch right now, maybe in 2 or 3 weeks. It is one of the things I am working towards
[Bug middle-end/115539] Misoptimization of application at -O2 -g on x86-64 causing segfaults on valid memory accesses where it works on both clang and gcc at -g (no -O2)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115539 Sam James changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |INVALID
[Bug target/71657] Wrong code on trunk gcc (std::out_of_range), westmere
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71657 --- Comment #15 from Sam James --- (In reply to Sam James from comment #14) > /* Disabled due to PRs 70902, 71453, 71555, 71596 and 71657. */ > > All of those except for PR71453 were dependent on tom's fix (PR83327) so > should be ready to revisit? https://inbox.sourceware.org/gcc-patches/cafuld4b2uqj+kwxsokpanopnjizofklfgztxa0fnaxo1_qr...@mail.gmail.com/
[Bug target/118996] Should TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P return false for x86-64?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118996 Sam James changed: What|Removed |Added Keywords||plugin --- Comment #17 from Sam James --- https://inbox.sourceware.org/gcc-patches/20250429214016.2469132-1-hjl.to...@gmail.com/
[Bug rtl-optimization/110823] [missed optimization] >50% speedup for x86-64 ASCII processing a la GNU diffutils
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110823 Sam James changed: What|Removed |Added Last reconfirmed||2025-05-25 Ever confirmed|0 |1 See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=61094 Status|UNCONFIRMED |NEW
[Bug rtl-optimization/111143] [missed optimization] unlikely code slows down diffutils x86-64 ASCII processing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111143 Sam James changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2025-05-25
[Bug target/119083] Remove SSE_FIRST_REG from ix86_class_likely_spilled_p
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119083 Sam James changed: What|Removed |Added Keywords||patch --- Comment #11 from Sam James --- https://inbox.sourceware.org/gcc-patches/20250429214021.2469148-1-hjl.to...@gmail.com/
[Bug tree-optimization/120426] XMM store isn't used
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120426 --- Comment #3 from H.J. Lu --- (In reply to Andrew Pinski from comment #2) > With -mtune=sapphirerapids we get: > >[local count: 1073741824]: > MEM [(union *)lock_2(D)] = 0; > MEM [(union *)lock_2(D) + 8B] = 0; > MEM [(union *)lock_2(D) + 16B] = 1; > MEM [(union *)lock_2(D) + 24B] = { 0, 0 }; > > > Which is because we don't combine "memset"s into large ones yet. > Which is PR 49872. Do you have a patch I can try? Or I can extend STV pass to cover this particular case.
[Bug target/86772] [meta-bug] tracking port status for CVE-2017-5753
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86772 --- Comment #4 from GCC Commits --- The releases/gcc-14 branch has been updated by Michael Eager : https://gcc.gnu.org/g:f644e21ee364405213a8609bbd8371c27fdb69d9 commit r14-11803-gf644e21ee364405213a8609bbd8371c27fdb69d9 Author: Michael J. Eager Date: Sat May 24 14:54:55 2025 -0700 MicroBlaze does not support speculative execution (CVE-2017-5753) gcc/ PR target/86772 Tracking CVE-2017-5753 * config/microblaze/microblaze.cc (TARGET_HAVE_SPECULATION_SAFE_VALUE): Define to speculation_save_value_not_needed
[Bug target/120427] [12/13/14/15/16 Regression] "and $0,mem" is generated without -Oz since r12-6106-gef26c151c14a87
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120427 Sam James changed: What|Removed |Added Keywords||missed-optimization Summary|[12/13/14/15/16 Regression] |[12/13/14/15/16 Regression] |"and $0,mem" is generated |"and $0,mem" is generated |without -Oz |without -Oz since ||r12-6106-gef26c151c14a87 Target Milestone|--- |12.5
[Bug target/120424] [arm] -fnon-call-exceptions -fstack-clash-protection triggers lra-eliminations bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120424 --- Comment #2 from Alexandre Oliva --- Created attachment 61516 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61516&action=edit candidate patch This patch likely fixes bug 118929 as well.
[Bug middle-end/118939] [14/15/16 Regression] ada: executable segfaults on arm-linux-gnueabi when assigning an access to controlled type since r14-2653-g2971ff7b1d564a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118939 Alexandre Oliva changed: What|Removed |Added CC||aoliva at gcc dot gnu.org --- Comment #23 from Alexandre Oliva --- FWIW, the patch for bug 120424 is supposed to fix the underlying lra-eliminations.cc problem.
[Bug target/120424] [arm] -fnon-call-exceptions -fstack-clash-protection triggers lra-eliminations bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120424 --- Comment #3 from Alexandre Oliva --- err, make that PR118939
[Bug target/109982] csmith: x86_64: znver1 issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109982 Sam James changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |INVALID --- Comment #17 from Sam James --- I think any real issue here was handled by PR109780 and friends. Let's close this one because of the attribute misuse.
[Bug tree-optimization/120425] [12/13/14/15/16 regression] GCC-compiled (with -O{1,2,s,3}) program got segfault from GCC 12.1 since r12-476-gd846f225c25c58
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120425 --- Comment #6 from Andrew Pinski --- Created attachment 61517 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61517&action=edit Gimple testcase that fails before GCC 12
[Bug tree-optimization/120425] [12/13/14/15/16 regression] GCC-compiled (with -O{1,2,s,3}) program got segfault from GCC 12.1 since r12-476-gd846f225c25c58
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120425 --- Comment #7 from Andrew Pinski --- (In reply to Sam James from comment #5) > r12-476-gd846f225c25c58 I think that just exposed the issue. My gimple testcase can only go back to GCC 9 which fails also.
[Bug tree-optimization/120425] [12/13/14/15/16 regression] GCC-compiled (with -O{1,2,s,3}) program got segfault from GCC 12.1 since r12-476-gd846f225c25c58
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120425 --- Comment #8 from Andrew Pinski --- (In reply to Sam James from comment #5) > r12-476-gd846f225c25c58 What Fre1 is doing seems to be ok and correct: Working (GCC 11): _3 = -f_9; ... _2 = -1211051206 - _3; Vs not working (GCC 12+): _3 = -f_9; _2 = f_9 + -1211051206;
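For readers following the two dumps quoted in comment #8, here is a minimal sketch of why the rewrite is value-preserving. It is not GCC code: the function names are hypothetical, and unsigned 32-bit arithmetic is used as an assumption so that wraparound is well defined in C. It only illustrates the identity a - (-f) == f + a, which is consistent with the remark that what FRE1 does looks correct.

```c
#include <stdint.h>

/* Shape of the GCC 11 dump: negate first, then subtract the
   negated value from the constant. */
uint32_t sub_of_negation(uint32_t f, uint32_t a)
{
    uint32_t neg = 0u - f;   /* models _3 = -f_9 */
    return a - neg;          /* models _2 = a - _3 */
}

/* Shape of the GCC 12+ dump: the negation is folded into an add. */
uint32_t folded_add(uint32_t f, uint32_t a)
{
    return f + a;            /* models _2 = f_9 + a */
}
```

Both functions return the same value for every input pair, so any difference in runtime behaviour of the testcase would have to come from a later pass, as the comment suggests.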
[Bug target/86792] microblaze port needs updating for CVE-2017-5753
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86792 --- Comment #1 from Sam James --- r14-11803-gf644e21ee36440 fixes this but it's missing on releases/gcc-15 and trunk...
[Bug target/120424] New: [arm] -fnon-call-exceptions -fstack-clash-protection triggers lra-eliminations bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120424 Bug ID: 120424 Summary: [arm] -fnon-call-exceptions -fstack-clash-protection triggers lra-eliminations bug Product: gcc Version: 14.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: aoliva at gcc dot gnu.org Reporter: aoliva at gcc dot gnu.org Target Milestone: --- Created attachment 61514 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61514&action=edit testcase regresses on arm-eabi, arm-linux-gnu, arm-vx7r2, ... starting at gcc 14 The attached C++ testcase triggers the problem. It's rewritten in C++ from libgnat (where -fnon-call-exceptions is enabled by default, and -fstack-clash-protection is enabled explicitly). I don't know yet whether C++ is essential, but the throw appears to be needed to trip the issue. g() has an empty frame until reload spills a register that needs to survive a function call. That spilling flips arm_frame_pointer_required from false to true, so lra_update_fp2sp_elimination proceeds to disable that elimination possibility, and ultimately we fail to adjust the spill stack slot's negative offset, so it remains below the stack pointer. The workaround is to disable the size == 0 frame pointer optimization. I'm still investigating towards a proper fix.
[Bug c++/120320] g++ freezes forever
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120320 --- Comment #4 from Chameleon --- Indeed, skipping the step that determines which of these partial specializations is most constrained fixes the problem. But if the failing algorithm is not fixed to handle HUGE DNF/CNF constraints, it must at least bail out with a diagnostic.
[Bug c/120425] New: GCC-compiled with -O{1,2,s,3} program got segfault from GCC 12.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120425 Bug ID: 120425 Summary: GCC-compiled with -O{1,2,s,3} program got segfault from GCC 12.1 Product: gcc Version: 16.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: congli at smail dot nju.edu.cn Target Milestone: --- Starting with GCC version 12.1, compiling the following t.c using optimization flags -O1, -O2, -Os, or -O3 results in a segmentation fault at runtime. ``` $ cat t.c unsigned a[] = { 0, 4, 8, 4, 124634137, 5, 5, 5, 249268274, 2044508324, 0, 4, 5, 1, 3887607047, 2428444049, 8, 1789927666, 4089016648, 4, 50548861, 3, 107580753, 2211677639, 0, 2, 4251122042, 2321926636, 7, 5, 5, 7, 7073096,6, 2, 2, 1006888145, 607687, 101629, 3, 901097722, 1119000684, 6, 8065728,1, 1, 705015759, 5, 651767980, 6, 4, 104598, 565507253, 1, 3485111705, 3099436303, 4, 1594198024, 30930, 70347812, 795835527, 1483230225, 5, 3060149565, 2, 4, 2563907772, 4023717930, 907459465, 112637215, 3, 904427059, 2013776290, 6, 4, 3775830040, 3, 3, 9, 7, 802195444, 6, 8001368,4066508878, 70925, 3092731,2181625025, 3, 706088902, 4, 2344532202, 2, 1, 366619977, 3, 5, 1303535960, 6, 7007092,3569037538, 70817, 1, 3, 3554079995, 6, 6, 2909243462, 6, 7, 7, 1, 708648649, 8, 654459306, 6048, 4, 1466479909, 544179635, 10523913, 5, 4, 702138776, 0, 2, 504918807, 783551873, 3082640443, 9, 4, 2596254646, 7068, 1957810842, 5, 2647816111, 70997, 1943803523, 0, 4, 0, 2053790376, 3826175755, 3, 3, 2097651377, 4027552580, 2265490386, 2, 1762050814, 5, 5, 5, 1852507879, 6, 0, 6, 2, 708143, 5, 397917763, 7, 604390888, 8, 953729732, 6, 3518719985, 60999, 1068828381, 9, 0, 8, 906185462, 1090812512, 3747672003, 9, 5, 1, 4, 60834842, 628085408, 1382605366, 3423369109, 8078467,570562233, 400815, 3317316542, 608, 4, 1555261956, 1, 5, 3, 1541320221, 607071920, 0, 2, 40735498, 2617837225, 1, 3087877,83908371, 4, 803740692, 2075208622, 213261112, 3, 90285, 2094854071, 1, 2029012,0, 2, 0, 1, 5, 1873836001, 
7, 4, 200368, 4, 6, 2405801727, 5, 5, 1, 5067896,608007406, 1308918612, 8, 808555105, 3495958263, 1, 5, 8, 3654703836, 1088359270, 0, 9, 9, 202900863, 7, 108,0, 1404277552, 0, 207493, 3453421203, 1423857449, 1, 3009837614, 3294710456, 1567103746, 711928724, 3020668471, 3272380065, 5, 755167117}; int b, c[] = {1911263494, 774465782, 4379194,669572660, -1452495846, -1658729425, 1103267782, -90393310, 1635864740, -1, -1238002948, -351663323, -576056573, 1233623753, -1844776976, -1531764644, -319456054, 1797911602, -684072473, -1155699931}; int h(int i) { unsigned e = 4294967295; for (int d = 0; d < i; ++d) { e = e >> 8 ^ a[(e ^ c[d]) & 255]; e = e >> 8 ^ a[(e ^ c[d] >> 8) & 255]; e = e >> 8 ^ a[(e ^ c[d] >> 16) & 255]; e = e >> 8 ^ a[(e ^ c[d] >> 24) & 255]; } e = e ^ 4294967295; return e; } int main() { int f = 987751161, g = -1211051206; goto aq; g: f = -b + g - 1767812960; aq: b = -f; if ((h(20) + 1788482227) * b >= 0) return 0; while (h(0)) __builtin_abort(); goto g; } $ gcc -O1 t.c $ ./a.out # <-- segfault ``` See also Compiler Explore: https://godbolt.org/z/5v4dq5M7M
[Bug middle-end/120425] GCC-compiled (with -O{1,2,s,3}) program got segfault from GCC 12.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120425 --- Comment #1 from Andrew Pinski --- I don't get a seg fault but an abort.
[Bug tree-optimization/120425] GCC-compiled (with -O{1,2,s,3}) program got segfault from GCC 12.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120425 --- Comment #2 from congli --- (In reply to Andrew Pinski from comment #1) > I don't get a seg fault but an abort. Seems like you are true. The Compiler Explore shows segfault but I also got an abort on one of my server. Since it is a reduced version, how about this: ``` unsigned g[256] = {0, 1996959894, 3993919788, 2567524794, 124634137, 1886057615, 3915621685, 2657392035, 249268274, 2044508324, 3772115230, 2547177864, 162941995, 2125561021, 3887607047, 2428444049, 498536548, 1789927666, 4089016648, 2227061214, 450548861, 1843258603, 4107580753, 2211677639, 325883990, 1684777152, 4251122042, 2321926636, 335633487, 1661365465, 4195302755, 2366115317, 997073096, 1281953886, 3579855332, 2724688242, 1006888145, 1258607687, 3524101629, 2768942443, 901097722, 1119000684, 3686517206, 2898065728, 853044451, 1172266101, 3705015759, 2882616665, 651767980, 1373503546, 3369554304, 3218104598, 565507253, 1454621731, 3485111705, 3099436303, 671266974, 1594198024, 3322730930, 2970347812, 795835527, 1483230225, 3244367275, 3060149565, 1994146192, 31158534, 2563907772, 4023717930, 1907459465, 112637215, 2680153253, 3904427059, 2013776290, 251722036, 2517215374, 3775830040, 2137656763, 141376813, 2439277719, 3865271297, 1802195444, 476864866, 2238001368, 4066508878, 1812370925, 453092731, 2181625025, 4111451223, 1706088902, 314042704, 2344532202, 4240017532, 1658658271, 366619977, 2362670323, 4224994405, 1303535960, 984961486, 2747007092, 3569037538, 1256170817, 1037604311, 2765210733, 3554079995, 1131014506, 879679996, 2909243462, 3663771856, 1141124467, 855842277, 2852801631, 3708648649, 1342533948, 654459306, 3188396048, 3373015174, 1466479909, 544179635, 3110523913, 3462522015, 1591671054, 702138776, 2966460450, 3352799412, 1504918807, 783551873, 3082640443, 3233442989, 3988292384, 2596254646, 62317068, 1957810842, 3939845945, 2647816111, 81470997, 1943803523, 3814918930, 2489596804, 225274430, 2053790376, 3826175755, 
2466906013, 167816743, 2097651377, 4027552580, 2265490386, 503444072, 1762050814, 4150417245, 2154129355, 42655, 1852507879, 4275313526, 2312317920, 282753626, 1742555852, 4189708143, 2394877945, 397917763, 1622183637, 3604390888, 2714866558, 953729732, 1340076626, 3518719985, 2797360999, 1068828381, 1219638859, 3624741850, 2936675148, 906185462, 1090812512, 3747672003, 2825379669, 829329135, 1181335161, 3412177804, 3160834842, 628085408, 1382605366, 3423369109, 3138078467, 570562233, 1426400815, 3317316542, 2998733608, 733239954, 1555261956, 3268935591, 3050360625, 752459403, 1541320221, 2607071920, 3965973030, 1969922972, 40735498, 2617837225, 3943577151, 1913087877, 83908371, 2512341634, 3803740692, 2075208622, 213261112, 2463272603, 3855990285, 2094854071, 198958881, 2262029012, 4057260610, 1759359992, 534414190, 2176718541, 4139329115, 1873836001, 414664567, 2282248934, 4279200368, 1711684554, 285281116, 2405801727, 4167216745, 1634467795, 376229701, 2685067896, 3608007406, 1308918612, 956543938, 2808555105, 3495958263, 1231636301, 1047427035, 2932959818, 3654703836, 1088359270, 936918000, 2847714899, 3736837829, 1202900863, 817233897, 3183342108, 3401237130, 1404277552, 615818150, 3134207493, 3453421203, 1423857449, 601450431, 3009837614, 3294710456, 1567103746, 711928724, 3020668471, 3272380065, 1510334235, 755167117}; int r[20] = {1911263494, 774465782, 4379194, 669572660, -1452495846, -1658729425, 1103267782, -90393310, 1635864740, -1, -1238002948, -351663323, -576056573, 1233623753, -1844776976, -1531764644, -319456054, 1797911602, -684072473, -1155699931}; int gri(int i) { asm volatile ( "" : : "m"(r) : "memory" ); return r[i]; } int f(int av) { unsigned int aa = 4294967295; for (int d = 0; d < av; ++d) { int ab = gri(d); aa = aa >> 8 ^ g[(aa ^ ab) & 255]; aa = aa >> 8 ^ g[(aa ^ (ab >> 8)) & 255]; aa = aa >> 8 ^ g[(aa ^ (ab >> 16)) & 255]; aa = aa >> 8 ^ g[(aa ^ (ab >> 24)) & 255]; } aa = aa ^ 4294967295; return aa; } int main() { int k = 987751161, am = 
-1211051206; int l = 0, q = 0; if (am) goto aq; ao: goto aw; y: k = -q + am - 1767812960; aq: q = -k - 54680564; if ((f(20) + 1788482227) * q >= 0) return 0; aw: l = -q - k - 54680564; f(0); if (l) goto ao; goto y; } ``` I tried on my server and got a segfault.
[Bug target/120428] New: [15/16 regression] Suboptimal autovec involving blocked permutation and std::copy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120428 Bug ID: 120428 Summary: [15/16 regression] Suboptimal autovec involving blocked permutation and std::copy Product: gcc Version: 15.1.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: shawn at shawnxu dot org Target Milestone: --- On x86-64 with avx512, PR115444 caused the following code to vectorize sub-optimally: template void permute(T (&data)[N], const std::array& order) { constexpr std::size_t TotalSize = N * sizeof(T); static_assert(TotalSize % (BlockSize * OrderSize) == 0, "ChunkSize * OrderSize must perfectly divide TotalSize"); constexpr std::size_t ProcessChunkSize = BlockSize * OrderSize; std::array buffer{}; std::byte* const bytes = reinterpret_cast(data); for (std::size_t i = 0; i < TotalSize; i += ProcessChunkSize) { std::byte* const values = &bytes[i]; for (std::size_t j = 0; j < OrderSize; j++) { auto* const buffer_chunk = &buffer[j * BlockSize]; auto* const value_chunk = &values[order[j] * BlockSize]; std::copy(value_chunk, value_chunk + BlockSize, buffer_chunk); } std::copy(std::begin(buffer), std::end(buffer), values); } } void permute_weights(std::int16_t (&biases)[4096]) { static constexpr std::array order{0, 2, 4, 6, 1, 3, 5, 7}; permute<16>(biases, order); } * Before PR11544: $ ../gcc-before/bin/g++ -S -O3 -mavx512f permute.cpp $ cat permute.s .file "permute.cpp" .text #APP .globl _ZSt21ios_base_library_initv #NO_APP .p2align 4 .globl _Z15permute_weightsRA4096_s .type _Z15permute_weightsRA4096_s, @function _Z15permute_weightsRA4096_s: .LFB2070: .cfi_startproc leaq16(%rdi), %rax leaq8208(%rdi), %rdx .p2align 4 .p2align 3 .L2: vmovdqu 48(%rax), %xmm4 vmovdqu 80(%rax), %xmm3 subq$-128, %rax vmovdqu -128(%rax), %xmm2 vmovdqu -96(%rax), %xmm1 vmovdqu -64(%rax), %xmm0 vmovdqu -112(%rax), %xmm5 vmovdqu %xmm3, -96(%rax) vmovdqu %xmm4, -112(%rax) vmovdqu %xmm5, -128(%rax) vmovdqu %xmm2, -80(%rax) vmovdqu %xmm1, -64(%rax) vmovdqu 
%xmm0, -48(%rax) cmpq%rdx, %rax jne .L2 ret .cfi_endproc .LFE2070: .size _Z15permute_weightsRA4096_s, .-_Z15permute_weightsRA4096_s .ident "GCC: (GNU) 15.0.0 20241016 (experimental)" .section.note.GNU-stack,"",@progbits * After PR11544: $ ../gcc-after/bin/g++ -S -O3 -mavx512f permute.cpp $ cat permute.s .file "permute.cpp" .text #APP .globl _ZSt21ios_base_library_initv #NO_APP .p2align 4 .globl _Z15permute_weightsRA4096_s .type _Z15permute_weightsRA4096_s, @function _Z15permute_weightsRA4096_s: .LFB2059: .cfi_startproc pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16 movq%rdi, %rax leaq8192(%rdi), %rdx movq%rsp, %rbp .cfi_def_cfa_register 6 andq$-64, %rsp subq$8, %rsp .p2align 4 .p2align 3 .L2: vmovdqu (%rax), %xmm0 subq$-128, %rax vmovdqa %xmm0, -120(%rsp) vmovdqu -96(%rax), %xmm0 vmovdqa %xmm0, -104(%rsp) vmovdqu -64(%rax), %xmm0 vmovdqa %xmm0, -88(%rsp) vmovdqu -32(%rax), %xmm0 vmovdqa %xmm0, -72(%rsp) vmovdqu -112(%rax), %xmm0 vmovdqa %xmm0, -56(%rsp) vmovdqu -80(%rax), %xmm0 vmovdqa %xmm0, -40(%rsp) vmovdqu -48(%rax), %xmm0 vmovdqa %xmm0, -24(%rsp) vmovdqu -16(%rax), %xmm0 vmovdqa %xmm0, -8(%rsp) vmovdqa64 -120(%rsp), %zmm0 vmovdqu64 %zmm0, -128(%rax) vmovdqa64 -56(%rsp), %zmm0 vmovdqu64 %zmm0, -64(%rax) cmpq%rdx, %rax jne .L2 vzeroupper leave .cfi_def_cfa 7, 8 ret .cfi_endproc .LFE2059: .size _Z15permute_weightsRA4096_s, .-_Z15permute_weightsRA4096_s .ident "GCC: (GNU) 15.0.0 20241016 (experimental)" .section.note.GNU-stack,"",@progbits Example assembly generation: https://godbolt.org/z/q1hjxajdo No regression observed when replacing std::copy with std::memcpy: https://godbolt.org/z/Kq5ae7ePo Benchmarking on a slightly different (larger array, aligned storage) variant shows 50% slowdown with the single register version: https://pastebin.com/bKrAPFWj
[Bug target/120428] [15/16 regression] Suboptimal autovec involving blocked permutation and std::copy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120428 Andrew Pinski changed: What|Removed |Added Keywords||missed-optimization Target||x86_64 Target Milestone|--- |15.2
[Bug target/120429] New: pcmpeqd isn't used for all 1s in *movv2si_internal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120429 Bug ID: 120429 Summary: pcmpeqd isn't used for all 1s in *movv2si_internal Product: gcc Version: 16.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: hjl.tools at gmail dot com CC: liuhongt at gcc dot gnu.org, ubizjak at gmail dot com Target Milestone: --- [hjl@gnu-zen4-1 pr117839]$ cat dl-3.c struct __pthread_mutex_s { int __lock; unsigned int __count; int __owner; unsigned int __nusers; int __kind; short __spins; short __elision; void *p[2]; }; typedef union { struct __pthread_mutex_s __data; char __size[40]; long int __align; } pthread_mutex_t; typedef struct { pthread_mutex_t mutex; } __rtld_lock_recursive_t; void foo (__rtld_lock_recursive_t *lock, int i) { lock[i] = (__rtld_lock_recursive_t) {{ { -1, -1, -1, -1, 1, -1, -1, { ((void *)-1) , ((void *)-1) } } }}; } [hjl@gnu-zen4-1 pr117839]$ /export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/xgcc -B/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/ -O2 -march=x86-64 -S dl-3.c -m32 [hjl@gnu-zen4-1 pr117839]$ cat dl-3.s .file "dl-3.c" .text .p2align 4 .globl foo .type foo, @function foo: .LFB0: .cfi_startproc movl8(%esp), %eax movl4(%esp), %edx movq.LC1, %xmm0 <<< pcmpeqd %xmm0, %xmm0 should be used instead leal(%eax,%eax,4), %eax leal(%edx,%eax,8), %eax movl$-1, (%eax) movl$-1, 4(%eax) movl$-1, 8(%eax) movl$-1, 12(%eax) movl$1, 16(%eax) movl$-1, 20(%eax) movq%xmm0, 24(%eax) ret .cfi_endproc .LFE0: .size foo, .-foo .section.rodata.cst8,"aM",@progbits,8 .align 8 .LC1: .long -1 .long -1 .ident "GCC: (GNU) 16.0.0 20250524 (experimental)" .section.note.GNU-stack,"",@progbits [hjl@gnu-zen4-1 pr117839]$
[Bug target/120426] New: XMM store isn't used
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120426 Bug ID: 120426 Summary: XMM store isn't used Product: gcc Version: 16.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: hjl.tools at gmail dot com CC: liuhongt at gcc dot gnu.org Target Milestone: --- Target: x86-64 [hjl@gnu-zen4-1 pr117839]$ cat dl-1.c struct __pthread_mutex_s { int __lock; unsigned int __count; int __owner; unsigned int __nusers; int __kind; short __spins; short __elision; void *p[2]; }; typedef union { struct __pthread_mutex_s __data; char __size[40]; long int __align; } pthread_mutex_t; typedef struct { pthread_mutex_t mutex; } __rtld_lock_recursive_t; void foo (__rtld_lock_recursive_t *lock) { *lock = (__rtld_lock_recursive_t) {{ { 0, 0, 0, 0, 1, 0, 0, { ((void *)0) , ((void *)0) } } }}; } [hjl@gnu-zen4-1 pr117839]$ make dl-1.s /export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/xgcc -B/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/ -O2 -fPIC -S dl-1.c [hjl@gnu-zen4-1 pr117839]$ make dl-1-spr.s /export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/xgcc -B/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/ -O2 -fPIC -mtune=sapphirerapids -S -o dl-1-spr.s dl-1.c [hjl@gnu-zen4-1 pr117839]$ cat dl-1.s .file "dl-1.c" .text .p2align 4 .globl foo .type foo, @function foo: .LFB0: .cfi_startproc pxor%xmm0, %xmm0 movq$0, 32(%rdi) movups %xmm0, 16(%rdi) movups %xmm0, (%rdi) movl$1, 16(%rdi) ret .cfi_endproc .LFE0: .size foo, .-foo .ident "GCC: (GNU) 16.0.0 20250524 (experimental)" .section.note.GNU-stack,"",@progbits [hjl@gnu-zen4-1 pr117839]$ cat dl-1-spr.s .file "dl-1.c" .text .p2align 4 .globl foo .type foo, @function foo: .LFB0: .cfi_startproc pxor%xmm0, %xmm0 movq$0, (%rdi) movq$0, 8(%rdi) movq$1, 16(%rdi) movups %xmm0, 24(%rdi) ret .cfi_endproc .LFE0: .size foo, .-foo .ident "GCC: (GNU) 16.0.0 20250524 (experimental)" .section.note.GNU-stack,"",@progbits [hjl@gnu-zen4-1 pr117839]$ 
The optimized code should be:
        pxor    %xmm0, %xmm0
        movups  %xmm0, (%rdi)
        movq    $1, 16(%rdi)
        movups  %xmm0, 24(%rdi)
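As a rough check of the store sequence requested above, the sketch below writes the same 40 bytes in two ways and lets a caller compare them. It is only an illustration under stated assumptions: the type and function names are hypothetical (renamed from the testcase so they do not clash with the real pthread types), and it assumes an LP64 little-endian target such as x86-64, where the 8-byte store of 1 at offset 16 sets __kind to 1 and the two short fields to 0.

```c
#include <stdint.h>
#include <string.h>

/* Same field layout as the dl-1.c testcase, with hypothetical names. */
struct mutex_fields {
    int lock; unsigned int count; int owner; unsigned int nusers;
    int kind; short spins; short elision; void *p[2];
};
typedef union {
    struct mutex_fields data;
    char size[40];
    long int align;
} lock_t;

/* What the source expresses: a single aggregate assignment. */
void init_by_assignment(lock_t *lock)
{
    memset(lock, 0, sizeof *lock);
    lock->data = (struct mutex_fields){ 0, 0, 0, 0, 1, 0, 0, { 0, 0 } };
}

/* The three stores from the report, modelled as byte copies. */
void init_by_three_stores(lock_t *lock)
{
    unsigned char *bytes = (unsigned char *)lock;
    const unsigned char zero16[16] = { 0 };
    const uint64_t one = 1;           /* kind = 1, spins = elision = 0 (little-endian) */

    memcpy(bytes, zero16, 16);        /* movups %xmm0, (%rdi)   */
    memcpy(bytes + 16, &one, 8);      /* movq   $1, 16(%rdi)    */
    memcpy(bytes + 24, zero16, 16);   /* movups %xmm0, 24(%rdi) */
}
```

Comparing the two results with memcmp() over sizeof(lock_t) bytes should show identical contents on the assumed target. The 16 + 8 + 16 byte split covers the whole object without the overlapping store at offset 16 seen in the generic-tuning output.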
[Bug target/120427] New: [12/13/14/15/16 Regression] "and $0, mem" is generated without -Oz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120427 Bug ID: 120427 Summary: [12/13/14/15/16 Regression] "and $0,mem" is generated without -Oz Product: gcc Version: 16.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: hjl.tools at gmail dot com CC: liuhongt at gcc dot gnu.org, roger at nextmovesoftware dot com, ubizjak at gmail dot com Target Milestone: --- commit ef26c151c14a87177d46fd3d725e7f82e040e89f Author: Roger Sayle Date: Thu Dec 23 12:33:07 2021 + x86: PR target/103773: Fix wrong-code with -Oz from pop to memory. transformed "mov $0,mem" to the shorter "and $0,mem" for -Oz. But
(define_insn "*mov<mode>_and"
  [(set (match_operand:SWI248 0 "memory_operand" "=m")
        (match_operand:SWI248 1 "const0_operand"))
   (clobber (reg:CC FLAGS_REG))]
  "reload_completed"
  "and{<imodesuffix>}\t{%1, %0|%0, %1}"
  [(set_attr "type" "alu1")
   (set_attr "mode" "<MODE>")
   (set_attr "length_immediate" "1")])
isn't guarded for -Oz. As a result, "and $0,mem" is generated without -Oz.
[Bug target/120427] [12/13/14/15/16 Regression] "and $0,mem" is generated without -Oz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120427 H.J. Lu changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2025-05-24 Ever confirmed|0 |1 --- Comment #1 from H.J. Lu --- [hjl@gnu-zen4-1 pr117839]$ cat dl-2.c struct __pthread_mutex_s { int __lock; unsigned int __count; int __owner; unsigned int __nusers; int __kind; short __spins; short __elision; void *p[2]; }; typedef union { struct __pthread_mutex_s __data; char __size[40]; long int __align; } pthread_mutex_t; typedef struct { pthread_mutex_t mutex; } __rtld_lock_recursive_t; void foo (__rtld_lock_recursive_t *lock, int i) { lock[i] = (__rtld_lock_recursive_t) {{ { 0, 0, 0, 0, 1, 0, 0, { ((void *)0) , ((void *)0) } } }}; } [hjl@gnu-zen4-1 pr117839]$ make dl-2.s /export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/xgcc -B/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/ -O2 -fPIC -mtune=sapphirerapids -S dl-2.c [hjl@gnu-zen4-1 pr117839]$ cat dl-2.s .file "dl-2.c" .text .p2align 4 .globl foo .type foo, @function foo: .LFB0: .cfi_startproc movslq %esi, %rsi pxor%xmm0, %xmm0 leaq(%rsi,%rsi,4), %rax movq$1, 16(%rdi,%rax,8) andq$0, (%rdi,%rax,8) andq$0, 8(%rdi,%rax,8) movups %xmm0, 24(%rdi,%rax,8) ret .cfi_endproc .LFE0: .size foo, .-foo .ident "GCC: (GNU) 16.0.0 20250524 (experimental)" .section.note.GNU-stack,"",@progbits [hjl@gnu-zen4-1 pr117839]$
[Bug target/120417] gcc -m32 -O1 codegen error, leading to SIGSEGV, workaround -fno-tree-coalesce-vars
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120417 --- Comment #3 from Simon Sobisch --- Created attachment 61511 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61511&action=edit save-temps: preprocessed and assembly
[Bug tree-optimization/120426] XMM store isn't used
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120426 Andrew Pinski changed: What|Removed |Added Last reconfirmed||2025-05-25 Status|UNCONFIRMED |NEW See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87901 Component|target |tree-optimization Ever confirmed|0 |1 Severity|normal |enhancement CC||pinskia at gcc dot gnu.org --- Comment #1 from Andrew Pinski --- I am not 100% sure which is better. At the gimple level this looks like: *lock_2(D) = {}; lock_2(D)->mutexD.2972.__dataD.2967.__kindD.2962 = 1; which means it is related to PR 87901 (which was just fixed for GCC 16), though in this case the non-zero store is in the middle rather than at the ends. Right now we only trim the zeroing store at either end rather than splitting it into 2 stores.
[Bug tree-optimization/120426] XMM store isn't used
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120426 Andrew Pinski changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=79716, ||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=49872 --- Comment #2 from Andrew Pinski --- With -mtune=sapphirerapids we get: [local count: 1073741824]: MEM [(union *)lock_2(D)] = 0; MEM [(union *)lock_2(D) + 8B] = 0; MEM [(union *)lock_2(D) + 16B] = 1; MEM [(union *)lock_2(D) + 24B] = { 0, 0 }; Which is because we don't combine "memset"s into large ones yet. Which is PR 49872.
[Bug tree-optimization/120425] [12/13/14/15/16 regression] GCC-compiled (with -O{1,2,s,3}) program got segfault from GCC 12.1 since r12-476-gd846f225c25c58
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120425 Sam James changed: What|Removed |Added Keywords|needs-bisection | Summary|[12/13/14/15/16 regression] |[12/13/14/15/16 regression] |GCC-compiled (with |GCC-compiled (with |-O{1,2,s,3}) program got|-O{1,2,s,3}) program got |segfault from GCC 12.1 |segfault from GCC 12.1 ||since ||r12-476-gd846f225c25c58 CC||rguenth at gcc dot gnu.org --- Comment #5 from Sam James --- r12-476-gd846f225c25c58
[Bug tree-optimization/120425] GCC-compiled (with -O{1,2,s,3}) program got segfault from GCC 12.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120425 Andrew Pinski changed: What|Removed |Added Known to work||11.4.0 Last reconfirmed||2025-05-24 Ever confirmed|0 |1 Target Milestone|--- |12.5 Keywords||needs-bisection Status|UNCONFIRMED |NEW Known to fail||12.1.0 --- Comment #4 from Andrew Pinski --- Confirmed.
[Bug cobol/120422] New: Reducing strcmp() and strlen() gcc/cobol/genapi.cc at f3a62dcfc96cb24127385a7e668133e037b6085d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120422 Bug ID: 120422 Summary: Reducing strcmp() and strlen() gcc/cobol/genapi.cc at f3a62dcfc96cb24127385a7e668133e037b6085d Product: gcc Version: 16.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: cobol Assignee: unassigned at gcc dot gnu.org Reporter: kaelfandrew at gmail dot com Target Milestone: --- Created attachment 61512 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61512&action=edit Patch strcmp() and strlen() are called a lot. This patch should reduce the number of calls to both functions by storing their results in local variables. Please check if I missed anything.
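To make the idea in this report concrete, here is a small sketch of the general pattern of keeping strlen() and strcmp() results in locals. It is not taken from genapi.cc or from the attached patch; every function name below is hypothetical and the code is plain C chosen only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Before: strlen() appears in the loop condition, so it may be
   called again on every iteration. */
size_t count_spaces_naive(const char *s)
{
    size_t n = 0;
    for (size_t i = 0; i < strlen(s); i++)
        if (s[i] == ' ')
            n++;
    return n;
}

/* After: the length is computed once and kept in a local. */
size_t count_spaces_cached(const char *s)
{
    const size_t len = strlen(s);
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (s[i] == ' ')
            n++;
    return n;
}

/* One strcmp() call whose result feeds both the equality flag and
   the returned sign, instead of comparing twice. */
int compare_once(const char *a, const char *b, int *equal_out)
{
    const int cmp = strcmp(a, b);
    *equal_out = (cmp == 0);
    return cmp < 0 ? -1 : (cmp > 0 ? 1 : 0);
}
```

Whether a compiler already hoists such calls depends on what it can prove about the string, so caching by hand mainly helps where the string might appear to be modified between calls.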
[Bug preprocessor/120421] -save-temps affects diagnostic position
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120421 --- Comment #6 from nightstrike --- (In reply to Andrew Pinski from comment #2) > Dup. > > *** This bug has been marked as a duplicate of bug 95695 *** Respectfully, I don't think this is a duplicate of a bug that was resolved as invalid. That implies that this bug is also invalid. From the user perspective, though, adding the "-save-temps" option shouldn't break or even change the diagnostic output. If tracking column info works without -save-temps, then it should work with it. If that wasn't clear from my initial description of the issue, then please let me know, and I can expand with more details.
[Bug tree-optimization/120357] [14/15/16 Regression] ICE in vect "error: definition in block 9 does not dominate use in block 3" with early break
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120357 --- Comment #7 from Tamar Christina --- (In reply to Richard Biener from comment #5) > Confirmed on trunk. I'll eventually have a look. Sorry I'm on holiday till Tuesday, I'm happy to take a look then if you prefer. I did not mean to dump my bugs on you.
[Bug tree-optimization/120383] Improving early break unrolled sequences with Adv. SIMD
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120383 --- Comment #2 from Tamar Christina --- (In reply to Richard Biener from comment #1) > Sure, I'm OK with an optab for it. So it's like (half-type)((unsigned)(a + > b) >> (sizeof(a)*4))? Yeah, and if an optab is acceptable I was also planning to add a vectorizer pattern for it. It's a _hi/_lo instruction because of the narrowing. > Does the instruction also work for scalars? Only FPR scalars (as with most of these complex instructions). For GPR scalars we would have to emulate it. Do you have a preference for the name, btw?
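The expression quoted in this exchange, (half-type)((unsigned)(a + b) >> (sizeof(a)*4)), can be modelled in scalar C as below. This is only a sketch of those semantics for 32-bit elements, where sizeof(a)*4 is 16; the function names are hypothetical and do not correspond to any existing optab, builtin, or intrinsic.

```c
#include <stddef.h>
#include <stdint.h>

/* Add two 32-bit values with wraparound and keep the high half of
   the sum as a 16-bit result. */
uint16_t add_high_narrow_u32(uint32_t a, uint32_t b)
{
    return (uint16_t)((a + b) >> 16);
}

/* Element-wise loop form, the kind of shape a vectorizer pattern
   would presumably need to recognize. */
void add_high_narrow_array(uint16_t *dst, const uint32_t *a,
                           const uint32_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = add_high_narrow_u32(a[i], b[i]);
}
```

The result is half as wide as the inputs, which is the narrowing that motivates the _hi/_lo split mentioned in the reply.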
[Bug libstdc++/112349] ranges::min/max make unnecessary copies
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112349 --- Comment #6 from Ted Lyngmo --- I think this can be closed. If I'm not mistaken, it was fixed in 14.2.1.
[Bug target/120424] [arm] -fnon-call-exceptions -fstack-clash-protection triggers lra-eliminations bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120424 Sam James changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=118939 --- Comment #1 from Sam James --- See PR118939 as well.
[Bug tree-optimization/120425] GCC-compiled (with -O{1,2,s,3}) program got segfault from GCC 12.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120425 --- Comment #3 from Andrew Pinski --- Created attachment 61515 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61515&action=edit Semi reduced Removed the arrays
[Bug libstdc++/99832] std::chrono::system_clock::to_time_t needs ABI tag for 32-bit time_t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99832 --- Comment #9 from John David Anglin --- Created attachment 61510 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61510&action=edit Revised Debian Patch
[Bug target/119966] [16 regression] pru: Invalid register in RTL expression starting with r16-160-ge6f89d78c1a752
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119966 Dimitar Dimitrov changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #10 from Dimitar Dimitrov --- This issue was fixed with r16-809-gf725d6765373f7.