[Bug c++/96780] debuginfo for std::move and std::forward isn't useful
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96780

Moncef Mechri changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |moncef.mechri at gmail dot com

--- Comment #18 from Moncef Mechri ---
Currently, -ffold-simple-inlines is disabled when optimizations are disabled.

Since it is pretty much standard practice to disable optimizations in debug
builds (yes, I am aware that -Og exists), perhaps it would be a good idea to
make -ffold-simple-inlines opt-out instead of opt-in even for non-optimized
builds?
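For context, the GCC manual describes -ffold-simple-inlines as letting the C++ front end fold calls to std::move, std::forward, std::addressof and std::as_const. The sketch below illustrates why stepping into std::move is rarely useful in a debugger: it behaves like a plain cast. The names move_by_cast, take and moves_agree are hypothetical helpers written for this illustration, not part of libstdc++ or of the bug report.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for std::move: nothing more than a cast to an
// rvalue reference. This mirrors what the standard specifies for std::move.
template <typename T>
constexpr typename std::remove_reference<T>::type&& move_by_cast(T&& value) noexcept {
    return static_cast<typename std::remove_reference<T>::type&&>(value);
}

// Takes ownership of the string's contents.
inline std::string take(std::string&& s) { return std::move(s); }

// Both spellings hand the same value to take().
inline bool moves_agree() {
    std::string a = "debug";
    std::string b = "debug";
    std::string via_std = take(std::move(a));
    std::string via_cast = take(move_by_cast(b));
    return via_std == via_cast;
}
```

Without folding, each std::move call at -O0 is a real function call with its own frame, which is what makes single-stepping through such code tedious.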
[Bug c++/111297] New: missed optimization: [[unlikely]] attribute has no effect at -O2/-O3/-Ofast
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111297

            Bug ID: 111297
           Summary: missed optimization: [[unlikely]] attribute has no effect
                    at -O2/-O3/-Ofast
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: moncef.mechri at gmail dot com
  Target Milestone: ---

Consider the following code:

extern bool CheckCondition(int i);
extern void DoWork();
extern void DoOtherWork();

void f1()
{
    if (CheckCondition(42)) [[likely]]
        DoWork();
    else
        DoOtherWork();
}

void f2()
{
    if (CheckCondition(42)) [[unlikely]]
        DoWork();
    else
        DoOtherWork();
}

The [[unlikely]] attribute in f2() seems to have no impact on codegen at -O2,
-O3, and -Ofast:

f1():
        sub     rsp, 8
        mov     edi, 42
        call    CheckCondition(int)
        test    al, al
        je      .L2
        add     rsp, 8
        jmp     DoWork()
.L2:
        add     rsp, 8
        jmp     DoOtherWork()
f2():
        sub     rsp, 8
        mov     edi, 42
        call    CheckCondition(int)
        test    al, al
        je      .L6
        add     rsp, 8
        jmp     DoWork()
.L6:
        add     rsp, 8
        jmp     DoOtherWork()

While the codegen for f1() looks good, the codegen I would have expected for
f2() is:

f2():
        sub     rsp, 8
        mov     edi, 42
        call    CheckCondition(int)
        test    al, al
        jne     .L8
        add     rsp, 8
        jmp     DoOtherWork()
.L8:
        add     rsp, 8
        jmp     DoWork()

Observations:
- All GCC versions since 9.1 (where support for [[likely]] and [[unlikely]]
  was first added) seem impacted.
- When f1() is commented out, the issue somehow disappears.
- Replacing [[likely]] / [[unlikely]] with __builtin_expect() seems to solve
  the issue.
- Clang does not suffer from this issue (and neither does GCC at -O1).

https://godbolt.org/z/8o1njKvr1
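The __builtin_expect() workaround mentioned in the observations could look roughly like the sketch below. The function name f2_builtin_expect, the two counters and the stub bodies are assumptions added so the snippet links on its own. In the report the three helpers are extern, and they should stay extern when inspecting the generated assembly, since visible definitions can be inlined and change the result.

```cpp
// Stub definitions (assumption: the report only declares these as extern).
static int work_calls = 0;
static int other_calls = 0;

bool CheckCondition(int i) { return i == 42; }
void DoWork() { ++work_calls; }
void DoOtherWork() { ++other_calls; }

// f2() from the report, with the hint expressed through __builtin_expect
// instead of [[unlikely]]. The second argument (0) is the value the
// condition is expected to have, i.e. "usually false".
void f2_builtin_expect()
{
    if (__builtin_expect(CheckCondition(42), 0))
        DoWork();
    else
        DoOtherWork();
}
```

The hint only affects block layout and branch prediction heuristics; the branch taken at run time still depends on what CheckCondition() returns.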
[Bug target/110551] [11/12/13/14 Regression] an extra mov when doing 128bit multiply
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110551

--- Comment #6 from Moncef Mechri ---
I confirm the extra mov disappears thanks to Roger's patch.

However, the codegen still seems suboptimal to me when using -march=haswell or
newer, even with Roger's patch:

uint64_t mulx64(uint64_t x)
{
    __uint128_t r = (__uint128_t)x * 0x9E3779B97F4A7C15ull;
    return (uint64_t)r ^ (uint64_t)( r >> 64 );
}

With -O2:

mulx64(unsigned long):
        movabs  rax, -7046029254386353131
        mul     rdi
        xor     rax, rdx
        ret

With -O2 -march=haswell:

mulx64(unsigned long):
        movabs  rdx, -7046029254386353131
        mulx    rdi, rsi, rdi
        mov     rax, rdi
        xor     rax, rsi
        ret

So it looks like there is still one extra mov, since I think the optimal
codegen using mulx should be:

mulx64(unsigned long):
        movabs  rdx, -7046029254386353131
        mulx    rax, rsi, rdi
        xor     rax, rsi
        ret
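Whichever instruction sequence is emitted, the result must be the low half of the 128-bit product XORed with the high half. As a sanity check on that semantics, the sketch below compares the __uint128_t version (a GCC/Clang extension) against a portable variant built from 32-bit limbs. The names mulx64_portable and mulx64_agrees are hypothetical and come from neither the report nor Boost.

```cpp
#include <cstdint>

// The function from the report: full 64x64 -> 128 multiply, folded to 64 bits.
inline std::uint64_t mulx64(std::uint64_t x)
{
    __uint128_t r = (__uint128_t)x * 0x9E3779B97F4A7C15ull;
    return (std::uint64_t)r ^ (std::uint64_t)(r >> 64);
}

// Hypothetical portable variant: schoolbook multiplication on 32-bit limbs.
inline std::uint64_t mulx64_portable(std::uint64_t x)
{
    const std::uint64_t y = 0x9E3779B97F4A7C15ull;
    const std::uint64_t mask = 0xFFFFFFFFull;

    const std::uint64_t x_lo = x & mask, x_hi = x >> 32;
    const std::uint64_t y_lo = y & mask, y_hi = y >> 32;

    const std::uint64_t ll = x_lo * y_lo;
    const std::uint64_t lh = x_lo * y_hi;
    const std::uint64_t hl = x_hi * y_lo;
    const std::uint64_t hh = x_hi * y_hi;

    // Sum of the three terms that land in bits 32..63; it fits in 64 bits,
    // and its upper half is the carry into the high word.
    const std::uint64_t mid = (ll >> 32) + (lh & mask) + (hl & mask);

    const std::uint64_t lo = (mid << 32) | (ll & mask);
    const std::uint64_t hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
    return lo ^ hi;
}

// True when both variants agree for the given input.
inline bool mulx64_agrees(std::uint64_t x)
{
    return mulx64(x) == mulx64_portable(x);
}
```

The portable variant is only a reference for checking results; it says nothing about which of the mul or mulx sequences above is faster.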
[Bug target/110551] [11/12/13/14 Regression] an extra mov when doing 128bit multiply
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110551

--- Comment #9 from Moncef Mechri ---
With Roger's latest patch, codegen looks good with -O2 and -O2 -march=haswell.
Thanks!

I think this can be marked as resolved?
[Bug rtl-optimization/110551] New: [11 / 12 / 13 /14 regression] Suboptimal codegen for 128 bits multiplication on x86_64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110551

            Bug ID: 110551
           Summary: [11 / 12 / 13 /14 regression] Suboptimal codegen for
                    128 bits multiplication on x86_64
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: moncef.mechri at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/3hdondY6n

Codegen for the code shared above (which is a mixing step in boost.Unordered
when a non-avalanching hash function is being used [1]) regressed since
GCC 11.

I believe there are 2 regressions:

Regression 1:

A redundant move is introduced:

        movabs  rcx, -7046029254386353131
        mov     rax, rcx

The regression seems to be present at all optimization levels above -O0
(including -Os and -Og).

Possibly a duplicate of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94804

Regression 2:

When using -march=haswell or newer, GCC >= 11 emits mulx. The resulting code
is longer (by 1 instruction) with no clear benefit to my untrained eyes.
It looks to me like the code generated by GCC 10 is optimal, even for haswell
and newer.

I am reporting both issues in the same bug report because they seem related
enough. Let me know if you want me to split them into 2 bug reports instead.

[1]
https://github.com/boostorg/unordered/blob/9a7d1d336aaa73ad8e5f7c07bdb81b2e793f8d93/include/boost/unordered/detail/mulx.hpp#L111
[Bug target/110551] [11/12/13/14 Regression] an extra mov when doing 128bit multiply
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110551

--- Comment #3 from Moncef Mechri ---
> Please next time attach (which you can do paste in the box) or paste inline
> the testcase rather than just link to godbolt .

Noted. Apologies.

> It is an older regression though.
> ```
> #include <stdint.h>
>
> void mulx64(uint64_t *x, uint64_t *t)
> {
>   __uint128_t r = (__uint128_t)*x * 0x9E3779B97F4A7C15ull;
>   *t = (uint64_t)r ^ (uint64_t)( r >> 64 );
> }
> ```
>
> It is just an extra mov.
>
> Also the mulx should have allowed the register allocator to do better but it
> was worse ...

It is true that with this new test case, all GCC versions (including GCC 10)
seem to suffer from both issues reported in the original post.

But the original test case only exhibits suboptimal codegen for GCC >= 11, as
shown in the godbolt link shared above.
[Bug c++/117813] [14 Regression] GCC14 + -fsanitize=undefined + -Os + recursive_directory_iterator results in undefined reference since r14-5979-g99d114c15523e0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117813

Moncef Mechri changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |moncef.mechri at gmail dot com

--- Comment #12 from Moncef Mechri ---
I'm hitting a very similar issue, but with std::filesystem::directory_iterator:

#include <filesystem>

int main()
{
    for (const auto& filepath : std::filesystem::directory_iterator("/tmp"));
}

Compiled with -Os -fsanitize=undefined:

opt/compiler-explorer/gcc-14.2.0/bin/../lib/gcc/x86_64-linux-gnu/14.2.0/../../../../x86_64-linux-gnu/bin/ld: /tmp/ccG3hzn8.o: in function `std::__shared_ptr::__shared_ptr()':
/opt/compiler-explorer/gcc-14.2.0/include/c++/14.2.0/bits/shared_ptr_base.h:1465:(.text.startup+0xde): undefined reference to `std::__shared_ptr::__shared_ptr()'

Similar to what has already been described in this bug report, only GCC 14.1
and 14.2 seem impacted. The issue is not present in GCC 13 or 15, nor when
using any other sanitizer or optimization level.