[Bug c++/96780] debuginfo for std::move and std::forward isn't useful

2023-08-02 Thread moncef.mechri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96780

Moncef Mechri  changed:

   What|Removed |Added

 CC||moncef.mechri at gmail dot com

--- Comment #18 from Moncef Mechri  ---
Currently, -ffold-simple-inlines is disabled when optimizations are disabled.

Since it is pretty much standard practice to disable optimizations in debug
builds (yes, I am aware that -Og exists), perhaps it would be a good idea to
make -ffold-simple-inlines opt-out instead of opt-in even for non-optimized
builds?
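For context, the C++ standard specifies `std::move(t)` as `static_cast<std::remove_reference_t<T>&&>(t)`, so a call to it does nothing beyond a cast. According to the GCC manual, `-ffold-simple-inlines` folds calls to `std::move`, `std::forward`, `std::addressof` and `std::as_const` in the front end, so neither a call nor debug information for one is emitted. The following is a minimal sketch of what gets folded. It uses a hypothetical `my_move` helper, which is not the libstdc++ implementation.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for std::move (not the libstdc++ source): the
// standard specifies std::move(t) as exactly this cast, so a call to it
// does nothing beyond changing the value category of its argument.
template <typename T>
constexpr std::remove_reference_t<T>&& my_move(T&& t) noexcept
{
    return static_cast<std::remove_reference_t<T>&&>(t);
}

// The value seen through the rvalue reference is unchanged; only the
// value category differs, so folding the call into the cast is lossless.
constexpr int read_through_move(int v)
{
    int&& r = my_move(v);
    return r;
}
```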

[Bug c++/111297] New: missed optimization: [[unlikely]] attribute has no effect at -O2/-O3/-Ofast

2023-09-05 Thread moncef.mechri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111297

Bug ID: 111297
   Summary: missed optimization: [[unlikely]] attribute has no
effect at -O2/-O3/-Ofast
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: moncef.mechri at gmail dot com
  Target Milestone: ---

Consider the following code:

extern bool CheckCondition(int i);
extern void DoWork();
extern void DoOtherWork();

void f1()
{
    if (CheckCondition(42)) [[likely]]
        DoWork();
    else
        DoOtherWork();
}

void f2()
{
    if (CheckCondition(42)) [[unlikely]]
        DoWork();
    else
        DoOtherWork();
}

The [[unlikely]] attribute in f2() seems to have no impact on codegen at -O2,
-O3, and -Ofast:

f1():
        sub     rsp, 8
        mov     edi, 42
        call    CheckCondition(int)
        test    al, al
        je      .L2
        add     rsp, 8
        jmp     DoWork()
.L2:
        add     rsp, 8
        jmp     DoOtherWork()
f2():
        sub     rsp, 8
        mov     edi, 42
        call    CheckCondition(int)
        test    al, al
        je      .L6
        add     rsp, 8
        jmp     DoWork()
.L6:
        add     rsp, 8
        jmp     DoOtherWork()


While the codegen for f1() looks good, the codegen I would have expected for
f2() is:

f2():
        sub     rsp, 8
        mov     edi, 42
        call    CheckCondition(int)
        test    al, al
        jne     .L8
        add     rsp, 8
        jmp     DoOtherWork()
.L8:
        add     rsp, 8
        jmp     DoWork()

Observations:

- All GCC versions since 9.1 (where support for [[likely]] and [[unlikely]] was
first added) seem to be affected.

- When f1() is commented out, the issue somehow disappears.

- Replacing [[likely]] / [[unlikely]] with __builtin_expect() seems to solve
the issue.

- Clang does not suffer from this issue (and neither does GCC at -O1).

https://godbolt.org/z/8o1njKvr1
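The `__builtin_expect()` workaround mentioned in the observations can be sketched as follows. `__builtin_expect(exp, c)` returns `exp` and tells GCC that `exp == c` is the expected case. A second argument of 0 therefore marks the `DoWork()` branch as unlikely, which matches the intent of `[[unlikely]]` in f2(). The bodies of `CheckCondition`, `DoWork` and `DoOtherWork` below are hypothetical constexpr stand-ins that return a branch id, so the sketch is self-contained. To reproduce the codegen difference, keep the report's `extern` declarations instead, because visible definitions would simply be inlined and folded away.

```cpp
// Hypothetical constexpr stand-ins for the report's extern functions.
// They return a branch id (1 = DoWork, 2 = DoOtherWork) only so the sketch
// is self-contained and checkable at compile time.
constexpr bool CheckCondition(int i) { return i < 0; }
constexpr int DoWork() { return 1; }
constexpr int DoOtherWork() { return 2; }

// f2() from the report with [[unlikely]] replaced by __builtin_expect
// (a GCC/Clang builtin). It returns its first argument unchanged; the
// second argument, 0, says the condition is expected to be false, which
// marks the DoWork() branch as the unlikely one.
constexpr int f2_expect(int i)
{
    if (__builtin_expect(CheckCondition(i), 0))
        return DoWork();
    else
        return DoOtherWork();
}
```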

[Bug target/110551] [11/12/13/14 Regression] an extra mov when doing 128bit multiply

2023-10-29 Thread moncef.mechri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110551

--- Comment #6 from Moncef Mechri  ---
I confirm the extra mov disappears thanks to Roger's patch.

However, the codegen still seems suboptimal to me when using -march=haswell or
newer, even with Roger's patch:

#include <cstdint>

uint64_t mulx64(uint64_t x)
{
    __uint128_t r = (__uint128_t)x * 0x9E3779B97F4A7C15ull;
    return (uint64_t)r ^ (uint64_t)( r >> 64 );
}


With -O2:

mulx64(unsigned long):
        movabs  rax, -7046029254386353131
        mul     rdi
        xor     rax, rdx
        ret

With -O2 -march=haswell:

mulx64(unsigned long):
        movabs  rdx, -7046029254386353131
        mulx    rdi, rsi, rdi
        mov     rax, rdi
        xor     rax, rsi
        ret

So it looks like there is still one extra mov, since I think the optimal
codegen using mulx should be:

mulx64(unsigned long):
        movabs  rdx, -7046029254386353131
        mulx    rax, rsi, rdi
        xor     rax, rsi
        ret
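As an aside, the `movabs` immediate in these listings is the multiplier `0x9E3779B97F4A7C15` printed as a signed 64-bit integer. The sketch below checks that relationship at compile time. It uses a hypothetical constexpr copy of `mulx64()` named `mulx64_ce` to show that the function returns the XOR of the low and high halves of the full 128-bit product. `mul` leaves that product in `rdx:rax`, while `mulx` writes it to two explicitly named registers. Like the original, the sketch relies on the GCC/Clang `__uint128_t` extension.

```cpp
#include <cstdint>

// Multiplier from the report. Read as a signed 64-bit value it is
// -7046029254386353131, the immediate shown in the movabs instructions.
constexpr std::uint64_t kMul = 0x9E3779B97F4A7C15ull;
static_assert(static_cast<std::int64_t>(kMul) == -7046029254386353131LL,
              "movabs immediate is kMul reinterpreted as signed");

// Hypothetical constexpr copy of mulx64(): form the full 128-bit product
// (what mul leaves in rdx:rax, or mulx writes to two chosen registers),
// then XOR its low and high 64-bit halves.
constexpr std::uint64_t mulx64_ce(std::uint64_t x)
{
    const __uint128_t r = static_cast<__uint128_t>(x) * kMul;
    return static_cast<std::uint64_t>(r) ^ static_cast<std::uint64_t>(r >> 64);
}
```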

[Bug target/110551] [11/12/13/14 Regression] an extra mov when doing 128bit multiply

2023-11-06 Thread moncef.mechri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110551

--- Comment #9 from Moncef Mechri  ---
With Roger's latest patch, codegen looks good with -O2 and -O2 -march=haswell.

Thanks!

I think this can be marked as resolved?

[Bug rtl-optimization/110551] New: [11 / 12 / 13 /14 regression] Suboptimal codegen for 128 bits multiplication on x86_64

2023-07-04 Thread moncef.mechri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110551

Bug ID: 110551
   Summary: [11 / 12 / 13 /14 regression] Suboptimal codegen for
128 bits multiplication on x86_64
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: moncef.mechri at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/3hdondY6n

Codegen for the code shared above (a mixing step used in Boost.Unordered
when a non-avalanching hash function is used [1]) has regressed since GCC
11. I believe there are 2 regressions:

Regression 1:

A redundant move is introduced:


movabs  rcx, -7046029254386353131
mov rax, rcx


The regression seems to be present at all optimization levels above -O0
(including -Os and -Og).

Possibly a duplicate of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94804


Regression 2:

When using -march=haswell or newer, GCC >= 11 emits mulx. The resulting code is
longer (by 1 instruction) with no clear benefit to my untrained eyes. It looks
to me like the code generated by GCC 10 is optimal, even for haswell and newer.


I am reporting both issues in the same bug report because they seem related
enough. Let me know if you want me to split them into 2 bug reports instead.

[1]
https://github.com/boostorg/unordered/blob/9a7d1d336aaa73ad8e5f7c07bdb81b2e793f8d93/include/boost/unordered/detail/mulx.hpp#L111
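On targets without `__uint128_t`, the same mixing step can be expressed with 32-bit partial products. The following is a minimal sketch under that assumption. The name `mulx64_portable` is hypothetical, and this is not Boost's actual code. It rebuilds the high and low 64-bit halves of the 128-bit product and XORs them, so it should agree with the `__uint128_t` version for every input.

```cpp
#include <cstdint>

// Hypothetical portable variant: compute the full 128-bit product of x and
// the multiplier from 32-bit partial products, then fold low ^ high.
constexpr std::uint64_t mulx64_portable(std::uint64_t x)
{
    constexpr std::uint64_t k = 0x9E3779B97F4A7C15ull;
    const std::uint64_t x_lo = x & 0xFFFFFFFFu, x_hi = x >> 32;
    const std::uint64_t k_lo = k & 0xFFFFFFFFu, k_hi = k >> 32;

    // Four 32x32 -> 64-bit partial products; none of them can overflow.
    const std::uint64_t lo_lo = x_lo * k_lo;
    const std::uint64_t hi_lo = x_hi * k_lo;
    const std::uint64_t lo_hi = x_lo * k_hi;
    const std::uint64_t hi_hi = x_hi * k_hi;

    // Middle column plus the carry out of the lowest 32 bits; the sum is
    // at most 2^64 - 1, so it still fits in 64 bits.
    const std::uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFu) + lo_hi;

    const std::uint64_t high = hi_hi + (hi_lo >> 32) + (cross >> 32);
    const std::uint64_t low = (cross << 32) | (lo_lo & 0xFFFFFFFFu);
    return low ^ high;
}
```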

[Bug target/110551] [11/12/13/14 Regression] an extra mov when doing 128bit multiply

2023-07-04 Thread moncef.mechri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110551

--- Comment #3 from Moncef Mechri  ---
> Please next time attach (which you can do paste in the box) or paste inline
> the testcase rather than just link to godbolt .

Noted. Apologies.

> It is an older regression though.
> ```
> #include 
> 
> void mulx64(uint64_t *x, uint64_t *t)
> {
> __uint128_t r = (__uint128_t)*x * 0x9E3779B97F4A7C15ull;
> *t = (uint64_t)r ^ (uint64_t)( r >> 64 );
> }
> ```
> 
> It is just an extra mov.
> 
> Also the mulx should have allowed the register allocator to do better but it
> was worse ...

It is true that with this new test case, all GCC versions (including GCC 10)
seem to suffer from both issues reported in the original post.

But the original test case only exhibits suboptimal codegen for GCC >= 11, as
shown in the godbolt link shared above.

[Bug c++/117813] [14 Regression] GCC14 + -fsanitize=undefined + -Os + recursive_directory_iterator results in undefined reference since r14-5979-g99d114c15523e0

2024-12-12 Thread moncef.mechri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117813

Moncef Mechri  changed:

   What|Removed |Added

 CC||moncef.mechri at gmail dot com

--- Comment #12 from Moncef Mechri  ---
I'm hitting a very similar issue, but with std::filesystem::directory_iterator:

#include <filesystem>

int main()
{
    for (const auto& filepath : std::filesystem::directory_iterator("/tmp"));
}

Compiling with -Os -fsanitize=undefined fails at link time:

/opt/compiler-explorer/gcc-14.2.0/bin/../lib/gcc/x86_64-linux-gnu/14.2.0/../../../../x86_64-linux-gnu/bin/ld:
/tmp/ccG3hzn8.o: in function `std::__shared_ptr::__shared_ptr()':
/opt/compiler-explorer/gcc-14.2.0/include/c++/14.2.0/bits/shared_ptr_base.h:1465:(.text.startup+0xde):
undefined reference to `std::__shared_ptr::__shared_ptr()'


As already described in this bug report, only GCC 14.1 and 14.2 seem to be
affected. The issue is present in neither GCC 13 nor GCC 15, and it does not
occur with any other sanitizer or optimization level.