[Bug target/79185] [8 Regression] register allocation in the addition of two 128 bit ints

2021-06-11 Thread drraph at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79185

--- Comment #17 from Raphael C  ---
Tested in gcc 11.1 with -O2

ai(__int128, __int128):
mov r9, rdi
mov rax, rdx
mov r8, rsi
mov rdx, rcx
add rax, r9
adc rdx, r8
ret


This looks like two more movs than needed, but I may be wrong.

By contrast clang gives

ai(__int128, __int128):  
mov rax, rdi
add rax, rdx
adc rsi, rcx
mov rdx, rsi
ret
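
For readers without the bug's attachment to hand, here is a minimal sketch of what the two listings compute. The body of ai is an assumption: only its signature is taken from the assembly label, and the original test case is not quoted in this comment. U128, add_with_carry and check_carry are hypothetical names added purely to illustrate the add/adc pair.

```cpp
#include <cassert>
#include <cstdint>

// Assumed reconstruction: the signature matches the assembly label,
// but the body is a guess at the original test case.
__int128 ai(__int128 a, __int128 b) { return a + b; }

// Hypothetical helper type: a 128-bit value as two 64-bit halves,
// mirroring the register pairs rdi:rsi, rdx:rcx and rax:rdx.
struct U128 { std::uint64_t lo, hi; };

// What the add/adc pair does: add the low halves, then add the high
// halves plus the carry out of the low addition.
U128 add_with_carry(U128 a, U128 b) {
    U128 r;
    r.lo = a.lo + b.lo;                           // add
    std::uint64_t carry = (r.lo < a.lo) ? 1 : 0;  // carry flag
    r.hi = a.hi + b.hi + carry;                   // adc
    return r;
}

// Checks that a carry out of the low half reaches the high half and
// that the built-in 128-bit addition agrees.
void check_carry() {
    U128 r = add_with_carry({~0ULL, 0}, {1, 0});
    assert(r.lo == 0 && r.hi == 1);

    __int128 s = ai((__int128)~0ULL, (__int128)1);  // (2^64 - 1) + 1
    assert((std::uint64_t)s == r.lo);
    assert((std::uint64_t)(s >> 64) == r.hi);
}
```

Under the x86-64 System V calling convention the two arguments arrive in rdi:rsi and rdx:rcx and the result is returned in rax:rdx, so some register shuffling is unavoidable; the question in this bug is only how many moves it takes.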

[Bug tree-optimization/79201] missed optimization: sinking doesn't handle calls, swap PRE and sinking

2021-06-11 Thread drraph at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79201

--- Comment #5 from Raphael C  ---
I can confirm you now get

f:
mov eax, 1
ret

with gcc 8 onwards.

[Bug tree-optimization/79201] missed optimization: sinking doesn't handle calls, swap PRE and sinking

2024-06-06 Thread drraph at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79201

--- Comment #7 from Raphael C  ---
As of gcc 8 this returns:

f(int):
mov eax, 1
ret

I think this can be closed as resolved now.

[Bug tree-optimization/79726] Missing optimisation: Type conversion not vectorised in simple additive reduction

2024-06-06 Thread drraph at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79726

--- Comment #3 from Raphael C  ---
Issue still present in gcc 14.1
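
As a reminder of the kind of loop the bug title describes, here is a hypothetical sketch, not the bug's actual test case: an additive reduction in which each element is converted to another type before being accumulated. The function name, the float to double direction of the conversion, and the check are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example: every float element is widened to double
// inside the reduction loop before being added to the running sum.
double sum_widening(const float* x, std::size_t n) {
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        total += static_cast<double>(x[i]);  // conversion inside the reduction
    }
    return total;
}

// Small inputs that are exactly representable, so the result is exact.
void check_sum_widening() {
    const float data[] = {1.0f, 2.0f, 3.5f, 0.5f};
    assert(sum_widening(data, 4) == 7.0);
}
```

Note that GCC only reorders a floating-point reduction like this when reassociation is permitted, for example with -Ofast or -ffast-math, so any comparison of generated code should use such flags.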

[Bug tree-optimization/71414] 2x slower than clang summing small float array, GCC should consider larger vectorization factor for "unrolling" reductions

2023-06-06 Thread drraph at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71414

Raphael C changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |drraph at gmail dot com

--- Comment #12 from Raphael C  ---
This problem has been recently discussed at:

https://stackoverflow.com/questions/76407241/why-is-cython-so-much-slower-than-numba-for-this-simple-loop
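
To make the "larger vectorization factor" idea in the bug title concrete, here is a scalar sketch of the same transformation done by hand. It is an illustration under assumptions, not code from the bug or from the linked question: the function names are hypothetical, and four accumulators is an arbitrary choice. Splitting the sum across independent accumulators shortens the dependency chain on the floating-point add, which is what unrolling a vectorised reduction achieves with several vector registers.

```cpp
#include <cassert>
#include <cstddef>

// Baseline: one accumulator, so every add waits for the previous one.
float sum_single(const float* a, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) total += a[i];
    return total;
}

// Hand-unrolled variant with four independent accumulators.
// This reassociates the sum, so results can differ in the last bits
// for inputs that are not exactly representable.
float sum_four_accumulators(const float* a, std::size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    float total = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i) total += a[i];  // leftover elements
    return total;
}

// Small integers are exact in float, so both versions must agree here.
void check_sums() {
    const float data[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    assert(sum_single(data, 10) == 55.0f);
    assert(sum_four_accumulators(data, 10) == 55.0f);
}
```

Because the rewrite changes the order of the additions, a compiler may only do it automatically when floating-point reassociation is allowed, for example with -ffast-math.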