[Bug target/79185] [8 Regression] register allocation in the addition of two 128/9 bit ints
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79185

--- Comment #17 from Raphael C ---
Tested in gcc 11.1 with -O2

ai(__int128, __int128):
        mov     r9, rdi
        mov     rax, rdx
        mov     r8, rsi
        mov     rdx, rcx
        add     rax, r9
        adc     rdx, r8
        ret

This looks like two more mov's than needed but I may be wrong. By contrast
clang gives

ai(__int128, __int128):
        mov     rax, rdi
        add     rax, rdx
        adc     rsi, rcx
        mov     rdx, rsi
        ret
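For readers without the original report to hand, the listings above are consistent with a test case along the following lines. This is a minimal sketch, not the reporter's exact source: only the demangled signature ai(__int128, __int128) comes from the comment, and the body (a plain addition) is an assumption inferred from the add/adc pair in the output.

```cpp
// Assumed reconstruction of the test case; the body is a guess based on
// the add/adc sequence in the generated code, not taken from the report.
// __int128 is a GCC/Clang extension available on 64-bit targets.
__int128 ai(__int128 a, __int128 b)
{
    // On x86-64 this lowers to a 64-bit add of the low halves followed by
    // an add-with-carry (adc) of the high halves.
    return a + b;
}
```

Compiling this with -O2 and inspecting the assembly (for example with -S -masm=intel) should allow the mov counts of the two compilers to be compared directly.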
[Bug tree-optimization/79201] missed optimization: sinking doesn't handle calls, swap PRE and sinking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79201

--- Comment #5 from Raphael C ---
I can confirm you now get

f:
        mov     eax, 1
        ret

with gcc 8 onwards.
[Bug tree-optimization/79201] missed optimization: sinking doesn't handle calls, swap PRE and sinking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79201

--- Comment #7 from Raphael C ---
As of gcc 8 this returns:

f(int):
        mov     eax, 1
        ret

I think this can be closed as resolved now.
[Bug tree-optimization/79726] Missing optimisation: Type conversion not vectorised in simple additive reduction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79726

--- Comment #3 from Raphael C ---
Issue still present in gcc 14.1
[Bug tree-optimization/71414] 2x slower than clang summing small float array, GCC should consider larger vectorization factor for "unrolling" reductions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71414

Raphael C changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |drraph at gmail dot com

--- Comment #12 from Raphael C ---
This problem has been recently discussed at:
https://stackoverflow.com/questions/76407241/why-is-cython-so-much-slower-than-numba-for-this-simple-loop