https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97756
--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Patrick Palka from comment #3)
> Perhaps related to this PR: On x86_64, the following basic wrapper around
> int128 addition
>
> __uint128_t f(__uint128_t x, __uint128_t y) { return x + y; }
>
> gets compiled (/w -O3, -O2 or -Os) to the seemingly suboptimal
>
> movq %rdi, %r9
> movq %rdx, %rax
> movq %rsi, %r8
> movq %rcx, %rdx
> addq %r9, %rax
> adcq %r8, %rdx
> ret
>
> Clang does:
>
> movq %rdi, %rax
> addq %rdx, %rax
> adcq %rcx, %rsi
> movq %rsi, %rdx
> retq
Removing addti3/ashlti3 from i386.md also helps with this.