https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97756
--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Patrick Palka from comment #3)
> Perhaps related to this PR: on x86_64, the following basic wrapper around
> int128 addition
>
>   __uint128_t f(__uint128_t x, __uint128_t y) { return x + y; }
>
> gets compiled (w/ -O3, -O2 or -Os) to the seemingly suboptimal
>
>   movq %rdi, %r9
>   movq %rdx, %rax
>   movq %rsi, %r8
>   movq %rcx, %rdx
>   addq %r9, %rax
>   adcq %r8, %rdx
>   ret
>
> Clang does:
>
>   movq %rdi, %rax
>   addq %rdx, %rax
>   adcq %rcx, %rsi
>   movq %rsi, %rdx
>   retq

Note that GCC shuffles all four input registers into temporaries before the
add/adc pair, while Clang adds directly into the incoming registers and needs
only two moves. Removing addti3/ashlti3 from i386.md also helps this.
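For context (illustration only, not from the bug report): the TImode addition
both compilers emit is the usual low/high split with a carry out of the low
half, which lowers to addq + adcq on x86-64. A minimal C sketch follows; the
u128 struct and add_u128 helper are hypothetical names chosen here.

  #include <stdint.h>

  /* Hypothetical two-halves representation of a 128-bit value. */
  typedef struct { uint64_t lo, hi; } u128;

  static u128 add_u128(u128 x, u128 y) {
      u128 r;
      r.lo = x.lo + y.lo;
      /* Unsigned wraparound: a carry occurred iff the low sum is
         smaller than either addend's low half. */
      r.hi = x.hi + y.hi + (r.lo < x.lo);
      return r;
  }

On x86-64 this is exactly one addq (producing the carry flag) plus one adcq
(consuming it); the difference between the two compilers above is only in how
many extra movq instructions they spend satisfying the return-value registers
%rax:%rdx.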