https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110104

--- Comment #5 from Mason <slash.tmp at free dot fr> ---
FWIW, trunk (gcc14) compiles testcase3 to the same code as the other
testcases, while the source stays portable across all architectures:

$ gcc-trunk -O3 -march=bdver3 testcase3.c

typedef unsigned long long u64;
typedef unsigned __int128 u128;
void testcase3(u64 *acc, u64 a, u64 b)
{
  int c1, c2;
  u128 res = (u128)a * b;
  u64 lo = res, hi = res >> 64;
  c1 = __builtin_add_overflow(lo, acc[0], &acc[0]);
  c2 = __builtin_add_overflow(hi, acc[1], &acc[1])
     | __builtin_add_overflow(c1, acc[1], &acc[1]);
  __builtin_add_overflow(c2, acc[2], &acc[2]);
}

testcase3:
        movq    %rsi, %rax
        mulq    %rdx
        addq    %rax, (%rdi)
        adcq    %rdx, 8(%rdi)
        adcq    $0, 16(%rdi)
        ret
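testcase3 treats acc[0..2] as one 192-bit number (least significant limb first) and adds the 128-bit product a*b to it. The sketch below is a minimal cross-check of that reading. It assumes GCC or Clang on a 64-bit target with unsigned __int128. The helpers reference() and check() are hypothetical illustrations, not part of the bug report. reference() performs the same accumulate limb by limb with 128-bit sums, so each carry is just the high half of a sum.

```c
#include <assert.h>
#include <string.h>

typedef unsigned long long u64;
typedef unsigned __int128 u128;   /* GCC/Clang extension, 64-bit targets only */

/* testcase3 as posted: acc[0..2] += a * b, carries propagated by hand. */
void testcase3(u64 *acc, u64 a, u64 b)
{
  int c1, c2;
  u128 res = (u128)a * b;
  u64 lo = res, hi = res >> 64;
  c1 = __builtin_add_overflow(lo, acc[0], &acc[0]);
  c2 = __builtin_add_overflow(hi, acc[1], &acc[1])
     | __builtin_add_overflow(c1, acc[1], &acc[1]);
  __builtin_add_overflow(c2, acc[2], &acc[2]);
}

/* Hypothetical reference: the same 192-bit accumulate, limb by limb.
   Each partial sum fits in 128 bits, so its high half is the carry. */
static void reference(u64 *acc, u64 a, u64 b)
{
  u128 p = (u128)a * b;
  u128 t = (u128)acc[0] + (u64)p;
  acc[0] = (u64)t;
  t = (u128)acc[1] + (u64)(p >> 64) + (u64)(t >> 64);
  acc[1] = (u64)t;
  acc[2] += (u64)(t >> 64);   /* top limb wraps modulo 2^64, as in testcase3 */
}

/* Hypothetical helper: run both versions on copies of start[] and
   assert that they leave identical limbs. */
static void check(const u64 start[3], u64 a, u64 b)
{
  u64 x[3], y[3];
  memcpy(x, start, sizeof x);
  memcpy(y, start, sizeof y);
  testcase3(x, a, b);
  reference(y, a, b);
  assert(memcmp(x, y, sizeof x) == 0);
}
```

For example, calling check((u64[3]){~0ULL, ~0ULL, 0}, ~0ULL, ~0ULL) from a main() drives both carries at once, and the top limb ends up as 1. The `|` (rather than `||`) in testcase3 matters: it guarantees the second addition into acc[1] is always performed, even when the first one overflows.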

Thanks again, Jakub.
