https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108862

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Note, -O2 -mcpu=power9:
__attribute__((noipa)) unsigned __int128
foo (unsigned __int128 x, unsigned long long y, unsigned long long z)
{
  return x + (unsigned __int128) y * z;
}

int
main ()
{
  unsigned __int128 x = foo (0, 0x04a13945d898c296ULL, 0x0000100000000fffULL);
  if ((unsigned long long) (x >> 64) != 0x0000004a13945dd3ULL
      || (unsigned long long) x != 0x9b1c8443b3909d6aULL)
    __builtin_abort ();
  return 0;
}
works correctly; in that case we get:
        maddhdu 10,5,6,3
        maddld 3,5,6,3
        add 4,10,4
which is correct.  But for the #c0 testcase above, e.g. with -O2
-fno-unroll-loops -mcpu=power9, we get:
.L3:
        ldu 9,8(8)
        ldu 10,-8(5)
        maddld 3,9,10,3
        maddhdu 9,9,10,3
        add 4,9,4
        bdnz .L3
in the inner loop, which looks wrong because maddhdu in that case uses the
result of maddld (which has already overwritten r3) as its last operand rather
than the low part of the 128-bit counter (w).
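
To illustrate why the instruction order matters, here is a minimal sketch in
C that models the two instructions and both orderings for a single
multiply-add step.  The helpers maddld/maddhdu model the instructions as I
understand them (low and high 64 bits of the unsigned RA * RB + RC), and the
names step_good/step_bad are hypothetical, used only for this illustration;
this is not the code from #c0.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* Assumed model of maddld RT,RA,RB,RC: low 64 bits of RA * RB + RC.  */
static uint64_t
maddld (uint64_t ra, uint64_t rb, uint64_t rc)
{
  return (uint64_t) ((u128) ra * rb + rc);
}

/* Assumed model of maddhdu RT,RA,RB,RC: high 64 bits of the unsigned
   RA * RB + RC.  */
static uint64_t
maddhdu (uint64_t ra, uint64_t rb, uint64_t rc)
{
  return (uint64_t) (((u128) ra * rb + rc) >> 64);
}

/* Order seen for foo (): maddhdu reads the old low part, then maddld
   overwrites it.  */
static void
step_good (uint64_t *lo, uint64_t *hi, uint64_t a, uint64_t b)
{
  uint64_t t = maddhdu (a, b, *lo);
  *lo = maddld (a, b, *lo);
  *hi += t;
}

/* Order seen in the .L3 loop: maddld overwrites the low part first, so
   maddhdu adds the new low part instead of the old one.  */
static void
step_bad (uint64_t *lo, uint64_t *hi, uint64_t a, uint64_t b)
{
  *lo = maddld (a, b, *lo);
  uint64_t t = maddhdu (a, b, *lo);
  *hi += t;
}
```

With the operands from the testcase above and a zero accumulator, step_good
gives the expected high part 0x0000004a13945dd3, while step_bad ends up with
a high part one larger: the new low part 0x9b1c8443b3909d6a has its top bit
set, so adding it to the product a second time produces a spurious carry.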
