https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116135

            Bug ID: 116135
           Summary: __builtin_mul_overflow inefficient for _BitInt(31)
                    (with widening multiply)
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

Take:
```
int f1(unsigned _BitInt(31) x, unsigned _BitInt(31) y, unsigned _BitInt(31) *res)
{
   return __builtin_mul_overflow(x, y, res);
}
```


Currently on aarch64 GCC produces:
```
        and     w0, w0, 2147483647
        and     w1, w1, 2147483647
        umull   x1, w0, w1
        and     w3, w1, 2147483647
        str     w3, [x2]
        cmp     xzr, x1, lsr 32
        cset    w0, ne
        cmp     w3, w1
        csinc   w0, w0, wzr, eq
        ret
```

While LLVM produces:
```
        and     w8, w1, #0x7fffffff
        and     w9, w0, #0x7fffffff
        umull   x8, w9, w8
        ubfx    x9, x8, #31, #1
        tst     x8, #0xffffffff00000000
        and     w8, w8, #0x7fffffff
        str     w8, [x2]
        csinc   w0, w9, wzr, eq
        ret
```

What LLVM produces is slightly better.

But neither is as good as what we get if we just do:
```
int f2(unsigned _BitInt(31) x, unsigned _BitInt(31) y, unsigned _BitInt(31) *res)
{
  unsigned long long xx = x;
  unsigned long long yy = y;
  unsigned long long t = xx * yy;
  *res = t;
  return (t >> 31) != 0;
}
```

What is interesting is that neither GCC nor LLVM figures out that f2 is just
__builtin_mul_overflow.
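
To illustrate why the widening-multiply form is equivalent to an overflow check, here is a minimal sketch that avoids `_BitInt` entirely, so it builds with any C99 compiler. It assumes an `unsigned _BitInt(31)` value can be emulated by a `uint32_t` masked to its low 31 bits. The names `MASK31`, `mul31_wide`, `mul31_ref`, `mul31_mismatches` and `mul31_selfcheck` are hypothetical and are not part of GCC or LLVM. `mul31_wide` mirrors f2, while `mul31_ref` detects overflow with a division-based test.

```c
#include <assert.h>
#include <stdint.h>

/* Low 31 bits: the value range of an emulated unsigned _BitInt(31). */
#define MASK31 0x7fffffffu

/* Mirrors f2: widen, multiply, and report overflow if any bit >= 31 is set. */
static int mul31_wide(uint32_t x, uint32_t y, uint32_t *res)
{
    uint64_t t = (uint64_t)(x & MASK31) * (uint64_t)(y & MASK31);
    *res = (uint32_t)(t & MASK31);
    return (t >> 31) != 0;
}

/* Reference: overflow iff x * y > MASK31, tested without a wide product. */
static int mul31_ref(uint32_t x, uint32_t y, uint32_t *res)
{
    x &= MASK31;
    y &= MASK31;
    *res = (x * y) & MASK31;            /* the low 31 bits survive wrapping */
    return x != 0 && y > MASK31 / x;
}

/* Counts the sampled input pairs on which the two forms disagree. */
static unsigned mul31_mismatches(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 2u, 3u, 46340u, 46341u, 65535u, 65536u,
        0x3fffffffu, 0x40000000u, MASK31
    };
    const unsigned n = sizeof samples / sizeof samples[0];
    unsigned bad = 0;

    for (unsigned i = 0; i < n; i++) {
        for (unsigned j = 0; j < n; j++) {
            uint32_t rw, rr;
            int ow = mul31_wide(samples[i], samples[j], &rw);
            int orf = mul31_ref(samples[i], samples[j], &rr);
            if (ow != orf || rw != rr)
                bad++;
        }
    }
    return bad;
}

/* Aborts if the two forms ever disagree on the sampled inputs. */
static void mul31_selfcheck(void)
{
    assert(mul31_mismatches() == 0);
}
```

Calling `mul31_selfcheck()` from a test driver exercises boundary values around 2^31, such as 46340 * 46340 (fits) and 46341 * 46341 (overflows). This only checks the arithmetic identity on a small sample. It says nothing about the code either compiler generates for the real `_BitInt(31)` case.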
