https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116135
Bug ID: 116135
Summary: __builtin_mul_overflow inefficient for _BitInt(31) (with widening multiply)
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
Target Milestone: ---
Target: aarch64
Take:
```
int f1(unsigned _BitInt(31) x, unsigned _BitInt(31) y, unsigned _BitInt(31) *res)
{
return __builtin_mul_overflow(x, y, res);
}
```
Currently, on aarch64, GCC produces:
```
and w0, w0, 2147483647
and w1, w1, 2147483647
umull x1, w0, w1
and w3, w1, 2147483647
str w3, [x2]
cmp xzr, x1, lsr 32
cset w0, ne
cmp w3, w1
csinc w0, w0, wzr, eq
ret
```
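Read as C, the GCC sequence checks for overflow in two separate steps: `cmp xzr, x1, lsr 32` / `cset` tests whether the upper 32 bits of the 64-bit product are non-zero, and `cmp w3, w1` / `csinc` forces the result to 1 when masking the low word to 31 bits changed it (i.e. when bit 31 is set). The transliteration below is a hand-written, hypothetical rendering for illustration only; the function name is made up, and 31-bit values are carried in `uint32_t` so it builds without `_BitInt` support.

```c
#include <stdint.h>

/* Hypothetical C rendering of GCC's aarch64 sequence for f1.
   uint32_t stands in for unsigned _BitInt(31). */
static int gcc_style_mul_overflow31(uint32_t x, uint32_t y, uint32_t *res)
{
    x &= UINT32_C(0x7fffffff);                  /* and w0, w0, 2147483647 */
    y &= UINT32_C(0x7fffffff);                  /* and w1, w1, 2147483647 */
    uint64_t p = (uint64_t)x * y;               /* umull x1, w0, w1 */
    uint32_t lo = (uint32_t)p;                  /* w1: low 32 bits of product */
    uint32_t lo31 = lo & UINT32_C(0x7fffffff);  /* and w3, w1, 2147483647 */
    *res = lo31;                                /* str w3, [x2] */
    int ovf = (p >> 32) != 0;                   /* cmp xzr, x1, lsr 32; cset ne */
    if (lo31 != lo)                             /* cmp w3, w1; csinc ..., eq */
        ovf = 1;
    return ovf;
}
```

The two compares are what make this longer than necessary: both conditions together just say "some bit at or above bit 31 is set".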
LLVM produces:
```
and w8, w1, #0x7fffffff
and w9, w0, #0x7fffffff
umull x8, w9, w8
ubfx x9, x8, #31, #1
tst x8, #0xffffffff00000000
and w8, w8, #0x7fffffff
str w8, [x2]
csinc w0, w9, wzr, eq
ret
```
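LLVM combines the same two conditions differently: `ubfx` extracts bit 31 of the product, `tst` checks the upper 32 bits, and `csinc w0, w9, wzr, eq` returns bit 31 when the upper half is zero and 1 otherwise. A hypothetical C transliteration, again using `uint32_t` as a stand-in for `unsigned _BitInt(31)` and a made-up function name:

```c
#include <stdint.h>

/* Hypothetical C rendering of LLVM's aarch64 sequence for f1. */
static int llvm_style_mul_overflow31(uint32_t x, uint32_t y, uint32_t *res)
{
    y &= UINT32_C(0x7fffffff);                          /* and w8, w1, #0x7fffffff */
    x &= UINT32_C(0x7fffffff);                          /* and w9, w0, #0x7fffffff */
    uint64_t p = (uint64_t)x * y;                       /* umull x8, w9, w8 */
    int bit31 = (int)((p >> 31) & 1);                   /* ubfx x9, x8, #31, #1 */
    int hi_zero = (p & UINT64_C(0xffffffff00000000)) == 0; /* tst x8, #0xffffffff00000000 */
    *res = (uint32_t)p & UINT32_C(0x7fffffff);          /* and w8, w8, ...; str w8, [x2] */
    return hi_zero ? bit31 : 1;                         /* csinc w0, w9, wzr, eq */
}
```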
LLVM's output is slightly better (one instruction shorter), but neither is as
good as what we get if we just write:
```
int f2(unsigned _BitInt(31) x, unsigned _BitInt(31) y, unsigned _BitInt(31) *res)
{
unsigned long long xx = x;
unsigned long long yy = y;
unsigned long long t = xx * yy;
*res = t;
return (t >> 31) != 0;
}
```
Interestingly, neither GCC nor LLVM recognizes that f2 is just
__builtin_mul_overflow.
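The equivalence follows from the operand range: for x, y < 2^31 the exact product is below 2^62, so the 64-bit multiply in f2 never wraps, and the product fits in 31 bits exactly when every bit from bit 31 upward is zero, i.e. when `(t >> 31) == 0`. The sketch below spot-checks f2's formulation against the "exact product exceeds 2^31 - 1" rule that __builtin_mul_overflow applies for a 31-bit unsigned result. The helper names and the operand list are hypothetical, and `uint32_t` again stands in for `unsigned _BitInt(31)`.

```c
#include <stddef.h>
#include <stdint.h>

/* f2's formulation; x and y are assumed to hold 31-bit values. */
static int f2_style_mul_overflow31(uint32_t x, uint32_t y, uint32_t *res)
{
    uint64_t t = (uint64_t)x * y;               /* exact, since t < 2^62 */
    *res = (uint32_t)t & UINT32_C(0x7fffffff);  /* wraps modulo 2^31 */
    return (t >> 31) != 0;                      /* any bit at or above 31 set */
}

/* Overflow rule for a 31-bit unsigned result: exact product >= 2^31. */
static int product_exceeds_31_bits(uint32_t x, uint32_t y)
{
    return (uint64_t)x * y > UINT64_C(0x7fffffff);
}

/* Compares both on boundary operands; returns 1 if they agree on all pairs. */
static int check_f2_matches_overflow_rule(void)
{
    static const uint32_t v[] = {
        0, 1, 2, 0x8000, 0xffff, 0x10000,
        0xb504, 0xb505,               /* 46340^2 < 2^31 - 1 < 46341^2 */
        0x3fffffff, 0x40000000, 0x7ffffffe, 0x7fffffff
    };
    const size_t n = sizeof v / sizeof v[0];
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            uint32_t r;
            if (f2_style_mul_overflow31(v[i], v[j], &r)
                != product_exceeds_31_bits(v[i], v[j]))
                return 0;
        }
    return 1;
}
```

The loop only spot-checks; the range argument above is what makes the two forms agree for every 31-bit input.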