https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116135
Bug ID: 116135
Summary: __builtin_mul_overflow inefficient for _BitInt(31) (with widening multiply)
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
Target Milestone: ---
Target: aarch64

Take:
```
int f1(unsigned _BitInt(31) x, unsigned _BitInt(31) y, unsigned _BitInt(31) * res)
{
  return __builtin_mul_overflow(x, y, res);
}
```

Currently on aarch64 GCC produces:
```
        and     w0, w0, 2147483647
        and     w1, w1, 2147483647
        umull   x1, w0, w1
        and     w3, w1, 2147483647
        str     w3, [x2]
        cmp     xzr, x1, lsr 32
        cset    w0, ne
        cmp     w3, w1
        csinc   w0, w0, wzr, eq
        ret
```

While LLVM produces:
```
        and     w8, w1, #0x7fffffff
        and     w9, w0, #0x7fffffff
        umull   x8, w9, w8
        ubfx    x9, x8, #31, #1
        tst     x8, #0xffffffff00000000
        and     w8, w8, #0x7fffffff
        str     w8, [x2]
        csinc   w0, w9, wzr, eq
        ret
```

What LLVM produces is slightly better, but neither is as good as what we get if we just do:
```
int f2(unsigned _BitInt(31) x, unsigned _BitInt(31) y, unsigned _BitInt(31) * res)
{
  unsigned long long xx = x;
  unsigned long long yy = y;
  unsigned long long t = xx * yy;
  *res = t;
  return (t >> 31) != 0;
}
```

What is interesting is that neither GCC nor LLVM figures out that f2 is just __builtin_mul_overflow.
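
For reference, a minimal sketch of why the single `(t >> 31) != 0` test in f2 is enough. It models the 31-bit operands with plain `uint32_t`/`uint64_t` instead of `_BitInt(31)` so that it builds with compilers lacking `_BitInt` support. The names `MASK31`, `mul31_overflow_wide`, `mul31_overflow_split` and `check_examples` are made up for this illustration and are not part of GCC or the testcase. The product of two 31-bit values fits in 62 bits, so the truncated result overflows exactly when any bit at position 31 or above is set. The two separate checks in the current GCC output (high 32 bits nonzero, or the truncated value differing from the low word) together test the same condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MASK31 0x7fffffffu  /* hypothetical name: low 31 bits */

/* Models f2: widen, multiply once, test everything above bit 30.
   Inputs are masked to 31 bits, as a _BitInt(31) argument would be. */
bool mul31_overflow_wide(uint32_t x, uint32_t y, uint32_t *res)
{
    uint64_t t = (uint64_t)(x & MASK31) * (uint64_t)(y & MASK31);
    *res = (uint32_t)(t & MASK31);
    return (t >> 31) != 0;
}

/* Models the two-step check in the current GCC output:
   high 32 bits nonzero, or the low word changed by truncation to 31 bits. */
bool mul31_overflow_split(uint32_t x, uint32_t y, uint32_t *res)
{
    uint64_t t = (uint64_t)(x & MASK31) * (uint64_t)(y & MASK31);
    uint32_t lo = (uint32_t)t;
    *res = lo & MASK31;
    return (t >> 32) != 0 || *res != lo;
}

/* A few spot checks that both formulations agree. */
void check_examples(void)
{
    static const uint32_t v[] = { 0, 1, 3, 0x7fff, 0x8000, 0x10000, MASK31 };
    const unsigned n = sizeof v / sizeof v[0];
    for (unsigned i = 0; i < n; i++) {
        for (unsigned j = 0; j < n; j++) {
            uint32_t r1, r2;
            bool o1 = mul31_overflow_wide(v[i], v[j], &r1);
            bool o2 = mul31_overflow_split(v[i], v[j], &r2);
            assert(o1 == o2 && r1 == r2);
        }
    }
}
```

This only shows that the two checks compute the same flag; it says nothing about what either compiler emits for the `_BitInt(31)` version.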