https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119917
Bug ID: 119917 Summary: [Optimization Opportunity] Inefficient code generation for multiplication overflow detection with explicit zero checks Product: gcc Version: 16.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: a1343922569 at outlook dot com Target Milestone: --- Description: I've identified an opportunity for optimization in GCC where the compiler generates suboptimal code for multiplication overflow detection functions that include explicit division-by-zero checks. The generated assembly contains unnecessary conditional checks and jumps which could be eliminated. This is not a bug, but rather a missed optimization opportunity where explicit zero checks are not fully utilized to generate more efficient code. Environment Information: ①With GCC 14.2.0 - GCC version: GNU C++17 (Compiler-Explorer-Build-gcc--binutils-2.42) version 14.2.0 (x86_64-linux-gnu) - System type: x86_64-linux-gnu - Configuration: compiled by GNU C version 11.4.0, GMP version 6.2.1, MPFR version 4.1.0, MPC version 1.2.1, isl version isl-0.24-GMP ②With GCC 16.0.0 - GCC version: GNU C++17 GNU C++17 (Compiler-Explorer-Build-gcc-3a3bcb6a233352ce2bfa9f6f49dc44d8ae5aa6cb-binutils-2.42) version 16.0.0 20250423 (experimental) (x86_64-linux-gnu) - System type: x86_64-linux-gnu - Configuration: compiled by GNU C version 11.4.0, GMP version 6.2.1, MPFR version 4.1.0, MPC version 1.2.1, isl version isl-0.24-GMP Complete Command Line: g++ -O2 -S test.cpp Compiler Output: ASM generation compiler returned: 0 Execution build compiler returned: 0 Program returned: 0 Preprocessed File: The test case doesn't require a preprocessed file as it doesn't use any macros or include directives. The optimization issue can be seen directly in the assembly output from the source code provided. Reproduction Code (C++): bool mulOverflow64(unsigned long long a, unsigned long long b) { if (b == 0) return false; return a > (~0ULL) / b; } bool mulOverflow32(unsigned int a, unsigned int b) { if (b == 0) return false; return a > (~0U) / b; } bool mulOverflow16(unsigned short a, unsigned short b) { if (b == 0) return false; return a > 0xFFFF / b; } bool mulOverflow8(unsigned char a, unsigned char b) { if (b == 0) return false; return a > 0xFF / b; } int main() { return 0; } Current Generated Assembly: mulOverflow64(unsigned long long, unsigned long long): xor eax, eax test rsi, rsi je .L1 mov rax, rsi mul rdi seto al .L1: ret mulOverflow32(unsigned int, unsigned int): xor eax, eax test esi, esi je .L8 mov eax, esi mul edi seto al .L8: ret mulOverflow16(unsigned short, unsigned short): xor eax, eax test si, si je .L14 xor edx, edx movzx esi, si mov eax, 65535 movzx edi, di idiv esi cmp edi, eax setg al .L14: ret mulOverflow8(unsigned char, unsigned char): xor eax, eax test sil, sil je .L17 xor edx, edx movzx esi, sil mov eax, 255 movzx edi, dil idiv esi cmp edi, eax setg al .L17: ret main: xor eax, eax ret Expected Optimized Assembly: mulOverflow64(unsigned long long, unsigned long long): mov rax, rsi mul rdi seto al ret mulOverflow32(unsigned int, unsigned int): mov eax, esi mul edi seto al ret mulOverflow16(unsigned short, unsigned short): mov eax, esi mul di seto al ret mulOverflow8(unsigned char, unsigned char): mov eax, esi mul dil seto al ret main: xor eax, eax ret Issue: When a function explicitly checks for b==0, the compiler should be able to optimize away the redundant runtime check in the assembly output. Additionally, for 16-bit and 8-bit versions, the compiler isn't using the more efficient multiplication and overflow detection strategy that it uses for 32-bit and 64-bit versions, but the more inefficient division. Performance Impact: These functions are often used in performance-critical code paths. The redundant checks and branches can negatively impact performance, especially in tight loops and on processors with limited branch prediction capabilities. I've verified that this issue indeed exists in both GCC 14.2.0 and the latest trunk version (GCC 16.0.0). Related Previous Bug: This optimization opportunity is related to Bug ID: 117529 ("Missed optimization: (y != 0 && x > (unsigned)(-1) / y) (multiplication overflow check)") reported at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117529. However, my report differs in several important aspects: 1. This report provides a more comprehensive analysis across multiple integer widths (8, 16, 32, and 64-bit). 2. It demonstrates that the issue is more widespread than previously documented. 3. It highlights an inconsistency in optimization strategy between different integer widths (32/64-bit vs 8/16-bit). 4. It includes concrete examples of the expected optimized assembly for each case. While the previous report focused only on the 64-bit case and didn't provide detailed expectation for the assembly result, this expanded report shows the optimization opportunity affects all integer widths, with the smaller widths (8/16-bit) having an even larger performance gap due to the use of idiv instructions instead of mul with overflow detection. I hope this more comprehensive analysis provides sufficient motivation to address this optimization opportunity, which could improve performance for a common pattern in security-critical and performance-sensitive code.