https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119917

            Bug ID: 119917
           Summary: [Optimization Opportunity] Inefficient code generation
                    for multiplication overflow detection with explicit
                    zero checks
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: a1343922569 at outlook dot com
  Target Milestone: ---

Description:
I've identified an opportunity for optimization in GCC where the compiler
generates suboptimal code for multiplication overflow detection functions that
include explicit division-by-zero checks. The generated assembly contains
unnecessary conditional checks and jumps which could be eliminated.
This is not a bug, but rather a missed optimization opportunity where explicit
zero checks are not fully utilized to generate more efficient code.


Environment Information:
①With GCC 14.2.0
- GCC version: GNU C++17 (Compiler-Explorer-Build-gcc--binutils-2.42) version
14.2.0 (x86_64-linux-gnu)
- System type: x86_64-linux-gnu
- Configuration: compiled by GNU C version 11.4.0, GMP version 6.2.1, MPFR
version 4.1.0, MPC version 1.2.1, isl version isl-0.24-GMP
②With GCC 16.0.0
- GCC version: GNU C++17 GNU C++17
(Compiler-Explorer-Build-gcc-3a3bcb6a233352ce2bfa9f6f49dc44d8ae5aa6cb-binutils-2.42)
version 16.0.0 20250423 (experimental) (x86_64-linux-gnu)
- System type: x86_64-linux-gnu
- Configuration: compiled by GNU C version 11.4.0, GMP version 6.2.1, MPFR
version 4.1.0, MPC version 1.2.1, isl version isl-0.24-GMP


Complete Command Line:
g++ -O2 -S test.cpp


Compiler Output:
ASM generation compiler returned: 0
Execution build compiler returned: 0
Program returned: 0


Preprocessed File:
The test case doesn't require a preprocessed file as it doesn't use any macros
or include directives. The optimization issue can be seen directly in the
assembly output from the source code provided.


Reproduction Code (C++):
bool mulOverflow64(unsigned long long a, unsigned long long b) {
    if (b == 0) return false;
    return a > (~0ULL) / b;
}

bool mulOverflow32(unsigned int a, unsigned int b) {
    if (b == 0) return false;
    return a > (~0U) / b;
}

bool mulOverflow16(unsigned short a, unsigned short b) {
    if (b == 0) return false;
    return a > 0xFFFF / b;
}

bool mulOverflow8(unsigned char a, unsigned char b) {
    if (b == 0) return false;
    return a > 0xFF / b;
}

int main()
{
    return 0;
}


Current Generated Assembly:
mulOverflow64(unsigned long long, unsigned long long):
    xor     eax, eax
    test    rsi, rsi
    je      .L1
    mov     rax, rsi
    mul     rdi
    seto    al
.L1:
    ret
mulOverflow32(unsigned int, unsigned int):
    xor     eax, eax
    test    esi, esi
    je      .L8
    mov     eax, esi
    mul     edi
    seto    al
.L8:
    ret
mulOverflow16(unsigned short, unsigned short):
    xor     eax, eax
    test    si, si
    je      .L14
    xor     edx, edx
    movzx   esi, si
    mov     eax, 65535
    movzx   edi, di
    idiv    esi
    cmp     edi, eax
    setg    al
.L14:
    ret
mulOverflow8(unsigned char, unsigned char):
    xor     eax, eax
    test    sil, sil
    je      .L17
    xor     edx, edx
    movzx   esi, sil
    mov     eax, 255
    movzx   edi, dil
    idiv    esi
    cmp     edi, eax
    setg    al
.L17:
    ret
main:
    xor     eax, eax
    ret


Expected Optimized Assembly:
mulOverflow64(unsigned long long, unsigned long long):
    mov     rax, rsi
    mul     rdi
    seto    al
    ret
mulOverflow32(unsigned int, unsigned int):
    mov     eax, esi
    mul     edi
    seto    al
    ret
mulOverflow16(unsigned short, unsigned short):
    mov     eax, esi
    mul     di
    seto    al
    ret
mulOverflow8(unsigned char, unsigned char):
    mov     eax, esi
    mul     dil
    seto    al
    ret
main:
    xor     eax, eax
    ret


Issue:
When a function explicitly checks for b==0, the compiler should be able to
optimize away the redundant runtime check in the assembly output. Additionally,
for 16-bit and 8-bit versions, the compiler isn't using the more efficient
multiplication and overflow detection strategy that it uses for 32-bit and
64-bit versions, but the more inefficient division.


Performance Impact:
These functions are often used in performance-critical code paths. The
redundant checks and branches can negatively impact performance, especially in
tight loops and on processors with limited branch prediction capabilities.


I've verified that this issue indeed exists in both GCC 14.2.0 and the latest
trunk version (GCC 16.0.0).


Related Previous Bug:
This optimization opportunity is related to Bug ID: 117529 ("Missed
optimization: (y != 0 && x > (unsigned)(-1) / y) (multiplication overflow
check)") reported at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117529.
However, my report differs in several important aspects:
1. This report provides a more comprehensive analysis across multiple integer
widths (8, 16, 32, and 64-bit).
2. It demonstrates that the issue is more widespread than previously
documented.
3. It highlights an inconsistency in optimization strategy between different
integer widths (32/64-bit vs 8/16-bit).
4. It includes concrete examples of the expected optimized assembly for each
case.
While the previous report focused only on the 64-bit case and didn't provide
detailed expectation for the assembly result, this expanded report shows the
optimization opportunity affects all integer widths, with the smaller widths
(8/16-bit) having an even larger performance gap due to the use of idiv
instructions instead of mul with overflow detection.
I hope this more comprehensive analysis provides sufficient motivation to
address this optimization opportunity, which could improve performance for a
common pattern in security-critical and performance-sensitive code.

Reply via email to