https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123020

            Bug ID: 123020
           Summary: [missed-optimization] Equivalent ternary and if
                    produce different loop (extra compare in g())
           Product: gcc
           Version: 15.2.1
               URL: https://godbolt.org/z/dT85acP66
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: skywave2023 at outlook dot com
  Target Milestone: ---
            Target: aarch64-apple-darwin25

GCC version:

Using built-in specs.
COLLECT_GCC=/opt/compiler-explorer/gcc-15.2.0/bin/g++
COLLECT_LTO_WRAPPER=/cefs/22/22e6cdc013c8541ce3d1548e_consolidated/compilers_c++_x86_gcc_15.2.0/bin/../libexec/gcc/x86_64-linux-gnu/15.2.0/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../gcc-15.2.0/configure
--prefix=/opt/compiler-explorer/gcc-build/staging
--enable-libstdcxx-backtrace=yes --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu --disable-bootstrap
--enable-multiarch --with-abi=m64 --with-multilib-list=m32,m64,mx32
--enable-multilib --enable-clocale=gnu
--enable-languages=c,c++,fortran,ada,objc,obj-c++,go,d,m2,rust,cobol
--enable-ld=yes --enable-gold=yes --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --enable-linker-build-id --enable-lto
--enable-plugins --enable-threads=posix
--with-pkgversion=Compiler-Explorer-Build-gcc--binutils-2.44
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 15.2.0 (Compiler-Explorer-Build-gcc--binutils-2.44) 
COLLECT_GCC_OPTIONS='-fdiagnostics-color=always' '-g' '-o' '/app/output.s'
'-fno-verbose-asm' '-v' '-O2' '-L./lib' '-shared-libgcc' '-mtune=generic'
'-march=x86-64' '-dumpdir' '/app/output.s-'

/cefs/22/22e6cdc013c8541ce3d1548e_consolidated/compilers_c++_x86_gcc_15.2.0/bin/../libexec/gcc/x86_64-linux-gnu/15.2.0/cc1plus
-quiet -v -imultiarch x86_64-linux-gnu -iprefix
/cefs/22/22e6cdc013c8541ce3d1548e_consolidated/compilers_c++_x86_gcc_15.2.0/bin/../lib/gcc/x86_64-linux-gnu/15.2.0/
-D_GNU_SOURCE <source> -quiet -dumpdir /app/output.s- -dumpbase example.cpp
-dumpbase-ext .cpp -mtune=generic -march=x86-64 -g -O2 -version
-fdiagnostics-color=always -fno-verbose-asm -o /tmp/ccIwbjB7.s
GNU C++17 (Compiler-Explorer-Build-gcc--binutils-2.44) version 15.2.0
(x86_64-linux-gnu)
        compiled by GNU C version 11.4.0, GMP version 6.2.1, MPFR version
4.1.0, MPC version 1.2.1, isl version isl-0.24-GMP
E


Environment:
  Reproduced on Compiler Explorer (godbolt.org) with the x86_64 gcc compiler
  "x86-64 gcc 15.2" shown in the left-hand compiler selection.

This is a code-quality / missed-optimization issue in the optimizer, not a
miscompilation. The two functions below are semantically equivalent but GCC
generates a noticeably worse loop for the ternary form.

=== Minimal testcase (test.cpp) ===

using ui  = unsigned;
using ull = unsigned long long;

constexpr int n = 100;
ui a[n + 1];

struct Result {
    ui  mx;
    ull sum;
};

Result f() {
    ui mx = 0;
    ull sum = 0;
    for (int i = 1; i <= n; ++i) {
        if (mx < a[i]) {
            mx  = a[i];
            sum += a[i];
        }
    }
    return {mx, sum};
}

Result g() {
    ui mx = 0;
    ull sum = 0;
    for (int i = 1; i <= n; ++i) {
        bool flag = mx < a[i];
        mx  = flag ? a[i] : mx;
        sum = flag ? sum + a[i] : sum;
    }
    return {mx, sum};
}

int main() {
    a[1] = 1;
    Result r1 = f();
    Result r2 = g();
    if (r1.mx != r2.mx || r1.sum != r2.sum)
        __builtin_abort();
}

=== Semantics ===

Functions f() and g() are semantically equivalent in C++: both maintain the
maximum element seen so far in mx, and the sum of all elements that became a
new maximum in sum.

In g(), the boolean "flag" is computed as (mx < a[i]) using the old value of
mx and then used twice, once to update mx and once to update sum. This is
logically the same as the single "if" in f().

=== Actual code generated ===

With -O2 on x86_64, GCC produces a tight loop for f() that uses a single
compare and reuses the flags for both conditional moves. For example
(abridged Intel syntax, loop body only):

  f():
      mov     ecx, DWORD PTR [rax]    # load a[i]
      mov     rdx, rcx                # copy value
      add     rcx, rsi                # candidate = sum + a[i]
      cmp     edi, edx                # compare mx and value
      cmovb   rsi, rcx                # if (mx < value) sum = candidate
      cmovb   rdi, rdx                # if (mx < value) mx  = value
      ...

For g(), GCC generates code with two compares and extra moves in the inner
loop. For example, the loop body looks like:

  g():
      mov     esi, DWORD PTR [rcx]    # load a[i]
      mov     r8d, edx                # save old mx
      cmp     edx, esi                # first compare (mx vs a[i])
      mov     eax, esi                # copy a[i]
      cmovb   rdx, rsi                # update mx
      add     rsi, rdi                # candidate = sum + a[i] (clobbers flags)
      cmp     r8d, eax                # second compare (old mx vs a[i])
      cmovb   rdi, rsi                # update sum
      ...

So g() ends up with two identical compares per iteration and some extra
register moves to preserve the operands for the second compare, even though
f() and g() have the same semantics.

This difference shows up in microbenchmarks when the array fits in cache,
but the issue is purely about the code shape in the generated loop.

=== Expected behaviour ===

It seems that the optimizer should be able to recognize the pattern in g()
and generate essentially the same loop as for f(): a single compare whose
condition codes are reused for two conditional updates (for mx and sum),
without repeating the comparison.

In other words, g() looks like a straightforward missed-optimization
opportunity in the presence of the ternary operator form.

=== Additional notes ===

- This is not a correctness issue; both f() and g() produce the same
  result, as checked in main().
- The preprocessed source for this testcase, obtained using -save-temps,
  is attached as test.ii.
- Suggested component: tree-optimization.
- Suggested keyword: missed-optimization.

Reply via email to