https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123020
--- Comment #2 from Castle "SkyWave" Sun <skywave2023 at outlook dot com> ---
(In reply to Castle "SkyWave" Sun from comment #0)
> GCC version:
>
> Using built-in specs.
> COLLECT_GCC=/opt/compiler-explorer/gcc-15.2.0/bin/g++
> COLLECT_LTO_WRAPPER=/cefs/22/22e6cdc013c8541ce3d1548e_consolidated/
> compilers_c++_x86_gcc_15.2.0/bin/../libexec/gcc/x86_64-linux-gnu/15.2.0/lto-
> wrapper
> Target: x86_64-linux-gnu
> Configured with: ../gcc-15.2.0/configure
> --prefix=/opt/compiler-explorer/gcc-build/staging
> --enable-libstdcxx-backtrace=yes --build=x86_64-linux-gnu
> --host=x86_64-linux-gnu --target=x86_64-linux-gnu --disable-bootstrap
> --enable-multiarch --with-abi=m64 --with-multilib-list=m32,m64,mx32
> --enable-multilib --enable-clocale=gnu
> --enable-languages=c,c++,fortran,ada,objc,obj-c++,go,d,m2,rust,cobol
> --enable-ld=yes --enable-gold=yes --enable-libstdcxx-debug
> --enable-libstdcxx-time=yes --enable-linker-build-id --enable-lto
> --enable-plugins --enable-threads=posix
> --with-pkgversion=Compiler-Explorer-Build-gcc--binutils-2.44
> Thread model: posix
> Supported LTO compression algorithms: zlib
> gcc version 15.2.0 (Compiler-Explorer-Build-gcc--binutils-2.44)
> COLLECT_GCC_OPTIONS='-fdiagnostics-color=always' '-g' '-o' '/app/output.s'
> '-fno-verbose-asm' '-v' '-O2' '-L./lib' '-shared-libgcc' '-mtune=generic'
> '-march=x86-64' '-dumpdir' '/app/output.s-'
> /cefs/22/22e6cdc013c8541ce3d1548e_consolidated/compilers_c++_x86_gcc_15.2.0/
> bin/../libexec/gcc/x86_64-linux-gnu/15.2.0/cc1plus -quiet -v -imultiarch
> x86_64-linux-gnu -iprefix
> /cefs/22/22e6cdc013c8541ce3d1548e_consolidated/compilers_c++_x86_gcc_15.2.0/
> bin/../lib/gcc/x86_64-linux-gnu/15.2.0/ -D_GNU_SOURCE <source> -quiet
> -dumpdir /app/output.s- -dumpbase example.cpp -dumpbase-ext .cpp
> -mtune=generic -march=x86-64 -g -O2 -version -fdiagnostics-color=always
> -fno-verbose-asm -o /tmp/ccIwbjB7.s
> GNU C++17 (Compiler-Explorer-Build-gcc--binutils-2.44) version 15.2.0
> (x86_64-linux-gnu)
> compiled by GNU C version 11.4.0, GMP version 6.2.1, MPFR version 4.1.0,
> MPC version 1.2.1, isl version isl-0.24-GMP
>
> Environment:
> Reproduced on Compiler Explorer (godbolt.org) with the x86_64 gcc compiler
> "x86-64 gcc 15.2" shown in the left-hand compiler selection.
>
> This is a code-quality / missed-optimization issue in the optimizer, not a
> miscompilation. The two functions below are semantically equivalent but GCC
> generates a noticeably worse loop for the ternary form.
>
> === Minimal testcase (test.cpp) ===
>
> using ui = unsigned;
> using ull = unsigned long long;
>
> constexpr int n = 100;
> ui a[n + 1];
>
> struct Result {
> ui mx;
> ull sum;
> };
>
> Result f() {
> ui mx = 0;
> ull sum = 0;
> for (int i = 1; i <= n; ++i) {
> if (mx < a[i]) {
> mx = a[i];
> sum += a[i];
> }
> }
> return {mx, sum};
> }
>
> Result g() {
> ui mx = 0;
> ull sum = 0;
> for (int i = 1; i <= n; ++i) {
> bool flag = mx < a[i];
> mx = flag ? a[i] : mx;
> sum = flag ? sum + a[i] : sum;
> }
> return {mx, sum};
> }
>
> int main() {
> a[1] = 1;
> Result r1 = f();
> Result r2 = g();
> if (r1.mx != r2.mx || r1.sum != r2.sum)
> __builtin_abort();
> }
>
> === Semantics ===
>
> Functions f() and g() are semantically equivalent in C++: both maintain the
> maximum element seen so far in mx, and the sum of all elements that became a
> new maximum in sum.
>
> In g(), the boolean "flag" is computed as (mx < a[i]) using the old value of
> mx and then used twice, once to update mx and once to update sum. This is
> logically the same as the single "if" in f().
>
> === Actual code generated ===
>
> With -O2 on x86_64, GCC produces a tight loop for f() that uses a single
> compare and reuses the flags for both conditional moves. For example
> (abridged Intel syntax, loop body only):
>
> f():
> mov ecx, DWORD PTR [rax] # load a[i]
> mov rdx, rcx # copy value
> add rcx, rsi # candidate = sum + a[i]
> cmp edi, edx # compare mx and value
> cmovb rsi, rcx # if (mx < value) sum = candidate
> cmovb rdi, rdx # if (mx < value) mx = value
> ...
>
> For g(), GCC generates code with two compares and extra moves in the inner
> loop. For example, the loop body looks like:
>
> g():
> mov esi, DWORD PTR [rcx] # load a[i]
> mov r8d, edx # save old mx
> cmp edx, esi # first compare (mx vs a[i])
> mov eax, esi # copy a[i]
> cmovb rdx, rsi # update mx
> add rsi, rdi # candidate = sum + a[i] (clobbers
> flags)
> cmp r8d, eax # second compare (old mx vs a[i])
> cmovb rdi, rsi # update sum
> ...
>
> So g() ends up with two identical compares per iteration and some extra
> register moves to preserve the operands for the second compare, even though
> f() and g() have the same semantics.
>
> This difference shows up in microbenchmarks when the array fits in cache,
> but the issue is purely about the code shape in the generated loop.
>
> === Expected behaviour ===
>
> It seems that the optimizer should be able to recognize the pattern in g()
> and generate essentially the same loop as for f(): a single compare whose
> condition codes are reused for two conditional updates (for mx and sum),
> without repeating the comparison.
>
> In other words, g() looks like a straightforward missed-optimization
> opportunity in the presence of the ternary operator form.
>
> === Additional notes ===
>
> - This is not a correctness issue; both f() and g() produce the same
> result, as checked in main().
> - The preprocessed source for this testcase, obtained using -save-temps,
> is attached as test.ii.
> - Suggested component: tree-optimization.
> - Suggested keyword: missed-optimization.