https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123020

--- Comment #2 from Castle "SkyWave" Sun <skywave2023 at outlook dot com> ---
(In reply to Castle "SkyWave" Sun from comment #0)
> GCC version:
> 
> Using built-in specs.
> COLLECT_GCC=/opt/compiler-explorer/gcc-15.2.0/bin/g++
> COLLECT_LTO_WRAPPER=/cefs/22/22e6cdc013c8541ce3d1548e_consolidated/
> compilers_c++_x86_gcc_15.2.0/bin/../libexec/gcc/x86_64-linux-gnu/15.2.0/lto-
> wrapper
> Target: x86_64-linux-gnu
> Configured with: ../gcc-15.2.0/configure
> --prefix=/opt/compiler-explorer/gcc-build/staging
> --enable-libstdcxx-backtrace=yes --build=x86_64-linux-gnu
> --host=x86_64-linux-gnu --target=x86_64-linux-gnu --disable-bootstrap
> --enable-multiarch --with-abi=m64 --with-multilib-list=m32,m64,mx32
> --enable-multilib --enable-clocale=gnu
> --enable-languages=c,c++,fortran,ada,objc,obj-c++,go,d,m2,rust,cobol
> --enable-ld=yes --enable-gold=yes --enable-libstdcxx-debug
> --enable-libstdcxx-time=yes --enable-linker-build-id --enable-lto
> --enable-plugins --enable-threads=posix
> --with-pkgversion=Compiler-Explorer-Build-gcc--binutils-2.44
> Thread model: posix
> Supported LTO compression algorithms: zlib
> gcc version 15.2.0 (Compiler-Explorer-Build-gcc--binutils-2.44) 
> COLLECT_GCC_OPTIONS='-fdiagnostics-color=always' '-g' '-o' '/app/output.s'
> '-fno-verbose-asm' '-v' '-O2' '-L./lib' '-shared-libgcc' '-mtune=generic'
> '-march=x86-64' '-dumpdir' '/app/output.s-'
>  /cefs/22/22e6cdc013c8541ce3d1548e_consolidated/compilers_c++_x86_gcc_15.2.0/
> bin/../libexec/gcc/x86_64-linux-gnu/15.2.0/cc1plus -quiet -v -imultiarch
> x86_64-linux-gnu -iprefix
> /cefs/22/22e6cdc013c8541ce3d1548e_consolidated/compilers_c++_x86_gcc_15.2.0/
> bin/../lib/gcc/x86_64-linux-gnu/15.2.0/ -D_GNU_SOURCE <source> -quiet
> -dumpdir /app/output.s- -dumpbase example.cpp -dumpbase-ext .cpp
> -mtune=generic -march=x86-64 -g -O2 -version -fdiagnostics-color=always
> -fno-verbose-asm -o /tmp/ccIwbjB7.s
> GNU C++17 (Compiler-Explorer-Build-gcc--binutils-2.44) version 15.2.0
> (x86_64-linux-gnu)
>       compiled by GNU C version 11.4.0, GMP version 6.2.1, MPFR version 4.1.0,
> MPC version 1.2.1, isl version isl-0.24-GMP
> 
> Environment:
>   Reproduced on Compiler Explorer (godbolt.org) with the x86_64 gcc compiler
>   "x86-64 gcc 15.2" shown in the left-hand compiler selection.
> 
> This is a code-quality / missed-optimization issue in the optimizer, not a
> miscompilation. The two functions below are semantically equivalent but GCC
> generates a noticeably worse loop for the ternary form.
> 
> === Minimal testcase (test.cpp) ===
> 
> using ui  = unsigned;
> using ull = unsigned long long;
> 
> constexpr int n = 100;
> ui a[n + 1];
> 
> struct Result {
>     ui  mx;
>     ull sum;
> };
> 
> Result f() {
>     ui mx = 0;
>     ull sum = 0;
>     for (int i = 1; i <= n; ++i) {
>         if (mx < a[i]) {
>             mx  = a[i];
>             sum += a[i];
>         }
>     }
>     return {mx, sum};
> }
> 
> Result g() {
>     ui mx = 0;
>     ull sum = 0;
>     for (int i = 1; i <= n; ++i) {
>         bool flag = mx < a[i];
>         mx  = flag ? a[i] : mx;
>         sum = flag ? sum + a[i] : sum;
>     }
>     return {mx, sum};
> }
> 
> int main() {
>     a[1] = 1;
>     Result r1 = f();
>     Result r2 = g();
>     if (r1.mx != r2.mx || r1.sum != r2.sum)
>         __builtin_abort();
> }
> 
> === Semantics ===
> 
> Functions f() and g() are semantically equivalent in C++: both maintain the
> maximum element seen so far in mx, and the sum of all elements that became a
> new maximum in sum.
> 
> In g(), the boolean "flag" is computed as (mx < a[i]) using the old value of
> mx and then used twice, once to update mx and once to update sum. This is
> logically the same as the single "if" in f().
> 
> === Actual code generated ===
> 
> With -O2 on x86_64, GCC produces a tight loop for f() that uses a single
> compare and reuses the flags for both conditional moves. For example
> (abridged Intel syntax, loop body only):
> 
>   f():
>       mov     ecx, DWORD PTR [rax]    # load a[i]
>       mov     rdx, rcx                # copy value
>       add     rcx, rsi                # candidate = sum + a[i]
>       cmp     edi, edx                # compare mx and value
>       cmovb   rsi, rcx                # if (mx < value) sum = candidate
>       cmovb   rdi, rdx                # if (mx < value) mx  = value
>       ...
> 
> For g(), GCC generates code with two compares and extra moves in the inner
> loop. For example, the loop body looks like:
> 
>   g():
>       mov     esi, DWORD PTR [rcx]    # load a[i]
>       mov     r8d, edx                # save old mx
>       cmp     edx, esi                # first compare (mx vs a[i])
>       mov     eax, esi                # copy a[i]
>       cmovb   rdx, rsi                # update mx
>       add     rsi, rdi                # candidate = sum + a[i] (clobbers
> flags)
>       cmp     r8d, eax                # second compare (old mx vs a[i])
>       cmovb   rdi, rsi                # update sum
>       ...
> 
> So g() ends up with two identical compares per iteration and some extra
> register moves to preserve the operands for the second compare, even though
> f() and g() have the same semantics.
> 
> This difference shows up in microbenchmarks when the array fits in cache,
> but the issue is purely about the code shape in the generated loop.
> 
> === Expected behaviour ===
> 
> It seems that the optimizer should be able to recognize the pattern in g()
> and generate essentially the same loop as for f(): a single compare whose
> condition codes are reused for two conditional updates (for mx and sum),
> without repeating the comparison.
> 
> In other words, g() looks like a straightforward missed-optimization
> opportunity in the presence of the ternary operator form.
> 
> === Additional notes ===
> 
> - This is not a correctness issue; both f() and g() produce the same
>   result, as checked in main().
> - The preprocessed source for this testcase, obtained using -save-temps,
>   is attached as test.ii.
> - Suggested component: tree-optimization.
> - Suggested keyword: missed-optimization.

Reply via email to