https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93141

Peter Cordes <peter at cordes dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca

--- Comment #2 from Peter Cordes <peter at cordes dot ca> ---
gcc doesn't actually *branch* unless you use an if(); it just uses cmp/sbb to
do a 128-bit compare.  CMP is like a SUB that only sets FLAGS.  The CF result
of the SBB is then used as an input for ADC.
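
To spell out what that sequence computes at the 64-bit level, here's a sketch
in C (the explicit hi/lo split is just for illustration; gcc works on the
__int128 directly and keeps the result in CF):

#include <stdint.h>
#include <stdbool.h>

// Branchless 128-bit compare the way cmp/sbb does it: cmp sets CF from
// the low halves, sbb propagates that borrow through the high halves,
// and the final CF is exactly (a < b).
static bool u128_less(uint64_t a_lo, uint64_t a_hi,
                      uint64_t b_lo, uint64_t b_hi)
{
        bool borrow = a_lo < b_lo;        // cmp lo: CF = borrow-out of low half
        return a_hi < b_hi ||             // sbb hi: borrow-out of high half,
               (a_hi == b_hi && borrow);  //         including the incoming CF
}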

Testcase: https://godbolt.org/z/64C4R-

GCC also wastes a varying number of MOV instructions beyond the minimum of one
needed to make cmp/sbb work (one copy, because SBB destroys its destination),
depending on whether BMI2 MULX is available and on how the sum is written.

        u128 prod = a[i] * (unsigned __int128) b[i];
#if 1
        sum += prod;
        //if(sum<prod) overflow++;  // gcc branches on mov/cmp/sbb
        overflow += sum<prod;       // gcc uses adc after a mov/cmp/sbb
#else
        overflow += __builtin_add_overflow(sum, prod, &sum);  // gcc less bad, fewer MOV
#endif
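
For reference, a self-contained version of that loop (a sketch; the function
name, signature, and separate carry count are my assumptions, not necessarily
what the Godbolt testcase uses):

#include <stddef.h>
#include <stdint.h>
typedef unsigned __int128 u128;

// Sum of 64x64 => 128-bit products, counting carry-outs of the 128-bit
// accumulator instead of widening it further.
u128 dotprod(const uint64_t *a, const uint64_t *b, size_t n,
             uint64_t *carry_count)
{
        u128 sum = 0;
        uint64_t overflow = 0;
        for (size_t i = 0; i < n; i++) {
                u128 prod = a[i] * (u128) b[i];
                sum += prod;
                overflow += sum < prod;  // 1 iff the 128-bit add wrapped
        }
        *carry_count = overflow;
        return sum;
}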


clang makes efficient asm for all 3 versions if you stop it from unrolling
(unrolled, it materializes the carry with setc/movzx/add instead of just
using adc).
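
For example, unrolling can be disabled per-loop with clang's loop pragma
(whether the testcase did it this way or used -fno-unroll-loops is my
assumption):

        #pragma clang loop unroll(disable)
        for (size_t i = 0; i < n; i++) {
                u128 prod = a[i] * (u128) b[i];
                sum += prod;
                overflow += sum < prod;
        }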
