https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93141
Peter Cordes <peter at cordes dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca

--- Comment #2 from Peter Cordes <peter at cordes dot ca> ---
GCC doesn't actually *branch* unless you use an if(); it just uses cmp/sbb to do a 128-bit compare. CMP is like a SUB that only sets flags, and the CF result of SBB is used as an input for ADC.

https://godbolt.org/z/64C4R- has a testcase.

GCC also wastes a varying number of MOV instructions beyond the minimum one needed to make cmp/sbb work, depending on whether BMI2 MULX is used and on how the sum is written:

    u128 prod = a[i] * (unsigned __int128) b[i];
#if 1
    sum += prod;
    //if(sum<prod) overflow++;   // gcc branches on mov/cmp/sbb
    overflow += sum<prod;        // gcc uses adc after a mov/cmp/sbb
#else
    overflow += __builtin_add_overflow(sum, prod, &sum);  // gcc less bad, fewer MOV
#endif

clang makes efficient asm for all 3 ways, if you stop it from unrolling and from using setc/movzx/add.
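For reference, a minimal self-contained version of the pattern discussed above (the array names and loop shape are an assumption; the actual Godbolt testcase may differ). Both loops sum 64x64->128-bit products and count carry-out of the running 128-bit sum; they must agree:

    /* Minimal sketch, assuming the testcase sums 64x64->128-bit products.
       Function and variable names here are hypothetical. */
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef unsigned __int128 u128;

    /* Branchless carry counting: after the wrapping 128-bit add,
       the sum overflowed iff the new sum is below the addend. */
    static u128 dot_cmp(const uint64_t *a, const uint64_t *b,
                        size_t n, uint64_t *overflow_out)
    {
        u128 sum = 0;
        uint64_t overflow = 0;
        for (size_t i = 0; i < n; i++) {
            u128 prod = a[i] * (u128)b[i];
            sum += prod;             /* wraps modulo 2^128 */
            overflow += sum < prod;  /* carry-out of the 128-bit add */
        }
        *overflow_out = overflow;
        return sum;
    }

    /* Same loop using the GCC/clang __builtin_add_overflow builtin. */
    static u128 dot_builtin(const uint64_t *a, const uint64_t *b,
                            size_t n, uint64_t *overflow_out)
    {
        u128 sum = 0;
        uint64_t overflow = 0;
        for (size_t i = 0; i < n; i++) {
            u128 prod = a[i] * (u128)b[i];
            overflow += __builtin_add_overflow(sum, prod, &sum);
        }
        *overflow_out = overflow;
        return sum;
    }

    int main(void)
    {
        /* Max-value inputs force carries: adds 2..4 each overflow. */
        uint64_t a[4] = { UINT64_MAX, UINT64_MAX, UINT64_MAX, UINT64_MAX };
        uint64_t b[4] = { UINT64_MAX, UINT64_MAX, UINT64_MAX, UINT64_MAX };
        uint64_t of1, of2;
        u128 s1 = dot_cmp(a, b, 4, &of1);
        u128 s2 = dot_builtin(a, b, 4, &of2);
        assert(s1 == s2);
        assert(of1 == of2);
        assert(of1 == 3);
        return 0;
    }

Compiling the two functions at -O2 is enough to compare the mov/cmp/sbb/adc sequences GCC emits for each formulation.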