https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466
--- Comment #11 from James Greenhalgh <jgreenhalgh at gcc dot gnu.org> ---

With Jonathon's suggested change, copied into the original poster's framework
(without -fno-trapping-math):

Clang hot loop ( score: 165065
http://quick-bench.com/6NaD8ay0f8qMh9n0aMriYEiuKNA ) is:

     0.16%  movups 0x61a80(%r15,%rax,4),%xmm6
     1.15%  movups 0x61a90(%r15,%rax,4),%xmm7
     0.60%  movaps %xmm1,%xmm3
     5.44%  cmpltps %xmm6,%xmm3
     0.44%  movaps %xmm1,%xmm6
     0.40%  cmpltps %xmm7,%xmm6
     0.44%  movaps %xmm5,%xmm7
     4.97%  andps %xmm3,%xmm7
     0.20%  andnps %xmm4,%xmm3
     0.36%  orps %xmm7,%xmm3
     1.04%  movaps %xmm5,%xmm7
     4.97%  andps %xmm6,%xmm7
     0.11%  andnps %xmm4,%xmm6
     4.95%  orps %xmm7,%xmm6
     5.53%  movups %xmm3,0x61a80(%rbx,%rax,4)
     0.47%  movups %xmm6,0x61a90(%rbx,%rax,4)
     4.42%  movups 0x61aa0(%r15,%rax,4),%xmm3
    20.42%  movups 0x61ab0(%r15,%rax,4),%xmm6
     1.00%  movaps %xmm1,%xmm7
     0.49%  cmpltps %xmm3,%xmm7
     9.79%  movaps %xmm1,%xmm3
     0.16%  cmpltps %xmm6,%xmm3
     2.26%  movaps %xmm5,%xmm6
     0.60%  andps %xmm7,%xmm6
     4.20%  andnps %xmm4,%xmm7
     1.18%  orps %xmm6,%xmm7
     2.22%  movaps %xmm5,%xmm6
     0.47%  andps %xmm3,%xmm6
     4.24%  andnps %xmm4,%xmm3
     4.88%  movups %xmm7,0x61aa0(%rbx,%rax,4)
     0.27%  orps %xmm6,%xmm3
     5.22%  movups %xmm3,0x61ab0(%rbx,%rax,4)
     6.02%  add $0x10,%rax
            jne 405b30 <ifStandard(benchmark::State&)+0x4a0>

GCC hot loop ( score: 2385754
http://quick-bench.com/ehLe-aqkpXkkx2sHLd6TWq_p4g4 ) is:

     0.56%  movss 0x0(%rbp,%rdx,1),%xmm0
     1.47%  xor %eax,%eax
     2.00%  subss %xmm2,%xmm0
     7.02%  ucomiss %xmm1,%xmm0
     6.77%  seta %al
     4.96%  xor %ecx,%ecx
     0.25%  ucomiss %xmm0,%xmm1
     0.84%  pxor %xmm0,%xmm0
     0.09%  seta %cl
     5.40%  sub %ecx,%eax
     3.22%  cvtsi2ss %eax,%xmm0
     9.87%  ucomiss %xmm0,%xmm1
     6.53%  ja 4053a8 <ifNoConditional(benchmark::State&)+0x1d8>
    10.24%  mulss %xmm4,%xmm0
    11.55%  addss %xmm3,%xmm0
     5.46%  movss %xmm0,(%rbx,%rdx,1)
     2.00%  add $0x4,%rdx
            cmp $0x61a80,%rdx
            jne 405350 <ifNoConditional(benchmark::State&)+0x180>

Daniel Elliott, does that better match your expectations? If so, I think this
can be resolved as missed optimization of invalid code.
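
For readers less used to the SSE idiom, the cmpltps / andps / andnps / orps sequence in the Clang loop is a branchless select: a compare builds an all-ones or all-zeros mask per lane, and the mask then picks one of two values. The following is only a rough single-lane sketch of that idiom in portable C++. It is not the benchmark's source; the names blend_lt, blend_array, bits_of and float_of are hypothetical, and the mapping of threshold, if_true and if_false to %xmm1, %xmm5 and %xmm4 is my reading of the listing rather than something stated in this report.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helpers: view a float as raw bits and back,
// standing in for one 32-bit lane of an XMM register.
inline std::uint32_t bits_of(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

inline float float_of(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// One lane of the mask-and-blend idiom (assumed register mapping):
// returns if_true when threshold < x, otherwise if_false.
inline float blend_lt(float threshold, float x, float if_true, float if_false) {
    std::uint32_t mask = (threshold < x) ? 0xFFFFFFFFu : 0u; // like cmpltps
    std::uint32_t t = bits_of(if_true) & mask;               // like andps
    std::uint32_t f = bits_of(if_false) & ~mask;             // like andnps
    return float_of(t | f);                                  // like orps
}

// Apply the select across an array, as the vectorised loop does
// several lanes at a time.
inline void blend_array(const float* in, float* out, std::size_t n,
                        float threshold, float if_true, float if_false) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = blend_lt(threshold, in[i], if_true, if_false);
    }
}
```

Because the selection is done with bitwise operations on a mask, there is no data-dependent branch in the loop body, which is consistent with the Clang listing containing only the loop-closing jne.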