https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63599
--- Comment #2 from vincenzo Innocente <vincenzo.innocente at cern dot ch> ---

I agree that the code produces correct results. However, it looks sub-optimal to me.
As I understand it, with -Ofast the sequence below is always executed:

    andps    %xmm5, %xmm8
    rcpps    %xmm3, %xmm0
    mulps    %xmm0, %xmm3
    mulps    %xmm0, %xmm3
    addps    %xmm0, %xmm0
    subps    %xmm3, %xmm0
    mulps    %xmm0, %xmm1
    movaps   %xmm2, %xmm0
    cmpleps  %xmm4, %xmm0
    blendvps %xmm0, %xmm2, %xmm1

whereas with -O2 it is not. This causes a performance penalty for samples where the test is often false.
(I tried adding __builtin_expect(x, false), with no effect.)
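To make the difference concrete, here is a minimal scalar C sketch of the two shapes of code being compared. The kernel `r[i] = (x[i] > t) ? a[i] / b[i] : x[i]` is a hypothetical example inferred from the listing (rcpps plus one Newton-Raphson step for the reciprocal, then cmpleps and blendvps to select the result), not the test case attached to the bug. The function names and the way the rcpps estimate is modelled are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel, for illustration only (not the bug's test case):
   r[i] = (x[i] > t) ? a[i] / b[i] : x[i]                                */

/* One Newton-Raphson step for 1/b from an estimate x0, in the same
   operation order as the listing: 2*x0 - (b*x0)*x0 = x0 * (2 - b*x0). */
static float refine_recip(float b, float x0) {
    return (x0 + x0) - (b * x0) * x0;
}

/* Branchy form: the quotient is evaluated only when the test is true.
   Returns how many quotients were evaluated. */
static size_t select_branchy(const float *x, const float *a, const float *b,
                             float t, float *r, size_t n) {
    size_t quotients = 0;
    for (size_t i = 0; i < n; ++i) {
        if (x[i] > t) {
            r[i] = a[i] / b[i];
            ++quotients;
        } else {
            r[i] = x[i];
        }
    }
    return quotients;
}

/* If-converted form: the quotient is evaluated for every element and the
   result is chosen afterwards, mirroring cmpleps + blendvps.
   ASSUMPTION: the hardware rcpps estimate is modelled here as 1/b with a
   relative error of 1e-3; the real rcpps error is hardware-specific. */
static size_t select_blended(const float *x, const float *a, const float *b,
                             float t, float *r, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        assert(b[i] != 0.0f);                       /* the model needs b != 0 */
        float x0 = (1.0f / b[i]) * (1.0f + 1e-3f);  /* stand-in for rcpps */
        float q = a[i] * refine_recip(b[i], x0);    /* always computed */
        int mask = x[i] <= t;                       /* cmpleps: negated test */
        r[i] = mask ? x[i] : q;                     /* blendvps */
    }
    return n; /* one quotient per element, whether or not it is used */
}
```

On inputs where `x[i] > t` is rarely true, `select_branchy` evaluates almost no quotients, while `select_blended` always evaluates `n` of them; that unconditional work is the penalty described above.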