https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63599

--- Comment #2 from vincenzo Innocente <vincenzo.innocente at cern dot ch> ---
I agree that the code produces correct results, but it looks sub-optimal to me.
As I understand it, with -Ofast the sequence below is always executed:

    andps    %xmm5, %xmm8
    rcpps    %xmm3, %xmm0
    mulps    %xmm0, %xmm3
    mulps    %xmm0, %xmm3
    addps    %xmm0, %xmm0
    subps    %xmm3, %xmm0
    mulps    %xmm0, %xmm1
    movaps    %xmm2, %xmm0
    cmpleps    %xmm4, %xmm0
    blendvps    %xmm0, %xmm2, %xmm1

whereas with -O2 it is not.
This causes a performance penalty for samples where the test is often
false.
(I tried adding __builtin_expect(x, false), with no effect.)
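
For readers decoding the listing: the rcpps/mulps/mulps/addps/subps run is a
reciprocal estimate refined by one Newton-Raphson step (x1 = 2*x0 - d*x0*x0),
and blendvps then selects per lane between the quotient and another value.
Below is a scalar sketch of that pattern. It is not the source from the bug
report: the function names are hypothetical, and the rcpps estimate is modeled
by an exact reciprocal deliberately perturbed by 0.1%.

```c
#include <assert.h>
#include <math.h>

/* One Newton-Raphson step for 1/d, given an estimate x0:
   x1 = 2*x0 - d*x0*x0 (the mulps, mulps, addps, subps pattern). */
static float refine_recip(float d, float x0)
{
    return (x0 + x0) - d * x0 * x0;
}

/* Hypothetical if-converted form: the quotient is computed
   unconditionally, then a select (blendvps in the listing) chooses
   between it and a fallback value. */
static float select_div(float num, float den, float fallback, int take_fallback)
{
    /* Assumption: stand-in for the rcpps estimate, off by 0.1%. */
    float x0 = (1.0f / den) * 1.001f;
    float q = num * refine_recip(den, x0);
    return take_fallback ? fallback : q;
}
```

In this form the division work is paid on every element regardless of the
outcome of the test, which is the cost described above when the test is
mostly false.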
