https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118505
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note there is also a missed FMA formation:

  _69 = s_64 + 1.0e+0;
  ...
  _71 = _69 * _70;

which is `(s_64 + 1.0) * _70` and can be rewritten as `s_64 * _70 + _70`.
That alone might get the performance back up.

I should note that LLVM also does the fcsel, but it additionally turns the
two-instruction `(a+1) * b` into a single FMA instruction, `a*b + b`.
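A minimal sketch of why this rewrite is a numerical choice and not a pure instruction-selection one. The helper names `two_step` and `fused` are hypothetical. `(a + 1.0) * b` rounds twice (after the add and after the multiply), while an FMA computing `a*b + b` rounds only once, so the two forms are equal in exact arithmetic but can differ in the last bit. Compilers therefore generally apply this distribution only under relaxed floating-point semantics (for example `-ffast-math`); the flags used for this bug are not shown here. The sketch emulates the FMA's single rounding with exact `fractions.Fraction` arithmetic instead of a hardware instruction.

```python
from fractions import Fraction


def two_step(a: float, b: float) -> float:
    # (a + 1.0) * b: an add and a multiply, each rounded to double.
    return (a + 1.0) * b


def fused(a: float, b: float) -> float:
    # a*b + b evaluated exactly, then rounded once to double,
    # emulating what fma(a, b, b) produces (finite inputs only).
    exact = Fraction(a) * Fraction(b) + Fraction(b)
    return float(exact)


if __name__ == "__main__":
    # Ordinary inputs: both forms agree.
    print(two_step(2.0, 3.0), fused(2.0, 3.0))  # 9.0 9.0

    # a + 1.0 is an exact tie that rounds to 1.0, so the two-step form
    # loses a; the fused form keeps it and rounds up instead.
    a, b = 2.0**-53, 1.0 + 2.0**-52
    print(two_step(a, b))  # 1.0000000000000002
    print(fused(a, b))     # 1.0000000000000004
```

For most inputs the results match, which is why the FMA form is usually an acceptable trade under fast-math rules. Edge cases like the tie above show the two forms are not bit-identical.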