https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116979
--- Comment #13 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Without the #c12 patch, slp2 shows with -O2 -mfma:
Vector cost: 172
Scalar cost: 184
with it
Vector cost: 156
Scalar cost: 152
No idea how the scalar cost decreased so much.
.VEC_FMADDSUB (_8, _10, _13) 1 times vector_stmt costs 12 in body
is like
.VEC_ADDSUB (_12, _13) 1 times vector_stmt costs 12 in body
(which is probably too low, because multiplication has vector cost 16).
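
To make the arithmetic in the two dumps explicit, here is a minimal sketch. It assumes the SLP pass keeps the vectorized form only when the vector total is strictly lower than the scalar total. `slp_profitable` is a hypothetical helper for illustration, not a GCC function, and the real check may include further terms.

```python
# Hypothetical helper, not GCC code: compares the two totals printed in the slp2 dump.
def slp_profitable(vector_cost: int, scalar_cost: int) -> bool:
    """Assumed rule: vectorize only when the vector total is strictly lower."""
    return vector_cost < scalar_cost


# Totals quoted in the comment for -O2 -mfma.
without_patch = {"vector": 172, "scalar": 184}
with_patch = {"vector": 156, "scalar": 152}

for label, cost in (("without patch", without_patch), ("with patch", with_patch)):
    verdict = slp_profitable(cost["vector"], cost["scalar"])
    print(f"{label}: vector={cost['vector']} scalar={cost['scalar']} profitable={verdict}")

# How far each total moved once the patch is applied.
vector_drop = without_patch["vector"] - with_patch["vector"]
scalar_drop = without_patch["scalar"] - with_patch["scalar"]
print(f"vector total dropped by {vector_drop}, scalar total dropped by {scalar_drop}")
```

Under that assumption the comparison flips with the patch: the vector total falls by 16 but the scalar total falls by 32, so the vector form goes from cheaper to more expensive than the scalar one.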