https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107647
--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ah, we only fold those to internal functions _after_ vectorization. SLP will see

double x[2];
void foo (double a, double b, double * __restrict c)
{
  x[0] = __builtin_fma (a, b, c[0]);
  x[1] = __builtin_fma (a, b, -c[1]);
}

as two calls to FMA, the second with a negated addend, and thus fail to vectorize it optimally.