https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81904
Bug ID: 81904
Summary: FMA and addsub instructions
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: glisse at gcc dot gnu.org
Target Milestone: ---
Target: x86_64-*-*

(asked in https://stackoverflow.com/questions/45298855/how-to-write-portable-simd-code-for-complex-multiplicative-reduction/45401182#comment77780455_45401182 )

Intel has instructions like vfmaddsubps. GCC manages, under certain circumstances, to merge a mult and a plus, or a mult and a minus, into an FMA, but not a mult and this mixed addsub operation.

#include <x86intrin.h>
__m128d f(__m128d x, __m128d y, __m128d z){
    return _mm_addsub_pd(_mm_mul_pd(x,y),z);
}
__m128d g(__m128d x, __m128d y, __m128d z){
    return _mm_fmaddsub_pd(x,y,z);
}

(the order of the arguments is probably not right)

My first guess at how this could be implemented without too much trouble is in ix86_gimple_fold_builtin: for IX86_BUILTIN_ADDSUBPD and the related builtins, check that we are late enough in the optimization pipeline (roughly where "widening_mul" runs), that contractions are enabled, and that the first (?) argument is a single-use MULT_EXPR. I didn't check what the situation is with the vectorizer (which IIRC can now generate code that ends up as addsub).
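The suggested fold in ix86_gimple_fold_builtin might look roughly like the following. This is only a sketch, not a patch: the helper names are GCC internals recalled from memory (has_single_use, SSA_NAME_DEF_STMT, gimple_assign_rhs_code) and the exact contract/pass checks would need to be worked out.

```
/* Sketch: inside ix86_gimple_fold_builtin, handling
   IX86_BUILTIN_ADDSUBPD and the related builtins.  */
tree arg0 = gimple_call_arg (stmt, 0);

/* Only contract when FP contraction is allowed and we are late
   enough in the pipeline (roughly where widening_mul runs).  */
if (contraction_enabled_and_late_enough ()      /* placeholder check */
    && TREE_CODE (arg0) == SSA_NAME
    && has_single_use (arg0))
  {
    gimple *def = SSA_NAME_DEF_STMT (arg0);
    if (is_gimple_assign (def)
        && gimple_assign_rhs_code (def) == MULT_EXPR)
      {
        /* Rewrite the addsub call into the fmaddsub builtin,
           taking x and y from the MULT_EXPR operands and z from
           the addsub's second argument.  */
      }
  }
```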