https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81904
Bug ID: 81904
Summary: FMA and addsub instructions
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: glisse at gcc dot gnu.org
Target Milestone: ---
Target: x86_64-*-*

(asked in https://stackoverflow.com/questions/45298855/how-to-write-portable-simd-code-for-complex-multiplicative-reduction/45401182#comment77780455_45401182 )

Intel has instructions like vfmaddsubps. GCC manages, under certain circumstances, to merge a mult and a plus, or a mult and a minus, into an FMA, but not a mult and this mixed addsub operation.

#include <x86intrin.h>
__m128d f(__m128d x, __m128d y, __m128d z){
    return _mm_addsub_pd(_mm_mul_pd(x,y),z);
}
__m128d g(__m128d x, __m128d y, __m128d z){
    return _mm_fmaddsub_pd(x,y,z);
}

(the order of the arguments is probably not right)

My first guess at how this could be implemented without too much trouble is in ix86_gimple_fold_builtin: for IX86_BUILTIN_ADDSUBPD and the related builtins, check that we are late enough in the optimization pipeline (roughly where "widening_mul" runs), that contractions are enabled, and that the first (?) argument is a single-use MULT_EXPR. I didn't check what the situation is with the vectorizer (which IIRC can now generate code that ends up as addsub).
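The suggested fold in ix86_gimple_fold_builtin might look roughly like the following. This is only a sketch, not a patch: the helper names are GCC internals recalled from memory (has_single_use, SSA_NAME_DEF_STMT, gimple_assign_rhs_code) and the exact contract/pass checks would need to be worked out.

```
/* Sketch: inside ix86_gimple_fold_builtin, handling
   IX86_BUILTIN_ADDSUBPD and the related builtins.  */
tree arg0 = gimple_call_arg (stmt, 0);

/* Only contract when FP contraction is allowed and we are late
   enough in the pipeline (roughly where widening_mul runs).  */
if (contraction_enabled_and_late_enough ()      /* placeholder check */
    && TREE_CODE (arg0) == SSA_NAME
    && has_single_use (arg0))
  {
    gimple *def = SSA_NAME_DEF_STMT (arg0);
    if (is_gimple_assign (def)
        && gimple_assign_rhs_code (def) == MULT_EXPR)
      {
        /* Rewrite the addsub call into the fmaddsub builtin,
           taking x and y from the MULT_EXPR operands and z from
           the addsub's second argument.  */
      }
  }
```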