https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101895

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
           Priority|P3                          |P2
             Target|                            |x86_64-*-*
   Last reconfirmed|                            |2021-08-16

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.

void foo(float * restrict a, float b, float *c) {
  a[0] = c[0]*b + a[0];
  a[1] = c[2]*b + a[1];
  a[2] = c[1]*b + a[2];
  a[3] = c[3]*b + a[3];
}

shows the issue on x86_64: no FMA is generated.  One complication is that
FMA forming is done only in a later pass, so the vectorizer has no guidance
when deciding where to place the permute.
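
To make the placement question concrete, here is a minimal scalar sketch of
two places the permute could go.  It is not what the vectorizer emits; the
function names and the perm[] table are hypothetical, and it assumes the
problematic placement is between the multiply and the add.

```c
/* Lane order used by foo(): result lane i reads c[perm[i]].
   (perm[] is an illustrative helper, not something from GCC.) */
static const int perm[4] = {0, 2, 1, 3};

/* Variant A: permute the loaded c values, then multiply and add.
   The multiply feeds the add directly, so a later pass could still
   fuse the pair into an FMA. */
void foo_permute_load(float *restrict a, float b, const float *c) {
  float pc[4];
  for (int i = 0; i < 4; i++)
    pc[i] = c[perm[i]];
  for (int i = 0; i < 4; i++)
    a[i] = pc[i] * b + a[i];
}

/* Variant B: multiply in memory order, then permute the products.
   The permute now sits between the multiply and the add. */
void foo_permute_product(float *restrict a, float b, const float *c) {
  float prod[4], pp[4];
  for (int i = 0; i < 4; i++)
    prod[i] = c[i] * b;
  for (int i = 0; i < 4; i++)
    pp[i] = prod[perm[i]];
  for (int i = 0; i < 4; i++)
    a[i] = pp[i] + a[i];
}
```

Both variants compute the same values as foo(); they differ only in whether
the multiply and the add stay adjacent.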

Note that the vectorizer's permute optimization propagates in one direction
only (but the optimistic pieces); the intent is to reduce the number of
permutes, which almost exclusively come from loads.
