https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66280

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Related to PR66251 (a partial backport fixes the GCC 5 branch).

On trunk the issue is that we have an SLP instance with the nodes

node
        stmt 0 _9->re = _17;
        stmt 1 _9->im = _23;

node
        stmt 0 patt_56 = _13 w* pretmp_51;
        stmt 1 patt_27 = _21 w* pretmp_51;

node
        stmt 0 _13 = _12->re;
        stmt 1 _21 = _12->im;

and non-SLP stores:

  # a.0_31 = PHI <a.0_29(22), a.5_25(7)>
  _5 = (long unsigned int) a.0_31;
  _6 = _5 * 8;
  _7 = pretmp_45 + _6;
  _9 = pretmp_47 + _6;
  _11 = _5 * 4;
  _12 = pretmp_49 + _11;
  _13 = _12->re;
  _14 = (int) _13;
  _17 = _14 * pretmp_53;
  _9->re = _17;
  _7->im = _17;
  _7->re = _17;
  _21 = _12->im;
  _22 = (int) _21;
  _23 = _22 * pretmp_53;
  _9->im = _23;
  a.5_25 = a.0_31 + 1;
  if (a.5_25 == 0)

where the stored values, and thus the vector stmts, are shared.  For
SLP scheduling we insert the vector stmts before the last scalar use
(in this case _9->im = _23), and this includes the computation of _17,
as the vector result is just {_17, _23}.  But then regular vectorization
comes along and inserts the required permutes at the place of the
_7->{re,im} stores, which come before that insertion point, so the
permutes would use the vector def before it is computed.
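To make the ordering problem concrete, here is a toy model in C. It is not GCC code, and all names in it are hypothetical. It numbers the loop-body stmts of the dump above from 0 (the PHI) to 18 (the `if`), places the SLP vector stmts before the last scalar stmt of the `_9` store group, and checks whether a permute placed at a `_7` store would come after that definition.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not GCC code: positions of the relevant stmts in the loop
   body dump, counting the PHI as 0.  All names are hypothetical.  */
enum stmt_pos {
  POS_9_RE = 10,  /* _9->re = _17; */
  POS_7_IM = 11,  /* _7->im = _17; */
  POS_7_RE = 12,  /* _7->re = _17; */
  POS_9_IM = 16   /* _9->im = _23; */
};

/* Current SLP placement: vector stmts for the group are inserted before
   its last scalar stmt, i.e. the lane with the highest position.  */
static int slp_insert_pos (const int *lanes, int n)
{
  assert (n > 0);
  int last = lanes[0];
  for (int i = 1; i < n; i++)
    if (lanes[i] > last)
      last = lanes[i];
  return last;
}

/* Both the vector def and the permute are inserted before the stmt at
   the given position; the def is available to the permute only if its
   position is strictly smaller (ties are treated as a conflict).  */
static bool def_precedes_use (int def_pos, int use_pos)
{
  return def_pos < use_pos;
}
```

With the store group `{ POS_9_RE, POS_9_IM }` the vector stmts land before position 16, while the permutes for the `_7` stores sit at positions 11 and 12, so the check fails for both: in this model the permutes would read a vector def that has not been computed yet.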

I can't see how the old placement method avoided all the issues in
similar situations.  Well, it inserted stmts before the first stmt
in a group (except for stores, where it chose the last stmt, and loads,
where it chose the first).  So with _9->re = _23; and _9->im = _17;
it would have been broken there as well.  Ah, no: on the GCC 5 branch
we then simply fail to detect the SLP opportunity.

typedef struct
{
  short re;
  short im;
} cint16_T;
typedef struct
{
  int re;
  int im;
} cint32_T;
int a;
short b;
cint16_T *c;
cint32_T *d, *e;
void
fn1 ()
{
  for (; a; a++)
    {
      d[a].re = d[a].im = e[a].im = c[a].im * b;
      e[a].re = c[a].re * b;
    }
}

Even after the backport (and on trunk), this testcase ICEs in vect_get_vec_def_for_operand.
