https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101178

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Another case:

double a[2], b[2], c[2];

void foo ()
{
  double tem0 = a[1] + b[1];
  double tem1 = a[0] - b[0];
  c[0] = tem0;
  c[1] = tem1;
}

here the addsub VEC_PERM merge node has the wrong lane order (+, - instead of
-, +) for x86 addsub at the point where we currently do SLP pattern matching.
But if we moved the load permutations across this node we could not only save
one permute but also match x86 addsub.
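
For illustration, a hand-written SSE3 intrinsics sketch (compile with -msse3)
of the shape we could end up with here: instead of permuting both loads, a
single lane swap of the .ADDSUB result is enough, since ADDSUBPD subtracts in
lane 0 and adds in lane 1.  The function name is made up and this is not
literal vectorizer output.

#include <immintrin.h>

double a[2], b[2], c[2];

void foo_sketch ()
{
  __m128d va = _mm_loadu_pd (a);          /* { a[0], a[1] } */
  __m128d vb = _mm_loadu_pd (b);          /* { b[0], b[1] } */
  __m128d t = _mm_addsub_pd (va, vb);     /* { a[0]-b[0], a[1]+b[1] } */
  __m128d r = _mm_shuffle_pd (t, t, 1);   /* { a[1]+b[1], a[0]-b[0] } */
  _mm_storeu_pd (c, r);                   /* c[0] = tem0, c[1] = tem1 */
}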

Currently optimize_slp materializes the permutes at the addsub VEC_PERM merge
node, which does the trick (but pattern matching ran too early here).
It will be a costing decision whether to materialize the permute here or to
hope for eliding another permute further up the chain.

double a[2], b[2], c[2];

void foo ()
{
  double tem0 = a[1] - b[1];
  double tem1 = a[0] + b[0];
  c[0] = tem0;
  c[1] = tem1;
}

is currently miscompiled when we match .ADDSUB first, since optimize_slp then
happily treats it as a lane-agnostic operation.  That's probably a latent
wrong-code issue on the branch as well.
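
To illustrate the lane sensitivity, a hand-written sketch (made-up function
names, compile with -msse3); whether the actual miscompile takes exactly this
shape is only an assumption here.  ADDSUBPD fixes the subtract to lane 0 and
the add to lane 1, so moving a lane swap across the .ADDSUB flips which lane
gets which operation.

#include <immintrin.h>

double a[2], b[2], c[2];

/* What the testcase asks for: c[0] = a[1] - b[1], c[1] = a[0] + b[0],
   i.e. the inputs are swapped before the addsub.  */
void foo_expected ()
{
  __m128d va = _mm_loadu_pd (a);
  __m128d vb = _mm_loadu_pd (b);
  va = _mm_shuffle_pd (va, va, 1);            /* { a[1], a[0] } */
  vb = _mm_shuffle_pd (vb, vb, 1);            /* { b[1], b[0] } */
  _mm_storeu_pd (c, _mm_addsub_pd (va, vb));  /* { a[1]-b[1], a[0]+b[0] } */
}

/* The same swap applied to the .ADDSUB result instead gives
   { a[1]+b[1], a[0]-b[0] }, i.e. the signs land on the wrong lanes.  */
void foo_lane_agnostic ()
{
  __m128d t = _mm_addsub_pd (_mm_loadu_pd (a), _mm_loadu_pd (b));
  _mm_storeu_pd (c, _mm_shuffle_pd (t, t, 1));
}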
