https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114375

            Bug ID: 114375
           Summary: Wrong vectorization of permuted mask load
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

I've noticed that we never end up with a load permutation on .MASK_LOAD SLP
nodes; instead we put one on the mask load only (if that ends up permuted).
It took me a while to produce a testcase that we are happy to vectorize and
that does not produce correct values by accident.  Here's one:

int a[512];
int b[512];
int c[512];
void __attribute__((noipa))
foo(int * __restrict p)
{
  /* The conditionally loaded values from p[] are stored to b[] and c[]
     in swapped (permuted) order.  */
  for (int i = 0; i < 64; ++i)
    {
      int tem = 2, tem2 = 2;
      if (a[4*i + 1])
        tem = p[4*i];
      if (a[4*i])
        tem2 = p[4*i + 2];
      b[2*i] = tem2;
      b[2*i+1] = tem;
      if (a[4*i + 2])
        tem = p[4*i + 1];
      if (a[4*i + 3])
        tem2 = p[4*i + 3];
      c[2*i] = tem2;
      c[2*i+1] = tem;
    }
}
int main()
{
  /* a[] = { 0, 0, 1, 1, 0, 0, 1, 1, ... } serves both as the mask and,
     since foo is called with p == a, as the loaded data.  */
  for (int i = 0; i < 512; ++i)
    a[i] = (i >> 1) & 1;
  foo (a);
  /* With scalar execution c[] is { 1, 0, 1, 0, ... }.  */
  if (c[2] != 1 || c[3] != 0)
    __builtin_abort ();
}

This is miscompiled on x86_64 with -O3 -mavx2.  Note that b[] ends up correct,
but c[] is { 1, 0, 0, 1, 1, 0, 0, 1, ... } instead of the expected
{ 1, 0, 1, 0, ... }.
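
To see the whole wrong pattern rather than only the two elements checked by
main(), something like the following sketch could replace main() in the
reproducer above (foo_ref, b_ref and c_ref are names made up here, not part
of the report); foo_ref repeats the loop of foo() but is pinned to -O0 via
the optimize attribute so it stays scalar and serves as a reference:

#include <stdio.h>

int b_ref[512];
int c_ref[512];

/* Same computation as foo(), but forced to stay unvectorized.  */
void __attribute__((noipa, optimize("O0")))
foo_ref (int * __restrict p)
{
  for (int i = 0; i < 64; ++i)
    {
      int tem = 2, tem2 = 2;
      if (a[4*i + 1])
        tem = p[4*i];
      if (a[4*i])
        tem2 = p[4*i + 2];
      b_ref[2*i] = tem2;
      b_ref[2*i+1] = tem;
      if (a[4*i + 2])
        tem = p[4*i + 1];
      if (a[4*i + 3])
        tem2 = p[4*i + 3];
      c_ref[2*i] = tem2;
      c_ref[2*i+1] = tem;
    }
}

int main()
{
  for (int i = 0; i < 512; ++i)
    a[i] = (i >> 1) & 1;
  foo (a);
  foo_ref (a);
  /* Only indices 0..127 of b[] and c[] are written by the loops.  */
  for (int i = 0; i < 128; ++i)
    if (b[i] != b_ref[i] || c[i] != c_ref[i])
      {
        printf ("i=%d: b=%d (ref %d)  c=%d (ref %d)\n",
                i, b[i], b_ref[i], c[i], c_ref[i]);
        __builtin_abort ();
      }
  return 0;
}

On an affected compiler (x86_64, -O3 -mavx2) this should report the c[]
mismatches described above.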
