https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65832

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
Testcase simulating all the QImode (byte element) vector cases the vectorizer may create:

char a[1024];
char b[1024];

void foobar (int s)
{
  for (int i = 0; i < 16; ++i)
    {
      b[i] = a[s*i];
    }
}

void foo (int s)
{
  for (int i = 0; i < 8; ++i)
    {
      b[2*i] = a[s*i];
      b[2*i + 1] = a[s*i + 1];
    }
}

void bar (int s)
{
  for (int i = 0; i < 4; ++i)
    {
      b[4*i] = a[s*i];
      b[4*i + 1] = a[s*i + 1];
      b[4*i + 2] = a[s*i + 2];
      b[4*i + 3] = a[s*i + 3];
    }
}

void baz (int s)
{
  for (int i = 0; i < 2; ++i)
    {
      b[8*i] = a[s*i];
      b[8*i + 1] = a[s*i + 1];
      b[8*i + 2] = a[s*i + 2];
      b[8*i + 3] = a[s*i + 3];
      b[8*i + 4] = a[s*i + 4];
      b[8*i + 5] = a[s*i + 5];
      b[8*i + 6] = a[s*i + 6];
      b[8*i + 7] = a[s*i + 7];
    }
}

Compile with -fdisable-tree-cunrolli so that these small loops are not completely unrolled before the vectorizer runs.

foobar creates abysmal code, and baz needlessly goes through the stack.
With plain -msse2, code generation isn't great for any of the cases
except foo, which ends up using pinsrw.
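As a rough hand-written illustration (not GCC's actual output), the pinsrw-based construction for foo can be expressed with the SSE2 intrinsic _mm_insert_epi16. The helpers foo_pinsrw and load_u16 are hypothetical, take explicit pointers instead of the global arrays, and assume a little-endian x86 target.

```c
#include <emmintrin.h> /* SSE2: _mm_insert_epi16 (pinsrw), _mm_storeu_si128 */
#include <stdint.h>
#include <string.h>

/* Unaligned-safe 16-bit load of p[0], p[1]; on little-endian x86
   p[0] ends up in the low byte of the result.  */
static inline int load_u16 (const char *p)
{
  uint16_t x;
  memcpy (&x, p, sizeof x);
  return x;
}

/* Hypothetical helper: writes dst[2*i] = src[s*i] and
   dst[2*i + 1] = src[s*i + 1] for i = 0..7, like foo () does for b,
   inserting one 16-bit pair per pinsrw.  The lane index of
   _mm_insert_epi16 must be a compile-time constant, hence no loop.  */
static void foo_pinsrw (char *dst, const char *src, int s)
{
  __m128i v = _mm_setzero_si128 ();
  v = _mm_insert_epi16 (v, load_u16 (src + 0 * s), 0);
  v = _mm_insert_epi16 (v, load_u16 (src + 1 * s), 1);
  v = _mm_insert_epi16 (v, load_u16 (src + 2 * s), 2);
  v = _mm_insert_epi16 (v, load_u16 (src + 3 * s), 3);
  v = _mm_insert_epi16 (v, load_u16 (src + 4 * s), 4);
  v = _mm_insert_epi16 (v, load_u16 (src + 5 * s), 5);
  v = _mm_insert_epi16 (v, load_u16 (src + 6 * s), 6);
  v = _mm_insert_epi16 (v, load_u16 (src + 7 * s), 7);
  _mm_storeu_si128 ((__m128i *) dst, v);
}
```

Eight scalar loads feeding eight pinsrw, with no stack temporary, is roughly what one would hope the vectorizer's vector construction turns into for this case.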

With -msse4, baz fails to use pinsrq and foobar fails to use pinsrb.
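For baz, the two 8-byte groups can be combined in registers without touching the stack. The sketch below is a hypothetical hand-written equivalent that uses only SSE2 (two movq loads plus punpcklqdq); with SSE4.1 the second group could instead be inserted with pinsrq (_mm_insert_epi64, x86-64 only), and foobar's single bytes would map to pinsrb (_mm_insert_epi8). The helper name baz_movq and its pointer parameters are assumptions for illustration.

```c
#include <emmintrin.h> /* SSE2: _mm_loadl_epi64 (movq), _mm_unpacklo_epi64 */

/* Hypothetical helper: writes dst[8*i + j] = src[s*i + j] for
   i = 0..1, j = 0..7, like baz () does for b, keeping everything
   in vector registers.  */
static void baz_movq (char *dst, const char *src, int s)
{
  /* movq: load 8 bytes into the low half, zeroing the high half.  */
  __m128i lo = _mm_loadl_epi64 ((const __m128i *) src);
  __m128i hi = _mm_loadl_epi64 ((const __m128i *) (src + s));
  /* punpcklqdq: low quadword from lo, high quadword from hi.  */
  __m128i v = _mm_unpacklo_epi64 (lo, hi);
  _mm_storeu_si128 ((__m128i *) dst, v);
}
```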
