https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114375
Bug ID: 114375
Summary: Wrong vectorization of permuted mask load
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
I've figured out that we never end up with a load permutation on .MASK_LOAD SLP
nodes; instead we put one on the mask load only (if that ends up permuted). It
took me a while to produce a testcase that we're happy to vectorize and that
doesn't produce correct values by accident. Here's one:
int a[512];
int b[512];
int c[512];

void __attribute__((noipa))
foo (int * __restrict p)
{
  for (int i = 0; i < 64; ++i)
    {
      int tem = 2, tem2 = 2;
      if (a[4*i + 1])
        tem = p[4*i];
      if (a[4*i])
        tem2 = p[4*i + 2];
      b[2*i] = tem2;
      b[2*i+1] = tem;
      if (a[4*i + 2])
        tem = p[4*i + 1];
      if (a[4*i + 3])
        tem2 = p[4*i + 3];
      c[2*i] = tem2;
      c[2*i+1] = tem;
    }
}

int main ()
{
  for (int i = 0; i < 512; ++i)
    a[i] = (i >> 1) & 1;
  foo (a);
  if (c[2] != 1 || c[3] != 0)
    __builtin_abort ();
}
This is miscompiled on x86_64 with -O3 -mavx2. Note that b[] is correct in the
end, but c[] is { 1, 0, 0, 1, 1, 0, 0, 1, ... } instead of { 1, 0, 1, 0, ... }.