https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114375
            Bug ID: 114375
           Summary: Wrong vectorization of permuted mask load
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

I've noticed we never end up with a load permutation on .MASK_LOAD SLP
nodes; instead we put one on the mask load only (if that ends up
permuted).  It took me a while to produce a testcase that we're happy
to vectorize and that doesn't produce correct values by accident.
Here's one:

int a[512];
int b[512];
int c[512];

void __attribute__((noipa))
foo (int * __restrict p)
{
  for (int i = 0; i < 64; ++i)
    {
      int tem = 2, tem2 = 2;
      if (a[4*i + 1])
        tem = p[4*i];
      if (a[4*i])
        tem2 = p[4*i + 2];
      b[2*i] = tem2;
      b[2*i+1] = tem;
      if (a[4*i + 2])
        tem = p[4*i + 1];
      if (a[4*i + 3])
        tem2 = p[4*i + 3];
      c[2*i] = tem2;
      c[2*i+1] = tem;
    }
}

int main ()
{
  for (int i = 0; i < 512; ++i)
    a[i] = (i >> 1) & 1;
  foo (a);
  if (c[2] != 1 || c[3] != 0)
    __builtin_abort ();
}

This is miscompiled on x86_64 with -O3 -mavx2.  Note b[] is correct in
the end, but c[] is { 1, 0, 0, 1, 1, 0, 0, 1, ... } instead of
{ 1, 0, 1, 0, ... }.
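
To make the expected values concrete, here is a minimal scalar trace of the
i == 1 iteration (not part of the reported testcase, just an illustration),
assuming p == a and a[i] == ((i >> 1) & 1) as set up in main() above.  It
prints c[2] == 1 and c[3] == 0, the values the scalar semantics require and
that the abort check tests; the vectorized code instead stores c[2] == 0
and c[3] == 1.

#include <stdio.h>

int main (void)
{
  int a[8], c[4] = { -1, -1, -1, -1 };
  for (int i = 0; i < 8; ++i)
    a[i] = (i >> 1) & 1;               /* a[] = { 0, 0, 1, 1, 0, 0, 1, 1 } */

  int *p = a;
  int i = 1;                           /* trace only the i == 1 iteration */
  int tem = 2, tem2 = 2;
  if (a[4*i + 1])                      /* a[5] == 0 -> tem stays 2 */
    tem = p[4*i];
  if (a[4*i])                          /* a[4] == 0 -> tem2 stays 2 */
    tem2 = p[4*i + 2];
  /* b[2*i] = tem2; b[2*i+1] = tem;       (b[] stores omitted here) */
  if (a[4*i + 2])                      /* a[6] == 1 -> tem = a[5] = 0 */
    tem = p[4*i + 1];
  if (a[4*i + 3])                      /* a[7] == 1 -> tem2 = a[7] = 1 */
    tem2 = p[4*i + 3];
  c[2*i] = tem2;                       /* c[2] = 1 */
  c[2*i+1] = tem;                      /* c[3] = 0 */

  printf ("c[2] = %d, c[3] = %d\n", c[2], c[3]);   /* expected: 1, 0 */
  return 0;
}

With that, the original testcase built with -O3 -mavx2 on x86_64 aborts,
while it presumably passes when vectorization is disabled.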