https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98837
Bug ID: 98837
Summary: SLP discovery does not consider all lane permutes
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---

We SLP-vectorize

int a[1024], b[1024], c[1024];
void foo ()
{
  for (int i = 0; i < 1024; i += 4)
    {
      c[i] = a[i] + b[i];
      c[i+1] = a[i+1] + b[i+1];
      c[i+2] = a[i+2] * b[i+2];
      c[i+3] = a[i+3] * b[i+3];
    }
}

by splitting the SLP group into two (lanes 0-1 and lanes 2-3). The very similar

int a[1024], b[1024], c[1024];
void foo ()
{
  for (int i = 0; i < 1024; i += 4)
    {
      c[i] = a[i] + b[i];
      c[i+1] = a[i+1] * b[i+1];
      c[i+2] = a[i+2] + b[i+2];
      c[i+3] = a[i+3] * b[i+3];
    }
}

is not SLP-vectorized, however. SLP discovery does not consider splitting the group into non-adjacent sets. Here those would be lanes {0, 2} for the additions and lanes {1, 3} for the multiplications.

The same applies to basic-block SLP when you make the data type double (so no unrolling is needed). In that case we simply fall back to a scalar build.
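The following hand-written sketch shows what splitting the group into non-adjacent sets could look like for the double, basic-block case. The four stores are split into the lane sets {0, 2} (additions) and {1, 3} (multiplications). Lane permutes are applied on the loads and again before the stores. This is an illustration only, not what GCC emits today.

The sketch assumes 16-byte vectors of two doubles (e.g. SSE2) and uses the GCC/Clang generic vector extensions. The names foo_scalar, foo_split and SHUF2 are hypothetical.

```c
#include <string.h>

/* Generic vector of two doubles (assumes 16-byte vectors). */
typedef double v2df __attribute__((vector_size(16)));
/* Integer mask type with the same lane count/size, for __builtin_shuffle. */
typedef long long v2di __attribute__((vector_size(16)));

/* Hypothetical helper: pick lanes i and j from the concatenation of x and y
   (lanes 0-1 come from x, lanes 2-3 come from y). */
#if defined(__clang__)
#define SHUF2(x, y, i, j) __builtin_shufflevector((x), (y), (i), (j))
#else
#define SHUF2(x, y, i, j) __builtin_shuffle((x), (y), (v2di){(i), (j)})
#endif

/* Scalar form: the second testcase's body with double, as one basic block. */
void foo_scalar(double *restrict c, const double *a, const double *b)
{
  c[0] = a[0] + b[0];
  c[1] = a[1] * b[1];
  c[2] = a[2] + b[2];
  c[3] = a[3] * b[3];
}

/* Same computation with the group split into lanes {0,2} and {1,3}. */
void foo_split(double *restrict c, const double *a, const double *b)
{
  v2df a01, a23, b01, b23;
  memcpy(&a01, a, sizeof a01);      /* {a[0], a[1]} */
  memcpy(&a23, a + 2, sizeof a23);  /* {a[2], a[3]} */
  memcpy(&b01, b, sizeof b01);
  memcpy(&b23, b + 2, sizeof b23);

  /* Load-side lane permutes: gather the non-adjacent lane sets. */
  v2df a_even = SHUF2(a01, a23, 0, 2);  /* {a[0], a[2]} */
  v2df a_odd  = SHUF2(a01, a23, 1, 3);  /* {a[1], a[3]} */
  v2df b_even = SHUF2(b01, b23, 0, 2);  /* {b[0], b[2]} */
  v2df b_odd  = SHUF2(b01, b23, 1, 3);  /* {b[1], b[3]} */

  v2df sum  = a_even + b_even;  /* {c[0], c[2]} */
  v2df prod = a_odd * b_odd;    /* {c[1], c[3]} */

  /* Store-side lane permutes: restore the original store order. */
  v2df c01 = SHUF2(sum, prod, 0, 2);  /* {sum[0], prod[0]} = {c[0], c[1]} */
  v2df c23 = SHUF2(sum, prod, 1, 3);  /* {sum[1], prod[1]} = {c[2], c[3]} */
  memcpy(c, &c01, sizeof c01);
  memcpy(c + 2, &c23, sizeof c23);
}
```

The sketch uses six permutes in total: four on the loads and two before the stores. Whether this beats the scalar build depends on the target's permute costs.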