https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120457
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                   |ASSIGNED
     Ever confirmed|0                             |1
   Last reconfirmed|                              |2025-05-30
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
   Target Milestone|---                           |16.0
           Keywords|                              |missed-optimization
            Summary|gcc.dg/vect/pr79920.c fail    |[16 Regression]
                   |starting with                 |gcc.dg/vect/pr79920.c fail
                   |r16-924-g1bc5b47f5b06dc       |starting with
                   |                              |r16-924-g1bc5b47f5b06dc

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
So do we now no longer put t32[ip_1][0] and t32[ip_1][2] into a single DR group? IIRC Power has V2DF but nothing larger, so those two elements will never get loaded together, which means the heuristic makes some sense.

t.c:14:7: note: Detected single element interleaving *_4 step 24
t.c:14:7: note: Detected single element interleaving *_10 step 24

But what happens is that we end up lowering this into a vector interleaving scheme that covers the other access anyway:

t.c:14:7: note: node 0x3dda5b0 (max_nunits=2, refcnt=2) vector(2) double
t.c:14:7: note: op: VEC_PERM_EXPR
t.c:14:7: note: stmt 0 _5 = *_4;
t.c:14:7: note: lane permutation { 0[0] }
t.c:14:7: note: children 0x3dda910
t.c:14:7: note: node 0x3dda910 (max_nunits=1, refcnt=1) vector(2) double
t.c:14:7: note: op: VEC_PERM_EXPR
t.c:14:7: note: stmt 0 _5 = *_4;
t.c:14:7: note: stmt 1 _5 = *_4;
t.c:14:7: note: lane permutation { 0[0] 0[0] }
t.c:14:7: note: children 0x3dda880
t.c:14:7: note: node 0x3dda880 (max_nunits=2, refcnt=2) vector(2) double
t.c:14:7: note: op template: _5 = *_4;
t.c:14:7: note: stmt 0 _5 = *_4;
t.c:14:7: note: stmt 1 ---
t.c:14:7: note: stmt 2 ---

but then we decide:

t.c:14:7: note: === vect_slp_analyze_operations ===
t.c:14:7: note: ==> examining statement: _5 = *_4;
t.c:14:7: missed: single-element interleaving not supported for not adjacent vector loads

That would get us elementwise accesses if we had just a single lane, but the lowering above wrecked that path:

t.c:15:31: missed: not vectorized: relevant stmt not supported: _5 = *_4;
t.c:14:7: note: unsupported SLP instance starting from: t33[ip_1_46][i_0_47] = _14;
t.c:14:7: missed: unsupported SLP instances

This is a heuristic as well:

      /* If this is single-element interleaving with an element distance
         that leaves unused vector loads around, fall back to elementwise
         access if possible - we otherwise at least create very sub-optimal
         code in that case (and blow up memory, see PR65518).  */
      if (loop_vinfo
          && single_element_p
          && (*memory_access_type == VMAT_CONTIGUOUS
              || *memory_access_type == VMAT_CONTIGUOUS_REVERSE)
          && maybe_gt (group_size, TYPE_VECTOR_SUBPARTS (vectype)))

We can extend that to serve as a permute-lowering heuristic as well.
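To make the quoted condition concrete, here is a minimal C sketch. It is not GCC's code. The names `enum access_type`, `fallback_to_elementwise_p` and `count_used_vector_loads` are hypothetical, and plain `unsigned` comparisons stand in for GCC's poly_int `maybe_gt`. The sketch also assumes exactly one accessed element at a fixed offset in each group, with the first group starting at lane 0 of the first vector.

With this testcase's numbers, the sketch shows why contiguous loads waste work. A step of 24 bytes means a group of 3 doubles, and V2DF has 2 lanes, so group_size > nunits and the predicate fires.

```c
#include <stdbool.h>

/* Hypothetical stand-in for GCC's VMAT_* memory access types.  */
enum access_type
{
  ACCESS_CONTIGUOUS,
  ACCESS_CONTIGUOUS_REVERSE,
  ACCESS_ELEMENTWISE
};

/* Mirrors the quoted condition: in a loop, a single-element
   interleaving group that would use contiguous vector loads and
   spans more elements than one vector holds should fall back to
   elementwise access.  */
static bool
fallback_to_elementwise_p (bool in_loop, bool single_element,
                           enum access_type type,
                           unsigned group_size, unsigned nunits)
{
  return in_loop
         && single_element
         && (type == ACCESS_CONTIGUOUS
             || type == ACCESS_CONTIGUOUS_REVERSE)
         && group_size > nunits;
}

/* For VF consecutive groups covered by contiguous vector loads of
   NUNITS lanes, store the total number of loads in *TOTAL_LOADS and
   return how many of them contain the accessed element (at OFFSET
   within each group).  The other loads hold no accessed element.  */
static unsigned
count_used_vector_loads (unsigned group_size, unsigned nunits,
                         unsigned vf, unsigned offset,
                         unsigned *total_loads)
{
  unsigned used = 0;
  unsigned last = (unsigned) -1;

  *total_loads = (vf * group_size + nunits - 1) / nunits;
  for (unsigned i = 0; i < vf; i++)
    {
      /* Element indices only increase, so counting changes of the
         vector index counts distinct vectors.  */
      unsigned vec = (i * group_size + offset) / nunits;
      if (vec != last)
        {
          used++;
          last = vec;
        }
    }
  return used;
}
```

For example, consider `count_used_vector_loads (3, 2, 2, 0, &total)`. This is a group of 3 with V2DF and a vectorization factor of 2. The call returns 2 and sets `total` to 3. The accessed elements 0 and 3 live in vectors {0,1} and {2,3}, while vector {4,5} is loaded for nothing.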