https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97236
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #4)
> So what goes wrong is the single-element interleaving code-gen for the
> pointer copy. We have
>
> t.c:18:21: note: Detected single element interleaving
> picture_7(D)->p[i_18].p_pixels step 16
>
> but for the store:
>
> t.c:18:21: missed: not consecutive access res_8(D)->p[i_18].p_pixels = _1;
> t.c:18:21: note: using strided accesses
>
> ...
>
> t.c:18:21: note: ==> examining statement: _1 =
> picture_7(D)->p[i_18].p_pixels;
> t.c:18:21: note: vect_model_load_cost: aligned.
> t.c:18:21: note: vect_model_load_cost: inside_cost = 24, prologue_cost = 0
> .
>
> and in group get-load-store type we handle it as (V1DI)
>
>       if (!STMT_VINFO_STRIDED_P (first_stmt_info)
>           && (can_overrun_p || !would_overrun_p)
>           && compare_step_with_zero (vinfo, stmt_info) > 0)
>         {
>           /* First cope with the degenerate case of a single-element
>              vector. */
>           if (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 1U))
>             *memory_access_type = VMAT_CONTIGUOUS;

So both adding && gap == 0 here and removing this special case altogether
pass bootstrap / regtest on x86_64.

I have no idea why the special case was needed in the first place. Was the
load-lanes code confused? I think VMAT_ELEMENTWISE for single-element
vectors is a good enough match. What's the advantage of VMAT_CONTIGUOUS
here?
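
For readers without the testcase at hand, the access pattern the dump
describes can be sketched roughly as below. This is a hypothetical
reduction, not the t.c from the report: the names struct plane,
struct picture, N_PLANES and copy_pixels, and the struct layout itself,
are assumptions chosen so that on an LP64 target successive p_pixels
members sit 16 bytes apart, matching the reported "step 16". Whether this
exact loop triggers the same vectorizer decisions has not been verified.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout (an assumption, not taken from the report):
   an 8-byte pointer followed by two ints, so on LP64 each element is
   16 bytes and consecutive p_pixels fields are 16 bytes apart.  */
struct plane
{
  uint8_t *p_pixels;
  int i_lines;
  int i_pitch;
};

#define N_PLANES 4

struct picture
{
  struct plane p[N_PLANES];
};

/* Copy only the pointer member of each plane.  The load reads one
   pointer out of every array element and skips the rest, which is the
   "single element with a gap" shape discussed above; the store has the
   same stride on the destination side.  */
void
copy_pixels (struct picture *res, const struct picture *picture)
{
  for (int i = 0; i < N_PLANES; i++)
    res->p[i].p_pixels = picture->p[i].p_pixels;
}
```

The other members of each element must be left untouched by the copy,
which is why an access type that treats the load as contiguous across the
gap is the suspicious part here.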