https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103116

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
We could make peeling for gaps handle this by making it not a flag but
indicate the number of vector(!?) iterations we need to peel.

We're doing the "correct" thing in adjusting the IV increment via

      if (slp_perm
          && (group_size != scalar_lanes
              || !multiple_p (nunits, group_size)))
        {
          /* We don't yet generate such SLP_TREE_LOAD_PERMUTATIONs for
             variable VF; see vect_transform_slp_perm_load.  */
          unsigned int const_vf = vf.to_constant ();
          unsigned int const_nunits = nunits.to_constant ();
          vec_num = CEIL (group_size * const_vf, const_nunits);
          group_gap_adj = vf * group_size - nunits * vec_num;

The problem also shows up for loops like

  for (int i = 0; i < COUNT; ++i)
    {
      x[i * 4] = y[i * 3] + 1;
      x[i * 4 + 1] = y[i * 3] + 2;
      x[i * 4 + 2] = y[i * 3 + 1] + 3;
      x[i * 4 + 3] = y[i * 3 + 2] + 4;
    }

where we cannot use a smaller vector type.  We could also use masked loads if
available of course (not sure about the cost of that vs peeling for gaps).

A conservative fix would be to detect when peeling for gaps as implemented
is good enough and do that and otherwise reject vectorization.
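
To make the arithmetic in the quoted snippet concrete, here is a minimal standalone sketch of the vec_num / group_gap_adj computation. It is not the vectorizer's code. The names plan_loads, elements_overread and struct load_plan are hypothetical, and the values used for the example loop (a load group of 3 elements, 4-lane vectors, VF of 1) are assumptions that the comment does not state.

```c
/* Round-up integer division, written out here to stand in for the
   CEIL macro used in the quoted snippet.  */
#define CEIL(x, y) (((x) + (y) - 1) / (y))

/* Hypothetical container for the two values the snippet computes.  */
struct load_plan
{
  unsigned int vec_num;   /* vector loads issued per vector iteration */
  int group_gap_adj;      /* IV adjustment in elements; negative means the
                             loads covered more than vf * group_size */
};

/* Mirror of the two assignments in the snippet, using plain constants
   in place of the vectorizer's poly_int values (assumption).  */
static struct load_plan
plan_loads (unsigned int group_size, unsigned int vf, unsigned int nunits)
{
  struct load_plan p;
  p.vec_num = CEIL (group_size * vf, nunits);
  p.group_gap_adj = (int) (vf * group_size) - (int) (nunits * p.vec_num);
  return p;
}

/* Elements loaded beyond the last one the scalar loop would touch in
   a vector iteration; this is simply -group_gap_adj.  */
static unsigned int
elements_overread (unsigned int group_size, unsigned int vf,
                   unsigned int nunits)
{
  return (unsigned int) -plan_loads (group_size, vf, nunits).group_gap_adj;
}
```

Under those assumed values, plan_loads (3, 1, 4) gives vec_num = 1 and group_gap_adj = -1: each vector iteration loads four elements of y but advances by only three, so the final iteration touches one element past the last group. That is the access that peeling for gaps, or a masked load, would have to avoid.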