https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103116
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
So the issue is that we have group_size == 2 and nunits == 4, but still gap == 0. That makes get_group_load_store_type assume overrun_p = false. I suppose that if we had 8 elements in x and four times the first and second in y, peeling one vector iteration as scalar would not be enough to avoid the breakage. So while peeling for gaps helps in this particular case, it's not the solution for the more general problem.

Here, instead, I think we need to enforce a minimum vectorization factor so that nunits divides group_size * vf (or at least nunits/2 does, to allow peeling for gaps to work). ISTR we specifically did not do this in order to allow more vectorization, though. The better alternative would then be to allow a smaller vector size to be used for the load, with all the ripple-down effects that might have (and only enforce a larger VF if there is no such vector type).
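
To make the divisibility condition concrete, here is a minimal standalone sketch of the arithmetic only. It is not GCC code: the helper names (gcd_u, min_vf_for, min_vf_strict, min_vf_relaxed) are hypothetical, and the sketch assumes that the smallest vf for which a unit divides group_size * vf is unit / gcd(group_size, unit).

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC: greatest common divisor (Euclid). */
static unsigned gcd_u(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Smallest vf >= 1 such that `unit` divides group_size * vf. */
static unsigned min_vf_for(unsigned group_size, unsigned unit)
{
    assert(group_size > 0 && unit > 0);
    return unit / gcd_u(group_size, unit);
}

/* Strict condition from the comment: nunits divides group_size * vf. */
static unsigned min_vf_strict(unsigned group_size, unsigned nunits)
{
    return min_vf_for(group_size, nunits);
}

/* Relaxed condition from the comment: nunits/2 divides group_size * vf.
   Assumes nunits is even, as it is for power-of-two vector lengths. */
static unsigned min_vf_relaxed(unsigned group_size, unsigned nunits)
{
    assert(nunits >= 2 && nunits % 2 == 0);
    return min_vf_for(group_size, nunits / 2);
}
```

For the case above (group_size == 2, nunits == 4), min_vf_strict returns 2 and min_vf_relaxed returns 1 under these assumptions.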