https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66510
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ok, so we don't limit group sizes to multiples of nunits, which means
vect_transform_slp_perm_load misses an early out. This is then also a
missed optimization, as we don't consider the three SLP loads as permutes
of the single group load.

Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c (revision 224514)
+++ gcc/tree-vect-slp.c (working copy)
@@ -3306,6 +3276,10 @@ vect_transform_slp_perm_load (slp_tree n
               return false;
             }
 
+          if (need_next_vector
+              && vec_index >= (vf * group_size) / nunits)
+            return false;
+
           if (!analyze_only)
             {
               int l;
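
The added early out can be tried in isolation with a small standalone sketch. This is not GCC code: the helper names available_vectors and perm_load_in_bounds are hypothetical, and it assumes that vf, group_size and nunits are plain unsigned counts and that (vf * group_size) / nunits is the number of whole vectors the group load provides. Only the condition itself is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of GCC: number of whole vectors assumed to
   be available from the group load.  Integer division truncates, so a
   group size that is not a multiple of nunits can leave a partial vector
   uncounted. */
static unsigned available_vectors(unsigned vf, unsigned group_size,
                                  unsigned nunits)
{
    assert(nunits > 0);
    return (vf * group_size) / nunits;
}

/* Mirrors the condition added by the patch: bail out when the permutation
   needs a further vector but vec_index is already at or past the number of
   available vectors. */
static bool perm_load_in_bounds(bool need_next_vector, unsigned vec_index,
                                unsigned vf, unsigned group_size,
                                unsigned nunits)
{
    if (need_next_vector
        && vec_index >= available_vectors(vf, group_size, nunits))
        return false;
    return true;
}
```

For example, with vf = 1, group_size = 3 and nunits = 4 the quotient is 0, so any permutation that needs a next vector is rejected under these assumptions.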