https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105219
--- Comment #17 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #16)
> (In reply to rsand...@gcc.gnu.org from comment #15)
> > (In reply to Richard Biener from comment #14)
> > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > index d7bc34636bd..3b63ab7b669 100644
> > > --- a/gcc/tree-vect-loop.cc
> > > +++ b/gcc/tree-vect-loop.cc
> > > @@ -9977,7 +9981,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > >                           lowest_vf) - 1
> > >        : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
> > >                          lowest_vf) - 1);
> > > -  if (main_vinfo)
> > > +  if (main_vinfo && !main_vinfo->peeling_for_alignment)
> > >      {
> > >        unsigned int bound;
> > >        poly_uint64 main_iters
> > It might be better to add the maximum peeling amount to main_iters.
> > Maybe you'd prefer this anyway for GCC 12 though.
> >
> > I wonder if there's a similar problem for peeling for gaps,
> > in cases where the epilogue doesn't need the same peeling.
>
> I don't quite understand the code in if (main_vinfo) but the point is
> that for our case main_iters is zero (and so is prologue_iters if that
> would exist).  I'm not sure how the code can be adjusted with that
> given it computes upper bounds and uses min() for the upper bound
> of the epilogue - we'd need to adjust that with a
> max (2*vf-2, old-upper-bound)
> when there's prologue peeling and the short cut exists (I don't actually
> compute that).

That is, the code does

    if (can_div_away_from_zero_p (main_iters,
                                  LOOP_VINFO_VECT_FACTOR (loop_vinfo),
                                  &bound))
      loop->nb_iterations_upper_bound
        = wi::umin ((widest_int) (bound - 1),
                    loop->nb_iterations_upper_bound);

and so assumes that the scalar epilogue never runs for more than
epilogue VF - 1 times which is wrong.  So I simply gated this whole
code.  But you are right that peeling for gaps would need similar handling
so I'll play safe and add && !main_vinfo->peeling_for_gaps.
>
> peeling for gaps means we run the epilogue for main VF more iterations,
> but that would just mean the vectorized epilogue executes one more time
> and has peeling for gaps applied as well, so the scalar epilogue runs
> for epilogue VF more iterations.
>
> I'm not sure what conditions prevent epilogue vectorization but I think
> there were some at least.
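
For readers following the arithmetic, here is a minimal stand-alone sketch of the clamp discussed above, using plain integers instead of GCC's poly_uint64 and widest_int. It is a model under stated assumptions, not the code in tree-vect-loop.cc: the struct MainLoopInfo and the functions div_away_from_zero and epilogue_upper_bound are hypothetical names made up for illustration, and the non-constant (poly_int) case, where the division can fail, is left out. The two flags model the gating from the patch plus the peeling_for_gaps condition proposed in this comment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the pieces of the main loop's loop_vec_info
// that matter for this bound; not a GCC type.
struct MainLoopInfo {
  std::uint64_t main_iters;    // the value the comment calls main_iters
  bool peeling_for_alignment;  // main loop peels a prologue for alignment
  bool peeling_for_gaps;       // main loop peels for gaps
};

// Division rounding away from zero; for unsigned operands this is a
// ceiling division (constant-only model of can_div_away_from_zero_p).
std::uint64_t div_away_from_zero(std::uint64_t n, std::uint64_t d) {
  assert(d != 0);
  return n / d + (n % d != 0 ? 1 : 0);
}

// Model of the clamp: tighten the epilogue's iteration upper bound to
// bound - 1, but only when the main loop does no peeling, since with
// peeling the epilogue may have to run for more iterations than that.
std::uint64_t epilogue_upper_bound(const MainLoopInfo &info,
                                   std::uint64_t epilogue_vf,
                                   std::uint64_t old_upper) {
  if (info.peeling_for_alignment || info.peeling_for_gaps)
    return old_upper;  // gated: keep the previously computed bound
  std::uint64_t bound = div_away_from_zero(info.main_iters, epilogue_vf);
  // If bound is 0, bound - 1 wraps to the maximum value and the min
  // leaves old_upper unchanged.
  return std::min(bound - 1, old_upper);
}
```

With main_iters = 16 and an epilogue VF of 4, the model gives bound = 4 and clamps an old upper bound of 100 down to 3. With either peeling flag set, the old bound of 100 is kept, which is the conservative behaviour the gating is meant to give.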