https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105219

--- Comment #17 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #16)
> (In reply to rsand...@gcc.gnu.org from comment #15)
> > (In reply to Richard Biener from comment #14)
> > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > index d7bc34636bd..3b63ab7b669 100644
> > > --- a/gcc/tree-vect-loop.cc
> > > +++ b/gcc/tree-vect-loop.cc
> > > @@ -9977,7 +9981,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > >                             lowest_vf) - 1
> > >            : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
> > >                              lowest_vf) - 1);
> > > -      if (main_vinfo)
> > > +      if (main_vinfo && !main_vinfo->peeling_for_alignment)
> > >         {
> > >           unsigned int bound;
> > >           poly_uint64 main_iters
> > It might be better to add the maximum peeling amount to main_iters.
> > Maybe you'd prefer this anyway for GCC 12 though.
> > 
> > I wonder if there's a similar problem for peeling for gaps,
> > in cases where the epilogue doesn't need the same peeling.
> 
> I don't quite understand the code in if (main_vinfo) but the point is
> that for our case main_iters is zero (and so is prologue_iters if that
> would exist).  I'm not sure how the code can be adjusted with that
> given it computes upper bounds and uses min() for the upper bound
> of the epilogue - we'd need to adjust that with a max (2*vf-2, old-upper-bound)
> when there's prologue peeling and the short cut exists (I don't actually
> compute that).

That is, the code does

          if (can_div_away_from_zero_p (main_iters,
                                        LOOP_VINFO_VECT_FACTOR (loop_vinfo),
                                        &bound))
            loop->nb_iterations_upper_bound
              = wi::umin ((widest_int) (bound - 1),
                          loop->nb_iterations_upper_bound);

and so assumes that the scalar epilogue never runs for more than epilogue
VF - 1 iterations, which is wrong.  So I simply gated this whole code.  But
you are right that peeling for gaps would need similar handling, so I'll
play it safe and add && !main_vinfo->peeling_for_gaps.
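
To make the gating concrete, here is a rough standalone model of the clamp
with both conditions applied.  It is a sketch only, not the GCC code: plain
uint64_t stands in for poly_uint64 and widest_int, and main_loop_info,
div_away_from_zero and epilogue_upper_bound are hypothetical names made up
for this illustration.  The early return for a zero quotient is likewise an
assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the pieces of the main loop's loop_vec_info
// that matter here; the real structure carries far more state.
struct main_loop_info
{
  uint64_t main_iters;         // iterations needed to enter the main vector loop
  bool peeling_for_alignment;  // prologue peeling was applied
  bool peeling_for_gaps;       // gap peeling was applied
};

// Integer analogue of can_div_away_from_zero_p: ceil (a / b) for b > 0.
static uint64_t
div_away_from_zero (uint64_t a, uint64_t b)
{
  assert (b != 0);
  return a / b + (a % b != 0 ? 1 : 0);
}

// Model of the gated clamp: tighten the epilogue's upper bound only when
// neither kind of peeling applies to the main loop.
static uint64_t
epilogue_upper_bound (const main_loop_info &main_info, uint64_t epilogue_vf,
                      uint64_t old_upper_bound)
{
  // With either kind of peeling the derived bound can be too small,
  // so keep the existing bound untouched.
  if (main_info.peeling_for_alignment || main_info.peeling_for_gaps)
    return old_upper_bound;

  uint64_t bound = div_away_from_zero (main_info.main_iters, epilogue_vf);
  if (bound == 0)
    return old_upper_bound;  // sketch-only guard against bound - 1 wrapping

  return std::min (bound - 1, old_upper_bound);
}
```

With main_iters of 8 and an epilogue VF of 4, the ungated path clamps an old
bound of 100 down to 1, while setting either peeling flag leaves it at 100.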

> 
> peeling for gaps means we run the epilogue for main VF more iterations,
> but that would just mean the vectorized epilogue executes one more time
> and has peeling for gaps applied as well, so the scalar epilogue runs
> for epilogue VF more iterations.
> 
> I'm not sure what conditions prevent epilogue vectorization but I think
> there were some at least.
