On Thu, 10 Jul 2025, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > The following removes an optimization that wrongly triggers right now
> > because it accesses LOOP_VINFO_COST_MODEL_THRESHOLD which might not be
> > computed yet.
> >
> > Testing on x86_64 didn't reveal any testsuite coverage.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > OK?
> >
> >     PR tree-optimization/120939
> >     * tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p):
> >     Remove eliding an epilogue based on not computed
> >     LOOP_VINFO_COST_MODEL_THRESHOLD.
> 
> This regresses:
> 
> FAIL: gcc.dg/torture/pr113026-1.c   -O3 -fomit-frame-pointer -funroll-loops 
> -fpeel-loops -ftracer -finline-functions   (test for bogus messages, line 10)
> 
> on aarch64-linux-gnu, with:
> 
> .../pr113026-1.c:10:12: warning: writing 1 byte into a region of size 0 
> [-Wstringop-overflow=]
> .../pr113026-1.c:4:6: note: at offset 16 into destination object 'dst' of 
> size 16
> 
> I haven't looked into why yet, but it does seem superficially similar
> to PR60505, which was what this code seems to have been added to fix
> (g:090cd8dc70b80183c83d9f43f1e6ab9970481efd).

Yes.  I now also see the above diagnostic (not sure why it escaped me).

Now, the issue is this elides creating a vector epilog with 8 byte
vectors so we never vectorize when n is in [8, 16].  With the proposed
patch we'd again do that, but we also have a scalar epilog then
and warn.

Fact is that neither LOOP_VINFO_COST_MODEL_THRESHOLD nor
LOOP_VINFO_VERSIONING_THRESHOLD are initialized when we apply
this heuristic and the actual versioning condition is
using LOOP_VINFO_VERSIONING_THRESHOLD (when not zero)
plus, when the cost model threshold and that are not ordered
and vect_apply_runtime_profitability_check_p we apply
LOOP_VINFO_COST_MODEL_THRESHOLD as well (see vect_loop_versioning).

So besides the above mentioned missed optimization caused by
the heuristic I fear there is the chance of wrong code or at
least we are applying the heuristic at a wrong place.  It
was also added before we had any epilogue vectorization.

What happens for the diagnostic to re-appear is that we
end up with unrolled scalar epilogues we diagnose.

I'll re-spin with an added xfail on this testcase and re-open
the bugreport?

Richard.


> Thanks,
> Richard
> 
> > ---
> >  gcc/tree-vect-loop.cc | 21 ++-------------------
> >  1 file changed, 2 insertions(+), 19 deletions(-)
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 46a6399243d..7ac61d4dce2 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -1224,13 +1224,6 @@ static bool
> >  vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo)
> >  {
> >    unsigned HOST_WIDE_INT const_vf;
> > -  HOST_WIDE_INT max_niter
> > -    = likely_max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
> > -
> > -  unsigned th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
> > -  if (!th && LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo))
> > -    th = LOOP_VINFO_COST_MODEL_THRESHOLD (LOOP_VINFO_ORIG_LOOP_INFO
> > -                                     (loop_vinfo));
> >  
> >    if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> >        && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
> > @@ -1250,18 +1243,8 @@ vect_need_peeling_or_partial_vectors_p 
> > (loop_vec_info loop_vinfo)
> >      VF * N + 1.  That's something of a niche case though.  */
> >        || LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> >        || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&const_vf)
> > -      || ((tree_ctz (LOOP_VINFO_NITERS (loop_vinfo))
> > -      < (unsigned) exact_log2 (const_vf))
> > -     /* In case of versioning, check if the maximum number of
> > -        iterations is greater than th.  If they are identical,
> > -        the epilogue is unnecessary.  */
> > -     && (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
> > -         || ((unsigned HOST_WIDE_INT) max_niter
> > -             /* We'd like to use LOOP_VINFO_VERSIONING_THRESHOLD
> > -                but that's only computed later based on our result.
> > -                The following is the most conservative approximation.  */
> > -             > (std::max ((unsigned HOST_WIDE_INT) th,
> > -                          const_vf) / const_vf) * const_vf))))
> > +      || (tree_ctz (LOOP_VINFO_NITERS (loop_vinfo))
> > +     < (unsigned) exact_log2 (const_vf)))
> >      return true;
> >  
> >    return false;
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to