On Thu, 10 Jul 2025, Richard Sandiford wrote: > Richard Biener <rguent...@suse.de> writes: > > The following removes an optimization that wrongly triggers right now > > because it accesses LOOP_VINFO_COST_MODEL_THRESHOLD which might not be > > computed yet. > > > > Testing on x86_64 didn't reveal any testsuite coverage. > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > > > OK? > > > > PR tree-optimization/120939 > > * tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p): > > Remove eliding an epilogue based on not computed > > LOOP_VINFO_COST_MODEL_THRESHOLD. > > This regresses: > > FAIL: gcc.dg/torture/pr113026-1.c -O3 -fomit-frame-pointer -funroll-loops > -fpeel-loops -ftracer -finline-functions (test for bogus messages, line 10) > > on aarch64-linux-gnu, with: > > .../pr113026-1.c:10:12: warning: writing 1 byte into a region of size 0 > [-Wstringop-overflow=] > .../pr113026-1.c:4:6: note: at offset 16 into destination object 'dst' of > size 16 > > I haven't looked into why yet, but it does seem superficially similar > to PR60505, which was what this code seems to have been added to fix > (g:090cd8dc70b80183c83d9f43f1e6ab9970481efd).
Yes. I now also see the above diagnostic (not sure why it escaped me). Now, the issue is this elides creating a vector epilog with 8 byte vectors so we never vectorize when n is in [8, 16]. With the proposed patch we'd again do that, but we also have a scalar epilog then and warn. Fact is that neither LOOP_VINFO_COST_MODEL_THRESHOLD nor LOOP_VINFO_VERSIONING_THRESHOLD are initialized when we apply this heuristic and the actual versioning condition is using LOOP_VINFO_VERSIONING_THRESHOLD (when not zero) plus, when the cost model threshold and that are not ordered and vect_apply_runtime_profitability_check_p we apply LOOP_VINFO_COST_MODEL_THRESHOLD as well (see vect_loop_versioning). So besides the above mentioned missed optimization caused by the heuristic I fear there is the chance of wrong code or at least we are applying the heuristic at a wrong place. It was also added before we had any epilogue vectorization. What happens for the diagnostic to re-appear is that we end up with unrolled scalar epilogues we diagnose. I'll re-spin with an added xfail on this testcase and re-open the bugreport? Richard. > Thanks, > Richard > > > --- > > gcc/tree-vect-loop.cc | 21 ++------------------- > > 1 file changed, 2 insertions(+), 19 deletions(-) > > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > > index 46a6399243d..7ac61d4dce2 100644 > > --- a/gcc/tree-vect-loop.cc > > +++ b/gcc/tree-vect-loop.cc > > @@ -1224,13 +1224,6 @@ static bool > > vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo) > > { > > unsigned HOST_WIDE_INT const_vf; > > - HOST_WIDE_INT max_niter > > - = likely_max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo)); > > - > > - unsigned th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo); > > - if (!th && LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo)) > > - th = LOOP_VINFO_COST_MODEL_THRESHOLD (LOOP_VINFO_ORIG_LOOP_INFO > > - (loop_vinfo)); > > > > if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) > > && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0) > > @@ -1250,18 +1243,8 @@ vect_need_peeling_or_partial_vectors_p > > (loop_vec_info loop_vinfo) > > VF * N + 1. That's something of a niche case though. */ > > || LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) > > || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&const_vf) > > - || ((tree_ctz (LOOP_VINFO_NITERS (loop_vinfo)) > > - < (unsigned) exact_log2 (const_vf)) > > - /* In case of versioning, check if the maximum number of > > - iterations is greater than th. If they are identical, > > - the epilogue is unnecessary. */ > > - && (!LOOP_REQUIRES_VERSIONING (loop_vinfo) > > - || ((unsigned HOST_WIDE_INT) max_niter > > - /* We'd like to use LOOP_VINFO_VERSIONING_THRESHOLD > > - but that's only computed later based on our result. > > - The following is the most conservative approximation. */ > > - > (std::max ((unsigned HOST_WIDE_INT) th, > > - const_vf) / const_vf) * const_vf)))) > > + || (tree_ctz (LOOP_VINFO_NITERS (loop_vinfo)) > > + < (unsigned) exact_log2 (const_vf))) > > return true; > > > > return false; > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)