[Bug tree-optimization/123225] [16 Regression] Overly-aggressive vectorization of uncounted loops

rguenther at suse dot de via Gcc-bugs Wed, 14 Jan 2026 04:17:17 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123225


--- Comment #8 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 13 Jan 2026, victorldn at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123225
> 
> --- Comment #6 from Victor Do Nascimento <victorldn at gcc dot gnu.org> ---
> Thanks for the feedback, both in terms of code examples and observations
> regarding the prologue peeling expense.
> 
> Also, sorry for the slow turnaround time. After the holidays, I've been
> ramping up on the code for the loop costing.
> 
> I figured the easiest way (though I've yet to convince myself it's the right
> way) to tweak which uncounted loops we accept for vectorization is to
> replicate what we do if (loop_cost_model (loop) == VECT_COST_MODEL_VERY_CHEAP),
> where we check min_profitable_estimate against some constant, e.g.
> vect_vf_for_cost (loop_vinfo).
> 
> Even using the vect_vf_for_cost (loop_vinfo) as for VECT_COST_MODEL_VERY_CHEAP
> in the uncounted loop criterion allows us to recover 86% of the increase in
> code-size for 523.xalancbmk_r and most of the performance degradation we
> observe in AArch64 (though admittedly the performance loss is considerably
> smaller for AArch64 than it is for x86_64).
> 
> I'll try other cut-off values (Richi mentioned the vector loop being less
> than 2x as expensive as a single scalar iteration, while I had thought of half
> of vect_vf_for_cost) and report back, though equally any feedback on my as yet
> rudimentary approach to the problem is most welcome.
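
For reference, the kind of cut-off comparison described in the quoted comment
can be modelled roughly as below. This is a standalone toy, not GCC source:
accept_uncounted_loop, cutoff_full_vf and cutoff_half_vf are hypothetical
names, and the only thing taken from the comment is the idea of comparing
min_profitable_estimate against a threshold derived from the vectorization
factor.

```c
#include <stdbool.h>

/* Toy stand-in for the profitability check; NOT GCC source.
   min_profitable_estimate: estimated number of scalar iterations needed
   before the vector loop pays off.
   cutoff: the threshold being experimented with.  */
static bool accept_uncounted_loop(int min_profitable_estimate, int cutoff)
{
    return min_profitable_estimate <= cutoff;
}

/* Two candidate cutoffs mentioned in the comment, for a given
   vectorization factor vf (assumed names).  */
static int cutoff_full_vf(int vf) { return vf; }
static int cutoff_half_vf(int vf) { return vf / 2; }
```

With vf = 4, a loop whose estimate is 4 passes the full-VF cutoff but not the
half-VF one, which is the kind of difference the experiments would measure.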

I wonder if for now (w/o the ability to elide the epilog, w/o the ability
to use first-fault loads) we should restrict this to PGO when we have
a more reliable expected iteration count to work with?  Though as we
do not have a histogram of actual loop iterations an estimated count
of 10 can result from a mix of 1 and 20 loop iterations ...
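
To make the averaging concern concrete, here is a small standalone sketch
(not GCC code; mean_trip_count and fraction_reaching_vf are hypothetical
helpers). Ten loop entries that run 1 iteration and nine that run 20
iterations average to exactly 10, yet more than half of the entries never
reach even one full vector iteration at an assumed VF of 4.

```c
#include <stddef.h>

/* Mean number of iterations over all recorded loop entries.  */
static double mean_trip_count(const unsigned *trips, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += trips[i];
    return n ? (double)total / (double)n : 0.0;
}

/* Fraction of loop entries that run at least vf iterations, i.e. that
   could execute at least one full vector iteration.  */
static double fraction_reaching_vf(const unsigned *trips, size_t n,
                                   unsigned vf)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (trips[i] >= vf)
            hits++;
    return n ? (double)hits / (double)n : 0.0;
}
```

A profile that only records the mean cannot tell this distribution apart
from one where every entry runs 10 iterations.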

Plus eventually handling loops marked as force_vectorize (we do not
yet have a #pragma users can use, but OMP SIMD marks loops this way).
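
As an illustration of the OMP SIMD route, a loop annotated like the one
below is the kind that gets marked this way when compiled with
-fopenmp-simd or -fopenmp. This is only a sketch: saxpy is an arbitrary
example function, and since the simd construct requires a canonical loop
form, the example is a counted loop rather than an uncounted one.

```c
#include <stddef.h>

/* With -fopenmp-simd (or -fopenmp) GCC honors the pragma; without those
   options the pragma is ignored and the loop is costed like any other.  */
static void saxpy(float *restrict y, const float *restrict x, float a,
                  size_t n)
{
#pragma omp simd
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```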
