https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123225
--- Comment #8 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 13 Jan 2026, victorldn at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123225
>
> --- Comment #6 from Victor Do Nascimento <victorldn at gcc dot gnu.org> ---
> Thanks for the feedback, both in terms of code examples and observations
> regarding the prologue peeling expense.
>
> Also, sorry for the slow turnaround time. After the holidays, I've been
> ramping up on the code for the loop costing.
>
> I figured the easiest way (though I've yet to convince myself it's the right
> way) to tweak which uncounted loops we accept for vectorization is to
> replicate what we do if (loop_cost_model (loop) == VECT_COST_MODEL_VERY_CHEAP),
> where we check min_profitable_estimate against some constant, e.g.
> vect_vf_for_cost (loop_vinfo).
>
> Even using vect_vf_for_cost (loop_vinfo) as for VECT_COST_MODEL_VERY_CHEAP
> in the uncounted loop criterion allows us to recover 86% of the increase in
> code size for 523.xalancbmk_r and most of the performance degradation we
> observe on AArch64 (though admittedly the performance loss is considerably
> smaller for AArch64 than it is for x86_64).
>
> I'll try other cut-off values (Richi mentioned the vector loop being less
> than 2x as expensive as a single scalar iteration, while I had thought half
> of vect_vf_for_cost) and report back, though equally any feedback on my as
> yet rudimentary approach to the problem is most welcome.

I wonder if for now (w/o the ability to elide the epilog, w/o the ability
to use first-fault loads) we should restrict this to PGO, where we have a
more reliable expected iteration count to work with? Though as we do not
have a histogram of actual loop iterations, an estimated count of 10 can
result from a mix of 1 and 20 loop iterations ...

Plus eventually handling loops marked as force_vectorize (we do not yet
have a #pragma users can use, but OMP SIMD marks loops this way).
