[Bug tree-optimization/123225] [16 Regression] Overly-aggressive vectorization of uncounted loops

tnfchris at gcc dot gnu.org via Gcc-bugs Fri, 16 Jan 2026 04:18:08 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123225


--- Comment #18 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #17)
> (In reply to Tamar Christina from comment #15)
> > In my opinion, as I mentioned before, costing for uncounted loops should be
> > based on the single iteration cost of the vector loop vs the scalar, and
> > this value for profitability should be used to version the loop.
> > 
> > What we want is a runtime threshold for profitability, similar to how we
> > version normal loops. This feels more correct to me and it would allow
> > targets to adjust this scale. This is similar to what we do today already
> > for known iters loops.
> > 
> > And vectorization is statically rejected when the vector costs / VF > scalar
> > costs, i.e. the vector costs will never beat scalar per iteration.
> 
> But you cannot compute a runtime threshold for uncounted loops (or also
> very much early-break ones where we ignore the possibility of breaking
> early).  What you are left with is profile-data (loop will very likely
> at least run N times), hand-waving (loops will usually run at least
> N times), or hard checking (even when the loop immediately exits the
> vector version isn't worse).

You can always compute the cost as if it were straight-line code and
base the costing on that. That's what I was getting at above.
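
As a rough sketch of what I mean (the struct and function names here are
hypothetical, not GCC's cost model interface, and the break-even formula is
my simplifying assumption of treating one vector iteration as straight-line
code):

```c
#include <stdbool.h>

/* Hypothetical per-iteration costs; not GCC's actual cost model API.  */
struct loop_costs
{
  unsigned scalar_iter_cost; /* cost of one scalar iteration */
  unsigned vector_iter_cost; /* cost of one vector iteration */
  unsigned vf;               /* vectorization factor */
};

/* Static rejection: vector cost / VF > scalar cost.  Cross-multiplied
   to avoid integer division truncation.  */
static bool
vector_never_profitable (const struct loop_costs *c)
{
  return c->vector_iter_cost > c->scalar_iter_cost * c->vf;
}

/* Assumed break-even point: the smallest number of scalar iterations
   whose total cost is at least that of one vector iteration, i.e.
   ceil (vector_iter_cost / scalar_iter_cost).  Returns 0 if the scalar
   cost is 0, since no break-even exists then.  */
static unsigned
break_even_iters (const struct loop_costs *c)
{
  if (c->scalar_iter_cost == 0)
    return 0;
  return (c->vector_iter_cost + c->scalar_iter_cost - 1)
	 / c->scalar_iter_cost;
}
```

The first check is the static rejection quoted above. The second is only one
possible way to turn the single-iteration costs into a threshold that a target
could scale.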

> With early-break (and uncounted loop)
> re-executing the last N iterations in the epilog that hard checking
> can never work out.

Agreed, which is why I suggested using the smaller patch to get loops
with no side effects supported without an epilog.

Loops with side effects are more difficult because of unrolling.
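
To make the distinction concrete, here is a hypothetical pair of uncounted
loops (illustrative only, not the testcase from this PR, and assuming "side
effects" here means stores performed inside the loop):

```c
#include <stddef.h>

/* Uncounted loop with no side effects: each iteration only loads and
   compares, and the trip count depends on the data.  */
size_t
count_until_zero (const unsigned char *p)
{
  size_t n = 0;
  while (p[n] != 0)
    n++;
  return n;
}

/* Uncounted loop with side effects: each iteration also stores to dst
   before it is known whether the next iteration exits.  */
size_t
copy_until_zero (unsigned char *dst, const unsigned char *src)
{
  size_t n = 0;
  while (src[n] != 0)
    {
      dst[n] = src[n];
      n++;
    }
  return n;
}
```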

Also, we should probably block unrolling for early-break epilogs. The example
in the first comment has a pretty tight sequence. Such examples, when unrolled,
typically clash with branch density constraints on many uarches,
so the unrolled code can have a lot more mispredicts.
