https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123343

--- Comment #7 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 7 Jan 2026, tnfchris at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123343
> 
> --- Comment #6 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #5)
> > (In reply to Zhongyao Chen from comment #4)
> > > Yes. Without unrolling, inner loop vectorization produces better asm.
> > > Could .cunrolli be made vectorization-aware to avoid unrolling when
> > > beneficial? Once unrolled, SLP has no way to recover the loop; what we
> > > want is inner loop vectorization only, not SLP.
> > 
> > cunrolli's first and foremost job is to remove abstraction, it's difficult to
> > anticipate further optimization on the unrolled body, so - not easily I'd
> > say.
> > 
> > BB SLP should work on this though (but as you said we first vectorize the
> > loop containing the code in an awkward way).
> 
> Part of the reason I'm working on PR119187 is to hopefully be able to recover
> cases like this where the pass ordering makes things awkward, so you'll
> end up with an SLP tree which contains both vector and scalar statements.
> 
> But in this case I do think the dataref analysis could be improved to help?

My understanding is that SLP (and thus unrolling) is generally difficult
with VLA vector ISAs, and re-rolling, while useful, only works when all
statements of a loop are unrolled.  But in this case only part of the
loop is.

But as I've said, there are few things the vectorizer tries without the
target being able to chime in via costing.  One is the fallback
of trying to vectorize an (if-converted) loop body using BB vectorization
when loop vectorization fails.  Another is single-lane vs. multi-lane
SLP (where we try the former only when the latter fails).

If you use -fno-tree-loop-vectorize, does the SLP vectorizer then generate
the expected code off the unrolled loop?
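
For anyone wanting to try that experiment, here is a minimal sketch.  The
kernel below is hypothetical and is not the testcase attached to the PR; it
only assumes the general shape discussed above, an outer loop with a runtime
trip count wrapping an inner loop with a small constant trip count that
cunrolli is likely to unroll completely.  The names scale_rows and INNER are
made up for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for the PR's testcase (assumption, not the
   original code).  INNER is a small compile-time constant, so the inner
   loop is a candidate for complete unrolling before the vectorizer runs. */
#define INNER 4

void scale_rows(float *restrict dst, const float *restrict src,
                const float *restrict coef, size_t n)
{
    for (size_t i = 0; i < n; i++)          /* runtime trip count */
        for (size_t j = 0; j < INNER; j++)  /* constant trip count */
            dst[i * INNER + j] = src[i * INNER + j] * coef[j];
}
```

Compiling this once with "-O3 -fopt-info-vec" and once with
"-O3 -fno-tree-loop-vectorize -fopt-info-vec" should show whether the loop
vectorizer or BB SLP picked up the unrolled body, and adding
-fdump-tree-cunrolli shows what the unroller did beforehand.  Whether this
reproduces the behaviour in the PR depends on the target and on how close
the shape is to the real testcase.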
