https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340
--- Comment #7 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 8 Jan 2025, rdapp.gcc at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340
>
> --- Comment #6 from rdapp.gcc at gmail dot com ---
> >> Another thought I had as we already know that SLP handles this more
> >> gracefully:
> >> Would it make sense to "just" defer to BB vectorization and have loop
> >> vectorization not do anything, provided we could detect the pattern
> >> with certainty? That would still be special casing the situation but
> >> potentially less intrusive than "Hail Mary" unrolling.
> >
> > Yes, I would expect costing to ensure we don't loop vectorize it, but
> > then we don't (and can't easily IMO) compare loop vectorization to
> > basic-block vectorization after unrolling cost-wise, so ...
>
> Sure, I see the predicament.
>
> Just to make sure I understand: If we performed virtual unrolling (and
> re-rolling) in the vectorizer couldn't we "natively" recognize and
> vectorize this pattern, so that no special handling would be necessary
> if we e.g. attempted VF=16?

The issue is that I can't see how to "natively" recognize this, or rather
how to represent this with the current vectorizer capabilities short of a
bunch of very ugly hacks.  That is, all dataref and dependence analysis
works on the original IL, and only after that do we determine the VF.  But
for this case to work we'd ideally swap this around - always unroll by VF
first and then do all the analysis as if VF == 1.  That's a much larger
and more invasive change, though (and one I can't easily see working out
in the end).

> So the attempt to recognize during early unroll would be a stop gap until
> that's fully working and in place (which might take longer than one GCC
> cycle)?

Yeah, but eventually this falls into a similar category to other
high-level loop transforms that enable vectorization.

Richard.
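For readers outside the thread, a minimal sketch of the kind of shape being discussed (this is a hypothetical illustration, not the actual testcase from PR 115340): a short fixed-trip-count loop that loop vectorization may cost poorly in its rolled form, while the fully unrolled straight-line equivalent is exactly what basic-block (SLP) vectorization can group into vector statements.

```c
/* Hypothetical example: rolled vs. fully unrolled form of a small loop.
   Function names and the trip count of 4 are illustrative assumptions. */

void scale_rolled(float *restrict out, const float *restrict in)
{
    /* Rolled form: trip count 4, potentially below the VF the loop
       vectorizer would want for wider vector modes. */
    for (int i = 0; i < 4; i++)
        out[i] = in[i] * 2.0f;
}

void scale_unrolled(float *restrict out, const float *restrict in)
{
    /* Fully unrolled form: four independent scalar statements that the
       SLP vectorizer can recognize as one vector multiply. */
    out[0] = in[0] * 2.0f;
    out[1] = in[1] * 2.0f;
    out[2] = in[2] * 2.0f;
    out[3] = in[3] * 2.0f;
}
```

The "unroll by VF, then analyze as if VF == 1" idea in the comment above amounts to having the vectorizer itself produce the second form internally before dataref and dependence analysis, rather than relying on an earlier unrolling pass or special-case detection.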