https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340
--- Comment #6 from rdapp.gcc at gmail dot com ---
>> Another thought I had as we already know that SLP handles this more
>> gracefully:
>> Would it make sense to "just" defer to BB vectorization and have loop
>> vectorization not do anything, provided we could detect the pattern with
>> certainty? That would still be special casing the situation but potentially
>> less intrusive than "Hail Mary" unrolling.
>
> Yes, I would expect costing to ensure we don't loop vectorize it, but then
> we don't (and can't easily IMO) compare loop vectorization to
> basic-block vectorization after unrolling cost-wise, so ...

Sure, I see the predicament.

Just to make sure I understand: if we performed virtual unrolling (and
re-rolling) in the vectorizer, couldn't we "natively" recognize and vectorize
this pattern, so that no special handling would be necessary if we, e.g.,
attempted VF=16? So the attempt to recognize it during early unrolling would be
a stop gap until that's fully working and in place (which might take longer
than one GCC cycle)?