https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123225
--- Comment #12 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 14 Jan 2026, tnfchris at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123225
>
> Tamar Christina <tnfchris at gcc dot gnu.org> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |tnfchris at gcc dot gnu.org
>
> --- Comment #11 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #10)
> > (In reply to Victor Do Nascimento from comment #9)
> > > > I wonder if for now (w/o the ability to elide the epilog, w/o the
> > > > ability to use first-fault loads) we should restrict this to PGO
> > > > when we have a more reliable expected iteration count to work with?
> > > > Though as we do not have a histogram of actual loop iterations an
> > > > estimated count of 10 can result from a mix of 1 and 20 loop
> > > > iterations ...
> > > >
> > > > Plus eventually handling loops marked as force_vectorize (we do not
> > > > yet have a #pragma users can use, but OMP SIMD marks loops this way).
> > >
> > > Yes, I do think that the poor handling of both prologue and epilogue at
> > > present severely hurt the usefulness of this approach.  As for the
> > > prologue, AArch64 targets with SVE can considerably counter the
> > > performance hit by implementing masking for alignment.  This, in
> > > particular, is something I am working on as a follow up to this work
> > > and will be looking to submit once we are back in stage 1.
> >
> > Masking for alignment should work for all targets that can use a
> > predicated loop, including x86 and risc-v.
> >
> > For GCC 16 we can consider adding a new --param so targets could opt to
> > disable uncounted loop vectorization altogether.  I somehow had the
> > impression that we'd land the code avoiding the scalar epilog re-doing
> > the last vector iteration as well, but that didn't materialize.  Without
> > that profitability is even worse for high VF.  The alignment prologue
> > shouldn't be too bad in practice for not too small loops, it's really
> > the epilog where we end up doing things twice that hurts for low
> > iteration counts.
>
> Simple cases as the above can avoid the epilogue quite easily.  During
> analysis of the loop we just have to determine if there are any non-early
> break forced IVs.
>
> If not the epilogue isn't needed and the code that forces the epilogue can
> just be turned off.  After which the loop won't be peeled and the exits are
> fine.
>
> What delayed this is when you DO have a live value, for which you then need
> to do masked based reductions which triggers a bunch of other issues to
> deal with.
>
> So rather than restricting to PGO we could just handle the cases above and
> restrict uncounted loops to cases that don't require a forced epilogue.
>
> That way when I finish the reductions next stage1 it just works.
>
> The patches for the above are on my work machine, but I won't be back till
> the 23rd.
>
> If you agree can extract them from the series and send.

Would be nice to have those on record indeed.
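
For readers of the thread, here is a hypothetical illustration (not taken
from the PR or from the patches mentioned) of the two loop shapes being
distinguished.  The function names are made up, and whether GCC classifies
these exact loops this way is an assumption.  The first uncounted loop has
no value live after the loop; the second carries a reduction that is live
at the exit, which is the case said to need masked reductions.

```c
/* Hypothetical example: uncounted loop with no value live after the loop.
   Nothing computed inside the loop is used once it exits.  */
void
copy_until_zero (int *dst, const int *src)
{
  while (*src != 0)
    *dst++ = *src++;
}

/* Hypothetical example: uncounted loop with a live-out reduction.
   'sum' is read after the loop, so its partial value at the exit
   iteration has to be computed correctly.  */
int
sum_until_zero (const int *src)
{
  int sum = 0;
  while (*src != 0)
    sum += *src++;
  return sum;
}
```

Both loops stop on a data-dependent condition, so neither has an iteration
count known before the loop runs.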
