https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123225
--- Comment #13 from Victor Do Nascimento <victorldn at gcc dot gnu.org> ---
> So rather than restricting to PGO we could just handle the cases above and
> restrict uncounted loops to cases that don't require a forced epilogue.

Forgive my ignorance here, but surely we are talking about 2 separate (though
closely-related) problems...

If we can elide the epilogue, I understand we are definitely making the
vectorized code cheaper to execute (and smaller, improving the resulting
code-size), but surely we still need to make sure we get costing right, no?

No expensive epilogue will mean the loop becomes profitable faster, yes, but we
still need to either:

  1. know whether we will execute enough iterations to reach that profitability
     threshold (which is where the PGO idea comes in) or
  2. ensure we have a conservative enough assumption about min iterations (e.g.
     going back to Richi's idea that the vectorized loop should be no more
     expensive than 2 scalar iterations) so that we always reject loops that
     will need too many iterations for profitability.

The idea I have been working with was that we effectively apply both approaches
above:

  1. Use PGO info when available or
  2. Apply the very conservative cost requirement when PGO data is not
     available.

And have the epilogue eliding just ensure that more of our vectorized loops
pass these cost tests...

Am I wrong in my thinking?
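
To make the two-tier proposal above concrete, here is a minimal sketch of the decision as I read it. It is not GCC code: `LoopCostInfo`, its fields and `should_vectorize_uncounted` are hypothetical names invented for illustration, and the costs are abstract units rather than anything the vectorizer's cost model actually produces.

```cpp
#include <optional>

// Hypothetical summary of one uncounted loop. None of these names exist in GCC.
struct LoopCostInfo {
    double scalar_iter_cost;       // assumed cost of one scalar iteration
    double vector_total_cost;      // assumed cost of running the vectorized loop
                                   // once, including any forced epilogue
    double min_profitable_iters;   // assumed break-even scalar iteration count
    std::optional<double> pgo_expected_iters;  // profile estimate, if any
};

// Sketch of the combined test described in the comment.
bool should_vectorize_uncounted(const LoopCostInfo& c) {
    if (c.pgo_expected_iters) {
        // Approach 1: profile data is available, so compare the expected
        // iteration count with the profitability threshold.
        return *c.pgo_expected_iters >= c.min_profitable_iters;
    }
    // Approach 2: no profile data, so apply the conservative rule that the
    // vectorized loop may cost no more than 2 scalar iterations.
    return c.vector_total_cost <= 2.0 * c.scalar_iter_cost;
}
```

Under this reading, eliding the epilogue does not change the test itself. It only lowers `vector_total_cost` and `min_profitable_iters`, so more loops pass whichever branch applies.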
