https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81303

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Looks like we peel for alignment, which for this loop is quite pointless as it
only runs 5 times, so for AVX256 we're likely running into peel for alignment,
no vector iteration, then the epilogue.
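To make the 5-iteration case concrete, here is a small illustrative sketch. split_loop is a made-up helper, and VF = 4 is an assumption (e.g. 256-bit vectors of 8-byte reals); the dump below does not print the VF. It splits the iterations into the alignment prologue, full vector iterations and the scalar epilogue for each possible peel count:

```python
def split_loop(niter, peel, vf):
    """Split niter scalar iterations into (prologue, vector_iters, epilogue).

    peel: scalar iterations spent in the alignment prologue (hypothetical input).
    vf:   vectorization factor, i.e. scalar iterations per vector iteration.
    """
    prologue = min(peel, niter)            # cannot peel more than the loop runs
    rest = niter - prologue
    return prologue, rest // vf, rest % vf  # full vector iterations, scalar leftovers


# The loop runs 5 times; VF = 4 is assumed.  The peel count depends on the
# runtime misalignment and ranges over 0 .. VF-1.
for peel in range(4):
    print(peel, split_loop(5, peel, 4))
```

With 2 or 3 peeled iterations the vector body never runs at all; everything executes in the prologue and the epilogue.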

Need to tone down that damn alignment peeling some more ...

It peels 'x' btw.

block_solver.f:178:0: note: Cost model analysis:
  Vector inside of loop cost: 76
  Vector prologue cost: 61
  Vector epilogue cost: 62
  Scalar iteration cost: 28
  Scalar outside cost: 7
  Vector outside cost: 123
  prologue iterations: 2
  epilogue iterations: 2
  Calculated minimum iters for profitability: 5
block_solver.f:178:0: note:   Runtime profitability threshold = 4
block_solver.f:178:0: note:   Static estimate profitability threshold = 5
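For reference, the "minimum iters" and "runtime threshold" numbers can be reproduced from the costs above. The sketch below is a from-memory reconstruction of the shape of the computation in vect_estimate_min_profitable_iters, not the actual GCC source, and the formula may differ between releases. It assumes VF = 4, which the dump does not show. The 2 prologue and 2 epilogue iterations are the cost model's assumed peel counts, not the actual ones.

```python
def min_profitable_iters(vec_inside, vec_outside, scalar_iter, scalar_outside,
                         peel_prologue, peel_epilogue, vf):
    """Sketch of a runtime profitability computation (reconstruction, not GCC source)."""
    # Saving of one vector iteration compared with vf scalar iterations.
    gain = scalar_iter * vf - vec_inside
    if gain <= 0:
        return None  # vector body never cheaper: not profitable at all
    # Outside-cost difference scaled by vf, adjusted for the assumed peeled iterations.
    n = ((vec_outside - scalar_outside) * vf
         - vec_inside * peel_prologue
         - vec_inside * peel_epilogue)
    if n <= 0:
        return 0
    n //= gain
    # Bump by one if the truncated count is not yet strictly profitable.
    if scalar_iter * vf * n <= vec_inside * n + (vec_outside - scalar_outside) * vf:
        n += 1
    return n


def runtime_threshold(min_iters, vf):
    """The vector loop is skipped when niter <= this value."""
    return max(min_iters, vf) - 1


VF = 4                  # assumption: 256-bit vectors of 8-byte reals
vec_outside = 61 + 62   # prologue + epilogue cost, 123 in the dump
n = min_profitable_iters(vec_inside=76, vec_outside=vec_outside,
                         scalar_iter=28, scalar_outside=7,
                         peel_prologue=2, peel_epilogue=2, vf=VF)
print(n, runtime_threshold(n, VF))  # 5 4
```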

but that doesn't take into account that we may spend 3 scalar iterations
in the alignment prologue, and thus with niter < 7 we may never enter
the vector loop.  The static estimate is affected by this in the same way.
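A sketch of that gap, again assuming VF = 4 (so the alignment prologue may peel up to VF-1 = 3 iterations) and taking the runtime threshold of 4 from the dump. All names here are hypothetical:

```python
VF = 4                 # assumption: 256-bit vectors of 8-byte reals
RUNTIME_THRESHOLD = 4  # from the dump: vector path is taken when niter > 4
MAX_PEEL = VF - 1      # alignment prologue can run up to VF-1 scalar iterations


def vector_iterations(niter, peel, vf=VF):
    """Full vector iterations left after peeling `peel` iterations for alignment."""
    return max(niter - peel, 0) // vf


def passes_check_but_never_vectorizes(niter):
    """True if niter passes the runtime check yet the worst-case peel leaves no vector iteration."""
    return niter > RUNTIME_THRESHOLD and vector_iterations(niter, MAX_PEEL) == 0


# Smallest niter that guarantees a vector iteration regardless of misalignment.
min_niter_worst_case = MAX_PEEL + VF  # 3 + 4 = 7

bad = [n for n in range(1, 10) if passes_check_but_never_vectorizes(n)]
print(min_niter_worst_case, bad)  # 7 [5, 6]
```

niter = 5 and 6 pass the runtime check, yet with a 3-iteration prologue they run entirely scalar; 7 is the first count that guarantees a vector iteration.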
