https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37150

--- Comment #21 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ok, so fixing the accounting to disregard obviously dead loads gets us to

t.f90:158:0: note: Cost model analysis:
  Vector inside of basic block cost: 1224
  Vector prologue cost: 0
  Vector epilogue cost: 0
  Scalar cost of basic block: 616
t.f90:158:0: note: not vectorized: vectorization is not profitable.

That still doesn't account for the redundant loads (we still emit those,
so we conservatively assume no CSE here).  I suppose the "simple" way
of costing permutations might be the real issue here, though.
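
To illustrate what "dead" means for such a load, here is a minimal Python
sketch. It is not GCC's code. It assumes a load group is fetched as
consecutive vector loads of nunits lanes each, and that a vector load is
dead when the load permutation references none of its lanes. The function
names, the group size, and the lane count are all hypothetical.

```python
def live_vector_loads(perm, nunits):
    """Return the sorted indices of the vector loads a permutation reads.

    perm   -- element indices into the load group (hypothetical input)
    nunits -- lanes per vector (assumed; depends on target and type)
    """
    if nunits <= 0:
        raise ValueError("nunits must be positive")
    # Element i of the group lives in vector load number i // nunits.
    return sorted({idx // nunits for idx in perm})


def dead_vector_loads(perm, group_size, nunits):
    """Return the indices of vector loads that no lane of perm refers to."""
    total = -(-group_size // nunits)  # ceiling division
    live = set(live_vector_loads(perm, nunits))
    return [i for i in range(total) if i not in live]
```

Note that this sketch counts each live vector load once, which is the
optimistic "CSE happens" view. The cost shown in the dump is more
conservative and still charges the redundant copies.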

Permutations like { 58, 58, 58, 58 } are also vectorized badly
(and costed accordingly).  Likewise { 4, 5, 4, 5 } is costed as a
permutation.
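
A rough Python sketch of how these two cases could be told apart from a
genuine permutation follows. It is an illustration only, not the check GCC
performs. It assumes 2-lane vectors (an assumption; the comment does not
state the vector width), under which { 4, 5, 4, 5 } is just the same
aligned vector used twice and { 58, 58, 58, 58 } is a splat of one element.
The function name and the category labels are hypothetical.

```python
def classify_load_permutation(perm, nunits):
    """Classify a load permutation as 'no-permute', 'splat' or 'permute'.

    perm   -- element indices into the load group
    nunits -- lanes per vector (assumed, not taken from the bug report)
    """
    perm = list(perm)
    if nunits <= 0 or len(perm) % nunits != 0:
        raise ValueError("perm length must be a multiple of nunits")

    # Split the permutation into one chunk per output vector.
    chunks = [perm[i:i + nunits] for i in range(0, len(perm), nunits)]

    # Every output vector is exactly one aligned input vector: no shuffle
    # is needed, even if the same input vector is used more than once.
    if all(c[0] % nunits == 0 and c == list(range(c[0], c[0] + nunits))
           for c in chunks):
        return "no-permute"

    # Every lane reads the same element: a broadcast, not a general shuffle.
    if len(set(perm)) == 1:
        return "splat"

    return "permute"
```

With nunits = 2, the first case in the text falls into "splat" and the
second into "no-permute"; something like [1, 0, 3, 2] remains a real
permutation.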

Not counting non-permutations improves things to

t.f90:158:0: note: Cost model analysis:
  Vector inside of basic block cost: 1080
  Vector prologue cost: 0
  Vector epilogue cost: 0
  Scalar cost of basic block: 616
t.f90:158:0: note: not vectorized: vectorization is not profitable.
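
For reference, the decision in both dumps can be reproduced with a simple
comparison. The Python sketch below assumes the basic block is vectorized
only when the summed vector costs are strictly below the scalar cost; the
exact comparison GCC uses is an assumption here, and the function name is
hypothetical. The numbers are the ones from the two dumps.

```python
def bb_vectorization_profitable(vec_inside, vec_prologue, vec_epilogue,
                                scalar):
    """Return True when the total vector cost beats the scalar cost.

    Assumed rule: profitable only if strictly cheaper than scalar code.
    """
    return vec_inside + vec_prologue + vec_epilogue < scalar


# Costs from the first dump (dead loads disregarded).
after_dead_loads = bb_vectorization_profitable(1224, 0, 0, 616)

# Costs from the second dump (non-permutations no longer counted).
after_non_perms = bb_vectorization_profitable(1080, 0, 0, 616)

# How much the second change saved, and how much is still missing.
saved = 1224 - 1080
remaining_gap = 1080 - 616
```

Both calls return False, so under this assumed rule the vector cost would
still have to drop by more than the remaining gap before the block is
considered profitable.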

So there is room for improvement, but these were the "easy" parts (the
rest requires more analysis).  Likely there's some CSE between
the SLP instances involved.
