https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37150
--- Comment #22 from Richard Biener <rguenth at gcc dot gnu.org> ---
Not costing redundant permutations (using a too trivial implementation, but
good enough for this case):

t.f90:158:0: note: Cost model analysis:
  Vector inside of basic block cost: 984
  Vector prologue cost: 0
  Vector epilogue cost: 0
  Scalar cost of basic block: 616
t.f90:158:0: note: not vectorized: vectorization is not profitable.
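
For readers unfamiliar with the dump, the decision it reports can be sketched as a simple comparison of the summed vector costs against the scalar cost. This is only an illustration under the assumption that the basic-block check rejects vectorization whenever the total vector cost is not strictly below the scalar cost; the function name and parameters are hypothetical and are not GCC's actual code.

```python
def bb_vectorization_profitable(vec_inside, vec_prologue, vec_epilogue, scalar):
    """Hypothetical sketch of the basic-block profitability check.

    Assumption: vectorization is accepted only when the total vector cost
    (inside + prologue + epilogue) is strictly below the scalar cost.
    """
    vec_total = vec_inside + vec_prologue + vec_epilogue
    return vec_total < scalar


# Numbers taken from the dump above.
costs = dict(vec_inside=984, vec_prologue=0, vec_epilogue=0, scalar=616)
profitable = bb_vectorization_profitable(**costs)
print("profitable" if profitable else "not vectorized: not profitable")
```

With the costs from the dump, the vector total is 984 against a scalar cost of 616, so the sketch agrees with the "not profitable" verdict.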