https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116654
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
FAIL: gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c scan-tree-dump-times vect "vectorizing stmts using SLP" 3

the testcase needs adjustment (will push fix)

FAIL: gcc.target/powerpc/p9-vec-length-full-8.c scan-assembler-times \\\\mlxvl\\\\M 30
FAIL: gcc.target/powerpc/p9-vec-length-full-8.c scan-assembler-times \\\\mstxvl\\\\M 10

gcc.target/powerpc/p9-vec-length-full-8.c: \\mlxvl\\M found 21 times
FAIL: gcc.target/powerpc/p9-vec-length-full-8.c scan-assembler-times \\mlxvl\\M 30
gcc.target/powerpc/p9-vec-length-full-8.c: \\mstxvl\\M found 7 times
FAIL: gcc.target/powerpc/p9-vec-length-full-8.c scan-assembler-times \\mstxvl\\M 10

(I hate these kind of testcases)

It looks like [u]int64_t and double are not using -with-len.  The difference
is that those no longer require peeling for gaps since the target can
compose a V2D{F,I} by pieces so we code-generate

  vect__9.333_29 = MEM <vector(2) double> [(double *)vectp_src.331_12];
  vectp_src.331_30 = vectp_src.331_12 + 16;
  _31 = MEM[(double *)vectp_src.331_30];
  vect__9.334_32 = {_31, 0.0};
  vect__9.335_33 = VEC_PERM_EXPR <vect__9.333_29, vect__9.334_32, { 0, 2 }>;

and get

.L46:
        ld 9,0(4)
        ld 10,16(4)
        lxv 12,0(3)
        addi 4,4,32
        addi 3,3,16
        mtvsrdd 0,10,9
        xvadddp 0,0,12
        stxv 0,-16(3)
        bdnz .L46

and no epilogue.  I think that's better than -with-len.  It doesn't work for
the other sizes since we have no code to compose say a V4SI from a V2SI and
a SI.

I'm going to adjust the expected counts in the asm-scan.
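
For readers less familiar with the GIMPLE above, here is a rough scalar model in portable C of how a V2DF can be composed by pieces so that no element past the last one needed is ever read. This is only a sketch: the v2df struct, the helper names and the loop shape (dst[i] += src[2 * i]) are assumptions made for illustration, inferred from the quoted assembly, and are not taken from the testcase or from the vectorizer sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a V2DF register: two double lanes. */
typedef struct { double lane[2]; } v2df;

/* Models MEM <vector(2) double>: reads p[0] and p[1]. */
static v2df load_v2df(const double *p)
{
  v2df v = { { p[0], p[1] } };
  return v;
}

/* Models the constructor {x, 0.0}: only one scalar was loaded. */
static v2df compose_low(double x)
{
  v2df v = { { x, 0.0 } };
  return v;
}

/* Models VEC_PERM_EXPR <a, b, {i0, i1}>:
   indices 0..1 select from a, indices 2..3 select from b. */
static v2df vec_perm(v2df a, v2df b, unsigned i0, unsigned i1)
{
  v2df r;
  r.lane[0] = i0 < 2 ? a.lane[i0] : b.lane[i0 - 2];
  r.lane[1] = i1 < 2 ? a.lane[i1] : b.lane[i1 - 2];
  return r;
}

/* Assumed loop shape: dst[i] += src[2 * i] for i in [0, n), n even.
   The highest src index read is 2 * n - 2, so src[2 * n - 1] (the gap
   element after the last one needed) is never touched. */
void add_even_strided(double *dst, const double *src, size_t n)
{
  assert(n % 2 == 0);
  for (size_t i = 0; i < n; i += 2)
    {
      const double *p = src + 2 * i;
      v2df lo = load_v2df(p);             /* { src[2i], src[2i+1] } */
      v2df hi = compose_low(p[2]);        /* { src[2i+2], 0.0 }     */
      v2df even = vec_perm(lo, hi, 0, 2); /* { src[2i], src[2i+2] } */
      dst[i] += even.lane[0];
      dst[i + 1] += even.lane[1];
    }
}
```

With n = 4 the model reads src[0..2] and src[4..6] only, so a 7-element src is enough. A second full vector load at src + 2 would also read src[3], and in the last iteration src[7], which is what peeling for gaps would otherwise have to guard against.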