https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116654
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
FAIL: gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c scan-tree-dump-times vect "vectorizing stmts using SLP" 3
The testcase needs adjustment (will push a fix).
FAIL: gcc.target/powerpc/p9-vec-length-full-8.c scan-assembler-times \\\\mlxvl\\\\M 30
FAIL: gcc.target/powerpc/p9-vec-length-full-8.c scan-assembler-times \\\\mstxvl\\\\M 10
gcc.target/powerpc/p9-vec-length-full-8.c: \\mlxvl\\M found 21 times
FAIL: gcc.target/powerpc/p9-vec-length-full-8.c scan-assembler-times \\mlxvl\\M 30
gcc.target/powerpc/p9-vec-length-full-8.c: \\mstxvl\\M found 7 times
FAIL: gcc.target/powerpc/p9-vec-length-full-8.c scan-assembler-times \\mstxvl\\M 10
(I hate these kinds of testcases.)
It looks like [u]int64_t and double are not using -with-len. The difference
is that those no longer require peeling for gaps, since the target can compose
a V2D{F,I} by pieces, so we code-generate:
vect__9.333_29 = MEM <vector(2) double> [(double *)vectp_src.331_12];
vectp_src.331_30 = vectp_src.331_12 + 16;
_31 = MEM[(double *)vectp_src.331_30];
vect__9.334_32 = {_31, 0.0};
vect__9.335_33 = VEC_PERM_EXPR <vect__9.333_29, vect__9.334_32, { 0, 2 }>;
and get:
.L46:
ld 9,0(4)
ld 10,16(4)
lxv 12,0(3)
addi 4,4,32
addi 3,3,16
mtvsrdd 0,10,9
xvadddp 0,0,12
stxv 0,-16(3)
bdnz .L46
and no epilogue. I think that's better than -with-len. It doesn't work
for the other sizes, since we have no code to compose, say, a V4SI from
a V2SI and an SI.
I'm going to adjust the expected counts in the asm-scan.