https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
--- Comment #48 from Richard Biener <rguenth at gcc dot gnu.org> ---
On ppc64 we generate

.L2:
        lfd 6,8(10)
        lfd 7,16(10)
        lfd 8,24(10)
        lfd 9,32(10)
        lfd 10,40(10)
        lfd 11,48(10)
        lfd 12,56(10)
        lfdu 0,64(10)
        stfd 6,8(9)
        stfd 7,16(9)
        stfd 8,24(9)
        stfd 9,32(9)
        stfd 10,40(9)
        stfd 11,48(9)
        stfd 12,56(9)
        stfdu 0,64(9)
        bdnz .L2

which looks quite optimal to me.  8 copies, indexed loads/stores with an
update variant on the last.
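
For readers who do not read PowerPC assembly, the shape of that loop can be sketched in C. This is only an illustration and not the testcase from this PR: the function name `copy_unrolled8`, its parameters, and the requirement that `n` be a multiple of 8 are all assumptions made for the sketch. It mirrors the listing by doing eight loads, then eight stores, then one pointer bump per iteration, with a down-counting trip count standing in for the count register used by bdnz.

```c
#include <stddef.h>

/* Hypothetical helper (not from the PR): copy n doubles from src to dst,
   unrolled by 8.  Assumes n is a multiple of 8 and that the two buffers
   do not overlap, which is what `restrict` promises to the compiler. */
void copy_unrolled8(double *restrict dst, const double *restrict src, size_t n)
{
    const double *s = src;
    double *d = dst;

    /* Down-counting trip count, analogous to the CTR register that
       bdnz decrements and tests. */
    for (size_t iters = n / 8; iters != 0; --iters) {
        /* Eight loads at fixed offsets from one base pointer. */
        double t0 = s[0], t1 = s[1], t2 = s[2], t3 = s[3];
        double t4 = s[4], t5 = s[5], t6 = s[6], t7 = s[7];

        /* Eight stores at the same offsets from the other base pointer. */
        d[0] = t0; d[1] = t1; d[2] = t2; d[3] = t3;
        d[4] = t4; d[5] = t5; d[6] = t6; d[7] = t7;

        /* One bump per pointer per iteration.  In the listing this is
           folded into the last access by the update forms lfdu/stfdu. */
        s += 8;
        d += 8;
    }
}
```

The point of the update form is that the pointer increment costs no separate instruction, so each iteration is 16 memory operations plus the branch.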