https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
--- Comment #75 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This looks fixed in GCC 11+; I tried x86_64, i686, and powerpc (powerpc-spe is
no longer supported).

For 32-bit powerpc we get:
tuned_STREAM_Copy:
.LFB0:
        .cfi_startproc
        lis 9,.LANCHOR0@ha
        lis 10,0x3
        la 3,.LANCHOR0@l(9)
        ori 0,10,0xd090
        addis 4,3,0xf4
        mtctr 0
        addi 5,3,-8
        addi 8,4,9208
.L2:
        lwz 6,8(5)
        lwz 7,12(5)
        lfd 2,16(5)
        lfd 4,24(5)
        lfd 6,32(5)
        lfd 8,40(5)
        lfd 10,48(5)
        lfd 12,56(5)
        lfdu 0,64(5)
        stw 6,8(8)
        stw 7,12(8)
        stfd 2,16(8)
        stfd 4,24(8)
        stfd 6,32(8)
        stfd 8,40(8)
        stfd 10,48(8)
        stfd 12,56(8)
        stfdu 0,64(8)
        bdnz .L2
        blr

Which seems to be the best.

The gimple level for the loop is:
  <bb 3> [local count: 1063004409]:
  # ivtmp.10_8 = PHI <ivtmp.10_7(3), ivtmp.10_12(2)>
  # ivtmp.12_14 = PHI <ivtmp.12_15(3), ivtmp.12_16(2)>
  ivtmp.10_7 = ivtmp.10_8 + 8;
  _18 = (void *) ivtmp.10_7;
  _1 = MEM[(double *)_18];
  ivtmp.12_15 = ivtmp.12_14 + 8;
  _19 = (void *) ivtmp.12_15;
  MEM[(double *)_19] = _1;
  if (ivtmp.10_7 != _21)
    goto <bb 3>; [99.00%]
  else
    goto <bb 4>; [1.00%]
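
For reference, a minimal C sketch of the kind of loop that would produce output like the above. This is an assumption reconstructed from the dumps, not the testcase attached to the bug: the array names `a` and `c` and the `size_t` index are illustrative, and `N = 2000000` is inferred from the assembly (the counter is loaded with 0x3d090 = 250000 iterations, each copying 8 doubles, and the two pointers start 16000000 bytes apart).

```c
#include <stddef.h>

/* Assumed element count: 250000 loop iterations * 8 doubles per iteration. */
#define N 2000000

/* Hypothetical array names; the assembly only shows one anchor symbol. */
static double a[N], c[N];

/* Plain element-wise copy of doubles, matching the load/store pair in the
   gimple loop body. */
void tuned_STREAM_Copy(void)
{
    for (size_t j = 0; j < N; j++)
        c[j] = a[j];
}
```

The comment does not say which options were used. Something like `gcc -O2 -funroll-loops -S copy.c` (with `copy.c` a hypothetical file name) is one plausible way to get an unrolled loop, and adding `-fdump-tree-optimized` writes out the final gimple for comparison.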