https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69252
--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Looks like an SMS pass bug. In the original loop there is an a[i] memory load
followed by an addition of i to the loaded value. For understandable reasons,
SMS wants to move the load as far as possible from the addition, so the load
is done in another iteration. For that purpose, schedule_reg_moves adds:

756	      move->insn = gen_move_insn (move->new_reg, copy_rtx (prev_reg));

(insn 64 38 37 4 (set (reg:DI 197)
        (reg:DI 181 [ ivtmp.5 ])) 444 {*movdi_internal64}
     (nil))

But when emitting the prologue before the loop (two unrolled iterations: the
first does just the memory load and the stuff before it; the second does the
stuff from before the memory load of the second iteration, then the addition
from the first iteration, and finally the memory load of the second
iteration), it emits a copy of insn 64 only in the second iteration (after the
addition, which is fine), but not in the first iteration. Thus we end up with:

        li 9,0
        ld 5,.LC2@toc(2)
        li 10,24
        mtctr 10
        sldi 6,9,2
        lwzx 10,5,6
        addi 9,9,1
        sldi 6,9,2
        add 7,10,8
        mr 8,9
        lwzx 10,5,6
        add 3,7,3
        addi 9,9,1
        extsw 3,3

where another mr 8,9 instruction is missing somewhere after the li 9,0 and
before the first addi 9,9,1 instruction.

Do we have any SMS scheduling expert around?
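
To make the effect of the missing copy concrete, here is a rough Python model of the pipelined loop. It is only a sketch under assumptions: the loop is taken to be a plain sum of a[i] + i, the names (reference, pipelined, emit_copy_in_first_prologue_iter, stale_r8) are hypothetical, and stale_r8 stands in for whatever value r8 happens to hold on entry. It is not what modulo-sched.c does internally; it just mirrors the register traffic in the listing above.

```python
def reference(a):
    """Unscheduled loop: each iteration loads a[i] and adds i to it."""
    total = 0
    for i in range(len(a)):
        total += a[i] + i
    return total


def pipelined(a, emit_copy_in_first_prologue_iter=True, stale_r8=0):
    """Software-pipelined variant: the load runs one iteration ahead of the add.

    i      models r9 (the induction variable)
    i_copy models r8 (the reg-move copy of i, i.e. insn 64 / "mr 8,9")
    loaded models r10 (the value loaded from a[i])
    stale_r8 is a hypothetical stand-in for an uninitialised r8.
    """
    n = len(a)
    assert n >= 1, "the model assumes at least one iteration"

    i = 0                 # li 9,0
    i_copy = stale_r8     # r8 holds an arbitrary value on entry

    # Prologue, first iteration: only the load stage runs.
    if emit_copy_in_first_prologue_iter:
        i_copy = i        # the "mr 8,9" that is missing in the bad code
    loaded = a[i]         # lwzx 10,5,6
    i += 1                # addi 9,9,1

    total = 0
    # Later prologue iteration and steady state have the same shape here.
    while i < n:
        tmp = loaded + i_copy   # add 7,10,8 (add stage of previous iteration)
        i_copy = i              # mr 8,9
        loaded = a[i]           # lwzx 10,5,6 (load stage of this iteration)
        total += tmp            # add 3,7,3
        i += 1                  # addi 9,9,1

    # Epilogue: drain the add stage of the last iteration.
    total += loaded + i_copy
    return total


if __name__ == "__main__":
    data = [10, 20, 30, 40]
    print(reference(data))
    print(pipelined(data))
    print(pipelined(data, emit_copy_in_first_prologue_iter=False, stale_r8=1000))
```

With the copy emitted in the first prologue iteration the pipelined version agrees with the reference loop. Without it, the very first addition reads the stale r8 value instead of 0, so the result is off by exactly that value, which matches the symptom described above.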