------- Comment #14 from rakdver at gcc dot gnu dot org 2006-09-28 14:40 -------
> > > for this loop instead of just one.
> > > Actually unrolling is not need to produced the bad code:
> > > .L2:
> > >         lwz 0,0(9)
> > >         stwx 0,11,9
> > >         addi 9,9,4
> > >         bdnz .L2
> > > I bet a beer that loop.c actually fixed this crap up before.
> >
> > I am bad at reading ppc assembler; could you please explain what exactly is
> > wrong with the code you present?
>
> One, there are two adds still there (just one is implicated)
> so why not do the loop as:

there is only one add, as far as I can see.

> .L2:
>         lwz r0,0(r9)
>         stw r0,0(r11)
>         addi r9,r9,4
>         addi r11,r11,4
>         bdnz .L2

Otoh, this seems worse to me (one more add).

> Or:
> .L2:
>         lwxz r0,r9,r12
>         stwx r0,r11,r12
>         addi r12,r12,4
>         bdnz .L2

Yes, this would be about the same.  Still, ivopts chose one of the best
possible ways, so I do not see what you are complaining about so much.
The unrolled case is something different -- of course we should use offsetted
modes there.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
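
For readers who do not read ppc assembler easily, the three loop shapes discussed above correspond roughly to the C below. This is only a sketch: the function names are made up for illustration, the original test case is not shown in this comment, and the register annotations are a best-effort mapping to the quoted assembly. The loop counter stands in for the CTR register that bdnz decrements. The first variant relies on pointer/integer conversions, which are implementation-defined in C; the compiler does the equivalent at the machine level.

```c
#include <stddef.h>
#include <stdint.h>

/* Shape of the first quoted loop: a single pointer is incremented and the
   store address is that pointer plus a loop-invariant delta (dst - src). */
void copy_delta(int *dst, const int *src, size_t n)
{
    uintptr_t delta = (uintptr_t)dst - (uintptr_t)src;  /* r11, invariant */
    const int *p = src;                                  /* r9 */
    while (n-- > 0) {                                    /* bdnz .L2 */
        int v = *p;                                      /* lwz  0,0(9) */
        *(int *)((uintptr_t)p + delta) = v;              /* stwx 0,11,9 */
        p++;                                             /* addi 9,9,4 */
    }
}

/* Shape of the first suggested alternative: two pointers, two adds. */
void copy_two_ptrs(int *dst, const int *src, size_t n)
{
    while (n-- > 0) {
        *dst = *src;   /* lwz r0,0(r9); stw r0,0(r11) */
        src++;         /* addi r9,r9,4 */
        dst++;         /* addi r11,r11,4 */
    }
}

/* Shape of the second suggested alternative: both bases stay fixed and one
   shared byte offset is incremented, so there is again a single add. */
void copy_indexed(int *dst, const int *src, size_t n)
{
    size_t off = 0;                                       /* r12 */
    while (n-- > 0) {
        int v = *(const int *)((const char *)src + off);  /* indexed load */
        *(int *)((char *)dst + off) = v;                  /* indexed store */
        off += sizeof(int);                               /* addi r12,r12,4 */
    }
}
```

Counting the pointer and offset updates per iteration gives one add for copy_delta and copy_indexed and two for copy_two_ptrs, which is the comparison made in the replies above.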