http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46854
--- Comment #4 from joakim.tjernlund at transmode dot se <joakim.tjernlund at transmode dot se> 2010-12-09 18:23:59 UTC ---
Here is a copy of an earlier mail I sent to the list in November:

Using gcc 4.4.4 -Os on

loop(long *to, long *from, long len)
{
    for (; len; --len)
        *++to = *++from;
}

I get

/* gcc 4.4.4 -Os
loop:
        addi 5,5,1
        li 9,0
        mtctr 5
        b .L2
.L3:
        lwzx 0,4,9
        stwx 0,3,9
.L2:
        addi 9,9,4
        bdnz .L3
        blr
*/

gcc 3.4.6 has:

/* gcc 3.4.6 -Os
loop:
        mr. 0,5
        mtctr 0
        beqlr- 0
.L8:
        lwzu 0,4(4)
        stwu 0,4(3)
        bdnz .L8
        blr
*/

It doesn't matter which cpu type I use. It seems impossible to make newer
gcc produce smaller/faster code for this loop. Perhaps lwzx/stwx is faster
on bigger Power cpus, but this can't be true for all cpus, can it? That
should not matter though, because I asked gcc to produce smaller code with -Os.
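
For reference, counting the listings above: the 4.4.4 output is 9 instructions (36 bytes at 4 bytes per PowerPC instruction) with 4 instructions per iteration, while the 3.4.6 output is 7 instructions (28 bytes) with 3 per iteration. The sketch below is only an illustration of the two addressing styles at the source level. The names loop_preinc, loop_indexed and check_loop_variants are hypothetical and not part of the original test case, and the explicit void return type is an addition. Which instructions a given gcc version emits for either form is not guaranteed; the check only shows that both forms copy the same elements.

```c
#include <assert.h>

/* The loop from the report, with an explicit return type added.
   Because of the pre-increments, elements 1..len are copied and
   element 0 of both arrays is never touched. This pointer-bumping
   shape is what the update-form lwzu/stwu pair in the 3.4.6 listing
   corresponds to. */
void loop_preinc(long *to, long *from, long len)
{
    for (; len; --len)
        *++to = *++from;
}

/* Hypothetical rewrite using one shared index instead of two moving
   pointers. This mirrors the shape of the 4.4.4 listing, where a single
   offset register (r9) is added to both base pointers via lwzx/stwx.
   Assumes len >= 0. */
void loop_indexed(long *to, long *from, long len)
{
    long i;

    for (i = 1; i <= len; ++i)
        to[i] = from[i];
}

/* Small self-check: both variants must produce identical results. */
void check_loop_variants(void)
{
    long src[5] = { 10, 11, 12, 13, 14 };
    long a[5] = { 0, 0, 0, 0, 0 };
    long b[5] = { 0, 0, 0, 0, 0 };
    int i;

    loop_preinc(a, src, 4);
    loop_indexed(b, src, 4);

    assert(a[0] == 0);          /* element 0 is skipped */
    assert(a[4] == 14);         /* last copied element */
    for (i = 0; i < 5; ++i)
        assert(a[i] == b[i]);   /* both forms agree */
}
```

Compiling both functions with -Os -S on the compiler versions in question and comparing the output would show whether the choice between update-form and indexed addressing follows the source form or is made by the optimizer regardless.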