Consider the following:

    char *x;
    volatile int y;

    void foo(char *p) { y += *p; }

    void main(void)
    {
        char *p1 = x;
        foo(p1++); foo(p1++); foo(p1++); foo(p1++); foo(p1++);
        foo(p1++); foo(p1++); foo(p1++); foo(p1++); foo(p1++);
    }

For the AVR target this generates ugly code. The pointer is saved twice: both r14:r15 and r16:r17 are pushed, and the two pairs take turns holding the old and the incremented copy of p1:

    /* prologue: frame size=0 */
    push r14
    push r15
    push r16
    push r17
    /* prologue end (size=4) */
    lds r24,x
    lds r25,(x)+1
    movw r16,r24
    subi r16,lo8(-(1))
    sbci r17,hi8(-(1))
    call foo
    movw r14,r16
    sec
    adc r14,__zero_reg__
    adc r15,__zero_reg__
    movw r24,r16
    call foo
    movw r16,r14
    subi r16,lo8(-(1))
    sbci r17,hi8(-(1))
    movw r24,r14
    call foo
    etc.

The results get much better when writing it as "foo(p); p++;":

    /* prologue: frame size=0 */
    push r16
    push r17
    /* prologue end (size=2) */
    movw r16,r24
    call foo
    subi r16,lo8(-(1))
    sbci r17,hi8(-(1))
    movw r24,r16
    call foo
    subi r16,lo8(-(1))
    sbci r17,hi8(-(1))

And the results get near optimal when using increments larger than the target can add as an immediate (>64). The compiler then keeps a single saved base pointer and adds the cumulative offset to it for each call. Doing the same for smaller increments would give the best code:

    movw r16,r24
    call foo
    movw r24,r16
    subi r24,lo8(-(65))
    sbci r25,hi8(-(65))
    call foo
    movw r24,r16
    subi r24,lo8(-(130))
    sbci r25,hi8(-(130))

The worst behaviour is shown by 4.1.2, 4.2.2 and 4.3.0. 3.4.6 and 3.3.6 give better (though still non-optimal) results, while 4.0.4 produces the best code for the original foo(p++) form.

Ugly code is also seen for arm/thumb and pdp-11, but good code for arm/arm. So it's a multi-target problem, not just an AVR one!

--
           Summary: missed optimization, foo(p); p++ is better then foo(p++)
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: wvangulik at xs4all dot nl
GCC target triplet: multiple-none-none

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34737