http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49881
           Summary: [AVR] Inefficient stack manipulation around calls
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: r...@gcc.gnu.org

Created attachment 24848
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24848
Hack to set ACCUMULATE_OUTGOING_ARGS

While looking at PR49864 I noticed some awful code.

First, the argument setup code doesn't use push:

        rcall .
        rcall .
        rcall .
        in r30,__SP_L__
        in r31,__SP_H__
        adiw r30,1
        in r26,__SP_L__
        in r27,__SP_H__
        adiw r26,1+1
        st X,r15
        st -X,r14
        sbiw r26,1
        ld r24,Y
        ldd r25,Y+1
        std Z+3,r25
        std Z+2,r24
        lds r24,a1
        lds r25,a1+1
        std Z+5,r25
        std Z+4,r24
        rcall printf

vs a hand-written

        lds r24,a1
        lds r25,a1+1
        push r25
        push r24
        ld r24,Y
        ldd r25,Y+1
        push r25
        push r24
        push r15
        push r14
        rcall printf

If that can be fixed, then the 9 insns to pop the stack afterward,

        in r18,__SP_L__
        in r19,__SP_H__
        subi r18,lo8(-(6))
        sbci r19,hi8(-(6))
        in __tmp_reg__,__SREG__
        cli
        out __SP_H__,r19
        out __SREG__,__tmp_reg__
        out __SP_L__,r18

might be ok.

If that's tricky, consider switching the port to use
ACCUMULATE_OUTGOING_ARGS. A quick hack (attached) showed a nice
improvement to this test case:

        in r30,__SP_L__
        in r31,__SP_H__
        std Z+2,r13
        std Z+1,r12
        mov r30,r16
        mov r31,r17
        ld r24,Z
        ldd r25,Z+1
        in r30,__SP_L__
        in r31,__SP_H__
        std Z+4,r25
        std Z+3,r24
        lds r24,a1
        lds r25,a1+1
        std Z+6,r25
        std Z+5,r24
        rcall printf

With total output size

   text    data     bss     dec     hex filename
   2311      32       0    2343     927 z-before.o
   1805      32       0    1837     72d z-after.o

Even if you do manage to fix the push problem, it might be worthwhile
to add -maccumulate-outgoing-args, like for the i386 port. That would
give a user the option of changing the logic to suit their source.
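
For anyone who wants to sanity-check the size figures quoted above, here is a
small host-side sketch in C. It is not part of the attached patch; the struct
and function names (size_row, row_total, text_saved, print_report) are made up
for illustration, and the only data in it are the two rows of size output from
this report. It checks that text + data + bss matches the dec column and
reports the .text saving, which comes to 506 bytes, or about 22% of the
original .text.

```c
#include <assert.h>
#include <stdio.h>

/* One row of `size` output, copied from the report.
   The struct and helper names are hypothetical, for illustration only. */
struct size_row {
    const char *filename;
    unsigned text, data, bss, dec;
};

static const struct size_row before = { "z-before.o", 2311, 32, 0, 2343 };
static const struct size_row after  = { "z-after.o",  1805, 32, 0, 1837 };

/* Sum of the three section sizes, i.e. what `size` shows in the dec column. */
static unsigned row_total(const struct size_row *r)
{
    return r->text + r->data + r->bss;
}

/* Bytes of .text saved when going from row b to row a. */
static unsigned text_saved(const struct size_row *b, const struct size_row *a)
{
    return b->text - a->text;
}

/* Print both rows in decimal and hex, then the .text saving. */
static void print_report(void)
{
    const struct size_row *rows[2] = { &before, &after };
    for (int i = 0; i < 2; i++) {
        /* The quoted dec column should equal the sum of the sections. */
        assert(row_total(rows[i]) == rows[i]->dec);
        printf("%-12s total %u (0x%x)\n",
               rows[i]->filename, row_total(rows[i]), row_total(rows[i]));
    }
    unsigned saved = text_saved(&before, &after);
    printf("text saved: %u bytes (%.1f%%)\n",
           saved, 100.0 * saved / before.text);
}
```

Calling print_report() from a main() on any host compiler is enough; nothing
here depends on the AVR toolchain.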