http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55757
--- Comment #5 from Joey Ye <joey.ye at arm dot com> 2012-12-21 03:32:21 UTC ---
However, there is room to improve both performance and stack consumption
with -Os:

extern void bar(int *);
void foo()
{
  int a;
  bar(&a);
}

Built with -mcpu=cortex-m3 -Os:
    push    {r0, r1, r2, lr}
    add     r0, sp, #4
    bl      bar
    pop     {r1, r2, r3, pc}

Apparently it should be optimized to save 8 bytes of stack consumption and
two stores:
    push    {r0, lr}
    mov     r0, sp
    bl      bar
    pop     {r1, pc}
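
The 8-byte figure can be checked with a little arithmetic. The following is only a rough sketch, assuming 4-byte core registers and the 8-byte stack alignment that the AAPCS requires at public interfaces; the helper names push_bytes and min_push_regs are made up for illustration and are not part of GCC.

```c
#include <assert.h>

/* Assumptions: 32-bit ARM, 4-byte core registers, and sp kept
   8-byte aligned at calls as the AAPCS requires. */
enum { WORD_BYTES = 4, STACK_ALIGN = 8 };

/* Hypothetical helper: stack bytes consumed by pushing nregs registers. */
unsigned push_bytes(unsigned nregs)
{
    assert(nregs > 0);
    return nregs * WORD_BYTES;
}

/* Hypothetical helper: smallest number of registers to push so that
   lr is saved, local_bytes of locals fit in the pushed slots, and
   the total stays a multiple of STACK_ALIGN. */
unsigned min_push_regs(unsigned local_bytes)
{
    unsigned bytes = WORD_BYTES + local_bytes;  /* lr plus locals */
    bytes = (bytes + STACK_ALIGN - 1) / STACK_ALIGN * STACK_ALIGN;
    return bytes / WORD_BYTES;
}
```

For foo() the only local is one int, so min_push_regs(4) gives 2 registers (8 bytes), against the 4 registers (16 bytes) currently pushed. That matches the 8 bytes and two stores mentioned above.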